InfluxDB 3 Open Source Now in Public Alpha Under MIT/Apache 2 License

New InfluxDB 3 Core and InfluxDB 3 Enterprise products now available for alpha testing.

Today we’re excited to announce the alpha release of InfluxDB 3 Core (download), the new open source product in the InfluxDB 3 product line along with InfluxDB 3 Enterprise (download), a commercial version that builds on Core’s foundation. InfluxDB 3 Core is a recent-data engine for time series and event data. InfluxDB 3 Enterprise adds historical query capability, read replicas, high availability, scalability, and fine-grained security.

For open source, we knew it was important to build a product that could run as a single process and be easy to set up and start using right away. We also realized that many of our customers wanted an operationally simple database, either instead of or in addition to our scalable distributed system. The result is InfluxDB 3 Core, open source and dual-licensed under MIT or Apache 2, and InfluxDB 3 Enterprise, a commercial version that builds on it.

These products are built on more than four years of development and powered by the FDAP stack—Apache Arrow Flight, DataFusion, Arrow, and Parquet—and delivered on our rebuilt time series database architecture. They deliver all the key capabilities of InfluxDB 3, including unlimited cardinality, native object storage support, and a powerful SQL query engine, while maintaining our commitment to the open source community.

I’ll dive into both products, focusing on how they address critical gaps in the developer toolset for time series data. I’ll also address several related topics in detail, outlined below:

  1. InfluxDB 3.0 open source will be called InfluxDB 3 Core, a recent-data engine persisting Parquet files and enabling queries against the last 72 hours of data. Development of Core will carry on under the permissive MIT or Apache 2 license.
  2. In addition to releasing InfluxDB 3 Core, we are releasing InfluxDB 3 Enterprise, a commercial version of the open source Core product.
  3. Key features of Core and Enterprise and the unique spot they fill in the time series toolset with diskless architecture, fast recent data processing, and embedded Python for plugins and triggers.
  4. Compatibility with previous InfluxDB versions, migration tooling, and what to expect when upgrading from 1.x/2.x.
  5. Our commitment to open source, permissive licensing, and InfluxData’s continued focus on maintaining a clear distinction between open source and commercial products.

If you’re interested in downloading and using the software right away, check out the getting started guides for both Core and Enterprise. The open source repo for InfluxDB 3 Core is here. During the alpha period, InfluxDB 3 Enterprise can be accessed with a free, time-limited trial. For Enterprise, we only ask for your email address during the setup process so you can get started without talking to anyone.

InfluxDB 3 Core key features

InfluxDB 3 Core gives developers a new tool for time series data management—a high-performance recent-data engine optimized for querying the last 72 hours of data. This focused approach enables Core to deliver exceptional performance for real-time monitoring, data collection, and streaming analytics use cases. By optimizing specifically for this pattern, we’ve achieved query response times under 10ms for last-value queries and under 50ms for hour-long ranges.

“By optimizing for the most common use cases, we've created a system that delivers exceptional performance while remaining truly open source under permissive licensing.”

InfluxDB 3 Core is designed to operate either with a local disk and no dependencies or “diskless,” using object storage (e.g., S3) for all data. Paired with an embedded Python VM for writing plugins and with Last Value and Distinct Value Caches, InfluxDB 3 Core is a useful data collector, monitoring agent, and recent-data time series database that persists data into Parquet files for long-term storage and access by third-party systems.

Diskless Architecture

A key feature of Core (and Enterprise) is their ability to operate in “diskless” mode, using object storage as the only persistence layer. While they maintain the ability to operate with only a local disk, the option to run statelessly using only object storage enables more dynamic operating environments. In these environments, data can be accessed seamlessly by third-party systems that can read from the object store.

Writes into the database are validated and buffered into an in-memory WAL that is flushed once per second to object storage. Writers can either receive a response after flush, guaranteeing durability, or receive an immediate response after validation. After being flushed, this data is put into an in-memory Arrow buffer that is queryable.
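The flush cycle above can be sketched as a toy Python model. The class and method names here are our own illustration, not the actual Rust implementation, and a plain dict stands in for object storage:

```python
class WalBuffer:
    """Toy sketch of the write path: validate, buffer into an in-memory WAL,
    flush to object storage, then make the flushed data queryable."""

    def __init__(self, object_store):
        self.object_store = object_store  # dict standing in for S3
        self.buffer = []                  # in-memory WAL entries
        self.queryable = []               # stand-in for the in-memory Arrow buffer
        self.flush_count = 0

    def write(self, line, wait_for_durability=False):
        if "=" not in line:               # stand-in for line-protocol validation
            raise ValueError(f"invalid line: {line}")
        self.buffer.append(line)
        if wait_for_durability:
            self.flush()                  # the real server blocks until the next flush
        # otherwise the caller gets an immediate ack after validation

    def flush(self):
        """Persist the buffered WAL to object storage, then expose it to queries."""
        if not self.buffer:
            return
        self.flush_count += 1
        key = f"wal/{self.flush_count:08d}"
        self.object_store[key] = "\n".join(self.buffer)
        self.queryable.extend(self.buffer)
        self.buffer.clear()
```

In the real server the flush runs on a once-per-second timer rather than being called explicitly; the sketch only shows the two acknowledgment modes and the hand-off from WAL to queryable buffer.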

The WAL is periodically snapshotted, persisting the in-memory Arrow buffers to Parquet files on object storage. This process deletes the WAL files whose data has been persisted to Parquet and writes snapshot files containing a summary of what was persisted, keeping the WAL size manageable.

Each host that writes data into object storage persists all files with a path starting with a unique identifier assigned by the user at startup time. Because all data is kept in object storage, we get all the benefits that come along with that: Multi-AZ durability guarantees, backup utilities, and the entire ecosystem of third-party object storage tooling. If a writing host goes down for some reason, a new host can be spun up with the identifier of the old host and pick up where it left off.

Third-party query engines, data lakes, and data warehouses can directly query the Parquet files that Core lands in object storage, giving users more ways to access their historical time series data. We chose Parquet as the persistence format specifically because of its broad adoption in the data ecosystem. This has become even more important as the Iceberg Catalog format has gained in popularity. InfluxDB 3 makes a great agent for landing real-time data in object storage and Iceberg Catalogs.

Fast Recent Data

InfluxDB 3 has features designed for fast access to recent data. This includes the in-memory buffer, Parquet cache, Last Value Cache, and Distinct Value Cache. Our performance targets are to query last values and distinct values in under 10 milliseconds, the last hour in under 50 milliseconds, and queries up to 72 hours in the past in less than a few hundred milliseconds. Achieving this while using object storage for persistence requires a variety of in-memory caches.

The in-memory buffer serves as the fast query path for data in the WAL that has not yet been converted to Parquet and persisted. It is kept in the Arrow format in builders and appended to as data arrives. As data is snapshotted from the WAL, converted to Parquet, and persisted to object storage, we write it into an in-memory Parquet cache before clearing it from the buffer. This means that for recent data, we should never have to touch object storage to answer a query.

The Last Value Cache is a new feature that lets users configure the database to cache the last N values seen for individual series, specific column values, or on a hierarchy. This can be done on a per-table basis or across the database as a whole. For example, if you have sensor data and you have the columns site_name, machine_id, and sensor_id, you could configure the last value cache to keep values on that hierarchy (site -> machine -> sensor) and then quickly get back the last two values seen for a specific sensor, all sensors within a machine, or all sensors within an entire site. The cache acts as an in-memory round-robin database that gets populated as WAL flushes occur (every second by default).
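A minimal sketch of how such a hierarchical cache might behave. The names here are illustrative only; the real cache is configured through the database rather than built by hand, and is populated on WAL flushes:

```python
from collections import deque

class LastValueCache:
    """Toy sketch of a hierarchical last-N-value cache."""

    def __init__(self, hierarchy, n=2):
        self.hierarchy = hierarchy  # e.g., ["site_name", "machine_id", "sensor_id"]
        self.n = n
        self.cache = {}             # full key tuple -> deque of the last n values

    def insert(self, tags, value):
        """Called for each new row as WAL flushes occur."""
        key = tuple(tags[col] for col in self.hierarchy)
        self.cache.setdefault(key, deque(maxlen=self.n)).append(value)

    def last(self, **tags):
        """Return the last values for every series matching a tag prefix,
        where the prefix is given in hierarchy order (site -> machine -> sensor)."""
        prefix = tuple(tags[col] for col in self.hierarchy if col in tags)
        return {key: list(vals) for key, vals in self.cache.items()
                if key[:len(prefix)] == prefix}
```

Querying with the full hierarchy returns one series; querying with only `site_name` fans out to every machine and sensor under that site, which is the hierarchy behavior described above.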

The Distinct Value Cache is another new feature that lets users configure the database to cache the unique values seen for a column or hierarchy of columns, similar to the way the Last Value Cache works. It populates on WAL flushes (every second), just like the Last Value Cache. While this same information is accessible via the SQL engine against the raw data, the Distinct Value Cache is designed to return values in 10 to 30 milliseconds, making it a great fit for building snappy UI experiences.
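The same idea for distinct values can be sketched as a set keyed on the configured column hierarchy (again, illustrative names only, not the released API):

```python
class DistinctValueCache:
    """Toy sketch: track distinct value tuples for a column hierarchy,
    updated as WAL flushes occur."""

    def __init__(self, columns):
        self.columns = columns  # e.g., ["site_name", "machine_id"]
        self.seen = set()       # distinct tuples over the hierarchy

    def insert(self, row):
        self.seen.add(tuple(row[c] for c in self.columns))

    def distinct(self):
        return sorted(self.seen)
```

Because membership is checked against an in-memory set rather than scanned from raw data, lookups stay fast regardless of how much underlying data has been written.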

Plugins and Triggers via Embedded Python

As part of this alpha release, we’re testing the experience for a new plugin system that lets users define Python scripts that can collect, process, transform, and monitor data on the fly directly in the database. It comes with an all-new API and development process. It’s still at a very early stage, so we’ll be iterating on the functionality and exact developer experience—there may be breaking API changes during this time.

“We’re excited about the range of possibilities the plugin system will enable, particularly when paired with the fast recent data query engine and last value cache. We picked Python because of its broad adoption and the ability of most LLMs to write short Python scripts.”

The plugin system is the logical successor to functionality in earlier versions of InfluxDB, like Continuous Queries, Tasks, Kapacitor, and Telegraf. While Kapacitor and Telegraf continue to work with InfluxDB 3, the plugin system brings this functionality directly into the database. This system enables:

  • Custom data collection and transformation
  • Real-time monitoring and alerting
  • Integration with third-party services
  • Scheduled task execution
  • Downsampling and aggregation
  • HTTP endpoint creation for custom APIs

Users can define plugins that are triggered by various data lifecycle events in the database. The plugin API includes the ability to query the database, write data back into the database, and connect to any third-party service enabled through Python’s ecosystem of libraries and tools. The trigger points for plugins are:

  • On WAL flush sends a batch of write data to a plugin once a second (configurable).
  • On Snapshot (persist of Parquet files) sends the metadata to a plugin for further processing against the Parquet data, or to send the information elsewhere (e.g., adding it to an Iceberg Catalog).
  • On Schedule executes a plugin on a schedule configured by the user, useful for data collection and deadman monitoring.
  • On Request binds a plugin to an HTTP endpoint at /api/v3/plugins/<name>, where request headers and content are sent to the plugin, which can then parse, process, and send the data into the database or to third-party services.
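As a rough illustration of the shape a WAL-flush plugin might take, here is a hypothetical threshold-alert plugin. The `process_writes` function name, the batch layout, and the `influxdb` helper object are all assumptions for the sake of illustration, not the released plugin API, which may change during the alpha:

```python
# Hypothetical "On WAL flush" plugin: each row in the flushed batch is checked
# against a temperature threshold, and alerts are written back as new rows.
# Signature and helper object are illustrative only.

def process_writes(influxdb, table_batches):
    for table, rows in table_batches.items():
        for row in rows:
            temp = row.get("temperature")
            if temp is not None and temp > 90.0:
                influxdb.write(
                    table="alerts",
                    tags={"source_table": table},
                    fields={"temperature": temp, "message": "over threshold"},
                )
```

Because the plugin runs inside the database, an alert like this fires within about a second of the offending write arriving, with no external agent to deploy.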

We’re excited about the range of possibilities the plugin system will enable, particularly when paired with the fast recent data query engine and last value cache. We picked Python because of its broad adoption and the ability of most LLMs to write short Python scripts. We think that with the tools available today, even non-programmers will be able to create plugins in the database to solve their domain-specific problems.

InfluxDB 3 Enterprise

InfluxDB 3 Enterprise is the second product we’re announcing today. It builds on Core’s foundation with the following capabilities:

  • High availability configuration
  • Read replicas for query and plugin processing scalability
  • Enhanced security features
  • Historical data compaction and indexing to enable faster queries for anything over one hour
  • Row-level delete support (coming soon)
  • Integrated admin UI (coming soon)

Enterprise is designed for operational simplicity whether deployed on bare metal, VMs, containers, or Kubernetes. Its architecture enables the isolation of different workloads while sharing only files on object storage, making it ideal for custom deployment architectures.

We will have data migration tools from previous versions of InfluxDB to bring over historical data.

Compatibility with previous InfluxDB versions

While we weren’t able to bring forward all features from the previous versions of InfluxDB, we have worked hard to bring some of the old APIs into the new version. We’ve maintained compatibility with these existing InfluxDB features:

  • Support for InfluxDB 1.x and 2.x write APIs
  • InfluxDB Line Protocol support
  • InfluxQL query support (and the v1 query API)
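Writes over the 1.x/2.x APIs use InfluxDB line protocol. The following stdlib-only helper (our own illustrative code, not an official client) formats rows and shows the format's escaping and field-typing rules:

```python
def escape(s):
    """Escape commas, spaces, and equals signs in tag/field keys and tag values.
    (Measurement names only require commas and spaces escaped; escaping '=' too
    is harmless for this sketch.)"""
    return s.replace(",", r"\,").replace(" ", r"\ ").replace("=", r"\=")

def format_field(value):
    # bool must be checked before int, since bool is a subclass of int in Python
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integer fields carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    # string fields are double-quoted with backslash escaping
    return '"' + str(value).replace("\\", "\\\\").replace('"', '\\"') + '"'

def line_protocol(measurement, tags, fields, timestamp_ns=None):
    """Build one line: measurement,tag=v,... field=v,... [timestamp_ns]"""
    tag_part = "".join(f",{escape(k)}={escape(v)}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{escape(k)}={format_field(v)}" for k, v in fields.items())
    line = f"{escape(measurement)}{tag_part} {field_part}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line
```

A row formatted this way can be POSTed unchanged to the 1.x or 2.x write endpoints, which InfluxDB 3 continues to accept.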

However, InfluxDB 3 does have some limitations compared to v1 and v2 with respect to how data is ingested. For Core, there is a hard limit of five databases and 2,000 tables across the server. For Enterprise, the limits are 100 databases and 4,000 tables. We’ve done this to limit resource utilization and how many individual Parquet files need to be persisted to object storage on snapshot of the buffer. Depending on how these limits work for our users, we may work to increase these in the future.

While InfluxDB 3 still supports schema on write, it does not support the addition of new tag columns after a table is created. This is because the set of tags and the time represent the primary key in tables. However, new fields can be added at any time. When creating schemas in InfluxDB 3, only unique identifying information for a row should be in a tag, while everything else should be a field. Generally, it will be best to use fields for data, not tags.

We will be working on data migration tools for InfluxDB 3 Enterprise. Because the open source version is designed for only the last 72 hours of data, our recommendation for migrating to open source is to mirror writes from older versions onto a new running open source instance and then switch over after 72 hours.

Unfortunately, we are not able to bring a compatibility layer forward for Flux users at this time. We’re hoping that the combination of the Python plugin system, SQL, and InfluxQL will give users all the functionality they previously had with Flux.

The FDAP stack: core components of InfluxDB 3

We began developing InfluxDB 3 more than four years ago, building a new Rust-based core around the FDAP stack (Apache Arrow Flight, Apache DataFusion, Apache Arrow, and Apache Parquet). Investing in Apache Software Foundation projects and building InfluxDB 3 around them is one part of our strategy with open source development. We believe open source exists to create widely used commodities—free to use, improve, build on, commercialize, and inspire derivative projects. Specifically, we believe that a state-of-the-art, high-performance SQL engine with parser, planner, optimizer, and vectorized execution should be freely available to any user or company for any purpose without restriction, even if it’s by InfluxDB competitors.

“As we release new versions of InfluxDB built on this technology, we now have a strong tailwind that will continue to drive new features and performance along the way.”

We made the deliberate choice to build around open standards with the goal of having broader compatibility with third-party projects and products. This led to SQL for the query language, Flight and Arrow for RPC, and Parquet for the file format. The choice of Parquet has become even more important with the rise of the Iceberg Catalog format. We recognized the importance of this and even contributed nanosecond timestamps to the Iceberg Spec and implementation, to support the precisions that InfluxDB requires.

Over the last 4.5 years, we’ve helped build DataFusion into the high-performance columnar query engine it is today. Along the way, we developed, open sourced, and donated to the ASF the object store abstraction that gives DataFusion the ability to execute against files in the object stores of any of the major cloud providers. The results of our efforts, and those of the many contributors around DataFusion, led by InfluxData Staff Engineer and PMC Chair Andrew Lamb, can be seen in last year’s SIGMOD paper and in DataFusion’s recent spot atop the rankings of single-node query engines against Parquet files.

This pace of innovation is only accelerating because of DataFusion’s home in the ASF. It’s what makes it strategically safe for companies of all sizes to contribute to and improve DataFusion. The largest companies in the world and startups of all kinds are not only using DataFusion, but also improving the performance, features, and reliability of the engine itself. These advancements flow directly into InfluxDB 3, continuously improving its performance and giving us the best possible outcome we could have hoped for when we embarked on this journey. It’s an unbeatable strategy compared to closed, proprietary software—we accelerate maturity by years.

From the start, our goal was for the core engine to be adopted by as many users and companies as possible, even beyond InfluxDB itself. This broad adoption fosters a larger pool of contributors who push the boundaries of innovation while creating more robust software. It’s battle-tested in many different environments, for many use cases, and with all kinds of data. As we release new versions of InfluxDB built on this technology, we now have a strong tailwind that will continue to drive new features and performance along the way.

Our open source philosophy

With today’s announcement, we are continuing our commitment to open source and maintaining a clear separation between our open source projects and commercial offerings. Rather than restricting usage through licensing, we’ve chosen to differentiate through architectural decisions that benefit both our open source users and commercial customers. We believe this approach fosters a more vibrant community while ensuring we can continue investing in open source development for the long term.

The decision to focus Core on recent data reflects a careful balance of technical and business considerations. Core’s 72-hour optimization isn’t just a commercial boundary; it’s an architectural choice that enables better performance, reliability, and simplicity for the most common time series workloads. By optimizing for the most common use cases, we’ve created a system that delivers exceptional performance while remaining truly open source under permissive licensing. This focused approach allows us to ensure reliable operation by avoiding the complexity of compaction in the open source offering. Furthermore, it encourages ecosystem integration by making it simple for users to combine Core with their choice of third-party tools for historical analysis.

Ultimately, we’ll iterate on feedback from our community and our customers. We want to ensure that some version of InfluxDB will still serve at-home and side-project use cases. Depending on the feedback we receive, we may open up a not-for-commercial-use tier for Enterprise that is free to use.

Development timeline

The alpha period will focus on extensive testing and performance validation, integrating community feedback, refining the API, and enhancing operational experience. During this period, we may make breaking changes to file formats or APIs. Our goal is to transition into a beta in early March, which would mark the end of any potential breaking changes. A general release is planned for April, subject to learnings from the alpha and beta periods.

Joining the community and giving feedback

This is just the start of an ongoing journey, with continuous development and iteration happening in the open. Check out our getting started guide for Core and Enterprise, and join our community channels to give feedback.

The alpha releases of InfluxDB 3 Core and Enterprise represent our vision for the future of time series data and our commitment to the open source community. We look forward to your feedback and participation in shaping the future of InfluxDB.