Announcing InfluxDB 3 Enterprise free for at-home use and an update on InfluxDB 3 Core’s 72-hour limitation

Two weeks into the alpha release of InfluxDB 3 Core (our new open source offering) and InfluxDB 3 Enterprise (our newest commercial offering), we’ve received a good amount of feedback that the 72-hour limitation in Core is too restrictive. The feedback fell into three categories:

  1. At-home users who use InfluxDB for home sensors and systems metrics and often look at weeks, months, or even a year of data
  2. Open source users who expect to be able to write data from any time frame and query any data they write to the database
  3. Open source users who expect InfluxDB 3 Core to be able to query large historical ranges of data, just like InfluxDB 1 & 2 open source can

For the users in category 1, we’re announcing a free tier of InfluxDB 3 Enterprise for at-home, non-commercial use. It will be rate limited in some way, but our intention is to give a free option with all the capabilities of Enterprise for these at-home users. If you’re an at-home user interested in this, please reach out on our community Discord and tell us more about what kinds of rate limits will work for your use case.

For the users in category 2, we’ve lifted the limitation on what time-stamped data can be written to InfluxDB 3 Core (you can write for any historical period), and we’ve lifted the limitation on what period of time can be queried. However, the range of time a single query can cover is still limited (measured in hours), due to specific implementation details that I’ll cover later in this post. InfluxDB 3 Core is optimized for querying short ranges of time (i.e., hours, not days).

For the users in category 3, we understand that InfluxDB 3 Core doesn’t cover everything that InfluxDB 1 and 2 do. It’s designed to fill a unique role in the time series toolkit, offering a highly performant, recent-data engine. For those who are on versions 1 and 2 and are happy with them, there’s no reason to move off. For users requiring the ability to query longer ranges of time, this is one of the capabilities we sell in the InfluxDB 3 product line.

To explain why this limitation exists in Core but not in Enterprise (our commercial offering), the rest of this post gets into the technical details.

InfluxDB 3 Core and Enterprise organize data as it is written into 10-minute blocks of time based on the timestamp of the data. If you issue a write request with thousands of lines of Line Protocol that have timestamps for the same measurement, but ranging over a period of an hour, this will be split into six chunks, one for each 10-minute block of time in that hour (:00, :10, :20, :30, etc.). This data is kept in memory for fast query access and it is also written to a WAL for durability (this WAL can exist entirely in object store for diskless operation).
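To make the bucketing concrete, here is a minimal Python sketch of splitting one hour of writes into 10-minute blocks. It is an illustration only, not the database's actual code: the names `block_start` and `split_into_blocks` are hypothetical, timestamps are assumed to be Unix seconds, and the sample lines are made up.

```python
from collections import defaultdict
from datetime import datetime, timezone

BLOCK_SECONDS = 10 * 60  # 10-minute blocks, as described above


def block_start(ts_seconds: int) -> int:
    """Round a Unix timestamp (in seconds) down to the start of its 10-minute block."""
    return ts_seconds - (ts_seconds % BLOCK_SECONDS)


def split_into_blocks(rows):
    """Group (timestamp_seconds, line) pairs by the block their timestamp falls in."""
    blocks = defaultdict(list)
    for ts, line in rows:
        blocks[block_start(ts)].append(line)
    return dict(blocks)


# One hour of made-up rows for a single measurement, one row per minute.
hour_start = int(datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc).timestamp())
rows = [(hour_start + 60 * i, f"cpu usage={i} {hour_start + 60 * i}") for i in range(60)]

blocks = split_into_blocks(rows)
print(len(blocks))  # number of 10-minute chunks the hour was split into
```

With one row per minute across a single hour, every row lands in one of six blocks, matching the :00, :10, :20, :30, :40, :50 split described above.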

The WAL is periodically snapshotted to keep its size down. By default, this happens every 10 minutes, at which point the in-memory buffers will be written to Parquet files (one per measurement (i.e. table), per 10-minute block of time). The Parquet data is also put into an in-memory cache so that queries against this recently persisted data do not need to go to object storage. Once it is in the Parquet cache, the queryable WAL buffer is cleared.

When a query comes into the server, the time range of the query is examined to determine which Parquet files must be included in the query planning and then execution process. Data from the WAL buffer is always included in the query plan. Querying across many files will start to degrade performance and use up more RAM due to DataFusion reading metadata and row groups from each Parquet file.
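As a rough illustration of that file-selection step, the sketch below works out which 10-minute blocks a query's time range touches and intersects them with the blocks that have been persisted. The function names and the representation of files as block start times (Unix seconds) are assumptions for the example; the real planner works on Parquet file metadata.

```python
BLOCK_SECONDS = 10 * 60  # 10-minute blocks


def blocks_for_range(start: int, end: int) -> list:
    """Return the start times of every block overlapping the half-open range [start, end)."""
    if end <= start:
        return []
    first = start - (start % BLOCK_SECONDS)
    return list(range(first, end, BLOCK_SECONDS))


def files_for_query(persisted_blocks, start: int, end: int) -> list:
    """Pick the persisted blocks (one file per table per block) the query must read."""
    wanted = set(blocks_for_range(start, end))
    return sorted(wanted & set(persisted_blocks))


# Hypothetical table with one persisted file for each block in a single hour.
persisted = [0, 600, 1200, 1800, 2400, 3000]

# A query from 00:15:00 to 00:33:20 touches the :10, :20, and :30 blocks.
selected = files_for_query(persisted, 900, 2000)
print(selected)
```

The number of selected files grows linearly with the width of the query's time range, which is why wide ranges become expensive when every file covers only 10 minutes.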

From our testing, the more files that are included in a query, the more RAM is used and the slower the query gets. The impacts are even more pronounced if the files are not in the cache and the server must go to object store to get the metadata and then the data. Executing a query in Core with a large range of time can potentially result in thousands of GET requests to object store, and in many cases will either get the database OOM-killed or cause DataFusion to stop execution because its allocated memory budget has been exhausted.

Because of these performance properties, we’ve set a configuration option that limits a query plan to 432 Parquet files, which is a 72-hour range of time given the 10-minute time blocks. This is an option that can be set on the server while starting it. We view this as a service protection mechanism. Values much higher than that will likely not yield great user experiences. Even querying across that many files will be less than ideal if you’re looking for very fast query response times.
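The arithmetic behind that number, and the shape of such a guard, can be sketched in a few lines of Python. This is an illustration under stated assumptions: `hours_covered` and `check_plan` are hypothetical names, not the server's configuration option or API, and the count is for a single table (one file per table per block).

```python
BLOCK_MINUTES = 10  # each Parquet file covers one 10-minute block for one table
FILE_LIMIT = 432    # default limit on Parquet files in a query plan, per the post


def hours_covered(file_limit: int, block_minutes: int = BLOCK_MINUTES) -> float:
    """Hours of data for one table that fit within a given file limit."""
    return file_limit * block_minutes / 60


def check_plan(file_count: int, file_limit: int = FILE_LIMIT) -> int:
    """Reject a query plan that would touch more files than the limit allows."""
    if file_count > file_limit:
        raise ValueError(
            f"query plan needs {file_count} files, limit is {file_limit}"
        )
    return file_count


print(hours_covered(FILE_LIMIT))  # 432 files * 10 minutes / 60 = 72.0 hours
```

Six files per hour times 72 hours gives 432 files, so the file limit and the 72-hour figure are two views of the same setting.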

InfluxDB 3 Enterprise lifts this limitation by including a compactor that rewrites these 10-minute files into larger blocks of time. Not only does it rewrite those files, it sorts the data by series and writes out a separate index that lets the query engine know which files contain what data. This is what makes it possible for InfluxDB 3 Enterprise to query across larger time ranges of data. It also enables Enterprise to give faster query response times on any time range that spans longer than an hour.
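The general idea of compaction can be sketched as follows. This is a toy model, not the Enterprise compactor: the one-hour target block size is an assumption (the post does not state one), rows are plain `(series, timestamp, value)` tuples rather than Parquet data, and the `compact` function and index layout are hypothetical.

```python
from collections import defaultdict

COMPACTED_SECONDS = 60 * 60  # assumed target block size, for illustration only


def compact(small_files):
    """Merge 10-minute files into larger blocks, sorted by series, and build an index.

    small_files maps a 10-minute block start to a list of (series, ts, value) rows.
    Returns (compacted, index), where compacted maps a larger block start to its
    rows sorted by (series, ts), and index maps each series to the sorted list of
    compacted blocks that contain it.
    """
    compacted = defaultdict(list)
    for rows in small_files.values():
        for series, ts, value in rows:
            compacted[ts - (ts % COMPACTED_SECONDS)].append((series, ts, value))

    index = defaultdict(set)
    for start, rows in compacted.items():
        rows.sort(key=lambda row: (row[0], row[1]))  # sort by series, then time
        for series, _, _ in rows:
            index[series].add(start)

    return dict(compacted), {s: sorted(b) for s, b in index.items()}


# Three made-up 10-minute files spanning two hours.
small_files = {
    0: [("host=b", 30, 1.0), ("host=a", 10, 2.0)],
    600: [("host=a", 700, 3.0)],
    3600: [("host=b", 3700, 4.0)],
}
compacted, index = compact(small_files)
print(index)  # which compacted blocks hold each series
```

In this toy model, a query for `host=a` can consult the index and read a single compacted block instead of opening every 10-minute file, which is the kind of saving that makes longer time ranges practical.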

We think that InfluxDB 3 Core offers a compelling set of features for real-time, recent time series data. The embedded Python processing engine and API make Core ideal as a data collector with store-and-forward capability and the ability to query data in real time as it is ingested. It can also do ETL, monitoring and alerting, and shipping of data to object storage and other third-party services. This is all in addition to acting as a diskless time series database for recent data.

InfluxDB 3 Enterprise represents a full historical time series database along with everything that Core enables. It’s one of the products we sell. Building a sustainable business is what enables us to continue building InfluxDB 3 Core as a permissively licensed open source project. It also enables us to continue contributing and driving forward the state of the art in query engines with our work in Apache DataFusion, Arrow, Parquet, and the Rust object store crate.

If InfluxDB 3 Core sounds like it meets the needs of a project you’re working on, we hope you’ll give it a try.