Introducing InfluxDB 3 Core & Enterprise Public Alpha
Session date: Jan 28, 2025 08:00am (Pacific Time)
Join InfluxData Founder and CTO Paul Dix and Senior Product Manager Pete Barnett on January 28th to learn more about the newest additions to the InfluxDB 3 product line: InfluxDB 3 Core and Enterprise.
We’re excited to share the alpha releases of these new developments! InfluxDB 3 Core, an open-source product licensed under MIT/Apache 2, serves as a recent-data engine for time series and event data. InfluxDB 3 Enterprise, a commercial version built on Core’s foundation, adds long-term query capabilities, read replicas, high availability, scalability, and fine-grained security.
Both Core and Enterprise are the result of more than four years of development, powered by the FDAP stack—Apache Arrow Flight, DataFusion, Arrow, and Parquet—and delivered on our completely rebuilt time series database architecture.
In this webinar, Paul and Pete will dive into:
- An in-depth overview of InfluxDB 3 Core & Enterprise
- Key features and how they address critical gaps for developers
- How you can participate in the public alpha
- Real-world use cases and applications
- Interactive Q&A + Feedback
Don’t miss this opportunity to be among the first to experience the future of data monitoring. Download the alpha here and register today to learn more.
Watch the Webinar
Watch the webinar “Introducing InfluxDB 3 Core & Enterprise Public Alpha” by filling out the form and clicking on the Watch Webinar button on the right. This will open the recording.
Transcript
Here is an unedited transcript of the webinar “Introducing InfluxDB 3 Core & Enterprise Public Alpha.” This is provided for those who prefer to read rather than watch the webinar. Please note that the transcript is raw. We apologize for any transcription errors. Speakers:
- Peter Barnett: Senior Product Manager, InfluxData
- Paul Dix: Founder and Chief Technology Officer, InfluxData
PETER BARNETT: 00:00
Let’s go ahead and kick it off. Hello, everybody. My name is Peter Barnett. I am a product manager here at InfluxData. I’m joined by our CTO and co-founder, Paul Dix. And today, we’re going to be going through a real under-the-hood deep dive on all the new features and product announcements that have recently come out about InfluxDB 3 Core and InfluxDB 3 Enterprise: how they work, a lot of the special features you’re going to want to be working through when you’re trying these out, how to get started, and of course, a lot of those technical details throughout. So, there’s going to be time for some questions at the end. If you want to ask questions, please go ahead and use the Q&A option. You can send them to me directly. Feel free to ask them throughout the presentation, but we’ll save the answering portion for the end. And so, with that, I’ll go ahead and hand it over to our presenter today, Paul Dix. Take it away, Paul.
PAUL DIX: 00:53
All right. Thanks, Pete. Thank you, everybody, for joining us. So just to cover the agenda really quickly. So obviously, I’m Paul Dix. Pete, you met. The agenda: so first, we’re going to talk about InfluxDB 3 Core, which is the open source release of InfluxDB 3. We’ll cover how to write data, the new API, the CLI, querying data using SQL, the processing engine that we brought in, which is basically a Python VM inside the database. And then we’ll get into some of the details of InfluxDB 3 Enterprise: the architecture, the compactor, those kinds of things. And then at the very end, time for Q&A. So InfluxDB 3 Core, we’re calling basically a real-time data collector and query engine. It’s basically a recent data engine. So, the why behind it is it is built to collect and process data in real time while persisting that data to either local disk or to object storage. It is optimized for queries against recent data, which are served entirely from RAM. So basically, queries that are against data that you’ve recently written into the database, against recent timeframes, shouldn’t have to touch object storage, even though the entire database is designed to ship data to object storage.
PAUL DIX: 02:19
So, it’s useful for real-time system monitoring, edge data collection and transformation, streaming analytics, and sensor use cases where you want to do monitoring and learning on a variety of sensor data. So, we built Core for basically simplicity and speed. We had, in 2023, some other releases of InfluxDB 3, the distributed versions of the database, which we offer as a service and a commercial product on-premises. But for Core, we thought it was important to have something that was a single process, single binary, that was really easy to get set up and get going, right? So, there’s a one-command install. It has native support for SQL and InfluxQL, which we were able to bring into this version. And it’s built on the FDAP stack, which is Apache Arrow Flight, DataFusion, Arrow, and Parquet, right? All projects within the Apache Software Foundation, and we are heavy contributors to all these things. As I mentioned before, it’s optimized for recent data queries. And we’ll cover some of the details here. There’s a buffer of data that goes into the write-ahead log that’s there for fast query access. There’s a cache of Parquet data. Parquet is the file format that we persist data into object storage in. There’s also a last value cache, or recent N values cache, which keeps the values for specific time series in memory, and then a distinct value cache. And then the embedded Python processing engine, which is useful for data collection and data transformation, and gives you access to all the Python libraries in that ecosystem.
PAUL DIX: 04:05
So, let’s dive into writing data in InfluxDB 3 Core. So, we’ll cover the data model, some of the tools, the API, and the architecture of the database. So, the data model should look familiar for people who use SQL databases, right? You have a logical database. Beneath that, you have tables, and each table has a collection of columns. We have different kinds of columns. There’s time, which is a required column in every single table. It’s a nanosecond precision timestamp. There are tags, if you’re familiar with previous versions of Influx. It’s basically key-value pairs. The key is the column name, and the value is basically a string that we keep in dictionary format. There’s int64, float64, uint64, bool, and string. So, unlike other time series databases which focus just on floats, InfluxDB can store different kinds of time series data within the database. So, comparing this data model to previous versions of InfluxDB, right, we have InfluxDB 1, InfluxDB 2, and now Core. So, in version 1, you had a concept of a database and retention policy. Those two combined were basically a thing that stored data, right? In version 2, we just called this a bucket. In version 3, it’s just called a database. In versions 1 and 2, you have something called a measurement. In Core, that is called a table. And then in all of these, you have columns, and you have the same column types: tags; fields, which can be int, uint, float, bool, and string; and then time.
PAUL DIX: 05:49
So, to write data into the database, we created a text-based protocol that was basically easy to construct and easy for humans to basically look at and get a feel for what the data is. So, the text-based protocol has the table name at the beginning, right, which is, in this case, CPU in this example, a comma, and then the tag set, which is key-value pairs separated by an equal sign, right? So, we see we have the tag host, we have the tag region, and we have the tag application. Then you have a space, and then you have the field set, which are, again, key-value pairs, where you have different field types. So, integers are basically a number with an i after it. Uints would have a u after it. Floats are basically just the number, either with the decimal or not. If you have a number and then no trailing i or u, the database assumes that value is a float. And then there’s strings, and then Booleans are just going to be true or false. And then finally, a timestamp, which is represented as basically an epoch from 1970, and that can be in whatever precision you want. By default, it is nanosecond precision. But InfluxDB 3 will look at the timestamp and guess at the precision if you don’t provide it. So, you can write data in and specify the precision as nanoseconds, microseconds, milliseconds, or seconds from the epoch.
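To make that concrete, here is a minimal sketch of a single line in that protocol; the tag and field names are made up for illustration:

```
cpu,host=serverA,region=us-west,application=api usage=12.5,procs=4i,up=true,status="ok" 1737072000000000000
```

Here usage is a float, procs an integer (trailing i), up a Boolean, status a string, and the trailing number is a nanosecond-precision epoch timestamp.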
PAUL DIX: 07:22
All right. The new HTTP API. So, Core and Enterprise both support the InfluxDB 1 and 2 APIs. So, version 1 is basically just /write. Version 2 is /api/v2/write. And it’s basically the same line protocol format. We’ve also added a version 3 API that will take the same line protocol at /api/v3/write_lp. But there are a couple of behavior differences between what you would expect in version 1 and version 2. So, the first is a parameter called accept_partial. If you set this to true, you can post thousands of lines in a single request. It will parse and validate those lines against the schema that you have. If any of those lines has a schema conflict or it can’t parse for some reason, it will take the data that’s acceptable, write that into the database, and then return an error for which lines had an error. There is no_sync, which I’ll get into in a little bit. I have to explain some of the internals of the database for that to make sense. And going back, format, actually, this isn’t valid on the write API. Format will come in later in the query API. So, a partial write example. Here at the top, we’ve done a partial write, and we said accept_partial=true. And basically, it accepted whatever line protocol we had that was valid. And then, it will return an error for each line that threw an error. So, there you can see the data.
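As a rough sketch of what such a partial write looks like over HTTP (assuming a local server on the default port; the second line has a type conflict, so with accept_partial=true the first line is written and a per-line error comes back for the second):

```bash
curl "http://localhost:8181/api/v3/write_lp?db=server_metrics&accept_partial=true" \
  --data-raw 'cpu,host=serverA usage=12.5
cpu,host=serverB usage="not-a-number"'
```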
PAUL DIX: 09:11
You can see the original line that was there, the line number in the request, and the error message for that. Now, with accept_partial=false, it’ll basically do an all-or-nothing validation of all the lines in a request. And if any one of them fails, it will reject everything. And it will still return a line-by-line detail of what the errors were. So that’s the HTTP API. We also have a command line interface built into the database. So basically, using the command line interface, you can start the server, but you can also perform operations against a remote server. So, in this case, we’re using the CLI to write data. We put this line protocol into a file. We issue a command that says, “Write to the database server metrics,” and we specify that the data is coming from this file. So, it’ll read that file. It’ll make a request to the remote server to do the write. You can also do simple one-offs, basically just like do a write and then specify in quotes the line. You can also pipe data into this command. So basically, you could do cat server data piped into InfluxDB 3 write and specify the database. So, Core is designed around a diskless architecture. Now, you can operate with a locally attached disk, but basically, it’s designed around being able to use object storage as its only durability layer.
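A sketch of the three CLI write styles just described; the exact flag names may differ from the current builds, so check influxdb3 write --help:

```bash
influxdb3 write --database server_metrics --file server_data.lp    # from a file
influxdb3 write --database server_metrics 'cpu,host=a usage=0.5'   # one-off inline write
cat server_data.lp | influxdb3 write --database server_metrics     # piped via stdin
```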
PAUL DIX: 10:50
Now, the backing object store can be S3, Google’s object store, Azure’s object store, or basically anything that’s S3 compatible, right? So, like MinIO, Ceph, any of those kinds of things. Or it can be a local file system, or the backing object store could just be RAM, right? If you want to just operate this thing as a totally in-memory processor of data that you’re feeding in, you can just use RAM as the backing store. But using that object store abstraction essentially means we have to design the write-ahead log and all the data that gets written around the pieces that an object store would give you, right? So, we buffer data in a WAL, and then we send that periodically to the object store. So, we can see in this diagram on the right, writes come in. We keep it buffered in RAM. And once a second, we flush those writes to a single file that gets written to object storage. Once that file has been flushed, we put that data into a queryable buffer, which is basically stored in Arrow format so that it has fast access for queries. And then periodically, that data is snapshotted to actual Parquet files. So, Parquet represents the long-term format that the database uses for data. So, looking at the full picture of the write path, here on the left, we have a client that’s going to make a write request to the server. First, the server checks the write against the schema that it has, right? And all this schema information is cached in RAM. So, it validates everything. If it’s not valid, it immediately returns an error. If it is valid, then it sends it into the write buffer.
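A sketch of starting the server against the different backing stores described above; flag names are illustrative of the alpha builds and may differ, so verify with influxdb3 serve --help:

```bash
influxdb3 serve --node-id local01 --object-store memory                        # RAM only
influxdb3 serve --node-id local01 --object-store file --data-dir ~/.influxdb3  # local disk
influxdb3 serve --node-id prod01 --object-store s3 --bucket my-influx-bucket   # S3 or compatible
```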
PAUL DIX: 12:44
That write buffer just has a one-second timer. Once every second, it will flush the data to a file in object storage, and then it will put it into the queryable buffer. And once it’s in the queryable buffer, a response will be sent back to the client. What this means is that writing clients can read their own writes. So, if it were to turn around and query the same server, it would get the result back it had just written in. Now, what this also means is that writes can take up to a second to get a response, right? An individual write request’s response time is gated by that flush interval of the WAL. Now, that flush interval is basically a configuration option. You can scale that back to, say, 100 milliseconds. The downside there is, basically, you’re creating 10 times as many files in object storage. So overall, it doesn’t limit total throughput within the server. But for a single thread in a client that’s making a write request, it will only be able to send a request once a second. So, if you want to, you could parallelize that to send many, many requests. The nice thing about this is once the client receives a response, you get all the durability guarantees of the backing object store that you’re using, right? So, in Amazon, that means you have multi-AZ durability. It also means you get all the tooling of object storage, right? So, if you have the bucket that you’re writing data into configured to do multi-region replication, you know that all the data that you just wrote in is also going to be replicated to whatever other region.
PAUL DIX: 14:28
Now, this is where the no_sync option comes in. So, for clients that are okay with some tiny window of data loss, which is basically that once-a-second flush, they can specify as a parameter in the API that they don’t want to wait for a sync. So, what that means is the client will make a request, the data will get validated, and if it’s valid, it will put it into the buffer, and it will immediately return a response to the client, right? So, the response time for an individual request will be much, much faster. Now, what that means is if the server crashes or if it’s unable to persist that WAL file to object storage when that one-second timer is up, that data won’t be persisted, right? So, there’s no durability guarantee there. It also means that if a client makes a write request with the no_sync option and they immediately query the database, that data may or may not show up. It just depends on whether or not it hit the condition of querying the data before it got persisted to object storage and put into the queryable buffer. So, let’s look at the queryable buffer in a little bit more detail. Now, the point of the queryable buffer is essentially to, one, keep all the data that’s in the write-ahead log, right, which is in just a series of WAL files, for quick access to query. That’s one of the purposes. But the other purpose is that periodically, we will snapshot the WAL to clear out old WAL files. And when we do that snapshot, essentially, the data in the queryable buffer is persisted as Parquet files in object storage.
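A sketch of the no_sync trade-off on the write API, under the same assumptions as the earlier curl example: the server acknowledges after buffering rather than after the WAL flush, so the response is faster, but the last sub-second of data can be lost on a crash.

```bash
curl "http://localhost:8181/api/v3/write_lp?db=server_metrics&no_sync=true" \
  --data-raw 'cpu,host=serverA usage=12.5'
```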
PAUL DIX: 16:17
Now, by default, the queryable buffer organizes data into 10-minute blocks of time. And these 10-minute blocks of time are based on the data timestamp of the data being written in. It’s not based on ingest time or wall clock time of the server, right? So, most of the time, people are writing data with a timestamp roughly equivalent to now, right? So, you’d expect as you’re writing data in, each 10-minute block of time gets filled up. And gradually, as the WAL fills up those 10-minute blocks, what we want to do is once a 10-minute block becomes cold for writes, basically, that time has passed, we want to persist that data to object storage as Parquet files because Parquet is much more efficient in terms of memory usage than Arrow. And it also allows us to snapshot the old WAL files, which keeps the size of the WAL down, right? We want to keep the size of the WAL down so that we limit, essentially, the recovery time on restart. And also, WAL files are much, much larger than the underlying Parquet files. So, let’s look at the life cycle here of how data flows through the system. As I mentioned before, one of the goals of Core is that queries against recent data should not have to touch object storage. They should basically be entirely against data that is in RAM. So, to do that, basically, when the data in the queryable buffer gets snapshotted, it gets written to a Parquet file, and the data of that Parquet file also gets put into a Parquet cache, an in-RAM Parquet cache. And then once it’s in that cache, then the queryable buffer is updated so that it evicts that data that it just snapshotted.
PAUL DIX: 18:11
So, there’s no point at which you can send a query that won’t actually hit data in RAM. All right. Let’s get into querying the database. So, we have a couple of new endpoints. Now, Core supports Flight SQL, like our previous version 3 products. That’s how you query the database primarily. But Core and Enterprise also have HTTP endpoints that might be a little bit easier to use for people who aren’t requesting 10 million rows in a request or whatever. So, the two endpoints are one for querying SQL and one for querying InfluxQL. Both GET and POST are supported against these endpoints, right? GET is for queries that are short enough that they can be encoded in a URL. POST is for queries that are much, much longer, that can’t fit in a URL and need to be put into the body of a request. So, the parameters are q for the query, db for the database, params for parameterized SQL. So essentially, you can have key-value pairs that you put in for parameterized SQL queries. And then finally, the format, which is the response format of the data coming back. So, the response formats that it supports today are json, json lines, csv, pretty print, basically output for the command line or viewing in a browser, or parquet. The nice thing about parquet is if you are going to do a request that’s going to return millions of rows, parquet is going to be the most efficient way to do that, right? It’s going to be more compressed, and there are a variety of tools that you can use on the client side to read the Parquet data.
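A sketch of the two query endpoints, again assuming a local server on the default port; parameter names follow the transcript, and the exact format values may differ in current builds:

```bash
# Short query via GET, URL-encoded
curl "http://localhost:8181/api/v3/query_sql?db=server_metrics&q=show+tables&format=json"

# Longer query via POST, with the parameters in the request body
curl -X POST "http://localhost:8181/api/v3/query_influxql" \
  -H "Content-Type: application/json" \
  -d '{"db": "server_metrics", "q": "SELECT * FROM cpu WHERE time > now() - 1h", "format": "csv"}'
```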
PAUL DIX: 19:57
So, querying against the API using curl, you can see here we’re using query SQL. We specify the database. We specify the format as json lines. And the query we’re executing is just show tables, which shows the basic tables we have in this database. So here we are using the pretty print format. And you can see it puts it into a nice little table for viewing. And then we also have querying enabled through the CLI, right? So, you can run influxdb3 query, the database, and then the actual query that you want to execute. So there, we’re showing it in pretty print format. But again, the query CLI also supports the other format options. So, let’s look at the different kinds of tables that exist within the database server. So, if you do a show tables, you’ll see this output on the right. And you can see there are tables under iox, which was the codename of InfluxDB 3 when we first started developing it about four and a half years ago. Those are tables that are created by the users of the database, right? So, any new table that you create gets put in there. And one key element that I forgot to mention, if you’re not familiar with InfluxDB, is all the schema is created on write, right? So basically, databases are created, tables are created, columns are defined. All that stuff is done when you write data into the server. It’s done on the fly. There are command line options for creating databases and creating tables, but you don’t have to do that. You can literally just start throwing data at the database, and it will just create all that stuff for you. Oops. One sec.
PAUL DIX: 21:49
So, the next type is system tables. Those keep information about internal state of the database or state of files on object storage and stuff like that. So here we can see we have distinct caches, last value caches, the Parquet files, processing engine triggers and plugins, and queries, which is basically just an in-memory table that is a record of what queries have recently run. And then lastly, we have information schema tables, right? So those basically just give you information about the schema of the different user-created tables within the database. So, the tables, the views, the columns. So here is an example of getting, say, two records from the queries system table. So, there you can see some of the information you get back. Here, we can see getting the information schema. So, we want to see what columns are defined in the CPU table that we’ve created. All right. So that’s the basics on queries. We aren’t going to dive into actually querying data via SQL and stuff like that. We have a bunch of information in our getting started guide and in our documentation. This is less about SQL and more about the functionality specific to Core and Enterprise. So, the last value cache is a new feature that we created. I think this is something that’s been a long-running feature request for InfluxDB, which is basically just an in-RAM cache that’s optimized for returning the most recent values of an individual time series or a group of time series.
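A sketch of poking at those table types from the CLI, under the same assumptions about flag names as the earlier examples:

```bash
# Two records from the queries system table
influxdb3 query --database server_metrics "SELECT * FROM system.queries LIMIT 2"

# Columns defined on the user-created cpu table
influxdb3 query --database server_metrics \
  "SELECT column_name, data_type FROM information_schema.columns WHERE table_name = 'cpu'"
```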
PAUL DIX: 23:43
So, the last value cache has a hierarchical structure. You can just store the last values of individual series, but you can also arrange it in a hierarchy like we have in the example on the right, right? So, we have the last value cache keyed off building, then machine, then sensor in this example, right? So, you can quickly access all the data for a given machine or a given building. So here on the left, we have some example data in a table that we fed in. And on the right, we have the command. We’re using the CLI to create the last value cache. There’s also an HTTP API for this. So, we have a command where we create the last value cache, and we say, “The key columns for this one are going to be region, host, and app,” and the value column that we want to keep. We only care about storing whatever the last value of status was. So, in this case, we’re going to store the last value, but the last value cache will take an argument where you can say how many last values you want to keep, right? So, you could say, “I want the 100 most recent values for whatever the time series is.” And if you have enough RAM to fit that, that’s what the database will cache and will give you quick access to. This cache is populated as you write data into the database, right? So, it’s basically like a write-through cache. So, on startup, the cache will start empty. But as you write data into the database, the cache will get filled with whatever you put in. So, querying data from the last value cache. Here we’re using the CLI again to execute a query. As you can see, we’ve added essentially a table function to get data from it. So, it’s still SQL, but basically, we’ve added this new function called last_cache in place of where the table would normally go.
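A sketch of the create command being described; the subcommand and flag names are assumptions based on the alpha docs and may have changed, so treat this as illustrative:

```bash
influxdb3 create last_cache \
  --database server_metrics --table cpu \
  --key-columns region,host,app \
  --value-columns status \
  --count 1 \
  appStat   # cache name
```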
PAUL DIX: 25:38
So, we’re selecting the app and the status and time from last_cache('cpu', 'appStat'), right? So basically, we’re saying the last cache of the CPU table and the appStat cache name. Basically, you can create many last caches on an individual table depending on how you want the data organized and what you want cached. So, the distinct value cache is similar to the last value cache in the sense that it’s hierarchical, but it’s just about storing distinct values of an individual column or a tree of columns. So, the purpose of this is basically for building UI experiences where you have drop-downs or drill-down selectors in a tree where you want to load the data dynamically on the fly, and you want it to be very, very fast, right? This is designed to return values in under 30 milliseconds. The last value cache is designed to return values in under 10 milliseconds. So here, again, we have this hierarchy: building, machine, sensor. So, we can say, “What buildings do I have? What machines do I have in this building? Or what machines do I have all around?” So again, distinct value cache. We have some data on the left. We have the create command on the right. We see we’re going to create a distinct cache against the server metrics database, the CPU table. We specify the columns. And basically, this comma-separated list of columns is a hierarchy, and then we give it a name. In this case, it’s distCache. And querying it, similar to the last value cache, we have a special table function for pulling data from that. So, select app from the distinct cache table function.
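A sketch of querying both caches through their table functions, as described above:

```bash
# Most recent status per region/host/app from the last value cache
influxdb3 query --database server_metrics \
  "SELECT app, status, time FROM last_cache('cpu', 'appStat')"

# Distinct values from the distinct value cache
influxdb3 query --database server_metrics \
  "SELECT app FROM distinct_cache('cpu', 'distCache')"
```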
PAUL DIX: 27:33
And the nice thing is you can use table expressions and stuff like that to use whatever this query result is in another piece of a more complex query if you want. And same thing with the last value cache. So, with the last value cache, for instance, you could query from the last value cache, order those results by some scoring function, and then have a subquery from that that takes those results and queries, say, the top five individual time series from that result. All right. Let’s talk about the processing engine. So, as I mentioned, it’s an embedded Python virtual machine. And our goal here was to build on and improve upon the functionality of previous versions of the database, right? So, in InfluxDB 1, we had a feature called Continuous Queries. We had a separate piece of software called Kapacitor that had its own language for specifying processing against time series data. In version 2, we had something called Flux Tasks. And for all versions 1, 2, and 3, we have a data collector called Telegraf, right, that’ll work against any of them. So, the idea for the processing engine is it can fill any of these roles, right? The functionality it has can fill any of these roles, but all the plug-in logic is in Python. We chose Python because of the ubiquity of the language and because of the broad ecosystem that it has, right? There are tons of libraries to do a bunch of different things. And then lastly, Python is probably one of the languages that the current LLMs and AIs can write best, right? You can ask any LLM to help you write Python code, and it’s quite effective.
PAUL DIX: 29:27
So, this basically brings Python execution inside the database, and it’s useful for data collection, for transformation, for processing, for monitoring, and all those kinds of things. So, the Python plugins can be triggered off four different kinds of events, right? So, there’s a different plug-in type for each of these events. The first is WAL flush. As I mentioned before, once a second, by default, a WAL file gets written to object storage. You can create a plugin that will also receive the contents of that WAL file as basically just rows written to different databases and different tables. And then you can have some sort of arbitrary logic on top of that in your Python code. There’s Parquet persist, which is basically when it persists Parquet files, it can send a notification to a Python plug-in with the file metadata, which the plugin can then use to either get the file and do more processing after the fact or send that information elsewhere, like to a catalog like Iceberg or Delta Lake or anything like that. Scheduled task is basically similar to tasks in version 2 of InfluxDB and Telegraf itself, which is: on some schedule, run this plugin, right? And all the plugins have an API built in so that you can write data into the database, you can query from the database, and there are other kinds of APIs that we’ll be adding over time. And then the last one is basically a request-style plug-in. So, what this does is it binds the plug-in to an endpoint under /api/v3/engine/ and then whatever path you want it to bind to.
PAUL DIX: 31:25
And basically, when you make a request to that path, it will pass the request headers and the request body to the plugin along with the API into the database so that you can process the request, you can make writes into the database, you can query the database. And then you can send a response back to the user. And you can send a response in JSON, in text, in HTML, in whatever format you want. So, let’s look at an example here. Essentially, for the WAL trigger, first you would create a plug-in, which is basically just a file, a Python file, in a directory on the server, right? So, when you start up the server, you specify a plug-in directory, and that’s where it looks for these plugins. When you create a trigger, you specify what file the trigger maps to, which is the plug-in. And then whenever that trigger hits, in the case of WAL flush, the trigger is bound to a specific database, and the trigger can be hit on either a write to any table within that database or a write to a specific table within that database. So, data comes into the WAL buffer, and every second that gets flushed. And when that flush occurs, the data gets sent to the queryable buffer for query access. And it also gets sent to the processing engine for any WAL triggers that it maps to.
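A minimal sketch of what a WAL-flush plugin might look like, following the shape described above; the function signature and the helper methods on influxdb3_local are assumptions based on the alpha plugin API and may differ in current builds:

```python
# WAL-flush plugin: called once per WAL flush with the rows written in that flush.
def process_writes(influxdb3_local, table_batches, args=None):
    for batch in table_batches:
        table = batch["table_name"]
        rows = batch["rows"]
        # Log how much data arrived in this flush (info() assumed from the plugin API).
        influxdb3_local.info(f"WAL flush: {len(rows)} rows written to {table}")

        # Example transformation: record a running row count back into the database
        # via the write API the plugins expose (line protocol string assumed).
        influxdb3_local.write(f"write_counts,table={table} row_count={len(rows)}i")
```

A trigger would then be created against a database, optionally scoped to one table, pointing at this plugin file in the server’s plugin directory.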
PAUL DIX: 33:05
All right. I want to talk a little bit about the limitations of Core. So, as I mentioned, Core is optimized for queries against recent data. And it’s really optimized for queries in short time ranges, right? Minutes or a few hours. It is not meant to be a database for querying long ranges of time or large swaths of historical data. And the reason is basically because of how the data is organized. As I mentioned before, we put data into 10-minute blocks of time. Each of those 10-minute blocks gets persisted as a Parquet file. So basically, you can see here in the diagram on the right, you have 24 files per four-hour period, and basically, those files just stack up. So, when a query comes in, if we have all that data in the Parquet cache, that’s great. That’s by design, and we don’t have to go to object storage. If you have large historical data sets and stuff like that, we don’t keep that in the Parquet cache. That’s not what we’re trying to do. We keep it in object storage. So, if you were to query some historical period, it would have to do GET requests to object storage for each of those files that you’re hitting. So, by default, we have a limit on the number of Parquet files that we will execute against in a query plan. That limit is set to 432, which is basically 72 hours’ worth of time in these 10-minute increments. That limit is configurable by the user when you start up the server. You can set it lower. You can set it higher. The impact of raising that limit and querying against more and more files is, one, slower queries, right, for the queries that do execute against more files.
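A sketch only of raising that per-query file limit at startup; the flag name here is hypothetical, so check influxdb3 serve --help for the real option:

```bash
influxdb3 serve --node-id local01 --object-store file --data-dir ~/.influxdb3 \
  --query-file-limit 864   # hypothetical flag: ~144 hours of 10-minute files instead of the default 432 (72h)
```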
PAUL DIX: 35:05
And a query that hits more files uses up more RAM because it has to pull that data into RAM to execute against it. And we kept the limit down to 432, basically, to optimize for robustness and stability of the database. So, if you’re trying to go beyond that, you’d have to do some testing. Now, InfluxDB 3 Enterprise adds some capabilities that essentially lift that limit. And what that is, is essentially rewriting the data and reorganizing the data for historical queries and queries with longer ranges than, say, two hours. So, we’ll get into that and the features of Enterprise next. So, Enterprise builds off InfluxDB 3 Core. So, it includes all the functionality of Core, and it’s basically an in-place upgrade, right? You don’t have to move your data around. You basically just swap out the binary, start it up, and you’re good to go. So, it’s very easy to set up, and it’s very easy to test. As I mentioned, the optimization we’ve made in the single server configuration is that it will compact data so that any query over greater than, say, a two-hour block of time is probably going to be faster in Enterprise than it is in Core. And I’ll dig into the compaction in the next slide. And then the last bit is we’ve added capabilities to have high availability, read replicas, and to pull this compaction function out into a dedicated node. And all of this is designed so they can run on bare metal, in virtual machines, or in Kubernetes. So, this is what historical or querying against compacted data looks like. Basically, what you have on the right is a diagram that shows at the top the first generation of data, right?
PAUL DIX: 37:04
That’s the generation that either Core or Enterprise lands in these 10-minute blocks of time. As those blocks of time age out, they get older, they get compacted into larger blocks. Now, the default compaction scheme is 10 minutes to 20 minutes to 60 minutes to 4 hours to 24 hours to 5 days. So, as you go down the tree, you can see our 20-minute blocks get compacted into 1-hour blocks. In this example, the 1-hour blocks get compacted into 4-hour blocks. You can set by configuration what you want the compaction scheme to be. But as you can see, if you were going to query some historical period, you access far fewer files, right? Because to access, say, this period, which is the last four hours, we only need to access these four files that are highlighted, plus whatever’s in the buffer. So, the high availability functionality is designed around the object storage. And it’s designed around individual nodes being able to act as readers and writers, and writers being able to write to the object store at the same time. They each write to their own dedicated area, and then downstream readers can pick up the writes from multiple nodes. This makes it so you can create high availability, read replication, all sorts of things. So, in this case, we have two nodes acting as both reader and writer, and then some basic high-availability setup. Read replication is interesting because you can have a couple of nodes acting as writers, and then a virtually unlimited number of readers. I mean, you’re limited by essentially how many requests you can make to object storage.
PAUL DIX: 38:59
So, there will, in practice, be some sort of limit, but it will hopefully be fairly high. And the read replicas basically can serve as query servers. They can serve as processing engine servers. And these setups give you different possibilities for the kind of cluster setup that you’d want to create. So, we have some examples that we can work through really quickly. So, the simplest is basically a two-node setup. We have two nodes acting both as reader and writer, and one of them acting as a compactor. Now, one of the things about this setup is the one that’s acting as a compactor is obviously going to use more resources than the other node. But this kind of setup will give you a simple HA configuration where if one of the nodes goes down, you can send writes and queries to the other node. This would be preferable for use cases where you basically don’t have a very large workload and you’re really optimizing for keeping your footprint as small as possible to save on cost. A more recommended approach would be the three-node setup, right? So, you have two nodes acting both as reader and writer, and then one dedicated node acting as a compactor. The nice part about this is compaction doesn’t compete for resources on your reader and your writer. And the other thing is, your compactor will then be able to be scaled independently of what’s going on in your reader and writer. So, you can set your resources based on what you see in the environment. And then the last one is, basically, just extending that previous one with readers, right? So, in this case, you have a couple of nodes dedicated as your write nodes, and they handle all ingest and persisting data to object storage.
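A sketch of the three-node Enterprise setup being described, with two read/write nodes and a dedicated compactor sharing one bucket; the --mode and --cluster-id flags are assumptions about the Enterprise CLI and may differ:

```bash
influxdb3 serve --cluster-id prod --node-id rw1 --mode ingest,query --object-store s3 --bucket influx-prod
influxdb3 serve --cluster-id prod --node-id rw2 --mode ingest,query --object-store s3 --bucket influx-prod
influxdb3 serve --cluster-id prod --node-id c1  --mode compact      --object-store s3 --bucket influx-prod
```

A load balancer in front of rw1 and rw2 would then give the simple HA behavior described: if one node goes down, writes and queries go to the other.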
PAUL DIX: 40:56
And downstream, you have the compactor compacting files, and you have the reader reading files. And the readers could be query servers, they could be processing engine servers. That allows you to create an arbitrary set of query and processing engine servers. They’re isolated from each other. Now, one thing I should mention is that in this architecture, this multi-node architecture, it does mean that there is basically a time-to-be-readable delay between when you write data in and when that data is readable from a downstream replica, right? So, as I mentioned, the writer, you write data in, it flushes a WAL file once a second, and then it returns a response to the client that made the request. The readers poll object storage continuously via GET requests for the next file in the sequence. When they pick up that file, they parse it, put it into their queryable buffer, and that data will show up in queries. Now, the polling interval is configurable by the user, right? So, you can set it to a much higher frequency if you want the data to be readable faster, or a lower frequency if you want to save cost on GET requests. We’ve built the reader functionality so that LIST requests only happen on server startup to load an initial state, and then everything else is driven through GET requests to save on cost. So, the default configuration, I believe, is a polling interval of 250 milliseconds. In our testing, we’ve observed that the usual time to be readable, between a write going in and it being visible in a downstream reader, is anywhere from 200 to 600 milliseconds. It depends.
PAUL DIX: 42:51
All right. So, we’ll close out with a look at what’s next, and then we’ll have time for Q&A. So, over the next few months, Core is going to be focused on continuous improvements to the functionality, mostly inside the plug-in system and the plug-in API and that user experience, and then performance and, basically, ease of use around the API and the CLI. Enterprise will be focused on improving compaction and performance, implementing more fine-grained security controls, and then eventually, migrations from previous versions. So, our timeline here: right now we’re in Alpha, and during the Alpha period, we will probably make breaking changes to APIs and file formats. We are creating nightly builds. And really, these builds are designed for testing purposes only. This isn’t for production usage because we’re making, potentially, breaking changes without a migration path. When we do a breaking change, you will have to start over your testing from a fresh database, from a fresh install. Our goal is to move into the Beta period in early March. And at that point, the release APIs will be locked down, and any changes we make will have in-place migration tooling to be able to upgrade to the new build, and it should just work. Our focus during the Beta period is really going to be on performance, robustness, and support tooling and operations tooling. And then finally, we’re targeting General Availability in April. So, at that point, we’re saying Core is good to use for production, and Enterprise we will officially be offering as a commercial product.
PAUL DIX: 44:37
Now, we actually have three different offerings, not just Core and Enterprise. So, we have Core, which is obviously open source. It’s permissively licensed under MIT or Apache 2 at the user’s choosing. We have Enterprise on the right, which is basically our commercial product. But we’ve also decided to create a pricing tier for Enterprise that is free for at-home usage, right? So, if you’re using InfluxDB at home for your sensors or your networking equipment or whatever, we have a free forever plan. It will be rate limited. We don’t know what the rate limits will be yet, but our goal is to make it so that at-home usage is just free. You can just use the Enterprise database with the features that it has, around compaction primarily, so that you can do historical queries and get access to everything. And that is all we have for today. Pete?
PETER BARNETT: 45:39
Absolutely. Yeah. We’ll dive into it first off. Thanks, Paul, for going through that. We’ve got a lot of questions to get into here. If you want to continue to ask questions, please use the Q&A function for the Zoom webinar. That’ll be a lot easier than messaging us through the Zoom chat. And Paul, if you can just go to the next slide briefly as well. Also, please join us on Discord, where you can connect with us to continue the conversation and ask more questions, really, through the whole process. I’ll also be dropping the links here shortly in the chat. But Paul, let’s kick it off. We’ve got a lot of questions, again, like I’ve said. We’ll start maybe on Core, and then we’ll sort of progress from there. One of the first questions we’ve got, and this is kind of a continuing one that’s come in, is about the 72-hour limit on data queries in InfluxDB 3 Core. Will Core support keeping that data for a longer period and running those queries, six months, two years, etc., downstream?
PAUL DIX: 46:36
Yeah. So currently, Core doesn’t have any sort of retention policy enforcement. So, any data that you write into the database will be persisted to the disk or object store as Parquet files. And those files are just going to be there forever until you decide to delete them. When we made the announcement on January 13th about the functionality, we made an update yesterday that announced that you will now be able to write data for any historical time period, and you will also be able to query data for any historical period, right? So, if you have data from a year ago, you’ll be able to query it. However, the limitation is that the range of an individual query will be limited to that file limit configuration that you have set, right? And as I mentioned, the default for that is 432 files, which means the range of a historical query will be limited to 72 hours. You can increase that file limit. The other thing is, all the data stored in object storage, if you use object storage—all of that is kept in basically a directory structure that should be readable by third-party clients and downstream systems, right? So, if you wanted to do historical analysis inside of anything that can read data from object storage, you’d be able to do that.
PETER BARNETT: 47:56
Great. Thanks. So, one thing we hear a lot about here is cardinality. Questions about cardinality in V3: what’s the limit in the Core version versus maybe our more clustered versions and others that we currently have in commercial services?
PAUL DIX: 48:11
Yeah. So, in either version, there is no limit on cardinality. And the reason for that is the previous versions of InfluxDB created—it was basically two databases in one. It was a time series database, and it was also an inverted index. And that inverted index mapped metadata to the underlying time series. And the thing that killed you on cardinality was the maintenance of that inverted index. As you had more and more unique values appear in your tags and more and more unique series, that inverted index would have to be updated. It would get larger. It’s expensive to maintain. Version 3, both Core and Enterprise, doesn’t have that inverted index. It basically just stores the data in Parquet format, and we rely on the performance of the columnar query engine to be able to execute queries quickly. Now, in Enterprise, the compactor does have functionality where, as it compacts the data, it will also write an index file that maps unique values to what files they appear in, right? And what that does is it allows the query engine, when it sees a query, if one of the columns in the where clause appears in the index, it can reference the index to basically rule out as many Parquet files as possible from the query plan to execute faster. Now, I didn’t cover this in all the details. Basically, there is an API and a CLI in Enterprise to set which columns you want to have indexed. By default, it indexes the tag columns. So, whatever tag columns you have, it’s going to index those. But there are many cases where you have tag columns and you don’t want those indexed. So, you can override that behavior.
PAUL DIX: 50:04
So, cardinality will have an impact there, but it should be a lot less pronounced than what you see in version 1 and version 2. And the other thing is, the way data is organized is it creates that index for each block of time. And as I mentioned, the blocks of time are configurable by the user, and the default is 20 minutes, 1 hour, 4 hours, 24 hours, 5 days. So, a 5-day block of time is the largest index you would have. And you can change that, potentially, to just say, “I only want 1-day blocks of time,” and it would create that index per block. There’s another down-in-the-weeds implementation detail that makes the index much more efficient than it was in version 1 or 2, which is the index doesn’t actually store the string values that you’re indexing. It hashes them into u64s. So, from a space perspective, the index is much more efficient, so.
PETER BARNETT: 51:12
Great. Thank you. So, keeping on the differences between older versions thread, one of the ones we keep hearing about as well is Flux. Just what’s the view of Flux going forward and will it be incorporated as part of Core and Enterprise as well?
PAUL DIX: 51:27
So currently, we weren’t able to bring Flux forward, right? The reason we were able to bring InfluxQL forward is because it looks very similar to SQL. So, this version of the database is written in Rust; previous versions were written in Go. We were able to create in Rust a parser that would parse InfluxQL into a DataFusion logical plan. DataFusion is the query engine that we built the database around, right? It’s a SQL query engine. So, because of that, we were able to bring InfluxQL forward. Even that effort was still one person working essentially on their own for six months and then another person joining them; it was basically like a year-long effort to get that into the database. And continuous improvement after that, with testing based on people hitting bugs where it wasn’t totally bug-for-bug compatible with previous versions of InfluxQL. Flux is not just a query language. It’s a scripting engine and has libraries and all this other stuff. So given resource constraints and time constraints and stuff like that, we just couldn’t create a Rust implementation of Flux. And we aren’t able, at this time, to create a bridge between the two. We had tried to create a bridge between Flux and this new version of the database in our distributed products a few years ago. And based on actual testing with customers, we found that the performance of it was so poor that it basically broke the experience. So essentially, at this point, we don’t have a migration pathway for Flux. We aren’t able to bring it into version 3. That may change over time, but right now, we aren’t able to do it. We did try for a couple of years, and we just were not able to find a solution that made it, from a performance perspective, appealing to anybody, so.
PETER BARNETT: 53:28
Okay. Is the new V3 Write API capable of writing binary Parquet data?
PAUL DIX: 53:40
Sorry. I was muted. It is not. So, we do want to have a bulk load API at some point, and that bulk load API will probably support the Parquet format. But for streaming data, which is what the V3 Write API is designed for, it just doesn’t make sense. Parquet isn’t a format where you stream individual rows at a time. So, bulk load is something we want to do. I don’t imagine we will get to that during this Alpha or Beta period. That is probably a feature that we’ll add later in the year after we’ve gotten to the general release.
PETER BARNETT: 54:21
Okay. So, processing engine. For the embedded Python, is it plain Python, or will it allow Python packages that have compiled backends?
PAUL DIX: 54:30
So, it is Python with a virtual environment. We’re literally doing this work right now. You can import packages. The PR that’s up right now that we haven’t yet merged in will support both pip and uv as the kind of—people say “uv” or “U-V.” It’ll support either one of those to load packages in. So right now, the processing engine is only supported in the Docker builds. Although I think with this change, we’ll be able to support it with macOS builds and Linux builds. So basically, it will use the Python with virtual environment that, when you start up the server, you specify that information. And you’ll be able to use basically any Python package that you’re able to install into that virtual environment. And we also have an API where you can give it a requirements.txt, and it will load those for you on the server.
PETER BARNETT: 55:34
Okay. Keeping on that thread, we talked about what the processing engine, where it can sort of fill some of those goals. Speaking about Telegraf, is Telegraf now inside the DB or is it still compatible with an external Telegraf instance, sort of an output plugin?
PAUL DIX: 55:51
So, Telegraf is still compatible with InfluxDB 3. We’ve updated a couple of plugins to support version 3 specifically. So, you can continue using Telegraf. But you can also, with this plug-in system, use the database as Telegraf, right? You can configure basically this database to run at the edge without object storage, without even using the local disk if you don’t want it. And you can configure plugins to collect data and then write that data into a remote target. So, you could use Core or Enterprise as an edge data collector. The nice thing about using it as an edge data collector is it’s queryable, and it has a persistence layer if you want to use that. So, you could literally use it for store-and-forward capabilities and stuff like that. So, from the perspective of what the possible feature set is, what the feature space is, it’s larger using InfluxDB 3 Core and Enterprise as a data collector than what Telegraf supports. Telegraf’s written in Go; it doesn’t support dynamic loading of plugins, for instance. If you want to have a dynamic plug-in, what people end up doing is they end up shelling out to some arbitrary code. With Core and Enterprise, you can create a plug-in on the fly and load it. There was one key plug-in feature that I forgot to include, which is that you can load plugins from a directory on the server, but we also have a GitHub repo that will have plugins. We have example plugins there, and we’ll take plugins from the community. And you can load plugins from that GitHub repo on the fly just by referencing the path.
PAUL DIX: 57:48
We can probably share that somewhere. I mean, we’ll share it in the Discord. But yeah, the idea is you can load plugins dynamically into a running server without any sort of restart or anything like that, which is a feature we’ve long wanted for Telegraf and haven’t been able to because of the implementation language.
PETER BARNETT: 58:07
Okay. Great. I do know we are coming up on time here, so we’ll continue to answer some questions beyond the top of the hour because we still have a lot left. But for those who can’t stay, again, I would highly recommend you join the Discord. I’ve dropped the links in the chat, or the Slack community. But really, if you want to have that continuing conversation and questions, please feel free to join there. Paul, we’ve got several questions all talking about kind of the same idea, which is the 72-hour limit and just some questions around Core there. Really, first off, are you able to write timestamps older than that limit in general? And then, number two, if you’re looking for that longer-timeframe, extended querying capability, is there an open-source solution for that type of capability?
PAUL DIX: 58:54
Yeah. So, as I mentioned before, you can write data to any time period. So, you can write it to a week ago, a year ago, whatever, and the database will persist it. If you want longer-range time queries, there’s nothing that we have in the open source in version 3, right? To do that, you need to have a compactor, and we chose to keep the compactor in the commercial products so that we could build a business and continue selling a time series database. One alternative we had considered was putting the compactor into a source-available build and then having limitations on what you could do with it and stuff like that. And we decided that, one, it’d just be a little too complex to deal with, and two, we didn’t want to do a source-available build. We really prefer to have either open source that’s permissively licensed, and you can do whatever you want with it, up to and including competing with us, or a commercial product where it’s very clear that this is the thing we sell and this is the value it provides.
PETER BARNETT: 01:00:04
Right. And so again, I do want to also mention there is, though, a free version of Enterprise that you can certainly use. While not open source, if you’re using it for a home sort of use case, there is Enterprise at Home, which will be sort of a great option for those types of use cases. So, getting into a few other questions around storage itself, is it possible to keep data on local disk, like 1.x, instead of object storage?
PAUL DIX: 01:00:27
Yeah. So, the database uses an object store abstraction. And when you start up the database, you tell it what object store to use. And like I said, the valid ones are memory, local file system, or S3, Azure, or GCP, or an S3-compatible object store.
PETER BARNETT: 01:00:48
Deleting data in V1 and V2 was cumbersome at times, sort of something we’ve known and heard before. Is it any easier in V3, and what’s sort of the path forward there?
PAUL DIX: 01:01:01
So, Drop Database and Drop Table are currently supported. And those are basically soft deletes right now. So essentially, we rename the database and we rename the table, but we keep the files around in object storage so that you can turn around and create a new table with the exact same name or a new database with the exact same name. We will have row-level deletes, but that will only be in Enterprise because the way row-level deletes will work is you will submit a request to the server to say, “I want to delete data matching this predicate in this table.” And then the compactor will pick that up. And basically, what it will have to do is it will literally have to rewrite all the Parquet files that match the predicate that you provided, right? So, deletes are not cheap because it means you’re literally rewriting data. But the design goal is that deletes can happen without impacting performance. Enterprise is designed so that you can pull the compactor away from the nodes that are serving queries and serving writes. Now, the other side of that is deletes are eventually consistent, right? Deletes don’t happen immediately. If you issue a delete request, that gets picked up, and that happens on whatever timeline it takes for the compactor to go through and process all that data. So, deletes: really, the functionality is probably more designed for GDPR compliance than it is for frequently doing deletes. If you’re frequently deleting data, it’s not something that the server is really optimized for. Or if you’re doing that, I would say you have to think about the schema design for where you’re putting what data, right? You’d probably want to isolate stable data that you know you want to keep around from unstable data where you may be issuing deletes. And basically, in the unstable data tables or databases, you’d keep the amount of data you have in there limited, right?
PETER BARNETT: 01:03:03
Okay. Maybe we’ll answer just a couple more questions here. Grouping some of these together, clustering. Can you maybe just talk a little bit about what makes the clustering approach with Enterprise different from perhaps our InfluxDB 3 clustered product that’s currently available?
PAUL DIX: 01:03:21
Yeah. So, the Clustered product, the distributed version of the database that we first released in 2023, is designed only to run inside Kubernetes, right? It has a bunch of tooling that goes along with it, and a totally different architecture. It has ingesters; it has a shared catalog, which is a Postgres database, and a catalog service; it has a separate compactor and separate queriers; and it also has a bunch of code and tooling to scale the ingestion tier and the query tier, right? None of the operational tooling that Clustered has is part of Enterprise. Enterprise basically provides the lower-level building blocks to create a cluster, but the operational tooling and all that kind of stuff is, at this point, on the user to create, right? You'd have to set up your own load balancer to load balance writes or queries. You'd have to set up whatever logic you want to make sure that if a writer or a querier goes down, it gets spun back up. The nice part is that it's more flexible because it's not a turnkey, all-in-one solution. If you want to run it on bare metal, you can; if you want to run it inside Kubernetes, you can, but you're going to end up creating the Kubernetes deployments and everything else to make that work. So, one of the key architectural differences between Enterprise and the previous distributed version of the database is that the catalog in Enterprise is stored only in object storage, right? The only third-party software dependency Enterprise has is a network-addressable object store. Nothing else is required.
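As a sketch of those building blocks, running role-isolated Enterprise nodes against one shared bucket might look like this. The `--cluster-id` and `--mode` flags are assumed from the Enterprise alpha and may differ in your release, and load balancing in front of the writers and queriers is left to you, as Paul notes.

```bash
# All three nodes share the same object store bucket -- the cluster's only
# shared dependency. Flags assumed from the Enterprise alpha.

# Ingest-only node
influxdb3 serve --cluster-id prod --node-id writer01 --mode ingest \
  --object-store s3 --bucket shared-influxdb3-bucket

# Query-only node
influxdb3 serve --cluster-id prod --node-id querier01 --mode query \
  --object-store s3 --bucket shared-influxdb3-bucket

# Compaction node, isolated from ingest and query traffic
influxdb3 serve --cluster-id prod --node-id compactor01 --mode compact \
  --object-store s3 --bucket shared-influxdb3-bucket
```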
PAUL DIX: 01:05:17
The other thing is the distributed version doesn't have this concept of TTBR: when it accepts a write and returns a response, that data will show up in a query, whereas Enterprise has TTBR for downstream replicas. One of the nice things about the Enterprise architecture is that because the object store is the only thing these components share, you can have a truly isolated model where ingest, query, and compaction are completely isolated from each other. They do share the object store, so if you bring the object store down, then great sadness for everybody. But the previous distributed version, Clustered and Dedicated, also has a catalog service that everything shares. So, it's a set of trade-offs. Yeah.
PETER BARNETT: 01:06:11
Okay. So, two more questions here. One is around migrations. Can you talk about the migration from Core to Enterprise, as well as migrations you'd expect from V1 or V2 to a V3 instance? Any timelines as well?
PAUL DIX: 01:06:27
Yeah. So, since Core is designed to be optimized for recent data, we're not really building migration tooling for it. We may build migration tooling now that you can query any historical period, but we're not focused on building V1 and V2 migration tooling for Core, right? The idea with Core is you turn it on and mirror the writes from V1 and V2 into Core, and at some point in the future, you basically flip a switch and point all your queries at this new version, which will have the window of time you care about. For Enterprise, we will build migration tooling to go from V1 and V2 into Enterprise, right? That will be a tool that you run; it'll convert all the old TSM data into Parquet files that get put into object storage for Enterprise to read. Now, going from Core to Enterprise is not a migration at all, right? They share the same file formats and the same structure. All you have to do is replace the binary or the Docker image, start it with the same arguments you started the Core server with, and it will load the data from object storage and come up immediately. The only thing that will take a bit of time is compacting historical data, right? So, if you have a large block of historical data that you'd written in Core and you want it compacted, that's going to take time, because it has to rewrite all that data and reorganize it to optimize it for queries.
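The "mirror the writes, then flip the switch" idea can be as simple as dual-writing the same line protocol to both servers. Hosts, ports, and database names below are placeholders; Core's v3 write endpoint is shown, and Core also accepts the v1/v2 write APIs for compatibility.

```bash
# One point of line protocol, written to both the old and new instances
LINE='cpu,host=server01 usage=0.64 1735689600000000000'

# Existing 1.x instance (standard v1 write API)
curl -X POST "http://old-influx:8086/write?db=mydb" --data-binary "$LINE"

# New InfluxDB 3 Core instance (v3 write endpoint; 8181 is the alpha default port)
curl -X POST "http://new-influx:8181/api/v3/write_lp?db=mydb" --data-binary "$LINE"
```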
PETER BARNETT: 01:07:59
Okay. Great. And then lastly, we'll stop here at the last question as we're approaching ten minutes after. Talking briefly about larger imports of data, which we've had a few questions on: what's the best process for importing a lot of historic data, millions of data points all at once? How would you recommend going through that?
PAUL DIX: 01:08:24
Yeah. So, if you're not using the migration tooling that we provide, unfortunately, the way you have to do that is to write it all in through the front door with line protocol, right? So, it's going to take some time. As I mentioned, we will have bulk load APIs, and in fact, we'll probably build those APIs in the course of building the migration tooling. At that point, what we'll try to do is make some sort of command-line interface to that migration tooling available, so that you could load data from some other source, convert it into the Parquet files that Enterprise expects, and put them into object storage. That's going to take a bit longer, right? We're probably not going to get to it during this Alpha or Beta period, so later this year.
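Until those bulk load APIs exist, a front-door import comes down to batching line protocol writes. Here is a minimal sketch, assuming a large line protocol export and the alpha's v3 write endpoint; the file name, batch size, host, and database are placeholders.

```bash
# Split a large line protocol export into 10,000-line batches and POST each
# one; failed batches are reported so they can be retried.
split -l 10000 history.lp batch_
for f in batch_*; do
  curl -sf -X POST "http://localhost:8181/api/v3/write_lp?db=mydb" \
    --data-binary @"$f" || echo "failed: $f"
done
```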
PETER BARNETT: 01:09:25
Okay. Great. Thank you, Paul. We still have almost two dozen questions between the Q&A and the chat, and we're not going to have time to get to all of them, unfortunately. I'd highly recommend you join us on Discord. There are a lot of questions around pricing, performance metrics, and differences between V1, V2, and V3. If you've been in Discord, you'll see we're continuing to answer questions there all the time, and we can definitely keep doing that. So, if you have these questions, please bring them to the Discord channel. You can also follow us on Reddit or go to Slack as well. We've definitely enjoyed this opportunity to answer all these questions and kick this off. The last thing I'd say is that this is just an Alpha, and we are really listening for your feedback. We've already made some big adjustments, changes, and updates based on feedback from the community, and we want to continue that process during this Alpha and Beta period as we approach GA. With that, we'll go ahead and close here. I certainly appreciate you all taking the time to tune in and give us all these great questions, and I can't wait to see what you build over the coming months. Thank you all.
PAUL DIX: 01:10:39
Thanks, Pete.
[/et_pb_toggle]

Paul Dix
Founder and Chief Technology Officer, InfluxData
Paul is the creator of InfluxDB. He has helped build software for startups, large companies, and organizations like Microsoft, Google, McAfee, Thomson Reuters, and Air Force Space Command. He is the series editor for Addison-Wesley's Data & Analytics book and video series. In 2010, Paul wrote the book Service-Oriented Design with Ruby and Rails for Addison-Wesley. In 2009, he started the NYC Machine Learning Meetup, which now has over 13,000 members. Paul holds a degree in computer science from Columbia University.

Peter Barnett
Senior Product Manager, InfluxData
Peter Barnett is a Senior Product Manager at InfluxData, where he guides the development of new InfluxDB solutions. With more than eight years of experience in software engineering and product management, his expertise lies in data analytics and time series products. Peter previously was the Director of Product at a Series B startup and worked as a software engineer for a Fortune 500 organization. With multiple degrees in technology and business, Peter is passionate about solving complex problems and delivering value through innovative, customized solutions.