Meet the Experts: InfluxDB 3.0 Product Roadmap and Update
Session date: Mar 12, 2024 08:00am (Pacific Time)
InfluxDB is the purpose-built time series database. In the last year, InfluxData launched an entirely new InfluxDB 3.0 product line—InfluxDB Cloud Dedicated, InfluxDB Cloud Serverless, and InfluxDB Clustered. InfluxDB 3.0 is built in Rust and sits on top of Apache Arrow and DataFusion. Apache Parquet is an open source columnar data file format chosen as the persistent format.
With InfluxDB 3.0, developers gain 45x better write throughput, 90% reduction in storage costs, 100x faster queries for high cardinality data, and 45x faster queries for recent data.
Whether you’re an InfluxDB pro or just learning about the time series database, this webinar will provide developers with a database overview, new InfluxDB 3.0 features, and tips and tricks.
In this webinar, Gary Fowler, VP of Products, will dive into:
- An InfluxDB 3.0 product line overview – learn which version is best for your needs
- Dive into key features and improvements – unlimited cardinality at scale
- Product roadmap for 2024 – what’s coming up in the next 12-18 months
This one-hour webinar will feature a product update and live Q&A time.
Watch the Webinar
Watch the webinar “Meet the Experts: InfluxDB 3.0 Product Roadmap and Update” by filling out the form and clicking on the Watch Webinar button on the right. This will open the recording.
[et_pb_toggle _builder_version=”3.17.6” title=”Transcript” title_font_size=”26” border_width_all=”0px” border_width_bottom=”1px” module_class=”transcript-toggle” closed_toggle_background_color=”rgba(255,255,255,0)”]
Here is an unedited transcript of the webinar “Meet the Experts: InfluxDB 3.0 Product Roadmap and Update.” This is provided for those who prefer reading to watching the webinar. Please note that the transcript is raw. We apologize for any transcribing errors. Speakers:
- Caitlin Croft: Director of Marketing, InfluxData
- Gary Fowler: VP of Products, InfluxData
CAITLIN CROFT: 00:00
We’ll get started. It looks like we have a bunch of people. So, once again, hello, everyone, and welcome to today’s webinar. Today is a Meet the Expert series. We have Gary Fowler, who’s our VP of Product joining us, and he will be providing a product update and a quick little roadmap. Without further ado, I’m going to hand things off to Gary.
GARY FOWLER: 00:27
All right, Caitlin, thank you. And thanks, everyone, for taking some time with us today. What I’m showing here is what we’re going to talk about. First, I’m going to take you through a high-level 2024 product roadmap. Then we’re going to dive a little bit deeper into our plans for our first InfluxDB 3-based open source flavor and then some upgrades around it that we’re right now tentatively calling Community Edge and Pro. Then we’ll talk about the InfluxDB 3-based products we released last year and what we have planned for them coming up this year. And then we’ll talk about some other things that are planned but not quite on the roadmap. We don’t know exactly when they’ll land yet. We’ll talk about those. And then finally, we’ll open it up for Q&A. All right. So, before I dive into the roadmap, I have to issue all the standard disclaimers that come with software development roadmaps. This is always subject to change. This is our forecast of today, but there are no guarantees. Those of you that are also providing software for your customers, you know what I’m talking about there.
GARY FOWLER: 01:31
All right. So, here’s a high-level view of the roadmaps. We have this split up into three major areas. First are the new products that we hope to deliver, including that first OSS version on 3.0 that I talked about. Then we will also talk about— our second area is in performance and manageability. We’ve met some, but not all, of our performance goals with InfluxDB 3, so we’re continuing to work on it. We’ve also recognized the need to provide some additional observability and manageability tools to our customers, which is why you see service dashboards for InfluxDB Cloud Dedicated on there. And then we want to provide a basic administrative UI, right? Our previous products had a UI, but we haven’t had one yet with our InfluxDB 3 products, and we want to do that. And then thirdly, there’s some important feature development that we need to do to continue moving InfluxDB 3 forward, including some data eviction features, some new version 3 APIs, and then a big thing that we’re working on is Apache Iceberg integration support, which we are very excited about.
GARY FOWLER: 02:41
All right, so let’s first talk about this family of products that we’re calling Edge, Community, and Pro. These are tentative names. I don’t know that these will be the final names that we’ll come out with at launch, but it’s what we’re calling them right now. But these are the brand new offerings this year. And so really, Edge is our InfluxDB 3-based open source software. This will be for customers that just want to use our open source software. InfluxDB Edge will be our version 3 of that. Shortly after releasing Edge, we’re also going to release a premium version. So, this will have a different license or will be closed source. The free version will be called InfluxDB Community. And then we will have a— tentatively, again, we may change the names. And then there will be an upgrade to a commercial version, an easy upgrade, a license-based upgrade from Community to something tentatively called Pro.
GARY FOWLER: 03:42
So, one of the things that we’re trying to do with version 3 of the open source software is make it much easier to upgrade to a commercial version. We had customers in the past that loved our open source and were very interested in moving to a commercial— moving to a commercial version, depending on which one it was. Sometimes it required a data migration, which many didn’t want to do because it took time and energy. So, we wanted to make this much easier. So, the upgrade from Edge to Community is simply a software upgrade. The data stays the same, no data migration. And then the upgrade from Community to Pro would simply be applying a license. So, we think it’s going to be much easier. If you want to stay on open source, that is great. But if you want to move to a free version, Community, that will be really easy, too. And then if you want to upgrade that to a commercial version that has additional security and some scalability, you can do that as well. So, there will be a few additional features in Community from Edge to give our customers a reason to move to the free version. The free version will be a registered version, and that’s what will allow us to easily be able to apply a license if you want to go to the commercial version. Commercial version, we don’t know exactly the complete feature set yet, but we know we’re going to be adding scaling and security features to it, just like we had with open source software when you moved to one of our commercial products in previous versions.
GARY FOWLER: 05:16
All right. So, now let’s talk about the commercial products we released last year and what we have planned for them this year. And just a reminder of why we built InfluxDB 3 before I dive right into it. The reason that we built InfluxDB 3 was really based on feedback we heard from customers. While some of you liked our proprietary but powerful query and scripting language that we called Flux, there were many that really wanted to use something they were more familiar with and wanted us to support SQL. Others preferred that we bring forward our InfluxQL software. So, we heard that message. One of the main objectives of InfluxDB 3 is to provide SQL support. In addition, some of our customers with very large data sets were really pushing the cardinality limits of our previous versions. So, we wanted to give them additional options, but our previous architecture made it difficult to continue to raise the cardinality limits, and we kind of hit a point where, hey, that architecture didn’t allow us to continue to raise it to get the more unlimited type of cardinality that we wanted. And then we had customers that really wanted to use third-party visualization and other tools beyond what was already supported. So, we had a plug-in for Grafana for visualization and alerting and things like that. But customers wanted to use some other products like Tableau and Apache Superset and really wanted to work with anything that could support JDBC, and that’s what we set out to do with 3.0.
GARY FOWLER: 06:56
Both Influx and our customers wanted to reduce the cost of storage, both for our managed product and for our customers that are self-managed, so that they could reduce their storage cost. And certainly, even in our managed cloud service, so we could pass on some of that cost savings in our pricing to those customers. So, cost reduction was also a big driver for InfluxDB 3. And of course, everybody always wants ingestion and query performance to be the best it can possibly be. So, you see here at the bottom, performance was a big factor in why we developed 3.0. So, this is the product portfolio that we have for 3.0. We just talked about the open source software on the left. But our commercial products, early in 2023, we released something called InfluxDB Cloud Serverless. This was basically our multi-tenant solution and where we offer free and pay-as-you-go flavors of our cloud software. Later in the year, we released InfluxDB Cloud Dedicated, about mid-year. This is our single-tenant dedicated cloud solution. And then towards the end of the year, we released something we called InfluxDB Clustered, which is a self-managed version for on-prem or private cloud. We’ll talk more about this one in a few slides ahead.
GARY FOWLER: 08:22
So, here are some of the early positive results from InfluxDB 3. We’ve seen significantly better write throughput and zero time to be readable. The zero TTBR has been really important for some of our customers. That means that the data that gets ingested is immediately queryable. In previous versions, we delivered data via Kafka Streams, at least in our Cloud 2 product, which meant there could be a small delay from when the data was written to when it went off of the stream and into the database. So, there’d be a few seconds of delay. Now, there is no delay. That data is queryable immediately. So, it has really enabled kind of that real-time analytics that some of our customers were looking for. The 3.0, for any of you that have looked at any of our material before, you know it’s based on the Apache FDAP stack. So, Apache Arrow and Parquet. The storage is Parquet, and this is a columnar data store. And so, the improved compression we get from the columnar data store that we use in InfluxDB 3, along with the use of object storage for historical data in 3.0, has really meant that we’ve been able to have a drastic reduction in storage cost. For certain types of queries, we see improved performance. Especially with high cardinality data and leading edge or recent data, we see a big performance improvement in queries. I’ll talk more about that in a second because in some areas we’ve not seen improvement and we’re working on them, and that’s part of our roadmap for 2024.
GARY FOWLER: 10:07
One last thing I would want to point out before I move on to the next slide, though, is the total cost of ownership benefits that InfluxDB 3.0 provides. So, in past versions, customers that were using our open source software sometimes really had to make a trade-off on, “Okay. We’re not purchasing any software right now. We’re using free software. We have to pay for storage. We’re self-managing that. If we wanted to move to—” if they wanted to move to a commercial version, they have to say, “Okay. We’re paying a certain amount for our storage already in terms of disk or cloud storage. Now we are also going to be paying for a commercial license from Influx.” And in addition to that, sometimes it involved a data migration. So, that cost would be fairly high for them. What InfluxDB 3 provides is based on this improved compression and the object storage that we’re using, that allows our customers— even in a self-managed where they’re putting software in their own cloud or on their own premise, what it allows is that reduction in storage costs sometimes makes up for the difference in what the commercial license would cost. So, somebody that is using an older open source version and looking at their total cost of ownership of that open source version versus an InfluxDB 3 commercial version, even if they’re paying for that additional license, sometimes their total cost of ownership is actually less because of how much less they’re paying for storage. And so that is one of the big incentives that we’ll see people use as they decide whether to move to InfluxDB 3 or not, either with the open source or the commercial product.
GARY FOWLER: 12:02
All right. So, that’s kind of a recap on what we’ve been doing or what we did last year with our commercial products in 2023. So, you may say, “Okay. That sounds great, but what are you doing this year?” The one big area of emphasis is in the area of performance. While the new architecture of InfluxDB 3 allowed us to make some fairly significant leaps in some areas of the performance we just talked about, there’s still a few areas where we still have some work to do. Certain types of queries, including historical single-series queries, are taking longer to return than in the previous versions. So, we’re working to try to narrow these gaps, and this is going to be a significant focus throughout the year. So, those of you that went on the 3.0 journey with us, you’ve seen this. You’ve seen performance be great in some areas. In some areas, you would like it to be improved. And we’ve heard that, and we’re working dramatically, frantically, to try to get that— to close those gaps and get better performance. In addition, especially with our Cloud Dedicated product, we haven’t yet provided the tooling our customers need to monitor the service. So, we’re working on service dashboards and some other things to improve visibility and manageability of the platform. You’re going to start to see those things land, hopefully even within this quarter or early in Q2, so we’re excited about that.
GARY FOWLER: 13:28
We’re also doing some key feature development. So, we have some interesting new features planned in 2024, from a new V3 management API for token and database management to a new V3 write-line protocol API. So, you notice I said a write-line protocol API and not just a new V3 write API. The reason I say that is the new architecture of 3.0 is going to eventually allow us to support, potentially, some other formats besides line protocol. It is Arrow-based, so at some point, we’d like to be able to allow ingestion in Arrow format and possibly some others. But we’re going to start with the line protocol API because that’s what everyone is using. Keep in mind, you can still use earlier versions of the API with 3.0. So, we made it to where the V1 and V2 version of the write and query APIs work with 3.0. But we are coming out with version 3 versions of—V3 versions of those APIs as well.
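As a rough illustration of the line protocol format referred to above, the following Python sketch builds a single line protocol record by hand. The helper names (`to_line_protocol`, `format_field`) are hypothetical and are not part of any InfluxDB client library; the sketch only assumes the publicly documented text layout of measurement, tags, fields, and a nanosecond timestamp, and it does not call any write API.

```python
def _escape(text, chars):
    # Backslash-escape each listed special character.
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def format_field(value):
    # Booleans must be checked before integers, since bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integer fields carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    # Strings are double-quoted; backslashes and quotes inside are escaped.
    return '"' + _escape(str(value), '\\"') + '"'


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    if not fields:
        raise ValueError("at least one field is required")
    head = _escape(measurement, ", ")
    for key in sorted(tags):  # sorted tag keys keep the output deterministic
        head += "," + _escape(key, ",= ") + "=" + _escape(str(tags[key]), ",= ")
    body = ",".join(
        _escape(key, ",= ") + "=" + format_field(value)
        for key, value in fields.items()
    )
    return f"{head} {body} {timestamp_ns}"


line = to_line_protocol(
    "cpu",
    {"host": "server 01", "region": "us-west"},
    {"usage": 0.64, "cores": 8},
    1710255600000000000,
)
print(line)
```

A record like this is what a line protocol write endpoint accepts as its request body; in practice an official client library would normally handle the formatting and batching.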
GARY FOWLER: 14:36
I already talked a little bit about Iceberg, but we’re excited about this. We have one customer that is using this right now, kind of the first implementation we have of this, and we’re working to make it more of a general feature that we can offer to everyone. But what it’s going to do is it’s going to allow customers using data lake and data warehouse products—for instance, it could be anything that supports Iceberg, but that’s where we’ve seen the demand so far, it’s going to allow them to query in InfluxDB directly from their query interfaces. So, we have a customer, for instance, that is a big Snowflake customer, and they wanted to be able to join tables from InfluxDB to some of the other data that they had in Snowflake, all using the Snowflake query tools and not have to make a copy or replicate their data. So, this has been something that we’ve seen customers get really excited about when we’ve talked about it. And we’re working on that right now, and we hope to land that sometime this year.
GARY FOWLER: 15:42
And the other big thing that we’re working on—and even though there’s a little bit of philosophy difference here, in that some people will say, “Hey, time series, by nature, it’s historical data. You can’t change history. There should be no reason to delete it,” we know that, hey, mistakes happen, and we do want to have some data eviction features, and so we’re working on those. The first data eviction features to land will be a drop database and a drop table feature. The drop database already does exist. You can delete a database today using the influxctl utility, but you can’t reuse the name, so a little usability issue that we need to address. So, hopefully, in [inaudible] you’ll be able to drop the database and then immediately create a new database with the same name, and then they’ll be able to drop the table. I’ll talk about row deletion in a few minutes. Another thing that I skipped over here that we are working on is attribute-based access control. So, we hope to see some features related to that land this year as well. All right. So, I mentioned InfluxDB Clustered before. This is our self-managed version of InfluxDB 3. I think when we initially launched InfluxDB Clustered, our customers and maybe even our employees assumed this was the next generation of our Enterprise product, but not really sure that is the case. Some customers that may have used Enterprise in the past that are interested in the benefits of InfluxDB 3 may choose to use our Community Pro edition in the future, or some will indeed look at InfluxDB Clustered.
GARY FOWLER: 17:34
Clustered is Kubernetes-based, which makes it a little bit more complex to deploy, but it makes it a lot easier to scale up and down. So, it’s really a good choice for customers with varying workloads, needs to scale up, or growing workloads that they need to scale up. We had a big demand for Clustered, and when we released it, almost immediately, we ended up throttling the deployment as we were learning how to help our customers navigate through the deployment and dependencies. One of the things we identified fairly early on was that we needed to offer Helm Chart support. Not all of our customers had Kubernetes experience, and so just deploying Kubernetes was difficult for them, and so we want to make this easier. So, we’re adding Helm Chart support, we’re doing some other things to make the deployment easier, and then we’re relaunching and making it available to all. Now, that doesn’t mean that you can’t get it before then. The queue is not long. So, if you are interested in Clustered, contact us. We can get it to you. But we haven’t made it kind of— we haven’t reduced that throttle. We haven’t made it a GA product just yet because we want to be able to help our customers get through the deployment, and there’s only so many of them that we can do at once. So, good news for us is there was a lot of demand for Clustered. The bad news was, for our customers, that we throttled a little bit, but fairly early in Q2 or Q3, we’re going to open that up to everyone.
GARY FOWLER: 19:14
All right. So, the last section I have here is some items that we have planned, but we don’t yet have a high-level estimate on when they might land. Doesn’t mean that there’s not work going on, that we’re not going to do them. I can’t forecast them for when they will land yet. So, this is by no means a comprehensive list. Those of you that work in software development, which is many of you, you know how long and large your backlogs can be. We have a very large one. My engineering team gets more asks from the product team, from my team here, than they can handle. We have a very good engineering team, but they can only do so much at once. We do have a large backlog, but I wanted to pick out a few things that we get asked about quite a bit. First one is row deletion. I mentioned that we’re adding drop database and drop table support, and we hope those will come fairly soon. Row deletion is a little bit harder project for us, and we don’t have a projection yet on exactly when it will land. It could be in 2024, but it might be in 2025 before it gets completed. The way that we store data in InfluxDB 3 makes row deletion a little bit more complex because we’re putting the historical data in object store. And essentially, if data is deleted out of that historical data, we need to go and rebuild those historical data files. So, it just will take us a little bit longer, but we do have it planned.
GARY FOWLER: 20:42
Another one that we are asked about is a replacement for the task and Flux capability that existed in InfluxDB version 2. And as our CTO and co-founder, Paul Dix, has talked about in the past, we do plan to deliver a most likely Python-based script editor. Might also support JavaScript. We’re not sure. Definitely, Python. Script processor that you can run at a database level. This means no ingress of the data. You can run it at the database level, and it can be used for transformation, downsampling, etc. I don’t yet have a projection for when this will land. Just prior to this meeting, there was a really nice internal Slack conversation that talked about some progress on this. So, I’m excited for it, but I don’t know exactly when it will come yet. And then speaking of downsampling, it’s one of the things that you can do with the scripting engine. We also do have plans for a built-in downsampling feature, making it much easier to do downsampling without having to learn Flux or use Flux to be able to do it. I don’t have a forecast for when that will land, but it is something that we have been actively thinking about. And with that, I’m going to turn it back over to Caitlin so she can start the Q&A session.
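To make the downsampling idea concrete, here is a minimal Python sketch of the kind of transformation such a script might perform: averaging raw points into fixed time windows. It is only an illustration. The function name `downsample_mean` and the `(timestamp_seconds, value)` input shape are assumptions made for this example, and nothing here reflects the API of the planned scripting engine or built-in downsampling feature, which had not been published at the time of the webinar.

```python
from collections import defaultdict
from statistics import mean


def downsample_mean(points, window_seconds):
    """Average (timestamp_seconds, value) points into fixed-width windows."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its window.
        buckets[ts - ts % window_seconds].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


raw = [(0, 1.0), (30, 3.0), (60, 10.0), (90, 20.0), (125, 7.0)]
downsampled = downsample_mean(raw, 60)
print(downsampled)
```

Five raw points become three one-minute averages, which is the storage and query-cost trade-off downsampling is meant to provide.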
CAITLIN CROFT: 22:05
Perfect.
GARY FOWLER: 22:05
Thanks for listening.
CAITLIN CROFT: 22:07
Thank you, Gary, and thank you, everyone. There are a ton of questions, so let’s see how many of them we can get through. First off, there’s a lot of questions, Gary, around Flux and kind of data querying. I’m just wondering if you can touch a little bit more on why the direction with Flux and what customers and community members can do to query their data now.
GARY FOWLER: 22:36
Yeah. So, Flux was something we had developed for InfluxDB version 2. We’re proud of it. It was a proprietary but powerful scripting language that allowed you to do query and other things. And we do still plan on supporting Flux for a long time on our InfluxDB 2 platform. No immediate plans for Flux to go away or anything like that. We’re just not advancing it right now. We’re basically turning Flux over for community development. So, if there are things that you would like to do with Flux to improve Flux, you have that opportunity. We’ll take community submissions, PRs for Flux. We’re not doing active development on it. And the reason that we stopped the active development is we had some subset of customers, but it was fairly small, that really loved Flux and adopted it. And we feel bad for that set of customers that we are not continuing to do development on it. But we had a much broader set of customers that just said, “Hey, Flux is difficult for me to learn. I don’t want to learn a new language just to use your database. Why don’t you use something that everybody else is using?” And where we typically would see it be a problem is when it was the second generation of folks. So, someone that brought in InfluxDB 2, the engineering team and our customers that would bring in InfluxDB 2, they would get excited about Flux. They would use it. They would get their product to where it was almost in maintenance mode. And then they would leave the company and a new person would come in and say, “I don’t know this Flux. I don’t know what we’re doing. Why are we using this?” And they really were driving us to move to SQL. So, that’s kind of what drove that decision. But we recognize that it’s still out there and people are going to be using it for a long time and that’s okay, too.
CAITLIN CROFT: 24:42
Yeah. And I’ll also just say, and shamelessly plug InfluxDB University again, there’s a couple of courses that we’ve recently released that focus on SQL querying and other querying capabilities. So, in the follow-up email that we always send out after these webinars, we’ll make sure to include links to those courses specifically, so yeah. So, there’s still ways to do it, and like Gary said, Flux is still out there. So, hopefully, that helps everyone who’s been asking about it. Let’s see. My team is currently using AWS as a data lake, and data analytics is used to run long queries. In past InfluxDB events, we’ve been told you guys are looking into using InfluxDB as a data lake and having capabilities to run long queries.
GARY FOWLER: 25:46
I’m not 100% sure I understand the question. Is it just that the queries took a long time to run before?
CAITLIN CROFT: 25:55
It sounds like it. Yeah. I’m not entirely sure. I’m trying to see. So, they just want to— yes. They’re asking about cases where the queries can take a very long time to finish.
GARY FOWLER: 26:16
Gotcha. So, what we’ve seen so far with InfluxDB 3 is some types of queries, especially queries that cover multiple measurements, for instance— or not multiple measurements, but same measurement, same table, but they cover a number of sensors or different things that you’re tracking, those end up being really fast, especially when it’s the most recent data. So, if you’re looking for something that is, “Hey, I want to see something that is a number of sensors over the last 10 minutes,” or something like that, those run really fast. The historical queries, it really depends on the type of historical queries. In some cases, we’re seeing the historical queries be faster with InfluxDB 3, but other types, especially when you’re zeroing down to a single series for a long period of time, are slower than with InfluxDB 1 or 2. So, it really depends.
CAITLIN CROFT: 27:20
Okay. Perfect. Let’s see. Someone says, “We are evaluating InfluxDB for large time series data. What would be a good way to establish a benchmark? Should we look at open source or Cloud and Clustered implementation?”
GARY FOWLER: 27:38
So, we don’t have our InfluxDB 3 open source version out. We’re looking at that. And I saw one question passed by for timing on that. We’re looking around the end of Q3 for that. So, right now, to do it, you would want to use one of our Cloud products or Clustered on InfluxDB 3 to do that benchmarking. And then towards the end of September, around that timeframe, then hopefully, you’ll be able to use the open source version.
CAITLIN CROFT: 28:13
Will InfluxDB 3.0 open source and Community support S3 for storage?
GARY FOWLER: 28:20
Yes. So, all of the Influx— excuse me, frog in my throat. All of the InfluxDB 3 software is designed to be able to store that historical data. So, yes, data gets stored in Parquet files, and then we put those Parquet files in that object storage. So, S3 is what we are using right now for our Cloud Dedicated and Cloud Serverless products because those are AWS-based. And so, you should be able to use S3 or equivalent object storage.
CAITLIN CROFT: 28:58
Someone’s asking about embedded processing. Will it also support Rust?
GARY FOWLER: 29:06
Good question. I don’t know the answer to that yet. We’re not far enough along. We know that we want to support Python. The software itself is written in Rust, so that would be an interesting one for us to do, although we probably will see more demand for JavaScript than we would Rust at this point. But Rust would probably be an easier one since we are native to Rust, but I don’t know yet.
CAITLIN CROFT: 29:33
Cool. Let’s see. Any tips and tricks on migrating or upgrading from InfluxDB 1.X to 3.0?
GARY FOWLER: 29:48
Yeah. So, certainly, you have the ability to use the query APIs to query data and then the write APIs to write them into InfluxDB 3. We are planning some migration tooling to help you with that. Our first iteration of that has been for Cloud 2, to move from Cloud 2 to InfluxDB 3. And we’re getting fairly close on having that tooling available to move from Cloud 2 to InfluxDB 3. Hopefully, towards the end of this quarter or early next quarter, we’ll have that around and available. Then we are going to start working on the version 1 tooling, and we’re projecting that out towards the end of the year.
CAITLIN CROFT: 30:38
Perfect. The roadmap in GitHub states, persist event stream, subscribe to the Parquet file, persist events useful for downstream clients to pick up files from object store. Is this still planned? Would it be a way to trigger the event when data is received or maybe not yet persisted?
GARY FOWLER: 31:02
I don’t know the answer to that. I can talk to my engineering team about that. I don’t know the— I don’t know if that will be the case or not. I do recognize that we have customers that want to be able to trigger things based on something that happens, and that’s one of the reasons for the embedded scripting. But the triggers, I don’t know if they will simply be query-based, that you’ll have a persistent query that looks for certain data, or if there will be some other hook or trigger that will allow you to trigger those events. I don’t know that yet.
CAITLIN CROFT: 31:46
Is there any multi-tenant support on the roadmap for any InfluxDB 3.0 products?
GARY FOWLER: 31:54
So, the InfluxDB Cloud Serverless is our multi-tenant product right now. So, it is available. InfluxDB Cloud Serverless is essentially our upgrade of Cloud 2, basically, using the near-same Cloud 2 experience that our customers have had, but with InfluxDB 3 as the backend engine for it. So, if you sign up for a new account today, whether directly with Influx or through AWS, what you get is our InfluxDB 3 product, InfluxDB Cloud Serverless. So, you can use that today. If you are a marketplace user through GCP or through Azure, that’s not available yet; you would need to sign up through our website to get InfluxDB 3. If you do it through one of those two marketplaces, you’ll still get InfluxDB 2.
CAITLIN CROFT: 33:02
There are so many questions, Gary. So, really appreciate it. So, it sounds—
GARY FOWLER: 33:08
That’s why I allowed plenty of time for this part. I knew there would be a lot of questions.
CAITLIN CROFT: 33:11
Yes. So, this one we might need to follow up afterwards, but I’m going to ask it just in case you do know the answer. It sounds like the Flight SQL data source for Grafana Enterprise paid is broken. According to Grafana, excuse me, the data source is not supported by Grafana, so the Flight SQL developer will need to fix it. Is there anything that we can do on our end to help look into this?
GARY FOWLER: 33:43
So, I think I know what this one is referring to. So, we uncovered an issue with the plug-in for InfluxDB or Grafana when using it with the Clustered product. It’s an issue with gRPC permissions. And actually, a community member put up a PR in the Grafana repo for that. And just earlier this morning, just before this, I saw a note from one of the engineers at Grafana telling me that they’re accepting and merging the PR. So, there’ll be a fix for that very, very quickly.
CAITLIN CROFT: 34:31
Cool. Perfect. Let’s see. Someone’s asking if there’s any archival utility to cold storage added to InfluxDB 3.0.
GARY FOWLER: 34:45
Not necessarily a utility yet. But really, when you talk about the way that we organize and store data, it is in Parquet files. And so, the default partitioning is time by day. So, if you haven’t changed from the default partitioning, you will essentially have a Parquet file for every day’s worth of data that you have. In most cases, there’s some variation, but in most cases, that what you would— that’s what you would have. And you will have it in S3 or in whatever object store you’re using. So, you have the ability to— if you’re using Clustered or you’re going to be using the upcoming OSS offering, and you’re running that yourself, you have the ability to do whatever you want with those files. You could move them off into another location if you wanted, in cold storage, or really do whatever you want with them. We do plan on offering some additional manageability in our managed products to be able to do something with those. But right now, we’re managing that storage, and we’re the ones that are putting them in object storage.
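For a self-managed deployment, the day-based partitioning described above suggests a simple way to decide which files are candidates for cold storage. The Python sketch below is illustrative only: the `YYYY-MM-DD` partition key format and the function names are assumptions made for this example, not the actual object store layout used by InfluxDB 3, and the code only selects partitions rather than moving any files.

```python
from datetime import date, datetime, timedelta, timezone


def partition_day(timestamp_ns):
    """Map a nanosecond Unix timestamp to its UTC day partition key."""
    seconds = timestamp_ns // 1_000_000_000
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%d")


def days_to_archive(partition_days, today, keep_days):
    """Return partition keys strictly older than the retention cutoff."""
    cutoff = today - timedelta(days=keep_days)
    return sorted(
        day
        for day in partition_days
        if datetime.strptime(day, "%Y-%m-%d").date() < cutoff
    )


key = partition_day(1710255600 * 10**9)
old = days_to_archive(
    ["2024-03-12", "2024-03-01", "2024-03-10"],
    today=date(2024, 3, 12),
    keep_days=7,
)
print(key, old)
```

The returned keys could then drive whatever copy or lifecycle mechanism the chosen object store provides.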
CAITLIN CROFT: 35:58
Perfect. A couple of people are asking if the recording will be made available. Yes. It’ll be available by tomorrow morning, as well as Gary’s slides, so no reason to panic there. It’ll definitely be all available for you guys to watch. And I know a bunch of people have also asked if we can give any more concrete timelines for what Gary’s talked about. We can’t, unfortunately. I would just say, definitely keep an eye out on the InfluxData website, our blogs, and our social media feeds. As soon as we have more dates to share with you guys for launches later this year, we’ll definitely be sharing it loud and everywhere. I don’t know, Gary, if there’s anything else you wanted to add to that.
GARY FOWLER: 36:50
Yeah. I mean, we did talk about some general dates in kind of my first slide. So, the first V3 version of the open source software, we’re looking at towards the end of Q3. There will likely be some availability to get hands on pre-release software from the repo before that. But it’s towards the end of September when we plan on releasing that. Some of the things like service dashboards for Cloud Dedicated, we’re actually hoping for as early as early Q2. We’re making some of them available to a customer in an early release in Q1 even, so we think that’s fairly imminent. The drop table and drop database with rename, we think that will land in Q2, maybe towards the middle of Q2, maybe towards the end. Not 100% sure there, but we think it’ll be in the first half of the year. Yeah. So, those are a few dates for you, at least.
CAITLIN CROFT: 38:05
And I really appreciate our community being super patient with us. I know we’ve been talking about IOx and InfluxDB 3.0 for a long time, and we’re getting it out to you as quickly as we can. So, really appreciate everyone’s patience on this. Let’s see. Next question. For on-prem, will the Python/scripting engine run on the same node as the data? Any options to parallel run the workload, i.e., to speed up large queries that can be split out and run separately?
GARY FOWLER: 38:40
Yeah. Really good question. So, where the scripting engine will first land is on a single-node version. So, it will be in the— it will be running where the other software is. If that happens to be where the data resides, then it will be in the same place. When we move it to our distributed flavors, so Cloud Dedicated and Clustered, there may be some other options on how it’s run and how it’s distributed, but I don’t have the specific details on that yet because it is too early in the development of that feature.
CAITLIN CROFT: 39:18
Does the Telegraf/tail file interface stay the same between version 1 and 3?
GARY FOWLER: 39:28
So, I’m not 100% sure I know the answer to that question, in that Telegraf is not specific to any version or any flavor of InfluxDB. So, it shouldn’t behave differently with 3 than it did with 2 or 1.
CAITLIN CROFT: 39:50
Okay. Okay. Does InfluxDB 3.0 still support multiple retention buckets per database, or is that feature not required due to unlimited cardinality?
GARY FOWLER: 40:07
Yeah. So, in InfluxDB 3, the retention policy is only set at the database level, and it’s just one policy right now. There has been talk in the past of allowing retention on the table level, but we don’t have that yet.
CAITLIN CROFT: 40:29
So, we released a benchmark of InfluxDB 3.0 versus InfluxDB 1.X. Do you know if there are any plans for other benchmarking?
GARY FOWLER: 40:42
No plans right at this moment to do that. One of the things that you see with benchmarking that is run out there in different places is that often, they’re built for whoever is building them to portray whatever message they want to portray. And so, we generally just would like our customers to POC and try it for themselves and see how it’s going for themselves rather than to rely on some public benchmark.
CAITLIN CROFT: 41:18
Will the new version of InfluxDB open source also have the new web UI, or will this be part of the Community or commercial options?
GARY FOWLER: 41:28
Don’t know for sure yet. We believe that there’ll be some UI. It may not have all of the same flavors. But I don’t know that that UI will be available on day one of the open source release. There’s a good chance that it would not. So, you would mainly just have API access when we first release open source V3. But if it’s not, then at some point after that, we would expect some sort of UI.
CAITLIN CROFT: 41:59
Does InfluxDB 3.0 implement any kind of data synchronization between instances like OSS to Clustered?
GARY FOWLER: 42:09
I’m not aware of a feature like that right now. Not to say that we wouldn’t do something like that. We did have Edge Data Replication, something called EDR, with previous versions. And in theory, there’s no reason that you couldn’t use that with InfluxDB 3. But there’s no built-in kind of synchronization feature right now, no.
CAITLIN CROFT: 42:34
What is your definition of leading-edge data? Recent data going back what amount of time?
GARY FOWLER: 42:42
Very good question and one that I don’t have a specific answer to because it really depends on the volume of data. But typically, where we see that recent edge data is people that are looking for sensor readings and other things that are kind of the last data that has been written. So, a lot of times, it’s really looking at the last second. But for some customers, it’s looking at the last 5 minutes, the last 10 minutes, the last half hour, etc. So, it depends on the size of the data, but typically you’re talking about kind of the most recent half-hour to hour of data.
CAITLIN CROFT: 43:21
Gary, there’s a bunch of people asking for more details about the difference between InfluxDB Edge and InfluxDB Community. I know you had a slide that kind of talked about that, but is there anything else that you’d like to highlight or share about what the team is working on as far as the differences between the two?
GARY FOWLER: 43:43
We’re not quite far enough along yet that I want to talk about that just yet. Some more details on that, I think, will emerge in the next two to three months, but we’re not quite ready to talk about that just yet.
CAITLIN CROFT: 43:59
Okay. Cool. I know people are— I can tell people are just super excited for those to come out. I’m just going to end with, Gary, you’ve been here for a while, you’ve been knee-deep in the product for a very long time, what are you excited about? There’s a lot that your team is working on with the engineering team, there’s a lot, clearly, that still needs to be done with the InfluxDB 3.0 line, but what, for you, really excites you? What are you excited about for the next six months of the product development?
GARY FOWLER: 44:33
Oh, good question. A lot of things. We’re still really proud of all of our InfluxDB flavors. InfluxDB 3 is one of them. So, we’re still actively working with our customers on 1 and 2, and there’s still customers that are downloading our open source software in those versions and buying those commercial versions. So, we’re excited. We think they have a long shelf life, and we’ll continue to support those for a long time, and they’ll be the best fit for some of our customers for a long time still. But InfluxDB 3 does bring us a new set of customers and a new set of use cases that we haven’t had before, like the real-time analytics, that it’s really good for that zero time to be ready. And then especially in the data analytics community, the fact that we’re using the Apache Parquet kind of ecosystem for this makes it highly interoperable. So, when we talk to data scientists that are already doing some analysis with Parquet and Arrow, and they use Python and the tools like Pandas to work with their data, when they hear about this, they get really excited because it’s all the things that they are familiar with, and it’s easy for them to get started and work with it. So, I think that is where I’ve seen the most excitement, is people saying, “Hey, yeah, we really like how easy this is to work with.”
CAITLIN CROFT: 46:01
Awesome. Thank you. Sorry. One more question. There’s a bunch of questions around going from Flux to SQL querying. So, is there going to be more functionality added to the SQL side to increase the parity between it and Flux?
GARY FOWLER: 46:23
Yes and no, in that one of the things that Flux did was write back to the database. And we don’t plan on, at least in the short-term, supporting inserts or anything like that with SQL. So, it would be a combination of our client libraries for writing and SQL for querying. But we do have a few things coming out, like we’re working on parameterized query support in SQL. So, we expect that to land soon, hopefully in the next month or two. So, there’s additions like that that we are adding. But for writing data back to the database, that’s where you will use our write API. Does that make sense?
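To illustrate what a parameterized query means in general, here is a minimal sketch that uses Python’s built-in `sqlite3` module as a stand-in database. It is not InfluxDB, and the parameter syntax InfluxDB 3 ends up supporting is not shown in this webinar and may differ. The table `cpu`, its columns, and the helper `usage_for_host` are made up for the example.

```python
import sqlite3

# Stand-in database: sqlite3 is used only to illustrate parameter binding.
# It is not InfluxDB, and InfluxDB's own parameter syntax may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cpu (host TEXT, usage REAL)")

# Sample rows are loaded with sqlite's INSERT only to set up the stand-in
# table; in InfluxDB 3, writes go through the write API, as described above.
conn.executemany(
    "INSERT INTO cpu (host, usage) VALUES (?, ?)",
    [("server-a", 12.5), ("server-b", 87.0), ("server-a", 30.0)],
)


def usage_for_host(connection, host):
    """Query with a named parameter instead of pasting the value into the SQL text."""
    cursor = connection.execute(
        "SELECT usage FROM cpu WHERE host = :host ORDER BY usage",
        {"host": host},
    )
    return [row[0] for row in cursor.fetchall()]


rows = usage_for_host(conn, "server-a")
print(rows)
```

Because the value is bound separately from the SQL text, the same statement can be reused with different inputs, and an input that contains quote characters is treated as data rather than as part of the query.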
CAITLIN CROFT: 47:09
Yep. And if anyone has any things that they want to add to Flux or bugs, they can just go to the Flux repo, I’m assuming, right?
GARY FOWLER: 47:21
Yes. You can go to that repo and you can open an item, or you can open it against InfluxDB 3. We’ll get it routed to the right place. But best to go right to the Flux community repo that was added this last year.
CAITLIN CROFT: 47:40
Awesome. Thank you, everyone, for joining today’s webinar. Gary, I know there were a ton of questions. Well, I appreciate you answering them. I hope we got to everyone’s. If anyone has any questions that weren’t answered that you would love to chat with Gary about, feel free to email me. I’m happy to put you in contact with Gary and also our amazing DevRel team, who are also really well-equipped to answer these questions. Thank you, Gary. I know that was a lot. Are there any last parting words or anything else you’d like to add, or?
GARY FOWLER: 48:16
No. Just thanks for joining. Really appreciate we had a nice turnout for this webinar, and really appreciate your time. Thanks.
CAITLIN CROFT: 48:24
Thank you so much, everyone. And once again, this has been recorded and will be made available by tomorrow morning. Thank you.
GARY FOWLER: 48:34
Thanks, everyone.
[/et_pb_toggle]
Gary Fowler
VP of Products, InfluxData
Gary Fowler is VP of Products at InfluxData. Gary has nearly three decades of experience in product management, program management, software engineering, and sales engineering. He previously held Vice President roles in Product and Engineering at iPass, Airborne Interactive, and Lilee Systems. Gary resides in Holualoa, Hawaii.