Infrastructure Monitoring with InfluxDB | Live Demonstration
Session date: Oct 19, 2023 08:00am (Pacific Time)
In this live demonstration, discover how companies use InfluxDB to provide real-time insights into all aspects of their infrastructure through real-life use cases and technical discussion.
Purpose-built for time series data to ingest millions of data points per second from multiple sources, InfluxDB helps organizations quickly identify and resolve issues before they become major problems.
See how using a tool like InfluxDB reduces the total cost of ownership and complexity for collecting data from multiple sources and managing data across multiple storage locations.
In this demo, you will learn how customers use InfluxDB and dive deeper into its technical aspects, such as:
- Collecting observability data (metrics, logs, and traces) at scale, using different protocols (OpenTelemetry, Kafka, MQTT, etc.)
- Using InfluxDB 3.0 as a gRPC backend for Jaeger traces.
- Leveraging Grafana and InfluxDB 3.0 together to visualize highly granular metric, trace, and log data.
- Persisting data to Apache Parquet easily and automatically. Storing these highly compressed Parquet files in low-cost cloud object storage lets you keep more data in less space, so you can meet your long-term data retention requirements.
- And much more…
Watch the Webinar
Watch the webinar “Infrastructure Monitoring with InfluxDB | Live Demo” by filling out the form and clicking on the Watch Webinar button on the right. This will open the recording.
Transcript
Here is an unedited transcript of the webinar “Infrastructure Monitoring with InfluxDB | Live Demo.” This is provided for those who prefer to read rather than watch the webinar. Please note that the transcript is raw. We apologize for any transcription errors.
Speakers:
- Jay Clifford: Developer Advocate, InfluxData
- Ian Clark: Sales Engineer, InfluxData
Jay Clifford: 00:05
Here we go. Rolling in. It is a cold and dreich day—as we like to call it in Scotland—at the moment. I think we’ve got a storm rolling in. Ian, what’s the weather like on your end?
Ian Clark: 00:21
Really nice. I’m in Cambridge, Massachusetts. So, it’s kind of that Norman Rockwell, blue sky, orange leaf. Pretty good day here.
Jay Clifford: 00:35
Awesome. Right. I think we’ve got a good turnout so far. So, since we’ve got half an hour to get through this, let’s get cracking. So, welcome all to Infrastructure Monitoring. This is the third installment in our demo series. We do have the recordings for the others, so if you like IoT and would like to see some IoT-based demos, definitely check those out. But you’re in the right place for Infrastructure Monitoring in this case. With me today is my favorite solutions engineer from America. Ben’s my favorite in the UK, but, so I’m politically correct here, my favorite from America, Ian, is here to conduct this webinar for you. So, Ian, would you like to tell everyone a little bit about yourself, and then I’ll let you take it away and smash on with the presentation and the demo?
Ian Clark: 01:24
Yeah. So, my name’s Ian Clark. I’ve been on the technical sales side of the house with Influx for just about three years now. So yeah, Ben and I are great friends. We work very closely together. I cover primarily the Fortune 500 book of business here at Influx. But as we’ve kind of expanded and grown pretty quickly, I guess I kind of have my fingers over everybody right now. So, it’s great to work with some of the largest companies in the world, but also great to kind of touch base with kind of the grassroots community members. So, happy to meet everybody and kind of show everyone what we’re working on.
Jay Clifford: 02:08
Fantastic. And just a final note from me. If you have any questions about Ian’s slides or his demo, please leave them in the Q&A at the bottom, and we will get to them at the end. We’re happy to run over time at the end if there are plenty of questions. So please keep them firing in, and we’ll get to as many as we possibly can. And this session will be recorded, so please don’t worry if you have to drop. You will get the recording in about 24 hours, I believe. Ian, I’ll let you share your screen and your slides and let you get rolling.
Ian Clark: 02:41
Cool. Okay. Okay. Infrastructure Monitoring with InfluxDB, a hot topic. As kind of everybody is aware, right, we live in the digital age. Everything has a sensor attached to it now, right? We’re instrumenting not only just our switches and routers, but now, with IoT, whether it’s an oil rig, electric cars, self-driving cars, water meters, solar panels, everything has a sensor attached to it these days, right? It’s kind of, for better or worse, right, the digital exhaust of our world is increasing, right? So, we need a data store purpose-built to capture it, right? And I think Influx is kind of uniquely positioned to ingest all these different metrics, logs, traces, put them under a single pane of glass. And really, kind of the sensor data and some of the stuff that we’ll talk about today, I think, is really kind of best understood in the lens of time, right? How is this thing changing? What are my patterns, seasonality? How has it changed from yesterday? What does it look like relative to a year ago? Those are all kind of questions that we love to ask and answer of our data. And I think InfluxDB is probably the best-in-class data store for those types of use cases.
Ian Clark: 04:26
So, we’ve got about 120, 100 or so folks on the call. We’re all kind of starting. Some of you may have been with us from the very beginning, back in 1.0. Some of you might be brand new to Influx, right? So, we’ll start with just kind of a general overview, jump to some use cases, and then a quick demo at the end. Right. So, what is InfluxDB, right? So, kind of, this is our all-in-one, purpose-built data store, right? So, data collectors, we talk about Telegraf a lot. I mean, Telegraf is ubiquitous, really. And somebody keep me honest, but I think if you check the GitHub stars, it might be more popular than the database itself at this point. So, kind of data collectors, scripting languages, whether it’s InfluxQL, Python, Java, there are lots of ways for us to not only collect raw data, but kind of manipulate it, filter it, aggregate it, ask different questions of the data. There’s an API layer, right? We obviously want to insert data into the database, read it out, plug it into a visualization platform, other services, interoperability. And then kind of the foundation of that is a purpose-built time series database, right? So, if you kind of look at our product offerings, right? So, there’s InfluxDB Cloud Serverless. This is our multi-tenant solution, right? There’s a free tier, a pay-as-you-go tier. So, it’s a fantastic way for somebody to just kind of dip their toes into 3.0, kind of learn about the APIs.
Ian Clark: 06:21
What are some of the... how do I interact with SQL? How do I insert data into the database? How do I query it out? And that’s available on AWS. And then for kind of larger workloads, some of the commercial clients that I work with, really, I think InfluxDB Cloud Dedicated is something I’m extremely excited about. So, this is single-tenant, no noisy neighbors, kind of a purpose-built cluster, right? And right now, it’s available on AWS. We can put that in any AWS region, and then GCP and Azure will be coming probably end of 2023, maybe beginning of 2024. So, these are kind of larger, purpose-built clusters, as I mentioned, right? Something where you’re interacting with a sales team. And then there’s also InfluxDB Clustered, which Paul just announced maybe two weeks ago in that blog post. And that’s for our folks who don’t necessarily want a managed service, or maybe their security team won’t allow it, or there’s a use case where it needs to be on-premise, self-hosted. So, you can kind of think of this as the logical successor to InfluxDB Enterprise, right? For folks who need something that is highly available, has commercial support, it’s a fantastic solution for that. So, we’re really excited about 3.0, right? I think in terms of deployment flexibility, there are options for everybody. Whether you’re a student working on some kind of R&D project and you just want to start with the free tier or the pay-as-you-go tier, all the way up to purpose-built clusters for your company. So really, I think the goal, something we talk about a lot internally, is kind of time to awesome, right, and kind of meeting developers where they are.
Ian Clark: 08:23
And we feel that 3.0 is positioned really well for that. So, 3.0 itself, right? I think there’s been a lot of excitement over the past year. We’ve published a lot of different blog posts about the rewrite, different architectures, and really kind of think about this as all the lessons learned from building a time series data store over the last almost a decade at this point. So really, this is kind of about cardinality, scaling issues, query performance, ingest throughput. 3.0 is kind of a new approach to all of those. And we think that we’ve solved a lot of the issues that we’ve seen with kind of older versions of the platform. Right. So fastest time to awesome, purpose-built, in-house analytics, right? So kind of one data store for metrics, events, and traces, right? So one thing we’ve heard from some of our users is metrics are in one data store, logs and traces are in another data store, right? Wouldn’t it be awesome if we could pull that under a single pane of glass? And that’s something we’ve tried to do with 3.0. Again, sub-second query responses, right? Everybody wants their data. They want it now, and they want it faster than ever. So, you know, I’ve done a handful of proofs of concept with some commercial prospects on Dedicated 3.0, and we’re seeing query response times of 100 milliseconds, 50 milliseconds. So we’re really excited about some of the performance gains we’ve made relative to older versions of the system.
Ian Clark: 10:19
We’ll talk about this a little bit more in an upcoming slide, but in terms of persistence format, we’re moving to Apache Parquet, right? So, in terms of keeping data forever in low-cost object storage, fantastic return on investment in terms of TCO. What’s the point of instrumenting all of these different sensors in my oil rigs or my electric car if I can’t hold onto the data for several years, right? So, we feel that with 3.0, the way we persist data on disk is really, really efficient. And then, of course, now with both InfluxQL and SQL, right, the questions that we can ask of the data and who can ask those questions have really been improved, right? I think standard SQL support is something we’ve heard from the community for years, and we’re excited that now anybody who’s familiar with PostgreSQL will feel right at home on 3.0, right? And then, again, in terms of backwards compatibility, we take it very seriously. So InfluxQL is there. So, anyone coming from version 1 or version 2 of the product that has kind of existing InfluxQL queries, those should work pretty much out of the box on 3.0. And then we’ve also spent a lot of time thinking about how we can be good citizens of the open-source community, right? I mean, Influx is an open-source data store. We take that community very seriously. So, kind of, openness and interoperability with the data ecosystem, right? So, Apache Parquet and Apache Arrow are both, kind of, open-source standards. And then we are continuing the work in terms of how we expose InfluxDB to other systems. So, whether that be, kind of, dashboarding solutions or at the Parquet layer, that interoperability is something that we’re really excited about.
Ian Clark: 12:24
So, here’s just kind of from 10,000 feet, what does 3.0 look like? Again, if you’re coming from version 1 or version 2 of the system, this is going to look a bit different, but it’s not too scary if we kind of step through it one at a time. So, kind of working from the left side of the diagram over to the right, right, we submit a query to Influx, right? That could be standard SQL or InfluxQL. Part of the advantage of being an open-source company is we’ve really had a fantastic time kind of working with the Arrow specification. And part of that is the DataFusion project, right? So, that handles the logical and physical query planning. We have a catalog, a kind of metadata catalog to track schema changes, the number of tables, number of databases, that kind of metadata. And then the Apache Arrow specification allows us to hold a lot of data in memory, right? And then, of course, Parquet is the persistence format on disk, right? So, when I insert data into 3.0, it’s held in an in-memory cache for those kind of really hot, fast query response times, and it’s persisted to Parquet. And then, of course, one of the cool things with Parquet is it’s kind of an industry standard. Anybody coming from a Hadoop or Spark big data background, you’ve probably interacted with Parquet already. So, this is where we’re really excited with some of the integrations that are coming in 2024, right? If you want to pull all of your sensor data into Influx and then do some data transformations or cleaning, you can integrate that with TensorFlow, PyTorch.
Ian Clark: 14:22
I think we’re working on some Snowflake integrations, potentially Databricks as well, right? We’re aware that we’re one piece of a much larger ecosystem. So, whether it’s machine learning or data science kind of activities, those are all really easy in 3.0. Especially, one thing I’ve spent a lot of time personally playing with is Jupyter Notebooks and Python. I’m sure Jay and I both write a lot of Python, and it’s really easy to just submit standard SQL into Influx, pull that into a data frame, and now I can do feature engineering. I can do data cleaning. It’s just a really easy way to interact with my sensor data. Right. So, what’s kind of one of the big problems with infrastructure monitoring in general, right, is this fracturing of the ecosystem. I have my Docker containers. I have network, whether it’s Juniper, Cisco, etc. I have my smart apps. My house is instrumented. I have cars. My data sources are all fragmented, and then they end up landing in different data stores, right? Whether it’s some kind of streaming engine, a relational data store, something using SQL, right, or a data lake, a visualization. Now, I have to jump through all these different hoops. Which data store, which tool do I want to look at, what kind of data? And really, again, we feel that InfluxDB can be that single pane of glass where we’re going to optimize for really high volumes of kind of recent data, and then low cost of storage for longer-term data, right?
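Ian describes a workflow of sending SQL, getting rows back, and engineering features. The sketch below illustrates it. It uses Python's built-in `sqlite3` only as a local stand-in for an InfluxDB 3.0 SQL endpoint, and the `cpu` table and its columns are hypothetical. Against a real instance you would send similar SQL through a client library, such as InfluxData's Python client for 3.0, and would typically load the result into a pandas DataFrame.

```python
import sqlite3

# Local stand-in for an InfluxDB 3.0 table: a hypothetical "cpu" measurement
# with a timestamp (seconds), a "host" tag, and a "usage_percent" field.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cpu (time INTEGER, host TEXT, usage_percent REAL)")
conn.executemany(
    "INSERT INTO cpu VALUES (?, ?, ?)",
    [
        (0, "web-1", 20.0), (60, "web-1", 22.0), (120, "web-1", 30.0),
        (0, "web-2", 50.0), (60, "web-2", 55.0), (120, "web-2", 45.0),
    ],
)

# Plain SQL, ordered so each host's samples arrive in time order.
rows = conn.execute(
    "SELECT host, time, usage_percent FROM cpu ORDER BY host, time"
).fetchall()


def add_delta_feature(rows):
    """Add a simple engineered feature: change since the host's previous sample."""
    previous = {}
    features = []
    for host, ts, usage in rows:
        delta = usage - previous[host] if host in previous else 0.0
        features.append({"host": host, "time": ts, "usage": usage, "delta": delta})
        previous[host] = usage
    return features


features = add_delta_feature(rows)
for row in features:
    print(row)
```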
Ian Clark: 16:13
So, if I just want to see, “Show me every sensor that’s had an update in the last 15 minutes,” right, it’s all held in cache. If I want to go back for three years and look at, kind of, historical trends over time, that’s also very easy to do. And then, now you have kind of a single source of truth for your visualizations, whether it’s Grafana, Tableau, an integration with a data lake, or some kind of machine learning algorithm, right? This kind of time series forecasting. InfluxDB can be the source of truth for all of those. So, infrastructure monitoring specifically, right? What are the things that we—why do we want to instrument this stuff in the first place, right? Like QA, efficiency, anomaly detection, process optimization, and cost reduction, right? When I go on-site with folks who are instrumenting water meters or factory floors, whether it’s an oil derrick or a solar panel, all this, kind of, different infrastructures, these are the pillars that everybody is tracking, right? And we feel that being able to ingest disparate data feeds and pull them all together into a single data store is the best way to attack each of these pillars. Right. So, for example, let’s talk about network and system monitoring, right? Scale, right? It’s extremely hard to ingest, transform, and analyze all these different data feeds efficiently, right? Think about if I’m a Fortune 500 company, how many offices do I have? How many routers do I have in every office? How many packets are, kind of, moving over the wire at any given second, right? If you kind of extrapolate that out, you very quickly get into massive, massive amounts of data.
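Ian's "last 15 minutes" example is a simple time-window filter. In InfluxDB 3.0 SQL it would look something like `SELECT DISTINCT sensor_id FROM readings WHERE time >= now() - INTERVAL '15 minutes'`; the `readings` table and `sensor_id` column are hypothetical. The sketch below applies the same logic in plain Python so the window boundaries are explicit.

```python
from datetime import datetime, timedelta, timezone


def recently_updated(readings, now, window=timedelta(minutes=15)):
    """Return IDs of sensors with at least one reading in [now - window, now]."""
    cutoff = now - window
    return {sensor_id for sensor_id, ts in readings if cutoff <= ts <= now}


now = datetime(2023, 10, 19, 15, 0, tzinfo=timezone.utc)
readings = [
    ("sensor-1", now - timedelta(minutes=5)),
    ("sensor-2", now - timedelta(minutes=40)),
    ("sensor-3", now - timedelta(minutes=15)),  # exactly on the window boundary
    ("sensor-2", now - timedelta(hours=3)),
]
print(sorted(recently_updated(readings, now)))
```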
Ian Clark: 18:18
And then performance, right? We want a system that can respond quickly to a high volume of needle-in-the-haystack-type analytical queries, right? If 95% of my routers or network switches are, kind of, steady state, that’s great. But I need to know, “In all the offices I have across the United States, all those routers, show me the two that are offline or the two that are deviating from some kind of historical trend.” Right? That’s the needle in the haystack. And I want to be able to answer those questions in seconds, right? Not write a query and go off for a cup of coffee because I know it’s going to take 10 minutes to compute. And then, also, the management of this, right? So again, one thing I say to everybody is a lot of these problems are best understood in the context of time, right? Time-based properties and, kind of, the ephemeral nature of this data, it introduces some complexities around its retention, indexing, lifecycle management. All of that is taken care of when you are working with a purpose-built time series data store. Retention policies, timestamps, they’re all first-class citizens. It makes it really easy to, kind of, manage the data and hold on to what you need to and kind of burn off things that you don’t necessarily want to hold on to, right? So, the diagram should be kind of self-explanatory. When we go kind of up and down the whole stack from Ethernet all the way up to my, kind of, end-user applications, we’ve got metrics, we’ve got events, we’ve got traces, and each of those is solved by, kind of, dropping it all into Influx. And then it’s very easy to have Grafana for visualization. Jaeger is a great kind of purpose-built tool for, kind of, visualizing your traces potentially.
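The sketch below is a minimal version of the "needle in the haystack" question. It flags devices that have stopped reporting and devices whose latest value sits far from their own history. It uses a z-score against each device's baseline, which is one simple technique among many and not necessarily how any InfluxData product does it. The device names, values, and threshold of 3 are illustrative.

```python
import statistics


def find_needles(history, latest, threshold=3.0):
    """Return (device, reason) pairs for offline or deviating devices.

    history: device -> list of past values (e.g. packets per second)
    latest:  device -> most recent value; a missing key means no recent data
    """
    needles = []
    for device, values in history.items():
        if device not in latest:
            needles.append((device, "offline"))
            continue
        mean = statistics.fmean(values)
        spread = statistics.pstdev(values)
        if spread == 0:
            # Flat history: any change at all counts as a deviation.
            if latest[device] != mean:
                needles.append((device, "deviating"))
            continue
        z = (latest[device] - mean) / spread
        if abs(z) > threshold:
            needles.append((device, "deviating"))
    return needles


history = {
    "router-nyc-1": [100, 102, 98, 101, 99],
    "router-chi-1": [200, 198, 202, 199, 201],
    "router-sfo-1": [50, 52, 48, 51, 49],
}
latest = {"router-nyc-1": 101, "router-chi-1": 400}  # sfo-1 has not reported
print(find_needles(history, latest))
```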
Ian Clark: 20:17
There could be other tools for logs and traces, but these all kind of play really nice with Influx and, kind of, the different visualization tools that integrate with it. So again, kind of to just restate the problem, right? Multiple tools, multiple data stores, right? Events, metrics, traces. They’re fragmented across—maybe you have a time series data store for your metrics. You put your events in something else. Your traces are in some kind of NoSQL document store, right? You have three different visualization tools, but if you want to step through—I have a server somewhere, and suddenly, I see on my metrics data store, CPU utilization has spiked up to 99%. Well, naturally, you want to see the events and traces associated with that, but they’re in three different data stores, three different visualization tools. They may even be three different teams or three different departments, right? And so, it makes it really hard to do, kind of, a root cause analysis from kind of soup to nuts, what are the metrics? Where is my alert? How do I do, kind of, a full root-cause analysis? And this is something that we feel that we can start to tackle in 3.0. Right. So, the nice thing with 3.0 and the flexibility is we’re not only just a metrics data store now, also kind of bringing in those events and traces allows us to have kind of a single source of truth for all of them, right? And it makes monitoring much more efficient. It makes it much more proactive when I can alert and synthesize what are, kind of, traditionally three different disparate data feeds.
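Here is a toy sketch of the root-cause scenario, showing what keeping metrics, logs, and spans in one store makes possible. It finds CPU readings at or above a threshold for a host, then pulls the logs and spans recorded for that host within a time window around each spike. The `Record` shape, the 99% threshold, and the 60-second window are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    kind: str        # "metric", "log", or "span"
    time_s: int      # Unix timestamp in seconds
    host: str
    detail: str
    value: float = 0.0


def correlate(records, host, threshold=99.0, window_s=60):
    """Return (spikes, related): CPU spikes on host plus nearby logs and spans."""
    spikes = [
        r for r in records
        if r.kind == "metric" and r.host == host and r.value >= threshold
    ]
    related = []
    for spike in spikes:
        for r in records:
            near = abs(r.time_s - spike.time_s) <= window_s
            if r.kind != "metric" and r.host == host and near and r not in related:
                related.append(r)
    return spikes, related


records = [
    Record("metric", 1000, "api-1", "cpu", 45.0),
    Record("metric", 1060, "api-1", "cpu", 99.5),
    Record("log", 1050, "api-1", "OOM warning in worker pool"),
    Record("span", 1070, "api-1", "GET /checkout took 2300 ms"),
    Record("log", 500, "api-1", "service started"),
    Record("span", 1065, "api-2", "GET /health took 5 ms"),
]
spikes, related = correlate(records, "api-1")
print([r.detail for r in related])
```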
Ian Clark: 22:13
So, one of the cool things with this integration is Jaeger, right? And if there’s any kind of software or kind of DevOps engineers on the call, maybe SRE as well, right? We’re all familiar with Jaeger in terms of visualizing traces, right? So open-sourced, distributed tracing platform, fantastic piece of software released to the community by Uber. And the cool thing is 3.0 is compatible. It’s a remote backend store for Jaeger now, right? This means InfluxDB can be kind of bolted onto your existing kind of trace infrastructure, right? Where instead of, kind of, maybe other data providers or kind of persistence formats, now, you can plug Jaeger directly into InfluxDB, and it makes it a bit easier to pull in those metrics, logs, and traces under a single pane of glass. And then, of course, if you want to kind of play with this in real time, check out the URL at the bottom, right, Killercoda. We kind of have a purpose-built kind of demo environment for all of these. Right. So, kind of, in terms of in the real world, right, Disney+ streaming, those are metrics, logs, traces. Just think about the volume of traffic that, when you instrument a service as large as this, right, how much traffic is going into Influx on any given day, right? It’s not a trivial workload. Same thing with Capital One, infrastructure monitoring. Again, the backend systems that these folks are working on are massive, right? So, this is not just kind of a toy, right? These are real production systems and real production workloads.
Ian Clark: 24:15
So right, kind of, the reference architecture here, right, application performance metrics, infrastructure metrics, business transactions. Capital One, they’ve kind of worked on a custom integration with Splunk to kind of build an API alongside Telegraf, and then all of this comes into InfluxDB, right? They collect it, they can transform it, downsample, alert. And then, now, with S3 and Grafana, you get data lake integration, your visualizations. And I think everybody’s kind of really impressed with, kind of, these kind of architectures now.
Ian Clark: 24:59
So, at this point, we’ll jump into kind of a quick demo and kind of bring together everything we’ve talked about in terms of infrastructure monitoring, right? So now, we’ve talked about InfluxDB, kind of this single source of truth for spans, logs, latency traces, right? Now we’ve got a great little Docker environment where I’ve got a HotROD app, kind of a fun, real-world web app. OpenTelemetry integration, right, is another thing we haven’t talked a lot about, but that’s kind of the glue that keeps all these things together, right? Now we can instrument everything and kind of seamlessly ingest those metrics, logs, traces into Influx. Jaeger is there to kind of visualize traces, as well as Grafana, right? So this is, kind of... what we’re aiming for here is, kind of, a standard for converting OpenTelemetry into an InfluxDB-compatible schema. And an InfluxDB schema can also be translated into OpenTelemetry, right? So, there are parts of OpenTelemetry, the InfluxDB package that can be replaced with Telegraf, but this is, kind of, the north star of what we’re aiming for in terms of getting all of this under a single data store. And I will jump into that right now.
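To illustrate converting OpenTelemetry data into an InfluxDB-compatible schema, the sketch below turns one span, represented as a plain dict, into a line protocol row. The `spans` measurement name and the split between tags and fields are hypothetical and chosen for readability. The actual OpenTelemetry-to-InfluxDB converter may use a different schema.

```python
def escape_tag(text: str) -> str:
    """Escape commas, equals signs, and spaces in tag keys/values."""
    return text.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def quote_string_field(text: str) -> str:
    """Quote a string field value, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def span_to_line(span: dict) -> str:
    """Hypothetical mapping of one span to a line protocol row."""
    tags = {"service_name": span["service_name"], "span_name": span["name"]}
    tag_str = ",".join(
        f"{escape_tag(k)}={escape_tag(v)}" for k, v in sorted(tags.items())
    )
    fields = ",".join([
        f"trace_id={quote_string_field(span['trace_id'])}",
        f"span_id={quote_string_field(span['span_id'])}",
        f"parent_span_id={quote_string_field(span['parent_span_id'])}",
        f"duration_ns={span['end_ns'] - span['start_ns']}i",  # integer field
        f"status_code={quote_string_field(span['status_code'])}",
    ])
    # Use the span start time (nanoseconds) as the point timestamp.
    return f"spans,{tag_str} {fields} {span['start_ns']}"


span = {
    "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
    "span_id": "00f067aa0ba902b7",
    "parent_span_id": "",  # a root span has no parent
    "service_name": "frontend",
    "name": "HTTP GET /dispatch",
    "start_ns": 1697727600000000000,
    "end_ns": 1697727600250000000,
    "status_code": "OK",
}
line = span_to_line(span)
print(line)
```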
Ian Clark: 26:31
So, this is our kind of Killercoda sandbox environment. So, anybody who has access to this URL, they can jump in here. So, this is our actual HotROD demo app. Let me make this a little bit larger so everybody can see it. So, rides on demand. And the cool thing is whenever I request a car, thanks to OpenTelemetry, we’re also creating these spans and traces. And in real time, these are being ingested into InfluxDB. I’ll just create some more data here for everybody. And if I jump over to my OpenTelemetry dashboard, this is really cool, right? So, Jay and I have worked a lot on this. So now, we get different service names. I can click on something like Redis. I get a trace ID. And look at this. So now I’ve got my latency, service latency, and error rate. I can click into a trace ID. And now, I get the whole relationship of not only my metrics, logs, traces. If I scroll down even, again... I mean, check this out. We’ve never been able to do this in Influx before. So now I can step through each of these requests, see the latency. Super cool. And then again, right, if I want to jump over to Jaeger, just select MySQL, I can find traces.
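The per-service latency and error-rate panels in the demo are aggregations over span data. The following is a minimal sketch of that aggregation. It assumes each span has already been reduced to a dict with hypothetical `service`, `duration_ms`, and `error` keys.

```python
from collections import defaultdict


def service_stats(spans):
    """Per service: span count, mean latency (ms), and error rate (0..1)."""
    grouped = defaultdict(list)
    for span in spans:
        grouped[span["service"]].append(span)

    stats = {}
    for service, items in grouped.items():
        durations = [s["duration_ms"] for s in items]
        errors = sum(1 for s in items if s["error"])
        stats[service] = {
            "count": len(items),
            "mean_ms": sum(durations) / len(durations),
            "error_rate": errors / len(items),
        }
    return stats


spans = [
    {"service": "redis", "duration_ms": 10.0, "error": False},
    {"service": "redis", "duration_ms": 30.0, "error": True},
    {"service": "mysql", "duration_ms": 300.0, "error": False},
    {"service": "redis", "duration_ms": 20.0, "error": False},
]
stats = service_stats(spans)
for service, s in stats.items():
    print(service, s)
```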
Ian Clark: 28:27
Again, look how quick that was, right? Now, this is all querying InfluxDB. So, if I want to compare two traces or maybe do something that I can’t do in my Grafana dashboard, we can just click these two and compare these. Right? So, now I can compare two traces for my service and immediately get some feedback in terms of where trace A is outperforming B. And again, my whole application is now instrumented. All the data is in Influx. I get this really slick Grafana dashboard where I can alert on everything. I can visualize it. And now, I think for the first time, you can bring metrics, logs, and traces into a single UI and kind of synthesize data feeds in a way that wasn’t possible in, kind of, prior versions of Influx, right? So, this is something that we are incredibly excited about and something that we feel really kind of differentiates us from just being kind of that kind of classic time series data store where people think of it as, “I can only put my metrics in here. I have to go to an Elasticsearch or some kind of purpose-built APM platform to get my logs and traces.” Now we can be that kind of single source of truth. And we’re really kind of excited for all of the different opportunities that 3.0 unlocks. So, we’re right at the bottom of the hour. And I guess, Jay, if you want, we can do a couple of questions.
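One rough way to compare two traces is to total the time spent per operation in each trace and take the difference. This is a simplified sketch for intuition only. It is not Jaeger's trace-comparison algorithm, which compares trace structure, and the operation names and durations are made up.

```python
def total_by_operation(trace):
    """Sum span durations (ms) per operation name within one trace."""
    totals = {}
    for operation, duration_ms in trace:
        totals[operation] = totals.get(operation, 0.0) + duration_ms
    return totals


def compare_traces(trace_a, trace_b):
    """Per operation: A's total minus B's total in ms (positive means A was slower)."""
    a, b = total_by_operation(trace_a), total_by_operation(trace_b)
    return {op: a.get(op, 0.0) - b.get(op, 0.0) for op in sorted(set(a) | set(b))}


trace_a = [("HTTP GET /dispatch", 700.0), ("mysql SELECT", 320.0),
           ("redis GetDriver", 12.0), ("redis GetDriver", 14.0)]
trace_b = [("HTTP GET /dispatch", 550.0), ("mysql SELECT", 300.0),
           ("redis GetDriver", 11.0)]
print(compare_traces(trace_a, trace_b))
```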
Jay Clifford: 30:08
Yeah. Ian, we are ready. We have so many questions. So, let’s quick-fire these as fast as we possibly can. I might help out with a few as we go, but let’s begin. So, first question is, what is required for a migration from V2 to V3?
Ian Clark: 30:24
Yeah. So, we’re working on migration tooling right now. The biggest one is the persistence format, right? So, anyone who’s familiar with, kind of, older versions of Influx, right, we have the TSM file, that’s our time-structured merge tree. Obviously, with 3.0, we’re going to Parquet. So we are, as we speak, building the kind of tooling to do that kind of TSM to Parquet conversion. So that’s the big one that’s going to be required in terms of moving to 3.0. The other is, and we’ve published some blog posts about this, that unfortunately, Flux won’t be brought forward to 3.0. I’m kind of sad about it, but when we benchmarked all these different tools, SQL and InfluxQL performed dramatically better, orders of magnitude better than Flux. So again, the V1, V2 write API endpoints are completely compatible. So how you insert data into the database doesn’t change at all. But from a query perspective, InfluxQL or SQL are going to be, kind of, the two primary ways you interact with the data store.
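Because the V1 and V2 write endpoints carry over, existing writers that POST line protocol should keep working. As a sketch, this code builds, but does not send, a V2-style write request with Python's standard library. The host URL, database name, and token are placeholders. The `bucket` and `precision` query parameters and the `Authorization: Token ...` header follow the InfluxDB v2 write API, but check your deployment's documentation for which parameters it expects, for example whether `org` is also required.

```python
import urllib.parse
import urllib.request


def build_v2_write_request(host, database, token, lines, precision="ns"):
    """Build a POST to /api/v2/write carrying newline-separated line protocol."""
    params = urllib.parse.urlencode({"bucket": database, "precision": precision})
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/v2/write?{params}",
        data="\n".join(lines).encode("utf-8"),
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "text/plain; charset=utf-8",
        },
        method="POST",
    )


req = build_v2_write_request(
    "https://influxdb.example.com/",  # placeholder host
    "telemetry",                      # placeholder database/bucket name
    "MY_TOKEN",                       # placeholder token
    ["cpu,host=web-1 usage=12.5 1697727600000000000"],
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```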
Jay Clifford: 31:50
Sweet. Thanks, Ian. So next question for you. What is the licensing for the on-prem version of InfluxDB 3.0 when we want to just use it for internal POC testing before we commit to developing with it?
Ian Clark: 32:04
Yeah. So, I would reach out to the sales team. How we license Clustered is probably going to be pretty similar to how we license the enterprise product. If you want to do an internal POC on Clustered, I think, Jay, we have a contact sales form if you don’t have a dedicated AE already. Right now, Clustered is in, kind of, a private beta. We’ve given it to a handful of folks who are really, kind of, working day in, day out, hand in glove with our engineering team, and we’re very excited to kind of open up that beta later in the year. And, kind of, commercial conversations will, kind of, naturally fall out from that.
Jay Clifford: 32:47
Yeah. And I think we’ll send out an email directly after the webinar, which will contain lots of details, including the sales contact form. So, definitely, you can start that discussion. I will quickly answer this one for you, Ian. Are there restrictions of 3.0 and 2.X with the open-source version, especially when it comes to the low-cost object store? So, the idea when we release InfluxDB 3.0 OSS is there are going to be two versions. There’s going to be InfluxDB Edge OSS, which will be what we’re going to classify as a real-time data store for leading-edge data. So that means the most recent form of data that you have is going to be extremely fast, extremely queryable on ingest when it comes into InfluxDB Edge. We’re then going to have a second offering, which is going to be called InfluxDB Community. This is still a free-to-use, freemium piece of software that will not be open source but will be the successor to 2.X and 1.X and will handle historic data as well as leading edge. So, in terms of restrictions, there will not be restrictions in that circumstance; it would just depend on which version of the platform you use, whether it be directly with open source, which would just be leading edge, or whether you need that historical analysis. And you’ll be looking at the freemium version on that release. So just on that one.
Jay Clifford: 34:27
So, here’s a quick question. I’m going to try and summarize it for you, Ian, so I apologize if my summary is a little off. But as I understand it, with InfluxDB 3.0 it’s possible to query the data using SQL. So, does this mean that one is able to explore the data across different measurements and perform joins, grouped by fields and not only tags, like in true SQL?
Ian Clark: 34:52
Yeah. Yeah.
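In 3.0, each measurement is queried as a SQL table with tags and fields as columns, which is what makes joins and grouping on fields possible. The sketch below runs a join of that shape against Python's `sqlite3` as a local stand-in. The `cpu` and `deploys` tables, their columns, and the exact-timestamp join condition are hypothetical, and this exact statement has not been run against InfluxDB 3.0 here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two hypothetical measurements; "version" is a field in "deploys".
conn.execute("CREATE TABLE cpu (time INTEGER, host TEXT, usage REAL)")
conn.execute("CREATE TABLE deploys (time INTEGER, host TEXT, version TEXT)")
conn.executemany("INSERT INTO cpu VALUES (?, ?, ?)", [
    (0, "web-1", 20.0), (0, "web-2", 60.0), (60, "web-1", 30.0), (60, "web-2", 70.0),
])
conn.executemany("INSERT INTO deploys VALUES (?, ?, ?)", [
    (0, "web-1", "v1"), (0, "web-2", "v2"), (60, "web-1", "v1"), (60, "web-2", "v2"),
])

# Join across measurements, then group by a field rather than a tag.
query = """
SELECT d.version, AVG(c.usage) AS avg_usage
FROM cpu AS c
JOIN deploys AS d ON c.host = d.host AND c.time = d.time
GROUP BY d.version
ORDER BY d.version
"""
result = conn.execute(query).fetchall()
print(result)
```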
Jay Clifford: 34:55
Good answer. Easy answer is a good yes. Is there a way to downsample in the new version of InfluxDB 3.0?
Ian Clark: 35:07
It’s not built-in today. The way to downsample... I think, Jay, you were actually the one who authored this blog post, but kind of, there are a couple of different ways to skin that cat today. Kind of the way we’ve been pointing people is either something like an AWS Lambda or Prefect, some kind of orchestration framework. And then, as I mentioned, kind of, the API endpoints, you can query data with Python, Java, do some transformations, and then insert the data back in. So, there is definitely a way to do downsampling. We’ve talked with product about baking that into the product itself. As, kind of, data is compacted, it can also be kind of downsampled, but that probably won’t arrive before 2024. But we can definitely help you kind of with your existing downsampling workflows on 3.0 because it is kind of a very important piece of the database. And yeah, it’s something that we take very seriously.
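This is a minimal sketch of the query, transform, and write-back downsampling loop, with the transform step written in plain Python. It assumes raw points have already been queried as `(host, time_ns, value)` tuples and produces line protocol for a hypothetical `cpu_1m` measurement. In practice, a scheduler such as AWS Lambda or Prefect would run it periodically and write the output back through the write API. InfluxDB 3.0's SQL also provides a `date_bin()` function that can do the windowing in the query itself.

```python
from collections import defaultdict

ONE_MINUTE_NS = 60 * 10**9


def downsample(points, window_ns, measurement):
    """Average (host, time_ns, value) points into fixed windows as line protocol.

    Assumes host values need no line protocol escaping (no spaces, commas, or '=').
    """
    buckets = defaultdict(list)
    for host, time_ns, value in points:
        window_start = time_ns - (time_ns % window_ns)  # align to the window boundary
        buckets[(host, window_start)].append(value)

    lines = []
    for (host, window_start), values in sorted(buckets.items()):
        mean = sum(values) / len(values)
        lines.append(f"{measurement},host={host} mean={mean} {window_start}")
    return lines


raw = [
    ("web-1", 0, 10.0),
    ("web-1", 30 * 10**9, 20.0),
    ("web-1", 61 * 10**9, 40.0),
    ("web-2", 5 * 10**9, 7.0),
]
for line in downsample(raw, ONE_MINUTE_NS, "cpu_1m"):
    print(line)
```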
Jay Clifford: 36:22
Another question for you is, how would you differentiate InfluxDB 3.0 from Prometheus?
Ian Clark: 36:31
Well, I mean, Prometheus is kind of focused on metrics, right? So, Influx speaks line protocol and that’s not changing in 3.0. I believe there is kind of a Prometheus plugin for Telegraf. So, you can do that kind of transformation if you want to standardize on Influx. But I think, kind of, the differentiator is, I don’t know if Prometheus can do the spans, logs, traces, kind of the integration with OpenTelemetry. So, I think we’re kind of positioning 3.0 as more kind of a general-purpose data store while still focusing on time series data.
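Since the write path still speaks line protocol, here is a minimal encoder sketch covering the main rules. It escapes measurement names, tag keys and values, and field keys, adds the `i` suffix to integers, quotes string fields, and appends an optional nanosecond timestamp. It is illustrative rather than complete; for instance, it does not check that at least one field is present or that keys are valid. In practice, Telegraf or an official client library handles this, and Telegraf's Prometheus input plugin covers Prometheus-format sources.

```python
import math


def _escape(text: str, chars: str) -> str:
    """Backslash-escape each character in chars."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def format_field(value) -> str:
    """Render a field value using line protocol's type conventions."""
    if isinstance(value, bool):  # check bool before int (bool subclasses int)
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an 'i' suffix
    if isinstance(value, float):
        if not math.isfinite(value):
            raise ValueError("NaN and infinity are not valid field values")
        return repr(value)
    if isinstance(value, str):
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    raise TypeError(f"unsupported field type: {type(value).__name__}")


def to_line(measurement, tags, fields, timestamp_ns=None):
    """Build one line: measurement[,tag=value...] field=value[,...] [timestamp]."""
    parts = [_escape(measurement, ", ")]
    for key in sorted(tags):  # sorting tag keys is the recommended order
        parts.append(f"{_escape(key, ',= ')}={_escape(tags[key], ',= ')}")
    head = ",".join(parts)
    body = ",".join(f"{_escape(k, ',= ')}={format_field(v)}" for k, v in fields.items())
    line = f"{head} {body}"
    return line if timestamp_ns is None else f"{line} {timestamp_ns}"


line = to_line(
    "http requests",
    {"path": "/a,b", "method": "GET"},
    {"count": 3, "ok": True, "note": 'say "hi"'},
    1697727600000000000,
)
print(line)
```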
Jay Clifford: 37:19
Yeah. I mean, I agree with you, Ian. Prometheus, although it’s part of that OpenTelemetry stack, is purely focused on metrics. In terms of collecting data over a long-range period of time, it gets slower the further back you go. And in terms of cardinality, that can also be an issue. With 3.0, with that single pane of glass, right? Like you’ve been saying throughout the presentation, we’re a general-purpose store for traces, logs, and metrics, so you can deal with them all in one data store. So, this question about how we could fit this metric system with the new OpenTelemetry protocol, I think I’m just going to say that Ian answered this with the demo. So, definitely check out the demo. We are fully compatible with the OpenTelemetry Collector. So, you can use the Collector directly with InfluxDB to collect metrics, logs, and traces. So, you can ingest all of that through the OpenTelemetry stack and integrate InfluxDB that way.
Ian Clark: 38:21
Yes. Jay, fantastic answer. The only thing I would add is that if there’s a piece of OpenTelemetry that your organization really needs and you don’t see it, please open an issue on GitHub. That kind of feedback is really important.
Jay Clifford: 38:40
That’s a great shout. Yeah, I’m with you on that one. Okay. Ian, can you quickly summarize the requirements for running InfluxDB Clustered on-prem?
Ian Clark: 38:55
Yeah. So, as I mentioned, we’ve taken apart the monolith into a series of microservices, and the glue that holds those services together is Kubernetes. So, whether it’s EKS, AKS, OpenShift, or whatever flavor, the first requirement is a minimum Kubernetes version of 1.25. The other requirement is an object store with an S3-compatible API. So those are really the two drivers for getting Clustered up and running: Kubernetes 1.25 or later, and an object store with an S3-compatible API.
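A pre-flight check for the Kubernetes requirement might look like the sketch below. It assumes you already have the cluster’s server version string (for example, the server version reported by `kubectl version`). The helper names are hypothetical, and the 1.25 minimum is the figure quoted in this session, so confirm it against the current Clustered documentation. The object store requirement is not checked here.

```python
import re

MIN_KUBERNETES = (1, 25)  # minimum quoted in this session for InfluxDB Clustered


def parse_k8s_version(version):
    """Return (major, minor) from strings like 'v1.27.3' or 'v1.25.16-eks-8cb36c9'."""
    match = re.match(r"^v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_clustered_minimum(version, minimum=MIN_KUBERNETES):
    # Tuple comparison orders by major version first, then minor.
    return parse_k8s_version(version) >= minimum


for v in ("v1.24.17", "v1.25.0", "v1.27.3-gke.100"):
    print(v, meets_clustered_minimum(v))
```

Comparing (major, minor) tuples instead of raw strings avoids the classic bug where "1.9" sorts after "1.25".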
Jay Clifford: 39:43
Awesome. Thank you, Ian. We’ve got a lot of questions in the webinar chat. If you’ve got time, Ian, we’ll do five more questions and then call it quits. So, we’ve done that one. Is it going to be possible to input and retrieve Parquet files from InfluxDB 3.0?
Ian Clark: 40:08
Yeah, I get this question every day. That’s not part of the API today, but it’s definitely feedback we get from a lot of people. I think where you’re going to see that emerge first is integration with Apache Iceberg, which will allow you to query Influx from something like Snowflake or another data lake. And then probably sometime in 2024, you may see an API emerge where you can interact with those Parquet files directly. But it’s not something that we expose today.
Jay Clifford: 40:49
We have a question here about metrics: someone’s asking how long it would take to retrieve a year of data based on a set amount. Ian, you might follow up my answer, but we have released performance benchmarks for InfluxDB 3.0 that compare it against 1.x and 2.x clusters. Sorry, not clusters: 2.x nodes. So, you can definitely check those out for performance benchmarking of 3.0. And as Ian pointed out, I would just try it. If you signed up for Cloud today and ingested a year’s worth of data, you would need to be on a paid plan, but for the amount of data you’re looking at, I don’t think it would be too expensive just to test it. So that would be my answer. I don’t know if you had anything you would like to add, Ian.
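Before running that kind of trial, it helps to know how much data a year actually is. The arithmetic below is a back-of-the-envelope sketch with hypothetical inputs. It counts rows and field values, not bytes, and says nothing about query latency, which is what the published benchmarks and a real test are for.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds; ignores leap years


def rows_per_year(source_count, interval_seconds, fields_per_row=1):
    """Return (rows, field_values) produced by one year of regular writes.

    source_count: number of unique tag sets writing rows (e.g. hosts).
    interval_seconds: collection interval for every source.
    fields_per_row: fields written on each row.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    rows = source_count * (SECONDS_PER_YEAR // interval_seconds)
    return rows, rows * fields_per_row


# e.g. 100 hosts reporting 5 fields every 10 seconds
print(rows_per_year(100, 10, 5))  # → (315360000, 1576800000)
```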
Ian Clark: 41:48
No. I’m on the commercial side of the house, and this is where a managed proof of concept really shows its value, because we would work together on configuring the partitioning strategy and optimizing the query itself. But like Jay mentioned, the benchmarks stand on their own.
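To make “partitioning strategy” a little more concrete: as we understand it, InfluxDB 3.0 partitions each table by day by default, and custom partitioning (where the edition supports it) can also group data by tag values. The sketch below is a hypothetical illustration of how a row’s timestamp and selected tags could map to a partition. The key format and the function name are ours, not InfluxDB’s internal representation.

```python
from datetime import datetime, timezone

NS_PER_SECOND = 1_000_000_000


def partition_key(ts_ns, tags=None, tag_keys=(), time_format="%Y-%m-%d"):
    """Build a hypothetical partition key from selected tag values plus a UTC day.

    Rows that share a key would land in the same partition. The '|' separator
    is an illustrative choice, not InfluxDB's format.
    """
    tags = tags or {}
    day = datetime.fromtimestamp(ts_ns // NS_PER_SECOND, tz=timezone.utc).strftime(time_format)
    parts = [tags.get(key, "") for key in tag_keys]
    return "|".join(parts + [day])


ts = 1697727600 * NS_PER_SECOND  # 2023-10-19 15:00 UTC
print(partition_key(ts))                                    # day only
print(partition_key(ts, {"region": "us-east"}, ("region",)))  # tag + day
```

Generally, partitioning on a tag that most queries filter by lets the query engine skip whole partitions, while partitioning on a high-cardinality tag produces many small partitions; that trade-off is the kind of thing a proof of concept would tune.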
Jay Clifford: 42:16
Quick question. It’s not specific to infrastructure, but I think you’ll have an answer for this one, Ian. I’m new to Influx, but is there a client that can pass OPC data to InfluxDB?
Ian Clark: 42:29
OPC? Yeah. Telegraf speaks OPC UA seamlessly. So, if you want, I wouldn’t say a no-code, but a low-code solution, check out the Telegraf repository and its OPC UA input plugin. That’s probably the easiest way to get that kind of data ingested into Influx.
Jay Clifford: 42:55
Yeah. Awesome, Ian. I’d say come talk to us in the community, in the Telegraf channel, if you have any trouble setting it up, and we can definitely help you get up and running with your OPC UA data. Right, let’s go for the final question. Any questions that we didn’t answer, we will definitely follow up on in an email, so please do not worry: we will get to your questions. I’m trying to pick non-support questions. There are so many support questions, and we’ll definitely get to those at some point. I’ve done that one, done that one, done that one. Okay. What do the log queries look like? Are they compatible with Loki or LogQL?
Ian Clark: 43:46
I think, Jay, you might have been experimenting a little bit with Loki. I’m not sure they’re directly compatible, but it shouldn’t be too much work to transition from one to the other.
Jay Clifford: 43:59
Yeah. Just to follow up on that: we’re actually working on a pull request to Grafana that will add the ability to go from traces to logs, where we essentially provide a SQL query over the logs table stored within InfluxDB. Now, if you are a Loki house, or you prefer one of the other ways of querying, like PromQL, or a particular way of querying traces, definitely reach out to our sales team. We do have an observability partner that provides a translation layer: you connect your Grafana dashboards to it, and it translates your queries into SQL. So you can continue using all of your PromQL and Loki-based querying with Grafana, with this middle layer in between and InfluxDB at the bottom. So just FYI, if that’s what you’re looking for, reach out to our sales team and we can help out there as well.
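To give a feel for what a translation layer does, here is a toy sketch that converts a tiny subset of LogQL (a stream selector with `=` matchers, plus an optional `|=` line filter) into SQL over a logs table. It is not the partner’s layer or the planned Grafana change. The table name `logs` follows the discussion above, but the column names (one column per label, the log line in `body`) are assumptions about the schema. Label values containing commas or double quotes are not supported, and `%` or `_` in the filter text are not escaped.

```python
import re

# {label="value", ...} optionally followed by |= "text"
_QUERY = re.compile(r'^\{(?P<matchers>[^}]*)\}\s*(?:\|=\s*"(?P<needle>[^"]*)")?\s*$')
_MATCHER = re.compile(r'\s*([a-zA-Z_][a-zA-Z0-9_]*)\s*=\s*"([^"]*)"\s*')


def _sql_string(text):
    # Standard SQL string literal: double any embedded single quotes.
    return "'" + text.replace("'", "''") + "'"


def logql_to_sql(logql, table="logs", body_column="body"):
    """Translate a minimal LogQL subset into a SQL SELECT (toy example)."""
    match = _QUERY.match(logql.strip())
    if match is None:
        raise ValueError(f"unsupported LogQL: {logql!r}")
    conditions = []
    for part in match.group("matchers").split(","):
        if not part.strip():
            continue
        matcher = _MATCHER.fullmatch(part)
        if matcher is None:
            raise ValueError(f"unsupported matcher: {part!r}")
        label, value = matcher.groups()
        conditions.append(f'"{label}" = {_sql_string(value)}')
    needle = match.group("needle")
    if needle is not None:
        # |= keeps lines containing the text; LIKE '%text%' is the SQL analogue.
        conditions.append(f'"{body_column}" LIKE {_sql_string("%" + needle + "%")}')
    where = " AND ".join(conditions) if conditions else "TRUE"
    return f'SELECT * FROM "{table}" WHERE {where}'


print(logql_to_sql('{service="api", level="error"} |= "timeout"'))
```

A real layer would also have to cover regex matchers, parsers, and metric queries, which is why Ian expects some work, though not a lot, to move between the two.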
Jay Clifford: 45:09
Okay. I think we could be here all day, Ian, so we’re going to call it quits since we’re 15 minutes over. Thank you, everyone, for joining. It’s been an amazing turnout with lots and lots of questions. We will answer all the remaining questions later on via email. The recording will be available within 24 hours, so don’t worry if you’ve missed anything. Ian, it’s been a pleasure, and I hope we have another session together soon. (crosstalk) been doing these sessions so far, so it’s been great.
Ian Clark: 45:45
Yeah. Well, find us on GitHub or in the community Slack. Like Jay mentioned, this is all being recorded, so for folks who are interested in introducing InfluxDB into their organization, share it with your tech lead, your architect, and other developers. We’re always happy to take feedback from the community and show you what we’re building.
Jay Clifford: 46:18
Awesome. Thank you so much. Have a great day, great evening, or great morning, wherever you are in the world, and we’ll catch you at the next demo day.
[/et_pb_toggle]
Jay Clifford
Developer Advocate, InfluxData
Jay Clifford is a Developer Advocate for InfluxData. Before joining InfluxData, he specialised in solving industrial pain points using Vision AI and OT connectivity. Jay now uses his experience in the IoT and automation sector to help developers and industrial customers alike realise the potential of time series data and analytics.
Ian Clark
Sales Engineer, InfluxData
Ian Clark is a Sales Engineer at InfluxData, where he works with clients ranging from seed-round startups to the Fortune 500. Prior to InfluxData, he was a Technical Solutions Consultant at Google. He holds a master’s degree in statistics from Columbia University.