Best Practices: How to Analyze IoT Sensor Data with InfluxDB
Session date: Nov 28, 2023 08:00am (Pacific Time)
InfluxDB is the purpose-built time series platform. Its high ingest capability makes it perfect for collecting, storing, and analyzing time-stamped data from sensors—down to the nanosecond.
Join this webinar as Anais Dotis-Georgiou provides a product overview for InfluxDB 3.0. She will lead a deep dive into some helpful tips and tricks to help you get more out of InfluxDB. Be sure to stick around for a live demo and Q&A.
Join this webinar to learn:
- The basics of time series data and applications
- A platform overview—learn about InfluxDB, data collection, scripting languages, and APIs
- InfluxDB use case examples—start collecting data at the edge and use your preferred IoT protocol (i.e. MQTT)
Watch the Webinar
Watch the webinar “Best Practices: How to Analyze IoT Sensor Data with InfluxDB” by filling out the form and clicking on the Watch Webinar button on the right. This will open the recording.
Here is an unedited transcript of the webinar “Best Practices: How to Analyze IoT Sensor Data with InfluxDB”. This is provided for those who prefer to read rather than watch the webinar. Please note that the transcript is raw. We apologize for any transcription errors.
Speakers:
- Caitlin Croft: Director of Marketing, InfluxData
- Anais Dotis-Georgiou: Developer Advocate, InfluxData
Caitlin Croft: 00:00
Welcome to today’s webinar. My name is Caitlin, and Anais is here, and she’ll be presenting on “Best Practices: How to Analyze IoT Sensor Data with InfluxDB.” Please post all questions using the Q&A at the bottom of your Zoom screen. And without further ado, I’m going to hand things off to Anais.
Anais Dotis-Georgiou: 00:21
Thanks, Caitlin. Hi everyone. It’s nice to be here. So today, we’re going to be talking about best practices and how to analyze IoT sensor data with InfluxDB. So, our basic agenda for today is we’re going to first introduce the basics of time series and time series applications. Then we’ll give a platform overview of InfluxDB and how it looks today, as well as Telegraf and other ecosystem compatibility. And then we’ll learn about how we can actually start collecting data at the edge and use our preferred IoT tool and IoT protocol. So, my name is Anais Dotis-Georgiou, as Caitlin has already introduced me. I’m a developer advocate at InfluxData, and I encourage you to connect with me on LinkedIn if you want to. Feel free to reach out and ask any questions that you have there. Or I can also share any links that are here in case you missed the email. But all the links that are in this slide deck will also be sent to you, so you’ll have them, as Caitlin mentioned. And so, as a developer advocate, for those of you who aren’t familiar with the role, my job is to represent the company to the community and the community to the company. I do that by making webinars like this and tutorials, answering questions, and generally just trying to teach people how they can use InfluxDB and overcome any hurdles that they have with any projects while they’re using it, and then also to give product feedback back to the company so that we can make sure that we’re actually building something that is in alignment with what people are looking to use. So, I will spend part of this webinar just taking a step back and making sure that everyone understands the space and also InfluxDB. So, let’s dive into that.
Anais Dotis-Georgiou: 02:09
So, time series data is what this is all about, and time series data is any data that has a timestamp. And it typically comes from two main sources. The first is the physical world, and that’s where IoT data is derived from. So that’s sensors, right? That’s things like pressure, temperature, concentration, light, flow rate, etc. And then we have instrumentation of the virtual world as well. So common examples of time series data include things like weather conditions, also stock prices, CPU use. Maybe you’re monitoring endpoints or something. Maybe you’re monitoring Docker, Kubernetes, and you might have healthcare metrics as well. We also have data coming from things like biotech and agritech. And time series data also includes logs and traces, which InfluxDB V3 now also supports. So that’s very exciting. But in general, when we think about time series data, we like to think about three types of time series data: metrics, events, and traces. And you can maybe even put logs in there as well.
Anais Dotis-Georgiou: 03:17
So, metrics are usually used to derive or describe any time series data that is being collected at a regular period, and events are something that happens at an irregular period. So, a concrete example would be thinking about if we are measuring your heart rate. So, your heart rate would be a metric. And any sort of anomaly in your heart rate or any sort of cardiovascular event, maybe AFib, would be an event. And then we also have traces, which you don’t really have in the human world, except maybe my heart feels bad when you vocalize it. [laughter] So yeah. So how they relate is kind of like a nice little graph that demonstrates that we can see that metrics are being collected on a regular period. Events are varied, and a combination of metrics and events usually lead to traces. And then another thing to note is timestamp precision. How frequently can we collect metrics or events? And that depends on the timestamp precision. With InfluxDB specifically, we can collect data up to the nanosecond precision. And that’s usually one thing that kind of separates time series databases from other types of databases is that they don’t support that type of timestamp precision.
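To make the nanosecond-precision point concrete, here is a small sketch of my own (not from the webinar) showing the kind of integer nanosecond timestamps that time series databases like InfluxDB store, using only Python’s standard library:

```python
import time
from datetime import datetime, timezone

# Integer nanoseconds since the Unix epoch -- the same default precision
# that InfluxDB line protocol timestamps use.
ts_ns = time.time_ns()

# Splitting the timestamp shows what a second-precision datetime would lose:
# the sub-second remainder is exactly what nanosecond precision preserves.
seconds = ts_ns // 1_000_000_000
remainder_ns = ts_ns % 1_000_000_000
dt = datetime.fromtimestamp(seconds, tz=timezone.utc)

print(ts_ns, dt.isoformat(), remainder_ns)
```

Two events arriving within the same second are indistinguishable at second precision, but remain ordered at nanosecond precision.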
Anais Dotis-Georgiou: 04:40
Another thing to look at is data granularity with regards to time series. So, this can also be referred to as the sample rate. And when we look at our original waveform of what’s happening in any sort of environment or anything that we’re monitoring, that would be completely seamless if we could collect, at even higher than nanosecond precision or in real time, everything that’s going on. We would have our true original waveform. But when we measure things, we’re sampling that waveform. So, whether that’s at 10 nanoseconds or 10 seconds, we’re sampling it at a certain rate, or a certain number of samples per period. And then you can also sample something at a lower data granularity or an even lower data granularity. And oftentimes, when we take our data at a very high-precision granularity and reduce it to a lower one, that’s known as downsampling, oftentimes referred to as creating a materialized view of our data. And that can be helpful because, oftentimes, our original raw waveform can have too much data or too much noise. And if we convert that data to a materialized view, then we can see the general trend in our data without all of the raw data obscuring that trend.
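As a concrete illustration of downsampling into a materialized view, here is a small pandas sketch (my own example; the column name and intervals are invented, not from the webinar):

```python
import numpy as np
import pandas as pd

# Raw "waveform": one noisy temperature reading per minute for a day.
idx = pd.date_range("2023-11-28", periods=1440, freq="min")
raw = pd.DataFrame(
    {"temperature": 20 + np.random.randn(1440) * 0.1}, index=idx
)

# Downsample to 15-minute means -- a lower-granularity materialized view
# that shows the trend without the noise of the raw samples.
view = raw.resample("15min").mean()

print(len(raw), len(view))  # 1440 raw points -> 96 fifteen-minute averages
```

The same idea scales down to nanosecond-precision raw data rolled up into second or minute aggregates.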
Anais Dotis-Georgiou: 06:01
So key drivers for a time series application is, first, we need access to our time series data, and we need to be able to store a lot of that time series data. Then we need to be able to analyze that data and perform that analysis at scale with the right security. And then we need to actually be able to act on that data and get alerts when something is not working as we expect and maybe also implement any sort of reactions or further steps that need to be taken as a result of monitoring our environment. So, some of the key components for a time series application is that, first, you need to be able to gather a lot of data from a variety of different data sources. We already mentioned some of those, whether or not those are coming from the physical world or from the virtual world or a combination of both. And oftentimes, when you have all this data coming from a lot of different sources, you’re communicating via various protocols, different log formats, through maybe a variety of different brokers or message buses. And so, some of these sources also require that you have updates. Maybe you use polling, but you need to also know the frequency that you ask for other updates. And then other data sources require that you listen for updates on digital channels, whether that’s TCP or UDP, or that you subscribe to brokers and otherwise accept streams of events. And so, no matter how you get the data, you still need to parse, route, and otherwise process that data once for storage and then often times another time for analysis.
Anais Dotis-Georgiou: 07:42
And so, to add to the complexity of this, usually of these type of time series applications, you’re going to also need to expose this data to your users, whether that’s internal or external. Maybe you need to expose it to customers as well and other partners. And then, also, you need to worry about deploying both the back end and the front end of the app and manage your users, your security, and your data governance and so much more. So, all to say that building an entire time-series application has a lot of different moving parts. And the way that InfluxDB fits in is that we are a real time data database and a historical database. And V3 offers a ton of interoperability that enables you to really be able to build your time-series application with the tools that you need in order to successfully build what you’re looking to build. And there’s also tools that help you get data from a lot of different sources.
Anais Dotis-Georgiou: 08:42
So, let’s talk specifically about time-series data within the context of InfluxDB. So, this is what the reference architecture drawing looks like for InfluxDB V3. Basically, what we can do is we can get data from a variety of different sources and gather any timestamped data (metrics, events, sensor data, traces, logs, etc.) and ingest it into InfluxDB in a variety of ways. We use Telegraf, which is our collection agent for metrics and events. Telegraf is also open source, and it’s plugin-driven. And you can use a variety of input plugins, output plugins, processor plugins, and aggregator plugins to collect the data, process it in any way that you might need to, and output it to InfluxDB. So, in the IoT space, for example, a really popular input plugin is the MQTT Consumer plugin, and then you can also use client libraries. InfluxDB V3 supports the writing of Parquet files as well as being able to query data and transform it back into a Parquet file. So that offers a lot of interoperability with a variety of tools to analyze any IoT data that you might have.
Anais Dotis-Georgiou: 10:00
And additionally, you can always use something like MicroPython as well as HTTP and write data to InfluxDB that way. And then as far as data visualization is concerned, again, another part of InfluxDB V3 is offering increased interoperability so that you don’t have to learn a new visualization or dashboarding tool. Instead, you can use the tools that you’re already familiar with. So, there’s things like Grafana and the Grafana plugin. You can use that. There’s also Apache Superset and Tableau. And Power BI is coming soon. So yeah, lots of tools for actually visualizing and analyzing your data as well. But I did want to take a moment to highlight some of the differences that were created or contributed to InfluxDB V3, just so that we can have a little bit of an appreciation of where we came from and where we’re going, because there are, like Caitlin mentioned, some really big changes, and I think it’s important that we highlight them. So, InfluxDB’s new V3 storage engine, or sometimes you might see it referred to as InfluxDB IOx, and those are the same thing. IOx was just the original name for it because IOx refers to iron oxide, the compound that makes up rust, and InfluxDB V3 is built in Rust. So why did we build it in Rust? Well, Rust offers really performant and fine-grained memory management, and we wanted to be able to give some of that memory management control back to the users.
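To sketch what the plain-HTTP path looks like, the v2-compatible write API accepts line protocol in a POST body. The snippet below only assembles the request pieces rather than sending them (the host, org, bucket, and token are placeholders; the actual send would use `requests` on CPython or `urequests` on MicroPython):

```python
def build_write_request(host, org, bucket, token, precision="ns"):
    """Assemble URL, query params, and headers for InfluxDB's
    /api/v2/write endpoint. Sending the POST is left to the caller."""
    url = f"https://{host}/api/v2/write"
    params = {"org": org, "bucket": bucket, "precision": precision}
    headers = {
        "Authorization": f"Token {token}",
        "Content-Type": "text/plain; charset=utf-8",
    }
    return url, params, headers

url, params, headers = build_write_request(
    host="us-east-1-1.aws.cloud2.influxdata.com",  # placeholder host
    org="my-org", bucket="sensors", token="MY_TOKEN",
)
print(url)
```

The request body would then be one or more line protocol points, one per line.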
Anais Dotis-Georgiou: 11:41
So, there’s a bunch of different versions of InfluxDB V3 that are available to users. Most likely, you’re just going to start with InfluxDB Cloud Serverless. And in that case, a lot of this memory control or operator control isn’t given to you. But there are other versions like InfluxDB Clustered, which is fully self-managed and gives you much more control there. It’s also built on Apache Arrow. So, Apache Arrow is just a framework for defining in-memory columnar data. And I want to take a little sidebar to talk about the advantage that building on columnar data gives anyone who is in the IoT world. So basically, when you store your IoT data, or any data, in a columnar fashion, what you’ll notice, especially in the IoT world where you are measuring your physical environment, is that your physical environment may not change from one record to the next. So, if I’m measuring the temperature of this room and I’m doing so at a minute interval, luckily, because I have temperature control, it’s most likely going to stay the same for several hours. And so, what that means is a lot of those neighboring values are going to be both the same data type and literally the same value. And so, this offers a really great opportunity for cheap compression, which enables the really high cardinality that InfluxDB V3 offers, as well as faster scan rates using SIMD.
Anais Dotis-Georgiou: 13:17
And so, depending on how you sort the data, you may only need to look at a single column of data to find the max value of a particular field, in contrast to row-oriented storage. If I have multiple fields, say temperature, humidity, pressure, concentration, etc., and I wanted to find the max value of just temperature while storing the data row oriented instead of columnar, I would have to scan across every single field of every single row. So, this is one reason why InfluxDB V3, and the way that it’s built on column-oriented data, provides much faster and more efficient scan rates and in general is very well suited for IoT specifically.
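A toy sketch of that difference (my own illustration, not InfluxDB’s actual storage code): in column-oriented layout, finding the max temperature touches a single contiguous array, while a row-oriented layout walks every record and, at the storage level, reads every field.

```python
# Row-oriented: each record carries every field.
rows = [
    {"temperature": 21.0, "humidity": 40.1, "pressure": 1013.2},
    {"temperature": 21.3, "humidity": 40.0, "pressure": 1013.1},
    {"temperature": 20.9, "humidity": 40.2, "pressure": 1013.3},
]

# Column-oriented: one contiguous array per field. Neighboring values are
# often identical or near-identical, which is what makes compression cheap.
columns = {
    "temperature": [21.0, 21.3, 20.9],
    "humidity": [40.1, 40.0, 40.2],
    "pressure": [1013.2, 1013.1, 1013.3],
}

# Columnar scan: only the temperature column is touched.
max_columnar = max(columns["temperature"])

# Row scan: every record is walked to extract the one field of interest.
max_row = max(r["temperature"] for r in rows)

assert max_columnar == max_row == 21.3
```

Real engines add compression and SIMD on top of this layout, but the access-pattern difference is the core of the speedup.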
Anais Dotis-Georgiou: 14:09
It’s also built on Apache Parquet. So, Parquet is a column-oriented durable file format. And it’s also built on Arrow Flight, a client-server framework that is used to simplify the transport of really large data sets. And DataFusion is the query execution framework, also written in Rust, that also uses Arrow as its in-memory format. So, another advantage of everything being built upon Apache technologies is just having the backing of everyone that is part of the Apache Foundation and all the contributors there. It’s the type of thing where everyone comes together and collaborates, making all the technology so much more robust, and then giving you interoperability with all the other tools that leverage Arrow, for example, Snowflake, BigQuery, and so many more.
Anais Dotis-Georgiou: 15:07
InfluxDB V3 also offers SQL support as well as InfluxQL support. So InfluxQL, for those of you who aren’t familiar, is an InfluxDB-specific version, so to speak, of SQL, but they’re very similar. Here’s an example of querying with SQL, though, specifically. And there is a query builder for you in the UI, so if you’re not familiar with it, you can use that, although you can also use a variety of other tools to help you build your SQL queries. But basically, one thing that’s worth noting is that one of the main motivators for rewriting the storage engine was to accommodate use cases with really high throughputs of data. And when we think about the growth of instrumentation and how much data creation we’ll have by even 2025, as we can see here, we need to think about how we can provide data historians and real-time databases like InfluxDB that can accommodate those high-throughput use cases.
Anais Dotis-Georgiou: 16:16
And so, one of the exciting things to come out for InfluxDB V3 was a benchmark for it, and it is about 45 times faster for recent data compared to InfluxDB open source 2.x. So, the data set that was used for this benchmark had a duration of 24 hours, with a cardinality or dimensionality of 160,000, and the points being spaced at 10 seconds. And if you want, you can see the details of that benchmark with that QR code there. And that’s kind of to summarize, again, what I just said on the previous slide, but there’s a 45 times better write throughput, a 90% reduction in storage costs, 100x faster queries for high-cardinality data for almost all of the basic functions, and in general 45% faster queries for recent data. So specifically, when we look at data ingest performance, I think the peak was over 4 million points per second, with a data set that has 329 million rows per hour, for example. And so, this is 45 times faster than InfluxDB OSS 1.x. And I don’t know how it compares to 2.x, but I’m sure it’s even better.
Anais Dotis-Georgiou: 17:56
And so, essentially, this kind of summarizes why InfluxDB V3 was written on the Apache ecosystem and written in Rust, and why our investment in that decision was a good one: if you are using any sort of IoT devices, or a fleet of IoT devices, to write your data to InfluxDB, you no longer have to worry about cardinality. So, it used to be in the past that you’d have to worry about, “How am I going to tag this data and differentiate between all my devices? If I want to start including hundreds of thousands of IoT devices, can I tag those?” Well, now you don’t have to worry about tagging those or the cardinality that they contribute, because InfluxDB V3 can handle it. And the same thing with storage performance. We can see vastly improved storage performance as well.
Anais Dotis-Georgiou: 18:51
So now I want to just talk about InfluxDB in the IoT space and just mention that you’ve probably used IoT. So, there’s many products and services that people use every day that are running InfluxDB in the background. So, Tesla, for example, is using InfluxDB behind the scenes in home energy to monitor their batteries. And Nest uses InfluxDB for all their connected thermostats. Disney Plus uses InfluxDB, although it’s not an IoT use case. And Rappi uses InfluxDB for price fluctuations and availability of any of their drivers and riders in developing delivery networks. And so, this is another functional architecture of InfluxDB, specifically for V3, and what it might look like. We also have an InfluxDB Edge version coming out soon. So, if you wanted to, you could potentially write some of your IoT data, a small amount of data, to a lot of different InfluxDB Edge instances and then write those to InfluxDB Cloud, where you would consolidate all of your data from the Edge.
Anais Dotis-Georgiou: 20:05
And with Telegraf, you can do things like configure your input and also parse your metrics in the right way that you need for the different InfluxDB instances. And it’s worth noting, too, because people ask this question a lot, especially if they are coming from previous versions of InfluxDB, that the ingest format is the exact same. So, you still write all your data in Line Protocol, and Line Protocol looks like this. You have a measurement, which is the same thing as a table in SQL. And then you have tags. You used to really have to worry about whether you were going to make something a tag or a field because it contributed to your cardinality. And you really had to spend a lot of effort making sure that your cardinality didn’t get out of hand. But you don’t have to worry about that anymore. Tags and fields are just columns in InfluxDB V3, but you still make tags and fields as a part of Line Protocol. And what I like to tell users is just, “Think of tags as being any metadata associated with your point and fields as being the actual meat of your time-series data. So, if you were monitoring your room and you had a sensor, maybe your tag set would be your sensor and your sensor ID, and your field would be the actual temperature, and then you have your timestamp.” And so, yeah, you can still use Line Protocol to write data to InfluxDB the same way that you would with any other version. And so, nothing’s changed there.
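Following the room-sensor example above, a Line Protocol point has the shape `measurement,tag=value field=value timestamp`. Here is a minimal helper of my own that assembles such a string (the function name and the sample values are invented; real client libraries also handle escaping of spaces and commas, type suffixes for integers, and so on):

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Build one Line Protocol point:
    measurement,tag=value field=value timestamp (nanoseconds)."""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_str} {field_str} {timestamp_ns}"

line = to_line_protocol(
    measurement="home",                     # the "table" in SQL terms
    tags={"sensor_id": "abc123"},           # metadata about the point
    fields={"temperature": 21.3},           # the meat of the data
    timestamp_ns=1701158400000000000,
)
print(line)
# home,sensor_id=abc123 temperature=21.3 1701158400000000000
```

The same string works against V1, V2, and V3 write endpoints, which is the point the speaker is making: the ingest format hasn’t changed.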
Anais Dotis-Georgiou: 21:43
And so now I want to take some time to talk about interoperability with InfluxDB V3. Specifically, a lot of the interoperability that I believe comes with InfluxDB V3 is offered through Parquet, and that’s because Parquet offers interoperability with almost all modern ML and analytics tools. And then, additionally, the DataFusion API supports both SQL and DataFrames. It has a SQL API as well as a DataFrame API. So, in the future, I hope that we can query InfluxDB and return DataFrames directly and query in Pandas as well. And so that will offer even more interoperability than SQL offers just now. And additionally, it’s worth noting that C++, Python, and Java all have first-class support in the Arrow project for reading and writing Parquet. And so, whether you’re reading Parquet files and then writing those to InfluxDB, or querying InfluxDB and easily converting the result to a Parquet file, you can then easily write it to a variety of other tools. Also, if you do query with Arrow Flight specifically, whether you’re querying Dremio or another SQL database or Snowflake or something like that, that code is going to look the exact same for InfluxDB as it would look for those. So that just means that if you need to use a variety of different data stores and they all use Arrow Flight, then you can really easily use that same code for all of those projects. So that’s pretty nice.
Anais Dotis-Georgiou: 23:24
And then you also get interoperability with a bunch of visualization tools. Like I mentioned, Tableau, Athena, Snowflake, as you know, also offer interoperability because of Apache Arrow Flight. Databricks, Spark, so yeah. [laughter] And then I also wanted to mention that there are client libraries for InfluxDB V3, and that assists in utilizing other tools alongside InfluxDB V3. So, we have client libraries for Go, C++, JavaScript, Python, and Java so far. Yeah. So, my favorite client library to use is the Python client library, and that’s because there are a lot of examples for how to write a variety of file formats, including Parquet, but also CSV, Feather, JSON, ORC. So, whatever ETL or DAG tool you’re using, you can really take advantage of using the Python client, for example, here, to query InfluxDB and get the result back as a Pandas DataFrame. Maybe take that Parquet file and then leverage whatever ETL or DAG tool you’re using.
Anais Dotis-Georgiou: 24:42
And here’s an example of using the JDBC driver for Tableau with InfluxDB V3. You can click or use that QR code to see a tutorial of how to do that. And then the cool thing about Tableau, for example, is that it offers forecasting abilities out of the box, and it’ll even automatically analyze the variety of forecasting methods that are available in Tableau, determine which one produces the best forecast, and just give you the best forecast it can. And so, here’s an example specifically with some IoT data where we’re measuring carbon monoxide over time and then forecasting it. But I’ll also talk about some additional projects that we have performed with InfluxDB V3 and some other tools that are open source, so that you can get an idea of some IoT projects that we’re working on so far.
Anais Dotis-Georgiou: 25:39
So, one is Mage and InfluxDB. So, you can look at that QR code as well to get an example of this. So basically, in this demo, we generate some IoT data, and it comes from three machines: machine one, machine two, machine three. And we’re measuring things like the load, vibration, power, and temperature of these machines. And then we send that data to an MQTT broker. And then we use Telegraf with the input plugin called the MQTT Consumer plugin to connect to that MQTT broker, a Mosquitto broker. And then this is what the configuration of that Telegraf agent would look like. And then we send, or write, that data to InfluxDB itself. And from there, we actually use Mage. And you can think of Mage AI as basically the open-source alternative to Airflow. So, it’s an ETL tool. It has a really nice UI that allows you to really quickly build your pipelines and transform your data any way that you want. And so, for this demo, what we did is we used Half-Space Trees to perform anomaly detection on this machine data. And there’s also another tool as a part of the demo that allows you to generate anomalies in that machine data, so that then you can also go ahead and use Half-Space Trees to detect those anomalies and then send a notification to Slack so that you can act on the anomalous data. And it’s all containerized, so it’s a really easy demo to run. And so, yeah, it’s a great example of having IoT data and then leveraging the interoperability that InfluxDB V3 offers to go ahead and actually perform further analysis and transformation of that data.
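The demo’s actual Telegraf configuration isn’t reproduced in this transcript, but a minimal config wiring the `mqtt_consumer` input to an InfluxDB v2-style output might look roughly like this sketch (the broker address, topic layout, org, and bucket names are placeholders, not the demo’s values):

```toml
[[inputs.mqtt_consumer]]
  servers = ["tcp://localhost:1883"]   # Mosquitto broker (placeholder address)
  topics = ["machines/+/metrics"]      # hypothetical topic layout
  data_format = "influx"               # payloads already in line protocol

[[outputs.influxdb_v2]]
  urls = ["https://us-east-1-1.aws.cloud2.influxdata.com"]  # placeholder
  token = "$INFLUX_TOKEN"
  organization = "my-org"              # placeholder
  bucket = "machine-data"              # placeholder
```

InfluxDB 3.0 accepts writes through the v2-compatible API, which is why the `influxdb_v2` output plugin is used here.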
Anais Dotis-Georgiou: 27:39
The one thing that I really appreciate about Mage AI is just how easy their code templates are for creating the pipeline. So, you can just specify whether you want to do event detection or pull data from a specific source or create your own data source, etc., or a transformation block. Whatever type of block you have, you can just really easily put that into your pipeline. It’ll generate a script for you and tell you which part of the script you need to put your specific code in. You do that. Super easy to run and use. So I thought that was really fun. And then another tool that you could also leverage is Quix. So, Quix is also an ETL tool. And there’s another example that you can reference here. So, in this example, we use a Quix plugin, specifically the MQTT plugin, to get data from our IoT devices. And then we send that to a single instance of InfluxDB where we are writing all of our raw data. And then, from there, we use Quix’s actual InfluxDB source plugin. And this plugin allows you to query InfluxDB V3 using Apache Arrow Flight on a user-defined interval. And then it will parse the data automatically into a Pandas DataFrame and publish that to a Quix stream. And then we can do any sort of transformation we want within Quix with that Quix stream. And then we can use a second plugin, called the InfluxDB Destination plugin, within Quix to ingest that DataFrame from a Quix streaming topic and then define its structure, what we want to be a measurement or a tag, etc., from that DataFrame before writing it to InfluxDB.
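To give a flavor of the kind of per-DataFrame transformation step such a pipeline runs, here is a much simpler stand-in of my own: a rolling z-score anomaly flag over a machine-metric column. This is not the Half-Space Trees model from the actual demo, and the column name and thresholds are invented.

```python
import pandas as pd

def flag_anomalies(df, column="vibration", window=10, threshold=2.0):
    """Mark rows whose value deviates more than `threshold` rolling
    standard deviations from the rolling mean. A toy stand-in for the
    Half-Space Trees model used in the real demo."""
    mean = df[column].rolling(window, min_periods=window).mean()
    std = df[column].rolling(window, min_periods=window).std()
    df = df.copy()
    df["anomaly"] = (df[column] - mean).abs() > threshold * std
    return df

# Steady readings with one obvious spike at index 10.
data = pd.DataFrame({"vibration": [1.0, 1.1, 0.9, 1.0, 1.05, 0.95,
                                   1.0, 1.1, 0.9, 1.0, 9.0, 1.0]})
out = flag_anomalies(data)
print(out[out["anomaly"]].index.tolist())
```

In the Quix pattern described above, a function like this would sit in the transformation step between the InfluxDB source stream and the InfluxDB destination, with the flagged rows driving a Slack alert.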
Anais Dotis-Georgiou: 29:31
And the writing side is also compatible with 2.x as well as 3.x. So, you could use a different version of InfluxDB if you wanted to. And so, part of the transformation that was done here was to, for example, use Keras autoencoders for anomaly detection as well as Holt-Winters for forecasting. And then you could visualize both the raw data and the downsampled data within Grafana. And so, there’s a QR code here for that. But also, at the end of this slide deck, I’ll share links as well for all of these projects. So yeah, I highly recommend looking at that. And I just have to say, too, I’ll put in a plug for these solutions. Before, with InfluxDB V2, we were trying to do some of this work, or similar work, with Flux, which was the query language for V2 and also a data processing language, but you really just didn’t have the ability to perform this level of anomaly detection or forecasting. It just didn’t exist within the language. And so, previously, you were kind of forced to stay within the constraints of InfluxDB V2 and keep everything within InfluxDB V2. But now that there’s so much more interoperability with V3, you have the ability to do all this sort of processing with the tools that you want and just integrate InfluxDB into your overall architecture, rather than trying to make InfluxDB the one-stop solution for everything. And so honestly, it’s made building demos like this so much easier.
Anais Dotis-Georgiou: 31:04
And so now I want to take a moment to talk about some IoT use cases. And specifically, we have customers in industrial IoT, enterprise IoT, and consumer IoT. So industrial IoT is usually the area where we’re providing real-time insights and analytics into manufacturing processes. So, we’re collecting, storing, and visualizing sensor data from sensors, devices, and specifically industrial equipment. In enterprise IoT, we’re also collecting sensor data (all of this is going to be sensor data), but we’re doing so from smart buildings, campuses, and smart cities, and we’re enabling business applications to connect with physical objects and enterprise systems. And then consumer IoT is using sensor metrics ingested from DIY projects and smart home devices, things like Home Assistant, for example, to gain cost, operation, and performance insights. We have people that are monitoring their barbecue at home to make sure that they have the perfect smoked ribs, for example. Or I’ve seen people monitor the chlorine levels in their pool, so they know when they have to change those.
Anais Dotis-Georgiou: 32:14
So, I wanted to really focus on some of the bigger customers rather than home automation projects. So, focusing on the industrial and consumer spaces, here are some big names that exist in both spaces. But I really wanted to focus on some of my favorite use cases. So, in the industrial space, we have Texas Instruments, for example, and they’re using InfluxDB to monitor their operations and detect problems before they become costly. They operate at nearly 100% capacity, which is why even a momentary blip can cost their organization a ton of money. And so, they get to really leverage all the really fine-grained precision that InfluxDB has to offer. And one thing that I find really cool about that use case specifically is that Texas Instruments started using it because one person in their company was actually first running a hobby instance of InfluxDB, where they were monitoring different metrics related to their daughter’s health. And so, they liked using InfluxDB at home, and so then they brought it to their workplace.
Anais Dotis-Georgiou: 33:23
And then another use case that I really like is Bboxx. So, what Bboxx does is it develops and manufactures products to provide affordable, clean solar energy to off-grid communities in the developing world. And so Bboxx stands for battery box, and they use InfluxDB to monitor something like 100,000 solar rooftops. So, I love that use case. And they provide solar energy to around 400,000 people across 35 countries. Farm Pulse is another use case that I really enjoy. So, it’s an Australian agribusiness, and it provides end-to-end solutions for acquiring, storing, and reporting remote sensor data. And they use InfluxDB specifically to store data from a variety of on-farm equipment using a variety of sensor technologies. And this company has been using InfluxDB from the get-go, and they use LoRa and satellite connectivity to get data from a variety of different places and overcome all sorts of connectivity issues that you might have with getting sensor data from farms.
Anais Dotis-Georgiou: 34:44
And then another use case that I like is Spio. So Spio provides sensors and a software solution for monitoring green walls. Green wall installations, you might have seen them, are becoming very popular in the corporate world and also in places like hotels, but maintaining them can be kind of complex, so they use InfluxDB as well. But a really popular thing to talk about is PTC, especially for industrial IoT. ThingWorx, as you might know, provides end-to-end capabilities to connect devices and build really robust Industrial Internet of Things solutions. And basically, you can use PTC with InfluxDB and use that open-source platform specifically to handle time series data that’s connected to the ThingWorx ecosystem. And I think, even in version 2, you were able to collect something like 3 million points per second. So, I’m sure, with InfluxDB V3, that duo must be even more powerful, and I’m really excited to see what limits people push it to.
Anais Dotis-Georgiou: 35:58
So now I just wanted to go over some resources. So of course, there’s Influx Community. So, the Influx Community on GitHub is where you can find all sorts of projects related to InfluxDB V3, V2, and V1 and see how you can use InfluxDB V3 specifically with a variety of other tools. It’s also where all of the client libraries are currently being maintained for V3. So, if you want to use them, that’s where you’re going to go as well. As for the Mage demo, whether you want to look at a Mage demo that uses InfluxDB to downsample data or one that performs anomaly detection on data from an MQTT broker, you can see that there. Similarly with the Quix demo, you can look at those resources as well, and there are also corresponding blogs for both the Mage demo and the Quix demo. And I also can’t speak highly enough about the Docs. They’re fantastic. So, if you want to learn how to query with Arrow Flight, for example, in a variety of languages, you can use the InfluxDB Docs and then leverage that code for any other tool that uses Apache Arrow Flight, because the code is exactly the same. You just switch the host and the port [laughter] and the token, and then you’re done.
Anais Dotis-Georgiou: 37:20
And then, as Caitlin mentioned before, I encourage you to look at InfluxDB University for courses on InfluxDB. We have been slow to update it for V3, so that’s the one caveat to keep in mind here. But there are still courses on Telegraf that are super useful, and if you are a V2 user, you can go ahead and use it as well. And if you want to learn more, there’s also, for example, a really cool demo about how to use OpenTelemetry with InfluxDB. It uses OpenTelemetry, Grafana, and Jaeger, and the whole demo is containerized as well, so it’s super easy to run. You can just generate logs, traces, and metrics, send those with OpenTelemetry, and collect them. I encourage you to follow these links to learn more about that demo and run it for yourself.
Anais Dotis-Georgiou: 38:21
So, I think we ended a little early, but that leaves plenty of time for questions. So, I’m going to go take a look at them, and thank you so much. So, you can also use Node-RED for getting sensor data into InfluxDB as well. The reason why I mentioned Telegraf specifically was just because it is another InfluxData product. But yeah, a lot of people use Node-RED instead of Telegraf and highly recommend using it. I don’t really know if there are any specific advantages; I think Telegraf might be a little bit more lightweight, so that might be a reason. You can also build and install Telegraf with just the plugins that you’re going to use and reduce the binary size quite a bit. So that might be the only real advantage that I can think of off the top of my head.
Anais Dotis-Georgiou: 39:15
And then John asks. He has a use case example: “I have a local InfluxDB database acting as a process historian, and I need to replicate to an enterprise-level or cloud-based InfluxDB database. My question is, ‘Is there a straightforward option to store data locally if the connection to the cloud InfluxDB database is lost and then forward data to the cloud from the local database when the connection is restored?’” So, from InfluxDB to InfluxDB, I would probably use the client library. Or you could use Telegraf as well; Telegraf offers buffering and caching capabilities for that very scenario. But yeah, I would probably use a client library and monitor whether the connection to InfluxDB is lost, so that you can make sure to collect those metrics and then write them as needed. Yeah, InfluxDB V3 is available for Docker. Let me find it for you. One second. Oh, wait, I totally take that back. It’s not. [laughter] It’s not, because Edge isn’t available yet. So, the OSS V3 version is not available yet. I should mention that all the examples I talked about in this presentation are Dockerized; every element but InfluxDB is. We’re using InfluxDB Serverless for that. But that should be available pretty soon. I will send a link that describes the future of InfluxDB open source.
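To make that store-locally-and-forward idea concrete, here is a minimal Python sketch. This is not an official InfluxData client feature, just illustrative logic: the `write_fn` parameter is a hypothetical stand-in for whatever call you use to write a point to the cloud instance, and the class simply queues points while that call raises a connection error.

```python
from collections import deque

class BufferedForwarder:
    """Queue points locally while cloud writes fail, then flush
    them in order once a write succeeds again."""

    def __init__(self, write_fn, maxlen=10_000):
        self.write_fn = write_fn            # e.g. a client-library write call
        self.buffer = deque(maxlen=maxlen)  # oldest points drop on overflow

    def write(self, point):
        self.buffer.append(point)
        self.flush()

    def flush(self):
        # Drain the buffer in order; stop at the first failure so no point is lost.
        while self.buffer:
            try:
                self.write_fn(self.buffer[0])
            except ConnectionError:
                return  # still offline; keep the remaining points buffered
            self.buffer.popleft()
```

In practice, Telegraf’s output buffering or a client library’s retry options can give you this behavior without hand-rolling it; the sketch just shows the shape of the logic.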
Caitlin Croft: 41:22
And I would also just shamelessly plug these webinars, because once all that is out in the open, we will definitely be sure to have some content and webinars around InfluxDB 3 open source. And there were a few questions in the chat. Do you want me to read them out to you, or are you scrolling back through?
Anais Dotis-Georgiou: 41:47
Sure, I’m looking through them right now.
Caitlin Croft: 41:49
Okay.
Caitlin Croft: 42:05
So, there’s one here. It says, “Can Telegraf run in your cloud environment, or do I need an external server to run it?”
Anais Dotis-Georgiou: 42:13
You need an external server to run it, but you can use InfluxDB to configure Telegraf, which is pretty useful. But yeah, I would love it if you could go ahead and run Telegraf within InfluxDB Cloud, too. That would be amazing.
Caitlin Croft: 42:38
There’s a bunch of people that are expressing their, shall we say, concerns about moving to InfluxDB 3. And I’m just wondering—based on your experience with the product, what are some things that really excite you about 3.0 and maybe some tips and tricks that can alleviate some of that stress or worry?
Anais Dotis-Georgiou: 43:01
Yeah. I think, for me, I’ve just found it so much easier to use than Flux, for example. I almost exclusively use the Python client. And previously, the main issues people would run into were memory constraints, not having enough memory available, and queries taking a really long time to execute. So, when you’d create any sort of hobby project, it would be okay, but as soon as you tried to make that a more significant project, you were running into a bunch of issues. And Flux in general is not very easy to use. I enjoyed the challenge of using it, but using Pandas and Python, for example, is so much easier. I guess it depends on what concerns people specifically have with V3 and OSS. People who have switched to 1.x while they’re waiting for V3, that makes a lot of sense to me, because you can’t query V3 with Flux. So, for any queries that you make with Flux in V2, if you’re looking to migrate to V3, you’ll run into some hurdles for sure, where you’ll have to translate those. Whereas if you are just staying in 1.x and you’re using InfluxQL, you can still query V3 with InfluxQL, so you don’t have to worry about that.
Anais Dotis-Georgiou: 44:47
So, the write API for V1, V2, and V3 is exactly the same. It all just takes line protocol. You could use any client library or API for writing to InfluxDB V1, V2, or V3. However, you cannot query V3 with Flux because it uses Arrow Flight instead. And so, the API for querying and the query endpoint for V2 and V3 are not the same. In fact, there isn’t a query endpoint for V3 because you’re using the Flight client instead of a query endpoint.
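Since all versions take line protocol on the write side, the payload you build is the same everywhere. As an illustration (this is a hand-rolled helper, not part of any official client), here is the shape of a line protocol point in Python; note the `i` suffix on integer fields and the double quotes around string fields.

```python
def to_line_protocol(measurement, tags, fields, timestamp=None):
    """Build one line of InfluxDB line protocol:
    measurement,tag1=v1 field1=v1,field2=v2 [timestamp]"""
    tag_str = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    parts = []
    for k, v in fields.items():
        if isinstance(v, bool):
            parts.append(f"{k}={'true' if v else 'false'}")
        elif isinstance(v, int):
            parts.append(f"{k}={v}i")    # integer fields carry an 'i' suffix
        elif isinstance(v, str):
            parts.append(f'{k}="{v}"')   # string fields are double-quoted
        else:
            parts.append(f"{k}={v}")     # floats go as-is
    line = f"{measurement}{tag_str} {','.join(parts)}"
    if timestamp is not None:
        line += f" {timestamp}"          # nanoseconds since epoch by default
    return line
```

For example, `to_line_protocol("temperature", {"device": "sensor1"}, {"value": 21.5}, 1700000000000000000)` yields a line you could POST to the write endpoint of V1, V2, or V3 alike.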
Caitlin Croft: 45:33
There’s a question here. “Is a ‘created at’ timestamp more suitable for a tag or a field, or is including additional timestamps considered bad practice?”
Anais Dotis-Georgiou: 45:48
Can you say that one more time please? I think I lost it. I don’t know where that question went.
Caitlin Croft: 45:51
Okay, so it came in from Ian. It says, “Is a ‘created at’ timestamp more suitable for a tag or a field, or is including additional timestamps considered bad practice?”
Anais Dotis-Georgiou: 46:08
No. So in V3, you don’t even have to worry about whether it’s a field or a tag. That’s entirely up to you. So yeah, you don’t have to worry about it. You can definitely include that as another column. And it’s not considered bad practice because, in InfluxDB V1 and V2, you couldn’t really store logs or traces, but you can in InfluxDB V3, and you don’t have to worry about cardinality. The only thing that you have to worry about is that your table shouldn’t be more than 200 columns wide. Or in other words, you shouldn’t have more than 200 tags and fields in a single measurement.
Caitlin Croft: 46:54
Let’s see. Here’s one. “We are using InfluxDB currently, but we’re seeing a lot of resource issues due to unwanted metrics. So, we want to explore if there’s a way to add filters in InfluxDB so that we can only ingest the required data and avoid the unwanted data.”
Anais Dotis-Georgiou: 47:15
So that’s another reason for using Telegraf. There are a lot of options for filtering out unwanted metrics. There are also the execd processor plugins. Let me send a link to those. And someone asked, too, “Can you push data from MQTT to InfluxDB?” You can.
Caitlin Croft: 47:45
While Anais is looking for that, I can answer this question. Someone’s asking, “I’ve been trying to transition to InfluxDB recently. I’m still learning it, though. I would like to ask if there is a way I can run my test using InfluxDB Cloud for free.” The short answer is yes. There is a pay-as-you-go model, so you can definitely start throwing your data in for free. You’ll probably hit some limit, but there is a free pay-as-you-go model.
Anais Dotis-Georgiou: 48:16
So, I just shared the execd processor plugin. So, if you needed to do really fine-grained filtering of certain metrics, you could always use that. It makes Telegraf extensible in the language of your choice, and you could add any sort of processing logic that you needed to say, “Drop these metrics. Only filter those ones, and only write those ones to InfluxDB.” I’ve also used that plugin to create mini-batch forecasts before even writing the forecasts or data to InfluxDB. So yeah, you could do that. But Telegraf also has things like namepass and tagpass and a bunch of different configuration options for passing only the metrics that you want.
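As a sketch of what that filtering looks like in a Telegraf configuration, here is a hypothetical example (the broker address, org, bucket, and measurement names are all made up): `namepass` on the input keeps only the listed measurements, and `tagdrop` on the output discards matching series before anything is written.

```toml
# Hypothetical Telegraf config: ingest from MQTT, keep only wanted metrics.
[[inputs.mqtt_consumer]]
  servers = ["tcp://localhost:1883"]   # your broker address
  topics = ["sensors/#"]
  data_format = "influx"
  # Only pass measurements named "temperature" or "humidity".
  namepass = ["temperature", "humidity"]

[[outputs.influxdb_v2]]
  urls = ["https://your-influxdb-host"]
  token = "$INFLUX_TOKEN"
  organization = "my-org"
  bucket = "my-bucket"
  # Drop anything tagged as a test device before writing.
  [outputs.influxdb_v2.tagdrop]
    device = ["test-*"]
```

Filtering this way means the unwanted series never reach the database, which addresses the resource concerns raised in the question.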
Caitlin Croft: 49:02
So, for people who have been using Flux and have a bunch of Flux scripts, is there a migration path for those scripts to InfluxQL?
Anais Dotis-Georgiou: 49:11
Right now, no. All the developer advocates are happy to help you with any translations. The issue is that not every Flux script you have can be translated directly to SQL or InfluxQL, depending on how complicated it is. And that’s just the sad reality of it. Luckily, though, I use ChatGPT all the time to convert my Flux queries to SQL - and it works famously - or to Pandas as well. So, if I’m doing something with Flux that I can’t do with SQL, then I can easily convert that to Pandas, and then I can go ahead and use any of the tools that I shared with you today to do that sort of processing at a regular interval if I needed to.
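To give a feel for what such a translation looks like, here is a small, hypothetical pair (the bucket, measurement, and tag names are made up), fetching the last hour of readings for one device:

```
// Flux (V2):
from(bucket: "sensors")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "temperature" and r.device == "sensor1")

-- Roughly equivalent SQL (V3):
SELECT time, "value"
FROM "temperature"
WHERE "device" = 'sensor1'
  AND time >= now() - interval '1 hour'
```

Simple range-and-filter queries like this map over cleanly; it is the heavier Flux transformations (joins, maps, custom functions) that may need to move into Pandas or client-side code instead.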
Caitlin Croft: 50:04
And, Ian, if you have any follow-up questions, everyone should have my email address, and I’m happy to put you in contact directly with Anais if you need some help. You might have covered this, Anais, but I just want to ask it. Someone’s looking for a time-series database that can provide additional filtering when ingesting data. For example, there’s some unwanted data which has consumed more resources, and we don’t use this, so we don’t want to ingest it. Does InfluxDB provide this? Would this be something we can use Telegraf for?
Anais Dotis-Georgiou: 50:40
Yeah, so I already addressed that.
Caitlin Croft: 50:43
Okay, cool. I think we’ve covered most people’s questions. Here’s one. If I don’t have a reliable—oh, you’re typing an answer.
Anais Dotis-Georgiou: 51:00
I see it. Yeah. “I don’t have a reliable time sync on my IoT devices. Would it work to let InfluxDB add timestamps as data arrives at the server instead, or will there be any problems with overhead or precision of the timestamp?” No. So, if you don’t provide a timestamp as part of line protocol, InfluxDB will create one on write, and you don’t have to worry about the precision of that timestamp. So, you should be good.
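For illustration (the measurement and tag names here are made up), the two forms of line protocol look like this; the trailing number is a nanosecond epoch timestamp, and when it is omitted the server assigns its own receive time on write:

```
temperature,device=sensor1 value=21.5 1700000000000000000
temperature,device=sensor1 value=21.5
```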
Caitlin Croft: 51:24
Cool. I love it. There are so many questions for you. [laughter] Cool. We’ll stay on here for another minute or two and see if you guys have any last-minute questions, but I really appreciate everyone joining today’s webinar and sticking around for all the questions. If there are any questions that you didn’t get a chance to ask or want to ask Anais directly, once again, everyone should have my email address. You can email me, [email protected], and I can put you in contact with Anais. And you can also find her in the forums as well as the community Slack workspace. If you guys are looking to join the community Slack workspace, it’s just influxdata.com/slack. And this webinar is being recorded and will be made available later today. So, thank you so much.
[/et_pb_toggle]
Anais Dotis-Georgiou
Developer Advocate, InfluxData
Anais Dotis-Georgiou is a Developer Advocate for InfluxData with a passion for making data beautiful with the use of Data Analytics, AI, and Machine Learning. She takes the data that she collects, does a mix of research, exploration, and engineering to translate the data into something of function, value, and beauty. When she is not behind a screen, you can find her outside drawing, stretching, boarding, or chasing after a soccer ball.