Understanding InfluxDB Basics: Tags, Fields and Measurements
Session date: Aug 31, 2021 08:00am (Pacific Time)
Is it a table? No, it is much more! Finally understand tags, fields and measurements.
In this session, you will learn how to answer your real-life questions with data stored in InfluxDB. You will see that InfluxDB is more than a set of tables; it is a window into the world of your data. In particular, tags, fields and measurements, used well, let you answer your questions conveniently and quickly. Discover tips and tricks to use while implementing InfluxDB.
All topics are addressed in the context of IoT monitoring, predictive maintenance and medical applications.
[et_pb_toggle _builder_version=”3.17.6” title=”Transcript” title_font_size=”26” border_width_all=”0px” border_width_bottom=”1px” module_class=”transcript-toggle” closed_toggle_background_color=”rgba(255,255,255,0)”]
Here is an unedited transcript of the webinar “Understanding InfluxDB Basics: Tags, Fields and Measurements”. It is provided for those who prefer reading to watching the webinar. Please note that the transcript is raw. We apologize for any transcription errors.
Speakers:
- Caitlin Croft: Customer Marketing Manager, InfluxData
- Thomas Heid: Senior Consultant, ASTRUM IT
Caitlin Croft: 00:00:01.083 I think we’ll get started here. Once again, hello, everyone, and welcome to today’s webinar. My name is Caitlin Croft, and I’m very excited to have Thomas Heid of ASTRUM IT here today to talk about tips and tricks for using InfluxDB. Once again, this session is being recorded and will be made available tomorrow. We’d just like to remind everyone to please be respectful of all speakers as well as attendees. All right, Thomas, I think we’ll hand things off to you.
Thomas Heid: 00:00:38.027 Yes. Thank you for the introduction, and I’m very excited to have the opportunity to talk here. Thanks for that. I hope I can tell you, the listeners, today (for me it’s the evening; for you, perhaps the morning) something new about tags, fields and measurements. That’s why I chose this title: Is it a table? No, it’s much more. In fact, the title is already a spoiler for the end of my talk; I wanted you to know from the start where this is going to end. But first of all, let me give you an impression of what you can expect from my talk. First, a rough overview of what InfluxDB is and why you could use it if you don’t use it already. We at ASTRUM use it, of course (otherwise, I wouldn’t be here), so I will show you two examples of what we did with InfluxDB. Next comes the more technical part: the key concepts of InfluxDB, so that you understand the global picture of Influx, and then a deeper look at fields, tags and measurements, which is what I really want to tell you about. At the end, I will also run some example queries against a database I prepared, to see how we can access the data and what impact our choices for the database architecture have on our queries.
Thomas Heid: 00:02:47.915 First of all, I want to introduce myself. I’m Thomas Heid, as Caitlin already mentioned. I have a PhD in astroparticle physics, from a neutrino experiment [inaudible] on KM3NeT. What’s important for this talk now is that since 2017 I have worked for ASTRUM IT as a senior consultant, focusing on business intelligence and process automation. As you see below, I run a lot of workshops to gather requirements from our customers, and I provide solutions mostly in Python, of course also using InfluxDB. A few words about the company I work for. We try to improve your business if you are our customer. You do something, and perhaps we can find a way to lift your digital potential, so that new business cases can be seen or existing business cases can be streamlined and made more efficient. Our company was founded 28 years ago and has a lot of experts, 150 in total at three sites, and since 2007 we have completed (here’s the number) 250 projects. So it’s a cool company for me [inaudible] there.
Thomas Heid: 00:04:41.040 Now let’s go deeper into the topic I prepared for you. First, the two projects in which we used InfluxDB. The first project is called ERIK. It is a collaboration between several universities and some mid-sized companies, in which we try to help children on the autism spectrum get better at recognizing emotions and at controlling their own emotions. For that we use robots, so that the child can interact with a robot. While the child is interacting with the robot, some parameters are collected, like parameters for the child’s emotion and excitement; [inaudible] these parameters are collected as a time stream into InfluxDB. Afterwards, as you see in the middle of the picture, there are two use cases. First, a therapist wants to extract insights from one therapy session to decide whether the child has to do something different in the next meetings, or whether we can go forward with the planned therapy. On the other side, there are researchers who want a broader picture: that is the other use case, on all the data, where we need to view all children at once. At the bottom left, we see some dashboards which we made so that the therapist can make these decisions.
Thomas Heid: 00:06:54.922 A project we have in-house is SPX monitoring. In hospitals, police stations or fire departments, for example, not all people work at once, so you have to plan the shifts. For that, there is software written by our partners, and our task is to monitor the servers where this software runs. Here we get a lot of data about the hosts, like CPU usage, GPU usage if there is one, and how many users are logged in; in principle, the health of the system. Based on that, we can generate alerts, which is also a typical use case for InfluxDB: the alert goes to someone who can then run to the server and do something, restore power, put out the fire if there is a fire, or something else. Finally, as we see on the right side of this slide, a PDF report is generated at the end of each month. Every customer can request the report. No employee of ours has to produce it by copying and pasting; it’s automated. The customer chooses which parameters he wants to see in the report and gets it automatically.
Thomas Heid: 00:08:53.631 Now that we know what we at ASTRUM use InfluxDB for, let’s dig deeper into what Influx is and what you can do with it. First of all, InfluxDB is built for timestamped data. Without timestamps, I wouldn’t choose InfluxDB at all; I think that’s common sense. What is also special about InfluxDB is that it’s optimized for huge data volumes via clever indexing [inaudible], which we will focus on later, so that you can group and search by tags; here we have the first mention of [inaudible]. And of course, InfluxDB is made for ingesting data continuously, so hosts can deliver data on the fly, all the time, to be stored in InfluxDB; there’s a great ecosystem around InfluxDB that lets us do that. The next important part of InfluxDB is that a point is defined by its time and by the series it belongs to; [inaudible] we mentioned on the last slide why there was a timestamp. A series, in turn, is defined by a tag combination, which we will look at in detail later. We will take a deeper look at all these key features of Influx as the talk goes on.
Thomas Heid: 00:10:42.454 We also assume, in the context of Influx, that changes to data are very rare. We collect it live, it’s written into InfluxDB, and then it is never changed at all. If it has to be changed, you need some tricks; you can do it, but InfluxDB is not optimized for that use case. Next, aggregating is the most common operation. Imagine you have collected values for a quarter of an hour, and in processing you only want one value for this [inaudible]; that is the most common use case. Another feature of InfluxDB is the retention policy, which means you can tell InfluxDB in your configuration that old data should be deleted. That makes sense if old data becomes less important, which I think is true most of the time: you don’t keep it forever, so the amount of stored data doesn’t get so big. You save a lot of resources, which makes the application cheaper overall. And if you process your data first, you can aggregate it as in the quarter-of-an-hour example from the last slide, and then erase all the individual data points collected inside that quarter of an hour, which reduces the need for resources.
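To make the quarter-of-an-hour example concrete, here is a minimal Python sketch of fixed-window downsampling on in-memory data. It is not how InfluxDB implements it. In Flux the same idea is usually written with `aggregateWindow(every: 15m, fn: mean)`, which by default labels each window with its stop time, whereas this sketch keys windows by their start. The timestamps and values are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from statistics import mean

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
WINDOW = timedelta(minutes=15)  # the "quarter of an hour" from the talk


def window_start(ts: datetime, window: timedelta = WINDOW) -> datetime:
    """Floor a timezone-aware timestamp to the start of its fixed window."""
    return ts - (ts - EPOCH) % window


def downsample(points, window: timedelta = WINDOW) -> dict:
    """Replace raw (timestamp, value) points with one mean per window.

    After this step the raw points could be dropped, e.g. by a shorter
    retention period on the bucket that holds them.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[window_start(ts, window)].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


t0 = datetime(2021, 8, 31, 8, 0, tzinfo=timezone.utc)
raw = [(t0 + timedelta(minutes=m), v) for m, v in [(0, 10.0), (5, 20.0), (14, 30.0), (20, 40.0)]]
print(downsample(raw))  # two windows: 08:00 -> 20.0, 08:15 -> 40.0
```

Four raw points become two stored values. That reduction is what makes a short retention period on the raw data attractive.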
Thomas Heid: 00:12:38.922 Here, too, you can aggregate first, and once you have aggregated, you can also set a retention policy if needed. But of course, you can also keep your data forever if you can afford the resources. The next point: InfluxDB needs input, and we want to see something, so InfluxDB is nothing without its companions. These can be companions delivered by InfluxData, or dashboards and input strategies you developed on your own. So here we have collection. We have some filters, shown as this washing basket. Then we go further towards pre-processing and a processing pipeline. Here is some illumination, and then you hit it with a hammer. Of course, InfluxDB has different tools: no hammer, no saw, no light. But I think you know what I mean. You know best what you want to do with your data; you are the experts on the data you are processing. And I think you can do all of it with the tooling around InfluxDB. Finally, as I showed you in my introduction slides on the two ASTRUM IT projects, we have dashboards showing therapists what’s going on in the therapy, and a report for the administrators of the hosts in SPX monitoring.
Thomas Heid: 00:14:33.088 Here is an overview, a picture taken from the InfluxData website. At the center is the InfluxDB database, which we can access directly to see what’s inside by querying with Flux, which I would say is the most modern language you can use to access the database. In the following slides, you will also see some examples of this language. [inaudible] On the right side, we see Chronograf, a dashboard built directly into InfluxDB, so if you install InfluxDB version 2.0 or later, you get it too. In former versions it was an extra package, but now it’s included. On the left side, we see the input. There is Telegraf, a modular piece of software to which you can add plugins to get information from other databases, from your MQTT devices, or from anything else you want. You can also write custom plugins, so that anything can be plugged in. At ASTRUM we make heavy use of the CSV plugin. For example, a customer who operates wind turbines sends us CSV files over mobile connections, and then we process them further via the CSV plugin. That’s why I highlighted this plugin here.
Thomas Heid: 00:16:37.109 In the lower part, we see that there are also client libraries you can use in your preferred language. My team and I use the Python client library. That’s not a statement about which one is best; I spent most of my programming career in Python, so I use that. So now we have the complete picture. InfluxDB at the center stores the data. Around it are the tools with which we view data from InfluxDB, and on the left side, how we get data into InfluxDB and also how to get it out. Let’s go deeper into the key concepts, starting from what we know from other databases. I think everyone knows some databases. I’m familiar with NoSQL databases and also with SQL databases like MySQL; of course, this is not a complete list of the databases I know. Let’s start on the left side, with the comparisons I usually hear when someone explains InfluxDB to me: which parts of InfluxDB are similar to other databases. We have a bucket, as it is called in InfluxDB. That’s very similar to a database: it’s where we collect data and can access it. Inside it, you can choose a measurement, which is often compared to a table. But I don’t agree with that.
Thomas Heid: 00:18:37.540 I see it more as a collection of series. For most people, a series, as we see in the next line, is a query result, which you could receive with InfluxQL or with Flux, the more modern way. But there, too, I’m not very confident that that’s the right explanation. It’s more a built-in concept to reduce the workload, both for the developer and for the machine itself: in InfluxDB, the tags define the series, and the time orders the points within each series. We will see this in more detail later. That’s what these last three lines are about: tags, fields and time. In traditional databases, all of them would be compared to columns, and we would store each one as a column. But in Influx, there are big differences. Tags are indexed values, which means they are pre-processed in the “stomach” of InfluxDB, so that it’s easier to search for them and group by them. Fields, in contrast, are like the ordinary columns we know from traditional databases: they are not indexed. If you search on them, you have to go through the complete list. That’s where we store more continuous values.
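The difference between indexed tags and unindexed fields can be illustrated with a toy inverted index. This is a simplified model, not InfluxDB’s actual storage engine. The host names, tag keys and CPU values below are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: tags are indexed, the "cpu" field is not.
ROWS = [
    {"tags": {"host": "srv-a", "server_type": "A"}, "cpu": 35.12},
    {"tags": {"host": "srv-b", "server_type": "B"}, "cpu": 50.0},
    {"tags": {"host": "srv-a", "server_type": "A"}, "cpu": 100.0},
]


def build_tag_index(rows):
    """Map each (tag key, tag value) pair to the positions of matching rows."""
    index = defaultdict(list)
    for pos, row in enumerate(rows):
        for key, value in row["tags"].items():
            index[(key, value)].append(pos)
    return index


def find_by_tag(rows, index, key, value):
    # Cost depends on the number of matches, not on the total number of rows.
    return [rows[pos] for pos in index.get((key, value), [])]


def find_by_field(rows, field, predicate):
    # No index for fields: every row has to be inspected.
    return [row for row in rows if predicate(row[field])]


index = build_tag_index(ROWS)
print(len(find_by_tag(ROWS, index, "host", "srv-a")))       # 2
print(len(find_by_field(ROWS, "cpu", lambda v: v > 40.0)))  # 2
```

The index costs memory up front, which is the price the talk later discusses as cardinality.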
Thomas Heid: 00:20:28.050 For me, time is not just a column; [inaudible]. I would say time is the fabric of the data. Without time, data in the sense of InfluxDB makes no sense to me, and I would go for some other tooling. Now I want to go into more depth, with examples from our work at ASTRUM IT. We start with the measurement. I said before that it’s similar to a table, but it’s more than that. Still, you have to organize your data somehow. [inaudible] we chose to make each single server a measurement; everything we measure on that server goes into this measurement. For ERIK, it’s different. If you go [inaudible], we could say that each child is one measurement. But here we ask different questions. We have the two use cases I told you about on the previous slides. First, we want the therapist’s view, where we see one child, but we also have the researchers, who want the bigger picture. That’s why we chose [inaudible] the game which the child plays together with the robot as the measurement, and all parameters we collect there are stored as fields and tags.
Thomas Heid: 00:22:24.617 Now we get into more detail on the fields, which, as I told you before, we want to store in the database. A field always has a key and a value. For example, for [inaudible] servers, the key would be the CPU usage, and the value would be 100%, 50% or even 35.12%. Everything is possible here. If you have many fields, together they are called a field set. The key characteristic of a field is that it can take a wide range of possible values. Here are examples, some of them already introduced on the last slides. CPU usage, I think, is familiar to many who use InfluxDB; more interesting and new is ERIK. There we have the arousal value, which you could describe as the excitement of the child. It’s in a defined range, but it’s continuous; it’s recorded during the therapy, so every other second we collect arousal values.
Thomas Heid: 00:24:00.120 Mimicry, on the other hand, is not collected so continuously. After each question the child answers, the robot recognizes whether the emotion was correct, that is, whether the child can mirror the emotion, and how much of the emotion was correctly mirrored. So here, early in the processing pipeline, we aggregate over parts of the therapy. Now, the next important concept for us: tags. Tags also have keys and values, like the name of the child, for example. “Name” would be the key, and the value is the actual name, like [inaudible]. A combination of tags is called a tag set, and it is very important in the context of InfluxDB because it defines the series. Tags are indexed, which means that you can search and group by them, and they are already prepared in the “stomach” of Influx, which makes searching and grouping by tags much faster than by field values. And as I said, and I want to repeat it all the time, the tag set is the basis for the series returned from the database when we query it.
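Measurement, tag set and field set appear directly in InfluxDB line protocol, the text format used for writes. The Python sketch below formats one point. It covers only the common escaping rules (commas, spaces and equals signs in names; quotes and backslashes in string values). The measurement, tag and field names are hypothetical stand-ins for the ERIK data, not the project’s real schema.

```python
def _escape(text: str, special: str) -> str:
    """Backslash-escape the characters that line protocol treats as delimiters."""
    for ch in special:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field_value(value) -> str:
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'  # strings are double-quoted


def to_line_protocol(measurement, tags, fields, timestamp_ns=None) -> str:
    """measurement[,tag=value...] field=value[,field=value...] [timestamp]"""
    if not fields:
        raise ValueError("a point needs at least one field")
    line = _escape(measurement, ", ")
    for key in sorted(tags):  # the tag set, sorted by key
        line += f",{_escape(key, ',= ')}={_escape(str(tags[key]), ',= ')}"
    field_set = ",".join(
        f"{_escape(key, ',= ')}={_format_field_value(value)}" for key, value in fields.items()
    )
    line += " " + field_set
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


print(to_line_protocol(
    "module_2",
    {"phase": "game", "client": "child_01"},
    {"arousal": 0.42, "mimicry_score": 3},
    1630396800000000000,
))
# module_2,client=child_01,phase=game arousal=0.42,mimicry_score=3i 1630396800000000000
```

Tags are sorted by key before writing. InfluxData recommends this for write performance, and it does not change which series the point belongs to.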
Thomas Heid: 00:25:45.530 Here are some examples. For SPX monitoring, we could imagine the type of CPU as a tag. For the ERIK project, we used tags for participants, so that we can query for a specific participant. More [inaudible] of this therapy: there are relaxing phases, playing phases, phases where the therapist really stresses the child to check how the child reacts to stress, and training phases. It’s a really limited range. Emotions, for example, are also a tag. There are happy, angry and a few more, but in the context of this therapy, there are only six in total. And at the end of the day, we query by these tags, which is why we store them as tags. When you have a question about your data and there is something in it that you group by, you could use a tag for it.
Thomas Heid: 00:27:12.405 Okay. The last words on the previous slide lead us to this slide: [inaudible] measurements and tags sound very similar. They are very similar, but in the queries we will see later there is a difference, and of course, in the “stomach” of the database, there are some differences in the implementation. My advice is this. You have questions; you have data you are exploring and want to know something about. Start from those questions and work toward the architecture of your database. For example, in SPX monitoring, we always ask about one specific server, and that’s why we chose one measurement per server. After that, we have some tags in very crude form. On the other side, there is the ERIK project, where we have a lot of children and want to compare them or see the progress of the whole group across therapies. To summarize: you can make your queries short and readable if you choose a good design and put tags and measurements in the right places. Coming from a developer’s point of view, we always try to achieve clean code, and that means shorter and more readable. I think that’s the most important part. This way you help your developers later, when they write the queries that ask questions of the database and answer questions in real life.
Thomas Heid: 00:29:18.812 Moving on: I have already mentioned series many times in this talk. A series in InfluxDB is, for me, always some kind of time, or timestamps, plus a combination of tags, which defines the series, and of course the values we want to access. That’s the top part of the slide. There is a time column, there are two fields, F1 and F2, and there are tags, indicated by A, B and C. These series are already prepared for querying. I keep repeating it: there is a cache inside InfluxDB which gives you faster access to these series. On the lower part of my slide, you see that there are many possibilities to get series, and if you always ask for the same ones, it can do it faster. On the left side, you can select all records which have the A and B tags in common. The other two show different sets, as shown on previous slides. To summarize: a tag set defines your series, and having these series makes queries faster and [inaudible].
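The following Python sketch shows how a flat list of points splits into series, using the slide’s generic tags A and B and fields F1 and F2. The talk describes a series as defined by the tag set. The InfluxDB 2.x documentation defines the series key as measurement, tag set and field key, so the sketch includes the field key as well. The data is made up.

```python
from collections import defaultdict


def series_key(point):
    """Measurement + tag set (order-independent) + field key identifies one series."""
    return (point["measurement"], tuple(sorted(point["tags"].items())), point["field"])


def split_into_series(points):
    """Split a flat list of points into time-ordered series."""
    series = defaultdict(list)
    for point in points:
        series[series_key(point)].append((point["time"], point["value"]))
    return {key: sorted(rows) for key, rows in series.items()}


points = [
    {"measurement": "m", "tags": {"A": "1", "B": "1"}, "field": "F1", "time": 2, "value": 0.5},
    {"measurement": "m", "tags": {"B": "1", "A": "1"}, "field": "F1", "time": 1, "value": 0.4},
    {"measurement": "m", "tags": {"A": "1", "B": "2"}, "field": "F1", "time": 1, "value": 0.9},
    {"measurement": "m", "tags": {"A": "1", "B": "1"}, "field": "F2", "time": 1, "value": 7.0},
]
for key, rows in split_into_series(points).items():
    print(key, rows)
```

The first two points land in the same series even though their tags were written in a different order. The tag set, not the order of tags, is what counts.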
Thomas Heid: 00:31:03.095 Here, for the first time, I’m telling you that the series concept splits a measurement into multiple smaller tables. Let me wrap up what we have so far. My big advice for answering your questions starts with this one: is time important? If it’s not, look for something other than InfluxDB. What do I want to show? What filters do I need? What groupings? These questions lead you to the right use of tags and measurements. If you don’t want to filter or group by some parameter or value, a field is probably enough; that reduces InfluxDB’s memory usage and makes your implementation cheaper. For measurements, ask: do these fields really need to be side by side, or could you separate them into different measurements? If so, think about doing that; it makes your queries faster. Similar to the last question: what values do you want to compare? If you don’t want to compare some values, why put them into one measurement? Perhaps it’s easier, and you get less trouble with similar tags, when you put them in different measurements.
Thomas Heid: 00:33:00.750 Now that you are familiar with tags and fields, you could ask next: “Hey, why can’t I store everything as tags?” Cardinality is the argument against it when you don’t need a tag, or when the set of values you would store in it is too large, because high cardinality means huge memory usage and longer processing times for your queries. And long processing times don’t make your customer happy. To give you a feeling for how to estimate the cardinality: it’s all combinations of measurements and tags in an InfluxDB bucket. Here I have an example: we have two hosts, for example, two fields, and two tags, server type and color. As you can see, everything is multiplied, so we would have a combination set of eight, that is, two to the power of three. In addition, it depends on all the possible values we can put into each tag.
Thomas Heid: 00:34:38.401 For server type, we can have type A and type B. I won’t go into the server types, but the second example is more familiar to me: I know at least two colors, red and blue, and if you have more colors here, the cardinality increases. Similarly, you can think of the number of customers or clients, as in the ERIK example with the children. Each child is a tag value. At the moment it’s okay to have it as a tag, because we have a limited number of children, 15 in total. It will become much higher; I would expect hundreds. But that’s still okay for a tag, in my view. If you have an online shop with a million customers, though, you should think about whether it’s worth storing that as a tag value or whether you can find a cleverer way. To summarize, there is no real definition of “big”. You have to keep an eye on it. If you run into trouble, think about it; perhaps you’ll find a better solution and can reduce your cardinality.
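Here is a rough worst-case estimate of series cardinality in Python. It multiplies the number of measurements, the number of values per tag and the number of field keys, following the InfluxDB 2.x definition of series cardinality as unique measurement, tag set and field key combinations. Real cardinality only counts combinations that were actually written, so this is an upper bound. Treating the two hosts as measurements and the `customer_id` tag are hypothetical choices for illustration.

```python
from math import prod


def estimate_series_cardinality(measurements, tag_values, field_keys) -> int:
    """Worst case: every measurement x every tag-value combination x every field key."""
    return len(measurements) * prod(len(values) for values in tag_values.values()) * len(field_keys)


hosts = ["host_a", "host_b"]  # hypothetical: one measurement per host
tags = {"server_type": ["A", "B"], "color": ["red", "blue"]}

print(estimate_series_cardinality(hosts, tags, ["cpu"]))         # 8
print(estimate_series_cardinality(hosts, tags, ["cpu", "mem"]))  # 16

# A customer-ID tag with a million values multiplies everything by a million.
shop_tags = {**tags, "customer_id": range(1_000_000)}
print(estimate_series_cardinality(hosts, shop_tags, ["cpu"]))    # 8000000
```

Two hosts, two server types and two colors give 2 × 2 × 2 = 8 combinations for a single field key. Each additional field key multiplies that again.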
Thomas Heid: 00:36:14.464 I thought long about whether I wanted to show something really hands-on here. I finally decided to [inaudible] and show you something live on a database. It’s a bit tricky; I think I shared my complete screen, so you should see my Grafana dashboard for my database. For this example, I chose the ERIK project, where I want to show some Flux scripts. Every Flux script starts with a from() statement where you choose your bucket. You remember from the start of this talk that a bucket is like a database; I called it Erik. Let’s try it: do we already see something? No. It’s important to always set a range here. I think it’s very useful that this is required, because otherwise the whole database would be [inaudible], and that could make your cloud usage really, really big.
Thomas Heid: 00:37:51.271 So we continue with the next line. Flux uses the pipe-forward operator (|>); if you want to know more, look at other webinars, which explain it better than I can. We set a start point here. I set it five years in the past, because for me it’s an old database, but you could also use less. Already we see a result for this range; we have [inaudible]. We are querying, and we already see all our data here, in tables, with our fields: arousal as the field key, the value in the value column, and here, the name of the measurement. And it takes the last five columns here.
Thomas Heid: 00:39:11.540 We can go further. We want to ask some questions, so we filter the data with the filter() function. Going deeper here: we are only interested in the measurement named Module 2, which is a game [inaudible], and only in the field arousal. Let’s see the result. Nothing, because I have a typo here. So we see it’s important to spell it correctly. Now we have only Module 2; I can scroll down here. You see Module 2 all the time, and the field is always arousal.
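As a sketch, the from() / range() / filter() steps from the demo can be assembled as a Flux string in Python, for example before passing it to a client library. The bucket, measurement and field names are taken from the demo as transcribed and may differ in the real database. The helper `build_flux_query` is hypothetical, and it does not escape quotes, so it should only be used with trusted names.

```python
def build_flux_query(bucket: str, start: str, measurement: str, field: str) -> str:
    """Assemble from() |> range() |> filter() as in the demo.

    Names are interpolated without escaping, so only use trusted values.
    """
    return (
        f'from(bucket: "{bucket}")\n'
        f"  |> range(start: {start})\n"
        f'  |> filter(fn: (r) => r._measurement == "{measurement}" and r._field == "{field}")'
    )


print(build_flux_query("Erik", "-5y", "Module 2", "arousal"))
```

`-5y` is a relative Flux duration meaning "five years ago", which matches the range used in the demo. A shorter range scans less data.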
Thomas Heid: 00:40:29.764 Perhaps you already noticed this column here, the table column. You can see it in this Grafana table view. We have a lot of tables here; that is what we call series. Every unique combination of tags defines a series. You are probably not interested in all these series, but this is simply the default behavior of a Flux query. If you don’t want it that way, we can call group() with no columns, and then we get everything in one table. Then we can start building our own queries and group by some columns, with some highlighting (thank you, Grafana). We want to group by phase and see what happens. Submit. Now we have three tables, one per phase: the game phase, a neutral phase where nothing happens, and a relaxation phase, in which the child actively relaxes.
Thomas Heid: 00:42:11.151 Of course, we can also do calculations in a Flux query; for example, we can calculate the mean of the value column. Let’s see if everything works. Yes, now we have a column showing the mean value for game, neutral and relaxed. Perhaps you say it’s not so interesting to have it across all children. We can additionally group by the child, which is called client here. Then the data is separated by client, so we have six series, or tables, whichever you want to call them. Afterwards, you can repeat steps: you can group again, by phase, and do more calculations. Let’s see what happens. Now we have, for each phase, the maximum of the means we first calculated per client.
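The grouping steps in the demo can be reproduced in plain Python to make clear what the query computes. First take the mean per (phase, client), then regroup by phase and keep the maximum. The Flux shown in the comment is our reconstruction of the demo, not a copy of it. The column names `phase` and `client` and the sample values are assumptions.

```python
from collections import defaultdict
from statistics import mean

# Flux being mirrored (a reconstruction; column names are assumptions):
#   ... |> group(columns: ["phase", "client"]) |> mean()
#       |> group(columns: ["phase"]) |> max()


def mean_per_phase_and_client(rows):
    """One mean per (phase, client) group, like group() followed by mean()."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["phase"], row["client"])].append(row["_value"])
    return {key: mean(values) for key, values in groups.items()}


def max_client_mean_per_phase(rows):
    """Regroup the per-client means by phase and keep the largest one."""
    result = {}
    for (phase, _client), value in mean_per_phase_and_client(rows).items():
        result[phase] = max(value, result.get(phase, value))
    return result


rows = [
    {"phase": "game", "client": "c1", "_value": 1.0},
    {"phase": "game", "client": "c1", "_value": 3.0},
    {"phase": "game", "client": "c2", "_value": 5.0},
    {"phase": "relax", "client": "c1", "_value": 2.0},
    {"phase": "relax", "client": "c1", "_value": 4.0},
    {"phase": "relax", "client": "c2", "_value": 0.0},
]
print(mean_per_phase_and_client(rows))
print(max_client_mean_per_phase(rows))  # {'game': 5.0, 'relax': 3.0}
```

Each group() call changes which rows belong together, and the following aggregate runs once per group. That is why the order of group() and mean() matters.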
Thomas Heid: 00:43:47.442 I want to spend a short moment on how this could look in Python. I don’t want to execute it now; I only want to show that it’s possible with the InfluxDB client package, which I highly recommend. Every [inaudible] request is already implemented there, which makes it very easy. Here is some information about the bucket and the organization in which the bucket is defined, plus a token as the access credential. [inaudible] our InfluxDB at ASTRUM IT is secured with this token, of course. Then [inaudible] we define a client to get the connection. For the query, you can take the Flux query from the last browser tab and paste it here. I think it would also work with InfluxQL, if you like, but I recommend using Flux; it’s more modern, and I think it’s the future. Then you use the query API, and there are different kinds of queries. You could use a simple query, which gives you a list of records. With the data frame variant, you directly get a pandas data frame, which makes things easy, especially if you’re working together with data scientists, who are likely familiar with data frames. And then we get the results as data frames.
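Here is a minimal sketch of the two query styles mentioned, assuming the `influxdb-client` Python package for InfluxDB 2.x (plus pandas for the data-frame variant). The URL, token and org are placeholders, and the Flux script reuses the demo names. The import is done inside the functions so the module loads even where the package is not installed. Note that `query_data_frame()` may return a list of data frames when the result tables have different columns.

```python
FLUX = """
from(bucket: "Erik")
  |> range(start: -5y)
  |> filter(fn: (r) => r._measurement == "Module 2" and r._field == "arousal")
"""


def fetch_records(url: str, token: str, org: str):
    """Simple query: returns (time, field, value) tuples from the FluxTable results."""
    # Imported lazily so this sketch can be loaded without the package installed.
    from influxdb_client import InfluxDBClient

    with InfluxDBClient(url=url, token=token, org=org) as client:
        tables = client.query_api().query(FLUX)  # one FluxTable per series
        return [
            (record.get_time(), record.get_field(), record.get_value())
            for table in tables
            for record in table.records
        ]


def fetch_data_frame(url: str, token: str, org: str):
    """Data-frame query: returns pandas data (requires pandas to be installed)."""
    from influxdb_client import InfluxDBClient

    with InfluxDBClient(url=url, token=token, org=org) as client:
        return client.query_api().query_data_frame(FLUX)


# Usage (placeholders, not real credentials):
# df = fetch_data_frame("http://localhost:8086", "my-token", "my-org")
```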
Thomas Heid: 00:45:43.265 So let’s go back to our slides, now that our hands are dirty. In principle, we now know what we can use to dig into the database: Flux and Python. A few closing remarks on the downsides of these [inaudible] decisions made for the architecture of InfluxDB. Updating a point is not as easy as you might first imagine. You can’t change the tag values of an existing point, because the point is defined by its tag set; if you need to, you have to delete the whole point and write it again. So for the use cases I mention at the bottom right, if versioning is a use case for you, or you [inaudible] some change, you could add a tag holding the version of the algorithm. A forecast, for example, could change all the time, and you can add a tag recording when the forecast was made. So one workaround, as I mentioned, is to add tags for the versions. There are also some people who might [inaudible] increase the timestamp instead: the timestamp is always stored, with up to nanosecond precision, so you could increase it by one nanosecond, and it may not hurt your application. So it can work, but from my experience, I wouldn’t recommend it.
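Here is a toy Python model of why "updating" works for field values but not for tags. For points with the same measurement, tag set and timestamp, InfluxDB merges the field sets and the newer field values win. A different tag value, however, creates a separate point. The measurement, tag and field names are hypothetical, loosely based on the forecast example.

```python
def write_point(store: dict, measurement: str, tags: dict, fields: dict, ts: int) -> None:
    """Toy model of InfluxDB's duplicate-point behaviour.

    A point is identified by measurement + tag set + timestamp. Writing the
    same identity again merges the field sets, and new field values win.
    """
    key = (measurement, tuple(sorted(tags.items())), ts)
    store.setdefault(key, {}).update(fields)


store = {}
write_point(store, "forecast", {"site": "wt_01"}, {"power_kw": 10.0}, 1000)
write_point(store, "forecast", {"site": "wt_01"}, {"power_kw": 12.0}, 1000)
print(len(store))  # 1: same identity, so the field value was overwritten

# Changing a tag value is not an update: it creates a second point.
write_point(store, "forecast", {"site": "wt_01", "version": "v2"}, {"power_kw": 11.0}, 1000)
print(len(store))  # 2: the original and the versioned forecast coexist
```

This is the version-tag workaround from the talk: old and new forecasts are kept side by side and told apart by the `version` tag.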
Thomas Heid: 00:47:35.285 So, my advice on using tags, fields and measurements: used well, they make your queries easy, readable, small and fast. That’s why you should think about using tags the right way: use tags for things with a small number of distinct values, and use measurements to make your queries more readable. That makes your life easier. Here is the conclusion. At the top left, I pasted screenshots of the two projects: first the robot, that’s ERIK, and on the left, SPX monitoring, where we produce reports for the hosts that run the SPX toolchain. I think I covered the conclusions well enough on the last two slides. What I would give you is this: focus on your questions. I think most of you are experts in your own domain, so you have questions, and InfluxDB is supposed to support you in answering them; you should not have to support InfluxDB to make it work the right way. And of course, it’s not a table, it’s many tables; in fact, there are series inside the Influx database. Thank you for listening. I see that the attendee list hasn’t gotten much smaller since the start, so I’m happy. Thank you.
Caitlin Croft: 00:49:19.497 Awesome, Thomas. Great job. There are a bunch of questions for you. The first one is: if I want to monitor CPU usage, load average, idle time and other things like that, can I use one measurement and put all of these into different fields? Or is it better to create separate measurements for each?
Thomas Heid: 00:49:40.862 If you want to query them together and show them [inaudible] directly, side by side in a dashboard, I would suggest loading them all at once. That leads you to one measurement for the host, with fields for CPU usage, load average, I/O wait, etc. But if different people ask different questions of the database (perhaps one admin only wants to know the CPU usage, while other users in some other context are interested in I/O wait), then I would go for different measurements.
Caitlin Croft: 00:50:53.455 Perfect. If I'm using a lot of tags in one measurement, can it cause performance issues?
Thomas Heid: 00:51:03.200 Yes. It can cause performance issues because, as we saw on the slide about cardinality, the number of tag values is also a factor in calculating the cardinality. If you have 1,000 different tag values, there is a factor of 1,000. If you only have 10, there is only a factor of 10. So going from 10 to 1,000 values, the cardinality would increase by a factor of 100. And that could definitely harm the performance of your Influx database.
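The factor-of-100 arithmetic can be sketched as follows, assuming the worst case in which every combination of tag values actually occurs, so the number of series in a measurement is the product of the distinct values per tag key. The tag keys `host` and `region` and their counts are made up for illustration.

```python
from math import prod

def series_cardinality(distinct_values_per_tag: dict) -> int:
    """Worst-case series count for one measurement: product of distinct values per tag key."""
    return prod(distinct_values_per_tag.values())

# Hypothetical tag keys: only the number of 'host' values changes
small = series_cardinality({"host": 10, "region": 4})
large = series_cardinality({"host": 1000, "region": 4})
print(small, large, large // small)  # 40 4000 100
```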
Caitlin Croft: 00:51:44.393 Yeah. If old historical data is still valuable, will InfluxDB’s performance decline with excessively large data volumes? Should we consider an archive option such as Snowflake?
Thomas Heid: 00:52:02.089 I'm not so familiar with Snowflake, so I can't really [inaudible] compare that; I can only try to make some comments about the kind of archive I would have in mind. Yes, I would perfectly agree with making some archive. I could imagine some kind of step to put all this old stuff into some other database, if you make some copy operations from time to time. Perhaps every month, you want to copy the previous month to some archive. That's always on the precondition that, for your current questions, you are only working on the current month, for example. Because if some questions concern [inaudible] more than the last month, you should keep that data in your current database. Even if archiving makes the database smaller, it would make answering those questions slower.
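A monthly archive job like the one described needs the time bounds of the previous calendar month. This Python sketch computes them in UTC as a half-open interval (start inclusive, stop exclusive), which matches how Flux's `range(start:, stop:)` treats its bounds. The function name is hypothetical, and how the copy itself is done (for example a Flux query with `range()` and `to()` into an archive bucket, or an export to another system) is left open.

```python
from datetime import datetime, timezone
from typing import Tuple

def previous_month_range(now: datetime) -> Tuple[datetime, datetime]:
    """Return (start, stop) of the calendar month before `now`; stop is exclusive."""
    first_of_this_month = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if first_of_this_month.month == 1:
        # January: the previous month is December of the previous year
        start = first_of_this_month.replace(year=first_of_this_month.year - 1, month=12)
    else:
        start = first_of_this_month.replace(month=first_of_this_month.month - 1)
    return start, first_of_this_month

start, stop = previous_month_range(datetime(2021, 8, 31, 15, 0, tzinfo=timezone.utc))
print(start.isoformat(), stop.isoformat())
```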
Caitlin Croft: 00:53:32.045 Okay. Is it possible to group by a field value? We have a special tag where the number of possible values is around 30,000, up to 40,000. Based on your advice, it should more likely be a field instead of a tag. But we need to be able to group by this value.
Thomas Heid: 00:53:58.263 At the moment I'm not sure if you can group by a field value, but I would recommend avoiding it if possible. Perhaps my advice: with 40,000 values, you could try whether it works for you, accepting some delay in performance. If it works, you could go for a tag; if not, perhaps it's wiser to make it a field, search for it, and build some logic around that. I have [inaudible] other examples where the first guess was - I think you have thought about that a lot. But I could imagine that one could see some hierarchy in that [inaudible]: you can make multiple tags out of this one tag, so that the grouping gets easier, and that could be a solution to that question.
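The "hierarchy" idea can be sketched in plain Python, assuming a hypothetical ID format `plant/line/machine`: the single high-variety value is split into three tags, and results are then grouped by the coarse `plant` level. The ID format, tag names and readings are invented for illustration. Note that if every full ID is still stored, splitting alone does not reduce the number of series; the gain is that queries can group by a coarser tag with far fewer values.

```python
from collections import defaultdict

def split_id(device_id: str) -> dict:
    """Split a hypothetical 'plant/line/machine' ID into three tag values."""
    plant, line, machine = device_id.split("/")
    return {"plant": plant, "line": line, "machine": machine}

# Hypothetical readings keyed by the original fine-grained ID
readings = [
    ("berlin/l1/m001", 3.0),
    ("berlin/l1/m002", 5.0),
    ("berlin/l2/m003", 4.0),
    ("munich/l1/m004", 6.0),
]

# Group by the coarse 'plant' tag instead of the full ID
totals = defaultdict(float)
for device_id, value in readings:
    totals[split_id(device_id)["plant"]] += value

print(dict(totals))  # {'berlin': 12.0, 'munich': 6.0}
```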
Caitlin Croft: 00:55:23.036 Okay. Is there a way to use delta window functions where the values are counters and ever-increasing, and we need to find the delta between the values of different data points?
Thomas Heid: 00:55:41.249 Yes, as far as I know, you can make calculations with Flux to calculate the differences between rows. So I would say it's possible.
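In Flux this calculation is what the `difference()` function is for, and it has a `nonNegative` parameter intended for counters. The plain-Python sketch below only illustrates the arithmetic on a hypothetical list of counter readings; its reset rule (a drop means the counter restarted from zero, so the delta is the new reading) is an assumption and not necessarily Flux's exact behaviour.

```python
from typing import List

def counter_deltas(values: List[float]) -> List[float]:
    """Delta between consecutive readings of an ever-increasing counter.

    If a reading is lower than the previous one, the counter is assumed to
    have been reset to zero, so the delta is the new reading itself.
    """
    deltas = []
    for prev, curr in zip(values, values[1:]):
        deltas.append(curr - prev if curr >= prev else curr)
    return deltas

print(counter_deltas([100, 130, 175, 10, 40]))  # [30, 45, 10, 30]
```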
Caitlin Croft: 00:56:02.896 Right. Someone asked if they can receive the presentation. Yes, the recording as well as the slides will be made available later today or tomorrow morning. So be sure to check the registration page again; it'll be available there as soon as possible. I know lots of people like to go back and watch these again. Please feel free to post any more questions you may have for Thomas. I thought that was a great presentation; it was awesome watching you pull up the InfluxDB Data Explorer and do a Flux demo in real time. So if anyone's interested in learning more about Flux, and about how Thomas did that, once again, we do have the Flux training as part of InfluxDays coming up in October. You'll become very familiar with InfluxDB Cloud, as well as Flux, which is our querying and scripting language. The recording will be available on the registration page, so just check back tomorrow morning. Just pretend like you want to register for the webinar again, and you'll find the recording. And of course, all of you should have my email address, so you can always email me if you can't find the recording; I'm always happy to send it, as well as the slides, to you. All right, well, thank you, Thomas, I think it was a great presentation. Thank you, everyone, for joining today's webinar. Hope to see you at future events and hope to see you at InfluxDays.
Thomas Heid: 00:57:50.282 Thank you for listening. And thank you for having me.
Caitlin Croft: 00:57:54.220 Thank you, and I hope everyone has a good day.
[/et_pb_toggle]
Thomas Heid
Senior Consultant, ASTRUM IT
Thomas Heid is Senior Consultant at ASTRUM IT. He is focused on exploring and innovating data processes resulting in customized solutions.
In 2018, he earned his PhD in Neutrino Astronomy with a thesis on the evaluation of sensor data and the development of algorithms specifying the sensitivity. This research was done within the European collaboration KM3NeT. Today, Thomas is passionate about building solutions that enable users to focus on their strengths in their daily work. To fulfill this vision, Thomas leads different projects with consultants and software developers. Outside the office, Thomas enjoys competing in track and field, especially in combined events.