How to Improve Renewable Energy Storage with MQTT, Modbus, and InfluxDB Cloud Dedicated
Session date: Sep 26, 2023 08:00am (Pacific Time)
ju:niz Energy provides large-scale energy storage systems that deliver grid services and enable trading flexibility in the energy market. In addition, ju:niz Energy develops intelligent energy management systems that control and optimize the operation of the battery storage systems. ju:niz Energy collects thousands of data points every second about battery health, climate, temperature, etc. Their tech stack includes Telegraf, Modbus, MQTT, Grafana, Docker, AWS, and InfluxDB. Discover how they are using InfluxDB Cloud Dedicated, the purpose-built time series database, to collect sensor data from their batteries and enable better energy consumption analytics.
Join this webinar as Ricardo Kissinger dives into:
- ju:niz’s approach to industrial IoT monitoring - including how they got rid of legacy Python scripting
- Their methodology to improve sustainability practices across Germany
- The importance of using time-stamped data to enable predictive maintenance
Watch the Webinar
Watch the webinar “How to Improve Renewable Energy Storage with MQTT, Modbus, and InfluxDB Cloud Dedicated” by filling out the form and clicking on the Watch Webinar button on the right. This will open the recording.
Here is an unedited transcript of the webinar “How to Improve Renewable Energy Storage with MQTT, Modbus, and InfluxDB Cloud Dedicated”. This is provided for those who prefer to read rather than watch the webinar. Please note that the transcript is raw. We apologize for any transcribing errors.
Speakers:
- Caitlin Croft: Director of Marketing, InfluxData
- Ricardo Kissinger: Head of IT Infrastructure and IT Security, ju:niz Energy GmbH
Caitlin Croft: 00:00:00.328 Hello everyone and welcome to today’s webinar. My name is Caitlin, and I’m joined today by Ricardo who’s joining us from ju:niz in Germany, and he’s going to be talking about how they are using InfluxDB Cloud Dedicated. So very excited to have him here to talk about his experience with InfluxDB. Please post any questions you have throughout the webinar using the Q&A which you can find at the bottom of your Zoom screen. It is being recorded, so the recording and the slides will be made available later today or tomorrow morning. And don’t be shy, love to hear from you guys. Love to hear some questions from you as well. So without further ado, I’m going to hand things off to Ricardo.
Ricardo Kissinger: 00:00:48.518 Thank you. Hello. Nice to have the chance to speak to you all and to present our company, and also Influx and the latest developments of the InfluxDB Cloud Dedicated cluster system. I want to start with a short introduction of the company name. We rebranded this year. Originally we were GreenRock Management GmbH with Smart Power GmbH and Green H2. We were a corporate group, and we reunited under one name. That’s why we are all called ju:niz now, with ju:niz Energy, which is my home basically. So it’s the energy player with all the battery storage systems. And the real estate arm also still exists, as ju:niz Immobilien. And I’m here today to talk with you about renewable energy storage systems, how to connect them with Modbus and MQTT, and how to get all of your data into the InfluxDB Cloud.
Ricardo Kissinger: 00:01:57.474 Maybe a few words about me before I start with the slideshow. I’m the head of IT security and infrastructure at ju:niz Energy GmbH. I started with them last year in September, and I was able to actually roll out the complete Influx stack on our plants. I’m showing you now all of the details. So after the introduction round, I’m talking a bit more about ju:niz Energy. I’m showing you the Generation 1.0 and 1.5 plants where we basically have all the batteries. And the second part of [inaudible] is the more techy part, which I guess you all are keen on. All right, so the core of the business of ju:niz Energy is basically that we have large-scale storage systems which operate in a grid-serving and economical manner. It’s a decentralized energy supply, renewable energies preferred, with battery storage and hydrogen for district areas, which is key to getting the world to a renewable energy place.
Ricardo Kissinger: 00:03:11.708 And we basically make software for intelligent energy management systems that control the batteries and also interact with other parts of the system, and in future also hydrogen production. The energy portion of our company was founded in 2014. We currently have around 66 employees and 145 megawatts of installed capacity, mainly in Germany. This is a short overview of our plants in Germany. You can see they are spread mainly in the south and east, with some also towards the north of Germany. And you can also see the capacity values of those plants.
Ricardo Kissinger: 00:04:04.417 Good. So the showcase of Generation 1.0 plants. These are basically the already existing Generation 1.0 BESS systems, which were already deployed. They have been working for multiple years now; they were deployed within the last three years. They have starting capacities of 150 kilowatt-hours and an average of 1 megawatt-hour, up to 11 megawatt-hours of capacity, using mainly Samsung NMC batteries and second-life batteries, for example from Daimler. They are mainly used for primary frequency control, peak shaving, safe supply of electric energy, and reduction of the [inaudible] grid fees. Those are basically also smaller plants. You can see we had plants below 5 megawatt-hours for peak shaving and for optimization of industrial energy usage, while frequency-reserve and trading plants have more than 10 megawatt-hours. A few examples: Garching is close to our office and is one of our research projects (Forschungsprojekte). It was completed back in 2016. It has a power of 1.4 megawatts, a capacity of 1.2 megawatt-hours, and it basically serves the grid. And since it’s close to the office, we can actually use it for testing scenarios.
Ricardo Kissinger: 00:05:46.658 Then we have Juwi Schmölln, which was one of the first German innovation projects to combine wind turbines with local energy storage, so that the energy isn’t lost and the wind turbines don’t need to be turned off. With a capacity of 3 megawatt-hours, it was completed last year. But still, it’s one of the first German innovation projects in this area. SMAREG 1 was finished in 2020. It’s one of the bigger ones, with already 11 megawatt-hours, and mainly Samsung batteries. VWEW Leinau, for example, is a smaller one with 3.7 megawatt-hours of capacity, but it is also able to handle grid requests, and the customer itself uses it for peak [inaudible] and to reduce energy costs. Then the more interesting and more modern energy storage plants: currently, we have seven of those plants being deployed or already online.
Ricardo Kissinger: 00:06:57.676 They have a starting capacity of 11-megawatt hours with an average of 23.7-megawatt hours. The biggest one is 67-megawatt hours and altogether close to 200-megawatt hours of capacity and mainly using Samsung NMC batteries and mainly for primary frequency control and reduction of grid freeze. So those are not industrial energy storage plants, but really for a grid service. Yeah. So Wartburgspeicher, which is our biggest project, has a capacity of 60-megawatt hours, and it’s one of the biggest ones in East Germany and also one of the bigger ones from Germany. SMAREG6, for example, is a good example. We also deployed Starlinks because some of the plants are far away from any network and internet connections. And with Starlinks, we were able to still get the data into the cloud and connect them. And this is for example a 24-megawatt-hour project which is currently in closing phase. So we are going to complete it until the end of this year.
Ricardo Kissinger: 00:08:14.759 Good, so now the techy part of the [inaudible]. So I will give you a short overview of the historical situation. So what did I find when I was starting with ju:niz Energy? What were the methods used in the old plants to collect data, to harvest data? Then I will speak with you about our journey to the InfluxDB Cloud v1 cluster, which started earlier this year. Then we did switch over to InfluxDB Cloud Dedicated and off to a shiny new cluster and a shiny new start. That was, I would say, three months ago. And then I will speak with you about and show you some examples of how we connected MQTT devices and sensor data into the cluster. And then also how to get Modbus data into the cluster.
Ricardo Kissinger: 00:09:05.960 Good. So the historical situation: we had a centralized monitoring server. It’s hard to call it a cluster. It was basically a rented V-server running InfluxDB OSS 1.8. It didn’t really have a backup solution, it had limited resources, and due to the nature of being a rented V-server, it was also hard to actually do hardware and performance upgrades. Also, the Ubuntu server installation was getting close to end of life. So we had to do something; we had to replace it. We actually wanted to have a safe space for our data, so we spoke with InfluxData about getting one of their clusters. And we really had a lot of limitations in the legacy solution. We had unreliable syncing of data between the Edge locations and the centralized monitoring; more on that with some live examples later in the presentation. We had storage concerns, which led to the use of retention policies to basically throw data out. So we defined retention policies to get rid of data after a few months and just keep some data points to reduce storage. But that also made it hard to actually see what happened in the past, because you don’t have all the details you want to have, which also ended up with less precise information about battery health and battery cycles, which is the key to a reliable plant.
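For context, this is what the kind of retention policy he describes looks like in InfluxQL on an OSS 1.8 server. The database name, policy names, and durations below are illustrative assumptions, not taken from the webinar:

```sql
-- Cap disk usage by keeping only ~6 months of raw data (at the cost of history)
CREATE RETENTION POLICY "six_months" ON "plant_data" DURATION 180d REPLICATION 1 DEFAULT

-- Keep a sparser downsampled copy for longer
CREATE RETENTION POLICY "two_years" ON "plant_data" DURATION 730d REPLICATION 1

-- Continuously roll raw points up into hourly means in the longer policy
CREATE CONTINUOUS QUERY "cq_1h_means" ON "plant_data"
BEGIN
  SELECT mean(*) INTO "plant_data"."two_years".:MEASUREMENT
  FROM /.*/ GROUP BY time(1h), *
END
```

This is exactly the trade-off he mentions: storage stays bounded, but the per-second detail needed for battery health analysis is gone after the raw window expires.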
Ricardo Kissinger: 00:10:46.994 So that’s why we had to come up with something. Also, the InfluxDB OSS 1.x release had some limitations. Edge Data Replication, which is a really reliable way to get data from any Edge location into the cloud, didn’t exist yet. So there were Python scripts used in the past to resend data, but that wasn’t really working stably. It was not really possible to use the Flux language, and there wasn’t an integrated UI; it was CLI only. And we were not able to do automated tasks for data processing, which also resulted in cronjobs using Python in the past to do some data processing in [inaudible], which now gets handled directly, natively in Influx. The authentication wasn’t token-based; it was user-based, which was less secure. And the import-export functionality was also somewhat limited.
Ricardo Kissinger: 00:11:56.251 So we went over and tried to go to InfluxDB Cloud v1. We had a cluster set up by the Influx team with implementation of snapshots, and backups, and all the good stuff, and also multiple data nodes and multiple compute nodes to allow a reliable server cluster infrastructure. Also, we had prevention of data deletion due to storage issues, since that wasn’t an issue anymore. We got a more reliable data sync into the cloud, especially with the Generation 1.5 plants. We were able to directly use Influx OSS 2.x, which had Edge Data Replication built in. And the requirement was obviously also to migrate all of our existing data from the OSS instance into the Influx cloud. And we got along with it. So we started with it. We implemented Kapacitor tasks to send data from Generation 1.0 plants into the cloud. However, we also had some trade-offs, which I will talk about in a few minutes. And we got rid of storage issues and concerns.
Ricardo Kissinger: 00:13:24.911 However, that was the good news; we also had some not-so-good news, or some headaches, during the migration phase. Due to the nature of our legacy infrastructure, it wasn’t possible to utilize the InfluxDB 2.0 tools to send data, to push data, to make backups, to create snapshots, and so we had to compress line protocol files and import them over the Internet, and that failed a lot of times. We had a lot of headaches with it, and the Influx support team had a lot of tickets and fun with me during that setup period. The Kapacitor tasks, which were running like a cronjob, caused other issues. So we had data now; however, it was coming in only every 10 minutes. We got some data, but that is not real-time data synchronization, which doesn’t allow you to set meaningful alarms in Grafana. Well, you can still set them, but if you only get data every 10 minutes, it doesn’t make much sense to base your event-based alarms in Grafana on this.
Ricardo Kissinger: 00:14:35.632 And also, Kapacitor wasn’t resending data if it failed the first time, which especially with the Generation 1.0 plants was a huge issue because they have really unreliable Internet connections. And yeah, as you can see from the screenshot, also, the cluster load is spiking, and the cluster had a hard time to actually keep up, which is especially a pity since it was not even having all of the legacy plants writing into it, and the Generation 1.5 plants were just starting basically to interest data. Here are some more screenshots. This is basically the load of the CPU, and actually, the CPU and the memory load was always around 70% to 80%, which, of course, wasn’t the goal and wasn’t really what we expected and neither was that what Influx team expected. And yeah, because of those bottlenecks, and also some of the limitations of the Kapacitor integrations, yeah, it didn’t really fulfill the job the way we needed it to. That’s why we spoke with the Influx team to see if we can find a solution for that.
Ricardo Kissinger: 00:15:53.947 And yeah, actually there’s a solution for that. So it’s called InfluxDB Cloud Dedicated, which is basically the new Influx version. And yeah, so as you can see here on the screenshots, already, the loads of the CPU and the memory isn’t as high anymore as it was before. We did investigate why the issue was caused initially, and the Influx development team did dig out that we had a lot of inconsistent data across our Influx databases, which isn’t an issue if you are an Influx v2. It’s, in theory, not an issue if you’re on Influx v2. However, it’s just not ideal. And especially it’s not ideal because it was a bug in our Python scripting which caused that partially the data was written inconsistent with different data types depending on the day. One thing, however, which was possible was that we were able to get rid of retention policies since storage isn’t a concern anymore. Getting data into the cloud also did work more reliably, or did work better because we were able to just use — Influx v2 writes into it. And partially I’m reprocessing data because of the inconsistencies using Telegraf. And if I’m rewriting the data with Telegraf, I’m also able to use the EDR process again, so I’m able to locally ingest the backup of my Influx OSS databases and clean them up, and then directly use EDR to pump the data back into the cluster.
Ricardo Kissinger: 00:17:49.396 And that way it was possible, without help of the Influx Support Team, to actually get over a lot of data already. I’m still in the process of normalizing data, I mean, as you can imagine if you have multiple years of data. So we have data which is written every second up to a maximum every five seconds, and we have a lot of data points. For example, on SIMATIC 4, we have two and a half thousand data points, which get written into the database per second. And the old plants don’t have the one second resolution, but they use a five-second resolution. But still, you can imagine if you have 1,000 data points every five seconds and they are in consistent need to be rewritten, it just takes some time. So I’m not finished with all of [inaudible] migration yet, but yeah, I’m not afraid of finishing it up since it just works like a charm especially since the cluster is not having a hard time anymore. Even though now all legacy plants ingest their current recent data in the cluster, basically, it was a relief. And actually switching over to Influx v3 did help us a lot here.
Ricardo Kissinger: 00:19:11.291 Good. So to give you a bit of an introduction to the EDR process for Generation 1.0 plants: since I obviously wanted to get rid of the Python scripting, the Influx support team pointed me to a possibility. You can set up Telegraf pretending it’s basically an InfluxDB v1 listener, picking up the data and then sending the data into a local Influx OSS database, and then use the EDR process to sync the data to our cluster, which gets rid of the issue of getting data only every 10 minutes. And it’s a reliable mechanism to resend failed data. Especially nice is that we basically have a docker-compose container setup, which is like a Swiss Army knife. I can install it in parallel to the existing InfluxDB v1 Edge installations. I can switch the Influx v1 integration off if I need to and then just activate the new one. But if it’s needed, and during the implementation phase it sometimes was, I can also easily switch back without the need of reinstalling any packages, without the need of migrating data locally. And basically, within two minutes, I can switch out the integrations, which is really nice.
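A minimal sketch of that setup, with Telegraf impersonating the old v1 write endpoint and forwarding into a local OSS 2.x instance, which then replicates to the cluster via Edge Data Replication. Ports, org, bucket names, and tokens are placeholders:

```toml
# telegraf.conf on the edge server
# Accept InfluxDB v1 /write requests from the unmodified legacy plant software
[[inputs.influxdb_listener]]
  service_address = ":8086"

# Forward everything into the local InfluxDB OSS 2.x instance (on another port)
[[outputs.influxdb_v2]]
  urls         = ["http://localhost:8087"]
  token        = "$LOCAL_TOKEN"
  organization = "juniz"
  bucket       = "plant-data"
```

The replication itself is configured once on the local OSS 2.x instance with the `influx` CLI (IDs and URL below are placeholders):

```shell
# Register the remote cluster, then replicate the local bucket to it.
# EDR queues writes durably and retries, so flaky edge links are tolerated.
influx remote create --name cloud \
  --remote-url https://cluster.example.influxdb.io \
  --remote-api-token $CLOUD_TOKEN --remote-org-id <org-id>
influx replication create --name edge-to-cloud \
  --remote-id <remote-id> --local-bucket-id <bucket-id> \
  --remote-bucket plant-data
```

Because the legacy software still writes to port 8086 either way, switching between the old v1 instance and this Telegraf listener is just a matter of which container binds the port, which is the two-minute swap he describes.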
Ricardo Kissinger: 00:20:50.794 And yeah, since it is using now the EDR process from Influx, also, we have more [inaudible] data. So I will show you later some dashboards. The old monitoring server really had issues and had lost data every minutes like in an hour, a couple of times, and then it’s just vanished. And then it’s all gone. We don’t have any of those issues anymore. And since we were also able to utilize the Influx CBOs as tasks, it is now possible for us to write all of the data required to end up in Influx in the Influx stack environment and get all we need. All right.
Ricardo Kissinger: 00:21:36.788 So for gathering data with MQTT, so we have the requirements in the construction phase of the plants to also do recording of the climate in the rooms where the batteries are sitting. And we need to basically lock the humidity and the temperature for the batteries for warranty. And due to the nature that this is just a temporary system, we needed to have a solution which can be implemented quickly with low costs — and yeah, basically quickly. So the solution was to buy some LAN temperature sensors which report the data via MQTT. We are gathering the data from MQTT using Telegraf, and then we’re sending the data from Telegraf to our local InfluxDB OSS installations, which then uses EDR to send the data to our InfluxDB Cloud Dedicated. And then we can set up Grafana to have visualization and alerts also for the temporary tool logging, which really is helping a lot and especially if you consider those batteries are worth hundreds of thousands of Euros.
Ricardo Kissinger: 00:22:59.839 And basically, we were able with like $500 per project, to have a temporary temperature recording and to integrate it into our Influx cluster and have all of the nice things from Grafana as well with alarms and so on, which really is nice. Oh, yeah. So this is some details of how it’s actually done, the setup from Telegraf. So you have to define like an input plugin for the MQTT consumer, and you basically define the topics which you want to record or you want to watch out for, and then you basically just define the data structure. In this example, it was a JSON using XPath, so I was able to get the data out of that JSON using this configuration screenshots you see, and the only portion which is missing is basically the output plugin to Influx v2. But yeah, it’s not more than that. That’s all the magic which is required to actually get the temperature data, humidity data from all of our sensors, yeah, which is really great. I mean, it’s amazing how easy it was for us to integrate it. And yeah, how important it is.
Ricardo Kissinger: 00:24:22.700 I mean as said, it’s batteries worth 10,000 of Euros, and yeah, this small solution does actually keep the warranty of the batteries. Good. Gathering data with Telegraf from Modbus, so this is the setup of our Generation 1.5 plants, basically. That’s why we also have a short system architecture overview. So we basically have IMS SPS controller which is the heart of — sorry, which is the heart of each plant, and it has all of the logic in it. It basically has all of the data from batteries to Inverters, to climate systems, to the fire alarm detection systems that basically gave us all of the data. We query the data from this machine in a one-second resolution from our plant server, which is a Linux server. And yeah, the Telegraf process writes it into our local InfluxDB OSS database, and then we are using the Edge data replication method to transfer the data into the InfluxDB cluster which runs on AWS. And yeah, the great thing is basically that we were able to implement and to use Telegraf like a native Influx tool here to read out data via Modbus from the SPS controller. It’s also possible for me to read out monitoring data from inverters and other systems parallel.
Ricardo Kissinger: 00:26:13.562 And the cool thing is it’s a no-code solution. Basically, it allows agile development and deployments. And because of the VDR process taking care of sending the data reliably, I don’t need to think about it, which really is great. And besides me thinking about it, also, we had a solution partner from Influx which helped us a lot, which basically did a lot of the initial work. It’s B1 Systems, which is one of the official solution partners from Influx. And they are located in Germany. They have multiple locations around Germany from Berlin to Cologne to Dresden to Jena. And they are really experts in handling a lot of questions, issues, and topics around Influx, around Linux systems, around cluster high availability, about private and public clouds. And it’s really great to work with them. And yeah, without their great support, it wouldn’t have been possible to actually implement all of those cool things. So that’s why I wanted to basically have this short advertisement for them in my presentation.
Ricardo Kissinger: 00:27:31.306 Going back to the Telegraf configuration for getting the job done, so what you can see here is basically an excerpt of a [inaudible] Telegraf configuration. It shows you basically the ease of integrating Telegraf [inaudible] data. So we define the measurements where we want to write in the data. We define the name of the field which we want to use. We define the type. What is it? Is it integer float, the scaling of the data which comes in, and the register address. And yeah. So this low-code solution basically is really great because, yeah, it isn’t having any dependencies for modifications in the SPS or the main control unit.
Ricardo Kissinger: 00:28:24.637 So I can do all of those tasks and work parallel to the normal plant working. And also I can, depending on plant size, create multiple Telegraf config files, which allow parallel data querying, which also is really great because I don’t need to think about how can I arrange one file to be super optimized and performant if Telegraf just takes care of handling it. If I need multiple files, or if I have multiple machines which I need to connect to, I also need to do and use multiple Telegraf files. And Telegraf does really a great job of just handling that, and you don’t need to think about it. All right. Yeah, and all of the data, in the end from the machines, from the plant gets visualized in Grafana.
Ricardo Kissinger: 00:29:17.917 It’s also possible with the new Influx release to actually connect systems like Power BI. I haven’t got into that portion yet, so I can’t show you anything about it. But this is one of the next steps that we also integrated into like a BI, business intelligence systems. And yeah, we are using both dashboards for automated reporting, and yeah, we try to use standardized dashboards to have quicker deployments and connecting new projects into the [inaudible] quicker, which especially is needed since the Telegraf implementation of a new plant is done within half a day. So also that’s why we wanted to standardize the Grafana portion to have it at a similar deployment speed compared to the Telegraf Influx 1.
Ricardo Kissinger: 00:30:10.413 The cool thing what we were able to do now with this new cluster and without worrying about query performance, we were able to redefine Grafana alerts. We did basically start with multi-dimensional alerts in the legacy Grafana system. It wasn’t using those multi-dimensional alerts. It also was like a rather old Grafana with a lot of those cool new features in the alarm section, which was missing. Yeah. So we are now able to have also super time-based conditions for the alarms. We are able to have greater context reducing false-positive alarms. I also did work on an integration through Opsgenie to allow automatically generation of tickets. So with the legacy plants, there wasn’t automated ticket system integration for Grafana alerts. So we did get emails with alerts from Grafana or from the plants directly, and then somebody did manually create the tickets. And with this new system, we are now able to actually automatically create those tickets for alarms. For instances, we are able to have on call scheduling and routing rules, and we also have SMS and phone call notifications in case of emergencies, which wasn’t possible before. And yeah, since we can now automate the creation of tickets, it also streamlines our incident management workflow, and it also reduces the workload of everybody working with the plants and the systems.
Ricardo Kissinger: 00:31:53.168 Here’s an example of showing one of those multi-dimensional alerts. I really can recommend that you make sense of them because they allow you to really have a nice way of notification to you without the need of setting up for each individual technical unit an alarm, but you can just make one query, and then it can basically alarm multiple technical units or other systems, which really is great. So I’m switching over now and give you a short overview of some live dashboards, and I will also quickly walk you over the alarm example. Just a second. This is that screen. Yeah. You should now see Grafana dashboard, basically, which is one from SIMATIC 4. So I made it basically so there are multiple categories for each section of the plant from like the privacy frequency control data, the intraday trading data, technical units data in general, inverter data, general plant data, fire system detection, battery data, and climate data.
Caitlin Croft: 00:33:25.794 Ricardo, do you mind zooming in a little bit? It’s a little hard to see —
Ricardo Kissinger: 00:33:30.751 Okay. [crosstalk].
Caitlin Croft: 00:33:31.209 —all the text, and I know people love seeing these data points.
Ricardo Kissinger: 00:33:35.921 Let me see if I can change my resolution. Is it better now?
Caitlin Croft: 00:33:40.128 A little bit? Yeah, if you just zoom in a little bit more if possible.
Ricardo Kissinger: 00:33:44.484 Let me see. Like that?
Caitlin Croft: 00:33:48.510 Yeah, that’s perfect. That’s better. Yeah, awesome.
Ricardo Kissinger: 00:33:50.769 Okay, perfect. All right. So there’s a section for basically each section of a plant, and there’s also the possibility to actually filter so that you can see the data for the technical units if you actually want to see them. And the nice part is that this actually really works smoothly. I can really just have all of those nice-looking visualizations open and, on the fly, change the number of technical units I’m actually querying. I can say I want to query 20 of them now, which will result in way more data points popping up. And the cool thing is really that, as you could see, it really works flawlessly. I can also just switch out the time frames, and it really loads quickly compared to the way this was working in our initial Postgres integration. To give you an idea: we tried out a Postgres database with SIMATIC 4 initially, because the Influx version we had lacked the EDR process, and that’s why we were considering Postgres, because I know for a fact that Postgres has a mechanism already built in to send data from an Edge location to a centralized cloud.
Ricardo Kissinger: 00:35:24.629 So we started to implement Postgres as a starting database for SIMATIC 4, but as you can see here, it doesn’t really work that well. It really is like loading forever. I mean, yeah, you can basically make Postgres work, and you could also use Timescale to work better with it. However, on the other side, I mean, the way this did load and the performance bottlenecks we had with this Postgres integration was really a showstopper for us. That’s why we also switched back to Influx and did work with Influx to actually get the best possibility in the setup to get rid of those loading times. Because again, SIMATIC 4 with Influx just switching out and selecting random technical units, this just loads instantly. And it’s the same data. It’s not that it’s different data. It’s basically the same data. Both databases collect the same amount of data points. And yeah, I think it speaks for itself.
Ricardo Kissinger: 00:36:33.336 With Influx database, we didn’t need to think about indexes. We didn’t need to think about, “How can I optimize with driver? How can I optimize the cluster performance?” We didn’t need to think about any of that, because it was just working, similar for other plants. So, this is [inaudible], for example, also has super quick loading times. And if you go for higher time frames, it just loads. Hence, yeah, that’s the performance I like to see, and that’s the performance we were not seeing using the Postgres integration. And that performance also we didn’t see with our, initially, V-server and running on [inaudible]. Yeah. So here I will show you quickly now dashboards in the old [inaudible], basically. So it’s not so easy — let me see if I maybe make this bigger. So can you see everywhere those white spaces?
Caitlin Croft: 00:37:41.559 Yes. Yeah, the break in the data?
Ricardo Kissinger: 00:37:45.386 Yeah [laughter]. Exactly. So this is basically — so I have the old integration parallel running. This is the result from the old integration. This is the data you get. It’s not really clean, it’s not really nice, it’s not complete, period. And if I quickly show you now exact same plant with the new integration. So yeah. So that’s [inaudible], the 26th, which was today, earlier today, around noon-ish. Let’s set that up as well. Yep. So it’s the same time frame. It’s this exact same setup. It’s the same router, the same firewall, the same internet connection, using the Influx EDR process. All data is here. All data is the way it is supposed to be, not using Influx EDR. So I think it really illustrates quite well how great the Influx EDR process actually works, even for [inaudible] locations, which have terrible internet.
Ricardo Kissinger: 00:39:25.636 Yeah, and besides that, as said before, the alerting system which you can use in Grafana, using those multi-dimensional alerts, which basically gives you also the proper information, “Which technical unit is affected? Which system is affected?” In the past, we didn’t have that. In the past, we had an alarm, “Hey! Check the plant. Something is wrong.” You didn’t know which technical unit. You didn’t know which area the problem was, you just had a random error message with not much details. And yeah, now you can basically also just look in the query and resize and see what happened in that time frame, and yeah, it’s way easier to investigate issues and also to have better alarms so that you don’t have false positive alarms, yeah.
Ricardo Kissinger: 00:40:19.391 All right, that’s it for that so far. I will switch back now quickly to the PowerPoint for wrapping it up. Grafana multidimensional alerts. Yes. Play from [inaudible] Slide, okay. Yeah, to give you a short summary and an outlook on the next steps and projects we are going to do. Yeah, so generally, getting data with Telegraf: Telegraf really supports a wide range of devices and communication protocols, and it allows really easy integration, similar to InfluxDB. You don’t need to think twice about the implementation of a solution, and it is also really supporting and supported by a wide range of products. Our firewall systems, for example, also have a Telegraf output plugin, which I can use to basically visualize all of the firewall data in Grafana. The Influx Edge Data Replication really makes my life easier because it does a fantastic job of sending data from unreliable edge locations into the cluster, and it also does a fantastic job of resending failed data.
Ricardo Kissinger: 00:41:40.953 Yeah, with InfluxDB Cloud Dedicated, the great thing is that we don’t really need to think about storage costs and usage anymore, because [inaudible] gets way cheaper. It’s, again, a higher compression. I’m not sure how they always do it, but they make it work. And besides that, the new cluster also performs way better, from what I can say from the stuff we have done with it over the last few months. And yeah, we are going to evaluate if we can utilize OPC UA or MQTT more to get data via Telegraf from our main control units. And we are also working on integrating the EPEX SPOT data, which is basically the market data, the trades: “When did somebody buy energy, sell energy?” We are integrating that into our cluster to then have one centralized cluster for doing analytics between batteries and market data, which is something that wasn’t possible for us in that scope in the past. And I’m really looking forward to seeing how we can make sense out of the data and how we can use the integration of the market data with the actual battery plant data and the grid data to make even more reliable grids in Germany and also help with getting to a more sustainable future. Yeah, that would be it for now. Yeah. [laughter]
Caitlin Croft: 00:43:24.216 Awesome. Thank you, Ricardo. Clearly people are really enjoying this presentation. There are a ton of questions, so you ready for it?
Ricardo Kissinger: 00:43:34.884 Yeah, I’m ready for that.
Caitlin Croft: 00:43:36.658 Cool.
Ricardo Kissinger: 00:43:39.077 So should I just answer live, or should I type answers out? How should I —?
Caitlin Croft: 00:43:43.955 We’ll answer live. I’ll ask the question — I’ll state the question, and you can go ahead and answer it. Make it super easy. What battery technology is used for such a big storage plant?
Ricardo Kissinger: 00:44:00.392 Yeah, so currently we are still using NMC batteries, and we are aiming to go to LFP batteries in the future. So the next projects will definitely be LFP batteries because of the advantages we get and the lower price. But currently, it’s still NMC batteries from Samsung, basically.
Caitlin Croft: 00:44:24.119 Typically local storage is less expensive than cloud storage. Did you have a different situation?
Ricardo Kissinger: 00:44:32.167 I can confirm that in theory. The problem is that the Edge locations are limited in hardware upgrade capabilities. So we have sometimes Raspberry Pi-like devices, which don’t really have hard drives you can just replace. And that’s why it’s not so easy to actually just add more storage locally to the Edge locations. But yeah, other than that, normally it would be cheap to get more local storage. However, that also is then just one copy. To have it highly available, you also need to have a backup somewhere else. You need multiple versions of it, and then it also makes the price go up. And especially considering the new pricing of InfluxDB Cloud storage, for me, it’s cheaper to just get data in the cloud.
Caitlin Croft: 00:45:27.488 Probably also easier you just have it all in the cloud ready to go. Did you do a complete migration of all of your data from your local database to the cloud, or did you have any data loss?
Ricardo Kissinger: 00:45:40.799 Okay, so I’m still migrating all of the data. And the reason why it takes a bit is basically that, since I need to normalize the data, sending it through Telegraf, I need to limit the write speed. Otherwise, Telegraf would actually not be able to keep up with all of the data. And that’s why I’m still reprocessing the data. I don’t expect that we will lose data, because the databases which were clean, I could just send to InfluxDB Cloud without any issues. And those are already completely migrated. And for the ones with inconsistent data, I don’t think that we will lose data when I’m through with the processing.
Caitlin Croft: 00:46:28.989 Why are you collecting your data at such a high frequency, every second? Were there any fast processes in your plants, or did you decide to collect data with some redundancy?
Ricardo Kissinger: 00:46:42.122 Okay. So it depends on the area of the plant whether you have to have data a lot more frequently. So for example, for the climate systems, we would only need data every 5 or 15 minutes, because the temperatures don’t change so quickly in the room. However, for the batteries (which are also the main data load; the main data points are the batteries, or around the batteries and inverters), we actually need a relatively quick resolution. So that needs to be between one and five seconds. And yeah, we need that data for the batteries. And the grid also requires us to have some data in one-second resolution. So that’s why, to keep it simple, we just did everything in one-second resolution. Yeah.
Caitlin Croft: 00:47:41.060 Telegraf has built-in support of MQTT, or is this a separate Telegraf plugin? Did you have Telegraf on the same host as InfluxDB or on separate host or hosts?
Ricardo Kissinger: 00:47:57.815 So I’m using the native plugin from Telegraf. You basically just say which input plugin you want to have in Telegraf; no need for any other applications. And what was the rest of the question?
Caitlin Croft: 00:48:15.286 I think they’re just curious if it was —
Ricardo Kissinger: 00:48:16.600 Oh yeah. I’m using it. Normally, I’m running it on the same host. Yeah. So for example, on SIMATIC 5, I think I have 20 Telegraf processes running in parallel. Some of them are collecting inverter data for me directly, some of them are connected to the plant. So there are no issues with running multiple Telegraf [inaudible] on the same host. And I would recommend that you actually choose an Influx database for the output close to the Telegraf, like on the same machine if you can, to prevent data being lost in processing.
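For anyone curious, the native setup Ricardo describes — Telegraf’s built-in MQTT consumer writing to a local InfluxDB on the same host — might look roughly like this. Broker address, topics, org, and bucket names are placeholders, not ju:niz’s actual configuration:

```toml
[[inputs.mqtt_consumer]]
  servers = ["tcp://localhost:1883"]   # local Mosquitto broker on the plant
  topics = ["plant/+/battery/#"]       # example topic layout
  data_format = "json"

[[outputs.influxdb_v2]]
  urls = ["http://localhost:8086"]     # OSS instance close to Telegraf
  token = "$INFLUX_TOKEN"
  organization = "juniz"               # placeholder
  bucket = "plant-data"                # placeholder
```

Several Telegraf processes, each with its own config file like this, can run side by side on one host, exactly as described.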
Caitlin Croft: 00:48:59.013 Have you considered hardware replication? What happens if the hardware you have placed remotely fails?
Ricardo Kissinger: 00:49:08.472 Yeah, we have considered hardware replication. However, it’s not so easy to just put more servers into a nearly finished project. There’s not a lot of space available, so there’s no possibility to actually put more hardware in it. So yeah, it would have been a possibility, but we couldn’t pursue it. And if remote hardware fails — so the new plants and the new systems are built in such a way that the — what’s [inaudible] in English? The power consumption device is duplicated — the drives are duplicated. So it’s basically protected against a single failure; it can just fail over. There is always a backup on the new plants, but that wasn’t possible for the old plants. And on the old plants, if hardware fails, the plants are not able to operate.
Caitlin Croft: 00:50:19.555 Do you collect only sensor data from MQTT or other types of data too, like commands or system-level events?
Ricardo Kissinger: 00:50:31.405 We currently just grab sensor data from MQTT; other types, like metrics and log files, we don’t capture yet. I’m currently evaluating that, so that we record more of such metrics and logging data, but yeah, we are in the process of evaluating how to actually connect them best.
Caitlin Croft: 00:50:57.267 Are your MQTT brokers local, or are they in the cloud as well?
Ricardo Kissinger: 00:51:03.368 Basically, everything which happens on the plant happens locally, and then Edge Data Replication sends it to the cloud. So this is really the only point where we interact with the cloud. Besides that, the plant is operable without any internet. Everything runs locally and is then sent to the Influx Cloud.
Caitlin Croft: 00:51:27.601 And what about between your Edge server and the Modbus devices? Is MQTT used there as well?
Ricardo Kissinger: 00:51:36.027 In the temporary one, yes. With other devices, we are evaluating using MQTT in the future to get around configuring a bunch of registers in Telegraf.
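To illustrate why MQTT is attractive here: with Telegraf’s Modbus input, every register has to be declared by hand. A sketch of what that declaration work looks like — device address, register addresses, scaling, and names are invented for illustration:

```toml
[[inputs.modbus]]
  name = "inverter"
  controller = "tcp://192.168.1.10:502"  # example device address
  slave_id = 1
  timeout = "1s"

  # One entry per value you want — this list grows quickly on a real plant
  holding_registers = [
    { name = "power_kw",   byte_order = "AB",   data_type = "INT16", scale = 0.1,  address = [40001] },
    { name = "dc_voltage", byte_order = "ABCD", data_type = "INT32", scale = 0.01, address = [40010, 40011] },
  ]
```

With an MQTT-publishing device, the device itself names and scales its values, so none of this per-register bookkeeping is needed on the Telegraf side.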
Caitlin Croft: 00:51:49.225 Have you considered consuming MQTT directly on the cloud to avoid using InfluxDB’s Edge Data Replication?
Ricardo Kissinger: 00:51:59.220 No, because that wouldn’t be possible, because the machines basically generate the data locally, and also I wouldn’t want to take the locally generated data and send it to an MQTT broker in the cloud, because that could fail, and then I don’t have the data. Using a local InfluxDB OSS version with EDR means I actually don’t lose data.
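For reference, wiring up Edge Data Replication on a local OSS instance is a two-step CLI configuration — roughly like this, with URLs, tokens, and IDs as placeholders:

```shell
# 1. Register the remote InfluxDB Cloud cluster on the edge node
influx remote create \
  --name cloud \
  --remote-url https://cluster.example.influxdb.io \
  --remote-api-token "$CLOUD_TOKEN" \
  --remote-org-id 0000000000000000

# 2. Stream a local bucket to a remote bucket; writes that fail are queued
#    on local disk and retried, which is what rides out the bad plant internet
influx replication create \
  --name plant-to-cloud \
  --remote-id <remote-id-from-step-1> \
  --local-bucket-id <local-bucket-id> \
  --remote-bucket plant-data
```

Telegraf keeps writing to the local bucket as if the cloud didn’t exist; the replication queue handles delivery.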
Caitlin Croft: 00:52:28.196 How do you manage configuration for multiple sites? Do you need to worry about configuration updates? If so, how do you manage updating these configurations per site?
Ricardo Kissinger: 00:52:40.361 Yeah, so we did not worry about this at the initial start of the project, until I ran into the issue that I was not sure which was the current master file. So I now operate differently. I’m doing all of the testing, checking the code into Azure DevOps and pushing it there, and then from the servers I’m pulling the recent configuration files and moving them into the actual working directory of the server. So basically I’m using DevOps now to take care of making sure that we have the correct configuration files on the correct projects and that we don’t send the wrong configuration files to the wrong servers and projects.
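The pull-based flow Ricardo describes could be sketched as a small script like this. Paths, function, and site names are hypothetical; the real setup pulls from an Azure DevOps repository:

```shell
# deploy_site: copy the vetted config files for one site from a repo
# checkout into the live Telegraf config directory.
deploy_site() {
  site="$1"; repo_dir="$2"; work_dir="$3"
  mkdir -p "$work_dir"
  cp "$repo_dir/$site/"*.conf "$work_dir/"
}

# Demo against throwaway directories (a real host would `git pull` first):
repo=$(mktemp -d); work=$(mktemp -d)
mkdir -p "$repo/simatic-5"
printf '[agent]\n' > "$repo/simatic-5/telegraf.conf"
deploy_site simatic-5 "$repo" "$work"
ls "$work"
```

Keeping one directory per site in the repo makes “which file belongs on which server” a property of the repo layout rather than of anyone’s memory.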
Caitlin Croft: 00:53:28.383 Do you have any post-processing of data like filtering, discrete Fourier transformation, or others?
Ricardo Kissinger: 00:53:36.885 We had some post-processing for our Generation 1.0 plants, which was done using Python scripts running on some servers. We have now gotten rid of that by using InfluxDB tasks to do some reprocessing or some calculation on some data points. And other than that, my area at least doesn’t do anything more with the data. The analytics boys and girls, I think, do something with it, but I’m not 100% sure what they do with it.
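An InfluxDB task of the kind that replaced those Python scripts might look like this Flux sketch, downsampling raw one-second battery data to one-minute means (bucket and measurement names are assumptions, not ju:niz’s actual setup):

```flux
option task = {name: "downsample-battery", every: 1h}

from(bucket: "plant-data")
    |> range(start: -task.every)
    |> filter(fn: (r) => r._measurement == "battery")
    |> aggregateWindow(every: 1m, fn: mean)
    |> to(bucket: "plant-data-downsampled")
```

The task runs inside InfluxDB on a schedule, so there is no extra server or cron job to maintain.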
Caitlin Croft: 00:54:10.583 What techniques did you use to prevent false positive alarms?
Ricardo Kissinger: 00:54:15.511 Yeah [laughter]. So I think the initial technique is: just don’t configure an alarm. But yeah, basically, I’m coming from IT. Monitoring systems and false positive alarms have basically been the last 10 years of my life. And that’s why I know how important it is to not have too many false positive alarms. So I would really do it the other way around. I would only start to implement alarms when you have valid data, when you know the data which you get actually is correct and has the correct resolution. And if you are certain that the data which you’re getting in is fitting, is fine, then I would start to set up alarms. And I would really heavily recommend doing alarms this way: if you have multiple areas where you want to configure alarms, configure it for one area, test it through, and make sure that it’s all working well. Using the latest Grafana, you can actually duplicate alarms, and then you can just, within 10 minutes, recreate the alarms with a working alarm logic, which really is key here, and then apply it to the other areas, wherever you want an alarm to cover. That method really causes fewer headaches than the other way around.
Caitlin Croft: 00:55:44.216 What is the rough latency of propagating of alarms from temperature sensor to Grafana dashboard? Is it milliseconds, seconds, or more?
Ricardo Kissinger: 00:55:54.946 It depends a bit [laughter]. If it can send the data, fine; so it could be a bit more than seconds. But normally, [inaudible] internet connection. I didn’t really look into it because it was just always there instantly, so it must be seconds, but only a few seconds. It’s not much time.
Caitlin Croft: 00:56:19.021 Do you use other cloud services or just InfluxDB Cloud for integration?
Ricardo Kissinger: 00:56:27.107 I’m only using Influx tools for integration and data handling currently. I have a bunch of rented V-servers, which I’m currently using for the post-processing and normalization of the data. But all of that is basically Telegraf and Influx, and just getting data from A to B.
Caitlin Croft: 00:56:48.537 Have you used other [inaudible] protocols besides Modbus?
Ricardo Kissinger: 00:56:55.223 Not really. We are going to [inaudible]. However, we are not able to get around Modbus; also the new energy systems which we are going to buy will use Modbus TCP as the main communication method, so we will not be able to get around it. It’s not a decision we can make, because we have to work with what we get, yeah.
Caitlin Croft: 00:57:27.797 Which MQTT broker are you using?
Ricardo Kissinger: 00:57:31.589 I’m using Mosquitto, and I’m using a GitHub project from — I’m not sure of the name of the guy, but if you search for “Mosquitto docker compose GitHub project,” it’s basically ready to go. You clone it, then you basically define your password and spin it up, and it’s just working.
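The project isn’t named in the recording, but a minimal Mosquitto docker-compose file in that spirit looks roughly like this — a generic sketch, not the specific repository:

```yaml
services:
  mosquitto:
    image: eclipse-mosquitto:2
    restart: unless-stopped
    ports:
      - "1883:1883"
    volumes:
      - ./mosquitto.conf:/mosquitto/config/mosquitto.conf
      - ./passwd:/mosquitto/config/passwd   # created with mosquitto_passwd
      - mosquitto-data:/mosquitto/data
volumes:
  mosquitto-data:
```

With a password file referenced from `mosquitto.conf`, this matches the “define your password and spin it up” workflow described above.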
Caitlin Croft: 00:57:55.026 Can you talk more about what you meant about inconsistent data? Are you talking about having different field keys across the database?
Ricardo Kissinger: 00:58:07.187 No, so I basically have a tag system for each plant, where the tags work across all of the measurements. So I’m basically able to group and structure across all of the measurements based on the initial tagging system, which is basically based on the way we tag a technical unit. So everything which belongs to one technical unit (the battery, the inverter, the climate) gets a tag, TE1 for example. And that is the same TE ID through all of the measurements per database.
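A small illustration (not ju:niz’s actual code) of how one consistent technical-unit tag across every measurement enables that grouping — here building InfluxDB line protocol by hand, with field and tag names invented for the example:

```python
# Build line-protocol points where the same technical-unit tag (te_id)
# spans every measurement, so queries can group batteries, inverters,
# and climate systems per unit.
def line(measurement, te_id, fields, ts):
    field_str = ",".join(f"{k}={v}" for k, v in fields.items())
    return f"{measurement},te_id={te_id},plant=demo {field_str} {ts}"

ts = 1695700000000000000  # nanosecond timestamp
points = [
    line("battery",  "TE1", {"soc": 87.5, "temp_c": 24.1}, ts),
    line("inverter", "TE1", {"power_kw": 210.0}, ts),
    line("climate",  "TE1", {"room_temp_c": 21.3}, ts),
]
for p in points:
    print(p)
```

Because `te_id=TE1` appears on all three measurements, a single `GROUP BY`-style query (or a Grafana multi-dimensional alert) can slice the whole plant by technical unit.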
Caitlin Croft: 00:58:53.651 Perfect. Let’s see. Can you talk more about your solution for local hardware?
Ricardo Kissinger: 00:59:01.344 Okay, yeah, so basically we are working with Thomas-Krenn. They make servers and other stuff. They also have some cloud infrastructure, but mainly they are basically a server company. And for the Generation 1.0 plants and also for smaller plants, I’m using a device called the UNO from Advantech, which is really cool because it is like a small mini PC, and there’s a UNO to which I can add four LAN ports. This then is my firewall, using OPNsense, and also another one of those small Raspberry Pi-like devices is basically being used as the actual plant server to collect data. It only has four gigs of memory and four cores or something like that, and a small SSD drive, and yeah, the point is most devices are running in the plants partially without climate control. That’s why they are really like a heavy brick which just lives through it. And for the bigger plants and bigger projects, like for example with [inaudible], we actually have Supermicro servers from Thomas-Krenn, which also run Ubuntu and RAID 1 with two SSDs. For the system itself, it’s an Ubuntu system, and then a few SSDs for Btrfs RAID 1.
Caitlin Croft: 01:00:50.285 Did you migrate all of your alerting from Kapacitor to Grafana alerts or did you always have them in Grafana?
Ricardo Kissinger: 01:00:58.114 So we initially had alerts in Python scripts, and those still exist. However, they are another source of false positive alarms, because a lot of those log messages are actually just wrong. Other than that, we never had Kapacitor alarms; we only use Grafana alarms. But we were not able to just copy and paste those from the old plants because they weren’t using the new logic, which allows the multidimensional alerts. They were basically limited to the classic conditions, which didn’t give us the benefits. And that’s why we needed to redo all of the alarms. But that’s basically a Grafana limitation or trade-off and has nothing to do with Influx.
Caitlin Croft: 01:01:46.417 How do you protect the data when using MQTT and Telegraf?
Ricardo Kissinger: 01:01:52.727 Yeah, I’m not sending the data to the cloud except by using the Influx EDR process, which sends the data encrypted. All of the traffic on the local plant stays in a separate network, so it’s unencrypted there, and then we use EDR to have it encrypted [inaudible].
Caitlin Croft: 01:02:15.112 What is the frequency of data fetching from each device and frequency of your data logs?
Ricardo Kissinger: 01:02:22.481 Yeah, so really for the Generation 1.5 plants, we are getting all of the data in one-second resolution. So every second we get the data from the climate systems, the inverters, the batteries, everything from a plant, which is around about 1,500 to 2,500 data points per second per plant.
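Taking the upper figure, a quick back-of-the-envelope calculation shows the write volume a single plant generates:

```python
# 2,500 data points per second, every second of the day, per plant
points_per_sec = 2_500
seconds_per_day = 86_400

per_day = points_per_sec * seconds_per_day
per_year = per_day * 365
print(f"{per_day:,} points/day, {per_year:,} points/year per plant")
# Roughly 216 million points a day: routine load for a time series
# database, but far beyond what a hand-rolled relational schema
# handles comfortably.
```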
Caitlin Croft: 01:02:46.138 I know we’ve run completely over, Ricardo. There’s a few more questions [laughter]. Do you have time?
Ricardo Kissinger: 01:02:52.372 Yeah, [inaudible] [laughter] fine.
Caitlin Croft: 01:02:54.775 Cool. I just realized we’re completely over, but still wanted to be respectful of your time. Let’s see. When you said duplicate alarms, is it an automated process that adjusts to the needed database and devices, or is it just a copy of the query and alarm logic when you then associate the desired database or device?
Ricardo Kissinger: 01:03:16.806 Let me quickly switch over to Grafana to show this. It makes things easier, hopefully. So in Grafana v9-dot-something, they introduced basically the possibility to duplicate alarms. So if I do that now and duplicate it, it creates a copy of it, and I would now be able to just switch out here to, for example, the SIMATIC 3 database. And then I would just need to basically define the folder to be [inaudible] free and have the notification policies say, “Here’s SIMATIC 3,” and that would be it. That’s how easily you can duplicate alarms in the recent Grafana versions. You don’t need to do all of this setting it up manually again; you can duplicate it through the UI. That’s what I was referring to, basically, if that answers the question.
Caitlin Croft: 01:04:30.211 Let’s see, I think that’s all of the questions. I’m just going through — oh, did you send commands or orders to devices? And how many devices do you have pushing data to your servers, say, when you have your own Ubuntu server versus now?
Ricardo Kissinger: 01:04:54.270 So the controlling of — for example, whether the battery charges or discharges — that is not done through Telegraf, because Telegraf really can just capture data. That is done through the SPS control system, which takes care of sending commands, basically. And the amount of Linux machines — we have around about 34 Linux Ubuntu servers in the plants, and then servers running around and taking care of things. And yeah, for the new Generation 1.5, it’s, I think, 150 technical units, which each have a battery and a climate system, and two technical units share one inverter. So it’s really a bunch of devices, but I don’t know the total number off the top of my head right now. Sorry.
Caitlin Croft: 01:06:01.245 Clearly there’s a lot. [laughter]
Ricardo Kissinger: 01:06:02.917 Yeah.
Caitlin Croft: 01:06:06.088 Which device is running the Grafana server process?
Ricardo Kissinger: 01:06:11.666 So we basically have our grafana.net instance, which is running Grafana Pro, hosted by Grafana directly. And then we have — this is the old V-server, the old existing infrastructure, the old V-server I spoke about earlier. And then we have this one — the Grafana unit. And that one is actually self-hosted on some servers, Ubuntu, and just runs as Grafana, basically nothing special.
Caitlin Croft: 01:06:52.913 Let’s see, what was the frequency of data fetching from each device and frequency of the data logs?
Ricardo Kissinger: 01:07:02.322 Sorry, could you repeat the question?
Caitlin Croft: 01:07:05.511 I think they’re just curious of what was the data frequency of collecting all this data —
Ricardo Kissinger: 01:07:12.035 Yeah, so for the old plants —
Caitlin Croft: 01:07:13.422 —but from the devices.
Ricardo Kissinger: 01:07:15.675 Yeah, so the old plants, the devices collect basically in 5-second resolution, so they get all of the data in 5-second steps. The new plants always get the data in 1-second steps. And for the MQTT broker, I think it’s every 20 seconds, something like that maybe. Let me quickly see. Let me see if it loads. Oh, cool.
Caitlin Croft: 01:07:56.574 The joys of live webinars and demos right?
Ricardo Kissinger: 01:07:59.432 Yeah, yeah. Oh, yeah. It’s definitely one of my [inaudible]. [silence]
Ricardo Kissinger: 01:08:46.751 I think it’s every 20 seconds or something like that. It looks to be at one per second actually.
Caitlin Croft: 01:08:59.478 So every second?
Ricardo Kissinger: 01:09:01.335 Yeah, it looks to be. It’s a bit hard with that resolution. Hang on. Yeah. It looks to be once per second.
Caitlin Croft: 01:09:12.621 And you have different frequencies. How did you figure that out? Was it sort of trial and error figuring out what needed to be collected every second versus every five seconds? Or did you already know?
Ricardo Kissinger: 01:09:25.168 Well, so we did know the highest-value asset we have on each plant [inaudible], the portion which carries the highest cost and the highest loss if you don’t take care of it. And both have a requirement to at least be in five-second resolution. However, parts of it need to be in one-second resolution, and that’s why we just switched and have everything in one-second resolution. For the batteries we need it; for the climates, we wouldn’t need to. However, to keep it simple, we also just collect the climate data in one-second resolution as of right now. But that’s something which we could optimize in the future, because with climate data, we don’t need one-second resolution.
Caitlin Croft: 01:10:14.046 One final question. Someone’s asking, can you go into more about what method you are using to send commands to devices? Did you mean SSH, or did you mention SS? Could you please dive into that more please?
Ricardo Kissinger: 01:10:32.013 So basically, we are currently using Phoenix SPS systems and Beckhoff systems, which are basically [inaudible] PCs, which basically take care of — oh, hang on, I will make my screen a bit bigger so it’s easier to see. Those devices we are basically using on our plants; the one from Phoenix, actually, is quite nice. Both devices now have Influx built in. So we have Telegraf and Influx built in to allow sending data out directly to Influx, and yeah, they also take care of sending data and commands. So from Phoenix and from Beckhoff, we are using SPS systems.
Caitlin Croft: 01:11:34.821 Perfect. Thank you, Ricardo. I think we’ve gotten through everyone’s questions. Really appreciate everyone joining us today. Thank you so much Ricardo for answering. Clearly, everyone is super excited about how you guys are using InfluxDB just based on all the questions. So really appreciate it. Thank you everyone for joining. Really excited to get an InfluxDB Cloud Dedicated customer story out there for you guys. So really appreciate it. Once again, this webinar has been recorded and the recording as well as the slides will be made available by tomorrow morning. You all should have my email address. So if you have any follow-up questions about InfluxDB and maybe questions for Ricardo, please feel free to reach out to me. I’m happy to put you in contact. If you’re in our Slack workspace, I am there as well. So once again, thank you everyone for joining. And Ricardo, you did a great job. Thank you so much.
Ricardo Kissinger: 01:12:42.628 Thank you. Thanks. And thanks for the opportunity and that you all took the time to actually listen to me. And yeah. I wish you a great evening or day, depending on your time zone.
Caitlin Croft: 01:12:54.303 Thank you so much.
Ricardo Kissinger: 01:12:57.062 Thanks.
Caitlin Croft: 01:12:58.050 Thank you everyone.
Ricardo Kissinger: 01:12:58.947 Bye-bye.
Caitlin Croft: 01:13:00.641 Bye.
[/et_pb_toggle]
Ricardo Kissinger
Head of IT Infrastructure and IT Security, ju:niz Energy GmbH
Ricardo Kissinger's heart truly beats for technology. As an early adopter of cutting-edge IT solutions, he consistently stays ahead of the curve. A fervent advocate for Free and Open Source Software (FOSS), Ricardo brings a unique blend of expertise in IT security and infrastructure to ju:niz Energy GmbH. Continuously driven by challenges, Ricardo not only addresses but also thrives on unraveling the intricacies of the digital realm.