How Enprove Built a Monitoring and Analytics SaaS Platform on InfluxDB
Session date: Sep 10, 2024 08:00am (Pacific Time)
Join us for an insightful session on how Enprove built its SaaS analytics solutions with InfluxDB as the data platform. Enprove’s Enelyzer platform, powered by InfluxDB, delivers comprehensive visibility into energy consumption patterns across various hardware systems, irrespective of the vendor, device, or communication protocol.
By seamlessly integrating real-time energy data with external datasets such as meteorological and utility metrics, Enprove provides analytics and alerts that help customers cut costs and create strategic plans to reduce carbon emissions.
Many organizations struggle to aggregate and analyze complex energy data accurately. Utilizing InfluxDB’s robust data handling and real-time processing features, Enprove addresses these challenges, empowering businesses to make informed, data-driven decisions in real time.
In this session, you’ll learn:
- The criteria and results Enprove used to evaluate moving from InfluxDB OSS to Cloud Dedicated, including security, performance, costs, and support
- How Enprove built its Enelyzer platform and why cardinality considerations were critical to building a SaaS solution
- Tips for building a high-velocity ingestion pipeline when data is coming from standard protocols like MQTT and proprietary sources
- Tips on how to optimize the database to make complex calculations on the fly
- How Enprove is moving beyond reporting and analytics to forecasting and insights
Watch the Webinar
Watch the webinar “How Enprove Built a Monitoring and Analytics SaaS Platform on InfluxDB” by filling out the form and clicking on the Watch Webinar button on the right. This will open the recording.
Transcript
Here is an unedited transcript of the webinar “How Enprove Built a Monitoring and Analytics SaaS Platform on InfluxDB.” This is provided for those who prefer to read rather than watch the webinar. Please note that the transcript is raw. We apologize for any transcription errors. Speakers:
- Ben Corbett: Solutions Engineer, InfluxDB
- Michallis Pashidis: CTO, Enprove
- Abdullah Sabaa Allil: Technical Lead, Enprove
BEN CORBETT: 00:00
So, I think for now, I’m going to hand over to one of my favorite customers, a customer I work very closely with, very near and dear to my heart. We’ve got both Michallis and Abdullah here, who are going to walk us through a little bit about their Enelyzer product. So, I’ll hand over to you guys. Okay. Over to you, I guess. Thanks.
MICHALLIS PASHIDIS: 00:16
Yes. Thank you, Ben. Thank you for the nice introduction, and welcome to everyone joining the webinar. Let’s first start by introducing ourselves. My name is Michallis Pashidis. I’m the technology officer at Enprove for Enelyzer. And Abdullah?
ABDULLAH SABAA ALLIL: 00:36
Yeah. So, I am the technical lead at Enelyzer. I’m responsible for the technical side of Enelyzer, and I set its technical direction together with Michallis.
MICHALLIS PASHIDIS: 00:47
So maybe a short introduction: what is Enelyzer all about? What we are building, or have already built in production, is a system. It’s a SaaS platform. That’s very important. And it’s guiding energy-intensive industries to a greener transition. That’s basically the goal, with strategic energy and sustainability insights that we provide to our customers. So basically, we’re collecting and converting energy consumption data from different channels and different branches of the industry, and we are transforming it into actionable insights, thus improving operational efficiency and sustainability. That’s basically the goal we have been working on over the last years. In a nutshell, it’s an industry that is still heavily reliant on spreadsheet data. Excel, for example, is typically what we see in that domain. And the platform that we are building is more like having an interaction or a dialogue with the customer, by asking the correct questions for data or metadata, and then returning advanced strategic insights to them. That’s basically where we want to go. It’s not only a bottom-up, but also a top-down approach that we have.
MICHALLIS PASHIDIS: 02:10
That results in not only the typical graph views that you see here, which we will put a focus on throughout the webinar, but also many more features that we build on top of the data that we have. Typically, it’s about carbon footprinting, CGI reporting, and so on. But maybe let’s start with the beginnings. We already had an initial monolithic system, which Abdullah will explain in more detail from the technical side, and we had some challenges on that journey. The challenges listed here are a summary, not everything, because there are many more than that. But those are the challenges that we put into our roadmap, and where we had a very good collaboration with Influx to solve many of them. So, first of all, like many organizations, I guess, we started with a monolithic model, and then we needed to scale. When you are in the scale-up phase, that’s one of the non-functional requirements: we need to scale and have the architecture scale with us. Typically, with a boost in sales, creating new organizations or customers, the technology must go along with that. That’s one of the first things that you put down.
MICHALLIS PASHIDIS: 03:31
The second one is everything that had to do with performance, not only in the ingestion flows, where we take in the data before we manipulate it, but also when we return, for example, the insights, the reports, or the graphs. Typically, we did that by listing all the KPIs, or the wish list of KPIs, that we need to have, which are not really bound to the functional requirements as such, but to the non-functionals here. Performance is one of the most important ones that we had. And we need to store a lot of time series. That’s basically what we are doing: we store a lot of time series, as that’s the source we build upon. And how do we do that in an efficient way, a way that keeps all the other requirements involved? So, the capability to have in-memory projections was another one that we noted down, because typically, when we started, we had a setup of relational databases with some materialized views. We had to calculate in-betweens, make some derivations, store that again, and then get the end result fast in our web application. We wanted to get rid of those, not only for maintainability reasons, but also to boost the performance and the capabilities that we can expand on the roadmap later on.
MICHALLIS PASHIDIS: 04:56
Extension points, we will also dive a little bit into that. It’s about data analytics. It’s not only capturing data and visualizing it, but also using the same data to do more interesting stuff like benchmarking, data quality, sanitization flows, and so on. So that’s also an important point we mentioned as a challenge. And then near real-time data, yeah, that’s also a typical one in IoT. It’s not just about data coming in and visualizing it; we also have external parties who are steering on the IoT side of a customer, so on-premises, where we need to go fast and have a priority lane available throughout the application itself. And the next-to-last one, downsampling for chart visualization. What does that mean? If you have a big data set and you want to visualize it, and get that done in a smart way and very fast, it means that we need the capability of downsampling. Again, a topic that we’ll discuss later in the presentation. And the last point, which is a preparation step for the future: if we have the data captured and normalized in our platform and we add metadata on top of that, that alone was not sufficient for the requirements we want to achieve.
MICHALLIS PASHIDIS: 06:29
So, we also introduced web ontologies to have the capability of reasoning, supporting the machine learning models, and so on. So, we added the reasoning step as well in the platform. We call that data ontologies. We’ll not dive too much into that, but it’s an important factor that we kept as a requirement while we were doing the integration with InfluxDB IOx, or version 3. So that’s, let’s say, the menu. Our customers are typically energy-intensive industries and property managers. Today, we have about 30 industrial customers, which don’t just have one site in Belgium but operate internationally, in different countries. We have about 8,000 time series that we manage at the moment. And the target we have in a short amount of time is to go to 200 customers with 75,000 time series. So, you see directly why this webinar, and the collaboration with Influx, came to life: basically, to be able to support those challenges throughout our product life cycle. And now let’s jump into the technical workout. So, I will let Abdullah present the rest of the slides here.
ABDULLAH SABAA ALLIL: 07:58
Thank you, Michallis. So just some history about Enelyzer. More than 10 years ago, we started with an energy management system that was just a simple monolithic app, as Michallis mentioned. And it had only a single database, and that database was shared between all customers. This approach was fine back then because we didn’t have that many customers, and the amount of data was also fairly reasonable for the technologies that were used back then. But as the number of our customers kept growing and the amount of data kept growing, we needed to come up with solutions to scale with our customers. And the fact that we were growing quickly caused performance bottlenecks, and the platform was struggling quite a bit with querying or inserting data. That’s when we decided to change the system architecture a little bit in an attempt to improve the performance. We isolated the energy data by customer, and we started using PostgreSQL for derived data, which is basically aggregated data that has been calculated based on the energy data. We also used PostgreSQL for contextual information about our customers, like their assets, their employees, or the tenants that they manage. We also updated our backend systems and modernized them. So, we switched from a monolithic architecture to a microservice-based architecture. And it was mostly built using Scala, with a little bit of Rust in the beginning.
ABDULLAH SABAA ALLIL: 09:36
But here we also noticed that the performance still could not keep up with our scale. So, our struggle continued, and we needed to find new alternatives and new solutions to optimize our systems. And that’s when we started to look for more specialized databases. Time series databases were the logical choice for energy data, since energy data is typically measured over time. After evaluating several time series databases, we decided to go with InfluxDB. We started using InfluxDB Cloud Serverless, but then we quickly realized that our scale is beyond the capabilities of Influx Serverless. So, we made the switch to InfluxDB Cloud Dedicated, and it has been a great experience for us so far. We also modernized our backend systems again: we reduced the usage of Scala and started adopting Rust for our entire backend, which also resulted in great performance and resource improvements. There are several reasons why we needed to modernize our systems. Energy data is quite challenging. Customers would like to generate certain reports and audits from our systems, and they want this to be reasonably fast. If they have certain alerts, for example, to get notified if the consumption goes above or below a certain threshold, the querying of the data for these alerts needs to be reasonably fast for the alerts to have some meaningful function. This is why fast querying and aggregation of the data is a big requirement for us, especially when there are multiple variables, parameters, and dependencies involved.
ABDULLAH SABAA ALLIL: 11:14
Another requirement is continuous data ingestion from virtually any source without vendor lock-in. Our customers often have their own hardware, or they have a third party that’s managing their hardware. And we don’t want to lock them into proprietary hardware that can only be used by us. We want to keep our system as open as possible to the end customer. And the ingestion can happen over different interfaces: MQTT, APIs, FTP, and so on. Another important requirement is high-volume ingestion of historical data. When we onboard a new customer, they will often need to import all their existing energy data into our system. And this can be four, five, or six years’ worth of data, and it can result in millions or even billions of data points that need to be imported all at once. So, we needed this to be done quickly and reliably, without having to worry about it affecting the performance of our system or maybe even crashing the system.
ABDULLAH SABAA ALLIL: 12:21
I would like to show you some case studies of features and functionalities that Enelyzer offers to our customers, and how we improved the performance and the stability of these features by using InfluxDB. So, our property management customers need to generate a report that contains the consumption and the relevant variables for every tenant that they have. They need this information to bill the tenant for the costs of their energy consumption. But the generation of this report depends on multiple variables and parameters. It can be the duration of the contract, how many people live in the property, the surface or the volume of the property, how many properties use the same energy source, and many more parameters that need to be taken into account when generating such a report. All these parameters are time-dependent, and they can vary over time. Based on all these parameters, we calculate what we call the consumption intervals, which are basically a permutation of all the different time-dependent parameters. In each of these intervals, we get the energy consumption, and we calculate the relevant coefficients that are needed for the report to be generated.
ABDULLAH SABAA ALLIL: 13:38
The initial implementation for the billing was as follows. We would get the input data, the different variables and parameters. We would determine the consumption intervals, get the consumption for each of these intervals from the energy database, which was Microsoft SQL Server in that case. We would calculate the report and the different consumptions and coefficients, and then we would persist the results in a Postgres database. That meant the report needed to be recalculated on every dependency change. The report was also periodically recalculated every day to keep it up to date. This approach was not ideal. It was very slow: it would take around five minutes for a report to be generated. And as a customer, you don’t want to wait five minutes every time you want to get a report out of the system. Querying the data was quite slow. Around 70% of the time needed for the report to be generated was actually spent on querying the energy data.
ABDULLAH SABAA ALLIL: 14:41
Persisting the data was also slowing the whole operation down. And it was quite an error-prone approach, since you needed to track all the dependencies and update the report every time a dependency changed, which is really painful to debug and to diagnose. So, we decided to update our implementation and change how the calculation happens. In the current solution that we have implemented in Enelyzer, we get the input data, the different variables and parameters, determine the consumption intervals again, get the consumption for those intervals from InfluxDB this time, and we generate the report in real time without persisting the results. That’s great, because it means that the data is always up to date, since everything happens on the fly, in real time. And you don’t need to do any kind of dependency tracking to know if something changed, because the dependencies are always up to date since you’re doing it in real time, on the fly.
ABDULLAH SABAA ALLIL: 15:44
Another feature that I would like to demonstrate from Enelyzer is what we call virtual tags. Users often like to combine different measurement points or energy sources, if they want to know the consumption of an entire site, for example, which has several measurement points. Sometimes they would also like to know the proportion of the energy consumption in relation to the production of a certain product. That’s where virtual tags come in. It’s a simple domain-specific query language that’s used to query and aggregate the energy data from different measurement points. For example, here we have a virtual tag which is used to get the water consumption of a primary and a secondary water source. The initial implementation was built on top of Microsoft SQL Server. Calculating the results of a virtual tag in our legacy systems would take, on average, about five minutes. And the results of the virtual tags were persisted in the database as a way to speed up the performance of querying the virtual tag data. Again, this approach was not great, because here you need to track your dependencies again, and you need to recalculate your virtual tag every time something changes in your dependencies. So, it was not real-time data. It was fake real-time, since the data would need to be recalculated on a schedule, or whenever a dependency was updated.
ABDULLAH SABAA ALLIL: 17:17
Recently, we modernized our way of calculating the virtual tags, as well as the syntax of the virtual tags that our customers can use. So, the user can define a virtual tag in our user interface. This definition that the user provides is translated in our backend systems into an expression that’s easier to use from a developer’s point of view. And next to that expression, we need some variables to get the data, such as the time range of the data, the granularity of the data, which kind of aggregation we need, the time zone, and so on. The virtual tag expression, together with the variables, gets compiled to DataFusion SQL. This happens in real time, of course, and the data is then requested from Influx using the query that has been generated. The calculation of a virtual tag now happens entirely in real time. And it now takes, on average, around a second to calculate the data of a virtual tag. And we don’t do any kind of dependency management or dependency tracking when we query the data of a virtual tag. I think you can agree that going from five minutes to one second is quite a performance improvement.
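To make that compilation step concrete, here is a minimal sketch of how a virtual tag like the water example above might be compiled to DataFusion SQL. It is hypothetical: the table and column names (meter_readings, tag_id, value) are illustrative rather than Enprove’s actual schema, and a real compiler would parse a full expression grammar instead of handling a single sum.

```rust
// Hypothetical sketch: compiling a "tag_a + tag_b" virtual tag into DataFusion SQL.

/// Variables that accompany a virtual-tag expression.
struct QueryVars<'a> {
    start: &'a str,       // RFC 3339 start of the time range
    end: &'a str,         // RFC 3339 end of the time range
    granularity: &'a str, // e.g. "15 minutes"
}

/// Compile a two-tag sum into a single SQL query string.
fn compile_virtual_tag(tags: &[&str; 2], vars: &QueryVars) -> String {
    // One CTE per measurement point, aggregated to the requested granularity.
    let ctes: Vec<String> = tags
        .iter()
        .map(|tag| {
            format!(
                "{tag} AS (SELECT date_bin(INTERVAL '{g}', time) AS t, sum(value) AS v \
                 FROM meter_readings \
                 WHERE tag_id = '{tag}' AND time >= '{s}' AND time < '{e}' GROUP BY t)",
                g = vars.granularity, s = vars.start, e = vars.end
            )
        })
        .collect();

    // Join the per-tag series on the time bucket and add them up.
    format!(
        "WITH {} SELECT a.t AS time, a.v + b.v AS value FROM {} a JOIN {} b ON a.t = b.t ORDER BY time",
        ctes.join(", "),
        tags[0],
        tags[1]
    )
}

fn main() {
    let vars = QueryVars {
        start: "2024-01-01T00:00:00Z",
        end: "2024-02-01T00:00:00Z",
        granularity: "15 minutes",
    };
    // "water_primary + water_secondary": total water consumption of two sources.
    println!("{}", compile_virtual_tag(&["water_primary", "water_secondary"], &vars));
}
```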
ABDULLAH SABAA ALLIL: 18:40
All these functionalities and performance improvements in Enelyzer would not be possible without InfluxDB. And one of the big reasons for all these improvements is how InfluxDB structures and separates the data at the physical layer. Traditionally, you often see databases using indexes to optimize query performance. But in InfluxDB, you can also define the partitioning strategy that InfluxDB uses to store and query the data at the physical layer. The partitioning strategy is what will be used to store the data, and in our case, it makes sense to partition the data on a measurement-point basis. So, each IoT device, each meter, is stored physically isolated from the other measurement points. This allows for better read and write performance, since there is less concurrency involved and less contention on resources. As we mentioned before, one of the requirements of Enelyzer is to be able to ingest data from any source without vendor lock-in. These sources can be MQTT, APIs, FTP, PLC devices, and so on. Our initial solution for the ingestion part was to build custom connectors for each source that we have. But we ended up having several connectors, each one slightly different from the others. And the fact that they were custom implementations made it quite difficult to maintain them. You needed to implement almost the same architecture with slight and subtle differences between each one of them. So, it was not ideal. We were still experimenting in this space to figure out a good solution that fits our vision of data ingestion with no vendor lock-in, but that is also maintainable for us.
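For reference on that partitioning idea: in InfluxDB Cloud Dedicated, a custom partition template is defined when the database is created. A hedged sketch with the influxctl CLI, using a hypothetical meter_id tag and database name (flag names as documented for Cloud Dedicated), might partition by day and by measurement point like this:

```
influxctl database create \
  --template-tag meter_id \
  --template-timeformat '%Y-%m-%d' \
  energy_data
```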
ABDULLAH SABAA ALLIL: 20:34
That’s when we started using Telegraf as a crucial part of our data ingestion. The architecture of Telegraf is based on the idea that the pipeline is plug-in-based, which means that you can have different input plugins based on the source from which the data is coming. But you can also have transformation plugins and output plugins. This allows us to use the same transformation pipeline while only changing the plugins that we need to change. So, if we have a different input but the same structure of data, it means that we only need to change the input plugin and keep the same Telegraf ingestion pipeline as it is. This all results in less custom code and easier maintenance for us. We also often have to deal with proprietary sources. It can be custom hardware from certain clients or third parties. And these sources typically don’t have an open interface for you to communicate with them. So, it means that we need to implement our own custom client for these sources to ingest the data from them. Telegraf has the capability to be extended by developing your own plugins for these scenarios. And this allows us to reuse existing pipelines and only develop the parts that are custom and need to be manually developed.
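As a sketch of that plug-in model (the broker URL, topics, and database name are placeholders, not Enprove’s configuration), a Telegraf pipeline that consumes MQTT and writes to InfluxDB might look like this; swapping the source means replacing only the input block:

```toml
# Input: consume meter readings from an MQTT broker.
[[inputs.mqtt_consumer]]
  servers = ["tcp://broker.example.com:1883"]
  topics = ["meters/+/energy"]
  data_format = "json"

# A different source, e.g. [[inputs.http_listener_v2]], would slot in here
# without touching the rest of the pipeline -- the appeal of the plug-in model.

# Output: write line protocol to InfluxDB over the v2-compatible API
# (for Cloud Dedicated, "bucket" maps to the database name).
[[outputs.influxdb_v2]]
  urls = ["https://cluster-id.a.influxdb.io"]
  token = "${INFLUX_TOKEN}"
  organization = ""
  bucket = "energy_data"
```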
ABDULLAH SABAA ALLIL: 22:09
What’s also great about InfluxDB is that it’s built around a strong ecosystem. That’s great for us from a software-as-a-service point of view, because it means that we can deliver features faster by using existing tools and utilities. This is our current architecture. You as a user start from the Enelyzer user interface. Our user interface is used to configure and generate reports, and to view the data as well. The user interface is built using React and Highcharts. We use the Highcharts part because some of our customers only need limited and easy-to-use charts and visualizations, and they might not be interested in having the full capabilities and the full power of Grafana. The energy data is stored in Influx. And as mentioned before, we use Postgres to store the contextual data for our customers. We also use Grafana for customers who want more advanced dashboarding and charts. Some customers also want alerting capabilities, and Grafana is a great choice for that kind of functionality. As mentioned before, we use Telegraf to collect the data from different sources. We also use Mage AI. It’s a great tool for scheduled tasks and ETL pipelines with InfluxDB. We use it to transform the incoming data in a process that we call the normalization of the data. That’s basically transforming and converting the data into a format that our system can work with.
ABDULLAH SABAA ALLIL: 23:52
We’re also experimenting with Mage AI for AI capabilities like forecasting or predictions. Mage AI is great for that kind of functionality. Our backend system is what powers the entire platform. It’s used to make all the calculations, the reports, the audits. It communicates with InfluxDB using an Arrow Flight client, which is a gRPC connection. And that’s basically quite a speedy connection, instead of using the traditional REST interface. What we really like about InfluxDB is the fact that it’s built on top of known and popular open-source technologies and projects like Apache DataFusion, Arrow, and Parquet. And each of these projects has its own community and its own developers around it. So, for me, as a technical guy, it means that I can just pinpoint and identify where there are bugs or missing features for us in the components on top of which Influx is built.
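For illustration, here is a minimal sketch of such a Flight SQL query from Rust, assuming the arrow-flight, tonic, futures, and tokio crates (exact APIs vary by crate version; the host, token, database, and table names are placeholders):

```rust
use arrow_flight::sql::client::FlightSqlServiceClient;
use futures::TryStreamExt;
use tonic::transport::{Channel, ClientTlsConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // gRPC channel to the cluster; Flight SQL runs over gRPC, not REST.
    let channel = Channel::from_static("https://cluster-id.a.influxdb.io:443")
        .tls_config(ClientTlsConfig::new())?
        .connect()
        .await?;

    let mut client = FlightSqlServiceClient::new(channel);
    client.set_header("authorization", "Bearer <token>");
    // InfluxDB 3 expects the target database as a gRPC header.
    client.set_header("database", "energy_data");

    // Run the query; the server answers with one or more endpoints/tickets.
    let info = client
        .execute("SELECT time, value FROM meter_readings LIMIT 10".into(), None)
        .await?;
    for endpoint in info.endpoint {
        let ticket = endpoint.ticket.expect("endpoint should carry a ticket");
        // Stream Arrow record batches back over the same connection.
        let mut batches = client.do_get(ticket).await?;
        while let Some(batch) = batches.try_next().await? {
            println!("{} rows", batch.num_rows());
        }
    }
    Ok(())
}
```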
ABDULLAH SABAA ALLIL: 24:56
It makes it easier and quicker to fix bugs that may be present in InfluxDB or even to introduce new features in InfluxDB that are necessary for us. And that’s what we have been doing over the last couple of months together with the InfluxDB team and the DataFusion team. The latest version of Influx is fairly young, so it’s expected to have some subtle bugs or missing functionalities. And it was important for us to work with a database partner that we can grow with. And that’s exactly what we did over the last couple of months. So, we worked together with the InfluxDB team and the DataFusion team to make Influx work for our use cases. And it was done by fixing bugs or introducing new features that were necessary for us, as you can see in the GitHub issues and PRs presented in the slide. And now I will hand it over to Michallis to discuss the business impact of Influx and more business stuff.
MICHALLIS PASHIDIS: 25:59
Yeah. Just to wrap up. If you look at the initial challenges that we had, and then look back at the business impact that we have as a result today: as you already saw in the slides, there were many intermediate steps that we needed to manage from a technical perspective, and we could get rid of those. That was one of the improvements that sped up development of new features, so we could focus on new things and adding capabilities to the platform instead. And what we really didn’t like to maintain were the materialized views. In the technical team itself, nobody was fond of having those in our pipeline. So, we removed that completely. It’s not only about speed, but also about having an enthusiastic team that knows, “Okay, it’s a platform that grows, that has additional powers and is boosted to go to the next level.” Another thing: we decided to move from our initial Scala backend to a Rust backend. Scala still has its good features and opportunities that we use in the ingestion flow, with actors, propagation, stuff like that. But we also had the tendency to go to Rust because it gave us a drop in maintenance and infrastructure costs.
MICHALLIS PASHIDIS: 27:32
As we keep our flag up with a sustainable energy vision and strategy, we also wanted to lower the costs of our own infrastructure. And that was a way, together with some other components that we changed along the way, that we could reduce the cost by more than half, basically. And that’s also one of the reasons we went for that language. But it also had some additional features that we needed, of course, in our backend. And a last part that we didn’t focus on: when you look at the Parquet files that Abdullah mentioned, they give us a very good extension point to collaborate with partners, because it means that suddenly we can use those Parquet files as a source for another AI system or a machine learning system or a prediction model, where we can extend our capabilities not by implementing it ourselves, but by collaborating with a partner who is focused and has capabilities complementary to ours. So basically, to wrap up here: we didn’t only achieve the performance, we also improved as a team. We improved as a SaaS platform. We can go much faster than before. And we have a reduction in cost. So, we can only be very happy about the journey, and we just wanted to share that with you. So, thank you.
MICHALLIS PASHIDIS: 29:02
I think the next slide is looking at the future plans. So, the future plans. We have quite some things on the roadmap, and they are scored priorities. But basically, now we are looking into forecasting and predictions. We also want to have more advanced reporting and auditing. The auditing module is there because one part of the company is very active delivering consultancy to companies, with experts in those domains. And, as a last point, IoT processing at the edge by using InfluxDB Edge: that’s something we are looking forward to, where we want to try and see how that can fit into a model and ease the integration step as well, by having calculations or pre-normalizations done, probably, on the edge side too. So yeah, I will leave it there, so we can wrap up and have some Q&A right now.
BEN CORBETT: 30:04
Yeah. That’s fantastic. I mean, that was wonderful. Thank you so much, Michallis and Abdullah, for such a comprehensive rundown of, I guess, not only the why and your journey through this well-trodden path that we see from customers growing out of their relational databases as applied to time series workloads, but also digging into the technical stuff. And one thing that was really nice to see in your case, from my perspective, because I worked with you closely throughout the evaluation, was just how much the direction that you wanted to go aligned with the strategy of Influx, as you mentioned, with regard to the open data architecture, and how your engineers like to be in control of the development of those functions and improvements. So, it honestly was a pleasure to work with you, and I can’t wait to see what you guys get up to next as well.
BEN CORBETT: 30:55
Okay. So, we’ve got a couple of questions that have come in, and I was going to start off with one as well. I guess to everyone else that’s listening right now: we’ve got a Q&A tab, so if you have any questions, please drop them in there. I can see someone’s also got their hand raised, but I would ask you to just drop the question into the Q&A tab, and we’ll try and get it answered for you. So, one of the ones I had, and I’m sure someone else might have this question as well, and it might be directed more towards you, Michallis, is: what was the impact on Enprove’s business of the bad performance of the legacy SQL datastore? So, you mentioned that you had lots of materialized views, and I guess that causes an internal maintenance issue. But with regard to the slow report generation and things like that, what were the impacts that you were seeing on the business and your customers from that kind of inefficiency?
MICHALLIS PASHIDIS: 31:47
I can maybe summarize it as: the moment you start to go into the scale-up phase, it’s like you are punishing the existing customers by onboarding new ones. Basically, that’s how I can formulate it. And that was a problem, because if you’re an organization that grows and suddenly you have an impact on the existing customers who have already been there for many years, it felt very awkward, and we wanted to come up with some solutions. So, it pushed us to the creative side by saying, “Okay, let’s grab a blank sheet and start over again to see how we can change that.” But also being pragmatic, because as it’s an existing business, you typically have to run down the legacy system in combination with the new one. You have to make sure that you don’t go too far in setting up a platform, because you need a good balance between what’s good and necessary for the business right now and the technology boundary that we need to introduce, just to make sure that we grow steadily but securely. There was an impact on the existing customers, but we mitigated that quite fast. So yeah, I hope that answers the question.
BEN CORBETT: 33:03
Absolutely. Yeah. It’s just one we always like to understand, right: the value, not just to your team internally, but also to the customers. But yeah, that was really well articulated. Okay. I’m going to jump over into the Q&A tab now and start with the top one. So, the first question we’ve got is: when dealing with energy data, interoperability is always an issue, a technical and semantic one. How can this be addressed with your solution and InfluxDB in general?
MICHALLIS PASHIDIS: 33:32
Yes. That’s a very good question. We didn’t touch on it in a lot of detail, but interoperability and the multimodal aspect of the system are very important. We knew upfront that we needed to bake that into the infrastructure. On the interoperability side, that’s why we mentioned the ontologies, or the web ontologies; for us, that’s a solution that doesn’t limit you to the constraints you have with data schematics or schemas. By introducing web ontologies and triple stores, and intermediate renditions of the same data presented or pre-calculated in memory in a different fashion, it means that you can ask questions of the system from a different point of view. If I put on my glasses as a building manager, you can ask different questions of the same data than when I put them on at a production plant and want to ask questions about the carbon footprint of my production, or my machines, for example, or a specific product.
MICHALLIS PASHIDIS: 34:37
So, we added that capability, and interoperability was one of the reasons why we added it. Because if you want to compare different assets or resources in your platform, you need to have a language to do that. But again, the maintainability of something like this is very difficult, because it grows. You have one schema, you have another schema, it becomes 100 schemas, and maybe a general one that becomes the 101st. So, we introduced the web ontologies for that. From the perspective of whether there is already standardization there: there are different sources, of course, but there isn’t one single ontology that we can apply to it. We are following those. But the big difference is that when you work in that fashion, you can very quickly comply with an ontology that may come along in the future. And so, we look at it as: “Let’s work with the data so that we can make it interoperable for ourselves. If we need to make it interoperable and externalize that capability, we can do that. We don’t have that case yet, but we are ready to do it.” So, I guess the secret sauce there is, a little bit, introducing a semantic web layer on top of the data.
BEN CORBETT: 36:02
And the future-proof approach. No, it’s a good answer. And I think we had a couple of other questions in that area as well, which we’ve got here. Yeah, more in-depth: any support for ontologies? And then it was ontologies like SAREF, SARGON, domOS, and things like that. But I think—
MICHALLIS PASHIDIS: 36:22
Yeah. So, let’s say in the actual state, this is still in the R&D phase for us. We have ontologies incorporated, but it’s still something that we don’t use in production. We use it internally. So, it’s still an investigation track for us, and we are experimenting with it. There are different standards or ontologies you can follow, and we are trying to get a feel for how it goes, using internal, technical use cases to get to a solution which touches those aspects. So yeah, it’s not that we can say today that Enelyzer is already compliant with those ontologies, or can reason with those ontologies, because that’s the benefit of having that. And that’s not the case today.
BEN CORBETT: 37:06
Yeah. No, thanks very much. So, I’ll give you a break for a second, Michallis, because we’ve got a couple of Influx ones, which I can see.
MICHALLIS PASHIDIS: 37:13
Yeah, that’s good for me.
BEN CORBETT: 37:15
Yeah, no problem. Have a glass of water. So, one of the questions we got here is, is InfluxDB version 3 available in AWS and Azure? So, I’ll just give you a little bit of an overview of that. So, the available editions of InfluxDB version 3 today, we’ve got three of them, two of them are fully managed. One is InfluxDB Cloud Serverless. That’s the shared infrastructure multi-tenant platform, which is usage-based and elastically scalable. That is and will only be available in AWS. So, you can have a look at the available cloud regions that we’ve got today, and that is the shared infrastructure platform. You might have noticed during the presentation today that that’s what Enprove started off with for Enelyzer. So, they went there, but that platform is really for kind of prototyping and hobbyists, kind of small workloads. And obviously, because it’s shared infrastructure, we do have to put quotas and limits on different tenants to kind of protect the service and the shared infrastructure from neighbors. So, I think that was the kind of scale issues that Abdullah was referring to when they started to throw a serious workload at this. And that’s when it’s time to move towards a dedicated environment, which is our second fully managed edition of InfluxDB called InfluxDB Cloud Dedicated Version 3. Now, today, that is only available in AWS in any major cloud region. So, it’s got a little bit more deployment flexibility. We also have GCP and Azure on the roadmap, but it’s not going to get done this year. I believe it’s on the roadmap for next year. But I don’t think we have a specific date for that. But basically, Azure is coming.
BEN CORBETT: 38:53
For customers that really want to get their hands on version 3, and it has to be in Azure, that’s when we would direct you right now towards our third available edition of InfluxDB version 3, which is the on-prem or self-hosted equivalent, InfluxDB Clustered. So, this is exactly the same software as Cloud Dedicated, which Enprove presented to you about today. But that would be deployed in your Kubernetes environment, in your VPC. You can put it in the cloud provider of your choosing, or even in a Kubernetes environment on bare metal servers, like some of our customers do. So that’s how we satisfy the Azure requirements today. And then in the future, hopefully next year, we’ll have a managed edition that’s deployable in Azure. So, I’m going to mark that one as done. And then one more we had as well, which I think I’ve just answered, which was: when is the on-prem equivalent of InfluxDB going to be available? And that was recently made generally available. As I mentioned before, it is the same software that underpins Cloud Dedicated. But because it’s a self-hosted equivalent, we have a little bit more work to do when it comes to the deployment experience and documentation. But it is now generally available. If you want to make use of InfluxDB Clustered on-prem, please reach out to us, and we can support you through an evaluation, dig into your technical Q&A, and make sure it meets your requirements.
BEN CORBETT: 40:18
There are plans for a single-node edition of InfluxDB, so that would be the evolution of the open source, but version 3. And we’re hoping to have a version of that available by the end of the year. So yeah, stay tuned for that. And I think there’s one more Influx one here, which is about InfluxQL: using InfluxQL in InfluxDB version 1.8, is it possible to do a pivot operation? Because I want to show energy meter data in Grafana. So, I don’t believe InfluxQL supports the pivot operation, but in version 3, we have SQL, which does. And in version 2, which does have a single-node open source available, Flux does as well. So, although InfluxQL doesn’t specifically support pivot operations, SQL does in version 3, and Flux does in version 2. So, we’ve got some options for you there. Okay. So, let’s have a look here. Another question for the Enprove team. This one says: “Have you experimented with embedding part of the context data as tags in InfluxDB, as an alternative to a dedicated database?” So, this is something that comes up quite a lot for us in IoT use cases. This is, I guess, enriching your time series data with context data and metadata. Does that make sense?
ABDULLAH SABAA ALLIL: 41:40
That’s something that we want to do. We’re experimenting with that a little bit, because we have some sources that send us validated and estimated data. And we want to know which part of the data is estimated by them, so a prediction or a forecast, and which part of the data has been validated by them. And we want to annotate or tag the readings, the data, in order to know which parts we let the user use for official reporting, for example, and which parts they’re not allowed to use. So that’s something that we want to have. I believe Michallis probably has other use cases for it, especially when you have ontologies and annotations involved.
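As a hedged illustration of that annotation idea (the measurement, tag, and field names are hypothetical), such a quality tag in line protocol might look like this, letting official reports filter on quality = 'validated':

```
meter_reading,meter_id=M-1042,quality=validated value=12.4 1725955200000000000
meter_reading,meter_id=M-1042,quality=estimated value=13.1 1725958800000000000
```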
BEN CORBETT: 42:31
Yeah. Perfect. Thanks very much, Abdullah. So, one more that we’ve got here is from Diego. He said: I’d like to know if you’re using Telegraf in a containerized environment, and how you manage the needs of the real-time data ingest.
MICHALLIS PASHIDIS: 42:47
Yeah. Yes. We used both, actually. We also used it containerized. We initially started on a specific node, and then we moved it into a container, and that was working fine. It had some tricky configuration stuff to know upfront, because we had to learn that as well. But basically, we are using it now in front of the ingestion pipeline, which basically means it’s running on-premises. Say, for example, a customer needs a tool that can extract some data from a database, or take a CSV file and transform it: we just provide them with the components to map that onto a generic schema, which can be hooked into our channels, so that we can help the customer plug in the data very quickly and easily. But that’s for the running data. It’s not for historian files, or historical data which is older, because there we don’t use it at the moment. We had some tracks where we tried Telegraf out with a lot of big chunks of data from the past, and that was working fine as well. But we are still using different tools for that at the moment. I don’t know, Abdullah, if you want to add something to that as well?
ABDULLAH SABAA ALLIL: 44:03
No, you described it perfectly.
MICHALLIS PASHIDIS: 44:04
No. Okay. Yeah. So yes, we tried different setups, basically. Yeah.
BEN CORBETT: 44:10
Yeah. Thanks. And the last question here. Oh, we’ve got one more that’s just come in, so I’ll read this one first. It says: “Did you ever consider comparing AWS IoT Core and ThingsBoard (a good IoT platform)? How efficient is your platform in terms of robustness, scalability, and costs?” So, I guess I’d summarize that as: did you ever consider the use of those IoT platforms?
MICHALLIS PASHIDIS: 44:36
Yes. I cannot answer the comparison at the moment, to say how the one compares to the other. I do believe that we have a robust and scalable platform at a good cost, I think, because we are still reducing it. But basically, the question we asked ourselves is, “What do we actually need on our side?” And we very quickly came to an ingestion flow which is a little bit particular, because we are also using cold storage, and we need to have provenance of the raw data coming in. So, we need to have a trace of the source of the data; we need to keep that untouched, let’s say. And it’s all possible with different IoT platforms, we know that, but we preferred to have full control of the steps that we wanted from ingestion. Basically, I think it’s about nine steps in two phases, and we just wanted to be sure that we have the back [inaudible] propagation and the back [inaudible] implemented correctly, so that we can yield the data very fine-grained. So we didn’t choose a platform, not because they’re not good. I think they are good, but we cannot compare them now. We just had a very clear way of how we wanted to deal with the data flow and the processing of the data itself.
MICHALLIS PASHIDIS: 45:55
And it’s only in the second phase that we introduced tools like Mage AI, for example. In the first step, we have a very custom approach there. The benefit of doing that, for us, was that we wanted to control the onboarding of a customer. So, if a new organization joins, we want to be able to enable all those channels, so they have a wide range of easy ways to integrate the data. That’s one. And secondly, we wanted to have provenance and data quality in the first phase, and some other steps that we do, but we wanted to control that part. And it’s not about us being very opinionated about a specific platform. I think they are good, but the approach we have gives us very good performance, scalability, and cost today. So, it’s not a real answer, but yeah. That’s [crosstalk] how we see it.
BEN CORBETT: 46:44
And of course, when you met the InfluxDB team after that, then it was just— there was no other option as soon as [crosstalk]?
MICHALLIS PASHIDIS: 46:51
Yes, it was peanuts. Exactly. Yes.
BEN CORBETT: 46:57
Okay. So, we’ve got our last question, and I think I’ll be able to handle it. And if you’ve got anything to add, Michallis or Abdullah, let me know. So, this one is: what is the best way to do massive streaming ingestion, line protocol? So, we’ve got articles online about the best practices for writing to InfluxDB. InfluxDB has a few simple rules when it comes to write optimizations and the best way to get the most out of the cluster. In general, we like a small handful of concurrent writers, so not hundreds and hundreds with very small payloads. And we like to make sure that batches are 5,000 to 10,000 lines of line protocol. If, when you’re writing to InfluxDB, you can write the line protocol itself, that’s good; it saves the conversion step into line protocol. There are a couple of other things you can do which make smaller differences. Always use GZIP compression in your write client; that can massively reduce the data transfer fees and things like that. We’ve definitely seen customers forget to do that in POCs before.
BEN CORBETT: 48:09
If you can order the points within the batch that you’re writing by time, that always helps. That, again, just means the database engine doesn’t have to do it. If you can order your tag keys alphabetically, and your field keys in each line alphabetically, again, that takes some of that responsibility away from the database engine itself. In general, the write efficiency of version 3 is quite staggering. Hopefully, we’ll get some benchmarking out soon to show you what the different tiers are able to cater for. But what we’re seeing is that the entry-tier environments can handle hundreds and hundreds of thousands of values, and we’ve got some customers which are in the gigabytes per second. So, the database won’t be the bottleneck when it comes to ingestion performance. And as long as you have that small handful of concurrent writers batching appropriately and follow some of those rules, then I think you’ll have a really smooth, efficient, high-speed ingestion. Anything to add there, Michallis or Abdullah? Let me know.
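To make those rules concrete, here is a minimal sketch of a batch writer, assuming the flate2 crate for GZIP; the measurement, tag, and endpoint names are placeholders, and with a single tag key the alphabetical-ordering rule is trivially satisfied:

```rust
// Hedged sketch of the write rules above: time-ordered batches of ~5,000
// lines of line protocol, gzip-compressed before sending.
use flate2::{write::GzEncoder, Compression};
use std::io::Write;

struct Point {
    meter_id: String, // single tag, so tag keys are trivially in order
    value: f64,       // field
    ns: i64,          // timestamp, nanoseconds since epoch
}

/// Render one point as line protocol: measurement,tags fields timestamp.
fn to_line(p: &Point) -> String {
    format!("meter_reading,meter_id={} value={} {}", p.meter_id, p.value, p.ns)
}

fn encode_batch(mut points: Vec<Point>) -> std::io::Result<Vec<u8>> {
    // Order the batch by time so the engine doesn't have to.
    points.sort_by_key(|p| p.ns);
    let body = points.iter().map(to_line).collect::<Vec<_>>().join("\n");

    // Gzip the payload to cut transfer costs.
    let mut enc = GzEncoder::new(Vec::new(), Compression::default());
    enc.write_all(body.as_bytes())?;
    enc.finish()
}

fn main() -> std::io::Result<()> {
    let batch: Vec<Point> = (0..5_000)
        .map(|i| Point {
            meter_id: format!("M-{}", i % 10),
            value: i as f64,
            ns: 1_725_955_200_000_000_000 + i as i64,
        })
        .collect();
    let gz = encode_batch(batch)?;
    // POST `gz` to the write endpoint with "Content-Encoding: gzip".
    println!("compressed batch: {} bytes", gz.len());
    Ok(())
}
```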
BEN CORBETT: 49:15
And I think that’s all our questions. So, thank you so much to everyone who joined. Yeah, hopefully, you got as much value out of it as I did seeing this team again. I’m just going to have a quick look to see if there’s anything I need to say. And yeah, just to let you know that this will be recorded, and you’ll have a copy of the recording, and we’ll make sure to send out the slides as well. I think that will all be available tomorrow. So, for now, thank you very much for joining.
MICHALLIS PASHIDIS: 49:42
Thank you all.
BEN CORBETT: 49:42
And we’ll look forward to seeing you again. Thanks to Michallis and Abdullah. Cheers.
MICHALLIS PASHIDIS: 49:47
Bye-bye. Thanks.
Michallis Pashidis
CTO, Enprove
Michallis serves as the Chief Technology Officer at Enprove, a company dedicated to helping energy-intensive industries adopt sustainable practices through advanced analytics. With a rich background in security, data governance, and IoT, he also supports the W3C work on Decentralized Identifiers, Verifiable Claims, and web ontologies.
Abdullah Sabaa Allil
Technical Lead, Enprove
Abdullah Sabaa Allil is the Technical Lead at Enprove, where he directs the development of Enelyzer, an Energy & Sustainability Data Management Platform offered as a SaaS solution. With a strong background in software engineering and a keen focus on sustainable technology, Abdullah ensures that Enelyzer not only meets the evolving needs of energy management but also aligns with environmental sustainability goals.