How Teréga Replaces Legacy Data Historians with InfluxDB, AWS, and IO-Base
Session date: Jul 11, 2023 07:00am (Pacific Time)
Are you considering replacing your legacy data historian and moving your OT data to the cloud? Join this technical webinar to learn how to adopt InfluxDB and IO-Base — a digital platform used to improve operational efficiencies!
Teréga Solutions creates digital solutions that improve energy efficiency and address decarbonization challenges. Their network includes 5,000+ km of gas pipelines within France; they aim to help France attain carbon neutrality by 2050. With these impressive goals in mind, Teréga created IO-Base — a digital platform to improve industrial performance and increase profitability. Creating digital twins for their clients allows them to collect data from all production sites and view it in real time, from anywhere and at any time.
Discover how Teréga uses InfluxDB, Docker, and AWS to monitor its gas and hydrogen pipeline infrastructure. They chose to replace their legacy data historian with InfluxDB — the purpose-built time series database. They collect more than 100K different metrics at various frequencies — some every 5 seconds, others only every 1-2 minutes. They have reduced overall IT spend by 50% while collecting 2x the amount of data at 20x the frequency! By using various industrial protocols (Modbus, OPC UA, etc.), Teréga improved output, reduced TCO, and can now create value-added services: forecasting, monitoring, and predictive maintenance.
Join this webinar as Thomas Delquié dives into:
- Teréga’s approach to modernizing fossil fuel pipeline IT systems while improving yields and safety
- Their centralized methodology for collecting sensor, hardware, and network metrics
- The importance of time series data and why they chose InfluxDB
Watch the Webinar
Watch the webinar “How Teréga Replaces Legacy Data Historians with InfluxDB, AWS, and IO-Base” by filling out the form and clicking on the Watch Webinar button on the right. This will open the recording.
Transcript
Here is an unedited transcript of the webinar “How Teréga Replaces Legacy Data Historians with InfluxDB, AWS, and IO-Base”. This is provided for those who prefer to read rather than watch the webinar. Please note that the transcript is raw. We apologize for any transcription errors.
Speakers:
- Caitlin Croft: Director of Marketing, InfluxData
- Thomas Delquié: CTO, Teréga Solutions
- Nicolas Lafargue: Tech Lead, Teréga Solutions
Caitlin Croft: 00:00:00.000 Hello, everyone, and welcome to today’s webinar. My name is Caitlin and I’m joined by Thomas and Nicolas at Teréga who will be talking about how they replace data historians with InfluxDB, AWS, and IO-Base. Once again, this session is being recorded and the slides will be made available. And without further ado, I’m going to hand things off to Thomas and Nicolas.
Thomas Delquié: 00:00:26.769 Thanks a lot, Caitlin, and thanks everyone for joining this webinar. So as Caitlin said, today we’ll try to explain why we had to replace our data historian, how we decided on a new approach, and more specifically, what we developed with, obviously, InfluxDB. To begin with, just a quick presentation on our part. My name is Thomas Delquié. I’m the CTO at Teréga Solutions and I’ve been working at Teréga for 13 years now, where I have held many different jobs in the IT department. More recently, I worked on the AWS cloud migration and the creation of a data department at Teréga.
Nicolas Lafargue: 00:01:17.451 Hello, everyone. My name is Nicolas Lafargue and I work as a DevOps tech lead at Teréga Solutions. I’m very happy to be here with all of you today. And what about me? I have been working at Teréga for five years now as an API and data integration architect and DevOps specialist. I actively contributed to the AWS cloud migration of Teréga and currently hold the position of tech lead for the IO-Base project.
Thomas Delquié: 00:01:49.101 Great. So to begin with, we’ll start with a bit of context about Teréga. Teréga is a gas storage and transportation company in the southwest of France. We manage 5,000 kilometers of pipelines that transport natural gas in our network, and we have more than 600 employees in our company. We have been doing this job for decades now, starting from [inaudible]. But recently, we had a new challenge, which was the energy transition. A few years ago, we envisioned that in the future, of course, natural gas usage would decrease, because it’s a fossil fuel and we need to get rid of it as fast as possible. So we decided to create a new business model and to go into new activities in bio-methane and in hydrogen. To do that, our boss at the time had a vision. He said, “Okay, the future of energy will be a multi-energy mix. Because energy will be more scarce, we need to optimize it better. There will be more links between electricity and gas. So we need to be ready on the energy side with bio-methane and hydrogen, but we also need to be ready on the digital side. We will need digital tools to operate these new energy vectors and to optimize the overall grid.” And six to seven years ago, we were in an old-school IT architecture with everything on-premises, and we were not at all ready to address these new challenges. So we needed to reinvent ourselves.
Thomas Delquié: 00:03:49.039 And the first step was to get the data to create these new services for the energy transition. But at that point, we had a problem on our side. Of course, we have our industrial sites, we have our network, we produce a lot of data, and we want to make use of this data to optimize our operations, to secure them, and, in the future, to operate the energy grid as a smart grid. So to create the new services based on this data, we decided to go to the cloud, on AWS. Why? Because nowadays that’s where all of the newest services are. All the actors are in the cloud, and it’s much faster to create services there. And obviously, if we want to manage a grid at the regional or national level, we need to centralize the data in one place, and it is much easier to do that in the cloud. But between our sites and our target in the cloud, we had one big problem: our legacy infrastructure, which was state of the art 10 years ago, but not anymore. We had, of course, an OT network, an IT network, and then some elements in the cloud.
Thomas Delquié: 00:05:19.082 Between these two elements, we had firewalls, or even worse, demilitarized zones. And I also showed a small portion of our OT server infrastructure, with a lot of servers to collect data, aggregate it, replicate it (of course, to have a backup) and then send it to the IT side. It was extremely complicated. And because we had a lot of servers, a big infrastructure on-premise, it was costly, expensive to maintain, and not agile at all. We had a lot of difficulty getting the data out. You needed to go see the OT team, the network team, the IT team, and of course, the security team to allow the flow of any kind of data you wanted. So to create a new project, it was really a big problem. We needed to find a new approach: how can we get the data out easily while at the same time not jeopardizing the security of our industrial sites? Well, at first we said, “Okay, we have our sites, we have our [inaudible] architecture with a lot of stuff on-premise, our data historian; we need to get all of the data into the cloud. Looks easy, let’s just make a full replication of the database.”
Thomas Delquié: 00:06:51.438 Well, yes, we tried, but that didn’t resolve the problem, because doing a full replication was just adding one more infrastructure to an already complicated situation. Moreover, it was obviously costly, because we replicated all the data, so we had to pay on both sides. And as anyone who has managed a data center or a database knows, we had to maintain two active databases at the same time, and that’s obviously very complicated. Okay. So we said to ourselves, “Okay, a full copy of the database is difficult, it’s complicated, so let’s just cherry-pick the data for each use case, so that we only take the data that we want for energy performance, for smart grid, and so on.” In the cloud, of course. That’s more or less what we tried to do at the beginning, but there were a lot of problems with this approach. The first was a security issue, because each time you had to open a door in your firewalls, so you increased the attack surface. Some cloud vendors told us, “Oh, you just have to install one of our hardware appliances on your OT network. It will collect the data and send it to our cloud.” Yeah, that’s even worse on the security side.
Thomas Delquié: 00:08:21.111 And moreover, because we sent small pieces of data here and there, we were not at all fine with data quality, because we did not know which was the good data and which was the bad, or whether there were discrepancies between these different data sets. So this approach did not work either. So we thought a lot, and we came to a completely new approach for us. We said, “Okay, we want to go full cloud, so let’s go full cloud. We don’t want to manage IT infrastructure anymore, so let’s just get rid of all of the on-premise IT infrastructure and servers, and let’s copy the data that we need directly from the sites to the cloud.” So we decided to put our data historian in the cloud, to have only one flow of data from the site to the cloud. And we only keep one application locally, which is a SCADA for command and control of our network, but in a way that is as lightweight as possible, with no advanced capabilities and no need for external data, because we didn’t want any more flows of data coming from outside. To do that, we created a new piece of equipment: a data diode, called the Indabox, which allows us to make a unidirectional flow of data.
Thomas Delquié: 00:09:57.094 So we installed it on the 500 sites of our network, because of course, we have a lot of separate and distant sites. These boxes connect to the sensors and the PLCs, collect the data, and then send it directly to the cloud. This equipment was very simple to install, and this way, we were able to get the data out very, very easily and send it to the cloud. So to summarize this new approach: before, we had on our sites a very complex on-premise infrastructure with a lot of servers before we could send anything to the cloud, and it was a really centralized approach. Now we have distributed the approach across all of our sites; all of the sensors and PLCs are connected to our Indaboxes. And each Indabox sits with one side on the OT network and the other side on the intranet network. We completely bypass the IT infrastructure, which does not exist anymore on our sites, and now we have the data in the cloud in near real time, easily. So that was the vision for getting rid of all the slowness that we encountered and the difficulties that we had with the previous approach.
Thomas Delquié: 00:11:36.248 But we still had to decide: okay, now we know we want to put our main data historian in the cloud; the question is, what is it? Oh, sorry, one point I forgot. Obviously, we migrated from the old architecture to the new one in a seamless way. You can’t go to your operational people and tell them, “Okay, we’ll replace your most sensitive application, the one that allows you to monitor and operate the grid, with a new one in the cloud that we are building.” That’s just not possible. So the old architecture stayed in place, and at the same time, we installed the Indaboxes one by one on the different sites to collect the data a second time and send it to the cloud. And that way, we were able to show them that, yes, you can have a data historian in the cloud, get the data, and look, this way we can create services much more easily. So we had an approach where we had to prove to them how it works, and that was a good way to go for us. So we needed a data historian in the cloud. And here, we were in the classic situation: okay, do we build it, or do we buy it? So first we focused on our needs, on what we wanted.
Thomas Delquié: 00:13:08.401 We wanted, of course, to have our master database in the cloud, with all of the data coming from the sites in a secure way via the Indabox and the approach that we presented before; that gave us the how. But we also wanted it to be as user-friendly as possible, so we needed to add several services and functionalities for the users to monitor the plants and, obviously, to leverage the data that is in the cloud. So we needed to have everything behind an API so that we could get the data easily out of the database. And of course, because we operate a network grid for energy, which is a critical service for France, we needed this system to be high-performance, because we wanted to send a lot of data into it, scalable, and most importantly, very resilient, with 24/7 availability. So we did a market survey, because if there had been a perfect fit for us, we would not have bothered to recreate it. But we found no industrial cloud-native data historian. There were a few software options, but they were not highly industrialized.
Thomas Delquié: 00:14:29.845 There were legacy data historians that had a new extension to the cloud, but they were not cloud native, and it showed in their architecture and the way they were built; they had the same kind of problems that we had with the previous architecture. They were quite complex to commission, to train users on, and to customize. And they could be very expensive depending on the provider. On the other side, we had a lot of data platforms which all said that they could manage time series data, but when we really looked into it, they were not really managing time series data. Most of the time it was just an add-on. So it could fit if you did not have a lot of data or a lot of reports to do, but it was not really focused on time series. They were more focused on AI and analytics, which is a very strong point, but which was not our use case. We wanted software to address our industrial requirements and services to operate and monitor our network. So the decision was made to build it, obviously, on our side. And for that, I’ll let Nicolas talk.
Nicolas Lafargue: 00:15:55.419 Thank you, Thomas. So as we said before, we opted to develop our own data historian. First, we needed to find the best time series database to meet our specific requirements. We explored the different options available in 2019, such as Bigtable with custom layers, Warp 10, and TimescaleDB, among others. We finally decided to opt for InfluxData and use the InfluxDB database. Why? First, for its outstanding performance, especially suited to our specific use case and architecture, where each measurement has only one field. We will talk about this architecture choice later. Then, the ability to create metrics on the fly during data ingestion was a significant advantage that set InfluxDB apart from other options like TimescaleDB. This feature enables simplified deployment, supporting over 100,000 metrics for Teréga, for example. Additionally, we chose InfluxDB for its excellent ecosystem integration, such as with Grafana, for example. Another advantage was the ability to test it with the open source version or with free features. As for us, we were able to start with an initial open-source version for the evaluation phase. And lastly, we selected InfluxData for its market leadership. InfluxData’s established position in the market provided us with confidence in terms of longevity and ongoing support.
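To make the “one field per measurement” and “metrics on the fly” points concrete, here is a minimal sketch using the InfluxDB 1.x Python client. This is not Teréga’s actual code: the host, credentials, database, and tag names are hypothetical placeholders. The key idea is that writing a point under a new measurement name is all it takes to create a new metric.

```python
# Hypothetical example: each sensor tag becomes its own measurement with a
# single "value" field, matching the one-field-per-measurement design.
from influxdb import InfluxDBClient  # InfluxDB 1.x client

client = InfluxDBClient(host="influxdb.example.internal", port=8086,
                        username="writer", password="secret",
                        database="historian")

points = [
    {
        # Writing an unseen measurement name creates the metric on the fly;
        # no schema migration is needed.
        "measurement": "site042.gas.flow_rate",   # hypothetical tag name
        "time": "2023-07-11T07:00:00Z",
        "fields": {"value": 412.7},
    },
    {
        "measurement": "site042.gas.pressure",
        "time": "2023-07-11T07:00:00Z",
        "fields": {"value": 67.3},
    },
]
client.write_points(points)
```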
Nicolas Lafargue: 00:17:45.951 Let’s now talk about the architecture we built to address the challenges we encountered. Firstly, we opted to use InfluxDB version one, cloud dedicated, hosted on AWS. Why AWS? AWS was our preferred choice due to its numerous advantages. First, it provides a reliable and robust infrastructure, ensuring high availability and scalability for our data historian. Secondly, AWS offers a wide range of services and solutions that perfectly align with our project requirements, enabling us to leverage its extensive capabilities and integrations. And lastly, AWS provides strong security measures, including encryption, access control, and monitoring, for the protection and privacy of data. We also had extensive experience with AWS from Teréga’s cloud migration, which reinforced our confidence in using AWS services. This experience has allowed us to develop a comprehensive understanding of AWS, enabling us to leverage its features effectively and efficiently for the data historian implementation. Why cloud dedicated? By choosing a dedicated cloud solution with InfluxData, with the infrastructure hosted on their AWS account, we benefit from their expertise in managing the database deployments. This arrangement ensures optimal performance, reduces the overhead of infrastructure maintenance, and allows us to concentrate on our core objectives without compromising the reliability and efficiency of our data storage.
Nicolas Lafargue: 00:19:39.702 And why InfluxDB version one? Because InfluxDB version one was the latest stable release in 2019, when we started to work on this project. As you can see in the diagram, the InfluxDB database is contained within a VPC managed by InfluxData on their own account. Okay. But in order to ensure maximum security for the InfluxDB database, we decided to use the private networking option of InfluxDB v1 cloud dedicated. This means that the InfluxDB database is contained within an AWS private subnet and is no longer accessible from the outside. Next, we created an additional layer to host our historian infrastructure, with the main component being an API that securely exchanges data with InfluxDB for read and write operations. This API is hosted on our own AWS account within the IO-Base VPC. It communicates with the private network containing the InfluxDB database through an AWS transit gateway. Since the InfluxDB v1 API is only protected by basic authentication with a username and password, which didn’t meet our own security requirements, we further enhanced the security of our API by implementing OAuth2 authentication using Auth0.
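As a hedged illustration of that last point (not IO-Base’s actual code), here is a minimal sketch of an API layer that checks an OAuth2 bearer token before proxying a read to the InfluxDB 1.x HTTP API, which itself only speaks basic auth. The endpoint path, credentials, and simplified token check are assumptions; a production setup would verify the JWT signature against the identity provider’s keys (for example, Auth0’s JWKS).

```python
# Hypothetical sketch: OAuth2-protected facade in front of InfluxDB 1.x.
import jwt                      # PyJWT
import requests
from flask import Flask, request, abort, jsonify

app = Flask(__name__)
INFLUX_URL = "http://influxdb.private.internal:8086/query"  # private subnet
INFLUX_AUTH = ("reader", "secret")   # basic auth stays server-side only

def verify_token(req) -> None:
    """Reject the request unless it carries a bearer token that parses."""
    auth = req.headers.get("Authorization", "")
    if not auth.startswith("Bearer "):
        abort(401)
    try:
        # Simplified for the sketch: a real deployment must verify the
        # signature against the identity provider's public keys.
        jwt.decode(auth[len("Bearer "):], options={"verify_signature": False})
    except jwt.InvalidTokenError:
        abort(401)

@app.get("/api/v1/query")
def query():
    verify_token(request)
    # Proxy the query to InfluxDB over the private network.
    resp = requests.get(INFLUX_URL, auth=INFLUX_AUTH, params={
        "db": "historian",
        "q": request.args.get("q", ""),  # a real API would restrict this
    })
    return jsonify(resp.json())
```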
Nicolas Lafargue: 00:21:22.110 Let’s pause for a moment and discuss another challenge we encountered: how to manage the hierarchy and structure of measurements and metadata. As you can see on this screen, we needed to be able to make modifications to the data hierarchy, including renaming measurements and moving them into different levels with different associated metadata. InfluxDB v1 was not designed to handle very complex metadata and operations like this. Therefore, we opted to maintain a simple data structure in InfluxDB, consisting of only one field per measurement. And to manage metadata and the data hierarchy effectively, we developed our own layer on top of the InfluxDB database inside IO-Base. This approach allows end users to dynamically reorganize their data without requiring any modification to the InfluxDB database. Let’s see in our architecture diagram how we handle it. We use a DynamoDB database to store both the hierarchy and metadata information, as well as the mapping between the names of our metrics and the corresponding measurement names in InfluxDB. We also leverage DynamoDB for another feature, which is the management of permissions for accessing measurements. We gained fine-grained control over user rights, such as specifying read or write access to specific measurements.
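A hedged sketch of what such a metadata layer can look like: a DynamoDB table maps user-facing metric names to the underlying InfluxDB measurement names and carries per-metric read permissions. The table schema and attribute names are assumptions, not Teréga’s actual design.

```python
# Hypothetical schema: one DynamoDB item per metric, holding the mapping
# to the InfluxDB measurement plus the groups allowed to read it.
import boto3

dynamodb = boto3.resource("dynamodb")
metrics = dynamodb.Table("iobase-metrics")   # illustrative table name

def resolve_measurement(metric_name: str, user_groups: set) -> str:
    """Translate a display name into its InfluxDB measurement name,
    enforcing read permissions along the way."""
    item = metrics.get_item(Key={"metric_name": metric_name}).get("Item")
    if item is None:
        raise KeyError(f"unknown metric: {metric_name}")
    if not user_groups & set(item.get("read_groups", [])):
        raise PermissionError(f"no read access to {metric_name}")
    return item["influx_measurement"]

# Renaming a metric or moving it in the hierarchy only touches this table;
# the measurement stored in InfluxDB never changes.
```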
Nicolas Lafargue: 00:23:12.442 Now, let’s talk about some performance challenges that we encountered. While retrieving individual values from InfluxDB was extremely fast, we faced a performance issue when attempting to retrieve the list of measurement names with InfluxDB version one, due especially to the large number of measurements present in the database. This issue impacted the efficiency of retrieving the measurement names from InfluxDB, causing delays or slower response times. By leveraging DynamoDB to store the measurement names, we were able to overcome this performance challenge and achieve better overall performance for this specific use case. Another improvement we made is that we implemented a Redis cache to store and quickly access the latest values, ensuring real-time availability. This integration of Redis enables faster access for real-time data visualization, formula calculation, and alerts, enhancing overall efficiency in processes such as your [inaudible] model. Then, as you can see in this diagram, we chose to use Docker containers managed by AWS ECS Fargate.
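The latest-value cache could look roughly like the following write-through sketch, assuming a plain Redis key per measurement; the key naming and payload shape are illustrative, not the actual IO-Base implementation.

```python
# Hypothetical write-through cache: the ingestion path updates Redis with
# the newest sample per metric, so real-time reads never touch InfluxDB.
import json
import redis

r = redis.Redis(host="cache.internal", port=6379)

def on_ingest(measurement: str, timestamp: str, value: float) -> None:
    # Keep only the most recent sample for each measurement.
    r.set("latest:" + measurement,
          json.dumps({"time": timestamp, "value": value}))

def latest(measurement: str):
    """Return the freshest sample for dashboards, formulas, and alerts."""
    raw = r.get("latest:" + measurement)
    return json.loads(raw) if raw else None
```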
Nicolas Lafargue: 00:24:44.447 We chose the ECS Fargate AWS service due to its auto-scaling capabilities and simplified container management. This setup allows our API to handle fluctuations in traffic without requiring manual intervention. As a result, we achieved optimal performance and cost efficiency by automatically adjusting the number of containers based on demand. So let’s take a look at the global architecture now. This diagram provides a good overview of our architecture, a solution designed to create a robust data historian and replace [inaudible]. As we have seen before, our API is available to receive our data securely. Then, as you can see in the bottom left corner, the Indabox is responsible for transmitting data from our industrial sites to the API securely. It serves as a unidirectional security gateway that allows data to flow in one direction only. It uses HTTPS to send data to our API in JSON format, and the data is then stored in InfluxDB. Additionally, as you can see in the top left corner, we have developed various UI modules. These modules enable data retrieval, visualization, [inaudible] management, and the creation of new metrics using formulas. These web-based UI applications are built with Angular and are hosted on AWS S3.
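The webinar does not show the actual JSON schema, but a unidirectional HTTPS push like the Indabox’s might look roughly like this; the URL, payload shape, and token are assumptions for illustration.

```python
# Hypothetical one-way push: the OT side only ever sends, never receives.
import requests

payload = {
    "points": [
        {"tag": "site042.gas.flow_rate",       # hypothetical tag name
         "time": "2023-07-11T07:00:05Z",
         "value": 412.9},
    ]
}
resp = requests.post("https://api.io-base.example/v1/ingest",
                     json=payload,
                     headers={"Authorization": "Bearer <token>"},
                     timeout=10)
resp.raise_for_status()
```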
Nicolas Lafargue: 00:26:27.263 A little word about DevOps to conclude. We developed this entire architecture following the DevOps methodology, and even incorporated DevSecOps principles. We relied on various tools and practices, including Terraform for infrastructure as code, Git for version control, and CI/CD pipelines. These choices allowed us to automate and streamline the deployment and management of our infrastructure, ensuring consistency, scalability, and security during the entire development process.
Thomas Delquié: 00:27:04.089 Thank you, Nicolas. So to finish on the technical side, we saw that we took a managed version of InfluxDB, which was really a big plus for us. That way, we didn’t have to focus on that part. And the main challenges were in adding our own layers of security, performance, and metadata on top. In that way, we were able to create, for Teréga at first, IO-Base, which is a product, a data platform. So on one side, we have the Indabox, which collects the data and then sends it directly to the cloud, as we wanted. The data is stored in InfluxDB. And we have created additional services on top, like an online SCADA, graph visualization, reporting, forms, alerts and on-call management, and formulas to create new data. So with this, we have really created a data historian that allows us to monitor our industrial sites with minimum maintenance, with a centralized database that is easily accessible via API. That way, it’s very easy to share the data with third parties to create new services. So the results for Teréga were at different levels. First, we had, of course, a big gain on the total cost of ownership, on IT licenses, and on infrastructure costs. We were also able to capture a lot more data, because we are no longer constrained by the on-premises architecture. So we now have more than 100,000 points and counting.
Thomas Delquié: 00:28:56.689 And because we were based on InfluxDB and we had virtually no limit on the data we could ingest, we were able to collect the data more frequently. So for most of our data, we went from one-minute data collection to five seconds. So these [inaudible] five points are directly related to our implementation of InfluxDB. And it’s really not just for show. Because we could collect the data more frequently, we were able to see things that we could not see before. We had a problem in one of our delivery stations to one of our customers, where there were failures from time to time. But they were very short, under one minute. So the client had problems, but we could not see them. With our new data platform, we were able to see them. Same thing on the data capture side. As I said, it’s a managed database. And at one point, during a migration, we sent too much data compared to our database size. But InfluxData, on their side, automatically decided to upgrade our tenant, so we had no problem with infrastructure. We told InfluxData later that it was just a one-time massive influx of data, and so we could get back to the right tenant size for us. So overall, a big success for Teréga. And most importantly, now we have the data available and we are able to create new services for Teréga. That was the one point. But it’s not the end, of course.
Nicolas Lafargue: 00:30:46.724 Yeah. What is the next step for our database? We are continuously evaluating and improving the application in an Agile manner, taking into account user feedback and business needs. In the future, we have plans to add data versioning capabilities, allowing the addition of multiple values for a given timestamp. Additionally, we will be working on template management for the different modules, including the dashboards, the forms, and the views. This will enhance the customization options and flexibility for users. Integration with AWS AI services is also on our roadmap, enabling us to leverage advanced artificial intelligence capabilities for data analysis and insights. Regarding InfluxDB, we will explore the opportunity to migrate to InfluxDB version three, which offers features we are particularly interested in. This includes longer data retention through S3 storage, lower storage costs, and a lower TCO. We are also interested in the integration of Apache Arrow Flight SQL for enhanced data querying and analytics capabilities. Thank you, and I will let Thomas conclude.
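For readers curious what the Flight SQL direction could look like, here is a minimal sketch using the influxdb3-python client against InfluxDB 3; the host, token, database, and measurement name are placeholders, and this reflects the roadmap discussed above rather than anything Teréga has deployed.

```python
# Hypothetical InfluxDB 3 query over Apache Arrow Flight SQL.
from influxdb_client_3 import InfluxDBClient3

client = InfluxDBClient3(host="eu-central-1.aws.cloud2.influxdata.com",
                         token="<token>", database="historian")

# SQL results come back as an Arrow table, convenient for analytics.
table = client.query(
    'SELECT time, value FROM "site042.gas.flow_rate" '
    "WHERE time > now() - interval '1 hour'"
)
print(table.to_pandas().head())
```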
Thomas Delquié: 00:32:04.220 So to conclude, thanks to the capabilities offered by InfluxDB, Teréga Solutions was able to create IO-Base, which is really an industrial data historian in the cloud. With it, we operate — I cannot stress it enough — we operate our 24/7 critical activity on our network. So it has been made to be as robust and resilient as possible at all levels, with AWS and with Influx. And going back to the start of this presentation, now we are perfectly able to address the challenge posed to us by the energy transition. We’ve already started. We have created several applications on top of IO-Base much more easily and quickly than before, because we have the data available, structured, and ready for use. So we are ready for any challenge on that part. Secondly, at the same time, we have a new growth driver at Teréga Solutions, because by addressing these issues that are commonplace in our industry, we have created two products that are independent and that we offer to the market. And it’s a nice addition to our strategy and our perspective. So we are nowadays not just an energy company: we are an energy and IT company, which goes very well with the vision of our director, which was that we need IT to operate the energy market. And nowadays, every company needs to be an IT company.
Thomas Delquié: 00:33:54.002 And finally, it’s important too: OT can be fun too. We had a lot of pain points. It was complicated to create new services, to talk with our internal clients and tell them, “Okay, it will take six more months,” and so on. So that was hard sometimes. Nowadays, first, we had a lot of fun creating this solution with Influx. And secondly, now we can go much further than we could before. So on our side, we are very happy with what has been happening thanks to Influx and AWS. That’s it on our side. Thanks a lot. I hope you also had fun listening to us and found it interesting. Caitlin, I’ll give you back the control.
Caitlin Croft: 00:34:42.682 Awesome. Thank you, Thomas. Thank you, Nicolas. So while we wait, we’ll just give you guys another minute to post any other questions you may have. I just wanted to remind everyone of some amazing resources available to you from the InfluxDB side. We actually have an engineering-led industrial IoT demo happening this Thursday at 7:00 AM Pacific, 3:00 PM BST, the same time that this webinar started. I believe it will run for just about 30 minutes, so kind of short and sweet. So if you guys are still really interested in InfluxDB for industrial IoT data, sensor data, check it out. It’s completely free. I’ll be sure to post these links in the chat in a little bit. And then there’s also a really fantastic blog around how to save 96% on data storage costs with InfluxDB. We always try to post really helpful blogs for our community; whether you’re brand new to the community or you’ve been using InfluxDB for a long time, there are going to be fantastic resources in our blog. And finally, if you’re really interested in learning more about InfluxDB, come bug us. We’d love to run a proof of concept for you, show you a customized demo, and show you how InfluxDB can help you with your time series data needs.
Caitlin Croft: 00:36:21.178 All right. So I see people have been posting questions in the chat. So let’s just go through them. If you have any more, please feel free to post them in the Q&A or in the Zoom chat. So the first question is, which protocol works with Indabox? Is it similar to OPC UA?
Thomas Delquié: 00:36:43.560 So we have implemented, by default, several protocols in the Indabox, like Modbus TCP, EtherNet/IP, S7, OPC UA, and FTP for now.
Caitlin Croft: 00:36:58.186 Perfect. Let’s see. How much have you saved in terms of licensing, OpEx, and CapEx costs compared to OSI PI? Is it scalable to multiple sites very easily?
Thomas Delquié: 00:37:15.732 Like I said, we slashed costs by about 15%. So that’s the scale of the reductions that we had. And yes, it is scalable to multiple sites. The challenge at Teréga is that we have a network of 5,000 kilometers and a lot of distant sites; we have about 500 or 600 of them. And so collecting the data from all the sites was really complicated. This approach is extremely scalable. It’s a distributed approach: you put an Indabox on your distant sites and it sends the data directly to the cloud. So it’s very easy to scale. That was the point.
Caitlin Croft: 00:37:57.205 All right. How have you achieved AF data analytics and PI Vision reporting and visualization?
Thomas Delquié: 00:38:06.419 Is the question, how did we achieve that yet?
Nicolas Lafargue: 00:38:09.120 Yes.
Caitlin Croft: 00:38:10.249 Mm-hmm.
Nicolas Lafargue: 00:38:10.923 Data analytics and PI Vision reporting.
Thomas Delquié: 00:38:12.803 Yeah, we recreated services in IO-Base to visualize the data. So we have a small module with a data reporting library that we implemented; it’s in your hands to create the reporting that you want. Of course, it’s not Tableau or Power BI, obviously. And same thing for the data visualization: we have a SCADA in the cloud, a tool to create the drawings that you want, onto which you can input your data.
Caitlin Croft: 00:38:44.876 How did you handle data reprocessing, for example, purging and ingesting of invalid data? Did you face any issues with InfluxDB and what volume of data — how much volume of data are you ingesting per day?
Thomas Delquié: 00:39:01.688 Mm-hmm. For the data reprocessing, at first, I have to say, we collect all the data and we send it raw to our data historian. That was the point: all of the points, even the invalid ones. And then you can create formulas to refactor the data as much as you want.
Nicolas Lafargue: 00:39:25.824 For example, for Teréga, we send 10 gigabytes of data per day, compressed, of course.
Caitlin Croft: 00:39:36.665 Do you guys have any — oh, sorry.
Thomas Delquié: 00:39:39.333 No, no. I do not recall that we faced any issues with Influx. Sorry, we are searching.
Caitlin Croft: 00:39:46.784 That’s great news to me that you didn’t face any issues. We like hearing that. What’s the frequency of data logged into InfluxDB? Do you guys have any figures on how many metrics you were ingesting?
Thomas Delquié: 00:40:03.683 Yeah, we talked about it today. We have about 100,000 metrics coming in per day, and the frequencies range from five seconds to one or two minutes; some of them are even just once per day. But most of them are in the range of five seconds to one minute, that was the point. And no, there are no specific latency issues, but of course, between the collection on site, the data going through the internet connection, and then the writing to Influx, we consider that we are in near real time. It takes a few seconds for any new data generated on a site to be uploaded to the cloud. And that way I can also answer the next question, which is whether you can change the frequency of data collected on site. You can go to one second if you want, but obviously, the higher the frequency, the less data you can collect overall.
Caitlin Croft: 00:41:08.182 Were there any data latency issues when connecting with InfluxDB Cloud?
Thomas Delquié: 00:41:16.094 Not that we know of.
Caitlin Croft: 00:41:18.384 Perfect. How is the — oh, go ahead.
Thomas Delquié: 00:41:24.363 As we said, we’re in near real time rather than perfect real time. So no, we did not encounter any latency issues on that part.
Caitlin Croft: 00:41:34.754 How is the Indabox HTTPS request throughput? Is it able to fetch data at a frequency of 1 second, 5 seconds, or 10 seconds?
Thomas Delquié: 00:41:52.986 Yes, you can parametrize the frequency of data collection directly on the Indabox. So you are the one who decides what the frequency of data collection will be.
Caitlin Croft: 00:42:05.840 Were there any on-prem backup measures considered just in case the cloud connectivity was lost intermittently so that the data during that time isn’t lost?
Thomas Delquié: 00:42:17.332 Yes, obviously. There is a classic store-and-forward process built into the Indabox, so it has a cache. When we lose internet connectivity, data is stored locally. And then, when the internet connectivity comes back, the data is sent with the correct timestamps, obviously.
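A minimal sketch of that store-and-forward behavior, assuming an in-memory bounded queue and the hypothetical ingestion endpoint from earlier; a real data diode would persist the buffer to disk, but the flow is the same: keep points locally while offline, then flush them with their original timestamps.

```python
# Hypothetical store-and-forward loop for an edge collector.
from collections import deque
import requests

ENDPOINT = "https://api.io-base.example/v1/ingest"   # illustrative URL
buffer = deque(maxlen=100_000)   # bounded cache; oldest points drop first

def push(point: dict) -> None:
    buffer.append(point)         # timestamp was set at collection time
    flush()

def flush() -> None:
    while buffer:
        point = buffer[0]
        try:
            requests.post(ENDPOINT, json={"points": [point]},
                          timeout=5).raise_for_status()
        except requests.RequestException:
            return               # still offline; retry on the next push
        buffer.popleft()         # delivered with its original timestamp
```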
Caitlin Croft: 00:42:39.487 Did you custom-develop the API? And is it part of the IO-Base solution?
Nicolas Lafargue: 00:42:47.583 Yes, we developed the API ourselves, inside Docker Linux containers. We developed it in the IO-Base layer that I showed you before. It allows us to call the InfluxDB API and do some filtering, permissions, and other things, like using the Redis cache I showed you before. So we have our own custom API, called the IO-Base API, that sits on top of the InfluxDB API. And it is this API that is called by the Indabox over HTTPS, in JSON format, with endpoints that we chose to be able to query and write data into InfluxDB.
Caitlin Croft: 00:43:41.844 Oh, this is exciting. So someone has already looked at your page and looked at IO-Base pricing. And someone has a question around how your pricing works and asks, €6,000 is for how much time of data? Is it for a year, two years?
Thomas Delquié: 00:44:00.990 It’s a SaaS application, so it’s a yearly subscription. So that’s quite easy on that part.
Caitlin Croft: 00:44:08.209 What was the main driver to look for this solution in getting to the cloud? Has it been a cost-driven main decision, or has it enabled productivity gains that were not possible before?
Thomas Delquié: 00:44:24.001 Okay, yeah, I could go one hour just on this question, so I’ll try to be brief. At the beginning, the goal was really to be more agile in IT. So we went full cloud. But very fast, we discovered that the other problem was the data. We needed to have our data available, and we had a lot of monolithic applications where the data was locked in. That was a big problem. So we started a big program where we said, okay, we want to build a data hub where we will have several master databases to store our reference data, for it to be available for other services. And that was really the starting point of IO-Base for IoT data: to say, okay, we need to regain mastery of our data. It’s in this vision that IO-Base was born. So the goal was to make the data available to everyone. You can export everything you want; you can create by yourself all of the services you want. So the main driver was really the operational side: to gain time, to gain access to the data. And of course, at the same time, we needed to save money, obviously. So the main driver was that we need our data to create new services.
Caitlin Croft: 00:45:47.503 I think you might have covered this, but just want to double-check since someone has asked it. Is IO-Base AWS dependent or independent?
Nicolas Lafargue: 00:45:58.312 IO-Base is hosted on AWS, but you don’t have to know that to use it. You just use an HTTP REST API. It’s SaaS software, and internally, as I showed you, it uses AWS services to be able to scale. But when you purchase IO-Base, you don’t need to know anything about AWS.
Caitlin Croft: 00:46:34.291 How many data stream tags are being handled? And what range of data from the various sites? I think you’ve covered some of that, but.
Thomas Delquié: 00:46:45.098 Yeah. Yeah. We already talked about that one. So I think it’s okay.
Caitlin Croft: 00:46:48.531 Yeah. Perfect. I presume in a facility of this size, there would be several data points that are used from the legacy data historians inside the SCADA operational machines. Now that SCADA is kind of decoupled from historians, how has this challenge been dealt with? Oh, you’re on mute, Thomas.
Thomas Delquié: 00:47:23.878 Okay. I do it this way.
Caitlin Croft: 00:47:30.778 Perfect.
Thomas Delquié: 00:47:31.279 So yeah. I went back to the architecture view to say: yes, we had some data that was specific to the SCADA system. It was not in the data historian; it was on the SCADA systems that we had to create new points or new values. And so we also collect data directly from the SCADA through the Indabox. That’s how we resolved this problem. Yeah, OPC.
Caitlin Croft: 00:47:59.729 Right. How can you compare the total cost of ownership/the cost of the Teréga solution versus legacy data historians?
Thomas Delquié: 00:48:13.020 Sorry, I was not there. Yes, the main value we have is a 15% reduction in cost. That’s about the scale of the reduction in costs.
Caitlin Croft: 00:48:31.689 Awesome. All right. Let’s see. How many people do you have on the team maintaining the solution?
Thomas Delquié: 00:48:40.078 How do you say it? At Teréga Solutions, we are 20 people right now. And we have a team of about six to seven people on the IO-Base project.
Caitlin Croft: 00:48:56.335 Do you cache the data locally to avoid data loss during network breaks?
Thomas Delquié: 00:49:02.433 Yes, that one. Yeah, the answer is yes.
Caitlin Croft: 00:49:06.443 Perfect. How do you achieve security and data governance?
Thomas Delquié: 00:49:14.132 Okay. That’s a good one. So for security, the main goal is to protect our industrial sites. That is done via the data diode, so it’s not an issue anymore. We are fine. We sleep well. That’s perfect. We obviously also want to secure the IO-Base part in the cloud, because we do not want anybody to access our data. And on that part, we are very safe. First, we are on AWS, so we use all of its services to make sure nothing happens to our infrastructure and to its access. Then we use a zero-trust approach globally at Teréga. And specifically for IO-Base, we use [inaudible] services. And in the application, you can manage scopes and roles for anyone who has access to IO-Base, to see who has access to the application and its functionalities, and, for all of these functionalities, which tags they have access to. So of course, you also need people doing the data governance. That part is on the client side: the Teréga operational teams decide who is in which group and which tags they have access to. And in IO-Base, you can manage a hierarchical tree for your tags. So you can say, okay, this person in this sector needs to have access to these tags, the ones for their operational needs. So they will only have access to the tags in this part of the hierarchical tree.
Caitlin Croft: 00:50:59.426 How are tags configured for IO-Base? Does a local workstation connect to a monitor? Is there a shell or SSH with config files?
Nicolas Lafargue: 00:51:15.366 So everything is configured on the Indabox side, where we can select exactly which tags we get from the PLCs or sensors. We have a web view in the Indabox, and the user can use it to select and configure everything. But it’s local; it’s not on the internet, of course.
Thomas Delquié: 00:51:44.690 Yes. You connect to the industrial side of the box and you parameterize in your browser which tags you want to collect, with which access, and to which target you want the data to be sent.
Nicolas Lafargue: 00:51:56.809 Yeah. And you can use an Excel file to download and import the tags and their mapping to the tags in the PLCs.
Caitlin Croft: 00:52:11.538 Let’s see. If we need to do a POC for one site, how much time, effort, and cost is needed considering 5,000 tags for a small site?
Thomas Delquié: 00:52:22.486 Yeah, it’s an approach we use a lot because, of course, we need to prove that it works, obviously. We have a specific template for that, where we just lend the Indabox and three or six months of access to IO-Base. And depending on the level of help you need to implement it, we also include a few days to integrate the data you need. So a POC typically goes from 5K to 10K euros. As for the time needed, the biggest part is on the client side, because they need to collect all of the tags they want to have, what their names will be, and what their [inaudible] will be. So you need to map your data, and that’s what is the most time-consuming. Apart from that, once you have that, we come in one day. We install the Indabox, we parameterize it, and it goes into the cloud. And afterwards, obviously, you create your reporting, your alerts, and so on.
Nicolas Lafargue: 00:53:27.833 It can even be shorter than one day: if you have everything okay on your site, for us to [inaudible], it’s direct. And then you can quickly configure the Indabox and use it.
Caitlin Croft: 00:53:41.141 Okay. So once all the data is in InfluxDB, how do you guys handle downsampling?
Nicolas Lafargue: 00:53:53.581 Maybe can you repeat the end word?
Caitlin Croft: 00:53:56.615 Once the data is in InfluxDB, are you guys doing any downsampling to look for long-term trends or anything, just storing it all, all the raw data?
Thomas Delquié: 00:54:10.860 No.
Nicolas Lafargue: 00:54:11.138 No, just storing. And then we filter it when we get it with the API.
Caitlin Croft: 00:54:15.838 Okay. Well, let’s see. So someone’s asking, what about using historian data in SCADA, trends, queried calculations, etc.?
Thomas Delquié: 00:54:32.071 Yes, so for queried calculations, you have to redo them in IO-Base with the formula functionality. And for the trends, it’s coming soon in IO-Base, too. So we do not get legacy data from the data historians [inaudible].
Nicolas Lafargue: 00:54:51.344 And it’s important to say that we use no code to do it, also.
Thomas Delquié: 00:54:54.570 Yes, of course.
Nicolas Lafargue: 00:54:55.353 You have a friendly UI to do it manually.
Caitlin Croft: 00:55:02.481 What is the Indabox data cache size? And the SCADA machine that collects data from the Indabox for certain tags, is that from the cache? I think they’re just kind of confirming where the data is collected.
Nicolas Lafargue: 00:55:21.884 So it’s 8 gigabytes — okay.
Caitlin Croft: 00:55:26.792 Good. Can you elaborate on how the API reads data coming in natively from the PLC to store it in the cloud? Is there any protocol conversion to MQTT or OPC UA, et cetera?
Thomas Delquié: 00:55:42.881 Yeah, so for my part, and maybe Nicolas will complement, the Indabox has two modules. The module on the OT side connects to the sensors and PLCs with protocols like OPC UA and so on, and collects the data. Then it transforms the data into JSON and transfers it. And then on the cloud side, we take this JSON and send it to the API or an MQTT target. Right. Does the IO-Base subscription price include pricing for InfluxDB? Yes.
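To illustrate that protocol-conversion step, here is a hedged sketch of polling one value over Modbus TCP and wrapping it as JSON for the cloud side, assuming the pymodbus library; the PLC address, register, scaling, and tag name are all hypothetical.

```python
# Hypothetical OT-side poll: read a Modbus register, emit a JSON point.
from datetime import datetime, timezone
from pymodbus.client import ModbusTcpClient

client = ModbusTcpClient("192.168.1.10")     # hypothetical PLC address
client.connect()

rr = client.read_holding_registers(0, count=1)   # hypothetical register
point = {
    "tag": "site042.gas.pressure",               # hypothetical tag name
    "time": datetime.now(timezone.utc).isoformat(),
    "value": rr.registers[0] / 10.0,             # assumed scaling factor
}
client.close()
# `point` would then be queued and sent to the HTTPS ingestion API.
```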
Caitlin Croft: 00:56:25.950 Perfect. Makes it easy for your customers. Let’s see. Are you using Telegraf in front of InfluxDB? Or are there microservices using the InfluxDB client to store the data?
Thomas Delquié: 00:56:42.665 We are not using Telegraf right now. This is a discussion we have with InfluxData. But for now, no, we are not using it.
Caitlin Croft: 00:56:50.676 Okay. Perfect. All right. I think I got everyone’s questions as of right now. So we’ll stay on the line for another minute or so, just in case there are any last-minute questions that you guys have. Thank you, everyone, for joining today’s webinar. Thank you, Thomas and Nicolas. It was a really great job. Clearly, lots of questions. So I think people are super excited to learn more about Teréga and what you guys have done with InfluxDB. So if anyone has any questions after the webinar that you wish you had asked, feel free to email me. Everyone should have my email address. And I’m more than happy to connect you with Thomas and Nicolas, who can answer your questions. So thank you, everyone, for joining. Thomas, Nicolas, do you guys have anything else you want to add? Last thoughts? Anything else?
Thomas Delquié: 00:57:48.383 Well, thanks, InfluxData, for this webinar. That was fun. Thank you.
Caitlin Croft: 00:57:52.535 Awesome.
Thomas Delquié: 00:57:52.887 [foreign]
Caitlin Croft: 00:57:56.279 Well, thank you very much as well. Thank you, everyone, for joining today’s webinar. And once again, the webinar has been recorded and will be made available probably by tomorrow morning. So the recording as well as the slides will be made available as soon as possible. And you can find it on the same page that you used to register for the webinar. So it’ll be super easy to find all of that. And without further ado, I think I’m going to close things up. Thank you, everyone, and I hope you have a great day. Thanks again, Thomas and Nicolas.
Thomas Delquié: 00:58:34.857 Bye-bye.
Caitlin Croft: 00:58:36.037 Bye. Thank you.
Thomas Delquié
CTO, Teréga Solutions
Thomas is currently working as CTO of Teréga Solutions, where he is responsible for creating new digital solutions related to the energy transition, especially in gas.
He has extensive experience in cloud and data projects: migrating corporate messaging to G Suite, setting up Amazon Web Services and Google Cloud Platform accounts for Teréga, migrating Teréga’s on-premises infrastructure to the public cloud, and creating and managing a data team.
Nicolas Lafargue
Tech Lead, Teréga Solutions
Nicolas is a Tech Lead at Teréga Solutions, specializing in DevOps and driving the IO-Base project. He played a crucial role in Teréga's digital transformation, including the migration to the cloud on AWS and the adoption of APIs across the company's IT systems. By leading the transition to AWS, Nicolas ensured scalability, flexibility, and enhanced security for Teréga's infrastructure. Additionally, his expertise in API integration facilitated the seamless connectivity of various systems, enabling efficient data exchange and streamlining business processes. Nicolas excels in optimizing workflows, implementing automation strategies, and fostering collaboration, all of which have contributed significantly to Teréga's success in the dynamic energy industry. His dedication to innovation and outstanding results makes him a valuable asset to the organization's ongoing energy transition efforts.