Real-Time Telemetry Monitoring Across Aerospace and Satellite Operations
Session date: Aug 22, 2024 07:00am (Pacific Time)
In the dynamic field of aerospace and satellite technology, the ability to capture and analyze extensive, high-cardinality sensor data in real time is essential for effective telemetry monitoring throughout the manufacturing and deployment phases. This requires a robust time series database capable of ingesting millions of data points per second from various sources, while also ensuring rapid query responses.
Join the webinar for a tech deep dive and real-life aerospace customer stories showing how others use InfluxDB as a single observability data store for ingesting, storing, and analyzing all their time series data to provide real-time monitoring and accurate insights.
In this session, InfluxData’s VP of Product Marketing, Balaji Palani, will share how aerospace companies use a purpose-built time series database to:
- Improve Uptime and Availability SLAs at Scale with non-blocking writes/queries, high data ingestion rates, and optimized in-memory analytics
- Enable Real-Time Data Collection at Unlimited Scale by capturing all metadata from diverse data sources like Kafka, OpenTelemetry, MQTT, OPC-UA, and more
- Reduce Total Cost of Ownership through high compression for cloud storage, no data retrieval fees, and simplified data management
Watch the Webinar
Watch the webinar “Real-Time Telemetry Monitoring Across Aerospace and Satellite Operations” by filling out the form and clicking on the Watch Webinar button on the right. This will open the recording.
[et_pb_toggle _builder_version=”3.17.6” title=”Transcript” title_font_size=”26” border_width_all=”0px” border_width_bottom=”1px” module_class=”transcript-toggle” closed_toggle_background_color=”rgba(255,255,255,0)”]
Here is an unedited transcript of the webinar “Real-Time Telemetry Monitoring Across Aerospace and Satellite Operations.” This is provided for those who prefer to read rather than watch the webinar. Please note that the transcript is raw. We apologize for any transcribing errors. Speakers:
- Anais Dotis-Georgiou: Developer Advocate, InfluxData
- Balaji Palani: VP of Product Marketing, InfluxData
ANAIS DOTIS-GEORGIOU: 00:00
And we can get started.
BALAJI PALANI: 00:02
Sure. Anais, thank you. Hey, everyone. Excited to be here. I’m going to share my screen really quick. Can you give me a thumbs up if this comes up? Okay. Great. Thank you.
ANAIS DOTIS-GEORGIOU: 00:17
Yeah. Perfect.
BALAJI PALANI: 00:19
All right. Awesome. Welcome, everyone. Good morning, good afternoon, and good evening as you are joining from around the world. My name is Balaji. I lead product marketing at InfluxData. I’ve been here about six years now. Exciting times to be in, especially for InfluxData. We are the makers of InfluxDB. As I kick off this presentation, I just want to talk about real-time telemetry monitoring for aerospace and satellites. A brief agenda: we want to talk about some context, what the big challenges are, and the product. And I think Anais will do the honors for a very short demo. We’ll just quickly kick it off since we have only 25 minutes or so. So, before we get started, what are these companies, and what do they really have in common? As you can see, all of these are in some way, shape, or form in the aerospace sector. Thales Alenia is a global manufacturer of space components that works very closely with the European Space Agency. Loft Orbital is another leading provider of space infrastructure. I can’t name the company, but this is a commercial space station developer creating space stations, international space stations, and so on. Last but not least, that company is basically building next-generation satellite-based communication systems, including broadband internet and so on.
BALAJI PALANI: 01:58
I mean, all of these companies have a lot of telemetry data being generated every second, millisecond, and so on. It’s coming in from a lot of these components as they’re either launching them into space or even manufacturing them. You can see some of them are actually using data generators in their manufacturing. And all of them seem to have either a large volume of data coming in or a large number of components that they want to monitor. At the same time, they want to query. They might want to send this data over to, or give access to, their partners and so on. Very common. We see this often across the aerospace industry without going into details. And I do have a couple of case studies later where I can jump into specifically what they do and what some of those metrics are.
BALAJI PALANI: 03:00
So, I talked about, hey, some of them are being used in the manufacturing flow. Some of them are using it for testing in their dev-test environment. And, of course, when you have a large electric vertical takeoff and landing vehicle, you might want to capture that data during takeoff and landing and during in-flight operations. What we are talking about is essentially just in numbers, right? So, you’re talking about anything more than 100,000 sensors, maybe sometimes in the millions. That usually translates to a billion or more data points. And we have a lot of them in the billions of data points and above. Most of them have less than one second of data frequency. And because we are talking about a very large, complex, real-time, mission-critical kind of physical implementation, these usually have a lot of device tags. Device tags are used when you want to understand where a particular sensor is. You do it by saying, “Hey, this is located in the upper part, lower part, some location, some geographic location, or even somewhere within the component itself,” and so on.
BALAJI PALANI: 04:16
All of these translate to— so why am I talking about this? I’m from InfluxDB, right? So, what does this have to do with InfluxDB? Well, guess what? These use cases, whether you’re in manufacturing or you’re launching your aerospace satellites or whatever, they all speak time series. What is time series? Time series data is nothing but something that you want to measure—a state change, performance, some kind of problem—that changes over time. Every data point here is timestamped. You have a timestamp, and you have a bunch of tags that I talked about. Again, sometimes you want to get all those tags. And then you have a few fields, what we call temperature, pressure, altitude. I mean, there’s a whole bunch of things that you can measure. You can also measure events. Sometimes you generate events like, “Hey, this component failed,” or “It’s about to fail.” You have warnings and those kinds of things—you translate those logs into some kind of events. Again, these are encapsulated by those time intervals. As is the case, typically you want to analyze, or you want to do some analysis, based on time. So, time becomes your magnifying glass for that operation and transformation, and so on.
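The point shape described above—a timestamp, a set of tags, and a few fields—maps directly onto InfluxDB’s line protocol. A minimal sketch in Python (the measurement, tag, and field names here are illustrative, and real line protocol also requires escaping and type handling this sketch omits):

```python
# Build an InfluxDB line protocol point: measurement,tags fields timestamp.
# Names like "telemetry", "satellite", "temperature" are made up for the example.
def to_line_protocol(measurement, tags, fields, ts_ns):
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in fields.items())
    return f"{measurement},{tag_str} {field_str} {ts_ns}"

line = to_line_protocol(
    "telemetry",
    {"satellite": "sat-042", "subsystem": "power"},  # tags: indexed metadata
    {"temperature": 21.4, "voltage": 3.3},           # fields: measured values
    1_700_000_000_000_000_000,                       # nanosecond timestamp
)
print(line)
# telemetry,satellite=sat-042,subsystem=power temperature=21.4,voltage=3.3 1700000000000000000
```

Tags describe where a point came from; fields carry what was measured; the timestamp anchors both in time.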
BALAJI PALANI: 05:38
So, what are the challenges with managing time series data? Obviously, first, as I mentioned earlier, there’s a massive volume of data coming in. And it’s not coming in like, “Hey, I have a billion data points,” and then once I ingest it, it stops. It’s continuously coming in at high speed. So, you have to constantly ingest it and analyze it at the same time. And you have to take real-time action. As I mentioned, again, these are mission-critical applications; just a fraction of a second of delay could cost you billions of dollars and so on. And last but not least, data cardinality is a true issue. Again, I talked about how you can capture a large number of tags. When you do queries, you might want to access that data faster, and to access the data faster, you may have to index all of these tags. So, when you have a lot of combinations in your tags, that could cause you to have enormous cardinality. In the time series environment, it’s a very well-known problem. Typically, what customers do is they shorten the time duration that they retain the data, or they run multiple instances, again, causing more complexity in how you collect your data, how you analyze your data, and so on.
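To see why tag combinations explode, note that worst-case series cardinality is the product of the distinct values per tag key. A quick sketch with made-up counts:

```python
from math import prod

# Hypothetical tag keys and their distinct-value counts for one deployment.
tag_value_counts = {"satellite": 500, "subsystem": 20, "sensor": 1000}

# Worst case: every combination of tag values appears as its own series.
worst_case_series = prod(tag_value_counts.values())
print(worst_case_series)  # 10000000 -- ten million series from three tag keys
```

Adding one more tag key with even a handful of values multiplies that number again, which is why index-per-tag databases hit cardinality limits so quickly.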
BALAJI PALANI: 07:00
So, I’m going to talk about—okay, these are the problems. So, what is the solution? Before I go into the solution: most tools, including relational databases, of course—everybody’s familiar with Postgres or MySQL or MongoDB—hey, let’s just use those because you’re familiar with them. Let’s get started. Typically, these tools cannot handle the challenges, especially at that scale, right? So relational data stores are not fit for purpose. They are not built for scale. They’re not built to ingest that data in real time. When you do queries, typically they are slow because of the whole architecture, right? These data stores are built for consistency, especially in a distributed environment. As data comes in, you want to query. Sometimes before the data [inaudible], you are making sure it’s replicated, it’s consistent, and all of those things. So, real-time querying doesn’t happen. There’s what we call time to be ready, TDBR. It usually takes a long TDBR, on the order of seconds, sometimes minutes, and so on. They are not architected for real-time queries. Lifecycle management, again—we talk about relational data stores, which say, “Hey, this is related to that foreign key,” and things like that. They don’t care about how you handle data over time. The recent data is more important than the historical. So, you might want to shard your data in a way that allows you to drop data easily. All of this sharding and data retention is not built in. Of course, you can build it in. You can create additional things, triggers, and things like that that will help in the process. But hey, that means you’re doing extra work within this data store.
BALAJI PALANI: 08:56
So, what’s InfluxDB? How is it different? We built for scale. We have been around since 2014 or so. We are an open source company. We have open source, and we have commercial options. But from the beginning, we have been building for scale. We don’t block the writes and reads, meaning they have independent pipelines. As writes come in, we accept those writes and make them available for the reads to happen. We are prioritizing read availability over consistency. Data lifecycle management is built in: we shard automatically, and then you can set retention policies after which the data is expired. All of this happens within the database; it’s architected for that purpose. And we are flexible on schema, meaning you don’t have to build a schema. You don’t have to say, “Hey, my tables have these many columns, and these are this, and these are that.” You can just ingest data. So, that makes for a faster development cycle, right? If you’re a developer, you can just push the data in and then figure out the queries later on. Last year, we launched InfluxDB 3.0. Basically, this was architected specifically for performance and lower cost of ownership. Why? Because, again, over the years, we have worked with customers. They’ve told us, “Hey, we’re getting really real-time. We need queries to perform in under a second. We want TDBR to be lower than a second,” and things like that. And also, “Hey, we need cheaper storage. We need to store for 10 years or more, but we can’t be storing on SSD.” So, all of this was the focus of 3.0.
BALAJI PALANI: 10:42
Just to briefly explain: what 3.0 has is an architecture built using some open source components. We call it the Apache Arrow ecosystem. Apache Arrow is a great in-memory format for columnar analytics. You can bring data into memory and then analyze it in memory, and so on. And a columnar database makes these things really fast. With the previous versions of InfluxDB, data used to land, we used to persist it, and then we used to give access to the reads. So, the time to be ready used to be about a second or two seconds and so on. With InfluxDB 3.0, data lands and is immediately queryable. So, it’s less than a second—maybe even call it zero TDBR—because, as soon as data lands, it’s available in memory in the ingesters, and the queriers have access to that data. This means you can do some interesting things. If you get the data in and you’re looking at, hey, the recent edge, right, then with caching and things like that, we make it even faster. We are optimizing it and so on. So, that helps you unlock some real-time opportunities, real-time use cases for low-latency analytical queries, especially on the recent edge of time.
BALAJI PALANI: 11:59
Second, what we did was we used Apache Parquet. Parquet is a format—for those of you who don’t know, it’s an analytical columnar format, especially useful on disk or even on S3 and so on. And we chose it because of its high compression. It has best-in-class compression. So, as you store data, as you process data, it really, really compresses it. And we also use commodity object storage for storing that data, which means that it lowers your TCO. So, you have data that’s available in real time, and then we store it in Parquet. You don’t have to write special archive records and so on; it’s already archived, essentially. Lastly, we’re using Apache DataFusion as the query engine, which knows how to query that data. It optimizes queries. It optimizes how it accesses the data. It has pushdowns. It has parallelism. A lot of these things help with the last point, which is that we’re not using any particular indexing strategy for accessing that data. It’s all within DataFusion—it knows how to query, knows how to optimize, knows how to do columnar analytics—which means that you don’t have to be bound by any cardinality limits. You have a billion series? No problem. You can bring it into InfluxDB, and then we will take care of it for you.
BALAJI PALANI: 13:26
This is a typical use case architecture for aerospace monitoring. Again, I’ve included a launch here. You can do launch telemetry and tracking. You can also do manufacturing, collecting data from the factory floor, machinery, and so on. Or if you have any custom APIs, you can do that as well. We have open source Telegraf, which is a collection agent with a whole bunch of plugins, like for MQTT and OPC UA. There are system plugins, and you can also write your own custom plugin. So, if you have a custom API, we allow you to do that. If not, you can also use what we call client libraries. Client libraries are language-specific code snippets. If you’re using Java or Go or whatever, you can just import those client libraries and then use those specific methods to ingest the data, query the data, and so on. Again, this is the InfluxDB 3 architecture. You can see clearly that we have the compute layer. There’s a clear separation between compute and storage. Storage happens on S3 or any object store. By the way, I say S3, but we do support AWS, Azure, and GCP if you’re using one of those cloud providers, and of course, on-premises. We’re going to talk about that later as well. And on the query side, you can visualize using Grafana or Superset—those are two good visualization tools. You can use it within your applications, using client libraries, again, to query your data. You can also use any one of your ML tools for predictive maintenance and things like that.
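As one illustration of the Telegraf collection path described above, a minimal configuration that reads MQTT telemetry and forwards it to InfluxDB might look like the sketch below. The broker address, topics, org, and bucket names are placeholders, not values from the webinar:

```toml
# Collect telemetry from an MQTT broker (placeholder address and topics).
[[inputs.mqtt_consumer]]
  servers = ["tcp://broker.example.com:1883"]
  topics = ["factory/+/telemetry"]
  data_format = "influx"   # payloads already in line protocol

# Forward to InfluxDB using the v2-compatible write API (placeholder values).
[[outputs.influxdb_v2]]
  urls = ["https://influxdb.example.com"]
  token = "$INFLUX_TOKEN"
  organization = "my-org"
  bucket = "telemetry"
```

Swapping the input block for `[[inputs.opcua]]` or a custom `execd` plugin follows the same pattern, which is what makes Telegraf a convenient single collection layer.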
BALAJI PALANI: 15:00
We also have support for Apache Iceberg. Iceberg is a common table format. We make the catalog available in an Iceberg catalog kind of format for these tools to access the data directly on S3—the Parquet data—and then do your machine learning thing, and then push your data back into InfluxDB 3. Influx is really for those real-time use cases, but if you want to do machine learning, we support that as well. Briefly, before I hand it over to Anais for a quick demo: I talked about storage, right? Some of our customers have seen a 10X kind of reduction in storage costs. Either you can store more data, right, or you can reduce your storage costs, or both. We’ve seen somewhere north of a 90% reduction in storage costs. Especially if you’re storing on SSD, all of a sudden, when you go to object storage, you would see those cost benefits. One of the examples I mentioned earlier—again, I can’t name names, but this is a company which does satellite-based communication. Next-generation stuff. They were using Postgres. Now they have migrated over to InfluxDB 3 for their long-term telemetry store. Again, as you can see, they’re collecting 100,000-plus metrics per second from hundreds of different satellites that they launch. Specifically, why they found InfluxDB valuable was the fast query response times—under a second for most queries, and over long time ranges as well. And then they also wanted to retain data over 10-plus years, which is pretty common. And we have a cost-effective way of doing that. You can get up to a 10X reduction in storage costs.
BALAJI PALANI: 16:55
So, I talked about unlimited cardinality. Again, this all stems from the way we access that data. If you’re going to use regular databases, you have to index and so on. We don’t do that. We use DataFusion, which has a lot of these optimizations built in. If you run a query, it quickly looks at it and then creates a query plan that says, “Okay, you just need access to these particular Parquet files.” And then it can access those quickly, combine all of them, and then respond to your queries in under a second for most queries. Again, because of the way it’s built, once you run a query, even if it’s not on the recent edge, once it extracts and transforms the data, it actually caches that data so that when the next query comes in—any of your clients looking for the same thing—it’s readily available. So, that improves your query response times as well. And we also built in concurrent queries. It’s not just one query; it’s multiple queries. You can do all of that there. It’s really architected in a fashion that is built for that real-time performance. One of the largest aircraft manufacturers—again, this is another one where they were using data historians, especially AVEVA Wonderware, for their manufacturing as the aircraft moved through the manufacturing assembly lines. They are currently using InfluxDB. They are monitoring more than 3,000 parameters per second while meeting strict security and compliance requirements. Again, all of this is made possible within InfluxDB.
BALAJI PALANI: 18:38
So, for now, I’m going to hand it over to Anais for a quick demo.
ANAIS DOTIS-GEORGIOU: 18:43
Thank you so much. And just to introduce myself really quickly, I’m a developer advocate at InfluxData, and I’ve been here for about six years. The demo that I want to share with you today is just this flight demo, and you can find it at Influx Community. For those of you who aren’t familiar with Influx Community, it is just an org that has a bunch of different examples for how to use InfluxDB with a variety of different tech stacks. So, if you are looking to start with InfluxDB for a certain use case or with a certain stack and you don’t know where to begin, that’s a great resource just because it can help get you started on the path that you want to go. And specifically, today, I’m looking at this flight demo which leverages the OpenSky API. That is a paid API, but it fetches a bunch of data from a variety of different flights. And basically, this is mainly executed through this Python script here, where we first just import our dependencies, the most notable being the InfluxDB 3 Python client. This client library is leveraging and wrapping the Apache Arrow Flight client under the hood for all the query methods. And so, it also enables you to work with Polars and pandas and write data frames directly to InfluxDB, which makes converting any data from things like APIs really easy. The very first thing that you’re going to need to do is get an org and the URL of your InfluxDB instance as well as a token. And I’ll show you how you can do that in the InfluxDB UI in just a second. And then you’re going to want to instantiate your client. And then we make a query for the flights that we want to look at.
ANAIS DOTIS-GEORGIOU: 20:35
And then basically, all this code here is just converting all of the various flight data that we have access to as a part of this API into a nice little data frame. We do that down here. And then we also use the client library’s write method. We pass in the bucket that we want to write our data to. And a bucket is also just known as a database in InfluxDB. They’re the exact same thing. If you come from the SQL world, a database is the same. The only difference is that a bucket has a retention policy that’s automatically associated with it so that you don’t have to worry about expiring your old time series data—that gets handled for you. And then you can pass that data frame directly into the write method through the record parameter. And then you can specify what measurement you want this data to be written to. And a measurement is the same thing as a table in InfluxDB. And then you can also specify which columns in your data frame are tags versus fields. Ultimately, in InfluxDB v3, this doesn’t really matter because tags and fields both get converted to columns in a table.
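A condensed sketch of the write path just described, assuming the `influxdb3-python` client (`pip install influxdb3-python`). The host, database, measurement, and column names are placeholders rather than the demo’s actual values, and the network write only runs when a token is configured:

```python
import os
import pandas as pd

# A tiny stand-in for the flight data frame built from the API response.
df = pd.DataFrame({
    "time": pd.to_datetime(["2024-08-22T14:00:00Z", "2024-08-22T14:00:05Z"]),
    "flight": ["BA117", "UA940"],      # will be written as a tag column
    "altitude_m": [11280.0, 10670.0],  # field
    "ground_speed": [250.1, 243.7],    # field
}).set_index("time")

token = os.environ.get("INFLUXDB_TOKEN")
if token:  # only attempt the write when credentials are present
    from influxdb_client_3 import InfluxDBClient3
    client = InfluxDBClient3(host="https://your-host.example.com",
                             database="flights", token=token)
    # record= takes the frame; the measurement and tag columns are named
    # explicitly, and remaining columns become fields.
    client.write(record=df, data_frame_measurement_name="flight_data",
                 data_frame_tag_columns=["flight"])
```

The timestamp index becomes each point’s time, which is why the frame is indexed by `time` before writing.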
ANAIS DOTIS-GEORGIOU: 21:48
So functionally, there really isn’t a difference. This is more just for the user’s own organizational preference so that they can kind of distinguish which columns are the meat of their time series data—which ones have the altitude values, the longitude values, the latitude values—and which ones are more metadata about their time series. So, that’s basically what this demo is doing for us here. And then we can go to our InfluxDB UI. When you instantiate your client, you’re going to need your org ID. And the fastest way to find that is just through the URL here. You’re also going to want to find that host URL as well. And then one thing to note: you will see Cloud 2 mentioned here, and that can be confusing for new users who might think, “Oh, no. Am I on a Cloud 2 version?” The reason why this says Cloud 2 is because we kept the API the same so that there’s write compatibility with our other client libraries, and to kind of facilitate migration from V2 to V3. But if you’re ever confused, you can always look down here and verify that you are, in fact, using V3 or whatever version you’re using. So, yeah, don’t let that fool you.
ANAIS DOTIS-GEORGIOU: 22:59
Another really fun part of the UI is this kind of way to add data. And you can see that you can select a programming language here and pick any language that you want. And it’ll literally walk you through installing your dependencies, generating a token for you, initializing your client, and writing some data to a bucket of your choice. And then also, we’ll walk you through using SQL to query InfluxDB V3 and return that data as a data frame. So, whether or not you want to jump right into this flight demo or just get familiar with the client library of your choice, you have options. And then we can go to the Data Explorer in the InfluxDB UI and select a bucket that we want to query our data from or database as well as a measurement. And we can also load all our fields. And so, this is what I was saying about being able to kind of distinguish fields versus tags and that being kind of a user preference there. And we have all these waypoint fields, but I’m more interested in the altitude. And so, I can run that data for the past hour. We can see how quickly we were able to return all our results as well as search for our results. I could see just all the data coming out of London, for example. And we have data including our airport, destination city, destination code, flight time, various waypoints, ground speed. And someone asked earlier too, can you put geospatial data? You absolutely can, like latitude and longitude.
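The Data Explorer query just described can also be expressed as plain SQL and passed to the client’s query method. The table and column names below are assumptions for illustration, not the demo’s exact schema:

```python
# SQL as you might run it in the Data Explorer or pass to the Python
# client's query(); "flight_data" and "altitude" are assumed names.
sql = (
    "SELECT time, flight, altitude "
    "FROM flight_data "
    "WHERE time >= now() - interval '1 hour' "
    "ORDER BY time"
)
print(sql)
```

InfluxDB 3’s SQL support comes from DataFusion, so standard constructs like `now()` arithmetic and interval literals work as in other SQL engines.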
ANAIS DOTIS-GEORGIOU: 24:35
And then we can also use the InfluxDB Grafana plugins to easily make beautiful dashboards like this in Grafana, because you can really only use the InfluxDB Data Explorer for just that—exploring your data, checking that you are, in fact, receiving data as you expect—but you can’t create dashboards there. We’re not trying to be a one-stop shop. We want to encourage you to use the visualization and business analytics tools of your choice that you’re already using. Once you’ve verified that you have data in InfluxDB, that’s when you can start making cool visualizations like this in Grafana, Superset, Tableau, whatever you might be using. And so, we have here a visualization of all of our flights and where they are, the average altitude of our flights, how many flights are currently in the air, things like average ground speed, which airlines the flights are coming from—mostly United. I guess if you want to fly to New York, maybe think about taking United flights. And the current altitudes of the various flights that are coming in. So, yeah. You can build cool dashboards like this easily. And that’s our quick demo for today.
ANAIS DOTIS-GEORGIOU: 25:49
One last thing that I wanted to mention. Oops, there is a blog that is associated with this demo as well, which will walk you through exactly how to set up everything as well as how to use the Grafana InfluxDB plugin and how to create SQL queries and how to leverage the Python client library. If you prefer to learn using blogs or technical tutorials, you have that option for you as well. And I will stop sharing. And with that, just want to see if there are any more questions that anyone might have.
BALAJI PALANI: 26:23
I do have just one more slide before we wrap up. Thank you, Anais. All right. So, you saw that wonderful demo. Thank you, Anais, for that. I don’t want to leave without mentioning this: InfluxDB 3 is available in three different flavors. One is Serverless. It’s cloud, managed by us, but it is a shared environment, meaning if you want to just kick the tires, if you want to get started, you don’t know what your workload is, and you’re still evaluating, that’s a great place to be. But if you know that InfluxDB is for you and you don’t want to be on shared infrastructure—because, for whatever reason, you want more control over your queries—there’s a lot more you can control using Dedicated. Perhaps you’re looking for an enterprise SSO connection, or you want to tweak how many queriers you have so that you give importance to your queries and things like that. Then you would choose InfluxDB Cloud Dedicated. It’s a managed service. It’s a fully distributed environment. You can scale. You typically reserve a set capacity, but it’s kind of a flex model, meaning if you know how many queriers, ingesters, and so on you need, you can say, “Hey, I want to go up and down based on my variability,” and so on. That is possible with Cloud Dedicated as well. Of course, it’s an annual subscription, meaning you buy an annual commitment, and then you draw down from that annual amount that you have.
BALAJI PALANI: 28:12
And Clustered is on-premises. Clustered and Dedicated are fairly similar—pretty much the same. It is deployed on top of Kubernetes. You can do it on-premises. You can do it in private cloud environments, wherever you want. Again, there are some regions where, if you want to do GovCloud, typically people use the on-premises option and then go deploy it themselves and manage it themselves. And again, these are some benchmarking results comparing 3.0 to the previous versions of open source. We took very similar setups: “Hey, if it’s 24 CPUs on open source, we did the same thing on 3.0.” There’s more detail if you want to know, “Hey, how does that compare?” There is a detailed benchmark report. If you go to influxdata.com, it’s either on the first page, or go to the particular product or platform overview, and then we provide a link to that. So, you can take a look at how we do it and what the benefits are and so on. But that’s it. Thank you. Hopefully, this was useful. And if you have any questions, please reach out to us. If you’re interested in the Dedicated or Clustered environments, we do have a great sales team who can help you out, do the POCs, evaluate at your own pace, and so on. So, please reach out to us.
ANAIS DOTIS-GEORGIOU: 29:43
Thank you all. Yeah, we are running out of time now. So, please, if you have any more questions, please ask them on the Influx Community Slack or the forums at community.influxdata.com.
[/et_pb_toggle]
Balaji Palani
VP, Product Marketing, InfluxData
Balaji Palani is InfluxData’s Vice President of Product Marketing, overseeing the company’s product messaging and technical marketing. Balaji has an extensive background in monitoring, observability and developer technologies and bringing them to market. He previously served as InfluxData’s Senior Director of Product Management, and before that, held Product Management and Engineering positions at BMC, HP, and Mercury.