Overcoming Connectivity Issues in Distributed Systems: Aerospace
By
Jason Myers /
Developer
Jun 28, 2024
Maintenance and repairs for aerospace operations in orbit present a considerable challenge. It’s not easy to dispatch a technician to fix components on a satellite. That’s why it becomes increasingly critical to plan for as many scenarios as possible before launching and deploying these kinds of devices.
To understand what’s happening with orbiting devices, companies need data. If that data stream breaks down, the priority becomes reestablishing it to avoid a very expensive piece of equipment being rendered useless.
There are several different strategies businesses can use to help mitigate or overcome poor or spotty connectivity in distributed systems. Here, we’re going to assume there are any number of edge devices trying to send data to a central hub.
Pre-transmission
Before you try transmitting data, there are strategies you can implement locally to make those transmissions more efficient. Some of these depend on the resources available on the device, which is a good reason to provision the device with enough storage and compute to ride out the connectivity gaps you expect.
- Data caching: This is especially useful if you need to use some of the data generated locally. Caching data on the device so it’s usable while the device is waiting to transmit it to the central hub can help keep your system operational.
- Compression: This is pretty straightforward. Compress your data as much as possible to reduce the bandwidth and transmission time needed to send it to your central hub.
- Priority scheduling: Is some data more important than other data? Knowing this ahead of time and prioritizing transmitting that data first ensures that it is more likely to reach the central hub in a limited connectivity window.
- Checksums and hashes: If you have concerns about data corruption, you can generate checksums or hashes for that data locally. Including those checksums with the transmission helps the central hub verify the data.
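To make three of the ideas above concrete (priority scheduling, compression, and checksums), here is a minimal sketch in Python using only the standard library. The function names, the packet layout, and the choice of zlib and SHA-256 are my own assumptions for illustration, not a prescribed format.

```python
import hashlib
import heapq
import json
import zlib
from itertools import count

_seq = count()  # tie-breaker so equal-priority readings keep insertion order


def enqueue(queue, priority, reading):
    """Add a reading to the outbound queue; lower numbers are sent first."""
    heapq.heappush(queue, (priority, next(_seq), reading))


def build_packet(reading):
    """Serialize, compress, and checksum one reading for transmission."""
    raw = json.dumps(reading, sort_keys=True).encode("utf-8")
    return {
        "payload": zlib.compress(raw, 9),
        # Digest of the uncompressed bytes, so the hub verifies what it will use.
        "sha256": hashlib.sha256(raw).hexdigest(),
    }


def verify_packet(packet):
    """Hub side: decompress, then confirm the digest matches before accepting."""
    raw = zlib.decompress(packet["payload"])
    if hashlib.sha256(raw).hexdigest() != packet["sha256"]:
        raise ValueError("checksum mismatch: payload corrupted in transit")
    return json.loads(raw)


def drain(queue, budget):
    """Build packets for up to `budget` readings, most important first."""
    packets = []
    while queue and len(packets) < budget:
        _, _, reading = heapq.heappop(queue)
        packets.append(build_packet(reading))
    return packets
```

With a short connectivity window, you would call `drain` with a `budget` that matches what the link can carry, so a priority-0 fault report goes out ahead of routine telemetry. Very small payloads may not shrink under compression, so batching several readings per packet is worth considering.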
Transmission
There are a lot of approaches you can use to build in safeguards for your data. While this list is by no means exhaustive, hopefully it will help get the conversation started on your end.
Tooling
I’m breaking this section down further because the tooling you ultimately choose may impact your configuration options. So, let’s take a look at some tools that can help manage your data so that you can take advantage of different configuration options more easily.
InfluxDB: This should come as no surprise, but having a local time series database instance helps manage all the data from the various systems and sensors on your devices. In particular, a single-node instance that supports edge data replication (EDR) is ideal. This feature creates a durable, local data queue so that if your connection fails or gets interrupted, the database continues to collect that data and then flushes the queue once connectivity returns.
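To illustrate the durable-queue behavior described above, here is a rough sketch of the general store-and-flush pattern in Python, backed by SQLite. This is not how InfluxDB implements EDR, and the `DurableQueue` class and its `send` callback are hypothetical names I made up for the example.

```python
import sqlite3


class DurableQueue:
    """Local FIFO whose rows are deleted only after the remote write succeeds."""

    def __init__(self, path=":memory:"):
        # Pass a file path in practice so the queue survives a restart;
        # ":memory:" is only a convenient default for experimenting.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS outbox ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, line TEXT NOT NULL)"
        )
        self.db.commit()

    def append(self, line):
        """Persist a record locally, whether or not the link is up."""
        self.db.execute("INSERT INTO outbox (line) VALUES (?)", (line,))
        self.db.commit()

    def pending(self):
        """Number of records still waiting to be sent."""
        return self.db.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]

    def flush(self, send, batch_size=100):
        """Send queued records oldest-first; stop at the first failed batch."""
        sent = 0
        while True:
            rows = self.db.execute(
                "SELECT id, line FROM outbox ORDER BY id LIMIT ?", (batch_size,)
            ).fetchall()
            if not rows:
                return sent
            try:
                send([line for _, line in rows])
            except ConnectionError:
                return sent  # link dropped: keep the rows for the next attempt
            self.db.execute("DELETE FROM outbox WHERE id <= ?", (rows[-1][0],))
            self.db.commit()
            sent += len(rows)
```

The device keeps calling `append` regardless of link state and calls `flush` whenever a connection is available. Because rows are removed only after `send` returns, an interrupted flush loses nothing, though the hub may see a batch twice if the failure happens after delivery but before the delete.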
Kafka: Buffering data in Kafka is another way to combat intermittent or unreliable connectivity. Kafka is a publish-subscribe system built on a durable, append-only log, which makes it behave differently from a traditional message queue. In a traditional queue, a message is removed once an application reads it. Kafka instead retains messages for a configurable retention period, and each consumer tracks its own position in the log, so a consumer that was offline can pick up where it left off. Multiple devices can publish to the same topic, which is helpful if the broker has a reliable connection. Like InfluxDB, Kafka scales horizontally very well and is a good fit for distributed systems.
Configuration
Between InfluxDB and Kafka, you should be able to collect and store your data. Configuring application logic to work around connectivity issues is a whole different ball game. Here are some concepts and approaches to consider.
- Dynamic Adjustment: This involves adapting transmission rates and methods based on current connectivity conditions. During periods of poor connectivity, you want this application logic to reduce the transmission rate or switch to more robust protocols for transmitting data.
- Forward Error Correction (FEC): The sender includes redundant data in each transmission so the receiver can detect and correct errors without the need for retransmission. That matters when round trips are slow or the link may drop before a retry is possible. In many schemes, encoding is relatively cheap and the heavier decoding work falls on the receiver, which helps if your edge devices have limited resources. The checksums and hashes mentioned above are related but not the same thing: on their own they only detect corruption, while FEC carries enough redundancy to repair it.
- Edge Computing: This is where having a database at the edge is helpful. If you have the resources available, you can process data locally at the edge, sending only cleaned and processed data to the central hub. This minimizes the total amount of data you need to transmit.
- Delay Tolerant Networking (DTN): This is a store-and-forward approach. DTN protocols are ideal for environments that experience long delays and disruptions. This is similar to a firefighter bucket brigade, where data is stored at intermediate nodes until a connection is available to forward it to the central hub.
- Optimized Routing Algorithms: Two options fall into this category.
  - Opportunistic Routing: This involves writing your application logic to take advantage of any available communication opportunity to forward data. You can do this by choosing paths dynamically based on current network conditions.
  - Multipath Routing: Instead of relying on a single data transmission path, consider configuring multiple paths to increase the chances of successful delivery. If you’re sending this data to a central InfluxDB instance, duplicates are handled automatically: a point with the same measurement, tag set, and timestamp as an existing point is merged into it, with new field values overwriting old ones, rather than being stored as a second copy. Because this approach increases the total amount of data transmitted, it might be best to keep it as a backup rather than a primary strategy for overcoming connectivity issues.
- Protocols for Low-Bandwidth and High-Latency Networks: Two options fall into this category as well.
  - Lightweight Protocols: Where it makes sense, you can use communication protocols designed for low-bandwidth environments, such as Constrained Application Protocol (CoAP) instead of HTTP.
  - High-Latency Protocols: Similarly, lean into established protocols that can handle high latency, such as TCP variants optimized for long-distance communication.
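Two of the concepts above, dynamic adjustment and forward error correction, are easier to picture with a small example. The sketch below is a toy in Python: the batch-size rule and its constants are arbitrary assumptions, and the single XOR parity block is about the simplest FEC scheme there is. It can rebuild one lost block per group, whereas production links typically rely on much stronger codes such as Reed-Solomon.

```python
from functools import reduce


def next_batch_size(current, succeeded, minimum=1, maximum=512):
    """Dynamic adjustment: grow slowly on success, halve on failure."""
    if succeeded:
        return min(maximum, current + 8)
    return max(minimum, current // 2)


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def add_parity(blocks):
    """Sender side: append one parity block, the XOR of all data blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return list(blocks) + [reduce(xor_bytes, blocks)]


def recover(received):
    """Receiver side: rebuild at most one lost block (marked None).

    `received` holds the data blocks followed by the parity block.
    Returns the data blocks only.
    """
    missing = [i for i, block in enumerate(received) if block is None]
    if len(missing) > 1:
        raise ValueError("one parity block can repair only one lost block")
    repaired = list(received)
    if missing:
        present = [block for block in received if block is not None]
        # XOR of every surviving block equals the missing one.
        repaired[missing[0]] = reduce(xor_bytes, present)
    return repaired[:-1]
```

If a group of three data blocks is sent with its parity block and the second block never arrives, `recover([b0, None, b2, parity])` returns all three data blocks without a retransmission. The cost is one extra block per group, which is the bandwidth-for-reliability trade that FEC always makes.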
Wrapping up
While it’s unlikely that any single configuration option is the silver bullet that aerospace companies are looking for, some combination of dynamic application logic and the right tools to collect and manage your data can make serious inroads against unreliable or intermittent connectivity.
To learn more about how aerospace companies use InfluxDB, click here. Try out InfluxDB for free, here.