3.3.4. Optimize the Ingestion of Telemetry Data
ID | Priority | Best Practice |
---|---|---|
BP 3.3.4.1 | Highly Recommended | Identify the ingestion mechanisms that best fit your use case |
BP 3.3.4.2 | Required | Evaluate network connectivity and data freshness requirements |
BP 3.3.4.3 | Recommended | Optimize data sent from devices to backend services |
Architecture Notes - BP 3.3.4.1 - Identify the ingestion mechanisms that best fit your use case
Identify which data ingestion method best fits your use case to obtain the best trade-off between performance and operational complexity. Multiple mechanisms might be needed. Choosing deliberately gives the data generated by your devices an ingestion path that balances performance and cost.
Recommendation 3.3.4.1.1 - Evaluate ingestion mechanism for telemetry data
- Determine if the communication pattern is uni-directional (device to backend) or bi-directional. For example:
  - HTTPS should be considered if your device is acting as an aggregator and needs to send more than 100 messages per second, instead of opening multiple MQTT connections. Because HTTP calls are synchronous, use multiple threads and multiple HTTP connections to maximize throughput on high-latency networks.
- Consider the APIs provided by the destination for your data and adopt them if you can securely access them. For example:
  - AWS IoT Analytics provides an HTTP API that can batch several messages. It is suitable for high-rate data ingestion when the data is consumed in near real time and when service-integrated data storage, processing, retention, and replay are desired.
  - AWS IoT SiteWise provides an HTTP API to ingest operational data from industrial applications that needs to be stored for a limited period of time and processed as a time series with hierarchical aggregation capabilities.
  - Real-time video (for example, from video surveillance cameras) has specific characteristics that make it more suitable for ingestion into a dedicated service, such as Amazon Kinesis Video Streams.
- Consider whether data must be buffered locally while the device is disconnected, with transmission resuming as soon as the connection is re-established. For example:
  - AWS IoT Greengrass stream manager provides a managed stream service with local persistence, local processing pipelines, and out-of-the-box exporters to Amazon Kinesis Data Streams and AWS IoT Analytics. This suits devices such as industrial gateways.
- Consider the latency, throughput, and ordering characteristics of the data you want to ingest. For example:
  - For applications with a high ingestion rate (such as high-frequency sensor data) where message ordering is important, Amazon Kinesis Data Streams provides stream-oriented processing capabilities and the ability to act as temporary storage.
  - For applications that do not have real-time requirements (such as logging or large images) and whose devices can store data locally, uploading data directly to Amazon S3 can be both performant and cost efficient.
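The aggregator pattern mentioned above (several threads, each issuing synchronous HTTP calls) can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `publish_concurrently` and the `send` callable are hypothetical names, and `send` is assumed to perform one blocking HTTPS request and return its status code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def publish_concurrently(
    messages: Iterable[bytes],
    send: Callable[[bytes], int],
    max_workers: int = 8,
) -> List[int]:
    """Send each message through `send` using a pool of worker threads.

    `send` is a hypothetical callable (an assumption of this sketch) that
    performs one blocking HTTPS request and returns its status code.
    Status codes are returned in the same order as the input messages.
    """
    # Each worker blocks on its own request, so several requests are in
    # flight at once and network round-trip time is overlapped.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send, messages))
```

In a real aggregator, `send` would typically wrap an HTTP client call that reuses a connection per thread. The right value for `max_workers` depends on network latency and on the limits of the destination service, so it is best determined by measurement.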
Architecture Notes - BP 3.3.4.2 - Evaluate network connectivity and data freshness requirements
Evaluating network connectivity and data freshness requirements enables you to make the right assumptions about the local data storage and data transmission needed to satisfy the requirements of your workload. It also gives you a clear understanding of those requirements, so that you can determine the hardware and software needs of the devices and the platform.
Recommendation 3.3.4.2.1 - Choose the right Quality of Service (QoS) for publishing the messages
- QoS 0 should be the default choice for all telemetry data that can cope with message loss and where data freshness is more important than reliability.
- QoS 1 provides reliable (at-least-once) message transmission at the expense of increased latency, possible out-of-order ingestion when messages are retried, and local memory consumption. It requires a local buffer for all unacknowledged messages.
- QoS 2 provides exactly-once delivery of messages but increases latency further. Note that AWS IoT Core does not support QoS 2.
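To illustrate why QoS 1 consumes local memory, the following sketch models the buffer of unacknowledged messages a publisher has to keep. The class `UnackedBuffer` and its methods are hypothetical and exist only for illustration; MQTT client libraries normally manage this state internally.

```python
from collections import OrderedDict
from typing import List, Tuple


class UnackedBuffer:
    """Bounded store of QoS 1 messages awaiting acknowledgement (illustrative)."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._pending: "OrderedDict[int, bytes]" = OrderedDict()

    def add(self, packet_id: int, payload: bytes) -> bool:
        """Track a published message; return False if the buffer is full."""
        if len(self._pending) >= self._capacity:
            return False
        self._pending[packet_id] = payload
        return True

    def ack(self, packet_id: int) -> None:
        """Release a message once the broker has acknowledged it."""
        self._pending.pop(packet_id, None)

    def to_retry(self) -> List[Tuple[int, bytes]]:
        """Return messages still unacknowledged, oldest first."""
        return list(self._pending.items())
```

When `add` returns `False`, the device has to decide whether to block, drop the message, or fall back to QoS 0. That decision is driven by the data freshness and reliability requirements discussed in this section.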
Recommendation 3.3.4.2.2 - Right-size the offline persistent storage so that your application objective can be achieved without wasting resources
- The AWS IoT Greengrass message spooler can be configured with an offline message queue for messages that need to be sent to AWS IoT Core. The size and type of storage should be configured according to the needs of the workload.
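A rough way to size offline storage is to multiply the message rate, the average message size, and the longest disconnection the workload must tolerate, then add headroom. The function below is a back-of-the-envelope sketch; its name, its parameters, and the default headroom factor of 1.25 are assumptions chosen for illustration, not recommended values.

```python
import math


def required_offline_storage_bytes(
    messages_per_second: float,
    avg_message_bytes: int,
    max_offline_seconds: int,
    headroom: float = 1.25,
) -> int:
    """Estimate the bytes needed to queue messages during a disconnection.

    All parameters are workload assumptions supplied by the caller;
    `headroom` is a multiplier (>= 1) covering bursts and storage overhead.
    """
    if messages_per_second < 0 or avg_message_bytes < 0 or max_offline_seconds < 0:
        raise ValueError("rates, sizes and durations must be non-negative")
    if headroom < 1:
        raise ValueError("headroom must be at least 1")
    raw = messages_per_second * avg_message_bytes * max_offline_seconds
    return math.ceil(raw * headroom)
```

For example, 10 messages per second of 200 bytes each over a one-hour outage gives 7,200,000 bytes of raw data, or 9,000,000 bytes with the default headroom. If the estimate exceeds what the device can store, the alternatives are to reduce the data sent or to accept dropping older messages.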
Architecture Notes - BP 3.3.4.3 - Optimize data sent from devices to backend services
Optimizing the amount of data sent by the devices at the edge allows the backend to more easily meet the processing targets set by the business. Detailed data generated at the edge might have little value for your application in its raw form.
Recommendation 3.3.4.3.1 - Aggregate or compress data at the edge
- You can aggregate data points at the edge before sending them to the cloud, for example by performing statistical aggregation, computing frequency histograms, or applying signal processing.
- For example, if you are using AWS IoT Greengrass, you can implement data processing at the edge with a combination of streams and Lambda functions.
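As a minimal sketch of the statistical aggregation and frequency histograms mentioned above, the functions below reduce a window of raw readings to a small summary before transmission. The names `summarize` and `histogram`, and the choice of summary fields, are assumptions made for illustration.

```python
import statistics
from typing import Dict, List, Sequence


def summarize(readings: Sequence[float]) -> Dict[str, float]:
    """Reduce a window of readings to count, min, max and mean."""
    if not readings:
        raise ValueError("readings must not be empty")
    return {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": statistics.fmean(readings),
    }


def histogram(readings: Sequence[float], low: float, high: float, bins: int) -> List[int]:
    """Count readings per equal-width bin over [low, high].

    Readings outside the range are ignored; a reading equal to `high`
    is counted in the last bin.
    """
    if bins <= 0 or high <= low:
        raise ValueError("need bins > 0 and high > low")
    counts = [0] * bins
    width = (high - low) / bins
    for value in readings:
        if low <= value <= high:
            index = min(int((value - low) / width), bins - 1)
            counts[index] += 1
    return counts
```

Sending one summary per window instead of every raw reading reduces message volume in proportion to the window size, at the cost of losing the individual data points. Whether that loss is acceptable depends on what the backend needs from the data.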