Designing IoT for the real world: unreliable networks, OTA, and data you can trust
Short answer: IoT data goes wrong because field devices live on networks that drop, and naive retries then duplicate readings that had already arrived. The fix is to make the device own the truth: buffer readings locally and forward them when the link returns, give every reading a stable identifier so ingestion is idempotent, and update firmware over the air with staged rollout and rollback. The result is telemetry that is complete and counted exactly once, even when a message is delivered more than once.
The two ways field data breaks
An intermittent connection corrupts data in opposite directions. A dropout at the moment a reading is taken loses it outright. A retry after the link returns can deliver the same reading twice, because the device cannot tell a lost message from a lost acknowledgement. Add firmware that has no safe update path and every fix becomes a physical site visit, so the fleet falls out of date and out of sync. A dashboard built on that data cannot be trusted for decisions.
Make the device own the truth
The network is unreliable, so do not depend on it at the moment that matters:
- Store and forward. Firmware writes each reading to local storage and sends it when connectivity returns. A dropout then delays delivery instead of destroying data, provided local storage is sized for the longest outage you expect.
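- Use a protocol built for the edge. MQTT is a lightweight publish-and-subscribe protocol designed for constrained devices and unreliable links, which makes it a better fit than request-per-reading alternatives such as polling over HTTP. Its QoS 1 level delivers each message at least once, so duplicates are possible by design and have to be handled at ingestion.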
- Make ingestion idempotent. Give every reading a stable identifier so that processing it twice has the same effect as once. A reconnect that resends buffered data then cannot double-count.
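The store-and-forward step in the list above can be sketched in a few lines. This is a minimal illustration, not production firmware: it assumes a device that can run Python and SQLite, and the names `ReadingBuffer`, `record`, `flush`, and the `send` callback are hypothetical. The point it shows is the ordering: a reading gets its identifier and is written to disk first, and is deleted only after the send is acknowledged.

```python
import sqlite3
import uuid
from typing import Callable


class ReadingBuffer:
    """Durable local queue: a reading stays on the device until its send is acknowledged."""

    def __init__(self, path: str = ":memory:") -> None:
        # Assumption: a real device would pass a file path on persistent storage.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS outbox ("
            "reading_id TEXT PRIMARY KEY, taken_at REAL, value REAL)"
        )
        self.db.commit()

    def record(self, taken_at: float, value: float) -> str:
        # The identifier is assigned once, at capture time, and reused on every retry.
        reading_id = str(uuid.uuid4())
        self.db.execute(
            "INSERT INTO outbox VALUES (?, ?, ?)", (reading_id, taken_at, value)
        )
        self.db.commit()
        return reading_id

    def pending(self) -> int:
        return self.db.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]

    def flush(self, send: Callable[[dict], bool]) -> int:
        """Send buffered readings oldest first; return how many were acknowledged."""
        rows = self.db.execute(
            "SELECT reading_id, taken_at, value FROM outbox ORDER BY rowid"
        ).fetchall()
        sent = 0
        for reading_id, taken_at, value in rows:
            try:
                ok = send({"id": reading_id, "taken_at": taken_at, "value": value})
            except OSError:
                ok = False  # treat a network error like a missing acknowledgement
            if not ok:
                break  # link is down: keep this and all later readings for next time
            self.db.execute("DELETE FROM outbox WHERE reading_id = ?", (reading_id,))
            self.db.commit()
            sent += 1
        return sent
```

If the acknowledgement is lost after the server has stored a reading, the next `flush` sends that reading again with the same `id`. That is deliberate: the device errs toward resending, and the ingestion side is expected to discard the repeat.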
This is the architecture behind our trustworthy telemetry case study.
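On the server side, idempotent ingestion can be as simple as letting the database enforce uniqueness on the reading identifier. The sketch below is illustrative only: the `Ingestor` class, the table layout, and the `id`, `taken_at`, and `value` keys are assumptions for the example, and an in-memory SQLite database stands in for whatever store the pipeline actually uses.

```python
import sqlite3


class Ingestor:
    """Stores each reading at most once, keyed by its stable identifier."""

    def __init__(self) -> None:
        # Assumption: in-memory SQLite stands in for the real time-series store.
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE readings ("
            "reading_id TEXT PRIMARY KEY, taken_at REAL, value REAL)"
        )

    def ingest(self, reading: dict) -> bool:
        """Return True if the reading was new, False if it was a resend."""
        try:
            self.db.execute(
                "INSERT INTO readings VALUES (?, ?, ?)",
                (reading["id"], reading["taken_at"], reading["value"]),
            )
        except sqlite3.IntegrityError:
            # Primary-key clash: this identifier is already stored, so ignore it.
            return False
        self.db.commit()
        return True

    def total(self) -> float:
        return self.db.execute(
            "SELECT COALESCE(SUM(value), 0) FROM readings"
        ).fetchone()[0]


if __name__ == "__main__":
    ingestor = Ingestor()
    reading = {"id": "dev-14:000123", "taken_at": 1700000000.0, "value": 2.5}
    ingestor.ingest(reading)
    ingestor.ingest(reading)  # resend after a reconnect
    print(ingestor.total())
```

Either way the caller can acknowledge the message, because a duplicate is a success from the device's point of view: the reading is safely stored.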
Update a fleet you cannot physically reach
Over-the-air updates are essential and dangerous in equal measure: one bad image pushed to every device can brick a fleet. The discipline is to treat an update like a deploy. Roll it out to a small cohort first, run health checks, and roll back automatically if they fail. Managed this way, the fleet stays current without anyone driving to a device, and a bad release is contained to a handful of units.
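The rollout discipline described above can be sketched as a small control loop. Everything here is hypothetical: `staged_rollout`, the `install`, `healthy`, and `rollback` callbacks, and the 5/25/100 percent stages are placeholders for whatever the fleet management system provides, and a real rollout would also wait for a soak period before running health checks.

```python
from typing import Callable, List, Sequence


def staged_rollout(
    devices: Sequence[str],
    install: Callable[[str], None],
    healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
    stage_percents: Sequence[int] = (5, 25, 100),  # assumed cumulative cohort sizes
) -> bool:
    """Update the fleet in growing cohorts; undo everything if a cohort fails."""
    updated: List[str] = []
    for percent in stage_percents:
        # Cumulative target for this stage, with at least one device in the first.
        target = max(1, len(devices) * percent // 100)
        cohort = list(devices[len(updated):target])
        for device in cohort:
            install(device)
        updated.extend(cohort)
        # Health check gates the next stage: any unhealthy device stops the rollout.
        if any(not healthy(device) for device in cohort):
            for device in updated:
                rollback(device)
            return False
    return True


if __name__ == "__main__":
    fleet = [f"dev-{i}" for i in range(20)]
    touched: List[str] = []
    ok = staged_rollout(
        fleet,
        install=touched.append,
        healthy=lambda d: False,  # simulate a bad image
        rollback=lambda d: None,
    )
    print(ok, len(touched))
```

With a 20-device fleet and a bad image, only the first cohort is ever touched, which is the containment the staged approach is meant to provide.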
Observe the devices, not just the dashboard
On a dashboard, a device that has stopped reporting looks the same as one that is healthy but has nothing to send. Instrument the devices and the pipeline so that a stuck sensor, a failing battery, or a backlog of buffered readings surfaces as a signal rather than as silence. Observability is what turns “the data looks low” into “device 14 stopped reporting at 02:00.”
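One way to turn silence into a signal is to have each device report a little health metadata alongside its readings and to check it on a schedule. The sketch below assumes such metadata exists. The `DeviceStatus` fields, the `diagnose` function, and every threshold are illustrative choices, not values taken from a real deployment.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DeviceStatus:
    device_id: str
    last_seen: float                 # Unix time of the last message received
    battery_pct: float               # self-reported battery level
    buffered: int                    # readings still waiting in the local buffer
    recent_values: Sequence[float]   # latest sensor values, oldest first


def diagnose(
    status: DeviceStatus,
    now: float,
    max_silence_s: float = 900.0,    # assumed: alert after 15 minutes of silence
    min_battery_pct: float = 15.0,
    max_buffered: int = 500,
    stuck_after: int = 10,           # identical values in a row that count as stuck
) -> List[str]:
    """Return human-readable alerts for one device; an empty list means healthy."""
    alerts: List[str] = []
    silence = now - status.last_seen
    if silence > max_silence_s:
        alerts.append(f"{status.device_id}: silent for {silence:.0f}s")
    if status.battery_pct < min_battery_pct:
        alerts.append(f"{status.device_id}: battery at {status.battery_pct:.0f}%")
    if status.buffered > max_buffered:
        alerts.append(f"{status.device_id}: backlog of {status.buffered} readings")
    tail = list(status.recent_values)[-stuck_after:]
    if len(tail) == stuck_after and len(set(tail)) == 1:
        alerts.append(f"{status.device_id}: sensor stuck at {tail[0]}")
    return alerts
```

Running `diagnose` over the whole fleet on a timer gives each failure mode its own alert, so a quiet chart is never the only evidence that something is wrong.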
Takeaways
- Field data breaks by loss on dropout and duplication on retry; design for both.
- Store-and-forward firmware plus idempotent ingestion gives complete data in which every reading is counted exactly once.
- Ship over-the-air updates like deploys: staged rollout, health checks, automatic rollback.
- Observe devices directly so that silence becomes a signal. See how we engineer embedded systems.
Frequently asked
How do you keep IoT data complete when the network is unreliable?
Make the device responsible for the data. Firmware buffers readings locally and forwards them when connectivity returns, so a dropout delays delivery instead of losing it. Nothing should depend on the network being up at the instant a reading is taken.
How do you stop duplicate readings after a reconnect?
Give every reading a stable identifier and make ingestion idempotent, so processing the same reading twice has the same effect as once. Retries after a reconnect then become safe, and each reading appears in the stored series exactly once.
Is it safe to update firmware across a whole fleet over the air?
Yes, when the rollout is managed. Stage the update to a small cohort, run health checks, and roll back automatically on failure, so a bad image is caught before it can reach the entire fleet.