Don't roll your own IoT Protocol

Zephyr IoT LwM2M product-development

The pitfalls of IoT development

We’re all aware of how the surge of open source has transformed the way we build software. Nowadays, most successful projects follow the same blueprint - by leveraging collaboration through open-source development, you minimize the time spent on the things that have little to do with your core business. Even if it were possible to write a smarter database engine or a more efficient serialization protocol than what the community already provides, odds are it wouldn’t be worth your time. As a whole, this is well recognized by the software business.

Even the embedded Linux world seems to have caught on to this for the most part. However, this does not seem to be the case for the world of sensor nodes and other resource-constrained devices.

In the world of tiny microcontrollers, you still see a lot of business logic squeezed into copy-pasted vendor HALs, data serialized by memcpy-ing structs, actuation commands sent as JSON-encoded strings, and homebrewed encryption protocols that lead to embarrassing posts on Hacker News.

Why is this? One factor is that, by tradition, firmware development has had more in common with hardware design than application development. The challenges faced by embedded developers in the past were perhaps not best solved by obsessing over architectural modularity and leveraging large-scale collaboration.

But while we still have to troubleshoot noisy clock signals and inexplicable watchdog resets, it’s not just voltages, registers and pins anymore. You’re also expected to somehow couple the ever-increasing complexity in application logic with stringent constraints on robustness, power consumption, network availability, latency and bandwidth efficiency. Oh, and by the way, we also expect you to support remote firmware upgrades, state-of-the-art encryption and a server-facing API that’s easy to work with.

It’s easy to not fully grasp the scope of this, especially since prototyping an IoT product is so incredibly simple - just solder a sensor breakout board to a devkit, wrap it in JSON and pipe the data to some hardcoded static IP over WiFi. If you think you’re 95% done at this point, perhaps it shouldn’t be surprising if you think you can “wing it” the rest of the way.

But as the project continues and requirements become clearer, you’ll soon discover that the devil is in the details. There are endianness bugs in the data encoding algorithm. You spend hours in meetings with the cloud team discussing protocol details. Can you send the serial number only once per boot? Should we ack RPCs on reception or completion? Do we need a heartbeat? How do we serialize that? The timestamp is in seconds since boot, but the cloud team insists you implement NTP. You realize too late that your SDK doesn’t support DNS. Or encryption. How do we upgrade the firmware? Can we have automatic rollbacks? Should we pause data uplinks while we’re upgrading?

While this picture may be somewhat too bleak, it hopefully serves to illustrate that there are many hidden pitfalls when it comes to architecture and protocol design. These pitfalls take time to discover on your own, time much better spent improving the core business logic of your product.

The light at the end of the tunnel

It’s not that creating IoT products is impossible if you stick to the dated model of constantly reinventing the wheel. But we believe it’s possible to build products that are more efficient, more stable, more secure, more flexible, more featureful AND have a much shorter time to market by adopting a more modern approach.

Modern products are built using modern development principles: by maximizing the use of modular components that are collaboratively developed and maintained, and that communicate with each other using APIs and protocols that are well-defined, open and proven by use.

Up until fairly recently, the lack of an active open-source community focusing on resource-constrained devices made this practically impossible. The modular components were mostly proprietary and closely tied to specific hardware, and the popular open protocols were either unsuited for devices running on coin cell batteries or simply lacked the community traction required for a protocol to be considered well established.

The Linux Foundation realized that a unified community is a prerequisite for large-scale collaboration and so Zephyr RTOS was introduced. We never get tired of evangelizing about Zephyr - we believe it’s a truly transformative open source project that we will continue working with for many, many years to come. If you’re curious about Zephyr, drop us a line and we’ll be happy to chat!

In this article, however, we’d like to give you an introduction to another important landmark - the Lightweight Machine to Machine protocol (LwM2M).

LwM2M

LwM2M is a fairly recent IoT protocol mainly designed for device management. It is based on UDP and the Constrained Application Protocol (CoAP) which, loosely speaking, is designed as a drop-in replacement for HTTP for situations where TCP is either infeasible or otherwise undesirable. Other transports such as SMS and LoRa are also supported, but CoAP/UDP is the main use case.

For full disclosure, let’s make it clear that there’s nothing particularly brilliant about LwM2M - it offers an interface for client/server communications, agreeable addressing semantics for its object/resource model and not much else.

Of course, LwM2M includes many protocol features that are fantastic (bootstrapping and firmware upgrades), but they’re entirely optional, making the scope of the core protocol rather limited. But the simplicity is exactly what makes LwM2M exciting - simple APIs are easier to agree upon and it’s precisely such consensus that facilitates collaboration and reusability on a massive scale. Indeed, this is what we’re beginning to see with LwM2M.

Object-resource model

In LwM2M, we model a device as a collection of resources, where conceptually related resources may belong to the same object. The server addresses resources by their resource path (contained inside the CoAP URI) which takes the form

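"Object/ObjectInstance/Resource[/ResourceInstance]"

where brackets denote an optional component - if the resource is single-instance, the resource instance ID is omitted. A single-instance object still carries its instance ID in the path, which is typically 0 (for example, "3/0/9" addresses the battery level of the standard Device object).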
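To make the addressing scheme concrete, here is a minimal sketch of a path parser in Python. It assumes the fully qualified numeric form in which the object instance ID is always written out (as in "3/0/9"); the names ResourcePath and parse_resource_path are hypothetical and not part of any LwM2M library.

```python
from typing import NamedTuple, Optional


class ResourcePath(NamedTuple):
    """Numeric IDs that together address one resource (hypothetical helper type)."""
    object_id: int
    object_instance: int
    resource_id: int
    resource_instance: Optional[int] = None  # only set for multi-instance resources


def parse_resource_path(path: str) -> ResourcePath:
    """Split 'Object/ObjectInstance/Resource[/ResourceInstance]' into integer IDs."""
    parts = path.strip("/").split("/")
    # Assumption: a resource path has exactly 3 or 4 numeric components.
    if len(parts) not in (3, 4):
        raise ValueError(f"expected 3 or 4 path components, got {len(parts)}")
    if not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError(f"path components must be decimal IDs: {path!r}")
    return ResourcePath(*(int(p) for p in parts))
```

With this sketch, parse_resource_path("/3/0/9") returns a ResourcePath whose resource_instance is None, while a four-component path such as "3303/1/5700/0" also fills in the resource instance ID.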

All resources support one or more operations - read, write, execute and delete, which correspond to the GET, PUT, POST and DELETE operations in CoAP.
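The mapping between operations and CoAP methods can be sketched as a small dispatch table. This is an illustration only: the resource table, its permissions and the function name handle are hypothetical, and a real implementation would take the permissions from the object definitions rather than hardcoding them.

```python
from enum import Enum


class Operation(Enum):
    """LwM2M operations, keyed by the CoAP method each one corresponds to."""
    READ = "GET"
    WRITE = "PUT"
    EXECUTE = "POST"
    DELETE = "DELETE"


# CoAP response codes returned on success.
SUCCESS_CODE = {
    Operation.READ: "2.05 Content",
    Operation.WRITE: "2.04 Changed",
    Operation.EXECUTE: "2.04 Changed",
    Operation.DELETE: "2.02 Deleted",
}

# Hypothetical resource table: path -> operations the resource supports.
RESOURCES = {
    "3/0/9": {Operation.READ},                    # e.g. a read-only battery level
    "3/0/4": {Operation.EXECUTE},                 # e.g. a reboot trigger
    "3/0/15": {Operation.READ, Operation.WRITE},  # e.g. a writable setting
}


def handle(method: str, path: str) -> str:
    """Return the CoAP response code for a request against the table above."""
    supported = RESOURCES.get(path.strip("/"))
    if supported is None:
        return "4.04 Not Found"
    try:
        operation = Operation(method.upper())
    except ValueError:
        return "4.05 Method Not Allowed"
    if operation not in supported:
        return "4.05 Method Not Allowed"
    return SUCCESS_CODE[operation]
```

A GET on "3/0/9" succeeds in this sketch, while a PUT on the same path is rejected because the resource only supports read.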

Furthermore, the server can observe resources. For instance, we can ask for a sensor reading to be reported if it exceeds 2 V or if it deviates by more than 10 mV from the last reported value.
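The reporting rule in that example can be modeled in a few lines. The sketch below is a simplified model of the rule as described here, not the exact notification semantics of the specification; the class name Observation and its fields are hypothetical, and values are kept in integer millivolts to avoid floating-point comparisons.

```python
from typing import Optional


class Observation:
    """Decides whether a new reading should be reported to the server."""

    def __init__(self, greater_than_mv: int, step_mv: int) -> None:
        self.greater_than_mv = greater_than_mv  # report readings above this level
        self.step_mv = step_mv                  # report changes larger than this
        self.last_reported_mv: Optional[int] = None

    def update(self, value_mv: int) -> bool:
        """Return True (and remember the value) if this reading should be sent."""
        if self.last_reported_mv is None:
            notify = True  # assumption: the first reading is always reported
        else:
            notify = (
                value_mv > self.greater_than_mv
                or abs(value_mv - self.last_reported_mv) > self.step_mv
            )
        if notify:
            self.last_reported_mv = value_mv
        return notify
```

With Observation(greater_than_mv=2000, step_mv=10), a change from 1500 mV to 1505 mV is suppressed, while a jump to 1511 mV or any reading above 2000 mV is reported. Note that under this simple rule every sample above the threshold is reported, which a real deployment would likely want to rate-limit.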

Firmware Over the Air (FOTA)

The firmware update API is simple but well designed. It describes the update state machine as well as the details of the image transfer, which can be either “push” (this essentially gives the server write access to the update partition) or “pull”, which leverages the block-wise transfer mechanism already defined for CoAP.

When this API is coupled with a modern bootloader that supports automatic rollback, you get an incredibly robust firmware upgrade system that is essentially plug and play.
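As a rough illustration of the update state machine, the sketch below models the four states of the LwM2M Firmware Update object (Idle, Downloading, Downloaded, Updating) as a transition table. The event names and the next_state function are hypothetical simplifications; the specification remains the authoritative source for the exact transitions and result codes.

```python
from enum import Enum


class State(Enum):
    IDLE = 0
    DOWNLOADING = 1
    DOWNLOADED = 2
    UPDATING = 3


# (current state, event) -> next state. Event names are made up for this sketch.
TRANSITIONS = {
    (State.IDLE, "start_download"): State.DOWNLOADING,    # push or pull begins
    (State.DOWNLOADING, "download_ok"): State.DOWNLOADED,
    (State.DOWNLOADING, "download_failed"): State.IDLE,
    (State.DOWNLOADED, "execute_update"): State.UPDATING,  # server triggers the update
    (State.DOWNLOADED, "cancel"): State.IDLE,
    (State.UPDATING, "update_ok"): State.IDLE,             # running the new image
    (State.UPDATING, "update_failed"): State.IDLE,         # e.g. bootloader rolled back
}


def next_state(state: State, event: str) -> State:
    """Return the state that follows `event`, rejecting illegal transitions."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(
            f"event {event!r} is not allowed in state {state.name}"
        ) from None
```

Keeping the transitions in a table makes illegal requests easy to reject: asking for "execute_update" while still in IDLE raises a ValueError instead of silently starting an update without an image.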

Bootstrapping

The Bootstrap API is another optional (but very useful) feature of LwM2M. In short, it lets you provision your devices to connect to a bootstrap server instead of your “primary” server. The bootstrap server can then provide credentials and configurations based on device-specific information, such as location, serial number or device type. The device then disconnects from the bootstrap server and connects to the primary server, using the credentials and configuration obtained from the previous step.

While this may sound slightly roundabout, it turns out it can be quite useful. It can greatly simplify provisioning if the devices are to be shipped to many different countries, or used by many different customers who each have their own cloud, because the devices only need to be provided with the credentials for the bootstrap server during production. The “true” server address does not need to be decided until the time of first deployment, which can potentially be much later.
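The server-side decision made during bootstrapping can be pictured as a simple lookup. The sketch below is purely illustrative: the types, the country-based rule, the example.com addresses and the identity format are all assumptions, and a real bootstrap server would also deliver key material and write the result into the device's security and server objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceInfo:
    serial_number: str
    country: str  # assumption: the device reports (or is registered with) a country code


@dataclass(frozen=True)
class ServerConfig:
    server_uri: str
    psk_identity: str


# Hypothetical mapping from deployment country to the primary server.
PRIMARY_SERVERS = {
    "SE": "coaps://eu.lwm2m.example.com:5684",
    "US": "coaps://us.lwm2m.example.com:5684",
}
DEFAULT_SERVER = "coaps://global.lwm2m.example.com:5684"


def bootstrap(device: DeviceInfo) -> ServerConfig:
    """Pick the primary server and identity for a device at first deployment."""
    uri = PRIMARY_SERVERS.get(device.country, DEFAULT_SERVER)
    return ServerConfig(server_uri=uri, psk_identity=f"device-{device.serial_number}")
```

The point of the design is that only the bootstrap address has to be fixed at production time; the table above can change at any point before the device is first switched on in the field.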

Wrapping up

One key benefit of adopting a widely used protocol is that details such as data encoding, timestamp format, state synchronization, heartbeats, etc. are already ironed out. You don’t have to think about how to serialize RPC calls, or how to ACK them. It’s all in the specification.

Another key benefit is the inter-operability enabled by the OMA object registry, which contains definitions for thousands of common use-cases such as light switches, accelerometers and e-ink displays. The registry also allows companies to register new custom objects at no charge. This means that any LwM2M device can connect to any LwM2M server and immediately provide full read/write/exec access to all its resources - no product-specific schemas or config files needed.

Of course, inter-operability also means that many different types of devices can connect seamlessly to the same server, which can considerably streamline operations for companies with diverse device fleets. Think about that for a second - how many dev hours would it currently take to fully integrate an entirely new line of devices into your cloud solution?

It also helps that high quality, open-source reference implementations for both device and server are readily available. Setting up a demo server with a fully functional device management UI literally takes minutes.

We’ve only mentioned a few of the features provided by the LwM2M specification, so in a way we’re just scratching the surface of the LwM2M protocol. On the other hand, there’s conceptually very little about LwM2M to “get”. LwM2M is simply telling you: “Hi, here’s a common API. If you adhere to it, you will in return get access to the work done by everyone who is also adhering to this API”. It’s nothing magic - it’s just an invite to collaborate.

We’re still in the early stages of adoption (the protocol was officially released in 2017), but it’s already very much suited for industrialization, as we’ve most recently shown by helping Voi deploy it to thousands of their scooters world-wide.

If we’ve managed to pique your interest, don’t hesitate to give us a call, or drop us an e-mail!


If you find this article interesting you can find other posts by Endian here.