MQTT High Availability

Pro Edition for Eclipse Mosquitto provides MQTT High Availability (HA), which keeps your smart automation operating and your service available even when a broker node fails.

How MQTT High Availability works

Clustering is the key to MQTT high availability. You need a minimum of three nodes to create an MQTT broker cluster. All information the brokers need in order to operate is synchronized across the cluster continuously, so every node holds the same working data. When one node fails, another automatically takes over all MQTT broker operations. This process is called MQTT failover.

MQTT broker normal operation

One or more load balancers route all traffic to the current leader MQTT broker node while the others stay in follower mode (“follower nodes”). Our MQTT Cluster Management (CM) continuously synchronizes the following data within the cluster: persistent sessions, retained messages, message queues, ACLs (Access Control Lists), authentication information for all clients, and the overall cluster status.

MQTT broker failover operation

If the leader node fails, the MQTT broker cluster reorganizes itself and assigns the leader role to one of the followers. Because of the constant synchronization, the new leader is already up to date on the communication status and the clients’ information. It can therefore take over seamlessly, and the cluster keeps operating without interruption. From then on, the load balancers route all traffic through the new leader.

MQTT broker back to normal operation

Once the failed node is restarted, it rejoins the cluster, becomes available again, and takes on a follower role. The cluster is then back to normal operation.
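The cycle described above (normal operation, failover, recovery) can be sketched as a small simulation. This is a toy model for illustration only, not how Pro Mosquitto's Cluster Management is implemented: the `Node` and `Cluster` classes, the "promote the first live follower" rule, and the topic name are all hypothetical, and only retained messages stand in for the synchronized state.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    alive: bool = True
    role: str = "follower"
    # Replicated broker state (here only retained messages, keyed by topic).
    retained: dict = field(default_factory=dict)


class Cluster:
    """Toy active-passive cluster: one leader, the rest followers."""

    def __init__(self, names):
        self.nodes = [Node(n) for n in names]
        self.nodes[0].role = "leader"

    @property
    def leader(self):
        return next((n for n in self.nodes if n.alive and n.role == "leader"), None)

    def _get(self, name):
        return next(n for n in self.nodes if n.name == name)

    def publish_retained(self, topic, payload):
        # The leader accepts the write; it is then copied to every live node.
        if self.leader is None:
            raise RuntimeError("no leader available")
        for node in self.nodes:
            if node.alive:
                node.retained[topic] = payload

    def fail(self, name):
        node = self._get(name)
        was_leader = node.role == "leader"
        node.alive = False
        node.role = "follower"
        if was_leader:
            # Hypothetical election rule: promote the first live follower.
            for candidate in self.nodes:
                if candidate.alive:
                    candidate.role = "leader"
                    break

    def rejoin(self, name):
        # A restarted node copies the leader's state and returns as a follower.
        node = self._get(name)
        current = self.leader
        node.alive = True
        if current is None:
            node.role = "leader"
        else:
            node.retained = dict(current.retained)
            node.role = "follower"


if __name__ == "__main__":
    cluster = Cluster(["node1", "node2", "node3"])
    cluster.publish_retained("factory/line1/status", "running")
    cluster.fail("node1")
    print(cluster.leader.name)  # node2
    cluster.rejoin("node1")
    print(cluster._get("node1").role)  # follower
```

Because every live node receives each write, the promoted follower already holds the retained message when it becomes leader, which is the property the failover description relies on.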

Which types of High Availability cluster modes are there?

Pro Mosquitto supports two MQTT HA cluster modes: Full Sync and Dynamic-Security Sync. 

Full Sync acts as an active-passive cluster in which only one of the three nodes is active and accepts client connections. The active node synchronizes MQTT session and authentication information across the cluster, which makes this mode especially useful for failover.

Dynamic-Security Sync was introduced to cover additional use cases, such as combining failover with higher performance. This mode acts as an active-active cluster in which all three nodes accept client connections simultaneously. However, only dynamic security (authentication and authorization) information is synchronized between nodes. See the HA cluster modes documentation for more information.

Does my use case require MQTT High Availability?

MQTT High Availability ensures that your clients can always reach the broker. But why is enabling MQTT HA so important?

Ensure continuous communication with your clients

Every industrial-grade solution that relies on MQTT as a central piece of its communication should employ a high availability setup. Otherwise, MQTT becomes the single point of failure and jeopardizes your whole solution.

For instance, suppose you use an MQTT broker to implement a smart factory solution that interprets all events as published and received messages and performs specific actions based on them. When your broker is offline, no messages are delivered. In other words, your smart solution stops functioning altogether.

Sometimes a broker node might not be reachable. Often, it is not the broker that causes it but the underlying server hardware, operating systems, network connectivity, etc. Therefore, we recommend using MQTT Broker with High Availability to ensure your solution doesn’t suffer from outages.

Avoid loss of data

Many typical MQTT clients run on constrained devices with limited resources and cannot reliably persist data. Even where a client can buffer data while it is disconnected, such a setup is often too complex to implement, because many aspects have to be considered for each solution.

For example, suppose a sensor collects data and publishes it to an MQTT broker on different topics. When the server goes down and the broker becomes unreachable, nothing the sensor gathers during the downtime is published by the broker or received by subscribing clients. As a result, your system loses all data from the moment the broker lost the connection until the connection is restored.

If your system were equipped with MQTT High Availability, it would continue operating, since switching from one node to another takes only several seconds. As a result, the data gathered between those two points in time would still be available.

How does MQTT High Availability differ from single-node systems?

Clients see the MQTT High Availability systems as single-node brokers. The clients have no idea whether they are connected to node No. 1 or No. 2. What they know is that they are connected to a broker. 

For performance reasons, a single-node broker writes its current state and message queues to disk only a few times per hour. If it fails, any changes or queue updates made after the last disk write and before the next one are lost.
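To get a feel for the size of that loss window, the arithmetic can be written out as a short sketch. The function name and the numbers are illustrative assumptions (a write every 30 minutes, matching "a few times per hour", and five state changes per second); they are not measured or documented values.

```python
def loss_window(write_interval_s, msgs_per_second):
    """Return (worst_case, average) number of changes lost on a crash."""
    if write_interval_s <= 0 or msgs_per_second < 0:
        raise ValueError("interval must be positive and rate non-negative")
    # Worst case: the broker fails just before the next disk write.
    worst_case = write_interval_s * msgs_per_second
    # A crash at a uniformly random time loses half an interval on average.
    average = worst_case / 2
    return worst_case, average


# Illustrative numbers only: one disk write every 30 minutes, 5 changes per second.
worst, avg = loss_window(write_interval_s=1800, msgs_per_second=5)
print(worst, avg)  # 9000 4500.0
```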

The Pro Edition for Eclipse Mosquitto MQTT broker with high availability avoids this kind of data loss by synchronizing state across the cluster continuously. Note that data synchronization in the MQTT High Availability setup strictly conforms to the OASIS MQTT Version 5.0 specification.

Single-node systems depend heavily on the stability of the underlying server, hardware, and network connectivity. When one of these components fails, the single-node system is no longer reachable. Communication between clients is then interrupted until the failure is rectified. Depending on the type of malfunction, the fix can take anywhere from seconds to, in extreme cases, days while replacement parts are delivered.

In contrast, the nodes of an MQTT High Availability system typically run on different servers. They can even be placed in different geographical regions. Hence, if the network, a server, the MQTT broker node itself, or any other component fails, operation automatically switches to one of the follower nodes in less than a second. This is how Pro Mosquitto MQTT High Availability works, and it is how the setup increases overall system availability while keeping your IoT infrastructure running smoothly.

MQTT High Availability extra questions

If you have any further questions, feel free to contact us.

A load balancer performs the crucial monitoring function for the Pro Mosquitto MQTT cluster. It is responsible for checking the availability of servers and closing ports for nodes that are no longer available. When the leader server fails, the cluster re-organizes and defines a new leader. 

As soon as a new leader is determined, the load balancer sends all clients there. Although it is possible to run an MQTT cluster with only one load balancer, we recommend using three. By default, Pro Edition for Eclipse Mosquitto MQTT High Availability is configured to operate with three load balancers on different IPs, so that the load balancer does not become a new single point of failure (SPOF). Usually, the load balancers’ IPs are grouped in your Domain Name System (DNS) under a common URL. This way, clients can still be configured with a single access point URL even though the load balancing service is redundant.
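From the client's side, a single name that resolves to several load balancer IPs can be handled roughly as follows. This is a minimal sketch using only Python's standard `socket` module; the helper names `resolve_all` and `connect_first_reachable` and the hostname in the usage note are hypothetical, and a real MQTT client library would perform the MQTT handshake on top of the returned TCP connection.

```python
import socket


def resolve_all(hostname, port):
    """Return every distinct (ip, port) pair that name resolution yields."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        addr = (sockaddr[0], sockaddr[1])
        if addr not in addresses:  # keep resolver order, drop duplicates
            addresses.append(addr)
    return addresses


def connect_first_reachable(addresses, timeout=3.0):
    """Try each address in turn and return the first socket that connects."""
    last_error = None
    for host, port in addresses:
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError as exc:
            # This load balancer is unreachable; fall through to the next one.
            last_error = exc
    raise ConnectionError(f"no load balancer reachable: {last_error}")
```

A call such as `connect_first_reachable(resolve_all("mqtt.example.com", 1883))` (placeholder hostname) would then succeed as long as at least one of the listed load balancers accepts connections. Note that `socket.create_connection` already tries every resolved address when it is given a hostname directly; the explicit loop is shown here only to make the behavior visible.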

Our team can also help you set up your system with a floating IP configuration. In this case, external clients only contact a single IP address. The floating IP service, in turn, internally re-routes each request to the three load balancer IPs. To understand whether we can enable this configuration in your environment, we must first review your setup and find out which company acts as your hosting provider.

Yes, you can. However, typically, we need to double-check if such a configuration would work. In this case, we look together with you at the overall system configuration and determine the feasibility. If necessary, our team is ready to support you during the system specification, implementation, and testing stages and provide other professional services on request. 

Our full HA mode (Pro Mosquitto version 2.5 onwards) is a minimum three-node active-passive cluster, with only one node active at once – meaning clients only connect to that node. In this case, all clients can communicate with each other. The cluster synchronizes dynsec authentication and authorization information across the cluster. 

It also synchronizes persistent MQTT information across the cluster. Therefore, retained messages are present on all nodes. It also means that clients that set clean session to “false” or use session-expiry-interval have their session synchronized across the cluster. So if the active broker node goes down, all of a client's subscriptions, in-flight messages, and queued messages will be present on the newly active node when the client reconnects.

As far as the client is concerned, it is the same node.

Our dynsec HA mode (Pro Mosquitto 2.6 onwards) is a minimum three-node active-active cluster, with all nodes active simultaneously. 

Clients can connect to any node. In this case, clients may not be able to communicate with one another. The cluster synchronizes dynsec authentication and authorization information across the cluster. It does not synchronize anything else. This is a highly available setup because a node should always be available for a client to connect to. This mode has the advantage of being able to scale horizontally. It has the disadvantage that it doesn’t meet all usage patterns. It is well suited to the case where you have e.g., sensors reporting back to a central backend service, but the sensors do not communicate with each other. 

This mode has limited support for clients using a clean session value set to “false.” If we use HAProxy as the load balancer, it can inspect the CONNECT packets and always direct clients to the node they previously connected to. Therefore, if the client loses connection briefly it will be able to reconnect to its own node, but if the node goes down, the session information is not available.
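The sticky behavior described above can be modeled in a few lines. The sketch below is not HAProxy configuration and does not parse real CONNECT packets; `StickyRouter`, its placement rule, and the node and client names are hypothetical and only illustrate the consequence for persistent sessions.

```python
class StickyRouter:
    """Toy load balancer that pins each MQTT client ID to one node."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.up = set(self.nodes)
        self.table = {}  # client_id -> node the client last connected to

    def mark_down(self, node):
        self.up.discard(node)

    def mark_up(self, node):
        if node in self.nodes:
            self.up.add(node)

    def route(self, client_id):
        """Return (node, session_kept) for a connecting client."""
        pinned = self.table.get(client_id)
        if pinned in self.up:
            # Same node as before: its locally stored session is still there.
            return pinned, True
        live = [n for n in self.nodes if n in self.up]
        if not live:
            raise RuntimeError("no node available")
        # Hypothetical placement rule: spread clients by current table size.
        node = live[len(self.table) % len(live)]
        self.table[client_id] = node
        # New node: sessions are not synchronized in this mode, so none exists.
        return node, False


if __name__ == "__main__":
    router = StickyRouter(["node1", "node2", "node3"])
    first, _ = router.route("sensor-1")
    print(router.route("sensor-1") == (first, True))  # True
    router.mark_down(first)
    print(router.route("sensor-1")[1])  # False
```

A brief disconnect leaves the pinned node up, so the client returns to its session; when the pinned node itself is down, the client is placed on another node and starts without its previous session.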
