Deciphering MQTT vs. Kafka: Full Comparative Guide
Messaging protocols specify the rules for delivering data from a sender to a recipient. In Internet of Things (IoT) projects, it’s important to ensure robust and efficient delivery without any data loss. In this context, people often bring up and compare two technologies: MQTT vs. Kafka.
Message Queue Telemetry Transport (MQTT) is a lightweight messaging protocol designed for constrained devices. Apache Kafka is a distributed streaming platform that implements its own protocol (the Kafka protocol).
Both protocols use a publish-subscribe messaging model, where messages are published to a certain topic and then sent to the receiving clients subscribed to that topic. However, Kafka uses a modified approach based on a partitioned log model to provide highly scalable and reliable data streaming and storage features. To help you better understand Apache Kafka, this article covers its fundamental principles and features.
Typically, IoT projects only require one messaging protocol. However, with MQTT and Kafka, it’s not a matter of MQTT vs. Kafka but understanding how they can complement each other.
This article explores the similarities and differences between MQTT and Kafka. It dispels common misconceptions that they are competing protocols, explains when to use MQTT or Kafka, and provides real scenarios of how the two protocols can work together.
Debunking MQTT vs. Kafka Misconceptions
There is a prevalent misconception that MQTT and Kafka are competing messaging protocols, largely due to their use of the publish-subscribe model.
While it’s tempting to assume that only one messaging protocol is suitable for an IoT system, that is not the case. MQTT and Kafka possess differences that, when used correctly, can complement and help optimize data gathering and processing in your IoT projects. Therefore, understanding these differences is key to leveraging the full potential of both protocols.
At a high level, MQTT ensures reliable communication between resource-limited devices operating in unstable low-bandwidth networks. Kafka ensures data exchange between applications that produce or consume large volumes of data rapidly and require high throughput, fault tolerance, and scalable data streaming in real-time.
In this scenario, an MQTT broker can be an intermediary between IoT devices generating data and the Kafka infrastructure. The MQTT broker (or a hierarchy of MQTT brokers) acts as a gateway for a network of IoT devices, aggregating the incoming data into topics. It then forwards the resulting data stream to Kafka, which subsequently makes it available for further processing by downstream consumers.
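To make this pattern concrete, here is a minimal, illustrative sketch of such a gateway in Python (not the dedicated Kafka Bridge discussed later, just the general idea). It assumes a local Mosquitto broker and Kafka cluster, uses the paho-mqtt (1.x style constructor) and confluent-kafka client libraries, and all topic names are hypothetical.

```python
# Illustrative sketch only: forwards MQTT messages into a Kafka topic.
# Assumes a local MQTT broker on port 1883 and a Kafka broker on port 9092;
# topic names are hypothetical.
import paho.mqtt.client as mqtt
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

def on_message(client, userdata, msg):
    # Use the MQTT topic as the Kafka record key so related readings
    # land in the same partition and keep their relative order.
    producer.produce("iot.sensor-data", key=msg.topic, value=msg.payload)
    producer.poll(0)  # serve delivery callbacks without blocking

mqtt_client = mqtt.Client()            # paho-mqtt 1.x style constructor
mqtt_client.on_message = on_message
mqtt_client.connect("localhost", 1883)
mqtt_client.subscribe("sensors/#", qos=1)  # aggregate all device topics
mqtt_client.loop_forever()
```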
For industrial use cases, consider utilizing the Sparkplug B specification to enhance the MQTT broker’s capabilities. For more details, refer to the MQTT Sparkplug article.
Note that MQTT focuses on connecting individual IoT devices, while Kafka focuses on connecting systems with one another.
Additionally, there are various reasons why an MQTT broker should not connect directly to the consuming applications that process the data, and why a Kafka middle layer is useful instead. For example:
- You may want to retain historical data in Kafka for future access.
- Another reason could be keeping a clear separation between data producers (IoT network) and consumers. This would allow consumers to retrieve data at their own pace and not overwhelm them with the constant stream of MQTT messages.
- You may also have a mix of incoming data from different sources (not only IoT networks) and consuming systems that require a unified data format that may be unable to connect to MQTT brokers directly.
To integrate MQTT and Kafka into your project, try the Pro Edition for Eclipse Mosquitto. You can sign up for a free trial to access exclusive features, such as Kafka Bridge, which allows you to connect MQTT brokers to Kafka seamlessly.
The following section delves into comparisons between the architecture, data integration, and performance of MQTT vs. Kafka.
Differences between MQTT and Kafka
The main differences between MQTT and Kafka can be grouped into the following categories.
MQTT vs. Kafka messaging models
In both MQTT and Kafka, the order of message consumption corresponds to the order of message production. A key feature of the publish-subscribe model, which they both use, is the ability of multiple subscribers to consume the published data simultaneously. However, Kafka combines it with the queue model, in which each message is delivered to only a single consumer.
To combine these conflicting principles, Kafka implements a partitioned log model. Under this model, each topic to which messages are published is divided into partitions. Each partition is an ordered, immutable sequence of records (messages) that is continuously appended to – a commit log. Message order is preserved within each partition, and messages are distributed across partitions based on a key derived from each message, or in a round-robin fashion.
Topic consumers are organized into consumer groups, and each partition is consumed by a single designated consumer within the group. This setup means each partition serves as its own queue within the context of a consumer group. At the same time, consumers from different groups can consume messages from the same partition in parallel, which is typical for a publish-subscribe model.
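As a hedged illustration of this behavior, consider the following Python sketch using the confluent-kafka library (broker address, group, and topic names are assumptions). Running several copies of the script with the same group.id splits the topic's partitions among them, queue-style, while running it with a different group.id receives the full stream again, pub-sub-style.

```python
# Minimal consumer sketch (illustrative): instances sharing a group.id split
# the topic's partitions among themselves; a different group.id gets its own
# full copy of the stream. Broker address and topic name are assumptions.
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "telemetry-processors",   # same group => queue-like sharing
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["iot.sensor-data"])

try:
    while True:
        msg = consumer.poll(1.0)          # pull model: the consumer sets the pace
        if msg is None or msg.error():
            continue
        print(f"partition={msg.partition()} offset={msg.offset()} value={msg.value()}")
finally:
    consumer.close()
```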
This difference means that, unlike Kafka, MQTT brokers (specifically MQTT version 3) cannot have multiple subscribing instances of the same service consuming messages in a round-robin fashion – a typical messaging queue scenario. In a message queue, each service instance continuously consumes a single distinct message at a time. This distributes the overall burden of message processing across all the subscribed instances and prevents duplicate consumption of the same message.
The combination of queue and publish-subscribe models underscores Kafka's orientation towards being a distributed data processing platform – also in terms of message consumption. It enables high-throughput data exchange between various applications and systems. At the same time, the MQTT broker fulfills its role as a central data exchange hub specifically designed to manage traffic from IoT devices.
However, MQTT version 5.0 introduces a shared subscription feature that mimics the queue model's distributed consumption and load balancing. This makes the distinction between the MQTT 5.0 and Kafka messaging models less pronounced. Nonetheless, Kafka still provides more distributed, fault-tolerant, and throughput-oriented features: it supports key-based partitioning, can persist partition data indefinitely, and can spread the load dynamically by increasing the number of partitions and rebalancing them across consumers.
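For illustration, a shared subscription is expressed through the $share/&lt;group&gt;/&lt;topic&gt; filter. The sketch below uses paho-mqtt (1.x style constructor) against a hypothetical MQTT 5.0 broker; clients subscribing with the same group name share the message load.

```python
# Illustrative MQTT 5 shared subscription: clients subscribing with the same
# $share group name receive messages from "sensors/+/temp" in a load-balanced
# fashion, similar to a Kafka consumer group. Names are hypothetical.
import paho.mqtt.client as mqtt

client = mqtt.Client(protocol=mqtt.MQTTv5)   # paho-mqtt 1.x style constructor
client.on_message = lambda c, u, msg: print(msg.topic, msg.payload)
client.connect("localhost", 1883)
client.subscribe("$share/workers/sensors/+/temp", qos=1)
client.loop_forever()
```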
Message delivery guarantees
MQTT provides three Quality of Service (QoS) levels: QoS 0 (at most once), QoS 1 (at least once), and QoS 2 (exactly once). These levels offer flexibility in choosing a reliability level that suits your project needs.
Apache Kafka, by default, guarantees at least once message delivery. However, you can adjust the configuration settings to achieve at most once delivery. If you require exactly once delivery, you need to modify the application layer logic as described in Apache Kafka’s documentation.
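As an illustration of how these guarantees map to producer settings, the hedged Python sketch below uses the confluent-kafka library with standard producer configuration keys; the broker address and transactional ID are assumptions, and exactly-once delivery additionally requires transactional logic and read_committed consumers, which are omitted here.

```python
# Illustrative producer configurations (broker address and IDs are assumptions).
from confluent_kafka import Producer

# At-least-once (roughly the default behaviour): wait for acknowledgement and
# let the client retry on failure, which may produce duplicates.
at_least_once = Producer({"bootstrap.servers": "localhost:9092", "acks": "all"})

# At-most-once: fire-and-forget with no retries; failed sends are simply lost.
at_most_once = Producer({
    "bootstrap.servers": "localhost:9092",
    "acks": "0",
    "retries": 0,
})

# Exactly-once semantics additionally need an idempotent producer plus
# transactions, and matching read_committed logic on the consumer side.
exactly_once = Producer({
    "bootstrap.servers": "localhost:9092",
    "enable.idempotence": True,
    "transactional.id": "order-pipeline-1",
})
```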
Message persistence
While some MQTT brokers like Pro Mosquitto can support additional message persistence features, the MQTT protocol only specifies one feature called retained messages. This means a single message per topic can be persisted and sent to a newly subscribing client to immediately provide the latest state of the topic.
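A minimal, illustrative sketch of a retained message using paho-mqtt (1.x style constructor); the topic and value are hypothetical:

```python
# Illustrative: publishing with retain=True stores the last value on the broker,
# so a client subscribing later immediately receives the latest state.
import paho.mqtt.client as mqtt

client = mqtt.Client()                 # paho-mqtt 1.x style constructor
client.connect("localhost", 1883)
client.publish("plant/line1/temperature", payload="21.7", qos=1, retain=True)
client.disconnect()
```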
In Apache Kafka, the persistence of all messages is a default feature. It is a distributed commit log with a configurable retention period for all records, including indefinite retention. Kafka provides fault tolerance and data replayability with message persistence, allowing you to access previous records at any time.
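For illustration, retention is typically controlled per topic via settings such as retention.ms and retention.bytes. The hedged sketch below creates a topic with the confluent-kafka AdminClient; the broker address, topic name, and limits are assumptions.

```python
# Illustrative: creating a topic whose records are kept for 30 days or until a
# partition reaches ~10 GB, whichever comes first. retention.ms = -1 would keep
# records indefinitely. Broker address and topic name are assumptions.
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})
topic = NewTopic(
    "iot.sensor-data",
    num_partitions=6,
    replication_factor=3,                            # assumes a 3-broker cluster
    config={
        "retention.ms": str(30 * 24 * 60 * 60 * 1000),  # 30 days
        "retention.bytes": str(10 * 1024**3),            # ~10 GB per partition
    },
)
admin.create_topics([topic])
```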
Fault tolerance
MQTT brokers ensure reliable message delivery with QoS guarantees to mitigate message delivery failures. However, the MQTT standard does not specify broker recovery strategies in case of server failure. While many brokers provide high availability clusters to tackle this problem, these are additional broker-specific features.
Kafka is built for data distribution and fault tolerance, with cluster replication as part of its core functionality. This emphasizes Kafka's role as a distributed and fault-tolerant backbone for general-purpose deployments at a large scale, while MQTT focuses on the IoT world.
MQTT vs. Kafka performance and scalability
As mentioned earlier, the Kafka protocol is designed for systems and applications that deal with high volumes of real-time data or need to access stored historical data.
Due to this, Kafka provides robustness, scalability, and high-throughput capabilities like:
- Batching data on the producer side,
- Breaking the data into partitions and storing them on disk for subsequent distribution,
- Scaling horizontally by adding brokers,
- Managing consumer groups and replicating all stored data across the cluster nodes.
These features allow Kafka to scale and perform well with increasing data volumes but at the expense of a higher resource toll per broker connection.
Conversely, MQTT is a lightweight messaging protocol suitable for devices with low bandwidth and other constraints. It minimizes connection overhead on the clients and does not employ aggressive data optimization and batching strategies. It assumes that a single client is limited in resources and doesn't send high volumes of data, but rather occasional discrete messages.
Note: While individual IoT devices can send occasional discrete messages, a network of such devices can still generate a high-volume stream of data. This is where a combination of MQTT and Kafka in the same project can offer benefits by leveraging their strengths and compensating for their weaknesses.
Pull vs. Push based model
MQTT brokers work by pushing messages to the subscribing clients, and the protocol does not anticipate receiving an overwhelming amount of data per topic. Moreover, the broker does much of the heavy lifting for the client, allowing IoT devices to be simple “dumb” receivers.
In contrast, Kafka stores all data in topic partitions on the brokers, allowing consumers to pull the data themselves. This approach enables consumers to retrieve data at their own pace, ensuring that they are not overwhelmed with incoming data and, simultaneously, enabling natural batching of messages that have yet to be consumed.
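As a hedged illustration of pull-based, self-paced consumption, the Python sketch below (confluent-kafka; broker address, group, and topic names are assumptions) fetches records in batches and commits offsets only after processing them.

```python
# Illustrative: a consumer pulling records in batches at its own pace. consume()
# returns up to 500 already-fetched records per call, so a slow consumer simply
# falls behind instead of being overwhelmed by pushed messages.
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "batch-archiver",
    "enable.auto.commit": False,          # commit only after the work is done
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["iot.sensor-data"])

while True:
    batch = consumer.consume(num_messages=500, timeout=5.0)
    if not batch:
        continue
    payloads = [m.value() for m in batch if not m.error()]  # process the batch here
    consumer.commit(asynchronous=False)   # mark the whole batch as consumed
```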
What are the respective pros and cons of MQTT vs. Kafka?
Pros of using MQTT
Lightweight
MQTT is designed to generate a very small packet size and has a fixed message header of just 2 bytes, which is considerably smaller than protocols such as Kafka or HTTP. It operates with minimal overhead, making it possible to work on resource-constrained devices.
Scalable in terms of the number of connections
Due to its low overhead, MQTT brokers can handle millions of IoT device connections that do not simultaneously produce high amounts of data.
Low energy consumption
MQTT is one of the lowest energy-consuming protocols because of the minimal overhead per message. It uses a long-lasting, persistent TCP session that is kept alive for the duration of the client connection, further enhancing its efficiency. This preserves power by not having to reestablish a connection on every request, like in HTTP, or transmit larger amounts of metadata per message to keep track of the current partition state, as with Kafka. This makes MQTT extremely battery-friendly, especially when paired with energy-efficient physical layer protocols like LoRa or Bluetooth Low Energy.
MQTT consumes less energy than HTTP on both 3G and wireless networks. This is why MQTT is ideal for projects that require battery-limited devices to stay connected for extended periods.
Configurable message delivery guarantees
The MQTT protocol provides three QoS levels, which you can configure to determine the level of reliability required for your project's communication:
- QoS 0: delivery at most once, with no guarantee.
- QoS 1: delivery at least once, guaranteed.
- QoS 2: delivery exactly once, guaranteed through a four-step handshake.
The lower the QoS level, the smaller the communication overhead associated with message transfer. This enables further reduction and fine-tuning of the resources expended during the messaging process, thereby increasing communication speed. It also preserves more battery when the strictest delivery guarantee is not necessary. Refer to this article for more details on MQTT QoS levels and how they work.
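For illustration, the QoS level is chosen per publish call. A minimal paho-mqtt sketch (1.x style constructor, hypothetical topics and values):

```python
# Illustrative: choosing the QoS level per message (hypothetical topics/values).
import paho.mqtt.client as mqtt

client = mqtt.Client()                  # paho-mqtt 1.x style constructor
client.connect("localhost", 1883)
client.loop_start()                     # background network loop for the handshakes

client.publish("sensors/room1/humidity", "44", qos=0)             # fire and forget
client.publish("sensors/room1/alarm", "smoke", qos=1)             # at least once
info = client.publish("billing/meter1/reading", "1041.3", qos=2)  # exactly once
info.wait_for_publish()                 # block until the QoS 2 handshake completes

client.loop_stop()
client.disconnect()
```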
Client-oriented features: Last Will and Testament (LWT)
The MQTT standard offers features that simplify client connection and management. One is the LWT feature, where a client can specify a ‘last will’ message that the broker will send if the client unexpectedly disconnects. LWT is particularly useful in monitoring and alert systems, ensuring subscribing clients are aware of any disconnections. Refer to this MQTT Last Will article to learn more about the LWT feature and its implementation in your projects.
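A minimal, illustrative LWT sketch with paho-mqtt (1.x style constructor); the client ID and topics are hypothetical. The will must be registered before connecting, and the broker publishes it only if the client disconnects ungracefully.

```python
# Illustrative: registering a Last Will message before connecting. If this
# client drops off without a clean disconnect, the broker publishes the will.
import paho.mqtt.client as mqtt

client = mqtt.Client(client_id="pump-17")   # paho-mqtt 1.x style constructor
client.will_set("devices/pump-17/status", payload="offline", qos=1, retain=True)
client.connect("localhost", 1883, keepalive=30)
client.publish("devices/pump-17/status", payload="online", qos=1, retain=True)
client.loop_forever()
```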
LWT and other client-oriented features (e.g., session persistence, retained messages) accentuate MQTT’s focus on managing client state. Conversely, Kafka emphasizes managing the overall data flow.
If you want to learn more about MQTT, join the MQTT Academy and access our free MQTT courses.
Cons of using MQTT
Lack of request-response communication pattern
The publish-subscribe model doesn't inherently provide a way to receive an instant synchronous response to a message published on a topic. MQTT version 5.0 does implement a request-response pattern, using response topics and correlation data to map requests to their respective responses.
However, using such a pattern requires additional implementation effort on the client side. Publishers have to publish a message and then wait for a response on a response topic to which they must already be subscribed. Subscribers, in turn, must direct their responses to the designated response topics upon message reception.
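As a hedged illustration of the requester's side, the paho-mqtt sketch below (1.x style constructor, hypothetical topic names) subscribes to its response topic first, then attaches a response topic and correlation data to the request so the responder knows where to reply.

```python
# Illustrative MQTT 5 request/response: the requester attaches a response topic
# and correlation data so the responder knows where to reply and the requester
# can match replies to requests. Topic names are hypothetical.
import uuid
import paho.mqtt.client as mqtt
from paho.mqtt.properties import Properties
from paho.mqtt.packettypes import PacketTypes

client = mqtt.Client(protocol=mqtt.MQTTv5)   # paho-mqtt 1.x style constructor
client.on_message = lambda c, u, msg: print("reply:", msg.payload)
client.connect("localhost", 1883)
client.subscribe("replies/client-42", qos=1)  # listen for answers first

props = Properties(PacketTypes.PUBLISH)
props.ResponseTopic = "replies/client-42"
props.CorrelationData = uuid.uuid4().bytes    # lets us match the reply later

client.publish("devices/pump-17/commands", "get-status", qos=1, properties=props)
client.loop_forever()
```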
Traffic scalability
Thanks to the lightweight nature of the protocol, MQTT brokers can typically scale the number of client connections quite well. However, if clients begin producing messages frequently and are densely connected (multiple clients subscribing and listening to the same topics), the broker may become strained, leading to latency or service disruptions. In that case, utilizing multiple brokers or establishing a broker hierarchy and implementing load balancing becomes essential.
Message size
MQTT is an exceptionally efficient protocol but may lose its efficiency if not used correctly. It’s important to remember that MQTT is meant for restricted devices that send small message payloads. Therefore, using MQTT with large messages could overwhelm some constrained clients.
Sending large messages reduces the overall effectiveness of the protocol even if the connected clients are capable of processing them. Simply put, whether you add a 2-byte MQTT header or an 8 KB HTTP header makes little difference once the message size is measured in megabytes. Additionally, while the theoretical upper limit for an MQTT message is 256 MB, approaching this limit is highly impractical.
Centralized infrastructure and broker dependence
MQTT clients rely on the broker as the central hub for communication management. This also means that a broker can sometimes become a single point of failure. MQTT communication is impossible without a broker, so protecting it from denial-of-service and potential server failures is essential.
One way to achieve this is by utilizing multiple broker instances and placing a load balancer before them. However, this approach isn’t seamless and does not inherently maintain client data across brokers. Typically, enterprise broker solutions such as Pro Mosquitto offer a high availability cluster to mitigate this issue.
Notes on security
Some sources claim that a major drawback of MQTT is its lack of out-of-the-box security features. However, this is generally not true. MQTT broker implementations, particularly enterprise solutions offering comprehensive security capabilities, tend to prioritize security as one of the first aspects to address.
The MQTT specification doesn’t stipulate concrete security features for the protocol but suggests extending your MQTT implementation to use TLS over TCP connections for enhanced security.
Advantages of using Apache Kafka
Data volume scalability
Apache Kafka can scale quickly with an increase in ingested data volumes. It can scale horizontally by adding more brokers or partitions to accommodate the additional load.
High performance
Kafka is tailored for high throughput and low latency, guaranteeing high performance when faced with large volumes of data. This is achievable through an efficient append-only data structure, an optimized I/O caching strategy, and aggressive message batching. Additionally, Kafka's distributed nature combines pub-sub topics with queue-like partitions to allow for parallel and balanced message consumption.
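For illustration, much of this batching is tunable on the producer. The hedged sketch below uses confluent-kafka with standard configuration keys (linger.ms, batch.size, compression.type); the broker address, topic, and values are assumptions.

```python
# Illustrative: producer-side batching knobs. linger.ms lets the client wait a
# few milliseconds to fill larger batches, and compression reduces the bytes
# sent per record. Broker address and topic name are assumptions.
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "linger.ms": 10,                 # wait up to 10 ms to build fuller batches
    "batch.size": 256 * 1024,        # target batch size in bytes
    "compression.type": "lz4",       # compress whole batches on the wire
})

for i in range(10_000):
    producer.produce("iot.sensor-data", key=str(i % 16), value=f"reading-{i}")
    producer.poll(0)                 # serve delivery callbacks
producer.flush()
```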
According to the Apache projects directory, a Kafka broker can handle hundreds of megabytes of data per second from thousands of clients.
Fault tolerance and high availability
Kafka employs a partitioned log model and distributed clustering to ensure that partition data is replicated to multiple brokers within a cluster. This ensures that if a broker fails or disconnects, another broker replica takes its place and serves data to consumers without any noticeable interruption.
Retention of historical data
Kafka partitions store ingested messages (records) to disk and can retain them for a specified time, even indefinitely, or until reaching a certain size threshold (e.g., 10GB). This capability enables querying and analyzing past data at any time. It also allows failed clients to reconnect and access missed messages, and enables clients that cannot engage in real-time communication to connect later and batch-process the stored messages.
Flexibility
Kafka is a highly versatile system that can connect with a wide range of applications and services through existing connectors. It provides extensible APIs with implementations in many programming languages, making it more adaptable.
It is also highly configurable, allowing customization to meet various use cases and environmental conditions. Moreover, Kafka Streams API provides capabilities for message processing and aggregation, enabling the preprocessing of messages even before they reach their destination.
Disadvantages of using Apache Kafka
Steep learning curve
One of the drawbacks of using Apache Kafka is its vast array of configuration options. Users need to have a comprehensive understanding of Kafka’s architecture, integrations, configuration, and maintenance to use Kafka clusters effectively. Mastering Kafka typically requires significant effort, unlike lighter and more straightforward systems like MQTT brokers.
Resource-intensive infrastructure
Although Kafka can handle extremely high data volumes, such performance comes at the cost of significant computational resources and memory. For efficient operations, the Kafka platform requires setting up a cluster of brokers, and deploying and connecting producers and consumers. This infrastructure can become quite resource-intensive, so sufficient hardware must be allocated.
Additionally, monitoring and maintaining running clusters require not only computational or hardware resources but also possibly human involvement and oversight. This is often the case with systems that are expected to work under heavy loads.
Resource-heavy clients
The Kafka protocol involves additional overhead and resources to establish and maintain a connection. Clients (both producers and consumers) play an active role in Kafka’s communication model. They batch messages before sending them to the broker and proactively pull and manage offsets for message consumption. This results in more resource-intensive client programs.
Lack of support from edge devices
Many IoT devices do not directly support the Kafka protocol and require a different means of connection, such as MQTT, before integrating with it.
Lack of features to scale connection volumes (especially with IoT)
Although Kafka is capable of handling high throughputs, it may not be suitable for scaling a large number of clients. Establishing a Kafka connection requires considerable resources. While Kafka offers many optimizations for data volume, it lacks features for optimizing large numbers of connections.
For comparison with MQTT, consider connecting a network of IoT devices that generate data sporadically directly to Kafka. Assuming the end devices support direct connections to Kafka and Kafka can handle all the connections, you probably wouldn't see any performance improvement – possibly even the contrary. This is because Kafka cannot utilize its batching capabilities across many discrete end-device producers that do not generate frequent messages.
Note on additional features
Apache Kafka is a widely used, open-source, and free scalable data streaming platform with many useful features. However, it still lacks some more advanced features, such as Role-Based Access Control (RBAC), audit trails, a monitoring center, etc.
These features are available in Confluent's commercial Kafka distribution. Depending on the scale and purpose of your project, a paid version might incur considerable additional expenses.
Common MQTT and Kafka use case scenarios
MQTT and Kafka can complement each other in scenarios where high volumes of data flowing from an IoT network require real-time analysis and processing or storage to disk for future access.
Let’s look at some use cases where using both MQTT and Kafka can be beneficial.
Smart city management
In highly digitized and connected cities, there can be thousands of devices within a single network. These devices can include smart lights, speed cameras, closed-circuit cameras, smart waste management, and even crowd management systems on public transportation. Such devices can produce a considerable amount of data in real time, which you may want to ingest and process.
Two tools that can work efficiently in these situations are MQTT and Kafka. You can use MQTT to connect to and collect data from individual devices. After that, Kafka can forward the data to analysis applications for real-time processing and insights.
Healthcare monitoring
Hospitals and other medical facilities have adopted digitization, resulting in thousands of online devices connected to the same network. Processing data in real time is crucial to ensure prompt action in emergencies and prevent loss of life. MQTT and Kafka can connect these devices and monitor vital data.
For instance, MQTT can connect devices like ECG machines, heart rate monitors, smart thermometers, and other wearable devices and transmit the data to a centralized location. Kafka can help with real-time processing and monitoring, allowing healthcare providers to monitor and respond to any changes quickly.
Supply chain optimization
In supply chain management, you need to track packages, monitor their storage conditions, and ensure their safe and prompt delivery, which could take weeks. Hence, lightweight devices with long battery lives are crucial to effectively tracking packages distributed across different locations.
MQTT can help connect these distributed devices and transmit data to a centralized location in these cases. Then, Kafka takes over and delivers the data to respective services to analyze it and provide timely updates to customers. This approach can optimize supply chain and inventory management.
Industrial IoT
Smart factories, or digitized manufacturing facilities, can greatly benefit from using MQTT and Kafka in industrial settings. Such facilities can require hundreds or even thousands of smart devices, making it challenging to efficiently collect and process data from a centralized location. In such scenarios, MQTT can gather data from all the devices and transmit it to Kafka for storage and further delivery to dedicated data processing systems.
Consider using both MQTT and Kafka with the Pro Edition for Eclipse Mosquitto. This version includes the Kafka Bridge feature, making it easy to push MQTT data to Kafka.
When to use MQTT and when to use Kafka
Although MQTT and Kafka are complementary technologies, there are certain cases where using only one is sufficient.
When to use MQTT
MQTT is often the better choice over Kafka when, for example:
- You are working with IoT devices, but they do not generate a lot of data.
- You need to send your IoT data to a service where a direct native integration with your MQTT broker of choice exists.
- Your project has to scale with numerous end-device connections but still doesn’t produce large continuous data streams.
When to use Kafka
Consider Kafka over MQTT if your end devices are powerful enough to handle direct Kafka connections and:
- Your project handles high-volume data streams.
- You need a way to efficiently store large amounts of data to disk for later access.
- You need to perform custom data preprocessing before sending data to downstream consumers.
- Consumers of your IoT data can be connected to Kafka more easily.
Wrap up
There is a common misconception that MQTT and Apache Kafka are competing protocols. However, they can work together, particularly for projects involving numerous device connections and real-time data processing.
An MQTT broker can effectively handle connections to multiple edge devices, serving as the initial collection point for IoT data. Kafka can then ingest and store this data, exposing it for further consumption and processing by other downstream systems within your project’s infrastructure.
If you’re considering using MQTT and Kafka in your next IoT project, sign up for a free 14-day trial or 30-day on-premises trial of the Pro Edition for Eclipse Mosquitto. The pro version of the Mosquitto MQTT broker includes features like Kafka Bridge. This feature can facilitate stable, reliable, fast, and secure transmissions between IoT devices and integrate with external systems.
About the author
Serhii Orlivskyi is a full-stack software developer at Cedalo GmbH. He previously worked in the telecom industry and software startups, gaining experience in various areas such as web technologies, services, relational databases, billing systems, and eventually IoT.
While searching for new areas to explore, Serhii came across Cedalo and started as a Mosquitto Management Center developer. Over time, Serhii delved deeper into the MQTT protocol and the intricacies of managing IoT ecosystems.
Recognizing the immense potential of MQTT and IoT, he continues to expand his knowledge in this rapidly growing industry and contributes by writing and editing technical articles for Cedalo's blog.