With the digital evolution, the volume of data being transferred is huge, and companies need a robust, scalable, and flexible platform that can handle big data. Apache Kafka is a pub-sub messaging platform that lets you send messages from one end to another while handling large volumes of data. It works well for both online and offline messaging. Apache Kafka’s compatibility with real-time data analysis tools like Apache Spark and Storm gives it a competitive edge over other platforms. Also, as Apache Kafka is an open-source platform, companies can modify it to suit their own needs.
We can get a better idea of the advantages, usefulness, and need for this platform by going through the use cases of Apache Kafka in detail. So let’s take a look at how Apache Kafka has benefited companies through real-life examples.
Top 10 Apache Kafka use cases:
Netflix needs no introduction. One of the world’s most innovative and robust OTT platforms uses Apache Kafka in its Keystone pipeline project to push and receive notifications. Netflix runs two types of Kafka clusters: Fronting Kafka, which collects and buffers data from producers, and Consumer Kafka, which routes content to consumers.
All of us know that the amount of data processed at Netflix is huge. Netflix uses 36 Kafka clusters (24 Fronting Kafka and 12 Consumer Kafka) to handle almost 700 billion messages a day.
Netflix has brought its data loss rate down to 0.01% through the Keystone pipeline project, and Apache Kafka is a key driver behind such a low figure. Netflix plans to use Kafka version 0.9.0.1 to improve resource utilization and availability.
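To get a feel for these numbers, here is a quick back-of-the-envelope calculation. It uses only the figures quoted above (almost 700 billion a day and a 0.01% loss rate). Spreading the traffic evenly over the day is a simplifying assumption for illustration, not something Netflix has stated.

```python
from fractions import Fraction

# Figures quoted in the text above.
MESSAGES_PER_DAY = 700_000_000_000   # "almost 700 billion" a day
LOSS_RATE = Fraction(1, 10_000)      # 0.01% written as an exact fraction

SECONDS_PER_DAY = 24 * 60 * 60


def messages_per_second(messages_per_day: int) -> float:
    """Average rate, assuming traffic is spread evenly over the day."""
    return messages_per_day / SECONDS_PER_DAY


def lost_messages_per_day(messages_per_day: int, loss_rate: Fraction) -> int:
    """Number of messages lost per day at the given loss rate."""
    return int(messages_per_day * loss_rate)


if __name__ == "__main__":
    rate = messages_per_second(MESSAGES_PER_DAY)
    lost = lost_messages_per_day(MESSAGES_PER_DAY, LOSS_RATE)
    print(f"{rate:,.0f} messages per second on average")
    print(f"{lost:,} messages lost per day")
```

So at this volume the pipeline averages a little over 8 million messages per second, and even a 0.01% loss rate still means roughly 70 million lost messages a day, which is why pushing the rate lower matters.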
Spotify, the world’s biggest music platform, has a huge database to maintain for its 200 million users and 40 million paid tracks.
To handle such a huge amount of data, Spotify used various big data analytics tools. Apache Kafka was used to notify users with playlist recommendations and to push targeted ads, among many other important features. This initiative helped Spotify grow its user base and become one of the market leaders in the music industry.
Recently, however, Spotify decided that it did not want to maintain and process all of that data itself, so it switched to Google’s hosted pub-sub platform to manage the growing data.
A giant in the travel industry like Uber depends on many systems that must be fault-tolerant and leave no room for errors. Uber uses Apache Kafka to run its driver injury protection program in more than 200 cities. Drivers registered on Uber pay a premium on every ride, and the program has been working successfully thanks to the scalability and robustness of Apache Kafka.
It has achieved this success largely through non-blocking batch processing, which gives the Uber engineering team a steady throughput. Multiple retries have allowed the team to segment messages, which brings real-time processing updates and flexibility.
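The idea of non-blocking processing with retries can be sketched in a few lines. This is a minimal in-memory illustration, not Uber's implementation and not the Kafka client API: the function name `process_batch`, the retry limit, and the dead-letter list are all hypothetical. A failing message is parked on a separate retry queue so the rest of the batch keeps flowing, and a message that keeps failing is set aside instead of being retried forever.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

MAX_RETRIES = 2  # hypothetical limit chosen for this sketch


def process_batch(
    messages: List[str],
    handler: Callable[[str], None],
    max_retries: int = MAX_RETRIES,
) -> Tuple[List[str], List[str]]:
    """Process a batch so that one failing message never blocks the others.

    Returns (processed, dead_letters).
    """
    processed: List[str] = []
    dead_letters: List[str] = []
    # Each entry is (message, retries still allowed).
    retry_queue: Deque[Tuple[str, int]] = deque()

    # Main pass: failures are parked on the retry queue, not retried in place.
    for message in messages:
        try:
            handler(message)
            processed.append(message)
        except Exception:
            retry_queue.append((message, max_retries))

    # Retry pass: runs only after the main batch has gone through.
    while retry_queue:
        message, retries_left = retry_queue.popleft()
        if retries_left == 0:
            dead_letters.append(message)  # give up and set it aside
            continue
        try:
            handler(message)
            processed.append(message)
        except Exception:
            retry_queue.append((message, retries_left - 1))

    return processed, dead_letters
```

Because failed messages are handled after the main pass, healthy messages see steady throughput, while the retried ones form their own segment that can be monitored separately.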
Uber is planning to introduce a framework with which it can improve uptime and grow, scale, and maintain the program on Apache Kafka without eating into developer time.
Lyft, one of the growing companies in the transportation industry, is known for its strong focus on technology. It uses the enterprise version of Apache Kafka, provided by Confluent, as its data streaming platform. Earlier, Lyft used Kinesis for this purpose, but as the volume of data grew, it migrated to Apache Kafka for its scalability and stability in processing large amounts of data.
Currently, Lyft is planning to shift to Flink for these services as it moves toward more geography-based models.
LinkedIn, one of the world’s most prominent B2B social media platforms, handles well over a trillion messages per day. And we thought the number of messages handled by Netflix was huge. This figure is mind-blowing, and LinkedIn has seen it rise by over 1200x in the last few years.
LinkedIn uses different clusters for different applications so that the failure of one application cannot harm the other applications in the cluster. The Kafka broker clusters at LinkedIn also let it whitelist certain users and grant them higher bandwidth, ensuring a seamless user experience.
LinkedIn plans to achieve a lower data loss rate through MirrorMaker, which acts as the intermediary between Kafka clusters and Kafka topics. At present, there is a limit of 1 MB on the message size. But through the Kafka ecosystem, LinkedIn plans to enable publishers and users to send messages well over that limit in the future.
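One common way to work around a per-message size limit is to split a large payload into chunks on the sending side and stitch them back together on the receiving side. The sketch below shows that idea in plain Python. It is an assumption made for illustration, not LinkedIn's stated approach; the names `split_payload` and `reassemble` and the chunk layout are hypothetical.

```python
from typing import List, Tuple

MAX_MESSAGE_BYTES = 1024 * 1024  # the 1 MB limit mentioned above

# A chunk is (index, total number of chunks, data).
Chunk = Tuple[int, int, bytes]


def split_payload(payload: bytes, limit: int = MAX_MESSAGE_BYTES) -> List[Chunk]:
    """Split a payload into chunks whose data is at most `limit` bytes.

    A real system would also budget for headers inside the limit.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    total = max(1, -(-len(payload) // limit))  # ceiling division
    return [
        (index, total, payload[index * limit:(index + 1) * limit])
        for index in range(total)
    ]


def reassemble(chunks: List[Chunk]) -> bytes:
    """Rebuild the payload from its chunks, in whatever order they arrived."""
    if not chunks:
        raise ValueError("no chunks to reassemble")
    total = chunks[0][1]
    indexes = sorted(index for index, _, _ in chunks)
    if indexes != list(range(total)):
        raise ValueError("missing or duplicate chunks")
    ordered = sorted(chunks, key=lambda chunk: chunk[0])
    return b"".join(data for _, _, data in ordered)
```

Each chunk carries its index and the total count, so the receiver can tell when a payload is complete even if the chunks arrive out of order.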
Twitter, a social media platform known for its real-time news and story updates, now uses Apache Kafka to process its huge amount of data. Earlier, Twitter relied on its own pub-sub system, EventBus, for this analysis and data processing, but after looking at the benefits and capabilities of Apache Kafka, it made the switch.
As the amount of data on Twitter increases with every passing day, it was more logical to use Apache Kafka than to stick with EventBus. Migrating to Apache Kafka eased input-output operations, increased bandwidth allocation, simplified data replication, and lowered costs.
Rabobank, a Dutch multinational bank known for its digital initiatives, uses Apache Kafka for one of its essential services, called Rabo Alerts. The aim of this service is to notify customers about various financial events, from simple ones such as an amount being debited from or credited to your account, to more complex ones such as future investment suggestions based on your credit score.
These notifications are push notifications, and though Rabobank could perform the simple tasks without Apache Kafka, it needed a robust tool to perform a detailed analysis of a huge amount of data.
Goldman Sachs, a giant in the financial services sector, developed a Core Platform to handle around 1.5 TB of data per week. This platform uses Apache Kafka as its pub-sub messaging platform. Even though the amount of data handled by Goldman Sachs is smaller than that of Netflix or LinkedIn, it is still considerable.
The key goals at Goldman Sachs were to develop a Core Platform that could achieve better data loss prevention, easier disaster recovery, and minimal outage time. The other significant objectives were improving availability and enhancing transparency, as these factors are essential in any financial services firm.
Goldman Sachs has achieved these objectives through the successful implementation of the Core Platform, and Apache Kafka was a key driver of this project.
The New York Times, one of the oldest news media houses, has transformed itself to thrive in this era of digital transformation. The use of technologies such as big data analytics is not new to this media house. Let’s take a look at how Apache Kafka transformed the New York Times’ data processing.
Whenever an article is published on NYT, it needs to be made available on all sorts of platforms and delivered to subscribers in no time. Earlier, NYT did distribute articles and give subscribers access, but there were some issues. One of them prevented users from accessing previously published articles. Others called for a high level of coordination between teams to maintain and segment the articles, since different teams used different APIs.
To address these issues, NYT developed a project called the Publishing Pipeline, in which Apache Kafka’s log-based architecture removes the API-related problems. Since Kafka is also a pub-sub messaging system, it can cover not only data integration but also data analytics, unlike other log-based architecture services.
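To see why a log-based design helps with access to previously published articles, consider this small in-memory sketch. It is an illustration only, not NYT's Publishing Pipeline and not the Kafka API; the class names `ArticleLog` and `Reader` are hypothetical. Every article is appended to an ordered log, and each reader keeps its own position, so a reader that joins late can still replay everything from the start.

```python
from typing import List


class ArticleLog:
    """An append-only, ordered log of published articles."""

    def __init__(self) -> None:
        self._entries: List[str] = []

    def append(self, article: str) -> int:
        """Add an article to the end of the log and return its offset."""
        self._entries.append(article)
        return len(self._entries) - 1

    def read_from(self, offset: int) -> List[str]:
        """Return every article at or after the given offset."""
        return self._entries[offset:]


class Reader:
    """A consumer that tracks its own position in the log."""

    def __init__(self, log: ArticleLog) -> None:
        self._log = log
        self.offset = 0  # a new reader starts at the beginning of the log

    def poll(self) -> List[str]:
        """Fetch the articles this reader has not seen yet."""
        new_articles = self._log.read_from(self.offset)
        self.offset += len(new_articles)
        return new_articles


if __name__ == "__main__":
    log = ArticleLog()
    log.append("article-1")
    log.append("article-2")

    website = Reader(log)
    print(website.poll())     # the website catches up on both articles

    log.append("article-3")
    mobile_app = Reader(log)  # joins after three articles were published
    print(mobile_app.poll())  # still receives the full history
    print(website.poll())     # receives only the article it has not seen
```

Since the log itself is the single source of truth, the website and the mobile app no longer need to coordinate through separate APIs; each one simply reads the same log at its own pace.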
NYT implemented Kafka back in 2015-16 and considers it a success, as Apache Kafka simplified the backend and front-end deployments. It also decreased the workload of the developers and helped NYT improve content accessibility.
Shopify, a renowned ecommerce CMS platform, uses Apache Kafka to push notifications covering updates, offers, and many other service-related scenarios. Shopify has also deployed Apache Kafka for log aggregation and event management.
Shopify uses different clusters for different sorts of events, segmenting the large amount of data with the help of MirrorMaker.
Currently, Shopify keeps its customer data in the cloud and uses Kubernetes to ensure that the availability of Apache Kafka is not hampered.
These use cases show the huge impact of Apache Kafka and how scalable, compatible, and useful it is even with large amounts of data.