My Take on Kafka vs. Kinesis

I spent some time working with Apache #Kafka and also did a small PoC on #Kinesis. I am often asked which one to prefer. Well, it depends on the cost involved, the data volume, and the kind of people available for support. Here is my take on Apache Kafka vs. Kinesis.

The first difference is the obvious one: Apache Kafka is open source, while Kinesis is an #aws service that you pay for based on usage. They use different terms for similar concepts; what Kafka calls a partition, Kinesis calls a shard. Both are distributed messaging platforms.
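To make the partition/shard idea concrete, here is a minimal Python sketch of how a record key can be mapped to a shard. Kinesis hashes the partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains that value. The even split of the hash space across shards and the helper name `shard_for_key` are my assumptions for illustration only. Kafka's default partitioner follows the same idea with a different hash function taken modulo the partition count.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit value


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard that owns this key (hypothetical helper).

    Assumes the 128-bit hash space is split evenly across the shards.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    # Scale the hash into the range [0, shard_count).
    return hash_value * shard_count // HASH_SPACE
```

The point to take away is that the same key always lands on the same shard, which is what keeps records for one key in order.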

At a high level, setting up Kafka requires a Kafka cluster and ZooKeeper. Kafka relies on ZooKeeper to store information about Kafka brokers and topic configuration (older versions also kept consumer offsets there; newer versions keep them in an internal Kafka topic). In Kafka, messages are published to topics on the cluster. You can write programs that write to a topic and programs that read from it; these are called producers and consumers, respectively.
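The producer/consumer model is easier to see in a toy example. The sketch below is not the Kafka client API; `ToyTopic` and its methods are hypothetical names for an in-memory, single-partition log that shows the core idea: producers append records, and each consumer group tracks its own read offset.

```python
from collections import defaultdict


class ToyTopic:
    """In-memory stand-in for one topic partition: an append-only log."""

    def __init__(self) -> None:
        self._log = []                    # records in arrival order
        self._offsets = defaultdict(int)  # next offset to read, per consumer group

    def produce(self, record: str) -> int:
        """Append a record and return the offset it was stored at."""
        self._log.append(record)
        return len(self._log) - 1

    def consume(self, group: str, max_records: int = 10) -> list:
        """Return unread records for a group and advance its offset."""
        start = self._offsets[group]
        batch = self._log[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch


topic = ToyTopic()
topic.produce("order-created")
topic.produce("order-paid")

first_read = topic.consume("billing")   # both records
second_read = topic.consume("billing")  # nothing new for this group
audit_read = topic.consume("audit")     # a new group starts from offset 0
```

Because reading does not delete anything, a second consumer group can replay the same records independently, which is the main difference from a classic queue.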

High availability (HA) considerations while setting up Kafka: for a ZooKeeper quorum to work, 2 out of 3 nodes must be up and running. Of the 3 ZooKeeper nodes, one is elected leader, and when the leader goes down, another node is elected in its place. For a topic, the partition leaders and their replicas should be spread across different availability zones (3 in most cases). The replication factor needs to be configured when setting up Kafka.
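The "2 out of 3" rule is just majority arithmetic, and a small sketch shows how it generalizes to other ensemble sizes. The function names here are mine, chosen for illustration.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many nodes can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


for nodes in (3, 4, 5):
    print(nodes, quorum_size(nodes), tolerated_failures(nodes))
```

A 4-node ensemble tolerates only one failure, the same as a 3-node one, which is why odd ensemble sizes are the usual choice.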

In my experience, working with Kafka takes effort: setting up, managing, supporting, and monitoring the Kafka cluster and ZooKeeper. Kafka comes with a long list of settings for topics, producers, consumers, and the cluster that must be tried and tested before you can set up a Kafka cluster in a cloud like #AWS using #Terraform and #Ansible.

On the other hand, #Kinesis is a fully managed AWS service that is easy to configure, with a good number of APIs for monitoring, encryption, etc. Monitoring a Kinesis stream is made easy by its integration with #aws #cloudwatch. The producer and consumer libraries (KPL and KCL) make it easier to develop producers and consumers for a Kinesis stream.

Data from Kinesis is easily made available to other AWS components like #s3, #Redshift, and #ElasticSearch. Depending on your use case, there are also related services: Kinesis Firehose and Kinesis Data Analytics. With Kinesis, data can be encrypted at rest. The default retention in Kinesis is 24 hours, compared to a default of 7 days in Kafka, though both are configurable. Kinesis also supports data recovery and backup.

One way to integrate real-time data coming from streaming/messaging platforms like Kafka or Kinesis into #Snowflake is to land the data in S3 and configure #snowpipe to load it continuously into Snowflake. #snowpipe detects new files in the S3 bucket, or modifications to existing ones, and as soon as something changes it runs a Snowflake COPY command to load the data into a Snowflake database table.

Last words: Kafka or Kinesis? In my experience, it depends on the cost involved, the data volume, and the skill set available for support. Please share your experience working with Kafka and Kinesis.

Data Engineer, BI, Data Analytics, DWH