1 Apache Kafka Event Streaming Platform March, 2019 / Boston, MA @gamussa | #BostonKafka | @ConfluentINc


Raffle, yeah 🚀 Follow @gamussa 📸🖼🏋 Tag @gamussa With #BostonKafka

5 A company is built on DATA FLOWS, but all we have is DATA STORES

6 Pre-Streaming


8 New World
Streaming first
• DB/DWH + Many more distributed data systems
• Monolith -> Microservices
• Batch -> Real-time

9 Origins in Stream Processing
[Diagram labels: Java Apps with Kafka Streams or KSQL; Serving Layer (Microservices, Elastic, etc.); High Throughput Streaming platform; Continuous Computation; API based clustering]

10 Streaming Platform
• Storage
• Pub / Sub
• Processing

11 Storage

12 Core Abstraction
● DB - table
● Hadoop - file
● Kafka - ?

13 LOG

14 The log is a simple idea
[Diagram: a log with its Old and New ends labeled]
Messages are added at the end of the log
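To make the "added at the end" idea concrete, here is a minimal in-memory sketch. It is not how Kafka stores data on disk; the `Log` class and its method names are hypothetical and exist only for illustration.

```python
class Log:
    """A toy append-only log: records only ever go on the end."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Add a record at the end (the "New" side) and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset):
        """Return every record from `offset` to the end, oldest first."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        return self._records[offset:]


log = Log()
first = log.append("order-created")   # lands at offset 0
second = log.append("order-paid")     # lands at offset 1
```

Existing records are never modified or reordered, so an offset is a stable address for a record.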


16 Pub / Sub

17 Time

18 Time
[Diagram labels: C1, C2, C3]

19 Time
[Diagram labels: A, B, C, D]
hash(key) % numPartitions = N

20 Time
Messages without a key will be produced in a round robin fashion
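The two partitioning rules from the last two slides can be sketched together. This is an illustration only: `make_partitioner` is a hypothetical helper, and `zlib.crc32` is a stand-in hash chosen because it is stable across runs (Kafka's own default partitioner hashes the serialized key with murmur2).

```python
import itertools
import zlib


def make_partitioner(num_partitions):
    """Return a function that maps a message key to a partition number."""
    round_robin = itertools.cycle(range(num_partitions))

    def choose_partition(key):
        if key is None:
            # No key: spread messages evenly, one partition after another.
            return next(round_robin)
        # Keyed: hash(key) % numPartitions, so equal keys share a partition.
        return zlib.crc32(key.encode("utf-8")) % num_partitions

    return choose_partition


choose = make_partitioner(4)
keyless = [choose(None) for _ in range(6)]   # cycles through 0, 1, 2, 3, 0, 1
keyed = [choose("A") for _ in range(3)]      # the same partition three times
```

Because every message with the same key lands in the same partition, ordering is preserved per key, not across the whole topic.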

21 Consumers have a position all of their own
[Diagram: a log from Old to New; Ricardo, Robin, and Viktor each have their own position ("... is here") and scan forward from it]


24 Only Sequential Access
Read to offset & scan
[Diagram: a log from Old to New]
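A small sketch of "read to offset & scan" with independent consumer positions. The `Consumer` class here is hypothetical and only mimics the idea; it is not the Kafka consumer API, and the log is a plain Python list.

```python
class Consumer:
    """A toy consumer that remembers its own position in a shared log."""

    def __init__(self, name, log, offset=0):
        self.name = name
        self._log = log
        self.offset = offset

    def seek(self, offset):
        """Jump to an offset; reading then continues sequentially from there."""
        self.offset = offset

    def poll(self, max_records=2):
        """Scan forward from the current position and advance past what was read."""
        batch = self._log[self.offset:self.offset + max_records]
        self.offset += len(batch)
        return batch


log = ["m0", "m1", "m2", "m3", "m4"]          # oldest first
ricardo = Consumer("Ricardo", log, offset=0)
robin = Consumer("Robin", log, offset=2)
viktor = Consumer("Viktor", log, offset=4)

# Each consumer reads from its own position; nobody affects anybody else.
batches = {c.name: c.poll() for c in (ricardo, robin, viktor)}
```

Reading does not remove anything from the log, which is why three readers at three different positions can share it.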

CONSUMERS CONSUMER GROUP COORDINATOR CONSUMER GROUP

26-31 [Diagram sequence: consumers and consumer groups (labels C, CC, C1, C2) reading partitions 0, 1, 2, 3; the last frame is labeled 0, 3 / 1 / 2 / 3]
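The consumer group idea (each partition is read by exactly one member of the group, and partitions are handed out again when membership changes) can be sketched as follows. The `assign` function is a hypothetical, simplified round-robin stand-in; in Kafka the assignment is negotiated through the group coordinator using a configurable assignor.

```python
def assign(partitions, consumers):
    """Spread partitions over consumers so each partition has exactly one owner."""
    if not consumers:
        raise ValueError("a consumer group needs at least one member")
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2, 3]

# Four consumers, four partitions: one partition each.
before = assign(partitions, ["c1", "c2", "c3", "c4"])

# c4 leaves the group: its partition is handed to a surviving member.
after = assign(partitions, ["c1", "c2", "c3"])
```

With these inputs the surviving consumer `c1` ends up owning partitions 0 and 3, while `c2` and `c3` keep one partition each.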

32 Linearly Scalable Architecture
[Diagram labels: Producers, Consumers]
Single topic:
- Many producer machines
- Many consumer machines
- Many broker machines
No bottleneck!!

33 Replicate to get fault tolerance
[Diagram: a msg is written to the leader and replicated to another machine; labels: Machine A, Machine B]

34 Partition Leadership and Replication
[Diagram: Topic1 with partitions 1-4, three replicas of each partition spread across Broker 1 to Broker 4; legend: Leader, Follower]

35 Replication provides resiliency
A replica takes over on machine failure

36 Partition Leadership and Replication - node failure
[Diagram: Topic1 with partitions 1-4, three replicas of each partition spread across Broker 1 to Broker 4; legend: Leader, Follower]
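A rough sketch of "a replica takes over on machine failure". The `Partition` class, the broker ids, and the replica placement are all hypothetical; real Kafka elects the new leader from the in-sync replicas via the controller, whereas this toy version simply promotes the next surviving replica.

```python
class Partition:
    """A toy partition with one leader and some follower replicas."""

    def __init__(self, name, replicas):
        self.name = name
        self.replicas = list(replicas)   # broker ids holding a copy
        self.leader = self.replicas[0]   # first replica starts as leader

    def handle_broker_failure(self, broker):
        """Drop the failed broker; promote a follower if it was the leader."""
        if broker in self.replicas:
            self.replicas.remove(broker)
        if self.leader == broker:
            self.leader = self.replicas[0] if self.replicas else None


# Assumed placement: three replicas per partition across brokers 1 to 4.
p1 = Partition("Topic1-partition1", [1, 2, 3])
p2 = Partition("Topic1-partition2", [2, 3, 4])

# Broker 1 fails: p1 loses its leader, p2 is unaffected.
for partition in (p1, p2):
    partition.handle_broker_failure(1)
```

After the failure, `p1` is led by broker 2 and still has two copies, so producers and consumers can carry on.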

37 The log is a type of durable messaging system
Similar to a traditional messaging system (ActiveMQ, RabbitMQ, etc.) but with:
(a) Far better scalability
(b) Built in fault tolerance / HA
(c) Storage

Stop! Demo time!

39 Processing

40 Streaming is the toolset for dealing with events as they move!

41 What exactly is Stream Processing?
[Diagram: authorization_attempts -> possible_fraud]

42 What exactly is Stream Processing?
[Diagram: authorization_attempts -> possible_fraud]
    CREATE TABLE possible_fraud AS
      SELECT card_number, count(*)
      FROM authorization_attempts
      WINDOW TUMBLING (SIZE 5 MINUTE)
      GROUP BY card_number
      HAVING count(*) > 3;
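To show what the windowed count query on this slide computes, here is a plain-Python approximation. It works on a finished list of events rather than a continuous stream, and the function name, the event tuples, and the card numbers are made up for illustration. Windows are aligned to multiples of five minutes, which is how tumbling windows are laid out.

```python
from collections import Counter

WINDOW_MS = 5 * 60 * 1000  # SIZE 5 MINUTE


def possible_fraud(attempts, threshold=3):
    """Count attempts per (5-minute window, card) and keep counts > threshold.

    `attempts` is an iterable of (timestamp_ms, card_number) pairs.
    """
    counts = Counter()
    for timestamp_ms, card_number in attempts:
        window_start = timestamp_ms - (timestamp_ms % WINDOW_MS)
        counts[(window_start, card_number)] += 1
    return {key: n for key, n in counts.items() if n > threshold}


attempts = [
    # card-1: four attempts inside the first window
    (0, "card-1"), (60_000, "card-1"), (120_000, "card-1"), (180_000, "card-1"),
    # card-2: a single attempt
    (10_000, "card-2"),
    # card-3: four attempts, but split two and two across a window boundary
    (290_000, "card-3"), (299_999, "card-3"), (300_000, "card-3"), (310_000, "card-3"),
]
flagged = possible_fraud(attempts)
```

Only `card-1` is flagged. `card-3` also made four attempts, but they fall into two different tumbling windows, so neither window exceeds the threshold.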


48 Lower the bar to enter the world of streaming
[Chart: Coding Sophistication vs. User Population]
Core developers who use Java/Scala (streams)
Core developers who don’t use Java/Scala
Data engineers, architects, DevOps/SRE
BI analysts

49 KSQL #FTW
1 UI
2 CLI (ksql>)
3 REST (POST /query)
4 Headless
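For the REST option, a request to the `/query` endpoint could be prepared roughly like this. Treat the details as assumptions to check against your KSQL server version: the `localhost:8088` address, the `ksql` and `streamsProperties` JSON fields, and the content type are recalled from the KSQL REST API documentation, and `build_query_request` is a hypothetical helper. The sketch only builds the request; it does not send it.

```python
import json
import urllib.request


def build_query_request(ksql, base_url="http://localhost:8088"):
    """Prepare (but do not send) a POST /query request for a KSQL server.

    Assumption: the server listens on base_url and accepts a JSON body
    with "ksql" and "streamsProperties" fields.
    """
    body = json.dumps({"ksql": ksql, "streamsProperties": {}}).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/query",
        data=body,
        headers={"Content-Type": "application/vnd.ksql.v1+json; charset=utf-8"},
        method="POST",
    )


request = build_query_request("SELECT * FROM possible_fraud;")
```

Passing `request` to `urllib.request.urlopen` would send it to a running server, which streams results back for as long as the query runs.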

50 Interaction with Kafka
KSQL (processing): Does not run on Kafka brokers
JVM application with Kafka Streams (processing): Does not run on Kafka brokers
Kafka (data)

51 Standing on the shoulders of Streaming Giants
KSQL, KSQL UDFs
Powered by Kafka Streams
Powered by Producer, Consumer APIs
[Axis labels: Ease of use, Flexibility]

52 Find your local Meetup Group https://cnfl.io/kafka-meetups
Grab Stream Processing books https://cnfl.io/book-bundle
Join us in Slack http://cnfl.io/slack

53 One more thing…


https://kafka-summit.org Gamov30

Thanks!
@gamussa
viktor@confluent.io
We are hiring! https://www.confluent.io/careers/