Crossing the Streams: Rethinking Stream Processing with KStreams and KSQL

A presentation at Toronto Kafka Meetup in November 2018 in Toronto, ON, Canada by Viktor Gamov

Slide 1

Crossing the Streams: Rethinking Stream Processing with Kafka Streams and KSQL
TORONTO KAFKA MEETUP, NOVEMBER 2018

Slide 2

https://twitter.com/gAmUssA/status/1048258981595111424

Slide 3

Streaming is the toolset for dealing with events as they move! @gamussa #TorontoKafka @confluentinc

Slide 4

Java Apps / Kafka Streams; Serving Layer (Cassandra, Elastic, etc.); High Throughput Streaming platform; Continuous Computation; API based clustering

Slide 5

Stream Processing by Analogy
$ cat < in.txt | grep "ksql" | tr a-z A-Z > out.txt
Diagram labels: Connect API, Stream Processing, Connect API, Kafka Cluster
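
The shell analogy on this slide can be sketched in plain Python: a source emits records, a filter and a transform process them one at a time, and a sink collects the output. This is only an illustration of the pipeline shape, not Kafka Connect or Kafka Streams code; the function names and the sample input are made up for the example.

```python
from typing import Iterable, Iterator, List


def source(lines: Iterable[str]) -> Iterator[str]:
    """Plays the role of `cat < in.txt`: emits records one at a time."""
    for line in lines:
        yield line


def keep_matching(records: Iterable[str], needle: str) -> Iterator[str]:
    """Plays the role of `grep "ksql"`: a stateless filter."""
    return (r for r in records if needle in r)


def to_upper(records: Iterable[str]) -> Iterator[str]:
    """Plays the role of `tr a-z A-Z`: a stateless per-record transform."""
    return (r.upper() for r in records)


def run_pipeline(lines: Iterable[str]) -> List[str]:
    """Plays the role of `> out.txt`: collects the results into a sink."""
    return list(to_upper(keep_matching(source(lines), "ksql")))


# Hypothetical input, standing in for in.txt.
sample_input = ["ksql is sql for streams", "plain consumer", "try ksql today"]
result = run_pipeline(sample_input)
```

Because every stage is a generator, records flow through the stages one by one instead of being materialized between steps, which is the property the pipe analogy is pointing at.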

Slide 6

Streaming Platform Architecture
Diagram labels: Application (x3); Native Client library; Kafka Streams; Load Balancer *; REST Proxy; Schema Registry; Kafka Brokers; Kafka Connect; Zookeeper Nodes

Slide 7

https://twitter.com/monitoring_king/status/1048264580743479296

Slide 8

LET’S TALK ABOUT THIS FRAMEWORK OF YOURS. I THINK IT’S GOOD, EXCEPT IT SUCKS

Slide 9

SO LET ME WRITE THE FRAMEWORK THAT’S WHY IT MIGHT BE REALLY GOOD

Slide 10

Every framework wants to be, when it grows up: Scalable, Elastic, Stateful, Fault-tolerant, Distributed

Slide 11

https://twitter.com/157rahul/status/1050505569746841600

Slide 12

The log is a simple idea: messages are added at the end of the log (diagram axis: Old to New)

Slide 13

Consumers have a position all of their own: George is here, Fred is here, Sally is here; each scans forward from its own position (diagram axis: Old to New)

Slide 14

Only Sequential Access: read to offset & scan (diagram axis: Old to New)
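
The three log slides (append at the end, one position per consumer, seek to an offset and scan) can be sketched as a tiny in-memory model. This is a toy, assuming a single unpartitioned log held in a Python list; the class and method names are invented for the example and are not the Kafka client API.

```python
from typing import List


class Log:
    """Append-only sequence of records; offsets are list indexes."""

    def __init__(self) -> None:
        self._records: List[str] = []

    def append(self, record: str) -> int:
        """Add a record at the end (the "new" side) and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset: int, max_records: int) -> List[str]:
        """Seek to an offset, then scan forward sequentially."""
        return self._records[offset:offset + max_records]


class Consumer:
    """Tracks its own position; in this sketch the log keeps no per-consumer state."""

    def __init__(self, log: Log, offset: int = 0) -> None:
        self.log = log
        self.offset = offset

    def poll(self, max_records: int = 10) -> List[str]:
        """Read the next batch and advance only this consumer's position."""
        batch = self.log.read_from(self.offset, max_records)
        self.offset += len(batch)
        return batch


log = Log()
for i in range(5):
    log.append(f"event-{i}")

fred = Consumer(log)             # starts at the oldest record
sally = Consumer(log, offset=3)  # starts further along

fred_batch = fred.poll(max_records=2)
sally_batch = sally.poll()
log.append("event-5")            # new records never disturb existing offsets
fred_rest = fred.poll()
```

Reading never removes anything from the log, so Fred and Sally progress independently and either one could rewind by resetting its own offset.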

Slide 15

Shard data to get scalability
- Producer (1), Producer (2), Producer (3): messages are sent to different partitions
- Cluster of machines: partitions live on different machines
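
A minimal sketch of key-based sharding: each record's key is hashed to pick a partition, so records with the same key always land in the same partition. The hash here is `zlib.crc32`, chosen only because it is deterministic and in the standard library; it is an assumption for illustration and not the hash Kafka's own partitioner uses. The function names are likewise hypothetical.

```python
import zlib
from typing import List, Tuple

Record = Tuple[str, str]


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index with a deterministic hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def send(partitions: List[List[Record]], key: str, value: str) -> int:
    """Append the record to the partition chosen by its key; return the index."""
    index = partition_for(key, len(partitions))
    partitions[index].append((key, value))
    return index


# Three partitions, which in a real cluster would live on different machines.
partitions: List[List[Record]] = [[], [], []]

first = send(partitions, "gwen", "clicked")
second = send(partitions, "gwen", "purchased")
send(partitions, "viktor", "clicked")
send(partitions, "matthias", "clicked")
```

Keeping one key in one partition is what preserves per-key ordering while still letting the total load spread across machines.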

Slide 16

CONSUMERS; CONSUMER GROUP; CONSUMER GROUP COORDINATOR
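
To make the consumer group idea concrete, here is a simplified sketch of how the partitions of a topic might be divided among the members of one group, and how the split changes when a member joins. It uses a plain round-robin rule as an assumption; Kafka's real assignment strategies are pluggable and more involved, and the `assign` function and consumer names are hypothetical.

```python
from typing import Dict, List


def assign(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Spread partitions over group members round-robin.

    Each partition goes to exactly one consumer in the group.
    """
    if not consumers:
        raise ValueError("a consumer group needs at least one member")
    members = sorted(consumers)
    assignment: Dict[str, List[int]] = {member: [] for member in members}
    for position, partition in enumerate(sorted(partitions)):
        assignment[members[position % len(members)]].append(partition)
    return assignment


topic_partitions = [0, 1, 2, 3, 4, 5]

# Two consumers share six partitions.
before = assign(topic_partitions, ["c1", "c2"])

# A third consumer joins: the group rebalances and each member gets fewer partitions.
after = assign(topic_partitions, ["c1", "c2", "c3"])
```

Adding a member shrinks every member's share, which is how a group scales out; once there are more members than partitions, the extra members sit idle.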

Slide 17

Linearly Scalable Architecture
Single topic:
- Many producer machines
- Many consumer machines
- Many broker machines
No Bottleneck!!

Slide 18

Talk is cheap! Show me code! https://cnfl.io/streams-movie-demo

Slide 19

As developers, we want to build APPS not INFRASTRUCTURE

Slide 20

Slide 21

The KAFKA STREAMS API is a JAVA API to BUILD REAL-TIME APPLICATIONS

Slide 22

App with Streams API: not running inside brokers!

Slide 23

Same app, many instances: App (Streams API), App (Streams API), App (Streams API). Brokers? Nope!

Slide 24

Before: Your Job; Processing Cluster; Shared Database; Dashboard

Slide 25

After: APP with Streams API; Dashboard

Slide 26

this means you can DEPLOY your app ANYWHERE using WHATEVER TECHNOLOGY YOU WANT

Slide 27

So many places to run your app! ...and many more...

Slide 28

Things Kafka Streams Does
- Enterprise Support
- Open Source
- Powerful Processing incl. Filters, Transforms, Joins, Aggregations, Windowing
- Runs Everywhere
- Supports Streams and Tables
- Elastic, Scalable, Fault-tolerant
- Exactly-Once Processing
- Kafka Security Integration
- Event-Time Processing
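
Two of the features named on this slide, windowing and event-time processing, can be illustrated together with a small sketch: records are counted per key in fixed-size (tumbling) windows, and the window is chosen from the timestamp carried by the record rather than from its arrival order. This is plain Python written for illustration, not the Kafka Streams windowing API; the function names, the 60-second window size, and the sample events are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Event = Tuple[str, int]  # (key, event time in milliseconds)


def window_start(event_time_ms: int, size_ms: int) -> int:
    """Start of the tumbling window that contains the given event time."""
    return event_time_ms - (event_time_ms % size_ms)


def count_per_window(events: Iterable[Event], size_ms: int) -> Dict[Tuple[str, int], int]:
    """Count events per (key, window start), using each event's own timestamp."""
    counts: Dict[Tuple[str, int], int] = defaultdict(int)
    for key, event_time_ms in events:
        counts[(key, window_start(event_time_ms, size_ms))] += 1
    return dict(counts)


# Hypothetical events; the last one arrives late but carries an early timestamp.
events = [
    ("gwen", 1_000),
    ("gwen", 59_000),
    ("viktor", 61_000),
    ("gwen", 30_000),
]
counts = count_per_window(events, size_ms=60_000)
```

The late `("gwen", 30_000)` record is still counted in the first window because the window is derived from the event's timestamp, not from when it was processed.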

Slide 29

Table-Stream Duality

Slide 30

Slide 31

TABLE to STREAM to TABLE
Change stream: (“Gwen”, 1), (“Matthias”, 1), (“Gwen”, 2), (“Viktor”, 1)
Table after each change (the same on both sides of the diagram):
1. Gwen 1
2. Gwen 1, Matthias 1
3. Gwen 2, Matthias 1
4. Gwen 2, Matthias 1, Viktor 1
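
The duality on this slide can be sketched in both directions: replaying a stream of (key, value) changes rebuilds the table, and comparing successive table states recovers the stream of changes. This is a plain-Python illustration using the names from the slide, not the KStream/KTable API; the function names are hypothetical, and deletions are left out to keep the sketch short.

```python
from typing import Dict, Iterator, List, Tuple

Change = Tuple[str, int]
Table = Dict[str, int]

_MISSING = object()  # sentinel so a key absent from the previous state always counts as changed


def stream_to_table(changes: List[Change]) -> Tuple[Table, List[Table]]:
    """Apply each change as an upsert; return the final table and every intermediate state."""
    table: Table = {}
    snapshots: List[Table] = []
    for key, value in changes:
        table[key] = value
        snapshots.append(dict(table))
    return table, snapshots


def table_to_stream(snapshots: List[Table]) -> Iterator[Change]:
    """Emit one change record for every key whose value differs from the previous state."""
    previous: Table = {}
    for snapshot in snapshots:
        for key, value in snapshot.items():
            if previous.get(key, _MISSING) != value:
                yield (key, value)
        previous = snapshot


changes = [("Gwen", 1), ("Matthias", 1), ("Gwen", 2), ("Viktor", 1)]
table, snapshots = stream_to_table(changes)
recovered = list(table_to_stream(snapshots))
```

The round trip returns the original change stream, which is the point of the slide: a table is a materialized view of a stream, and a stream is the changelog of a table.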

Slide 32

Do you think that’s a table you are querying?

Slide 33

Talk is cheap! Show me code!

Slide 34

What’s next?

Slide 35

https://twitter.com/IDispose/status/1048602857191170054

Slide 36

KSQL #FTW
1. UI
2. CLI (ksql>)
3. REST (POST /query)
4. Headless

Slide 37

Interaction with Kafka
- KSQL (processing): does not run on Kafka brokers
- JVM application with Kafka Streams (processing): does not run on Kafka brokers
- Kafka (data)

Slide 38

Fault-Tolerance, powered by Kafka

Slide 39

Standing on the shoulders of Streaming Giants
Layers, from ease of use to flexibility: KSQL; KSQL UDFs; Kafka Streams; Producer, Consumer APIs
KSQL is powered by Kafka Streams; Kafka Streams is powered by the Producer, Consumer APIs

Slide 40

Thanks! @gamussa viktor@confluent.io We are hiring! https://www.confluent.io/careers/