Kafka Streams and KSQL: How to Stop Building Clusters and Start Processing Streams

A presentation at the St. Petersburg Kafka Meetup in October 2018 in St. Petersburg, Russia, by Viktor Gamov

Slide 1

Kafka Streams and KSQL: How to Stop Building Clusters and Start Processing Streams
ST. PETERSBURG KAFKA MEETUP, OCTOBER 2018

Slide 2

@gamussa #KafkaSPB @confluentinc

Slide 3

[Diagram: Java Apps / Kafka Streams; Serving Layer (Cassandra, Elastic, etc.); Streaming platform; High Throughput; Continuous Computation; API based clustering]

Slide 4

What is a Streaming Platform?
[Diagram: Producer, Consumer, Connectors, The Log, Streaming Engine]

Slide 5

Kafka’s Distributed Log
[Diagram: Producer, Consumer, Connectors, The Log, Streaming Engine]

Slide 6

The log is a simple idea
Messages are added at the end of the log
[Diagram: a log running from Old to New]

Slide 7

Consumers have a position all of their own
[Diagram: a log running from Old to New; "George is here", "Fred is here", and "Sally is here" mark three separate positions, each followed by a Scan]

Slide 8

Only Sequential Access
Read to offset & scan
[Diagram: a log running from Old to New]
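
The log slides above (append at the end, one position per consumer, read to an offset and scan) can be sketched in plain Java. This is only an in-memory illustration, not the Kafka client API: the class `LogSketch` and its methods `append`, `seek`, and `poll` are hypothetical names chosen for this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for a single log; an assumption-laden sketch, not Kafka itself.
class LogSketch {
    private final List<String> messages = new ArrayList<>();
    // Each consumer keeps a position (offset) all of its own.
    private final Map<String, Integer> positions = new HashMap<>();

    // Messages are only ever added at the end; returns the new message's offset.
    int append(String message) {
        messages.add(message);
        return messages.size() - 1;
    }

    // Move a consumer to an offset (must be between 0 and the current log size).
    void seek(String consumer, int offset) {
        positions.put(consumer, offset);
    }

    // Sequential scan from the consumer's position to the current end of the log.
    List<String> poll(String consumer) {
        int from = positions.getOrDefault(consumer, 0);
        List<String> batch = new ArrayList<>(messages.subList(from, messages.size()));
        positions.put(consumer, messages.size());
        return batch;
    }

    public static void main(String[] args) {
        LogSketch log = new LogSketch();
        log.append("a");
        log.append("b");
        log.append("c");
        log.seek("Fred", 2);
        System.out.println(log.poll("George")); // [a, b, c]
        System.out.println(log.poll("Fred"));   // [c]
        log.append("d");
        System.out.println(log.poll("George")); // [d]
    }
}
```

George and Fred read the same log independently: moving one consumer's position never affects the other.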

Slide 9

Shard data to get scalability
- Messages are sent to different partitions
- Partitions live on different machines
[Diagram: Producer (1), Producer (2), Producer (3) sending to a cluster of machines]
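
A minimal sketch of the sharding idea on this slide: each message is routed to one partition, here chosen from its key. The names `PartitionSketch`, `partitionFor`, and `send` are hypothetical, and the hash is an assumption made for brevity: this sketch uses `String.hashCode()`, whereas Kafka's default partitioner hashes the serialized key with murmur2.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory illustration of key-based partitioning; not the Kafka producer API.
class PartitionSketch {
    private final List<List<String>> partitions = new ArrayList<>();

    PartitionSketch(int numPartitions) {
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    // Assumption: plain hashCode modulo partition count (Kafka itself uses murmur2).
    // floorMod keeps the result non-negative even for negative hash codes.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Appends the record to the partition chosen for its key and returns that partition.
    int send(String key, String value) {
        int p = partitionFor(key, partitions.size());
        partitions.get(p).add(key + "=" + value);
        return p;
    }

    List<String> partition(int p) {
        return partitions.get(p);
    }

    public static void main(String[] args) {
        PartitionSketch topic = new PartitionSketch(4);
        int first = topic.send("Gary", "1");
        int second = topic.send("Gary", "2");
        // The same key always lands in the same partition, so its order is preserved.
        System.out.println(first == second);
        System.out.println(topic.partition(first));
    }
}
```

Because a key always maps to the same partition, ordering holds per key while different keys can be spread over different machines.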

Slide 10

Linearly Scalable Architecture
Single topic:
- Many producer machines
- Many consumer machines
- Many broker machines
No Bottleneck!!
[Diagram: Producers, Consumers]

Slide 11

Replicate to get fault tolerance
[Diagram: the leader on Machine A receives a msg and replicates it to Machine B]

Slide 12

Partition Leadership and Replication
[Diagram: Topic1 partition1 through partition4, three copies of each, spread across Broker 1 to Broker 4; legend: Leader, Follower]

Slide 13

Replication provides resiliency
A ‘replica’ takes over on machine failure

Slide 14

Partition Leadership and Replication - node failure
[Diagram: Topic1 partition1 through partition4, three copies of each, spread across Broker 1 to Broker 4; legend: Leader, Follower]
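
The replication slides can be illustrated with a deliberately simplified sketch of one partition: every write is copied to all replicas, and when the leader's broker fails, another replica takes over. All names here (`ReplicationSketch`, `write`, `fail`, `read`) are hypothetical, and the election rule (pick the next surviving replica) is an assumption for illustration; real Kafka elects a new leader from the in-sync replicas.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory illustration of leader/follower replication for a single partition.
class ReplicationSketch {
    // Broker name -> that broker's copy of the partition.
    private final Map<String, List<String>> replicas = new LinkedHashMap<>();
    private String leader;

    ReplicationSketch(List<String> brokers) {
        for (String broker : brokers) {
            replicas.put(broker, new ArrayList<>());
        }
        leader = brokers.get(0);
    }

    // Simplification: the write is copied to every surviving replica synchronously.
    void write(String msg) {
        for (List<String> copy : replicas.values()) {
            copy.add(msg);
        }
    }

    // A broker dies; if it hosted the leader, a surviving replica takes over.
    void fail(String broker) {
        replicas.remove(broker);
        if (broker.equals(leader)) {
            if (replicas.isEmpty()) {
                throw new IllegalStateException("no replica left");
            }
            leader = replicas.keySet().iterator().next();
        }
    }

    String leader() {
        return leader;
    }

    // Reads are served from the current leader's copy.
    List<String> read() {
        return replicas.get(leader);
    }

    public static void main(String[] args) {
        ReplicationSketch partition =
                new ReplicationSketch(List.of("Broker 1", "Broker 2", "Broker 3"));
        partition.write("m1");
        partition.fail("Broker 1");
        System.out.println(partition.leader()); // Broker 2
        System.out.println(partition.read());   // [m1]
    }
}
```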

Slide 15

CONSUMERS, CONSUMER GROUP, CONSUMER GROUP COORDINATOR

Slide 16

Talk is cheap! Show me code! https://cnfl.io/streams-movie

Slide 17

The Connect API
[Diagram: Producer, Consumer, Connectors, The Log, Streaming Engine]

Slide 18

Ingest / Output to practically any data source
[Diagram: Kafka Connect, Kafka, Kafka Connect]

Slide 19

Ingest/Output from/to many data sources
Amazon S3, Elasticsearch, HDFS, JDBC, Couchbase, Cassandra, Oracle, SAP, Vertica, Blockchain, DynamoDB, FTP, GitHub, BigQuery, Google Pub Sub, RethinkDB, Salesforce, Solr, Splunk, JMX, Kinesis, MongoDB, MQTT, NATS, Postgres, Rabbit, Redis, Twitter

Slide 20

Stream Processing in Kafka
[Diagram: Producer, Consumer, Connectors, The Log, Streaming Engine]

Slide 21

Stream Processing by Analogy
$ cat < in.txt | grep "ksql" | tr a-z A-Z > out.txt
[Diagram: labels above the pipeline, left to right: Connect API, Stream Processing, Connect API; label below: Kafka Cluster]
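
The middle of the shell pipeline on this slide (`grep "ksql" | tr a-z A-Z`) can be written as a filter followed by a map. The sketch below uses `java.util.stream` on a finite list purely to show the shape of the computation; it is not Kafka Streams, which applies the same kind of steps to unbounded topics. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Plain-Java analogue of: grep "ksql" | tr a-z A-Z
class PipelineSketch {
    static List<String> process(List<String> lines) {
        return lines.stream()
                // grep "ksql": keep only lines containing the substring
                .filter(line -> line.contains("ksql"))
                // tr a-z A-Z: upper-case each line (equivalent for ASCII input)
                .map(line -> line.toUpperCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(process(List.of("ksql rocks", "kafka", "use ksql")));
        // [KSQL ROCKS, USE KSQL]
    }
}
```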

Slide 22

Streaming is the toolset for dealing with events as they move!

Slide 23

[Diagram: an App containing the Streams API]
Not running inside brokers!

Slide 24

Same app, many instances
[Diagram: three App instances, each containing the Streams API]
Brokers? Nope!

Slide 25

Before
[Diagram: Your Job, Processing Cluster, Shared Database, Dashboard]

Slide 26

As developers, we want to build APPS not INFRASTRUCTURE

Slide 27

After
[Diagram: Dashboard, APP, Streams API]

Slide 28

Things Kafka Streams Does
- Enterprise Support
- Open Source
- Powerful Processing incl. Filters, Transforms, Joins, Aggregations, Windowing
- Runs Everywhere
- Supports Streams and Tables
- Elastic, Scalable, Fault-tolerant
- Exactly-Once Processing
- Kafka Security Integration
- Event-Time Processing

Slide 29

Slide 30

Table-Stream Duality

Slide 31

Do you think that’s a table you are querying?

Slide 32

TABLE | STREAM | TABLE
Gary 1 | ("Gary", 1) | Gary 1
Gary 1, Viktor 1 | ("Viktor", 1) | Gary 1, Viktor 1
Gary 2, Viktor 1 | ("Gary", 2) | Gary 2, Viktor 1
Gary 2, Viktor 1, Soby 1 | ("Soby", 1) | Gary 2, Viktor 1, Soby 1
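
The duality on this slide can be sketched in two small functions: counting names turns a table into a stream of updates, and replaying those updates (last value per key wins) rebuilds the table. This is a plain-Java illustration with hypothetical names (`DualitySketch`, `countToChangelog`, `replay`), not the Kafka Streams `KTable` API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory illustration of table-stream duality.
class DualitySketch {
    // Table -> stream: keep a count per name and emit one update per input record.
    static List<Map.Entry<String, Integer>> countToChangelog(List<String> names) {
        Map<String, Integer> table = new LinkedHashMap<>();
        List<Map.Entry<String, Integer>> changelog = new ArrayList<>();
        for (String name : names) {
            int count = table.merge(name, 1, Integer::sum);
            changelog.add(Map.entry(name, count));
        }
        return changelog;
    }

    // Stream -> table: replay the updates; the latest value for each key wins.
    static Map<String, Integer> replay(List<Map.Entry<String, Integer>> changelog) {
        Map<String, Integer> table = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> update : changelog) {
            table.put(update.getKey(), update.getValue());
        }
        return table;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> changelog =
                countToChangelog(List.of("Gary", "Viktor", "Gary", "Soby"));
        System.out.println(changelog);         // [Gary=1, Viktor=1, Gary=2, Soby=1]
        System.out.println(replay(changelog)); // {Gary=2, Viktor=1, Soby=1}
    }
}
```

The input sequence (Gary, Viktor, Gary, Soby) is the one shown on the slide, so the printed changelog matches its STREAM column.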

Slide 33

Join Streams and Tables
[Diagram: inside Kafka Streams, a Stream (from a Kafka Topic) is joined with a Table (from a Compacted Topic)]
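
A minimal sketch of what a stream-table join does: the table side holds the latest value per key (as a compacted topic does), and each stream record is enriched with whatever the table holds for its key at that moment. The class `JoinSketch`, its methods, and the sample keys and values are all hypothetical; this is an in-memory illustration of inner-join behavior, not the Kafka Streams `KStream.join` call.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory illustration of a stream-table inner join.
class JoinSketch {
    private final Map<String, String> table = new HashMap<>();
    private final List<String> output = new ArrayList<>();

    // Table side: a newer value for a key replaces the older one. No output is produced.
    void onTableUpdate(String key, String value) {
        table.put(key, value);
    }

    // Stream side: look up the current table row for the record's key.
    void onStreamRecord(String key, String value) {
        String row = table.get(key);
        if (row != null) { // inner join: records without a matching row are dropped
            output.add(key + ": " + value + " + " + row);
        }
    }

    List<String> output() {
        return output;
    }

    public static void main(String[] args) {
        JoinSketch join = new JoinSketch();
        join.onStreamRecord("gary", "click");  // no table row yet, dropped
        join.onTableUpdate("gary", "Atlanta");
        join.onStreamRecord("gary", "click");
        join.onTableUpdate("gary", "NYC");
        join.onStreamRecord("gary", "view");
        System.out.println(join.output());
        // [gary: click + Atlanta, gary: view + NYC]
    }
}
```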

Slide 34

Talk is cheap! Show me code!

Slide 35

What’s next?

Slide 36

Shoulders of Streaming Giants
- CREATE STREAM, CREATE TABLE, SELECT, JOIN, GROUP BY, SUM, …
- KSQL UDFs
- KStream, KTable, filter(), map(), flatMap(), join(), aggregate(), transform(), …
- subscribe(), poll(), send(), flush(), beginTransaction(), …

Slide 37

Interaction with Kafka
- Kafka (data)
- KSQL (processing): Does not run on Kafka brokers
- JVM application with Kafka Streams (processing): Does not run on Kafka brokers

Slide 38

Fault-Tolerance, powered by Kafka (here: KSQL)

Slide 39

Differences | KSQL | Kafka Streams
You write... | KSQL statements | JVM applications
UI included for human interaction | Yes, in Confluent Enterprise | No
CLI included for human interaction | Yes | No
Data formats | Avro, JSON, CSV (today) | Any data format, including Avro, JSON, CSV, Protobuf, XML
REST API included | Yes | No, but you can DIY
Runtime included | Yes, the KSQL server | Not needed, applications run as standard JVM processes
Queryable state | Not yet | Yes

Slide 40

One more thing…

Slide 41

Slide 42

Slide 43

A Major New Paradigm

Slide 44

Thanks!
@gamussa
viktor@confluent.io
We are hiring! https://www.confluent.io/careers/

Slide 45

https://t.me/AwesomeKafka_ru
https://t.me/proKafka