Apache Kafka and KSQL in Action: Let’s Build a Streaming Data Pipeline!

A presentation at Jfokus in February 2019 in Stockholm, Sweden by Viktor Gamov

Slide 1

Slide 1

@gamussa #jfokus @confluentinc Apache Kafka and KSQL in Action : Let’s Build a Streaming Data Pipeline!

Slide 2

Slide 2

@gamussa #Jfokus @confluentinc

Slide 3

Slide 3

@gamussa #Jfokus @confluentinc

Slide 4

Slide 4

Message Bus order events New App <x> @gamussa #Jfokus @confluentinc

Slide 5

Slide 5

Stream Processing order events New App <x> Stream Processing @gamussa #Jfokus @confluentinc

Slide 6

Slide 6

Data Enrichment order events C D C RDBMS <y> customer New App <x> Stream Processing @gamussa #Jfokus @confluentinc

Slide 7

Slide 7

Transform Once, Use Many order events customer orders C D C RDBMS <y> customer New App <x> Stream Processing @gamussa #Jfokus @confluentinc

Slide 8

Slide 8

Evolve processing from old systems to new Existing App C D C RDBMS @gamussa #Jfokus @confluentinc

Slide 9

Slide 9

Evolve processing from old systems to new New App <x> Existing App C D C RDBMS Stream Processing @gamussa #Jfokus @confluentinc

Slide 10

Slide 10

Evolve processing from old systems to new New App <x> Existing App New App <y> C D C RDBMS Stream Processing @gamussa #Jfokus @confluentinc

Slide 11

Slide 11

Evolve processing from old systems to new New App <x> Existing App New App <y> C D C RDBMS Stream Processing @gamussa #Jfokus @confluentinc

Slide 12

Slide 12

Push notification to Slack Rating events Operational Dashboard User data Join events to users, and filter Data Lake @gamussa #Jfokus @confluentinc

Slide 13

Slide 13

Rating events App Pro d uc e rA PI User data RDBMS Join events to users, and filter I P A r e m u s n o C Push notification to Slack App Operational Elasticsearch Dashboard Data Lake S3/HDFS/ SnowflakeDB etc @gamussa #Jfokus @confluentinc

Slide 14

Slide 14

Rating events App uc e rA PI a k f a K t c e n n o C App Kafka Connect a fk t Ka ec n RDBMS u s n o C Push notification to Slack Operational Elasticsearch Dashboard n Co User data Pro d I P A r e m Join events to users, and filter Data Lake S3/HDFS/ SnowflakeDB etc @gamussa #Jfokus @confluentinc

Slide 15

Slide 15

Streaming Integration with Kafka Connect syslog flat file CSV JSON Sources MQTT Tasks Workers Kafka Connect Kafka Brokers @gamussa #Jfokus @confluentinc

Slide 16

Slide 16

Streaming Integration with Kafka Connect Amazon S3 Sinks MQT Tasks Kafka Connect Workers Kafka Brokers @gamussa #Jfokus @confluentinc

Slide 17

Slide 17

Streaming Integration with Kafka Connect Amazon S3 syslog flat file CSV JSON Sources Sinks MQT MQTT Tasks Workers Kafka Connect Kafka Brokers @gamussa #Jfokus @confluentinc

Slide 18

Slide 18

Confluent Hub •One-stop place to discover and download : •Connectors •Transformations •Converters hub.confluent.io @gamussa #Jfokus @confluentinc

Slide 19

Slide 19

Kafka Connect + Schema Registry = WIN Avro Schema Schema Registry Elasticsearch RDBMS Kafka Connect Avro Message Kafka Connect @gamussa #Jfokus @confluentinc

Slide 20

Slide 20

Kafka Connect +Schema Schema Registry = WIN Avro Schema Registry Elasticsearch RDBMS Kafka Connect Avro Message Kafka Connect @gamussa #Jfokus @confluentinc

Slide 21

Slide 21

Kafka Connect +Schema Schema Registry = WIN Avro Schema Registry Elasticsearch RDBMS Kafka Connect Avro Message Kafka Connect @gamussa #Jfokus @confluentinc

Slide 22

Slide 22

Kafka → Elasticsearch @gamussa #Jfokus @confluentinc

Slide 23

Slide 23

Stop! Demo time! Producer API MySQL t c e n n o C a k f Ka m u i z e b e D @gamussa #Jfokus @confluentinc

Slide 24

Slide 24

Rating events App uc e rA PI a k f a K t c e n n o C App Kafka Connect a fk t Ka ec n RDBMS u s n o C Push notification to Slack Operational Elasticsearch Dashboard n Co User data Pro d I P A r e m KSQL Join events to users, and filter Data Lake S3/HDFS/ SnowflakeDB etc @gamussa #Jfokus @confluentinc

Slide 25

Slide 25

Coding Sophistication Lower the bar to enter the world of streaming Core developers who use Java/Scala streams Core developers who don’t use Java/Scala Data engineers, architects, DevOps/SRE BI analysts User Population @gamussa #Jfokus @confluentinc

Slide 26

Slide 26

Coding Sophistication Lower the bar to enter the world of streaming Core developers who use Java/Scala streams Core developers who don’t use Java/Scala Data engineers, architects, DevOps/SRE BI analysts User Population @gamussa #Jfokus @confluentinc

Slide 27

Slide 27

KSQL #FTW @gamussa #Jfokus @confluentinc

Slide 28

Slide 28

KSQL #FTW 1 UI @gamussa #Jfokus @confluentinc

Slide 29

Slide 29

KSQL #FTW ksql> 1 UI 2 CLI @gamussa #Jfokus @confluentinc

Slide 30

Slide 30

KSQL #FTW ksql> 1 UI 2 POST /query CLI 3 REST @gamussa #Jfokus @confluentinc

Slide 31

Slide 31

KSQL #FTW ksql> 1 UI 2 POST /query CLI 3 REST @gamussa 4 #Jfokus Headless @confluentinc

Slide 32

Slide 32

Interactive KSQL for development and testing REST “Hmm, let me try out this idea…” @gamussa #Jfokus @confluentinc

Slide 33

Slide 33

Interactive KSQL for development and testing Headless KSQL for Production REST Desired KSQL queries have been identified “Hmm, let me try out this idea…” @gamussa #Jfokus @confluentinc

Slide 34

Slide 34

Interaction with Kafka Kafka (data) @gamussa #Jfokus @confluentinc

Slide 35

Slide 35

Interaction with Kafka KSQL (processing) Kafka JVM application with Kafka Streams (processing) (data) Does not run on Kafka brokers Does not run on Kafka brokers @gamussa #Jfokus @confluentinc

Slide 36

Slide 36

Interaction with Kafka KSQL (processing) Kafka JVM application with Kafka Streams (processing) (data) Does not run on Kafka brokers Does not run on Kafka brokers @gamussa #Jfokus @confluentinc

Slide 37

Slide 37

Fault-Tolerance, powered by Kafka @gamussa #Jfokus @confluentinc

Slide 38

Slide 38

Fault-Tolerance, powered by Kafka @gamussa #Jfokus @confluentinc

Slide 39

Slide 39

Fault-Tolerance, powered by Kafka @gamussa #Jfokus @confluentinc

Slide 40

Slide 40

Standing on the shoulders of Streaming Giants @gamussa #Jfokus @confluentinc

Slide 41

Slide 41

Standing on the shoulders of Streaming Giants @gamussa #Jfokus @confluentinc

Slide 42

Slide 42

Standing on the shoulders of Streaming Giants Producer, Consumer APIs @gamussa #Jfokus @confluentinc

Slide 43

Slide 43

Standing on the shoulders of Streaming Giants Kafka Streams Producer, Consumer APIs @gamussa #Jfokus @confluentinc

Slide 44

Slide 44

Standing on the shoulders of Streaming Giants Kafka Streams Powered by Producer, Consumer APIs @gamussa #Jfokus @confluentinc

Slide 45

Slide 45

Standing on the shoulders of Streaming Giants KSQL KSQL UDFs Kafka Streams Powered by Producer, Consumer APIs @gamussa #Jfokus @confluentinc

Slide 46

Slide 46

Standing on the shoulders of Streaming Giants KSQL Powered by KSQL UDFs Kafka Streams Powered by Producer, Consumer APIs @gamussa #Jfokus @confluentinc

Slide 47

Slide 47

Standing on the shoulders of Streaming Giants KSQL Ease of use Powered by KSQL UDFs Kafka Streams Powered by Producer, Consumer APIs Flexibility @gamussa #Jfokus @confluentinc

Slide 48

Slide 48

Demo time! Producer API MySQL Kafka Connect t c e n n o C a k f Ka m u i z e b e D Elasticsearch @gamussa #Jfokus @confluentinc

Slide 49

Slide 49

{ “rating_id”: 5313, “user_id”: 3, “stars”: 4, “route_id”: 6975, “rating_time”: 1519304105213, “channel”: “web”, “message”: “worst. flight. ever. #neveragain” Filter all ratings where STARS<3 POOR_RATINGS } Producer API @gamussa #Jfokus @confluentinc

Slide 50

Slide 50

Do you think that’s a table you are querying ?

Slide 51

Slide 51

The Stream-Table Duality Table (balance) Stream (payments) time @gamussa #Jfokus @confluentinc

Slide 52

Slide 52

The Stream-Table Duality Table (balance) Stream (payments) Alice

  • €50 time @gamussa #Jfokus @confluentinc

Slide 53

Slide 53

The Stream-Table Duality Table Alice €50 Alice

  • €50 (balance) Stream (payments) time @gamussa #Jfokus @confluentinc

Slide 54

Slide 54

The Stream-Table Duality Table Alice €50 (balance) Stream (payments) Alice

  • €50 Alice €50 Bob €18 Bob
  • €18 time @gamussa #Jfokus @confluentinc

Slide 55

Slide 55

The Stream-Table Duality Table Alice €50 (balance) Stream (payments) Alice

  • €50 Alice €50 Alice €75 Bob €18 €18 Bob €18 Bob
  • €18 Alice
  • €25 time @gamussa #Jfokus @confluentinc

Slide 56

Slide 56

The Stream-Table Duality Table Alice €50 (balance) Stream (payments) Alice

  • €50 Alice €50 Alice €75 €75 Alice €15 Bob €18 €18 Bob €18 Bob €18 Bob
  • €18 Alice
  • €25 Alice – €60 time @gamussa #Jfokus @confluentinc

Slide 57

Slide 57

@gamussa #Jfokus @confluentinc

Slide 58

Slide 58

The truth is the log. The database is a cache of a subset of the log. @gamussa #Jfokus @confluentinc

Slide 59

Slide 59

The truth is the log. The database is a cache of a subset of the log. —Pat Helland Immutability Changes Everything http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf @gamussa #Jfokus @confluentinc

Slide 60

Slide 60

{ “rating_id”: 5313, “user_id”: 3, “stars”: 4, “route_id”: 6975, “rating_time”: 1519304105213, “channel”: “web”, “message”: “worst. flight. ever. #neveragain” } Producer API Join each rating to customer data RATINGS_WITH_CUSTOMER_DATA t c e n n o C a k f a K { } “id”: 3, “first_name”: “Merilyn”, “last_name”: “Doughartie”, “email”: “mdoughartie1@dedecms.com”, “gender”: “Female”, “club_status”: “platinum”, “comments”: “none” @gamussa #Jfokus @confluentinc

Slide 61

Slide 61

{ “rating_id”: 5313, “user_id”: 3, “stars”: 4, “route_id”: 6975, “rating_time”: 1519304105213, “channel”: “web”, “message”: “worst. flight. ever. #neveragain” } Producer API t c e n n o C a k f a K { } “id”: 3, “first_name”: “Merilyn”, “last_name”: “Doughartie”, “email”: “mdoughartie1@dedecms.com”, “gender”: “Female”, “club_status”: “platinum”, “comments”: “none” Join each rating to customer data RATINGS_WITH_CUSTOMER_DATA Filter for just PLATINUM customers UNHAPPY_PLATINUM_CUSTOMERS @gamussa #Jfokus @confluentinc

Slide 62

Slide 62

{ “rating_id”: 5313, “user_id”: 3, “stars”: 4, “route_id”: 6975, “rating_time”: 1519304105213, “channel”: “web”, “message”: “worst. flight. ever. #neveragain” CREATE TABLE RATINGS_BY_CLUB_STATUS AS SELECT CLUB_STATUS, COUNT(*) Join each rating to customer data FROM ProducerRATINGS_WITH_CUSTOMER_DATA API RATINGS_WITH_CUSTOMER_DATA WINDOW TUMBLING (SIZE 1 MINUTES) GROUP BY CLUB_STATUS; } t c e n n o C a k f a K { } “id”: 3, “first_name”: “Merilyn”, “last_name”: “Doughartie”, “email”: “mdoughartie1@dedecms.com”, “gender”: “Female”, “club_status”: “platinum”, “comments”: “none” Aggregate per-minute by CLUB_STATUS RATINGS_BY_CLUB_STATUS_1MIN @gamussa #Jfokus @confluentinc

Slide 63

Slide 63

Resources and Next Steps https://github.com/confluentinc/demo-scene http://confluent.io https://slackpass.io/confluentcommunity #ksql #connect @gamussa #Jfokus @confluentinc

Slide 64

Slide 64

Free Books! https://www.confluent.io/apache-kafka-stream-processing-book-bundle @gamussa #Jfokus @confluentinc

Slide 65

Slide 65

@gamussa Thanks! #Jfokus @confluentinc @gamussa viktor@confluent.io @

Slide 66

Slide 66

One last thing…

Slide 67

Slide 67

https://kafka-summit.org Gamov30 @gamussa #Jfokus @confluentinc