@gamussa
#jfokus
@confluentinc
Apache Kafka and KSQL in Action : Let’s Build a Streaming Data Pipeline!
Slide 2
@gamussa
#Jfokus
@confluentinc
Slide 3
@gamussa
#Jfokus
@confluentinc
Slide 4
Message Bus order events
New App <x> @gamussa
#Jfokus
@confluentinc
Slide 5
Stream Processing order events
New App <x>
Stream Processing
@gamussa
#Jfokus
@confluentinc
Slide 6
Data Enrichment order events
C D C RDBMS <y>
customer
New App <x>
Stream Processing
@gamussa
#Jfokus
@confluentinc
Slide 7
Transform Once, Use Many order events
customer orders
C D C RDBMS <y>
customer
New App <x>
Stream Processing
@gamussa
#Jfokus
@confluentinc
Slide 8
Evolve processing from old systems to new Existing App
C D C RDBMS
@gamussa
#Jfokus
@confluentinc
Slide 9
Evolve processing from old systems to new New App <x>
Existing App
C D C RDBMS Stream Processing
@gamussa
#Jfokus
@confluentinc
Slide 10
Evolve processing from old systems to new New App <x>
Existing App
New App <y>
C D C RDBMS Stream Processing
@gamussa
#Jfokus
@confluentinc
Slide 11
Evolve processing from old systems to new New App <x>
Existing App
New App <y>
C D C RDBMS Stream Processing
@gamussa
#Jfokus
@confluentinc
Slide 12
Push notification to Slack
Rating events
Operational Dashboard
User data Join events to users, and filter
Data Lake
@gamussa
#Jfokus
@confluentinc
Slide 13
Rating events
App
Pro d
uc e rA PI
User data
RDBMS
Join events to users, and filter
I P A r e m
u s n o C
Push notification to Slack
App Operational Elasticsearch Dashboard
Data Lake
S3/HDFS/ SnowflakeDB etc
@gamussa
#Jfokus
@confluentinc
Slide 14
Rating events
App
uc e rA PI
a k f a K t c e n n o C
App
Kafka Connect
a fk t Ka ec n
RDBMS
u s n o C
Push notification to Slack
Operational Elasticsearch Dashboard
n Co
User data
Pro d
I P A r e m
Join events to users, and filter
Data Lake
S3/HDFS/ SnowflakeDB etc
@gamussa
#Jfokus
@confluentinc
Stop! Demo time! Producer API
MySQL
t c e n n o C a k f Ka m u i z e b e D
@gamussa
#Jfokus
@confluentinc
Slide 24
Rating events
App
uc e rA PI
a k f a K t c e n n o C
App
Kafka Connect
a fk t Ka ec n
RDBMS
u s n o C
Push notification to Slack
Operational Elasticsearch Dashboard
n Co
User data
Pro d
I P A r e m
KSQL
Join events to users, and filter
Data Lake
S3/HDFS/ SnowflakeDB etc
@gamussa
#Jfokus
@confluentinc
Slide 25
Coding Sophistication
Lower the bar to enter the world of streaming Core developers who use Java/Scala
streams Core developers who don’t use Java/Scala
Data engineers, architects, DevOps/SRE
BI analysts
User Population @gamussa
#Jfokus
@confluentinc
Slide 26
Coding Sophistication
Lower the bar to enter the world of streaming Core developers who use Java/Scala
streams Core developers who don’t use Java/Scala
Data engineers, architects, DevOps/SRE
BI analysts
User Population @gamussa
#Jfokus
@confluentinc
Interactive KSQL for development and testing
REST
“Hmm, let me try out this idea…” @gamussa
#Jfokus
@confluentinc
Slide 33
Interactive KSQL for development and testing
Headless KSQL for Production
REST Desired KSQL queries have been identified “Hmm, let me try out this idea…” @gamussa
#Jfokus
@confluentinc
Slide 34
Interaction with Kafka Kafka (data)
@gamussa
#Jfokus
@confluentinc
Slide 35
Interaction with Kafka KSQL (processing)
Kafka
JVM application with Kafka Streams (processing)
(data)
Does not run on Kafka brokers
Does not run on Kafka brokers
@gamussa
#Jfokus
@confluentinc
Slide 36
Interaction with Kafka KSQL (processing)
Kafka
JVM application with Kafka Streams (processing)
(data)
Does not run on Kafka brokers
Does not run on Kafka brokers
@gamussa
#Jfokus
@confluentinc
Slide 37
Fault-Tolerance, powered by Kafka
@gamussa
#Jfokus
@confluentinc
Slide 38
Fault-Tolerance, powered by Kafka
@gamussa
#Jfokus
@confluentinc
Slide 39
Fault-Tolerance, powered by Kafka
@gamussa
#Jfokus
@confluentinc
Slide 40
Standing on the shoulders of Streaming Giants
@gamussa
#Jfokus
@confluentinc
Slide 41
Standing on the shoulders of Streaming Giants
@gamussa
#Jfokus
@confluentinc
Slide 42
Standing on the shoulders of Streaming Giants
Producer, Consumer APIs @gamussa
#Jfokus
@confluentinc
Slide 43
Standing on the shoulders of Streaming Giants
Kafka Streams Producer, Consumer APIs @gamussa
#Jfokus
@confluentinc
Slide 44
Standing on the shoulders of Streaming Giants
Kafka Streams Powered by
Producer, Consumer APIs @gamussa
#Jfokus
@confluentinc
Slide 45
Standing on the shoulders of Streaming Giants KSQL
KSQL UDFs
Kafka Streams Powered by
Producer, Consumer APIs @gamussa
#Jfokus
@confluentinc
Slide 46
Standing on the shoulders of Streaming Giants KSQL
Powered by
KSQL UDFs
Kafka Streams Powered by
Producer, Consumer APIs @gamussa
#Jfokus
@confluentinc
Slide 47
Standing on the shoulders of Streaming Giants KSQL
Ease of use
Powered by
KSQL UDFs
Kafka Streams Powered by
Producer, Consumer APIs
Flexibility @gamussa
#Jfokus
@confluentinc
Slide 48
Demo time! Producer API
MySQL
Kafka Connect
t c e n n o C a k f Ka m u i z e b e D
Elasticsearch
@gamussa
#Jfokus
@confluentinc
Slide 49
{
“rating_id”: 5313, “user_id”: 3, “stars”: 4, “route_id”: 6975, “rating_time”: 1519304105213, “channel”: “web”, “message”: “worst. flight. ever. #neveragain”
Filter all ratings where STARS<3
POOR_RATINGS
}
Producer API
@gamussa
#Jfokus
@confluentinc
Slide 50
Do you think that’s a table you are querying ?
Slide 51
The Stream-Table Duality Table (balance)
Stream (payments)
time
@gamussa
#Jfokus
@confluentinc
Slide 52
The Stream-Table Duality Table (balance)
Stream (payments)
Alice
€50 time
@gamussa
#Jfokus
@confluentinc
Slide 53
The Stream-Table Duality Table
Alice
€50
Alice
€50
(balance)
Stream (payments)
time
@gamussa
#Jfokus
@confluentinc
Slide 54
The Stream-Table Duality Table
Alice
€50
(balance)
Stream (payments)
Alice
€50
Alice
€50
Bob
€18
Bob
€18 time
@gamussa
#Jfokus
@confluentinc
Slide 55
The Stream-Table Duality Table
Alice
€50
(balance)
Stream (payments)
Alice
€50
Alice
€50
Alice
€75
Bob
€18 €18
Bob
€18
Bob
€18
Alice
€25 time
@gamussa
#Jfokus
@confluentinc
Slide 56
The Stream-Table Duality Table
Alice
€50
(balance)
Stream (payments)
Alice
€50
Alice
€50
Alice
€75 €75
Alice
€15
Bob
€18 €18
Bob
€18
Bob
€18
Bob
€18
Alice
€25
Alice
– €60 time
@gamussa
#Jfokus
@confluentinc
Slide 57
@gamussa
#Jfokus
@confluentinc
Slide 58
The truth is the log. The database is a cache of a subset of the log.
@gamussa
#Jfokus
@confluentinc
Slide 59
The truth is the log. The database is a cache of a subset of the log. —Pat Helland Immutability Changes Everything http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf
@gamussa
#Jfokus
@confluentinc
Slide 60
{
“rating_id”: 5313, “user_id”: 3, “stars”: 4, “route_id”: 6975, “rating_time”: 1519304105213, “channel”: “web”, “message”: “worst. flight. ever. #neveragain”
}
Producer API
Join each rating to customer data
RATINGS_WITH_CUSTOMER_DATA
t c e n n o C a k f a K {
}
“id”: 3, “first_name”: “Merilyn”, “last_name”: “Doughartie”, “email”: “mdoughartie1@dedecms.com”, “gender”: “Female”, “club_status”: “platinum”, “comments”: “none”
@gamussa
#Jfokus
@confluentinc
Slide 61
{
“rating_id”: 5313, “user_id”: 3, “stars”: 4, “route_id”: 6975, “rating_time”: 1519304105213, “channel”: “web”, “message”: “worst. flight. ever. #neveragain”
}
Producer API
t c e n n o C a k f a K {
}
“id”: 3, “first_name”: “Merilyn”, “last_name”: “Doughartie”, “email”: “mdoughartie1@dedecms.com”, “gender”: “Female”, “club_status”: “platinum”, “comments”: “none”
Join each rating to customer data
RATINGS_WITH_CUSTOMER_DATA
Filter for just PLATINUM customers
UNHAPPY_PLATINUM_CUSTOMERS
@gamussa
#Jfokus
@confluentinc
Slide 62
{
“rating_id”: 5313, “user_id”: 3, “stars”: 4, “route_id”: 6975, “rating_time”: 1519304105213, “channel”: “web”, “message”: “worst. flight. ever. #neveragain”
CREATE TABLE RATINGS_BY_CLUB_STATUS AS SELECT CLUB_STATUS, COUNT(*) Join each rating to customer data FROM ProducerRATINGS_WITH_CUSTOMER_DATA API RATINGS_WITH_CUSTOMER_DATA WINDOW TUMBLING (SIZE 1 MINUTES) GROUP BY CLUB_STATUS;
}
t c e n n o C a k f a K {
}
“id”: 3, “first_name”: “Merilyn”, “last_name”: “Doughartie”, “email”: “mdoughartie1@dedecms.com”, “gender”: “Female”, “club_status”: “platinum”, “comments”: “none”
Aggregate per-minute by CLUB_STATUS
RATINGS_BY_CLUB_STATUS_1MIN @gamussa #Jfokus @confluentinc
Slide 63
Resources and Next Steps https://github.com/confluentinc/demo-scene http://confluent.io https://slackpass.io/confluentcommunity #ksql #connect @gamussa
#Jfokus
@confluentinc