@gamussa
#Postgres
@confluentinc
Streaming ETL in Practice: Build Data Pipelines without a single line of code!
Slide 2
@gamussa @gamussa
#Postgres
@confluentinc
Slide 3
Special Thanks!
@rmoff @gamussa
#Postgres
@confluentinc
Slide 4
Raffle, yeah 🚀 must follow @gamussa @confluentinc 📸🖼🐘 Tag @gamussa With #peoplepostgresdata @gamussa
#Postgres
@confluentinc
Slide 5
@gamussa
#Postgres
@confluentinc
Slide 6
Apache Kafka is an event streaming platform @gamussa
#Postgres
@confluentinc
Slide 7
But what is An event streaming platform? @gamussa
#Postgres
@confluentinc
Slide 8
A bit of a mess… App
App
App
App
cache
monitoring
cache
MQ
DWH
security
MQ
search
Hadoop
@gamussa
#Postgres
@confluentinc
Slide 9
Streams are changing all of this App
App
App
App
request-response
changelogs App
KAFKA
App
App App
DWH
messaging OR stream processing
Hadoop streaming data pipelines @gamussa
#Postgres
@confluentinc
Slide 10
Message Bus order events
New App <x> @gamussa
#Postgres
@confluentinc
Slide 11
Stream Processing order events
New App <x>
Stream Processing
@gamussa
#Postgres
@confluentinc
Slide 12
Data Enrichment order events
C D C Postgres
customer
New App <x>
Stream Processing
@gamussa
#Postgres
@confluentinc
Slide 13
Transform Once, Use Many order events
customer orders
C D C Postgres
customer
New App <x>
Stream Processing
@gamussa
#Postgres
@confluentinc
Slide 14
Evolve processing from old systems to new New App <x>
Existing App
C D C Postgres Stream Processing
@gamussa
#Postgres
@confluentinc
Slide 15
Evolve processing from old systems to new New App <x>
Existing App
New App <y>
C D C Postgres Stream Processing
@gamussa
#Postgres
@confluentinc
Slide 16
Evolve processing from old systems to new New App <x>
Existing App
New App <y>
C D C Postgres Stream Processing
@gamussa
#Postgres
@confluentinc
Slide 17
Push notification to Slack
Rating events
Operational Dashboard
User data Join events to users, and filter
Data Lake
@gamussa
#Postgres
@confluentinc
Slide 18
Rating events
App
Pro d
uc e rA PI
User data
Postgres
Join events to users, and filter
I P A r e m
u s n o C
Push notification to Slack
App Operational Elasticsearch Dashboard
Data Lake
S3/HDFS/ SnowflakeDB etc
@gamussa
#Postgres
@confluentinc
Slide 19
Rating events
App
uc e rA PI
a k f a K t c e n n o C
Kafka Connect
a fk t Ka ec n
Postgres
u s n o C
Push notification to Slack
App Operational Elasticsearch Dashboard
n Co
User data
Pro d
I P A r e m
Join events to users, and filter
Data Lake
S3/HDFS/ SnowflakeDB etc
@gamussa
#Postgres
@confluentinc
Confluent Hub •One-stop place to discover and download : •Connectors •Transformations •Converters
hub.confluent.io @gamussa
#Postgres
@confluentinc
Slide 25
Stop! Demo time! Producer API
Postgres
t c e n n o C a k f Ka m u i z e b e D
@gamussa
#Postgres
@confluentinc
Slide 26
Rating events
App
uc e rA PI
a k f a K t c e n n o C
Kafka Connect
a fk t Ka ec n
RDBMS
u s n o C
Push notification to Slack
App Operational Elasticsearch Dashboard
n Co
User data
Pro d
I P A r e m
KSQL
Join events to users, and filter
Data Lake
S3/HDFS/ SnowflakeDB etc
@gamussa
#Postgres
@confluentinc
Slide 27
Coding Sophistication
Lower the bar to enter the world of streaming Core developers who use Java/Scala
streams Core developers who don’t use Java/Scala
Data engineers, architects, DevOps/SRE
BI analysts
User Population @gamussa
#Postgres
@confluentinc
Interactive KSQL for development and testing
Headless KSQL for Production
REST Desired KSQL queries have been identified “Hmm, let me try out this idea…” @gamussa
#Postgres
@confluentinc
Slide 30
Interaction with Kafka KSQL (processing)
Kafka
JVM application with Kafka Streams (processing)
(data)
Does not run on Kafka brokers
Does not run on Kafka brokers
@gamussa
#Postgres
@confluentinc
Slide 31
Fault-Tolerance, powered by Kafka
@gamussa
#Postgres
@confluentinc
Slide 32
Standing on the shoulders of Streaming Giants KSQL
Ease of use
Powered by
KSQL UDFs
Kafka Streams Powered by
Producer, Consumer APIs
Flexibility @gamussa
#Postgres
@confluentinc
Slide 33
{ “rating_id”: 5313, “user_id”: 3, “stars”: 4, “route_id”: 6975, “rating_time”: 1519304105213, “channel”: “web”, “message”: “worst. flight. ever. #neveragain”
Demo Time!
}
Producer API Postgres
t c e n n o C a k f Ka
{
Kafka Connect Kafk a
Elasticsearch
Con nec t
“id”: 3, “first_name”: “Merilyn”, “last_name”: “Doughartie”, “email”: “mdoughartie1@dedecms.com”, “gender”: “Female”, “club_status”: “platinum”, “comments”: “none”
Postgres
}
@gamussa
#Postgres
@confluentinc
Slide 34
{
“rating_id”: 5313, “user_id”: 3, “stars”: 4, “route_id”: 6975, “rating_time”: 1519304105213, “channel”: “web”, “message”: “worst. flight. ever. #neveragain”
Filter all ratings where STARS<3
POOR_RATINGS
}
Producer API
@gamussa
#Postgres
@confluentinc
Slide 35
Do you think that’s a table you are querying ?
Slide 36
The Stream-Table Duality Table
Alice
€50
(balance)
Stream (payments)
Alice
€50
Alice
€50
Alice
€75 €75
Alice
€15
Bob
€18 €18
Bob
€18
Bob
€18
Bob
€18
Alice
€25
Alice
– €60 time
@gamussa
#Postgres
@confluentinc
Slide 37
The truth is the log. The database is a cache of a subset of the log. —Pat Helland Immutability Changes Everything http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf
@gamussa
#Postgres
@confluentinc
Slide 38
{
“rating_id”: 5313, “user_id”: 3, “stars”: 4, “route_id”: 6975, “rating_time”: 1519304105213, “channel”: “web”, “message”: “worst. flight. ever. #neveragain”
}
Producer API
Join each rating to customer data
RATINGS_WITH_CUSTOMER_DATA
t c e n n o C a k f a K {
}
“id”: 3, “first_name”: “Merilyn”, “last_name”: “Doughartie”, “email”: “mdoughartie1@dedecms.com”, “gender”: “Female”, “club_status”: “platinum”, “comments”: “none”
@gamussa
#Postgres
@confluentinc
Slide 39
{
“rating_id”: 5313, “user_id”: 3, “stars”: 4, “route_id”: 6975, “rating_time”: 1519304105213, “channel”: “web”, “message”: “worst. flight. ever. #neveragain”
}
Producer API
t c e n n o C a k f a K {
}
Join each rating to customer data
RATINGS_WITH_CUSTOMER_DATA
Filter for just PLATINUM customers
UNHAPPY_PLATINUM_CUSTOMERS
“id”: 3, “first_name”: “Merilyn”, “last_name”: “Doughartie”, “email”: “mdoughartie1@dedecms.com”, “gender”: “Female”, “club_status”: “platinum”, “comments”: “none”
@gamussa
#Postgres
@confluentinc
Slide 40
{
“rating_id”: 5313, “user_id”: 3, “stars”: 4, “route_id”: 6975, “rating_time”: 1519304105213, “channel”: “web”, “message”: “worst. flight. ever. #neveragain”
CREATE TABLE RATINGS_BY_CLUB_STATUS AS SELECT CLUB_STATUS, COUNT(*) Join each rating to customer data FROM ProducerRATINGS_WITH_CUSTOMER_DATA API RATINGS_WITH_CUSTOMER_DATA WINDOW TUMBLING (SIZE 1 MINUTES) GROUP BY CLUB_STATUS;
}
t c e n n o C a k f a K {
}
“id”: 3, “first_name”: “Merilyn”, “last_name”: “Doughartie”, “email”: “mdoughartie1@dedecms.com”, “gender”: “Female”, “club_status”: “platinum”, “comments”: “none”
Aggregate per-minute by CLUB_STATUS
RATINGS_BY_CLUB_STATUS_1MIN @gamussa
#Postgres
@confluentinc
Slide 41
Resources and Next Steps https://github.com/confluentinc/examples http://confluent.io https://slackpass.io/confluentcommunity #ksql #connect @gamussa
#Postgres
@confluentinc