Origins in Stream Processing Kafka Streams / KSQL
Kafka
Serving Layer (Cassandra, KV-storage etc.)
High Throughput Messaging @gamussa
@
@NYJavaSig
Continuous Computation
API based clustering
@confluentinc
Slide 11
Kafka is a Streaming Platform Producer
Connectors
Consumer
The Log
Connectors
Streaming Engine @gamussa
@
@NYJavaSig
@confluentinc
Slide 12
Streaming
@gamussa
@
@NYJavaSig
@confluentinc
Slide 13
What exactly is Stream Processing?
authorization_attempts
@gamussa
possible_fraud
@
@NYJavaSig
@confluentinc
Slide 14
What exactly is Stream Processing? possible_fraud
authorization_attempts
CREATE STREAM possible_fraud AS SELECT card_number, count() FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count() > 3; @gamussa
@
@NYJavaSig
@confluentinc
Slide 15
What exactly is Stream Processing? possible_fraud
authorization_attempts
CREATE STREAM possible_fraud AS SELECT card_number, count() FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count() > 3; @gamussa
@
@NYJavaSig
@confluentinc
Slide 16
What exactly is Stream Processing? possible_fraud
authorization_attempts
CREATE STREAM possible_fraud AS SELECT card_number, count() FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count() > 3; @gamussa
@
@NYJavaSig
@confluentinc
Slide 17
What exactly is Stream Processing? possible_fraud
authorization_attempts
CREATE STREAM possible_fraud AS SELECT card_number, count() FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count() > 3; @gamussa
@
@NYJavaSig
@confluentinc
Slide 18
What exactly is Stream Processing? possible_fraud
authorization_attempts
CREATE STREAM possible_fraud AS SELECT card_number, count() FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count() > 3; @gamussa
@
@NYJavaSig
@confluentinc
Slide 19
What exactly is Stream Processing? possible_fraud
authorization_attempts
CREATE STREAM possible_fraud AS SELECT card_number, count() FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count() > 3; @gamussa
@
@NYJavaSig
@confluentinc
Slide 20
What is a Streaming Platform? Producer Consumer
Connectors
Connectors
The Log
Streaming Engine @gamussa
@
@NYJavaSig
@confluentinc
The log - durable messaging system Similar to a traditional messaging system (ActiveMQ, Rabbit) but with: (a) Far better scalability (b) Built in fault tolerance / HA (c) Storage @gamussa
@
@NYJavaSig
@confluentinc
Slide 23
The log is a simple idea
New
Old
Messages are added at the end of the log @gamussa
@
@NYJavaSig
@confluentinc
Slide 24
Consumers have a position all of their own
George is here
Scan
New
Old Fred is here @gamussa
Sally is here
Scan
@
@NYJavaSig
@confluentinc
Scan
Slide 25
Only Sequential Access
Old
Read to offset & scan
@gamussa
@
@NYJavaSig
@confluentinc
New
Slide 26
Shard data to get scalability Messages are sent to different partitions
Producer (1) Producer (2) Producer (3)
Cluster of machines
Partitions live on different machines @gamussa
@
@NYJavaSig
@confluentinc
Slide 27
Replicate to get fault tolerance
leader
msg
Machine A
@gamussa
Machine B replicate
@
@NYJavaSig
msg @confluentinc
Slide 28
Replication provides resiliency
A ‘replica’ takes over on machine failure @gamussa
@
@NYJavaSig
@confluentinc
Slide 29
Linearly Scalable Architecture Producers
Single topic: - Many producers machines - Many consumer machines - Many Broker machines No Bottleneck!!
Consumers @gamussa
@
@NYJavaSig
@confluentinc
Slide 30
Worldwide, localized views
London Replicator
Replicator
Tokyo
NY Replicator
@gamussa
@
@NYJavaSig
@confluentinc
!30
Slide 31
The Connect API Producer
Connectors
Consumer Connectors
The Log
Streaming Engine @gamussa
@
@NYJavaSig
@confluentinc
Slide 32
Ingest / Egest into any data source
Kafka Connect
@gamussa
Kafka Connect
@
@NYJavaSig
@confluentinc
Slide 33
Ingest/Egest data from/to data sources Amazon S3 Elasticsearch HDFS JDBC Couchbase Cassandra Oracle SAP Vertica Blockchain
JMX Kenesis MongoDB MQTT NATS Postgres Rabbit Redis Twitter Bintray
DynamoDB FTP Github BigQuery Google Pub Sub RethinkDB Salesforce Solr Splunk
@gamussa
@
@NYJavaSig
@confluentinc
Slide 34
Kafka Streams and KSQL Producer
Connectors
Consumer
Connectors
The Log
Streaming Engine @gamussa
@
@NYJavaSig
@confluentinc
Slide 35
@gamussa
@
@NYJavaSig
@confluentinc
Slide 36
@gamussa
@
@NYJavaSig
@confluentinc
Slide 37
Before
@gamussa
@
@NYJavaSig
@confluentinc
Slide 38
After
@gamussa
@
@NYJavaSig
@confluentinc
Slide 39
Things Kafka Streams Does Runs everywhere
Clustering done for you
Integrated database
Exactly-once processing
Joins, windowing, aggregation @gamussa
@
@NYJavaSig
Event-time processing
S/M/L/XL/XXL/XXXL sizes @confluentinc
Slide 40
Slide 41
Streams to Tables
@gamussa
@
@NYJavaSig
@confluentinc