Slide 8
New World: Streaming first
• DB/DWH + many more distributed data systems
• Monolith -> Microservices
• Batch -> Real-time
@gamussa | #NDCSydney | @ConfluentINc
Slide 14
The log is a simple idea
[Diagram: a log running from Old to New]
Messages are added at the end of the log
Slide 16
Pub / Sub
Slide 17
Time
Slide 18
Time
C1
C2
C3
Slide 19
Time
A, B, C, D
hash(key) % numPartitions = N
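The formula on this slide can be sketched in a few lines of Python. This is a minimal illustration, not Kafka's actual partitioner: `zlib.crc32` stands in for the hash (Kafka's Java client hashes the serialized key with murmur2), and `choose_partition` is a hypothetical helper name.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition: hash(key) % numPartitions."""
    # zlib.crc32 is a stand-in hash (an assumption for this sketch);
    # it returns a non-negative int, so the result is always in range.
    return zlib.crc32(key) % num_partitions


# Messages with the same key always land in the same partition.
keys = [b"A", b"B", b"C", b"D", b"A"]
partitions = [choose_partition(k, 3) for k in keys]
```

Because the partition depends only on the key and the partition count, all messages for one key stay in order within a single partition.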
Slide 20
Time
Messages without a key will be produced in a round-robin fashion
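Round-robin distribution can be sketched as follows. This only illustrates the idea on the slide; `assign_round_robin` is a hypothetical helper, and real producer clients may batch records, so the actual distribution can differ.

```python
from itertools import cycle


def assign_round_robin(messages, num_partitions):
    """Pair each message with the next partition number, wrapping around."""
    partition_ids = cycle(range(num_partitions))
    return [(next(partition_ids), message) for message in messages]


assigned = assign_round_robin(["a", "b", "c", "d", "e"], 3)
```

With five messages and three partitions, the fourth message wraps back to partition 0.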
Slide 21
Consumers have a position all of their own
[Diagram: a log running from Old to New]
Ricardo is here (Scan)
Robin is here (Scan)
Viktor is here (Scan)
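A rough sketch of independent consumer positions, assuming the log is a plain Python list. The `Consumer` class and its `poll` method are hypothetical names for illustration, not the Kafka client API.

```python
from dataclasses import dataclass


@dataclass
class Consumer:
    name: str
    offset: int = 0  # this consumer's own position in the log

    def poll(self, log, max_records=1):
        """Return up to max_records from the current position and advance it."""
        records = log[self.offset:self.offset + max_records]
        self.offset += len(records)
        return records


log = ["m0", "m1", "m2", "m3", "m4"]
ricardo, robin, viktor = Consumer("Ricardo"), Consumer("Robin"), Consumer("Viktor")
ricardo.poll(log, 4)
robin.poll(log, 2)
viktor.poll(log, 1)
positions = {c.name: c.offset for c in (ricardo, robin, viktor)}
```

Reading does not remove anything from the log; each consumer only moves its own offset.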
Slide 24
Only Sequential Access
[Diagram: a log running from Old to New]
Read to offset & scan
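"Read to offset & scan" might look like this in miniature. The list-backed log and the `scan_from` generator are assumptions made for the sketch.

```python
from typing import Iterator, List, Tuple


def scan_from(log: List[str], offset: int) -> Iterator[Tuple[int, str]]:
    """Jump to `offset`, then yield (offset, message) pairs in log order."""
    for position in range(offset, len(log)):
        yield position, log[position]


log = ["m0", "m1", "m2", "m3"]
tail = list(scan_from(log, 2))
```

There is no lookup by key or content here: the only operations are jumping to a position and reading forward.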
Slide 25
CONSUMERS
CONSUMER GROUP COORDINATOR
CONSUMER GROUP
Slide 26
C
Slide 27
CC C1
CC C2
Slide 32
Linearly Scalable Architecture
Producers
Single topic:
- Many producer machines
- Many consumer machines
- Many broker machines
No bottleneck!!
Consumers
Slide 33
From/to other systems: Kafka Connect
and more
Tip: Great option to gradually move workloads to Kafka while keeping production running!
Slide 34
Kafka Connect
● Deployed standalone (development) or as a distributed cluster (production)
● Elastic service that works on bare metal, VMs, containers, Kubernetes, …
● The individual ‘Connector’ determines delivery guarantees, e.g., exactly-once
Slide 35
Single Message Transforms for real-time ETL
Ingress: modify an Event before storing
● Obfuscate sensitive information, e.g. PII
● Add origin of event for lineage tracking
● Remove unnecessary data fields
● … and more
Egress: modify an Event on its way out
● Route high-priority events to faster stores
● Direct events to different Elasticsearch indexes
● Cast data types to match destination
● … and more
Example: { user: ab123, gender: female, ip: 1.2.3.95 } -> { user: ab123, ip: 1.2.3.XXX }
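The before/after event on this slide can be reproduced with a small per-message function. This is a plain-Python sketch of the idea only; it is not the Kafka Connect transform API, and `mask_ip` and `transform` are hypothetical names.

```python
def mask_ip(ip: str) -> str:
    """Replace the last octet of a dotted IPv4 string with XXX."""
    head, _, _ = ip.rpartition(".")
    return f"{head}.XXX"


def transform(event: dict) -> dict:
    """Drop the gender field and obfuscate the IP, leaving the input untouched."""
    out = {k: v for k, v in event.items() if k != "gender"}
    if "ip" in out:
        out["ip"] = mask_ip(out["ip"])
    return out


before = {"user": "ab123", "gender": "female", "ip": "1.2.3.95"}
after = transform(before)
```

The function handles one event at a time and keeps no state between events, which is what makes this kind of transform cheap to apply in flight.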
Slide 36
Replicate to get fault tolerance
[Diagram: leader, Machine A, Machine B; each msg is replicated across machines]
Slide 40
The log is a type of durable messaging system
Similar to a traditional messaging system (ActiveMQ, RabbitMQ, etc.) but with:
(a) Far better scalability
(b) Built-in fault tolerance / HA
(c) Storage
Slide 43
Streaming is the toolset for dealing with events as they move!
Slide 44
What exactly is Stream Processing?
authorization_attempts -> possible_fraud
Slide 45
What exactly is Stream Processing?
authorization_attempts -> possible_fraud
CREATE STREAM possible_fraud AS
  SELECT card_number, count(*)
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 MINUTE)
  GROUP BY card_number
  HAVING count(*) > 3;
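To make the query's logic concrete, here is a minimal batch sketch of the same tumbling-window count in Python. It is not how KSQL executes the query (KSQL processes events continuously); the `(timestamp_ms, card_number)` input shape and the `possible_fraud` function name are assumptions for illustration.

```python
from collections import Counter

WINDOW_MS = 5 * 60 * 1000  # SIZE 5 MINUTE


def possible_fraud(attempts, threshold=3):
    """Count attempts per (window, card) and keep counts above the threshold.

    attempts: iterable of (timestamp_ms, card_number) pairs.
    """
    counts = Counter()
    for timestamp_ms, card_number in attempts:
        # A tumbling window is a fixed, non-overlapping bucket of time.
        window_start = timestamp_ms - timestamp_ms % WINDOW_MS
        counts[(window_start, card_number)] += 1
    return {key: n for key, n in counts.items() if n > threshold}


attempts = [
    (0, "1111"), (60_000, "1111"), (120_000, "1111"), (180_000, "1111"),
    (240_000, "2222"), (310_000, "1111"),
]
flagged = possible_fraud(attempts)
```

Card 1111 has four attempts in the first five-minute window, so it is flagged; its fifth attempt falls into the next window and is counted separately.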
Slide 51
Lower the bar to enter the world of streaming
Axes: Coding Sophistication vs. User Population
Core developers who use Java/Scala (streams)
Core developers who don’t use Java/Scala, e.g. .NET, Go
Data engineers, architects, DevOps/SRE
BI analysts
Slide 53
Interaction with Kafka
Kafka (data)
KSQL (processing): does not run on Kafka brokers
Application, e.g. Java/KStreams, .NET (processing): does not run on Kafka brokers
Slide 54
Find your local Meetup Group: https://cnfl.io/kafka-meetups
Grab Stream Processing books: https://cnfl.io/book-bundle
Join us in Slack: http://cnfl.io/slack