@ Apache Kafka A Streaming Data Platform and #javapuzzlersng @gamussa @sfjava @confluentinc
A presentation at San Francisco JUG in May 2018 in San Francisco, CA, USA by Viktor Gamov
@ Apache Kafka A Streaming Data Platform and #javapuzzlersng @gamussa @sfjava @confluentinc
@ @gamussa @sfjava @confluentinc
@ @gamussa @sfjava @confluentinc
@ @gamussa @sfjava @confluentinc Who am I?
@ @gamussa @sfjava @confluentinc Solutions Architect Who am I?
@ @gamussa @sfjava @confluentinc Solutions Architect Developer Advocate Who am I?
@ @gamussa @sfjava @confluentinc Solutions Architect Developer Advocate @gamussa in internetz Who am I?
@
@gamussa @sfjava @confluentinc
Solutions Architect
Developer Advocate
@gamussa
in internetz
Hey you, yes, you,
go follow me in twitter ©
Who am I?
@ @gamussa @sfjava @confluentinc Kafka & Confluent
@ @gamussa @sfjava @confluentinc We are hiring! https://www.confluent.io/careers/
@ @gamussa @sfjava @confluentinc
@ @gamussa @sfjava @confluentinc A company is build on
@ @gamussa @sfjava @confluentinc A company is build on DATA FLOWS but All we have is DATA STORES
@ @gamussa @sfjava @confluentinc
@ @gamussa @sfjava @confluentinc
@ @gamussa @sfjava @confluentinc
@ @gamussa @sfjava @confluentinc Kafka is a Streaming Platform The Log Connectors Connectors Producer Consumer Streaming Engine
@
@gamussa @sfjava @confluentinc
Kafka
Serving
Layer
(Cassandra,
KV-storage etc.)
Kafka
Streams /
KSQL
Continuous
Computation
High Throughput
Messaging
API based
clustering
Origins in Stream Processing
@ @gamussa @sfjava @confluentinc authorization_attempts possible_fraud What exactly is Stream Processing?
@ @gamussa @sfjava @confluentinc CREATE STREAM possible_fraud AS SELECT card_number, count() FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count() > 3; authorization_attempts possible_fraud What exactly is Stream Processing?
@ @gamussa @sfjava @confluentinc CREATE STREAM possible_fraud AS SELECT card_number, count() FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count() > 3; authorization_attempts possible_fraud What exactly is Stream Processing?
@ @gamussa @sfjava @confluentinc CREATE STREAM possible_fraud AS SELECT card_number, count() FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count() > 3; authorization_attempts possible_fraud What exactly is Stream Processing?
@ @gamussa @sfjava @confluentinc CREATE STREAM possible_fraud AS SELECT card_number, count() FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count() > 3; authorization_attempts possible_fraud What exactly is Stream Processing?
@ @gamussa @sfjava @confluentinc CREATE STREAM possible_fraud AS SELECT card_number, count() FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count() > 3; authorization_attempts possible_fraud What exactly is Stream Processing?
@ @gamussa @sfjava @confluentinc CREATE STREAM possible_fraud AS SELECT card_number, count() FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count() > 3; authorization_attempts possible_fraud What exactly is Stream Processing?
@ @gamussa @sfjava @confluentinc Streaming is the toolset for dealing with events as they move!
@ @gamussa @sfjava @confluentinc What is a Streaming Platform? The Log Connectors Connectors Producer Consumer Streaming Engine
@ @gamussa @sfjava @confluentinc Kafka’s Distributed Log The Log Connectors Connectors Producer Consumer Streaming Engine
@ @gamussa @sfjava @confluentinc The log is a type of durable messaging system
@ @gamussa @sfjava @confluentinc Similar to a traditional messaging system (ActiveMQ, Rabbit etc) but with: (a) Far better scalability (b) Built in fault tolerance / HA (c) Storage The log is a type of durable messaging system
@
@gamussa @sfjava @confluentinc
The log is a simple idea
Messages are added
at the end of the log
Old
New
@ @gamussa @sfjava @confluentinc Consumers have a position all of their own Sally is here George is here Fred is here Old New Scan Scan Scan
@ @gamussa @sfjava @confluentinc Only Sequential Access Old New Read to offset & scan
@ @gamussa @sfjava @confluentinc Scaling Out
@ @gamussa @sfjava @confluentinc Shard data to get scalability Messages are sent to different partitions Producer (1) Producer (2) Producer (3) Cluster of machines Partitions live on different machines
@ @gamussa @sfjava @confluentinc Replicate to get fault tolerance replicate msg msg leader Machine A Machine B
@ @gamussa @sfjava @confluentinc Partition Leadership and Replication Broker 1 Topic1 partition1 Broker 2 Broker 3 Broker 4 Topic1 partition1 Topic1 partition1 Leader Follower Topic1 partition2 Topic1 partition2 Topic1 partition2 Topic1 partition3 Topic1 partition4 Topic1 partition3 Topic1 partition3 Topic1 partition4 Topic1 partition4
@ @gamussa @sfjava @confluentinc Replication provides resiliency A ‘replica’ takes over on machine failure
@ @gamussa @sfjava @confluentinc Partition Leadership and Replication - node failure Broker 1 Topic1 partition1 Broker 2 Broker 3 Broker 4 Topic1 partition1 Topic1 partition1 Leader Follower Topic1 partition2 Topic1 partition2 Topic1 partition2 Topic1 partition3 Topic1 partition4 Topic1 partition3 Topic1 partition3 Topic1 partition4 Topic1 partition4
@ @gamussa @sfjava @confluentinc Linearly Scalable Architecture Single topic:
@ @gamussa @sfjava @confluentinc Worldwide, localized views ! 33 NY London Tokyo Replicator Replicator Replicator
@ @gamussa @sfjava @confluentinc The Connect API The Log Connectors Connectors Producer Consumer Streaming Engine
@ @gamussa @sfjava @confluentinc Ingest / Egest into any data source Kafka
Connect Kafka
Connect
@ @gamussa @sfjava @confluentinc Ingest/Egest data from/to data sources Amazon S3 Elasticsearch HDFS JDBC Couchbase Cassandra Oracle SAP Vertica Blockchain JMX
Kenesis MongoDB
MQTT
NATS
Postgres
Rabbit
Redis Twitter
DynamoDB FTP
Github BigQuery Google Pub Sub
RethinkDB Salesforce
Solr Splunk
@ @gamussa @sfjava @confluentinc Kafka Streams and KSQL The Log Connectors Connectors Producer Consumer Streaming Engine
@ @gamussa @sfjava @confluentinc SELECT card_number, count(*)
FROM authorization_attempts
WINDOW (SIZE 5 MINUTE)
GROUP BY card_number
HAVING count(*) > 3;
Engine for Continuous Computation
@ @gamussa @sfjava @confluentinc But it’s just an API public static void main(String[] args) { StreamsBuilder builder = new StreamsBuilder(); builder.stream( "caterpillars" ) .map(StreamsApp !" coolTransformation) .to( "butterflies" );
new KafkaStreams(builder.build(), props()).start(); }
@ @gamussa @sfjava @confluentinc Compacted
Topic Join Stream Table Kafka Kafka Streams / KSQL Topic Join Streams and Tables
@ @gamussa @sfjava @confluentinc KAFKA Payments Orders Buffer 5 mins Emailer Windows / Retention – Handle Late Events In an asynchronous world, will the payment come first, or the order? Join by Key
@ @gamussa @sfjava @confluentinc Windows / Retention – Handle Late Events KAFKA Payments Orders Buffer 5 mins Emailer Join by Key KStream orders = builder.stream( "Orders" ); KStream payments = builder.stream( "Payments" ); orders.join(payments, KeyValue !" new , JoinWindows.of( 1
@ @gamussa @sfjava @confluentinc A KTable is just a stream with infinite retention KAFKA Emailer Orders, Payments Customers Join
@ @gamussa @sfjava @confluentinc A KTable is a stream with infinite retention KAFKA Emailer Orders, Payments Customers Join Materialize a table in two lines of code! KStream orders = builder.stream( "Orders" ); KStream payments = builder.stream( "Payments" ); KTable customers = builder.table( "Customers" ); orders.join(payments, EmailTuple !" new , JoinWindows.of( 1 *MIN)) .join(customers, (tuple, cust) !# tuple.setCust(cust)) .peek((key, tuple) !# emailer.sendMail(tuple));
@ @gamussa @sfjava @confluentinc The Log Connectors Connectors Producer Consumer Streaming Engine Kafka is a complete Streaming Platform
@ @gamussa @sfjava @confluentinc The Log Connectors Connectors Producer Consumer Streaming Engine Kafka is a complete Streaming Platform
@ @gamussa @sfjava @confluentinc https://www.confluent.io/download/
@ @gamussa @sfjava @confluentinc We are hiring! https://www.confluent.io/careers/
@ @gamussa @sfjava @confluentinc One more thing …
@ @gamussa @sfjava @confluentinc
@ @gamussa @sfjava @confluentinc
@ @gamussa @sfjava @confluentinc
@ @gamussa @sfjava @confluentinc
@ @gamussa @sfjava @confluentinc
@ @gamussa @sfjava @confluentinc A Major New Paradigm
@ @gamussa @sfjava @confluentinc Thanks ! Stay for #javapuzzlersng!!! @gamussa viktor@confluent.io We are hiring! https://www.confluent.io/careers/