Streams Must Flow: fault-tolerant stream processing apps on Kubernetes June, 2019 / Montreal, CA @gamussa | #ImCloudNative | @ConfluentINc

Special thanks! @gwenshap @MatthiasJSax

Agenda
• Kafka Streams 101
• How do Kafka Streams applications scale?
• Kubernetes 101
• Recommendations for Kafka Streams

https://gamov.dev/streams-k8s

Kafka Streams – 101
[Diagram: Your App with Kafka Streams; Kafka Connect between Kafka and Other Systems on both sides]

Stock Trade Stats Example

    // Read trades keyed by ticker symbol
    KStream<String, Trade> source = builder.stream(STOCK_TOPIC);

    KStream<Windowed<String>, TradeStats> stats = source
        .groupByKey()
        // 5-second windows advancing every 1 second (values in milliseconds)
        .windowedBy(TimeWindows.of(5000).advanceBy(1000))
        .aggregate(TradeStats::new,
            (k, v, tradestats) -> tradestats.add(v),
            Materialized.<String, TradeStats, WindowStore<Bytes, byte[]>>as("trade-aggregates")
                .withValueSerde(new TradeStatsSerde()))
        .toStream()
        .mapValues(TradeStats::computeAvgPrice);

    // Write the windowed statistics to the output topic
    stats.to(STATS_OUT_TOPIC,
        Produced.keySerde(WindowedSerdes.timeWindowedSerdeFrom(String.class)));
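The aggregation above needs a value type that can fold in one trade at a time and derive an average afterwards. The slides do not show the Trade or TradeStats classes, so the following is only a minimal sketch with hypothetical field names; the classes used in the demo may look different.

```java
// Hypothetical stand-ins for the Trade and TradeStats types named on the slide.
// Field names are assumptions made for illustration only.
class Trade {
    final String ticker;
    final double price;
    final int size;

    Trade(String ticker, double price, int size) {
        this.ticker = ticker;
        this.price = price;
        this.size = size;
    }
}

class TradeStats {
    int countTrades;
    double sumPrice;
    double minPrice = Double.MAX_VALUE;
    double avgPrice;

    // Aggregator step: fold one trade into the running statistics.
    TradeStats add(Trade trade) {
        countTrades++;
        sumPrice += trade.price;
        minPrice = Math.min(minPrice, trade.price);
        return this;
    }

    // Final step, as applied by mapValues(): derive the average from the sums.
    TradeStats computeAvgPrice() {
        avgPrice = countTrades == 0 ? 0.0 : sumPrice / countTrades;
        return this;
    }
}

class TradeStatsDemo {
    public static void main(String[] args) {
        TradeStats stats = new TradeStats()
            .add(new Trade("ABC", 10.0, 100))
            .add(new Trade("ABC", 20.0, 50))
            .computeAvgPrice();
        System.out.println("trades=" + stats.countTrades + " avg=" + stats.avgPrice);
    }
}
```

Returning `this` from `add` is what lets the method serve directly as the aggregator lambda `(k, v, tradestats) -> tradestats.add(v)`.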

Topologies
Processor Topology:
• Source Node: builder.stream
• Processor Node: source.groupByKey().windowedBy(…).aggregate(…), with state stores
• Processor Node: mapValues()
• Sink Node: to(…)
(nodes are connected by streams)

How Do Kafka Streams Applications Scale?

[Diagram: consumers, consumer group, consumer group coordinator]

Partitions, Tasks, and Consumer Groups
• 4 input topic partitions => 4 tasks
• Each task executes the processor topology
• One consumer group: can be executed with 1-4 threads on 1-4 machines
[Diagram: input topic, tasks, result topic]
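To make the "4 partitions => 4 tasks" point concrete, here is a small illustrative sketch that spreads four tasks over a varying number of threads. It uses plain round-robin, which is an assumption made for simplicity; the real Kafka Streams assignor is more involved, and the class and method names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: round-robin placement of tasks onto threads.
// This is NOT the actual Kafka Streams partition assignor.
class TaskDistributionSketch {

    // Spread task ids 0..partitions-1 over the given number of threads.
    static List<List<Integer>> assign(int partitions, int threads) {
        List<List<Integer>> perThread = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            perThread.add(new ArrayList<>());
        }
        for (int task = 0; task < partitions; task++) {
            perThread.get(task % threads).add(task);
        }
        return perThread;
    }

    public static void main(String[] args) {
        // Four input partitions, as on the slide; try 1 to 5 threads.
        for (int threads = 1; threads <= 5; threads++) {
            System.out.println(threads + " thread(s): " + assign(4, threads));
        }
    }
}
```

With five threads the last one receives no task, which is why adding threads or machines beyond the number of input partitions does not add parallelism.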

Scaling with State ("no state")
[Diagram: My Java App scaled out step by step: Instance 1, then Instance 2, then Instance 3]

Scaling and Fault Tolerance: Two Sides of the Same Coin

Fault-Tolerance
[Diagram: My Java App, Instance 1, Instance 2, Instance 3]

Fault-Tolerant State
[Diagram: Input Topic, State Updates, Changelog Topic, Result Topic]

Migrate State
[Diagram: TradeStatsApp state moves from My Java App Instance 1 to My Java App Instance 2; Instance 2 restores it from the Changelog Topic]

Recovery Time
• Changelog topics are log compacted
• Size of changelog topic is linear in the size of the state
• Large state implies high recovery times
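Because the changelog grows linearly with the state, a back-of-the-envelope estimate of restore time is simply the bytes to replay divided by the restore throughput. The sketch below shows that arithmetic; the 50 MiB/s throughput and the state sizes are made-up numbers for illustration, not measurements from the talk.

```java
// Rough estimate only: real restore time also depends on compaction lag,
// network, disk, and broker load. All numbers here are hypothetical.
class RecoveryTimeSketch {

    // Seconds needed to replay stateBytes at a sustained bytesPerSecond rate.
    static double restoreSeconds(long stateBytes, long bytesPerSecond) {
        return (double) stateBytes / bytesPerSecond;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024;
        long gib = 1024L * mib;
        long throughput = 50 * mib; // assumed restore rate

        long[] stateSizes = {1 * gib, 10 * gib, 100 * gib};
        for (long size : stateSizes) {
            System.out.printf("%d GiB of state: about %.0f s to restore%n",
                size / gib, restoreSeconds(size, throughput));
        }
    }
}
```

Under these assumptions, growing the state per instance by 100x grows the restore time by 100x, which is the motivation for keeping per-instance state small.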

Recommendations for Kafka Streams

[Diagram: three My Java App instances (Instance 1, Instance 2, Instance 3), each embedding Kafka Streams]

StatefulSets are new and complicated. We don’t need them.

Recovering state takes time. Stateful is faster.

But I’ll want to scale-out and back anyway.

I don’t really trust my storage admin anyway.

Recommendations
● Keep changelog shards small
● If you trust your storage: Use StatefulSets
● Use anti-affinity when possible
● Use “parallel” pod management
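On the application side, the StatefulSet recommendation only pays off if Kafka Streams keeps its local state on the volume that survives pod restarts. A minimal sketch of that configuration follows, assuming the volume is mounted at a path passed in through a hypothetical STATE_DIR environment variable; the application id, broker address, and mount path are placeholders, while `application.id`, `bootstrap.servers`, and `state.dir` are standard Kafka Streams configuration keys.

```java
import java.util.Properties;

// Sketch: point Kafka Streams' local state at a persistent volume mount.
// Names and paths marked "hypothetical" are assumptions for illustration.
class StreamsStateDirSketch {

    static Properties streamsConfig(String mountPath) {
        Properties props = new Properties();
        props.put("application.id", "trade-stats-app");  // hypothetical id
        props.put("bootstrap.servers", "kafka:9092");    // hypothetical address
        // Local state stores live here; on a StatefulSet this should be
        // the path where the pod's persistent volume is mounted.
        props.put("state.dir", mountPath);
        return props;
    }

    public static void main(String[] args) {
        // STATE_DIR is a hypothetical variable set in the pod spec.
        String mountPath = System.getenv().getOrDefault(
            "STATE_DIR", "/var/lib/kafka-streams");
        Properties props = streamsConfig(mountPath);
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

These properties would then be passed to the `KafkaStreams` constructor together with the topology. With state on a persistent volume, a restarted pod should be able to reuse its local stores and replay only the recent part of the changelog.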

🛑 Stop! Demo time!

Summary
• Kafka Streams has recoverable state, which gives streams apps easy elasticity and high availability
• Kubernetes makes it easy to scale applications
• It also has StatefulSets for applications with state

Summary
Now you know how to deploy Kafka Streams on Kubernetes and take advantage of all its scalability and high-availability capabilities.

Resources and Next Steps
• https://cnfl.io/helm_video
• https://cnfl.io/cp-helm
• https://cnfl.io/k8s
• https://slackpass.io/confluentcommunity #kubernetes

Thanks!
@gamussa
viktor@confluent.io
We are hiring! https://www.confluent.io/careers/
