Don’t despair… “… not even over the fact that you don't despair. Just when everything seems over with, new forces come marching up, and precisely that means that you are alive”
Franz Kafka @gamussa
#BOSDataDay @
@confluentinc
Slide 7
Kafka Streaming Architecture Fundamentals
Slide 8
Slide 9
Slide 10
Shard data to get scalability
Producer (1)
Producer (2)
Producer (3)
Messages are sent to different partitions
Cluster of machines
Partitions live on different machines
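The sharding idea on this slide can be sketched as a producer-side partitioner. This is a minimal illustration, not Kafka's implementation: `choose_partition`, `send_all`, and `NUM_PARTITIONS` are hypothetical names, and CRC32 is used only as a convenient stable hash.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # assumption: a topic with three partitions


def choose_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition with a stable hash.

    CRC32 is a stand-in here; it is not the hash Kafka's own client uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def send_all(messages, num_partitions: int = NUM_PARTITIONS):
    """Append (key, value) pairs to in-memory per-partition logs."""
    partitions = defaultdict(list)
    for key, value in messages:
        partitions[choose_partition(key, num_partitions)].append((key, value))
    return dict(partitions)


if __name__ == "__main__":
    msgs = [(f"user-{i % 5}", f"event-{i}") for i in range(10)]
    for partition, log in sorted(send_all(msgs).items()):
        print(partition, log)
```

Messages with the same key always land in the same partition, so per-key ordering is kept while different keys spread across machines.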
Slide 11
Linearly Scalable Architecture
Producers
Single topic:
- Many producer machines
- Many consumer machines
- Many broker machines
No bottleneck!
Consumers
Slide 12
Replicate to get fault tolerance
Machine A (leader): msg
replicate
Machine B: msg
Slide 13
Replication provides resiliency
A ‘replica’ takes over on machine failure
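A toy model may help make the failover idea concrete. It is a sketch under simplifying assumptions (synchronous copying, the first surviving replica becomes leader) and is not Kafka's real replication protocol; the `Partition` class and its methods are hypothetical.

```python
class Partition:
    """Toy model of one replicated partition (not Kafka's real protocol)."""

    def __init__(self, replicas):
        self.replicas = list(replicas)  # machine names; the first is the leader
        self.leader = self.replicas[0]
        self.logs = {m: [] for m in self.replicas}
        self.alive = set(self.replicas)

    def append(self, msg):
        """Write to the leader and copy the message to every live follower."""
        if self.leader is None:
            raise RuntimeError("no live replica can accept writes")
        for machine in self.replicas:
            if machine in self.alive:
                self.logs[machine].append(msg)

    def fail(self, machine):
        """Mark a machine as failed; promote a surviving replica if it led."""
        self.alive.discard(machine)
        if machine == self.leader:
            survivors = [m for m in self.replicas if m in self.alive]
            self.leader = survivors[0] if survivors else None

    def read(self):
        """Read from the current leader's log."""
        if self.leader is None:
            raise RuntimeError("no live replica to read from")
        return list(self.logs[self.leader])


if __name__ == "__main__":
    p = Partition(["machine-a", "machine-b"])
    p.append("msg")
    p.fail("machine-a")
    print(p.leader, p.read())
```

After `machine-a` fails, reads and writes continue against `machine-b`, which already holds a copy of the message.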
Kafka deployment checklist
ZooKeeper:
- StatefulSet for 3-node ZK
- Headless Service
- PVC for storage
- Optional Pod Anti-Affinity to spread the ZK ensemble across nodes
- ConfigMap for Prometheus JMX exporter
Kafka (uses ZK):
- StatefulSet for n-node Kafka
- Headless Service
- A group of NodePort Services for external traffic
- PVC for storage
- ConfigMap for Prometheus JMX exporter
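Part of the checklist can be sketched as Kubernetes manifests built as plain Python dictionaries. This is an illustrative fragment, not a working deployment: the helper names, image names, and storage sizes are placeholders, anti-affinity is applied to both sets for brevity, and the NodePort Services, ConfigMaps, volume mounts, and container settings are omitted.

```python
import json


def headless_service(name: str, port: int) -> dict:
    """A Service with clusterIP "None", which gives each pod a stable DNS name."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "clusterIP": "None",
            "selector": {"app": name},
            "ports": [{"name": "client", "port": port}],
        },
    }


def stateful_set(name: str, replicas: int, image: str, storage: str) -> dict:
    """A StatefulSet with per-pod storage and anti-affinity across nodes."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,  # refers to the headless Service above
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "affinity": {
                        "podAntiAffinity": {
                            "requiredDuringSchedulingIgnoredDuringExecution": [
                                {
                                    "labelSelector": {"matchLabels": labels},
                                    "topologyKey": "kubernetes.io/hostname",
                                }
                            ]
                        }
                    },
                    "containers": [{"name": name, "image": image}],
                },
            },
            # One PersistentVolumeClaim is created per pod from this template.
            "volumeClaimTemplates": [
                {
                    "metadata": {"name": "data"},
                    "spec": {
                        "accessModes": ["ReadWriteOnce"],
                        "resources": {"requests": {"storage": storage}},
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    # Image names and sizes are placeholders, not recommendations.
    manifests = [
        headless_service("zk", 2181),
        stateful_set("zk", 3, "example/zookeeper:placeholder", "10Gi"),
        headless_service("kafka", 9092),
        stateful_set("kafka", 3, "example/kafka:placeholder", "100Gi"),
    ]
    print(json.dumps(manifests, indent=2))
```

Even this reduced version shows how many moving parts a hand-rolled deployment has to keep consistent.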
Kubernetes Operator
Embedded with operational knowledge of both data software and Kubernetes
- Backup/restore
- Scale up/down
- Rebalance data
- Regular health checks
Slide 24
Controller
Brain behind Kubernetes resources, e.g. replication controller, namespace controller, etc.
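What a controller does can be sketched as a reconcile step that compares desired state with actual state and acts on the difference. This is a simplified in-memory illustration; `reconcile` and its action strings are hypothetical and no Kubernetes API is called.

```python
def reconcile(desired: dict, actual: dict) -> list:
    """Compare desired and actual replica counts; return the actions taken.

    `desired` and `actual` map a resource name to a replica count.
    `actual` is mutated to simulate the cluster converging.
    """
    actions = []
    for name, want in desired.items():
        have = actual.get(name, 0)
        if have < want:
            actions.append(f"create {want - have} pod(s) for {name}")
        elif have > want:
            actions.append(f"delete {have - want} pod(s) for {name}")
        actual[name] = want
    for name in list(actual):
        if name not in desired:
            actions.append(f"delete all {actual[name]} pod(s) for {name}")
            del actual[name]
    return actions


if __name__ == "__main__":
    desired = {"kafka": 3, "zk": 3}
    actual = {"kafka": 1, "old-app": 2}
    print(reconcile(desired, actual))  # first pass makes changes
    print(reconcile(desired, actual))  # second pass finds nothing to do
```

A real controller runs this kind of step continuously, so the cluster keeps converging on the declared state after failures or edits.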
Custom Resource Definition (CRD)
Usually works together with a Custom Controller
API: StatefulSet, ReplicaSet, ..., CRD
Controller: StatefulSet Controller, ReplicaSet Controller, ..., Custom Controller
Instance: StatefulSet, ReplicaSet, ..., Custom Resource
Slide 27
Custom Resource Definition (CRD)
Users can create and access Custom Resources with kubectl, just as they do for built-in resources like pods.
API: StatefulSet, ReplicaSet, ..., CRD
Controller: StatefulSet Controller, ReplicaSet Controller, ..., Custom Controller
Instance: StatefulSet, ReplicaSet, ..., Custom Resource
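As a rough illustration, a CRD and one instance of it can be written out as Python dictionaries. The group `example.com`, the kind `KafkaCluster`, and the single `replicas` field are hypothetical and are not Confluent Operator's actual resource types.

```python
import json

GROUP = "example.com"      # hypothetical API group
KIND = "KafkaCluster"      # hypothetical kind
PLURAL = "kafkaclusters"   # hypothetical plural name


def crd_manifest() -> dict:
    """A CustomResourceDefinition that registers the hypothetical kind."""
    return {
        "apiVersion": "apiextensions.k8s.io/v1",
        "kind": "CustomResourceDefinition",
        "metadata": {"name": f"{PLURAL}.{GROUP}"},  # must be <plural>.<group>
        "spec": {
            "group": GROUP,
            "scope": "Namespaced",
            "names": {"plural": PLURAL, "singular": "kafkacluster", "kind": KIND},
            "versions": [
                {
                    "name": "v1",
                    "served": True,
                    "storage": True,
                    "schema": {
                        "openAPIV3Schema": {
                            "type": "object",
                            "properties": {
                                "spec": {
                                    "type": "object",
                                    "properties": {"replicas": {"type": "integer"}},
                                }
                            },
                        }
                    },
                }
            ],
        },
    }


def custom_resource(name: str, replicas: int) -> dict:
    """An instance of the custom kind, shaped like any built-in object."""
    return {
        "apiVersion": f"{GROUP}/v1",
        "kind": KIND,
        "metadata": {"name": name},
        "spec": {"replicas": replicas},
    }


if __name__ == "__main__":
    print(json.dumps(crd_manifest(), indent=2))
    print(json.dumps(custom_resource("my-kafka", 3), indent=2))
```

With a definition like this registered, the custom kind is addressed by its plural name in kubectl in the same way as pods or services, and a custom controller watches the instances.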
Slide 28
Operator
Deploy and manage your production streaming platform with Confluent Operator.
- Automated Provisioning
- Platform Operations
- Resiliency
- Monitoring
Slide 29
Confluent Platform Reference Architecture
Each Confluent Platform component has specific characteristics:
- Security (SSL certificates)
- DNS names and zones
- Host selection
- Fault tolerance
- Scaling
Diagram components:
- Applications
- Native Client library
- Kafka Streams
- Load Balancer *
- Schema Registry
- REST Proxy
- Kafka Connect
- Kafka Brokers
- ZooKeeper Nodes
Slide 30
Confluent Operator: Automated Provisioning
Load Balancer
Kafka Pod
Kafka Pod
Kafka Pod
Storage
Slide 31
Confluent Operator: Scale Horizontally
Automate scaling:
- Spin up new broker pod(s)
- Distribute partitions to the new broker(s)
- Determine balancing plan
- Execute balancing plan
- Monitor resources
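The "determine balancing plan" and "execute balancing plan" steps can be sketched with a simple greedy strategy. This is an assumption made for illustration, not how Confluent Operator computes its plan: it balances only partition counts, ignores replicas and data size, and `balancing_plan` and `execute` are hypothetical names.

```python
def balancing_plan(assignment: dict, brokers: list) -> list:
    """Return (partition, source, target) moves that even out partition counts.

    `assignment` maps partition name -> broker. `brokers` lists every broker,
    including new, empty ones, and must contain every broker in `assignment`.
    Greedy: repeatedly move one partition from the fullest to the emptiest broker.
    """
    load = {b: [] for b in brokers}
    for partition, broker in sorted(assignment.items()):
        load[broker].append(partition)

    moves = []
    while True:
        fullest = max(brokers, key=lambda b: len(load[b]))
        emptiest = min(brokers, key=lambda b: len(load[b]))
        if len(load[fullest]) - len(load[emptiest]) <= 1:
            return moves
        partition = load[fullest].pop()
        load[emptiest].append(partition)
        moves.append((partition, fullest, emptiest))


def execute(assignment: dict, moves: list) -> dict:
    """Apply the planned moves and return the new assignment."""
    updated = dict(assignment)
    for partition, _source, target in moves:
        updated[partition] = target
    return updated


if __name__ == "__main__":
    current = {f"p{i}": f"broker-{i % 2}" for i in range(6)}
    plan = balancing_plan(current, ["broker-0", "broker-1", "broker-2"])
    print(plan)
    print(execute(current, plan))
```

Separating planning from execution mirrors the slide: the plan can be inspected before any data is moved.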
Slide 32
Confluent Operator: Rolling Upgrade
Automated rolling upgrade with no downtime for Kafka.
- Stop broker
- Wait for leader election to complete
- Start broker with new version
- Wait for zero under-replicated partitions
- Repeat
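The upgrade steps above can be sketched as a loop over brokers. `FakeCluster` is an in-memory stand-in invented for this example, so every method on it is hypothetical; a real implementation would query the cluster and sleep between polls.

```python
class FakeCluster:
    """In-memory stand-in for a Kafka cluster; every method is hypothetical."""

    def __init__(self, brokers, version):
        self.versions = {b: version for b in brokers}
        self.running = set(brokers)
        self.events = []

    def stop(self, broker):
        self.running.discard(broker)
        self.events.append(f"stop {broker}")

    def leader_election_done(self):
        return True  # a real check would ask the cluster

    def start(self, broker, version):
        self.versions[broker] = version
        self.running.add(broker)
        self.events.append(f"start {broker} {version}")

    def under_replicated_partitions(self):
        # Simplification: one under-replicated partition per stopped broker.
        return len(self.versions) - len(self.running)


def wait_until(condition, attempts=100):
    """Poll `condition` up to `attempts` times; raise if it never holds."""
    for _ in range(attempts):
        if condition():
            return
    raise TimeoutError("condition not met")


def rolling_upgrade(cluster, new_version):
    """Upgrade one broker at a time, following the steps on the slide."""
    for broker in sorted(cluster.versions):
        cluster.stop(broker)
        wait_until(cluster.leader_election_done)
        cluster.start(broker, new_version)
        wait_until(lambda: cluster.under_replicated_partitions() == 0)


if __name__ == "__main__":
    cluster = FakeCluster(["broker-0", "broker-1", "broker-2"], "1.0")
    rolling_upgrade(cluster, "2.0")
    print(cluster.events)
```

Waiting for zero under-replicated partitions before touching the next broker is what keeps at most one replica of any partition offline at a time.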
Slide 33
Will it fly? Let’s see
Slide 34
Confluent Operator
Automate provisioning
Scale your Kafkas and CP clusters elastically
Monitor SLAs through Confluent Control Center or Prometheus
Operate at scale with enterprise support from Confluent
Slide 35
Advanced use cases vs.