One Does Not Simply Query a Stream

A presentation at Chicago Elastic Meetup in May 2026 in Chicago, IL, USA by Viktor Gamov

Slide 1

One Does Not Simply Query a Stream
A landscape guide to querying your Kafka data

Slide 2

Before We Start
I’m Viktor Gamov
Developer Advocate at Confluent
Co-author of Kafka in Action (Manning)
Java Champion
X / Bluesky: @gamussa
Slides + video: speaking.gamov.io
@gamussa | Chicago Elastic Meetup

Slide 3

Slides + Video
Scan for slides, speaker notes, and the recording
speaking.gamov.io

Slide 4

Slide 5

Simpler Times
Once upon a time, you had a monolith.
One application. One database. One SQL query.
SELECT * FROM orders WHERE status = 'pending';
Life was good. You could query anything.
Then someone said “microservices”, and it all went sideways.

Slide 6

All Roads Lead to Kafka
Your data is already in Kafka topics.
Events flow through topics in real time.
Data is immutable: once written, it’s written.
Kafka is an append-only log, not a database.
So how do you query it?
You can’t just SELECT * FROM kafka_topic.
(Well… actually… we’ll get to that.)
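
A minimal sketch of why an append-only log is awkward to query directly. It assumes a toy in-memory list standing in for one topic partition; the `Log` class and `pending_orders` function are hypothetical names for illustration, not a Kafka client API. Reads go by offset, so answering "which orders are pending?" means replaying every record.

```python
from dataclasses import dataclass, field


@dataclass
class Log:
    """Toy stand-in for one topic partition: records can only be appended."""
    records: list = field(default_factory=list)

    def append(self, key, value):
        self.records.append((key, value))
        return len(self.records) - 1  # offset of the new record

    def read(self, offset):
        return self.records[offset]


def pending_orders(log):
    """Replay the whole log; the latest event per key wins."""
    latest = {}
    for key, value in log.records:
        latest[key] = value
    return sorted(k for k, v in latest.items() if v["status"] == "pending")


log = Log()
log.append("order-1", {"status": "pending"})
log.append("order-2", {"status": "pending"})
log.append("order-1", {"status": "shipped"})  # a new event, not an update in place

print(pending_orders(log))
```

Every call to `pending_orders` walks the full history, which is the cost the options in the rest of the talk try to avoid in different ways.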

Slide 7

Slide 8

Slide 9

OLTP vs OLAP
This is the most important slide in this talk.
OLTP: transactional queries
  Get me order #12345
  What’s the current balance?
  Point lookups by key
  Low latency, single record
OLAP: analytical queries
  How many orders this hour?
  What’s the average basket size?
  Aggregations across millions
  Higher latency, many records
Every solution we’ll see optimizes for one of these. Keep this in your head.
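
To make the two query shapes concrete, here is a small sketch over a hypothetical in-memory set of orders. The record fields (`hour`, `basket_size`) and function names are assumptions for illustration only. The OLTP function touches one record by key; the OLAP functions scan all of them.

```python
from statistics import mean

# Hypothetical order records keyed by order id.
orders = {
    12345: {"hour": 9, "basket_size": 3},
    12346: {"hour": 9, "basket_size": 5},
    12347: {"hour": 10, "basket_size": 1},
}


def get_order(order_id):
    """OLTP shape: one key in, one record out."""
    return orders[order_id]


def orders_in_hour(hour):
    """OLAP shape: scan every record and count the matches."""
    return sum(1 for o in orders.values() if o["hour"] == hour)


def average_basket_size():
    """OLAP shape: aggregate across all records."""
    return mean(o["basket_size"] for o in orders.values())


print(get_order(12345))
print(orders_in_hour(9))
print(average_basket_size())
```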

Slide 10

Our Options Tonight
1. Kafka Connect + Relational Database
2. Kafka Streams (embedded querying)
3. Streaming SQL databases
4. Real-Time OLAP databases
5. Elasticsearch
6. Cloud Data Warehouses
7. Data Lakes + Table Formats
8. Tableflow (the new kid)
I will not give you a definitive answer. There are no right solutions. Only trade-offs.

Slide 11

Slide 12

When Connect + RDBMS Works
+ You already know SQL
+ Familiar tooling (pgAdmin, DBeaver, all that jazz)
+ Great for smaller datasets
+ OLTP-friendly: point lookups by key
- Not real-time: there’s a lag
- Doesn’t scale to millions of events/sec
- You’re maintaining another database
If this works for you? It’s fine. It’s totally fine.
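
A rough sketch of the sink-then-query pattern, under loud assumptions: an in-memory SQLite database stands in for the relational database, and a plain loop stands in for the sink connector. The `orders` table and the event tuples are hypothetical. Once the events land in a table, the query from the monolith days works again.

```python
import sqlite3

# Hypothetical order events, in the order a sink might deliver them.
events = [
    ("order-1", "pending"),
    ("order-2", "pending"),
    ("order-1", "shipped"),
]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")

for order_id, status in events:
    # Upsert by primary key: the table keeps only the latest status per order.
    db.execute(
        "INSERT OR REPLACE INTO orders (id, status) VALUES (?, ?)",
        (order_id, status),
    )
db.commit()

pending = db.execute(
    "SELECT id FROM orders WHERE status = 'pending' ORDER BY id"
).fetchall()
print(pending)
```

The table only reflects events that have already been written to it, which is where the lag on the slide comes from.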

Slide 13

Slide 14

When Kafka Streams Works
+ No external database needed
+ Embedded in your Java/Kotlin app
+ Interactive Queries for OLTP lookups
+ Exactly-once processing
- It’s a library, not a service: you manage deployment
- Analytical queries (OLAP) are limited
- Scaling = scaling your app instances
Congratulations, you built your own database.
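
The idea behind querying embedded state can be sketched without the library. This is a toy model in Python, not the Kafka Streams API (which is Java); `OrderCountStore` and its methods are hypothetical names. The application keeps local state up to date as events arrive and answers point lookups from that state.

```python
from collections import defaultdict


class OrderCountStore:
    """Toy state store: updated per event, queried by key."""

    def __init__(self):
        self._counts = defaultdict(int)

    def process(self, customer_id):
        # Called once per incoming event, like a stream processing step.
        self._counts[customer_id] += 1

    def get(self, customer_id):
        # Point lookup against local state; no external database involved.
        return self._counts.get(customer_id, 0)


store = OrderCountStore()
for customer in ["alice", "bob", "alice"]:
    store.process(customer)

print(store.get("alice"))
print(store.get("carol"))
```

The store lives inside the application process, so running more instances means splitting this state across them.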

Slide 15

Slide 16

When Streaming SQL Works
+ SQL interface: familiar
+ Continuous materialized views: OLAP on streams
+ No custom code needed
- Another service to deploy and manage
- Scaling characteristics vary wildly between products
- Community support is… developing
Sidebar: Flink has SQL too, but Flink is a processing framework, not a database. Different tool, different job.
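
A continuous materialized view can be pictured as an aggregate that is updated per event instead of recomputed per query. The sketch below is a hand-written toy, assuming hypothetical `hour` and `basket_size` fields; it does not show how any particular streaming SQL product works internally. It maintains roughly what a view grouping orders by hour with a count and an average would hold.

```python
class HourlyOrdersView:
    """Toy materialized view: order count and average basket size per hour."""

    def __init__(self):
        self._count = {}
        self._items = {}

    def on_event(self, hour, basket_size):
        # Incremental update: touch one row, never rescan history.
        self._count[hour] = self._count.get(hour, 0) + 1
        self._items[hour] = self._items.get(hour, 0) + basket_size

    def query(self, hour):
        # Reads are cheap because the aggregate is already maintained.
        count = self._count.get(hour, 0)
        avg = self._items[hour] / count if count else None
        return {"orders": count, "avg_basket": avg}


view = HourlyOrdersView()
for hour, basket in [(9, 3), (9, 5), (10, 1)]:
    view.on_event(hour, basket)

print(view.query(9))
```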

Slide 17

Slide 18

When Real-Time OLAP Works
+ Millisecond query latency at massive scale
+ Built for concurrent analytical queries
+ Kafka is a first-class data source
- Specialized: dedicated OLAP cluster
- Complex operational overhead
- Schema management can be… interesting

Slide 19

Slide 20

When Elasticsearch Works
+ Full-text search: something none of the others do well
+ Kibana for dashboards out of the box
+ Kafka Connect sink is battle-tested
+ Hybrid: point lookups (OLTP-ish) + aggregations (OLAP-ish)
- Not a streaming engine: destination, not processor
- Schema mapping can get tricky with nested Avro/JSON
- Cluster sizing and shard management at scale
But you all know this already. That’s why you’re here tonight.
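
For anyone newer to search, the core data structure behind full-text lookup is an inverted index: a map from each term to the documents containing it. The sketch below is a deliberately tiny version with hypothetical documents and a naive tokenizer. A real search engine adds analysis, scoring, and much more, none of which is modeled here.

```python
import re
from collections import defaultdict

# Hypothetical documents, e.g. order notes delivered by a sink.
docs = {
    1: "Customer asked for express shipping",
    2: "Shipping address changed by customer",
    3: "Refund issued",
}

# Build the inverted index: term -> set of document ids.
index = defaultdict(set)
for doc_id, text in docs.items():
    for token in re.findall(r"\w+", text.lower()):
        index[token].add(doc_id)


def search(query):
    """Return ids of documents containing every query term."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)


print(search("customer shipping"))
print(search("refund"))
```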

Slide 21

Option 6: Cloud Data Warehouses
+ Massive scale, managed service
+ SQL interface everyone knows
+ Kafka connectors available
- Batch-oriented: even “streaming” modes have latency
- Expensive at high volume
- Structured data bias: semi-structured gets messy
Good for analytics. Not great for real-time.

Slide 22

Slide 23

Slide 24

Slide 25

Slide 26

The Decision Framework
OLTP (point lookups) -> Kafka Streams, Connect + RDBMS
OLAP (analytics) -> Real-Time OLAP, Streaming SQL
Search + hybrid -> Elasticsearch
Batch analytics -> Data Lake, Cloud DWH, Tableflow
No ideal solutions. Only trade-offs.
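
The framework is small enough to write down as a lookup table. This sketch only transcribes the slide; the workload keys (`oltp`, `olap`, `search`, `batch`) and the `candidates` function are hypothetical labels, and the result is a shortlist to weigh, not a recommendation.

```python
# The slide's mapping, transcribed as data. The keys are hypothetical labels.
OPTIONS = {
    "oltp": ["Kafka Streams", "Connect + RDBMS"],
    "olap": ["Real-Time OLAP", "Streaming SQL"],
    "search": ["Elasticsearch"],
    "batch": ["Data Lake", "Cloud DWH", "Tableflow"],
}


def candidates(workload):
    """Return the options the slide lists for a workload shape."""
    try:
        return OPTIONS[workload]
    except KeyError:
        raise ValueError(f"unknown workload: {workload!r}") from None


print(candidates("oltp"))
```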

Slide 27

Three Things to Consider
1. Familiarity
   Sometimes you go with what you know. That’s NOT a bad thing.
2. Performance
   If consumer lag keeps you up at night, look at Pinot or StarRocks.
3. Community
   When you’re choosing, think about where you can go ask questions.

Slide 28

The Community Thing
I picked Pascal to learn, not because it was the best language, but because there was a guy in the neighborhood who could help me.
That’s why Kafka won. Not because it’s perfect, but because people at meetups like this one eat pizza and help each other figure it out.

Slide 29

What to Try This Week
1. Already using Kafka? Try Interactive Queries with Kafka Streams.
2. Want SQL on streams? Spin up RisingWave or Materialize locally.
3. Need analytics at scale? Look at Pinot + Kafka connector.
4. On Confluent Cloud? Enable Tableflow on a topic and query with DuckDB.
5. Already using Elasticsearch? Try the Kafka Connect ES sink.
6. Come talk to me. I’ll be around after. Pizza first.

Slide 30

Resources
Slides + video: speaking.gamov.io
Book: Kafka in Action (Manning)
Confluent dev: developer.confluent.io
Streaming Frontiers: my live-stream series

Slide 31

AS ALWAYS, HAVE A NICE DAY