One Does Not Simply Query a Stream

A presentation at Elastic New York City User Group in June 2026 in New York, NY, USA by Viktor Gamov

Slide 1

One Does Not Simply Query a Stream
A landscape guide to querying your Kafka data

Slide 2

Before We Start
I’m Viktor Gamov
Developer Advocate at Confluent
Co-author of Kafka in Action (Manning)
Java Champion
X / Bluesky: @gamussa
Slides + video: speaking.gamov.io
@gamussa | NYC Elastic Meetup

Slide 3

Slides + Video
Scan for slides, speaker notes, and the recording
speaking.gamov.io

Slide 4

Slide 5

Simpler Times
Once upon a time, you had a monolith.
One application. One database. One SQL query.
SELECT * FROM orders WHERE status = 'pending';
Life was good. You could query anything.
Then someone said “microservices”, and it all went sideways.

Slide 6

All Roads Lead to Kafka
Your data is already in Kafka topics.
Events flow through topics in real time.
Data is immutable: once written, it’s written.
Kafka is an append-only log, not a database.
So how do you query it?
You can’t just SELECT * FROM kafka_topic.
(Well… actually… we’ll get to that.)
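To make the "append-only log, not a database" point concrete, here is a minimal sketch in plain Python. It does not use any Kafka client. The `Log` class and the `latest_value` helper are hypothetical names invented for illustration, standing in for a single topic partition that can only be appended to and read in order.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Log:
    """Toy stand-in for one topic partition: append and read by offset only."""
    records: list = field(default_factory=list)

    def append(self, key: str, value: Any) -> int:
        # Records are never updated in place; a "change" is just a new record.
        self.records.append((key, value))
        return len(self.records) - 1  # offset of the record just written

    def read(self, offset: int):
        return self.records[offset]


def latest_value(log: Log, key: str) -> Optional[Any]:
    """There is no index by key, so a lookup has to scan the whole log."""
    result = None
    for k, v in log.records:
        if k == key:
            result = v  # later records win
    return result


log = Log()
log.append("order-1", "pending")
log.append("order-2", "pending")
log.append("order-1", "shipped")

print(latest_value(log, "order-1"))  # shipped
```

The scan inside `latest_value` is the cost that every option in the rest of the talk tries to avoid, by building some kind of index or table next to the log.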

Slide 7

Slide 8

Slide 9

OLTP vs OLAP
This is the most important slide in this talk.

OLTP: transactional queries
Get me order #12345
What’s the current balance?
Point lookups by key
Low latency, single record

OLAP: analytical queries
How many orders this hour?
What’s the average basket size?
Aggregations across millions
Higher latency, many records

Every solution we’ll see optimizes for one of these. Keep this in your head.
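The two query shapes can be shown side by side with a small sketch using Python's built-in `sqlite3` module. The `orders` table, its columns, and the sample rows are assumptions made up for this example; only the order number 12345 comes from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (id, status, amount) VALUES (?, ?, ?)",
    [(12345, "pending", 40.0), (12346, "shipped", 10.0), (12347, "pending", 25.0)],
)

# OLTP-style: point lookup by key, a single record comes back.
order = conn.execute(
    "SELECT id, status, amount FROM orders WHERE id = ?", (12345,)
).fetchone()

# OLAP-style: an aggregation that has to touch every row.
count, avg_amount = conn.execute(
    "SELECT COUNT(*), AVG(amount) FROM orders"
).fetchone()

print(order)
print(count, avg_amount)
```

With three rows both queries are instant. The difference only shows at scale, where the first stays a key lookup and the second grows with the amount of data scanned.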

Slide 10

Our Options Tonight
1. Kafka Connect + Relational Database
2. Kafka Streams (embedded querying)
3. Streaming SQL databases
4. Real-Time OLAP databases
5. Elasticsearch
6. Cloud Data Warehouses
7. Data Lakes + Table Formats
8. Tableflow (the new kid)
I will not give you a definitive answer. There are no right solutions. Only trade-offs.

Slide 11

Slide 12

When Connect + RDBMS Works
Report card
+ You already know SQL
+ Familiar tooling (pgAdmin, DBeaver, all that jazz)
+ Great for smaller datasets
+ OLTP-friendly: point lookups by key

- Not real-time: there’s a lag
- Doesn’t scale to millions of events/sec
- You’re maintaining another database
If this works for you? It’s fine. It’s totally fine.
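The pattern behind this option can be sketched as: copy events into a table keyed by the event key, then query the table with ordinary SQL. The sketch below is not Kafka Connect; it is a hand-rolled loop over a hypothetical list of events, with SQLite standing in for the relational database and an invented `sink` function playing the role of the connector.

```python
import sqlite3

# Hypothetical events as they might appear on an "orders" topic: (key, status).
events = [
    ("order-1", "pending"),
    ("order-2", "pending"),
    ("order-1", "shipped"),
]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")


def sink(batch):
    """Upsert a batch of events; the latest event per key overwrites the row."""
    with db:  # one transaction per batch
        db.executemany(
            "INSERT OR REPLACE INTO orders (id, status) VALUES (?, ?)", batch
        )


sink(events)

# Once the rows are in the table, the familiar query from the monolith days works again.
pending = db.execute(
    "SELECT id FROM orders WHERE status = 'pending' ORDER BY id"
).fetchall()
print(pending)  # [('order-2',)]
```

The lag mentioned above lives between the moment an event is written to the topic and the moment `sink` commits it: until then, the table answers with stale data.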

Slide 13

Slide 14

When Kafka Streams Works
Report card
+ No external database needed
+ Embedded in your Java/Kotlin app
+ Interactive Queries for OLTP lookups
+ Exactly-once processing

- It’s a library, not a service: you manage deployment
- Analytical queries (OLAP) are limited
- Scaling = scaling your app instances
Congratulations, you built your own database.
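The idea behind an embedded, queryable state store can be sketched without the library: fold the stream of changes into a local key-value map, then answer point lookups from that map. This is plain Python, not the Kafka Streams API; `materialize`, the event shape, and the use of `None` as a delete marker are assumptions chosen for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple

# (key, value); a value of None is treated here as "delete this key".
Event = Tuple[str, Optional[dict]]


def materialize(changelog: Iterable[Event]) -> Dict[str, dict]:
    """Replay a changelog into a local table holding the latest value per key."""
    store: Dict[str, dict] = {}
    for key, value in changelog:
        if value is None:
            store.pop(key, None)
        else:
            store[key] = value
    return store


changelog = [
    ("order-1", {"status": "pending"}),
    ("order-2", {"status": "pending"}),
    ("order-1", {"status": "shipped"}),
    ("order-2", None),
]

store = materialize(changelog)
print(store.get("order-1"))  # {'status': 'shipped'}
```

A lookup such as `store.get("order-1")` is cheap and local, which is why this fits OLTP-style access. An aggregation across all keys means walking the whole map in every app instance, which is where the OLAP limitation above comes from.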

Slide 15

Slide 16

When Streaming SQL Works
Report card
+ SQL interface: familiar
+ Continuous materialized views: OLAP on streams
+ No custom code needed

- Another service to deploy and manage
- Scaling characteristics vary wildly between products
- Community support is… developing
Sidebar: Flink has SQL too, but Flink is a processing framework, not a database. Different tool, different job.
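A continuous materialized view is a query result that is updated per event instead of being recomputed on demand. The sketch below is a toy version of a view like `SELECT status, COUNT(*) FROM orders GROUP BY status`, written in plain Python. The `OrdersByStatus` class and its event shape are hypothetical and do not reflect how any particular streaming SQL product is implemented.

```python
from collections import Counter
from typing import Dict


class OrdersByStatus:
    """Toy incrementally maintained view: number of orders per status."""

    def __init__(self) -> None:
        self.current: Dict[str, str] = {}  # order id -> latest status
        self.counts: Counter = Counter()   # status -> number of orders

    def on_event(self, order_id: str, status: str) -> None:
        # Retract the order's previous contribution, then apply the new one.
        previous = self.current.get(order_id)
        if previous is not None:
            self.counts[previous] -= 1
            if self.counts[previous] == 0:
                del self.counts[previous]
        self.current[order_id] = status
        self.counts[status] += 1


view = OrdersByStatus()
for order_id, status in [("o1", "pending"), ("o2", "pending"), ("o1", "shipped")]:
    view.on_event(order_id, status)

print(dict(view.counts))  # {'pending': 1, 'shipped': 1}
```

Reading the view is a dictionary access whose cost does not depend on how many events have been processed. The work was already paid for, a little at a time, as each event arrived.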

Slide 17

Slide 18

When Real-Time OLAP Works
Report card
+ Millisecond query latency at massive scale
+ Built for concurrent analytical queries
+ Kafka is a first-class data source

- Specialized: dedicated OLAP cluster
- Complex operational overhead
- Schema management can be… interesting

Slide 19

Slide 20

When Elasticsearch Works
Report card
+ Full-text search: something none of the others do well
+ Kibana for dashboards out of the box
+ Kafka Connect sink is battle-tested
+ Hybrid: point lookups (OLTP-ish) + aggregations (OLAP-ish)

- Not a streaming engine: destination, not processor
- Schema mapping can get tricky with nested Avro/JSON
- Cluster sizing and shard management at scale
But you all know this already. That’s why you’re here tonight.

Slide 21

Option 6: Cloud Data Warehouses
+ Massive scale, managed service
+ SQL interface everyone knows
+ Kafka connectors available

- Batch-oriented: even “streaming” modes have latency
- Expensive at high volume
- Structured data bias: semi-structured gets messy
Good for analytics. Not great for real-time.

Slide 22

Slide 23

Slide 24

Slide 25

Slide 26

The Decision Framework
OLTP (point lookups) -> Kafka Streams, Connect + RDBMS
OLAP (analytics) -> Real-Time OLAP, Streaming SQL
Search + hybrid -> Elasticsearch
Batch analytics -> Data Lake, Cloud DWH, Tableflow
No ideal solutions. Only trade-offs.
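The decision key above is a lookup from query shape to candidate options, and it can be written down as data. The sketch below encodes exactly the four rows from the slide; the names `QueryShape`, `DECISION_KEY`, and `options_for` are invented for this example.

```python
from enum import Enum
from typing import Dict, List


class QueryShape(Enum):
    OLTP = "point lookups"
    OLAP = "analytics"
    SEARCH_HYBRID = "search + hybrid"
    BATCH = "batch analytics"


# One entry per row of the decision key on the slide.
DECISION_KEY: Dict[QueryShape, List[str]] = {
    QueryShape.OLTP: ["Kafka Streams", "Connect + RDBMS"],
    QueryShape.OLAP: ["Real-Time OLAP", "Streaming SQL"],
    QueryShape.SEARCH_HYBRID: ["Elasticsearch"],
    QueryShape.BATCH: ["Data Lake", "Cloud DWH", "Tableflow"],
}


def options_for(shape: QueryShape) -> List[str]:
    """Return the candidate options for a query shape; the trade-offs still need weighing."""
    return DECISION_KEY[shape]


print(options_for(QueryShape.OLTP))  # ['Kafka Streams', 'Connect + RDBMS']
```

The function returns a shortlist, not an answer, which matches the point of the slide: the key narrows the field and the trade-offs decide the rest.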

Slide 27

Three Things to Consider
1. Familiarity. Sometimes you go with what you know. That’s NOT a bad thing.
2. Performance. If consumer lag keeps you up at night, look at Pinot or StarRocks.
3. Community. When you’re choosing, think about where you can go ask questions.

Slide 28

The Community Thing
I picked Pascal to learn, not because it was the best language, but because there was a guy in the neighborhood who could help me.
That’s why Kafka won. Not because it’s perfect, but because people at meetups like this one eat pizza and help each other figure it out.

Slide 29

What to Try This Week
1. Already using Kafka? Try Interactive Queries with Kafka Streams.
2. Want SQL on streams? Spin up RisingWave or Materialize locally.
3. Need analytics at scale? Look at Pinot + Kafka connector.
4. On Confluent Cloud? Enable Tableflow on a topic and query with DuckDB.
5. Already using Elasticsearch? Try the Kafka Connect ES sink.
6. Come talk to me. I’ll be around after. Pizza first.

Slide 30

Resources
Slides + video: speaking.gamov.io
Book: Kafka in Action (Manning)
Confluent dev: developer.confluent.io
Streaming Frontiers: my live-stream series

Slide 31

AS ALWAYS, HAVE A NICE DAY