One Does Not Simply Query a Stream

A presentation at NYC Apache Iceberg™ Community Meetup in July 2025 in New York, NY, USA by Viktor Gamov

Slide 1

One Does Not Simply Query a Stream!
Viktor Gamov, Confluent, @gamussa
NYC Iceberg Meetup, July 10, 2025
@gamussa | @confluentinc | @apacheiceberg

Slide 2

Slide 3

Viktor Gamov
Principal Developer Advocate | Confluent
Kafka in Action | Co-Author
Java Champion

Slide 4

Simpler times
Monolith
@gamussa | gamov.dev/rel | @ConfluentInc

Slide 5

Simpler analytics
ETL and CDC

Slide 6

Data Pipelines
Streaming data pipelines and Microservices

Slide 7

LOG

Slide 8

OLAP vs. OLTP in Streams
OLTP stream vs. OLAP streams

Slide 9

Our Options
• Connect/Relational DB
• Streaming SQL
• Real-Time OLAP
• Data Warehouse/Data Lake
• Tableflow

Slide 10

Kafka Connect

Slide 11

Connect/RDBMS
• Suitable for smaller data
• Transactional
• Familiar to users

Slide 12

Connect/RDBMS
[Diagram: Data Source → Kafka Connect → Cluster (Broker, Broker, Broker) → Kafka Connect → Data Sink]
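
Note: the slide shows only the topology, not a configuration. As a minimal sketch of the sink side, assuming the Confluent JDBC sink connector and a Postgres target, the JSON body that would be sent to a Kafka Connect worker's `POST /connectors` endpoint could be built like this. The connector name, topic, JDBC URL, and key field are hypothetical placeholders, not values from the talk.

```python
import json


def jdbc_sink_connector(name, topic, jdbc_url, key_field):
    """Build the JSON body for POST /connectors on a Kafka Connect worker.

    Assumes the Confluent JDBC sink connector is installed on the worker.
    """
    return {
        "name": name,
        "config": {
            "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
            "tasks.max": "1",
            "topics": topic,              # Kafka topic to drain into the table
            "connection.url": jdbc_url,   # target relational database
            "insert.mode": "upsert",      # overwrite rows that share a key
            "pk.mode": "record_key",      # take the primary key from the record key
            "pk.fields": key_field,
            "auto.create": "true",        # create the table if it does not exist
        },
    }


# Hypothetical values, for illustration only.
payload = jdbc_sink_connector(
    name="orders-sink",
    topic="orders",
    jdbc_url="jdbc:postgresql://localhost:5432/shop",
    key_field="order_id",
)
print(json.dumps(payload, indent=2))
```

With `insert.mode` set to `upsert`, the table holds the latest row per key, which is what makes the topic queryable with ordinary SQL.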

Slide 13

Slide 14

Streaming SQLs

Slide 15

Streaming Database
• SQL for Queries
• Streaming Source is a 1st-class citizen
• Persistence / Storage

Slide 16

Streaming SQL
• ksqlDB
• Materialize
• RisingWave
• TimePlus

Slide 17

But Viktor, Flink has SQL
Why not Flink?

Slide 18

Slide 19

ksqlDB
• «Streaming Database»
• Provides persistent TABLE abstraction
• Pull and Push queries
• Like Kafka Streams, but in SQL
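
Note: the difference between pull and push queries over a persistent table can be illustrated with a toy model. This is a plain-Python sketch, not ksqlDB's API. The `MaterializedTable` class and its method names are hypothetical and only mimic the behaviour: a pull query returns the current value and finishes, while a push query keeps receiving every later change.

```python
from collections import defaultdict


class MaterializedTable:
    """Toy stand-in for a persistent TABLE built from a stream of events."""

    def __init__(self):
        self._state = defaultdict(int)  # key -> running total
        self._subscribers = []          # callbacks registered by push queries

    def apply(self, key, amount):
        """Consume one stream event and update the table incrementally."""
        self._state[key] += amount
        for callback in self._subscribers:
            callback(key, self._state[key])

    def pull(self, key):
        """Pull query: return the current value for a key, then finish."""
        return self._state.get(key)

    def push(self, callback):
        """Push query: subscribe to every change made after this call."""
        self._subscribers.append(callback)


table = MaterializedTable()
changes = []
table.push(lambda key, total: changes.append((key, total)))

for key, amount in [("alice", 10), ("bob", 5), ("alice", 7)]:
    table.apply(key, amount)

print(table.pull("alice"))  # 17
print(changes)              # [('alice', 10), ('bob', 5), ('alice', 17)]
```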

Slide 20

Materialize
• Replacement data warehouse
• Integrates with Kafka, Postgres, dbt
• The Materialized View is the central abstraction
• Views are persistent and queryable
• Postgres wire-compatible
• Positioned as an analytics solution

Slide 21

RisingWave
• Distributed SQL Streaming database
• Cloud and OSS versions
• Implementation of Flink in Rust
• Kafka, Pulsar, Kinesis integrations
• Flink + persistent views
• Postgres wire-compatible

Slide 22

Slide 23

Real-Time Analytics Database

Slide 24

Real-Time OLAP
• Designed for high concurrency, low latency queries
• Ingests from streaming and batch sources
• Intimate integration with Kafka
• Conventional tables and SQL

Slide 25

Real-Time OLAP
• Analytics shaped like real-time data
• Analytics when users are decision makers

Slide 26

Cloud Data Warehouses

Slide 27

Cloud Data Warehouses
• The cloud-based heir of legacy DWH
• Ingest from batch and streaming sources
• Biased towards structured data and batch access

Slide 28

Data Lake

Slide 29

Data Lake
Anything else
We’ll figure this out

Slide 30

Data Lakes
• Storage and compute are radically decoupled
• Structure is relatively less important
• Reads are slow
• Streaming is historically difficult

Slide 31

Data Lakes
• Started as the HDFS cluster
• Became S3
• That didn’t help…
• ELT vs. ETL
• Iceberg/Hudi/DeltaLake

Slide 32

Iceberg

Slide 33

Tableflow

Slide 34

Slide 35

Slide 36

Skip Paywall
Sign Up for Confluent Cloud
Get $400 worth of free credits for your first 30 days
Use promo code POPTOUT000MZG62 to skip the paywall!

Slide 37

Technology Selection
No Solutions, only Trade-Offs

Slide 38

Sometimes you go with what you know

Slide 39

This is not bad!

Slide 40

Performance

Slide 41

Community/Adoption

Slide 42

This is not bad!

Slide 43

Slides and Video
https://speaking.gamov.io/
X/Bluesky: @gamussa