Who’s Tweeting about #DataSciCon #DataSciCon @gamussa

KSQL is a Declarative Stream Processing Language @gamussa #DataSciCon @confluentinc

KSQL is the Streaming SQL Engine for Apache Kafka

Stream Processing by Analogy
$ cat < in.txt | grep "ksql" | tr a-z A-Z > out.txt
Connect API | Stream Processing | Connect API
Kafka Cluster
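The analogy can be spelled out in a small Python sketch. This is an illustration only, not Kafka or KSQL code: the function names `source`, `process`, and `sink` are hypothetical stand-ins for a source connector, the stream processing step, and a sink connector, and the in-memory lists stand in for in.txt and out.txt.

```python
from typing import Iterable, Iterator, List


def source(lines: Iterable[str]) -> Iterator[str]:
    """Plays the role of `cat < in.txt` (a source connector): emit records one at a time."""
    for line in lines:
        yield line


def process(records: Iterable[str]) -> Iterator[str]:
    """Plays the role of `grep "ksql" | tr a-z A-Z`: filter, then transform each record."""
    for record in records:
        if "ksql" in record:
            yield record.upper()


def sink(records: Iterable[str]) -> List[str]:
    """Plays the role of `> out.txt` (a sink connector): collect the results."""
    return list(records)


# Hypothetical sample input standing in for in.txt.
in_txt = ["ksql is declarative", "plain kafka log", "try ksql today"]
out_txt = sink(process(source(in_txt)))
print(out_txt)
```

Each stage only sees one record at a time, which is what lets the same shape of pipeline run over an unbounded stream rather than a finite file.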

Kafka is a Streaming Platform
Producer | Consumer | The Log | Connectors | Connectors | Streaming Engine

Streaming is the toolset for dealing with events as they move!

What exactly is Stream Processing?
authorization_attempts -> possible_fraud

What exactly is Stream Processing?
authorization_attempts -> possible_fraud
CREATE STREAM possible_fraud AS
  SELECT card_number, count(*)
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 MINUTE)
  GROUP BY card_number
  HAVING count(*) > 3;
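To make the semantics of the query concrete, here is a minimal pure-Python sketch of a tumbling-window count with a HAVING filter. It is a batch simulation under stated assumptions, not how KSQL is implemented: the function name `possible_fraud`, the `(timestamp_ms, card_number)` event shape, and the sample data are all hypothetical, and a real stream processor would emit results continuously rather than after reading all input.

```python
from collections import Counter
from typing import Iterable, List, Tuple

WINDOW_MS = 5 * 60 * 1000  # mirrors WINDOW TUMBLING (SIZE 5 MINUTE)


def possible_fraud(
    attempts: Iterable[Tuple[int, str]], threshold: int = 3
) -> List[Tuple[int, str, int]]:
    """Count attempts per (window, card_number); keep groups with count > threshold.

    `attempts` holds (timestamp_ms, card_number) pairs.
    Returns sorted (window_start_ms, card_number, count) tuples.
    """
    counts: Counter = Counter()
    for ts, card in attempts:
        # Tumbling windows are fixed-size and non-overlapping, so each
        # event belongs to exactly one window, found by rounding down.
        window_start = ts - (ts % WINDOW_MS)
        counts[(window_start, card)] += 1
    return sorted((w, c, n) for (w, c), n in counts.items() if n > threshold)


# Hypothetical sample events (timestamps in milliseconds).
attempts = [
    (0, "1111"), (60_000, "1111"), (120_000, "1111"), (240_000, "1111"),
    (10_000, "2222"), (20_000, "2222"),
    (290_000, "3333"), (299_000, "3333"), (300_000, "3333"), (310_000, "3333"),
]
flagged = possible_fraud(attempts)
print(flagged)
```

Card "3333" has four attempts in total, but they straddle a window boundary (two in each window), so it is not flagged. That is a property of tumbling windows worth keeping in mind when choosing a window type.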

Table-Stream Duality

Do you think that’s a table you are querying?

Streams to Tables

Stream/Table Duality

TABLE                            STREAM             TABLE
Gwen 1                           ("Gwen", 1)        Gwen 1
Gwen 1, Matthias 1               ("Matthias", 1)    Gwen 1, Matthias 1
Gwen 2, Matthias 1               ("Gwen", 2)        Gwen 2, Matthias 1
Gwen 2, Matthias 1, Viktor 1     ("Viktor", 1)      Gwen 2, Matthias 1, Viktor 1
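The duality on this slide can be sketched in a few lines of Python: replaying a stream of (key, value) change events, with the latest value per key winning, rebuilds the table. This is an illustration rather than Kafka or KSQL code; the function name `table_snapshots` is hypothetical, and the event order is an assumption inferred from the slide.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, int]


def table_snapshots(stream: List[Event]) -> List[Dict[str, int]]:
    """Replay a changelog stream and record the table state after each event."""
    table: Dict[str, int] = {}
    snapshots: List[Dict[str, int]] = []
    for key, value in stream:
        table[key] = value            # upsert: the newest value for a key wins
        snapshots.append(dict(table))  # copy so later updates do not alter history
    return snapshots


# Event order assumed from the slide.
stream = [("Gwen", 1), ("Matthias", 1), ("Gwen", 2), ("Viktor", 1)]
snapshots = table_snapshots(stream)
for event, state in zip(stream, snapshots):
    print(event, "->", state)
```

The last snapshot is the current table, and the list of events is that table's changelog, so either one can be derived from the other.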

Demo

Where is KSQL not such a great fit?
Ad-hoc queries
• Limited span of time usually retained in Kafka
• No indexes
BI reports (Tableau etc.)
• No indexes
• No JDBC (most BI tools are not good with continuous results!)

Resources and Next Steps
https://github.com/confluentinc/ksql
http://confluent.io/ksql
https://slackpass.io/confluentcommunity #ksql

Thanks!
@gamussa
viktor@confluent.io
We are hiring! https://www.confluent.io/careers/