Who's tweeting about #datascicon

A presentation at DataSciCon in November 2018 in Atlanta, GA, USA by Viktor Gamov

Slide 1

Who’s Tweeting about #DataSciCon @gamussa

Slide 2

KSQL is a Declarative Stream Processing Language @gamussa #DataSciCon @confluentinc

Slide 3

KSQL is the Streaming SQL Engine for Apache Kafka

Slide 4

Stream Processing by Analogy

$ cat < in.txt | grep "ksql" | tr a-z A-Z > out.txt

Diagram labels: Connect API, Stream Processing, Connect API, Kafka Cluster
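The shell pipeline on this slide can be mimicked in a few lines of Python. This is only a sketch of the analogy, not of how Kafka or KSQL work internally: the function names `grep`, `to_upper`, and `run_pipeline` and the sample input are hypothetical, and reading the first and last stages as a source and a sink connector is an assumption based on the slide labels.

```python
from typing import Iterable, Iterator, List


def grep(lines: Iterable[str], needle: str) -> Iterator[str]:
    """Pass through only the lines containing `needle` (like `grep "ksql"`)."""
    return (line for line in lines if needle in line)


def to_upper(lines: Iterable[str]) -> Iterator[str]:
    """Uppercase each line (matches `tr a-z A-Z` for ASCII text)."""
    return (line.upper() for line in lines)


def run_pipeline(source: Iterable[str]) -> List[str]:
    """Chain the stages lazily, then collect the result (the `> out.txt` step)."""
    return list(to_upper(grep(source, "ksql")))


if __name__ == "__main__":
    # Hypothetical input standing in for in.txt.
    print(run_pipeline(["ksql is declarative", "plain kafka", "try ksql"]))
```

Each stage consumes records one at a time and hands them on, which is the property the analogy is pointing at: the processing step never needs the whole input before it starts producing output.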

Slide 5

Kafka is a Streaming Platform

Diagram labels: Producer, Consumer, The Log, Connectors, Connectors, Streaming Engine

Slide 6

Streaming is the toolset for dealing with events as they move!

Slide 7

What exactly is Stream Processing?

Diagram labels: possible_fraud, authorization_attempts

Slide 8

What exactly is Stream Processing?

Diagram labels: possible_fraud, authorization_attempts

CREATE STREAM possible_fraud AS
  SELECT card_number, count(*)
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 MINUTE)
  GROUP BY card_number
  HAVING count(*) > 3;
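To make the windowing in this query concrete, here is a minimal batch simulation in Python. It is a sketch under stated assumptions, not KSQL's execution model: events are taken to be hypothetical `(timestamp_ms, card_number)` pairs, windows are aligned to multiples of five minutes from timestamp 0, and the function name `possible_fraud` simply mirrors the stream name on the slide.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

WINDOW_MS = 5 * 60 * 1000  # SIZE 5 MINUTE, in milliseconds


def possible_fraud(
    attempts: Iterable[Tuple[int, str]],
    window_ms: int = WINDOW_MS,
    threshold: int = 3,
) -> Dict[Tuple[int, str], int]:
    """Count attempts per (window start, card) and keep counts above threshold."""
    counts: Counter = Counter()
    for timestamp_ms, card_number in attempts:
        # Tumbling windows do not overlap: each event falls in exactly one window.
        window_start = timestamp_ms - timestamp_ms % window_ms
        counts[(window_start, card_number)] += 1
    # Equivalent of HAVING count(*) > 3.
    return {key: n for key, n in counts.items() if n > threshold}


if __name__ == "__main__":
    # Hypothetical sample data: card "A" has four attempts in the first window.
    sample = [(0, "A"), (1_000, "A"), (2_000, "A"), (3_000, "A"),
              (4_000, "B"), (299_000, "B"), (300_000, "A")]
    print(possible_fraud(sample))
```

The fifth attempt on card "A" at 300 000 ms lands in the next window, so it starts a new count rather than adding to the first one.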

Slide 14

Slide 15

Table-Stream Duality

Slide 16

Do you think that’s a table you are querying?

Slide 17

Streams to Tables

Slide 18

Slide 19

Stream/Table Duality

Slide 20

Stream/Table Duality

Slide 21

TABLE                           STREAM             TABLE
Gwen 1                          (“Gwen”, 1)        Gwen 1
Gwen 1, Matthias 1              (“Matthias”, 1)    Gwen 1, Matthias 1
Gwen 2, Matthias 1              (“Gwen”, 2)        Gwen 2, Matthias 1
Gwen 2, Matthias 1, Viktor 1    (“Viktor”, 1)      Gwen 2, Matthias 1, Viktor 1
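The table-to-stream-to-table round trip on this slide can be sketched in Python. This is an illustration only: the function names `table_from_stream` and `stream_from_table` are hypothetical, the changelog uses the names and counts from the slide, and the sketch assumes upserts only (no deletions or tombstone records).

```python
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, int]


def table_from_stream(changelog: Iterable[Record]) -> List[Dict[str, int]]:
    """Replay a changelog; return the table state after each record."""
    table: Dict[str, int] = {}
    snapshots: List[Dict[str, int]] = []
    for key, value in changelog:
        table[key] = value  # the latest value per key wins
        snapshots.append(dict(table))
    return snapshots


def stream_from_table(snapshots: Iterable[Dict[str, int]]) -> List[Record]:
    """Emit one record for every key whose value changed between snapshots."""
    previous: Dict[str, int] = {}
    changelog: List[Record] = []
    for snapshot in snapshots:
        for key, value in snapshot.items():
            if previous.get(key) != value:
                changelog.append((key, value))
        previous = snapshot
    return changelog


if __name__ == "__main__":
    changelog = [("Gwen", 1), ("Matthias", 1), ("Gwen", 2), ("Viktor", 1)]
    snapshots = table_from_stream(changelog)
    print(snapshots[-1])
    print(stream_from_table(snapshots) == changelog)
```

Replaying the stream rebuilds the table, and diffing successive table states recovers the stream, which is the sense in which the two are dual views of the same data.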

Slide 22

Demo

Slide 23

Slide 24

Where is KSQL not such a great fit?

Ad-hoc queries
• Limited span of time usually retained in Kafka
• No indexes

BI reports (Tableau etc.)
• No indexes
• No JDBC (most BI tools are not good with continuous results!)

Slide 25

Resources and Next Steps

https://github.com/confluentinc/ksql
http://confluent.io/ksql
https://slackpass.io/confluentcommunity #ksql

Slide 26

Thanks!

@gamussa
viktor@confluent.io

We are hiring! https://www.confluent.io/careers/