Who’s Tweeting about #DataSciCon
@gamussa
Slide 2
KSQL is a Declarative Stream Processing Language
@confluentinc
Slide 3
KSQL is the Streaming SQL Engine for Apache Kafka
Slide 4
Stream Processing by Analogy
$ cat < in.txt | grep "ksql" | tr a-z A-Z > out.txt
Connect API: cat < in.txt (data in) and > out.txt (data out)
Stream Processing: grep "ksql" | tr a-z A-Z
Kafka Cluster: the pipes (|) between the stages
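The same analogy can be sketched in Python, where each stage is a generator that handles one line at a time, much as a stream processor handles one event at a time. This only illustrates the shell pipeline; it does not use any Kafka or KSQL API, and the function names and sample lines are made up for the example.

```python
from typing import Iterable, Iterator, List


def source(lines: Iterable[str]) -> Iterator[str]:
    """Plays the role of `cat < in.txt`: emit lines one at a time."""
    for line in lines:
        yield line


def keep_matching(lines: Iterable[str], needle: str) -> Iterator[str]:
    """Plays the role of `grep "ksql"`: pass through lines containing needle."""
    return (line for line in lines if needle in line)


def upper_case(lines: Iterable[str]) -> Iterator[str]:
    """Plays the role of `tr a-z A-Z` (equivalent for ASCII text)."""
    return (line.upper() for line in lines)


def run_pipeline(lines: Iterable[str]) -> List[str]:
    """Chain the stages; the returned list plays the role of `> out.txt`."""
    return list(upper_case(keep_matching(source(lines), "ksql")))


if __name__ == "__main__":
    sample = ["ksql is declarative", "unrelated line", "streams and ksql"]
    for line in run_pipeline(sample):
        print(line)
```

Because the stages are lazy generators, nothing is read until the final stage asks for it, which loosely mirrors records flowing through a pipeline one at a time rather than as a finished batch.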
Slide 5
Kafka is a Streaming Platform
Diagram: Producer, Consumer, The Log, Connectors (two), Streaming Engine
Slide 6
Streaming is the toolset for dealing with events as they move!
Slide 7
What exactly is Stream Processing?
Diagram: authorization_attempts (input), possible_fraud (output)
Slide 8
What exactly is Stream Processing?
Diagram: authorization_attempts (input), possible_fraud (output)
CREATE TABLE possible_fraud AS
  SELECT card_number, count(*)
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 MINUTE)
  GROUP BY card_number
  HAVING count(*) > 3;
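To make the windowing concrete, here is a small Python sketch of what the query computes. Events are bucketed into fixed, non-overlapping 5-minute windows, counted per card number, and only counts above 3 are kept. It is a batch simulation for illustration, not how KSQL executes the query; the function name, the (timestamp, card_number) event shape, and the sample data are all assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

WINDOW_SIZE_SECONDS = 5 * 60  # WINDOW TUMBLING (SIZE 5 MINUTE)


def possible_fraud(
    attempts: Iterable[Tuple[int, str]],
    threshold: int = 3,
) -> Dict[Tuple[int, str], int]:
    """Count attempts per (window start, card number); keep counts > threshold.

    Each attempt is (epoch_seconds, card_number), a hypothetical event shape.
    """
    counts: Counter = Counter()
    for timestamp, card_number in attempts:
        # A tumbling window is identified by its aligned start time.
        window_start = timestamp - (timestamp % WINDOW_SIZE_SECONDS)
        counts[(window_start, card_number)] += 1
    # Equivalent of the HAVING clause: keep only groups above the threshold.
    return {key: n for key, n in counts.items() if n > threshold}


if __name__ == "__main__":
    events = [
        (0, "card-A"), (60, "card-A"), (120, "card-A"), (240, "card-A"),
        (10, "card-B"), (200, "card-B"),
        (290, "card-C"), (295, "card-C"), (305, "card-C"), (310, "card-C"),
    ]
    print(possible_fraud(events))
```

In the sample data, card-A is flagged with four attempts inside the first window. card-C also makes four attempts within 20 seconds, but they straddle the boundary at 300 seconds, so each window sees only two and nothing is flagged. That is the trade-off of tumbling windows: they are simple and non-overlapping, but a burst can be split across two of them.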
Where is KSQL not such a great fit?
Ad-hoc queries
• Limited span of time usually retained in Kafka
• No indexes
BI reports (Tableau etc.)
• No indexes
• No JDBC (most BI tools are not good with continuous results!)
Slide 25
Resources and Next Steps
https://github.com/confluentinc/ksql
http://confluent.io/ksql
https://slackpass.io/confluentcommunity #ksql
Slide 26
Thanks!
@gamussa
viktor@confluent.io
We are hiring! https://www.confluent.io/careers/