Building a Real-time Alerting, Analytics and Reporting System at Scale

A presentation at NYC In-Memory Computing Meetup in June 2019 in New York, NY, USA by Viktor Gamov

Slide 1

Real-time streaming and analytics at scale with Apache Kafka and Apache Ignite
June 2019 / New York, NY
@denismagda | @gamussa | #NYCInMemory

Slide 2

Hello 👋 @gamussa @denismagda

Slide 3

Digital transformation challenges

Slide 4

Digital Transformation Challenges
Application Layer: Web-Scale Apps, IoT, Mobile Apps, Social Media
10-100x Queries and Transactions (per sec)
10-1000x Faster Analytics (Hours to Sec)
50x Data Storage (Big Data)
Data Layer: NoSQL, RDBMS, Hadoop

Slide 5

Digital Transformation Challenges
● 10-100x more queries and transactions
Application Layer: Web-Scale Apps, IoT, Mobile Apps, Social Media
10-100x Queries and Transactions (per sec)
10-1000x Faster Analytics (Hours to Sec)
50x Data Storage (Big Data)
Data Layer: NoSQL, RDBMS, Hadoop

Slide 6

Digital Transformation Challenges
● 10-100x more queries and transactions
● 50x more data today than a decade ago
Application Layer: Web-Scale Apps, IoT, Mobile Apps, Social Media
10-100x Queries and Transactions (per sec)
10-1000x Faster Analytics (Hours to Sec)
50x Data Storage (Big Data)
Data Layer: NoSQL, RDBMS, Hadoop

Slide 7

Digital Transformation Challenges
● 10-100x more queries and transactions
● 50x more data today than a decade ago
● Overnight analytics become real-time
Application Layer: Web-Scale Apps, IoT, Mobile Apps, Social Media
10-100x Queries and Transactions (per sec)
10-1000x Faster Analytics (Hours to Sec)
50x Data Storage (Big Data)
Data Layer: NoSQL, RDBMS, Hadoop

Slide 8

Slide 9

Slide 10

Slide 11

Slide 12

Slide 13

Slide 14

Slide 15

Slide 16

Slide 17

In-Memory Computing and Stream Processing
Application Layer: Web-Scale Apps, IoT, Mobile Apps, Social Media
Confluent Platform, GridGain In-Memory Computing Platform, Event Streaming, Transactional Persistence

Slide 18

In-Memory Computing and Stream Processing
• Performance and velocity increases
Application Layer: Web-Scale Apps, IoT, Mobile Apps, Social Media
Confluent Platform, GridGain In-Memory Computing Platform, Event Streaming, Transactional Persistence

Slide 19

In-Memory Computing and Stream Processing
• Performance and velocity increases
• Scalability up to petabytes of data
Application Layer: Web-Scale Apps, IoT, Mobile Apps, Social Media
Confluent Platform, GridGain In-Memory Computing Platform, Event Streaming, Transactional Persistence

Slide 20

In-Memory Computing and Stream Processing
• Performance and velocity increases
• Scalability up to petabytes of data
• Act faster by analyzing streams of data using SQL
Application Layer: Web-Scale Apps, IoT, Mobile Apps, Social Media
Confluent Platform, GridGain In-Memory Computing Platform, Event Streaming, Transactional Persistence

Slide 21

Streaming-First World

Slide 22

Kappa Architecture: GridGain and Kafka Connect 💵

Slide 23

Demo

Slide 24

Slide 25

Slide 26

Enter Kafka Connect

Slide 27

Slide 28

Diagram: PRODUCER (Producer Application)

Slide 29

Diagram: PRODUCER (Producer Application), CONSUMER (Consumer Application)

Slide 30

Diagram: KAFKA CONNECT, PRODUCER, CONSUMER, KAFKA CONNECT

Slide 31

Diagram: KAFKA CONNECT (Source Connector, SMTs, Converter), PRODUCER, CONSUMER, KAFKA CONNECT

Slide 32

Diagram: KAFKA CONNECT (Source Connector, SMTs, Converter), PRODUCER, CONSUMER, KAFKA CONNECT (SMTs, Converter, Sink Connector)
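
The labels on this slide describe a pipeline: a source connector reads records, single message transforms (SMTs) adjust them one at a time, and a converter serializes them for the topic; the sink side reverses the steps. The following is a minimal Python sketch of that flow, not the Kafka Connect API. Every name in it (`mask_card`, `add_source`, `run_source`, `run_sink`, the `orders-db` tag, the sample row) is hypothetical, and a plain list stands in for a Kafka topic.

```python
import json
from typing import Callable, Dict, List

Record = Dict[str, object]
Transform = Callable[[Record], Record]


def mask_card(record: Record) -> Record:
    """Hypothetical SMT: hide a sensitive field before it reaches the topic."""
    return {**record, "card": "****"}


def add_source(record: Record) -> Record:
    """Hypothetical SMT: tag each record with where it came from."""
    return {**record, "source": "orders-db"}


def apply_smts(record: Record, smts: List[Transform]) -> Record:
    # SMTs see one record at a time and run in the configured order.
    for smt in smts:
        record = smt(record)
    return record


def to_bytes(record: Record) -> bytes:
    # Converter, source side: in-memory record -> bytes stored in the topic.
    return json.dumps(record, sort_keys=True).encode("utf-8")


def from_bytes(payload: bytes) -> Record:
    # Converter, sink side: bytes from the topic -> in-memory record.
    return json.loads(payload.decode("utf-8"))


def run_source(rows: List[Record], smts: List[Transform], topic: List[bytes]) -> None:
    # Source connector -> SMTs -> converter -> topic.
    for row in rows:
        topic.append(to_bytes(apply_smts(row, smts)))


def run_sink(topic: List[bytes], smts: List[Transform], store: Dict[object, Record]) -> None:
    # Topic -> converter -> SMTs -> sink connector (here: a dict keyed by "id").
    for payload in topic:
        record = apply_smts(from_bytes(payload), smts)
        store[record["id"]] = record


topic: List[bytes] = []           # stand-in for a Kafka topic
store: Dict[object, Record] = {}  # stand-in for the target system

run_source([{"id": 1, "card": "4111-1111", "amount": 25}], [mask_card, add_source], topic)
run_sink(topic, [], store)
```

Because the masking transform runs on the source side in this sketch, the original card value never reaches the topic; placing it on the sink side instead would keep the raw value in Kafka and mask it only in the target store.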

Slide 33

Discover connectors, SMTs, and converters

Slide 34

Discover connectors, SMTs, and converters
Descriptions, licensing, support, and more

Slide 35

Lower the Bar to Enter the World
Axes: Coding Sophistication, User Population
Core developers who use Java/Scala streams
Core developers who don’t use Java/Scala
Data engineers, architects, DevOps/SRE
BI analysts

Slide 36

Lower the Bar to Enter the World
Axes: Coding Sophistication, User Population
Core developers who use Java/Scala streams
Core developers who don’t use Java/Scala
Data engineers, architects, DevOps/SRE
BI analysts

Slide 37

Store and process with GridGain

Slide 38

GridGain: Real-time Streaming and Analytics

Slide 39

Essential GridGain APIs
Distributed memory-centric storage: Combines the performance and scale of in-memory computing with disk durability and strong consistency in one system
Co-located Computations: Brings the computations to the servers where the data actually resides, eliminating the need to move data over the network
Distributed SQL: Horizontally scalable, fault-tolerant distributed SQL database that treats memory and disk as active storage tiers
Distributed Key-Value: Read, write and transact with fast key-value APIs
ACID Transactions: Supports distributed ACID transactions for key-value as well as SQL operations
Machine and Deep Learning: Set of simple, scalable and efficient tools that allow building predictive machine learning models without costly data transfers (ETL)
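
To make "brings the computations to the servers where the data actually resides" concrete, here is a small Python sketch of the idea. It is not the GridGain or Ignite API: the `Node` and `Cluster` classes, the `affinity_run` method, the sensor keys, and the use of `zlib.crc32` as the key-to-node mapping are all assumptions made for illustration.

```python
import zlib
from typing import Callable, Dict, List


class Node:
    """Hypothetical server node holding one slice of the data in memory."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, List[float]] = {}

    def run(self, key: str, task: Callable[[List[float]], float]) -> float:
        # The task executes next to the data; only the result leaves the node.
        return task(self.data[key])


class Cluster:
    def __init__(self, names: List[str]) -> None:
        self.nodes = [Node(name) for name in names]

    def owner(self, key: str) -> Node:
        # Deterministic key -> node mapping. crc32 is an assumption of this
        # sketch, standing in for a real affinity function.
        index = zlib.crc32(key.encode("utf-8")) % len(self.nodes)
        return self.nodes[index]

    def put(self, key: str, values: List[float]) -> None:
        self.owner(key).data[key] = values

    def affinity_run(self, key: str, task: Callable[[List[float]], float]) -> float:
        # Route the computation to the node that owns the key.
        return self.owner(key).run(key, task)


cluster = Cluster(["node-a", "node-b", "node-c"])
cluster.put("sensor-1", [20.5, 21.0, 22.5])
cluster.put("sensor-2", [18.0, 19.0])

# Only the single averaged value crosses the "network", not the readings.
average = cluster.affinity_run("sensor-1", lambda xs: sum(xs) / len(xs))
```

The same mapping is used for writes and for computations, which is what guarantees that the task finds its data locally.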

Slide 40

GridGain SQL For Real-Time Analytics
Diagram: Ignite Node (Canada: Toronto, Montreal, Ottawa, Calgary), Ignite Node (India: New Delhi, Mumbai)

  1. Initial Query
  2. Query execution over local data
  3. Reduce multiple results in one
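
The three steps above can be sketched as a scatter-gather query in plain Python. This is an illustration only, not how GridGain executes SQL internally: the node names, the helper functions, and the placement of the city rows (taken from the slide's diagram) are assumptions, and the query modeled is a simple count of cities per country.

```python
from collections import Counter
from typing import Dict, List, Tuple

City = Tuple[str, str]  # (city name, country)

# Row placement on the two nodes is assumed from the slide's diagram.
NODES: Dict[str, List[City]] = {
    "ignite-node-1": [("Toronto", "Canada"), ("Montreal", "Canada"),
                      ("Ottawa", "Canada"), ("Calgary", "Canada")],
    "ignite-node-2": [("New Delhi", "India"), ("Mumbai", "India")],
}


def map_local(rows: List[City]) -> Counter:
    # Step 2: each node aggregates only the rows it stores.
    return Counter(country for _, country in rows)


def reduce_results(partials: List[Counter]) -> Dict[str, int]:
    # Step 3: merge the partial results into a single answer.
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


def query_cities_per_country() -> Dict[str, int]:
    # Step 1: the initial query fans out to every node.
    partials = [map_local(rows) for rows in NODES.values()]
    return reduce_results(partials)


result = query_cities_per_country()
```

Each node returns a small partial aggregate rather than its raw rows, so the amount of data moved in the reduce step depends on the number of groups, not on the table size.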

Slide 41

Thanks!
@denismagda dmagda@gridgain.com
@gamussa viktor@confluent.io

Slide 42

Q&A