Real Time Streaming Analytics Application using AWS CDK

Sunny Agrawal
Nov 15, 2021

Hello Guys,

I hope everyone is doing great.

Before getting into the main topic, let's clear up a few basic things about this project.

Data analytics is the science of analyzing raw data to draw conclusions from it. Many data analytics techniques and processes have been automated into algorithms that work over raw data and turn it into something humans can consume. Data analytics helps a business optimize its performance. Before we can analyze data, we need to store it, but the data can be so large that storing it as-is becomes inefficient. One solution is to process the data before storing it. The kind of processing depends on our requirements: for example, we can reduce the size of the data by compressing it, or we can change its format or type.
Real-time analytics, by contrast, refers to preparing and measuring data as soon as it enters the database. In other words, users get insights or can draw conclusions immediately (or very shortly) after the data enters their system. Real-time analytics allows businesses to react without delay: they can seize opportunities or prevent problems before they happen.
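As a small illustration of processing data before storing it, the sketch below serializes a batch of records as newline-delimited JSON and compresses it with gzip, then compares the sizes. The record fields are hypothetical, and the actual savings depend on how repetitive your data is.

```python
import gzip
import json


def to_ndjson(records):
    """Serialize records as newline-delimited JSON, one object per line."""
    return "".join(json.dumps(r) + "\n" for r in records).encode("utf-8")


def compress_batch(records):
    """Return (raw_bytes, gzipped_bytes) for a batch of records."""
    raw = to_ndjson(records)
    return raw, gzip.compress(raw)


# Hypothetical transaction records; repeated keys and values compress well.
batch = [
    {"account_id": f"ACC{i % 50:04d}", "amount": round(10 + (i % 97) * 1.5, 2), "currency": "USD"}
    for i in range(1000)
]
raw, packed = compress_batch(batch)
print(f"raw={len(raw)} bytes, gzip={len(packed)} bytes, ratio={len(raw) / len(packed):.1f}x")

# Decompression restores the original bytes exactly: the compression is lossless.
assert gzip.decompress(packed) == raw
```

Because gzip is lossless, the stored data can still be analyzed later at full fidelity; only the storage and transfer size shrink.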

Architecture for streaming compressed raw data into an S3 bucket

In the above architecture, real-time live data is captured and streamed using Amazon Kinesis Data Streams, and Amazon Kinesis Data Firehose then processes the captured data and delivers it to Amazon S3. Using AWS CDK gives us the ability to define the whole architecture in one place rather than visiting each service's console and configuring it individually. The raw data stored in S3 is compressed, and we can analyze it with many different tools, such as Amazon Athena and Amazon S3 Select.
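Kinesis Data Streams decides which shard receives a record by computing an MD5 hash of the record's partition key, treating it as a 128-bit integer, and finding the shard whose hash key range contains it. The sketch below reproduces that mapping locally, assuming the shards divide the hash space into equal, contiguous ranges (roughly how a newly created stream is laid out). It is an illustration of the mechanism, not a Kinesis API call, and the account-style keys are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # an MD5 digest is a 128-bit value


def partition_hash(partition_key: str) -> int:
    """MD5 of the partition key, read as an unsigned 128-bit integer."""
    return int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)


def shard_for(partition_key: str, num_shards: int) -> int:
    """Index of the shard whose hash key range contains the key.

    Assumes the shards split [0, 2**128) into equal, contiguous ranges.
    """
    return partition_hash(partition_key) * num_shards // HASH_SPACE


counts = [0] * 4
for i in range(10_000):
    counts[shard_for(f"ACC{i:05d}", 4)] += 1
print(counts)  # roughly even, about 2,500 keys per shard
```

Because the mapping is deterministic, all records with the same partition key (for example, the same account ID) land on the same shard, which is what preserves their relative order.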

We can get even more value by analyzing the data in near real time, so that we can act as soon as we find an anomaly instead of waiting for analysis on a fixed schedule.

Architecture design for capturing real-time data and analyzing it using Kinesis Data Analytics

The above architecture is implemented as follows:
The client generates real-time data, which is captured and streamed by Amazon Kinesis Data Streams. In this architecture, however, the streaming data is also analyzed in real time by Amazon Kinesis Data Analytics, so that we can find specific anomalies and act on them immediately. When Amazon Kinesis Data Analytics finds an anomaly, it triggers an AWS Lambda function. The function publishes a message to an Amazon SNS topic to alert the user, and adds an item to an Amazon DynamoDB table, which is displayed in a web UI using DynamoDB Table Viewer.
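The Lambda code itself is not shown here. As a minimal sketch, assuming the Kinesis Data Analytics (SQL) output-to-Lambda contract (an event with a records list whose data fields are base64-encoded, answered with a result of Ok or DeliveryFailed for each recordId), a handler could decode each anomaly, store it, and send an alert. The anomaly fields and the injected publish_alert and save_anomaly callables are hypothetical stand-ins for the SNS and DynamoDB calls, which keeps the sketch runnable without AWS.

```python
import base64
import json
from decimal import Decimal


def make_handler(publish_alert, save_anomaly):
    """Build a Lambda handler for Kinesis Data Analytics output records.

    publish_alert(message: str) and save_anomaly(item: dict) are hypothetical,
    injected callables so the logic runs without AWS. In a deployed function they
    could wrap boto3 calls such as sns.publish(TopicArn=..., Message=...) and
    table.put_item(Item=...).
    """

    def handler(event, context=None):
        results = []
        for record in event["records"]:
            try:
                # parse_float=Decimal because DynamoDB (via boto3) rejects Python floats.
                anomaly = json.loads(base64.b64decode(record["data"]), parse_float=Decimal)
                save_anomaly(anomaly)
                publish_alert("Anomalous transaction detected: " + json.dumps(anomaly, default=str))
                results.append({"recordId": record["recordId"], "result": "Ok"})
            except Exception:
                # Report this record back to Kinesis Data Analytics as a delivery failure.
                results.append({"recordId": record["recordId"], "result": "DeliveryFailed"})
        return {"records": results}

    return handler


# Local usage with in-memory fakes instead of SNS and DynamoDB.
alerts, stored = [], []
handler = make_handler(alerts.append, stored.append)
payload = base64.b64encode(json.dumps({"account_id": "ACC0007", "amount": 9500.0}).encode()).decode()
print(handler({"records": [{"recordId": "r1", "data": payload}]}))
# {'records': [{'recordId': 'r1', 'result': 'Ok'}]}
```

Injecting the two side effects keeps the decoding and acknowledgement logic testable on its own; the AWS clients only need to be wired in at deployment time.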

Let's see the working demo of the above architecture.

Running producer.py to generate fake data

For this demo, we use a producer.py script that generates fake bank transaction data, so that we can process a large volume of bank transactions and analyze them for anomalies.
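The producer.py script itself is not reproduced here. A minimal sketch of what such a producer could look like is below; the transaction fields, the seed, and the stream name in the commented boto3 call are all hypothetical. The send callable is injected so the generator can be dry-run locally; against a real stream it could wrap the boto3 Kinesis client's put_record(StreamName=..., Data=..., PartitionKey=...).

```python
import json
import random
import uuid
from datetime import datetime, timezone


def fake_transaction(rng: random.Random) -> dict:
    """One hypothetical bank transaction; the field names are illustrative only."""
    return {
        "transaction_id": str(uuid.UUID(int=rng.getrandbits(128), version=4)),
        "account_id": f"ACC{rng.randint(1, 500):04d}",
        "amount": round(rng.uniform(1, 5000), 2),
        "type": rng.choice(["DEBIT", "CREDIT"]),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def produce(send, count: int, seed: int = 42) -> None:
    """Generate `count` transactions and hand each to send(data, partition_key)."""
    rng = random.Random(seed)
    for _ in range(count):
        txn = fake_transaction(rng)
        # A trailing newline keeps records separable after Firehose concatenates them in S3.
        data = (json.dumps(txn) + "\n").encode("utf-8")
        send(data, txn["account_id"])


# Local dry run: collect records in a list instead of sending them to Kinesis.
sent = []
produce(lambda data, key: sent.append((data, key)), count=5)
print(len(sent))  # 5

# Against a real stream (the stream name is a placeholder):
# import boto3
# kinesis = boto3.client("kinesis")
# produce(lambda data, key: kinesis.put_record(
#     StreamName="transactions-stream", Data=data, PartitionKey=key), count=1000)
```

Using the account ID as the partition key keeps each account's transactions in order on one shard, which matters for per-account anomaly rules.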

Formatted and compressed data stored in Amazon S3

Here we have the compressed and formatted data stored in Amazon S3 for later analysis. This data is streamed by Amazon Kinesis Data Streams and processed by Amazon Kinesis Data Firehose.
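By default, Kinesis Data Firehose writes objects under a UTC time prefix of the form YYYY/MM/dd/HH/, which makes date-based queries in tools like Athena convenient. The sketch below computes that default prefix for a timestamp and reads back a gzip-compressed object of newline-delimited JSON. It assumes the producer appended a newline to each record; the sample records are hypothetical, and the S3 download itself is left out.

```python
import gzip
import json
from datetime import datetime, timezone


def default_firehose_prefix(ts: datetime) -> str:
    """Firehose's default S3 key prefix: the UTC hour as YYYY/MM/dd/HH/."""
    return ts.astimezone(timezone.utc).strftime("%Y/%m/%d/%H/")


def read_gzip_ndjson(blob: bytes) -> list:
    """Decompress a GZIP object and parse one JSON record per non-empty line."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line.strip()]


print(default_firehose_prefix(datetime(2021, 11, 15, 9, 30, tzinfo=timezone.utc)))
# 2021/11/15/09/

sample = gzip.compress(b'{"account_id": "ACC0001", "amount": 12.5}\n{"account_id": "ACC0002", "amount": 99.0}\n')
print(read_gzip_ndjson(sample))
# [{'account_id': 'ACC0001', 'amount': 12.5}, {'account_id': 'ACC0002', 'amount': 99.0}]
```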

Abnormality detector that analyzes the data for anomalies

The above image shows the abnormality detector created in Amazon Kinesis Data Analytics, which analyzes the streaming data in real time and detects specific conditions or anomalies in it. This lets the consumer react to an issue very quickly.
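The exact condition the abnormality detector checks is not spelled out here. In Kinesis Data Analytics (SQL), this kind of logic is usually written as a windowed query over an in-application stream, or scored with the built-in RANDOM_CUT_FOREST function. As a hedged, local illustration only, the Python sketch below applies two hypothetical rules to each transaction: an amount above a fixed limit, and more than a set number of transactions from one account inside a sliding time window. The class name and thresholds are made up for the example.

```python
from collections import defaultdict, deque


class AbnormalityDetector:
    """Hypothetical rules: one very large amount, or too many transactions per account in a window."""

    def __init__(self, amount_limit=10_000.0, max_txns=5, window_seconds=60):
        self.amount_limit = amount_limit
        self.max_txns = max_txns
        self.window_seconds = window_seconds
        self._recent = defaultdict(deque)  # account_id -> timestamps still inside the window

    def check(self, account_id: str, amount: float, ts: float) -> list:
        """Return the names of the rules this transaction violates (empty if it looks normal)."""
        reasons = []
        if amount > self.amount_limit:
            reasons.append("large_amount")
        window = self._recent[account_id]
        window.append(ts)
        # Evict timestamps that have slid out of the window (assumes in-order arrival).
        while window and window[0] <= ts - self.window_seconds:
            window.popleft()
        if len(window) > self.max_txns:
            reasons.append("high_frequency")
        return reasons


detector = AbnormalityDetector()
print(detector.check("ACC0001", 50.0, ts=0))      # []
print(detector.check("ACC0001", 12_000.0, ts=1))  # ['large_amount']
for t in range(2, 5):
    detector.check("ACC0001", 20.0, ts=t)
print(detector.check("ACC0001", 20.0, ts=5))      # ['high_frequency']
```

Keeping a per-account deque makes each check proportional only to the number of recent transactions for that account, which mirrors how a windowed streaming query bounds its state.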

The real-time analyzed data stored in DynamoDB

To notify the consumer about anomalous transactions so that they can react instantly, we use Amazon SNS. The consumer subscribes to the notifications, confirms the subscription through the email they receive, and then receives all future notifications.

Subscription confirmation for receiving the notification

Here is a demo email that Amazon SNS sends to the consumer's confirmed email address when an anomaly is detected.

Demo notification received when an anomaly is detected

To view the data analyzed by Amazon Kinesis Data Analytics and stored in Amazon DynamoDB, we use a browser view. It shows the user the real-time records that contain abnormalities, making them easy to understand and quick to respond to.

Web UI showing the real-time data that contains anomalies

Hope you found this useful.

Thank you for reading our blog.

Conclusion

In this project, we explored how the AWS Cloud Development Kit (AWS CDK) deploys different services through AWS CloudFormation, and how we can integrate those services into a complete architecture that is built and deployed entirely through the CDK, without even visiting the AWS Management Console.

Thank you for reading. Hope you enjoyed our project.
The contributors to this project are:
Sunny Agrawal and Aghera Shyamkumar Bhupendrakumar.

GitHub Links:
https://github.com/SunnyAgrawal1208/RealTimeStreamingApplication
https://github.com/agherashyam2000/RealTimeStreamingApplication
