EdCast’s AI-powered Knowledge Cloud is a multi-tenant platform that runs on AWS and tracks user activity in real time to give customers greater insight. The platform generates millions of events per day. The challenge is to clean the data, combine the real-time events from multiple sources with metadata from multiple metadata stores, and make the result available for reporting and data science use cases.
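To make the cleaning and combining step concrete, here is a minimal sketch in plain Python. It is illustrative only: the field names (`tenant_id`, `user_id`, `event_type`, `timestamp`, `department`) and both helper functions are assumptions, since the actual event schema and metadata stores are not described here.

```python
from typing import Iterable, Iterator

# Hypothetical required fields; the real event schema is not given in the text.
REQUIRED_FIELDS = ("tenant_id", "user_id", "event_type", "timestamp")


def clean_events(events: Iterable[dict]) -> Iterator[dict]:
    """Drop events with a missing required field and normalize event_type."""
    for event in events:
        if any(event.get(field) in (None, "") for field in REQUIRED_FIELDS):
            continue  # incomplete record: skip it
        cleaned = dict(event)  # copy so the caller's record is not mutated
        cleaned["event_type"] = str(event["event_type"]).strip().lower()
        yield cleaned


def enrich_events(events: Iterable[dict], user_metadata: dict) -> Iterator[dict]:
    """Left-join each event with metadata keyed by (tenant_id, user_id)."""
    for event in events:
        key = (event["tenant_id"], event["user_id"])
        # Unmatched events are kept as-is; on a name clash the event field wins.
        yield {**user_metadata.get(key, {}), **event}
```

Keying the metadata by tenant as well as user keeps records from different tenants apart, which matters on a multi-tenant platform.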
Built a complete ETL pipeline and data warehouse using AWS Glue and Amazon S3. The pipeline extracts data from multiple data sources, cleans it, loads it into Amazon S3, and creates a data catalog in AWS Glue so the data can be queried with Amazon Athena. Built a highly scalable, comprehensive, and secure query engine on top of AWS Lambda and exposed the data through Amazon API Gateway.
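The query path described above (API Gateway to Lambda to Athena) might look roughly like the following sketch. It is not the actual implementation: the table name `user_events`, the database and output-location defaults, the environment variable names, and the idea of reading `tenant_id` from a Lambda authorizer context are all assumptions made for illustration. Only `start_query_execution` is a real boto3 Athena call.

```python
import json
import os
import re

# Hypothetical configuration; none of these values come from the source.
DATABASE = os.environ.get("ATHENA_DATABASE", "analytics")
OUTPUT_LOCATION = os.environ.get("ATHENA_OUTPUT", "s3://example-athena-results/")
TENANT_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def build_query(tenant_id: str, limit: int = 100) -> str:
    """Return a SELECT restricted to one tenant, rejecting unsafe input."""
    if not TENANT_ID_PATTERN.fullmatch(tenant_id):
        raise ValueError("invalid tenant id")
    if not 1 <= limit <= 1000:
        raise ValueError("limit out of range")
    # Interpolation is safe only because tenant_id passed the allowlist above.
    return (
        "SELECT event_type, COUNT(*) AS events "
        f"FROM user_events WHERE tenant_id = '{tenant_id}' "
        f"GROUP BY event_type LIMIT {limit}"
    )


def handler(event, context, athena=None):
    """Lambda proxy entry point: start a tenant-scoped Athena query."""
    if athena is None:
        import boto3  # imported lazily so the logic can be tested offline
        athena = boto3.client("athena")
    # Assumes a Lambda authorizer placed tenant_id in the request context.
    authorizer = (event.get("requestContext") or {}).get("authorizer") or {}
    try:
        query = build_query(str(authorizer.get("tenant_id", "")))
    except ValueError as exc:
        return {"statusCode": 400, "body": json.dumps({"error": str(exc)})}
    response = athena.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": DATABASE},
        ResultConfiguration={"OutputLocation": OUTPUT_LOCATION},
    )
    return {
        "statusCode": 202,
        "body": json.dumps({"query_execution_id": response["QueryExecutionId"]}),
    }
```

Taking the tenant from the authorizer context rather than from a caller-supplied parameter is one way to keep tenants from reading each other's data. The handler only starts the query and returns its ID; fetching results would need a separate polling step that is not shown here.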