P.S: Jump to Repo
In the realm of digital applications, user reviews are a gold mine of insights. Extracting these reviews and analyzing them can empower businesses to refine their app features, fix bugs, and improve user experience. Our project is a manifestation of this concept, focusing on extracting reviews from the Play Store and enabling possibilities for further expansion to other platforms like the Apple Store.
The core of our project is scraping reviews from the Play Store. However, it’s designed with a generic interface, paving the way for integration with various sources such as the Apple Store. Users can add the company ID to the database, and the system automatically extracts reviews for that company as per the procedures outlined in the README.
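To make the idea of a generic interface concrete, here is a minimal sketch in Python. It is not taken from the repository: the names `Review`, `ReviewSource`, `InMemorySource`, and `extract_all` are hypothetical, and the in-memory source only stands in for a real Play Store or Apple Store scraper.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass(frozen=True)
class Review:
    """One review in a store-independent shape (hypothetical fields)."""
    review_id: str
    company_id: str
    source: str
    rating: int
    text: str
    posted_at: datetime


class ReviewSource(ABC):
    """Anything that can return reviews for a company ID."""
    name: str = "unknown"

    @abstractmethod
    def fetch_reviews(self, company_id: str) -> Iterable[Review]:
        """Return the reviews this source has for the given company."""


class InMemorySource(ReviewSource):
    """Stand-in source for illustration; a real one would call a store scraper."""
    name = "in_memory"

    def __init__(self, reviews: Iterable[Review]) -> None:
        self._reviews = list(reviews)

    def fetch_reviews(self, company_id: str) -> List[Review]:
        return [r for r in self._reviews if r.company_id == company_id]


def extract_all(sources: Iterable[ReviewSource],
                company_ids: Iterable[str]) -> List[Review]:
    """Collect reviews for every company from every registered source."""
    company_ids = list(company_ids)  # allow reuse across sources
    collected: List[Review] = []
    for source in sources:
        for company_id in company_ids:
            collected.extend(source.fetch_reviews(company_id))
    return collected
```

With this shape, supporting a new store means writing one more `ReviewSource` subclass; the extraction loop itself does not change.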
To accommodate our business requirements, we have crafted a scalable and robust architecture using a variety of AWS services. The tech stack includes ECS, Fargate, CloudFormation, Docker, ECR Repository, and CloudWatch Events Rule. Here’s a brief overview:
ECS and Fargate: The backbone of our project, enabling us to run Docker containers in a scalable way without managing the underlying infrastructure.
Docker Image and ECR Repository: Docker packages our application with all its dependencies, and ECR Repository hosts these Docker images securely.
CloudFormation: It brings all the pieces together, allowing us to define and provision AWS infrastructure resources consistently and repeatably.
CloudWatch Events Rule: This service schedules the execution of our ETL job, ensuring it runs daily to fetch the most recent reviews.
This architecture provides scalability, reliability, and automation. The combination of ECS with Fargate allows us to manage containers easily and scale according to demand. Docker and ECR offer a consistent environment and secure hosting, while CloudFormation ensures that our resources are managed effectively. Lastly, CloudWatch automates our ETL pipeline, reducing manual intervention and the risk of errors.
Also, in general we don’t need the reviews in real time. By embracing batch processing, each run extracts only the reviews recently added to the Play Store.
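One common way to pick up only the recent reviews in a batch job is to remember the timestamp of the newest review already stored (a "watermark") and keep only reviews posted after it. The sketch below assumes that approach; the repository may detect new reviews differently, and the `posted_at` field and both function names are hypothetical.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Optional


def select_new_reviews(reviews: Iterable[Dict],
                       last_seen: Optional[datetime]) -> List[Dict]:
    """Keep only reviews posted strictly after the stored watermark."""
    if last_seen is None:
        # First run for this company: everything counts as new.
        return list(reviews)
    return [r for r in reviews if r["posted_at"] > last_seen]


def next_watermark(reviews: Iterable[Dict],
                   last_seen: Optional[datetime]) -> Optional[datetime]:
    """Return the watermark to store after this batch has been processed."""
    newest = max((r["posted_at"] for r in reviews), default=None)
    if newest is None:
        return last_seen  # empty batch: keep the old watermark
    if last_seen is None:
        return newest
    return max(newest, last_seen)
```

A daily run would load the watermark per company, call `select_new_reviews` on the fetched batch, write the result to the database, and then store `next_watermark`.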
Using AWS CloudFormation, we define a template that describes the resources needed for our project. This includes ECS Task Definitions, Fargate Services, ECR Repository, and CloudWatch Events Rule, along with necessary IAM roles and policies.
Our template also securely manages sensitive information like MongoDB credentials through AWS Secrets Manager, ensuring they are accessible to the Fargate tasks but not exposed.
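On the container side, ECS can inject a Secrets Manager value into the task as an environment variable, so the application only has to read it. The sketch below assumes that setup; the variable name `MONGODB_URI` and both helper functions are hypothetical and not taken from the repository.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlsplit, urlunsplit


def load_mongo_uri(env: Optional[Mapping[str, str]] = None,
                   var_name: str = "MONGODB_URI") -> str:
    """Read the connection string injected into the container environment."""
    env = os.environ if env is None else env
    uri = env.get(var_name)
    if not uri:
        raise RuntimeError(f"{var_name} is not set; check the task's secret mapping")
    return uri


def redact(uri: str) -> str:
    """Return the URI with any user:password part masked, safe for logging."""
    parts = urlsplit(uri)
    has_credentials = "@" in parts.netloc
    host = parts.netloc.rpartition("@")[2]
    netloc = f"***@{host}" if has_credentials else host
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

Logging `redact(load_mongo_uri())` instead of the raw value keeps the credentials out of CloudWatch logs.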
Once the template is deployed, CloudFormation provisions and configures the resources in the correct order, integrating them to form a cohesive, automated ETL pipeline. The pipeline is triggered daily by CloudWatch Events Rule, fetching the latest reviews from the Play Store for all companies in our database.
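To give a feel for how the pieces connect, here is a heavily trimmed sketch that builds a CloudFormation template as a Python dictionary and prints it as JSON. It is an illustration under assumptions, not the repository's template: all logical IDs, parameter names, the container name, and the CPU and memory sizes are made up, the IAM roles are passed in as parameters rather than defined, and it schedules a one-off Fargate task rather than a long-running service.

```python
import json


def build_template(schedule: str = "rate(1 day)") -> dict:
    """Return a minimal template: cluster, Fargate task, and daily schedule."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {  # masked values the reader supplies (hypothetical names)
            "ImageUri": {"Type": "String"},
            "SubnetId": {"Type": "AWS::EC2::Subnet::Id"},
            "ExecutionRoleArn": {"Type": "String"},
            "EventsRoleArn": {"Type": "String"},
            "MongoSecretArn": {"Type": "String"},
        },
        "Resources": {
            "Cluster": {"Type": "AWS::ECS::Cluster"},
            "EtlTask": {
                "Type": "AWS::ECS::TaskDefinition",
                "Properties": {
                    "RequiresCompatibilities": ["FARGATE"],
                    "NetworkMode": "awsvpc",
                    "Cpu": "256",
                    "Memory": "512",
                    "ExecutionRoleArn": {"Ref": "ExecutionRoleArn"},
                    "ContainerDefinitions": [{
                        "Name": "review-etl",
                        "Image": {"Ref": "ImageUri"},
                        # Secrets Manager value exposed as an environment variable
                        "Secrets": [{"Name": "MONGODB_URI",
                                     "ValueFrom": {"Ref": "MongoSecretArn"}}],
                    }],
                },
            },
            "DailyRule": {
                "Type": "AWS::Events::Rule",
                "Properties": {
                    "ScheduleExpression": schedule,
                    "State": "ENABLED",
                    "Targets": [{
                        "Id": "run-review-etl",
                        "Arn": {"Fn::GetAtt": ["Cluster", "Arn"]},
                        "RoleArn": {"Ref": "EventsRoleArn"},
                        "EcsParameters": {
                            "TaskDefinitionArn": {"Ref": "EtlTask"},
                            "LaunchType": "FARGATE",
                            "TaskCount": 1,
                            "NetworkConfiguration": {"AwsVpcConfiguration": {
                                "Subnets": [{"Ref": "SubnetId"}],
                                "AssignPublicIp": "ENABLED",
                            }},
                        },
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_template(), indent=2))
```

The references between resources (`Ref`, `Fn::GetAtt`) are what let CloudFormation work out the provisioning order: the rule depends on the cluster and the task definition, so those are created first.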
Please visit the project repository for a quick look at the code. Note that actual values such as the ECR repository address and subnet IDs are masked, so you need to fill in your own. For that reason, the project will not work without additional configuration, such as setting your MongoDB username and connection and your ECR repository.
Our project embodies the synergy of various AWS services to create a scalable and automated solution for extracting app reviews. Of course, this is still a basic setup. But it is a nice exercise for data engineers to see how to combine various cloud technologies into almost-production-ready code :D
Happy Coding!