AWS, Snowflake, Spring, Spark, Jenkins
Our client is a global education and technology company with 12 million students around the globe. Their portfolio comprises numerous educational platforms with programs for students from elementary school to university.
With this many users, the platforms combined generated enormous amounts of data. The test submissions that needed to be processed and evaluated alone numbered millions daily. The client therefore needed a comprehensive solution that would assess student work automatically and in real time, generate a variety of reports, and collect and store data in a data warehouse for future analytics and business insights.
NIX had been developing software solutions for this client for many years. Knowing our experience in the education domain and our deep expertise in creating big data solutions, the client approached us with a request to develop a data-as-a-service (DaaS) platform and integrate it into their product ecosystem.
This project consisted of three main components:
A data pipeline to export data both from the internal database and from external sources to the analytics system
The analytics core module, which checks and scores a student’s work and then returns the result to the relevant education platform
A report processor that will generate numerous real-time and scheduled reports for students, teachers, administration, etc.
Given that the client’s entire educational ecosystem had been deployed on AWS, the DaaS solution was also built on AWS, using the Spring framework and Spark jobs.
To support such a complex system, our engineers created 155 services that keep all DaaS components running, namely the data pipelines, the analytics core, and the report generation module:
33 Spring services
30 Spark Jobs
59 AWS Lambda functions
33 Amazon API Gateway
One of the fundamental phases of the project was building data pipelines that collect data for two purposes. The first is the creation of historical storage. The second is supplying the required data to an analytical BI solution built on Snowflake and Looker.
These pipelines allow us to analyze the behavioral patterns of users when they interact with platform features and to identify opportunities to improve those features. To collect the data, our team used Amazon API Gateway and Kinesis Data Firehose, which allowed us to build a pipeline with an average throughput of 40 GB per day.
One more crucial pipeline exports data from MongoDB and an S3 bucket to the data warehouse. This pipeline is based on Spark, which reads data from the database and cloud storage and sends it either to report generation or to Snowflake, from where it goes on to analysis and visualization.
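To put the 40 GB per day figure in perspective, it can be converted into an average per-second rate. The sketch below assumes decimal gigabytes (1 GB = 10^9 bytes) and a perfectly even load over 24 hours. The source states neither, and real traffic on a learning platform is likely to be bursty, so peak rates would be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

def average_rate_mb_per_s(gb_per_day: float) -> float:
    """Convert a daily volume in decimal GB into an average rate in MB/s."""
    bytes_per_day = gb_per_day * 1e9          # assumption: 1 GB = 10^9 bytes
    return bytes_per_day / SECONDS_PER_DAY / 1e6

rate = average_rate_mb_per_s(40)
print(f"{rate:.2f} MB/s")  # prints "0.46 MB/s"
```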
The data from the learning platforms comes into the analytics core via the API. It is immediately saved in S3, which ensures that no input data is lost and sidesteps the SQS queue limits.
Once the input data has been stored in S3, the system sends a message to an SQS queue. The Analytics Persistor picks up this message, pulls the object from S3, and scores it. It also saves all the data it has received into the database.
We used MongoDB as the database and MongoDB Atlas for its deployment, management, and monitoring.
After scoring, the system sends messages to downstream systems, informing them that new data has been received and successfully processed. These systems include the report generation processors, discussed below, and the Event Publisher. The latter receives the message and passes it on in a format the learning platform understands, so that the result can be displayed in the UI or processed by the platform’s services.
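The flow described here (store the payload first, then pass only a reference through the queue) can be sketched as follows. This is a minimal in-memory illustration, not the project’s code: the dictionary and deque stand in for S3 and SQS, and every name, including the scoring rule, is a hypothetical placeholder.

```python
import json
import uuid
from collections import deque

# In-memory stand-ins for S3, SQS, and the database (all hypothetical).
object_store = {}
queue = deque()
database = []

def receive_submission(payload: bytes) -> str:
    """Persist the raw payload first, then enqueue only a small pointer to it."""
    key = f"submissions/{uuid.uuid4()}.json"
    object_store[key] = payload              # stored before any processing starts
    queue.append(json.dumps({"key": key}))   # message size no longer depends on payload size
    return key

def score(submission: dict) -> int:
    """Placeholder scoring rule: one point per correct answer."""
    return sum(1 for answer in submission["answers"] if answer["correct"])

def persistor_poll() -> None:
    """Drain the queue: fetch each object by key, score it, and save the result."""
    while queue:
        message = json.loads(queue.popleft())
        submission = json.loads(object_store[message["key"]])
        database.append({
            "key": message["key"],
            "student": submission["student"],
            "score": score(submission),
        })

key = receive_submission(json.dumps({
    "student": "s-1",
    "answers": [{"correct": True}, {"correct": False}, {"correct": True}],
}).encode())
persistor_poll()
print(database[0]["score"])  # prints 2
```

Because the queue carries only the object key, a large submission never has to fit inside a queue message, and the raw input remains available for reprocessing if scoring fails.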
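The notification step can be pictured as a simple publish/subscribe fan-out. The source does not say which mechanism carries these messages, so the `Topic` class below is an assumption made for illustration, as are the handler names and the event fields.

```python
class Topic:
    """Minimal in-process publish/subscribe topic (hypothetical stand-in)."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, event: dict) -> None:
        # Every subscriber receives every event.
        for handler in self._subscribers:
            handler(event)

report_inbox = []
ui_inbox = []

def report_processor(event: dict) -> None:
    """Records that a scored submission is ready to be used in reports."""
    report_inbox.append(event["submission_id"])

def event_publisher(event: dict) -> None:
    """Translates the internal event into a shape assumed for the learning platform."""
    ui_inbox.append({
        "type": "SCORE_READY",
        "submissionId": event["submission_id"],
        "score": event["score"],
    })

scored = Topic()
scored.subscribe(report_processor)
scored.subscribe(event_publisher)
scored.publish({"submission_id": "sub-42", "score": 87})
print(ui_inbox[0]["type"])  # prints SCORE_READY
```

Keeping the scoring component unaware of its consumers means a new downstream system can be added by subscribing to the topic, without touching the analytics core.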
The developed solution can generate 17 reports for different types of users (teachers, students, and school and university administrators), which allows for evaluating students’ success, identifying topics where students have the most significant difficulties, and providing insights to improve the process and the quality of learning.
Various reports require different amounts of information to be processed. For example, it may be a report on a single student or the progress of an entire class. Also, for some users, the relevance of the information is critical. Considering these factors, our engineers used several approaches to generate the reports. This also allowed us to optimize the cost and overall load on the system.
One example is a real-time report that shows students the evaluation of their work. It takes only a few seconds to generate. Because the analytics core notifies downstream systems when a test has been processed, the system collects and stores all the data needed for the report in the database ahead of time. The report generator module then simply waits for the user’s request. We use Spring as one of the main technologies for generating these reports.
We created two types of scheduled reports: those with a single data source and those with multiple data sources. While the first type is built on Spring, just like real-time reports, scheduled reports with multiple data sources have a much more complex structure.
One use case is a report showing how much time different students spend on particular courses. For such cases, our engineers developed a custom solution based on distributed data processing: Spark jobs run on an EMR cluster and are launched by Jenkins according to a set schedule. The Spark jobs extract the required data, aggregate it, and save the result to an S3 bucket, where it waits for a request from users.
Our engineers never hesitate to go the extra mile to deliver better solutions. For example, several bulky reports would otherwise have required processing gigabytes of data each time they were generated. By building custom data models for these reports, we significantly reduced the volume of data involved: the system now processes 3 MB instead of 2 GB, which takes only a few seconds.
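The aggregation at the heart of such a report is a group-and-sum over activity records. The sketch below shows that logic in plain Python rather than Spark so it can run anywhere; the record layout (student, course, minutes) and the sample values are assumptions for illustration, not the client’s schema.

```python
from collections import defaultdict

# Hypothetical activity records: (student_id, course_id, minutes_spent)
events = [
    ("s-1", "algebra", 30),
    ("s-1", "algebra", 15),
    ("s-2", "algebra", 20),
    ("s-1", "biology", 40),
]

def total_minutes(records):
    """Sum the minutes spent for each (student, course) pair."""
    totals = defaultdict(int)
    for student, course, minutes in records:
        totals[(student, course)] += minutes
    return dict(totals)

report = total_minutes(events)
print(report[("s-1", "algebra")])  # prints 45
```

In a distributed job the same operation would be expressed as a group-by on student and course followed by a sum, with the result written out once so that user requests read the small aggregate instead of the raw events.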
As you can see, we used Jenkins for data orchestration and deployment of the entire environment. Long story short, our deployment process looks like this:
[Diagram: AWS Configuration deployed by Jenkins through the Analytics AWS tool to AWS Cloud]
We developed custom templates for AWS Configuration, ensuring that only updated configurations are deployed and that Spark jobs run at specific times. Moreover, we created custom templates for services like S3, Lambda, Firehose, SQS, SNS, and many more.
The AWS-based data analytics solution, built on top of Snowflake and Looker, helped the client compose a holistic picture of its products and services. Moreover, it dramatically increased the value that the system provides to users. For example, real-time reports improve the student experience, which has become one of the competitive edges among educational platforms. As for other users, business insights from big data are a crucial tool that helps administrations discover more options for service improvement, while teachers can use analytics to enhance course content, for example by taking a more personalized approach to each student.
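One common way to deploy only what has changed is to compare a fingerprint of each desired configuration with the fingerprint recorded at the last deployment. The source does not describe how the templates detect changes, so the hashing approach, function names, and resource definitions below are all assumptions.

```python
import hashlib
import json

def fingerprint(config: dict) -> str:
    """Stable SHA-256 hash of a configuration, independent of key order."""
    canonical = json.dumps(config, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

def changed_configs(desired: dict, deployed_hashes: dict) -> list:
    """Return the names of resources whose configuration is new or differs."""
    return sorted(
        name for name, config in desired.items()
        if deployed_hashes.get(name) != fingerprint(config)
    )

# Hypothetical resources: the queue's timeout changed, the bucket did not.
desired = {
    "scoring-queue": {"type": "sqs", "visibility_timeout": 60},
    "raw-bucket": {"type": "s3", "versioning": True},
}
deployed_hashes = {
    "scoring-queue": fingerprint({"type": "sqs", "visibility_timeout": 30}),
    "raw-bucket": fingerprint({"type": "s3", "versioning": True}),
}
print(changed_configs(desired, deployed_hashes))  # prints ['scoring-queue']
```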
7 experts (4 Java/Scala data engineers, System analyst, UI designer, Project manager)
AWS, Snowflake, MongoDB, Jenkins, AWS Kinesis, Apache Camel, AMQ, Elastic Beanstalk, S3, Kinesis Data Firehose, EMR, Fargate, CloudFront, Spring, Spark