Management Platform for ETL Pipelines with Predictive Models and Algorithms for the Healthcare Domain

NIX developed a powerful web platform leveraging ML and Big Data to empower healthcare providers and insurers with advanced analytics and predictive capabilities, driving informed decision-making and improved patient outcomes.

Business Domain
Healthcare

Service
Data Engineering, ETL/ELT, Web Development

Technologies
Python, Java

Business Overview

Our client is a large company providing various software solutions in the healthcare sector. They wanted to update their product and came to NIX with two issues:

Difficulty training different analytics models and building algorithms based on patients’ historical data.
The need for a platform to manage, orchestrate, and run all machine learning (ML) models.

The platform had to be deployable on-premises, to any cloud service provider, or to bare-metal servers, depending on the customer’s preferences. It also had to be available as a SaaS solution, relieving customers of hosting and managing the platform themselves.

Project Scope

01
Maintain the platform’s ability to be deployed to any cloud provider (AWS, Azure, Google Cloud, IBM Cloud) or run on-premises with minimal effort.
02
Develop the solution in compliance with the customer’s internal policies and HIPAA, as the software is intended for healthcare organizations.
03
Build an orchestration and management system that runs ML models as containerized applications on a Kubernetes cluster, scales horizontally, and integrates easily.
04
Build a system that supports multi-tenancy and guarantees the security of users’ data.
05
Scale the computing layer horizontally to process massive datasets within a short time.
06
Optimize the quality and performance of the new ML models and algorithms based on prototypes, working closely with a subject matter expert (SME) on the customer’s side.
07
Rewrite some legacy algorithms in Spark so they can process big data in a timely fashion.
08
Build infrastructure to train the models and keep them up to date (quality monitoring and retraining on newer historical data).
09
Build reusable components for the ETL pipelines.
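Item 09 calls for reusable ETL components. One way to structure them, sketched here under our own assumptions, is to make each step a small function that takes and returns a list of records and to let a pipeline object chain those steps. In the platform the equivalent steps would run on Spark DataFrames; plain Python lists stand in for them here, and the names `Pipeline`, `drop_missing`, `normalize_code`, and the `dx_code` field are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]
Step = Callable[[List[Record]], List[Record]]


def drop_missing(*required: str) -> Step:
    """Reusable filter step: keep only records that have every required field set."""
    def step(records: List[Record]) -> List[Record]:
        return [r for r in records if all(r.get(k) is not None for k in required)]
    return step


def normalize_code(field_name: str) -> Step:
    """Reusable cleanup step: trim, upper-case, and drop dots from a code field."""
    def step(records: List[Record]) -> List[Record]:
        return [
            {**r, field_name: str(r[field_name]).strip().upper().replace(".", "")}
            for r in records
        ]
    return step


@dataclass
class Pipeline:
    """Chains independent steps so the same step can be reused in many pipelines."""
    steps: List[Step] = field(default_factory=list)

    def then(self, step: Step) -> "Pipeline":
        # Return a new pipeline instead of mutating this one, so shared
        # base pipelines can be extended safely.
        return Pipeline(self.steps + [step])

    def run(self, records: Iterable[Record]) -> List[Record]:
        data = list(records)
        for step in self.steps:
            data = step(data)
        return data


if __name__ == "__main__":
    cleaning = (
        Pipeline()
        .then(drop_missing("patient_id", "dx_code"))
        .then(normalize_code("dx_code"))
    )
    rows = [
        {"patient_id": 1, "dx_code": " e11.9 "},
        {"patient_id": 2, "dx_code": None},
    ]
    print(cleaning.run(rows))  # [{'patient_id': 1, 'dx_code': 'E119'}]
```

Ordering matters in this design: the filter runs first so the normalization step never sees a missing code.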

Solution

Our engineers developed the platform according to all of the client’s requirements. The solution consists of three main parts:

Microservices, deployable to a Kubernetes cluster, that handle orchestration, management, and validation and expose this functionality to end users through a REST API.
Spark/PySpark applications (ML algorithms, or pre-trained models integrated into a Spark application for scoring alongside the ETL pipeline) that can be deployed and integrated into the system. These applications are submitted for execution through the Livy REST API and can run on Spark or YARN-based clusters (and, in the future, on Kubernetes clusters).
Scripts that deploy a YARN cluster on dedicated servers, so end users can quickly get computing power when no other Spark or YARN services are available.
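The Livy submission path can be illustrated with a minimal sketch against Apache Livy’s batch endpoints: `POST /batches` submits an application, and `GET /batches/{batchId}/state` reports its state. The host name, HDFS paths, and job name are hypothetical, and authentication, retries, and error handling are omitted.

```python
import json
import urllib.request
from typing import Dict, List, Optional

# Hypothetical endpoint; 8998 is Livy's default port.
LIVY_URL = "http://livy.example.internal:8998"


def build_batch_payload(
    app_file: str,
    args: List[str],
    class_name: Optional[str] = None,
    conf: Optional[Dict[str, str]] = None,
    name: Optional[str] = None,
) -> Dict[str, object]:
    """Build the JSON body for Livy's POST /batches endpoint."""
    payload: Dict[str, object] = {"file": app_file, "args": args}
    if class_name:  # needed for Java/Scala JARs, omitted for PySpark scripts
        payload["className"] = class_name
    if conf:  # extra Spark configuration properties
        payload["conf"] = conf
    if name:
        payload["name"] = name
    return payload


def submit_batch(payload: Dict[str, object], livy_url: str = LIVY_URL) -> int:
    """POST the payload to Livy and return the id of the created batch."""
    request = urllib.request.Request(
        f"{livy_url}/batches",
        data=json.dumps(payload).encode("utf-8"),
        # X-Requested-By is required when Livy's CSRF protection is enabled.
        headers={"Content-Type": "application/json", "X-Requested-By": "platform"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["id"]


def batch_state(batch_id: int, livy_url: str = LIVY_URL) -> str:
    """Return the batch state, e.g. 'running', 'success', or 'dead'."""
    with urllib.request.urlopen(f"{livy_url}/batches/{batch_id}/state") as response:
        return json.load(response)["state"]


if __name__ == "__main__":
    # Only builds and prints the request body; no network call is made here.
    payload = build_batch_payload(
        "hdfs:///apps/mortality_risk.py",  # hypothetical PySpark job
        ["--input", "hdfs:///data/claims", "--output", "hdfs:///results/risk"],
        conf={"spark.executor.instances": "4"},
        name="mortality-risk-scoring",
    )
    print(json.dumps(payload, indent=2))
```

A caller would pass the payload to `submit_batch` and then poll `batch_state` until the batch reaches a terminal state such as `success` or `dead`.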

Patient data, such as diagnoses, procedures, and prescriptions, is submitted to the system through the REST API. The platform then processes this data with an ML model or methodology, together with the required ETL logic, written in Spark using its Java and Python APIs. The model’s output is a large data file that customers can use for further processing.
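The flow described above (patient events in, ETL, model scoring, result file out) can be sketched in plain Python. This is an illustration under stated assumptions only: the event format, the three count features, and the logistic-regression weights are hypothetical, and the production logic runs as Spark jobs rather than in memory.

```python
import csv
import io
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical model coefficients; a real model would be trained on historical data.
WEIGHTS = {"n_diagnoses": 0.30, "n_procedures": 0.15, "n_prescriptions": 0.10}
BIAS = -2.0

# Maps an incoming event type to the feature column it increments.
COLUMN = {
    "diagnosis": "n_diagnoses",
    "procedure": "n_procedures",
    "prescription": "n_prescriptions",
}

Event = Tuple[str, str, str]  # (patient_id, event_type, code)


def extract_features(events: Iterable[Event]) -> Dict[str, Dict[str, int]]:
    """ETL step: count diagnoses, procedures, and prescriptions per patient.

    Unknown event types raise KeyError rather than being silently dropped.
    """
    features: Dict[str, Dict[str, int]] = defaultdict(lambda: dict.fromkeys(WEIGHTS, 0))
    for patient_id, event_type, _code in events:
        features[patient_id][COLUMN[event_type]] += 1
    return features


def score(row: Dict[str, int]) -> float:
    """Scoring step: logistic function of a weighted sum of the features."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in row.items())
    return 1.0 / (1.0 + math.exp(-z))


def run_job(events: Iterable[Event]) -> str:
    """Run ETL and scoring, returning the results as CSV text (the result file)."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["patient_id", "risk"])
    for patient_id, row in sorted(extract_features(events).items()):
        writer.writerow([patient_id, f"{score(row):.4f}"])
    return out.getvalue()


if __name__ == "__main__":
    sample = [
        ("p1", "diagnosis", "E11.9"),
        ("p1", "procedure", "PROC-1"),
        ("p2", "prescription", "RX-1"),
    ]
    print(run_job(sample))
```

Separating feature extraction from scoring mirrors the split the text describes between ETL logic and the model, so either part can be swapped independently.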

Because the software is used in the healthcare sector, we worked closely with a subject matter expert who provided the specifications and business models that served as the basis for the ML algorithms.

Outcome

As a result, we created a platform that uses ML and Big Data to help hospitals and insurance companies analyze and predict the following:

1

Cost efficiency of treatments compared across the industry
2

Risk of mortality based on the patient’s condition, diagnosis, and treatment path
3

Typical patterns across different patient populations
4

Extra costs spent on services, or additional penalties incurred when a procedure wasn’t provided
5

Complications, based on the patient’s condition and historical data from similar patients
6

Whether hospitalization is required (based on risk) or outpatient treatment will be successful
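Item 3 (typical patterns across patient populations) can be illustrated with a simple frequency count: group patients into populations and count the most common treatment paths in each. This is a sketch under our own assumptions; the source does not describe the platform’s actual methodology, and the 10-year age bands, procedure codes, and function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Path = Tuple[str, ...]  # ordered procedure codes for one patient


def age_band(age: int) -> str:
    """Hypothetical population split into 10-year age bands, e.g. 67 -> '60-69'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def typical_paths(
    patients: Iterable[Tuple[int, Path]], top_k: int = 2
) -> Dict[str, List[Tuple[Path, int]]]:
    """Return the top_k most frequent treatment paths per age band.

    Each patient is (age, path); bands are returned in sorted order.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for age, path in patients:
        counts[age_band(age)][path] += 1
    return {band: c.most_common(top_k) for band, c in sorted(counts.items())}


if __name__ == "__main__":
    cohort = [
        (67, ("ECG", "STENT")),
        (64, ("ECG", "STENT")),
        (61, ("ECG", "MEDS")),
        (45, ("MRI",)),
    ]
    for band, paths in typical_paths(cohort).items():
        print(band, paths)
```

Real population definitions would come from the SME’s business models; swapping `age_band` for another grouping function leaves the counting logic unchanged.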