Site Risk Prediction for Drug Discovery

Problem Statement

A leading US healthcare company was looking to:

  • Avoid millions of dollars in losses caused by delayed Adverse Event (AE) reporting
  • Forecast AE reporting risk at the site level for a predefined list of clinical studies
  • Reduce the cost of SDV (Site Data Verification), which is currently performed on 100% of sites at a very high cost
  • Achieve a better prediction model than those available on the market through H2O.ai, DataRobot and RapidMiner (current accuracy: ~72%)


Sentienz proposed a prediction model built through the following steps:

  • Feature analysis: performed advanced feature engineering to arrive at the best feature list for the prediction model.
  • Created detailed models and analyzed their results and outputs, evaluating and re-evaluating the metrics to identify the optimum outcomes.
  • Built a stacked ensemble model combining various Deep Learning (DL) and Machine Learning (ML) models.
  • Applied Principal Component Analysis to understand the spread of the training and test data.
  • Provided an explanation of how each prediction is made, listing the top 10 explanations along with their impact ratings.
  • Applied Deep Learning models with complex network architectures.
  • Performed detailed False Positive (FP) and False Negative (FN) analysis to maximize recall (sensitivity), keeping accuracy as high as possible with minimal impact on the business.
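The FP/FN analysis in the last step can be illustrated with a small sketch. It assumes the model outputs a risk score per site and that a site is flagged when its score reaches a cut-off; the labels, scores, precision floor, and the helper names `confusion_at` and `pick_threshold` are all hypothetical and are not taken from the project itself. The idea is to pick the cut-off that gives the highest recall while keeping precision above a floor the business can accept.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # risky sites correctly flagged
    fp: int  # safe sites flagged by mistake
    fn: int  # risky sites that were missed
    tn: int  # safe sites correctly left alone

    @property
    def recall(self) -> float:
        # Share of truly risky sites that the model flags.
        positives = self.tp + self.fn
        return self.tp / positives if positives else 0.0

    @property
    def precision(self) -> float:
        # Share of flagged sites that are truly risky.
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0


def confusion_at(labels, scores, threshold):
    """Count outcomes when sites scoring >= threshold are flagged as risky."""
    tp = fp = fn = tn = 0
    for y, s in zip(labels, scores):
        flagged = s >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, fn, tn)


def pick_threshold(labels, scores, min_precision):
    """Return (threshold, Confusion) with the highest recall whose precision
    meets the floor, or None if no threshold qualifies. Ties on recall go to
    the higher threshold, which flags fewer sites."""
    best = None
    for t in sorted(set(scores)):
        c = confusion_at(labels, scores, t)
        if c.precision < min_precision:
            continue
        if best is None or (c.recall, t) > (best[1].recall, best[0]):
            best = (t, c)
    return best


# Made-up example: 1 = site reported AEs late, 0 = site reported on time.
labels = [1, 1, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.55, 0.4, 0.1]

threshold, result = pick_threshold(labels, scores, min_precision=0.6)
print(threshold, result)  # with this toy data: 0.35, tp=4 fp=2 fn=0 tn=2
```

Raising the precision floor trades missed risky sites (FN) for fewer unnecessary flags (FP), which is the balance this kind of analysis is meant to expose.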



Implemented StreamSets

For ingesting patient data into the data platform

Implemented DataRobot and H2O for generating base AI and ML models

  • Using the training dataset, the base models were improved
  • An ensemble of models was used to improve the accuracy of the results
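As a rough illustration of how an ensemble can combine base models, the sketch below trains a small meta-learner on the base models' predicted probabilities, which is the general idea behind stacking. It is not the project's implementation: the three base models, their held-out scores, the labels, and the function names are hypothetical, and plain logistic regression fitted by gradient descent is an assumed choice of meta-learner.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, row):
    """Meta-learner probability for one site; weights[-1] is the bias term."""
    z = weights[-1] + sum(w * x for w, x in zip(weights, row))
    return sigmoid(z)


def log_loss(weights, rows, labels):
    """Mean logistic loss of the meta-learner on the given rows."""
    total = 0.0
    for row, y in zip(rows, labels):
        p = predict(weights, row)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(rows)


def fit_meta_learner(rows, labels, lr=0.5, epochs=500):
    """Fit logistic regression by batch gradient descent.

    Each row holds the probabilities that the base models assigned to one
    site. These should be held-out (out-of-fold) predictions so that the
    meta-learner does not simply learn the base models' overfitting.
    """
    weights = [0.0] * (len(rows[0]) + 1)  # one weight per base model + bias
    for _ in range(epochs):
        grads = [0.0] * len(weights)
        for row, y in zip(rows, labels):
            err = predict(weights, row) - y
            for j, x in enumerate(row):
                grads[j] += err * x
            grads[-1] += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grads)]
    return weights


# Made-up held-out probabilities from three hypothetical base models.
base_preds = [
    [0.9, 0.7, 0.8],
    [0.8, 0.4, 0.7],
    [0.3, 0.6, 0.4],
    [0.2, 0.3, 0.1],
    [0.7, 0.8, 0.6],
    [0.4, 0.2, 0.3],
    [0.6, 0.5, 0.9],
    [0.1, 0.4, 0.2],
]
labels = [1, 1, 0, 0, 1, 0, 1, 0]  # 1 = site turned out to be risky

weights = fit_meta_learner(base_preds, labels)

# Score a new site from its (hypothetical) base-model probabilities.
new_site = [0.85, 0.6, 0.75]
print(round(predict(weights, new_site), 3))
```

The learned weights show how much the ensemble relies on each base model, so a noisy model can be down-weighted rather than averaged in equally.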

Business Benefits

  • 85%
  • 90%
  • 88%
  • 60% reduction in Site Data Verification cost
