Drug Discovery on Compound Screening Analysis
Requirement
Introduction: Develop PySpark code that performs calculations, aggregations, joins, and filtering on compound screening analysis data. The analysis evaluates compound results against multiple criteria such as IC50, AUC, efficacy, and other factors. The goal is to generate comprehensive insights by categorizing results according to their overall potential.
Requirements: Read the table “purgo_playground.compound_drug_analysis”. Group the data by therapeutic_area to calculate the average ic50, auc, and efficacy, the total sample_size, and the count of studies.
Filter Analysis: Filter the dataset to include only rows where approved_flag is '1' and validation_status is 'valid'.
Join Analysis: Join the filtered data with the aggregated data on therapeutic_area.
Result Analysis: Compute an overall_score by averaging the five scores (score1, score2, score3, score4, score5).
Categorize results as High Potential when the overall_score is 70 to 100, Moderate Potential when it is 60 to 70, and Low Potential when it is below 60.
Final Output: Display the results with all columns and the result analysis.
Unity Catalog: “purgo_playground.compound_drug_analysis”
Note: Do not execute the validation part.
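Example: The steps above could be sketched in PySpark roughly as follows. This is a minimal sketch under stated assumptions, not a definitive implementation. The column names are taken from the requirement text and may not match the real schema. The output names (avg_ic50, avg_auc, avg_efficacy, total_sample_size, study_count, result_category) are hypothetical. The count of studies is assumed to be a row count per therapeutic_area. Because the stated ranges overlap at 70, the sketch assumes a score of exactly 70 is High Potential. The categorize helper is a plain-Python reference for that rule and is not part of the requirement.

```python
from functools import reduce
from operator import add

# Names taken from the requirement text; adjust to the real schema.
TABLE = "purgo_playground.compound_drug_analysis"
SCORE_COLS = ["score1", "score2", "score3", "score4", "score5"]


def categorize(score):
    """Plain-Python reference for the banding rule.

    Assumption: the shared boundary at 70 belongs to High Potential,
    so Moderate Potential covers 60 <= score < 70.
    """
    if score is None:
        return None
    if 70 <= score <= 100:
        return "High Potential"
    if 60 <= score < 70:
        return "Moderate Potential"
    if score < 60:
        return "Low Potential"
    return None  # above 100: not covered by the requirement


def build_result(df):
    """Aggregate, filter, join, and score a compound screening DataFrame."""
    # Imported here so categorize() stays usable without a Spark runtime.
    from pyspark.sql import functions as F

    # Per-therapeutic-area aggregates (output names are hypothetical).
    agg_df = df.groupBy("therapeutic_area").agg(
        F.avg("ic50").alias("avg_ic50"),
        F.avg("auc").alias("avg_auc"),
        F.avg("efficacy").alias("avg_efficacy"),
        F.sum("sample_size").alias("total_sample_size"),
        F.count("*").alias("study_count"),  # assumes one row per study
    )

    # Keep approved, validated rows only. The cast is an assumption that
    # covers approved_flag being stored as an integer rather than a string.
    filtered_df = df.filter(
        (F.col("approved_flag").cast("string") == "1")
        & (F.col("validation_status") == "valid")
    )

    # Attach the area-level aggregates to every remaining row.
    joined_df = filtered_df.join(agg_df, on="therapeutic_area", how="inner")

    # Mean of the five scores; a null in any score yields a null overall_score.
    overall = reduce(add, [F.col(c) for c in SCORE_COLS]) / len(SCORE_COLS)
    scored_df = joined_df.withColumn("overall_score", overall)

    # Same banding as categorize(); unmatched scores stay null.
    score = F.col("overall_score")
    return scored_df.withColumn(
        "result_category",
        F.when((score >= 70) & (score <= 100), "High Potential")
        .when((score >= 60) & (score < 70), "Moderate Potential")
        .when(score < 60, "Low Potential"),
    )


# Usage, assuming an active SparkSession named `spark` (as on Databricks):
# result_df = build_result(spark.table(TABLE))
# result_df.show(truncate=False)
```

The aggregates are computed over the full table before filtering, which follows the order of the requirement. If the averages should cover approved and valid rows only, the groupBy would need to run on the filtered data instead.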