
Implement the derived column logic from the Derived_column_list sheet using columns taken directly from the source

Requirement

Introduction: Calculate and enrich the patient_therapy_shipment table with key derived metrics that assess patient adherence, therapy gaps, age-based demographics, and discontinuation type. The enrichment is based on business logic involving shipment dates, dosage supply assumptions, and treatment-level and patient-level behavior, so that downstream analytics teams can generate insights for MSLs, therapy adherence, and patient support interventions.

 

Requirement: Develop a PySpark job to process data from the source table purgo_playground.patient_therapy_shipment using {calctime} = '2024-04-01' as the reference date.

 

* Read the source table purgo_playground.patient_therapy_shipment into a DataFrame.

* Filter and process the data based on the input reference date parameter {calctime}.

* Derive 12 output columns as specified in the Derived_column_list sheet.

* Apply transformation logic for each derived column according to the business rules provided in the attached Excel logic sheet.

* Ensure the transformations strictly align with the column-wise specifications (data type, logic conditions, dependencies, etc.).

* The final DataFrame should contain the original required identifiers and the 12 derived columns.
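The steps above could be laid out roughly as in the following sketch. It is only a skeleton under stated assumptions, not the required implementation: the column names `patient_id` and `shipment_date`, the filter condition, and the single placeholder metric `days_since_shipment` are hypothetical, since the real 12 columns and their rules are defined in the Derived_column_list sheet and the Excel logic sheet. The function names `parse_calctime` and `enrich_shipments` are likewise made up for illustration.

```python
from datetime import date, datetime

SOURCE_TABLE = "purgo_playground.patient_therapy_shipment"


def parse_calctime(calctime: str) -> date:
    """Parse the {calctime} reference date, given as 'YYYY-MM-DD'."""
    return datetime.strptime(calctime, "%Y-%m-%d").date()


def enrich_shipments(spark, calctime: str, source_table: str = SOURCE_TABLE):
    """Read the source table, filter by the reference date, add derived columns."""
    # Imported here so parse_calctime stays usable without a Spark install.
    from pyspark.sql import functions as F

    ref_date = F.lit(parse_calctime(calctime)).cast("date")

    # Step 1: read the source table into a DataFrame.
    df = spark.read.table(source_table)

    # Step 2: filter on the reference date.
    # ASSUMPTION: `shipment_date` is a hypothetical column name, and keeping
    # rows on or before the reference date is a guess at the intended filter.
    df = df.filter(F.col("shipment_date") <= ref_date)

    # Step 3: one expression per derived column.
    # ASSUMPTION: this single entry is a placeholder; the real 12 columns
    # and their rules come from the Derived_column_list sheet.
    derived = {
        "days_since_shipment": F.datediff(ref_date, F.col("shipment_date")),
    }
    for name, expr in derived.items():
        df = df.withColumn(name, expr)

    # Step 4: keep the identifiers plus the derived columns.
    # ASSUMPTION: `patient_id` is a hypothetical identifier column.
    id_cols = ["patient_id"]
    return df.select(*id_cols, *derived.keys())
```

With an active Spark session, the result could then be shown with `enrich_shipments(spark, "2024-04-01").show()`. Keeping the derived expressions in one dictionary makes it easy to check each entry against its row in the specification sheet.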

 

Final Output: Display the final DataFrame.

 

Unity Catalog: 'purgo_playground.patient_therapy_shipment'

