Implement the derived-column logic from the derived column list, using columns taken directly from the source
Requirement
Introduction: Enrich the patient_therapy_shipment table with key derived metrics that assess patient adherence, therapy gaps, age-based demographics, and discontinuation type. The enrichment is based on business logic involving shipment dates, dosage supply assumptions, and treatment-level and patient-level behavior, so that downstream analytics teams can generate insights for MSLs, therapy adherence, and patient support interventions.
Requirement: Develop a PySpark job to process data from the source table purgo_playground.patient_therapy_shipment using {calctime} = '2024-04-01' as the reference date.
* Read the source table purgo_playground.patient_therapy_shipment into a DataFrame.
* Filter and process the data based on the input reference date parameter {calctime}.
* Derive 12 output columns as specified in the Derived_column_list sheet.
* Apply transformation logic for each derived column according to the business rules provided in the attached Excel logic sheet.
* Ensure the transformations strictly align with the column-wise specifications (data type, logic conditions, dependencies, etc.).
* The final DataFrame should contain the original required identifiers and the 12 derived columns.
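The read-and-filter steps above could be sketched roughly as follows. This is only a skeleton under stated assumptions: the column name shipment_date, the "on or before {calctime}" reading of the filter, and the helper names parse_calctime, load_shipments, and count_derived_columns are all hypothetical and not taken from the logic sheet. The 12 derived columns themselves are not shown, because their rules live in the attached Excel sheet.

```python
from datetime import date, datetime

SOURCE_TABLE = "purgo_playground.patient_therapy_shipment"  # from the requirement
EXPECTED_DERIVED = 12  # number of derived columns stated in the requirement


def parse_calctime(calctime: str) -> date:
    """Parse the {calctime} parameter (e.g. '2024-04-01') into a date.

    Raises ValueError if the string is not in YYYY-MM-DD form.
    """
    return datetime.strptime(calctime, "%Y-%m-%d").date()


def load_shipments(spark, calctime: str, date_col: str = "shipment_date"):
    """Read the source table and keep rows dated on or before calctime.

    ASSUMPTION: `date_col` is a hypothetical column name, and the
    'on or before' filter is one plausible reading of the requirement.
    """
    # Imported lazily so the pure-Python helpers work without a Spark install.
    from pyspark.sql import functions as F

    ref = parse_calctime(calctime)  # validates the parameter early
    df = spark.table(SOURCE_TABLE)
    return df.filter(F.col(date_col) <= F.to_date(F.lit(ref.isoformat())))


def count_derived_columns(source_cols, output_cols) -> int:
    """Count output columns that do not exist in the source table."""
    source = set(source_cols)
    return sum(1 for c in output_cols if c not in source)
```

In a Databricks notebook this might be called as load_shipments(spark, "2024-04-01"), with the derived columns then added via withColumn according to the logic sheet. Comparing count_derived_columns(source_df.columns, final_df.columns) against EXPECTED_DERIVED is one simple way to confirm that all 12 columns were produced before the result is displayed.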
Final Output: Display the result.
Unity Catalog: 'purgo_playground.patient_therapy_shipment'