DQ_Rule_Check: HCP interactions against medical inquiries and medical inquiry responses
Requirement
Introduction: Data Quality (DQ) rules are needed to validate HCP interaction data against medical inquiries and their responses. The rules detect anomalies, inconsistencies, and inaccuracies in the dataset by checking key attributes, such as interaction types, inquiry dates, and associated responses, against predefined standards.
Requirements: Write PySpark code that applies each DQ rule listed below (DQ Rule Name and Check Applied) to the table ‘purgo_playground.dq_datasets’.
||DQ Rule Name||Check Applied||
|Date_Pattern_Check|Ensures each of membership_effective_date, membership_expiration_date, expiration_date, and hcp_interaction_date is in YYYY-MM-DD format|
|Date_Range_Check|Ensures membership_effective_date < membership_expiration_date|
|Null_Check|Ensures customer_name is not null|
|Length_Check|Ensures customer_name length < 20|
|Discrete_Range_Check|Ensures relationship_type is within the allowed values ("Primary", "Secondary", "Tertiary")|
|Numeric_Check|Ensures dosage_amount is a valid numeric value|
|Boolean_Check|Ensures is_follow_up is True/False|
|Uniqueness_Check|Ensures each hcp_id value is unique|
|Integer_Check|Ensures patient_count is an integer|
|Decimal_Check|Ensures response_time_hours is a valid decimal|
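The rule semantics in the table can be pinned down with a plain-Python sketch before translating them into PySpark column expressions. This is only a reference for the intended pass/fail logic, not the PySpark deliverable. Several points are assumptions not stated in the requirement: Date_Range_Check fails when either date is malformed, Boolean_Check accepts real booleans or the strings "true"/"false" in any case, Numeric_Check and Decimal_Check both mean "parses as a finite decimal number", and the helper names (`is_iso_date`, `is_number`, `is_integer`, `check_row`) are hypothetical.

```python
import re
from collections import Counter
from datetime import date
from decimal import Decimal, InvalidOperation

DATE_COLUMNS = (
    "membership_effective_date",
    "membership_expiration_date",
    "expiration_date",
    "hcp_interaction_date",
)
ALLOWED_RELATIONSHIP_TYPES = {"Primary", "Secondary", "Tertiary"}


def is_iso_date(value):
    """True if value is a YYYY-MM-DD string naming a real calendar date."""
    if not isinstance(value, str) or not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def is_number(value):
    """True if value parses as a finite decimal number (assumed meaning of 'valid numeric')."""
    if isinstance(value, bool):
        return False
    try:
        return Decimal(str(value)).is_finite()
    except InvalidOperation:
        return False


def is_integer(value):
    """True for ints (not bools) and for strings made only of an optional sign and digits."""
    if isinstance(value, bool):
        return False
    if isinstance(value, int):
        return True
    return isinstance(value, str) and re.fullmatch(r"[+-]?[0-9]+", value) is not None


def check_row(row, hcp_id_counts):
    """Return {rule name: passed?} for one record; hcp_id_counts is a Counter over the whole dataset."""
    eff = row.get("membership_effective_date")
    exp = row.get("membership_expiration_date")
    name = row.get("customer_name")
    follow_up = row.get("is_follow_up")
    return {
        "Date_Pattern_Check": all(is_iso_date(row.get(c)) for c in DATE_COLUMNS),
        # Assumption: a malformed date also fails the range check.
        # ISO date strings compare correctly as plain strings.
        "Date_Range_Check": is_iso_date(eff) and is_iso_date(exp) and eff < exp,
        "Null_Check": name is not None,
        "Length_Check": name is not None and len(name) < 20,
        "Discrete_Range_Check": row.get("relationship_type") in ALLOWED_RELATIONSHIP_TYPES,
        "Numeric_Check": is_number(row.get("dosage_amount")),
        # Assumption: booleans or the strings "true"/"false" (any case) are accepted.
        "Boolean_Check": isinstance(follow_up, bool) or str(follow_up).lower() in ("true", "false"),
        "Uniqueness_Check": hcp_id_counts[row.get("hcp_id")] == 1,
        "Integer_Check": is_integer(row.get("patient_count")),
        "Decimal_Check": is_number(row.get("response_time_hours")),
    }
```

Uniqueness_Check is the only rule that needs dataset-level information, which is why `check_row` takes a precomputed count of `hcp_id` values (for example `Counter(r["hcp_id"] for r in rows)`); in PySpark the same idea would typically be a grouped count joined back to the rows.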
Expected Output: Display the existing table with an added column showing whether each record passed or failed the DQ rules, together with the failure reasons. Also show a summary of the failed records, listing each record number and its failure reasons.
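One possible shape for this output is sketched below in plain Python, assuming each record already has a mapping of rule name to pass/fail. The column names `record_number`, `dq_status`, and `dq_failure_reasons`, the 1-based numbering by input order, and the function names are all hypothetical; the requirement does not fix them.

```python
def annotate(records, results):
    """Attach a status and the failed rule names to each record.

    records: list of dicts (the original rows)
    results: list of {rule name: passed?} dicts, aligned with records
    record_number is assumed to be the 1-based position in the input.
    """
    annotated = []
    for number, (record, rule_results) in enumerate(zip(records, results), start=1):
        failed = [rule for rule, passed in rule_results.items() if not passed]
        annotated.append({
            **record,
            "record_number": number,
            "dq_status": "Failed" if failed else "Passed",
            "dq_failure_reasons": ", ".join(failed),
        })
    return annotated


def failure_summary(annotated):
    """Return (record_number, reasons) pairs for the failed records only."""
    return [
        (row["record_number"], row["dq_failure_reasons"])
        for row in annotated
        if row["dq_status"] == "Failed"
    ]
```

Keeping the failure reasons as a list of rule names means the summary can be traced directly back to the DQ Rule Name column of the table above.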
Unity Catalog Details: purgo_playground.dq_datasets