Implement Data Quality Checks for Tepezza Product Sales Data
Requirement
Introduction: For regulatory compliance and accurate decision-making in the Life Sciences domain, Tepezza's sales data must meet stringent data quality (DQ) standards. DQ checks ensure that complete, accurate, and consistent data flows into analytics and reporting systems, enabling trusted insights into Tepezza's commercial performance.
Requirement: Refer to the attached Excel sheet “Sales_DQ_Rules” and use only the sample data “Sales_Data”. When reading the data, treat empty values as null records, convert any value read as a date string to a datetime.date object, and infer each column's datatype from the data. Implement data quality checks on the Tepezza sales data based on the rules in “Sales_DQ_Rules”, which provides detailed rule definitions for each data quality category.
Expected Codebase: PySpark, Python
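The reading rules in the requirement (empty values become null, date strings become datetime.date objects) could be sketched in plain Python as follows. This is only an illustration under stated assumptions: the date layouts in ASSUMED_DATE_FORMATS and the helper names normalize_cell and normalize_row are hypothetical, since the brief does not specify the date format or column names used in “Sales_Data”.

```python
from datetime import date, datetime
from typing import Any, Dict

# Assumption: the brief does not state which date layouts appear in Sales_Data,
# so these two formats are placeholders to adjust once the file is inspected.
ASSUMED_DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def normalize_cell(value: Any) -> Any:
    """Map empty cells to None and date strings to datetime.date (hypothetical helper)."""
    if value is None:
        return None
    # NaN is the only float that is not equal to itself; readers such as
    # pandas use it to represent an empty cell.
    if isinstance(value, float) and value != value:
        return None
    # datetime is a subclass of date, so it must be checked first.
    if isinstance(value, datetime):
        return value.date()
    if isinstance(value, date):
        return value
    if isinstance(value, str):
        text = value.strip()
        if text == "":
            return None  # empty or whitespace-only cell becomes a null record
        for fmt in ASSUMED_DATE_FORMATS:
            try:
                return datetime.strptime(text, fmt).date()
            except ValueError:
                continue
        # Not a date: return the original string untouched so that later
        # DQ rules can still see stray whitespace or odd formatting.
        return value
    return value


def normalize_row(row: Dict[str, Any]) -> Dict[str, Any]:
    """Apply normalize_cell to every column of one record (hypothetical helper)."""
    return {column: normalize_cell(cell) for column, cell in row.items()}
```

The normalized rows could then be loaded into PySpark, for example with spark.createDataFrame and an explicit schema, before the rules from “Sales_DQ_Rules” are applied.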