Data quality check for null values and valid date format on the d_product table
Requirement
Introduction: A data quality check on the table is a mandatory step before moving the data to PROD. It ensures the quality of the data processed in the table and helps avoid erroneous data in reporting.
Based on the column and its criticality, a Business Analyst decides whether to report the issue to the source team or ignore it.
Requirement: Create Databricks PySpark logic to check that ‘item_nbr’ and ‘sellable_qty’ contain no null values in the ‘d_product’ table, and that ‘prod_exp_dt’ values are in ‘YYYYMMDD’ date format.
Get the count of records where ‘item_nbr’ is null. Display 5 sample records where ‘item_nbr’ is null.
Get the count of records where ‘sellable_qty’ is null. Display 5 sample records where ‘sellable_qty’ is null.
Get the count of records where ‘prod_exp_dt’ is not in ‘YYYYMMDD’ date format. Display 5 sample records where ‘prod_exp_dt’ is not in ‘YYYYMMDD’ date format.
Table Definition: The table name is d_product. The primary key is prod_id.
Unity Catalog Information: d_product