
DQ Checks from volume file to onboarded table

Requirement

Introduction: To ensure data integrity and consistency, it is essential to perform Data Quality (DQ) checks on onboarded tables. This process involves verifying various aspects such as row count, column count, duplicate records, completeness, uniqueness, and business rule validation. The objective is to identify and resolve discrepancies in the data to maintain high-quality standards.

 

Requirements:

 

Read only the table "purgo_playground.onboarded_raw_table" and only the file "/Volumes/agilisium_playground/purgo_playground/de_dq/s3_file.csv". Perform the data quality checks below on both sources.
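On Databricks, the two sources would typically be read with `spark.read.table("purgo_playground.onboarded_raw_table")` and `spark.read.csv` on the volume path. As a minimal local sketch of the CSV side, the snippet below parses an in-memory sample (the column names follow the mandatory-column list in this requirement; the row values are hypothetical):

```python
import csv
import io

# Hypothetical sample standing in for s3_file.csv; the real file lives at the
# volume path named in this requirement and, on Databricks, would be read with
# spark.read.csv(path, header=True).
SAMPLE_CSV = """config_id,source_object_name,source_system,file_name,s3_vendor_path,s3_landing_path,actual_file_name,active_flag
1,orders,sap,orders.csv,s3://vendor/orders,s3://landing/orders,orders_20240101.csv,1
2,customers,sap,customers.csv,s3://vendor/cust,s3://landing/cust,customers_20240101.csv,1
"""

def read_csv_rows(text: str) -> list[dict]:
    """Parse CSV text into a list of row dicts (one dict per record)."""
    return list(csv.DictReader(io.StringIO(text)))

rows = read_csv_rows(SAMPLE_CSV)
print(len(rows), len(rows[0]))  # row count and column count of the sample
```

The same row-dict shape is enough to prototype every check in this requirement before porting the logic to Spark DataFrames.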

 

Data Quality Checks:

 

Row Count Check: Ensure row counts are consistent between s3_file and onboarded_raw_table.

Column Count Check: Verify all columns are present in both s3_file and onboarded_raw_table.

Duplicate Count Check: Detect duplicate records based on config_id (primary key) in both s3_file and onboarded_raw_table.

Completeness Check: Confirm no missing values in mandatory columns (config_id, source_object_name, source_system, file_name, s3_vendor_path, s3_landing_path, actual_file_name) across s3_file and onboarded_raw_table.

Uniqueness Check: Validate uniqueness of config_id values in both s3_file and onboarded_raw_table.

Business Rule Validation: Check that active_flag is set to 1 for all records in both s3_file and onboarded_raw_table.
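The checks above can be sketched as small, source-agnostic functions over lists of row dicts (one sketch per check; on Databricks each would become a DataFrame aggregation, e.g. `groupBy("config_id").count()` for duplicates). Function names are illustrative, not prescribed by the requirement:

```python
from collections import Counter

# Mandatory columns taken verbatim from the Completeness Check.
MANDATORY = ["config_id", "source_object_name", "source_system", "file_name",
             "s3_vendor_path", "s3_landing_path", "actual_file_name"]

def duplicate_count(rows: list[dict], key: str = "config_id") -> int:
    """Number of extra records sharing a config_id (0 means no duplicates)."""
    counts = Counter(r[key] for r in rows)
    return sum(c - 1 for c in counts.values() if c > 1)

def completeness_violations(rows: list[dict], mandatory=MANDATORY) -> int:
    """Count of empty/missing values across all mandatory columns."""
    return sum(1 for r in rows for col in mandatory
               if not (r.get(col) or "").strip())

def uniqueness_ok(rows: list[dict], key: str = "config_id") -> bool:
    """True when every config_id value appears exactly once."""
    vals = [r[key] for r in rows]
    return len(vals) == len(set(vals))

def active_flag_ok(rows: list[dict]) -> bool:
    """Business rule: active_flag must be 1 for every record."""
    return all(str(r.get("active_flag", "")).strip() == "1" for r in rows)
```

Running each function against both the file rows and the table rows yields the per-source counts the final output asks for.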

 

Final Output: Display results with columns - DQ_Check_Type, onboarded_raw_table_count, s3_csv_file_count, check_passed.

 

Unity Catalog table: "purgo_playground.onboarded_raw_table"

 

Volume path: "/Volumes/agilisium_playground/purgo_playground/de_dq/s3_file.csv"

Purgo AI Agentic Code
