
Configuration Master of Raw tables

Requirement

Introduction: Develop Databricks PySpark logic to efficiently process data from a volume file path and load it into a target table while ensuring data integrity. The solution should implement conversion logic that handles records based on specific conditions, updating existing records and appending new ones accurately. For validation and error handling: if any primary key column contains a null or empty value, raise an exception; and for column count validation, if the total number of columns in the source file does not match that of the target table, display both column counts and then raise an exception.

 

Requirements: Build simple PySpark logic to insert new entries from the Raw tables (CSV files) into config_table, ensuring that new records are added and existing ones are updated. Apply the conversion logic using the following primary key columns: project, src_sys_cd, table_name, and spark_view_name.
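This insert/update behaviour maps naturally onto a Delta Lake MERGE keyed on the four primary key columns. As a minimal sketch (the helper name and table aliases are illustrative, not part of the requirement), a function that builds the MERGE statement from the key list:

```python
# Primary key columns named in the requirement.
PRIMARY_KEYS = ["project", "src_sys_cd", "table_name", "spark_view_name"]

def build_merge_sql(target_table: str, source_view: str, keys: list) -> str:
    """Build a Delta Lake MERGE statement that updates matching rows
    and inserts new ones, matching on the given primary key columns."""
    on_clause = " AND ".join(f"t.{k} = s.{k}" for k in keys)
    return (
        f"MERGE INTO {target_table} t "
        f"USING {source_view} s "
        f"ON {on_clause} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )

sql = build_merge_sql("purgo_playground.config_table", "src_vw", PRIMARY_KEYS)
```

In Databricks this statement would be executed with `spark.sql(sql)` after the source CSV has been registered as a temporary view, e.g. via `df.createOrReplaceTempView("src_vw")`.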

 

Validation and error handling:

 

Primary Key Null Validation*:

If any primary key column contains a null or empty value, display the corresponding column names and raise an exception.
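The check above can be sketched in plain Python over rows represented as dictionaries (the function names are illustrative); with a Spark DataFrame the equivalent filter per key `k` would be `col(k).isNull() | (col(k) == "")`:

```python
PRIMARY_KEYS = ["project", "src_sys_cd", "table_name", "spark_view_name"]

def find_null_key_columns(rows: list, keys: list) -> list:
    """Return the primary key columns containing a null or empty value
    in any row; an empty result means validation passes."""
    return [
        key for key in keys
        if any(row.get(key) in (None, "") for row in rows)
    ]

def validate_primary_keys(rows: list, keys: list) -> None:
    bad = find_null_key_columns(rows, keys)
    if bad:
        # Display the offending column names, then raise, per the requirement.
        raise ValueError(f"Primary key columns contain null/empty values: {bad}")
```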

Column Count Validation*:

If the total number of columns in the source table does not match the total number of columns in the target table (purgo_playground.config_table), display both column counts and raise the exception:

"Source table column count does not match with target table column count" — and do not load the data.
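A minimal sketch of this validation, assuming the column lists are obtained in Databricks from `source_df.columns` and `spark.table("purgo_playground.config_table").columns` (the function name is illustrative):

```python
def validate_column_count(source_cols: list, target_cols: list) -> None:
    """Compare source file and target table column counts; display both
    counts and raise when they differ, so that no data is loaded."""
    if len(source_cols) != len(target_cols):
        print(f"Source table column count: {len(source_cols)}, "
              f"target table column count: {len(target_cols)}")
        raise ValueError(
            "Source table column count does not match with target table column count"
        )
```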

 

Source table:

 

First Load*: Use the volume file located at '/Volumes/agilisium_playground/purgo_playground/config_table/d_product_plant_config_dev_first_load.csv'

Second Load For Insert/Update*: Use the volume file located at '/Volumes/agilisium_playground/purgo_playground/config_table/d_product_plant_config_dev_second_load_insert_update.csv'

Third Load For Primary Key Null Validation*: Use the volume file located at '/Volumes/agilisium_playground/purgo_playground/config_table/d_product_plant_config_dev_error_primary_key_null.csv'

Fourth Load For Column Count Validation (Source Table vs Target Table)*: Use the volume file located at '/Volumes/agilisium_playground/purgo_playground/config_table/d_product_plant_config_dev_error_less_column.csv'
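The four load scenarios above can be kept in a small lookup so the same load-and-validate logic is rerun against each file; the dictionary and key names are illustrative, the paths are the ones listed:

```python
# Illustrative lookup of the four load scenarios to their volume paths.
LOAD_FILES = {
    "first_load": "/Volumes/agilisium_playground/purgo_playground/config_table/d_product_plant_config_dev_first_load.csv",
    "insert_update": "/Volumes/agilisium_playground/purgo_playground/config_table/d_product_plant_config_dev_second_load_insert_update.csv",
    "pk_null_error": "/Volumes/agilisium_playground/purgo_playground/config_table/d_product_plant_config_dev_error_primary_key_null.csv",
    "column_count_error": "/Volumes/agilisium_playground/purgo_playground/config_table/d_product_plant_config_dev_error_less_column.csv",
}
```

In Databricks each file would typically be read with `spark.read.csv(LOAD_FILES["first_load"], header=True, inferSchema=True)`.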

 

Target table: 'purgo_playground.config_table'

 

Final Output: the Databricks PySpark logic, with its output displayed.

Purgo AI Agentic Code
