Ingestion of Staging Files Using Databricks Auto Loader
Requirement
Information:
Databricks Auto Loader simplifies the incremental ingestion of files from cloud storage into Delta Lake tables. Because the staging files land in Databricks volumes, we will use Auto Loader to ingest them into the staging tables incrementally.
Requirement:
Develop Databricks PySpark code to stream the CSV files present in the volume paths using Auto Loader. Fetch the distinct target table names from the delta_stg_tables column of the config_master table: if the volume path contains "us", retrieve the delta_stg_tables value where country = "US", and apply the same logic for the other countries, "NL" and "CA". Load the data into the target table (the value from delta_stg_tables) and set the trigger interval to 10 seconds. A code sketch follows the details below.
Schema location: /dbfs/tmp/{country}/schema
Volume Information:
* /Volumes/agilisium_playground/purgo_playground/stg_us_wholesaler
* /Volumes/agilisium_playground/purgo_playground/stg_nl_wholesaler
* /Volumes/agilisium_playground/purgo_playground/stg_ca_wholesaler
Unity Catalog information: purgo_playground.config_master
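As a minimal sketch of the config lookup, the snippet below assumes config_master exposes the country and delta_stg_tables columns described above, referenced as purgo_playground.config_master; the catalog prefix may need adjusting to your workspace.

```python
from pyspark.sql import functions as F

def get_target_table(country: str) -> str:
    """Return the distinct delta_stg_tables value for the given country code."""
    rows = (
        spark.table("purgo_playground.config_master")  # catalog prefix may differ per workspace
        .where(F.col("country") == country)
        .select("delta_stg_tables")
        .distinct()
        .collect()
    )
    if not rows:
        raise ValueError(f"No config_master entry found for country {country}")
    return rows[0]["delta_stg_tables"]
```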
Expected output:
* Databricks Pyspark code
* Target table determined from the delta_stg_tables value, within the purgo_playground schema.
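A minimal sketch of the Auto Loader streams follows, assuming the country code can be derived from the "_us_", "_nl_", or "_ca_" token in the volume path, the staging CSVs carry a header row, a checkpoint location under /dbfs/tmp/{country}/checkpoint is acceptable (the requirement only fixes the schema location), and the delta_stg_tables value names a table inside the purgo_playground schema.

```python
volume_paths = [
    "/Volumes/agilisium_playground/purgo_playground/stg_us_wholesaler",
    "/Volumes/agilisium_playground/purgo_playground/stg_nl_wholesaler",
    "/Volumes/agilisium_playground/purgo_playground/stg_ca_wholesaler",
]

for volume_path in volume_paths:
    # Derive the country code from the folder name, e.g. "stg_us_wholesaler" -> "US".
    # Assumption: the country token always appears as "_<code>_" in the path.
    country = next(c for c in ("us", "nl", "ca") if f"_{c}_" in volume_path).upper()

    # Target table name comes from config_master (see get_target_table above).
    target_table = get_target_table(country)

    # Incrementally read new CSV files from the volume with Auto Loader.
    stream_df = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("cloudFiles.schemaLocation", f"/dbfs/tmp/{country}/schema")
        .option("header", "true")  # assumption: staging CSVs include a header row
        .load(volume_path)
    )

    # Append into the Delta target table, triggering every 10 seconds.
    (
        stream_df.writeStream
        .format("delta")
        .option("checkpointLocation", f"/dbfs/tmp/{country}/checkpoint")  # assumed location
        .outputMode("append")
        .trigger(processingTime="10 seconds")
        .toTable(f"purgo_playground.{target_table}")
    )
```

Running one independent stream per country keeps the schema and checkpoint state for each volume isolated, and the processingTime trigger makes each stream poll its volume every 10 seconds as required.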