
Ingestion of Staging Files Using Databricks Auto Loader

Requirement

Information:

Databricks Auto Loader simplifies the incremental ingestion of files from cloud storage into Delta Lake tables. For the staging tables whose source files land in Databricks volumes, we will use Auto Loader to ingest the staging files incrementally.

 

Requirement:

Develop Databricks PySpark code to stream the CSV files present in the volume paths using Auto Loader. Fetch the distinct target table names from the delta_stg_tables column in the config_master table. If the volume path contains "us", retrieve the delta_stg_tables value where country = "US", and follow the same logic for the other countries, "NL" and "CA". Load the data into the target table (the value from delta_stg_tables). Set the trigger interval to 10 seconds.
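
A minimal sketch of that country-based lookup is shown below, assuming config_master exposes country and delta_stg_tables columns as described above; the helper name resolve_target_table is hypothetical, not part of the requirement.

```python
from pyspark.sql import SparkSession

# Hypothetical helper: map a volume path to its country code and fetch the
# matching staging table name from purgo_playground.config_master.
def resolve_target_table(spark: SparkSession, volume_path: str) -> str:
    folder = volume_path.rstrip("/").split("/")[-1].lower()
    if "us" in folder:
        country = "US"
    elif "nl" in folder:
        country = "NL"
    elif "ca" in folder:
        country = "CA"
    else:
        raise ValueError(f"No recognised country code in path: {volume_path}")

    # Distinct delta_stg_tables value for the resolved country.
    row = (
        spark.table("purgo_playground.config_master")
        .filter(f"country = '{country}'")
        .select("delta_stg_tables")
        .distinct()
        .first()
    )
    if row is None:
        raise ValueError(f"No delta_stg_tables entry found for country {country}")
    return row["delta_stg_tables"]
```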

 

SchemaLocation: /dbfs/tmp/{country}/schema

 

 

Volume Information:

 

/Volumes/agilisium_playground/purgo_playground/stg_us_wholesaler

/Volumes/agilisium_playground/purgo_playground/stg_nl_wholesaler

/Volumes/agilisium_playground/purgo_playground/stg_ca_wholesaler

 

Unity Catalog information: purgo_playground.config_master

 

Expected output:

 

* Databricks PySpark code (a hedged sketch follows below)

* Target table determined based on the delta_stg_tables value within the purgo_playground schema.
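
As a non-authoritative sketch of the expected PySpark code, the snippet below wires the pieces above together: it reuses the resolve_target_table helper sketched earlier, reads each volume path with Auto Loader (cloudFiles), uses the /dbfs/tmp/{country}/schema location, and writes to the resolved table in purgo_playground on a 10-second trigger. The checkpoint path and the CSV header option are assumptions, not part of the stated requirement.

```python
# Volume paths listed above, keyed by the country code used for the schema location.
volume_paths = {
    "us": "/Volumes/agilisium_playground/purgo_playground/stg_us_wholesaler",
    "nl": "/Volumes/agilisium_playground/purgo_playground/stg_nl_wholesaler",
    "ca": "/Volumes/agilisium_playground/purgo_playground/stg_ca_wholesaler",
}

# `spark` is the session provided by the Databricks notebook environment.
for country, volume_path in volume_paths.items():
    # Target table resolved from config_master (helper sketched earlier);
    # assumption: delta_stg_tables holds a bare table name inside purgo_playground.
    target_table = resolve_target_table(spark, volume_path)

    # Incrementally ingest new CSV files from the volume with Auto Loader.
    stream_df = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("cloudFiles.schemaLocation", f"/dbfs/tmp/{country}/schema")
        .option("header", "true")  # assumption: the staging CSVs have a header row
        .load(volume_path)
    )

    # Append into the resolved Delta target table, triggering every 10 seconds.
    (
        stream_df.writeStream
        .format("delta")
        .outputMode("append")
        .option("checkpointLocation", f"/dbfs/tmp/{country}/checkpoint")  # hypothetical path
        .trigger(processingTime="10 seconds")
        .toTable(f"purgo_playground.{target_table}")
    )
```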

