Real-Time Ingestion of Patient Data from S3 to Delta Using Auto Loader
Requirement
Stream patient data in CSV format in real time from the Purgo S3 location into the patient_data_auto_loader table in append mode. All columns should be read as String type, except data_loaded_at, which must be a timestamp. The complete S3 folder path should be dynamically retrieved from the s3_landing_path column of the ingest_config_master table, where source_object_name contains the keyword 'patient_data'.
Schema location: /mnt/checkpoints/s3_autoloader/patient_al_schema
Checkpoint location: /mnt/checkpoints/patient_al_cp/
Access and secret key details: Configure Spark to access S3 by retrieving the access_key and secret_key securely from the Databricks secret scope aws_keys (a sketch of this setup follows below).
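A minimal sketch of the credential setup and the dynamic path lookup, assuming a Databricks notebook context where dbutils and spark are available, and that the aws_keys scope holds secrets named access_key and secret_key as stated above:

```python
from pyspark.sql.functions import col

# Retrieve AWS credentials from the Databricks secret scope "aws_keys".
access_key = dbutils.secrets.get(scope="aws_keys", key="access_key")
secret_key = dbutils.secrets.get(scope="aws_keys", key="secret_key")

# Configure Spark to authenticate against S3 with these keys.
spark.conf.set("fs.s3a.access.key", access_key)
spark.conf.set("fs.s3a.secret.key", secret_key)

# Dynamically resolve the S3 landing path from the config table,
# matching rows whose source_object_name contains 'patient_data'.
s3_landing_path = (
    spark.table("ingest_config_master")
         .filter(col("source_object_name").contains("patient_data"))
         .select("s3_landing_path")
         .first()["s3_landing_path"]
)
```

The lookup takes the first matching row; if ingest_config_master can hold more than one patient_data entry, the filter would need to be tightened accordingly.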
Below is the schema structure of the CSV file (an end-to-end sketch of the stream follows the table):
||Column||
|patient_id|
|patient_name|
|age|
|diagnosis|
|treatment|
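A minimal end-to-end sketch of the Auto Loader stream under the requirements above. Two assumptions are made, since the requirement does not state them: the CSV files carry a header row, and data_loaded_at is stamped at ingestion time with current_timestamp(), as it does not appear in the CSV schema.

```python
from pyspark.sql.functions import current_timestamp
from pyspark.sql.types import StructType, StructField, StringType

# All CSV columns are read as String, per the requirement.
csv_schema = StructType([
    StructField("patient_id", StringType(), True),
    StructField("patient_name", StringType(), True),
    StructField("age", StringType(), True),
    StructField("diagnosis", StringType(), True),
    StructField("treatment", StringType(), True),
])

stream_df = (
    spark.readStream.format("cloudFiles")
         .option("cloudFiles.format", "csv")
         .option("cloudFiles.schemaLocation",
                 "/mnt/checkpoints/s3_autoloader/patient_al_schema")
         .option("header", "true")  # assumption: files include a header row
         .schema(csv_schema)
         .load(s3_landing_path)     # path resolved from ingest_config_master above
         .withColumn("data_loaded_at", current_timestamp())  # timestamp added at load
)

query = (
    stream_df.writeStream
             .format("delta")
             .option("checkpointLocation", "/mnt/checkpoints/patient_al_cp/")
             .outputMode("append")
             .toTable("patient_data_auto_loader")
)
```

Append mode together with the checkpoint location lets the stream resume from where it left off and avoids reprocessing files Auto Loader has already ingested.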