Real-Time Ingestion of Patient Data from S3 to Delta Using Auto Loader
Requirement
Stream patient data in CSV format in real time from the Purgo S3 location into the patient_data_auto_loader table in append mode. All columns should be read as String type, except data_loaded_at, which must be a timestamp. The complete S3 folder path should be dynamically retrieved from the s3_landing_path column of the ingest_config_master table, where source_object_name contains the keyword 'patient_data'.
Schema location: /mnt/checkpoints/s3_autoloader/patient_al_schema
Checkpoint location: /mnt/checkpoints/patient_al_cp/
Access and secret key details: Configure Spark to access S3 by retrieving the access_key and secret_key securely from the Databricks secret scope aws_keys (a sketch of this setup follows below).
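A minimal sketch of the credential setup and the dynamic path lookup, assuming a Databricks notebook context where dbutils and spark are available, and that the aws_keys scope holds secrets named access_key and secret_key as stated above:

```python
from pyspark.sql.functions import col

# Retrieve AWS credentials from the Databricks secret scope "aws_keys".
access_key = dbutils.secrets.get(scope="aws_keys", key="access_key")
secret_key = dbutils.secrets.get(scope="aws_keys", key="secret_key")

# Configure Spark to authenticate against S3 with these keys.
spark.conf.set("fs.s3a.access.key", access_key)
spark.conf.set("fs.s3a.secret.key", secret_key)

# Dynamically resolve the S3 landing path from the config table,
# matching rows whose source_object_name contains 'patient_data'.
s3_landing_path = (
    spark.table("ingest_config_master")
         .filter(col("source_object_name").contains("patient_data"))
         .select("s3_landing_path")
         .first()["s3_landing_path"]
)
```

The lookup takes the first matching row; if ingest_config_master can hold more than one patient_data entry, the filter would need to be tightened accordingly.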
Below is the schema structure of the CSV file (an end-to-end sketch of the stream follows the table):
||Column||
|patient_id|
|patient_name|
|age|
|diagnosis|
|treatment|
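A minimal end-to-end sketch of the Auto Loader stream under the requirements above. Two assumptions are made, since the requirement does not state them: the CSV files carry a header row, and data_loaded_at is stamped at ingestion time with current_timestamp(), as it does not appear in the CSV schema.

```python
from pyspark.sql.functions import current_timestamp
from pyspark.sql.types import StructType, StructField, StringType

# All CSV columns are read as String, per the requirement.
csv_schema = StructType([
    StructField("patient_id", StringType(), True),
    StructField("patient_name", StringType(), True),
    StructField("age", StringType(), True),
    StructField("diagnosis", StringType(), True),
    StructField("treatment", StringType(), True),
])

stream_df = (
    spark.readStream.format("cloudFiles")
         .option("cloudFiles.format", "csv")
         .option("cloudFiles.schemaLocation",
                 "/mnt/checkpoints/s3_autoloader/patient_al_schema")
         .option("header", "true")  # assumption: files include a header row
         .schema(csv_schema)
         .load(s3_landing_path)     # path resolved from ingest_config_master above
         .withColumn("data_loaded_at", current_timestamp())  # timestamp added at load
)

query = (
    stream_df.writeStream
             .format("delta")
             .option("checkpointLocation", "/mnt/checkpoints/patient_al_cp/")
             .outputMode("append")
             .toTable("patient_data_auto_loader")
)
```

Append mode together with the checkpoint location lets the stream resume from where it left off and avoids reprocessing files Auto Loader has already ingested.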