
Taking a Backup of Onboarded S3 Files

Requirement


Create a Databricks PySpark script to automate the migration of files from the Purgo S3 landing folder to the archive folder. The script must reference the s3_file_process_log table to identify which files are eligible for archiving.

 

Only files with file_status = 'SUCCESS' should be moved. For each file, the script should dynamically read the s3_landing_path (source folder) and s3_archive_path (target folder) from the same log table, as in the sketch below. Example of an S3 folder path in the log table: s3://agilisium-playground-dev/filestore/purgo/patient_raw
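
As a rough sketch, the eligibility lookup could be expressed as the PySpark query below. The table and column names s3_file_process_log, file_status, s3_landing_path, and s3_archive_path come from the requirement; the file_name column is an assumption about how individual files are recorded in the log.

```python
from pyspark.sql import functions as F

# Rows describing files that completed processing successfully.
# NOTE: file_name is an assumed column; only file_status, s3_landing_path,
# and s3_archive_path are named in the requirement.
eligible_files = (
    spark.table("s3_file_process_log")          # spark is predefined in Databricks
    .filter(F.col("file_status") == "SUCCESS")  # only successfully processed files
    .select("file_name", "s3_landing_path", "s3_archive_path")
)
eligible_files.show(truncate=False)
```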


Databricks Secret Information: the "access_key" and "secret_key" values are stored in a Databricks secret scope named "aws_keys".
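
A minimal sketch of reading these credentials in the notebook, assuming the keys are consumed by a boto3 S3 client (the use of boto3 is an assumption; dbutils is available by default in Databricks notebooks):

```python
import boto3

# Pull the AWS credentials from the "aws_keys" secret scope.
access_key = dbutils.secrets.get(scope="aws_keys", key="access_key")
secret_key = dbutils.secrets.get(scope="aws_keys", key="secret_key")

# boto3 client for the copy/delete operations (library choice is an assumption).
s3 = boto3.client(
    "s3",
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,
)
```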

Purgo AI Agentic Code
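
The following is a hedged end-to-end sketch of such a script rather than a definitive implementation. It assumes a file_name column in the log table, treats s3_landing_path and s3_archive_path as folder prefixes, and moves each object with a boto3 copy-then-delete, since S3 has no native move operation.

```python
import boto3
from urllib.parse import urlparse
from pyspark.sql import functions as F

def split_s3_path(path: str):
    """Split 's3://bucket/prefix' into (bucket, prefix)."""
    parsed = urlparse(path)
    return parsed.netloc, parsed.path.lstrip("/")

# AWS credentials from the Databricks secret scope described above.
s3 = boto3.client(
    "s3",
    aws_access_key_id=dbutils.secrets.get(scope="aws_keys", key="access_key"),
    aws_secret_access_key=dbutils.secrets.get(scope="aws_keys", key="secret_key"),
)

# Files eligible for archiving, with their per-file source and target folders.
# file_name is an assumed column name.
eligible_files = (
    spark.table("s3_file_process_log")
    .filter(F.col("file_status") == "SUCCESS")
    .select("file_name", "s3_landing_path", "s3_archive_path")
    .collect()
)

for row in eligible_files:
    src_bucket, src_prefix = split_s3_path(row["s3_landing_path"])
    dst_bucket, dst_prefix = split_s3_path(row["s3_archive_path"])
    src_key = f"{src_prefix.rstrip('/')}/{row['file_name']}"
    dst_key = f"{dst_prefix.rstrip('/')}/{row['file_name']}"

    # S3 has no native move: copy the object into the archive folder,
    # then delete the original from the landing folder.
    s3.copy_object(
        Bucket=dst_bucket,
        Key=dst_key,
        CopySource={"Bucket": src_bucket, "Key": src_key},
    )
    s3.delete_object(Bucket=src_bucket, Key=src_key)
    print(f"Archived s3://{src_bucket}/{src_key} -> s3://{dst_bucket}/{dst_key}")
```

Collecting the log rows to the driver is reasonable here if the table holds one row per file; for very large logs, the move loop could be distributed instead.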
