Migrating Vendor S3 File to Purgo S3 Folder
Requirement:
Create a Databricks PySpark script to transfer all files from the vendor's S3 bucket to the Purgo S3 folder, subject to the conditions below.
* File names present in the Vendor folder must not already exist in either the Purgo S3 folder or the Archive S3 folder.
* The script must also verify that only active files are ingested, by checking that the active_flag column in ingest_config_master equals "A".
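The file-name exclusion check above can be sketched as a small helper. This is an illustrative sketch, not part of the requirement: the function name and the idea of passing plain lists of file names (e.g. obtained from an S3 listing) are assumptions.

```python
def files_to_migrate(vendor_files, purgo_files, archive_files):
    """Return vendor file names present in neither the Purgo nor the Archive folder.

    Each argument is an iterable of bare file names, e.g. produced by
    listing the corresponding S3 folder. (Hypothetical helper for
    illustration; not a name mandated by the requirement.)
    """
    # A vendor file is eligible only if it appears in neither target folder.
    excluded = set(purgo_files) | set(archive_files)
    return sorted(name for name in set(vendor_files) if name not in excluded)
```

For example, a vendor file already sitting in the Archive folder is skipped, while a genuinely new file is selected for migration.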
The script should retrieve the complete S3 folder paths for Vendor, Purgo, and Archive from the ingest_config_master configuration table, using the values stored in the s3_vendor_path, s3_landing_path, and s3_archive_path columns respectively.
Databricks Secret Information: "access_key" and "secret_key" are stored as Databricks secrets under the scope "aws_keys".
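A minimal sketch of how the script might assemble these values on Databricks. The commented calls use the scope and column names stated above; the `extract_paths` helper and the config-row-as-dict shape are assumptions for illustration, not a prescribed design.

```python
# On Databricks, the credentials and configuration row would be fetched roughly as:
#   access_key = dbutils.secrets.get(scope="aws_keys", key="access_key")
#   secret_key = dbutils.secrets.get(scope="aws_keys", key="secret_key")
#   row = (spark.read.table("ingest_config_master")
#               .filter("active_flag = 'A'")
#               .first())
# The path extraction itself is plain dictionary access:

def extract_paths(config_row):
    """Return the (vendor, landing, archive) S3 folder paths from an
    ingest_config_master row, given as a mapping of column name to value.
    (Hypothetical helper name; column names are from the requirement.)
    """
    return (
        config_row["s3_vendor_path"],
        config_row["s3_landing_path"],
        config_row["s3_archive_path"],
    )
```

With the three paths and the credentials in hand, the script can list the Vendor, Purgo, and Archive folders, apply the exclusion conditions, and copy the remaining files into the Purgo folder.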