Protecting customer 360 PII data
Requirement
Information: PII (Personally Identifiable Information) refers to any data that can be used to identify a specific individual. Protecting PII is crucial because its exposure or misuse can lead to privacy violations, identity theft, or fraud. We need to identify and encrypt the PII data to make sure it is safe.
Requirement: Develop a Databricks PySpark script to encrypt the specified columns in the customer_360_raw table and load the encrypted data back into the table. Additionally, after encrypting the data, save the encryption key as a JSON file named encryption_key_<current_datetime> in the volume location.
PII columns:
name,
email,
phone,
zip
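The column encryption could be sketched as follows. This is a minimal illustration, not a definitive implementation: the requirement does not name an algorithm, so AES-256 in GCM mode via Spark SQL's built-in aes_encrypt function (available from Spark 3.3) is an assumption, as are the helper names generate_key_hex, encrypt_select_exprs and encrypt_pii. Ciphertext is base64-encoded so that it can be stored as text.

```python
import secrets

# PII columns listed in the requirement.
PII_COLUMNS = ["name", "email", "phone", "zip"]


def generate_key_hex(num_bytes=32):
    """Return a random AES key as a hex string (32 bytes gives AES-256)."""
    return secrets.token_bytes(num_bytes).hex()


def encrypt_select_exprs(all_columns, key_hex, pii_columns=PII_COLUMNS):
    """Build one Spark SQL select expression per column.

    PII columns are cast to STRING, encrypted with AES-GCM and
    base64-encoded; every other column is passed through unchanged.
    """
    exprs = []
    for col in all_columns:
        if col in pii_columns:
            exprs.append(
                f"base64(aes_encrypt(CAST(`{col}` AS STRING), "
                f"unhex('{key_hex}'), 'GCM')) AS `{col}`"
            )
        else:
            exprs.append(f"`{col}`")
    return exprs


def encrypt_pii(df, key_hex):
    """Return a DataFrame whose PII columns are replaced by ciphertext."""
    return df.selectExpr(*encrypt_select_exprs(df.columns, key_hex))
```

In a Databricks notebook, where `spark` is predefined, the result of `encrypt_pii(spark.table("purgo_playground.customer_360_raw_clone"), key_hex)` could then be written back with `.write.mode("overwrite").option("overwriteSchema", "true").saveAsTable(...)`, assuming the replica is a Delta table. Note that this sketch embeds the key in the SQL expression text, so it may appear in query plans; a production script would likely pass it differently.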
Prerequisite:
- Drop the table customer_360_raw_clone if it exists.
- Create a replica of the customer_360_raw table as customer_360_raw_clone and perform the requirement on the replica table.
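The two prerequisite steps could be sketched as below. This is one possible approach under stated assumptions: the replica is created with a plain CREATE TABLE ... AS SELECT, the two-part table names rely on the current catalog being set correctly, and the helper names clone_statements and recreate_clone are hypothetical. The `spark` argument is the session that Databricks notebooks provide.

```python
# Table names taken from the requirement; the catalog is assumed to be
# the session's current catalog.
SCHEMA = "purgo_playground"
SOURCE_TABLE = f"{SCHEMA}.customer_360_raw"
CLONE_TABLE = f"{SCHEMA}.customer_360_raw_clone"


def clone_statements(source=SOURCE_TABLE, clone=CLONE_TABLE):
    """Return the SQL statements that drop and recreate the replica."""
    return [
        f"DROP TABLE IF EXISTS {clone}",
        f"CREATE TABLE {clone} AS SELECT * FROM {source}",
    ]


def recreate_clone(spark, source=SOURCE_TABLE, clone=CLONE_TABLE):
    """Run the drop and create statements in order on the given session."""
    for statement in clone_statements(source, clone):
        spark.sql(statement)
```

Calling `recreate_clone(spark)` before the encryption step keeps the original customer_360_raw table untouched.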
JSON location: /Volumes/agilisium_playground/purgo_playground/de_dq
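Saving the key could look like the sketch below. Several details are assumptions because the requirement does not specify them: the .json extension, the timestamp format YYYYMMDD_HHMMSS, the field name encryption_key, and the helper name save_encryption_key. It relies on Unity Catalog volume paths being writable through ordinary Python file APIs on Databricks.

```python
import json
import os
from datetime import datetime

# Volume location given in the requirement.
VOLUME_DIR = "/Volumes/agilisium_playground/purgo_playground/de_dq"


def save_encryption_key(key_hex, directory=VOLUME_DIR, now=None):
    """Write the key to encryption_key_<current_datetime>.json.

    Returns the full path of the file that was written.
    """
    now = now or datetime.now()
    # Assumed timestamp format; the requirement only says <current_datetime>.
    file_name = f"encryption_key_{now.strftime('%Y%m%d_%H%M%S')}.json"
    path = os.path.join(directory, file_name)
    with open(path, "w") as handle:
        # Assumed JSON layout: a single "encryption_key" field.
        json.dump({"encryption_key": key_hex}, handle)
    return path
```

Calling `save_encryption_key(key_hex)` only after the encrypted data has been written matches the order stated in the requirement.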
Unity Catalog Information: purgo_playground.customer_360_raw/customer_360_raw_clone
Expected output: Databricks PySpark code and the encryption key