SCD2 for d_product table
Requirement
Introduction: To build an effective Lakehouse, we need to maintain data history. The d_product table includes an updt_dt column that logs the most recent modification timestamp for each record. To capture historical changes for each prod_id, we will implement Slowly Changing Dimension Type 2 (SCD2) by adding a column, is_record_active, that marks which version of each product record is currently valid.
Requirement: Create a Databricks Spark SQL script that displays the SCD Type 2 result for the d_product table, including a new column: is_record_active.
* For each prod_id, retrieve the latest record based on updt_dt.
** Set is_record_active to 1.
* For older records with the same prod_id (other than the latest):
** Set is_record_active to 0 to indicate the record is no longer active.
Unity Catalog Information: d_product
Expected Output: Databricks SQL code
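
One possible way to express the rules above is a window function that ranks each prod_id's rows by updt_dt, newest first. This is a minimal sketch, not a definitive solution. It assumes d_product resolves in the current catalog and schema, and that no two rows for the same prod_id share an updt_dt. If ties exist, ROW_NUMBER picks one of the tied rows arbitrarily, so a tie-breaking column would need to be added to the ORDER BY.

```sql
-- Sketch only: assumes d_product is resolvable in the current catalog/schema
-- and that updt_dt is unique per prod_id.
SELECT
  p.*,
  CASE
    -- Rank 1 is the row with the latest updt_dt for this prod_id.
    WHEN ROW_NUMBER() OVER (
           PARTITION BY p.prod_id
           ORDER BY p.updt_dt DESC
         ) = 1 THEN 1   -- latest record: active
    ELSE 0              -- older record: no longer active
  END AS is_record_active
FROM d_product AS p
ORDER BY p.prod_id, p.updt_dt;
```

The query only displays the flag and does not modify d_product, which matches the requirement to show the SCD2 result rather than persist it.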