
SCD2 for d_product table

Requirement

Introduction: To build an effective Lakehouse, we need to maintain data history. The d_product table includes an updt_dt column that logs the most recent modification timestamp for each record. To capture historical changes for each prod_id, we will implement Slowly Changing Dimension Type 2 (SCD2) by adding a column, is_record_active, that tracks the validity of product records over time.

 

Requirement: Create a Databricks Spark SQL script that displays the SCD Type 2 result of the d_product table, including a new column, is_record_active.

 

* For each prod_id, retrieve the latest record based on updt_dt.

** Set is_record_active to 1.

* For older records with the same prod_id (other than the latest):

** Update is_record_active to 0 to indicate the record is no longer active.
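
The two rules above can be sketched as a single query. This is a minimal, hedged example and not a definitive implementation: it assumes d_product resolves in the current catalog and schema, that only prod_id and updt_dt matter for the flag, and that updt_dt is unique within each prod_id. The alias p is illustrative.

```sql
-- Sketch only: assumes d_product is reachable in the current catalog/schema
-- and that updt_dt is unique per prod_id (no ties for "latest").
SELECT
  p.*,
  CASE
    -- Rank 1 is the most recent row for this prod_id
    WHEN ROW_NUMBER() OVER (
           PARTITION BY p.prod_id
           ORDER BY p.updt_dt DESC
         ) = 1 THEN 1   -- latest record: active
    ELSE 0              -- older record for the same prod_id: inactive
  END AS is_record_active
FROM d_product AS p;
```

ROW_NUMBER marks exactly one row per prod_id as active. If two rows for the same prod_id could share the same updt_dt, the choice between them is arbitrary; adding a tie-breaking column to the ORDER BY, or comparing updt_dt with MAX(updt_dt) OVER (PARTITION BY prod_id), are possible alternatives depending on how ties should be treated.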

 

Unity Catalog Information: d_product

 

Expected Output: Databricks SQL code

