Building data profiling table

Requirement

Information:

Data profiling analyzes the content and characteristics of each column in a dataset. In this case, we need to create a data profiling table.

 

Requirement:

Develop Databricks PySpark code to create or replace a data profiling table named data_profile. The table should capture the data type, null count, distinct count, total count, minimum value, maximum value, and mean of each column in the tables listed below:

d_product_revenue_silver
d_product_revenue
config_master
dq_check_table

data_profile table should have the following columns:

table_name
column_name
Data type
Null count
Distinct count
Total count
Minimum value
Maximum value
Mean
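
One way to compute these metrics is to generate one Spark SQL SELECT per column and combine the results with UNION ALL. The following is a minimal sketch under stated assumptions, not a definitive implementation. The helper names (`is_numeric`, `column_profile_sql`, `table_profile_sql`) and the snake_case output names are hypothetical. Casting the minimum and maximum to STRING, so that columns of different types fit one table, is an assumption, as is restricting the mean to numeric types. The sketch also assumes that table and column names contain no single quotes. The `dtypes` argument is the list of (name, type) pairs that `DataFrame.dtypes` returns.

```python
# Assumption: type names look like the strings in DataFrame.dtypes,
# e.g. "int", "string", "decimal(10,2)".
NUMERIC_TYPES = {"tinyint", "smallint", "int", "bigint", "float", "double"}


def is_numeric(dtype):
    """Return True for types where an average is meaningful."""
    d = dtype.lower()
    return d in NUMERIC_TYPES or d.startswith("decimal")


def column_profile_sql(table_name, column_name, dtype):
    """Build a SELECT that yields one profile row for one column."""
    # Backtick-quote the identifier, doubling any embedded backticks.
    col = "`" + column_name.replace("`", "``") + "`"
    # Assumption: the mean is NULL for non-numeric columns.
    mean = f"CAST(AVG({col}) AS DOUBLE)" if is_numeric(dtype) else "CAST(NULL AS DOUBLE)"
    return (
        f"SELECT '{table_name}' AS table_name, "
        f"'{column_name}' AS column_name, "
        f"'{dtype}' AS data_type, "
        # COUNT(col) skips NULLs, so the difference is the null count.
        f"COUNT(*) - COUNT({col}) AS null_count, "
        f"COUNT(DISTINCT {col}) AS distinct_count, "
        f"COUNT(*) AS total_count, "
        f"CAST(MIN({col}) AS STRING) AS min_value, "
        f"CAST(MAX({col}) AS STRING) AS max_value, "
        f"{mean} AS mean "
        f"FROM {table_name}"
    )


def table_profile_sql(table_name, dtypes):
    """Combine the per-column SELECTs for one table with UNION ALL."""
    return "\nUNION ALL\n".join(
        column_profile_sql(table_name, name, dtype) for name, dtype in dtypes
    )
```

In a Databricks notebook this might be used as `spark.sql(table_profile_sql(name, spark.table(name).dtypes))`. Note that `COUNT(DISTINCT ...)` ignores NULLs, and that MAP columns may not support MIN, MAX, or distinct counting, so they would need separate handling.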

 

Prerequisites:

  1. Drop the data_profile table if it exists, then create it.
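
This prerequisite could be met with two statements issued through `spark.sql`. The sketch below is one possible layout, using the target name purgo_playground.data_profile given under Expected Output. The snake_case column names and the column types (STRING for minimum and maximum, DOUBLE for the mean) are assumptions, not part of the requirement. `profile_table_ddl` and `recreate_profile_table` are hypothetical helpers, and `spark` is expected to be the active SparkSession.

```python
PROFILE_TABLE = "purgo_playground.data_profile"

# Assumed schema: one row per (table, column) pair.
PROFILE_COLUMNS = [
    ("table_name", "STRING"),
    ("column_name", "STRING"),
    ("data_type", "STRING"),
    ("null_count", "BIGINT"),
    ("distinct_count", "BIGINT"),
    ("total_count", "BIGINT"),
    ("min_value", "STRING"),   # stored as text so any column type fits
    ("max_value", "STRING"),
    ("mean", "DOUBLE"),
]


def profile_table_ddl(table=PROFILE_TABLE):
    """Return the DROP and CREATE statements, in execution order."""
    cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in PROFILE_COLUMNS)
    return [
        f"DROP TABLE IF EXISTS {table}",
        f"CREATE TABLE {table} (\n  {cols}\n)",
    ]


def recreate_profile_table(spark, table=PROFILE_TABLE):
    """Drop the profile table if it exists, then create it empty."""
    for statement in profile_table_ddl(table):
        spark.sql(statement)
```

In a Databricks notebook, where `spark` is predefined, calling `recreate_profile_table(spark)` would run both statements. A single `CREATE OR REPLACE TABLE` statement is an alternative for Delta tables.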

 

Unity Catalog Information:

purgo_playground.d_product_revenue_silver
purgo_playground.d_product_revenue
purgo_playground.config_master
purgo_playground.dq_check_table
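
A driver loop over these tables might look like the sketch below. It assumes that the active SparkSession is passed in as `spark` and that purgo_playground.data_profile already exists. `load_profiles` is a hypothetical helper, and `build_query` is a caller-supplied function (also hypothetical) that turns a table name and its (column, type) pairs into a SELECT whose output columns match the target table's column order.

```python
SOURCE_TABLES = [
    "purgo_playground.d_product_revenue_silver",
    "purgo_playground.d_product_revenue",
    "purgo_playground.config_master",
    "purgo_playground.dq_check_table",
]
TARGET_TABLE = "purgo_playground.data_profile"


def load_profiles(spark, build_query, tables=SOURCE_TABLES, target=TARGET_TABLE):
    """Profile each source table and append the rows to the target table.

    build_query(table_name, dtypes) must return a SELECT statement that
    produces one row per column, in the target table's column order.
    """
    for table in tables:
        # DataFrame.dtypes is a list of (column_name, type_string) pairs.
        dtypes = spark.table(table).dtypes
        query = build_query(table, dtypes)
        spark.sql(f"INSERT INTO {target}\n{query}")
```

Appending one table at a time keeps each generated statement small. Building a single DataFrame for all four tables and writing it once would be an equally reasonable design.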

 

Expected Output: Databricks PySpark code and the populated purgo_playground.data_profile table.

 

Purgo AI Agentic Code
