Building data profiling table
Requirement
Information:
Data profiling analyzes the content and characteristics of each column in a dataset. In this case, we need to create a data profiling table that records these statistics for every column of the source tables.
Requirement:
Develop Databricks PySpark code that creates or replaces a data profiling table named data_profile. The table should capture the data type, null count, distinct count, total count, minimum value, maximum value, and mean of each column in the tables listed below:
d_product_revenue_silver
d_product_revenue
config_master
dq_check_table
The data_profile table should have the following columns:
table_name
column_name
Data type
Null count
Distinct count
Total count
Minimum value
Maximum value
Mean
Prerequisites:
- Drop the data_profile table if it exists, then create it.
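A minimal sketch of this prerequisite, assuming the code runs where a SparkSession is available as spark (as it is in a Databricks notebook). The snake_case column names and the SQL types are assumptions, since the requirement lists only display names. Minimum and maximum are stored as strings so that a single column can hold values from source columns of any type. The helper names are hypothetical.

```python
# Assumption: fully qualified name of the target table in Unity Catalog.
TARGET_TABLE = "purgo_playground.data_profile"

# Assumed physical column names and types for the nine required columns.
PROFILE_COLUMNS = [
    ("table_name", "STRING"),
    ("column_name", "STRING"),
    ("data_type", "STRING"),
    ("null_count", "BIGINT"),
    ("distinct_count", "BIGINT"),
    ("total_count", "BIGINT"),
    ("min_value", "STRING"),   # stored as text to cover every source type
    ("max_value", "STRING"),
    ("mean_value", "DOUBLE"),  # expected to be NULL for non-numeric columns
]


def recreate_statements(target=TARGET_TABLE, columns=PROFILE_COLUMNS):
    """Return the DROP and CREATE statements, in execution order."""
    column_ddl = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    return [
        f"DROP TABLE IF EXISTS {target}",
        f"CREATE TABLE {target} ({column_ddl})",
    ]


def recreate_profile_table(spark):
    """Drop the profile table if it exists, then create it empty."""
    for statement in recreate_statements():
        spark.sql(statement)
```

In a notebook this would be invoked as recreate_profile_table(spark) before any profiling results are written.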
Unity Catalog Information:
purgo_playground.d_product_revenue_silver
purgo_playground.d_product_revenue
purgo_playground.config_master
purgo_playground.dq_check_table
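One possible way to compute the required metrics for the tables above is to run a single aggregate query per column and union the results. The sketch below is illustrative, not a definitive implementation: the helper names profile_exprs and profile_table and the output column names are assumptions, and spark is assumed to be the session a Databricks notebook provides. The mean is computed only for numeric columns and left NULL otherwise. Columns of complex types such as map, for which MIN, MAX, or COUNT(DISTINCT ...) are not defined, would need to be skipped or cast first.

```python
from functools import reduce

CATALOG_SCHEMA = "purgo_playground"
SOURCE_TABLES = [
    "d_product_revenue_silver",
    "d_product_revenue",
    "config_master",
    "dq_check_table",
]

# Type names as reported by DataFrame.dtypes; decimals are matched by prefix.
NUMERIC_TYPES = {"tinyint", "smallint", "int", "bigint", "float", "double"}


def profile_exprs(table_name, column_name, data_type):
    """Return the SQL select expressions that profile one column."""
    col = f"`{column_name}`"
    is_numeric = data_type in NUMERIC_TYPES or data_type.startswith("decimal")
    mean = f"CAST(AVG({col}) AS DOUBLE)" if is_numeric else "CAST(NULL AS DOUBLE)"
    return [
        f"'{table_name}' AS table_name",
        f"'{column_name}' AS column_name",
        f"'{data_type}' AS data_type",
        f"COUNT(*) - COUNT({col}) AS null_count",  # COUNT(col) skips NULLs
        f"COUNT(DISTINCT {col}) AS distinct_count",
        "COUNT(*) AS total_count",
        f"CAST(MIN({col}) AS STRING) AS min_value",
        f"CAST(MAX({col}) AS STRING) AS max_value",
        f"{mean} AS mean_value",
    ]


def profile_table(spark, table_name):
    """Return a DataFrame with one profile row per column of the table."""
    df = spark.table(f"{CATALOG_SCHEMA}.{table_name}")
    per_column = [
        df.selectExpr(*profile_exprs(table_name, name, dtype))
        for name, dtype in df.dtypes
    ]
    return reduce(lambda left, right: left.unionByName(right), per_column)


def build_profile(spark, tables=SOURCE_TABLES):
    """Union the per-table profiles of every listed source table."""
    profiles = [profile_table(spark, table) for table in tables]
    return reduce(lambda left, right: left.unionByName(right), profiles)
```

Assuming the target table already exists with matching column names, the result could then be stored with build_profile(spark).write.mode("append").saveAsTable("purgo_playground.data_profile"). Running one query per column keeps the sketch simple but scans each table once per column; a single combined aggregation per table would be more efficient for wide tables.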
Expected Output: Databricks PySpark code and the populated purgo_playground.data_profile table.