Extract dimensions from product description
Requirement
Introduction: We often receive raw product-description feeds from which specific information must be extracted. Pipeline code is used to extract the product_size value from each product description.
Requirements: Implement PySpark logic to extract the product_size value from the 'product_description' column of the 'agilisium_playground.purgo_playground.product_desc' table. If no product_size is found in the 'product_description' column, extract it from the 'product_id' column using the same pattern. To extract product_size, split the string on hyphens (-), take the segments after the third hyphen from the end (that is, the last three segments), and join them with hyphens. Each segment of product_size is converted to a float and formatted to four decimal places with f"{segment:.4f}".
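The split-and-format rule can be sketched in plain Python as follows, assuming the size is always the final three hyphen-separated segments of the string. The function name `format_product_size` is hypothetical. Note that applying f"{segment:.4f}" literally turns ".3750" into "0.3750", so for the first example below this rule produces "3.7500-0.3750-23.7000" rather than "3.7500-.3750-23.7000"; it is worth confirming which form is intended.

```python
def format_product_size(text: str) -> str:
    """Return the last three hyphen-separated segments, each formatted to 4 decimals.

    Hypothetical helper: assumes the size is always the final three segments.
    Raises ValueError if any of those segments is not a number.
    """
    # Split on every hyphen and keep the segments after the 3rd hyphen from the end.
    segments = text.strip().split("-")[-3:]
    # Convert each segment to float and render it with exactly four decimal places.
    return "-".join(f"{float(segment):.4f}" for segment in segments)


desc = ("ALUM EXTRUDED ROUND TUBE - BAR - 1-Aluminum Bar - 6061-T6511-TB - "
        "6061-T6511-TB-3.7500-.3750-23.7000")
print(format_product_size(desc))
```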
Condition: If product_size cannot be extracted from the 'product_description' column, extract it from the 'product_id' column instead.
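The fallback condition can be sketched as below, assuming "not found" means the string does not end in three numeric hyphen-separated segments. The regular expression and the names `SIZE_PATTERN`, `_size_from` and `extract_product_size` are illustrative assumptions, not part of the requirement.

```python
import re
from typing import Optional

# Assumed pattern: three numeric segments (e.g. "3.7500", ".3750", "23") at the
# very end of the string, the first one preceded by a hyphen or the string start.
SIZE_PATTERN = re.compile(r"(?:^|-)(\d*\.?\d+)-(\d*\.?\d+)-(\d*\.?\d+)$")


def _size_from(text: Optional[str]) -> Optional[str]:
    """Return the formatted size found at the end of text, or None if absent."""
    if not text:
        return None
    match = SIZE_PATTERN.search(text.strip())
    if match is None:
        return None
    # Format every captured segment to four decimal places and re-join with hyphens.
    return "-".join(f"{float(seg):.4f}" for seg in match.groups())


def extract_product_size(description: Optional[str],
                         product_id: Optional[str]) -> Optional[str]:
    """Prefer the size in product_description; fall back to product_id."""
    return _size_from(description) or _size_from(product_id)
```

Because `_size_from` returns None for an empty value or a string without a trailing size, the `or` triggers the fallback in both cases.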
Examples of product_size extraction:
If 'product_description' is "ALUM EXTRUDED ROUND TUBE - BAR - 1-Aluminum Bar - 6061-T6511-TB - 6061-T6511-TB-3.7500-.3750-23.7000", then 'product_size' is "3.7500-.3750-23.7000".
If 'product_description' is "ALUM EXTRUDED ROUND TUBE - BAR - 1-Aluminum Bar - 6061-T6511-TB - 6061-T6511-TB-4.2500-1.0000-59.6630", then 'product_size' is "4.2500-1.0000-59.6630".
Unity Catalog table: agilisium_playground.purgo_playground.product_desc
Expected output: Databricks PySpark code that runs without syntax errors.
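A hedged PySpark sketch for Databricks: it applies an assumed trailing-size pattern with native column functions (`regexp_extract`, `format_string`, `concat_ws`, `coalesce`) and falls back to `product_id` when the description yields nothing. The regular expression, the helper names `size_from_column` and `with_product_size`, and the choice to return a DataFrame rather than write a table are illustrative assumptions. The pyspark imports sit inside the functions so the module can be imported, and the pattern checked, without a Spark installation.

```python
# Assumed pattern: three numeric segments at the end of the string, the first
# preceded by a hyphen or the string start. Spark evaluates it as a Java regex.
SIZE_PATTERN = r"(?:^|-)(\d*\.?\d+)-(\d*\.?\d+)-(\d*\.?\d+)$"
TABLE_NAME = "agilisium_playground.purgo_playground.product_desc"


def size_from_column(col):
    """Return a Column with the formatted size, or null when the pattern is absent."""
    from pyspark.sql import functions as F

    text = F.trim(col.cast("string"))
    parts = [F.regexp_extract(text, SIZE_PATTERN, i) for i in (1, 2, 3)]
    # Cast each captured segment to double and render it with four decimal places.
    formatted = [F.format_string("%.4f", p.cast("double")) for p in parts]
    # regexp_extract returns "" when there is no match; when() without otherwise() gives null.
    return F.when(parts[0] != "", F.concat_ws("-", *formatted))


def with_product_size(df):
    """Add a product_size column, preferring product_description over product_id."""
    from pyspark.sql import functions as F

    return df.withColumn(
        "product_size",
        F.coalesce(
            size_from_column(F.col("product_description")),
            size_from_column(F.col("product_id")),
        ),
    )
```

In a Databricks notebook, where `spark` and `display` are predefined, this could be run as `display(with_product_size(spark.table(TABLE_NAME)))`. Native expressions avoid the per-row serialization cost of a Python UDF. As with the f-string rule, "%.4f" renders ".3750" as "0.3750".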