Extract dimensions from product description

Requirement

Introduction: We often receive raw product description feeds from which specific information must be extracted. This pipeline code extracts the product_size information from the product description.

Requirements: Implement PySpark logic to extract product_size values from the 'product_description' column of the 'agilisium_playground.purgo_playground.product_desc' table. If product_size is not found in the 'product_description' column, extract it from the 'product_id' column using the same pattern. To form product_size, split the string by hyphen (-), take the segments after the 3rd hyphen from the end (that is, the last three segments), and join them with hyphens. Each segment of product_size is converted to a float and formatted to four decimal places with f"{segment:.4f}".

Condition: Extract product_size from the 'product_description' column; if it cannot be extracted there, extract it from the 'product_id' column instead.
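The extraction and fallback rules above can be sketched in plain Python as follows. This is a minimal sketch, not the pipeline's actual implementation. The function names extract_product_size and resolve_product_size are hypothetical. The code assumes that product_size counts as "not found" when the text is empty, has fewer than three hyphen-separated segments, or any of the last three segments fails to parse as a float.

```python
from typing import Optional


def extract_product_size(text: Optional[str]) -> Optional[str]:
    """Hypothetical helper: take the segments after the 3rd hyphen from the
    end, convert each to float, format with four decimals, re-join with '-'.

    Assumption: returns None ("not found") for empty input, fewer than three
    segments, or any segment that does not parse as a float.
    """
    if not text:
        return None
    segments = text.strip().split("-")
    if len(segments) < 3:
        return None
    try:
        # The last three segments are those after the 3rd hyphen from the end.
        values = [float(seg.strip()) for seg in segments[-3:]]
    except ValueError:
        return None
    # Each segment is formatted with f"{segment:.4f}" as the requirement states.
    return "-".join(f"{value:.4f}" for value in values)


def resolve_product_size(description: Optional[str],
                         product_id: Optional[str]) -> Optional[str]:
    """Hypothetical helper: prefer product_description, else fall back to
    product_id using the same extraction pattern."""
    size = extract_product_size(description)
    if size is None:
        size = extract_product_size(product_id)
    return size
```

For example, applied to the second sample description further below, resolve_product_size returns "4.2500-1.0000-59.6630". To run this in PySpark, one option is to wrap the resolver with pyspark.sql.functions.udf(resolve_product_size, StringType()) and apply it with withColumn to the product_description and product_id columns. An implementation built only from Spark's built-in functions would avoid the serialization overhead of a Python UDF.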

Examples of product_size extraction:

If the 'product_description' is "ALUM EXTRUDED ROUND TUBE - BAR - 1-Aluminum Bar - 6061-T6511-TB - 6061-T6511-TB-3.7500-.3750-23.7000", then 'product_size' will be "3.7500-.3750-23.7000".

If the 'product_description' is "ALUM EXTRUDED ROUND TUBE - BAR - 1-Aluminum Bar - 6061-T6511-TB - 6061-T6511-TB-4.2500-1.0000-59.6630", then 'product_size' will be "4.2500-1.0000-59.6630".
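Applying the per-segment f"{segment:.4f}" rule to the two example sizes exposes one discrepancy. Python's fixed-point formatting adds a leading zero, so the ".3750" segment in the first example becomes "0.3750". The second example is unchanged. The helper name format_segments is hypothetical. If the leading-zero-free form shown in the first example is the intended output, the formatting rule would need to change, so it is worth confirming which form is expected.

```python
def format_segments(size: str) -> str:
    """Hypothetical helper: apply f"{segment:.4f}" to each hyphen-separated
    segment of an already-extracted product_size and re-join with '-'."""
    return "-".join(f"{float(segment):.4f}" for segment in size.split("-"))


for raw in ("3.7500-.3750-23.7000", "4.2500-1.0000-59.6630"):
    print(raw, "->", format_segments(raw))
# 3.7500-.3750-23.7000 -> 3.7500-0.3750-23.7000
# 4.2500-1.0000-59.6630 -> 4.2500-1.0000-59.6630
```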

Unity Catalog details: agilisium_playground.purgo_playground.product_desc

Expected Output: Databricks PySpark code without syntax errors.

Purgo AI Agentic Code