Develop Function Result Matching Between Branches

Requirement

Introduction: This validation mechanism compares the function_result column across branch_name values. Its goal is to automatically flag each output with a function_result_status ("Matched" or "Mismatched"), together with its branch_name and function_name, to ensure that critical outputs stay consistent during development.

Requirement: Develop PySpark logic that validates and compares function results between main_branch and feature_branch, using the source table purgo_playground.function_match:
1. Read the purgo_playground.function_match table into a DataFrame.
2. Pivot the dataset on branch_name, grouping by run_id (groupBy(run_id)). For each branch, capture the first function_result per run_id; function_name is not used to distinguish results.
3. Filter the source data to records where branch_name = 'main_branch' to retrieve the metadata (function_name, branch_name) associated with each run_id.
4. Join the filtered main_branch records with the pivoted result on run_id only (not on run_id + function_name).
5. Compare the function_result values of the two branches: if main_branch == feature_branch, set function_result_status to "Matched"; otherwise, set it to "Mismatched".
6. Rename the pivoted columns: main_branch to function_result_main and feature_branch to function_result_feature.

Final output: Display the columns run_id, branch_name (always taken from main_branch), function_name, function_result_main, function_result_feature, and function_result_status.
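As a worked example of the output shape, the following plain-Python model (no Spark required) mirrors the matching rules, assuming "first" means first in input order. The function build_report and the sample rows are hypothetical and only illustrate the semantics. The sample also shows a consequence of joining on run_id alone: when a run_id has several functions, every main_branch row of that run receives the same comparison, based on whichever result came first for each branch.

```python
from typing import Dict, List, Optional, Tuple

# (run_id, branch_name, function_name, function_result); layout is hypothetical.
Row = Tuple[int, str, str, str]


def build_report(rows: List[Row]) -> List[dict]:
    """Plain-Python model of the branch comparison (first = first in input order)."""
    # Pivot step: first function_result per (run_id, branch), ignoring function_name.
    first: Dict[Tuple[int, str], str] = {}
    for run_id, branch, _function_name, result in rows:
        first.setdefault((run_id, branch), result)

    report = []
    # Filter + join step: one output row per main_branch record, matched on run_id only.
    for run_id, branch, function_name, _result in rows:
        if branch != "main_branch":
            continue
        main: Optional[str] = first.get((run_id, "main_branch"))
        feature: Optional[str] = first.get((run_id, "feature_branch"))
        # A missing value on either side counts as "Mismatched",
        # matching the null behaviour of the SQL == comparison.
        matched = main is not None and feature is not None and main == feature
        report.append({
            "run_id": run_id,
            "branch_name": branch,
            "function_name": function_name,
            "function_result_main": main,
            "function_result_feature": feature,
            "function_result_status": "Matched" if matched else "Mismatched",
        })
    return report


sample = [
    (1, "main_branch", "f_a", "10"),
    (1, "main_branch", "f_b", "20"),
    (1, "feature_branch", "f_a", "10"),
    (1, "feature_branch", "f_b", "99"),
    (2, "main_branch", "f_c", "5"),
]
for row in build_report(sample):
    print(row)
```

Here both run 1 rows are reported as "Matched" even though f_b differs between branches (20 vs 99), because only the first result per branch (10, from f_a) is compared. This is the behaviour the requirement specifies by joining on run_id rather than run_id + function_name.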

Unity Catalog: purgo_playground.function_match
