Insurance claim fraud detection
Requirement
h3. Information:
Fraud detection in insurance claims focuses on identifying and preventing deceptive practices during claim submission, processing, or payment. Effective fraud detection is crucial for maintaining the integrity of the insurance industry, minimizing financial losses, and ensuring fair treatment of legitimate claimants.
h3. Requirements:
Create a Databricks PySpark script to detect fraudulent activities in the health_insurance_claims table by implementing the following rules:
* *Duplicate Claims:* Identify claims submitted more than once with the same "Patient_ID", "Service_Date", and "Procedure_Code" values.
* *Unusual Procedure for Age Group:* Detect claims whose procedure is inconsistent with the patient’s age (e.g., "Procedure_Code" values 99213, 93000, and 70450 are valid only for "Age" 25 and above).
* *Excessive Patient Paid Amount:* Flag claims where the "Patient_Paid" amount exceeds the "Allowed_Amount".
* *Suspicious Medication Claims:*
** *Atorvastatin*: Should not be associated with "Diagnosis_Code" values K21, E11, or J45.
** *Ventolin*: Should not be associated with "Diagnosis_Code" values E11, I10, or N18.
* *Backdated or Future Claims:* Identify claims whose "Service_Date" is in the future or more than 3 years old.
Add an is_fraud column to the table.
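The rules above leave a few details open, so a small plain-Python reference of the rule logic may help pin down the intended behaviour before it is written as PySpark column expressions. This is only a sketch under stated assumptions, not the expected deliverable: the medication column name ("Medication") is hypothetical because the ticket does not name it, codes are compared as strings, "older than 3 years" is read as three calendar years before the run date, is_fraud is taken to be true when any rule fires, and null handling is ignored.

```python
from collections import Counter
from datetime import date

# Assumption: codes are compared as strings; the real column types are not stated.
AGE_RESTRICTED_PROCEDURES = {"99213", "93000", "70450"}
MIN_AGE = 25
# Medication -> diagnosis codes it should not be paired with (taken from the rules).
# Assumption: the medication is stored in a column named "Medication" (hypothetical).
DISALLOWED_DIAGNOSES = {
    "Atorvastatin": {"K21", "E11", "J45"},
    "Ventolin": {"E11", "I10", "N18"},
}


def years_before(day, years):
    """Return the same calendar day `years` earlier (Feb 29 falls back to Feb 28)."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        return day.replace(year=day.year - years, day=28)


def rule_flags(claim, duplicate_keys, today):
    """Evaluate every rule for one claim (a dict keyed by column name)."""
    key = (claim["Patient_ID"], claim["Service_Date"], claim["Procedure_Code"])
    return {
        "duplicate": key in duplicate_keys,
        "age_mismatch": (
            str(claim["Procedure_Code"]) in AGE_RESTRICTED_PROCEDURES
            and claim["Age"] < MIN_AGE
        ),
        "overpaid": claim["Patient_Paid"] > claim["Allowed_Amount"],
        "medication_mismatch": claim["Diagnosis_Code"]
        in DISALLOWED_DIAGNOSES.get(claim["Medication"], set()),
        "bad_date": claim["Service_Date"] > today
        or claim["Service_Date"] < years_before(today, 3),
    }


def flag_claims(claims, today):
    """Return a copy of each claim with an added boolean is_fraud field."""
    counts = Counter(
        (c["Patient_ID"], c["Service_Date"], c["Procedure_Code"]) for c in claims
    )
    # A key seen more than once marks every claim that shares it as a duplicate.
    duplicate_keys = {key for key, n in counts.items() if n > 1}
    return [
        {**c, "is_fraud": any(rule_flags(c, duplicate_keys, today).values())}
        for c in claims
    ]
```

Passing `today` explicitly keeps the date rule deterministic in tests. In the PySpark version the same conditions would typically become boolean column expressions combined with OR, with the duplicate check done through a grouped or windowed count over the three key columns.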
Prerequisite:
- Drop the table health_insurance_claims_clone if it exists.
- Create a replica of the health_insurance_claims table named health_insurance_claims_clone, and implement the requirements on the replica.
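The two prerequisite steps could be sketched as a small helper that issues Spark SQL through the session object. Here `recreate_clone` is a hypothetical name, `spark` is assumed to be the SparkSession that a Databricks notebook provides, and placing the clone in the same purgo_playground schema as the source is an assumption, since the ticket gives the schema only for the source table.

```python
SOURCE = "purgo_playground.health_insurance_claims"
# Assumption: the clone lives in the same schema as the source table.
CLONE = "purgo_playground.health_insurance_claims_clone"


def recreate_clone(spark, source, clone):
    """Drop `clone` if it exists, then rebuild it as a full copy of `source`."""
    spark.sql(f"DROP TABLE IF EXISTS {clone}")
    spark.sql(f"CREATE TABLE {clone} AS SELECT * FROM {source}")


# In a Databricks notebook, where `spark` is predefined:
# recreate_clone(spark, SOURCE, CLONE)
```

A CREATE TABLE ... AS SELECT copy keeps the source untouched while the fraud rules are applied to the replica. If the source is a Delta table, a Databricks DEEP CLONE statement may be an alternative way to produce the copy.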
Unity catalog information: purgo_playground.health_insurance_claims
Expected output: Databricks PySpark/Spark SQL