
Non-English to English word conversion

Requirement

Introduction: Datasets often include text in multiple languages, especially when they contain user-generated content, international customer feedback, or multilingual documents. To standardize these datasets for analysis, non-English text must be translated into English.

 

h3. Requirements:

 

  1. Functionality:

#* The script should translate non-English content in the 'text' column of the 'purgo_playground.other_language' table to English.

#* The translation should detect the source language automatically and convert it to English.

  1. Input Details:

#* The input column may contain text in various languages, including text that is already in English.

  1. Translation Logic:

#* Automatically detect the language of the input text using the GoogleTranslator class from the deep-translator library.

#* If the text is already in English, retain it without modification.

  1. Final Output Details:

#* A PySpark DataFrame in which every non-English entry of the 'text' column is translated into English in a 'translated' column, while entries that are already in English are carried over unchanged.

#* Save the output to the 'purgo_playground.translated_other_language' table in overwrite mode, with both the original text column and the translated column. Set the mergeSchema option to true.

  1. Technology Stack:

#* deep-translator library: performs automatic language detection and translation using Google Translate.
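
The translation logic in the requirements above can be sketched as a small helper. This is a minimal sketch under stated assumptions, not a definitive implementation: the function names `google_translate` and `to_english` are hypothetical, and falling back to the original text on an error or an empty result is an assumption, not something the requirements specify. It relies on `GoogleTranslator(source="auto", target="en")` so that Google detects the source language. No separate detection step is added, on the expectation that English input comes back essentially unchanged.

```python
from typing import Callable, Optional


def google_translate(text: str) -> str:
    """Translate text to English, letting Google detect the source language."""
    # Imported lazily so the helper below can be loaded and exercised
    # even where deep-translator is not installed.
    from deep_translator import GoogleTranslator

    return GoogleTranslator(source="auto", target="en").translate(text)


def to_english(
    text: Optional[str],
    translate: Callable[[str], str] = google_translate,
) -> Optional[str]:
    """Return an English version of text, or text itself when nothing usable comes back."""
    # Nulls and blank strings are passed through untouched.
    if text is None or not text.strip():
        return text
    try:
        translated = translate(text)
    except Exception:
        # Assumption: a failed request keeps the original value
        # instead of failing the whole job.
        return text
    # An empty or missing result also falls back to the original text.
    return translated or text
```

Passing the translator in as a parameter keeps the null and fallback handling separate from the network call, so the helper can be checked with a stand-in function before it is pointed at Google Translate.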

 

Unity Catalog Details: ‘purgo_playground.other_language’

 

Prerequisites:

 

* Install deep-translator library using %pip command.

* Drop the target table 'purgo_playground.translated_other_language' if it exists, then carry out the requirements above.
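
The two prerequisites might look like the following in a Databricks notebook. This is a sketch only: `drop_target_table` is a hypothetical helper name, and `spark` is assumed to be the session object that Databricks notebooks provide. The `%pip` line is a notebook magic command, not Python, so it appears here as a comment and would normally run in its own cell.

```python
# Run first, in a separate notebook cell:
# %pip install deep-translator

TARGET_TABLE = "purgo_playground.translated_other_language"


def drop_target_table(spark, table: str = TARGET_TABLE) -> None:
    """Drop the target table when it exists; do nothing otherwise."""
    # IF EXISTS makes the statement safe to run on a clean workspace.
    spark.sql(f"DROP TABLE IF EXISTS {table}")
```

In a notebook this would be called as `drop_target_table(spark)` before the translation step runs.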

 

Expected Output: Databricks PySpark code
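
One possible shape for that code is sketched below, assuming a Databricks runtime where `spark` is available. The function names (`add_translated_column`, `save_translated`, `run`) are hypothetical, and `translate` stands for any string-to-string function that returns English text, for example a wrapper around `GoogleTranslator(source="auto", target="en").translate`. Writing only the 'text' and 'translated' columns is one reading of the output requirement, not something the requirements state outright.

```python
SOURCE_TABLE = "purgo_playground.other_language"
TARGET_TABLE = "purgo_playground.translated_other_language"


def add_translated_column(df, translate):
    """Return df with a 'translated' column derived from 'text'."""
    # Imported lazily so this file can be loaded outside a Spark runtime.
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    # Wrap the plain Python function as a row-wise UDF returning strings.
    translate_udf = F.udf(translate, StringType())
    return df.withColumn("translated", translate_udf(F.col("text")))


def save_translated(df, table: str = TARGET_TABLE) -> None:
    """Overwrite the target table with the original and translated columns."""
    (
        df.select("text", "translated")  # assumption: only these two columns are kept
        .write.mode("overwrite")
        .option("mergeSchema", "true")
        .saveAsTable(table)
    )


def run(spark, translate) -> None:
    """Read the source table, translate the 'text' column, and save the result."""
    source_df = spark.read.table(SOURCE_TABLE)
    save_translated(add_translated_column(source_df, translate))
```

A row-wise UDF makes one translation request per row, which is simple but can be slow on large tables. Batching the requests or caching repeated strings would be a reasonable refinement.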

