Version: 2024.1

Hugging Face Automated Translation

This action translates text in Assets and Data Objects, either as a standalone process or as a step in an automation pipeline (for example, after text has been generated). It uses machine translation models hosted on Hugging Face and covers a variety of use cases, such as translating several text fields of a Data Object or translating Asset metadata.

The action can process a single Data Object or Asset or several at once, so it can be integrated into workflows either as an automation action step or through user-initiated flows.

Configuration Options

# The Hugging Face model endpoint to use
model_endpoint: 'https://api-inference.huggingface.co/models/Helsinki-NLP/opus-mt-en-de'

# Determines where the input text is read from
input_source: 'subject'

# Targets where the translated text will be saved
output_targets:
  save_on_subject: true
  save_on_execution_context: true

Detailed Configuration Options

model_endpoint: (Required) The endpoint of the Hugging Face model used for translation. The default base URL is https://api-inference.huggingface.co. If only the model path is given (e.g., /models/Helsinki-NLP/opus-mt-en-fr), it is appended to this default base URL; if a full URL is given, it overrides the default base URL.
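The URL resolution rule above can be sketched as follows. This is a minimal illustration, not the action's actual code: the function name resolve_endpoint is hypothetical, and it assumes that a value counts as a "full URL" when it starts with http:// or https://.

```python
DEFAULT_BASE_URL = "https://api-inference.huggingface.co"


def resolve_endpoint(model_endpoint: str) -> str:
    """Hypothetical helper: return the URL a given model_endpoint resolves to."""
    endpoint = model_endpoint.strip()
    # Assumption: a scheme prefix marks a full URL, which overrides the default base URL.
    if endpoint.startswith(("http://", "https://")):
        return endpoint
    # A bare model path is appended to the default base URL.
    return DEFAULT_BASE_URL + "/" + endpoint.lstrip("/")


print(resolve_endpoint("/models/Helsinki-NLP/opus-mt-en-fr"))
# → https://api-inference.huggingface.co/models/Helsinki-NLP/opus-mt-en-fr
```

Both forms end up pointing at the same model, so a short path and the equivalent full URL should behave the same way.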

input_source: (Required) Determines where the input text is read from. The valid options are:

  • subject refers to the Data Object or Asset being processed
  • execution_context refers to the job's execution context
  • environment_data refers to the environment data of the copilot configuration

output_targets: (Required) Specifies the targets where the translated text will be saved.

  • save_on_subject: (Required) Determines whether the translated text is saved directly on the Data Object or Asset itself. Setting this to true stores the result in the subject's output_field.
  • save_on_execution_context: (Required) Determines whether the translated text is saved into the job's execution context. This is useful for passing the result to subsequent steps within the same job, so that other actions can work with the translated text.

input_field: (Required) The name of the field containing the text to translate. Its meaning depends on input_source: for subject, it is the name of an attribute or metadata field of the Data Object or Asset; for execution_context, it is the key in the execution context; for environment_data, it is the key of the environment variable.
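How input_source and input_field combine to locate the text can be sketched like this. It is an illustration under stated assumptions, not the action's implementation: the function read_input is hypothetical, plain dictionaries stand in for the Data Object/Asset, the execution context and the environment data, and the execution-context lookup follows the $content[$id_of_the_processed_object][$input_field] layout described under "Usage of the execution context".

```python
from typing import Any, Mapping


def read_input(
    input_source: str,
    input_field: str,
    subject: Mapping[str, Any],
    subject_id: str,
    execution_context: Mapping[str, Mapping[str, Any]],
    environment_data: Mapping[str, Any],
) -> str:
    """Hypothetical helper: fetch the text to translate from the configured source."""
    if input_source == "subject":
        # Attribute or metadata field of the Data Object/Asset (modeled as a dict).
        value = subject[input_field]
    elif input_source == "execution_context":
        # Entries are keyed by the processed object's ID, then by field name.
        value = execution_context[subject_id][input_field]
    elif input_source == "environment_data":
        # Key of an environment variable of the copilot configuration.
        value = environment_data[input_field]
    else:
        raise ValueError(f"unknown input_source: {input_source!r}")
    if not isinstance(value, str):
        raise TypeError(f"{input_field!r} must hold a string, got {type(value).__name__}")
    return value


print(read_input("subject", "description", {"description": "Hello"}, "17", {}, {}))
# → Hello
```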

input_language: (Optional) The language of the text in input_field, given as a language code (e.g., en).

output_field: (Required) The field of the Data Object or Asset in which the translated text is stored. If save_on_execution_context is set to true, this field also defines the key in the execution context under which the translated text is stored.

output_language: (Optional) The target language of the translation stored in output_field, given as a language code (e.g., de).

options: (Optional) Additional request options. The supported setting is:

  • use_cache (Default: true): A boolean that determines whether cached responses for requests that have already been seen are reused. Keeping it at true avoids repeating the same translation and speeds up requests. Setting it to false forces a fresh translation for every request.
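To show where use_cache ends up, the sketch below builds (but does not send) a request in the JSON shape used by the Hugging Face Inference API, where the input text goes under "inputs" and request options under "options". This is an assumption about what the action sends, since the page does not document its request format; the function name build_translation_request is hypothetical, and authentication (an API token header) is deliberately left out.

```python
import json
import urllib.request


def build_translation_request(endpoint_url: str, text: str, use_cache: bool = True) -> urllib.request.Request:
    """Hypothetical helper: prepare a POST request for a Hugging Face translation model."""
    payload = {
        "inputs": text,
        # With use_cache=False, the Inference API should not reuse a cached result.
        "options": {"use_cache": use_cache},
    }
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_translation_request(
    "https://api-inference.huggingface.co/models/Helsinki-NLP/opus-mt-en-de",
    "Hello world",
    use_cache=False,
)
print(req.data.decode("utf-8"))
# → {"inputs": "Hello world", "options": {"use_cache": false}}
```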

Examples

Data Object Example

model_endpoint: 'https://api-inference.huggingface.co/models/Helsinki-NLP/opus-mt-en-de'
input_source: 'subject'
output_targets:
  save_on_subject: true
  save_on_execution_context: true
input_field: description
input_language: en
output_field: description
output_language: de
options:
  use_cache: false

Asset Example

model_endpoint: 'https://api-inference.huggingface.co/models/Helsinki-NLP/opus-mt-en-es'
input_source: 'subject'
output_targets:
  save_on_subject: true
  save_on_execution_context: true
input_field: CarImages.title
input_language: en
output_field: CarImages.title
output_language: es
options:
  use_cache: false
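A configuration can be checked against the required options listed under "Detailed Configuration Options" before it is used. The sketch below mirrors the two examples above as Python dictionaries (in practice they would be loaded from YAML) and reports missing or malformed options. The function validate_config and its messages are hypothetical; the action's own validation may differ.

```python
from typing import Any, Mapping

VALID_INPUT_SOURCES = {"subject", "execution_context", "environment_data"}
REQUIRED_KEYS = ("model_endpoint", "input_source", "output_targets", "input_field", "output_field")


def validate_config(config: Mapping[str, Any]) -> list[str]:
    """Hypothetical checker: return a list of problems (empty if the config looks valid)."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in config:
            problems.append(f"missing required option: {key}")
    if "input_source" in config and config["input_source"] not in VALID_INPUT_SOURCES:
        problems.append(f"invalid input_source: {config['input_source']!r}")
    targets = config.get("output_targets") or {}
    # Both output target flags are required booleans.
    for flag in ("save_on_subject", "save_on_execution_context"):
        if not isinstance(targets.get(flag), bool):
            problems.append(f"output_targets.{flag} must be true or false")
    use_cache = (config.get("options") or {}).get("use_cache", True)  # defaults to true
    if not isinstance(use_cache, bool):
        problems.append("options.use_cache must be true or false")
    return problems


data_object_example = {
    "model_endpoint": "https://api-inference.huggingface.co/models/Helsinki-NLP/opus-mt-en-de",
    "input_source": "subject",
    "output_targets": {"save_on_subject": True, "save_on_execution_context": True},
    "input_field": "description",
    "input_language": "en",
    "output_field": "description",
    "output_language": "de",
    "options": {"use_cache": False},
}
asset_example = {
    **data_object_example,
    "model_endpoint": "https://api-inference.huggingface.co/models/Helsinki-NLP/opus-mt-en-es",
    "input_field": "CarImages.title",
    "output_field": "CarImages.title",
    "output_language": "es",
}

for name, cfg in (("Data Object", data_object_example), ("Asset", asset_example)):
    print(name, validate_config(cfg))
# → Data Object []
# → Asset []
```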

Additional Information

For accurate translations, it is important to select the right model from Hugging Face. When using the Helsinki-NLP models, the model_endpoint must include both the input and output languages, following the pattern /models/Helsinki-NLP/opus-mt-[input_language]-[output_language].

For example, translating English to French requires the endpoint https://api-inference.huggingface.co/models/Helsinki-NLP/opus-mt-en-fr, where en is English and fr is French. This ensures the action uses a model specifically trained for your desired language pair.

The language codes in model_endpoint should always match input_language and output_language.
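A mismatch between the endpoint and the configured languages is easy to miss, so it can be worth checking automatically. The sketch below extracts the language pair from an endpoint that follows the /models/Helsinki-NLP/opus-mt-[input_language]-[output_language] pattern and compares it with the configured languages. The function check_language_pair is hypothetical, and it only handles that simple two-code pattern; Helsinki-NLP model names with additional segments are rejected rather than parsed.

```python
import re

# Matches endpoints ending in /models/Helsinki-NLP/opus-mt-<input>-<output>.
OPUS_MT_PATTERN = re.compile(r"/models/Helsinki-NLP/opus-mt-([^/-]+)-([^/-]+)/?$")


def check_language_pair(model_endpoint: str, input_language: str, output_language: str) -> bool:
    """Hypothetical check: does the endpoint's language pair match the configuration?"""
    match = OPUS_MT_PATTERN.search(model_endpoint)
    if match is None:
        raise ValueError(f"not a simple Helsinki-NLP opus-mt endpoint: {model_endpoint!r}")
    source, target = match.groups()
    return (source, target) == (input_language, output_language)


endpoint = "https://api-inference.huggingface.co/models/Helsinki-NLP/opus-mt-en-fr"
print(check_language_pair(endpoint, "en", "fr"))  # → True
print(check_language_pair(endpoint, "en", "de"))  # → False
```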

Usage of the execution context

When save_on_execution_context is set to true, the action writes the translated text to the execution context using the following array structure:

$content[$id_of_the_processed_object][$output_field] = $translated_text;

If input_source is set to execution_context, the action expects the text to translate at the following location in the execution context:

$content[$id_of_the_processed_object][$input_field] = $content_to_translate;
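The structure above can be modeled in Python as a nested dictionary keyed first by object ID and then by field name. The sketch below applies both output targets for several processed objects; a later step configured with input_source execution_context would then find each translation under the same ID and key. The function save_translation, the object IDs and the sample texts are hypothetical, and plain dictionaries stand in for the real Data Objects and execution context.

```python
from typing import Any, Mapping, MutableMapping


def save_translation(
    translated_text: str,
    object_id: str,
    output_field: str,
    output_targets: Mapping[str, bool],
    subject: MutableMapping[str, Any],
    content: MutableMapping[str, dict],
) -> None:
    """Hypothetical helper: store a translation on the enabled output targets."""
    if output_targets.get("save_on_subject"):
        # Overwrite (or create) the output field on the Data Object/Asset itself.
        subject[output_field] = translated_text
    if output_targets.get("save_on_execution_context"):
        # Mirrors $content[$id_of_the_processed_object][$output_field] = $translated_text;
        content.setdefault(object_id, {})[output_field] = translated_text


targets = {"save_on_subject": False, "save_on_execution_context": True}
subjects = {"17": {"description": "Hello"}, "18": {"description": "Goodbye"}}
translations = {"17": "Hallo", "18": "Auf Wiedersehen"}  # made-up sample results
content: dict = {}

for object_id, subject in subjects.items():
    save_translation(translations[object_id], object_id, "description", targets, subject, content)

print(content)
# → {'17': {'description': 'Hallo'}, '18': {'description': 'Auf Wiedersehen'}}
```

Because save_on_subject is false in this run, the original descriptions on the objects stay untouched and only the execution context carries the translations forward.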