Version: 2024.3

Hugging Face Fine-tune Models

The following action steps utilize Pimcore Fine-Tuning Service to fine-tune AI models based on data managed within Pimcore.

Supported tasks are image and text classification. Both rely on data filtered by the Filter Data Objects or Filter Assets action steps and consist of two steps: preparing the data and starting the training. It is recommended to add the Cleanup Tmp Files action step at the end to clean up temporary files.

The training itself is executed on a Pimcore Fine-Tuning Service instance and monitored by the start training action step. The Pimcore Fine-Tuning Service can be hosted on-premises or in a Hugging Face Space; see the Readme for details. GPU instances are recommended for training. The GPU size needed depends on the size of the training data and on the model being fine-tuned.

Once a training job has finished, the fine-tuned model is uploaded to the Hugging Face Hub and can be used in other Copilot action steps to classify newly added data.

Image Classification

Image classification is based on tags assigned to the image assets.

Preparing Data

Data preparation includes the following steps:

  • Reads the filtered assets from the job run context.
  • Extracts the classification from the asset tags based on the defined parentTagPath setting. The first leaf tag found is used; assets with no corresponding classification tag are skipped.
  • Generates thumbnails of the assets based on the configuration (default is a 300px wide JPG). It is beneficial to use the same thumbnail definition when using the fine-tuned model for classification tasks.
  • Packs all thumbnails, arranged in a folder structure by classification, into a zip file named huggingface-training-export/JOBRUN_ID.zip in the temp folder.
  • Adds the zip file to the cleanup list so that the Cleanup Tmp Files action step can remove it later.
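The extraction and packing steps above can be sketched in Python to make the resulting archive layout concrete. This is a minimal illustration, not the action step's actual implementation: the function names, the tuple shape of the input, and the way a "leaf" tag is detected (a tag with no other assigned tag nested below it) are assumptions, and thumbnails are taken as precomputed bytes.

```python
import io
import zipfile
from typing import Iterable, Optional, Tuple


def classification_from_tags(tag_paths: Iterable[str], parent_tag_path: str) -> Optional[str]:
    """Return the name of the first leaf tag below parent_tag_path, or None.

    Assumption: a tag counts as a leaf when none of the asset's other tags
    is nested below it. The real tag tree may define leaves differently.
    """
    prefix = parent_tag_path.rstrip("/") + "/"
    paths = [p.rstrip("/") for p in tag_paths]
    for path in paths:
        if not path.startswith(prefix):
            continue
        has_child = any(other.startswith(path + "/") for other in paths)
        if not has_child:
            return path.rsplit("/", 1)[-1]
    return None


def pack_thumbnails(
    assets: Iterable[Tuple[str, Iterable[str], bytes]],
    parent_tag_path: str,
    zip_target,
) -> int:
    """Write thumbnails into a zip with one folder per classification.

    assets yields (filename, tag_paths, thumbnail_bytes). Assets without a
    matching classification tag are skipped. Returns the number written.
    """
    written = 0
    with zipfile.ZipFile(zip_target, "w", zipfile.ZIP_DEFLATED) as archive:
        for filename, tag_paths, thumbnail in assets:
            label = classification_from_tags(tag_paths, parent_tag_path)
            if label is None:
                continue  # no classification tag: skip this asset
            archive.writestr(f"{label}/{filename}", thumbnail)
            written += 1
    return written


if __name__ == "__main__":
    buffer = io.BytesIO()  # stands in for huggingface-training-export/JOBRUN_ID.zip
    sample = [
        ("a.jpg", ["/Category/Shoes"], b"fake-jpeg-bytes"),
        ("b.jpg", ["/Other/Thing"], b"fake-jpeg-bytes"),
    ]
    print(pack_thumbnails(sample, "/Category", buffer))
    print(zipfile.ZipFile(buffer).namelist())
```

With this layout, each top-level folder name in the archive acts as the class label for the images inside it.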

Starting Training

Starting training includes the following steps:

  • Gets the training file from the job run context.
  • Uploads the training file and starts the training at the configured Pimcore Fine-Tuning Service instance.
  • Waits for the training to finish.
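The "waits for training to finish" step is essentially a polling loop. The sketch below shows that pattern in Python without assuming anything about the Pimcore Fine-Tuning Service API: the status lookup is passed in as a callable, and the status names "finished" and "failed", the poll interval, and the timeout are all hypothetical.

```python
import time
from typing import Callable


def wait_for_training(
    get_status: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status until it reports a terminal state or the timeout is hit.

    get_status is a hypothetical callable that asks the service for the
    current job state. The terminal state names are assumptions.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status in ("finished", "failed"):
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"training still '{status}' after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


if __name__ == "__main__":
    # Simulated job that finishes on the third status check.
    states = iter(["running", "running", "finished"])
    print(wait_for_training(lambda: next(states), poll_seconds=0.01))
```

Injecting the sleep function keeps the loop easy to test and makes the polling interval explicit.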

Settings for the training include:

  • project_name: Project name - also used as name for resulting model
  • base_url: URL of the Pimcore fine-tuning service
  • access_token: Access token for Pimcore fine-tuning service. Needs to be the same token as the AUTHENTICATION_TOKEN defined in the Pimcore Fine-tuning Service instance.
  • source_model: Model to be used as a base for fine-tuning
  • epochs: Number of epochs for training
  • learning_rate: Learning rate for training
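As an illustration of how these settings fit together, the following Python sketch groups them in a small data class. The field names come from the list above; the example values (URL, token placeholder, base model, epochs, learning rate) and the sanity checks are assumptions for illustration only and do not reflect validation performed by Pimcore.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSettings:
    project_name: str     # also used as the name of the resulting model
    base_url: str         # URL of the Pimcore Fine-Tuning Service
    access_token: str     # must match AUTHENTICATION_TOKEN on the service
    source_model: str     # base model to fine-tune
    epochs: int
    learning_rate: float

    def __post_init__(self) -> None:
        # Hypothetical sanity checks, not taken from the Pimcore documentation.
        if not self.project_name:
            raise ValueError("project_name must not be empty")
        if not self.base_url.startswith(("http://", "https://")):
            raise ValueError("base_url must be an http(s) URL")
        if self.epochs < 1:
            raise ValueError("epochs must be at least 1")
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")


if __name__ == "__main__":
    # All values below are illustrative placeholders.
    settings = TrainingSettings(
        project_name="product-image-classifier",
        base_url="https://fine-tuning.example.com",
        access_token="replace-with-your-token",
        source_model="google/vit-base-patch16-224",
        epochs=3,
        learning_rate=5e-5,
    )
    print(settings.project_name, settings.epochs, settings.learning_rate)
```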

Text Classification

Text classification is based on data fields of the data objects. The text value that is classified can be defined via a Twig template.

Preparing Data

Data preparation includes the following steps:

  • Reads the filtered data objects from the job run context.
  • Extracts the classification from each data object using the target_field setting.
  • Generates the text value based on the value_template setting.
  • Packs all rows into a csv file named huggingface-training-export/JOBRUN_ID.csv in the temp folder.
  • Adds the csv file to the cleanup list so that the Cleanup Tmp Files action step can remove it later.
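The row generation described above can be sketched in Python as follows. This is a rough illustration under several assumptions: data objects are modeled as plain dictionaries, Python's string.Template stands in for the Twig value_template, the column names "text" and "target" are hypothetical, and skipping objects without a classification value mirrors the image case rather than documented behavior.

```python
import csv
import io
from string import Template
from typing import Iterable, Mapping, TextIO


def export_rows(
    objects: Iterable[Mapping[str, str]],
    target_field: str,
    value_template: str,
    out: TextIO,
) -> int:
    """Write one CSV row per object: rendered text plus its classification.

    value_template uses string.Template syntax ($field) as a stand-in for
    Twig. Returns the number of data rows written.
    """
    writer = csv.writer(out)
    writer.writerow(["text", "target"])  # hypothetical column names
    template = Template(value_template)
    written = 0
    for obj in objects:
        label = obj.get(target_field)
        if not label:
            continue  # assumption: objects without a classification are skipped
        writer.writerow([template.safe_substitute(obj), label])
        written += 1
    return written


if __name__ == "__main__":
    buffer = io.StringIO()  # stands in for huggingface-training-export/JOBRUN_ID.csv
    sample = [
        {"name": "Sneaker", "description": "Light, breathable", "category": "Shoes"},
        {"name": "Mystery", "description": "n/a", "category": ""},
    ]
    export_rows(sample, "category", "$name: $description", buffer)
    print(buffer.getvalue())
```

Using the csv module rather than manual string joining ensures that commas and quotes inside the generated text are escaped correctly.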

Starting Training

Starting training includes the following steps:

  • Gets the training file from the job run context.
  • Starts the training at the configured Pimcore Fine-Tuning Service instance.
  • Waits for the training to finish.

Settings for the training include:

  • project_name: Project name - also used as name for resulting model
  • base_url: URL of the Pimcore fine-tuning service
  • access_token: Access token for Pimcore fine-tuning service. Needs to be the same token as the AUTHENTICATION_TOKEN defined in the Pimcore Fine-tuning Service instance.
  • source_model: Model to be used as a base for fine-tuning
  • epochs: Number of epochs for training
  • learning_rate: Learning rate for training

Sample Training Action Configuration

A typical training action for asset classification fine-tuning will consist of the following steps:

  • Filter Assets
  • Hugging Face Prepare Training Asset Classification
  • (Optional) Hugging Face Start Hugging face space
  • Hugging Face Start Training Asset Classification
  • (Optional) Hugging Face Stop Hugging face space
  • Cleanup Tmp Files