Finetuning Configuration
Finetuning allows you to adapt pre-trained models to specific tasks or domains with minimal computational overhead. The finetuning process leverages existing model knowledge while updating parameters to optimize for your specific use case.
FinetuningConfig
The main configuration class for finetuning experiments, building on the ExperimentConfig structure.
Path to the pre-trained model checkpoint or Hugging Face model identifier to finetune from.
Data configuration(s) for finetuning tasks. Supports multi-task finetuning scenarios.
Finetuning-specific optimization parameters, typically with lower learning rates than full training.
Model architecture configuration. Must match the base model architecture.
Metadata and run-specific parameters for the finetuning experiment.
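As a structural sketch, a FinetuningConfig nests the sub-configurations documented below. The attribute names here are assumptions inferred from the field descriptions, not the library's confirmed API:

    # Illustrative skeleton only; attribute names are assumptions.
    config = FinetuningConfig(
        model_path="path/to/checkpoint",        # or a Hugging Face model id
        data=FinetuningDataConfig(...),         # or a list for multi-task runs
        optimization=FinetuningOptimizationConfig(...),
        model=...,                              # architecture config; must match the base model
        meta=MetaConfig(...),
        wandb=WandbConfig(...),
    )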
MetaConfig
Configuration for experiment metadata and checkpointing behavior.
Name identifier for this finetuning experimental run.
Random seed for reproducible finetuning.
Directory path for saving finetuned model checkpoints.
Frequency (in steps) for saving model checkpoints. Set to -1 to save only at the end of finetuning.
Maximum number of model checkpoints to retain. Set to -1 for no limit.
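For illustration, a MetaConfig along these lines (attribute names are assumptions based on the descriptions above):

    meta = MetaConfig(
        name="sst2-finetune",               # run identifier
        seed=42,                            # reproducible finetuning
        checkpoint_dir="checkpoints/sst2",  # where checkpoints are written
        checkpoint_interval=-1,             # -1: save only at the end
        max_checkpoints=-1,                 # -1: no retention limit
    )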
Weights & Biases logging configuration for finetuning experiment tracking and visualization.
WandbConfig
Configuration for Weights & Biases experiment tracking and logging during finetuning.
Weights & Biases project name for organizing finetuning experiments.
Weights & Biases team/organization name. If None, uses the default entity associated with your API key.
Custom run name for the finetuning experiment. If None, uses the MetaConfig name or auto-generates one.
List of tags to associate with the finetuning run for easy filtering and organization.
Optional notes or description for the finetuning experiment run.
Whether to log the finetuned model as a Weights & Biases artifact for version control. Defaults to True for finetuning.
Frequency (in steps) for logging metrics to Weights & Biases. Lower default for finetuning due to fewer total steps.
Whether to log gradient histograms (can impact performance).
Whether to log parameter histograms (can impact performance).
Model watching mode for logging gradients and parameters:
- "gradients": Log gradient histograms
- "parameters": Log parameter histograms
- "all": Log both gradients and parameters
- None: Disable model watching
Additional configuration dictionary to log to Weights & Biases.
Whether to log LoRA adapter weights as artifacts when using LoRA finetuning.
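Putting these fields together, a hypothetical WandbConfig might look like this (attribute names are assumptions matching the descriptions above):

    wandb = WandbConfig(
        project="finetuning-experiments",
        entity=None,              # default entity for your API key
        run_name=None,            # falls back to the MetaConfig name
        tags=["lora", "sst2"],
        log_model=True,           # upload the finetuned model as an artifact
        log_interval=10,          # log metrics every 10 steps
        watch="gradients",        # or "parameters", "all", None
        log_lora_adapters=True,
    )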
FinetuningOptimizationConfig
Specialized optimization configuration for finetuning, with recommended parameter ranges.
Total number of finetuning steps. Typically much lower than in full training (500-5000 steps).
Maximum learning rate for finetuning. Recommended range: 1e-5 to 5e-4 (lower than full training).
Global batch size for finetuning. Can be smaller than full training due to fewer steps.
Learning rate scheduling strategy for finetuning:
- "linear": Linear decay (recommended for finetuning)
- "cosine": Cosine annealing
- "constant": Constant learning rate
- A custom function with signature (learning_rate, current_step, total_steps) → decayed_rate (see the sketch after this field list)
Number of learning rate warmup steps. Typically 5-10% of total finetuning steps.
L2 regularization coefficient. Important for preventing overfitting in finetuning.
Number of steps to accumulate gradients before updating. Useful for effective larger batch sizes.
List of layer patterns to freeze during finetuning. Example: ["embeddings", "layer.0", "layer.1"]
Low-Rank Adaptation configuration for parameter-efficient finetuning.
Optimizer algorithm. Options: "AdamW", "Adam", "SGD".
Gradient clipping threshold. Important for stability in finetuning.
LoRAConfig
Configuration for Low-Rank Adaptation (LoRA) parameter-efficient finetuning.
Rank of the adaptation matrices. Higher rank = more parameters but better expressiveness.
LoRA scaling parameter. Controls the magnitude of the adaptation.
Dropout probability for LoRA layers.
List of module names to apply LoRA to. If None, automatically targets attention and MLP layers.
Bias handling strategy:
"none": No bias adaptation"all": Adapt all biases"lora_only": Only adapt LoRA biases
FinetuningDataConfig
Extended data configuration with finetuning-specific options.
Path(s) to finetuning data files. Should be formatted according to your task type.
Type of finetuning task:
"text_generation": Generative language modeling"classification": Text classification"instruction_following": Instruction-tuning"code_generation": Code completion/generation"time_series_forecasting": Time series tasks
Maximum sequence length for finetuning examples. Using a shorter length than in full training can speed up finetuning.
Portion of data reserved for validation during finetuning.
Task-specific preprocessing options:
- For instruction tuning: {"format": "alpaca", "prompt_template": "..."}
- For classification: {"label_column": "label", "text_column": "text"}
Example Configurations
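As an end-to-end illustration, a LoRA instruction-tuning setup could be assembled as below. Every attribute name is an assumption consistent with the sketches above, and the base model id is hypothetical:

    config = FinetuningConfig(
        model_path="org/base-model-7b",  # hypothetical Hugging Face model id
        data=FinetuningDataConfig(
            data_path="data/instructions.jsonl",
            task_type="instruction_following",
            max_seq_length=1024,
            validation_split=0.05,
        ),
        optimization=FinetuningOptimizationConfig(
            total_steps=2000,
            learning_rate=1e-4,
            batch_size=32,
            lr_schedule="linear",
            warmup_steps=150,            # ~7.5% of total steps
            lora=LoRAConfig(rank=8, alpha=16, dropout=0.05),
            optimizer="AdamW",
        ),
        meta=MetaConfig(
            name="base-7b-instruct",
            seed=42,
            checkpoint_dir="checkpoints/base-7b-instruct",
        ),
        wandb=WandbConfig(project="finetuning-experiments", tags=["lora"]),
    )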
Advanced Features
Multi-Task Finetuning
Finetune on multiple related tasks simultaneously for better generalization, as in the sketch below.
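A hedged sketch, assuming multi-task runs are expressed as a list of data configs (per the FinetuningConfig data field, which supports multi-task scenarios):

    # Each task gets its own data config; pass the list as the
    # data field of FinetuningConfig.
    multi_task_data = [
        FinetuningDataConfig(data_path="data/sentiment.jsonl",
                             task_type="classification"),
        FinetuningDataConfig(data_path="data/summaries.jsonl",
                             task_type="text_generation"),
    ]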
Curriculum Learning
Gradually increase task complexity during finetuning. Curriculum learning support is planned for future releases.
convert_finetuned_to_hf()
Convert finetuned models to Hugging Face format, preserving both the base model and its adaptations.
Path to the finetuned checkpoint directory.
Path to the finetuning configuration YAML file.
Destination directory for the converted Hugging Face model.
Whether to merge LoRA weights into the base model. If False, saves LoRA adapters separately.
Whether to directly upload the converted model to Hugging Face Hub.
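A sketch of a conversion call; the function name comes from this document, but the parameter names are assumptions matching the descriptions above:

    convert_finetuned_to_hf(
        checkpoint_path="checkpoints/base-7b-instruct",
        config_path="configs/finetune.yaml",
        output_dir="hf/base-7b-instruct",
        merge_lora=True,     # fold LoRA weights into the base model
        push_to_hub=False,   # set True to upload to the Hugging Face Hub
    )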

