FinetuningConfig
Path to the pre-trained model checkpoint or Hugging Face model identifier to finetune from.
Data configuration(s) for finetuning tasks. Supports multi-task finetuning scenarios.
Finetuning-specific optimization parameters with typically lower learning rates.
Model architecture configuration. Must match the base model architecture.
Metadata and run-specific parameters for the finetuning experiment.
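For orientation, here is a minimal sketch of how these five pieces might fit together as a dataclass. The field names (base_model, data, optimization, model, meta) are assumptions drawn from the descriptions above, not confirmed identifiers.

```python
# Hypothetical skeleton of the structure described above; field names
# are illustrative assumptions, not the library's actual identifiers.
from dataclasses import dataclass
from typing import List, Union

@dataclass
class FinetuningConfig:
    base_model: str  # checkpoint path or Hugging Face model identifier
    data: Union["FinetuningDataConfig", List["FinetuningDataConfig"]]  # list for multi-task
    optimization: "FinetuningOptimizationConfig"
    model: "ModelConfig"  # must match the base model architecture
    meta: "MetaConfig"
```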
FinetuningOptimizationConfig
Total number of finetuning steps. Typically far lower than for full training (500-5000 steps).
Maximum learning rate for finetuning. Recommended range: 1e-5 to 5e-4 (lower than full training).
Global batch size for finetuning. Can be smaller than in full training because far fewer steps are taken.
Learning rate scheduling strategy for finetuning:
"linear": Linear decay (recommended for finetuning)"cosine": Cosine annealing"constant": Constant learning rate- Custom function with signature:
(learning_rate, current_step, total_steps) → decayed_rate
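As an example of the custom option, here is a hedged sketch of a schedule satisfying that signature: cosine decay from the peak rate to a floor of 10% of it. The function name is hypothetical.

```python
import math

def cosine_to_floor(learning_rate: float, current_step: int, total_steps: int) -> float:
    """Decay from learning_rate down to 0.1 * learning_rate over total_steps."""
    progress = min(current_step / max(total_steps, 1), 1.0)
    floor = 0.1 * learning_rate
    return floor + 0.5 * (learning_rate - floor) * (1.0 + math.cos(math.pi * progress))
```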
Number of learning rate warmup steps. Typically 5-10% of total finetuning steps.
L2 regularization coefficient. Important for preventing overfitting in finetuning.
Number of steps to accumulate gradients before updating. Useful for achieving larger effective batch sizes.
List of layer patterns to freeze during finetuning. Example:
["embeddings", "layer.0", "layer.1"]Low-Rank Adaptation configuration for parameter-efficient finetuning.
Optimizer algorithm. Options: "AdamW", "Adam", "SGD".
Gradient clipping threshold. Important for stability in finetuning.
LoRAConfig
Rank of the adaptation matrices. A higher rank adds more trainable parameters but increases expressiveness.
LoRA scaling parameter. Controls the magnitude of the adaptation.
Dropout probability for LoRA layers.
List of module names to apply LoRA to. If None, automatically targets attention and MLP layers.
Bias handling strategy:
"none": No bias adaptation"all": Adapt all biases"lora_only": Only adapt LoRA biases
FinetuningDataConfig
Path(s) to finetuning data files. Should be formatted according to your task type.
Type of finetuning task:
"text_generation": Generative language modeling"classification": Text classification"instruction_following": Instruction-tuning"code_generation": Code completion/generation"time_series_forecasting": Time series tasks
Maximum sequence length for finetuning examples. Using a shorter length than in full training can speed up finetuning.
Portion of data reserved for validation during finetuning.
Loss function specification:
"cross_entropy": Cross-entropy loss for text generation"classification": Classification loss with label smoothing"mse": Mean Squared Error for regression tasks- Custom callable loss function
Task-specific preprocessing options:
- For instruction tuning: {"format": "alpaca", "prompt_template": "..."}
- For classification: {"label_column": "label", "text_column": "text"}
- For code: {"language": "python", "max_context_length": 1024}
Relative sampling weight for this data source in multi-task finetuning scenarios.
Feature engineering functions for specialized finetuning tasks.
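A hedged example of two data sources combined for multi-task finetuning; the keys are assumptions drawn from the field descriptions above.

```python
# Two hypothetical data sources for multi-task finetuning.
instruction_source = dict(
    data_path="data/instructions.jsonl",
    task_type="instruction_following",
    max_seq_length=1024,
    validation_split=0.05,          # 5% held out for validation
    loss="cross_entropy",
    task_specific_config={"format": "alpaca", "prompt_template": "..."},
    sampling_weight=0.7,            # sampled more often than the code source
)
code_source = dict(
    data_path="data/code.jsonl",
    task_type="code_generation",
    max_seq_length=1024,
    loss="cross_entropy",
    task_specific_config={"language": "python", "max_context_length": 1024},
    sampling_weight=0.3,
)
```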
MetaConfig
Configuration for experiment metadata and checkpointing behavior.
Name identifier for this finetuning experiment run.
Random seed for reproducible finetuning.
Directory path for saving finetuned model checkpoints.
Frequency (in steps) for saving model checkpoints. Set to -1 to save only at the end of finetuning.
Maximum number of model checkpoints to retain. Set to -1 for no limit.
Weights & Biases logging configuration for finetuning experiment tracking and visualization.
WandbConfig
Configuration for Weights & Biases experiment tracking and logging during finetuning.
Weights & Biases project name for organizing finetuning experiments.
Weights & Biases team/organization name. If None, uses the default entity associated with your API key.
Custom run name for the finetuning experiment. If None, uses the MetaConfig name or auto-generates one.
List of tags to associate with the finetuning run for easy filtering and organization.
Optional notes or description for the finetuning experiment run.
Whether to log the finetuned model as a Weights & Biases artifact for version control. Defaults to True for finetuning.
Frequency (in steps) for logging metrics to Weights & Biases. The default is lower for finetuning because there are fewer total steps.
Whether to log gradient histograms (can impact performance).
Whether to log parameter histograms (can impact performance).
Model watching mode for logging gradients and parameters:
"gradients": Log gradient histograms"parameters": Log parameter histograms"all": Log both gradients and parametersNone: Disable model watching
Additional configuration dictionary to log to Weights & Biases.
Whether to log LoRA adapter weights as artifacts when using LoRA finetuning.
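To make the logging fields concrete, here is a sketch of how they might map onto the wandb client. The config values are illustrative assumptions, but wandb.init and wandb.watch are real API calls.

```python
import torch.nn as nn
import wandb

model = nn.Linear(8, 8)  # stand-in for the finetuned model

run = wandb.init(
    project="finetuning-experiments",  # project name
    entity=None,                       # team/org; None uses the API key's default entity
    name="sft-run-01",                 # custom run name
    tags=["lora", "instruction"],      # tags for filtering
    notes="LoRA finetune on instruction data",
    config={"learning_rate": 2e-5},    # additional config dict to log
)

# The watch mode maps directly to wandb.watch: "gradients", "parameters",
# "all", or None to disable model watching.
wandb.watch(model, log="gradients", log_freq=50)
```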

