
FinetuningConfig

base_model_path
str
required
Path to the pre-trained model checkpoint or Hugging Face model identifier to finetune from.
data_configs
FinetuningDataConfig | List[FinetuningDataConfig]
required
Data configuration(s) for finetuning tasks. Supports multi-task finetuning scenarios.
finetuning_config
FinetuningOptimizationConfig
required
Finetuning-specific optimization parameters; these typically use lower learning rates than full training.
model_config
ModelConfig
required
Model architecture configuration. Must match the base model architecture.
meta_config
MetaConfig
default:"Uses MetaConfig defaults"
Metadata and run-specific parameters for the finetuning experiment.
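
A minimal sketch of how these pieces fit together. The import path (`your_library`) is a placeholder, the field values are illustrative, and the `ModelConfig` is assumed to be constructed elsewhere since its fields are not documented here; only the field names come from this reference.

```python
# Placeholder import path -- substitute the actual package name for your installation.
from your_library import (
    FinetuningConfig,
    FinetuningDataConfig,
    FinetuningOptimizationConfig,
    MetaConfig,
)

config = FinetuningConfig(
    base_model_path="meta-llama/Llama-2-7b-hf",  # HF identifier or local checkpoint path
    data_configs=FinetuningDataConfig(
        data_paths="data/instructions.jsonl",
        task_type="instruction_following",
    ),
    finetuning_config=FinetuningOptimizationConfig(
        total_training_steps=1000,
        max_learning_rate=2e-5,
        global_batch_size=32,
    ),
    model_config=base_model_config,  # ModelConfig matching the base model architecture (built elsewhere)
    meta_config=MetaConfig(name="llama2-instruct-ft"),
)
```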

FinetuningOptimizationConfig

total_training_steps
int
required
Total number of finetuning steps. Typically far fewer than in full training (500-5000 steps).
max_learning_rate
float
required
Maximum learning rate for finetuning. Recommended range: 1e-5 to 5e-4 (lower than full training).
global_batch_size
int
required
Global batch size for finetuning. Can be smaller than in full training because finetuning runs for fewer steps.
learning_rate_schedule
str | callable
default:"linear"
Learning rate scheduling strategy for finetuning:
  • "linear": Linear decay (recommended for finetuning)
  • "cosine": Cosine annealing
  • "constant": Constant learning rate
  • Custom function with signature: (learning_rate, current_step, total_steps) → decayed_rate (see the example below)
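
A sketch of a custom schedule with the documented signature; the decay shape here (linear with a 10% floor) is only an illustration.

```python
def linear_with_floor(learning_rate, current_step, total_steps):
    """Linearly decay from the max learning rate, but never drop below 10% of it."""
    progress = min(current_step / max(total_steps, 1), 1.0)
    decayed_rate = learning_rate * (1.0 - progress)
    return max(decayed_rate, 0.1 * learning_rate)

# Passed in place of a string schedule name:
# FinetuningOptimizationConfig(..., learning_rate_schedule=linear_with_floor)
```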
warmup_steps
int
default:"50"
Number of learning rate warmup steps. Typically 5-10% of total finetuning steps.
weight_decay
float
default:"0.01"
L2 regularization coefficient. Important for preventing overfitting in finetuning.
gradient_accumulation_steps
int
default:"1"
Number of steps to accumulate gradients before each update. Useful for achieving a larger effective batch size.
freeze_layers
List[str] | None
default:"None"
List of layer patterns to freeze during finetuning. Example: ["embeddings", "layer.0", "layer.1"]
lora_config
LoRAConfig | None
default:"None"
Low-Rank Adaptation configuration for parameter-efficient finetuning.
optimizer_type
str
default:"AdamW"
Optimizer algorithm. Options: "AdamW", "Adam", "SGD"
clip_grad
float
default:"1.0"
Gradient clipping threshold. Important for stability in finetuning.
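
Putting these fields together, a sketch of a typical finetuning optimizer setup. The values are illustrative choices within the ranges recommended above, not library defaults.

```python
optim = FinetuningOptimizationConfig(
    total_training_steps=2000,
    max_learning_rate=5e-5,            # within the recommended 1e-5 to 5e-4 range
    global_batch_size=16,
    learning_rate_schedule="linear",
    warmup_steps=100,                  # ~5% of total steps
    weight_decay=0.01,
    gradient_accumulation_steps=4,     # accumulate gradients over 4 steps per update
    freeze_layers=["embeddings", "layer.0", "layer.1"],
    optimizer_type="AdamW",
    clip_grad=1.0,
)
```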

LoRAConfig

rank
int
default:"16"
Rank of the adaptation matrices. A higher rank adds more trainable parameters but increases expressiveness.
alpha
float
default:"32"
LoRA scaling parameter. Controls the magnitude of the adaptation.
dropout
float
default:"0.1"
Dropout probability for LoRA layers.
target_modules
List[str] | None
default:"Auto-detected"
List of module names to apply LoRA to. If None, automatically targets attention and MLP layers.
bias
str
default:"none"
Bias handling strategy:
  • "none": No bias adaptation
  • "all": Adapt all biases
  • "lora_only": Only adapt LoRA biases

FinetuningDataConfig

data_paths
str | List[str]
required
Path(s) to finetuning data files. Should be formatted according to your task type.
task_type
str
default:"text_generation"
Type of finetuning task:
  • "text_generation": Generative language modeling
  • "classification": Text classification
  • "instruction_following": Instruction-tuning
  • "code_generation": Code completion/generation
  • "time_series_forecasting": Time series tasks
max_sequence_length
int
default:"512"
Maximum sequence length for finetuning examples. Using a shorter length than in full training can speed up finetuning.
validation_split
float
default:"0.1"
Portion of data reserved for validation during finetuning.
training_objective
str | callable
default:"cross_entropy"
Loss function specification:
  • "cross_entropy": Cross-entropy loss for text generation
  • "classification": Classification loss with label smoothing
  • "mse": Mean Squared Error for regression tasks
  • Custom callable loss function (see the sketch below)
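
For the custom-callable option, the exact signature the library expects is not spelled out in this reference; the sketch below assumes a PyTorch-style `(logits, labels) -> loss` convention and should be adapted to whatever interface your version expects.

```python
import torch.nn.functional as F

def smoothed_cross_entropy(logits, labels):
    # Assumed (logits, labels) -> scalar loss convention; verify against the
    # callable interface your library version expects.
    return F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        labels.view(-1),
        label_smoothing=0.1,
        ignore_index=-100,  # common padding/ignore label for generation tasks
    )
```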
data_preprocessing
dict | None
default:"None"
Task-specific preprocessing options:
  • For instruction tuning: {"format": "alpaca", "prompt_template": "..."}
  • For classification: {"label_column": "label", "text_column": "text"}
  • For code: {"language": "python", "max_context_length": 1024}
sampling_weight
float | None
default:"Equal weight among all data configs"
Relative sampling weight for this data source in multi-task finetuning scenarios.
features
str | List[str] | callable | List[callable] | None
default:"None"
Feature engineering functions for specialized finetuning tasks.
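
A sketch of a two-source, multi-task setup that uses sampling_weight to bias the data mixture; paths, weights, and preprocessing values are illustrative.

```python
data_configs = [
    FinetuningDataConfig(
        data_paths="data/instructions.jsonl",
        task_type="instruction_following",
        max_sequence_length=1024,
        data_preprocessing={"format": "alpaca"},
        sampling_weight=0.7,               # sampled more often than the code data
    ),
    FinetuningDataConfig(
        data_paths=["data/python_a.jsonl", "data/python_b.jsonl"],
        task_type="code_generation",
        data_preprocessing={"language": "python", "max_context_length": 1024},
        sampling_weight=0.3,
    ),
]
```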

MetaConfig

Configuration for experiment metadata and checkpointing behavior.
name
str
default:"trial-run"
Name identifier for this finetuning experimental run.
seed
int
default:"42"
Random seed for reproducible finetuning.
save_path
str
default:"current working directory / run_name"
Directory path for saving finetuned model checkpoints.
model_save_frequency
int
default:"-1"
Frequency (in steps) for saving model checkpoints. Set to -1 to save only at the end of finetuning.
max_checkpoints
int
default:"-1"
Maximum number of model checkpoints to retain. Set to -1 for no limit.
wandb_config
WandbConfig | None
default:"None"
Weights & Biases logging configuration for finetuning experiment tracking and visualization.

WandbConfig

Configuration for Weights & Biases experiment tracking and logging during finetuning.
project
str
required
Weights & Biases project name for organizing finetuning experiments.
entity
str | None
default:"None"
Weights & Biases team/organization name. If None, uses the default entity associated with your API key.
run_name
str | None
default:"None"
Custom run name for the finetuning experiment. If None, uses the MetaConfig name or auto-generates one.
tags
List[str] | None
default:"None"
List of tags to associate with the finetuning run for easy filtering and organization.
notes
str | None
default:"None"
Optional notes or description for the finetuning experiment run.
log_model
bool
default:"True"
Whether to log the finetuned model as a Weights & Biases artifact for version control. Defaults to True for finetuning.
log_frequency
int
default:"50"
Frequency (in steps) for logging metrics to Weights & Biases. The default is lower for finetuning because runs have fewer total steps.
log_gradients
bool
default:"False"
Whether to log gradient histograms (can impact performance).
log_parameters
bool
default:"False"
Whether to log parameter histograms (can impact performance).
watch_model
str | None
default:"None"
Model watching mode for logging gradients and parameters:
  • "gradients": Log gradient histograms
  • "parameters": Log parameter histograms
  • "all": Log both gradients and parameters
  • None: Disable model watching
config
dict | None
default:"None"
Additional configuration dictionary to log to Weights & Biases.
log_lora_weights
bool
default:"True"
Whether to log LoRA adapter weights as artifacts when using LoRA finetuning.
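
Finally, a fuller WandbConfig sketch; the project, entity, run name, and tag values are placeholders.

```python
wandb_cfg = WandbConfig(
    project="finetuning-experiments",
    entity="my-team",                # None => default entity for your API key
    run_name="lora-rank16-lr1e-4",
    tags=["lora", "instruction-tuning"],
    notes="Rank-16 LoRA run on the instruction dataset.",
    log_model=True,
    log_frequency=25,
    watch_model="gradients",         # log gradient histograms (may slow training)
    log_lora_weights=True,
)
```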