ExperimentConfig

The main configuration class that orchestrates all aspects of model training.
data_configs
DataConfig | List[DataConfig]
required
Data configuration(s). Pass a list for multi-objective or multi-task modeling.
optimization_config
OptimizationConfig
required
Optimization and training parameters.
model_config
ModelConfig
required
Model architecture and initialization parameters.
meta_config
MetaConfig
default:"Uses MetaConfig defaults"
Metadata and run-specific parameters.
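
Illustrative usage: a minimal ExperimentConfig wiring the four sub-configs together, using the classes documented below. The import path and all values are assumptions for this sketch, not part of the reference.

```python
from trainer.config import (  # hypothetical import path; adjust to your package
    ExperimentConfig, DataConfig, OptimizationConfig, ModelConfig, MetaConfig,
)

experiment = ExperimentConfig(
    data_configs=DataConfig(data_paths="data/train.arrow"),  # single data source
    optimization_config=OptimizationConfig(
        total_training_steps=10_000,
        max_learning_rate=3e-4,
        global_batch_size=256,
    ),
    model_config=ModelConfig(architecture="Qwen/Qwen2.5-0.5B"),
    meta_config=MetaConfig(name="trial-run"),  # optional; defaults apply
)
```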

MetaConfig

Configuration for experiment metadata and checkpointing behavior.
name
str
default:"trial-run"
Name identifier for this experimental run.
seed
int
default:"42"
Random seed for reproducible training.
save_path
str
default:"current working directory / run_name"
Directory path for saving model checkpoints.
model_save_frequency
int
default:"-1"
Frequency (in steps) for saving model checkpoints. Set to -1 to save only at the end of training.
max_checkpoints
int
default:"-1"
Maximum number of model checkpoints to retain. Set to -1 for no limit.
wandb_config
WandbConfig | None
default:"None"
Weights & Biases logging configuration for experiment tracking and visualization.
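
Illustrative usage: a MetaConfig that checkpoints every 1,000 steps and keeps a rolling window of five checkpoints. The import path and values are assumptions.

```python
from trainer.config import MetaConfig  # hypothetical import path

meta = MetaConfig(
    name="qwen-baseline-v1",
    seed=42,
    save_path="checkpoints/qwen-baseline-v1",
    model_save_frequency=1_000,  # save a checkpoint every 1,000 steps
    max_checkpoints=5,           # retain only the 5 most recent checkpoints
    wandb_config=None,           # no experiment tracking for this run
)
```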

WandbConfig

Configuration for Weights & Biases experiment tracking and logging.
project
str
required
Weights & Biases project name for organizing experiments.
entity
str | None
default:"None"
Weights & Biases team/organization name. If None, uses the default entity associated with your API key.
run_name
str | None
default:"None"
Custom run name for the experiment. If None, uses the MetaConfig name or auto-generates one.
tags
List[str] | None
default:"None"
List of tags to associate with the run for easy filtering and organization.
notes
str | None
default:"None"
Optional notes or description for the experiment run.
log_model
bool
default:"False"
Whether to log the model as a Weights & Biases artifact for version control.
log_frequency
int
default:"100"
Frequency (in steps) for logging metrics to Weights & Biases.
log_gradients
bool
default:"False"
Whether to log gradient histograms (can impact performance).
log_parameters
bool
default:"False"
Whether to log parameter histograms (can impact performance).
watch_model
str | None
default:"None"
Model watching mode for logging gradients and parameters:
  • "gradients": Log gradient histograms
  • "parameters": Log parameter histograms
  • "all": Log both gradients and parameters
  • None: Disable model watching
config
dict | None
default:"None"
Additional configuration dictionary to log to Weights & Biases.
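
Illustrative usage: a WandbConfig for a tracked run. The project, entity, and other values are placeholders, and the import path is an assumption.

```python
from trainer.config import WandbConfig  # hypothetical import path

wandb_cfg = WandbConfig(
    project="time-series-pretraining",  # placeholder project name
    entity="my-team",                   # None would use your default entity
    run_name="qwen-mse-baseline",
    tags=["baseline", "mse"],
    notes="First MSE run on the cleaned corpus.",
    log_frequency=50,                   # log metrics every 50 steps
    watch_model="gradients",            # log gradient histograms only
)
```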

DataConfig

Configuration for training data and objectives. Can be specified as a single instance or list for multi-task learning.
data_paths
str | List[str]
required
Path(s) to preprocessed data files.
features
str | List[str] | callable | List[callable] | None
default:"None"
Feature engineering functions for lag tokens (historical lag features) and exogenous (external) variables. Can be string identifier(s) or custom function(s).
sampling_weight
float | None
default:"Equal weight among all data configs"
Relative sampling weight for this data source (normalized to sum to 1 across all data configs).
training_objective
str | callable
default:"cross_entropy"
Loss function specification:
  • "cross_entropy": Chronos-style or text cross-entropy loss
  • "mse": Mean Squared Error (TimesFM-style)
  • "quantile" or "pinball": Quantile/Pinball loss (TiRex-style)
  • "multi_task": Multi-task learning (TimesFM 2.0-style)
  • Custom callable loss function
validation_split
float
default:"0.1"
Portion of the dataset to use as validation data (0.0-1.0, where 1.0 means all data is validation).
At least one DataConfig must have validation_split < 1.0 for training to proceed.
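
Illustrative usage: a multi-task setup mixing two data sources with different objectives and sampling weights. Paths and weights are placeholders. The second config is validation-only, so the first keeps validation_split < 1.0 to satisfy the constraint above.

```python
from trainer.config import DataConfig  # hypothetical import path

data_configs = [
    DataConfig(
        data_paths="data/energy.arrow",
        training_objective="quantile",  # TiRex-style pinball loss
        sampling_weight=3.0,            # normalized against the other config
        validation_split=0.1,
    ),
    DataConfig(
        data_paths=["data/retail_a.arrow", "data/retail_b.arrow"],
        training_objective="mse",       # TimesFM-style loss
        sampling_weight=1.0,
        validation_split=1.0,           # this source is validation-only
    ),
]
```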

OptimizationConfig

Configuration for training optimization parameters.
total_training_steps
int
required
Total number of training steps for the experiment.
max_learning_rate
float
required
Maximum learning rate value.
global_batch_size
int
required
Global batch size for training.
learning_rate_schedule
str | callable
default:"constant"
Learning rate scheduling strategy:
  • String options: "constant", "linear", "cosine", "exponential"
  • Custom function with signature: (learning_rate, current_step, total_steps) → decayed_rate
Warmup is applied in addition to this schedule and must be disabled separately if not needed; see the example at the end of this section.
warmup_steps
int
default:"0"
Number of learning rate warmup steps.
decay_steps
int
default:"0"
Number of learning rate decay steps. Must be set to 0 when using custom learning rate schedules.
Constraint: warmup_steps + decay_steps ≤ total_training_steps
min_learning_rate
float | None
default:"max_learning_rate / 10"
Minimum learning rate value.
optimizer_type
str
default:"Adam"
Optimizer algorithm. Options: "Adam", "SGD", "Lion"
weight_decay
float
default:"0.01"
L2 regularization coefficient.
z_loss
float
default:"0.0"
Z-loss regularization coefficient. Set to 0.0 to disable.
load_balancing
float | None
default:"None"
Load balancing coefficient for Mixture of Experts (MoE) models. Only applicable for MoE architectures.
clip_grad
float
default:"1.0"
Gradient clipping threshold based on global L2 norm.
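
Illustrative usage: an OptimizationConfig with a custom schedule following the documented (learning_rate, current_step, total_steps) → decayed_rate signature. The schedule body is an arbitrary example; per the constraints above, decay_steps must be 0 when a custom schedule is used, while warmup still applies on top of it.

```python
from trainer.config import OptimizationConfig  # hypothetical import path

def inverse_sqrt_schedule(learning_rate, current_step, total_steps):
    # Decay proportional to 1/sqrt(step); any callable with this
    # signature that returns the decayed rate should work.
    return learning_rate / max(1.0, float(current_step)) ** 0.5

opt = OptimizationConfig(
    total_training_steps=50_000,
    max_learning_rate=1e-3,
    global_batch_size=512,
    learning_rate_schedule=inverse_sqrt_schedule,
    warmup_steps=500,  # warmup is applied in addition to the custom schedule
    decay_steps=0,     # required to be 0 with custom schedules
)
```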

ModelConfig

Model architecture and initialization configuration.
architecture
str
required
Model architecture specification. Supports major dense and MoE Hugging Face architectures including Qwen, LLaMA, Gemma.
init_method
str
default:"normal"
Weight initialization strategy:
  • "none": Load from pre-trained model (Qwen/LLaMA/Gemma)
  • "normal": Normal distribution initialization
  • "xavier_uniform": Xavier uniform initialization
  • "wang_init": Wang initialization method
model_path
str | None
default:"None"
Path to pre-trained model for continual training. Must be None if init_method is not "none".
load_optimizer
bool | None
default:"None"
Whether to load optimizer state from checkpoint. Set to True for continual training from checkpoint.
precision
str
default:"fp16"
Model precision configuration:
  • "binary": Binary precision (1-bit)
  • "ternary": Ternary precision (1.58-bit)
  • "int2": 2-bit integer precision
  • "fp8": 8-bit floating point
  • "mxfp4": 4-bit microscaling floating point
  • "mxfp6": 6-bit microscaling floating point
  • "ue8m0": 8-bit unsigned integer with 0 exponent bits
  • "fp16": 16-bit floating point (default)
  • "fp32": 32-bit floating point