ExperimentConfig
The main configuration class that orchestrates all aspects of model training.
- Data configuration(s) for potentially multi-objective modeling.
- Optimization and training parameters.
- Model architecture and initialization parameters.
- Metadata and run-specific parameters.
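As a rough composition sketch (the import path and the keyword names `data`, `optimization`, `model`, and `meta` are illustrative assumptions, not documented API):

```python
# Hypothetical sketch: the module path and keyword names are assumptions.
from training_lib.config import (  # placeholder import path
    ExperimentConfig, DataConfig, OptimizationConfig, ModelConfig, MetaConfig,
)

experiment = ExperimentConfig(
    data=DataConfig(...),                  # what to train on, and with which loss
    optimization=OptimizationConfig(...),  # steps, batch size, LR schedule
    model=ModelConfig(...),                # architecture, init, precision
    meta=MetaConfig(...),                  # run name, seed, checkpointing, W&B
)
```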
MetaConfig
Configuration for experiment metadata and checkpointing behavior.
- Name identifier for this experimental run.
- Random seed for reproducible training.
- Directory path for saving model checkpoints.
- Frequency (in steps) for saving model checkpoints. Set to -1 to save only at the end of training.
- Maximum number of model checkpoints to retain. Set to -1 for no limit.
- Weights & Biases logging configuration for experiment tracking and visualization.
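For illustration, a MetaConfig might be populated as below; every field name here is an assumption inferred from the descriptions above, while the -1 sentinel semantics come from the docs:

```python
# Hypothetical field names; the -1 sentinels follow the documented semantics.
meta = MetaConfig(
    name="tsfm-pretrain-v1",
    seed=42,
    checkpoint_dir="./checkpoints",
    checkpoint_interval=-1,  # -1: save only at the end of training
    max_checkpoints=-1,      # -1: no limit on retained checkpoints
    wandb=None,              # or a WandbConfig to enable experiment tracking
)
```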
WandbConfig
Configuration for Weights & Biases experiment tracking and logging.
- Weights & Biases project name for organizing experiments.
- Weights & Biases team/organization name. If None, uses the default entity associated with your API key.
- Custom run name for the experiment. If None, uses the MetaConfig name or auto-generates one.
- List of tags to associate with the run for easy filtering and organization.
- Optional notes or description for the experiment run.
- Whether to log the model as a Weights & Biases artifact for version control.
- Frequency (in steps) for logging metrics to Weights & Biases.
- Whether to log gradient histograms (can impact performance).
- Whether to log parameter histograms (can impact performance).
- Model watching mode for logging gradients and parameters:
  - "gradients": Log gradient histograms
  - "parameters": Log parameter histograms
  - "all": Log both gradients and parameters
  - None: Disable model watching
- Additional configuration dictionary to log to Weights & Biases.
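A hedged example of a tracking setup; the field names are assumptions, but the None fallbacks and the watch-mode strings follow the descriptions above:

```python
# Hypothetical field names; the watch-mode strings are documented above.
wandb_cfg = WandbConfig(
    project="ts-foundation-models",
    entity=None,               # None: default entity tied to your API key
    name=None,                 # None: fall back to MetaConfig name or auto-generate
    tags=["baseline", "fp16"],
    notes="First full-length pretraining run.",
    log_model=True,            # version the model as a W&B artifact
    log_interval=50,           # log metrics every 50 steps
    watch="gradients",         # gradient histograms only; "all" adds parameters
)
```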
DataConfig
Configuration for training data and objectives. Can be specified as a single instance or list for multi-task learning.
- Path(s) to preprocessed data files.
- Feature engineering functions for lag tokens (historical lag features) and exogenous variables (external variables). Can be string identifier(s) or custom function(s).
- Relative sampling weight for this data source (normalized to sum to 1 across all data configs).
- Loss function specification:
  - "cross_entropy": Chronos-style or text cross-entropy loss
  - "mse": Mean Squared Error (TimesFM-style)
  - "quantile" or "pinball": Quantile/Pinball loss (TiRex-style)
  - "multi_task": Multi-task learning (TimesFM 2.0-style)
  - A custom callable loss function
- Portion of the dataset to use as validation data (0.0-1.0, where 1.0 means all data is validation).
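Since weights are normalized across configs, a two-source multi-task setup might look like the sketch below; the field names and file paths are assumptions, while the 0.75/0.25 sampling split follows from the documented normalization:

```python
# Hypothetical field names; weights normalize to sum to 1 across configs.
data = [
    DataConfig(path="energy.arrow", loss="quantile", weight=3.0),
    DataConfig(path="retail.arrow", loss="mse", weight=1.0),
]
# Effective sampling probabilities: 3.0 / 4.0 = 0.75 and 1.0 / 4.0 = 0.25.
```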
OptimizationConfig
Configuration for training optimization parameters.
- Total number of training steps for the experiment.
- Maximum learning rate value.
- Global batch size for training.
- Learning rate scheduling strategy:
  - String options: "constant", "linear", "cosine", "exponential"
  - A custom function with signature (learning_rate, current_step, total_steps) → decayed_rate; see the sketch after this list. Warmup is applied after this schedule and must be disabled separately if not needed.
- Number of learning rate warmup steps.
- Number of learning rate decay steps. Must be set to 0 when using custom learning rate schedules.
- Minimum learning rate value.
- Optimizer algorithm. Options: "Adam", "SGD", "Lion"
- L2 regularization coefficient.
- Z-loss regularization coefficient. Set to 0.0 to disable.
- Load balancing coefficient for Mixture of Experts (MoE) models. Only applicable to MoE architectures.
- Gradient clipping threshold based on global L2 norm.
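The documented custom-schedule signature is (learning_rate, current_step, total_steps) → decayed_rate. A minimal cosine decay written against that signature (the function itself is an illustration, not part of the library):

```python
import math

def cosine_decay(learning_rate: float, current_step: int, total_steps: int) -> float:
    """Decay from the peak LR toward zero along half a cosine wave."""
    progress = min(current_step / max(total_steps, 1), 1.0)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))

# Per the docs: set the decay steps to 0 when passing a custom schedule, and
# remember that warmup is applied on top unless disabled separately.
```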
ModelConfig
Model architecture and initialization configuration.
- Model architecture specification. Supports major dense and MoE Hugging Face architectures, including Qwen, LLaMA, and Gemma.
- Weight initialization strategy:
  - "none": Load from pre-trained model (Qwen/LLaMA/Gemma)
  - "normal": Normal distribution initialization
  - "xavier_uniform": Xavier uniform initialization
  - "wang_init": Wang initialization method
- Path to pre-trained model for continual training. Must be None if init_method is not "none".
- Whether to load optimizer state from checkpoint. Set to True for continual training from a checkpoint.
- Model precision configuration:
  - "binary": Binary precision (1-bit)
  - "ternary": Ternary precision (1.58-bit)
  - "int2": 2-bit integer precision
  - "fp8": 8-bit floating point
  - "mxfp4": 4-bit microscaling floating point
  - "mxfp6": 6-bit microscaling floating point
  - "ue8m0": 8-bit unsigned exponent-only format (8 exponent bits, 0 mantissa bits)
  - "fp16": 16-bit floating point (default)
  - "fp32": 32-bit floating point

