OptimizationConfig
Total number of training steps for the experiment
Maximum learning rate value
Global batch size for training
Learning rate scheduling strategy:
- String options: "constant", "linear", "cosine", "exponential"
- Custom function with signature: (learning_rate, current_step, total_steps) → decayed_rate
Number of learning rate warmup steps.
Number of learning rate decay steps. Must be set to 0 when using custom learning rate schedules. Constraint: warmup_steps + decay_steps ≤ total_training_steps
Minimum learning rate value.
Optimizer algorithm. Options:
"Adam", "SGD", "Lion"L2 regularization coefficient.
Z-loss regularization coefficient. Set to 0.0 to disable.
Load balancing coefficient for Mixture of Experts (MoE) models. Only applicable for MoE architectures.
Gradient clipping threshold based on global L2 norm.
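A minimal sketch of how these fields might be populated, assuming a dataclass-style OptimizationConfig. Only warmup_steps, decay_steps, and total_training_steps are named in the constraint above; every other field name here is hypothetical, so check the actual class definition for exact names and defaults. The sketch also shows a custom schedule using the documented (learning_rate, current_step, total_steps) → decayed_rate signature, which requires decay_steps to be 0.

```python
from dataclasses import dataclass
from typing import Callable, Union
import math

# Hypothetical field names mirroring the descriptions above; the real
# OptimizationConfig may use different names and defaults.
@dataclass
class OptimizationConfig:
    total_training_steps: int
    learning_rate: float                                # maximum learning rate
    batch_size: int                                     # global batch size
    schedule: Union[str, Callable[[float, int, int], float]]
    warmup_steps: int
    decay_steps: int                                    # must be 0 with a custom schedule
    min_learning_rate: float
    optimizer: str                                      # "Adam", "SGD", or "Lion"
    weight_decay: float                                 # L2 regularization coefficient
    z_loss_weight: float = 0.0                          # 0.0 disables z-loss
    load_balance_weight: float = 0.0                    # MoE models only
    gradient_clip_norm: float = 1.0                     # global L2-norm threshold

# Custom schedule with the documented signature:
# (learning_rate, current_step, total_steps) -> decayed_rate
def cosine_decay(learning_rate: float, current_step: int, total_steps: int) -> float:
    progress = min(current_step / max(total_steps, 1), 1.0)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))

opt_cfg = OptimizationConfig(
    total_training_steps=100_000,
    learning_rate=3e-4,
    batch_size=256,
    schedule=cosine_decay,   # or "constant" / "linear" / "cosine" / "exponential"
    warmup_steps=1_000,
    decay_steps=0,           # 0 because a custom schedule is supplied
    min_learning_rate=3e-5,
    optimizer="Adam",
    weight_decay=0.01,
)
```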
ExperimentConfig
Data configuration(s) for potentially multi-objective modeling
Optimization and training parameters
Model architecture and initialization parameters
Metadata and run-specific parameters
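A hedged sketch of how ExperimentConfig might compose the four sub-configurations described above. The field names are illustrative only; the library's actual attribute names and types may differ, and ModelConfig is not detailed in this section.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical composition mirroring the description above; forward-reference
# strings stand in for the concrete config classes.
@dataclass
class ExperimentConfig:
    data: List["DataConfig"]            # one or more data sources (multi-objective)
    optimization: "OptimizationConfig"  # optimization and training parameters
    model: "ModelConfig"                # architecture and initialization parameters
    meta: "MetaConfig"                  # metadata and run-specific parameters
```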
MetaConfig
Name identifier for this experimental run.
Random seed for reproducible training.
Directory path for saving model checkpoints.
Frequency (in steps) for saving model checkpoints. Set to -1 to save only at the end of training.
Maximum number of model checkpoints to retain. Set to -1 for no limit.
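A small sketch of a MetaConfig instance under the same caveat: the field names are hypothetical stand-ins for the descriptions above, and the run name and checkpoint path are placeholder values.

```python
from dataclasses import dataclass

# Hypothetical field names; the real MetaConfig may differ.
@dataclass
class MetaConfig:
    run_name: str
    seed: int
    checkpoint_dir: str
    checkpoint_every_n_steps: int = -1   # -1: save only at the end of training
    max_checkpoints_to_keep: int = -1    # -1: keep all checkpoints

meta_cfg = MetaConfig(
    run_name="baseline-run-01",
    seed=42,
    checkpoint_dir="/tmp/checkpoints/baseline-run-01",
    checkpoint_every_n_steps=5_000,
    max_checkpoints_to_keep=3,
)
```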
DataConfig
Path(s) to preprocessed data files
Feature engineering functions for lag tokens (historical lag features) and exogenous variables (external variables). Can be string identifier(s) or custom function(s).
Relative sampling weight for this data source (normalized to sum to 1 across all data configs).
Loss function specification:
"cross_entropy": Chronos-style or text cross-entropy loss"mse": Mean Squared Error (TimesFM-style)"quantile"or"pinball": Quantile/Pinball loss (TiRex-style)"multi_task": Multi-task learning (TimesFM 2.0-style)- Custom callable loss function
Portion of the dataset to use as validation data (0.0-1.0, where 1.0 means all data is validation).
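A sketch of a DataConfig instance, including a custom callable loss. As with the earlier sketches, the field names, the "default_lags" feature identifier, the data path, and the assumed (predictions, targets) → scalar loss contract are all illustrative rather than the library's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Union

# Hypothetical field names; the real DataConfig may differ.
@dataclass
class DataConfig:
    data_paths: Sequence[str]                         # preprocessed data files
    feature_fns: Union[str, Callable, None] = None    # lag / exogenous feature engineering
    sampling_weight: float = 1.0                      # normalized across all DataConfigs
    loss: Union[str, Callable] = "mse"                # or "cross_entropy", "quantile", ...
    validation_fraction: float = 0.1                  # 0.0-1.0

# Custom callable loss; assumes a (predictions, targets) -> scalar contract,
# which may differ from the real interface.
def mae_loss(predictions, targets):
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)

data_cfg = DataConfig(
    data_paths=["data/train_shard_000.arrow"],
    feature_fns="default_lags",
    sampling_weight=0.7,
    loss=mae_loss,            # or "mse", "quantile", "pinball", "multi_task"
    validation_fraction=0.1,
)
```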

