Model Configuration
ModelConfig
Configuration for model architecture and initialization.

Model architecture specification. Supports major dense and MoE Hugging Face architectures, including Qwen, LLaMA, and Gemma. TSFM and other architectures are coming soon.
Weight initialization strategy:

- "none": Load weights from a pre-trained model (Qwen/LLaMA/Gemma)
- "normal": Normal distribution initialization
- "xavier_uniform": Xavier uniform initialization
- "wang_init": Wang initialization method
Path to a pre-trained model for continual training. Must be None if init_method is not "none".

Whether to load optimizer state from a checkpoint. Set to True for continual training from a checkpoint.
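Together, these two options describe continual training from a checkpoint. A hedged sketch, where pretrained_path and load_optimizer_state are assumed field names standing in for the path and optimizer-state settings above:

```python
# Continual-training sketch; pretrained_path and load_optimizer_state are
# assumed field names, not confirmed API. Only init_method is documented.
from trainer.config import ModelConfig  # placeholder import path

config = ModelConfig(
    init_method="none",                 # documented: load from a pre-trained model
    pretrained_path="Qwen/Qwen2.5-7B",  # hypothetical field: pre-trained checkpoint path
    load_optimizer_state=True,          # hypothetical field: resume optimizer state as well
)
```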
"binary": Binary precision (1-bit)"ternary": Ternary precision (1.58-bit)"int2": 2-bit integer precision"fp8": 8-bit floating point"mxfp4": 4-bit microscaling floating point"mxfp6": 6-bit microscaling floating point"ue8m0": 8-bit unsigned integer with 0 exponent bits"fp16": 16-bit floating point (default)"fp32": 32-bit floating point
Example Configurations
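As one illustration, a sketch of a low-precision setup, again with assumed field names (precision, architecture); the option strings come from the lists above:

```python
# Low-precision sketch; "precision" and "architecture" are assumed field
# names, while the option strings are taken from the documented lists.
from trainer.config import ModelConfig  # placeholder import path

config = ModelConfig(
    architecture="llama",     # hypothetical field: a supported dense family
    init_method="wang_init",  # documented option: Wang initialization
    precision="mxfp4",        # documented option: 4-bit microscaling floating point
)
```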
Supported Architectures
- Dense Models
- Mixture of Experts (MoE)
- Time Series Foundation Models
- Low Precision Models
- LLaMA Series
- Qwen Series
- Gemma Series

