ModelConfig
Model architecture specification. Supports major dense and MoE Hugging Face architectures, including Qwen, LLaMA, and Gemma.
init_method: Weight initialization strategy. One of the following (a PyTorch sketch of the non-"none" strategies follows the list):

- "none": load weights from a pre-trained model (Qwen/LLaMA/Gemma)
- "normal": normal-distribution initialization
- "xavier_uniform": Xavier uniform initialization
- "wang_init": Wang initialization method
Path to a pre-trained model for continual training. Must be None if init_method is not "none".

Whether to load the optimizer state from a checkpoint. Set to True for continual training from a checkpoint.

Model precision configuration. One of the following (a usage sketch follows the list):
"binary": Binary precision (1-bit)"ternary": Ternary precision (1.58-bit)"int2": 2-bit integer precision"fp8": 8-bit floating point"mxfp4": 4-bit microscaling floating point"mxfp6": 6-bit microscaling floating point"ue8m0": 8-bit unsigned integer with 0 exponent bits"fp16": 16-bit floating point (default)"fp32": 32-bit floating point

