
ModelConfig

architecture (str, required)
Model architecture specification. Supports major dense and MoE Hugging Face architectures, including Qwen, LLaMA, and Gemma.

init_method (str, default: "normal")
Weight initialization strategy (illustrated in the sketch after this list):
  • "none": Load from pre-trained model (Qwen/LLaMA/Gemma)
  • "normal": Normal distribution initialization
  • "xavier_uniform": Xavier uniform initialization
  • "wang_init": Wang initialization method
model_path (str | None, default: None)
Path to a pre-trained model for continual training. Must be None unless init_method is "none".

load_optimizer (bool | None, default: None)
Whether to load the optimizer state from the checkpoint. Set to True when resuming continual training from a checkpoint (see the example below).
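
Putting the loading-related fields together, a hedged example of the two common setups. The keyword arguments follow the fields documented on this page; the import path, the "qwen2" architecture string, and the checkpoint path are hypothetical.

```python
from model_config import ModelConfig  # hypothetical import path

# Training from scratch: weights are initialized, nothing is loaded.
from_scratch = ModelConfig(
    architecture="qwen2",          # hypothetical architecture string
    init_method="xavier_uniform",
    precision="fp16",
)

# Continual training: resume weights and optimizer state from a checkpoint.
continual = ModelConfig(
    architecture="qwen2",
    init_method="none",              # required to load pre-trained weights
    model_path="/ckpts/qwen2-base",  # hypothetical checkpoint path
    load_optimizer=True,
    precision="fp16",
)
```
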
precision (str, default: "fp16")
Model precision configuration (a memory-footprint sketch follows the list):
  • "binary": Binary precision (1-bit)
  • "ternary": Ternary precision (1.58-bit)
  • "int2": 2-bit integer precision
  • "fp8": 8-bit floating point
  • "mxfp4": 4-bit microscaling floating point
  • "mxfp6": 6-bit microscaling floating point
  • "ue8m0": 8-bit unsigned integer with 0 exponent bits
  • "fp16": 16-bit floating point (default)
  • "fp32": 32-bit floating point