ModelConfig

architecture (str, required)
Model architecture specification. Supports major dense and MoE Hugging Face architectures, including Qwen, LLaMA, and Gemma.

init_method (str, default: "normal")
Weight initialization strategy:
  • "none": Load from a pre-trained model (Qwen/LLaMA/Gemma)
  • "normal": Normal distribution initialization
  • "xavier_uniform": Xavier uniform initialization
  • "wang_init": Wang initialization method

model_path (str | None, default: None)
Path to a pre-trained model for continual training. Must be None unless init_method is "none".

load_optimizer (bool | None, default: None)
Whether to load the optimizer state from the checkpoint. Set to True when continuing training from a checkpoint.

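Taken together, init_method, model_path, and load_optimizer decide whether a run starts from randomly initialized weights or continues from a pre-trained checkpoint. A minimal sketch of these fields and their interaction is shown below, assuming a Python dataclass-style config; the class definition and the validation in __post_init__ are illustrative, not the framework's actual ModelConfig implementation.

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    """Illustrative sketch of the fields documented on this page."""

    architecture: str                    # required, e.g. a Qwen/LLaMA/Gemma architecture
    init_method: str = "normal"          # "none", "normal", "xavier_uniform", "wang_init"
    model_path: str | None = None        # pre-trained weights for continual training
    load_optimizer: bool | None = None   # resume optimizer state from the checkpoint
    precision: str = "fp16"

    def __post_init__(self) -> None:
        # Documented constraint: model_path must be None unless weights are
        # loaded from a pre-trained model (init_method == "none").
        if self.init_method != "none" and self.model_path is not None:
            raise ValueError('model_path must be None when init_method is not "none"')
        # Assumption (not stated explicitly above): loading pre-trained weights
        # requires a path to load them from.
        if self.init_method == "none" and self.model_path is None:
            raise ValueError('model_path is required when init_method is "none"')
```
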
precision (str, default: "fp16")
Model precision configuration:
  • "binary": Binary precision (1-bit)
  • "ternary": Ternary precision (1.58-bit)
  • "int2": 2-bit integer precision
  • "fp8": 8-bit floating point
  • "mxfp4": 4-bit microscaling floating point
  • "mxfp6": 6-bit microscaling floating point
  • "ue8m0": 8-bit unsigned integer with 0 exponent bits
  • "fp16": 16-bit floating point (default)
  • "fp32": 32-bit floating point