Model Configuration

ModelConfig

Configuration for model architecture and initialization.
architecture
str
required
Model architecture specification. Supports major dense and MoE Hugging Face architectures, including Qwen, LLaMA, and Gemma.
Support for TSFM and other architectures is coming soon.
init_method
str
default:"normal"
Weight initialization strategy:
  • "none": Load weights from a pre-trained model (Qwen/LLaMA/Gemma); requires model_path
  • "normal": Normal distribution initialization
  • "xavier_uniform": Xavier uniform initialization
  • "wang_init": Wang initialization method
model_path
str or None
default:"None"
Path to a pre-trained model or checkpoint for continual training. Required when init_method is "none"; must be None otherwise.
load_optimizer
bool or None
default:"None"
Whether to load optimizer state from the checkpoint. Set to True when resuming continual training from a checkpoint.
precision
str
default:"fp16"
Model precision configuration:
  • "binary": Binary precision (1-bit)
  • "ternary": Ternary precision (1.58-bit)
  • "int2": 2-bit integer precision
  • "fp8": 8-bit floating point
  • "mxfp4": 4-bit microscaling floating point
  • "mxfp6": 6-bit microscaling floating point
  • "ue8m0": 8-bit unsigned exponent-only floating point (8 exponent bits, 0 mantissa bits)
  • "fp16": 16-bit floating point (default)
  • "fp32": 32-bit floating point

Example Configurations

from pynolano import ModelConfig

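# Continual training: load pre-trained weights; optimizer state is not restored.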
config = ModelConfig(
    architecture="Qwen/Qwen3-4B",
    init_method="none",
    model_path="./pretrained_model",
    load_optimizer=False,
    precision="fp16"
)
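
A second sketch, using only the fields documented above, trains from scratch with random initialization and a quantized precision; model_path stays None because init_method is not "none":

from pynolano import ModelConfig

# Sketch: from-scratch training with Xavier-uniform init and ternary (1.58-bit) weights.
# model_path must remain None because init_method is not "none".
scratch_config = ModelConfig(
    architecture="Qwen/Qwen3-4B",
    init_method="xavier_uniform",
    model_path=None,
    precision="ternary"
)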

Supported Architectures

  • LLaMA Series
  • Qwen Series
  • Gemma Series
You can request a custom architecture if needed.
When using init_method="none", you must provide a valid model_path. The model path should point to a compatible pre-trained model or checkpoint.
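
To resume an interrupted continual-training run, a checkpoint-style sketch (the checkpoint path below is hypothetical) additionally sets load_optimizer=True so optimizer state is restored along with the weights:

from pynolano import ModelConfig

# Sketch: resume continual training from a saved checkpoint, restoring optimizer state.
# "./checkpoints/step_5000" is a hypothetical local path.
resume_config = ModelConfig(
    architecture="Qwen/Qwen3-4B",
    init_method="none",
    model_path="./checkpoints/step_5000",
    load_optimizer=True,
    precision="fp16"
)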