Evaluation & Inference
The evaluation system in Nolano.AI provides comprehensive tools for assessing model performance across different modalities and tasks. The platform supports both built-in evaluation metrics and custom evaluation functions.

EvaluationConfig

Configuration class for model evaluation and inference settings.

- Path to the trained model checkpoint directory (e.g., /path/to/checkpoint/global_step_XXXXX).
- Configuration for evaluation data. Similar to the training data config, but typically with validation_split=1.0.
- Evaluation metrics to compute:
  - Text/code models: "perplexity", "accuracy", "bleu", "rouge"
  - Time series: "mse", "mae", "mape", "smape", "quantile_loss"
  - Custom callable functions with signature: (predictions, targets) → metric_value
- Batch size for evaluation.
- Whether to save predictions to file.
- Directory to save evaluation results and predictions.
- Maximum number of evaluation steps. Set to None for full-dataset evaluation.
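As a sketch of how these settings might be combined, the snippet below builds an evaluation configuration that mixes built-in metrics with a custom callable. The import path, field names, and keyword arguments are assumptions for illustration; they are not confirmed by this page.

```python
# Illustrative only: the nolano import path and field names are assumptions.
from nolano import EvaluationConfig

def exact_match(predictions, targets):
    """Custom metric with the documented (predictions, targets) -> metric_value signature."""
    matches = sum(p == t for p, t in zip(predictions, targets))
    return matches / max(len(targets), 1)

eval_config = EvaluationConfig(
    checkpoint_path="/path/to/checkpoint/global_step_XXXXX",  # trained model checkpoint
    data={"dataset_path": "data/val.jsonl", "validation_split": 1.0},  # evaluation data config
    metrics=["perplexity", "accuracy", exact_match],  # built-in names plus a custom callable
    batch_size=32,
    save_predictions=True,
    output_dir="eval_results/",
    max_eval_steps=None,  # None = evaluate on the full dataset
)
```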
Built-in Evaluation Metrics
Text/Code Modality

- Perplexity: Measures how well the model predicts the next token
- Accuracy: Token-level or sequence-level accuracy
- BLEU: Bilingual Evaluation Understudy score for text generation quality
- ROUGE: Recall-Oriented Understudy for Gisting Evaluation
- CodeBLEU: Specialized BLEU variant for code generation

Time Series Modality

- MSE: Mean Squared Error
- MAE: Mean Absolute Error
- MAPE: Mean Absolute Percentage Error
- SMAPE: Symmetric Mean Absolute Percentage Error
- Quantile loss: Error of quantile (probabilistic) forecasts
Evaluation Examples
Running Evaluation
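The original example for this section is not reproduced here. The following is a minimal sketch that reuses the illustrative eval_config defined above; the evaluate entry point and the shape of its return value are assumptions about the API.

```python
# Illustrative only: evaluate() and the structure of its return value are assumptions.
from nolano import evaluate

results = evaluate(eval_config)

# Report each configured metric; predictions are also written to output_dir
# because save_predictions=True in the config above.
for name, value in results.items():
    print(f"{name}: {value:.4f}")
```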
Inference
InferenceConfig
Configuration class for model inference settings.

- Batch size for inference.
- Maximum number of new tokens to generate (for generative models).
- Sampling temperature for text generation. Higher values increase randomness.
- Nucleus sampling parameter. Only consider tokens with cumulative probability up to this value.
- Top-k sampling parameter. Only consider the k most likely tokens at each step.
- Whether to use sampling for generation. If False, greedy decoding is used.
- Penalty for token repetition. Values > 1.0 discourage repetition.
- Penalty for sequence length. Values > 1.0 encourage longer sequences.
- Device for inference ('cuda', 'cpu', 'auto').
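Putting these settings together, an inference configuration might look like the sketch below. The field names and import path are assumptions rather than the confirmed API.

```python
# Illustrative only: the nolano import path and field names are assumptions.
from nolano import InferenceConfig

inference_config = InferenceConfig(
    batch_size=8,
    max_new_tokens=256,       # upper bound on generated tokens
    temperature=0.8,          # >1.0 = more random, <1.0 = more deterministic
    top_p=0.95,               # nucleus sampling: keep tokens up to 95% cumulative probability
    top_k=50,                 # keep only the 50 most likely tokens at each step
    do_sample=True,           # False would fall back to greedy decoding
    repetition_penalty=1.1,   # >1.0 discourages repeating tokens
    length_penalty=1.0,       # >1.0 encourages longer sequences
    device="auto",            # 'cuda', 'cpu', or 'auto'
)
```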
Inference Examples
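As with evaluation, the concrete inference API is not shown on this page. The sketch below assumes hypothetical load_model and generate entry points and reuses the inference_config defined above.

```python
# Illustrative only: load_model() and generate() are assumed entry points.
from nolano import load_model, generate

model = load_model("/path/to/checkpoint/global_step_XXXXX")

prompts = ["def fibonacci(n):", "Write a haiku about autumn."]
completions = generate(model, prompts, config=inference_config)

for prompt, completion in zip(prompts, completions):
    print(f"{prompt!r} -> {completion!r}")
```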
Evaluation Output
Evaluation results are saved in JSON format in the configured output directory; an illustrative sketch of reading such a results file is shown below.

This comprehensive evaluation system enables thorough assessment of model performance and supports iterative improvement of your foundation models.
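The exact schema of the results file is not reproduced on this page, so the snippet below only illustrates how such a file might be read, with a hypothetical structure sketched in the comments.

```python
import json

# Path and keys are illustrative; actual contents depend on the configured metrics.
with open("eval_results/results.json") as f:
    results = json.load(f)

# A results file might contain, for example:
# {
#   "checkpoint": "/path/to/checkpoint/global_step_XXXXX",
#   "metrics": {"perplexity": 12.34, "accuracy": 0.87},
#   "num_samples": 10000,
#   "predictions_file": "eval_results/predictions.jsonl"
# }
print(results["metrics"])
```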

