EvaluationConfig
Path to the trained model checkpoint directory (e.g.,
/path/to/checkpoint/global_step_XXXXX).
Configuration for evaluation data. Similar to the training data config but
typically with validation_split=1.0.
Evaluation metrics to compute:
- For text/code models: "perplexity", "accuracy", "bleu", "rouge"
- For time series: "mse", "mae", "mape", "smape", "quantile_loss"
- Custom callable functions with signature (predictions, targets) → metric_value
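A custom metric is any callable matching the (predictions, targets) → metric_value signature. A minimal sketch, assuming plain Python sequences of floats (the function name here is illustrative, not part of the config):

```python
def mean_absolute_error(predictions, targets):
    """Custom metric: mean absolute error over paired prediction/target values."""
    assert len(predictions) == len(targets), "predictions and targets must align"
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(predictions)

# Any callable with this signature can be passed alongside the built-in
# metric names listed above.
mae = mean_absolute_error([2.0, 4.0], [1.0, 6.0])  # → 1.5
```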
Batch size for evaluation.
Whether to save predictions to file.
Directory to save evaluation results and predictions.
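The evaluation fields described so far might be assembled like this. Every field name below is an assumption for illustration, since only the descriptions appear in the documentation:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical sketch of EvaluationConfig; field names are assumptions,
# only the descriptions come from the documentation above.
@dataclass
class EvaluationConfig:
    checkpoint_path: str                  # trained model checkpoint directory
    metrics: List[str] = field(default_factory=lambda: ["perplexity", "accuracy"])
    batch_size: int = 32                  # batch size for evaluation
    save_predictions: bool = False        # whether to save predictions to file
    output_dir: str = "eval_results"      # where results and predictions are written

cfg = EvaluationConfig(checkpoint_path="/path/to/checkpoint/global_step_XXXXX")
```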
Maximum number of evaluation steps. Set to None for full dataset evaluation.

InferenceConfig
Batch size for inference.
Maximum number of new tokens to generate (for generative models).
Sampling temperature for text generation. Higher values increase randomness.
Nucleus sampling parameter. Only consider tokens with cumulative probability up to this value.
Only consider the k most likely tokens at each step.
Whether to use sampling for generation. If False, uses greedy decoding.
Penalty for token repetition. Values > 1.0 discourage repetition.
Penalty for sequence length. Values > 1.0 encourage longer sequences.
Device for inference ('cuda', 'cpu', 'auto').
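The generation knobs above interact at each decoding step roughly as follows. This is a self-contained sketch of one step over raw logits under the usual conventions (repetition penalty rescales already-seen tokens, then temperature, top-k, and nucleus filtering are applied before sampling); it is not the library's actual implementation:

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=0, top_p=1.0,
                      repetition_penalty=1.0, generated=(), do_sample=True):
    """One decoding step: repetition penalty, then temperature scaling,
    top-k and top-p (nucleus) filtering, and sampling (or greedy argmax)."""
    logits = list(logits)
    # Repetition penalty: values > 1.0 make already-generated tokens less likely.
    for tok in set(generated):
        if logits[tok] > 0:
            logits[tok] /= repetition_penalty
        else:
            logits[tok] *= repetition_penalty
    if not do_sample:
        # Greedy decoding: always pick the most likely token.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Temperature scaling: higher values flatten the distribution (more random).
    logits = [l / temperature for l in logits]
    # Softmax (shifted by the max for numerical stability).
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # Top-k: keep only the k most likely tokens (0 disables this filter).
    if top_k > 0:
        order = order[:top_k]
    # Top-p: keep the smallest set whose cumulative probability reaches top_p.
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # Sample from the renormalized kept set.
    mass = sum(probs[i] for i in kept)
    r = random.random() * mass
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]
```

For example, do_sample=False reduces this to greedy decoding, and a very small top_p collapses the kept set to the single most likely token.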

