
Building Time Series Forecasting Foundation Models

This comprehensive tutorial demonstrates how to build a state-of-the-art time series forecasting foundation model through cross-modal continual pretraining. We’ll start from a pretrained language model (Gemma-3-4B) and adapt it to time series forecasting using patch-based tokenization and a custom multi-objective loss that combines MAE with quantile (pinball) loss over 20 quantiles.

What You’ll Learn

In this tutorial, we’ll cover:
  • Cross-modal continual pretraining from language models to time series
  • Patch-based time series tokenization with masking for robust learning
  • Custom multi-objective loss functions combining MAE and pinball loss
  • Advanced forecasting techniques using foundation model approaches
  • Production-ready model deployment for time series prediction

Step 1: Data Preparation

Your time series data should be in JSONL format following the AutoGluonTS convention, with each line containing a JSON object that describes one series. We’ll use patch-based tokenization to convert these series into sequences that transformer architectures can process, enabling transfer from language models.
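As a rough illustration, a record could be written like the sketch below. The field names (start, target, and an optional item_id) follow the common GluonTS/AutoGluon layout and should be checked against the exact format the platform expects:
import json

# Illustrative record only - field names follow the usual GluonTS/AutoGluon
# convention and may need adjusting to the format the platform expects.
example_record = {
    "start": "2024-01-01 00:00:00",          # timestamp of the first observation
    "target": [112.0, 118.5, 132.0, 129.4],  # observed series values
    "item_id": "store_42",                   # optional series identifier
}

with open("time_series_data.jsonl", "a") as f:
    f.write(json.dumps(example_record) + "\n")  # one JSON object per line
The data preparation config below wires up the patch-based tokenizer: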
data_config.py
from pynolano import DataPreparationConfig, TimeSeriesTokenizerConfig

def build() -> DataPreparationConfig:
    return DataPreparationConfig(
        input_path="./time_series_data.jsonl",
        output_path="./prepared_ts_foundation",
        tokenization=TimeSeriesTokenizerConfig(
            type="patch_based",
            input_patch_size=32,     # Input patches of 32 time steps
            output_patch_size=128,   # Predict patches of 128 time steps 
            patch_masking=True,      # Enable patch masking for robust learning
            normalization_method="z-norm"  # Standardize values
        ),
        max_sequence_length=2048  # Maximum sequence length for efficient training
    )
Run Data Preparation
nolano prepare_data data_config.py
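Conceptually, the patch-based tokenizer z-normalizes each series and slices it into fixed-size input patches. The snippet below is a simplified sketch of that idea, not the platform’s internal tokenizer (which also handles output patches, masking, and padding):
import torch

def patchify(series: torch.Tensor, patch_size: int = 32) -> torch.Tensor:
    """Z-normalize a 1-D series and split it into non-overlapping patches.
    Simplified illustration of patch-based tokenization."""
    series = (series - series.mean()) / (series.std() + 1e-8)  # z-norm
    usable = (series.shape[0] // patch_size) * patch_size      # drop trailing remainder
    return series[:usable].view(-1, patch_size)                # (num_patches, patch_size)

patches = patchify(torch.randn(1000))  # -> shape (31, 32)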

Step 2: Custom Loss Function Implementation

Our custom loss function combines Mean Absolute Error (MAE) with Pinball Loss for quantile forecasting. While MAE is well known, let’s focus on the less familiar component:

Pinball Loss (Quantile Loss)

Pinball loss enables probabilistic forecasting by predicting multiple quantiles with asymmetric penalties:

$$\text{Pinball}(\tau) = \frac{1}{n} \sum_{i=1}^{n} \max\left(\tau\,(y_i - \hat{y}_i^\tau),\ (\tau - 1)\,(y_i - \hat{y}_i^\tau)\right)$$

Where:
  • $\tau$ is the quantile level (e.g., 0.1 for the 10th percentile)
  • $\hat{y}_i^\tau$ is the predicted value for quantile $\tau$
  • The loss penalizes under-prediction more heavily for high quantiles and over-prediction more heavily for low quantiles, as the quick example below shows
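For example, at $\tau = 0.9$, under-predicting by 2 units costs $0.9 \times 2 = 1.8$, while over-predicting by 2 units costs only $(1 - 0.9) \times 2 = 0.2$, so training pushes the 90th-percentile estimate upward until only about 10% of observations fall above it.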

Custom Multi-Objective Loss Function

Create a custom multi-objective loss function that combines MAE and pinball loss for comprehensive forecasting:
custom_loss.py
import torch
import torch.nn as nn

def multi_objective_forecasting_loss(logits, targets):
    """
    Custom multi-objective loss combining MAE (70%) and Pinball loss (30%)
    
    Args:
        logits: Model predictions of shape (..., sequence_length, output_patch_size)
        targets: Ground truth values of shape (..., sequence_length, output_patch_size)
    
    Returns:
        loss: Single scalar loss value
    """
    # Flatten the last two dimensions for easier computation
    # Shape: (..., sequence_length * output_patch_size)
    logits_flat = logits.view(*logits.shape[:-2], -1)
    targets_flat = targets.view(*targets.shape[:-2], -1)
    
    # 1. Mean Absolute Error (MAE) - 70% weight
    mae_loss = torch.mean(torch.abs(logits_flat - targets_flat))
    
    # 2. Pinball Loss for 20 quantiles - 30% weight
    quantiles = torch.linspace(0.05, 0.95, 20, device=logits.device)  # 20 quantiles from 5% to 95%
    # We interpret the point predictions as the estimate for every quantile.
    # This is a simplified approach - in practice, you might have a separate
    # output head per quantile.
    errors = targets_flat - logits_flat  # identical for every quantile, so compute once
    pinball_losses = []
    
    for tau in quantiles:
        pinball_loss = torch.where(
            errors >= 0,
            tau * errors,        # under-prediction: penalized by tau
            (tau - 1) * errors   # over-prediction: penalized by (1 - tau)
        )
        pinball_losses.append(torch.mean(pinball_loss))
    
    # Average pinball loss across all quantiles
    avg_pinball_loss = torch.stack(pinball_losses).mean()
    
    # Combine losses with specified weights
    total_loss = 0.7 * mae_loss + 0.3 * avg_pinball_loss
    
    return total_loss
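A quick sanity check, appended to the bottom of custom_loss.py, confirms the function returns a single scalar (the shapes here are illustrative):
if __name__ == "__main__":
    batch, seq_len, patch = 4, 16, 128
    logits = torch.randn(batch, seq_len, patch)
    targets = torch.randn(batch, seq_len, patch)
    loss = multi_objective_forecasting_loss(logits, targets)
    print(loss.shape, float(loss))  # torch.Size([]) and a finite value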

Step 3: Training Configuration

Configure the model for cross-modal continual pretraining from Gemma-3-4B to time series forecasting:
train_config.py
from pynolano import (
    ExperimentConfig, 
    DataConfig, 
    ModelConfig, 
    OptimizationConfig,
    MetaConfig
)
from custom_loss import multi_objective_forecasting_loss

def build() -> ExperimentConfig:
    return ExperimentConfig(
        data_configs=[
            DataConfig(
                data_paths="./prepared_ts_foundation",
                training_objective=multi_objective_forecasting_loss,  # Custom loss function
                validation_split=0.15
            )
        ],
        model_config=ModelConfig(
            architecture="google/gemma-3-4b-pt",  # Pretrained Gemma model
            init_method="none",  # Don't reinitialize weights - use pretrained
            # Cross-modal adaptation will be handled automatically
        ),
        optimization_config=OptimizationConfig(
            total_training_steps=25000,
            max_learning_rate=5e-5,  # Lower learning rate for continual pretraining
            global_batch_size=64,
            learning_rate_schedule="cosine",
            warmup_steps=2500,
            weight_decay=0.01,
            gradient_clipping=1.0  # Important for stability in cross-modal training
        ),
        meta_config=MetaConfig(
            name="ts-foundation-gemma-4b",
            model_save_frequency=2500,
            max_checkpoints=5,
            seed=42
        )
    )
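For intuition, the schedule configured above ramps the learning rate up linearly over the warmup steps and then decays it along a cosine curve. The sketch below is a generic illustration, not the platform’s exact implementation:
import math

def lr_at_step(step: int, total_steps: int = 25000, warmup_steps: int = 2500,
               max_lr: float = 5e-5) -> float:
    """Generic cosine-with-warmup schedule, for illustration only."""
    if step < warmup_steps:
        return max_lr * step / max(1, warmup_steps)                 # linear warmup
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * max_lr * (1.0 + math.cos(math.pi * progress))      # cosine decay to 0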
Launch the training process:
nolano train train_config.py
The platform will automatically:
  • Adapt the Gemma-3-4B architecture for time series processing
  • Apply patch-based tokenization during training
  • Optimize using the custom multi-objective loss function
  • Scale across multiple GPUs for efficient training

Advanced Foundation Model Features

The tutorial showcases several cutting-edge capabilities:
Language-to-Time Series Transfer Learning
Our approach leverages pretrained language models for time series:
  • Preserves rich representational knowledge from language pretraining
  • Adapts transformer architectures to temporal patterns
  • Enables few-shot learning on new time series domains
  • Significantly reduces training time compared to training from scratch
Advanced Time Series Representation
The patch-based approach treats patches of time steps much like a language model treats tokens:
  • Input patches of 32 time steps for context understanding
  • Output patches of 128 time steps for multi-step forecasting
  • Patch masking during training improves robustness
  • Enables efficient processing of long time series
Sophisticated Loss Function Design
Our custom loss combines multiple objectives:
  • 70% MAE weight for robust point forecasting
  • 30% pinball loss weight for uncertainty quantification
  • 20 quantiles (vs. standard 10) for detailed probabilistic forecasting
  • Asymmetric penalty structure for realistic cost modeling
Scalable and Transferable Architecture
The foundation model approach provides:
  • Zero-shot forecasting on new time series
  • Few-shot adaptation to domain-specific patterns
  • Robust performance across diverse time series types
  • Efficient fine-tuning for specialized applications