Formats

Nolano.AI expects a specific data format for each modality. Preparing your data in the matching format is required for it to be processed and used for training.

Language & Code Data

JSONL Format

For language models and code generation, Nolano.AI uses the JSONL (JSON Lines) format: each line is a JSON object whose text field holds the training content.
{"text": "def fibonacci(n):\n    if n <= 1:\n        return n\n    return fibonacci(n-1) + fibonacci(n-2)"}
{"text": "The quick brown fox jumps over the lazy dog. This is a sample text for language modeling."}
{"text": "import torch\nimport torch.nn as nn\n\nclass TransformerModel(nn.Module):\n    def __init__(self, vocab_size, d_model):\n        super().__init__()"}
{"text": "# Machine Learning Pipeline\n\nThis document explains how to build an ML pipeline using Python and scikit-learn."}
{"text": "SELECT customer_id, COUNT(*) as order_count FROM orders WHERE order_date >= '2023-01-01' GROUP BY customer_id"}
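Before uploading, it can help to confirm that every line parses as JSON and carries a string text field. The following is a minimal sketch using only the Python standard library; the helper name parse_text_records is hypothetical and not part of any Nolano.AI tooling, and skipping blank lines is a choice made by this sketch rather than documented platform behavior.

```python
import json

def parse_text_records(jsonl: str) -> list[str]:
    """Parse JSONL content and return the "text" value of every record."""
    texts = []
    for line_no, line in enumerate(jsonl.splitlines(), start=1):
        if not line.strip():
            continue  # this sketch skips blank lines
        record = json.loads(line)  # raises json.JSONDecodeError on malformed JSON
        if not isinstance(record, dict) or not isinstance(record.get("text"), str):
            raise ValueError(
                f"line {line_no}: expected an object with a string 'text' field"
            )
        texts.append(record["text"])
    return texts

# Two records; the \\n inside the first one is the JSON escape for a newline.
sample = '{"text": "def f(n):\\n    return n"}\n{"text": "The quick brown fox."}\n'
texts = parse_text_records(sample)
```

After parsing, the escaped \n in the first record becomes a real newline character in the returned string.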

Best Practices

File Naming: Use the .jsonl extension for clarity, although .json is also accepted.
Escape Characters: Ensure proper escaping of newlines (\n), quotes (\"), and other special characters in JSON.
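One way to get the escaping right is to let a JSON serializer produce each line instead of assembling strings by hand. This sketch uses Python's standard json module; the function name to_jsonl is an illustrative assumption.

```python
import json

def to_jsonl(texts: list[str]) -> str:
    """Serialize each string as one {"text": ...} object per line."""
    # json.dumps escapes newlines, quotes, and backslashes, so every
    # record stays on a single physical line.
    return "".join(json.dumps({"text": t}) + "\n" for t in texts)

snippet = 'print("hello")\nprint("world")'
line = to_jsonl([snippet])
```

The newline and the double quotes inside snippet are written as \n and \" in the output, and json.loads restores the original string exactly.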

Time Series Data

AutoGluonTS Compatible Format

Time series data follows the AutoGluonTS format, where each line contains a JSON object with time series metadata and values.

Univariate Time Series

{"target": [1.2, 1.5, 1.8, 2.0, 1.9, 1.7, 2.1, 1.8], "start": "2023-01-01", "freq": "D"}
{"target": [45.2, 48.1, 52.3, 49.7, 51.2, 47.8, 50.1], "start": "2023-01-01", "freq": "H"}
{"target": [100, 105, 110, 108, 112, 115, 118, 120], "start": "2023-01-01 00:00:00", "freq": "15T"}
{"target": [23.5, 24.1, 23.8, 24.7, 25.2, 24.9, 25.5], "start": "2023-06-15", "freq": "D"}
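Records like the ones above can be generated from in-memory values. The sketch below is one possible approach with the Python standard library; make_record is a hypothetical helper, and formatting start as "YYYY-MM-DD HH:MM:SS" is simply one of the timestamp styles shown on this page.

```python
import json
from datetime import datetime

def make_record(values, start: datetime, freq: str) -> str:
    """Build one univariate time series record as a JSONL line (no newline)."""
    record = {
        "target": [float(v) for v in values],          # numeric observations
        "start": start.strftime("%Y-%m-%d %H:%M:%S"),  # timestamp of first value
        "freq": freq,                                  # spacing between values
    }
    return json.dumps(record)

line = make_record([100, 105, 110], datetime(2023, 1, 1), "15T")
```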

Multivariate Time Series

Coming Soon: Multivariate time series support will be available in an upcoming release, enabling training on multiple correlated time series simultaneously.

Irregular Sample Intervals

Coming Soon: Support for irregular sample intervals will be added to handle time series data with non-uniform timestamps and varying frequencies.

Required Fields

target
array
required
Univariate: Array of numeric values representing the time series
Multivariate (coming soon): Array of arrays, where each inner array represents values at a time step across multiple dimensions
start
string
required
ISO format timestamp string indicating when the time series begins. Examples:
  • "2023-01-01" (date only)
  • "2023-01-01 00:00:00" (date and time)
  • "2023-01-01T00:00:00" (ISO 8601 format)
freq
string
required
Frequency string indicating the time interval between observations:
  • "D" - Daily
  • "H" - Hourly
  • "T" or "min" - Minutely
  • "S" - Secondly
  • "W" - Weekly
  • "M" - Monthly
  • "15T" - Every 15 minutes
  • "30S" - Every 30 seconds
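The required fields can be checked locally before submitting a file. The following is a rough sketch, not the platform's own validation: check_record and its messages are hypothetical, it only accepts the three start styles listed above, and it does not check whether the freq string is one the platform recognizes.

```python
import json
from datetime import datetime

REQUIRED = ("target", "start", "freq")
# The three timestamp styles shown in the examples for the start field.
START_FORMATS = ("%Y-%m-%d", "%Y-%m-%d %H:%M:%S", "%Y-%m-%dT%H:%M:%S")

def _parses(value) -> bool:
    """Return True if value is a string matching one of START_FORMATS."""
    if not isinstance(value, str):
        return False
    for fmt in START_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            pass
    return False

def check_record(line: str) -> list[str]:
    """Return a list of problems found in one JSONL line (empty if none)."""
    record = json.loads(line)
    if not isinstance(record, dict):
        return ["record must be a JSON object"]
    problems = [f"missing field: {name}" for name in REQUIRED if name not in record]
    target = record.get("target")
    if target is not None and not (
        isinstance(target, list)
        and all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in target)
    ):
        problems.append("target must be an array of numbers")
    if "start" in record and not _parses(record["start"]):
        problems.append("start is not in one of the documented timestamp formats")
    if "freq" in record and not isinstance(record["freq"], str):
        problems.append("freq must be a string")
    return problems

ok = check_record('{"target": [1.2, 1.5], "start": "2023-01-01", "freq": "D"}')
bad = check_record('{"target": [1, 2], "start": "2023-01-01T00:00:00"}')
```

Here ok is an empty list, and bad reports the missing freq field.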

Optional Fields

item_id
string
Unique identifier for the time series, useful when training on multiple series
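When a file holds several series, each line can carry its own item_id. The sketch below writes one record per series from a dictionary; the series names and values are made-up illustration data, and the shared start and freq are assumptions of the example.

```python
import json

# Hypothetical input: one list of daily values per series identifier.
series = {"store_a": [12, 15, 11], "store_b": [7, 9, 8]}

lines = [
    json.dumps({"item_id": item_id, "target": values, "start": "2023-01-01", "freq": "D"})
    for item_id, values in series.items()
]
jsonl = "\n".join(lines) + "\n"  # one record per line, trailing newline at the end
```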

Frequency Specification Guide

Common Frequencies

  • D - Daily
  • H - Hourly
  • T - Minutely
  • S - Secondly

Custom Intervals

  • 15T - Every 15 minutes
  • 30S - Every 30 seconds
  • 2H - Every 2 hours
  • 5D - Every 5 days
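To see what a frequency string implies for the observation timestamps, the interval can be expanded by hand. This is a simplified sketch under stated assumptions: parse_freq and timestamps are hypothetical helpers, only fixed-length units (S, T or min, H, D, W) are handled, W is treated as a plain 7-day step from start, and calendar-dependent units such as months, quarters, and business days are left out. It is not a description of how the platform itself interprets freq.

```python
import re
from datetime import datetime, timedelta

# Fixed-length units only; calendar-dependent units are not covered here.
UNIT_STEPS = {
    "S": timedelta(seconds=1),
    "T": timedelta(minutes=1),
    "min": timedelta(minutes=1),
    "H": timedelta(hours=1),
    "D": timedelta(days=1),
    "W": timedelta(weeks=1),
}

def parse_freq(freq: str) -> timedelta:
    """Split a string such as "15T" into multiplier and unit, return the step."""
    match = re.fullmatch(r"([1-9]\d*)?([A-Za-z]+)", freq)
    if not match or match.group(2) not in UNIT_STEPS:
        raise ValueError(f"unsupported frequency in this sketch: {freq!r}")
    multiple = int(match.group(1) or 1)  # a missing multiplier means 1
    return multiple * UNIT_STEPS[match.group(2)]

def timestamps(start: datetime, freq: str, count: int) -> list[datetime]:
    """Return the timestamps of the first `count` observations."""
    step = parse_freq(freq)
    return [start + i * step for i in range(count)]

stamps = timestamps(datetime(2023, 1, 1), "15T", 4)
```

With start at midnight and freq "15T", the four timestamps fall at 00:00, 00:15, 00:30, and 00:45.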

Business and Calendar Frequencies

  • B - Business day
  • W - Weekly
  • M - Monthly
  • Q - Quarterly
Ready to prepare your data? Check out our Data Preparation Guide for step-by-step instructions on processing your data files.