> ## Documentation Index > Fetch the complete documentation index at: https://internal.nolano.ai/llms.txt > Use this file to discover all available pages before exploring further. # DataPreparationConfig > API reference for data preparation configuration ## DataPreparationConfig Source location for raw data files or a readable object from which data can be accessed. Destination location for processed data files or a writable object where processed data will be stored. Tokenization strategy based on data modality: * For text/string/code: String denoting a Hugging Face tokenizer name or path to a local tokenizer compatible with Hugging Face * For time series: An instance of `TimeSeriesTokenizerConfig` * For custom tokenization: A callable function that can map string → tensor of floats/integers (2D for patch-based) Maximum sequence length for tokenized data. ## TimeSeriesTokenizerConfig Tokenization approach - `"patch_based"` (Chronos Bolt, TSFM, TiRex style) or `"bin_quant_based"` (Chronos style). Size of input patches. Required for patch-based tokenization. Default: None for bin quantization Size of output patches. Required for patch-based tokenization. Can differ from input patch size (e.g., TimesFM uses longer output patches than input). Default: None for bin quantization Enables patch masking strategy. Only applicable for patch-based tokenization. Helps models learn to predict well for context lengths that are multiples of input patch length (see TimesFM paper). Number of quantization bins. Required for bin quantization-based tokenization. Normalization strategy. Accepts custom function that maps a list of numbers to a list of numbers. Mean value for normalization. Should remain `None` for custom normalization methods. For z-norm, computes mean over first patch (patch-based tokenization, following TimesFM) or entire series (bin quantization, following Chronos). Standard deviation for normalization. Computes series-wise standard deviation based on first patch (patch-based) or entire series (bin quantization). The system automatically handles padding, missing values, and end-of-sequence tokens.