Model pipeline
ModelPipeline builds a sequence-to-sequence model from the model block of your config and the tokenizer produced by IOPipeline. It is used after IOPipeline.build() and before TrainerPipeline.
- Overview: how the three pipelines (IO, Model, Trainer) fit together.
- Configuration: the model block in train.yaml and its keys.
ModelPipeline
Use ModelPipeline.from_io_dict(cfg.model, io_dict) to create a pipeline from the result of IOPipeline.from_config(cfg.data).build(). The tokenizer is taken from io_dict["tokenizer"]. Call .build() to obtain the PreTrainedModel instance.
Pipeline for creating models from configuration using ModelRegistry.
Similar to IOPipeline, this class provides a simple interface for creating model instances from config files. It uses ModelRegistry internally to handle model creation.
Examples:
>>> from omegaconf import OmegaConf
>>> from calt.models import ModelPipeline
>>>
>>> cfg = OmegaConf.load("config/train.yaml")
>>> tokenizer = ... # Get tokenizer from IOPipeline
>>>
>>> model_pipeline = ModelPipeline(cfg.model, tokenizer)
>>> model = model_pipeline.build()
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| calt_config | DictConfig | Model configuration from cfg.model (OmegaConf). | required |
| tokenizer | PreTrainedTokenizerFast \| None | Tokenizer instance (required for some models). | None |
Source code in src/calt/models/pipeline.py
from_io_dict (classmethod)
from_io_dict(calt_config: DictConfig, io_dict: dict) -> ModelPipeline
Create a ModelPipeline using the result dict from IOPipeline.build().
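The pattern behind `from_io_dict` can be sketched with a stand-in class. This is not CALT's source: `SketchPipeline` and the string tokenizer are hypothetical, and the only behavior taken from this page is that the tokenizer is read from `io_dict["tokenizer"]`. Any other keys the real result dict may carry are not modeled here.

```python
from typing import Any, Optional


class SketchPipeline:
    """Hypothetical stand-in for ModelPipeline (illustration only)."""

    def __init__(self, calt_config: dict, tokenizer: Optional[Any] = None):
        self.calt_config = calt_config
        self.tokenizer = tokenizer

    @classmethod
    def from_io_dict(cls, calt_config: dict, io_dict: dict) -> "SketchPipeline":
        # Forward the tokenizer stored by the IO step to the constructor.
        return cls(calt_config, io_dict["tokenizer"])


# A plain dict and a string stand in for cfg.model and a real tokenizer.
io_dict = {"tokenizer": "dummy-tokenizer"}
pipeline = SketchPipeline.from_io_dict({"model_type": "generic"}, io_dict)
print(pipeline.tokenizer)
```

The alternate constructor keeps the call site short: the caller passes the whole result dict and does not need to know which key holds the tokenizer.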
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| calt_config | DictConfig | Model configuration (cfg.model). | required |
| io_dict | dict | Result dict from IOPipeline.build(). | required |
Source code in src/calt/models/pipeline.py
build
build() -> PreTrainedModel
Build the model from configuration using ModelRegistry.
Returns:
| Name | Type | Description |
|---|---|---|
| PreTrainedModel | PreTrainedModel | Model instance. |
Source code in src/calt/models/pipeline.py
Supported model types
Models are created via an internal ModelRegistry. The following types are registered by default:
| model_type | Description |
|---|---|
| generic, transformer, calt | CALT generic Transformer (encoder–decoder). |
| bart | HuggingFace BART for conditional generation. |
Set model_type in the model block of train.yaml (e.g. model_type: generic). Other keys in the model block (e.g. num_encoder_layers, d_model, max_sequence_length) are documented under Configuration — model.
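As a rough illustration, the model block can be pictured as a mapping whose `model_type` must be one of the names registered by default. The numeric values below are placeholders, and `DEFAULT_MODEL_TYPES` and `check_model_type` are hypothetical helpers written for this sketch, not part of CALT.

```python
# Names listed on this page as registered by default.
DEFAULT_MODEL_TYPES = {"generic", "transformer", "calt", "bart"}

# Illustrative model block; the numeric values are placeholders, not recommendations.
model_block = {
    "model_type": "generic",
    "num_encoder_layers": 6,
    "d_model": 512,
    "max_sequence_length": 512,
}


def check_model_type(block: dict) -> str:
    """Return the model_type if it is one of the default names, else raise."""
    model_type = block.get("model_type")
    if model_type not in DEFAULT_MODEL_TYPES:
        raise ValueError(f"Unsupported model_type: {model_type!r}")
    return model_type


print(check_model_type(model_block))
```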
ModelRegistry
To create a model without using the pipeline (e.g. with a custom config), you can use the registry or helpers from calt.models: ModelRegistry, get_model_from_config. See the API reference below.
Registry for creating model instances based on model type.
This class provides a unified interface for creating different types of models. Models can be registered and retrieved by name or inferred from config.
Examples:
>>> # Create model with explicit name and config
>>> from calt.models import ModelRegistry
>>> from calt.models.generic import TransformerConfig
>>> registry = ModelRegistry()
>>> config = TransformerConfig(vocab_size=1000, d_model=128)
>>> model = registry.create("transformer", config)
>>>
>>> # Create model from config only (model_type inferred from config)
>>> model = registry.create(model_config=config)
>>>
>>> # Register a custom model (CustomModel and CustomModelConfig are user-defined classes)
>>> registry.register("custom_model", CustomModel, CustomModelConfig)
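The register-then-create pattern can be illustrated without CALT installed. `ToyRegistry`, `ToyConfig`, and `ToyModel` below are hypothetical stand-ins written for this page; CALT's `ModelRegistry` may differ in its details, for example in how it validates configs.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Type


@dataclass
class ToyConfig:
    d_model: int = 128


class ToyModel:
    def __init__(self, config: ToyConfig):
        self.config = config


class ToyRegistry:
    """Maps a model name to a (model class, config class) pair."""

    def __init__(self) -> None:
        self._models: Dict[str, Tuple[Type, Type]] = {}

    def register(self, model_name: str, model_class: Type, config_class: Type) -> None:
        self._models[model_name] = (model_class, config_class)

    def list_models(self) -> List[str]:
        return sorted(self._models)

    def create(self, model_name: str, model_config):
        if model_name not in self._models:
            raise ValueError(f"Unknown model type: {model_name!r}")
        model_class, config_class = self._models[model_name]
        # Reject configs that do not match the class registered for this name.
        if not isinstance(model_config, config_class):
            raise TypeError(f"Expected {config_class.__name__}")
        return model_class(model_config)


registry = ToyRegistry()
registry.register("toy", ToyModel, ToyConfig)
model = registry.create("toy", ToyConfig(d_model=64))
print(registry.list_models(), model.config.d_model)
```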
Source code in src/calt/models/base.py
create_from_config
create_from_config(
model_config: DictConfig,
tokenizer: Optional[PreTrainedTokenizerFast] = None,
model_name: Optional[str] = None,
) -> PreTrainedModel
Create a model instance from OmegaConf config (cfg.model).
This method automatically converts the unified config format to model-specific configs using registered config mapping functions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model_config | DictConfig | Model configuration from cfg.model (OmegaConf). | required |
| tokenizer | PreTrainedTokenizerFast \| None | Tokenizer instance (required for some models like BART). | None |
| model_name | str \| None | Name of the model type. If None, will be inferred from model_config.model_type. | None |
Returns:
| Name | Type | Description |
|---|---|---|
| PreTrainedModel | PreTrainedModel | Model instance. |
Raises:
| Type | Description |
|---|---|
| ValueError | If model_name is not supported or cannot be inferred from config. |
Examples:
>>> # Create model from OmegaConf config
>>> from omegaconf import OmegaConf
>>> from calt.models import ModelRegistry
>>> cfg = OmegaConf.load("config/train.yaml")
>>> tokenizer = ... # Get tokenizer from IOPipeline
>>> registry = ModelRegistry()
>>> model = registry.create_from_config(cfg.model, tokenizer)
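The name lookup described above (use `model_name` if given, otherwise fall back to `model_config.model_type`, and raise `ValueError` when neither resolves to a registered type) can be sketched as a small function. `resolve_model_name` and the `registered` set are hypothetical, and a plain dict stands in for the `DictConfig`.

```python
from typing import Optional, Set


def resolve_model_name(
    model_config: dict,
    registered: Set[str],
    model_name: Optional[str] = None,
) -> str:
    """Pick the model type: explicit name first, then model_config['model_type']."""
    name = model_name if model_name is not None else model_config.get("model_type")
    if name is None:
        raise ValueError("model_name was not given and model_type is missing from the config")
    if name not in registered:
        raise ValueError(f"Unsupported model type: {name!r}")
    return name


registered = {"generic", "bart"}
print(resolve_model_name({"model_type": "bart"}, registered))
print(resolve_model_name({"model_type": "bart"}, registered, model_name="generic"))
```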
Source code in src/calt/models/base.py
list_models
list_models() -> list[str]
List all registered model types.
Returns:
| Type | Description |
|---|---|
| list[str] | List of registered model type names. |
Source code in src/calt/models/base.py
register
register(
model_name: str,
model_class: Type[PreTrainedModel],
config_class: Type[PretrainedConfig],
)
Register a model class with the registry.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model_name | str | Name to register the model under. | required |
| model_class | Type[PreTrainedModel] | Model class to register. | required |
| config_class | Type[PretrainedConfig] | Config class for the model. | required |
Source code in src/calt/models/base.py
register_config_mapping
register_config_mapping(
model_name: str,
mapping_func: Callable[
[DictConfig, Optional[PreTrainedTokenizerFast]], PretrainedConfig
],
)
Register a config mapping function for converting OmegaConf to model config.
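A mapping function receives the unified config and an optional tokenizer and returns a model-specific config. The sketch below mirrors that shape with standard-library stand-ins: `ToyModelConfig`, `ToyTokenizer`, and `map_toy_config` are hypothetical, and reading `vocab_size` from the tokenizer is an assumption about what such a function might do, not a description of CALT's built-in mappings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyTokenizer:
    vocab_size: int


@dataclass
class ToyModelConfig:
    d_model: int
    vocab_size: int


def map_toy_config(model_config: dict, tokenizer: Optional[ToyTokenizer]) -> ToyModelConfig:
    """Convert a unified config (a dict here) into a model-specific config."""
    if tokenizer is None:
        raise ValueError("This toy mapping needs a tokenizer to set vocab_size")
    return ToyModelConfig(
        d_model=model_config.get("d_model", 128),
        vocab_size=tokenizer.vocab_size,
    )


toy_config = map_toy_config({"d_model": 256}, ToyTokenizer(vocab_size=1000))
print(toy_config)
```

Keeping the conversion in one function per model type means the unified config format can stay the same while each model decides which keys it needs.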
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model_name | str | Name of the model type. | required |
| mapping_func | Callable | Function that takes (model_config: DictConfig, tokenizer: Optional) and returns PretrainedConfig. | required |
Source code in src/calt/models/base.py