Schema

vertex-forager uses TableSchema objects to describe the canonical shape of each output table. Schemas define the target columns, types, unique keys, and optional data-quality rules that the mapper and writers rely on during normalization and persistence.

Use the schema reference when you want to:

understand the columns and keys expected for a provider table
configure strict validation with SchemaMapper
inspect registry lookup behavior before writing custom integrations

Usage example

from vertex_forager.schema.registry import get_table_schema

schema = get_table_schema("yfinance_price")
if schema is not None:
    print(schema.unique_key)
    print(schema.analysis_date_col)

The registry function looks up a table name in the shared provider registry and returns the matching TableSchema if one exists. Providers such as Sharadar and yfinance register their schemas centrally so downstream normalization and writers can stay provider-agnostic.
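The lookup behavior described above can be pictured with a small stand-alone sketch. It does not use vertex-forager itself: `SketchSchema`, `_REGISTRY`, `register_schema`, and `lookup_schema` are hypothetical names, and the `("ticker", "date")` key columns are assumed purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SketchSchema:
    """Hypothetical stand-in for TableSchema (only two of its attributes)."""

    table: str
    unique_key: tuple[str, ...]


# Hypothetical shared registry: table name -> schema.
_REGISTRY: dict[str, SketchSchema] = {}


def register_schema(schema: SketchSchema) -> None:
    """Register a schema under its table name (assumed provider-side step)."""
    _REGISTRY[schema.table] = schema


def lookup_schema(table: str) -> Optional[SketchSchema]:
    """Return the registered schema, or None when the table is unknown."""
    return _REGISTRY.get(table)


# Providers register once; downstream code only needs the table name.
# The key columns below are assumptions, not the library's real keys.
register_schema(SketchSchema("yfinance_price", ("ticker", "date")))
register_schema(SketchSchema("sharadar_sep", ("ticker", "date")))

found = lookup_schema("yfinance_price")
missing = lookup_schema("no_such_table")
```

Because an unknown table yields `None` rather than an exception, callers are expected to guard the result, as the `if schema is not None` check in the usage example does.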

TableSchema

Definition of a table's structural constraints.

Attributes:

Name	Type	Description
`table`	`str`	Canonical table name (e.g., "sharadar_sep").
`schema`	`dict[str, DataType | type[DataType]]`	Mapping of column names to Polars DataTypes.
`unique_key`	`tuple[str, ...]`	Tuple of column names that form the primary key (for deduplication).
`analysis_date_col`	`str | None`	Optional timestamp/analysis column used for time-based processing. Defaults to None.
`flexible_schema`	`bool`	Whether schema is permissive to extra/unknown fields. Defaults to False.
`quality_rules`	`tuple[DataQualityRule, ...]`	Tuple of data quality validation rules to apply to table data. Defaults to empty tuple.
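To make the role of `unique_key` concrete, here is a minimal pure-Python sketch of key-based deduplication. The source only says the key is used for deduplication; the helper name `dedupe_rows`, the sample rows, and the "later rows win" policy are assumptions for illustration, not the library's documented behavior.

```python
from typing import Any

Row = dict[str, Any]


def dedupe_rows(rows: list[Row], unique_key: tuple[str, ...]) -> list[Row]:
    """Keep one row per unique-key value; later rows win (an assumption)."""
    latest: dict[tuple, Row] = {}
    for row in rows:
        key = tuple(row[col] for col in unique_key)
        latest[key] = row  # overwriting keeps the most recent occurrence
    return list(latest.values())


rows = [
    {"ticker": "AAPL", "date": "2024-01-02", "close": 185.0},
    {"ticker": "MSFT", "date": "2024-01-02", "close": 370.0},
    {"ticker": "AAPL", "date": "2024-01-02", "close": 185.6},  # duplicate key
]
deduped = dedupe_rows(rows, ("ticker", "date"))
```

Here the two AAPL rows share the same `("ticker", "date")` key, so only the later one survives alongside the MSFT row.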

SchemaMapper

Core component responsible for data normalization and schema enforcement.

The SchemaMapper ensures that all data flowing through the pipeline conforms to pre-defined schemas before it reaches the Writer stage. This guarantees type safety and structural consistency across different storage backends.

Key Responsibilities:

1. Schema Lookup: Retrieves the authoritative TableSchema for a given table name from the central registry.
2. Type Casting:
   - Default (strict_validation=False): Casts columns with non-strict casting (strict=False), allowing nulls on failure.
   - Strict (strict_validation=True): Casts with strict=True and raises on type mismatches.
3. Missing Column Handling: Automatically adds missing schema columns with null values to ensure downstream systems receive complete records.
4. Column Ordering: Reorders columns to match the canonical schema definition.
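The casting, missing-column, and ordering responsibilities above can be illustrated with a rough pure-Python analogue. This is not the SchemaMapper implementation: `SCHEMA`, `cast_value`, and `normalize_row` are hypothetical names, plain Python callables stand in for Polars dtypes, and columns outside the schema are simply ignored here because the source does not say how the real mapper treats them.

```python
from typing import Any, Callable

Row = dict[str, Any]

# Hypothetical schema: column name -> casting function, in canonical order.
SCHEMA: dict[str, Callable[[Any], Any]] = {
    "ticker": str,
    "close": float,
    "volume": int,
}


def cast_value(value: Any, caster: Callable[[Any], Any], strict: bool) -> Any:
    """Cast one value; on failure raise (strict) or fall back to None."""
    if value is None:
        return None
    try:
        return caster(value)
    except (TypeError, ValueError):
        if strict:
            raise
        return None


def normalize_row(row: Row, strict_validation: bool = False) -> Row:
    """Cast, fill missing columns with None, and emit columns in schema order."""
    return {
        col: cast_value(row.get(col), caster, strict_validation)
        for col, caster in SCHEMA.items()
    }


raw = {"volume": "n/a", "ticker": "AAPL"}  # wrong order, bad value, no "close"
lenient = normalize_row(raw)

try:
    normalize_row(raw, strict_validation=True)
    strict_failed = False
except ValueError:
    strict_failed = True
```

In the lenient call the unparseable `volume` and the absent `close` both come back as `None`, and the keys follow the schema order. The strict call raises on the same input, which mirrors the trade-off between tolerating dirty data and failing fast.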

Usage

mapper = SchemaMapper()
normalized_packet = mapper.normalize(raw_packet)

`normalize(packet)`

Enforce schema conformance on a data packet.

This method transforms a raw DataFrame into a schema-compliant DataFrame. If a schema is registered, the frame is cast to declared types and columns are reordered. When analysis_date_col is set on the schema and present in frame.columns, that column is cast to the schema type (strict=False) and the frame is sorted by it. No new column is created if the target analysis_date_col is absent.

If no schema is registered for the table, the packet is returned unchanged.

Parameters:

Name	Type	Description	Default
`packet`	`FramePacket`	Input packet containing potentially raw/untyped data.	required

Returns:

Name	Type	Description
`FramePacket`	`FramePacket`	A new packet containing the normalized DataFrame.
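The two branches of `normalize` described above (pass-through for unregistered tables, sorting by the analysis date column when it is present) can be sketched as follows. Type casting is left out to keep the example short. `ANALYSIS_DATE_COLS` and `normalize_rows` are hypothetical names, and lists of dicts stand in for the DataFrame carried by a FramePacket.

```python
from typing import Any, Optional

Row = dict[str, Any]

# Hypothetical registry: table name -> analysis date column (or None).
ANALYSIS_DATE_COLS: dict[str, Optional[str]] = {"yfinance_price": "date"}


def normalize_rows(table: str, rows: list[Row]) -> list[Row]:
    """Pass unknown tables through untouched; sort known ones by date column."""
    if table not in ANALYSIS_DATE_COLS:
        return rows  # no schema registered: returned unchanged
    date_col = ANALYSIS_DATE_COLS[table]
    # Sort only when the column is configured and present; never create it.
    if date_col is not None and rows and all(date_col in r for r in rows):
        return sorted(rows, key=lambda r: r[date_col])
    return list(rows)


prices = [
    {"date": "2024-01-03", "close": 184.2},
    {"date": "2024-01-02", "close": 185.6},
]
sorted_prices = normalize_rows("yfinance_price", prices)
unknown = normalize_rows("unregistered_table", prices)
```

The registered table comes back ordered by `date`, while the unregistered one is handed back as the very same object.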

Registry

Retrieve the schema for a given table name.