Configuration Reference

Public runtime configuration is now centered on create_client(...) plus grouped config models.

Required Inputs

  • provider: str — provider identifier such as "sharadar" or "yfinance"
  • api_key: str | None — required for Sharadar, not accepted by the yfinance overload
  • rate_limit: int — required for Sharadar; the yfinance overload uses an internal default

Top-Level Client Parameters

  • schedule: SchedulerConfig = SchedulerConfig()
  • quality_check: Literal["warn", "error"] = "warn"
  • concurrency: int | None = None

Stage logs are always emitted at DEBUG level; the host application controls visibility and formatting via standard logging configuration.

quality_check="warn" records quality-rule violations in RunResult.quality_violations and continues. quality_check="error" raises DataQualityError on the first violating flush.
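The difference between the two modes can be illustrated with a small standalone sketch. This is not library code: `DataQualityError` is stood in for by a plain exception, and the list mirrors how `RunResult.quality_violations` accumulates in `"warn"` mode.

```python
class DataQualityError(Exception):
    """Stand-in for the library's DataQualityError (illustration only)."""


def handle_violation(mode: str, recorded: list, violation: str) -> None:
    if mode == "warn":
        # "warn": record the violation and continue the run.
        recorded.append(violation)
        return
    # "error": fail fast on the first violating flush.
    raise DataQualityError(violation)


violations: list = []
handle_violation("warn", violations, "null close price")
print(violations)  # ['null close price']
```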

Provider-Specific Direct Clients

Some provider-specific configuration only exists on the concrete client, not on create_client(...).

YFinanceClient

  • pickle_compat_datasets: list[str] | None = None
  • None or [] disables pickle compatibility
  • a non-empty list enables pickle fallback only for the named datasets
  • available on YFinanceClient(...) and on create_client(provider="yfinance", ...)
  • create_client(provider="yfinance", ...) uses an internal default rate limit and does not accept api_key or rate_limit
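For example, on the direct client (the dataset names here are placeholders, not a real list shipped by the library):

```python
from vertex_forager import YFinanceClient

# Enable pickle fallback only for the named datasets;
# passing None or [] disables pickle compatibility entirely.
client = YFinanceClient(pickle_compat_datasets=["prices", "dividends"])
```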

Grouped Public Config

RetryConfig

  • max_attempts: int
  • base_backoff_s: float
  • max_backoff_s: float — must be >= base_backoff_s
  • backoff_mode: Literal["full_jitter", "equal"]
  • retry_status_codes: tuple[int, ...]
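As an illustration of how these fields might interact, here is a plausible backoff sketch. The exact formula is internal to the library; exponential doubling per attempt is an assumption made for this example.

```python
import random


def sketch_backoff(attempt: int, base_backoff_s: float,
                   max_backoff_s: float, backoff_mode: str) -> float:
    """Illustrative only: exponential growth capped at max_backoff_s,
    then jittered according to backoff_mode."""
    capped = min(max_backoff_s, base_backoff_s * (2 ** attempt))
    if backoff_mode == "full_jitter":
        # Full jitter: sleep anywhere in [0, capped].
        return random.uniform(0.0, capped)
    # "equal" jitter: half the delay is fixed, half is random.
    return capped / 2 + random.uniform(0.0, capped / 2)
```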

AdaptiveThrottleConfig

  • enabled: bool
  • window_s: int
  • error_rate_threshold: float — in [0, 1]
  • rpm_floor_ratio: float — in [0, 1]
  • recovery_factor: float — in [0, 1]
  • healthy_window_s: int
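One plausible reading of these knobs, sketched below: the actual throttle algorithm is internal, so the halving step and linear recovery here are assumptions, shown only to make clear how the ratios bound the effective request rate.

```python
def throttle_step(current_rpm: float, base_rpm: float,
                  window_error_rate: float, error_rate_threshold: float,
                  rpm_floor_ratio: float, recovery_factor: float,
                  window_healthy: bool) -> float:
    """Hypothetical adaptive-throttle step (not the library's algorithm):
    cut the rate when the windowed error rate crosses the threshold,
    never dropping below base_rpm * rpm_floor_ratio; after a healthy
    window, recover a recovery_factor fraction of the gap to base_rpm."""
    floor = base_rpm * rpm_floor_ratio
    if window_error_rate > error_rate_threshold:
        return max(floor, current_rpm / 2)
    if window_healthy:
        return min(base_rpm, current_rpm + recovery_factor * (base_rpm - current_rpm))
    return current_rpm
```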

HTTPConfig

  • max_connections: int
  • max_keepalive_connections: int
  • timeout_s: float — HTTP request timeout in seconds

SchedulerConfig

  • quantum: int = 3
  • max_pending_per_symbol: int | None = None
  • backpressure_threshold: int | None = None

StorageConfig

  • flush_threshold_rows: int — DuckDB write buffer threshold before flush begins
  • checkpoint_retention_days: int = 7 — retention window for completed checkpoint state
  • run_history_retention_days: int = 90 — retention window for run-history records
  • dlq_tmp_retention_s: int — retention window for DLQ .tmp artefacts

DLQ spooling, periodic cleanup, writer chunking, writer worker count, and memory guard thresholds are internal defaults and are no longer public tuning knobs.

Example

from vertex_forager import (
    AdaptiveThrottleConfig,
    HTTPConfig,
    RetryConfig,
    SchedulerConfig,
    StorageConfig,
    create_client,
)

client = create_client(
    provider="sharadar",
    api_key="...",
    rate_limit=300,
    concurrency=4,
    schedule=SchedulerConfig(
        quantum=3,
        max_pending_per_symbol=50,
        backpressure_threshold=120,
    ),
    retry=RetryConfig(max_attempts=3),
    throttle=AdaptiveThrottleConfig(rpm_floor_ratio=1.0),
    limits=HTTPConfig(max_connections=200, max_keepalive_connections=100, timeout_s=30.0),
    storage=StorageConfig(
        flush_threshold_rows=500_000,
        checkpoint_retention_days=7,
        run_history_retention_days=90,
    ),
)