Migration Guide¶
Use this guide when upgrading between vertex-forager releases and checking for behavior changes that may affect your pipelines.
Upcoming release¶
RetryConfig.enable_http_status_retry removed¶
RetryConfig.enable_http_status_retry has been removed from the public retry API.
Disable HTTP status retries by passing an empty tuple instead:
from vertex_forager import RetryConfig
retry = RetryConfig(retry_status_codes=())
Integer Retry-After headers are now honored automatically, with the delay capped at max_backoff_s. When a response carries no integer Retry-After value, retries fall back to jittered backoff.
Action required:
- Remove enable_http_status_retry from RetryConfig(...) construction.
- Use retry_status_codes=() to disable status-based retry.
- Review provider-specific retry expectations if your upstream returns Retry-After headers.
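To make the Retry-After behavior easier to reason about, here is a minimal sketch of one way such a delay could be chosen. It is not the library's implementation: the function name, the base_backoff_s parameter, and the exponential "full jitter" fallback are assumptions made for illustration; only max_backoff_s comes from this guide.

```python
import random


def retry_delay_s(retry_after, attempt, base_backoff_s=0.5, max_backoff_s=30.0):
    """Pick a delay in seconds for the next retry (illustrative sketch only)."""
    if retry_after is not None:
        text = retry_after.strip()
        # Only plain integer values are honored here; HTTP-date values are ignored.
        if text.isascii() and text.isdigit():
            return min(float(int(text)), max_backoff_s)
    # Assumed fallback: exponential backoff with full jitter, capped at max_backoff_s.
    ceiling = min(max_backoff_s, base_backoff_s * (2 ** attempt))
    return random.uniform(0.0, ceiling)
```

With these assumed defaults, a header value of "120" yields a 30-second delay, because the cap applies before the value is used.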
Local state storage moved to SQLite¶
Checkpoint state and run history now live in a single SQLite database:
~/.cache/vertex-forager/state.db
DLQ payloads stay as Arrow IPC files under ~/.cache/vertex-forager/dlq/<table>/, but each spool file is now tracked in the dlq_index table inside state.db.
Action required:
- If you upgrade while a run is in progress, restart that run from scratch. Existing JSON checkpoint and run-history files are not migrated.
- Use vertex-forager runs list and vertex-forager dlq list to inspect local operational state.
- Use vertex-forager clear --checkpoints, --runs, --dlq, or --all for selective cleanup.
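If you want to look inside state.db without the CLI, the standard sqlite3 module can open it read-only. The sketch below only lists table names; it assumes nothing about the schema beyond the dlq_index table mentioned above, and list_tables is a hypothetical helper rather than part of vertex-forager.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the table names in a SQLite file, opened in read-only mode."""
    # Build a file: URI so that mode=ro can be passed to SQLite.
    uri = Path(db_path).expanduser().resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling list_tables("~/.cache/vertex-forager/state.db") should include dlq_index once the database exists. Opening read-only avoids accidentally creating or modifying the file.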
persist_run_history removed¶
persist_run_history has been removed from create_client(...). Run history is always inserted into state.db.
Action required:
- Remove persist_run_history from create_client(...) calls.
- Replace it with retention controls via StorageConfig:
  - storage.checkpoint_retention_days
  - storage.run_history_retention_days
YFinance pickle compatibility env vars removed¶
YFinance pickle compatibility is no longer configured by environment variables.
Removed env vars:
- VF_ALLOW_PICKLE_COMPAT
- VF_PICKLE_ALLOWED_DATASETS
Pass pickle_compat_datasets to YFinanceClient(...) directly, or to create_client(...) with provider="yfinance":
from vertex_forager import create_client
from vertex_forager.providers.yfinance.client import YFinanceClient
client = YFinanceClient(
    rate_limit=60,
    pickle_compat_datasets=["price"],
)
client2 = create_client(
    provider="yfinance",
    pickle_compat_datasets=["price"],
)
create_client(...) exposes pickle_compat_datasets only for provider="yfinance" overloads, and the yfinance overload no longer accepts api_key or rate_limit.
0.3.x → 0.4.x¶
RequestSpec.idempotent flag¶
RequestSpec now includes an idempotent field (default True).
When set to False, the retry controller performs a single attempt regardless of retry configuration.
Action required: None for most users. If you build custom RequestSpec objects for non-idempotent requests, set idempotent=False to prevent duplicate side effects.
from vertex_forager.core.config import RequestSpec
spec = RequestSpec(
    url="https://api.example.com/submit",
    method="POST",
    json_body={"x": 1},
    idempotent=False,
)
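The effect of the flag can be pictured with a small retry loop. This is a sketch under stated assumptions, not the retry controller itself: call_with_retry, send, and max_attempts are hypothetical names, and only ConnectionError is treated as retryable here.

```python
def call_with_retry(send, idempotent=True, max_attempts=3):
    """Call send(), retrying on ConnectionError only when the request is idempotent."""
    # A non-idempotent request gets exactly one attempt, whatever the retry settings say.
    attempts = max(1, max_attempts) if idempotent else 1
    last_error = None
    for _ in range(attempts):
        try:
            return send()
        except ConnectionError as exc:
            last_error = exc
    raise last_error
```

Capping the attempt count up front, rather than checking the flag inside the loop, keeps the single-attempt guarantee easy to verify.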
0.2.x → 0.3.x¶
Opt-in strict schema validation¶
SchemaMapper now accepts strict_validation=True. When enabled, missing required columns or type mismatches raise ValueError immediately instead of being silently ignored.
Action required: None — strict validation is opt-in. To enable:
from vertex_forager.schema.mapper import SchemaMapper
mapper = SchemaMapper(strict_validation=True)
Warning
Enabling strict validation on existing pipelines may surface previously silent schema issues. Test with your data before deploying.
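As a rough illustration of what strict mode changes, the sketch below collects the same two kinds of problems (missing required columns and type mismatches) and raises ValueError only when strict validation is on. validate_columns and its arguments are hypothetical and do not reflect SchemaMapper's internals.

```python
def validate_columns(columns, required, strict_validation=False):
    """Compare found column types against required ones.

    columns:  mapping of column name -> type found in the data
    required: mapping of column name -> expected type
    """
    problems = []
    for name, expected in required.items():
        if name not in columns:
            problems.append(f"missing required column: {name}")
        elif columns[name] is not expected:
            problems.append(
                f"type mismatch for {name}: expected {expected.__name__}, "
                f"got {columns[name].__name__}"
            )
    if problems and strict_validation:
        # Strict mode fails fast instead of letting the issues pass silently.
        raise ValueError("; ".join(problems))
    return problems
```

Running a check like this in non-strict mode against a sample of production data is one way to preview what strict mode would reject.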
In-memory writer dedup/upsert¶
InMemoryBufferWriter now supports unique_key for automatic deduplication and upsert behavior during writes.
Action required: None — the feature is additive.
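Dedup and upsert on a unique key usually come down to "last write wins per key". The following sketch shows that idea with plain dictionaries; upsert_rows is a hypothetical helper, and the actual semantics of InMemoryBufferWriter (for example, composite keys or ordering) may differ.

```python
def upsert_rows(buffer, rows, unique_key):
    """Insert rows into buffer keyed by unique_key; later rows replace earlier ones."""
    for row in rows:
        # Assigning to an existing key overwrites the row and keeps its original position.
        buffer[row[unique_key]] = row
    return buffer
```

For example, writing two rows with the same ticker leaves a single entry holding the values from the second row.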
0.1.x → 0.2.x¶
Breaking changes
This is a major restructuring release. Most imports changed.
Public API moved to package root¶
All primary classes and factories are now importable directly from vertex_forager:
# Before (0.1.x)
from vertex_forager.clients.base import BaseClient
from vertex_forager.core.config import RetryConfig
# After (0.2.x)
from vertex_forager import BaseClient, RetryConfig, create_client
Centralized constants¶
Provider-specific constants moved to vertex_forager.constants.
Environment-aware CLI¶
The CLI now reads environment variables and supports vertex-forager collect, vertex-forager status, and vertex-forager recover.
DLQ spool¶
Dead Letter Queue spooling is always enabled; failed write packets are persisted to disk with fsync and atomic replace.
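The "fsync and atomic replace" pattern can be sketched with the standard library as follows. This is a generic illustration of the technique, not the library's spool code; spool_atomically is a hypothetical name, and the payload is treated as opaque bytes rather than Arrow IPC.

```python
import os
import tempfile


def spool_atomically(path, payload):
    """Write payload (bytes) to path so readers never observe a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target directory so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to disk before publishing
        os.replace(tmp_path, path)  # atomic rename over the final name
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Writing to a temporary name first means a crash mid-write leaves at worst a stray .tmp file, never a truncated spool file under the final name.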
Writer chunked flush¶
Writers use internal chunked flush to bound memory usage during large writes.
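Chunked flushing bounds memory by handing the sink a fixed-size slice at a time instead of one large batch. A minimal sketch, assuming a hypothetical write_chunk callable and an arbitrary default chunk size:

```python
def iter_chunks(rows, chunk_size):
    """Yield consecutive slices of rows, each with at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]


def flush_in_chunks(rows, write_chunk, chunk_size=1000):
    """Pass rows to write_chunk one slice at a time and return the row count written."""
    written = 0
    for chunk in iter_chunks(rows, chunk_size):
        write_chunk(chunk)
        written += len(chunk)
    return written
```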
RunResult summaries¶
Pipeline runs return RunResult with per-table DLQ counts.
In-memory collect return type¶
Client collect methods now always return RunResult.
Before:
df = client.get_price_data(tickers=["AAPL"])
After:
result = client.get_price_data(tickers=["AAPL"])
df = result.data
Data quality policy¶
Provider table schemas now ship with built-in quality rules. The default quality_check="warn" mode records counts in RunResult.quality_violations; quality_check="error" raises DataQualityError and aborts the run.
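The difference between the two modes can be sketched as below. The rule format (a name mapped to a per-row predicate), check_quality, and QualityViolationError are all illustrative stand-ins; the library's built-in rules and its DataQualityError class are not reproduced here.

```python
class QualityViolationError(Exception):
    """Local stand-in for the library's DataQualityError (not the real class)."""


def check_quality(rows, rules, quality_check="warn"):
    """Count rows failing each rule; raise in "error" mode if any rule fails."""
    violations = {}
    for rule_name, predicate in rules.items():
        failed = sum(1 for row in rows if not predicate(row))
        if failed:
            violations[rule_name] = failed
    if violations and quality_check == "error":
        raise QualityViolationError(f"quality rules failed: {violations}")
    # In "warn" mode the counts are returned for reporting and the run continues.
    return violations
```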
Logging model¶
OBS/stage logs are now always emitted at DEBUG level with structured extra fields on the vertex_forager logger. Configure visibility in the host app:
import logging
logging.getLogger("vertex_forager").setLevel(logging.DEBUG)
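Setting the level alone does not display the structured extra fields, because the default formatter prints only the message. One possible approach is a formatter that appends any non-standard LogRecord attributes. The class and function names below are hypothetical, and the field names in the usage note are placeholders, since this guide does not list the fields the library emits.

```python
import logging

# Attribute names present on every LogRecord, plus the two that formatting adds.
_STANDARD_ATTRS = set(vars(logging.makeLogRecord({}))) | {"message", "asctime"}


class ExtraFieldsFormatter(logging.Formatter):
    """Append non-standard LogRecord attributes (the `extra` fields) to each line."""

    def format(self, record):
        base = super().format(record)
        extras = {k: v for k, v in vars(record).items() if k not in _STANDARD_ATTRS}
        if not extras:
            return base
        suffix = " ".join(f"{key}={value!r}" for key, value in sorted(extras.items()))
        return f"{base} {suffix}"


def install_debug_handler(logger_name="vertex_forager"):
    """Attach a stream handler that shows DEBUG records with their extra fields."""
    handler = logging.StreamHandler()
    handler.setFormatter(ExtraFieldsFormatter("%(levelname)s %(name)s %(message)s"))
    logger = logging.getLogger(logger_name)
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    return handler
```

After calling install_debug_handler(), a record logged with extra={"stage": "fetch"} (a placeholder field) would be printed with stage='fetch' appended to the message.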
Removed parameters:
- structured_logs
- log_verbose
Tracing model¶
Tracing now follows the OpenTelemetry library standard. The library acquires its own tracer via trace.get_tracer("vertex_forager") and emits spans unconditionally. The host application controls whether spans are exported by configuring a TracerProvider:
Note: The snippet below uses TracerProvider, BatchSpanProcessor, and OTLPSpanExporter from the OpenTelemetry SDK and OTLP exporter packages. These are not runtime dependencies of vertex-forager; install them separately: pip install opentelemetry-sdk opentelemetry-exporter-otlp
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(provider)
When no TracerProvider is configured, the OpenTelemetry API falls back to a no-op tracer, so spans are neither recorded nor exported.
Removed parameters:
- otel_enabled
- tracer
- advanced/AdvancedConfig
Action required: Update imports to use the package root and review any code that parses RunResult.
0.1.0¶
Initial release. No migration needed.