# Performance Tuning

## Goals
- Profile realistic workloads and quantify bottlenecks (p95/p99).
- Optimize Polars transforms, writer batching, progress UI, memory validation.
- Tune concurrency and HTTP client parameters to maximize throughput safely.
## Client Parameters

- `concurrency`: Max concurrent fetch jobs/requests.
- `schedule=SchedulerConfig(...)`: Grouped scheduler controls for DRR fairness and backlog pressure.
- `limits=HTTPConfig(...)`: HTTP connection pool settings (max connections, keepalive, timeout).
- `storage=StorageConfig(...)`: Grouped data-lifecycle and write-path tuning settings.
## Convenience Environment Variables

- `VF_PROFILE_OUTPUT_DIR`: Output directory for verification artifacts.
- `SHARADAR_API_KEY`: Optional credential for Sharadar verification runs.
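For example, the two variables can be exported before a verification run (the directory path and key value below are placeholders):

```shell
# Point verification artifacts at a scratch directory (example path).
export VF_PROFILE_OUTPUT_DIR=/tmp/vf-perf

# Only needed for Sharadar verification runs.
export SHARADAR_API_KEY=your-key-here
```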
## Profiling Scripts

- Price: `packages/vertex-forager/tests/verification/verify_pipeline_perf.py`
- Financials: `packages/vertex-forager/tests/verification/verify_pipeline_perf_financials.py`
- Sweep: `packages/vertex-forager/tests/verification/verify_pipeline_sweep.py`
Usage:

```shell
uv run python packages/vertex-forager/tests/verification/verify_pipeline_perf.py
```

Each script writes JSON summaries (p95/p99 latencies and row counts) under the configured output directory.
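A quick way to inspect those artifacts is to load every summary in the output directory; note this is a sketch, and the key names shown in the docstring (`p95_ms`, `p99_ms`, `rows`) are assumptions — check a real summary file for the authoritative schema:

```python
import json
from pathlib import Path


def load_summaries(output_dir: str) -> list[dict]:
    """Load every JSON summary written under output_dir.

    The verification scripts write one JSON file per run; the exact
    field names (e.g. "p95_ms", "p99_ms", "rows") are illustrative
    assumptions here, not a documented schema.
    """
    return [
        json.loads(path.read_text())
        for path in sorted(Path(output_dir).glob("*.json"))
    ]
```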
Example explicit SDK configuration:

```python
from vertex_forager import HTTPConfig, SchedulerConfig, StorageConfig, create_client

client = create_client(
    provider="sharadar",
    api_key="...",
    rate_limit=500,
    concurrency=12,
    schedule=SchedulerConfig(
        quantum=3,
        backpressure_threshold=12 * 3 * 10,
    ),
    limits=HTTPConfig(
        max_connections=200,
        max_keepalive_connections=100,
        timeout_s=30.0,
    ),
    storage=StorageConfig(
        flush_threshold_rows=500_000,
    ),
)
```
## Tuning Strategy

- Start `concurrency` in [8, 12, 16, 20, 24]; calibrate by provider latency.
- Keep `schedule.quantum=3` as the default baseline, then lower it for more symbol interleaving or raise it when deep pagination dominates and fairness is less important.
- When you need backlog protection, start `schedule.backpressure_threshold` near `concurrency × quantum × 10`.
- Add `schedule.max_pending_per_symbol` only when one symbol can monopolize memory with exceptionally deep history.
- Increase `storage.flush_threshold_rows` to reduce flush frequency on large tables.
- Tune `limits.max_keepalive_connections` and `limits.max_connections` to match concurrency and provider behavior.
- Split processes per dataset if optimal parameters differ significantly.