Tune Chunked Flush Thresholds
Chunked flush reduces memory peaks by writing in bounded batches. Tune thresholds for your data volume and disk throughput.
Concepts
- flush_threshold_rows: rows buffered per table before a flush is triggered.
- WRITER_CHUNK_ROWS: internal per-chunk row target during a flush.
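The relationship between the two settings can be pictured with a minimal sketch. This is an illustration only, not vertex_forager's actual writer: the `BufferedTableWriter` class, the `iter_chunks` helper, and the `write_chunk` callback are hypothetical names, and the tiny thresholds are chosen so the behavior is easy to follow.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, int]


def iter_chunks(rows: List[Row], chunk_rows: int) -> Iterator[List[Row]]:
    """Yield successive slices of at most chunk_rows rows."""
    for start in range(0, len(rows), chunk_rows):
        yield rows[start:start + chunk_rows]


class BufferedTableWriter:
    """Hypothetical per-table buffer; NOT the library's real implementation."""

    def __init__(
        self,
        flush_threshold_rows: int,
        chunk_rows: int,
        write_chunk: Callable[[List[Row]], None],
    ) -> None:
        self.flush_threshold_rows = flush_threshold_rows
        self.chunk_rows = chunk_rows  # plays the role of WRITER_CHUNK_ROWS
        self.write_chunk = write_chunk
        self.buffer: List[Row] = []

    def add(self, rows: List[Row]) -> None:
        """Buffer rows and flush once the threshold is reached."""
        self.buffer.extend(rows)
        if len(self.buffer) >= self.flush_threshold_rows:
            self.flush()

    def flush(self) -> None:
        """Write the buffer in bounded chunks, then clear it."""
        for chunk in iter_chunks(self.buffer, self.chunk_rows):
            self.write_chunk(chunk)
        self.buffer = []


# Usage: threshold of 10 rows, chunks of at most 4 rows.
written_sizes: List[int] = []
writer = BufferedTableWriter(10, 4, lambda chunk: written_sizes.append(len(chunk)))
writer.add([{"i": i} for i in range(9)])  # 9 rows: below threshold, no flush
writer.add([{"i": 9}])                    # 10 rows: flush as chunks of 4, 4, 2
print(written_sizes)
```

In this sketch the threshold decides when a flush starts, while the chunk size bounds how many rows are written at once during that flush.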
Guidelines
- Start with:
  - flush_threshold_rows: 500_000 (default)
- Keep flush_threshold_rows at default and rely on internal chunking unless you have a clear reason to tune flush cadence.
- Increase flush_threshold_rows if:
  - Write throughput is low but memory headroom is ample.
- Decrease flush_threshold_rows if:
  - Peak memory or GC pressure increases during merges.
Note: chunk size (WRITER_CHUNK_ROWS) is an internal constant and not user-configurable.
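When deciding whether there is memory headroom to raise the threshold, a rough back-of-the-envelope estimate can help. The sketch below assumes buffered memory scales as rows × average row size × number of tables buffering at once. The function name, the 200-byte row size, and the table count of 4 are illustrative assumptions, not values from vertex_forager, and real in-memory overhead may differ considerably.

```python
def estimate_buffered_bytes(
    flush_threshold_rows: int,
    avg_row_bytes: int,
    active_tables: int,
) -> int:
    """Rough upper bound on buffered bytes just before flushes trigger.

    Assumes every active table fills its buffer to the threshold.
    """
    return flush_threshold_rows * avg_row_bytes * active_tables


# Illustrative inputs: default threshold, assumed 200 bytes/row, 4 tables.
estimate = estimate_buffered_bytes(500_000, 200, 4)
print(f"{estimate / 1024**2:.0f} MiB")  # prints "381 MiB"
```

Measure actual row sizes for your tables before relying on an estimate like this.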
Example
from vertex_forager import StorageConfig, create_client

client = create_client(
    provider="sharadar",
    api_key="...",
    rate_limit=120,
    storage=StorageConfig(flush_threshold_rows=500_000),
)
Monitor:
- writer_rows.{table} histogram distribution
- writer_flush_duration_s.{table} percentiles
Aim for per‑chunk durations within ~0.5–2s on your hardware to balance I/O and latency.
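One way to check durations against that target is to summarize collected samples into percentiles. The sketch below uses only the Python standard library. How you export duration samples from your metrics backend is not covered here, so the sample lists are placeholders, and `summarize_durations` and `within_target` are hypothetical helper names.

```python
import statistics
from typing import Dict, List


def summarize_durations(samples: List[float]) -> Dict[str, float]:
    """Return p50 and p95 of duration samples (needs at least 2 samples)."""
    # quantiles(n=100) returns 99 cut points: index 49 is p50, index 94 is p95.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94]}


def within_target(summary: Dict[str, float], low: float = 0.5, high: float = 2.0) -> bool:
    """True when the median is at least `low` and p95 is at most `high` seconds."""
    return low <= summary["p50"] and summary["p95"] <= high


# Placeholder samples in seconds; replace with values from your own metrics.
ok_summary = summarize_durations([0.6, 0.8, 1.0, 1.2, 1.4])
slow_summary = summarize_durations([0.5, 1.0, 3.0])
print(within_target(ok_summary), within_target(slow_summary))
```

Checking both the median and a high percentile guards against a healthy average hiding occasional slow writes.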