Pipeline Architecture

```mermaid
flowchart TD
  A[Router Jobs] --> B["Throttle (GCRA) + Concurrency (Gradient)"]
  B --> C[HTTP / Library Fetch with Retry]
  C --> D[Parse -> FramePackets]
  D --> E["Normalize (Schemas, PK)"]
  E --> F{Flush threshold reached?}
  F -- No --> E
  F -- Yes --> G[Merge Frames, PK Checks]
  G --> H{Write Chunk}
  H -- Success --> I[Update RunResult & Metrics]
  H -- Failure --> J[Per-packet Rescue]
  J -- Partial Success --> K[DLQ Spool Failed Packets]
  J -- All Fail --> K
  K --> L[Operator Recovery CLI]
  L --> H
```
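The throttle stage (node B) pairs GCRA pacing with a gradient concurrency limit. A minimal sketch of the GCRA half, assuming a plain requests-per-minute budget and no burst allowance (the `GCRAPacer` name and API are illustrative, not the project's actual FlowController):

```python
import time


class GCRAPacer:
    """Illustrative GCRA pacer: tracks a theoretical arrival time (TAT)
    and delays each request until its scheduled slot."""

    def __init__(self, rpm: int):
        self.interval = 60.0 / rpm   # emission interval between requests
        self.tat = time.monotonic()  # theoretical arrival time of next request

    def wait(self) -> None:
        """Block until this request's slot, then advance the TAT."""
        now = time.monotonic()
        start_at = max(self.tat, now)  # never schedule in the past
        if start_at > now:
            time.sleep(start_at - now)
        self.tat = start_at + self.interval
```

Because the TAT only ever advances by `interval`, calls are paced at the target RPM regardless of how bursty the callers are.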

High‑level Flow

1) Fetch jobs scheduled with FlowController:
   - GCRA pacing enforces RPM.
   - Gradient concurrency limits in-flight requests via RTT feedback.
2) Parse & normalize with Polars and provider schemas.
3) Buffer packets per table until flush_threshold_rows.
4) Flush and write:
   - Streaming chunked merge/write using internal defaults.
   - PK validation and per-chunk accounting.
5) Error handling and DLQ:
   - Attempt per-packet rescue writes.
   - On remaining failures, spool to DLQ with atomic replace and fsync.

Design Notes

  • Transport decoupling: routers are transport‑agnostic; library calls use schemes.
  • Resilience first: DLQ, structured logs, and per‑table counts maintain operator visibility.
  • Memory control: chunked flush avoids monolithic merges on large batches.
  • Observability: counters/histograms/optional spans provide diagnostics without hard dependency on external tracing.
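The "atomic replace and fsync" pattern used by the DLQ spool can be sketched as follows. The file layout and JSONL encoding here are illustrative assumptions; only the write-temp/fsync/replace sequence is the point:

```python
import json
import os
import tempfile


def spool_to_dlq(dlq_dir: str, name: str, packets: list[dict]) -> str:
    """Spool failed packets atomically: write to a temp file in the same
    directory, fsync it, then os.replace() onto the final path so a reader
    never observes a partially written spool file."""
    os.makedirs(dlq_dir, exist_ok=True)
    final_path = os.path.join(dlq_dir, f"{name}.jsonl")
    fd, tmp_path = tempfile.mkstemp(dir=dlq_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            for packet in packets:
                f.write(json.dumps(packet) + "\n")
            f.flush()
            os.fsync(f.fileno())       # make the data durable before renaming
        os.replace(tmp_path, final_path)  # atomic on POSIX and Windows
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)        # never leave a stray temp file behind
        raise
    return final_path
```

The temp file must live in the same directory as the target, since `os.replace` is only atomic within a single filesystem.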