
Architecture

Rhei is a fully in-process HTAP (Hybrid Transactional/Analytical Processing) engine. It embeds a low-latency SQLite write path alongside a high-throughput analytical engine (DuckDB or Apache DataFusion) and bridges the two with trigger-based Change Data Capture, so analytical queries see a near-real-time view of your OLTP data without a separate server, ETL job, or network hop.

Standard HTAP Mode

Client
  │
  ▼
HtapEngine (facade — crates/rhei/)

  ├─ SqlParserRouter ──────────────────► AST-based routing (sqlparser-rs)
  │                                       ├── writes / DDL / simple SELECT → OLTP
  │                                       └── aggregates / JOIN / CTE / window → OLAP

  ├─ RusqliteEngine (OLTP)
  │     ├── write connection  (WAL mode, dedicated OS thread)
  │     ├── read pool         (round-robin, 4 connections by default)
  │     └── CDC connection    ──► _rhei_cdc_log (trigger-populated)

  ├─ OlapBackend (enum — crates/rhei-olap/)
  │     ├── DuckDbEngine      (feature: duckdb-backend)
  │     └── DataFusionEngine  (feature: datafusion-backend, default)

  └─ CdcSyncEngine (crates/rhei-sync/)
        ├── polls _rhei_cdc_log (seq watermark)
        ├── SyncMode::Destructive — mirror (UPDATE/DELETE in-place)
        ├── SyncMode::Temporal   — SCD Type 2 (append-only history)
        ├── bulk INSERT via Arrow batch → OlapEngine::load_arrow()
        └── background loop with CancellationToken shutdown
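
The difference between the two sync modes can be illustrated with a small in-memory sketch. The `Change` and `Version` types and the `valid_from` / `valid_to` fields are hypothetical names chosen for illustration; Rhei's actual `CdcEvent` shape and history-table layout are not shown on this page. The temporal half follows the classic SCD Type 2 bookkeeping of closing the current version and appending a new one.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a CDC event; Rhei's real `CdcEvent` may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum Change {
    Upsert { pk: i64, value: String },
    Delete { pk: i64 },
}

/// Destructive mode: the replica mirrors the current state of the source.
pub fn apply_destructive(replica: &mut BTreeMap<i64, String>, change: &Change) {
    match change {
        Change::Upsert { pk, value } => {
            replica.insert(*pk, value.clone());
        }
        Change::Delete { pk } => {
            replica.remove(pk);
        }
    }
}

/// One version of a row in an SCD Type 2 style history (assumed layout).
#[derive(Debug, Clone, PartialEq)]
pub struct Version {
    pub pk: i64,
    pub value: String,
    pub valid_from: u64,
    /// `None` marks the current version.
    pub valid_to: Option<u64>,
}

/// Temporal mode: close the current version, then append a new one.
/// A delete only closes the current version and appends nothing.
pub fn apply_temporal(history: &mut Vec<Version>, change: &Change, now: u64) {
    let pk = match change {
        Change::Upsert { pk, .. } | Change::Delete { pk } => *pk,
    };
    for v in history
        .iter_mut()
        .filter(|v| v.pk == pk && v.valid_to.is_none())
    {
        v.valid_to = Some(now);
    }
    if let Change::Upsert { value, .. } = change {
        history.push(Version {
            pk,
            value: value.clone(),
            valid_from: now,
            valid_to: None,
        });
    }
}
```

After two upserts and a delete for the same key, the destructive replica is empty, while the temporal history keeps two closed versions that can still be queried as of an earlier point in time.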

Sidecar Mode

In sidecar mode the local OLTP write path is optional. Instead, a TimestampCdcConsumer polls an external database (SQLite or PostgreSQL) for rows whose updated_at timestamp exceeds the last recorded watermark:

External DB (SQLite / PostgreSQL)
  │  SELECT … WHERE updated_at > watermark ORDER BY updated_at, pk
  ▼
TimestampCdcConsumer<S: SourceConnector>  (crates/rhei-sidecar/)
  │  emits CdcEvent stream (Insert / Update / Delete)
  │  watermark persisted in RocksDB (optional)
  ▼
CdcSyncEngine  (temporal or destructive)
  │  same DML pipeline as standard mode
  ▼
OlapBackend (DuckDB / DataFusion)

No schema changes are required on the source database — only that each tracked table has an updated_at column (and optionally a deleted_at column for soft-delete detection).
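
The timestamp-watermark polling described above can be sketched over an in-memory slice instead of a real database. `SourceRow`, `Event`, and `poll` are hypothetical names for illustration and are not the `TimestampCdcConsumer` API. Because a timestamp alone cannot tell a new row from a changed one, this sketch emits a single `Upsert` variant rather than separate insert and update events.

```rust
/// Hypothetical source row; the scheme only needs `updated_at`
/// and, for soft-delete detection, `deleted_at`.
#[derive(Debug, Clone, PartialEq)]
pub struct SourceRow {
    pub pk: i64,
    pub updated_at: i64,
    pub deleted_at: Option<i64>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Upsert(i64),
    Delete(i64),
}

/// In-memory equivalent of
/// `SELECT ... WHERE updated_at > watermark ORDER BY updated_at, pk`.
/// Returns the emitted events and the new watermark.
pub fn poll(source: &[SourceRow], watermark: i64) -> (Vec<Event>, i64) {
    let mut changed: Vec<&SourceRow> = source
        .iter()
        .filter(|r| r.updated_at > watermark)
        .collect();
    changed.sort_by_key(|r| (r.updated_at, r.pk));

    let events: Vec<Event> = changed
        .iter()
        .map(|r| match r.deleted_at {
            Some(_) => Event::Delete(r.pk),
            None => Event::Upsert(r.pk),
        })
        .collect();

    // Advance to the largest timestamp seen; keep the old value if nothing changed.
    let next = changed.last().map_or(watermark, |r| r.updated_at);
    (events, next)
}
```

One design caveat of any strict `updated_at > watermark` comparison: a row committed later with a timestamp equal to the current watermark would be skipped, so the source's timestamp resolution matters.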

Why Two Engines?

SQLite (via rusqlite) excels at low-latency point writes: WAL mode gives sub-millisecond single-row commits, and because the engine is embedded there is no network overhead. DuckDB and DataFusion excel at vectorised, scan-heavy queries over large tables: column-oriented storage, predicate pushdown, and SIMD execution can make aggregates orders of magnitude faster than on a row store.

Rhei lets each engine do what it does best. The SqlParserRouter makes the routing invisible to the caller — one query() call, no manual dispatch.
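
To make the routing rules concrete, here is a deliberately simplified stand-in that decides by keyword matching. It is not the AST-based approach SqlParserRouter uses with sqlparser-rs, and the `QueryTarget` variants and the `route` function are assumptions for illustration only. It mirrors the decision table: writes, DDL, and simple SELECTs go to OLTP; aggregates, JOINs, CTEs, and window functions go to OLAP.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QueryTarget {
    Oltp,
    Olap,
}

/// Keyword-based approximation of the routing rules.
/// A real router inspects the parsed AST instead of the raw text.
pub fn route(sql: &str) -> QueryTarget {
    let upper = sql.trim().to_uppercase();

    // CTEs start with WITH and are treated as analytical.
    if upper.starts_with("WITH ") {
        return QueryTarget::Olap;
    }
    // Writes and DDL: anything that is not a SELECT stays on the OLTP side.
    if !upper.starts_with("SELECT") {
        return QueryTarget::Oltp;
    }

    // Markers for joins, grouping, window functions, and common aggregates.
    const ANALYTICAL: [&str; 9] = [
        " JOIN ", " GROUP BY ", " OVER ", " OVER(", "COUNT(", "SUM(", "AVG(", "MIN(", "MAX(",
    ];
    if ANALYTICAL.iter().any(|kw| upper.contains(*kw)) {
        QueryTarget::Olap
    } else {
        QueryTarget::Oltp
    }
}
```

Text matching like this is easily fooled, for example by a string literal that contains the word JOIN, which is a good reason to route on the AST rather than on keywords.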

Why SQLite for OLTP?

  • Serverless: no separate process, no port, no credentials.
  • WAL mode: readers do not block the writer, and the writer does not block readers.
  • Trigger-based CDC: SQLite's AFTER INSERT/UPDATE/DELETE triggers write compact JSON arrays to _rhei_cdc_log with ~10 % less overhead than json_object() equivalents.
  • forbid(unsafe_code): the async wrapper (rhei-tokio-rusqlite) uses dedicated OS threads + crossbeam channels to make rusqlite::Connection (which is Send but !Sync) safely usable from async code with zero unsafe blocks.
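
The dedicated-thread pattern from the last bullet can be sketched with the standard library alone. This is a minimal illustration, not the rhei-tokio-rusqlite implementation: it uses `std::sync::mpsc` in place of crossbeam channels, a hypothetical `FakeConnection` in place of `rusqlite::Connection`, and a blocking `call` where an async wrapper would await a reply instead.

```rust
use std::cell::RefCell;
use std::sync::mpsc;
use std::thread;

/// Stand-in for a connection that must not be shared across threads.
/// The `RefCell` field makes it `!Sync`.
pub struct FakeConnection {
    rows: RefCell<Vec<String>>,
}

impl FakeConnection {
    /// Appends a row and returns the new row count.
    pub fn insert(&self, row: &str) -> usize {
        self.rows.borrow_mut().push(row.to_string());
        self.rows.borrow().len()
    }
}

type Job = Box<dyn FnOnce(&FakeConnection) + Send>;

/// Cloneable handle that can be used from any thread.
#[derive(Clone)]
pub struct ConnectionHandle {
    jobs: mpsc::Sender<Job>,
}

impl ConnectionHandle {
    /// Spawns the dedicated OS thread that owns the connection.
    pub fn open() -> Self {
        let (jobs, inbox) = mpsc::channel::<Job>();
        thread::spawn(move || {
            let conn = FakeConnection { rows: RefCell::new(Vec::new()) };
            // The loop ends once every handle has been dropped.
            for job in inbox {
                job(&conn);
            }
        });
        ConnectionHandle { jobs }
    }

    /// Runs `f` on the connection thread and blocks until it returns.
    pub fn call<T, F>(&self, f: F) -> T
    where
        T: Send + 'static,
        F: FnOnce(&FakeConnection) -> T + Send + 'static,
    {
        let (reply, result) = mpsc::channel();
        let job: Job = Box::new(move |conn: &FakeConnection| {
            // Ignore the error if the caller stopped waiting.
            let _ = reply.send(f(conn));
        });
        self.jobs.send(job).expect("connection thread has exited");
        result.recv().expect("connection thread dropped the reply")
    }
}
```

A caller would write something like `handle.call(|c| c.insert("a"))`. The connection never leaves its thread; only closures and results cross the channel, which is why no `unsafe` is needed.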

Why DuckDB or DataFusion for OLAP?

DuckDB brings a battle-tested columnar SQL engine with ACID transaction support, native Arrow Appender bulk-loading (zero-copy), and MVCC-based concurrent reads. All operations run on blocking threads via spawn_blocking.

DataFusion is a pure-Rust, natively async engine with pluggable storage backends: InMemory (volatile) or unified Vortex columnar storage (local filesystem or S3-compatible object store, auto-classified from the URL). It integrates naturally with Tokio and streams results as a RecordBatchBoxStream without buffering the entire result set. It is the default because it has zero C/C++ dependencies.

See OLAP Backends for a detailed comparison and guidance on which to pick.

How CDC Works

  1. When register_table is called, Rhei installs three SQLite triggers on the target table — AFTER INSERT, AFTER UPDATE, and AFTER DELETE.
  2. Each trigger appends one row to _rhei_cdc_log: a monotonically increasing sequence number, a Unix timestamp, the operation code (I/U/D), the table name, and a JSON array snapshot of the row data.
  3. CdcSyncEngine polls _rhei_cdc_log using a stored watermark (last_synced_seq). On each cycle it fetches up to sync_batch_size events, converts them to OLAP DML, and advances the watermark.
  4. Consecutive INSERTs to the same table are batched into a single Arrow RecordBatch and loaded via OlapEngine::load_arrow() — an Arrow-native path that avoids SQL string construction entirely (5–7× faster than individual INSERT statements for bulk workloads).
  5. Processed events are pruned from _rhei_cdc_log after each successful cycle (prune_after_sync = true by default) to keep the log table small.
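
Steps 3 to 5 can be sketched as one function over an in-memory log. `LogRow`, `Action`, and `sync_cycle` are hypothetical names, and the log is assumed to be ordered by sequence number. The sketch only plans the OLAP work and assumes applying it cannot fail, whereas a real engine would advance the watermark and prune only after the load succeeds.

```rust
/// Hypothetical in-memory model of one `_rhei_cdc_log` row.
#[derive(Debug, Clone, PartialEq)]
pub struct LogRow {
    pub seq: u64,
    pub op: char, // 'I', 'U' or 'D'
    pub table: String,
    pub row_json: String,
}

/// A unit of work for the OLAP side.
#[derive(Debug, PartialEq)]
pub enum Action {
    /// Consecutive inserts into one table, loaded as a single batch.
    BulkInsert { table: String, rows: Vec<String> },
    /// Updates and deletes are applied one by one.
    Single { op: char, table: String, row: String },
}

/// One sync cycle: read past the watermark, plan the work, advance, prune.
pub fn sync_cycle(
    log: &mut Vec<LogRow>,
    last_synced_seq: &mut u64,
    batch_size: usize,
) -> Vec<Action> {
    let watermark = *last_synced_seq;
    let batch: Vec<LogRow> = log
        .iter()
        .filter(|e| e.seq > watermark)
        .take(batch_size)
        .cloned()
        .collect();

    let mut actions: Vec<Action> = Vec::new();
    for e in &batch {
        if e.op == 'I' {
            // Extend the previous action if it is an insert batch for the same table.
            if let Some(Action::BulkInsert { table, rows }) = actions.last_mut() {
                if *table == e.table {
                    rows.push(e.row_json.clone());
                    continue;
                }
            }
            actions.push(Action::BulkInsert {
                table: e.table.clone(),
                rows: vec![e.row_json.clone()],
            });
        } else {
            actions.push(Action::Single {
                op: e.op,
                table: e.table.clone(),
                row: e.row_json.clone(),
            });
        }
    }

    if let Some(last) = batch.last() {
        // Advance the watermark, then prune everything at or below it.
        let new_watermark = last.seq;
        *last_synced_seq = new_watermark;
        log.retain(|e| e.seq > new_watermark);
    }
    actions
}
```

Only runs of consecutive inserts into the same table are merged. An update or delete between two inserts starts a new batch, which preserves the original event order.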

For crash-durable CDC event buffering, set rocksdb_cdc_path — SQLite triggers still fire into _rhei_cdc_log, but a bridge drains them atomically into RocksDB before the sync engine reads them. See RocksDB CDC for details.

Key Crates

| Crate | Role |
| --- | --- |
| rhei | HtapEngine facade, HtapConfig, integration tests |
| rhei-core | Shared traits and types (CdcEvent, SyncMode, QueryTarget, …) |
| rhei-tokio-rusqlite | Async SQLite wrapper (dedicated thread per connection) |
| rhei-oltp-rusqlite | RusqliteEngine + CDC trigger setup |
| rhei-sync | CdcSyncEngine, SqlParserRouter, CDC-to-DML converter |
| rhei-datafusion | DataFusion OLAP engine with pluggable StorageMode |
| rhei-duckdb | DuckDB OLAP engine with Arrow Appender bulk-load |
| rhei-sidecar | TimestampCdcConsumer, watermark store, SourceConnector trait |
| rhei-flight | Arrow Flight SQL gRPC server (flight-sql feature) |
| rhei-tui | rh CLI + TUI dashboard |

Released under the Apache-2.0 License.