Architecture
Rhei is a fully in-process HTAP (Hybrid Transactional/Analytical Processing) engine that embeds a low-latency SQLite write path alongside a high-throughput analytical engine (DuckDB or Apache DataFusion), bridged by trigger-based Change Data Capture so that analytical queries always see a near-real-time view of your OLTP data — without a separate server, ETL job, or network hop.
Standard HTAP Mode
Client
│
▼
HtapEngine (facade — crates/rhei/)
│
├─ SqlParserRouter ──────────────────► AST-based routing (sqlparser-rs)
│ ├── writes / DDL / simple SELECT → OLTP
│ └── aggregates / JOIN / CTE / window → OLAP
│
├─ RusqliteEngine (OLTP)
│ ├── write connection (WAL mode, dedicated OS thread)
│ ├── read pool (round-robin, 4 connections by default)
│ └── CDC connection ──► _rhei_cdc_log (trigger-populated)
│
├─ OlapBackend (enum — crates/rhei-olap/)
│ ├── DuckDbEngine (feature: duckdb-backend)
│ └── DataFusionEngine (feature: datafusion-backend, default)
│
└─ CdcSyncEngine (crates/rhei-sync/)
├── polls _rhei_cdc_log (seq watermark)
├── SyncMode::Destructive — mirror (UPDATE/DELETE in-place)
├── SyncMode::Temporal — SCD Type 2 (append-only history)
├── bulk INSERT via Arrow batch → OlapEngine::load_arrow()
└── background loop with CancellationToken shutdown
Sidecar Mode
In sidecar mode the local OLTP write path is optional. Instead, a TimestampCdcConsumer polls an external database (SQLite or PostgreSQL) for rows whose updated_at timestamp exceeds the last recorded watermark:
External DB (SQLite / PostgreSQL)
│ SELECT … WHERE updated_at > watermark ORDER BY updated_at, pk
▼
TimestampCdcConsumer<S: SourceConnector> (crates/rhei-sidecar/)
│ emits CdcEvent stream (Insert / Update / Delete)
│ watermark persisted in RocksDB (optional)
▼
CdcSyncEngine (temporal or destructive)
│ same DML pipeline as standard mode
▼
OlapBackend (DuckDB / DataFusion)
No schema changes are required on the source database. The only requirement is that each tracked table has an updated_at column (and optionally a deleted_at column for soft-delete detection).
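The polling rule above can be pictured with a small in-memory sketch. It is not the TimestampCdcConsumer implementation: `SourceRow`, `Change`, and `poll` are hypothetical names, the table is a slice rather than a real database, and inserts and updates are collapsed into a single `Upsert` variant for brevity.

```rust
/// One source row as the poller sees it. The field names mirror the columns
/// mentioned in the text; the struct itself is hypothetical.
#[derive(Debug, Clone, PartialEq)]
pub struct SourceRow {
    pub pk: i64,
    pub updated_at: i64,         // assumed to be Unix seconds
    pub deleted_at: Option<i64>, // Some(_) marks a soft delete
}

/// Simplified change event (assumption: insert and update are merged).
#[derive(Debug, PartialEq)]
pub enum Change {
    Upsert(i64), // primary key
    Delete(i64), // primary key
}

/// Emulates `SELECT … WHERE updated_at > watermark ORDER BY updated_at, pk`
/// over an in-memory table and returns the changes plus the new watermark.
pub fn poll(table: &[SourceRow], watermark: i64) -> (Vec<Change>, i64) {
    // Keep only rows modified after the last recorded watermark.
    let mut rows: Vec<&SourceRow> = table
        .iter()
        .filter(|r| r.updated_at > watermark)
        .collect();
    // Deterministic order: timestamp first, primary key as tie-breaker.
    rows.sort_by_key(|r| (r.updated_at, r.pk));

    // The watermark only moves forward when something was fetched.
    let new_watermark = rows.last().map_or(watermark, |r| r.updated_at);

    let changes = rows
        .iter()
        .map(|r| match r.deleted_at {
            Some(_) => Change::Delete(r.pk),
            None => Change::Upsert(r.pk),
        })
        .collect();
    (changes, new_watermark)
}
```

Calling `poll(&table, watermark)` repeatedly with the returned watermark yields each change once. A general caveat of strict `>` comparisons on timestamps is that a row committed later with a timestamp equal to the current watermark would be skipped, which is one reason to keep the timestamp resolution fine-grained.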
Why Two Engines?
SQLite (via rusqlite) excels at low-latency point writes: WAL mode gives sub-millisecond single-row commits, and because the engine is embedded there is no network overhead. DuckDB and DataFusion excel at vectorised, scan-heavy queries over large tables: column-oriented storage, predicate pushdown, and SIMD execution can make aggregates orders of magnitude faster than they are on a row store.
Rhei lets each engine do what it does best. The SqlParserRouter makes the routing invisible to the caller — one query() call, no manual dispatch.
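To make the routing rule concrete, here is a deliberately naive sketch. Rhei routes on the sqlparser-rs AST; the code below swaps that for a plain keyword scan so it stays dependency-free, which means it will misclassify a column that happens to be named `count` or a keyword inside a string literal. `QueryTarget` is listed among the rhei-core types, but the variant names and the `route` function are assumptions made for illustration.

```rust
/// Where a statement should run. Variant names are assumed, not Rhei's own.
#[derive(Debug, PartialEq, Eq)]
pub enum QueryTarget {
    Oltp,
    Olap,
}

/// Keyword-based stand-in for AST routing (hypothetical helper).
pub fn route(sql: &str) -> QueryTarget {
    // Split into upper-cased word tokens; punctuation acts as a separator.
    let upper = sql.to_uppercase();
    let tokens: Vec<&str> = upper
        .split(|c: char| !(c.is_alphanumeric() || c == '_'))
        .filter(|t| !t.is_empty())
        .collect();

    let first = tokens.first().copied().unwrap_or("");
    // Writes and DDL do not start with SELECT or WITH: keep them on OLTP.
    if first != "SELECT" && first != "WITH" {
        return QueryTarget::Oltp;
    }
    // A leading WITH introduces a CTE, which the text routes to OLAP.
    if first == "WITH" {
        return QueryTarget::Olap;
    }
    // Any join, grouping, window, or aggregate keyword marks an analytical query.
    let analytical = tokens.iter().any(|t| {
        matches!(
            *t,
            "JOIN" | "GROUP" | "OVER" | "COUNT" | "SUM" | "AVG" | "MIN" | "MAX"
        )
    });
    if analytical {
        QueryTarget::Olap
    } else {
        QueryTarget::Oltp
    }
}
```

With this sketch, `route("SELECT * FROM t WHERE id = 1")` stays on the OLTP side while `route("SELECT region, SUM(amount) FROM orders GROUP BY region")` goes to OLAP. Working on the parsed AST instead of tokens is what removes the false positives noted above.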
Why SQLite for OLTP?
- Serverless: no separate process, no port, no credentials.
- WAL mode: concurrent reads never block writes.
- Trigger-based CDC: SQLite's AFTER INSERT/UPDATE/DELETE triggers write compact JSON arrays to _rhei_cdc_log with ~10 % less overhead than json_object() equivalents.
- forbid(unsafe_code): the async wrapper (rhei-tokio-rusqlite) uses dedicated OS threads and crossbeam channels to make rusqlite::Connection (which is !Sync) safely usable from async code with zero unsafe blocks.
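The dedicated-thread pattern in the last bullet can be sketched with the standard library alone. This is not the rhei-tokio-rusqlite code: it swaps crossbeam for `std::sync::mpsc`, blocks on the reply instead of awaiting it, and uses a hypothetical `FakeConn` in place of a real SQLite connection. All type and method names here are assumptions.

```rust
use std::cell::RefCell;
use std::sync::mpsc;
use std::thread;

/// Stand-in for a connection that must stay on one thread (hypothetical).
pub struct FakeConn {
    rows: RefCell<Vec<String>>, // RefCell makes this type !Sync
}

impl FakeConn {
    pub fn insert(&self, value: &str) {
        self.rows.borrow_mut().push(value.to_string());
    }
    pub fn count(&self) -> usize {
        self.rows.borrow().len()
    }
}

/// A unit of work shipped to the thread that owns the connection.
type Job = Box<dyn FnOnce(&FakeConn) + Send + 'static>;

/// Cloneable handle; the connection itself never leaves its worker thread.
#[derive(Clone)]
pub struct ConnHandle {
    tx: mpsc::Sender<Job>,
}

impl ConnHandle {
    pub fn spawn() -> Self {
        let (tx, rx) = mpsc::channel::<Job>();
        thread::spawn(move || {
            // The connection is created on, and owned by, this thread.
            let conn = FakeConn { rows: RefCell::new(Vec::new()) };
            // The loop ends once every sender has been dropped.
            for job in rx {
                job(&conn);
            }
        });
        ConnHandle { tx }
    }

    /// Run `f` against the connection and block until its result comes back.
    pub fn call<R, F>(&self, f: F) -> R
    where
        F: FnOnce(&FakeConn) -> R + Send + 'static,
        R: Send + 'static,
    {
        let (reply_tx, reply_rx) = mpsc::channel();
        let job: Job = Box::new(move |conn: &FakeConn| {
            // Ignore the error case where the caller stopped waiting.
            let _ = reply_tx.send(f(conn));
        });
        self.tx.send(job).expect("worker thread has stopped");
        reply_rx.recv().expect("worker dropped the reply channel")
    }
}
```

Because only closures and results cross the channel, several threads can hold a `ConnHandle` and call `handle.call(|c| c.count())` without ever sharing the connection itself. An async wrapper would typically replace the blocking `recv` with an awaitable reply channel.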
Why DuckDB or DataFusion for OLAP?
DuckDB brings a battle-tested columnar SQL engine with ACID transaction support, native Arrow Appender bulk-loading (zero-copy), and MVCC-based concurrent reads. All operations run on blocking threads via spawn_blocking.
DataFusion is a pure-Rust natively async engine with pluggable storage backends — InMemory (volatile) or unified Vortex columnar storage (local filesystem or S3-compatible object store, auto-classified from the URL). It integrates naturally with Tokio and streams results without buffering the entire result set (RecordBatchBoxStream). It is the default because it has zero C/C++ dependencies.
See OLAP Backends for a detailed comparison and guidance on which to pick.
How CDC Works
- When register_table is called, Rhei installs three SQLite triggers on the target table: AFTER INSERT, AFTER UPDATE, and AFTER DELETE.
- Each trigger appends one row to _rhei_cdc_log: a monotonically increasing sequence number, a Unix timestamp, the operation code (I/U/D), the table name, and a JSON array snapshot of the row data.
- CdcSyncEngine polls _rhei_cdc_log using a stored watermark (last_synced_seq). On each cycle it fetches up to sync_batch_size events, converts them to OLAP DML, and advances the watermark.
- Consecutive INSERTs to the same table are batched into a single Arrow RecordBatch and loaded via OlapEngine::load_arrow(), an Arrow-native path that avoids SQL string construction entirely (5–7× faster than individual INSERT statements for bulk workloads).
- Processed events are pruned from _rhei_cdc_log after each successful cycle (prune_after_sync = true by default) to keep the log table small.
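The polling, batching, and pruning steps above can be modelled over an in-memory log. This is only a sketch under stated assumptions: the `CdcEvent` fields, the `Step` enum, and `sync_cycle` are hypothetical, the log is assumed to be ordered by sequence number, and every cycle is assumed to succeed. Rows stay as JSON strings here rather than being converted to an Arrow RecordBatch.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Op {
    Insert,
    Update,
    Delete,
}

/// One row of the CDC log (field names are assumptions).
#[derive(Debug, Clone, PartialEq)]
pub struct CdcEvent {
    pub seq: u64,
    pub op: Op,
    pub table: String,
    pub row_json: String,
}

/// What the sync engine would hand to the OLAP side.
#[derive(Debug, PartialEq)]
pub enum Step {
    /// Consecutive inserts to one table, to be loaded as a single batch.
    BulkInsert { table: String, rows: Vec<String> },
    /// An update or delete, applied on its own.
    Single(CdcEvent),
}

/// One cycle: fetch up to `batch_size` events past the watermark, group
/// consecutive inserts per table, advance the watermark, prune the log.
pub fn sync_cycle(
    log: &mut Vec<CdcEvent>,
    watermark: u64,
    batch_size: usize,
) -> (Vec<Step>, u64) {
    let batch: Vec<CdcEvent> = log
        .iter()
        .filter(|e| e.seq > watermark)
        .take(batch_size)
        .cloned()
        .collect();

    let mut steps: Vec<Step> = Vec::new();
    for ev in &batch {
        if ev.op == Op::Insert {
            // Extend the previous step if it is a bulk insert into the same table.
            if let Some(Step::BulkInsert { table, rows }) = steps.last_mut() {
                if *table == ev.table {
                    rows.push(ev.row_json.clone());
                    continue;
                }
            }
            steps.push(Step::BulkInsert {
                table: ev.table.clone(),
                rows: vec![ev.row_json.clone()],
            });
        } else {
            steps.push(Step::Single(ev.clone()));
        }
    }

    // Advance to the last processed sequence number, then drop processed events.
    let new_watermark = batch.last().map_or(watermark, |e| e.seq);
    log.retain(|e| e.seq > new_watermark);
    (steps, new_watermark)
}
```

Grouping only consecutive inserts keeps the original event order intact: an update that sits between two inserts splits them into separate batches, so the OLAP side never sees a row modified before it exists.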
For crash-durable CDC event buffering, set rocksdb_cdc_path — SQLite triggers still fire into _rhei_cdc_log, but a bridge drains them atomically into RocksDB before the sync engine reads them. See RocksDB CDC for details.
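For readers who want to picture the trigger step in this section, the sketch below generates plausible DDL for the three triggers. It is an illustration only: the helper `cdc_trigger_sql`, the trigger names, and the _rhei_cdc_log column names (`ts`, `op`, `table_name`, `row_data`) are all assumptions, and the sequence number is assumed to be an auto-incrementing key filled in by SQLite. The SQLite functions used, `json_array()` and `strftime('%s','now')`, are standard.

```rust
/// Build the three CDC triggers for `table` (hypothetical helper; the log
/// column names and trigger names are assumed, not taken from Rhei).
pub fn cdc_trigger_sql(table: &str, columns: &[&str]) -> Vec<String> {
    // (event, op code, row alias): DELETE has no NEW row, so it snapshots OLD.
    let specs = [
        ("INSERT", "I", "NEW"),
        ("UPDATE", "U", "NEW"),
        ("DELETE", "D", "OLD"),
    ];
    specs
        .iter()
        .map(|(event, op, alias)| {
            // One positional JSON array element per column, e.g. NEW."id".
            let values: Vec<String> = columns
                .iter()
                .map(|c| format!("{alias}.\"{c}\""))
                .collect();
            format!(
                "CREATE TRIGGER IF NOT EXISTS \"_rhei_cdc_{table}_{op}\" \
                 AFTER {event} ON \"{table}\" \
                 BEGIN \
                 INSERT INTO _rhei_cdc_log (ts, op, table_name, row_data) \
                 VALUES (strftime('%s','now'), '{op}', '{table}', json_array({vals})); \
                 END;",
                vals = values.join(", "),
            )
        })
        .collect()
}
```

For example, `cdc_trigger_sql("orders", &["id", "amount"])` returns three statements, one per operation. A positional array omits the column names that json_object() would repeat in every logged row, which is consistent with the lower overhead the text attributes to JSON arrays.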
Key Crates
| Crate | Role |
|---|---|
| rhei | HtapEngine facade, HtapConfig, integration tests |
| rhei-core | Shared traits and types (CdcEvent, SyncMode, QueryTarget, …) |
| rhei-tokio-rusqlite | Async SQLite wrapper (dedicated thread per connection) |
| rhei-oltp-rusqlite | RusqliteEngine + CDC trigger setup |
| rhei-sync | CdcSyncEngine, SqlParserRouter, CDC-to-DML converter |
| rhei-datafusion | DataFusion OLAP engine with pluggable StorageMode |
| rhei-duckdb | DuckDB OLAP engine with Arrow Appender bulk-load |
| rhei-sidecar | TimestampCdcConsumer, watermark store, SourceConnector trait |
| rhei-flight | Arrow Flight SQL gRPC server (flight-sql feature) |
| rhei-tui | rh CLI + TUI dashboard |
