Vortex storage
Rhei's DataFusion backend can store OLAP data using Vortex, a modern columnar format with native local-filesystem and S3-compatible object-store support. A single Vortex { url } storage mode replaces the four file-format modes that existed in v1.5.x (ArrowIpc, Parquet, S3Parquet, GcsParquet).
Feature flag
The cloud-storage feature gates the S3-compatible Vortex codepath. Local Vortex storage (file-URL or relative path) works without any feature flags.
Enable S3-compatible Vortex storage in your Cargo.toml:
[dependencies]
rhei = { version = "2", features = ["cloud-storage"] }
Or, when building from a checkout of the workspace:
cargo build --features cloud-storage
DataFusion only
Vortex storage is available exclusively with the DataFusion OLAP backend. DuckDB uses its own native .duckdb file storage and is unaffected by this setting.
Storage modes
DataFusion supports two StorageMode variants in v2.0:
| Mode | TOML value | Description |
|---|---|---|
| InMemory | "in_memory" / "inmemory" / "memory" | Default. Data is lost on shutdown. No url needed. |
| Vortex { url } | "vortex" | Unified Vortex storage. URL auto-classifies as local or S3-compatible. Requires url. |
URL auto-classification
The url field in the Vortex mode is inspected at startup to determine whether local or S3-compatible storage is used:
| URL pattern | Classification | Feature required |
|---|---|---|
| ./path/to/dir | Local filesystem (relative) | None |
| /absolute/path | Local filesystem (absolute) | None |
| file:///absolute/path | Local filesystem (file URI) | None |
| s3://bucket/prefix | S3-compatible object store | cloud-storage |
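The classification table can be sketched in a few lines of Rust. This is a minimal illustration, not the code in crates/rhei-datafusion: the names StorageKind, classify_url, and requires_cloud_storage are hypothetical, and rejecting unknown schemes is an assumption about how unsupported URLs (such as gs://) are handled.

```rust
/// Hypothetical result of inspecting a Vortex storage URL.
#[derive(Debug, PartialEq, Eq)]
enum StorageKind {
    Local,
    S3Compatible,
}

/// Classify a URL following the table above.
/// Assumption: any scheme other than `s3://` or `file://` is rejected.
fn classify_url(url: &str) -> Result<StorageKind, String> {
    if url.starts_with("s3://") {
        Ok(StorageKind::S3Compatible)
    } else if url.starts_with("file://") || !url.contains("://") {
        // `file:///abs/path`, `/abs/path`, and `./rel/path` are all local.
        Ok(StorageKind::Local)
    } else {
        Err(format!("unsupported storage URL scheme: {url:?}"))
    }
}

/// Only the S3-compatible codepath is gated behind the `cloud-storage` feature.
fn requires_cloud_storage(kind: &StorageKind) -> bool {
    matches!(kind, StorageKind::S3Compatible)
}
```

Under these assumptions, classify_url("./path/to/dir") yields StorageKind::Local, while classify_url("s3://bucket/prefix") yields StorageKind::S3Compatible, the only kind for which requires_cloud_storage returns true.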
Examples
The TOML field name is mode (not storage_mode). Use path for local storage and url for S3-compatible storage. Both populate the same internal Vortex { url } field, but the parser keeps them separate so that a misconfiguration that would otherwise create a directory named s3:/ fails fast.
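The fail-fast behaviour might look roughly like the following sketch. It is not the actual parse_storage_mode implementation: StorageToml and resolve_vortex_url are hypothetical names, and the exact rules (rejecting both fields at once, rejecting any scheme in path) are assumptions chosen to illustrate the idea.

```rust
/// Hypothetical mirror of the `[engine.datafusion_storage]` table after TOML parsing.
struct StorageToml {
    mode: String,
    path: Option<String>,
    url: Option<String>,
}

/// Collapse `path` / `url` into the single URL carried by the Vortex mode.
/// Returns `Ok(None)` for the in-memory mode, which needs no URL.
fn resolve_vortex_url(cfg: &StorageToml) -> Result<Option<String>, String> {
    match cfg.mode.as_str() {
        "in_memory" | "inmemory" | "memory" => Ok(None),
        "vortex" => match (&cfg.path, &cfg.url) {
            (Some(_), Some(_)) => Err("set either `path` or `url`, not both".into()),
            (None, None) => Err("mode \"vortex\" needs `path` or `url`".into()),
            // Fail fast instead of creating a local directory named `s3:/`.
            (Some(p), None) if p.contains("://") => {
                Err(format!("`path` must be a filesystem path, got {p:?}"))
            }
            (Some(p), None) => Ok(Some(p.clone())),
            (None, Some(u)) => Ok(Some(u.clone())),
        },
        other => Err(format!("unknown storage mode {other:?}")),
    }
}
```

With this shape, path = "s3://my-bucket/prefix" is rejected at configuration time rather than being treated as a relative directory.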
# Local, relative to working directory
[engine.datafusion_storage]
mode = "vortex"
path = "./rhei-data"
# Local, absolute path
[engine.datafusion_storage]
mode = "vortex"
path = "/var/lib/rhei/olap"
# S3-compatible (requires the cloud-storage feature)
[engine.datafusion_storage]
mode = "vortex"
url = "s3://my-bucket/prefix"
S3-compatible backend
Supported services
The S3-compatible path works with any service that speaks the S3 API, including:
- Amazon S3
- MinIO
- Cloudflare R2
- Wasabi
- Ceph RGW
Required environment variables
export AWS_ACCESS_KEY_ID="AKIA..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_REGION="us-east-1"
# For S3-compatible services (MinIO, R2, Wasabi, Ceph), override the endpoint:
export AWS_ENDPOINT_URL="https://my-minio.example.com"
For AWS S3 itself, AWS_ENDPOINT_URL is not required. For IAM role / instance-profile authentication, no static credentials are needed; the standard AWS credential chain applies.
URL format
s3://<bucket>/<prefix>
Example: s3://my-analytics-bucket/rhei-data
Per-table subdirectories are created beneath the prefix automatically.
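To make the URL format concrete, here is a small sketch that splits an s3:// URL into bucket and prefix and derives a per-table key prefix. The function names are hypothetical, and the `<prefix>/<table>/` layout is an assumption; the source only states that per-table subdirectories are created beneath the prefix.

```rust
/// Split "s3://<bucket>/<prefix>" into (bucket, prefix). The prefix may be empty.
fn split_s3_url(url: &str) -> Option<(&str, &str)> {
    let rest = url.strip_prefix("s3://")?;
    let (bucket, prefix) = rest.split_once('/').unwrap_or((rest, ""));
    if bucket.is_empty() {
        return None;
    }
    // Ignore leading/trailing slashes so "prefix/" and "prefix" behave the same.
    Some((bucket, prefix.trim_matches('/')))
}

/// Object-key prefix under which one table's files would live.
/// Assumed layout: `<prefix>/<table>/`.
fn table_key_prefix(url: &str, table: &str) -> Option<String> {
    let (_bucket, prefix) = split_s3_url(url)?;
    if prefix.is_empty() {
        Some(format!("{table}/"))
    } else {
        Some(format!("{prefix}/{table}/"))
    }
}
```

For the example URL above and a table named orders, this sketch produces the key prefix rhei-data/orders/ inside the bucket my-analytics-bucket.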
GCS support dropped in v2.0
Google Cloud Storage (gs:// URLs) is no longer supported directly. GCS users have two options:
- S3-compatibility layer: run MinIO or another S3-compatible proxy in front of GCS (e.g. via gcsfuse + a local MinIO instance), then point url at the S3 endpoint with AWS_ENDPOINT_URL.
- Stay on v1.5.x: the v1.5.x gcs-parquet mode continues to work on the v1.5.x branch.
First-class GCS support may return in a future v2.x release.
Migration from v1.5.x
There is no automatic data conversion from ArrowIpc / Parquet / S3Parquet / GcsParquet to Vortex. The upgrade path is:
- Drop the existing OLAP directory (or S3 prefix).
- Re-sync from the OLTP source (engine.initial_sync_all().await? or rh sync).
A rh convert subcommand may ship in a future release to automate this step.
Read and write characteristics
- Reads: DataFusion's query engine reads Vortex data natively via ListingTable, with column pruning and predicate pushdown applied where the format supports it.
- Bulk inserts (load_arrow / CDC sync): a new Vortex file is appended under the table path/prefix.
- UPDATE / DELETE: each statement requires a read-modify-write cycle in which the full table is read, mutated in memory, and rewritten as a consolidated file. Not suitable for high-frequency point updates, especially over S3 where network round-trips dominate.
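The cost difference between the two write paths can be illustrated with a toy in-memory model. Nothing here touches Vortex or DataFusion: TableDir and its methods are hypothetical stand-ins, with each inner Vec playing the role of one immutable file under the table path.

```rust
/// One immutable file's worth of rows: (id, status).
type VortexFile = Vec<(i64, String)>;

/// Toy stand-in for a table directory: an ordered list of immutable files.
#[derive(Default)]
struct TableDir {
    files: Vec<VortexFile>,
}

impl TableDir {
    /// Bulk insert: the batch becomes a new file; existing files are untouched.
    fn append(&mut self, batch: VortexFile) {
        self.files.push(batch);
    }

    /// DELETE: read every file, filter in memory, rewrite one consolidated file.
    /// Returns the number of rows that had to be read to do so.
    fn delete_where(&mut self, pred: impl Fn(&(i64, String)) -> bool) -> usize {
        let all: Vec<(i64, String)> = self.files.drain(..).flatten().collect();
        let rows_read = all.len();
        let kept: VortexFile = all.into_iter().filter(|row| !pred(row)).collect();
        self.files.push(kept);
        rows_read
    }
}
```

In this model, appending costs work proportional to the batch, while deleting a single row reads and rewrites every row in the table, which is why point updates are a poor fit, particularly when each read and write is a network round-trip.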
Object-store latency
Network round-trips dominate query time for cloud modes. For single-host deployments with a small dataset, local Vortex storage will be significantly faster than S3-compatible storage. Choose S3-compatible when the dataset size exceeds local disk, or when the OLAP mirror must be shared across hosts.
TOML configuration
Local Vortex
[engine]
oltp_path = "app.db"
olap_backend = "datafusion"
sync_mode = "destructive"
[engine.datafusion_storage]
mode = "vortex"
path = "./rhei-data"
S3-compatible Vortex
[engine]
oltp_path = "app.db"
olap_backend = "datafusion"
sync_mode = "temporal"
[engine.datafusion_storage]
mode = "vortex"
url = "s3://my-analytics-bucket/rhei-data"
Full S3 analytics example
[engine]
oltp_path = "app.db"
olap_backend = "datafusion"
sync_mode = "temporal"
sync_interval_ms = 1000
sync_batch_size = 2000
read_pool_size = 4
log_format = "json"
[engine.datafusion_storage]
mode = "vortex"
url = "s3://acme-analytics/prod/rhei"
[[tables]]
name = "orders"
ddl = "CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, amount REAL, status TEXT, created_at INTEGER)"
[[tables.columns]]
name = "id"
type = "int64"
nullable = false
primary_key = true
[[tables.columns]]
name = "amount"
type = "float64"
nullable = true
[[tables.columns]]
name = "status"
type = "utf8"
nullable = true
[[tables.columns]]
name = "created_at"
type = "int64"
nullable = true
Run with appropriate credentials in the environment:
AWS_ACCESS_KEY_ID=... AWS_SECRET_ACCESS_KEY=... AWS_REGION=us-east-1 \
rh serve --config htap.toml
For MinIO or other S3-compatible services, add AWS_ENDPOINT_URL:
AWS_ACCESS_KEY_ID=... AWS_SECRET_ACCESS_KEY=... \
AWS_REGION=us-east-1 AWS_ENDPOINT_URL=https://my-minio.example.com \
rh serve --config htap.toml
Combining with temporal mode
Vortex storage works with both sync_mode = "destructive" and sync_mode = "temporal". Temporal mode on S3-compatible storage is a natural fit for long-retention audit history: Vortex files accumulate append-only, and point-in-time queries work identically to local storage (see Temporal mode).
Source
- crates/rhei-datafusion/src/storage.rs: StorageMode enum definition with full doc-comments on read/write characteristics.
- crates/rhei-tui/src/config.rs: DataFusionStorageToml and parse_storage_mode (TOML mode string parsing).
