Vortex storage

Rhei's DataFusion backend can store OLAP data using Vortex, a modern columnar format with native local-filesystem and S3-compatible object-store support. A single Vortex { url } storage mode replaces the four file-format modes that existed in v1.5.x (ArrowIpc, Parquet, S3Parquet, GcsParquet).

Feature flag

The cloud-storage feature gates the S3-compatible Vortex code path. Local Vortex storage (a relative path, an absolute path, or a file:// URL) works without any feature flags.

Enable S3-compatible Vortex storage in your Cargo.toml:

toml
[dependencies]
rhei = { version = "2", features = ["cloud-storage"] }

Or, when building from a checkout of the workspace:

sh
cargo build --features cloud-storage

DataFusion only

Vortex storage is available exclusively with the DataFusion OLAP backend. DuckDB uses its own native .duckdb file storage and is unaffected by this setting.

Storage modes

DataFusion supports two StorageMode variants in v2.0:

| Mode | TOML value | Description |
| --- | --- | --- |
| InMemory | "in_memory" / "inmemory" / "memory" | Default. Data is lost on shutdown. No url needed. |
| Vortex { url } | "vortex" | Unified Vortex storage. URL auto-classifies as local or S3-compatible. Requires url. |

URL auto-classification

The url field in the Vortex mode is inspected at startup to determine whether local or S3-compatible storage is used:

| URL pattern | Classification | Feature required |
| --- | --- | --- |
| ./path/to/dir | Local filesystem (relative) | None |
| /absolute/path | Local filesystem (absolute) | None |
| file:///absolute/path | Local filesystem (file URI) | None |
| s3://bucket/prefix | S3-compatible object store | cloud-storage |
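
As a rough illustration of the table above, the following sketch classifies a URL string by its prefix. It is not Rhei's actual implementation: `StorageTarget` and `classify_url` are hypothetical names, and rejecting schemes that the table does not list (such as gs://) is an assumption about how unsupported URLs might be handled.

```rust
/// Where a Vortex storage URL points (hypothetical type for illustration).
#[derive(Debug, PartialEq, Eq)]
pub enum StorageTarget {
    Local,
    S3Compatible,
}

/// Classify a storage URL by prefix, following the table above.
pub fn classify_url(url: &str) -> Result<StorageTarget, String> {
    if url.starts_with("s3://") {
        // Needs the cloud-storage feature at build time.
        return Ok(StorageTarget::S3Compatible);
    }
    if url.starts_with("file://") {
        return Ok(StorageTarget::Local);
    }
    // Assumption: any other scheme is rejected rather than treated as a path.
    if url.contains("://") {
        return Err(format!("unsupported URL scheme in {url:?}"));
    }
    // Relative and absolute filesystem paths carry no scheme.
    Ok(StorageTarget::Local)
}
```

Checking the s3:// prefix first keeps the feature-gated case explicit, and everything without a scheme falls through to the local filesystem.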

Examples

The TOML field name is mode (not storage_mode). Use path for local storage and url for S3-compatible storage. Both populate the same internal Vortex { url } field, but the parser keeps them separate so that a misconfiguration (for example, an s3:// URL supplied where a local path is expected, which would otherwise create a directory named s3:/) fails fast.

toml
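# The three tables below are alternatives. Keep exactly one per config file,
# because TOML does not allow the same table to be defined twice.

# Local, relative to working directory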
[engine.datafusion_storage]
mode = "vortex"
path = "./rhei-data"

# Local, absolute path
[engine.datafusion_storage]
mode = "vortex"
path = "/var/lib/rhei/olap"

# S3-compatible (requires the cloud-storage feature)
[engine.datafusion_storage]
mode = "vortex"
url = "s3://my-bucket/prefix"
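
The fail-fast behavior described above could look roughly like the sketch below. This is an assumption-laden illustration, not the code in crates/rhei-tui/src/config.rs: `resolve_storage` is a hypothetical function, the error messages are invented, and accepting a file:// value in `path` is a guess.

```rust
/// Mirrors the two v2.0 variants named on this page; the real enum may differ.
#[derive(Debug, PartialEq, Eq)]
pub enum StorageMode {
    InMemory,
    Vortex { url: String },
}

/// Hypothetical validation of the `mode`, `path`, and `url` TOML fields.
pub fn resolve_storage(
    mode: &str,
    path: Option<&str>,
    url: Option<&str>,
) -> Result<StorageMode, String> {
    match mode {
        "in_memory" | "inmemory" | "memory" => Ok(StorageMode::InMemory),
        "vortex" => match (path, url) {
            // `path` is for local storage only, so a remote-looking value is
            // rejected instead of silently becoming a directory name.
            (Some(p), None) if p.contains("://") && !p.starts_with("file://") => {
                Err(format!("`path` must be local, got {p:?}; use `url` for S3"))
            }
            (Some(p), None) => Ok(StorageMode::Vortex { url: p.to_string() }),
            (None, Some(u)) if u.starts_with("s3://") => {
                Ok(StorageMode::Vortex { url: u.to_string() })
            }
            (None, Some(u)) => Err(format!("`url` must start with s3://, got {u:?}")),
            (Some(_), Some(_)) => Err("set either `path` or `url`, not both".to_string()),
            (None, None) => Err("vortex mode needs `path` or `url`".to_string()),
        },
        other => Err(format!("unknown storage mode {other:?}")),
    }
}
```

Both fields end up in the same `Vortex { url }` variant; the split only exists so that each field can be validated against the kind of value it is meant to hold.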

S3-compatible backend

Supported services

The S3-compatible path works with any service that speaks the S3 API, including:

  • Amazon S3
  • MinIO
  • Cloudflare R2
  • Wasabi
  • Ceph RGW

Required environment variables

sh
export AWS_ACCESS_KEY_ID="AKIA..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_REGION="us-east-1"

# For S3-compatible services (MinIO, R2, Wasabi, Ceph), override the endpoint:
export AWS_ENDPOINT_URL="https://my-minio.example.com"

For AWS S3 itself, AWS_ENDPOINT_URL is not required. For IAM role / instance-profile authentication, no static credentials are needed — the standard AWS credential chain applies.
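
For deployments that rely on static credentials, a startup check along the following lines can report which of the variables above are missing before the first S3 request fails. This is a sketch only: `missing_static_credentials` is a hypothetical helper, not part of Rhei, and it should be skipped when IAM roles or instance profiles supply the credentials.

```rust
/// Names of the static-credential variables that are unset or empty.
/// `lookup` abstracts the environment so the check is easy to test.
pub fn missing_static_credentials<F>(lookup: F) -> Vec<&'static str>
where
    F: Fn(&str) -> Option<String>,
{
    const REQUIRED: [&str; 3] = [
        "AWS_ACCESS_KEY_ID",
        "AWS_SECRET_ACCESS_KEY",
        "AWS_REGION",
    ];
    REQUIRED
        .iter()
        .copied()
        // Treat an empty value the same as an unset variable.
        .filter(|name| lookup(*name).map_or(true, |value| value.is_empty()))
        .collect()
}
```

Against the real process environment, it would be called as `missing_static_credentials(|name| std::env::var(name).ok())`.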

URL format

text
s3://<bucket>/<prefix>

Example: s3://my-analytics-bucket/rhei-data

Per-table subdirectories are created beneath the prefix automatically.
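
To make the URL shape concrete, the sketch below splits an s3:// URL into bucket and prefix and derives a per-table location. The helper names are hypothetical, and the `<prefix>/<table>` layout is an assumption; the exact subdirectory naming Rhei uses is not specified here.

```rust
/// Split `s3://<bucket>/<prefix>` into its bucket and prefix parts.
/// Returns None for non-S3 URLs or an empty bucket name.
pub fn split_s3_url(url: &str) -> Option<(&str, &str)> {
    let rest = url.strip_prefix("s3://")?;
    let (bucket, prefix) = match rest.split_once('/') {
        // Drop leading/trailing slashes so "prefix/" and "prefix" agree.
        Some((bucket, prefix)) => (bucket, prefix.trim_matches('/')),
        None => (rest, ""),
    };
    if bucket.is_empty() {
        None
    } else {
        Some((bucket, prefix))
    }
}

/// Assumed layout: each table lives directly beneath the configured prefix.
pub fn table_prefix(url: &str, table: &str) -> Option<String> {
    let (bucket, prefix) = split_s3_url(url)?;
    Some(if prefix.is_empty() {
        format!("s3://{bucket}/{table}")
    } else {
        format!("s3://{bucket}/{prefix}/{table}")
    })
}
```

Under that assumed layout, an `orders` table under the example URL would resolve to `s3://my-analytics-bucket/rhei-data/orders`.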

GCS support dropped in v2.0

Google Cloud Storage (gs:// URLs) is no longer supported directly. GCS users have two options:

  1. S3-compatibility layer: run MinIO or another S3-compatible proxy in front of GCS (e.g. via gcsfuse + a local MinIO instance), then point url at the S3 endpoint with AWS_ENDPOINT_URL.
  2. Stay on v1.5.x: v1.5.x gcs-parquet mode continues to work on the v1.5.x branch. First-class GCS support may return in a future v2.x release.

Migration from v1.5.x

There is no automatic data conversion from ArrowIpc / Parquet / S3Parquet / GcsParquet to Vortex. The upgrade path is:

  1. Drop the existing OLAP directory (or S3 prefix).
  2. Re-sync from the OLTP source (engine.initial_sync_all().await? or rh sync).

An rh convert subcommand may ship in a future release to automate this step.
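
For a local OLAP directory, step 1 can be scripted with the standard library alone. The sketch below is illustrative and destructive: `drop_local_olap_dir` is a hypothetical helper, not a Rhei API, and it deletes everything under the given path, so the path should be double-checked before use. Dropping an S3 prefix needs an S3 client or bucket tooling and is not covered.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Remove a local OLAP directory and everything in it.
/// Returns Ok(false) when the directory did not exist, so the call is
/// safe to repeat.
pub fn drop_local_olap_dir(dir: &Path) -> io::Result<bool> {
    match fs::remove_dir_all(dir) {
        Ok(()) => Ok(true),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(false),
        Err(e) => Err(e),
    }
}
```

After the directory is gone, step 2 repopulates it through the re-sync described above.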

Read and write characteristics

  • Reads: DataFusion's query engine reads Vortex data natively via ListingTable, with column pruning and predicate pushdown applied where the format supports it.
  • Bulk inserts (load_arrow / CDC sync): a new Vortex file is appended under the table path/prefix.
  • UPDATE / DELETE: requires a read-modify-write cycle — the full table is read, mutated in-memory, and re-written as a consolidated file. Not suitable for high-frequency point updates, especially over S3 where network round-trips dominate.
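
The difference in write cost can be seen in a toy model where each inner vector stands in for one Vortex file. This is a conceptual sketch with hypothetical names (`TableFiles`, `Row`); it does not use the Vortex or DataFusion APIs and only mirrors the append versus read-modify-write behavior described above.

```rust
/// A row in the toy table: (id, status).
pub type Row = (i64, String);

/// Toy stand-in for a table directory: one inner Vec per file.
#[derive(Default)]
pub struct TableFiles {
    files: Vec<Vec<Row>>,
}

impl TableFiles {
    /// Bulk insert: append a new file and leave existing files untouched.
    pub fn bulk_insert(&mut self, batch: Vec<Row>) {
        self.files.push(batch);
    }

    /// DELETE: read every file, filter in memory, and rewrite the whole
    /// table as one consolidated file.
    pub fn delete_where<P: Fn(&Row) -> bool>(&mut self, pred: P) {
        let kept: Vec<Row> = self
            .files
            .drain(..)
            .flatten()
            .filter(|row| !pred(row))
            .collect();
        self.files = vec![kept];
    }

    pub fn file_count(&self) -> usize {
        self.files.len()
    }

    pub fn row_count(&self) -> usize {
        self.files.iter().map(Vec::len).sum()
    }
}
```

An insert touches only the new file, while a delete of a single row touches every row in the table, which is why point updates scale with table size rather than with the size of the change.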

Object-store latency

Network round-trips dominate query time when the data lives in S3-compatible storage. For single-host deployments with a small dataset, local Vortex storage will be significantly faster. Choose S3-compatible storage when the dataset exceeds local disk capacity, or when the OLAP mirror must be shared across hosts.

TOML configuration

Local Vortex

toml
[engine]
oltp_path    = "app.db"
olap_backend = "datafusion"
sync_mode    = "destructive"

[engine.datafusion_storage]
mode = "vortex"
path = "./rhei-data"

S3-compatible Vortex

toml
[engine]
oltp_path    = "app.db"
olap_backend = "datafusion"
sync_mode    = "temporal"

[engine.datafusion_storage]
mode = "vortex"
url  = "s3://my-analytics-bucket/rhei-data"

Full S3 analytics example

toml
[engine]
oltp_path        = "app.db"
olap_backend     = "datafusion"
sync_mode        = "temporal"
sync_interval_ms = 1000
sync_batch_size  = 2000
read_pool_size   = 4
log_format       = "json"

[engine.datafusion_storage]
mode = "vortex"
url  = "s3://acme-analytics/prod/rhei"

[[tables]]
name = "orders"
ddl  = "CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, amount REAL, status TEXT, created_at INTEGER)"

  [[tables.columns]]
  name        = "id"
  type        = "int64"
  nullable    = false
  primary_key = true

  [[tables.columns]]
  name     = "amount"
  type     = "float64"
  nullable = true

  [[tables.columns]]
  name     = "status"
  type     = "utf8"
  nullable = true

  [[tables.columns]]
  name     = "created_at"
  type     = "int64"
  nullable = true

Run with appropriate credentials in the environment:

sh
AWS_ACCESS_KEY_ID=... AWS_SECRET_ACCESS_KEY=... AWS_REGION=us-east-1 \
  rh serve --config htap.toml

For MinIO or other S3-compatible services, add AWS_ENDPOINT_URL:

sh
AWS_ACCESS_KEY_ID=... AWS_SECRET_ACCESS_KEY=... \
AWS_REGION=us-east-1 AWS_ENDPOINT_URL=https://my-minio.example.com \
  rh serve --config htap.toml

Combining with temporal mode

Vortex storage works with both sync_mode = "destructive" and sync_mode = "temporal". Temporal mode on S3-compatible storage is a natural fit for long-retention audit history: Vortex files accumulate append-only, and point-in-time queries work identically to local storage (see Temporal mode).

Source

  • crates/rhei-datafusion/src/storage.rs: StorageMode enum definition with full doc-comments on read/write characteristics.
  • crates/rhei-tui/src/config.rs: DataFusionStorageToml and parse_storage_mode (TOML mode string parsing).

Released under the Apache-2.0 License.