Ingest pipelines
XERJ runs its parsers in-process with the writer. One fsync, one network hop, no Logstash tier. Four ingest paths cover most real workloads.
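A minimal sketch of what "one fsync" per request can look like when parsing happens in the same process as the writer. It assumes a hypothetical JSON-lines WAL file and a plain list standing in for the memtable; neither is XERJ's actual on-disk format, and `ingest_request` is an illustrative name. The request is parsed, appended to the WAL, fsynced once, and only then published to the memtable.

```python
import json
import os

def ingest_request(wal_path, raw_lines, parse, memtable):
    """Hypothetical write path: parse in-process, one fsync, then publish."""
    docs = [parse(line) for line in raw_lines if line.strip()]
    with open(wal_path, "ab") as wal:
        for doc in docs:
            wal.write(json.dumps(doc).encode("utf-8") + b"\n")
        wal.flush()             # push Python's userspace buffer to the OS
        os.fsync(wal.fileno())  # the single durability point for the whole request
    memtable.extend(docs)       # readers see the docs only after the fsync returns
    return len(docs)
```

Batching the fsync per request rather than per document is what keeps the durability cost to one disk flush regardless of how many documents a request carries.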
turbo-ingest · NDJSON bulk
The fast path. Documents arrive as NDJSON, are tokenized in parallel across cores, and are written to the memtable in batches. Three settings in the [indexing] config section tune it: turbo_batch_size controls the batch size, turbo_parallel enables concurrent tokenization, and turbo_fast_analyzer skips stemming for log-shaped documents.
$ curl -sX POST http://localhost:8080/v1/indices/logs/turbo-ingest \
-H 'Content-Type: application/x-ndjson' \
--data-binary @nginx.jsonl
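The shape of the turbo path, sketched in Python rather than taken from the engine: NDJSON lines are parsed lazily, grouped into batches, tokenized concurrently, and appended to the memtable one batch at a time. `batch_size` and `parallel` only mirror the roles of turbo_batch_size and turbo_parallel (the default of 1000 is illustrative), and `fast_tokenize` approximates the effect of turbo_fast_analyzer by lowercasing and splitting without stemming. All names are hypothetical. Python threads share the GIL, so a genuinely multi-core version would use processes or native code.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

def batches(lines, size):
    """Parse NDJSON lines lazily and yield lists of at most `size` documents."""
    docs = (json.loads(line) for line in lines if line.strip())
    while True:
        batch = list(islice(docs, size))
        if not batch:
            return
        yield batch

def fast_tokenize(doc):
    """Lowercase and whitespace-split every string field; no stemming."""
    tokens = []
    for value in doc.values():
        if isinstance(value, str):
            tokens.extend(value.lower().split())
    return tokens

def ingest(lines, memtable, batch_size=1000, parallel=True):
    """Hypothetical analogue of turbo_batch_size / turbo_parallel."""
    with ThreadPoolExecutor() as pool:
        tokenize_all = pool.map if parallel else map
        for batch in batches(lines, batch_size):
            token_lists = list(tokenize_all(fast_tokenize, batch))
            # One memtable append per batch rather than per document.
            memtable.append(list(zip(batch, token_lists)))
```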
logs · auto-detected timestamps
Accepts raw log lines in Apache combined, Nginx access, or ISO-8601 structured format. The ingest path detects the format, normalizes the timestamp to microsecond precision, and stores the parsed fields as structured fields. No regex config.
$ curl -sX POST http://localhost:8080/v1/indices/logs/logs \
-H 'Content-Type: text/plain' \
--data-binary @access.log
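One way format detection and µs normalization could work, covering two of the three formats under stated assumptions: the Apache/Nginx "combined" layout, and JSON lines carrying an ISO-8601 value in a field named `timestamp` (that field name is an assumption). Timestamps become integer microseconds since the Unix epoch via `timedelta` division, which avoids float rounding; ISO values without an offset are treated as UTC, another assumption. `parse_line` and `to_micros` are hypothetical helpers, not XERJ's parser.

```python
import json
import re
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Apache/Nginx "combined" layout; referer and user agent are optional,
# so Apache "common" lines also match.
COMBINED = re.compile(
    r'(?P<remote_addr>\S+) \S+ (?P<remote_user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)")?'
)

def to_micros(dt):
    """Integer microseconds since the Unix epoch; naive times are assumed UTC."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return (dt - EPOCH) // timedelta(microseconds=1)

def parse_line(line):
    line = line.strip()
    if line.startswith("{"):
        doc = json.loads(line)
        raw = doc.pop("timestamp")  # hypothetical field name
        if raw.endswith("Z"):
            raw = raw[:-1] + "+00:00"  # fromisoformat() rejects "Z" before Python 3.11
        doc["format"], ts = "iso8601-json", datetime.fromisoformat(raw)
    else:
        m = COMBINED.match(line)
        if m is None:
            return {"format": "unknown", "message": line}
        doc = m.groupdict()
        # %b expects English month abbreviations (the default C locale).
        ts = datetime.strptime(doc.pop("time"), "%d/%b/%Y:%H:%M:%S %z")
        doc["format"] = "combined"
    doc["ts_us"] = to_micros(ts)
    return doc
```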
syslog · RFC 5424
Accepts raw syslog frames. Parses the envelope (priority, timestamp, hostname, app-name, procid, msgid), extracts the structured fields, and indexes the message.
$ curl -sX POST http://localhost:8080/v1/indices/syslog/syslog \
-H 'Content-Type: text/plain' \
--data-binary @messages
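A minimal RFC 5424 parser sketch showing what "parses the envelope" involves. After `<PRI>` the header is six space-separated fields, PRI encodes facility × 8 + severity, `-` is the nil value, and STRUCTURED-DATA is either `-` or one or more `[SD-ID name="value" ...]` elements whose values may escape `"`, `\` and `]`. `parse_rfc5424` is a hypothetical helper; it does not validate field lengths or timestamp syntax, and the engine's own parser may differ.

```python
import re

# <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID, each followed by one space.
HEADER = re.compile(
    r'<(?P<pri>\d{1,3})>(?P<version>\d{1,2}) (?P<timestamp>\S+) (?P<hostname>\S+) '
    r'(?P<app_name>\S+) (?P<procid>\S+) (?P<msgid>\S+) '
)
PARAM = r'[^=\s\]"]+="(?:[^"\\]|\\.)*"'  # name="value"; value may contain \" \\ \]
SD_ELEMENT = re.compile(r'\[(?P<id>[^\s\]"]+)(?P<params>(?: ' + PARAM + r')*)\]')
SD_PARAM = re.compile(r'([^=\s\]"]+)="((?:[^"\\]|\\.)*)"')

def parse_rfc5424(frame):
    m = HEADER.match(frame)
    if m is None:
        raise ValueError("not an RFC 5424 frame")
    rec = {k: (None if v == "-" else v) for k, v in m.groupdict().items()}  # "-" is NILVALUE
    rec["facility"], rec["severity"] = divmod(int(rec.pop("pri")), 8)
    rest, sd = frame[m.end():], {}
    if rest.startswith("-"):
        rest = rest[1:]  # nil STRUCTURED-DATA
    else:
        pos = 0
        while True:
            el = SD_ELEMENT.match(rest, pos)
            if el is None:
                break
            sd[el["id"]] = {
                name: re.sub(r'\\(["\\\]])', r"\1", value)  # undo \" \\ \] escapes
                for name, value in SD_PARAM.findall(el["params"])
            }
            pos = el.end()
        if pos == 0:
            raise ValueError("malformed STRUCTURED-DATA")
        rest = rest[pos:]
    msg = rest[1:] if rest.startswith(" ") else rest
    if msg.startswith("\ufeff"):
        msg = msg[1:]  # MSG may begin with a UTF-8 BOM
    rec["structured_data"] = sd
    rec["message"] = msg or None
    return rec
```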
otlp · OpenTelemetry
OTLP over HTTP with protobuf payloads, no collector in front. Accepts logs, metrics, and traces. Traces are stored as a connected graph of spans rather than as flat, independent docs.
$ curl -sX POST http://localhost:8080/v1/indices/traces/otlp \
-H 'Content-Type: application/x-protobuf' \
--data-binary @spans.pb
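How a connected graph can be recovered from flat spans. In OTLP every span carries a trace ID, a span ID, and a parent span ID that is empty for the root, so grouping by trace and indexing children by parent rebuilds the tree. This sketch uses hex strings instead of OTLP's bytes fields; `Span`, `build_trace_graphs` and `walk` are hypothetical names, and treating spans whose parent has not (yet) arrived as roots is a design assumption, not documented XERJ behavior.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_span_id: str  # empty for a root span, as in OTLP
    name: str

def build_trace_graphs(spans):
    """Return {trace_id: (root_ids, {parent_id: [child_ids]})}."""
    by_trace = defaultdict(list)
    for span in spans:
        by_trace[span.trace_id].append(span)
    graphs = {}
    for trace_id, members in by_trace.items():
        known = {s.span_id for s in members}
        roots, children = [], defaultdict(list)
        for s in members:
            if s.parent_span_id and s.parent_span_id in known:
                children[s.parent_span_id].append(s.span_id)
            else:
                roots.append(s.span_id)  # true root, or orphan whose parent hasn't arrived
        graphs[trace_id] = (roots, dict(children))
    return graphs

def walk(children, span_id, depth=0):
    """Depth-first traversal yielding (depth, span_id)."""
    yield depth, span_id
    for child in children.get(span_id, []):
        yield from walk(children, child, depth + 1)
```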
Back-pressure
Back-pressure is end-to-end. When the memtable approaches flush_size_mb, writers block instead of letting the process run out of memory. When the WAL rolls over faster than segments can be flushed, turbo-ingest returns HTTP 429 (Too Many Requests) with a Retry-After header.
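On the client side, a 429 from turbo-ingest is a signal to wait and resend. A sketch of a retry loop, assuming `send` is a hypothetical callable that POSTs one batch and returns `(status, headers)`. It honors Retry-After in both forms HTTP allows (delay-seconds or an HTTP-date) and falls back to capped exponential backoff when the header is missing or unparseable; the 30-second cap and 5-attempt limit are illustrative choices, not XERJ defaults.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def retry_after_seconds(value, now=None):
    """Parse a Retry-After value (delay-seconds or HTTP-date) into seconds, or None."""
    value = value.strip()
    if not value:
        return None
    if value.isdecimal():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # Python < 3.10 raises TypeError on junk
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())

def send_with_backoff(send, body, max_attempts=5, sleep=time.sleep):
    """Resend `body` while the server answers 429; return the first non-429 status."""
    for attempt in range(max_attempts):
        status, headers = send(body)
        if status != 429:
            return status
        if attempt == max_attempts - 1:
            break
        delay = retry_after_seconds(headers.get("Retry-After", ""))
        if delay is None:
            delay = float(min(2 ** attempt, 30))  # fallback: 1, 2, 4, ... capped at 30 s
        sleep(delay)
    raise RuntimeError(f"still throttled after {max_attempts} attempts")
```

Injecting `sleep` keeps the loop testable; in production the default `time.sleep` applies.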
Source · engine/crates/logs/src/parse.rs · otlp/src/lib.rs