January 1, 2026 Benchmark

Single Query: 1.17x (140μs vs 164μs)

Pool (100 concurrent): 1.46x (16.2ms vs 23.6ms)

HTTP/2 Batch: 4.00x (4.8ms vs 19ms)

📊 Single Query Performance

1000 sequential searches - Fair comparison

Driver	Latency/Query	Throughput	Result
QAIL gRPC	140.3μs	7,126 ops/sec	1.17x faster
Official Client	164.0μs	6,096 ops/sec	baseline
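
The throughput and speedup columns follow from the per-query latency. A minimal Rust sketch of that arithmetic; the helper names `ops_per_sec` and `speedup` are hypothetical and not taken from the benchmark code:

```rust
/// Convert a mean per-query latency in microseconds to operations per second.
fn ops_per_sec(latency_us: f64) -> f64 {
    1_000_000.0 / latency_us
}

/// How many times faster the candidate is than the baseline,
/// given both mean latencies in microseconds.
fn speedup(baseline_us: f64, candidate_us: f64) -> f64 {
    baseline_us / candidate_us
}
```

With the rounded latencies from the table, `speedup(164.0, 140.3)` comes out at about 1.17, and `ops_per_sec(140.3)` lands within a few ops/sec of the reported 7,126. The small gap is presumably because the published latencies are rounded.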

🔧 Key Optimizations

Buffer pooling: .split() vs .clone()
Direct h2 transport (no Tonic overhead)
Pre-computed protobuf tags
unsafe memcpy for 1536 floats → 1 operation
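
To illustrate how the pre-computed tag and the single-copy float encoding might fit together, here is a minimal std-only Rust sketch. It is not QAIL's actual code: the field number `VECTOR_FIELD`, the helper names, and the use of `Vec<u8>` as the output buffer are all assumptions made for illustration. It relies on the fact that protobuf packed floats are little-endian 32-bit values, so on a little-endian target the in-memory `f32` slice already has the wire layout.

```rust
/// Hypothetical field number for the vector field; the real schema may differ.
const VECTOR_FIELD: u32 = 1;
/// Protobuf wire type 2 = length-delimited (used for packed repeated floats).
const WIRE_LEN: u32 = 2;
/// Tag computed once at compile time instead of on every request.
/// A single byte is only enough for field numbers below 16.
const VECTOR_TAG: u8 = ((VECTOR_FIELD << 3) | WIRE_LEN) as u8;

/// Append `v` to `buf` as a protobuf base-128 varint.
fn put_varint(buf: &mut Vec<u8>, mut v: u64) {
    while v >= 0x80 {
        buf.push(((v & 0x7f) as u8) | 0x80);
        v >>= 7;
    }
    buf.push(v as u8);
}

/// Append `values` as a packed repeated float field: tag, byte length, payload.
fn put_packed_f32(buf: &mut Vec<u8>, values: &[f32]) {
    let byte_len = values.len() * 4;
    buf.push(VECTOR_TAG);
    put_varint(buf, byte_len as u64);
    if cfg!(target_endian = "little") {
        // SAFETY: `values` is initialized memory of exactly `byte_len` bytes,
        // f32 has no padding, and u8 has alignment 1.
        let bytes = unsafe {
            std::slice::from_raw_parts(values.as_ptr() as *const u8, byte_len)
        };
        // One bulk copy instead of one write per float.
        buf.extend_from_slice(bytes);
    } else {
        // Portable fallback for big-endian targets.
        for v in values {
            buf.extend_from_slice(&v.to_le_bytes());
        }
    }
}
```

For a 1536-dimension vector this writes 1 tag byte, a 2-byte length varint (6144), and 6144 payload bytes. Clearing and reusing the same `Vec<u8>` between requests keeps its capacity, which is a rough std-only analogue of the buffer pooling listed above.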
 

🚀 HTTP/2 Pipelining (Batch)

50 queries sent concurrently over single connection

Approach	Total Time	Per Query	Result
HTTP/2 Pipelined	4.8ms	95μs	4.00x faster
Sequential	19.0ms	380μs	baseline

💡 HTTP/2 multiplexing wins!

All 50 requests sent concurrently - perfect for RAG pipelines

Reproduce Results

git clone https://github.com/qail-io/qail
cd qail/qdrant

# Start Qdrant
docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant

# Seed data
python3 examples/seed_qdrant.py

# Run benchmarks
cargo run --example fair_benchmark --release
cargo run --example batch_benchmark --release