| Benchmark | Speedup | Timing |
|---|---|---|
| Single Query | 1.17x | 140μs vs 164μs |
| Pool (100 concurrent) | 1.46x | 16.2ms vs 23.6ms |
| HTTP/2 Batch | 4.00x | 4.8ms vs 19ms |
📊 Single Query Performance
1000 sequential searches - Fair comparison
| Driver | Latency/Query | Throughput | Result |
|---|---|---|---|
| QAIL gRPC | 140.3μs | 7,126 ops/sec | 1.17x faster |
| Official Client | 164.0μs | 6,096 ops/sec | baseline |
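The throughput and speedup columns follow directly from the measured latency. A minimal sketch of that arithmetic, assuming the latency column is the mean time per sequential query (the function names are illustrative and not part of QAIL):

```rust
/// Convert a mean per-query latency in microseconds to throughput in ops/sec.
/// Valid for sequential execution, where one query finishes before the next starts.
fn ops_per_sec(latency_us: f64) -> f64 {
    1_000_000.0 / latency_us
}

/// Speedup of a candidate over a baseline, given both latencies in the same unit.
fn speedup(baseline: f64, candidate: f64) -> f64 {
    baseline / candidate
}
```

Calling `speedup(164.0, 140.3)` gives roughly 1.17, and `ops_per_sec(140.3)` lands within a couple of ops/sec of the table value; the small gap is presumably due to the latencies being rounded to one decimal place.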
🔧 Key Optimizations
- Buffer pooling: `.split()` vs `.clone()`
- Direct h2 transport (no Tonic overhead)
- Pre-computed protobuf tags
- unsafe memcpy for 1536 floats → 1 operation
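Two of these items, the pre-computed tags and the bulk float copy, can be sketched with the standard library alone. This is an illustration under stated assumptions and not QAIL's actual code: the field number `1` is hypothetical, and the bulk path only matches the protobuf wire format on little-endian targets, so the sketch falls back to the per-element loop elsewhere.

```rust
/// Protobuf wire type for length-delimited fields (used by packed repeated floats).
const WIRE_LEN: u8 = 2;

/// Build a single-byte protobuf tag at compile time: (field_number << 3) | wire_type.
/// Only valid for field numbers 1..=15, which fit in one byte.
const fn tag(field_number: u8, wire_type: u8) -> u8 {
    (field_number << 3) | wire_type
}

/// Hypothetical field number for the vector; the real schema may differ.
const VECTOR_TAG: u8 = tag(1, WIRE_LEN);

/// Append an unsigned LEB128 varint, as used for protobuf lengths.
fn put_varint(out: &mut Vec<u8>, mut v: u64) {
    while v >= 0x80 {
        out.push((v as u8 & 0x7f) | 0x80);
        v >>= 7;
    }
    out.push(v as u8);
}

/// Per-element encoding: one 4-byte append per float.
fn encode_vector_loop(v: &[f32], out: &mut Vec<u8>) {
    out.push(VECTOR_TAG);
    put_varint(out, (v.len() * 4) as u64);
    for x in v {
        out.extend_from_slice(&x.to_le_bytes());
    }
}

/// Bulk encoding: view the floats as raw bytes and append them in a single copy.
fn encode_vector_bulk(v: &[f32], out: &mut Vec<u8>) {
    out.push(VECTOR_TAG);
    put_varint(out, (v.len() * 4) as u64);
    if cfg!(target_endian = "little") {
        // SAFETY: f32 has no padding, any byte pattern is a valid u8, u8 has
        // alignment 1, and the view covers exactly the v.len() * 4 bytes of `v`.
        let bytes = unsafe {
            std::slice::from_raw_parts(v.as_ptr() as *const u8, v.len() * 4)
        };
        out.extend_from_slice(bytes);
    } else {
        // Big-endian targets must byte-swap each element.
        for x in v {
            out.extend_from_slice(&x.to_le_bytes());
        }
    }
}
```

For a 1536-dimensional vector both functions produce the same 6147 bytes (1 tag byte, a 2-byte length, 6144 payload bytes); the bulk version replaces 1536 small appends with one.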
🚀 HTTP/2 Pipelining (Batch)
50 queries sent concurrently over a single connection
| Approach | Total Time | Per Query | Result |
|---|---|---|---|
| HTTP/2 Pipelined | 4.8ms | 95μs | 4.00x faster |
| Sequential | 19.0ms | 380μs | baseline |
💡 HTTP/2 multiplexing wins!
All 50 requests sent concurrently - perfect for RAG pipelines
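The per-query column in the batch table is the total batch time amortized over the 50 queries. A small sketch of that calculation (the function name is illustrative):

```rust
/// Amortized per-query time in microseconds for a batch that took `total_ms`.
fn per_query_us(total_ms: f64, queries: u32) -> f64 {
    total_ms * 1000.0 / queries as f64
}
```

With the rounded totals, `per_query_us(19.0, 50)` gives 380μs and `per_query_us(4.8, 50)` gives about 96μs, for a ratio of roughly 3.96x. The table's 95μs and 4.00x presumably come from the unrounded measurements.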
Reproduce Results
```shell
git clone https://github.com/qail-io/qail
cd qail/qdrant

# Start Qdrant
docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant

# Seed data
python3 examples/seed_qdrant.py

# Run benchmarks
cargo run --example fair_benchmark --release
cargo run --example batch_benchmark --release
```