Quantization and Rescoring
Quantization and rescoring help you balance memory efficiency and search accuracy for Vector Search indexes in ScyllaDB. This page explains how to configure these features when creating a vector index.
Overview
By default, ScyllaDB stores vectors in the in-memory index using full 32-bit
floating-point precision (f32). Quantization reduces the memory footprint
of the index by storing vectors at lower precision. This compression trades
some search accuracy for significant memory savings.
To mitigate the accuracy loss from quantization, ScyllaDB provides two complementary mechanisms:
Oversampling — retrieves a larger candidate set during the initial index search, increasing the chance that the true nearest neighbors are included.
Rescoring — re-calculates exact distances for candidates using the original full-precision vectors stored in ScyllaDB, then re-ranks results before returning them to the client.
Caution
Quantization applies only to the in-memory vector index. The source
vectors stored in your ScyllaDB table always remain in their original
float format. Your data is never degraded.
Quantization Levels
The quantization index option controls the numeric precision used in the
vector index. The following levels are supported:
| Value | Description | Memory per dimension |
|---|---|---|
| f32 | 32-bit single-precision IEEE 754 floating-point | 4 bytes |
| f16 | 16-bit standard half-precision floating-point (IEEE 754) | 2 bytes |
| bf16 | 16-bit "Brain" floating-point (optimized for ML workloads) | 2 bytes |
| i8 | 8-bit signed integer | 1 byte |
| b1 | 1-bit binary value (packed 8 per byte) | 0.125 bytes |
Lower-precision quantization levels use less memory but produce less accurate distance calculations in the index. Use oversampling and rescoring to recover accuracy.
Important
Quantization compresses only the vector data in the index. The HNSW
graph structure (neighbor lists and edge metadata) is not compressed
and its size stays constant regardless of quantization level. Because the
graph overhead is a significant portion of total index memory, the actual
memory savings from quantization are always much less than the raw
compression ratio suggests. For example, going from f32 to i8
gives a 4x reduction in vector storage, but total index memory typically
drops only ~3x. See Sizing and Capacity Planning for worked examples.
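The arithmetic behind this note can be sketched in a few lines of Python. This is a rough estimate, not ScyllaDB's sizing formula: the function names are hypothetical, the dictionary keys other than f32 and i8 are labels for the rows of the table above, and the per-vector graph overhead is an assumed input that you would need to measure or take from Sizing and Capacity Planning.

```python
# Bits per dimension for each row of the quantization table.
# Keys other than "f32" and "i8" are labels assumed for illustration.
BITS_PER_DIMENSION = {"f32": 32, "f16": 16, "bf16": 16, "i8": 8, "b1": 1}


def vector_storage_bytes(num_vectors: int, dimensions: int, level: str) -> int:
    """Raw vector data only; the HNSW graph is NOT included."""
    bits = BITS_PER_DIMENSION[level]
    # Round each vector up to a whole number of bytes (matters for b1).
    bytes_per_vector = (dimensions * bits + 7) // 8
    return num_vectors * bytes_per_vector


def total_reduction(dimensions: int, level: str, graph_bytes_per_vector: int) -> float:
    """Ratio of f32 index size to quantized index size for one vector.

    graph_bytes_per_vector is a hypothetical, caller-supplied figure for
    the uncompressed graph overhead; it is the same in both cases.
    """
    baseline = vector_storage_bytes(1, dimensions, "f32") + graph_bytes_per_vector
    quantized = vector_storage_bytes(1, dimensions, level) + graph_bytes_per_vector
    return baseline / quantized


if __name__ == "__main__":
    # One million 768-dimensional vectors: vector data alone shrinks 4x.
    print(vector_storage_bytes(1_000_000, 768, "f32"))
    print(vector_storage_bytes(1_000_000, 768, "i8"))
    # With an assumed 384 bytes of graph data per vector, the total shrinks less.
    print(total_reduction(768, "i8", graph_bytes_per_vector=384))
```

Because the graph term appears in both the numerator and the denominator, the total ratio is always below the raw vector compression ratio whenever the overhead is non-zero.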
Oversampling
When a client requests the top K vectors, the search algorithm normally retrieves exactly K candidates from the index. With oversampling, the algorithm retrieves a larger candidate set:
Candidate pool size = ceil(K × oversampling)
The candidates are then sorted by distance and only the top K results are returned. A larger candidate pool increases the probability that the true nearest neighbors survive this final selection.
Range: 1.0 to 100.0
Default: 1.0 (no oversampling)
Oversampling offers two advantages over simply increasing the query LIMIT:
Performance — candidate filtering happens internally in ScyllaDB, avoiding the overhead of fetching and transporting extra rows to the application.
Scale — allows an effective internal limit of up to 100.0 × 1000 = 100,000 candidates.
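The candidate pool formula and its limits can be checked with a small Python sketch. The function name is hypothetical, and converting the factor through `Decimal` is a choice made here to avoid binary floating-point artifacts (for example, `10 * 1.1` is slightly above 11 as a float); the server's own rounding may differ in such edge cases.

```python
import math
from decimal import Decimal

# Range documented for the oversampling option.
MIN_OVERSAMPLING = Decimal("1.0")
MAX_OVERSAMPLING = Decimal("100.0")


def candidate_pool_size(k: int, oversampling: float = 1.0) -> int:
    """Return ceil(K x oversampling) for a top-K request."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    factor = Decimal(str(oversampling))
    if not (MIN_OVERSAMPLING <= factor <= MAX_OVERSAMPLING):
        raise ValueError("oversampling must be between 1.0 and 100.0")
    return math.ceil(factor * k)


if __name__ == "__main__":
    print(candidate_pool_size(10))            # default factor: pool equals K
    print(candidate_pool_size(10, 5.0))       # five times as many candidates
    print(candidate_pool_size(1000, 100.0))   # the upper bound quoted above
```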
Note
Even without quantization, the ANN algorithm is approximate. Setting
oversampling > 1.0 can improve recall on high-dimensionality datasets
even when using the default f32 precision.
Rescoring
Rescoring is a second-pass operation that re-calculates distances using the full-precision vectors stored in the ScyllaDB table, then re-ranks candidates before returning results.
true: ScyllaDB fetches the original vectors and re-ranks candidates by exact distance.
false (default): results are returned directly, based on the approximate distances in the quantized index.
Caution
Rescoring can reduce search throughput by roughly 4 times because ScyllaDB must fetch the original full-precision vectors and recalculate exact distances for every candidate. Enable rescoring only when high recall is critical, and benchmark your workload to confirm acceptable performance.
Note
Rescoring is only beneficial when quantization is enabled. For unquantized
indexes (default f32), the index already contains full-precision data,
making the rescoring pass redundant.
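Conceptually, the second pass looks like the following Python sketch. It is an illustration of the idea only, not ScyllaDB's implementation: the function names, the in-memory dictionary standing in for the table, and the choice of cosine distance are all assumptions made for the example.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity, computed at full precision."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rescore(query, candidate_ids, full_vectors, k):
    """Re-rank index candidates by exact distance and keep the top k.

    candidate_ids: ids returned by the (quantized) index search, in
                   whatever approximate order the index produced.
    full_vectors:  hypothetical mapping from id to the original
                   full-precision vector stored in the table.
    """
    exact = [(cosine_distance(query, full_vectors[cid]), cid) for cid in candidate_ids]
    exact.sort()  # ascending distance: nearest first
    return [cid for _, cid in exact[:k]]


if __name__ == "__main__":
    stored = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
    # Suppose the quantized index ranked "b" ahead of "a".
    approximate_order = ["b", "a", "c"]
    print(rescore([1.0, 0.0], approximate_order, stored, k=2))
```

The cost noted in the caution above comes from the lookup and distance computation inside the loop: both run once per candidate, so the work grows with the oversampled pool size rather than with K.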
CQL Syntax
Quantization, oversampling, and rescoring are configured as options when
creating a vector index with CREATE CUSTOM INDEX:
CREATE CUSTOM INDEX ON myapp.comments(comment_vector)
    USING 'vector_index'
    WITH OPTIONS = {
        'similarity_function': 'COSINE',
        'quantization': 'i8',
        'oversampling': '5.0',
        'rescoring': 'true'
    };
Options reference:
| Option | Type | Default | Description |
|---|---|---|---|
| quantization | string | f32 | Numeric precision for the index. Values: f32, f16, bf16, i8, b1. |
| oversampling | string (float) | 1.0 | Multiplier for the candidate set size. Range: 1.0-100.0. |
| rescoring | string (bool) | false | Whether to perform a second-pass exact distance calculation using full-precision vectors from storage. |
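Because every option value is passed as a string, an application that generates index DDL may want to validate and format the values first. The following Python sketch shows one way to do that. Both helper functions are hypothetical and not part of any ScyllaDB driver, the set of accepted quantization names should be checked against the Quantization Levels table, and identifiers are inserted without escaping, so the helper is only suitable for trusted input.

```python
# Quantization names accepted by this sketch; verify against the
# Quantization Levels table before relying on them.
ALLOWED_QUANTIZATION = {"f32", "f16", "bf16", "i8", "b1"}


def index_options(quantization="f32", oversampling=1.0, rescoring=False,
                  similarity_function="COSINE"):
    """Validate the settings and return them as the string map CQL expects."""
    if quantization not in ALLOWED_QUANTIZATION:
        raise ValueError(f"unknown quantization level: {quantization!r}")
    if not 1.0 <= oversampling <= 100.0:
        raise ValueError("oversampling must be between 1.0 and 100.0")
    return {
        "similarity_function": similarity_function,
        "quantization": quantization,
        "oversampling": str(float(oversampling)),
        "rescoring": "true" if rescoring else "false",
    }


def create_index_statement(keyspace, table, column, options):
    """Render a CREATE CUSTOM INDEX statement in the form shown above."""
    body = ",\n".join(f"    '{name}': '{value}'" for name, value in options.items())
    return (
        f"CREATE CUSTOM INDEX ON {keyspace}.{table}({column})\n"
        "USING 'vector_index'\n"
        "WITH OPTIONS = {\n" + body + "\n};"
    )


if __name__ == "__main__":
    opts = index_options(quantization="i8", oversampling=5.0, rescoring=True)
    print(create_index_statement("myapp", "comments", "comment_vector", opts))
```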
Warning
The ALTER INDEX statement is not supported for vector indexes. To change
quantization settings, you must drop the existing index and recreate it.
When to Use Quantization
| Scenario | Recommendation |
|---|---|
| Small dataset, high recall required | Use default |
| Large dataset, memory-constrained | Use |
| Very large dataset, approximate results acceptable | Use |
| High-dimensionality vectors (>= 768) | Consider |
What’s Next
Working with Vector Search — vector data type, index creation, and ANN queries.
Filtering Vector Search Results — combine similarity search with metadata filtering.
Vector Search Concepts — architecture and data flow.