Quantization and Rescoring

Quantization and rescoring help you balance memory efficiency and search accuracy for Vector Search indexes in ScyllaDB. This page explains how to configure these features when creating a vector index.

Overview

By default, ScyllaDB stores vectors in the in-memory index using full 32-bit floating-point precision (f32). Quantization reduces the memory footprint of the index by storing vectors at lower precision. This compression trades some search accuracy for significant memory savings.

To mitigate the accuracy loss from quantization, ScyllaDB provides two complementary mechanisms:

  • Oversampling — retrieves a larger candidate set during the initial index search, increasing the chance that the true nearest neighbors are included.

  • Rescoring — re-calculates exact distances for candidates using the original full-precision vectors stored in ScyllaDB, then re-ranks results before returning them to the client.

Caution

Quantization applies only to the in-memory vector index. The source vectors stored in your ScyllaDB table always remain in their original float format. Your data is never degraded.

Quantization Levels

The quantization index option controls the numeric precision used in the vector index. The following levels are supported:

| Value | Description | Memory per dimension |
|---|---|---|
| f32 (default) | 32-bit single-precision IEEE 754 floating-point | 4 bytes |
| f16 | 16-bit standard half-precision floating-point (IEEE 754) | 2 bytes |
| bf16 | 16-bit “Brain” floating-point (optimized for ML workloads) | 2 bytes |
| i8 | 8-bit signed integer | 1 byte |
| b1 | 1-bit binary value (packed 8 per byte) | 0.125 bytes |

Lower-precision quantization levels use less memory but produce less accurate distance calculations in the index. Use oversampling and rescoring to recover accuracy.
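
As a rough illustration of the per-dimension figures, the following Python sketch computes the bytes needed for one vector's data at each quantization level. The function name `vector_bytes` and the 768-dimension example are assumptions made for illustration, and rounding a partial final byte up for b1 is also an assumption. The result covers vector data only, not the index graph.

```python
import math

# Bits per dimension for each quantization level, from the quantization levels table.
BITS_PER_DIMENSION = {"f32": 32, "f16": 16, "bf16": 16, "i8": 8, "b1": 1}


def vector_bytes(dimensions: int, quantization: str = "f32") -> int:
    """Bytes needed to store one vector's data in the index (graph excluded)."""
    bits = BITS_PER_DIMENSION[quantization] * dimensions
    # b1 packs 8 dimensions per byte; assume a partial final byte is rounded up.
    return math.ceil(bits / 8)


for level in BITS_PER_DIMENSION:
    print(f"{level:>4}: {vector_bytes(768, level)} bytes per 768-dimension vector")
```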

Important

Quantization compresses only the vector data in the index. The HNSW graph structure (neighbor lists and edge metadata) is not compressed, and its size stays constant regardless of quantization level. Because the graph accounts for a significant portion of total index memory, the overall memory savings are smaller than the raw compression ratio suggests. For example, going from f32 to i8 gives a 4x reduction in vector storage, but total index memory typically drops only about 3x. See Sizing and Capacity Planning for worked examples.
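
The arithmetic behind this effect can be sketched in a few lines of Python. The per-vector graph overhead of 384 bytes below is a hypothetical value chosen purely for illustration, not a ScyllaDB figure. The real overhead depends on the index parameters, so use Sizing and Capacity Planning for actual estimates.

```python
def index_bytes_per_vector(dimensions, bytes_per_dimension, graph_bytes):
    """Vector data plus a fixed, uncompressed graph overhead for one vector."""
    return dimensions * bytes_per_dimension + graph_bytes


DIMENSIONS = 768
GRAPH_BYTES = 384  # hypothetical per-vector HNSW overhead, for illustration only

f32_total = index_bytes_per_vector(DIMENSIONS, 4, GRAPH_BYTES)  # 3072 + 384
i8_total = index_bytes_per_vector(DIMENSIONS, 1, GRAPH_BYTES)   # 768 + 384

print(f"vector data reduction: {(DIMENSIONS * 4) / (DIMENSIONS * 1):.1f}x")
print(f"total index reduction: {f32_total / i8_total:.1f}x")
```

Under these assumed numbers, the vector data shrinks 4x while the total shrinks 3x, because the graph bytes are the same in both cases.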

Oversampling

When a client requests the top K vectors, the search algorithm normally retrieves exactly K candidates from the index. With oversampling, the algorithm retrieves a larger candidate set:

Candidate pool size = ceil(K × oversampling)

The candidates are then sorted by distance and only the top K results are returned. A larger candidate pool increases the probability that the true nearest neighbors survive this final selection.

  • Range: 1.0 to 100.0

  • Default: 1.0 (no oversampling)
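
The candidate pool formula can be sketched in Python as follows. The helper name `candidate_pool_size` is hypothetical, and this is only an illustration of the formula and the documented range, not a description of how ScyllaDB computes the value internally. The oversampling value is taken as a string because the index option is passed as a string.

```python
import math
from decimal import Decimal


def candidate_pool_size(k: int, oversampling: str = "1.0") -> int:
    """Return ceil(K x oversampling) for an oversampling value in [1.0, 100.0]."""
    factor = Decimal(oversampling)
    if not Decimal("1.0") <= factor <= Decimal("100.0"):
        raise ValueError("oversampling must be between 1.0 and 100.0")
    # Decimal keeps the product exact, so ceil() is not affected by
    # binary floating-point rounding of the multiplier.
    return math.ceil(k * factor)


print(candidate_pool_size(10, "5.0"))  # 10 requested rows, pool of 50 candidates
print(candidate_pool_size(3, "1.5"))   # ceil(4.5) rounds up to 5 candidates
```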

Oversampling offers two advantages over simply increasing the query LIMIT:

  • Performance: candidate filtering happens inside ScyllaDB, avoiding the overhead of fetching and transporting extra rows to the application.

  • Scale: allows an effective internal limit of up to 100.0 × 1000 = 100,000 candidates (the maximum oversampling of 100.0 applied to a LIMIT of 1000).

Note

Even without quantization, the ANN algorithm is approximate. Setting oversampling > 1.0 can improve recall on high-dimensionality datasets even when using the default f32 precision.
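
To judge whether a given oversampling value helps, you can measure recall: the fraction of true nearest neighbors that the ANN query actually returned. The sketch below is a minimal, hypothetical helper. It assumes you already have the ANN result IDs and a ground-truth list from an exact (brute-force) search over the same data, and that K is at least 1.

```python
def recall_at_k(ann_ids, exact_ids, k):
    """Fraction of the true top-K neighbors that the ANN search returned."""
    true_top_k = set(exact_ids[:k])
    found = true_top_k.intersection(ann_ids[:k])
    return len(found) / len(true_top_k)


exact = ["a", "b", "c", "d", "e"]  # ground truth from a brute-force scan
ann = ["a", "c", "b", "x", "e"]    # results returned by an ANN query
print(recall_at_k(ann, exact, k=5))  # 4 of the 5 true neighbors were found
```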

Rescoring

Rescoring is a second-pass operation that re-calculates distances using the full-precision vectors stored in the ScyllaDB table, then re-ranks candidates before returning results.

  • true: ScyllaDB fetches original vectors and re-ranks candidates by exact distance.

  • false (default): results are returned directly based on the approximate distances in the quantized index.
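
Conceptually, the rescoring pass looks like the Python sketch below. It is a simplified illustration under stated assumptions, not ScyllaDB's implementation: the function names, the in-memory dictionary standing in for the table, and the sample vectors are all hypothetical, and cosine distance is used only as an example metric.

```python
import math


def cosine_distance(a, b):
    """Exact cosine distance (1 - cosine similarity) between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rescore(query, candidate_ids, full_precision_vectors, k):
    """Re-rank index candidates by exact distance and keep the top K."""
    ranked = sorted(
        candidate_ids,
        key=lambda row_id: cosine_distance(query, full_precision_vectors[row_id]),
    )
    return ranked[:k]


# Hypothetical data: the quantized index ranked "b" ahead of "a".
vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
candidates_from_index = ["b", "a", "c"]  # oversampled candidate pool
print(rescore([1.0, 0.0], candidates_from_index, vectors, k=2))
```

Here the exact distances put "a" back in first place, which is the correction rescoring is meant to provide.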

Caution

Rescoring can reduce search throughput by roughly 4 times because ScyllaDB must fetch the original full-precision vectors and recalculate exact distances for every candidate. Enable rescoring only when high recall is critical, and benchmark your workload to confirm acceptable performance.

Note

Rescoring is only beneficial when quantization is enabled. For unquantized indexes (default f32), the index already contains full-precision data, making the rescoring pass redundant.

CQL Syntax

Quantization, oversampling, and rescoring are configured as options when creating a vector index with CREATE CUSTOM INDEX:

CREATE CUSTOM INDEX ON myapp.comments(comment_vector)
USING 'vector_index'
WITH OPTIONS = {
  'similarity_function': 'COSINE',
  'quantization': 'i8',
  'oversampling': '5.0',
  'rescoring': 'true'
};

Options reference:

| Option | Type | Default | Description |
|---|---|---|---|
| quantization | string | 'f32' | Numeric precision for the index. Values: f32, f16, bf16, i8, b1. |
| oversampling | string (float) | '1.0' | Multiplier for the candidate set size. Range: 1.0-100.0. |
| rescoring | string (bool) | 'false' | Whether to perform a second-pass exact distance calculation using full-precision vectors from storage. |
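
If you generate index definitions from application code, a small helper can check the documented values before the statement reaches the database. The Python sketch below is one possible approach. The function `vector_index_cql` is hypothetical and not part of any ScyllaDB driver. It only builds the statement text and assumes the table and column names come from a trusted source, since they are interpolated without escaping.

```python
QUANTIZATION_LEVELS = {"f32", "f16", "bf16", "i8", "b1"}


def vector_index_cql(table, column, similarity_function="COSINE",
                     quantization="f32", oversampling=1.0, rescoring=False):
    """Build a CREATE CUSTOM INDEX statement after checking the documented ranges."""
    if quantization not in QUANTIZATION_LEVELS:
        raise ValueError(f"unsupported quantization: {quantization!r}")
    if not 1.0 <= oversampling <= 100.0:
        raise ValueError("oversampling must be between 1.0 and 100.0")
    # All option values are emitted as CQL strings, matching the options reference.
    options = {
        "similarity_function": similarity_function,
        "quantization": quantization,
        "oversampling": str(float(oversampling)),
        "rescoring": "true" if rescoring else "false",
    }
    body = ",\n".join(f"  '{key}': '{value}'" for key, value in options.items())
    return (f"CREATE CUSTOM INDEX ON {table}({column})\n"
            f"USING 'vector_index'\n"
            f"WITH OPTIONS = {{\n{body}\n}};")


print(vector_index_cql("myapp.comments", "comment_vector",
                       quantization="i8", oversampling=5.0, rescoring=True))
```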

Warning

The ALTER INDEX statement is not supported for vector indexes. To change quantization settings, you must drop the existing index and recreate it.

When to Use Quantization

| Scenario | Recommendation |
|---|---|
| Small dataset, high recall required | Use default f32; no quantization is needed. |
| Large dataset, memory-constrained | Use i8 or f16 with oversampling of 3.0-10.0. Add rescoring: true only if very high recall is required. |
| Very large dataset, approximate results acceptable | Use b1 for maximum memory savings. Enable oversampling to compensate for accuracy loss. |
| High-dimensionality vectors (>= 768) | Consider oversampling > 1.0 even with f32 to improve recall. |

What’s Next

  • Working with Vector Search — vector data type, index creation, and ANN queries.

  • Filtering Vector Search Results — combine similarity search with metadata filtering.

  • Vector Search Concepts — architecture and data flow.

Last updated on 24 Mar 2026.