
Vector Search Glossary

This glossary defines key terms related to Vector Search in ScyllaDB. It covers core concepts essential to understanding how vectors are stored, indexed, and queried.

  • ANN (Approximate Nearest Neighbor) Search — A search technique that efficiently finds the data points in a large dataset that are most similar to a given query vector. Instead of guaranteeing the exact nearest neighbors, ANN speeds up the search by accepting results that are close enough, which makes it well suited to large datasets and high-dimensional vector spaces in applications such as semantic search, recommendations, and generative AI.

  • CDC Reader — A Change Data Capture (CDC) consumer that propagates base table mutations to the vector index. ScyllaDB uses two readers: a fine-grained reader with sub-second intervals for low-latency updates, and a wide-framed reader with a 30-second safety interval that ensures consistency.

  • Embedding — A vector generated by a machine learning model to represent raw data in a numerical form. ScyllaDB can store and query embeddings generated by an external tool and inserted into the database.

  • Filtering — The ability to combine an ANN similarity search with predicate constraints on primary key columns. ScyllaDB supports filtering via global vector indexes (with ALLOW FILTERING) and local vector indexes (with partition key restrictions). See Filtering.

  • Global Vector Index — A vector index that spans all partitions and enables cluster-wide similarity search. Global indexes support filtering on the base table’s primary key columns with ALLOW FILTERING. See Global Vector Indexes.

  • HNSW (Hierarchical Navigable Small World) — A graph-based algorithm for Approximate Nearest Neighbor search. It organizes vectors into a multi-layered graph, allowing efficient navigation from coarse to fine resolution. ScyllaDB’s vector index uses the HNSW algorithm implemented by the USearch library.

  • Local Vector Index — A per-partition vector index that is co-located with the data it indexes. Local indexes require the full partition key in the WHERE clause and support efficient filtered similarity search within a single partition. See Local Vector Indexes.

  • Oversampling — An index option (range 1.0-100.0) that controls how many candidate vectors the index evaluates internally before returning the top-k results. Higher oversampling improves recall at the cost of higher latency. See Quantization and Rescoring.

  • Quantization — A technique that reduces index memory usage by storing vectors in lower-precision formats (e.g., f16, i8, b1) instead of the default f32. Lower precision reduces memory but may decrease accuracy; combine with rescoring to recover precision. See Quantization and Rescoring.

  • Rescoring — A post-processing step where the index re-ranks candidate results using the original full-precision vectors from disk, improving accuracy after quantized search. Enabled by setting 'rescoring': 'true' in the index options. See Quantization and Rescoring.

  • Semantic Search — A type of similarity search that compares the meaning of a query and data items using vector embeddings. It enables context-aware retrieval by focusing on semantic relevance rather than exact terms.

  • Similarity Function (Distance Metric) — A mathematical function that measures how close two vectors are. In ScyllaDB, three similarity functions are supported: COSINE (default), DOT_PRODUCT, and EUCLIDEAN. See Choosing a Similarity Function.

  • Similarity Search — A technique for finding items in a dataset that are most similar to a query vector, using a distance or similarity measure. It is commonly used in high-dimensional vector spaces to retrieve approximate matches efficiently.

  • USearch Index — A high-performance, in-memory vector index library developed by Unum, designed for fast approximate nearest-neighbor (ANN) search. ScyllaDB uses USearch as the underlying engine for its Vector Search Index to deliver low-latency similarity queries and efficient memory utilization.

  • Vector — An ordered list of numbers (floats) representing data, such as text, images, or audio, in a way that captures its meaning or features.

  • Vector Search Index — In ScyllaDB, a USearch index built on a vector column that accelerates similarity queries. Unlike traditional indexes (for exact matches or ranges), a Vector Search index is optimized for approximate nearest neighbor (ANN) lookups over high-dimensional data.

  • Vector Type — A native ScyllaDB column type used to store fixed-length numeric vectors directly in a table for similarity search. See Data Types - Vectors in the ScyllaDB documentation.
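
Several of the terms above (Similarity Function, Quantization, Oversampling, Rescoring) describe steps of one search pipeline. They can be illustrated with a minimal brute-force sketch in plain Python. This is not ScyllaDB's or USearch's implementation: it scans every vector instead of walking an HNSW graph. The helper names (`quantize_i8`, `quantized_search`, `random_unit_vector`), the i8 scaling scheme, and the rule that the candidate count is k multiplied by the oversampling factor are all assumptions made for illustration.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity: higher means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def dot_product(a, b):
    """Dot product: higher means more similar."""
    return sum(x * y for x, y in zip(a, b))

def euclidean(a, b):
    """Euclidean distance: lower means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def quantize_i8(v):
    """Hypothetical i8 scheme: map floats in [-1, 1] to ints in [-127, 127]."""
    return [max(-127, min(127, round(x * 127))) for x in v]

def top_k(query, vectors, k, score=cosine):
    """Exact brute-force top-k; returns indices, best match first."""
    order = sorted(range(len(vectors)),
                   key=lambda i: score(query, vectors[i]), reverse=True)
    return order[:k]

def quantized_search(query, full, quantized, k, oversampling=1.0, rescoring=False):
    """Search the low-precision copies, then optionally re-rank the
    candidates with the original full-precision vectors."""
    # Assumption: the index gathers roughly k * oversampling candidates.
    n_candidates = min(len(full), max(k, math.ceil(k * oversampling)))
    candidates = top_k(quantize_i8(query), quantized, n_candidates)
    if rescoring:
        candidates.sort(key=lambda i: cosine(query, full[i]), reverse=True)
    return candidates[:k]

def random_unit_vector(dim, rng):
    """Random vector scaled to unit length, so every component is in [-1, 1]."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

if __name__ == "__main__":
    rng = random.Random(42)
    full = [random_unit_vector(16, rng) for _ in range(500)]
    quantized = [quantize_i8(v) for v in full]
    query = random_unit_vector(16, rng)

    exact = set(top_k(query, full, 10))
    plain = set(quantized_search(query, full, quantized, 10))
    rescored = set(quantized_search(query, full, quantized, 10,
                                    oversampling=4.0, rescoring=True))
    # Recall: fraction of the exact top-10 that each variant returned.
    print("recall, quantized only:", len(exact & plain) / 10)
    print("recall, oversampling + rescoring:", len(exact & rescored) / 10)
```

In this sketch, a larger `oversampling` value widens the candidate pool that the quantized pass hands to the rescoring step, so more of the true nearest neighbors can be recovered at the cost of extra comparisons. When the pool covers the whole dataset, the rescored result matches the exact search.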

Last updated on 26 Mar 2026.