
Vector Search Glossary

This glossary defines key terms related to Vector Search in ScyllaDB. It covers core concepts essential to understanding how vectors are stored, indexed, and queried.

  • ANN (Approximate Nearest Neighbor) Search — A search technique that efficiently finds the data points in a large dataset that are most similar to a given query vector. Instead of guaranteeing the exact nearest neighbors, ANN speeds up the search by accepting results that are close enough, which makes it well suited to large datasets and high-dimensional vector spaces in applications like semantic search, recommendations, and generative AI.

  • CDC Reader — A Change Data Capture (CDC) consumer that propagates base table mutations to the vector index. ScyllaDB uses two readers: a fine-grained reader with sub-second intervals for low-latency updates, and a wide-framed reader with a 30-second safety interval that ensures consistency.

  • Embedding — A vector generated by a machine learning model to represent raw data in a numerical form. ScyllaDB can store and query embeddings generated by an external tool and inserted into the database.

  • Filtering — The ability to combine an ANN similarity search with predicate constraints on primary key columns. ScyllaDB supports filtering via global vector indexes (with ALLOW FILTERING) and local vector indexes (with partition key restrictions). See Filtering.

  • Global Vector Index — A vector index that spans all partitions and enables cluster-wide similarity search. Global indexes support filtering on the base table’s primary key columns with ALLOW FILTERING. See Global Vector Indexes.

  • HNSW (Hierarchical Navigable Small World) — A graph-based algorithm for Approximate Nearest Neighbor search. It organizes vectors into a multi-layered graph, allowing efficient navigation from coarse to fine resolution. ScyllaDB’s vector index uses the HNSW algorithm implemented by the USearch library.

  • Local Vector Index — A per-partition vector index that is co-located with the data it indexes. Local indexes require the full partition key in the WHERE clause and support efficient filtered similarity search within a single partition. See Local Vector Indexes.

  • Oversampling — An index option (range 1.0-100.0) that controls how many candidate vectors the index evaluates internally before returning the top-k results. Higher oversampling improves recall at the cost of higher latency. See Quantization and Rescoring.

  • Quantization — A technique that reduces index memory usage by storing vectors in lower-precision formats (e.g., f16, i8, b1) instead of the default f32. Lower precision reduces memory but may decrease accuracy; combine with rescoring to recover precision. See Quantization and Rescoring.

  • Rescoring — A post-processing step where the index re-ranks candidate results using the original full-precision vectors from disk, improving accuracy after quantized search. Enabled by setting 'rescoring': 'true' in the index options. See Quantization and Rescoring.

  • Semantic Search — A type of similarity search that compares the meaning of a query and data items using vector embeddings. It enables context-aware retrieval by focusing on semantic relevance rather than exact terms.

  • Similarity Function (Distance Metric) — A mathematical function that measures how close two vectors are. In ScyllaDB, three similarity functions are supported: COSINE (default), DOT_PRODUCT, and EUCLIDEAN. See Choosing a Similarity Function.

  • Similarity Search — A technique for finding items in a dataset that are most similar to a query vector, using a distance or similarity measure. It is commonly used in high-dimensional vector spaces to retrieve approximate matches efficiently.

  • USearch Index — A high-performance, in-memory vector index library developed by Unum, designed for fast approximate nearest-neighbor (ANN) search. ScyllaDB uses USearch as the underlying engine for its Vector Search Index to deliver low-latency similarity queries and efficient memory utilization.

  • Vector — An ordered list of numbers (floats) representing data, such as text, images, or audio, in a way that captures its meaning or features.

  • Vector Search Index — In ScyllaDB, a USearch index built on a vector column that accelerates similarity queries. Unlike traditional indexes (for exact matches or ranges), a Vector Search index is optimized for approximate nearest neighbor (ANN) lookups over high-dimensional data.

  • Vector Type — A native ScyllaDB column type used to store fixed-length numeric vectors directly in a table for similarity search. See Data Types - Vectors in the ScyllaDB documentation.
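
To make the Similarity Function and Similarity Search entries concrete, here is a minimal Python sketch of the three measures named above, plus an exact (brute-force) top-k search of the kind that ANN approximates. The function names and sample vectors are illustrative assumptions, and ScyllaDB may scale or normalize the scores it returns differently from these textbook formulas.

```python
import math

def dot_product(a, b):
    # Sum of element-wise products; larger means more similar.
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Dot product of the two vectors after normalizing each to unit length.
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)

def euclidean_distance(a, b):
    # Straight-line distance; smaller means more similar.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def exact_top_k(query, vectors, k):
    # Brute force: score every stored vector, then keep the k best.
    # An ANN index avoids this full scan at the cost of exactness.
    ranked = sorted(
        vectors,
        key=lambda name: cosine_similarity(query, vectors[name]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical two-dimensional "embeddings" keyed by row name.
vectors = {
    "a": [1.0, 0.0],
    "b": [0.8, 0.6],
    "c": [0.0, 1.0],
}
query = [1.0, 0.1]
nearest = exact_top_k(query, vectors, 2)
print(nearest)
```

The brute-force scan costs one similarity computation per stored vector, which is the cost that a graph-based index such as HNSW is designed to avoid.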

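The Quantization, Oversampling, and Rescoring entries describe a pipeline that can be sketched in a few lines of Python. This is a toy model under stated assumptions, not ScyllaDB's or USearch's implementation: the quantizer is deliberately coarse so that the precision loss is visible, the candidate search is a brute-force scan rather than an HNSW traversal, and treating the candidate count as k multiplied by the oversampling factor is an assumption made for illustration.

```python
import math

def quantize(vector, scale=4):
    # Toy scalar quantizer (assumption): map floats in [-1, 1] to a few
    # integer levels. Real formats such as i8 keep far more levels.
    return [max(-scale, min(scale, round(x * scale))) for x in vector]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def search(query, full_vectors, k, oversampling=1.0, rescoring=False):
    # 1. Search over the low-precision copies of the vectors.
    quantized = {key: quantize(v) for key, v in full_vectors.items()}
    q_query = quantize(query)
    # 2. Oversampling: gather more than k candidates (assumed k * factor).
    n_candidates = min(len(full_vectors), math.ceil(k * oversampling))
    candidates = sorted(
        quantized, key=lambda key: squared_distance(q_query, quantized[key])
    )[:n_candidates]
    # 3. Rescoring: re-rank candidates with the full-precision vectors.
    if rescoring:
        candidates.sort(key=lambda key: squared_distance(query, full_vectors[key]))
    return candidates[:k]

# Hypothetical data: "near" is the true nearest neighbor of the query,
# but rounding makes "far" look identical to the query after quantization.
full_vectors = {
    "near": [0.63, 0.50],
    "far": [0.38, 0.38],
    "other": [-0.90, 0.90],
}
query = [0.5, 0.5]

quantized_only = search(query, full_vectors, k=1)
rescored = search(query, full_vectors, k=1, oversampling=2.0, rescoring=True)
print(quantized_only, rescored)
```

In this sketch the quantized-only search returns the wrong neighbor, while oversampling brings the true neighbor into the candidate set and rescoring moves it to the top. The extra candidates and full-precision comparisons are where the added latency comes from.
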
© 2026, ScyllaDB. All rights reserved. | Terms of Service | Privacy Policy | ScyllaDB, and ScyllaDB Cloud, are registered trademarks of ScyllaDB, Inc.
Last updated on 29 Jun 2026.