Vector Search Concepts¶
This page explains the foundational concepts behind vector search and the architecture of Vector Search in ScyllaDB. It covers both general knowledge — what vectors are, how similarity search works, and how the HNSW algorithm operates — and ScyllaDB-specific details like architecture, CDC-based indexing, and quantization.
Tip
This page focuses on how it works. For practical how-to guidance (CQL syntax, index creation, queries), see Working with Vector Search.
What is Vector Search?¶
Traditional keyword search matches documents by the exact words they contain. If you search for “cheap flights,” you will only find documents that contain those specific terms — missing results that use “affordable airfare” or “low-cost travel,” even though they mean the same thing.
Vector search solves this problem by comparing meaning instead of words. It works in three steps:
Represent data as vectors. An embedding model converts each item (text, image, audio) into a fixed-length list of numbers called a vector. The model is trained so that items with similar meaning are placed close together in a high-dimensional vector space.
Index the vectors. A specialized data structure (in ScyllaDB’s case, an HNSW graph) organizes the vectors so that nearby vectors can be found quickly without scanning every single one.
Search by similarity. Given a query item, compute its vector using the same embedding model, then find the stored vectors that are closest to it in the vector space. These are the most semantically similar items.
Because vectors encode meaning rather than surface-level tokens, “cheap flights” and “affordable airfare” end up close together in vector space and are correctly identified as similar.
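The three steps above can be sketched with a toy example. Real systems use a trained embedding model and an index; here the vectors are hard-coded stand-ins and the search is a brute-force cosine comparison, not HNSW:

```python
import math

# Toy "embeddings": in practice these come from an embedding model,
# which maps semantically similar text to nearby vectors.
documents = {
    "cheap flights":      [0.9, 0.1, 0.0],
    "affordable airfare": [0.8, 0.2, 0.1],
    "gardening tips":     [0.0, 0.1, 0.9],
}

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def search(query_vector, k=2):
    # Brute-force scan: compare the query against every stored vector.
    scored = [(cosine_similarity(query_vector, v), doc)
              for doc, v in documents.items()]
    return [doc for _, doc in sorted(scored, reverse=True)[:k]]

# A query vector close to the two flight-related documents:
print(search([0.85, 0.15, 0.05]))
```

Both flight-related documents rank above "gardening tips" because their vectors point in nearly the same direction as the query.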
Common Use Cases¶
Vector search is used across a wide range of AI and data-intensive applications:
Semantic search — Find documents or passages that match the meaning of a query, not just the keywords.
Retrieval-Augmented Generation (RAG) — Provide relevant context to a large language model (LLM) by retrieving similar documents from a vector store before generating a response.
Recommendation systems — Find items (products, movies, articles) similar to those a user has interacted with.
Image and audio search — Find visually or acoustically similar media using vectors from vision or audio models.
Anomaly detection — Identify data points that are far from all clusters in vector space, flagging them as outliers.
Deduplication — Find near-duplicate records by identifying vectors that are very close together.
How Embeddings Work¶
An embedding is a fixed-length numeric vector that represents a piece of data in a high-dimensional space. Embedding models are trained so that semantically similar items map to vectors that are close together.
The embedding pipeline:
Raw data (text, image, audio, etc.) is fed into an embedding model.
The model outputs a fixed-length vector — for example, a list of 768 floating-point numbers:
[0.12, -0.34, 0.56, ...].
This vector is stored in ScyllaDB and indexed for similarity search.
Caution
You must use the same embedding model for both indexing and querying. Vectors from different models live in incompatible vector spaces and cannot be meaningfully compared. If you change your embedding model, you must re-embed and re-index all your data.
ScyllaDB does not generate embeddings — your application is responsible for calling the embedding model (OpenAI, Cohere, sentence-transformers, etc.) and inserting the resulting vectors. See Choosing an Embedding Model for model selection guidance.
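A minimal sketch of the application-side pipeline. The `embed()` function below is a hypothetical placeholder for whatever model client you actually use (OpenAI, Cohere, sentence-transformers); the key invariant is that indexing and querying go through the same function:

```python
import hashlib

EMBEDDING_DIMENSIONS = 8  # real models output e.g. 384, 768, or 1536 dims

def embed(text: str) -> list[float]:
    # Hypothetical stand-in so the example runs without a model client:
    # hash bytes scaled into floats. NOT a real embedding; replace with
    # a call to your embedding model in production.
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255.0 for b in digest[:EMBEDDING_DIMENSIONS]]

# The same function must be used on both sides of the pipeline:
stored_vector = embed("cheap flights")   # at write/index time
query_vector = embed("cheap flights")    # at query time

assert len(stored_vector) == EMBEDDING_DIMENSIONS
assert stored_vector == query_vector     # same model -> same vector space
```

Swapping `embed()` for a different model invalidates every stored vector, which is why a model change requires re-embedding and re-indexing all data.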
Why You Need an Index¶
Without a vector index, a similarity query must compare the query vector against every stored vector — a brute-force scan with cost \(O(N \times D)\), where \(N\) is the number of vectors and \(D\) is the number of dimensions. For a dataset of 10 million 768-dimensional vectors, that means computing ~7.7 billion floating-point operations per query, which can take seconds.
A vector index (such as HNSW) pre-organizes vectors into a navigable data structure that allows finding approximate nearest neighbors in roughly \(O(\log N)\) time — reducing query latency from seconds to single-digit milliseconds.
In ScyllaDB, the vector index lives in memory on dedicated vector search nodes. This is why instance sizing depends on the number, dimensionality, and quantization level of your vectors.
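The brute-force cost quoted above can be checked with simple arithmetic: one distance computation touches every dimension of both vectors, so a full scan costs roughly \(N \times D\) multiply-accumulate operations (the exact constant depends on the metric):

```python
N = 10_000_000   # stored vectors
D = 768          # dimensions per vector

# One distance computation is ~D multiply-accumulates, and a brute-force
# query must repeat it for all N stored vectors.
ops_per_query = N * D
print(f"{ops_per_query:,} operations")  # 7,680,000,000 -> ~7.7 billion
```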
ANN vs. Exact Search¶
There are two ways to find the nearest neighbors of a query vector:
- Exact (brute-force) KNN search
Compute the distance between the query vector and every vector in the dataset. Returns perfect results (recall = 1.0). Cost: \(O(N \times D)\). Impractical for large datasets — for 10M vectors of 768 dimensions, this requires ~7.7 billion distance computations per query.
- Approximate Nearest Neighbor (ANN) search
Use an index (HNSW in ScyllaDB’s case) to quickly find approximately the nearest neighbors. Trades a small amount of accuracy (recall < 1.0) for dramatically faster search — typically \(O(\log N)\) traversal of the HNSW graph.
Why ANN is the standard: For datasets with millions of vectors and hundreds of dimensions, exact search would take seconds per query. ANN indexes bring this down to single-digit milliseconds.
Understanding Recall¶
Recall is the fraction of true nearest neighbors that the approximate search actually finds. A recall of 0.95 means that 95 out of 100 results match what an exact brute-force search would return.
It is important to understand that low recall does not mean wrong results. The top-k ANN results are all genuinely similar vectors — it is just that a small number of the most similar may be missed. For most applications (semantic search, recommendations, RAG), recall above 0.95 is functionally equivalent to exact search.
You can increase recall by raising the search_beam_width (ef_search) parameter at the cost of higher query latency. See Tuning the Vector Index.
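Recall is easy to measure if you can afford one exact brute-force pass as ground truth. A sketch of the computation:

```python
def recall_at_k(exact_ids, ann_ids):
    """Fraction of the true top-k neighbors that the ANN search returned."""
    exact, ann = set(exact_ids), set(ann_ids)
    return len(exact & ann) / len(exact)

# Ground truth from an exact scan vs. results from the ANN index
# (illustrative row IDs):
exact_top10 = [3, 7, 12, 19, 25, 31, 40, 44, 58, 60]
ann_top10   = [3, 7, 12, 19, 25, 31, 40, 44, 58, 99]  # one true neighbor missed

print(recall_at_k(exact_top10, ann_top10))  # 0.9
```

Note that the one "miss" (ID 99 instead of 60) is still a genuinely similar vector, which is why recall above 0.95 is usually indistinguishable from exact search in practice.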
HNSW Algorithm¶
ScyllaDB’s vector index is based on the Hierarchical Navigable Small World (HNSW) algorithm, a state-of-the-art approach for Approximate Nearest Neighbor search in high-dimensional spaces.
How HNSW Builds the Graph¶
HNSW organizes vectors into a multi-layer graph:
Layer 0 (bottom) contains all vectors. Each vector is connected to up to m of its nearest neighbors, forming a dense proximity graph.
Higher layers contain progressively fewer vectors. When a new vector is inserted, its maximum layer is drawn from an exponentially decaying distribution with level-generation factor \(m_L = 1/\ln(M)\) (where \(M\) = m), so the probability of reaching layer \(l\) or higher is \(M^{-l}\). This means most vectors exist only in layer 0, while a small fraction appear in the higher layers.
Top layers are extremely sparse and act as “express lanes” — allowing the search algorithm to make large jumps across the dataset.
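The layer assignment can be simulated directly. With the default \(M = 16\), roughly 1 in 16 vectors reaches layer 1, 1 in 256 reaches layer 2, and so on. A sketch using the standard HNSW level formula:

```python
import math
import random

M = 16
m_L = 1 / math.log(M)  # level-generation factor from the HNSW paper

def assign_level(rng):
    # Exponentially decaying level distribution: P(level >= l) = M**-l.
    return int(-math.log(rng.random()) * m_L)

rng = random.Random(42)
levels = [assign_level(rng) for _ in range(100_000)]
fraction_promoted = sum(1 for l in levels if l >= 1) / len(levels)
print(f"promoted past layer 0: {fraction_promoted:.3f}")
# Expect roughly 1/16 = 0.0625 of vectors promoted past layer 0.
```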
During construction, each new vector is inserted by:
Starting at the entry point (top layer).
Greedily navigating to the nearest neighbor at each layer.
Descending to the bottom layer.
Connecting the new vector to its ef_construct nearest candidates at each layer where it appears, keeping at most m connections per node.
The quality of the graph depends on how thoroughly the algorithm searches
for neighbors during construction (controlled by ef_construct).
How HNSW Search Works¶
Searching the HNSW graph follows a top-down traversal:
Start at a fixed entry point in the top layer.
At each layer, greedily move to the closest neighbor. When no closer neighbor is found, descend to the next layer.
At the bottom layer (layer 0), expand the search using a beam of width
ef_search: maintain a priority queue of the best candidates found so far, exploring their neighbors until no improvement is possible.Return the top-k results from the candidate pool.
The multi-layer structure means the search makes large jumps in the sparse upper layers (quickly narrowing down the region of interest) and then fine-tunes in the dense bottom layer. This gives HNSW its characteristic \(O(\log N)\) search complexity.
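The bottom-layer beam expansion can be sketched on a toy graph. This is a simplified single-layer version of the search loop described above, assuming a plain adjacency-list graph and Euclidean distance:

```python
import heapq
import math

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def beam_search(graph, vectors, query, entry, ef_search, k):
    """Beam search over one layer: 'graph' maps node -> list of neighbors."""
    visited = {entry}
    # Min-heap of candidates to expand; max-heap (negated) of best results.
    candidates = [(distance(vectors[entry], query), entry)]
    best = [(-candidates[0][0], entry)]
    while candidates:
        dist, node = heapq.heappop(candidates)
        if dist > -best[0][0] and len(best) >= ef_search:
            break  # no remaining candidate can improve the beam
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d = distance(vectors[nb], query)
            if len(best) < ef_search or d < -best[0][0]:
                heapq.heappush(candidates, (d, nb))
                heapq.heappush(best, (-d, nb))
                if len(best) > ef_search:
                    heapq.heappop(best)  # drop the worst of the beam
    return [n for _, n in sorted((-d, n) for d, n in best)][:k]

# Toy layer-0 graph: five 2D points in a chain plus one side branch.
vectors = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0), 3: (3.0, 0.0), 4: (0.0, 1.0)}
graph = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2], 4: [0]}
print(beam_search(graph, vectors, query=(2.9, 0.0), entry=0, ef_search=3, k=2))  # [3, 2]
```

A wider `ef_search` keeps more candidates alive, which explores more of the graph before terminating and thus raises recall at the cost of latency.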
HNSW Parameters Explained¶
The three main HNSW parameters control the trade-off between recall, build speed, query latency, and memory usage. Below, the common HNSW parameter names are listed first, followed by ScyllaDB’s CQL option names in parentheses:
- m (ScyllaDB option: maximum_node_connections, default: 16)
The maximum number of edges (connections) per node in the graph. Think of it as the number of roads between cities — more roads mean more route options (better recall) but higher infrastructure cost (more memory).
Higher m → richer graph → better recall, but more memory per vector and slower inserts.
Lower m → sparser graph → lower memory, but potentially lower recall.
Typical range: 8-64. Use higher values for high-dimensional vectors (>512 dimensions).
- ef_construct (ScyllaDB option: construction_beam_width, default: 128)
Controls how hard the algorithm tries to find the best neighbors when inserting a new vector. A higher value builds a higher-quality graph by evaluating more candidates during construction.
Only affects build time, not query time.
Higher values → better graph quality → higher eventual recall, but slower index builds.
Typical range: 64-512.
- ef_search (ScyllaDB option: search_beam_width, default: 128)
Controls the number of candidates evaluated at query time. This is the main knob for tuning recall vs. query latency.
Higher ef_search → more candidates evaluated → higher recall, but higher query latency.
Lower ef_search → fewer candidates → faster queries, but potentially lower recall.
Must be ≥ k (the LIMIT in your query).
Typical range: 64-512 for most workloads.
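The naming above can be summarized in code. A small sketch mapping the common HNSW names to ScyllaDB's CQL option names with their documented defaults, plus a check for the one hard constraint (ef_search ≥ k):

```python
# Common HNSW name -> (ScyllaDB CQL option name, default), as documented above.
HNSW_OPTIONS = {
    "m":            ("maximum_node_connections", 16),
    "ef_construct": ("construction_beam_width", 128),
    "ef_search":    ("search_beam_width", 128),
}

def validate_query(ef_search: int, k: int) -> None:
    # ef_search bounds the candidate pool, so it must cover the requested top-k.
    if ef_search < k:
        raise ValueError(f"ef_search ({ef_search}) must be >= k ({k})")

validate_query(ef_search=128, k=10)  # fine with the default
```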
Note
You cannot change index parameters after creation. To adjust parameters, drop the index and recreate it. See Tuning the Vector Index for practical guidance.
USearch Engine¶
ScyllaDB uses USearch as the underlying HNSW implementation. USearch is a high-performance, in-memory vector search library developed by Unum that provides:
Efficient memory layout optimized for modern CPUs.
Support for multiple distance metrics (cosine, dot product, Euclidean).
Built-in quantization support (f32, f16, bf16, i8, b1).
Lock-free concurrent reads and writes.
Similarity Functions¶
When you create a vector index, you choose a similarity function that determines how vector distances are computed. The chosen function must match the characteristics of your embedding model. ScyllaDB supports three functions:
- COSINE (default)
Measures the angle between two vectors. Produces values between -1 and 1, where 1 means identical direction. Best for normalized embeddings (unit vectors), which is the output of most text embedding models.
- DOT_PRODUCT
Computes the inner product of two vectors. Suitable when vector magnitudes carry meaning (e.g., popularity-weighted embeddings). Faster than cosine on some workloads when vectors are pre-normalized.
- EUCLIDEAN
Measures the straight-line (L2) distance between two vectors. Values range from 0 (identical) to infinity. Best for spatial data or applications where absolute distance matters.
See Choosing a Similarity Function for practical guidance on when to use each function.
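All three functions are straightforward to implement. A sketch matching the definitions in the next subsection:

```python
import math

def dot_product(a, b):
    # Sensitive to both direction and magnitude.
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Dot product normalized by both magnitudes: direction only.
    return dot_product(a, b) / (
        math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    )

def euclidean_distance(a, b):
    # Straight-line (L2) distance: 0 means identical vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a, b = [1.0, 0.0], [0.0, 1.0]      # orthogonal unit vectors
print(cosine_similarity(a, b))     # 0.0
print(dot_product(a, b))           # 0.0
print(euclidean_distance(a, b))    # sqrt(2) ~= 1.414
```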
Mathematical Definitions¶
For two vectors \(\mathbf{a} = (a_1, a_2, \ldots, a_n)\) and \(\mathbf{b} = (b_1, b_2, \ldots, b_n)\):
Cosine similarity:
\[\text{cosine}(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{a}\| \, \|\mathbf{b}\|} = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2} \; \sqrt{\sum_{i=1}^{n} b_i^2}}\]
Range: \([-1, 1]\). A value of 1 means identical direction; -1 means opposite direction.
Dot product:
\[\mathbf{a} \cdot \mathbf{b} = \sum_{i=1}^{n} a_i b_i\]
Range: \((-\infty, +\infty)\). Sensitive to both direction and magnitude.
Euclidean distance:
\[d(\mathbf{a}, \mathbf{b}) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}\]
Range: \([0, +\infty)\). A value of 0 means identical vectors.
Cosine vs. Dot Product for Normalized Vectors¶
For unit-normalized vectors (vectors with magnitude 1), cosine similarity is mathematically equivalent to the dot product because the denominator \(|\mathbf{a}| \cdot |\mathbf{b}| = 1\). This means:
If your embedding model outputs normalized vectors (most text embedding models do), both COSINE and DOT_PRODUCT give identical results. DOT_PRODUCT may be slightly faster in this case because it avoids the normalization division.
If vector magnitudes carry meaning (e.g., you multiply vectors by a confidence score or popularity weight), use DOT_PRODUCT — cosine similarity discards magnitude information.
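The equivalence is easy to verify numerically. A sketch that normalizes vectors to unit length and shows cosine similarity and dot product agree, while scaling breaks only the dot product:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a = normalize([3.0, 4.0])   # unit vector: [0.6, 0.8]
b = normalize([1.0, 2.0])

# For unit vectors the denominator is 1, so cosine == dot product.
assert math.isclose(cosine(a, b), dot(a, b))

# Scaling a vector changes the dot product but not the cosine similarity:
scaled = [6.0, 8.0]  # 10x the vector behind a
assert math.isclose(cosine(scaled, b), cosine(a, b))
assert not math.isclose(dot(scaled, b), dot(a, b))
```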
Architecture Overview¶
ScyllaDB Vector Search uses a dedicated node architecture. Vector Search nodes are separate from the regular ScyllaDB storage nodes and are deployed alongside your cluster. This separation provides several benefits:
Independent scaling — vector search capacity can be scaled without affecting storage node performance.
Resource isolation — memory-intensive vector indexes do not compete with storage workloads for resources.
Flexible instance types — vector search nodes can use memory-optimized instances suited for in-memory indexes.
When you enable Vector Search on a cluster, ScyllaDB Cloud deploys one or more vector search nodes per Availability Zone. These nodes host the in-memory vector indexes and handle all ANN query traffic.
Data Flow¶
The lifecycle of a vector query follows these steps:
Write path — Your application inserts or updates rows containing vector data in a regular ScyllaDB table. The write goes to the storage nodes.
CDC propagation — Changes are captured by the CDC (Change Data Capture) subsystem and propagated to the vector search nodes. See CDC-Based Indexing below.
Index update — The vector search node updates its in-memory HNSW index with the new or modified vector.
Query path — When a client issues an
ORDER BY ... ANN OFquery, the coordinator routes the ANN portion to the vector search nodes, which traverse the HNSW graph and return the top-k candidate row keys.Result assembly — The coordinator fetches the full rows from the storage nodes and returns them to the client.
CDC-Based Indexing¶
Vector indexes are updated asynchronously through ScyllaDB’s Change Data Capture (CDC) mechanism. When you insert, update, or delete a row in the base table, the change is recorded in a CDC log. Vector search nodes consume this log to keep the in-memory index synchronized.
ScyllaDB uses a dual CDC reader system:
Fine-grained reader — Polls the CDC log at high frequency (sub-second intervals) and applies changes as soon as they appear. This provides low-latency propagation, with typical p50 latency under 1 second.
Wide-framed reader — Operates on a longer interval (~30 seconds) to sweep up any changes that the fine-grained reader might have missed. This ensures eventual consistency and acts as a safety net.
The dual-reader design means that:
Newly inserted vectors are typically searchable within ~1 second (fine-grained reader latency).
In the worst case, a change becomes visible within ~30 seconds (wide-framed reader interval).
The system remains consistent even under node failures or temporary disruptions.
Tablets and Data Distribution¶
Vector Search requires tablets-enabled keyspaces. Tablets are ScyllaDB’s modern data distribution mechanism that provides:
Fine-grained load balancing across nodes.
Dynamic rebalancing without full-cluster streaming.
Improved support for indexes, including vector indexes.
All ScyllaDB versions currently used by ScyllaDB Cloud enable tablets by default. See Tablets Requirement for details.
Quantization and Memory¶
Full-precision (f32) vector indexes store each dimension as a 4-byte float. For large datasets or high-dimensional vectors, this can require significant memory. Quantization reduces memory usage by representing vectors in lower-precision formats:
| Level | Bytes per dimension | Notes |
|---|---|---|
| f32 | 4 | Full precision (default). Best recall, highest memory. |
| f16 | 2 | Half precision. Good trade-off for most workloads. |
| bf16 | 2 | Brain float. Optimized for ML model outputs. |
| i8 | 1 | 8-bit integer. Significant memory savings with moderate recall loss. |
| b1 | 0.125 | Binary (1-bit). Maximum compression. Best with rescoring enabled. |
Combine quantization with oversampling and rescoring to recover accuracy lost through lower precision. See Quantization and Rescoring for practical configuration guidance.
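The table translates directly into a rough memory estimate for the raw vector data. Index overhead, such as the per-node edge lists in the HNSW graph, comes on top and is not modeled here; see the Sizing Guide for full estimates:

```python
# Bytes per dimension for each quantization level, per the table above.
BYTES_PER_DIM = {"f32": 4, "f16": 2, "bf16": 2, "i8": 1, "b1": 0.125}

def raw_vector_memory_gb(num_vectors: int, dimensions: int, level: str) -> float:
    """Memory for the vector data alone, excluding HNSW graph overhead."""
    return num_vectors * dimensions * BYTES_PER_DIM[level] / 1024**3

# 10 million 768-dimensional vectors at each quantization level:
for level in BYTES_PER_DIM:
    print(f"{level:>4}: {raw_vector_memory_gb(10_000_000, 768, level):7.2f} GiB")
```

For this dataset, f32 needs roughly 28.6 GiB of vector data while b1 needs under 1 GiB, which is why binary quantization is usually paired with rescoring to recover accuracy.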
What’s Next¶
Quick Start Guide — hands-on walkthrough of your first similarity search.
Working with Vector Search — CQL syntax for vector tables, indexes, and ANN queries.
Sizing Guide — estimate memory requirements and choose instance types.
Filtering — combine ANN search with predicate constraints.
Quantization and Rescoring — reduce index memory while maintaining quality.