Working with Vector Search¶
This page provides a technical overview of how to work with vector search in ScyllaDB. It introduces the new vector data type, the vector index type, and the syntax for performing similarity searches using the ANN OF query.
Workflow¶
Create a keyspace with tablets disabled.
Create a table with a vector-typed column to store your embedding vectors.
Insert vector data (embeddings) into the table.
Create a vector index on the vector column to enable efficient similarity search.
Perform similarity searches using the ANN OF query to find vectors most similar to your input.
Vector Data Type¶
The VECTOR data type allows you to store fixed-length numeric vectors as
a native column type in ScyllaDB tables. These vectors can represent embedding
vectors or other high-dimensional numeric data used for similarity search.
Syntax:
vector<element_type, dimension> (e.g., vector<float, 768>)
Element types: Typically floating-point types (e.g., float).
Dimensions: Supports vectors with dimensionality ranging from 1 up to 16,000.
This vector data type integrates with ScyllaDB’s native protocol (v5) and is fully supported by the CQL interface.
See Data Types - Vectors in the ScyllaDB documentation for details.
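The constraints above can also be checked in application code before a schema is created. The following Python sketch assumes only the rules stated on this page (an element type plus a dimension from 1 to 16,000) and parses a type string such as vector<float, 768>. `parse_vector_type` is a hypothetical helper, not part of ScyllaDB or its drivers, and it does not check which element types ScyllaDB actually accepts.

```python
import re

# Hypothetical validator mirroring the rules on this page; not a ScyllaDB API.
VECTOR_TYPE_RE = re.compile(r"^\s*vector\s*<\s*(\w+)\s*,\s*(\d+)\s*>\s*$", re.IGNORECASE)
MIN_DIMENSION, MAX_DIMENSION = 1, 16_000


def parse_vector_type(type_str: str) -> tuple[str, int]:
    """Split a type string such as 'vector<float, 768>' into ('float', 768)."""
    match = VECTOR_TYPE_RE.match(type_str)
    if match is None:
        raise ValueError(f"not a vector type: {type_str!r}")
    element_type, dimension = match.group(1).lower(), int(match.group(2))
    # Enforce the documented dimensionality range (1 up to 16,000).
    if not MIN_DIMENSION <= dimension <= MAX_DIMENSION:
        raise ValueError(f"dimension {dimension} is outside {MIN_DIMENSION}..{MAX_DIMENSION}")
    return element_type, dimension


print(parse_vector_type("vector<float, 768>"))  # ('float', 768)
```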
Table with Vector Column¶
You can store and query vectors in a table that contains a vector-typed column
(e.g., vector<float, 64>). In the current beta release, the table must be
created in a keyspace with tablets disabled. CDC is enabled automatically for
vector indexes; no manual CDC configuration is required.
In the following example, a myapp keyspace is created with tablets disabled:
CREATE KEYSPACE myapp WITH replication = { 'class': 'NetworkTopologyStrategy', 'replication_factor': 3 } AND tablets = { 'enabled': false };
In the following example, a comments table is created in the myapp keyspace.
In addition to columns for storing and identifying comments (commenter name, comment text,
comment ID, etc.), it has a comment_vector vector-typed column for storing
64-dimensional vectors of float type.
CREATE TABLE IF NOT EXISTS myapp.comments (
record_id timeuuid,
id uuid,
commenter text,
comment text,
comment_vector vector<float, 64>,
created_at timestamp,
PRIMARY KEY (id, created_at)
);
Embeddings¶
Embeddings are fixed-length numeric vectors that represent data (such as text, images, or audio) in a high-dimensional space, capturing its semantic or structural meaning. They are typically generated by external machine learning or deep learning models trained for tasks like semantic search, recommendation, or classification.
While ScyllaDB does not generate embeddings, it can efficiently store and query
embeddings produced by external tools or frameworks. These vectors can be
inserted into tables using the VECTOR data type and queried using Approximate
Nearest Neighbor (ANN) search via vector indexes.
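To make the idea of a fixed-length vector concrete without depending on any particular model, the Python sketch below uses feature hashing, a simple non-learned technique, to map text to a 64-dimensional unit-length vector. It is a stand-in only: it captures word overlap, not semantic meaning, and `toy_embedding` is a hypothetical name. In a real application, this is where you would call your external embedding model.

```python
import hashlib
import math


def toy_embedding(text: str, dimension: int = 64) -> list[float]:
    """Map text to a fixed-length, unit-norm vector with feature hashing.

    Illustration only: unlike a trained model, this captures word overlap,
    not semantic meaning.
    """
    vector = [0.0] * dimension
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dimension  # component this token updates
        sign = 1.0 if digest[4] % 2 == 0 else -1.0               # random signs keep collisions from biasing inner products
        vector[bucket] += sign
    norm = math.sqrt(sum(x * x for x in vector))
    # Normalize so that vectors for texts of different lengths are comparable.
    return [x / norm for x in vector] if norm else vector


embedding = toy_embedding("I like vector search in ScyllaDB.")
print(len(embedding))  # 64
```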
To insert an embedding vector into a ScyllaDB table, use a standard INSERT
statement with a list of numeric values matching the vector column’s defined
dimension and data type.
Example:
INSERT INTO myapp.comments (
record_id,
id,
commenter,
comment,
comment_vector,
created_at
) VALUES (
now(),
uuid(),
'Alice',
'I like vector search in ScyllaDB.',
[0.12, 0.34, 0.56, 0.78, 0.91, 0.15, 0.62, 0.48,
0.22, 0.31, 0.40, 0.67, 0.53, 0.84, 0.19, 0.72,
0.63, 0.54, 0.26, 0.33, 0.11, 0.09, 0.27, 0.41,
0.69, 0.82, 0.57, 0.38, 0.71, 0.46, 0.55, 0.64,
0.17, 0.81, 0.23, 0.95, 0.66, 0.35, 0.44, 0.59,
0.02, 0.75, 0.28, 0.16, 0.92, 0.88, 0.47, 0.13,
0.99, 0.21, 0.32, 0.83, 0.45, 0.04, 0.86, 0.25,
0.36, 0.73, 0.07, 0.61, 0.52, 0.14, 0.68, 0.05],
toTimestamp(now())
);
The vector must match the dimension and element type declared in the table schema (e.g., vector<float, 64>).
All vector values must be numeric (e.g., float) and enclosed in square brackets.
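The rules above (matching dimension, numeric values, square brackets) can be sketched in Python as a helper that renders an embedding as a CQL vector literal and builds the INSERT statement from the example. `to_cql_vector_literal`, `cql_text`, and `build_insert` are hypothetical names; rejecting NaN and infinity is an extra, conservative assumption of this sketch; and string formatting is used only to show the literal's shape. Application code would normally pass values through the driver's prepared statements instead.

```python
import math
from collections.abc import Sequence


def to_cql_vector_literal(values: Sequence[float], dimension: int) -> str:
    """Render numbers as a CQL vector literal, e.g. [0.12, 0.34, ...]."""
    if len(values) != dimension:
        raise ValueError(f"expected {dimension} values, got {len(values)}")
    for value in values:
        # Numeric values only; rejecting NaN/infinity is an extra assumption here.
        if isinstance(value, bool) or not isinstance(value, (int, float)) or not math.isfinite(value):
            raise ValueError(f"not a finite number: {value!r}")
    return "[" + ", ".join(repr(float(v)) for v in values) + "]"


def cql_text(value: str) -> str:
    """Quote a CQL text literal; single quotes are escaped by doubling them."""
    return "'" + value.replace("'", "''") + "'"


def build_insert(commenter: str, comment: str, embedding: Sequence[float]) -> str:
    """Build the INSERT from the example above for the 64-dimensional comments table."""
    return (
        "INSERT INTO myapp.comments "
        "(record_id, id, commenter, comment, comment_vector, created_at) "
        f"VALUES (now(), uuid(), {cql_text(commenter)}, {cql_text(comment)}, "
        f"{to_cql_vector_literal(embedding, 64)}, toTimestamp(now()));"
    )


print(to_cql_vector_literal([0.5, 1], 2))  # [0.5, 1.0]
```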
Vector Index Type¶
Before you query the data, you need to create a vector index to enable fast similarity search over vector columns. This index type is based on the HNSW (Hierarchical Navigable Small World) algorithm and supports Approximate Nearest Neighbor (ANN) search with configurable similarity functions.
Creation: Use a custom index on a vector column.
Similarity functions supported: DOT_PRODUCT, COSINE (default), and EUCLIDEAN.
Index parameters: Tunable HNSW parameters such as m (maximum node connections), ef_construct (construction beam width), and ef_search (search beam width).
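For reference, the three similarity functions correspond to the standard definitions sketched below in Python. The sketch shows only the math and which direction counts as "more similar"; it does not reproduce how ScyllaDB computes or normalizes scores internally, and the function names are illustrative.

```python
import math
from collections.abc import Sequence


def _check_same_dimension(a: Sequence[float], b: Sequence[float]) -> None:
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """DOT_PRODUCT: higher means more similar; grows with vector magnitude."""
    _check_same_dimension(a, b)
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """COSINE: dot product of the normalized vectors, from -1 (opposite) to 1 (same direction)."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot_product(a, b) / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """EUCLIDEAN: straight-line distance; lower means more similar."""
    _check_same_dimension(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


a, b, c = [1.0, 0.0], [2.0, 0.0], [0.0, 1.0]
print(dot_product(a, b), cosine_similarity(a, b), euclidean_distance(a, b))  # 2.0 1.0 1.0
print(cosine_similarity(a, c))  # 0.0
```

One consequence worth noting: for unit-length vectors, the dot product equals the cosine similarity and the squared Euclidean distance equals 2 - 2 * cosine, so all three functions rank results identically. They diverge when vector magnitudes vary.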
Example:
CREATE CUSTOM INDEX IF NOT EXISTS ann_idx
ON myapp.comments(comment_vector)
USING 'vector_index'
WITH OPTIONS = { 'similarity_function': 'DOT_PRODUCT' };
See Global Secondary Indexes - Vector Index in the ScyllaDB documentation for details.
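The index options above are HNSW parameters. The Python sketch below is not HNSW: it builds a flat, single-layer neighbour graph by brute force and then runs the greedy beam search that HNSW performs within each layer. It is meant only to show how m bounds the links created per node and how a beam of ef candidates (the role of ef_search) controls how much of the graph a query explores. Function names are hypothetical and Euclidean distance is used for simplicity.

```python
import heapq
import math


def build_neighbor_graph(points: list[list[float]], m: int) -> dict[int, set[int]]:
    """Link every point to its m nearest neighbours by brute force (not HNSW construction).

    Links are added in both directions, so a node can end up with more than m links;
    real HNSW prunes links to respect its limit.
    """
    graph: dict[int, set[int]] = {i: set() for i in range(len(points))}
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        for j in others[:m]:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def beam_search(points: list[list[float]], graph: dict[int, set[int]],
                query: list[float], k: int, ef: int, entry: int = 0) -> list[int]:
    """Greedy best-first search that keeps the ef closest nodes found so far."""
    d0 = math.dist(query, points[entry])
    candidates = [(d0, entry)]  # min-heap: closest unexpanded node first
    best = [(-d0, entry)]       # max-heap via negation: worst kept node on top
    visited = {entry}
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(best) >= ef and d > -best[0][0]:
            break  # every remaining candidate is farther than the worst kept node
        for neighbor in graph[node]:
            if neighbor in visited:
                continue
            visited.add(neighbor)
            dn = math.dist(query, points[neighbor])
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(candidates, (dn, neighbor))
                heapq.heappush(best, (-dn, neighbor))
                if len(best) > ef:
                    heapq.heappop(best)  # evict the current worst node
    return [node for _, node in sorted((-neg_d, node) for neg_d, node in best)][:k]


points = [[float(i)] for i in range(10)]
graph = build_neighbor_graph(points, m=2)
print(beam_search(points, graph, [7.2], k=3, ef=3))  # [7, 8, 6]
```

Raising ef makes the search examine more nodes and, in general, return results closer to the exact answer, at the cost of more distance computations. ef_construct plays a similar role while HNSW builds its graph, a step this sketch replaces with a brute-force build.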
ANN OF Queries¶
Approximate Nearest Neighbor (ANN) is a search technique used to find data points in large, high-dimensional datasets that are most similar to a given query vector. Rather than computing exact distances for all entries, ANN algorithms trade off a small amount of accuracy for significant speed improvements, returning results that are sufficiently similar. This makes ANN especially effective for applications like semantic search, recommendations, image and audio retrieval, and generative AI, where real-time response and scalability are critical.
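The accuracy that ANN trades away is commonly measured as recall: the share of the exact top-k neighbours that the approximate search also returns. The Python sketch below computes the exact answer with a full scan (the cost ANN avoids) and the recall of an approximate result against it. It uses Euclidean distance and hypothetical function names; it could, for example, be used offline to spot-check results returned by an ANN OF query.

```python
import heapq
import math
from collections.abc import Sequence


def exact_top_k(vectors: Sequence[Sequence[float]], query: Sequence[float], k: int) -> list[int]:
    """Exact k nearest neighbours by Euclidean distance: compares the query with every vector."""
    return heapq.nsmallest(k, range(len(vectors)), key=lambda i: math.dist(vectors[i], query))


def recall_at_k(approx_ids: Sequence[int], exact_ids: Sequence[int]) -> float:
    """Share of the exact top-k neighbours that an approximate search also returned."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


vectors = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [0.0, 2.0]]
exact = exact_top_k(vectors, [0.1, 0.0], k=2)  # [0, 1]
print(recall_at_k([0, 3], exact))  # 0.5
```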
Once a vector index is created on a VECTOR-typed column, you can use
the ANN OF query to perform ANN searches. This query allows you to
efficiently retrieve the top-k rows with vectors most similar to a given input
vector, using the similarity function defined while creating the vector index.
Syntax:
SELECT column1, column2, ...
FROM keyspace.table
ORDER BY vector_column ANN OF [v1, v2, ..., vn]
LIMIT k;
vector_column: The name of the indexed vector column used for similarity search.
[v1, …, vn]: The input query vector. It must match the dimensionality of the indexed column.
k: The number of nearest neighbors to return (required).
The query returns up to k rows whose vectors are most similar to the input vector, ranked according to the similarity function
defined in the index (COSINE, DOT_PRODUCT, or EUCLIDEAN).
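The syntax above can be assembled programmatically. This Python sketch checks the two constraints listed (matching dimensionality and a required k) before rendering the query text. `build_ann_query` is a hypothetical helper, requiring k to be a positive integer is an assumption of the sketch, and table and column names are interpolated verbatim, so they must be trusted, hard-coded values and never user input.

```python
from collections.abc import Sequence


def build_ann_query(table: str, columns: Sequence[str], vector_column: str,
                    query_vector: Sequence[float], dimension: int, k: int) -> str:
    """Render a SELECT ... ORDER BY ... ANN OF ... LIMIT k query as text."""
    # The input vector must have the same dimensionality as the indexed column.
    if len(query_vector) != dimension:
        raise ValueError(f"query vector has {len(query_vector)} values, expected {dimension}")
    # LIMIT k is required; requiring a positive integer is an assumption of this sketch.
    if isinstance(k, bool) or not isinstance(k, int) or k < 1:
        raise ValueError("k must be a positive integer")
    literal = "[" + ", ".join(repr(float(v)) for v in query_vector) + "]"
    # Identifiers are inserted verbatim: pass only trusted, hard-coded names.
    return (f"SELECT {', '.join(columns)} FROM {table} "
            f"ORDER BY {vector_column} ANN OF {literal} LIMIT {k};")


print(build_ann_query("myapp.comments", ["id", "comment"], "comment_vector", [0.5, 0.25], 2, 5))
# SELECT id, comment FROM myapp.comments ORDER BY comment_vector ANN OF [0.5, 0.25] LIMIT 5;
```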
Example:
SELECT id, commenter, comment, created_at
FROM myapp.comments
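ORDER BY comment_vector ANN OF [
0.12, 0.34, 0.56, 0.78, 0.91, 0.15, 0.62, 0.48,
0.22, 0.31, 0.40, 0.67, 0.53, 0.84, 0.19, 0.72,
0.63, 0.54, 0.26, 0.33, 0.11, 0.09, 0.27, 0.41,
0.69, 0.82, 0.57, 0.38, 0.71, 0.46, 0.55, 0.64,
0.17, 0.81, 0.23, 0.95, 0.66, 0.35, 0.44, 0.59,
0.02, 0.75, 0.28, 0.16, 0.92, 0.88, 0.47, 0.13,
0.99, 0.21, 0.32, 0.83, 0.45, 0.04, 0.86, 0.25,
0.36, 0.73, 0.07, 0.61, 0.52, 0.14, 0.68, 0.05
]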
LIMIT 5;
See Data Manipulation - SELECT - Vector Queries in the ScyllaDB documentation for details.
CQL Limitations in Beta Release¶
ANN OF is only supported in ORDER BY clauses.
Filtering is not available, because ANN OF is not supported with WHERE clauses in vector queries.
The ALTER INDEX statement is not supported for vector indexes. You cannot modify index options after the index has been created. To change these settings, drop the existing index and recreate it with the updated configuration.
Time to Live (TTL) is not supported. This means that:
Creating a vector index on a table with a TTL set by default_time_to_live will be rejected.
Changing the TTL for a table with a vector index is ignored.
Writes with TTL on a column with a vector index are ignored (TTL on other columns is accepted).
Rows that exist when the index build is scheduled and have a TTL set on the indexed column are indexed.
Local indexes are not supported.
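Because ANN OF cannot be combined with WHERE in this release, one possible application-side pattern (not a ScyllaDB feature) is to request a larger LIMIT and filter the returned rows in the client. The Python sketch below assumes the rows arrive as dictionaries already in ANN rank order; `post_filter` is a hypothetical helper, how much to over-fetch is a tuning choice left to you, and fewer than k rows may survive when the filter is selective.

```python
from collections.abc import Callable, Iterable, Mapping
from typing import Any

Row = Mapping[str, Any]


def post_filter(ranked_rows: Iterable[Row], predicate: Callable[[Row], bool], k: int) -> list[Row]:
    """Keep at most k rows that satisfy predicate, preserving the ANN ranking order."""
    if k <= 0:
        return []
    kept: list[Row] = []
    for row in ranked_rows:
        if predicate(row):
            kept.append(row)
            if len(kept) == k:
                break
    return kept


# Rows as they might come back from an ANN OF query with an enlarged LIMIT.
rows = [
    {"id": 1, "commenter": "Alice"},
    {"id": 2, "commenter": "Bob"},
    {"id": 3, "commenter": "Alice"},
]
print(post_filter(rows, lambda r: r["commenter"] == "Alice", k=5))  # only two rows survive
```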