Working with Vector Search¶

This page provides a technical overview of how to work with vector search in ScyllaDB. It introduces the new vector data type, the vector index type, and the syntax for performing similarity searches using the ANN OF query.

Workflow¶

  1. Create a keyspace with tablets disabled.

  2. Create a table with a vector-typed column to store your embedding vectors.

  3. Insert vector data (embeddings) into the table.

  4. Create a vector index on the vector column to enable efficient similarity search.

  5. Perform similarity searches using the ANN OF query to find vectors most similar to your input.

Vector Data Type¶

The VECTOR data type allows you to store fixed-length numeric vectors as a native column type in ScyllaDB tables. These vectors can represent embedding vectors or other high-dimensional numeric data used for similarity search.

  • Syntax: vector<element_type, dimension> (e.g. vector<float, 768>)

  • Element types: Typically floating-point types (e.g., float).

  • Dimensions: Supports vectors with dimensionality ranging from 1 up to 16,000.

This vector data type integrates with ScyllaDB’s native protocol (v5) and is fully supported by the CQL interface.

See Data Types - Vectors in the ScyllaDB documentation for details.

Table with Vector Column¶

You can store and query vectors in a table that contains a vector-typed column (e.g., vector<float, 64>). In the current beta release, the table must be created in a keyspace with tablets disabled. CDC is enabled automatically for vector indexes; no manual CDC configuration is required.

In the following example, a myapp keyspace is created with tablets disabled:

CREATE KEYSPACE myapp
WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'replication_factor': 3
}
AND tablets = {
   'enabled': false
};

In the following example, a comments table is created in the myapp keyspace. In addition to columns for storing and identifying comments (commenter name, comment text, comment ID, etc.), it has a comment_vector vector-typed column for storing 64-dimensional vectors of float type.

CREATE TABLE IF NOT EXISTS myapp.comments (
  record_id timeuuid,
  id uuid,
  commenter text,
  comment text,
  comment_vector vector<float, 64>,
  created_at timestamp,
  PRIMARY KEY (id, created_at)
);

Embeddings¶

Embeddings are fixed-length numeric vectors that represent data - such as text, images, or audio - in a high-dimensional space, capturing their semantic or structural meaning. They are typically generated by external machine learning or deep learning models trained for tasks like semantic search, recommendation, or classification.

While ScyllaDB does not generate embeddings, it can efficiently store and query embeddings produced by external tools or frameworks. These vectors can be inserted into tables using the VECTOR data type and queried using Approximate Nearest Neighbor (ANN) search via vector indexes.
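
To make "fixed-length" concrete, here is a minimal Python sketch of a toy embedding function. It is not a real embedding model: `toy_embedding` is a hypothetical helper that hashes words into 64 buckets, and it only illustrates that inputs of any length map to a vector of the same dimension. In practice the vectors would come from an external model, as described above.

```python
import hashlib
import math

DIMENSION = 64  # matches vector<float, 64> used in the examples on this page


def toy_embedding(text: str, dimension: int = DIMENSION) -> list[float]:
    """Map text to a fixed-length, L2-normalized vector (toy stand-in for a model)."""
    vector = [0.0] * dimension
    for token in text.lower().split():
        # A stable hash (unlike the built-in hash()) picks the bucket for each token.
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dimension
        vector[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vector))
    # Leave an all-zero vector (empty text) unchanged to avoid dividing by zero.
    return [x / norm for x in vector] if norm else vector


short = toy_embedding("I like vector search")
longer = toy_embedding("I like vector search in ScyllaDB and use it for semantic search")
print(len(short), len(longer))  # both are 64, regardless of text length
```

Unlike a trained model, this toy function captures no semantic meaning; it only shows the shape of the data that ends up in the vector column.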

To insert an embedding vector into a ScyllaDB table, use a standard INSERT statement with a list of numeric values matching the vector column’s defined dimension and data type.

Example:

INSERT INTO myapp.comments (
    record_id,
    id,
    commenter,
    comment,
    comment_vector,
    created_at
) VALUES (
    now(),
    uuid(),
    'Alice',
    'I like vector search in ScyllaDB.',
    [0.12, 0.34, 0.56, 0.78, 0.91, 0.15, 0.62, 0.48,
     0.22, 0.31, 0.40, 0.67, 0.53, 0.84, 0.19, 0.72,
     0.63, 0.54, 0.26, 0.33, 0.11, 0.09, 0.27, 0.41,
     0.69, 0.82, 0.57, 0.38, 0.71, 0.46, 0.55, 0.64,
     0.17, 0.81, 0.23, 0.95, 0.66, 0.35, 0.44, 0.59,
     0.02, 0.75, 0.28, 0.16, 0.92, 0.88, 0.47, 0.13,
     0.99, 0.21, 0.32, 0.83, 0.45, 0.04, 0.86, 0.25,
     0.36, 0.73, 0.07, 0.61, 0.52, 0.14, 0.68, 0.05],
    toTimestamp(now())
);
  • The vector must match the dimension and element type declared in the table schema, e.g., vector<float, 64>.

  • All vector values must be numeric (e.g., float), and enclosed in square brackets.
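
The two rules above can be checked on the client before a statement is sent. The following Python sketch assumes a hypothetical helper, `to_cql_vector_literal`, that validates an embedding and renders it in the square-bracket form used in the INSERT example. Rejecting non-finite values (NaN, infinity) is a conservative assumption of this sketch, not a documented ScyllaDB rule.

```python
import math
from numbers import Real


def to_cql_vector_literal(values, dimension):
    """Validate an embedding and render it as a CQL list literal such as [0.1, 0.2]."""
    if len(values) != dimension:
        raise ValueError(f"expected {dimension} values, got {len(values)}")
    rendered = []
    for value in values:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, Real):
            raise ValueError(f"non-numeric vector element: {value!r}")
        # Assumption of this sketch: NaN and infinity are treated as invalid input.
        if not math.isfinite(value):
            raise ValueError(f"vector element is not finite: {value!r}")
        rendered.append(repr(float(value)))
    return "[" + ", ".join(rendered) + "]"


print(to_cql_vector_literal([0.12, 0.34, 1], 3))  # [0.12, 0.34, 1.0]
```

If your driver can bind vector values as query parameters, prefer that over building literals by hand; the same validation still catches dimension mismatches early.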

Vector Index Type¶

Before you query the data, you need to create a vector index to enable fast similarity search over vector columns. This index type is based on the HNSW (Hierarchical Navigable Small World) algorithm and supports Approximate Nearest Neighbor (ANN) search with configurable similarity functions.

  • Creation: Use a custom index on a vector column.

  • Similarity functions supported: DOT_PRODUCT, COSINE (default), and EUCLIDEAN.

  • Index parameters: Tunable HNSW parameters such as m (maximum node connections), ef_construct (construction beam width), and ef_search (search beam width).

Example:

CREATE CUSTOM INDEX IF NOT EXISTS ann_idx
ON myapp.comments(comment_vector)
USING 'vector_index'
WITH OPTIONS = { 'similarity_function': 'DOT_PRODUCT' };

See Global Secondary Indexes - Vector Index in the ScyllaDB documentation for details.
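
The choice of similarity function can change which vectors count as "closest". The Python sketch below uses the standard textbook definitions of the three measures on hypothetical 2-dimensional vectors. How ScyllaDB computes or scales these scores internally is not covered here, so treat this as an illustration of the math only.

```python
import math


def dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    # Dot product of the vectors divided by the product of their lengths.
    return dot_product(a, b) / (math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b)))


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = [1.0, 0.0]
same_direction = [3.0, 0.0]  # same direction as the query, but longer
orthogonal = [0.0, 1.0]      # unrelated direction, same length as the query

for name, candidate in [("same_direction", same_direction), ("orthogonal", orthogonal)]:
    print(
        name,
        dot_product(query, candidate),
        cosine_similarity(query, candidate),
        euclidean_distance(query, candidate),
    )
```

With these inputs, cosine similarity ranks same_direction first (1.0 versus 0.0), while Euclidean distance treats orthogonal as closer (about 1.414 versus 2.0). This is why the similarity function should match how your embedding model was trained.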

ANN OF Queries¶

Approximate Nearest Neighbor (ANN) is a search technique used to find data points in large, high-dimensional datasets that are most similar to a given query vector. Rather than computing exact distances for all entries, ANN algorithms trade off a small amount of accuracy for significant speed improvements, returning results that are sufficiently similar. This makes ANN especially effective for applications like semantic search, recommendations, image and audio retrieval, and generative AI, where real-time response and scalability are critical.

Once a vector index is created on a VECTOR-typed column, you can use the ANN OF query to perform ANN searches. This query allows you to efficiently retrieve the top-k rows with vectors most similar to a given input vector, using the similarity function defined while creating the vector index.

Syntax:

SELECT column1, column2, ...
FROM keyspace.table
ORDER BY vector_column ANN OF [v1, v2, ..., vn]
LIMIT k;
  • vector_column: The name of the indexed vector column used for similarity search.

  • [v1, …, vn]: The input query vector. It must match the dimensionality of the indexed column.

  • k: The number of nearest neighbors to return. The LIMIT clause is required.

The query returns up to k rows whose vectors are most similar to the input vector, ranked according to the similarity function defined in the index (COSINE, DOT_PRODUCT, or EUCLIDEAN).

Example:

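SELECT id, commenter, comment, created_at
FROM myapp.comments
ORDER BY comment_vector ANN OF [0.12, 0.34, 0.56, 0.78, 0.91, 0.15, 0.62, 0.48,
     0.22, 0.31, 0.40, 0.67, 0.53, 0.84, 0.19, 0.72,
     0.63, 0.54, 0.26, 0.33, 0.11, 0.09, 0.27, 0.41,
     0.69, 0.82, 0.57, 0.38, 0.71, 0.46, 0.55, 0.64,
     0.17, 0.81, 0.23, 0.95, 0.66, 0.35, 0.44, 0.59,
     0.02, 0.75, 0.28, 0.16, 0.92, 0.88, 0.47, 0.13,
     0.99, 0.21, 0.32, 0.83, 0.45, 0.04, 0.86, 0.25,
     0.36, 0.73, 0.07, 0.61, 0.52, 0.14, 0.68, 0.05]
LIMIT 5;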

See Data Manipulation - SELECT - Vector Queries in the ScyllaDB documentation for details.
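
To show what "top-k most similar" means, the Python sketch below runs the exact, brute-force search that ANN approximates: it scores every row against the query and keeps the k best. The data, the `exact_top_k` helper, and the use of cosine similarity are assumptions made for illustration. A vector index avoids this full scan, which is where the speed-for-accuracy trade-off comes from.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_top_k(rows, query, k):
    """Score every (row_id, vector) pair and return the ids of the k most similar rows."""
    ranked = sorted(rows, key=lambda row: cosine_similarity(row[1], query), reverse=True)
    return [row_id for row_id, _ in ranked[:k]]


# Hypothetical 2-dimensional data; real embeddings have many more dimensions.
rows = [
    ("a", [1.0, 0.0]),
    ("b", [0.0, 1.0]),
    ("c", [1.0, 1.0]),
    ("d", [-1.0, 0.0]),
]
print(exact_top_k(rows, [1.0, 0.1], 2))  # ['a', 'c']
```

As with LIMIT k in an ANN OF query, asking for more neighbors than there are rows returns only the rows that exist.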

CQL Limitations in Beta Release¶

  • ANN OF is only supported in ORDER BY clauses.

  • Filtering is not available: vector queries that use ANN OF cannot include a WHERE clause.

  • The ALTER INDEX statement is not supported for vector indexes. You cannot modify index options after the index has been created. To change these settings, you must drop the existing index and recreate it with the updated configuration.

  • Time to Live (TTL) is not supported. This means that:

    • Creating a vector index on a table with TTL set by default_time_to_live will be rejected.

    • Changing TTL for a table with a vector index is ignored.

    • Writes with TTL on a column with a vector index are ignored (TTL on other columns is accepted).

    • Rows that already exist when the index build is scheduled are indexed even if they have TTL set on the indexed column.

  • Local indexes are not supported.
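
Because ALTER INDEX is not supported for vector indexes, changing an option means dropping and recreating the index. The Python sketch below generates that pair of statements. `recreate_index_statements` is a hypothetical helper, the CREATE statement mirrors the index example on this page, and the identifiers are interpolated as-is, so pass only trusted names.

```python
SUPPORTED_SIMILARITY_FUNCTIONS = {"DOT_PRODUCT", "COSINE", "EUCLIDEAN"}


def recreate_index_statements(keyspace, table, column, index_name, similarity_function):
    """Return the DROP and CREATE statements that replace a vector index."""
    if similarity_function not in SUPPORTED_SIMILARITY_FUNCTIONS:
        raise ValueError(f"unsupported similarity function: {similarity_function}")
    drop = f"DROP INDEX IF EXISTS {keyspace}.{index_name};"
    create = (
        f"CREATE CUSTOM INDEX IF NOT EXISTS {index_name} "
        f"ON {keyspace}.{table}({column}) "
        "USING 'vector_index' "
        f"WITH OPTIONS = {{ 'similarity_function': '{similarity_function}' }};"
    )
    return [drop, create]


for statement in recreate_index_statements(
    "myapp", "comments", "comment_vector", "ann_idx", "COSINE"
):
    print(statement)
```

Run the statements in order: the index has to be dropped before it can be created again under the same name.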

Last updated on 24 Nov 2025.