
Quick Start Guide to Vector Search

This quickstart introduces Vector Search in ScyllaDB. It walks through a step-by-step example: setting up a new cluster with Vector Search enabled, creating a vector index, and running a basic similarity query.

See Vector Search Deployments for information on enabling Vector Search in existing clusters and for a list of deployment limitations.
See Working with Vector Search for details of Vector Search-related CQL syntax.

Prerequisites

A ScyllaDB Cloud account. Sign up at cloud.scylladb.com if you don’t have one. You can use a free trial cluster to try Vector Search at no cost. Free trial clusters are limited to the smallest instance size (t4g.medium on AWS, e2-medium on GCP).
cqlsh installed on your machine (or use the web-based CQL console available in the ScyllaDB Cloud UI).
For real workloads, an embedding model (e.g., OpenAI, Cohere, or an open-source sentence-transformer) to generate vectors from your data. This quickstart uses hand-crafted vectors for simplicity.

Create a Cluster with Vector Search

Create a new cluster with Vector Search enabled by following the steps in Creating a New Cluster with Vector Search Enabled. When your cluster is deployed, go to the Connect tab, choose Cqlsh from the left menu, and follow the instructions to connect.

Create a Vector Index

Create a new keyspace.
```
CREATE KEYSPACE myapp;
```

Create a table with a vector column.

Note

This example uses 5-dimensional vectors for clarity. In production, you will typically use higher dimensions (384-1536) to match your embedding model’s output.

```
CREATE TABLE IF NOT EXISTS myapp.comments (
  record_id timeuuid,
  id uuid,
  commenter text,
  comment text,
  comment_vector vector<float, 5>,
  created_at timestamp,
  PRIMARY KEY (id, created_at)
);
```

Insert example rows.

```
INSERT INTO myapp.comments (record_id, id, commenter, comment, comment_vector, created_at)
  VALUES (now(), uuid(), 'Alice', 'I like vector search in ScyllaDB.',
          [0.12, 0.34, 0.56, 0.78, 0.91], toTimestamp(now()));
INSERT INTO myapp.comments (record_id, id, commenter, comment, comment_vector, created_at)
  VALUES (now(), uuid(), 'Bob', 'I like ScyllaDB!',
          [0.11, 0.35, 0.55, 0.77, 0.92], toTimestamp(now()));
INSERT INTO myapp.comments (record_id, id, commenter, comment, comment_vector, created_at)
  VALUES (now(), uuid(), 'Charlie', 'Can somebody recommend a good restaurant in Paris?',
          [0.55, 0.08, 0.44, 0.19, 0.77], toTimestamp(now()));
INSERT INTO myapp.comments (record_id, id, commenter, comment, comment_vector, created_at)
  VALUES (now(), uuid(), 'Diana', 'Vector databases are the future',
          [0.12, 0.33, 0.57, 0.79, 0.90], toTimestamp(now()));
INSERT INTO myapp.comments (record_id, id, commenter, comment, comment_vector, created_at)
  VALUES (now(), uuid(), 'Eve', 'Testing similarity search queries in ScyllaDB',
          [0.13, 0.36, 0.59, 0.76, 0.88], toTimestamp(now()));
```
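
In a real workload, the vector values come from an embedding model rather than being typed by hand. The following Python sketch shows one way to check a vector's length and render it as a CQL list literal like the ones in the statements above. The helper name `format_vector_literal` is hypothetical and is not part of any ScyllaDB tool or driver; the sketch assumes only the literal syntax shown in the INSERT statements.

```python
import math


def format_vector_literal(values, dimensions):
    """Render numbers as a CQL list literal, e.g. "[0.12, 0.34]".

    Hypothetical helper for illustration; not a ScyllaDB or driver API.
    """
    floats = [float(v) for v in values]
    # The column is declared as vector<float, 5>, so the length is fixed.
    if len(floats) != dimensions:
        raise ValueError(f"expected {dimensions} dimensions, got {len(floats)}")
    # NaN and infinity cannot be written as plain numeric literals.
    if not all(math.isfinite(v) for v in floats):
        raise ValueError("vector components must be finite numbers")
    return "[" + ", ".join(repr(v) for v in floats) + "]"


# Alice's vector from the INSERT statements above.
alice_literal = format_vector_literal([0.12, 0.34, 0.56, 0.78, 0.91], dimensions=5)
print(alice_literal)  # prints [0.12, 0.34, 0.56, 0.78, 0.91]
```

Application code would typically bind vectors through a driver's prepared statements instead of building literals by hand; a helper like this is meant only for quick experiments in cqlsh.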

To enable approximate nearest neighbor (ANN) queries, create a vector index.

```
CREATE CUSTOM INDEX IF NOT EXISTS comment_ann_index
ON myapp.comments(comment_vector)
USING 'vector_index'
WITH OPTIONS = {
  'similarity_function': 'COSINE'
};
```

See Global Secondary Indexes - Vector Index in the ScyllaDB documentation for details.

Run a Vector Search Query

Now you can run similarity queries.

In the following example, the query vector is identical to Alice’s comment vector: [0.12, 0.34, 0.56, 0.78, 0.91].

```
SELECT commenter, comment
FROM myapp.comments
ORDER BY comment_vector ANN OF [0.12, 0.34, 0.56, 0.78, 0.91]
LIMIT 3;
```

With LIMIT 3, the query returns up to three comments whose vectors are most similar to the query vector:

```
Alice  | I like vector search in ScyllaDB.
Diana  | Vector databases are the future
Bob    | I like ScyllaDB!
```

Because the query vector is identical to Alice’s, her comment appears first. Diana’s and Bob’s comments rank next because their vectors are numerically closest (highest cosine similarity) to the query vector.
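
To see why the rows come back in this order, you can reproduce the ranking offline. The following Python sketch computes the textbook cosine similarity between the query vector and each of the five example vectors, then sorts them. It is an exact, brute-force calculation for illustration only; it does not reflect how the ANN index works internally, and an approximate index is not guaranteed to match an exact ranking on larger data sets.

```python
import math


def cosine_similarity(a, b):
    """Textbook cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Vectors copied from the INSERT statements in this guide.
comment_vectors = {
    "Alice": [0.12, 0.34, 0.56, 0.78, 0.91],
    "Bob": [0.11, 0.35, 0.55, 0.77, 0.92],
    "Charlie": [0.55, 0.08, 0.44, 0.19, 0.77],
    "Diana": [0.12, 0.33, 0.57, 0.79, 0.90],
    "Eve": [0.13, 0.36, 0.59, 0.76, 0.88],
}
query = [0.12, 0.34, 0.56, 0.78, 0.91]

# Sort commenters from most to least similar to the query vector.
ranked = sorted(
    comment_vectors,
    key=lambda name: cosine_similarity(query, comment_vectors[name]),
    reverse=True,
)
top_three = ranked[:3]
print(top_three)  # prints ['Alice', 'Diana', 'Bob']
```

Diana's and Bob's vectors differ from the query by only about 0.01 per component, so both cosine values are above 0.9998. Charlie's vector points in a noticeably different direction and ranks last.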

Retrieve Similarity Scores

To include similarity scores in your results, call the similarity function that matches your index’s distance metric. Since the index above uses COSINE, use similarity_cosine:

```
SELECT commenter, comment,
       similarity_cosine(comment_vector, [0.12, 0.34, 0.56, 0.78, 0.91])
       AS similarity
FROM myapp.comments
ORDER BY comment_vector ANN OF [0.12, 0.34, 0.56, 0.78, 0.91]
LIMIT 3;
```

The three available functions are similarity_cosine, similarity_dot_product, and similarity_euclidean. Each returns a float in [0, 1], where values closer to 1 indicate greater similarity.
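
The three metrics measure different things, which is why the function should match the index. The following Python sketch computes the raw, textbook versions of each measure for two of the example vectors. These are not ScyllaDB's implementations: the CQL functions may rescale the raw values (for example, to fit the [0, 1] range described above), so treat the numbers as an illustration of how the measures differ, not as the values the database returns.

```python
import math


def dot_product(a, b):
    """Raw dot product; grows with vector length as well as alignment."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Raw cosine of the angle between a and b; ignores vector length."""
    return dot_product(a, b) / math.sqrt(dot_product(a, a) * dot_product(b, b))


def euclidean_distance(a, b):
    """Raw straight-line distance; smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Vectors copied from the INSERT statements in this guide.
alice = [0.12, 0.34, 0.56, 0.78, 0.91]
diana = [0.12, 0.33, 0.57, 0.79, 0.90]
query = alice  # the query vector in this guide equals Alice's vector

for name, vec in (("Alice", alice), ("Diana", diana)):
    print(
        name,
        "cosine:", round(cosine(query, vec), 6),
        "dot:", round(dot_product(query, vec), 4),
        "euclidean:", round(euclidean_distance(query, vec), 4),
    )
```

With cosine and Euclidean distance, Alice's identical vector is the best match. With the raw dot product, Diana's vector scores slightly higher than Alice's because it is slightly longer, which is one reason dot product is usually paired with normalized vectors.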

See Vector Similarity Functions for details.

What’s Next

Working with Vector Search — learn about the vector data type, index options, and ANN query syntax.
Filtering Vector Search Results — combine similarity search with metadata constraints.
Quantization and Rescoring — reduce index memory usage.
Vector Search Deployments — enable, resize, or disable Vector Search on your cluster.
