Quick Start Guide to Vector Search
This quickstart will help you get familiar with Vector Search in ScyllaDB. It provides a step-by-step example of setting up a new cluster with vector search enabled, creating a vector index, and running a basic similarity query.
See Vector Search Deployments for information on enabling Vector Search in existing clusters and for a list of deployment limitations in the beta release.
See Working with Vector Search for details of Vector Search–related CQL syntax.
Create a Cluster with Vector Search
To create a new cluster with Vector Search enabled:
Go to cloud.scylladb.com and log in to your account, or sign up to create a new user account.
Click New Cluster to create a cluster.
Configure the required options:
Cluster name
Cloud provider: AWS or GCP under ScyllaDB Account.
Cluster type: Standard
Enable the Vector Search option.
Choose an instance type.
Review the Billing Options page for your cluster. The page displays the number of Vector Search nodes and the associated costs.
The Vector Search UI supports only on-demand billing. To use an existing contract, contact ScyllaDB Support.
Click Launch Cluster. It will take a few minutes for your cluster to launch.
When your cluster is deployed, go to the Connect tab. It displays instructions on how to connect to your cluster.
Choose Cqlsh from the left menu and follow the instructions.
Your cluster is ready to work with Vector Search!
Create a Vector Index
Create a new keyspace.
CREATE KEYSPACE myapp WITH replication = { 'class': 'NetworkTopologyStrategy', 'replication_factor': 3 };
Create a table with a vector column.
CREATE TABLE IF NOT EXISTS myapp.comments (
    record_id timeuuid,
    id uuid,
    commenter text,
    comment text,
    comment_vector vector<float, 64>,
    created_at timestamp,
    PRIMARY KEY (id, created_at)
);
Insert example rows.
INSERT INTO myapp.comments (record_id, id, commenter, comment, comment_vector, created_at) VALUES (now(), uuid(), 'Alice', 'I like vector search in ScyllaDB.', [0.12,0.34,0.56,0.78,0.91,0.15,0.62,0.48,0.22,0.31,0.40,0.67,0.53,0.84,0.19,0.72,0.63,0.54,0.26,0.33,0.11,0.09,0.27,0.41,0.69,0.82,0.57,0.38,0.71,0.46,0.55,0.64,0.17,0.81,0.23,0.95,0.66,0.35,0.44,0.59,0.02,0.75,0.28,0.16,0.92,0.88,0.47,0.13,0.99,0.21,0.32,0.83,0.45,0.04,0.86,0.25,0.36,0.73,0.07,0.61,0.52,0.14,0.68,0.05], toTimestamp(now())); INSERT INTO myapp.comments (record_id, id, commenter, comment, comment_vector, created_at) VALUES (now(), uuid(), 'Bob', 'I like ScyllaDB!', [0.11,0.35,0.55,0.77,0.92,0.14,0.61,0.47,0.23,0.32,0.41,0.66,0.52,0.83,0.18,0.73,0.64,0.53,0.27,0.34,0.12,0.10,0.26,0.42,0.70,0.81,0.56,0.39,0.70,0.47,0.54,0.65,0.16,0.80,0.22,0.94,0.67,0.34,0.43,0.58,0.03,0.74,0.29,0.17,0.91,0.87,0.46,0.12,0.98,0.20,0.31,0.84,0.44,0.05,0.85,0.24,0.35,0.72,0.06,0.60,0.51,0.13,0.67,0.06], toTimestamp(now())); INSERT INTO myapp.comments (record_id, id, commenter, comment, comment_vector, created_at) VALUES (now(), uuid(), 'Charlie', 'Can somebody recommend a good restaurant in Paris?', [0.55,0.08,0.44,0.19,0.77,0.25,0.39,0.10,0.50,0.62,0.07,0.14,0.97,0.23,0.36,0.92,0.31,0.81,0.06,0.42,0.70,0.28,0.59,0.21,0.85,0.63,0.15,0.30,0.38,0.27,0.11,0.79,0.52,0.99,0.33,0.40,0.12,0.73,0.24,0.47,0.65,0.20,0.57,0.87,0.13,0.48,0.74,0.04,0.60,0.29,0.18,0.64,0.71,0.16,0.53,0.45,0.95,0.02,0.37,0.26,0.05,0.82,0.35,0.32], toTimestamp(now())); INSERT INTO myapp.comments (record_id, id, commenter, comment, comment_vector, created_at) VALUES (now(), uuid(), 'Diana', 'Vector databases are the future', [0.12,0.33,0.57,0.79,0.90,0.16,0.63,0.49,0.21,0.30,0.39,0.68,0.54,0.85,0.20,0.71,0.62,0.55,0.25,0.32,0.10,0.08,0.28,0.40,0.68,0.83,0.58,0.37,0.72,0.45,0.56,0.63,0.18,0.82,0.24,0.96,0.65,0.36,0.45,0.60,0.01,0.76,0.27,0.15,0.93,0.89,0.48,0.14,1.00,0.22,0.33,0.82,0.46,0.03,0.87,0.26,0.37,0.74,0.08,0.62,0.53,0.13,0.69,0.04], 
toTimestamp(now())); INSERT INTO myapp.comments (record_id, id, commenter, comment, comment_vector, created_at) VALUES (now(), uuid(), 'Eve', 'Testing similarity search queries in ScyllaDB', [0.13,0.36,0.59,0.76,0.88,0.17,0.60,0.50,0.25,0.34,0.38,0.65,0.50,0.82,0.23,0.70,0.66,0.51,0.28,0.31,0.09,0.07,0.30,0.43,0.71,0.80,0.60,0.36,0.74,0.48,0.53,0.62,0.19,0.83,0.26,0.93,0.64,0.33,0.46,0.61,0.00,0.73,0.31,0.13,0.90,0.85,0.49,0.11,0.97,0.19,0.35,0.81,0.42,0.06,0.89,0.29,0.34,0.75,0.10,0.63,0.52,0.12,0.67,0.02], toTimestamp(now())); INSERT INTO myapp.comments (record_id, id, commenter, comment, comment_vector, created_at) VALUES (now(), uuid(), 'Frank', 'Deep learning meets databases', [0.10,0.31,0.53,0.81,0.93,0.13,0.59,0.45,0.26,0.33,0.42,0.64,0.49,0.80,0.22,0.74,0.61,0.57,0.24,0.35,0.15,0.11,0.29,0.39,0.66,0.84,0.55,0.40,0.73,0.50,0.51,0.60,0.14,0.79,0.20,0.92,0.68,0.37,0.41,0.56,0.04,0.77,0.30,0.18,0.91,0.86,0.47,0.10,0.96,0.23,0.36,0.80,0.43,0.02,0.88,0.27,0.38,0.70,0.06,0.65,0.54,0.08,0.71,0.07], toTimestamp(now())); INSERT INTO myapp.comments (record_id, id, commenter, comment, comment_vector, created_at) VALUES (now(), uuid(), 'Grace', 'ScyllaDB is highly performant', [0.11,0.84,0.29,0.71,0.17,0.94,0.62,0.53,0.43,0.25,0.96,0.38,0.18,0.82,0.45,0.01,0.75,0.19,0.30,0.58,0.12,0.68,0.92,0.15,0.26,0.20,0.44,0.32,0.89,0.16,0.64,0.54,0.79,0.27,0.36,0.21,0.09,0.50,0.23,0.88,0.39,0.33,0.06,0.70,0.31,0.07,0.80,0.13,0.24,0.52,0.46,0.85,0.60,0.08,0.48,0.22,0.14,0.42,0.10,0.34,0.28,0.02,0.41,0.63], toTimestamp(now())); INSERT INTO myapp.comments (record_id, id, commenter, comment, comment_vector, created_at) VALUES (now(), uuid(), 'Heidi', 'Anyone here using ScyllaDB for AI applications?', 
[0.08,0.37,0.52,0.82,0.94,0.19,0.57,0.43,0.27,0.36,0.44,0.62,0.48,0.79,0.26,0.76,0.59,0.50,0.30,0.29,0.14,0.05,0.32,0.37,0.65,0.86,0.61,0.42,0.75,0.51,0.49,0.59,0.12,0.77,0.25,0.90,0.70,0.39,0.40,0.54,0.06,0.71,0.33,0.21,0.89,0.84,0.45,0.09,0.93,0.16,0.37,0.78,0.41,0.00,0.86,0.22,0.40,0.69,0.11,0.64,0.56,0.10,0.73,0.09], toTimestamp(now())); INSERT INTO myapp.comments (record_id, id, commenter, comment, comment_vector, created_at) VALUES (now(), uuid(), 'Ivan', 'Looking forward to using vector capabilities.', [0.14,0.38,0.60,0.74,0.87,0.20,0.55,0.42,0.28,0.37,0.37,0.61,0.47,0.78,0.24,0.77,0.58,0.52,0.31,0.28,0.08,0.04,0.33,0.36,0.64,0.87,0.62,0.43,0.77,0.52,0.48,0.58,0.11,0.76,0.23,0.89,0.71,0.40,0.39,0.53,0.07,0.70,0.34,0.22,0.88,0.83,0.44,0.08,0.95,0.15,0.38,0.77,0.40,0.01,0.90,0.21,0.41,0.68,0.12,0.66,0.57,0.11,0.72,0.03], toTimestamp(now())); INSERT INTO myapp.comments (record_id, id, commenter, comment, comment_vector, created_at) VALUES (now(), uuid(), 'Judy', 'I''m looking for a new job.', [0.24,0.85,0.03,0.64,0.91,0.12,0.48,0.26,0.39,0.77,0.58,0.43,0.15,0.08,0.72,0.05,0.68,0.36,0.95,0.22,0.31,0.14,0.66,0.11,0.19,0.29,0.93,0.47,0.30,0.80,0.25,0.84,0.54,0.62,0.37,0.28,0.56,0.46,0.33,0.99,0.02,0.18,0.40,0.63,0.21,0.50,0.59,0.35,0.32,0.09,0.06,0.27,0.75,0.44,0.81,0.42,0.17,0.20,0.73,0.07,0.55,0.60,0.16,0.13], toTimestamp(now())); INSERT INTO myapp.comments (record_id, id, commenter, comment, comment_vector, created_at) VALUES (now(), uuid(), 'Karl', 'Indexes for high dimensional data are tricky', [0.65,0.24,0.19,0.77,0.05,0.31,0.49,0.09,0.36,0.58,0.20,0.86,0.27,0.40,0.73,0.04,0.80,0.12,0.93,0.25,0.46,0.38,0.70,0.13,0.60,0.52,0.16,0.81,0.29,0.17,0.41,0.88,0.07,0.63,0.50,0.28,0.96,0.21,0.11,0.83,0.03,0.44,0.35,0.15,0.68,0.22,0.95,0.54,0.08,0.72,0.47,0.26,0.33,0.32,0.85,0.10,0.42,0.06,0.59,0.84,0.18,0.48,0.30,0.14], toTimestamp(now())); INSERT INTO myapp.comments (record_id, id, commenter, comment, comment_vector, created_at) VALUES (now(), uuid(), 'Laura', 
'Approximate nearest neighbor queries are useful', [0.07,0.33,0.46,0.82,0.25,0.94,0.10,0.38,0.59,0.11,0.84,0.41,0.19,0.69,0.05,0.30,0.17,0.74,0.23,0.45,0.09,0.36,0.62,0.14,0.28,0.49,0.01,0.93,0.20,0.12,0.72,0.54,0.40,0.80,0.08,0.29,0.99,0.43,0.32,0.86,0.02,0.67,0.18,0.26,0.55,0.21,0.63,0.47,0.06,0.71,0.42,0.15,0.50,0.27,0.95,0.04,0.60,0.39,0.31,0.57,0.16,0.22,0.53,0.35], toTimestamp(now())); INSERT INTO myapp.comments (record_id, id, commenter, comment, comment_vector, created_at) VALUES (now(), uuid(), 'Mallory', 'Hello!', [0.84,0.11,0.57,0.34,0.22,0.75,0.63,0.41,0.15,0.20,0.92,0.36,0.07,0.60,0.48,0.23,0.71,0.27,0.39,0.29,0.51,0.08,0.77,0.17,0.42,0.68,0.10,0.31,0.40,0.95,0.28,0.56,0.32,0.66,0.04,0.30,0.13,0.45,0.89,0.38,0.19,0.54,0.14,0.79,0.35,0.47,0.25,0.09,0.61,0.44,0.12,0.81,0.33,0.50,0.21,0.18,0.65,0.26,0.05,0.87,0.24,0.37,0.46,0.02], toTimestamp(now())); INSERT INTO myapp.comments (record_id, id, commenter, comment, comment_vector, created_at) VALUES (now(), uuid(), 'Niaj', 'Query optimization is important for large datasets', [0.23,0.47,0.15,0.60,0.31,0.02,0.53,0.79,0.41,0.25,0.14,0.89,0.09,0.50,0.07,0.33,0.94,0.12,0.65,0.46,0.19,0.35,0.08,0.42,0.22,0.37,0.05,0.83,0.20,0.49,0.11,0.68,0.24,0.18,0.77,0.55,0.04,0.30,0.16,0.61,0.40,0.71,0.26,0.39,0.13,0.98,0.32,0.09,0.58,0.27,0.91,0.36,0.21,0.06,0.75,0.44,0.10,0.63,0.28,0.38,0.17,0.56,0.03,0.52], toTimestamp(now())); INSERT INTO myapp.comments (record_id, id, commenter, comment, comment_vector, created_at) VALUES (now(), uuid(), 'Olivia', 'Search quality improves with better vectors', [0.34,0.63,0.27,0.49,0.14,0.72,0.40,0.12,0.53,0.81,0.33,0.07,0.19,0.44,0.29,0.95,0.09,0.70,0.31,0.05,0.64,0.20,0.37,0.16,0.60,0.86,0.02,0.26,0.47,0.17,0.30,0.79,0.13,0.55,0.04,0.35,0.92,0.24,0.18,0.67,0.21,0.51,0.36,0.08,0.74,0.28,0.10,0.42,0.25,0.57,0.15,0.39,0.11,0.48,0.22,0.66,0.50,0.03,0.45,0.19,0.78,0.06,0.32,0.82], toTimestamp(now())); INSERT INTO myapp.comments (record_id, id, commenter, comment, comment_vector, created_at) 
VALUES (now(), uuid(), 'Peggy', 'Large vectors consume more space', [0.72,0.15,0.38,0.20,0.56,0.44,0.13,0.30,0.92,0.05,0.49,0.33,0.10,0.61,0.22,0.27,0.07,0.48,0.02,0.65,0.14,0.53,0.36,0.12,0.73,0.19,0.08,0.79,0.26,0.39,0.18,0.54,0.04,0.35,0.83,0.24,0.11,0.90,0.47,0.29,0.09,0.71,0.31,0.45,0.01,0.58,0.37,0.84,0.28,0.16,0.06,0.80,0.25,0.50,0.41,0.17,0.55,0.46,0.21,0.67,0.40,0.03,0.23,0.85], toTimestamp(now()));
INSERT INTO myapp.comments (record_id, id, commenter, comment, comment_vector, created_at) VALUES (now(), uuid(), 'Rupert', 'Dimensionality reduction is sometimes required', [0.61,0.08,0.26,0.75,0.13,0.32,0.42,0.05,0.97,0.28,0.46,0.36,0.09,0.20,0.84,0.11,0.63,0.14,0.38,0.22,0.10,0.59,0.17,0.41,0.06,0.69,0.18,0.35,0.07,0.44,0.25,0.80,0.16,0.53,0.21,0.82,0.02,0.57,0.23,0.30,0.29,0.93,0.12,0.66,0.03,0.48,0.27,0.19,0.15,0.71,0.34,0.40,0.24,0.98,0.37,0.31,0.55,0.45,0.50,0.74,0.33,0.04,0.39,0.99], toTimestamp(now()));
INSERT INTO myapp.comments (record_id, id, commenter, comment, comment_vector, created_at) VALUES (now(), uuid(), 'Sybil', 'My cat is aggressive. Can somebody help?', [0.49,0.12,0.66,0.25,0.05,0.59,0.21,0.18,0.74,0.04,0.38,0.45,0.15,0.50,0.19,0.27,0.93,0.09,0.56,0.14,0.44,0.17,0.31,0.08,0.29,0.84,0.20,0.40,0.07,0.33,0.96,0.23,0.11,0.73,0.32,0.54,0.13,0.46,0.30,0.22,0.10,0.42,0.26,0.39,0.02,0.63,0.36,0.28,0.16,0.68,0.01,0.48,0.24,0.35,0.52,0.03,0.06,0.65,0.09,0.41,0.53,0.37,0.60,0.88], toTimestamp(now()));
INSERT INTO myapp.comments (record_id, id, commenter, comment, comment_vector, created_at) VALUES (now(), uuid(), 'Trent', 'Similarity search powers recommendation engines', [0.28,0.64,0.09,0.46,0.12,0.38,0.57,0.07,0.33,0.71,0.18,0.49,0.24,0.62,0.04,0.26,0.95,0.15,0.35,0.11,0.29,0.40,0.08,0.55,0.20,0.36,0.05,0.83,0.19,0.50,0.23,0.72,0.30,0.16,0.45,0.39,0.10,0.58,0.27,0.31,0.17,0.93,0.25,0.54,0.03,0.41,0.22,0.13,0.14,0.79,0.42,0.34,0.21,0.88,0.47,0.32,0.44,0.06,0.63,0.48,0.02,0.85,0.37,0.99], toTimestamp(now()));
INSERT INTO myapp.comments (record_id, id, commenter, comment, comment_vector, created_at) VALUES (now(), uuid(), 'Victor', 'I''m hungry.', [0.06,0.43,0.22,0.71,0.19,0.09,0.33,0.12,0.74,0.29,0.17,0.47,0.20,0.60,0.25,0.30,0.98,0.15,0.32,0.14,0.40,0.27,0.13,0.35,0.23,0.46,0.11,0.84,0.21,0.50,0.28,0.62,0.31,0.16,0.44,0.36,0.10,0.57,0.26,0.39,0.18,0.95,0.24,0.53,0.03,0.42,0.08,0.34,0.09,0.78,0.41,0.38,0.05,0.80,0.48,0.33,0.45,0.07,0.65,0.52,0.02,0.82,0.37,0.99], toTimestamp(now()));
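Typing 64-element vectors by hand is error-prone, so statements like the ones above are usually generated. The following Python sketch shows one way to do that. The helper names (`to_cql_vector`, `cql_text`, `build_insert`) are hypothetical and not part of cqlsh or any ScyllaDB driver, and the two-decimal formatting is an assumption chosen only to match the sample data. For application code, prepared statements in a driver are generally a safer choice than building CQL strings.

```python
# Hypothetical helpers for illustration; not part of cqlsh or any ScyllaDB driver.
EXPECTED_DIM = 64  # matches vector<float, 64> in myapp.comments


def to_cql_vector(values, dim=EXPECTED_DIM):
    """Format a sequence of numbers as a CQL vector literal, checking its length first."""
    if len(values) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(values)}")
    # Two decimals is an assumption that mirrors the sample rows.
    return "[" + ",".join(f"{float(v):.2f}" for v in values) + "]"


def cql_text(value):
    """Quote a string for CQL; a single quote is escaped by doubling it."""
    return "'" + value.replace("'", "''") + "'"


def build_insert(commenter, comment, vector):
    """Build an INSERT statement shaped like the examples above."""
    return (
        "INSERT INTO myapp.comments "
        "(record_id, id, commenter, comment, comment_vector, created_at) "
        f"VALUES (now(), uuid(), {cql_text(commenter)}, {cql_text(comment)}, "
        f"{to_cql_vector(vector)}, toTimestamp(now()));"
    )


if __name__ == "__main__":
    print(build_insert("Victor", "I'm hungry.", [0.5] * 64))
```

The length check catches a vector whose size does not match the column's declared dimension before the statement is sent to the cluster.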
To enable approximate nearest neighbor (ANN) queries, create a vector index.
CREATE CUSTOM INDEX IF NOT EXISTS comment_ann_index ON myapp.comments(comment_vector) USING 'vector_index' WITH OPTIONS = { 'similarity_function': 'COSINE' };
See Global Secondary Indexes - Vector Index in the ScyllaDB documentation for details.
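The index above is configured with the COSINE similarity function. As a rough illustration of what that measure means, the Python sketch below computes cosine similarity with the standard formula (dot product divided by the product of the vector lengths). It is a plain reference calculation, not ScyllaDB's implementation, and the function name `cosine_similarity` is chosen here for illustration.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # same direction: 1.0
    print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal: 0.0
```

Because the measure depends only on direction, scaling a vector does not change its similarity to another vector.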
Run a Vector Search Query
Now you can run similarity queries.
In the following example, the query vector is identical to the one stored with Alice’s comment: “I like vector search in ScyllaDB.”
SELECT id, commenter, comment
FROM myapp.comments
ORDER BY comment_vector ANN OF [
0.12,0.34,0.56,0.78,0.91,0.15,0.62,0.48,0.22,0.31,
0.40,0.67,0.53,0.84,0.19,0.72,0.63,0.54,0.26,0.33,
0.11,0.09,0.27,0.41,0.69,0.82,0.57,0.38,0.71,0.46,
0.55,0.64,0.17,0.81,0.23,0.95,0.66,0.35,0.44,0.59,
0.02,0.75,0.28,0.16,0.92,0.88,0.47,0.13,0.99,0.21,
0.32,0.83,0.45,0.04,0.86,0.25,0.36,0.73,0.07,0.61,
0.52,0.14,0.68,0.05
]
LIMIT 3;
With the limit set to 3, the query returns up to three comments whose vectors are most similar to the provided query vector:
id (of Alice), commenter = Alice, comment = "I like vector search in ScyllaDB."
id (of Diana), commenter = Diana, comment = "Vector databases are the future"
id (of Bob), commenter = Bob, comment = "I like ScyllaDB!"
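To build intuition for this result, the Python sketch below ranks rows by cosine similarity with an exact, brute-force scan and keeps the top `k`. This is not how the vector index works internally: an ANN index trades exactness for speed, so its results approximate this ranking. The three-dimensional vectors and the labels `a` to `d` are toy data invented for the illustration, not rows from `myapp.comments`.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def top_k(query, rows, k):
    """Exact nearest-neighbor search: score every row, return the k best labels."""
    scored = [(cosine_similarity(query, vector), label) for label, vector in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # most similar first
    return [label for _, label in scored[:k]]


# Toy data (hypothetical), analogous to (commenter, comment_vector) pairs.
rows = [
    ("a", [1.0, 0.0, 0.0]),  # identical to the query vector
    ("b", [0.9, 0.1, 0.0]),  # nearly the same direction
    ("c", [0.0, 1.0, 0.0]),  # orthogonal to the query
    ("d", [0.7, 0.7, 0.0]),  # 45 degrees from the query
]

if __name__ == "__main__":
    print(top_k([1.0, 0.0, 0.0], rows, 3))  # ['a', 'b', 'd']
```

As in the query above, the row whose vector equals the query vector ranks first, and `k` plays the role of `LIMIT`.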