Vector Search¶
Vector Search in ScyllaDB¶
Vector Search retrieves high-dimensional data by similarity rather than by exact match. It is particularly useful in AI and machine learning applications, where data is often represented as vectors: numerical representations of objects such as text, images, audio, or video. In these applications, you typically need to retrieve data that is similar to a given query, rather than relying on keyword-based search.
ScyllaDB’s Vector Search feature allows you to store, index, and query high-dimensional vector data at scale. Vector Search is built to work within your existing ScyllaDB infrastructure, taking advantage of its high-performance and highly available architecture.
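To make "similarity rather than exact matches" concrete, here is a minimal Python sketch of an exact, brute-force similarity search using cosine similarity. It illustrates the general idea only and is not ScyllaDB's implementation: the function names (`cosine_similarity`, `top_k`) and the tiny 3-dimensional vectors are hypothetical, chosen for readability.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, items, k):
    """Exact nearest neighbors: score every stored vector, keep the best k."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in items.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [key for _, key in scored[:k]]

# Hypothetical embeddings; real ones usually have hundreds of dimensions.
items = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}
print(top_k([1.0, 0.0, 0.0], items, k=2))
```

Scoring every stored vector costs time proportional to the size of the dataset. Approximate nearest neighbor (ANN) indexes such as HNSW, covered in the sections listed below, exist to avoid that full scan.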
Common Use Cases¶
Semantic search — Find documents or passages that match the meaning of a query, not just the keywords.
Retrieval-Augmented Generation (RAG) — Provide relevant context to an LLM by retrieving similar documents from a vector store.
Recommendation systems — Find items similar to those a user has interacted with.
Image and audio search — Find visually or acoustically similar media.
Anomaly detection — Identify outliers far from all clusters in vector space.
Deduplication — Find near-duplicate records by identifying vectors that are very close together.
See Common Use Cases in the Concepts page for more details.
Getting Started¶
- Walk through setting up a vector-enabled table, inserting data, and running your first similarity search in minutes.
Understanding Vector Search¶
- Architecture overview, HNSW algorithm, CDC-based indexing, and data flow between storage and vector search nodes.
- Definitions of key terms including ANN, HNSW, embeddings, similarity functions, quantization, filtering, and more.
Deployment and Operations¶
- Create, enable, resize, disable, and monitor Vector Search clusters in ScyllaDB Cloud via the UI or API.
- Estimate memory requirements, understand quantization impact, and choose instance types for your workload.
- Authentication, authorization, service-level isolation, and network security for vector search.
Working with Vectors¶
- CQL usage guide covering the vector data type, vector indexes, similarity functions, index tuning, ANN queries, and driver integration.
- Combine similarity search with metadata constraints using global and local vector indexes.
- Reduce index memory usage with quantization (f16, i8, b1) and recover precision with oversampling and rescoring.
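As a rough illustration of the last item, the sketch below shows how quantization, oversampling, and rescoring can fit together: a coarse pass ranks 8-bit copies of the vectors, fetches more candidates than requested, and then re-ranks those candidates with the original full-precision values. This is a generic Python sketch under stated assumptions, not ScyllaDB's implementation. The names `quantize_i8`, `search`, and the `oversampling` parameter are hypothetical, dot product stands in for the similarity function, and vector components are assumed to lie in [-1, 1].

```python
import math

def quantize_i8(vec):
    """Simple scalar quantizer: map floats in [-1, 1] to integers in [-127, 127]."""
    return [max(-127, min(127, round(x * 127))) for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def search(query, vectors, k, oversampling=2.0):
    """Coarse pass on quantized vectors, then rescore candidates at full precision."""
    q8 = quantize_i8(query)
    # A real index would store the quantized copies ahead of time.
    quantized = {key: quantize_i8(vec) for key, vec in vectors.items()}

    # Oversampling: fetch more than k candidates to absorb quantization error.
    n_candidates = min(len(vectors), math.ceil(k * oversampling))
    candidates = sorted(
        quantized, key=lambda key: dot(q8, quantized[key]), reverse=True
    )[:n_candidates]

    # Rescoring: re-rank only the candidates using the original float vectors.
    candidates.sort(key=lambda key: dot(query, vectors[key]), reverse=True)
    return candidates[:k]

vectors = {
    "a": [0.9, 0.1],
    "b": [0.5, 0.5],
    "c": [-0.9, 0.1],
    "d": [0.1, 0.9],
}
print(search([1.0, 0.0], vectors, k=2))
```

The trade-off is that a larger `oversampling` factor recovers more of the precision lost to quantization, at the cost of rescoring more candidates per query.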
Troubleshooting and Reference¶
- Common issues and solutions for index creation, query results, data insertion, performance, and connectivity.
- Frequently asked questions about similarity functions, dimensions, latency, filtering, quantization, and more.
- Technical reference for instance types, CQL syntax, index options, and Cloud API endpoints.
Examples¶
- Learn how to build RAG applications and semantic caching layers with ScyllaDB Vector Search, and how it integrates with popular LLM libraries such as LlamaIndex and LangChain.