Vector Search Glossary | ScyllaDB Docs

Vector Search Glossary

This glossary defines key terms related to Vector Search in ScyllaDB. It covers core concepts essential to understanding how vectors are stored, indexed, and queried.

Vector - An ordered list of numbers (floats) representing data, such as text, images, or audio, in a way that captures its meaning or features.

Vector Type - A native ScyllaDB column type used to store fixed-length numeric vectors directly in a table for similarity search. See Data Types - Vectors in the ScyllaDB documentation.

Vector Search Index - In ScyllaDB, a USearch index built on a vector column that accelerates similarity queries. Unlike traditional indexes (for exact matches or ranges), a Vector Search index is optimized for approximate nearest neighbor (ANN) lookups over high-dimensional data.

USearch Index - A high-performance, in-memory vector index library developed by Unum, designed for fast approximate nearest neighbor (ANN) search. ScyllaDB uses USearch as the underlying engine for its Vector Search Index to deliver low-latency similarity queries and efficient memory utilization.

ANN (Approximate Nearest Neighbor) Search - A search technique that efficiently finds the data points in a large dataset that are most similar to a given query vector. Instead of guaranteeing the exact nearest neighbors, ANN speeds up the search by accepting results that are close enough, which makes it well suited to large datasets and high-dimensional vector spaces in applications like semantic search, recommendations, and generative AI.

Similarity Search - A technique for finding items in a dataset that are most similar to a query vector, using a distance or similarity measure. It is commonly used in high-dimensional vector spaces to retrieve approximate matches efficiently.
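As a point of reference, similarity search can be sketched as an exact, brute-force scan that scores every item against the query and keeps the closest k. This is only an illustration in plain Python, not how ScyllaDB executes queries; the function names and the sample data are hypothetical. An ANN index exists precisely to avoid this full scan on large datasets.

```python
import heapq
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(query, items, k):
    """Return the k (item_id, distance) pairs closest to the query.

    `items` maps an item id to its vector. Every item is scored,
    so the cost grows linearly with the size of the dataset.
    """
    scored = (
        (euclidean_distance(query, vector), item_id)
        for item_id, vector in items.items()
    )
    return [(item_id, dist) for dist, item_id in heapq.nsmallest(k, scored)]


# Hypothetical two-dimensional data, for illustration only.
items = {
    "a": [0.0, 0.0],
    "b": [1.0, 1.0],
    "c": [5.0, 5.0],
}
results = nearest_neighbors([0.9, 0.9], items, k=2)
result_ids = [item_id for item_id, _ in results]
print(result_ids)
```

The scan is exact but touches every vector, which is why approximate indexes become attractive as the dataset and the number of dimensions grow.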

Semantic Search - A type of similarity search that compares the meaning of a query and data items using vector embeddings. It enables context-aware retrieval by focusing on semantic relevance rather than exact terms.

Embedding - A vector generated by a machine learning model to represent raw data in a numerical form. ScyllaDB can store and query embeddings generated by an external tool and inserted into the database.

Similarity Function (Distance Metric) - A mathematical function that measures how close two vectors are. In ScyllaDB, three similarity functions are supported: dot product, cosine similarity, and Euclidean distance.
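The three functions named above have standard textbook definitions, sketched here in plain Python for illustration. The function names are hypothetical, and this glossary does not say how ScyllaDB scales or normalizes the scores it returns, so treat the values as the mathematical definitions only.

```python
import math


def dot_product(a, b):
    """Sum of element-wise products; larger means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors, in the range [-1, 1].

    Depends on direction only, not on vector length.
    """
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot_product(a, b) / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance; smaller means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


u = [1.0, 2.0, 3.0]
v = [4.0, 5.0, 6.0]
print(dot_product(u, v))
print(cosine_similarity(u, v))
print(euclidean_distance(u, v))
```

Note the difference in direction: for dot product and cosine similarity a higher value means a closer match, while for Euclidean distance a lower value does.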
