Filtering Vector Search Results
Filtering lets you combine similarity search with metadata constraints, so results are both semantically relevant and meet your business requirements. This page explains how to use filtering with global and local vector indexes in ScyllaDB.
Overview
Typical filtering use cases include:
Multi-tenant isolation: tenant_id = 'acme'
Recency: created_at >= '2024-01-01'
Geo-filtering: region IN ('EU', 'US')
Access control: visibility = 'public'
Without filtering, you would need to retrieve a large set of similar vectors and then filter them in your application. ScyllaDB filtering pushes this work into the database, reducing network overhead and application complexity.
ScyllaDB supports two types of vector indexes for filtering:
Global vector indexes — index all vectors in a table. Filter on columns that are part of the table’s primary key.
Local vector indexes — index vectors within a single partition. Significantly faster than global indexes because they search only a single partition’s index instead of the entire index space.
Caution
For best filtering performance, design your schema so that the columns
you filter on are part of the partition key, and use a local vector
index. This ensures that only equality (=) filters on partition key
columns are needed, which is the fastest path. Global index filtering and
inequality/IN operators are substantially slower (see details below).
Filtering with Global Vector Indexes
A global vector index indexes all vector data stored in a table. You can filter results using columns that are part of the table’s primary key (partition key and clustering columns).
Caution
Searching through a global index is always much slower than searching through a local index, because ScyllaDB must search the entire index space across all partitions and then post-filter the results. The more selective the filter (i.e., the fewer rows that match), the slower the query, because more index entries must be scanned to find enough matching results.
Global index queries also require the ALLOW FILTERING option in the
SELECT statement.
Whenever possible, prefer local vector indexes for filtered vector search.
Example Table Schema
The examples in this section use the following table with a composite partition key:
CREATE TABLE IF NOT EXISTS myapp.comments_vs (
commenter text,
comment text,
comment_vector VECTOR<FLOAT, 5>,
created_at timestamp,
discussion_board_id int,
country text,
lang text,
PRIMARY KEY ((commenter, discussion_board_id), created_at)
);
Creating a Global Vector Index
The syntax is the same as creating a standard vector index:
CREATE CUSTOM INDEX IF NOT EXISTS global_ann_index
ON myapp.comments_vs(comment_vector)
USING 'vector_index'
WITH OPTIONS = { 'similarity_function': 'DOT_PRODUCT' };
With a global vector index, you can filter on any column that is part of the base table’s primary key.
Querying with Filtering
You can filter on primary key columns (commenter, discussion_board_id,
created_at) in your ANN query:
SELECT commenter, comment FROM myapp.comments_vs
WHERE created_at = '1970-01-01 00:01:04'
ORDER BY comment_vector ANN OF [0.1, 0.2, 0.3, 0.4, 0.5] LIMIT 5
ALLOW FILTERING;
The LIMIT keyword limits the number of results returned after applying
the filter, not before. Internally, the ANN search first retrieves candidate
vectors by similarity, then applies the filter predicates. If the filter
eliminates most candidates, the final result set may be smaller than the
requested LIMIT.
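As a further sketch against the same schema, assuming global_ann_index exists, you can restrict only commenter. This column is part of the partition key, but not all of it, so the query searches one commenter's comments across every discussion board. A local index requires the full partition key, so this query has to go through the global index and needs ALLOW FILTERING. The name 'Alice' and the query vector are placeholders.

```cql
-- Search Alice's comments on all discussion boards (placeholder values).
-- commenter is only part of the partition key (commenter, discussion_board_id),
-- so the global index serves this query and ALLOW FILTERING is required.
SELECT commenter, discussion_board_id, comment FROM myapp.comments_vs
WHERE commenter = 'Alice'
ORDER BY comment_vector ANN OF [0.1, 0.2, 0.3, 0.4, 0.5] LIMIT 5
ALLOW FILTERING;
```

The filter is applied to the ANN candidates, as described above. If few of those candidates belong to Alice, this query can return fewer than 5 rows.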
Local Vector Indexes
A local vector index creates a separate vector index per partition. This is more efficient than global filtering when you frequently query within a specific partition (e.g., per tenant, per user, per discussion board).
Caution
The partition key columns in a local vector index must match the partition key of the base table.
Creating a Local Vector Index
The local vector index syntax specifies the partition key columns in parentheses before the vector column:
CREATE CUSTOM INDEX IF NOT EXISTS local_ann_index
ON myapp.comments_vs((commenter, discussion_board_id), comment_vector)
USING 'vector_index'
WITH OPTIONS = { 'similarity_function': 'DOT_PRODUCT' };
In this example:
(commenter, discussion_board_id): the partition key columns. These must match the base table’s partition key.
comment_vector: the vector column to index.
For each unique partition key value, ScyllaDB maintains a separate vector index, which keeps index sizes small and queries local to a single node.
Querying with a Local Vector Index
When using a local vector index, you must specify the full partition key
in the WHERE clause:
SELECT commenter, comment FROM myapp.comments_vs
WHERE commenter = 'Alice' AND discussion_board_id = 42
ORDER BY comment_vector ANN OF [0.1, 0.2, 0.3, 0.4, 0.5] LIMIT 5;
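The following sketch illustrates the per-partition scope. It inserts two rows with made-up values into different partitions, ('Alice', 42) and ('Bob', 42), and gives both rows the same vector. If you run the query above afterwards, it should return only Alice's comment. The search is confined to the index of the ('Alice', 42) partition, and Bob's identical vector lives in a different partition's index. This assumes local_ann_index exists and that the vector index has already picked up the writes, which may not happen immediately.

```cql
-- Sample rows (made-up values) in two different partitions of myapp.comments_vs.
INSERT INTO myapp.comments_vs
    (commenter, discussion_board_id, created_at, comment, comment_vector)
VALUES ('Alice', 42, '2024-03-01 10:00:00', 'Great idea!', [0.1, 0.2, 0.3, 0.4, 0.5]);

-- Same vector, different partition: not visible to a query on ('Alice', 42).
INSERT INTO myapp.comments_vs
    (commenter, discussion_board_id, created_at, comment, comment_vector)
VALUES ('Bob', 42, '2024-03-01 11:00:00', 'I disagree.', [0.1, 0.2, 0.3, 0.4, 0.5]);
```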
You can also combine partition key filtering with clustering column filtering:
SELECT commenter, comment FROM myapp.comments_vs
WHERE commenter = 'Alice' AND discussion_board_id = 42
AND created_at >= '2024-01-01'
ORDER BY comment_vector ANN OF [0.1, 0.2, 0.3, 0.4, 0.5] LIMIT 5;
Caution
Inequality operators (>, <, >=, <=) and the IN
operator are always slow in vector search queries, regardless of
whether they are applied to partition key or clustering columns. These
operators force ScyllaDB to search a much larger portion of the index
space than an equality (=) filter. The slowdown grows with the
filter’s selectivity: the fewer rows that match the filter, the
more index entries must be scanned, and the slower the query becomes.
For best performance, design your schema so that the columns you need to filter on are partition key columns queried with equality (=) operators. Use a local vector index so that the search is confined to a single partition’s index.
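Consider lang as an example. It is not part of the primary key of myapp.comments_vs, so you cannot filter on it there. Suppose you often search the comments in one language on one discussion board. One option is a second table, sketched below, that moves both columns into the partition key. The filter then becomes an equality on the full partition key, which a local vector index can serve. The table and index names and the key layout are hypothetical. With this design, your application writes each comment to both tables.

```cql
-- Hypothetical query table: the filtered columns form the partition key.
CREATE TABLE IF NOT EXISTS myapp.comments_by_board_lang (
    discussion_board_id int,
    lang text,
    created_at timestamp,
    commenter text,
    comment text,
    comment_vector VECTOR<FLOAT, 5>,
    PRIMARY KEY ((discussion_board_id, lang), created_at, commenter)
);

-- Local vector index over the same partition key columns.
CREATE CUSTOM INDEX IF NOT EXISTS board_lang_local_ann_index
ON myapp.comments_by_board_lang((discussion_board_id, lang), comment_vector)
USING 'vector_index'
WITH OPTIONS = { 'similarity_function': 'DOT_PRODUCT' };

-- Equality on the full partition key: the fast path, no ALLOW FILTERING.
SELECT commenter, comment FROM myapp.comments_by_board_lang
WHERE discussion_board_id = 42 AND lang = 'en'
ORDER BY comment_vector ANN OF [0.1, 0.2, 0.3, 0.4, 0.5] LIMIT 5;
```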
Choosing Between Global and Local Indexes
| Criteria | Global Vector Index | Local Vector Index |
|---|---|---|
| Index scope | All rows in the table | Rows within a single partition |
| Filter columns | Primary key columns | Primary key columns |
| Requires partition key in WHERE | No | Yes |
| Performance at scale (>10M vectors) | Always much slower (searches entire index space) | Fast (searches only one partition’s index) |
| Use case | Cross-partition similarity search | Per-tenant, per-user, or scoped search |
| Requires ALLOW FILTERING | Yes (when using WHERE clause) | No |
General guidance:
Always prefer local indexes over global indexes for filtered vector search. Local indexes search only a single partition’s index, while global indexes must search the entire index space, making them significantly slower.
Design your schema so that columns you filter on are part of the partition key. This lets you use a local vector index with equality (=) filters on the partition key, the fastest possible filtering path.
Use global indexes only when you genuinely need to search across all data without knowing the partition key in advance. Be aware that performance degrades as the dataset grows.
Avoid inequality and IN operators in filtered vector queries. They force the database to scan a larger portion of the index, with slowdown proportional to selectivity (fewer matching rows = slower query).
If both a global and a local vector index exist on the same vector column, ScyllaDB automatically selects the local index when the partition key is specified in the query, as it provides better performance.
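You might otherwise reach for IN on a partition key column. One possible alternative is to issue one equality query per partition and merge the results yourself, so that each query is served by the local index. This is an application-side pattern, not a ScyllaDB feature. The sketch below replaces discussion_board_id IN (42, 43), where board 43 is a placeholder value.

```cql
-- Application-side pattern (not a ScyllaDB feature).
-- Instead of: WHERE commenter = 'Alice' AND discussion_board_id IN (42, 43)
-- run one full-partition-key equality query per board.
SELECT discussion_board_id, comment, comment_vector FROM myapp.comments_vs
WHERE commenter = 'Alice' AND discussion_board_id = 42
ORDER BY comment_vector ANN OF [0.1, 0.2, 0.3, 0.4, 0.5] LIMIT 5;

SELECT discussion_board_id, comment, comment_vector FROM myapp.comments_vs
WHERE commenter = 'Alice' AND discussion_board_id = 43
ORDER BY comment_vector ANN OF [0.1, 0.2, 0.3, 0.4, 0.5] LIMIT 5;
```

To get the overall top 5, compute the dot product of each returned comment_vector with the query vector in your application. This matches the index's DOT_PRODUCT similarity function. Then keep the 5 highest-scoring rows. Each query already returns its partition's best 5 candidates, so the overall best 5 are among the combined rows, up to the approximation inherent in ANN search.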
Limitations
The TOKEN function, CONTAINS operator, and DISTINCT keyword are not supported in vector queries.
Only one local vector index can be defined for each combination of partition key columns and vector column.
Filtering on columns not in the primary key is not supported.
What’s Next
Working with Vector Search — vector data type, index creation, and ANN queries.
Quantization and Rescoring — reduce memory usage while maintaining search quality.
Vector Search Concepts — architecture and data flow.