
Filtering Vector Search Results

Filtering lets you combine similarity search with metadata constraints, so results are both semantically relevant and meet your business requirements. This page explains how to use filtering with global and local vector indexes in ScyllaDB.

Overview

Typical filtering use cases include:

Multi-tenant isolation — tenant_id = 'acme'
Recency — created_at >= '2024-01-01'
Geo-filtering — region IN ('EU', 'US')
Access control — visibility = 'public'

Without filtering, you would need to retrieve a large set of similar vectors and then filter them in your application. ScyllaDB filtering pushes this work into the database, reducing network overhead and application complexity.

ScyllaDB supports two types of vector indexes for filtering:

Global vector indexes — index all vectors in a table. Filter on columns that are part of the table’s primary key.
Local vector indexes — index vectors within a single partition. Significantly faster than global indexes because they search only a single partition’s index instead of the entire index space.

Caution

For best filtering performance, design your schema so that the columns you filter on are part of the partition key, and use a local vector index. This ensures that only equality (=) filters on partition key columns are needed, which is the fastest path. Global index filtering and inequality/IN operators are substantially slower (see details below).

Filtering with Global Vector Indexes

A global vector index indexes all vector data stored in a table. You can filter results using columns that are part of the table’s primary key (partition key and clustering columns).

Caution

Searching through a global index is always much slower than searching through a local index, because ScyllaDB must search the entire index space across all partitions and then post-filter the results. The more selective the filter (i.e., the fewer rows that match), the slower the query, because more index entries must be scanned to find enough matching results.

Global index queries also require the ALLOW FILTERING option in the SELECT statement.

Whenever possible, prefer local vector indexes for filtered vector search.
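The effect of selectivity on a post-filtered search can be illustrated with a toy model. The sketch below is not ScyllaDB's implementation: it assumes the rows are already ordered by similarity and simply counts how many entries are examined before k rows pass the filter. The function name, the tenant field, and the data are hypothetical.

```python
def entries_scanned(rows_by_similarity, matches, k):
    """Count entries examined, in similarity order, until k rows pass the filter."""
    found = 0
    for scanned, row in enumerate(rows_by_similarity, start=1):
        if matches(row):
            found += 1
            if found == k:
                return scanned
    # Fewer than k matching rows exist, so every entry was examined.
    return len(rows_by_similarity)


# Hypothetical data: 10,000 rows, assumed to be sorted by similarity already,
# spread evenly across 100 tenants.
rows = [{"tenant": i % 100} for i in range(10_000)]

broad = entries_scanned(rows, lambda r: r["tenant"] < 50, k=5)    # half the rows match
narrow = entries_scanned(rows, lambda r: r["tenant"] == 7, k=5)   # 1% of the rows match

print(broad, narrow)  # 5 408
```

In this model the filter that matches 1% of the rows examines roughly 80 times more entries than the filter that matches half of them, which mirrors the behavior described above.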

Example Table Schema

The examples in this section use the following table with a composite partition key:

CREATE TABLE IF NOT EXISTS myapp.comments_vs (
  commenter text,
  comment text,
  comment_vector VECTOR<FLOAT, 5>,
  created_at timestamp,
  discussion_board_id int,
  country text,
  lang text,
  PRIMARY KEY ((commenter, discussion_board_id), created_at)
);

Creating a Global Vector Index

The syntax is the same as creating a standard vector index:

CREATE CUSTOM INDEX IF NOT EXISTS global_ann_index
ON myapp.comments_vs(comment_vector)
USING 'vector_index'
WITH OPTIONS = { 'similarity_function': 'DOT_PRODUCT' };

With a global vector index, you can filter on any column that is part of the base table’s primary key.

Querying with Filtering

You can filter on primary key columns (commenter, discussion_board_id, created_at) in your ANN query:

SELECT commenter, comment FROM myapp.comments_vs
WHERE created_at = '1970-01-01 00:01:04'
ORDER BY comment_vector ANN OF [0.1, 0.2, 0.3, 0.4, 0.5] LIMIT 5
ALLOW FILTERING;

The LIMIT keyword limits the number of results returned after applying the filter, not before. Internally, the ANN search first retrieves candidate vectors by similarity, then applies the filter predicates. If the filter eliminates most candidates, the final result set may be smaller than the requested LIMIT.
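To see why a filtered query can return fewer rows than LIMIT, consider the simplified model below. It is only a sketch: the candidates parameter stands for a hypothetical size of the similarity-ordered candidate pool, which this page does not specify, and the rows and field names are made up for illustration.

```python
def filtered_ann(rows_by_similarity, predicate, limit, candidates):
    """Toy post-filter: take the top candidates, drop non-matching rows, apply LIMIT."""
    pool = rows_by_similarity[:candidates]       # retrieved by similarity only
    passed = [row for row in pool if predicate(row)]
    return passed[:limit]                        # LIMIT applies to what is left


# Hypothetical rows, assumed sorted by similarity, across 10 discussion boards.
rows = [{"id": i, "board": i % 10} for i in range(100)]

loose = filtered_ann(rows, lambda r: r["board"] < 5, limit=5, candidates=20)
strict = filtered_ann(rows, lambda r: r["board"] == 3, limit=5, candidates=20)

print(len(loose), len(strict))  # 5 2
```

The loose filter fills the requested five rows, while the strict filter leaves only two of the twenty candidates, so the result is shorter than LIMIT.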

Local Vector Indexes

A local vector index creates a separate vector index per partition. This is more efficient than filtering through a global index when you frequently query within a specific partition (e.g., per tenant, per user, per discussion board).

Caution

The partition key columns in a local vector index must match the partition key of the base table.

Creating a Local Vector Index

The local vector index syntax specifies the partition key columns in parentheses before the vector column:

CREATE CUSTOM INDEX IF NOT EXISTS local_ann_index
ON myapp.comments_vs((commenter, discussion_board_id), comment_vector)
USING 'vector_index'
WITH OPTIONS = { 'similarity_function': 'DOT_PRODUCT' };

In this example:

(commenter, discussion_board_id) — the partition key columns. These must match the base table’s partition key.
comment_vector — the vector column to index.

For each unique partition key value, ScyllaDB maintains a separate vector index, which keeps index sizes small and queries local to a single node.
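The idea of one index per partition key value can be pictured with a small in-memory model. This is a conceptual sketch only: it ranks rows exactly by dot product, whereas a real vector index performs an approximate search, and the class and method names are hypothetical.

```python
from collections import defaultdict


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


class LocalIndexModel:
    """Toy model: one exact, brute-force 'index' per partition key value."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def add(self, partition_key, row_id, vector):
        self.partitions[partition_key].append((row_id, vector))

    def search(self, partition_key, query, limit):
        # Only the rows of the requested partition are ranked.
        rows = self.partitions.get(partition_key, [])
        ranked = sorted(rows, key=lambda r: dot(r[1], query), reverse=True)
        return [row_id for row_id, _ in ranked[:limit]]


index = LocalIndexModel()
index.add(("Alice", 42), "a1", [1.0, 0.0])
index.add(("Alice", 42), "a2", [0.0, 1.0])
index.add(("Bob", 7), "b1", [1.0, 0.0])

print(index.search(("Alice", 42), [0.9, 0.1], limit=5))  # ['a1', 'a2']
```

Rows stored under ("Bob", 7) never enter the ranking for ("Alice", 42), which is why the work per query depends on the size of one partition rather than the whole table.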

Querying with a Local Vector Index

When using a local vector index, you must specify the full partition key in the WHERE clause:

SELECT commenter, comment FROM myapp.comments_vs
WHERE commenter = 'Alice' AND discussion_board_id = 42
ORDER BY comment_vector ANN OF [0.1, 0.2, 0.3, 0.4, 0.5] LIMIT 5;

You can also combine partition key filtering with clustering column filtering:

SELECT commenter, comment FROM myapp.comments_vs
WHERE commenter = 'Alice' AND discussion_board_id = 42
  AND created_at >= '2024-01-01'
ORDER BY comment_vector ANN OF [0.1, 0.2, 0.3, 0.4, 0.5] LIMIT 5;

Caution

Inequality operators (>, <, >=, <=) and the IN operator are always slow in vector search queries, regardless of whether they are applied to partition key or clustering columns. These operators force ScyllaDB to search a much larger portion of the index space than an equality (=) filter. The slowdown is proportional to the filter’s selectivity: the fewer rows that match the filter, the more index entries must be scanned, and the slower the query becomes.

For best performance, design your schema so that the columns you need to filter on are partition key columns queried with equality (=) operators. Use a local vector index so that the search is confined to a single partition’s index.

Choosing Between Global and Local Indexes

Criteria	Global Vector Index	Local Vector Index
Index scope	All rows in the table	Rows within a single partition
Filter columns	Primary key columns	Primary key columns
Requires partition key in WHERE	No	Yes
Performance at scale (>10M vectors)	Always much slower (searches entire index space)	Fast (searches only one partition’s index)
Use case	Cross-partition similarity search	Per-tenant, per-user, or scoped search
ALLOW FILTERING required	Yes (when using WHERE clause)	No

General guidance:

Always prefer local indexes over global indexes for filtered vector search. Local indexes search only a single partition’s index, while global indexes must search the entire index space, making them significantly slower.
Design your schema so that columns you filter on are part of the partition key. This lets you use a local vector index with equality (=) filters on the partition key, which is the fastest possible filtering path.
Use global indexes only when you genuinely need to search across all data without knowing the partition key in advance. Be aware that performance degrades as the dataset grows.
Avoid inequality and IN operators in filtered vector queries. They force the database to scan a larger portion of the index, with slowdown proportional to selectivity (the fewer rows that match, the slower the query).

If both a global and a local vector index exist on the same vector column, ScyllaDB automatically selects the local index when the partition key is specified in the query, as it provides better performance.
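The selection rule can be summarized as a small decision function. This is a simplified sketch of the behavior described on this page, not ScyllaDB's planner: the function name and arguments are hypothetical, and raising an error when no index is usable is an assumption made for the example.

```python
def choose_index(partition_key, restricted_columns, has_local, has_global):
    """Pick an index kind for a filtered ANN query in this simplified model."""
    # The local index is usable only when every partition key column is restricted.
    full_partition_key = set(partition_key) <= set(restricted_columns)
    if has_local and full_partition_key:
        return "local"
    if has_global:
        return "global"
    # Assumption: with no usable index the query cannot run as an ANN search.
    raise ValueError("no usable vector index for this query")


pk = ("commenter", "discussion_board_id")

print(choose_index(pk, {"commenter", "discussion_board_id", "created_at"}, True, True))  # local
print(choose_index(pk, {"created_at"}, True, True))  # global
```

A query that restricts only created_at cannot use the local index in this model, so it falls back to the global one and pays the cost of searching the entire index space.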

Limitations

The TOKEN function, CONTAINS operator, and DISTINCT keyword are not supported in vector queries.
Only one local vector index can be defined for each combination of partition key columns and vector column.
Filtering on columns not in the primary key is not supported.

What’s Next

Working with Vector Search — vector data type, index creation, and ANN queries.
Quantization and Rescoring — reduce memory usage while maintaining search quality.
Vector Search Concepts — architecture and data flow.
