Vector Search Troubleshooting¶
This page lists common issues encountered when working with Vector Search in ScyllaDB and provides solutions for each.
Index Issues¶
Index creation fails with a tablets error¶
Symptom: CREATE CUSTOM INDEX returns an error about tablets not
being enabled.
Cause: The table’s keyspace was created without tablets support.
Solution: Vector indexes require tablets-enabled keyspaces. Create a new keyspace with tablets enabled and recreate the table:
CREATE KEYSPACE myapp
WITH replication = {
'class': 'NetworkTopologyStrategy',
'replication_factor': 3
}
AND tablets = { 'enabled': true };
See Tablets Requirement.
Index creation is slow¶
Symptom: CREATE CUSTOM INDEX takes a long time on a table with
existing data.
Cause: When creating an index on a table that already contains data, ScyllaDB must build the HNSW graph over all existing vectors in a background process.
Solution: This is expected behavior. Build time depends on the number of
rows and vector dimensionality. Monitor progress through the
Monitoring dashboard. For large datasets, consider
increasing construction_beam_width for higher-quality builds, or lowering
it to speed up construction at the expense of recall.
Cannot change index options after creation¶
Symptom: You need to change a vector index option, but ALTER INDEX is not supported.
Cause: ScyllaDB does not support altering vector index options.
Solution: Drop the existing index and recreate it with new options:
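-- my_table is a placeholder name; TABLE is a reserved word in CQL and cannot be used unquoted.
DROP INDEX IF EXISTS myapp.my_index;
CREATE CUSTOM INDEX my_index ON myapp.my_table(vec)
USING 'vector_index'
WITH OPTIONS = { 'similarity_function': 'COSINE' };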
Query Issues¶
ANN query returns no results¶
Symptom: ORDER BY ... ANN OF ... LIMIT k returns an empty result set.
Possible causes:
The vector index has not finished building.
No data has been inserted into the table.
The query vector dimensionality does not match the column definition.
Solution: Verify:
The index status is ACTIVE (check via DESCRIBE INDEX).
Data exists in the table (SELECT COUNT(*) FROM table).
The query vector has the correct number of dimensions.
ANN query returns stale results¶
Symptom: Recently inserted vectors do not appear in ANN query results.
Cause: Vector indexes are updated asynchronously via CDC. There is a short propagation delay (typically under 1 second, up to 30 seconds in edge cases).
Solution: Wait briefly and retry. See Write-to-Query Latency for details on the dual CDC reader system.
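The wait-and-retry advice can be sketched on the client side in Python. This is a minimal sketch, not a ScyllaDB API: `run_ann_query` is a hypothetical callable that you supply, which executes your ANN query and returns the primary keys of the rows it found. The initial delay is illustrative, and the 30-second ceiling is taken from the edge-case figure above.

```python
import time
from typing import Callable, Hashable, Iterable


def wait_until_visible(
    run_ann_query: Callable[[], Iterable[Hashable]],  # hypothetical: runs your ANN query, returns row keys
    expected_key: Hashable,
    timeout_s: float = 30.0,       # ceiling taken from the edge-case delay described above
    initial_delay_s: float = 0.2,  # illustrative starting delay, not a documented value
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry an ANN query with exponential backoff until expected_key appears.

    Returns True once the key is in the results, False if the timeout elapses first.
    """
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        if expected_key in set(run_ann_query()):
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(delay, remaining))
        delay *= 2
```

For the check to be meaningful, query with the vector you just inserted, so that the new row is expected to rank within the top k.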
Query returns unexpected results¶
Symptom: The top-k results seem irrelevant or have low similarity.
Possible causes:
The similarity function used in the index does not match the embedding model’s output characteristics.
The embedding model was changed after data was inserted, producing vectors in a different space.
search_beam_width is set too low, reducing recall.
Solution:
Verify the similarity function matches your model. See Choosing a Similarity Function.
Ensure all vectors in the table use the same embedding model.
Increase search_beam_width to improve recall (requires dropping and recreating the index).
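One quick diagnostic for a similarity-function mismatch: cosine similarity and a plain dot product rank vectors identically only when every vector has unit length. The Python sketch below checks a sample of embeddings for that property. It assumes the embeddings are available client-side as sequences of floats; the function names and the tolerance are illustrative.

```python
import math
from typing import Iterable, Sequence


def l2_norm(vec: Sequence[float]) -> float:
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in vec))


def all_unit_length(vectors: Iterable[Sequence[float]], tol: float = 1e-3) -> bool:
    """True if every vector's length is within tol of 1.0 (tol is an illustrative default)."""
    return all(abs(l2_norm(v) - 1.0) <= tol for v in vectors)
```

If this returns False for your model's output, a dot-product style similarity function will also reflect vector magnitude, which may explain rankings that look wrong under the assumption of cosine behavior.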
Data Issues¶
Insert fails with a vector dimension mismatch¶
Symptom: CQL INSERT returns an error about vector dimensions.
Cause: The number of elements in the vector literal does not match the column’s declared dimension.
Solution: Ensure the vector has exactly the number of elements declared
in the schema. For example, if the column is vector<float, 768>, every
inserted vector must have exactly 768 elements.
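A client-side guard can catch the mismatch before the write reaches the cluster. The following is a minimal Python sketch; `check_dimension` is a hypothetical helper, and the expected dimension must be kept in sync with the schema by hand.

```python
from typing import Sequence


def check_dimension(vec: Sequence[float], expected_dim: int) -> None:
    """Raise ValueError unless vec has exactly expected_dim elements."""
    if len(vec) != expected_dim:
        raise ValueError(
            f"vector has {len(vec)} elements, column expects {expected_dim}"
        )
```

For a vector<float, 768> column, call `check_dimension(embedding, 768)` before executing the INSERT. The same check applies to query vectors used in ANN queries.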
TTL is not supported¶
Symptom: Writes with TTL on a column with a vector index are silently accepted but TTL is ignored.
Cause: TTL is not supported for vector-indexed columns.
Solution: Do not set TTL on vector-indexed columns. See CQL Features Not Supported for the full list of limitations.
Connectivity Issues¶
Cannot connect to the cluster¶
Symptom: Driver or cqlsh cannot connect to the cluster.
Solution:
Verify your cluster is in ACTIVE status in the ScyllaDB Cloud console.
Ensure your client IP is allowed in the cluster’s connection settings.
Verify TLS is enabled in your driver configuration (ScyllaDB Cloud requires TLS).
Check that the DC-aware load balancing policy is configured correctly.
See Checking Cluster Availability for connection troubleshooting.
Performance Issues¶
Memory pressure or OOM on vector search nodes¶
Symptom: Queries time out or vector search nodes restart unexpectedly.
Cause: The vector index size exceeds available RAM on the vector search nodes. The HNSW index resides entirely in memory, so under-provisioned instances will experience memory pressure.
Solution:
Use quantization (f16 or i8) to reduce memory per vector. See Quantization and Rescoring.
Choose a larger instance type with more RAM. See Supported Instance Types.
Use the Sizing Guide to estimate memory requirements before scaling up.
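To see roughly why quantization helps, the Python sketch below computes the raw storage for the vector elements alone. It is a lower bound under stated assumptions: the label "f32" for unquantized 32-bit floats is introduced here for illustration, and the estimate excludes the HNSW graph links and any other per-vector overhead. Use the Sizing Guide for real provisioning figures.

```python
# Bytes per element: 32-bit float (unquantized), 16-bit float, 8-bit integer.
# "f32" is a label assumed here for the unquantized case.
BYTES_PER_ELEMENT = {"f32": 4, "f16": 2, "i8": 1}


def raw_vector_bytes(num_vectors: int, dimensions: int, element_type: str = "f32") -> int:
    """Bytes needed for the vector elements only (no graph or metadata overhead)."""
    return num_vectors * dimensions * BYTES_PER_ELEMENT[element_type]
```

For 10 million vectors of 768 dimensions, the element data alone is 30.72 GB unquantized, 15.36 GB with f16, and 7.68 GB with i8 (decimal gigabytes).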
Performance degradation during heavy writes¶
Symptom: Query latency increases during bulk data loading or heavy write periods.
Cause: The CDC readers that propagate changes to vector search nodes consume additional CPU and memory when processing a high volume of changes. This is expected behavior — the index is being continuously updated.
Solution:
This is temporary. Query latency returns to normal once the write burst completes and the CDC backlog is processed.
For planned bulk loads, consider loading data before creating the index, so the HNSW graph is built in a single pass rather than incrementally.
Monitor the Write-to-Query Latency — during heavy writes, propagation latency may increase from sub-second to several seconds.
Filtering query returns fewer results than expected¶
Symptom: A SELECT ... WHERE ... ORDER BY ... ANN OF ... LIMIT 10
query returns fewer than 10 rows.
Cause: The filter is highly selective and not enough matching vectors
exist in the candidate set. The ANN search first finds the nearest vectors,
then applies the filter. If the filter eliminates most candidates, fewer
results remain. This is especially pronounced with global indexes (which must
search the entire index space) and with inequality (>=, <=, etc.) or
IN operators, where the slowdown is proportional to selectivity.
Solution:
Increase the LIMIT to a higher value to give the search more candidates to work with.
Use a less selective filter condition.
Switch to a local (per-partition) vector index with equality (=) filters on partition key columns; this is the fastest filtering path. See Filtering.
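The first option, raising LIMIT, can be automated on the client. The Python sketch below doubles the LIMIT until the filtered query returns at least k rows or a cap is reached, then keeps the first k. It is an illustration only: `run_filtered_ann` is a hypothetical callable that you supply, which runs your filtered ANN query with the given LIMIT and returns rows in the order the query produced them, and `max_limit` is an arbitrary cap.

```python
from typing import Callable, List, Sequence, TypeVar

Row = TypeVar("Row")


def search_with_overfetch(
    run_filtered_ann: Callable[[int], Sequence[Row]],  # hypothetical: runs the query with this LIMIT
    k: int,
    max_limit: int = 1000,  # illustrative cap to bound query cost
) -> List[Row]:
    """Double LIMIT until at least k rows come back or max_limit is hit; keep the first k."""
    limit = k
    while True:
        rows = list(run_filtered_ann(limit))
        if len(rows) >= k or limit >= max_limit:
            return rows[:k]
        limit = min(limit * 2, max_limit)
```

Each retry is a full query, so this trades latency for completeness. If the cap is reached regularly, a less selective filter or a local index is likely the better fix.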
Driver version incompatibility¶
Symptom: The driver cannot parse vector column responses, or vector inserts fail with type errors.
Cause: The VECTOR data type requires driver support. Older driver
versions may not recognize the vector type.
Solution: Upgrade to a driver version that supports vectors. See ScyllaDB Drivers — Support for Vector Search to check which versions support the vector type.
Index build progress monitoring¶
Symptom: You created an index on a table with existing data and want to know if the index has finished building.
Cause: When an index is created on a table that already contains data, ScyllaDB builds the HNSW graph as a background process. Until the build completes, ANN queries may return incomplete results.
Solution:
Check the index status using DESCRIBE INDEX; the status should be ACTIVE when the build is complete.
Monitor the vector search node metrics in the Monitoring dashboard.
Build time depends on the number of rows, vector dimensionality, and the construction_beam_width parameter. Larger datasets with higher dimensions take longer.
What’s Next¶
Working with Vector Search — CQL syntax reference for vector tables, indexes, and queries.
Vector Search Concepts — architecture and design principles.
Reference — instance types, CQL reference, and API endpoints.