
Sizing and Capacity Planning¶

This page helps you estimate the memory requirements for your vector search workload and choose appropriate instance types. Because the HNSW index resides entirely in memory on the vector search nodes, memory is typically the constraining resource.

Memory Estimation Formula¶

The total memory required for a vector index depends on three factors:

  1. Raw vector data — the number of vectors × dimensions × bytes per dimension (determined by the quantization level).

  2. HNSW graph overhead — each node in the graph stores edges to its neighbors. The overhead scales with the m parameter (maximum_node_connections).

  3. Operational headroom — memory for query processing, CDC readers, and system overhead.

The simplified formula:

\[\text{Memory} \approx N \times \left( D \times B + m \times 16 \right) \times 1.2\]

Where:

  • \(N\) = number of vectors

  • \(D\) = number of dimensions

  • \(B\) = bytes per dimension (see table below)

  • \(m\) = maximum_node_connections (default: 16)

  • The \(m \times 16\) term accounts for HNSW graph edges (each edge stores a neighbor ID and metadata)

  • The 1.2 multiplier provides ~20% operational headroom

Bytes per Dimension by Quantization Level¶

| Quantization | Bytes / dim | Notes |
|---|---|---|
| f32 | 4 | Full precision (default). Highest recall, highest memory. |
| f16 | 2 | Half precision. Good balance for most workloads. |
| bf16 | 2 | Brain float. Statistically equivalent to f16 for most models. |
| i8 | 1 | 8-bit integer. ~4× memory savings vs. f32. |
| b1 | 0.125 | Binary (1 bit per dimension). ~32× memory savings vs. f32. Use with rescoring. |
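The formula and the bytes-per-dimension table can be combined into a small estimator. The following Python sketch is illustrative only; the function name, the `BYTES_PER_DIM` mapping, and the constants simply encode the formula above and are not a ScyllaDB API.

```python
# Illustrative sketch of the sizing formula above; names and constants
# are this example's own, not a ScyllaDB API.

BYTES_PER_DIM = {
    "f32": 4.0,
    "f16": 2.0,
    "bf16": 2.0,
    "i8": 1.0,
    "b1": 0.125,  # 1 bit per dimension
}

EDGE_BYTES = 16   # per-edge cost in the m x 16 term of the formula
HEADROOM = 1.2    # ~20% operational headroom


def estimate_index_memory_gb(n_vectors: int, dims: int,
                             quantization: str = "f32",
                             m: int = 16) -> float:
    """Estimated vector index memory in decimal GB (10**9 bytes)."""
    bytes_per_dim = BYTES_PER_DIM[quantization]
    total_bytes = n_vectors * (dims * bytes_per_dim + m * EDGE_BYTES) * HEADROOM
    return total_bytes / 1e9


# 10M vectors, 768 dimensions, full precision:
print(f"{estimate_index_memory_gb(10_000_000, 768, 'f32'):.1f} GB")  # 39.9 GB
```

The worked examples below follow directly from this calculation.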

Worked Examples¶

Example 1: 10M vectors, 768 dimensions, f32¶

A typical OpenAI or sentence-transformer embedding workload:

\[\begin{split}\text{Memory} &\approx 10{,}000{,}000 \times (768 \times 4 + 16 \times 16) \times 1.2 \\ &\approx 10{,}000{,}000 \times (3{,}072 + 256) \times 1.2 \\ &\approx 10{,}000{,}000 \times 3{,}328 \times 1.2 \\ &\approx 39.9 \text{ GB}\end{split}\]

Recommendation: An r7g.8xlarge (or larger) instance on AWS, or n4-highmem-16 (or larger) on GCP.

Example 2: 10M vectors, 768 dimensions, i8 (quantized)¶

The same workload with i8 quantization:

\[\begin{split}\text{Memory} &\approx 10{,}000{,}000 \times (768 \times 1 + 16 \times 16) \times 1.2 \\ &\approx 10{,}000{,}000 \times (768 + 256) \times 1.2 \\ &\approx 10{,}000{,}000 \times 1{,}024 \times 1.2 \\ &\approx 12.3 \text{ GB}\end{split}\]

Savings: ~3.2x less memory than f32 for the same dataset. The savings fall short of the full 4x because quantization compresses only the vector data; the HNSW graph structure (the \(m \times 16\) term) stays the same size regardless of quantization level. Combine with oversampling and rescoring to maintain recall. See Quantization and Rescoring.

Example 3: 100M vectors, 1536 dimensions, f16¶

A large-scale workload with OpenAI text-embedding-3-large vectors:

\[\begin{split}\text{Memory} &\approx 100{,}000{,}000 \times (1{,}536 \times 2 + 16 \times 16) \times 1.2 \\ &\approx 100{,}000{,}000 \times (3{,}072 + 256) \times 1.2 \\ &\approx 100{,}000{,}000 \times 3{,}328 \times 1.2 \\ &\approx 399 \text{ GB}\end{split}\]

Recommendation: This workload requires multiple vector search nodes. ScyllaDB Cloud distributes the index across nodes within each Availability Zone. Contact ScyllaDB support for guidance on large deployments.
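As a rough first cut for such multi-node deployments, you can divide the total estimated index memory by the usable RAM per node. The sketch below is illustrative: the 0.8 usable-RAM fraction and the 256 GB node size are assumptions for this example, not ScyllaDB-published figures, and actual placement also depends on replication across Availability Zones.

```python
import math


def nodes_needed(total_index_gb: float, node_ram_gb: float,
                 usable_fraction: float = 0.8) -> int:
    """First-cut node count: total index memory over usable RAM per node.

    usable_fraction reserves RAM for the OS and other services; 0.8 is
    an assumption for this sketch, not a published ScyllaDB figure.
    """
    return math.ceil(total_index_gb / (node_ram_gb * usable_fraction))


# The ~399 GB index from Example 3 on hypothetical 256 GB RAM nodes:
print(nodes_needed(399, 256))
```

Treat the result as a lower bound for a capacity conversation, not a deployment plan.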

Impact of Higher m Values¶

Increasing m (maximum_node_connections) improves recall but adds graph overhead. The following table shows the impact for 10M vectors of 768 dimensions with f32 quantization. Graph overhead is the raw \(N \times m \times 16\) bytes before headroom; totals include the 1.2 headroom. All values are decimal GB.

| m | Graph overhead | Total memory | Trade-off |
|---|---|---|---|
| 16 (default) | ~2.6 GB | ~39.9 GB | Good default for most workloads. |
| 32 | ~5.1 GB | ~43.0 GB | Higher recall for high-dimensional vectors. |
| 64 | ~10.2 GB | ~49.2 GB | Maximum recall; recommended for D > 1024. |
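This trade-off can be recomputed directly from the sizing formula. The sketch below sweeps m for the same 10M-vector, 768-dimension f32 workload; it simply re-evaluates the formula, so hand-rounded figures elsewhere may differ slightly.

```python
# Sweep m through the sizing formula for a fixed workload.
N, D, B = 10_000_000, 768, 4        # 10M vectors, 768 dims, f32 (4 bytes/dim)

for m in (16, 32, 64):
    graph_gb = N * m * 16 / 1e9                  # raw edge storage, no headroom
    total_gb = N * (D * B + m * 16) * 1.2 / 1e9  # full formula incl. headroom
    print(f"m={m}: graph ~{graph_gb:.1f} GB, total ~{total_gb:.1f} GB")
```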

Quantization Impact Summary¶

For 10M vectors of 768 dimensions with m=16. The vector data and graph overhead columns are raw sizes; totals include the 1.2 headroom. All values are decimal GB.

| Quantization | Vector data | Graph overhead | Total | Recall impact |
|---|---|---|---|---|
| f32 | 30.7 GB | 2.6 GB | ~39.9 GB | Baseline (highest recall) |
| f16 | 15.4 GB | 2.6 GB | ~21.5 GB | Negligible recall loss |
| i8 | 7.7 GB | 2.6 GB | ~12.3 GB | Minor recall loss; use oversampling |
| b1 | 1.0 GB | 2.6 GB | ~4.2 GB | Significant recall loss; use rescoring |

Notice that the graph overhead (~2.6 GB) is constant across all quantization levels; only the vector data column shrinks. This is why the actual memory savings from quantization are always less than the raw compression ratio: for example, i8 is 4x smaller per dimension than f32, but total memory drops only ~3.2x.
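The effective savings ratios can be checked against the formula. The snippet below compares each quantization level's total against the f32 baseline for the same 10M-vector, 768-dimension, m=16 workload; the names are this example's own.

```python
# Effective (not raw) memory savings per quantization level.
BYTES_PER_DIM = {"f32": 4.0, "f16": 2.0, "i8": 1.0, "b1": 0.125}
N, D, M = 10_000_000, 768, 16


def total_gb(bytes_per_dim: float) -> float:
    # Full formula: vector data + graph edges, times 1.2 headroom.
    return N * (D * bytes_per_dim + M * 16) * 1.2 / 1e9


baseline = total_gb(BYTES_PER_DIM["f32"])
for name, b in BYTES_PER_DIM.items():
    ratio = baseline / total_gb(b)
    print(f"{name}: ~{total_gb(b):.1f} GB total, {ratio:.2f}x vs. f32")
```

Note how b1's raw 32x per-dimension compression yields only a ~9.5x total reduction here, because the fixed graph overhead dominates once the vector data is that small.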

Sizing Guidelines¶

  • Start with the memory formula to estimate your baseline requirement. Add 20-30% headroom for operational overhead.

  • Choose quantization early. Quantization has the largest impact on memory. For most workloads, f16 or i8 with oversampling provides a good balance of memory savings and recall.

  • Match instance type to workload. Choose an instance with enough RAM for your estimated memory requirement. See Supported Instance Types for available options.

  • Plan for growth. If your dataset is growing, size for expected data volume 6-12 months out.

  • Test with your data. Memory formulas are estimates. Load a representative sample and measure actual memory usage before committing to production instance types.

For detailed specifications, see the sizing algorithm documentation and the 1B vector benchmark.

What’s Next¶

  • Vector Search Deployments — create, enable and manage vector search clusters.

  • Quantization and Rescoring — reduce memory usage while maintaining recall.

  • Reference — instance types, CQL syntax, and API endpoints.

Last updated on 24 Mar 2026.