Authored by Andrei Manakov, Staff Software Engineer, and reviewed by the Metapress editorial team, this article was published following a careful evaluation process to ensure quality, relevance, and editorial standards.
***
When ScyllaDB clusters struggle under heavy write workloads, the underlying cause is often overlooked: inadequate compaction settings. During traffic surges, clusters can hit their limits because settings that perform well in normal operation become ineffective under pressure. In this article, I’ll explore how to optimize your compaction strategy to keep performance consistent even when write volumes spike.
Know Your Weapons: The Four Compaction Strategies
ScyllaDB, Cassandra’s C++-powered and potentially faster cousin, shares the common LSM-tree challenge of managing immutable SSTables efficiently. In an LSM-tree database like ScyllaDB, writes first accumulate in memory and are flushed to disk as immutable SSTables once the memtable fills up. This creates multiple files per table, which hurts read performance. Compaction is the periodic background process that merges these SSTables to reduce fragmentation and improve read efficiency.
For high-write systems, compaction is the difference between a smoothly functioning database and a catastrophic incident that casts doubt on one’s career choices.
Size-Tiered Compaction Strategy (STCS)
While STCS is the default and often a safe starting point, it may not scale well for every production workload.
STCS works by grouping similarly sized SSTables together and compacting them when enough accumulate in a tier:
sstable (50MB) ┐
sstable (52MB) ┼─> compaction ──> new sstable (150MB)
sstable (48MB) ┘
Over time, STCS can create massive SSTables that are slow to compact. During sustained write loads, smaller SSTables accumulate faster than they can be merged. This leads to inefficient reads, where a single row may be spread across multiple files, and increased disk usage due to redundant data being rewritten.
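If you stay on STCS, you can still tune how aggressively tiers are merged on a per-table basis. Here is a minimal sketch; the keyspace and table names are placeholders, and the threshold values shown are the usual defaults rather than recommendations:

```cql
-- Explicitly configure STCS (the default strategy) on a hypothetical table.
-- min_threshold / max_threshold bound how many similarly sized SSTables
-- a tier collects before it is compacted.
ALTER TABLE my_keyspace.events
WITH compaction = {
    'class'         : 'SizeTieredCompactionStrategy',
    'min_threshold' : '4',
    'max_threshold' : '32'
};
```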
Time-Window Compaction Strategy (TWCS)
TWCS is the specialist on the team. It organizes data into time buckets, compacting only within those windows. Once a window closes, its data gets compacted into a single SSTable that’s never touched again.
It’s brilliant for time-series data or logs: anything where newer data matters more than older data and reads typically focus on recent information.
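As an illustration, a TWCS-backed time-series table might look like the following; the schema, the daily window, and the one-week TTL are all assumptions made for the sake of the example:

```cql
-- Hypothetical time-series table bucketed into daily compaction windows.
CREATE TABLE my_keyspace.sensor_metrics (
    sensor_id uuid,
    ts        timestamp,
    value     double,
    PRIMARY KEY (sensor_id, ts)
) WITH compaction = {
    'class'                  : 'TimeWindowCompactionStrategy',
    'compaction_window_unit' : 'DAYS',
    'compaction_window_size' : '1'
}
AND default_time_to_live = 604800;  -- 7 days, so whole windows expire together
```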
Leveled Compaction Strategy (LCS)
LCS is the most structured of the compaction strategies, organizing data into distinct levels with exponentially increasing sizes and minimal overlap.
It provides stellar read performance even under write pressure, but at the cost of higher background compaction activity.
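Switching an existing, read-latency-sensitive table to LCS is a one-statement change; the table name below is a placeholder and the SSTable target size is simply a commonly used value:

```cql
-- Hypothetical read-heavy table moved to LCS.
-- sstable_size_in_mb caps the target size of SSTables within each level.
ALTER TABLE my_keyspace.user_profiles
WITH compaction = {
    'class'              : 'LeveledCompactionStrategy',
    'sstable_size_in_mb' : '160'
};
```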
Incremental Compaction Strategy (ICS)
Incremental Compaction Strategy (ICS) is designed to reduce write amplification and space overhead for frequently updated datasets. It is particularly effective when working with update-heavy datasets like user sessions, shopping carts, or device state tracking, where rows are modified often but not necessarily large in size.
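Enabling ICS is likewise a per-table setting. Note that ICS availability depends on your ScyllaDB edition and version, and the table name here is a placeholder:

```cql
-- Hypothetical update-heavy table (e.g. user sessions) moved to ICS.
ALTER TABLE my_keyspace.user_sessions
WITH compaction = {
    'class' : 'IncrementalCompactionStrategy'
};
```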
Diagnosing Your Write Workload
Before choosing a strategy, you need to understand what you’re dealing with. Too many teams pick compaction strategies based on blog posts rather than on actual workload analysis.
The Questions That Matter
- Is your data time-oriented? Event logs, metrics, and audit trails are natural fits for TWCS.
- How are your reads distributed? If you’re predominantly reading recent data (last 24-48 hours), TWCS shines. If reads span old and new data equally, LCS might be better.
- What’s your delete pattern? Heavy deletes or updates create tombstones that become compaction magnets.
- What’s your hardware situation? LCS demands more consistent CPU and I/O resources than the other strategies.
Metrics That Tell the Truth
Launch your ScyllaDB monitoring stack and look for these telltale indicators (a quick CQL spot-check follows the list):
- Pending compactions rising steadily: Your current strategy can’t keep up
- High sstables_per_read values: Reads are checking too many files
- Write latency spikes correlating with compaction activity: Compaction is stealing resources from your writes
- Disk space usage growing faster than actual data: Inefficient compaction is wasting space
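If you want a rough look at compaction activity without dashboards, recent runs can be pulled from the compaction history table, assuming your ScyllaDB version exposes the Cassandra-compatible system.compaction_history schema:

```cql
-- Recent compaction runs: the compacted_at timestamps show whether compaction
-- keeps pace with writes, and bytes_in vs. bytes_out shows how much redundant
-- data each run removed.
SELECT keyspace_name, columnfamily_name, compacted_at, bytes_in, bytes_out
FROM system.compaction_history
LIMIT 20;
```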
Real-World Strategy Selection
When STCS Makes Sense
STCS works surprisingly well when:
- You’re dealing with unpredictable, bursty write patterns
- Your total dataset size is relatively stable (not constantly growing)
- You perform regular cleanup during low-traffic periods
When TWCS Makes Sense
TWCS is tailor-made for:
- Time-series metrics and monitoring data
- Log aggregation and analysis
- Any data with a natural expiration cycle
When LCS Makes Sense
LCS shines when:
- Read performance absolutely cannot suffer, even during write spikes
Operational Wisdom
After years of managing ScyllaDB clusters under write pressure, here are the lessons that matter:
Never Skip Testing
I cannot emphasize this enough: simulate your production write patterns before deployment. I’ve seen teams confidently push changes only to discover that their write pattern had peculiarities that interacted badly with their chosen strategy.
Monitor the Right Metrics
Set up dashboards and alerts for:
- Pending compactions (growing backlog is your first warning)
- Disk space headroom (compaction needs temporary space)
- Write vs. compaction throughput ratio (ideally at least 3:1)
- SSTable count per read (a spike here signals degraded read efficiency)
- CPU usage by compaction processes (useful for capacity planning)
The TTL Trap
Time-to-Live settings can create tombstone explosions that overwhelm compaction. If you’re using TTLs extensively with STCS, you’re basically asking for trouble. Either switch to TWCS or carefully manage your TTL distribution to avoid mass expirations.
Also, make sure your TTL values exceed gc_grace_seconds. If data expires before this grace period ends, Scylla will retain tombstones to ensure consistency during repairs, leading to unnecessary storage bloat and compaction overhead.
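Putting both knobs on one table makes that relationship explicit. The numbers below are placeholders chosen only so that the TTL comfortably exceeds gc_grace_seconds:

```cql
-- Hypothetical log table: rows live for 7 days, while the repair grace period
-- is 1 day, so the TTL exceeds gc_grace_seconds as described above.
ALTER TABLE my_keyspace.request_logs
WITH default_time_to_live = 604800   -- 7 days
AND gc_grace_seconds = 86400;        -- 1 day
```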
Infrastructure Matters
- Use fast SSDs for ScyllaDB — compaction is I/O intensive
- Consider dedicated disks for commitlog vs. data directories
The Bottom Line
ScyllaDB’s compaction isn’t just a background housekeeping task — it’s central to performance under high-write loads. The right strategy depends entirely on your specific workload patterns, read requirements, and hardware resources.
Remember: there’s no universal “best” strategy. Anyone claiming otherwise hasn’t dealt with enough varied workloads. Start with understanding your data, test thoroughly, monitor aggressively, and be prepared to adjust as your system evolves.
Most importantly, don’t leave compaction as an afterthought. In high-write ScyllaDB deployments, it deserves as much attention as your schema design and hardware selection. Your future self will thank you.