
I would like to understand the recommendation of a maximum of 1TB of load per node that is repeatedly being suggested, by DataStax in particular.

I have not seen anywhere how such a limit translates into any metric, beyond quite subjective comments like faster node replacement or backups. These types of comments are very ambiguous (what is fast for you may not be for a different production environment) or might even be irrelevant. (See this)

Furthermore, 1TB seems way too low a limit these days, when you can get an 8TB SATA disk for barely above $130 USD.

  1. Does the 1TB limit correspond to an actual inherent limitation in the design of Cassandra?
  2. Has this limit been quantified, say by way of showing (e.g. graphs) how some metrics clearly worsen above it?
  3. Is this limit more relevant than the "<50% capacity" rule? Say load is at 3TB but disk capacity is at 50%: would there still be a need to increase the number of nodes?

This constraint is such that it is probably always easier and cheaper to fix the capacity threshold than the load one. That feels unreasonable to me and, if true, seriously calls into question Cassandra's adequacy for small to medium-sized businesses.

1 Answer


I'll do my best to give you the answers you're after.

Short answers are: 1. no, 2. yes, 3. it depends™

In more detail…

Cassandra 5.0+ will comfortably permit densities higher than 4TB per node. The recommended node density is based on hardware, data model and operational factors. An updated, accurate recommendation for 5.0 is in the works.

The recommendation for older versions of Apache Cassandra, and all its variants, is 1-4TB of data per node. This means 2-8TB disks. (And disks need to be able to handle a minimum of 10k sustained IOPS, and don't use JBOD.)

The 50% rule is the baseline rule for tables using STCS (SizeTieredCompactionStrategy). While a compaction is running, the process can temporarily duplicate on disk the data it is compacting. More tables means no single compaction will need all of this 50% headroom (keeping in mind there are concurrent compactors). Other compaction strategies change this behaviour. LCS (LeveledCompactionStrategy) has levels and works with smaller compactions, so it can often be OK with up to 70% disk usage. TWCS (TimeWindowCompactionStrategy) only ever compacts the current time window, so the 50% becomes 50% of the maximum time-window size.
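
To make this concrete, here is a minimal CQL sketch of how each strategy is declared per table. The keyspace/table names are hypothetical, and the option values are illustrative defaults rather than tuned recommendations:

    -- Hypothetical tables, showing how each compaction strategy is set.
    -- Free-space headroom needed differs: roughly 50% for STCS, ~30% for LCS,
    -- and about 50% of one time window's worth of data for TWCS.
    ALTER TABLE my_ks.events_stcs
      WITH compaction = {'class': 'SizeTieredCompactionStrategy'};

    ALTER TABLE my_ks.events_lcs
      WITH compaction = {'class': 'LeveledCompactionStrategy',
                         'sstable_size_in_mb': 160};

    ALTER TABLE my_ks.events_twcs
      WITH compaction = {'class': 'TimeWindowCompactionStrategy',
                         'compaction_window_unit': 'DAYS',
                         'compaction_window_size': 1};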

There are other things that use up disk space, like backups and snapshots, streaming operations, etc., so the 50% rule is a safe recommendation to throw out there without going into the full details. In practice it's typically more like 60-70%. But as you've already pointed out, disk at these sizes is cheap.
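
To put rough numbers on that (purely as an illustration): on an 8TB volume, staying around 60-70% usage means roughly 4.8-5.6TB of data, leaving 2.4-3.2TB of headroom for in-flight compactions, snapshots and incoming streams.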

The limitation on data per node is about operational ease. Certain operations start to become more time-consuming and clumsy above the 1TB node data density, especially all streaming-related activities: bootstrapping, decommissioning, repairs, etc. Many clusters are running at 4TB just fine, despite the heaviness some tasks impose on the operator. If using LCS then there can be a lot of IO from compactions, which may hurt read latencies. If using STCS then compacting the largest tier can take a long time, leaving lower tiers pending, which also may hurt read latencies. TWCS doesn't suffer these issues, and we have seen people use up to 16TB per node, but this makes those streaming operations painful. Again, the safe recommendation to throw out there is 1TB.
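
As a back-of-the-envelope illustration (the throughput figure is an assumption, not a benchmark): if a replacement node has to stream 4TB at an effective end-to-end rate of around 50 MB/s, that's roughly 4,000,000 MB / 50 MB/s ≈ 80,000 s, i.e. close to a full day of streaming, versus around 5-6 hours for 1TB, and that's before any compaction catch-up on the new node.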

Apache Cassandra 5.0, and DataStax Enterprise 6.8, have UCS (UnifiedCompactionStrategy). This compaction strategy can be adjusted/tuned between the trade-offs of write and read amplification, i.e. a sliding scale between behaving like STCS or like LCS (a poor man's analogy). With this compaction strategy, and a number of other significant improvements (like trie-based SSTable indexes), node density can be much higher. And streaming operations have been improved to be both faster and "lighter".
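
For completeness, a minimal CQL sketch of what that tuning dial looks like; the table name is hypothetical and 'T4' is just one example value (T-values lean towards tiered, STCS-like behaviour; L-values towards levelled, LCS-like behaviour):

    -- Hypothetical table: UCS tuned towards tiered (write-optimised) behaviour.
    -- Swap 'T4' for an L-value (e.g. 'L4') to lean towards read-optimised,
    -- LCS-like behaviour.
    ALTER TABLE my_ks.events_ucs
      WITH compaction = {'class': 'UnifiedCompactionStrategy',
                         'scaling_parameters': 'T4'};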

Cassandra is a scale-out technology; embrace that mindset when capacity planning from day one. This isn't just about taking advantage of commodity hardware, but about Cassandra's key design characteristics of durability, availability and linear scalability.

Your point about Cassandra not separating compute from storage scalability is valid. Stay tuned for more action on this front: from tiered storage and separate processes/services for background stages (SEDA) to TCM (CEP-21), there are plenty of improvements coming.

Having worked with hundreds (maybe thousands) of production clusters, as a consultant at The Last Pickle and at DataStax, most of them small to medium sized as per the concerns you raise, I haven't seen this issue be the blocker you suppose it to be. Cassandra is the most popular leaderless distributed database out there for a reason. But yes, your hardware choice is a little more select (remember: scale out, not up, and locally attached SSDs, not a SAN). Otherwise we're really talking here about performance/cost optimisations, which we all do love, but which are not blockers.

I recommend you go check out AstraDB, which already has all these advantages and more.

  • Thanks. Astra unfortunately plays in a higher "division" due to much more expensive license costs - think on a scale of tens of TB of total data. For the same reasons, scaling out horizontally is difficult to justify in my case, although not as much. I wish I could see a comparison of actual latencies, e.g. as given by OpsCenter, for different loads per node, or other metrics. In my mind I'd like to assess more quantitatively how much strain we are putting on our cluster. Still, recommending 1TB in 2024 seems unrealistic. But it could be that Cassandra is simply not optimal for us; you are right.
    – MASL
    Commented May 31 at 19:13
  • I will wait a couple more days. If there isn't a more tangible answer, I'll accept yours. Thanks again for taking the time to reply.
    – MASL
    Commented May 31 at 19:21
