I am looking for a way to optimize a large table with 8M+ rows.

The table structure looks like this:

  • id INT PRIMARY KEY AUTO_INCREMENT
  • removed TINYINT(1)
  • status TINYINT(1)
  • ~10 other data fields

There is also a unique index on (id, removed, status).

About half of the rows have removed=1 and half have removed=0. About 90% of the rows have status=1; the remaining 10% have values in the range 0-9.

The application uses about 200 different queries that access this table directly or through JOINs. Rewriting the application and its queries is out of my scope.

About 90% of the queries access only removed=0 AND status=1 rows (WHERE ... removed=0 AND status=1 ...), ~1% access both removed and unremoved rows regardless of status, and 9% are direct PK hits (WHERE id=X).

The question is - will partitioning by the removed field (0,1) speed up the 90% of queries with removed=0? If the InnoDB engine has to access only ~3.5M instead of 7M rows, shouldn't that theoretically speed up all queries? Or is this not the case?

2 Answers

2

Partitioning is not a tool for improving the performance of SELECT, INSERT, UPDATE, or DELETE queries. Rather, it's a tool for improving management of the table (such as when you need to TRUNCATE an entire partition at once). This is a common misconception, because less data is better, right? But that's what indexes are for.

Partitioning only reduces the data in a linear fashion: two partitions halve the rows to look through at best. Indexes (specifically of the B-Tree kind) divide the data logarithmically, which is exponentially more efficient than partitioning.
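To put rough numbers on the linear versus logarithmic point, here is a small back-of-the-envelope sketch in Python. The fanout of 100 keys per B-Tree page is purely an assumption for illustration (real InnoDB fanout depends on key and page size), and `full_scan_rows` and `btree_levels` are hypothetical helper names, not anything from MySQL.

```python
def full_scan_rows(total_rows: int, partitions: int = 1) -> int:
    """Rows examined when scanning one of `partitions` equal-sized partitions."""
    return total_rows // partitions


def btree_levels(total_rows: int, fanout: int = 100) -> int:
    """Smallest number of levels such that fanout**levels >= total_rows.

    `fanout` is an assumed, illustrative number of keys per page.
    """
    levels, capacity = 1, fanout
    while capacity < total_rows:
        capacity *= fanout
        levels += 1
    return levels


ROWS = 7_000_000  # row count used in the question

for parts in (1, 2):
    rows_per_partition = ROWS // parts
    print(
        f"partitions={parts}: "
        f"scan examines {full_scan_rows(ROWS, parts)} rows, "
        f"B-Tree lookup walks {btree_levels(rows_per_partition)} levels"
    )
```

Under that assumed fanout, halving the table halves the cost of a scan (7M to 3.5M rows), but an index lookup walks 4 levels in both cases, so the partition split buys an indexed query essentially nothing.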

So to answer your question(s):

The question is - will partitioning by the removed field (0,1) speed up the 90% of queries with removed=0? If the InnoDB engine has to access only ~3.5M instead of 7M rows, shouldn't that theoretically speed up all queries? Or is this not the case?

No, it's not the case. Any performance gain would likely be negligible, and partitioning can actually make queries slower (especially when data is needed from more than one partition).

You're better off understanding what the queries generally filter on, by looking at their query plans (EXPLAIN ANALYZE), and tuning the indexes on the tables those queries reference. Sometimes query tuning or database re-architecting is the only proper solution, too.

1

Thumbs up to J.D.'s answer. What index would or would not be better?...

  • In MySQL, PRIMARY KEY(id) is necessarily UNIQUE. Hence, UNIQUE(id, removed, status) is wasteful. Drop that index.
  • The 9% (just WHERE id = ...) are very efficiently handled by the PRIMARY KEY. In fact, PARTITIONing would slow down this case because of the need to check each partition.
  • The 1% -- You say there is no WHERE? Then a table scan is best.
  • MariaDB has "histogram" statistics. This may help the Optimizer understand the uneven distribution of status values. If so, I recommend INDEX(status, removed) for helping when testing rare status values (other than 1).
  • When testing removed, but not status, no index is beneficial. Partitioning on just removed may help this case, but hurt most other cases.
  • removed <> 0 probably prevents the use of indexing and partitioning. Do try to fix the code to say removed = 1. (INDEXes are useful for equality and ranges, not for inequality.)
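As a rough illustration of the INDEX(status, removed) suggestion for rare status values, the sketch below compares a query plan before and after adding such an index. It uses Python's built-in sqlite3 module as a stand-in, so this is SQLite, not MySQL or MariaDB: the optimizers differ, and the real check is to run EXPLAIN against your own server. The table name `items`, the `payload` column, the index name, and the synthetic data are all made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table loosely mirroring the question: id, removed, status, other data.
conn.execute(
    "CREATE TABLE items ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " removed INTEGER NOT NULL,"
    " status INTEGER NOT NULL,"
    " payload TEXT)"
)

# Rough synthetic data: removed alternates 0/1, most rows get status=1,
# and every tenth row gets a status in 0-9.
rows = [(i % 2, 1 if i % 10 else (i // 10) % 10, "x") for i in range(20_000)]
conn.executemany(
    "INSERT INTO items (removed, status, payload) VALUES (?, ?, ?)", rows
)

# A query filtering on a rare status value.
QUERY = "SELECT * FROM items WHERE status = 5 AND removed = 0"


def plan(sql: str) -> str:
    """Return SQLite's query plan details for `sql` as a single string."""
    cursor = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return " | ".join(row[3] for row in cursor)  # column 3 holds the detail text


plan_before = plan(QUERY)
conn.execute("CREATE INDEX idx_status_removed ON items (status, removed)")
plan_after = plan(QUERY)

print("before:", plan_before)
print("after: ", plan_after)
```

The first plan should report a full table scan and the second a search through idx_status_removed. Whether MySQL or MariaDB makes the same choice depends on its statistics; for the common removed=0 AND status=1 case, which matches close to half the table, the optimizer may still prefer a scan.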
