4

Relevant System Info: Aurora Mysql 8.0.mysql_aurora.3.03 rg6.xl instances (1 writer, 2 read replicas) Total size:5.5TB (all databases combined, or just looking at the most recent Snapshot)

I have been working on migrating mariadb databases deprecated tokudb engine to RDS Aurora Mysql. With some fine tuning of the parameters in RDS, there has been one behavior I cannot understand.

Scenario

The main database houses tables created by each year, so lets use TABLE_2023 as an example. In this table, there are two indexes; the primary and the secondary index. PRIMARY index(ID, DATE) and Secondary(DATE). If I take a query like the following, and thrown an EXPLAIN, the output shows NULL for the key column. I found this odd since if I run the same query, but against the existing mariadb server, it will output with one of the indexes in that column. If I force the query to use an index (SELECT * FROM TABLE_2023 FORCE INDEX(PRIMARY) WHERE ID = '3' AND TIME >=202301010000 AND <=202304230000;), the forced index appears in this column. No surprise there.

When I run this query, and many like it, RDS takes 5-10times longer to complete. If I force the query, its extremely fast (though, still not as fast as the mariadb server on our current EC2 instances).

So here comes the question; am I wrong to think that Aurora isn't picking an index to use when presented with similar queries? When I use EXPLAIN for the same query on our EC2 instances, I can see that one of the indexes will be used in the results. But with Aurora, it is providing no such thing (just NULL).

There are obviously other caveats here; the table for this year alone is over a billion rows. There are also things that need to be tweaked with some of the queries that the data engineers are throwing at this thing, but I am stumped on how to understand the RDS Aurora behavior.

I cannot find anywhere in AWS documentation that explains this result. I know if I drop the time range in the same query above, and then run EXPLAIN in front of it..the results SHOW that Aurora Mysql will use the PRIMARY index. So, I know it can work as expected.

In case people ask:

  1. The data was migrated from EC2 instance using DMS tool. I noticed after the migration that the secondary index was missing. So, instead, I dropped all the tables, created them again WITH the secondary index as well, and then loaded the tables using DMS. My thought is maybe the indexes need to be reindexed, but this would require me to drop the index and re-add...which takes almost 3 days for 2022's data.
  2. OPTIMIZE TABLE does not make the situation any better because its just creating a new table, taking the data from old table and putting it into the new table)
  3. I tried ALTER TABLE DISABLE KEYS and then ENABLING them again, hoping I would get better results...but still slower than EC2 instance running mariadb and same results when using EXPLAIN.

Example query:

SELECT * FROM TABLE_2023
   WHERE TIME >= 202304170000
   AND TIME <= 202304172359
   AND ID IN (
              1206
             ,1332
             ,1919
             ,2878
             ,8694
             ,15452
             ,18323
             ,18501
             ,19908
             ,20142
             ,20881
             ,33090
             ,34131
             ,36806
             ,37402
             ,37672
             ,37673
             ,37853
             ,38840
             ,38978
             ,39078
             ,39128
             ,42530
             ,51730
             ,51964
             ,70686
             ,71504
             ,74520
             ,74521
             ,74522
             ,74523
             ,74524
             ,74525
             ,74526
             ,74527
             ,74528
             ,74529
             ,74530
             ,74531
             ,75237
                   )

This produces 1822 rows and took 4min and 3 seconds without forcing index. When forcing the PRIMARY index, these requests take 2-3 seconds (for this size query). And yes, this is how long it takes when the query is not cached.

EXPLAIN ANALYZE results for the same query (again not forcing index):

-> Filter: ((TABLE_2023.TIME >= 202304170000) and(TABLE_2023.TIME <= 202304172359) and (TABLE_2023.ID in () [Using parallel query (2 filters, 1 exprs; 0 extra)] (cost=404.85 rows=1823) (actual time=18271.466..244722.077 rows=1822 loops=1) -> Table scan on TABLE_2023 [Using parallel query (6 columns)] (cost=404.85 rows=2296760704) (actual time=18271.459..244721.615 rows=1822 loops=1) [parallel query actual (total time= 244722.063, rows returned= 1822)]

enter image description here

select count(*) on this year's table brings back 1809962901 rows.

CREATE TABLE Results

enter image description here

CREATE TABLE `TABLE_2023` (
  `SOMETHING_ID` mediumint NOT NULL,
  `TIME` bigint unsigned NOT NULL,
  `COLUMN3` tinyint DEFAULT NULL,
  `COLUMN4` smallint DEFAULT NULL,
  `COLUMN5` smallint DEFAULT NULL,
  `JSON` varchar(2048) CHARACTER SET latin1 COLLATE latin1_swedish_ci DEFAULT NULL,
  PRIMARY KEY (`SOMETHING_ID`,`TIME`),
  KEY `TIME` (`TIME`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

API Behaviors

Users that use the api can request a range of data. However, the query behavior is almost always asking the database for time series data for either an single ID or multiple IDs for the same time series. Currently, the code behind the API uses the "TIME>='' and TIME<=''" for a time range, not BETWEEN. Nor does it use "ID='1'" for a single station, but rather "in". I have recommended the developers to change the code to use "=" if one station is used, to avoid the table scan.

Final Thoughts Can we get away with forcing the index we want the query to use? Of course. But the behavior we are seeing with Aurora Mysql not being able to pick the index without being forced to bothers me. Especially when I do the same queries against our current databases environment (which will use an index, regardless of being forced). The only time I am able to get an index used is by using simple queries;

EXPLAIN select * from TABLE_2023 where SOMETHING_ID in (1206); --index PRIMARY "used" 
EXPLAIN select * from TABLE_2023 where SOMETHING_ID = '1206'; --index PRIMARY "used"
EXPLAIN SELECT * FROM TABLE_2023 where TIME = '202304170000'; --index TIME "used"

Update as of 05/01/2023

I dropped the year table (2023) and reimported all the data from our source database using DMS (again). After this was done, I added the secondary index (TIME) since DMS will usually only grab the primary index. After this was done, I took the same example query and still my EXPLAIN results show no usage of an index.

https://docs.aws.amazon.com/dms/latest/userguide/CHAP_BestPractices.html

I came across this document when reviewing the methods I was loading data into tables. Turns out AWS recommends dropping Primary Index if doing a full table load. Going to give this a try and load the data, then re-add just the PRIMARY index.

Update as of 05/02/2023

After creating a new table w/o the PRIMARY index and secondary index, I loaded the data into the table via DMS. Once completed, I waited 6+ hours for the PRIMARY key index to create. Unfortunately, the behavior still exists. However, I do have a scheduled call with AWS reps that I hope to get an answer for this behavior. Once an answer is vetted, I will post here.

Update as of 05/05/2023

After meeting with AWS account managers, they will be relaying technical information to their engineers to take a look at. In the meantime, I conducted a few more tests:

Scenario 1: Create table with primary key index only Load 38,000 rows, run 'EXPLAIN' with the example query-primary index would have been used.

Scenario 2: Create table with primary key index only Load 23 million rows, run 'EXPLAIN' with the example query-primary index identified, but not used

Scenario 3: Drop primary index in table from Scenario 2 Re-add primary index, run 'EXPLAIN' with the example query-primary index identified, but not used

Scenario 4: Create table without primary key index Load 23 million rows, add primary key index, run 'EXPLAIN' with the example query-primary index identified, but not used

Scenario 5: Create table without primary key index Load 38,000 rows, add primary key, run 'EXPLAIN' with example query-primary index identified and used

Scenario 6: Create table without primary key index Load around 12 million rows, add primary key, run 'EXPLAIN' with example query-primary index identified but not used.

Update as of 05/08/2023 Tried suggested answer, same behavior. Ended up doing the following but AWS Aurora MySQL behaved the same

ALTER TABLE TABLE_2023
    ADD PRIMARY KEY (`SOMETHING_ID`,`TIME`),
    ADD UNIQUE KEY `TIME_SOMETHING_ID` (`TIME`,`SOMETHING_ID`);

Update as of 05/10/2023 As suggested Rolando, I dropped the TIME index and added the following

ALTER TABLE TABLE_2023
ADD UNIQUE INDEX `TIME_SOMETHING_ID` (`TIME`,`SOMETHING_ID`);

I then ran the example query in the beginning of this post. Here are the results:

EXPLAIN enter image description here

Query Results

1822 rows in set (52.513 sec)

So, again, same behavior. Before I reverse the Primary Key order (as I did before), waiting to see response from Rolando for any further suggestions.

Update as of 05/16/2023 I have another meeting planned with AWS,but this time with an Aurora Mysql Specialist. Before I have the meeting, they suggested to turn off aurora_parallel_query.

Upon doing this, I ran the same example query and was shocked to see the results. The Explain results showed

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE TABLE_2023 range PRIMARY,time_Something_ID PRIMARY 11 1823 100.0 Using where

And the query itself, when ran, completed extremely quickly.

afterauroraparalleloffresults

However, before I write this off as the answer, I am curious to why this is the solution. Amazon markets Aurora Parallel Query as a benefit for moving to Aurora, so my use case must not benefit from this. I will post the details of the meeting here when I have them.

Update as of 05/25/2023 Same behavior with 3.03.1. Sending AWS a snapshot of our TABLE_2023 with a bug report.

Update as of 06/12/2023

AWS internally identified the "bug" and the fix is set to be released in the public versions of 3.04.0 and 3.03.2. These are projected to come out at the end of this quarter, or the beginning of next quarter.

14
  • How many rows are returned by your example query? How many rows total are there in TABLE_2023?
    – J.D.
    Commented Apr 28, 2023 at 3:59
  • The example query would just an example query. Depending on the request size from the API, it could be 1 row to over a million rows. Commented Apr 28, 2023 at 13:43
  • Well that'll affect which index and how it's used, because query plans are based on statistical information regarding the data being asked for (i.e. different physical operations are faster for different amounts of data). So there's no concrete answer to your question without a concrete query (and EXPLAIN ANAYLZE would help) to discuss.
    – J.D.
    Commented Apr 28, 2023 at 13:49
  • select count(*) on this year's table brings back 1809962901 rows. Commented Apr 28, 2023 at 14:26
  • Let me pull an API request really quick and get a query. One moment Commented Apr 28, 2023 at 14:27

1 Answer 1

0

From the looks of the example query, my immediate suggestion would be a better index. I would adamantly recommend creating the following

ALTER TABLE TABLE_2023
    ADD UNIQUE INDEX `TIME_SOMETHING_ID` (`TIME`,`SOMETHING_ID`),
    DROP INDEX `TIME`
;

So, the table would end up looking like this

CREATE TABLE `TABLE_2023` (
  `SOMETHING_ID` mediumint NOT NULL,
  `TIME` bigint unsigned NOT NULL,
  `COLUMN3` tinyint DEFAULT NULL,
  `COLUMN4` smallint DEFAULT NULL,
  `COLUMN5` smallint DEFAULT NULL,
  `JSON` varchar(2048) CHARACTER SET latin1 COLLATE latin1_swedish_ci DEFAULT NULL,
  PRIMARY KEY (`SOMETHING_ID`,`TIME`),
  UNIQUE KEY `TIME_SOMETHING_ID` (`TIME`,`SOMETHING_ID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Another alternative would be to reverse the order of the primary key columns with

ALTER TABLE TABLE_2023
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (`TIME`,`SOMETHING_ID`),
    ADD UNIQUE KEY `SOMETHING_ID_TIME` (`SOMETHING_ID`,`TIME`),
    DROP INDEX `TIME`
;

producing this

CREATE TABLE `TABLE_2023` (
  `SOMETHING_ID` mediumint NOT NULL,
  `TIME` bigint unsigned NOT NULL,
  `COLUMN3` tinyint DEFAULT NULL,
  `COLUMN4` smallint DEFAULT NULL,
  `COLUMN5` smallint DEFAULT NULL,
  `JSON` varchar(2048) CHARACTER SET latin1 COLLATE latin1_swedish_ci DEFAULT NULL,
  PRIMARY KEY (`TIME`,`SOMETHING_ID`),
  UNIQUE KEY `SOMETHING_ID_TIME` (`SOMETHING_ID`,`TIME`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Doing one of these would eliminates the need to rewrite the query in any way (no refactoring, no index hints). This would also accommodate queries that has TIME is in the WHERE clause but not SOMETHING_ID.

8
  • I thought of something similar, but was unsure if reworking the indexes in this case would make a difference. However, going to give this a go. Thank you for the advice! Commented May 8, 2023 at 12:27
  • Unfortunately, the suggestions boded the same results. AWS Aurora still does not pickup the idea to use any index. Commented May 8, 2023 at 15:47
  • You showed the EXPLAIN plan, but I am little suspect of the query. please show the actual query. Commented May 8, 2023 at 20:10
  • You bet-I will use the same query at the very beginning of the post (labeled under Example Query) and run the explain first, then run the query w/o forcing an index. Additionally, I will make sure to use the index suggestions you provided. Will take some time, but I will post results in the next 24 hours. Commented May 9, 2023 at 19:42
  • I would argue against making the secondary key UNIQUE instead of simply INDEX.
    – Rick James
    Commented May 20, 2023 at 22:46

Not the answer you're looking for? Browse other questions tagged or ask your own question.