86

I have a query of the following form:

SELECT * FROM MyTable WHERE Timestamp > [SomeTime] AND Timestamp < [SomeOtherTime]

I would like to optimize this query, and I am thinking about putting an index on timestamp, but am not sure if this would help. Ideally I would like to make timestamp a clustered index, but MySQL does not support clustered indexes, except for primary keys.

  • MyTable has 4 million+ rows.
  • Timestamp is actually of type INT.
  • Once a row has been inserted, it is never changed.
  • The number of rows with any given Timestamp is on average about 20, but could be as high as 200.
  • Newly inserted rows have a Timestamp that is greater than most of the existing rows, but could be less than some of the more recent rows.

Would an index on Timestamp help me to optimize this query?

9
  • 1
    Even in MSSQL if you create a clusterd index on a non unique column it makes it unique under the covers. Surely an index will help select but slow down insert and the index will take disk space. But that sound like a trade you are willing to take. Put the index on the table and test. Indexes are used for > and <.
    – paparazzo
    Commented Jan 31, 2012 at 22:22
  • Do you have a clustered index on this table? Commented Jan 31, 2012 at 22:24
  • @BalamBalam I'm actually designing a database for queries of the above type so I can't test the performance. Commented Jan 31, 2012 at 22:29
  • @ypercube MySQL doesn't support clustered indexes, except for primary keys, which are clustered by default. Commented Jan 31, 2012 at 22:29
  • Actually no. InnoDB supports one clustered key per table, either the PRIMARY KEY or a UNIQUE KEY. If you haven't defined any of these, it will make an "undercover" one, (using a 6-byte INT, if I remember well). Commented Jan 31, 2012 at 22:31

4 Answers 4

81

No question about it. Without the index, your query has to look at every row in the table. With the index, the query will be pretty much instantaneous as far as locating the right rows goes. The price you'll pay is a slight performance decrease in inserts; but that really will be slight.

5
  • 20
    So there's no downside to the fact that the number of unique timestamps is quite high and will therefore result in quite a large index? Commented Jan 31, 2012 at 22:18
  • 1
    Instantaneous it will be if the difference between [SomeOtherTime] and [SomeTime] is small. Commented Jan 31, 2012 at 22:21
  • 3
    Thanks @ypercube - just qualified that in the answer :) - I'd say the downside of a few megabytes of index is worth it. Databases are good at that sort of thing!
    – Chris Nash
    Commented Jan 31, 2012 at 22:24
  • 5
    It will read 20-200 rows of the index; these will be consecutive in the BTree. Then it will do 20-200 lookups in the table for any other columns needed (SELECT *). Very efficient compared to not having INDEX(Timetamp).
    – Rick James
    Commented Nov 26, 2015 at 7:17
  • Same issue here with a timestamp column. Without index: 5 minutes, with index: 5 Seconds
    – Sauer
    Commented Nov 9, 2023 at 11:21
9

You should definitely use an index. MySQL has no clue what order those timestamps are in, and in order to find a record for a given timestamp (or timestamp range) it needs to look through every single record. And with 4 million of them, that's quite a bit of time! Indexes are your way of telling MySQL about your data -- "I'm going to look at this field quite often, so keep an list of where I can find the records for each value."

Indexes in general are a good idea for regularly queried fields. The only downside to defining indexes is that they use extra storage space, so unless you're real tight on space, you should try to use them. If they don't apply, MySQL will just ignore them anyway.

9

I don't disagree with the importance of indexing to improve select query times, but if you can index on other keys (and form your queries with these indexes), the need to index on timestamp may not be needed.

For example, if you have a table with timestamp, category, and userId, it may be better to create an index on userId instead. In a table with many different users this will reduce considerably the remaining set on which to search the timestamp.

...and If I'm not mistaken, the advantage of this would be to avoid the overhead of creating the timestamp index on each insertion -- in a table with high insertion rates and highly unique timestamps this could be an important consideration.

I'm struggling with the same problems of indexing based on timestamps and other keys. I still have testing to do so I can put proof behind what I say here. I'll try to postback based on my results.

A scenario for better explanation:

  1. timestamp 99% unique
  2. userId 80% unique
  3. category 25% unique

    • Indexing on timestamp will quickly reduce query results to 1% the table size
    • Indexing on userId will quickly reduce query results to 20% the table size
    • Indexing on category will quickly reduce query results to 75% the table size
    • Insertion with indexes on timestamp will have high overhead **
    • Despite our knowledge that our insertions will respect the fact of have incrementing timestamps, I don't see any discussion of MySQL optimisation based on incremental keys.
    • Insertion with indexes on userId will reasonably high overhead.
    • Insertion with indexes on category will have reasonably low overhead.

** I'm sorry, I don't know the calculated overhead or insertion with indexing.

1
  • 1
    That uniqueness percentage made what column to index click in my head. Thanks bud. Commented Mar 8 at 6:46
5

If your queries are mainly using this timestamp, you could test this design (enlarging the Primary Key with the timestamp as first part):

CREATE TABLE perf (
  , ts INT NOT NULL
  , oldPK 
  , ... other columns 
, PRIMARY KEY(ts, oldPK)
, UNIQUE (oldPK)
) ENGINE=InnoDB ;

This will ensure that the queries like the one you posted will be using the clustered (primary) key.

Disadvantage is that your Inserts will be a bit slower. Also, If you have other indices on the table, they will be using a bit more space (as they will include the 4-bytes wider primary key).

The biggest advantage of such a clustered index is that queries with big range scans, e.g. queries that have to read large parts of the table or the whole table will find the related rows sequentially and in the wanted order (BY timestamp), which will also be useful if you want to group by day or week or month or year.

The old PK can still be used to identify rows by keeping a UNIQUE constraint on it.


You may also want to have a look at TokuDB, a MySQL (and open source) variant that allows multiple clustered indices.

6
  • Big downside with this approach is that you now need to know the timestamp along with the old PK to find a row by PK. Commented Mar 7, 2013 at 20:06
  • @DavidHarkness No, if the old PK is still unique. I will edit the answer to make that clear. Commented Mar 7, 2013 at 23:53
  • Yes, with the new unique constraint you're good. If clustering by the timestamp is important the cost may be worth it. I'll have to consider this for two tables in the system I'm currently building that are essentially transaction logs for reporting. Commented Mar 8, 2013 at 0:21
  • Not recommended actually, PK has different usage than just an index. Commented Jul 21, 2017 at 14:54
  • 1
    Using timestamp as PK, not safe, you can do that with index and keep PK for the id. Commented Jul 28, 2017 at 15:20

Not the answer you're looking for? Browse other questions tagged or ask your own question.