Avoid joining a table 100 times with itself

Question

I have a huge table (millions of rows) which looks like this (in essence)

 datatime               tagname      interesting somemore columns
 2014-12-04 20:00:00   grp1_tagA          77        0       0
 2014-12-04 20:00:00   grp1_tagB          88        0       0
 2014-12-04 20:00:00   grp1_tagC          99        0       0
 2014-12-04 20:00:00   grp2_tagA          11        0       0
 2014-12-04 20:00:00   grp2_tagB          22        0       0
 2014-12-04 20:00:00   grp2_tagC          13        0       0
 2014-12-04 21:00:00   grp1_tagA          17        0       0
 2014-12-04 21:00:00   grp1_tagB          28        0       0
 2014-12-04 21:00:00   grp1_tagC          29        0       0
 2014-12-04 21:00:00   grp2_tagA          31        0       0
 2014-12-04 21:00:00   grp2_tagB          62        0       0
 2014-12-04 21:00:00   grp2_tagC          53        0       0
 2014-12-04 22:00:00   grp1_tagA          87        0       0
 2014-12-04 22:00:00   grp1_tagB          48        0       0
 2014-12-04 22:00:00   grp1_tagC          99        0       0
 2014-12-04 22:00:00   grp2_tagA          51        0       0
 2014-12-04 22:00:00   grp2_tagB          42        0       0
 2014-12-04 22:00:00   grp2_tagC          53        0       0

In the real table, there are tens of groups, each group has ~100 tags, and for each group and tag there are several years' worth of hourly data (so a few tens of thousands of rows per tagname), amounting to about 8 million rows at present. At a later stage, other tables, which have shorter time intervals and are hence even bigger, will come into play.

I need a FAST way to get all data out of the table which has to do with a certain group (say, group 1, i.e. tagname starting with "grp1"), in some date range (data to be sent to some client's browser for visualization.)

So I want to produce a "group 1 digest" table like this:

[image: group1_1]

A simplistic query would be something like (dropping the date constraint for now)

SELECT A.`datatime` as `datatime`,
A.`interesting` as tagA, B.`interesting` as tagB, C.`interesting` as tagC 
FROM `everything` A, `everything` B, `everything` C
WHERE 
A.`datatime` = B.`datatime` AND
A.`datatime` = C.`datatime` AND
A.`tagname` = "grp1_tagA" AND
B.`tagname` = "grp1_tagB" AND
C.`tagname` = "grp1_tagC"

It's actually a little more complicated, because at some date, some tags might have data, while others don't, and I also want the rows with partial data. So with one more row

[image: the table with one more row]

what I want is

[image: group1_2]

A possible query to this end is

SELECT GLUE.thyme, A.iwant as tagA, B.iwant as tagB, C.iwant as tagC FROM
(SELECT distinct `datatime` as thyme from `everything`) GLUE left join
(SELECT `datatime` as thyme, `interesting` as iwant from `everything` where `tagname` = "grp1_tagA") A on GLUE.thyme = A.thyme left join
(SELECT `datatime` as thyme, `interesting` as iwant from `everything` where `tagname` = "grp1_tagB") B on GLUE.thyme = B.thyme left join
(SELECT `datatime` as thyme, `interesting` as iwant from `everything` where `tagname` = "grp1_tagC") C on GLUE.thyme = C.thyme
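
To make the difference between the two query shapes concrete, here is a minimal sketch that runs both on a tiny made-up table. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the three sample rows are assumptions chosen so that one hour has no grp1_tagB value. The comma-style self-join drops that hour, while the GLUE left join keeps it with a NULL.

```python
import sqlite3

# SQLite stands in for MySQL here; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE everything (datatime TEXT, tagname TEXT, interesting INTEGER, "
    "PRIMARY KEY (datatime, tagname))"
)
conn.executemany("INSERT INTO everything VALUES (?, ?, ?)", [
    ("2014-12-04 20:00:00", "grp1_tagA", 77),
    ("2014-12-04 20:00:00", "grp1_tagB", 88),
    ("2014-12-04 21:00:00", "grp1_tagA", 17),  # no grp1_tagB row for this hour
])

# Inner self-join: only hours where every tag has a row survive.
inner_rows = conn.execute("""
    SELECT A.datatime, A.interesting, B.interesting
    FROM everything A, everything B
    WHERE A.datatime = B.datatime
      AND A.tagname = 'grp1_tagA' AND B.tagname = 'grp1_tagB'
    ORDER BY A.datatime
""").fetchall()

# Left joins against the list of distinct timestamps: partial hours are kept.
left_rows = conn.execute("""
    SELECT GLUE.thyme, A.iwant, B.iwant
    FROM (SELECT DISTINCT datatime AS thyme FROM everything) GLUE
    LEFT JOIN (SELECT datatime AS thyme, interesting AS iwant
               FROM everything WHERE tagname = 'grp1_tagA') A ON GLUE.thyme = A.thyme
    LEFT JOIN (SELECT datatime AS thyme, interesting AS iwant
               FROM everything WHERE tagname = 'grp1_tagB') B ON GLUE.thyme = B.thyme
    ORDER BY GLUE.thyme
""").fetchall()

print(inner_rows)  # one row: the 20:00 hour only
print(left_rows)   # two rows: the 21:00 hour appears with None for tagB
```

This only shows the semantics; it says nothing about how either form performs on millions of rows.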

Problem: The "real world" version of this query is not fast enough. I tested the above query structure with 34 tag names (making 35 table joins), adding a date constraint like where/and datatime >= '2013-12-04' to each of the subqueries, so that a total of 8760 rows (i.e. 1 year of data) was returned. The resulting run time was 2 and a half minutes. I'm targeting something well below half a minute, which is the time to transfer the data over the internet.

The big table has a composite primary key index on datatime and tagname, and an index (key) on datatime.

How can I get the data faster with a better equivalent query?

A small piece of advice. I started reading the question and thought "oh I know how to fix this". And then I read a bit more. And scrolled some more. And saw multiple numbered questions, and gave up. In other words, your question is just too complicated (for me) to consider answering on this forum. – Gordon Linoff, Commented Dec 5, 2014 at 18:00
My immediate reaction is that even if you decide to keep your compound tagname column, you should at least consider having two separate columns, groupname and tagwithingroup. Doing so will make the queries easier to write, almost certainly. I'd review dropping the existing tagname column (or make it hold what I called tagwithingroup); you can create its value easily enough when you need to present it. You'd probably want an index on the groupname and tagwithingroup columns, maybe with the date/time column added as the third item in the index (which might then be a unique index). – Jonathan Leffler, Commented Dec 5, 2014 at 18:10
@Jonathan: I don't have the option of redesigning the database. It's "industrial" data from a water treatment facility, I don't work there, I just visualize the data for them. But yes, splitting the tagname column would be a good idea (for them.) — mathheadinclouds, Commented Dec 5, 2014 at 18:29
@GordonLinoff point well taken, question should have ended with "Question 1" - I didn't anticipate such a simple and efficient answer to exist, that's why it didn't. — mathheadinclouds, Commented Dec 6, 2014 at 18:26

BateTech · Accepted Answer · 2014-12-05 21:03:05Z

3

Try grouping by the datatime column and using a CASE expression per tag, as follows.

SELECT a.datatime
    , sum(case when a.tagname = 'grp1_tagA' then a.interesting else NULL end) as tagA
    , sum(case when a.tagname = 'grp1_tagB' then a.interesting else NULL end) as tagB
    , sum(case when a.tagname = 'grp1_tagC' then a.interesting else NULL end) as tagC
FROM everything AS a
WHERE a.datatime >= '2013-12-04'
GROUP BY a.datatime
;
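
As a small runnable illustration of this conditional-aggregation pattern, the sketch below loads a handful of rows into an in-memory SQLite database (a stand-in for MySQL, via Python's sqlite3 module) and runs the same query shape. The rows are a made-up subset in the style of the question's sample, with grp1_tagB deliberately missing at 21:00.

```python
import sqlite3

# SQLite stands in for MySQL; the sample rows are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE everything (datatime TEXT, tagname TEXT, interesting INTEGER)")
conn.executemany("INSERT INTO everything VALUES (?, ?, ?)", [
    ("2014-12-04 20:00:00", "grp1_tagA", 77),
    ("2014-12-04 20:00:00", "grp1_tagB", 88),
    ("2014-12-04 20:00:00", "grp1_tagC", 99),
    ("2014-12-04 20:00:00", "grp2_tagA", 11),
    ("2014-12-04 21:00:00", "grp1_tagA", 17),
    ("2014-12-04 21:00:00", "grp1_tagC", 29),  # grp1_tagB is missing at 21:00
    ("2014-12-04 21:00:00", "grp2_tagA", 31),
])

# One pass over the table: each CASE picks out a single tag's value per group,
# and SUM collapses the group to that value (or NULL when the tag is absent).
pivot = conn.execute("""
    SELECT a.datatime
        , SUM(CASE WHEN a.tagname = 'grp1_tagA' THEN a.interesting ELSE NULL END) AS tagA
        , SUM(CASE WHEN a.tagname = 'grp1_tagB' THEN a.interesting ELSE NULL END) AS tagB
        , SUM(CASE WHEN a.tagname = 'grp1_tagC' THEN a.interesting ELSE NULL END) AS tagC
    FROM everything AS a
    WHERE a.datatime >= '2014-12-04'
    GROUP BY a.datatime
    ORDER BY a.datatime
""").fetchall()

for row in pivot:
    print(row)
```

Because the query has no filter on tagname, an hour that only has rows from other groups would still appear in the result, with all three tag columns NULL.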

edited Dec 5, 2014 at 21:03

answered Dec 5, 2014 at 18:02

BateTech

That's it! Much faster! I just replaced zero with NULL, that's what I'm going to use in production.
– mathheadinclouds
Commented Dec 5, 2014 at 18:40
Glad it worked for you. I updated the answer to use NULL instead of zero in the case statements.
– BateTech
Commented Dec 5, 2014 at 21:05
You could condense the SQL by using IF, rather than CASE: SUM(IF(a.tagname='grp1_tagA', a.interesting, NULL)) as TagA. en.wikibooks.org/wiki/MySQL/Pivot_table
– Jon Senchyna
Commented Dec 5, 2014 at 21:09
@JonSenchyna Interesting, thanks for the info. I didn't know about the IF statement with MySQL b/c I normally work with SQL Server or Oracle. When 2 options that do the same thing exist, and one is an ANSI standard and one is not (CASE vs. IF), I normally prefer to go with the standard syntax and so would still use CASE.
– BateTech
Commented Dec 5, 2014 at 21:26
@BateTech That's probably a good stance to take (going with the ANSI standard). I also only really work in SQL Server, but I did a quick Google search on pivoting in MySQL and found the use of the IF (my original thought was to use the PIVOT functionality, until I realized this question was not SQL Server).
– Jon Senchyna
Commented Dec 12, 2014 at 14:58

mathheadinclouds · Answer · 2014-12-06 18:31:36Z

0

Tests on the huge table with millions of rows have shown that BateTech's excellent answer can be improved a little further by filtering on date and group in a subquery before aggregating, like so

SELECT a.datatime
    , sum(case when a.tagname = 'grp1_tagA' then a.interesting else NULL end) as tagA
    , sum(case when a.tagname = 'grp1_tagB' then a.interesting else NULL end) as tagB
    , sum(case when a.tagname = 'grp1_tagC' then a.interesting else NULL end) as tagC
-- \_ matches a literal underscore; a bare _ in LIKE is a single-character wildcard
FROM (SELECT * FROM everything WHERE datatime >= '2013-12-04' AND tagname LIKE 'grp1\_%') AS a
GROUP BY a.datatime
;
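
One thing to watch when filtering on a group prefix with LIKE: an unescaped `_` is a single-character wildcard, so a pattern meant for group 1 can also pick up a group whose name starts with "grp1" followed by another character. The sketch below shows this with Python's sqlite3 module standing in for MySQL. The tag name grp10_tagA is hypothetical, invented only to trigger the effect. SQLite needs an explicit ESCAPE clause; in MySQL the backslash is the default escape character, so `'grp1\_%'` works there without one.

```python
import sqlite3

# SQLite stands in for MySQL; "grp10_tagA" is a hypothetical tag name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE everything (tagname TEXT)")
conn.executemany(
    "INSERT INTO everything VALUES (?)",
    [("grp1_tagA",), ("grp10_tagA",), ("grp2_tagA",)],
)

# Unescaped underscore: matches any single character after "grp1".
loose = [r[0] for r in conn.execute(
    "SELECT tagname FROM everything WHERE tagname LIKE 'grp1_%' ORDER BY tagname"
)]

# Escaped underscore: matches only a literal "_" after "grp1".
strict = [r[0] for r in conn.execute(
    r"SELECT tagname FROM everything WHERE tagname LIKE 'grp1\_%' ESCAPE '\' ORDER BY tagname"
)]

print(loose)   # includes the hypothetical group 10 tag as well
print(strict)  # only the group 1 tag
```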

edited Dec 6, 2014 at 18:31

answered Dec 6, 2014 at 18:23

mathheadinclouds