So I've had a problem where I need to compare data in 2 different tables on two different servers. Now, I know MySQL supports CHECKSUM TABLES
, but from my testing and understanding, it's not reliable across server instances and versions.
So I created this query:
$part = '@CRC := MD5(CONCAT_WS(\'#\', COALESCE(`'.
implode('`, "#NULL#"), COALESCE(`', $this->_columns).
'`, "#NULL#")))';
$sql1 = "SELECT COUNT(*) AS cnt,
SUM(CONV(SUBSTRING({$part}, 1, 4), 16, 10)) as a1,
SUM(CONV(SUBSTRING(@CRC, 5, 4), 16, 10)) as a2,
SUM(CONV(SUBSTRING(@CRC, 9, 4), 16, 10)) as a3,
SUM(CONV(SUBSTRING(@CRC, 13, 4), 16, 10)) as a4,
SUM(CONV(SUBSTRING(@CRC, 17, 4), 16, 10)) as a5,
SUM(CONV(SUBSTRING(@CRC, 21, 4), 16, 10)) as a6,
SUM(CONV(SUBSTRING(@CRC, 25, 4), 16, 10)) as a7,
SUM(CONV(SUBSTRING(@CRC, 29, 4), 16, 10)) as a8
FROM `dbname`.`tablename`
WHERE `id` >= $min AND `id` <= $max ";
So basically, it's concatenating each row together (all the columns of each row more specifically) and then MD5ing them. Then it walks 4 hexbits at a time through that MD5 and sums them across all rows (4 hexbits to allow me to do huge tables without needing to worry about overflowing). Then, I just compare the result of this query on both tables to see if everything is the same.
By using this binary search, I am able to rather quickly narrow down where the changes are so that I can port them.
It's actually reasonably efficient, so I'm not too concerned about that. What I am concerned about is if this is even necessary. It's screaming to me "You're doing it wrong", but I can't figure out any cleaner method around it...
What are your thoughts?