3

A common operation is to insert a new row if none exists, or update an existing row. Unfortunately, the syntax for insert and update SQL statements is completely different: Insert takes a list of columns followed by a corresponding list of values, while update takes a list of column=value pairs. The MySQL "insert...on duplicate key update" statement (its upsert statement) doesn't solve this problem, as it still requires the complete insert column/value lists followed by the complete update column/value list. [UPDATE: Comment from Wrikken below points out that the two statements can share the column=value syntax, but the other issues remain.]

Related to this problem is that if you're using triggers to check the data (as I do), you need two triggers (before-insert and before-update), and, as they have to use the "new" qualifier, you have to either write the checking code twice, once for each trigger, or put it into a procedure. If you use a procedure, you have to pass each column as a separate parameter, because the procedure can't use "new", which is a lot of potentially error-prone typing if you have a lot of columns. Each column has to have its type in the create table statement and then a second time in the definition of the checking procedure. Once little mistake and you've created a subtle, hard-to-find bug. I don't like any approach that involves coding the same thing twice. (It's the equivalent of non-normalization.)

Thinking about this insert/update problem, I've been toying with the following idea, and I'd like some feedback, especially if anyone has actually tried it:

Use insert only for a placeholder row, holding only the minimal amount of data, and to get or set the primary key. Then, put all of the user-entered data into an update statement. Now, you don't need "insert...on duplicate key update", as plain update will do. Also, you need to check data only on a before-update trigger, as there is nothing to check on an insert. (All user-supplied data from the entry form is handled by the update, never by the insert.)

The chief disadvantage of this method, of course, is that there are two operations for a new row: insert followed by update, instead of an insert. But, that may not be a factor because:

  1. Inserts may be relatively rare. For example, in a student grading application I did a few years ago for the Richardson (TX) School District, only a couple of thousand or so students were added each year, whereas there were tens of thousands of updates, as the teachers used the system throughout the school year.

  2. In several other systems I've built, performance was irrelevant. For example, a current system I'm working on has only two or three people updating the database for only a few hours a week. The load is so small that the overhead caused by two operations (insert + update) when only one would suffice is insignificant. (It's only for new rows, remember.)

So, has anyone actually tried this: Insert only to create a minimalist, placeholder row, and use update for all user-supplied-data updates?

4
  • 1
    You DO know that INSERT INTO tablename SET col1=1, col2=2... etc. is valid? And I've yet to find writing a query just once cumbersome...
    – Wrikken
    Commented Dec 12, 2012 at 17:18
  • dev.mysql.com/doc/refman/5.5/en/insert.html
    – jchapa
    Commented Dec 12, 2012 at 17:19
  • Thanks for pointing this out. In PHP, if the assignments are assembled as a string, then the string can be used in both places in the "insert...on duplicate key" statement, which helps a little. However, the need for two triggers is still there. My idea still allows only one trigger, which simplifies the trigger coding a lot. Ideas on that? Commented Dec 12, 2012 at 17:37
  • "allows only one trigger" should have been "requires only one trigger". Commented Dec 12, 2012 at 17:46

1 Answer 1

7

If I need the database to enforce "rules" for data validity, I will still need the INSERT trigger as well as the UPDATE trigger, because at the database level, I'm not going to be able to guarantee that someone isn't going to do an INSERT that includes invalid data. I'm inclined to have both triggers anyway.

Another drawback of a inserting a "placeholder" row and then updating is that (for variable length records), there's an aspect of fragmentation. The subsequent update is almost guaranteeing that the length of the row is going to increase, which is going to lead to an unnecessary increase in fragmentation in the database (which wouldn't occur if you just inserted the row as it needs to be there.)

I'd also need to consider the case of when the INSERT of the placeholder succeeds but the UPDATE fails. I'd have to have some additional mechanism for handling that condition.

It's going to be more efficient to run a single statement, and just insert the values that I know need to be there, rather than running two separate statements (one to insert a placeholder row, and then a second statement to update it.)

Personally, I'd just go with the INSERT ... ON DUPLICATE KEY UPDATE, but rather than repeating the values in the UPDATE portion, I would just reference the values supplied for those columns in the INSERT statement, e.g.

INSERT INTO foo (a,b,c) VALUES (1,'one','won'), (2,'two','too')
   ON DUPLICATE KEY
   UPDATE a = VALUES(a)
        , b = VALUES(b)
        , c = VALUES(c)

NOTE: One side effect of this statement to be aware of, especially if it's predominantly an UPDATE that gets performed. This statement will increment an AUTO_INCREMENT id for each row that is attempted to be inserted. That AUTO_INCREMENT id value will essentially be "wasted", since the generated value won't be inserted into the table, but it will "gone". (The next generated value will be one higher.)

1
  • Excellent point, which I missed: That putting the checks (validation) into the database is to ensure that they're always effective, and not up to the app programmer, yet requiring that an insert be a placeholder only is leaving it up to the app programmer. I didn't see this contradiction. Commented Dec 12, 2012 at 17:55

Not the answer you're looking for? Browse other questions tagged or ask your own question.