An upsert (UPdate or inSERT) SQL query does either one of the two following things. If the record in question does not exist yet (as identified by some primary key or other uniqueness constraint), the row gets inserted. Otherwise, the entry gets updated.
This behavior is often quite handy because it simplifies some boilerplate code. The user interface could present some functionality to add (insert) some entity and to modify (update) it. For the developer, both processes can be captured by the same functionality.
In MySQL, such an upsert query can be formed using the either
REPLACE ... or
INSERT ... ON DUPLICATE KEY UPDATE commands.
The difference being that
REPLACE will actually delete an existing entry and add a new one with the same/updated data while the
INSERT ... ON DUPLICATE KEY UPDATE command will actually update the existing row.
The end result is the same except for
auto_increment values which get incremented by
You cannot use
REPLACE if you want to reference an existing column value:
REPLACE INTO table (counter_col) VALUES (counter_col + 1) would yield
I’ll go with
INSERT INTO ... ON DUPLICATE KEY UPDATE because it is more powerful in my opinion.
In a simple CRUD application where users modify content as pictured above you will find yourself writing a lot of similar database queries like
INSERT INTO `X`,
INSERT INTO `Y`,
UPSERT `Y`, etc.
UPDATE can be combined into a single query as I said.
But: We can go further by automacilly generating the query for us.
The thing is, given a set of key-value pairs, we know how the query would look like - no matter the table columns.
We always talk about
INSERT INTO `tablename` VALUES(col1, col2, col3) (value1, value2, value3) ON DUPLICATE KEY UPDATE col1 = value`, col2 = value2, col3 = value3.
This can be trivially automated.
Before diving into the implementation, let’s make ourselves us a sample scenario:
And some content:
We have some astronomy app which contains information about our solar system. All we ever would want to do is inserting or updating a planet. Here is the little upsert query code which will help us along the way.
Let’s say we want to use the above code to upsert a planet. Here is some data we could use:
Feeding this into
INSERT INTO `planets`(satellites, orbital_speed__km_s, name) VALUES ("5", "4.67", "Pluto") ON DUPLICATE KEY UPDATE satellites = VALUES(satellites), orbital_speed__km_s = VALUES(orbital_speed__km_s), name = VALUES(name);
Note too bad, he?
As you can see
get_upsert_query() is completely agnostic about the actual database structure.
This is where it comes handy.
You can use it for all your tables!
Just throw the table name & some key-value pairs at it and you’re done.
Oh, and updating existing rows is as easy as:
Given the two sample queries, this is what happens in the database:
We have used the nearly identical queries in order to insert or update the planet Pluto.
get_upsert_query() function is a good fit for a small project where a specific database abstraction layer would be overkill but writing every query by hand time consuming.
It’s especially useful when the database design is not clear in the beginning and you don’t want to go ahead and change your code whenever your realized some column is missing (of course, we always have the requirements in advance, don’t we?)
Some further stuff:
- Note how we only need the column names for the query’s
UPDATEpart. You can use
colname = VALUE(colname)to tell MySQL to use the value that was already transmitted in the
- You obviously need to provide the primary key or the any unique key in the dict in order to correctly update an existing row.
- The function could me made even more elaborate. For example, you could remove from the dict any key that does not correspond to a column in the table. However, at this point the app might already be big enough for some db abstraction layer.