Friday, May 6, 2011

Everything is a query these days - Part 2

In my previous post I advocated a solution for detecting and subsequently deleting duplicate records from a Database. What about the reverse problem, i.e. when two Databases have a common table, but they are not synchronized? Let us suppose that a table in a production Database is correctly updated via an external app, but the same backup Database table is not updated due to a synchronization issue. Assuming that no fancy tools are out there to assist you in this issue, how would you manually update the backup Database table so as to be in sync with production?

The table in question is called emp and has four columns, emp_id, first_name, last_name, dept_id. It is obvious from this structure that the primary key is emp_id and records missing from the backup Database table are lacking this attribute.

The first solution may be something like the following query: 

INSERT INTO emp_backup
SELECT * FROM emp
WHERE emp_id NOT IN (SELECT DISTINCT emp_id FROM emp_backup);

This query has two evident problems. The first is performance, given that for every row fetched from emp table a check has to be made so as to define whether or not emp_id column value is present in the corresponding backup table.
But there is a bigger issue than this. Most RDBMS systems will not allow for an indefinite list of values in the NOT IN … clause. In other words, if query

SELECT DISTINCT emp_id FROM emp_backup

fetches more than x-number of rows, it will subsequently fail, giving you a warning for your actions. This So, what is the best query to solve this? The first solution that comes to mind is to upsert emp_backup table, i.e. make insert if record does not exist or make a dummy update if the record exists. Let us transform this into an Oracle query:

MERGE INTO emp_backup b
USING emp e
 ON (b.emp_id = e.emp_id)
WHEN NOT MATCHED THEN
  INSERT (b.emp_id, b.first_name, b.last_name, b.dept_id)
  VALUES (e.emp_id, e.first_name, e.last_name, e.dept_id);

The query above is Oracle's way of upserting data from one table to another. MySQL has a similar command called REPLACE INTO. More on REPLACE INTO can be found here.

No comments:

Post a Comment