
The rows flowing from an OLE DB Source to an OLE DB Destination are always inserted, so duplicate records are added every time the package runs.

Is there a good way to replace these records instead of inserting them to avoid duplicates?

I'm using Visual Studio to edit the package if that helps.

Answer

What you're looking for is how to implement an upsert (insert/update) pattern.

When you're adding data to a table that already contains data, there are 3 conditions you need to account for:

  1. This record is brand new
  2. This record exists and is different
  3. This record exists but is identical
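To make the three cases concrete, here is a set-based T-SQL sketch that labels each source row the same way the Lookup transformations in the package would route it. It is only an illustration: the key column SurrogateKey, the compared columns Col1 and Col2, and the temp tables #Source and #Target are hypothetical stand-ins for your real tables. InsertDate represents a column that exists only in the database and is deliberately not compared.

```sql
-- Temp tables stand in for the real source and dbo.Target (all names are hypothetical).
CREATE TABLE #Target
(
    SurrogateKey int         NOT NULL PRIMARY KEY,
    Col1         varchar(50) NULL,
    Col2         int         NULL,
    InsertDate   datetime2   NOT NULL DEFAULT SYSDATETIME() -- database-only column, never compared
);
CREATE TABLE #Source
(
    SurrogateKey int         NOT NULL PRIMARY KEY,
    Col1         varchar(50) NULL,
    Col2         int         NULL
);

INSERT INTO #Target (SurrogateKey, Col1, Col2) VALUES (1, 'a', 10), (2, 'b', 20);
INSERT INTO #Source (SurrogateKey, Col1, Col2) VALUES (1, 'a', 10), (2, 'b', 99), (3, 'c', 30);

-- The key match decides new vs. existing; the column comparison decides changed vs. identical.
-- EXISTS (... EXCEPT ...) is a NULL-safe "at least one compared column differs" test.
SELECT S.SurrogateKey,
       CASE
           WHEN T.SurrogateKey IS NULL THEN 'Case 1: new'
           WHEN EXISTS (SELECT S.Col1, S.Col2 EXCEPT SELECT T.Col1, T.Col2)
               THEN 'Case 2: changed'
           ELSE 'Case 3: identical'
       END AS RowCase
INTO #Classified
FROM #Source AS S
LEFT JOIN #Target AS T
    ON T.SurrogateKey = S.SurrogateKey;

SELECT SurrogateKey, RowCase
FROM #Classified
ORDER BY SurrogateKey;
```

With this sample data, key 1 is identical, key 2 has changed (Col2 differs), and key 3 is new. In the package itself SSIS does this routing for you; the query just shows what each path receives.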

Andy Leonard has a great series, the Stairway to Integration Services; levels 3 and 4 are probably what you're looking for:

https://www.sqlservercentral.com/steps/adding-rows-in-incremental-loads-level-3-of-the-stairway-to-integration-services

The OLE DB Destination component only handles inserts, so updates ("replace," in your terminology) are not an option with it.

That leaves you with two choices: the OLE DB Command or another OLE DB Destination.

The OLE DB Command transformation lets you run an arbitrary command for every row sent to it. Because it executes once per row, it can seriously hurt performance when you have a large number of rows.
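For comparison, an OLE DB Command doing the update would hold a parameterized statement like the one below in its SqlCommand property. The table and column names are hypothetical. Each ? is a positional parameter that you map to an input column (they typically show up as Param_0, Param_1 and so on, on the Column Mappings tab), and the statement is sent to the server once for every row that reaches the component, which is where the per-row cost comes from.

```sql
-- SqlCommand for an OLE DB Command transformation (hypothetical columns).
-- Executed once per incoming row; each ? is bound to a mapped input column.
UPDATE dbo.Target
SET Col1 = ?,
    Col2 = ?
WHERE SurrogateKey = ?;
```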

The better option is a second OLE DB Destination. Stuff all the existing rows that need updating (case 2) into a "staging changes table."
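A minimal sketch of such a staging changes table, assuming the hypothetical name dbo.Target_Changes, a key column SurrogateKey, and placeholder columns Col1 and Col2; match the names and types to your real dbo.Target.

```sql
-- One-time setup: hypothetical staging table that receives the Case 2 rows.
-- It needs the key plus every column the later UPDATE will copy into dbo.Target.
IF OBJECT_ID(N'dbo.Target_Changes', N'U') IS NULL
BEGIN
    CREATE TABLE dbo.Target_Changes
    (
        SurrogateKey int         NOT NULL PRIMARY KEY,
        Col1         varchar(50) NULL,
        Col2         int         NULL
    );
END;
```

Because the table is reused on every run, an Execute SQL Task placed before the data flow that runs `TRUNCATE TABLE dbo.Target_Changes;` keeps rows left over from a previous run from being applied again.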

After the data flow, you'll add an Execute SQL Task that performs a bulk update of data for all the case 2 rows.

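```sql
-- Apply all Case 2 rows from the staging changes table in one set-based statement
UPDATE T
SET T.Col1 = S.Col1 /* etc. */
FROM dbo.Target AS T
INNER JOIN dbo.Target_Changes AS S -- the staging changes table (name is illustrative)
    ON S.SurrogateKey = T.SurrogateKey;
```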

In my mind, your package will look something like this:

```
                  OLE DB Source
                        |
                 Lookup PK Target
                        |
           ______exists______not exists______
           |                                |
   Lookup cols Target              OLE DB Dest (Case 1)
           |
       not exists
           |
   OLE DB Dest (Case 2)
```

The first lookup tests for existence using the key(s). Rows we don't find are new (Case 1). Rows we do find get looked up a second time, this time matching on all the columns we care about (for example, an insert date that exists only in the database can be ignored). Rows that still match after the second lookup exist and are identical (Case 3), so we do nothing with them. The updates (Case 2) are the rows that match in the first lookup but not in the second.
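One way to configure the two Lookup transformations is to pick "Use results of an SQL query" as the reference source and return only the columns each lookup needs. This is a sketch assuming the hypothetical key SurrogateKey and compared columns Col1 and Col2; the database-only insert date is left out of the second query so it can never cause a mismatch. In both Lookups, setting the no-match handling to "Redirect rows to no match output" sends unmatched rows down the "not exists" paths in the diagram.

```sql
-- Reference query for "Lookup PK Target": the key column only.
-- Join on SurrogateKey; no match = new row (Case 1).
SELECT SurrogateKey
FROM dbo.Target;

-- Reference query for "Lookup cols Target": the key plus the columns we care about.
-- Join on all three; no match = changed row (Case 2). Insert date is deliberately omitted.
SELECT SurrogateKey, Col1, Col2
FROM dbo.Target;
```

Returning only the needed columns also keeps the lookup cache smaller than selecting the whole table.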

There are lots of optimizations we could make to simplify this, but for a rookie, this is probably the easiest approach to grasp.
