
I've been given the unenviable task of documenting the catalogue ingest process of our data warehouse. All was going well until the end, where I came across a conditional split inside an SSIS data flow task.

The data flow task is relatively simple: it loads the empty live catalogue table from the staging table. However, a conditional split takes the input from the staging table and splits it into four separate outputs using a modulo expression on the unique record number.

  • Catalogue_Key % 4 < 1 (Output 1)
  • Catalogue_Key % 4 < 2 (Output 2)
  • Catalogue_Key % 4 < 3 (Output 3)
  • Catalogue_Key % 4 < 4 (Output 4)

(Catalogue_Key is an auto-incrementing integer field)

All four paths out of the conditional split are then loaded into the same destination table with no additional processing.
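
To make the routing concrete, here is a rough Python sketch of how I understand the split to behave. It assumes the conditions are evaluated in the order listed and that each row goes to the first output whose condition is true, which is how the Conditional Split transformation is documented to work. The `route` function and the key range 1 to 1000 are hypothetical stand-ins for the real package and staging data.

```python
from collections import Counter

# Conditions in the order they appear in the Conditional Split editor.
CONDITIONS = [
    ("Output 1", lambda key: key % 4 < 1),
    ("Output 2", lambda key: key % 4 < 2),
    ("Output 3", lambda key: key % 4 < 3),
    ("Output 4", lambda key: key % 4 < 4),
]


def route(catalogue_key):
    """Return the name of the first output whose condition is true."""
    for name, condition in CONDITIONS:
        if condition(catalogue_key):
            return name
    # Never reached for non-negative keys, since key % 4 < 4 always holds.
    return "Default Output"


# Hypothetical keys standing in for the auto-incrementing Catalogue_Key.
counts = Counter(route(key) for key in range(1, 1001))
for name, _ in CONDITIONS:
    print(name, counts[name])
```

Because of the first-match behaviour, the overlapping `<` conditions still end up sending each remainder (0, 1, 2, 3) to its own output, so for a gap-free key sequence the rows are spread evenly across the four paths.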

Conditional Split

Can anyone offer any wisdom as to why this conditional split might have been implemented, given that all the records get loaded into the same destination table?

I've tried searching Google and Stack Overflow for an answer but have been unable to find anything related. Not knowing the reason, it's hard to know what to even search for.

  • This sounds like it would be a business logic decision and could be this way for any number of reasons. Are you certain the insert into the same staging table is not inserting or changing any of the data, or doing anything else? There could have been a step in the middle to do a data transform that was later removed while the split itself was left in place. It is hard to tell why logic is used in your code, especially if you do not provide the entire process code.
    – Brad
    Commented Mar 5 at 17:15
  • Thanks @Brad, I've updated the post to include an image so you can see how little logic exists. The conditional split simply splits the input (seemingly) arbitrarily into four separate streams, and then all four streams are loaded into the same output table. The SSIS package hasn't changed since it was built, so nothing has been taken out in the past. Might there be any performance benefit to doing it this way?
    – DickMille
    Commented Mar 5 at 17:30
  • FYI, SQL Server 2005 reached end of life nearly a decade ago. You urgently need to get your environment upgraded, as it is insecure. Getting that upgrade is only going to get harder as time passes, as no supported versions of SQL Server support migrations from SQL Server 2005, meaning you can't in-place upgrade or even BACKUP and RESTORE.
    – Thom A
    Commented Mar 5 at 18:00
  • Can it be some sort of multithreading hack?
    Commented Mar 5 at 18:05
  • I was thinking along the lines of the multithreading idea too, but didn't see that it was 2005. That is probably why it was done that way, because I don't think (or don't remember whether) 2005 had multithreading, especially if the SSIS package is an older version as well.
    – Brad
    Commented Mar 5 at 18:56
