0

Is it true that we never enable foreign key constraints in the dimensional model of a data warehouse? If yes, then what is the rationale behind that?

As per my research:

Some experts told me in a dimensional model, FK will never be enabled, and it is the responsibility of the ETL process to ensure consistency and integrity.

Data integrity issues may come into the picture, even though ETL is responsible enough through proper dependency.

Examples:

  • Late arriving dimension from source

  • few records could not pass data quality checks and routed to the error table.

  • intermediate tables are not populated due to batch load failure, and proper restart or recover steps are not followed. Someone restarted the last session to load data into the facts table while some of the dimensions are yet to be populated,

  • primary key constraints will help me to avoid duplicate record population if data in intermediate tables are getting processed one more time due to re-triggering the target table load session accidentally.

0

1 Answer 1

2

There is no "law" or standard which forbids FK constraints on a dimensional model. In fact, you already mentioned several reasons why it can make a lot of sense to add FK constraints to a star schema.

The main reason why FK might not be enabled for such a model is IMHO this one: ETL processes often take a lot of time, maybe hours or days, so performance may matter. A popular optimization technique is to disable any constraints during the load, but enable them later, when the ETL process is finished. Note that FK constraints will usually imply indexing, which is crucial when selecting data from the model, which happens usually after the ETL process is completed.

But it actually depends on the case. If you think you can fulfill your performance requirements with FKs enabled, and it does even help you to keep the data in better quality during the ETL process, then go ahead. If FKs actually slow your ETL process heavily down, then better switch them off (but enable them afterwards).

5
  • Thank you Doc, it helps !!! I have one more query based on your answer. What type of indexing will be implied by FK. Index is associated with FK column by default or we need to define explicitly on FK? For example when we create PK and UK constraint then unique index is associated with PK and UK columns by default. Is that true for FK constraint as well? If yes then is it Bitmap index or B tree index. As far as I understand Bitmap indexes make sense in DWH environment because they are compressed in size and hence index space is not an issue for large number of indexed FKs in fact. Commented May 7, 2020 at 18:12
  • @Curious_Mind: that depends on the specific DBMS, sometimes on the DB configuration, sometimes on the specific version.
    – Doc Brown
    Commented May 7, 2020 at 20:49
  • FK's definitely slow down deletes, might imapact updates/inserts a bit, but definitely assist in speeding up selects, even without an index, because they give critical clues to the query planner. I recommend implementing FK's unless you have a specific performance issue
    – Nick.Mc
    Commented Feb 2 at 0:47
  • @Nick.Mc: the OP wrote about a dimensional model of a data warehouse. Such models are usually filled as a snapshot from some other data source by an ETL process which often does not require individual deletes. When the snapshop is deleted, it is deleted as a whole (for example, by dropping all tables and recreating the schema, or by some "TRUNCATE TABLE" statement (in Oracle), or by switching off all contraints during the deletion process. And yes, what you wrote about "FK's unless you have a specific performance issue" is exactly what I wrote in my answer.
    – Doc Brown
    Commented Feb 2 at 5:24
  • I just read your answer properly and upvoted it :)
    – Nick.Mc
    Commented Feb 2 at 5:42

Not the answer you're looking for? Browse other questions tagged or ask your own question.