62

I have done a fair bit of work with relational databases, and think I understand the basic concepts of good schema design pretty well. I recently was tasked with taking over a project where the DB was designed by a highly-paid consultant. Please let me know if my gut intinct - "WTF??!?" - is warranted, or is this guy such a genius that he's operating out of my realm?

DB in question is an in-house app used to enter requests from employees. Just looking at a small section of it, you have information on the users, and information on the request being made. I would design this like so:

User table:

UserID (primary Key, indexed, no dupes)
FirstName
LastName
Department

Request table

RequestID (primary Key, indexed, no dupes)
<...> various data fields containing request details
UserID -- foreign key associated with User table

Simple, right?

Consultant designed it like this (with sample data):

UsersTable

UserID  FirstName   LastName
234     John        Doe
516     Jane        Doe
123     Foo         Bar

DepartmentsTable

DepartmentID   Name
1              Sales
2              HR
3              IT

UserDepartmentTable

UserDepartmentID   UserID   Department
1                  234      2
2                  516      2
3                  123      1

RequestTable

RequestID   UserID   <...>
1           516      blah
2           516      blah
3           234      blah

The entire database is constructed like this, with every piece of data encapsulated in its own table, with numeric IDs linking everything together. Apparently the consultant had read about OLAP and wanted the 'speed of integer lookups'

He also has a large number of stored procedures to cross reference all of these tables.

Is this valid design for a small to mid-sized SQL DB?

Thanks for comments/answers...

9
  • 12
    Oh boy, if this makes you say WTF, then you probably have not seen tables with 200+ columns and stored procedures more than 1000 lines long.
    – Job
    Commented Sep 29, 2011 at 18:02
  • 43
    +1 for not deleting after feeling embarassed. Thanks for leaving this so others can learn. Commented Sep 29, 2011 at 18:09
  • 2
    @Job - actually, I haven't - I'm not a DBA by trade (pretty obvious by now! lol), so my SQL WTF threshold is fairly low. Although, my completely missing the point of the consultant's design has me WTF'ing my own abilities. Ever have a day where you just feel dumb?
    – Jim
    Commented Sep 29, 2011 at 18:11
  • 10
    @Jim: Congratulations, you've turned a dumb day into an enlightened day. Commented Sep 29, 2011 at 18:14
  • 3
    Curse those highly paid consultants! Commented Sep 30, 2011 at 4:59

6 Answers 6

73

Makes perfect sense to me. It's just very normalized, which imparts a lot of flexibility that you wouldn't have otherwise. De-normalized data is a pain in the butt.

6
  • your answer makes perfect sense, and looking over my question and the schema maybe it is just the sheer number of tables he is using that confused me. I greatly simplified the example for my question, but I see how the concept is sound - he is just splitting things off much more than I would. Sigh, I guess it's a good thing I'm not a DBA! :)
    – Jim
    Commented Sep 29, 2011 at 17:12
  • Learn to design by the ten-minute rule: "What holds true now probably won't in ten minutes." Make sure your designs can deal with change.
    – Blrfl
    Commented Sep 29, 2011 at 17:14
  • 1
    This schema actually has the advantage that when an employee is inserted, their department has to exist. Commented Sep 30, 2011 at 14:57
  • @SimonRichter: It is not true. The Employee can be created without any Department existing, and also the reverse. Commented Jul 1, 2012 at 20:31
  • @SimonRichter The benefit of this design is, firstly, that the Department is a separate entity, and secondly, that there is a many-to-many relationship between Department and Employee, as opposed to the OPs example, where it was "many-to-one-ish" (could not say many-to-one, as no separate Department entity it was referencing to be called a relationship). Commented Jul 1, 2012 at 20:34
48

I don't think that either a WTF is warranted or that the guy is doing any mad genius kind of design - it's pretty standard database normalization.

The reason for the department table is that if you don't put the departments into a separate table you will have to deal with users in "Sales", "sales", "Salesmen", "Sails", and "Selling" departments, unless you do something to prevent it. And having the extra table is (part of) the best way I know to do that.

Whether there should be a UserDepartment table is a tougher call, which of course means neither decision is way out and crazy. On the one hand it's a pain when all your table design and logic had assumed one department per user and then that changes, on the other hand doing an extra join for no reason for years and years is a real possibility and also a pain.

I personally would agree with you that the UserDepartment table is probably overkill. Even if it's included, the chances are that over time people will write queries that assume there is only one user per department, so you will end up with the worst of both worlds - an extra join for no reason prior to needing the table, and code not working anyway once more than one department per user becomes allowed.

EDIT - A key driver of whether the many to many relationship should be supported is if the business rules are clear. If you have no idea how a user in multiple departments would even work there isn't much point in adding the table, since your code can't possibly correctly handle the cases where a user is in multiple departments.

Imagine that you did allow many departments per user, just in case. You then implemented a business rule for assigning commissions, based on department. Then multiple departments were allowed. Fortunately, you also had the foresight to write your commission code in a way that took this into account. Unfortunately, you added the commissions from each department for users in both. Management wanted you to base in on the persons role for each sale. So, how much good was having the table in advance? What about the other tables you had "just in case" that are never needed at all?

LATER EDIT - Another reason the consultant might have wanted to add all those intermediary tables is addressed in this follow-up question, the answers to which give some reasons why refactoring a database is usually harder than refactoring code, which would tend to push you towards the "put in all the tables you might ever need" approach.

4
  • I think you put into words what my WTF was - the guy is using TONS of these intemediary tables, and it just seemed so stupid to me. Now that I've broken it down into a much smaller example for this question, I feel rather stupid for posting it since it doesn't seem that bad.
    – Jim
    Commented Sep 29, 2011 at 17:18
  • 5
    As you can see by lots of the comments, there is a healthy skepticism about "there will only ever be one X per Y" comments. The consultant is covering himself from "how come there can only be one X per Y" complaints. Some of which will probably come up. But he won't be responsible for maintaining code that has many joins (not too bad, but harder) and that has to be correct against business rules that don't yet exist (bad) - imagine the question "why do users get ALL the permissions from each department, they should get the LOWEST of each permission" or some such.
    – psr
    Commented Sep 29, 2011 at 17:31
  • @psr I think there's a typo: shouldn't "queries that assume there is only one user per department" be "queries that assume a user is in only one departement" ?
    – BiAiB
    Commented Sep 30, 2011 at 12:19
  • @BiAiB - you are correct, that is what I meant to say.
    – psr
    Commented Sep 30, 2011 at 17:44
14

If the requirement is to have multiple departments per user, this design makes sense. The only gripe about it is the UserDepartmentTable having a surrogate key UserDepartmentID that is not needed (just make the UserId and DepartmentId a composite primary key).

If a user only ever belongs to a single department, your design makes sense (though a department lookup table would still be a good thing).

14
  • 18
    ...Until more than one department is possible per user.
    – Blrfl
    Commented Sep 29, 2011 at 16:53
  • 1
    Exactly, @Blrfl. Today's not-ever-going-to-happen is tomorrow's the-CEO-is-having-an-aneurysm-because-it-doesn't-do-it. Commented Sep 29, 2011 at 17:09
  • 2
    Part of deciding what's worthy of that kind of treatment is understanding the problem domain. In some applications, it might be important to know that employee #3804 has been known to the company as Ann Smith and Ann Jones (after getting married), which would make normalizing the name out of the table of employees a sane thing to do. In Jim's case, it might be worth expanding the breaker table to keep a history so that if Ann moves from HR to IT, the fact that an old request tied to her could reflect that it was really HR's request and not IT's.
    – Blrfl
    Commented Sep 29, 2011 at 18:01
  • 8
    YAGNI - databases can be refactored.
    – JeffO
    Commented Sep 29, 2011 at 18:26
  • 2
    @Oded, Some ORM mappers like Entity Framework do not work well with tables that have a composite primary key.
    – maple_shaft
    Commented Sep 29, 2011 at 19:21
5

Some requirements are not clear in your question. The correct answer depends on what your customer wants - If I were you, I would ask the customer about this:

0-What is the difference between a user and an employee?

1-Assuming an employee=user, what if an employee changes departments?

2-Can a group of employees make 1 request?

3-Could an employee belong to more than one department? What about the CEO

4-Is there a subset of employees that are allowed to make requests?

5-What happens to the request when an employee record is deleted (if ever)?

6-Could you delete a request? What happens when the request is delted (make sure you don't delete the employee record by RI)

7-Can the employee make the "same" request more than one time (define the "same")

8-How to handle requests for employees when they leave the company (cancel their requests or delete the requests?)

There may be more questions, but my point is that the solution depends on exact requirements and project scope. Once that is determined the schema can be derived directly. Accordingly both solutions presented may be correct.

2
  • +1 these are great questions that need to be clarified before designing this type of schema. I like your flow of logic.
    – user29981
    Commented Sep 30, 2011 at 0:12
  • @Surfer513: I appreciate your nice comment.
    – NoChance
    Commented Sep 30, 2011 at 0:20
1

I'd like to add a couple point form notes talking explicitly about some of the potential advantages of using a join table in the fashion that your highly-paid consultant did.

  • Properly indexed (e.g., if UserDepartmentTable indexes the two foreign keys), there is only a small performance loss of a join table like this due to the foreign keys not being unique. If the foreign keys are guaranteed to be unique, the by the little database theory I know, looking up UserDepartmentTable.Department is no "harder" than looking up any other column in the User table.
  • The join table gives you more flexibility to set up other information about the association between the user and the department (e.g., timestamps on creation).
  • The join table allows you to "version" the association fairly easily (e.g., when a user changes departments, trigger some index boolean flag like UserDepartmentTable.Active to false and create a new association that is active). It's also possible to have department association versioning with the two-table model (just User and Department), but it's tougher and necessitates adding at least one more column or doing database acrobatics in order to avoid duplicating primary keys.
  • It allows you to assign one-to-many or many-to-one or many-to-many associations pretty easily.

That being said, there are several reasons NOT to do what your highly-paid consultant did.

  • All the above benefits are all anticipatory of possible future needs, overcomplicating things for the present day. It is not YAGNI-compliant. It's of trivial difficulty later to write a migration that moves from your two-table model to a join table model. You can do that when the business need emerges. To do it before that can be confusing.
  • It confuses other developers. While, yes, I would say the expectation for a web developer of your stature (where you're reviewing consultants' decisions) would be to understand and recognize a join table, it's still more complicated than necessary and considering the lack of business need, it causes confusion.
2
  • nice analysis - however, I wouldn't say I have any stature as a dev in my day job, except that I am the only one here who knows anything about db/c#/vb/etc... so guess I'm part time dev by default. this is a fairly small project, so the consultants sheer number of tables and joins left me saying "wtf" (but thanks to you fine folk I am now saying "oic...")
    – Jim
    Commented Sep 30, 2011 at 18:09
  • Quite an old topic, but still relevant... refactoring may be very hard, imagine you need multiple departments in the future instead of one, but have only a department ID in Users as FK. You will likely end up with duplicate referendes (Users.DeptID and UsersDepartmentsTable) or complete garbage, like comma-separated lists in Users.DeptID, or XML. The correct solution could not be easily added, as suggested by YAGNI or KISS, but would be obstructed.
    – Erik Hart
    Commented Oct 5, 2015 at 9:45
0

Without full structure of needed information I can't say is it terrible or not. But at least shown piece is not of "WTF" designs. It's just seems as 3-rd Normal Form of data-structure (well, we have theoretically also 4-rd and 5-th also)

Some talks can have place for UserDepartmentTable between two schools of "natural" and "artificial" keys in shown piece. Nothing more, as I can see

Normalization is rule of good DB-developer/designer for a lot of reasons, *de*normalizations are used sometimes in the middle of developments for speed-win mostly

Not the answer you're looking for? Browse other questions tagged or ask your own question.