Skip to main content

The development of the conceptual schema and/or the logical model and/or the physical settings of a database.

The result of database design is a plan for the construction of a database that captures some of the essential features of the proposed database, but omits a lot of less important detail. These plans often take the form of data models, and database design can be learned as the art of constructing a certain kind of data model.

Most databases that capture and manage semi-permanent data operate under the control of a Database Management System (DBMS). Prominent DBMS products are SQL Server, Oracle RDBMS, and DB2. There are dozens of others. Many of the questions and answers you’ll find under this tag relate to one of these DBMS products, but some design issues are DBMS independent.

The amount of preparation and education you’ll need before building your first successful database varies widely depending on many factors. Among other factors, it depends on how ambitious your database project is and on what prior experience you bring to bear on the project. Very experienced programmers sometimes underestimate the amount of material there is to learn about database design.

Sometimes programmers learn well by trial and error, or by postponing formal learning until their second or third project. Other times, database design neophytes make design decisions that lead into pitfalls that are very difficult to reverse.

There are many ways to measure the quality of a database design. Programmers building their first database are often primarily concerned with performance. There’s no question that performance is important. A bad design can easily result in database operations that take ten to a hundred times as much time as they should.

But don’t let performance issues blind you to other aspects of good design. In particular, future proofing of a database is enormously important. Failure to do this can result in a database that traps its users at the first level and prevents their data from evolving as their needs evolve.

Another aspect involves separating out the hidden features of a database (sometimes called physical design) from the public features visible across the application interface (sometimes called logical design). A neat separation of these features can result a database that can be tweaked and tuned quite a bit with no changes to application code. A poor separation of these features can result in a database that makes a nightmare out of application development or database administration.

Another consideration is whether the proposed database will be embedded within a single application, or whether it will be an information hub that serves the needs of multiple applications. Some design decisions will be made very differently in these two cases.

Yet another consideration is whether the application is going to perform all data management functions on behalf of its clients, or whether custodial responsibility for the database and its data is going to be vested in one or more DBAs (Data Base Administrators).


What kinds of questions will appear in the database-design tag?

You'll see a lot of questions about table design, data normalization, index design, query optimization, constraint declarations, and keys. A lot of questions, and many of the responses will address issues of speed or performance. There will be a lot of questions about key selection.

Most of the questions are about relational databases, including the SQL databases that are commonly called relational. A few questions are about "truly relational" databases or about "non relational" or "post relational" databases. A few are about semistructured or unstructured data.

A lot of questions tagged "database design" will also be tagged "data modeling". There is a huge overlap between the two subjects.

You'll see a lot of questions on the subject of table composition and decomposition. Closely related to table decomposition is the concept of data normalization. Indeed, many responders treat table decomposition and data normalization as though they are synonymous terms. They aren't quite synonymous. Nearly all improvements in data normalization result in table decomposition, but there are plenty of ways of decomposing tables that have nothing to do with normalization.

Data normalization is a brand new topic to many neophyte database designers. It's worth learning the rudiments of data normalization, even if the database you are building is small and simple. It's also sometimes worthwhile to disregard the rules of data normalization, but you really have to know what you are doing.

You'll also see a lot of questions on the subject of index design. Closely related to index design is query optimization. Many questions about either index design or query design have to do with how much effort the programmer should expend in getting the very best result out of the optimizer.

Three things are worth keeping in mind. First, optimization is often a matter of tradeoffs. Sometimes organizing things for rapid query will slow down data updates. Sometimes speed really matters in some database operations, but not others.

Second, you really need to pay attention to those things that slow operations down from seconds to minutes, or from minutes to hours, before you worry about 10% improvements.

Third, database delays vary enormously as the volume of data increases and as the number of concurrent users increases. Simple tests with one user and sample data can really mislead you about speed in a production environment.