Database design theory is a topic that many people avoid learning
for lack of time. Many others attempt to learn it, but give up
because of the dry, academic treatment it is usually given by most
authors and teachers. But if creating databases is part of your job,
then you're treading on thin ice if you don't have a good solid
understanding of relational database design theory.
This paper begins with an introduction to relational database
design theory, including a discussion of keys, relationships,
integrity rules, and the often-dreaded "Normal Forms." Following the
theory, I present a practical step-by-step approach to good database
Note Most of the example tables discussed in this paper
can be found in the sample Microsoft® í¨‹í í¨ í‹ Access
database included on the Tech•Ed CD-ROM for this session:
The Relational Model
The relational database model was conceived by E. F. Codd in
1969, then a researcher at IBM. The model is based on branches of
mathematics called set theory and predicate logic. The basic idea
behind the relational model is that a database consists of a series
of unordered tables (or relations) that can be manipulated using
non-procedural operations that return tables. This model was in vast
contrast to the more traditional database theories of the time that
were much more complicated, less flexible and dependent on the
physical storage methods of the data.
Note It is commonly thought that the word relational in
the relational model comes from the fact that you relate together
tables in a relational database. Although this is a convenient way
to think of the term, it's not accurate. Instead, the word
relational has its roots in the terminology that Codd used to define
the relational model. The table in Codd's writings was actually
referred to as a relation (a related set of information). In fact,
Codd (and other relational database theorists) use the terms
relations, attributes and tuples where most of us use the more
common terms tables, columns and rows, respectively (or the more
physical—and thus less preferable for discussions of database design
theory—files, fields and records).
The relational model can be applied to both databases and
database management systems (DBMS) themselves. The relational
fidelity of database programs can be compared using Codd's 12 rules
(since Codd's seminal paper on the relational model, the number of
rules has been expanded to 300) for determining how DBMS products
conform to the relational model. When compared with other database
management programs, Microsoft Access fares quite well in terms of
relational fidelity. Still, it has a long way to go before it meets
all twelve rules completely.
Fortunately, you don't have to wait until Microsoft Access is
perfect in a relational sense before you can benefit from the
relational model. The relational model can also be applied to the
design of databases, which is the subject of the remainder of this
Relational Database Design
When designing a database, you have to make decisions regarding
how best to take some system in the real world and model it in a
database. This consists of deciding which tables to create, what
columns they will contain, as well as the relationships between the
tables. While it would be nice if this process was totally intuitive
and obvious, or even better automated, this is simply not the case.
A well-designed database takes time and effort to conceive, build
The benefits of a database that has been designed according to
the relational model are numerous. Some of them are:
- Data entry, updates and deletions will
- Data retrieval, summarization and
reporting will also be efficient.
- Since the database follows a
well-formulated model, it behaves predictably.
- Since much of the information is stored
in the database rather than in the application, the database is
- Changes to the database schema are easy
The goal of this paper is to explain the basic principles behind
relational database design and demonstrate how to apply these
principles when designing a database using Microsoft Access. This
paper is by no means comprehensive and certainly not definitive.
Many books have been written on database design theory; in fact,
many careers have been devoted to its study. Instead, this paper is
meant as an informal introduction to database design theory for the
Note While the examples in this paper are centered around
Microsoft Access databases, the discussion also applies to database
development using the Microsoft Visual Basic® í¨‹í í¨ í‹
programming system, the Microsoft FoxPro® í¨‹í í¨ í‹
database management system, and the Microsoft SQL Server™ client-server database management system.
Tables, Uniqueness and Keys
Tables in the relational model are used to represent "things" in
the real world. Each table should represent only one thing. These
things (or entities) can be real-world objects or events. For
example, a real-world object might be a customer, an inventory item,
or an invoice. Examples of events include patient visits, orders,
and telephone calls. Tables are made up of rows and columns.
The relational model dictates that each row in a table be unique.
If you allow duplicate rows in a table, then there's no way to
uniquely address a given row via programming. This creates all sorts
of ambiguities and problems that are best avoided. You guarantee
uniqueness for a table by designating a primary key—a column that
contains unique values for a table. Each table can have only one
primary key, even though several columns or combination of columns
may contain unique values. All columns (or combination of columns)
in a table with unique values are referred to as candidate keys,
from which the primary key must be drawn. All other candidate key
columns are referred to as alternate keys. Keys can be simple or
composite. A simple key is a key made up of one column, whereas a
composite key is made up of two or more columns.
The decision as to which candidate key is the primary one rests
in your hands—there's no absolute rule as to which candidate key is
best. Fabian Pascal, in his book SQL and Relational Basics,
notes that the decision should be based upon the principles of
minimality (choose the fewest columns necessary), stability (choose
a key that seldom changes), and simplicity/familiarity (choose a key
that is both simple and familiar to users). Let's illustrate with an
example. Say that a company has a table of customers called
tblCustomer, which looks like the table shown in Figure 1.
Figure 1. The best choice for primary key for tblCustomer
would be CustomerId.
Candidate keys for tblCustomer might include CustomerId,
(LastName + FirstName), Phone#, (Address, City, State), and (Address
+ ZipCode). Following Pascal's guidelines, you would rule out the
last three candidates because addresses and phone numbers can change
fairly frequently. The choice among CustomerId and the name
composite key is less obvious and would involve tradeoffs. How
likely would a customer's name change (e.g., marriages cause names
to change)? Will misspelling of names be common? How likely will two
customers have the same first and last names? How familiar will
CustomerId be to users? There's no right answer, but most developers
favor numeric primary keys because names do sometimes change and
because searches and sorts of numeric columns are more efficient
than of text columns in Microsoft Access (and most other databases).
Counter columns in Microsoft Access make good primary keys,
especially when you're having trouble coming up with good candidate
keys, and no existing arbitrary identification number is already in
place. Don't use a counter column if you'll sometimes need to
renumber the values—you won't be able to—or if you require an
alphanumeric code—Microsoft Access supports only long integer
counter values. Also, counter columns only make sense for tables on
the one side of a one-to-many relationship (see the discussion of
relationships in the next section).
Note In many situations, it is best to use some sort of
arbitrary static whole number (e.g., employee ID, order ID, a
counter column, etc.) as a primary key rather than a descriptive
text column. This avoids the problem of misspellings and name
changes. Also, don't use real numbers as primary keys since they are
Foreign Keys and Domains
Although primary keys are a function of individual tables, if you
created databases that consisted of only independent and unrelated
tables, you'd have little need for them. Primary keys become
essential, however, when you start to create relationships that join
together multiple tables in a database. A foreign key is a column in
a table used to reference a primary key in another table.
Continuing the example presented in the last section, let's say
that you choose CustomerId as the primary key for tblCustomer. Now
define a second table, tblOrder, as shown in Figure 2.
Figure 2. CustomerId is a foreign key in tblOrder which can be
used to reference a customer stored in the tblCustomer table.
CustomerId is considered a foreign key in tblOrder since it can
be used to refer to given customer (i.e., a row in the tblCustomer
It is important that both foreign keys and the primary keys that
are used to reference share a common meaning and draw their values
from the same domain. Domains are simply pools of values from which
columns are drawn. For example, CustomerId is of the domain of valid
customer ID #'s, which in this case might be Long Integers ranging
between 1 and 50,000. Similarly, a column named Sex might be based
on a one-letter domain equaling 'M' or 'F'. Domains can be thought
of as user-defined column types whose definition implies certain
rules that the columns must follow and certain operations that you
can perform on those columns.
Microsoft Access supports domains only partially. For example,
Microsoft Access will not let you create a relationship between two
tables using columns that do not share the same datatype (e.g.,
text, number, date/time, etc.). On the other hand, Microsoft Access
will not prevent you from joining the Integer column EmployeeAge
from one table to the Integer column YearsWorked from a second
table, even though these two columns are obviously from different
You define foreign keys in a database to model relationships in
the real world. Relationships between real-world entities can be
quite complex, involving numerous entities each having multiple
relationships with each other. For example, a family has multiple
relationships between multiple people—all at the same time. In a
relational database such as Microsoft Access, however, you consider
only relationships between pairs of tables. These tables can be
related in one of three different ways: one-to-one, one-to-many or
Two tables are related in a one-to-one (1–1) relationship if, for
every row in the first table, there is at most one row in the second
table. True one-to-one relationships seldom occur in the real world.
This type of relationship is often created to get around some
limitation of the database management software rather than to model
a real-world situation. In Microsoft Access, one-to-one
relationships may be necessary in a database when you have to split
a table into two or more tables because of security or performance
concerns or because of the limit of 255 columns per table. For
example, you might keep most patient information in tblPatient, but
put especially sensitive information (e.g., patient name, social
security number and address) in tblConfidential (see Figure 3).
Access to the information in tblConfidential could be more
restricted than for tblPatient. As a second example, perhaps you
need to transfer only a portion of a large table to some other
application on a regular basis. You can split the table into the
transferred and the non-transferred pieces, and join them in a
Figure 3. The tables tblPatient and tblConfidential are
related in a one-to-one relationship. The primary key of both tables
Tables that are related in a one-to-one relationship should
always have the same primary key, which will serve as the join
Two tables are related in a one-to-many (1–M) relationship if for
every row in the first table, there can be zero, one, or many rows
in the second table, but for every row in the second table there is
exactly one row in the first table. For example, each order for a
pizza delivery business can have multiple items. Therefore, tblOrder
is related to tblOrderDetails in a one-to-many relationship (see
Figure 4). The one-to-many relationship is also referred to as a
parent-child or master-detail relationship. One-to-many
relationships are the most commonly modeled relationship.
Figure 4. There can be many detail lines for each order in the
pizza delivery business, so tblOrder and tblOrderDetails are related
in a one-to-many relationship.
One-to-many relationships are also used to link base tables to
information stored in lookup tables. For example, tblPatient might
have a short one-letter DischargeDiagnosis code, which can be linked
to a lookup table, tlkpDiagCode, to get more complete Diagnosis
descriptions (stored in DiagnosisName). In this case, tlkpDiagCode
is related to tblPatient in a one-to-many relationship (i.e., one
row in the lookup table can be used in zero or more rows in the
Two tables are related in a many-to-many (M–M) relationship when
for every row in the first table, there can be many rows in the
second table, and for every row in the second table, there can be
many rows in the first table. Many-to-many relationships can't be
directly modeled in relational database programs, including
Microsoft Access. These types of relationships must be broken into
multiple one-to-many relationships. For example, a patient may be
covered by multiple insurance plans and a given insurance company
covers multiple patients. Thus, the tblPatient table in a medical
database would be related to the tblInsurer table in a many-to-many
relationship. In order to model the relationship between these two
tables, you would create a third, linking table, perhaps called
tblPtInsurancePgm that would contain a row for each insurance
program under which a patient was covered (see Figure 5). Then, the
many-to-many relationship between tblPatient and tblInsurer could be
broken into two one-to-many relationships (tblPatient would be
related to tblPtInsurancePgm and tblInsurer would be related to
tblPtInsurancePgm in one-to-many relationships).
Figure 5. A linking table, tblPtInsurancePgm, is used to model
the many-to-many relationship between tblPatient and tblInsurer.
In Microsoft Access, you specify relationships using the
Edit–Relationships command. In addition, you can create ad-hoc
relationships at any point, using queries.
This paper has covered the basics of database design in the
context of Microsoft Access. The main concepts covered were:
- The relational database model was
created by E.F. Codd in 1969 and is founded on set theory
- A database designed according to the
relational model will be efficient, predictable, well performing,
self-documenting and easy to modify.
- Every table must have a primary key,
which uniquely identifies rows in the table.
- Foreign keys are columns used to reference
a primary key in another table.
- You can establish three kinds of relationships
between tables in a relational database: one-to-one, one-to-many
or many-to-many. Many-to-many relationships require a linking
Database design is an important component of application design.
If you take the time to design your databases properly, you'll be
rewarded with a solid application foundation on which you can build
the rest of your application.