3 Database Design

Week 3: February 6-12, 2022

COWS 2.1-2.6, 2.8-2.9

No assessment for this week, but the first part of PS1 is all about designing a database so don’t miss a beat here!
Also week 4 builds on this week.

Database Design

Design often brings in together art, practice and a bit of good sense. Database design is no different. There are some good sense guidelines and rules that you should follow, but there is also room for creative solutions. To become a good database designer, you need some experience and the best way to develop this is through designing databases and examining other designs and thinking about their goals and trade-offs. The video lectures here will try to explore conceptual designs for databases that are different from the textbook ones to give you a diverse set of examples to learn from.

3.1 Conceptual Design

ER Modeling is a very handy tool for conceptual design. To see its utility over using a relational model to conceptually design your database, think about designing the user interface for an application. UI designers often sketch by hand the different user interfaces and storyboard the flow of the application. This hand sketch is analogous to an ER diagram. An ER diagram is quite rough and isn’t as concrete as the actual relational schema you will eventually create for the database but it is a nice visual sketch that allows you to communicate your initial designs to the users and allows them to physically point at things and say “Yes, that is exactly the relationship between Animals and Keepers” or “This isn’t quite right and you are missing these relationships or properties!”

3.1.1 Basic ER Constructs

Here is the ER diagram that models the entities and the relationships in the NY Zoo of Abu Dhabi. Notice how the entity set “Keepers” participates twice in the “ReportsTo” relationship with two different roles: manager and subordinate.

The ER diagram for the Zoo

This ER diagram is still missing a few important integrity constraints. It assumes all the relationships between entities are “many-to-many”: an animal can live in many enclosures and an enclosure can have many animals, or a keeper can have multiple managers and a manager can manage multiple keepers. We may want to define the type of and restrict these relationships. The following video explains how:

3.1.2 Design Choices

What makes good design? You want your design to achieve the following:

It should capture well the semantics of the real world that you are modeling a database for.
It should make it impossible to violate any of the integrity constraints.
Common queries or database workloads should have good performance with this design.
The design should be as simple as it can be!

In many cases, you can’t achieve all of the above. Good design is often a matter of making the right trade-offs.

In the following videos, we will look at some of the considerations that guide three design choices:

Should we represent a real world object as its own entity or as an attribute of another entity?
Should we represent an association between entities as an entity with relationships to each of the participating entities or as simply as a relationship?

Should we represent a relationship between multiple entities as an n-ary relationship or should we break it down into several binary relationships?

3.2 Conceptual Design to Logical Design (ER to Relational)

In general, ER and the relational model are fairly analogous and you can even automate much of the translation process from an ER model to a relational model. There are, however, situations where the translation is not as straightforward and some design thought is required. I’ll illustrate some of the main translation steps here by illustrating how we can translate parts of the Zoo ER model into a DBMS relational model using SQL’s data definition language (DDL).

3.2.1 Entity Sets to Relations

Easy peasy! Keys in the entity set (the underlined attributes) become the primary key of the relation.

The ER diagram of the Animal Entity

create table animals(
aid integer,
name varchar(20),
species varchar(50),
dob date
primary key (aid))

Usually, ER models do not allow for multi-valued, non-primitive data types for attributes; so one can easily select the appropriate data type for each attribute in the relational model. However, some design decisions have to be made. For example, for strings, is it a fixed length string? What data type will you use for identifiers - Integers, alpha-numeric string? Some of these decision can impact perforamnce. For example, using an integer for an identifier allows for faster primary-foreign key joins than using variable length character strings!

3.2.2 Relationship Sets to Relations

3.2.2.1 Many-to-Many relations

Those are the easiest to translate. The attributes of the relation must include keys for each participating entity set as a foreign key and all descriptive attributes.

The ER diagram of a many-to-many feeding relationship between animals and keepers

create table fedby(
aid integer
kid integer
fdate date
ftime time
food varchar(100)
primary key (aid, kid, fdate, ftime)
foreign key (aid) references animals
foreign key (kid) references keepers
)

3.2.2.2 One-to-Many | Many-to-One Relationship (Key constraint)

Suppose each animal can live in at most one enclosure. We capture this many to one relationship (key constraint) between animals and enclosures with the following ER diagram.

The ER diagram of the Key Constraint between animals and enclosures

By setting the primary key of the LivesIn relation to be the animals aid foreign key we capture the key constraint that an animal can live in at most one enclosure.

create table LivesIn(
aid integer,
eid integer,
primary key aid,
foreign key (aid) references animals,
foreign key (eid) references enclosures)

Alternatively, you can push the information about which enclosure an animal lives in into the Animals Relation:

create table animals(
aid integer,
name varchar(20),
species varchar(50),
dob date
eid integer,
primary key aid,
foreign key (eid) references enclosures)

Now, how do you represent a one-to-one relationship? Not so straightforward, but think about it :)

3.2.2.3 Participation Constraint

Suppose each animal has to live in at least one enclosure.

The ER diagram of the Key+Participation Constraint between animals and enclosures

create table animals(
aid integer,
name varchar(20),
species varchar(50),
dob date
eid integer not null,
primary key aid,
foreign key (eid) references enclosures)

You can force the foreign key eid to not be null and this will require each animal to have at least one enclosure to live in (and because aid only is the primary key and we do not allow duplicate keys, each animal can live in at most one enclosure.)

We can use the “not null” constraint to capture participation constraints involving one entity set in a binary relationship, but little else.

What if you want total participation for both entity sets in a relationship? It is not trivial and you would need special DBMS support. Here is a stack exchange on how to do it in postgres. Also fun reading is how to enforce one favorite child in a database. Stack exchange is rife with questions on database design that illustrate how tricky it can get!

3.3 Summary

2 Relational Algebra

4 Normalization