3 Database Design
Week 3: February 6-12, 2022
- COWS 2.1-2.6, 2.8-2.9
- No assessment for this week, but the first part of PS1 is all about designing a database so don’t miss a beat here!
- Also week 4 builds on this week.
Design often brings in together art, practice and a bit of good sense. Database design is no different. There are some good sense guidelines and rules that you should follow, but there is also room for creative solutions. To become a good database designer, you need some experience and the best way to develop this is through designing databases and examining other designs and thinking about their goals and trade-offs. The video lectures here will try to explore conceptual designs for databases that are different from the textbook ones to give you a diverse set of examples to learn from.
3.1 Conceptual Design
ER Modeling is a very handy tool for conceptual design. To see its utility over using a relational model to conceptually design your database, think about designing the user interface for an application. UI designers often sketch by hand the different user interfaces and storyboard the flow of the application. This hand sketch is analogous to an ER diagram. An ER diagram is quite rough and isn’t as concrete as the actual relational schema you will eventually create for the database but it is a nice visual sketch that allows you to communicate your initial designs to the users and allows them to physically point at things and say “Yes, that is exactly the relationship between Animals and Keepers” or “This isn’t quite right and you are missing these relationships or properties!”
3.1.1 Basic ER Constructs
Here is the ER diagram that models the entities and the relationships in the NY Zoo of Abu Dhabi. Notice how the entity set “Keepers” participates twice in the “ReportsTo” relationship with two different roles: manager and subordinate.
This ER diagram is still missing a few important integrity constraints. It assumes all the relationships between entities are “many-to-many”: an animal can live in many enclosures and an enclosure can have many animals, or a keeper can have multiple managers and a manager can manage multiple keepers. We may want to define the type of and restrict these relationships. The following video explains how:
3.1.2 Design Choices
What makes good design? You want your design to achieve the following:
- It should capture well the semantics of the real world that you are modeling a database for.
- It should make it impossible to violate any of the integrity constraints.
- Common queries or database workloads should have good performance with this design.
- The design should be as simple as it can be!
In many cases, you can’t achieve all of the above. Good design is often a matter of making the right trade-offs.
In the following videos, we will look at some of the considerations that guide three design choices:
- Should we represent a real world object as its own entity or as an attribute of another entity?
- Should we represent an association between entities as an entity with relationships to each of the participating entities or as simply as a relationship?
- Should we represent a relationship between multiple entities as an n-ary relationship or should we break it down into several binary relationships?
3.2 Conceptual Design to Logical Design (ER to Relational)
In general, ER and the relational model are fairly analogous and you can even automate much of the translation process from an ER model to a relational model. There are, however, situations where the translation is not as straightforward and some design thought is required. I’ll illustrate some of the main translation steps here by illustrating how we can translate parts of the Zoo ER model into a DBMS relational model using SQL’s data definition language (DDL).
3.2.1 Entity Sets to Relations
Easy peasy! Keys in the entity set (the underlined attributes) become the primary key of the relation.
create table animals(
integer,
aid varchar(20),
name varchar(50),
species date
dob primary key (aid))
Usually, ER models do not allow for multi-valued, non-primitive data types for attributes; so one can easily select the appropriate data type for each attribute in the relational model. However, some design decisions have to be made. For example, for strings, is it a fixed length string? What data type will you use for identifiers - Integers, alpha-numeric string? Some of these decision can impact perforamnce. For example, using an integer for an identifier allows for faster primary-foreign key joins than using variable length character strings!
3.2.2 Relationship Sets to Relations
3.2.2.1 Many-to-Many relations
Those are the easiest to translate. The attributes of the relation must include keys for each participating entity set as a foreign key and all descriptive attributes.
create table fedby(
integer
aid integer
kid date
fdate time
ftime varchar(100)
food primary key (aid, kid, fdate, ftime)
foreign key (aid) references animals
foreign key (kid) references keepers
)
3.2.2.2 One-to-Many | Many-to-One Relationship (Key constraint)
Suppose each animal can live in at most one enclosure. We capture this many to one relationship (key constraint) between animals and enclosures with the following ER diagram.
By setting the primary key of the LivesIn relation to be the animals aid foreign key we capture the key constraint that an animal can live in at most one enclosure.
create table LivesIn(
integer,
aid integer,
eid primary key aid,
foreign key (aid) references animals,
foreign key (eid) references enclosures)
Alternatively, you can push the information about which enclosure an animal lives in into the Animals Relation:
create table animals(
integer,
aid varchar(20),
name varchar(50),
species date
dob integer,
eid primary key aid,
foreign key (eid) references enclosures)
Now, how do you represent a one-to-one relationship? Not so straightforward, but think about it :)
3.2.2.3 Participation Constraint
Suppose each animal has to live in at least one enclosure.
create table animals(
integer,
aid varchar(20),
name varchar(50),
species date
dob integer not null,
eid primary key aid,
foreign key (eid) references enclosures)
You can force the foreign key eid to not be null and this will require each animal to have at least one enclosure to live in (and because aid only is the primary key and we do not allow duplicate keys, each animal can live in at most one enclosure.)
We can use the “not null” constraint to capture participation constraints involving one entity set in a binary relationship, but little else.
What if you want total participation for both entity sets in a relationship? It is not trivial and you would need special DBMS support. Here is a stack exchange on how to do it in postgres. Also fun reading is how to enforce one favorite child in a database. Stack exchange is rife with questions on database design that illustrate how tricky it can get!