# 2 Relational Algebra

Week 2: January 30 - February 5, 2022

In this week’s module we will focus on the relational model’s abstract high level language: relational algebra. Relational algebra is a **procedural programming language**, i.e. a relational algebra query or expression specifies exactly (i) what operators to use and (ii) in what order.

This contrasts with SQL, which is a **declarative language**: you specify the properties of the final result set you wish. In SQL, you do not the specify operators to execute or their order of execution. We will deep dive into SQL during the in-class sessions this week. In the Appendix, you will find the SQL deep dive worksheet.

- COWS 4.1, 4.2, 15.3

- COWS 4.3, 4.4

- Relational Algebra on Gradescope.

## 2.1 Basics

All of the operators in relational algebra take in a relation and output a *different* relation.

For example the query or expression \(\pi_{\mathsf{species}}(\mathsf{Animals})\) produces a new relation with only one column \({\sf species}\) from the relation \({\sf Animals}\) which has attributes \({\sf aid, name, species, age, feedtime, ...}\).

**Relational algebra is an algebra over sets.** Each relation is a set: it has no duplicate rows. Relational database management systems relax this requirement and do allow for duplicates. The semantics of relational operators vary slightly when operating over *multisets* or *bags* rather than strict sets.

In relational algebra notation, we use Greek letters to denote different operations, subscripts are the parameters of the operation and the order of execution is from the innermost base relation to the outermost operation.

A **query tree** or **query plan** illustrates this order of execution with base relations as the leaves and as you move up the tree, you apply each operator in sequence to the intermediate output relation of the previous operation.

The following table provides a quick cheat sheet for the relational algebra notation. We will discuss the different operations and the composition rules in the following videos.

Notation | Operator |
---|---|

\(A_i\) | An attribute (column) \(A_i\) |

\(\mathbb{A}\) | A set of attributes; \(\mathbb{A}_R\) denotes a subset of attributes from R |

\(R, S, T, ...\) | Relations \(R, S, T\) |

\(c, \theta\) | A logical Boolean expression, e.g. \(A_1 > 2 \wedge A_2 \leq A_3 \vee A_4<5\) |

\(\pi_{A_1, ..., A_k}(R)\) | Projection: select columns \(A_1, ... , A_k\) from the relation \(R\) |

\(\sigma_{c}(R)\) | Selection: select rows from \(R\) that satisfy \(c\) |

\(\rho_{\{R', (... i \rightarrow A'_i ...)\}}(R)\) | Renaming: rename relation \(R\) to \(R'\) and columns at position \(i\) to \(A'_i\). |

\(\rho_{\{R', (... A_i \rightarrow A'_i ...)\}}(R)\) | Renaming: rename relation \(R\) to \(R'\) and column \(A_i\) to \(A'_i\). |

\(R \cup S\) | Union: Combine rows from \(R\) and \(S\) if they are compatible relations (same attributes in \(R\) and \(S\)) |

\(R - S\) | Difference: Remove rows from \(R\) that are in \(S\) if they are compatible relations |

\(R \cap S\) | Intersection: Select rows that exist in both \(R\) and \(S\) if they are compatible relations |

\(R \times S\) | Cross product or Cartesian product: For each row in \(R\) pair it with all the rows in \(S\) |

\(R \bowtie_\theta S\) | A theta-join: This is equivalent to \(\sigma_\theta(R \times S)\) |

\(R \bowtie S\) | A natural join on all matching column names or primary-foreign keys followed by projecting only one column from each pair of matching columns. |

We will now examine the basic unary operators: projection \(\pi\), selection \(\sigma\) and renaming \(\rho\).

## 2.2 Binary Operations

The three basic binary set operations from which we can derive other operations are union \(\cup\), difference \(-\) and cross-product \(\times\).

## 2.3 Relational Alegbra to SQL

In addition to the basic relational operations as proposed by Codd, there are several practical extensions that account for allowing duplicates in relational algebra and address data analysis needs such as ranking or sorting the data (sort operator \(\tau\)), computing statistics over the data (\({\sf min, max, sum, count, avg, ...}\)) and partitioning the data into different strata/groups/etc (\(\gamma\)).

With these extensions, we now have a mapping from SQL - the declarative query langague interface to relational databases - to relational algebra.

## 2.4 Relational Algebra Equivalences

As we will see later in the class a query planner translates SQL into relational algebra query plans. There are multiple equivalent relational expressions for the same SQL query and there are multiple SQL queries that are equivalent. These equivalences allow a query optimizer to explore different query plans to choose one that is efficient or “optimal.”

As you learn more about SQL, you will be able to construct equivalent relational algebra expressions for them. Here are a few simple ones to get you started.

- Projections: \(\pi_{{\sf aid, name}}({\sf Animals})\)

`select aid, name from Animals;`

- Selections & Renaming \(\rho_{\sf Infants, (age\rightarrow months)}(\sigma_{{\sf age < 2}}({\sf Animals}))\)

```
create table Infants as (
select aid, name, species, age*12 as months, feedtime
from Animals
where age < 2);
```

- Cross-Products & Theta-Joins \(\sigma_{\sf Animals.aid=LivesIn.aid \wedge LivesIn.eid=Enclosures.eid}({\sf Animals \times LivesIn \times Enclosures})\) \({\sf Animals}\bowtie_{\sf Animals.aid=LivesIn.aid} {\sf LivesIn} \bowtie_{\sf LivesIn.eid=Enclosures.eid}{\sf Enclosures}\)

```
select *
from Animals, LivesIn, Enclosures
where Animals.aid = LivesIn.aid and
= Enclosures.eid; LivesIn.eid
```

- Natural Joins \({\sf Animals}\bowtie {\sf LivesIn} \bowtie {\sf Enclosures}\)

```
select * from
natural join LivesIn natural join Enclosures; Animals
```

Testing if two queries are equivalent is a challenging theoretical problem! There are classes of equivalence. For example, *view equivalence* just means that when you execute two different queries over the same database, you end up with the same result. These queries may not actually be equivalent, in the sense that if you change the input database relations, you might end up with different results.