5 The Database File Abstraction
Week 5: February 27 - March 5, 2022
- Overview of the DBMS Internals: COWS 1.8
- Disk & Storage Hierarchy: COWS 9.1
- The Disk Space Manager: COWS 9.3
- Records, Pages & Files: COWS 9.5-9.7
- Cost Model Teaser: COWS 8.4.1-8.4.3; read ahead in 8.4.4-8.4.7 knowing that we haven't covered indexing yet!
- Disks & Files on Gradescope.
5.1 An overview of DBMS architecture
Clarification: I noticed that at the end of the above video I say that we don't update at the level of a field or a record. What I mean is that when we update a record, say through a transaction, we still have to read in the whole page that contains the record and write the whole page back, even if we only changed a single byte in that page!
5.2 Disks, Storage Hierarchy & The Disk Space Manager
5.2.1 What about solid state drives (SSDs)?
To better understand the characteristics of SSDs, you can watch this optional video.
The crux of it is that SSDs do give you fast, fine-granularity random reads, which makes the need for 'sequential layouts' less pressing, but writes are much slower than reads and must happen at a coarser granularity (not as fine-grained as reads). So:
- we still have to buffer many writes together (see the sketch after this list),
- we still see a random-vs-sequential performance difference, at least with respect to writes, and
- while SSDs are much faster than HDDs, they are still much, much slower than main memory, so we haven't flattened the storage hierarchy.
So even though they have different characteristics than HDDs, they don’t really require a full rethink of our DBMS storage system.
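To make the first bullet concrete, here is a minimal sketch of write buffering (in Python; PAGE_SIZE, the flush_page callback, and the record format are all made up for illustration): small record writes accumulate in memory, and the device only ever sees large, page-sized writes.

```python
# Minimal sketch of write buffering: accumulate small record writes in
# memory so the device only ever sees full, page-sized writes.
PAGE_SIZE = 4096  # bytes; illustrative write granularity

class WriteBuffer:
    def __init__(self, flush_page):
        self.flush_page = flush_page  # callback that performs one page-sized write
        self.buf = bytearray()

    def write_record(self, record: bytes):
        self.buf.extend(record)
        # Flush only in full-page units; leftover bytes wait for more writes.
        while len(self.buf) >= PAGE_SIZE:
            self.flush_page(bytes(self.buf[:PAGE_SIZE]))
            del self.buf[:PAGE_SIZE]

# Usage: 1000 tiny record writes become a handful of page-sized writes.
pages_written = []
wb = WriteBuffer(pages_written.append)
for i in range(1000):
    wb.write_record(f"record-{i:04d};".encode())  # ~12-byte logical writes
print(len(pages_written), "page-sized writes issued so far")
```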
More importantly, HDDs are here to stay for some time. SSDs do improve performance and do reduce the variability that we see when reading from and writing to hard disks, but they are still expensive. It is often more cost-effective for large database systems to store the data on hard disks rather than SSDs.
For small databases, it is often the case that they can fit entirely in today's main memory. There are main-memory databases that can provide insane performance benefits for these tiny databases. MySQL provides a main-memory storage engine that you can experiment with. You still need to back your data on a durable storage device, but with a main-memory DBMS most of the query and transactional processing occurs in memory.
You may want to read this paper on a memory-optimized transactional DBMS called Hekaton. What is exciting to understand here is that when everything becomes memory-resident, other bottlenecks become amplified. In this course, we will focus mostly on reducing IO costs: this is the bottleneck when we have to bring our data back and forth from disk (Pluto). But once everything is memory-resident (the data is in Dubai), the CPU, or the instructions executed per transaction (update query), becomes the bottleneck! The transaction-management sections are advanced at this point of the class, but the first few sections are well within your reach as an up-and-coming database systems engineer.
Note: Some DBMSs like SQLite actually store everything in a single file. Others might have a separate file for each database table or for each index. Only the DBMS can decipher the contents of a database file even if it uses the OS to create the file. The OS file system can’t interpret any of the bytes in these files.
5.3 Layouts
5.3.1 The Record
The physical representation of a record is just bytes. Only the DBMS can interpret those bytes. The data types that are necessary to interpret these records are found in a catalog not within the bytes of the record. We will look at different representation options depending on whether the record has fixed-length or variable-length attributes.
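As an illustration (a sketch, not any real DBMS's format; the schema and field names are made up), here is how a fixed-length record can be packed into bytes with Python's struct module, and one common option for variable-length records: a small header of field-end offsets followed by the concatenated values. Note that nothing in the bytes says what the types are; that information lives in the catalog.

```python
import struct

# Fixed-length record: the schema (stored in the catalog, not in the
# bytes!) says: id is a 4-byte int, age is a 4-byte int, gpa is a double.
FIXED_FMT = "<iid"  # little-endian: int32, int32, float64

def encode_fixed(rid, age, gpa):
    return struct.pack(FIXED_FMT, rid, age, gpa)

def decode_fixed(buf):
    # Without FIXED_FMT from the catalog, buf is just 16 opaque bytes.
    return struct.unpack(FIXED_FMT, buf)

# Variable-length record: one option is a header holding the field count
# and each field's end offset (2-byte ints), then the concatenated bytes.
def encode_varlen(fields):
    offsets, end = [], 0
    for f in fields:
        end += len(f)
        offsets.append(end)
    header = struct.pack(f"<H{len(fields)}H", len(fields), *offsets)
    return header + b"".join(fields)

def decode_varlen(buf):
    (n,) = struct.unpack_from("<H", buf, 0)
    ends = struct.unpack_from(f"<{n}H", buf, 2)
    base = 2 + 2 * n  # header size: count plus n offsets
    starts = (0,) + ends[:-1]
    return [buf[base + s:base + e] for s, e in zip(starts, ends)]

rec = encode_varlen([b"Alice", b"databases", b"A+"])
print(decode_varlen(rec))  # [b'Alice', b'databases', b'A+']
```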
5.3.2 The Page
Note: record ids are not your primary keys! This is a physical unique identifier of each record that helps the DBMS storage system locate a record. Applications cannot rely on record ids to identify or search for a record.
Note: The slotted pages in the book have the slot directory in the footer of the page rather than in the header. Both layouts are equivalent!
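To tie these two notes together, here is a toy slotted page (illustrative Python; a real page is a fixed-size block of raw bytes with a packed slot directory, not a Python list). The slot directory maps a slot number to a record's location within the page, and a record id is then just the pair (page id, slot number): a physical address, unrelated to the primary key.

```python
# Toy slotted page: the slot directory maps slot numbers to records.
class SlottedPage:
    def __init__(self, page_id):
        self.page_id = page_id
        self.slots = []  # slot directory; None marks a deleted record

    def insert(self, record):
        # Reuse a freed slot if one exists, so existing rids stay valid.
        for slot, r in enumerate(self.slots):
            if r is None:
                self.slots[slot] = record
                return (self.page_id, slot)  # the record id (rid)
        self.slots.append(record)
        return (self.page_id, len(self.slots) - 1)

    def fetch(self, rid):
        page_id, slot = rid
        assert page_id == self.page_id
        return self.slots[slot]

    def delete(self, rid):
        _, slot = rid
        self.slots[slot] = None  # slot numbers of other records never shift

page = SlottedPage(page_id=7)
rid = page.insert(b"some record bytes")
print(rid, page.fetch(rid))  # (7, 0) b'some record bytes'
```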
5.3.4 Cost Models
In the video above, we introduce the notion of workloads. In general, there are two main classes of workloads to consider:
- OLTP: Online Transactional Processing or transactional workloads for short. These workloads typically have many fast, simple queries that operate on and usually write to a single entity at a time. For example, the database backing an online store should be tuned for OLTP workloads.
- OLAP: Online Analytical Processing or analytical workloads for short. These workloads typically have long-running, complex queries that read large amounts of data. For example, analyzing the database of an online store to find the most popular products that are bought together in different places is a complex analytical workload.
Hybrid Transactional + Analytical Processing (HTAP) systems try to support both workloads on the same database.
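As a teaser of the kind of back-of-the-envelope arithmetic the cost-model reading does (the numbers below are illustrative; in the spirit of COWS 8.4 we count only page I/Os and ignore CPU cost), consider a heap file of B pages with no index:

```python
# Back-of-the-envelope I/O cost for a heap file (illustrative numbers).
B = 100_000  # pages in the heap file
D = 10e-3    # seconds per page I/O (HDD-ish, illustrative)

full_scan = B * D                  # must read every page
avg_equality_search = 0.5 * B * D  # assuming exactly one match, we scan
                                   # half the file on average before hitting it

print(f"full scan:       {full_scan:7.1f} s")
print(f"equality search: {avg_equality_search:7.1f} s (average, single match)")
```

An OLAP query pays the full-scan cost by design; an OLTP query that pays anything close to it is in trouble, which is one reason indexing (which we haven't covered yet) matters so much for transactional workloads.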
5.4 Row Stores vs. Column Stores
Thinking about data as entities and records is so ingrained that it is natural to think about physically storing a “record of information” or “a row within a table.” This is the traditional relational record-oriented storage system: the row-store. It is often fun to break the way we see things and rethink everything. What if we store data by columns instead of rows?
Column-stores do exactly that. They redesign the DBMS storage system and in turn the entire DBMS to store data column-by-column instead of row-by-row. This opens up a world of possibilities in terms of how we access data for efficient read-optimized queries, compress data, etc.
However, there is no one-size-fits-all design. Column-stores, like any systems design choice, make trade-offs: they immensely speed up analytical workloads at the cost of write-heavy, transactional workloads.
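To see the trade-off in miniature, here is a toy sketch (Python; the table and queries are made up) of the same table stored row-by-row and column-by-column. The analytical read touches only one tight array in the column layout, while the single-record insert turns into one append per column:

```python
# The same toy table in both layouts (illustrative, not a real engine).
rows = [  # row-store: one tuple per record, records stored contiguously
    ("apple", "nyc", 3), ("banana", "sf", 5), ("apple", "sf", 2),
]
cols = {  # column-store: one contiguous array per attribute
    "product": ["apple", "banana", "apple"],
    "city":    ["nyc", "sf", "sf"],
    "qty":     [3, 5, 2],
}

# Analytical read (total quantity): the row-store walks every field of
# every record; the column-store scans just the qty array.
print(sum(r[2] for r in rows))  # 10, touching all fields of every row
print(sum(cols["qty"]))         # 10, reading far fewer bytes

# Transactional write (insert one record): one append in the row-store,
# one append per column in the column-store.
rows.append(("cherry", "nyc", 7))
for name, value in zip(cols, ("cherry", "nyc", 7)):
    cols[name].append(value)
```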
- You may want to read the intro of the C-Store paper to get an understanding of how storing data by columns instead of by rows leads to a redesign of the DBMS, and of the performance characteristics of column-stores. The paper also goes into transactional support over column-stores, which is a bit advanced at this stage of the course.
As a fun exercise, download MonetDB, an open-source column-store, and compare the performance of running some of the analytical queries from problem set 1 (Appendix D) on both MonetDB and Postgres.