6 Database Indexes: B+ Trees

Week 6: March 6 - March 12, 2022

COWS Chapter 10 Tree-Structured Indexing
COWS Sections 20.2-20.4 (Fig 20.2 is quite cool) How to select indexes?

Tree Indexing on Gradescope.

Tree Indexes

6.1 What is a database index?

In this module, we will look into index access methods and how indexes enable efficient table access by attribute search keys. We first discuss database indexes and their types broadly.

6.2 Tree Indexes

We will explore tree indexes in this unit and hash indexes during class.

Tree indexes (B+ trees in particular) provide logarithmic complexity equality lookups and range searches (Hash indexes only provide constant time equality lookups) and take up \(O(B)\) size where \(B\) is the number of data pages in the base heap file.

B+ trees are a popular DB tree index. In the following video, we describe the set of search operations B+ trees can (and cannot) efficiently support.

6.3 How a B+ tree works?

We describe first the properties of a B+ tree and its general structure.

6.3.1 Properties of a B+ tree

A B+ tree is

An m-way search tree
Self-balancing: It grows and shrinks from the root as data is inserted or deleted.
It maintains two invariants:
- Key invariant: Each entry in an internal node is a search key such that the pointer to the left subtree of the entry contains all keys that are less than this search key and the pointer to the right subtree contains all keys that are \(\geq\) this search key.
- Occupancy invariant:
  - Except the root, all nodes are at least half-full.
  - If \(F\) is the fanout, then each internal node has at least \(d\) entries (and \(d+1\) children) where \(d = (F-1)/2\)
  - The root must have at least \(1\) entry and \(2\) children or is a leaf with \(1\) to \(F-1\) entries
  - The maximum number of entries in any node is \(F-1\)
Data is only stored in the leaf nodes not in the internal nodes.

We don’t show sibling pointers in our B+ tree diagrams for visual simplicity but they do exist and they are important especially for range scans as you don’t need to traverse up to the parent from a leaf node when finding the next leaf node!

6.3.2 Insertions & Deletions

To understand how insertions work, Miro Mannino and I created this custom visualization for you. You can see all the intermediate steps for each insertion. You can select the fanout, the number of keys to insert and whether you wish values to be inserted in sorted order or in random order.

# of keys to insert: fanout: sort keys before insertion?

6.3.3 B+ trees in practice

6.3.4 Bulk Loading

6.4 Cost Analysis

COWS Revisit all of Chapter 8!

6.5 Index & Database Design Considerations

After completing this module, revisit the Benchmarking worksheet that gets you started on different indexing experiments within postgres.

5 The Database File Abstraction

7 Buffer Pool Management