The second programming project is to implement a hash table for the BusTub DBMS that is backed by disk storage. Your hash table is responsible for fast data retrieval without having to search through every record in a database table.
第二个编程项目是为BusTub DBMS实现一个支持磁盘存储的哈希表。你的哈希表负责快速检索数据,而无需在数据库表中搜索每条记录。

You will need to implement a hash table using the linear probing hashing scheme. It is a giant slot array that spans across multiple pages where each block page holds a hash table block. The table will access pages through your buffer pool from Project #1. The table contains a header page to map blocks to pages. Your hash table needs to support growing in size if it runs out of space (i.e., every slot is full).

You will need to complete the following tasks in your hash table implementation:

  • Page Layouts

  • Hash Table Implementation

  • Table Resizing

  • Concurrency Control

This is a single-person project that will be completed individually (i.e., no groups).

Release Date: Sep 30, 2019

Due Date: Oct 20, 2019 @ 11:59pm
截止日期:2019年10月20日 @ 11:59pm



Like the first project, we are providing you with stub classes that contain the API that you need to implement. You should not modify the signatures for the pre-defined functions in these classes. If you do this, then it will break the test code that we will use to grade your assignment you end up getting no credit for the project. If a class already contains certain member variables, you should not remove them. But you may add private helper functions/member variables to these classes in order to correctly realize the functionality.

The correctness of the linear probe hash table index depends on the correctness of your buffer pool implementation. We will not provide solutions for the previous programming projects.


任务#1 - 页面布局

Your hash table is meant to be accessed through the DBMS’s BufferPoolManager. This means that you cannot allocate memory to store information. Everything must be stored in disk pages so that they can read/written from the DiskManager. If you create a hash table, write its pages to disk, and then restart the DBMS, you should be able to load back the hash table from disk after restarting.

To support reading/writing hash table blocks on top of pages, you will implement two Page classes to store the data of your hash table. This is meant to teach you how to allocate memory from the BufferPoolManager as pages.

  • Hash Table Header Page
    Hash Table Header 页面

  • Hash Table Block Page
    Hash Table Block 页面



This class holds all of the meta-data for the hash table. It is divided into the fields as shown by the table below:

变量名称 大小 描述
page_id_ 4字节 自身页面ID
size_ 4字节 哈希表可以容纳的键值对数量
next_ind_ 4字节 添加新条目到block_page_ids_的下一个索引
lsn_ 4字节 日志序列号(在第四个项目中使用)
block_page_ids_ 4080字节 块页面ID的数组

The block_page_ids_ array maps block ids to page_id_t ids. The ith element in block_page_ids_ is the page_id for the ith block.
block_page_ids_数组将块ID映射到page_id_t ID。block_page_ids_中的第i个元素是第i个块的page_id。

You must implement your Hash Table Header Page in the designated files. You are only allowed to modify the header file (src/include/page/hash_table_header_page.h) and its corresponding source file (src/page/hash_table_header_page.cpp).


The Hash Table Block Page holds three arrays:

  • occupied_ : The ith bit of occupied_ is 1 if the ith index of array_ has ever been occupied.
  • occupied_:如果array_的第i个索引曾经被占用,则occupied_的第i个位为1。

  • readable_ : The ith bit of readable_ is 1 if the ith index of array_ holds a readable value.

  • readable_:如果array_的第i个索引持有可读的值,则readable_的第i个位为1。

  • array_ : The array that holds the key-value pairs.

  • array_:保存键值对的数组。

The number of slots available in a Hash Table Block Page depends on the types of the keys and values being stored. You only need to support fixed-length keys and values. The size of keys/values will be the same within a single hash table instance, but you cannot assume that they will be the same for all instances (e.g., hash table #1 can have 32-bit keys and hash table #2 can have 64-bit keys).

You must implement your Hash Table Block Page in the designated files. You are only allowed to modify the header file (src/include/page/hash_table_block_page.h) and its corresponding source file (src/page/hash_table_block_page.cpp).

Each Hash Table Header/Block page corresponds to the content (i.e., the byte array data_) of a memory page fetched by buffer pool. Every time you try to read or write a page, you need to first fetch the page from buffer pool using its unique page_id, then reinterpret cast to either a header or a block page, and unpin the page after any writing or reading operations.
每个哈希表 Header/Block 页面对应于通过缓冲池获取的内存页面的内容(即字节数组data_)。每当你尝试读取或写入页面时,你需要使用其唯一的page_id


任务 #2 - 哈希表实现

You will implement a hash table that uses the linear probing hashing scheme. It needs to support insertions (Insert), point search (GetValue), and deletions (Remove).

Your hash table must support both unique and non-unique keys. Duplicate values for the same key are not allowed. This means that (key_0, value_0) and (key_0, value_1) can exist in the same hash table, but not (key_0, value_0) and (key_0, value_0). The Insert method only returns false if it tries to insert an existing key-value pair.
你的哈希表必须支持唯一键和非唯一键。同一个键的重复值是不允许的。这意味着在同一个哈希表中可以存在 (key_0, value_0) 和 (key_0, value_1) 这样的键值对,但不能存在 (key_0, value_0) 和 (key_0, value_0) 这样的情况。Insert 方法只有在尝试插入已存在的键值对时才会返回 false。

Your hash table implementation must hide the details of the key/value type and associated comparator, like this:

template <typename KeyType,
typename ValueType,
typename KeyComparator>
class LinearProbeHashTable {
// ---

These classes are already implemented for you:

  • KeyType: The type of each key in the hash table. This will only be GenericKey, the actual size of GenericKey is specified and instantiated with a template argument and depends on the data type of indexed attribute.
    KeyType:哈希表中每个键的类型。这将只是 GenericKey,GenericKey 的实际大小由模板参数指定并实例化,取决于索引属性的数据类型。

  • ValueType: The type of each value in the hash table. This will only be 64-bit RID.
    ValueType:哈希表中每个值的类型。这将只是 64 位 RID。

  • KeyComparator: The class used to compare whether two KeyType instances are less/greater-than each other. These will be included in the KeyType implementation files.
    Note that, to compare whether two ValueType instances are equal to each other, you can use the == operator.
    KeyComparator:用于比较两个 KeyType 实例是否小于/大于彼此的类。这些将包含在 KeyType 实现文件中。
    请注意,要比较两个 ValueType 实例是否相等,可以使用 == 运算符。


任务 #3 - 表格调整大小

The linear probing hashing scheme uses a fized-size table. When the hash table is full, then any insert operation will get stuck in an infinite loop because the system will walk through the entire slot array and not find a free space. If your hash table detects that it is full, then it must resize itself to be twice the current size (i.e., if currently has n slots, then the new size will be 2×n).

Since any write operation could lead to a change of header_page_id in your hash table, it is your responsibility to update header_page_id in the header page (src/include/page/header_page.h) to ensure that the container is durable on disk.
由于任何写操作都可能导致哈希表中 header_page_id 的变化,所以你有责任更新头页(src/include/page/header_page.h)中的 header_page_id,以确保容器在磁盘上是持久的。


任务 #4 - 并发控制

Up to this this point you could assume that your hash table only supported single-threaded execution. In this last task, you will modify your implementation so that it supports multiple threads reading/writing the table at the same time.

You will need to have latches on each block so that when one thread is writing to a block other threads are not reading or modifying that index as well. You should also allow multiple readers to be reading the same block at the same time.

You will need to latch the whole hash table when you need to resize. When resize is called, the size that the table was when resize was called is passed in as an argument. This is so that if the table was resized while a thread was waiting for the latch, it can immediately give up the latch and attempt insertion again.



  • You are not allowed to use a global scope latch to protect your data structure for each operation. In other words, you may not lock the whole container and only unlock the latch when operations are done.

  • We are providing you a ReaderWriterLatch (src/include/common/rwlatch.h) that you can use in your hash table. There are also helper functions in the Page base class to acquire and release latches (src/include/page/page.h).
    我们提供了一个 ReaderWriterLatch (src/include/common/rwlatch.h) 可以在你的哈希表中使用。在 Page 基类中也有一些辅助函数可以获取和释放闩锁 (src/include/page/page.h)。

  • Sone of your hash table functions will be given a Transaction (src/include/concurrency/transaction.h) object. This object provides methods to store the page on which you have acquired latch while traversing through the Hash Table.
    你的一些哈希表函数将获得一个 Transaction (src/include/concurrency/transaction.h) 对象。这个对象提供了在遍历哈希表过程中存储已获得闩锁页面的方法。