OVERVIEW

The third programming project is to add support for executing queries in your database system. You will implement executors that are responsible for taking query plan nodes and executing them. You will create executors that perform sequential scans, inserts, hash joins, and aggregations. Because the DBMS does not support SQL (yet), your implementation will operate directly on hand-written query plans.
第三个编程项目是在数据库系统中添加支持执行查询的功能。您将实现执行器,负责接收查询计划节点并执行它们。您将创建执行器来执行顺序扫描、插入、哈希连接和聚合。由于DBMS尚不支持SQL,因此您的实现将直接在手写的查询计划上操作。

We will use the Iterator query processing model (i.e., the Volcano model). Every query plan executor implements a Next function. When the DBMS invokes an executor’s Next function, the executor either returns a single tuple or indicates that there are no more tuples. This lets the executor implement a loop that just keeps calling Next on its children to get and process their tuples.
我们将使用迭代器查询处理模型(即Volcano模型)。每个查询计划执行器都实现了一个Next函数。当DBMS调用执行器的Next函数时,执行器要么返回一个单独的元组,要么指示没有更多的元组。这使得执行器能够实现一个循环,只需不断调用其子节点的Next来获取和处理它们的元组。

As optional reading, you may be interested in following a select statement through the PostgreSQL internals.
作为可选阅读,您可能会对跟踪 PostgreSQL 内部的 select 语句感兴趣。

The project is comprised of the following tasks:
该项目由以下任务组成:

  • Task #1 - Creating a Catalog Table
    任务#1 - 创建目录表

  • Task #2 - Executors
    任务#2 - 执行器

  • Task #3 - Linear Probe Hash Table Returns
    任务#3 - 线性探测哈希表返回

This is a single-person project that will be completed individually (i.e., no groups).
这是一个单人项目,个人完成(即无组)。

Please post all of your questions about this project on Piazza. Do not email the TAs directly with questions. The instructor and TAs will not debug your code. You may verbally describe test cases on Piazza, but do not post an excessive amount of testing code on Piazza either.
请在 Piazza 上发布有关该项目的所有问题。请勿直接通过电子邮件向助教提问。讲师和助教不会调试您的代码。您可以在 Piazza 上口头描述测试用例,但请勿在 Piazza 上发布过多的测试代码。

Release Date: Oct 21, 2019
发布日期:2019年10月21日

Due Date: Nov 17, 2019 @ 11:59pm
截止日期:2019年11月17日 @ 11:59pm

PROJECT SPECIFICATION

项目规范

Like the previous projects, we are providing you with stub classes that contain the API that you need to implement. You should not modify the signatures for the pre-defined functions in these classes. If you do this, then it will break the test code that we will use to grade your assignment and you will end up getting no credit for the project. If a class already contains certain member variables, you should not remove them. You may however add private helper functions and member variables to these classes.
和之前的项目一样,我们提供了包含您需要实现的 API 的存根类。您不应该修改这些类中预定义函数的签名。如果您这样做,将会破坏我们用于评分的测试代码,您将无法得到该项目的任何学分。如果某个类已经包含某些成员变量,您不应该删除它们。但是,您可以向这些类添加私有辅助函数和成员变量。

The correctness of this project depends on the correctness of your implementation of previous projects, we will not provide solutions or binary files.
该项目的正确性取决于您对之前项目的正确实现,我们不会提供解决方案或二进制文件。

ADVICE FROM THE TAS

助教建议

  • Task #1 is much easier than the others. Task #3 is open-ended. Budget your time wisely.
    任务#1比其他任务容易得多。任务#3是开放式的。明智地分配时间。

  • Tasks #1 and #2 do not rely on a working linear probe hash table implementation. Focus on getting those first.
    任务#1和#2不依赖于一个完全工作的线性探测哈希表实现。首先专注于这些任务。

  • Conceptual understanding is important for this project. It will make your life much easier.
    对于这个项目,概念理解很重要。它将让你的生活变得更容易。

TASK #1 - CREATING A CATALOG TABLE

任务#1 - 创建目录表

A database maintains an internal catalog to keep track of meta-data about the database. For example, the catalog is used to answer what tables are present and where. For more details, refer back to Lecture #04 - Database Storage (Part II).
数据库维护一个内部目录,用于跟踪关于数据库的元数据。例如,目录用于回答哪些表存在以及它们的位置。有关更多详细信息,请参考Lecture #04 - Database Storage (Part II)。

In this warm-up task, you will be modifying src/include/catalog/simple_catalog.h to that allow the DBMS to add new tables to the database and retrieve them using either the name or internal object identifier (table_oid_t). You will implement the CreateTable, GetTable(const std::string &table_name), and GetTable(table_oid_t table_oid) methods. You should find these tasks straightforward, the goal is to familiarize yourself with the catalog and how it interacts with the rest of the system.
在这个热身任务中,你将修改src/include/catalog/simple_catalog.h以允许数据库管理系统向数据库添加新表,并使用表名或内部对象标识符(table_oid_t)来检索表。你需要实现CreateTable、GetTable(const std::string &table_name)和GetTable(table_oid_t table_oid)方法。这些任务应该比较简单,目标是让你熟悉目录以及它如何与系统的其他部分交互。

TESTING

There is a sample test in test/catalog/catalog_test.cpp.
test/catalog/catalog_test.cpp中有一个样例测试。

TASK #2 - EXECUTORS

In the second task, you will implement executors for sequential scans, inserts, hash joins, and aggregations. For each query plan operator type, there is a corresponding executor object that implements the Init and Next methods. The Init method is for setting up internal state about the invocation of the operator (e.g., retrieving the corresponding table to scan). The Next method provides the iterator interface that returns a single tuple on each invocation (or null if there are no more tuples).
在第二个任务中,您将实现顺序扫描(sequential scan)、插入(insert)、哈希连接(hash join)和聚合(aggregation)的执行器。对于每个查询计划操作类型,都有相应的执行器对象,实现了Init和Next方法。Init方法用于设置关于操作调用的内部状态(例如,检索要扫描的相应表)。Next方法提供了迭代器接口,每次调用返回一个元组(如果没有更多元组,则返回空)。

The executors that you will implement are defined in the following header files:
您需要实现的执行器定义在以下头文件中:

  • src/include/execution/executors/seq_scan_executor.h
  • src/include/execution/executors/insert_executor.h
  • src/include/execution/executors/hash_join_executor.h
  • src/include/execution/executors/aggregation_executor.h

We assume executors are single-threaded throughout the entire project. You are also free to add private helper functions and class members as you see fit.
我们假设执行器在整个项目中都是单线程的。您可以根据需要添加私有辅助函数和类成员。

To understand how the executors are created at runtime during query execution, refer to the ExecutorFactory (src/include/execution/executor_factory.h) helper class. Moreover, every executor has an ExecutorContext (src/include/execution/executor_context.h) that maintains additional state about the query. Finally, we have provided sample tests for you in test/execution/executor_test.cpp.
要了解在查询执行期间如何在运行时创建执行器,请参阅ExecutorFactory(src/include/execution/executor_factory.h)助手类。此外,每个执行器都有一个ExecutorContext(src/include/execution/executor_context.h),用于维护有关查询的其他状态。最后,我们在test/execution/executor_test.cpp中为您提供了示例测试。

Plan nodes are the input format or blueprint for the executors that you will be implementing. Plan nodes and executors are both tree-like; they accept tuples from their children and give tuples to their parent. They may rely on conventions about the ordering of their children, which we will describe below.
查询计划节点是执行器的输入格式或蓝图,您将要实现的执行器和查询计划节点都是树形结构;它们从其子节点接收元组并将元组提供给其父节点。它们可能依赖于关于子节点排序的约定,我们将在下面进行描述。

SEQUENTIAL SCANS

Sequential scans iterate over a table and return its tuples one-at-a-time. A sequential scan is specified by a SeqScanPlanNode. The plan node specifies which table to iterate over. The plan node may also contain a predicate; if a tuple does not satisfy the predicate, it is skipped over.
顺序扫描(Sequential scans)会对表进行迭代,并逐个返回其元组。顺序扫描由SeqScanPlanNode指定。计划节点指定要迭代的表。计划节点还可以包含谓词;如果元组不满足谓词条件,则跳过该元组。

Hint: Be careful when using the TableIterator object. Make sure that you understand the difference between the pre-increment and post-increment operators. You may find yourself getting strange output by switching between ++iter and iter++.
提示:在使用TableIterator对象时要小心。确保理解前增量运算符和后增量运算符之间的区别。如果在++iter和iter++之间切换,可能会得到奇怪的输出。

INSERT

Inserts add tuples to tables. Inserts are specified by an InsertPlanNode. There are two types of inserts: raw inserts, where the values to be inserted are embedded directly inside the plan node itself, or not-raw inserts, which take the values to be inserted from a child executor. For example, you could have an InsertPlanNode whose child was a SeqScanPlanNode to copy one table into another.
插入操作(Inserts)将元组添加到表中。插入操作由InsertPlanNode指定。有两种类型的插入操作:原始插入(raw inserts)和非原始插入(not-raw inserts)。原始插入操作中,要插入的值直接嵌入在计划节点本身中,而非原始插入操作则从子执行器中获取要插入的值。例如,您可以将子执行器设为SeqScanPlanNode,以将一个表复制到另一个表中。

HASH JOIN

Hash joins are used to combine the results of two child executors together. In this project, you just need to implement a basic hash join. For more details on hash joins, refer to the Lecture #11 notes on join algorithms. In this project, by convention the left child is used to build the hash table and the right child is used to probe.
哈希连接(Hash joins)用于将两个子执行器的结果组合在一起。在这个项目中,您只需要实现一个基本的哈希连接操作。关于哈希连接的更多详细信息,请参考Lecture #11 notes上的连接算法。在这个项目中,按照惯例,左子执行器用于构建哈希表,右子执行器用于探测(probe)。

We are providing you with a SimpleHashJoinHashTable implementation. We strongly recommend that you use this hash table and obtain full credit for this task before proceeding to Task #3.
我们为您提供了一个SimpleHashJoinHashTable的实现。我们强烈建议您使用这个哈希表,并在完成这个任务之前获得满分,然后再继续进行第三个任务。

We have also provided you with a HashValues function, which may be useful.
我们还为您提供了一个HashValues函数,可能会有用。

Hint: You will want to make use of the predicate in the hash join plan node. In particular, take a look at AbstractExpression::EvaluateJoin, which handles the left tuple and right tuple and their respective schemas. Note that this returns a Value, which you can GetAs<bool>.
提示:您需要使用哈希连接计划节点中的谓词。特别是,请查看AbstractExpression::EvaluateJoin函数,该函数处理左元组、右元组及其各自的模式。请注意,该函数返回一个Value对象,您可以使用GetAs方法来获取布尔值。

AGGREGATION

Aggregations are used to combine multiple tuple results from a single child executor into a single tuple. In this project, we ask you to implement COUNT, SUM, MIN, and MAX.
聚合操作用于将来自单个子执行器的多个元组结果合并为单个元组。在这个项目中,我们要求您实现COUNT、SUM、MIN和MAX聚合函数。

Note that we provide you with a SimpleAggregationHashTable. We strongly recommend that you use this hash table. We have trimmed down the course project and you do not need to do this, but you may be interested in seeing what it takes to use your LinearProbeHashTable from Project #2 instead.
请注意,我们为您提供了一个SimpleAggregationHashTable。我们强烈建议您使用这个哈希表。我们简化了课程项目,您不需要使用Project #2中的LinearProbeHashTable,但您可能会有兴趣看看如何使用它。

Hint: You will want to make use of the having expression in the aggregate plan node. In particular, take a look at AbstractExpression::EvaluateAggregate, which handles the groupbys of the key and the aggregates of the value. Note that this returns a Value, which you can GetAs<bool>`.
提示:您需要使用聚合计划节点中的HAVING表达式。特别是,请查看AbstractExpression::EvaluateAggregate函数,该函数处理键的分组和值的聚合。请注意,该函数返回一个Value对象,您可以使用GetAs方法来获取布尔值。

TESTING

We have provided sample tests in test/execution/executor_test.cpp.
我们已经在test/execution/executor_test.cpp中提供了示例测试。

TASK #3 - LINEAR PROBE HASH TABLE RETURNS

任务#3 - 线性探测哈希表返回

Intermission: We strongly recommend submitting to Gradescope at this point. This provides a backup of your submission, and you want to make sure that you get full credit for Tasks #1 and #2 before proceeding further.
中场休息:我们强烈建议在此时提交给Gradescope。这将提供您提交的备份,并确保在继续进行之前获得任务#1和任务#2的全部学分。

This task will require you to modify your hash join executor. You will now use your Linear Probe Hash Table from Project #2 instead of the simplified hash join hash table, henceforth referred to as JHT.
这个任务要求您修改哈希连接执行器。您现在将使用项目#2中的线性探测哈希表,而不是简化的哈希连接哈希表,以后将称为JHT。

Note: Task #3.1 can be skipped entirely. As long as you receive full credit for Task #3.2, you will also automatically receive full credit for Task #3.1.
注意:任务#3.1可以完全跳过。只要您获得了任务#3.2的全部学分,您也将自动获得任务#3.1的全部学分。

It is likely that you will need to understand explicit template instantiation. You have already seen code that used explicit template instantiation in Project #2.
您可能需要理解显式模板实例化。您已经在项目#2中看到了使用显式模板实例化的代码。

TASK #3.1 - LINEAR PROBE HASH TABLE, TMPTUPLEPAGE

任务#3.1 - 线性探测哈希表,TmpTuplePage

This part of the project is intended to be open-ended. Again, receiving full credit for Task #3.2 will automatically give you full credit for Task #3.1.
这部分项目旨在开放性。同样,只要您获得了任务#3.2的全部学分,您将自动获得任务#3.1的全部学分。

You are provided, but are not required to use, the following stub classes:
我们为您提供了以下存根类,但不要求您使用:

  • TmpTuplePage at src/include/storage/page/tmp_tuple_page.h. In our implementation, we wanted a simplified version of a TablePage. Although this is not necessary, we found that this will make the development easier.
    TmpTuplePage位于src/include/storage/page/tmp_tuple_page.h。在我们的实现中,我们需要一个简化版本的TablePage。虽然这不是必需的,但我们发现这将使开发工作更容易。

  • TmpTuple at src/include/storage/table/tmp_tuple.h. In our implementation, this solved certain issues that come from instantiating LinearProbeHashTable. You should think carefully about what those issues are.
    TmpTuple位于src/include/storage/table/tmp_tuple.h。在我们的实现中,这解决了一些由实例化LinearProbeHashTable引起的问题。您应该仔细考虑这些问题是什么。

To provide more scaffolding to students who prefer a more structured project, implementing missing functionality in TmpTuplePage will earn you full credit for Task #3.1. However, this may constrain you to working towards our solution. You are also free to implement the missing TmpTuplePage functionality and not use any of it.
对于希望进行更结构化项目的学生,实现TmpTuplePage中缺失的功能将为您赢得任务#3.1的全部学分。然而,这可能会限制您朝我们的解决方案发展。您也可以实现缺失的TmpTuplePage功能,但不使用其中的任何部分。

TASK #3.2 - LINEAR PROBE HASH TABLE, HASH JOIN

任务#3.2 - 线性探测哈希表,哈希连接

Lastly, you will modify your hash join executor in src/include/execution/executors/hash_join_executor.h to use your hash table from Project #2. You may find TmpTuplePage useful, but there are other solutions. As long as the type of your hash table used in the hash join is a LinearProbeHashTable, you will obtain full credit.
最后,您将修改src/include/execution/executors/hash_join_executor.h中的哈希连接执行器,以使用您在“项目#2”中的哈希表。您可能会发现TmpTuplePage有用,但也有其他解决方案。只要在哈希连接中使用的哈希表类型是LinearProbeHashTable,您将获得全部学分。

Note: Depending on how you choose to use the LinearProbeHashTable, you may need to remove the uniqueness constraint from it. Feel free to do so. In our solution involving TmpTuplePage and TmpTuple, we did not need to.
注意:根据您选择如何使用LinearProbeHashTable,您可能需要从中删除唯一性约束。请随意这样做。在我们涉及TmpTuplePage和TmpTuple的解决方案中,我们不需要这样做。

You may also add functions to src/include/execution/executor_context.h if you find that helpful.
如果发现有用,您还可以将函数添加到src/include/execution/executor_context.h中。

附录