Hive的各种调优设置

1、reduce个数

1
2
3
4
5
.hive.exec.reduces.bytes.per.reducer
.mapred.reduce.tasks=-1
CDH5:
hive (default)> set mapred.reduce.tasks;
mapred.reduce.tasks=-1

2、权限问题

1
2
3
4
.hive.warehouse.subdir.inherit.perms
CDH5:默认是true
hive (default)> set hive.warehouse.subdir.inherit.perms;
hive.warehouse.subdir.inherit.perms=true

3、HiveServer2的内存问题

1
2
.设置-Xmx越大越好
-Xmx=2048m 甚至-Xmx=4g

4、关闭推测式执行

1
2
3
4
5
6
7
8
CDH5 default:
.mapred.map.tasks.speculative.execution=true
.mapred.reduce.tasks.speculative.execution=true
update back:
hive (default)> set mapred.map.tasks.speculative.execution;
mapred.map.tasks.speculative.execution=false
hive (default)> set mapred.reduce.tasks.speculative.execution;
mapred.reduce.tasks.speculative.execution=false

5、客户端

1
2
. hive.cli.print.current.db
. hive.cli.print.header

6、并行执行

1
2
3
4
5
6
7
.hive.exec.parallel
.hive.exec.parallel.thread.number
hive (default)> set hive.exec.parallel;
cdh5:
hive.exec.parallel=true
hive (default)> set hive.exec.parallel.thread.number;
hive.exec.parallel.thread.number=16

7、MapJoin

1
2
3
4
5
6
7
8
9
10
11
12
13
14
.hive.auto.convert.join
.hive.mapjoin.smalltable.filesize
.hive.ignore.mapjoin.hint #忽略MAPJOIN写法,而是自动检查是否转换
CDH5:
hive (default)> set hive.auto.convert.join;
hive.auto.convert.join=true
hive (default)> set hive.mapjoin.smalltable.filesize;
hive.mapjoin.smalltable.filesize=500000000
hive (default)> set hive.ignore.mapjoin.hint;
hive.ignore.mapjoin.hint=false

例子:
hive (hsu_tmp)> select /*+MAPJOIN(tableX)*/ tableA.id,tableX.id from tableA join tableX on tableA.id=tableX.id;
#tableX自动转换为小表,如果这个表过大,会出现oom问题

8、LocalModel [全局关闭local执行模式]

1
2
3
4
5
6
7
8
9
10
.hive.exec.mode.local.auto
.hive.exec.mode.local.auto.input.files.max
.hive.exec.mode.local.auto.inputbytes.max
CDH5:
hive (hsu_tmp)> set hive.exec.mode.local.auto;
hive.exec.mode.local.auto=false
hive (hsu_tmp)> set hive.exec.mode.local.auto.input.files.max;
hive.exec.mode.local.auto.input.files.max=4
hive (hsu_tmp)> set hive.exec.mode.local.auto.inputbytes.max;
hive.exec.mode.local.auto.inputbytes.max=134217728

9、conf setting

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
推测式执行
<property>
<name>hive.mapred.reduce.tasks.speculative.execution</name>
<value>false</value>
</property>
<property>
<name>mapreduce.reduce.speculative</name>
<value>false</value>
</property>
小表mapjoin
<property>
<name>hive.ignore.mapjoin.hint</name>
<value>false</value>
</property>
<property>
<name>hive.mapjoin.smalltable.filesize</name>
<value>500000000</value>
</property>
并行执行
<property>
<name>hive.exec.parallel</name>
<value>true</value>
</property>
<property>
<name>hive.exec.parallel.thread.number</name>
<value>16</value>
</property>
客户端显示
<property>
<name>hive.cli.print.current.db</name>
<value>true</value>
</property>
<property>
<name>hive.cli.print.header</name>
<value>true</value>
</property>
关闭自动统计
<property>
<name>hive.stats.autogather</name>
<value>false</value>
</property>

10、查询表可以带引号来过滤

1
2
3
4
5
6
7
8
9
10
11
hive (default)> show tables '*sample*';
OK
tab_name
sample_07
sample_08
Time taken: 0.057 seconds, Fetched: 2 row(s)
hive (default)> show tables 'e*';
OK
tab_name
etl_data_source
Time taken: 0.056 seconds, Fetched: 1 row(s)
  1. thriftserver并发支持多少个会话session,默认100个
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    <property>
    <name>hive.server2.thrift.min.worker.threads</name>
    <value>0</value>
    </property>

    <property>
    <name>hive.server2.thrift.max.worker.threads</name>
    <value>1024</value>
    </property>

    <property>
    <name>hive.server2.async.exec.threads</name>
    <value>100</value>
    </property>