hive install

1、安装mysql-5.5

** From server2

  • (1)、rpm包安装mysql
    1
    2
    3
    4
    5
    6
    # rpm -qa | grep mysql  [查询是否自带mysql]
    # rpm -e mysql-libs-5.1.71-1.el6.x86_64 --nodeps [不验证依赖卸载]
    # rpm -qa | grep mysql [再次查看卸载是否成功]
    # rpm -i MySQL-server-5.5.40-1.linux2.6.x86_64.rpm [安装服务端]
    # mysqld_safe & [后台启动,jobs查看]
    # rpm -i MySQL-client-5.5.40-1.linux2.6.x86_64.rpm [安装客户端]

or

1
2
3
4
5
6
7
$ yum install mysql-server
$ /etc/init.d/mysqld start
$ chkconfig --add mysqld
$ chkconfig mysqld on
$ chkconfig --list|grep mysqld

$ mysql_secure_installation #设置初始化信息
  • (2)、修改数据库配置信息
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
# mysql_secure_installation
NOTE: RUNNING ALL PARTS OF THIS SCRIPT IS RECOMMENDED FOR ALL MySQL
SERVERS IN PRODUCTION USE! PLEASE READ EACH STEP CAREFULLY!

In order to log into MySQL to secure it, we'll need the current
password for the root user. If you've just installed MySQL, and
you haven't set the root password yet, the password will be blank,
so you should just press enter here.

Enter current password for root (enter for none): //当前密码,第一次使用为null
OK, successfully used password, moving on...

Setting the root password ensures that nobody can log into the MySQL
root user without the proper authorisation.

Set root password? [Y/n] Y //设置密码
New password: /admin
Re-enter new password: //admin
Password updated successfully!
Reloading privilege tables..
... Success!

By default, a MySQL installation has an anonymous user, allowing anyone
to log into MySQL without having to have a user account created for
them. This is intended only for testing, and to make the installation
go a bit smoother. You should remove them before moving into a
production environment.

Remove anonymous users? [Y/n] n //是否删除匿名用户
... skipping.

Normally, root should only be allowed to connect from 'localhost'. This
ensures that someone cannot guess at the root password from the network.

Disallow root login remotely? [Y/n] n //不允许root远程登录
... skipping.

By default, MySQL comes with a database named 'test' that anyone can
access. This is also intended only for testing, and should be removed
before moving into a production environment.

Remove test database and access to it? [Y/n] n //是否删除测试数据库
... skipping.

Reloading the privilege tables will ensure that all changes made so far
will take effect immediately.

Reload privilege tables now? [Y/n] Y //重新加载权限表
... Success!

Cleaning up...

All done! If you've completed all of the above steps, your MySQL
installation should now be secure.

Thanks for using MySQL!
  • (3)、登录mysql数据库验证
1
# mysql -uroot -padmin
  • (4)创建数据库并授权
    1
    2
    3
    4
    5
    6
    #  mysql -uroot -padmin
    mysql> create database hive;
    [授权hive在任何位置(%)远程可以登陆]
    mysql> grant all on hive.* to 'hive'@'%' identified by 'hive';
    mysql> grant all on hive.* to 'hive'@'server2' identified by 'hive';
    mysql> flush privileges; [刷新权限]
  • (5)使用的是SQLyog数据库可视化软件连接验证

2、hive-0.13.1的安装

  • (1)、解压
    1
    # tar -zxvf apache-hive-0.13.1-bin.tar.gz -C /usr/local/
  • (2)、scp hive-0.13.1 [hive安装在server2]
    1
    # scp -rq hive-0.13.1 server2:/usr/local/
  • (3)、cp配置文件

    1
    2
    3
    4
    # cp hive-exec-log4j.properties.template hive-exec-log4j.properties
    # cp hive-log4j.properties.template hive-log4j.properties
    # cp hive-env.sh.template hive-env.sh
    # cp hive-default.xml.template hive-site.xml
  • (4)、配置hive-config.sh

    1
    2
    3
    4
    vi /usr/local/hive-0.13.1/bin/hive-config.sh
    export JAVA_HOME=/usr/local/jdk1.7.0_45
    export HIVE_HOME=/usr/local/hive-0.13.1
    export HADOOP_HOME=/usr/local/hadoop-2.4.0
  • (5)、配置hive-site.xml [mysql做metastore]

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    44
    45
    46
    47
    48
    49
    50
    51
    52
    53
    54
    55
    56
    57
    58
    59
    60
    61
    62
    63
    64
    65
    66
    67
    68
    69
    70
    71
    72
    73
    74
    75
    76
    77
    78
    79
    80
    81
    82
    83
    84
    85
    86
    87
    88
    89
    90
    91
    92
    93
    94
    95
    96
    97
    //连接地址,基本配置
    <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://server2:3306/hive?createDatabaseIfNotExist=true</value>
    </property>
    <property>
    //mysql驱动
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    </property>
    //用户名
    <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    </property>
    //用户密码
    <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
    </property>
    //默认数据库位置
    <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
    </property>

    //hive远程metastore的thrift地址
    <property>
    <name>hive.metastore.uris</name>
    <value>thrift://server2:9083</value>
    </property>


    //hiveserver2的配置
    <property>
    <name>hive.support.concurrency</name>
    <description>Enable Hive's Table Lock Manager Service</description>
    <value>true</value>
    </property>
    <property>
    <name>hive.zookeeper.quorum</name>
    <description>Zookeeper quorum used by Hive's Table Lock Manager</description>
    <value>server1,server2,server3</value>
    </property>
    <property>
    <name>hive.zookeeper.client.port</name>
    <value>2181</value>
    <description>The port of zookeeper servers to talk to. This is only needed for read/write locks.</description>
    </property>
    <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>server2</value>
    <description>Bind host on which to run the HiveServer2 Thrift interface. Can be overridden by setting $HIVE_SERVER2_THRIFT_BIND_HOST</description>
    </property>

    //关闭推测式执行,一些优化项
    <property>
    <name>hive.mapred.reduce.tasks.speculative.execution</name>
    <value>false</value>
    </property>
    <property>
    <name>mapreduce.reduce.speculative</name>
    <value>false</value>
    </property>
    小表mapjoin
    <property>
    <name>hive.ignore.mapjoin.hint</name>
    <value>false</value>
    </property>
    <property>
    <name>hive.mapjoin.smalltable.filesize</name>
    <value>500000000</value>
    </property>
    并行执行
    <property>
    <name>hive.exec.parallel</name>
    <value>true</value>
    </property>
    <property>
    <name>hive.exec.parallel.thread.number</name>
    <value>16</value>
    </property>
    客户端显示
    <property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
    </property>
    <property>
    <name>hive.cli.print.header</name>
    <value>true</value>
    </property>
    关闭自动统计
    <property>
    <name>hive.stats.autogather</name>
    <value>false</value>
    </property>
  • (6)、cp mysql-connector-java-5.1.26-bin.jar 到hive-0.13.1/lib

  • (7)、启动metastore

    1
    # hive-0.13.1/bin/hive --service metastore &       [jobs查看]
  • (8)、启动hive
    1
    # hive-0.13.1/bin/hive
  • (9)、启动hiveserver2 [jdbc服务]

    1
    2
    3
    4
    5
    6
    # hive-0.13.1/bin/hive --service hiveserver2 --hiveconf hive.server2.thrift.port=14000 start &

    [hsu@server2 ~]$ lsof -i :14000
    COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
    java 3460 hsu 306u IPv4 29485 0t0 TCP server2:scotty-ft->server1:18243 (ESTABLISHED)
    java 3460 hsu 313u IPv4 29484 0t0 TCP server2:scotty-ft (LISTEN)
  • (10)、测试hiveserver2

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    44
    45
    46
    47
    48
    49
    50
    51
    52
    53
    54
    55
    56
    57
    58
    59
    60
    61
    62
    63
    64
    65
    66
    67
    68
    69
    70
    71
    72
    73
    74
    75
    76
    77
    78
    79
    80
    81
    82
    83
    84
    85
    86
    87
    88
     使用beeline控制台连接hiveserver2:
    $ /usr/local/hive-0.13.1/beeline> !connect jdbc:hive2://server2:14000 -uhsu -phsu org.apache.hive.jdbc.HiveDriver

    $ beeline --help
    Usage: java org.apache.hive.cli.beeline.BeeLine
    -u <database url> the JDBC URL to connect to
    -n <username> the username to connect as
    -p <password> the password to connect as
    -d <driver class> the driver class to use

    # hive1.2.1测试

    !connect jdbc:hive2://server2:14000 -udefault -nhsu -phsu -dorg.apache.hive.jdbc.HiveDriver

    [hsu@server1 ~]$ beeline
    Beeline version 1.2.1 by Apache Hive
    beeline> !connect jdbc:hive2://server2:14000
    Connecting to jdbc:hive2://server2:14000
    Enter username for jdbc:hive2://server2:14000:
    Enter password for jdbc:hive2://server2:14000:
    Error: Failed to open new session: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: Permission denied: user=anonymous, access=EXECUTE, inode="/tmp":hsu:supergroup:drwx------
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:271)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:257)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:208)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:171)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6512)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:4140)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:838)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:821)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) (state=,code=0)

    解决:
    [hsu@server2 ~]$ hadoop fs -ls /tmp
    Found 2 items
    drwx-wx-wx - hsu supergroup 0 2015-08-13 14:57 /tmp/hive
    drwx------ - hsu supergroup 0 2015-08-06 13:03 /tmp/hsu

    [hsu@server2 ~]$ hadoop fs -chmod 777 /tmp
    [hsu@server2 ~]$ hadoop fs -ls /
    drwxrwxrwx - hsu supergroup 0 2015-08-13 14:57 /tmp

    0: jdbc:hive2://server2:14000> !connect jdbc:hive2://server2:14000
    Connecting to jdbc:hive2://server2:14000
    Enter username for jdbc:hive2://server2:14000: hsu
    Enter password for jdbc:hive2://server2:14000: ***
    Connected to: Apache Hive (version 1.2.1)
    Driver: Hive JDBC (version 1.2.1)
    Transaction isolation: TRANSACTION_REPEATABLE_READ
    1: jdbc:hive2://server2:14000> create table test(id int);
    No rows affected (20.356 seconds)
    1: jdbc:hive2://server2:14000>

    1: jdbc:hive2://server2:14000> insert into table test values (1), (2), (3);
    INFO : Number of reduce tasks is set to 0 since there's no reduce operator
    WARN : Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
    INFO : number of splits:1
    INFO : Submitting tokens for job: job_1439448136822_0001
    INFO : The url to track the job: http://server1:23188/proxy/application_1439448136822_0001/
    INFO : Starting Job = job_1439448136822_0001, Tracking URL = http://server1:23188/proxy/application_1439448136822_0001/
    INFO : Kill Command = /home/hsu/hadoop/bin/hadoop job -kill job_1439448136822_0001
    INFO : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
    INFO : 2015-08-13 16:00:31,907 Stage-1 map = 0%, reduce = 0%
    INFO : 2015-08-13 16:01:22,538 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.56 sec
    INFO : MapReduce Total cumulative CPU time: 1 seconds 560 msec
    INFO : Ended Job = job_1439448136822_0001
    INFO : Stage-3 is selected by condition resolver.
    INFO : Stage-2 is filtered out by condition resolver.
    INFO : Stage-4 is filtered out by condition resolver.
    INFO : Moving data to: hdfs://mycluster/user/hive/warehouse/test/.hive-staging_hive_2015-08-13_15-59-12_208_7322597341621072670-1/-ext-10000 from hdfs://mycluster/user/hive/warehouse/test/.hive-staging_hive_2015-08-13_15-59-12_208_7322597341621072670-1/-ext-10002
    INFO : Loading data to table default.test from hdfs://mycluster/user/hive/warehouse/test/.hive-staging_hive_2015-08-13_15-59-12_208_7322597341621072670-1/-ext-10000

    1: jdbc:hive2://server2:14000> select * from test;
    +----------+--+
    | test.id |
    +----------+--+
    | 1 |
    | 2 |
    | 3 |
    +----------+--+
    3 rows selected (0.324 seconds)
  • (11)、hive安装目录加入环境变量中

1
2
3
4
5
6
7
8
9
   # vi /etc/profile

附录1:hiveserver1和hiveserver2的区别
Hiveserver1 和hiveserver2的JDBC区别:
HiveServer version Connection URL Driver Class
HiveServer2 jdbc:hive2://: org.apache.hive.jdbc.HiveDriver
HiveServer1 jdbc:hive://: org.apache.hadoop.hive.jdbc.HiveDriver

参考地址:https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients

3、hive的语法介绍

  • (1) 创建
1
2
hive (default)> create table t1(int id);     [创建表]
hive (default)> create table t2(id int,name string) row format delimited fields terminated by '\t'; [指定行分割符]
  • (2) 加载数据
    1
    2
    hive (default)> load data local inpath '/usr/local/testdata/id' into table t1;  [linux本地加载数据到hive,copy数据]
    hive (default)> load data local inpath '/usr/local/name' into table t2; [linux本地加载数据到hive,copy数据]
  • (3) 删除表
1
2
hive (default)> drop table t1;      [受控表managed table,从hdfs删除数据,metastore元数据信息删除]
mysql> select * from TBLS; [TBL_TYPE字段记录表类型MANAGED_TABLE or other]

4 受控表(MANAGED_TABLE)包括内部表,分区表,桶表。

  • (5) 分区表[数据在不同文件目录中,扫描范围]
1
2
3
4
5
6
hive (default)> create table t3(id int,name string) partitioned by(grade int) row format delimited fields terminated by '\t';    [创建分区表]
hive (default)> load data local inpath '/usr/local/name' into table t3 partition(grade=1); [加载数据]
hive (default)> select * from t3 where grade=1; [分区字段可以作用过滤条件,是一个文件夹的标识]

hive (default)> create table enroll(id int,name string) partitioned by(year int,month int) row format delimited fields terminated by '\t';
load data local inpath '/usr/local/name' into table enroll partition(year=2014,month=11-12);
  • (6) 桶表(bucket table)
    1
    2
    3
    4
    5
    6
    表链接中使用,提高效率
    hive (default)> create table bucket_test(id int,name string) clustered by(id) into 3 buckets; [按照id分为3个桶]
    hive (default)> set hive.enforce.bucketing=true; [启用桶表]

    insert overwrite table buckets select ... from ... [插入数据]
    hive (default)> insert overwrite table bucket_test select id,name from t2;
  • (7) 外部表EXTERNAL_TABLE[删除表,只删除表定义,对HDFS的数据不会删除]{外部分区表}
    1
    2
    3
    # hadoop fs -put name /testdata/external_teble
    hive (default)> create external table external_test(id int,name string) row format delimited fields terminated by '\t' location '/testdata/external_table';
    hive (default)> select id,name from external_test order by id desc;

5、命令行工具

(1) # hive
(2) # hive -e "select * from t2";
(3) # hive -e "select * from t2" >> a
(4) # hive -S -e "select * from t2" >> a
(5) # hive --hiveconf hive.querylog.location=/usr/local/hive-0.13.1/logs
(6) hive (default)> set hive.querylog.location;
    hive.querylog.location=/tmp/root
(7) hive -f file
(8) source file
(9) /root/.hiverc 和 /root/.hivehistory 

搜狗数据
    use sougoulibs;
    create external table sogou(dt string, websession string, word string, s_seq int, c_seq int, website string) 
    row format delimited fields terminated by '\t' lines terminated by '\n' stored as textfile location '/labs/sogou/'; 

6、hcatlog

参考:https://cwiki.apache.org/confluence/display/Hive/HCatalog

7、hive-1.2.1 cli启动报错

1
2
3
4
5
6
7
mysql> select * from VERSION;
+--------+----------------+------------------------------------+
| VER_ID | SCHEMA_VERSION | VERSION_COMMENT |
+--------+----------------+------------------------------------+
| 1 | 1.2.0 | Set by MetaStore hsu@192.168.2.201 |
+--------+----------------+------------------------------------+
1 row in set (0.00 sec)
1
2
3
4
5
[hsu@server2 ~]$ hive
Logging initialized using configuration in file:/home/hsu/apache-hive-1.2.1-bin/conf/hive-log4j.properties
[ERROR] Terminal initialization failed; falling back to unsupported
java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected
at jline.TerminalFactory.create(TerminalFactory.java:101)

解决:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
[hsu@server2 ~]$ ls hadoop/share/hadoop/yarn/lib/jline-0.9.94.jar 
hadoop/share/hadoop/yarn/lib/jline-0.9.94.jar
[hsu@server2 ~]$ ls hive/lib/jline-2.12.jar
hive/lib/jline-2.12.jar

[hsu@server2 ~]$ mv hadoop/share/hadoop/yarn/lib/jline-0.9.94.jar hadoop/share/hadoop/yarn/lib/jline-0.9.94.jar.bak

[hsu@server2 ~]$ cp hive/lib/jline-2.12.jar hadoop/share/hadoop/yarn/lib/

[hsu@server2 ~]$ ls hadoop/share/hadoop/yarn/lib/jline-*
hadoop/share/hadoop/yarn/lib/jline-0.9.94.jar.bak hadoop/share/hadoop/yarn/lib/jline-2.12.jar

[hsu@server2 ~]$ hive
Logging initialized using configuration in file:/home/hsu/apache-hive-1.2.1-bin/conf/hive-log4j.properties
hive (default)> show tables;
OK
tab_name
Time taken: 2.386 seconds

由于使用的jline版本不一致导致问题~!

7. Hive 2.0.1 FAQ

启动报错:

1
2
3
 Required table missing : "`VERSION`" in Catalog "" Schema "". DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "datanucleus.schema.autoCreateTables"
org.datanucleus.store.rdbms.exceptions.MissingTableException: Required table missing : "`VERSION`" in Catalog "" Schema "". DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "datanucleus.schema.autoCreateTables"
at org.datanucleus.store.rdbms.table.AbstractTable.exists(AbstractTable.java:606)

解决:
在hive-site.xml文件中加入如下配置解决

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
<property>
<name>datanucleus.readOnlyDatastore</name>
<value>false</value>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>false</value>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>true</value>
</property>
<property>
<name>datanucleus.autoCreateTables</name>
<value>true</value>
</property>
<property>
<name>datanucleus.autoCreateColumns</name>
<value>true</value>
</property>

hive 2.0.1 beeline权限验证问题

hive 2.0.1 and tez 0.7.1 兼容性问题

1、整合hive 2.0.1 and tez 0.7.1发现一些class包没用被加载,提示是没用找到类。
2、通过降低hive版本为 1.2.1发现没用任何问题,可以正常执行tez任务!
3、社区查看hive2.0.1依赖的tez版本是0.8.3,打算重新手动编译在测试是否可行,待验证。
4、这一点可以说明tez这个包各个版本直接接口的兼容性非常差,导致升级代价大,难以维护!

原创文章,转载请注明: 转载自Itweet的博客
本博客的文章集合: http://www.itweet.cn/blog/archive/