ganglia

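Ganglia is an open-source cluster monitoring project started at UC Berkeley, designed to scale to thousands of nodes.
Its core consists of gmond, gmetad, and a web frontend. It is mainly used to monitor system performance, such as
CPU, memory, and disk utilization, I/O load, and network traffic. The graphs make it easy to see the working state
of each node, which helps when tuning and allocating system resources and improving overall system performance.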

Install and configure Ganglia on CDH5

Install

Server node

1. install packages

sudo apt-get install ganglia-monitor ganglia-webfrontend gmetad

2. edit /etc/ganglia/gmond.conf

sample gmond.conf setting:
cluster {
name = "CDH5"
owner = "unspecified"
latlong = "unspecified"
url = "unspecified"
}

udp_send_channel {
/* mcast_join = xxx */
host = xxx.xxx.xxx.1 /* your ganglia server ip */
port = 8649
ttl = 1
}

udp_recv_channel {
/* mcast_join = xxx */
port = 8649
/* bind = xxx */
}

3. scp /etc/ganglia/gmond.conf to all the other nodes
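
The copy in step 3 can be scripted when there are many nodes. The following is a minimal sketch, not a tested deployment script: the host names `node2` and `node3` and the `DRY_RUN` switch are hypothetical, and it assumes SSH access to each node. It copies the file into the remote user's home directory, which matches the `sudo mv gmond.conf /etc/ganglia/` step run later on each node.

```shell
#!/bin/sh
# Hypothetical worker host names; replace with your own nodes.
NODES="${NODES:-node2 node3}"
SRC="/etc/ganglia/gmond.conf"

# Copy gmond.conf to the home directory of the remote user on each node.
# With DRY_RUN=1 (the default here) the commands are only printed.
push_gmond_conf() {
    for node in $NODES; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "scp $SRC ${node}:"
        else
            scp "$SRC" "${node}:"
        fi
    done
}

push_gmond_conf
```

Run it once as-is to review the printed commands, then set `DRY_RUN=0` to perform the copy.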

4. edit /etc/ganglia/gmetad.conf

sample gmetad.conf setting:
data_source "CDH5" xxx.xxx.xxx.1 xxx.xxx.xxx.2 xxx.xxx.xxx.3 ...
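
For a long host list, the `data_source` line can be generated rather than typed by hand. This is only an illustrative sketch: the helper name `build_data_source` and the `192.0.2.x` addresses are made up for the example. The cluster name should match the `name` in the `cluster` block of gmond.conf.

```shell
#!/bin/sh
# Hypothetical helper: print a gmetad data_source line.
# $1 is the cluster name; the remaining arguments are gmond addresses.
build_data_source() {
    cluster="$1"
    shift
    printf 'data_source "%s" %s\n' "$cluster" "$*"
}

# Example addresses only; replace with your own nodes.
build_data_source "CDH5" 192.0.2.1 192.0.2.2 192.0.2.3
```

The printed line can then be pasted into /etc/ganglia/gmetad.conf.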

Other nodes

1. install ganglia monitor

sudo apt-get install ganglia-monitor

2. copy gmond.conf from server node to /etc/ganglia

sudo mv gmond.conf /etc/ganglia/

3. restart ganglia monitor service

sudo service ganglia-monitor restart
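
After the restart, one way to check that metrics are arriving is to read the XML dump that gmond serves over TCP and count the hosts it reports. The sketch below assumes that gmond.conf still contains a `tcp_accept_channel` on port 8649 (present in the packaged default, but not shown in the sample above) and that `nc` is installed. `count_hosts` is a hypothetical helper name.

```shell
#!/bin/sh
# Count the <HOST ...> elements in a gmond XML dump read from stdin.
# gmond writes each HOST element on its own line, so counting
# matching lines is enough for a quick check.
count_hosts() {
    grep -c '<HOST '
}

# Usage on the server node (assumes nc and a tcp_accept_channel on 8649):
#   nc localhost 8649 | count_hosts
```

If the count is lower than the number of nodes, check the `host` value in `udp_send_channel` on the missing nodes.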


Config: Hadoop Metrics2 settings for CDH5 components
hadoop-metrics2.properties

HDFS
NameNode
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.period=10
namenode.sink.ganglia.servers=10.10.114.120:8649

DataNode
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.period=10
datanode.sink.ganglia.servers=10.10.114.120:8649

SecondaryNameNode
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.period=10
secondarynamenode.sink.ganglia.servers=10.10.114.120:8649

YARN
ResourceManager
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.period=10
resourcemanager.sink.ganglia.servers=10.10.114.120:8649
maptask.sink.ganglia.servers=10.10.114.120:8649
reducetask.sink.ganglia.servers=10.10.114.120:8649

NodeManager
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.period=10
nodemanager.sink.ganglia.servers=10.10.114.120:8649
maptask.sink.ganglia.servers=10.10.114.120:8649
reducetask.sink.ganglia.servers=10.10.114.120:8649
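
The HDFS and YARN fragments above share the same pattern and differ only in the daemon prefix, so they can be generated with a small helper. This is a sketch: the function name `ganglia_sink_props` is hypothetical, and it prints only the common three lines, not the extra `maptask` and `reducetask` entries.

```shell
#!/bin/sh
# Hypothetical helper: print the Ganglia sink settings for one daemon.
# $1 is the metrics prefix (for example namenode, datanode, nodemanager);
# $2 is the Ganglia server as host:port.
ganglia_sink_props() {
    prefix="$1"
    server="$2"
    cat <<EOF
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.period=10
${prefix}.sink.ganglia.servers=${server}
EOF
}

ganglia_sink_props namenode 10.10.114.120:8649
```

Redirect the output into hadoop-metrics2.properties for the matching role, then restart that daemon so the settings take effect.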

HBase
Master Server
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10
hbase.sink.ganglia.period=10
hbase.sink.ganglia.servers=10.10.114.120:8649

Region Server
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10
hbase.sink.ganglia.period=10
hbase.sink.ganglia.servers=10.10.114.120:8649

(Note: the RegionServer generates a very large number of metrics, because every region of every table emits its own set. Be careful when turning on the Ganglia Hadoop Metrics2 sink on an HBase RegionServer.)