AIX & Linux OS agent
The LPAR2RRD OS agent is for those who want additional metrics that can be obtained only at the operating system level.
Example graphs: CPU, CPU Queue, Memory, LAN, SAN, SAN IOPS, SAN RESP.
More examples on our demo site.
OS agent metrics and features
- AIX FC errors in graphs (FC physical adapters only)
- Linux: CPU core and CPU GHz graphs
- Linux: Total IOPS, Data and Latency graphs
- SAN multipath monitoring
- JOB TOP: CPU and memory tracking of running processes, graphed over time
- OS CPU utilization of user/sys/IO wait/idle in %
- CPU queue: load average, blocked processes / raw / direct IO
- Memory utilization of used/FS cache/free memory in MB
- Paging rate in MB/sec
- Paging space utilization in %
- SAN (FC & vSCSI) throughput per adapter
  - data in MB/sec
  - IO/sec
  - response time (latency)
- LAN (ethernet) throughput per adapter
  - data in MB/sec
  - packet count
- Total IO throughput (Linux only, v7.20+)
  - IOPS
  - Data in MB/sec
  - response time (latency)
- Filesystem capacity utilization
- AIX SEA (Shared Ethernet Adapter) throughput per adapter in MB/sec (IBM Power only)
- AIX WLM (Workload Manager) monitoring (IBM Power only)
- AIX AME (Active Memory Expansion) allocation (IBM Power only)
- Solaris Memory Pools
Operating systems
Implementation
The agent is implemented as a simple client/server application.
An LPAR2RRD daemon listens on port 8162 (the official IANA-assigned port for the LPAR2RRD project) on the host where the LPAR2RRD server runs.
Each LPAR has a simple Perl-based agent installed, which is started every minute from the crontab and saves memory and paging statistics into a temporary file.
The agent contacts the server every 15-25 minutes and sends all locally stored data for that period.
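The send interval above is a window rather than a fixed period. A randomized delay inside such a window can be sketched as follows. This is only an illustration and is not taken from lpar2rrd-agent.pl; the function name `pick_send_delay` and its arguments are hypothetical.

```sh
#!/bin/sh
# Illustrative only: not taken from lpar2rrd-agent.pl.
# pick_send_delay MIN MAX prints a random whole number of seconds in [MIN, MAX].
pick_send_delay() {
    min="$1"
    max="$2"
    # Guard against an inverted window.
    if [ "$max" -lt "$min" ]; then
        max="$min"
    fi
    # awk is used because $RANDOM is not available in every POSIX shell.
    awk -v min="$min" -v max="$max" \
        'BEGIN { srand(); print min + int(rand() * (max - min + 1)) }'
}

# Example: a delay between 15 and 25 minutes, expressed in seconds.
pick_send_delay 900 1500
```

Because the agent itself is started every minute, a delay like this would only decide in which of those runs the locally stored data is sent.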
Agent prerequisites
- Perl interpreter. Most Unix/Linux systems include Perl in their base installation.
- The agent can run under any user account; it does not need any special privileges in the OS.
- Open TCP communication between each LPAR and the LPAR2RRD server on port 8162.
- Connections are initiated from the LPARs.
- Additional disk space on the LPAR2RRD server (about 40 MB per monitored LPAR)
OS agent release notes
Usage
perl lpar2rrd-agent.pl [-s <step in seconds>] [-d] [-c] [-n <NMON_DIR>] [-b <HVMSH_PATH>] [-i <HVM IP address>] [-t <max send time in seconds>] [-m] <LPAR2RRD server hostname/IP>[:<PORT>]
-d forces sending out data immediately to check communication channel (DEBUG purposes)
-c agent collects & sends only internal HMC data
-n agent sends only NMON data from NMON directory <NMON_DIR>
-b path to Hitachi HvmSh API
-i IP address of HVM (Hitachi Virtualization Manager)
-t <max send time in seconds>
-s <step in seconds>, do not set it below 60; update the crontab line accordingly, e.g. -s 300 means */5 in the crontab minutes field
-m use sudo for multipath (only root can run it): "sudo multipath -l"; put this into sudoers: lpar2rrd ALL = (root) NOPASSWD: /usr/sbin/multipath -ll
options -c and -n are mutually exclusive
options -b and -i are both required for Hitachi agent
no option - agent collects & sends standard OS agent data
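The -s option has to stay in step with the crontab schedule (for example, -s 300 goes with */5 in the minutes field). The helper below is a minimal sketch of that mapping. The function name `step_to_cron_minute` is hypothetical and not part of the agent, and limiting the result to steps that divide an hour evenly is an assumption made here so that runs stay evenly spaced.

```sh
#!/bin/sh
# Hypothetical helper, not part of the agent: print the crontab minute
# field that matches a given -s step (in seconds).
step_to_cron_minute() {
    step="$1"
    # Accept plain non-negative integers only.
    case "$step" in
        ''|*[!0-9]*) echo "step must be a number of seconds" >&2; return 1 ;;
    esac
    # The usage text says not to set the step below 60 seconds.
    if [ "$step" -lt 60 ]; then
        echo "step must be at least 60 seconds" >&2
        return 1
    fi
    # Only whole minutes can be expressed in the crontab minute field.
    if [ $((step % 60)) -ne 0 ]; then
        echo "step must be a multiple of 60 seconds" >&2
        return 1
    fi
    minutes=$((step / 60))
    # Assumption: keep runs evenly spaced within the hour.
    if [ "$minutes" -gt 30 ] || [ $((60 % minutes)) -ne 0 ]; then
        echo "step in minutes must divide 60 and be at most 30" >&2
        return 1
    fi
    if [ "$minutes" -eq 1 ]; then
        echo "*"
    else
        echo "*/$minutes"
    fi
}

step_to_cron_minute 300
```

The printed value replaces the first field of the crontab entry, while the same number of seconds is passed to the agent with -s.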
Crontab entry for scheduling (preferably use a non-admin account):
* * * * * /usr/bin/perl /opt/lpar2rrd-agent/lpar2rrd-agent.pl <LPAR2RRD server hostname/IP> > /var/tmp/lpar2rrd-agent.out 2>&1
The agent collects data and sends it to the LPAR2RRD server every 5 - 20 minutes.
If you use a port other than the standard LPAR2RRD port, place it after the server name, separated by a ':' delimiter:
* * * * * /usr/bin/perl /opt/lpar2rrd-agent/lpar2rrd-agent.pl <LPAR2RRD server hostname/IP>:<PORT> > /var/tmp/lpar2rrd-agent.out 2>&1
If you want to send data to several LPAR2RRD server instances (the number is not restricted), list them all:
* * * * * /usr/bin/perl /opt/lpar2rrd-agent/lpar2rrd-agent.pl <LPAR2RRD server 1 hostname/IP> <LPAR2RRD server 2 hostname/IP> > /var/tmp/lpar2rrd-agent.out 2>&1
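The server argument follows a `host[:port]` convention, with 8162 as the default when no port is given. A minimal sketch of splitting such an argument is shown below, for example to feed the result into a separate connectivity check. The function name `split_server_spec` and the host name are hypothetical, and the sketch assumes a plain host name or IPv4 address (IPv6 literals contain colons and are not handled).

```sh
#!/bin/sh
# Hypothetical helper, not part of the agent: split "host[:port]" into
# host and port, falling back to the standard LPAR2RRD port.
DEFAULT_PORT=8162

split_server_spec() {
    spec="$1"
    case "$spec" in
        *:*)
            host="${spec%%:*}"   # everything before the first ':'
            port="${spec##*:}"   # everything after the last ':'
            ;;
        *)
            host="$spec"
            port="$DEFAULT_PORT"
            ;;
    esac
    echo "$host $port"
}

split_server_spec lpar2rrd.example.com
split_server_spec lpar2rrd.example.com:9000
```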
NMON usage
Documentation
/usr/bin/perl /opt/lpar2rrd-agent/lpar2rrd-agent.pl -n <NMON_DIR> <LPAR2RRD server hostname/IP> >/var/tmp/lpar2rrd-agent-nmon.out 2>&1
crontab usage: run it either every 10 minutes to process new data collected by nmon, or once a day to load the whole day at once
0,10,20,30,40,50 * * * * /usr/bin/perl /opt/lpar2rrd-agent/lpar2rrd-agent.pl -n <NMON_DIR> <LPAR2RRD server hostname/IP> > /var/tmp/lpar2rrd-agent-nmon.out 2>&1
-n option disables normal agent data collection, only NMON data is collected.
Use 2 separate crontab lines to get standard OS agent data & NMON data load
HMC usage
It works only for HMCs connected via the HMC CLI (ssh)
Documentation
. /home/lpar2rrd/lpar2rrd/etc/lpar2rrd.cfg; /usr/bin/perl /opt/lpar2rrd-agent/lpar2rrd-agent.pl -c <LPAR2RRD server hostname/IP> >/var/tmp/lpar2rrd-agent-hmc.out 2>&1
crontab usage: run it every 5 minutes
0,5,10,15,20,25,30,35,40,45,50,55 * * * * . /home/lpar2rrd/lpar2rrd/etc/lpar2rrd.cfg; /usr/bin/perl /opt/lpar2rrd-agent/lpar2rrd-agent.pl -c <LPAR2RRD server hostname/IP> > /var/tmp/lpar2rrd-agent-hmc.out 2>&1
-c option disables normal agent data collection, only internal HMC data is collected.
Use 2 separate crontab lines to get standard OS agent data & HMC data load
Notice: with option -c the agent needs some environment information; change the path to the etc/lpar2rrd.cfg file according to your installation
Hitachi Compute Blade (BladeSymphony) usage
Place it into crontab:
* * * * * /usr/bin/perl /opt/lpar2rrd-agent/lpar2rrd-agent.pl -b <HVMSH_PATH> -i <HVM IP address> <LPAR2RRD server hostname/IP> > /var/tmp/lpar2rrd-agent.out 2>&1
Use 2 separate crontab lines to get standard OS agent data & HITACHI data load
Enhanced setting
- By default, the agent sends data to the LPAR2RRD server at a random time between 5 and 20 minutes.
You can specify the maximum time after which data is sent; the minimum is 5 minutes.
* * * * * /usr/bin/perl /opt/lpar2rrd-agent/lpar2rrd-agent.pl -t <max send time in seconds> <LPAR2RRD server hostname/IP>
- To skip SAN checks via fcstat (they might cause problems, although this should not happen in v4.50+), set FCSTAT=/bin/true:
* * * * * FCSTAT=/bin/true /usr/bin/perl /opt/lpar2rrd-agent/lpar2rrd-agent.pl <LPAR2RRD server hostname/IP> > /var/tmp/lpar2rrd-agent.out 2>&1
- By default, only interfaces that have an IP address assigned are reported. Setting the LPAR2RRD_LAN_INT environment variable skips this check and selects interfaces by name instead; regular expressions are supported only on Linux. Be careful not to stack interfaces from different virtualization levels in one graph, since that would inflate the presented traffic by counting some of it more than once.
* * * * * LPAR2RRD_LAN_INT="eth.*0$,bond.*,rhevm,9.*" /usr/bin/perl /opt/lpar2rrd-agent/lpar2rrd-agent.pl <LPAR2RRD server hostname/IP> > /var/tmp/lpar2rrd-agent.out 2>&1
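Before putting a pattern list into LPAR2RRD_LAN_INT, it can help to preview which interface names it would select. The sketch below does this with `grep -E`. It assumes the list is comma-separated and that each pattern is an unanchored extended regular expression; how the agent itself anchors or evaluates the patterns is not stated here, so treat the result as an approximation. The function name `match_interfaces` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper, not part of the agent: print the interface names
# that match a comma-separated list of regular expressions.
# Assumption: each pattern is an unanchored extended regex.
match_interfaces() {
    patterns="$1"
    shift
    # Turn "a,b,c" into the single alternation "a|b|c".
    regex=$(printf '%s\n' "$patterns" | sed 's/,/|/g')
    # One interface name per line, filtered by the combined regex.
    printf '%s\n' "$@" | grep -E -- "$regex"
}

match_interfaces 'eth.*0$,bond.*,rhevm,9.*' eth0 eth1 bond0 lo rhevm eth10
```

With the sample names above, eth1 and lo are filtered out, while eth10 is kept because it also ends in 0. A result like that is a hint to tighten the pattern if only eth0 was intended.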
Debug