Alerting
LPAR2RRD has a built-in alerting feature. It is not yet implemented for VMware, only for IBM Power Systems.
You can define alarms based on performance data for any CPU pool or LPAR in your environment.
This feature is not intended to replace the standard alerting you already have in place.
Rather, it adds functionality aimed especially at critical CPU pools and LPARs.
There are 3 main reasons to use it:
- Possibility to monitor whole physical servers. This is something that usual monitoring tools are not able to provide for the IBM Power™ platform.
- Possibility to monitor CPU pools. As above, you can hardly find this in usual monitoring/alerting tools for the IBM Power™ platform.
- Alarming is based on "real physical consumption". Many monitoring tools are still not aware of virtualization and provide inaccurate data for CPU alerting/monitoring.
Supported notification methods:
- Emailing. You can place a direct email address on each directive, use email groups or use the default email address.
- Nagios support. You can configure Nagios to pick up alarms from LPAR2RRD via the standard NRPE module (see LPAR2RRD Nagios plug-in installation).
- External alerting via an external shell script. Each alert can invoke a defined script with given parameters. You can use it for your integration needs.
- SNMP trap. Implemented since version 4.96; follow the linked instructions to configure it.
- Alert plug-ins for other monitoring tools can be developed on demand, especially for customers under a support contract.
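As a rough illustration of the external shell script option, the sketch below is a minimal handler that only records whatever parameters it receives. The parameters LPAR2RRD passes are not listed here, so the script makes no assumption about their order or meaning; the function name log_alert, the variable ALERT_LOG and the log file path are hypothetical.

```shell
#!/bin/sh
# Hypothetical alert handler: appends a timestamp and every received
# parameter to a log file. ALERT_LOG and its default path are assumptions.
ALERT_LOG="${ALERT_LOG:-/tmp/lpar2rrd_alert.log}"

log_alert() {
    # "$*" joins all parameters into one space-separated string
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$ALERT_LOG"
}

# Only log when the script was actually called with parameters
if [ "$#" -gt 0 ]; then
    log_alert "$@"
fi
```

Logging the raw parameters first is a simple way to see what an alert actually passes before writing any real integration logic.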
The example shows a rule for the CPU pool of server p795 that issues an alert when CPU pool utilization exceeds 10 cores or drops below 1 core.
Limits can also be given as a percentage of the maximum CPU utilization that a CPU pool or LPAR can reach. For a CPU pool the maximum is the number of cores in the pool; for an LPAR it is the number of logical (virtual) CPUs.
This is supported since LPAR2RRD version 4.80.
The example shows a rule for the CPU pool of server p795 that issues an alert when CPU pool utilization exceeds 80% of the maximum utilization or drops below 5%.
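To make the percentage limits concrete, the following sketch converts a percentage into the equivalent number of cores. The helper name pct_to_cores and the pool size of 16 cores are assumptions chosen for illustration only; they are not taken from LPAR2RRD.

```shell
# pct_to_cores PERCENT MAX_CORES
# Prints the number of cores that PERCENT of MAX_CORES corresponds to.
pct_to_cores() {
    awk -v pct="$1" -v max="$2" 'BEGIN { printf "%.2f\n", pct * max / 100 }'
}

# Assumed example: a CPU pool with a maximum of 16 cores
pct_to_cores 80 16   # upper limit of 80% -> prints 12.80
pct_to_cores 5 16    # lower limit of 5%  -> prints 0.80
```

With these assumed numbers, the 80% / 5% rule would fire above 12.8 cores or below 0.8 cores.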
The following is configurable:
- CPU maximum and minimum for issuing an alert, in CPU cores or as a percentage (place a % sign after the value)
- Time of CPU peak: the alert is issued when the average CPU utilization over the given time is above the limit
- Email groups: you can create different email groups and direct alarms to them
- CPU warning as a percentage of the CPU critical alarm
- Alert retention: the time between repeated alerts for the same issue
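The "time of CPU peak" setting can be read as averaging the samples inside the peak window before comparing against the limit, so that a single short spike does not trigger an alert. The sketch below shows that reading; the function name peak_state and the sample values are hypothetical, and the real evaluation inside LPAR2RRD may differ.

```shell
# peak_state LIMIT SAMPLE...
# Prints "alert" when the mean of the samples is above LIMIT, otherwise "ok".
peak_state() {
    limit="$1"
    shift
    printf '%s\n' "$@" | awk -v limit="$limit" '
        { sum += $1; n++ }
        END { if (n > 0 && sum / n > limit) print "alert"; else print "ok" }'
}

# Assumed samples in cores, limit of 10 cores
peak_state 10 14 6 7     # one spike to 14, mean is 9     -> prints ok
peak_state 10 9 12 11    # sustained load, mean is ~10.67 -> prints alert
```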
How to start
- Create the configuration file
(the upgrade process creates the configuration file automatically, so you might skip this)
$ cd /home/lpar2rrd/lpar2rrd
$ ./scripts/update_cfg_alert.sh
It creates this configuration file: etc/alert.cfg
- Edit ./etc/alert.cfg and configure alerts
- Place the following entry into crontab:
0,10,20,30,40,50 * * * * /home/lpar2rrd/lpar2rrd/load_alert.sh > /home/lpar2rrd/lpar2rrd/load_alert.out 2>&1
- Check whether emailing works from the server hosting LPAR2RRD
Replace your_addr\@lpar2rrd.com with your email address and keep the "\" before "@":
perl -le 'print "To: your_addr\@lpar2rrd.com\nSubject: LPAR2RRD test\n\nJust a test\n\n"'|/usr/sbin/sendmail -t
When you want to refresh the list of servers/pools/LPARs within alert.cfg, just run the configuration script (./scripts/update_cfg_alert.sh) again.
Note: If hundreds of LPARs or CPU pools are configured for alerting, it might impact the performance of the HMCs.
After each big change in the alert configuration, run ./load_alert.sh from the command line to find out the typical run time (it is printed at the end).
The run time should not come close to the interval at which the script is scheduled in crontab (typically 10 minutes; we do not recommend less).
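The run-time check described in the note can be sketched as a small comparison between the measured duration and the crontab interval. The helper name runtime_check and the "less than half of the interval" margin are assumptions made for this sketch; the note itself only says the run time should not be too close to the interval.

```shell
# runtime_check ELAPSED_SECONDS INTERVAL_SECONDS
# Prints "ok" when the run took less than half of the scheduling interval
# (an assumed safety margin), otherwise "too slow".
runtime_check() {
    elapsed="$1"
    interval="$2"
    if [ "$elapsed" -lt $((interval / 2)) ]; then
        echo "ok"
    else
        echo "too slow"
    fi
}

# ELAPSED_SECONDS would be the duration printed at the end of ./load_alert.sh;
# 600 seconds corresponds to the 10-minute crontab schedule.
runtime_check 45 600     # prints ok
runtime_check 420 600    # prints too slow
```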
Once you have installed and configured the OS agents, you can configure alerting for paging activity.
If you do not use alerting yet, first configure it in general by following the Alerting install instructions.
Then edit etc/alert.cfg:
$ vi etc/alert.cfg
#SWAP:server:lpar name:swapping in kB/sec::peek time in min:alert repeat time in min:email group
#========================================================================================================================
SWAP:.*:.*:10:::email@example.com
The above example alerts for every server and LPAR if paging goes above 10 kB per second in a 10-minute average.
Alerts will be sent to firstname.lastname@example.org
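To show how such a rule line breaks down into its colon-separated fields, here is a small sketch that prints the server pattern, the LPAR pattern, the paging limit and the last field of a SWAP rule. The helper name describe_swap_rule is hypothetical, and the sketch only relies on the field positions visible in the example line above.

```shell
# describe_swap_rule RULE_LINE
# Splits a SWAP rule on ":" and prints the server pattern, the LPAR pattern,
# the paging limit in kB/s and the last field (the email address or group).
describe_swap_rule() {
    printf '%s\n' "$1" | awk -F: '$1 == "SWAP" {
        printf "server=%s lpar=%s limit=%s kB/s email=%s\n", $2, $3, $4, $NF
    }'
}

describe_swap_rule 'SWAP:.*:.*:10:::email@example.com'
# prints: server=.* lpar=.* limit=10 kB/s email=email@example.com
```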