Linux Monitoring

Skip the Prerequisites, Web and LPAR2RRD tabs when you are configuring the Virtual Appliance, Docker or a container.

HW sizing


Consider using our brand new full-stack infrastructure monitoring tool, XorMon Next Generation, as an LPAR2RRD replacement.
It brings a new level of infrastructure monitoring by relying on a modern technology stack.
In particular, reporting, exporting, alerting and presentation capabilities are unique on the market.

Follow installation procedure for your operating system platform

Install LPAR2RRD server (all under lpar2rrd user)

  • Download the latest LPAR2RRD server
    Upgrade your already running LPAR2RRD instance.

  • Install it:
    # su - lpar2rrd
    $ tar xvf lpar2rrd-7.XX.tar
    $ cd lpar2rrd-7.XX
    $ ./install.sh
    $ cd /home/lpar2rrd/lpar2rrd
    
  • Make sure all Perl modules are in place
    cd /home/lpar2rrd/lpar2rrd
    . etc/lpar2rrd.cfg; $PERL bin/perl_modules_check.pl
    
    If "LWP::Protocol::https" is reported as missing, check this documentation to fix it.

  • Enable Apache authorisation
    su - lpar2rrd
    umask 022
    cd /home/lpar2rrd/lpar2rrd
    cp html/.htaccess www
    cp html/.htaccess lpar2rrd-cgi
    
  • Schedule to run it from lpar2rrd crontab (it might already exist there)
    $ crontab -l | grep load.sh
    $
    
    Add the entry if it does not exist yet (empty output, as above)
    $ crontab -e
    
    # LPAR2RRD UI
    0,30 * * * * /home/lpar2rrd/lpar2rrd/load.sh > /home/lpar2rrd/lpar2rrd/load.out 2>&1 
    
    Make sure there is only one such entry in the crontab.

  • You might need to add the lpar2rrd user to /etc/cron.allow (Linux) or /var/adm/cron/cron.allow (AIX) if the 'crontab -e' command fails.
    Do this as the root user.
    # echo "lpar2rrd" >> /etc/cron.allow
    
  • Initial start from cmd line:
    $ cd /home/lpar2rrd/lpar2rrd
    $ ./load.sh
    
  • Go to the web UI: http://<your web server>/lpar2rrd/
    Use Ctrl-F5 to refresh the web browser cache.
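
The crontab step above can be made repeatable. The following is a minimal sketch, not part of the LPAR2RRD distribution: the function name ensure_cron_entry is hypothetical, and it assumes the default install path /home/lpar2rrd/lpar2rrd used above. It reads a crontab on standard input and appends the load.sh entry only if no line mentioning load.sh is present (the same test as the grep above, so a commented-out entry also counts as present).

```shell
#!/bin/sh
# Hypothetical helper (not shipped with LPAR2RRD): print the crontab given on
# stdin, appending the load.sh entry only when no load.sh line exists yet.
LOAD_ENTRY='0,30 * * * * /home/lpar2rrd/lpar2rrd/load.sh > /home/lpar2rrd/lpar2rrd/load.out 2>&1'

ensure_cron_entry() {
    current=$(cat)                      # crontab text from stdin
    if [ -n "$current" ]; then
        printf '%s\n' "$current"        # keep existing entries unchanged
    fi
    # Add the entry only if load.sh is not scheduled already
    if ! printf '%s\n' "$current" | grep -q 'load\.sh'; then
        printf '%s\n' '# LPAR2RRD UI' "$LOAD_ENTRY"
    fi
}

# Usage, as the lpar2rrd user (commented out so the sketch has no side effects):
# crontab -l 2>/dev/null | ensure_cron_entry | crontab -
```

Because the function only prints text, its output can be reviewed before it is piped into "crontab -", and running it a second time leaves the crontab unchanged.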

Troubleshooting

  • If you have any problems with the UI then check:
    (note that the path to Apache logs might be different, search apache logs in /var)
    tail /var/log/httpd/error_log             # Apache error log
    tail /var/log/httpd/access_log            # Apache access log
    tail /var/tmp/lpar2rrd-realt-error.log    # LPAR2RRD CGI-BIN log
    tail /var/tmp/systemd-private*/tmp/lpar2rrd-realt-error.log # LPAR2RRD CGI-BIN log when Linux has private temp enabled
    
  • Test of CGI-BIN setup
    umask 022
    cd /home/lpar2rrd/lpar2rrd/
    cp bin/test-healthcheck-cgi.sh lpar2rrd-cgi/
    
    Then open this URL in the web browser: http://<your web server>/lpar2rrd/test.html
    You should see your Apache, LPAR2RRD and operating system variables. If not, check the Apache logs for related errors.

Deploy the OS agent on each monitored server, whether stand-alone or virtualized.
The agent is written in Perl and calls basic OS commands such as vmstat and iostat to obtain the required statistics.

Additional information about the OS agent:

Prerequisites

  • Perl
  • Open TCP communication between each Linux host and the LPAR2RRD server on port 8162. Connections are initiated from the Linux side.
  • Additional disk space on the LPAR2RRD server (about 40 MB per monitored Linux host)
  • Preferably create a dedicated user lpar2rrd with minimal rights on each Linux host
    # useradd -c "LPAR2RRD agent user" -m lpar2rrd
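
The port prerequisite above can be verified from the Linux side before the agent is installed. This is a minimal sketch under stated assumptions: the function name check_port is hypothetical, the 5-second timeout is an arbitrary choice, and lpar2rrd.example.com is a placeholder hostname. It uses Perl's core IO::Socket::INET module, since Perl is already a prerequisite, which helps on hosts where telnet is not installed.

```shell
#!/bin/sh
# Hypothetical helper: test whether a TCP port accepts connections, using the
# Perl interpreter the agent already requires.
# Usage: check_port HOST [PORT]   (PORT defaults to 8162, the port named above)
check_port() {
    host=$1
    port=${2:-8162}
    if [ -z "$host" ]; then
        echo "usage: check_port HOST [PORT]" >&2
        return 2
    fi
    # Exit status 0 if the connection succeeds within 5 seconds, 1 otherwise
    perl -MIO::Socket::INET -e '
        my $s = IO::Socket::INET->new(PeerAddr => $ARGV[0], PeerPort => $ARGV[1],
                                      Proto => "tcp", Timeout => 5);
        exit($s ? 0 : 1);
    ' "$host" "$port"
}

# Example (placeholder hostname):
# check_port lpar2rrd.example.com && echo "port 8162 reachable"
```

A non-zero result points at a firewall, routing or DNS problem rather than at the agent itself.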
    

OS agent installation

  • Get the latest OS agent from download page

  • Linux installation under root
    # rpm -Uvh lpar2rrd-agent-6.00-0.noarch.rpm
    # rpm -qa|grep lpar2rrd-agent
      lpar2rrd-agent-6.00-0
    
  • Linux Debian
    # apt-get install ./lpar2rrd-agent_6.00-0_all.deb
      lpar2rrd-agent-6.00-0
    
  • Schedule its run every minute from the crontab on every Linux.
    This line must be placed into crontab for the lpar2rrd user (or any other user, preferably unprivileged):
    # su - lpar2rrd
    $ crontab -e 
    * * * * * /usr/bin/perl /opt/lpar2rrd-agent/lpar2rrd-agent.pl <LPAR2RRD-SERVER> > /var/tmp/lpar2rrd-agent.out 2>&1
    
    Replace <LPAR2RRD-SERVER> with the hostname of your LPAR2RRD server.

  • You might need to add the lpar2rrd user to /etc/cron.allow (or /var/adm/cron/cron.allow on AIX) as the root user if the above "crontab -e" fails.
    # echo "lpar2rrd" >> /etc/cron.allow
    
  • The Linux host will appear in the LPAR2RRD server UI in the "Linux" folder one hour later. Use Ctrl-F5 to refresh your web browser.
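
Once the cron job has run for a few minutes, a quick look at the agent's files shows whether it is collecting data. The following is a minimal sketch, assuming the default file locations listed in the Troubleshooting section that follows (/var/tmp/lpar2rrd-agent-*.txt and /var/tmp/lpar2rrd-agent-*.err). The function name agent_status and its optional directory argument are hypothetical, and treating a non-empty error log as a failure is an assumption made for this example.

```shell
#!/bin/sh
# Hypothetical helper: summarise the agent's files in a directory
# (default /var/tmp). Returns 1 if no data store file exists yet or
# if any error log is non-empty, 0 otherwise.
agent_status() {
    dir=${1:-/var/tmp}
    rc=0
    found=0
    for f in "$dir"/lpar2rrd-agent-*.txt; do
        [ -e "$f" ] || continue         # glob matched nothing
        found=1
        echo "data store: $f"
    done
    if [ "$found" -eq 0 ]; then
        echo "no data store file in $dir yet"
        rc=1
    fi
    for f in "$dir"/lpar2rrd-agent-*.err; do
        [ -s "$f" ] || continue         # skip missing or empty error logs
        echo "errors logged: $f"
        rc=1
    done
    return "$rc"
}

# Example: agent_status && echo "agent files look fine"
```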

Troubleshooting

  • Client (agent) side:
    • Test if communication through the LAN is allowed.
      $ telnet  <LPAR2RRD-SERVER> 8162
        Connected to 192.168.1.1.
        Escape character is '^]'.
      
      This is OK; exit with either Ctrl-C or ^].

    • Check the following agent files:
      data store: /var/tmp/lpar2rrd-agent-*.txt
      error log: /var/tmp/lpar2rrd-agent-*.err
      output log: /var/tmp/lpar2rrd-agent.out

    • Run the agent from the command line:
      $ /usr/bin/perl /opt/lpar2rrd-agent/lpar2rrd-agent.pl -d <LPAR2RRD-SERVER>
        ...
        Agent send     : yes : forced by -d 
        Agent send slp: sending wait: 4
        OS agent working for server: <LPAR2RRD-SERVER>
        store file for sending is /var/tmp/lpar2rrd-agent-<LPAR2RRD-SERVER>-lpar2rrd.txt
      
      This means that the data has been sent to the server and all is fine.
      Here is an example where the agent is not able to send data:
      $ /usr/bin/perl /opt/lpar2rrd-agent/lpar2rrd-agent.pl -d <LPAR2RRD-SERVER>
        ...
        Agent send     : yes : forced by -d 
        Agent send slp: sending wait: 1
        OS agent working for server: <LPAR2RRD-SERVER>
        store file for sending is /var/tmp/lpar2rrd-agent-<LPAR2RRD-SERVER>-lpar2rrd.txt
        Agent timed out after : 50 seconds /opt/lpar2rrd-agent/lpar2rrd-agent.pl:265
      
      It means that the agent could not contact the server.
      Check communication, port, above telnet example, DNS resolution of the server etc.

  • Server side:
    • Test whether the daemon on the LPAR2RRD server is running, and check the logs
      $ ps -ef|grep lpar2rrd-daemon
        lpar2rrd 10617010 1 0 Mar 16 - 0:00 /usr/bin/perl -w /home/lpar2rrd/lpar2rrd/bin/lpar2rrd-daemon.pl
      $ cd /home/lpar2rrd/lpar2rrd
      $ tail logs/error.log-daemon
      $ tail logs/daemon.out
        new server has been found and registered: Linux (lpar=linuxhost01)
        mkdir : /lpar2rrd/data/Linux/no_hmc/linuxhost01/
      
      This means that a new OS agent has been registered from linuxhost01 (Linux stand-alone example)

    • Test whether OS agent data is being stored on the LPAR2RRD server and has a current timestamp:
      $ cd /home/lpar2rrd/lpar2rrd
      $ ls -l data/<server name>/*/<Linux name>/*mmm
        -rw-r--r-- 2 lpar2rrd staff  7193736 Mar 17 16:16 data/<server name>/no_hmc/<Linux name>/cpu.mmm
        -rw-r--r-- 2 lpar2rrd staff  7193736 Mar 17 16:16 data/<server name>/no_hmc/<Linux name>/lan-en1.mmm
        -rw-r--r-- 2 lpar2rrd staff 10790264 Mar 17 16:16 data/<server name>/no_hmc/<Linux name>/mem.mmm
        -rw-r--r-- 2 lpar2rrd staff  7193736 Mar 17 16:16 data/<server name>/no_hmc/<Linux name>/pgs.mmm
        -rw-r--r-- 2 lpar2rrd staff  7193736 Mar 17 16:16 data/<server name>/no_hmc/<Linux name>/san-vscsi0.mmm
        -rw-r--r-- 2 lpar2rrd staff  3597208 Mar 17 16:16 data/<server name>/no_hmc/<Linux name>/san_resp-vscsi0.mmm
      $ find data -name mem.mmm -exec ls -l {} \;
        ...
      
  • In case of a problem, check our forum or contact us via support@lpar2rrd.com.
    We would need this data to start troubleshooting.
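
The server-side timestamp check above can also be automated. The following is a minimal sketch: the function name stale_mmm is hypothetical, the 60-minute default threshold is an arbitrary assumption to adjust to your environment, and it requires a find that supports -mmin (GNU findutils on Linux does; this may not hold on every platform). It lists the *.mmm files under the data directory that have not been modified recently, so empty output means that all files are current.

```shell
#!/bin/sh
# Hypothetical helper: list .mmm files under a data directory that have NOT
# been modified within the last N minutes (default 60).
# Usage: stale_mmm [DATA_DIR] [MINUTES]
stale_mmm() {
    data_dir=${1:-/home/lpar2rrd/lpar2rrd/data}
    minutes=${2:-60}
    # -mmin +N matches files last modified more than N minutes ago
    find "$data_dir" -type f -name '*.mmm' -mmin +"$minutes"
}

# Example, as the lpar2rrd user on the LPAR2RRD server:
# stale_mmm /home/lpar2rrd/lpar2rrd/data 60
```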