Nagios Tuning

Posted in Nagios, Uncategorized

Welcome to another Sysadmin & DBA Tips. In this entry I’ll explain how to improve the performance of your Nagios installation when your monitoring system checks a large number (> 1,000) of hosts and services.


Introduction

As an installation grows, the number of hosts and services to check increases, and the resources consumed on the Nagios server increase with it.

Although having enough CPU power on a Nagios server is important, the biggest hardware limitation of a Nagios system is disk I/O. A large Nagios installation creates an enormous amount of disk activity, and if the hard disk can’t keep up with the constant flow of writes, even a large number of CPUs will simply sit waiting for the disk. This can cause check latencies to soar even though CPU usage appears to be within a safe range.
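One rough way to see whether the CPUs are waiting on the disk is to compare the iowait counter in /proc/stat between two samples. The following is only a sketch for Linux; the function names `cpu_ticks` and `iowait_percent` are made up for this example.

```shell
# Hypothetical helper: print "<total ticks> <iowait ticks>" from the
# aggregate "cpu" line of a /proc/stat-format file. The fields are:
# cpu user nice system idle iowait irq softirq steal ...
cpu_ticks() {
    awk '/^cpu / {
        t = 0
        for (i = 2; i <= 9 && i <= NF; i++) t += $i
        print t, $6
    }' "${1:-/proc/stat}"
}

# Hypothetical helper: percentage of CPU time spent in iowait between
# two samples. $1 $2 = total and iowait ticks of the first sample,
# $3 $4 = the same values for the second sample.
iowait_percent() {
    delta=$(( $3 - $1 ))
    if [ "$delta" -le 0 ]; then
        echo 0
        return
    fi
    echo $(( ($4 - $2) * 100 / delta ))
}

# Example on a live Linux system:
# first=$(cpu_ticks); sleep 5; second=$(cpu_ticks)
# iowait_percent $first $second
```

A value that stays high while overall CPU usage looks moderate would be consistent with the disk bottleneck described above.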

The most efficient way to solve this problem is to add a RAM disk to the local filesystem and keep the Nagios check results, object cache and host/service performance data on it.

The files affected are:

  • /usr/local/nagios/var/status.dat – This is the bread and butter file of all of the “live” information on the monitoring environment. This file gets updated every 10-20 seconds (as specified in nagios.cfg) with all current status information.
  • /usr/local/nagios/var/objects.cache – This file stores all of the object configuration data for Nagios. This file only gets updated upon a restart of the Nagios process.
  • /usr/local/nagios/var/host-perfdata & service-perfdata – These files may be in a different location on a Core install. They act as intermediary files for PNP’s NPCD daemon, which processes the performance data results, and they get updated about every 10-15 seconds.
  • /usr/local/nagios/var/spool – This directory tree acts as a dropbox for all incoming check results. The disk activity in this directory is almost constant, since both Nagios and NPCD are continually creating result files, and then reaping the results every X number of seconds.
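Since all of these files will have to fit in memory, it can help to check how much space they take today before choosing a RAM disk size. A minimal sketch, assuming the source-install paths listed above; the function name `nagios_footprint_kb` is made up for this example.

```shell
# Hypothetical helper: total size in KiB of the given files and
# directories. Paths that do not exist are skipped.
nagios_footprint_kb() {
    total=0
    for path in "$@"; do
        [ -e "$path" ] || continue
        # du -sk prints "<KiB><TAB><path>"; keep only the number
        size=$(du -sk "$path" | awk '{print $1}')
        total=$((total + size))
    done
    echo "$total"
}

# Example call (assumes the default /usr/local/nagios/var layout):
# nagios_footprint_kb /usr/local/nagios/var/status.dat \
#     /usr/local/nagios/var/objects.cache \
#     /usr/local/nagios/var/host-perfdata \
#     /usr/local/nagios/var/service-perfdata \
#     /usr/local/nagios/var/spool
```

The result is a snapshot only. The spool directory in particular grows and shrinks between reaper runs, so leave generous headroom.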

Another aspect to take into account is setting some Nagios core options, such as:

    • use_large_installation_tweaks – Enabling this option allows the Nagios daemon to take certain shortcuts which result in lower system load and better performance.
    • enable_environment_macros – This option determines whether or not the Nagios daemon will make all standard macros available as environment variables to your check, notification, event handler, etc. commands. In large Nagios installations this can be problematic because it takes additional memory and (more importantly) CPU to compute the values of all macros and make them available to the environment.
    • free_child_process_memory – This option determines whether or not Nagios will free memory in child processes when they are fork()ed off from the main process. By default, Nagios frees memory. However, if the use_large_installation_tweaks option is enabled, it will not. By defining this option in your configuration file, you are able to override things to get the behavior you want.
    • child_processes_fork_twice – This option determines whether or not Nagios will fork() child processes twice when it executes host and service checks. By default, Nagios fork()s twice. However, if the use_large_installation_tweaks option is enabled, it will only fork() once. By defining this option in your configuration file, you are able to override things to get the behavior you want.
    • check_result_reaper_frequency – This option allows you to control the frequency in seconds of check result “reaper” events. “Reaper” events process the results from host and service checks that have finished executing. These events constitute the core of the monitoring logic in Nagios.
    • max_check_result_reaper_time – This option allows you to control the maximum amount of time in seconds that host and service check result “reaper” events are allowed to run. “Reaper” events process the results from host and service checks that have finished executing. If there are a lot of results to process, reaper events may take a long time to finish, which might delay timely execution of new host and service checks. This variable allows you to limit the amount of time that an individual reaper event will run before it hands control back over to Nagios for other portions of the monitoring logic.
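Before changing anything, it is worth looking at which of these options are already set. A small sketch; the function name `show_tuning_options` is made up, and the config path in the example call assumes a source install.

```shell
# Hypothetical helper: print the tuning options that are explicitly
# set in a nagios.cfg-style file. Commented-out or absent options are
# not printed; for those Nagios uses its built-in default.
show_tuning_options() {
    cfg=$1
    grep -E '^(use_large_installation_tweaks|enable_environment_macros|free_child_process_memory|child_processes_fork_twice|check_result_reaper_frequency|max_check_result_reaper_time)=' "$cfg"
}

# Example call (path is an assumption):
# show_tuning_options /usr/local/nagios/etc/nagios.cfg
```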

Configuration

OK, with the explanation out of the way, it’s time to start tuning Nagios!

Creating the Ramdisk

The first thing to do is create the ramdisk and mount it in our file system:

mkdir -p /var/nagiosramdisk
mount -t tmpfs none /var/nagiosramdisk -o size=50m
mkdir -m 775 /var/nagiosramdisk/tmp
mkdir -p -m 775 /var/nagiosramdisk/spool/checkresults
mkdir -m 775 /var/nagiosramdisk/spool/perfdata
chown -R nagios:nagios /var/nagiosramdisk
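To confirm that the mount point really is backed by tmpfs, and not just an ordinary directory on disk, the mounts table can be checked. This is a sketch for Linux; the function name `fs_type_at` is made up for this example.

```shell
# Hypothetical helper: print the filesystem type mounted at a mount
# point, read from a table in /proc/mounts format
# (device, mount point, type, options, ...). If something is mounted
# over the same point more than once, the last entry wins.
fs_type_at() {
    mount_point=$1
    mounts_file=${2:-/proc/mounts}
    awk -v mp="$mount_point" '
        $2 == mp { t = $3 }
        END { if (t != "") print t }
    ' "$mounts_file"
}

# Example: should print "tmpfs" once the RAM disk is mounted
# fs_type_at /var/nagiosramdisk
```

If it prints nothing, the directory exists but nothing is mounted on it, and Nagios would still be writing to the hard disk.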

Nagios configuration

Edit the nagios.cfg configuration file to use the ramdisk and set some other performance options.

Set these options as follows:

use_large_installation_tweaks=1
enable_environment_macros=0
free_child_process_memory=0
child_processes_fork_twice=0
check_result_reaper_frequency=3
max_check_result_reaper_time=10
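If you would rather script these changes than edit the file by hand, something along these lines could work. It is a sketch that assumes GNU sed (for `sed -i`) and simple values without `|` or `&` characters; `set_nagios_option` is a made-up name.

```shell
# Hypothetical helper: set key=value in a nagios.cfg-style file.
# An existing uncommented line for the key is rewritten in place;
# otherwise the setting is appended at the end of the file.
set_nagios_option() {
    cfg=$1 key=$2 value=$3
    if grep -q "^${key}=" "$cfg"; then
        sed -i "s|^${key}=.*|${key}=${value}|" "$cfg"
    else
        printf '%s=%s\n' "$key" "$value" >> "$cfg"
    fi
}

# Example calls (config path is an assumption; back up the file first):
# set_nagios_option /usr/local/nagios/etc/nagios.cfg use_large_installation_tweaks 1
# set_nagios_option /usr/local/nagios/etc/nagios.cfg check_result_reaper_frequency 3
```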

Now define the new paths for your host and service performance data:

service_perfdata_file=/var/nagiosramdisk/service-perfdata
host_perfdata_file=/var/nagiosramdisk/host-perfdata

And the rest of the options:

object_cache_file=/var/nagiosramdisk/objects.cache
check_result_path=/var/nagiosramdisk/spool/checkresults
status_file=/var/nagiosramdisk/status.dat
temp_file=/var/nagiosramdisk/tmp/nagios.tmp
temp_path=/var/nagiosramdisk/tmp
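A typo in one of these paths means Nagios cannot write the file, so a quick check that every ramdisk path has an existing directory can save a failed start. The helper below is a sketch with a made-up name, `check_ramdisk_paths`. It treats settings ending in `_path` as directories and everything else as files.

```shell
# Hypothetical helper: for every setting whose value lies under the
# given prefix, check that the directory it needs already exists.
# *_path settings name a directory; all others name a file, so the
# parent directory is checked. Prints each problem found and returns
# non-zero if there was at least one.
check_ramdisk_paths() {
    cfg=$1 prefix=$2 missing=0
    while IFS='=' read -r key value; do
        case "$key" in
            \#*) continue ;;            # skip commented-out lines
        esac
        case "$value" in
            "$prefix"/*) ;;             # only look at ramdisk paths
            *) continue ;;
        esac
        case "$key" in
            *_path) dir=$value ;;
            *) dir=$(dirname "$value") ;;
        esac
        if [ ! -d "$dir" ]; then
            echo "missing directory for $key: $dir"
            missing=1
        fi
    done < "$cfg"
    return "$missing"
}

# Example call (config path is an assumption):
# check_ramdisk_paths /usr/local/nagios/etc/nagios.cfg /var/nagiosramdisk
```

Nagios has its own configuration check as well. On a source install it is usually run as /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg, and it is a good idea to run it before reloading.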

Commands definition

Because the host and service performance data paths have changed, the command definitions need to be adjusted. To do that, edit your commands.cfg and change:

###############
# Commands for PNP4
###############

define command{
       command_name    process-service-perfdata-file
       command_line    /bin/mv /var/nagiosramdisk/service-perfdata /var/nagiosramdisk/spool/perfdata/service-perfdata.$TIMET$
}

define command{
       command_name    process-host-perfdata-file
       command_line    /bin/mv /var/nagiosramdisk/host-perfdata /var/nagiosramdisk/spool/perfdata/host-perfdata.$TIMET$
}

############
# Nagios configuration
############

define command{
	command_name    process-host-perfdata
	command_line    /usr/bin/printf "%b" "$LASTHOSTCHECK$\t$HOSTNAME$\t$HOSTSTATE$\t$HOSTATTEMPT$\t$HOSTSTATETYPE$\t$HOSTEXECUTIONTIME$\t$HOSTOUTPUT$\t$HOSTPERFDATA$\n" >> /var/nagiosramdisk/host-perfdata.out
}

define command{
	command_name    process-service-perfdata
	command_line    /usr/bin/printf "%b" "$LASTSERVICECHECK$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICESTATE$\t$SERVICEATTEMPT$\t$SERVICESTATETYPE$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$\n" >> /var/nagiosramdisk/service-perfdata.out
}
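To get a feel for what the process-host-perfdata command writes, the macros can be replaced by hand with sample values. Everything in the string below is made up for illustration, and the sketch uses the shell’s own printf instead of /usr/bin/printf.

```shell
# Made-up sample values standing in for the Nagios macros
# $LASTHOSTCHECK$, $HOSTNAME$, $HOSTSTATE$, $HOSTATTEMPT$,
# $HOSTSTATETYPE$, $HOSTEXECUTIONTIME$, $HOSTOUTPUT$, $HOSTPERFDATA$.
# "%b" makes printf expand the \t and \n escapes in the argument.
sample_host_perfdata_line() {
    printf "%b" "1700000000\tweb01\tUP\t1\tHARD\t0.012\tPING OK\trta=0.5ms;100;500;0\n"
}

# Show the second tab-separated field (the host name):
# sample_host_perfdata_line | cut -f2
```

Each check result becomes one tab-separated line, which is why a busy installation appends to these files almost continuously.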

PNP4 configuration

In my case the graphs are created by PNP4, so edit your npcd.cfg and change the perfdata spool dir:

perfdata_spool_dir = /var/nagiosramdisk/spool/perfdata/

Restart the services

Finally, reload Nagios and restart NPCD for the changes to take effect:

/etc/init.d/nagios reload
/etc/init.d/npcd restart
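A simple sanity check afterwards is to see whether status.dat on the ramdisk is being refreshed. This sketch assumes GNU stat; `seconds_since_update` is a made-up name.

```shell
# Hypothetical helper: seconds since a file was last modified.
# Uses GNU stat, where -c %Y prints the modification time as
# seconds since the epoch.
seconds_since_update() {
    now=$(date +%s)
    mtime=$(stat -c %Y "$1") || return 1
    echo $((now - mtime))
}

# Example: with Nagios running, this should stay within the status
# update interval configured in nagios.cfg
# seconds_since_update /var/nagiosramdisk/status.dat
```

If the age keeps growing, Nagios is probably still writing to the old location or could not write to the new one, and the Nagios log is the next place to look.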

Making the RAM Disk Permanent

Because RAM disks are volatile, we have to ensure that the ramdisk and its directory structure are recreated when the system reboots.

RamDisk fstab

The first thing we have to do is add the ramdisk to our /etc/fstab file:

tmpfs           /var/nagiosramdisk                    tmpfs   defaults,size=50m 0 0
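If the entry is added from a script, it is worth making sure it is not added twice. A sketch with a made-up function name, `ensure_fstab_entry`; the file to edit is passed as an argument so it can be tried on a copy first.

```shell
# Hypothetical helper: append the tmpfs line to an fstab-format file
# unless an uncommented entry for the mount point already exists.
ensure_fstab_entry() {
    fstab=$1
    mount_point=/var/nagiosramdisk
    if awk -v mp="$mount_point" '
           $1 !~ /^#/ && $2 == mp { found = 1 }
           END { exit !found }
       ' "$fstab"; then
        return 0
    fi
    printf 'tmpfs\t%s\ttmpfs\tdefaults,size=50m\t0 0\n' "$mount_point" >> "$fstab"
}

# Example, trying it on a copy before touching the real file:
# cp /etc/fstab /tmp/fstab.test && ensure_fstab_entry /tmp/fstab.test
```

Once the line is in /etc/fstab, running mount /var/nagiosramdisk as root (with the ramdisk unmounted) is a way to test the entry without rebooting.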

Nagios Init.d

At system startup the /etc/init.d/nagios script is launched automatically, so we adjust this script to create the necessary directory structure:

[ ! -d /var/nagiosramdisk ] && mkdir -m 775 /var/nagiosramdisk
[ ! -d /var/nagiosramdisk/tmp ] && mkdir -p -m 775 /var/nagiosramdisk/tmp
[ ! -d /var/nagiosramdisk/spool/checkresults ] && mkdir -p -m 775 /var/nagiosramdisk/spool/checkresults
chown -R nagios:nagios /var/nagiosramdisk

PNP4 Init.d

As with the /etc/init.d/nagios script, we add the creation of the ramdisk directories to the NPCD init script:

[ ! -d /var/nagiosramdisk ] && mkdir -m 775 /var/nagiosramdisk
[ ! -d /var/nagiosramdisk/spool/perfdata ] && mkdir -p -m 775 /var/nagiosramdisk/spool/perfdata
chown -R nagios:nagios /var/nagiosramdisk
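The two snippets overlap, so one option is to keep the logic in a single function that both init scripts call. This is only a sketch; `ensure_ramdisk_dirs` is a made-up name, and where the function lives (for example a small file sourced by both scripts) is left open.

```shell
# Hypothetical helper: recreate the directory tree under the RAM disk.
# mkdir -p does nothing for directories that already exist, so the
# function is safe to call on every service start.
ensure_ramdisk_dirs() {
    base=${1:-/var/nagiosramdisk}
    for sub in tmp spool/checkresults spool/perfdata; do
        mkdir -p "$base/$sub" || return 1
        chmod 775 "$base/$sub"
    done
    # Changing ownership needs root, and the nagios user and group
    # are assumed to exist as in the commands above.
    if [ "$(id -u)" -eq 0 ] && id nagios >/dev/null 2>&1; then
        chown -R nagios:nagios "$base"
    fi
}

# Example call near the top of the start section of an init script:
# ensure_ramdisk_dirs /var/nagiosramdisk
```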

Check the changes

As the following Nagios stats confirm, these changes significantly improve the performance of our Nagios monitoring system.

[Image: Nagios performance comparison]

As we see, making use of local RAM disks can cause huge performance improvements on larger systems, or any system where check latencies are greater than 2 seconds.
