Deploy System Monitoring

This role deploys comprehensive system monitoring infrastructure including NRPE (Nagios Remote Plugin Executor), custom monitoring scripts, and Centreon plugins.

Overview

The role handles host-specific configuration for Docker hosts, the Proxmox hypervisor, the OPNsense firewall, and the Centreon monitoring server. It installs packages, deploys custom check scripts, configures the NRPE daemon with appropriate permissions, and sets up service integrations such as HTTPS for the Centreon Apache server and scheduled enabling and disabling of the Proxmox PBS storage.

Purpose

  • Centralized Monitoring: Deploy NRPE for Centreon to monitor all infrastructure
  • Custom Checks: Deploy specialized monitoring scripts for Docker, Proxmox, OPNsense
  • Docker Monitoring: Container status, uptime, and health checks
  • Proxmox Monitoring: CPU temperature, SMART disk health
  • Centreon Configuration: Apache HTTPS, PHP settings, security headers
  • PBS Scheduling: Automated Proxmox Backup Server enable/disable scheduling
  • Security: Sudoers configuration for privileged monitoring operations
  • Multi-OS Support: Handles RedHat and Debian package differences

Requirements

  • Ansible 2.9 or higher
  • Collection: community.general (for cpanm module)
  • Target systems: Docker (RedHat), Proxmox (Debian), Centreon (RedHat), OPNsense
  • Centreon server accessible on VLAN12
  • Proper sudo/root permissions
  • For Proxmox: Perl, lm-sensors, smartmontools
  • For Docker: Docker daemon running, Python for check scripts
  • For Centreon: Apache (httpd), PHP-FPM, SSL certificates deployed

What is NRPE?

NRPE (Nagios Remote Plugin Executor) is a monitoring agent that:

  • Runs on monitored hosts
  • Listens on port 5666 (TCP)
  • Executes local check commands
  • Returns results to monitoring server (Centreon)
  • Enables active checks of local resources

Role Variables

Required Variables

None. Every variable has a default or is auto-detected from inventory.

Optional Variables

| Variable | Default | Description |
| --- | --- | --- |
| deploy_system_monitoring_redhat_packages | See defaults | RedHat monitoring packages list |
| deploy_system_monitoring_debian_packages | See defaults | Debian monitoring packages list |
| deploy_system_monitoring_centreon_packages | See defaults | Centreon-specific packages |
| deploy_system_monitoring_nrpe_config_path | /etc/nagios/nrpe.cfg | NRPE configuration file path |
| deploy_system_monitoring_nrpe_service_name_redhat | nrpe | NRPE service name on RedHat |
| deploy_system_monitoring_nrpe_service_name_debian | nagios-nrpe-server | NRPE service name on Debian |
| deploy_system_monitoring_centreon_ip | Auto-detected | Centreon server IP (VLAN12) |
| deploy_system_monitoring_docker_uptime_warning | 1800 | Docker container uptime warning (seconds) |
| deploy_system_monitoring_docker_uptime_critical | 900 | Docker container uptime critical (seconds) |
| deploy_system_monitoring_proxmox_cpu_temp_warning | 70 | CPU temperature warning (°C) |
| deploy_system_monitoring_proxmox_cpu_temp_critical | 80 | CPU temperature critical (°C) |
| deploy_system_monitoring_proxmox_cpu_sensor | "Package id 0" | CPU sensor identifier |
| deploy_system_monitoring_proxmox_smart_disks | See defaults | List of disks for SMART monitoring |
| deploy_system_monitoring_centreon_ssl_cert_path | /etc/pki/tls/certs/centreon.crt | Centreon SSL certificate path |
| deploy_system_monitoring_centreon_ssl_key_path | /etc/pki/tls/private/centreon.key | Centreon SSL key path |
| deploy_system_monitoring_centreon_ca_cert_path | /etc/pki/tls/certs/opnsense_ca.pem | CA certificate for PHP |
| deploy_system_monitoring_proxmox_pbs_storage_name | hp-pbs | PBS storage name in Proxmox |
| deploy_system_monitoring_proxmox_pbs_enable_hour | 1 | Hour to enable PBS storage (1:10 AM) |
| deploy_system_monitoring_proxmox_pbs_enable_minute | 10 | Minute to enable PBS storage |
| deploy_system_monitoring_proxmox_pbs_disable_hour | 2 | Hour to disable PBS storage (2:20 AM) |
| deploy_system_monitoring_proxmox_pbs_disable_minute | 20 | Minute to disable PBS storage |

Variable Details

deploy_system_monitoring_redhat_packages

Packages installed on RedHat-based systems (Docker, Centreon):

deploy_system_monitoring_redhat_packages:
  - nrpe
  - nagios-plugins
  - perl-App-cpanminus

deploy_system_monitoring_debian_packages

Packages installed on Debian-based systems (Proxmox):

deploy_system_monitoring_debian_packages:
  - monitoring-plugins
  - nagios-nrpe-server
  - cpanminus

deploy_system_monitoring_proxmox_smart_disks

List of disks to monitor with SMART:

deploy_system_monitoring_proxmox_smart_disks:
  - device: /dev/sda
    interface: ata
  - device: /dev/sdb
    interface: ata
  - device: /dev/sdc
    interface: ata

Interface types: ata, scsi, nvme, sat

deploy_system_monitoring_centreon_ip

Auto-detected from inventory:

deploy_system_monitoring_centreon_ip: "{{ hostvars['centreon']['ip_vlan12'] }}"

This configures NRPE allowed_hosts to permit Centreon connections.

Dependencies

This role has no dependencies on other Ansible roles, but requires:

  • Inventory variables: ip_vlan12 for each host
  • SSL certificates: For Centreon Apache (deployed by deploy_ssl_certificates role)
  • Docker daemon: For Docker host monitoring
  • Smartmontools: For Proxmox disk monitoring (installed by role)
  • lm-sensors: For Proxmox temperature monitoring (installed by role)

Example Playbook

Basic Usage

---
- name: Deploy System Monitoring to All Hosts
  hosts: all
  become: true

  roles:
    - deploy_system_monitoring

Deploy to Specific Hosts

---
- name: Deploy Monitoring to Docker and Proxmox
  hosts: docker,proxmox
  become: true

  roles:
    - deploy_system_monitoring

With Custom Temperature Thresholds

---
- name: Deploy Proxmox Monitoring with Custom Thresholds
  hosts: proxmox
  become: true

  vars:
    deploy_system_monitoring_proxmox_cpu_temp_warning: 75
    deploy_system_monitoring_proxmox_cpu_temp_critical: 85

  roles:
    - deploy_system_monitoring
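
When overriding thresholds, note that the two threshold pairs run in opposite directions: temperature alerts fire as the reading rises, while Docker uptime alerts fire when uptime is short. The following is a minimal sketch of a sanity check for overrides. It is not part of the role, and `validate_thresholds` is a hypothetical helper name.

```python
def validate_thresholds(temp_warning, temp_critical, uptime_warning, uptime_critical):
    """Return a list of problems found in a set of threshold overrides.

    Temperature: warning must be lower than critical (alerts on high values).
    Docker uptime: warning must be higher than critical (alerts on low values).
    """
    problems = []
    if temp_warning >= temp_critical:
        problems.append("CPU temperature warning must be lower than critical")
    if uptime_warning <= uptime_critical:
        problems.append("Docker uptime warning must be higher than critical")
    return problems


if __name__ == "__main__":
    # Temperature values from the playbook above, uptime values from the defaults.
    print(validate_thresholds(75, 85, 1800, 900))
```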

What This Role Does

For All Monitored Hosts (Docker, Proxmox)

  1. Installs monitoring packages (NRPE, Nagios plugins)
  2. Configures NRPE to bind to VLAN12 IP
  3. Sets allowed_hosts to permit Centreon server
  4. Creates /etc/nrpe.d/ directory for host-specific commands
  5. Deploys custom monitoring scripts to appropriate directories
  6. Enables and starts NRPE service

For Docker Host

  1. Installs RedHat packages (nrpe, nagios-plugins, perl-App-cpanminus)
  2. Deploys check_docker.py script to /usr/lib64/nagios/plugins/
  3. Deploys Docker NRPE commands configuration to /etc/nrpe.d/docker_commands.cfg
  4. Adds nrpe user to docker group for container access
  5. Restarts NRPE service to apply changes

Docker NRPE Commands:

  • check_docker_containers: Check container status
  • check_docker_uptime: Check container uptime

For Proxmox Host

  1. Installs Debian packages (monitoring-plugins, nagios-nrpe-server, cpanminus)
  2. Installs Perl modules (Config::Tiny) via cpanm
  3. Deploys check_temp.sh script to /usr/lib/nagios/plugins/
  4. Deploys check_smart.pl script to /usr/lib/nagios/plugins/
  5. Deploys Proxmox NRPE commands configuration to /etc/nrpe.d/proxmox_commands.cfg
  6. Configures sudoers for nagios user to run smartctl without password
  7. Creates PBS enable/disable scripts and cron schedules
  8. Restarts NRPE service to apply changes

Proxmox NRPE Commands:

  • check_cpu_temp: Monitor CPU temperature
  • check_smart_sda: Check SMART health for /dev/sda
  • check_smart_sdb: Check SMART health for /dev/sdb
  • check_smart_sdc: Check SMART health for /dev/sdc

For Centreon Server

  1. Installs Centreon NRPE plugin (centreon-nrpe3-plugin)
  2. Deploys Centreon custom scripts to /usr/lib/centreon/plugins/
  3. Configures Apache for HTTPS with SSL certificates
  4. Configures PHP settings (session, memory, time limits, CA certificate)
  5. Disables Apache autoindex module (security)
  6. Restarts httpd and php-fpm services

For OPNsense Firewall

  1. Deploys custom monitoring scripts to /usr/local/libexec/nagios/

Note: OPNsense uses its built-in NRPE, so the role only deploys scripts there.

NRPE Configuration

Network Configuration

NRPE is configured to:

  • Bind to VLAN12 IP: server_address={{ ip_vlan12 }}
  • Allow Centreon: allowed_hosts=127.0.0.1,::1,{{ centreon_ip }}
  • Listen on port 5666: Default NRPE port (TCP)
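
To make the resulting settings concrete, here is a minimal sketch that renders the three network directives for nrpe.cfg. It assumes the role's template produces lines of this shape; `render_nrpe_network` and the example addresses are hypothetical, while `server_address`, `server_port`, and `allowed_hosts` are standard NRPE directives.

```python
import ipaddress


def render_nrpe_network(bind_ip, centreon_ip, port=5666):
    """Render the nrpe.cfg network directives described above."""
    # ip_address() raises ValueError on malformed input, which catches typos early.
    bind = ipaddress.ip_address(bind_ip)
    centreon = ipaddress.ip_address(centreon_ip)
    return "\n".join([
        f"server_address={bind}",
        f"server_port={port}",
        f"allowed_hosts=127.0.0.1,::1,{centreon}",
    ])


if __name__ == "__main__":
    # 10.0.12.20 and 10.0.12.5 are placeholder VLAN12 addresses.
    print(render_nrpe_network("10.0.12.20", "10.0.12.5"))
```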

Command Configuration

Host-specific commands are defined in /etc/nrpe.d/:

Docker (/etc/nrpe.d/docker_commands.cfg):

command[check_docker_containers]=/usr/lib64/nagios/plugins/check_docker.py --status running
command[check_docker_uptime]=/usr/lib64/nagios/plugins/check_docker.py --uptime --warning 1800 --critical 900

Proxmox (/etc/nrpe.d/proxmox_commands.cfg):

command[check_cpu_temp]=/usr/lib/nagios/plugins/check_temp.sh --sensor "Package id 0" --warning 70 --critical 80
command[check_smart_sda]=sudo /usr/sbin/smartctl -H -d ata /dev/sda
command[check_smart_sdb]=sudo /usr/sbin/smartctl -H -d ata /dev/sdb
command[check_smart_sdc]=sudo /usr/sbin/smartctl -H -d ata /dev/sdc
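
The SMART commands mirror the entries in deploy_system_monitoring_proxmox_smart_disks. A minimal sketch of that mapping follows, assuming the command name is derived from the device basename. `render_smart_commands` is a hypothetical helper, not the role's actual template.

```python
def render_smart_commands(disks, smartctl="/usr/sbin/smartctl"):
    """Build one NRPE command line per entry in the SMART disk list."""
    lines = []
    for disk in disks:
        device = disk["device"]            # e.g. /dev/sda
        name = device.rsplit("/", 1)[-1]   # e.g. sda
        interface = disk["interface"]      # e.g. ata
        lines.append(
            f"command[check_smart_{name}]=sudo {smartctl} -H -d {interface} {device}"
        )
    return lines


if __name__ == "__main__":
    disks = [
        {"device": "/dev/sda", "interface": "ata"},
        {"device": "/dev/sdb", "interface": "ata"},
    ]
    print("\n".join(render_smart_commands(disks)))
```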

Custom Monitoring Scripts

Docker: check_docker.py

Purpose: Monitor Docker containers

Capabilities:

  • Check container status (running, stopped, exited)
  • Check container uptime (a very short uptime suggests a restart loop)
  • List all containers with status
  • Nagios plugin format (exit codes 0/1/2/3)

Usage:

# Check all containers are running
/usr/lib64/nagios/plugins/check_docker.py --status running

# Check uptime (warn if container restarted recently)
/usr/lib64/nagios/plugins/check_docker.py --uptime --warning 1800 --critical 900

Thresholds:

  • Warning: Container uptime < 30 minutes (possible restart loop)
  • Critical: Container uptime < 15 minutes (frequent restarts)
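
The threshold logic can be sketched as follows. This illustrates the comparison described above and is not the source of check_docker.py; `evaluate_uptime` and the message wording are assumptions.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # Nagios plugin exit codes


def evaluate_uptime(uptime_seconds, warning=1800, critical=900):
    """Map a container uptime in seconds to (exit_code, status_line)."""
    if uptime_seconds < 0:
        return UNKNOWN, "UNKNOWN: negative uptime"
    # Check the stricter (critical) bound first: lower uptime is worse.
    if uptime_seconds < critical:
        return CRITICAL, f"CRITICAL: uptime {uptime_seconds}s is below {critical}s"
    if uptime_seconds < warning:
        return WARNING, f"WARNING: uptime {uptime_seconds}s is below {warning}s"
    return OK, f"OK: uptime {uptime_seconds}s"


if __name__ == "__main__":
    for seconds in (600, 1200, 3600):
        print(evaluate_uptime(seconds))
```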

Proxmox: check_temp.sh

Purpose: Monitor CPU temperature via lm-sensors

Capabilities:

  • Read temperature from specific sensor
  • Compare against warning/critical thresholds
  • Nagios plugin format

Usage:

# Check CPU package temperature
/usr/lib/nagios/plugins/check_temp.sh --sensor "Package id 0" --warning 70 --critical 80

Sensor detection:

# List available sensors
sensors
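
For illustration, the parsing and comparison that such a check performs can be sketched in Python. check_temp.sh itself is a shell script whose source is not shown here, so the function names, the sample `sensors` output, and the use of `>=` for threshold comparison are all assumptions.

```python
import re


def read_sensor_temp(sensors_output, sensor="Package id 0"):
    """Extract the reading in °C for one labelled sensor, or None if absent."""
    pattern = re.compile(
        r"^\s*" + re.escape(sensor) + r":\s*([+-]?\d+(?:\.\d+)?)", re.MULTILINE
    )
    match = pattern.search(sensors_output)
    return float(match.group(1)) if match else None


def check_temp(sensors_output, sensor="Package id 0", warning=70, critical=80):
    """Return (exit_code, message) following the Nagios plugin convention."""
    temp = read_sensor_temp(sensors_output, sensor)
    if temp is None:
        return 3, f"UNKNOWN: sensor '{sensor}' not found"
    if temp >= critical:
        return 2, f"CRITICAL: CPU temperature is {temp:.0f}°C"
    if temp >= warning:
        return 1, f"WARNING: CPU temperature is {temp:.0f}°C"
    return 0, f"OK: CPU temperature is {temp:.0f}°C"


# Illustrative output in the style of `sensors`; real output varies by hardware.
SAMPLE = """coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +45.0°C  (high = +80.0°C, crit = +100.0°C)
Core 0:        +43.0°C  (high = +80.0°C, crit = +100.0°C)
"""

if __name__ == "__main__":
    print(check_temp(SAMPLE))
```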

Proxmox: check_smart.pl

Purpose: Monitor disk health via SMART attributes

Capabilities:

  • Check SMART health status
  • Monitor specific SMART attributes (reallocated sectors, pending sectors)
  • Support multiple interfaces (ATA, SCSI, NVMe)
  • Nagios plugin format

Usage:

# Check SMART health
sudo /usr/sbin/smartctl -H -d ata /dev/sda

# Detailed SMART check with Perl script
/usr/lib/nagios/plugins/check_smart.pl -d /dev/sda -i ata

Requires: Sudoers entry for nagios user (configured by role)
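
The health line printed by smartctl -H can be translated into a Nagios status as sketched below. This is an illustration only: it does not reproduce check_smart.pl, and `smart_health_status` is a hypothetical helper.

```python
def smart_health_status(smartctl_output):
    """Translate `smartctl -H` output into (exit_code, message)."""
    for line in smartctl_output.splitlines():
        # ATA and NVMe devices print "... test result: PASSED" when healthy.
        if "overall-health self-assessment test result:" in line:
            result = line.split(":", 1)[1].strip()
            if result == "PASSED":
                return 0, "OK: SMART health PASSED"
            return 2, f"CRITICAL: SMART health {result}"
        # SCSI/SAS devices print "SMART Health Status: OK" when healthy.
        if line.startswith("SMART Health Status:"):
            result = line.split(":", 1)[1].strip()
            if result == "OK":
                return 0, "OK: SMART health OK"
            return 2, f"CRITICAL: SMART health {result}"
    return 3, "UNKNOWN: no SMART health line found"


if __name__ == "__main__":
    print(smart_health_status(
        "SMART overall-health self-assessment test result: PASSED"
    ))
```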

Sudoers Configuration for Proxmox

The role creates /etc/sudoers.d/nagios_smartctl:

nagios   ALL = NOPASSWD: /usr/sbin/smartctl

Purpose: Allow nagios user to run smartctl without password prompt

Security: Limited to specific command only (not full sudo access)

Centreon Apache HTTPS Configuration

Apache Modules

The role installs the following packages:

  • mod_ssl: HTTPS support
  • mod_security: Web application firewall
  • openssl: SSL/TLS library

Apache Configuration

File: /etc/httpd/conf.d/10-centreon.conf

Changes:

  • Configures SSL certificate paths
  • Sets SSL protocols and ciphers
  • Configures security headers
  • Redirects HTTP to HTTPS

PHP Configuration

File: /etc/php.d/50-centreon.ini

Changes:

  • Sets session.cookie_httponly = On
  • Sets session.cookie_secure = On
  • Increases memory_limit
  • Increases max_execution_time
  • Configures openssl.cafile for CA validation
  • Configures curl.cainfo for cURL requests

Security Headers

Apache configured to send:

  • Strict-Transport-Security: HSTS for HTTPS enforcement
  • X-Content-Type-Options: Prevent MIME sniffing
  • X-Frame-Options: Prevent clickjacking
  • X-XSS-Protection: Enable XSS filter
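
One way to confirm the headers after deployment is to compare a response's headers against this list. The sketch below assumes the header mapping has already been fetched; `missing_security_headers` is a hypothetical helper.

```python
REQUIRED_HEADERS = (
    "Strict-Transport-Security",
    "X-Content-Type-Options",
    "X-Frame-Options",
    "X-XSS-Protection",
)


def missing_security_headers(headers):
    """Return the required headers that are absent from a header mapping."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    present = {name.lower() for name in headers}
    return [name for name in REQUIRED_HEADERS if name.lower() not in present]


if __name__ == "__main__":
    sample = {"strict-transport-security": "max-age=31536000", "X-Frame-Options": "DENY"}
    print(missing_security_headers(sample))
```

The mapping could come from, for example, `dict(urllib.request.urlopen(url).headers.items())`; an empty result means all four headers are present.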

Proxmox PBS Storage Scheduling

The role configures automated scheduling for Proxmox Backup Server (PBS) storage:

Why Scheduling?

The PBS storage lives on an external device (HP ProLiant) that:

  • Powers on at 1:00 AM (via Wake-on-LAN or IPMI)
  • Runs backups
  • Powers off at 2:30 AM (to save energy and disk wear)

Proxmox must enable the storage after the server has powered on and before backups start, then disable it again before the server shuts down.

Cron Schedules

Enable PBS (1:10 AM):

10 1 * * * /usr/local/bin/enable-pbs.sh >> /var/log/pbs-schedule.log 2>&1

Disable PBS (2:20 AM):

20 2 * * * /usr/local/bin/disable-pbs.sh >> /var/log/pbs-schedule.log 2>&1
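
The two cron entries follow from the four schedule variables. A minimal sketch of that mapping is shown below. It assumes the enable and disable times fall on the same day, and `render_pbs_cron` is a hypothetical helper rather than the role's template.

```python
def render_pbs_cron(enable_hour=1, enable_minute=10,
                    disable_hour=2, disable_minute=20,
                    log="/var/log/pbs-schedule.log"):
    """Render the enable/disable cron entries from the schedule variables."""
    for hour, minute in ((enable_hour, enable_minute), (disable_hour, disable_minute)):
        if not (0 <= hour <= 23 and 0 <= minute <= 59):
            raise ValueError(f"invalid time {hour}:{minute}")
    # Assumption: the backup window does not wrap past midnight.
    if (enable_hour, enable_minute) >= (disable_hour, disable_minute):
        raise ValueError("enable time must precede disable time")
    return [
        f"{enable_minute} {enable_hour} * * * /usr/local/bin/enable-pbs.sh >> {log} 2>&1",
        f"{disable_minute} {disable_hour} * * * /usr/local/bin/disable-pbs.sh >> {log} 2>&1",
    ]


if __name__ == "__main__":
    print("\n".join(render_pbs_cron()))
```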

Scripts

enable-pbs.sh:

#!/bin/bash
pvesm set hp-pbs --disable 0
echo "$(date): PBS storage enabled"

disable-pbs.sh:

#!/bin/bash
pvesm set hp-pbs --disable 1
echo "$(date): PBS storage disabled"

Commands:

  • pvesm set <storage> --disable 0: Enable storage
  • pvesm set <storage> --disable 1: Disable storage

Directory Structure

RedHat Systems (Docker, Centreon)

/etc/nagios/
└── nrpe.cfg                      # Main NRPE config

/etc/nrpe.d/
└── docker_commands.cfg           # Docker-specific commands

/usr/lib64/nagios/plugins/
├── check_docker.py               # Docker monitoring script
└── (standard Nagios plugins)

/usr/lib/centreon/plugins/
└── (Centreon custom scripts)

Debian Systems (Proxmox)

/etc/nagios/
└── nrpe.cfg                      # Main NRPE config

/etc/nrpe.d/
└── proxmox_commands.cfg          # Proxmox-specific commands

/usr/lib/nagios/plugins/
├── check_temp.sh                 # CPU temperature check
├── check_smart.pl                # SMART disk health check
└── (standard Nagios plugins)

/etc/sudoers.d/
└── nagios_smartctl               # Sudoers for smartctl

/usr/local/bin/
├── enable-pbs.sh                 # PBS enable script
└── disable-pbs.sh                # PBS disable script

/var/log/
└── pbs-schedule.log              # PBS schedule log

Security Considerations

  • NRPE Access: allowed_hosts limits connections to localhost and the Centreon server IP
  • NRPE Commands: Defined in configuration, cannot execute arbitrary commands
  • Sudoers: Minimal privileges (nagios user can only run smartctl)
  • SSL/TLS: Centreon Apache configured with strong ciphers
  • HTTP Only Cookies: PHP session cookies protected
  • Secure Cookies: Cookies only sent over HTTPS
  • Security Headers: Protect against common web attacks
  • Service User: nrpe/nagios user has limited system access
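  • Docker Group: nrpe user is in the docker group so the check can query containers; note that docker group membership grants full control of the Docker daemon (effectively root-equivalent), not read-only access
  • No Passwords: NRPE relies on an IP allow-list (allowed_hosts), not password authentication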

Tags

This role does not define any tags. Use playbook-level tags if needed:

- hosts: all
  roles:
    - deploy_system_monitoring
  tags:
    - monitoring
    - nrpe
    - nagios
    - centreon

Notes

  • Role is host-aware and deploys appropriate configuration per host
  • NRPE service name differs by OS (nrpe vs nagios-nrpe-server)
  • Docker monitoring requires nrpe user in docker group
  • Proxmox SMART monitoring requires sudoers configuration
  • Centreon Apache config requires SSL certificates pre-deployed
  • PBS scheduling assumes external backup server with power schedule
  • Custom scripts deployed from files/{hostname}/ directories
  • Role includes handlers to restart services on configuration changes

Troubleshooting

NRPE service fails to start

Check configuration:

# Look for configuration errors reported when NRPE starts
journalctl -u nrpe -n 50 --no-pager  # RedHat; use nagios-nrpe-server on Debian

# Check service status
systemctl status nrpe  # RedHat
systemctl status nagios-nrpe-server  # Debian

Common issues:

  • Invalid command syntax in /etc/nrpe.d/ files
  • Missing plugin scripts
  • Incorrect file permissions

Centreon cannot connect to NRPE

Test NRPE connectivity:

# From Centreon server
/usr/lib64/nagios/plugins/check_nrpe -H docker_ip -c check_load

# From monitored host (should work from localhost)
/usr/lib64/nagios/plugins/check_nrpe -H 127.0.0.1 -c check_load

Check:

  • Firewall allows port 5666: firewall-cmd --list-ports
  • NRPE allowed_hosts includes Centreon IP
  • NRPE binding to correct interface: netstat -tlnp | grep 5666
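
When check_nrpe is not available on the machine you are testing from, a plain TCP probe can at least separate firewall or binding problems from NRPE-level ones. This is a minimal sketch, and `nrpe_port_open` is a hypothetical helper.

```python
import socket


def nrpe_port_open(host, port=5666, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print(nrpe_port_open("127.0.0.1"))
```

A successful connection only shows that the port is reachable. NRPE can still reject the request afterwards if the source address is not in allowed_hosts.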

Docker monitoring “Permission denied”

Check nrpe user in docker group:

groups nrpe
# Should show: nrpe : nrpe docker

# If not, add manually:
usermod -aG docker nrpe
systemctl restart nrpe

Proxmox SMART monitoring fails

Check sudoers:

# Test as nagios user
sudo -u nagios sudo /usr/sbin/smartctl -H /dev/sda

# Verify sudoers file
visudo -cf /etc/sudoers.d/nagios_smartctl

Common issues:

  • Sudoers file has syntax errors
  • Sudoers file has wrong permissions (must be 0440)
  • smartctl path incorrect (use which smartctl)
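
The permission requirement can be checked programmatically. A minimal sketch follows; `sudoers_mode_ok` is a hypothetical helper, and it checks only the mode bits, not ownership or syntax.

```python
import os
import stat


def sudoers_mode_ok(path, expected=0o440):
    """Return True if the file's permission bits equal the expected mode."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode == expected


if __name__ == "__main__":
    target = "/etc/sudoers.d/nagios_smartctl"
    if os.path.exists(target):
        print(sudoers_mode_ok(target))
```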

Proxmox temperature monitoring no data

Check sensors:

# List available sensors
sensors

# Check if lm-sensors installed
apt list --installed | grep lm-sensors

# Detect sensors
sensors-detect

Verify sensor name:

# Find your CPU sensor
sensors | grep -i package

# Update role variable if different
deploy_system_monitoring_proxmox_cpu_sensor: "Your Sensor Name"

PBS scheduling not working

Check cron jobs:

# List cron jobs for root
crontab -l

# Check script execution
/usr/local/bin/enable-pbs.sh
cat /var/log/pbs-schedule.log

Test manually:

# Enable storage
pvesm set hp-pbs --disable 0
pvesm status

# Disable storage
pvesm set hp-pbs --disable 1
pvesm status

Centreon Apache HTTPS issues

Check SSL certificate:

# Verify certificate files exist
ls -la /etc/pki/tls/certs/centreon.crt
ls -la /etc/pki/tls/private/centreon.key

# Test Apache config
httpd -t

# Check Apache error log
tail -f /var/log/httpd/error_log

Test HTTPS:

curl -vk https://centreon-ip/centreon

Testing Monitoring Checks

Test NRPE Commands Locally

On Docker host:

# Test Docker check
/usr/lib64/nagios/plugins/check_docker.py --status running

# Expected output:
# OK: All containers running (5 containers)
# Exit code: 0

# Test uptime check
/usr/lib64/nagios/plugins/check_docker.py --uptime --warning 1800 --critical 900

On Proxmox host:

# Test temperature check
/usr/lib/nagios/plugins/check_temp.sh --sensor "Package id 0" --warning 70 --critical 80

# Expected output:
# OK: CPU temperature is 45°C
# Exit code: 0

# Test SMART check
sudo /usr/sbin/smartctl -H /dev/sda

# Expected output:
# SMART overall-health self-assessment test result: PASSED
# Exit code: 0

Test NRPE from Centreon

# From Centreon server
/usr/lib64/nagios/plugins/check_nrpe -H docker_ip -c check_docker_containers
/usr/lib64/nagios/plugins/check_nrpe -H proxmox_ip -c check_cpu_temp
/usr/lib64/nagios/plugins/check_nrpe -H proxmox_ip -c check_smart_sda
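
To run several of these checks in one go, the calls can be wrapped in a small script. The sketch below assumes check_nrpe lives at the path used above; `build_argv` and `run_check` are hypothetical helpers, and the host address in the usage note is a placeholder.

```python
import subprocess

STATUS_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}
CHECK_NRPE = "/usr/lib64/nagios/plugins/check_nrpe"


def build_argv(host, command):
    """Build the check_nrpe argument list for one remote command."""
    return [CHECK_NRPE, "-H", host, "-c", command]


def run_check(host, command):
    """Run check_nrpe and return (status_name, first line of its output)."""
    result = subprocess.run(build_argv(host, command), capture_output=True, text=True)
    # Any exit code outside 0-3 is treated as UNKNOWN.
    status = STATUS_NAMES.get(result.returncode, "UNKNOWN")
    lines = result.stdout.strip().splitlines()
    return status, (lines[0] if lines else "")
```

For example, `run_check("10.0.12.20", "check_cpu_temp")` would return a pair such as a status name and the plugin's status line, provided check_nrpe is installed and the host is reachable.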

Best Practices

  1. Test on single host before deploying to all infrastructure
  2. Monitor NRPE logs after deployment: /var/log/messages or journalctl -u nrpe
  3. Verify Centreon connectivity from monitoring server
  4. Set appropriate thresholds for your hardware (CPU temp varies by CPU model)
  5. Schedule regular SMART tests (separate from monitoring checks)
  6. Rotate logs for PBS scheduling (/var/log/pbs-schedule.log)
  7. Document custom checks in Centreon with descriptions
  8. Test failure conditions (stop a container, load the CPU to raise its temperature, or temporarily lower a threshold so an alert fires)
  9. Keep monitoring scripts updated as infrastructure changes
  10. Review security headers in Centreon after deployment

Performance Considerations

  • NRPE daemon is lightweight (~1-2 MB RAM)
  • Check execution is fast (< 1 second typically)
  • Docker monitoring requires minimal overhead (reads Docker socket)
  • SMART checks can take 1-5 seconds per disk
  • Temperature checks are near-instant
  • Centreon Apache config doesn’t impact monitoring performance
  • PBS scheduling runs twice daily (minimal system impact)

Related Roles

This role is often used with:

  • deploy_ssl_certificates: Deploy SSL certs for Centreon Apache
  • deploy_network_configuration: Configure VLAN12 networking
  • docker_compositor: Deploy Docker containers to monitor
  • telegraf_agent: Deploy additional metrics collection (Telegraf → InfluxDB)

License

MIT

Author

Created for homelab infrastructure management.