Deploy System Monitoring Role
Overview
This role deploys comprehensive system monitoring infrastructure including NRPE (Nagios Remote Plugin Executor), custom monitoring scripts, and Centreon plugins. It handles host-specific configurations for Docker containers, Proxmox hypervisor, OPNsense firewall, and Centreon monitoring server. The role installs packages, deploys custom check scripts, configures NRPE daemon with appropriate permissions, and sets up service integrations like Centreon Apache HTTPS and Proxmox PBS storage scheduling.
Purpose
- Centralized Monitoring: Deploy NRPE for Centreon to monitor all infrastructure
- Custom Checks: Deploy specialized monitoring scripts for Docker, Proxmox, OPNsense
- Docker Monitoring: Container status, uptime, and health checks
- Proxmox Monitoring: CPU temperature, SMART disk health
- Centreon Configuration: Apache HTTPS, PHP settings, security headers
- PBS Scheduling: Automated Proxmox Backup Server enable/disable scheduling
- Security: Sudoers configuration for privileged monitoring operations
- Multi-OS Support: Handles RedHat and Debian package differences
Requirements
- Ansible 2.9 or higher
- Collection: community.general (for cpanm module)
- Target systems: Docker (RedHat), Proxmox (Debian), Centreon (RedHat), OPNsense
- Centreon server accessible on VLAN12
- Proper sudo/root permissions
- For Proxmox: Perl, lm-sensors, smartmontools
- For Docker: Docker daemon running, Python for check scripts
- For Centreon: Apache (httpd), PHP-FPM, SSL certificates deployed
What is NRPE?
NRPE (Nagios Remote Plugin Executor) is a monitoring agent that:
- Runs on monitored hosts
- Listens on port 5666 (TCP)
- Executes local check commands
- Returns results to monitoring server (Centreon)
- Enables active checks of local resources
Role Variables
Required Variables
All required variables have defaults or are auto-detected from inventory.
Optional Variables
| Variable | Default | Description |
|---|---|---|
| deploy_system_monitoring_redhat_packages | See defaults | RedHat monitoring packages list |
| deploy_system_monitoring_debian_packages | See defaults | Debian monitoring packages list |
| deploy_system_monitoring_centreon_packages | See defaults | Centreon-specific packages |
| deploy_system_monitoring_nrpe_config_path | /etc/nagios/nrpe.cfg | NRPE configuration file path |
| deploy_system_monitoring_nrpe_service_name_redhat | nrpe | NRPE service name on RedHat |
| deploy_system_monitoring_nrpe_service_name_debian | nagios-nrpe-server | NRPE service name on Debian |
| deploy_system_monitoring_centreon_ip | Auto-detected | Centreon server IP (VLAN12) |
| deploy_system_monitoring_docker_uptime_warning | 1800 | Docker container uptime warning (seconds) |
| deploy_system_monitoring_docker_uptime_critical | 900 | Docker container uptime critical (seconds) |
| deploy_system_monitoring_proxmox_cpu_temp_warning | 70 | CPU temperature warning (°C) |
| deploy_system_monitoring_proxmox_cpu_temp_critical | 80 | CPU temperature critical (°C) |
| deploy_system_monitoring_proxmox_cpu_sensor | "Package id 0" | CPU sensor identifier |
| deploy_system_monitoring_proxmox_smart_disks | See defaults | List of disks for SMART monitoring |
| deploy_system_monitoring_centreon_ssl_cert_path | /etc/pki/tls/certs/centreon.crt | Centreon SSL certificate path |
| deploy_system_monitoring_centreon_ssl_key_path | /etc/pki/tls/private/centreon.key | Centreon SSL key path |
| deploy_system_monitoring_centreon_ca_cert_path | /etc/pki/tls/certs/opnsense_ca.pem | CA certificate for PHP |
| deploy_system_monitoring_proxmox_pbs_storage_name | hp-pbs | PBS storage name in Proxmox |
| deploy_system_monitoring_proxmox_pbs_enable_hour | 1 | Hour to enable PBS storage (1:10 AM) |
| deploy_system_monitoring_proxmox_pbs_enable_minute | 10 | Minute to enable PBS storage |
| deploy_system_monitoring_proxmox_pbs_disable_hour | 2 | Hour to disable PBS storage (2:20 AM) |
| deploy_system_monitoring_proxmox_pbs_disable_minute | 20 | Minute to disable PBS storage |
Variable Details
deploy_system_monitoring_redhat_packages
Packages installed on RedHat-based systems (Docker, Centreon):
deploy_system_monitoring_redhat_packages:
- nrpe
- nagios-plugins
- perl-App-cpanminus
deploy_system_monitoring_debian_packages
Packages installed on Debian-based systems (Proxmox):
deploy_system_monitoring_debian_packages:
- monitoring-plugins
- nagios-nrpe-server
- cpanminus
deploy_system_monitoring_proxmox_smart_disks
List of disks to monitor with SMART:
deploy_system_monitoring_proxmox_smart_disks:
  - device: /dev/sda
    interface: ata
  - device: /dev/sdb
    interface: ata
  - device: /dev/sdc
    interface: ata
Interface types: ata, scsi, nvme, sat
deploy_system_monitoring_centreon_ip
Auto-detected from inventory:
deploy_system_monitoring_centreon_ip: "{{ hostvars['centreon']['ip_vlan12'] }}"
This configures NRPE allowed_hosts to permit Centreon connections.
Dependencies
This role has no dependencies on other Ansible roles, but requires:
- Inventory variables: ip_vlan12 for each host
- SSL certificates: For Centreon Apache (deployed by deploy_ssl_certificates role)
- Docker daemon: For Docker host monitoring
- Smartmontools: For Proxmox disk monitoring (installed by role)
- lm-sensors: For Proxmox temperature monitoring (installed by role)
Example Playbook
Basic Usage
---
- name: Deploy System Monitoring to All Hosts
  hosts: all
  become: true
  roles:
    - deploy_system_monitoring
Deploy to Specific Hosts
---
- name: Deploy Monitoring to Docker and Proxmox
  hosts: docker,proxmox
  become: true
  roles:
    - deploy_system_monitoring
With Custom Temperature Thresholds
---
- name: Deploy Proxmox Monitoring with Custom Thresholds
  hosts: proxmox
  become: true
  vars:
    deploy_system_monitoring_proxmox_cpu_temp_warning: 75
    deploy_system_monitoring_proxmox_cpu_temp_critical: 85
  roles:
    - deploy_system_monitoring
What This Role Does
For All Monitored Hosts (Docker, Proxmox)
- Installs monitoring packages (NRPE, Nagios plugins)
- Configures NRPE to bind to VLAN12 IP
- Sets allowed_hosts to permit Centreon server
- Creates /etc/nrpe.d/ directory for host-specific commands
- Deploys custom monitoring scripts to appropriate directories
- Enables and starts NRPE service
For Docker Host
- Installs RedHat packages (nrpe, nagios-plugins, perl-App-cpanminus)
- Deploys check_docker.py script to /usr/lib64/nagios/plugins/
- Deploys Docker NRPE commands configuration to /etc/nrpe.d/docker_commands.cfg
- Adds nrpe user to docker group for container access
- Restarts NRPE service to apply changes
Docker NRPE Commands:
- check_docker_containers: Check container status
- check_docker_uptime: Check container uptime
For Proxmox Host
- Installs Debian packages (monitoring-plugins, nagios-nrpe-server, cpanminus)
- Installs Perl modules (Config::Tiny) via cpanm
- Deploys check_temp.sh script to /usr/lib/nagios/plugins/
- Deploys check_smart.pl script to /usr/lib/nagios/plugins/
- Deploys Proxmox NRPE commands configuration to /etc/nrpe.d/proxmox_commands.cfg
- Configures sudoers for nagios user to run smartctl without password
- Creates PBS enable/disable scripts and cron schedules
- Restarts NRPE service to apply changes
Proxmox NRPE Commands:
- check_cpu_temp: Monitor CPU temperature
- check_smart_sda: Check SMART health for /dev/sda
- check_smart_sdb: Check SMART health for /dev/sdb
- check_smart_sdc: Check SMART health for /dev/sdc
For Centreon Server
- Installs Centreon NRPE plugin (centreon-nrpe3-plugin)
- Deploys Centreon custom scripts to /usr/lib/centreon/plugins/
- Configures Apache for HTTPS with SSL certificates
- Configures PHP settings (session, memory, time limits, CA certificate)
- Disables Apache autoindex module (security)
- Restarts httpd and php-fpm services
For OPNsense Firewall
- Deploys custom monitoring scripts to /usr/local/libexec/nagios/
Note: OPNsense uses its built-in NRPE; the role only deploys scripts.
NRPE Configuration
Network Configuration
NRPE is configured to:
- Bind to VLAN12 IP: server_address={{ ip_vlan12 }}
- Allow Centreon: allowed_hosts=127.0.0.1,::1,{{ centreon_ip }}
- Listen on port 5666: Default NRPE port (TCP)
Command Configuration
Host-specific commands are defined in /etc/nrpe.d/:
Docker (/etc/nrpe.d/docker_commands.cfg):
command[check_docker_containers]=/usr/lib64/nagios/plugins/check_docker.py --status running
command[check_docker_uptime]=/usr/lib64/nagios/plugins/check_docker.py --uptime --warning 1800 --critical 900
Proxmox (/etc/nrpe.d/proxmox_commands.cfg):
command[check_cpu_temp]=/usr/lib/nagios/plugins/check_temp.sh --sensor "Package id 0" --warning 70 --critical 80
command[check_smart_sda]=sudo /usr/sbin/smartctl -H -d ata /dev/sda
command[check_smart_sdb]=sudo /usr/sbin/smartctl -H -d ata /dev/sdb
command[check_smart_sdc]=sudo /usr/sbin/smartctl -H -d ata /dev/sdc
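The SMART commands above follow directly from the deploy_system_monitoring_proxmox_smart_disks list. The Python sketch below shows that mapping under the assumption that each command name is derived from the last path component of the device; the function name smart_commands is hypothetical and this is not the role's actual template.

```python
from typing import Dict, List


def smart_commands(disks: List[Dict[str, str]]) -> List[str]:
    """Build one NRPE command line per disk entry (illustrative only)."""
    commands = []
    for disk in disks:
        device = disk["device"]
        name = device.rsplit("/", 1)[-1]  # "/dev/sda" -> "sda"
        commands.append(
            f"command[check_smart_{name}]="
            f"sudo /usr/sbin/smartctl -H -d {disk['interface']} {device}"
        )
    return commands
```

Passing the three default disks reproduces the check_smart_sda, check_smart_sdb, and check_smart_sdc lines listed above.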
Custom Monitoring Scripts
Docker: check_docker.py
Purpose: Monitor Docker containers
Capabilities:
- Check container status (running, stopped, exited)
- Check container uptime (warn if too short - restart loop)
- List all containers with status
- Nagios plugin format (exit codes 0/1/2/3)
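The Nagios plugin format mentioned above maps exit codes 0, 1, 2, and 3 to OK, WARNING, CRITICAL, and UNKNOWN. The following Python sketch shows how a container status check could produce such a result. It is not the code of check_docker.py; the names Status, check_containers, and plugin_output are hypothetical, and treating any non-running container as CRITICAL is an assumption.

```python
from enum import IntEnum
from typing import Dict, Tuple


class Status(IntEnum):
    """Exit codes defined by the Nagios plugin convention."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def check_containers(states: Dict[str, str], expected: str = "running") -> Tuple[Status, str]:
    """Compare each container's state with the expected one."""
    if not states:
        return Status.UNKNOWN, "no containers found"
    bad = sorted(name for name, state in states.items() if state != expected)
    if bad:
        return Status.CRITICAL, f"{len(bad)} container(s) not {expected}: {', '.join(bad)}"
    return Status.OK, f"All containers {expected} ({len(states)} containers)"


def plugin_output(status: Status, message: str) -> str:
    """Format the single status line shown by the monitoring server."""
    return f"{status.name}: {message}"
```

A real plugin would print the formatted line and then exit with int(status), so that NRPE can pass both the text and the exit code back to Centreon.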
Usage:
# Check all containers are running
/usr/lib64/nagios/plugins/check_docker.py --status running
# Check uptime (warn if container restarted recently)
/usr/lib64/nagios/plugins/check_docker.py --uptime --warning 1800 --critical 900
Thresholds:
- Warning: Container uptime < 30 minutes (possible restart loop)
- Critical: Container uptime < 15 minutes (frequent restarts)
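Unlike most checks, the uptime thresholds are inverted: a lower value is worse. A minimal Python sketch of that comparison follows, using the documented defaults of 1800 and 900 seconds. The function name classify_uptime is hypothetical, and the strict less-than comparison is taken from the threshold wording above.

```python
def classify_uptime(uptime_seconds: float, warning: int = 1800, critical: int = 900) -> int:
    """Return a Nagios exit code for a container uptime (lower uptime is worse)."""
    if uptime_seconds < critical:
        return 2  # CRITICAL: restarted within the last 15 minutes
    if uptime_seconds < warning:
        return 1  # WARNING: restarted within the last 30 minutes
    return 0      # OK: container has been up long enough
```

Note that the critical threshold must be smaller than the warning threshold for this ordering to make sense, which is why the defaults are 900 and 1800 rather than the other way around.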
Proxmox: check_temp.sh
Purpose: Monitor CPU temperature via lm-sensors
Capabilities:
- Read temperature from specific sensor
- Compare against warning/critical thresholds
- Nagios plugin format
Usage:
# Check CPU package temperature
/usr/lib/nagios/plugins/check_temp.sh --sensor "Package id 0" --warning 70 --critical 80
Sensor detection:
# List available sensors
sensors
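The read-and-compare logic described for check_temp.sh can be sketched in Python as follows. This is not the shell script itself. It assumes the usual lm-sensors line layout (for example "Package id 0:  +45.0°C  (high = ...)"), and it assumes that a reading equal to a threshold already triggers that level; the function names are hypothetical.

```python
import re
from typing import Optional


def read_temperature(sensors_output: str, sensor: str = "Package id 0") -> Optional[float]:
    """Extract the current reading for one sensor label from `sensors` output."""
    pattern = re.compile(
        rf"^{re.escape(sensor)}:\s*([+-]?\d+(?:\.\d+)?)\s*°C", re.MULTILINE
    )
    match = pattern.search(sensors_output)
    return float(match.group(1)) if match else None


def classify_temperature(temp: Optional[float], warning: float = 70, critical: float = 80) -> int:
    """Return a Nagios exit code for a temperature in degrees Celsius."""
    if temp is None:
        return 3  # UNKNOWN: sensor label not found
    if temp >= critical:
        return 2  # CRITICAL
    if temp >= warning:
        return 1  # WARNING
    return 0      # OK
```

If the sensor label does not appear in the output, the sketch reports UNKNOWN instead of OK, which makes a wrong deploy_system_monitoring_proxmox_cpu_sensor value visible in monitoring.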
Proxmox: check_smart.pl
Purpose: Monitor disk health via SMART attributes
Capabilities:
- Check SMART health status
- Monitor specific SMART attributes (reallocated sectors, pending sectors)
- Support multiple interfaces (ATA, SCSI, NVMe)
- Nagios plugin format
Usage:
# Check SMART health
sudo /usr/sbin/smartctl -H -d ata /dev/sda
# Detailed SMART check with Perl script
/usr/lib/nagios/plugins/check_smart.pl -d /dev/sda -i ata
Requires: Sudoers entry for nagios user (configured by role)
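For readers who want to see how a health result can be turned into a Nagios exit code, here is a small Python sketch that interprets the text printed by smartctl -H. It is not the logic of check_smart.pl. It only looks for the two health lines smartctl prints ("SMART overall-health self-assessment test result" for ATA and NVMe devices, "SMART Health Status" for SCSI devices), and the function name smart_health_status is hypothetical.

```python
def smart_health_status(smartctl_output: str) -> int:
    """Map `smartctl -H` output to a Nagios exit code (illustrative only)."""
    for line in smartctl_output.splitlines():
        line = line.strip()
        if "overall-health self-assessment test result" in line:
            # ATA/NVMe style: "...test result: PASSED"
            return 0 if line.endswith("PASSED") else 2
        if line.startswith("SMART Health Status:"):
            # SCSI style: "SMART Health Status: OK"
            return 0 if line.split(":", 1)[1].strip() == "OK" else 2
    return 3  # UNKNOWN: no health line found (for example, a permission error)
```

Returning UNKNOWN when no health line is present helps distinguish a failing disk from a sudoers or device-path problem.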
Sudoers Configuration for Proxmox
The role creates /etc/sudoers.d/nagios_smartctl:
nagios ALL = NOPASSWD: /usr/sbin/smartctl
Purpose: Allow nagios user to run smartctl without password prompt
Security: Limited to specific command only (not full sudo access)
Centreon Apache HTTPS Configuration
Apache Modules
The role installs:
- mod_ssl: HTTPS support
- mod_security: Web application firewall
- openssl: SSL/TLS library
Apache Configuration
File: /etc/httpd/conf.d/10-centreon.conf
Changes:
- Configures SSL certificate paths
- Sets SSL protocols and ciphers
- Configures security headers
- Redirects HTTP to HTTPS
PHP Configuration
File: /etc/php.d/50-centreon.ini
Changes:
- Sets session.cookie_httponly = On
- Sets session.cookie_secure = On
- Increases memory_limit
- Increases max_execution_time
- Configures openssl.cafile for CA validation
- Configures curl.cainfo for cURL requests
Security Headers
Apache configured to send:
- Strict-Transport-Security: HSTS for HTTPS enforcement
- X-Content-Type-Options: Prevent MIME sniffing
- X-Frame-Options: Prevent clickjacking
- X-XSS-Protection: Enable XSS filter
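After deployment, the presence of these four headers can be verified programmatically. The Python sketch below checks a mapping of response headers against the list; the function name missing_security_headers is hypothetical, and how the headers are fetched is left to whichever HTTP client is at hand.

```python
from typing import Dict, List

# The four headers this document says Apache is configured to send.
REQUIRED_HEADERS = (
    "Strict-Transport-Security",
    "X-Content-Type-Options",
    "X-Frame-Options",
    "X-XSS-Protection",
)


def missing_security_headers(headers: Dict[str, str]) -> List[str]:
    """Return the required headers absent from a response (case-insensitive)."""
    present = {name.lower() for name in headers}
    return [name for name in REQUIRED_HEADERS if name.lower() not in present]
```

An empty result means all four headers were found. Header names are compared case-insensitively because HTTP header names are not case-sensitive.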
Proxmox PBS Storage Scheduling
The role configures automated scheduling for Proxmox Backup Server (PBS) storage:
Why Scheduling?
The PBS storage resides on an external device (HP ProLiant) that:
- Powers on at 1:00 AM (via Wake-on-LAN or IPMI)
- Runs backups
- Powers off at 2:30 AM (to save energy and reduce disk wear)
Proxmox must enable the storage before backups start and disable it again before the device powers off.
Cron Schedules
Enable PBS (1:10 AM):
10 1 * * * /usr/local/bin/enable-pbs.sh >> /var/log/pbs-schedule.log 2>&1
Disable PBS (2:20 AM):
20 2 * * * /usr/local/bin/disable-pbs.sh >> /var/log/pbs-schedule.log 2>&1
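The two cron entries correspond to the deploy_system_monitoring_proxmox_pbs_* hour and minute variables. As a rough illustration, the Python sketch below builds such an entry from an hour and a minute and computes how long the storage stays enabled. The function names cron_line and enabled_window_minutes are hypothetical, and the sketch assumes both times fall on the same day.

```python
def cron_line(minute: int, hour: int, script: str,
              log: str = "/var/log/pbs-schedule.log") -> str:
    """Build a daily cron entry that appends script output to a log file."""
    if not (0 <= minute <= 59 and 0 <= hour <= 23):
        raise ValueError(f"invalid time {hour}:{minute}")
    return f"{minute} {hour} * * * {script} >> {log} 2>&1"


def enabled_window_minutes(enable_hour: int, enable_minute: int,
                           disable_hour: int, disable_minute: int) -> int:
    """Minutes between enabling and disabling the storage (same-day schedule)."""
    return (disable_hour * 60 + disable_minute) - (enable_hour * 60 + enable_minute)
```

With the defaults (enable at 1:10, disable at 2:20) the storage is enabled for 70 minutes, which sits inside the 1:00 to 2:30 power window of the backup device.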
Scripts
enable-pbs.sh:
#!/bin/bash
pvesm set hp-pbs --disable 0
echo "$(date): PBS storage enabled"
disable-pbs.sh:
#!/bin/bash
pvesm set hp-pbs --disable 1
echo "$(date): PBS storage disabled"
Commands:
- pvesm set <storage> --disable 0: Enable storage
- pvesm set <storage> --disable 1: Disable storage
Directory Structure
RedHat Systems (Docker, Centreon)
/etc/nagios/
├── nrpe.cfg # Main NRPE config
└── nrpe.d/
└── docker_commands.cfg # Docker-specific commands
/usr/lib64/nagios/plugins/
├── check_docker.py # Docker monitoring script
└── (standard Nagios plugins)
/usr/lib/centreon/plugins/
└── (Centreon custom scripts)
Debian Systems (Proxmox)
/etc/nagios/
├── nrpe.cfg # Main NRPE config
└── nrpe.d/
└── proxmox_commands.cfg # Proxmox-specific commands
/usr/lib/nagios/plugins/
├── check_temp.sh # CPU temperature check
├── check_smart.pl # SMART disk health check
└── (standard Nagios plugins)
/etc/sudoers.d/
└── nagios_smartctl # Sudoers for smartctl
/usr/local/bin/
├── enable-pbs.sh # PBS enable script
└── disable-pbs.sh # PBS disable script
/var/log/
└── pbs-schedule.log # PBS schedule log
Security Considerations
- NRPE Access: Limited to Centreon server IP only
- NRPE Commands: Defined in configuration, cannot execute arbitrary commands
- Sudoers: Minimal privileges (nagios user can only run smartctl)
- SSL/TLS: Centreon Apache configured with strong ciphers
- HttpOnly Cookies: PHP session cookies are not accessible to client-side scripts
- Secure Cookies: Cookies only sent over HTTPS
- Security Headers: Protect against common web attacks
- Service User: nrpe/nagios user has limited system access
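- Docker Group: nrpe user is in the docker group so checks can query the Docker socket; note that docker group membership grants full control of the Docker daemon (effectively root-equivalent), not read-only access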
- No Passwords: NRPE uses allow-list, not authentication
Tags
This role does not define any tags. Use playbook-level tags if needed:
- hosts: all
  roles:
    - deploy_system_monitoring
  tags:
    - monitoring
    - nrpe
    - nagios
    - centreon
Notes
- Role is host-aware and deploys appropriate configuration per host
- NRPE service name differs by OS (nrpe vs nagios-nrpe-server)
- Docker monitoring requires nrpe user in docker group
- Proxmox SMART monitoring requires sudoers configuration
- Centreon Apache config requires SSL certificates pre-deployed
- PBS scheduling assumes external backup server with power schedule
- Custom scripts deployed from files/{hostname}/ directories
- Role includes handlers to restart services on configuration changes
Troubleshooting
NRPE service fails to start
Check configuration:
# Test NRPE config
nrpe -c /etc/nagios/nrpe.cfg -v
# Check service status
systemctl status nrpe # RedHat
systemctl status nagios-nrpe-server # Debian
Common issues:
- Invalid command syntax in /etc/nrpe.d/ files
- Missing plugin scripts
- Incorrect file permissions
Centreon cannot connect to NRPE
Test NRPE connectivity:
# From Centreon server
/usr/lib64/nagios/plugins/check_nrpe -H docker_ip -c check_load
# From monitored host (should work from localhost)
/usr/lib64/nagios/plugins/check_nrpe -H 127.0.0.1 -c check_load
Check:
- Firewall allows port 5666: firewall-cmd --list-ports
- NRPE allowed_hosts includes Centreon IP
- NRPE binding to correct interface: netstat -tlnp | grep 5666
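When check_nrpe is not available on the machine you are testing from, a plain TCP connection test can at least confirm that port 5666 is reachable. The Python sketch below does that with the standard library; the function name nrpe_port_open is hypothetical.

```python
import socket


def nrpe_port_open(host: str, port: int = 5666, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

A successful connection only shows that the firewall and the bind address are correct. It does not prove that allowed_hosts accepts the caller, so a check_nrpe call from the Centreon server remains the definitive test.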
Docker monitoring “Permission denied”
Check nrpe user in docker group:
groups nrpe
# Should show: nrpe : nrpe docker
# If not, add manually:
usermod -aG docker nrpe
systemctl restart nrpe
Proxmox SMART monitoring fails
Check sudoers:
# Test as nagios user
sudo -u nagios sudo /usr/sbin/smartctl -H /dev/sda
# Verify sudoers file
visudo -cf /etc/sudoers.d/nagios_smartctl
Common issues:
- Sudoers file has syntax errors
- Sudoers file has wrong permissions (must be 0440)
- smartctl path incorrect (use which smartctl)
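The permission requirement from the list above can be checked with a few lines of Python. This is a minimal sketch; the function name has_mode is hypothetical, and 0440 is the mode stated in this document.

```python
import os
import stat


def has_mode(path: str, expected: int = 0o440) -> bool:
    """Return True if the file's permission bits equal the expected mode."""
    return stat.S_IMODE(os.stat(path).st_mode) == expected
```

For example, has_mode("/etc/sudoers.d/nagios_smartctl") should return True on a correctly configured Proxmox host.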
Proxmox temperature monitoring returns no data
Check sensors:
# List available sensors
sensors
# Check if lm-sensors installed
apt list --installed | grep lm-sensors
# Detect sensors
sensors-detect
Verify sensor name:
# Find your CPU sensor
sensors | grep -i package
# Update role variable if different
deploy_system_monitoring_proxmox_cpu_sensor: "Your Sensor Name"
PBS scheduling not working
Check cron jobs:
# List cron jobs for root
crontab -l
# Check script execution
/usr/local/bin/enable-pbs.sh
cat /var/log/pbs-schedule.log
Test manually:
# Enable storage
pvesm set hp-pbs --disable 0
pvesm status
# Disable storage
pvesm set hp-pbs --disable 1
pvesm status
Centreon Apache HTTPS issues
Check SSL certificate:
# Verify certificate files exist
ls -la /etc/pki/tls/certs/centreon.crt
ls -la /etc/pki/tls/private/centreon.key
# Test Apache config
httpd -t
# Check Apache error log
tail -f /var/log/httpd/error_log
Test HTTPS:
curl -vk https://centreon-ip/centreon
Testing Monitoring Checks
Test NRPE Commands Locally
On Docker host:
# Test Docker check
/usr/lib64/nagios/plugins/check_docker.py --status running
# Expected output:
# OK: All containers running (5 containers)
# Exit code: 0
# Test uptime check
/usr/lib64/nagios/plugins/check_docker.py --uptime --warning 1800 --critical 900
On Proxmox host:
# Test temperature check
/usr/lib/nagios/plugins/check_temp.sh --sensor "Package id 0" --warning 70 --critical 80
# Expected output:
# OK: CPU temperature is 45°C
# Exit code: 0
# Test SMART check
sudo /usr/sbin/smartctl -H /dev/sda
# Expected output:
# SMART overall-health self-assessment test result: PASSED
# Exit code: 0
Test NRPE from Centreon
# From Centreon server
/usr/lib64/nagios/plugins/check_nrpe -H docker_ip -c check_docker_containers
/usr/lib64/nagios/plugins/check_nrpe -H proxmox_ip -c check_cpu_temp
/usr/lib64/nagios/plugins/check_nrpe -H proxmox_ip -c check_smart_sda
Best Practices
- Test on single host before deploying to all infrastructure
- Monitor NRPE logs after deployment: /var/log/messages or journalctl -u nrpe
- Verify Centreon connectivity from monitoring server
- Set appropriate thresholds for your hardware (CPU temp varies by CPU model)
- Schedule regular SMART tests (separate from monitoring checks)
- Rotate logs for PBS scheduling (/var/log/pbs-schedule.log)
- Document custom checks in Centreon with descriptions
- Test fail conditions (stop container, heat CPU, fail disk)
- Keep monitoring scripts updated as infrastructure changes
- Review security headers in Centreon after deployment
Performance Considerations
- NRPE daemon is lightweight (~1-2 MB RAM)
- Check execution is fast (< 1 second typically)
- Docker monitoring requires minimal overhead (reads Docker socket)
- SMART checks can take 1-5 seconds per disk
- Temperature checks are near-instant
- Centreon Apache config doesn’t impact monitoring performance
- PBS scheduling runs twice daily (minimal system impact)
Related Roles
This role is often used with:
- deploy_ssl_certificates: Deploy SSL certs for Centreon Apache
- deploy_network_configuration: Configure VLAN12 networking
- docker_compositor: Deploy Docker containers to monitor
- telegraf_agent: Deploy additional metrics collection (Telegraf → InfluxDB)
License
MIT
Author
Created for homelab infrastructure management.