System Update Role

Overview

This role orchestrates comprehensive system updates across heterogeneous infrastructure including Debian, RedHat, and OPNsense systems with intelligent reboot handling and Centreon monitoring integration. It performs OS-specific updates, detects kernel updates requiring reboots, schedules Centreon downtimes to prevent false alerts during maintenance, automatically reboots systems when necessary, and provides informational summaries. The role handles special cases like Proxmox host updates that affect VMs, Docker daemon updates requiring container restarts, and Centreon-specific package exclusions.

Purpose

Automated Updates: Apply system updates across entire infrastructure
OS-Agnostic: Support Debian, RedHat, and OPNsense systems
Intelligent Reboots: Auto-detect kernel updates and reboot when needed
Monitoring Integration: Schedule Centreon downtimes during maintenance
Cascading Awareness: Handle Proxmox host updates affecting VMs
Docker Support: Update Docker and containers intelligently
Safe Centreon Updates: Exclude MySQL packages to prevent breakage
Informational Feedback: Provide clear update summaries

Requirements

Ansible 2.9 or higher
Target systems: Debian/Ubuntu, RedHat/CentOS, or OPNsense
Root or sudo privileges on target systems
Centreon monitoring system (optional, for downtime scheduling)
OPNsense API access (for OPNsense updates)
Network connectivity to package repositories

What This Role Does

Comprehensive update orchestration:

OS Detection: Identifies system type (Debian/RedHat/OPNsense)
Package Updates: Applies all available updates
Kernel Detection: Identifies if kernel was updated
Downtime Scheduling: Schedules Centreon maintenance windows
Automatic Reboots: Reboots systems when kernel updated
Status Reporting: Provides clear update summaries

Special handling:

Proxmox: Schedules downtime for VMs when host kernel updates
Docker: Handles Docker daemon and container updates separately
Centreon: Excludes MySQL packages from Centreon servers
OPNsense: Uses API to check for firmware updates

Role Variables

Optional Variables

Variable	Default	Description
`system_update_auto_reboot_enabled`	`false`	Enable automatic reboots
`system_update_reboot_timeout`	`600`	Reboot timeout (seconds)
`system_update_reboot_message`	`"Rebooting due to system maintenance"`	Reboot message
`system_update_downtime_duration_minutes`	`60`	Centreon downtime duration
`system_update_downtime_comment_prefix`	`"Ansible maintenance"`	Downtime comment prefix
`system_update_centreon_excluded_packages`	See defaults	Packages excluded on Centreon
`system_update_docker_prune_on_update`	`true`	Prune Docker after updates
`system_update_apt_cache_valid_time`	`3600`	Apt cache validity (seconds)
`system_update_apt_autoremove`	`true`	Remove unused packages (Debian)
`system_update_apt_autoclean`	`true`	Clean apt cache (Debian)

Variable Details

system_update_auto_reboot_enabled

Whether to automatically reboot after kernel updates.

Default: false (manual reboot required)

Set to true to enable auto-reboot:

system_update_auto_reboot_enabled: true

Safety: Disabled by default to prevent unexpected reboots

Override per host:

# In inventory (YAML format)
all:
  hosts:
    webserver1:
      system_update_auto_reboot_enabled: true
    database1:
      system_update_auto_reboot_enabled: false  # Never auto-reboot

system_update_reboot_timeout

Maximum seconds to wait for system to come back after reboot.

Default: 600 (10 minutes)

Increase for slow systems:

system_update_reboot_timeout: 900  # 15 minutes

system_update_downtime_duration_minutes

How long to schedule Centreon downtime (in minutes).

Default: 60 minutes

Adjust based on reboot time:

system_update_downtime_duration_minutes: 30  # Quick reboot
# or
system_update_downtime_duration_minutes: 120  # Long maintenance

system_update_centreon_excluded_packages

Packages to exclude from updates on Centreon servers.

Default:

system_update_centreon_excluded_packages:
  - perl-DBD-MySQL
  - mysql-common
  - mysql-libs

Why excluded: Centreon uses specific MySQL/MariaDB versions. Auto-updating can break Centreon’s database connection.

Custom exclusions:

system_update_centreon_excluded_packages:
  - perl-DBD-MySQL
  - mysql-common
  - mysql-libs
  - custom-package

system_update_docker_prune_on_update

Clean up unused Docker resources after updates.

Default: true

What it prunes:

Stopped containers
Unused images
Unused volumes
Unused networks

Disable if needed:

system_update_docker_prune_on_update: false

Dependencies

No Ansible role dependencies, but integrates with:

Centreon: For downtime scheduling (optional)
OPNsense API: For firmware update checks (if using OPNsense)
Docker: For container management (if Docker host)

Example Playbook

Basic Usage (All Systems)

---
- name: Update All Systems
  hosts: all
  become: true

  roles:
    - system_update

Enable Auto-Reboot

---
- name: Update with Auto-Reboot
  hosts: all
  become: true

  vars:
    system_update_auto_reboot_enabled: true

  roles:
    - system_update

Update Non-Critical Systems Only

---
- name: Update Development Servers
  hosts: development
  become: true

  vars:
    system_update_auto_reboot_enabled: true
    system_update_downtime_duration_minutes: 30

  roles:
    - system_update

Selective Auto-Reboot

---
- name: Update with Host-Specific Reboot Policy
  hosts: all
  become: true

  roles:
    - system_update

# Inventory:
# webservers:
#   hosts:
#     web1:
#       system_update_auto_reboot_enabled: true
#     web2:
#       system_update_auto_reboot_enabled: true
# databases:
#   hosts:
#     db1:
#       system_update_auto_reboot_enabled: false  # Manual reboot only

What This Role Does (Detailed)

1. Initialize Tracking Facts

Sets facts to track update status:

system_update_kernel_updated: Kernel requires reboot
system_update_updates_applied: Updates were applied
system_update_opnsense_updates_available: OPNsense updates found
system_update_opnsense_updates_need_reboot: OPNsense needs reboot
system_update_docker_containers_updated: Docker containers updated
system_update_docker_updated: Docker daemon updated

2. Run OS-Specific Updates

Debian/Ubuntu (debian.yml):

Update apt cache
Run apt-get dist-upgrade (full distribution upgrade)
Autoremove unused packages
Autoclean package cache
Check /var/run/reboot-required file
Detect kernel package updates
Set kernel update flag

RedHat/CentOS (redhat.yml):

Run dnf upgrade (or yum upgrade)
Exclude Centreon packages on Centreon hosts
Check if kernel package updated
Set kernel update flag

OPNsense (opnsense.yml):

Query API: /api/core/firmware/status
Check for available updates
Detect if updates require reboot
Set OPNsense update flags
Note: Does NOT apply updates (apply them manually via UI or API)

3. Schedule Centreon Downtime

If kernel updated or reboot needed:

Host Uptime Service:

Schedules downtime for host’s Uptime service
Duration: system_update_downtime_duration_minutes
Comment: “Ansible maintenance - kernel update”

Proxmox VMs (if updating Proxmox host):

Schedules downtime for ALL VMs on that Proxmox host
Comment: “Proxmox host kernel update - VMs will reboot”
Prevents false alerts when VMs go down during host reboot

Docker Containers (if Docker updated):

Schedules downtime for “Docker Containers Uptime” service
Comment: “Ansible maintenance - docker-ce update”

4. Reboot System (If Needed)

If kernel updated and auto-reboot enabled:

Sends reboot message to logged-in users
Initiates system reboot
Waits for system to come back online
Timeout: system_update_reboot_timeout seconds
Tests SSH connectivity after reboot

Not rebooted:

OPNsense systems (manual reboot via UI)
Systems with system_update_auto_reboot_enabled: false
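The reboot conditions above can be summarized as a small predicate. This is only a sketch of the decision logic in shell, not the role's actual tasks; the function name `should_reboot` and its argument order are hypothetical.

```shell
# Hypothetical helper: decide whether a host would be rebooted automatically.
# Arguments:
#   $1  kernel_updated        "true" or "false"
#   $2  auto_reboot_enabled   "true" or "false"
#   $3  os_family             e.g. Debian, RedHat, OPNsense
should_reboot() {
  local kernel_updated="$1"
  local auto_reboot_enabled="$2"
  local os_family="$3"

  # OPNsense is never rebooted by the role (manual reboot via UI).
  if [ "$os_family" = "OPNsense" ]; then
    return 1
  fi

  # Both conditions must hold: a kernel update and explicit opt-in.
  if [ "$kernel_updated" = "true" ] && [ "$auto_reboot_enabled" = "true" ]; then
    return 0
  fi
  return 1
}

# Example:
#   if should_reboot true true Debian; then echo "reboot"; fi
```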

5. Display Informational Messages

Shows update summary:

Packages updated
Kernel update status
Reboot requirement
OPNsense update availability
Docker update status

Debian/Ubuntu Update Process

Commands executed:

# Update package cache
apt-get update

# Full distribution upgrade
apt-get dist-upgrade -y

# Remove unused packages
apt-get autoremove -y

# Clean package cache
apt-get autoclean -y

Reboot detection:

Check /var/run/reboot-required (written by package post-install hooks, so it may be absent on some systems)
Check if kernel packages updated:
- linux-image-*
- linux-headers-*
- proxmox-kernel-* (for Proxmox)

If either check is true, the kernel is treated as updated and a reboot is needed.
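The two checks can be sketched in shell as follows. This is an illustration of the detection rule, not the role's implementation; the function names and the `REBOOT_MARKER` override (useful for testing) are assumptions.

```shell
# Hypothetical helper: does a package name match one of the kernel patterns?
is_kernel_package() {
  case "$1" in
    linux-image-*|linux-headers-*|proxmox-kernel-*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: reboot is needed if the marker file exists OR any of
# the upgraded package names (passed as arguments) is a kernel package.
# REBOOT_MARKER may be set to point at a different marker path.
debian_reboot_needed() {
  local marker="${REBOOT_MARKER:-/var/run/reboot-required}"
  local pkg

  if [ -e "$marker" ]; then
    return 0
  fi
  for pkg in "$@"; do
    if is_kernel_package "$pkg"; then
      return 0
    fi
  done
  return 1
}

# Example:
#   debian_reboot_needed nginx linux-image-6.1.0-18-amd64 && echo "reboot needed"
```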

RedHat/CentOS Update Process

Commands executed:

# Standard system update
dnf upgrade -y

# Centreon host with exclusions
dnf upgrade -y --exclude=perl-DBD-MySQL,mysql-common,mysql-libs

Reboot detection:

Checks if kernel package updated in dnf output
Searches for: kernel, kernel-core, kernel-modules
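A rough shell sketch of both steps: building the `--exclude` argument from a package list, and scanning upgrade output for the three kernel package names. The function names are hypothetical, and the matching assumes each package name in the dnf output is followed by whitespace or by a `-<version>` suffix; the role itself may parse the output differently.

```shell
# Hypothetical helper: join package names into a single --exclude argument.
# Prints nothing when no packages are given.
dnf_exclude_arg() {
  if [ "$#" -eq 0 ]; then
    return 0
  fi
  local IFS=,
  # "$*" joins the arguments with the first character of IFS (a comma).
  printf '%s%s\n' '--exclude=' "$*"
}

# Hypothetical helper: read dnf output on stdin and succeed if it mentions
# kernel, kernel-core or kernel-modules as a package name.
# Names such as kernel-tools or kernel-modules-extra do not match.
dnf_output_has_kernel() {
  grep -Eq '(^|[[:space:]])kernel(-core|-modules)?([[:space:]]|-[0-9])'
}

# Example:
#   dnf upgrade -y $(dnf_exclude_arg perl-DBD-MySQL mysql-common mysql-libs) \
#     | tee /tmp/dnf.log
#   dnf_output_has_kernel < /tmp/dnf.log && echo "kernel updated"
```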

OPNsense Update Process

API query:

GET https://opnsense/api/core/firmware/status

Response analysis:

status_upgrade_action: Update available?
needs_reboot: Reboot required?

Important: Role does NOT apply OPNsense updates

Only checks for updates
Manual update via UI or API required
Schedules downtime if reboot needed
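For manual verification, the same status query can be issued with curl. This is a sketch: the function name and argument order are hypothetical, and it relies on OPNsense's API authentication, which uses the API key as the HTTP basic-auth user and the API secret as the password.

```shell
# Hypothetical helper: fetch the firmware status JSON from an OPNsense host.
# Arguments: host, API key, API secret.
opnsense_firmware_status() {
  local host="$1"
  local key="$2"
  local secret="$3"

  # --fail makes curl return a non-zero status on HTTP errors (e.g. 401).
  curl --silent --show-error --fail \
    --user "${key}:${secret}" \
    "https://${host}/api/core/firmware/status"
}

# Example (key and secret would normally come from Ansible Vault):
#   opnsense_firmware_status opnsense "$OPNSENSE_KEY" "$OPNSENSE_SECRET"
```

On hosts with a self-signed certificate, curl will additionally need the CA certificate passed via `--cacert`.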

Centreon Downtime Scheduling

Centreon CLI command:

centreon -u admin -p 'password' -o RTDOWNTIME -a add \
  -v "SVC;hostname,Uptime;2026/01/08 10:00;2026/01/08 11:00;1;3600;Ansible maintenance - kernel update"

Parameters:

SVC: Service downtime (not host)
hostname,Uptime: Host and service name
Start time: Current time
End time: Current + duration minutes
1: Fixed downtime (not flexible)
Duration in seconds
Comment: Reason for downtime
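To make the field layout concrete, the following sketch assembles the `-v` value from a host, service, duration in minutes, and comment. The helper name is hypothetical, and it assumes GNU date (for the `-d "@epoch"` syntax); the role itself may compute the window differently.

```shell
# Hypothetical helper: build the -v value for "centreon -o RTDOWNTIME -a add".
# Arguments: host, service, duration in minutes, comment,
#            optional start time as a Unix epoch (defaults to now).
centreon_downtime_value() {
  local host="$1"
  local service="$2"
  local minutes="$3"
  local comment="$4"
  local start_epoch="${5:-$(date +%s)}"

  local duration=$((minutes * 60))              # duration field is in seconds
  local end_epoch=$((start_epoch + duration))
  local start end
  start="$(date -d "@${start_epoch}" '+%Y/%m/%d %H:%M')"
  end="$(date -d "@${end_epoch}" '+%Y/%m/%d %H:%M')"

  # Field order: type;host,service;start;end;fixed;duration;comment
  printf 'SVC;%s,%s;%s;%s;1;%s;%s\n' \
    "$host" "$service" "$start" "$end" "$duration" "$comment"
}

# Example:
#   centreon -u admin -p 'password' -o RTDOWNTIME -a add \
#     -v "$(centreon_downtime_value webserver1 Uptime 60 'Ansible maintenance - kernel update')"
```

Times are formatted in the local timezone of the machine running the command, so that machine and the Centreon server should agree on the timezone.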

Services scheduled:

Uptime: Host uptime monitoring
Docker Containers Uptime: Container uptime monitoring (Docker host)

Downtime prevents:

False alerts during reboot
Notifications to on-call staff
Inflated downtime statistics

Proxmox Special Handling

When Proxmox host kernel updates:

Proxmox Host:
- Kernel updated → Reboot scheduled
- Downtime scheduled for Proxmox host
All VMs on that Host:
- Downtime scheduled for each VM
- Prevents alerts when VMs go down
- VMs restart automatically with Proxmox (only those with start-at-boot enabled)

Implementation:

Loops through groups['proxmox_vms']
Schedules downtime for each VM’s Uptime service
Comment: “Proxmox host {name} kernel update - VMs will reboot”
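The per-VM loop can be pictured as follows. This is a shell sketch only: the function name is hypothetical, and the VM names passed as arguments stand in for the members of `groups['proxmox_vms']`.

```shell
# Hypothetical helper: print one "target;comment" line per VM, where the
# target is the VM's Uptime service and the comment names the Proxmox host.
# Arguments: Proxmox host name, followed by the VM names.
proxmox_vm_downtime_plan() {
  local pve_host="$1"
  shift
  local vm

  for vm in "$@"; do
    printf '%s,Uptime;Proxmox host %s kernel update - VMs will reboot\n' \
      "$vm" "$pve_host"
  done
}

# Example:
#   proxmox_vm_downtime_plan pve1 vm-web vm-db
```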

Docker Special Handling

Docker daemon update:

Sets system_update_docker_updated: true
Schedules downtime for Docker Containers Uptime
Restarting the Docker daemon may stop or restart running containers

Docker containers update:

Sets system_update_docker_containers_updated: true
Schedules downtime for Docker Containers Uptime

Docker prune (if enabled):

docker system prune -af --volumes

Removes stopped containers
Removes unused images
Removes unused volumes
Removes unused networks
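A minimal sketch of how the prune step is gated by `system_update_docker_prune_on_update`. The wrapper function is hypothetical; only the `docker system prune -af --volumes` command comes from the role description.

```shell
# Hypothetical wrapper: run the prune only when the flag is "true".
# Argument: value of system_update_docker_prune_on_update (defaults to true).
docker_prune_if_enabled() {
  local enabled="${1:-true}"

  if [ "$enabled" != "true" ]; then
    echo "Docker prune disabled; skipping"
    return 0
  fi

  # -a: also remove images not used by any container, -f: no confirmation prompt,
  # --volumes: also remove unused volumes (their data is lost permanently).
  docker system prune -af --volumes
}

# Example:
#   docker_prune_if_enabled false
```

Because `--volumes` deletes data that no container currently references, disabling the prune is the safer choice on hosts where volumes are kept for containers that are only started occasionally.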

Reboot Workflow

When reboot triggered:

Pre-reboot:
- Centreon downtime already scheduled
- Message sent to logged-in users
- Wait 1 minute (allows graceful shutdown)
Reboot:
- System reboots
- Ansible waits for system to go down
Post-reboot:
- Ansible waits for SSH to come back
- Timeout: system_update_reboot_timeout
- Tests connectivity
- Continues playbook
Failure handling:
- If timeout exceeded: Playbook fails
- Manual intervention required
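The post-reboot wait amounts to polling with a timeout. The sketch below shows that pattern in shell; it is not how the role does it internally, and `wait_until` is a hypothetical helper name. Elapsed time is approximated by counting sleep intervals, so time spent inside the probe command itself is not included.

```shell
# Hypothetical helper: retry a command until it succeeds or the timeout passes.
# Arguments: timeout in seconds, polling interval in seconds, then the command.
# Returns 0 as soon as the command succeeds, 1 if the timeout is reached.
wait_until() {
  local timeout="$1"
  local interval="$2"
  shift 2
  local elapsed=0

  while [ "$elapsed" -lt "$timeout" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 1
}

# Example: wait up to 600 seconds for SSH on webserver1, probing every 10 seconds.
#   wait_until 600 10 ssh -o BatchMode=yes -o ConnectTimeout=5 webserver1 true \
#     || echo "Timeout exceeded: manual intervention required"
```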

Security Considerations

Auto-reboot: Disabled by default (requires explicit enable)
Centreon Password: Stored in Ansible Vault
OPNsense API: Uses API key/secret from vault
Package Exclusions: Prevents breaking critical services
Reboot Message: Alerts logged-in users
Update Verification: Honors package exclusions and only checks (never applies) OPNsense firmware updates

Notes

Role runs on target systems (not localhost)
become: true required for package updates and reboots
Centreon integration optional (skipped if centreon_host_name not defined)
OPNsense updates checked but not applied automatically
Reboot happens only if system_update_auto_reboot_enabled: true
Docker prune enabled by default (disable if needed)
Compatible with Debian, Ubuntu, RedHat, CentOS, OPNsense

Troubleshooting

Updates applied but reboot not triggered

Cause: system_update_auto_reboot_enabled is false

Solution: Enable auto-reboot or manually reboot

system_update_auto_reboot_enabled: true

Check if reboot needed:

# Debian/Ubuntu
ls /var/run/reboot-required

# All systems
uname -r  # Current kernel

# Installed kernels
dpkg -l | grep linux-image  # Debian/Ubuntu
rpm -q kernel               # RedHat/CentOS

Reboot timeout exceeded

Symptom: Ansible fails waiting for system to come back

Causes:

System taking longer than timeout
SSH not starting automatically
Network issues

Solutions:

# Increase timeout
system_update_reboot_timeout: 1200  # 20 minutes

# Check system after timeout
# SSH manually, check boot logs
journalctl -b  # Current boot logs

Centreon downtime not scheduled

Symptom: No downtime visible in Centreon

Check:

# Verify centreon_host_name defined
ansible-inventory --host hostname

# Check Centreon CLI works
ssh centreon
centreon -u admin -p 'password' -o RTDOWNTIME -a show

Common issues:

centreon_host_name not defined in inventory
Centreon password incorrect
Service name mismatch (e.g., “uptime” vs “Uptime”)

Centreon packages broken after update

Symptom: Centreon web interface not loading, database connection errors

Cause: MySQL packages updated despite exclusions

Prevention:

system_update_centreon_excluded_packages:
  - perl-DBD-MySQL
  - mysql-common
  - mysql-libs
  - mariadb-server  # Add if using MariaDB

Recovery:

# Downgrade MySQL packages
yum downgrade perl-DBD-MySQL mysql-common mysql-libs

# Restart Centreon services
systemctl restart centreon

OPNsense updates available but not applied

Expected behavior: Role only checks, doesn’t apply

Apply updates:

Via UI: System → Firmware → Updates → Update
Via API: POST /api/core/firmware/upgrade

Why not automated: Firmware updates require careful testing and may break the configuration

Testing the Role

Dry Run (Check Mode)

# See what would be updated (Debian)
ansible-playbook update-playbook.yml --check

# Check specific host
ansible-playbook update-playbook.yml --limit webserver1 --check

Test Without Auto-Reboot

- hosts: test-server
  become: true
  vars:
    system_update_auto_reboot_enabled: false  # Don't auto-reboot
  roles:
    - system_update

Verify Centreon Downtime

# After role runs, check Centreon
# Monitoring → Downtimes
# Should see scheduled downtime for host

Check Update Facts

# After role runs (no reboot)
# Note: ansible_local only lists facts stored under /etc/ansible/facts.d on the host
ansible webserver1 -m setup -a 'filter=ansible_local'

# Check kernel version
ansible webserver1 -m command -a 'uname -r'

Best Practices

Test in non-production first: Run on dev/test systems
Schedule updates: Run during maintenance windows
Enable auto-reboot selectively: Enable for non-critical only
Monitor downtime duration: Ensure sufficient time for reboot
Verify Centreon integration: Confirm downtimes scheduled
Check before/after: Compare kernel versions
Backup before updates: Especially for critical systems
Staged rollouts: Update in waves (dev → staging → prod)
Communication: Notify team before running
Manual verification: Check critical services after updates

Scheduled Update Pattern

Recommended approach:

---
# Weekly update playbook
- name: Weekly Security Updates - Development
  hosts: development
  become: true
  vars:
    system_update_auto_reboot_enabled: true
  roles:
    - system_update

- name: Weekly Security Updates - Staging
  hosts: staging
  become: true
  vars:
    system_update_auto_reboot_enabled: true
  roles:
    - system_update

- name: Weekly Security Updates - Production (No Auto-Reboot)
  hosts: production
  become: true
  vars:
    system_update_auto_reboot_enabled: false
  roles:
    - system_update
  # Manual reboot during maintenance window

Schedule via cron/systemd timer:

# Cron: Run every Sunday at 2 AM
0 2 * * 0 ansible-playbook /path/to/update-playbook.yml

This role is often used with:

deploy_centreon: Centreon monitoring system
backup roles: Backup before updates
System hardening: Security updates

License

MIT

Author

Created for homelab infrastructure management.
