Telegraf Agent

This role installs and configures the Telegraf monitoring agent on Linux systems for collecting and forwarding system metrics to InfluxDB.

Telegraf Agent Role

Overview

This role installs and configures the Telegraf monitoring agent on Linux systems for collecting and forwarding system metrics to InfluxDB. It uses a two-tier configuration approach with a main config file containing agent settings, InfluxDB output, and common input plugins, plus host-specific configs for specialized inputs like Docker monitoring, SNMP polling, temperature sensors, interrupts, and sysctl metrics. The role supports both RedHat and Debian systems, automatically manages repository configuration, validates Telegraf configuration before deploying, and handles service management.

Purpose

  • System Metrics Collection: Monitor CPU, memory, disk, network on all systems
  • InfluxDB Integration: Send metrics to centralized InfluxDB database
  • Host-Specific Monitoring: Docker, SNMP, temperature sensors per host
  • Two-Tier Configuration: Base + optional host-specific inputs
  • Repository Management: Automatically configure InfluxData repos
  • Service Management: Enable and manage Telegraf service
  • Configuration Validation: Test config before applying

Requirements

  • Ansible 2.9 or higher
  • Target system: RedHat/CentOS or Debian/Ubuntu
  • Root or sudo privileges
  • InfluxDB server accessible from target hosts
  • InfluxDB database and user created
  • Telegraf password stored in Ansible Vault
  • SNMP community strings in vault (if using SNMP monitoring)

What is Telegraf?

Telegraf is a plugin-driven server agent for collecting and reporting metrics:

Architecture:

Input Plugins → Telegraf Agent → Output Plugins
(Collect data)   (Process)      (Send to InfluxDB)

Input plugins (collect metrics):

  • System: CPU, memory, disk, network
  • Docker: Container metrics
  • SNMP: Network device monitoring
  • Sensors: Temperature, fan speed

Output plugins (send metrics):

  • InfluxDB: Time-series database
  • Prometheus, Graphite, etc.
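
The input, agent, and output stages can be seen end to end with a throwaway configuration. The sketch below is an illustration only, not part of this role: it assumes the `mem` input and the `file` output (writing to stdout) instead of InfluxDB, and it uses a temporary file chosen for the example.

```shell
# Write a throwaway config: one input (mem), agent settings, one output (stdout).
conf=$(mktemp)
cat > "$conf" <<'EOF'
[agent]
  interval = "10s"

[[inputs.mem]]

[[outputs.file]]
  files = ["stdout"]
EOF

# Gather once, flush to the output, and exit (only where Telegraf is installed).
if command -v telegraf >/dev/null 2>&1; then
  telegraf --config "$conf" --once
fi
```

Swapping the `[[outputs.file]]` block for an `[[outputs.influxdb]]` block gives the shape this role deploys. Delete the temporary file when done.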

Why Telegraf:

  • Lightweight (~10-20MB memory)
  • Plugin ecosystem (200+ inputs)
  • Go-based (single binary)
  • Active development

Role Variables

Required Variables

| Variable | Required | Description |
| --- | --- | --- |
| vault_influxdb_telegraf_password | Yes | InfluxDB password (vault) |

Optional Variables

| Variable | Default | Description |
| --- | --- | --- |
| telegraf_agent_influxdb_url | http://<grafana-ip>:8086 | InfluxDB URL |
| telegraf_agent_influxdb_database | telegraf_bdd | Database name |
| telegraf_agent_influxdb_username | telegraf | InfluxDB username |
| telegraf_agent_interval | 10s | Collection interval |
| telegraf_agent_flush_interval | 10s | Flush interval |
| telegraf_agent_enable_docker_input | false | Enable Docker monitoring |
| telegraf_agent_enable_temp_input | false | Enable temperature sensors |
| telegraf_agent_enable_interrupts_input | false | Enable interrupts monitoring |
| telegraf_agent_enable_sysctl_input | false | Enable sysctl monitoring |
| telegraf_agent_enable_snmp_synology | false | Enable Synology NAS SNMP |
| telegraf_agent_enable_snmp_zyxel_ap | false | Enable Zyxel AP SNMP |

Variable Details

vault_influxdb_telegraf_password

Password for Telegraf user in InfluxDB.

IMPORTANT: Store in Ansible Vault

# In vault.yml
vault_influxdb_telegraf_password: "secure_password_here"

Usage in playbook: The role's defaults reference this variable automatically, so nothing needs to be set in the playbook itself.

telegraf_agent_influxdb_url

URL to InfluxDB server.

Default: "http://{{ hostvars['grafana']['ip_vlan12'] }}:8086"

For Grafana host (InfluxDB on same server):

# In host_vars/grafana.yml
telegraf_agent_influxdb_url: "http://127.0.0.1:8086"

For remote InfluxDB:

telegraf_agent_influxdb_url: "http://192.168.x.x:8086"

telegraf_agent_interval

How often to collect metrics.

Default: 10s (10 seconds)

Recommendations:

  • 10s: Standard (most use cases)
  • 30s: Lower frequency (reduce load)
  • 5s: High frequency (detailed monitoring)

Trade-off: A shorter interval yields more data points but increases load on the monitored host and on InfluxDB.
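
To put numbers on the trade-off, the sketch below estimates how many samples a single metric series produces per day at each recommended interval. The helper name `points_per_day` is made up for this example and is not part of the role.

```shell
# Hypothetical helper: samples per day for one series at a given interval (seconds).
points_per_day() {
  interval_seconds=$1
  echo $((86400 / interval_seconds))
}

for s in 5 10 30; do
  echo "interval=${s}s -> $(points_per_day "$s") points per series per day"
done
```

Moving from 10s to 30s cuts the stored points per series to a third, which is usually the cheapest way to reduce load.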

telegraf_agent_enable_docker_input

Enable Docker container monitoring.

Default: false

Enable for Docker hosts:

# In host_vars/docker.yml
telegraf_agent_enable_docker_input: true

What it monitors:

  • Container CPU usage
  • Container memory usage
  • Container network I/O
  • Container disk I/O
  • Container status

Requirements: Docker installed, Telegraf user added to docker group

telegraf_agent_enable_temp_input

Enable temperature sensor monitoring.

Default: false

Enable for servers:

# In host_vars/centreon.yml
telegraf_agent_enable_temp_input: true

What it monitors:

  • CPU temperature
  • Motherboard temperature
  • Disk temperature

Requirements: lm-sensors package installed, sensors detected

telegraf_agent_enable_interrupts_input

Enable interrupts monitoring.

Default: false

Enable for detailed system monitoring:

telegraf_agent_enable_interrupts_input: true

What it monitors:

  • IRQ interrupts per CPU
  • Useful for troubleshooting high interrupt rates

telegraf_agent_enable_sysctl_input

Enable sysctl kernel parameters monitoring.

Default: false

Enable for kernel monitoring:

telegraf_agent_enable_sysctl_input: true

What it monitors:

  • Kernel parameters
  • System tunables

telegraf_agent_enable_snmp_synology

Enable SNMP monitoring for Synology NAS.

Default: false

Enable for Grafana host (runs SNMP queries):

# In host_vars/grafana.yml
telegraf_agent_enable_snmp_synology: true

What it monitors:

  • NAS system info
  • Disk status and usage
  • RAID status
  • Network interfaces
  • Temperature sensors

Requirements:

  • Synology SNMP enabled
  • SNMP community string in vault

telegraf_agent_enable_snmp_zyxel_ap

Enable SNMP monitoring for Zyxel WiFi access points.

Default: false

Enable with target IPs:

# In host_vars/grafana.yml
telegraf_agent_enable_snmp_zyxel_ap: true
telegraf_agent_snmp_zyxel_agents:
  - "192.168.x.x"
  - "192.168.x.x"

What it monitors:

  • AP system info
  • WiFi client count
  • Network traffic
  • Device status

Note: Template includes example MIBs that need customization

Dependencies

No Ansible role dependencies, but requires:

  • InfluxDB server with database created
  • InfluxDB user with write permissions
  • Network connectivity to InfluxDB
  • Telegraf password in vault
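
Network connectivity can be checked from a target host before the first run. The sketch below is a minimal pre-flight check and assumes an InfluxDB 1.x server, which answers `/ping` with HTTP 204. The function name `influx_reachable` and the `INFLUX_URL` environment variable are hypothetical and not defined by this role.

```shell
# Hypothetical pre-flight check: succeed when <url>/ping answers with HTTP 204.
influx_reachable() {
  code=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$1/ping")
  [ "$code" = "204" ]
}

# Run the check only when INFLUX_URL is set, e.g. INFLUX_URL=http://192.168.x.x:8086
if [ -n "${INFLUX_URL:-}" ]; then
  if influx_reachable "$INFLUX_URL"; then
    echo "InfluxDB reachable at $INFLUX_URL"
  else
    echo "InfluxDB not reachable at $INFLUX_URL" >&2
  fi
fi
```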

Often used with:

  • influxdb: Deploy InfluxDB database
  • grafana_install: Visualize metrics in Grafana
  • docker_install: Install Docker before enabling Docker monitoring

Example Playbook

Basic Usage (All Hosts)

---
- name: Deploy Telegraf Agent
  hosts: all
  become: true

  roles:
    - telegraf_agent

Docker Host

---
- name: Deploy Telegraf on Docker Host
  hosts: docker
  become: true

  vars:
    telegraf_agent_enable_docker_input: true
    telegraf_agent_enable_temp_input: true
    telegraf_agent_enable_interrupts_input: true
    telegraf_agent_enable_sysctl_input: true

  roles:
    - telegraf_agent

Grafana Host (with SNMP Monitoring)

---
- name: Deploy Telegraf on Grafana
  hosts: grafana
  become: true

  vars:
    telegraf_agent_influxdb_url: "http://127.0.0.1:8086"
    telegraf_agent_enable_snmp_synology: true

  roles:
    - telegraf_agent

Standard Server

---
- name: Deploy Telegraf on Servers
  hosts: centreon,graylog,zoneminder
  become: true

  vars:
    telegraf_agent_enable_temp_input: true
    telegraf_agent_enable_interrupts_input: true
    telegraf_agent_enable_sysctl_input: true

  roles:
    - telegraf_agent

What This Role Does

1. Configure InfluxData Repository

RedHat/CentOS:

  • Adds InfluxData yum repository
  • Installs GPG key for package verification
  • Repository URL: https://repos.influxdata.com/rhel/$releasever/$basearch/stable

Debian/Ubuntu:

  • Adds InfluxData apt repository
  • Installs GPG key
  • Repository: https://repos.influxdata.com/debian

2. Install Telegraf Package

Installs telegraf package from InfluxData repository.

Version: Latest stable from repository

3. Deploy Main Configuration

File: /etc/telegraf/telegraf.conf

Content:

  • Agent settings (interval, flush, buffer)
  • InfluxDB output configuration
  • Common input plugins (always enabled):
    • CPU metrics
    • Disk usage
    • Disk I/O
    • Kernel statistics
    • Memory usage
    • Processes
    • Swap usage
    • System load/uptime
    • Network interfaces
    • Network statistics

4. Deploy Host-Specific Configurations

Directory: /etc/telegraf/telegraf.d/

Conditional configs (based on enabled flags):

  • docker.conf: Docker monitoring (telegraf_agent_enable_docker_input: true)
  • temp.conf: Temperature sensors (telegraf_agent_enable_temp_input: true)
  • interrupts.conf: Interrupt monitoring (telegraf_agent_enable_interrupts_input: true)
  • sysctl.conf: Sysctl monitoring (telegraf_agent_enable_sysctl_input: true)
  • snmp_synology.conf: Synology SNMP (telegraf_agent_enable_snmp_synology: true)
  • snmp_zyxel.conf: Zyxel AP SNMP (telegraf_agent_enable_snmp_zyxel_ap: true)

5. Add Telegraf to Docker Group (if Docker enabled)

If telegraf_agent_enable_docker_input: true:

  • Adds telegraf user to docker group
  • Allows Telegraf to access Docker socket
  • Restarts Telegraf to apply group membership

6. Validate Configuration

Before restarting Telegraf:

  • Runs telegraf --config /etc/telegraf/telegraf.conf --test
  • Validates configuration syntax
  • Fails deployment if config invalid
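
The same check can be reproduced by hand. The sketch below is not the role's own task: it wraps the command in a hypothetical function, `validate_telegraf_config`, and additionally passes `--config-directory` so that the host-specific files are loaded too. In `--test` mode Telegraf gathers once and prints the metrics to stdout without sending anything to the outputs.

```shell
# Hypothetical wrapper: succeed only if Telegraf accepts the main config plus fragments.
validate_telegraf_config() {
  main_conf=${1:-/etc/telegraf/telegraf.conf}
  conf_dir=${2:-/etc/telegraf/telegraf.d}
  if telegraf --config "$main_conf" --config-directory "$conf_dir" --test >/dev/null 2>&1; then
    echo "config OK"
  else
    echo "config INVALID" >&2
    return 1
  fi
}

# Only run the check where Telegraf is actually installed.
if command -v telegraf >/dev/null 2>&1; then
  validate_telegraf_config || echo "fix the configuration before restarting telegraf"
fi
```

Drop the output redirection inside the function to see the parse error or the collected metrics.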

7. Enable and Start Service

  • Enables Telegraf service on boot
  • Starts or restarts Telegraf service
  • Service runs as telegraf user

Two-Tier Configuration Architecture

Why two tiers?

  1. Main config (telegraf.conf):

    • Common to all hosts
    • Agent settings
    • InfluxDB output
    • Standard system inputs
  2. Host-specific configs (telegraf.d/*.conf):

    • Varies by host role
    • Docker, SNMP, sensors
    • Enabled via variables

Benefits:

  • Clean separation of concerns
  • Easy per-host customization
  • Modular configuration

Loading order:

  1. Main config processed first
  2. Files in telegraf.d/ loaded alphabetically
  3. All configs merged
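
To preview which fragments a host would load, and in which order, the files can be listed the same way. This is a sketch that assumes the alphabetical order described above. The function name `list_fragments` is hypothetical.

```shell
# Hypothetical helper: print the *.conf fragments of a directory in alphabetical order.
list_fragments() {
  dir=${1:-/etc/telegraf/telegraf.d}
  for f in "$dir"/*.conf; do
    # Skip the unexpanded pattern when the directory has no fragments.
    [ -e "$f" ] && basename "$f"
  done | LC_ALL=C sort
}

list_fragments
```

On a host with Docker and temperature monitoring enabled this would be expected to list docker.conf before temp.conf.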

Common Input Plugins (Always Enabled)

CPU:

[[inputs.cpu]]
  percpu = true
  totalcpu = true

Disk:

[[inputs.disk]]
  ignore_fs = ["tmpfs", "devtmpfs", "devfs"]

Memory:

[[inputs.mem]]

Network:

[[inputs.net]]
[[inputs.netstat]]

System:

[[inputs.system]]
[[inputs.kernel]]
[[inputs.processes]]
[[inputs.swap]]

Host-Specific Configurations

Docker Input

File: /etc/telegraf/telegraf.d/docker.conf

Configuration:

[[inputs.docker]]
  endpoint = "unix:///var/run/docker.sock"
  container_names = []

Metrics collected:

  • docker_container_cpu: CPU usage per container
  • docker_container_mem: Memory usage per container
  • docker_container_net: Network I/O per container
  • docker_container_blkio: Disk I/O per container

Temperature Input

File: /etc/telegraf/telegraf.d/temp.conf

Configuration:

[[inputs.temp]]

Metrics collected:

  • temp: Temperature readings from all sensors

Requirements: lm-sensors installed and configured

SNMP Input (Synology)

File: /etc/telegraf/telegraf.d/snmp_synology.conf

Configuration:

[[inputs.snmp]]
  agents = ["192.168.x.x:161"]
  version = 2
  community = "{{ vault_snmp_community }}"
  interval = "60s"

  [[inputs.snmp.field]]
    name = "sysName"
    oid = "SNMPv2-MIB::sysName.0"
    is_tag = true

Metrics collected:

  • System information
  • Disk status and usage
  • RAID array status
  • Network interface stats
  • Temperature sensors

InfluxDB Output Configuration

Configuration:

[[outputs.influxdb]]
  urls = ["http://192.168.x.x:8086"]
  database = "telegraf_bdd"
  retention_policy = "trente_jours"
  username = "telegraf"
  password = "password_from_vault"

Connection:

  • HTTP protocol (not HTTPS by default)
  • Port 8086 (InfluxDB default)
  • Basic authentication

Retention policy:

  • trente_jours: 30 days retention
  • Configured in InfluxDB
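
Because the retention policy must already exist in InfluxDB, it has to be created once on the server. The sketch below assumes InfluxDB 1.x and its InfluxQL `CREATE RETENTION POLICY` statement. The function name and the `INFLUX_URL` and `INFLUX_ADMIN` environment variables are hypothetical, and the replication factor of 1 assumes a single-node server.

```shell
# Build the InfluxQL statement for a retention policy (InfluxDB 1.x syntax).
retention_policy_statement() {
  policy=$1
  database=$2
  duration=$3
  printf 'CREATE RETENTION POLICY "%s" ON "%s" DURATION %s REPLICATION 1' \
    "$policy" "$database" "$duration"
}

stmt=$(retention_policy_statement trente_jours telegraf_bdd 30d)
echo "$stmt"

# Send it only when both variables are set:
#   INFLUX_URL   e.g. http://192.168.x.x:8086
#   INFLUX_ADMIN "user:password" of an account allowed to manage the database
if [ -n "${INFLUX_URL:-}" ] && [ -n "${INFLUX_ADMIN:-}" ]; then
  curl -sS -XPOST "$INFLUX_URL/query" -u "$INFLUX_ADMIN" --data-urlencode "q=$stmt"
fi
```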

Mapping to Hosts

Example host configurations (variable names below omit the telegraf_agent_ prefix):

| Host | Required Variables |
| --- | --- |
| docker | enable_docker_input: true, enable_interrupts_input: true, enable_sysctl_input: true |
| centreon | enable_temp_input: true, enable_interrupts_input: true, enable_sysctl_input: true |
| grafana | influxdb_url: "http://127.0.0.1:8086", enable_snmp_synology: true |
| graylog | enable_temp_input: true, enable_interrupts_input: true, enable_sysctl_input: true |
| zoneminder | enable_temp_input: true, enable_interrupts_input: true, enable_sysctl_input: true |

File Locations

| File | Path | Purpose |
| --- | --- | --- |
| Main config | /etc/telegraf/telegraf.conf | Agent and common inputs |
| Host configs | /etc/telegraf/telegraf.d/*.conf | Host-specific inputs |
| Service file | /etc/systemd/system/telegraf.service | Systemd service |
| Binary | /usr/bin/telegraf | Telegraf executable |
| Logs | journalctl -u telegraf | Service logs |

Security Considerations

  • Vault Credentials: InfluxDB password in Ansible Vault
  • SNMP Communities: Community strings in vault
  • File Permissions: Config files readable by telegraf user
  • Docker Socket: Limited to docker group members
  • Network Security: InfluxDB accessible from all monitored hosts
  • No Encryption: HTTP to InfluxDB (use HTTPS in production)

Tags

This role does not define any tags. Use playbook-level tags if needed:

- hosts: all
  roles:
    - telegraf_agent
  tags:
    - telegraf
    - monitoring
    - metrics

Notes

  • Role runs on target systems (not localhost)
  • become: true required for package installation
  • InfluxDB database must exist before first run
  • Configuration validated before service restart
  • Compatible with RedHat, CentOS, Debian, Ubuntu
  • Telegraf runs as dedicated telegraf user
  • Service enabled and started automatically

Troubleshooting

Telegraf service won’t start

Check service status:

systemctl status telegraf
journalctl -u telegraf -n 50

Common causes:

  • Configuration syntax error
  • InfluxDB not accessible
  • Permission issues

Test configuration:

telegraf --config /etc/telegraf/telegraf.conf --test

Check connectivity to InfluxDB:

curl -i http://influxdb-ip:8086/ping
# Should return: HTTP/1.1 204 No Content

No metrics appearing in InfluxDB

Verify Telegraf running:

systemctl status telegraf
ps aux | grep telegraf

Check InfluxDB connection:

# From monitored host (add -u 'user:password' if InfluxDB authentication is enabled)
curl -G 'http://influxdb-ip:8086/query?db=telegraf_bdd' \
  --data-urlencode "q=SHOW MEASUREMENTS LIMIT 10"

Check Telegraf logs:

journalctl -u telegraf -f
# Look for connection errors or authentication failures

Manual test:

# Test Telegraf output
telegraf --config /etc/telegraf/telegraf.conf --test
# Should show metrics being collected

Docker metrics not collected

Symptom: No docker_* measurements in InfluxDB

Check Docker socket access:

# Verify telegraf in docker group
id telegraf
# Should show: groups=... docker ...

# Test Docker socket access
sudo -u telegraf docker ps
# Should list containers

Fix:

# Add telegraf to docker group
usermod -aG docker telegraf

# Restart Telegraf
systemctl restart telegraf

Temperature metrics missing

Symptom: No temp measurements

Check sensors:

# Install sensors if missing
apt install lm-sensors  # Debian
yum install lm_sensors  # RedHat

# Detect sensors
sensors-detect

# Test sensors
sensors
# Should show temperature readings

If no sensors found: System may not support temperature monitoring

SNMP queries failing

Symptom: No SNMP measurements

Test SNMP manually:

# Install snmp tools
apt install snmp  # Debian
yum install net-snmp-utils  # RedHat

# Query device
snmpwalk -v2c -c community_string device_ip system
# Should return OID values

Check Telegraf SNMP config:

cat /etc/telegraf/telegraf.d/snmp_*.conf
# Verify agents, community string correct

Check firewall:

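# SNMP uses UDP 161. UDP is connectionless, so a "succeeded" result from nc
# only means no ICMP error came back; the snmpwalk test is more reliable.
nc -u -zv device_ip 161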

High CPU usage

Symptom: Telegraf consuming excessive CPU

Causes:

  • Too low collection interval
  • Too many SNMP devices
  • Complex input plugins

Solutions:

# Increase interval
telegraf_agent_interval: "30s"  # From 10s

# Reduce SNMP frequency
telegraf_agent_snmp_synology_interval: "120s"  # From 60s

Testing the Role

Verify Installation

# Check package installed
dpkg -l | grep telegraf  # Debian
rpm -qa | grep telegraf  # RedHat

# Check version
telegraf version

Verify Configuration

# Test main config
telegraf --config /etc/telegraf/telegraf.conf --test

# Check what would be collected
telegraf --config /etc/telegraf/telegraf.conf --test | head -20

Verify Service

# Check service status
systemctl status telegraf

# Check service enabled
systemctl is-enabled telegraf
# Should show: enabled

Verify Metrics in InfluxDB

# Query InfluxDB
curl -G 'http://influxdb-ip:8086/query?db=telegraf_bdd' \
  --data-urlencode "q=SELECT * FROM cpu WHERE host='hostname' ORDER BY time DESC LIMIT 5"

# Should return recent CPU metrics

Best Practices

  1. Use vault for credentials: Never commit passwords to git
  2. Monitor Telegraf itself: Set up alerts for Telegraf down
  3. Test configuration changes: Use telegraf --test before deploying
  4. Appropriate intervals: Balance data granularity vs load
  5. Host-specific configs: Use variables for per-host customization
  6. SNMP frequency: Poll SNMP devices less frequently (60s+)
  7. Docker group membership: Required for Docker monitoring
  8. InfluxDB retention: Configure appropriate retention policy
  9. Regular updates: Keep Telegraf updated for bug fixes
  10. Monitor InfluxDB: Ensure it has sufficient disk space

Adding New SNMP Device Types

To monitor additional SNMP device types:

1. Create Template

Create templates/telegraf_snmp_<device>.conf.j2:

[[inputs.snmp]]
  agents = [{{ telegraf_agent_snmp_<device>_agents | map('to_json') | join(', ') }}]
  version = 2
  community = "{{ vault_snmp_community }}"
  interval = "60s"

  [[inputs.snmp.field]]
    name = "sysName"
    oid = "SNMPv2-MIB::sysName.0"
    is_tag = true

2. Add Variables to defaults/main.yml

telegraf_agent_enable_snmp_<device>: false
telegraf_agent_snmp_<device>_agents:
  - "192.168.x.x"

3. Add Task to tasks/main.yml

- name: Deploy <device> SNMP config
  ansible.builtin.template:
    src: telegraf_snmp_<device>.conf.j2
    dest: "{{ telegraf_agent_config_dir }}/snmp_<device>.conf"
    mode: '0644'
  when: telegraf_agent_enable_snmp_<device> | bool
  notify: restart telegraf

4. Discover MIBs

# Walk device MIB tree
snmpwalk -v2c -c community device_ip

# Find specific OIDs
snmpwalk -v2c -c community device_ip 1.3.6.1.4.1

Related Roles

This role is often used with:

  • influxdb: Deploy InfluxDB database server
  • grafana_install: Visualize metrics in Grafana dashboards
  • docker_install: Install Docker before monitoring it
  • deploy_system_monitoring: Centreon monitoring integration

License

MIT

Author

Created for homelab infrastructure management.