Telegraf Agent Role
Overview
This role installs and configures the Telegraf monitoring agent on Linux systems for collecting and forwarding system metrics to InfluxDB. It uses a two-tier configuration approach with a main config file containing agent settings, InfluxDB output, and common input plugins, plus host-specific configs for specialized inputs like Docker monitoring, SNMP polling, temperature sensors, interrupts, and sysctl metrics. The role supports both RedHat and Debian systems, automatically manages repository configuration, validates Telegraf configuration before deploying, and handles service management.
Purpose
- System Metrics Collection: Monitor CPU, memory, disk, network on all systems
- InfluxDB Integration: Send metrics to centralized InfluxDB database
- Host-Specific Monitoring: Docker, SNMP, temperature sensors per host
- Two-Tier Configuration: Base + optional host-specific inputs
- Repository Management: Automatically configure InfluxData repos
- Service Management: Enable and manage Telegraf service
- Configuration Validation: Test config before applying
Requirements
- Ansible 2.9 or higher
- Target system: RedHat/CentOS or Debian/Ubuntu
- Root or sudo privileges
- InfluxDB server accessible from target hosts
- InfluxDB database and user created
- Telegraf password stored in Ansible Vault
- SNMP community strings in vault (if using SNMP monitoring)
What is Telegraf?
Telegraf is a plugin-driven server agent for collecting and reporting metrics:
Architecture:
Input Plugins  →  Telegraf Agent  →  Output Plugins
(Collect data)    (Process)          (Send to InfluxDB)
Input plugins (collect metrics):
- System: CPU, memory, disk, network
- Docker: Container metrics
- SNMP: Network device monitoring
- Sensors: Temperature, fan speed
Output plugins (send metrics):
- InfluxDB: Time-series database
- Prometheus, Graphite, etc.
Why Telegraf:
- Lightweight (~10-20MB memory)
- Plugin ecosystem (200+ inputs)
- Go-based (single binary)
- Active development
Role Variables
Required Variables
| Variable | Required | Description |
|---|---|---|
| vault_influxdb_telegraf_password | Yes | InfluxDB password (vault) |
Optional Variables
| Variable | Default | Description |
|---|---|---|
| telegraf_agent_influxdb_url | http://<grafana-ip>:8086 | InfluxDB URL |
| telegraf_agent_influxdb_database | telegraf_bdd | Database name |
| telegraf_agent_influxdb_username | telegraf | InfluxDB username |
| telegraf_agent_interval | 10s | Collection interval |
| telegraf_agent_flush_interval | 10s | Flush interval |
| telegraf_agent_enable_docker_input | false | Enable Docker monitoring |
| telegraf_agent_enable_temp_input | false | Enable temperature sensors |
| telegraf_agent_enable_interrupts_input | false | Enable interrupts monitoring |
| telegraf_agent_enable_sysctl_input | false | Enable sysctl monitoring |
| telegraf_agent_enable_snmp_synology | false | Enable Synology NAS SNMP |
| telegraf_agent_enable_snmp_zyxel_ap | false | Enable Zyxel AP SNMP |
Variable Details
vault_influxdb_telegraf_password
Password for Telegraf user in InfluxDB.
IMPORTANT: Store in Ansible Vault
# In vault.yml
vault_influxdb_telegraf_password: "secure_password_here"
Usage in playbook: Automatically referenced by default
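One way to avoid committing the password in clear text is to check that the vault file is really encrypted before it reaches git. The sketch below relies on the header line that ansible-vault writes at the top of encrypted files. The function name `is_vault_encrypted` and the path `group_vars/all/vault.yml` are assumptions for illustration, not part of this role.

```shell
# Illustrative helper, not shipped with the role.
# Succeeds when the file starts with the ansible-vault header,
# e.g. "$ANSIBLE_VAULT;1.1;AES256".
is_vault_encrypted() {
  head -n 1 "$1" 2>/dev/null | grep -q '^\$ANSIBLE_VAULT;'
}

# Hypothetical path; adjust to wherever vault.yml lives in the inventory.
vault_file="group_vars/all/vault.yml"
if is_vault_encrypted "$vault_file"; then
  echo "$vault_file is encrypted"
else
  echo "$vault_file is missing or NOT encrypted" >&2
fi
```

A check like this could run from a pre-commit hook. It only inspects the first line, so it does not prove the rest of the file is valid vault content.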
telegraf_agent_influxdb_url
URL to InfluxDB server.
Default: "http://{{ hostvars['grafana']['ip_vlan12'] }}:8086"
For Grafana host (InfluxDB on same server):
# In host_vars/grafana.yml
telegraf_agent_influxdb_url: "http://127.0.0.1:8086"
For remote InfluxDB:
telegraf_agent_influxdb_url: "http://192.168.x.x:8086"
telegraf_agent_interval
How often to collect metrics.
Default: 10s (10 seconds)
Recommendations:
- 10s: Standard (most use cases)
- 30s: Lower frequency (reduce load)
- 5s: High frequency (detailed monitoring)
Trade-off: A shorter interval yields more data points but puts more load on the host and on InfluxDB
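To put numbers on the trade-off, the sketch below counts how many collection cycles per day a host runs at a given interval. Each cycle gathers every enabled input once, so CPU and storage cost scale roughly with this figure. The helper name `cycles_per_day` is hypothetical, and it assumes the interval is written as whole seconds with an `s` suffix.

```shell
# Illustrative helper (not part of the role): collection cycles per
# host per day for an interval given as "<seconds>s".
cycles_per_day() {
  seconds="${1%s}"              # strip the trailing "s": "10s" -> "10"
  echo $(( 86400 / seconds ))   # 86400 seconds in a day
}

for interval in 5s 10s 30s; do
  printf '%-4s %s cycles/day\n' "$interval" "$(cycles_per_day "$interval")"
done
# 5s   17280 cycles/day
# 10s  8640 cycles/day
# 30s  2880 cycles/day
```

Moving from 10s to 30s cuts the number of cycles to a third, which is usually the first thing to try on a loaded host.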
telegraf_agent_enable_docker_input
Enable Docker container monitoring.
Default: false
Enable for Docker hosts:
# In host_vars/docker.yml
telegraf_agent_enable_docker_input: true
What it monitors:
- Container CPU usage
- Container memory usage
- Container network I/O
- Container disk I/O
- Container status
Requirements: Docker installed, Telegraf user added to docker group
telegraf_agent_enable_temp_input
Enable temperature sensor monitoring.
Default: false
Enable for servers:
# In host_vars/centreon.yml
telegraf_agent_enable_temp_input: true
What it monitors:
- CPU temperature
- Motherboard temperature
- Disk temperature
Requirements: lm-sensors package installed, sensors detected
telegraf_agent_enable_interrupts_input
Enable interrupts monitoring.
Default: false
Enable for detailed system monitoring:
telegraf_agent_enable_interrupts_input: true
What it monitors:
- IRQ interrupts per CPU
- Useful for troubleshooting high interrupt rates
telegraf_agent_enable_sysctl_input
Enable sysctl kernel parameters monitoring.
Default: false
Enable for kernel monitoring:
telegraf_agent_enable_sysctl_input: true
What it monitors:
- Kernel parameters
- System tunables
telegraf_agent_enable_snmp_synology
Enable SNMP monitoring for Synology NAS.
Default: false
Enable for Grafana host (runs SNMP queries):
# In host_vars/grafana.yml
telegraf_agent_enable_snmp_synology: true
What it monitors:
- NAS system info
- Disk status and usage
- RAID status
- Network interfaces
- Temperature sensors
Requirements:
- Synology SNMP enabled
- SNMP community string in vault
telegraf_agent_enable_snmp_zyxel_ap
Enable SNMP monitoring for Zyxel WiFi access points.
Default: false
Enable with target IPs:
# In host_vars/grafana.yml
telegraf_agent_enable_snmp_zyxel_ap: true
telegraf_agent_snmp_zyxel_agents:
- "192.168.x.x"
- "192.168.x.x"
What it monitors:
- AP system info
- WiFi client count
- Network traffic
- Device status
Note: Template includes example MIBs that need customization
Dependencies
No Ansible role dependencies, but requires:
- InfluxDB server with database created
- InfluxDB user with write permissions
- Network connectivity to InfluxDB
- Telegraf password in vault
Often used with:
- influxdb: Deploy InfluxDB database
- grafana_install: Visualize metrics in Grafana
- docker_install: Install Docker before enabling Docker monitoring
Example Playbook
Basic Usage (All Hosts)
---
- name: Deploy Telegraf Agent
  hosts: all
  become: true
  roles:
    - telegraf_agent
Docker Host
---
- name: Deploy Telegraf on Docker Host
  hosts: docker
  become: true
  vars:
    telegraf_agent_enable_docker_input: true
    telegraf_agent_enable_temp_input: true
    telegraf_agent_enable_interrupts_input: true
    telegraf_agent_enable_sysctl_input: true
  roles:
    - telegraf_agent
Grafana Host (with SNMP Monitoring)
---
- name: Deploy Telegraf on Grafana
  hosts: grafana
  become: true
  vars:
    telegraf_agent_influxdb_url: "http://127.0.0.1:8086"
    telegraf_agent_enable_snmp_synology: true
  roles:
    - telegraf_agent
Standard Server
---
- name: Deploy Telegraf on Servers
  hosts: centreon,graylog,zoneminder
  become: true
  vars:
    telegraf_agent_enable_temp_input: true
    telegraf_agent_enable_interrupts_input: true
    telegraf_agent_enable_sysctl_input: true
  roles:
    - telegraf_agent
What This Role Does
1. Configure InfluxData Repository
RedHat/CentOS:
- Adds InfluxData yum repository
- Installs GPG key for package verification
- Repository URL:
https://repos.influxdata.com/rhel/$releasever/$basearch/stable
Debian/Ubuntu:
- Adds InfluxData apt repository
- Installs GPG key
- Repository:
https://repos.influxdata.com/debian
2. Install Telegraf Package
Installs telegraf package from InfluxData repository.
Version: Latest stable from repository
3. Deploy Main Configuration
File: /etc/telegraf/telegraf.conf
Content:
- Agent settings (interval, flush, buffer)
- InfluxDB output configuration
- Common input plugins (always enabled):
- CPU metrics
- Disk usage
- Disk I/O
- Kernel statistics
- Memory usage
- Processes
- Swap usage
- System load/uptime
- Network interfaces
- Network statistics
4. Deploy Host-Specific Configurations
Directory: /etc/telegraf/telegraf.d/
Conditional configs (based on enabled flags):
- docker.conf: Docker monitoring (telegraf_agent_enable_docker_input: true)
- temp.conf: Temperature sensors (telegraf_agent_enable_temp_input: true)
- interrupts.conf: Interrupt monitoring (telegraf_agent_enable_interrupts_input: true)
- sysctl.conf: Sysctl monitoring (telegraf_agent_enable_sysctl_input: true)
- snmp_synology.conf: Synology SNMP (telegraf_agent_enable_snmp_synology: true)
- snmp_zyxel.conf: Zyxel AP SNMP (telegraf_agent_enable_snmp_zyxel_ap: true)
5. Add Telegraf to Docker Group (if Docker enabled)
If telegraf_agent_enable_docker_input: true:
- Adds the telegraf user to the docker group
- Allows Telegraf to access the Docker socket
- Restarts Telegraf to apply group membership
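The result of this step can be checked on a Docker host with a small membership test. This is a sketch only: the helper name `has_group` is hypothetical, and it assumes the group list comes from `id -nG`, which prints group names separated by spaces.

```shell
# Illustrative helper: succeeds when group $2 appears in the
# space-separated group list $1 (the format printed by `id -nG`).
has_group() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *)        return 1 ;;
  esac
}

if telegraf_groups=$(id -nG telegraf 2>/dev/null); then
  if has_group "$telegraf_groups" docker; then
    echo "telegraf is in the docker group"
  else
    echo "telegraf is NOT in the docker group" >&2
  fi
else
  echo "user telegraf does not exist on this host" >&2
fi
```

Padding both strings with spaces keeps a group such as `dockerroot` from matching `docker`. A running process keeps the groups it started with, which is why the restart in this step matters.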
6. Validate Configuration
Before restarting Telegraf:
- Runs telegraf --config /etc/telegraf/telegraf.conf --test
- Validates configuration syntax
- Fails deployment if the config is invalid
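The same validate-before-restart gate can be reproduced by hand when editing configs outside Ansible. The sketch below is an assumption about how such a check might look, not the role's actual task. `TELEGRAF_BIN` is a hypothetical override so the function can be exercised without Telegraf installed, and `--config-directory` is added here so that files under telegraf.d/ are checked too.

```shell
# Sketch of a validate-before-restart gate (not the role's own task).
# TELEGRAF_BIN is a hypothetical override; it defaults to the real binary.
# $1 = main config path, $2 = config directory (defaults match the role).
telegraf_config_valid() {
  "${TELEGRAF_BIN:-telegraf}" \
    --config "${1:-/etc/telegraf/telegraf.conf}" \
    --config-directory "${2:-/etc/telegraf/telegraf.d}" \
    --test >/dev/null 2>&1
}

# Usage on a monitored host:
#   telegraf_config_valid && systemctl restart telegraf
```

Note that `--test` runs each input once rather than only parsing the files, so an input that cannot gather data may fail the check as well.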
7. Enable and Start Service
- Enables Telegraf service on boot
- Starts or restarts Telegraf service
- Service runs as the telegraf user
Two-Tier Configuration Architecture
Why two tiers?
- Main config (telegraf.conf):
  - Common to all hosts
  - Agent settings
  - InfluxDB output
  - Standard system inputs
- Host-specific configs (telegraf.d/*.conf):
  - Varies by host role
  - Docker, SNMP, sensors
  - Enabled via variables
Benefits:
- Clean separation of concerns
- Easy per-host customization
- Modular configuration
Loading order:
- Main config processed first
- Files in telegraf.d/ loaded alphabetically
- All configs merged
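To see which files a given host would load, and in what order, the following sketch prints the main config followed by the telegraf.d/ files sorted by name. The helper name `list_telegraf_configs` is hypothetical, and the default root `/etc/telegraf` is taken from the paths this README uses.

```shell
# Illustrative helper, not part of the role: print the config files in
# the order described above. $1 = config root (defaults to /etc/telegraf).
list_telegraf_configs() {
  conf_root="${1:-/etc/telegraf}"
  if [ -f "$conf_root/telegraf.conf" ]; then
    echo "$conf_root/telegraf.conf"
  fi
  # Shell globs expand in sorted order, which mirrors the alphabetical rule.
  for conf in "$conf_root"/telegraf.d/*.conf; do
    if [ -f "$conf" ]; then
      echo "$conf"
    fi
  done
}

list_telegraf_configs /etc/telegraf
```

On a host where only Docker and temperature inputs are enabled, the output would list telegraf.conf, then docker.conf, then temp.conf.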
Common Input Plugins (Always Enabled)
CPU:
[[inputs.cpu]]
percpu = true
totalcpu = true
Disk:
[[inputs.disk]]
ignore_fs = ["tmpfs", "devtmpfs", "devfs"]
Memory:
[[inputs.mem]]
Network:
[[inputs.net]]
[[inputs.netstat]]
System:
[[inputs.system]]
[[inputs.kernel]]
[[inputs.processes]]
[[inputs.swap]]
Host-Specific Configurations
Docker Input
File: /etc/telegraf/telegraf.d/docker.conf
Configuration:
[[inputs.docker]]
endpoint = "unix:///var/run/docker.sock"
container_names = []
Metrics collected:
- docker_container_cpu: CPU usage per container
- docker_container_mem: Memory usage per container
- docker_container_net: Network I/O per container
- docker_container_blkio: Disk I/O per container
Temperature Input
File: /etc/telegraf/telegraf.d/temp.conf
Configuration:
[[inputs.temp]]
Metrics collected:
temp: Temperature readings from all sensors
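Requirements: hardware sensors exposed by the kernel under /sys/class/hwmon (on Linux, inputs.temp reads them directly); lm-sensors helps detect and verify them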
SNMP Input (Synology)
File: /etc/telegraf/telegraf.d/snmp_synology.conf
Configuration:
[[inputs.snmp]]
  agents = ["192.168.x.x:161"]
  version = 2
  community = "{{ vault_snmp_community }}"
  interval = "60s"
  [[inputs.snmp.field]]
    name = "sysName"
    oid = "SNMPv2-MIB::sysName.0"
    is_tag = true
Metrics collected:
- System information
- Disk status and usage
- RAID array status
- Network interface stats
- Temperature sensors
InfluxDB Output Configuration
Configuration:
[[outputs.influxdb]]
urls = ["http://192.168.x.x:8086"]
database = "telegraf_bdd"
retention_policy = "trente_jours"
username = "telegraf"
password = "password_from_vault"
Connection:
- HTTP protocol (not HTTPS by default)
- Port 8086 (InfluxDB default)
- Basic authentication
Retention policy:
- trente_jours: 30 days retention
- Configured in InfluxDB
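Since the policy has to exist in InfluxDB before Telegraf writes to it, here is a minimal sketch that builds the InfluxQL statement for it. It assumes InfluxDB 1.x (InfluxQL) and a single-node setup, hence `REPLICATION 1`. The helper name `retention_policy_stmt` is hypothetical.

```shell
# Illustrative helper: print the InfluxQL statement that creates a
# retention policy. $1 = policy name, $2 = database, $3 = duration
# as an InfluxQL literal such as 30d. Assumes InfluxDB 1.x, single node.
retention_policy_stmt() {
  printf 'CREATE RETENTION POLICY "%s" ON "%s" DURATION %s REPLICATION 1\n' \
    "$1" "$2" "$3"
}

retention_policy_stmt trente_jours telegraf_bdd 30d
# CREATE RETENTION POLICY "trente_jours" ON "telegraf_bdd" DURATION 30d REPLICATION 1
```

On the InfluxDB server the statement can be passed to the 1.x CLI, for example `influx -execute "$(retention_policy_stmt trente_jours telegraf_bdd 30d)"`, with credentials added as required by the local setup.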
Mapping to Hosts
Example host configurations:
| Host | Required Variables |
|---|---|
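| docker | telegraf_agent_enable_docker_input: true; telegraf_agent_enable_interrupts_input: true; telegraf_agent_enable_sysctl_input: true |
| centreon | telegraf_agent_enable_temp_input: true; telegraf_agent_enable_interrupts_input: true; telegraf_agent_enable_sysctl_input: true |
| grafana | telegraf_agent_influxdb_url: "http://127.0.0.1:8086"; telegraf_agent_enable_snmp_synology: true |
| graylog | telegraf_agent_enable_temp_input: true; telegraf_agent_enable_interrupts_input: true; telegraf_agent_enable_sysctl_input: true |
| zoneminder | telegraf_agent_enable_temp_input: true; telegraf_agent_enable_interrupts_input: true; telegraf_agent_enable_sysctl_input: true |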
File Locations
| File | Path | Purpose |
|---|---|---|
| Main config | /etc/telegraf/telegraf.conf | Agent and common inputs |
| Host configs | /etc/telegraf/telegraf.d/*.conf | Host-specific inputs |
| Service file | /etc/systemd/system/telegraf.service | Systemd service |
| Binary | /usr/bin/telegraf | Telegraf executable |
| Logs | journalctl -u telegraf | Service logs |
Security Considerations
- Vault Credentials: InfluxDB password in Ansible Vault
- SNMP Communities: Community strings in vault
- File Permissions: Config files readable by telegraf user
- Docker Socket: Limited to docker group members
- Network Security: InfluxDB accessible from all monitored hosts
- No Encryption: HTTP to InfluxDB (use HTTPS in production)
Tags
This role does not define any tags. Use playbook-level tags if needed:
- hosts: all
  roles:
    - telegraf_agent
  tags:
    - telegraf
    - monitoring
    - metrics
Notes
- Role runs on target systems (not localhost)
- become: true required for package installation
- InfluxDB database must exist before first run
- Configuration validated before service restart
- Compatible with RedHat, CentOS, Debian, Ubuntu
- Telegraf runs as dedicated telegraf user
- Service enabled and started automatically
Troubleshooting
Telegraf service won’t start
Check service status:
systemctl status telegraf
journalctl -u telegraf -n 50
Common causes:
- Configuration syntax error
- InfluxDB not accessible
- Permission issues
Test configuration:
telegraf --config /etc/telegraf/telegraf.conf --test
Check connectivity to InfluxDB:
curl -i http://influxdb-ip:8086/ping
# Should return: HTTP/1.1 204 No Content
No metrics appearing in InfluxDB
Verify Telegraf running:
systemctl status telegraf
ps aux | grep telegraf
Check InfluxDB connection:
# From monitored host
# If authentication is enabled, add: -u telegraf (curl prompts for the password)
curl -G 'http://influxdb-ip:8086/query?db=telegraf_bdd' \
  --data-urlencode "q=SHOW MEASUREMENTS LIMIT 10"
Check Telegraf logs:
journalctl -u telegraf -f
# Look for connection errors or authentication failures
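When the journal is busy, it can help to keep only error-level lines. Telegraf prefixes each log message with a level marker (E! error, W! warning, I! info, D! debug), so a plain filter on that marker is enough. The helper name `telegraf_errors` is hypothetical.

```shell
# Illustrative filter: read log lines on stdin and keep only the
# error-level ones, identified by Telegraf's "E!" level marker.
telegraf_errors() {
  grep -F ' E! '
}

# Usage on a monitored host:
#   journalctl -u telegraf --since "15 min ago" --no-pager | telegraf_errors
```

The filter exits non-zero when no error lines are found, which makes it usable as a quick health check in scripts.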
Manual test:
# Run every input once and print the metrics (nothing is sent to InfluxDB)
telegraf --config /etc/telegraf/telegraf.conf --config-directory /etc/telegraf/telegraf.d --test
# Should show metrics being collected
Docker metrics not collected
Symptom: No docker_* measurements in InfluxDB
Check Docker socket access:
# Verify telegraf in docker group
id telegraf
# Should show: groups=... docker ...
# Test Docker socket access
sudo -u telegraf docker ps
# Should list containers
Fix:
# Add telegraf to docker group
usermod -aG docker telegraf
# Restart Telegraf
systemctl restart telegraf
Temperature metrics missing
Symptom: No temp measurements
Check sensors:
# Install sensors if missing
apt install lm-sensors # Debian
yum install lm_sensors # RedHat
# Detect sensors
sensors-detect
# Test sensors
sensors
# Should show temperature readings
If no sensors found: System may not support temperature monitoring
SNMP queries failing
Symptom: No SNMP measurements
Test SNMP manually:
# Install snmp tools
apt install snmp # Debian
yum install net-snmp-utils # RedHat
# Query device
snmpwalk -v2c -c community_string device_ip system
# Should return OID values
Check Telegraf SNMP config:
cat /etc/telegraf/telegraf.d/snmp_*.conf
# Verify agents, community string correct
Check firewall:
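# SNMP uses UDP 161. UDP is connectionless, so nc can report success
# even when nothing answers; rely on the snmpwalk test above for a real check.
nc -u -zv device_ip 161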
High CPU usage
Symptom: Telegraf consuming excessive CPU
Causes:
- Too low collection interval
- Too many SNMP devices
- Complex input plugins
Solutions:
# Increase interval
telegraf_agent_interval: "30s" # From 10s
# Reduce SNMP frequency
telegraf_agent_snmp_synology_interval: "120s" # From 60s
Testing the Role
Verify Installation
# Check package installed
dpkg -l | grep telegraf # Debian
rpm -qa | grep telegraf # RedHat
# Check version
telegraf version
Verify Configuration
# Test main config
telegraf --config /etc/telegraf/telegraf.conf --test
# Check what would be collected
telegraf --config /etc/telegraf/telegraf.conf --test | head -20
Verify Service
# Check service status
systemctl status telegraf
# Check service enabled
systemctl is-enabled telegraf
# Should show: enabled
Verify Metrics in InfluxDB
# Query InfluxDB
curl -G 'http://influxdb-ip:8086/query?db=telegraf_bdd' \
--data-urlencode "q=SELECT * FROM cpu WHERE host='hostname' ORDER BY time DESC LIMIT 5"
# Should return recent CPU metrics
Best Practices
- Use vault for credentials: Never commit passwords to git
- Monitor Telegraf itself: Set up alerts for Telegraf down
- Test configuration changes: Use telegraf --test before deploying
- Appropriate intervals: Balance data granularity vs load
- Host-specific configs: Use variables for per-host customization
- SNMP frequency: Poll SNMP devices less frequently (60s+)
- Docker group membership: Required for Docker monitoring
- InfluxDB retention: Configure appropriate retention policy
- Regular updates: Keep Telegraf updated for bug fixes
- Monitor InfluxDB: Ensure it has sufficient disk space
Adding New SNMP Device Types
To monitor additional SNMP device types:
1. Create Template
Create templates/telegraf_snmp_<device>.conf.j2:
[[inputs.snmp]]
  agents = [{{ telegraf_agent_snmp_<device>_agents | map('to_json') | join(', ') }}]
  version = 2
  community = "{{ vault_snmp_community }}"
  interval = "60s"
  [[inputs.snmp.field]]
    name = "sysName"
    oid = "SNMPv2-MIB::sysName.0"
    is_tag = true
2. Add Variables to defaults/main.yml
telegraf_agent_enable_snmp_<device>: false
telegraf_agent_snmp_<device>_agents:
- "192.168.x.x"
3. Add Task to tasks/main.yml
- name: Deploy <device> SNMP config
  ansible.builtin.template:
    src: telegraf_snmp_<device>.conf.j2
    dest: "{{ telegraf_agent_config_dir }}/snmp_<device>.conf"
    mode: '0644'
  when: telegraf_agent_enable_snmp_<device> | bool
  notify: restart telegraf
4. Discover MIBs
# Walk device MIB tree
snmpwalk -v2c -c community device_ip
# Find specific OIDs
snmpwalk -v2c -c community device_ip 1.3.6.1.4.1
Related Roles
This role is often used with:
- influxdb: Deploy InfluxDB database server
- grafana_install: Visualize metrics in Grafana dashboards
- docker_install: Install Docker before monitoring it
- deploy_system_monitoring: Centreon monitoring integration
License
MIT
Author
Created for homelab infrastructure management.