OPNsense Telegraf Configuration

OPNsense Telegraf Configuration Role

Overview

This role configures the Telegraf metrics collection agent on an OPNsense firewall by deploying custom monitoring scripts and configuring exec input plugins. It creates a scripts directory, deploys a CPU temperature monitoring script, adds an exec input block to telegraf.conf, and restarts the Telegraf service to activate monitoring. CPU temperature metrics are then collected and sent to InfluxDB for visualization in Grafana.

Purpose

  • CPU Temperature Monitoring: Collect thermal metrics from OPNsense
  • Custom Metrics: Deploy shell scripts for specialized monitoring
  • InfluxDB Integration: Send metrics to time-series database
  • Grafana Visualization: Enable temperature dashboards
  • Exec Input Pattern: Use Telegraf exec plugin for custom collectors
  • Automated Configuration: Deploy and configure via SSH

Requirements

  • Ansible 2.9 or higher
  • OPNsense firewall with Telegraf installed (see opnsense_install_packages)
  • SSH access to OPNsense
  • Root or sudo privileges on OPNsense
  • Network connectivity to OPNsense (VLAN10)
  • InfluxDB server configured (Telegraf output already configured)

What is Telegraf Exec Input?

Telegraf exec input runs external programs to collect metrics:

How it works:

Telegraf → Runs shell script → Parses output → Sends to InfluxDB

Example:

# Script output (InfluxDB line protocol)
sensors,feature=cpu0 temp_input=45.0
sensors,feature=cpu1 temp_input=46.5

# Telegraf collects this every interval
# Sends to InfluxDB

Benefits:

  • Collect any metric via script
  • Use existing monitoring scripts
  • Parse custom data sources
  • Extend Telegraf functionality

Role Variables

Optional Variables

| Variable | Default | Description |
|---|---|---|
| opnsense_telegraf_configuration_scripts_folder | /usr/local/etc/telegraf-scripts | Scripts directory path |
| opnsense_telegraf_configuration_temperature_script_name | cputemp.sh | CPU temp script filename |
| opnsense_telegraf_configuration_exec_timeout | 5s | Script execution timeout |
| opnsense_telegraf_configuration_data_format | influx | Data format (influx line protocol) |
| opnsense_telegraf_configuration_config_file | /usr/local/etc/telegraf.conf | Telegraf config path |

Variable Details

opnsense_telegraf_configuration_scripts_folder

Directory to store custom monitoring scripts.

Default: /usr/local/etc/telegraf-scripts

Created automatically if it doesn’t exist.

opnsense_telegraf_configuration_temperature_script_name

Name of CPU temperature monitoring script.

Default: cputemp.sh

Full path: <scripts_folder>/cputemp.sh

opnsense_telegraf_configuration_exec_timeout

Maximum time to wait for script execution.

Default: 5s

Values: Duration string (e.g., 5s, 10s, 1m)

Timeout behavior: If the script runs longer than this, Telegraf kills it and reports an error.

opnsense_telegraf_configuration_data_format

Output format from exec script.

Default: influx (InfluxDB line protocol)

Other formats: json, graphite, value, nagios

Influx format:

measurement,tag=value field=value timestamp
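
As a minimal sketch of that format, a collector script can build each line with a small helper. The function name emit_point and its argument order are hypothetical and not part of this role; the sample values are illustrative.

```shell
#!/bin/sh
# emit_point is a hypothetical helper, not part of the role.
# Usage: emit_point MEASUREMENT TAGS FIELDS
#   TAGS   comma-separated key=value pairs, or "" for none
#   FIELDS comma-separated key=value pairs (at least one)
emit_point() {
    if [ -n "$2" ]; then
        printf '%s,%s %s\n' "$1" "$2" "$3"
    else
        printf '%s %s\n' "$1" "$3"
    fi
}

# No timestamp is printed, so Telegraf assigns one at collection time.
emit_point sensors "feature=cpu0" "temp_input=45.0"
# prints: sensors,feature=cpu0 temp_input=45.0

# The trailing "i" marks an integer field in line protocol.
emit_point uptime "" "value=3600i"
# prints: uptime value=3600i
```

Leaving the timestamp out keeps the script simple and avoids clock-format mistakes.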

Dependencies

This role requires:

  • opnsense_install_packages: Install os-telegraf package first
  • SSH access to OPNsense
  • Telegraf service installed and running

Optionally used with:

  • telegraf_agent: Configure Telegraf on other hosts
  • deploy_influxdb: Deploy InfluxDB server
  • deploy_grafana: Visualize metrics

Example Playbook

Basic Usage

---
- name: Configure OPNsense Telegraf
  hosts: opnsense
  become: true

  roles:
    - opnsense_telegraf_configuration

Custom Script Directory

---
- name: Configure Telegraf with Custom Path
  hosts: opnsense
  become: true

  vars:
    opnsense_telegraf_configuration_scripts_folder: /usr/local/scripts

  roles:
    - opnsense_telegraf_configuration

Longer Timeout

---
- name: Configure Telegraf with Extended Timeout
  hosts: opnsense
  become: true

  vars:
    opnsense_telegraf_configuration_exec_timeout: 10s

  roles:
    - opnsense_telegraf_configuration

What This Role Does

  1. Create scripts directory:

    • Creates /usr/local/etc/telegraf-scripts
    • Sets permissions: 0755 (rwxr-xr-x)
    • Ensures directory exists
  2. Deploy CPU temperature script:

    • Copies cputemp.sh to scripts folder
    • Sets executable permissions: 0755
    • Script queries sysctl dev.cpu for temperatures
  3. Configure exec input:

    • Adds configuration block to /usr/local/etc/telegraf.conf
    • Uses blockinfile with markers for idempotency
    • Configures command path, timeout, data format
  4. Restart Telegraf:

    • Restarts telegraf service
    • Activates new configuration
    • Begins collecting CPU temperature metrics

CPU Temperature Script

Script Content

File: cputemp.sh

sysctl dev.cpu | grep temperature | sed 's/[a-z\.]*/sensors,feature=cpu/;s/\.[a-z]*\: / temp_input=/;s/.$//'

What It Does

  1. Query sysctl: sysctl dev.cpu - Get CPU information
  2. Filter temperature: grep temperature - Only temperature readings
  3. Format output: sed - Convert to InfluxDB line protocol
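
The three sed expressions can be traced against sample input without touching a firewall. The sketch below wraps the same expressions in a function that reads sysctl-style lines from stdin; the wrapper name to_line_protocol and the sample readings are assumptions made for illustration.

```shell
#!/bin/sh
# to_line_protocol is a hypothetical wrapper around the role's sed expressions.
# It reads sysctl-style lines on stdin so it can be exercised without sysctl.
to_line_protocol() {
    # 1. Replace the leading run of letters and dots ("dev.cpu.") with the
    #    measurement and tag prefix; the core index that follows is kept.
    # 2. Replace ".temperature: " with a space and the field key.
    # 3. Drop the last character (the "C" unit).
    grep temperature | sed \
        -e 's/[a-z\.]*/sensors,feature=cpu/' \
        -e 's/\.[a-z]*\: / temp_input=/' \
        -e 's/.$//'
}

to_line_protocol <<'EOF'
dev.cpu.0.temperature: 45.0C
dev.cpu.0.freq: 1600
dev.cpu.1.temperature: 46.5C
EOF
# prints:
# sensors,feature=cpu0 temp_input=45.0
# sensors,feature=cpu1 temp_input=46.5
```

Because the first expression stops at the first digit, the core index ends up inside the tag value (cpu0, cpu1, ...), which gives each core its own series.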

Example Output

Raw sysctl output:

dev.cpu.0.temperature: 45.0C
dev.cpu.1.temperature: 46.5C
dev.cpu.2.temperature: 44.2C
dev.cpu.3.temperature: 47.1C

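Script output (InfluxDB format). The first sed expression replaces only the leading dev.cpu., so the core index stays attached to the tag value:

sensors,feature=cpu0 temp_input=45.0
sensors,feature=cpu1 temp_input=46.5
sensors,feature=cpu2 temp_input=44.2
sensors,feature=cpu3 temp_input=47.1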

InfluxDB Line Protocol

Format:

measurement,tag1=value1,tag2=value2 field1=value1,field2=value2 timestamp

In this script:

  • Measurement: sensors
  • Tag: feature=cpu<index>, for example feature=cpu0 (one tag value per core)
  • Field: temp_input=<temperature>
  • Timestamp: Added by Telegraf automatically

Why tags: Tags are indexed, enabling fast queries like “show me all CPU metrics”
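
The sed pipeline leaves the core index inside the tag value (feature=cpu0, feature=cpu1, ...), so selecting every core needs a regex match on the tag. One possible alternative, which is not what this role deploys, is to emit a fixed feature=cpu tag plus a separate core tag. The function name to_tagged_points and the core tag key are assumptions for this sketch.

```shell
#!/bin/sh
# to_tagged_points is a hypothetical alternative to the role's sed pipeline.
# It reads sysctl-style lines on stdin and emits one point per core with
# feature=cpu plus a separate core=<index> tag.
to_tagged_points() {
    awk '/temperature/ {
        split($1, key, ".")   # key[3] is N in "dev.cpu.N.temperature:"
        temp = $2
        sub(/C$/, "", temp)   # strip the trailing unit
        printf "sensors,feature=cpu,core=%s temp_input=%s\n", key[3], temp
    }'
}

to_tagged_points <<'EOF'
dev.cpu.0.temperature: 45.0C
dev.cpu.1.temperature: 46.5C
EOF
# prints:
# sensors,feature=cpu,core=0 temp_input=45.0
# sensors,feature=cpu,core=1 temp_input=46.5
```

With this layout a plain feature='cpu' filter matches all cores, and grouping by core separates them.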

Telegraf Configuration

Exec Input Block

Added to telegraf.conf:

[[inputs.exec]]
  commands = [
    "sh /usr/local/etc/telegraf-scripts/cputemp.sh",
  ]
  timeout = "5s"
  data_format = "influx"

Configuration Details

commands: List of commands to execute

  • Can have multiple scripts
  • Executed every collection interval

timeout: Max execution time

  • Prevents hung scripts
  • Default: 5 seconds

data_format: Output parser

  • influx: InfluxDB line protocol
  • json: JSON output
  • graphite: Graphite format

Idempotency

Uses blockinfile with markers:

marker: "# {mark} ANSIBLE MANAGED BLOCK - CPU Temperature Monitoring"

  • First run: Adds configuration block
  • Subsequent runs: Updates block if changed, otherwise no-op

Safe to run repeatedly without duplicating configuration.
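
A quick way to confirm this after repeated runs is to count the BEGIN marker lines, since blockinfile substitutes BEGIN and END for {mark}. The helper name count_managed_blocks is hypothetical, and the sample file stands in for the real telegraf.conf.

```shell
#!/bin/sh
# count_managed_blocks is a hypothetical helper: it prints how many times the
# BEGIN marker for this role's block appears in the given config file.
count_managed_blocks() {
    grep -c '^# BEGIN ANSIBLE MANAGED BLOCK - CPU Temperature Monitoring$' "$1"
}

# Demonstrate against a temporary stand-in for telegraf.conf.
conf=$(mktemp)
cat > "$conf" <<'EOF'
# BEGIN ANSIBLE MANAGED BLOCK - CPU Temperature Monitoring
[[inputs.exec]]
  commands = ["sh /usr/local/etc/telegraf-scripts/cputemp.sh"]
# END ANSIBLE MANAGED BLOCK - CPU Temperature Monitoring
EOF
count_managed_blocks "$conf"   # prints: 1
rm -f "$conf"
```

On the firewall, running count_managed_blocks /usr/local/etc/telegraf.conf after two playbook runs should still print 1.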

InfluxDB Measurement

Data Storage

Measurement name: sensors

Tags:

  • feature=cpu0, cpu1, ... (one value per core)
  • host=opnsense (added by Telegraf)

Fields:

  • temp_input: Temperature value (float)

Query Examples

InfluxQL:

-- Get latest temperature per core (tag values are cpu0, cpu1, ...)
SELECT last(temp_input) FROM sensors WHERE feature =~ /^cpu/ AND host='opnsense' GROUP BY feature

-- Average per core over the last hour
SELECT mean(temp_input) FROM sensors
WHERE feature =~ /^cpu/ AND host='opnsense'
AND time > now() - 1h
GROUP BY time(5m), feature

Flux (InfluxDB 2.x):

from(bucket: "telegraf")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "sensors")
  |> filter(fn: (r) => r.feature =~ /^cpu/)
  |> filter(fn: (r) => r.host == "opnsense")
  |> filter(fn: (r) => r._field == "temp_input")

Grafana Visualization

Create Dashboard Panel

Query:

SELECT mean("temp_input")
FROM "sensors"
WHERE ("host" = 'opnsense' AND "feature" =~ /^cpu/)
AND $timeFilter
GROUP BY time($__interval), "feature" fill(linear)

Panel settings:

  • Visualization: Time series or Gauge
  • Unit: Celsius (°C)
  • Thresholds:
    • Green: < 60°C
    • Yellow: 60-80°C
    • Red: > 80°C

Alert Configuration

Alert rule:

Condition:
  WHEN avg() OF query(A, 5m, now) IS ABOVE 80

Notifications:
  Send to: admin-email
  Message: "OPNsense CPU temperature high: {{ value }}°C"

File Locations

| File | Path | Purpose |
|---|---|---|
| Scripts directory | /usr/local/etc/telegraf-scripts/ | Custom monitoring scripts |
| CPU temp script | /usr/local/etc/telegraf-scripts/cputemp.sh | Temperature collector |
| Telegraf config | /usr/local/etc/telegraf.conf | Main Telegraf configuration |
| Telegraf service | /usr/local/etc/rc.d/telegraf | FreeBSD service script |
| Telegraf logs | /var/log/telegraf/ | Service logs |

Security Considerations

  • SSH Access: Role requires SSH with root/sudo
  • Script Permissions: Scripts executable by Telegraf user
  • Command Injection: Scripts don’t accept external input (safe)
  • Read-Only Data: Scripts only read system data
  • No Network Access: Scripts run locally on OPNsense
  • Resource Usage: Scripts lightweight (negligible CPU/memory)
  • Timeout Protection: Prevents runaway scripts

Tags

This role does not define any tags. Use playbook-level tags if needed:

- hosts: opnsense
  roles:
    - opnsense_telegraf_configuration
  tags:
    - opnsense
    - telegraf
    - monitoring
    - metrics

Notes

  • Requires Telegraf already installed (via opnsense_install_packages)
  • Runs directly on OPNsense (not delegated to localhost)
  • become: true required for file operations and service restart
  • Configuration uses blockinfile for idempotency
  • Scripts run every Telegraf collection interval (typically 10s)
  • CPU temperature from sysctl dev.cpu
  • Output in InfluxDB line protocol format
  • Restarts Telegraf to apply changes

Troubleshooting

No temperature data in InfluxDB

Check Telegraf running:

# On OPNsense
service telegraf status

Check script executable:

ls -la /usr/local/etc/telegraf-scripts/cputemp.sh
# Should be: -rwxr-xr-x

Test script manually:

sh /usr/local/etc/telegraf-scripts/cputemp.sh
# Should output one line per core: sensors,feature=cpu0 temp_input=XX.X

Check Telegraf logs:

tail -f /var/log/telegraf/telegraf.log
# Look for exec plugin errors

Script timeout errors

Symptom: Telegraf logs show timeout

Error in plugin [inputs.exec]: exec: signal: killed

Solution: Increase timeout

opnsense_telegraf_configuration_exec_timeout: 10s

Permission denied errors

Symptom: Telegraf can’t execute script

Error in plugin [inputs.exec]: exec: fork/exec: permission denied

Solution: Check script permissions

chmod +x /usr/local/etc/telegraf-scripts/cputemp.sh

Invalid InfluxDB format

Symptom: Telegraf logs show parse errors

Error in plugin [inputs.exec]: metric parse error

Test script output:

sh /usr/local/etc/telegraf-scripts/cputemp.sh

Expected format:

sensors,feature=cpu0 temp_input=45.0

Not:

Temperature: 45.0C  # Wrong format
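
To spot malformed lines before Telegraf does, the script output can be piped through a simple pattern check. This is a sketch only: check_points is a hypothetical helper, and its pattern covers a deliberately small subset of line protocol (numeric fields, no escaping, no timestamp), which is all cputemp.sh emits.

```shell
#!/bin/sh
# check_points is a hypothetical helper: it reads script output on stdin,
# prints every line that does not look like
#   measurement[,tag=value...] field=number
# and returns non-zero if it printed anything. Empty input counts as valid.
check_points() {
    if grep -Ev '^[A-Za-z_][A-Za-z0-9_]*(,[A-Za-z0-9_]+=[A-Za-z0-9_.-]+)* [A-Za-z_][A-Za-z0-9_]*=-?[0-9]+(\.[0-9]+)?$'; then
        return 1   # at least one malformed line was printed
    fi
    return 0
}

printf 'sensors,feature=cpu0 temp_input=45.0\nTemperature: 45.0C\n' \
    | check_points || echo "output is not valid line protocol"
# prints:
# Temperature: 45.0C
# output is not valid line protocol
```

On the firewall this could be run as sh /usr/local/etc/telegraf-scripts/cputemp.sh | check_points, assuming the function has been defined in the current shell.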

Telegraf not restarting

Manual restart:

service telegraf restart
service telegraf status

Check service:

# FreeBSD service check
service telegraf onestatus

Testing the Role

Verify Files Created

# Check scripts directory
ls -la /usr/local/etc/telegraf-scripts/

# Check CPU temp script
ls -la /usr/local/etc/telegraf-scripts/cputemp.sh
cat /usr/local/etc/telegraf-scripts/cputemp.sh

Test Script

# Run script manually
sh /usr/local/etc/telegraf-scripts/cputemp.sh

# Expected output:
# sensors,feature=cpu0 temp_input=45.0
# sensors,feature=cpu1 temp_input=46.5
# ...

Verify Telegraf Config

# Check configuration block added
grep -A 10 "CPU Temperature Monitoring" /usr/local/etc/telegraf.conf

# Should show:
# [[inputs.exec]]
#   commands = [...]
#   timeout = "5s"
#   data_format = "influx"

Verify Telegraf Running

# Check service status
service telegraf status

# Check process
ps aux | grep telegraf

Query InfluxDB

# Via influx CLI (tag values are cpu0, cpu1, ..., so match with a regex)
influx -database telegraf -execute "SELECT * FROM sensors WHERE feature =~ /^cpu/ ORDER BY time DESC LIMIT 10"

# Via curl
curl -G 'http://influxdb:8086/query?db=telegraf' \
  --data-urlencode "q=SELECT * FROM sensors WHERE feature =~ /^cpu/ ORDER BY time DESC LIMIT 10"

Check Grafana

  1. Log into Grafana
  2. Explore → Select InfluxDB datasource
  3. Query: SELECT temp_input FROM sensors WHERE feature =~ /^cpu/
  4. Should see temperature data points

Best Practices

  1. Test scripts manually: Verify output before deploying
  2. Set appropriate timeouts: Based on script complexity
  3. Monitor script errors: Check Telegraf logs regularly
  4. Use line protocol: InfluxDB format for best performance
  5. Tag appropriately: Use tags for dimensions, fields for values
  6. Keep scripts simple: Complex logic belongs in applications
  7. Handle errors: Scripts should not fail catastrophically
  8. Document custom scripts: Comment what they measure
  9. Version control: Store scripts in git
  10. Regular testing: Ensure scripts work after OPNsense updates

Extending This Role

Add Additional Scripts

Create new script:

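#!/bin/sh
# files/disktemp.sh (despite the name, this sample reports the disk count)
# -n omits the "kern.disks:" key so NF counts only device names
sysctl -n kern.disks | awk '{print "disk_count count=" NF}'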

Update role:

- name: Deploy disk temperature script
  ansible.builtin.copy:
    src: disktemp.sh
    dest: "{{ opnsense_telegraf_configuration_scripts_folder }}/disktemp.sh"
    mode: '0755'

- name: Add disk exec input
  ansible.builtin.blockinfile:
    path: "{{ opnsense_telegraf_configuration_config_file }}"
    marker: "# {mark} ANSIBLE MANAGED BLOCK - Disk Monitoring"
    block: |
      [[inputs.exec]]
        commands = [
          "sh {{ opnsense_telegraf_configuration_scripts_folder }}/disktemp.sh",
        ]
        timeout = "5s"
        data_format = "influx"

Multiple Measurements

Script with multiple measurements:

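#!/bin/sh
# Output multiple measurements in InfluxDB line protocol
# kern.boottime looks like "{ sec = 1700000000, usec = 0 } ..."; field 4 is the boot epoch
boot=$(sysctl -n kern.boottime | awk '{print $4}' | tr -d ',')
echo "uptime value=$(( $(date +%s) - boot ))"
# Arithmetic expansion drops wc's padding and subtracts the ps header line
echo "processes count=$(( $(ps ax | wc -l) - 1 ))"
# vm.loadavg looks like "{ 0.52 0.48 0.45 }"; field 2 is the 1-minute average
echo "load,period=1min value=$(sysctl -n vm.loadavg | awk '{print $2}')"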

Related Roles

This role is often used with:

  • opnsense_install_packages: Install os-telegraf package
  • telegraf_agent: Configure Telegraf on other hosts
  • deploy_influxdb: Deploy InfluxDB server
  • deploy_grafana: Visualize metrics in dashboards

Performance Impact

  • CPU: Negligible (~0.1% per script execution)
  • Memory: Minimal (~1-2MB for Telegraf)
  • Disk I/O: Low (script reads sysctl only)
  • Network: Depends on InfluxDB connection
  • Frequency: Default 10s interval (configurable in main telegraf.conf)

Not a concern for typical homelab or production use.

License

MIT

Author

Created for homelab infrastructure management.