#1 Tor plugin : utilities/telegraf-plugins#1

btasker Permalink
11-May-22 08:06

assigned to @btasker

btasker Permalink
11-May-22 08:18

OK, as step 1, let's enable ControlPort on a tor instance.

Generate a password hash

/ $ tor --hash-password SecretPass
16:20F64DD23B8043966023A8797DDE0DE3AC697FD8461C1E7B25FF767D47

Edit torrc to enable the controlport and set the password

ControlPort 9051
HashedControlPassword 16:222D0FF1BE77A55760305E8D3A04304BC68FC37B19069DF43E52FF64E1

We can then netcat in and authenticate

/ $ nc 127.0.0.1 9051
AUTHENTICATE "SecretPass"
250 OK
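
The same exchange can be scripted. Below is a minimal sketch rather than the plugin's actual code: `authenticate` is a hypothetical helper name, and it assumes the password contains no double quotes or backslashes, since it does no escaping.

```python
import socket


def authenticate(sock: socket.socket, password: str) -> bool:
    """Send AUTHENTICATE over an open ControlPort socket.

    Returns True if tor answers with a 250 status line.
    Assumption: the password needs no escaping (no quotes or backslashes).
    """
    sock.sendall(f'AUTHENTICATE "{password}"\r\n'.encode())
    # One recv is enough for a sketch: the reply is a single short line
    reply = sock.recv(1024).decode()
    return reply.startswith("250")
```

To use it against the instance configured above, open the connection with `socket.create_connection(("127.0.0.1", 9051))` and pass the resulting socket in along with the plaintext password.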

btasker Permalink
11-May-22 08:32

We can pull

total bytes read (downloaded) and written (uploaded)

GETINFO traffic/read
250-traffic/read=5942577
250 OK
GETINFO traffic/written
250-traffic/written=8463596
250 OK

Daemon uptime

GETINFO uptime
250-uptime=339

Current software version

GETINFO version
250-version=0.4.5.10

Whether tor is currently active

A nonnegative integer: zero if Tor is currently active and building circuits, and nonzero if Tor has gone idle due to lack of use or some similar reason.

GETINFO dormant
250-dormant=0

List of circuits and their status (would need further parsing)

GETINFO circuit-status

List of entry guards and their status

GETINFO entry-guards
250+entry-guards=

Whether self tests against the ORPort worked (will report success if orport not configured)

GETINFO status/reachability-succeeded/or
250-status/reachability-succeeded/or=1

Get state for both ORPort and DirPort checks

GETINFO status/reachability-succeeded 
250-status/reachability-succeeded=OR=1 DIR=1

Get text status of current tor version

GETINFO status/version/current
250-status/version/current=recommended
250 OK

Assessment of network state (up/down)

GETINFO network-liveness
250-network-liveness=up
250 OK
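
The single-line replies above all share the `250-key=value` shape, so they are easy to pick apart. A rough sketch of how that could be done (`parse_getinfo` is a made-up name, and multi-line `250+` replies such as entry-guards are deliberately skipped here):

```python
def parse_getinfo(reply: str) -> dict:
    """Extract key/value pairs from single-line GETINFO replies.

    Only "250-key=value" lines are handled. The closing "250 OK" and
    multi-line "250+key=" data replies are ignored.
    """
    values = {}
    for line in reply.splitlines():
        if line.startswith("250-") and "=" in line:
            # Split on the first "=" only, values may contain more of them
            key, _, value = line[4:].partition("=")
            values[key] = value.strip()
    return values


reply = "250-traffic/read=5942577\r\n250 OK\r\n"
print(parse_getinfo(reply))  # {'traffic/read': '5942577'}
```

Splitting on the first `=` matters for replies like `status/reachability-succeeded`, where the value itself is `OR=1 DIR=1`.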

btasker Permalink
11-May-22 08:39

So, breaking those down into tags vs fields, I'm inclined to say

tags

dormant
or_reachability_succeeded
dp_reachability_succeeded
tor_version_state
network_liveness

fields

bytes_rx
bytes_tx
uptime
software_version

entry-guards would get broken down into the following fields

num_known_entry_guards
num_connected_entry_guards
num_down_entry_guards
num_never_connected_entry_guards
num_up_entry_guards
num_unusable_entry_guards
num_unlisted_entry_guards
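
As a sketch of that breakdown: each line in the body of the entry-guards reply should be a server ID followed by a status (up, never-connected, down, unusable or unlisted) and an optional timestamp, so counting is just a matter of tallying the second token. The function name, the `guards_*` output names and the sample lines (including the fingerprints) are invented for illustration.

```python
from collections import Counter

# Status values the control spec lists for entry-guards
STATES = ["up", "never-connected", "down", "unusable", "unlisted"]


def count_guards(lines):
    """Count entry guards per status.

    `lines` is the body of a GETINFO entry-guards reply, one guard per
    line in the form "<server id> <status> [<time>]".
    """
    seen = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            seen[parts[1]] += 1
    counts = {"guards_total": sum(seen.values())}
    for state in STATES:
        counts["guards_" + state.replace("-", "_")] = seen[state]
    return counts


# Made-up sample data, not real relays
sample = [
    "$1111111111111111111111111111111111111111~guardA up",
    "$2222222222222222222222222222222222222222~guardB never-connected",
    "$3333333333333333333333333333333333333333~guardC never-connected",
]
print(count_guards(sample))
# {'guards_total': 3, 'guards_up': 1, 'guards_never_connected': 2, 'guards_down': 0, 'guards_unusable': 0, 'guards_unlisted': 0}
```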

circuit-status needs further analysis. Section 4.1.1 of the control spec details the format

Will look at putting a script together later to connect in and collect these

btasker Permalink
11-May-22 18:03

Within the plugin, most of the stats to be collected are defined within a list:

stats = [
    #cmd, output_name, type, tag/field
    ["traffic/read", "bytes_rx", "int", "field"],
    ["traffic/written", "bytes_tx", "int", "field"],
    ["uptime", "uptime", "int", "field"],
    ["version", "tor_version", "string", "field"],
    ["dormant", "dormant", "int", "field"],
    ["status/reachability-succeeded/or", "orport_reachability", "int", "field"],
    ["status/reachability-succeeded/dir", "dirport_reachability", "int", "field"],

    ["status/version/current", "version_status", "string", "tag"],
    ["network-liveness", "network_liveness", "string", "tag"]
]

The first entry in each is the command to pass with GETINFO into the controlport. The second is the field/tag name we provide to telegraf.

The third, type, should be one of int, float or string (I guess we should add bool). It's ignored for tags, as they're always strings.

The final entry is whether it should be treated as a tag or a field.

This covers most of the items listed above - we still need to break down and parse entry-guards
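
To illustrate how a list in that shape can drive collection, here's a rough sketch (not the plugin's actual code). `build_tags_and_fields` is a hypothetical helper, `values` is assumed to map each GETINFO command to the raw string tor returned, and the failure counter is an assumption about how missing stats might be tracked.

```python
def build_tags_and_fields(stats, values):
    """Split fetched GETINFO values into tags and fields.

    Returns (tags, fields, failures), where failures counts stats that
    were missing or could not be converted to the declared type.
    """
    converters = {"int": int, "float": float, "string": str}
    tags, fields, failures = {}, {}, 0
    for cmd, name, type_, kind in stats:
        raw = values.get(cmd)
        if raw is None:
            failures += 1
            continue
        if kind == "tag":
            tags[name] = raw  # tags are always strings, so type is ignored
            continue
        try:
            fields[name] = converters[type_](raw)
        except (KeyError, ValueError):
            failures += 1
    return tags, fields, failures


example_stats = [
    ["traffic/read", "bytes_rx", "int", "field"],
    ["version", "tor_version", "string", "field"],
    ["uptime", "uptime", "int", "field"],
    ["network-liveness", "network_liveness", "string", "tag"],
]
example_values = {"traffic/read": "5942577", "version": "0.4.5.10", "network-liveness": "up"}
print(build_tags_and_fields(example_stats, example_values))
# ({'network_liveness': 'up'}, {'bytes_rx': 5942577, 'tor_version': '0.4.5.10'}, 1)
```

Keeping the mapping in data means adding a new stat is a one-line change to the list.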

btasker Permalink
11-May-22 18:38

This is now mostly built.

Default configuration is at the top of the plugin and can be overridden via environment variable

import os

# Defaults, each overridable via the named environment variable
CONTROL_H = os.getenv("CONTROL_HOST", "127.0.0.1")
CONTROL_P = int(os.getenv("CONTROL_PORT", "9051"))
AUTH = os.getenv("CONTROL_AUTH", "MySecretPass")
MEASUREMENT = os.getenv("MEASUREMENT", "tor")

We return some additional tags if we fail to connect to (or authenticate with) the Tor daemon

tor,controlport_connection=failed,failure_type=connection stats_fetch_failures=1i
tor,controlport_connection=failed,failure_type=authentication stats_fetch_failures=1i

Assuming that all is well, though, we return line protocol (LP) like this

tor,controlport_connection=success,version_status=recommended,network_liveness=up stats_fetch_failures=0i,bytes_rx=234889036i,bytes_tx=276329651i,uptime=35188i,tor_version="0.4.5.10",dormant=0i,orport_reachability=1i,dirport_reachability=1i,guards_total=22i,guards_never_connected=22i,guards_unusable=0i,guards_unlisted=0i,guards_up=0i,guards_down=0i
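
For reference, rendering tags and fields into that format is straightforward. This is a simplified sketch rather than the plugin's implementation: `to_line_protocol` is a made-up name, no timestamp is emitted, and escaping of spaces or commas in names and tag values is not handled.

```python
def to_line_protocol(measurement, tags, fields):
    """Render one line of InfluxDB line protocol (no timestamp)."""

    def fmt(value):
        # bool must be checked before int, as bool is a subclass of int
        if isinstance(value, bool):
            return "true" if value else "false"
        if isinstance(value, int):
            return f"{value}i"
        if isinstance(value, float):
            return repr(value)
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'

    tag_part = "".join(f",{k}={v}" for k, v in tags.items())
    field_part = ",".join(f"{k}={fmt(v)}" for k, v in fields.items())
    return f"{measurement}{tag_part} {field_part}"


print(to_line_protocol(
    "tor",
    {"controlport_connection": "success", "network_liveness": "up"},
    {"stats_fetch_failures": 0, "uptime": 35188, "tor_version": "0.4.5.10"},
))
# tor,controlport_connection=success,network_liveness=up stats_fetch_failures=0i,uptime=35188i,tor_version="0.4.5.10"
```

The `i` suffix is what marks a field as an integer. Without it the value would be written as a float.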

The next step then is probably to configure this in a telegraf instance and check it all works

btasker Permalink
11-May-22 18:50

The following config can be used

[[inputs.exec]]
  commands = ["/usr/local/bin/tor-daemon.py"]
  data_format = "influx"

Currently, it isn't possible to set environment variables from within Telegraf's config. Once support for that is included in a release, it'll be possible to do something like

[[inputs.exec]]
  commands = ["/usr/local/bin/tor-daemon.py"]
  data_format = "influx"
  environment = [
    "CONTROL_HOST=127.0.0.1",
    "CONTROL_PORT=9051",
    "CONTROL_AUTH=MySecretPass",
    "MEASUREMENT=tor"
  ]

I now have data appearing in my DB - will look at creating some dashboards once there's a decent amount of data to work with

btasker Permalink
11-May-22 19:59

verified

mentioned in commit github-mirror/telegraf-plugins@fa1995e59596784ef022d7a4cdd24da1051bfa54

Commit: github-mirror/telegraf-plugins@fa1995e59596784ef022d7a4cdd24da1051bfa54
Author: B Tasker
Date: 2022-05-11T19:27:06.000+01:00

Message

Report a counter of how many stats have failed to fetch. See utilities/telegraf-plugins#1

+4 -2 (6 lines changed)

btasker Permalink
11-May-22 19:59

verified

mentioned in commit github-mirror/telegraf-plugins@ef590847215243757dac97389708d423be018ab0

Commit: github-mirror/telegraf-plugins@ef590847215243757dac97389708d423be018ab0
Author: B Tasker
Date: 2022-05-11T19:03:59.000+01:00

Message

Start implementing a telegraf-plugin to monitor tor for utilities/telegraf-plugins#1

This currently collects some simple stats via control port

+121 -0 (121 lines changed)

btasker Permalink
11-May-22 19:59

verified

mentioned in commit github-mirror/telegraf-plugins@2787c195c8c625f2e2b965b0fd80bb4455b80e8b

Commit: github-mirror/telegraf-plugins@2787c195c8c625f2e2b965b0fd80bb4455b80e8b
Author: B Tasker
Date: 2022-05-11T20:09:28.000+01:00

Message

Add file header and README for utilities/telegraf-plugins#1

+112 -0 (112 lines changed)

btasker Permalink
11-May-22 19:59

verified

mentioned in commit github-mirror/telegraf-plugins@2d256804d0b40f2e6887a8e73e9724bcc5419cf0

Commit: github-mirror/telegraf-plugins@2d256804d0b40f2e6887a8e73e9724bcc5419cf0
Author: B Tasker
Date: 2022-05-11T19:22:55.000+01:00

Message

Add ability to add counters based around multiline responses. see utilities/telegraf-plugins#1

+52 -3 (55 lines changed)

btasker Permalink
12-May-22 08:15

OK, starting with the most obvious graph: network throughput

from(bucket: "telegraf/autogen")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "tor")
  |> filter(fn: (r) => r._field == "bytes_rx" or r._field == "bytes_tx")
  |> filter(fn: (r) => r.host == v.host)
  |> group(columns: ["host", "_field"])
  |> derivative(unit: 1s, nonNegative: true)
  |> aggregateWindow(every: v.windowPeriod, fn: mean)
  // Convert bytes/s to bits/s ("r with" keeps the other columns as they are)
  |> map(fn: (r) => ({ r with _value: r._value * 8.00 }))

Screenshot_20220512_094224
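
For anyone not used to Flux, the arithmetic in that query is roughly the following. This is only a sketch of the rate calculation (it skips the windowed mean), `bits_per_second` is a made-up name, and the handling of a counter going backwards reflects my reading of how `nonNegative: true` behaves: the previous value is treated as zero.

```python
def bits_per_second(samples):
    """Approximate derivative(unit: 1s, nonNegative: true) followed by * 8.

    `samples` is a list of (unix_seconds, byte_counter) tuples in time order.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if t1 <= t0:
            continue  # skip duplicate or out-of-order timestamps
        if v1 < v0:
            # Counter went backwards (e.g. tor restarted): assume it
            # restarted from zero rather than reporting a negative rate
            v0 = 0
        rates.append((t1, (v1 - v0) / (t1 - t0) * 8))
    return rates


print(bits_per_second([(0, 1000), (10, 2000), (20, 500)]))
# [(10, 800.0), (20, 400.0)]
```

The counters are cumulative since daemon start, which is why a rate has to be derived rather than graphing the raw values.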

btasker Permalink
12-May-22 08:27

Graph to show an overview of guard statuses

from(bucket: "telegraf/autogen")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "tor")
  |> filter(fn: (r) => r.host == v.host)
  |> filter(fn: (r) => r._field == "guards_down" or
            r._field == "guards_never_connected" or
            r._field == "guards_total" or
            r._field == "guards_unlisted" or
            r._field == "guards_unusable" or
            r._field == "guards_up")
  |> aggregateWindow(every: v.windowPeriod, fn: max)
  |> keep(columns: ["_time", "host", "_field", "_value"])

Screenshot_20220512_092705

btasker Permalink
12-May-22 08:32

Daemon uptime in minutes

from(bucket: "telegraf/autogen")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "tor")
  |> filter(fn: (r) => r.host == v.host)
  |> filter(fn: (r) => r._field == "uptime")
  |> aggregateWindow(every: v.windowPeriod, fn: max)
  |> map(fn: (r) => ({ r with
         _value: float(v: r._value) / 60.0
  }))

Screenshot_20220512_093239

btasker Permalink
12-May-22 08:45

Maximum observed upload rate (in kbit/s)

from(bucket: "telegraf/autogen")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "tor")
  |> filter(fn: (r) => r._field == "bytes_tx")
  |> filter(fn: (r) => r.host == v.host)
  |> derivative(unit: 1s, nonNegative: true)
  |> max()
  |> map(fn: (r) => ({ r with 
      _value: (r._value * 8.00) / 1000.00  
  }))

With its counterpart, the highest observed download rate

from(bucket: "telegraf/autogen")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "tor")
  |> filter(fn: (r) => r._field == "bytes_rx")
  |> filter(fn: (r) => r.host == v.host)
  |> derivative(unit: 1s, nonNegative: true)
  |> max()
  |> map(fn: (r) => ({ r with 
      _value: (r._value * 8.00) / 1000.00  
  }))

Screenshot_20220512_094452

btasker Permalink
12-May-22 08:51

Kibibytes downloaded

from(bucket: "telegraf/autogen")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "tor")
  |> filter(fn: (r) => r._field == "bytes_rx")
  |> filter(fn: (r) => r.host == v.host)
  |> group()
  |> difference()
  |> filter(fn: (r) => r._value > 0)
  |> sum()
  |> map(fn: (r) => ({ r with
    _value: r._value / 1024
  }))

btasker Permalink
12-May-22 08:57

Turning the network liveness result into a hot/cold gauge

from(bucket: "telegraf/autogen")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "tor")
  |> filter(fn: (r) => r._field == "bytes_rx")
  |> filter(fn: (r) => r.host == v.host)
  |> last()
  |> map(fn: (r) => ({ 
     host: r.host,
     _value: if r.network_liveness == "up" 
             then
                1
             else
                0    
     ,
     _field: "network_liveness"
  }))

Screenshot_20220512_095717

btasker Permalink
12-May-22 09:09

Doing the same for software version assessment

from(bucket: "telegraf/autogen")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "tor")
  |> filter(fn: (r) => r._field == "bytes_rx")
  |> filter(fn: (r) => r.host == v.host)
  |> last()
  |> map(fn: (r) => ({ 
     host: r.host,
     _value: 
             if r.version_status == "recommended" or r.version_status == "new" or r.version_status == "new in series"
             then
                // Good to go
                5
             else if r.version_status == "old"
             then
                // might be an issue in future
                3
             else if r.version_status == "unrecommended" or r.version_status == "obsolete"
             then
                // Uhoh
                1
             else
                // Unknown
                7
      ,
     _field: "version_status"
  }))

Screenshot_20220512_100942

btasker Permalink
12-May-22 10:16

mentioned in issue #2

btasker Permalink
13-May-22 11:13

I've published a writeup at https://www.bentasker.co.uk/posts/documentation/general/monitoring-tor-daemon-with-telegraf.html
