utilities/telegraf-plugins#6: tor-plugin will report daemon call failed if the daemon is slow to respond



Issue Information

Issue Type: issue
Status: closed
Reported By: btasker
Assigned To: btasker

Milestone: tor-plugin v0.2
Created: 14-Jun-22 13:36



Description

See Github issue 1 for the background to this.

The command handler function works as follows

    a = sock.sendall(command.encode())

    # Read the response
    res = []
    while True:
        try:
            data = sock.recv(1024)
            if not data:
                break
            res.append(data.decode())
        except BlockingIOError:
            break

The socket (sock) is opened with setblocking(0) (otherwise the loop will hang until the socket times out).

This worked fine in testing.

However, Github issue 1 shows that it's possible for it to go wrong.

If the tor daemon is a little slow to respond then we'll get an empty read from the socket and hit

            if not data:
                break

leading to us returning an empty result. The plugin will then report that the call failed (in that GH ticket the failure type was authentication, but actually it could conceivably happen to any invocation of send_and_respond.

To prove the issue was what we thought it was, in that ticket we added

time.sleep(0.5)

to the while loop, and things started working.

Whilst acceptable to the user, that's obviously not a good long term approach.

Need to look at refactoring to avoid this issue.



Toggle State Changes

Activity


assigned to @btasker

changed the description

changed the description

Essentially, this stems from being lazy and wanting the function to treat the socket as a dumb pipe - write something in, expect something to be ready to be read.

What we probably need to do is to make the function protocol aware.

Once a reply starts being sent, we can tell from the response whether it's multi or single line:

  • 250 OK: end of (successful) response
  • 250-foo=bar: single line
  • 250+foo=bar: multi-line (final line will be a period .)

So we should have the function understand these and implement read-timeouts instead: we've sent a command, if we get nothing back (or are waiting on a line), then throw a timeout.

It's worth noting that there are libraries out there which implement this kind of communication (as it's not unique to Tor) as well as stem, but they're not generally installed by default and I want to try and limit the number of deps that need installing to be able to use the plugin.

I've just merged branch bf-telegraf-plugin-blocking in my local repo into the master branch (so the commit notifications will tip up soon).

The changes

  • Move the socket from non-blocking to blocking with timeout
  • Allow for a retry if initial reads fail
  • Implement basic protocol awareness, we won't re-poll the socket if the message has 250 OK at the end of it
verified

mentioned in commit github-mirror/telegraf-plugins@f9e3ce6949fb2fbcf7be816ec5316306803df14e

Commit: github-mirror/telegraf-plugins@f9e3ce6949fb2fbcf7be816ec5316306803df14e 
Author: B Tasker                            
                            
Date: 2022-06-15T13:51:53.000+01:00 

Message

Implement basic handling of slow tor daemon for utilities/telegraf-plugins#6

This moves to using a timeout on the socket so that we give the tor-daemon a little longer to start replying

It's a fairly rudimentary implementation though, and can (will?) be improved:

  • Allow n retries if we've not received data
  • Some level of protocol awareness so we don't need to wait on a second timeout to know that the response is done
+20 -1 (21 lines changed)
verified

mentioned in commit github-mirror/telegraf-plugins@fe3d6e8311b3d66e4b4ec298b90dfd92a10db70c

Commit: github-mirror/telegraf-plugins@fe3d6e8311b3d66e4b4ec298b90dfd92a10db70c 
Author: B Tasker                            
                            
Date: 2022-06-15T17:51:28.000+01:00 

Message

Implement a retry on initial read from socket (utilities/telegraf-plugins#6)

This should cater to cases where the tor-daemon is (exceptionally) slow to start responding.

+3 -1 (4 lines changed)
verified

mentioned in commit github-mirror/telegraf-plugins@64646e96bc6bb61c44fe1c56d006699d62ea3ec0

Commit: github-mirror/telegraf-plugins@64646e96bc6bb61c44fe1c56d006699d62ea3ec0 
Author: B Tasker                            
                            
Date: 2022-06-15T18:16:27.000+01:00 

Message

Add a small semblence of protocol awareness to the socket reads (utilities/telegraf-plugins#6)

The end of any (successful) message from the daemon will be "250 OK". We check for this so we can avoid trying to read from the socket again (incurring a time penalty the length of a timeout)

+8 -1 (9 lines changed)

The user has tested and confirmed that the changes work on his system.