See Github issue 1 for the background to this.
The command handler function works as follows
a = sock.sendall(command.encode())
# Read the response
res = []
while True:
try:
data = sock.recv(1024)
if not data:
break
res.append(data.decode())
except BlockingIOError:
break
The socket (sock
) is opened with setblocking(0)
(otherwise the loop will hang until the socket times out).
This worked fine in testing.
However, Github issue 1 shows that it's possible for it to go wrong.
If the tor
daemon is a little slow to respond then we'll get an empty read from the socket and hit
if not data:
break
leading to us returning an empty result. The plugin will then report that the call failed (in that GH ticket the failure type was authentication
, but actually it could conceivably happen to any invocation of send_and_respond
.
To prove the issue was what we thought it was, in that ticket we added
time.sleep(0.5)
to the while loop, and things started working.
Whilst acceptable to the user, that's obviously not a good long term approach.
Need to look at refactoring to avoid this issue.
Activity
14-Jun-22 13:36
assigned to @btasker
14-Jun-22 14:02
changed the description
14-Jun-22 14:02
changed the description
15-Jun-22 12:16
Essentially, this stems from being lazy and wanting the function to treat the socket as a dumb pipe - write something in, expect something to be ready to be read.
What we probably need to do is to make the function protocol aware.
Once a reply starts being sent, we can tell from the response whether it's multi or single line:
250 OK
: end of (successful) response250-foo=bar
: single line250+foo=bar
: multi-line (final line will be a period.
)So we should have the function understand these and implement read-timeouts instead: we've sent a command, if we get nothing back (or are waiting on a line), then throw a timeout.
It's worth noting that there are libraries out there which implement this kind of communication (as it's not unique to Tor) as well as stem, but they're not generally installed by default and I want to try and limit the number of deps that need installing to be able to use the plugin.
15-Jun-22 17:25
I've just merged branch
bf-telegraf-plugin-blocking
in my local repo into themaster
branch (so the commit notifications will tip up soon).The changes
250 OK
at the end of it15-Jun-22 17:59
mentioned in commit github-mirror/telegraf-plugins@f9e3ce6949fb2fbcf7be816ec5316306803df14e
Message
Implement basic handling of slow tor daemon for utilities/telegraf-plugins#6
This moves to using a timeout on the socket so that we give the tor-daemon a little longer to start replying
It's a fairly rudimentary implementation though, and can (will?) be improved:
n
retries if we've not received data15-Jun-22 17:59
mentioned in commit github-mirror/telegraf-plugins@fe3d6e8311b3d66e4b4ec298b90dfd92a10db70c
Message
Implement a retry on initial read from socket (utilities/telegraf-plugins#6)
This should cater to cases where the tor-daemon is (exceptionally) slow to start responding.
15-Jun-22 17:59
mentioned in commit github-mirror/telegraf-plugins@64646e96bc6bb61c44fe1c56d006699d62ea3ec0
Message
Add a small semblence of protocol awareness to the socket reads (utilities/telegraf-plugins#6)
The end of any (successful) message from the daemon will be "250 OK". We check for this so we can avoid trying to read from the socket again (incurring a time penalty the length of a timeout)
16-Jun-22 08:18
The user has tested and confirmed that the changes work on his system.