The test configuration did make a noticeable difference to the shape of traffic graphs.
However, this was undermined when, at roughly 2330 last night, the remote server hit its FD limit and stopped responding to requests (though it continued to accept them).
The result was that the traffic graphs (measured client side) now show plenty of egress but no measurable ingress (and of course requests for other things aren't being satisfied either).
The configuration used is as follows:
Cron
*/15 * * * * ~/BASH_Haystack_generator/client-side/request_generator.sh -c test1 > /dev/null
*/5 * * * * ~/BASH_Haystack_generator/client-side/request_generator.sh -c test1-trickle > /dev/null
test1 config
LOCKFILE='/tmp/haystackfile.lock'
# Endpoint configuration
REMOTE_FILE='http://server.test/haystackfile.img'
HTTP_HOSTNAME='haystack.test'
# Data Downloads
MAX_CHUNK_SIZE=6553600 # Maximum chunk size to request (bytes)
MAX_REQUESTS=100 # Maximum number of requests per session
RANDOMIZE_CHUNKSIZE="y" # use the same size chunk-size throughout a session
USER_AGENT='HaystackGen V0.1'
MAX_DELAY=5 # maximum delay between requests
# Data uploads
SEND_DATA="r" # Send random data upstream?
MAX_UPSTREAM="655360" # bytes
RANDOMIZE_US_CHUNKSIZE="y" # use the same size upstream per session
test1-trickle config
LOCKFILE='/tmp/haystackfile.lock'
# Endpoint configuration
REMOTE_FILE='http://server.test/haystackfile.img'
HTTP_HOSTNAME='haystack.test'
# Data Downloads
MAX_CHUNK_SIZE=393216 # Maximum chunk size to request (bytes)
MAX_REQUESTS=1200 # Maximum number of requests per session
RANDOMIZE_CHUNKSIZE="y" # use the same size chunk-size throughout a session
USER_AGENT='HaystackGen V0.1'
MAX_DELAY=3
# Data uploads
SEND_DATA="r" # Send random data upstream?
MAX_UPSTREAM="4096" # bytes
RANDOMIZE_US_CHUNKSIZE="n"
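For context, each run boils down to a series of ranged HTTP requests against the haystack file (plus, when SEND_DATA is 'r', uploads of random data). The sketch below only illustrates what a single iteration amounts to with curl under the config values above; the real logic lives in request_generator.sh, and the variable handling here is assumed rather than copied from it.

#!/bin/bash
# Illustrative sketch only - not the actual request_generator.sh logic
REMOTE_FILE='http://server.test/haystackfile.img'
HTTP_HOSTNAME='haystack.test'
USER_AGENT='HaystackGen V0.1'
MAX_CHUNK_SIZE=6553600
MAX_UPSTREAM=655360

# Pick a pseudo-random chunk size and starting offset
CHUNK_SIZE=$(( (RANDOM * 32768 + RANDOM) % MAX_CHUNK_SIZE ))
CHUNK_START=$(( RANDOM * 32768 + RANDOM ))

# Download a byte range, overriding the Host header to match the vhost
curl -s -o /dev/null \
    -A "${USER_AGENT}" \
    -H "Host: ${HTTP_HOSTNAME}" \
    -H "Range: bytes=${CHUNK_START}-$(( CHUNK_START + CHUNK_SIZE ))" \
    "${REMOTE_FILE}"

# With SEND_DATA="r", a request also POSTs a blob of random data upstream
head -c $(( (RANDOM * 32768 + RANDOM) % MAX_UPSTREAM + 1 )) /dev/urandom | \
    curl -s -o /dev/null \
        -A "${USER_AGENT}" \
        -H "Host: ${HTTP_HOSTNAME}" \
        --data-binary @- \
        "${REMOTE_FILE}"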
Obviously a server-side fix would be to increase FD limits, but I want to see whether it's possible to achieve a config which has the desired effect on the traffic graphs without needing to change the default FD limits.
For reference, on the test server:
$ ulimit -Hn
1024
$ ulimit -Sn
1024
Endpoint is running NGinx.
Activity
2014-12-08 11:23:11
There are a lot of instances of the script still running, the vast majority of which are the trickle script. Some are probably stuck as a result of the issues with the remote server, so I've killed the lot.
Will check back in a bit to see what the count is.
2014-12-08 12:24:20
Will need to do some fine-tuning I guess
2014-12-08 13:52:46
In the meantime, as the generator is obviously able to run for longer than I expected, I'm reducing the run frequencies in crontab.
I suspect there'll still be some overlap, but it should at least reduce the effect of runaway build-up.
2014-12-08 13:55:39
The error was logged so many times that the error log hit 7GB.
2014-12-08 14:39:44
From some of the work done in BHAYSTACKG-2, I think the best way forward is to adjust the script to have different 'profiles' so that it can ape the behaviour of some legitimate traffic patterns.
2014-12-08 15:28:28
It's possible the issue isn't what was first thought. On the off-chance it's an NGinx-specific issue, I've disabled sending additional POST data upstream to see if it makes a difference (seems unlikely though).
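Assuming the SEND_DATA flag shown in the configs above is what controls those POSTs (and that 'n' is the value that turns them off, which is an assumption), the change is just:

SEND_DATA="n" # stop sending random data upstream for now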
2014-12-08 23:19:46
Have configured a second test server and am also running against that to see whether the issue can be reproduced.
2014-12-09 09:56:57
Have re-enabled POST (i.e. set SEND_DATA back to 'r') in the client for one server to see whether the issue then reappears. Have also re-increased the frequency in crontab to increase the likelihood of reproducing the issue.
If/when the issue re-occurs, will check NGinx's FD usage to confirm for definite whether it's an issue of hitting file descriptor limits.
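For when it does recur, something along these lines should show how close the NGinx processes are to the 1024 FD ceiling and how many of those descriptors are sockets stuck in CLOSE_WAIT (commands assumed to be available on the test server; run as root):

# Open FDs per NGinx process, against the per-process limit
for PID in $(pidof nginx); do
    echo "PID ${PID}: $(ls /proc/${PID}/fd | wc -l) FDs open"
    grep 'Max open files' /proc/${PID}/limits
done

# Count of sockets sat in CLOSE_WAIT
netstat -tan | grep -c CLOSE_WAIT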
2014-12-09 10:58:34
Looking at the traffic patterns generated by running the script less frequently, the result is pretty undesirable: it's possible to roughly identify when the script is likely to be running, so the increased frequency of runs is currently required.
Until the generated traffic patterns get a bit smarter (BHAYSTACKG-3), the only real solution is to up the FD limit (1024 is really low anyway) so that an appropriate frequency of checks can be maintained.
Have set the FD limit to 10,000 on the test server, will check back in a while to see what number is actually used
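For reference, raising the limit properly means touching more than one place, since NGinx caps its workers independently of the user/shell limits; the exact files and user name vary by distro, so this is a sketch of the sort of change involved rather than the precise one made:

# /etc/security/limits.conf (or a drop-in) - raise the cap for the NGinx user
# (user name assumed here)
#   www-data  soft  nofile  10000
#   www-data  hard  nofile  10000

# nginx.conf - let the workers actually use the higher cap
#   worker_rlimit_nofile 10000;
#   events { worker_connections 4096; }

# Restart and confirm the running processes picked the new limit up
sudo service nginx restart
grep 'Max open files' /proc/$(pidof nginx | awk '{print $1}')/limits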
2014-12-09 13:17:14
NGinx is also using a lot of file descriptors for the haystack file itself.
After waiting a little bit, NGinx has also started complaining that it's run out of workers.
I'm going to go out on a limb and guess that because NGinx thinks it's using a persistent connection (the default for HTTP/1.1 being keep-alive) it's not closing the connection, and for some reason the client's FIN isn't triggering the close (which would be a bug in NGinx if that's the case).
Adding a 'Connection: close' header to the request, but disabling all runs first so NGinx can be restarted on the server, to make counting the number of CLOSE_WAITs a little easier.
Updated the relevant lines in the placeRequest function (sketched below) and removed the lockfile.
The next scheduled cronjob should kick in in about 4 minutes.
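Assuming placeRequest drives curl much as in the earlier sketch, the change amounts to an extra header on each call, e.g.:

# Illustrative only (variable names assumed) - tells NGinx to close the
# connection once it has answered, rather than holding it open
curl -s -o /dev/null \
    -A "${USER_AGENT}" \
    -H "Host: ${HTTP_HOSTNAME}" \
    -H "Connection: close" \
    -H "Range: bytes=${CHUNK_START}-${CHUNK_END}" \
    "${REMOTE_FILE}"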
2014-12-09 13:27:04
The server isn't currently showing any sockets in CLOSE_WAIT
There are runs occurring that are using POST and also others using GET, so it looks like it's a bug in NGinx, though the version in the repos appears to be quite old.
Will look at compiling a newer version of NGinx in a while to re-test, but it seems best to pre-empt this bug within the client script anyway by adding a Connection header.