##########################################################################################
MISC-18: Alternative Load balancing design
##########################################################################################

Issue Type: Improvement

-----------------------------------------------------------------------------------------
Issue Information
====================

Priority: Major
Status: Closed
Resolution: Done (2017-07-06 15:50:48)
Project: Miscellaneous (MISC)
Reported By: btasker
Assigned To: btasker

Affected Versions: - TorCDN
Targeted for fix in version: - TorCDN

Time Estimate: 225 minutes
Time Logged: 15 minutes

-----------------------------------------------------------------------------------------
Issue Description
==================

The current load balancing model is contingent on the race condition in Hidden Service descriptor publishing. There's no mechanism on the edge itself to balance load; requests simply go to whichever edge device most recently published its descriptor to whichever dirauth the user's client contacts.

Although it's not complete yet, interim results from MISC-17 suggest that load may not be spread across the edge quite as evenly as hoped. Both edge devices have seen some requests, but the load has primarily been taken by one device. Although it needs testing, there's no reason to think this would be any different if one edge device reached saturation, which would have a potentially serious impact on delivery.

An alternative delivery model might be a setup like the following:

- Site embed is foo.onion/something.js
- foo.onion/something.js leads to a 302 to bar.onion/something.js or another.onion/something.js

Where a proportion of the edge would answer to bar.onion and another proportion would answer to another.onion. Obviously you could use more than two descriptors if the edge were big enough. Theoretically, all edges could support all HS descriptors, but I suspect we'd then run into the same issue we're trying to work around at the moment.

The obvious issue with this is that you're introducing the time required to set up an additional circuit into the mix, so we need to test what the performance impact is from a client's perspective. If it's negligible, then having some kind of mechanism where the initial point of contact (foo.onion) knows the rough load of the edge would allow it to intelligently decide which descriptor to use for the next request it received. Though spray and pray would probably also give some benefit when compared to the current model.

The initial point of contact would also need to be available on multiple edge devices to ensure it's redundant. In principle, it could be available on all edges, though there's a risk that saturation might then impact foo.onion too.

The aim of this issue is to test HTTP redirection based balancing and see what the cost of using that method is.
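
As a rough illustration of the "spray and pray" variant, the redirect split could be sketched with nginx's stock split_clients module. This is only a sketch: bar.onion and another.onion are the placeholder names from the description above, and the fixed percentages stand in for any load-aware logic.

-- BEGIN SNIPPET --

# Sketch only: split redirects across two placeholder descriptors.
# split_clients lives at http{} level; hashing on $msec/$request_uri rather
# than $remote_addr, since every connection arrives from the local tor daemon.
split_clients "${msec}${request_uri}" $edge_descriptor {
    50%     bar.onion;
    *       another.onion;
}

server {
    server_name foo.onion;

    # Hand the client off to whichever descriptor was selected; this is
    # where the extra circuit-setup cost gets introduced.
    return 302 http://$edge_descriptor$request_uri;
}

-- END SNIPPET --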

-----------------------------------------------------------------------------------------
Issue Relations
================

- relates to MISC-17: Image Tests
- relates to MISC-15: Theoretical: Productisation of a CDN as a service

-----------------------------------------------------------------------------------------
Activity
==========

-----------------------------------------------------------------------------------------
2016-01-17 15:50:29 btasker
-----------------------------------------------------------------------------------------

Depending on whether the module is enabled or not, we may be able to use the _perl_set_ directive to generate a descriptor selection.

-----------------------------------------------------------------------------------------
2016-01-18 19:35:27 btasker
-----------------------------------------------------------------------------------------

Unfortunately neither edge has been built with --with-http_perl_module

-- BEGIN SNIPPET --

~# nginx -V
nginx version: nginx/1.9.9
built by gcc 4.7.2 (Debian 4.7.2-5)
built with OpenSSL 1.0.1e 11 Feb 2013
TLS SNI support enabled
configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-http_ssl_module --with-http_realip_module --with-http_addition_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_random_index_module --with-http_secure_link_module --with-http_stub_status_module --with-http_auth_request_module --with-threads --with-stream --with-stream_ssl_module --with-http_slice_module --with-mail --with-mail_ssl_module --with-file-aio --with-http_v2_module --with-cc-opt='-g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2' --with-ld-opt='-Wl,-z,relro -Wl,--as-needed' --with-ipv6

-- END SNIPPET --

As the main aim is to gauge the impact of redirects, it's not the end of the world, but I'd have preferred to have built a model device selection mechanism to test against.
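
Had the module been available, a minimal perl_set based sketch of that selection might have looked something like the following (the descriptor names are placeholders, and the random pick stands in for real load-awareness):

-- BEGIN SNIPPET --

# Sketch only: requires a build with --with-http_perl_module.
# perl_set sits at http{} level.
perl_set $edge_descriptor 'sub {
    # Placeholder logic: pick a descriptor at random. A load-aware version
    # would read the current edge load (e.g. from a local status file) here.
    my @descriptors = ("bar.onion", "another.onion");
    return $descriptors[int(rand(scalar @descriptors))];
}';

-- END SNIPPET --

The resulting $edge_descriptor variable would then be dropped into the same kind of "return 302 http://$edge_descriptor$request_uri;" server block as the earlier split_clients sketch.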

-----------------------------------------------------------------------------------------
2016-01-19 07:51:32 btasker
-----------------------------------------------------------------------------------------

As boring as it is, the following config is probably sufficient for this set of tests

-- BEGIN SNIPPET --

server {
    server_name foo.onion;
    return 302 http://f5jayrbaz7nmtyyr.onion$request_uri?;
}

-- END SNIPPET --

-----------------------------------------------------------------------------------------
2016-01-25 11:04:04
-----------------------------------------------------------------------------------------

btasker changed status from 'Open' to 'In Progress'

-----------------------------------------------------------------------------------------
2016-01-25 11:17:16 btasker
-----------------------------------------------------------------------------------------

Generated a new PK by adding the following to the torrc on Edge-1

-- BEGIN SNIPPET --

HiddenServiceDir /var/lib/tor/btaskerstreamingtest-redir/
HiddenServicePort 80 127.0.0.1:9080

-- END SNIPPET --

Which gives us a descriptor of 52umrndqq5rf2o4v.onion

Configured in NGinx

-- BEGIN SNIPPET --

server {
    listen localhost:80;
    root /usr/share/nginx/empty;
    server_name 52umrndqq5rf2o4v.onion;
    return 302 http://f5jayrbaz7nmtyyr.onion$request_uri?;
}

-- END SNIPPET --

Testing

-- BEGIN SNIPPET --

ben@milleniumfalcon:~$ curl -vvv -l http://52umrndqq5rf2o4v.onion/noexist/test.foo
* Hostname was NOT found in DNS cache
*   Trying 10.228.223.96...
* Connected to 52umrndqq5rf2o4v.onion (10.228.223.96) port 80 (#0)
> GET /noexist/test.foo HTTP/1.1
> User-Agent: curl/7.35.0
> Host: 52umrndqq5rf2o4v.onion
> Accept: */*
>
< HTTP/1.1 302 Moved Temporarily
* Server nginx is not blacklisted
< Server: nginx
< Date: Mon, 25 Jan 2016 11:16:45 GMT
< Content-Type: text/html
< Content-Length: 154
< Connection: keep-alive
< Location: http://f5jayrbaz7nmtyyr.onion/noexist/test.foo?
<
<html>
<head><title>302 Found</title></head>
<body bgcolor="white">
<center><h1>302 Found</h1></center>
<hr><center>nginx</center>
</body>
</html>
* Connection #0 to host 52umrndqq5rf2o4v.onion left intact

-- END SNIPPET --

Looks good, so now it's just a case of setting the test script running.

-----------------------------------------------------------------------------------------
2016-01-25 11:17:24
-----------------------------------------------------------------------------------------

btasker changed status from 'In Progress' to 'Open'

-----------------------------------------------------------------------------------------
2016-01-25 11:17:39
-----------------------------------------------------------------------------------------

btasker changed timespent from '0 minutes' to '13 minutes'

-----------------------------------------------------------------------------------------
2016-01-25 11:20:46 btasker
-----------------------------------------------------------------------------------------

Client script triggered

-- BEGIN SNIPPET --

ben@milleniumfalcon:~$ SERIAL=0; while [ $SERIAL -lt 500000 ]; do select=`shuf -i1-2 -n1`; if [ $select == 2 ]; then extension="html"; else extension="gif"; fi; number=`shuf -i1-2000 -n1`; curl -H "X-Downstream: Serial-G$SERIAL" -sL -w "G${SERIAL},%{http_code},\"%{url_effective}\",%{time_total},%{time_namelookup},%{time_connect},%{time_redirect},%{time_starttransfer},%{size_download},%{size_request},%{num_redirects},%{speed_download}\\n" -o /dev/null "http://52umrndqq5rf2o4v.onion/qrcodes/image-${number}.${extension}" >> metricsG.csv; SERIAL=$(( $SERIAL + 1 )); done

-- END SNIPPET --

-----------------------------------------------------------------------------------------
2016-01-25 11:21:06
-----------------------------------------------------------------------------------------

btasker changed timespent from '13 minutes' to '15 minutes'

-----------------------------------------------------------------------------------------
2017-07-06 15:50:48
-----------------------------------------------------------------------------------------

btasker changed status from 'Open' to 'Resolved'

-----------------------------------------------------------------------------------------
2017-07-06 15:50:48
-----------------------------------------------------------------------------------------

btasker added 'Done' to resolution

-----------------------------------------------------------------------------------------
2017-07-06 15:50:53
-----------------------------------------------------------------------------------------

btasker changed status from 'Resolved' to 'Closed'

-----------------------------------------------------------------------------------------
Worklog
========

-----------------------------------------------------------------------------------------
2016-01-25 11:17:39 btasker 13 minutes
-----------------------------------------------------------------------------------------

Configuring and testing servers

-----------------------------------------------------------------------------------------
2016-01-25 11:21:06 btasker 2 minutes
-----------------------------------------------------------------------------------------

Triggering client requests