##########################################################################################
MISC-15: Theoretical: Productisation of a CDN as a service
##########################################################################################

Issue Type: Sub-task
-----------------------------------------------------------------------------------------

Issue Information
====================

Priority: Major
Status: Closed
Resolution: Done (2017-07-06 15:53:51)
Project: Miscellaneous (MISC)
Reported By: btasker
Assigned To: btasker
Child of: MISC-12 - Optimising Video Delivery for Tor / Building a Tor based CDN

Affected Versions:
- TorCDN

Targeted for fix in version:
- TorCDN

Time Estimate: 120 minutes
Time Logged: 0 minutes
-----------------------------------------------------------------------------------------

Issue Description
==================

If you were to build a Tor based CDN, ideally it'd need to be multi-tenanted; otherwise you're simply increasing the number of nodes you could potentially make a mistake on, leading to your own identification.

Providing CDN-like services as an independent third party could be an option; however, there are at least a few issues which would need to be addressed.

The aim of this issue is essentially to list those, along with possible solutions.
-----------------------------------------------------------------------------------------

Issue Relations
================

- relates to MISC-18: Alternative Load balancing design
-----------------------------------------------------------------------------------------

Activity
==========

-----------------------------------------------------------------------------------------
2015-12-24 13:20:23 btasker
-----------------------------------------------------------------------------------------
The very first issue that would need to be handled is distribution of keys across the edge.
If (as a customer) I hold the private key for foo.onion, handing that key over to a third party so that it can be distributed across the edge would be a _huge_ risk.

So, an alternative solution would be to use Tor's "subdomain" behaviour. The CDN's edge would have a fixed private key (for example, resulting in bar.onion). As the operator of foo.onion, I'd have my static content (e.g. images) referenced as being hosted on foo.bar.onion.

The CDN would know from the subdomain which origin to proxy onto, and an appropriate cache-key would be used to keep my cached content distinct from that of other users/customers.

The subdomain itself wouldn't actually need to specify the origin to proxy to, as that could lead to abuse of the CDN's proxy capability (for example, by trying to send requests to www.google.com.bar.onion). It could equally well be of the format user123456.bar.onion or even abcdefg.bar.onion
-----------------------------------------------------------------------------------------
2015-12-24 13:25:30 btasker
-----------------------------------------------------------------------------------------
Updating configuration when a new user is added could also be problematic with a large edge.

One solution would be to have the edge proxy every cache miss onto the mid-tier, regardless of the Host header received. Configuration specific to customers would then be added at the mid-tier, so the mid-tier would, in effect, also become an authentication layer.

So, if a request was received at the edge for www.google.com.bar.onion, the edge would pass the request upstream. Unless the mid-tier caches have a server block specifically for www.google.com.bar.onion, the request would result in a 403.
Which might give config like the following (here foo.onion is standing in for the CDN's own onion address)

Edge

-- BEGIN SNIPPET --
server {
    listen 127.0.0.1:80;
    server_name *.foo.onion;
    root /usr/share/nginx/onions/empty;
    resolver 127.0.0.1;

    # Proxy to the back-end
    location / {
        set $cachename "Edge1";
        proxy_set_header X-DOWNSTREAM $cachename;

        # Make sure the host header is correct
        proxy_set_header Host $host;

        # Send the request
        proxy_pass http://cix7cricsvweeu6k.onion:8091;

        # Enable Keep-Alive
        proxy_http_version 1.1;
        proxy_set_header Connection "";

        # Allow revalidations
        proxy_cache_revalidate on;

        # Collapse concurrent requests for the same uncached object
        proxy_cache_lock on;

        proxy_cache streamingcache;
        proxy_cache_key "$scheme$host$request_uri";
        add_header X-Cache-Status "$cachename-$upstream_cache_status";
    }
}
-- END SNIPPET --

Midtier

-- BEGIN SNIPPET --
server {
    listen 127.0.0.1:80;
    server_name user1234.foo.onion;
    root /usr/share/nginx/onions/empty;
    resolver 127.0.0.1;

    # Proxy to the back-end
    location / {
        set $cachename "midtier1";
        proxy_set_header X-DOWNSTREAM $cachename;

        # Make sure the host header is correct
        proxy_set_header Host "cix7cricsvweeu6k.onion";

        # Send the request
        proxy_pass http://origin1234.onion:8091;

        # Enable Keep-Alive
        proxy_http_version 1.1;
        proxy_set_header Connection "";

        # Allow revalidations
        proxy_cache_revalidate on;

        # Collapse concurrent requests for the same uncached object
        proxy_cache_lock on;

        proxy_cache streamingcache;
        proxy_cache_key "$scheme$host$request_uri";
        add_header X-Cache-Status "$cachename-$upstream_cache_status";
    }
}

# Catch-all: any Host header without its own server block is refused
server {
    listen 127.0.0.1:80 default_server;
    server_name invalid.foo.onion;

    error_page 403 /custom_403.html;
    location = /custom_403.html {
        root /usr/share/nginx/html;
        internal;
    }

    location / {
        deny all;
    }
}
-- END SNIPPET --

So in that example, a request for example.foo.onion would be passed to the midtier, but would be denied. Only requests with a Host header of user1234.foo.onion would be accepted.
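As a rough illustration of the accept/deny decision the mid-tier is making in that config, the logic could be sketched in Python as below. This is only a sketch under assumptions: TENANTS, CDN_SUFFIX, resolve_origin and cache_key are hypothetical names invented for the example, and the single tenant entry simply mirrors user1234 / origin1234.onion from the snippet. It is not how nginx implements server block selection.

```python
from typing import Optional

# Hypothetical tenant table: opaque subdomain label -> customer origin.
# The single entry mirrors the user1234 / origin1234.onion example config.
TENANTS = {
    "user1234": "origin1234.onion:8091",
}

# Assumption: foo.onion stands in for the CDN's own onion address.
CDN_SUFFIX = ".foo.onion"


def resolve_origin(host: str) -> Optional[str]:
    """Return the origin to proxy to, or None if the request should get a 403."""
    host = host.lower().rstrip(".")
    if not host.endswith(CDN_SUFFIX):
        return None
    label = host[: -len(CDN_SUFFIX)]
    # Accept exactly one opaque label, so something like
    # www.google.com.foo.onion can never select an origin.
    if not label or "." in label:
        return None
    return TENANTS.get(label)


def cache_key(scheme: str, host: str, request_uri: str) -> str:
    """Mirror proxy_cache_key "$scheme$host$request_uri" from the config."""
    return f"{scheme}{host.lower()}{request_uri}"
```

Because the Host header is part of the cache key, two customers serving the same path never share a cache entry, and because the label is opaque, the subdomain itself never tells the CDN where to proxy.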
The problem with that setup, though, is that it would become quite easy to effectively DDoS the mid-tier by spreading "invalid" requests across the edge, as every request for an unknown Host header is a cache miss at the edge and so gets passed upstream. So there'd need to be some consideration given to infrastructure to mitigate this risk (probably a requirement for a larger mid-tier).

An alternative route might be to use the NGINX Lua module to build an authentication system, or to use something like Ansible to roll configuration out to the edge.
-----------------------------------------------------------------------------------------
2016-01-10 21:16:06
-----------------------------------------------------------------------------------------
btasker changed Project from 'BenTasker.co.uk' to 'Miscellaneous'
-----------------------------------------------------------------------------------------
2016-01-10 21:16:06
-----------------------------------------------------------------------------------------
btasker changed Key from 'BEN-604' to 'MISC-15'
-----------------------------------------------------------------------------------------
2016-01-10 21:17:16
-----------------------------------------------------------------------------------------
btasker added 'TorCDN' to Version
-----------------------------------------------------------------------------------------
2016-01-10 21:17:23
-----------------------------------------------------------------------------------------
btasker added 'TorCDN' to Fix Version
-----------------------------------------------------------------------------------------
2016-01-17 15:27:53 btasker
-----------------------------------------------------------------------------------------
Marking this as related to MISC-18 as some of what has been considered here will probably tie into that issue.
-----------------------------------------------------------------------------------------
2017-07-06 15:53:51 btasker
-----------------------------------------------------------------------------------------
Marking this as resolved as I haven't touched it in ages.

For completeness though: ultimately, I went with HTTP redirect based load balancing, using the HTTP router component of my RequestRouter Geo routing solution. Each node in the "CDN" has a dedicated onion address, and the router selects one of the nodes and generates a redirect to its onion address.

Works fine for video delivery and/or static content, though you obviously wouldn't want to use it for an entire onion site (as the address in the address bar would change, and people would bookmark the wrong URL etc).

For user-facing URLs, the existing balancing mechanism (using the descriptor race) still seems to be the best solution, possibly paired with having a traditional load balancer behind each so that you can have pools of origins in multiple locations.
-----------------------------------------------------------------------------------------
2017-07-06 15:53:51
-----------------------------------------------------------------------------------------
btasker changed status from 'Open' to 'Resolved'
-----------------------------------------------------------------------------------------
2017-07-06 15:53:51
-----------------------------------------------------------------------------------------
btasker added 'Done' to resolution
-----------------------------------------------------------------------------------------
2017-07-06 15:53:57
-----------------------------------------------------------------------------------------
btasker changed status from 'Resolved' to 'Closed'
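
For illustration, the redirect based balancing described in the 2017-07-06 comment could look roughly like the sketch below. It is not RequestRouter's code: the node addresses are placeholders, build_redirect is a hypothetical helper, and picking a node at random is an assumption made purely to keep the example short (the real router may well select on other criteria).

```python
import random
from typing import Dict, Sequence, Tuple

# Placeholder node list: each "CDN" node has its own dedicated onion address.
# These addresses are made up for the example.
NODES = ["node1exampleaaaa.onion", "node2examplebbbb.onion"]


def build_redirect(
    path: str, nodes: Sequence[str] = NODES, rng=random
) -> Tuple[int, Dict[str, str]]:
    """Pick a node and return (status, headers) redirecting the client to it."""
    if not nodes:
        raise ValueError("no nodes available to redirect to")
    # Assumption: uniform random choice stands in for the real selection logic.
    node = rng.choice(list(nodes))
    return 302, {"Location": f"http://{node}{path}"}
```

The client follows the redirect and fetches the object from the chosen node's own onion address, which is why this suits video segments and static assets but not the URL a user sees in the address bar.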