Description
Want to give some thought to whether it's a good idea to also make the site available as a tor HS.
I don't want the Tor client running on the main server for testing, but it could be run on the dev server with an NGinx reverse proxy set up and then moved across if/once it goes live.
That would also allow for tor specific tweaks (like flat out denying any attempt to access administration pages - I generally connect to those via VPN anyway).
I don't need the anonymity protection of a HS for bentasker.co.uk, but it's possible that there may be people who'd rather read via a HS than over the clearnet - this is also, very much, a test-in-principle for another site with a similar set up.
Need to assess the risks, design the setup and test well before making the address publicly available.
If anything, bentasker.co.uk should present a few more challenges than the site this will eventually be targeted at.
Activity
2015-05-16 19:08:08
Also need to configure how Admin tools will behave - if a user repeatedly tries to compromise the front-end, it's GERA's IP that will be blocked.
Will also need to make sure all URLs within the site are relative (they should be) so that people don't get redirected to the clearnet.
2015-05-17 14:42:59
2015-05-19 11:19:28
The plan, at this point, is as follows (one comment per section to try and keep it readable)
2015-05-19 11:19:40
The tor client will forward port 80 to an HTTP reverse proxy (listening only on localhost) which will then proxy onto the main site via HTTPS.
In doing so, it'll make a few changes when going upstream:
- Host header will be changed (obviously)
- Insert a header to denote the source is the .onion (more on that in a bit)
- Certain content might be served from disk rather than proxied upstream (more on that in a bit)
Technically, because we're going from SSL to plain, you could capture comms between Tor and the NGinx RP, but if you've got the ability to tcpdump the loopback adapter there are plenty of other attacks you could launch (like stealing the HS private key).
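As a rough sketch, the reverse proxy server block on the dev box might look something like the below (the listen port, header name and foo.onion are placeholders rather than the real values):

    server {
        # Only listen on loopback - the tor client forwards the HS's port 80 here
        listen 127.0.0.1:8080;
        server_name foo.onion;

        # Example of serving certain content from local disk rather than proxying
        location = /robots.txt {
            root /usr/share/nginx/onion-overrides;
        }

        location / {
            # Proxy onto the main site via HTTPS
            proxy_pass https://www.bentasker.co.uk;
            # Host header changed for the upstream request
            proxy_set_header Host www.bentasker.co.uk;
            # Header to denote that the source is the .onion (placeholder name/value)
            proxy_set_header X-From-Onion 1;
        }
    }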
2015-05-19 11:19:58
I had originally thought the best way to address the tor2web issue was going to be to serve a customised robots.txt on the .onion.
Still going to do that; however, tor2web also includes a header identifying the connection as tor2web (see http://comments.gmane.org/gmane.network.tor.user/34292 ), so we can block (with a useful message) based on that. Not only does it prevent Google from indexing the site at a different URL, it also gives the opportunity to tell a genuine user that they can access the site directly via HTTPS or the .onion (reducing the risk of MITM).
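As a sketch of that block at the Tor RP, assuming the header arrives as X-Tor2web (worth confirming the exact name against the thread above):

    # tor2web gateways identify themselves with a header, so refuse those
    # requests with a pointer at the safer options
    if ($http_x_tor2web) {
        return 403 "Please don't use tor2web - this site is available directly over HTTPS at www.bentasker.co.uk, or via its .onion";
    }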
2015-05-19 11:20:16
This one is a little more complex (and getting it just right may branch into a sub-task at some point).
Need to be sure that when the Application level protections are repeatedly triggered via the .onion, the resulting ban doesn't adversely affect innocent users who are also accessing via the .onion.
I'm not too keen to make the protections more permissive, as it doesn't address the root issue, just makes it harder to trip, and weakens security in the process.
The method used by Facebook is to tell the origin that the source IP of the client is within the DHCP broadcast network (to ensure it's not routable and won't be in use elsewhere in the network). When protections trip, they've got a real-enough IP to block, meaning the protections themselves don't need to be tinkered with.
So, I could drop a 'unique' IP into X-Forwarded-For (or use a different header) for each request.
If the same IP is used for all requests within a given connection, the protections can at least effectively invalidate that HTTP keep-alive session.
The downside is that disconnecting the TCP session and starting a new one (or just not using keep-alive) would be all an attacker would need to do to circumvent the ban. But, then, the whole point is that the protections should be good enough to block exploit attempts whether it's the first request made or the millionth.
It's not particularly hard to circumvent IP based bans on the WWW either, so I'm going to roll with it and then re-review later I think.
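A minimal sketch of the per-connection variant at the Tor RP (the header name is a placeholder, and $connection is just NGinx's connection serial number - the aim is only to give the protections something to key on, not to produce a genuinely unique client identifier):

    # Within the proxy location on the Tor RP: tag each request with an
    # identifier tied to the client's TCP connection, so a ban invalidates
    # that keep-alive session without touching other .onion visitors
    proxy_set_header X-Onion-Client "conn-$connection";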
2015-05-19 11:21:06
Will need to make sure that all URLs are relative, and re-write those that are not.
In particular, absolute URLs are currently used for static content (as certain static content is served from a subdomain to allow the browser to parallelize a bit). Those URLs will need to be rewritten.
I think, again, I'm going to follow Facebook's approach on this one - I'll rewrite to a subdomain of the .onion
So, taking the existing flow, static content is currently requested from the dedicated subdomain over HTTPS.
Simply need to adjust the plugin so that if the source is using a .onion (denoted by the NGinx tweak noted above), the same content is instead requested from a subdomain of the .onion over HTTP.
Essentially, all we want to do is rewrite the scheme (from https to http), the domain name and the TLD.
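Purely as an illustration of the transformation (the real change will live in the plugin; static.bentasker.co.uk stands in for whichever subdomain actually serves the static content, foo.onion for the real address, and it assumes NGinx is built with the sub module), the same rewrite could be expressed at the Tor RP with sub_filter:

    # Responses need to arrive uncompressed for sub_filter to work on them
    proxy_set_header Accept-Encoding "";

    # https://static... on the www becomes http://static... on the .onion
    sub_filter 'https://static.bentasker.co.uk' 'http://static.foo.onion';
    sub_filter_once off;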
Similarly, need to make sure that there's little to nothing that actually depends on Javascript being functional - it should be assumed that .onion visitors are going to have Javascript disabled (though that's generally been the assumption on the www. side anyway)
2015-05-19 11:21:57
I'll obviously need to review anything I've got in place CORS-wise to make sure the new domain (foo.onion) is permitted, so that browser protections don't kick in and cause broken rendering.
There shouldn't be much to check/change, but it needs doing
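If any of it is managed in NGinx rather than the application, the change would probably amount to something like this (foo.onion being a placeholder):

    # http{} context: extend the allow-list so the .onion origin is permitted too
    map $http_origin $cors_allowed_origin {
        default                        "";
        "https://www.bentasker.co.uk"  $http_origin;
        "http://foo.onion"             $http_origin;
    }

    # In the server/location blocks serving the shared resources
    add_header Access-Control-Allow-Origin $cors_allowed_origin;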
2015-05-19 11:22:09
The idea of redirecting anyone coming from a Tor Exit to the .onion had been mooted - but it's been pointed out that it may well be wise to try to avoid unexpected behaviour for Tor visitors.
Although I'm not currently looking at having to disable any public functionality for the .onion, there's a possibility that I may need to do so once I get into it. So, it could be that implementing such a redirect would mean taking the visitor to a site that doesn't contain the functionality they want (but would have done had they been permitted to use the exit).
Seems best to revisit this once everything else is set up.
2015-05-19 11:22:47
The plan, from the outset, has been to offer the .onion via port 80, to avoid certificate warnings. In the longer term, though, there may be value in looking at the option of also offering HTTPS.
Apparently Mozilla have announced that they plan to gate new features to only work on HTTPS connections ( https://blog.mozilla.org/security/2015/04/30/deprecating-non-secure-http/ ). Obviously whether that affects Tor users will depend on how exactly Mozilla go about doing that (i.e. whether it's something that can be easily reverted/tested in TBB) as well as which features end up unavailable.
Using HTTPS would also allow Content Security Policy (CSP) to be used, so theoretically any link-clicks could be reported (using POST) to an .onion endpoint to help identify any URLs that haven't been successfully rewritten in consideration 4 above.
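If/when HTTPS does get added, a report-only policy would be the gentle way to start; a minimal sketch (the report path is a placeholder):

    # Report (rather than block) anything the page loads from outside the
    # .onion - violation reports are POSTed to the endpoint, which should
    # help surface URLs the rewriting missed
    add_header Content-Security-Policy-Report-Only "default-src 'self'; report-uri /csp-reports";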
2015-05-19 11:23:44
This won't be an issue on the site that this will eventually be deployed on, but is on www.bentasker.co.uk so seems worth addressing.
In consideration 4 we'll be rewriting links depending on whether the visitor originated from the .onion or the www. What we don't want, then, is for responses to be cached within the same namespace.
If the page is cached when someone visits via the www, then the .onion visitor will be served www links and end up going out via an exit - which, whilst not terrible, somewhat undermines the efforts here.
But - if the page is cached when someone visits via .onion, the site will completely break for a visitor on the www (as they won't be able to resolve the .onion)
It's only certain pages that are cached, and there's still some value in doing so, so the simple solution here is to update the cache key to include an indicator of whether the request was sourced from the .onion or not (so that www and .onion become two distinct cacheable entities).
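As a sketch, that's just a case of folding an onion indicator into the key (how that indicator gets derived safely is covered in a later comment; $onion_request is a placeholder variable name):

    # $onion_request is empty for www traffic and set for .onion traffic,
    # so the two sources cache as distinct entities
    proxy_cache_key "$onion_request$scheme$proxy_host$request_uri";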
2015-05-19 11:24:46
There are a number of resources on the site which may/will be undesirable when accessing via .onion:
- Google Ads
- Google Analytics
- Social Media Sharing buttons
I'm highlighting these in particular because they share information with a third party.
The SM buttons have actually been disabled by default for some time (the buttons displayed are just images; clicking one enables it, and you then click again to tweet/like/whatever). They'll still work the same way afterwards.
The site has had a 'Block Google Analytics' function in its sidebar for years - it does rely on Javascript, but then if Javascript is disabled, the Analytics functionality won't be firing either.
Adsense, I'm a little torn about. I don't particularly like having the ads up, but in the case of www.bentasker.co.uk they help keep the site live. I've trialled removing them in the past, and had to put them back.
For most users, traffic to the relevant 3rd party services will likely be via an exit node anyway so the concern is slightly less. Where it's slightly more important is where visitors have a gateway running on their LAN specifically so that they have transparent access to .onions (meaning their connection to Google won't route over Tor).
You could argue that it's a risk they take, but I'd prefer to do what I can to mitigate it a little - need to give this one some thought.
2015-05-19 11:24:59
From the outset, I've not been too concerned about this, but it's interesting to note that Facebook's experience has been that it wasn't quite as bad as expected.
2015-05-19 11:25:15
My initial thoughts on this were that it's actually a good thing, and no-one's said anything to the contrary, so recording simply for posterity - no action needed.
2015-05-19 12:17:58
2015-05-19 12:18:12
2015-05-19 12:20:03
2015-05-19 12:20:12
2015-05-19 12:20:17
2015-05-19 12:20:41
2015-05-19 12:22:02
2015-05-19 12:26:19
2015-05-19 14:05:17
Not currently an issue, as the session cookie doesn't use the secure flag.
In fact, there are a few 'best practices' for HTTPS that I can't use (or at least will need to account for at the Tor RP end):
- secure flag in cookies
- Strict Transport Security
Probably some others as well.
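Where any of those do get enabled on the www side, the Tor RP will need to strip them on the way through; HSTS is the obvious one:

    # Don't pass the origin's HSTS policy on to .onion visitors,
    # as the .onion is (currently) served over plain HTTP
    proxy_hide_header Strict-Transport-Security;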
2015-05-19 14:20:44
Running a quick test having chucked an example hostname into the server block
Checking local overrides
Looks good to me. Finally, checking a tor2web type request
2015-05-19 14:51:29
Not using the onion indicator header directly in the cache key, because an attacker could then hit the www, requesting the same page over and over while specifying a different value in that header, in order to try and exhaust the space available to the cache.
The two sources now have different keys - for the onion site, the key for the homepage now includes an onion-specific component alongside the usual scheme/host/URI.
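A sketch of one way to keep that key component attacker-proof (the header name is a placeholder - the real one stays private): normalise whatever arrives into exactly two possible values rather than keying on the raw header.

    # http{} context on the origin: whatever value a client sends in the
    # header, the key component can only ever be one of two values, so
    # flooding with random values can't grow the cache keyspace
    map $http_x_from_onion $onion_request {
        # header absent or empty -> www namespace
        ""       "";
        # anything else -> onion namespace
        default  ".onion";
    }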
2015-05-19 15:15:01
If we take the facebook approach and continue to treat static content as a subdomain, all should work fine - nothing special needs to be done to make sure those requests hit the same HS, as a subdomain of the .onion address resolves to it anyway (I've just tested). At the reverse proxy we'll just need a new server block to handle the subdomain and proxy on (that one can definitely be configured to cache).
Before that, though, it's probably worth addressing the plugin which performs that re-write for static content, which may or may not be trivial (can't remember when I last looked at that codebase).
2015-05-19 15:24:57
2015-05-19 16:07:01
Relying on the header sent by the Tor Reverse proxy is a bad idea (partly because I've just documented what it is :) ), and some charitable soul could come along and hit the www. with requests containing that header so that my cache contains lots of incorrectly re-written URLs.
The name of that header essentially needs to be kept a secret to prevent that - not ideal, but it's the simplest fix. So the NGinx changes on the origin now become the following (we can't send the header within the if statement because NGinx won't let us, so need to send it empty if not):
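Something along these lines, assuming the application sits behind a proxy_pass (with fastcgi, a fastcgi_param would do the same job); the header names are placeholders, the real ones stay private:

    # set is allowed inside if, proxy_set_header isn't, so derive a
    # variable first
    set $onion_source "";
    if ($http_x_from_onion) {
        set $onion_source "1";
    }

    # Configured unconditionally; when the value is empty NGinx omits the
    # header, so the backend only sees it for traffic from the onion RP
    proxy_set_header X-Onion-Source $onion_source;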
2015-05-19 16:29:04
Cache defined in nginx.conf
New server block created for the subdomain
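Roughly along these lines (zone name, sizes, paths and the static subdomain names are placeholders):

    # nginx.conf, http{} context
    proxy_cache_path /var/cache/nginx/onion-static keys_zone=onionstatic:10m max_size=500m;

    # New server block at the Tor RP for the static subdomain
    server {
        listen 127.0.0.1:8080;
        server_name static.foo.onion;

        location / {
            proxy_pass https://static.bentasker.co.uk;
            proxy_set_header Host static.bentasker.co.uk;
            proxy_cache onionstatic;
            proxy_cache_valid 200 1h;
        }
    }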
In theory, now, we should be able to browse via the .onion without having any static resources load over the www (though there may still be some links within the content itself as that's not been checked yet).
2015-05-19 16:54:22
So that'll need to be overridden
2015-05-19 16:57:42
Which resolves the issue. It might be better to look at creating a small plugin to do much the same thing so that it can be managed from the back-end, but the effect is the same.
2015-05-19 17:09:52
2015-05-19 17:28:16
2015-05-21 12:12:29
Needed to update the back-end code for the "Your Data" page - https://www.bentasker.co.uk/your-stored-data - to ensure that it doesn't disclose the header name/value.
2015-05-21 16:29:39
2015-05-22 15:49:57
2015-05-22 15:55:21
2015-05-22 15:57:59
2015-05-22 15:58:06
2015-05-22 17:32:27
2015-05-23 03:14:40