MISC-40: Chrome trying to load static content from my Onion breaks rendering



Issue Information

Issue Type: Bug
 
Priority: Major
Status: Closed

Reported By:
Ben Tasker
Assigned To:
Ben Tasker
Project: Miscellaneous (MISC)
Resolution: Fixed (2020-06-20 11:39:46)
Labels: Bentasker.co.uk, Onion, Tor

Created: 2020-06-20 10:22:03
Time Spent Working


Description
I went to test something in the latest Chrome and noticed my site was broken - none of the CSS was loading.

Looking in developer tools there were a whole bunch of Mixed Content warnings citing the Onion address:
Mixed Content: The page at 'https://www.bentasker.co.uk/' was loaded over HTTPS, but requested an insecure prefetch resource 'http://6zdgh5a5e6zpchdz.onion/templates/joomspirit_76/css/compiled.css'. This request has been blocked; the content must be served over HTTPS.


I run a split cache between the onion and www sites, so I initially assumed that something had somehow jumped the cache and polluted the clearnet one, but a search of the markup showed no references to loading static content from 6zdgh5a5e6zpchdz.onion.

Similarly, using Chrome's "Copy as cURL" against the same box didn't yield any unexpected matches - the only hit was the intentional onion-redirect snippet:
ben@milleniumfalcon:~$ curl 'https://www.bentasker.co.uk/'   -H 'authority: www.bentasker.co.uk'   -H 'pragma: no-cache'   -H 'cache-control: no-cache'   -H 'upgrade-insecure-requests: 1'   -H 'user-agent: Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Mobile Safari/537.36'   -H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9'   -H 'sec-fetch-site: none'   -H 'sec-fetch-mode: navigate'   -H 'sec-fetch-user: ?1'   -H 'sec-fetch-dest: document'   -H 'accept-language: en-GB,en-US;q=0.9,en;q=0.8'   -H 'cookie: 6849605f66eba1c621d70b2e8a636c78=655dqc21h89eutjg9crh525vk2'   --compressed -s -H "Host: www.bentasker.co.uk" -6 -g https://[2001:41d0:2:a192::2]/| grep 6zdgh 
var h = window.location.hostname;if (h.endsWith('.onion') && !h.endsWith('5e6zpchdz.onion') && !h.startsWith('6zdgh5') ){window.location.href = atob('aHR0cDovLzZ6ZGdoNWE1ZTZ6cGNoZHoub25pb24v') + window.location.pathname + window.location.search;}


Attachments

Issue Links

Chromium Bug 1097465
Initial Twitter Thread

Activity


btasker added 'Screenshot_20200620_100119.png' to Attachments
I did recently make a change though, to add an Onion-Location header to responses - https://www.bentasker.co.uk/blog/privacy/693-onion-location-added-to-site

This header should be ignored by all but Tor Browser Bundle.

So I went and commented out the relevant line in Nginx on the box I was hitting:
#               add_header Onion-Location http://6zdgh5a5e6zpchdz.onion$request_uri;


Chrome suddenly, magically, works.

I wonder if they've got some kind of glob for "Location" in the codebase somewhere?

The odd thing is, in the Network tab of developer tools, it doesn't show as a redirect - you get the request for the root document, and then a bunch of failed (blocked) requests because of the mixed content thing.
I searched the Chromium bugtracker and couldn't find anything, so I've raised https://bugs.chromium.org/p/chromium/issues/detail?id=1097465 for this.
-------------------------
From: git@<Domain Hidden>
To: jira@<Domain Hidden>
Date: None
Subject: MISC-40 Disable Onion-Location header It seems to break Chrome
-------------------------


Repo: domains.d
Host: astria

commit 5805f9fd330921c52b7886faf254da10396f1250
Author: root <root@astria>
Date: Sat Jun 20 10:26:24 2020 +0100

Commit Message: MISC-40 Disable Onion-Location header

It seems to break Chrome

proxy_settings_bentasker.co.uk.inc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)


View Commit | View Changes

Yeah this looks like a malfunctioning glob/regex somewhere in the Chrome codebase.

I just added a made-up header:
                add_header NotA-Location http://foobar.invalid$request_uri;


And Chrome broke again.

Actually.... no. The mixed content warnings still refer to 6zdgh5a5e6zpchdz.onion

This time I do see the onion names in the response
Yeah, looks like this was a polluted cache.

Back when I set up the multi-homing of the site, I set up a variable to be included in the cache key to identify whether the asset was for the Onion version of the site or not - https://projects.bentasker.co.uk/jira_projects/browse/MISC-2.html#comment657857

Some time back, the proxy settings were broken out into a separate file (under BEN-608) and this differentiator was inadvertently lost. This probably happened because it was only applied to certain locations - dynamic pages.

Which doesn't answer why my earlier tests didn't yield the same results - the answer is probably in the cache files (the most likely candidate being the Vary headers).
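The shape of the fix is to define the differentiator with an empty default at server-block level, so that the variable always exists when the cache key references it. A minimal sketch of that pattern - the variable name and key layout here are illustrative, not necessarily what the real config uses:

```nginx
# Illustrative sketch only - variable name and cache key are assumptions.
server {
    # Empty default at server level: the clearnet vhost leaves it blank,
    # so the variable always exists when the cache key references it.
    set $onion_differentiator "";

    # The onion server block (or a map on $host) would override it:
    #   set $onion_differentiator "onion";

    location / {
        proxy_cache_key "$scheme$host$request_uri$onion_differentiator";
    }
}
```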

-------------------------
From: git@<Domain Hidden>
To: jira@<Domain Hidden>
Date: None
Subject: MISC-40 Ensure onion differentiator is always used in cache key. Set
 an (empty) definition at server block level to ensure it always exists when
 referenced

Re-enable Onion-Location header
-------------------------


Repo: domains.d
Host: astria

commit 68595c209f9e0e6f932cd5469efa789446de47c2
Author: root <root@astria>
Date: Sat Jun 20 11:16:36 2020 +0100

Commit Message: MISC-40 Ensure onion differentiator is always used in cache key. Set an (empty) definition at server block level to ensure it always exists when referenced

bentasker.co.uk.conf | 3 +++
proxy_settings_bentasker.co.uk.inc | 5 ++---
2 files changed, 5 insertions(+), 3 deletions(-)


View Commit | View Changes

Let's see what we get

root@astria:/mnt/ramcache# egrep -Rl -e 'KEY: httpswww.bentasker.co.uk/$' *
2/8a/c9c48b5cac13f16e5e6bc4a52c1f78a2
4/38/f33ba4a0cacdabd4175640878aaae384
5/b4/e1bfce4dded1bc4d3c4ee960e7f40b45
5/c4/98f5e43d30e3b134fde23108301f7c45
6/7a/eb7d68faa67e6f41211f445051a427a6
8/fe/a60d69ca2c7f4a9ae86ba4c47f560fe8
9/04/9012b1cc6990686a789221e51c78e049
a/22/5324c98657b60d5e52a875f97dbdc22a
b/d7/635a712703ae370cc7bfcc5f19335d7b
c/7d/e7dd43232134b4c04e4d3c71134e87dc
d/ab/f6fc91d7cc4cf609fb5c84600cc41abd


That's significantly more hits than expected
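For context on what those paths mean: with `levels=1:2`, nginx names each cache file after the MD5 of its KEY and builds the directory levels from the tail of that hash (responses carrying Vary get stored under additional per-variant hashes, which is presumably why one KEY matches several files here). A quick sketch of the path derivation:

```shell
# Sketch: derive an nginx levels=1:2 cache path from a cache KEY.
key='httpswww.bentasker.co.uk/'
hash=$(printf '%s' "$key" | md5sum | cut -d' ' -f1)   # file name = MD5 of the KEY
l1=$(printf '%s' "$hash" | cut -c32)      # level 1: last hex character
l2=$(printf '%s' "$hash" | cut -c30-31)   # level 2: the two characters before it
echo "$l1/$l2/$hash"                      # same x/yy/hash shape as the listing above
```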

OK, so if we compare two of the files, we can see that the KEY is the same, but within the binary blob at the head of the file the Accept-Encoding value differs

That shouldn't have happened though - I used "Copy as cURL" within Chrome's Developer Tools specifically because it should send exactly the same headers. I've a feeling it takes a shortcut though.

If we copy as curl again
curl 'https://www.bentasker.co.uk/' \
  -H 'authority: www.bentasker.co.uk' \
  -H 'pragma: no-cache' \
  -H 'cache-control: no-cache' \
  -H 'upgrade-insecure-requests: 1' \
  -H 'user-agent: Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Mobile Safari/537.36' \
  -H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9' \
  -H 'sec-fetch-site: none' \
  -H 'sec-fetch-mode: navigate' \
  -H 'sec-fetch-user: ?1' \
  -H 'sec-fetch-dest: document' \
  -H 'accept-language: en-GB,en-US;q=0.9,en;q=0.8' \
  -H 'cookie: 6849605f66eba1c621d70b2e8a636c78=655dqc21h89eutjg9crh525vk2' \
  --compressed


Notice that it uses --compressed rather than explicitly setting Accept-Encoding. I'm willing to bet the header curl sends is slightly different from Chrome's.

Also, it bakes the hostname into the URL rather than using --resolve (so you may end up hitting a different host), but I accounted for that in my initial test.
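For a replay that stays byte-identical, a safer pattern (a sketch - the header value is the one Chrome showed, and --resolve pins the host the same way my -6 -g test did; bracketed IPv6 in --resolve needs a reasonably recent curl) is to set Accept-Encoding explicitly instead of relying on --compressed:

```shell
# Sketch: mirror Chrome's Accept-Encoding exactly and pin the target host,
# rather than letting --compressed substitute curl's own header value.
curl 'https://www.bentasker.co.uk/' \
  --resolve 'www.bentasker.co.uk:443:[2001:41d0:2:a192::2]' \
  -H 'accept-encoding: gzip, deflate, br' \
  -s -o /dev/null -v 2>&1 | grep -i 'accept-encoding'
```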

So, curl sends
ben@milleniumfalcon:~$ curl 'https://www.bentasker.co.uk/' \
>   -H 'authority: www.bentasker.co.uk' \
>   -H 'pragma: no-cache' \
>   -H 'cache-control: no-cache' \
>   -H 'upgrade-insecure-requests: 1' \
>   -H 'user-agent: Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Mobile Safari/537.36' \
>   -H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9' \
>   -H 'sec-fetch-site: none' \
>   -H 'sec-fetch-mode: navigate' \
>   -H 'sec-fetch-user: ?1' \
>   -H 'sec-fetch-dest: document' \
>   -H 'accept-language: en-GB,en-US;q=0.9,en;q=0.8' \
>   -H 'cookie: 6849605f66eba1c621d70b2e8a636c78=655dqc21h89eutjg9crh525vk2' \
>   --compressed -v -s -o/dev/null 2>&1 | grep Accept-En
> Accept-Encoding: deflate, gzip
< Vary: Accept-Encoding


Whereas, looking in Developer Tools, Chrome sends accept-encoding: gzip, deflate, br





So, to summarise

At some point, someone viewed my Onion - 6zdgh5a5e6zpchdz.onion - using a browser that includes Accept-Encoding: gzip, deflate, br in its request.

Because it was an onion request, the URL of static resources was rewritten from static1.bentasker.co.uk to use an Onion address instead.

Because the onion differentiator variable was missing, that response was cached and then returned to any user whose browser also sends Accept-Encoding: gzip, deflate, br - which included me, with Chrome, this morning.
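The mechanics of that pollution can be shown with a toy simulation (bash; this is not the real nginx logic, and all names here are illustrative) of a cache whose key omits the onion/clearnet differentiator but which honours Vary: Accept-Encoding:

```shell
#!/usr/bin/env bash
# Toy simulation: a cache keyed without the onion differentiator,
# storing one variant per Accept-Encoding value (Vary).
declare -A cache
last=""
serve() {  # $1 = requesting host, $2 = Accept-Encoding sent by the client
    local key="httpswww.bentasker.co.uk/"   # same key for onion and www requests
    local vkey="$key|$2"                    # one stored variant per encoding
    if [[ -n "${cache[$vkey]}" ]]; then
        last="HIT: ${cache[$vkey]}"
    else
        cache[$vkey]="markup rewritten for $1"
        last="MISS: ${cache[$vkey]}"
    fi
    echo "$last"
}
serve 6zdgh5a5e6zpchdz.onion 'gzip, deflate, br'   # Tor Browser seeds the cache
serve www.bentasker.co.uk   'gzip, deflate, br'   # Chrome gets the onion variant back
serve www.bentasker.co.uk   'deflate, gzip'       # curl misses and gets a clean copy
```

The third request is exactly why my "Copy as cURL" test came back clean: a different Accept-Encoding lands on a different variant.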

My habit of trying to confirm things with curl shot me in the foot: Chrome's "Copy as cURL" doesn't produce a sufficiently identical request - curl doesn't include Brotli (https://en.wikipedia.org/wiki/Brotli) in Accept-Encoding - and as a result it doesn't hit the same cached asset. My subsequent test with an (older) Chromium wasn't careful enough - I got routed to another host - but I didn't notice, because it confirmed what I thought I already knew.

Testing a random hypothesis (have Chrome fucked up a regex?) and actually looking at the results set me back onto the right path.

Marking this as Fixed
btasker added 'Bentasker.co.uk Onion Tor' to labels
btasker changed status from 'Open' to 'Resolved'
btasker added 'Fixed' to resolution
btasker changed status from 'Resolved' to 'Closed'