MISC-25: Create functionality to block AMP pages



Issue Information

Issue Type: New Feature
 
Priority: Major
Status: Closed

Reported By:
Ben Tasker
Assigned To:
Ben Tasker
Project: Miscellaneous (MISC)
Resolution: Done (2018-02-15 15:58:38)
Affects Version: Google AMP
Target version: Google AMP

Created: 2018-02-14 14:23:09


Description
I really, really, hate AMP pages.

They serve no useful purpose to me, and often lack functionality that I actually use. I'm also none too happy about the number of AMP sites that are hosted directly by Google.

Unfortunately, there still isn't a way to globally opt out of AMP for all sites (and Twitter have now taken to directing links to the AMP version automatically).

The aim of this issue is to create a ruleset for ABP/uBlock Origin which blocks amp.js and its other dependencies, in the hope that most pages will then react by redirecting you to the canonical URL.

Whilst it'd probably be possible to achieve the same with Greasemonkey (by detecting the canonical URL and going there instead), that's not really a useful option on mobile.


Issue Links

RemoveAMP Greasemonkey script

Activity


Initial test URL is this - https://www.theregister.co.uk/AMP/2018/02/14/kaspersky_us_ban_legal_fight/#click=https://t.co/R2wN7H4fpH (taken direct from Twitter)

Within the response, we can see the calls out to the AMP JavaScript:
<script async src="https://cdn.ampproject.org/v0.js"></script>
<script async custom-element="amp-analytics" src="https://cdn.ampproject.org/v0/amp-analytics-0.1.js"></script>
<script async custom-element="amp-ad" src="https://cdn.ampproject.org/v0/amp-ad-0.1.js"></script>


So to begin with, let's blacklist cdn.ampproject.org in ABP.

Unfortunately, that doesn't help. The URL https://cdn.ampproject.org/v0/amp-ad-0.1.js can be blocked in either case, though: it's bad enough getting AMP, and I'm buggered if they're going to serve me ads at the same time.
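To show what those two blocks would actually catch, here's a rough JavaScript approximation of how an ABP/uBlock Origin network filter is matched against the script URLs above. The rule strings ||cdn.ampproject.org^ (whole CDN) and ||cdn.ampproject.org/v0/amp-ad- (ad component only) are my assumption of how the blocks would be written, and the functions abpRuleToRegExp and isBlocked are hypothetical. This isn't ABP's real matcher: it only handles the "||" hostname anchor, the "^" separator and "*" wildcards, and ignores $options and @@ exceptions.

```javascript
// Rough approximation of Adblock Plus / uBlock Origin network filter matching.
// Handles only "||" (hostname anchor), "^" (separator) and "*" (wildcard).
// "^" matches any character other than a letter, digit, _ - . % or the end of the URL.
const SEPARATOR = '(?:[^\\w\\-.%]|$)';

function abpRuleToRegExp(rule) {
  let body = rule;
  let prefix = '';
  if (body.startsWith('||')) {
    body = body.slice(2);
    // "||" anchors at the start of the hostname or at any subdomain boundary.
    prefix = '^[a-z][a-z0-9+.-]*://(?:[^/?#]*\\.)?';
  }
  // Escape regex metacharacters, leaving * and ^ to be translated below.
  const escaped = body.replace(/[.+?${}()|[\]\\]/g, '\\$&');
  const pattern = escaped
    .replace(/\*/g, '.*')
    .replace(/\^/g, () => SEPARATOR);
  return new RegExp(prefix + pattern, 'i');
}

// A URL is blocked if any rule in the list matches it.
function isBlocked(rules, url) {
  return rules.some((rule) => abpRuleToRegExp(rule).test(url));
}
```

With the CDN-wide rule all three scripts are blocked, while the narrower rule only removes amp-ad-0.1.js. The "^" after the hostname is what stops a lookalike host such as cdn.ampproject.org.example.net from matching.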

Repo: RemoveAMP
Commit: ae976766087bef361bd07b462a3d26658a9ddda3
Author: Ben Tasker <btasker@<Domain Hidden>>

Date: Thu Feb 15 12:35:26 2018 +0000
Commit Message: Implement initial version of AMP bypasser. See STGNG-7



Added (+)
-------
anti-amp.js




Webhook User-Agent

GitHub-Hookshot/4cd0928




Repo: RemoveAMP
Commit: 2c0b74a9456e06424280676137acc52a29bb767f
Author: Ben Tasker <btasker@<Domain Hidden>>

Date: Thu Feb 15 13:52:33 2018 +0000
Commit Message: Creates greasemonkey script for STGNG-7

I had originally planned to load the anti-amp JavaScript direct from GitHub (using raw.), but unfortunately the browser refuses to run it because the Content-Type is returned as text/plain rather than application/javascript.

Instead, we serve it via my CDN.

When adding the script anchor, we include a Subresource Integrity (SRI) hash to minimise the chances of a MITM (or a compromise of my system) buggering anyone.



Added (+)
-------
greasemonkey_hook.user.js





btasker changed Project from 'STAGING' to 'Miscellaneous'
btasker changed Key from 'STGNG-7' to 'MISC-25'
btasker added 'Google AMP' to Version
btasker added 'Google AMP' to Fix Version
So, as the commits above probably indicate, I've created a Greasemonkey script to detect AMP pages and attempt to redirect to the proper canonical URL.

Where the canonical isn't available (it's almost certain there'll be pages which don't declare it), it'll instead inject a link to a DuckDuckGo search (by page title) to try and find the full-fat version of the site.
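The decision described above can be sketched as a couple of small functions. This is a hypothetical outline rather than the code in anti-amp.js (isAmpDocument, withoutHash and chooseTarget are made-up names), written so it runs outside a browser. It assumes AMP pages are recognised by the amp (or ⚡) attribute on the html element, and it skips a canonical that points back at the AMP page itself, as AMP-only pages declare, since redirecting there would loop.

```javascript
// Hypothetical sketch of the decision logic; not the code in anti-amp.js.

// AMP documents mark the root <html> element with an "amp" or "⚡" (U+26A1) attribute.
function isAmpDocument(rootElement) {
  return rootElement.hasAttribute('amp') || rootElement.hasAttribute('\u26A1');
}

// Strip the fragment so "#click=..." style suffixes don't affect comparisons.
function withoutHash(url) {
  const copy = new URL(url);
  copy.hash = '';
  return copy.href;
}

// Prefer the declared canonical URL; fall back to a DuckDuckGo search for the title.
// A canonical pointing back at the current page is ignored, since it would loop.
function chooseTarget(currentUrl, canonicalHref, title) {
  if (canonicalHref) {
    const canonical = new URL(canonicalHref, currentUrl).href; // resolves relative hrefs
    if (withoutHash(canonical) !== withoutHash(currentUrl)) {
      return { action: 'redirect', url: canonical };
    }
  }
  return { action: 'search', url: 'https://duckduckgo.com/?q=' + encodeURIComponent(title) };
}

// In a userscript the inputs would come from the live page, e.g.:
//   if (isAmpDocument(document.documentElement)) {
//     const link = document.querySelector('link[rel="canonical"]');
//     const target = chooseTarget(location.href, link && link.getAttribute('href'), document.title);
//   }
```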

I've had to host the JS on my CDN, as I didn't really want to inject so much JavaScript into every page, and couldn't load it direct from GitHub because they return text/plain in the Content-Type header, so Chrome refuses to run it.

Clients should cache for 30 days, and are able to revalidate, so even if lots of people use it, I shouldn't see too much of a change in traffic.
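As a rough sketch of the serving side (the actual CDN configuration isn't shown in this issue), the headers that matter are a JavaScript Content-Type, a Cache-Control max-age of 30 days (30 × 24 × 3600 = 2592000 seconds), and a validator so an expired copy can be revalidated with If-None-Match and answered with a cheap 304. The function names, the "public" directive and the SHA-1 based ETag are assumptions for illustration.

```javascript
const crypto = require('crypto');

const THIRTY_DAYS = 30 * 24 * 60 * 60; // 2592000 seconds

// Response headers for the anti-amp script (hypothetical values mirroring the text above).
function scriptHeaders(body) {
  const etag = '"' + crypto.createHash('sha1').update(body).digest('hex') + '"';
  return {
    'Content-Type': 'application/javascript', // the issue reports text/plain being refused
    'Cache-Control': `public, max-age=${THIRTY_DAYS}`,
    ETag: etag,
  };
}

// Answer a request, honouring If-None-Match so revalidation costs a 304, not the file.
// Simplified: weak validators (W/"...") and "*" are not handled.
function respond(requestHeaders, body) {
  const headers = scriptHeaders(body);
  const ifNoneMatch = requestHeaders['if-none-match'];
  if (ifNoneMatch && ifNoneMatch.split(',').map((t) => t.trim()).includes(headers.ETag)) {
    return { status: 304, headers, body: '' };
  }
  return { status: 200, headers, body };
}
```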

The injected script tag uses SRI, so if someone gains control over my back-end (or manages a successful MITM) they'll also need access to the GitHub repo (and to convince people to update).
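The SRI protection can be illustrated in a few lines of Node.js. The sketch computes an integrity value for a script body, then performs a simplified version of the browser-side check: take the strongest algorithm listed in the attribute and require that one of its digests matches the fetched bytes. Any change to the file on the CDN makes the check fail. The function names are hypothetical, and parts of the SRI specification are ignored.

```javascript
const crypto = require('crypto');

// Build an SRI value such as "sha384-<base64 digest>" for some script content.
function sriValue(content, algorithm = 'sha384') {
  return `${algorithm}-${crypto.createHash(algorithm).update(content).digest('base64')}`;
}

const STRENGTH = new Map([['sha256', 1], ['sha384', 2], ['sha512', 3]]);

// Simplified form of the check a browser runs before executing the script.
function integrityMatches(content, integrity) {
  const entries = [];
  for (const token of integrity.trim().split(/\s+/)) {
    const dash = token.indexOf('-');
    if (dash < 1) continue;
    const alg = token.slice(0, dash).toLowerCase();
    if (!STRENGTH.has(alg)) continue; // unknown algorithms are ignored
    entries.push({ alg, digest: token.slice(dash + 1).split('?')[0] });
  }
  if (entries.length === 0) return true; // no usable metadata: nothing to enforce
  // Only digests using the strongest listed algorithm are considered.
  const best = Math.max(...entries.map((e) => STRENGTH.get(e.alg)));
  return entries
    .filter((e) => STRENGTH.get(e.alg) === best)
    .some((e) => crypto.createHash(e.alg).update(content).digest('base64') === e.digest);
}
```

In a userscript the value would go on the injected element's integrity property, together with crossOrigin = 'anonymous'. A cross-origin script with an integrity value must be fetched with CORS (and the CDN must allow it), or the browser will refuse it.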

Repo: RemoveAMP
Commit: cb974e1bf7302233d32d64eaf1a11aca98d82180
Author: B Tasker <github@<Domain Hidden>>

Date: Thu Feb 15 14:43:51 2018 +0000
Commit Message: Ensure function returns after pushing redirect (MISC-25)

Otherwise, if the browser is slow to follow the redirect, subsequent console.log calls will still write to the console. That might make troubleshooting tricky at some point in the future.



Modified (-)(+)
-------
anti-amp.js
greasemonkey_hook.user.js
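The fix in that commit comes down to returning as soon as the redirect has been issued. Navigating via location doesn't stop the current script, so any code after it keeps running until the browser actually leaves the page. Here's a minimal sketch of the pattern. The function name is hypothetical, location and log are passed in so it can run outside a browser, and the use of location.replace(), which keeps the AMP page out of the back-button history, is an assumption rather than a confirmed detail of anti-amp.js.

```javascript
// Hypothetical illustration of the "return after redirecting" fix.
// `location` is anything with a replace(url) method (window.location in a browser);
// `log` collects the messages that would otherwise go to console.log.
function handleAmpPage(target, location, log) {
  if (target.action === 'redirect') {
    log.push('Redirecting to ' + target.url);
    location.replace(target.url);
    return; // navigation isn't instant: without this, the code below would still run
  }
  log.push('No usable canonical; offering a search link instead');
  // ... the search link would be injected here ...
}
```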





I've added a section to the README in the repo for known limitations.

One that I am currently aware of is that Content-Security-Policy (CSP) may well block the injected script on some sites. For example, Google search results currently block it:
userscript.html?id=f6de7006-3d7d-4a81-8a14-29f0052f1039:42 Refused to execute inline script because it violates the following Content Security Policy directive: "script-src 'unsafe-eval'". Either the 'unsafe-inline' keyword, a hash ('sha256-DQUplo+SS19U09slU/g8aiq/TL3kF4fU8XPQZP4ERPc='), or a nonce ('nonce-...') is required to enable inline execution.


There isn't really a good way to address this in all cases: anywhere that doesn't specify unsafe-inline in its CSP is liable to break, no matter what we do, and almost nowhere with CSP configured is going to have it (as it re-opens the door to XSS in a big way). Even where it is set, the site would also need to allow static1.bentasker.co.uk for us to load the anti-amp script; and even if we removed that dependency and injected everything inline, the lack of unsafe-inline would still screw us.
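The reasoning above can be illustrated with a deliberately simplified CSP evaluator. It only looks at script-src (falling back to default-src) and only understands 'unsafe-inline', nonces and hashes, '*' and bare host or origin sources. Real CSP matching also covers schemes, wildcard hosts, paths, 'self', 'strict-dynamic' and more. The function names and the script path on static1.bentasker.co.uk are hypothetical.

```javascript
// Deliberately simplified CSP evaluation, for illustration only.
function parseCsp(policy) {
  const directives = new Map();
  for (const part of policy.split(';')) {
    const tokens = part.trim().split(/\s+/).filter(Boolean);
    if (tokens.length === 0) continue;
    const name = tokens[0].toLowerCase();
    if (!directives.has(name)) directives.set(name, tokens.slice(1)); // first occurrence wins
  }
  return directives;
}

// Source list governing scripts, or null if the policy doesn't restrict them.
function scriptSources(policy) {
  const d = parseCsp(policy);
  if (d.has('script-src')) return d.get('script-src');
  if (d.has('default-src')) return d.get('default-src');
  return null;
}

function allowsInlineScript(policy) {
  const sources = scriptSources(policy);
  if (sources === null) return true;
  const lower = sources.map((s) => s.toLowerCase());
  // Since CSP2, a nonce or hash in the list makes browsers ignore 'unsafe-inline'.
  const hasNonceOrHash = lower.some((s) => /^'(nonce|sha256|sha384|sha512)-/.test(s));
  return lower.includes("'unsafe-inline'") && !hasNonceOrHash;
}

function allowsScriptFrom(policy, scriptUrl) {
  const sources = scriptSources(policy);
  if (sources === null) return true;
  const url = new URL(scriptUrl);
  // Only '*', a bare host, or a full origin are recognised here.
  return sources.some((s) => s === '*' || s.toLowerCase() === url.host || s.toLowerCase() === url.origin);
}
```

Against just the directive quoted in the Google error (script-src 'unsafe-eval'), both the inline loader and a script from static1.bentasker.co.uk are refused. A page would need both 'unsafe-inline' and the CDN host in its policy before the original approach could work.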

I assume it isn't possible to have Greasemonkey write values into the browser's view of the returned CSP headers, but even if it is, it's not something I'm willing to consider. That way lies much pain.

So the conclusion here is that the script will fail to fire (and will generate console log output) on a subset of domains. Over time that subset may well grow as CSP sees wider uptake.

It hadn't really occurred to me before that CSP might kill the utility of things like Greasemonkey/Tampermonkey, but I guess it's probably an obvious and unavoidable casualty. It does look like Tampermonkey has a basic fix for it, though (https://github.com/Tampermonkey/tampermonkey/issues/418), albeit one that changes the way the code is pulled in; it may still be worth looking at (https://github.com/Tampermonkey/tampermonkey/issues/472).

There's a good writeup on having Tampermonkey enforce SRI with @require here - https://forum.tampermonkey.net/viewtopic.php?t=1746

Amending the Tampermonkey script to use that method seems to work quite well with CSP enabled sites.
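For illustration, a userscript header that uses @require with an integrity hash might look like the following. This is a config fragment, not the actual greasemonkey_hook.user.js: the name, namespace, @match pattern, CDN path and hash are all hypothetical placeholders. Tampermonkey's documentation describes the hash algorithms and encodings it accepts after the # on a @require URL.

```javascript
// ==UserScript==
// @name         Anti-AMP (illustrative header)
// @namespace    https://example.invalid/hypothetical
// @version      1.2
// @match        *://*/*
// @grant        none
// @require      https://static1.bentasker.co.uk/path/to/anti-amp.js#sha256=<hash of anti-amp.js>
// ==/UserScript==
```

Because Tampermonkey fetches and verifies the required file itself rather than having the page load it via an injected script tag, the page's CSP doesn't get a say in loading it.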

I should also shout out an apology to Scott Helme. I've been testing against his site as I know he has a good, robust CSP set up. Unfortunately (in this case), he's also got report-uri defined, so he'll have been getting a few reports from my less successful tests.

The only changes needed are to the Greasemonkey hook itself.

One additional benefit of this route is that Tampermonkey doesn't include a referrer when fetching the resource, so I'm not going to end up with details of other people's browsing sessions if they decide to use the script.

It also seems to trigger much, much faster so you don't get a partial load of the AMP page before the redirect fires.

Looks like a win to me so far, so I'm going to commit it and call it v1.2

Repo: RemoveAMP
Commit: cfffdf43451320a13045755124ed9a8e862f37b3
Author: B Tasker <github@<Domain Hidden>>

Date: Thu Feb 15 15:30:18 2018 +0000
Commit Message: MISC-25 Switch to using TM/GM's require directive

This allows TamperMonkey to run the script even on pages with a strict Content-Security-Policy (as TM supports adding itself into any CSP headers which are present).

It also results in faster trigger times, and means my CDN's logs now won't contain referrer strings showing people's browsing history (which is a win for both them and me).



Modified (-)(+)
-------
greasemonkey_hook.user.js





Tested and working in Firefox Mobile on Android (with Tampermonkey installed).
I've created a test page for a quick check of whether things are working:

- https://projectsstatic.bentasker.co.uk/MISC/MISC25/bad.html is AMP'd and should redirect to
- https://projectsstatic.bentasker.co.uk/MISC/MISC25/good.html

Both are very, very simplistic pages, but the first should trigger the redirect.
So, pending using it for a bit and finding issues, it looks like we're set up and ready to go. uBlock/ABP block the AMP CDN (so that they're not getting referrer data from anywhere I land), and Tampermonkey should now take me to a proper version of the page rather than leaving me to find it for myself.

There'll no doubt be more work to do in future, given they're bringing AMP to email (https://techcrunch.com/2018/02/13/amp-for-email-is-a-terrible-idea/). At that point, though, I may just switch all mail etc. away from Google's services and pay someone else not to piss around with my mail.
I'm going to mark this as Done. Probably better to raise a new issue to try and directly address AMP for email if and when it becomes something that cruds up my inbox.
btasker changed status from 'Open' to 'Resolved'
btasker added 'Done' to resolution
btasker changed status from 'Resolved' to 'Closed'