##########################################################################################
FKAMP-1: Create functionality to block AMP pages
##########################################################################################
Issue Type: New Feature
-----------------------------------------------------------------------------------------
Issue Information
====================
Priority: Major
Status: Closed
Resolution: Done (2018-02-15 15:58:38)
Project: Anti-AMP Scripts (FKAMP)
Reported By: btasker
Assigned To: btasker
Targeted for fix in version:
- V1.1 - v1.2 - v1.3
Time Estimate: 0 minutes
Time Logged: 0 minutes
-----------------------------------------------------------------------------------------
Issue Description
==================
I really, really hate AMP pages.
They serve no useful purpose to me, and often lack functionality that I actually use. I'm
also none too happy about the number of AMP sites that are hosted directly by Google.
Unfortunately, there _still_ isn't a way to globally opt out of AMP for all sites (and
Twitter have now taken to directing links to the AMP version automatically).
The aim of this issue is to create a ruleset for ABP/uBlock Origin which blocks amp.js and
other dependencies, in the hope that most pages will then react by redirecting you to the
canonical URL.
Whilst it'd probably be possible to achieve the same with Greasemonkey (by detecting the
canonical and going there instead), that's not really a useful option on mobile.
-----------------------------------------------------------------------------------------
Issue Relations
================
- relates to FKAMP-2: Anti-Amp Script doesn't work on Google's AMP cache
- RemoveAMP Greasemonkey script (https://github.com/bentasker/RemoveAMP)
-----------------------------------------------------------------------------------------
Activity
==========
-----------------------------------------------------------------------------------------
2018-02-14 14:26:03 btasker
-----------------------------------------------------------------------------------------
Initial test URL is this -
https://www.theregister.co.uk/AMP/2018/02/14/kaspersky_us_ban_legal_fight/#click=https://t.co/R2wN7H4fpH
(taken direct from Twitter)
Within the response, we can see the callout to amp.js:
-- BEGIN SNIPPET --
-- END SNIPPET --
So to begin with, let's blacklist cdn.ampproject.org in ABP.
Unfortunately, that doesn't help. The URL https://cdn.ampproject.org/v0/amp-ad-0.1.js can
be blocked in either case, though: it's bad enough getting AMP, and I'm buggered if they're
going to serve me ads at the same time.
-----------------------------------------------------------------------------------------
2018-02-15 12:37:06 git
-----------------------------------------------------------------------------------------
-- BEGIN QUOTE --
Repo: RemoveAMP
Commit: ae976766087bef361bd07b462a3d26658a9ddda3
Author: Ben Tasker
Date: Thu Feb 15 12:35:26 2018 +0000
Commit Message: Implement initial version of AMP bypasser. See STGNG-7
Added (+)
-------
anti-amp.js
-- END QUOTE --
*Webhook User-Agent*
-- BEGIN SNIPPET --
GitHub-Hookshot/4cd0928
-- END SNIPPET --
https://github.com/bentasker/RemoveAMP/commit/ae976766087bef361bd07b462a3d26658a9ddda3
-----------------------------------------------------------------------------------------
2018-02-15 13:55:06 git
-----------------------------------------------------------------------------------------
-- BEGIN QUOTE --
Repo: RemoveAMP
Commit: 2c0b74a9456e06424280676137acc52a29bb767f
Author: Ben Tasker
Date: Thu Feb 15 13:52:33 2018 +0000
Commit Message: Creates greasemonkey script for STGNG-7
I had originally planned to load the anti-amp JavaScript direct from GitHub (using raw.)
but unfortunately the browser refuses to run it because the content-type is returned as
text/plain rather than application/javascript.
Instead, we serve it via my CDN.
When adding the script tag, we include a Subresource Integrity (SRI) hash to minimise
the chances of a MITM (or compromise of my system) buggering anyone.
Added (+)
-------
greasemonkey_hook.user.js
-- END QUOTE --
*Webhook User-Agent*
-- BEGIN SNIPPET --
GitHub-Hookshot/4cd0928
-- END SNIPPET --
https://github.com/bentasker/RemoveAMP/commit/2c0b74a9456e06424280676137acc52a29bb767f
-----------------------------------------------------------------------------------------
2018-02-15 14:04:34
-----------------------------------------------------------------------------------------
btasker changed Project from 'STAGING' to 'Miscellaneous'
-----------------------------------------------------------------------------------------
2018-02-15 14:04:34
-----------------------------------------------------------------------------------------
btasker changed Key from 'STGNG-7' to 'MISC-25'
-----------------------------------------------------------------------------------------
2018-02-15 14:04:41
-----------------------------------------------------------------------------------------
btasker added 'Google AMP' to Version
-----------------------------------------------------------------------------------------
2018-02-15 14:04:47
-----------------------------------------------------------------------------------------
btasker added 'Google AMP' to Fix Version
-----------------------------------------------------------------------------------------
2018-02-15 14:06:32 btasker
-----------------------------------------------------------------------------------------
So, as the commits above probably indicate, I've created a Greasemonkey script to detect
AMP pages and attempt to redirect to the proper canonical URL.
Where the canonical isn't available (as it's almost certain there'll be pages which don't
declare it) it'll instead inject a link to search (by page title) on DuckDuckGo to try and
find the full-fat version of the site.
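As a rough illustration of the decision described above (not the actual contents of anti-amp.js), the logic boils down to something like the following. The function names are made up for this sketch, and `doc` is anything shaped like the browser's `document`, which makes it easy to exercise outside a page.

```javascript
// Sketch only: decide where an AMP page should send the reader.
// `doc` is anything shaped like the browser's `document` object.
function findCanonicalUrl(doc) {
  const link = doc.querySelector('link[rel="canonical"]');
  const href = link ? link.getAttribute("href") : null;
  return href && href.trim() !== "" ? href.trim() : null;
}

// Hypothetical helper: build a DuckDuckGo search URL from the page title.
function buildSearchFallbackUrl(title) {
  return "https://duckduckgo.com/?q=" + encodeURIComponent(title);
}

// Returns { action: "redirect", url } when a canonical is declared,
// otherwise { action: "search-link", url } for the link to inject.
function planEscape(doc) {
  const canonical = findCanonicalUrl(doc);
  if (canonical) {
    return { action: "redirect", url: canonical };
  }
  return { action: "search-link", url: buildSearchFallbackUrl(doc.title) };
}
```

In a userscript this would be called as `planEscape(document)`, with the caller either setting `window.location` or appending a link, depending on the returned `action`.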
I've had to host the JS on my CDN as I didn't really want to inject so much JavaScript
into every page, and couldn't load direct from GitHub because they return text/plain in
the Content-Type header, so Chrome refuses to run it.
Clients should cache for 30 days, and are able to revalidate, so even if lots of people
use it, I shouldn't see too much of a change in traffic.
The injected script tag uses SRI, so anyone who gains control over my back-end (or manages
a successful MITM) would also need to get access to the GitHub repo (and convince people
to update) before they could serve modified code.
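For reference, an SRI value is just the algorithm name plus the base64 digest of the exact bytes being served. The sketch below shows one way to generate it at publish time under Node.js; the script body and CDN URL are placeholders, and none of this is taken from the repo.

```javascript
const nodeCrypto = require("crypto");

// Compute a Subresource Integrity value ("sha384-<base64 digest>") for a script body.
function computeIntegrity(scriptBody, algorithm = "sha384") {
  const digest = nodeCrypto.createHash(algorithm).update(scriptBody, "utf8").digest("base64");
  return `${algorithm}-${digest}`;
}

// Describe the attributes the injected <script> element would carry.
// crossOrigin is set because browsers only enforce SRI on cross-origin
// resources when they are fetched with CORS.
function scriptAttributes(src, integrity) {
  return { src, integrity, crossOrigin: "anonymous" };
}

// Placeholder body and URL, for illustration only.
const body = "console.log('anti-amp placeholder');";
const attrs = scriptAttributes("https://cdn.example.invalid/anti-amp.js", computeIntegrity(body));
console.log(attrs);
```

If the served file changes by even a byte, the digest no longer matches and the browser refuses to run it, which is why an attacker would also need to change the hash shipped in the userscript.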
-----------------------------------------------------------------------------------------
2018-02-15 14:47:04 git
-----------------------------------------------------------------------------------------
-- BEGIN QUOTE --
Repo: RemoveAMP
Commit: cb974e1bf7302233d32d64eaf1a11aca98d82180
Author: B Tasker
Date: Thu Feb 15 14:43:51 2018 +0000
Commit Message: Ensure function returns after pushing redirect (MISC-25)
Otherwise, if the browser is slow to follow the redirect we'll still write subsequent
console.log calls to console. Might make troubleshooting tricky at some point in the
future.
Modified (-)(+)
-------
anti-amp.js
greasemonkey_hook.user.js
-- END QUOTE --
*Webhook User-Agent*
-- BEGIN SNIPPET --
GitHub-Hookshot/4cd0928
-- END SNIPPET --
https://github.com/bentasker/RemoveAMP/commit/cb974e1bf7302233d32d64eaf1a11aca98d82180
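The reason for that commit is that assigning to `window.location` only schedules a navigation, so anything after the assignment keeps running until the browser actually leaves the page. A minimal sketch of the pattern (names are illustrative, and `loc` and `log` stand in for `window.location` and `console.log`):

```javascript
// Sketch only: `loc` stands in for window.location, `log` for console.log.
function escapeAmp(canonicalUrl, loc, log) {
  if (canonicalUrl) {
    log("Redirecting to canonical: " + canonicalUrl);
    loc.href = canonicalUrl;
    // Return immediately: the navigation happens asynchronously, so without
    // this the fallback logging below would still be written to the console.
    return true;
  }
  log("No canonical found, falling back to a search link");
  return false;
}
```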
-----------------------------------------------------------------------------------------
2018-02-15 15:10:52 btasker
-----------------------------------------------------------------------------------------
I've added a section to the README in the repo for known limitations.
One that I am currently aware of is that Content-Security-Policy (CSP) may well block the
injected script on some sites. For example, Google search results currently block it:
-- BEGIN SNIPPET --
userscript.html?id=f6de7006-3d7d-4a81-8a14-29f0052f1039:42 Refused to execute inline
script because it violates the following Content Security Policy directive: "script-src
'unsafe-eval'". Either the 'unsafe-inline' keyword, a hash
('sha256-DQUplo+SS19U09slU/g8aiq/TL3kF4fU8XPQZP4ERPc='), or a nonce ('nonce-...') is
required to enable inline execution.
-- END SNIPPET --
There isn't really a good way to address this in _all_ cases, as any site not specifying
unsafe-inline in its CSP is liable to break (no matter what we do). Almost nowhere with
CSP configured is going to allow unsafe-inline (as it re-opens the door to XSS in a big
way). Even where it is allowed, the site would also need to allow static1.bentasker.co.uk
for us to load the anti-amp script. We could remove that dependency by injecting
everything inline, but the lack of unsafe-inline would still screw us.
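To make the two failure conditions concrete, here's a deliberately simplified check against a CSP header value. It's a sketch under stated assumptions rather than a faithful implementation of the CSP matching rules: it ignores nonces, hashes, wildcard subdomains and script-src-elem, and all of the function names are invented for the example.

```javascript
// Split a Content-Security-Policy header value into a Map of directive -> sources.
function parseCsp(headerValue) {
  const directives = new Map();
  for (const part of headerValue.split(";")) {
    const tokens = part.trim().split(/\s+/).filter(Boolean);
    if (tokens.length === 0) continue;
    const name = tokens[0].toLowerCase();
    // If a directive is repeated, only the first occurrence is used.
    if (!directives.has(name)) {
      directives.set(name, tokens.slice(1).map((t) => t.toLowerCase()));
    }
  }
  return directives;
}

// Sources governing scripts: script-src, else default-src, else null (unrestricted).
function scriptSources(headerValue) {
  const directives = parseCsp(headerValue);
  if (directives.has("script-src")) return directives.get("script-src");
  if (directives.has("default-src")) return directives.get("default-src");
  return null;
}

// Would an injected inline <script> be allowed to run? (simplified)
function allowsInlineScript(headerValue) {
  const sources = scriptSources(headerValue);
  return sources === null || sources.includes("'unsafe-inline'");
}

// Would a script loaded from `host` be allowed? (exact-match only, simplified)
function allowsScriptHost(headerValue, host) {
  const sources = scriptSources(headerValue);
  if (sources === null) return true;
  const h = host.toLowerCase();
  return sources.includes("*") || sources.includes(h) || sources.includes("https://" + h);
}

// The policy from the console error above fails both checks.
console.log(allowsInlineScript("script-src 'unsafe-eval'")); // false
console.log(allowsScriptHost("script-src 'unsafe-eval'", "static1.bentasker.co.uk")); // false
```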
I assume it isn't possible to have Greasemonkey write values into the browser's view of
the returned CSP headers, but even if it is, it's not something I'm willing to consider.
That way lies many pains.
So the conclusion here is that the script will fail to fire (and will generate console log
info) on a subset of domains. Over time that subset may well increase if CSP sees an
uptake in usage.
It hadn't really occurred to me before that CSP might kill the utility of things like
Greasemonkey/Tampermonkey, but I guess it's probably an obvious/unavoidable casualty. It
does look like Tampermonkey has a basic fix for it though -
https://github.com/Tampermonkey/tampermonkey/issues/418 - though it'll change the way the
code is pulled in (still may be worth looking at -
https://github.com/Tampermonkey/tampermonkey/issues/472 ).
-----------------------------------------------------------------------------------------
2018-02-15 15:28:21 btasker
-----------------------------------------------------------------------------------------
There's a good writeup on having Tampermonkey enforce SRI with @require here -
https://forum.tampermonkey.net/viewtopic.php?t=1746
Amending the Tampermonkey script to use that method seems to work quite well with
CSP-enabled sites.
Should also shout out an apology to Scott Helme. I've been testing against his site as I
know he has a good, robust CSP set up. Unfortunately (in this case), he's also got
report-uri defined so will have been getting a few reports from my less successful tests.
The only changes needed are to the greasemonkey hook itself.
One additional benefit of this route is that Tampermonkey doesn't include a referrer
string when fetching the resource, so I'm not going to end up with details of other
people's browsing sessions if they decide to use the script.
It also seems to trigger much, much faster, so you don't get a partial load of the AMP page
before the redirect fires.
Looks like a win to me so far, so I'm going to commit it and call it v1.2.
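For anyone wanting to do similar, the general shape of a hook using this method is sketched below. As I understand the Tampermonkey documentation, the hash is appended to the @require URL as a fragment (`#sha256=...`). The URL, hash value and helper name here are placeholders rather than the real values from greasemonkey_hook.user.js.

```javascript
// Sketch only: build a userscript metadata block whose @require pins the
// remote script to a hash via a URL fragment. All field values passed in
// are placeholders supplied by the caller.
function buildUserscriptHeader({ name, version, match, requireUrl, sha256 }) {
  return [
    "// ==UserScript==",
    `// @name         ${name}`,
    `// @version      ${version}`,
    `// @match        ${match}`,
    `// @require      ${requireUrl}#sha256=${sha256}`,
    "// @grant        none",
    "// ==/UserScript==",
  ].join("\n");
}

console.log(
  buildUserscriptHeader({
    name: "Example AMP bypass hook",
    version: "1.2",
    match: "*://*/*",
    requireUrl: "https://cdn.example.invalid/anti-amp.js", // placeholder URL
    sha256: "<sha256-of-the-published-file>", // placeholder, not a real hash
  })
);
```

Because the extension fetches and verifies the file itself, nothing is injected into the page as an inline script tag, which is why the page's CSP no longer gets in the way.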
-----------------------------------------------------------------------------------------
2018-02-15 15:33:04 git
-----------------------------------------------------------------------------------------
-- BEGIN QUOTE --
Repo: RemoveAMP
Commit: cfffdf43451320a13045755124ed9a8e862f37b3
Author: B Tasker
Date: Thu Feb 15 15:30:18 2018 +0000
Commit Message: MISC-25 Switch to using TM/GM's require directive
This allows Tampermonkey to run the script even on pages with a strict
Content-Security-Policy (as TM supports adding itself into any CSP headers which are
present).
It also results in faster trigger times, and means my CDN's logs now won't contain
referrer strings showing people's browsing history (which is a win for both them and me).
Modified (-)(+)
-------
greasemonkey_hook.user.js
-- END QUOTE --
*Webhook User-Agent*
-- BEGIN SNIPPET --
GitHub-Hookshot/4cd0928
-- END SNIPPET --
https://github.com/bentasker/RemoveAMP/commit/cfffdf43451320a13045755124ed9a8e862f37b3
-----------------------------------------------------------------------------------------
2018-02-15 15:39:51 btasker
-----------------------------------------------------------------------------------------
Tested and working in Firefox Mobile on Android (with Tampermonkey installed).
-----------------------------------------------------------------------------------------
2018-02-15 15:47:32 btasker
-----------------------------------------------------------------------------------------
I've created a test page for a quick check of whether things are working:
- https://projectsstatic.bentasker.co.uk/MISC/MISC25/bad.html is AMP'd and should redirect to
- https://projectsstatic.bentasker.co.uk/MISC/MISC25/good.html
Both are very, very simplistic pages, but the first should trigger the redirect.
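For context on what "AMP'd" means for a page like bad.html: AMP documents mark themselves with an `amp` (or `⚡`) attribute on the root html element, so a detection check can be as small as the sketch below. This is an assumption about how such a check might look, not a copy of what anti-amp.js does, and the function name is made up.

```javascript
// Sketch only: treat a page as AMP if its root <html> element carries the
// "amp" or lightning-bolt attribute. `doc` is anything shaped like `document`.
function isAmpPage(doc) {
  const root = doc.documentElement;
  return root.hasAttribute("amp") || root.hasAttribute("⚡");
}
```

In a userscript this would be called as `isAmpPage(document)` before looking for the canonical link.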
-----------------------------------------------------------------------------------------
2018-02-15 15:52:20 btasker
-----------------------------------------------------------------------------------------
So, pending using it for a bit and finding issues, it looks like we're set up and ready to
go. uBlock/ABP block the AMP CDN (so that they're not getting referrer data from anywhere
I land) and Tampermonkey should now take me to a proper version of the page rather than
leaving me trying to find it for myself.
There'll no doubt be more work to do in future though, given they're bringing AMP to email
- https://techcrunch.com/2018/02/13/amp-for-email-is-a-terrible-idea/ - (though, at that
point I may just switch all mail etc away from Google's services and pay someone else not
to piss around with my mail).
-----------------------------------------------------------------------------------------
2018-02-15 15:58:38 btasker
-----------------------------------------------------------------------------------------
I'm going to mark this as Done. Probably better to raise a new issue to try and directly
address AMP for email if and when it becomes something that cruds up my inbox.
-----------------------------------------------------------------------------------------
2018-02-15 15:58:38
-----------------------------------------------------------------------------------------
btasker changed status from 'Open' to 'Resolved'
-----------------------------------------------------------------------------------------
2018-02-15 15:58:38
-----------------------------------------------------------------------------------------
btasker added 'Done' to resolution
-----------------------------------------------------------------------------------------
2018-02-15 15:58:42
-----------------------------------------------------------------------------------------
btasker changed status from 'Resolved' to 'Closed'
-----------------------------------------------------------------------------------------
2019-06-09 12:34:26
-----------------------------------------------------------------------------------------
btasker changed Project from 'Miscellaneous' to 'Anti-AMP Scripts'
-----------------------------------------------------------------------------------------
2019-06-09 12:34:26
-----------------------------------------------------------------------------------------
btasker changed Key from 'MISC-25' to 'FKAMP-1'
-----------------------------------------------------------------------------------------
2019-06-09 12:34:26
-----------------------------------------------------------------------------------------
btasker removed 'Google AMP' from Version
-----------------------------------------------------------------------------------------
2019-06-09 12:34:26
-----------------------------------------------------------------------------------------
btasker added 'V1.1' to Fix Version
-----------------------------------------------------------------------------------------
2019-06-09 12:34:26
-----------------------------------------------------------------------------------------
btasker added 'v1.2' to Fix Version
-----------------------------------------------------------------------------------------
2019-06-09 12:34:26
-----------------------------------------------------------------------------------------
btasker added 'v1.3' to Fix Version
-----------------------------------------------------------------------------------------
2019-06-09 12:34:26
-----------------------------------------------------------------------------------------
btasker removed 'Google AMP' from Fix Version
-----------------------------------------------------------------------------------------
2019-06-09 12:38:17 btasker
-----------------------------------------------------------------------------------------
I've created a new JIRA project to track development of these scripts, so
- MISC-25 becomes FKAMP-1
- MISC-29 becomes FKAMP-2
- MISC-31 becomes FKAMP-3