I can't really paste the full amount into JIRA because it's so messy, but here's what Google are actually serving when you hit that location (assuming they believe you're a mobile)
The opening is followed by some CSS and then a whole load of obfuscated javascript (see screenshot).
The important things of note though are
- All that's on one line (yeuch)
- Google aren't actually declaring the document as being AMP (which is why the detector isn't firing) - note the html tag
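For reference, a rough sketch of what "declaring the document as being AMP" means in practice. This is not the detector from the script, just an illustration: declaresAmp is a made-up helper that looks at the opening html tag of a page's source for the amp or ⚡ attribute that AMP documents are meant to carry.

```javascript
// Hypothetical helper (not part of RemoveAMP): does the opening <html> tag
// carry the "amp" or "⚡" attribute that AMP documents are meant to declare?
function declaresAmp(source) {
    var m = /<html\b([^>]*)>/i.exec(source);
    if (!m) {
        return false;
    }
    // Blank out quoted attribute values so their contents can't be
    // mistaken for attribute names
    var attrs = m[1].replace(/"[^"]*"|'[^']*'/g, '""').split(/\s+/);
    for (var i = 0; i < attrs.length; i++) {
        var name = attrs[i].split('=')[0].toLowerCase();
        if (name === 'amp' || name === '\u26A1') {
            return true;
        }
    }
    return false;
}

console.log(declaresAmp('<!doctype html><html lang="en-GB"><head>'));  // false
console.log(declaresAmp('<!doctype html><html amp lang="en"><head>')); // true
```

A page served with a plain html tag (as Google's wrapper is) comes back false, so a check along these lines would never fire on it.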
The user is reporting that they have to manually reload the page before they're redirected away from Google's AMP cache. This only happens when they visit via Google Search though.
I've had no luck (user-agent changes, incognito + UA change etc) in getting Google to actually serve me AMP results so haven't been able to repro.
Now this is interesting, if not particularly helpful in answering the outstanding issue.
I've now been able to get Google to serve me AMP results. But... they must be doing some kind of compatibility or experience tracking on their side, because if you do the following:
- Open Private Window in FF
- Select UA that wouldn't get AMP results
- Search "theregister AMP bad"
- No AMPs
- Change to UA that should, re-search
You still get no AMP results.
This is important because it seems that the default iPhone user-agent in the User-Agent switcher extensions, for whatever reason, doesn't get given AMP results. So when I later switched to Mobile/Chrome in there, I still didn't get the results even though I should have. Closing the private window, and then opening anew and starting off with the correct UA gets me AMP results.
Unfortunately, the behaviour I'm getting still differs to the user's. I may need to lay hands on a Mac to try and repro with Safari. I don't see any obvious smoking gun in the console output. The call out to www-the-register-co-uk.cdn.ampproject.org is interesting, though: you can't see the full querystring, so I can't get anything beyond a 404 back atm, but it's going to https://www-the-register-co-uk.cdn.ampproject.org/v/s/www.theregister.co.uk/AMP/2017/05/19/open_source_insider_google_amp_bad_bad_bad/ which does make me wonder if that's the path that's actually being served from.
It's a blind shot in the dark, but I'm tempted to get the user to run a copy of the script which adds that domain to the checker, just to see what the behaviour ends up being.
Either way, we already know the initial change won't be sufficient on its own, as both Cloudflare and Bing run AMP caches too. The problem there, though, is they're rolling out "Real URL" support, so we won't be able to rely on the source hostname in future as they'll serve AMP from theregister.co.uk rather than (say) amp.cloudflare.com/c/theregister.co.uk/path.
Though, to be fair, based on a very quick check, Cloudflare do seem to be honouring the spec and properly declaring AMP.
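To illustrate why hostname matching only gets us so far, here's a minimal sketch of that kind of check. looksLikeAmpCache is a hypothetical name, and it only knows about the hosts and paths mentioned in this ticket. Once a cache serves AMP from the publisher's own hostname, a check like this has nothing to match on.

```javascript
// Hypothetical check: only the hostnames/paths mentioned in this ticket are listed
function looksLikeAmpCache(href) {
    var u = new URL(href);
    var host = u.hostname.toLowerCase();
    // Google's AMP cache, e.g. www-the-register-co-uk.cdn.ampproject.org
    if (host === 'cdn.ampproject.org' || host.endsWith('.cdn.ampproject.org')) {
        return true;
    }
    // Cloudflare's cache, e.g. amp.cloudflare.com/c/theregister.co.uk/path
    if (host === 'amp.cloudflare.com') {
        return true;
    }
    // Google's wrapper page lives under /amp/ on www.google.com
    if (host === 'www.google.com' && u.pathname.indexOf('/amp/') === 0) {
        return true;
    }
    return false;
}

console.log(looksLikeAmpCache('https://amp.cloudflare.com/c/theregister.co.uk/path')); // true
// Same content served from the publisher's own hostname: nothing to key off
console.log(looksLikeAmpCache('https://www.theregister.co.uk/AMP/2017/05/19/open_source_insider_google_amp_bad_bad_bad/')); // false
```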
Right, let's take one more look at the original page first though.
So, we've got google's wrapper page, which basically involves serving a small document with a lot of javascript in it (obfuscated in the way most of Google's JS is).
Worth noting, for clients without javascript, there's also a noscript in the head which directs them straight to the cache (seems stupid considering you need JS for amp.js to load?)
ben@milleniumfalcon:~$ head /tmp/amp | wc -c
13669
Neither of those pages properly declares itself as being AMP.
Now, if we try and detect the AMP hostname, that'll trigger from within the iframe, and so won't be able to redirect the browser.
The bit I still can't fully answer though, is why the user's Safari isn't triggering as the wrapper domain is (or, at least, should be) correct for the check that was added earlier in this issue. Need to look at that screenshot again, there must be something I'm missing.
Actually, thinking about it, the user was originally trying to adjust so they could use the Anti-amp script without the require directive - https://github.com/bentasker/RemoveAMP/issues/1 (seems it's not supported on iOS)
There is a complaint about CSP in the user's screenshot, and the bit of the path that's visible would suggest that it's on Google's wrapper page. The line number's not particularly helpful there, of course, as pretty much everything is on one line within the document. But it seems reasonable to assume that Google wouldn't leave its own cache fucked by its CSP, so it's probably the result of injected code.
I should be able to test this to reproduce by creating a new user script which injects the code without using require and then seeing whether I get a similar line in the console. Wouldn't explain why it then works following a refresh, but we can come to that later.
There are, as expected, CSP violations, but they both actually originate from the document within the iframe rather than at Google's level. And actually, if we look at the response serving that wrapper, they're not serving a CSP.
So, given the CSP violation occurs within the iframe, it shouldn't stop the redirect from happening (the redirect is disabled in my test userscript and just calls console.log() instead - we can see it triggered).
Wait.... it's limited to google.com so there's no way that it's responsible for the CSP violation on cdn.ampproject.org. Disabled it, refreshed and the warning is still there. That's hilarious, especially as they've got report-uri defined, every single page view must be resulting in a report going back because the content they're serving violates their own CSP.
Anyway, means we can disregard the CSP related lines in the user's screenshot
OK, I think the long and short of it is, I'm going to need to find a Mac to try and repro further on - whatever the cause is isn't really visible (or maybe just not obvious) in the provided info.
To try and get the user up and running though, I'll create a minor release with an iframe check in it. Won't update the hook to refer to it though as the performance overhead will likely be quite high (as we'll need to do a full DOM scan).
Repo: RemoveAMP
Commit: 51400177ebdc778364be15fc06f2cc3b6c3629e3
Author: B Tasker <github@<Domain Hidden>>
Date: Sun May 12 10:38:49 2019 +0100
Commit Message: MISC-29 Add slightly snarky and very temporary iframe detection to try and work around AMP detection issues on Safari when hitting Google's cache (see #2)
This involves doing a scan of the DOM, so might be quite expensive at times. The aim is to try and replace this once I've laid hands on a Mac to be able to repro the issue and troubleshoot it.
He's noted that if he adjusts the hook script to run fuckOffAMP every 10 seconds, the redirect does happen (he just has to wait a short while - presumably 10s). So there's a theory that maybe the page load isn't being considered a new page. We do rely on the onload event listener (since that commit) so it's certainly a possibility.
Ok, I've laid hands on a Mac. The only downside is now I have to remember how to use the thing.
Steps to repro (hopefully)
- Be using Safari
- Install Tampermonkey (tampermonkey.net will take you to the relevant safariextensions page)
- Safari Menu -> Preferences -> Advanced -> Tick Show Develop menu in menu bar
- Go to https://github.com/bentasker/RemoveAMP/raw/master/greasemonkey_hook.user.js
- When tampermonkey prompts, press Install
- Develop menu, Choose Safari -- iOS 11.3 -- iPhone
- Google "The Register Amp Bad"
- Results should have the Register article at the top with a little Amp icon
- Develop -> Show Web Inspector
- Click Network, tick Preserve Log
- Click the link to the El Reg article
- Redirect doesn't trigger
Well, I had exported a HAR in the hope that I could look through it on hardware that was less... yeah... but it looks like I'm stuck on the Mac, as Chrome on Linux and various online HAR viewers claim Safari's chucking out an invalid date format. I don't much fancy trying to patch various incompatibilities in the file...
OK, first thing to note is Safari Developer Tools still labels the window "Web Inspector - www.google.com - search" which does support the theory we've not actually loaded a new page.
Looking through the HAR with grep and less supports this.
So, it seems Google aren't reloading the page, they're simply rewriting it with JS and using JS to update the address bar (AMP and address bar tampering, two things I loathe at once.... fun).
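The same check can be done without grep. The sketch below assumes the HAR has already been parsed with JSON.parse(); requestsMatching is a made-up helper and sampleHar is a cut-down stand-in, not the real export. The only thing relied on from the HAR format is that each entry under log.entries has a request.url.

```javascript
// Hypothetical helper: list the request URLs in a parsed HAR that start
// with a given prefix
function requestsMatching(har, prefix) {
    return har.log.entries
        .map(function (entry) { return entry.request.url; })
        .filter(function (url) { return url.indexOf(prefix) === 0; });
}

// Made-up, cut-down HAR for illustration only
var sampleHar = { log: { entries: [
    { request: { url: 'https://www.google.com/search?q=the+register+amp+bad' } },
    { request: { url: 'https://www-the-register-co-uk.cdn.ampproject.org/v/s/www.theregister.co.uk/AMP/2017/05/19/open_source_insider_google_amp_bad_bad_bad/' } }
]}};

// No request for the /amp/ path shown in the address bar
console.log(requestsMatching(sampleHar, 'https://www.google.com/amp/').length); // 0
```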
I've not bothered digging through Google's JS to try and find what reads what, but it does seem that if we tamper with their attributes it won't use Google's internal page and instead appears to go straight to cdn.ampproject.org, which is enough for the anti-amp code to trigger and redirect us.
I just want to take a quick look though, and see whether we can disrupt even that and get taken direct
OK, pasting this into Safari's JS console leads to us going direct to the El Reg page rather than anything Ampy
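var da;
var bads = ['data-amp','data-amp-cur','data-amp-title','data-amp-vgi','ping'];
// AMP results in Google's search page carry the amp_r class
var eles = document.getElementsByClassName('amp_r');
for (var i=0; i<eles.length; i++){
    // Only links are of interest
    if (eles[i].tagName.toLowerCase() != 'a'){
        continue;
    }
    // data-amp-cur is what we point the link at instead
    da = eles[i].getAttribute('data-amp-cur');
    if (! da){
        continue;
    }
    eles[i].href = da;
    // Strip the AMP related attributes and the ping tracker
    for (var n=0; n<bads.length; n++){
        eles[i].removeAttribute(bads[n]);
    }
}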
The challenge, of course, will be working out how best to trigger it. My inclination to begin with, though, is just to create a new Greasemonkey script for it, limited to the Google pages and try that
Repo: RemoveAMP
Commit: 91706cc3aae1904c7f19ae30c638df6ae8a846a6
Author: Ben Tasker <btasker@<Domain Hidden>>
Date: Wed May 15 15:26:27 2019 +0100
Commit Message: MISC-29 Create new greasemonkey script
This basically just dumps the script created in that issue into a greasemonkey script - may very well not work (just easier to transfer to the Mac by putting into the repo).
In theory it should prevent Google using their own Amp caches (without a page reload) or even sending the user to an Amp page in the first place. Liable to be a bit fragile though...
The attributes correctly get purged, and the href updated. Unfortunately, something runs afterwards which results in &ampcf=1 being appended to the URL (so the far end returns a 404). We could do something nasty and hacky and try to ensure that always ends up in the querystring.
Fuck you Google. It doesn't look like it gets appended until you actually click the link. I guess hacky will have to do for now.
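A minimal sketch of what "hacky" could look like, assuming the ampcf=1 argument gets tacked onto the end of whatever href is in place when the link is clicked. ensureQueryString is a hypothetical helper and not necessarily what ends up in the script: it just makes sure there's already a "?" so anything appended lands in the querystring rather than the path.

```javascript
// Hypothetical helper: give the URL a "?" if it doesn't already have one,
// so that anything appended as "&name=value" lands in the querystring
function ensureQueryString(href) {
    // Already has a "?": nothing to do. Has a "#": appended text ends up in
    // the fragment, which never reaches the server
    if (href.indexOf('?') !== -1 || href.indexOf('#') !== -1) {
        return href;
    }
    return href + '?';
}

var target = ensureQueryString('https://www.theregister.co.uk/2017/05/19/open_source_insider_google_amp_bad_bad_bad/');
// Simulate what gets appended on click
var clicked = new URL(target + '&ampcf=1');
console.log(clicked.pathname);                  // path is left untouched
console.log(clicked.searchParams.get('ampcf')); // 1
```

The far end still receives an argument it never asked for, so this only moves the junk somewhere it's likely to be ignored.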
Repo: RemoveAMP
Commit: 177f3b41a6250812113d548ebd832ebca862006b
Author: Ben Tasker <btasker@<Domain Hidden>>
Date: Wed May 15 15:41:00 2019 +0100
Commit Message: MISC-29 Hacky hack fix to work around Google's behaviour
On an AMP compatible device, when the user clicks a link in Google's search results, they'll append &ampcf=1 to the URL (presumably because they expect it to go via their ping page instead of direct).
This makes sure that ends up in the querystring rather than the url path.
Longer term, might be better to rewrite the ping url rather than just overriding href though
The problem I have is that it may prove to be quite fragile. We're already reliant on Google not changing the class name amp_r (although it seems unlikely they would), and I'd rather not end up sending random query string arguments to other people's sites. Most of the time you'd expect that ampcf would probably just get ignored, but it only takes one site to be using it in some way and we've potentially broken the user's browsing.
Will take a quick look at how their ping URLs work to see if we can use that instead. I've a feeling though, that if ampcf=1 is appended to a call to them they'll redirect to an AMP cache rather than to the site itself
ben@thor:~/repos/RemoveAMP$ curl "https://www.google.com/url?sa=i&source=web&rct=j&url=https://www.theregister.co.uk/2017/05/19/open_source_insider_google_amp_bad_bad_bad/&ved=2ahUKEwjdkI2P053iAhXYShUIHY3_DQEQFjAAegQIBRAB&psig=AOvVaw1iuOAsoWZI4urLlVg9q4Vn&ust=1558013609587974&ampcf=1"
<html lang="en-GB"><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>Redirect Notice</title><style>body,div,a{font-family:arial,sans-serif}body{background-color:#fff;margin-top:3px}div{color:#000}a:link{color:#00c}a:visited{color:#551a8b}a:active{color:red}div.mymGo{border-top:1px solid #bbb;border-bottom:1px solid #bbb;background:#f2f2f2;margin-top:1em;width:100%}div.aXgaGb{padding:0.5em 0;margin-left:10px}div.fTk7vd{margin-left:35px;margin-top:35px}</style><script nonce="rhgRoKjoLDXeFAPz09F1AA==">function go_back(){window.history.go(-1);return false;}
function ctu(oi,ct){var link = document && document.referrer;var esc_link = "";var e = window && window.encodeURIComponent ?encodeURIComponent :escape;if (link){esc_link = e(link);}
new Image().src = "/url?sa=T&url=" + esc_link + "&oi=" + e(oi)+ "&ct=" + e(ct);return false;}
</script></head><body><div class="mymGo"><div class="aXgaGb"><font style="font-size:larger"><b>Redirect Notice</b></font></div></div><div class="fTk7vd"> The page you were on is trying to send you to an invalid URL.<br><br> If you do not want to visit that page, you can <a href="#" data-ct="originlink" data-oi="unauthorizedredirect" onclick="return go_back();" onmousedown="ctu(this.getAttribute('data-oi'),this.getAttribute('data-ct'));">return to the previous page</a>.<br><br><br></div></body></html>
Makes sense really, otherwise they'd have people bouncing victims off their redirect page onto other locations.
Repo: RemoveAMP
Commit: 217d5e8ff8150eb1988d2dbb4837f0f65a4df696
Author: Ben Tasker <btasker@<Domain Hidden>>
Date: Wed May 15 16:20:14 2019 +0100
Commit Message: Revert "MISC-29 Add slightly snarky and very temporary iframe detection to try and work around AMP detection issues on Safari when hitting Google's cache (see #2)"
This reverts commit 51400177ebdc778364be15fc06f2cc3b6c3629e3.
Further testing has identified the cause of the issues in Safari/Google Search, and this detection won't help with that.
Might be useful in the long run, but as it was thrown together quite quickly I don't feel it's been adequately tested so should be removed for now
Activity
2019-05-11 09:52:44
Looking at logs it's because there are a couple of hops to make:
- Start: https://www.google.com/amp/s/www.theregister.co.uk/AMP/2017/05/19/open_source_insider_google_amp_bad_bad_bad/
- Go to listed canonical (still AMP): https://www.theregister.co.uk/AMP/2017/05/19/open_source_insider_google_amp_bad_bad_bad/
- Go to listed Canonical (proper HTML): https://www.theregister.co.uk/2017/05/19/open_source_insider_google_amp_bad_bad_bad/
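The hops above can be sketched as a loop that keeps following each page's canonical link until it stops changing. This is an illustration under assumptions rather than the script's actual code: canonicalOf and followCanonicals are made-up names, getHtml is an assumed callback that returns the source for a URL, and the pages used below are fakes (including the final page pointing at itself).

```javascript
// Hypothetical helper: pull the href out of the first <link rel="canonical">
function canonicalOf(html) {
    var links = html.match(/<link\b[^>]*>/gi) || [];
    for (var i = 0; i < links.length; i++) {
        if (/\brel=["']?canonical["']?/i.test(links[i])) {
            var href = /\bhref=["']([^"']+)["']/i.exec(links[i]);
            if (href) {
                return href[1];
            }
        }
    }
    return null;
}

// Follow canonical links until a page points at itself (or at nothing),
// giving up after maxHops
function followCanonicals(start, getHtml, maxHops) {
    var url = start;
    var hops = [url];
    for (var i = 0; i < maxHops; i++) {
        var next = canonicalOf(getHtml(url));
        if (!next || next === url) {
            break;
        }
        url = next;
        hops.push(url);
    }
    return hops;
}

// Fake pages standing in for the three hops listed above
var wrapper = 'https://www.google.com/amp/s/www.theregister.co.uk/AMP/2017/05/19/open_source_insider_google_amp_bad_bad_bad/';
var ampPage = 'https://www.theregister.co.uk/AMP/2017/05/19/open_source_insider_google_amp_bad_bad_bad/';
var realPage = 'https://www.theregister.co.uk/2017/05/19/open_source_insider_google_amp_bad_bad_bad/';
var fakePages = {};
fakePages[wrapper] = '<html><head><link rel="canonical" href="' + ampPage + '"></head></html>';
fakePages[ampPage] = '<html><head><link rel="canonical" href="' + realPage + '"></head></html>';
fakePages[realPage] = '<html><head><link rel="canonical" href="' + realPage + '"></head></html>';

var hops = followCanonicals(wrapper, function (u) { return fakePages[u] || ''; }, 5);
console.log(hops.length); // 3
```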
2019-05-12 09:10:18
The URL is the same as those mentioned earlier, but when the user goes there direct the redirect works.
2019-05-12 09:49:19
So, we've got google's wrapper page, which basically involves serving a small document with a lot of javascript in it (obfuscated in the way most of Google's JS is).
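Worth noting, for clients without javascript, there's also a noscript in the head which directs them straight to the cache (seems stupid considering you need JS for amp.js to load?)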
But, ultimately, Google's page just results in a page which has an iframe in it referencing the actual amp cache.
So, if we go to https://www.google.com/amp/s/www.theregister.co.uk/AMP/2017/05/19/open_source_insider_google_amp_bad_bad_bad/ the iframe loads content from
If you try and access that direct in a browser you'll get a 404; you need to make sure that every request header is correct
At which point, you get our source document
Neither of those pages properly declares themselves as being AMP.
Now, if we try and detect the AMP hostname, that'll trigger from within the iframe, and so won't be able to redirect the browser.
The bit I still can't fully answer though, is why the user's Safari isn't triggering as the wrapper domain is (or, at least, should be) correct for the check that was added earlier in this issue. Need to look at that screenshot again, there must be something I'm missing.
2019-05-12 09:57:54
The require directive was added because otherwise the anti-AMP protection wouldn't work on sites with moderately strict Content Security Policies. See here - https://projects.bentasker.co.uk/jira_projects/browse/MISC-25.html#comment2186467
2019-05-12 10:08:49
The result isn't actually quite as expected.
What I got was
There are, as expected, CSP violations, but they both actually originate from the document within the iframe rather than at Google's level. And actually, if we look at the response serving that wrapper, they're not serving a CSP.
So, given the CSP violation occurs within the iframe, it shouldn't stop the redirect from happening (the redirect is disabled in my test userscript and just calls console.log() instead - we can see it triggered).
2019-05-12 10:47:00
Rolled out onto the CDN here - https://github.com/bentasker/adblocklists/commit/d3a8cd0399bb361f250f5e308dbe4e917dc14836
Just need to calculate the SRI hash:
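One way to calculate it, as a sketch using Node's built-in crypto module. sriHash is a made-up helper, and sha384 is an assumption on my part (SRI also allows sha256 and sha512); the value is just the algorithm name, a hyphen, and the base64 digest of the file's contents.

```javascript
var crypto = require('crypto');

// Build a Subresource Integrity value: "<algo>-<base64 digest>"
// (defaulting to sha384 is an assumption, not taken from the ticket)
function sriHash(content, algo) {
    algo = algo || 'sha384';
    var digest = crypto.createHash(algo).update(content).digest('base64');
    return algo + '-' + digest;
}

// In practice content would be read from the released script, e.g. with
// fs.readFileSync(); a literal string is used here to keep it self-contained
console.log(sriHash('console.log("example");'));
```

The resulting string is what goes into the integrity value wherever the script is referenced.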
2019-05-15 14:58:19
Although the address in Safari's address bar is https://www.google.com/amp/s/www.theregister.co.uk/AMP/2017/05/19/open_source_insider_google_amp_bad_bad_bad/ we never actually see a request for the path
Looking through the HAR with grep and less supports this.
So, it seems Google aren't reloading the page, they're simply rewriting it with JS and using JS to update the address bar (AMP and address bar tampering, two things I loathe at once.... fun).
2019-05-15 15:51:48
Makes sense really, otherwise they'd have people bouncing victims off their redirect page onto other locations.
OK, will leave as is for now
2019-05-15 16:27:36
I've rolled this as v1.4.1 - https://github.com/bentasker/RemoveAMP/releases/tag/v1.4.1
Marking as fixed
2019-06-09 12:38:36
- MISC-25 becomes FKAMP-1
- MISC-29 becomes FKAMP-2
- MISC-31 becomes FKAMP-3