##########################################################################################
FKAMP-5: See if we can find a way to handle AMP on Google News
##########################################################################################
Issue Type: Improvement
-----------------------------------------------------------------------------------------
Issue Information
====================
Priority: Major Status: Closed
Resolution: Done (2019-06-11 18:18:55)
Project: Anti-AMP Scripts (FKAMP)
Reported By: btasker
Assigned To: btasker
Targeted for fix in version:
- v1.4.21
Time Estimate: 0 minutes
Time Logged: 0 minutes
-----------------------------------------------------------------------------------------
Issue Description
==================
FKAMP-4 ultimately implemented a script to redirect Google News to Bing news as a
workaround for Google having made Google News extremely hostile.
It's not a great long-term solution though, so we really need to look at getting AMP
detection working on Google News.
-----------------------------------------------------------------------------------------
Issue Relations
================
- relates to FKAMP-4: Redirect Away from Google News
- relates to FKAMP-6: Parse AMP CDN Urls to decide whether to redirect to HTTP or HTTPS
-----------------------------------------------------------------------------------------
Activity
==========
-----------------------------------------------------------------------------------------
2019-06-11 17:40:21 btasker
-----------------------------------------------------------------------------------------
So, looking at this some more (as I don't think redirects to alternate services are a good
long term solution), I've noticed a couple of things:
- If you force a link to open in a new tab, Google then redirects you to the proper page
- Just like in the search results (FKAMP-2) the actual AMP content is served in an iframe
directed at ampproject.org
So, there are potentially two options here.
- We *could* iterate over all links on a page and add target=_blank to them (remembering
to also add rel="noopener noreferrer" so that the new page doesn't have access to the
Google search tab via window.opener)
- Or, as was originally implemented as a possible fix for FKAMP-2, we could search for the
iframe and then try and trigger a redirect if that's present
The first feels a bit messier, but the second has a couple of additional drawbacks.
Firstly, it means a request still has to go out to foo-bar-sed.cdn.ampproject.org, so
you've got additional latency there. Secondly, if *.cdn.ampproject.org has been blocked in
the user's browser we'll never get the info back.
For some reason, even with all my scripts turned off, the iframe removes itself after a
few seconds if I open developer tools. Anyway the outer HTML for the iframe is
-- BEGIN SNIPPET --
-- END SNIPPET --
So, what we _may_ want to look at doing is checking for iframe elements, for any that are
found check whether their src contains cdn.ampproject and if it does rewrite the window
location to be that value (so the normal triggers can fire).
That's not perfect, but should work in principle
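A minimal sketch of that check, not the code from the repo. The helper name
findAmpIframeSrc is hypothetical, and the DOM part is guarded so the snippet also
loads outside a browser:

```javascript
// Hypothetical helper: return the src of the first iframe-like object whose
// src references the AMP CDN, or null if there isn't one.
function findAmpIframeSrc(iframes) {
  for (const frame of Array.from(iframes)) {
    const src = frame.src || "";
    if (src.includes("cdn.ampproject")) {
      return src;
    }
  }
  return null;
}

// In a userscript this would run against the live page. Pointing the window
// at the CDN URL lets the normal anti-AMP triggers fire on the next load.
if (typeof document !== "undefined") {
  const target = findAmpIframeSrc(document.getElementsByTagName("iframe"));
  if (target !== null) {
    window.location.href = target;
  }
}
```

Keeping the matching in a plain function means it can be exercised with simple
objects such as {src: "..."} without needing a browser.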
-----------------------------------------------------------------------------------------
2019-06-11 17:44:48 git
-----------------------------------------------------------------------------------------
-- BEGIN QUOTE --
Repo: RemoveAMP
Commit: c104941e180d239ee9cfa53b250dd67f3a6dbd12
Author: B Tasker
Date: Tue Jun 11 17:41:07 2019 +0100
Commit Message: FKAMP-5 Update the Googlesearch hook to also work (sort of) with Google
News
Introduces logic into AMPCheck to look for iframe's referencing the AMP project CDN. If
found, it updates the page to point to that URL so that the normal anti-AMP scripts can
fire.
The downside of this is it means there are a couple of page loads before you eventually
land on the full-fat page, so there's definitely some room for improvement
Modified (-)(+)
-------
greasemonkey_hook_googlesearch.user.js
-- END QUOTE --
*Webhook User-Agent*
-- BEGIN SNIPPET --
GitHub-Hookshot/d408d22
-- END SNIPPET --
https://github.com/bentasker/RemoveAMP/commit/c104941e180d239ee9cfa53b250dd67f3a6dbd12
-----------------------------------------------------------------------------------------
2019-06-11 17:50:51 btasker
-----------------------------------------------------------------------------------------
The downside of the implementation in c104941 is that we end up with several page loads:
- Google News page (technically rewritten with javascript)
- AMP CDN page
- (optional) Publisher's own AMP page
- Proper HTML page
That's very far from ideal, especially given the reasons noted above for why I don't
really want requests to have to go out to cdn.ampproject.org at all.
It's a pity it isn't as simple to work around as search results, but only the AMP
paths are written into the markup sent back from their servers.
Seems my earlier result was wrong: opening in a new tab isn't enough to force the page to
redirect you to a proper result.
-----------------------------------------------------------------------------------------
2019-06-11 17:55:24 btasker
-----------------------------------------------------------------------------------------
Looking at the URLs in FKAMP-2, along with clicking around Google News, it does look like
the URL structure for AMP Project is fairly consistent:
- https://www-theregister-co-uk.cdn.ampproject.org/v/s/www.theregister.co.uk/AMP/2017/05/19/open_source_insider_google_amp_bad_bad_bad/
- https://www-bbc-co-uk.cdn.ampproject.org/v/s/www.bbc.co.uk/news/amp/uk-politics-48598760
We can boil that down to
-- BEGIN SNIPPET --
https://({domain_name}.replace('.','-')).cdn.ampproject.org/v/s/{domain_name}/{page path}
-- END SNIPPET --
So, we could skip the hop via the AMP cdn by parsing the relevant sections out of the URL.
It'll break if they change their URL structure, but we'll burn that bridge when we come to
it
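As a rough sketch of that parsing (the function name ampCdnToPublisherUrl is made
up for illustration, and this is not the code that was committed), assuming the
/v/s/{domain_name}/{page path} layout seen in the two examples above:

```javascript
// Hypothetical helper: turn an AMP CDN URL into the publisher's own URL.
// Only handles the /v/s/ layout observed above; anything else returns null.
function ampCdnToPublisherUrl(cdnUrl) {
  let parsed;
  try {
    parsed = new URL(cdnUrl);
  } catch (e) {
    return null; // not a parseable absolute URL
  }
  if (!parsed.hostname.endsWith(".cdn.ampproject.org")) {
    return null;
  }
  // First path segment after /v/s/ is the publisher's domain, the rest is
  // the page path on that domain.
  const match = parsed.pathname.match(/^\/v\/s\/([^\/]+)(\/.*)?$/);
  if (match === null) {
    return null;
  }
  const domainName = match[1];
  const pagePath = match[2] || "/";
  return "https://" + domainName + pagePath;
}
```

Note that the result is still the publisher's own AMP page rather than the proper
HTML page, so the normal anti-AMP scripts are still needed for the final hop.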
-----------------------------------------------------------------------------------------
2019-06-11 18:06:46 git
-----------------------------------------------------------------------------------------
-- BEGIN QUOTE --
Repo: RemoveAMP
Commit: 791707a121b5c66b1a354e51e7749057bd82355c
Author: B Tasker
Date: Tue Jun 11 18:03:12 2019 +0100
Commit Message: FKAMP-5 Remove need to go to ampproject CDN before being redirected onto
the original publisher
This removes one hop from the redirect chain, and subsequent ones are much faster as you
_tend_ to speak to the same domain name for a publishers copy of the AMP as you would for
the real page, so DNS is already done and there's a connection open already.
This change means that the Anti-AMP functionality still works on Google News with
cdn.ampproject.org blocked in my adblocker
Modified (-)(+)
-------
greasemonkey_hook_googlesearch.user.js
-- END QUOTE --
*Webhook User-Agent*
-- BEGIN SNIPPET --
GitHub-Hookshot/d408d22
-- END SNIPPET --
https://github.com/bentasker/RemoveAMP/commit/791707a121b5c66b1a354e51e7749057bd82355c
-----------------------------------------------------------------------------------------
2019-06-11 18:13:21 btasker
-----------------------------------------------------------------------------------------
Although this works, it doesn't account for situations where the origin page requires a
query-string.
But, clicking through Google News I've not been able to locate any examples of that to see
how AMP Project encodes it into their URLs.
Google's documentation though -
https://developers.google.com/amp/cache/overview#query-parameter-example - seems to
specify that it'll be part of the original query string. Problem is, if we include the
original (taken from the iframe) there's all sorts of gumph in there, some of which we may
specifically not want to send to the origin server. Let's break the QS down:
-- BEGIN SNIPPET --
amp_js_v=0.1#origin=https%3A%2F%2Fnews.google.com&prerenderSize=1&visibilityState=visible&paddingTop=0&history=0&p2r=0&horizontalScrolling=0&storage=1&development=0&log=0&cap=cid&csi=0&cid=1
-- END SNIPPET --
Ah, actually, that's not so bad.
Looks like all the AMP specific stuff is pushed into the URL fragment (so is never seen by
the AMP CDN, and must just be handled in JS). So, we could just take everything up to the
fragment in order to leave the original query string in place.
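Something along these lines would do it. This is a sketch only: the name
stripAmpFragment is hypothetical and the actual commit may differ:

```javascript
// Hypothetical helper: drop the fragment (everything from the first '#')
// from the iframe src, keeping the path and query string untouched.
function stripAmpFragment(iframeSrc) {
  const hashIndex = iframeSrc.indexOf("#");
  return hashIndex === -1 ? iframeSrc : iframeSrc.slice(0, hashIndex);
}

// Example input built from one of the URLs seen earlier plus a shortened
// version of the query string and fragment shown above.
const exampleSrc =
  "https://www-bbc-co-uk.cdn.ampproject.org/v/s/www.bbc.co.uk/news/amp/uk-politics-48598760" +
  "?amp_js_v=0.1#origin=https%3A%2F%2Fnews.google.com&prerenderSize=1";
const withoutFragment = stripAmpFragment(exampleSrc);
```

In the sample above amp_js_v=0.1 sits before the '#', so it is part of the query
string and survives the split. Whether that parameter should also be dropped
before hitting the origin is left open here.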
-----------------------------------------------------------------------------------------
2019-06-11 18:14:47 git
-----------------------------------------------------------------------------------------
-- BEGIN QUOTE --
Repo: RemoveAMP
Commit: 14f9a8e250e383c35146d56fe5fcbf08a590a1ab
Author: B Tasker
Date: Tue Jun 11 18:13:43 2019 +0100
Commit Message: FKAMP-5 Split on the fragment rather than the start of the query string
Modified (-)(+)
-------
greasemonkey_hook_googlesearch.user.js
-- END QUOTE --
*Webhook User-Agent*
-- BEGIN SNIPPET --
GitHub-Hookshot/d408d22
-- END SNIPPET --
https://github.com/bentasker/RemoveAMP/commit/14f9a8e250e383c35146d56fe5fcbf08a590a1ab
-----------------------------------------------------------------------------------------
2019-06-11 18:15:59 btasker
-----------------------------------------------------------------------------------------
OK, as this is now working, I'm going to remove Google News from the redirect script, and
then look at doing a release (so that I can bump version numbers in the scripts)
-----------------------------------------------------------------------------------------
2019-06-11 18:18:55
-----------------------------------------------------------------------------------------
btasker changed status from 'Open' to 'Resolved'
-----------------------------------------------------------------------------------------
2019-06-11 18:18:55
-----------------------------------------------------------------------------------------
btasker added 'Done' to resolution
-----------------------------------------------------------------------------------------
2019-06-11 18:19:00
-----------------------------------------------------------------------------------------
btasker changed status from 'Resolved' to 'Closed'