Had, err, fun over the weekend (jira-projects/CDN#17).
I'd like to make adjustments to the agent so that we collect some additional information when a bad domain is detected.
Not just because of that issue (it's a serious edge-case), but also because I quite regularly see videos.bentasker.co.uk
appear in the bad domains list, and I've no idea why. So, I'd like to update the agent to try and tell me a bit more in both cases.
The aim is to identify the source of the traffic, not the user/visitor.
Currently we capture:
I'd like to expand that data-set for bad domains:
I've added Referrer into the second list because I think, for a bad domain, we should change the way we handle it. Ordinarily, we blank it if the referrer is on the same domain:
var referrer_domain = '';
var referrer = document.referrer;
if (referrer.startsWith(document.location.protocol + "//" + document.location.hostname)){
// In-site navigation, blank the referrer - we're not looking to stalk users around the site
referrer = '';
}
But, where the site has been mirrored (for example, that Cellebrite issue), there's a good chance everything will be within a single domain. So, it'd be useful to remove this restriction for bad domains.
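Something along these lines would probably do it (a minimal sketch rather than the actual agent change - isBadDomain here is just a placeholder for whatever flag the existing bad-domain check sets):

// Sketch only: leave the referrer intact when the serving domain has been flagged as bad,
// otherwise blank in-site referrers as before
var referrer = document.referrer;
if (!isBadDomain &&
    referrer.startsWith(document.location.protocol + "//" + document.location.hostname)){
    // In-site navigation on a legitimate domain - we're not looking to stalk users around the site
    referrer = '';
}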
Linked issues:
#15 | Bad Domains reporting should handle file:// scheme
#14 | Remove screen capture ability
Activity
28-Mar-22 09:33
assigned to @btasker
28-Mar-22 09:40
I've got mixed feelings about capturing the Page HTML - it's one of those things that's a waste of bandwidth/storage right up until it isn't.
It would have been interesting in jira-projects/CDN#17 to see what they were viewing; similarly, it'd be useful to see what a clickjacker is serving to others, and it might help me figure out why videos.bentasker.co.uk keeps showing up in the list.
We can grab it with document.documentElement.outerHTML, though we might want to think about compressing it (the challenge there, though, is then decompressing it to make it usable in reporting).
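Grabbing it is basically a one-liner (sketch only - the payload/property names here are made up, not the agent's actual schema):

// Sketch: stash the full page markup alongside the bad-domain report
var report = {}; // stands in for whatever payload object the agent already builds
report.bad_domain_html = document.documentElement.outerHTML;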
28-Mar-22 10:01
If we do capture page HTML, I'd like to be able to configure/define exclusions to this.
For example, we know there will be lots of onion.ws and onion.ly in there (because Tor2Web). There's precious little benefit in sending page HTML for most of those.
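A suffix check against a small config list would probably be enough (sketch only - the list and helper names are hypothetical, not existing agent config):

// Sketch: domains we don't bother capturing page HTML for
var html_capture_exclusions = ["onion.ws", "onion.ly"];

function shouldCaptureHTML(hostname){
    // Returns false if the hostname ends with one of the excluded suffixes
    for (var i = 0; i < html_capture_exclusions.length; i++){
        if (hostname.endsWith(html_capture_exclusions[i])){
            return false;
        }
    }
    return true;
}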
28-Mar-22 10:10
As a low-effort solution for compression, I guess we use this - https://github.com/pieroxy/lz-string/blob/master/libs/lz-string.js - and then base64 it (although btoa doesn't seem to like its output).
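lz-string actually ships a base64 variant which sidesteps the btoa issue - btoa chokes because compress() returns UTF-16 characters outside the Latin-1 range, whereas compressToBase64() gives a plain base64 string directly. A quick sketch (assuming the library has been loaded alongside the agent):

// Sketch: compress the captured HTML and base64 it in one step
var page_html = document.documentElement.outerHTML;
var compressed = LZString.compressToBase64(page_html);

// Reporting side: reverse it to get the original markup back
var restored = LZString.decompressFromBase64(compressed);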
28-Mar-22 12:49
mentioned in commit 09a4797baae12c6b6be95c5955f494967841d552
Message
Implement collection of additional information when a bad domain is detected - see websites/privacy-sensitive-analytics#10
28-Mar-22 14:17
At this point, collection works, but we end up trying to write fairly massive blobs into the DB.
I decided it's better to write the files to disk on Mikasa and then simply push a URL to view the image at.
28-Mar-22 14:21
OK, this has tested OK, so I think we roll a release and then look at improving reporting in the next version.
28-Mar-22 14:31
mentioned in commit ae06788c235a481d6daf180f9815bf20096eee26
Message
Write screenshot to disk rather than trying to push it upstream. See websites/privacy-sensitive-analytics#10
28-Mar-22 14:53
For now, can query out with
29-Mar-22 07:40
mentioned in issue #12
02-Apr-22 07:47
mentioned in issue #14
02-Apr-22 07:59
mentioned in issue #15
02-Apr-22 08:00
marked this issue as related to #15
02-Apr-22 08:00
marked this issue as related to #14