project Websites / Privacy Sensitive Analytics

websites/privacy-sensitive-analytics#2: Page paths as tags



Issue Information

Issue Type: issue
Status: closed
Reported By: btasker
Assigned To: btasker

Milestone: 0.1
Created: 15-Dec-21 19:43



Description

Do we want to have page paths as tags?

It'd mean we can GROUP BY page and things like that (useful in reporting), but it comes at the cost of increased cardinality.



Activity


assigned to @btasker

If we have a site with 300 pages, then although it seems like cardinality should be quite low, we need to factor in the effect of timezone (also currently a tag).

Suddenly you have 300 × 24 = 7,200 potential series (ignoring timezones that include a fraction of an hour), and that, of course, only gets bigger the more pages there are.
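As a rough sketch of how that scales: assuming every distinct combination of tag values can end up as its own series, the worst case is the product of the number of distinct values per tag. The function name and the figures below are illustrative only.

```python
import math


def worst_case_series(*tag_value_counts: int) -> int:
    """Upper bound on series count: the product of distinct values per tag.

    Assumes every combination of tag values actually occurs, so the real
    figure will usually be lower.
    """
    return math.prod(tag_value_counts)


pages = 300            # distinct page paths on the site
timezone_offsets = 24  # whole-hour offsets only, as in the comment above

# Path and timezone both stored as tags
print(worst_case_series(pages, timezone_offsets))

# Path as the only one of the two stored as a tag
print(worst_case_series(pages))
```

Dropping either value out of the tag set removes its multiplier from the bound entirely.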

I think that having page (and certainly section) broken out into tags is probably more useful than having timezone - it could go back to being a field.
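To illustrate the shape of that change, here is a minimal sketch that assumes the data is written as InfluxDB-style line protocol (the issue does not name the database). The measurement name `pageviews` and the helper names are hypothetical, not taken from the project.

```python
def escape_tag(value: str) -> str:
    """Backslash-escape the characters line protocol treats as special
    in tag values: commas, equals signs and spaces."""
    for ch in (",", "=", " "):
        value = value.replace(ch, "\\" + ch)
    return value


def build_point(path: str, timezone_offset: int,
                measurement: str = "pageviews") -> str:
    """Build one line-protocol point with path as a tag and timezone as a field.

    Tags are indexed (so they can be grouped on, and contribute to
    cardinality); fields are not indexed.
    """
    tags = f"path={escape_tag(path)}"
    # The trailing 'i' marks the field value as an integer
    fields = f"timezone={int(timezone_offset)}i"
    return f"{measurement},{tags} {fields}"


print(build_point("/posts/example.html", 1))
```

With this layout, grouping by `path` stays possible, while the timezone value no longer multiplies the number of series.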

One flipside, though, is that page is more easily played with by the malicious than timezone. We could quite trivially ensure that timezone is an int in range, but it's a lot harder to place bounds on a URL path. What I don't want is for someone to start squirting random strings in for the lulz and causing runaway cardinality.
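A sketch of the contrast, under stated assumptions: the timezone check is the trivial range test described above, while the path check shows just one possible way of bounding paths (an allowlist of known pages, with everything else collapsed into a single fallback value). The allowlist approach, the offset range of -12 to 14 and all names here are hypothetical and not something this issue settles on.

```python
from typing import Optional

# Hypothetical allowlist; in practice this might be generated from a sitemap
KNOWN_PATHS = {"/", "/about.html", "/posts/example.html"}


def valid_timezone(raw: str, low: int = -12, high: int = 14) -> Optional[int]:
    """Return the offset as an int if it parses and is in range, else None."""
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return None
    return value if low <= value <= high else None


def bounded_path(raw: str, known=KNOWN_PATHS, fallback: str = "other") -> str:
    """Strip any query string and fragment, then collapse unknown paths
    into one fallback value so random strings cannot create new tag values."""
    path = raw.split("?", 1)[0].split("#", 1)[0]
    return path if path in known else fallback


print(valid_timezone("5"))                  # in range
print(valid_timezone("99"))                 # out of range
print(bounded_path("/about.html?x=1"))      # known page
print(bounded_path("/aGVsbG8gd29ybGQ"))     # random string
```

The trade-off with an allowlist is that new pages are reported under the fallback value until the list is updated, but the number of distinct path values stays capped at the size of the list plus one.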

verified

mentioned in commit 5be0a33d3c967fdd9489f74e354168cb0c9c1ec2

Commit: 5be0a33d3c967fdd9489f74e354168cb0c9c1ec2
Author: B Tasker
Date: 2021-12-15T20:07:57.000+00:00

Message

Switch path to a tag and timezone to a field for websites/privacy-sensitive-analytics#2

+3 -2 (5 lines changed)