websites/privacy-sensitive-analytics#2: Page paths as tags

Reported By: btasker
Assigned To: btasker

Milestone: 0.1
Created: 15-Dec-21 19:43


Do we want to have page paths as tags?

It'd mean we can GROUP BY and things like that (useful in reporting), but it comes at the cost of higher series cardinality.

assigned to @btasker

If we have a site with 300 pages, then although it seems like cardinality should be quite low, we need to factor in the effect of timezone (also currently a tag).

Suddenly you have up to 300 * 24 = 7,200 series (ignoring timezones that include a fraction of an hour), and that, of course, only gets bigger the more pages there are.

I think that having page (and certainly section) broken out into tags is probably more useful than having timezone - it could go back to being a field.
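
As a rough illustration of that trade-off, here is a minimal sketch that assumes the worst-case series count is the product of the number of distinct values for each tag. The function name and the tag counts (300 pages, 24 whole-hour timezones) are illustrative only, taken from the example above rather than from any real data.

```python
from math import prod


def worst_case_series(tag_value_counts: dict[str, int]) -> int:
    """Upper bound on series count: product of distinct values per tag."""
    return prod(tag_value_counts.values())


# Path and timezone both stored as tags
both_tags = worst_case_series({"path": 300, "timezone": 24})

# Timezone moved to a field, so only path contributes to cardinality
path_only = worst_case_series({"path": 300})

print(both_tags)  # 7200
print(path_only)  # 300
```

Fields don't contribute to series cardinality, so moving timezone out of the tag set drops the bound back to the number of distinct paths.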

One flipside, though, is that page is more easily played with by the malicious than timezone. We could quite trivially ensure that timezone is an int in range - it's a lot harder to place bounds on a URL path. What I don't want is for someone to start squirting random strings in for the lulz and causing runaway cardinality.


Switch path to a tag and timezone to a field for websites/privacy-sensitive-analytics#2

