If we have a site with 300 pages, then although it seems like cardinality should be quite low, we need to factor in the effect of timezone (also currently a tag).
Suddenly you have 300 × 24 = 7,200 possible series (ignoring timezones that include a fraction of an hour), and that, of course, only grows as more pages are added.
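The multiplication above can be checked with a quick sketch. It assumes the worst case, where every page is eventually seen from every timezone. Under that assumption the series count is the product of the number of distinct values of each tag. The page paths and the 24 offset values below are hypothetical placeholders, not the site's real data.

```python
def series_cardinality(tag_values: dict[str, list[str]]) -> int:
    """Worst-case series count: the product of the number of distinct values of each tag key."""
    total = 1
    for values in tag_values.values():
        total *= len(set(values))
    return total

# Hypothetical data: 300 page paths and 24 illustrative whole-hour offsets (-11 to +12)
pages = [f"/page-{i}" for i in range(300)]
timezones = [str(offset) for offset in range(-11, 13)]

print(series_cardinality({"page": pages}))                         # 300
print(series_cardinality({"page": pages, "timezone": timezones}))  # 7200
```

Every additional tag multiplies the total rather than adding to it. That is why a seemingly small tag like timezone matters.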
I think that having page (and certainly section) broken out into tags is probably more useful than having timezone as a tag; timezone could go back to being a field.
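Here is a minimal sketch of what that split could look like, assuming an InfluxDB-style line protocol (the tag/field terminology suggests one, but that is an assumption). Page and section become tags. Timezone becomes an integer field, so it no longer contributes to series cardinality. The measurement name `pageview` and the tag keys are hypothetical.

```python
def escape_tag(value: str) -> str:
    # Line protocol requires commas, equals signs and spaces in tag values to be backslash-escaped
    return value.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")

def to_line_protocol(page: str, section: str, tz_offset: int, ts_ns: int) -> str:
    # Tags (indexed, each distinct combination is a new series): page and section
    tags = f"page={escape_tag(page)},section={escape_tag(section)}"
    # Field (not indexed): timezone as an integer, marked with the trailing "i"
    fields = f"timezone={tz_offset}i"
    return f"pageview,{tags} {fields} {ts_ns}"

print(to_line_protocol("/about", "main", -5, 1639597380000000000))
# pageview,page=/about,section=main timezone=-5i 1639597380000000000
```

The trade-off is that fields are not indexed, so filtering or grouping by timezone becomes slower. For an occasional breakdown that is usually acceptable.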
One flipside, though, is that page is more easily played with by the malicious than timezone. We could quite trivially ensure that timezone is an int in range; it's a lot harder to place bounds on a URL path. What I don't want is for someone to start squirting random strings in for the lulz and causing runaway cardinality.
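One way to bound both values is sketched below; it is a suggestion, not the project's implementation. It assumes the client reports a whole-hour UTC offset, which is checked against the real-world range of UTC-12 to UTC+14. It also assumes the set of legitimate paths is known in advance, for example from a sitemap. `KNOWN_PATHS`, `OTHER_PATH` and both helper functions are hypothetical. Unknown paths collapse into a single bucket, so the path tag can never have more than `len(KNOWN_PATHS) + 1` values.

```python
from typing import Optional

# Hypothetical bounds: whole-hour UTC offsets run from UTC-12 to UTC+14
MIN_OFFSET, MAX_OFFSET = -12, 14

# Hypothetical allowlist of legitimate paths, e.g. generated from the sitemap
KNOWN_PATHS = frozenset({"/", "/about", "/posts/example"})
OTHER_PATH = "(other)"  # single bucket for anything not on the allowlist

def validate_timezone(raw: str) -> Optional[int]:
    """Return the offset as an int if it parses and is in range, else None (drop it)."""
    try:
        value = int(raw)
    except ValueError:
        return None
    return value if MIN_OFFSET <= value <= MAX_OFFSET else None

def normalise_path(raw: str) -> str:
    """Strip query string and fragment, then map unknown paths to one shared bucket."""
    path = raw.split("?", 1)[0].split("#", 1)[0]
    return path if path in KNOWN_PATHS else OTHER_PATH

print(validate_timezone("-5"), validate_timezone("99"), validate_timezone("lulz"))  # -5 None None
print(normalise_path("/about?utm=x"), normalise_path("/random-junk"))               # /about (other)
```

The cost of an allowlist is that it has to be regenerated whenever pages are added. Otherwise new pages are counted under `(other)` until it is updated.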
Activity
15-Dec-21 19:43
assigned to @btasker
15-Dec-21 19:51
15-Dec-21 20:18
mentioned in commit 5be0a33d3c967fdd9489f74e354168cb0c9c1ec2
Message
Switch path to a tag and timezone to a field for websites/privacy-sensitive-analytics#2