It probably isn't possible to downsample the search terms themselves.
Activity
29-Mar-22 07:48
assigned to @btasker
29-Mar-22 07:51
Actually, on reflection, I'd go further. It isn't really desirable to downsample the terms themselves: some users search for some horrible terms, and I don't really want those in a long-lived database.
What'd be nice is if we could find a good way to categorise those terms and record a count for each. Of course, that's easier said than done (at least in any kind of reliable manner), so it may be better left as an aspiration.
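As a very rough sketch of what "categorise and count" could look like: the category names and keywords below are invented purely for illustration (real ones would need choosing per site), and naive keyword matching like this is exactly the kind of thing that's unlikely to be reliable. The point is only that the raw terms get reduced to counts and are not retained.

```python
from collections import Counter

# Hypothetical category -> keyword mapping (an assumption for illustration only;
# real categories would need to be chosen per site)
CATEGORIES = {
    "hardware": {"raspberry", "pi", "arduino", "ssd"},
    "networking": {"dns", "vpn", "nginx", "tor"},
}


def categorise(term):
    """Return the first category whose keywords overlap the words in the term."""
    words = set(term.lower().split())
    for category, keywords in CATEGORIES.items():
        if words & keywords:
            return category
    return "uncategorised"


def count_categories(terms):
    """Reduce raw search terms to per-category counts; the terms are not kept."""
    return Counter(categorise(t) for t in terms)


counts = count_categories(
    ["Raspberry Pi case", "nginx reverse proxy", "something awful"]
)
```

Only `counts` would need to be written to long-term storage, so anything unpleasant ends up as an anonymous increment against "uncategorised".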
29-Mar-22 14:44
So, yeah, I think we probably want to capture the following (per domain)
29-Mar-22 14:51
We can pull number of searches per domain with
29-Mar-22 14:55
mentioned in commit 3bd0662599f124b73f0cd5c3cd1c50ac9c152ad6
Message
Downsample search terms into a count of searches over time for websites/privacy-sensitive-analytics#13
29-Mar-22 14:57
Result counts are much simpler
29-Mar-22 14:59
mentioned in commit 124ee63effbe739afe0a4bcd39689cd3e24ef97b
Message
Record average number of search results for websites/privacy-sensitive-analytics#13
29-Mar-22 15:23
We now have count and average result counts; I don't think there's really anything else we want to keep long term.
I'll keep this open for now, on the off chance I feel inspired to try some kind of categorisation, but I suspect I won't (if nothing else, the categories are likely to be quite site specific).
One option might be to "downsample" them into a dedicated DB whilst I try and decide what to do. That way they can easily be dropped if I decide I don't want to do anything with them.
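For illustration, the two values being kept (number of searches and average result count, per domain) amount to something like the sketch below. This is not the query used in the commits above, just a plain Python approximation of the same idea; the event layout, the one hour window and the `example.invalid` domain are all assumptions.

```python
from collections import defaultdict
from datetime import datetime, timezone

WINDOW_SECONDS = 3600  # assumed downsampling window (one hour)


def downsample(events, window=WINDOW_SECONDS):
    """Reduce (timestamp, domain, term, result_count) events to per-window stats.

    The search term itself is deliberately ignored, so it never reaches the
    downsampled output.
    """
    # (window_start, domain) -> [number of searches, total results]
    buckets = defaultdict(lambda: [0, 0])
    for ts, domain, _term, result_count in events:
        epoch = int(ts.timestamp())
        window_start = epoch - (epoch % window)
        bucket = buckets[(window_start, domain)]
        bucket[0] += 1
        bucket[1] += result_count
    return {
        key: {"search_count": n, "avg_result_count": total / n}
        for key, (n, total) in buckets.items()
    }


# Made-up events for a placeholder domain
events = [
    (datetime(2022, 3, 29, 14, 5, tzinfo=timezone.utc), "example.invalid", "foo", 4),
    (datetime(2022, 3, 29, 14, 50, tzinfo=timezone.utc), "example.invalid", "bar", 10),
    (datetime(2022, 3, 29, 15, 10, tzinfo=timezone.utc), "example.invalid", "baz", 0),
]
summary = downsample(events)
```

Each key in `summary` is a (window start, domain) pair, and the values carry only the count and the mean, which is all that needs to live in the long-lived database.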
29-Mar-22 16:32
mentioned in commit a570a6b783c43d03f81f51c6483329c6759557a2
Message
Capture search terms into a separate bucket for websites/privacy-sensitive-analytics#13
This is so that we can preserve search information whilst trying to find a reliable way to categorise search inputs
29-Mar-22 16:38
I've implemented capture of search terms into the bucket search_terms/autogen. There's a wiki page detailing how to list them.