
websites/privacy-sensitive-analytics#7: Design downsampling



Issue Information

Issue Type: issue
Status: closed
Reported By: btasker
Assigned To: btasker

Milestone: 0.1
Created: 18-Dec-21 19:46



Description

The data will ultimately need to be downsampled.

However, the schema implemented so far doesn't lend itself well to downsampling with a Continuous Query: there are transforms and pivots needed, so it probably makes as much sense to perform it with a set of Flux queries.

This ticket is being raised to start thinking about what those might look like.




Activity


assigned to @btasker

When looking at historic data, it's all about aggregate figures.

So, for referring domains we probably want to

  • Pivot to get per-referring-domain figures
  • Extract the top n% and then allocate the rest to "Other" or something

That way we might show that referring domains were

  • Google: 100
  • Bing: 80
  • Slashdot: 50
  • Other: 40
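
As a rough illustration of the "top n plus Other" idea, here is a minimal Python sketch. It is not the Flux that would actually run against the database: the function name `downsample_top_n`, the `keep` parameter and the sample input figures are all hypothetical, and it keeps a fixed number of entries rather than a percentage to keep things simple.

```python
from collections import Counter


def downsample_top_n(counts, keep=3, other_label="Other"):
    """Keep the `keep` largest entries and fold the remainder into one bucket.

    `counts` maps a referring domain to its page view total for the window.
    Both the name and the signature are assumptions made for this sketch.
    """
    # Rank domains by page views, highest first
    ranked = Counter(counts).most_common()
    result = dict(ranked[:keep])

    # Everything outside the top `keep` is summed into a single "Other" entry
    remainder = sum(value for _, value in ranked[keep:])
    if remainder:
        result[other_label] = result.get(other_label, 0) + remainder
    return result


# Hypothetical per-domain totals for one downsampling window
views = {
    "Google": 100,
    "Bing": 80,
    "Slashdot": 50,
    "example.com": 25,
    "example.org": 15,
}

summary = downsample_top_n(views, keep=3)
print(summary)
```

The total is preserved (the long tail is merged, not dropped), so aggregate page view counts stay correct while the number of distinct domains stored per window stays bounded.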

We could go another way and convert the referring domain to a tag, but the long-term cardinality ramifications of that are likely to be quite high. It would allow us to say example.com has referred 200 page views over the last year, but traffic comes from a hell of a lot of sources, so there's a big impact even when it's just domains.

I have a similar line of thought for platform: the set of values could easily blow up over time, so again it'd be worth taking the top ten or similar.

Forgot to reference this issue in commits, obviously having too much fun...

Commits are

  • 6aba5271
  • 228e0909
  • df9c83a9