Project: Websites / Privacy Sensitive Analytics

websites/privacy-sensitive-analytics#8: Periodic roll-up report



Issue Information

Issue Type: issue
Status: closed
Reported By: btasker
Assigned To: btasker

Milestone: 0.2
Created: 29-Dec-21 11:10



Description

I want the system to periodically create (and email?) a report.

It should accept different time ranges, so that it can generate reports covering:

  • 1 week
  • 1 month
  • 1 year

(maybe also a quarterly report?)
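One way the report periods above could be turned into query windows is sketched below. The `REPORT_PERIODS` table, the `time_clause` helper and the day counts chosen for month, quarter and year are all assumptions for illustration, not something taken from the actual script. InfluxQL duration literals have no month or year unit, so those periods are approximated in days.

```python
# Hypothetical mapping of report period names to InfluxQL duration literals.
# InfluxQL durations stop at weeks (w), so longer periods are approximated in days.
REPORT_PERIODS = {
    "week": "7d",
    "month": "30d",
    "quarter": "91d",
    "year": "365d",
}


def time_clause(period):
    """Return an InfluxQL WHERE fragment covering the requested report period."""
    try:
        duration = REPORT_PERIODS[period]
    except KeyError:
        raise ValueError(f"unknown report period: {period!r}") from None
    return f"time > now() - {duration}"
```

A scheduler (cron, for example) could then call the report once per period name, and a quarterly report would only need the extra table entry.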




Activity


assigned to @btasker

Originally, the intention had been to include this in the downsample script, but I'm not altogether convinced that's actually the best idea.

It's probably better to have a dedicated script.

The report should be a high-level summary (so encompass all recorded domains). What I'd like to see is:

  • Total page views
  • Average response time (maybe also per domain?)
  • Share of pageviews per site
  • Share of pageviews per platform
  • Share of pageviews per timezone
  • Most common referrers
  • Top 10 pages
  • Average video ready time
  • Most popular videos (or perhaps share per video)
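The "share of" and "top 10" items in the list above both reduce to simple operations on per-key counts. A minimal sketch, assuming the counts have already been pulled out of the database into a plain dictionary (the `shares` and `top_n` helpers and the example data are hypothetical):

```python
from collections import Counter


def shares(counts):
    """Convert raw page view counts into percentage shares of the total."""
    total = sum(counts.values())
    if total == 0:
        return {key: 0.0 for key in counts}
    return {key: 100 * value / total for key, value in counts.items()}


def top_n(counts, n=10):
    """Return the n highest entries as (key, count) pairs, largest first."""
    return Counter(counts).most_common(n)


# Made-up example data, not real figures from the system.
views = {"example.com": 150, "example.org": 50}
print(shares(views))      # {'example.com': 75.0, 'example.org': 25.0}
print(top_n(views, n=1))  # [('example.com', 150)]
```

The same two helpers would cover shares per site, platform and timezone, as well as the referrer, page and video rankings, given a different dictionary each time.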

Where averages are being shown, it'd be good to also include min/max and maybe percentiles?
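As a rough illustration of what reporting min/max and percentiles alongside an average could look like, here is a sketch using only the standard library. The `summarise` helper, the choice of the 50th and 95th percentiles and the nearest-rank method are assumptions; the real script may well let the database compute these instead.

```python
import math
from statistics import mean


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarise(values):
    """Summarise a non-empty list of timings (for example response times)."""
    ordered = sorted(values)
    return {
        "min": ordered[0],
        "max": ordered[-1],
        "mean": mean(ordered),
        "p50": percentile(ordered, 50),
        "p95": percentile(ordered, 95),
    }
```

A single slow outlier pulls the mean up noticeably while leaving the median alone, which is the main reason to show percentiles next to the average.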

Ideally, this would all get wrapped up with pie charts, graphs etc.


mentioned in commit 093cb700160fcb1f17af4d84ff5a850a87410485

Commit: 093cb700160fcb1f17af4d84ff5a850a87410485 
Author: B Tasker                            
                            
Date: 2021-12-29T11:45:33.000+00:00 

Message

Capture page views and response times for reporting. See websites/privacy-sensitive-analytics#8

+260 -0 (260 lines changed)

mentioned in commit 2bf0d7633699d00195a962329971662b6b497bc4

Commit: 2bf0d7633699d00195a962329971662b6b497bc4 
Author: B Tasker                            
                            
Date: 2021-12-29T12:16:16.000+00:00 

Message

Implement extraction of page views for websites/privacy-sensitive-analytics#8

This pulls out page views per domain, per

  • platform
  • timezone
+76 -1 (77 lines changed)

mentioned in commit fba11aa56a800bc21015ed19838e74ca99637c17

Commit: fba11aa56a800bc21015ed19838e74ca99637c17 
Author: B Tasker                            
                            
Date: 2021-12-29T12:28:49.000+00:00 

Message

Collect referrer info for websites/privacy-sensitive-analytics#8

+21 -0 (21 lines changed)

mentioned in commit 7c72d90ee69ac1640f45ec7a6ea47ce494c5d163

Commit: 7c72d90ee69ac1640f45ec7a6ea47ce494c5d163 
Author: B Tasker                            
                            
Date: 2021-12-29T17:56:59.000+00:00 

Message

Implement reporting script for websites/privacy-sensitive-analytics#8

It's dirty, messy and ugly, but it works.

+349 -5 (354 lines changed)

mentioned in commit 30a9ce880058550187b0c2baea7e0f595a9a6d70

Commit: 30a9ce880058550187b0c2baea7e0f595a9a6d70 
Author: B Tasker                            
                            
Date: 2021-12-29T13:07:37.000+00:00 

Message

Capture video playback info for websites/privacy-sensitive-analytics#8

+60 -3 (63 lines changed)

mentioned in commit 639ce67c97e3486170429ec704293dff186c3c30

Commit: 639ce67c97e3486170429ec704293dff186c3c30 
Author: B Tasker                            
                            
Date: 2021-12-29T12:51:50.000+00:00 

Message

Collect details of top 10 pages for websites/privacy-sensitive-analytics#8

+26 -0 (26 lines changed)

A run of yearly stats OOMs my instance (though it is somewhat RAM restricted), so I've created a new retention policy and set up a CQ to downsample into it:

    CREATE CONTINUOUS QUERY create_daily_website_stats ON websites
    BEGIN
      SELECT
        max(max_response_time) AS max_response_time,
        min(min_response_time) AS min_response_time,
        sum(platform_PageViews) AS platform_PageViews,
        sum(referrer_PageViews) AS referrer_PageViews,
        sum(requests) AS requests,
        mean(response_time) AS response_time,
        sum(tz_PageViews) AS tz_PageViews
      INTO "websites"."analytic_low_granularity"."pf_analytics"
      FROM "websites"."analytics"."pf_analytics"
      GROUP BY time(1d), *
    END
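With the daily roll-up in place, the reporting script could read from the downsampled retention policy for long ranges and keep the raw data for short ones. The sketch below takes the two measurement paths from the continuous query above; the `source_for` and `requests_query` helpers and the one-week threshold for switching over are assumptions, not the script's actual behaviour.

```python
# Measurement paths are taken from the continuous query; the one-week
# threshold for switching to the downsampled data is an assumption.
RAW_SOURCE = '"websites"."analytics"."pf_analytics"'
DAILY_SOURCE = '"websites"."analytic_low_granularity"."pf_analytics"'


def source_for(days):
    """Pick the measurement to query for a report spanning `days` days."""
    return DAILY_SOURCE if days > 7 else RAW_SOURCE


def requests_query(days):
    """Build an InfluxQL query totalling the requests field over `days` days."""
    return (
        f"SELECT sum(requests) AS requests FROM {source_for(days)} "
        f"WHERE time > now() - {days}d"
    )
```

Sums, minimums and maximums survive this kind of downsampling unchanged. Averaging the daily `response_time` means, however, gives each day equal weight regardless of traffic, so a yearly mean computed from the roll-up may differ slightly from one computed over the raw points.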