
websites/privacy-sensitive-analytics#4: Page view count



Issue Information

Issue Type: issue
Status: closed
Reported By: btasker
Assigned To: btasker

Milestone: 0.1
Created: 18-Dec-21 11:04



Description

One of the features you'll quite often see in analytics systems is bounce rate - i.e. what percentage of users landed, viewed that one page and then bounced?

I'd like to implement that, but it's a little bit of a challenge - the point of this analytics system is that it shouldn't be possible to track a user's movement across the site.

Still, it'd be interesting to know how many pages a visitor views per visit.




Activity


assigned to @btasker

I really don't want to use cookies for this. At best it's completely redundant to have the information sent across the wire; at worst you end up exposing more information than intended.

My current thinking is to use session storage - check whether there's an item for the current domain, create one if not, then increment it by 1.

It won't be shared across tabs, so some views will be missed, but it also means that a user's individual visits should not be linked.
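
A minimal sketch of that idea follows. It is not the code from the commit: the function name and the key prefix are hypothetical, and the storage object is passed in so that the same logic can be exercised outside a browser.

```javascript
// Hypothetical helper: name and key prefix are illustrative only,
// not taken from the agent's actual source.
function incrementPageViews(storage, domain) {
  const key = "pf_viewed_pages_" + domain;
  // getItem returns null if nothing has been stored for this key yet,
  // and parseInt(null, 10) yields NaN
  const previous = parseInt(storage.getItem(key), 10);
  const count = (Number.isNaN(previous) ? 0 : previous) + 1;
  // Web Storage only holds strings, so serialise the count on write
  storage.setItem(key, String(count));
  return count;
}

// In a page this would be called once per load, for example:
// const viewedPages = incrementPageViews(window.sessionStorage, window.location.hostname);
```

Because sessionStorage is scoped per tab and per origin, nothing here needs to be sent to the server beyond the resulting count.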


mentioned in commit 481c254edf230f2b640236cc9dc2061d2b75563b

Commit: 481c254edf230f2b640236cc9dc2061d2b75563b
Author: B Tasker
Date: 2021-12-18T11:18:41.000+00:00

Message

Add a page view counter for websites/privacy-sensitive-analytics#4

This will not track between sessions, or even browser tabs, but can be used to help calculate the bounce-rate

+20 -4 (24 lines changed)

The commit above implements this using sessionStorage.

So, counts won't be shared across tabs (or domains) and will reset at the end of the user's session.

When talking about bounce-rate, you'll sometimes see a "Time spent on page" stat. This could be tracked with an unload event handler (onunload), but I don't overly like the idea of making the user re-contact our server just to be able to leave the page.

So, I've not implemented time capturing.

Reporting-wise, you can obviously graph this with mean(viewed_pages), but it should also be possible to bucket the values with Flux:

from(bucket: "telegraf/autogen")
  |> range(start: v.timeRangeStart)
  |> filter(fn: (r) => r._measurement == "pf_analytics_test")
  |> filter(fn: (r) => r.domain == "www.bentasker.co.uk")
  |> filter(fn: (r) => r._field == "viewed_pages")
  |> map(fn: (r) => ({ r with value_bucket:
        if (r._value == 1) then "1"
        else if (r._value > 1 and r._value <= 5) then "2-5"
        else if (r._value > 5 and r._value <= 10) then "6-10"
        else "10+"
     }))
  |> group(columns: ["value_bucket"])
  |> count()

This would show how many points fall into each bucket. The only catch is that there will be multiple points for any user with more than one page view, because each view writes a point with the running count. So if a user has viewed 8 pages, the report would be

  • 1: 1
  • 2-5: 4
  • 6-10: 3

That is much less clear than it should be. It should be possible to adjust for this, but I'm not overly concerned about this report for now, so I'll leave it until I have more brainpower spare.
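
For reference, one possible adjustment, sketched in JavaScript rather than Flux. It assumes every page view writes exactly one point carrying the session's running count and that no writes are lost. Under that assumption, the number of sessions that ended on exactly n pages is the number of points with value n minus the number with value n + 1. The function names and the shape of pointCounts are hypothetical.

```javascript
// pointCounts maps a viewed_pages value to the number of points carrying it
// (for example, the result of grouping by _value and counting).
function sessionsByFinalCount(pointCounts) {
  const finals = {};
  for (const key of Object.keys(pointCounts)) {
    const n = Number(key);
    // Sessions that reached n pages, minus those that went on to page n + 1
    const ended = pointCounts[n] - (pointCounts[n + 1] || 0);
    if (ended > 0) finals[n] = ended;
  }
  return finals;
}

// Same bucket boundaries as the Flux query above
function bucketLabel(n) {
  if (n === 1) return "1";
  if (n <= 5) return "2-5";
  if (n <= 10) return "6-10";
  return "10+";
}

function bucketSessions(pointCounts) {
  const buckets = {};
  for (const [n, sessions] of Object.entries(sessionsByFinalCount(pointCounts))) {
    const label = bucketLabel(Number(n));
    buckets[label] = (buckets[label] || 0) + sessions;
  }
  return buckets;
}

// One 8-page session plus two single-page sessions:
// value 1 appears three times, values 2 to 8 once each.
const example = { 1: 3, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1 };
console.log(bucketSessions(example)); // 2 sessions in "1", 1 session in "6-10"
```

With this, the 8-page visitor is counted once, in the 6-10 bucket, instead of contributing to three buckets.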

mentioned in issue #5

OK, so as an example, what I've done is

  • Created pfanalytics.bentasker.co.uk in BunnyCDN
  • Cache expiration - Respect origin-cache control
  • Strip response cookies (just to prevent any accidents)
  • Forward host header to origin

I haven't cut over yet though. There doesn't seem to be a way to tell the CDN not to send X-Forwarded-For, so we don't gain much this way. What we do get for our troubles, though, is additional latency.

So, it might be better to stick with going direct to one of my nodes.

^ I think this was meant for #5

I removed the pageview counter in fe3b7a89

It's not turned out as useful/interesting as I first expected, so it doesn't really feel worth the effort of addressing the issues I detailed above.

Although technically it was Done and then UnDone, I'll close as Won't Fix as this feature's not going forward