
websites/privacy-sensitive-analytics#17: Increase odds of Session ID rotating



Issue Information

Issue Type: issue
Status: closed
Reported By: btasker
Assigned To: btasker

Milestone: v0.4
Created: 05-Apr-22 11:10



Description

In websites/privacy-sensitive-analytics#16 a counter has been implemented so that session IDs rotate every 3 uses.

Currently, there's a 1:10 random chance of a rotation. Now that an ID only has a 3 use lifetime, that random rotation will very rarely fire before the counter rotates the ID anyway.

The random rotation was left in with the intention of it increasing uncertainty when trying to correlate requests, but it doesn't currently achieve that.

So, we should either remove it, or increase the probability of it firing.



Issue Links


Activity


assigned to @btasker

marked this issue as related to #16

Ideally, we want it to trigger frequently enough to disrupt pattern analysis, whilst not triggering so often that it becomes a signal in its own right.

With a 1:10 chance, I'd say that the majority of the time, it won't trigger before standard rotation.

If we went for a 1:3 chance, the rotation pattern would be

  • Req 1: brand new, no rotation
  • Req 2: 1/3 chance of rotation
  • Req 3: 1/3 chance of rotation
  • Req 4: 100% chance of rotation (assuming no earlier one)
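To put numbers on that pattern, here's a rough sketch that works out where the first rotation lands for a given chance. It assumes the random check is independent on each request and that the counter forces a rotation on the request after the third use, as in the list above. The function name and its parameters are made up for illustration, they aren't taken from the codebase.

```python
from fractions import Fraction


def rotation_distribution(p, max_uses=3):
    """Map request number -> probability that the first rotation happens there.

    p is the per-request chance of a random rotation (e.g. Fraction(1, 3)).
    Request 1 creates the ID, so it can never rotate.
    """
    dist = {}
    survive = Fraction(1)  # probability that the ID has not rotated yet
    for req in range(2, max_uses + 1):
        dist[req] = survive * p
        survive *= 1 - p
    # If no random rotation fired, the counter forces one on the next request
    dist[max_uses + 1] = survive
    return dist


for odds in (10, 5, 3):
    dist = rotation_distribution(Fraction(1, odds))
    summary = ", ".join(f"req {req}: {prob}" for req, prob in dist.items())
    print(f"1:{odds} -> {summary}")
```

With a 1:3 chance that gives 1/3, 2/9 and 4/9 for requests 2, 3 and 4. With 1:10 it gives 1/10, 9/100 and 81/100, so roughly 4 in 5 IDs would still last until the counter rotates them.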

Is 1:3 too high though, do we perhaps want 1:5?

I guess we need to think about why we want rollover at all.

If we get a bunch of pings without rollover

  • /foo.html sess-1234
  • /bar.html sess-1234
  • /sed.html sess-1234
  • /foobar.html sess-789

We can

  • reliably track sess-1234 for the first few requests
  • say with some confidence that sess-789 is probably the same user as it appeared at the known ID rotation point

With random rollover enabled, that same log might look like

  • /foo.html sess-1234
  • /bar.html sess-1234
  • /sed.html sess-6354
  • /foobar.html sess-789

There's now no definite point we can rely on as the rollover point.

Of course, if that user is the only active user at that time, then we don't gain anything, but if there are multiple active users (all using rollover) it becomes harder to tie a chain of requests together.
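For reference, the combination of counter and random rollover being discussed could be sketched roughly like this. It is a minimal illustration rather than the project's actual code: the SessionTracker class, its method names and the ID format are all assumptions. The only details taken from this issue and #16 are the 3 use limit and the 1:N random chance.

```python
import random
import secrets


class SessionTracker:
    """Hypothetical sketch: rotate an ID after max_uses writes, or earlier at random."""

    def __init__(self, rng=None, max_uses=3, rotate_chance=3):
        self.rng = rng or random.Random()
        self.max_uses = max_uses            # counter limit (3 uses, per #16)
        self.rotate_chance = rotate_chance  # N in a 1:N chance of random rotation
        self.session_id = self._new_id()
        self.uses = 0

    def _new_id(self):
        # The ID format is an assumption, chosen to resemble the log examples
        return "sess-" + secrets.token_hex(8)

    def _rotate(self):
        self.session_id = self._new_id()
        self.uses = 0

    def id_for_next_write(self):
        if self.uses >= self.max_uses:
            # Standard rotation: the ID has reached its use limit
            self._rotate()
        elif self.uses > 0 and self.rng.randrange(self.rotate_chance) == 0:
            # Random rotation: never applied to a brand new ID
            self._rotate()
        self.uses += 1
        return self.session_id


tracker = SessionTracker()
for path in ("/foo.html", "/bar.html", "/sed.html", "/foobar.html"):
    print(path, tracker.id_for_next_write())
```

Each run prints a different log. Sometimes the ID changes at the fourth request, sometimes earlier, which is the uncertainty the random rollover is meant to add.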

So what we're trying to do is strike a balance where

  • The chance of rollover is high enough that many users won't reach the max-count
  • The chance of rollover is low enough to ensure that the majority of users won't roll over at the same point

Essentially, we need each user to have a good - but not guaranteed - chance of rotating session ID at each point.

Actually, I'm thinking about the odds backwards.

If there's a 1:3 chance of rotation then there's a 2:3 chance that a given write is related to the previous one. Changing the odds of rotation to 1:5 would increase that certainty (a 4:5 chance of being related), not decrease it.
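The same arithmetic as a quick check, for the three odds that have been mentioned. This only looks at a single random check and ignores the forced rotation at the use limit. The function name is just illustrative.

```python
from fractions import Fraction


def related_certainty(rotation_chance):
    """Chance that a write keeps the previous write's ID at a random check."""
    return 1 - rotation_chance


for odds in (10, 5, 3):
    p = Fraction(1, odds)
    print(f"1:{odds} rotation -> {related_certainty(p)} chance of being related")
```

Lower odds of rotation mean more certainty for anyone trying to chain writes together, so of the three, 1:3 leaves the least.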

verified

mentioned in commit 03c3049

Commit: 03c30497d9f7617b82f2dad4324556f32993538a 
Author: B Tasker
Date: 2022-04-07T16:22:13.000+01:00 

Message

Move to a 1:3 chance of a session ID rotating (websites/privacy-sensitive-analytics#17)

This decreases certainty about whether two separate writes are related.

  • With a 1:10 chance of rotation there's a 9:10 certainty of two writes being related
  • Moving to 1:3 decreases that certainty to 2:3
+2 -2 (4 lines changed)

I've moved us to a 1:3 chance of rotation.