Wiki: Hit Counter Functionality/Websites / Privacy Sensitive Analytics



DEPRECATED: Support for this was disabled in websites/privacy-sensitive-analytics#21

Background

websites/privacy-sensitive-analytics#18 implemented a pixel-based endpoint so that hits could be counted without collecting information about the user's browser.


Usage

The calling page should embed the image endpoint:

<img src="https://[url]/count.gif">

Provided the browser sends a Referer header with the image request, the system records the domain and page that the image was embedded in.
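
As an illustration of how a Referer value might be reduced to a domain and page, here is a minimal Python sketch. It is not the project's implementation: the function name `referer_to_domain_and_page` is hypothetical, and discarding the query string is an assumption rather than documented behaviour.

```python
from urllib.parse import urlsplit


def referer_to_domain_and_page(referer):
    """Return (domain, page) for a Referer header value, or None if unusable."""
    if not referer:
        # Header missing or empty: nothing can be attributed.
        return None
    parts = urlsplit(referer)
    if not parts.hostname:
        # Not an absolute URL, so there is no domain to record.
        return None
    # Assumption: only the path is kept; query string and fragment are dropped.
    return parts.hostname, parts.path or "/"
```

A request carrying `Referer: https://www.example.com/posts/hello.html?utm=1` would be attributed to the domain `www.example.com` and the page `/posts/hello.html`; a request without the header yields `None`.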


Cardinality

Because the system records so little information about each request, there is a strong possibility that simultaneous requests will overwrite one another: two hits on the same page at the same timestamp are otherwise indistinguishable.

To counter this, the system generates a unique request ID from Nginx's internal variables:

ngx.var.connection, -- Nginx connection ID
ngx.var.connection_requests, -- how many requests have used this connection
ngx.var.pid -- pid of Nginx

This yields an ID of the form:

3316936-1-28791
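
The following Python sketch shows how the three values appear to combine into the ID above, and why a per-request ID stops points from colliding. It is only a model: the real ID is built inside Nginx, and the dictionary store, the `write_point` helper and the `request_id` tag name are all hypothetical stand-ins, on the assumption that the database keys points by timestamp and tag set.

```python
def make_request_id(connection, connection_requests, pid):
    """Join the three Nginx values with hyphens, in the order listed above."""
    return f"{connection}-{connection_requests}-{pid}"


def write_point(store, timestamp, tags, value):
    """Toy store: a point with the same timestamp and tags replaces the old one."""
    key = (timestamp, tuple(sorted(tags.items())))
    store[key] = value


store_without_id = {}
store_with_id = {}
timestamp = 1700000000
page_tags = {"domain": "www.example.com", "page": "/"}

# Two hits arrive at the same timestamp on different connections.
for connection in (3316936, 3316937):
    write_point(store_without_id, timestamp, page_tags, 1)
    request_id = make_request_id(connection, 1, 28791)
    write_point(store_with_id, timestamp, {**page_tags, "request_id": request_id}, 1)
```

After the loop, `store_without_id` holds a single point because the second hit replaced the first, while `store_with_id` holds both.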

Whilst this prevents points from overwriting one another, it also results in extremely high cardinality within the database.

This identifier should therefore be stripped when downsampling with aggregates.
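
To make the effect on cardinality concrete, the sketch below strips a per-request identifier from a set of points and sums what remains. It is a Python model rather than anything the system runs: the `strip_and_sum` helper and the `request_id` tag name are hypothetical.

```python
from collections import defaultdict


def strip_and_sum(points, id_key="request_id"):
    """Sum values per tag set after removing the per-request identifier."""
    totals = defaultdict(int)
    for tags, value in points:
        # Drop the identifier so hits on the same page share one key.
        remaining = tuple(sorted((k, v) for k, v in tags.items() if k != id_key))
        totals[remaining] += value
    return dict(totals)


raw_points = [
    ({"page": "/", "request_id": "3316936-1-28791"}, 1),
    ({"page": "/", "request_id": "3316937-1-28791"}, 1),
    ({"page": "/about", "request_id": "3316938-1-28791"}, 1),
]
hit_totals = strip_and_sum(raw_points)
```

Three distinct series (one per request) collapse into two, one per page, with the hit counts preserved as sums.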


Downsampling

The collected metrics can be downsampled with a simple Flux task:

option task = {
    name: "downsample_hitcounter",
    every: 15m,
    offset: 1m,
    concurrency: 1,
}

// Destination bucket and connection details
out_bucket = "websites/analytics"
host = "http://192.168.3.84:8086"
token = ""

sourcedata = from(bucket: "telegraf/autogen", host: host, token: token)
    |> range(start: -task.every)
    |> filter(fn: (r) => r._measurement == "pf_analytics_test_pixel")
    // Drop the sess column before aggregating
    |> drop(columns: ["sess"])
    |> aggregateWindow(every: 15m, fn: sum)
    // Rename the field and measurement for the downsampled series
    |> map(fn: (r) => ({r with
        _field: "hitcount",
        _measurement: "pf_analytics_pixel",
    }))
    |> drop(columns: ["_start", "_stop", "type"])
    |> to(bucket: out_bucket, host: host, token: token)
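
For readers less familiar with Flux, the windowed sum at the heart of the task above can be approximated in Python. This is a loose model only: the `downsample` helper is hypothetical, timestamps are plain epoch seconds, and stamping each total with the end of its window is assumed to match the default behaviour of `aggregateWindow`.

```python
from collections import defaultdict

WINDOW_SECONDS = 15 * 60  # matches every: 15m in the task


def downsample(points, window=WINDOW_SECONDS):
    """Sum hits per (window end, page); each window covers [start, start + window)."""
    totals = defaultdict(int)
    for timestamp, page, hits in points:
        # Align to the window boundary, then stamp with the end of the window.
        window_stop = timestamp - (timestamp % window) + window
        totals[(window_stop, page)] += hits
    return dict(totals)


hits = [(0, "/", 1), (899, "/", 1), (900, "/", 1)]
windowed = downsample(hits)
```

The first two hits fall into the window ending at 900 seconds, while the hit at exactly 900 seconds belongs to the next window, ending at 1800.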