#5 Schema Review : utilities/zepp_to

btasker Permalink
05-Aug-23 13:39

assigned to @btasker

btasker Permalink
05-Aug-23 13:50

Non Specific Field Names

For the most part, it's possible to use the existing tagset to identify which category a series falls under - you could conceivable filter for r.activity_type == 'sleep' to get sleep related entries etc.

Where this breaks, though, is the trackers within the general stats:

recorded_{$activity_type}_events: the number of recorded activities of that type that day
recorded_{$stage}_events: the number of recorded instances of that sleep stage that day

The different sleep stages and activity types are dynamic, so it's not possible for a query to know which fields relate to sleep and which relate to activity.

Possible fixes

change the field names
Separate by tag
write into separate measurements

Variance in field names

The tracker points themselves use slightly different field names

current_sleep_state: string representation of the sleep type
current_sleep_state_int: Zepp's integer representation of the sleep type
current_activity_type
current_activity_type_int: Zepp's integer representation of the activity type

Whilst these are descriptive, it means we have 4 columns when we could instead just have two:

activity_type=sleep_stage_tracker  current_type="REM",current_type_int=8
activity_type=activity_type_tracker current_type="slow_walking",current_type_int=1

There's a similar concern with the general stats - recorded_sleep_stages and recorded_activities could similarly share a column name.

Fixes:

Rename columns

btasker Permalink
05-Aug-23 13:53

The big question, really, is whether we want to split things out across multiple measurements.

Rather than just having zepp we could have

zepp
zepp_activity
zepp_sleep
zepp_blood
zepp_stress

etc (with the zepp_ prefix perhaps being optional).

It'd definitely be better schema design, the question is whether it's worth the effort.

I only keep the data for about 7 days, having had a downsampling task normalise the data into a new platform agnostic measurement so it doesn't quite seem worth it.

btasker Permalink
05-Aug-23 13:57

verified

mentioned in commit 8d205919b2c37962dce0cfb7163a84f968737b8c

Commit: 8d205919b2c37962dce0cfb7163a84f968737b8c 
Author: B Tasker                            
                            
Date: 2023-08-05T14:56:11.000+01:00

Message

Rename the sleep/activity stage counters to specify which category they relate to (for utilities/zepp_to_influxdb#5)

This moves both types from using field names of the form recorded_{type}_events to using one of

recorded_activity_{type}_events
recorded_sleep_{type}_events

+2 -2 (4 lines changed)

btasker Permalink
05-Aug-23 13:58

I'm going to leave the tracker field names alone - whilst it's technically less correct, it only gets rid of 2 columns and it doesn't seem worth having to update my downsampling jobs and dashboards for such a small gain.

sysconfiguser Permalink
05-Aug-23 15:23

mentioned in commit sysconfigs/downsample_configs@f07cb53f52444b386e7a603332438941031df403

Commit: sysconfigs/downsample_configs@f07cb53f52444b386e7a603332438941031df403 
Author: ben                            
                            
Date: 2023-08-05T16:23:13.000+01:00

Message

Update downsample field matches to reflect changes in utilities/zepp_to_influxdb#5

+6 -4 (10 lines changed)

btasker Permalink
05-Aug-23 15:33

Just for completeness, here's how I ported my existing downsampled data over.

Copy data to the new columns

from(bucket: "health")
  |> range(start: -7d)
  |> filter(fn: (r) => r._measurement == "zepp")
  |> filter(fn: (r) => r._field == "recorded_light_sleep_events")
  |> set(key: "_field", value: "recorded_sleep_light_sleep_events")
  |> to(bucket: "health")

from(bucket: "health")
  |> range(start: -7d)
  |> filter(fn: (r) => r._measurement == "zepp")
  |> filter(fn: (r) => r._field == "recorded_deep_sleep_events")
  |> set(key: "_field", value: "recorded_sleep_deep_sleep_events")
  |> to(bucket: "health")

  from(bucket: "health")
  |> range(start: -7d)
  |> filter(fn: (r) => r._measurement == "zepp")
  |> filter(fn: (r) => r._field == "recorded_awake_events")
  |> set(key: "_field", value: "recorded_sleep_awake_events")
  |> to(bucket: "health")

  from(bucket: "health")
  |> range(start: -7d)
  |> filter(fn: (r) => r._measurement == "zepp")
  |> filter(fn: (r) => r._field == "recorded_REM_events")
  |> set(key: "_field", value: "recorded_sleep_REM_events")
  |> to(bucket: "health")

Delete the original columns

influx delete \
-b "health" \
--start 1970-01-01T00:00:00Z \
--stop $(date +"%Y-%m-%dT%H:%M:%SZ") \
--predicate '_measurement="zepp" and _field = "recorded_light_sleep_events"'


influx delete \
-b "health" \
--start 1970-01-01T00:00:00Z \
--stop $(date +"%Y-%m-%dT%H:%M:%SZ") \
--predicate '_measurement="zepp" and _field = "recorded_deep_sleep_events"'

influx delete \
-b "health" \
--start 1970-01-01T00:00:00Z \
--stop $(date +"%Y-%m-%dT%H:%M:%SZ") \
--predicate '_measurement="zepp" and _field = "recorded_awake_events"'

influx delete \
-b "health" \
--start 1970-01-01T00:00:00Z \
--stop $(date +"%Y-%m-%dT%H:%M:%SZ") \
--predicate '_measurement="zepp" and _field = "recorded_REM_events"'

btasker Permalink
08-Aug-23 16:54

mentioned in issue #7

utilities/zepp_to_influxdb#5: Schema Review

Issue Information

Activity