project Utilities / zepp_to_influxdb avatar

utilities/zepp_to_influxdb#5: Schema Review



Issue Information

Issue Type: issue
Status: closed
Reported By: btasker
Assigned To: btasker

Milestone: v0.31
Created: 05-Aug-23 13:39



Description

We've ended up adding a lot more fields than I originally anticipated

> SHOW FIELD KEYS FROM zepp
name: zepp
fieldKey                       fieldType
--------                       ---------
activity_duration_m            integer
calories                       integer
current_activity_type          string
current_activity_type_int      integer
current_sleep_state            string
current_sleep_state_int        integer
current_stress_level           integer
deep_sleep_min                 integer
distance_m                     integer
high_stress_time_perc          integer
last_sync                      integer
max_stress_level               integer
mean_stress_level              integer
medium_stress_time_perc        integer
minimum_stress_level           integer
normal_stress_time_perc        integer
odi_read                       float
recorded_REM_events            integer
recorded_activities            integer
recorded_awake_events          integer
recorded_deep_sleep_events     integer
recorded_fast_walking_events   integer
recorded_light_activity_events integer
recorded_light_sleep_events    integer
recorded_sleep_stages          integer
recorded_slow_walking_events   integer
relaxed_time_perc              integer
rem_sleep_min                  integer
score                          float
slept_from                     string
slept_to                       string
spo2_decrease                  float
step_goal                      integer
total_sleep_min                integer
total_steps                    integer

That's only going to grow with time.

Whilst documenting the fields I've spotted a few possible headaches, so it seems worth doing a schema review before publishing a write-up.



Toggle State Changes

Activity


assigned to @btasker

Non Specific Field Names

For the most part, it's possible to use the existing tagset to identify which category a series falls under - you could conceivable filter for r.activity_type == 'sleep' to get sleep related entries etc.

Where this breaks, though, is the trackers within the general stats:

  • recorded_{$activity_type}_events: the number of recorded activities of that type that day
  • recorded_{$stage}_events: the number of recorded instances of that sleep stage that day

The different sleep stages and activity types are dynamic, so it's not possible for a query to know which fields relate to sleep and which relate to activity.

Possible fixes

  • change the field names
  • Separate by tag
  • write into separate measurements

Variance in field names

The tracker points themselves use slightly different field names

  • current_sleep_state: string representation of the sleep type
  • current_sleep_state_int: Zepp's integer representation of the sleep type
  • current_activity_type
  • current_activity_type_int: Zepp's integer representation of the activity type

Whilst these are descriptive, it means we have 4 columns when we could instead just have two:

activity_type=sleep_stage_tracker  current_type="REM",current_type_int=8
activity_type=activity_type_tracker current_type="slow_walking",current_type_int=1

There's a similar concern with the general stats - recorded_sleep_stages and recorded_activities could similarly share a column name.

Fixes:

  • Rename columns

The big question, really, is whether we want to split things out across multiple measurements.

Rather than just having zepp we could have

  • zepp
  • zepp_activity
  • zepp_sleep
  • zepp_blood
  • zepp_stress

etc (with the zepp_ prefix perhaps being optional).

It'd definitely be better schema design, the question is whether it's worth the effort.

I only keep the data for about 7 days, having had a downsampling task normalise the data into a new platform agnostic measurement so it doesn't quite seem worth it.

verified

mentioned in commit 8d205919b2c37962dce0cfb7163a84f968737b8c

Commit: 8d205919b2c37962dce0cfb7163a84f968737b8c 
Author: B Tasker                            
                            
Date: 2023-08-05T14:56:11.000+01:00 

Message

Rename the sleep/activity stage counters to specify which category they relate to (for utilities/zepp_to_influxdb#5)

This moves both types from using field names of the form recorded_{type}_events to using one of

  • recorded_activity_{type}_events
  • recorded_sleep_{type}_events
+2 -2 (4 lines changed)

I'm going to leave the tracker field names alone - whilst it's technically less correct, it only gets rid of 2 columns and it doesn't seem worth having to update my downsampling jobs and dashboards for such a small gain.

mentioned in commit sysconfigs/downsample_configs@f07cb53f52444b386e7a603332438941031df403

Commit: sysconfigs/downsample_configs@f07cb53f52444b386e7a603332438941031df403 
Author: ben                            
                            
Date: 2023-08-05T16:23:13.000+01:00 

Message

Update downsample field matches to reflect changes in utilities/zepp_to_influxdb#5

+6 -4 (10 lines changed)

Just for completeness, here's how I ported my existing downsampled data over.

Copy data to the new columns

from(bucket: "health")
  |> range(start: -7d)
  |> filter(fn: (r) => r._measurement == "zepp")
  |> filter(fn: (r) => r._field == "recorded_light_sleep_events")
  |> set(key: "_field", value: "recorded_sleep_light_sleep_events")
  |> to(bucket: "health")

from(bucket: "health")
  |> range(start: -7d)
  |> filter(fn: (r) => r._measurement == "zepp")
  |> filter(fn: (r) => r._field == "recorded_deep_sleep_events")
  |> set(key: "_field", value: "recorded_sleep_deep_sleep_events")
  |> to(bucket: "health")

  from(bucket: "health")
  |> range(start: -7d)
  |> filter(fn: (r) => r._measurement == "zepp")
  |> filter(fn: (r) => r._field == "recorded_awake_events")
  |> set(key: "_field", value: "recorded_sleep_awake_events")
  |> to(bucket: "health")

  from(bucket: "health")
  |> range(start: -7d)
  |> filter(fn: (r) => r._measurement == "zepp")
  |> filter(fn: (r) => r._field == "recorded_REM_events")
  |> set(key: "_field", value: "recorded_sleep_REM_events")
  |> to(bucket: "health")

Delete the original columns

influx delete \
-b "health" \
--start 1970-01-01T00:00:00Z \
--stop $(date +"%Y-%m-%dT%H:%M:%SZ") \
--predicate '_measurement="zepp" and _field = "recorded_light_sleep_events"'


influx delete \
-b "health" \
--start 1970-01-01T00:00:00Z \
--stop $(date +"%Y-%m-%dT%H:%M:%SZ") \
--predicate '_measurement="zepp" and _field = "recorded_deep_sleep_events"'

influx delete \
-b "health" \
--start 1970-01-01T00:00:00Z \
--stop $(date +"%Y-%m-%dT%H:%M:%SZ") \
--predicate '_measurement="zepp" and _field = "recorded_awake_events"'

influx delete \
-b "health" \
--start 1970-01-01T00:00:00Z \
--stop $(date +"%Y-%m-%dT%H:%M:%SZ") \
--predicate '_measurement="zepp" and _field = "recorded_REM_events"'

mentioned in issue #7