
utilities/python_influxdb_downsample#12: Implement conditional short-circuit for first and last aggregates



Issue Information

Issue Type: issue
Status: closed
Reported By: btasker
Assigned To: btasker

Milestone: vnext
Created: 06-Feb-23 09:01



Description

Currently, aggregates work by building a list of values which is then operated on. That list has an obvious memory cost.

Whilst this processing is unavoidable for aggregates like mean, there are circumstances where it is wasted effort.

Neither first nor last needs a list of all values (in fact, first doesn't even need to iterate over all of them).

If they're used in conjunction with other aggregates then we'll still need to build a list of values, but if they're used alone it'd be good if we could avoid that processing.

Last can be tracked quite simply:

for foo in bar:
    last_val = foo['_value']

first is even easier:

for foo in bar:
    if num_aggregates == 1:
        # first is the only aggregate: take one record and stop
        first = foo['_value']
        break

The difficulty comes in detecting it - we'll probably want to populate some kind of state when loading config.
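A minimal sketch of how that state might be populated at config load time. The config layout here (a per-job `aggregates` mapping of name to enabled flag) and the names `count_aggregates`, `num_aggregates` and `only_aggregate` are assumptions for illustration, not the project's actual schema:

```python
def count_aggregates(job):
    """Count the enabled aggregates for one job and record the result.

    The derived values are stored under job["vars"] so they stay out of the
    user-facing config namespace. The "aggregates" layout is hypothetical.
    """
    aggregates = job.get("aggregates", {})
    active = [name for name, enabled in aggregates.items() if enabled]

    job.setdefault("vars", {})
    job["vars"]["num_aggregates"] = len(active)
    # Only set when exactly one aggregate is active, so the main loop can
    # decide whether a short-circuit is safe.
    job["vars"]["only_aggregate"] = active[0] if len(active) == 1 else None
    return job


job = count_aggregates({"name": "demo", "aggregates": {"first": True, "mean": False}})
print(job["vars"])
```

Calculating this once when the config is loaded means the per-record loop only has to check a precomputed value.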




Activity


assigned to @btasker

verified

mentioned in commit abe22c46e937d25a2d0b1a13652694dab522edd7

Commit: abe22c46e937d25a2d0b1a13652694dab522edd7
Author: B Tasker
Date: 2023-02-06T09:03:42.000+00:00

Message

feat: Count aggregates when loading config

This could be considered prep for utilities/python_influxdb_downsample#12

It also introduces a new configuration attribute: vars

This provides an area that we can push auto-calculated variables into without polluting the outer config namespace.

It's not intended to be human-editable and so shouldn't appear in the config file itself

+6 -1 (7 lines changed)
verified

mentioned in commit 90fa038e8a97cdf5f977790610bea4da94c9df54

Commit: 90fa038e8a97cdf5f977790610bea4da94c9df54
Author: B Tasker
Date: 2023-02-06T09:22:25.000+00:00

Message

Short-circuit iteration if first is the only active aggregator utilities/python_influxdb_downsample#12

If first is the only active aggregator we'll only process the first record in each table, knowing the rest will be discarded anyway

+15 -5 (20 lines changed)
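A rough sketch of what the first-only short-circuit amounts to, not the code from the commit itself. Tables are modelled as plain lists of dicts carrying a `_value` key, and the function name `first_values` is hypothetical:

```python
def first_values(tables):
    """Return the first _value from each table, reading one record per table."""
    results = []
    for table in tables:
        for record in table:
            results.append(record["_value"])
            # Every later record would be discarded, so stop iterating here.
            break
    return results


tables = [
    [{"_value": 1}, {"_value": 2}, {"_value": 3}],
    [{"_value": 7}],
    [],  # an empty table contributes nothing
]
print(first_values(tables))
```

No list of values is built for any table, and iteration stops after a single record.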
verified

mentioned in commit 9e543c986ddbbf2626e74392edb14a2c7a50120b

Commit: 9e543c986ddbbf2626e74392edb14a2c7a50120b
Author: B Tasker
Date: 2023-02-06T09:25:18.000+00:00

Message

Add short-circuit if last is the only active aggregator utilities/python_influxdb_downsample#12

+9 -0 (9 lines changed)
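A comparable sketch for the last-only case, again under the assumption that tables are plain lists of dicts with a `_value` key (`last_values` is a hypothetical name, not taken from the commit):

```python
def last_values(tables):
    """Return the last _value from each table without storing the others."""
    results = []
    for table in tables:
        last_val = None
        seen_any = False
        for record in table:
            # Overwrite on every pass; only the final assignment survives.
            last_val = record["_value"]
            seen_any = True
        if seen_any:
            results.append(last_val)
    return results


tables = [
    [{"_value": 1}, {"_value": 2}, {"_value": 3}],
    [{"_value": 7}],
    [],
]
print(last_values(tables))
```

Unlike first, every record still has to be visited, but memory use stays constant per table because no list is accumulated.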

This was simpler than expected.

One thing I wondered was whether we want to allow for complex usage too.

For example, if first and last are used together, we can still make efficiency savings in the processing of last (but not first, which can no longer stop after the first record).

Doing so, though, means adding quite a lot of complexity to the loop: because of the way the short-circuit is currently implemented, we'd need additional tracking of the first value for this use-case.

Having both first and last active (and without other aggregates) should be pretty rare, so it doesn't seem worth introducing overhead to cater to it.
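For illustration only, the extra tracking that the combined case would need might look something like this. It is a sketch of the option that was decided against, with a hypothetical function name and records modelled as dicts with a `_value` key:

```python
def first_and_last(records):
    """Track both the first and last _value in one pass, without a list."""
    first_val = None
    last_val = None
    have_first = False
    for record in records:
        if not have_first:
            # Extra per-record check that the single-aggregate paths avoid.
            first_val = record["_value"]
            have_first = True
        last_val = record["_value"]
    return first_val, last_val


print(first_and_last([{"_value": 4}, {"_value": 5}, {"_value": 6}]))
```

The additional flag and branch run for every record, which is the overhead weighed against how rarely this combination is expected.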