
utilities/python_influxdb_downsample#15: Switch to Batched Writes



Issue Information

Issue Type: issue
Status: closed
Reported By: btasker
Assigned To: btasker

Milestone: vnext
Created: 07-Feb-23 17:25



Description

If a particularly large downsample is generated, InfluxDB OSS 1.8.10 may reject the write with a 413:

2023-02-07 17:14:03.154564: Writing to output home1x failed: (413)
Reason: Request Entity Too Large
HTTP response headers: HTTPHeaderDict({'Content-Type': 'application/json', 'Request-Id': 'd19a7440-a70a-11ed-8867-0242ac130002', 'X-Influxdb-Build': 'OSS', 'X-Influxdb-Error': 'Request Entity Too Large', 'X-Influxdb-Version': '1.8.10', 'X-Request-Id': 'd19a7440-a70a-11ed-8867-0242ac130002', 'Date': 'Tue, 07 Feb 2023 17:14:02 GMT', 'Content-Length': '37', 'Connection': 'close'})
HTTP response body: {"error":"Request Entity Too Large"}



Activity


assigned to @btasker

It probably just needs the max-body-size setting raised in InfluxDB's config, but it'd be good to look at whether we can provide a setting to split output batches if necessary.
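A minimal sketch of what splitting output batches could look like, assuming the downsampled points are already available as line protocol strings. The function name `split_batches` and both limits are hypothetical and only illustrate the idea; they are not taken from the project.

```python
def split_batches(lines, max_bytes=1_000_000, max_lines=5000):
    """Yield lists of line protocol strings that stay under both limits.

    max_bytes and max_lines are illustrative values, not project settings.
    A single line larger than max_bytes is still emitted, in a batch of its own.
    """
    batch, size = [], 0
    for line in lines:
        # +1 accounts for the newline that joins lines in the request body
        line_size = len(line.encode("utf-8")) + 1
        if batch and (size + line_size > max_bytes or len(batch) >= max_lines):
            yield batch
            batch, size = [], 0
        batch.append(line)
        size += line_size
    if batch:
        yield batch


# Example: 12 made-up points, capped at 5 lines per batch
points = [f"cpu,host=h{i} usage={i}i {1675790000 + i}000000000" for i in range(12)]
batches = list(split_batches(points, max_bytes=10_000, max_lines=5))
```

Each batch could then be written in its own request, so that no single request body exceeds whatever max-body-size the server enforces.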

It'll interfere with the change made in #9, but the simplest way to implement this would probably be to move the InfluxDB client to using batch mode.

Doing that would also have the benefit that we move onto processing the next job that bit sooner (which is a bit of a double-edged sword: it'll also increase resource consumption, as we may end up holding multiple datasets, or parts of datasets, in RAM).
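For reference, a rough sketch of batch mode, assuming the writes go through the `influxdb-client` Python package (the error format in the description suggests so, but that is an assumption). The helper name `open_batching_writer` and all the values in `BATCH_SETTINGS` are hypothetical.

```python
# Hypothetical values: tune batch_size to stay under the server's max-body-size.
BATCH_SETTINGS = {
    "batch_size": 5000,        # points per write request
    "flush_interval": 10_000,  # ms before a partially filled batch is flushed
    "jitter_interval": 2_000,  # ms of random jitter added to each flush
    "retry_interval": 5_000,   # ms to wait before the first retry
    "max_retries": 5,          # attempts before the batch is given up on
}


def open_batching_writer(url, token, org):
    """Return (client, write_api) with the write API in batching mode."""
    # Imported lazily so the settings above can be inspected without the package
    from influxdb_client import InfluxDBClient, WriteOptions

    client = InfluxDBClient(url=url, token=token, org=org)
    write_api = client.write_api(write_options=WriteOptions(**BATCH_SETTINGS))
    return client, write_api


# Usage (not executed here, needs a reachable InfluxDB):
#   client, write_api = open_batching_writer(url, token, org)
#   write_api.write(bucket="mybucket", record=lines)  # returns without waiting for the HTTP write
#   write_api.close()  # flushes anything still buffered
#   client.close()
```

In batching mode `write()` only queues the points, which is what lets the caller move on to the next query. It is also why the buffered data stays in RAM until it is flushed.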


mentioned in commit 9c552984f00c5e613ade2dddaac1bbee174d2a15

Commit: 9c552984f00c5e613ade2dddaac1bbee174d2a15 
Author: B Tasker
Date: 2023-02-07T19:27:24.000+00:00 

Message

Switch upstream writes to batching mode utilities/python_influxdb_downsample#15

This brings a number of benefits

  • We get the benefits of retries when failures happen
  • We won't write massive chunks to upstream in one chunk
  • We can move onto querying the next job (often the longest part)
+37 -5 (42 lines changed)

mentioned in commit c687c8de738b62501d810913b6f10722a65826aa

Commit: c687c8de738b62501d810913b6f10722a65826aa 
Author: B Tasker
Date: 2023-02-07T19:49:46.000+00:00 

Message

Adjust recordFailures so that we log when a batch write failed. utilities/python_influxdb_downsample#15

It's no longer possible for us to write data to disk for re-ingest later - although the data is available to us, we no longer have access to details about which output it was being written into etc.

+13 -33 (46 lines changed)
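A sketch of logging a failed batch, assuming the `influxdb-client` package's batching callbacks are in use. The library calls an `error_callback` with `(conf, data, exception)`, where `conf` is a `(bucket, org, precision)` tuple. The function and logger names below are hypothetical and this is not the project's `recordFailures`.

```python
import logging

logger = logging.getLogger("downsample.writes")  # hypothetical logger name


def log_failed_batch(conf, data, exception):
    """Log a summary of a batch that could not be written.

    conf identifies the bucket, org and precision only; it does not say
    which configured output the batch was destined for.
    """
    bucket, org, precision = conf
    if isinstance(data, bytes):
        data = data.decode("utf-8", errors="replace")
    line_count = len(data.splitlines())
    message = (
        f"Batch write failed: bucket={bucket} org={org} "
        f"precision={precision} lines={line_count} error={exception}"
    )
    logger.error(message)
    return message


# Registration (not executed here):
#   write_api = client.write_api(write_options=..., error_callback=log_failed_batch)
```

The callback does receive the failed payload in `data`, but only alongside the bucket, org and precision, which is consistent with the limitation described in the commit message.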

I'm going to relabel this issue and update its description. Although big writes failing were the cause, it's probably more accurate to title this "switch to batching writes".

changed title from {-Big writes to Influx 1.x fail-} to {+Switch to Batched Writes+}

mentioned in issue #11

mentioned in issue #18