
project-management-only/scraper-snitch-bot#4: Restrict disclosures to certain times of day

Issue Information

Issue Type: issue
Status: closed
Reported By: btasker
Assigned To: btasker

Milestone: v0.12
Created: 20-Jan-23 08:43


As part of the work looking into legal basis (project-management-only/scraper-snitch-bot#2), it's been identified that there's an additional control that can be put in place to help mitigate the impact of mistakes.

Toots and receipt publication should only happen during UK daytime.

This is to ensure that, if the bot makes a mistake, it doesn't go out at 0100 and remain unfixable until I wake up hours later.



assigned to @btasker

This is primarily a case of adjusting the crontab, but I also need to make sure that the time period passed into the main SQL query accounts for the gap in reporting.

What we don't want is something like this:

  • 2200: Query last 4 hours
  • 2300: Query last 4 hours
  • 0800: Query last 4 hours
  • 0900: Query last 4 hours

Because there'll be a significant window of time that isn't accounted for: in that example, requests made between 2300 and 0400 would never fall inside any query. The time period used in the query needs to be at least as long as the longest gap between runs. If query period is being adjusted, then the minimum number of requests may also need adjusting.
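A minimal sketch of that check, assuming the run times are known as a list of (hour, minute) pairs. The function names `max_gap_minutes` and `window_covers_gap` are hypothetical and not part of the bot; the schedule in the usage lines is an assumed hourly one between 0800 and 2300, matching the example above.

```python
def max_gap_minutes(run_times):
    """Return the longest gap, in minutes, between consecutive daily runs.

    run_times is an iterable of (hour, minute) tuples describing when the
    job runs each day. The gap that wraps over midnight is included.
    """
    minutes = sorted(h * 60 + m for h, m in run_times)
    if not minutes:
        raise ValueError("at least one run time is required")
    # Gaps between runs within the same day
    gaps = [later - earlier for earlier, later in zip(minutes, minutes[1:])]
    # Gap from the last run of the day to the first run of the next day
    gaps.append(minutes[0] + 24 * 60 - minutes[-1])
    return max(gaps)


def window_covers_gap(run_times, window_hours):
    """True if a query window of window_hours spans the longest gap."""
    return window_hours * 60 >= max_gap_minutes(run_times)


# Assumed schedule: hourly runs from 0800 to 2300 inclusive
hourly_daytime = [(hour, 0) for hour in range(8, 24)]
print(max_gap_minutes(hourly_daytime))        # 540 minutes, i.e. 9 hours
print(window_covers_gap(hourly_daytime, 4))   # False
print(window_covers_gap(hourly_daytime, 12))  # True
```

Any window shorter than the overnight gap fails the check, whatever the daytime run frequency is.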

The other, more complex, alternative is to move to a queue-based model: rather than tooting/publishing, the analysis bot would write details into a queue for a third process to collect. That third process would be restricted to daytime hours, whilst the analysis bot would just carry on about its business.

That does rather feel like over-engineering though.
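For comparison, a rough sketch of what the queue-based alternative might look like. Everything here is hypothetical: the in-memory `deque` stands in for whatever queue would really be used, `publish` is a placeholder callback, the 0900 to 2200 daytime bounds are an assumption, and `local_now` is assumed to already be in UK local time.

```python
from collections import deque
from datetime import datetime

# Assumed daytime bounds (inclusive hours, UK local time)
DAY_START_HOUR = 9
DAY_END_HOUR = 22


def is_daytime(local_now):
    """True if local_now (assumed to be UK local time) is within daytime."""
    return DAY_START_HOUR <= local_now.hour <= DAY_END_HOUR


def drain(queue, publish, local_now):
    """Publish everything in the queue, but only during daytime.

    Returns the number of items published. Outside daytime the queue is
    left untouched so items wait for the next daytime run.
    """
    if not is_daytime(local_now):
        return 0
    sent = 0
    while queue:
        publish(queue.popleft())
        sent += 1
    return sent


# The analysis bot would append at any hour; the publisher drains by day
pending = deque(["disclosure-1", "disclosure-2"])
published = []
drain(pending, published.append, datetime(2023, 1, 20, 1, 0))   # nothing sent
drain(pending, published.append, datetime(2023, 1, 20, 10, 0))  # both sent
```

The extra moving part is the queue itself, which needs to persist between runs, and that is where the over-engineering concern comes from.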

mentioned in issue #2

If query period is being adjusted, then the minimum number of requests may also need adjusting.

Looking at it, I think we can safely stretch the query period out as far as about 12 hours without encountering issues.

I'd like to do a few dry-run tests before actually doing that though

Have run some test queries using the wider threshold and there don't seem to be any negative ramifications - there aren't any bots which suddenly slip from having too few requests to having enough.
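The kind of dry-run comparison described above can be sketched as follows. This is not the bot's actual query: the input shape (a list of `(bot, timestamp)` pairs), the function names and the threshold values are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def flagged(requests, now, window_hours, min_requests):
    """Return the set of bots with at least min_requests in the window.

    requests is an iterable of (bot_name, timestamp) pairs.
    """
    cutoff = now - timedelta(hours=window_hours)
    counts = Counter(bot for bot, ts in requests if cutoff <= ts <= now)
    return {bot for bot, n in counts.items() if n >= min_requests}


def newly_flagged(requests, now, old_hours, new_hours, min_requests):
    """Bots that only reach the threshold once the window is widened."""
    wide = flagged(requests, now, new_hours, min_requests)
    narrow = flagged(requests, now, old_hours, min_requests)
    return wide - narrow


# Made-up sample data, purely to show the comparison
now = datetime(2023, 1, 20, 9, 20)
sample = (
    [("bot-a", now - timedelta(hours=1))] * 5
    + [("bot-b", now - timedelta(hours=2))] * 3
    + [("bot-b", now - timedelta(hours=10))] * 3
)
print(newly_flagged(sample, now, old_hours=4, new_hours=12, min_requests=5))
```

An empty result means widening the window does not push any extra bots over the threshold, so the minimum number of requests can stay as it is.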

Cron has been updated:

20 9-22 * * * $BASEPATH/crons/

The query period has been updated to 12 hours in the wrapper script, which covers the 11-hour overnight gap between the 22:20 and 09:20 runs.