
utilities/file_location_listing#23: Caching Support



Issue Information

Issue Type: issue
Status: closed
Reported By: btasker
Assigned To: btasker

Milestone: backlog
Created: 30-Dec-23 18:12



Description

Raising in order to close #22

It'll probably be worth revisiting whether we want to add a caching layer to storage.




Activity


assigned to @btasker

mentioned in issue #22

If we were to implement this, I think we'd probably want to take a fairly simple approach to invalidation:

  • Cache loaded files (in redis, maybe?)
  • If the index is re-read, flush the cache or set some kind of revalidation flag

A revalidation flag could work in much the same way as we check for index changes: has the reported mtime changed? If it hasn't, then the cached item can be considered revalidated.
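As a rough illustration of the revalidation idea, here's a minimal sketch. It uses an in-memory dict as a stand-in for the shared cache, and `RevalidatingCache`, `CachedEntry` and the `loader` callable are hypothetical names for illustration, not anything in the codebase. An entry is only reloaded when the mtime reported for it differs from the one stored alongside the cached copy, so nothing else in the cache gets thrown away.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CachedEntry:
    mtime: float   # mtime reported when the entry was cached
    content: str   # the cached file content


class RevalidatingCache:
    """In-memory stand-in for a shared cache; all names here are hypothetical."""

    def __init__(self, loader: Callable[[str], str]) -> None:
        self._loader = loader   # reads the item from storage on a miss
        self._entries: Dict[str, CachedEntry] = {}
        self.loads = 0          # counts trips to storage, for illustration only

    def get(self, key: str, reported_mtime: float) -> str:
        entry = self._entries.get(key)
        if entry is not None and entry.mtime == reported_mtime:
            # mtime unchanged: treat the cached item as revalidated
            return entry.content
        # Missing or stale: reload and replace this one entry only
        self.loads += 1
        content = self._loader(key)
        self._entries[key] = CachedEntry(reported_mtime, content)
        return content
```

Because staleness is decided per entry, a pod coming up (or re-reading the index) wouldn't need to flush anything that other pods are still relying on.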

The problem with the full-flush option is that, bearing in mind we're running in k8s, there might be multiple pods accessing the same shared cache. You don't really want pods blowing the cache away every time they come up (or load the index).

All that said, I think it'd probably be better to first look at whether we can speed file access up at all - caching only helps on subsequent accesses (which, if we're returning results well, should be quite rare).

Maybe I'm overthinking it though - we could start by just using functools.lru_cache() and seeing if that helps.
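For reference, the simple version might look something like the sketch below. `read_stored_file` is a hypothetical loader used for illustration (the real storage functions will differ) and the `maxsize` value is an arbitrary assumption.

```python
import functools
from pathlib import Path


@functools.lru_cache(maxsize=256)
def read_stored_file(path: str) -> str:
    """Hypothetical loader: read a stored file, caching the result by path."""
    return Path(path).read_text(encoding="utf-8")
```

Calling `read_stored_file.cache_info()` reports hits and misses, and `read_stored_file.cache_clear()` empties the cache. Two caveats: the cache lives inside the process, so each pod would hold its own copy, and it won't notice a file changing on disk unless it's cleared.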


mentioned in commit 207673c034ebc8f2a75ae93dd0a803facaf4a810

Commit: 207673c034ebc8f2a75ae93dd0a803facaf4a810
Author: B Tasker
Date: 2024-01-07T15:58:35.000+00:00

Message

feat: cache files read from disk (utilities/file_location_listing#23)

+5 -1 (6 lines changed)

Even with my (currently) small test database, this makes a significant difference: the first query takes 358ms whilst the second takes 7ms, even though the second search is run with a shortened term (and so matches more candidates).

I've also added caching to processSearchTerm(). The search portal submits a second search (to get related images), and the term processing for that will be exactly the same, so there's no point wasting CPU time recomputing filters from it.
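One thing worth keeping in mind when caching term processing this way: the cached object is handed back to every caller, so it's safer for it to be immutable. The sketch below uses a hypothetical stand-in (`process_search_term_sketch`, which just lower-cases and splits the term) rather than the real processSearchTerm(), whose signature and logic aren't shown here.

```python
import functools
from typing import Tuple


@functools.lru_cache(maxsize=128)
def process_search_term_sketch(term: str) -> Tuple[str, ...]:
    """Hypothetical stand-in: derive filter words from a search term."""
    # Return a tuple rather than a list: the same object is returned on
    # every cache hit, so callers must not be able to mutate it.
    return tuple(term.lower().split())
```

A repeated call with the same term is served from the cache, which matches the case where the portal's follow-up search reuses the same terms.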

It seems OK so far, so I'm going to close this issue out (so we can do a release) and treat anything that follows as a bug.