We currently apply search criteria in the following order:
1. checkTerms()
2. checkTerms()
3. _checkConstraints()
4. _checkConstraints()
5. _checkConstraints()
6. _checkConstraints()
7. _checkConstraints()
Although logical, this is potentially less efficient than it could be.
Most of the constraints checked by _checkConstraints() perform simple logical comparisons, and each of them performs a single match.
The same cannot be said for the constraints checked in checkTerms(): if multiple search terms have been provided, it runs one substring search per term.
So, if we take the following search string:
foo bar domain:www.somedomain.invalid
And assume we've indexed the following URLs:
https://www.somedomain.invalid/foo/bar.html
https://sub1.somedomain.invalid/foo/blah/bar.html
https://sub2.somedomain.invalid/foo/blah/bar.html
We'll see the following operations run for each URL:
if foo in srchterm (via checkTerms())
if bar in srchterm (via checkTerms())
if domain == www.somedomain.invalid (via _checkConstraints())
Whatever order we do things in, the number of operations applied to https://www.somedomain.invalid/foo/bar.html will always be 3, because it matches every criterion.
However, we've also performed 3 operations against each of https://sub1.somedomain.invalid/foo/blah/bar.html and https://sub2.somedomain.invalid/foo/blah/bar.html, when both could have been excluded with a single cheap comparison.
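To make the counting concrete, here is a small sketch that applies the checks in both orders and stops at the first failed criterion. It assumes the real code short-circuits in the same way; term_checks(), domain_check() and count_ops() are hypothetical helpers, and parsing the hostname with urlparse() just stands in for however the index actually stores the domain.

```python
from urllib.parse import urlparse

# Example data taken from the search above
URLS = [
    "https://www.somedomain.invalid/foo/bar.html",
    "https://sub1.somedomain.invalid/foo/blah/bar.html",
    "https://sub2.somedomain.invalid/foo/blah/bar.html",
]
TERMS = ["foo", "bar"]
DOMAIN = "www.somedomain.invalid"


def term_checks(terms):
    # One substring search per term; t=t binds each term when the lambda is created
    return [lambda url, t=t: t in url for t in terms]


def domain_check(domain):
    # A single equality test (hostname parsed here purely for illustration)
    return lambda url: urlparse(url).hostname == domain


def count_ops(url, checks):
    """Run checks in order, stop at the first failure, return how many ran."""
    ops = 0
    for check in checks:
        ops += 1
        if not check(url):
            break
    return ops


current = term_checks(TERMS) + [domain_check(DOMAIN)]
reordered = [domain_check(DOMAIN)] + term_checks(TERMS)

for url in URLS:
    print(url, count_ops(url, current), count_ops(url, reordered))
```

For the three URLs above this reports 3/3, 3/1 and 3/1 operations: 9 checks in total with the current order versus 5 when the domain comparison goes first.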
Activity
01-Jan-24 12:54
assigned to @btasker
01-Jan-24 13:08
I was curious what the performance difference was, so I knocked together a quick script to time applying a few functions to a list of 1 million strings:
It's far from scientific, but it works as a rough approximation.
They're all pretty fast, but there are clear differences.
The numbers obviously fluctuate between runs but, as expected, == is always quicker than in.
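The original script isn't included here. As a rough sketch of that kind of measurement (not the author's script), the following uses timeit to time each kind of check against a list of generated strings. The string contents, the exact checks (including a split()-based extension check) and the function names are all assumptions.

```python
import timeit


def make_strings(n):
    # Hypothetical URL-like test data, not real index entries
    return [f"https://www.somedomain.invalid/{i}/page.html" for i in range(n)]


def bench(n=1_000_000, repeat=3):
    """Return the best-of-`repeat` time (seconds) for applying each check to n strings."""
    strings = make_strings(n)
    target = strings[n // 2]
    checks = {
        "==": lambda s: s == target,
        "in": lambda s: "somedomain" in s,
        "startswith": lambda s: s.startswith("https://www."),
        "split": lambda s: s.split(".")[-1] == "html",
    }
    results = {}
    for name, fn in checks.items():
        # timeit.repeat() runs immediately, so the closure sees the current fn
        runs = timeit.repeat(lambda: [fn(s) for s in strings], number=1, repeat=repeat)
        results[name] = min(runs)
    return results


if __name__ == "__main__":
    # Fastest first; absolute numbers vary between machines and runs
    for name, secs in sorted(bench().items(), key=lambda kv: kv[1]):
        print(f"{name:>10}: {secs:.3f}s")
```

Taking the minimum of several runs reduces (but doesn't remove) noise from other activity on the machine, which fits the "rough approximation" framing.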
01-Jan-24 13:15
I'm somewhat surprised at how slow startswith is in comparison to in. I thought that maybe it was because we were checking for quite a long prefix, but adjusting to test startswith("1") doesn't really change the numbers.
We use startswith() for the prefix dork, so we want to make sure that _checkConstraints() applies that after the other constraints.
Extension checks use split(), which is significantly more expensive, so we definitely want that constraint applied last (and, actually, probably want it applied after checkTerms() has been called).
01-Jan-24 13:19
mentioned in commit e511159b5cbbdddf4ba9241d3bc664203bc443b1
Message
fix: optimise order in which constraints are applied (utilities/file_location_listing#30)
01-Jan-24 13:21
The order of application has been updated:
_checkConstraints() has also been updated to no longer trigger ext constraints, and to trigger prefix constraints last.
Where dorks have been used, this should allow us to exclude results as cheaply as possible.
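A reordered flow along those lines might look roughly like the sketch below. It assumes the new order runs _checkConstraints() first, then checkTerms(), then the ext check. The function names, the dict of constraints, parsing the domain from the URL and treating prefix as a plain URL prefix are all hypothetical, not the project's actual code.

```python
from urllib.parse import urlparse


def check_constraints(url, constraints):
    """Hypothetical stand-in for _checkConstraints(): cheap comparisons first, prefix last, no ext."""
    domain = constraints.get("domain")
    if domain is not None and urlparse(url).hostname != domain:
        return False
    # startswith() measured slower than == and in, so it runs after the other constraints
    prefix = constraints.get("prefix")
    if prefix is not None and not url.startswith(prefix):
        return False
    return True


def check_terms(url, terms):
    """Hypothetical stand-in for checkTerms(): one substring search per term, stopping at the first miss."""
    return all(term in url for term in terms)


def check_ext(url, ext):
    """split()-based extension check: the most expensive, so it is applied last."""
    return url.split("/")[-1].split(".")[-1] == ext


def matches(url, terms, constraints):
    # `and` short-circuits, so the costlier checks only run if the cheaper ones pass
    return (
        check_constraints(url, constraints)
        and check_terms(url, terms)
        and ("ext" not in constraints or check_ext(url, constraints["ext"]))
    )
```

With this shape, a non-matching domain rejects a URL after one comparison, and the split() work is only done for URLs that have already passed everything else.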