I've been looking at indexing some GILS-originated pages.
The issue pages promote issue labels to being keywords.
Whilst that's generally quite useful, it does mean that there are a bunch of keywords that really aren't that useful, for example
Although I could adjust GILS to skip those, it's quite possible there are non-GILS pages out there which have similarly useless keywords.
I'd like to add a config file which can be populated with tag values - if we get a (case-insensitive) exact match, we should leave it out of the tags index.
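A minimal sketch of how that matching could work, not the actual implementation. It assumes the config file holds one tag per line and that blank lines and lines starting with `#` are skipped; the function names (`parse_ignored_tags`, `load_ignored_tags`, `filter_tags`) are hypothetical.

```python
import os


def parse_ignored_tags(lines):
    """Build a set of ignored tags from an iterable of config lines.

    Assumed format: one tag per line; blank lines and '#' comments are skipped.
    Values are casefolded so that lookups are case-insensitive.
    """
    ignored = set()
    for line in lines:
        tag = line.strip()
        if not tag or tag.startswith("#"):
            continue
        ignored.add(tag.casefold())
    return ignored


def load_ignored_tags(path):
    """Read the ignore list from disk; a missing file means nothing is ignored."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as fh:
        return parse_ignored_tags(fh)


def filter_tags(tags, ignored):
    """Return only the tags which are not an exact (case-insensitive) match."""
    return [t for t in tags if t.strip().casefold() not in ignored]


# Usage: only whole-tag matches are dropped, so "bugfix" survives
# even though "bug" is on the ignore list.
ignored = parse_ignored_tags(["Bug\n", "# a comment\n", "\n", "wontfix\n"])
kept = filter_tags(["BUG", "search", "WontFix", "bugfix"], ignored)
```

Using a set keeps each lookup constant-time, so the check adds very little to indexing however long the ignore list gets.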
Activity
06-Jan-24 18:38
assigned to @btasker
06-Jan-24 18:42
Applying this to indexing rather than crawling means that, if we change our mind, we can simply change config and reindex.
However, it does mean that rendered search results will still show the "blocked" tags (assuming the item matched on something else). Having them show up as tags might prove to be quite confusing, so it'd be worth thinking about whether we can prevent that without incurring too much overhead at search time.
For now, though, let's focus on the indexing alone.
06-Jan-24 18:53
mentioned in commit df46c15f495d37618eaedb35fccedb7a044a920b
Message
feat: don't index any tags listed in config/ignoretags.txt (utilities/file_location_listing#33)
06-Jan-24 19:28
Whilst building a list of tags to ignore, I've found this is actually much less useful than I had thought it was going to be. I've only actually blocklisted 8 tags.
Still, it's implemented now.