
utilities/file_location_listing#33: Ignore specific keywords/tags when indexing



Issue Information

Issue Type: issue
Status: closed
Reported By: btasker
Assigned To: btasker

Milestone: v0.2.2
Created: 06-Jan-24 18:38



Description

I've been looking at indexing some GILS-originated pages.

The issue pages promote issue labels to being keywords.

Whilst that's generally quite useful, it does mean that there are a bunch of keywords that really aren't that useful, for example:

  • Fixed/Done
  • Task

Although I could adjust GILS to skip those, it's quite possible there are non-GILS pages out there which have similarly useless keywords.

I'd like to add a config file which can be populated with tag values - if we get a (case-insensitive) exact match, we should leave it out of the tags index.
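
As a rough illustration of the matching rule described above, here's a minimal sketch. It assumes the config file holds one tag per line (the format isn't specified in this issue), and the function names `load_ignored_tags` and `filter_tags` are hypothetical rather than taken from the codebase.

```python
from pathlib import Path


def load_ignored_tags(path):
    """Read the ignore list, returning a set of lowercased tag values.

    Assumes one tag per line; blank lines are skipped.
    """
    p = Path(path)
    if not p.is_file():
        # No config file present, so nothing is ignored
        return set()

    ignored = set()
    for line in p.read_text(encoding="utf-8").splitlines():
        tag = line.strip().lower()
        if tag:
            ignored.add(tag)
    return ignored


def filter_tags(tags, ignored):
    """Drop tags that exactly match an ignored value, ignoring case."""
    return [t for t in tags if t.strip().lower() not in ignored]


if __name__ == "__main__":
    ignored = {"fixed/done", "task"}
    # "Tasks" survives because only exact matches are dropped
    print(filter_tags(["Task", "indexing", "FIXED/DONE", "Tasks"], ignored))
    # prints ['indexing', 'Tasks']
```

Lowercasing both sides once and holding the ignore list in a set keeps the per-tag check cheap, and an exact match means a tag like "Tasks" isn't caught by an entry for "Task".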



Activity


assigned to @btasker

Applying this to indexing rather than crawling means that, if we change our mind, we can simply change config and reindex.

However, it does mean that rendered search results will still show the "blocked" tags (assuming the item matched on something else). Having them show up as tags might prove to be quite confusing, so it'd be worth thinking about whether we can prevent that without incurring too much overhead at search time.

For now, though, let's focus on the indexing alone.

verified

mentioned in commit df46c15f495d37618eaedb35fccedb7a044a920b

Commit: df46c15f495d37618eaedb35fccedb7a044a920b
Author: B Tasker
Date: 2024-01-06T18:52:29.000+00:00

Message

feat: don't index any tags listed in config/ignoretags.txt (utilities/file_location_listing#33)

+19 -0 (19 lines changed)

Whilst building a list of tags to ignore, I've found this is actually much less useful than I had thought it was going to be. I've only actually blocklisted 8 tags.

Still, it's implemented now.