The spider supports indexing markdown files (introduced in utilities/file_location_listing#5)
However, it relies on the file title being the first line, i.e.
# My file title
blah blah
I currently use Obsidian to manage notes and I'd quite like to be able to add some YAML frontmatter to my templates (because I can then query it with the dataview plugin).
---
category: foo
somevar: somevalue
---
# My file title
blah blah
However, if I were to do this, the indexer would stop collecting titles from these files, because it expects the title to be on the first line and checks that it looks like a heading:
# The title should be on the first line
if lines[0].startswith('#') or lines[1].startswith("---") or lines[1].startswith("==="):
    title = lines[0].strip("#").strip()
I'd like to update the logic so that frontmatter can be handled (it'd be an added bonus if we could collect it into storage, but the most pressing need is to make sure it doesn't break anything)
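As a rough illustration of the change being asked for, the frontmatter block could be popped off before the existing first-line check runs. This is a minimal sketch rather than the project's actual code: `split_frontmatter` and `find_title` are hypothetical names, and it assumes frontmatter only counts when the very first line is `---` and a closing `---` follows.

```python
def split_frontmatter(text):
    """Return (frontmatter_lines, body_lines) for a markdown document.

    Frontmatter is only recognised when the first line is '---' and a
    closing '---' line follows; otherwise the document is left untouched.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return [], lines
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return lines[1:i], lines[i + 1:]
    # No closing delimiter found: treat everything as body
    return [], lines


def find_title(body_lines):
    """Apply the 'title is on the first line' heuristic to the body only."""
    lines = list(body_lines)
    # Drop any blank lines left behind after the frontmatter block
    while lines and not lines[0].strip():
        lines.pop(0)
    if not lines:
        return None
    # Setext-style headings are underlined with --- or === on the next line
    underlined = len(lines) > 1 and lines[1].startswith(("---", "==="))
    if lines[0].startswith("#") or underlined:
        return lines[0].strip("#").strip()
    return None
```

Splitting first means the title heuristic never sees the opening `---`, so documents without frontmatter behave exactly as before.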
Activity
22-Aug-24 09:07
assigned to @btasker
22-Aug-24 13:10
mentioned in commit 4d6f1d3186afda7530cd1a9dc0c3835c570c6ce1
Message
feat: collect YAML frontmatter if it exists (utilities/file_location_listing#65)
22-Aug-24 13:10
mentioned in commit 36e8227c4d91f6acedda5f9f5be3e8a376609871
Message
feat: Markdown parser returns frontmatter info (utilities/file_location_listing#65)
Note: nothing currently happens with this, there isn't really an equivalent for other filetypes so we don't currently have anywhere to store it.
Needs a bit of thought to decide
22-Aug-24 13:54
mentioned in commit ede29961f4ea416e611a1ee99127cad2bceb4d1d
Message
feat: add generic metadata attribute to storage files and write frontmatter in under it (utilities/file_location_listing#65)
22-Aug-24 14:02
The commit above introduces a new `metadata` attribute to storage files.
The idea is that this is an extensible section which can have arbitrary attributes added under it - in this case, we've added one called `frontmatter`.
Although it's not currently implemented (and won't be under the heading of this issue), the plan is that we'll add a new index key which indicates what metadata attributes a stored item has.
In this case that might look something like
That'll allow indexes to be used to speed up searches which match against something that only exists in `metadata`. Not that there's currently syntax to support it, but if we're doing a search for documents whose frontmatter includes the key "foo" with value "bar" we'd be able to quickly narrow the search set down to only include documents that actually have frontmatter.
It might also be that we want to make the storage headers extensible, perhaps including something like
But I think that path probably leads to pain once you start thinking about how to incorporate that into indexes
23-Aug-24 07:25
For now, we keep this simple - the changes should mean that indexing files doesn't break if they've got front-matter.
That's the main change that I need.
Once that's tested and definitely working, we can look at collapsing key-pairs into tag values to make them searchable, but the main thing is I want to be able to release soon (this weekend ideally) so that I can start adding front-matter to my notes.
Edit: actually, it looks like that injection is a 1 line change, so I'll do it now.
23-Aug-24 07:28
mentioned in commit b9fece2d7fe7f91fd6db2e9607bc94d735553494
Message
feat: use frontmatter to inject tags (utilities/file_location_listing#65)
23-Aug-24 07:32
OK, test crawl running now
There's a test file in there with the following content:
23-Aug-24 10:02
With a couple of typos fixed, it looks to have worked:
23-Aug-24 13:00
Additional things checked
There is one small problem though - the current dork processing looks like this
In the example above, the values are `Category:foo`, `CatName:foo_Bar_Sed` and `age:40`. None of those prefixes exist in `dorks`, so it's treated as a normal search.
However, that may not always be true. If our markdown doc looked like this
We'd run into trouble. It wouldn't be possible to search for tag value `domain:foo` because it would get interpreted as a dork and only results from `www.bentasker.co.uk` would be displayed (so if this file were on a different domain, it wouldn't appear in results).
The result would potentially be quite confusing: at best, you'd get no results back despite knowing that there was something there; at worst, you'd get incomplete results back (meaning you may not notice it was inaccurate).
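To make the collision concrete, here is a minimal sketch of prefix-based dork handling. The `DORKS` table, the `ext` entry and `parse_term` are hypothetical stand-ins, not the project's real dork code. The sketch also shows how a term containing a double colon could be left alone.

```python
# Hypothetical dork table: prefix -> name of the filter it triggers
DORKS = {"domain": "domain_filter", "ext": "extension_filter"}


def parse_term(term):
    """Classify a search term as a dork or as a plain search term.

    Returns ("dork", prefix, value) or ("term", term).
    """
    prefix, sep, value = term.partition(":")
    # For "domain::foo" the value starts with ":", which marks a double
    # colon; such terms are passed through as ordinary search terms
    if sep and prefix in DORKS and not value.startswith(":"):
        return ("dork", prefix, value)
    return ("term", term)
```

With this sketch, `parse_term("Category:foo")` stays a normal term because `Category` is not a known prefix, `parse_term("domain:foo")` is consumed as a dork, and `parse_term("domain::foo")` falls through as a normal term.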
It might be better to collapse frontmatter into something more akin to scoped tags - i.e. `Category::foo` rather than `Category:foo`. It'd be trivial to have the dork logic skip anything with a double colon.
23-Aug-24 13:05
The other thing that we need to consider, though, is that the current implementation doesn't actually parse the YAML.
So there are all sorts of YAML supported things which aren't currently handled, for example:
It might instead be prudent to pass it into Python's YAML parser, then we can infer type when injecting tags (injecting multiple for relevant lists).
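A sketch of what type-aware injection might look like once the frontmatter has been parsed into a Python mapping (for example, the dict that PyYAML's `yaml.safe_load` returns for a block of key/value pairs). The function name and the choice to skip nested structures are assumptions made for illustration. Keys and values are joined with the double colon suggested earlier in the thread.

```python
def frontmatter_to_tags(frontmatter, sep="::"):
    """Turn an already-parsed frontmatter mapping into a flat list of tags."""
    tags = []
    for key, value in frontmatter.items():
        # Lists produce one tag per entry; scalars are treated as a 1-item list
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            if item is None or isinstance(item, (dict, list, tuple)):
                # Nested structures and empty values are skipped in this sketch
                continue
            tags.append(f"{key}{sep}{item}")
    return tags
```

Because the parser has already inferred types, integers, floats and booleans can simply be formatted into the tag string, and a list-valued property yields one tag per entry.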
We should also add special support for `tag` and `tags` as Obsidian explicitly includes special handling for those.
24-Aug-24 09:35
mentioned in commit 91e615305b6bcef257ea5a56ef5359aaad5c451e
Message
feat: attempt to parse YAML frontmatter and handle entries with multiple values (utilities/file_location_listing#65)
24-Aug-24 09:39
The test file now has the following contents
As of the commit above, the stored tags for this page are
Within the JSON payload, the `metadata` attribute has the following:
Which is pretty much what we wanted.
We don't currently support nested objects in the YAML though:
Although it'll still appear under `metadata`, it won't result in any injected tags.
I'm intending to do this once I've special cased `::` in search term parsing.
24-Aug-24 09:44
mentioned in commit 3301b312f871a046ad60be4cc2d6d92fb203a5e7
Message
feat: special-case a double colon so it can't conflict with dorks (utilities/file_location_listing#65)
This means that a markdown document can include `domain: foo` in its YAML frontmatter without the resulting tag search being interpreted as a dork
24-Aug-24 09:51
OK, on to special casing of YAML items.
Obsidian's doc lists a set of defaults:
It also notes that it used to support singular versions of these (`tag`, `alias` and `cssclass`) but that these were deprecated and should not be used.
Realistically, deprecated or not, we should probably special case them to ensure historic docs are still supported.
Although not listed in the defaults table, the doc also references `title` - it'd make sense to support setting the page title from this.
Down the line, it'd be nice if we could index aliases too, but that's not for today - for now, we just won't special case them (so they'll be indexed as tags).
24-Aug-24 09:53
So, there are three changes that need to be made as a result of this
- For `tag` or `tags`, don't prefix the tag value
- If a `title` attribute exists, set the page title from that
- For `cssclasses` or `cssclass`, don't inject a tag
24-Aug-24 09:58
mentioned in commit 32d658aedf320effce88255043160d428193358a
Message
feat: special case specific YAML property names (utilities/file_location_listing#65)
24-Aug-24 10:01
mentioned in commit 94ca7298c3c5eb9def1c669cf028565545310ae2
Message
feat: special case the title attribute (utilities/file_location_listing#65)
If a title property exists within YAML front matter we'll use its value as the page title. We also won't inject a tag with prefix `title::`
24-Aug-24 10:06
mentioned in commit 7ec30172f8351f4413b27716b9a9188be711a6da
Message
fix: convert dates in frontmatter to string (utilities/file_location_listing#65)
This ensures that we'll be able to serialise to JSON for storage later
24-Aug-24 10:20
mentioned in commit 74d748bd7619abaf2f792ca6bf8dfd3da9394bc6
Message
feat: implement support for float, int, bool etc (utilities/file_location_listing#65)
Note: this also moves the previous datetime serialisation fix to storage
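For context on why dates needed attention: YAML parsers such as PyYAML turn unquoted dates into `datetime.date` objects, which `json.dumps` refuses to serialise by default. A minimal sketch of handling that at the storage layer follows. The function names are hypothetical, and the payload shape simply mirrors the `metadata` / `frontmatter` nesting described earlier.

```python
import datetime
import json


def json_default(value):
    """Fallback for json.dumps: render dates and datetimes as ISO 8601 strings."""
    if isinstance(value, (datetime.date, datetime.datetime)):
        return value.isoformat()
    raise TypeError(f"Cannot serialise {type(value).__name__}")


def dump_metadata(frontmatter):
    """Serialise a storage-style payload holding frontmatter under 'metadata'."""
    payload = {"metadata": {"frontmatter": frontmatter}}
    return json.dumps(payload, default=json_default)
```

Ints, floats and bools pass through `json.dumps` natively, so only the date types need the fallback in this sketch.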
24-Aug-24 10:30
mentioned in commit 46b62754f5e2d9a20f88c5221e70623530636367
Message
chore: make YAML frontmatter settings configurable (utilities/file_location_listing#65)
24-Aug-24 10:33
mentioned in commit 715ce0ba941bfae0e151f9bfb631f0a6f5071373
Message
feat: allow frontmatter parsing to be disabled (utilities/file_location_listing#65)
Setting env var `YML_LOAD_FRONTMATTER` to a value other than `true` will prevent parsing of front matter.
Note: it'll still be popped out to ensure that markdown title detection works
24-Aug-24 10:39
I think we might now be done here.
The following test file works just fine
The new functionality is controlled by environment variables:
- `YML_LOAD_FRONTMATTER`: should frontmatter be parsed (default: `True`)
- `YML_NO_PREFIX`: which YAML properties shouldn't be prefixed when injecting tags (default: `tag,tags`)
- `YML_NO_TAG`: which YAML properties shouldn't result in a tag being injected (default: `cssclass,cssclasses,title,date`)
- `YML_LOAD_TITLE`: should a YAML `title` property be used to set the page title?
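A sketch of how the settings above might be read and applied. This is not the project's implementation: `load_settings` and `apply_frontmatter` are hypothetical names, the case-insensitive comparison against `true` is an assumption, and so is the default for `YML_LOAD_TITLE` (none is stated above).

```python
import os


def load_settings(env=None):
    """Read the YML_* settings, falling back to the defaults listed above."""
    env = os.environ if env is None else env

    def as_bool(name, default):
        return env.get(name, default).strip().lower() == "true"

    def as_list(name, default):
        return [v.strip() for v in env.get(name, default).split(",") if v.strip()]

    return {
        "load_frontmatter": as_bool("YML_LOAD_FRONTMATTER", "true"),
        "no_prefix": as_list("YML_NO_PREFIX", "tag,tags"),
        "no_tag": as_list("YML_NO_TAG", "cssclass,cssclasses,title,date"),
        # Assumption: no default is documented for this one
        "load_title": as_bool("YML_LOAD_TITLE", "true"),
    }


def apply_frontmatter(frontmatter, title, settings):
    """Return (title, tags) after applying the special cases to parsed frontmatter."""
    if not settings["load_frontmatter"]:
        return title, []
    if settings["load_title"] and frontmatter.get("title"):
        title = str(frontmatter["title"])
    tags = []
    for key, value in frontmatter.items():
        if key in settings["no_tag"]:
            continue
        values = value if isinstance(value, list) else [value]
        for item in values:
            if item is None or isinstance(item, (dict, list)):
                continue  # nested objects don't produce tags in this sketch
            unprefixed = key in settings["no_prefix"]
            tags.append(str(item) if unprefixed else f"{key}::{item}")
    return title, tags
```

Passing a plain dict as `env` keeps the sketch easy to exercise without touching the real environment.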