Although we support indexing markdown files, we currently just treat them as text files with special rules applied.
This means that we can extract title, tags and outbound links from markdown files.
However, if a markdown file references an image, we don't currently pick up on it (and must instead find the image - if possible - through directory navigation).
If the file were, instead, HTML then for each of those images we'd
[url]
Given that the majority of my notes are in md, it'd be nice if we could do the same with those.
We could just expand the ruleset to include supporting images, but at a certain point we may find we're just building a markdown parser - it probably makes as much sense to use an off-the-shelf parser to turn it into a blob of HTML so that we can treat it as if it were HTML.
Activity
07-Jan-24 11:07
assigned to @btasker
07-Jan-24 11:09
Of course, the issue with rendering is that it does potentially present a small security risk: we'll be passing arbitrary content into a parser - whilst the risk of that should be small, it is not 0.
But, to a certain extent, that same risk exists when we pass a HTML file into BeautifulSoup.
13-Jan-24 00:01
Although importing markdown to render is pretty straightforward, if we do this it'll break the markdown tagging support introduced in #10
A lot of my notes (probably the main thing I'm interested in locating) use the Obsidian tagging layout:
If we process the file as if it's HTML we'll lose the ability to (reliably) extract those.
So, I think it might be better, after all, to look at teaching the markdown logic to be able to parse something like
so that the alt-text and/or the document title gets associated
13-Jan-24 00:40
mentioned in commit f1373e82a85b2dbb583f80e46b25a02fbb147ffb
Message
feat: give extractMetaFromMarkdown() the ability to extract images and alt-tags (utilities/file_location_listing#37)
13-Jan-24 00:41
The feature branch has the basic functionality in now, it can read a markdown file and extract links and image anchors.
I'll play around with it a bit before merging tomorrow
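For illustration, the image extraction described above can be sketched roughly as follows. This is a minimal sketch rather than the code in the feature branch: the pattern is the image regex quoted later in this issue, while `extract_images` and the sample text are hypothetical names made up for the example.

```python
import re

# Image pattern quoted later in this issue: optional alt-text, then the target
IMAGE_RE = re.compile(r'!\[([^\]]+)?\]\(([^\)]+)\)')


def extract_images(markdown_text):
    """Return (alt_text, target) tuples for each image reference.

    Hypothetical helper: the real logic lives in extractMetaFromMarkdown().
    """
    images = []
    for match in IMAGE_RE.finditer(markdown_text):
        # The alt-text group is optional, so it is None for ![](target)
        alt_text = match.group(1) or ""
        images.append((alt_text, match.group(2)))
    return images


# Hypothetical sample document
sample = "Intro\n\n![A diagram](img/diagram.png)\n\n![](img/no-alt.png)\n"
images = extract_images(sample)
```

Keeping the alt-text alongside the target is what allows it to be associated with the image when it is indexed.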
13-Jan-24 11:09
As we've got things cordoned off in a branch anyway, I sort of wonder whether it isn't worth taking the time to also get relative links (as opposed to images) working.
At the moment, if we had the following markdown
We would extract
https://www.example.com
Which, on reflection is actually a little weird - we haven't actually linked to example.com at all (although some renderers might auto-link it), but we've extracted that and not an actual link.
Although normally (and certainly in my notes) we'd end up finding bar.md during directory traversal, that might not always be the case: if MD is served without directory indexing enabled for example.
So, yeah, I think it's probably worth spending a little bit of time implementing support for that too.
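As a rough sketch of what relative link support involves: the target of each `[title](target)` construct gets resolved against the URL of the markdown file it was found in. The function name, sample markdown and page URL below are all assumptions for the sake of the example, not the project's actual code.

```python
import re
from urllib.parse import urljoin

# Matches [title](target); note that it will also match the tail of an
# image reference, so images need to be told apart separately
LINK_RE = re.compile(r'\[([^\]]+)?\]\(([^\)]+)\)')


def extract_links(markdown_text, page_url):
    """Return absolute URLs for each markdown link (hypothetical helper)."""
    links = []
    for match in LINK_RE.finditer(markdown_text):
        # urljoin leaves absolute targets alone and resolves relative ones
        links.append(urljoin(page_url, match.group(2)))
    return links


# Hypothetical document: one relative link, one parent-relative link and a
# bare URL that is not actually a markdown link
sample = "See [foo](bar.md) and [baz](../other/baz.md), not https://www.example.com"
links = extract_links(sample, "https://notes.example.invalid/docs/index.md")
```

With this approach the bare URL is ignored unless a separate plain-text URL regex is also run over the file.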
13-Jan-24 11:31
This turned out simpler to do than expected.
I changed the regex used for the image extraction to collect the first char, going from
!\[([^\]]+)?\]\(([^\)]+)\)
to
(.)?\[([^\]]+)?\]\(([^\)]+)\)
The code can then check the 1st group to see whether we're processing an image or a link.
Preparing to merge now.
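A minimal sketch of how checking the first group might look, using the updated regex from this comment. `classify_refs` and the sample text are hypothetical, and the merged code may well be structured differently.

```python
import re

# Updated pattern: group 1 is the character immediately before the "["
# (None when the reference starts the line or string)
REF_RE = re.compile(r'(.)?\[([^\]]+)?\]\(([^\)]+)\)')


def classify_refs(markdown_text):
    """Split references into images and links (hypothetical helper)."""
    images, links = [], []
    for match in REF_RE.finditer(markdown_text):
        prefix, text, target = match.groups()
        entry = (text or "", target)
        if prefix == "!":
            # A leading "!" marks an image reference
            images.append(entry)
        else:
            # Anything else (or nothing) before the "[" means a normal link
            links.append(entry)
    return images, links


# Hypothetical sample containing one link and one image
sample = "[foo](bar.md) and ![A diagram](img/diagram.png)"
images, links = classify_refs(sample)
```

Making the leading character optional means a single pass over the text handles both constructs, rather than needing one regex for images and another for links.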
13-Jan-24 11:31
changed title from Markdown Rendering to Markdown Specific Indexing (was Markdown rendering)
13-Jan-24 11:32
mentioned in merge request !3
13-Jan-24 11:34
mentioned in commit bf7d4f0f322c530e019c8e3d25da531af1d63482
Message
feat: implement proper markdown processing
domain
from queued items, remove it
Implement support for markdown links
Previously, we extracted links by using a regex to find URLs (just as we do with plain text). This commit implements support for processing the [title](target) construct, including handling relative links
refactor to remove other examples of repeatedly calculating absoluteness
13-Jan-24 11:34
mentioned in commit ee21b4657c68a848e3d3bc09851b55161a0112e9
Message
Merge branch 'markdown-processing' into 'main'
feat: implement proper markdown processing
Closes #37
See merge request utilities/file_location_listing!3