Although we're not currently consuming the relevant headers, future use of storage file headers may be problematic where whitespace (such as a newline) is included in page titles etc.
Take, for example, the following HTML:
<head id="dochead">
<title>Category - C
- snippets.bentasker.co.uk</title>
This renders just fine in the browser, but when written into storage the whitespace is included:
KEY:https://snippets.bentasker.co.uk/C.html
HASH:455288f7bb9ed2901156c22747c40021e828588073d04e71df6c98d0ca1a712c
DOMAIN:snippets.bentasker.co.uk
PATH:/C.html
FILE:C.html
CONTENTTYPE:text/html
TITLE:Category - C
- snippets.bentasker.co.uk
ENGINE-VER:0.2.6
It's handled OK in search results (the whitespace is present, but the browser ignores it).
In both cases, though, it should probably be stripped (or at least escaped).
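As a rough illustration of the stripping option, the sketch below collapses any run of whitespace (including newlines) in a title into a single space before it is written out as a header line. The function name `normalise_title` is hypothetical and this is not the code from the eventual commit; it only assumes that the title is available as a plain string at the point the `TITLE:` header is built.

```python
import re


def normalise_title(raw_title: str) -> str:
    """Collapse runs of whitespace (spaces, tabs, newlines) into one space.

    Hypothetical helper for illustration only: the real crawler may
    handle titles differently.
    """
    return re.sub(r"\s+", " ", raw_title).strip()


if __name__ == "__main__":
    # The title from the example page, with its embedded newline
    title = "Category - C\n- snippets.bentasker.co.uk"

    # The header now fits on a single line
    print(f"TITLE:{normalise_title(title)}")
    # prints: TITLE:Category - C - snippets.bentasker.co.uk
```

Collapsing rather than simply deleting the newline keeps the words either side of it separated, which matches how the browser renders the title.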
Activity
03-Mar-24 11:30
assigned to @btasker
03-Mar-24 11:31
I'm adding this to backlog for now because:
10-Mar-24 12:49
changed the description
10-Mar-24 12:50
We've ended up making a change anyway, so now might actually be the perfect time to do this.
10-Mar-24 12:54
mentioned in commit 02312bc152056f49d16a92de89e847d403bfccf8
Message
fix: strip newlines from title tags (utilities/file_location_listing#50)
10-Mar-24 12:55
changed the description