
utilities/file_location_listing#50: Whitespace can break storage headers



Issue Information

Issue Type: issue
Status: closed
Reported By: btasker
Assigned To: btasker

Milestone: v0.2.6
Created: 03-Mar-24 11:30



Description

Although we're not currently consuming the relevant headers, future use of storage file headers may be problematic where values such as page titles contain whitespace (newlines in particular).

Take, for example, the following HTML:

<head id="dochead">
    <title>Category - C 


        - snippets.bentasker.co.uk</title>

This renders just fine in the browser, but when written into storage the whitespace is included:

KEY:https://snippets.bentasker.co.uk/C.html
HASH:455288f7bb9ed2901156c22747c40021e828588073d04e71df6c98d0ca1a712c
DOMAIN:snippets.bentasker.co.uk
PATH:/C.html
FILE:C.html
CONTENTTYPE:text/html
TITLE:Category - C 


                - snippets.bentasker.co.uk
ENGINE-VER:0.2.6

It's handled OK in search results (the whitespace is present, but the browser ignores it).

In both cases, though, it should probably be stripped (or at least escaped).
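
As a rough illustration of both options, here is a minimal sketch. The function names and the naive line-based parser are hypothetical and are not taken from the project's code; the parser only stands in for whatever future consumer might read one `NAME:value` header per line.

```python
# Hypothetical helpers: none of these names come from the project itself.

def collapse_whitespace(value: str) -> str:
    """Replace every run of whitespace (including newlines) with one space."""
    return " ".join(value.split())


def escape_newlines(value: str) -> str:
    """Alternative: keep the text intact but write line breaks as literals."""
    return value.replace("\\", "\\\\").replace("\r", "\\r").replace("\n", "\\n")


def parse_headers(raw: str) -> dict:
    """Naive line-based parser: expects one 'NAME:value' pair per line."""
    headers = {}
    for line in raw.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # lines without a colon are silently dropped
            headers[name] = value
    return headers


# The title from the example page, newlines and indentation included
title = "Category - C \n\n\n        - snippets.bentasker.co.uk"

# Written as-is, the title spills over several lines and is truncated on read
broken = parse_headers(f"TITLE:{title}\nENGINE-VER:0.2.6")

# Collapsed first, the header stays on one line and survives the round trip
fixed = parse_headers(f"TITLE:{collapse_whitespace(title)}\nENGINE-VER:0.2.6")

print(repr(broken["TITLE"]))
print(repr(fixed["TITLE"]))
```

Collapsing is lossy but matches how a browser renders the title. Escaping preserves the original text, at the cost of requiring every consumer to unescape it.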




Activity


assigned to @btasker

I'm adding this to backlog for now because:

  • It's not currently having any ill effects
  • The next release will introduce storage file version numbering; we don't want to make changes to storage file info at the same time.

changed the description

> The next release will introduce storage file version numbering; we don't want to make changes to storage file info at the same time.

We've ended up making a change anyway, so now might actually be the perfect time to do this.

verified

mentioned in commit 02312bc152056f49d16a92de89e847d403bfccf8

Commit: 02312bc152056f49d16a92de89e847d403bfccf8
Author: B Tasker
Date: 2024-03-10T12:54:27.000+00:00

Message

fix: strip newlines from title tags (utilities/file_location_listing#50)

+1 -1 (2 lines changed)

changed the description