project Websites / Gitlab Issue Listing Script avatar

websites/Gitlab-Issue-Listing-Script#46: Url Encode Label Names in links



Issue Information

Issue Type: issue
Status: closed
Reported By: btasker
Assigned To: btasker

Milestone: v0.5
Created: 26-Aug-22 15:47



Description

In #44 we added a page showing issue history for a specific label.

The pages use a SEF path like /labeltimeline/websites/privacy-sensitive-analytics/Bug.html

However, some labels contain slashes which mean the pages do not load (/labeltimeline/websites/privacy-sensitive-analytics/Resolution::Fixed/Done.html).

We should URLencode labels to ensure that slashes are escaped - the SEF handler will need to urldecode it.



Toggle State Changes

Activity


assigned to @btasker

Hmmmm, that's not enough.

In the markup we have

<a href="/labeltimeline/websites/privacy-sensitive-analytics/Resolution%3A%3AFixed%2FDone.html">

But the browser converts it back over in its request.

It's a horrible, horrible solution, but urlencoding twice prevents that.

verified

mentioned in commit 921df2ad868a5732a9f457c1ab5f00baadf830fb

Commit: 921df2ad868a5732a9f457c1ab5f00baadf830fb 
Author: B Tasker                            
                            
Date: 2022-08-26T17:01:49.000+01:00 

Message

Ensure label names are sent urlencoded in paths (websites/Gitlab-Issue-Listing-Script#46)

We have to urlencode twice otherwise Firefox and Chrome will automatically decode when placing the request

+2 -2 (4 lines changed)

mentioned in issue #47

This now seems to work well.

In the Markup we have

<a href="/labeltimeline/websites/privacy-sensitive-analytics/Resolution%253A%253AFixed%252FDone.html">

Firefox decodes that to

http://127.0.0.1:1480/labeltimeline/websites/privacy-sensitive-analytics/Resolution%253A%253AFixed%2FDone.html

It's safe to urldecode() a string twice, even if it's only been urlencoded once, so this handles the Firefox/Chrome case as well as things like wget which won't automatically urldecode the first layer.

Browser (Firefox) behaviour is actually quite odd here:

The markup has

/Resolution%253A%253AFixed%252FDone.html

When you hover over the link, the tooltip at the bottom of the page says

/Resolution%253A%253AFixed%2FDone.html

If you right click the link and say copy link, you get the double-encoded version

/Resolution%253A%253AFixed%252FDone.html

If you click the link, the address bar shows

/Resolution%253A%253AFixed%252FDone.html

Developer Tools seems quite confused about what's been requested (compare highlighted on the left with the URL on the right)

Screenshot_20220826_172456

The request logged by apache is

/Resolution%253A%253AFixed%252FDone.html

So, what happens if we roll back the change in this ticket?

  • Markup: /Resolution%3A%3AFixed%2FDone.html
  • Tooltip: /Resolution::Fixed/Done.html
  • Dev tools left pane: /Resolution::Fixed/Done.html
  • Dev tools right pane: /Resolution%3A%3AFixed%2FDone.html
  • Apache: /labeltimeline/websites/privacy-sensitive-analytics/Resolution%3A%3AFixed%2FDone.html

So actually, the browser is handling it OK at the request level - the issue is that Apache (2.4.38 at time of writing) doesn't. It's not a GILS 404 page we're getting back, but an Apache one, and a quick check confirms the application never gets initiated.

It means there are two issues here

  • Apache decodes the %2F into a / and returns a 404 (despite the rewrite rule)
  • Firefox mishandles url-encoded URLs within certain UI elements (leading to the inconsistencies above)

OK, so I think the only safe route is to leave this double encoded - it provides a safety net against issues within either of the browser or httpd stack.

Re-opening just to bottom something out.

I've tested mirroring with wget and this works fine:

$ ls labeltimeline/websites/Gitlab-Issue-Listing-Script/
 Bug.html                 Improvement.html     New+Feature.html                Resolution::Invalid.html      Task.html
 Gitlab+API+Foible.html   Informational.html   Resolution::Fixed%2FDone.html  "Resolution::Won't+Fix.html"

My concern though, is that we may possibly run into a httpd stack which doesn't appreciate these special chars. And, of course, label names are free text, so there's a whole world of other characters just waiting to cause issues.

I'm wondering, then, whether it might not be better to switch to using label's numeric IDs instead.

It'll be very hard to change from one to another once this functionality is released, so it seems better to err on the side of caution.

verified

mentioned in commit 55a1433b7c0c2c41a507fe6ec2e1f273ce177531

Commit: 55a1433b7c0c2c41a507fe6ec2e1f273ce177531 
Author: B Tasker                            
                            
Date: 2022-08-27T12:10:17.000+01:00 

Message

Use label IDs instead of text for filenames (websites/Gitlab-Issue-Listing-Script#46)

This removes the possibility of special characters causing issues in static mirrors etc.

Following this change, the ID needs to be passed through when generating a SEF link:

GILSUtils::qs2sef("label={$label['id']}&projpath={$project['project']->path_with_namespace}");
+15 -13 (28 lines changed)

Done. Re-closing