utilities/snippets_cli_go#12: Extract language from title

Issue Information

Issue Type: issue
Status: closed
Reported By: btasker
Assigned To: btasker

Milestone: v0.1
Created: 28-Sep-24 09:35


This is, in effect, a successor to #10.

The old script was able to show what language a snippet related to:

Search results - String: table, title: False, lang: False, similarto: False
| Snippet ID | Title                                                                               | Language |
| 142        | Check if value exists in table                                                      | LUA      |
| 135        | Rate limiting connections with iptables and hashlimit                               | BASH     |
| 83         | Recursively print table (print_r equivalent)                                        | LUA      |
| 81         | Add a static entry to the ARP table                                                 | BASH     |
| 78         | Intercepting Outbound DNS Queries                                                   | BASH     |
| 45         | Inserting new rows based upon a mix of static values and results from another query | MySQL    |
| 40         | Check if variable is table                                                          | LUA      |
| 35         | ASCII Character Codes                                                               | Misc     |
| 26         | Imploding a table                                                                   | LUA      |
| 12         | Check if table has element                                                          | LUA      |
| 7          | Make ASCII Table                                                                    | Python   |

The CLI does technically show it, because it's part of the title:

| Search results: table                                                                             |
|   # | TITLE                                                                                       |
| 141 | Check if value exists in table (lua)                                                        |
| 134 | Rate limiting connections with iptables and hashlimit (bash)                                |
|  83 | Recursively print table (print_r equivalent) (lua)                                          |
|  81 | Add a static entry to the ARP table (bash)                                                  |
|  78 | Intercepting Outbound DNS Queries (bash)                                                    |
|  45 | Inserting new rows based upon a mix of static values and results from another query (mysql) |
|  40 | Check if variable is table (lua)                                                            |
|  35 | ASCII Character Codes (misc)                                                                |
|  26 | Imploding a table (lua)                                                                     |
|  12 | Check if table has element (lua)                                                            |
|   7 | Make ASCII Table (python)                                                                   |

However, I can't easily scan down the column in the way that I can with the original.

What I'd like is for the language to be extracted from the title and dropped into an additional table column.

assigned to @btasker


mentioned in commit 4184f5371ab45751a8f0ffddc32e8e02d5584428

Commit: 4184f5371ab45751a8f0ffddc32e8e02d5584428 
Author: B Tasker                            
Date: 2024-09-28T10:54:07.000+01:00 


feat: identify language by parsing title (utilities/snippets_cli_go#12)

+31 -3 (34 lines changed)

This introduces a couple of new attributes when configuring a search destination:

        rss : "http://scratch.holly.home/output/rss.xml",
        elemtype : "div",
        attrib : "itemprop",
        elemid : "articleBody text",
        parseTitle : true,
        extraCol : "Language",

extraCol provides the title for the column in the table. Parsing will only be attempted if parseTitle is true.
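
As a rough illustration of how those two attributes might gate the extra column, here is a minimal Go sketch. The `Destination` struct, its field names and the `tableHeader` helper are assumptions made for the example and are not taken from the commit. Skipping the column when `extraCol` is empty is also an assumption.

```go
package main

import "fmt"

// Destination mirrors the config attributes listed above.
// The struct itself and its Go field names are hypothetical.
type Destination struct {
	RSS        string
	ElemType   string
	Attrib     string
	ElemID     string
	ParseTitle bool   // parsing is only attempted when true
	ExtraCol   string // heading for the additional column
}

// tableHeader returns the column headings for a results table.
// The extra column is only added when title parsing is enabled
// and a heading was supplied (the second condition is an assumption).
func tableHeader(d Destination) []string {
	header := []string{"#", "Title"}
	if d.ParseTitle && d.ExtraCol != "" {
		header = append(header, d.ExtraCol)
	}
	return header
}

func main() {
	d := Destination{
		RSS:        "http://scratch.holly.home/output/rss.xml",
		ElemType:   "div",
		Attrib:     "itemprop",
		ElemID:     "articleBody text",
		ParseTitle: true,
		ExtraCol:   "Language",
	}
	fmt.Println(tableHeader(d))
}
```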

The logic is relatively simple:

  • Check whether the title ends with a closing bracket
  • If so, apply the regex \(([^\)]+)\)$ to try and extract what's between the brackets
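
The two steps above can be sketched in Go roughly as follows. This is an illustration rather than the code from the commit: the regex is the one quoted above, but the `extractLanguage` helper and its name are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// trailingParens is the regex quoted above: it captures whatever
// sits inside a bracketed group at the very end of the string.
var trailingParens = regexp.MustCompile(`\(([^\)]+)\)$`)

// extractLanguage is a hypothetical helper. It returns the text inside
// the final pair of brackets, or an empty string if the title does not
// end with a closing bracket or the regex does not match.
func extractLanguage(title string) string {
	// Cheap check first: skip the regex for titles with no trailing bracket.
	if !strings.HasSuffix(title, ")") {
		return ""
	}
	m := trailingParens.FindStringSubmatch(title)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	titles := []string{
		"Check if value exists in table (lua)",
		"Recursively print table (print_r equivalent) (lua)",
		"A title with no language",
	}
	for _, t := range titles {
		fmt.Printf("%q -> %q\n", t, extractLanguage(t))
	}
}
```

Because the pattern is anchored with `$` and the character class excludes `)`, a title with an earlier bracketed group, such as "Recursively print table (print_r equivalent) (lua)", still yields only the final group.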

The result looks like this:

| Search results: table                                                                                        |
|   # | TITLE                                                                                       | LANGUAGE |
| 141 | Check if value exists in table (lua)                                                        | lua      |
| 134 | Rate limiting connections with iptables and hashlimit (bash)                                | bash     |
|  83 | Recursively print table (print_r equivalent) (lua)                                          | lua      |
|  81 | Add a static entry to the ARP table (bash)                                                  | bash     |
|  78 | Intercepting Outbound DNS Queries (bash)                                                    | bash     |
|  45 | Inserting new rows based upon a mix of static values and results from another query (mysql) | mysql    |
|  40 | Check if variable is table (lua)                                                            | lua      |
|  35 | ASCII Character Codes (misc)                                                                | misc     |
|  26 | Imploding a table (lua)                                                                     | lua      |
|  12 | Check if table has element (lua)                                                            | lua      |
|   7 | Make ASCII Table (python)                                                                   | python   |

I don't love that it's lowercase, but that's something that needs to be addressed at the publishing end - the import script casts all tags to lower case before this gets injected.

mentioned in issue #8