The "remove text from titles" feature was coded into the admin UI from the previous version, but was never actually implemented in the crawler. Wow. It works now.
Add a nice Orcinus header image with a link to https://greywyvern.com/orcinus/ Eventually this might link to online documentation or something?
Move the show-page-titles checkbox from being created by javascript to actually being in the HTML. Unnecessary JS complexity removed. Add a popper tooltip
Add a checkbox to enable and disable showing page titles along with the URLs in the Page Index. The status of this checkbox is saved during the admin session. Defaults to 'off'.
PDF Last Modified actually attempts to use "SourceModified" first, then "CreationDate" and lastly "Last Modified". Adjust the tooltip to better describe this.
Run mb_convert_encoding in ALL cases to remove potentially invalid UTF-8 characters.
Add the "replacement" UTF-8 character to the whitespace array to ensure it's removed.
Update PdfParser to the latest snapshot from the repo.
Add code to allow PdfParser to decode XMP Metadata from PDF files, preferring it over other decoded data.
Provide an 'online' value to the Search Result Mustache template. This will, for example, allow you to put things in your Search Result template that will show up when your site is displayed live (PHP), but will not be output when your site is displayed using the offline Javascript, and vice versa.
eg.
{{#online}}
This will only display in your template if it's online.
{{/online}}
{{^online}}
This will only display in your template it it's offline.
{{/online}}
Don't assume that other data from a PDF is the same as the content. Bypasses some still-unfixed PDFParser encoding issues.
Also exit the crawler script if we are in debug mode and there is a crawl already running.
By using the location of the search.js script file, we can determine the root URL of an offline installation as long as the online script has been installed at https://example.com/orcinus/js/search.js
If the search UI is using typeahead and the user selects a suggested option to go right to a page, then a search is never logged as a search query; it's like the search never happened. Add a fetch request to log the search query just before sending the user on their way to the page.
Make 's_show_orphans' a runtime variable and normalize the SQL queries it's used in.
Also change generic '$select' variable to more semantic '$crawldata'.
Add a third value to the sp_progress config value to let the script know if a crawl was resumed or not.
Also restore the sp_sha1 data from the crawltemp table on a resumed crawl.
Get rid of "avg_results" value; it's not intuitive. Instead make sure to use the results tally from the last recorded search query.
Use mb_strlen and mb_substr to avoid searching for single, but multibyte characters like bullet (•).
Using INSTR was correctly matching searches for 'ae' to the ligature æ, but was not matching searches for plain 'a' to å. However, using LIKE behaves exactly the opposite of this. Unless there is a better solution, use both INSTR and LIKE to create the query so all bases are covered.
If an orphan URL is blocked by a user rule, then remove it from the 'sp_exist' list so it doesn't keep coming back again and again. This only happens the next crawl after the user adds new rules.
Other misc edits.