67 commits

Author SHA1 Message Date
Brian Huisman 5cb7c372fb Couple misc fixes
Change element where some classes are applied in admin.php to work with updated Bootstrap.

Add the fi, ff, fl, ffi, ffl series of ligatures to $_RDATA['s_latin'] as they are common in PDFs.
2023-09-29 12:57:23 -04:00
Brian Huisman dee454cb8c Update Bootstrap and jQuery
Bootstrap => v5.3.2
jQuery => v3.7.1
2023-09-28 11:37:46 -04:00
Brian Huisman 6c961d44a3 Add Query Log row display limit 2023-09-25 11:53:20 -04:00
Brian Huisman f8bed73c26 Responsive pagination bar on Page Index 2023-09-15 10:32:59 -04:00
Brian Huisman 873a18fbc9 Add text fragments flag and functionality 2023-09-14 13:18:33 -04:00
Brian Huisman da52e0f7bf Add header image and link
Add a nice Orcinus header image with a link to https://greywyvern.com/orcinus/. Eventually this might link to online documentation or something?

Move the show-page-titles checkbox from being created by javascript to actually being in the HTML. Unnecessary JS complexity removed. Add a popper tooltip.
2023-09-14 10:33:59 -04:00
Brian Huisman d4e0e409fe Show Page Titles checkbox on Page Index
Add a checkbox to enable and disable showing page titles along with the URLs in the Page Index. The status of this checkbox is saved during the admin session. Defaults to 'off'.
2023-09-11 13:32:45 -04:00
Brian Huisman dd88459d04 Update admin.php
PDF Last Modified actually attempts to use "SourceModified" first, then "CreationDate" and lastly "Last Modified". Adjust the tooltip to better describe this.
2023-09-11 12:18:33 -04:00
Brian Huisman 511207e0b2 Add PDF Last Modified multiplier 2023-09-08 15:11:27 -04:00
Brian Huisman 302c8db00e Group statements
Group several statements into single statements. You might not like it, but this makes me happy. :)
2023-07-21 14:17:15 -04:00
Brian Huisman edf2dc338c admin and pdfparser updates
Add some tooltip text to some elements in admin.php
Merge recent PdfParser updates into the library
2023-07-11 10:24:55 -04:00
Brian Huisman eda57224d9 Remove need for 'jw_depth' value
By using the location of the search.js script file, we can determine the root URL of an offline installation as long as the online script has been installed at https://example.com/orcinus/js/search.js
2023-06-21 15:07:57 -04:00
Brian Huisman 675b25b1e4 Update admin.php
Forgot a [0] dangit.
2023-06-19 12:31:47 -04:00
Brian Huisman 930d4fa793 Update admin.php
Test user-supplied regular expression matches for validity before saving.
2023-06-19 12:12:58 -04:00
Brian Huisman 8ed10eac36 Tweaks
Fix text1/text2 specification in page index SQL query.
Add eszett also to single 's' for replacement.
2023-06-16 13:33:15 -04:00
Brian Huisman e76fdf730c s_show_orphans cleanup
Make 's_show_orphans' a runtime variable and normalize the SQL queries it's used in.
Also change generic '$select' variable to more semantic '$crawldata'.
2023-06-15 10:19:05 -04:00
Brian Huisman a489fb1b8e sp_smart => sp_punct
Change sp_smart to sp_punct also in the offline javascript template.
2023-06-14 15:39:30 -04:00
Brian Huisman 4caa178acc Update admin.php
Tighten up the Offline javascript export statement
2023-06-14 15:16:03 -04:00
Brian Huisman 1ce34d9e41 Offline javascript output
Make the javascript output text into a Mustache template.
Add the jw_depth variable.
2023-06-14 14:43:55 -04:00
Brian Huisman c069e44765 Unique IPs in modal
Add a listing for "Unique IPs" in the modal popup on the Query Log page.
2023-06-14 10:27:54 -04:00
Brian Huisman fd2bbf745f Add 'resumed' flag to sp_progress
Add a third value to the sp_progress config value to let the script know if a crawl was resumed or not.
Also restore the sp_sha1 data from the crawltemp table on a resumed crawl.
2023-06-12 12:19:00 -04:00
Brian Huisman fbca9808a2 Update admin.php
Specify `t` table to avoid ambiguity
2023-06-07 12:21:56 -04:00
Brian Huisman 563eb6d014 Query log fixes, multibyte search restrict
Get rid of "avg_results" value; it's not intuitive. Instead make sure to use the results tally from the last recorded search query.
Use mb_strlen and mb_substr to avoid searching for characters that are single but multibyte, such as the bullet (•).
2023-06-07 11:45:14 -04:00
Brian Huisman 8b024c438c Remove some unnecessary continues
Also add documentation for the crawler debug mode.
Scope fixes for JS output, still need to work on this.
2023-06-02 14:05:52 -04:00
Brian Huisman 56c84a89cb Prevent endless loop
If an orphan URL is blocked by a user rule, then remove it from the 'sp_exist' list so it doesn't keep coming back again and again. This only happens on the next crawl after the user adds new rules.
Other misc edits.
2023-06-01 12:20:09 -04:00
Brian Huisman c7c4960e1e Strict in_array checking 2023-05-30 15:01:24 -04:00
Brian Huisman 44edb43dd2 Update admin.php
Change countup timer element from <span> to <time>. Separate time-since from duration.
2023-05-19 14:33:06 -04:00
Brian Huisman 08db10dcf5 Page titles and version numbers.
Make Administration UI page title the name of the page we are on. Don't need the version number here.
Display current version in the menu bar.
2023-05-19 13:55:34 -04:00
Brian Huisman 956af77655 Update admin.php
Add a message when there are no pages in the Page Index to list so it's less confusing than just an empty table.
2023-05-19 12:47:23 -04:00
Brian Huisman b2091b8cfb Update admin.php
Add display of # of unique IPs in Query Log
2023-05-18 09:11:03 -04:00
Brian Huisman 0f7ea69790 Store s_weights as JSON 2023-05-17 09:22:00 -04:00
Brian Huisman d8e9d5dc91 Admin UI edits for when crawl is in progress
Automatically encode/decode json when saving/reading ODATA config values.
Remove 'sp_links_crawled' config table value, now stored in 'sp_progress'.
Update Crawl Information window in real-time while crawler is running. Be more aggressive at reloading the page to get the latest data once a crawl has finished.
Time the setting of certain config values while crawling in a more sensible way.
2023-05-16 12:00:28 -04:00
Brian Huisman 4bb28031b6 Enable downloading Page Index
Allow downloading of the page index as a csv.
Remove unnecessary database columns url_base and status_noindex.
Store the list of domains at crawl time so we don't need to request them on every page load; you will need to reinstall fresh because of this change.
2023-05-12 10:06:57 -04:00
Brian Huisman d82ee666c7 Update admin.php
In the edge case where the same query is requested twice in the same second by different IPs, both would appear in the Query Log UI. Add a second GROUP BY to avoid this.
2023-05-08 07:32:39 -04:00
Brian Huisman 803155547d Rename to sp_punct
Rename sp_smart ("smart" punctuation) to the more general and accurate sp_punct
2023-05-05 11:54:07 -04:00
Brian Huisman 635422b1d6 Punctuation normalization and MIME-type display
Disable Query log download button if query log is empty.
Further database error resiliency.
Add many more punctuation normalization characters; normalize on search as well as storage.
Add count of MIME-types in Search Management UI.
2023-05-05 11:17:39 -04:00
Brian Huisman e6777287d7 Update admin.php
Add scrolling for queries that are too wide for mobile.
2023-05-04 15:06:30 -04:00
Brian Huisman 31421e47bc Update admin.php
Also add same geoip extra check for file download
2023-05-03 11:41:25 -04:00
Brian Huisman 8caf5440d3 Update admin.php
Fix for geolocates that succeed but don't return a country code.
2023-05-03 11:32:24 -04:00
Brian Huisman da235cdc95 Update admin.php
More filter row simplification
2023-05-01 11:40:32 -04:00
Brian Huisman 8a5f5965b0 Update admin.php
Tweak display of Filters row on Page Index
2023-05-01 11:18:49 -04:00
Brian Huisman 83f8fc9ed2 Javascript crawl support enhancement
Don't require reloading the page after a crawl has completed.
Javascript will dynamically update the Crawler Information values if we are on the Crawler Management page.
2023-04-28 13:55:26 -04:00
Brian Huisman 41f6b25f0f Allow specifying Default Category 2023-04-27 13:10:22 -04:00
Brian Huisman 8405e903c3 Update admin.php
Line up Page Index links a bit better
2023-04-26 20:25:56 -04:00
Brian Huisman ba04173c29 Daily updates
Keep Page Index pagination page within limits; add UTF-8 BOM to CSV and TXT download output; use utf8mb4_unicode_520_ci collation to remove need for SQL REGEXP; add more latin accent equivalent characters.
2023-04-26 15:16:13 -04:00
Brian Huisman 761491c21a Update admin.php
Make headers responsive too.
2023-04-25 15:14:18 -04:00
Brian Huisman b3b40a9194 Implement filetype: searching 2023-04-24 16:31:27 -04:00
Brian Huisman fc968ae460 Simplify, add titles to Download buttons 2023-04-24 13:47:18 -04:00
Brian Huisman ab7ad64ac1 Remove 'Allowed' lol 2023-04-22 21:50:07 -04:00
Brian Huisman fed2b979e1 Add query length limit option 2023-04-22 21:48:43 -04:00