Commit graph

181 commits

Author SHA1 Message Date
Brian Huisman 2b80228f2a Update PHP version check for 8.1.x 2024-05-27 11:52:40 -04:00
Brian Huisman 432e15699d Abbreviate display of IPv6 addresses 2024-05-27 11:18:41 -04:00
Brian Huisman 35cf52b65b Limit number of GEOIP lookups.
Don't geolocate every IP, just unique ones. We'll probably still need a cached set of previously geolocated IPs to speed this up further.
2024-05-27 11:00:36 -04:00
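The idea in this commit, looking up each distinct IP once and reusing the result, can be sketched as follows. This is an illustration in JavaScript, not the project's own code; `geolocateUnique`, `lookupFn`, and the `cache` map are hypothetical names, and `lookupFn` stands in for whatever GeoIP service is actually called.

```javascript
// Hypothetical sketch: geolocate each distinct IP once and reuse the result.
// lookupFn is a stand-in for the real GeoIP lookup; cache may be passed in
// to carry previously geolocated IPs across calls.
function geolocateUnique(ips, lookupFn, cache = new Map()) {
  let lookups = 0;
  for (const ip of new Set(ips)) {
    if (!cache.has(ip)) {
      cache.set(ip, lookupFn(ip));
      lookups++;
    }
  }
  // Map every original row back to its cached location.
  return { locations: ips.map((ip) => cache.get(ip)), lookups };
}
```

Passing the same `cache` into later calls would give the "cached set of previously geolocated IPs" the message mentions as a possible next step.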
Brian Huisman 528a2dbf91 Store IP as text instead of INT (database change)
This change will require you to edit your database or reinstall the Orcinus Site Search from scratch after deleting all associated database tables.

Accounts for IPv4 and IPv6.
2024-05-17 15:12:09 -04:00
Brian Huisman 5af827a728 Update README.md
Place into "release candidate" status.
2024-05-17 12:10:28 -04:00
Brian Huisman d5cfaa9b95 Update README.md 2024-05-17 11:39:10 -04:00
Brian Huisman f34f5097fc Ignore Cloudflare timeout
Don't even notify the user of a Cloudflare timeout, just keep running the interval.
2024-05-17 10:22:18 -04:00
Brian Huisman f7fbaadf9b Kludge for 524 Cloudflare timeout response
There is no need to cancel the crawler for an HTTP 524 response.
2024-05-16 13:33:37 -04:00
Brian Huisman fb7e295490 Update PdfParser to 2.10.0 2024-05-16 12:36:43 -04:00
Brian Huisman 4f679114c3 Misc updates
Some small formatting updates.

Assert that some strings have a length before accessing them via string offset.
2024-05-16 12:30:53 -04:00
Brian Huisman 38a75ce70c Don't display zeros on the graph 2024-04-22 12:47:30 -04:00
Brian Huisman 54f9791a75 Update README.md
Update PHP requirement to currently-supported 8.1.x.
2023-12-12 11:07:13 -05:00
Brian Huisman 5d990e44b0 Update search.php
Don't allow JSON requests to trigger a new crawl or end a stuck one, since the requests may come too fast to handle.

Only allow triggering a new crawl if more than 'sp_timeout_crawl' seconds have passed since we canceled the previous one.

In the future, we might give each successfully initiated crawl a unique ID and then only allow sending a failure email once if it has failed. A very busy search engine is probably indistinguishable from a rapid-fire series of JSON requests.
2023-12-06 10:14:18 -05:00
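The two rules described in this commit can be sketched as a single guard. This is a minimal JavaScript illustration under stated assumptions, not the code in search.php: `mayTriggerCrawl` and its parameter names are hypothetical, with `timeoutCrawl` standing in for the 'sp_timeout_crawl' setting and all times given as Unix timestamps in seconds.

```javascript
// Hypothetical guard: decides whether a search request may start a new crawl.
function mayTriggerCrawl({ isJsonRequest, now, lastCancelTime, timeoutCrawl }) {
  // JSON requests never start (or end) a crawl; they may arrive too quickly.
  if (isJsonRequest) return false;
  // Require strictly more than the timeout since the previous cancellation.
  return now - lastCancelTime > timeoutCrawl;
}
```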
Brian Huisman a125060d7f Fix $capture typo 2023-11-06 11:29:18 -05:00
Brian Huisman 0c1f359ef0 Graph tweaks
Determine the correct height of the bar from the data-value and the given height of the tbody rather than including an explicit data-height on the bar.

Better algorithm to determine where to draw horizontal lines on the graph.

Only display the top 10 geolocated search locations with the rest falling under "Other", unless there are only 11 locations in the list.
2023-11-06 11:27:15 -05:00
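The "top 10 plus Other" rule from this commit can be sketched as follows. It is one reading of the message, written in JavaScript for illustration: `groupTopLocations` is a hypothetical name, and the sketch assumes the 11-location exception exists because an "Other" row covering a single location would save no space.

```javascript
// Hypothetical sketch: keep the `limit` busiest locations, fold the rest
// into "Other". counts is an object mapping location name to search count.
function groupTopLocations(counts, limit = 10) {
  const sorted = Object.entries(counts).sort((a, b) => b[1] - a[1]);
  // With limit + 1 entries or fewer, show the full list: "Other" would
  // replace at most one row.
  if (sorted.length <= limit + 1) return sorted;
  const other = sorted.slice(limit).reduce((sum, [, n]) => sum + n, 0);
  return [...sorted.slice(0, limit), ["Other", other]];
}
```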
Brian Huisman 9e5dccf8b7 Add Statistics page
Added a Statistics page. Probably still needs some work.
2023-11-03 14:26:43 -04:00
Brian Huisman 438c520f7c Prevent division by zero in edge case 2023-10-18 14:23:00 -04:00
Brian Huisman 4bbe1d967b Misc fixes
Save the process id of the crawler in the sp_crawling DB value instead of just a flag; we can use it to compare and further prevent race conditions which still seem to happen occasionally.
2023-10-17 10:36:34 -04:00
Brian Huisman eed50c3727 Update README.md
Update w/ new banner image
2023-10-04 13:21:10 -04:00
Brian Huisman 5cb7c372fb Couple misc fixes
Change element where some classes are applied in admin.php to work with updated Bootstrap.

Add the fi, ff, fl, ffi, ffl series of ligatures to $_RDATA['s_latin'] as they are common in PDFs.
2023-09-29 12:57:23 -04:00
Brian Huisman dee454cb8c Update Bootstrap and jQuery
Bootstrap => v5.3.2
jQuery => v3.7.1
2023-09-28 11:37:46 -04:00
Brian Huisman 1860d1f8ce Totally forgot to actually implement this feature
The "remove text from titles" feature was coded into the admin UI from the previous version, but was never actually implemented in the crawler. Wow. It works now.
2023-09-27 15:33:06 -04:00
Brian Huisman 6c961d44a3 Add Query Log row display limit 2023-09-25 11:53:20 -04:00
Brian Huisman f8bed73c26 Responsive pagination bar on Page Index 2023-09-15 10:32:59 -04:00
Brian Huisman 873a18fbc9 Add text fragments flag and functionality 2023-09-14 13:18:33 -04:00
Brian Huisman da52e0f7bf Add header image and link
Add a nice Orcinus header image with a link to https://greywyvern.com/orcinus/. Eventually this might link to online documentation or something?

Move the show-page-titles checkbox from being created by JavaScript to actually being in the HTML, removing unnecessary JS complexity. Add a Popper tooltip.
2023-09-14 10:33:59 -04:00
Brian Huisman 4c78a5245f Use REPLACE INTO for resiliency 2023-09-12 10:44:14 -04:00
Brian Huisman d4e0e409fe Show Page Titles checkbox on Page Index
Add a checkbox to enable and disable showing page titles along with the URLs in the Page Index. The status of this checkbox is saved during the admin session. Defaults to 'off'.
2023-09-11 13:32:45 -04:00
Brian Huisman dd88459d04 Update admin.php
PDF Last Modified actually attempts to use "SourceModified" first, then "CreationDate" and lastly "Last Modified". Adjust the tooltip to better describe this.
2023-09-11 12:18:33 -04:00
Brian Huisman 511207e0b2 Add PDF Last Modified multiplier 2023-09-08 15:11:27 -04:00
Brian Huisman 302c8db00e Group statements
Group several statements into single statements. You might not like it, but this makes me happy. :)
2023-07-21 14:17:15 -04:00
Brian Huisman 382511077a Misc updates
Prettify some SQL code.
Add some error-response code for fatal failed SQL statements.
2023-07-21 13:04:51 -04:00
Brian Huisman edf2dc338c admin and pdfparser updates
Add some tooltip text to some elements in admin.php
Merge recent PdfParser updates into the library
2023-07-11 10:24:55 -04:00
Brian Huisman 229129a9e4 Update crawler.php
Get and set sp_crawling in real-time to minimize race conditions.
2023-07-06 15:09:31 -04:00
Brian Huisman 181addfd3d Update PDFDocEncoding.php
Add Yours Truly as the author of this file. I'm humble!
2023-07-04 14:06:28 -04:00
Brian Huisman a5ff604f58 Update crawler.php
Update crawler.php to also try using XMP metadata from updated PDFParser
2023-07-04 13:46:12 -04:00
Brian Huisman 06cc7fe325 PdfParser update
This update adds XMP metadata and PDFDocEncoding support for regular metadata.
2023-07-04 13:08:42 -04:00
Brian Huisman 30630c6c60 Start enforcing PHP and SQL version limitations. 2023-06-26 15:00:42 -04:00
Brian Huisman 5a39280858 PdfParser PHP-CS-Fixer updates 2023-06-23 12:23:20 -04:00
Brian Huisman 3307baac4d Update crawler.php
Run mb_convert_encoding in ALL cases to remove potentially invalid UTF-8 characters.
Add the "replacement" UTF-8 character to the whitespace array to ensure it's removed.
2023-06-22 15:35:40 -04:00
Brian Huisman b12e7991e0 Update PdfParser to latest snapshot
Update PdfParser to the latest snapshot from the repo.
Add code to allow PdfParser to decode XMP Metadata from PDF files, preferring it over other decoded data.
2023-06-22 15:33:53 -04:00
Brian Huisman 47562e0a71 Add 'online' value for Mustache template
Provide an 'online' value to the Search Result Mustache template. This will, for example, allow you to put things in your Search Result template that will show up when your site is displayed live (PHP), but will not be output when your site is displayed using the offline Javascript, and vice versa.

eg.
{{#online}}
  This will only display in your template if it's online.
{{/online}}
{{^online}}
  This will only display in your template if it's offline.
{{/online}}
2023-06-22 09:57:33 -04:00
Brian Huisman 042339d3ef Update crawler.php
Don't assume that other data from a PDF is the same as the content. Bypasses some still-unfixed PDFParser encoding issues.
Also exit the crawler script if we are in debug mode and there is a crawl already running.
2023-06-21 17:23:08 -04:00
Brian Huisman eda57224d9 Remove need for 'jw_depth' value
By using the location of the search.js script file, we can determine the root URL of an offline installation as long as the online script has been installed at https://example.com/orcinus/js/search.js
2023-06-21 15:07:57 -04:00
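Deriving the root URL from the script's own location, as this commit describes, might look like the sketch below. It assumes the script sits at `<root>/orcinus/js/search.js`, as in the message; `rootFromScriptUrl` is a hypothetical name and this is not the code in template.offline.js.

```javascript
// Hypothetical sketch: derive the site root from the URL of search.js.
// Assumes the script is served from <root>/orcinus/js/search.js.
function rootFromScriptUrl(scriptSrc) {
  const suffix = "/orcinus/js/search.js";
  const url = new URL(scriptSrc);
  // Drop any query string or fragment before comparing the path.
  url.search = "";
  url.hash = "";
  const href = url.href;
  if (!href.endsWith(suffix)) return null; // script is not at the expected path
  return href.slice(0, href.length - suffix.length) + "/";
}
```

In a browser, the script's own URL is available as `document.currentScript.src` while the script is executing, so the call would be `rootFromScriptUrl(document.currentScript.src)`.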
Brian Huisman b17a68c175 Update template.offline.js
Quote jw_depth string.
2023-06-21 12:09:28 -04:00
Brian Huisman 675b25b1e4 Update admin.php
Forgot a [0] dangit.
2023-06-19 12:31:47 -04:00
Brian Huisman 930d4fa793 Update admin.php
Test user-supplied regular expression matches for validity before saving.
2023-06-19 12:12:58 -04:00
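Validating a user-supplied pattern before saving usually means trying to compile it and catching the failure. The sketch below shows that idea in JavaScript; admin.php is PHP and may do this differently, and `isValidPattern` is a hypothetical name.

```javascript
// Hypothetical sketch: report whether a pattern compiles as a regular
// expression, so invalid input can be rejected before it is saved.
function isValidPattern(source) {
  try {
    new RegExp(source); // throws SyntaxError on an invalid pattern
    return true;
  } catch (err) {
    return false;
  }
}
```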
Brian Huisman 0a83546411 Update crawler.php
Make sure regexp lines in require and ignore URL fields are actually treated as regexps.
2023-06-19 11:51:57 -04:00
Brian Huisman e9c0654295 Update search.js
Tweak the comment
2023-06-16 14:47:51 -04:00
Brian Huisman 54bbbb6a65 Log clicked search suggestions
If the search UI is using typeahead and the user selects a suggested option to go right to a page, then a search is never logged as a search query; it's like the search never happened. Add a fetch request to log the search query just before sending the user on their way to the page.
2023-06-16 14:38:24 -04:00