Commit graph

135 commits

Author SHA1 Message Date
Brian Huisman 89f6fc2393 Try to resume a failed crawl.
Attempt to resume a crawl if it exited without going through the shutdown function
2023-05-30 15:53:41 -04:00
Brian Huisman c7c4960e1e Strict in_array checking 2023-05-30 15:01:24 -04:00
Brian Huisman 9e551324f3 data transferred
'sp_data_transferred' is now an ODATA variable.
2023-05-29 19:15:47 -04:00
Brian Huisman 44edb43dd2 Update admin.php
Change countup timer element from <span> to <time>. Separate time-since from duration.
2023-05-19 14:33:06 -04:00
Brian Huisman 08db10dcf5 Page titles and version numbers.
Make Administration UI page title the name of the page we are on. Don't need the version number here.
Display current version in the menu bar.
2023-05-19 13:55:34 -04:00
Brian Huisman 06c66f214a Update search.php
Update to match similar cancel code in crawler.php.
Fix $reason typo.
2023-05-19 13:16:45 -04:00
Brian Huisman 956af77655 Update admin.php
Add a message when there are no pages in the Page Index to list so it's less confusing than just an empty table.
2023-05-19 12:47:23 -04:00
Brian Huisman 0728849ea4 Update search.php
Ensure that important or negative match strings are not empty.
2023-05-19 12:22:21 -04:00
Brian Huisman b2091b8cfb Update admin.php
Add display of # of unique IPs in Query Log
2023-05-18 09:11:03 -04:00
Brian Huisman f55f9e71b3 Use <mark> instead of <strong>
Also make it easier for a savvy user to use whatever HTML element they like for highlighting.
2023-05-17 14:21:37 -04:00
Brian Huisman 49ff8d6e4e Stop the crawl time JS counter on crawl complete 2023-05-17 09:52:43 -04:00
Brian Huisman 0f7ea69790 Store s_weights as JSON 2023-05-17 09:22:00 -04:00
Brian Huisman d8e9d5dc91 Admin UI edits for when crawl is in progress
Automatically encode/decode json when saving/reading ODATA config values.
Remove 'sp_links_crawled' config table value, now stored in 'sp_progress'.
Update Crawl Information window in real-time while crawler is running. Be more aggressive at reloading the page to get the latest data once a crawl has finished.
Time the setting of certain config values while crawling in a more sensible way.
2023-05-16 12:00:28 -04:00
Brian Huisman f16c4f9e0a Refactor character normalization
Refactor the whitespace and punctuation normalization arrays.
2023-05-12 13:41:36 -04:00
Brian Huisman f7bc731ba2 Update config.php
Takes care of when config table sp_domains value is empty.
2023-05-12 10:29:14 -04:00
Brian Huisman 4bb28031b6 Enable downloading Page Index
Allow downloading of the Page Index as a CSV.
Remove unnecessary database columns url_base and status_noindex
Store list of domains at crawl so we don't need to request them every page-load; you will need to reinstall fresh because of this change
2023-05-12 10:06:57 -04:00
Brian Huisman bab4a7e2c5 Update config.php
Remove 'a' from exception text. Just grammar.
2023-05-08 15:40:11 -04:00
Brian Huisman d82ee666c7 Update admin.php
In the edge case where the same query is requested twice in the same second by different IPs, both would appear in the Query Log UI. Add a second GROUP BY to avoid this.
2023-05-08 07:32:39 -04:00
Brian Huisman 8a8623b440 Update config.php
Preliminary code to check for DB version
2023-05-05 12:58:28 -04:00
Brian Huisman 803155547d Rename to sp_punct
Rename sp_smart ("smart" punctuation) to the more general and accurate sp_punct
2023-05-05 11:54:07 -04:00
Brian Huisman 635422b1d6 Punctuation normalization and MIME-type display
Disable Query log download button if query log is empty.
Further database error resiliency.
Add many more punctuation normalization characters; normalize on search as well as storage.
Add count of MIME-types in Search Management UI.
2023-05-05 11:17:39 -04:00
Brian Huisman e6777287d7 Update admin.php
Add scrolling for queries that are too wide for mobile.
2023-05-04 15:06:30 -04:00
Brian Huisman 31421e47bc Update admin.php
Also add same geoip extra check for file download
2023-05-03 11:41:25 -04:00
Brian Huisman 8caf5440d3 Update admin.php
Fix for geolocates that succeed but don't return a country code.
2023-05-03 11:32:24 -04:00
Brian Huisman da235cdc95 Update admin.php
More filter row simplification
2023-05-01 11:40:32 -04:00
Brian Huisman 8a5f5965b0 Update admin.php
Tweak display of Filters row on Page Index
2023-05-01 11:18:49 -04:00
Brian Huisman 9734d0aa5a Update admin.css
Add a box shadow to flags to make white-on-white more visible.
2023-04-28 14:11:31 -04:00
Brian Huisman 83f8fc9ed2 Javascript crawl support enhancement
Don't require reloading the page after a crawl has completed.
Javascript will dynamically update the Crawler Information values if we are on the Crawler Management page.
2023-04-28 13:55:26 -04:00
Brian Huisman ddc601697c Ping server to see if crawl has started
If admin UI is loaded while a crawl is not running, add a ping every 5 seconds to check if one has started. Fix issue where reloading the page while a crawl was running would cause a JS error that would cancel the crawl.
2023-04-28 12:26:58 -04:00
Brian Huisman 0c733426db Update crawler.php
Use the previously crawled page's category value if available.
2023-04-27 13:22:20 -04:00
Brian Huisman 41f6b25f0f Allow specifying Default Category 2023-04-27 13:10:22 -04:00
Brian Huisman 8405e903c3 Update admin.php
Line up Page Index links a bit better
2023-04-26 20:25:56 -04:00
Brian Huisman ba04173c29 Daily updates
Keep Page Index pagination page within limits; add UTF-8 BOM to CSV and TXT download output; use utf8mb4_unicode_520_ci collation to remove need for SQL REGEXP; add more latin accent equivalent characters.
2023-04-26 15:16:13 -04:00
Brian Huisman 761491c21a Update admin.php
Make headers responsive too.
2023-04-25 15:14:18 -04:00
Brian Huisman 53e86085bc Update crawler.php
Don't need trim() here. OS_cleanTextUTF8 already does it.
2023-04-25 13:35:22 -04:00
Brian Huisman 46f68d1335 Update config.ini.php
Add double underscore to default prefix so phpMyAdmin shows tables as a group.
2023-04-25 12:58:58 -04:00
Brian Huisman 8d091c8195 Update crawler.php
Add error condition for empty PDF, don't index.
2023-04-25 12:46:38 -04:00
Brian Huisman a444d383da Update admin.js
Disable "grep" when Run Crawler modal is raised. It will be re-enabled if the crawler is currently running.
2023-04-25 11:05:01 -04:00
Brian Huisman 2665cff354 Change If-Modified-Since calculation
Use the last_modified date of the individual file for the If-Modified-Since header instead of the date of the last successful crawl.
2023-04-25 10:01:53 -04:00
Brian Huisman b3b40a9194 Implement filetype: searching 2023-04-24 16:31:27 -04:00
Brian Huisman fc968ae460 Simplify, add titles to Download buttons 2023-04-24 13:47:18 -04:00
Brian Huisman 150f98883d $_SDATA['pages'] is always at least 1 2023-04-24 13:37:44 -04:00
Brian Huisman 0a1c1a52e1 Search for latin accents explicitly via SQL REGEXP 2023-04-24 13:04:44 -04:00
Brian Huisman 8edc94b550 Allow search.php to unstick stuck crawls 2023-04-24 10:42:29 -04:00
Brian Huisman 0f69a2d2c8 Enable ligature / alternate-spelling matching 2023-04-24 09:52:05 -04:00
Brian Huisman a6304d2f5d Testing ligature search changes 2023-04-24 08:27:45 -04:00
Brian Huisman ab7ad64ac1 Remove 'Allowed' lol 2023-04-22 21:50:07 -04:00
Brian Huisman fed2b979e1 Add query length limit option 2023-04-22 21:48:43 -04:00
Brian Huisman 57ef2a6599 fclose 2023-04-21 16:25:28 -04:00
Brian Huisman 8d99e4fd41 Enable downloading Query Log CSV 2023-04-21 11:27:46 -04:00
Brian Huisman daaf934e33 REGEXP_REPLACE
Use REGEXP_REPLACE to capture all leading punctuation, not just ' and "
2023-04-20 22:35:32 -04:00
Brian Huisman 47e0173a1d Maybe let Mustache do the work here 2023-04-20 16:18:51 -04:00
Brian Huisman 2013f64a39 Use raw title for JSON (typeahead) output 2023-04-20 16:11:56 -04:00
Brian Huisman 1d5204caeb Also trim ' from queries for ordering 2023-04-20 14:59:37 -04:00
Brian Huisman c81e043e7f Minor admin.php tweaks 2023-04-20 14:45:48 -04:00
Brian Huisman b4ef2827ae Fix test for 'geo' key 2023-04-20 12:09:55 -04:00
Brian Huisman 8768cae733 Fresh install update 2023-04-20 11:20:10 -04:00
Brian Huisman cac7e90930 Change $_TEMPLATE to $_ORCINUS 2023-04-20 11:03:08 -04:00
Brian Huisman 84e38a5663 Re-upload 3rd party libraries 2023-04-20 10:47:11 -04:00
Brian Huisman 4f459b61d2 Delete all included repos for reupload 2023-04-20 10:20:38 -04:00
Brian Huisman 358fa42aee Update crawler.php 2023-04-19 16:23:42 -04:00
Brian Huisman 1363370840 Fix for dynamic classes deprecation in PHP 8.2 2023-04-19 11:50:48 -04:00
Brian Huisman 11afbe12e8 Update admin.php 2023-04-18 19:20:33 -04:00
Brian Huisman 6c78df9d92 Choose entity flag based on DOCTYPE 2023-04-18 18:36:11 -04:00
Brian Huisman ec2b7aa075 Daily updates, big flow change in crawler.php 2023-04-18 17:20:27 -04:00
Brian Huisman a57cb3ca83 Proper breaks in switch 2023-04-17 18:48:02 -04:00
Brian Huisman 553fc019fe Daily update 2023-04-17 17:47:22 -04:00
Brian Huisman 3b5a22794c Update config.php 2023-04-13 11:44:52 -04:00
Brian Huisman c888c40e0d Merge branch 'main' of https://github.com/GreyWyvern/orcinus-search 2023-04-13 11:11:56 -04:00
Brian Huisman dd2de5ff46 Update config.php 2023-04-13 11:11:53 -04:00
Brian Huisman e05fa2ebca Update README.md 2023-04-13 11:09:12 -04:00
Brian Huisman e5cb428d62 Update README.md 2023-04-13 11:08:43 -04:00
Brian Huisman 15028bd5a0 Update and rename README.txt to README.md 2023-04-13 11:08:25 -04:00
Brian Huisman 17fa8fae05 Tighten up file headings 2023-04-13 08:27:41 -04:00
Brian Huisman b04facafc2 Update search.css 2023-04-13 08:03:25 -04:00
Brian Huisman 1076b80ef9 Better targeting for typeahead CSS 2023-04-12 21:32:06 -04:00
Brian Huisman f37a732ee9 ARIA search updates 2023-04-12 21:08:53 -04:00
Brian Huisman 519ba2dda6 Add typeahead class to default template 2023-04-12 19:13:33 -04:00
Brian Huisman 062f009829 Updates for the day 2023-04-12 19:08:00 -04:00
Brian Huisman f152264323 Update config.ini.php 2023-04-12 08:34:37 -04:00
Brian Huisman 9ca36b8a0a Add back GeoIP2 directory and README 2023-04-12 08:32:40 -04:00
Brian Huisman 431d9bda0c Delete to recreate GeoIP2 directory 2023-04-12 08:31:58 -04:00
Brian Huisman 5b97f545df Capitalize GeoIP2 2023-04-12 08:30:31 -04:00
Brian Huisman 595740962e Update name to Orcinus 2023-04-12 08:28:29 -04:00
Brian Huisman bffa144421 move os3/ to orcinus/ 2023-04-12 08:08:11 -04:00