Brian Huisman
89f6fc2393
Try to resume a failed crawl.
Attempt to resume a crawl if it exited without going through the shutdown function
2023-05-30 15:53:41 -04:00
Brian Huisman
c7c4960e1e
Strict in_array checking
2023-05-30 15:01:24 -04:00
Brian Huisman
9e551324f3
data transferred
'sp_data_transferred' is now an ODATA variable.
2023-05-29 19:15:47 -04:00
Brian Huisman
44edb43dd2
Update admin.php
Change countup timer element from <span> to <time>. Separate time-since from duration.
2023-05-19 14:33:06 -04:00
Brian Huisman
08db10dcf5
Page titles and version numbers.
Make Administration UI page title the name of the page we are on. Don't need the version number here.
Display current version in the menu bar.
2023-05-19 13:55:34 -04:00
Brian Huisman
06c66f214a
Update search.php
Update to match similar cancel code in crawler.php.
Fix $reason typo.
2023-05-19 13:16:45 -04:00
Brian Huisman
956af77655
Update admin.php
Add a message when there are no pages in the Page Index to list so it's less confusing than just an empty table.
2023-05-19 12:47:23 -04:00
Brian Huisman
0728849ea4
Update search.php
Ensure that important or negative match strings are not empty.
2023-05-19 12:22:21 -04:00
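A minimal sketch of the check described in this commit, written in Python rather than the project's PHP. It assumes a leading `+` marks an important term and a leading `-` marks a negative term; that syntax and the function name `split_terms` are illustrative assumptions, not taken from search.php.

```python
def split_terms(query):
    """Split a query into important, negative and plain terms.

    Assumption for illustration: '+' prefixes an important term and
    '-' prefixes a negative term. Not the actual search.php logic.
    """
    important, negative, plain = [], [], []
    for token in query.split():
        if token[0] in "+-":
            body = token[1:]
            if not body:
                # A bare "+" or "-" has no match string, so skip it
                continue
            (important if token[0] == "+" else negative).append(body)
        else:
            plain.append(token)
    return important, negative, plain
```

Dropping the empty strings early means later matching code never has to compare page content against a zero-length pattern, which would otherwise match everything.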
Brian Huisman
b2091b8cfb
Update admin.php
Display the number of unique IPs in the Query Log.
2023-05-18 09:11:03 -04:00
Brian Huisman
f55f9e71b3
Use <mark> instead of <strong>
Also make it easier for a savvy user to use whatever HTML element they like for highlighting.
2023-05-17 14:21:37 -04:00
Brian Huisman
49ff8d6e4e
Stop the crawl time JS counter on crawl complete
2023-05-17 09:52:43 -04:00
Brian Huisman
0f7ea69790
Store s_weights as JSON
2023-05-17 09:22:00 -04:00
Brian Huisman
d8e9d5dc91
Admin UI edits for when crawl is in progress
Automatically encode/decode json when saving/reading ODATA config values.
Remove 'sp_links_crawled' config table value, now stored in 'sp_progress'.
Update Crawl Information window in real-time while crawler is running. Be more aggressive at reloading the page to get the latest data once a crawl has finished.
Time the setting of certain config values while crawling in a more sensible way.
2023-05-16 12:00:28 -04:00
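The automatic JSON encode/decode of config values could look roughly like the sketch below. It is in Python, not the project's PHP, and the `ConfigStore` class, its method names, and the shape of the stored value are hypothetical; only the key name `sp_progress` comes from the commit message.

```python
import json


class ConfigStore:
    """Toy key/value store that keeps every value as a JSON string.

    Hypothetical stand-in for a config database table.
    """

    def __init__(self):
        self._rows = {}  # name -> JSON text, as a database column would hold it

    def set(self, name, value):
        # Encode on write so callers can pass lists, dicts, numbers, etc.
        self._rows[name] = json.dumps(value)

    def get(self, name, default=None):
        # Decode on read so callers get the original structure back
        raw = self._rows.get(name)
        return default if raw is None else json.loads(raw)

    def raw(self, name):
        """Return the stored JSON text, for inspection."""
        return self._rows[name]
```

With encoding handled inside `set` and `get`, calling code never touches JSON directly:

```python
store = ConfigStore()
store.set("sp_progress", [3, 10])   # assumed shape: [done, total]
progress = store.get("sp_progress")
```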
Brian Huisman
f16c4f9e0a
Refactor character normalization
Refactor the whitespace and punctuation normalization arrays.
2023-05-12 13:41:36 -04:00
Brian Huisman
f7bc731ba2
Update config.php
Handle the case where the config table's sp_domains value is empty.
2023-05-12 10:29:14 -04:00
Brian Huisman
4bb28031b6
Enable downloading Page Index
Allow downloading the Page Index as a CSV file.
Remove the unnecessary database columns url_base and status_noindex.
Store the list of domains at crawl time so we don't need to request them on every page load. This change requires a fresh reinstall.
2023-05-12 10:06:57 -04:00
Brian Huisman
bab4a7e2c5
Update config.php
Remove 'a' from exception text. Just grammar.
2023-05-08 15:40:11 -04:00
Brian Huisman
d82ee666c7
Update admin.php
In the edge case where the same query is requested twice in the same second by different IPs, both would appear in the Query Log UI. Add a second GROUP BY to avoid this.
2023-05-08 07:32:39 -04:00
Brian Huisman
8a8623b440
Update config.php
Preliminary code to check for DB version
2023-05-05 12:58:28 -04:00
Brian Huisman
803155547d
Rename to sp_punct
Rename sp_smart ("smart" punctuation) to the more general and accurate sp_punct
2023-05-05 11:54:07 -04:00
Brian Huisman
635422b1d6
Punctuation normalization and MIME-type display
Disable Query log download button if query log is empty.
Further database error resiliency.
Add many more punctuation normalization characters; normalize on search as well as storage.
Add count of MIME-types in Search Management UI.
2023-05-05 11:17:39 -04:00
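To illustrate why normalizing on search as well as on storage matters, here is a small Python sketch (the project itself is PHP). The character table is a short illustrative sample, not the list used in the repository, and `normalize_punct` is a hypothetical name.

```python
# Map typographic punctuation to plain ASCII equivalents.
# Illustrative subset only; the real table is assumed to be much longer.
PUNCT_MAP = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2026": "...",  # horizontal ellipsis
})


def normalize_punct(text):
    """Return text with typographic punctuation replaced by ASCII."""
    return text.translate(PUNCT_MAP)
```

If the same function runs over page content before it is stored and over the query before it is matched, a query typed with a straight apostrophe finds content written with a curly one, and the reverse.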
Brian Huisman
e6777287d7
Update admin.php
Add scrolling for queries that are too wide for mobile.
2023-05-04 15:06:30 -04:00
Brian Huisman
31421e47bc
Update admin.php
Also add the same extra geoip check for the file download.
2023-05-03 11:41:25 -04:00
Brian Huisman
8caf5440d3
Update admin.php
Fix for geolocates that succeed but don't return a country code.
2023-05-03 11:32:24 -04:00
Brian Huisman
da235cdc95
Update admin.php
More filter row simplification
2023-05-01 11:40:32 -04:00
Brian Huisman
8a5f5965b0
Update admin.php
Tweak display of Filters row on Page Index
2023-05-01 11:18:49 -04:00
Brian Huisman
9734d0aa5a
Update admin.css
Add a box shadow to flags to make white-on-white more visible.
2023-04-28 14:11:31 -04:00
Brian Huisman
83f8fc9ed2
JavaScript crawl support enhancement
Don't require reloading the page after a crawl has completed.
JavaScript will dynamically update the Crawler Information values if we are on the Crawler Management page.
2023-04-28 13:55:26 -04:00
Brian Huisman
ddc601697c
Ping server to see if crawl has started
If admin UI is loaded while a crawl is not running, add a ping every 5 seconds to check if one has started. Fix issue where reloading the page while a crawl was running would cause a JS error that would cancel the crawl.
2023-04-28 12:26:58 -04:00
Brian Huisman
0c733426db
Update crawler.php
Use the previously crawled page's category value if available.
2023-04-27 13:22:20 -04:00
Brian Huisman
41f6b25f0f
Allow specifying Default Category
2023-04-27 13:10:22 -04:00
Brian Huisman
619e5b7f11
Documentation additions
2023-04-26 20:50:37 -04:00
Brian Huisman
8405e903c3
Update admin.php
Line up Page Index links a bit better
2023-04-26 20:25:56 -04:00
Brian Huisman
ba04173c29
Daily updates
Keep the Page Index pagination page within limits; add a UTF-8 BOM to CSV and TXT download output; use the utf8mb4_unicode_520_ci collation to remove the need for SQL REGEXP; add more Latin accent equivalent characters.
2023-04-26 15:16:13 -04:00
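For the UTF-8 BOM item, a minimal Python sketch of the idea (not the project's PHP code): prefixing the download with the bytes EF BB BF lets spreadsheet programs that would otherwise guess a legacy encoding detect UTF-8. The function name and the sample columns are assumptions.

```python
import csv
import io


def csv_with_bom(rows):
    """Serialize rows to CSV bytes prefixed with a UTF-8 byte order mark."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    # The "utf-8-sig" codec prepends the BOM when encoding
    return buffer.getvalue().encode("utf-8-sig")
```

A quick usage example with made-up data:

```python
payload = csv_with_bom([["URL", "Title"], ["https://example.com/", "Caf\u00e9"]])
has_bom = payload.startswith(b"\xef\xbb\xbf")
```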
Brian Huisman
761491c21a
Update admin.php
Make headers responsive too.
2023-04-25 15:14:18 -04:00
Brian Huisman
53e86085bc
Update crawler.php
Don't need trim() here. OS_cleanTextUTF8 already does it.
2023-04-25 13:35:22 -04:00
Brian Huisman
46f68d1335
Update config.ini.php
Add double underscore to default prefix so phpMyAdmin shows tables as a group.
2023-04-25 12:58:58 -04:00
Brian Huisman
8d091c8195
Update crawler.php
Add error condition for empty PDF, don't index.
2023-04-25 12:46:38 -04:00
Brian Huisman
a444d383da
Update admin.js
Disable "grep" when Run Crawler modal is raised. It will be re-enabled if the crawler is currently running.
2023-04-25 11:05:01 -04:00
Brian Huisman
2665cff354
Change If-Modified-Since calculation
Use the last_modified date of the individual file for the If-Modified-Since header instead of the date of the last successful crawl.
2023-04-25 10:01:53 -04:00
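The per-file If-Modified-Since idea can be sketched as follows, in Python rather than the crawler's PHP. The function name and the way the stored timestamp is passed in are assumptions; the header format itself is the standard HTTP date.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def conditional_headers(last_modified):
    """Build request headers from a file's own stored last_modified value.

    last_modified: timezone-aware UTC datetime, or None if the file
    has never been crawled (in which case no condition is sent).
    """
    if last_modified is None:
        return {}
    # usegmt=True produces the "... GMT" form that HTTP dates require
    return {"If-Modified-Since": format_datetime(last_modified, usegmt=True)}
```

Using each file's own date rather than the date of the last successful crawl means the server's 304 Not Modified answer is judged against what was actually stored for that file.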
Brian Huisman
b3b40a9194
Implement filetype: searching
2023-04-24 16:31:27 -04:00
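One way a `filetype:` operator might be pulled out of a query is sketched below in Python. The commit does not describe the actual parsing in search.php, so the regular expression, the function name, and the choice to lowercase the value are all assumptions.

```python
import re

# Match "filetype:xyz" at the start of the query or after whitespace
FILETYPE_RE = re.compile(r"(?:^|\s)filetype:(\S+)", re.IGNORECASE)


def extract_filetype(query):
    """Return (remaining_query, filetype or None).

    Only the first filetype: operator is extracted in this sketch.
    """
    match = FILETYPE_RE.search(query)
    if not match:
        return query.strip(), None
    # Cut the operator out and collapse any leftover whitespace
    rest = query[:match.start()] + " " + query[match.end():]
    return " ".join(rest.split()), match.group(1).lower()
```

The remaining text can then be searched as usual while the extracted value restricts results to a content type.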
Brian Huisman
fc968ae460
Simplify, add titles to Download buttons
2023-04-24 13:47:18 -04:00
Brian Huisman
150f98883d
$_SDATA['pages'] is always at least 1
2023-04-24 13:37:44 -04:00
Brian Huisman
0a1c1a52e1
Search for Latin accents explicitly via SQL REGEXP
2023-04-24 13:04:44 -04:00
Brian Huisman
8edc94b550
Allow search.php to unstick stuck crawls
2023-04-24 10:42:29 -04:00
Brian Huisman
0f69a2d2c8
Enable ligature / alternate-spelling matching
2023-04-24 09:52:05 -04:00
Brian Huisman
a6304d2f5d
Testing ligature search changes
2023-04-24 08:27:45 -04:00
Brian Huisman
ab7ad64ac1
Remove 'Allowed' lol
2023-04-22 21:50:07 -04:00
Brian Huisman
fed2b979e1
Add query length limit option
2023-04-22 21:48:43 -04:00
Brian Huisman
57ef2a6599
fclose
2023-04-21 16:25:28 -04:00