Brian Huisman
4caa178acc
Update admin.php
...
Tighten up the Offline javascript export statement
2023-06-14 15:16:03 -04:00
Brian Huisman
1ce34d9e41
Offline javascript output
...
Make the javascript output text into a Mustache template.
Add the jw_depth variable.
2023-06-14 14:43:55 -04:00
Brian Huisman
c069e44765
Unique IPs in modal
...
Add a listing for "Unique IPs" in the modal popup on the Query Log page.
2023-06-14 10:27:54 -04:00
Brian Huisman
fd2bbf745f
Add 'resumed' flag to sp_progress
...
Add a third value to the sp_progress config value to let the script know if a crawl was resumed or not.
Also restore the sp_sha1 data from the crawltemp table on a resumed crawl.
2023-06-12 12:19:00 -04:00
Brian Huisman
6d9d897784
Update pdfparser to 2.5.0
2023-06-12 09:05:03 -04:00
Brian Huisman
5cfeb0a414
Update crawler.php
...
Also rebuild the domains list if a crawl is resumed.
2023-06-08 09:03:45 -04:00
Brian Huisman
fbca9808a2
Update admin.php
...
Specify `t` table to avoid ambiguity
2023-06-07 12:21:56 -04:00
Brian Huisman
563eb6d014
Query log fixes, multibyte search restrict
...
Get rid of "avg_results" value; it's not intuitive. Instead make sure to use the results tally from the last recorded search query.
Use mb_strlen and mb_substr so that a single multibyte character, such as the bullet (•), is not treated as a searchable term.
2023-06-07 11:45:14 -04:00
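The multibyte fix in the commit above comes down to measuring a term in characters rather than bytes. A minimal sketch of that idea follows, written in Python rather than the project's PHP. The `MIN_TERM_CHARS` threshold and both function names are assumptions made for illustration; they are not taken from search.php.

```python
# Hypothetical minimum term length; the real value in Orcinus is not shown in this log.
MIN_TERM_CHARS = 2


def byte_length(term: str) -> int:
    """Length of the term in UTF-8 bytes (what a byte-oriented strlen would report)."""
    return len(term.encode("utf-8"))


def is_searchable(term: str, min_chars: int = MIN_TERM_CHARS) -> bool:
    """Accept a term only if it has enough characters, not merely enough bytes."""
    return len(term) >= min_chars


if __name__ == "__main__":
    bullet = "\u2022"  # the bullet character
    print(byte_length(bullet), len(bullet), is_searchable(bullet))
```

A byte-based check would let the bullet through because it occupies three bytes in UTF-8, while the character-based check rejects it as a single character.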
Brian Huisman
b9d0ff1665
Update search.php
...
Using INSTR was correctly matching searches for 'ae' to the ligature æ, but was not matching searches for plain 'a' to å. However, using LIKE behaves exactly the opposite of this. Unless there is a better solution, use both INSTR and LIKE to create the query so all bases are covered.
2023-06-07 08:52:59 -04:00
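The commit above combines INSTR and LIKE so that a row matches if either test succeeds. One way such a condition could be assembled is sketched below in Python rather than the project's PHP. The function name, the `%s` placeholder style, and the column name in the usage line are assumptions for illustration; the actual query in search.php may be built differently.

```python
from typing import List, Tuple


def build_match_clause(column: str, term: str) -> Tuple[str, List[str]]:
    """Return a SQL fragment and its parameters matching `term` via INSTR OR LIKE.

    `column` must be a trusted identifier; only `term` is passed as a parameter.
    """
    # Escape LIKE wildcards so the term is matched literally
    # (backslash is MySQL's default LIKE escape character).
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    sql = f"(INSTR({column}, %s) > 0 OR {column} LIKE %s)"
    return sql, [term, f"%{escaped}%"]


if __name__ == "__main__":
    clause, params = build_match_clause("content", "ae")
    print(clause, params)
```

Which of the two tests matches a given accented character or ligature depends on the database collation, so this sketch only shows the shape of the combined condition.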
Brian Huisman
87ecb553a7
Update crawler.php
...
Whoops, remove debug code.
2023-06-05 11:00:40 -04:00
Brian Huisman
783f1d97ca
Update crawler.php
...
Merge function of $updateNotModified SQL statement with $insertNotModified.
2023-06-05 10:58:32 -04:00
Brian Huisman
8b024c438c
Remove some unnecessary continues
...
Also add documentation for the crawler debug mode.
Scope fixes for JS output, still need to work on this.
2023-06-02 14:05:52 -04:00
Brian Huisman
56c84a89cb
Prevent endless loop
...
If an orphan URL is blocked by a user rule, then remove it from the 'sp_exist' list so it doesn't keep coming back again and again. This only happens the next crawl after the user adds new rules.
Other misc edits.
2023-06-01 12:20:09 -04:00
Brian Huisman
3f9d713633
Simplify logging
2023-05-30 16:12:01 -04:00
Brian Huisman
727936cb80
Update crawler.php
...
$url => $row['url'] fix, and a couple other tweaks.
2023-05-30 16:07:41 -04:00
Brian Huisman
89f6fc2393
Try to resume a failed crawl.
...
Attempt to resume a crawl if it exited without going through the shutdown function
2023-05-30 15:53:41 -04:00
Brian Huisman
c7c4960e1e
Strict in_array checking
2023-05-30 15:01:24 -04:00
Brian Huisman
9e551324f3
data transferred
...
'sp_data_transferred' is now an ODATA variable.
2023-05-29 19:15:47 -04:00
Brian Huisman
44edb43dd2
Update admin.php
...
Change countup timer element from <span> to <time>. Separate time-since from duration.
2023-05-19 14:33:06 -04:00
Brian Huisman
08db10dcf5
Page titles and version numbers.
...
Make Administration UI page title the name of the page we are on. Don't need the version number here.
Display current version in the menu bar.
2023-05-19 13:55:34 -04:00
Brian Huisman
06c66f214a
Update search.php
...
Update to match similar cancel code in crawler.php.
Fix $reason typo.
2023-05-19 13:16:45 -04:00
Brian Huisman
956af77655
Update admin.php
...
Add a message when there are no pages in the Page Index to list so it's less confusing than just an empty table.
2023-05-19 12:47:23 -04:00
Brian Huisman
0728849ea4
Update search.php
...
Ensure that important or negative match strings are not empty.
2023-05-19 12:22:21 -04:00
Brian Huisman
b2091b8cfb
Update admin.php
...
Add display of # of unique IPs in Query Log
2023-05-18 09:11:03 -04:00
Brian Huisman
f55f9e71b3
Use <mark> instead of <strong>
...
Also make it easier for a savvy user to use whatever HTML element they like for highlighting.
2023-05-17 14:21:37 -04:00
Brian Huisman
49ff8d6e4e
Stop the crawl time JS counter on crawl complete
2023-05-17 09:52:43 -04:00
Brian Huisman
0f7ea69790
Store s_weights as JSON
2023-05-17 09:22:00 -04:00
Brian Huisman
d8e9d5dc91
Admin UI edits for when crawl is in progress
...
Automatically encode/decode json when saving/reading ODATA config values.
Remove 'sp_links_crawled' config table value, now stored in 'sp_progress'.
Update Crawl Information window in real-time while crawler is running. Be more aggressive at reloading the page to get the latest data once a crawl has finished.
Time the setting of certain config values while crawling in a more sensible way.
2023-05-16 12:00:28 -04:00
Brian Huisman
f16c4f9e0a
Refactor character normalization
...
Refactor the whitespace and punctuation normalization arrays.
2023-05-12 13:41:36 -04:00
Brian Huisman
f7bc731ba2
Update config.php
...
Handle the case where the config table sp_domains value is empty.
2023-05-12 10:29:14 -04:00
Brian Huisman
4bb28031b6
Enable downloading Page Index
...
Allow downloading of the page index as a csv.
Remove unnecessary database columns url_base and status_noindex
Store list of domains at crawl so we don't need to request them every page-load; you will need to reinstall fresh because of this change
2023-05-12 10:06:57 -04:00
Brian Huisman
bab4a7e2c5
Update config.php
...
Remove 'a' from exception text. Just grammar.
2023-05-08 15:40:11 -04:00
Brian Huisman
d82ee666c7
Update admin.php
...
In the edge case where the same query is requested twice in the same second by different IPs, both would appear in the Query Log UI. Add a second GROUP BY to avoid this.
2023-05-08 07:32:39 -04:00
Brian Huisman
8a8623b440
Update config.php
...
Preliminary code to check for DB version
2023-05-05 12:58:28 -04:00
Brian Huisman
803155547d
Rename to sp_punct
...
Rename sp_smart ("smart" punctuation) to the more general and accurate sp_punct
2023-05-05 11:54:07 -04:00
Brian Huisman
635422b1d6
Punctuation normalization and MIME-type display
...
Disable Query log download button if query log is empty.
Further database error resiliency.
Add many more punctuation normalization characters; normalize on search as well as storage.
Add count of MIME-types in Search Management UI.
2023-05-05 11:17:39 -04:00
Brian Huisman
e6777287d7
Update admin.php
...
Add scrolling for queries that are too wide for mobile.
2023-05-04 15:06:30 -04:00
Brian Huisman
31421e47bc
Update admin.php
...
Also add same geoip extra check for file download
2023-05-03 11:41:25 -04:00
Brian Huisman
8caf5440d3
Update admin.php
...
Fix for geolocates that succeed but don't return a country code.
2023-05-03 11:32:24 -04:00
Brian Huisman
da235cdc95
Update admin.php
...
More filter row simplification
2023-05-01 11:40:32 -04:00
Brian Huisman
8a5f5965b0
Update admin.php
...
Tweak display of Filters row on Page Index
2023-05-01 11:18:49 -04:00
Brian Huisman
9734d0aa5a
Update admin.css
...
Add a box shadow to flags to make white-on-white more visible.
2023-04-28 14:11:31 -04:00
Brian Huisman
83f8fc9ed2
Javascript crawl support enhancement
...
Don't require reloading the page after a crawl has completed.
Javascript will dynamically update the Crawler Information values if we are on the Crawler Management page.
2023-04-28 13:55:26 -04:00
Brian Huisman
ddc601697c
Ping server to see if crawl has started
...
If admin UI is loaded while a crawl is not running, add a ping every 5 seconds to check if one has started. Fix issue where reloading the page while a crawl was running would cause a JS error that would cancel the crawl.
2023-04-28 12:26:58 -04:00
Brian Huisman
0c733426db
Update crawler.php
...
Use the previously crawled page's category value if available.
2023-04-27 13:22:20 -04:00
Brian Huisman
41f6b25f0f
Allow specifying Default Category
2023-04-27 13:10:22 -04:00
Brian Huisman
8405e903c3
Update admin.php
...
Line up Page Index links a bit better
2023-04-26 20:25:56 -04:00
Brian Huisman
ba04173c29
Daily updates
...
Keep Page Index pagination page within limits; add UTF-8 BOM to CSV and TXT download output; use utf8mb4_unicode_520_ci collation to remove need for SQL REGEXP; add more Latin accent equivalent characters.
2023-04-26 15:16:13 -04:00
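One item in the commit above is adding a UTF-8 byte order mark (BOM) to downloaded CSV and TXT output. Some spreadsheet applications use the BOM to detect UTF-8. A minimal Python sketch of the idea follows; it is not the project's PHP code, and the function name and sample rows are assumptions for illustration.

```python
import csv
import io
from typing import Iterable, Sequence


def csv_with_bom(rows: Iterable[Sequence[str]]) -> bytes:
    """Serialize rows as CSV and return UTF-8 bytes prefixed with a BOM."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    # The "utf-8-sig" codec prepends the three BOM bytes EF BB BF when encoding.
    return buf.getvalue().encode("utf-8-sig")


if __name__ == "__main__":
    data = csv_with_bom([["url", "title"], ["https://example.com/", "Café"]])
    print(data[:3])
```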
Brian Huisman
761491c21a
Update admin.php
...
Make headers responsive too.
2023-04-25 15:14:18 -04:00
Brian Huisman
53e86085bc
Update crawler.php
...
Don't need trim() here. OS_cleanTextUTF8 already does it.
2023-04-25 13:35:22 -04:00
Brian Huisman
46f68d1335
Update config.ini.php
...
Add double underscore to default prefix so phpMyAdmin shows tables as a group.
2023-04-25 12:58:58 -04:00
Brian Huisman
8d091c8195
Update crawler.php
...
Add error condition for empty PDF, don't index.
2023-04-25 12:46:38 -04:00
Brian Huisman
a444d383da
Update admin.js
...
Disable "grep" when Run Crawler modal is raised. It will be re-enabled if the crawler is currently running.
2023-04-25 11:05:01 -04:00
Brian Huisman
2665cff354
Change If-Modified-Since calculation
...
Use the last_modified date of the individual file for the If-Modified-Since header instead of the date of the last successful crawl.
2023-04-25 10:01:53 -04:00
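The commit above changes the conditional request so that each URL is checked against its own stored modification time. That idea can be sketched as follows, in Python rather than the project's PHP. The function name and the use of a Unix timestamp for the stored value are assumptions for illustration; the real column type in the crawler's database is not shown in this log.

```python
from email.utils import formatdate
from typing import Dict


def if_modified_since_header(last_modified_epoch: float) -> Dict[str, str]:
    """Build an If-Modified-Since header from one file's stored modification time."""
    # usegmt=True yields the HTTP date format, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
    return {"If-Modified-Since": formatdate(last_modified_epoch, usegmt=True)}


if __name__ == "__main__":
    print(if_modified_since_header(0))
```

A server that honors the header can answer 304 Not Modified for an unchanged file, letting the crawler skip the download.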
Brian Huisman
b3b40a9194
Implement filetype: searching
2023-04-24 16:31:27 -04:00
Brian Huisman
fc968ae460
Simplify, add titles to Download buttons
2023-04-24 13:47:18 -04:00
Brian Huisman
150f98883d
$_SDATA['pages'] is always at least 1
2023-04-24 13:37:44 -04:00
Brian Huisman
0a1c1a52e1
Search for Latin accents explicitly via SQL REGEXP
2023-04-24 13:04:44 -04:00
Brian Huisman
8edc94b550
Allow search.php to unstick stuck crawls
2023-04-24 10:42:29 -04:00
Brian Huisman
0f69a2d2c8
Enable ligature / alternate-spelling matching
2023-04-24 09:52:05 -04:00
Brian Huisman
a6304d2f5d
Testing ligature search changes
2023-04-24 08:27:45 -04:00
Brian Huisman
ab7ad64ac1
Remove 'Allowed' lol
2023-04-22 21:50:07 -04:00
Brian Huisman
fed2b979e1
Add query length limit option
2023-04-22 21:48:43 -04:00
Brian Huisman
57ef2a6599
fclose
2023-04-21 16:25:28 -04:00
Brian Huisman
8d99e4fd41
Enable downloading Query Log CSV
2023-04-21 11:27:46 -04:00
Brian Huisman
daaf934e33
REGEXP_REPLACE
...
Use REGEXP_REPLACE to capture all leading punctuation, not just ' and "
2023-04-20 22:35:32 -04:00
Brian Huisman
47e0173a1d
Maybe let Mustache do the work here
2023-04-20 16:18:51 -04:00
Brian Huisman
2013f64a39
Use raw title for JSON (typeahead) output
2023-04-20 16:11:56 -04:00
Brian Huisman
1d5204caeb
Also trim ' from queries for ordering
2023-04-20 14:59:37 -04:00
Brian Huisman
c81e043e7f
Minor admin.php tweaks
2023-04-20 14:45:48 -04:00
Brian Huisman
b4ef2827ae
Fix test for 'geo' key
2023-04-20 12:09:55 -04:00
Brian Huisman
8768cae733
Fresh install update
2023-04-20 11:20:10 -04:00
Brian Huisman
cac7e90930
Change $_TEMPLATE to $_ORCINUS
2023-04-20 11:03:08 -04:00
Brian Huisman
84e38a5663
Re-upload 3rd party libraries
2023-04-20 10:47:11 -04:00
Brian Huisman
4f459b61d2
Delete all included repos for reupload
2023-04-20 10:20:38 -04:00
Brian Huisman
358fa42aee
Update crawler.php
2023-04-19 16:23:42 -04:00
Brian Huisman
1363370840
Fix for dynamic classes deprecation in PHP 8.2
2023-04-19 11:50:48 -04:00
Brian Huisman
11afbe12e8
Update admin.php
2023-04-18 19:20:33 -04:00
Brian Huisman
6c78df9d92
Choose entity flag based on DOCTYPE
2023-04-18 18:36:11 -04:00
Brian Huisman
ec2b7aa075
Daily updates, big flow change in crawler.php
2023-04-18 17:20:27 -04:00
Brian Huisman
a57cb3ca83
Proper breaks in switch
2023-04-17 18:48:02 -04:00
Brian Huisman
553fc019fe
Daily update
2023-04-17 17:47:22 -04:00
Brian Huisman
3b5a22794c
Update config.php
2023-04-13 11:44:52 -04:00
Brian Huisman
c888c40e0d
Merge branch 'main' of https://github.com/GreyWyvern/orcinus-search
2023-04-13 11:11:56 -04:00
Brian Huisman
dd2de5ff46
Update config.php
2023-04-13 11:11:53 -04:00
Brian Huisman
e05fa2ebca
Update README.md
2023-04-13 11:09:12 -04:00
Brian Huisman
e5cb428d62
Update README.md
2023-04-13 11:08:43 -04:00
Brian Huisman
15028bd5a0
Update and rename README.txt to README.md
2023-04-13 11:08:25 -04:00
Brian Huisman
17fa8fae05
Tighten up file headings
2023-04-13 08:27:41 -04:00
Brian Huisman
b04facafc2
Update search.css
2023-04-13 08:03:25 -04:00
Brian Huisman
1076b80ef9
Better targeting for typeahead CSS
2023-04-12 21:32:06 -04:00
Brian Huisman
f37a732ee9
ARIA search updates
2023-04-12 21:08:53 -04:00
Brian Huisman
519ba2dda6
Add typeahead class to default template
2023-04-12 19:13:33 -04:00
Brian Huisman
062f009829
Updates for the day
2023-04-12 19:08:00 -04:00
Brian Huisman
f152264323
Update config.ini.php
2023-04-12 08:34:37 -04:00
Brian Huisman
9ca36b8a0a
Add back GeoIP2 directory and README
2023-04-12 08:32:40 -04:00
Brian Huisman
431d9bda0c
Delete to recreate GeoIP2 directory
2023-04-12 08:31:58 -04:00
Brian Huisman
5b97f545df
Capitalize GeoIP2
2023-04-12 08:30:31 -04:00
Brian Huisman
595740962e
Update name to Orcinus
2023-04-12 08:28:29 -04:00
Brian Huisman
bffa144421
move os3/ to orcinus/
2023-04-12 08:08:11 -04:00