beenull/orcinus-search

Author	SHA1	Message	Date
Brian Huisman	5cb7c372fb	Couple misc fixes Change element where some classes are applied in admin.php to work with updated Bootstrap. Add the fi, ff, fl, ffi, ffl series of ligatures to $_RDATA['s_latin'] as they are common in PDFs.	2023-09-29 12:57:23 -04:00
Brian Huisman	dee454cb8c	Update Bootstrap and jQuery: Bootstrap => v5.3.2, jQuery => v3.7.1	2023-09-28 11:37:46 -04:00
Brian Huisman	6c961d44a3	Add Query Log row display limit	2023-09-25 11:53:20 -04:00
Brian Huisman	f8bed73c26	Responsive pagination bar on Page Index	2023-09-15 10:32:59 -04:00
Brian Huisman	873a18fbc9	Add text fragments flag and functionality	2023-09-14 13:18:33 -04:00
Brian Huisman	da52e0f7bf	Add header image and link Add a nice Orcinus header image with a link to https://greywyvern.com/orcinus/ Eventually this might link to online documentation or something? Move the show-page-titles checkbox from being created by javascript to actually being in the HTML. Unnecessary JS complexity removed. Add a popper tooltip	2023-09-14 10:33:59 -04:00
Brian Huisman	d4e0e409fe	Show Page Titles checkbox on Page Index Add a checkbox to enable and disable showing page titles along with the URLs in the Page Index. The status of this checkbox is saved during the admin session. Defaults to 'off'.	2023-09-11 13:32:45 -04:00
Brian Huisman	dd88459d04	Update admin.php PDF Last Modified actually attempts to use "SourceModified" first, then "CreationDate" and lastly "Last Modified". Adjust the tooltip to better describe this.	2023-09-11 12:18:33 -04:00
Brian Huisman	511207e0b2	Add PDF Last Modified multiplier	2023-09-08 15:11:27 -04:00
Brian Huisman	302c8db00e	Group statements Group several statements into single statements. You might not like it, but this makes me happy. :)	2023-07-21 14:17:15 -04:00
Brian Huisman	edf2dc338c	admin and pdfparser updates: Add some tooltip text to some elements in admin.php. Merge recent PdfParser updates into the library.	2023-07-11 10:24:55 -04:00
Brian Huisman	eda57224d9	Remove need for 'jw_depth' value By using the location of the search.js script file, we can determine the root URL of an offline installation as long as the online script has been installed at https://example.com/orcinus/js/search.js	2023-06-21 15:07:57 -04:00
Brian Huisman	675b25b1e4	Update admin.php Forgot a [0] dangit.	2023-06-19 12:31:47 -04:00
Brian Huisman	930d4fa793	Update admin.php Test user-supplied regular expression matches for validity before saving.	2023-06-19 12:12:58 -04:00
Brian Huisman	8ed10eac36	Tweaks Fix text1/text2 specification in page index SQL query. Add eszett also to single 's' for replacement.	2023-06-16 13:33:15 -04:00
Brian Huisman	e76fdf730c	s_show_orphans cleanup Make 's_show_orphans' a runtime variable and normalize the SQL queries it's used in. Also change generic '$select' variable to more semantic '$crawldata'.	2023-06-15 10:19:05 -04:00
Brian Huisman	a489fb1b8e	sp_smart => sp_punct Change sp_smart to sp_punct also in the offline javascript template.	2023-06-14 15:39:30 -04:00
Brian Huisman	4caa178acc	Update admin.php Tighten up the Offline javascript export statement	2023-06-14 15:16:03 -04:00
Brian Huisman	1ce34d9e41	Offline javascript output Make the javascript output text into a Mustache template. Add the jw_depth variable.	2023-06-14 14:43:55 -04:00
Brian Huisman	c069e44765	Unique IPs in modal Add a listing for "Unique IPs" in the modal popup on the Query Log page.	2023-06-14 10:27:54 -04:00
Brian Huisman	fd2bbf745f	Add 'resumed' flag to sp_progress Add a third value to the sp_progress config value to let the script know if a crawl was resumed or not. Also restore the sp_sha1 data from the crawltemp table on a resumed crawl.	2023-06-12 12:19:00 -04:00
Brian Huisman	fbca9808a2	Update admin.php Specify `t` table to avoid ambiguity	2023-06-07 12:21:56 -04:00
Brian Huisman	563eb6d014	Query log fixes, multibyte search restrict Get rid of "avg_results" value; it's not intuitive. Instead make sure to use the results tally from the last recorded search query. Use mb_strlen and mb_substr to avoid searching for single, but multibyte characters like bullet (•).	2023-06-07 11:45:14 -04:00
Brian Huisman	8b024c438c	Remove some unnecessary continues Also add documentation for the crawler debug mode. Scope fixes for JS output, still need to work on this.	2023-06-02 14:05:52 -04:00
Brian Huisman	56c84a89cb	Prevent endless loop If an orphan URL is blocked by a user rule, then remove it from the 'sp_exist' list so it doesn't keep coming back again and again. This only happens the next crawl after the user adds new rules. Other misc edits.	2023-06-01 12:20:09 -04:00
Brian Huisman	c7c4960e1e	Strict in_array checking	2023-05-30 15:01:24 -04:00
Brian Huisman	44edb43dd2	Update admin.php Change countup timer element from <span> to <time>. Separate time-since from duration.	2023-05-19 14:33:06 -04:00
Brian Huisman	08db10dcf5	Page titles and version numbers. Make Administration UI page title the name of the page we are on. Don't need the version number here. Display current version in the menu bar.	2023-05-19 13:55:34 -04:00
Brian Huisman	956af77655	Update admin.php Add a message when there are no pages in the Page Index to list so it's less confusing than just an empty table.	2023-05-19 12:47:23 -04:00
Brian Huisman	b2091b8cfb	Update admin.php Add display of # of unique IPs in Query Log	2023-05-18 09:11:03 -04:00
Brian Huisman	0f7ea69790	Store s_weights as JSON	2023-05-17 09:22:00 -04:00
Brian Huisman	d8e9d5dc91	Admin UI edits for when crawl is in progress Automatically encode/decode json when saving/reading ODATA config values. Remove 'sp_links_crawled' config table value, now stored in 'sp_progress'. Update Crawl Information window in real-time while crawler is running. Be more aggressive at reloading the page to get the latest data once a crawl has finished. Time the setting of certain config values while crawling in a more sensible way.	2023-05-16 12:00:28 -04:00
Brian Huisman	4bb28031b6	Enable downloading Page Index: Allow downloading of the page index as a CSV. Remove unnecessary database columns url_base and status_noindex. Store list of domains at crawl so we don't need to request them every page-load; you will need to reinstall fresh because of this change.	2023-05-12 10:06:57 -04:00
Brian Huisman	d82ee666c7	Update admin.php In the edge case where the same query is requested twice in the same second by different IPs, both would appear in the Query Log UI. Add a second GROUP BY to avoid this.	2023-05-08 07:32:39 -04:00
Brian Huisman	803155547d	Rename to sp_punct Rename sp_smart ("smart" punctuation) to the more general and accurate sp_punct	2023-05-05 11:54:07 -04:00
Brian Huisman	635422b1d6	Punctuation normalization and MIME-type display Disable Query log download button if query log is empty. Further database error resiliency. Add many more punctuation normalization characters; normalize on search as well as storage. Add count of MIME-types in Search Management UI.	2023-05-05 11:17:39 -04:00
Brian Huisman	e6777287d7	Update admin.php Add scrolling for queries that are too wide for mobile.	2023-05-04 15:06:30 -04:00
Brian Huisman	31421e47bc	Update admin.php Also add same geoip extra check for file download	2023-05-03 11:41:25 -04:00
Brian Huisman	8caf5440d3	Update admin.php Fix for geolocates that succeed but don't return a country code.	2023-05-03 11:32:24 -04:00
Brian Huisman	da235cdc95	Update admin.php More filter row simplification	2023-05-01 11:40:32 -04:00
Brian Huisman	8a5f5965b0	Update admin.php Tweak display of Filters row on Page Index	2023-05-01 11:18:49 -04:00
Brian Huisman	83f8fc9ed2	Javascript crawl support enhancement Don't require reloading the page after a crawl has completed. Javascript will dynamically update the Crawler Information values if we are on the Crawler Management page.	2023-04-28 13:55:26 -04:00
Brian Huisman	41f6b25f0f	Allow specifying Default Category	2023-04-27 13:10:22 -04:00
Brian Huisman	8405e903c3	Update admin.php Line up Page Index links a bit better	2023-04-26 20:25:56 -04:00
Brian Huisman	ba04173c29	Daily updates Keep Page Index pagination page within limits; add UTF-8 BOM to CSV and TXT download output; use utf8mb4_unicode_520_ci collation to remove need for SQL REGEXP; add more latin accent equivalent characters.	2023-04-26 15:16:13 -04:00
Brian Huisman	761491c21a	Update admin.php Make headers responsive too.	2023-04-25 15:14:18 -04:00
Brian Huisman	b3b40a9194	Implement filetype: searching	2023-04-24 16:31:27 -04:00
Brian Huisman	fc968ae460	Simplify, add titles to Download buttons	2023-04-24 13:47:18 -04:00
Brian Huisman	ab7ad64ac1	Remove 'Allowed' lol	2023-04-22 21:50:07 -04:00
Brian Huisman	fed2b979e1	Add query length limit option	2023-04-22 21:48:43 -04:00
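
Commit 563eb6d014 above mentions switching to mb_strlen and mb_substr so that a single multibyte character such as the bullet (•) is not treated as a searchable term. The following is a minimal sketch of that idea in Python, not the project's PHP code; the function name `is_single_character` is hypothetical. It shows why a byte-based length test would miss such a query while a character-based one catches it.

```python
def is_single_character(query: str) -> bool:
    """Return True when the trimmed query is exactly one character long.

    Hypothetical helper: counts characters (code points), not bytes.
    """
    return len(query.strip()) == 1


bullet = "\u2022"  # the bullet character mentioned in the commit message

# A byte-oriented length sees three units for this one character ...
byte_length = len(bullet.encode("utf-8"))

# ... while a character-oriented length sees one.
char_length = len(bullet)

print(byte_length, char_length, is_single_character(bullet))
```

A check written against the byte length would let the bullet through as if it were a three-letter term, which appears to be the case the commit guards against.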

67 commits
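
Commit dd88459d04 describes the order in which the PDF "Last Modified" value is looked up: "SourceModified" first, then "CreationDate", and lastly "Last Modified". As an illustration only, that fallback order might be sketched in Python as below. The function name `pick_pdf_modified`, the constant `PDF_DATE_KEYS`, and the assumption that the metadata arrives as a plain string-to-string mapping are all hypothetical and not taken from admin.php.

```python
from typing import Mapping, Optional

# Lookup order as stated in the commit message.
PDF_DATE_KEYS = ("SourceModified", "CreationDate", "Last Modified")


def pick_pdf_modified(metadata: Mapping[str, str]) -> Optional[str]:
    """Return the first non-empty date value found, in the stated order.

    Hypothetical helper: returns None when none of the keys is usable.
    """
    for key in PDF_DATE_KEYS:
        value = metadata.get(key)
        if value:
            return value
    return None


# Made-up metadata values, purely for illustration.
example = {"CreationDate": "2023-01-01", "Last Modified": "2023-02-02"}
print(pick_pdf_modified(example))
```

Keeping the key order in one tuple means the priority can be read, and changed, in a single place.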