beenull/orcinus-search

Author	SHA1	Message	Date
Brian Huisman	1860d1f8ce	Totally forgot to actually implement this feature The "remove text from titles" feature was coded into the admin UI from the previous version, but was never actually implemented in the crawler. Wow. It works now.	2023-09-27 15:33:06 -04:00
Brian Huisman	6c961d44a3	Add Query Log row display limit	2023-09-25 11:53:20 -04:00
Brian Huisman	f8bed73c26	Responsive pagination bar on Page Index	2023-09-15 10:32:59 -04:00
Brian Huisman	873a18fbc9	Add text fragments flag and functionality	2023-09-14 13:18:33 -04:00
Brian Huisman	da52e0f7bf	Add header image and link Add a nice Orcinus header image with a link to https://greywyvern.com/orcinus/ Eventually this might link to online documentation or something? Move the show-page-titles checkbox from being created by javascript to actually being in the HTML. Unnecessary JS complexity removed. Add a popper tooltip	2023-09-14 10:33:59 -04:00
Brian Huisman	4c78a5245f	Use REPLACE INTO for resiliency	2023-09-12 10:44:14 -04:00
Brian Huisman	d4e0e409fe	Show Page Titles checkbox on Page Index Add a checkbox to enable and disable showing page titles along with the URLs in the Page Index. The status of this checkbox is saved during the admin session. Defaults to 'off'.	2023-09-11 13:32:45 -04:00
Brian Huisman	dd88459d04	Update admin.php PDF Last Modified actually attempts to use "SourceModified" first, then "CreationDate" and lastly "Last Modified". Adjust the tooltip to better describe this.	2023-09-11 12:18:33 -04:00
Brian Huisman	511207e0b2	Add PDF Last Modified multiplier	2023-09-08 15:11:27 -04:00
Brian Huisman	302c8db00e	Group statements Group several statements into single statements. You might not like it, but this makes me happy. :)	2023-07-21 14:17:15 -04:00
Brian Huisman	382511077a	Misc updates Prettify some SQL code. Add some error-response code for fatal failed SQL statements.	2023-07-21 13:04:51 -04:00
Brian Huisman	edf2dc338c	admin and pdfparser updates Add some tooltip text to some elements in admin.php Merge recent PdfParser updates into the library	2023-07-11 10:24:55 -04:00
Brian Huisman	229129a9e4	Update crawler.php Get and set sp_crawling in real-time to minimize race conditions.	2023-07-06 15:09:31 -04:00
Brian Huisman	181addfd3d	Update PDFDocEncoding.php Add Yours Truly as the author of this file. I'm humble!	2023-07-04 14:06:28 -04:00
Brian Huisman	a5ff604f58	Update crawler.php Update crawler.php to also try using XMP metadata from updated PDFParser	2023-07-04 13:46:12 -04:00
Brian Huisman	06cc7fe325	PdfParser update This update adds XMP metadata and PDFDocEncoding support for regular metadata.	2023-07-04 13:08:42 -04:00
Brian Huisman	30630c6c60	Start enforcing PHP and SQL version limitations.	2023-06-26 15:00:42 -04:00
Brian Huisman	5a39280858	PdfParser PHP-CS-Fixer updates	2023-06-23 12:23:20 -04:00
Brian Huisman	3307baac4d	Update crawler.php Run mb_convert_encoding in ALL cases to remove potentially invalid UTF-8 characters. Add the "replacement" UTF-8 character to the whitespace array to ensure it's removed.	2023-06-22 15:35:40 -04:00
Brian Huisman	b12e7991e0	Update PdfParser to latest snapshot Update PdfParser to the latest snapshot from the repo. Add code to allow PdfParser to decode XMP Metadata from PDF files, preferring it over other decoded data.	2023-06-22 15:33:53 -04:00
Brian Huisman	47562e0a71	Add 'online' value for Mustache template Provide an 'online' value to the Search Result Mustache template. This will, for example, allow you to put things in your Search Result template that will show up when your site is displayed live (PHP), but will not be output when your site is displayed using the offline Javascript, and vice versa. e.g. {{#online}} This will only display in your template if it's online. {{/online}} {{^online}} This will only display in your template if it's offline. {{/online}}	2023-06-22 09:57:33 -04:00
Brian Huisman	042339d3ef	Update crawler.php Don't assume that other data from a PDF is the same as the content. Bypasses some still-unfixed PDFParser encoding issues. Also exit the crawler script if we are in debug mode and there is a crawl already running.	2023-06-21 17:23:08 -04:00
Brian Huisman	eda57224d9	Remove need for 'jw_depth' value By using the location of the search.js script file, we can determine the root URL of an offline installation as long as the online script has been installed at https://example.com/orcinus/js/search.js	2023-06-21 15:07:57 -04:00
Brian Huisman	b17a68c175	Update template.offline.js Quote jw_depth string.	2023-06-21 12:09:28 -04:00
Brian Huisman	675b25b1e4	Update admin.php Forgot a [0] dangit.	2023-06-19 12:31:47 -04:00
Brian Huisman	930d4fa793	Update admin.php Test user-supplied regular expression matches for validity before saving.	2023-06-19 12:12:58 -04:00
Brian Huisman	0a83546411	Update crawler.php Make sure regexp lines in require and ignore URL fields are actually treated as regexps.	2023-06-19 11:51:57 -04:00
Brian Huisman	e9c0654295	Update search.js Tweak the comment	2023-06-16 14:47:51 -04:00
Brian Huisman	54bbbb6a65	Log clicked search suggestions If the search UI is using typeahead and the user selects a suggested option to go right to a page, then a search is never logged as a search query; it's like the search never happened. Add a fetch request to log the search query just before sending the user on their way to the page.	2023-06-16 14:38:24 -04:00
Brian Huisman	8ed10eac36	Tweaks Fix text1/text2 specification in page index SQL query. Add eszett also to single 's' for replacement.	2023-06-16 13:33:15 -04:00
Brian Huisman	e76fdf730c	s_show_orphans cleanup Make 's_show_orphans' a runtime variable and normalize the SQL queries it's used in. Also change generic '$select' variable to more semantic '$crawldata'.	2023-06-15 10:19:05 -04:00
Brian Huisman	e440babc38	Directly reference jw_depth Don't depend on the id="os_results" element existing in the user template, just use os_odata.jw_depth directly.	2023-06-15 09:48:28 -04:00
Brian Huisman	a489fb1b8e	sp_smart => sp_punct Change sp_smart to sp_punct also in the offline javascript template.	2023-06-14 15:39:30 -04:00
Brian Huisman	4caa178acc	Update admin.php Tighten up the Offline javascript export statement	2023-06-14 15:16:03 -04:00
Brian Huisman	1ce34d9e41	Offline javascript output Make the javascript output text into a Mustache template. Add the jw_depth variable.	2023-06-14 14:43:55 -04:00
Brian Huisman	c069e44765	Unique IPs in modal Add a listing for "Unique IPs" in the modal popup on the Query Log page.	2023-06-14 10:27:54 -04:00
Brian Huisman	fd2bbf745f	Add 'resumed' flag to sp_progress Add a third value to the sp_progress config value to let the script know if a crawl was resumed or not. Also restore the sp_sha1 data from the crawltemp table on a resumed crawl.	2023-06-12 12:19:00 -04:00
Brian Huisman	6d9d897784	Update pdfparser to 2.5.0	2023-06-12 09:05:03 -04:00
Brian Huisman	5cfeb0a414	Update crawler.php Also rebuild the domains list if a crawl is resumed.	2023-06-08 09:03:45 -04:00
Brian Huisman	fbca9808a2	Update admin.php Specify `t` table to avoid ambiguity	2023-06-07 12:21:56 -04:00
Brian Huisman	563eb6d014	Query log fixes, multibyte search restrict Get rid of "avg_results" value; it's not intuitive. Instead make sure to use the results tally from the last recorded search query. Use mb_strlen and mb_substr to avoid searching for single, but multibyte characters like bullet (•).	2023-06-07 11:45:14 -04:00
Brian Huisman	b9d0ff1665	Update search.php Using INSTR was correctly matching searches for 'ae' to the ligature æ, but was not matching searches for plain 'a' to å. However, using LIKE behaves exactly the opposite of this. Unless there is a better solution, use both INSTR and LIKE to create the query so all bases are covered.	2023-06-07 08:52:59 -04:00
Brian Huisman	87ecb553a7	Update crawler.php Whoops, remove debug code.	2023-06-05 11:00:40 -04:00
Brian Huisman	783f1d97ca	Update crawler.php Merge function of $updateNotModified SQL statement with $insertNotModified.	2023-06-05 10:58:32 -04:00
Brian Huisman	8b024c438c	Remove some unnecessary continues Also add documentation for the crawler debug mode. Scope fixes for JS output, still need to work on this.	2023-06-02 14:05:52 -04:00
Brian Huisman	56c84a89cb	Prevent endless loop If an orphan URL is blocked by a user rule, then remove it from the 'sp_exist' list so it doesn't keep coming back again and again. This only happens the next crawl after the user adds new rules. Other misc edits.	2023-06-01 12:20:09 -04:00
Brian Huisman	3f9d713633	Simplify logging	2023-05-30 16:12:01 -04:00
Brian Huisman	727936cb80	Update crawler.php $url => $row['url'] fix, and a couple other tweaks.	2023-05-30 16:07:41 -04:00
Brian Huisman	89f6fc2393	Try to resume a failed crawl. Attempt to resume a crawl if it exited without going through the shutdown function	2023-05-30 15:53:41 -04:00
Brian Huisman	c7c4960e1e	Strict in_array checking	2023-05-30 15:01:24 -04:00
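
Commit eda57224d9 above describes finding the root URL of an offline installation from the location of the search.js script, provided the script sits at /orcinus/js/search.js under that root. The following is only a rough sketch of that idea, not the code from the repository: the function name `rootFromScriptUrl` and the constant `SCRIPT_SUFFIX` are hypothetical, and the only assumption taken from the log is the script path.

```javascript
// Hypothetical sketch: derive the installation root from the URL of search.js.
// Assumption (from the commit message): the script lives at
// <root>/orcinus/js/search.js. Names below are illustrative only.
const SCRIPT_SUFFIX = '/orcinus/js/search.js';

function rootFromScriptUrl(scriptUrl) {
  const url = new URL(scriptUrl);

  // If the script is not at the expected location, the root cannot be inferred.
  if (!url.pathname.endsWith(SCRIPT_SUFFIX)) {
    return null;
  }

  // Drop the known suffix; whatever precedes it is the root path.
  const basePath = url.pathname.slice(0, -SCRIPT_SUFFIX.length);

  // Build the result from protocol and host rather than url.origin,
  // because origin is the string "null" for file: URLs.
  return url.protocol + '//' + url.host + basePath + '/';
}
```

In a browser, the input would typically be `document.currentScript.src`, read while the script is first executing. Matching on a fixed suffix rather than counting directory levels is what would make a separate depth value such as 'jw_depth' unnecessary.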

160 commits