Commit Graph

512 Commits

Author SHA1 Message Date
Daoud Clarke f0592f99df Require a curated boolean flag 2023-04-13 06:27:51 +01:00
Daoud Clarke 759dbf07b9 Revert index 2023-04-13 05:37:43 +01:00
Daoud Clarke 00b5438492 Track curated items in the index 2023-04-09 06:26:23 +01:00
Daoud Clarke a87d3d6def Store curated pages in the index 2023-04-09 05:31:23 +01:00
Daoud Clarke 61cdd4dd71 Merge branch 'main' into user-registration 2023-04-01 07:17:29 +01:00
Daoud Clarke 3e1f5da28e Off by one error with page size 2023-04-01 06:40:03 +01:00
Daoud Clarke 91269d5100 Handle a bad batch 2023-04-01 06:35:44 +01:00
Rishabh Singh Ahluwalia e9dfd40ecb Merge pull request #98 from mwmbl/rishabh-fix-trim-data: Fix trimming page size logic while adding to a page 2023-03-28 08:18:53 -07:00
Rishabh Singh Ahluwalia f232badd67 fix comma formatting 2023-03-27 22:18:10 -07:00
Rishabh Singh Ahluwalia 8e197a09f9 Fix trimming page size logic while adding to a page 2023-03-26 10:04:05 -07:00
Daoud Clarke 23688bd3ad Merge branch 'master' into user-registration 2023-03-18 22:37:45 +00:00
Daoud Clarke 0838157185 Merge pull request #97 from mwmbl/initialize-with-found-urls: Initialize with found urls 2023-02-25 18:20:11 +00:00
Daoud Clarke 7d0c55c015 Fix broken test 2023-02-25 18:18:09 +00:00
Daoud Clarke e5c08e0d24 Fix big with other URLs 2023-02-25 16:48:59 +00:00
Daoud Clarke a24156ce5c Initialize URLs by processing them like all other URLs to avoid bias 2023-02-25 13:45:03 +00:00
Daoud Clarke 6bb8bdf0c2 Initialize with new URLs 2023-02-25 10:48:22 +00:00
Daoud Clarke a9e2b48840 Merge pull request #96 from mwmbl/unique-urls-in-queue: Unique URLs in queue 2023-02-25 10:35:32 +00:00
Daoud Clarke 5c94dfa669 Shuffle URLs before batching 2023-02-25 10:35:10 +00:00
Daoud Clarke 6ff62fb119 Ensure URLs in queue are unique 2023-02-25 10:34:09 +00:00
Daoud Clarke c36e1dffcb Remove picolisp as a top domain since there are duplicate URLs 2023-02-25 09:56:26 +00:00
Daoud Clarke 362f9bfa9e Write page to the correct location (metadata size offset bug fix) 2023-02-24 21:46:18 +00:00
Daoud Clarke 5616626fc1 Merge pull request #89 from mwmbl/update-urls-queue-quickly: Update urls queue quickly 2023-02-24 21:39:40 +00:00
Daoud Clarke bc6be8b6d5 Merge branch 'master' into update-urls-queue-quickly 2023-02-24 21:37:54 +00:00
Daoud Clarke a03b76e5cc Fix broken test 2023-02-24 21:37:32 +00:00
Daoud Clarke c97d946fcf Go back to processing 10,000 batches at a time 2023-02-24 21:29:42 +00:00
Rishabh Singh Ahluwalia 38a5dbbf3c Merge pull request #94 from mwmbl/rishabh-port-configuration: Allow configuration of port 2023-02-23 07:31:07 -08:00
Rishabh Singh Ahluwalia 2aa61a5121 Merge pull request #95 from mwmbl/rishabh-unit-testing-with-ci: Add PyUnit dependency + Unit Tests for completer.py + Github Actions CI for running unit tests 2023-02-23 07:30:48 -08:00
Rishabh Singh Ahluwalia 30aff3b920 Add pytest, unit tests for completer,gh actions ci 2023-02-22 21:37:10 -08:00
Rishabh Singh Ahluwalia 842aec19e2 Add port to args 2023-02-22 19:59:42 -08:00
Daoud Clarke 50a059410b Merge pull request #93 from mwmbl/add-code-of-conduct-1: Create CODE_OF_CONDUCT.md 2023-02-15 20:36:31 +00:00
Rishabh Singh Ahluwalia 084a870f65 Merge pull request #92 from mwmbl/rishabh-add-launch-json: Add launch.json for vscode run/debugging 2023-02-12 07:17:47 -08:00
Daoud Clarke 68ecdee145 Create CONTRIBUTING.md 2023-02-11 15:17:35 +00:00
Daoud Clarke 3a07fb54b5 Create CODE_OF_CONDUCT.md 2023-02-11 15:13:08 +00:00
Daoud Clarke d8dbe54f9c Update README.md 2023-02-11 15:10:30 +00:00
Daoud Clarke 2daf902ca3 Merge pull request #90 from mwmbl/m1-mmap-issue-fix-2: Offset by metadata size manually to increase compatibility 2023-02-11 08:30:46 +00:00
Rishabh Singh Ahluwalia 7fdc8480bd add launch.json for vscode debugging 2023-02-10 20:59:09 -08:00
Daoud Clarke e890e56661 Offset by metadata size manually to increase compatibility 2023-02-05 15:49:09 +00:00
Daoud Clarke 5783cee6b7 Fix bugs 2023-01-24 22:52:58 +00:00
Daoud Clarke 77e39b4a89 Optimise URL update 2023-01-22 20:28:18 +00:00
Daoud Clarke 66700f8a3e Speed up domain parsing 2023-01-20 20:53:50 +00:00
Daoud Clarke 2b36f2ccc1 Try and balance URLs before adding to queue 2023-01-19 21:56:40 +00:00
Daoud Clarke 603fcd4eb2 Create a custom URL queue 2023-01-14 21:59:31 +00:00
Daoud Clarke 01f08fd88d Return updated URLs 2023-01-14 19:17:16 +00:00
Daoud Clarke bd0cc3863e Don't try and update an empty list of URLs 2023-01-09 21:02:40 +00:00
Daoud Clarke d347a17d63 Update URL queue separately from the other background process to speed it up 2023-01-09 20:50:28 +00:00
Daoud Clarke 7bd12c1ead Fix some bugs in URL fetching query 2023-01-02 20:51:23 +00:00
Daoud Clarke a50f1d8ae3 Fix postgres install 2023-01-02 12:19:10 +00:00
Daoud Clarke 1ab16b1fb4 Install postgres client 2023-01-02 12:18:03 +00:00
Daoud Clarke dda5a25ad0 Add core domains 2023-01-02 12:05:22 +00:00
Daoud Clarke ab37bbe0a5 Exclude google plus 2023-01-01 22:18:47 +00:00