Commit graph

314 commits

Author SHA1 Message Date
Rishabh Singh Ahluwalia 30aff3b920 Add pytest, unit tests for completer,gh actions ci 2023-02-22 21:37:10 -08:00
Daoud Clarke 50a059410b Merge pull request #93 from mwmbl/add-code-of-conduct-1
Create CODE_OF_CONDUCT.md
2023-02-15 20:36:31 +00:00
Rishabh Singh Ahluwalia 084a870f65 Merge pull request #92 from mwmbl/rishabh-add-launch-json
Add launch.json for vscode run/debugging
2023-02-12 07:17:47 -08:00
Daoud Clarke 68ecdee145 Create CONTRIBUTING.md 2023-02-11 15:17:35 +00:00
Daoud Clarke 3a07fb54b5 Create CODE_OF_CONDUCT.md 2023-02-11 15:13:08 +00:00
Daoud Clarke d8dbe54f9c Update README.md 2023-02-11 15:10:30 +00:00
Daoud Clarke 2daf902ca3 Merge pull request #90 from mwmbl/m1-mmap-issue-fix-2
Offset by metadata size manually to increase compatibility
2023-02-11 08:30:46 +00:00
Rishabh Singh Ahluwalia 7fdc8480bd add launch.json for vscode debugging 2023-02-10 20:59:09 -08:00
Daoud Clarke e890e56661 Offset by metadata size manually to increase compatibility 2023-02-05 15:49:09 +00:00
Daoud Clarke bd0cc3863e Don't try and update an empty list of URLs 2023-01-09 21:02:40 +00:00
Daoud Clarke d347a17d63 Update URL queue separately from the other background process to speed it up 2023-01-09 20:50:28 +00:00
Daoud Clarke 7bd12c1ead Fix some bugs in URL fetching query 2023-01-02 20:51:23 +00:00
Daoud Clarke a50f1d8ae3 Fix postgres install 2023-01-02 12:19:10 +00:00
Daoud Clarke 1ab16b1fb4 Install postgres client 2023-01-02 12:18:03 +00:00
Daoud Clarke dda5a25ad0 Add core domains 2023-01-02 12:05:22 +00:00
Daoud Clarke ab37bbe0a5 Exclude google plus 2023-01-01 22:18:47 +00:00
Daoud Clarke 2336ed7f7d Allow posting extra links with lower score weighting 2023-01-01 20:37:41 +00:00
Daoud Clarke 6edf48693b Check the domain is correct, potential bug in psql 2023-01-01 01:30:44 +00:00
Daoud Clarke b7984684c9 Tidy, improve logging 2023-01-01 01:14:05 +00:00
Daoud Clarke 7c14cd99f8 Update the URL queue earlier 2022-12-31 23:37:59 +00:00
Daoud Clarke 0d33b4f68f Merge pull request #86 from mwmbl/improve-crawling
Improve crawling
2022-12-31 22:56:21 +00:00
Daoud Clarke a86e172bf3 Reinstate background tasks 2022-12-31 22:52:17 +00:00
Daoud Clarke d9cd3c585b Get results from other domains 2022-12-31 22:51:00 +00:00
Daoud Clarke 77f08d8f0a Update URL status 2022-12-31 22:25:05 +00:00
Daoud Clarke 36af579f7c Sample domains 2022-12-31 17:04:38 +00:00
Daoud Clarke ea16e7b5cd WIP: improve method of getting URLs for crawling 2022-12-31 13:37:40 +00:00
Daoud Clarke 7dae39b780 WIP: improve method of getting URLs for crawling 2022-12-31 13:32:15 +00:00
Daoud Clarke c69108cfcc Don't delete an index if the sizes don't match 2022-12-27 10:52:46 +00:00
Daoud Clarke bb8a36a612 Number of pages is an int 2022-12-27 10:40:53 +00:00
Daoud Clarke c01129cdb9 Merge branch 'master' of github.com:mwmbl/mwmbl 2022-12-27 10:25:41 +00:00
Daoud Clarke 26351a1072 Use the correct storage location in prod 2022-12-27 10:24:48 +00:00
Daoud Clarke f3f3831a97 Merge pull request #83 from omasanori/spacy-deps-rework
Rework installation of spaCy models for clarity
2022-12-27 10:20:52 +00:00
Masanori Ogino 71187a3938 Rework installation of spaCy models for clarity
- Install the wheel package for compatibility with future pip
- Use `spacy download` for installing model(s)
- Use `spacy validate` for checking model compatibility explicitly

Signed-off-by: Masanori Ogino <167209+omasanori@users.noreply.github.com>
2022-12-27 11:33:52 +09:00
Daoud Clarke d85067ec09 Remove apt command 2022-12-24 20:20:53 +00:00
Daoud Clarke 1ef60e8d5d Put install in correct place 2022-12-24 20:18:02 +00:00
Daoud Clarke 8e613dd368 Install psql client 2022-12-24 20:13:53 +00:00
Daoud Clarke 80282cfc7a Exclude a domain 2022-12-24 19:59:56 +00:00
Daoud Clarke 57295846cb Update README.md 2022-12-21 21:49:56 +00:00
Daoud Clarke efc8e8e383 Merge pull request #78 from mwmbl/make-dev-easier
Make it easier to run mwmbl locally
2022-12-19 21:50:54 +00:00
Daoud Clarke f8ab6092b0 Suggest using dokku instead of docker directly 2022-12-08 22:33:58 +00:00
Daoud Clarke a50bc28436 Make it easier to rum mwmbl locally 2022-12-07 20:01:31 +00:00
Daoud Clarke c0f89ba6c3 Update matrix badge 2022-12-05 18:47:26 +00:00
Daoud Clarke dd4dd8a752 Exclude an annoying web site 2022-12-02 21:29:06 +00:00
Daoud Clarke 40f9eade9a Update index name 2022-08-27 09:38:39 +01:00
Daoud Clarke b6183e00ea Merge pull request #74 from mwmbl/evaluate-indexing
Evaluate indexing
2022-08-27 09:37:22 +01:00
Daoud Clarke cf253ae524 Split out URL updating from indexing 2022-08-26 22:20:35 +01:00
Daoud Clarke f4fb9f831a Use terms and bigrams from the beginning of the string only 2022-08-26 17:20:11 +01:00
Daoud Clarke 619b6c3a93 Don't remove stopwords 2022-08-24 21:08:33 +01:00
Daoud Clarke 578b705609 Don't replace full stops and commas 2022-08-23 22:06:43 +01:00
Daoud Clarke 4779371cf3 Use a custom tokenizer 2022-08-23 21:57:38 +01:00