Commit Graph

25 Commits

Author SHA1 Message Date
Daoud Clarke 28b326aedf Fix broken JS 2023-11-07 18:59:38 +00:00
Daoud Clarke 8293a7afa4 Update query string 2023-11-05 21:45:13 +00:00
Daoud Clarke 36ec3ae4e5 Add database config 2023-10-26 17:32:46 +01:00
Daoud Clarke bd017079d5 Add login using allauth 2023-10-24 10:32:06 +01:00
Daoud Clarke 1227ae33c8 Run poetry lock 2023-10-10 20:21:37 +01:00
Daoud Clarke a55a027107 Store stats in redis 2023-09-29 13:37:54 +01:00
Daoud Clarke 019095a4c1 Exclude blacklisted domains 2023-09-22 21:53:53 +01:00
Daoud Clarke 8d64af4f1b Keep track of curated documents 2023-04-30 18:25:48 +01:00
Rishabh Singh Ahluwalia 30aff3b920 Add pytest, unit tests for completer, GH Actions CI 2023-02-22 21:37:10 -08:00
Daoud Clarke d400950689 Add script to process historical data 2022-06-18 15:31:35 +01:00
Daoud Clarke a003914e91 Fix boto3 dependency 2022-06-17 22:14:55 +01:00
Daoud Clarke e2eb405083 Combine crawler and search servers 2022-06-16 22:49:41 +01:00
Daoud Clarke aaca8b2b6e Record historical batches via the API 2022-06-05 09:15:04 +01:00
Daoud Clarke af6a28fac3 Implement learning to rank feature extraction and thresholding 2022-03-20 22:01:45 +00:00
Daoud Clarke e6273c7f76 WIP: include metadata in index - using struct approach 2022-02-18 22:12:22 +00:00
Daoud Clarke 7d829bc319 Use python 3.10; complete terms 2022-01-30 23:24:00 +00:00
nitred a72a08a7d9 added config and binary/entrypoint for mwmbl.tinysearchengine
- using pydantic to validate the config
- added a default bootstrap config at config/tinysearchengine.yaml
- refactored app.py to include parsing CLI argument using argparse
- refactored app.py to use fewer global variables
- added "mwmbl-tinysearchengine" binary/entrypoint in pyproject.toml
- updated Dockerfile to work with these changes and added comments to it
2021-12-29 15:26:33 +01:00
nitred c02c052281 Fixes #12, Added dependencies for indexer as extra or extra_requires
- dependencies for indexer can be installed using "pip install .[indexer]" or "poetry install -E indexer"
2021-12-27 15:46:24 +01:00
Daoud Clarke 9c65bf3c8f WIP: implement docker image. TODO: copy index and set the correct index path using env var 2021-12-22 23:21:23 +00:00
Daoud Clarke 23eb341832 Add search page 2021-12-14 22:01:59 +00:00
Daoud Clarke 2844c1df75 Index common crawl data 2021-12-13 11:23:01 +00:00
Daoud Clarke 65b366d30d Add spacy 2021-12-12 20:58:44 +00:00
Daoud Clarke c46257c6d1 Use our own filesystem-based queue 2021-12-11 16:57:17 +00:00
Daoud Clarke 14817d7657 Optimise imports 2021-12-05 20:38:05 +00:00
Daoud Clarke 312f32bf61 Add common crawl extract script and dependency management with poetry 2021-12-05 20:31:49 +00:00