From c4e86ce3136a53cf08a347644d22eef42a52e90a Mon Sep 17 00:00:00 2001
From: Daoud Clarke
Date: Fri, 4 Feb 2022 22:07:09 +0000
Subject: [PATCH] Update readme for recent changes

---
 README.md | 38 ++++++++++++++++++++++----------------
 1 file changed, 22 insertions(+), 16 deletions(-)

diff --git a/README.md b/README.md
index 79e7039..51f6c7d 100644
--- a/README.md
+++ b/README.md
@@ -14,6 +14,19 @@ crawler is still to be implemented.
 Our vision is a community working to provide top quality search
 particularly for hackers, funded purely by donations.
 
+Crawling
+========
+
+**Update 2022-02-05:** We now have a distributed crawler that runs on
+our volunteers' machines! If you have Firefox you can help out by
+[installing our
+extension](https://addons.mozilla.org/en-GB/firefox/addon/mwmbl-web-crawler/). This
+will crawl the web in the background, retrieving one page a second. It
+does not use or access any of your personal data. Instead it crawls
+the web at random, using the top scoring sites on Hacker News as seed
+pages. After extracting a summary of each page, it batches these up
+and sends the data to a central server to be stored and indexed.
+
 Why a non-profit search engine?
 ===============================
 
@@ -87,26 +100,16 @@ single term and maintain an index smaller than the inverted index
 design. Well, that's the theory. This idea has yet to be tested out on
 a large scale.
 
-Crawling
-========
-
-Our current index is a small sample of the excellent Common Crawl,
-restricted to English content and domains which score highly on
-average in Hacker News submissions. It is likely for a variety of
-reasons that we will want to go beyond Common Crawl data at some
-point, so building a crawler becomes inevitable. We plan to start work
-on a distributed crawler, probably implemented as a browser extension
-that can be installed by volunteers.
-
 How to contribute
 =================
 
 There are lots of ways to help:
+ - [Help us crawl the
+   web](https://addons.mozilla.org/en-GB/firefox/addon/mwmbl-web-crawler/)
+ - [Donate some money](https://opencollective.com/mwmbl) towards
+   hosting costs and supporting our volunteers
  - Give feedback/suggestions
- - Volunteer to test out the distributed crawler when it's ready
  - Help out with development of the engine itself
- - Donate some money towards hosting costs and/or founding an official
-   non-profit organisation
 
 If you would like to help in any of these or other ways, thank
 you! Please join our [Matrix chat
@@ -123,7 +126,7 @@ Development
 4. Run `$ docker run -p 8080:8080 mwmbl`
 
 ### Local Testing
-1. Create and activate a python (3.9) environment using any tool you like e.g. poetry,venv, conda etc.
+1. Create and activate a python (3.10) environment using any tool you like e.g. poetry,venv, conda etc.
 2. Run `$ pip install .`
 3. Run `$ mwmbl-tinysearchengine --config config/tinysearchengine.yaml`
 
@@ -132,4 +135,7 @@ Frequently Asked Question
 
 ### How do you pronounce "mwmbl"?
 
-Like "mumble". I live in [Mumbles](https://en.wikipedia.org/wiki/Mumbles), which is spelt "Mwmbwls" in Welsh. But the intended meaning is "to mumble", as in "don't search, just mwmbl!"
+Like "mumble". I live in
+[Mumbles](https://en.wikipedia.org/wiki/Mumbles), which is spelt
+"Mwmbwls" in Welsh. But the intended meaning is "to mumble", as in
+"don't search, just mwmbl!"