Update readme for recent changes

Daoud Clarke 2022-02-04 22:07:09 +00:00
parent 51f2dd2690
commit c4e86ce313


@@ -14,6 +14,19 @@ crawler is still to be implemented.
Our vision is a community working to provide top quality search
particularly for hackers, funded purely by donations.
+Crawling
+========
+**Update 2022-02-05:** We now have a distributed crawler that runs on
+our volunteers' machines! If you have Firefox you can help out by
+[installing our
+extension](https://addons.mozilla.org/en-GB/firefox/addon/mwmbl-web-crawler/). This
+will crawl the web in the background, retrieving one page a second. It
+does not use or access any of your personal data. Instead it crawls
+the web at random, using the top scoring sites on Hacker News as seed
+pages. After extracting a summary of each page, it batches these up
+and sends the data to a central server to be stored and indexed.
Why a non-profit search engine?
===============================
@@ -87,26 +100,16 @@ single term and maintain an index smaller than the inverted index
design. Well, that's the theory. This idea has yet to be tested out on
a large scale.
-Crawling
-========
-Our current index is a small sample of the excellent Common Crawl,
-restricted to English content and domains which score highly on
-average in Hacker News submissions. It is likely for a variety of
-reasons that we will want to go beyond Common Crawl data at some
-point, so building a crawler becomes inevitable. We plan to start work
-on a distributed crawler, probably implemented as a browser extension
-that can be installed by volunteers.
How to contribute
=================
There are lots of ways to help:
+- [Help us crawl the
+  web](https://addons.mozilla.org/en-GB/firefox/addon/mwmbl-web-crawler/)
+- [Donate some money](https://opencollective.com/mwmbl) towards
+  hosting costs and supporting our volunteers
- Give feedback/suggestions
-- Volunteer to test out the distributed crawler when it's ready
- Help out with development of the engine itself
-- Donate some money towards hosting costs and/or founding an official
-  non-profit organisation
If you would like to help in any of these or other ways, thank you!
Please join our [Matrix chat
@@ -123,7 +126,7 @@ Development
4. Run `$ docker run -p 8080:8080 mwmbl`
### Local Testing
-1. Create and activate a python (3.9) environment using any tool you like e.g. poetry,venv, conda etc.
+1. Create and activate a python (3.10) environment using any tool you like e.g. poetry,venv, conda etc.
2. Run `$ pip install .`
3. Run `$ mwmbl-tinysearchengine --config config/tinysearchengine.yaml`
@@ -132,4 +135,7 @@ Frequently Asked Question
### How do you pronounce "mwmbl"?
-Like "mumble". I live in [Mumbles](https://en.wikipedia.org/wiki/Mumbles), which is spelt "Mwmbwls" in Welsh. But the intended meaning is "to mumble", as in "don't search, just mwmbl!"
+Like "mumble". I live in
+[Mumbles](https://en.wikipedia.org/wiki/Mumbles), which is spelt
+"Mwmbwls" in Welsh. But the intended meaning is "to mumble", as in
+"don't search, just mwmbl!"