Doc : fix whitelists documentation + document data for parsers/scenarios + document expr helpers + link taxonomy (#126)

Thibault "bui" Koechlin 2020-07-08 10:58:20 +02:00 committed by GitHub
parent c1c1a33dd3
commit a0c1ca49d0
7 changed files with 205 additions and 35 deletions

@ -282,6 +282,30 @@ statics:
expression: evt.Meta.target_field + ' this_is' + ' a dynamic expression'
```
### data
```
data:
- source_url: https://URL/TO/FILE
dest_file: LOCAL_FILENAME
[type: regexp]
```
`data` allows the user to specify an external source of data.
This section is only relevant when `cscli` is used to install the parser from the hub, as it will download the `source_url` and store it in `dest_file`. When the parser is not installed from the hub, {{crowdsec.name}} won't download the URL, but the file must exist for the parser to be loaded correctly.
If `type` is set to `regexp`, the content of the file must be one valid (re2) regular expression per line.
Those regexps will be compiled and kept in cache.
```yaml
name: crowdsecurity/cdn-whitelist
...
data:
- source_url: https://www.cloudflare.com/ips-v4
dest_file: cloudflare_ips.txt
```
## Parser concepts

@ -347,3 +347,28 @@ overflow_filter: any(queue.Queue, { .Enriched.IsInEU == "true" })
If this expression is present and returns false, the overflow will be discarded.
### data
```
data:
- source_url: https://URL/TO/FILE
dest_file: LOCAL_FILENAME
[type: regexp]
```
`data` allows the user to specify an external source of data.
This section is only relevant when `cscli` is used to install the scenario from the hub, as it will download the `source_url` and store it in `dest_file`. When the scenario is not installed from the hub, {{crowdsec.name}} won't download the URL, but the file must exist for the scenario to be loaded correctly.
If `type` is set to `regexp`, the content of the file must be one valid (re2) regular expression per line.
Those regexps will be compiled and kept in cache.
```yaml
name: crowdsecurity/cdn-whitelist
...
data:
- source_url: https://www.cloudflare.com/ips-v4
dest_file: cloudflare_ips.txt
```

@ -0,0 +1,52 @@
# Expressions
> {{expr.htmlname}} : Expression evaluation engine for Go: fast, non-Turing complete, dynamic typing, static typing
Several parts of {{crowdsec.name}}'s configuration use {{expr.htmlname}} :
- {{filter.Htmlname}}, which determine event eligibility in {{parsers.htmlname}}, {{scenarios.htmlname}} or `profiles`
- {{statics.Htmlname}} use expr in the `expression` directive, to compute complex values
- {{whitelists.Htmlname}} rely on the `expression` directive to allow more complex whitelist filters
To learn more about {{expr.htmlname}}, [check the github page of the project](https://github.com/antonmedv/expr/blob/master/docs/Language-Definition.md).
To make its use in {{crowdsec.name}} more efficient, we added a few helpers, which are documented below.
## Atof(string) float64
Parses the string representation of a floating-point number into a `float64` (binding on `strconv.ParseFloat`).
> Atof(evt.Parsed.tcp_port)
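To illustrate what a binding on `strconv.ParseFloat` can look like, here is a minimal Go sketch. The function name `atof` and the choice to return `0` on a parse error are assumptions made for this example, not necessarily what the actual helper does.

```go
package main

import (
	"fmt"
	"strconv"
)

// atof is a hypothetical stand-in for the Atof helper.
// Assumption: an unparsable string yields 0 instead of an error.
func atof(s string) float64 {
	f, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return 0
	}
	return f
}

func main() {
	// Parsed fields are strings, so numeric comparisons need a conversion first.
	port := "8080"
	fmt.Println(atof(port) > 1024)
}
```

Since parsed fields are strings, a conversion of this kind is what makes a numeric comparison such as `Atof(evt.Parsed.tcp_port) > 1024` possible in an expression.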
## JsonExtract(JsonBlob, FieldName) string
Extracts the `FieldName` from the `JsonBlob` and returns it as a string (binding on [jsonparser](https://github.com/buger/jsonparser/)).
> JsonExtract(evt.Parsed.some_json_blob, "foo.bar[0].one_item")
## File(FileName) []string
Returns the content of `FileName` as an array of strings, with a caching mechanism.
> evt.Parsed.some_field in File('some_patterns.txt')
> any(File('rdns_seo_bots.txt'), { evt.Enriched.reverse_dns endsWith #})
## RegexpInFile(StringToMatch, FileName) bool
Returns `true` if the `StringToMatch` is matched by one of the expressions contained in `FileName` (uses RE2 regexp engine).
> RegexpInFile( evt.Enriched.reverse_dns, 'my_legit_seo_whitelists.txt')
## Upper(string) string
Returns the uppercase version of the string
> Upper("yop")
## IpInRange(IPStr, RangeStr) bool
Returns true if the IP `IPStr` is contained in the IP range `RangeStr` (uses `net.ParseCIDR`)
> IpInRange("1.2.3.4", "1.2.3.0/24")
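As an illustration of a range check built on `net.ParseCIDR`, here is a minimal Go sketch. The name `ipInRange` and the decision to return `false` on any malformed input are assumptions made for this example.

```go
package main

import (
	"fmt"
	"net"
)

// ipInRange is a hypothetical stand-in for the IpInRange helper.
// Assumption: malformed IPs or ranges simply yield false.
func ipInRange(ipStr, rangeStr string) bool {
	_, ipNet, err := net.ParseCIDR(rangeStr)
	if err != nil {
		return false
	}
	ip := net.ParseIP(ipStr)
	if ip == nil {
		return false
	}
	return ipNet.Contains(ip)
}

func main() {
	fmt.Println(ipInRange("1.2.3.4", "1.2.3.0/24"))
	fmt.Println(ipInRange("1.2.4.4", "1.2.3.0/24"))
}
```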

@ -1,15 +1,28 @@
## Where are whitelists
# What are whitelists
Whitelists are, as for most configuration, YAML files, and allow you to "discard" signals based on :
Whitelists are special parsers that allow you to "discard" events, and can exist at two different steps :
- ip address or the fact that it belongs to a specific range
- a {{expr.name}} expression
- *Parser whitelists* : Allows you to discard an event at parse time, so that it never hits the buckets.
- *PostOverflow whitelists* : Those are whitelists that are checked *after* the overflow happens. They are usually best for whitelisting processes that can be expensive (such as performing reverse DNS on an IP, or performing a `whois` of an IP).
Here is an example :
!!! info
While the whitelists are the same for parser or postoverflows, beware that field names might change.
Source ip is usually in `evt.Meta.source_ip` when it's a log, but `evt.Overflow.Source_ip` when it's an overflow
The whitelist can be based on several criteria :
- specific ip address : if the event/overflow IP is the same, event is whitelisted
- ip ranges : if the event/overflow IP belongs to this range, event is whitelisted
- a list of {{expr.htmlname}} expressions : if any expression returns true, event is whitelisted
Here is an example showcasing configuration :
```yaml
name: crowdsecurity/my-whitelists
description: "Whitelist events from my ipv4 addresses"
#it's a normal parser, so we can restrict its scope with filter
filter: "1 == 1"
whitelist:
reason: "my ipv4 ranges"
ip:
@ -19,67 +32,75 @@ whitelist:
- "10.0.0.0/8"
- "172.16.0.0/12"
expression:
- "'mycorp.com' in evt.Meta.source_ip_rdns"
#beware, this one will work *only* if you enabled the reverse dns (crowdsecurity/rdns) enrichment postoverflow parser
- evt.Enriched.reverse_dns endsWith ".mycoolorg.com."
#this one will work *only* if you enabled the geoip (crowdsecurity/geoip-enrich) enrichment parser
- evt.Enriched.IsoCode == 'FR'
```
## Hands on
Let's assume we have a setup with a `crowdsecurity/base-http-scenarios` scenario enabled and no whitelists.
# Whitelists in parsing
When a whitelist is present at the parsing stage (`/etc/crowdsec/config/parsers/...`), matching events are checked and discarded before being poured into any bucket. These whitelists intentionally generate no logs and are useful to discard noisy false positive sources.
## Whitelist by ip
Let's assume we have a setup with a `crowdsecurity/nginx` collection enabled and no whitelists.
Thus, if I "attack" myself :
```bash
nikto -host 127.0.0.1
nikto -host myfqdn.com
```
my own IP will be flagged as being an attacker :
```bash
$ tail -f /var/log/crowdsec.log
time="07-05-2020 09:23:03" level=warning msg="127.0.0.1 triggered a 4h0m0s ip ban remediation for [crowdsecurity/http-scan-uniques_404]" bucket_id=old-surf event_time="2020-05-07 09:23:03.322277347 +0200 CEST m=+57172.732939890" scenario=crowdsecurity/http-scan-uniques_404 source_ip=127.0.0.1
time="07-05-2020 09:23:03" level=warning msg="127.0.0.1 triggered a 4h0m0s ip ban remediation for [crowdsecurity/http-crawl-non_statics]" bucket_id=lingering-sun event_time="2020-05-07 09:23:03.345341864 +0200 CEST m=+57172.756004380" scenario=crowdsecurity/http-crawl-non_statics source_ip=127.0.0.1
time="07-07-2020 16:13:16" level=warning msg="80.x.x.x triggered a 4h0m0s ip ban remediation for [crowdsecurity/http-bad-user-agent]" bucket_id=cool-smoke event_time="2020-07-07 16:13:16.579581642 +0200 CEST m=+358819.413561109" scenario=crowdsecurity/http-bad-user-agent source_ip=80.x.x.x
time="07-07-2020 16:13:16" level=warning msg="80.x.x.x triggered a 4h0m0s ip ban remediation for [crowdsecurity/http-probing]" bucket_id=green-silence event_time="2020-07-07 16:13:16.737579458 +0200 CEST m=+358819.571558901" scenario=crowdsecurity/http-probing source_ip=80.x.x.x
time="07-07-2020 16:13:17" level=warning msg="80.x.x.x triggered a 4h0m0s ip ban remediation for [crowdsecurity/http-crawl-non_statics]" bucket_id=purple-snowflake event_time="2020-07-07 16:13:17.353641625 +0200 CEST m=+358820.187621068" scenario=crowdsecurity/http-crawl-non_statics source_ip=80.x.x.x
time="07-07-2020 16:13:18" level=warning msg="80.x.x.x triggered a 4h0m0s ip ban remediation for [crowdsecurity/http-sensitive-files]" bucket_id=small-hill event_time="2020-07-07 16:13:18.005919055 +0200 CEST m=+358820.839898498" scenario=crowdsecurity/http-sensitive-files source_ip=80.x.x.x
^C
$ {{cli.bin}} ban list
1 local decisions:
+--------+-----------+-------------------------------------+------+--------+---------+----+--------+------------+
| SOURCE | IP | REASON | BANS | ACTION | COUNTRY | AS | EVENTS | EXPIRATION |
+--------+-----------+-------------------------------------+------+--------+---------+----+--------+------------+
| local | 127.0.0.1 | crowdsecurity/http-scan-uniques_404 | 2 | ban | | 0 | 47 | 3h55m57s |
+--------+-----------+-------------------------------------+------+--------+---------+----+--------+------------+
4 local decisions:
+--------+---------------+-----------------------------------+------+--------+---------+---------------------------+--------+------------+
| SOURCE | IP | REASON | BANS | ACTION | COUNTRY | AS | EVENTS | EXPIRATION |
+--------+---------------+-----------------------------------+------+--------+---------+---------------------------+--------+------------+
| local | 80.x.x.x | crowdsecurity/http-bad-user-agent | 4 | ban | FR | 21502 SFR SA | 60 | 3h59m3s |
...
```
## Create the whitelist by IP
Let's create a `/etc/crowdsec/crowdsec/parsers/s02-enrich/whitelists.yaml` file with the following content :
### Create the whitelist by IP
Let's create a `/etc/crowdsec/crowdsec/parsers/s02-enrich/mywhitelists.yaml` file with the following content :
```yaml
name: crowdsecurity/whitelists
description: "Whitelist events from private ipv4 addresses"
description: "Whitelist events from my ip addresses"
whitelist:
reason: "private ipv4 ranges"
ip:
- "127.0.0.1"
reason: "my ip ranges"
ip:
- "80.x.x.x"
```
and restart {{crowdsec.name}} : `sudo systemctl restart {{crowdsec.name}}`
and restart {{crowdsec.name}} : `sudo systemctl restart crowdsec`
## Test the whitelist
### Test the whitelist
Thus, if we restart our attack :
```bash
nikto -host 127.0.0.1
nikto -host myfqdn.com
```
And we don't get bans, instead :
And we don't get bans :
```bash
$ tail -f /var/log/crowdsec.log
...
time="07-05-2020 09:30:13" level=info msg="Event from [127.0.0.1] is whitelisted by Ips !" filter= name=lively-firefly stage=s02-enrich
...
^C
$ {{cli.bin}} ban list
No local decisions.
@ -87,11 +108,12 @@ And 21 records from API, 15 distinct AS, 12 distinct countries
```
Here, we don't get *any* logs, as the events have been discarded at parsing time.
## Create whitelist by expression
Now, let's make something more tricky : let's whitelist a **specific** user-agent (of course, it's just an example, don't do this at home !).
Now, let's make something more tricky : let's whitelist a **specific** user-agent (of course, it's just an example, don't do this at home !). The [hub's taxonomy](https://hub.crowdsec.net/fields) will help us find which data is present in which field.
Let's change our whitelist to :
@ -109,7 +131,7 @@ again, let's restart {{crowdsec.name}} !
For the record, I edited nikto's configuration to use 'MySecretUserAgent' as user-agent, and thus :
```bash
nikto -host 127.0.0.1
nikto -host myfqdn.com
```
```bash
@ -120,3 +142,43 @@ time="07-05-2020 09:39:09" level=info msg="Event is whitelisted by Expr !" filte
```
# Whitelist in PostOverflows
Whitelists in PostOverflows are applied *after* the bucket overflow happens.
They have the advantage of being triggered only when we are about to take a decision about an IP or range, and thus run a lot less often.
A good example is the [crowdsecurity/whitelist-good-actors](https://hub.crowdsec.net/author/crowdsecurity/collections/whitelist-good-actors) collection.
But let's craft ours based on our previous example !
First of all, install the [crowdsecurity/rdns postoverflow](https://hub.crowdsec.net/author/crowdsecurity/configurations/rdns) : it will be in charge of enriching overflows with reverse dns information of the offending IP.
Let's put the following file in `/etc/crowdsec/config/postoverflows/s01-whitelists/mywhitelists.yaml` :
```yaml
name: me/my_cool_whitelist
description: lets whitelist our own reverse dns
whitelist:
reason: dont ban my ISP
expression:
#this is the reverse of my ip, you can get it by performing a "host" command on your public IP for example
- evt.Enriched.reverse_dns endsWith '.asnieres.rev.numericable.fr.'
```
After reloading {{crowdsec.name}}, and launching (again!) nikto :
```bash
nikto -host myfqdn.com
```
```bash
$ tail -f /var/log/crowdsec.log
time="07-07-2020 17:11:09" level=info msg="Ban for 80.x.x.x whitelisted, reason [dont ban my ISP]" id=cold-sunset name=me/my_cool_whitelist stage=s01
time="07-07-2020 17:11:09" level=info msg="node warning : no remediation" bucket_id=blue-cloud event_time="2020-07-07 17:11:09.175068053 +0200 CEST m=+2308.040825320" scenario=crowdsecurity/http-probing source_ip=80.x.x.x
time="07-07-2020 17:11:09" level=info msg="Processing Overflow with no decisions 80.x.x.x performed 'crowdsecurity/http-probing' (11 events over 313.983994ms) at 2020-07-07 17:11:09.175068053 +0200 CEST m=+2308.040825320" bucket_id=blue-cloud event_time="2020-07-07 17:11:09.175068053 +0200 CEST m=+2308.040825320" scenario=crowdsecurity/http-probing source_ip=80.x.x.x
...
```
This time, we can see that logs are being produced when the event is discarded.

@ -17,6 +17,7 @@ nav:
- Cheat Sheets:
- Ban Management: cheat_sheets/ban-mgmt.md
- Configuration Management: cheat_sheets/config-mgmt.md
- Hub's taxonomy: https://hub.crowdsec.net/fields
- Observability:
- Overview: observability/overview.md
- Logs: observability/logs.md
@ -31,7 +32,8 @@ nav:
- Acquisition: write_configurations/acquisition.md
- Parsers: write_configurations/parsers.md
- Scenarios: write_configurations/scenarios.md
- Whitelist: write_configurations/whitelist.md
- Whitelists: write_configurations/whitelist.md
- Expressions: write_configurations/expressions.md
- Blockers:
- Overview : blockers/index.md
- Nginx:
@ -204,6 +206,11 @@ extra:
Name: Overflow
htmlname: "[overflow](/getting_started/glossary/#overflow-or-signaloccurence)"
Htmlname: "[Overflow](/getting_started/glossary/#overflow-or-signaloccurence)"
whitelists:
name: whitelists
Name: Whitelists
htmlname: "[whitelists](/write_configurations/whitelist/)"
Htmlname: "[Whitelists](/write_configurations/whitelist/)"
signal:
name: signal
Name: Signal

@ -176,7 +176,7 @@ func (o *Output) ProcessOutput(sig types.SignalOccurence, profiles []types.Profi
return err
}
if warn != nil {
logger.Infof("node warning : %s", warn)
logger.Debugf("node warning : %s", warn)
}
if ordr != nil {
bans, err := types.OrderToApplications(ordr)

@ -18,7 +18,7 @@ func reverse_dns(field string, p *types.Event, ctx interface{}) (map[string]stri
}
rets, err := net.LookupAddr(field)
if err != nil {
log.Infof("failed to resolve '%s'", field)
log.Debugf("failed to resolve '%s'", field)
return nil, nil
}
//When using the host C library resolver, at most one result will be returned. To bypass the host resolver, use a custom Resolver.