documentation improvement (#182)

This commit is contained in:
Thibault "bui" Koechlin 2020-08-07 09:40:43 +02:00 committed by GitHub
parent fe8040683e
commit ceb69f0cef
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
11 changed files with 178 additions and 337 deletions


@ -10,7 +10,7 @@ When trying to debug a parser or a scenario :
- Work on "cold logs" (with the `-file` and `-type` options) rather than live ones
- Use the `/etc/crowdsec/config/user.yaml` configuration files to have logs on stdout
## Using user-mode configuration
```bash
crowdsec -c /etc/crowdsec/config/user.yaml -file mylogs.log.gz -type syslog
@ -28,6 +28,25 @@ WARN[05-08-2020 16:16:12] 182.x.x.x triggered a 4h0m0s ip ban remediation for [c
When processing logs like this, {{crowdsec.name}} runs in "time machine" mode, and relies on the timestamps *in* the logs to evaluate scenarios. You will most likely need the `crowdsecurity/dateparse-enrich` parser for this.
## Testing configurations on live system
If you're playing around with parsers/scenarios on a live system, you can use the `-t` (lint) option of {{crowdsec.Name}} to check your configuration's validity before restarting/reloading services:
```bash
$ emacs /etc/crowdsec/config/scenarios/ssh-bf.yaml
...
$ crowdsec -c /etc/crowdsec/config/user.yaml -t
INFO[06-08-2020 13:36:04] Crowdsec v0.3.0-rc3-4cffef42732944d4b81b3e62a03d4040ad74f185
...
ERRO[06-08-2020 13:36:05] Bad yaml in /etc/crowdsec/config/scenarios/ssh-bf.yaml : yaml: unmarshal errors:
line 2: field typex not found in type leakybucket.BucketFactory
FATA[06-08-2020 13:36:05] Failed to load scenarios: Scenario loading failed : bad yaml in /etc/crowdsec/config/scenarios/ssh-bf.yaml : yaml: unmarshal errors:
line 2: field typex not found in type leakybucket.BucketFactory
```
With this, you can make sure your scenarios/parsers are at least syntactically correct before killing your running service.
## Using debug
Both scenarios and parsers support a `debug: true|false` option which produces useful debug output.
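For instance (a hypothetical scenario fragment, where the name and filter are illustrative), enabling debug for a single scenario looks like:

```yaml
# hypothetical scenario fragment - name and filter are illustrative
type: leaky
name: me/my-ssh-bf
debug: true   # print detailed debug output for this scenario only
filter: "evt.Meta.log_type == 'ssh_failed-auth'"
```

The debug output is emitted only for this scenario, which keeps the logs readable while you iterate.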


@ -9,33 +9,86 @@ At the time of writing, it's mostly files, but it should be more or less any kin
Acquisition configuration always contains a stream (i.e. a file to tail) and a tag (e.g. "these are in syslog format", "these are non-syslog nginx logs").
File acquisition configuration is defined as :
```yaml
filenames: #a list of files or regexps to read from (regular expressions are supported)
  - /var/log/nginx/http_access.log
  - /var/log/nginx/https_access.log
  - /var/log/nginx/error.log
labels:
  type: nginx
---
filenames:
  - /var/log/auth.log
labels:
  type: syslog
```
The `labels` part is here to tag the incoming logs with a type. `labels.type` is used by the parsers to know which logs to process.
## Parsers [[reference](/references/parsers/)]
For logs to be exploited and analyzed, they first need to be parsed and normalized: this is where parsers are used.
A parser is a YAML configuration file that describes how a string is parsed. Said string can be a log line, or a field extracted from a previous parser. While a lot of parsers rely on the **GROK** approach (a.k.a. regular expression named capture groups), parsers can also reference enrichment modules to allow specific data processing.
A parser usually has a specific scope. For example, if you are using [nginx](https://nginx.org), you will probably want to use the `crowdsecurity/nginx-logs` parser, which allows your {{crowdsec.name}} setup to parse nginx's access and error logs.
Parsers are organized into stages to allow pipelines and branching in parsing.
See the [{{hub.name}}]({{hub.url}}) to explore parsers, or see some examples below:
- [apache2 access/error log parser](https://github.com/crowdsecurity/hub/blob/master/parsers/s01-parse/crowdsecurity/apache2-logs.yaml)
- [iptables logs parser](https://github.com/crowdsecurity/hub/blob/master/parsers/s01-parse/crowdsecurity/iptables-logs.yaml)
- [http logs post-processing](https://github.com/crowdsecurity/hub/blob/master/parsers/s02-enrich/crowdsecurity/http-logs.yaml)
You can also [write your own](/write_configurations/parsers/)!
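As an illustrative sketch (the program name, grok pattern and field names are made up; see [the parser reference](/references/parsers/) for the actual directives), a grok-based parser might look like:

```yaml
# illustrative sketch of a grok-based parser
onsuccess: next_stage
filter: "evt.Parsed.program == 'myservice'"
name: me/myservice-logs
description: "Parse myservice logs"
grok:
  pattern: "Failed login for %{USERNAME:username} from %{IP:source_ip}"
  apply_on: message
statics:
  - meta: log_type
    value: myservice_failed-auth
```

The grok named capture groups (`username`, `source_ip`) end up in the event's `Parsed` map, while the `statics` section tags the event for later scenario matching.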
## Stages
Parsers are organized into "stages" to allow pipelines and branching in parsing. Each parser belongs to a stage, and can trigger the next stage when successful. At the time of writing, the parsers are organized around 3 stages:
- `s00-raw` : low-level parsers, such as syslog
- `s01-parse` : most of the services parsers (ssh, nginx etc.)
- `s02-enrich` : enrichment that requires parsed events (e.g. geoip enrichment) or generic parsers that apply on parsed logs (e.g. a second-stage http parser)
The number and structure of stages can be altered by the user; the directory structure and its alphabetical order dictate the order in which stages and parsers are processed.
Every event starts in the first stage, and moves to the next stage each time it is successfully processed by a parser that has the `onsuccess` directive set to `next_stage`, until it reaches the last stage, after which it is matched against scenarios. Thus an sshd log might follow this pipeline:
- `s00-raw` : be parsed by `crowdsecurity/syslog-logs` (will move event to the next stage)
- `s01-parse` : be parsed by `crowdsecurity/sshd-logs` (will move the event to the next stage)
- `s02-enrich` : will be parsed by `crowdsecurity/geoip-enrich` and `crowdsecurity/dateparse-enrich`
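On disk, such a pipeline maps to a directory layout similar to this (paths are indicative and depend on your installation):

```
config/parsers/
├── s00-raw/
│   └── syslog-logs.yaml
├── s01-parse/
│   └── sshd-logs.yaml
└── s02-enrich/
    ├── geoip-enrich.yaml
    └── dateparse-enrich.yaml
```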
## Enrichers
Enrichment is the action of adding extra context to an event based on the information we already have, so that better decisions can be taken later. In most cases, you should be able to find the relevant enrichers on our {{hub.htmlname}}.
A common/simple type of enrichment would be [geoip-enrich](https://github.com/crowdsecurity/hub/blob/master/parsers/s02-enrich/crowdsecurity/geoip-enrich.yaml) of an event (adding information such as origin country, origin AS and origin IP range to the event).
Once again, you should be able to find the ones you're looking for on the {{hub.htmlname}} !
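As an illustration (the key names are indicative and may vary between versions), a geoip-style enrichment could leave the event's `Enriched` map looking like:

```yaml
# indicative content of evt.Enriched after a geoip-style enrichment
IsoCode: "FR"
ASNOrg: "Some Provider SAS"
SourceRange: "192.0.2.0/24"
```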
## Scenarios [[reference](/references/scenarios/)]
A scenario is the expression of a heuristic that allows you to qualify a specific event (usually an attack). It is a YAML file that describes a set of events characterizing a scenario. Scenarios in {{crowdsec.name}} gravitate around the [leaky bucket](https://en.wikipedia.org/wiki/Leaky_bucket) principle. In most cases, you should be able to find the relevant scenarios on our {{hub.htmlname}}.
While not going [into details](/references/scenarios/), a scenario often revolves around a few central things.
(Let's take "we want to detect ssh bruteforce" as an example!)
- A filter : to know which events are eligible ("I'm looking for failed authentication")
- A grouping key : how we are going to "group" events together to give them a meaning ("we are going to group by the source IP performing said failed authentication")
- A rate-limit configuration including burst capacity : to qualify an attack and limit false positives, we characterize the speed at which events need to happen (for an ssh bruteforce, it could be "at least 10 failed authentications within 1 minute")
The description allows many other directives to be specified (blackhole, distinct filters etc.), allowing for rather complex scenarios.
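Putting the filter, grouping key and rate-limit together, a minimal bruteforce-style scenario could be sketched as follows (the directive names match the scenario reference; the exact field names in the filter are illustrative):

```yaml
# illustrative sketch of a leaky-bucket scenario
type: leaky
name: me/ssh-bf
description: "Detect ssh bruteforce"
filter: "evt.Meta.log_type == 'ssh_failed-auth'"  # event eligibility
groupby: evt.Meta.source_ip                       # grouping key: one bucket per source ip
leakspeed: 10s                                    # the bucket leaks one event every 10s
capacity: 5                                       # the bucket overflows at 5 events
blackhole: 1m                                     # don't re-alert on the same source for 1m
labels:
  service: ssh
  type: bruteforce
  remediation: true
```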
See the [{{hub.name}}]({{hub.url}}) to explore scenarios and their capabilities, or see below some examples :
- [ssh bruteforce detection](https://github.com/crowdsecurity/hub/blob/master/scenarios/crowdsecurity/ssh-bf.yaml)
- [distinct http-404 scan](https://github.com/crowdsecurity/hub/blob/master/scenarios/crowdsecurity/http-scan-uniques_404.yaml)
- [iptables port scan](https://github.com/crowdsecurity/hub/blob/master/scenarios/crowdsecurity/iptables-scan-multi_ports.yaml)
You can also [write your own](/write_configurations/scenarios/)!
@ -46,3 +99,36 @@ To make user's life easier, "collections" are available, which are just a bundle
In this way, if you want to cover the basic use-cases of, let's say, "nginx", you can just install the `crowdsecurity/nginx` collection, which is composed of the `crowdsecurity/nginx-logs` parser as well as generic http scenarios such as `crowdsecurity/base-http-scenarios`.
As usual, those can be found on the {{hub.htmlname}} !
## Event
The objects that are processed within {{crowdsec.name}} are named "Events".
An Event can be a log line, or an overflow result. This object's layout revolves around a few important fields:
- `Parsed` is an associative array that will be used during parsing to store temporary variables or processing results.
- `Enriched`, very similar to `Parsed`, is an associative array intended to be used by the enrichment process.
- `Overflow` is a `SignalOccurence` structure that represents information about a triggered scenario, when applicable.
- `Meta` is an associative array that will be used to keep track of meta information about the event.
_Other fields omitted for clarity, see [`pkg/types/event.go`](https://github.com/crowdsecurity/crowdsec/blob/master/pkg/types/event.go) for detailed definition_
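To make this concrete, here is a hypothetical, heavily simplified view of such an event mid-parsing, rendered as YAML (field values are illustrative):

```yaml
# hypothetical, simplified Event after parsing an sshd log line
Parsed:
  program: sshd
  source_ip: 192.0.2.1
Meta:
  log_type: ssh_failed-auth
  source_ip: 192.0.2.1
Enriched:
  IsoCode: FR
```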
## Overflow or SignalOccurence
This object holds the relevant information about a scenario that happened : who / when / where / what etc.
Its most relevant fields are :
- `Scenario` : name of the scenario
- `Alert_message` : a human-readable message about what happened
- `Events_count` : the number of individual events that led to said overflow
- `Start_at` + `Stop_at` : timestamps of the first and last events that triggered the scenario
- `Source` : a binary representation of the source of the attack
- `Source_[ip,range,AutonomousSystemNumber,AutonomousSystemOrganization,Country]` : string representation of source information
- `Labels` : an associative array representing the scenario "labels" (see scenario definition)
_Other fields omitted for clarity, see [`pkg/types/signal_occurence.go`](https://github.com/crowdsecurity/crowdsec/blob/master/pkg/types/signal_occurence.go) for detailed definition_
### PostOverflow
A postoverflow is a parser that is applied on overflows (scenario results) before the decision is written to the local DB or pushed to the API. Parsers in postoverflows are meant to be used for "expensive" enrichment/parsing processes that you do not want to perform on all incoming events, but rather on decisions that are about to be taken.
An example could be a slack/mattermost enrichment plugin that requires human confirmation before applying the decision, or reverse-dns lookup operations.


@ -1,123 +0,0 @@
### **Event**
The objects that are processed within {{crowdsec.name}} are named "Events".
An Event can be a log line, or an overflow result. This object's layout revolves around a few important fields:
- `Parsed` is an associative array that will be used during parsing to store temporary variables or processing results.
- `Enriched`, very similar to `Parsed`, is an associative array intended to be used by the enrichment process.
- `Overflow` is a `SignalOccurence` structure that represents information about a triggered scenario, when applicable.
- `Meta` is an associative array that will be used to keep track of meta information about the event.
_Other fields omitted for clarity, see [`pkg/types/event.go`](https://github.com/crowdsecurity/crowdsec/blob/master/pkg/types/event.go) for detailed definition_
### **Overflow or SignalOccurence**
This object holds the relevant information about a scenario that happened : who / when / where / what etc.
Its most relevant fields are :
- `Scenario` : name of the scenario
- `Alert_message` : a human-readable message about what happened
- `Events_count` : the number of individual events that led to said overflow
- `Start_at` + `Stop_at` : timestamps of the first and last events that triggered the scenario
- `Source` : a binary representation of the source of the attack
- `Source_[ip,range,AutonomousSystemNumber,AutonomousSystemOrganization,Country]` : string representation of source information
- `Labels` : an associative array representing the scenario "labels" (see scenario definition)
_Other fields omitted for clarity, see [`pkg/types/signal_occurence.go`](https://github.com/crowdsecurity/crowdsec/blob/master/pkg/types/signal_occurence.go) for detailed definition_
### **Acquisition**
Acquisition and its config (`acquis.yaml`) specify a list of files/streams to read from (at the time of writing, files are the only input stream supported).
On common setups, {{wizard.name}} interactive installation will take care of it.
File acquisition configuration is defined as :
```yaml
filenames: #a list of files or regexps to read from (regular expressions are supported)
  - /var/log/nginx/http_access.log
  - /var/log/nginx/https_access.log
  - /var/log/nginx/error.log
labels:
  type: nginx
---
filenames:
  - /var/log/auth.log
labels:
  type: syslog
```
The `labels` part is here to tag the incoming logs with a type. `labels.type` is used by the parsers to know which logs to process.
### **Parser**
A parser is a YAML configuration file that describes how a string is parsed. Said string can be a log line, or a field extracted from a previous parser. While a lot of parsers rely on the **GROK** approach (a.k.a. regular expression named capture groups), parsers can also reference enrichment modules to allow specific data processing.
Parsers are organized into stages to allow pipelines and branching in parsing.
See the [{{hub.name}}]({{hub.url}}) to explore parsers, or see below some examples :
- [apache2 access/error log parser](https://github.com/crowdsecurity/hub/blob/master/parsers/s01-parse/crowdsecurity/apache2-logs.yaml)
- [iptables logs parser](https://github.com/crowdsecurity/hub/blob/master/parsers/s01-parse/crowdsecurity/iptables-logs.yaml)
- [http logs post-processing](https://github.com/crowdsecurity/hub/blob/master/parsers/s02-enrich/crowdsecurity/http-logs.yaml)
### **Parser node**
A node is an individual parsing description.
Several nodes might be present in a single parser file.
### **Node success or failure**
When an {{event.htmlname}} enters a node (because the filter returned true), it can be considered as a success or a failure.
The node will be successful if a grok pattern is present and successfully returned data.
A node is considered to have failed if a grok pattern is present but didn't return data.
If no grok pattern is present, the node will be considered successful.
This ensures that once an event has been successfully parsed, it won't be processed again by other nodes.
### **Stages**
Parsers are organized into "stages" to allow pipelines and branching in parsing.
Each parser belongs to a stage, and can trigger the next stage when successful.
At the time of writing, the parsers are organized around 3 stages :
- `s00-raw` : low-level parsers, such as syslog
- `s01-parse` : most of the services parsers (ssh, nginx etc.)
- `s02-enrich` : enrichment that requires parsed events (e.g. geoip enrichment) or generic parsers that apply on parsed logs (e.g. a second-stage http parser)
The number and structure of stages can be altered by the user; the directory structure and its alphabetical order dictate the order in which stages and parsers are processed.
### **Enricher**
An enricher is a parser that will call external code to process the data instead of processing data based on a regular expression.
See the [geoip-enrich](https://github.com/crowdsecurity/hub/blob/master/parsers/s02-enrich/crowdsecurity/geoip-enrich.yaml) as an example.
### **Scenario**
A scenario is a YAML configuration file that describes a set of events characterizing a scenario.
Scenarios in {{crowdsec.name}} gravitate around the [leaky bucket](https://en.wikipedia.org/wiki/Leaky_bucket) principle.
A scenario description includes at least :
- Event eligibility rules (for example, if we're writing an ssh bruteforce detection, we only focus on logs of type `ssh_failed_auth`)
- Bucket configuration, such as the leak speed or its capacity (in our same ssh bruteforce example, we might allow 1 failed auth per 10s and no more than 5 in a short amount of time: `leakspeed: 10s`, `capacity: 5`)
- Aggregation rules : per source IP or other criteria (in our ssh bruteforce example, we will group per source IP)
The description allows many other directives to be specified (blackhole, distinct filters etc.), allowing for rather complex scenarios.
See the [{{hub.name}}]({{hub.url}}) to explore scenarios and their capabilities, or see below some examples :
- [ssh bruteforce detection](https://github.com/crowdsecurity/hub/blob/master/scenarios/crowdsecurity/ssh-bf.yaml)
- [distinct http-404 scan](https://github.com/crowdsecurity/hub/blob/master/scenarios/crowdsecurity/http-scan-uniques_404.yaml)
- [iptables port scan](https://github.com/crowdsecurity/hub/blob/master/scenarios/crowdsecurity/iptables-scan-multi_ports.yaml)
### **PostOverflow**
A postoverflow is a parser that is applied on overflows (scenario results) before the decision is written to the local DB or pushed to the API. Parsers in postoverflows are meant to be used for "expensive" enrichment/parsing processes that you do not want to perform on all incoming events, but rather on decisions that are about to be taken.
An example could be a slack/mattermost enrichment plugin that requires human confirmation before applying the decision, or reverse-dns lookup operations.


@ -1,4 +1,4 @@
Enrichers are basically {{parsers.htmlname}} that can rely on external methods to provide extra contextual information to the event. The enrichers are usually in the `s02-enrich` {{stage.htmlname}} (after most of the parsing happened).
Enricher functions should all accept a string as a parameter and return an associative string array, which will be automatically merged into the `Enriched` map of the {{event.htmlname}}.


@ -124,7 +124,7 @@ You can find details on the configuration file format of {{ref.output}}.
This directory holds all the {{parsers.htmlname}} that are enabled on your system.
The parsers are organized in {{stage.htmlname}} (which are just folders) and the {{parsers.htmlname}} themselves are yaml files.
## scenarios/


@ -5,7 +5,7 @@
!!! info
Alphabetical file order dictates the order of {{stage.htmlname}} and the order of parsers within a stage.
You can use the following command to view installed parsers:


@ -1,15 +1,34 @@
## Understanding parsers
Parsers are configurations that define a transformation on an {{event.htmlname}}. An {{event.htmlname}} can be the representation of a log line, or an overflow.
A parser is a YAML configuration file, composed of one or more individual 'parsing' nodes, that describes how a string is parsed. Said string can be a log line, or a field extracted from a previous parser. While a lot of parsers rely on the **GROK** approach (a.k.a. regular expression named capture groups), parsers can also reference enrichment modules to allow specific data processing, or use specific {{expr.htmlname}} features to perform parsing on specific data, such as JSON.
A parser itself can be used to perform various actions, including :
- Parse a string with regular expression (grok patterns)
- Enrich an event by relying on "external" code (such as the geoip-enrichment parser)
- Process one or more fields of an {{event.name}} with {{expr.htmlname}}
See the [{{hub.name}}]({{hub.url}}) to explore parsers, or see some examples below:
- [apache2 access/error log parser](https://github.com/crowdsecurity/hub/blob/master/parsers/s01-parse/crowdsecurity/apache2-logs.yaml)
- [iptables logs parser](https://github.com/crowdsecurity/hub/blob/master/parsers/s01-parse/crowdsecurity/iptables-logs.yaml)
- [http logs post-processing](https://github.com/crowdsecurity/hub/blob/master/parsers/s02-enrich/crowdsecurity/http-logs.yaml)
## Stages
The concept of stages is central to data parsing in {{crowdsec.name}}, as it allows various "steps" of parsing. All parsers belong to a given stage. While users can add or modify the stage order, the following stages exist:
- `s00-raw` : low-level parsers, such as syslog
- `s01-parse` : most of the services parsers (ssh, nginx etc.)
- `s02-enrich` : enrichment that requires parsed events (e.g. geoip enrichment) or generic parsers that apply on parsed logs (e.g. a second-stage http parser)
Every event starts in the first stage, and moves to the next stage each time it is successfully processed by a parser that has the `onsuccess` directive set to `next_stage`, until it reaches the last stage, after which it is matched against scenarios. Thus an sshd log might follow this pipeline:
- `s00-raw` : be parsed by `crowdsecurity/syslog-logs` (will move event to the next stage)
- `s01-parse` : be parsed by `crowdsecurity/sshd-logs` (will move the event to the next stage)
- `s02-enrich` : will be parsed by `crowdsecurity/geoip-enrich` and `crowdsecurity/dateparse-enrich`
## Parser configuration format
A parser node might look like :
@ -34,6 +53,8 @@ grok:
  apply_on: evt.Parsed.some_field
#statics are transformations that are applied on the event if the node is considered "successful"
statics:
  - parsed: something
    expression: JsonExtract(evt.Event.extracted_value, "nested.an_array[0]")
    #to which field the value will be written (here -> evt.Meta.log_type)
  - meta: log_type
    #and here a static value
@ -44,14 +65,12 @@ statics:
expression: "evt.Parsed.src_ip"
```
The parser nodes are processed sequentially based on the alphabetical order of {{stage.htmlname}} and subsequent files.
If the node is considered successful (grok is present and returned data, or no grok is present) and "onsuccess" equals `next_stage`, then the {{event.name}} is moved to the next stage.
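For instance (an illustrative fragment, the filter value is made up), a node that forwards successfully parsed events to the next stage would carry:

```yaml
# illustrative fragment: move the event to the next stage when this node succeeds
onsuccess: next_stage
filter: "evt.Parsed.program == 'myservice'"
grok:
  name: SYSLOGLINE   # a standard grok pattern name
  apply_on: message
```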
## Parser trees
A parser node can contain sub-nodes, to provide proper branching (on top of stages).
It can be useful when you want to apply different parsing based on different criteria, or when you have a set of candidate parsers that you want to apply to an event:
```yaml
@ -339,10 +358,10 @@ data:
## Parser concepts
### Success and failure
A parser is considered "successful" if :
- A grok pattern was present and successfully matched
- No grok pattern was present


@ -1,8 +1,5 @@
# Writing {{crowdsec.Name}} parser
!!! info
Please ensure that you have a working environment or set up a test environment before writing your parser.
!!! warning "Parser dependency"
The crowdsecurity/syslog-logs parser is needed by the core parsing
engine. Deletion or modification of this could result in {{crowdsec.name}}
@ -15,6 +12,7 @@
The simplest parser can be defined as:
```yaml
filter: 1 == 1
debug: true
@ -44,32 +42,25 @@ May 11 16:23:43 sd-126005 kernel: [47615895.771900] IN=enp1s0 OUT= MAC=00:08:a2:
May 11 16:23:50 sd-126005 kernel: [47615902.763137] IN=enp1s0 OUT= MAC=00:08:a2:0c:1f:12:00:c8:8b:e2:d6:87:08:00 SRC=44.44.44.44 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=49 ID=17451 DF PROTO=TCP SPT=53668 DPT=80 WINDOW=14600 RES=0x00 SYN URGP=0
```
## Trying our mock parser
!!! warning
Your yaml file must be in the `config/parsers/s01-parser/` directory.
For example it can be `~/crowdsec-v0.0.19/tests/config/parsers/s01-parser/myparser.yaml`, or `/etc/crowdsec/config/parsers/s01-parser/myparser.yaml`.
The {{stage.htmlname}} directory might not exist, don't forget to create it.
(deployment is assuming [you're using a test environment](/write_configurations/requirements/))
Setting up our new parser :
```bash
cd crowdsec-v0.X.Y/tests
```
```bash
mkdir -p config/parsers/s01-parser
```
```bash
cp myparser.yaml config/parsers/s01-parser/
```
Testing our new parser :
```bash
./crowdsec -c ./dev.yaml -file ./x.log -type foobar
```
<details>
<summary>Expected output</summary>
@ -264,155 +255,5 @@ DEBU[0000] move Event from stage s01-parser to s02-enrich id=shy-forest name=cr
We now have a fully functional parser for {{crowdsec.name}}!
We can either deploy it to our production systems to do stuff, or even better, contribute to the {{hub.htmlname}} !
If you want to know more about directives and possibilities, take a look at [the parser reference documentation](/references/parsers/) !
<!--
The first field that you will write is the `onsuccess` one. This one indicate what to do in case of success log parsing. Put the value `next_stage` if you want the log to be processed by the next stages in case of parsing success:
```yaml
onsuccess: next_stage
```
Then come the `filter` part.
You will mostly want to filter on the `program` of the event:
```yaml
filter: evt.Parsed.program == '<program>'
```
The `name` (please name your parser like `<github_account_name>/<parser_name>`):
```yaml
name: crowdsecurity/example
```
A small description:
```yaml
description: this parser can process X/Y/Z logs from <program>
```
The grok part:
- If you have only one type of log then you can start with the `grok` object which is defined as below:
```yaml
grok:
pattern: <your_grok_pattern_here> # can't be used with 'name'
name: <grok_name> # grok name loaded from https://github.com/crowdsecurity/crowdsec/tree/master/config/patterns. can't be used with 'pattern'
apply_on: message
statics:
- <meta|target> : <field_name>
<value|expression> : <field_value>
- <meta|target> : <field_name>
<value|expression> : <field_value>
```
The grok pattern will be applied on the `message` field of the previous success stage.
The `pattern` and `name` keyword can't be use together
- If you have more type of logs, you will have to start with the `node` keyword that is a list of grok:
```yaml
nodes:
grok:
pattern: <your_first_grok_pattern>
apply_on: message
statics:
- <meta|target> : <field_name>
<value|expression> : <field_value>
- <meta|target> : <field_name>
<value|expression> : <field_value>
grok:
pattern: <your_second_grok_pattern>
apply_on: message
statics:
- <meta|target> : <field_name>
<value|expression> : <field_value>
- <meta|target> : <field_name>
<value|expression> : <field_value>
statics:
- <meta|target> : <field_name>
<value|expression> : <field_value>
- <meta|target> : <field_name>
<value|expression> : <field_value>
```
The `statics` is a process that will set up a value for a given key in the parsed event.
For the field `name` the keyword can be either `meta` or `target`:
- `meta` : the new field will be created in the evt.Meta object to be accessible like : `evt.Meta.<new_field>`;
```yaml
meta: log_type
```
- `target`: the name of the new field:
```yaml
target: evt.source_ip
```
For the field value, it can be either `value` or `expression`:
- `value` is the value assigned, for example : `http_access_log`
```yaml
value: http_access_log
```
- `expression` the result of a parsed field, for example : `evt.Parsed.remote_addr`
```yaml
expression : evt.Parsed.remote_addr
```
The `statics` can be applied only for the grok it succeed, if it is in the `grok` object, else for whatever grok if at the root level.
Full example with NGINX:
<details>
<summary>Nginx </summary>
```yaml
filter: "evt.Parsed.program == 'nginx'"
onsuccess: next_stage
#debug: true
name: crowdsecurity/nginx-logs
description: "Parse nginx access and error logs"
nodes:
- grok:
name: NGINXACCESS
apply_on: message
statics:
- meta: log_type
value: http_access-log
- target: evt.StrTime
expression: evt.Parsed.time_local
- grok:
# and this one the error log
name: NGINXERROR
apply_on: message
statics:
- meta: log_type
value: http_error-log
- target: evt.StrTime
expression: evt.Parsed.time
# these ones apply for both grok patterns
statics:
- meta: service
value: http
- meta: source_ip
expression: "evt.Parsed.remote_addr"
- meta: http_status
expression: "evt.Parsed.status"
- meta: http_path
expression: "evt.Parsed.request"
```
</details> -->


@ -90,5 +90,5 @@ You can now jump to the next step : [writing our own parser !](/write_configurat
### Custom stage
It is possible to write custom stages. If you want some specific parsing or enrichment to be done after the `s02-enrich` stage, you can do so by creating a new folder `s03-<custom_stage>` (and so on). The configurations created in this folder will process the logs configured to go to `next_stage` in the `s02-enrich` stage.
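For example, assuming your configuration lives under a `config/` directory (as in the test environment; adapt the path to your setup, e.g. `/etc/crowdsec/config/`), creating the custom stage is just:

```bash
# create the custom stage folder; its alphabetical position (s03 after s02) makes it run last
mkdir -p config/parsers/s03-my-enrich
```

Any parser YAML file dropped into that folder will then process events leaving `s02-enrich`.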


@ -2,10 +2,9 @@ site_name: Crowdsec
nav:
- Home: index.md
- Getting Started:
- Installation : getting_started/installation.md
- Crowdsec Tour: getting_started/crowdsec-tour.md
- Concepts & Glossary : getting_started/concepts.md
- FAQ: getting_started/FAQ.md
- Guide:
- Overview: guide/crowdsec/overview.md
@ -171,18 +170,13 @@ extra:
event:
name: event
Name: Event
htmlname: "[event](/getting_started/concepts/#event)"
Htmlname: "[Event](/getting_started/concepts/#event)"
expr:
name: expr
Name: Expr
htmlname: "[expr](/write_configurations/expressions/)"
Htmlname: "[Expr](/write_configurations/expressions/)"
filter:
name: filter
Name: Filter
@ -201,13 +195,13 @@ extra:
parsers:
name: parsers
Name: Parsers
htmlname: "[parsers](/getting_started/concepts/#parser)"
Htmlname: "[Parsers](/getting_started/concepts/#parser)"
scenarios:
name: scenarios
Name: Scenarios
htmlname: "[scenarios](/getting_started/concepts/#scenario)"
Htmlname: "[Scenarios](/getting_started/concepts/#scenario)"
collections:
name: collections
Name: Collections
@ -216,13 +210,13 @@ extra:
timeMachine:
name: timeMachine
Name: TimeMachine
htmlname: "[timeMachine](/getting_started/concepts/#timemachine)"
Htmlname: "[TimeMachine](/getting_started/concepts/#timemachine)"
overflow:
name: overflow
Name: Overflow
htmlname: "[overflow](/getting_started/concepts/#overflow-or-signaloccurence)"
Htmlname: "[Overflow](/getting_started/concepts/#overflow-or-signaloccurence)"
whitelists:
name: whitelists
Name: Whitelists
@ -231,9 +225,14 @@ extra:
signal:
name: signal
Name: Signal
htmlname: "[signal](/getting_started/concepts/#overflow-or-signaloccurence)"
Htmlname: "[Signal](/getting_started/concepts/#overflow-or-signaloccurence)"
#scenario stuff
stage:
name: stage
Name: Stage
htmlname: "[stage](/getting_started/concepts/#stages)"
Htmlname: "[Stage](/getting_started/concepts/#stages)"
leakspeed:
name: leakspeed
Name: Leakspeed


@ -3,7 +3,7 @@
# Parser
Parser is in charge of turning raw log lines into objects that can be manipulated by heuristics.
Parsing has several stages represented by directories on config/stage.
The alphabetical order dictates the order in which the stages/parsers are processed.
The runtime representation of a line being parsed (or an overflow) is an `Event`, and it has fields that can be manipulated by the user: