
Parser

The parser is in charge of turning raw log lines into objects that can be manipulated by heuristics. Parsing has several stages, represented by directories in config/stage. Their alphabetical order dictates the order in which stages and parsers are processed.

The runtime representation of a line being parsed (or an overflow) is an Event, which has fields that can be manipulated by the user:

  • Parsed : a string dict containing parser outputs
  • Meta : a string dict containing meta information about the event
  • Line : a raw line representation
  • Overflow : a representation of the overflow if applicable

The Event structure goes through the stages, being altered with each parsing step. It's the same object that will be later poured into buckets.
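As a rough illustration of "one object altered by each step", here is a minimal Go sketch. The `Event`, `Line`, `Overflow`, `Stage`, `NewEvent` and `RunStages` names are hypothetical simplifications for this example, not the parser's real types; the real Event carries more fields.

```go
package main

// Event is a simplified, hypothetical stand-in for the parser's runtime Event.
// It only models the user-facing fields listed above.
type Event struct {
	Parsed   map[string]string // parser outputs
	Meta     map[string]string // meta information about the event
	Line     Line              // raw line representation
	Overflow *Overflow         // non-nil only for overflow events
}

// Line holds the raw log line and where it came from (simplified).
type Line struct {
	Raw    string
	Src    string
	Labels map[string]string
}

// Overflow is a placeholder for the overflow representation.
type Overflow struct {
	Scenario string
}

// NewEvent builds an Event with empty Parsed/Meta maps so that every stage
// can write into them without a nil-map panic.
func NewEvent(raw, src string, labels map[string]string) *Event {
	return &Event{
		Parsed: map[string]string{},
		Meta:   map[string]string{},
		Line:   Line{Raw: raw, Src: src, Labels: labels},
	}
}

// Stage is a hypothetical stage function that mutates the event in place.
type Stage func(*Event)

// RunStages passes the same Event through each stage in order, so each stage
// sees the changes made by the previous ones.
func RunStages(evt *Event, stages []Stage) *Event {
	for _, s := range stages {
		s(evt)
	}
	return evt
}
```

Passing a pointer rather than copying the struct is what lets every stage build on the previous one's output.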

Parser configuration

A parser configuration is a Node object, which can contain grok patterns and enrichment instructions.

For example:

filter: "evt.Line.Labels.type == 'testlog'"
debug: true
onsuccess: next_stage
name: tests/base-grok
pattern_syntax:
  MYCAP: ".*"
nodes:
  - grok:
      pattern: ^xxheader %{MYCAP:extracted_value} trailing stuff$
      apply_on: Line.Raw
statics:
  - meta: log_type
    value: parsed_testlog

Name

Optional. If present, and Prometheus or profiling is enabled, stats are generated for this node.

Filter

filter: "evt.Line.Src endsWith '/foobar'"

  • filter (optional): an expression evaluated against the runtime representation of a line (Event)
    • if the filter is present and returns false, the node is not evaluated
    • if the filter is absent, or present and returns true, the node is evaluated
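The two filter rules reduce to a small decision. In this sketch the filter expression is modeled as a Go predicate, which is an assumption: the parser itself compiles filter strings with an expression engine. `Event`, `Filter`, `ShouldEvaluate` and `endsWithFoobar` are hypothetical names.

```go
package main

import "strings"

// Event is a minimal, hypothetical event carrying only the line source.
type Event struct {
	Src string
}

// Filter stands in for a compiled filter expression (modeled as a predicate).
type Filter func(*Event) bool

// ShouldEvaluate applies the rules above: no filter means the node is
// evaluated; otherwise the node is evaluated only if the filter returns true.
func ShouldEvaluate(f Filter, evt *Event) bool {
	if f == nil {
		return true
	}
	return f(evt)
}

// endsWithFoobar mimics an `endsWith '/foobar'` filter on the line source.
func endsWithFoobar(evt *Event) bool {
	return strings.HasSuffix(evt.Src, "/foobar")
}
```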

Debug flag

debug: true

  • debug (optional): a bool that enables debug mode for the node (applies both at runtime and during configuration parsing)

OnSuccess flag

onsuccess: next_stage|continue

  • onsuccess (mandatory): indicates the behavior to follow if the node succeeds. next_stage makes the line go to the next stage, while continue keeps processing the current stage.
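A minimal Go sketch of how onsuccess could drive the nodes of one stage. `Node`, `Match` and `ProcessStage` are hypothetical; `Match` stands in for the node's whole filter/grok/statics evaluation, and only the next_stage/continue distinction described above is modeled.

```go
package main

// Node is a hypothetical, minimal node: a name, an onsuccess mode and a
// match function standing in for the node's full evaluation.
type Node struct {
	Name      string
	OnSuccess string // "next_stage" or "continue"
	Match     func(line string) bool
}

// ProcessStage runs the stage's nodes in order and returns the names of the
// nodes that were evaluated, plus whether any node succeeded. A successful
// node with next_stage stops the stage; with continue, later nodes still run.
func ProcessStage(nodes []Node, line string) (evaluated []string, ok bool) {
	for _, n := range nodes {
		evaluated = append(evaluated, n.Name)
		if !n.Match(line) {
			continue
		}
		ok = true
		if n.OnSuccess == "next_stage" {
			break
		}
	}
	return evaluated, ok
}
```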

Statics

statics:
    - meta: service
      value: tcp
    - meta: source_ip
      expression: "Event['source_ip']"
    - parsed: "new_connection"
      expression: "Event['tcpflags'] contains 'S' ? 'true' : 'false'"
    - target: Parsed.this_is_a_test
      value: foobar

Statics apply when a node is considered successful, and are used to alter the Event structure. An empty node, a node with a grok pattern that succeeded, or an enrichment directive that worked are all successful nodes. The destination of a static can be:

  • meta: add/alter an entry in the Meta dict
  • parsed: add/alter an entry in the Parsed dict
  • target: indicate a destination field by name, such as Meta.my_key

The source of data can be:

  • value: a static value
  • expression: the result of an expression

Grok patterns

Grok patterns are used to parse one field of Event into one or several others :

grok:
  name: "TCPDUMP_OUTPUT"
  apply_on: message

name is the name of a pattern loaded from patterns/. Base patterns can be found in the grokky repository: https://github.com/crowdsecurity/grokky/blob/master/base.go


grok:
  pattern: "^%{GREEDYDATA:request}\\?%{GREEDYDATA:http_args}$"
  apply_on: request

pattern holds an inline grok pattern instead of a named one, optionally with an apply_on that indicates which field it should be applied to.

Patterns syntax

Present at the node level, pattern_syntax declares sub-grok patterns (name: regular expression) that can then be referenced from grok patterns.

pattern_syntax:
  DIR: "^.*/"
  FILE: "[^/].*$"

Enrichment

The Enrichment mechanism is exposed via statics :

statics:
  - method: GeoIpCity
    expression: Meta.source_ip
  - meta: IsoCode
    expression: Enriched.IsoCode
  - meta: IsInEU
    expression: Enriched.IsInEU

The GeoIpCity method is called with the value of Meta.source_ip. Enrichment plugins can output one or more key/value pairs in the Enriched map, and it's up to the user to copy the relevant values to Meta or elsewhere.

Trees

The Node object also accepts a nodes entry, which is a list of Node entries, allowing you to build trees.

filter: "Event['program'] == 'nginx'" #A
nodes: #A'
  - grok: #B
      name: "NGINXACCESS"
      # these statics apply only if the above grok pattern matched
      statics: #B'
        - meta: log_type
          value: "http_access-log"
  - grok: #C
      name: "NGINXERROR"
      statics:
        - meta: log_type
          value: "http_error-log"
statics: #D
  - meta: service
    value: http

The evaluation process of a node is as follows:

  • apply the filter (A), if it doesn't match, exit
  • iterate over the list of nodes (A') and apply the node process to each.
  • if a grok entry is present, process it
    • if the grok entry returned data, apply the local statics of the node (if the grok 'B' was successful, apply B' statics)
  • if any of the nodes or the grok was successful, apply the statics (D)

Code Organisation

Main structs :

  • Node (node.go) : the runtime representation of parser configuration
  • Event (runtime.go) : the runtime representation of the line being parsed

Main funcs :

  • CompileNode : turns YAML into runtime-ready tree (Node)
  • ProcessNode : processes the raw line against the parser tree and produces ready-for-buckets data