fluent / fluent-plugin-grok-parser

Licence: other
Fluentd's Grok parser

Programming Languages

ruby
36898 projects - #4 most used programming language

Projects that are alternatives of or similar to fluent-plugin-grok-parser

fluent-plugin-multiprocess
Multiprocess agent plugin for Fluentd
Stars: ✭ 42 (-58%)
Mutual labels:  fluentd, fluentd-plugin
fluent-plugin-redis
Redis output plugin for Fluent event collector
Stars: ✭ 40 (-60%)
Mutual labels:  fluentd, fluentd-plugin
fluent-plugin-ec2-metadata
Fluentd output plugin to add Amazon EC2 metadata into messages
Stars: ✭ 43 (-57%)
Mutual labels:  fluentd, fluentd-plugin
fluent-plugin-rabbitmq
Fluent input/output plugin for RabbitMQ.
Stars: ✭ 26 (-74%)
Mutual labels:  fluentd, fluentd-plugin
fluent-plugin-windows-eventlog
Fluentd plugin to collect windows event logs
Stars: ✭ 27 (-73%)
Mutual labels:  fluentd, fluentd-plugin
fluent-plugin-http-pull
Fluentd input plugin to pull logs from a REST API.
Stars: ✭ 19 (-81%)
Mutual labels:  fluentd, fluentd-plugin
fluent-plugin-gcs
Google Cloud Storage output plugin for Fluentd.
Stars: ✭ 39 (-61%)
Mutual labels:  fluentd, fluentd-plugin
fluent-plugin-webhdfs
Hadoop WebHDFS output plugin for Fluentd
Stars: ✭ 57 (-43%)
Mutual labels:  fluentd, fluentd-plugin
fluentd-plugin-mdsd
Azure Linux monitoring agent (mdsd) output plugin for fluentd
Stars: ✭ 26 (-74%)
Mutual labels:  fluentd, fluentd-plugin
Dagger
Dagger is a log query and management system based on Loki, derived from the `Dayu infrastructure platform` of the CloudMinds cloud team. Dagger runs in front of Loki and provides log querying, searching, saving, and downloading, suited to container log management in cloud-native scenarios.
Stars: ✭ 149 (+49%)
Mutual labels:  fluentd
Fluent Logger Ruby
A structured logger for Fluentd (Ruby)
Stars: ✭ 238 (+138%)
Mutual labels:  fluentd
Terraform Aws Elasticsearch
Terraform module to provision an Elasticsearch cluster with built-in integrations with Kibana and Logstash.
Stars: ✭ 137 (+37%)
Mutual labels:  fluentd
Fluent Plugin Rewrite Tag Filter
Fluentd output filter plugin to rewrite tags that match a specified attribute.
Stars: ✭ 151 (+51%)
Mutual labels:  fluentd
Fluent Bit
Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows
Stars: ✭ 3,223 (+3123%)
Mutual labels:  fluentd
Pathivu
An efficient log ingestion and log aggregation system https://pathivu.io/
Stars: ✭ 146 (+46%)
Mutual labels:  fluentd
ansible-role-fluentbit
Ansible role that installs FluentBit
Stars: ✭ 18 (-82%)
Mutual labels:  fluentd
Aws Eks Kubernetes Masterclass
AWS EKS Kubernetes - Masterclass | DevOps, Microservices
Stars: ✭ 129 (+29%)
Mutual labels:  fluentd
Fluent Plugin Systemd
This is a fluentd input plugin. It reads logs from the systemd journal.
Stars: ✭ 124 (+24%)
Mutual labels:  fluentd
LogiAM
A one-stop log collection platform built on log templates, with dynamic control of collection tasks and precise measurement of data quality.
Stars: ✭ 199 (+99%)
Mutual labels:  fluentd
Fluent Logger Java
A structured logger for Fluentd (Java)
Stars: ✭ 186 (+86%)
Mutual labels:  fluentd

Grok Parser for Fluentd


This is a Fluentd plugin to enable Logstash's Grok-like parsing logic.

Requirements

fluent-plugin-grok-parser   fluentd      ruby
>= 2.0.0                    >= v0.14.0   >= 2.1
< 2.0.0                     >= v0.12.0   >= 1.9

What's Grok?

Grok is a macro to simplify and reuse regexes, originally developed by Jordan Sissel.

This is a partial implementation of Grok's grammar that should meet most needs.
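
The macro idea can be sketched in a few lines of Ruby. This is an illustrative toy, not the plugin's actual implementation, and the PATTERNS table below holds made-up, simplified definitions:

```ruby
# Toy grok expander: a "macro" is just a named regexp fragment that
# gets expanded into one plain regular expression with named captures.
PATTERNS = {
  "INT"  => '[+-]?\d+',
  "WORD" => '\w+',
}

# Expand %{NAME} and %{NAME:capture} references into a Regexp.
def expand_grok(pattern)
  source = pattern.gsub(/%\{(\w+)(?::(\w+))?\}/) do
    body = PATTERNS.fetch($1)
    $2 ? "(?<#{$2}>#{body})" : "(?:#{body})"
  end
  Regexp.new(source)
end

m = expand_grok('%{WORD:verb} %{INT:code}').match("GET 200")
```

Here m[:verb] is "GET" and m[:code] is "200": the grok pattern became one ordinary regex with named captures.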

How It Works

You can use it wherever you would use the format parameter to parse text. The following example extracts the first matching IP address from each log line.

<source>
  @type tail
  path /path/to/log
  tag grokked_log
  <parse>
    @type grok
    grok_pattern %{IP:ip_address}
  </parse>
</source>

If you want to try multiple grok patterns and use the first matched one, you can use the following syntax:

<source>
  @type tail
  path /path/to/log
  tag grokked_log
  <parse>
    @type grok
    <grok>
      pattern %{HTTPD_COMBINEDLOG}
      time_format "%d/%b/%Y:%H:%M:%S %z"
    </grok>
    <grok>
      pattern %{IP:ip_address}
    </grok>
    <grok>
      pattern %{GREEDYDATA:message}
    </grok>
  </parse>
</source>

Multiline support

You can parse multiline text.

<source>
  @type tail
  path /path/to/log
  tag grokked_log
  <parse>
    @type multiline_grok
    grok_pattern %{IP:ip_address}%{GREEDYDATA:message}
    multiline_start_regexp /^[^\s]/
  </parse>
</source>

You can use multiple grok patterns to parse your data.

<source>
  @type tail
  path /path/to/log
  tag grokked_log
  <parse>
    @type multiline_grok
    <grok>
      pattern Started %{WORD:verb} "%{URIPATH:pathinfo}" for %{IP:ip} at %{TIMESTAMP_ISO8601:timestamp}\nProcessing by %{WORD:controller}#%{WORD:action} as %{WORD:format}%{DATA:message}Completed %{NUMBER:response} %{WORD} in %{NUMBER:elapsed} (%{DATA:elapsed_details})
    </grok>
  </parse>
</source>

If no pattern matches, Fluentd keeps accumulating data in the buffer indefinitely while waiting for a complete record.

You can omit multiline_start_regexp only when you know your data's structure perfectly.

Configurations

  • See also: Config: Parse Section - Fluentd

  • time_format (string) (optional): The format of the time field.

  • grok_pattern (string) (optional): The grok pattern. You cannot specify multiple grok patterns with this option; use <grok> sections instead.

  • custom_pattern_path (string) (optional): Path to the file that includes custom grok patterns

  • grok_failure_key (string) (optional): The key used to store the grok failure reason.

  • grok_name_key (string) (optional): The key name used to store the matched grok section's name.

  • multiline_start_regexp (string) (optional): The regexp that matches the beginning of a multiline block. This is only for "multiline_grok".

  • grok_pattern_series (enum) (optional): Specify grok pattern series set.

    • Default value: legacy.

<grok> section (optional) (multiple)

  • name (string) (optional): The name of this grok section
  • pattern (string) (required): The pattern of grok
  • keep_time_key (bool) (optional): If true, keep time field in the record.
  • time_key (string) (optional): Specify time field for event time. If the event doesn't have this field, current time is used.
    • Default value: time.
  • time_format (string) (optional): Process the value using the specified format. This is available only when time_type is string.
  • timezone (string) (optional): Use the specified timezone. The time value can be parsed/formatted in that timezone.

Examples

Using grok_failure_key

<source>
  @type dummy
  @label @dummy
  dummy [
    { "message1": "no grok pattern matched!", "prog": "foo" },
    { "message1": "/", "prog": "bar" }
  ]
  tag dummy.log
</source>

<label @dummy>
  <filter>
    @type parser
    key_name message1
    reserve_data true
    reserve_time true
    <parse>
      @type grok
      grok_failure_key grokfailure
      <grok>
        pattern %{PATH:path}
      </grok>
    </parse>
  </filter>
  <match dummy.log>
    @type stdout
  </match>
</label>

This generates the following events:

2016-11-28 13:07:08.009131727 +0900 dummy.log: {"message1":"no grok pattern matched!","prog":"foo","message":"no grok pattern matched!","grokfailure":"No grok pattern matched"}
2016-11-28 13:07:09.010400923 +0900 dummy.log: {"message1":"/","prog":"bar","path":"/"}

Using grok_name_key

<source>
  @type tail
  path /path/to/log
  tag grokked_log
  <parse>
    @type grok
    grok_name_key grok_name
    grok_failure_key grokfailure
    <grok>
      name apache_log
      pattern %{HTTPD_COMBINEDLOG}
      time_format "%d/%b/%Y:%H:%M:%S %z"
    </grok>
    <grok>
      name ip_address
      pattern %{IP:ip_address}
    </grok>
    <grok>
      name rest_message
      pattern %{GREEDYDATA:message}
    </grok>
  </parse>
</source>

This will add keys like the following:

  • Add grok_name: "apache_log" if the record matches HTTPD_COMBINEDLOG
  • Add grok_name: "ip_address" if the record matches IP
  • Add grok_name: "rest_message" if the record matches GREEDYDATA

A grokfailure key is added to the record if the record does not match any grok pattern. See also the test code for more details.

How to parse time value using specific timezone

<source>
  @type tail
  path /path/to/log
  tag grokked_log
  <parse>
    @type grok
    <grok>
      name mylog-without-timezone
      pattern %{DATESTAMP:time} %{GREEDYDATA:message}
      timezone Asia/Tokyo
    </grok>
  </parse>
</source>

This will parse the time value in the "Asia/Tokyo" timezone.

See Config: Parse Section - Fluentd for more details about timezone.

How to write Grok patterns

Grok patterns look like %{PATTERN_NAME:name} where ":name" is optional. If "name" is provided, then it becomes a named capture. So, for example, if you have the grok pattern

%{IP} %{HOST:host}

it matches

127.0.0.1 foo.example

but only extracts "foo.example" as {"host": "foo.example"}
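
In plain Ruby terms, the behavior looks like this (the ip and host regexes below are simplified stand-ins, not the plugin's real IP and HOST patterns):

```ruby
# Simplified stand-ins for the IP and HOST grok patterns.
ip   = /(?:\d{1,3}\.){3}\d{1,3}/
host = /[a-z0-9.-]+/

# %{IP} %{HOST:host}: the unnamed part must still match but is discarded;
# only named groups become fields in the record.
m = /#{ip} (?<host>#{host})/.match("127.0.0.1 foo.example")
record = m.names.to_h { |n| [n, m[n]] }
# record => {"host" => "foo.example"}
```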

Please see patterns/* for the patterns that are supported out of the box.

How to add your own Grok pattern

You can add your own Grok patterns by creating your own Grok file and telling the plugin to read it. This is what the custom_pattern_path parameter is for.

<source>
  @type tail
  path /path/to/log
  <parse>
    @type grok
    grok_pattern %{MY_SUPER_PATTERN}
    custom_pattern_path /path/to/my_pattern
  </parse>
</source>

custom_pattern_path can be either a directory or file. If it's a directory, it reads all the files in it.
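
A pattern file uses the standard grok format: each line is a NAME followed by its regexp, and a pattern can reference previously defined ones with %{...}. The pattern names and bodies below are made up for illustration (only MY_SUPER_PATTERN comes from the config above):

```
QUEUE_ID [0-9A-F]{10,11}
MY_SUPER_PATTERN queue %{QUEUE_ID:queue_id}: %{GREEDYDATA:message}
```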

FAQs

1. How can I convert types of the matched patterns like Logstash's Grok?

Although every parsed field has type string by default, you can specify other types. This is useful when filtering particular fields numerically or storing data with sensible type information.

The syntax is

grok_pattern %{GROK_PATTERN:NAME:TYPE}...

e.g.,

grok_pattern %{INT:foo:integer}

Unspecified fields are parsed as the default string type.

The list of supported types is shown below:

  • string
  • bool
  • integer ("int" would NOT work!)
  • float
  • time
  • array

For the time and array types, there is an optional fourth field after the type name. For the "time" type, it specifies a time format, just as in time_format.

For the "array" type, it specifies the delimiter (the default is ","). For example, if a field called "item_ids" contains the value "3,4,5", types item_ids:array parses it as ["3", "4", "5"]. Alternatively, if the value is "Adam|Alice|Bob", types item_ids:array:| parses it as ["Adam", "Alice", "Bob"].
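
As a sketch of what the array conversion does (illustrative Ruby, not the plugin's code; coerce_array is a hypothetical helper), the captured string is simply split on the delimiter:

```ruby
# Hypothetical helper mirroring the "array" type conversion:
# split the captured value on a delimiter, "," by default.
def coerce_array(value, delimiter = ",")
  value.split(delimiter)
end

coerce_array("3,4,5")               # => ["3", "4", "5"]
coerce_array("Adam|Alice|Bob", "|") # => ["Adam", "Alice", "Bob"]
```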

Here is a sample config using the Grok parser with in_tail and the types parameter:

<source>
  @type tail
  path /path/to/log
  format grok
  grok_pattern %{INT:user_id:integer} paid %{NUMBER:paid_amount:float}
  tag payment
</source>

Notice

If you want to use this plugin with Fluentd v0.12.x or earlier, use v1.x of this plugin.

See also: Plugin Management | Fluentd

License

Apache 2.0 License
