## Logstash Filter

Logstash filter plugins perform intermediary processing on logs. In our production we use the `grok` filter to extract various pieces of information from different service logs and to add tags to them. We use filters because the extracted fields make searching and data visualization in Kibana much easier.

### What is the grok filter?

There are many filter plugins already developed for you to use; you can find them [here](https://www.elastic.co/guide/en/logstash/current/filter-plugins.html). For our production Logstash, we use the grok filter to parse logs collected by Filebeat.

Grok works by combining text patterns into something that matches your logs. The syntax for a grok pattern is `%{SYNTAX:SEMANTIC}`. Text patterns are defined by regular expressions; you can find many predefined patterns [here](https://github.com/elastic/logstash/blob/v1.4.2/patterns/grok-patterns). If you want to build your own patterns, you will find [this](http://grokdebug.herokuapp.com) and [this](http://grokconstructor.appspot.com/) application quite useful.

The `SYNTAX` is the name of the pattern that will match your text. For example, `3.44` will be matched by the `NUMBER` pattern and `55.3.244.1` will be matched by the `IP` pattern. The syntax is how you match.

The `SEMANTIC` is the identifier you give to the piece of text being matched. For example, `3.44` could be the duration of an event, so you could simply call it `duration`. Further, the string `55.3.244.1` might identify the `client` making a request.

For the above example, your grok filter would look something like this:

```shell
%{NUMBER:duration} %{IP:client}
```

### Production Logstash filter

The production Logstash filter is configured as below:

```json
filter {
  if [type] == "neutron" or [type] == "horizon" or [type] == "nova" or [type] == "glance" or [type] == "cinder" or [type] == "keystone" or [type] == "ceilometer" or [type] == "heat" {
    grok {
      patterns_dir => ["/opt/logstash/patterns"]
      match => { 'message' => '%{STACKPATTERN}' }
    }
  }
  if [type] == "horizon" and [module] == "openstack_auth.forms" {
    grok {
      match => { 'logmessage' => '%{QUOTEDSTRING:username}' }
    }
  }
}
```

When Filebeat collects logs, it adds a `type` field to each log event according to which OpenStack service the log message comes from, such as `nova`, `glance`, or `horizon`. This field is for use in Logstash: when Logstash receives these logs from Filebeat, it parses them with (possibly) different patterns depending on the service type.

The grok patterns are defined in `/opt/logstash/patterns`. The production patterns are shown below:

```shell
MODULE \b(?:[0-9A-Za-z_][0-9A-Za-z-_]{0,62})(?:\.(?:[0-9A-Za-z_][0-9A-Za-z-_]{0,62}))*(\.?|\b)
STACKPATTERN %{TIMESTAMP_ISO8601:date} %{BASE10NUM:processID} %{LOGLEVEL:loglevel} %{MODULE:module} ?(\[(%{NOTSPACE:request})?\s?(%{NOTSPACE:user})?\s?(%{NOTSPACE:tenant})?\s?(%{NOTSPACE:domain})?\s?(%{NOTSPACE:user_domain})?\s?(%{NOTSPACE:project_domain})?\])? %{GREEDYDATA:logmessage}
CEPHPATTERN %{TIMESTAMP_ISO8601:date} %{GREEDYDATA:logmessage}
```

As you can see above, we can extract fields such as `date`, `processID`, `loglevel`, and `module` from log messages. These fields are then used for search and visualization in Kibana.
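
To make the extraction concrete, here is a hypothetical `nova` log line and the fields `STACKPATTERN` would roughly produce for it. The line and all of its values are invented for this illustration, so real logs will differ:

```shell
# Hypothetical input line (type: nova), invented for this illustration:
#   2017-05-12 14:25:01.123 2931 INFO nova.compute.manager [req-a4f2c9e1-3b6d-4f8e-9c2a-1d5e6f7a8b9c admin demo - - -] Instance spawned successfully.
#
# Fields STACKPATTERN would extract (roughly):
#   date       => 2017-05-12 14:25:01.123
#   processID  => 2931
#   loglevel   => INFO
#   module     => nova.compute.manager
#   request    => req-a4f2c9e1-3b6d-4f8e-9c2a-1d5e6f7a8b9c
#   user       => admin
#   tenant     => demo
#   logmessage => Instance spawned successfully.
# (domain, user_domain and project_domain would each capture the "-" placeholders.)
```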
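
Similarly, the second grok in the production filter pulls a username out of Horizon authentication messages. A hypothetical `openstack_auth.forms` log message (again invented for this illustration) might be handled as sketched below; note that `QUOTEDSTRING` keeps the surrounding quotes in the captured value:

```shell
# Hypothetical logmessage from the horizon / openstack_auth.forms module:
#   Login successful for user "alice".
#
# Field extracted by %{QUOTEDSTRING:username} (roughly):
#   username => "alice"
```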