Logstash has a rich set of plugins that help integrate it with a variety of input and output sources. Let's explore the available plugins.
You can execute the following command to list all available plugins in your installed Logstash version:
bin/plugin list
Also, you can list all plugins containing a name fragment by executing this command:
bin/plugin list <namefragment>
To list all plugins belonging to a group name (input, output, or filter), we can execute this command:
bin/plugin list --group <group name>
For example, to list all output plugins:
bin/plugin list --group output
Before exploring various plugin configurations, let's take a look at the data types and conditional expressions used in various Logstash configurations.
A Logstash plugin requires certain settings or properties to be set. Those properties have certain values that belong to one of the following important data types.
An array is a collection of values for a property.
An example can be seen as follows:
path => ["value1","value2"]
A boolean value is either true or false (without quotes).
An example can be seen as follows:
periodic_flush => false
Codec is actually not a data type but a way to encode or decode data at input or output.
An example can be seen as follows:
codec => "json"
This instance specifies that this codec, at output, will encode all output in JSON format.
A hash is basically a collection of key-value pairs. Each pair is specified as "key" => "value", and multiple pairs in a collection are separated by spaces.
An example can be seen as follows:
match => { "key1" => "value1" "key2" => "value2" }
String represents a sequence of characters enclosed in quotes.
An example can be seen as follows:
value => "Welcome to ELK"
Logstash conditionals are used to filter events or log lines under certain conditions. Conditionals in Logstash are handled like in other programming languages and work with if, else if, and else statements. Multiple if...else blocks can be nested.
Syntax for conditionals is as follows:
if <conditional expression1> {
  # some statements here
} else if <conditional expression2> {
  # some statements here
} else {
  # some statements here
}
Conditionals work with comparison, regexp, inclusion, boolean, and unary operators:
Comparison operators: ==, !=, <, >, <=, >=
Regexp operators: =~, !~
Inclusion operators: in, not in
Boolean operators: and, or, nand, xor
Unary operator: !
Let's take a look at this with an example:
filter {
  if [action] == "login" {
    mutate { remove => "password" }
  }
}
Multiple expressions can be specified in a single statement using boolean operators.
An example can be seen as follows:
output {
  # Send Email on Production Errors
  if [loglevel] == "ERROR" and [deployment] == "production" {
    email {
    }
  }
}
Logstash plugins are broadly of the following types: input, output, filter, and codec plugins. Now let's take a look at some of the most important input, output, filter, and codec plugins, which will be useful for building most log analysis pipeline use cases.
An input plugin is used to configure a set of events to be fed to Logstash. Some of the most important input plugins are described next.
The file plugin is used to stream events and log lines from files to Logstash. It automatically detects file rotations and reads from the position it last read up to.
A most basic file configuration looks like this:
input {
  file {
    path => "/path/to/logfiles"
  }
}
The only required configuration property is the path to the files. Let's look at how we can make use of some of the configuration properties of the file plugin to read different types of files.
The following configuration options are available for the file input plugin:
The add_field option is used to add a field to incoming events. Its value type is Hash, and its default value is {}.
Let's take the following instance as an example:
add_field => { "input_time" => "%{@timestamp}" }
The codec option is used to specify a codec, which can decode a specific type of input. For example, codec => "json" is used to decode JSON input. The default value of codec is "plain".
The exclude option is used to exclude certain types of files from the input path. Its data type is array.
Let's take the following instance as an example:
path => ["/app/packtpub/logs/*"]
exclude => "*.gz"
This will exclude all gzip files from input.
The path option is the only required configuration for the file plugin. It specifies an array of path locations from which to read logs and events.
The sincedb_path option specifies the location where the sincedb files, which keep track of the current position of monitored files, are written. The default is $HOME/.sincedb*.
The sincedb_write_interval option specifies how often (in seconds) the sincedb files, which keep track of the current position of monitored files, are written. The default is 15 seconds.
The start_position option takes two values: "beginning" and "end". It specifies where to start reading incoming files from. The default value is "end", as in most situations the plugin is used for live streaming data. However, if you are working on old data, it can be set to "beginning".
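Putting a few of these options together, here is a minimal sketch of a file input that re-reads archived logs from the beginning; the paths are hypothetical:
input {
  file {
    path => ["/app/packtpub/logs/*.log"]      # hypothetical log location
    exclude => "*.gz"                         # skip compressed archives
    start_position => "beginning"             # read old data from the start
    sincedb_path => "/opt/logstash/.sincedb"  # hypothetical sincedb location
  }
}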
The tags option specifies an array of tags that can be added to incoming events. Adding tags to your incoming events helps with later processing, when using conditionals. It is often helpful to tag certain data as "processed" and use those tags to decide a future course of action.
For example, if we specify "processed" in tags:
tags => ["processed"]
In filter, we can check in conditionals:
filter {
  if "processed" in [tags] {
  }
}
The type option is really helpful for processing different types of incoming streams using Logstash. You can configure multiple input paths for different types of events; just give each a type name, and then you can filter and process them separately.
Let's take the following instance as an example:
input {
  file {
    path => ["/var/log/syslog/*"]
    type => "syslog"
  }
  file {
    path => ["/var/log/apache/*"]
    type => "apache"
  }
}
In filter, we can filter based on type:
filter {
  if [type] == "syslog" {
    grok {
    }
  }
  if [type] == "apache" {
    grok {
    }
  }
}
As in the preceding example, we have configured a separate type for each incoming file source: "syslog" and "apache". Later, while filtering the stream, we can specify conditionals based on this type.
The stdin plugin is used to stream events and log lines from standard input.
A basic configuration for stdin looks like this:
stdin { }
When we configure stdin like this, whatever we type in the console goes as input to the Logstash event pipeline. This is mostly used as a first level of testing of a configuration before plugging in the actual file or event input.
The following configuration options are available for the stdin input plugin:
The add_field configuration for stdin is the same as add_field in the file input plugin and is used for similar purposes.
The codec option is used to decode incoming data before passing it on to the data pipeline. The default value is "line".
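As a quick illustration, here is a minimal stdin sketch for testing a pipeline from the console; the field and tag values are just examples:
input {
  stdin {
    add_field => { "input_time" => "%{@timestamp}" }  # stamp each typed line
    tags => ["console_test"]                          # hypothetical tag for later filtering
  }
}
output {
  stdout { }  # echo events back to the console
}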
You may need to analyze a Twitter stream based on a topic of interest for various purposes, such as sentiment analysis, trending topic analysis, and so on. The twitter plugin is helpful for reading events from the Twitter streaming API. It requires a consumer key, consumer secret, keywords, oauth token, and oauth token secret to work.
These details can be obtained by registering an application on the Twitter developer API page (https://dev.twitter.com/apps/new):
twitter {
  consumer_key => "your consumer key here"
  keywords => "keywords which you want to filter on streams"
  consumer_secret => "your consumer secret here"
  oauth_token => "your oauth token here"
  oauth_token_secret => "your oauth token secret here"
}
The following configuration options are available for the twitter input plugin:
The add_field configuration for the twitter plugin is the same as add_field in the file input plugin and is used for similar purposes.
The codec configuration for twitter is the same as codec in the file input plugin and is used for similar purposes.
The consumer_key option is a required configuration of the String type, with no default value. Its value can be obtained from the Twitter app registration page.
The full_tweet option is a boolean configuration with the default value false. It specifies whether to record the full tweet object obtained from the Twitter streaming API.
The keywords option is a required array configuration with no default value. It specifies a set of keywords to track from the Twitter stream.
An example can be seen as follows:
keywords => ["elk","packtpub"]
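Combining these options, a sketch of a fuller twitter input might look like this; the credential placeholders must be replaced with your own values:
twitter {
  consumer_key => "your consumer key here"
  consumer_secret => "your consumer secret here"
  oauth_token => "your oauth token here"
  oauth_token_secret => "your oauth token secret here"
  keywords => ["elk","packtpub"]  # topics to track
  full_tweet => true              # record the complete tweet object
}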
The lumberjack plugin is useful for receiving events via the lumberjack protocol, which is used by Logstash forwarder.
The basic required configuration options for the lumberjack plugin look like this:
lumberjack {
  port =>
  ssl_certificate =>
  ssl_key =>
}
Lumberjack, or Logstash forwarder, is a lightweight log shipper used to ship log events from source systems. Logstash is quite a memory-consuming process, so installing it on every node from which you want to ship data is not recommended. Logstash forwarder is a lightweight version of Logstash that provides low-latency, secure, and reliable transfer with low resource usage.
More details about Lumberjack or Logstash forwarder can be found at https://github.com/elastic/logstash-forwarder.
The following configuration options are available for the lumberjack input plugin:
The add_field configuration for the lumberjack plugin is the same as add_field in the file input plugin and is used for similar purposes.
The codec configuration for the lumberjack plugin is the same as codec in the file input plugin and is used for similar purposes.
The port option is a required number configuration with no default value; it specifies the port to listen on.
The ssl_certificate option specifies the path to the SSL certificate to be used for the connection. It is a required setting.
An example is as follows:
ssl_certificate => "/etc/ssl/logstash.pub"
The ssl_key option specifies the path to the SSL key to be used for the connection. It is also a required setting.
An example is as follows:
ssl_key => "/etc/ssl/logstash.key"
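Putting the three required settings together, a minimal sketch; the port number is illustrative:
lumberjack {
  port => 5043                                # illustrative port to listen on
  ssl_certificate => "/etc/ssl/logstash.pub"  # public certificate
  ssl_key => "/etc/ssl/logstash.key"          # private key
}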
The redis plugin is used to read events and logs from a redis instance.
The basic configuration of the redis input plugin looks like this:
redis { }
The following configuration options are available for the redis input plugin:
The add_field configuration for redis is the same as add_field in the file input plugin and is used for similar purposes.
The codec configuration for redis is the same as codec in the file input plugin and is used for similar purposes.
The data_type option can have a value of either "list", "channel", or "pattern_channel".
From the Logstash documentation for the redis plugin (https://www.elastic.co/guide/en/logstash/current/plugins-inputs-redis.html):
"If
redis_type
islist
, then we will BLPOP the key. Ifredis_type
ischannel
, then we will SUBSCRIBE to the key. Ifredis_type
ispattern_channel
, then we will PSUBSCRIBE to the key."
The port option specifies the port on which the redis instance is running. The default is 6379.
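As a sketch, a redis input reading from a list might look like this; the host and key name are illustrative, and key is the companion option that names the list or channel to read from:
redis {
  host => "127.0.0.1"  # redis instance to read from
  port => 6379         # default redis port
  data_type => "list"  # BLPOP events from a list
  key => "logstash"    # illustrative key name
}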
An extensive list and latest documentation on all available Logstash input plugins is available at https://www.elastic.co/guide/en/logstash/current/input-plugins.html.
Now that we have seen some of the most important input plugins for Logstash, let's have a look at some output plugins.
Logstash provides a wide variety of output plugins that help integrate incoming events with almost any type of destination. Let's look at some of the most used output plugins in detail.
The csv plugin is used to write output to a CSV file, specifying the fields for the CSV and the path of the file.
The basic configuration of the csv output plugin looks like this:
csv {
  fields => ["date","open_price","close_price"]
  path => "/path/to/file.csv"
}
The following are the configuration options available for the csv plugin:
The codec option is used to encode the data before it goes out of Logstash. The default value is "plain", which will output the data as it is.
The csv_options option is used to specify advanced options for the csv output, including changing the default column and row separators.
An example is as follows:
csv_options => {"col_sep" => "\t" "row_sep" => "\r\n"}
The fields setting is a required setting used to specify the fields for the output CSV file. It is specified as an array of field names, which are written in the same order as in the array. There is no default value for this setting.
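Combining these settings, a sketch of a complete csv output; the field names and path are illustrative:
output {
  csv {
    fields => ["date","open_price","close_price"]  # column order in the file
    path => "/path/to/file.csv"
    csv_options => {"col_sep" => "\t"}             # tab-separated columns instead of commas
  }
}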
The file output plugin, just like the file input plugin, is used to write events to a file in the file system.
The basic configuration of the file output plugin looks like this:
file {
  path => "path/to/file"
}
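The path setting can also reference event fields, so events can be routed to different files; a sketch, assuming each event carries a type field:
output {
  file {
    path => "/var/log/logstash/%{type}.log"  # one file per event type (illustrative path)
  }
}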
The email plugin is a very important output plugin, as it is very useful for sending e-mails for certain events and failure scenarios.
The basic required configuration looks like this:
email {
  to => "[email protected]"
}
The following configuration options are available for the email plugin:
The attachments option is an array of file paths to be attached to the e-mail. The default value is [].
The cc option specifies the list of e-mail addresses to be included as the cc addresses in the e-mail. It accepts multiple e-mail IDs in a comma-separated format.
The from option specifies the e-mail address to be used as the sender address in the e-mail. The default value is "[email protected]" and must be overridden as per the type of alerts or system.
The to option is a required setting that specifies the receiver address for the e-mail. It can also be expressed as a string of comma-separated e-mail addresses.
The htmlbody option specifies the body of the e-mail in HTML format, including HTML markup tags in the e-mail body.
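A sketch of an alerting configuration that combines these options; the addresses are illustrative, and subject is another option of the plugin, used to set the e-mail subject line:
output {
  if [loglevel] == "ERROR" {
    email {
      to => "ops@example.com"         # illustrative receiver address
      from => "logstash@example.com"  # illustrative sender address
      subject => "Production error"   # illustrative subject line
      htmlbody => "<h3>Error</h3><p>%{message}</p>"
    }
  }
}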
The elasticsearch plugin is the most important plugin used in the ELK Stack, because it is where you will want to write your output to be stored for later analysis in Kibana. We will take a look at Elasticsearch in more detail in Chapter 5, Why Do We Need Elasticsearch in ELK?, but let's look at the configuration options for this plugin here.
The basic configuration for the elasticsearch plugin looks like this:
elasticsearch { }
Some of the most important configuration options, along with their data types, whether they are required, and their default values, are listed in the elasticsearch output plugin documentation at https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html.
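A minimal sketch, assuming a local Elasticsearch node and the conventional daily index naming; note that option names can vary across Logstash versions:
output {
  elasticsearch {
    host => "localhost"                 # illustrative Elasticsearch host
    index => "logstash-%{+YYYY.MM.dd}"  # conventional daily index pattern
  }
}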
Ganglia is a monitoring tool that is used to monitor the performance of a cluster of machines in a distributed computing environment. Ganglia makes use of a daemon called Gmond, a small service that is installed on each machine that needs to be monitored.
The ganglia output plugin in Logstash is used to send metrics to the gmond service based on events in logs.
The basic ganglia output plugin configuration looks like this:
ganglia {
  metric =>
  value =>
}
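A sketch of a filled-in configuration; the gmond host and metric name are illustrative:
ganglia {
  host => "gmond.example.com"  # illustrative gmond host
  metric => "error_count"      # illustrative metric name
  value => "1"                 # value reported for each matching event
}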
The jira plugin doesn't come with the default Logstash installation, but it can be easily installed with a plugin install command like this:
bin/plugin install logstash-output-jira
The jira plugin is used to send events to a JIRA instance, so that JIRA tickets can be created based on certain events in your logs. To use it, the JIRA instance must accept REST API calls, since the plugin internally makes use of the JIRA REST API to pass output events from Logstash to JIRA.
The basic configuration of the jira output plugin looks like this:
jira {
  issuetypeid =>
  password =>
  priority =>
  projectid =>
  summary =>
  username =>
}
As explained on the Hortonworks Kafka page (http://hortonworks.com/hadoop/kafka/):
"Apache™ Kafka is a fast, scalable, durable, and fault-tolerant publish-subscribe messaging system."
The kafka output plugin is used to write certain events to a topic on Kafka. It uses the Kafka Producer API to write messages to a topic on the broker.
The basic kafka configuration looks like this:
kafka {
  topic_id =>
}
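A sketch with illustrative values, assuming a local broker; broker_list is the option naming the broker addresses in this generation of the plugin:
kafka {
  broker_list => "localhost:9092"  # illustrative broker address
  topic_id => "logstash_events"    # illustrative topic name
}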
The lumberjack plugin is used to write output via the lumberjack protocol, for example to another Logstash instance.
The basic configuration for the lumberjack output plugin looks like this:
lumberjack {
  hosts =>
  port =>
  ssl_certificate =>
}
RabbitMQ is an open source message broker software (sometimes called message-oriented middleware) that implements the Advanced Message Queuing Protocol (AMQP). More information is available in the official documentation at http://www.rabbitmq.com.
In RabbitMQ, the producer always sends messages to an exchange, and the exchange decides what to do with them. There are various exchange types that define a further course of action for the messages, namely direct, topic, headers, and fanout.
The rabbitmq plugin pushes events from logs to a RabbitMQ exchange.
The basic configuration of the rabbitmq plugin looks like this:
rabbitmq {
  exchange =>
  exchange_type =>
  host =>
}
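A sketch with illustrative values; the exchange name and host are assumptions:
rabbitmq {
  host => "localhost"        # illustrative RabbitMQ host
  exchange => "logstash"     # illustrative exchange name
  exchange_type => "direct"  # one of direct, topic, headers, fanout
}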
The stdout plugin writes output events to the console. It is used to debug a configuration, to test the event output from Logstash before integrating it with other systems.
The basic configuration looks like this:
output {
  stdout { }
}
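It is commonly combined with the rubydebug codec (covered later in this chapter) to pretty-print the full structure of each event while debugging:
output {
  stdout { codec => rubydebug }  # pretty-print every field of each event
}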
MongoDB is a document-oriented NoSQL database, which stores data as JSON documents.
Like the jira plugin, this is a community-maintained plugin that doesn't ship with Logstash. It can be easily installed using the following plugin install command:
bin/plugin install logstash-output-mongodb
The basic configuration for the mongodb output plugin is:
mongodb {
  collection =>
  database =>
  uri =>
}
The following configuration options are available for the mongodb plugin:
The uri option specifies the connection string to be used to connect to mongodb.
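A sketch with illustrative values; the connection string, database, and collection names are assumptions:
mongodb {
  uri => "mongodb://localhost:27017"  # illustrative connection string
  database => "logs"                  # illustrative database name
  collection => "events"              # illustrative collection name
}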
An extensive list and latest documentation on all available Logstash output plugins is available at https://www.elastic.co/guide/en/logstash/current/output-plugins.html.
Filter plugins are used to do intermediate processing on events read from an input plugin and before passing them as output via an output plugin. They are often used to identify the fields in input events, and to conditionally process certain parts of input events.
Let's take a look at some of the most important filter plugins.
The csv filter is used to parse the data from an incoming CSV file and assign values to fields.
Configuration options for the csv filter plugin were covered in an example in Chapter 2, Building Your First Data Pipeline with ELK.
In ELK, it is very important to assign the correct timestamp to an event, so that it can be analyzed using the time filter in Kibana. The date filter is meant to assign the appropriate timestamp based on fields in the logs or events, and to assign a proper format to the timestamp.
If the date filter is not set, Logstash will assign a timestamp based on when it first sees the event or when the file is read.
The basic configuration of the date filter looks like this:
date { }
Configuration options for the date filter are already covered in an example in Chapter 2, Building Your First Data Pipeline with ELK.
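As a quick reminder, here is a sketch that parses an Apache-style timestamp into the event's @timestamp; the source field name timestamp is an assumption:
filter {
  date {
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]  # parse the timestamp field
  }
}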
The drop filter is used to drop everything that matches its conditionals.
Let's take the following instance as an example:
filter {
  if [fieldname] == "test" {
    drop { }
  }
}
The preceding filter will cause all events whose fieldname field has the value test to be dropped. This is very helpful for filtering out non-useful information from incoming events.
The geoip filter is used to add the geographical location of the IP address present in an incoming event. It fetches this information from the MaxMind database.
MaxMind is a company that specializes in products that derive useful information from IP addresses. GeoIP is its IP intelligence product, used to trace the location of an IP address. All Logstash releases ship with MaxMind's GeoLite city database, which is also available at http://dev.maxmind.com/geoip/legacy/geolite/.
The basic configuration of the geoip filter looks like this:
geoip { source => }
The following configuration option is available for the geoip plugin.
The source option is a required setting of the string type. It is used to specify the field containing the IP address or hostname that has to be mapped via the geoip service. Any field from the event that contains the IP address or hostname can be provided; if the field is of the array type, only the first value is taken.
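A sketch, assuming incoming events carry a client_ip field holding the address to look up:
filter {
  geoip {
    source => "client_ip"  # field containing the IP address
  }
}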
The grok filter is by far the most popular and most powerful plugin that Logstash has. It can parse any unstructured log event and convert it into a structured set of fields that can be processed further and used in analysis.
It is used to parse any type of log, whether it be Apache logs, MySQL logs, custom application logs, or just any unstructured text in events.
Logstash, by default, comes with a set of grok patterns that can be directly used to tag certain types of fields; custom regular expressions are also supported.
All available grok patterns are available at https://github.com/logstash-plugins/logstash-patterns-core/tree/master/patterns.
Some examples of grok patterns are as follows:
HOSTNAME \b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
DAY (?:Mon(?:day)?|Tue(?:sday)?|Wed(?:nesday)?|Thu(?:rsday)?|Fri(?:day)?|Sat(?:urday)?|Sun(?:day)?)
YEAR (?>\d\d){1,2}
HOUR (?:2[0123]|[01]?[0-9])
MINUTE (?:[0-5][0-9])
The preceding grok patterns can be directly used to tag fields of those types with an operator like this:
%{HOSTNAME:host_name}
Here, host_name is the field name that we want to assign to the part of the log event that represents a hostname-like string.
Let's look at grok in more detail.
The grok patterns in logs are represented by this general format: %{SYNTAX:SEMANTIC}
Here, SYNTAX is the name of the pattern that matches the text in the log, and SEMANTIC is the field name that we want to assign to that pattern.
For example, let's say you want to represent the number of bytes transferred in one event:
%{NUMBER:bytes_transferred}
Here, bytes_transferred will refer to the actual value of bytes transferred in the log event.
Let's take a look at how we can represent a line from HTTP logs:
54.3.245.1 GET /index.html 14562 0.056
The grok pattern would be represented as:
%{IP:client_ip} %{WORD:request_method} %{URIPATHPARAM:uri_path} %{NUMBER:bytes_transferred} %{NUMBER:duration}
The basic grok configuration for the preceding event will look like this:
filter {
  grok {
    match => { "message" => "%{IP:client_ip} %{WORD:request_method} %{URIPATHPARAM:uri_path} %{NUMBER:bytes_transferred} %{NUMBER:duration}" }
  }
}
After being processed with this grok filter, we can see the following fields added to the event, with their values:
client_ip: 54.3.245.1
request_method: GET
uri_path: /index.html
bytes_transferred: 14562
duration: 0.056
If a suitable pattern is not found in the list of available grok patterns, a custom grok pattern can be created based on a regular expression.
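For instance, a custom pattern can be written inline as a named capture; here a hypothetical 10- or 11-character hexadecimal queue ID is captured into a queue_id field:
filter {
  grok {
    match => { "message" => "(?<queue_id>[0-9A-F]{10,11})" }  # inline custom pattern
  }
}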
These URLs are useful for designing and testing grok patterns against the text to be matched:
http://grokdebug.herokuapp.com and http://grokconstructor.appspot.com/
The mutate filter is an important filter plugin that helps rename, remove, replace, and modify fields in an incoming event. It is also used to convert the data type of fields, merge two fields, and convert text from lowercase to uppercase and vice versa.
The basic configuration of the mutate filter looks like this:
filter {
  mutate {
  }
}
There are various configuration options for mutate, and most of them are self-explanatory from their names. The full list of options, along with their data types, whether they are required, and their default values, is available in the mutate filter documentation at https://www.elastic.co/guide/en/logstash/current/plugins-filters-mutate.html.
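A sketch that exercises a few of the common mutate options; the field names are assumptions:
filter {
  mutate {
    rename => { "hostip" => "client_ip" }  # rename a field
    convert => { "duration" => "float" }   # change a field's data type
    uppercase => ["loglevel"]              # uppercase a field's value
  }
}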
The sleep filter is used to put Logstash to sleep for the amount of time specified. We can also specify the frequency of sleep intervals based on the number of events.
Let's take the following instance as an example:
If we want to let Logstash sleep for 1 second for every fifth event processed, we can configure it like this:
filter {
  sleep {
    time => "1"  # Sleep 1 second
    every => 5   # Sleep on every 5th event
  }
}
An extensive list and the latest documentation on all available Logstash filter plugins is available at https://www.elastic.co/guide/en/logstash/current/filter-plugins.html.
Codec plugins are used to encode or decode incoming or outgoing events from Logstash. They act as stream filters in input and output plugins.
Some of the most important codec plugins are:
avro
json
line
multiline
plain
rubydebug
spool
Let's take a look at some details about some of the most commonly used ones.
If your input or output events consist of full JSON documents, then the json codec plugin is helpful. It can be defined as:
input {
  stdin {
    codec => json { }
  }
}
Or it can be simply defined as:
input {
  stdin {
    codec => "json"
  }
}
The line codec is used to read each line of input as an event or to decode each outgoing event as a line. It can be defined as:
input {
  stdin {
    codec => line { }
  }
}
Or it can be simply defined as:
input {
  stdin {
    codec => "line"
  }
}
The multiline codec is very helpful for certain types of events, where you would like to treat more than one line as a single event. This is really helpful in cases such as Java exceptions or stack traces.
For example, the following configuration can take a full stack trace as one event:
input {
  file {
    path => "/var/log/someapp.log"
    codec => multiline {
      pattern => "^%{TIMESTAMP_ISO8601} "
      negate => true
      what => previous
    }
  }
}
This will take all lines that don't start with a timestamp as a part of the previous line and consider everything as a single event.
The plain codec is used to specify that there is no encoding or decoding required for events, as it will be taken care of by the corresponding input or output plugin itself. For many plugins, such as redis, mongodb, and so on, this is the default codec type.
The rubydebug codec is used only with output event data, and it prints output event data using the Ruby Awesome Print library.
An extensive list and latest documentation on all available Logstash codec plugins is available at https://www.elastic.co/guide/en/logstash/current/codec-plugins.html.