This is Part 4 of my tutorial series on ELK on CentOS 7.

  • Part 1 - The Foundation
  • Part 2 - Elasticsearch
  • Part 3 - Kibana
  • Part 4 (This Post) - Logstash
  • Part 5 - Filebeat with Apache and Nginx

The next component of the ELK stack is Logstash. It receives data from different sources, aggregates and filters it, and prepares it to be ingested by Elasticsearch.

You don't necessarily need Logstash for a lot of the things I show in this tutorial. For example, Filebeat can send logs directly to Elasticsearch without going through Logstash. However, there are two reasons to use Logstash anyway. First, it will help you understand the ELK stack better, and second, if you're planning to collect logs from multiple servers, Logstash is the way to go.

Install Logstash

Update: Currently the .rpm and .deb packages provided by Elastic contain a bug which prevents them from installing correctly with Oracle Java 11 or OpenJDK 11. The workaround is to use Java/OpenJDK 8 during installation and then switch/upgrade to Java/OpenJDK 11. So far Logstash 6.7 seems to run fine with version 11. Please note that previous versions of Logstash (up to 6.6) only support Java/OpenJDK 8.
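
If you're unsure which Java version is active or need to switch between installed versions on CentOS 7, something along these lines should work (exact package names may differ depending on your repositories):

$ java -version                          # show the currently active Java version
$ sudo yum install java-11-openjdk -y    # install OpenJDK 11 if it's not present yet
$ sudo alternatives --config java        # interactively pick which installed Java is used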

Again, as with Elasticsearch and Kibana, we need to make sure we have Elastic's GPG key installed:

$ sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch

Then we create the repo file:

$ sudo nano /etc/yum.repos.d/logstash.repo

Dump this into the empty file:

[logstash-6.x]
name=Elastic repository for 6.x packages
baseurl=https://artifacts.elastic.co/packages/6.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md

Now we can install Logstash:

$ sudo yum install logstash -y

Enable Logstash on startup:

$ sudo systemctl enable logstash.service

And then we start Logstash:

$ sudo systemctl start logstash.service

Configure Logstash

In this part, we will configure Logstash to receive logs from a remote Nginx web server. But before we do that, we obviously need to make sure Logstash only accepts encrypted, authenticated connections; otherwise anybody could send us their logs and spam us.

Configure SSL

This section is based on a tutorial by Benjamin Knofe and has been updated for Elastic 6.x.

In your home folder, run these commands first to generate the CA key and certificate:

$ cd ~
$ openssl genrsa -out ca.key 2048
$ openssl req -x509 -new -nodes -key ca.key -sha256 -days 3650 -out ca.crt

Answer the questions; the only one that really matters is the Common Name, which should be the FQDN of your ELK server. Unfortunately, if you mistype an answer while running the last command, you have to press Ctrl+C to abort and run the whole thing again.
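
If you prefer to skip the interactive prompts entirely, you can pass the subject on the command line instead. This is just a sketch - replace the placeholder values and the made-up hostname elk.example.com with your own:

$ openssl req -x509 -new -nodes -key ca.key -sha256 -days 3650 -out ca.crt \
    -subj "/C=XX/ST=SomeState/L=SomeCity/O=SomeOrg/CN=elk.example.com"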

Logstash Certificate

Next we need to generate a certificate for Logstash. Create another file in your home folder, called logstash.conf:

$ nano logstash.conf

Dump this into the file and read the instructions below:

[req]
distinguished_name = req_distinguished_name
req_extensions = v3_req
prompt = no

[req_distinguished_name]
countryName                     = XX
stateOrProvinceName             = XXXXXX
localityName                    = XXXXXX
postalCode                      = XXXXXX
organizationName                = XXXXXX
organizationalUnitName          = XXXXXX
commonName                      = XXXXXX
emailAddress                    = XXXXXX

[v3_req]
keyUsage = keyEncipherment, dataEncipherment
extendedKeyUsage = serverAuth
subjectAltName = @alt_names

[alt_names]
DNS.1 = DOMAIN_1
DNS.2 = DOMAIN_2
DNS.3 = DOMAIN_3
DNS.4 = DOMAIN_4

In the [req_distinguished_name] section, replace the XX placeholders with the values you entered earlier when generating the ca.crt file (matching them is not strictly required, though).

In the [alt_names] section, change the entries to match all FQDNs of the ELK cluster hosts you will use this specific certificate on.

Hosts that send their log files but don't belong to the ELK cluster will get separate certificates, so don't put those in here (for example, web servers you're hosting).

In our case, since we're only running it on one machine, just keep the first entry, change it to match your ELK host's FQDN and delete the rest.
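
For such a single-host setup, the [alt_names] section shrinks to a single entry, for example (elk.example.com being a placeholder for your actual FQDN):

[alt_names]
DNS.1 = elk.example.com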

Logstash Key

Next, generate the Logstash key:

$ openssl genrsa -out logstash.key 2048
$ openssl req -sha512 -new -key logstash.key -out logstash.csr -config logstash.conf

Next, we need to get the serial number of the CA.

$ openssl x509 -in ca.crt -text -noout -serial

The last line of the output is the serial number. Copy only the number itself (without the serial= prefix) and put it into a file called serial (replace [SERIALNUMBER] but keep the quotes):

$ echo "[SERIALNUMBER]" > serial
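
For example, if the last line of the output reads serial=AEE7043A943705F3 (a made-up value), the command would be:

$ echo "AEE7043A943705F3" > serial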

Sign the Logstash Certificate

Next, we create and sign the Logstash certificate:

$ openssl x509 -days 3650 -req -sha512 -in logstash.csr -CAserial serial -CA ca.crt -CAkey ca.key -out logstash.crt -extensions v3_req -extfile logstash.conf
$ mv logstash.key logstash.key.pem && openssl pkcs8 -in logstash.key.pem -topk8 -nocrypt -out logstash.key
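
If you want a quick sanity check that the certificate was indeed signed by our CA, this should do it:

$ openssl verify -CAfile ca.crt logstash.crt    # should print: logstash.crt: OK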

Store and Secure the Logstash Certificates and Keys

Let's now create a folder to store all of this (including the configuration files) and change the file permissions. Make sure you're in the directory where all the files we just created are (should be your home folder):

$ sudo mkdir /etc/elk-certs
$ sudo mv -t /etc/elk-certs/ ca.* logstash.* serial
$ cd /etc/elk-certs
$ sudo chown logstash:root *
$ sudo chmod o-rwx *.key*

The last line removes access to all private keys for everyone except the logstash user and root.

Create Filebeat Certificates

We will create the Filebeat certificate on the same machine, since we need the CA we just created to sign it. So make sure you're still in the proper folder:

$ cd /etc/elk-certs

Create a new file called beat.conf:

$ sudo nano beat.conf

And dump this into it:

[req]
distinguished_name = req_distinguished_name
req_extensions = v3_req
prompt = no

[req_distinguished_name]
countryName                     = XX
stateOrProvinceName             = XXXXXX
localityName                    = XXXXXX
postalCode                      = XXXXXX
organizationName                = XXXXXX
organizationalUnitName          = XXXXXX
commonName                      = XXXXXX
emailAddress                    = XXXXXX

[ usr_cert ]
# Extensions for server certificates (`man x509v3_config`).
basicConstraints = CA:FALSE
nsCertType = client, server
nsComment = "OpenSSL FileBeat Server / Client Certificate"
subjectKeyIdentifier = hash
authorityKeyIdentifier = keyid,issuer:always
keyUsage = critical, digitalSignature, keyEncipherment, keyAgreement, nonRepudiation
extendedKeyUsage = serverAuth, clientAuth

[v3_req]
keyUsage = keyEncipherment, dataEncipherment
extendedKeyUsage = serverAuth, clientAuth

Make sure that "commonName" matches the hostname of the web server you will send logs from (not a hard requirement, but you get it).

Then we generate the key, the CSR and the signed certificate (note that this time I'm using sudo, since we're now in a folder which should only be writable by root):

$ sudo openssl genrsa -out beat.key 2048
$ sudo openssl req -sha512 -new -key beat.key -out beat.csr -config beat.conf
$ sudo openssl x509 -days 3650 -req -sha512 -in beat.csr -CAserial serial -CA ca.crt -CAkey ca.key -out beat.crt -extensions v3_req -extensions usr_cert  -extfile beat.conf

Secure the key file:

$ sudo chmod o-rwx beat.key

Finally, you need to copy beat.crt, beat.key and ca.crt to the web server which runs Filebeat. Of course you are free to put them anywhere on that server, but I would suggest keeping it consistent with where you keep the certificates on your ELK stack hosts, in our case /etc/elk-certs.
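
One way to get them over is scp; the user account and the hostname web01.example.com are placeholders, so adjust them to your setup:

# on the ELK host - copy the certificates to the web server
$ sudo scp /etc/elk-certs/{beat.crt,beat.key,ca.crt} user@web01.example.com:/tmp/

# on the web server - move them into place
$ sudo mkdir /etc/elk-certs
$ sudo mv /tmp/beat.crt /tmp/beat.key /tmp/ca.crt /etc/elk-certs/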

Configuring a Pipeline

A pipeline in Logstash is the process of receiving data, filtering and processing it, and then sending it on to somewhere else. We will of course send the data to Elasticsearch, but you can also send it to other destinations (Hadoop, etc.).

By default, Logstash defines a single pipeline called main. If you're running your ELK stack for one or two purposes only, that's absolutely fine. But if you're running ELK for all sorts of data crunching, I would highly recommend defining dedicated pipelines for specific purposes. For example, if you're planning to collect log files from multiple web servers (even a couple of hundred), you should define a pipeline for that. If necessary, split that further into multiple pipelines, for example along the lines of clusters that have nothing to do with each other.

That being said, keep an eye on your performance parameters. By default, the Logstash setup is tuned for a single pipeline; if you're running multiple busy pipelines, you need to adjust the settings.

In our case, we will define a new pipeline called webservers while leaving the main pipeline alone for now. Since it's the only pipeline that will actually see traffic, we don't have to look into performance tuning:

$ sudo nano /etc/logstash/pipelines.yml

Your pipelines.yml should look like this now:

# This file is where you define your pipelines. You can define multiple.
# For more information on multiple pipelines, see the documentation:
#   https://www.elastic.co/guide/en/logstash/current/multiple-pipelines.html

- pipeline.id: main
  path.config: "/etc/logstash/conf.d/*.conf"
- pipeline.id: webservers
  path.config: "/etc/logstash/webserver.conf.d/*.conf"

You only need to add the last two lines; the rest should already be there.
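
Should a pipeline ever become busy enough to need tuning, pipelines.yml is also the place where per-pipeline performance settings live. A minimal sketch with arbitrary example values (don't add this now, we don't need it for this setup):

- pipeline.id: webservers
  path.config: "/etc/logstash/webserver.conf.d/*.conf"
  pipeline.workers: 2
  pipeline.batch.size: 125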

Configuring the Pipelining Process

In the section above, we defined a new pipeline. Now we need to tell it what to do. First, make sure the configuration folder for our new pipeline exists:

$ sudo mkdir /etc/logstash/webserver.conf.d

Before we go ahead and configure the pipeline, a few words on how the configuration files are actually built. Each pipeline configuration looks like this (it's just a skeleton, don't use it as is):

input {
  ...
}

filter {
  ...
}

output {
  ...
}

As you can see, it consists of three main sections: input, filter and output. Most configuration examples keep the whole configuration in one file, and that often makes sense.

In this case though, we will split the configuration into one file containing the input and output sections and a few other files containing the filters for each type of log that comes in, specifically Apache2 and NGINX logs. This is for readability only; if you prefer to keep things together, you can keep everything in one file.
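
By the end of this series, the pipeline folder will therefore look roughly like this (the filter file names are placeholders for the files we'll create in the next part):

/etc/logstash/webserver.conf.d/
    webserver_io.conf    (input and output sections, created below)
    apache.conf          (filters for Apache logs)
    nginx.conf           (filters for NGINX logs)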

Define Input and Output

Now, let's get stuff done. First we're going to generate a file called webserver_io.conf:

$ sudo touch /etc/logstash/webserver.conf.d/webserver_io.conf

And this is how it should look:

Update: In a previous version of this tutorial, I used a different index name and defined custom index templates. The index name I'm using now is the default, which in this case will spell out filebeat-6.7-[currentdate]. The reason for the change is that with this name, it's much easier to use the built-in sample dashboards provided with the Elastic stack. Custom index templates should only be used if you really know what you're doing. In fact, when I spoke to the people at Elastic directly, they told me it's better to just use the predefined template provided by Elastic. If you want some fields added, you should submit a change to Elastic so they can amend the index template.

input {
	beats {
		port => 5044
		host => "0.0.0.0"
		ssl => true
		ssl_certificate_authorities => ["/etc/elk-certs/ca.crt"]
		ssl_certificate => "/etc/elk-certs/logstash.crt"
		ssl_key => "/etc/elk-certs/logstash.key"
		ssl_verify_mode => "force_peer"
		}
	}

output {
	elasticsearch {
		hosts => ["localhost:9200"]
		index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
		}
	}

For the details of the SSL configuration, you can check out the respective documentation on Elastic's website.

The host => "0.0.0.0" directive is to make sure, Logstash doesn't only accept connections from the localhost interface but all interfaces of the machine it's running on.

In Kibana you will be able to see the results of all web servers combined - whether they come from NGINX or Apache - as well as just the logs from specific servers. The way it's set up, all Filebeat logs go into one index; to see the data for a specific server, we will then use filters in Kibana.

You could start Logstash at this point, but since we haven't configured Filebeat to send anything yet, nothing would happen. So hold off; we'll do that in the next part.
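
What you can already do is let Logstash validate the configuration without actually starting it up for good. Assuming the default paths of the rpm package, something like this should work:

$ sudo -u logstash /usr/share/logstash/bin/logstash --path.settings /etc/logstash --config.test_and_exit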

Conclusion

Now Logstash is configured to receive logs from any Beat you point at it. This configuration works, but what you receive would be unfiltered and unparsed. That is what we will take care of in the next part.