Previous chapters looked at how Elasticsearch can be used to search and analyze data to solve various use cases. Given that data can be generated anywhere in your environment, across a wide range of source systems, a key challenge is collecting and shipping this data to your data platform.
In this chapter, we'll look at how Beats agents can be used to collect and transport events from a number of source systems to Elasticsearch. The agent you use will depend on the nature of the event (whether it's a log or metric) and how it can be extracted (using a REST API, tailing a file, or instrumenting an application). The chapter also demonstrates how Beats modules help keep the schema and format of your data consistent for downstream use cases (such as searching, alerting, and machine learning analysis).
In this chapter, we will specifically focus on the following:
To see data collection methodologies in action, this chapter will use a web server configured to serve a static HTML web page. Follow these instructions to set up and configure this component before reading ahead:
sudo apt-get update
sudo apt-get install nginx
nginx -v
sudo mkdir -p /var/www/elastic-stack-server
cd /tmp
git clone https://github.com/PacktPublishing/Getting-Started-with-Elastic-Stack-8.0.git
sudo cp -r Getting-Started-with-Elastic-Stack-8.0/Chapter6/html-webpage/* /var/www/elastic-stack-server
sudo systemctl restart nginx
sudo systemctl status nginx
Nginx should be reported as active as shown in the following screenshot:
Navigate to the virtual machine IP address using your browser to confirm the web page is accessible.
You should see the Recipe Search Service web page load as follows:
ls -lah /var/log/nginx
The following log files are generated by the web server:
Now that the web server is configured to serve the sample page, we will take a closer look at what role Elastic Beats agents play in the collection of logs, metrics, and data from machines across your environment.
A key characteristic of modern IT environments is that useful data is generated everywhere. Appliances sitting on the edge of your network can have important metrics to capture; a fleet of frontend, backend, and database servers can generate critical error or warning logs, and the application your customer runs can generate in-depth instrumentation and tracing related to application performance bottlenecks you care about. Data is not only dispersed across the environment but is also varied in terms of the nature of the information, how it's structured, and how it should be collected.
When data is generated, it typically will need to be shipped to a centralized logging platform for search, correlation, and analysis. Beats agents are designed to collect and ship the data to such a central destination.
Note
A log contains information about a specific event generated by a system or application while a metric is a measurement of the state of a system or application at a given point in time.
The following list summarizes the officially supported list of Beats agents and the types of data sources they can be used to collect:
The following sections will explore the main Beats agents in further detail.
If you have successfully set up nginx to serve the Recipe Search Service web page, you will have noticed some access and error logs being generated for every request served by the web server. The logs can often contain useful insights into how the web server is being used, the distribution of requests being made, and where requests originate from.
Collecting these logs using Filebeat is easy. Follow the instructions to set up the Filebeat agent to collect the logs and ship them to your Elasticsearch deployment for analysis:
Filebeat uses modules to organize configuration for different inputs. We will be using the out-of-the-box module for nginx in this case, which can be loaded from the modules.d/ directory in /etc/filebeat (also referred to as path.config by Filebeat):
filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml
A list of available Filebeat modules can be found at https://www.elastic.co/guide/en/beats/filebeat/8.0/filebeat-modules-overview.html.
Next, the output configuration for Filebeat is set to send events to an Elasticsearch deployment. Configure the hosts, username, and password settings as they apply to your environment:
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["localhost:9200"]
  # Protocol - either `http` (default) or `https`.
  #protocol: "https"
  username: "elastic"
  password: "changeme"
If TLS is required between Filebeat and Elasticsearch, the protocol can be set to https in the output.elasticsearch configuration block. If the destination Elasticsearch cluster does not use a TLS certificate issued by a trusted root certificate authority, further SSL settings need to be included to define certificate parameters.
SSL settings for Filebeat can be found in the reference guide: https://www.elastic.co/guide/en/beats/filebeat/8.0/configuration-ssl.html.
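If you do need TLS with a self-signed or privately issued certificate, the output block can be extended with SSL settings. The following is a minimal sketch; the certificate path shown is an assumption for illustration:

```yaml
output.elasticsearch:
  hosts: ["localhost:9200"]
  protocol: "https"
  # Assumed location of the CA certificate that signed the cluster's certificate
  ssl.certificate_authorities: ["/etc/filebeat/certs/ca.crt"]
```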
Given that Beats agents are often distributed across large and complex environments, it can be useful to enrich events with metadata about the host, Docker container, Kubernetes cluster, or cloud provider the agent runs on. The following Filebeat processors add such metadata to the events collected by the agent:
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
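If you run Beats across several environments, the same processors block can also attach static labels to every event using the add_fields processor. The environment value below is a hypothetical example:

```yaml
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
  # Hypothetical static label identifying the deployment environment
  - add_fields:
      target: ''
      fields:
        environment: production
```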
Enable the nginx module and configure the log files to be collected.
Run the following command to enable the Nginx module:
filebeat modules enable nginx
Confirm the following file is present:
ls -lah /etc/filebeat/modules.d/nginx.yml
As per our configuration, all module files that end with .yml will be loaded. By default, module files are suffixed with .disabled; the previous command simply renames the file to activate the module.
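Once enabled, the nginx.yml module file controls which filesets are collected. A minimal sketch of what it typically contains follows; the commented var.paths lines are only needed if your logs are not in the default location:

```yaml
- module: nginx
  access:
    enabled: true
    # Override if access logs are not in the default location
    #var.paths: ["/var/log/nginx/access.log*"]
  error:
    enabled: true
    #var.paths: ["/var/log/nginx/error.log*"]
```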
filebeat setup -E "setup.kibana.host=localhost:5601" --modules nginx --dashboards --pipelines
Filebeat will use configuration options in the filebeat.yml file, as well as the active modules to load the following artifacts:
You should see the following output on the successful setup of artifacts:
systemctl start filebeat
You can search and explore the log data using Discover on Kibana:
In addition to exploring data using Discover, the Logs app on Kibana can also stream log messages as they come in, to highlight keywords or log types of interest.
We will look at more complex examples of using Filebeat to collect module-based and custom application logs in Chapter 11, Observing Applications and Infrastructure Using the Elastic Stack.
Logs make up just one aspect of data collection and visibility for a workload you need to monitor. Metrics are a great way to monitor and observe a workload, as they represent the internal state of the component at any given point in time.
By correlating logs and metrics, an engineer or developer can quickly understand what a component is doing and how the internal state of the component is changing based on its activities in a given scenario. This is often a useful tool when troubleshooting and resolving issues related to the component in question.
In this section, we will look at collecting some metrics from the nginx web server as well as the host that the server runs on.
Follow the instructions to start collecting system and application metrics using Metricbeat:
server {
    server_name 127.0.0.1;

    location /server-status {
        stub_status;
        allow 127.0.0.1;
        deny all;
    }
}
systemctl restart nginx
Metricbeat uses the concept of modules (as Filebeat does) to control the different metric collection sources on a host.
Configure the metricbeat.yml file in /etc/metricbeat for the following requirements. The entire reference configuration is available in metricbeat/metricbeat.yml in the code repository.
Set up Metricbeat to load configuration modules from the /etc/metricbeat/modules.d/ directory:
metricbeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml
Configure the host, user, and password parameters to send events to your Elasticsearch cluster:
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["localhost:9200"]
  # Protocol - either `http` (default) or `https`.
  #protocol: "https"
  username: "elastic"
  password: "xdZLpBjHu4bFgmEnoh56t7mC"
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
Next, enable the nginx module to collect metrics from the web server by executing the following command:
metricbeat modules enable nginx
Edit the /etc/metricbeat/modules.d/nginx.yml file to include the following parameters:
- module: nginx
  metricsets:
    - stubstatus
  period: 10s
  # Nginx metrics API
  hosts: ["http://127.0.0.1"]
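The stubstatus metricset requests the stub_status endpoint relative to the configured host. Its path defaults to server-status, matching the nginx location block configured earlier; if you expose the endpoint elsewhere, the path can be overridden as sketched here:

```yaml
- module: nginx
  metricsets:
    - stubstatus
  period: 10s
  hosts: ["http://127.0.0.1"]
  # Path to the stub_status endpoint (defaults to "server-status")
  server_status_path: "server-status"
```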
The metrics from the OS will complement the visibility of nginx during analysis. The metrics we collect include the following:
Enable the system module to collect metrics from the host operating system.
metricbeat modules enable system
- module: system
  period: 10s
  metricsets:
    - cpu
    - load
    - memory
    - network
...
- module: system
  period: 1m
  metricsets:
    - filesystem
    - fsstat
- module: system
  period: 15m
  metricsets:
    - uptime
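Beyond the metricsets above, the system module can also report per-process statistics. The following sketch (not part of the reference file) limits reporting to the top consumers to keep event volume manageable:

```yaml
- module: system
  period: 10s
  metricsets:
    - process
  # Match all processes, but only report the top consumers
  processes: ['.*']
  process.include_top_n:
    by_cpu: 5
    by_memory: 5
```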
metricbeat setup -E "setup.kibana.host=localhost:5601"
You should see the following output when content is successfully loaded:
Start the Metricbeat systemd service to start collecting metrics:
systemctl start metricbeat
Events from Metricbeat should now be collected and sent into Elasticsearch for indexing. Data can be explored in the Discover app on Kibana; change the data view used on the left side of the screen if necessary to view Metricbeat data:
The Metrics app in Kibana can also be used to visualize infrastructure and system metrics from across your environment, as shown here:
Dashboards specific to the Metricbeat modules in use should also be available in the Dashboard app in Kibana.
The [Metricbeat Nginx] Overview dashboard shows useful metrics regarding the number of active connections, requests, and handle/drop rates:
The [Metricbeat System] Overview dashboard provides useful insight into the state of the machine running the web server, including information such as CPU/memory usage, disk usage, the number of running processes, and system load:
Feel free to explore the data on these and the linked dashboards. We will look at more in-depth aspects of Kibana as a visualization tool in Chapter 8, Interacting with Your Data on Kibana.
Now that we're collecting logs and metrics from our workload, the next section will look at collecting useful security and audit information from the host to track changes in the host OS configuration and security posture using Auditbeat.
While it is useful to monitor logs and metrics directly related to your workload, an important element of comprehensive visibility is monitoring configuration changes on the machine hosting your workload. Audit data derived from the operating system can indicate changes that may result in bugs or undesired behavior, non-compliance with security policies, or users making unauthorized changes.
Auditbeat leverages the Linux audit framework (auditd) to consistently and reliably collect audit/security-relevant data from hosts. The scope of data collection includes the following:
Data collection on auditd can be controlled using rules; curated rules can be found in openly available security hardening and best practice guides online.
Complete the following steps to start collecting audit data from the web server host:
Edit the auditbeat.yml file located in /etc/auditbeat as shown in the reference file in the code repository.
In the case of Filebeat and Metricbeat, we used configuration files to load modules. This example demonstrates how the same configuration parameters can be loaded using the auditbeat.yml file.
The following sample audit rules detect the use of 32-bit system calls on a 64-bit host OS (a potential attack vector for compromise), as well as process executions and external network connections:
- module: auditd
  audit_rules: |
    -a always,exit -F arch=b32 -S all -F key=32bit-abi
    -a always,exit -F arch=b64 -S execve,execveat -k exec
    -a always,exit -F arch=b64 -S accept,bind,connect -F key=external-access
The following rules will watch for any changes to user and group identities on the host:
-w /etc/group -p wa -k identity
-w /etc/passwd -p wa -k identity
-w /etc/gshadow -p wa -k identity
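Hardening guides commonly pair these watches with one on the shadow file, given that password hashes are a frequent target for tampering. The following rule is an addition along the same lines, not part of the reference file:

```yaml
-w /etc/shadow -p wa -k identity
```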
The file integrity module watches for changes to files in the following critical directories on the host, indicating when binaries and config files are changed:
- module: file_integrity
  paths:
    - /bin
    - /usr/bin
    - /sbin
    - /usr/sbin
    - /etc
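By default, only the listed directories themselves are watched. If you also want changes in their subdirectories (under /etc, for example) to be reported, the module supports a recursive option, sketched here, at some additional resource cost:

```yaml
- module: file_integrity
  paths:
    - /etc
  # Also watch subdirectories of the listed paths
  recursive: true
```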
The system module collects information regarding successful/failed logins, processes, socket events, and user/host information:
- module: system
  datasets:
    - host    # General host information, e.g. uptime, IPs
    - login   # User logins, logouts, and system boots.
    - process # Started and stopped processes
    - socket  # Opened and closed sockets
    - user    # User information
  state.period: 12h
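As an optional extension to the user dataset, Auditbeat can also report when a local user's password hash changes. This sketch assumes the agent has read access to /etc/shadow:

```yaml
- module: system
  datasets:
    - user
  # Report password hash changes (requires read access to /etc/shadow)
  user.detect_password_changes: true
```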
Auditbeat is configured to send events collected to the appropriate Elasticsearch cluster for indexing:
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["localhost:9200"]
  # Protocol - either `http` (default) or `https`.
  #protocol: "https"
  username: "elastic"
  password: "changeme"
Audit data is also enriched with metadata about the host and the cloud platform it runs on for context:
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
Set up Auditbeat artifacts on Elasticsearch and Kibana by running the setup command:
auditbeat setup -E "setup.kibana.host=localhost:5601"
You should see output as follows if the setup runs successfully:
systemctl start auditbeat
After a few moments, audit data should be available on Kibana for you to explore and visualize. Confirm events are visible on the Discover app on Kibana. Out-of-the-box dashboards can also be used to visualize the information collected.
The [Auditbeat System] Login dashboard, for example, displays all attempted login events on the host:
The Security application on Kibana can also be used to visualize some of the data collected with Auditbeat. The Hosts tab, for example, gives users a quick overview of the different types of hosts, authentication events, uncommon processes, and so on in the environment. Chapter 12, Security Threat Detection and Response Using the Elastic Stack, explores some of this functionality in greater detail.
Next, we'll look at monitoring the uptime and availability of the web server using Heartbeat.
An important part of running a web application is ensuring it is available and accessible to users. Various activities can result in an outage or degraded service levels for end users. While logs and metrics provide in-depth information about the workload, monitoring the service from the user's perspective is useful to ensure service standards.
Heartbeat can be set up to continuously monitor assets to ensure they are reachable and configured properly. We will leverage Heartbeat to monitor the Recipe Search Service web page, as well as the Elasticsearch cluster we're using to monitor the web server host.
Follow the instructions to set up and configure the Heartbeat agent:
Heartbeat uses the concept of monitors to perform uptime checks on a given asset.
Configure heartbeat.yml in /etc/heartbeat/ as shown in the reference file in the code repository.
Set up Heartbeat to load monitors from the /etc/heartbeat/monitors.d/ directory:
heartbeat.config.monitors:
  # Directory + glob pattern to search for configuration files
  path: ${path.config}/monitors.d/*.yml
  # If enabled, heartbeat will periodically check the config.monitors path for changes
  reload.enabled: false
  # How often to check for changes
  reload.period: 5s
Configure the Elasticsearch cluster for the events to be sent to:
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["localhost:9200"]
  #protocol: "https"
  username: "elastic"
  password: "changeme"
As assets on the internet can be accessed from any number of geographical locations, it is useful to track availability, latency, and performance from multiple locations on the internet. This is especially important when using services such as content delivery networks (CDNs), which cache content around the globe depending on the location of users. Enrich Heartbeat events with data about the geolocation of the Heartbeat agent, the host, and the cloud platform it is running on for context:
processors:
  - add_observer_metadata:
      geo:
        # Token describing this location
        name: on-premise-datacentre
        # Lat, Lon
        location: "33.8688, 151.2093"
  - add_host_metadata: ~
  - add_cloud_metadata: ~
Configure webserver-http.yml in /etc/heartbeat/monitors.d to check the Recipe Search Service web server (as shown in the reference file). Replace localhost with the IP address or DNS name of your web server:
- type: http
  id: webserver-1-http
  name: Webserver 1
  schedule: '@every 5s'
  # Configure URLs to ping
  hosts: ["http://localhost"]
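Optionally, the monitor can also assert on the response it receives rather than just reachability. The following sketch extends the same monitor with a status check; this addition is not part of the reference file:

```yaml
- type: http
  id: webserver-1-http
  name: Webserver 1
  schedule: '@every 5s'
  hosts: ["http://localhost"]
  # Report the monitor as down unless nginx returns HTTP 200
  check.response:
    status: 200
```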
Configure elasticsearch-http.yml in /etc/heartbeat/monitors.d to monitor the health and availability of the Elasticsearch cluster in use. Remember to replace localhost with the URL to the Elasticsearch cluster:
- type: http
  id: elasticsearch-http
  name: Dev Elasticsearch deployment
  schedule: '@every 5s'
  # Configure URLs to ping
  hosts: ["http://localhost:9200/_cluster/health"]
  # Authentication Credentials
  username: 'elastic'
  password: 'changeme'
Heartbeat will use an HTTP GET request by default; this can be customized as needed:
check.request:
  # Configure HTTP method to use. Only 'HEAD', 'GET' and 'POST' methods are allowed.
  method: "GET"
Configure Heartbeat to check the HTTP response from Elasticsearch to evaluate the cluster's health. Heartbeat will report the monitor as healthy if an HTTP 200 response is received and the status field in the JSON response has the value green:
# Expected response settings
check.response:
  # Expected status code. If not configured or set to 0 any status code not
  # being 404 is accepted.
  status: 200
  # Parses the body as JSON, then checks against the given condition expression
  json:
    - description: Checks if cluster health status is green
      condition:
        equals:
          status: green
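Heartbeat also supports tcp and icmp monitor types for services that don't speak HTTP. A minimal sketch of a TCP port check on the web server (a hypothetical monitors.d/webserver-tcp.yml, not part of the reference files) might look as follows:

```yaml
- type: tcp
  id: webserver-1-tcp
  name: Webserver 1 TCP
  schedule: '@every 10s'
  # Check that the nginx port accepts TCP connections
  hosts: ["localhost:80"]
```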
heartbeat setup -E "setup.kibana.host=localhost:5601"
systemctl start heartbeat-elastic
Within a few moments, Heartbeat data should be available to explore on Kibana Discover.
The Uptime app on Kibana can be used to visualize this data and set up alerts or explore details when assets are unavailable:
Next, we'll look at using Packetbeat to collect and decode network traffic on the web server host.
We've looked at how you can observe a workload, the host and configuration it runs on, and the availability of the workload to end users. One final aspect that we're going to look at to establish comprehensive monitoring is collecting, decoding, and analyzing host network data using Packetbeat.
Networking is a critical part of most workloads; this is especially true in the case of our Recipe Search Service web server. Network packets contain information on the following:
Packet data is also quite useful in detecting security threats/exploit attempts, forensic investigations, and troubleshooting performance and connectivity issues in the infrastructure.
Follow the instructions to set up Packetbeat to start collecting network data from the web server host:
Configure packetbeat.yml in /etc/packetbeat.
Set up the network interfaces for Packetbeat to monitor. You can use a label (such as eth0) to specify an interface or use the any parameter to monitor all available interfaces:
packetbeat.interfaces.device: any
Configure the collection of network flow information:
packetbeat.flows:
  # Set network flow timeout. Flow is killed if no packet is received before being
  # timed out.
  timeout: 30s
  # Configure reporting period. If set to -1, only killed flows will be reported
  period: 10s
Configure the protocols and ports that Packetbeat should collect and decode from the data being sniffed. In this example, we want to collect ICMP, DHCP, DNS, and HTTP communications from the server. A list of supported protocols can be found at https://www.elastic.co/guide/en/beats/packetbeat/8.0/configuration-protocols.html:
packetbeat.protocols:
  - type: icmp
    enabled: true
  - type: dhcpv4
    ports: [67, 68]
  - type: dns
    ports: [53]
  - type: http
    ports: [80]
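If the web server were later configured to terminate TLS, Packetbeat could also capture handshake metadata (though not the encrypted payloads) using the tls protocol. This is a sketch of an optional addition, assuming TLS on port 443:

```yaml
packetbeat.protocols:
  - type: tls
    ports: [443]
```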
Send the data to the Elasticsearch cluster for indexing:
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["localhost:9200"]
  # Protocol - either `http` (default) or `https`.
  #protocol: "https"
  username: "elastic"
  password: "changeme"
Enrich the packet data with host and cloud platform data for context. HTTP events are enriched with the HTTP content MIME type based on information in the request or response body:
processors:
  - add_cloud_metadata: ~
  - detect_mime_type:
      field: http.request.body.content
      target: http.request.mime_type
  - detect_mime_type:
      field: http.response.body.content
      target: http.response.mime_type
Set up the required Packetbeat artifacts on Elasticsearch and Kibana:
packetbeat setup -E "setup.kibana.host=localhost:5601"
systemctl start packetbeat
Data should be available on Discover to explore and visualize as expected. Out-of-the-box dashboards can be explored to visualize network activity and usage.
The [Packetbeat] Overview dashboard provides a high-level breakdown of network flows for different protocols:
The Network tab in the Security app can also visualize Packetbeat network flows in the context of security analytics.
We have looked at establishing effective monitoring and visibility on our web server host in the following ways:
In this chapter, we looked at using different Beats to collect data for shipping into Elasticsearch or Logstash.
We started off by looking at the internals of a Beats agent, the different Beats agents on offer, and what role they play depending on the location and type of your data. Next, we focused on a range of practical scenarios to highlight the role each Beat can play in helping you collect the data you need in Elasticsearch. Finally, we looked at how you can use out-of-the-box Beats content in your monitoring and analysis use cases.
In the next chapter, we will shift our attention to Logstash as a tool to extract, transform, and load information (from sources including Beats) into Elasticsearch.