In the last chapter, we looked at how the Elastic Stack can be used to centralize logs, metrics, and other sources of telemetry from across your organization to better observe and understand the environment. The scale and extensibility of the stack make it possible to collect massive volumes of data for search, analysis, and the building of custom use cases on top of the data. Security teams have very similar data collection and visibility requirements, making the Elastic Stack an ideal platform for security analytics, threat detection, investigation, and response workflows.
As more and more critical business processes move online, it is critical for organizations to invest in sound cyber security controls, alongside appropriate detection, and remediation capabilities. This is key when it comes to protecting the security of your assets, the integrity of your systems, and the trust your customers have in you.
Consider some of the following observations when it comes to modern businesses and their implications for security teams:
Organizations are investing more than ever before in maintaining digital interactions with their customers around the world (heavily leveraging cloud technologies to get there). This also means an increased surface area for potential cyber threats and attacks. Security teams need to be able to scale the way in which they monitor and defend their organization from attack. Leveraging technology to improve visibility, centralized collection, and analysis goes a long way in making this possible.
Cybercrime groups collectively invest in building and maintaining high-quality tooling, intelligence, and procedures. They also actively collaborate to share vulnerabilities and exploits in target systems, given their financial or ideological incentives. For cyber criminals, the key to a successful campaign is moving fast after the initial breach to execute their attack. For defenders, fending off attacks requires taking a layered approach to implementing security controls and monitoring, reducing the blast radius of any one vulnerability in your environment. Additionally, sufficient capability must be deployed to minimize the Mean Time to Detect (MTTD) and Mean Time to Respond (MTTR) to a potential breach.
Well-resourced attacks can go on for long periods of time as attackers operate with stealth to evade detection. Reports have confirmed instances of attacks where attackers were able to maintain access to target systems for multiple years before being discovered. Security teams need long-term data retention and analysis capabilities to effectively detect such campaigns, while also complying with minimum data retention mandates by governments and regulating bodies for satisfactory incident response.
The security solutions on the Elastic Stack help solve some of these challenges for security teams, giving them the ability to effectively defend their organization from security threats. In this chapter, we look at a high level look at what it takes to build an effective security capability using the Elastic Stack. We explore the use of Elastic Agent to implement endpoint security controls on your infrastructure. We also look at the use of a Security Incident and Event Management (SIEM) platform to implement threat detection, analytics, and alerting on security telemetry across all layers of your environment.
Specifically, we will focus on the following topics:
This chapter walks you through the various aspects of monitoring and observing different parts of your technology stack and environment. The code for this chapter can be found on the GitHub repository for the book:
https://github.com/PacktPublishing/Getting-Started-with-Elastic-Stack-8.0/tree/main/Chapter12
The examples in this chapter build on the environment configured for the Recipe Search Service web server in Chapter 10, Building Search Experiences Using the Elastic Stack.
Security teams are responsible for some of the most important and consequential capabilities in any modern organization. As businesses move online, the average user is becoming more technically savvy and conscious of their privacy and security online. Not only do businesses have to provide exceptional customer experiences with high availability and resiliency around the clock; they must also do this while preserving the security and privacy of their customers. On the other hand, competitors, criminals, and adversaries increasingly leverage offensive security practices to disrupt business operations, gain access to sensitive research and intellectual property, and damage brand reputation in the market.
Consider the three principles of data security (also known as the CIA triad). While reading about each principle and related security controls, also consider how using logs, metrics, and security telemetry can help in building corresponding security capabilities in your organization.
Protecting data from unauthorized access is incredibly important when it comes to applying effective security controls. Data can often contain private, sensitive, or personally identifiable information. Adversaries can very quickly and effectively weaponize private user information, given the widespread nature of our online footprint. Some systems can also be commercially or nationally sensitive, where a breach in security controls can have grave consequences for a large group of people.
Preserving the confidentiality of data means defining the entities that have access to a system, including the level of access they have and what they can do with the data on the system. In practice, this is implemented by authenticating users to confirm their identity, and confirming they are authorized to access or manipulate the data in question.
Common approaches for confidentiality and secure access control (not in order of priority) include the following:
Preserving integrity in security means that an unauthorized or external actor cannot change, corrupt, or modify data or configuration on a system. Integrity can also be extended to include the reversal of any modifications made by an authorized person. Maintaining the integrity of data, systems, or configuration means your systems will run in a robust, resilient, and predictable manner.
Common security controls relating to the integrity of systems include the following:
Availability means data, systems, or services remain readily available for use by authorized parties or end users. Availability can also be extended to mean appropriate quality of service controls are implemented so that users of a shared environment cannot degrade the experience of others using the system. Maintaining availability means the system can be used as intended.
Security controls to ensure system availability include the following:
The design and implementation of your security policies, functions, and capabilities should consider how the confidentiality, integrity, and availability of your data and systems can be protected from attack.
To design effective security capabilities, you must also consider how people, processes, and technology can work together to help your team:
The following diagram illustrates core Security Operations Center (SOC) capabilities as discussed in this section:
This chapter will focus primarily on the technology controls (but will touch on the people and process investments) required to build the capabilities discussed, leveraging security data on the Elastic Stack. Let's look at what it takes to build a modern SOC.
We looked at some of the objectives of a modern security operations center to effectively protect your organization from security threats. This section describes the necessary technology or tooling investments required to achieve these objectives.
A SIEM solution can help security teams turn logs, metrics, and other sources of security telemetry from across your organization into actionable insight. This insight helps teams achieve a range of SOC objectives, including being able to detect and remediate threats, understand their security posture, investigate issues, and stay compliant with appropriate policies and standards.
Let's look at some important requirements when it comes to designing your SIEM platform, and how the Elastic Stack can be leveraged to implement some of the capabilities.
A properly configured SIEM is a powerful tool because of the wide variety of data it can collect from across your environment. This gives security teams a single pane of glass to observe their environment, detect potential threats, and investigate their impact. Common sources of high-quality security data include the following:
It is useful to be able to leverage collection agents installed on hosts (such as Beats and Elastic Agent), as well as common data aggregation and streaming mechanisms (Syslog/Rsyslog, Windows Event Forwarding, Netflow, and so on), depending on the data source and volumes in question.
When onboarding data onto your SIEM, it is important to make sure the data conforms to some sort of unified data model or schema. All out-of-the-box integrations available on Beats or Elastic Agent automatically transform data into the Elastic Common Schema (ECS). This allows teams to stitch together activity across different systems by easily searching for and correlating events without learning device-specific schemas and field names.
Documentation on the ECS can be found in the reference guide:
https://www.elastic.co/guide/en/ecs/current/ecs-reference.html
A full list of supported data integrations available on the Elastic Stack can be found in the integrations catalog:
https://www.elastic.co/integrations
All integrations also come with supporting documentation, detailing prerequisites, configuration parameters, and examples:
https://docs.elastic.co/en/integrations
We discussed collecting and onboarding data onto the Elastic Stack using Beats, Elastic Agent, and Logstash in Chapters 6 to 9 in this book.
Install Elastic Agent on the Recipe Search Service web server we configured in Chapter 11, Observing Applications and Infrastructure Using the Elastic Stack, to follow along with the examples in subsequent sections of this chapter. Using the information in Chapter 9, Managing Data Onboarding with Elastic Agent, add the following integrations to the policy for your agent:
Install Packetbeat on the host to collect and decode network packet and protocol-level data to gain further visibility into your environment. Installation instructions for Packetbeat are available in the reference guide:
https://www.elastic.co/guide/en/beats/packetbeat/8.0/packetbeat-installation-configuration.html
Note
Packetbeat must be installed directly on the host machine. Elastic Agent does not provide packet collection functionality at the time of writing.
Packetbeat should be configured to collect network flow logs from all interfaces on the host, as well as to decode common protocols including icmp, dns, http, and tls.
A sample configuration for Packetbeat is provided in the packetbeat.yml file in the code repository for this book.
You should see your events populated on the Overview page in the Security solution on Kibana:
Real-world environments are not static, and neither is the way you collect and instrument your data. APIs and log file formats can change, devices may introduce new functionality requiring additional instrumentation, or you could have data sources in error states from time to time. It is worth thinking about how you are going to monitor the health and integrity of your data collection pipelines, allowing you to ensure you have full and continuous visibility of your assets. Dashboards visualizing your log volumes, as well as alerts set to trigger when log volumes drastically change, are a good starting point in this direction.
Now that you've onboarded some data into your SIEM, let's look at implementing security detection on your data.
Getting your data flowing into your SIEM platform is only half the battle when it comes to building your SOC. Engineering high-quality security detection for your environment is an important area to invest time and resources in to make the most of your SIEM capability. Good detection should find the right balance between coverage and fidelity.
Good coverage means the detection casts a wide enough net in terms of time periods and subsets of data sources to look for all potential threats. Good fidelity means the detection results in alerts that have a high enough confidence interval, making them worth investigating from an analyst perspective. Analyst time and resources are often scarce and detection that produces a high number of false positives will quickly overwhelm the security team and result in a loss of confidence in detection capability.
SIEM detections form part of your broader security controls, and it is important to think about how they fit into your threat modeling and layered security controls strategy. It is useful to leverage a framework such as MITRE Attack (also known as MITRE ATT&CK) to plan your threat detection roadmap.
MITRE Attack is a knowledge base to model adversary behavior against your environment. It leverages tactics (or adversarial goals) and techniques (how these goals may be achieved) to represent the attack surface for your environment. This kind of framework is useful not only to understand areas where you need to invest in detections or controls but also to provide well-documented examples of procedures for attacks, detection mechanisms, and remediations. Further details on MITRE Attack, as well as the matrix with tactics, techniques, and procedures, can be found on the MITRE website:
Explore the matrix to understand how you may leverage MITRE Attack when building detections for your SIEM. The following figure shows Credential Access as a goal (or tactic) an adversary might have against your environment. The list then documents techniques an adversary may leverage to gain access to credentials. Brute forcing credentials is one such technique. Password guessing, cracking, spraying, and stuffing are all documented references to adversaries using the technique.
Detections on Elastic SIEM can be tied to one or more MITRE threats. This gives security analysts a powerful knowledge base and point of reference when triaging or investigating an alert produced by the detection.
Elastic SIEM supports the following rule types for your detections.
Custom query detections can be used to look for a specific behavior or activity in your environment. Detections can be run across an array of data sources, but the actual activity must be limited to a singular event.
For example, you may want to detect all DNS activity to internet-based DNS servers if you have an internally managed DNS. This can indicate potential misconfiguration or an attempt to circumvent your internal DNS server and reduces the level of visibility you have of DNS activity. Malware can also use external DNS servers to exfiltrate information from your environment.
You can use the KQL syntax to define custom rules. A custom query detection can easily look for such activity and would look as follows:
event.category:(network or network_traffic) and (event.type:connection or type:dns) and (destination.port:53 or event.dataset:zeek.dns) and
source.ip:(
10.0.0.0/8 or
...
) and
not destination.ip:(
10.0.0.0/8 or
...
)
Review the KQL reference guide for more information on syntax and capabilities:
https://www.elastic.co/guide/en/kibana/8.0/kuery-query.html
Because your data is in a common schema, the query can easily be run on all Packetbeat, Filebeat, and related log events on your SIEM, giving you broad coverage. The set of source and destination IPs gives your detection higher fidelity by excluding internal network ranges from the detection.
Navigate to the Rules tab on the Security app and create the detection using the file dns-to-internet.ndjson as reference.
Note
Rules can be imported into Elastic Security from the Rules interface on Kibana. Click on the Import Rules button and select the rule definition file to import the rule.
When implemented, the rule looks as follows:
All rules have a description, the associated severity of the detection and risk score, reference URLs (including the mapped MITRE Attack threat), and examples of false positives. Rules run on a predefined schedule (determined by data volumes, nature, and sensitivity). The additional look-back time can help ensure data was not missed by the rule in the case of ingestion lag.
Threshold rules can be used to detect when the number of times an event occurs satisfies a pre-defined threshold. Like custom queries, KQL can be used to define the actual event of interest. Events can optionally be grouped by a field in the data before the threshold is evaluated.
For example, a singular failed SSH authentication attempt is not necessarily indicative of undesired or malicious activity. However, 100 failed attempts within the span of 5 minutes can indicate potential brute force activity and may be of interest to a security analyst.
Implement the threshold rule using ssh-bruteforce-attempt.ndjson as a reference. The rule should look as follows:
Save and activate the detection rule. To test your rule, attempt to SSH into your Linux web server with an incorrect username. Repeat the failed attempt 20 times to simulate the detection and wait for the detection to appear on the Alerts tab.
Event correlation and sequence-based detections give teams the ability to look for a particular sequence or chain of events, across multiple data sources to detect security threats. The detection uses the Event Query Language (EQL), which is a simple query language designed specifically for security use cases, allowing the expression of relationships between different events in your environment.
Detailed information on the EQL, including syntax, function references, and examples can be found in the documentation guide:
https://www.elastic.co/guide/en/elasticsearch/reference/8.0/eql.html
Consider the following scenarios from a threat detection perspective in the Recipe Search Service environment:
In previous examples, we looked at detecting potential brute force attempts using threshold queries. While that indicates an attempt to access credentials, it does not indicate a successful compromise. This is where sequence-based detection can help with engineering high-fidelity detections.
This detection should trigger if several failed SSH attempts are followed by a successful login event within a short period of time. More details on this technique can be found in the MITRE Attack knowledge base:
https://attack.mitre.org/techniques/T1110/
As the EQL is an event-based language, sequences are expressed by event categories (which is the event.category field in ECS). In this example, we're looking at authentication events from the auditd dataset on the web server.
The EQL for this scenario would look as follows:
sequence with maxspan=15s
[ authentication where event.dataset == "auditd.log" and auditd.log.op == "login" and event.outcome == "failed"]
[ authentication where event.dataset == "auditd.log" and auditd.log.op == "login" and event.outcome == "failed"]
[ authentication where event.dataset == "auditd.log" and auditd.log.op == "login" and event.outcome == "failed"]
[ authentication where event.dataset == "auditd.log" and auditd.log.record_type=="USER_AUTH" and event.outcome == "success"]
The query will return any sequence of events, where within the span of 15s, three failed authentication attempts are followed by a successful login event on the server.
Implement the sequence rule using the potentially-successful-ssh-bruteforce.ndjson file as a reference.
Activate the rule and simulate failed SSH login attempts followed by a successful attempt to see the alert.
Now consider a scenario where the credential access gained from the brute force attack is used to reach out to external attacker infrastructure. Malware can often use common protocols such as HTTP or DNS to communicate with command-and-control infrastructure to exfiltrate data, retrieve instructions, or evaluate a kill switch condition.
In this scenario, once logged in, the attacker uses standard tools found on the host operating system to reach out to external infrastructure while evading detection and security tooling. This is also known as living off the land. Curl is used to reach out to a malicious web server outside the environment, after which a file is downloaded with instructions.
The EQL for this sequence of events looks as follows:
sequence
[ authentication where event.dataset == "auditd.log" and auditd.log.op == "login" and event.outcome == "success"]
[ process where process.name == "curl" and process.args in ("www.malicious-site.com", "www.some-external-url.com")]
[ file where process.name == "bash" and event.type=="creation" and file.extension in ("txt", "xml", "zip")]
Implement the rule using potentially-compromised-account.ndjson as a reference. Log in to your host and execute the following command to simulate the detection:
curl www.candc.biz > instructions.txt
The rule should generate an alert after a few minutes.
Elastic machine learning (ML) features can run anomaly detection jobs on your time series data (including logs, metrics, and security telemetry). While taking a rule-based approach is useful to detect known threats, anomaly detection using ML is a great way to look for changes in standard behavior in your environment to bubble up unknown but potentially malicious threats.
We looked at machine learning features, including anomaly detection jobs, in Chapter 5, Running Machine Learning Jobs on Elasticsearch. Some data integrations ship with machine learning jobs out of the box to help teams get started with ML-based security detections.
Consider the scenario where a malicious user compromises credentials to a web server in your environment. The behavior profile of the malicious user is going to be vastly different, compared to the server administrator. Some potential differences can include the following:
Using machine learning jobs can alert analysts to such activity without having to create and maintain many low-fidelity detection rules.
In this example, an anomaly detection job is created to look for rare processes on the web server host. Add the ML job to the security jobs group to make it available to the SIEM. A machine learning based detection rule can then be created to alert on anomalies found by the machine learning job. You can define an anomaly score threshold to filter out any low-confidence anomalies.
Given the potentially large number of alerts generated by this type of job, consider configuring this as a building block alert in the rule definition. Building block alerts do not appear on the alerts UI and are useful to represent low-risk detections. An overarching detection rule can act on multiple building block alerts for a host or user, for example, to create a high-fidelity detection.
Implement the detection rule using detects-rare-processes-on-hosts.ndjson as a reference.
You should see alerts appear for hosts with rare process anomalies in the SIEM alerts interface.
Indicator match rules can be used to detect and alert on indicators of compromise (IOC) in your environment. An IOC refers to evidence of malicious activity that can be used to identify whether a breach occurred and the severity and extent of the compromise.
The indicator match detection type can be configured to work with any type of indicator dataset. A threat intelligence module is available to ingest IOC feeds from a range of open source APIs and intelligence platforms.
URL Abuse is a web-based malicious URL sharing platform and is a useful feed to detect potentially malicious network destinations. Configure Filebeat or Elastic Agent to ingest URL feeds into your SIEM. The threatintel.yml file can be used as a reference when onboarding this dataset.
Configure an indicator match rule using the url-ioc-match.ndjson file as a reference. The url.full source field can be mapped with threatintel.indicator.url.full under the indicator mapping section. The rule should generate alerts if any activity to a malicious URL is found in your logs or telemetry as shown.
Open the alert to see details on the threat intelligence match found:
Note
Using indicator matching for detections is a reactive control and is only as good as the timeliness and validity of the indicator data. Indicator matching should therefore only be relied upon to validate or confirm a breach has occurred, and not as a primary detection mechanism.
Security teams can build comprehensive detection capabilities by leveraging the different rules available in the SIEM. A mature SIEM should be able to clearly identify malicious behavior, enforce security policies and best practices, and use analytics to bubble up stealthy and sophisticated adversarial behavior.
While each environment is unique and mature teams should build and maintain detections relevant to their business, hundreds of prebuilt detection rules are available for the various data integrations supported on Elastic Security. Rules are grouped using tags for easy management and activation depending on the data you have onboarded. You can leverage the prebuilt rules even on custom data sources (if your data is in the ECS) to quickly uplift your threat detection posture when building your SIEM.
The interface also allows security teams to manage and monitor rules, to ensure detections are successfully running as expected.
The next section looks at how analysts can collaborate and work together to investigate alerts on the SIEM.
High-quality detection alerts are only useful if the security analysts in the team can easily understand what the alerts mean, investigate to confirm whether there is an incident, and appropriately respond in a timely manner.
Alerts are shown in the alerts table in the SIEM UI for triaging. Alert trends provide useful context as to the prevalence and distribution of an alert over time. Analysts can also group alerts in different ways (such as by username, host, associated tactic/technique, and so on) to triage and choose the alerts they would like to tackle.
Alert investigations often start by reviewing alert details and high-level metadata on the alert details page. Raw event data can be viewed under the Table tab in the alert. The following alert details malware that was identified on an endpoint on the Recipe Search Service web server.
Analysts can choose to investigate the alert using the timeline feature on the SIEM. The timeline is a visual tool designed to help analysts ask the right questions of their data, to quickly zero in on relevant events in the environment to assist with their investigation.
Fields of interest can be dragged and dropped onto the query builder to see the timeline of events. Once the hypothesis is confirmed and the required data is collected, the timeline can be saved and attached to a case.
The case can be used to collaborate within the team, to centralize notes and comments during the investigation. Timelines and Kibana visualizations can be attached to a case to support your hypothesis. Cases can also be pushed to external systems such as Jira or ServiceNow for ticketing, reporting, and incident response workflows with the rest of the organization.
It is also important to have the capability to tune your security detections based on the outcome of your investigation. Tuning can involve the following:
Effective tuning of detections ensures there is a feedback loop between detection engineering and analyst investigation, helping improve the effectiveness and confidence in your detection controls.
Contextual information in alert information can substantially reduce the time it takes an analyst to understand what an alert means to their environment and security posture. Context is also useful in triaging and prioritizing the alerts an analyst can look at. It may not always be possible to get through all alerts given limited analyst bandwidth and alert noise.
An easy way to apply context is to ingest threat intelligence feeds. The Elastic threat intelligence module can ingest a variety of such feeds to bring more context to your alerts. The following dashboard shows the different types of intelligence data available with the module:
The SIEM can also be configured to apply custom data enrichment pipelines on your incoming data to make it more useful for detections as well as analysis. Common use cases for data enrichment include the following:
We looked at using the Enrich processor for data enrichment in Chapter 4, Leveraging Insights and Managing Data on Elasticsearch. We also looked at using Logstash for data enrichment in Chapter 7, Using Logstash to Extract, Transform, and Load Data.
Threats are constantly evolving and so should your detection controls. It is important to continuously evaluate your security posture and update your controls to stay ahead of adversaries.
Proactive hunting for adversaries in your environment can happen when the following occur:
Teams should be able to leverage their SIEM to do the following:
The SIEM supports analysis for threat hunting by providing a curated interface to understand host and network activity across your environment, regardless of the data source. This complements the dashboarding capability on Kibana that analysts can also use for analysis.
The Hosts view gives teams an overview of key activities related to your hosts, including authentication events, uncommon processes, and anomalous activity. This can be used as a starting point for a hunt, before pivoting into other areas for further investigation.
The Network view gives analysts a holistic understanding of all network activity in the environment. High-level metrics, as well as flow- and protocol-level information, is provided, including for DNS, HTTP, and TLS.
Teams also have the ability to ask ad hoc questions of their hosts using the Osquery Manager integration on Elastic Agent. Osquery is a SQL-like schema that can be used to instrument host operating systems.
Details on the Osquery schema can be found on the documentation site:
https://osquery.io/schema/4.9.0
Navigate to the Osquery app to run a live query against your hosts running Elastic Agent. The following query returns all packages installed on a Debian-based host:
SELECT * FROM deb_packages
The results should look as follows:
This data can also be viewed in Discover and used in visualizations, security detections, and other use cases on Kibana. You can also configure scheduled queries to automatically bring in ad hoc data. This gives teams the ability to ask wide-ranging questions of their environment when hunting for threats or even investigating incidents.
The best way to scale the capacity of a security team while ensuring a consistent and appropriate response is to leverage automation and integrated response actions from within your SIEM. A large proportion of security detections can be repetitive in nature, with a relatively straightforward response or remediation action. Configuring your SIEM to automatically take the appropriate action with little to no analyst interaction frees up the security team to tackle the more complicated detections.
For example, a detection rule might pick up potential bot scanning activity toward your web-facing assets. Rather than have a security analyst investigate the alert and update firewall rules to block offending source IPs, the process can be automated from your SIEM.
Common downstream integrations from a SIEM include the following:
Elastic SIEM integrates with the Endpoint Security EDR solution to provide host isolation and other response actions on host devices.
Host isolation, for example, is useful when you suspect potential compromise but need more time to investigate before taking definitive action. Open an alert and click on the Take action button. Select Isolate host to cordon off the machine from the rest of your network while you investigate and apply necessary remediation. The host will continue to be accessible to Elastic Security for investigations but will not be able to talk to other systems.
A comment log can be maintained for response actions as shown. The host should then appear as Isolating.
Kibana alerting and actions are integrated with the detection engine on the SIEM. A range of response actions can be configured for each detection rule as required.
We looked at Kibana alerting and actions in Chapter 8, Interacting with Your Data on Kibana.
Threat detection capabilities on your SIEM are only as good as the level of visibility you have in your environment. Teams typically will need to bring in more sources of telemetry as their detections, analytics, and automated responses mature over time. This is also true as the broader organization grows, and the environment evolves to deliver business outcomes. Security teams can be empowered to be more proactive with detections without growing analyst resources at the same rate as the environment. Your SIEM needs to be able to scale appropriately to handle the necessary throughput of events per second, detections run, and dashboards/reports required.
Another important consideration is the data retention periods required for your logs and telemetry to serve your threat detection, investigation, hunting, reporting, and compliance requirements. Certain organizations are also required to retain data for a minimum period as determined by applicable regulation. The average dwell time for security breaches can be as long as several months and, sometimes, years. Teams need the ability to go back in time far enough to detect and remediate such threats. An effective SIEM needs to support long-term retention and querying of data in a practical and cost-efficient manner.
Elastic SIEM helps teams tackle these challenges in the following ways:
We explored Elasticsearch features such as scalability, tiers, lifecycle management, and data frame transform jobs in Chapter 3, Indexing and Searching for Data, and Chapter 4, Leveraging Insights and Managing Data on Elasticsearch.
Teams should consider the following aspects for their environment for data visibility and coverage:
The capability maturity of SIEM platforms evolves as the right data is onboarded:
Teams often discover and address visibility gaps over time while simultaneously working on improving detections, tuning existing alerts, and building the right workflows around the SIEM.
An EDR solution is a host-based security control to detect, prevent, and respond to security threats on your endpoint and infrastructure hosts. EDR solutions detect threats by continuously collecting and monitoring rich endpoint telemetry from the host operating system. The telemetry is analyzed on the endpoint, as it is collected to detect potential security issues.
EDR tools often focus on more dynamic behavioral aspects of threats rather than relying on signatures like traditional anti-virus tools. They search for indicators of attack (IOA), rather than looking for known malicious files. This allows EDR tools to scale against the millions of newly emerging modern threats that are often polymorphic in nature. Detections have complete contextual information as EDR tools have good visibility of all activities on the endpoint.
Endpoints generally have the largest attack surface in any environment, given the nature of how users interact and work on them. As threats are detected in real time on the actual endpoint, EDR tools are uniquely positioned to also apply automatic prevention controls to stop threats before they can cause any real damage. EDR tools can also run response actions to mitigate or remediate security issues.
EDR tools complement SIEM capabilities in a mature SOC. While the SIEM focuses on centralized threat detection across a range of systems, EDR capability completes the picture with in-depth endpoint visibility.
An EDR capability is important to a SOC as it provides the following:
The Elastic Endpoint Security integration Elastic Agent provides brings EDR capability to your SOC using the Elastic Stack. This makes it easier for teams that use the unified agent for logs and metrics collection to also implement security controls on their infrastructure. Endpoint security capability is configured and managed using Fleet in Kibana.
The Endpoint Security integration currently offers controls for the following types of threats.
Malware detection capabilities are available to stop malicious activity originating from binaries and malicious files (such as Office documents, PDFs, macros, and so on) on your hosts. EDR tools do not rely on signatures or file hashes for threat detection as those approaches simply do not scale. Instead, pre-built machine learning models trained on large volumes of data in different types of environments are used to infer whether activity is malicious. A score or confidence interval is generated for each inference, where the higher the score, the greater the confidence in the detection.
Malware models are updated periodically and shipped to the Endpoint Security agents to protect against new types of threats. Malware is automatically encrypted and quarantined when detected. This disables the malicious activity but retains artifacts for forensic analysis and investigation.
Ransomware is one of the most prevalent types of endpoint threats today. While ransomware can be detected using the same approach as malware, Endpoint Security goes one step further to stop ransomware attacks on your endpoints quickly and effectively.
The agent creates and strategically distributes hidden canary files in different locations on your endpoint. It then watches for any changes to the canary files to catch ransomware in the process of encrypting your information. If a process attempts to manipulate the canary files in question, Endpoint Security automatically analyzes the behavior to detect whether it is ransomware and kills the process before it can cause any further damage.
Memory-based threats are an attempt by adversaries to evade common detection controls on file-based events. Memory threats run by injecting code into other processes and run entirely from memory; no artifacts are produced on disk to remain stealthy. Endpoint security can detect and prevent such attacks by analyzing kernel events in real time for potential threats.
Malicious behavior detections look for the presence of adversarial behavior on the endpoint. EQL rules are run on the endpoint telemetry data to detect threats that may be leveraging defense evasion techniques. This, for example, detects adversaries using live-off-the-land techniques, where existing tools and utilities available on the host operating system or environment are leveraged to carry out the attack.
Endpoint telemetry is streamed into Elasticsearch and is available on the SIEM Detections engine for custom detections and analytics use cases.
The following telemetry events are collected on Windows, Mac, and Linux based hosts:
The following additional telemetry sources are available on Windows hosts:
To configure Endpoint Security on Elastic Agent, add the Endpoint Security integration to your Fleet policy and deploy the changes to your agents. Then, click on the Endpoint Security policy name on the Fleet UI to show the Endpoint Security configuration page. All detections should automatically be enabled and in prevention mode. This means threats will be prevented in addition to a detection alert being generated for analyst investigation. You should see your endpoints listed in the Endpoints tab in the Security app.
The European Institute for Computer Anti-Virus Research (EICAR) distributes a benign file sample to perform a basic test on EDR controls. On an endpoint running Endpoint Security, download the EICAR test file while detection or prevention controls are enabled.
You should see a popup from the Endpoint Security integration on the host when the detection controls are triggered:
You should also see the corresponding malware prevention alert on the SIEM application for further investigation. As endpoint alerts have detailed process telemetry, analysts can click on Analyze event to see the process tree for the detection, as well as any corresponding file, network, and other events on the endpoint associated with the process.
The EICAR file is only a test to ensure your detection controls are activated. Further detailed tests on endpoint detection and prevention capabilities can be performed by detonating malware in a secure sandbox environment, or by using appropriate red team automation scripts to simulate threats or adversarial behavior.
This concludes the chapter on using Elastic Security for threat detection, prevention, and response in your security operations center.
In this chapter, we looked at some of the core capabilities required as part of a modern security operations center to help analysts and security engineers defend their organizations from attack. First, we looked at why security is important to organizations and what sort of controls they need to build to manage any threats. We also looked at why it is critical to reduce the mean time to detect and mean time to respond to threats to maintain a good security posture.
Next, we looked at how a SIEM platform can help teams centralize their security visibility, threat detection, and investigation capabilities. We saw how a mature SIEM deployment can become the center of your SOC by turning your data into actionable insights and trigger the appropriate remediation or response action. We also looked at some of the capabilities required in your SIEM to build a successful security program.
Finally, we looked at how endpoint detection and response capability can complement your SIEM platform to bring in-depth host-based telemetry while also actively detecting and preventing threats on your most vulnerable attack surfaces. We saw how EDR and SIEM capabilities work hand in hand to empower the SOC in defending the organization.
In the next chapter, we will look at some of the architectural best practices and guidelines for successful production deployments of the Elastic Stack.