Chapter 21. Incident Response

Case Study: Worm Mayhem

Right around lunchtime, a help desk operator at Example, Inc. (a medium-sized manufacturing company) received a frantic call from a user who was unable to use his PC: it was continually rebooting. The user also reported that strange items had appeared on his desktop. The help desk operator was not sure whom to contact about such issues, so he tried calling his boss, but his boss was not in at the moment. The operator then opened a case in his Remedy console, describing the user’s problem and recording his machine’s hostname. Unfortunately, other calls for unrelated support issues grabbed his attention and the rebooting desktop was forgotten.

Meanwhile, the worm—which is what really caused the problems with the user’s PC—continued to spread in the company network. The malicious software was inadvertently brought in by one of the sales people who often had to plug their laptops into untrusted networks. However, most of the security-monitoring capabilities were deployed in the DMZ (or “demilitarized zone”—a somewhat inaccurate term for a semi-exposed part of the network where you place publicly accessed servers such as web, FTP, and email servers) and on the outside network perimeter, which left the “soft, chewy center” unwatched. Thus, the company’s security team was not yet aware of the developing problem.

The network traffic generated by the worm increased dramatically as more machines became infected and contributed to the flood. Only when many of the infected PCs began attempting to spread the worm out of the company network was the infection noticed by the security team, via the flood of pager alerts. Chaos ensued. Since the breach was not initiated from the outside, the standard escalation procedure the company had previously adopted for hacker attacks was ineffective. Several independent investigations, started by different people, were underway, but there was little or no communication. While some people were trying to install antivirus updates, others were applying firewall blocks (preventing not only the worm scanning but also the download of worm updates), and yet another group was trying to scan for vulnerable machines using their own tools (and contributing to the network-level denial-of-service condition).

After many hours, most of the worm-carrying machines were discovered and the reinfection rate was brought under control, if not eliminated. Due to a major loss in employee time, backend system outage, and unstable network connectivity, the management requested an investigation into who was responsible and how to prevent such incidents. The company hired a computer forensics consultant. Unfortunately, the initial infection evidence was either erased, overwritten on disk, or extremely difficult to find (nobody looked into the help desk system, where the initial call for help resided, since the help desk system was not deemed relevant for security information). The investigation concluded that the malicious software was brought in from outside the company, but the initial infection vector was not determined, since by then some of the machines had already been rebuilt by the IT department, overwriting the infected disk images. In addition, it was extremely difficult to track all the vulnerable and exploited machines, since there was no central point for such information.

This nightmare is what might happen to your company if it lacks a central organization for security monitoring and incident handling, as well as an incident response policy. Huge financial losses, dead-end investigation, an inability to accumulate experience and knowledge in order to improve, and many other problems are likely to result.

This chapter should help you to avoid the pitfalls of chaotic, ineffective incident response. As a first step toward our goal, let us clarify some important definitions. Then we’ll build a foundation for an effective incident response policy based on the SANS Institute’s six-step process.[1]

Definitions

A security event is a single, observable occurrence as reported by a security device or application or noticed by the appropriate personnel. Thus, both an IDS alert and a security-related help desk call qualify as a security event. A security incident is an occurrence of one or several security events that have a potential to cause undesired functioning of IT resources or other related problems. We’ll limit our discussion to information security incidents, which cover computer and network security, intellectual property theft, and many other issues related to information systems.

An incident response is the process of identification, containment, eradication, and recovery from computer incidents, performed by a responsible security team. It is worthwhile to note that the security team might consist of just one person, who might be only a part-time incident responder (and not even by choice). Whoever takes part in dealing with the incident’s consequences becomes part of the incident response team, even if the team does not exist as a defined unit within the organization. A security response is defined as an incident response taken in a broad context. Security extends far beyond the incident response process that is activated when a denial-of-service attack hits the web server or a malicious hacker breaches the perimeter. A large part of security is responding to daily security events, log entries, and alerts that might or might not develop into full-scale incidents. Thus, “security response” is the reaction of an organization to security events, ranging from a new line in a logfile to corporate espionage or major a DDoS attack.

An incident case is a collection of evidence and associated workflow related to a security incident. Thus, the case is a history of what happened and what was done, with supporting evidence. The incident case might include various documents such as reports, security event data, results of audio interviews, image files, and more. The incident report is a document prepared after an incident case investigation. An incident report might be cryptographically signed or have other assurances of its integrity. Most incident investigations result in a report that is submitted to appropriate authorities (either internal or outside the company), containing some or all data associated with the case. Note that the term evidence is used throughout this chapter to indicate any data discovered in the process of incident response, not only data collected that is admissible in the court of law.

Prevention-detection-response is the mantra of information security practitioners. Each component is crucial. We have looked prevention in Chapter 11, while Chapter 18 and Chapter 19 covered detection. This chapter completes the mantra: it shows what to do after you detect an attack. We also revisit certain aspects of detection; specifically, how to know that you were attacked.

All three points of the mantra are important to the security posture. Moreover, unlike detection and prevention, response is impossible to avoid. While it is common for organizations to have weak prevention and detection capabilities, response is mandatory—your organization will likely be made to respond in some way after the incident has occurred. Even in cases where ignoring the incident or doing nothing and facing the consequences might be the chosen response option, an organization implicitly follows a response plan. Preparing for incident response is one of the most cost-effective security measures an organization can take.

Timely and effective incident response is directly related to decreasing incident-induced loss to the organization. It can also help to prevent expensive and hard-to-repair damage to your reputation, which often occurs following a security incident. Several industry security surveys have identified a trend: a public company’s stock price may plunge because of a publicly disclosed incident.[2] Incidents that are known to wreak havoc upon organizations may involve hacking, virus outbreaks, economic espionage, intellectual property theft, network access abuse, theft of IT resources, and other policy violations. Many such incidents run counter not only to internal policies, but also counter to to federal, state, and local criminal laws.

Even if a formal incident response plan is lacking, after the incident occurs the company’s management might need to answer these questions:

  • Can we put things back the way they were?

  • Should we try to figure out who is responsible?

  • How do we prevent recurrence?

Answering these questions requires knowledge of your computing environment, company culture, and internal procedures. Effective incident response fuses technical and nontechnical resources with an incident response policy. Such a policy should be continuously refined and improved based on the organization’s incident history, just like the main security policy.

This chapter shows how to detect network or local intrusions on your system. In addition, we review tools that can help you when your system tests positive for intrusions (from both malicious hackers and viruses). We also address the issues of virus incident response and briefly review computer forensics.

While many books exist that cover incident response for large organizations, relatively little information has been devoted to small- and medium-sized companies. We briefly touch on all of these categories.

Incident Response Framework

To build an initial incident response framework, we can use the SANS Institute’s six-step incident response methodology. The methodology includes the following steps for dealing with an incident:

  1. Preparation

  2. Identification

  3. Containment

  4. Eradication

  5. Recovery

  6. Follow-up

The actions defined by the plan begin before an incident transpires (extensive preparation steps) and extend beyond the end of the immediate mitigation activities (follow-up).

Preparation

The preparation stage covers everything that needs to be done before an incident ever takes place. It involves technology issues (such as preparing response and forensics tools), learning the environment, configuring systems for optimal response and monitoring, and business issues such as assigning responsibility, forming a team, and establishing escalation procedures. Additionally, this stage includes steps to increase security and to thus decrease the likelihood of and damage from any possible incidents. Security audits, patch management, employee security awareness programs, and other security tasks all serve to prepare the organization for the incident.

Building a culture of security and a secure computing environment is also incident preparation. For example, establishing real-time system and network security monitoring programs provides early warning about hostile activities and helps in collecting evidence after the incident.

A company-wide security policy is crucial for preparing for incidents. This policy defines the protection of company resources against various risks, including internal abuse and lawsuits. Often the policy must satisfy the “due diligence” requirements imposed by legislation onto specific industries (such as HIPAA for healthcare and GLBA for the finance industry). A separate incident response policy, one that defines all the details of the response process policy and assigns the incident “owners,” might be needed to further specify the actions that have to be taken after a security incident. Such a policy contains guidelines that will help the incident response process to flow in an organized manner. The policy minimizes panic and other unproductive consequences of poor preparation.

Identification

Identification is the first step after an incident is detected, reported by third parties, or even suspected. Determining whether the observed event does in fact constitute an incident is crucial. Careful record keeping is very important, since such documentation will be heavily used at later stages of the response process. You should record everything observed in relation to the incident, whether online or in the physical environment. In fact, several incident response guides mandate pictures of the compromised systems and the environment in which they are used. Increased security event monitoring is likely to help at this stage by providing information about the chain of events. During this stage, it is important that the people responsible for handling the incident maintain the proper chain of custody. Contrary to popular belief, this is important even when the case is never destined to end up in court. Following established and approved procedures also facilitates internal investigations.

Various security technologies play a role in incident identification. For example, firewall, IDS, host, and application logs reveal evidence of potentially hostile activities coming from outside and inside the protected perimeter. Also, logs are often paramount in finding the party responsible for the activities. Security event correlation is essential for high-quality incident identification, due to its ability to uncover patterns in the incoming security event flow. Collecting various audit logs and correlating them in near real time goes a long way toward making the identification step of the response process less painful.

Containment

Containment is what keeps the incident from spreading and incurring higher financial or other loss. During this stage, the incident responders intervene and attempt to limit the damage by tightening network or host access controls, changing system passwords, disabling accounts, etc. During the containment stage, make every effort to keep potential evidence intact, balancing the needs of system owners and incident investigators. A backup of the affected systems is also essential. A backup preserves the system for further investigation. The important decision on whether to continue operating the affected assets should be made by the appropriate authorities during this stage.

Limited, automated containment measures may be deployed in the case of some security incidents, especially those on the perimeter of the organization. This is possible if security event correlation is used in the incident identification process for reliable threat identification. Correlation makes incident identification much more accurate, enabling automated containment measures such as firewall blocking, system reconfiguration, or forced file-integrity checks.

Eradication

Eradication is the stage in which the factors leading to the incident are eliminated or mitigated. Such factors often include system vulnerabilities, unsafe system configurations, out-of-date protection software, or even imperfect physical access control. Also, nontechnology controls such as building access policies or keycard privileges might be adjusted at this stage. In a hacker-related incident, the affected systems are likely to be restored from the last clean backup or rebuilt from the operating system vendor media with all applications reinstalled. In rare cases, the organization might decide that mitigating the flaws is impossible given the current environment, and will make the decision to migrate the affected system to a new platform

Time is critical during the eradication stage. The first response should satisfy several often conflicting criteria, such as accommodating the system owner’s requests, preserving evidence, and stopping the spread of damage, while simultaneously complying with all of the appropriate organization’s policies.

Recovery

During recovery , the organization’s operations return to normal. Systems are restored, configured to prevent recurrence, and returned to regular use. To ensure that the newly established controls are working, the organization might want to increase monitoring of the affected assets for some period of time. Increased monitoring implemented at the recovery stage not only leads to more effective protection of the affected assets, but also might be adopted as a new baseline for the whole organization, especially if such monitoring uncovers new threats.

Follow-Up

Follow-up is an extremely important stage of the incident response process. Just as in the preparation stage, proper incident follow-up helps to ensure that lessons are learned from the incident and that the overall security posture improves as a result. Additionally, follow-up is important in order to prevent the recurrence of similar incidents. A report on the incident is often submitted to senior management. This report covers actions taken, summarizes lessons learned, and serves as a knowledge repository, in case of similar incidents in the future. It might also summarize the intruder’s actions and tools and give details of the vulnerabilities exploited, and it may contain other information on the perpetrator.

Follow-up steps often need to be distributed to a wider audience than the rest of the investigation process. This ensures the IT resource owners are more prepared to combat future threats. To optimize the distribution of incident information, you can use various forms and templates, prepared in advanced for different types of incidents. A summary of suggested actions might also be sent to senior management. The more in-depth changes to the organization’s handling of security are performed at this step.

Benefits of the SANS framework

Overall, the SANS process allows you to give structure to the somewhat chaotic incident response process. It outlines the steps each organization must define. Such steps need to be easy to follow, since they have to work in a high-stress post-incident environment.

In fact, many of the above steps may be built from predefined procedures. Following the steps will then be as easy as selecting and sometimes customizing the procedures for each case at hand. Incident handling workflow thus becomes relatively painless and crucial steps are not missed. Using predefined procedures also trains incident response staff on proper actions for each process step. An automated system can be built to keep track of the response workflow, suggest proper procedures for various steps, and securely handle incident evidence. Additionally, such a system facilitates collaboration between various response team members, who can then share the workload for increased efficiency.

Now that we have built the response framework, let’s review the goals of an incident response program for various environments.

Small Networks

Since corporations often have their own endless tomes of security “best practices” governing incident response (however inadequate they may be, due to the policies being out-of-date, not promoted, or simple not followed), we’ll first focus on incident response for home systems or small businesses.

What are the ideal requirements of a small home office LAN or home system security response? Keep in mind that few users are excited about reviewing their system logfiles. Even fewer collect attack statistics from home systems (unless they are members of the http://www.dshield.org distributed intrusion detection project). Still fewer care about failed attacks (like CodeRed on a system with no web server or on a Unix machine). While collecting such data might make for scintillating conversation for experts, the average user probably does not care how many CodeRed hits his personal firewall blocked. In Windows environments, it is more practical for the average user to simply clean viruses in case of infection than to save them for future dissection and cataloging. While readers of this book might well be interested in dissecting Windows malware (see Chapter 2), most end users are not likely to have such a hobby.

An important consideration in a small network is that there’s usually no administrative requirement to keep audit trails for evidence—so most people do not keep them. Such neglect complicates incident response in comparison with corporate systems. While it is becoming more popular to report port-scanning kiddies to their ISPs, the endeavor often proves futile, especially when the suspected attack comes from a remote country. In fact, many apparent “attack attempts” actually come from worms trying to penetrate systems on random IP addresses, without regard to available vulnerable services.

Note that this heightened user transparency shouldn’t undermine the efficiency of security measures: the fact that users do not notice security measures should not undermine their efficiency against threats the measures are designed to counter.

Home security should serve to stop casual attackers from abusing the system, block popular automated attack tools such as worms, and (depending upon the individual security requirements) prevent some sophisticated intrusions as well. If the system is compromised, there should be enough data logged to learn what happened. This helps prevent recurrence, but it’s probably not enough to build a solid court case.

Before we dive into the area of response, let’s briefly return to prevention, since it falls within the preparation part of incident response, according to the SANS six-step model. Here are some of the examples of best practices for securing a small home LAN or a single Unix/Linux system:

  1. Remove all network services that are not used (NFS, NIS, web server, etc.).

  2. Set the host firewall (Linux iptables, ipchains, FreeBSD, NetBSD, or ipf or OpenBSD’s newer pf code) to drop or reject all incoming connections from the outside. If you can live with these restrictions, it will prevent all network hacking almost as well as being disconnected from the Internet: limiting outbound connections can be useful for a home network and can protect against a Trojan rooted inside.

  3. Use a strong password or long passphrase. (If you do allow remote access, it makes sense to make password guessing more difficult.)

  4. Use some form of automated backup (i.e., hard drive mirroring via script or similar provision).

Some elements considered “good security” (such as patching regularly) are conspicuously absent from this list. The reason is that the above measures simply need to be enabled once and require no maintenance, while contributing a great deal to security.

The incident response plan for smaller systems will likely be aimed at putting the system back as it was and preventing repeat attacks. The recovery stage of the response framework is perceived as much more important than follow-up. Dissecting the attack or seeking prosecution is usually impractical, outside of trying to prevent recurrence. Reporting to law enforcement might be appropriate for the larger company, but the smaller administrator usually elects to forgo such a labor-intensive endeavor. Still, even for smaller companies and individuals, knowing what to do in case of a malicious hacker break-in is important. An advanced preparation stage often saves a lot of grief during the actual break-in. You need an incident response plan that is created during the preparation stage of the incident process. Even though you might not think of it in terms of a formal response plan (as companies do), thinking ahead and having a plan prevents panic and other destructive reactions. Panic occurs when we encounter the unknown—and getting hacked constitutes the unknown for many users. However, wide availability of broadband and the still-miserable state of home/small office network security is changing that for many people. Moreover, while not everyone is hacked, viruses hit almost everyone, especially if you have to administer Windows systems.

So what is your response plan? If you think you are being hacked, the first step is to disconnect from the Internet. Simply pull the Ethernet plug, or turn off your cable or DSL modem. That’s it—you are safe for now. (This advice is not always applicable in corporate situations, where security administrators might want to monitor the hacker—if for no other reason than to collect enough evidence to build a court case.) Now, is it safe to plug the system back in? Not necessarily. If an attacker installed a backdoor or provided other means for returning, connecting the system before doing a cleanup is harmful. Similarly, rebooting the system probably won’t help.

Look around for suspicious signs. Are any files missing or new applications running? Is your antivirus product (if you have one) complaining? What about your personal firewall? In Windows 95/98/ME, there’s not a lot you can do. Apart from looking at running programs using the Task Manager (called by Ctrl-Alt-Del) or running Microsoft System Information, there is not much diagnostic power. However, you can search the Web to find WinTop, an old Microsoft PowerToy for Windows 95 that still works fine. Figure 21-1 shows WinTop running on a Windows ME machine.

WinTop diagnostic utility

Figure 21-1. WinTop diagnostic utility

WinTop shows all the processes running on Windows and identifies some classes of backdoors and hidden servers. Windows NT/2000/XP uses the built-in Task Manager to identify running services and processes. The Windows NT/2000 Resource Kit from Microsoft also contains several useful tools to kill processes and monitor users accessing the system remotely. In addition, many third-party utilities are available for Windows NT/2000 incident response, such as tools from SysInternals (see Chapter 2). These tools identify running hidden processes, discover network connections (and the processes that initiated them), and reveal connected users. Windows NT/2000/XP also has much better system logging support (Event Log) than Windows 95/98/ME (which has barely any). Having a disk with a trusted version of these utilities can really help, since you will know that hackers have not compromised the versions of the programs you are using.

Malware (such as Trojans and worms) can hide from these process viewers. Some utilities (such as those from SysInternals) provide a more in-depth view. Having system integrity software such as Tripwire helps, but the Windows version is not free. In addition, Tripwire is not designed for end users, but for corporate security departments.

On Unix/Linux, the trusty old /bin/ps command helps, as shown in Figure 21-2.

Screenshot of /bin/ps

Figure 21-2. Screenshot of /bin/ps

Malicious software (like rootkits, evil kernel modules, and network backdoors) can hide from /bin/ps. For Linux experts, looking in the /proc directory will help, as shown in Figure 21-3. (In Figure 21-3, ll is an alias for ls -l, common on Linux systems.)

Examining the /proc directory

Figure 21-3. Examining the /proc directory

How can you use the /proc directory to look for signs of malicious activity? In the above ps output, the process with ID number 818 is “named”—the Unix DNS daemon. Pretend for a second that it is a malicious “named” started by hackers and it hides from the ps command (e.g., by modifying the /bin/ps binary, as some older rootkits do). Then we can compare the process IDs shown by /bin/ps with the contents of the /proc directory. In this case, we would see that all process IDs shown by ps have their own entries in /proc, but there is another entry for “818” that is not in the process list. By simply performing the following:

cd /proc/818 ; ls -l

we can see all the processes (listed in Figure 21-4).

Process list

Figure 21-4. Process list

The process with this ID is indeed “named” (see the “exe” entry above for the process name). To get more details, look at various entries such as “cmdline” or “status”. For example, “cat status” produces a nice summary of process behavior (Figure 21-5).

Status output

Figure 21-5. Status output

There are special programs that look for similar signs of malicious activity, such as chkrootkit for Linux/Unix, which looks for traces of well-known hidden hacker tools.

Overall, using /proc provides a nice alternative to using tools such as lsof and also shows more of the system internals (always handy to learn).

Depending on what we found by looking at our system, we take our next step. While backup of important data is best done periodically rather than when disaster strikes (When it’s too late), now is a good time to make sure all the important data is saved elsewhere. In the case of a Windows virus attack, take extra steps to avoid backing up viruses with user files. Cases of virus reinfection from backup media are common. Writable CD-ROMs, CD-RWs, Zip disks, or even another hard drive (that is then taken out of the machine) can be used as backups. Networked backups are also useful (although probably better suited for the Unix world). Tools such as rsync can be used to securely replicate all the machine data over the network.

Now that you are sure your data is safely backed up, you can spend more time snooping around. If you have not found any apparent signs of malicious activity and your antivirus product is silent, there is not much that can be done on a typical Windows system. If you want to be sure that your system is safe, rebuild it from scratch: format the disks, install the operating system, and restore your files from backup media.

Install and update your antivirus scanner and personal firewall, and avoid using especially dangerous programs such as Outlook (or at the very least, configure them securely). For example, restrict various forms of active content in messages (JavaScript and especially ActiveX) and limit the network addresses that they can access. The bugs in such programs (they are legion) might still bite you. A third-party email client such as Netscape is safer, if you don’t mind losing some of the bells and whistles of Outlook.

What if this incident proceeds along a more ominous path? What if you discover your machine was erased completely or rendered unbootable with corrupted disks? Investigation is still possible, but reinstallation with recovery from backups will invariably be the last step.

If you are responsible for Windows machines in a home office or small office environment, consider reading Windows Internet Security: Protecting Your Critical Data by Seth Fogie and Cyrus Peikari (Prentice Hall, 2001). It shows security newcomers how to diagnose and treat hacker or virus attacks on Windows machines. On Unix, you can go much farther. We cover some of the available tools in the next sections.

Overall, the incident response process for a small network is aimed more at putting the system back as it was than at in-depth investigation and prosecution.

Medium-Sized Networks

Let us now consider a small- to medium-sized business, which likely has no dedicated security staff. Although similar to the home system case, the medium-sized network has some important differences, outlined below. As discussed in Chapter 18, a company is regulated by more administrative requirements and legal responsibilities than the home office of a private citizen. Thus, the level of security and accountability is higher. Most organizations connected to the Internet have at least one firewall and some sort of DMZ set up for public servers (web, email, FTP, remote access). Many deploy intrusion detection systems and virtual private networks (VPNs). Signals coming from all these technologies need to be interpreted and dealt with The technologies deployed during the preparation stage can greatly help future identification and containment.

The security response for such an organization focuses on severe threats. It is well known that many low-severity threats (such as someone performing port scans) might be precursors for more serious attacks (such as attempted break-ins). Unfortunately, a small company rarely has the personnel to investigate them. Ideally, security reports should include more serious attacks that actually have a chance of succeeding (unlike, say, exploits for services that are not installed). A central syslog server (for Unix environments) is of great value: using freeware tools such as logcheck (http://www.psionic.com), swatch (http://www.oit.ucsb.edu/~eta/swatch/), logwatch (http://www.logwatch.org), or logsurfer (http://www.cert.dfn.de/eng/logsurf/) helps to cope with a flood of logging information and to detect signs of an attack. A host-based IDS will probably take priority over a network IDS, since the latter produces much more information that requires analysis, while alerts from the former usually indicate a successful intrusion requiring immediate corrective action.

In addition, however unconventional it might sound, security controls for this environment must be user-friendly in order to work. The reasoning behind this is simple: the friendlier they are, the more they will be used—saving the company , for example, from the “password disease” (if you force everybody to have difficult-to-guess passwords, they are likely to post them on their monitors so they don’t forget them). The recent rise of hardware security appliances configurable via a browser-based GUI proves this trend.

The audit trail (including security device and system logs) also needs to be collected and kept with more diligence in a medium-sized network than in a home system, since it might be used for attack analysis. System logs and logs from security devices should be archived for at least a week, if storage space permits. This allows you to track the events that led to a compromise, especially if the attacker first tried other methods or tried to penetrate other machines. This information helps investigators assess the damage, evaluate the efficiency of network defenses, and accumulate more evidence for possible litigation or prosecution. It is necessary to stress the importance of a written security policy for audit data collection. Unless mandated by policy or present in a contract signed by all employees, collection of such data can be considered a privacy offense, putting the company at risk of being sued. This danger especially applies to network sniffers that record all network traffic.

Because of the expense, the incident response process for a small- to medium-sized company concentrates on restoring functionality rather than prosecuting the attacker. The eradication and recovery stages are prominent, often in lieu of preparation (there’s little planning, if any) and identification (the incident is only responded to when it becomes obvious). Reporting the incident to law enforcement might happen if the benefits of such an action are viewed as exceeding the problems it is sometimes known to cause. The critical issue for incident response in this environment is is a response plan. While a dedicated team is impractical, having a plan will take the company a long way toward avoiding common incident problems. Such problems can include panic, denial, confusion, the destruction of evidence, and the blaming of random individuals within the company—as the worm mayhem scenario earlier in this chapter illustrated. It makes sense to designate a person responsible for incident response. Even if not trained in information security, such a person might be able to recognize that an incident is taking place and put a plan into action by contacting the right people. Thus, the preparation stage centers on finding and dedicating such a person within the organization.

Overall, the security response process for such a company focuses on surviving as opposed to fighting back—i.e., speedy recovery and inexpensive prevention. Responding to a major incident will probably involve outside consultants, if detailed investigation is justified for cost reasons. Pursuing an attacker is unlikely.

Large Networks

A company with a large IT department and a dedicated security staff is in a unique position in relation to security response. On the one hand, they have more resources (human and financial) and can accomplish more in terms of security; on the other hand, they have more eggs to watch, in many different baskets. They will likely spend more effort preparing for potential incidents and will often have the infrastructure to identify and contain them.

The theme for a large company’s security response is often cost effectiveness : “How do we accomplish more with less? How do we stay safe and handle the threats that keep appearing in ever-increasing numbers? What do we do when the safeguards fail and the enterprise is faced with a major security crisis?” These questions can be answered by a good security plan based on the SANS six-step process.

A large network adds complexity to the security posture—and having complicated perimeter defenses and thousands of internal machines on various platforms does not simplify incident management. Firewalls, IDSs, various access points (e.g., dial-up servers, VPNs), and systems on the LAN generate vast amounts of security information. It is impossible to respond to all of it. In addition, few of the events mean anything without the proper context: a single packet arriving at port 80 of the internal machine might be somebody from within the LAN mistyping a URL (not important), or it could be a port-scan attempt within the internal network (critical importance) or misconfigured hardware trying to do network discovery (low importance).

Using automated tools to sort through the incoming data might help to discover hidden relations between various security data streams. The simplest example is the slow horizontal port scan—port 80 on IP 1.2.3.4, then port 80 on 1.2.3.5, and so on—as opposed to a sequential port scan with port 80 on 1.2.3.4, then port 81 on 1.2.3.4, and so on. A single packet arriving at the port will most likely go unnoticed if the observer is only looking at an individual device’s output, while the evidence of a port scan becomes clear with correlation. Thus, it makes sense to use technology to intelligently reduce the audit data and to perform analysis in order to selectively respond to confirmed danger signs. Commercial Security Information Management (SIM) solutions can achieve this.

In a large environment, the security professional may be tempted not only to automate the collection and analysis of data but to save even more time by automating incident response. A certain degree of incident response automation is certainly desirable. A recent trend in technology merges SIM solutions with incident workflow engines and aims to optimize many of the response steps. However, an automated response can cause problems (see http://online.securityfocus.com/infocus/1540) if deployed carelessly. Difficult-to-track problems might involve creating DoS conditions on a company’s own systems.

Incident response in a large corporate environment should have a distinct containment stage, since many organizations still adhere to the “hard outside and soft inside” architecture rather than one based on defense-in-depth. Thus, promptly stopping the spread of damage is essential to an organization’s survival.

On the investigative side, a large organization is likely to cooperate with law enforcement and try to prosecute attackers. For certain industries (such as finance), reporting incidents to law enforcement is mandatory. As a result, the requirements for audit trails are stricter and should satisfy the standard for court evidence handling (hard copies locked in a safe, raw logs kept, etc.). You can learn more about law enforcement investigative procedures for computer crimes in the article “How the FBI Investigates Computer Crime” (http://www.cert.org/tech_tips/FBI_investigates_crime.html).

Overall, a large company’s security response concentrates on intelligently filtering out events and developing policies to make incident handling fast and effective, while focusing on stopping the spread of the attack within internal networks. An internal response team might carry the burden of investigation, possibly in collaboration with law enforcement.

Incident Identification

Depending upon how far you want to go to improve the detection capabilities of your computer system, consider solutions ranging from installing a full-blown network intrusion detection system, such as Snort, to doing nothing and relying on backups as a method of recovery. The optimal solution is somewhere in the middle of these extremes.

On Unix/Linux, an integrity-checking program helps a lot. Such programs can pinpoint all changes that have occurred in the filesystem. Unfortunately, malicious hackers have methods that can deceive those tools.

Here, we illustrate how easy is to use such tools. For example, let’s consider AIDE (a free clone of Tripwire with a much simpler interface). AIDE runs on Solaris, Linux, FreeBSD, Unixware, BSDi, OpenBSD, AIX, and True64 Unix. To use AIDE, perform the following steps:

  1. Download the source from its home site (http://sourceforge.net/projects/aide) or from any of the popular Linux RPM sites (a binary RPM package is available for Linux).

  2. Install or compile and install it as follows. To install:

    rpm -U aide-0.8-1.i386.rpm

    To compile and install:

    tar zxf aide*gz; cd aide-0.8; ./configure; make    ; make  install
  3. In order to create a database with a list of all file parameters (sizes, locations, cryptographic MD5 checksums) run aide -init. It is crucial to perform this step on a known clean system—e.g., before connecting the system to the network for the first time. Only a clean baseline allows reliable incident investigation in case of a compromise.

  4. To check the integrity of your system, run aide -check.

  5. To update the database upon introducing some changes to your system, run aide -update.

To use the tool for effective security, you must safeguard the resulting database (/var/aide/aide.db) as well as the tool’s binary file (such as /usr/bin/aide) and related libraries. Copy it on a separate diskette to be used in case of an incident.

Aggressive Response

We have covered some of the basics of incident response in this chapter. Now, let’s address the absolute taboo of incident response: namely, the desire to hack back. If you feel like retaliating, get the attacker’s IP address, run it through a whois service (either a program or an online service such as http://www.SamSpade.org), and report the intruders to their Internet Service Provider or, if their ISP supports (or tolerates) hacking, to their upstream ISP. While certain branches of the government and the military are allowed and even encouraged to hack back, such actions are not appropriate for corporate security professionals. The possible risks far outweigh the gains.

Recovery

Backup, backup, backup. Recovery is much simpler if you can just plug in a CD-ROM with yesterday’s (or a week-old) copy of your data and continue from there. However, imagine that a malicious virus destroyed your collection of MP3s and that your hamster ate your backup CD. Is all hope lost? The short answer is yes. We are only half-joking, since there is no guarantee that any material will be recovered.

In Windows 9x/ME, there are tools that provide reliable file undeletion, if they are used a short time after the file is destroyed. How the file was destroyed makes a difference during recovery attempts. For example, one known worm overwrote files with zero content, without removing them. In this case, most available Windows undelete utilities failed, since they are designed to recover files that are deleted and not replaced with zero-sized copies.

In Windows NT/2000/XP, there is a chance of recovery as well. If NT/2000 was installed on a FAT partition (the same as Windows 9x uses), the files can probably be recovered. In NTFS, the chances for recovery are much lower.

The Unix situation is even worse. An old Unix reference once claimed that on Unix there are no “problems with undeleting removed files” for the simple reason that “it is impossible.” In reality, undeleting is not entirely impossible, but to do so requires spending time with forensics tools that often find only pieces of files, and then only after extensive content-based searching. Such a process is also Unix vendor, version, and flavor-dependent. For example, RedHat Linux versions up to 7.2 allowed easy undeletion using tools such as e2undel and recover (based on a Linux Undeletion HOWTO available at http://www.linuxdoc.org). However, due to some changes in filesystem code, what was once easy is no longer possible. Overall, Unix file recovery falls firmly into the domain of computer forensics (see Chapter 22).

Briefly, The Coroner’s Toolkit (TCT) gives you a finite chance to restore files on Solaris, SunOS, FreeBSD, OpenBSD, and Linux (of course). TCT is the most popular Unix forensics tool. A newer competitor has been released by Brian Carrier (from @Stake): the TASK toolkit incorporates TCT functionality with the TCT-Utils package (also by Brian Carrier). The undeletion functionality of TCT+hat works on all supported Unix flavors is the unrm/lazarus combo.

Overall, the undeletion procedure for these tools is as follows:

  1. Become root on your system.

  2. Determine which filesystem the file was erased from (if you lost /home/you/important.txt and your df command tells you /dev/hda5 is mounted as /home, then the file was on partition /dev/hda5).

  3. Unmount the above partition or even take the disk out and install it in a different machine. Another good solution is to make an image (bit-by-bit or forensic) copy and operate on it. Use a different machine for recovery. The goal is to make sure the file is not overwritten by your recovery effort.

  4. Run the unrm tool on the above partition:

    #~/tct-1.09/bin/unrm /dev/hda5 > /tmp/all-data

    Make sure /tmp is not part of /dev/hda5!

  5. Now run lazarus:

    #~/tct-1.09/lazarus/lazarus -r /tmp/all-data
  6. Start up your browser and open the file ~/tct-1.09/www/all-data.frame.html. You should be able to look at all deleted files (with no names) by type.

  7. As an alternative to step 6, you can go to ~/tct-1.09/blocks and look for your file based on size and type. Run various commands (such as grep and file) to locate the file in the sea of removed file chunks.

Unfortunately, this procedure is not guaranteed to work. Success greatly depends on a combination of luck (the most important factor), the amount of time that has passed since file deletion, and your knowledge of the file parameters. It is much easier to recover text files, since you can just use grep within a block to look for the file content.

References



[1] The SANS Institute’s six-step incident response methodology was originally developed for the U.S. Department of Energy and was subsequently adopted elsewhere in the U.S. Government and then popularized by the SANS Institute (http://www.sans.org).

[2] If you think not disclosing is a measure against this effect, think again—often the attacker will do it for you, just to embarrass your company. Also, new laws may require you to disclose incidents.

[3] Unlike the popular misconception, CERT is not a Computer Emergency Response Team (see http://www.cert.org/faq/cert_faq.html#A2).

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset