Chapter 8 IRC and Botnets

Introduction

In this chapter we look at ourmon’s IRC facility and see how it can be used to detect botnet client meshes and botnet server meshes as well as the occasional compromised host that may be hosting an IRC-related hacker channel. We will refer to the two case histories introduced in Chapter 6: “Case Study #3: Bot Client” and “Case Study #4: Bot Server.” We will also look at a few other cases of malware that could be bot-related. Before we get started on bot clients and servers, though, we want to talk about the IRC protocol itself and then take a brief look at ourmon’s IRC-related statistics. This will help you navigate ourmon’s IRC Web page and reports.

Understanding the IRC Protocol

Assume that the local enterprise security officer has been informed that a botnet client exists on the local IP address 192.168.2.3. How might that happen? One way is that another security or network engineer sends e-mail to a locally registered abuse address, saying something like:

To: [email protected]

Subject: scanning client on your IP address

Greetings. You have a host scanning from IP address 192.168.2.3 and it is scanning hosts on our campus at ports 445 and 139. Please fix this problem and advise us when the problem has been solved.

Yours truly, Joe Network Person, Joe Network Inc.

So now you use a network monitoring device of some sort, possibly a sniffer like tcpdump (www.tcpdump.org), which is free, or possibly a commercial tool. In our case we might reach for a free tool that is ASCII oriented (due to previous experience) called ngrep (network grep) and invoke it as follows:

# ngrep -d em0 tcp and host 192.168.2.3

The tool ngrep can take patterns (regular expressions) and Berkeley Packet Filter (BPF) expressions of the kind used with sniffers like tcpdump or Wireshark (www.wireshark.org). The incantation means “Run ngrep on the Ethernet interface called em0” (em is the FreeBSD Intel driver). In this case we are not using a regular expression. The BPF expression is “tcp and host 192.168.2.3,” which means “Give me only TCP packets sent to and from host 192.168.2.3.” So after waiting patiently for some period of time, we might see the following:

T 10.1.2.3:8641 -> 192.168.2.3:3103 [AP]

:[email protected] PRIVMSG #zz :.advscan asn445 330 5 0 65.78.174.x -r -s..

So what does this mean, and is it bad news? It means you have a botnet with one or more hosts, and yes, it is bad news. Ngrep has extracted a message in IRC format sent from the bot server to the bot client, telling the latter to do scanning using a particular exploit (presumably for an ASN.1 vulnerability on port 445). Later on you might see a message roughly like the following one, which unfortunately means that a new host (192.168.2.4) has been infected and has finished a download of something called “msutil64.exe.” We suspect that msutil64.exe has some sort of malware payload in it. These are both examples of the IRC protocol that might be used by botnets.

T 192.168.2.4:2345 -> 10.1.2.3:8641 [AP]

:[email protected] PRIVMSG #zz :^B.DOWN.^B File download: 19.0KB to: c:msutil64.exe @ 19.0KB/sec.]

Internet Relay Chat (IRC) is a protocol specified by the Internet Engineering Task Force (IETF). Its original version was RFC 1459, which was written in 1993. Later on, RFC 1459 was updated (but not replaced) by RFCs 2810-2813. (See www.irchelp.org/irchelp/rfc for more information.) Internet Relay Chat has a strange history. It is not the only chat protocol (there are many such protocols, and one might include Internet messaging protocols as well). But it is popular with botnet software authors as well as with ordinary users who just seek to chat. It has been popular with hackers because there is no need to register accounts or handles, and it is easy to set up your own channels and servers. It has also been popular with hackers for discussing the distribution of illegal files (warez) and attack methodologies.

The basic idea is that you have a network of one or more servers and IRC clients. A user must connect to an IRC server with an IRC client at a certain port (traditionally port 6667, although any port can be used), select a nickname (a nick or handle), and join one or more channels, supplying a password if the channel requires one. Joe Hacker might call himself l33tguy in the channel. The important thing to note here is that the logic that glues IRC together is the IRC channel name. The channel is a logical chat room.

Figure 8.1 shows two IRC networks, both organized around channels. Network 1 is organized around the linux chat channel and consists of two servers and a number of client hosts. Network 2 has one server (which happens to be a botnet C&C) and a couple of clients. With Network 2, the channel name is lsass445. Using the IRC protocol, a client sends a data (PRIVMSG) message to an IRC channel, which is an abstraction for a set of users on possibly different client computers and one or more servers. Channel names are basically ASCII strings with a little bit of “syntax sugar” possible. The server that the client is directly connected to takes the message (typically just an ASCII string like “hi there”) and forwards it to other directly connected clients as long as the client has logged into the channel. The first server may also forward it to other servers if other servers are connected to the first server. In turn those servers may forward the message to other clients or servers interested in the channel, and so on. IRC is said to be a logical mesh network and the data is flooded to other potential recipients in the mesh. This means data goes one way to all the logical clients through all the servers. Put another way, the servers make sure the message doesn’t get sent twice to any client interested in the channel.

Figure 8.1 Two IRC Networks

TIP

See http://en.wikipedia.org/wiki/Internet_Relay_Chat for a good discussion of both IRC and its history, although it doesn’t say much about IRC’s dark side.

Our goal here is not to explain the entire IRC protocol. Ourmon cares only about a small, restricted subset of IRC, and that subset is all we intend to explain here. Also please note that we are talking about the low-level IETF IRC protocol; we are not talking about IRC commands used in any particular IRC client program. The four kinds of IRC protocol messages ourmon understands are as follows:

JOINS: JOINS are used by an IRC client to log into a channel on a server. The channel name and password are part of the JOIN message.

PINGS: PINGS are sent from a server to a client to discover if the client is still interested in the channel and has not for example crashed or gone away otherwise. Typically PINGS are sent in a periodic fashion at some multiple of 30 seconds.

PONGS: See PINGS above. PONGS are returned from the client to the server to show that it does not want to be logged out and still exists.

PRIVMSG: A PRIVMSG contains both the channel name and data sent to the channel name. The basic idea here is that the message (“hi mom” or “scan using port 445”) should be sent to all the hosts in the logical IRC channel.

JOINS and PRIVMSG messages contain the channel names, and ourmon uses those messages along with the IP addresses in the IP header to construct a list of channels with associated IP hosts (as IP addresses). Ourmon does not look at the data part of the PRIVMSG, because our goal is only to construct a network mesh, not to look at user data. It also keeps track of PING and PONG messages because they indicate basic IRC mesh connectivity. It is possible for a client to send a JOIN message and not do PINGS and PONGS. So in some cases a client could simply send a JOIN over and over again. In the world of large IRC servers, clients might do this to keep an administrator from logging a particular client out manually.
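The bookkeeping just described can be sketched in a few lines of Python. This is only an illustration of the idea, not ourmon's actual code: the function names, the tuple layout of the input, and the choice to record both endpoints of a packet under the channel are our assumptions. Note that the text of a PRIVMSG is never examined; only the command and its channel target are read.

```python
from collections import defaultdict

# The four IRC message types the chapter says ourmon tracks.
TRACKED = {"JOIN", "PING", "PONG", "PRIVMSG"}


def parse_irc_line(line):
    """Return (command, channel) for one raw IRC line.

    Returns (None, None) for message types that are not tracked.
    channel is None when the message carries no channel name.
    """
    line = line.strip()
    if line.startswith(":"):
        # Drop the optional ":prefix " that names the sender.
        _, _, line = line.partition(" ")
    parts = line.split()
    if not parts:
        return None, None
    command = parts[0].upper()
    if command not in TRACKED:
        return None, None
    channel = None
    if command in ("JOIN", "PRIVMSG") and len(parts) > 1:
        # Only the target is read; the PRIVMSG payload is ignored.
        target = parts[1].lstrip(":").split(",")[0]
        if target.startswith(("#", "&")):
            channel = target
    return command, channel


def build_mesh(packets):
    """packets: iterable of (src_ip, dst_ip, irc_line) tuples (hypothetical layout).

    Returns (channels, counts): channel name -> set of IPs, and
    message type -> number of messages seen.
    """
    channels = defaultdict(set)
    counts = defaultdict(int)
    for src_ip, dst_ip, line in packets:
        command, channel = parse_irc_line(line)
        if command is None:
            continue
        counts[command] += 1
        if channel is not None:
            # Assumption: both endpoints are recorded as channel members.
            channels[channel].update((src_ip, dst_ip))
    return channels, counts
```

Feeding this the two example packets from earlier in the chapter would place 10.1.2.3, 192.168.2.3, and 192.168.2.4 together under channel #zz, which is exactly the kind of mesh the IRC report lists.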

Of course we are really looking for botnets with this mechanism. We don’t care about human chat groups. We care about programmatic use of IRC as a communication channel and programs that link up to servers elsewhere (meaning bot clients and bot servers). As a result, our focus is on statistics. For example, we want to know the IRC channel names and the IP addresses of hosts in those channels. We want to know if mysterious new channels appear. We want to know if the statistics show anything unusual, which might include unexpected numbers of PINGS and PONGS, indicating a very large (and previously unknown) IRC channel on campus. We especially want to know about any IRC channel that is inhabited by a large number of scanning hosts. This might indicate a botnet client mesh.

Ourmon’s RRDTOOL Statistics and IRC Reports

In this section we look at ourmon’s IRC user interface. Before we go on, refer to Chapter 7, Figure 7.1. Find the middle jump table with the title important security and availability reports/web pages and then note the hypertext link called irc stats page. That’s where the ourmon IRC statistics live. Go to that page for the following discussion. A screenshot of the IRC page is shown in Figure 8.2. We want to discuss both the page and the format of the summarized IRC report as well as say a few words about the RRDTOOL statistics available on that page.

Figure 8.2 The IRC Stats Page

The IRC stats page has three things available on it that are all IRC-related:

The 30-second IRC report: This report and the weekly summarizations all have the same format. However, this particular report only has the last 30 seconds’ worth of data.

The weekly summarizations, including the daily report: As is usual with summarizations, the current daily report is available at the left-hand side. It is run hourly and rolled over at midnight to become yesterday. Yesterday is in turn rolled over to become two days ago, and so on. Altogether there are eight full days in addition to the current day.

The RRDTOOL global IRC stats: Figure 8.2 shows the daily strip chart, and Figure 8.3 shows a weekly strip chart. As is usual with RRDTOOL, strip charts for daily, weekly, monthly, and yearly stats are available. The ourmon system counts the total number of IRC PING, PONG, JOIN, and PRIVMSG messages for the entire network as seen by the probe. Usually these messages have low counts.

Figure 8.3 Normal Weekly IRC Statistics

TIP

A typical way to use this information is to take a quick glance at the daily and weekly total stats. This could help you detect the presence of an IRC bot server on your network (as we will see in a moment). You want to see normal small daily bumps, not counts in the thousands. Then take a look at the summarized reports for today (daily) and yesterday. You want to see if there are new channels you don’t understand and if there are so-called evil channels with sets of attacking hosts. We will look at examples in the following sections.

The Format of the IRC Report

In this section we will look at a brief overview of the IRC report. First let’s talk about the structure of the IRC report and then take a look at a few benign human chat groups so that we know what normal looks like. Our goal here is to explain some of the statistics and the overall layout of the report. The basic report format consists of a timestamp of when the report was made, followed by a short section of global statistics (see the following), and two bigger sections on channel statistics and host statistics.

[Listing: layout of the IRC report, not reproduced here]

Various subreports are found under channel stats and host stats. We will look only at the first few channel subreports, which are by far the most important parts of the IRC report. We informally call the first channel subreport the evil channel report; it is officially called channels sorted by wormy (evil) hosts. We define an evil channel as a channel that might have a number of scanning clients in it. We will call the second subreport the channel max message report. It is labeled channels sorted by max messages, and it sorts channels by the maximum number of IRC messages seen over the time period. The third channel subreport is also useful; we will call it the channel host report. Its heading is channels with associated host IPs and host stats. In this subreport all the host IPs in the channel are given, and each host IP has a set of statistics associated with it.

The following is a simple and benign example. First we want to look at something safe, and then we will be able to compare it to a botnet client mesh. Later on we will see some examples that are not so benign.

NOTE

Compared with real ourmon data, the tabular data shown in Tables 8.1 and 8.2 has been simplified for formatting reasons. Not all available fields will necessarily be shown in the examples.

Table 8.1 Channels Sorted by Max Messages

Table 8.2 Channel Ubuntu with Per-Host Stats

In the two tables we see normal (and benign) IRC statistics. In this report, the evil channel report has no entries, so we do not show it. In Table 8.1, we show channels sorted by max messages. All IRC channels seen during the time in question are listed, and the counts for all four basic message types are added together and put under the label msgs. We see that channel Ubuntu has sent 4275 messages, which is more than the second channel, Rubyonrails. The number of PRIVMSGS is high, which can be taken as a sign that the channel is probably truly occupied by people, compared to a channel that has no PRIVMSGS and possibly only JOIN messages. The various columns have these meanings:

Msgs: Total number of IRC messages for all hosts in that channel

Joins: Total number of JOINS

Privmsgs: Total number of PRIVMSGS

Ipcount: Total number of IP hosts in the channel (including IRC servers)

Wormyhosts: Total number of hosts deemed to be scanners according to the TCP work weight.

Evil?: E means that there are at least two scanners, and e (lowercase) means at least one; this flag is both a joke and an attempt to alert the analyst to potential trouble.

NOTE

Why is the Evil flag a joke? On April 1, 2003, Steve Bellovin, a well-known security expert, posted IETF RFC 3514. He proposed that every IP packet should have a flag set if it was evil. In other words, hackers with evil intentions should mark their packets so that firewalls could drop them. Unfortunately, this idea remains unimplemented.

This subreport is important for any number of reasons. First and foremost it gives you a list of the IRC channels within your network. Take a good hard look at that list. You want to compare today’s summarization with previous days to see if you have new channels (possibly new channels with strange names). Knowledge of your IRC channels is important because it can lead you to detect botnets or unknown hacker chat channels on your own, sans fancy expert knowledge. IRC channels that lack PRIVMSGS are also interesting. This means the channel is not being used for chat. It is possible that it is unpopular, but many hosts on a channel with no PRIVMSGS could be a sign of a botnet channel. One reason for this is that some botnets have used JOIN messages as their data channel and have not transmitted commands using PRIVMSG.

TIP

Know the names of your IRC channels so you can look for sudden changes in those channel names. This might not be easy to do at a university, but within a private enterprise network you might have no IRC at all. So any IRC activity could be evidence of an infection or a worker who is not working and is indulging in games.

The channel subreport entitled channels sorted by evil factor appears at the top of the IRC report. It is extremely important because its primary goal is to alert you to an attacking botnet client mesh. Thus we put it at the top of the report so you don’t have to go far to find it. It is sorted by the number of “wormyhosts”—in other words, by the number of hosts that are scanning. A high number of scanning hosts in an IRC channel is likely a botnet client mesh. For example, if you have seven hosts in the IRC channel and six of them are local hosts (with a remote server) and most local hosts have high work weights, you probably have an infected channel. This subreport has the same form as the channels sorted by max messages subreport.
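To make the column definitions and the evil channel sort concrete, here is a minimal Python sketch of how one row per channel could be computed from per-host statistics and then sorted by the number of wormy hosts. It is not ourmon's implementation. The field names follow the column headings used in the report, but the WORMY_THRESHOLD cutoff for calling a host a scanner is purely a hypothetical value, since the chapter does not state one.

```python
# Hypothetical cutoff: a host whose maxworm is at or above this value
# is counted as a scanner. The chapter does not give the real rule.
WORMY_THRESHOLD = 90


def evil_flag(wormy_hosts):
    """E for two or more scanners, e for exactly one, blank otherwise."""
    if wormy_hosts >= 2:
        return "E"
    if wormy_hosts == 1:
        return "e"
    return ""


def summarize_channel(name, hosts):
    """Build one channel row from a list of per-host dicts.

    Each host dict carries tjoin, tping, tpong, tprivmsg, and maxworm.
    """
    wormy = sum(1 for h in hosts if h["maxworm"] >= WORMY_THRESHOLD)
    joins = sum(h["tjoin"] for h in hosts)
    privmsgs = sum(h["tprivmsg"] for h in hosts)
    pings_pongs = sum(h["tping"] + h["tpong"] for h in hosts)
    return {
        "channel": name,
        "msgs": joins + privmsgs + pings_pongs,
        "joins": joins,
        "privmsgs": privmsgs,
        "ipcount": len(hosts),
        "wormyhosts": wormy,
        "evil": evil_flag(wormy),
    }


def evil_sort(rows):
    """Channels with the most scanning hosts come first."""
    return sorted(rows, key=lambda r: r["wormyhosts"], reverse=True)


def max_message_sort(rows):
    """Busiest channels (by total message count) come first."""
    return sorted(rows, key=lambda r: r["msgs"], reverse=True)
```

The same rows feed both sorts, which matches the statement that the two subreports share one format and differ only in ordering.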

The other important subreport is channels with per host stats. Table 8.2 is an example and has been simplified to show one client host and one server host. Here the IP hosts and statistics related to those IP hosts are given under the channel name. The various column headings are as follows:

Ip_src: The IP address of the IRC host in question.

Tmsg: Total max IRC messages (JOINS, PINGS, PONGS, PRIVMSGS).

Tjoin: Total number of JOIN messages.

Tping: Total PING messages.

Tpong: Total PONG messages.

Tprivmsg: Total PRIVMSGS.

Maxchans: Count of the number of channels this host has joined.

Maxworm: This is a special form of the TCP work weight. This particular version of the TCP work weight is the maximum value seen over all 30-second instances in the IRC summarization. It is also a “weak” statistical measure. We will discuss it in more detail in a moment.

Server?: The probe IRC module attempts to figure out if an IRC host is an IRC client or IRC server. S stands for server and H stands for host. Not all IRC protocols conform to the IETF standards; sometimes you might see an IRC channel with all servers. This is not unusual and is sometimes found with computer games using IRC.

Sport/dport: These are sampled IRC TCP source and destination ports. This field may sometimes make obvious the destination port on the server, which could be a useful thing to know. It is also a per-host sample, so if the host is in multiple channels, it might be wrong. Look for hosts in the channel that agree on the server port.

First_ts: This field is new. It shows the first time a host in an IRC channel showed any IRC activity during the day. The timestamp is based on a particular IP host in a channel, so the same host in a different channel might have a different timestamp.

How is the TCP work weight used in IRC summarizations? The IRC summarization itself is pulling together a set of IP hosts found to be talking inside a particular IRC channel. Let’s say we have two channels, one called bark and the other called x0#. Channel bark has 10 clients and one server. Channel x0# has five clients and three servers. When we look at these two channels in channels with per host stats we see that channel x0# has five clients, all with TCP work weight values (maxworm) of 99. So from the big picture this means we have a channel with all its clients scanning. The TCP work weight is the maximum value of all work weights seen. The reason is that if you have an outbreak of multiple bots it becomes pretty easy to spot that all of them or most of them (the clients in channel x0#) are infected. This is what the evil channel report is trying to show you. If you have a high work weight for a good number of hosts, you can assume that all the clients in this channel are infected, too. Some of them might not have been ordered to scan or might for some reason not be responding to the hacker’s commands.

Here we want to draw your attention to a channel where a number of hosts are all behaving badly in the same way, which strongly implies that they are under remote control. In addition, the IRC version of the TCP work weight is a weaker statistical measure than the TCP work weight used in the TCP port report. It is calculated the same way in terms of SYN count, FIN count, and so on. However, in this case we don’t insist on a strength value of approximately 1 SYN per second. Three SYNS and no FINS and no data packets will in this case still get you 100 percent for a host. This could detect some cases of weak scanning done by a botnet mesh. But it also could result in false positives where there are one or two hosts with high work weights in an IRC channel with many other hosts. Again, the goal is to show multiple scanners in a botnet mesh, which leads you to suspect that the entire set of hosts in that channel is infected. When in doubt, you can also look at the TCP port reports to see if the host is scanning from the pure anomaly detection point of view. We will touch on this idea more in a moment and in the next chapter, when we talk about tricks for searching the ourmon logs.
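The difference between the two work weight variants can be illustrated with a short Python sketch. We assume here that the work weight is the percentage of control packets (SYNs sent, FINs sent, and RESETs received) out of all TCP packets for the host; that formula is our reading of ourmon's definition and is not restated in this section. The min_syns parameter, and the choice to return None for a host that falls below it, are hypothetical stand-ins for the “approximately 1 SYN per second” strength requirement.

```python
def work_weight(syns_sent, fins_sent, resets_rcvd, total_pkts):
    """Percentage of a host's TCP packets that are control packets (assumed formula)."""
    if total_pkts == 0:
        return 0
    control = syns_sent + fins_sent + resets_rcvd
    return min(100, round(100 * control / total_pkts))


def irc_maxworm(samples):
    """Weak IRC variant: maximum work weight over all 30-second samples.

    samples is a list of (syns_sent, fins_sent, resets_rcvd, total_pkts)
    tuples. No minimum SYN rate is required.
    """
    return max((work_weight(*s) for s in samples), default=0)


def port_report_weight(sample, min_syns=30):
    """Stronger variant: ignore hosts sending fewer than min_syns SYNs.

    min_syns=30 per 30-second sample stands in for roughly 1 SYN per
    second; returning None means the host would not be listed.
    """
    if sample[0] < min_syns:
        return None
    return work_weight(*sample)
```

With these definitions, a host that sent three SYNs and nothing else scores 100 in the IRC variant but is not reported at all by the stronger variant, which is why one or two high values in a large channel can be false positives.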

Notes from the Underground …

Hackers and Channel Names

We have seen some really bad choices for channel names from those on the dark side. For example, xploit or lsass445 might not have been the best choices. The latter is particularly bad given that it alludes to the exploit being used to grow the number of hosts in the botnet. That said, there is no telling why human beings pick the channel names they pick. The only true recourse for the analyst is to be knowledgeable about which channel names are normal locally and to investigate new ones if local security policies allow such investigation.

For more details on the subreports in the IRC summarization, see ourmon’s info.html Web page under its IRC section.

Detecting an IRC Client Botnet

In this section we take a look at some example client botnets detected in action. This will include our Case Study #3 from Chapter 6. When you are looking at the evil channel sort or the max message sort of channel names, there are really four possible outcomes for botnet client mesh detection:

1. You might have an attacking botnet client mesh with one, some, or all hosts in the channel scanning.

2. You might have a passive botnet client mesh and need other means to identify it.

3. You might have a false positive (it isn’t a botnet client mesh, it’s something else entirely).

4. You might not be able to figure it out.

So let’s say you decide to look at the ourmon IRC summarization:

channels sorted by evil factor: max number of wormy hosts:

and you see something like the report shown in Table 8.3.

Table 8.3 Evil Channel Sort

So there are four channels that need to be investigated. Channel x0# has no PRIVMSGs, nine hosts, and five scanners. This does not look good. The other three channels have only one scanner in them. Odds are good at this point that channel x0# is evil. The other three could simply be false positives.

Let’s look at x0# and its host breakdown to begin to see why we can claim it is a botnet (see Table 8.4).

Table 8.4 Channel x0# Hosts

Let’s also look at the summarized TCP port report for one of the local IP addresses, which we get from the Web page syndump summarization:

[Listing: summarized TCP port report entry for a local host, not reproduced here]

What we can observe here is that all the local hosts (net 192.168) have high work weights. The port report shows why: the hosts in question (like 192.168.1.1) are scanning on ports 1433 and 5900, and have scanned into our darknet as well (P in application flags). A little searching on the Internet (www.dshield.org is a good site for intelligence about ports) reveals that these are popular ports for exploits aimed at SQL and VNC (see http://isc.sans.org/diary.php?storyid=1331). We don’t really need to see any more. The timestamps in the summarization are interesting, though. They suggest when local hosts might have been exploited and infected. We now know five local infected hosts and a number of remote IP addresses of botnet servers. Of course, there is much more to do and other intelligence we might want to collect. What exactly is the virus? Where are those hosts? How did the attack arrive? How is the botnet actually controlled (we don’t necessarily know that, as there are no PRIVMSGS in this data set)? And how might we try to clean up the infected hosts? But ourmon has done its job.

Next let’s look at the channels that could be false positives. We look at channel hobo (our “Case Study #3: Bot Client”) and actually discover that channel i-exp has the same remote botnet server IP address. Hobo is an example of a fringe case where it is not completely clear (at first) whether or not this is a botnet. Once you find a botnet server, you should always search through the entire report to look for other instances of that IP address. It is not unusual for a botnet to use different channels for different functions, including launching scan commands or initiating downloads. Hobo (shown in Table 8.5) is a little tricky because there is only one local host with a high work weight. On the other hand, there are 22 PRIVMSG commands.

Table 8.5 Channel Hobo Hosts

TIP

When looking at ourmon data with a Web browser, use your Web browser search function. For example, if using Firefox, use Control + F and Control + G.

When we look at our TCP port report summarization, we discover that 192.168.6.66 has indeed been scanning on ports 139 and 445. Those are classic ports for Microsoft-based exploits. If we aren’t convinced, we might resort to other measures. For example, if your acceptable-use policy lets you peek at data payloads, you might now use ngrep to look at host 192.168.6.66 or host 10.38.4.27 (because PRIVMSGS exist and at least one host appears to be in contact with the server). A command like this could reveal something interesting:

# ngrep host 192.168.6.66 or host 10.38.4.27

TIP

If you are suspicious, watch traffic associated with the server’s IP address. As a result you might see traffic with other infected hosts that you did not yet suspect. If you find a suspicious server IP in the IRC report, search all the way through that report. Note all the channel names where the server’s IP address appears. As a result we could learn that channels hobo and .i-exp have the same server.

As a result of watching the server, you might see an IRC payload like this:

PRIVMSG #.i-exp :[S]CAN WKSSVCE445: Exploiting IP: 192.1.2.4

Oops! You just caught the bad guys in the act. Apparently the results of port 445 scans are being reported, and a new IP on your net might have just been infected.

Using honeypot technologies, we eventually determined that this particular bot is known as toxbot. Symantec calls this one W32.Toxbot.AL. See Symantec’s Web page for more information on this bot (www.symantec.com/security_response/writeup.jsp?docid=2005-100715-4523-99).

Last we have our channel alien. This turns out to be a false positive. Although we won’t show the information here, there wasn’t any useful information in the TCP port report that clearly indicated that this was a scan. No well-known attacked ports were shown. In this case, by sheer dumb luck we know who was using the host in question, so we asked them, and they said, “It’s a game.” Sometimes asking people might be what you need to do. If someone says, “Well, no, I don’t use IRC,” you know you have a security problem. Of course, once again we can watch the IRC channel with tools like ngrep to see if people are talking or game commands are going by, or just maybe there are bot commands such as the ones we saw in our example.

Let’s summarize the analysis techniques we might use to decide if an IRC channel is hostile or not:

1. If the channel has a number of hosts in it attacking a few ports, it is probably automated and evil. Use the IRC evil channel report and associated TCP port report summarizations and 30-second logs to give you more details as necessary. You might need to do some research on whether or not the ports are being scanned planetwide (see dshield.org or isc.sans.org).

2. Watch the IRC channel names over time and learn which IRC channels are used for legitimate traffic. This might help you note new and possibly suspicious channel names if they show up. Of course, users might always have a new chat channel, too.

3. You can always watch the channel with a sniffer like ngrep to determine if the traffic is suspicious.

4. Once you learn about a bad botnet server, you should note its IP address and check the IRC logs carefully to see if that IP address shows up with other hosts. The odds are high that those hosts are infected as well.
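Step 4 is easy to automate once the channel host report has been loaded into memory. The following Python sketch assumes a hypothetical data structure, a dictionary mapping each channel name to the set of IP addresses seen in it; ourmon does not provide this structure directly, so you would have to build it from the report yourself.

```python
def channels_with_host(report, ip):
    """Return the names of every channel in which the given IP appears."""
    return sorted(name for name, hosts in report.items() if ip in hosts)


def co_hosts(report, server_ip):
    """Return every other IP that shares at least one channel with server_ip.

    These are the hosts worth checking for infection once the server
    is known to be a botnet controller.
    """
    found = set()
    for hosts in report.values():
        if server_ip in hosts:
            found.update(hosts)
    found.discard(server_ip)
    return sorted(found)
```

For example, with a report in which 10.38.4.27 appears in two channels, channels_with_host lists both channel names and co_hosts lists the local hosts in either of them, which is how the link between hobo and the second channel was found by hand above.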

TIP

If you are unsure what the IRC TCP work weight means when it is associated with a host, you can look the host IP up via the Web in either the basic TCP port report summarization or the syndump summarization, which will have all local enterprise hosts in it. If you want to get a 30-second sample point of view for the host over the day, search the TCP port report log directory with the grep pattern-matching tool. For example, first we change directory to the desired day of the week in the logging directory and then we use find, xargs, and grep to search the saved 30-second reports for the desired host IP address.

# cd /home/mrourmon/logs/portreport/Fri

# find . | xargs grep 192.168.21.138

The output comes out in timestamp order, so you can watch how the host behaved during the day. For example, here are three slightly simplified log entries where we show the timestamp, IP address, work weight, and port signature fields:

20:03:44_PDT 192.168.21.138 (Ew) 81 [80,9][139,23][445,65]
 
20:04:11_PDT 192.168.21.138 (EW) 95 [80,4][139,25][445,64]
 
20:04:45_PDT 192.168.21.138 (EW) 91 [80,0][139,26][445,67]
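If you want to process such entries with a script rather than by eye, a small parser is enough. The Python sketch below handles only the simplified four-field layout shown above, not the full ourmon log format, and it treats the second number in each port signature pair as that port's percentage share of the host's traffic, which is our assumption.

```python
import re

# timestamp, IP address, (flags), work weight, then zero or more [port,pct] pairs
LINE_RE = re.compile(
    r"^(\S+)\s+(\S+)\s+\((\w*)\)\s+(\d+)\s*((?:\[\d+,\d+\])*)\s*$"
)
PAIR_RE = re.compile(r"\[(\d+),(\d+)\]")


def parse_entry(line):
    """Parse one simplified log entry; return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    timestamp, ip, flags, weight, signature = m.groups()
    ports = [(int(port), int(pct)) for port, pct in PAIR_RE.findall(signature)]
    return {
        "time": timestamp,
        "ip": ip,
        "flags": flags,
        "work_weight": int(weight),
        "ports": ports,  # (port, assumed percentage) pairs
    }


def weights_over_time(lines):
    """Return (time, work_weight) pairs for the lines that parse."""
    entries = (parse_entry(line) for line in lines)
    return [(e["time"], e["work_weight"]) for e in entries if e is not None]
```

Applied to the three entries above, weights_over_time shows the work weight moving from 81 to 95 to 91 within about a minute, with ports 139 and 445 dominating each port signature.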
 

Last, we should point out that a commercial enterprisewide virus platform (like Symantec’s System Center) might have enterprise-level tools that can give you information about whether host X is infected with some known piece of malware. As a result, you might be able to make a correlation between ourmon and the enterprisewide virus system. This can also help you deal with fringe cases such as the host in our alien channel. If you are lucky, your enterprisewide tool might tell you that hosts X, Y, and Z are infected with toxbot or some other bot client. Correlation of a network point of view like ourmon’s and virus detection systems is a new frontier, and we can hope for more in this direction in the future. Of course, you might not be able to make any correlation with virus detection tools if the bot is new and there is not yet an AV signature.

Detecting an IRC Botnet Server

In this section we look at details for “Case Study #4: Bot Server.” Around Thanksgiving Day 2005 we unfortunately had a botnet client on campus with the IP address of 192.168.2.51. If we look at a slightly simplified TCP port report line for this IP address at 11:06 PST, we see the data shown in Table 8.6.

Table 8.6 TCP Report for IP Address 192.168.2.51

From the application flags (IP), this appears to be a system using IRC that is also scanning into our darknet. It is also using the conventional ports of 139 and 445 for its scanning attacks. It’s a botnet client on a channel called f7, as we learned later. If we come back and look at the same data in the next hour, we find the data shown in Table 8.7.

Table 8.7 192.168.2.51, Later in the Day

This host is still scanning but it has now acquired 2881 friends in its 30-second period at 1747 ports, and all 10 port signature buckets are full too (not all shown). In addition, note how the work weight has gone down, but the SA/S value is now nonzero. It appears that the system in question is starting to act like a server. So what happened? The bot client was turned into a bot server. Of course, given the tendency of P2P applications like BitTorrent to have large numbers of peers, maybe it’s an infected bot client with a local user (or the remote hacker?) running BitTorrent. As it turns out, there are other simpler ways to detect a bot server.

So how can you detect a bot server? Some of the simpler ways are:

1. Look at the RRDTOOL IRC network message counts.

2. Look for any IRC channel with too many hosts in it. For example, if you know you have a normal channel called Ubuntu with 20 host IPs in it and all of a sudden you have a channel with 200, 2000, or 200,000 hosts in it, it’s probably a botnet server channel!

3. Look for any IRC server with unusual message counts.

Refer to Figure 8.3 and Figure 6.4 (Case Study #4) in the introductory ourmon chapter. Figure 8.3 gives you normal IRC message counts for the entire PSU network. These really are not very high. Even the automated parts of IRC, like PING and PONG messages, are on the order of 44 pings per 30-second period, or roughly 1.5 per second. Now what does Figure 6.4 tell you? All of a sudden we had 2k PINGS and PONGS a second. Large jumps like this in basic message types are a simple giveaway.
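The comparison between a normal baseline and a sudden jump can be sketched in a few lines of Python. This is only an illustration, not part of ourmon: the function name, the use of a median baseline, and the factor of 10 are all assumptions chosen for the example. The baseline samples are made-up values near the 44-per-sample figure quoted above, and the spike is 2,000 messages per second over a 30-second sample.

```python
from statistics import median

def is_message_spike(baseline_counts, current_count, factor=10.0):
    """Return True if current_count is more than `factor` times the
    median of the baseline samples (all counts are per 30-second sample).
    The median baseline and the default factor are illustrative choices."""
    if not baseline_counts:
        raise ValueError("need at least one baseline sample")
    # Guard against a zero baseline on a very quiet network.
    typical = max(median(baseline_counts), 1.0)
    return current_count > factor * typical

# Made-up baseline samples close to the normal PING count in the text.
normal = [41, 44, 46, 43, 45]

print(is_message_spike(normal, 47))         # an ordinary sample
print(is_message_spike(normal, 2000 * 30))  # 2k messages/second for 30 seconds
```

A median is used here rather than a mean so that one earlier spike in the baseline window does not hide the next one; a site would need to tune the factor to its own traffic.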

Now let’s look at some report data from the IRC daily summarization.

channels sorted by evil factor:

image

channels sorted by max messages (note e/E for possible evil channel):

image

We have shown the beginning of the evil channel and channels by max messages subreports. The channels by max messages subreport stands out in several ways. Note that channel blahblah was the busiest human IRC channel for the day. That channel had only 12 IP hosts in it. On the other hand, channel f appears to have 47,134 hosts in it. The broken-out listing of hosts for that channel was amazing, but we are not going to show it here. There was only one local IP host in it (the bot server). Of course, the message counts for channel f are high, too, especially compared to the human blahblah channel. Analysis of this report showed that channels f, x, and f-exp were all used by the same botnet. They all had the same bot server.

One other really interesting thing to note is that the botnet shows up in the evil channel sort, which at first makes no sense. Given one on-campus host and 47,133 off-campus hosts in channel f, why did 2629 of those off-campus hosts appear to be scanners? We can only speculate here to some extent, but it’s likely those off-campus hosts are trying to connect to the bot server and failing. This could be because the botnet server has exhausted some set of OS resources, so bot client wannabes cannot connect to it. This is one reason that the TCP port report now shows one sample IP destination host. (At that time it did not show a sample IP destination host.) If at the time it had shown such an IP address destination, all the remote scanners would have shown the IP destination of the local botnet server.

In summary, we have seen at least four ways to tell that you have a bot server on campus:

1. Use the RRDTOOL strip charts to look for outlandish message counts.

2. In the channels by max messages subreport, look for channels with abnormal host counts. Thousands are very likely to be abnormal. Hundreds, depending on your site, could be abnormal.

3. In the channels by max messages subreport, bot servers will have abnormal amounts of messages, too.

4. Bot servers might seem to be undergoing scans from remote hosts and thus could appear in the evil channel sort. Don’t depend on this; it is a scalability problem with the bot server system, but it can happen.
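The channel-level checks in this list (items 2 through 4) can be sketched as a small Python function. This is a hedged illustration only: the `ChannelSummary` record, the function name, and every threshold are hypothetical and do not reflect ourmon's actual report format. The host and scanner counts for channel f and the host count for channel blahblah come from the text; the message counts are invented placeholders, because the real figures appear only in the report image.

```python
from dataclasses import dataclass

@dataclass
class ChannelSummary:
    """Hypothetical per-channel record, not ourmon's real report layout."""
    name: str
    hosts: int      # IP hosts seen in the channel
    msgs: int       # total of all four message kinds
    scanners: int   # hosts that look like scanners (high work weight)

def bot_server_hints(channel, host_limit=1000, msg_limit=100_000,
                     scanner_limit=100):
    """Return the list of reasons a channel looks like a bot server channel.
    All limits are illustrative and would need tuning per site."""
    hints = []
    if channel.hosts >= host_limit:
        hints.append("abnormal host count")
    if channel.msgs >= msg_limit:
        hints.append("abnormal message count")
    if channel.scanners >= scanner_limit:
        hints.append("many apparent scanners")
    return hints

channels = [
    # msgs values below are invented placeholders.
    ChannelSummary("f", hosts=47134, msgs=500_000, scanners=2629),
    ChannelSummary("blahblah", hosts=12, msgs=3_000, scanners=0),
]
for ch in channels:
    print(ch.name, bot_server_hints(ch))
```

Returning the reasons rather than a single yes/no keeps the output close to how an analyst reads the report: one hint alone (especially the scanner hint) is weak evidence, whereas several together are hard to explain as normal chat.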

One other curious side effect can be seen by looking at the daily summarization for three sample hosts from that day. Keep in mind that these are summarizations; the numbers were averaged across port reports for the entire day. The first sample is for a client using BitTorrent. The second is for our bot server. The third is for a busy campus Web server. What, if anything, might we learn? (Refer to Chapter 7 for summarization headings.) The interesting part is that the bot server seems to have a higher average for Layer 3 IP destination addresses per sample.

For example, the bot server has an average of 1183 L3D (unique IP destination addresses) versus 106 for the BitTorrent client and 802 for the Web server. This is not a strong result; we have seen BitTorrent clients with counts of over 1000 L3D in 30-second samples. However, it is possible that in general a bot server tends to have more peers than most other hosts. Packet counts do not discriminate very well. The bot server sends and receives 3746 and 2516 packets per sample period, whereas the BitTorrent client sends and receives 5296 and 3373 packets per sample period. Because the bot server is used for control data, it might simply not send as many packets as a P2P host or a Web server. Put another way, although the bot server has thousands of clients, it really isn’t sending very many packets; most of them are control packets (PING, PONG, and the like) that maintain the client-server connection. Host 192.168.2.2 in the following example is using BitTorrent. Host 192.168.2.51 is, of course, our bot server. Host 192.168.2.3 is a busy Web server.

image
image
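One way to make the "many peers, few packets" observation concrete is to divide packets sent by unique destinations. This ratio is our own illustration, not a statistic that ourmon reports, and the function name is hypothetical. The sketch below uses the averages quoted above for the bot server and the BitTorrent client, and assumes that the packet counts and L3D values cover the same sample period.

```python
def packets_per_destination(packets_sent, l3d):
    """Average packets sent per unique IP destination in a sample.
    Assumes packets_sent and l3d cover the same sample period."""
    if l3d <= 0:
        raise ValueError("l3d must be positive")
    return packets_sent / l3d

# Averages quoted in the text for the two hosts.
bot_server = packets_per_destination(3746, 1183)
bittorrent = packets_per_destination(5296, 106)

print(round(bot_server, 1), round(bittorrent, 1))
```

The bot server sends only a handful of packets to each peer, which fits control traffic, while the BitTorrent client sends an order of magnitude more per peer. As with L3D itself, this is a weak hint rather than a reliable test.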

Summary

In this chapter we have looked at the IRC protocol and at ourmon’s statistical IRC reports, which are based on four kinds of basic messages: JOIN, PING, PONG, and PRIVMSG. These messages allow ourmon to extract the channels from IRC and determine which hosts belong to which channels. Ourmon also uses a variation of the TCP work weight used for anomaly detection. The work weight is associated with hosts in a channel, and as a result ourmon can tell you in its evil channel report if a given IRC channel seems to be full of scanning hosts. If so, that channel could be a botnet client mesh. We have also learned to pay attention to channel names so that if new channels pop up, an analyst can investigate them to learn if they are genuine chat channels. We can also use the global RRDTOOL IRC message count strip charts and statistics found primarily in the IRC max message sort to learn if a given local host has become a bot server. From a strict IRC point of view, bot servers stand out compared to ordinary IRC hosts. Hopefully these tools taken together can help an analyst find and cure botnets.

Solutions Fast Track

Understanding the IRC Protocol

The ngrep tool can be used to directly sniff strings on the network.

In IRC, channels are strings. Channels are the fundamental target of data messages.

An IRC network consists of a set of servers and hosts.

Users join a channel and can then send messages to other users. The messages are distributed by the servers to clients interested in the channel.

Ourmon looks for four fundamental IRC messages: PING and PONG, used by servers to tell if clients still exist; JOIN, used to join channels; and PRIVMSG, used to send data to channels.

Ourmon’s RRDTOOL Statistics and IRC Reports

All IRC statistics are found on the irc.html page.

The IRC data has three parts: RRDTOOL graphics that show global network IRC message counts, an hourly summarization (rolled over at midnight to the previous day), and a 30-second report.

The IRC RRDTOOL graph shows message counts for PING, PONG, JOIN, and PRIVMSG IRC messages.

The IRC ASCII report shows global, per-channel, and per-host statistics.

The most important parts of the ASCII report are the two channel sorts at the top (the evil channel sort and the max message sort) and the breakdown of each channel with per-host statistics.

The evil channel sort shows IRC channels sorted by the number of scanning hosts (wormy hosts) in the channel.

The max message sort shows IRC channels sorted by the total number of all four kinds of IRC messages.

The per-channel host statistics show the IP addresses of hosts in an IRC channel as well as other data, including the maximum TCP work weight seen for any host in the channel.

The maxworm field in the per-host statistics is really the TCP work weight, as discussed in the previous chapter.

Detecting an IRC Client Botnet

An IRC channel with more than a few (say, two) clients with high maxworm (work weight) values could be a botnet channel.

If there are only a few hosts with high work weights, search the TCP port report logs to see if those hosts have been scanning.

Note that nonscanning hosts in an “evil channel” are likely remote botnet servers. It is a good idea to watch those hosts’ behavior with a sniffer.

Detecting an IRC Botnet Server

High and anomalous counts in the RRDTOOL IRC statistics graph could indicate the presence of a local botnet server.

Botnet servers typically have unusual host counts.

Botnet servers could have unusual counts for remote IP destinations (L3D).

Botnet servers might appear in the evil channel sort. This is due to connection failures by remote exploited hosts.

Frequently Asked Questions

The following Frequently Asked Questions, answered by the authors of this book, are designed to both measure your understanding of the concepts presented in this chapter and to assist you with real-life implementation of these concepts. To have your questions about this chapter answered by the author, browse to www.syngress.com/solutions and click on the “Ask the Author” form.

Q: Why is the measurement for the TCP work weight weaker here than in the TCP port report? For example, it does not take into account some number of SYNS per second as is the case with the normal work weight.

A: The reason is that we are looking at things from a parallel point of view. We want to see if there are many scanning hosts in a channel. So, for example, if you see a channel with 10 hosts and nine hosts having a summarized work weight of 99, you can take that as meaning the entire channel is infected. On the other hand, one host out of 10 scanning might not mean much. You can go and examine the TCP port reports, either individual logged versions or the daily summarization, and see if you can learn anything more. If you can’t find the host, that means the host had a trivial work weight problem. You can probably ignore it.
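The "parallel point of view" in this answer amounts to asking what fraction of a channel's hosts have a high work weight. A minimal Python sketch follows; the function name and the threshold of 50 are assumptions made for illustration and are not ourmon's actual cutoff.

```python
def scanning_fraction(work_weights, threshold=50):
    """Fraction of hosts in a channel whose summarized work weight is at
    or above `threshold` (an illustrative cutoff, not ourmon's)."""
    if not work_weights:
        return 0.0
    high = sum(1 for w in work_weights if w >= threshold)
    return high / len(work_weights)

# Nine of ten hosts with a work weight of 99: the channel looks infected.
print(scanning_fraction([99] * 9 + [3]))

# One of ten hosts scanning: worth a look in the TCP port reports,
# but not much evidence about the channel as a whole.
print(scanning_fraction([99] + [0] * 9))
```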

Q: In the section on detecting IRC bot servers, why did you mention the L3D statistic?

A: As mentioned in the previous chapter, L3D means the number of unique IP destinations associated with a host during ourmon’s 30-second sample period. This statistic is a Layer 3 (IP layer) statistic and it could never be hidden with encryption.

Q: I tried to use ngrep with an IRC channel name and it didn’t work. Why?

A: Besides obvious problems, such as the channel having suddenly gone quiet, you need to know that IRC channel names are case-insensitive. So, for example, if the channel is LSASS445, use the -i parameter to do case-insensitive pattern matching. Here we are also looking only for PRIVMSG messages sent to and from a particular host. You could try something like the following:

# ngrep -q -i "PRIVMSG.*#lsass445" tcp and host 192.168.2.3
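To see why the case-insensitive flag matters, the same pattern can be tried offline in Python. This sketch only mimics the matching step against a byte string; it does not capture packets, and the sample payloads are invented for the example.

```python
import re

# Same expression as the ngrep pattern, matched without regard to case.
CHANNEL_MSG = re.compile(rb"PRIVMSG.*#lsass445", re.IGNORECASE)

def mentions_channel(payload: bytes) -> bool:
    """Return True if the payload holds a PRIVMSG naming the channel."""
    return CHANNEL_MSG.search(payload) is not None

# Invented payloads: the channel name arrives in upper case.
print(mentions_channel(b"PRIVMSG #LSASS445 :hello\r\n"))
print(mentions_channel(b"JOIN #LSASS445\r\n"))
```

Without `re.IGNORECASE` the first payload would not match, which is the same failure you would see from ngrep without -i.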

Q: A 30-second report for IRC exists, but you don’t mention it much here. Why?

A: It might be of some use for debugging or if there is a very active botnet, but in general IRC is a slow communications medium. We have to look for patterns across hours or days.

Q: What happens if the hackers switch to port 666 and use some other protocol for command and control, say a new protocol obscured with ROT13 (a variation of the Caesar cipher that shifts each letter 13 places)?

A: This is why we discussed anomaly detection in the previous chapter. Sooner or later they will attack; otherwise owning a box is useless. When they do, the anomaly detection meters will go off. Then you could choose to watch the attacked box with a sniffer and see who is talking to it. If two boxes behave badly, and they are both talking to an outsider, then watch the outsider. Forensics on the attacked host could indicate an IP address for an attacker. These clues might provide you with an address for a bot server. All we have done with the IRC module is automate this task.
