Chapter 13. Reading BIND Debugging Output

“O Tiger-lily!” said Alice, addressing herself to one that was waving gracefully about in the wind, “I wish you could talk!”

“We can talk,” said the Tiger-lily, “when there’s anybody worth talking to.”

One of the tools in your troubleshooting toolbox is the nameserver’s debugging output. As long as your nameserver has been compiled with DEBUG defined, you can get query-by-query reports of its internal operation. The messages you get are often quite cryptic; they were meant for someone who has the source code to follow. We’ll explain some of the debugging output in this chapter. Our goal is to cover just enough for you to follow what the nameserver is doing; we aren’t trying to supply an exhaustive compilation of debugging messages.

As you read through the explanations here, think back to material covered in earlier chapters. Seeing this information again, in another context, should help you understand more fully how a nameserver works.

Debugging Levels

The amount of information the nameserver provides depends on the debugging level. The lower the debugging level, the less information you get. Higher debugging levels give you more information, but they also fill up your disk faster. After you’ve read a lot of debugging output, you’ll develop a feel for how much information you’ll need to solve any particular problem. Of course, if you can easily recreate the problem, you can start at level 1 and increase the debugging level until you have enough information. For the most basic problem—why a name can’t be looked up—level 1 will often suffice, so you should start there.

What Information Is at Each Level?

Here’s a list of the information that each debugging level produces for BIND 8 and BIND 9 nameservers. The debugging information is cumulative; for example, level 2 includes all of level 1’s debugging information. The data is divided into the following basic areas: starting up, updating the database, processing queries, and maintaining zones. We won’t cover updating the nameserver’s internal database; problems almost always occur elsewhere. However, what the nameserver adds or deletes from its internal database can be a problem, as you’ll see in Chapter 14.

BIND 8 and 9 have a whopping 99 debug levels, but most of the debugging messages are logged at just a few of those levels. We’ll look at those now.

BIND 8 debugging levels

Level 1

The information at this level is necessarily brief. Nameservers can process lots of queries, which can create lots of debugging output. Since the output is condensed, you can collect data over long periods. Use this debugging level for basic startup information and for watching query transactions. You’ll see some errors logged at this level, including syntax errors and DNS packet-formatting errors. This level also shows referrals.

Level 2

Level 2 provides lots of useful stuff: it lists the IP addresses of remote nameservers used during a lookup, along with their round-trip time values; it calls out bad responses; and it tags a response as to which type of query it is answering—a SYSTEM (sysquery) or a USER query. When you are tracking down a problem with a slave server loading a zone, this level shows you the zone values—serial number, refresh time, retry time, expire time, and time left—as the slave checks if it is up to date with its master.

Level 3

Level 3 debugging becomes much more verbose because it generates lots of messages about updating the nameserver database. Make sure you have enough disk space if you are going to collect debugging output at level 3 or above. At level 3, you also see duplicate queries called out, system queries generated (sysquery), the names of the remote nameservers used during a lookup, and the number of addresses found for each server.

Level 4

Use level 4 debugging when you want to see the query and response packets received by the nameserver. This level also shows the credibility level for cached data.

Level 5

There are a variety of messages at level 5, but none of them is particularly useful for general debugging. This level includes some error messages—for example, when a malloc() fails or when the nameserver gives up on a query.

Level 6

Level 6 shows the response sent to the original query.

Level 7

Level 7 shows a few configuration and parsing messages.

Level 8

There is no significant debugging information at this level.

Level 9

There is no significant debugging information at this level.

Level 10

Use level 10 debugging when you want to see the query and response packets sent by the nameserver. The format of these packets is the same format used in level 4. You won’t use this level very often because you can see the nameserver response packet with nslookup or dig.

Level 11

There are only a couple of debugging messages at and above this level, and they are in seldom-traversed code.

BIND 9 debugging levels

Level 1

Level 1 shows you basic nameserver operation: zone loading, maintenance (including SOA queries, zone transfers and zone expiration, and cache cleaning), NOTIFY messages, and high-level tasks dispatched (such as looking up addresses for a nameserver).

Level 2

Level 2 logs multicast requests.

Level 3

Level 3 shows you low-level task creation and operation. Unfortunately, most of these tasks don’t have particularly descriptive names (requestmgr_detach?), and the arguments they report are awfully cryptic. Level 3 also shows you journal activity, such as when the nameserver writes a record of a zone change to the zone’s journal or when the nameserver applies a journal to a zone at startup. Operation of the DNSSEC validator and checking of TSIG signatures also come in at debug level 3.

Level 4

Level 4 logs when a master nameserver falls back to using AXFR because the transferred zone’s journal isn’t available.

Level 5

Level 5 logs which view was used while satisfying a particular request.

Level 6

A handful of outbound zone transfer messages are logged at level 6, including checks of the query that initiated the transfer.

Level 7

There are only a couple of new debugging messages at this level: logging of journal adds and deletes, and a count of how many bytes were returned by a zone transfer.

Level 8

Many dynamic update messages are logged at level 8: prerequisite checks, writing journal entries, and rollbacks. Several low-level zone transfer messages also appear here, including a log of resource records sent in a zone transfer.

Level 10

Level 10 reports a couple of messages about zone timer activity.

Level 20

Level 20 reports an update to a zone’s refresh timer.

Level 90

Low-level operation of the BIND 9 task dispatcher is logged at level 90.

With BIND 8 and BIND 9, you can configure the nameserver to print out the debug level with the debug message. Just turn on the logging option print-severity, as explained in the section "Logging" in Chapter 7.

Keep in mind that this is debugging information; it was used by the authors of BIND to debug the code, so it is not as readable as you might like. You can use it to figure out why the nameserver isn’t doing what you think it should be or just to learn how the nameserver operates. However, don’t expect nicely designed, carefully formatted output.

Turning On Debugging

Nameserver debugging can be started either from the command line or with control messages. If you need to see the startup information to diagnose your current problem, you’ll have to use the command-line option. If you want to start debugging on a nameserver that is already running, or if you want to turn off debugging, you’ll have to use controls. The nameserver writes its debugging output to named.run in the nameserver’s working directory.

Debugging Command-Line Option

When troubleshooting, you sometimes need to see the sortlist, know which interface a file descriptor is bound to, or find out where in the initialization stage the nameserver was when it exited (if the syslog error message wasn’t clear enough). To see this kind of debugging information, you’ll have to start debugging with a command-line option; by the time you send a control message, it will be too late. The command-line option for debugging is —d level.

Changing the Debugging Level with Control Messages

If you don’t need to see the nameserver’s initialization, start your nameserver without the debugging command-line option. You can later turn debugging on and off using rndc (or, with BIND 8, ndc) to send the appropriate control message to the nameserver process. Here, we set debugging to level 3, then turn off debugging:

#rndc trace 3
# rndc notrace

And, as you might expect, if you turn on debugging from the command line, you can still use rndc to change the nameserver’s debug level.

Reading Debugging Output

We’ll cover five examples of debugging output. The first example shows the nameserver starting up. The next two examples show successful name lookups. The fourth example shows a slave nameserver keeping its zone up to date. And in the last example, we switch from showing you nameserver behavior to showing you resolver behavior: the resolver search algorithm. After each trace (except the last one), we killed the nameserver and started it again so that each trace started with a fresh, nearly empty cache.

You might wonder why we’ve chosen to show normal nameserver behavior for all our examples; after all, this chapter is about debugging. We’re showing you normal behavior because you have to know what normal operation is before you can track down abnormal operation. Another reason is to help you understand the concepts (retransmissions, roundtrip times, etc.) we described in earlier chapters.

Nameserver Startup (BIND 8, Debug Level 1)

We’ll start the debugging examples by watching the nameserver initialize. This first nameserver is a BIND 8 nameserver. We used -d 1 on the command line, and this is the named.run output that resulted:

1 Debug level 1
2 Version = named 8.2.3-T7B Mon Aug 21 19:21:21 MDT 2000
3 [email protected]:/usr/local/src/bind-8.2.3-T7B/src/bin/named
4 conffile = ./named.conf
5 starting.  named 8.2.3-T7B Mon Aug 21 19:21:21 MDT 2000
6 [email protected]:/usr/local/src/bind-8.2.3-T7B/src/bin/named
7 ns_init(./named.conf)
8 Adding 64 template zones
9 update_zone_info('0.0.127.in-addr.arpa', 1)
10 source = db.127.0.0
11 purge_zone(0.0.127.in-addr.arpa,1)
12 reloading zone
13 db_load(db.127.0.0, 0.0.127.in-addr.arpa, 1, Nil, Normal)
14 purge_zone(0.0.127.in-addr.arpa,1)
15 master zone "0.0.127.in-addr.arpa" (IN) loaded (serial 2000091500)
16 zone[1] type 1: '0.0.127.in-addr.arpa' z_time 0, z_refresh 0
17 update_zone_info('.', 3)
18 source = db.cache
19 reloading hint zone
20 db_load(db.cache, , 2, Nil, Normal)
21 purge_zone(,1)
22 hint zone "" (IN) loaded (serial 0)
23 zone[2] type 3: '.' z_time 0, z_refresh 0
24 update_pid_file(  )
25 getnetconf(generation 969052965)
26 getnetconf: considering lo [127.0.0.1]
27 ifp->addr [127.0.0.1].53 d_dfd 20
28 evSelectFD(ctx 0x80d8148, fd 20, mask 0x1, func 0x805e710, uap 0x40114344)
29 evSelectFD(ctx 0x80d8148, fd 21, mask 0x1, func 0x8089540, uap 0x4011b0e8)
30 listening on [127.0.0.1].53 (lo)
31 getnetconf: considering eth0 [192.249.249.3]
32 ifp->addr [192.249.249.3].53 d_dfd 22
33 evSelectFD(ctx 0x80d8148, fd 22, mask 0x1, func 0x805e710, uap 0x401143b0)
34 evSelectFD(ctx 0x80d8148, fd 23, mask 0x1, func 0x8089540, uap 0x4011b104)
35 listening on [206.168.194.122].53 (eth0)
36 fwd ds 5 addr [0.0.0.0].1085
37 Forwarding source address is [0.0.0.0].1085
38 evSelectFD(ctx 0x80d8148, fd 5, mask 0x1, func 0x805e710, uap 0)
39 evSetTimer(ctx 0x80d8148, func 0x807cbe8, uap 0x40116158, due
969052990.812648000, inter 0.000000000)
40 exit ns_init(  )
41 update_pid_file(  )
42 Ready to answer queries.
43 prime_cache: priming = 0, root = 0
44 evSetTimer(ctx 0x80d8148, func 0x805bc30, uap 0, due
969052969.000000000, inter 0.000000000)
45 sysquery: send -> [192.33.4.12].53 dfd=5 nsid=32211 id=0 retry=969052969
46 datagram from [192.33.4.12].53, fd 5, len 436
47 13 root servers

We added the line numbers to the debugging output; you won’t see them in yours. Lines 2–6 give the version of BIND you are running and the name of the configuration file. Version 8.2.3-T 7B was released by ISC (Internet Software Consortium) in August 2000. We used the configuration file in the current directory, ./named.conf, for this run.

Lines 7–23 show BIND reading the configuration file and the zone datafiles. This nameserver is a caching-only nameserver—the only files read are db.127.0.0 (lines 9–16) and db.cache (lines 17–23). Line 9 shows the zone being updated (0.0.127.inaddr.arpa), and line 10 shows the file containing the zone data (db.127.0.0). Line 11 indicates that any old data for the zone is purged before new data is added. Line 12 says the zone is being reloaded, even though the zone is actually being loaded for the first time. The zone data is loaded during lines 13–15. On lines 16 and 23, z_time is the time to check when this zone is up to date; z_refresh is the zone refresh time. These values matter only if the nameserver is a slave for the zone.

Lines 25–39 show the initialization of file descriptors. (In this case, they’re really socket descriptors.) File descriptors 20 and 21 (lines 27–29) are bound to 127.0.0.1, the loopback address. Descriptor 20 is a datagram socket, and descriptor 21 is a stream socket. File descriptors 22 and 23 (lines 32–34) are bound to the 192.249.249.3 interface. Each interface address was considered and used; they would not be used if the interface had not been initialized or if the address were already in the list. File descriptor 5 (lines 36–39) is bound to 0.0.0.0, the wildcard address. Most network daemons use only one socket bound to the wildcard address, not sockets bound to individual interfaces. The wildcard address picks up packets sent to any interface on the host. Let’s digress for a moment to explain why named uses both a socket bound to the wildcard address and sockets bound to specific interfaces.

When named receives a request from an application or from another nameserver, it receives the request on one of the sockets bound to a specific interface. If named did not have sockets bound to specific interfaces, it would receive the requests on the socket bound to the wildcard address. When named sends back a response, it uses the same socket descriptor that the request came in on. Why does named do this? When responses are sent out via the socket bound to the wildcard address, the kernel fills in the sender’s address with the address of the interface the response was actually sent out on. This address may or may not be the same address that the request was sent to. When responses are sent out via the socket bound to a specific address, the kernel fills in the sender’s address with that specific address—the same address the request was sent to. If a nameserver gets a response from an IP address it doesn’t know about, the response is tagged a martian and discarded. named tries to avoid causing martian responses by sending its responses on descriptors bound to specific interfaces, so the address it replies from is the same address the request was sent to. However, when named sends out queries, it uses the wildcard descriptor because there is no need to use a specific IP address.

Lines 43–47 show the nameserver sending out a system query to find which nameservers are currently serving the root zone. This is known as priming the cache. The first server queried sent a response that included 13 nameservers.

The nameserver is now initialized and ready to answer queries.

Nameserver Startup (BIND 9, Debug Level 1)

Here’s what a BIND 9 nameserver looks like starting up. When you start the nameserver, BIND 9 logs debug messages to named.run in your shell’s current directory. Then, after BIND 9 has read the configuration file and switches to the directory in which your database files are stored, it logs debug messages to named.run in that directory. Here is the named.run from your shell’s current directory:

1 26-Jun-2005 15:34:23.136 starting BIND 9.3.2 -d1
2 26-Jun-2005 15:34:23.232 loading configuration from '/etc/named.conf'
3 26-Jun-2005 15:34:23.247 no IPv6 interfaces found
4 26-Jun-2005 15:34:23.247 listening on IPv4 interface lo, 127.0.0.1#53
5 26-Jun-2005 15:34:23.248 listening on IPv4 interface eth0, 192.249.249.3#53
6 26-Jun-2005 15:34:23.255 command channel listening on 127.0.0.1#953

Lines 1 and 2 show the version of BIND we’re running (9.3.2) and the configuration file it’s reading.

Line 3 reminds us that our host doesn’t have any IP Version 6 network interfaces; if it did, BIND 9 could listen on those interfaces for queries.

Lines 4 and 5 show the nameserver listening on two network interfaces: lo, the loopback interface, and eth0, the Ethernet interface. BIND 9 displays the address and port in the format address#port, unlike BIND 8, which uses [address].port. Line 6 shows named listening on port 953, the default port, for control messages.

At this point, BIND 9 reads the configuration file and switches to the directory with your database files if your configuration file has an options statement that specifies a new directory, like the one shown here:

options {
        directory "/var/named";
};

Here is the named.run from /var/named:

1 26-Jun-2005 15:34:23.255 now using logging configuration from config file
2 26-Jun-2005 15:34:23.256 load_configuration: success
3 26-Jun-2005 15:34:23.256 zone 0.0.127.IN-ADDR.ARPA/IN: starting load
4 26-Jun-2005 15:34:23.258 zone 0.0.127.IN-ADDR.ARPA/IN: loaded
5 26-Jun-2005 15:34:23.258 zone 0.0.127.IN-ADDR.ARPA/IN: journal
rollforward completed successfully: no journal
6 26-Jun-2005 15:34:23.258 zone 0.0.127.IN-ADDR.ARPA/IN: loaded serial 3
7 26-Jun-2005 15:34:23.258 zone authors.bind/CH: starting load
8 26-Jun-2005 15:34:23.259 zone authors.bind/CH: loaded
9 26-Jun-2005 15:34:23.259 zone hostname.bind/CH: starting load
10 26-Jun-2005 15:34:23.259 zone hostname.bind/CH: loaded
11 26-Jun-2005 15:34:23.259 zone version.bind/CH: starting load
12 26-Jun-2005 15:34:23.259 zone version.bind/CH: loaded
13 26-Jun-2005 15:34:23.260 zone id.server/CH: starting load
14 26-Jun-2005 15:34:23.260 zone id.server/CH: loaded
15 26-Jun-2005 15:34:23.260 dns_zone_maintenance: zone 0.0.127.IN-ADDR.ARPA/IN: enter
16 26-Jun-2005 15:34:23.260 dns_zone_maintenance: zone version.bind/CH: enter
17 26-Jun-2005 15:34:23.260 dns_zone_maintenance: zone hostname.bind/CH: enter
18 26-Jun-2005 15:34:23.260 dns_zone_maintenance: zone authors.bind/CH: enter
19 26-Jun-2005 15:34:23.260 dns_zone_maintenance: zone id.server/CH: enter
20 26-Jun-2005 15:34:23.263 running

Lines 3–6 show the nameserver loading 0.0.127.in-addr.arpa. The starting and loaded messages are self-explanatory. The no journal message indicates that no journal file was present. (A journal file, described in Chapter 10, is a record of dynamic updates the nameserver received for the zone.)

Lines 7–14 show the nameserver loading the built-in CHAOSNET zones authors.bind, hostname.bind, version.bind, and id.server.

Finally, lines 15–19 show the nameserver doing maintenance on its zones. Zone maintenance is the process that schedules periodic tasks, such as SOA queries for slave and stub zones or NOTIFY messages.

If you are curious about what is in the built-in CHOASNET zones, you can query your nameserver for the zone data, as we did here with a dig query for authors.bind, record type any, class CHAOSNET:

#dig @wormhole.movie.edu authors.bind any c
; <<>> DiG 9.3.2 <<>> @wormhole.movie.edu authors.bind any ch
; (1 server found)
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 6822
;; flags: qr aa rd; QUERY: 1, ANSWER: 14, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;authors.bind.            CH    ANY

;; ANSWER SECTION:
authors.bind.        0    CH    TXT    "Mark Andrews"
authors.bind.        0    CH    TXT    "James Brister"
authors.bind.        0    CH    TXT    "Ben Cottrell"
authors.bind.        0    CH    TXT    "Michael Graff"
authors.bind.        0    CH    TXT    "Andreas Gustafsson"
authors.bind.        0    CH    TXT    "Bob Halley"
authors.bind.        0    CH    TXT    "David Lawrence"
authors.bind.        0    CH    TXT    "Danny Mayer"
authors.bind.        0    CH    TXT    "Damien Neil"
authors.bind.        0    CH    TXT    "Matt Nelson"
authors.bind.        0    CH    TXT    "Michael Sawyer"
authors.bind.        0    CH    TXT    "Brian Wellington"
authors.bind.        86400    CH    SOA    authors.bind.
hostmaster.authors.bind. 0 28800 7200 604800 86400
authors.bind.        0    CH    NS    authors.bind.

;; Query time: 2 msec
;; SERVER: wormhole.movie.edu#53(192.249.249.1)
;; WHEN: Sun Jun 26 16:30:28 2005
;; MSG SIZE  rcvd: 402

A Successful Lookup (BIND 8, Debug Level 1)

Suppose you want to watch the nameserver look up a name. Your nameserver wasn’t started with debugging. Use ndc once to turn on debugging, look up the name, then use it again to turn off debugging, like this:

#ndc trace 1
# /etc/ping galt.cs.purdue.edu.
# ndc notrace

We did this; here’s the resulting named.run file:

datagram from [192.249.249.3].1162, fd 20, len 36

req: nlookup(galt.cs.purdue.edu) id 29574 type=1 class=1
req: missed 'galt.cs.purdue.edu' as '' (cname=0)
forw: forw -> [198.41.0.10].53 ds=4 nsid=40070 id=29574 2ms retry 4sec
datagram from [198.41.0.10].53, fd 4, len 343

;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 40070
;; flags: qr; QUERY: 1, ANSWER: 0, AUTHORITY: 9, ADDITIONAL: 9
;;              galt.cs.purdue.edu, type = A, class = IN
EDU.                        6D IN NS    A.ROOT-SERVERS.NET.
EDU.                        6D IN NS    H.ROOT-SERVERS.NET.
EDU.                        6D IN NS    B.ROOT-SERVERS.NET.
EDU.                        6D IN NS    C.ROOT-SERVERS.NET.
EDU.                        6D IN NS    D.ROOT-SERVERS.NET.
EDU.                        6D IN NS    E.ROOT-SERVERS.NET.
EDU.                        6D IN NS    I.ROOT-SERVERS.NET.
EDU.                        6D IN NS    F.ROOT-SERVERS.NET.
EDU.                        6D IN NS    G.ROOT-SERVERS.NET.
A.ROOT-SERVERS.NET.             5w6d16h IN A    198.41.0.4
H.ROOT-SERVERS.NET.             5w6d16h IN A    128.63.2.53
B.ROOT-SERVERS.NET.             5w6d16h IN A    128.9.0.107
C.ROOT-SERVERS.NET.             5w6d16h IN A    192.33.4.12
D.ROOT-SERVERS.NET.             5w6d16h IN A    128.8.10.90
E.ROOT-SERVERS.NET.             5w6d16h IN A    192.203.230.10
I.ROOT-SERVERS.NET.             5w6d16h IN A    192.36.148.17
F.ROOT-SERVERS.NET.             5w6d16h IN A    192.5.5.241
G.ROOT-SERVERS.NET.             5w6d16h IN A    192.112.36.4
resp: nlookup(galt.cs.purdue.edu) qtype=1
resp: found 'galt.cs.purdue.edu' as 'edu' (cname=0)
resp: forw -> [192.36.148.17].53 ds=4 nsid=40071 id=29574 1ms
datagram from [192.36.148.17].53, fd 4, len 202

;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 40071
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 4, ADDITIONAL: 4
;;   galt.cs.purdue.edu, type = A, class = IN
PURDUE.EDU.                 2D IN NS    NS.PURDUE.EDU.
PURDUE.EDU.                 2D IN NS    MOE.RICE.EDU.
PURDUE.EDU.                 2D IN NS    PENDRAGON.CS.PURDUE.EDU.
PURDUE.EDU.                 2D IN NS    HARBOR.ECN.PURDUE.EDU.
NS.PURDUE.EDU.              2D IN A     128.210.11.5
MOE.RICE.EDU.               2D IN A     128.42.5.4
PENDRAGON.CS.PURDUE.EDU.    2D IN A  128.10.2.5
HARBOR.ECN.PURDUE.EDU.      2D IN A     128.46.199.76
resp: nlookup(galt.cs.purdue.edu) qtype=1
resp: found 'galt.cs.purdue.edu' as 'cs.purdue.edu' (cname=0)
resp: forw -> [128.46.199.76].53 ds=4 nsid=40072 id=29574 8ms
datagram from [128.46.199.76].53, fd 4, len 234

send_msg -> [192.249.249.3].1162 (UDP 20) id=29574
Debug off

First, notice that IP addresses, not domain names, are logged—odd for a nameserver, don’t you think? It’s really not that odd, though. If you are trying to debug a problem with looking up names, you don’t want the nameserver looking up additional names just to make the debugging output more readable; the extra queries would interfere with the debugging. None of the debugging levels translate IP addresses into domain names. You have to use a tool (like the one we provide later) to convert them for you.

Let’s go through this debugging output line by line. This detailed approach is important if you want to understand what each line means. If you turn on debugging, you’re probably trying to find out why some name can’t be looked up, and you’re going to have to figure out what the trace means.

datagram from [192.249.249.3].1162, fd 20, len 36

A datagram came from the host with IP address 192.249.249.3 (toystory.movie.edu ). You may see the datagram come from 127.0.0.1 if the sender is on the same host as the nameserver. The sending application used port 1162. The nameserver received the datagram on file descriptor (fd) 20. The startup debugging output, like the one shown earlier, tells you which interface file descriptor 20 is bound to. The length (len) of the datagram was 36 bytes.

req: nlookup(galt.cs.purdue.edu) id 29574 type=1 class=1

Since the next debugging line starts with req, we know that the datagram was a request. The name looked up in the request was galt.cs.purdue.edu. The request ID is 29574. The type=1 means the request is for address information. The class=1 means the class is IN. You can find a complete list of query types and classes in the header file /usr/ include/arpa/nameser.h.

req: missed 'galt.cs.purdue.edu' as '' (cname=0)

The nameserver looked up the requested name and didn’t find it. It then tried to find a remote nameserver to ask; none was found until the root zone (the empty quotes). The cname=0 means the nameserver didn’t encounter a CNAME record. If it does see a CNAME record, the canonical name is looked up instead of the original name, and cname will be nonzero.

forw: forw -> [198.41.0.10].53 ds=4 nsid=40070 id=29574 2ms retry 4sec

The query was forwarded to the nameserver (port 53) on host 198.41.0.10 (j.root-servers.net). The nameserver used file descriptor 4 (which is bound to the wildcard address) to send the query. The nameserver tagged this query with ID number 40070 (nsid=40070) so that it could match the response to the original question. The application used ID number 29574 (id=29574), as you saw on the nlookup line. The nameserver will wait four seconds before trying the next nameserver.

datagram from [198.41.0.10].53, fd 4, len 343

The nameserver on j.root-servers.net responded. Since the response was a delegation, it is printed in full in the debug log.

resp: nlookup(galt.cs.purdue.edu) qtype=1

After the information in the response message is cached, the name is looked up again. As mentioned earlier, qtype=1 means that the nameserver is looking for address information.

resp: found 'galt.cs.purdue.edu' as 'edu' (cname=0)
resp: forw -> [192.36.148.17].53 ds=4 nsid=40071 id=29574 1ms
datagram from [192.36.148.17].53, fd 4, len 202

The root nameserver responds with a delegation to the edu servers. The same query is sent to 192.36.148.17 (i.root-servers.net), one of the edu servers. i.root-servers.net responds with information about the purdue.edu servers.

resp: found 'galt.cs.purdue.edu' as 'cs.purdue.edu' (cname=0)

This time there is some information at the cs.purdue.edu level.

resp: forw -> [128.46.199.76].53 ds=4 nsid=40072 id=29574 8ms

A query is sent to the nameserver on 128.46.199.76 (harbor.ecn.purdue.edu). This time the nameserver ID is 40072.

datagram from [128.46.199.76].53, fd 4, len 234

The nameserver on harbor.ecn.purdue.edu responded. We have to look at what happens next to figure out the contents of this response.

send_msg -> [192.249.249.3].1162 (UDP 20) id=29574

The last response must have contained the address requested, since the nameserver responded to the application (which used port 1162, if you look back at the original query). The response was in a UDP packet (as opposed to a TCP connection), and it used file descriptor 20.

This nameserver was “quiet” when we did this trace; it wasn’t handling other queries at the same time. When you do a trace on an active nameserver, though, you won’t be so lucky. You’ll have to sift through the output and patch together those pieces that pertain to the lookup in which you are interested. It’s not that hard, though. Start up your favorite editor, search for the nlookup line with the name you looked up, then trace the entries with the same nsid. You’ll see how to follow the nsid in the next BIND 8 trace.

A Successful Lookup (BIND 9, Debug Level 1)

We’ll show you the debugging output produced by looking up the same domain name on a BIND 9 nameserver at debug level 1, but it’s almost laughably short. Still, as we said, it’s important to know what debugging output looks like under correct operation. Anyway, here goes:

1 28-Jun-2005 21:14:20.554 createfetch: galt.cs.purdue.edu A
2 28-Jun-2005 21:14:20.568 createfetch: . NS

This does in fact show you the query being processed. You will get better data if you turn on query logging to see the actual queries. After adding the following lines to /etc/named.conf:

logging {
 category queries {
     default_debug;
 };
};

you get the following debug output:

1 28-Jun-2005 21:16:36.080 client 192.249.249.3#1090: query: galt.cs.purdue.edu IN A +
2 28-Jun-2005 21:16:36.081 createfetch: galt.cs.purdue.edu A
3 28-Jun-2005 21:16:36.081 createfetch: . NS

The first line tells us that a client at IP address 192.249.249.3 (that is, the local host), running on port 1090, sent us a query for galt.cs.purdue.edu’s address. The + at the end of the line indicates that recursion was requested. The other lines are logged by the portion of the nameserver that does name resolution to let us know what it’s up to.

A Successful Lookup with Retransmissions (BIND 8, Debug Level 1)

Not all lookups are as “clean” as the last one: sometimes the query must be retransmitted. The user doesn’t see any difference as long as the lookup succeeds, although a query involving retransmissions will take longer. Following is a trace where there are retransmissions. We converted the IP addresses to domain names after the trace was done. Notice how much easier it is to read with names!

1 Debug turned ON, Level 1
2
3 datagram from toystory.movie.edu port 3397, fd 20, len 35
4 req: nlookup(ucunix.san.uc.edu) id 1 type=1 class=1
5 req: found 'ucunix.san.uc.edu' as 'edu' (cname=0)
6 forw: forw -> i.root-servers.net port 53  ds=4 nsid=2 id=1 0ms retry 4 sec
7
8 datagram from i.root-servers.net port 53, fd 4, len 240
<delegation lines removed>
9 resp: nlookup(ucunix.san.uc.edu) qtype=1
10 resp: found 'ucunix.san.uc.edu' as 'san.uc.edu' (cname=0)
11 resp: forw -> uceng.uc.edu port 53 ds=4 nsid=3 id=1 0ms
12 resend(addr=1 n=0) - > ucbeh.san.uc.edu port 53 ds=4 nsid=3 id=1 0ms
13
14 datagram from toystory.movie.edu port 3397, fd 20, len 35
15 req: nlookup(ucunix.san.uc.edu) id 1 type=1 class=1
16 req: found 'ucunix.san.uc.edu' as 'san.uc.edu' (cname=0)
17 resend(addr=2 n=0) - > uccba.uc.edu port 53 ds=4 nsid=3 id=1 0ms
18 resend(addr=3 n=0) - > mail.cis.ohio-state.edu port 53 ds=4 nsid=3 id=1 0ms
19
20 datagram from mail.cis.ohio-state.edu port 53, fd 4, len 51
21 send_msg -> toystory.movie.edu (UDP 20 3397) id=1

This trace starts out the same way as the last trace (lines 1–11): the nameserver receives a query for ucunix.san.uc.edu, sends the query to an edu nameserver (i.root-servers.net), receives a response that includes a list of nameservers for uc.edu, and sends the query to one of the uc.edu nameservers (uceng.uc.edu).

What’s new in this trace are the resend lines (lines 12, 17, and 18). The forw on line 11 counts as resend(addr=0 n=0); we CS dweebs always start counting at zero. Since uceng.uc.edu didn’t respond, the nameserver goes on to try ucbeh.san.uc.edu (line 12), uccba.uc.edu (line 17), and mail.cis.ohio-state.edu (line 18). The off-site nameserver on mail.cis.ohio-state.edu finally responds (line 20). Notice that you can track all the retransmissions by searching for nsid=3; that’s important to know, because lots of other queries may be wedged between these.

Also, notice the second datagram from toystory.movie.edu (line 14). It has the same port, file descriptor, length, ID, and type as the query on line 3. The application didn’t receive a response in time, so it retransmitted its original query. Since the nameserver is still working on the first query transmitted, this one is a duplicate. It doesn’t say so in this output, but the nameserver detected the duplicate and dropped it. We can tell because there is no forw: line after the req: lines, as there was on lines 4–6.

Can you guess what this output might look like if the nameserver has trouble looking up a name? You’d see a lot of retransmissions as the nameserver kept trying to look up the name (which you could track by matching the nsid= lines). You’d see the application send a couple more retransmissions, thinking that the nameserver hadn’t received the application’s first query. Eventually, the nameserver would give up, usually after the application itself gave up.

With a BIND 9.1.0 or later nameserver, you won’t see resends until debug level 3, and at that point they’ll be very difficult to pick out from BIND 9’s other logged messages. Moreover, even at debug level 3, BIND 9.1.0 doesn’t tell you which nameserver it’s resending to.

A Slave Nameserver Checking Its Zone (BIND 8, Debug Level 1)

In addition to tracking down problems with nameserver lookups, you may have to track down why a slave server is not loading from its master. Tracking down this problem can often be done by simply comparing the zone’s SOA serial numbers on the two servers using nslookup or dig, as we’ll show in Chapter 14. If your problem is more elusive, you may have to resort to looking at the debugging information. We’ll show you what the debugging information should look like if your server is running normally.

This debugging output was generated on a “quiet” nameserver—one not receiving any queries—to show you exactly which lines pertain to zone maintenance. Remember that a BIND 8 slave nameserver uses a child process to transfer the zone data to the local disk before reading it in. While the slave logs its debugging information to named.run, the slave’s child process logs its debugging information to xfer.ddt.PID. The PID suffix, by default the process ID of the child process, may be changed to ensure that the filename is unique. Beware—turning on debugging on a slave nameserver will leave xfer.ddt.PID files lying around, even if you are only trying to trace a lookup. Our trace is at debugging level 1, and we turned on the BIND 8 logging option print-time. Debug level 3 gives you more information, more than you may want if a transfer actually occurs. A debugging level 3 trace of a zone transfer of several hundred resource records can create an xfer.ddt.PID file several megabytes in size.

21-Feb 00:13:18.026 do_zone_maint for zone movie.edu (class IN)
21-Feb 00:13:18.034 zone_maint('movie.edu')
21-Feb 00:13:18.035 qserial_query(movie.edu)
21-Feb 00:13:18.043 sysquery: send -> [192.249.249.3].53 dfd=5
                         nsid=29790 id=0 retry=888048802
21-Feb 00:13:18.046 qserial_query(movie.edu) QUEUED
21-Feb 00:13:18.052 next maintenance for zone 'movie.edu' in 2782 sec
21-Feb 00:13:18.056 datagram from [192.249.249.3].53, fd 5, len 380
21-Feb 00:13:18.059 qserial_answer(movie.edu, 26739)
21-Feb 00:13:18.060 qserial_answer: zone is out of date
21-Feb 00:13:18.061 startxfer(  ) movie.edu
21-Feb 00:13:18.063 /usr/etc/named-xfer -z movie.edu -f db.movie
                    -s 26738 -C 1 -P 53 -d 1 -l xfer.ddt 192.249.249.3
21-Feb 00:13:18.131 started xfer child 390
21-Feb 00:13:18.132 next maintenance for zone 'movie.edu' in 7200 sec

21-Feb 00:14:02.089 endxfer: child 390 zone movie.edu returned
                         status=1 termsig=-1
21-Feb 00:14:02.094 loadxfer(  ) "movie.edu"
21-Feb 00:14:02.094 purge_zone(movie.edu,1)

21-Feb 00:14:30.049 db_load(db.movie, movie.edu, 2, Nil)
21-Feb 00:14:30.058 next maintenance for zone 'movie.edu' in 1846 sec

21-Feb 00:17:12.478 slave zone "movie.edu" (IN) loaded (serial 26739)
21-Feb 00:17:12.486 no schedule change for zone 'movie.edu'

21-Feb 00:42:44.817 Cleaned cache of 0 RRs

21-Feb 00:45:16.046 do_zone_maint for zone movie.edu (class IN)
21-Feb 00:45:16.054 zone_maint('movie.edu')
21-Feb 00:45:16.055 qserial_query(movie.edu)
21-Feb 00:45:16.063 sysquery: send -> [192.249.249.3].53 dfd=5
                         nsid=29791 id=0 retry=888050660
21-Feb 00:45:16.066 qserial_query(movie.edu) QUEUED
21-Feb 00:45:16.067 next maintenance for zone 'movie.edu' in 3445 sec
21-Feb 00:45:16.074 datagram from [192.249.249.3].53, fd 5, len 380
21-Feb 00:45:16.077 qserial_answer(movie.edu, 26739)
21-Feb 00:45:16.078 qserial_answer: zone serial is still OK
21-Feb 00:45:16.131 next maintenance for zone 'movie.edu' in 2002 sec

Unlike the previous traces, each line in this trace has a timestamp. The timestamp makes it clear which debug statements are grouped together.

This nameserver is a slave for a single zone, movie.edu. The line with time 00:13:18.026 shows that it is time to check with the master server. The server queries for the zone’s SOA record and compares serial numbers before deciding to load the zone. The lines with times 00:13:18.059 through 00:13:18.131 show the zone’s serial number (26739), show that the zone is out of date, and start a child process (pid 390) to transfer the zone. At time 00:13:18.132, a timer is set to expire 7,200 seconds later. This is the amount of time the server allows for a transfer to complete. At time 00:14:02.089, you see the exit status of the child process. The status of 1 indicates that the zone data was successfully transferred. The old zone data is purged (time 00:14:02.094), and the new data is loaded.

The next maintenance (see time 00:14:30.058) is scheduled for 1,846 seconds later. For this zone, the refresh interval is 3,600, but the nameserver chose to check again in 1,846 seconds. Why? The nameserver is trying to avoid having its refresh timer become synchronized. Instead of using 3,600 exactly, it uses a random time between half the refresh interval (1,800) and the full refresh interval (3,600). At 00:45:16.046, the zone is checked again, and this time it is up to date.

If your trace ran long enough, you’d see more lines like the one at 00:42:44.817—one line each hour. What’s happening is that the server is making a pass through its cache, freeing any data that has expired to reduce the amount of memory used.

The master server for this zone is a BIND 4 nameserver. If the master is a BIND 8 nameserver, the slave is notified when the zone changes rather than waiting for the refresh interval to pass. The slave server’s debug output looks almost exactly the same, but the trigger to check the zone status is a NOTIFY:

rcvd NOTIFY(movie.edu, IN, SOA) from [192.249.249.3].1059
qserial_query(movie.edu)
sysquery: send -> [192.249.249.3].53 dfd=5
          nsid=29790 id=0 retry=888048802

A Slave Nameserver Checking Its Zone (BIND 9 Debug Level 1)

The equivalent debugging output from a BIND 9.3.1 nameserver at level 1 is, as usual, more concise. Here’s what it looks like:

04-Jul-2005 15:05:00.059 zone_timer: zone movie.edu/IN: enter
04-Jul-2005 15:05:00.059 zone_maintenance: zone movie.edu/IN: enter
04-Jul-2005 15:05:00.059 queue_soa_query: zone movie.edu/IN: enter
04-Jul-2005 15:05:00.059 soa_query: zone movie.edu/IN: enter
04-Jul-2005 15:05:00.061 refresh_callback: zone movie.edu/IN: enter
04-Jul-2005 15:05:00.062 refresh_callback: zone movie.edu/IN: Serial: new 2005010923,
                                                              old 2005010922
04-Jul-2005 15:05:00.062 queue_xfrin: zone movie.edu/IN: enter
04-Jul-2005 15:05:00.062 zone movie.edu/IN: Transfer started.
04-Jul-2005 15:05:00.062 zone movie.edu/IN: requesting IXFR from 192.249.249.3#53
04-Jul-2005 15:05:00.063 transfer of 'movie.edu/IN' from 192.249.249.3#53:
                                         connected using 192.249.249.2#1106
04-Jul-2005 15:05:00.070 calling free_rbtdb(movie.edu)
04-Jul-2005 15:05:00.070 zone movie.edu/IN: zone transfer finished: success
04-Jul-2005 15:05:00.070 zone movie.edu/IN: transferred serial 5
04-Jul-2005 15:05:00.070 transfer of 'movie.edu' from 192.249.249.3#53: end of
    transfer
04-Jul-2005 15:05:01.089 zone_timer: zone movie.edu/IN: enter
04-Jul-2005 15:05:01.089 zone_maintenance: zone movie.edu/IN: enter
04-Jul-2005 15:05:19.121 notify_done: zone movie.edu/IN: enter
04-Jul-2005 15:05:19.621 notify_done: zone movie.edu/IN: enter

The message at 15:05:00.059 shows the refresh timer popping, causing the nameserver to begin maintenance for the zone on the next line. First, the nameserver queues a query for the SOA record for the IN class zone movie.edu (queue_soa_query at the same timestamp), which it sends. At 15:05:00.062, the nameserver finds that the master nameserver has a higher serial number than it does (2005010923 to its 2005010922), so it queues an inbound zone transfer (queue_xfrin). All of eight milliseconds later (at 15:05:00.070), the transfer is done, and at 15:05:01.089, the nameserver resets the refresh timer (zone_timer).

The next three lines show the nameserver doing maintenance on movie.edu again. If, for example, some of movie.edu’s nameservers were outside the movie.edu zone, the nameserver would use this opportunity to look up their addresses (not just A, but also A6 and AAAA records!) so that it could include them in future responses. On the last two lines, our nameserver sends NOTIFY messages—two, to be exact—to the nameservers listed in the NS records for movie.edu.

The Resolver Search Algorithm and Negative Caching (BIND 8)

In this trace, we’ll show you what the BIND search algorithm and negative caching look like from the perspective of a BIND 8 nameserver. We could look up galt.cs.purdue.edu like the last trace, but it wouldn’t show the search algorithm. Instead, we will look up foo.bar, a name that doesn’t exist. In fact, we’ll look it up twice:

1 datagram from cujo.horror.movie.edu 1109, fd 6, len 25
2 req: nlookup(foo.bar) id 19220 type=1 class=1
3 req: found 'foo.bar' as '' (cname=0)
4 forw: forw -> D.ROOT-SERVERS.NET 53 ds=7 nsid=2532 id=19220 0ms retry 4sec
5
6 datagram from D.ROOT-SERVERS.NET 53, fd 5, len 25
7 ncache: dname foo.bar, type 1, class 1
8 send_msg -> cujo.horror.movie.edu 1109 (UDP 6) id=19220
9
10 datagram from cujo.horror.movie.edu 1110, fd 6, len 42
11 req: nlookup(foo.bar.horror.movie.edu) id 19221 type=1 class=1
12 req: found 'foo.bar.horror.movie.edu' as 'horror.movie.edu' (cname=0)
13 forw: forw -> carrie.horror.movie.edu 53 ds=7 nsid=2533 id=19221 0ms
                                                          retry 4sec
14 datagram from carrie.horror.movie.edu 53, fd 5, len 42
15 ncache: dname foo.bar.horror.movie.edu, type 1, class 1
16 send_msg -> cujo.horror.movie.edu 1110 (UDP 6) id=19221

Look up foo.bar again:

17 datagram from cujo.horror.movie.edu 1111, fd 6, len 25
18 req: nlookup(foo.bar) id 15541 type=1 class=1
19 req: found 'foo.bar' as 'foo.bar' (cname=0)
20 ns_req: answer -> cujo.horror.movie.edu 1111 fd=6 id=15541 size=25 Local
21
22 datagram from cujo.horror.movie.edu 1112, fd 6, len 42
23 req: nlookup(foo.bar.horror.movie.edu) id 15542 type=1 class=1
24 req: found 'foo.bar.horror.movie.edu' as 'foo.bar.horror.movie.edu' (cname=0)
25 ns_req: answer -> cujo.horror.movie.edu 1112 fd=6 id=15542 size=42 Local

Let’s look at the resolver search algorithm. The first name looked up (line 2) is exactly the name we typed in. Since the name had at least one dot, it is looked up without modification. When that name lookup failed, horror.movie.edu was appended to the name and looked up.

Line 7 shows the caching of the negative answer (ncache). If the same name is looked up again in the next few minutes (line 19), the nameserver still has the negative response in its cache, so the server can answer immediately that the name doesn’t exist. (If you don’t believe this hand-waving, compare lines 3 and 19. On line 3, nothing was found for foo.bar, but line 19 shows the whole name being found.)

The Resolver Search Algorithm and Negative Caching (BIND 9)

Here’s what a BIND 9.3.1 nameserver’s debugging output looks like when looking up foo.bar twice:

04-Jul-2005 15:45:42.944 client cujo.horror.movie.edu#1044: query: foo.bar A +
04-Jul-2005 15:45:42.945 createfetch: foo.bar. A
04-Jul-2005 15:45:42.945 createfetch: . NS
04-Jul-2005 15:45:43.425 client cujo.horror.movie.edu#1044: query:
                                                  foo.bar.horror.movie.edu A +
04-Jul-2005 15:45:43.425 createfetch: foo.bar.horror.movie.edu. A

This assumes, of course, that you added the following lines to /etc/named.conf to see the queries:

logging {
    category queries {
        default_debug;
    };
};

This output is more subtle and succinct than BIND 8’s, but you can get the information you need from it. The first line, at 15:45:42.944, shows the initial query for foo.bar’s address arriving from the client cujo.horror.movie.edu (remember, we ran this through our magic IP-to-name filter, which we’ll introduce next). The next two lines show the nameserver dispatching two tasks (createfetch) to look up foo.bar: the first is the actual task to look up foo.bar’s address, while the second is a subsidiary task to look up NS records for the root zone, necessary to complete the foo.bar lookup. Once the nameserver has current NS records for the root, it queries a root nameserver for foo.bar’s address and gets a response indicating that no top-level domain called bar exists. Unfortunately, you don’t see that.

The line at 15:45:43.425 shows cujo.horror.movie.edu applying the search list, looking up foo.bar.horror.movie.edu. This causes the nameserver to dispatch a task (createfetch) to look up that domain name.

When we look up foo.bar again, we see:

04-Jul-2005 15:45:46.557 client cujo.horror.movie.edu#1044: query: foo.bar A +
04-Jul-2005 15:45:46.558 client cujo.horror.movie.edu#1044: query:
                                                  foo.bar.horror.movie.edu A +

Notice the absence of createfetch entries? That’s because our nameserver has the negative answers cached.

Tools

Let’s wrap up loose ends. We told you about our tool to convert IP addresses to names so that your debugging output is easier to read. Here is such a tool written in Perl:

#!/usr/bin/perl -n

use "Socket";

if (/)(d+.d+.d+.d+)/) {
$addr = pack('C4', split(/./, $1));
($name, $rest) = gethostbyaddr($addr, &AF_INET);
if($name) {s/$1/$name/};
}

print;

It’s best not to pipe named.run output into this script with debugging on because the script will generate its own queries to the nameserver.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset