Chapter 14. Troubleshooting DNS and BIND

“Of course not,” said the Mock Turtle. “Why, if a fish came to me, and told me he was going on a journey, I should say, ‘With what porpoise?’”

“Don’t you mean ‘purpose’?” said Alice.

“I mean what I say,” the Mock Turtle replied, in an offended tone. And the Gryphon added, “Come, let’s hear some of your adventures.”

In the last two chapters, we’ve demonstrated how to use nslookup and dig, and how to read the nameserver’s debugging information. In this chapter, we’ll show you how to use these tools—plus traditional Unix networking tools like trusty ol’ ping—to troubleshoot real-life problems with DNS and BIND.

Troubleshooting, by its nature, is a tough subject to teach. You start with any of a world of symptoms and try to work your way back to the cause. We can’t cover the whole gamut of problems you may encounter on the Internet, but we will certainly do our best to show how to diagnose the most common of them. And along the way, we hope to teach you troubleshooting techniques that will be valuable in tracking down more obscure problems that we don’t document.

Is NIS Really Your Problem?

Before we launch into a discussion of how to troubleshoot a DNS or BIND problem, we should make sure you know how to tell whether a problem is caused by DNS as opposed to NIS. On hosts running NIS, figuring out whether the culprit is DNS or NIS can be difficult. The stock BSD nslookup, for example, doesn’t pay any attention to NIS. You can run nslookup on a Sun and query the nameserver ’til the cows come home while all the other services are using NIS.

How do you know where to put the blame? Some vendors have modified nslookup to use NIS for name service if NIS is configured. The HP-UX nslookup, for example, will report that it’s querying an NIS server when it starts up:

% nslookup
Default NIS Server:  toystory.movie.edu
Address:  192.249.249.3

>

A surefire way to decide whether an answer came from NIS is to use ypcat to list the hosts database. For example, to find out whether andrew.cmu.edu is in your NIS hosts map, you could execute:

% ypcat hosts | grep andrew.cmu.edu

If you find the answer in NIS (and you know NIS is being consulted first), you’ve found the cause of the problem.

Finally, in the versions of Unix that use the nsswitch.conf file, you can determine the order in which the different name services are used by referring to the entry for the hosts database in the file. An entry like this, for example, indicates that NIS is being checked first:

hosts:    nis dns files

while this entry has the name resolver querying DNS first:

hosts:    dns nis files
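If you have many hosts to check, a few lines of Python can report which source the hosts entry consults first. This is a minimal sketch, assuming the simple hosts: source source ... syntax shown above; the function names are our own, and bracketed action items such as [NOTFOUND=return] are simply skipped rather than interpreted.

```python
def hosts_lookup_order(nsswitch_text):
    """Return the sources listed on the hosts: line, in lookup order.

    Assumes the simple "hosts: src1 src2 ..." syntax. Text after '#' is
    ignored, and bracketed action items like [NOTFOUND=return] are dropped.
    """
    for raw_line in nsswitch_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line.startswith("hosts:"):
            continue
        fields = line[len("hosts:"):].split()
        return [f for f in fields if not f.startswith("[")]
    return []  # no hosts: entry found


def nis_checked_before_dns(nsswitch_text):
    """True if NIS appears ahead of DNS on the hosts: line."""
    order = hosts_lookup_order(nsswitch_text)
    if "nis" not in order or "dns" not in order:
        return False
    return order.index("nis") < order.index("dns")


if __name__ == "__main__":
    # The two example entries from the text.
    for sample in ("hosts:    nis dns files", "hosts:    dns nis files"):
        print(sample, "->", nis_checked_before_dns(sample))
```

On a real host you would pass the contents of /etc/nsswitch.conf instead of the sample strings.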

For more detailed information on the syntax and semantics of the nsswitch.conf file, see Chapter 6.

These hints should help you identify the guilty party or at least exonerate one suspect. If you narrow down the suspects and DNS is still implicated, you’ll just have to read this chapter.

Troubleshooting Tools and Techniques

We went over nslookup, dig, and the nameserver’s debugging output in the last two chapters. Before we go on, let’s introduce some new tools that can be useful in troubleshooting: named-xfer, nameserver database dumps, and query logging.

How to Use named-xfer

named-xfer is the program that BIND 8 nameservers start to perform zone transfers. (BIND 9 nameservers, you’ll remember, are multithreaded, so they don’t need a separate program to do inbound zone transfers; they just start a new thread.) named-xfer checks whether the slave’s copy of the zone data is up to date and transfers a new zone if necessary.

In Chapter 13, we showed you the debugging output a BIND 8 slave nameserver logged as it checked its zone. When the slave server transferred the zone, it started a child process (named-xfer) to pull the data to the local filesystem. We didn’t tell you, however, that you can also start named-xfer manually instead of waiting for named to start it, and that you can tell it to produce debugging output independent of named.

This can be useful if you’re tracking down a problem with zone transfers but don’t want to wait for named to schedule one. To test a zone transfer manually, you need to specify a number of command-line options:

% /usr/sbin/named-xfer
Usage error: no domain
Usage: named-xfer
        -z zone_to_transfer
        -f db_file
        [-i ixfr_file]
        [-s serial_no]
        [-d debug_level]
        [-l debug_log_file]
        [-t trace_file]
        [-p port]
        [-S] [-Z]
        [-C class]
        [-x axfr-src]
        [-X axfr-src-v6]
        [-T tsig_info_file]
        servers [-ixfr|-axfr]...

This is the output from a BIND 8.4.7 version of named-xfer. Earlier versions of named-xfer won’t have all these options.

When named starts named-xfer, it specifies the -z option (the zone named wants to check), the -f option (the name of the zone datafile that corresponds to the zone, from named.conf), the -s option (the zone’s serial number on the slave from the current SOA record), and the addresses of the servers the slave was instructed to load from (the IP addresses from the masters substatement in the zone statement in named.conf). If named is running in debug mode, it also specifies the debug level for named-xfer with the -d option. The other options aren’t usually necessary to troubleshoot problems; they have to do with incremental zone transfers, TSIG-signed zone transfers, and such.

When you run named-xfer manually, you can also specify the debug level on the command line with -d. (Don’t forget, though, that debug levels above 3 produce tons of debugging output if the transfer succeeds!) You can also specify an alternate filename for the debug file with the -l option. The default log file is /var/tmp/xfer.ddt.XXXXXX (or a file of the same name in /usr/tmp), where XXXXXX is a suffix appended to keep the filename unique. Finally, you can specify the name of the host to load from instead of its IP address.

For example, with the following command line, you can see whether zone transfers from toystory.movie.edu are working:

% /usr/sbin/named-xfer -z movie.edu -f /tmp/db.movie -s 0 toystory.movie.edu.
% echo $?
4

In this command, we specified a serial number of 0 because we wanted to force named-xfer to attempt a zone transfer even if it wasn’t needed. If 0 is higher than movie.edu’s serial number on toystory (remember, serial numbers use sequence space arithmetic), we’d need to choose a different number. Also, we told named-xfer to put the new zone datafile in /tmp rather than overwrite the zone’s working zone datafile.
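To see why 0 isn’t always a safe choice, here is a minimal Python sketch of the serial-number comparison defined in RFC 1982 (the sequence space arithmetic mentioned above) for 32-bit SOA serials. The function names are our own, not named-xfer’s. The RFC leaves the comparison undefined when two serials are exactly 2^31 apart; this sketch reports “not newer” in that case.

```python
SERIAL_BITS = 32
HALF = 1 << (SERIAL_BITS - 1)  # 2**31


def serial_lt(s1, s2):
    """True if serial s1 precedes s2 under RFC 1982 arithmetic (32 bits).

    Serials exactly 2**31 apart are undefined by the RFC; this returns False.
    """
    return (s1 < s2 and s2 - s1 < HALF) or (s1 > s2 and s1 - s2 > HALF)


def transfer_needed(slave_serial, master_serial):
    """A slave transfers only if the master's serial follows its own."""
    return serial_lt(slave_serial, master_serial)


if __name__ == "__main__":
    print(transfer_needed(0, 1998010600))  # True: 0 precedes this serial
    print(transfer_needed(0, 3000000000))  # False: here 0 counts as higher
```

The same arithmetic explains why a serial that rolls back from 112 to 1 isn’t picked up: transfer_needed(112, 1) is False.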

We can tell if the transfer succeeded by looking at named-xfer’s return value. If you’re running BIND 8.1.2 or older, your named-xfer has four possible return values:

0

The zone data is up to date and no transfer was needed.

1

Indicates a successful transfer.

2

The host(s) named-xfer queried can’t be reached, or an error occurred and named-xfer may have logged an error message to syslog.

3

An error occurred and named-xfer logged an error message to syslog.

As of BIND 8.2, four new return values have been added to accommodate incremental zone transfers:

4

Indicates a successful AXFR (full) zone transfer.

5

Indicates a successful IXFR (incremental) zone transfer.

6

Indicates that the master nameserver returned an AXFR to named-xfer’s IXFR request.

7

Indicates that the transfer was refused.

It’s perfectly legal for a nameserver—even one that supports IXFR—to return a full zone transfer to a request for an incremental zone transfer. For example, the master nameserver may be missing part of the record of the changes made to the zone.

Note that BIND 8.2 and later named-xfers don’t use return value 1 anymore. Return value 1 has been replaced by return values 4–7.
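If you test transfers from a script, it helps to translate named-xfer’s exit status back into words. The following Python sketch encodes the return values listed above. The helper names are hypothetical, the path /usr/sbin/named-xfer is taken from the earlier examples, and treating status 6 as a success is our own reading (the master still sent a complete AXFR), not something stated here.

```python
import subprocess

# Exit statuses as described in the text. Values 0-3 apply to BIND 8.1.2
# and older; 4-7 were added in BIND 8.2, which no longer returns 1.
XFER_STATUS = {
    0: "zone up to date, no transfer needed",
    1: "successful transfer (BIND 8.1.2 and older)",
    2: "server(s) unreachable, or an error occurred (check syslog)",
    3: "error occurred (see syslog)",
    4: "successful AXFR (full) zone transfer",
    5: "successful IXFR (incremental) zone transfer",
    6: "master answered the IXFR request with an AXFR",
    7: "transfer refused",
}

# Assumption: status 6 still means a complete zone was received.
SUCCESS_CODES = {1, 4, 5, 6}


def describe_status(code):
    return XFER_STATUS.get(code, f"unknown exit status {code}")


def transfer_succeeded(code):
    return code in SUCCESS_CODES


def run_named_xfer(zone, db_file, server, serial=0,
                   path="/usr/sbin/named-xfer"):
    """Run named-xfer as in the manual example and return its exit status."""
    argv = [path, "-z", zone, "-f", db_file, "-s", str(serial), server]
    return subprocess.run(argv).returncode
```

For example, describe_status(run_named_xfer("movie.edu", "/tmp/db.movie", "toystory.movie.edu.")) repeats the manual test shown earlier and prints the result in words.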

What if I Don’t Have named-xfer?

If you’ve upgraded to BIND 9 and don’t have a named-xfer binary, you can still use nslookup or dig to do a zone transfer. Either query tool will give you some of the information that named-xfer would have given you.

For example, to use dig to do the same zone transfer we showed you earlier, you can run:

% dig @toystory.movie.edu movie.edu. axfr

With nslookup, you can switch to the nameserver you want to test (with the server command) and then use the ls -d command from interactive mode.

Unfortunately, both dig and nslookup are more limited than named-xfer is in reporting errors. If nslookup can’t transfer a zone, it usually reports an “unspecified error”:

> ls movie.edu.
[toystory.movie.edu]
*** Can't list domain movie.edu: Unspecified error

This could be caused by an allow-transfer access list, the fact that toystory.movie.edu isn’t actually authoritative for movie.edu, or a number of other problems. To tell which, you may just have to send other, related queries or check the syslog output on the master nameserver.

How to Read a BIND 8 Database Dump

Poring over a dump of the nameserver’s internal database—including cached information—can also help you track down problems. The ndc dumpdb command causes named to dump its authoritative data, cached data, and hints data to named_dump.db in BIND’s working directory.[*] An example of a named_dump.db file follows. The authoritative data and cached entries, mixed together, appear first in the file. At the end of the file is the hints data.

; Dumped at Tue Jan  6 10:49:08 1998
;; ++zone table++
; 0.0.127.in-addr.arpa (type 1, class 1, source db.127.0.0)
;   time=0, lastupdate=0, serial=1,
;   refresh=0, retry=3600, expire=608400, minimum=86400
;   ftime=884015430, xaddr=[0.0.0.0], state=0041, pid=0
;; --zone table--
; Note: Cr=(auth,answer,addtnl,cache) tag only shown for non-auth RR's
; Note: NT=milliseconds for any A RR which we've used as a nameserver
; --- Cache & Data ---
$ORIGIN .
.   518375  IN      NS  G.ROOT-SERVERS.NET.   ;Cr=auth [128.8.10.90]
    518375  IN      NS  J.ROOT-SERVERS.NET.   ;Cr=auth [128.8.10.90]
    518375  IN      NS  K.ROOT-SERVERS.NET.   ;Cr=auth [128.8.10.90]
    518375  IN      NS  L.ROOT-SERVERS.NET.   ;Cr=auth [128.8.10.90]
    518375  IN      NS  M.ROOT-SERVERS.NET.   ;Cr=auth [128.8.10.90]
    518375  IN      NS  A.ROOT-SERVERS.NET.   ;Cr=auth [128.8.10.90]
    518375  IN      NS  H.ROOT-SERVERS.NET.   ;Cr=auth [128.8.10.90]
    518375  IN      NS  B.ROOT-SERVERS.NET.   ;Cr=auth [128.8.10.90]
    518375  IN      NS  C.ROOT-SERVERS.NET.   ;Cr=auth [128.8.10.90]
    518375  IN      NS  D.ROOT-SERVERS.NET.   ;Cr=auth [128.8.10.90]
    518375  IN      NS  E.ROOT-SERVERS.NET.   ;Cr=auth [128.8.10.90]
    518375  IN      NS  I.ROOT-SERVERS.NET.   ;Cr=auth [128.8.10.90]
    518375  IN      NS  F.ROOT-SERVERS.NET.   ;Cr=auth [128.8.10.90]
EDU  86393  IN      SOA A.ROOT-SERVERS.NET.  hostmaster.INTERNIC.NET. (
         1998010500 1800 900 604800 86400 )   ;Cr=addtnl [128.63.2.53]
$ORIGIN  0.127.in-addr.arpa.
0        IN    SOA cujo.movie.edu. root.cujo.movie.edu. (
         1998010600 10800 3600 608400 86400 )        ;Cl=5
         IN    NS  cujo.movie.edu.   ;Cl=5
$ORIGIN  0.0.127.in-addr.arpa.
1        IN    PTR localhost.    ;Cl=5
$ORIGIN EDU.
PURDUE   172787  IN  NS  NS.PURDUE.EDU.           ;Cr=addtnl [192.36.148.17]
         172787  IN  NS  MOE.RICE.EDU.            ;Cr=addtnl [192.36.148.17]
         172787  IN  NS  PENDRAGON.CS.PURDUE.EDU.  ;Cr=addtnl [192.36.148.17]
         172787  IN  NS  HARBOR.ECN.PURDUE.EDU.    ;Cr=addtnl [192.36.148.17]
$ORIGIN  movie.EDU.
;cujo    593     IN  SOA  A.ROOT-SERVERS.NET. hostmaster.INTERNIC.NET. (
;        1998010500 1800 900 604800 86400 );EDU.; NXDOMAIN  ;-$
   ;Cr=auth [128.63.2.53]
$ORIGIN   RICE.EDU.
MOE      172787  IN  A   128.42.5.4        ;NT=84 Cr=addtnl [192.36.148.17]
$ORIGIN   PURDUE.EDU.
CS       86387   IN  NS  pendragon.cs.PURDUE.edu.    ;Cr=addtnl [128.42.5.4]
         86387   IN  NS  ns.PURDUE.edu.              ;Cr=addtnl [128.42.5.4]
         86387   IN  NS  harbor.ecn.PURDUE.edu.      ;Cr=addtnl [128.42.5.4]
         86387   IN  NS  moe.rice.edu.               ;Cr=addtnl [128.42.5.4]
NS       172787   IN  A  128.210.11.5        ;NT=4 Cr=addtnl [192.36.148.17]
$ORIGIN   ECN.PURDUE.EDU.
HARBOR   172787  IN   A  128.46.199.76       ;NT=6 Cr=addtnl [192.36.148.17]
$ORIGIN   CS.PURDUE.EDU.
galt     86387   IN   A  128.10.2.39                 ;Cr=auth [128.42.5.4]
PENDRAGON  172787  IN  A  128.10.2.5         ;NT=20 Cr=addtnl [192.36.148.17]
$ORIGIN   ROOT-SERVERS.NET.
K        604775    IN  A  193.0.14.129       ;NT=10 Cr=answer [128.8.10.90]
A        604775    IN  A  198.41.0.4         ;NT=20 Cr=answer [128.8.10.90]
L        604775    IN  A  198.32.64.12       ;NT=8 Cr=answer [128.8.10.90]
B        604775    IN  A  128.9.0.107        ;NT=9 Cr=answer [128.8.10.90]
M        604775    IN  A  202.12.27.33       ;NT=20 Cr=answer [128.8.10.90]
C        604775    IN  A  192.33.4.12        ;NT=17 Cr=answer [128.8.10.90]
D        604775    IN  A  128.8.10.90        ;NT=11 Cr=answer [128.8.10.90]
E        604775    IN  A  192.203.230.10     ;NT=9 Cr=answer [128.8.10.90]
F        604775    IN  A  192.5.5.241        ;NT=73 Cr=answer [128.8.10.90]
G        604775    IN  A  192.112.36.4       ;NT=14 Cr=answer [128.8.10.90]
H        604775    IN  A  128.63.2.53        ;NT=160 Cr=answer [128.8.10.90]
I        604775    IN  A  192.36.148.17      ;NT=102 Cr=answer [128.8.10.90]
J        604775    IN  A  198.41.0.10        ;NT=21 Cr=answer [128.8.10.90]
; --- Hints ---
$ORIGIN .
.   3600           IN  NS  A.ROOT-SERVERS.NET.     ;Cl=0
    3600           IN  NS  B.ROOT-SERVERS.NET.     ;Cl=0
    3600           IN  NS  C.ROOT-SERVERS.NET.     ;Cl=0
    3600           IN  NS  D.ROOT-SERVERS.NET.     ;Cl=0
    3600           IN  NS  E.ROOT-SERVERS.NET.     ;Cl=0
    3600           IN  NS  F.ROOT-SERVERS.NET.     ;Cl=0
    3600           IN  NS  G.ROOT-SERVERS.NET.     ;Cl=0
    3600           IN  NS  H.ROOT-SERVERS.NET.     ;Cl=0
    3600           IN  NS  I.ROOT-SERVERS.NET.     ;Cl=0
    3600           IN  NS  J.ROOT-SERVERS.NET.     ;Cl=0
    3600           IN  NS  K.ROOT-SERVERS.NET.     ;Cl=0
    3600           IN  NS  L.ROOT-SERVERS.NET.     ;Cl=0
    3600           IN  NS  M.ROOT-SERVERS.NET.     ;Cl=0
$ORIGIN   ROOT-SERVERS.NET.
K     3600         IN   A  193.0.14.129      ;NT=11 Cl=0
L     3600         IN   A  198.32.64.12      ;NT=9 Cl=0
A     3600         IN   A  198.41.0.4        ;NT=10 Cl=0
M     3600         IN   A  202.12.27.33      ;NT=11 Cl=0
B     3600         IN   A  128.9.0.107       ;NT=1288 Cl=0
C     3600         IN   A  192.33.4.12       ;NT=21 Cl=0
D     3600         IN   A  128.8.10.90       ;NT=1288 Cl=0
E     3600         IN   A  192.203.230.10    ;NT=19 Cl=0
F     3600         IN   A  192.5.5.241       ;NT=23 Cl=0
G     3600         IN   A  192.112.36.4      ;NT=18 Cl=0
H     3600         IN   A  128.63.2.53       ;NT=11 Cl=0
I     3600         IN   A  192.36.148.17     ;NT=21 Cl=0
J     3600         IN   A  198.41.0.10       ;NT=13 Cl=0

The nameserver that created this named_dump.db file was authoritative only for 0.0.127.in-addr.arpa. Only two names have been looked up by this server: galt.cs.purdue.edu and cujo.movie.edu. In the process of looking up galt.cs.purdue.edu, this server cached not only the address of galt, but also the list of nameservers for purdue.edu and the addresses for those servers. The name cujo.movie.edu, however, doesn’t really exist (nor does the zone movie.edu, except in our examples), so the server cached the negative response. In the dump file, the negative response is commented out (the line starts with a semicolon), and the reason is listed (NXDOMAIN) instead of real data. You’ll notice the TTL is quite low (593). On BIND 8.2 and later nameservers, negative responses are cached according to the last field in the SOA record, which is usually much smaller than the default TTL for the zone.

The hints section at the bottom of the file contains the data from the db.cache file. The TTL of the hints data is decremented, and it may go to 0, but the hints are never discarded.

Note that some of the resource records are followed by a semicolon and NT=. You’ll see these only on the address records of nameservers. The number is the round-trip time estimate, in milliseconds, that the nameserver keeps so that it knows which nameservers have responded most quickly in the past; the nameserver with the lowest round-trip time will be tried first the next time.
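You can pull these estimates out of a dump to see which servers your nameserver currently favors. This Python sketch assumes the line layout in the dump above: an owner name (omitted on lines that start with whitespace), then TTL, class, type A, and the address, with the NT= value in the trailing comment. The function names are hypothetical, and $ORIGIN is tracked only well enough to build fully qualified names.

```python
import re

NT_RE = re.compile(r"NT=(\d+)")


def qualify(owner, origin):
    """Make an owner name absolute relative to the current $ORIGIN."""
    if owner.endswith("."):
        return owner
    return owner + "." + origin if origin != "." else owner + "."


def nameserver_rtts(dump_text):
    """Return (name, address, nt_ms) for every A record carrying an NT= tag.

    Note that hints entries carry NT= values too, so a name may appear twice.
    """
    origin, owner, results = ".", None, []
    for raw in dump_text.splitlines():
        if not raw.strip() or raw.lstrip().startswith(";"):
            continue  # blank line or comment-only line
        data, _, comment = raw.partition(";")
        tokens = data.split()
        if tokens[0] == "$ORIGIN":
            origin = tokens[1]
            continue
        if not raw[0].isspace():  # owner present; otherwise reuse previous
            owner = tokens[0]
        match = NT_RE.search(comment)
        if match and "IN" in tokens:
            i = tokens.index("IN")
            if len(tokens) > i + 2 and tokens[i + 1] == "A":
                results.append((qualify(owner, origin), tokens[i + 2],
                                int(match.group(1))))
    return results


def fastest(dump_text, count=3):
    """The `count` entries with the lowest NT, i.e. the servers tried first."""
    return sorted(nameserver_rtts(dump_text), key=lambda r: r[2])[:count]
```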

The cached data is easy to pick out: those entries have a tag (Cr=) and (sometimes) the IP address of the server the data came from.[*] The zone data and hint data are tagged with Cl=, which is just a count of the level in the domain tree (the root is level 0, foo would be level 1, foo.foo would be level 2, etc.). Let’s digress a moment to explain the concept of credibility.

One of the advances between BIND 4.8.3 and 4.9 was the addition of a credibility measure. This allows a nameserver to make more intelligent decisions about what to do with new data from a remote server.

A 4.8.3 nameserver had only two credibility levels: locally authoritative data and everything else. The locally authoritative data was data from your zone datafiles; your nameserver knew better than to update its internal copy of what came from your zone file. But all data from remote nameservers was considered equal.

Here is a situation that could happen and the way a 4.8.3 server would deal with it. Suppose that your server looked up an address for toystory.movie.edu and received an authoritative answer from the movie.edu nameserver. (Remember, an authoritative answer is the best you can get.) Sometime later, while looking up foo.oreilly.com, your server received another address record for toystory.movie.edu, but this time as part of the delegation information for oreilly.com (which toystory.movie.edu is a slave for). The 4.8.3 nameserver would update the cached address record for toystory.movie.edu, even though the data came from the com nameserver instead of the authoritative movie.edu nameserver. Of course, the com and movie.edu nameservers will have exactly the same data for toystory.movie.edu, so this won’t be a problem, right? Yeah, and it never rains in southern California, either.

A 4.9 or newer nameserver is more intelligent. Like a 4.8.3 nameserver, it still considers your zone data unassailable—beyond any doubt. But a 4.9 or newer nameserver distinguishes among the different data from remote nameservers. Here is the hierarchy of remote data credibility from most credible to least:

auth

These records are data from authoritative answers—the answer section of a response message with the authoritative answer bit set.

answer

These records are data from nonauthoritative, or cached, answers—the answer section of a response message without the authoritative answer bit set.

addtnl

These records are data from the rest of the response message—the authority and additional sections. The authority section of the response contains NS records that delegate a zone to an authoritative nameserver. The additional section contains address records that may complete information in other sections (e.g., address records that go with NS records in the authority section).

There is one exception to this rule: when the nameserver is priming its root nameserver cache, the records that would be at credibility addtnl are bumped up to answer to make them harder to change accidentally. Notice in the dump that the address records for root nameservers are at credibility answer, but the address records for the purdue.edu nameservers are at credibility addtnl.

In the situation just described, a 4.9 or newer nameserver would not replace the authoritative data (credibility = auth) for toystory.movie.edu with the delegation data (credibility = addtnl) because the authoritative answer would have higher credibility.
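The replacement rule can be summarized in a few lines. This Python sketch is a simplified model under our own assumptions, not BIND’s actual code: data from your own zone datafiles is never replaced, and cached remote data is replaced only by data of equal or higher credibility (whether equal credibility replaces is an assumption of this sketch). The rank names follow the Cr= tags in the dump.

```python
# Credibility of remote data, from least to most credible (the Cr= tags).
RANK = {"addtnl": 1, "answer": 2, "auth": 3}


def should_replace(cached_cred, new_cred):
    """BIND 4.9-and-later style decision (simplified model).

    cached_cred is "zone" for data loaded from local zone datafiles,
    otherwise one of the Cr= tags.
    """
    if cached_cred == "zone":
        return False  # locally authoritative data is never overwritten
    return RANK[new_cred] >= RANK[cached_cred]


def should_replace_bind_4_8_3(cached_cred, new_cred):
    """BIND 4.8.3 behavior: zone data is safe, all remote data is equal."""
    return cached_cred != "zone"
```

In the toystory.movie.edu scenario, should_replace("auth", "addtnl") is False, while the 4.8.3 model happily overwrites the authoritative record.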

How to Read a BIND 9 Database Dump

With BIND 9, the database dump changed significantly. Here is the result of running rndc dumpdb. The nameserver dumps its cache data to named_dump.db in BIND’s working directory. What you don’t see in this dump is the authoritative data. To get that, you must run rndc dumpdb -all.

;
; Start view _default
;
;
; Cache dump of view '_default'
;
$DATE 20050827190436
; authanswer
.            518364    IN NS    A.ROOT-SERVERS.NET.
             518364    IN NS    B.ROOT-SERVERS.NET.
             518364    IN NS    C.ROOT-SERVERS.NET.
             518364    IN NS    D.ROOT-SERVERS.NET.
             518364    IN NS    E.ROOT-SERVERS.NET.
             518364    IN NS    F.ROOT-SERVERS.NET.
             518364    IN NS    G.ROOT-SERVERS.NET.
             518364    IN NS    H.ROOT-SERVERS.NET.
             518364    IN NS    I.ROOT-SERVERS.NET.
             518364    IN NS    J.ROOT-SERVERS.NET.
             518364    IN NS    K.ROOT-SERVERS.NET.
             518364    IN NS    L.ROOT-SERVERS.NET.
             518364    IN NS    M.ROOT-SERVERS.NET.
; glue
A3.NSTLD.COM.        172764    A    192.5.6.32
; glue
C3.NSTLD.COM.        172764    A    192.26.92.32
; glue
D3.NSTLD.COM.        172764    A    192.31.80.32
; glue
E3.NSTLD.COM.        172764    A    192.12.94.32
; glue
G3.NSTLD.COM.        172764    A    192.42.93.32
; glue
H3.NSTLD.COM.        172764    A    192.54.112.32
; glue
L3.NSTLD.COM.        172764    A    192.41.162.32
; glue
M3.NSTLD.COM.        172764    A    192.55.83.32
; glue
edu.            172764    NS    A3.NSTLD.COM.
            172764        NS    C3.NSTLD.COM.
            172764        NS    D3.NSTLD.COM.
            172764        NS    E3.NSTLD.COM.
            172764        NS    G3.NSTLD.COM.
            172764        NS    H3.NSTLD.COM.
            172764        NS    L3.NSTLD.COM.
            172764        NS    M3.NSTLD.COM.
; authauthority
cujo.movie.edu.        10796    -ANY    ;-$NXDOMAIN
; glue
purdue.edu.        172764    NS    NS.purdue.edu.
            172764    NS    MOE.RICE.edu.
            172764    NS    HARBOR.ECN.purdue.edu.
            172764    NS    PENDRAGON.cs.purdue.edu.
; authauthority
cs.purdue.edu.        86364    NS    ns.purdue.edu.
            86364     NS    moe.rice.edu.
            86364     NS    ns2.purdue.edu.
            86364     NS    harbor.ecn.purdue.edu.
            86364     NS    pendragon.cs.purdue.edu.
; authanswer
galt.cs.purdue.edu.       86364    A    128.10.2.39
; glue
PENDRAGON.cs.purdue.edu. 172764    A    128.10.2.5
; glue
HARBOR.ECN.purdue.edu.   172764    A    128.46.154.76
; glue
NS.purdue.edu.           172764    A    128.210.11.5
; additional
ns2.purdue.edu.            3564    A    128.210.11.57
; glue
MOE.RICE.edu.            172764    A    128.42.5.4
; additional
A.ROOT-SERVERS.NET.      604764    A    198.41.0.4
; additional
B.ROOT-SERVERS.NET.      604764    A    192.228.79.201
; additional
C.ROOT-SERVERS.NET.      604764    A    192.33.4.12
; additional
D.ROOT-SERVERS.NET.      604764    A    128.8.10.90
; additional
E.ROOT-SERVERS.NET.      604764    A    192.203.230.10
; additional
F.ROOT-SERVERS.NET.      604764    A    192.5.5.241
; additional
G.ROOT-SERVERS.NET.      604764    A    192.112.36.4
; additional
H.ROOT-SERVERS.NET.      604764    A    128.63.2.53
; additional
I.ROOT-SERVERS.NET.      604764    A    192.36.148.17
; additional
J.ROOT-SERVERS.NET.      604764    A    192.58.128.30
; additional
K.ROOT-SERVERS.NET.      604764    A    193.0.14.129
; additional
L.ROOT-SERVERS.NET.      604764    A    198.32.64.12
; additional
M.ROOT-SERVERS.NET.      604764    A    202.12.27.33
;
; Start view _default
;
;
; Address database dump
;
; M3.NSTLD.COM [v4 TTL 6] [v4 success] [v6 unexpected]
;    192.55.83.32 [srtt 20] [flags 00000000] [ttl 1796]
; L3.NSTLD.COM [v4 TTL 6] [v4 success] [v6 unexpected]
;    192.41.162.32 [srtt 20] [flags 00000000] [ttl 1796]
; H3.NSTLD.COM [v4 TTL 6] [v4 success] [v6 unexpected]
;    192.54.112.32 [srtt 27] [flags 00000000] [ttl 1796]
; G3.NSTLD.COM [v4 TTL 6] [v4 success] [v6 unexpected]
;    192.42.93.32 [srtt 15] [flags 00000000] [ttl 1796]
; E3.NSTLD.COM [v4 TTL 6] [v4 success] [v6 unexpected]
;    192.12.94.32 [srtt 17] [flags 00000000] [ttl 1796]
; D3.NSTLD.COM [v4 TTL 6] [v4 success] [v6 unexpected]
;    192.31.80.32 [srtt 10] [flags 00000000] [ttl 1796]
; C3.NSTLD.COM [v4 TTL 6] [v4 success] [v6 unexpected]
;    192.26.92.32 [srtt 28156] [flags 00000000] [ttl 1796]
; A3.NSTLD.COM [v4 TTL 6] [v4 success] [v6 unexpected]
;    192.5.6.32 [srtt 23155] [flags 00000000] [ttl 1796]
; M.ROOT-SERVERS.NET [v4 TTL 86364] [v4 success] [v6 unexpected]
;    202.12.27.33 [srtt 0] [flags 00000000] [ttl 1764]
; L.ROOT-SERVERS.NET [v4 TTL 86364] [v4 success] [v6 unexpected]
;    198.32.64.12 [srtt 16] [flags 00000000] [ttl 1764]
; K.ROOT-SERVERS.NET [v4 TTL 86364] [v4 success] [v6 unexpected]
;    193.0.14.129 [srtt 22] [flags 00000000] [ttl 1764]
; J.ROOT-SERVERS.NET [v4 TTL 86364] [v4 success] [v6 unexpected]
;    192.58.128.30 [srtt 25] [flags 00000000] [ttl 1764]
; I.ROOT-SERVERS.NET [v4 TTL 86364] [v4 success] [v6 unexpected]
;    192.36.148.17 [srtt 19] [flags 00000000] [ttl 1764]
; H.ROOT-SERVERS.NET [v4 TTL 86364] [v4 success] [v6 unexpected]
;    128.63.2.53 [srtt 19] [flags 00000000] [ttl 1764]
; G.ROOT-SERVERS.NET [v4 TTL 86364] [v4 success] [v6 unexpected]
;    192.112.36.4 [srtt 24] [flags 00000000] [ttl 1764]
; F.ROOT-SERVERS.NET [v4 TTL 86364] [v4 success] [v6 unexpected]
;    192.5.5.241 [srtt 17850] [flags 00000000] [ttl 1764]
; E.ROOT-SERVERS.NET [v4 TTL 86364] [v4 success] [v6 unexpected]
;    192.203.230.10 [srtt 7] [flags 00000000] [ttl 1764]
; D.ROOT-SERVERS.NET [v4 TTL 86364] [v4 success] [v6 unexpected]
;    128.8.10.90 [srtt 8] [flags 00000000] [ttl 1764]
; C.ROOT-SERVERS.NET [v4 TTL 86364] [v4 success] [v6 unexpected]
;    192.33.4.12 [srtt 5] [flags 00000000] [ttl 1764]
; B.ROOT-SERVERS.NET [v4 TTL 86364] [v4 success] [v6 unexpected]
;    192.228.79.201 [srtt 24] [flags 00000000] [ttl 1764]
; A.ROOT-SERVERS.NET [v4 TTL 86364] [v4 success] [v6 unexpected]
;    198.41.0.4 [srtt 29] [flags 00000000] [ttl 1764]
;
; Unassociated entries
;
;    128.210.11.5 [srtt 47718] [flags 00000000] [ttl 1764]
;    128.10.2.5 [srtt 9] [flags 00000000] [ttl 1764]
;    128.42.5.4 [srtt 2] [flags 00000000] [ttl 1764]
;    128.46.154.76 [srtt 6] [flags 00000000] [ttl 1764]
;
; Start view _bind
;
;
; Cache dump of view '_bind'
;
$DATE 20050827190436
;
; Start view _bind
;
;
; Address database dump
;
;
; Unassociated entries
;
; Dump complete

The nameserver that created this named_dump.db file was authoritative only for 0.0.127.in-addr.arpa (although you won’t see that data because we didn’t use rndc dumpdb -all to dump the authoritative data). Only two names have been looked up by this server: galt.cs.purdue.edu and cujo.movie.edu. In the process of looking up galt.cs.purdue.edu, this server cached not only the address of galt, but also the list of nameservers for edu, purdue.edu, cs.purdue.edu, and the addresses for those servers. The name cujo.movie.edu, however, doesn’t really exist (nor does the zone movie.edu, except in our examples), so the server cached the negative response.

Like BIND 8, BIND 9 tags each piece of cached data with information about how trustworthy it is. The trust level is displayed in a comment before the data itself. In the snippet below, the NS record for the root domain is at trust level authanswer.

; authanswer
.            518364    IN NS    A.ROOT-SERVERS.NET.

Here is a complete list of the trust levels you might see in a database dump:

Trust level      Description
secure           DNSSEC-validated
authanswer       Answer from an authoritative server
authauthority    Data from the authority section of an authoritative response
answer           Answer from a nonauthoritative server
glue             Referral data
additional       Data from the additional section of a response
pending          Subject to DNSSEC validation but has not yet been validated
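When a cache dump grows large, tallying records by trust level gives a quick overview of where the cached data came from. Here’s a minimal Python sketch, assuming the layout in the dump above: a comment line naming one of the trust levels in the table precedes the records it applies to, and any other comment (view headers, the address database) ends the group. The function name is hypothetical.

```python
from collections import Counter

TRUST_LEVELS = {"secure", "authanswer", "authauthority", "answer",
                "glue", "additional", "pending"}


def count_by_trust(dump_text):
    """Count cached record lines under each trust-level comment."""
    counts = Counter()
    current = None
    for raw in dump_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("$"):
            continue  # blank lines and directives such as $DATE
        if line.startswith(";"):
            label = line[1:].strip()
            # Any comment that isn't a trust level ends the current group.
            current = label if label in TRUST_LEVELS else None
            continue
        if current is not None:
            counts[current] += 1
    return counts
```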

In the Address database dump section of the previous dump, the nameserver displays some additional data it keeps about other nameservers. Some of the data is associated with the name (the status of its IPv4 and IPv6 addresses), and some is associated with the address (the smoothed round-trip time and the flags, which currently indicate only EDNS0 support).

The next section is the Unassociated entries section. This section is just like the Address database dump section, but the data associated with the name has gone away. The only thing left is the data associated with the address. The first entry in the Address database dump section (M3.NSTLD.COM) has a TTL of 6. In six seconds, the data associated with the name will expire, and the data associated with 192.55.83.32 will be demoted to the Unassociated entries section.
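The srtt values are a handy way to spot servers that have been slow or unresponsive, such as the entries above 20,000 in the dump. This Python sketch extracts each address with its srtt and the name it’s still associated with, if any. The function names are hypothetical, and the regular expressions assume the bracketed layout shown in the Address database dump and Unassociated entries sections.

```python
import re

NAME_RE = re.compile(r"^;\s+(\S+)\s+\[v4 TTL")
ADDR_RE = re.compile(r"^;\s+([0-9A-Fa-f.:]+)\s+\[srtt (\d+)\]")


def parse_adb(dump_text):
    """Return (address, srtt, name-or-None) for each address entry."""
    entries, name = [], None
    for raw in dump_text.splitlines():
        line = raw.strip()
        if line.startswith("; Unassociated entries"):
            name = None  # the addresses that follow have lost their name data
            continue
        m = NAME_RE.match(line)
        if m:
            name = m.group(1)
            continue
        m = ADDR_RE.match(line)
        if m:
            entries.append((m.group(1), int(m.group(2)), name))
    return entries


def slowest(dump_text, count=3):
    """The `count` addresses with the highest smoothed round-trip time."""
    return sorted(parse_adb(dump_text), key=lambda e: e[1], reverse=True)[:count]
```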

Logging Queries

BIND has a feature called query logging that can help diagnose certain problems. When query logging is turned on, a running nameserver logs every query to syslog. This feature can help you find resolver configuration errors because you can verify that the name you think is being looked up really is the name being looked up.

First, make sure that syslog is logging LOG_INFO messages for the daemon facility. Next, turn on query logging. There are several ways to do this: for BIND 8, start the nameserver with -q on the command line or send an ndc querylog command to a running nameserver. For BIND 9.1.0 or later (earlier versions don’t support query logging), use rndc querylog. You’ll start seeing syslog messages like this:

Feb 20 21:43:25 toystory named[3830]:
                     XX+ /192.253.253.2/carrie.movie.edu/A
Feb 20 21:43:32 toystory named[3830]:
                     XX+ /192.253.253.2/4.253.253.192.in-addr.arpa/PTR

Or, if you’re running BIND 9, like this:

Jan 13 18:32:25 toystory named[13976]: info: client 192.253.253.2#1702: query:
                     carrie.movie.edu IN A
Jan 13 18:32:42 toystory named[13976]: info: client 192.253.253.2#1702: query:
                     4.253.253.192.in-addr.arpa IN PTR

These messages include the IP address of the host that made the query, as well as the query itself. Since the first example comes from a BIND 8.2.3 nameserver and these queries are recursive, they begin with XX+. Iterative queries begin with just XX. (Nameservers older than BIND 8.2.1 don’t distinguish between recursive and nonrecursive queries.) After enough queries have been logged, you can turn off query logging by sending another ndc querylog or rndc querylog command to your nameserver.
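If you capture a lot of query log output, you may want to summarize it by client or by query name. This Python sketch parses both message formats shown above, assuming each message sits on a single line in the log file (the examples above are wrapped for the page) and that the fields appear exactly as in those examples. The regular expressions and function names are our own.

```python
import re
from collections import Counter

BIND8_RE = re.compile(r"XX(\+?) /([^/]+)/([^/]+)/(\S+)")
BIND9_RE = re.compile(r"client ([0-9A-Fa-f.:]+)#\d+: query: (\S+) (\S+) (\S+)")


def parse_query(line):
    """Return (client, qname, qtype, recursive), or None for other lines.

    recursive is True/False for BIND 8 (XX+ vs. XX) and None for BIND 9,
    whose messages in the form shown don't carry that flag.
    """
    m = BIND8_RE.search(line)
    if m:
        return m.group(2), m.group(3), m.group(4), m.group(1) == "+"
    m = BIND9_RE.search(line)
    if m:
        return m.group(1), m.group(2), m.group(4), None
    return None


def top_names(lines, count=10):
    """Most frequently queried names in an iterable of syslog lines."""
    names = Counter()
    for line in lines:
        parsed = parse_query(line)
        if parsed:
            names[parsed[1]] += 1
    return names.most_common(count)
```

Comparing the names in this summary with the names you expected your resolvers to look up is a quick way to catch a misconfigured search list.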

If you’re stuck running an older BIND 9 nameserver, you can still see the queries received in named’s debugging output at level 1.

Potential Problem List

Now that we’ve given you a nice set of tools, let’s talk about how you can use them to diagnose real problems. There are some problems that are easy to recognize and correct. We should cover these as a matter of course; they’re some of the most common problems because they’re caused by some of the most common mistakes. Here are the contestants, in no particular order. We call ‘em our “Unlucky Thirteen.”

1. Forgot to Increment Serial Number

The main symptom of this problem is that slave nameservers don’t pick up any changes you made to the zone’s datafile on the primary. The slaves think the zone data hasn’t changed because the serial number is still the same.

How do you check whether you remembered to increment the serial number? Unfortunately, that’s not so easy. If you don’t remember what the old serial number was, and your serial number gives you no indication of when it was updated, there’s no direct way to tell whether it’s changed.[*] When you reload the primary, it loads the updated zone file regardless of whether you’ve changed the serial number. It checks the file’s timestamp, sees that it’s been modified since it last loaded the data, and reads the file. About the best you can do is to use nslookup to compare the data returned by the primary and by a slave. If they return different data, you probably forgot to increment the serial number. If you can remember a recent change you made, you can look for that data. If you can’t remember a recent change, you can try transferring the zone from a primary and from a slave, sorting the results, and using diff to compare them.
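The transfer, sort, and diff approach might look like this in Python. It’s a sketch that assumes dig is installed and that both servers allow zone transfers to your host; the function names are hypothetical. Comment lines beginning with a semicolon (dig’s headers and statistics) are dropped and whitespace is collapsed before sorting, so neither shows up as a spurious difference.

```python
import difflib
import subprocess


def normalize(dig_output):
    """Drop blank and ';' comment lines, collapse whitespace, and sort."""
    lines = []
    for line in dig_output.splitlines():
        line = line.strip()
        if line and not line.startswith(";"):
            lines.append(" ".join(line.split()))
    return sorted(lines)


def zone_records(server, zone):
    """Transfer `zone` from `server` with dig and return sorted record lines."""
    out = subprocess.run(["dig", "@" + server, zone, "axfr"],
                         capture_output=True, text=True, check=True).stdout
    return normalize(out)


def zone_diff(primary_lines, slave_lines):
    """Unified diff of two sorted record lists (empty list if identical)."""
    return list(difflib.unified_diff(primary_lines, slave_lines,
                                     "primary", "slave", lineterm=""))
```

For example, printing "\n".join(zone_diff(zone_records("toystory.movie.edu", "movie.edu."), zone_records("wormhole.movie.edu", "movie.edu."))) shows any records that differ between the primary and a slave such as wormhole.movie.edu.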

The good news is that, although determining whether the zone was transferred is tricky, making sure the zone is transferred is simple. Just increment the serial number on the primary’s copy of the zone datafile and reload the zone on the primary. The slaves should pick up the new data within their refresh interval, or sooner if they use NOTIFY. If you run BIND 9.3 slaves, you can use the new rndc retransfer command to force an immediate zone transfer. To force BIND 8 slaves to transfer the new data, you can delete the backup file and restart named, or execute named-xfer by hand (on the slaves, naturally):

# /usr/sbin/named-xfer -z movie.edu -f bak.movie.edu -s 0 toystory.movie.edu
# echo $?

If named-xfer returns 1 or 4, the zone was transferred successfully. Other return values indicate that no zone was transferred, either because of an error or because the slave thought the zone was up to date. (See the earlier section "How to Use named-xfer" for more details.)

There’s another variation of the “forgot to increment the serial number” problem. We see it in environments where administrators use tools such as h2n to create zone datafiles from the host table. With scripts like h2n, it’s temptingly easy to delete old zone datafiles and create new ones from scratch. Some administrators do this occasionally because they mistakenly believe that data in the old zone datafiles can creep into the new ones. The problem with deleting the zone datafiles is that, without the old datafile to read for the current serial number, h2n starts over at serial number 1. If your zone’s serial number on the primary rolls all the way back to 1 from 598 or what have you, the slaves emit a syslog error message warning you that something might be wrong:

Jun  7 20:14:26 wormhole named[29618]: Zone "movie.edu"
                (class 1) SOA serial# (1) rcvd from [192.249.249.3]
                is < ours (112)

So if the serial number on the primary looks suspiciously low, check the serial number on the slaves, too, and compare them:

% nslookup
Default Server:  toystory.movie.edu
Address:  192.249.249.3

> set q=soa
> movie.edu.
Server:  toystory.movie.edu
Address:  192.249.249.3

movie.edu
        origin = toystory.movie.edu
        mail addr = al.movie.edu
        serial = 1
        refresh = 10800 (3 hours)
        retry   = 3600 (1 hour)
        expire  = 604800 (7 days)
        minimum ttl = 86400 (1 day)
> server wormhole.movie.edu.
Default Server:  wormhole.movie.edu
Addresses:  192.249.249.1, 192.253.253.1

> movie.edu.
Server:  wormhole.movie.edu
Addresses:  192.249.249.1, 192.253.253.1

movie.edu
        origin = toystory.movie.edu
        mail addr = al.movie.edu
        serial = 112
        refresh = 10800 (3 hours)
        retry   = 3600 (1 hour)
        expire  = 604800 (7 days)
        minimum ttl = 86400 (1 day)

wormhole.movie.edu , as a movie.edu slave, should never have a larger serial number than the primary, so clearly something’s amiss.

This problem is really easy to spot, by the way, with the tool we’ll write in Chapter 15.
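Until then, keep in mind that serial numbers aren’t compared as plain integers: they use the 32-bit sequence-space arithmetic of RFC 1982, so a serial just past the wraparound point counts as “greater” than one near 2^32 − 1. Here’s a small Python sketch of that comparison applied to the primary/slave check above. The function names are ours, chosen for illustration.

```python
SERIAL_BITS = 32
HALF = 2 ** (SERIAL_BITS - 1)


def serial_gt(s1, s2):
    """True if serial s1 is greater than s2 under RFC 1982 serial arithmetic."""
    return (s1 < s2 and s2 - s1 > HALF) or (s1 > s2 and s1 - s2 < HALF)


def compare_serials(primary, slave):
    """Describe how a slave's SOA serial relates to the primary's."""
    if primary == slave:
        return "in sync"
    if serial_gt(primary, slave):
        return "slave is behind and should transfer the zone"
    return "slave is ahead: the primary's serial number went backward"


print(compare_serials(1, 112))  # the movie.edu case shown above
```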

2. Forgot to Reload Primary Nameserver

Occasionally, you may forget to reload your primary nameserver after making a change to the configuration file or to a zone datafile. The nameserver won’t know to load the new configuration or the new zone data; it doesn’t automatically check the timestamp of the file and notice that it changed. Consequently, any changes you’ve made won’t be reflected in the nameserver’s data: new zones won’t be loaded, and new records won’t percolate out to the slaves.

To check when you last reloaded the nameserver, scan the syslog output for the most recent entry like this one from a BIND 9 nameserver:

Mar  8 17:22:08 toystory named[22317]: loading configuration from '/etc/named.conf'

Or like this for a BIND 8 nameserver:

Mar  8 17:22:08 toystory named[22317]: reloading nameserver

These messages tell you the last time you sent a reload command to the nameserver. If you killed and then restarted the nameserver, you’ll see an entry like this on a BIND 9 nameserver:

Mar  8 17:22:08 toystory named[22317]: running

On a BIND 8 nameserver, it’d look like:

Mar  8 17:22:08 toystory named[22317]: restarted

If the time of the restart or reload doesn’t correlate with the time you made the last change, reload the nameserver again. And check that you incremented the serial numbers in zone datafiles you changed, too. If you’re not sure when you edited the zone datafile, you can check the file modification time by doing a long listing of the file with ls -l.
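If you do this often, you can script the comparison. The following Python sketch finds the last reload or restart message among the ones listed above and checks it against a zone datafile’s modification time. It assumes the syslog lines are in chronological order and, because classic syslog timestamps omit the year, that you supply the year yourself. The function names are hypothetical.

```python
import os
from datetime import datetime

# Substrings of the reload/restart messages shown above.
RELOAD_MARKERS = (
    "loading configuration from",  # BIND 9 reload
    "reloading nameserver",        # BIND 8 reload
    "]: running",                  # BIND 9 restart
    "]: restarted",                # BIND 8 restart
)


def last_reload(syslog_lines, year):
    """Return the time of the last reload or restart logged by named, or None."""
    latest = None
    for line in syslog_lines:
        if "named[" in line and any(m in line for m in RELOAD_MARKERS):
            stamp = " ".join(line.split()[:3])  # e.g. "Mar 8 17:22:08"
            latest = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return latest


def needs_reload(zone_file, syslog_lines, year):
    """True if zone_file was modified after named last (re)loaded."""
    reloaded = last_reload(syslog_lines, year)
    modified = datetime.fromtimestamp(os.path.getmtime(zone_file))
    return reloaded is None or modified > reloaded
```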

3. Slave Nameserver Can’t Load Zone Data

If a slave nameserver can’t get the current serial number for a zone from its master nameserver, it logs a message via syslog. On a BIND 9 nameserver, that looks like:

Sep 25 22:02:38 wormhole named[21246]: refresh_callback: zone
       movie.edu/IN: failure for 192.249.249.3#53: timed out

On BIND 8, look for:

Jan  6 11:55:25 wormhole named[544]: Err/TO getting serial# for "movie.edu"

If you let this problem fester, the slave will expire the zone. A BIND 9 nameserver will report:

Sep 25 23:20:20 wormhole named[21246]: zone_expire: zone
       movie.edu/IN: expired

A BIND 8 nameserver will log:

Mar  8 17:12:43 wormhole named[22261]: secondary zone
       "movie.edu" expired

Once the zone has expired, you’ll start getting SERVFAIL errors when you query the nameserver for data in the zone:

% nslookup robocop wormhole.movie.edu.
Server:  wormhole.movie.edu
Addresses:  192.249.249.1, 192.253.253.1

*** wormhole.movie.edu can't find robocop.movie.edu: Server failed

There are three leading causes of this problem: a loss in connectivity to the master server due to network failure, an incorrect IP address for the master server in the configuration file, or a syntax error in the zone datafile on the master server. First, check the configuration file’s entry for the zone and see what IP address the slave is attempting to load from:

zone "movie.edu" {
                type slave;
                masters { 192.249.249.3; };
                file "bak.movie.edu";
};

Make sure that’s really the IP address of the master nameserver. If it is, check connectivity to that IP address:

% ping 192.249.249.3 -n 10
PING 192.249.249.3: 64 byte packets

----192.249.249.3 PING Statistics----
10 packets transmitted, 0 packets received, 100% packet loss
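Keep in mind that ping exercises only ICMP, while zone transfers run over TCP port 53, so a host can answer pings and still refuse transfers. Here’s a minimal Python sketch that checks whether anything accepts TCP connections on the master’s port 53. The function name accepts_tcp is ours, and a successful connection shows only that something is listening, not that it’s named.

```python
import socket


def accepts_tcp(address, port=53, timeout=5.0):
    """Return True if something accepts TCP connections on address:port.

    Zone transfers use TCP, so a refused or timed-out connection suggests
    named isn't running there or a firewall is blocking the port.
    """
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

For the master in this example, you’d call accepts_tcp("192.249.249.3").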

If the master server isn’t reachable, make sure that the host the nameserver runs on is really up (e.g., that it’s powered on), or look for a network problem. If the host is reachable, make sure named is running on it and that you can manually transfer the zone:

# /usr/sbin/named-xfer -z movie.edu -f /tmp/db.movie.edu -s 0 192.249.249.3
# echo $?
2

A return code of 2 means that an error occurred. Check to see if there is a syslog message. In this case, there is a message:

Jan  6 14:56:07 zardoz named-xfer[695]: record too short from
[192.249.249.3], zone movie.edu

At first glance, this error looks like a truncation problem. The real problem is easier to see if you use nslookup:

% nslookup - toystory.movie.edu
Default Server:  toystory.movie.edu
Address:  192.249.249.3

> ls movie.edu                   This attempts a zone transfer
[toystory.movie.edu]
*** Can't list domain movie.edu: Query refused

What’s happening here is that named is refusing to allow you to transfer its zone data. The remote server has probably secured its zone data with an allow-transfer substatement.

If the master server is responding as not authoritative for the zone, you’ll see a message like this from your BIND 9 nameserver:

Sep 26 13:29:23 zardoz named[21890]: refresh_callback: zone movie.edu/IN:
     non-authoritative answer from 192.249.249.3#53

Or on BIND 8, like this:

Jan  6 11:58:36 zardoz named[544]: Err/TO getting serial# for "movie.edu"
Jan  6 11:58:36 zardoz named-xfer[793]: [192.249.249.3] not authoritative for
     movie.edu, SOA query got rcode 0, aa 0, ancount 0, aucount 0

If this is the correct master server, it should be authoritative for the zone. The error probably indicates that the master had a problem loading the zone, usually because of a syntax error in the zone datafile. Contact the administrator of the master server and have her check her syslog output for indications of a syntax error (see the section "5. Syntax Error in Configuration File or Zone Datafile").

4. Added Name to Zone Datafile but Forgot to Add PTR Record

Because DNS maps hostnames to IP addresses independently of how it maps IP addresses back to hostnames, it’s easy to forget to add a PTR record for a new host. Adding the A record is intuitive, but many people who are used to host tables assume that adding an address record takes care of the reverse mapping, too. That’s not true: you need to add a PTR record for the host to the appropriate reverse-mapping zone.

Forgetting to add the PTR record for a host’s address usually causes that host to fail authentication checks. For example, users on the host won’t be able to rlogin to other hosts without specifying a password, and rsh or rcp to other hosts simply won’t work. The servers these commands talk to must be able to map a client’s IP address to a domain name to check .rhosts and hosts.equiv. These users’ connections will cause entries like this to be syslogged:

Aug 15 17:32:36 toystory inetd[23194]: login/tcp:
       Connection from unknown (192.249.249.23)

Also, some network servers on the Internet, including certain FTP servers, deny access to hosts whose IP addresses don’t map back to domain names. An attempt to access such a server might produce an error message like this:

530- Sorry, we're unable to map your IP address 140.186.66.1 to a hostname
530- in the DNS.  This is probably because your nameserver does not have a
530- PTR record for your address in its tables, or because your reverse
530- nameservers are not registered.  We refuse service to hosts whose
530- names we cannot resolve.

That makes the reason you can’t use the service pretty evident. Other servers, however, don’t bother printing informative messages; they simply deny service.

nslookup is handy for checking whether you’ve forgotten the PTR record:

% nslookup
Default Server:  toystory.movie.edu
Address:  192.249.249.3

> beetlejuice       Check for a name-to-address mapping
Server:  toystory.movie.edu
Address:  192.249.249.3

Name:    beetlejuice.movie.edu
Address:  192.249.249.23

> 192.249.249.23    Now check for a corresponding address-to-name mapping
Server:  toystory.movie.edu
Address:  192.249.249.3

*** toystory.movie.edu can't find 192.249.249.23: Non-existent domain

On the primary for 249.249.192.in-addr.arpa, a quick check of the db.192.249.249 file will tell you if the PTR record hasn’t been added to the zone datafile or if the nameserver hasn’t been reloaded. If the nameserver having trouble is a slave for the zone, check that the serial number was incremented on the primary and that the slave has had enough time to load the zone.
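If you want to catch missing PTR records before your users do, you can cross-check the forward and reverse data. Here’s a Python sketch, assuming you’ve already extracted the A records as (name, address) pairs and the PTR records as a dictionary keyed by reverse name; how you extract them (from the zone datafiles or from a zone transfer) is up to you. The standard library’s ipaddress module computes the in-addr.arpa name for each address.

```python
import ipaddress


def missing_ptrs(a_records, ptr_records):
    """Return (name, address, reverse name) for A records lacking a matching PTR.

    a_records:   iterable of (fully qualified name, IPv4 address) pairs
    ptr_records: dict mapping reverse names such as
                 "23.249.249.192.in-addr.arpa" to fully qualified targets
    """
    def canon(name):
        # Compare names case-insensitively, with or without the trailing dot.
        return name.rstrip(".").lower()

    missing = []
    for name, addr in a_records:
        rev = ipaddress.ip_address(addr).reverse_pointer
        if canon(ptr_records.get(rev, "")) != canon(name):
            missing.append((name, addr, rev))
    return missing
```

Using pairs rather than a name-to-address dictionary lets a multihomed host like wormhole appear once per address.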

5. Syntax Error in Configuration File or Zone Datafile

Syntax errors in a nameserver’s configuration file and in zone datafiles are also relatively common (more or less, depending on the experience of the administrator). Generally, an error in the config file will cause the nameserver to fail to load one or more zones. Some typos in the options statement will cause the nameserver to fail to start at all and to log an error like this via syslog (BIND 9):

Sep 26 13:39:30 toystory named[21924]: change directory to '/var/name' failed: file
    not found
Sep 26 13:39:30 toystory named[21924]: options configuration failed: file not found
Sep 26 13:39:30 toystory named[21924]: loading configuration: failure
Sep 26 13:39:30 toystory named[21924]: exiting (due to fatal error)

A BIND 8 nameserver logs:

Jan  6 11:59:29 toystory named[544]: can't change directory to /var/name: No
     such file or directory

Note that you won’t see an error message when you try to start named on the command line or at boot time, but named won’t stay running for long.

If the syntax error is in a less important line in the config file, say, in a zone statement, you might expect only that zone to be affected, and on a BIND 8 nameserver that’s usually the case: the nameserver won’t be able to load the zone at all (say, you misspell “masters” or the name of the zone datafile, or you forget to put quotes around the filename or domain name). A BIND 9 nameserver is stricter: as this syslog output shows, a parse error anywhere in the config file keeps it from loading its configuration at all:

Sep 26 13:43:03 toystory named[21938]: /etc/named.conf:80:
    parse error near 'masters'
Sep 26 13:43:03 toystory named[21938]: loading configuration: failure
Sep 26 13:43:03 toystory named[21938]: exiting (due to fatal error)

Or, from BIND 8:

Jan  6 12:01:36 toystory named[841]: /etc/named.conf:10: syntax error near
     'movie.edu'

If a zone datafile contains a syntax error, the nameserver may reject the zone. In that case, it either answers as nonauthoritative for all data in the zone or returns a SERVFAIL error for lookups in the zone:

% nslookup carrie.movie.edu.
Server:  toystory.movie.edu
Address:  192.249.249.3

*** toystory.movie.edu can't find carrie.movie.edu.: Server failed

Here’s the BIND 9 syslog message produced by the syntax error that caused this problem:

Sep 26 13:45:40 toystory named[21951]: error: dns_rdata_fromtext: db.movie.edu:11:
    near 'postmanrings2x': unexpected token
Sep 26 13:45:40 toystory named[21951]: error: dns_zone_load: zone movie.edu/IN:
    database db.movie.edu: dns_db_load failed: unexpected token
Sep 26 13:45:40 toystory named[21951]: critical: loading zones: unexpected token
Sep 26 13:45:40 toystory named[21951]: critical: exiting (due to fatal error)

Here’s BIND 8’s error:

Jan  6 15:07:46 toystory named[693]: db.movie.edu:11: Priority error
     (postmanrings2x.movie.edu.)
Jan  6 15:07:46 toystory named[693]: master zone "movie.edu" (IN) rejected due
     to errors (serial 1997010600)

If you looked in the zone datafile for the problem, you’d find this record:

postmanrings2x     IN     MX     postmanrings2x.movie.edu.

The MX record is missing the preference field, which causes the error.
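A crude lint pass can catch this particular mistake before you reload. Here’s a Python sketch that handles only the simple one-line form owner [ttl] [class] MX preference exchange; it ignores continuation lines, omitted owner names, and comments, so treat it as an illustration rather than a real zone file parser. The function name check_mx is hypothetical.

```python
def check_mx(line):
    """Return a complaint about a malformed MX record line, or None."""
    fields = line.split()
    upper = [f.upper() for f in fields]
    try:
        i = upper.index("MX", 1)  # start at 1 so an owner named "mx" is skipped
    except ValueError:
        return None  # not an MX record
    rdata = fields[i + 1:]
    if len(rdata) != 2:
        return f"MX needs a preference and an exchange, got {rdata}"
    if not rdata[0].isdigit() or int(rdata[0]) > 65535:
        return f"bad MX preference {rdata[0]!r}"
    return None


print(check_mx("postmanrings2x     IN     MX     postmanrings2x.movie.edu."))
```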

Note that unless you trace the SERVFAIL errors (or the lack of authority, when you expect the nameserver to be authoritative) back to a problem in the zone, or scan your syslog file assiduously, you might never notice the syntax error!

Also, an “invalid” hostname can be a syntax error:

Jan  6 12:04:10 toystory named[841]: owner name "ID_4.movie.edu" IN (primary)
     is invalid - rejecting
Jan  6 12:04:10 toystory named[841]: db.movie.edu:11: owner name error
Jan  6 12:04:10 toystory named[841]: db.movie.edu:11: Database error near (A)
Jan  6 12:04:10 toystory named[841]: master zone "movie.edu" (IN) rejected
     due to errors (serial 1997010600)
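BIND’s name checking is roughly based on the host name rules of RFC 952 as relaxed by RFC 1123: letters, digits, and hyphens only, with no leading or trailing hyphen. The following Python sketch approximates those rules; it isn’t BIND’s own check, and the function name bad_labels is ours. It flags the underscore in ID_4.

```python
import re

# A host name label: letters, digits, and hyphens, not starting or ending
# with a hyphen, at most 63 characters.
LABEL = re.compile(r"[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def bad_labels(hostname):
    """Return the labels of hostname that break the host name rules."""
    return [lab for lab in hostname.rstrip(".").split(".")
            if not LABEL.fullmatch(lab)]


print(bad_labels("ID_4.movie.edu"))
```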

6. Missing Dot at the End of a Domain Name in a Zone Datafile

It’s very easy to leave off trailing dots when editing a zone datafile. Since the rules for when to use them vary so much from file to file (don’t use them in the configuration file, don’t use them in resolv.conf, do use them in zone datafiles to override $ORIGIN . . .), it’s hard to keep them straight. These resource records:

zorba         IN     MX     10 zelig.movie.edu
movie.edu     IN     NS     toystory.movie.edu

really don’t look that odd to the untrained eye, but they probably don’t do what they’re intended to. In the db.movie.edu file, they’d be equivalent to:

zorba.movie.edu.        IN    MX    10 zelig.movie.edu.movie.edu.
movie.edu.movie.edu.    IN    NS    toystory.movie.edu.movie.edu.

unless the origin were explicitly changed.

If you omit a trailing dot after a domain name in the resource record’s data (as opposed to leaving off a trailing dot in the resource record’s name), you usually end up with wacky NS or MX records:

% nslookup -type=mx zorba.movie.edu.
Server:  toystory.movie.edu
Address:  192.249.249.3

zorba.movie.edu      preference = 10, mail exchanger
                     = zelig.movie.edu.movie.edu
zorba.movie.edu      preference = 50, mail exchanger
                     = postmanrings2x.movie.edu.movie.edu

The cause of this should be fairly clear from the nslookup output. But if you forget the trailing dot on the domain name field in a record (as in the movie.edu NS record just listed), spotting your mistake might not be as easy. If you try to look up the record with nslookup, you won’t find it under the domain name you thought you used. Dumping your nameserver’s database may help you root it out:

$ORIGIN edu.movie.edu.
movie    IN    NS    toystory.movie.edu.movie.edu.

The $ORIGIN line looks odd enough to stand out.
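The expansion rule itself is simple enough to model, which makes it easy to flag likely mistakes before you reload. Here’s a Python sketch under our own assumptions: qualify mirrors the origin rule described above, and probably_missing_dot treats any relative name that already ends with the origin (like zelig.movie.edu in the movie.edu zone) as a suspect. Both names are hypothetical.

```python
def qualify(name, origin):
    """Expand a zone-file name relative to origin.

    Names ending in a dot are absolute, "@" means the origin itself, and
    anything else has the origin appended.
    """
    if not origin.endswith("."):
        raise ValueError("origin must be absolute (end with a dot)")
    if name == "@":
        return origin
    if name.endswith("."):
        return name
    return f"{name}.{origin}"


def probably_missing_dot(name, origin):
    """True for a relative name that already ends with the origin."""
    bare = origin.rstrip(".").lower()
    name = name.lower()
    return not name.endswith(".") and (name == bare or name.endswith("." + bare))


for rdata in ("zelig.movie.edu", "toystory.movie.edu."):
    note = "(suspect)" if probably_missing_dot(rdata, "movie.edu.") else ""
    print(rdata, "->", qualify(rdata, "movie.edu."), note)
```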

7. Missing Root Hints Data

You’re unlikely to run into this problem with BIND 9 because it has built-in root hints.

If, for some reason, you forget to install a root hints file on your nameserver or you accidentally delete it, your nameserver will be unable to resolve names outside of its authoritative data. This behavior is easy to recognize using nslookup, but be careful to use full, dot-terminated domain names, or else the search list may cause misleading failures:

% nslookup
Default Server:  toystory.movie.edu
Address:  192.249.249.3

> ftp.uu.net.    A lookup of a name outside your nameserver's authoritative data causes a SERVFAIL error . . .

Server:  toystory.movie.edu
Address:  192.249.249.3

*** toystory.movie.edu can't find ftp.uu.net.: Server failed

A lookup of a name in your nameserver’s authoritative data returns a response:

> wormhole.movie.edu.
Server:  toystory.movie.edu
Address:  192.249.249.3

Name:    wormhole.movie.edu
Addresses:  192.249.249.1, 192.253.253.1

> ^D

To confirm your suspicion that the root hints data is missing, check the syslog output for an error like this:

Jan  6 15:10:22 toystory named[764]: No root nameservers for class IN

Class IN, you’ll remember, is the Internet class (class 1 in numeric form). This error indicates that because no root hints data was available, no root nameservers were found.
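If your configuration does name a root hints file, it’s worth confirming that the file actually lists root nameservers. Here’s a minimal Python sketch, assuming you pass in the contents of the file named in your hint zone statement; it accepts records with or without the TTL and class fields and treats anything after a semicolon as a comment. The function name root_servers is ours.

```python
def root_servers(hints_text):
    """Return the nameserver names a root hints file lists for the root (".")."""
    servers = []
    for raw in hints_text.splitlines():
        fields = raw.split(";", 1)[0].split()  # strip comments, then split
        if len(fields) >= 3 and fields[0] == "." and fields[-2].upper() == "NS":
            servers.append(fields[-1])
    return servers
```

An empty list means the file contains no NS records for the root, which would explain the error above.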

8. Loss of Network Connectivity

Though the Internet is more reliable today than it was back in the wild and woolly days of the ARPAnet, network outages are still relatively common. Without “lifting the hood” and poking around in debugging output, these failures usually look like poor performance:

% nslookup nisc.sri.com.
Server:  toystory.movie.edu
Address:  192.249.249.3

*** Request to toystory.movie.edu timed out ***

If you turn on nameserver debugging, though, you may see that your nameserver, at least, is healthy. It received the query from the resolver, sent the necessary queries, and waited patiently for a response. It just didn’t get one. Here’s what the debugging output might look like on a BIND 8 nameserver:

Debug turned ON, Level 1

Here, nslookup sends the first query to our local nameserver for the IP address of nisc.sri.com. The query is then forwarded to another nameserver, and, when no answer is received, is resent to a different nameserver:

datagram from [192.249.249.3].1051, fd 5, len 30
req: nlookup(nisc.sri.com) id 18470 type=1 class=1
req: missed 'nisc.sri.com' as 'com' (cname=0)
forw: forw -> [198.41.0.4].53 ds=7 nsid=58732 id=18470 0ms retry 4 sec
resend(addr=1 n=0) -> [128.9.0.107].53 ds=7 nsid=58732 id=18470 0ms

Now nslookup is getting impatient, and it queries our local nameserver again. Notice that it uses the same source port. The local nameserver ignores the duplicate query and tries forwarding the query two more times:

datagram from [192.249.249.3].1051, fd 5, len 30
req: nlookup(nisc.sri.com) id 18470 type=1 class=1
req: missed 'nisc.sri.com' as 'com' (cname=0)
resend(addr=2 n=0) -> [192.33.4.12].53 ds=7 nsid=58732 id=18470 0ms
resend(addr=3 n=0) -> [128.8.10.90].53 ds=7 nsid=58732 id=18470 0ms

nslookup queries the local nameserver again, and the nameserver fires off more queries:

datagram from [192.249.249.3].1051, fd 5, len 30
req: nlookup(nisc.sri.com) id 18470 type=1 class=1
req: missed 'nisc.sri.com' as 'com' (cname=0)
resend(addr=4 n=0) -> [192.203.230.10].53 ds=7 nsid=58732 id=18470 0ms
resend(addr=0 n=1) -> [198.41.0.4].53 ds=7 nsid=58732 id=18470 0ms
resend(addr=1 n=1) -> [128.9.0.107].53 ds=7 nsid=58732 id=18470 0ms
resend(addr=2 n=1) -> [192.33.4.12].53 ds=7 nsid=58732 id=18470 0ms
resend(addr=3 n=1) -> [128.8.10.90].53 ds=7 nsid=58732 id=18470 0ms
resend(addr=4 n=1) -> [192.203.230.10].53 ds=7 nsid=58732 id=18470 0ms
resend(addr=0 n=2) -> [198.41.0.4].53 ds=7 nsid=58732 id=18470 0ms
Debug turned OFF

On a BIND 9 nameserver, there’s considerably less detail at debug level 1. Still, you can see that the nameserver is trying repeatedly to look up nisc.sri.com:

Sep 26 14:33:27.486 client 192.249.249.3#1028: query: nisc.sri.com A
Sep 26 14:33:27.486 createfetch: nisc.sri.com. A
Sep 26 14:33:32.489 client 192.249.249.3#1028: query: nisc.sri.com A
Sep 26 14:33:32.490 createfetch: nisc.sri.com. A
Sep 26 14:33:42.500 client 192.249.249.3#1028: query: nisc.sri.com A
Sep 26 14:33:42.500 createfetch: nisc.sri.com. A
Sep 26 14:34:02.512 client 192.249.249.3#1028: query: nisc.sri.com A
Sep 26 14:34:02.512 createfetch: nisc.sri.com. A

At higher debug levels, you can actually see the timeouts, but BIND 9.3.2 still doesn’t show the addresses of the remote nameservers tried.

From the BIND 8 debugging output, you can extract a list of the IP addresses of the nameservers that your nameserver tried to query, and then check your connectivity to them. Odds are, ping won’t have much better luck than your nameserver did:

% ping 198.41.0.4 -n 10  ping first nameserver queried
PING 198.41.0.4: 64 byte packets

----198.41.0.4 PING Statistics----
10 packets transmitted, 0 packets received, 100% packet loss
% ping 128.9.0.107 -n 10  ping second nameserver queried
PING 128.9.0.107: 64 byte packets

----128.9.0.107 PING Statistics----
10 packets transmitted, 0 packets received, 100% packet loss

If it does, you should check that the remote nameservers are really running. You might also check whether your Internet firewall is inadvertently blocking your nameserver’s queries. If you’ve upgraded to BIND 8 or 9 recently, see the sidebar "A Gotcha with BIND 8 or 9 and Packet-Filtering Firewalls" in Chapter 11 and see if it applies to you.

If ping can’t get through either, all that’s left to do is locate the break in the network. Utilities like traceroute and ping’s record route option can be very helpful in determining whether the problem is on your network, the destination network, or somewhere in the middle.

Also, use your own common sense when tracking down the break. In this trace, for example, the remote nameservers your nameserver tried to query are all root nameservers. (You might have had their PTR records cached somewhere, so you could find out their domain names.) Now it’s not very likely that each root’s local network went down, nor that the Internet’s backbone networks collapsed entirely. Occam’s razor says that the simplest condition that could cause this behavior—namely, the loss of your network’s link to the Internet—is most likely the cause.
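Extracting the list of queried addresses from BIND 8 debugging output is tedious by hand when the trace is long. Here’s a Python sketch that pulls the destination addresses out of the forw: and resend lines shown in the trace above, in the order they first appear, so you can ping each one. It assumes the BIND 8 line format shown in this section; the regular expression and function name are ours.

```python
import re

# The destination in BIND 8 "forw:" and "resend(...)" debug lines,
# e.g. "-> [198.41.0.4].53".
DEST = re.compile(r"\b(?:forw|resend)\b.*?->\s*\[(\d{1,3}(?:\.\d{1,3}){3})\]\.53\b")


def queried_servers(debug_lines):
    """Return distinct nameserver addresses queried, in first-seen order."""
    seen = []
    for line in debug_lines:
        match = DEST.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen
```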

9. Missing Subdomain Delegation

Even though registrars do their very best to process your requests as quickly as possible, it may take a day or two for your subdomain’s delegation to appear in your parent zone’s nameservers. If your parent zone isn’t one of the generic top-level domains, your mileage may vary. Some parents are quick and responsible, others are slow and inconsistent. Just like in real life, though, you’re stuck with them.

Until your zone’s delegation appears in your parent zone’s nameservers, your nameservers will be able to look up data in the Internet’s namespace, but no one out on the Internet (outside of your domain) will know how to look up data in your namespace.

That means that even though you may be able to send mail outside of your domain, the recipients won’t be able to reply to it. Furthermore, no one will be able to ssh to, ftp to, or even ping your hosts by domain name.

Remember that this applies equally to any in-addr.arpa zones you may run. Until their parent zones add delegation to your servers, nameservers on the Internet won’t be able to reverse-map addresses on your networks.

To determine whether your zone’s delegation has made it into your parent zone’s nameservers, query a parent nameserver for the NS records for your zone. If the parent nameserver has the data, any nameserver on the Internet can find it:

% nslookup
Default Server:  toystory.movie.edu
Address:  192.249.249.3

> server a.root-servers.net.  Query a root nameserver
Default Server:  a.root-servers.net
Address:  198.41.0.4

> set norecurse              Instruct the server to answer out of its own data
> set type=ns                and to look for NS records
> 249.249.192.in-addr.arpa.  for 249.249.192.in-addr.arpa
Server:  a.root-servers.net
Address:  198.41.0.4

192.in-addr.arpa        nameserver = chia.ARIN.NET
192.in-addr.arpa        nameserver = dill.ARIN.NET
192.in-addr.arpa        nameserver = BASIL.ARIN.NET
192.in-addr.arpa        nameserver = henna.ARIN.NET
192.in-addr.arpa        nameserver = indigo.ARIN.NET
192.in-addr.arpa        nameserver = epazote.ARIN.NET
192.in-addr.arpa        nameserver = figwort.ARIN.NET

> server dill.arin.net.  Query an in-addr.arpa nameserver
Server:  dill.arin.net
Address:  192.35.51.32

> 249.249.192.in-addr.arpa.
Server:  dill.arin.net
Address:  192.35.51.32

*** dill.arin.net can't find 249.249.192.in-addr.arpa.: Non-existent domain

Here, the delegation clearly hasn’t been added yet. You can either wait patiently or, if an unreasonable amount of time has passed since you requested delegation from your parent zone, contact your parent zone’s administrator and ask what’s up.
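What set norecurse does is clear a single bit, RD (recursion desired), in the query’s header. To show what actually goes over the wire, here’s a Python sketch that builds a query in RFC 1035 format with RD cleared and sends it over UDP. The function names are ours; for everyday checking, dig’s +norecurse option does the same job.

```python
import socket
import struct

QTYPE_NS, QTYPE_SOA, CLASS_IN = 2, 6, 1


def build_query(qname, qtype, recurse=False, query_id=0x1234):
    """Build a DNS query message in RFC 1035 wire format.

    With recurse=False the RD bit is clear, like nslookup's "set norecurse":
    the server answers only from its own data.
    """
    flags = 0x0100 if recurse else 0x0000  # RD is bit 0x0100 of the flags word
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)  # one question
    labels = [label for label in qname.split(".") if label]
    question = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels)
    return header + question + b"\x00" + struct.pack("!HH", qtype, CLASS_IN)


def ask(server, message, timeout=5.0):
    """Send message to server's port 53 over UDP and return the raw reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(message, (server, 53))
        reply, _ = sock.recvfrom(4096)
    return reply
```

For the example above, you’d send build_query("249.249.192.in-addr.arpa.", QTYPE_NS) to the address nslookup reported for the parent’s nameserver and decode the reply’s header and answer sections.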

10. Incorrect Subdomain Delegation

Incorrect subdomain delegation is another familiar problem on the Internet. Keeping delegation up to date requires human intervention—informing your parent zone’s administrator of changes to your set of authoritative nameservers. Consequently, delegation information often becomes inaccurate as administrators make changes without letting their parents know. Far too many administrators believe that setting up delegation is a one-shot deal: they let their parents know which nameservers are authoritative once when they set up their zone and then never talk to them again. They don’t even call on Mother’s Day.

An administrator may add a new nameserver, decommission another, and change the IP address of a third, all without telling the parent zone’s administrator. Gradually, the number of nameservers correctly delegated to by the parent zone dwindles. In the best case, this leads to long resolution times as querying nameservers struggle to find an authoritative nameserver for the zone. If the delegation information becomes badly out of date, and the last authoritative nameserver is brought down for maintenance, the information within and below the zone will be inaccessible.

If you suspect bad delegation from your parent zone to your zone, from your zone to one of your children, or from a remote zone to one of its children, you can check with nslookup:

% nslookup
Default Server:  toystory.movie.edu
Address:  192.249.249.3

> server a.root-servers.net.      Set server to the parent zone's nameserver that you suspect has bad delegation
Default Server:  a.root-servers.net
Address:  198.41.0.4

> set type=ns                     Look for NS records
> hp.com.                         for the zone in question
Server:         a.root-servers.net.
Address:        198.41.0.4

Non-authoritative answer:
*** Can't find hp.com.: No answer

Authoritative answers can be found from:
com     nameserver = A.GTLD-SERVERS.NET.
com     nameserver = G.GTLD-SERVERS.NET.
com     nameserver = H.GTLD-SERVERS.NET.
com     nameserver = C.GTLD-SERVERS.NET.
com     nameserver = I.GTLD-SERVERS.NET.
com     nameserver = B.GTLD-SERVERS.NET.
com     nameserver = D.GTLD-SERVERS.NET.
com     nameserver = L.GTLD-SERVERS.NET.
com     nameserver = F.GTLD-SERVERS.NET.
com     nameserver = J.GTLD-SERVERS.NET.
com     nameserver = K.GTLD-SERVERS.NET.
com     nameserver = E.GTLD-SERVERS.NET.
com     nameserver = M.GTLD-SERVERS.NET.
A.GTLD-SERVERS.NET      has AAAA address 2001:503:a83e::2:30
A.GTLD-SERVERS.NET      internet address = 192.5.6.30
G.GTLD-SERVERS.NET      internet address = 192.42.93.30
H.GTLD-SERVERS.NET      internet address = 192.54.112.30
C.GTLD-SERVERS.NET      internet address = 192.26.92.30
I.GTLD-SERVERS.NET      internet address = 192.43.172.30
B.GTLD-SERVERS.NET      has AAAA address 2001:503:231d::2:30
B.GTLD-SERVERS.NET      internet address = 192.33.14.30
D.GTLD-SERVERS.NET      internet address = 192.31.80.30
L.GTLD-SERVERS.NET      internet address = 192.41.162.30
F.GTLD-SERVERS.NET      internet address = 192.35.51.30
J.GTLD-SERVERS.NET      internet address = 192.48.79.30
K.GTLD-SERVERS.NET      internet address = 192.52.178.30
E.GTLD-SERVERS.NET      internet address = 192.12.94.30
M.GTLD-SERVERS.NET      internet address = 192.55.83.30

> server a.gtld-servers.net.                     Switch to a COM nameserver
Default server: a.gtld-servers.net.
Address: 192.5.6.30#53

> hp.com.                     Ask again
Server:         a.gtld-servers.net.
Address:        192.5.6.30#53

Non-authoritative answer:
hp.com  nameserver = am10.hp.com.
hp.com  nameserver = am3.hp.com.
hp.com  nameserver = ap1.hp.com.
hp.com  nameserver = eu1.hp.com.
hp.com  nameserver = eu2.hp.com.
hp.com  nameserver = eu3.hp.com.

Authoritative answers can be found from:
am10.hp.com     internet address = 15.227.128.50
am3.hp.com      internet address = 15.243.160.50
ap1.hp.com      internet address = 15.211.128.50
eu1.hp.com      internet address = 16.14.64.50
eu2.hp.com      internet address = 16.6.64.50
eu3.hp.com      internet address = 16.8.64.50

Let’s say you suspect that the delegation to am10.hp.com is incorrect. You now query am10.hp.com for data in the hp.com zone (e.g., the SOA record for hp.com) and check the answer:

> server am10.hp.com.
Default Server:  am10.hp.com
Addresses:  15.227.128.50

> set norecurse
> set type=soa
> hp.com.
Server:  am10.hp.com
Addresses:  15.227.128.50

Non-authoritative answer:
hp.com
        origin = charon.core.hp.com
        mail addr = hostmaster.hp.com
        serial = 1008811
        refresh = 3600
        retry = 900
        expire = 604800
        minimum = 600

Authoritative answers can be found from:
hp.com  nameserver = eu3.hp.com.
hp.com  nameserver = am3.hp.com.
hp.com  nameserver = ap1.hp.com.
hp.com  nameserver = eu1.hp.com.
hp.com  nameserver = eu2.hp.com.
am3.hp.com      internet address = 15.243.160.50
ap1.hp.com      internet address = 15.211.128.50
eu1.hp.com      internet address = 16.14.64.50
eu2.hp.com      internet address = 16.6.64.50
eu3.hp.com      internet address = 16.8.64.50

If am10.hp.com really were authoritative for hp.com, it would have responded with an authoritative answer. The administrator of the hp.com zone can tell you whether am10.hp.com should be an authoritative nameserver for hp.com, so that’s who you should contact.
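The “Non-authoritative answer” label and the “aa 0” in the BIND 8 log message shown earlier both come from the AA (authoritative answer) bit in the response header. If you have a raw DNS response in hand, here’s a Python sketch that decodes its fixed 12-byte header per RFC 1035, section 4.1.1. The function name and dictionary keys are ours.

```python
import struct

RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def header_summary(message):
    """Decode the fixed 12-byte header of a DNS message."""
    ident, flags, qdcount, ancount, nscount, arcount = struct.unpack("!6H", message[:12])
    rcode = flags & 0x000F
    return {
        "id": ident,
        "response": bool(flags & 0x8000),       # QR bit
        "authoritative": bool(flags & 0x0400),  # AA bit
        "truncated": bool(flags & 0x0200),      # TC bit
        "rcode": RCODES.get(rcode, rcode),
        "ancount": ancount,
        "nscount": nscount,
    }
```

A server that should be authoritative for a zone but returns authoritative set to False for data in that zone is exactly the kind of bad delegation described above.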

Another common symptom of this is a “lame server” error message:

Oct 1 04:43:38 toystory named[146]: Lame server on '40.234.23.210.in-addr.arpa'
(in '210.in-addr.arpa'?): [198.41.0.5].53 'RS0.INTERNIC.NET':
learnt(A=198.41.0.21,NS=128.63.2.53)

Here’s how to read this: your nameserver was referred by the nameserver at 128.63.2.53 to the nameserver at 198.41.0.5 for a name in the domain 210.in-addr.arpa, specifically 40.234.23.210.in-addr.arpa. The response from the nameserver at 198.41.0.5 indicated that it wasn’t, in fact, authoritative for 210.in-addr.arpa, and therefore either the delegation that 128.63.2.53 gave you is wrong, or the server at 198.41.0.5 is misconfigured.

11. Syntax Error in resolv.conf

Despite the resolv.conf file’s simple syntax, people do occasionally make mistakes when editing it. And, unfortunately, lines with syntax errors in resolv.conf are silently ignored by the resolver. The result is usually that some part of your intended configuration doesn’t take effect: either your local domain name or search list isn’t set correctly, or the resolver won’t query one of the nameservers you configured it to query. Commands that rely on the search list won’t work, your resolver won’t query the right nameserver, or it won’t query a nameserver at all.

The easiest way to check whether your resolv.conf file is having the intended effect is to run nslookup. nslookup will kindly report the local domain name and search list it derives from resolv.conf, plus the nameserver it’s querying, when you type set all, as we showed you in Chapter 12:

% nslookup
Default Server:  toystory.movie.edu
Address:  192.249.249.3

> set all
Default Server:  toystory.movie.edu
Address:  192.249.249.3

Set options:
  novc                  nodebug         nod2
  search                recurse
  timeout = 0           retry = 3       port = 53
  querytype = A         class = IN
  srchlist=movie.edu

>

Check that the output of set all is what you expect, given your resolv.conf file. For example, if you set search fx.movie.edu movie.edu in resolv.conf, you’d expect to see:

srchlist=fx.movie.edu/movie.edu

in the output. If you don’t see what you’re expecting, look carefully at resolv.conf. If there’s nothing obvious, look for unprintable characters (with vi’s set list command, for example). Watch out for trailing spaces, especially; on older resolvers, a trailing space after the domain name will set the local domain name to include a space. No real top-level domain names actually end with spaces, of course, so all of your non-dot-terminated lookups will fail.
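vi's set list is the manual approach, but the same checks are easy to automate. Here's a rough Python sketch that flags trailing whitespace, unprintable characters, and keywords it doesn't recognize. The keyword set is an assumption based on the common BIND resolver directives; your resolver may accept others, so treat "unknown keyword" as a hint rather than a verdict.

```python
from typing import List

# Directives understood by the BIND resolver (an assumption; other
# resolvers may accept more). Anything else is merely flagged.
KNOWN_KEYWORDS = {"domain", "search", "nameserver", "sortlist", "options"}


def lint_resolv_conf(text: str) -> List[str]:
    """Flag lines the resolver might silently ignore or misread."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", ";")):
            continue  # blank line or comment
        if line != line.rstrip():
            problems.append(f"line {lineno}: trailing whitespace")
        if any(not ch.isprintable() and ch != "\t" for ch in line):
            problems.append(f"line {lineno}: unprintable character")
        keyword = line.split()[0]
        if keyword not in KNOWN_KEYWORDS:
            problems.append(f"line {lineno}: unknown keyword {keyword!r}")
    return problems


# Hypothetical resolv.conf with a trailing space and a misspelled keyword.
sample = "domain fx.movie.edu \nserach fx.movie.edu movie.edu\nnameserver 192.249.249.3\n"
for problem in lint_resolv_conf(sample):
    print(problem)
```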

12. Local Domain Name Not Set

Failing to set your local domain name is another old standby gaffe. You can set it implicitly by setting your hostname to your host’s fully qualified domain name or explicitly in resolv.conf. The characteristics of an unset local domain name are straightforward: folks who use single-label names (or abbreviated domain names) in commands get no joy:

% telnet br
br: No address associated with name
% telnet br.fx
br.fx: No address associated with name
% telnet br.fx.movie.edu
Trying...
Connected to bladerunner.fx.movie.edu.
Escape character is '^]'.

HP-UX bladerunner.fx.movie.edu A.08.07 A 9000/730 (ttys1)
login:

You can use nslookup to check this one, much as you do when you suspect a syntax error in resolv.conf:

% nslookup
Default Server:  toystory.movie.edu
Address:  192.249.249.3

> set all
Default Server:  toystory.movie.edu
Address:  192.249.249.3

Set options:
  novc                  nodebug         nod2
  search                recurse
  timeout = 0           retry = 3       port = 53
  querytype = A         class = IN
  srchlist=

Notice that the search list is empty. You can also track this down by enabling debugging on the nameserver. (This, of course, requires access to the nameserver, which may not be running on the host that the problem is affecting.) Here's how the debugging output from a BIND 9 nameserver might look after trying those telnet commands:

Sep 26 16:17:58.824 client 192.249.249.3#1032: query: br A
Sep 26 16:17:58.825 createfetch: br. A
Sep 26 16:18:09.996 client 192.249.249.3#1032: query: br.fx A
Sep 26 16:18:09.996 createfetch: br.fx. A
Sep 26 16:18:18.677 client 192.249.249.3#1032: query: br.fx.movie.edu A

On a BIND 8 nameserver, it would look something like this:

Debug turned ON, Level 1

datagram from [192.249.249.3].1057, fd 5, len 20
req: nlookup(br) id 27974 type=1 class=1
req: missed 'br' as '' (cname=0)
forw: forw -> [198.41.0.4].53 ds=7 nsid=61691 id=27974 0ms retry 4 sec

datagram from [198.41.0.4].53, fd 5, len 20
ncache: dname br, type 1, class 1
send_msg -> [192.249.249.3].1057 (UDP 5) id=27974

datagram from [192.249.249.3].1059, fd 5, len 23
req: nlookup(br.fx) id 27975 type=1 class=1
req: missed 'br.fx' as '' (cname=0)
forw: forw -> [128.9.0.107].53 ds=7 nsid=61692 id=27975 0ms retry 4 sec

datagram from [128.9.0.107].53, fd 5, len 23
ncache: dname br.fx, type 1, class 1
send_msg -> [192.249.249.3].1059 (UDP 5) id=27975

datagram from [192.249.249.3].1060, fd 5, len 33
req: nlookup(br.fx.movie.edu) id 27976 type=1 class=1
req: found 'br.fx.movie.edu' as 'br.fx.movie.edu' (cname=0)
req: nlookup(bladerunner.fx.movie.edu) id 27976 type=1 class=1
req: found 'bladerunner.fx.movie.edu' as 'bladerunner.fx.movie.edu'
     (cname=1)
ns_req: answer -> [192.249.249.3].1060 fd=5 id=27976 size=183 Local
Debug turned OFF

Contrast this with the debugging output produced by the application of the search list in Chapter 13. The only names looked up here are exactly what the user typed, with no domain names appended at all. Clearly, the search list isn’t being applied.
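To see why only the literal names show up, it helps to model what the resolver does with a search list. The following Python sketch is a simplification, assuming the behavior of newer BIND resolvers with the ndots option at its default of 1. It ignores retries, error handling, and the different rules of older resolvers, and candidate_names is our own name.

```python
from typing import List, Sequence


def candidate_names(typed: str, search_list: Sequence[str], ndots: int = 1) -> List[str]:
    """Simplified model of the order in which a resolver tries names."""
    if typed.endswith("."):
        return [typed]  # already fully qualified: the search list isn't used
    as_is = typed + "."
    appended = [f"{typed}.{domain.rstrip('.')}." for domain in search_list]
    if typed.count(".") >= ndots:
        return [as_is] + appended  # enough dots: try the name as typed first
    return appended + [as_is]


# With an empty search list, only what the user typed is ever queried.
print(candidate_names("br", []))                             # ['br.']
print(candidate_names("br", ["fx.movie.edu", "movie.edu"]))  # ['br.fx.movie.edu.', 'br.movie.edu.', 'br.']
```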

13. Response from Unexpected Source

One problem we’ve seen increasingly often in the DNS newsgroups is the “response from unexpected source.” This was once called a martian response: it’s a response that comes from an IP address other than the one your nameserver sent a query to. When a BIND nameserver sends a query to a remote server, BIND conscientiously makes sure that answers come only from the IP addresses on that server. This helps minimize the possibility of accepting spoofed responses. BIND is equally demanding of itself: a BIND server makes every effort to reply via the same network interface that it received a query on.

Here’s the error message you’d see upon receiving a possibly unsolicited response:

Mar  8 17:21:04 toystory named[235]: Response from unexpected source
     ([205.199.4.131].53)

This can mean one of two things: either someone is trying to spoof your nameserver, or—more likely—you sent a query to an older BIND server or a different make of nameserver that’s not as assiduous about replying from the same interface it receives queries on.
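The check itself is simple in principle. This Python sketch shows only the idea described above, that a reply must come from one of the addresses of the server that was queried; it isn't BIND's code, and the server addresses are hypothetical documentation addresses.

```python
import ipaddress
from typing import Iterable


def from_expected_source(src_ip: str, server_addresses: Iterable[str]) -> bool:
    """True if a reply's source address belongs to the server we queried."""
    src = ipaddress.ip_address(src_ip)
    return any(src == ipaddress.ip_address(addr) for addr in server_addresses)


# Hypothetical multihomed server; we sent the query to one of its addresses.
server = ["192.0.2.53", "198.51.100.53"]
print(from_expected_source("198.51.100.53", server))  # True: another of its addresses
print(from_expected_source("205.199.4.131", server))  # False: this reply would be logged
```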

Transition Problems

With the release of BIND 8, and now BIND 9, many Unix operating systems are updating their resolvers and nameservers. Some features of the most recent versions of BIND, however, may seem like errors to you after you upgrade to a new version. We’ll try to give you an idea of some changes you may notice in your nameserver and name service after making the jump.

Resolver Behavior

The changes to the resolver’s default search list described in Chapter 6 may seem like a problem to your users. Recall that with a local domain name set to fx.movie.edu, your default search list will no longer include movie.edu. Therefore, users accustomed to using commands such as ssh db.personnel and having the partial domain name expanded to db.personnel.movie.edu will have their commands fail. To solve this problem, you can use the search directive to define an explicit search list that includes your local domain name’s parent. Or just tell your users to expect the new behavior.
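Here's a small Python model of the two behaviors, assuming the older rule (the local domain name plus each parent that still has at least two labels) and the newer one (just the local domain name). It's an illustration, not resolver code, and default_search_list is our own name.

```python
from typing import List


def default_search_list(local_domain: str, old_style: bool = False) -> List[str]:
    """Default search list derived from the local domain name (simplified)."""
    labels = local_domain.rstrip(".").split(".")
    if not old_style:
        # Newer resolvers: just the local domain name.
        return [".".join(labels)]
    # Older resolvers: the local domain name plus each parent that
    # still has at least two labels.
    return [".".join(labels[i:]) for i in range(max(len(labels) - 1, 1))]


print(default_search_list("fx.movie.edu", old_style=True))  # ['fx.movie.edu', 'movie.edu']
print(default_search_list("fx.movie.edu"))                  # ['fx.movie.edu']
```

The explicit fix is the same as before: a search fx.movie.edu movie.edu line in resolv.conf restores the parent.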

Nameserver Behavior

Before Version 4.9, a BIND nameserver would gladly load data in any zone from any zone datafile that the nameserver read as a primary. If you configured the nameserver as the primary for movie.edu and told it that the movie.edu data was in db.movie.edu, you could stick data about hp.com in db.movie.edu, and your nameserver would load the hp.com resource records into the cache. Some books even suggested putting the data for all your in-addr.arpa zones in one file. Ugh.

All BIND 4.9 and later nameservers ignore any “out of zone” resource records in a zone datafile. So if you cram PTR records for all your in-addr.arpa zones into one file and load it with a single zone statement, the nameserver ignores all the records not in the named zone. And that, of course, means loads of missing PTR records and failed gethostbyaddr() calls.

BIND does log that it’s ignoring the records in syslog. The messages look like this in BIND 9:

Sep 26 13:48:19 toystory named[21960]: dns_master_load: db.movie.edu:16:
ignoring out-of-zone data

and like this in BIND 8:

Jan  7 13:58:01 toystory named[231]: db.movie.edu:16: data "hp.com" outside zone
     "movie.edu" (ignored)
Jan  7 13:58:01 toystory named[231]: db.movie.edu:17: data "hp.com" outside zone
     "movie.edu" (ignored)

The solution is to use one zone datafile and one zone statement per zone.
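If you've inherited a zone datafile and want to know ahead of time which records will be dropped, the test is just a suffix match on labels. Here's a Python sketch, assuming you've already extracted fully qualified owner names; it doesn't parse zone datafiles, $ORIGIN, or relative names, and the function names are ours.

```python
from typing import Iterable, List


def in_zone(owner: str, zone: str) -> bool:
    """True if owner is the zone's domain name or a name below it."""
    owner = owner.rstrip(".").lower()
    zone = zone.rstrip(".").lower()
    return owner == zone or owner.endswith("." + zone)


def out_of_zone(owners: Iterable[str], zone: str) -> List[str]:
    """Owner names that a BIND 4.9 or later server would ignore for this zone."""
    return [owner for owner in owners if not in_zone(owner, zone)]


# Hypothetical owner names from a file that crams two reverse zones together.
owners = ["3.249.249.192.in-addr.arpa.", "1.253.253.192.in-addr.arpa."]
print(out_of_zone(owners, "249.249.192.in-addr.arpa"))  # ['1.253.253.192.in-addr.arpa.']
```

Matching on "." plus the zone name, rather than a bare suffix, keeps a name like badmovie.edu from being mistaken for part of movie.edu.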

Interoperability and Version Problems

With the move to BIND 9 and the introduction of Microsoft DNS Server, more interoperability problems are cropping up between nameservers. There are also a handful of problems unique to one version or another of BIND or the underlying operating system. Many of these are easy to spot and correct, and we would be remiss if we didn’t cover them.

Zone Transfer Fails Because of Proprietary WINS Record

When a Microsoft DNS Server is configured to consult a WINS server for names it can’t find in a given zone, it inserts a special record into the zone datafile. The record looks like this:

@   IN   WINS   <IP address of WINS server>

Unfortunately, WINS is not a standard record type in the IN class. Consequently, if there are BIND slaves that transfer this zone, they’ll choke on the WINS record and refuse to load the zone:

May 23 15:58:43 toystory named-xfer[386]: "fx.movie.edu IN 65281" -
unknown type (65281)

The workaround for this is to configure the Microsoft DNS Server to filter out the proprietary record before transferring the zone. You do this by selecting the zone on the left side of the DNS Manager screen, right-clicking on it, and selecting Properties. Click on the WINS Lookup tab in the resulting Zone Properties window, shown in Figure 14-1.

Figure 14-1. Zone Properties window

Checking “Settings only affect local server” filters out the WINS record for that zone. However, any Microsoft DNS Server slaves won't see the record either, even though they could use it.

Nameserver Reports “no NS RR for SOA MNAME”

You’ll see this error only on BIND 8.1 servers:

May 8 03:44:38 toystory named[11680]: no NS RR for SOA MNAME "movie.edu" in
     zone "movie.edu"

The 8.1 server was a real stickler about the first field in the SOA record. Remember that one? In Chapter 4, we said that it was, by convention, the domain name of the primary nameserver for the zone. BIND 8.1 assumes it is and checks for a corresponding NS record pointing the zone’s domain name to the server in that field. If there’s no such NS record, BIND emits that error message. It will also prevent NOTIFY messages from working correctly. The solution is either to change your MNAME field to the domain name of a nameserver listed in an NS record or to upgrade to a newer version of BIND 8. Upgrading is the better option because BIND 8.1 is so old. The check was removed in BIND 8.1.1.
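The check that BIND 8.1 performs is easy to reproduce, either before you upgrade or against zones you don't control. Here's a minimal Python sketch, assuming you've already gathered the MNAME field and the NS record targets (with dig, for example); the function names and the NS list are illustrative.

```python
from typing import Iterable


def _normalize(name: str) -> str:
    """Compare domain names case-insensitively and ignore a trailing dot."""
    return name.rstrip(".").lower()


def mname_has_ns(mname: str, ns_targets: Iterable[str]) -> bool:
    """True if the SOA MNAME matches the target of one of the zone's NS records."""
    return _normalize(mname) in {_normalize(target) for target in ns_targets}


# Hypothetical NS targets for movie.edu.
ns_targets = ["toystory.movie.edu.", "wormhole.movie.edu."]
print(mname_has_ns("movie.edu.", ns_targets))           # False: the case in the log above
print(mname_has_ns("toystory.movie.edu.", ns_targets))  # True
```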

Nameserver Reports “Too many open files”

On hosts with many IP addresses or a low limit on the maximum number of files a user can open, BIND will report:

Dec 12 11:52:06 toystory named[7770]: socket(SOCK_RAW): Too many open files

and die.

Since BIND tries to bind() to and listen on every network interface on the host, it may run out of file descriptors. This is especially common on hosts that use lots of virtual interfaces, often in support of web hosting. The possible solutions are:

  • Use name-based virtual hosting, which doesn’t require additional IP addresses.

  • Configure your BIND 8 or 9 nameserver to listen on only one or a few of the host’s network interfaces using the listen-on substatement. If toystory.movie.edu is the host we’re having this problem with, the following:

    options {
        listen-on { 192.249.249.3; };
    };
    will tell named on toystory.movie.edu to bind() only to the IP address 192.249.249.3.

  • Reconfigure your operating system to allow a process to open more file descriptors concurrently.
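To judge whether the descriptor limit is the culprit, compare the number of addresses on the host with the per-process limit. This Python sketch is a rough estimate only. The sockets_per_address and reserve defaults are assumptions for illustration, since the number of descriptors named really uses depends on the BIND version and configuration. It also reads the limit of the Python process itself, which may differ from named's limit.

```python
import resource


def enough_descriptors(address_count: int, soft_limit: int,
                       sockets_per_address: int = 2, reserve: int = 20) -> bool:
    """Rough check that a descriptor limit covers one socket set per address.

    sockets_per_address and reserve are illustrative assumptions, not
    figures taken from BIND.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return address_count * sockets_per_address + reserve <= soft_limit


soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit:", soft, "hard limit:", hard)
print("enough for 200 addresses?", enough_descriptors(200, soft))
```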

Resolver Reports “asked for PTR, got CNAME”

This is another problem related to BIND’s strictness. On some lookups, the resolver logs:

Sep 24 10:40:11 toystory syslog: gethostby*.getanswer: asked for
     "37.103.74.204.in-addr.arpa IN PTR", got type "CNAME"
Sep 24 10:40:11 toystory syslog: gethostby*.getanswer: asked for
     "37.103.74.204.in-addr.arpa", got "37.32/27.103.74.204.in-addr.arpa"

What happened here is that the resolver asked the nameserver to reverse-map the IP address 204.74.103.37 to a domain name. The server did, but in the process found that 37.103.74.204.in-addr.arpa was actually an alias for 37.32/27.103.74.204.in-addr.arpa. That’s almost certainly because the folks who run 103.74.204.in-addr.arpa are using the scheme we described in Chapter 9 to delegate part of their namespace. The BIND 4.9.3-BETA resolver, however, doesn’t understand that and flags it as an error, thinking it didn’t get the domain name or the type it was after. And, believe it or not, some operating systems ship with the BIND 4.9.3-BETA resolver as their system resolver.
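The two names in that message are related in a predictable way. This Python sketch derives both from the address, assuming the /27 subdelegation and the "first address/prefix length" label that appear in the log above. Other naming conventions for the delegated zone are possible, and reverse_names is our own name.

```python
import ipaddress
from typing import Tuple


def reverse_names(addr: str, prefix_len: int) -> Tuple[str, str]:
    """Return the ordinary reverse-mapping name for addr and the alias it
    would point to under a "first-address/prefix-length" label (assumed)."""
    if not 24 < prefix_len < 32:
        raise ValueError("this naming scheme applies to subnets of a /24")
    ip = ipaddress.IPv4Address(addr)
    network = ipaddress.IPv4Network(f"{addr}/{prefix_len}", strict=False)
    octets = str(ip).split(".")
    first = str(network.network_address).split(".")[3]
    parent = ".".join(reversed(octets[:3])) + ".in-addr.arpa"
    return ip.reverse_pointer, f"{octets[3]}.{first}/{prefix_len}.{parent}"


print(reverse_names("204.74.103.37", 27))
# ('37.103.74.204.in-addr.arpa', '37.32/27.103.74.204.in-addr.arpa')
```

A resolver that follows the CNAME from the first name to the second, as newer BIND resolvers do, gets the PTR record without complaint.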

The only solution to this problem is to upgrade to a newer version of the BIND resolver.

Nameserver Startup Fails Because UDP Checksums Disabled

On some hosts running SunOS 4.1.x, you’ll see this error:

Sep 24 10:40:11 toystory named[7770]: ns_udp checksums NOT turned on: exiting

named checked to make sure UDP checksumming was turned on on this system, and it wasn’t, so named exited. named is insistent on UDP checksumming for good reason: it makes copious use of UDP and needs those UDP datagrams to arrive unmolested.

The solution to this problem is to enable UDP checksums on your system. The BIND distribution has documentation on that in shres/sunos/INSTALL and src/port/sunos/shres/ISSUES (in the BIND 8 distribution).

Other Nameservers Don’t Cache Your Negative Answers

You need a keen eye to notice this problem, and, if you’re running BIND 8, you’d have to have turned off an important feature to have caused the problem. If you’re running BIND 9, though, the feature is turned off by default. If you’re running a BIND 8 or 9 nameserver and other resolvers and servers seem to ignore your server’s cached negative responses, auth-nxdomain is probably off.

auth-nxdomain is an options substatement that tells a BIND 8 or 9 nameserver to flag cached negative responses as authoritative, even though they’re not. That is, if your nameserver has cached the fact that titanic.movie.edu does not exist from the authoritative movie.edu nameservers, auth-nxdomain tells your server to pass along that cached response to resolvers and servers that query it as though it were the authoritative nameserver for movie.edu.

The reason this feature is sometimes necessary is that some nameservers check to make sure that negative responses (such as an NXDOMAIN return code or no records with a NOERROR return code) are marked authoritative. In the days before negative caching, negative responses had to be authoritative, so this was a sensible sanity check. With the advent of negative caching, however, a negative response could come from the cache. To keep older servers from ignoring such answers or treating them as errors, BIND 8 and 9 let you falsely flag those responses as authoritative. In fact, that's the default behavior for a BIND 8 nameserver, so you shouldn't see remote queriers ignoring your BIND 8 server's negative responses unless you've explicitly turned off auth-nxdomain. BIND 9 nameservers, on the other hand, have auth-nxdomain off by default, so queriers may ignore their responses even if you haven't touched the config file.
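Here's the logic in miniature. This Python sketch models the older servers' sanity check and the effect of auth-nxdomain on a cached negative answer. It's a simplification with our own function names, not code from any nameserver.

```python
NOERROR, NXDOMAIN = 0, 3  # DNS response codes


def is_negative(rcode: int, answer_count: int, referral: bool = False) -> bool:
    """NXDOMAIN, or NOERROR with no answers that isn't a referral."""
    if rcode == NXDOMAIN:
        return True
    return rcode == NOERROR and answer_count == 0 and not referral


def old_server_accepts(rcode: int, answer_count: int, aa: bool,
                       referral: bool = False) -> bool:
    """Pre-negative-caching sanity check: negative answers must be authoritative."""
    if is_negative(rcode, answer_count, referral):
        return aa
    return True


def aa_on_negative(from_cache: bool, auth_nxdomain: bool) -> bool:
    """AA bit on a negative answer: set for authoritative data, or for
    cached data when auth-nxdomain is on (simplified)."""
    return not from_cache or auth_nxdomain


# A cached NXDOMAIN passed along by a server with auth-nxdomain off...
print(old_server_accepts(NXDOMAIN, 0, aa_on_negative(True, False)))  # False
# ...and by one with auth-nxdomain on.
print(old_server_accepts(NXDOMAIN, 0, aa_on_negative(True, True)))   # True
```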

TTL Not Set

As we mentioned in Chapter 4, RFC 2308 was published just before BIND 8.2 was released. RFC 2308 changed the semantics of the last field in the SOA record to be the negative-caching TTL and introduced a new control statement, $TTL, to set the default TTL for a zone datafile.

If you upgrade to a BIND 8 nameserver newer than 8.2 without adding the necessary $TTL control statements to your zone datafiles, you’ll see messages like this one in your nameserver’s syslog output:

Sep 26 19:34:39 toystory named[22116]: Zone "movie.edu" (file db.movie.edu): No
default TTL ($TTL <value>) set, using SOA minimum instead

BIND 8 generously assumes that you just haven’t read RFC 2308 yet and is content to use the last field of the SOA record as both the zone’s default TTL and its negative-caching TTL. BIND 9 nameservers older than 9.2.0, however, aren’t so forgiving:

Sep 26 19:35:54 toystory named[22124]: dns_master_load: db.movie.edu:7: no TTL
    specified
Sep 26 19:35:54 toystory named[22124]: dns_zone_load: zone movie.edu/IN:
    database db.movie.edu: dns_db_load failed: no ttl
Sep 26 19:35:54 toystory named[22124]: loading zones: no ttl
Sep 26 19:35:54 toystory named[22124]: exiting (due to fatal error)

So before upgrading to BIND 9, be sure that you add the necessary $TTL control statements.
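Before the upgrade, you can check your zone datafiles in bulk. Here's a rough Python sketch, assuming plain zone datafiles; it checks only whether a $TTL directive comes before the first record and doesn't validate anything else. The function name is ours.

```python
def has_default_ttl(zone_text: str) -> bool:
    """True if a $TTL directive appears before the first resource record.

    Simplified: comments are stripped at the first ';', so a semicolon
    inside a quoted TXT string would confuse it.
    """
    for raw in zone_text.splitlines():
        line = raw.split(";", 1)[0].strip()
        if not line:
            continue
        directive = line.split()[0].upper()
        if directive == "$TTL":
            return True
        if directive.startswith("$"):
            continue  # $ORIGIN, $INCLUDE, and so on
        return False  # reached a resource record first
    return False


# Hypothetical start of a pre-RFC 2308 zone datafile.
old_style = "$ORIGIN movie.edu.\n@ IN SOA toystory.movie.edu. al.movie.edu. (\n"
print(has_default_ttl(old_style))                  # False
print(has_default_ttl("$TTL 86400\n" + old_style))  # True
```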

TSIG Errors

As we said in Chapter 11, transaction signatures require time synchronization and key synchronization (the same key on either end of the transaction, plus the same key name) to work. Here are a couple of errors that may arise if you lose time synchronization or use different keys or key names:

  • Here’s an error you’d see on a BIND 8 nameserver if you had configured TSIG but had too much clock skew between your primary nameserver and a slave:

    Sep 27 10:47:49 wormhole named[22139]: Err/TO getting serial# for "movie.edu"
    Sep 27 10:47:49 wormhole named-xfer[22584]: SOA TSIG verification from server
    [192.249.249.3], zone movie.edu: message had BADTIME set (18)
    Here, your nameserver tries to check the serial number of the movie.edu zone on toystory.movie.edu (192.249.249.3). The response from toystory.movie.edu doesn't verify because wormhole.movie.edu's clock shows a time difference of more than 10 minutes from the time the response was signed. The Err/TO message is just a byproduct of the failure of the TSIG-signed response to verify.

  • If you use a different key name on either end of the transaction, even if the data the key name refers to is the same, you’ll see an error like this one from your BIND 8 nameserver:

    Sep 27 12:02:44 wormhole named-xfer[22651]: SOA TSIG verification from server
    [209.8.5.250], zone movie.edu: BADKEY(-17)
    This time, the TSIG-signed response doesn't check out because the verifier can't find a key with the name specified in the TSIG record. You'd see the same error if the key name matched but pointed to different data.

As always, BIND 9 is considerably more closed-mouthed about TSIG failure, reporting only:

Sep 27 13:35:42.804 client 192.249.249.1#1115: query: movie.edu SOA
Sep 27 13:35:42.804 client 192.249.249.1#1115: error

at debug level 3 for both previous scenarios.
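The two failures boil down to simple comparisons. This Python sketch models only the time check and the key-name lookup, using the BADKEY (17) and BADTIME (18) codes that appear in the BIND 8 messages above. The 600-second fudge merely mirrors the 10-minute figure in the text and is an assumption, since the fudge is carried in each TSIG record. The key names are hypothetical.

```python
from typing import Container

BADKEY, BADTIME = 17, 18  # TSIG error codes (RFC 2845)


def tsig_time_error(now: int, time_signed: int, fudge: int) -> int:
    """BADTIME if the two clocks disagree by more than fudge seconds, else 0."""
    return BADTIME if abs(now - time_signed) > fudge else 0


def tsig_key_error(key_name: str, local_key_names: Container[str]) -> int:
    """BADKEY if the verifier has no key by the name in the TSIG record, else 0."""
    return 0 if key_name in local_key_names else BADKEY


# Times in seconds since the epoch; fudge=600 is an assumed value.
print(tsig_time_error(now=1_000_000, time_signed=1_000_000 - 700, fudge=600))  # 18
# Hypothetical key names that differ only in order.
print(tsig_key_error("toystory-wormhole.", {"wormhole-toystory."}))  # 17
```

Running ntpd (or any time-synchronization service) on both hosts keeps the first check happy; the second is fixed only by using the same key name in both configurations.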

Problem Symptoms

Some problems, unfortunately, aren’t as easy to identify as the ones we listed. You’ll experience some misbehavior but won’t be able to attribute it directly to its cause, often because any of a number of problems can cause the symptoms you see. For cases like this, we’ll suggest some of the common causes of these symptoms and ways to isolate them.

Local Name Can’t Be Looked Up

The first thing to do when a program such as ssh or ftp can’t look up a local domain name is to use nslookup or dig to try to look up the same name. When we say “the same name,” we mean literally the same name: don’t add labels and a trailing dot if the user didn’t type them. Don’t query a different nameserver than the user did.

As often as not, the user mistyped the name or doesn’t understand how the search list works and just needs direction. Occasionally, you’ll turn up real host configuration errors:

  • Syntax errors in resolv.conf (problem 11 in the earlier section “Potential Problem List”)

  • An unset local domain name (problem 12)

You can check for either of these using nslookup’s set all command.

If nslookup points to a problem with the nameserver rather than with the host configuration, check for the problems associated with the type of nameserver. If the nameserver is the primary for the zone, but it isn’t responding with data you think it should:

  • Check that the zone datafile contains the data in question and that the nameserver has loaded it (problem 2). A database dump can tell you for sure whether the data was loaded.

  • Check the configuration file and the pertinent zone datafile for syntax errors (problem 5). Check the nameserver’s syslog output for indications of those errors.

  • Ensure that the records have trailing dots, if they require them (problem 6).

If the nameserver is a slave server for the zone, you should first check whether its master has the correct data. If it does, and the slave doesn’t:

  • Make sure you’ve incremented the serial number on the primary (problem 1).

  • Look for a problem on the slave in updating the zone (problem 3).

If the primary doesn’t have the correct data, of course, diagnose the problem on the primary.

If the problem server is a caching-only nameserver:

  • Make sure it has its root hints (problem 7).

  • Check that your parent zone’s delegation to your zone exists and is correct (problems 9 and 10). Remember that to a caching-only server, your zone looks like any other remote zone. Even though the host it runs on may be inside your zone, the caching-only nameserver must be able to locate an authoritative server for your zone from your parent zone’s servers.

Remote Names Can’t Be Looked Up

If your local lookups succeed but you can’t look up domain names outside your local zones, there is a different set of problems to check:

  • First, did you just set up your nameservers? You might have omitted the root hints data (problem 7).

  • Can you ping the remote zone’s nameservers? Maybe you can’t reach the remote zone’s servers because of connectivity loss (problem 8).

  • Is the remote zone new? Maybe its delegation hasn’t yet appeared (problem 9). Or the delegation information for the remote zone may be wrong or out of date due to neglect (problem 10).

  • Does the domain name actually exist on the remote zone’s servers (problem 2)? On all of them (problems 1 and 3)?

Wrong or Inconsistent Answer

If you get the wrong answer when looking up a local domain name, or an inconsistent answer depending on which nameserver you ask or when you ask, first check the synchronization between your nameservers:

  • Are they all holding the same serial number for the zone? Did you forget to increment the serial number on the primary after you made a change (problem 1)? If you did, the nameservers may all have the same serial number, but they will answer differently out of their authoritative data.

  • Did you roll the serial number back to 1 (problem 1 again)? Then the primary’s serial number will appear much lower than the slaves’ serial numbers.

  • Did you forget to reload the primary (problem 2)? Then the primary will return (via nslookup or dig, for example) a different serial number from the one in the zone datafile.

  • Are the slaves having trouble updating from their master(s) (problem 3)? If so, they should have syslogged appropriate error messages.

  • Is the nameserver’s round-robin feature rotating the addresses of the domain name you’re looking up?

If you get these results when looking up a domain name in a remote zone, you should check whether the remote zone's nameservers have lost synchronization. You can use tools such as nslookup and dig to determine whether the remote zone's administrator forgot to increment the serial number, for example. If the nameservers answer differently from their authoritative data but show the same serial number, the serial number probably wasn't incremented. If the primary's serial number is much lower than the slaves’, the primary's serial number was probably accidentally reset. We usually assume a zone's primary nameserver is running on the host listed in the MNAME (first) field of the SOA record.
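When you compare serial numbers from different servers, remember that they're compared using sequence-space arithmetic (RFC 1982), not as plain integers. This Python sketch, which uses our own function names, shows why a primary whose serial number was reset to 1 looks older, not newer, to its slaves.

```python
HALF = 2 ** 31  # half of the 32-bit serial number space


def serial_lt(s1: int, s2: int) -> bool:
    """RFC 1982 serial number arithmetic: True if s1 precedes s2."""
    return (s1 < s2 and s2 - s1 < HALF) or (s1 > s2 and s1 - s2 > HALF)


def slave_will_transfer(master_serial: int, slave_serial: int) -> bool:
    """A slave transfers the zone only if the master's serial is 'greater'."""
    return serial_lt(slave_serial, master_serial)


print(slave_will_transfer(2001010501, 2001010500))  # True: serial incremented
print(slave_will_transfer(2001010500, 2001010500))  # False: forgot to increment
print(slave_will_transfer(1, 2001010500))           # False: rolled back to 1
```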

You probably can’t determine conclusively that the primary hasn’t been reloaded, though. It’s also difficult to pin down updating problems between remote nameservers. In cases like this, if you’ve determined that the remote nameservers are giving out incorrect data, contact the zone administrator and (gently) relay what you’ve found. This will help the administrator track down the problem on the remote end.

If you can determine that a parent nameserver—a remote zone’s parent, your zone’s parent, or even one in your zone—is giving out a bad answer, check whether this is coming from old delegation information. Sometimes this requires contacting both the administrator of the remote zone and the administrator of its parent to compare the delegation and the current, correct list of authoritative nameservers.

If you can't induce the administrator to fix the data or if you can't track down the administrator, you can always use the bogus substatement of the server statement to instruct your nameserver not to query that particular server.

Lookups Take a Long Time

Slow name resolution is usually due to one of two problems:

  • Connectivity loss (problem 8), which you can diagnose with nameserver debugging output and tools such as ping

  • Incorrect delegation information (problem 10) pointing to the wrong nameservers or the wrong IP addresses

Usually, going over the debugging output and sending a few pings will point to one or the other: either you can’t reach the nameservers at all, or you can reach the hosts but the nameservers aren’t responding.

Sometimes, though, the results are inconclusive. For example, the parent nameservers delegate to a set of nameservers that don’t respond to pings or queries, but connectivity to the remote network seems all right (a traceroute, for example, will get you to the remote network’s “doorstep”—the last router between you and the host). Is the delegation information so badly out of date that the nameservers have long since moved to other addresses? Are the hosts simply down? Or is there really a remote network problem? Usually, finding out requires a call or a message to the administrator of the remote zone. (Remember, whois gives you phone numbers!)

rlogin and rsh to Host Fails Access Check

This is a problem you expect to see right after you set up your nameservers. Users unaware of the change from the host table to domain name service won’t know to update their .rhosts files. (We covered what needs to be updated in Chapter 6.) Consequently, rlogin’s or rsh’s access check will fail and deny the user access.

Other causes of this problem are missing or incorrect in-addr.arpa delegation (problems 9 and 10) or forgetting to add a PTR record for the client host (problem 4). If you’ve recently upgraded to BIND 4.9 or newer and have PTR data for more than one in-addr.arpa zone in a single zone datafile, your nameserver may be ignoring the out-of-zone data. Any of these situations will result in the same behavior:

% rlogin wormhole
Password:

In other words, the user is prompted for a password despite having set up password-less access with .rhosts or hosts.equiv. If you were to look at the syslog file on the destination host (wormhole.movie.edu, in this case), you'd probably see something like this:

May  4 18:06:22 wormhole inetd[22514]: login/tcp: Connection
       from unknown (192.249.249.213)

You can tell which problem it is by stepping through the resolution process with your favorite query tool. First, query one of your in-addr.arpa zone’s parent nameservers for NS records for your in-addr.arpa zone. If these are correct, query the nameservers listed for the PTR record corresponding to the IP address of the rlogin or rsh client. Make sure they all have the PTR record and that the record maps to the right domain name. If not all the nameservers have the record, check for a loss of synchronization between the primary and the slaves (problems 1 and 3).
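Once you've collected the answers (with dig or nslookup, querying each nameserver in turn), the comparison step is mechanical. This Python sketch assumes you've recorded each server's PTR answer by hand; the function name, the server list, and the client's domain name, client.movie.edu, are hypothetical.

```python
from typing import Dict, List, Optional


def ptr_problems(answers: Dict[str, Optional[str]], expected: str) -> List[str]:
    """Compare each server's PTR answer (None if it had no record) to the expected name."""
    want = expected.rstrip(".").lower()
    problems = []
    for server, ptr in sorted(answers.items()):
        if ptr is None:
            problems.append(f"{server}: no PTR record")
        elif ptr.rstrip(".").lower() != want:
            problems.append(f"{server}: PTR points to {ptr}")
    return problems


# Hypothetical answers for 213.249.249.192.in-addr.arpa, gathered by hand.
answers = {
    "toystory.movie.edu": "client.movie.edu.",
    "wormhole.movie.edu": None,
}
print(ptr_problems(answers, "client.movie.edu"))  # ['wormhole.movie.edu: no PTR record']
```

A server missing the record while the others have it points toward the synchronization problems (1 and 3) rather than toward delegation.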

Access to Services Denied

Sometimes rlogin and rsh aren’t the only services to go. Occasionally, you’ll install BIND on your server and your diskless hosts won’t boot, and hosts won’t be able to mount disks from the server, either.

If this happens, make sure that the case of the domain names your nameservers return agrees with the case your previous name service returned. For example, if you are running NIS and your NIS host maps contain only lowercase names, you should make sure your nameservers also return lowercase domain names. Some programs are case-sensitive and won’t recognize names in a different case in a datafile, such as /etc/bootparams or /etc/exports.
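The mismatch is easy to demonstrate. In this Python sketch (hostnames hypothetical), a plain string comparison, which is effectively what a case-sensitive program does, fails where a DNS-style comparison succeeds.

```python
def same_dns_name(a: str, b: str) -> bool:
    """DNS treats ASCII letters case-insensitively; a trailing dot is ignored."""
    return a.rstrip(".").lower() == b.rstrip(".").lower()


exports_entry = "toystory.movie.edu"  # hypothetical entry in /etc/exports
returned_name = "ToyStory.movie.edu"  # hypothetical name as returned by DNS

print(returned_name == exports_entry)               # False: a case-sensitive program's view
print(same_dns_name(returned_name, exports_entry))  # True: DNS's view
```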

Can’t Get Rid of Old Data

Sometimes, after decommissioning a nameserver or changing a server’s IP address, you’ll find the old address record lingering around. An old record may show up in a nameserver’s cache or in a zone datafile weeks or even months later. The record clearly should have timed out of any caches by now. So why’s it still there? Well, there are a few reasons this happens. We’ll describe the simpler cases first.

Old delegation information

The first (and simplest) case occurs if a parent zone doesn’t keep up with its children or if the children don’t inform the parent of changes to the authoritative nameservers for the zone. If the edu administrators have this old delegation information for movie.edu:

$ORIGIN movie.edu.
@    86400    IN    NS    toystory
     86400    IN    NS    wormhole
toystory      86400    IN    A    192.249.249.3
wormhole      86400    IN    A    192.249.249.254 ; wormhole's former
                                                  ; IP address

the edu nameservers will give out the bogus old address for wormhole.movie.edu.

This is easily corrected once it’s isolated to the parent zone’s nameservers: just contact the parent zone’s administrator and ask to have the delegation information updated. If your parent zone is one of the gTLDs, you may be able to fix the problem by filling out a form on your registrar’s web site to modify the information about the nameserver. If any of the child zone’s nameservers have cached the bad data, kill them (to clear out their caches), delete any backup zone datafiles that contain the bad data, and restart them.

Registration of a non-nameserver

This is a problem unique to the gTLD zones: com, net, and org. Sometimes, you’ll find the gTLD nameservers giving out stale address information about a host in one of your zones—and not even a nameserver! But why would the gTLD nameservers have information about an arbitrary host in one of your zones?

Here’s the answer: you can register hosts in the gTLD zones that aren’t nameservers at all, such as your web server. For example, you can register an address for www.foo.com through a com registrar, and the com nameservers will give out that address. You shouldn’t, though, because you’ll lose a fair amount of control over the address. If you need to change the address, it could take a day or more to push the change through your registrar. If you run the foo.com primary nameserver, you can make the change almost instantly.

What have I got?

How do you determine which of these problems is plaguing you? Pay attention to which nameservers are distributing the old data and which zones the data relates to:

  • Is the nameserver a gTLD nameserver? Check for a stale, registered address.

  • Is the nameserver your parent nameserver but not a gTLD nameserver? Check the parent for old delegation information.

That’s about all we can think to cover. It’s certainly not a comprehensive list, but we hope it’ll help you solve the more common problems you encounter with DNS and give you ideas about how to approach the rest. Boy, if we’d only had a troubleshooting guide when we started!



[*] BIND 9.1.0 is the first version of BIND 9 to support dumping the database.

[*] The nameserver prints the IP address of the remote nameserver if it’s available. On BIND 8.2 and later nameservers, the IP address is available only if you’ve turned on host-statistics, which we introduced in Chapter 8. On earlier BIND 8 nameservers, it’s on by default. host-statistics keeps impressive statistics on every nameserver and resolver you’ve ever communicated with, which is very useful for some purposes (such as figuring out which nameserver your server got a record from) but consumes a fair amount of memory.

[*] On the other hand, if you encode the date into the serial number, as many people do (e.g., 2001010500 is the first rev of data on January 5, 2001), you may be able to tell at a glance whether you updated the serial number when you made the change.
