One of my favorite interview questions, which seems to be used by just about everyone, is to ask the interviewee to describe all of the steps of an HTTP request. There are lots of ways you could dig into this. I once had someone explain to me how the computer interprets keyboard presses. While that is fascinating, for now we will walk through the networking side of things.
Let's say you type natwelch.com/resume into your browser. We will go through all of the steps that happen for that to deliver content for your browser to render.
DNS (domain name system): Computers talk to each other using IP addresses, and DNS translates a domain into an IP address. Technically, DNS does a little more than that, but let's start by describing domain names and go from there. In our example, natwelch.com is the domain and /resume is the path.
A domain name is made up of a few parts:

- The last segment, such as com in natwelch.com, is the TLD (top-level domain).
- The SLD (second-level domain) is less well defined. Some say that co in .co.uk is the SLD. Others say that natwelch in natwelch.com is an SLD. There seems to be no real consensus.
- In www.natwelch.com, www is the subdomain.

Domain names are actually parsed recursively, like a tree.
DNS requests ask for data recursively via a tree. Usually the flow is like so:

- Your computer asks its configured resolver where natwelch.com is.
- If the resolver does not have a cached answer, it asks the root name servers where natwelch.com is.
- The root servers will not know where natwelch.com is, but they will know where the .com servers are.
- The .com servers point the resolver at the name servers responsible for natwelch.com, which finally answer with the record.

When you are configuring DNS, you will often write out zone records, so your name server will know where to point things. Following are some common record types:

- A: maps a name to an IPv4 address
- AAAA: maps a name to an IPv6 address
- CNAME: aliases one name to another name
- MX: lists the mail servers for a domain
- NS: lists the authoritative name servers for a domain
- TXT: holds arbitrary text, often used for domain verification
There are lots of other possible DNS records, but the preceding are the most common.
To make DNS requests from the command line, you can use the dig tool, which is very powerful for exploring DNS. The most basic usage is to run dig with a domain name:
; <<>> DiG 9.10.6 <<>> natwelch.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 5354
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1280
;; QUESTION SECTION:
;natwelch.com.            IN    A

;; ANSWER SECTION:
natwelch.com.    180    IN    A    35.190.39.138

;; Query time: 55 msec
;; SERVER: 2604:2000:1281:f9:2e30:33ff:fe5f:55af#53(2604:2000:1281:f9:2e30:33ff:fe5f:55af)
;; WHEN: Tue Jun 19 01:45:59 UTC 2018
;; MSG SIZE rcvd: 57
The only real line that matters is:
natwelch.com. 180 IN A 35.190.39.138
This line says that the natwelch.com. record is valid for another 180 seconds (its TTL, or time to live). It returned a single A record, with the IP 35.190.39.138. If there were multiple A records, there would be multiple lines.
dig also lets you specify a DNS server that you wish to query:
$ dig www.natwelch.com @8.8.8.8 +nocomment

; <<>> DiG 9.10.6 <<>> www.natwelch.com @8.8.8.8 +nocomment
;; global options: +cmd
;www.natwelch.com.        IN    A
www.natwelch.com.    183    IN    CNAME    natwelch.com.
natwelch.com.        183    IN    A        35.190.39.138
;; Query time: 45 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Tue Jun 19 01:58:39 UTC 2018
;; MSG SIZE rcvd: 75
@8.8.8.8 specifies that we want Google's DNS server at 8.8.8.8 to provide us with an answer. Also, notice here that I requested a domain with a CNAME instead of an A record, and the DNS server responded with the appropriate A record as well, saving me a round trip. DNS servers don't have to do this, but many do.
The preceding +nocomment flag simplifies the output a bit. If you want very short output, you can use +short:
$ dig google.com @8.8.8.8 +short
172.217.12.206
You can also add a record type that you would like to query. In the following example, we ask for gmail.com's MX records:
$ dig gmail.com MX +nocomment

; <<>> DiG 9.10.6 <<>> gmail.com MX +nocomment
;; global options: +cmd
;gmail.com.        IN    MX
gmail.com.    1330    IN    MX    5 gmail-smtp-in.l.google.com.
gmail.com.    1330    IN    MX    10 alt1.gmail-smtp-in.l.google.com.
gmail.com.    1330    IN    MX    20 alt2.gmail-smtp-in.l.google.com.
gmail.com.    1330    IN    MX    30 alt3.gmail-smtp-in.l.google.com.
gmail.com.    1330    IN    MX    40 alt4.gmail-smtp-in.l.google.com.
;; Query time: 47 msec
;; SERVER: 2604:2000:1281:f9:2e30:33ff:fe5f:55af#53(2604:2000:1281:f9:2e30:33ff:fe5f:55af)
;; WHEN: Tue Jun 19 02:01:28 UTC 2018
;; MSG SIZE rcvd: 161
You can do lots more with dig, so run man dig and find out about all of its cool uses.
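Everything dig shows can also be reached from code. As a quick sketch using only the Python standard library, socket.getaddrinfo asks the operating system's resolver to do the lookup. localhost is used here so the example works offline; swapping in a real domain such as natwelch.com triggers an actual DNS query:

```python
import socket

# Ask the OS resolver for the addresses behind a name. For "localhost"
# this is answered locally (for example, via /etc/hosts); for a real
# domain it performs the recursive DNS lookup described above.
infos = socket.getaddrinfo("localhost", 80, proto=socket.IPPROTO_TCP)
addresses = sorted({sockaddr[0] for _, _, _, _, sockaddr in infos})
print(addresses)  # typically ['127.0.0.1', '::1']
```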
Now that we've talked about translating a domain name into an IP, how do you connect two IPs? The most common way of thinking about networking is the Open Systems Interconnection (OSI) model. The OSI model describes networking as seven layers:

1. Physical
2. Data link
3. Network
4. Transport
5. Session
6. Presentation
7. Application
In our example of an HTTP request, layer 1 is the physical connection and layer 2 contains Wi-Fi, Ethernet, and other protocols used for controlling a physical connection. Layer 3 is for the IP, which is how things are routed. Layer 4 is for TCP and UDP. Layers 5 and 6 are rarely talked about, but they are where things such as SSL encapsulation and other networking wrappers happen. Finally, layer 7 is where HTTP and other user-level messages are sitting.
Layers are often useful for describing features of networking tools. For example, Amazon Web Services (AWS) sells two separate load balancers (LBs). Its Classic Elastic LB is a layer 4 LB: it can only route based on what IP and port a request comes in on. Amazon's newer Application LB is a layer 7 LB, because it can route based on details inside the HTTP request, such as headers and the request path.
The physical and data link layers of networking are often generalized as "Ethernet." Technically, a lot more is going on here, but the important thing to know is that every networking device has a MAC address. MAC stands for media access control, but in modern-day networking, MACs are just a way of identifying a network connection, such as an Ethernet port, Wi-Fi adapter, Bluetooth connection, or cell phone antenna. Most networking devices come with a default MAC address burned into them, although you can usually change it if needed, because every device on a network needs a unique MAC address.
A MAC address is six octets long, which means it contains 48 bits of data. That means there are 2^48 possible MAC addresses. MAC addresses are usually written as 12 hex digits, for example, da:99:9c:e1:a5:f3.
IP is the protocol for routing packets around the internet. There are three common types of packets floating around on the internet: TCP, UDP, and ICMP. There are others as well, and you can send arbitrary data, but IP is what tells routers how to route your packets and provides a wrapper around them.
A packet is a bunch of bytes sent over the network. The first few bytes are headers that store data about the connection, and each protocol adds its own. IP starts with the connection data, and inside of it are the higher layers. So, layer 2 wraps layer 3, which wraps layer 4, which wraps layer 5, and so on.
IP promises absolutely nothing, so there is no guarantee that your data will ever make it to a host, and if you don't control that host, there is no central monitoring to see how the data gets there.
Each packet that traverses IP has a source IP address and destination IP address. It can contain other metadata as well.
I won't go into detail here, but reading about Address Resolution Protocol (ARP), Interior Gateway Protocol (IGP), and Border Gateway Protocol (BGP) will explain the complicated mess of how each router that receives a packet determines where to send the packet next on its way through the mesh of the internet toward the packet's final destination IP address.
There are entire books that go into detail on these topics.
Classless Inter-Domain Routing (CIDR) notation is a compact way to describe a range of IP addresses. CIDR notation contains an IP address, a slash, and an integer, for example, 10.0.0.0/8. This example is often read as "ten dot zero dot zero dot zero, slash eight."
To know how many IP addresses a slash (also known as a prefix) contains, you need to know powers of two, or at least how to use a calculator: 2^(address length − prefix length) tells you how many addresses there are. IPv4 addresses contain 32 bits, so for our earlier example, 2^(32 − 8) = 2^24 = 16,777,216 addresses. A /8 is the largest continuous block of IP addresses that was given out in the early days of IPv4. Organizations such as Apple, the US Postal Service, and AT&T all have /8s. The United States Department of Defense has 13 /8s, which is the most any one organization owns: 218,103,808 addresses.
For IPv6, there are 128 bits in an address. So, while a /32 in IPv4 is one address, in IPv6 it is 2^(128 − 32) = 2^96, which is roughly 79 billion billion billion addresses.
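You don't have to do this arithmetic by hand. As a quick sketch, Python's standard ipaddress module computes the size of a CIDR block for you:

```python
import ipaddress

# An IPv4 /8 contains 2^(32 - 8) = 2^24 addresses.
v4 = ipaddress.ip_network("10.0.0.0/8")
print(v4.num_addresses)  # 16777216

# In IPv6 the same prefix length covers a vastly larger space:
# a /32 contains 2^(128 - 32) = 2^96 addresses.
v6 = ipaddress.ip_network("2001:db8::/32")
print(v6.num_addresses)  # 79228162514264337593543950336
```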
The Internet Control Message Protocol (ICMP) is used for testing IP. ICMP is often blocked by modern network configurations because it can leak lots of information about infrastructure topology. That being said, ICMP is often used to debug things. The most commonly used tools are ping and traceroute. Both use ICMP packets, which are small packets with just IP data and control messages. Control messages tell receiving networking hardware what to do with the packet. ICMP packets can also be sent by hardware to tell you that you cannot connect to what you are trying to connect to.
If you want to see ICMP packets fly by, you can use tcpdump. In one window, we run sudo tcpdump -i any -v icmp. We use sudo because tcpdump needs root permissions to access all of our networking interfaces. In another window, we run ping google.com -c 1, which sends a single ICMP packet to google.com and times how long it takes to get it back:
$ ping google.com -c 1
PING google.com (172.217.10.110): 56 data bytes
64 bytes from 172.217.10.110: icmp_seq=0 ttl=55 time=213.133 ms

--- google.com ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 213.133/213.133/213.133/0.000 ms

$ sudo tcpdump -i any -v icmp
tcpdump: data link type PKTAP
tcpdump: listening on any, link-type PKTAP (Apple DLT_PKTAP), capture size 262144 bytes
02:59:58.459604 IP (tos 0x0, ttl 64, id 56927, offset 0, flags [none], proto ICMP (1), length 84)
    192.168.1.17 > lga34s15-in-f14.1e100.net: ICMP echo request, id 28065, seq 0, length 64
02:59:58.672657 IP (tos 0x0, ttl 55, id 0, offset 0, flags [none], proto ICMP (1), length 84)
    lga34s15-in-f14.1e100.net > 192.168.1.17: ICMP echo reply, id 28065, seq 0, length 64
If you want to see all of the routers between you and a website, you can use traceroute or mtr. traceroute just does the trace once, while mtr runs it many times. Both are great, showing a bunch of data about all of the hops you pass through:
$ traceroute google.com
traceroute to google.com (172.217.10.110), 30 hops max, 60 byte packets
 1  gateway (207.251.90.49)  0.206 ms  0.209 ms  0.198 ms
 2  206.252.215.173 (206.252.215.173)  1.147 ms  1.174 ms  1.308 ms
 3  ae-12.a00.nycmny13.us.bb.gin.ntt.net (128.241.0.233)  9.558 ms  9.580 ms  9.571 ms
 4  ae-4.r07.nycmny01.us.bb.gin.ntt.net (129.250.6.66)  1.046 ms ae-4.r08.nycmny01.us.bb.gin.ntt.net (129.250.6.74)  1.038 ms  1.053 ms
 5  ae-0.a01.nycmny01.us.bb.gin.ntt.net (129.250.3.214)  3.042 ms ae-1.a01.nycmny01.us.bb.gin.ntt.net (129.250.6.69)  3.057 ms  3.013 ms
 6  ae-0.tata-communications.nycmny01.us.bb.gin.ntt.net (129.250.9.114)  5.005 ms  0.740 ms  0.774 ms
 7  72.14.195.232 (72.14.195.232)  1.200 ms  1.147 ms  1.139 ms
 8  108.170.248.33 (108.170.248.33)  2.116 ms  2.147 ms  2.305 ms
 9  216.239.62.157 (216.239.62.157)  1.119 ms  1.269 ms 216.239.62.159 (216.239.62.159)  1.232 ms
10  lga34s15-in-f14.1e100.net (172.217.10.110)  1.193 ms  1.208 ms  1.206 ms
mtr is a curses application, which means it uses the curses library to draw constantly updating graphics in the Terminal. So, while a printed page can only show a static snapshot, if you run mtr google.com in your Terminal, you will see the data constantly updating. You can press ? to see all of the options, d to change the display, and q to quit.
Byte Offset | Bytes 0-1   | Bytes 2-3
0           | Source Port | Destination Port
4           | Length      | Checksum
8           | Data
Figure 7: A picture of a UDP packet header
The User Datagram Protocol (UDP) is a layer 4 protocol for sending data on the network. On top of the basics of IP, UDP adds destination and source ports. It also adds length, which specifies how much data follows the headers. There is a checksum in the header, although it is optional. It lets you verify the data you receive. The first 8 octets or bytes contain header information, and the next length bytes (with a max of 65,507 bytes) contain data.
UDP is fire and forget: packets are not retried if they fail to arrive or arrive corrupted.
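Here is a minimal sketch of UDP's fire-and-forget model, using Python's standard socket module on the loopback interface. There is no handshake, and nothing below the application checks whether the datagram arrived:

```python
import socket

# One socket listens on the loopback interface; another sends it a single
# datagram. There is no connection setup and no delivery guarantee: the
# datagram either arrives whole or silently does not.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))  # port 0 asks the OS for a free port
receiver.settimeout(5)
addr = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello", addr)    # fire and forget: no connection needed

data, source = receiver.recvfrom(1024)
print(data, source)  # b'hello' ('127.0.0.1', <sender's port>)

sender.close()
receiver.close()
```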
The Transmission Control Protocol (TCP) is the main way people connect to websites. Whenever you use your browser, your browser opens hundreds of TCP connections in the background to download data.
Byte Offset | Bytes 0-1         | Bytes 2-3
0           | Source Port       | Destination Port
4           | Sequence Number
8           | Acknowledgement Number
12          | Metadata
16          | Checksum          | Urgent Pointer
20          | Options + Padding
…           | Data
Figure 8: A picture of a TCP packet header
TCP is different from UDP in that it tries to verify that all data arrives, in order and uncorrupted. Before data is sent, there is first a handshake, which agrees on the parameters of the transfer: the client sends a SYN packet, the server replies with a SYN-ACK, and the client answers with an ACK, after which data can flow. Most of the data needed for the handshake is included in the metadata field in the packet's headers. You never really need to know about the packet headers, but it's something that people seem to love asking about in interviews. The more useful thing to remember is how the TCP handshake works, because it means TCP has overhead that UDP does not. UDP just starts sending data and hopes that it gets there. TCP makes sure the server is there and tells the server how much data it is getting.
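The difference shows up in code, too. In this sketch, again with Python's standard socket module, create_connection() does not return until the kernel has completed the three-way handshake with the listening side, and only then is data exchanged:

```python
import socket
import threading

# A TCP exchange on localhost. Unlike UDP, the kernel performs the
# three-way handshake (SYN, SYN-ACK, ACK) inside create_connection()
# before any application data moves.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0 asks the OS for a free port
server.listen(1)
host, port = server.getsockname()

def handle_one_client():
    conn, _ = server.accept()   # handshake completes here
    data = conn.recv(1024)
    conn.sendall(data.upper())  # echo the data back, uppercased
    conn.close()

t = threading.Thread(target=handle_one_client)
t.start()

client = socket.create_connection((host, port))
client.sendall(b"hello")
reply = client.recv(1024)
print(reply)  # b'HELLO'

client.close()
t.join()
server.close()
```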
Okay, now we know how packets get to a server, so let's start sending HTTP requests. HTTP (Hypertext Transfer Protocol) is a plain-text protocol: you send plain-text requests to a server, and the server sends back plain-text responses.
A standard HTTP request looks like the following:
Method Path Version
Header: value (0 or more rows)
(empty line)
Message body (optional)
Headers are always case insensitive and header names cannot have spaces. They are traditionally just ASCII characters and dashes. The simplest request looks like the following:
GET / HTTP/1.1
This request says, "Give me the content at / (commonly referred to as root). I'm using HTTP version 1.1." The problem with this in modern systems is that a single IP may be hosting hundreds of domains. This is why you should always specify a Host header with your request. It is not strictly required, but many services will react differently, or not respond at all, if you do not include one:
GET / HTTP/1.1
Host: google.com
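Because HTTP is just plain text over TCP, you can write a request like this byte for byte. As a sketch, the following starts a throwaway local server (Python's built-in http.server) and sends it that exact request over a raw socket, so the example works without touching the network:

```python
import http.server
import socket
import threading

# Start a throwaway local web server and speak plain-text HTTP to it over
# a raw TCP socket, exactly as you would by hand with telnet.
server = http.server.HTTPServer(
    ("127.0.0.1", 0), http.server.SimpleHTTPRequestHandler
)
threading.Thread(target=server.handle_request, daemon=True).start()
host, port = server.server_address

sock = socket.create_connection((host, port))
sock.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n")

# Read the plain-text response until the server closes the connection.
response = b""
while chunk := sock.recv(4096):
    response += chunk
sock.close()
server.server_close()

status_line = response.split(b"\r\n", 1)[0]
print(status_line.decode())  # e.g. HTTP/1.0 200 OK
```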
If you want to build and send an HTTP request by hand, you can use telnet. HTTP servers traditionally run on port 80 (and HTTPS servers traditionally on 443), so to build a request to google.com, we would type the following:
$ telnet google.com 80
Trying 2607:f8b0:4006:803::200e...
Connected to google.com.
Escape character is '^]'.
GET / HTTP/1.1
HOST: google.com
Connection: close

HTTP/1.1 301 Moved Permanently
Location: http://www.google.com/
Content-Type: text/html; charset=UTF-8
Date: Sun, 17 Jun 2018 21:19:15 GMT
Expires: Tue, 17 Jul 2018 21:19:15 GMT
Cache-Control: public, max-age=2592000
Server: gws
Content-Length: 219
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Connection: close

<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>

Note all we typed above was:

GET / HTTP/1.1
Host: google.com
Connection: close
The Connection: close bit is there because we are telling the server that we will not be sending another request after we get the full response. It's not necessary, but it makes working with telnet easier. Everything else in the output is from either the telnet program or Google's response. Let us walk line by line through Google's response to understand what it is sending us:
HTTP/1.1 301 Moved Permanently
Google is responding with HTTP version 1.1. The response has a status code of 301 and a status message of Moved Permanently. Status messages are mostly ignored these days, mainly because status codes have become relatively standardized and humans rarely read the messages. There are common messages tied to status codes, but as a server operator, you can return basically anything in the status message; they are for humans, not for software.
Status codes are always three digits long. The last two digits provide more information, but status codes are almost always grouped by their first digit:

- 1xx: informational responses
- 2xx: success
- 3xx: redirection
- 4xx: client errors
- 5xx: server errors
Now let us look through the headers:
Location: http://www.google.com/
The location header tells us, in the case of a 3xx response, where to make a follow-up request to get the correct content. In this specific case, it is because we included a Host header with a domain without a www prefix.
Content-Type: text/html; charset=UTF-8
This header is telling us the content of the response body is HTML and is encoded as UTF-8.
UTF-8 is a text encoding that is controlled by the Unicode Consortium. Unicode and text encodings are very important, but probably too complicated for this book, so I suggest you do some research on them. Stealing from Joel Spolsky (https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/), though, I will say that there is no such thing as plain text, and assuming that something you receive is ASCII or any other particular encoding almost always ends in a bad user experience.
Date: Sun, 17 Jun 2018 21:19:15 GMT
This header tells us when the response was sent.
Expires: Tue, 17 Jul 2018 21:19:15 GMT
Cache-Control: public, max-age=2592000
The expires and cache-control headers tell the client how long it can cache the response. expires says that at Tue, 17 Jul 2018 21:19:15 GMT, this content will no longer be good. The cache-control header gives two pieces of information: public means anyone can cache this response (a user, a CDN, and so on), and max-age=2592000 means that whoever caches it can keep it for 2,592,000 seconds, or 30 days.
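As a quick sanity check of that arithmetic:

```python
# cache-control said max-age=2592000; confirm that is exactly 30 days.
seconds = 2592000
days = seconds / (60 * 60 * 24)
print(days)  # 30.0
```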
Server: gws
The server header is purely vanity. You can put whatever you want in there to tell clients which application served this response. In this case, Google responds with the string gws.
Content-Length: 219
The content-length header tells us that we should receive 219 bytes of data in the body of our response. Technically, this is the count of octets, not bytes, but in most modern systems, 8 bits = 1 octet = 1 byte.
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
These two headers are more modern security headers that tell browsers how to deal with the content:

- X-XSS-Protection: 1; mode=block tells the browser to stop rendering the page if it detects a reflected cross-site scripting attack.
- X-Frame-Options: SAMEORIGIN tells the browser to only render this page inside a frame if the framing page comes from the same origin.
Connection: close
This final header is agreeing with our request that the connection will close after the body is received. The response ends with an empty line and then a blob of HTML.
We are not going to talk about HTTPS in detail in this book, but HTTPS is HTTP with Transport Layer Security (TLS). The goal is to prevent anyone between the client and the server from reading the contents of the HTTP request. It does not prevent someone from seeing which host the request is going to, however. These days, it is recommended that HTTPS be used everywhere to provide a baseline of security for users.
Two of the most popular tools for building HTTP requests from the command line are curl and wget. wget has some sane defaults and downloads responses to a file by default. curl sends response data to standard output by default. In general, I prefer curl. Both tools can do the same things, but I tend to remember curl's command-line flags better, so that's what I use.
The most common thing I do with curl is to request a page, throw the content away, and look at the headers:
$ curl -svL google.com > /dev/null
* Rebuilt URL to: google.com/
*   Trying 172.217.4.206...
* TCP_NODELAY set
* Connected to google.com (172.217.4.206) port 80 (#0)
> GET / HTTP/1.1
> Host: google.com
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< Location: http://www.google.com/
< Content-Type: text/html; charset=UTF-8
< Date: Sat, 23 Jun 2018 17:20:42 GMT
< Expires: Mon, 23 Jul 2018 17:20:42 GMT
< Cache-Control: public, max-age=2592000
< Server: gws
< Content-Length: 219
< X-XSS-Protection: 1; mode=block
< X-Frame-Options: SAMEORIGIN
<
* Ignoring the response-body
{ [219 bytes data]
* Connection #0 to host google.com left intact
* Issue another request to this URL: 'http://www.google.com/'
*   Trying 216.58.194.100...
* TCP_NODELAY set
* Connected to www.google.com (216.58.194.100) port 80 (#1)
> GET / HTTP/1.1
> Host: www.google.com
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Sat, 23 Jun 2018 17:20:42 GMT
< Expires: -1
< Cache-Control: private, max-age=0
< Content-Type: text/html; charset=ISO-8859-1
< P3P: CP="This is not a P3P policy! See g.co/p3phelp for more info."
< Server: gws
< X-XSS-Protection: 1; mode=block
< X-Frame-Options: SAMEORIGIN
< Set-Cookie: 1P_JAR=2018-06-23-17; expires=Mon, 23-Jul-2018 17:20:42 GMT; path=/; domain=.google.com
< Set-Cookie: NID=133=QXSN94wGKlX7EAQRKXcaaadBdvjh5zlrRRBBpLYbIbOIn4lINCGUD53jO2DAJyvT-y0Q8-nWKuYqUpplb5H3LeztzGD5CB2taBaq98gjkX_WZu0eJIT_omJznNIDi; expires=Sun, 23-Dec-2018 17:20:42 GMT; path=/; domain=.google.com; HttpOnly
< Accept-Ranges: none
< Vary: Accept-Encoding
< Transfer-Encoding: chunked
<
{ [2143 bytes data]
* Connection #1 to host www.google.com left intact
In this example, I am passing three flags to curl:

- -s removes a progress bar that would normally appear
- -v prints the headers of your request
- -L follows redirects

The > /dev/null at the end redirects the output to /dev/null, a device in Unix and Linux that discards anything written to it. This is useful because you can inspect both the full request and the full response you get from the servers you talk to. By default, curl makes a GET request. With the -I flag, curl makes a HEAD request instead and prints out the response headers. You can also send any method you want to a server with the -X flag:
$ curl -sv -X TEST www.google.com/ > /dev/null
*   Trying 216.58.194.100...
* TCP_NODELAY set
* Connected to www.google.com (216.58.194.100) port 80 (#0)
> TEST / HTTP/1.1
> Host: www.google.com
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 405 Method Not Allowed
< Content-Type: text/html; charset=UTF-8
< Referrer-Policy: no-referrer
< Content-Length: 1589
< Date: Sat, 23 Jun 2018 17:37:46 GMT
<
{ [1589 bytes data]
* Connection #0 to host www.google.com left intact
Here, we're sending a made-up method called TEST, because HTTP methods are actually just arbitrary strings. If we wanted to, we could also send data with our request. Both the previous and the next request fail, since these aren't things I would expect to work, but they are repeatable tests:
$ curl -sv -X DELETE -d '' www.google.com/ > /dev/null
*   Trying 216.58.194.100...
* TCP_NODELAY set
*   Trying 2607:f8b0:4000:813::2004...
* TCP_NODELAY set
* Connected to www.google.com (216.58.194.100) port 80 (#0)
> DELETE / HTTP/1.1
> Host: www.google.com
> User-Agent: curl/7.54.0
> Accept: */*
> Content-Length: 0
> Content-Type: application/x-www-form-urlencoded
>
< HTTP/1.1 405 Method Not Allowed
< Allow: GET, HEAD
< Date: Sat, 23 Jun 2018 17:41:29 GMT
< Content-Type: text/html; charset=UTF-8
< Server: gws
< Content-Length: 1591
< X-XSS-Protection: 1; mode=block
< X-Frame-Options: SAMEORIGIN
<
{ [1589 bytes data]
* Connection #0 to host www.google.com left intact
Here, the -d flag lets you specify the request body. By default, curl turns any request into a POST request when you specify that flag; the above example overrides that default with -X to send a different method. Instead of an empty string, you could send JSON, form data, or just random bytes, whatever you need. If you start your argument to -d with an @ symbol, curl loads the data from a file. If your argument is @-, curl reads from standard input. In the following example, we take a file, look for the string google, sort the matches and remove duplicates, and then POST the result to google.com, which rejects what we sent because we are sending way too much data:
$ cat urls.txt | grep google | sort -u | curl -d @- -sv google.com/ > /dev/null
*   Trying 216.58.216.206...
* TCP_NODELAY set
* Connected to google.com (216.58.216.206) port 80 (#0)
> POST / HTTP/1.1
> Host: google.com
> User-Agent: curl/7.54.0
> Accept: */*
> Content-Length: 1677433
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
>
< HTTP/1.1 413 Request Entity Too Large
< Content-Type: text/html; charset=UTF-8
< Referrer-Policy: no-referrer
< Content-Length: 2398
< Date: Sat, 23 Jun 2018 18:00:20 GMT
< Connection: close
<
{ [2398 bytes data]
* Closing connection 0
curl is incredibly powerful, as is wget. Both tools can be used to explore how a service works by sending requests and data and seeing how the server responds. They can also be used to interact with services that publish REST APIs. One of my favorite things to do is to open Google Chrome's Developer Tools, click on the Network tab, and right-click on a request. Chrome lets you copy a request as a curl command, with all of the proper flags, so you can repeat the request from the command line as many times as you like.