Sending an HTTP request

One of my favorite interview questions, which seems to be used by just about everyone, is to ask the interviewee to describe all of the steps of an HTTP request. There are lots of ways you could dig into this. I once had someone explain to me how the computer interprets keyboard presses. While that is fascinating, for now we will walk through the networking side of things.

Let's say you type natwelch.com/resume into your browser. We will go through all of the steps that happen for that to deliver content for your browser to render.

DNS

DNS (domain name system): Computers talk to each other using IP addresses, and DNS translates a domain into an IP address. Technically, DNS does a little more than that, but let's start by describing domain names and go from there. In our example, natwelch.com is the domain and /resume is the path.


Figure 3: In all three of the above domains, natwelch on the left is the domain. In the first example, com is the TLD. In the second, uk is the TLD, and co is the SLD. In the third example, app is the TLD.

A domain name is made up of a few parts:

  • Top Level Domain (TLD): This is the last part of the domain, such as com, net, org, app, horse, and so on.
  • Dots: These are periods and the separators in domain names. Technically, all fully qualified domains end in a dot: a trailing dot means the domain is absolute, while a name without one is relative. Most people omit the trailing dot, as in the preceding image, but technically it is required to mark the absolute end of the domain.
  • Second Level Domain (SLD): The part to the left of the rightmost dot. This is actually controversial: in some cases, people only call the co in .co.uk the SLD, while others say that natwelch in natwelch.com is an SLD. There seems to be no real consensus.
  • Subdomain: This is the piece of text to the left of the domain. If we have www.natwelch.com, www is the subdomain.

Domain names are parsed hierarchically, like a tree.
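The label-by-label structure is easy to see in code. Here is a quick Python sketch (the domain_labels helper is my own name) that splits a domain into the labels in the order the DNS tree is walked:

```python
# Split a domain into its labels and read them right to left, the way the
# DNS tree is walked. A trailing dot marks the absolute root.
def domain_labels(domain: str) -> list[str]:
    return domain.rstrip(".").split(".")[::-1]

print(domain_labels("www.natwelch.com."))  # ['com', 'natwelch', 'www']
```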


Figure 4: In the preceding example, you can see a very rough idea of how data is controlled in the DNS tree. The root servers control the root dot. There is a name server for each TLD. Then, each domain has name servers for its subdomains and records.

DNS requests ask for data recursively via a tree. Usually the flow is like so:

  • You ask a local cache (either on your machine or on the network) if it knows where natwelch.com is.
  • If the local cache doesn't know, then you ask your configured DNS server if it knows where natwelch.com is.
  • If the DNS server doesn't know, then you ask the "root" servers (aka "dot" servers). They won't know where natwelch.com is, but they will know where the .com servers are.

    Figure 5: root-servers.org is the definitive page for each of the 13 root server organizations. It provides a map of where all root server machines are hosted and who maintains them. It also provides statistics about how the root servers are used and how to connect to them.

  • The .com servers won't know where natwelch.com's servers are, but they will know the name servers that do know. This is known as an NS record and it points to the name servers for a domain.
  • The name servers will then return whatever record was requested.
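You rarely perform this walk yourself; the operating system's resolver does it (and caches the results) for you. As a sketch, Python's standard library exposes that resolver (the resolve helper name is my own):

```python
import socket

# Ask the operating system's resolver, which performs the walk described
# above (local cache, configured DNS server, root, TLD, authoritative
# name server), and collect the resulting addresses.
def resolve(name: str) -> set[str]:
    return {info[4][0] for info in socket.getaddrinfo(name, None)}

# "localhost" resolves without touching the network; try natwelch.com too.
print(resolve("localhost"))
```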

When you are configuring DNS, you will often write out zone records, so your name server will know where to point things. Following are some common record types:

A

One or more IPv4 addresses where this domain is located.

CNAME

This domain is an alias for this other domain.

AAAA

One or more IPv6 addresses where this domain is located.

MX

Used for mail delivery. Lists the mail servers that accept email for this domain, each with a priority.

NS

Where this domain's name servers are.

TXT

Arbitrary text strings, often used for descriptions, domain verification, and policies such as SPF.

ANAME or ALIAS

Non-standard record types for putting CNAME-like behavior on the root of a domain. Standard CNAME records can only be on subdomains, because the root of a domain must also hold other records, such as SOA and NS, which a CNAME cannot coexist with.

SOA

A Start of Authority record. It contains a bunch of information about who is the administrator for a DNS zone. It also describes the current state of the zone to help other DNS servers properly cache the data about the zone.

There are lots of other possible DNS records, but the preceding are the most common.

dig

To make DNS requests from the command line, you can use the dig tool, which is very powerful for exploring DNS. The most basic usage is to run dig with a domain name:

$ dig natwelch.com

; <<>> DiG 9.10.6 <<>> natwelch.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 5354
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1280
;; QUESTION SECTION:
;natwelch.com.			IN	A

;; ANSWER SECTION:
natwelch.com.		180	IN	A	35.190.39.138

;; Query time: 55 msec
;; SERVER: 2604:2000:1281:f9:2e30:33ff:fe5f:55af#53(2604:2000:1281:f9:2e30:33ff:fe5f:55af)
;; WHEN: Tue Jun 19 01:45:59 UTC 2018
;; MSG SIZE  rcvd: 57

The only line that really matters is the one in the answer section:

natwelch.com. 180 IN A 35.190.39.138

This line says that the natwelch.com. record is valid for another 180 seconds. It returned a single A record, with the IP 35.190.39.138. If there were multiple A records, there would be multiple lines.

dig also lets you specify a DNS server that you wish to query.

$ dig www.natwelch.com @8.8.8.8 +nocomment

; <<>> DiG 9.10.6 <<>> www.natwelch.com @8.8.8.8 +nocomment
;; global options: +cmd
;www.natwelch.com.		IN	A
www.natwelch.com.	183	IN	CNAME	natwelch.com.
natwelch.com.		183	IN	A	35.190.39.138
;; Query time: 45 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Tue Jun 19 01:58:39 UTC 2018
;; MSG SIZE  rcvd: 75

@8.8.8.8 specifies that we want to use Google's DNS server at 8.8.8.8 to provide us with an answer. Also, notice here that I requested a domain with a CNAME instead of an A record, and the DNS server also responded with the appropriate A record, saving me a round trip. DNS servers don't have to do this, but many do.

The preceding +nocomment simplifies the output a bit. If you want very short output, you can use +short:

$ dig google.com @8.8.8.8 +short
172.217.12.206

You can also add a record type that you would like to query. In the following example, we ask gmail.com for its MX records:

$ dig gmail.com MX +nocomment

; <<>> DiG 9.10.6 <<>> gmail.com MX +nocomment
;; global options: +cmd
;gmail.com.			IN	MX
gmail.com.		1330	IN	MX	5 gmail-smtp-in.l.google.com.
gmail.com.		1330	IN	MX	10 alt1.gmail-smtp-in.l.google.com.
gmail.com.		1330	IN	MX	20 alt2.gmail-smtp-in.l.google.com.
gmail.com.		1330	IN	MX	30 alt3.gmail-smtp-in.l.google.com.
gmail.com.		1330	IN	MX	40 alt4.gmail-smtp-in.l.google.com.
;; Query time: 47 msec
;; SERVER: 2604:2000:1281:f9:2e30:33ff:fe5f:55af#53(2604:2000:1281:f9:2e30:33ff:fe5f:55af)
;; WHEN: Tue Jun 19 02:01:28 UTC 2018
;; MSG SIZE  rcvd: 161

You can do lots more with dig, so run man dig and find out about all of its cool uses.

Ethernet and TCP/IP

Now that we've talked about translating a domain name into an IP, how do you connect two IPs? The most common way of thinking about networking is the Open Systems Interconnection (OSI) model, which describes networking as layers:

  • Layer 1: Physical layer
  • Layer 2: Data link layer
  • Layer 3: Network layer
  • Layer 4: Transport layer
  • Layer 5: Session layer
  • Layer 6: Presentation layer
  • Layer 7: Application layer

In our example of an HTTP request, layer 1 is the physical connection, and layer 2 contains Wi-Fi, Ethernet, and other protocols used for controlling a physical connection. Layer 3 is for IP, which is how things are routed. Layer 4 is for TCP and UDP. Layers 5 and 6 are rarely talked about, but they are where things such as SSL encapsulation and other networking wrappers happen. Finally, layer 7 is where HTTP and other user-level messages sit.

Layers are often useful for describing features of networking tools. For example, Amazon Web Services (AWS) sells two separate load balancers (LBs). Its Classic Elastic Load Balancer is a layer 4 LB: it can only route based on what IP and port a request comes in on. Amazon's newer Application Load Balancer is a layer 7 LB because it can route based on details inside the HTTP request, such as headers and the request path.

Ethernet

The physical and data link layers of networking are often generalized as "Ethernet." Technically, a lot more is going on here, but the important thing to know is that every networking device has a MAC address. MAC stands for media access control, but in modern-day networking, a MAC address is just a way of identifying a network interface, such as an Ethernet port, Wi-Fi adapter, Bluetooth connection, or cell phone antenna. Most networking devices come with a default MAC address burned into them, although you can usually change it if needed. Every device on a local network needs a unique MAC address.

A MAC address is six octets long, which means it contains 48 bits of data, so there are 2^48 possible MAC addresses. MAC addresses are usually written as 12 hex digits, for example, da:99:9c:e1:a5:f3.
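As a quick sketch of that math, this Python snippet (the format_mac helper name is my own) formats a 48-bit integer the way MAC addresses are usually written:

```python
# A MAC address is just 48 bits, usually written as six hex octets
# separated by colons.
def format_mac(value: int) -> str:
    return ":".join(f"{b:02x}" for b in value.to_bytes(6, "big"))

print(format_mac(0xDA999CE1A5F3))  # da:99:9c:e1:a5:f3
print(2 ** 48)                     # 281474976710656 possible addresses
```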

IP

IP is the protocol for routing packets around the internet. There are three common protocols riding on top of it: TCP, UDP, and ICMP. There are others as well, and you can send arbitrary data, but IP is what tells routers how to route your packets and provides a wrapper around them.

A packet is a bunch of bytes sent over the network. The first few bytes are headers that store data about the connection, and each protocol on the network adds its own. IP starts with the connection data, and inside of it sit the higher layers: layer 2 wraps layer 3, which wraps layer 4, which wraps layer 5, and so on.
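As a rough sketch of that nesting (the header bytes here are fake placeholders, not real protocol headers):

```python
# Conceptual sketch of encapsulation: each layer wraps the layer above
# with its own header (and sometimes a trailer).
http_message = b"GET / HTTP/1.1\r\nHost: natwelch.com\r\n\r\n"  # layer 7
tcp_segment = b"<tcp hdr>" + http_message                       # layer 4
ip_packet = b"<ip hdr>" + tcp_segment                           # layer 3
frame = b"<eth hdr>" + ip_packet + b"<eth trailer>"             # layer 2

print(http_message in frame)  # True: the request rides inside every layer
```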

IP promises absolutely nothing, so there is no guarantee that your data will ever make it to a host, and if you don't control that host, there is no central monitoring to see how the data gets there.

Each packet that traverses IP has a source IP address and destination IP address. It can contain other metadata as well.

I won't go into detail here, but reading about Address Resolution Protocol (ARP), Interior Gateway Protocol (IGP), and Border Gateway Protocol (BGP) will explain the complicated mess of how each router that receives a packet determines where to send the packet next on its way through the mesh of the internet toward the packet's final destination IP address.

Some books that go into detail on these topics include:

  • Computer Networks by Andrew S. Tanenbaum
  • TCP/IP Illustrated, Volume 1 by W. Richard Stevens

CIDR notation

Classless Inter-Domain Routing (CIDR) notation is a compact way to describe a range of IP addresses. CIDR notation contains an IP address, a slash, and an integer. For example, 10.0.0.0/8. This example is often read as "ten dot zero dot zero dot zero, slash 8."

To know how many IP addresses a slash (also known as a prefix) contains, you need to know powers of two, or at least how to use a calculator: 2^(address length − prefix length) tells you how many addresses. IPv4 addresses contain 32 bits, so our earlier example holds 2^(32 − 8) = 2^24 = 16,777,216 addresses. A slash eight is the largest contiguous block of IP addresses that was given out in the early days of IPv4. Organizations such as Apple, the US Postal Service, and AT&T all have /8s. The United States Department of Defense has 13 /8s, the most any one organization owns: 218,103,808 addresses.

For IPv6, there are 128 bits in an address. So, while a /32 in IPv4 is one address, in IPv6 a /32 contains 2^(128 − 32) = 2^96 addresses, which roughly equals 79 billion billion billion.
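If you'd rather not do the arithmetic by hand, Python's ipaddress module will do it for you. The IPv6 network below is the documentation example range, chosen arbitrarily:

```python
import ipaddress

# The ipaddress module does the prefix arithmetic: num_addresses is
# 2 ** (address length - prefix length).
net = ipaddress.ip_network("10.0.0.0/8")
print(net.num_addresses)            # 16777216, i.e. 2 ** (32 - 8)

v6 = ipaddress.ip_network("2001:db8::/32")
print(v6.num_addresses == 2 ** 96)  # True
```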

ICMP

The Internet Control Message Protocol (ICMP) is used for testing IP. ICMP is often blocked by modern network configurations because it can leak lots of information about infrastructure topology. That being said, ICMP is often used to debug things. The tools most commonly used are ping and traceroute. Both use ICMP packets, which are small packets with just IP data and control messages. Control messages tell receiving networking hardware what to do with the packet. ICMP packets can also be sent by hardware to tell you that you cannot connect to what you are trying to connect to.

If you want to see ICMP packets fly by, you can use tcpdump. In one window, we run sudo tcpdump -i any -v icmp. We use sudo because tcpdump needs root permissions to access all of our networking interfaces. In another window, we run ping google.com -c 1, which sends a single ICMP packet to google.com and times how long it takes to get it back:

$ ping google.com -c 1
PING google.com (172.217.10.110): 56 data bytes
64 bytes from 172.217.10.110: icmp_seq=0 ttl=55 time=213.133 ms

--- google.com ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 213.133/213.133/213.133/0.000 ms

$ sudo tcpdump -i any -v icmp
tcpdump: data link type PKTAP
tcpdump: listening on any, link-type PKTAP (Apple DLT_PKTAP), capture size 262144 bytes
02:59:58.459604 IP (tos 0x0, ttl 64, id 56927, offset 0, flags [none], proto ICMP (1), length 84)
    192.168.1.17 > lga34s15-in-f14.1e100.net: ICMP echo request, id 28065, seq 0, length 64
02:59:58.672657 IP (tos 0x0, ttl 55, id 0, offset 0, flags [none], proto ICMP (1), length 84)
    lga34s15-in-f14.1e100.net > 192.168.1.17: ICMP echo reply, id 28065, seq 0, length 64

If you want to see all of the routers in between you and a website, you can use traceroute or mtr. traceroute just does the trace once, while mtr runs it many times. Both are great—they show a bunch of data about all of the hops you pass through:

$ traceroute google.com
traceroute to google.com (172.217.10.110), 30 hops max, 60 byte packets
 1  gateway (207.251.90.49)  0.206 ms  0.209 ms  0.198 ms
 2  206.252.215.173 (206.252.215.173)  1.147 ms  1.174 ms  1.308 ms
 3  ae-12.a00.nycmny13.us.bb.gin.ntt.net (128.241.0.233)  9.558 ms  9.580 ms  9.571 ms
 4  ae-4.r07.nycmny01.us.bb.gin.ntt.net (129.250.6.66)  1.046 ms ae-4.r08.nycmny01.us.bb.gin.ntt.net (129.250.6.74)  1.038 ms  1.053 ms
 5  ae-0.a01.nycmny01.us.bb.gin.ntt.net (129.250.3.214)  3.042 ms ae-1.a01.nycmny01.us.bb.gin.ntt.net (129.250.6.69)  3.057 ms  3.013 ms
 6  ae-0.tata-communications.nycmny01.us.bb.gin.ntt.net (129.250.9.114)  5.005 ms  0.740 ms  0.774 ms
 7  72.14.195.232 (72.14.195.232)  1.200 ms  1.147 ms  1.139 ms
 8  108.170.248.33 (108.170.248.33)  2.116 ms  2.147 ms  2.305 ms
 9  216.239.62.157 (216.239.62.157)  1.119 ms  1.269 ms 216.239.62.159 (216.239.62.159)  1.232 ms
10  lga34s15-in-f14.1e100.net (172.217.10.110)  1.193 ms  1.208 ms  1.206 ms

Figure 6: mtr provides the same information as the traceroute above, but instead of performing it once, it constantly reruns the traceroute and records how it changes over time.

mtr is a curses application, which means it uses the curses library to draw constantly updating graphics in the Terminal. So, while the preceding is a screenshot, if you run mtr google.com in your Terminal, you will see data constantly updating. You can press ? to see all of the options, d to change the display, and q to quit.

UDP

Byte Offset    Bytes 0-1              Bytes 2-3
0              Source Port            Destination Port
4              Length                 Checksum
8              Data

Figure 7: A picture of a UDP packet header

The User Datagram Protocol (UDP) is a layer 4 protocol for sending data on the network. On top of the basics of IP, UDP adds destination and source ports. It also adds length, which specifies how much data follows the headers. There is a checksum in the header, although it is optional. It lets you verify the data you receive. The first 8 octets or bytes contain header information, and the next length bytes (with a max of 65,507 bytes) contain data.

UDP is fire and forget: UDP packets are not retried if they fail to be delivered or arrive corrupted.
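As a sketch of how small that header is, here is the 8-byte layout from Figure 7 packed with Python's struct module (the ports and payload are made up for illustration):

```python
import struct

# Pack the 8-byte UDP header: four 16-bit fields in network (big-endian)
# byte order, followed by the data.
payload = b"hello"
header = struct.pack(
    "!HHHH",
    40000,             # source port (made up)
    53,                # destination port (DNS, for example)
    8 + len(payload),  # length: the 8 header bytes plus the data
    0,                 # checksum; optional over IPv4, 0 means "not computed"
)
datagram = header + payload
print(len(datagram))  # 13
```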

TCP

The Transmission Control Protocol (TCP) is the main way people connect to websites. Whenever you use your browser, your browser opens hundreds of TCP connections in the background to download data.

Byte Offset    Bytes 0-1              Bytes 2-3
0              Source Port            Destination Port
4              Sequence Number
8              Acknowledgement Number
12             Metadata
16             Checksum               Urgent pointer
20             Options + Padding
               Data

Figure 8: A picture of a TCP packet header

TCP is different from UDP in that it tries to verify that all data arrives, in order and uncorrupted. Before data is sent, there is first a handshake, which agrees on the parameters of the transfer. Most of the data needed for the handshake is included in the metadata field in the packet's headers. You never really need to know about the packet headers, but it's something that people seem to love asking about in interviews. The more useful thing to remember is how the TCP handshake works, because TCP does have overhead compared to UDP: UDP just starts sending data and hopes that it gets there, while TCP makes sure the server is there and tells the server how much data it is getting.


Figure 9: Here, the source client on the left is initiating a connection with a target. It starts by doing a three-way handshake. Then, the connection is established and the client starts sending the target data. When the client is done, it initiates a graceful shutdown of the open connection with a three-way handshake again. When someone says SYN, SYN/ACK, ACK, or FIN in regards to TCP, they are referring to flags set in the TCP packet headers during the handshake process.
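A minimal sketch of that lifecycle over loopback: the kernel performs the SYN, SYN/ACK, ACK handshake inside connect() and accept(), so by the time both calls return, the connection is established:

```python
import socket
import threading

# Loopback TCP echo: one thread accepts a connection and echoes back
# whatever it receives; the main thread connects and sends data.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))      # port 0 asks the OS for any free port
server.listen(1)
port = server.getsockname()[1]

def serve():
    conn, _ = server.accept()      # server side of the handshake completes here
    conn.sendall(conn.recv(1024))  # echo the data back
    conn.close()

thread = threading.Thread(target=serve)
thread.start()

client = socket.create_connection(("127.0.0.1", port))  # handshake happens here
client.sendall(b"ping")
reply = client.recv(1024)
client.close()
thread.join()
server.close()
print(reply)  # b'ping'
```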

HTTP

Okay, now we know how packets get to a server, so let's start sending HTTP requests. HTTP (Hypertext Transfer Protocol) is a plain text protocol: you send plain text requests to a server and the server sends back plain text responses.

A standard HTTP request looks like the following:

Method Path Version
Header: value (0 or more rows)
empty line
Message body (optional)

Header names are case insensitive and cannot contain spaces; they are traditionally just ASCII characters and dashes. The simplest request looks like the following:

GET / HTTP/1.1

This request says, "Give me the content at / (commonly referred to as root). I'm using HTTP version 1.1." The problem with this in modern systems is that a single IP may be hosting hundreds of domains. This is why you should always specify a Host header with your request: HTTP/1.1 technically requires it, and many services will react differently, or not respond at all, if you do not include one.

GET / HTTP/1.1
Host: google.com
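An HTTP/1.1 request is just lines of text joined by CRLF with a blank line at the end. Here is a small Python sketch (the build_request helper name is my own) that assembles those bytes:

```python
# Assemble raw HTTP/1.1 request bytes by hand. Each line ends in CRLF,
# and a blank line separates the headers from the (here, empty) body.
def build_request(method: str, path: str, host: str) -> bytes:
    lines = [
        f"{method} {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",   # blank line ends the headers
        "",   # empty body
    ]
    return "\r\n".join(lines).encode("ascii")

print(build_request("GET", "/", "google.com"))
```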

If you want to build and send an HTTP request by hand, you can use telnet. HTTP servers traditionally run on port 80 (and HTTPS servers are traditionally on 443), so to build a request to google.com, we would type the following:

$ telnet google.com 80
Trying 2607:f8b0:4006:803::200e...
Connected to google.com.
Escape character is '^]'.
GET / HTTP/1.1
HOST: google.com
Connection: close

HTTP/1.1 301 Moved Permanently
Location: http://www.google.com/
Content-Type: text/html; charset=UTF-8
Date: Sun, 17 Jun 2018 21:19:15 GMT
Expires: Tue, 17 Jul 2018 21:19:15 GMT
Cache-Control: public, max-age=2592000
Server: gws
Content-Length: 219
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Connection: close

<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>

Note that all we typed above was:
GET / HTTP/1.1
Host: google.com
Connection: close

The Connection: close bit is because we are telling the server that we will not be sending another request after we get the full response. It's not necessary but it makes working with telnet easier. Everything else in the output is from either the telnet program or a response from Google. Let us walk line by line through Google's response to understand what it is sending us:

HTTP/1.1 301 Moved Permanently

Google is responding with HTTP version 1.1. The response has a status code of 301 and a status message of Moved Permanently. Status messages are mostly ignored these days, mainly because status codes have become relatively standardized and humans rarely read the messages from servers. There are common messages tied to status codes, but as a server operator you can return almost anything in the status message; they are for humans, not for software.

Status codes are always three digits long and are grouped by their first digit; the last two digits provide more detail:

  • 1xx: Messages are usually telling the client to continue and sometimes asking to change version
  • 2xx: Success!
  • 3xx: Redirection to somewhere (see the headers)
  • 4xx: Bad request, client error, or user error
  • 5xx: Server error
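Since the grouping is purely by first digit, recovering the category is one integer division. A small sketch:

```python
# Map a status code to its category using its first digit.
CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}

def categorize(status: int) -> str:
    return CATEGORIES[status // 100]

print(categorize(301))  # redirection
print(categorize(404))  # client error
```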

Now let us look through the headers:

Location: http://www.google.com/

The location header tells us, in the case of a 3xx response, where to make a follow-up request to get the correct content. In this specific case, it is because we included a Host header with a domain without a www prefix.

Content-Type: text/html; charset=UTF-8

This header is telling us the content of the response body is HTML and is encoded to UTF-8.

Note

UTF-8 is text encoding that is controlled by the Unicode consortium. Unicode and text encodings are very important but probably too complicated for this book. Instead, I suggest you do some research on Unicode and encodings. Stealing from Joel Spolsky (https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/), though, I will say that there is no such thing as plain text, and assuming the encoding of something you receive as ASCII or any other encoding almost always ends in a bad user experience.

Date: Sun, 17 Jun 2018 21:19:15 GMT

This header tells us when the response was sent.

Expires: Tue, 17 Jul 2018 21:19:15 GMT
Cache-Control: public, max-age=2592000

The expires and cache-control headers are used to tell the client how long it can cache the response for. expires says that at Tue, 17 Jul 2018 21:19:15 GMT this content will no longer be good. The cache-control header gives two pieces of information. public means anyone can cache this response (a user, a CDN, and so on). max-age=2592000 means that whoever caches this can cache it for 30 days or 2592000 seconds.
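As a quick check of the arithmetic (using the Date value from the response above):

```python
from datetime import datetime, timedelta

# max-age is in seconds; 2,592,000 seconds is exactly 30 days, which is
# also the gap between the Date and Expires headers in the response.
max_age = timedelta(seconds=2592000)
print(max_age)         # 30 days, 0:00:00

sent = datetime(2018, 6, 17, 21, 19, 15)  # the Date header
print(sent + max_age)  # 2018-07-17 21:19:15, matching the Expires header
```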

Server: gws

The server header is purely vanity. You can put whatever you want in there to tell clients which application served this response. In this case, Google responds with the string gws.

Content-Length: 219

The content-length header tells us that we should receive 219 bytes of data in the body of our response. Technically, this is the count of octets, not bytes, but in most modern systems, 8 bits = 1 octet = 1 byte.

X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN

These two headers are more modern security headers that tell browsers how to deal with the content.

Connection: close

This final header is agreeing with our request that the connection will close after the body is received. The response ends with an empty line and then a blob of HTML.

Note

We are not going to talk about HTTPS in detail in this book, but HTTPS is HTTP with Transport Layer Security (TLS). The goal of this is to prevent anyone between the client and the server reading the contents of the HTTP request. It does not prevent someone from seeing which host the request is going to, however. In modern situations, it is recommended that HTTPS is used everywhere to promote a baseline of security for users.

curl and wget

Two of the most popular tools for building HTTP requests from the command line are curl and wget. wget has sane defaults and downloads responses to files by default, while curl sends response data to standard output by default. In general, I prefer curl. Both tools can do the same things; I just tend to remember curl's command-line flags better, so that's what I use.

The most common thing I do with curl is to request a page, throw the content away, and look at the headers:

$ curl -svL google.com > /dev/null
* Rebuilt URL to: google.com/
*   Trying 172.217.4.206...
* TCP_NODELAY set
* Connected to google.com (172.217.4.206) port 80 (#0)
> GET / HTTP/1.1
> Host: google.com
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< Location: http://www.google.com/
< Content-Type: text/html; charset=UTF-8
< Date: Sat, 23 Jun 2018 17:20:42 GMT
< Expires: Mon, 23 Jul 2018 17:20:42 GMT
< Cache-Control: public, max-age=2592000
< Server: gws
< Content-Length: 219
< X-XSS-Protection: 1; mode=block
< X-Frame-Options: SAMEORIGIN
<
* Ignoring the response-body
{ [219 bytes data]
* Connection #0 to host google.com left intact
* Issue another request to this URL: 'http://www.google.com/'
*   Trying 216.58.194.100...
* TCP_NODELAY set
* Connected to www.google.com (216.58.194.100) port 80 (#1)
> GET / HTTP/1.1
> Host: www.google.com
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Sat, 23 Jun 2018 17:20:42 GMT
< Expires: -1
< Cache-Control: private, max-age=0
< Content-Type: text/html; charset=ISO-8859-1
< P3P: CP="This is not a P3P policy! See g.co/p3phelp for more info."
< Server: gws
< X-XSS-Protection: 1; mode=block
< X-Frame-Options: SAMEORIGIN
< Set-Cookie: 1P_JAR=2018-06-23-17; expires=Mon, 23-Jul-2018 17:20:42 GMT; path=/; domain=.google.com
< Set-Cookie: NID=133=QXSN94wGKlX7EAQRKXcaaadBdvjh5zlrRRBBpLYbIbOIn4lINCGUD53jO2DAJyvT-y0Q8-nWKuYqUpplb5H3LeztzGD5CB2taBaq98gjkX_WZu0eJIT_omJznNIDi; expires=Sun, 23-Dec-2018 17:20:42 GMT; path=/; domain=.google.com; HttpOnly
< Accept-Ranges: none
< Vary: Accept-Encoding
< Transfer-Encoding: chunked
<
{ [2143 bytes data]
* Connection #1 to host www.google.com left intact

In this example, I am passing three flags to curl:

  • -s removes a progress bar that would normally appear
  • -v prints the headers of your request
  • -L follows redirects

The > /dev/null at the end redirects the output to /dev/null, which is a device in Unix and Linux that discards the information written to it. This is useful because you can inspect both the full request and the full response you get from the servers you talk to. By default, curl makes a GET request. With the -I flag, curl makes a HEAD request instead and prints out the response headers. You can also send any method you want to a server with the -X flag:

$ curl -sv -X TEST www.google.com/ > /dev/null
*   Trying 216.58.194.100...
* TCP_NODELAY set
* Connected to www.google.com (216.58.194.100) port 80 (#0)
> TEST / HTTP/1.1
> Host: www.google.com
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 405 Method Not Allowed
< Content-Type: text/html; charset=UTF-8
< Referrer-Policy: no-referrer
< Content-Length: 1589
< Date: Sat, 23 Jun 2018 17:37:46 GMT
<
{ [1589 bytes data]
* Connection #0 to host www.google.com left intact

Here, we're sending a made-up method called TEST because HTTP methods are actually just arbitrary strings. If we wanted to, we could also send data with our request. Both the previous and the next request fail, as I'd expect, but they are repeatable tests:

$ curl -sv -X DELETE -d '' www.google.com/ > /dev/null
*   Trying 216.58.194.100...
* TCP_NODELAY set
*   Trying 2607:f8b0:4000:813::2004...
* TCP_NODELAY set
* Connected to www.google.com (216.58.194.100) port 80 (#0)
> DELETE / HTTP/1.1
> Host: www.google.com
> User-Agent: curl/7.54.0
> Accept: */*
> Content-Length: 0
> Content-Type: application/x-www-form-urlencoded
>
< HTTP/1.1 405 Method Not Allowed
< Allow: GET, HEAD
< Date: Sat, 23 Jun 2018 17:41:29 GMT
< Content-Type: text/html; charset=UTF-8
< Server: gws
< Content-Length: 1591
< X-XSS-Protection: 1; mode=block
< X-Frame-Options: SAMEORIGIN
<
{ [1589 bytes data]
* Connection #0 to host www.google.com left intact

Here, the -d flag lets you specify the request body. By default, curl turns any request into a POST request when you specify that flag; the above example overrides that default with -X to send a different method. Instead of an empty string, you could send JSON, form data, or just random bytes, whatever you need. If you start your argument to -d with an @ symbol, you can load data from a file. If your argument is @-, you can read from standard input. In the following example, we take a file, look for the string google, sort and remove duplicates, and then POST the result to google.com, which ignores what we sent because we are sending way too much data:

$ cat urls.txt | grep google | sort -u | curl -d @- -sv google.com/ > /dev/null
*   Trying 216.58.216.206...
* TCP_NODELAY set
* Connected to google.com (216.58.216.206) port 80 (#0)
> POST / HTTP/1.1
> Host: google.com
> User-Agent: curl/7.54.0
> Accept: */*
> Content-Length: 1677433
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
>
< HTTP/1.1 413 Request Entity Too Large
< Content-Type: text/html; charset=UTF-8
< Referrer-Policy: no-referrer
< Content-Length: 2398
< Date: Sat, 23 Jun 2018 18:00:20 GMT
< Connection: close
<
{ [2398 bytes data]
* Closing connection 0

curl is incredibly powerful, as is wget. Both tools can be used to explore how a service works by sending requests and data and seeing how the server responds. They can also be used to interact with services that publish REST APIs. One of my favorite things to do is to open Google Chrome Developer tools, click on the networking tab, and right-click on a request. Chrome lets you copy a request as a curl command, with all of the proper flags to repeat your request many times from the command line.
