Network programming prelude

"A program is like a poem: you cannot write a poem without writing it."

E. W. Dijkstra

Building a medium through which machines can communicate with each other over the internet is a complicated task. Many kinds of devices communicate over the internet, running different operating systems and different versions of applications, and they need a set of agreed-upon rules to exchange messages with one another. These rules of communication are called network protocols, and the messages that devices send to each other are referred to as network packets.

To separate concerns such as reliability, discoverability, and encapsulation, these protocols are divided into layers, with higher-layer protocols stacked on top of lower-layer ones. Each network packet is composed of information from all of these layers. Modern operating systems ship with a network protocol stack implementation in which each layer provides support for the layers above it.

At the lowest level, we have the Physical layer and the Data Link layer, whose protocols specify how packets are transmitted through wires between nodes on the internet and how they move in and out of network cards in computers. Ethernet and Token Ring are examples of protocols at this level. Above that, we have the IP layer, which uses unique IDs, called IP addresses, to identify nodes on the internet. Above the IP layer, we have the Transport layer, which provides end-to-end delivery between two processes on the internet. Protocols such as TCP and UDP exist at this layer. Above the Transport layer, we have Application layer protocols such as HTTP and FTP, which are used to build rich applications. This allows for a higher level of communication, such as a chat application running on mobile devices. The entire protocol stack works in tandem to facilitate these kinds of complex interactions between applications running on computers spread across the internet.

With devices connecting to each other over the internet and sharing information, distributed application architectures started to proliferate. Two models emerged: the decentralized model, popularly known as the peer-to-peer model, and the centralized model, which is widely known as the client-server model. The latter is the more common of the two these days. Our focus in this chapter will be on the client-server model of building network applications, especially at the Transport layer.

In major operating systems, the Transport layer of the network stack is exposed to developers through a family of APIs named Sockets. It includes a set of interfaces that are used to set up a communication link between two processes. Sockets let a developer pass data back and forth between two processes, either locally or remotely, without needing to understand the details of the underlying network protocols.

The Socket API's roots lie in the Berkeley Software Distribution (BSD), which in 1983 was the first operating system to provide a networking stack implementation with a socket API. It serves as the reference implementation for networking stacks in major operating systems today. In Unix-like systems, a socket follows the same "everything is a file" philosophy and exposes a file descriptor API. This means that one can read data from and write data to a socket just like a file.

A socket is represented by a file descriptor (an integer) that indexes into the descriptor table of the process, which is managed by the kernel. The descriptor table maps file descriptors to file entry structures, which contain the actual buffer for the data that's sent to the socket.
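
To make the descriptor idea concrete, here is a minimal sketch that reads the integer descriptor behind a socket. It assumes a Unix-like platform, because it relies on the Unix-only AsRawFd trait from the standard library. The helper name listener_fd is made up for this illustration, and the loopback address with port 0 (any free port) is an arbitrary choice.

```rust
use std::net::TcpListener;
use std::os::unix::io::{AsRawFd, RawFd}; // Unix-only extension trait

/// Hypothetical helper: binds a TCP listener on the loopback interface and
/// returns the raw file descriptor the kernel assigned to its socket.
/// The listener is dropped on return, which closes the socket, so the
/// returned value is only useful for inspection.
fn listener_fd() -> std::io::Result<RawFd> {
    // Port 0 asks the operating system to pick any free port.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    Ok(listener.as_raw_fd())
}

fn main() -> std::io::Result<()> {
    let fd = listener_fd()?;
    println!("socket file descriptor: {}", fd);
    Ok(())
}
```

The exact number printed depends on which descriptors the process already has open, so it will vary between runs and environments.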

The Socket API acts primarily at the TCP/IP layer. At this layer, the sockets that we create can be categorized along several dimensions:

  • Protocol: Depending on the protocol, we can either have a TCP socket or a UDP socket. TCP is a stateful, connection-oriented streaming protocol that delivers messages reliably and in order, whereas UDP is a connectionless protocol that offers no delivery guarantees.
  • Communication kind: Depending on whether we are communicating with processes on the same machine or processes on remote machines, we can either have internet sockets or Unix domain sockets. Internet sockets are used for exchanging messages between processes on remote machines. An internet socket is identified by a tuple of an IP address and a port. Two processes that want to communicate remotely must use internet sockets. Unix domain sockets are used for communication between processes that run on the same machine. Instead of an IP address-port pair, a Unix domain socket is identified by a filesystem path. For instance, databases often use Unix domain sockets to expose local connection endpoints.
  • I/O model: Depending on how we read and write data to a socket, we can create sockets of two kinds: blocking sockets and non-blocking sockets.
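
As an illustration of the "communication kind" category, the following sketch exchanges a message over a Unix domain socket that is addressed by a filesystem path rather than an IP address-port pair. It assumes a Unix-like platform and uses the standard library's std::os::unix::net types. The function name unix_socket_roundtrip and the socket path in the temporary directory are hypothetical choices made for this example.

```rust
use std::io::{Read, Write};
use std::os::unix::net::{UnixListener, UnixStream};
use std::path::Path;
use std::thread;

/// Hypothetical helper: sends `message` to a server listening on the Unix
/// domain socket at `path` and returns the bytes the server received.
fn unix_socket_roundtrip(path: &Path, message: &[u8]) -> std::io::Result<Vec<u8>> {
    // A stale socket file left by an earlier run would make bind fail.
    let _ = std::fs::remove_file(path);
    let listener = UnixListener::bind(path)?;

    // The server accepts one connection and reads until the client closes it.
    let server = thread::spawn(move || -> std::io::Result<Vec<u8>> {
        let (mut stream, _addr) = listener.accept()?;
        let mut received = Vec::new();
        stream.read_to_end(&mut received)?;
        Ok(received)
    });

    {
        let mut client = UnixStream::connect(path)?;
        client.write_all(message)?;
    } // Dropping the client closes the connection, which ends read_to_end.

    let received = server.join().expect("server thread panicked")?;
    std::fs::remove_file(path)?; // Clean up the socket file.
    Ok(received)
}

fn main() -> std::io::Result<()> {
    // Assumed location: a per-process file name in the system temp directory.
    let path = std::env::temp_dir().join(format!("sketch-{}.sock", std::process::id()));
    let received = unix_socket_roundtrip(&path, b"ping")?;
    println!("server received: {}", String::from_utf8_lossy(&received));
    Ok(())
}
```

Both ends run in one process here only to keep the sketch self-contained; in practice the listener and the client would be separate processes on the same machine.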

Now that we know more about sockets, let's explore the client-server model a bit more. In this model of networking, the usual flow of setting up two machines to communicate with each other follows this process: the server creates a socket for a given protocol, which can be TCP or UDP, and binds it to an IP address-port pair. It then starts listening for connections from clients. The client, on the other hand, creates a connecting socket and connects to the given IP address and port. In Unix, processes can create a socket using the socket system call. This call gives back a file descriptor that the program can use to perform read and write calls to the client or to the server.
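
The flow described above can be sketched with the standard library's TcpListener and TcpStream types. This is a minimal illustration under a few assumptions: both ends run in one process on the loopback interface, the server handles a single connection, and the helper name echo_once is made up for the example.

```rust
use std::io::{Read, Write};
use std::net::{Shutdown, TcpListener, TcpStream};
use std::thread;

/// Hypothetical helper: starts a one-shot echo server, connects a client to
/// it, sends `message`, and returns what the server echoed back.
fn echo_once(message: &[u8]) -> std::io::Result<Vec<u8>> {
    // Server side: create a socket, bind it, and start listening.
    // Port 0 lets the operating system choose a free port.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;

    let server = thread::spawn(move || -> std::io::Result<()> {
        // accept blocks until a client connects.
        let (mut stream, _peer) = listener.accept()?;
        let mut request = Vec::new();
        // Read until the client shuts down its write half.
        stream.read_to_end(&mut request)?;
        stream.write_all(&request)?;
        Ok(())
    });

    // Client side: create a connecting socket and connect to the server.
    let mut client = TcpStream::connect(addr)?;
    client.write_all(message)?;
    // Signal "no more data" so the server's read_to_end can finish.
    client.shutdown(Shutdown::Write)?;

    let mut reply = Vec::new();
    client.read_to_end(&mut reply)?;

    server.join().expect("server thread panicked")?;
    Ok(reply)
}

fn main() -> std::io::Result<()> {
    let reply = echo_once(b"hello")?;
    println!("echoed: {}", String::from_utf8_lossy(&reply));
    Ok(())
}
```

Shutting down the client's write half is one simple way to mark the end of a request; real protocols usually frame messages with a length prefix or a delimiter instead.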

Rust provides us with the net module in the standard library. This contains the aforementioned networking primitives on the Transport layer. For communicating over TCP, we have the TcpStream and TcpListener types. For communicating over UDP, we have the UdpSocket type. The net module also provides proper data types for representing IP addresses and supports both v4 and v6 versions.
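
As a brief sketch of the address types mentioned above, the following example builds one IPv4 and one IPv6 socket address and inspects them. The describe function and the port number 8080 are arbitrary choices for illustration.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr, SocketAddr};

/// Hypothetical helper: reports the IP version, address, and port of a
/// socket address.
fn describe(addr: &SocketAddr) -> String {
    match addr.ip() {
        IpAddr::V4(ip) => format!("IPv4 {} port {}", ip, addr.port()),
        IpAddr::V6(ip) => format!("IPv6 {} port {}", ip, addr.port()),
    }
}

fn main() {
    // Construct an address from its parts...
    let v4 = SocketAddr::new(IpAddr::V4(Ipv4Addr::new(127, 0, 0, 1)), 8080);
    // ...or parse one from its textual form.
    let v6: SocketAddr = "[::1]:8080".parse().expect("valid socket address");

    println!("{}", describe(&v4)); // prints "IPv4 127.0.0.1 port 8080"
    println!("{}", describe(&v6)); // prints "IPv6 ::1 port 8080"

    assert!(v4.ip().is_loopback());
    assert_eq!(v6.ip(), IpAddr::V6(Ipv6Addr::LOCALHOST));
}
```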

Building network applications that are reliable involves several considerations. If you are okay with a few packets getting dropped between message exchanges, you can go with UDP sockets, but if you cannot afford to have packets dropped or need messages delivered in sequence, you must use TCP sockets. UDP is a lightweight protocol that caters to needs where you require minimal latency in the delivery of packets and can deal with a few packets being dropped. For example, a video chat application can use UDP, because you aren't particularly affected if a few frames drop from the video stream. UDP is used in cases where you can tolerate the absence of delivery guarantees. We'll focus our discussion on TCP sockets in this chapter, as TCP is the protocol used by the majority of network applications that need to be reliable.
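
For contrast with TCP, here is a minimal sketch of sending a single datagram with UdpSocket. There is no connection setup and no acknowledgement, so the receiver sets a read timeout rather than waiting forever for a packet that might never arrive. The helper name udp_roundtrip, the two-second timeout, and the 1500-byte buffer are assumptions made for this illustration.

```rust
use std::net::UdpSocket;
use std::time::Duration;

/// Hypothetical helper: sends `payload` as one datagram over the loopback
/// interface and returns the bytes the receiving socket got.
fn udp_roundtrip(payload: &[u8]) -> std::io::Result<Vec<u8>> {
    // Port 0 lets the operating system choose free ports for both sockets.
    let receiver = UdpSocket::bind("127.0.0.1:0")?;
    let sender = UdpSocket::bind("127.0.0.1:0")?;

    // UDP gives no delivery guarantee, so bound how long we wait.
    receiver.set_read_timeout(Some(Duration::from_secs(2)))?;

    // No connect or accept step: the datagram is simply addressed and sent.
    sender.send_to(payload, receiver.local_addr()?)?;

    let mut buf = [0u8; 1500];
    let (n, _from) = receiver.recv_from(&mut buf)?;
    Ok(buf[..n].to_vec())
}

fn main() -> std::io::Result<()> {
    let received = udp_roundtrip(b"frame-1")?;
    println!("received {} bytes", received.len());
    Ok(())
}
```

On the loopback interface a datagram is very unlikely to be lost, but over a real network the recv_from call could time out, and the application would have to decide whether to retry or carry on.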

Another factor to consider is how well and how efficiently your application is able to serve clients. From a technical standpoint, this translates to choosing the I/O model of sockets.

I/O is an acronym for Input/Output, and in this context, it is a catch-all phrase that simply denotes reading and writing bytes to sockets.

Choosing between blocking and non-blocking sockets changes an application's architecture, the way we write our code, and how it scales to clients. Blocking sockets give you a synchronous I/O model, while non-blocking sockets let you do asynchronous I/O. On platforms that implement the Socket API, such as Unix, sockets are created in blocking mode by default. This means that the default I/O model in major network stacks is the synchronous one. Let's explore both of these models next.
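
The difference between the two modes can be seen in a small sketch. A listener starts out in blocking mode, where accept waits for a client; after set_nonblocking(true), the same call returns immediately with a WouldBlock error when no client is waiting. The helper name accept_would_block is made up for this example.

```rust
use std::io::ErrorKind;
use std::net::TcpListener;

/// Hypothetical helper: switches a fresh listener to non-blocking mode and
/// reports whether accept returned WouldBlock instead of waiting.
fn accept_would_block() -> std::io::Result<bool> {
    // Port 0 lets the operating system choose a free port.
    let listener = TcpListener::bind("127.0.0.1:0")?;

    // Sockets are blocking by default; opt in to non-blocking mode.
    listener.set_nonblocking(true)?;

    match listener.accept() {
        // A client happened to be waiting already.
        Ok(_) => Ok(false),
        // No pending connection: the call returns at once instead of blocking.
        Err(e) if e.kind() == ErrorKind::WouldBlock => Ok(true),
        // Any other error is a genuine failure.
        Err(e) => Err(e),
    }
}

fn main() -> std::io::Result<()> {
    println!("accept would block: {}", accept_would_block()?);
    Ok(())
}
```

With a non-blocking socket, the program has to decide what to do when an operation is not ready yet, for example retrying later or waiting for a readiness notification.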
