Appendix A: Usage of Internet for Bioinformatics

RK Choudhary

School of Animal Biotechnology, GADVASU, Ludhiana

Bioinformatics is the branch of science that deals with the processing of biological data with the help of computer science. In sequencing gene or protein data, for example, voluminous amounts of data are processed, analyzed for deriving biological meaning and, ultimately, stored for further use with the help of bioinformatics. Therefore, it is a hybrid science, comprising biology and computer science, that is conceptualizing biology in terms of macromolecules (in the sense of physical chemistry) and then applying “informatics” techniques (derived from disciplines such as applied mathematics, computer science, and statistics) to understand and organize the information associated with these molecules on a large scale (Luscombe et al., 2001).

The internet is a global network of interconnected computers that enables remote communication. It grew out of ARPANET, which the Advanced Research Projects Agency (ARPA) launched in 1969; the later introduction of the Transmission Control Protocol (TCP) and the Internet Protocol (IP) (together, TCP/IP) radically changed remote communication. Each server or computer on the network is identified by a unique IP address (for example, 96.47.32.230). Since these strings of numbers are hard to remember, IP addresses are associated with a Fully Qualified Domain Name (FQDN). For example, the IP address given above is associated with www.gadvasu.in, the website of the Guru Angad Dev Veterinary and Animal Sciences University. Top‐level domains include .com, .edu, .gov, .org, and .in.
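The relationship between a numeric IP address and a domain name can be sketched in a few lines of Python. This is a minimal illustration that only inspects the two strings quoted above; it does not contact any name server, and the function name describe_address is a hypothetical helper introduced here.

```python
import ipaddress

def describe_address(ip_text, fqdn):
    """Validate a dotted IPv4 address and split a domain name into its labels."""
    ip = ipaddress.ip_address(ip_text)        # raises ValueError if the text is malformed
    labels = fqdn.rstrip(".").split(".")      # "www.gadvasu.in" -> ["www", "gadvasu", "in"]
    return {
        "version": ip.version,                # 4 for an IPv4 address
        "octets": [int(part) for part in ip_text.split(".")],
        "host": labels[0],                    # left-most label
        "top_level_domain": labels[-1],       # right-most label
    }

info = describe_address("96.47.32.230", "www.gadvasu.in")
print(info)
```

Reading the name from right to left gives the top‐level domain first (.in), then the organization, then the individual host.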

The internet provides various facilities for bioinformatics, such as:

  • Electronic mail (email);
  • Electronic journals;
  • Educational resource materials on bioinformatics;
  • Biological databases (NCBI, EBI, Genome browser, DDBJ, PDB, TIGR);
  • Software tools;
  • World Wide Web (www) searches.

Email is the most convenient way of writing, replying to and receiving mail electronically. Each sender or receiver is assigned an email address of the form username@domain. The sender can attach files, which are sent along with the message. The speed of sending an email message with attachments varies with the speed of the local area network (LAN), the time of sending (because of network congestion), the size of the attachments, and so on.
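As a rough illustration of how a message and its attachment travel together, the sketch below builds (but does not send) an email with Python's standard email package. The addresses, the file name and the short sequence are hypothetical placeholders, and build_message is a helper invented for this example.

```python
from email.message import EmailMessage

def build_message(sender, recipient, subject, body, attachment_name, attachment_bytes):
    """Assemble an email with one binary attachment; nothing is sent."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)                     # plain-text body
    msg.add_attachment(
        attachment_bytes,
        maintype="application",
        subtype="octet-stream",               # generic binary type
        filename=attachment_name,
    )
    return msg

# Hypothetical sequence file used only as sample attachment data.
fasta = b">seq1 hypothetical example sequence\nATGGCCATTGTAATGGGCCGC\n"
message = build_message(
    "sender@example.org",
    "receiver@example.org",
    "Sequence file",
    "Please find the sequence attached.",
    "seq1.fasta",
    fasta,
)
print(message["Subject"], len(message.as_bytes()))
```

The attachment is encoded into the body of the message itself, which is why large attachments make a message slower to send.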

Despite the advantages of email for transmitting messages, users do experience difficulties in transmitting files as attachments. Microsoft Exchange is a common email platform, but attachments sent across different platforms can be difficult for the receiver to decode or detach. The need therefore arose for a protocol by which files can be transferred to and from a remote server quickly. This protocol is called File Transfer Protocol (FTP). In FTP, a connection is made between the user’s computer and a remote computer, files are transferred at a faster rate, and the connection remains open until the session is over. TCP carries the files back and forth between the FTP server and the FTP client.
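An FTP session of this kind can be sketched with Python's standard ftplib module. The example is a minimal outline, assuming a server that accepts anonymous login; the function name download_file and the host and file names in the comment are hypothetical.

```python
from ftplib import FTP

def download_file(host, remote_name, local_path, user="anonymous", password=""):
    """Open an FTP session, fetch one file in binary mode, then end the session."""
    with FTP(host) as ftp:                        # opens the TCP control connection
        ftp.login(user=user, passwd=password)     # anonymous unless credentials are supplied
        with open(local_path, "wb") as out:
            # RETR asks the server to send the file; each chunk is written to disk
            ftp.retrbinary(f"RETR {remote_name}", out.write)
    # leaving the outer with-block ends the session and closes the connection

# Example call (hypothetical server, not executed here):
# download_file("ftp.example.org", "README.txt", "README.txt")
```

For a private server, the same function would be called with a valid username and password.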

In the case of public or anonymous FTP, server sign‐in may not be required, but a private FTP server requires sign‐in with a valid username and password before data transfer can begin. Data can be transferred via FTP in three different modes: stream mode, block mode, and compressed mode. In stream mode, the file is transferred as a continuous stream of bytes without any additional formatting. In block mode, data are transferred as a series of blocks, each preceded by a header that carries a descriptor and a byte count for the data that follow. In compressed mode, large data files are compressed (runs of repeated bytes are replaced by short codes) and then transferred. The need to transfer sensitive data created a demand for more security. In 1994, Netscape developed the Secure Sockets Layer (SSL) protocol; FTP protected by SSL is known as FTPS. The similarly named SFTP is a separate protocol that transfers files over a Secure Shell (SSH) connection.
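Block mode can be illustrated with a small framing sketch. It assumes the three‐byte header defined in the FTP specification (RFC 959): a one‐byte descriptor followed by a 16‐bit byte count, with descriptor value 64 marking the final block of a file. The function names to_blocks and from_blocks are hypothetical, and the sketch frames data in memory only; it is not an FTP client.

```python
import struct

EOF_DESCRIPTOR = 64     # RFC 959: this block is the last one of the file
MAX_BLOCK = 65535       # the byte count field is 16 bits wide

def to_blocks(data, block_size=1024):
    """Split data into blocks, each prefixed by descriptor (1 byte) + count (2 bytes)."""
    if not 0 < block_size <= MAX_BLOCK:
        raise ValueError("block_size must fit in a 16-bit byte count")
    chunks = [data[i:i + block_size] for i in range(0, len(data), block_size)] or [b""]
    blocks = []
    for index, chunk in enumerate(chunks):
        descriptor = EOF_DESCRIPTOR if index == len(chunks) - 1 else 0
        header = struct.pack(">BH", descriptor, len(chunk))   # big-endian header
        blocks.append(header + chunk)
    return blocks

def from_blocks(blocks):
    """Reassemble the original data by reading each header and its payload."""
    data = bytearray()
    for block in blocks:
        _descriptor, count = struct.unpack(">BH", block[:3])
        data += block[3:3 + count]
    return bytes(data)

payload = b"ACGT" * 700                 # 2800 bytes of sample data
blocks = to_blocks(payload, 1024)
print(len(blocks), from_blocks(blocks) == payload)
```

Because every block announces its own length, the receiver can tell where one block ends and the next begins, which a plain stream does not allow.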

Although FTP and SFTP are widely used to transfer files from one computer to another, they have a number of limitations. In FTP, the user has to enter a particular directory, and has access only to the files held in that directory on the server. The user cannot reach another user’s directory located on a different server. This inherent drawback led to the development of an interactive client application system called a Distributed Document Delivery System (DDDS). The DDDS commonly known as the World Wide Web (WWW) enabled navigation without prior knowledge of the location of the server, directory and information. The need to access files over the Web led to the development of the Hypertext Transfer Protocol (HTTP) as the means of obtaining information on the World Wide Web. The Web also allows documents to point to other information through links (called hyperlinks). The WWW is now the most popular way of using the internet, providing access to any web page by entering http:// in front of its address. Today, browsers do not require the user to type “http://”, because it is the default method of communication for accessing the Web.
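The way a browser fills in the protocol when the user types only an address can be sketched as follows. This is an illustrative outline, not how any particular browser is implemented; normalize_address is a hypothetical helper, and the choice of "http" as the default follows the text above.

```python
from urllib.parse import urlsplit

def normalize_address(address, default_scheme="http"):
    """Add a scheme when the user typed none, then split the URL into parts."""
    if "://" not in address:
        address = f"{default_scheme}://{address}"
    parts = urlsplit(address)
    return {
        "scheme": parts.scheme,          # protocol used to fetch the page
        "host": parts.hostname,          # server that holds the page
        "path": parts.path or "/",       # location of the page on that server
    }

print(normalize_address("www.gadvasu.in"))
print(normalize_address("https://www.ncbi.nlm.nih.gov/genbank/"))
```

The host part is what gets translated to an IP address, while the path tells the server which document is being requested.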

Browsers are the client side of a client‐server application: they connect to a remote website and download the requested information. Since the information is retrieved in a fast and continuous manner, a platform‐independent format is required to display it. The development of Hypertext Markup Language (HTML), a text‐based format, has allowed text to be displayed together with graphics, images and other information (which may be held in separate files) in a form that is standard to most users.

Example of a small HTML document

Information pertaining to the above HTML codes will appear on the website as:

Meaning of HTML notations

  • All documents in HTML start with the declaration <!DOCTYPE html>.
  • All HTML documents begin with <html> and end with </html>.
  • The information that is displayed is placed between <body> and </body>.
  • <h1> marks a first‐level heading, which ends with </h1>.
  • Likewise, an ordered (numbered) list starts with <ol> and ends with </ol>.
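The notations listed above can be checked against a small sample. The sketch below holds a short HTML document in a Python string and uses the standard html.parser module to report which tags are opened and closed. The sample document is an assumption written for this illustration (it is not the example from the original figure), and it uses the <li> tag for list items, which is not described in the list above.

```python
from html.parser import HTMLParser

# Hypothetical sample document using the notations described above.
SAMPLE_HTML = """<!DOCTYPE html>
<html>
<body>
<h1>Biological databases</h1>
<ol>
<li>NCBI</li>
<li>EBI</li>
<li>DDBJ</li>
</ol>
</body>
</html>
"""

class TagCollector(HTMLParser):
    """Record the declaration, the tags and the visible text of a document."""

    def __init__(self):
        super().__init__()
        self.declaration = None
        self.opened = []
        self.closed = []
        self.text = []

    def handle_decl(self, decl):
        self.declaration = decl              # text inside <! ... >

    def handle_starttag(self, tag, attrs):
        self.opened.append(tag)

    def handle_endtag(self, tag):
        self.closed.append(tag)

    def handle_data(self, data):
        if data.strip():                     # skip whitespace between tags
            self.text.append(data.strip())

collector = TagCollector()
collector.feed(SAMPLE_HTML)
collector.close()
print(collector.opened)
print(collector.text)
```

A browser would display this document as a heading followed by a numbered list of three items.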

The internet is also a way to find information about bioinformatics, electronic journals and online bioinformatics tools, as well as providing access to biological databases. Different types of online bioinformatics tools, and various databases of genomics, transcriptomics and proteomics, are described elsewhere in this book.
