CHAPTER

2

Web Pages Using Web Standards

Chapter Objectives

• Introduce fundamental concepts of web computing

• Introduce XHTML and Cascading Style Sheets

• Introduce how HTTP supports web browser and web server interactions

 

2.1 Overview

 

A web browser is a graphical user interface through which a user interacts with various web applications. A web browser communicates over the Internet with the web servers that host the web applications. A web browser can send requests to a web server for data or services. The web server replies with response data in a language called HTML, short for Hypertext Markup Language. The web browser then presents the response data to the user according to rendering directives or defaults.

The focus of this chapter is the introduction to the basics of hypertext markup languages. There are several variations of such languages in use now. The popular HTML version 4 is more lenient to syntax errors and has limited support for presenting data in various presentation devices such as PCs, PDAs, or cellular phones. XHTML, short for Extensible Hypertext Markup Language, rewrites HTML in XML (Extensible Markup Language, to be covered in Chapter 3) for better supporting flexible data presentation on different devices. At this time the browser support for XHTML is still limited. This chapter introduces a subset of XHTML that is supported by any web browser that supports the traditional HTML version 4, and the chapter treats HTML and XHTML as synonyms.

A typical web application has four tiers: the presentation tier on the client side (web browsers), the presentation tier on the web servers, the business logic tier on the application servers, and the database tier on the database servers. HTML (XHTML) will be introduced for defining logical data (contents) structures, and Cascading Style Sheets, or CSS, will be introduced as an important mechanism for defining presentation styles of HTML elements. Because XHTML is a special dialect of XML, the HTML introduction in this chapter also serves as the first-iteration introduction to XML discussed in the following chapter.

Web browsers and web servers communicate through a simple application protocol named HTTP, short for Hypertext Transfer Protocol, on top of the TCP/IP network transportation layer. This chapter will explain HTTP basics and how HTML forms can be used as the main mechanism for submitting user data to a web server application.

2.2 HTML Basics

 

HTML is a markup language. An HTML document is basically a text document marked up with instructions for logical document structure and document presentation. There are multiple versions of HTML. Whereas the earlier HTML versions used a more relaxed syntax and focused more on document presentation than on document structure, the latest HTML, called XHTML, uses the stricter and more standard XML syntax to mark up text document structures and depends on the separate CSS to control the presentation of the document. This separation of document structure and document presentation, even though not complete yet, is essential for supporting the same document's being rendered by various modern presentation devices, including PCs and cell phones, that must use different presentation markups. The HTML concepts and examples in this chapter are based on XHTML 1.0, which is now supported by all the latest web browsers, including Microsoft's Internet Explorer and Mozilla's Firefox.

2.2.1 Tags, Elements, and Attributes

An HTML tag name is a predefined keyword, such as html, body, head, title, p, b, all in lowercase, for describing document structure or presentation.

A tag name is used in the form of a start tag or an end tag. A start tag is a tag name enclosed in angle brackets, < and >, like <html> and <p>. An end tag is the same as the corresponding start tag, except that it has a forward slash / immediately before the tag name, like </html> and </p>.

An element consists of a start tag and a matching end tag based on the same tag name, with optional text or other elements, called the element value, between them. The following are some element examples:

[Listing not reproduced here.]

Although the elements can be nested, they cannot be partially nested: the end tag of an element must come after the end tags of all its nested elements (first starting, last ending). The following example is not a valid element because it violates the above rule:

[Listing not reproduced here.]
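As a hedged illustration of the nesting rule (the sentence text is invented for this sketch), the first element below is properly nested, while the second is partially nested and therefore invalid:

```html
<!-- Valid: the nested b element ends before its enclosing p element ends. -->
<p>This text is <b>properly nested</b>.</p>

<!-- Invalid: the p element ends while the b element is still open. -->
<p>This text is <b>partially nested</p></b>
```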

The newline character, the tab character, and the space character are collectively called the white-space characters. A sequence of white-space characters acts like a single space for the web browser's data presentation. Therefore, in normal situations, an HTML document's formatting is not important (it will not change its presentation in web browsers) as long as you do not remove all white-space characters between successive words. As a result, the following two html elements are equivalent:

[Listing not reproduced here.]

If an element contains no value, the start tag and the end tag can be combined into one tag as <tagName/> (there are some special tags, like script, for which such a combination cannot be used). Therefore, the following two p elements are equivalent:

[Listing not reproduced here.]

The start tag of an element may contain one or more attributes, each in the form attributeName="attributeValue". The following is a p element with two attributes:

[Listing not reproduced here.]

If an attribute value contains quotes, they should be the single quote, ', as in

[Listing not reproduced here.]

Here we use the “style” attribute to set the font to present the p element's value: boldface, 24 pixels, first choice is font family “Times New Roman,” and the second choice is font “serif.”
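The attribute syntax described above can be sketched as follows. The id value and the paragraph text are assumptions made for illustration; the style value follows the description in the text.

```html
<!-- A p element with two attributes: id and style.
     The style value is delimited by double quotes, so the font name
     that contains spaces is wrapped in single quotes. -->
<p id="greeting" style="font: bold 24px 'Times New Roman', serif">
  Hello, world!
</p>
```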

2.2.2 Basic Structure of an HTML File

A basic XHTML 1.0 file that is compatible with HTML 4.0 must start with a DOCTYPE declaration for HTML's root element html, followed by a single html element. The DOCTYPE declaration specifies a uniform resource identifier (URI; a unique string for identifying a network resource that may not be an address for accessing the resource), “-//W3C//DTD XHTML 1.0 Transitional//EN”, for the version of HTML used in the current file, as well as a uniform resource locator (URL), “http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd”, for accessing the DTD (document type definition; to be introduced in the next chapter) file that defines the syntax of that version of HTML. Such long strings should not be broken by newline characters in your files, even though we sometimes have to break them in the book samples because of the page's limited text width.

Like any XML file, an HTML file can contain only one root element (an element that is not nested inside another element). All the other text and elements must be nested inside this root element. For HTML, this root element is html. For XHTML 1.0 files, the start tag of an html element must have a namespace attribute, xmlns, with value http://www.w3.org/1999/xhtml. There are many different specifications of html elements, and this attribute specifies a particular specification of html elements that is adopted by XHTML 1.0.

An html element must contain exactly one body element, which encloses much of the document data. An optional head element can appear before the body element to specify a title of the document to be displayed in the title bar of the web browser window, and any JavaScript code and CSS directives, which will be covered later in this chapter.

The following is a sample HTML skeleton that you can use as the starting point of your own HTML files. Be aware that all HTML element and attribute names are in lowercase, but DOCTYPE must be in uppercase. All quoted strings in an HTML file, as well as those in XML and program files, must be typed on one line, even though we sometimes have to break them in our book examples because of limited page width, as in the third line of the following HTML skeleton. When a quoted string value must be printed on two lines, a continuation symbol is put at the end of the first line to indicate that the two lines belong on the same line in HTML files. The following introduction to HTML features will use only incomplete HTML pieces. To try them out, just copy them into the body element of this skeleton and display the resulting file in a web browser.

[Listing not reproduced here.]
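A minimal skeleton consistent with the description in this section might look like the sketch below. The title text and the comment are placeholders, and no quoted string is split across lines here, so the layout may differ from the printed listing.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <!-- Shown in the title bar of the web browser window. -->
    <title>Page Title</title>
  </head>
  <body>
    <!-- Paste the HTML pieces from this chapter here. -->
  </body>
</html>
```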

In this chapter, many HTML elements will be introduced in generic terms. For example, element h1 is introduced for creating large-size headings. Most of the presentation details, like which font is used, in which size, and how the heading is aligned on its line, are not specified. This is because HTML is supposed to specify a document's logical structure, and the document's presentation should be specified by CSS, which will be covered in a later section of this chapter. Each type of web browser has a default way to present these elements, and CSS specifications can be used to change the default presentation.

2.2.3 Basic HTML Elements

2.2.3.1 Creating Headings, Paragraphs, and Line Breaks, and Formatting Text

HTML supports elements h1, h2, h3, h4, h5, and h6 to create headings in decreasing font size.

Element p is used to create paragraphs. There is extra vertical space between successive paragraphs. White-space characters (new-line, tab, and space) are used only to separate successive words, and a sequence of white-space characters is equivalent to just one. A new-line character will not break a line in a web browser presentation. To break the current line but avoid the extra space introduced by a new paragraph, use a br element in the form <br/>.

Element b, like <b>text</b>, will present its text in boldface.

Element i, like <i>text</i>, will present its text in italic. Elements b and i can be nested, as <b><i>text</i></b>, to present text in bold italic.

Element tt, like <tt>text</tt>, will present its text in a monospace font.

The text inside a pre element will be presented in a monospace font, with all white-space characters preserved. Elements b and i can be used inside pre elements.

An empty hr element, <hr/>, can be used to create a horizontal line on a web page.

The following is an HTML piece using the preceding elements and its web browser presentation (copy the HTML piece into the body element of the HTML skeleton file and load the skeleton file in a web browser).

[Listing and browser screenshot not reproduced here.]
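The following piece is a sketch that exercises the elements introduced in this subsection; all text content is invented for illustration. Copy it into the body element of an HTML skeleton file to see how a browser presents it.

```html
<h1>Main Heading</h1>
<h2>Subheading</h2>
<!-- The extra spaces and the newline inside this paragraph collapse to single spaces. -->
<p>This   paragraph   contains
   extra white space that the browser ignores.</p>
<!-- br breaks the line without adding paragraph spacing. -->
<p>First line of the second paragraph.<br/>
Second line of the same paragraph.</p>
<p><b>bold</b>, <i>italic</i>, <b><i>bold italic</i></b>, and <tt>monospace</tt></p>
<hr/>
<!-- pre preserves white space; b and i still work inside it. -->
<pre>
Name      Score
<b>Alice</b>     <i>90</i>
</pre>
```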

2.2.3.2 Creating Lists

The ul (unordered list) elements can be used to create a bullet list, in which each item is an li element. The following is an unordered list with two items:

[Listing not reproduced here.]

The ol (ordered list) elements can be used to create a numbered list, in which each item is an li element. The following is an ordered list with two items:

[Listing not reproduced here.]

The ul elements support attribute style with values of form "list-style-type: type", where type could be disc (filled circle, the default), circle (unfilled circle), and square (filled square).

The ol elements support attribute style with values of form "list-style-type: type", where type could be decimal (1, 2, 3, …, the default), lower-roman (i, ii, iii, iv, …), and lower-alpha (a, b, c, …). The ol elements also support attribute start for specifying the starting number/letter. For example, the first item of the following ordered list has sequence number 2.

[Listing not reproduced here.]

The li elements in an ol element can use attribute value to specify a sequence number out of order. For example, the second item of the following ordered list has sequence number 3.

[Listing not reproduced here.]
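The list features above can be sketched as follows; the item text is invented for illustration.

```html
<!-- Bullet list with square bullets instead of the default discs. -->
<ul style="list-style-type: square">
  <li>Apples</li>
  <li>Oranges</li>
</ul>

<!-- Numbering starts at 2 because of the start attribute. -->
<ol start="2">
  <li>This item is numbered 2</li>
  <li>This item is numbered 3</li>
</ol>

<!-- The value attribute gives the second item sequence number 3. -->
<ol>
  <li>This item is numbered 1</li>
  <li value="3">This item is numbered 3</li>
</ol>
```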

The following is an HTML piece using the preceding elements and its web browser presentation (copy the HTML piece into the body element of the HTML skeleton file and load the skeleton file in a web browser). Make sure that you understand why the web browser presents this way.

[Listing and browser screenshot not reproduced here.]

2.2.3.3 Inserting Special Characters

Not all characters have corresponding keys on a computer keyboard. Also, characters <, >, and & are metacharacters in HTML, and web browsers will try to interpret them as part of the markup, so they cannot appear literally in document text.

Like XML, HTML uses entities to specify those special characters. An HTML (XML) entity can be specified with syntax &code;, where code is either a predefined entity name or the pound sign, #, followed by the character's number. Only some popular entities have entity names. Table 2.2.1 shows the most useful HTML entity definitions.

The following is an HTML piece using the preceding entities and its web browser presentation.

[Listing and browser screenshot not reproduced here.]

Table 2.2.1 Popular HTML Entities

Symbol | Entity Name | Entity Number
& (ampersand) | &amp; | &#38;
< (less than) | &lt; | &#60;
> (greater than) | &gt; | &#62;
" (straight double quote) | &quot; | &#34;
' (straight single quote) | &apos; | &#39;
(space) | | &#32;
(nonbreaking space) | &nbsp; | &#160;
(tab) | | &#09;
© (copyright) | &copy; | &#169;
† (dagger) | &dagger; | &#8224;
“ (curly double start quote) | &ldquo; | &#8220;
” (curly double end quote) | &rdquo; | &#8221;
‘ (curly single start quote) | &lsquo; | &#8216;
’ (curly single end quote) | &rsquo; | &#8217;
. (period) | | &#46;
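As a hedged sketch of how entities from Table 2.2.1 appear in a document (the sentences and the company name are invented for illustration):

```html
<!-- &lt; and &gt; display angle brackets instead of starting a tag. -->
<p>Write &lt;p&gt; to start a paragraph and &lt;/p&gt; to end it.</p>
<!-- &amp; displays an ampersand; &copy; displays the copyright sign. -->
<p>Tom &amp; Jerry, &copy; Example Company</p>
<!-- &nbsp; keeps the two words on the same line. -->
<p>Chapter&nbsp;2 uses &quot;straight&quot; quotes and a dagger &#8224;.</p>
```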

2.2.3.4 Applying Colors

For any HTML element that can contain text as its value, like body and p, you can apply a foreground color property for rendering its text by assigning value color: color to its style attribute, and apply a background color property for the text by assigning value background-color: color to its style attribute, where aqua, black, blue, gray, green, lime, navy, red, silver, white, and yellow are just a few examples of predefined color values for color. You can search “HTML color” on the Web to find more HTML color choices, or you can define your own colors.

If a style attribute specifies more than one property, the successively specified properties should be separated by a semicolon. For example, the following example specifies navy as the body's background color and blue as its foreground color.

[Listing not reproduced here.]
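A sketch of the color properties described above follows. The body colors come from the text; the paragraph content and the second paragraph's colors are assumptions chosen for illustration.

```html
<!-- Two properties in one style attribute, separated by a semicolon. -->
<body style="background-color: navy; color: blue">
  <p>Blue text on a navy background, inherited from body.</p>
  <!-- This paragraph overrides both colors for its own text. -->
  <p style="color: white; background-color: gray">White text on gray.</p>
</body>
```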

2.2.3.5 Creating Hyperlinks and Anchors

Each web page on the Internet has a URL to identify its location. A typical URL has the following format:

http://domain-name/application/resource

where domain-name is a unique name to identify a server computer on the Internet, like www.amazon.com; application is a server-side folder containing all resources related to an application or service; and resource could be the name (or alias or nickname) of an HTML or script/program file residing on a server disk, where the script or program can generate an HTML file on the fly from data submitted by a user. The domain name could be replaced by an IP address, which is four decimal numbers, each between 0 and 255, separated by periods, like 108.168.1.2. Fundamentally, all server computers are identified by their unique IP addresses, and the domain names are just nicknames for the IP addresses so that they will be easier for people to remember. More explanation for URLs will be provided in Section 2.4.1 on page 46.

An HTML file can contain hyperlinks to other web pages so that users can click on them to visit different web pages. A hyperlink has the general structure of <a href="url">Hyperlink Text</a>. The web page linked to by the hyperlink is called the target page of the hyperlink. By default, a web browser will display a hyperlink's text with an underline, and the hyperlink will be a different color on the basis of whether the hyperlink has been visited (clicked) or whether the mouse cursor is hovering on the hyperlink. For example,

[Listing not reproduced here.]

is a hyperlink to Google's home page. Many websites define a “welcome page” so that if a user uses a URL for the website without the resource name, the welcome page will be returned. Because Google has defined “index.html” as its welcome page, the following hyperlink will have the same effect as the previous one:

[Listing not reproduced here.]

The preceding URLs are also called absolute paths for web pages. An absolute path can be used in any web page as a hyperlink target independent of the page's own URL. If a web page needs to link to another web page on the same web server, say, in the same web server directory, then you can use a shorter relative path, which is a path relative to the current page's location. Let us use a scenario to illustrate relative paths. Assume that a web application has three nested directories a/b/c; directory a contains directory b and file a.html; directory b contains directory c and files b1.html and b2.html; and directory c contains file c.html. File b1.html can use hyperlink <a href="../a.html">Link A</a> to link to file a.html, where “../” represents the parent directory of directory b; use hyperlink <a href="b2.html">Link B</a> to link to file b2.html; and use hyperlink <a href="c/c.html">Link C</a> to link to file c.html. The forward slash, /, used in relative paths is operating system independent.
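The relative-path scenario above can be summarized in one piece. The hyperlinks are those from the text; the comments show which file each one resolves to when the piece is placed in b1.html.

```html
<!-- Contents of a/b/b1.html -->
<a href="../a.html">Link A</a>  <!-- up one directory: a/a.html -->
<a href="b2.html">Link B</a>    <!-- same directory: a/b/b2.html -->
<a href="c/c.html">Link C</a>   <!-- down one directory: a/b/c/c.html -->
```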

By default, clicking on a hyperlink will cause the target page of the hyperlink to replace the current page in a web browser. You can also use a target attribute in an a element to display the target page in a new web browser instance, as in the following:

[Listing not reproduced here.]

You can also use a hyperlink to send emails. You just need to use a URL of the form mailto:email-address. If a user clicks on the following hyperlink, the user's default mail application will be started with the hyperlink's email address filled in its To text field.

[Listing not reproduced here.]

You can also specify a subject for the email by using a query string of form “?subject=Title” (refer to Section 2.4.1 on page 46 for the definition of query strings). When a user clicks on the following hyperlink, the default mail application will be launched with the hyperlink's email address in its To text field and Comment in its Subject text field.

[Listing not reproduced here.]

You can also display a tooltip when a user puts the mouse cursor on top of the hyperlink by using a title attribute of an a element. When a user puts the mouse cursor on top of the following hyperlink, tooltip “Comment on the topic” will be displayed next to the cursor.

[Listing not reproduced here.]
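The hyperlink variations discussed so far can be sketched as follows. The link text, the address someone@example.com, and the target value _blank are assumptions made for illustration; they are not taken from the original listings.

```html
<!-- Basic hyperlink to a target page. -->
<a href="http://www.google.com">Google</a>

<!-- Open the target page in a new browser window or tab. -->
<a href="http://www.google.com" target="_blank">Google (new window)</a>

<!-- Start the default mail application with the To field filled in. -->
<a href="mailto:someone@example.com">Send email</a>

<!-- Also fill in the Subject field, and show a tooltip on hover. -->
<a href="mailto:someone@example.com?subject=Comment"
   title="Comment on the topic">Send a comment</a>
```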

So far you have been using hyperlinks to link to separate web pages. You can also use hyperlinks to link to specific anchors on the same page or other web pages. When a user clicks on such a hyperlink, the web browser will jump to display the text close to the anchor. This feature is useful for long documents. An anchor is like a bookmark in an HTML file that can be used as the target of a hyperlink. To define an anchor for the word Conclusion in an HTML document test.html, make Conclusion the value of an a element, as in

[Listing not reproduced here.]

where the value of attribute name can be any string. To make a hyperlink to this anchor in the same file, you can use a hyperlink like

[Listing not reproduced here.]

To make a hyperlink to this anchor from another file in the same directory, you can use a hyperlink like

[Listing not reproduced here.]
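A sketch of defining and linking to an anchor follows. The anchor name conclusion and the link text are assumptions; the file name test.html comes from the text.

```html
<!-- In test.html: define an anchor around the word Conclusion. -->
<a name="conclusion">Conclusion</a>

<!-- In test.html: jump to the anchor within the same file. -->
<a href="#conclusion">Go to the conclusion</a>

<!-- In another file in the same directory: jump to the anchor in test.html. -->
<a href="test.html#conclusion">Conclusion of test.html</a>
```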

2.2.3.6 Creating Tables

Tables are a popular format for presenting data. Until the adoption of CSS, tables had also been used to format web page layout.

A table consists of a few rows, and each row is further divided into a few data fields. In HTML, a table element encapsulates all the table rows, a tr (table row) element specifies each row, and a td (table data) element specifies each data field. A th element is similar to a td element except that it is used to specify table header fields, which are presented in a different style from that of the table data. A table can also have an optional caption created with a caption element. The following is a basic table with default properties:

[Listing not reproduced here.]

By default, a table does not have a border. You can use the border attribute of a table element to add a solid border of width 1 pixel (px) by rewriting the start tag as <table border="1">. You can use the width attribute of a td or th element to set the width of a column, as in <th width="100px">. You can set the width of a table column by setting the width of any single th or td element in this column. If a data field needs to use two columns and there is a column to its right, you can use a colspan attribute of the td element to combine the two neighboring data fields, as in <td colspan="2">. The following shows the preceding table with the addition of the new features.

[Listing and browser screenshot not reproduced here.]
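A table using the features described above might look like the following sketch. The caption, headers, and cell values are invented for illustration, and the column width is given as a plain pixel count.

```html
<table border="1">
  <caption>Sample Sales</caption>
  <tr>
    <!-- Setting the width on one cell sets the width of its column. -->
    <th width="100">Region</th>
    <th>Q1</th>
    <th>Q2</th>
  </tr>
  <tr>
    <td>East</td>
    <td>10</td>
    <td>12</td>
  </tr>
  <tr>
    <td>West</td>
    <!-- This data field spans the Q1 and Q2 columns. -->
    <td colspan="2">No data</td>
  </tr>
</table>
```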

At this point you may hope that the text in all the td elements will be centered. Yes, you can do it here, but you need to repeat the text alignment property for each of the five td elements, which is tedious work. Later you will see how you can use CSS to customize table presentation in a more efficient way.

2.2.3.7 Inserting Graphics

Graphics can make a web page lively and eye-catching. They are important for user-friendly websites. There are three popular graphics formats for web page design. Graphics Interchange Format (GIF) represents each pixel in 8 bits and thus can support only 256 colors. GIF files are compressed without loss of quality. Many graphics applications can be used to make the background of a GIF file transparent and thus easier to mingle with neighboring text or to make a simple animation by integrating a series of images into one GIF file. GIF is the recommended format for images created with graphics applications, like simple icons.

On the other hand, Joint Photographic Experts Group (JPEG, JPG) format represents each pixel with 24 bits and thus supports about 16.7 million colors (2^24 = 16,777,216). You can trade off JPEG file size with image quality: the higher the compression rate, the more loss of precision. JPEG files do not support transparent background or built-in animation, as GIF files do. JPEG files are recommended for images created with cameras.

Portable Network Graphics (PNG) is a newer graphics format that combines advantages of GIF and JPEG and avoids a patent issue with GIF. A true-color PNG image uses 24 or 48 bits to represent a pixel. PNG format supports lossless compression and transparent background; unlike GIF, the standard PNG format does not support built-in animation. Because more web browsers are supporting it, PNG is recommended for new web graphics that do not need animation.

An img element can be used to insert an image at the current web page location, as in <img src="tomcat.gif" />, where attribute src is used to specify the image file name. You can also use the img element's width and height attributes to specify the width and height of the image in pixels, and you can use the alt attribute to specify a short text description for the image that will be presented only when the web browser cannot present the image. Normally you do not specify image width and height at the same time because doing so might change the original image's aspect ratio, as you see in the second image in the following example. You can also use an image as a hyperlink, as for the third, smaller tomcat image in the following example. Here an a element's target attribute is used to present the target image in a new web browser window or tab, and its title attribute is used to set a tooltip that will show a message when the mouse cursor is put on the hyperlink image. By default, an image embedded in a hyperlink has a border. To remove this border, you can use the img element's style attribute and set its value to "border: none", as in <img src="tomcat.gif" width="40" style="border: none" />.

[Listing and browser screenshot not reproduced here.]
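The three image uses described above can be sketched as follows. The file name tomcat.gif and the width of 40 come from the text; the alt text, the other sizes, the tooltip message, and the target value _blank are assumptions.

```html
<!-- 1. Image at its original size. -->
<img src="tomcat.gif" alt="Tomcat logo" />

<!-- 2. Both width and height set: the aspect ratio may be distorted. -->
<img src="tomcat.gif" alt="Stretched Tomcat logo" width="200" height="50" />

<!-- 3. Small image used as a hyperlink, without the default border. -->
<a href="tomcat.gif" target="_blank" title="Click to see the full-size image">
  <img src="tomcat.gif" alt="Small Tomcat logo" width="40" style="border: none" />
</a>
```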

You can use the style property float to flush an image to the left or right, depending on whether you assign value left or right to float. The text will wrap around the image. To move text down vertically until the space occupied by the image becomes “clear,” use style property clear. The following example illustrates these features, in which <h3 style="clear: left"> moves the h3 heading immediately below the image and aligns it toward the left.

[Listing and browser screenshot not reproduced here.]
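A sketch of the float and clear properties follows; the image file name is reused from the earlier example, and the text content is invented.

```html
<!-- The image is flushed to the left; following text wraps on its right. -->
<img src="tomcat.gif" alt="Tomcat logo" style="float: left" />
<p>This paragraph wraps around the right side of the floated image.</p>

<!-- clear: left pushes the heading down until the left side is free. -->
<h3 style="clear: left">This heading starts below the image</h3>
```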

2.3 Cascading Style Sheets (CSS)

 

HTML before version 4 used tags to mark up both logical data structure (like the h1 and p elements) and presentation (like the b and i elements), and it lacked the ability to apply one directive to format many elements. As of HTML version 4, HTML tags are recommended to mark up mainly logical data structures, and the presentation details are specified with separate and better structured cascading style sheets.

Cascading style sheets are based on the success of the word processor's style concept. In word processing, you can define styles for formatting each type of document element, and you can format a particular document element instance by simply applying a predefined style.

Each web browser has a default way to render HTML elements. For example, the HTML standards do not specify the font size of h1 elements, and the web browser designers have the freedom to choose a font size to present h1 elements as long as the font size for h1 elements is no smaller than that for h2 elements. Such default behavior of a web browser can be modified in multiple ways:

• A user may use the web browser's graphical user interface, most likely under the View menu, to change some limited aspects of an HTML document's presentation. For example, almost all web browsers allow users to change text size.

• An HTML file may import external cascading style sheets by using a link element inside a head element. The following example shows how to import stylesheet entries from a CSS file named “default.css” in the same directory as the HTML file.

    [Listing not reproduced here.]

• An HTML file may also specify some local style rules within a style element, nested inside the head element, as shown in the previous example.

• Each start tag in the HTML file may also contain a style attribute to define style properties for that particular element. You have seen some examples in this category earlier in this chapter.
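The three document-level mechanisms in the list above can be sketched together. The file name default.css comes from the text; the title, the rule, and the heading are assumptions made for illustration.

```html
<head>
  <title>Style Sources</title>
  <!-- Import style rules from an external CSS file in the same directory. -->
  <link rel="stylesheet" type="text/css" href="default.css" />
  <!-- Local style rules that apply only to this HTML file. -->
  <style type="text/css">
    h1 {color: navy}
  </style>
</head>
<body>
  <!-- A style attribute applies to this one element and overrides the rule above. -->
  <h1 style="color: red">This heading is red</h1>
</body>
```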

If an HTML element has some presentation aspects not defined by the HTML standard, the web browser will search for their potential definitions in reverse order of the previous list, the closest definitions first, and apply the first found definitions. This is the first reason why cascading style sheets are so named. You can always override general style rules with lower-level style definitions.

On the other hand, HTML elements are highly nested. A style rule specified for an element will also be applied to elements nested in it unless it is overridden in its child elements. This behavior also suggests the name of cascading style sheets.

The ability for many HTML files to share style definitions in external CSS files is important. A website can change many web pages’ presentation by modifying only one CSS file.

The following sections will show many CSS definitions. To test them, you can either copy them inside a style element or copy them in an external CSS file that is linked to the HTML file with a link element, as shown earlier in this section.

2.3.1 Style Rule Format

A style sheet consists of a list of style rules, and most style rules in CSS are of form

e1 e2 … ek {attribute1: value1; attribute2: value2; … attributen: valuen}

where “e1 e2 … ek”, called a selector, is a list of space-separated elements, and each of them, except the first one, is nested in the element to its left (notations based on attributes id and class will be introduced later to represent a subset of elements, and they can also be used in the style rule selectors, but the concept of general to specific in a selector list is still true). This style rule specifies values to attributes for all ek elements in the current document that are successively nested in ek-1, …, e2, e1. As a simple example,

[Listing not reproduced here.]

specifies that all paragraphs in the current document will have a two-pixel-width solid external border. If you need to apply this style only to a particular paragraph, you can use the p element's style attribute to specify the same external border:

[Listing not reproduced here.]
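The style rule format can be sketched as follows. The border rule follows the description in the text, with the color black added as an assumption; the second rule is an invented example of a selector with several nested elements.

```html
<!-- In the head element: -->
<style type="text/css">
  /* Every paragraph gets a two-pixel solid border. */
  p {border: 2px solid black}
  /* Only b elements nested in li elements nested in ul elements turn red. */
  ul li b {color: red}
</style>

<!-- In the body element, without the p rule above:
     the same border applied to a single paragraph. -->
<p style="border: 2px solid black">Only this paragraph has a border.</p>
```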

The attribute value strings must be on the same line in HTML files, even though sometimes they have to be printed on multiple lines in this book. Although the following discussions will introduce CSS attributes mainly in the style sheet format, you should be able to follow this example to rewrite them in the form of an element's style attribute if necessary.

2.3.2 Formatting Text

Specifying a certain font to appear on a page can be tricky because not everyone has the same fonts installed. To work around this problem, you can specify a font family rather than an individual font. A font family is a set of fonts listed in order of preference. If the computer displaying your page does not have the first font in the list, it checks the second, and the third, and so on, until it finds a match. The last font on a font family list is normally a font that is guaranteed to be available on any computer. Such generic fonts are specified without using double quotes around them. If a web browser cannot find any font match, it will use its default font to display the text.

You can use the font-family attribute to specify font families. The following are some commonly used font families:

• "Arial Black", "Helvetica Bold"

• "Arial", "Helvetica", sans-serif

• "Verdana", "Geneva", "Arial", "Helvetica", sans-serif

• "Times New Roman", "Times", serif

• "Courier New", "Courier", monospace

• "Georgia", "Times New Roman", "Times", serif

• "Zapf-Chancery", cursive

• "Western", fantasy

If you specify a font family in an element's style attribute, whose value is itself enclosed in double quotes, the double quotes around the font names should be dropped (or replaced by single quotes). The following examples show how to specify a font family in a style rule and in a style attribute:

[Listing not reproduced here.]
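As a hedged sketch of both forms, using font families from the list above (the paragraph text is invented):

```html
<!-- In the head element: font names in double quotes inside a style rule. -->
<style type="text/css">
  p {font-family: "Times New Roman", "Times", serif}
</style>

<!-- In the body element: the attribute value is already in double quotes,
     so the font names use single quotes or none. -->
<p style="font-family: 'Courier New', Courier, monospace">A monospace paragraph.</p>
```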

Font size can be specified with attribute font-size. The commonly used font-size values include small, medium (default), large, 12px (any font size specified in pixel number), and 120% (120% of the base/inherited size, which is the font size used in the immediate context of this element).

Attribute font-style can be used to specify whether the text should be in normal or in italic style, where normal and italic are font-style's most popular values.

Attribute font-weight can be used to specify the darkness or boldness of the text. Attribute font-weight's popular values include lighter, normal (default), bold, and bolder.

The font color is specified by attribute color. The background color of text is specified by attribute background-color. The popular color values include blue, green, red, yellow, gray, magenta, lime, and white. For more color values, make a web search for “HTML color.”

Text alignment can be specified with attribute text-align, which can take on values left, right, center, and justify with the same meaning as they have in word processors.

Attribute text-indent can be used to specify the indentation of the first line of a paragraph, as in style rule p {text-indent: 20px}.

The line height is the amount of space between each line, which is also referred to as leading. You can use attribute line-height to specify line height as a percentage of the base one, with popular values 100% (single spacing), 150% (1.5-line spacing), and 200% (double spacing).

HTML text can be further decorated with lines or blinking effects with attribute text-decoration, which supports the following values: underline (line under the text), overline (line over the text), line-through (strike-through), blink (flashing text), and none (remove all inherited decoration).

You can also control the extra spacing between successive words with attribute word-spacing, and extra spacing between successive letters with attribute letter-spacing. By default, both word-spacing and letter-spacing have a value of 0 pixels. If you specify positive integers, the spacing increases. If you specify negative integers, the spacing decreases. Usually one or two pixels in either direction is plenty. As an example, style rule p {word-spacing: 1px} increases the space between successive words by one pixel.
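Several of the text-formatting attributes from this section can be combined in style rules, as in the sketch below. The particular values and the sample text are assumptions chosen for illustration.

```html
<!-- In the head element: -->
<style type="text/css">
  /* Centered, underlined headings. */
  h1 {text-align: center; text-decoration: underline}
  /* Larger, justified paragraphs with first-line indentation,
     1.5-line spacing, and slightly wider word spacing. */
  p {font-size: 120%; text-align: justify; text-indent: 20px;
     line-height: 150%; word-spacing: 1px}
</style>

<!-- In the body element: -->
<h1>Formatted Heading</h1>
<p>The first line of this paragraph is indented by 20 pixels.</p>
```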

2.3.3 Formatting a Subset of Element Instances

So far you have learned how to apply style rules to all elements of a particular type, say, p. In practice, you need to be able to support exceptions. For example, while specifying that all paragraphs start with an indention on their first lines, you may also want the first line of the first paragraph to have no text indentation. You may also want paragraphs in different sections of a document to be formatted differently.

If you need to format a unique element instance, say, a specific paragraph, of an HTML document differently, you can use attribute id to assign a unique string value to this element and use a special style rule to format this element instance differently. The selector for this style rule is the pound sign, #, followed by the unique id string value. For example, the following style rules and HTML body will indent the first line of each paragraph by 20 pixels, except for the first paragraph, which will have no first-line indentation. Each attribute id of an HTML file must have a unique value in the file, even though several web browsers do not enforce this rule.

images
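As a minimal sketch of the id-based rule just described (the id value first and the paragraph text are hypothetical, chosen only for illustration), the style rules and body might look like this:

```html
<html>
  <head>
    <title>Formatting one element by id</title>
    <style type="text/css">
      /* Every paragraph: indent the first line by 20 pixels */
      p { text-indent: 20px }
      /* The single element whose id is "first": no indentation.
         The id rule is more specific, so it overrides the p rule. */
      #first { text-indent: 0px }
    </style>
  </head>
  <body>
    <p id="first">This opening paragraph starts flush left.</p>
    <p>This paragraph is indented on its first line.</p>
    <p>So is this one.</p>
  </body>
</html>
```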

If you need to format a subset of element instances, say, all paragraphs in a particular section, of an HTML document differently, you can use attribute class to assign a class name to those element instances and use a special style rule to format these element instances differently. The selector for this style rule is the period character, ., followed by the class name. A document can have many elements carrying the same class value, and these elements may be based on different tag names. The following style rule and HTML body show how you can define a class named “important” to present several elements in red.

images
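A minimal sketch of such a class rule follows. The class name important comes from the text above; the choice of elements and their contents are assumptions made for illustration.

```html
<html>
  <head>
    <title>Formatting a subset of elements by class</title>
    <style type="text/css">
      /* Any element, whatever its tag name, that carries class "important" */
      .important { color: red }
    </style>
  </head>
  <body>
    <h2 class="important">Deadline</h2>
    <p>This paragraph uses the default color.</p>
    <p class="important">This paragraph is displayed in red.</p>
    <ul>
      <li>An ordinary list item</li>
      <li class="important">A list item displayed in red</li>
    </ul>
  </body>
</html>
```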

2.3.4 Formatting Part of Text or a Document with span and div

So far you have learned how to format all text in an element differently. Sometimes you also need to format a few words in an element differently. You cannot do so at this point because these words may not be all text in an element and your style rules or style attributes can be applied only to all text in an element. The solution is the introduction of a new type of element: span. A span element itself has no visual effect on text formatting. But like all HTML elements, span supports attributes style, id, and class; therefore, you can use style attributes and style rules to format text in a span element differently.

On the other hand, you may also want to format all elements in a particular logical section of an HTML document in a special way. You can use a div element to enclose all elements in a logical section and assign an id or class attribute value to identify this division. You can also use the style attribute to apply formatting to the entire div box. Like span, element div itself has no visual effects on a web page's view. It depends on style rules or style attributes to format its contents. Whereas span is an inline box, part of a text line, which does not start new lines, div usually encloses elements like paragraphs, lists, and headers, which are separated from elements outside the div element by some vertical space.

The following example illustrates the HTML features introduced in this subsection. It uses id “id1” to display a term in the document in underlined red. It defines two divisions and displays each inside its own boundary box. Although all elements of class “keyword” are set to display in italic blue, all elements of class “keyword” inside division “ajax” are set to display in italic green. As explained earlier, a style rule for a specific subset of elements overrides a more generic style rule applied to a larger scope containing that subset.

images

images
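The behavior described above can be sketched as follows. The names id1, keyword, intro, and ajax follow the text; the border settings, the descendant selector #ajax .keyword, and all of the body text are assumptions rather than the book's listing.

```html
<html>
  <head>
    <title>span and div sketch</title>
    <style type="text/css">
      /* One specific term: underlined red */
      #id1 { color: red; text-decoration: underline }
      /* Each division gets its own boundary box */
      div { border: 1px solid black; margin: 10px; padding: 5px }
      /* Keywords anywhere in the document: italic blue */
      .keyword { color: blue; font-style: italic }
      /* Keywords inside division "ajax": the more specific rule
         overrides the color, so they appear in italic green */
      #ajax .keyword { color: green }
    </style>
  </head>
  <body>
    <div id="intro">
      <h2>Introduction</h2>
      <p>A <span id="id1">web browser</span> presents pages written in
         <span class="keyword">HTML</span>.</p>
    </div>
    <div id="ajax">
      <h2>Ajax</h2>
      <p>An <span class="keyword">Ajax</span> page updates part of its
         view without reloading.</p>
    </div>
  </body>
</html>
```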

You can use attribute width to specify the width of a division, whose value can be in the form of 100px for 100 pixels or 40% for 40% of the width of its containing element (normally the web browser window). You can use attribute float to specify whether the division should float to the left or right by assigning it value left or right. For example, if you add the following two style rules to the last example, the two divisions will be displayed side by side, each taking 40% of the screen width.

images

images
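A minimal sketch of two floated divisions follows, assuming the ids intro and ajax used in the surrounding text; the border and the division contents are illustrative only.

```html
<html>
  <head>
    <title>Side-by-side divisions</title>
    <style type="text/css">
      div { border: 1px solid black }
      /* Each division takes 40% of the available width */
      #intro { width: 40%; float: left }   /* pushed to the left edge */
      #ajax  { width: 40%; float: right }  /* pushed to the right edge */
    </style>
  </head>
  <body>
    <div id="intro"><p>Contents of the first division.</p></div>
    <div id="ajax"><p>Contents of the second division.</p></div>
  </body>
</html>
```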

Sometimes you may want to put a division at a specific location of the web page relative to a containing element, normally the body element. On other occasions you may want to move the division relative to its natural position. Even though both positioning mechanisms may lead to content overlapping, they could be handy for advanced page layout. You can use div's attribute position to specify a division's location: if its value is absolute, the division's position is relative to its nearest ancestor element that is itself positioned, or to the web page as a whole if there is none; if its value is relative, the division's position is relative to its natural position. Attribute position must be used in conjunction with attribute left, right, top, or bottom to specify the location. For example, if you change the style rules for #intro and #ajax to absolute positioning, as the following,

images

the screen captures show that the two divisions are 20 pixels from their left and right browser window boundaries, respectively, and they may overlap if the browser window is too narrow.

images
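Absolute positioning of the two divisions might be sketched as follows. The 20-pixel left and right offsets follow the description above; the top offset and the fixed 300-pixel width are assumptions, chosen so that the divisions overlap once the window becomes narrower than about 640 pixels.

```html
<html>
  <head>
    <title>Absolute positioning sketch</title>
    <style type="text/css">
      div { border: 1px solid black; width: 300px }
      /* 20 pixels from the left boundary of the page */
      #intro { position: absolute; left: 20px; top: 20px }
      /* 20 pixels from the right boundary of the page */
      #ajax  { position: absolute; right: 20px; top: 20px }
    </style>
  </head>
  <body>
    <div id="intro"><p>Contents of the first division.</p></div>
    <div id="ajax"><p>Contents of the second division.</p></div>
  </body>
</html>
```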

The following example changes the style rules for #intro and #ajax to use relative positioning,

images

and the screen capture shows the resulting web browser display:

images
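Relative positioning might be sketched as follows; the offset values here are hypothetical. Each division is first laid out in its natural position and then shifted by the given offsets.

```html
<html>
  <head>
    <title>Relative positioning sketch</title>
    <style type="text/css">
      div { border: 1px solid black; width: 300px }
      /* Shift 20 pixels to the right of the natural position */
      #intro { position: relative; left: 20px }
      /* Shift 40 pixels right and 10 pixels down from the natural position */
      #ajax  { position: relative; left: 40px; top: 10px }
    </style>
  </head>
  <body>
    <div id="intro"><p>Contents of the first division.</p></div>
    <div id="ajax"><p>Contents of the second division.</p></div>
  </body>
</html>
```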

2.4 HTML Forms and HTTP Basics

 

Web browsers interact with web servers with HTTP, which runs on top of TCP/IP network connections. The main function of HTTP is for web browsers or programs to download web pages or any data from web servers or to submit user data to web servers. HTML form elements can be used to create graphic user interfaces in web browsers and interact with web servers through HTTP.

2.4.1 HTTP Basics

You need to review the concept and general format of a URL. A URL is an address for uniquely identifying a web resource, like a particular web page, and it has the following general format:

images

where http is the protocol for accessing the resource (https and ftp are popular alternative protocols standing for secure HTTP and File Transfer Protocol); domain-name is for uniquely identifying a server computer on the Internet, like www.amazon.com; port is an integer between 0 and 65535 for identifying a particular server process; application is a server-side folder containing all resources related to a website, a web application, or a web service; resource could be the name (alias or nickname) of an HTML or script/program file residing on a server hard disk; and the optional query string passes user data to the web server.
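To make the URL components concrete, the following sketch embeds one URL in a hyperlink and labels its parts in a comment. It assumes this chapter's demo web application is running on a local Tomcat server, and the query string user=Ada is an illustrative value.

```html
<!-- protocol:     http
     domain name:  localhost
     port:         8080
     application:  demo
     resource:     echo
     query string: ?user=Ada -->
<p>
  <a href="http://localhost:8080/demo/echo?user=Ada">Echo the name Ada</a>
</p>
```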

The domain name is typically a sequence of three strings separated by periods, like www.amazon.com. The rightmost string is one of the top-level domain names, among which “com” stands for companies, “edu” for education, and “gov” for government. The string to its immediate left is one of its subdomains, typically representing a company or an institution. The leftmost string is normally an alias for one of the server computers in the company or institution. A server computer may have multiple domain names, all referring to the same server computer. For example, www in many URLs is optional. But each domain name must refer to no more than one server computer (which may be a façade or interface for a cluster of computers working behind the scenes).

The domain name could be replaced by an IP address, which in IP version 4 is four decimal numbers, each between 0 and 255, separated by periods, like 108.168.1.2. In Windows you can easily find your computer's IP address by typing command ipconfig in a Command Prompt window. Fundamentally, each server computer is identified by one or more IP addresses, and one or more domain names are used as the nicknames for each IP address so that they will be easier for people to use. There is a special domain name, “localhost”, that is normally defined as an alias of local IP address 127.0.0.1. Domain name “localhost” and IP address 127.0.0.1 are for addressing a local computer, useful for testing web applications when the web browser and the web server are running on the same computer. When a user uses a domain name to specify a URL, the web browser will use a DNS (Domain Name System) server on the Internet to translate the domain name into an IP address.

A server computer may run many server applications, like web servers and database servers, and you may run more than one web server on the same computer, too. A running program is called a process. A computer may have many server processes running at the same time, and some of them may be running the same application. When a client sends information or a request to this computer, there needs to be a way for the client to specify that the information or request is directed to a particular server process. The port numbers are used to identify different server processes. Each server process will claim an unused port number and listen only to messages directed to that port number. No two server processes can use the same port number. If you start a server program that uses a port but the port is currently in use by another process, the server program will fail to start. Port numbers from 0 to 1023 are reserved for popular server applications, and user applications should not use them. For example, by default HTTP of web servers uses port 80, HTTPS uses port 443, FTP uses ports 20 and 21 (for data transfer and FTP commands, respectively), SSH (Secure Shell) uses port 22, telnet uses port 23, DNS uses port 53, and the IMAP email system uses port 143. Many server applications allow you to change the port numbers.

One way for a web browser or client program to submit user data to a web server is to use a query string, which was originally used for sending database query criteria. A query string starts with the question mark character, ?, and consists of a sequence of “name=value” (both name and value are strings) assignments separated by character &. Because a valid URL cannot contain some special characters, such as space and those with special meanings in URLs, URL encoding is used to encode these special characters in the query strings. For example, space is encoded as + or %20, tab as %09, linefeed as %0a, carriage return as %0d, & as %26, ; as %3b, ? as %3f, / as %2f, : as %3a, # as %23, = as %3d, < as %3c, > as %3e, + as %2b, % as %25, " as %22, ' as %27, ~ as %7e, | as %7c, $ as %24, * as %2a, ( as %28, ) as %29, and , as %2c. For example, a query string containing names “lang” and “os” with values “Java & C++” and “unix”, respectively, will be encoded in a URL as “?lang=Java+%26+C%2b%2b&os=unix”.
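The encoded query string from the example above can be placed in a hyperlink, as in the following sketch. The target resource, echo in the local demo web application, is an assumption made for illustration. Note that inside an XHTML attribute value the & separator itself has to be written as the entity &amp;amp;, which is a markup rule separate from URL encoding.

```html
<!-- Sends lang="Java & C++" and os="unix" as a query string.
     %26 is the URL encoding of &, %2b of +, and + stands for a space.
     The separator between the two assignments is written as &amp;
     because a bare & is not allowed in an XHTML attribute value. -->
<p>
  <a href="http://localhost:8080/demo/echo?lang=Java+%26+C%2b%2b&amp;os=unix">
    Submit lang and os
  </a>
</p>
```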

HTTP is a stateless protocol. Every time a user uses a web browser or program to interact with a web server through HTTP, HTTP has no memory of the user's recent interactions with the web server. For a web server to remember the recent interactions with a user, it needs to adopt some mechanisms like cookies and server-side session objects to explicitly record interaction history.

HTTP GET and HTTP POST are the two main HTTP methods for a web browser or client-side program to interact with a web server. When you click on a hyperlink in a web browser, the web browser will generate an HTTP GET request to the web server specified by the hyperlink. For a web browser to interact with a web server with the HTTP POST method, you need to use an HTML form.

2.4.2 HTML Forms

HTML form elements are used to create simple graphic user interfaces in a web browser for the user to interact with a web server with HTTP GET or HTTP POST. The form element has two major attributes: method for specifying HTTP submission method (with common value get or post) and action for specifying the URL of a web resource that will accept this HTTP request. The following is an excerpt of example file echoPost.html deployed in this book's demo web application. In this example, HTTP POST is used to submit user data to web resource echo (a Java servlet) inside the web application demo deployed in your local Tomcat web server.

images

A form element can contain text and most other HTML elements. Each element type introduced here for collecting data from a user supports an attribute called name. This name attribute is for specifying a unique name representing the data that the user specifies through this input element. The server scripts or programs can use this name to access the data that the user has specified through this input element.

In this example, an input element <input type="text" name="user"/> is used to create a text field with name “user”. Element input can be used to specify several types of input controls (devices), and its type attribute specifies its particular input control type. Another input element of type “submit”, <input type="submit" value="Submit"/>, is used to create a submit button. The value attribute here is used to specify the string on top of the button. When a user clicks on a submit button, all data that the user has entered in the form will be submitted to the target web server resource, as specified by the action attribute value of the form element, with either the HTTP POST or HTTP GET method, as specified by the method attribute value of the form element. A third input element, <input type="reset" value="Reset"/>, is used to specify a reset button with type value “reset”. Its value attribute is used to specify the string on top of this reset button. When a user clicks on a reset button, all the data that the user has entered in the form will be erased, and the form is reset to its initial state so that the user can enter the data again from scratch.
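Putting the three input elements together, a form like the one described might be sketched as follows. This is a reconstruction from the description above, not the actual contents of echoPost.html; the page title and the prompt text are assumptions.

```html
<html>
  <head>
    <title>Echo with HTTP POST</title>
  </head>
  <body>
    <!-- method: how the data is submitted; action: which resource receives it -->
    <form method="post" action="http://localhost:8080/demo/echo">
      <p>
        Enter your name:
        <input type="text" name="user"/>        <!-- text field named "user" -->
        <input type="submit" value="Submit"/>   <!-- sends the form data -->
        <input type="reset" value="Reset"/>     <!-- restores the initial state -->
      </p>
    </form>
  </body>
</html>
```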

Make sure that you have downloaded and deployed book resource file “demo.war” in your Tomcat installation's directory “webapps”, and ensure that your local Tomcat web server is running at its default port 8080. If you load file “http://localhost:8080/demo/echoPost.html” into a web browser, you will see a graphic user interface similar to the following one. Here the user has typed string “Ada” in the text field.

images

If the user clicks on the submit button now, the web browser will generate an HTTP POST request to the web resource “http://localhost:8080/demo/echo” specified by the action attribute of the form element. Basically, a TCP/IP communication channel will be created to connect the web browser to the Tomcat web server running on port 8080, and the following HTTP request text (simplified) will be sent through the TCP/IP channel to the Tomcat web server.

images

The first line of an HTTP request is used to specify the submission type, GET or POST; the specific web resource on the web server for receiving and processing the submitted data; and the latest HTTP version that the web browser supports. As of 2008, version 1.1 is the latest HTTP specification. The following lines, up to before the blank line, are HTTP header lines for declaring web browser capabilities and extra information for this submission, each of form “name: value”. The first two Accept headers declare that the web browser can process HTML files and any standard audio file formats from the web server. The User-agent header declares the software architecture of the web browser. The Referer header specifies the URL of a web page from which this HTTP request is generated (this is how online companies like Amazon and Yahoo collect money for advertisements on their web pages from their sponsors). Any text after the blank line below the header lines is called the entity body of the HTTP request, which contains user data submitted through HTTP POST. The Content-length header specifies the exact number of bytes that the entity body contains. If the data is submitted through HTTP GET, the entity body will be empty and the data go to the query string of the submitting URL, as you will see later.

In response to this HTTP POST request, the Tomcat web server will forward the submitted data to resource echo of web application demo, and the resource echo will dynamically generate an HTML page that echoes most of the data it can get from the submission and let Tomcat send the HTML page back to the web browser as the entity body of the following HTTP response.

images

The first line of an HTTP response specifies the latest HTTP version that the web server supports. The first line also provides a web server processing status code, popular values of which include 200 for OK, 400 if the server does not understand the request, 404 if the server cannot find the requested page, and 500 for a server internal error. The third entry on the first line is a brief message explaining the status code. The first two header lines declare the web server capabilities and metadata for the returned data. In this example, the web server is based on a software architecture named “NCSA/1.3,” and it supports Multipurpose Internet Mail Extensions (MIME) specification 1.0 for web browsers to submit text or binary data with multiple parts. The last two header lines declare that the entity body contains HTML data with exactly 2000 bytes. The web browser will parse this HTTP response and present the response data in a window similar to the following one:

images

Example file http://localhost:8080/demo/echoGet.html is the same as http://localhost:8080/demo/echoPost.html, except that the value of form attribute method has been changed from “post” to “get”. If you type “Ada” in its text field and click on the submit button, the submitted data will be in the form of a URL query string, as shown in the following, and the HTTP GET request's entity body will be empty.

images
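The GET variant differs from the POST form only in its method attribute. The following fragment is a sketch under the same assumptions as before (it is not the actual contents of echoGet.html), with the URL that the browser would request noted in a comment.

```html
<form method="get" action="http://localhost:8080/demo/echo">
  <p>
    Enter your name:
    <input type="text" name="user"/>
    <input type="submit" value="Submit"/>
    <input type="reset" value="Reset"/>
  </p>
</form>
<!-- Typing Ada and clicking Submit makes the browser request
     http://localhost:8080/demo/echo?user=Ada
     The data travels in the query string; the entity body is empty. -->
```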

2.4.3 HTTP GET vs. HTTP POST

HTTP GET was initially designed for downloading static web pages from web servers, and it used short query strings mainly to specify the web page search criteria. HTTP POST was initially designed for submitting data to web servers, so it used the request entity body to send data to the web servers as a data stream, and its response normally depended on the submitted data and the submission status. Although both HTTP GET and HTTP POST can send user requests to web servers and retrieve HTML pages from web servers for a web browser to present, they have the following subtle but important differences:

• HTTP GET sends data as query strings, so people can read the submitted data over the submitter's shoulders.

• Web servers have limited buffer size, typically 512 bytes, for accommodating query string data. If a user submits more data than that limit, the data would be truncated, the web server would crash, or the submitted data could potentially overwrite some computer code on the server and the server would be led to run some malicious code hidden as part of the query string data. The last case is the so-called buffer overflow, a common way for hackers to take control of a server and spread viruses or worms.

• By default, web browsers keep (cache) a copy of the web page returned by an HTTP GET request so that future requests to the same URL can be avoided and the cached copy can be easily reused. Although this approach can definitely improve the performance if the requested web page does not change, it could be disastrous if the web page changes or depends on the data submitted by the user.

2.5 Summary

Web technologies are based on a tiered web architecture, with each tier having its well-defined roles. HTML is the web language for describing the logical structure of web documents, and cascading style sheets are for customizing the presentation of web documents. HTTP is the application-level protocol to support dynamic interactions between web browsers and web servers. In general, HTTP POST is a more secure way for a client to interact with web applications.

2.6 Self-Review Questions

1. HTML is a language for specifying data presentation in web browsers.

a. True

b. False

2. CSS is a language for specifying data presentation in web browsers.

a. True

b. False

4. XHTML and HTML are totally different languages.

a. True

b. False

5. Users can introduce new tags in an XHTML document.

a. True

b. False

6. Attributes are mainly for specifying large chunks of business data.

a. True

b. False

7. Which HTML elements are normally used to define the general layout of a web page?

a. table

b. form

c. div

8. You can customize hyperlink views without using CSS.

a. True

b. False

9. Multiple elements of an HTML document can have the same value for their id attribute.

a. True

b. False

10. Attributes id and class are for defining special formatting of subsets of elements.

a. True

b. False

11. A URL is for specifying the location of a network resource.

a. True

b. False

12. HTTP is a network protocol similar to TCP/IP.

a. True

b. False

13. The port number in a URL is for identifying a server-side process for receiving the HTTP GET or HTTP POST request.

a. True

b. False

14. HTTP GET is more secure than HTTP POST in submitting large amounts of data to a web server.

a. True

b. False

15. Using HTTP GET to submit data from a text field or text area could lead to which of the following?

a. Crash of the web server

b. Web server buffer overflow

c. Viruses or worms being implanted on the web server and starting to run

d. Web browser presenting outdated data

16. HTTP GET should be used to request the price of a stock.

a. True

b. False

17. If you do not want people to read over your shoulders what you are submitting to a web server, you should use which method to submit the data?

a. HTTP GET

b. HTTP POST

18. When you click on a hyperlink, an HTTP GET request will be sent to the web server specified by the hyperlink.

a. True

b. False

19. You can use the space character explicitly in a query string.

a. True

b. False

20. CSS is a language for specifying data presentation in web browsers.

a. True

b. False

Keys to the Self-Review Questions

1. b 2. a 4. b 5. b 6. b 7. ac 8. b 9. b 10. a 11. a 12. b 13. a 14. b 15. abcd 16. b 17. b 18. a 19. b 20. a

2.7 Exercises

1. What are the main advantages of using CSS to format data presentation relative to the old approach of formatting data presentation inside HTML?

2. How do the id and class attributes of HTML elements support the special data presentations for a subset of elements?

3. What are the major components of a URL, and what are their functions?

4. What are the major differences between HTTP GET and HTTP POST for submitting form data to a web server?

2.8 Programming Exercises

1. Use XHTML and CSS to create a website for the course that adopts this book. The main web page has three sections: the top banner section for website title and some graphics, the narrow bottom-left section for navigation links, and the large bottom-right section for the contents of the link that the user has chosen on the navigation link section. You should use CSS division-based layout for your solution.

2. Create a web application on Tomcat that will collect student information from each of its users and echo the user data back for the user to review. You can use this chapter's demo web application as the foundation of your project. Make sure that your graphic user interface uses select control, text field, text area, password, radio buttons, and a submit button.
