
Chapter 4

Giving Your Pages Structure: HTML5

Start paragraph. Take a minute to imagine what it would be like to have to describe the function of (strong emphasis) everything (end strong emphasis) you’re saying. How (emphasis) irritating (end emphasis) would that be in a conversation? Give it a try: try reading this paragraph out loud. End paragraph.

That’s enough of that. Writing the entire chapter in that style not only takes quite a bit longer, but is a pain to read, no? In dropping those annotations though, we lose some meaning to the text we’ve written. If only there were a way to describe the meaning of a particular piece of text for the purposes of publishing that text!

Wait! There is! When you’re reading through a book like this, you may see certain pieces of text written in italics or in bold. You’ve probably seen that on the Web, as well. The trouble with bold and italic is that, while you and I can discern the differences, a computer can’t. That’s where we bring in HTML, or HyperText Markup Language.

What are web pages, really?

It may take you by surprise that, even though the web pages you look at are full of pictures and videos and things like that, underneath every page is just a single (albeit possibly very long) string of text. Not all of it is in English, of course, because computer programs such as your web browser don’t actually speak or understand English. Nevertheless, your web browser does understand a kind of language: the Hypertext Markup Language (HTML).

When you go to a website in your web browser, you’re actually telling your web browser to download a file. This file, called an HTML document, is nothing more complicated than a bunch of text that your web browser knows how to read. The HTML you download is then interpreted by your web browser and displayed to you as a web page.

If it’s all just a single string of text, you might be wondering, then how does the web browser know that a picture should go in the top-left corner or that two pieces of text are in different paragraphs? The answer is that, in addition to all the content that’s in the HTML document (such as the text of the two paragraphs), there’s also information about the content. This information about the content isn’t actually visible to you on the web page, but it’s there behind the scenes.

All of that extra information, called markup, tells the web browser what kind of thing each content piece is. For example, it tells your browser where things begin and end in that long string of text, as well as what purpose they serve (remember our opening example: start paragraph, end paragraph).

The basics of markup

Everything that ends up on a web page has to be specifically prepared to go there. As much as you might wish it so, you can’t just copy everything in a Microsoft Word document, paste it into an empty text file, and call that a web page. That file is missing all of the information that a web browser needs to interpret its contents. Preparing a document for the Web—or marking up a document—is actually a pretty simple process that involves working with a specific set of tools (tags). The rest of this chapter will give you an introduction to that set of tools and start you off with a sample project that we’ll work with throughout the rest of this book.

Elements (or tags)

A tag is the basic building block of HTML. Most elements have both an opening tag and a closing tag (to show where the element starts and ends), but there are some exceptions to that rule. There are also different types of tags, but we’re getting ahead of ourselves. Here is a simple pair of tags:

 <header>Your header content goes here</header>

In its simplest form, that’s all there is to a set of tags. Of course, “simple” doesn’t always translate to “most powerful” or “most useful,” so tags can be extended through the use of attributes:

 <a href="http://www.google.com/">Google: a handy search engine</a>

In this example, href is an attribute that tells the <a> tag where to link to. An element is nothing more than text wrapped between a pair of tags. Angle brackets, the < and > characters, delimit the beginning and end of a tag.

But before we dive into the intricacies of HTML, let’s look at a quick XML example to give you a taste of a markup language. XML allows you to create your own tags on the fly instead of using a predefined set, as is the case with HTML, which makes it easier to read and follow along with. Take, for instance, this piece of text:

 Where the Streets Have No Name

You might easily identify this text as the title of a song by the band U2. However, someone else might not recognize it, so you need to indicate what it is in proper markup form (especially when it’s dropped into the middle of a book about building websites). Let’s label what it is and identify it as a title, like so:

 <title>Where the Streets Have No Name

You’ve now identified where the title starts by annotating the beginning of the song title with the <title> tag; but in XML, that’s not enough. You also have to identify where this title ends. To do that, you simply use the same tag again and include a forward slash directly after the left angle bracket:

 <title>Where the Streets Have No Name</title>

Now you have successfully marked up the title of this song by annotating its beginning and end explicitly, using opening <title> and closing </title> tags. What you’ve created is a <title> element that, in this case, is nothing more than the title of the song.

Next, let’s add the artist information so that readers of the XML document will know what band wrote the song. All you have to do is mark up the text U2 as a similarly obvious element. Let’s call it <artist> and add that directly after the title:

 <title>Where the Streets Have No Name</title>

 <artist>U2</artist>

You can continue adding information about the song in this way and include all sorts of details. For instance, this song came out on the album The Joshua Tree in 1987. You can add that to your list of information, marked up as follows:

 <title>Where the Streets Have No Name</title>

 <artist>U2</artist>

 <album>The Joshua Tree</album>

 <released>1987</released>

By now you have a good deal of information encoded in the XML document about this song. Let’s take this one step further and encode information about many songs in the same document. To do this, you need a way of grouping all the elements related to a single song together, so you can clearly delineate where the information about one song ends and the next one begins.

In XML and most other markup languages, elements can be nested (placed inside one another) if the information can be grouped together in a meaningful way. So, you could take all the markup you have already defined for your single song and put it inside a <song> element, like this:

 <song>

  <title>Where the Streets Have No Name</title>

  <artist>U2</artist>

  <album>The Joshua Tree</album>

  <released>1987</released>

 </song>

Now, you have defined one <song> element in your document. Since that <song> element encloses four other elements, it is said to have four children. <song> itself is called the parent element of those child elements.

Any element that is nested inside another element is typically indented, and the enclosing element’s start and end tags are on their own lines. This is done purely for ease of reading and editing by humans. Software programs written to interpret this markup will typically ignore all this whitespace between the tags (including the indentation, the line breaks, and so on), as well as any extra whitespace inside elements or an element’s content. Technically, all these tags could be right next to each other and still be just as meaningful. The browser simply interprets any number of consecutive spaces or tabs as a single space. Nevertheless, it’s good practice to keep to this style of indentation because it’s much easier to edit the markup when its structure is visually obvious.
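To make the whitespace point concrete, here is a small sketch that borrows the HTML paragraph element covered later in this chapter. The sentence itself is invented for illustration. A browser would be expected to display both paragraphs identically, because runs of spaces, tabs, and line breaks inside ordinary text collapse into a single space.

```html
<!-- Both paragraphs should look the same when displayed in a browser. -->
<p>Where the Streets Have No Name was released in 1987.</p>

<p>Where   the Streets
     Have No Name
  was    released in 1987.</p>
```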

Now that your song’s information is properly grouped, it’s easy to expand this document to include other songs. For instance, you might want to have a document with information that encodes one of your favorite mixes. You can duplicate your single song markup and simply change the information held in the child elements for the different songs, as shown here:

 <song>

  <title>Where the Streets Have No Name</title>

  <artist>U2</artist>

  <album>The Joshua Tree</album>

  <released>1987</released>

 </song>

 <song>

  <title>Hungry Like the Wolf</title>

  <artist>Duran Duran</artist>

  <album>Rio</album>

  <released>1982</released>

 </song>

 <song>

  <title>Dream On</title>

  <artist>Aerosmith</artist>

  <album>Aerosmith</album>

  <released>1973</released>

 </song>

This isn’t quite right yet, because all you have now is a bunch of individual songs, not a playlist for your mix. You still need to group these songs together in some way. Let’s use a <playlist> element for that purpose:

 <playlist>

  <song>

   <title>Where the Streets Have No Name</title>

   <artist>U2</artist>

   <album>The Joshua Tree</album>

   <released>1987</released>

  </song>

  <song>

   <title>Hungry Like the Wolf</title>

   <artist>Duran Duran</artist>

   <album>Rio</album>

   <released>1982</released>

  </song>

  <song>

   <title>Dream On</title>

   <artist>Aerosmith</artist>

   <album>Aerosmith</album>

   <released>1973</released>

  </song>

 </playlist>

Finally, you have a complete playlist of three songs that’s been properly marked up in XML. Using just the very basic building block of XML elements, you can create arbitrarily complex documents that encode information about whatever topic you like in a hierarchical fashion. This hierarchical structure is known as a tree. However, if all you had to work with were elements, your tree would grow very big very quickly and become unwieldy. So, in addition to elements, there are two more very important parts of the XML markup language that allow you to add extra information about your data: attributes and values.

Attributes and their values

As mentioned previously, attributes are a means of adding extra information to elements in an HTML or XML document. Attributes always specify some value, so you’ll always see attributes and values together in this form:

 attribute-name="attribute-value"

Generally, in XML each element can have an unlimited number of attributes, though in practice each specific application of XML has a certain set of valid attributes and valid values for those attributes. Part of learning any dialect of XML is learning which elements can have which attributes and what values are allowed for those attributes.

Attributes and their values are specified by adding their name and value inside the opening tag for the element to which they apply. For instance, you can include the date this playlist was created as an attribute of the <playlist> tag. For example, let’s specify that your playlist was created on May 28, 2012. You could add this data to the <playlist> tag, as follows:

 <playlist created="May 28, 2012">

Alternatively, if you wanted to specify the author of the playlist, you might add that data like so:

 <playlist author="johndoe">

To include both the author of the playlist and the date it was created, you simply include multiple attributes in the same element:

 <playlist author="johndoe" created="May 28, 2012">

This brings up an interesting question: what data should be marked up as an element, and what data should be marked up as an attribute/value pair? You could have just as easily created an <author> element and defined johndoe as its content inside the playlist tag or a <created> element in the same way. Although there is no real right or wrong answer here, it’s generally considered best practice to use attributes and values for metadata (or data about data) and to use elements for everything else.

In other words, if it’s data that’s meant to be seen by the end user, it’s best to mark it up in an element. If it’s data that describes some other data in the document, it’s best to use an attribute/value pair. This is often a confusing topic, so don’t think you have to get it perfect right away. Oftentimes, you just have to step back and take a look at how you’re intending to use the data to determine the best solution for your markup based on its application.

Empty elements

There’s one other way you might encode information inside an XML document, and that is through the use of an empty element. Empty elements are just like regular elements, except that they don’t contain any content, neither text nor other elements, inside them. Such elements are well suited for things such as embedding pointers to other documents or objects (such as pictures) inside an XML document or for storing important yes/no (Boolean) values about the document.

For example, if you want to use your playlist document in an automated CD-burning application, you might embed an empty element inside the <playlist> element called <burn>, which could have an attribute called format, the value of which would indicate the kind of CD format to use when burning the mix:

 <burn format="music" />

Notice that empty elements, like their nonempty counterparts, are both opened and closed. In the case of empty elements, however, a single tag does both jobs: the opening tag also serves as the closing tag, which is denoted by including a forward slash right before the right angle bracket.
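HTML has empty elements of its own. The sketch below shows three of them; the file name photo.jpg and the alt text are made up for illustration. In HTML5 the XML-style trailing slash is optional on these elements, so both spellings shown here are accepted.

```html
<!-- Empty elements have no content and no separate closing tag. -->
<meta charset="utf-8">

<!-- photo.jpg is a hypothetical file name used only for this example. -->
<img src="photo.jpg" alt="A bicycle leaning against a wall">

<!-- The trailing slash is allowed in HTML5 but not required. -->
<br />
```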

Document types

When marking up a document, you must conform to some kind of standard of what elements, attributes, and values are allowed to appear in the document, as well as where they are allowed to appear and what content they are allowed to contain. If web developers didn’t conform to these standards, every website in the world might use a different set of elements, attributes, and values, and no web browser (much less any human!) would be guaranteed to be able to interpret the content on the web page correctly. For example, where you used a <playlist> element to enclose your favorite mix in the previous example, someone else might have used a <favorite-mix> element.

Luckily for us, the standards that web pages and most other kinds of XML documents use are published publicly and are known as document types. A document declares itself as a specific type of document by including a special <!DOCTYPE> tag. For HTML, the DOCTYPE tag used to be a long string of text referring to some obscure web page (that nobody could remember); with HTML5, this tag has been simplified:

 <!DOCTYPE html>

As a brief exercise, let’s figure out what elements, attributes, and values are allowed in the earlier simple example. First, you need to allow for the existence of the following elements: <playlist>, <burn>, <song>, <title>, <artist>, <album>, and <released>. Next, you need to specify that the attributes called created and author are permitted to be attached only to the <playlist> element and that the format attribute can be attached to the <burn> element. You can say that any values are valid for these attributes, or you can say that only date values are permitted for the created attribute and only text values are permitted for the author and format attributes. If you do that, you’ll also have to explicitly define what “date values” and “text values” are.

Document Type Definitions (DTDs) are useful, not only so you can understand how to formulate your markup properly, but also so that your markup can actually be checked by a computer. Software programs called validators can read a document of markup and its corresponding DTD, and they can alert you to any inconsistencies between the two. Using the preceding example, a validator could alert you if you accidentally mistyped and created a <spong> element or tried to specify the author attribute on a <title> element instead of <playlist>.
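As a rough sketch of the kind of mistake a validator is meant to catch, consider the HTML page below. It deliberately contains two errors that mirror the ones just described: an author attribute on the <title> element and a made-up <spong> element. HTML5 validators check a page against the rules of the HTML5 specification rather than a DTD, but the idea is the same, and a checker would be expected to flag both marked lines.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <!-- Error: author is not an attribute that <title> allows. -->
  <title author="johndoe">My Favorite Mix</title>
</head>
<body>
  <!-- Error: <spong> is not an element defined by HTML. -->
  <spong>Dream On</spong>
</body>
</html>
```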

Starting with HTML5

There have been several flavors of HTML over the years. HTML5 is the latest flavor to hit the scene, and it is being developed to address a lot of the shortcomings of its predecessors. Thankfully, instead of adding to HTML’s complexity, the contributors to HTML5 took the following approach: “HTML is pretty complicated—what can we do to make things better/more intuitive?” Sure, there’s still a learning curve, but trust us when we say that things have gotten much better with this release (and if you have had exposure to one of the earlier versions, or to XHTML, hopefully you’ll nod your head in agreement).

Document shell

When creating a new HTML document, it’s easiest to start with a basic template because there are several elements, attributes, and values you’ll always need, and most of them are difficult to remember.

First, create a new folder on your computer called HTML, and inside it create a new blank document using a text editor. On Windows, you can just use the built-in Notepad application to get started. On a Mac, grab a copy of TextWrangler from the Mac App Store (it’s free). Whichever editor you use, save your new file as index.html with the following code:

 <!DOCTYPE html>

 <html lang="en">

 <head>

  <meta charset="utf-8">

  <title>New Document</title>

  <link rel="stylesheet" href="style.css">

  <script src="javascript.js"></script>

 </head>

 <body>

 </body>

 </html>

If you opt to open this file in a web browser like Internet Explorer, Firefox, Safari, or Chrome, you’ll see a blank page. Just a word of caution: You may get an alert if you’re using Internet Explorer. We’ve specified a JavaScript file that doesn’t exist, and IE interprets this as a potential security issue. Before moving on, let’s examine what this markup means, piece by piece.

This is the code to identify the doctype of this document:

 <!DOCTYPE html>

This is hands-down one of the greatest improvements (from our perspective) when it comes to HTML5. In previous versions of HTML, we had to remember a long, convoluted string including a URL pointing to the actual document type definition (.dtd) file on the Web. No longer! HTML5 has done away with some of the legacies of its past, including a reliance on external document type definitions.
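For a sense of what has been simplified, the sketch below places the HTML5 doctype next to two of the older declarations it replaces, shown inside a comment so that the snippet still contains only one real doctype. The two older strings are the HTML 4.01 Strict and XHTML 1.0 Strict declarations; other variants existed as well.

```html
<!DOCTYPE html>
<!--
  For comparison only: older declarations that the line above replaces.
  A real document contains exactly one doctype.

  HTML 4.01 Strict:
  <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">

  XHTML 1.0 Strict:
  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
-->
```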

This is the opening tag of the <html> element:

 <html lang="en">

Doctype excluded, everything else in the document will be inside this root element. This has also been massively simplified from previous versions. With XHTML, you used to have to specify an XML namespace as well. Now, it’s simple: state that it’s an HTML document and that it’s written in English (or, for whatever other language you’re using, change the value of the lang attribute).
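Here is a brief sketch of that difference. The commented-out line shows the style of opening tag that XHTML 1.0 pages typically used, and the live tag shows the HTML5 form for a page written in French. French is just an assumed example; substitute the language code for your own content.

```html
<!-- XHTML 1.0 style, for comparison: a namespace plus two language attributes. -->
<!-- <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> -->

<!-- HTML5 style for a page written in French: only the lang value changes. -->
<html lang="fr">
</html>
```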

The head

This is the <head> of the document:

 <head>

  <meta charset="utf-8">

  <title>New Document</title>

  <link rel="stylesheet" href="style.css">

  <script src="javascript.js"></script>

 </head>

A large amount of information may go into the <head>, much more than we currently have in this example. The important distinction here is that none of the information in the <head> of the document actually displays in the web browser window (except for the information in the <title>). The browser will, however, read some of the information and use it in a wide variety of ways.

The first element you see inside the <head> element is the (empty) <meta> element. As mentioned earlier, metadata is data about data, and this element is used to provide information about the document itself. In this case, it says that the content inside is encoded using UTF-8, a Unicode character encoding. This gives a web browser access to the widest range of characters available, so that, if your document is written in Russian or in Arabic, the letters you need are very likely to be available.
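The following sketch shows why the encoding matters. It is a complete page containing a greeting in Russian and one in Arabic; the page title and the greetings are invented for illustration, and the lang and dir attributes on the paragraphs are an extra touch that the shell above does not require. For the characters to display correctly, the file itself also has to be saved as UTF-8 in your text editor.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Greetings</title>
</head>
<body>
  <!-- Save this file as UTF-8 so its bytes match the declaration above. -->
  <p lang="ru">Привет</p>
  <!-- dir="rtl" marks this paragraph as right-to-left text. -->
  <p lang="ar" dir="rtl">مرحبا</p>
</body>
</html>
```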

The next element is quite possibly one of the most important in the document: the <title> element. It is simply a place to name the content of your document. The content inside the <title> element is normally shown in the title bar of the web browser’s window or on the browser tab where the page is being viewed. The <title> of a document is also the default name associated with a page if somebody bookmarks that page (either in her browser or when using something like Pinboard). To change the value of the <title> element, simply change New Document to whatever is most accurate and descriptive of your page content.

The title is also commonly used as the headline in search results from search engines such as Google and Yahoo. When people are searching for content that is on your page, and search engines find and show your page as a result, the title of your page is often the only thing they have to decide whether your page is what they want. If you intend your page to be found by people using search engines, pay close attention to the <title> element.
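As a small illustration, compare a vague title with a descriptive one. Both the page name and the site name here are hypothetical, and the vague version is commented out because a document should contain only one <title> element.

```html
<!-- Vague: tells a searcher nothing about the page. -->
<!-- <title>Home</title> -->

<!-- Descriptive: names the page content first, then the (hypothetical) site. -->
<title>How I Learned to Ride a Bike | Jane's Cycling Blog</title>
```

Putting the page-specific part first is a common habit, since browser tabs and search results tend to cut off long titles at the end.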

Finally, our <head> section is rounded out with a link to our style sheet and our JavaScript. Style sheets provide the formatting information (e.g., colors, fonts, and backgrounds) for our document. We’ll learn more about formatting starting in Chapter 5. JavaScript gives our page all kinds of interactivity and advanced functionality. We’ll learn more about that in Chapter 8.

The body

The body is where the meat of your document goes. It’s currently empty, but you’ll be filling it with the content for your page soon enough:

 <body>

 </body>

Finally, you need to end the <html> element you opened near the top of the document:

 </html>

Marking up content

Now that you have the shell of your document in place, it’s time to start adding real content. There are many elements at your disposal in HTML5 (some old and some new), and all of them serve a different purpose. Let’s walk through some of the most important, starting with headlines.

As you start adding marked-up content to your HTML5 document, pay close attention to the names of the elements you use. Since the purpose of markup is to explicitly label the content to define what kind of building block it is intended to be in the document, all the elements in HTML5 are designed to be semantically rich; in other words, they are designed to provide meaning to the content they define.

A lot of the new elements introduced in HTML5 are geared towards making documents more semantically relevant. There are new tags for grouping content together and for defining what part is a header, what part is a footer, what the main body is, and what sections are within that. Basically, the contributors to the HTML5 specification sat down and looked at most pages on the Web and asked themselves what most of them have in common. The following sections outline what they came up with.

Sections

Used for grouping together content of similar meaning, the <section> element should replace a lot of overuse of the <div> tag. The <section> element is a big improvement because it actually has semantic meaning: it says that “this content is related.” A <div>, by contrast, could have been used to denote a sidebar or a page wrapper for applying formatting to the entire document; it didn’t necessarily mean that the content within had any connection:

 <section>

  <h1>HTML5 Overview</h1>

  <p>HTML5 is used to structure documents on the web.

  It contains a lot of useful elements (or tags).</p>

 </section>

Article and aside

Similar to a <section>, an <article> was introduced as a way to group together related information in a document. A lot of the elements in HTML5 translate well to thinking about a blog. For example, an <article> would contain a single post in your blog, whereas a <section> may contain a month’s worth of posts on a page and list all that content chronologically.

An <aside> is perfectly suited to a sidebar (not a bar that is physically located on the side of a page, but rather “additional information”). Think of a sidebar as it relates to a newspaper: news outlets will frequently include pullquotes and/or additional (related) articles in their sidebars. That’s all perfect fodder for the <aside> element.
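To tie these elements together, here is a minimal sketch of the blog scenario described above: a <section> holding a month of posts, each post in its own <article>, with an <aside> carrying related information. The post titles and text are invented for illustration.

```html
<section>
  <h1>March 2012</h1>

  <!-- Each blog post is a self-contained article. -->
  <article>
    <h1>How I Learned to Ride a Bike</h1>
    <p>It all started with a pair of training wheels.</p>

    <!-- Additional, related information that is not part of the main story. -->
    <aside>
      <p>Related post: choosing a helmet that fits.</p>
    </aside>
  </article>

  <article>
    <h1>My First Long Ride</h1>
    <p>Twenty miles felt like a hundred.</p>
  </article>
</section>
```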

Headers and headlines

Another new addition to the HTML5 element family is <header>, which is to be used for grouping together introductory content (just as a header would). Traditionally, when you think of a header, you imagine that there would be one header per page. That’s not necessarily the case in HTML5, though; you can have as many headers as needed throughout your document, and you group those headers with their associated content using a <section> element.

In almost any document you create, you’re going to come across the need for headlines to introduce the sections of text or other content in the document. If you’ve dealt with any form of publishing in the past (even just MS Word), you’ll understand that there are different levels of headlines: primary, secondary, tertiary, and so forth. In HTML5, you can use six levels of headlines, and the elements for these are called <h1> to <h6>. The <h1> tag is for the most important headline in a section (which is also new to HTML5; it used to be for the most important headline in a document), and <h2>, <h3>, and so on decrease in importance until you get to <h6>, the least important (or most esoteric) headline in the section. Let’s take a look at an example:

 <h1>How I Learned to Ride a Bike</h1>

Headlines, as you would expect, cannot be nested inside each other, but they can be right next to each other:

 <h1>Overview of How I Learned to Ride a Bike</h1>

 <h2>Step 1: Practice with Training Wheels</h2>

 <h2>Step 2: Practice Balancing Without Support</h2>

In essence, headlines create an outline for your content. So when you mark up a section with headlines, be sure to think about what kind of outline makes the most sense.
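One way to check an outline is to read only the headlines and see whether they still tell the story. The sketch below extends the bike example with a third level; the <h3> headlines are invented for illustration.

```html
<h1>How I Learned to Ride a Bike</h1>

<h2>Step 1: Practice with Training Wheels</h2>
<h3>Choosing the Right Wheels</h3>
<h3>Adjusting the Seat Height</h3>

<h2>Step 2: Practice Balancing Without Support</h2>
<h3>Finding a Gentle Slope</h3>

<!--
  The outline these headlines suggest:
  1. How I Learned to Ride a Bike
     1.1 Step 1: Practice with Training Wheels
         1.1.1 Choosing the Right Wheels
         1.1.2 Adjusting the Seat Height
     1.2 Step 2: Practice Balancing Without Support
         1.2.1 Finding a Gentle Slope
-->
```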

Another element commonly found within the <header> element(s) of a page is the <nav> element. This element is used to denote navigation information, like a list of links. While it isn’t required that the <nav> element appear within the <header>, it is a good place for it to go.
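A minimal sketch of a page header that contains navigation might look like the following. The site name and the three file names are hypothetical, and the links use the <a> element shown at the start of this chapter.

```html
<header>
  <h1>Jane's Cycling Blog</h1>

  <!-- The nav element groups the links used to move around the site. -->
  <nav>
    <a href="index.html">Home</a>
    <a href="archive.html">Archive</a>
    <a href="about.html">About</a>
  </nav>
</header>
```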

Footers

Also new to the HTML5 scene (you might get sick of us saying that by the end of this chapter), the <footer> element is used to contain information commonly found in the footer of a section. This content may be contact information, copyright information, or possibly some metadata about the author of an article. As with the <header> element, you may have multiple <footer> elements per page, each grouped into a different <section>:

 <footer>

  <p>Contents of this page licensed under a Creative Commons Share Alike license.</p>

 </footer>
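To show how several footers can coexist, here is a sketch with two sections that each carry their own <header> and <footer>, followed by a footer for the page as a whole. The titles, author names, and text are all invented for illustration.

```html
<section>
  <header>
    <h1>How I Learned to Ride a Bike</h1>
  </header>
  <p>It all started with a pair of training wheels.</p>
  <!-- This footer belongs to the first section only. -->
  <footer>
    <p>Written by Jane Doe.</p>
  </footer>
</section>

<section>
  <header>
    <h1>My First Long Ride</h1>
  </header>
  <p>Twenty miles felt like a hundred.</p>
  <!-- This footer belongs to the second section only. -->
  <footer>
    <p>Written by John Doe.</p>
  </footer>
</section>

<!-- This footer applies to the page as a whole. -->
<footer>
  <p>Contents of this page licensed under a Creative Commons Share Alike license.</p>
</footer>
```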

Blocks of text

Most likely, a majority of your document will simply be blocks of text such as paragraphs, asides, block quotations, and other such things. HTML5 has elements specifically meant for each of these items. Most are just as easy to apply as headlines. Let’s look at an example with a few paragraphs:

William Shakespeare was an English poet and playwright. He is widely regarded as the greatest writer of the English language and the world's pre-eminent dramatist. His surviving works include approximately 38 plays and 154 sonnets, as well as a variety of other poems. Shakespeare was also respectively modest about his innovative works, never truly boasting about his tremendous ability. He is often called England's national poet and the "Bard of Avon" (or simply "The Bard").

Shakespeare was born and raised in Stratford-upon-Avon, and at age eighteen married Anne Hathaway, with whom he had three children. Sometime between 1585 and 1592 Shakespeare moved to London, where he was an actor, writer, and part-owner of the playing company the Lord Chamberlain's Men (later known as the King's Men), with which he found financial success. Shakespeare appears to have retired to Stratford in 1613, where he passed away three years later at the age of 52.

You specify to the web browser that these blocks of text are paragraphs by marking them up with the element designed for such content. Since paragraphs of text are such a common occurrence on web pages, the element for them is given the shorthand name, <p>:

<p>William Shakespeare was an English poet and playwright. He is widely regarded as the greatest writer of the English language and the world's pre-eminent dramatist. His surviving works include approximately 38 plays and 154 sonnets, as well as a variety of other poems. Shakespeare was also respectively modest about his innovative works, never truly boasting about his tremendous ability. He is often called England's national poet and the "Bard of Avon" (or simply "The Bard").</p>

<p>Shakespeare was born and raised in Stratford-upon-Avon, and at age eighteen married Anne Hathaway, with whom he had three children. Sometime between 1585 and 1592 Shakespeare moved to London, where he was an actor, writer, and part-owner of the playing company the Lord Chamberlain's Men (later known as the King's Men), with which he found financial success. Shakespeare appears to have retired to Stratford in 1613, where he passed away three years later at the age of 52.</p>

Again, there’s nothing complicated here. Simply start the paragraph with the open paragraph tag (<p>) and end it with the close paragraph tag (</p>).

In addition to paragraphs, you can indicate that a selection of text is quoted from another source by using the <blockquote> element (an abbreviation of block quotation). Since the only thing that the <blockquote> element indicates is that its content was sourced from somewhere else, inside each <blockquote> element you will still need to use elements such as headlines and paragraphs to mark up the quotation’s content. It is by nesting elements in this way that you begin to provide rich meaning to your content. Here’s an example of a block quotation that quotes a couple of paragraphs:

 <blockquote>

  <p>Through the release of atomic energy, your generation has brought into the world the most revolutionary force since prehistoric man's discovery of fire. This basic force of the universe cannot be fitted into the outmoded concept of narrow nationalisms.</p>

  <p>For there is no secret and there is no defense; there is no possibility of control except through the aroused understanding and insistence of the peoples of the world. We scientists recognize your inescapable responsibility to carry to your fellow citizens an understanding of atomic energy and its implication for society. In this lies your only security and your only hope - you believe that an informed citizenry will act for life and not for death.</p>

 </blockquote>

Most of the time when you’re quoting an outside source, you want to cite that source along with the quote. As you might have guessed, there’s another HTML5 element designed to do just that: the <cite> element. <cite> has been slightly redefined in HTML5, but frankly, the “redefinition” doesn’t make a lot of sense. <cite> is only supposed to be used to cite the title of a quoted work, not the author. Unfortunately, there is no similar element for citing the author; so really, we think this is a case where you can break from the rules. Unlike the <blockquote> element that contains other blocks of text, however, the <cite> element is permitted to contain only other text (and not blocks of text). This is what it looks like when you cite the previous quote:

 <blockquote>

  <p>Through the release of atomic energy, your generation has brought into the world the most revolutionary force since prehistoric man's discovery of fire. This basic force of the universe cannot be fitted into the outmoded concept of narrow nationalisms.</p>

  <p>For there is no secret and there is no defense; there is no possibility of control except through the aroused understanding and insistence of the peoples of the world. We scientists recognize your inescapable responsibility to carry to your fellow citizens an understanding of atomic energy and its implication for society. In this lies your only security and your only hope - you believe that an informed citizenry will act for life and not for death.</p>

  <p><cite>Albert Einstein</cite></p>

 </blockquote>

One good reason why the <cite> element must be placed inside the paragraph here is so that you can more accurately indicate the cited reference. Say, for example, the content you wanted in that last paragraph is “Original Quote by Albert Einstein.” Rather than citing that whole sentence, which wouldn’t make much sense, you can mark up the reference as such:

 <p>Original Quote by <cite>Albert Einstein</cite></p>

This is much more accurate because it specifically and explicitly isolates the cited reference, Albert Einstein in this example, and thus this example provides far more meaningful markup.

Similar to the way that the <blockquote> element is used, two additional common elements are used to mark up specific blocks of content. First, the <code> element indicates that the enclosed text is, as you would guess, computer code of some sort. (What sort of code isn’t typically specified, although you can do so using certain attributes if you like.) The <code> element can be used to mark up large blocks of content or just a single piece of code or even a word in a sentence, much like the <cite> tag.

Then there is the <pre> tag, which indicates that the content inside it is preformatted with line breaks, whitespace, tabs, and so on; and it should be rendered the same way in the browser window as it appears in the actual physical markup. The <pre> tag is most often used in combination with the <code> tag to show computer language excerpts or to display certain kinds of formatted text such as poetry.
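
As a minimal sketch of these two elements in use, the following marks up a single word inline with <code>, and then a short multi-line excerpt with <pre> and <code> together. The greet function is just made-up placeholder content for illustration:

```html
<!-- Inline use: a single word inside a sentence -->
<p>Call the <code>greet</code> function to display a message.</p>

<!-- Block use: the pre element preserves line breaks and indentation -->
<pre><code>function greet(name) {
  return "Hello, " + name;
}</code></pre>
```

Notice that the code inside the <pre> element starts right after the opening tags; any extra whitespace you add there would show up in the browser too.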

Identifying content

There are two core attributes defined in an HTML5 document that can be used to give elements names that you can refer to later (mostly in Cascading Style Sheets, described later in this book). These attributes are id and class, and they can both be used on any element. The id attribute is simply a way to give a unique identity to an element. The class attribute serves a similar purpose, but instead of being unique throughout the document, the same class can be specified any number of times.

A common example of this is to set the id attribute on particular elements to provide a little more meaning to the group. For example, say you have a headline and a paragraph about a featured product on your website:

 <h1>Swingline Stapler</h1>

<p>This workhorse stapler brings you a solid and consistent performance that makes it an industry standard. An all-metal die-cast base for years of durability. A performance driven mechanism with an inner rail for long-term stapling integrity. Ease of use, refined design, time-tested features: exactly what you'd expect from America's #1 stapler.</p>

Let’s nest these two elements inside an <article> element to group them together:

 <article>

<h1>Swingline Stapler</h1>

<p>This workhorse stapler brings you a solid and consistent performance that makes it an industry standard. An all-metal die-cast base for years of durability. A performance driven mechanism with an inner rail for long-term stapling integrity. Ease of use, refined design, time-tested features: exactly what you'd expect from America's #1 stapler.</p>

 </article>

Now you can identify that <article> element as a feature on your web page by using the id attribute:

 <article id="feature">

<h1>Swingline Stapler</h1>

<p>This workhorse stapler brings you a solid and consistent performance that makes it an industry standard. An all-metal die-cast base for years of durability. A performance driven mechanism with an inner rail for long-term stapling integrity. Ease of use, refined design, time-tested features: exactly what you'd expect from America's #1 stapler.</p>

 </article>

Since you’ve used the id attribute, you’ve also implied that you’ll have only one feature on your web page at any given time. If this isn’t true, and you’ll in fact have multiple “features,” then you need to use the class attribute instead. You can still use unique values in id attributes to identify individual features if you like:

 <article class="feature" id="feature-1">

<h1>Swingline Stapler</h1>

<p>This workhorse stapler brings you a solid and consistent performance that makes it an industry standard. An all-metal die-cast base for years of durability. A performance driven mechanism with an inner rail for long-term stapling integrity. Ease of use, refined design, time-tested features: exactly what you'd expect from America's #1 stapler.</p>

</article>

<article class="feature" id="feature-2">

<h1>Black Standard Stapler</h1>

<p>Not as exciting as the Swingline Stapler, but a classic stapler nonetheless! </p>

 </article>

Most importantly, id and class attributes should add meaning to the markup and describe what the content is. You should avoid id values that are presentational in nature, such as left-column or blue-box. Presentation and styling will be completely handled using CSS, so your markup should remain as semantic as possible.
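
To make the difference concrete, here is a small sketch that contrasts presentational names with semantic ones. The id and class values (and the placeholder text) are invented for this illustration:

```html
<!-- Presentational: the names describe how the content looks today -->
<div id="left-column" class="blue-box">
  <p>Customers who bought this stapler also bought staples.</p>
</div>

<!-- Semantic: the names describe what the content is -->
<div id="related-products" class="feature">
  <p>Customers who bought this stapler also bought staples.</p>
</div>
```

If the design changes and the box moves to the right or turns green, the second version still makes sense; the first one would need to be renamed.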

Links

Of course, the breakthrough that HTML introduced was the ability to allow page authors to create links from one document to another by marking up specific text with an element that created a reference (also called a pointer or a hyperlink) to another page. The element that was created for this purpose is simply called the <a> element, which stands for the anchor element. Authors would place anchors in their pages that each linked to some other page, and those pages in turn would provide their own anchors that pointed at other pages; this was the beginning of hypertextual navigation, or web surfing.

Like the <cite> element, anchors are typically parts of a sentence. To actually create a link, the anchor needs to do two things. First, it needs to designate something to be the link itself. This is what the user ends up being able to click. Second, it needs to specify the destination to which the link points. This is where the web browser will take the user when the user clicks the link.

The link itself is created just like any other element, by enclosing a word or phrase (or other item) within the <a> element. The resulting content of the link is referred to as the anchor text. Since the destination at which the link points is metadata, it is specified in an attribute of the <a> element. This attribute is called the hyperlink reference and is abbreviated to href for short. You can set the href attribute to any URL on the Internet or to any local page on the current site. Let’s take the following sentence as an example:

 For more information, visit Wikipedia, the free online encyclopedia.

It would make sense to link part of this sentence to Wikipedia, located online at http://wikipedia.org/ . So, use the <a> element and set the href attribute to the Wikipedia website:

 For more information, visit <a href="http://wikipedia.org/">Wikipedia</a>, the free online encyclopedia.

From a technical perspective, this is absolutely correct. The element is in the right place, it’s wrapped around the right word, and the attribute is valid. However, there’s something else you should think about whenever you create links from one page to another. Forgetting to think about this is one of the most common mistakes that web developers make. What you need to remember to think about is how you can link up as much relevant text as possible.

There are several reasons why this is an important thing to do:

  • It provides a larger clickable area for the visitor of the website. Rather than having to hover over a single word, multiple words together provide more room to click, and that makes it easier to use.
  • It creates more descriptive anchor text, and that can help people decide whether that link is really what they want to visit. It also improves accessibility for people who use screen reader applications (such as visitors who are visually impaired).
  • It’s better from an SEO (search engine optimization) perspective to provide descriptive links to pages. Descriptive text in a link to another page is one of the factors Google takes into account when determining how relevant a page is to a certain subject.

How do you know whether the anchor text you’ve provided is good enough? A good rule of thumb that you can use to test it is that you should always be able to extract all the anchor text of all the links on your page; and without any supporting context, the links should clearly imply their destination. Avoid linking words that have nothing to do with where you’re linking to, such as “click here,” or just “here.” These types of links aren’t helpful to anyone. So again, the more descriptive a link’s anchor text is, the better.
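
Here is one way the Wikipedia sentence from earlier could be marked up, first with vague anchor text and then with more of the relevant text linked. This is only a sketch of the idea; how much text you link is a judgment call:

```html
<!-- Vague: the anchor text says nothing about the destination -->
<p>For more information about Wikipedia, <a href="http://wikipedia.org/">click here</a>.</p>

<!-- Better: the anchor text makes sense on its own -->
<p>For more information, visit <a href="http://wikipedia.org/">Wikipedia, the free online encyclopedia</a>.</p>
```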

To help this along, HTML5 has introduced the ability to create links out of multiple elements. So now, if you’d like to turn a large part of your header into a hyperlink, there’s nothing stopping you:

 <header>

  <a href="http://www.cnn.com/">

  <h1>CNN.com &mdash; Your source for news</h1>

  <p>CNN.com provides local and global perspectives on the news that matters to you.</p>

  </a>

 </header>

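The main exception to this rule is that one <a> element can’t be nested inside another <a> element, and a link can’t contain other interactive elements such as buttons or form controls; apart from that, you’re free to wrap whatever content you like.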

The href attribute, URLs, and web page addresses

Without links, the Web wouldn’t be the Web, so it behooves you to pay a little more attention to the href attribute and the values it can take.

For obvious reasons, the href attribute is always going to contain a hyperlink reference, which is really just a fancy way of saying a web page address or, more formally, a URL. The acronym URL stands for Uniform Resource Locator. For the purposes of this chapter, a URL is just a way to generalize the kinds of resources that you can request through a web browser. A web page is one kind of resource, an image is another, and a video is yet another. Most of the time, hyperlink references refer only to web pages. All URLs, including those that specify web page addresses, can be written in three distinct ways. Each of these ways is a valid value for the href attribute.

The most verbose form of a web page address is called a fully qualified URL. Fully qualified URLs are so named because they are composed of all three pieces of a URL: the scheme (also called the protocol), the domain name (also called the host name), and the path. In the example link to Wikipedia in the previous section, the scheme was http (followed by the :// separator), the domain name was wikipedia.org, and the path was the trailing / (which means “the home page”).

Another way to specify a URL is called an absolute link. Absolute links require that only the path portion of the address be placed in the href value, and that the path begin with the website’s root, or initial forward slash (/). These are called absolute links because they specify the full path from the root of the website (typically the home page) to the page being linked to.

The final method is called a relative link. Like absolute links, relative links also require that only the path portion of the address be present. Unlike an absolute link, however, a relative link must not begin with an initial forward slash. Relative links always use the context of the current location of your web page to infer the complete URL, so they are by definition restricted to always link to pages hosted at the same domain name. Relative links can be a little harder to grasp, so let’s look at a few examples.

A website’s structure is just like the files and folders on your computer. If you have a file called overview.html on your server and you want to link to messages.html, you could simply place messages.html as the href value, like this:

 <a href="messages.html">Messages from fans</a>

If you want to link to john.html in the people folder, you will have to add the folder (also called the directory) before the file, with a forward slash between the folder name and file name:

 <a href="people/john.html">John's Contact Information</a>

Both of these are relative links that link to different pages on the same website, relative to their own positions. Relative links are handy because they don’t need to be updated if the structure of some part of your website changes. They need to be updated only if the file that they reference moves (or if they themselves move).

What if you’re currently looking at john.html in the people folder and you want to link back to the overview HTML page above it? In this instance, you need to use a special symbol to indicate you want to move up one folder level before looking for a particular file. That symbol is two periods, which just means “the folder above me.” Like all folders, you then follow this symbol with a forward slash:

 <a href="../overview.html">Overview of our service</a>

The preceding snippet tells the browser to go up one level in the website tree, and then load the overview.html page. This ../ sequence can be used multiple times if you need to go up two or more levels.
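
For instance, suppose the link lives two folders deep, in a hypothetical page at people/staff/jane.html (this file is made up for the example). A sketch of the link back to the overview page would then repeat the sequence twice:

```html
<!-- From people/staff/jane.html, go up two folders to reach overview.html -->
<a href="../../overview.html">Overview of our service</a>
```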

It’s almost always best to link to files on your own site with relative or absolute links, as opposed to fully qualified ones. This saves you from having to type your own domain name over and over again (which makes your HTML files that much shorter); it also makes your site more portable in case your domain name changes in the future.
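
To compare the three forms side by side, here is a sketch of the same link written each way. It assumes a site hosted at www.example.com (a placeholder domain) and that the linking page sits in the site's top-level folder:

```html
<!-- Fully qualified URL: scheme, domain name, and path -->
<a href="http://www.example.com/people/john.html">John's Contact Information</a>

<!-- Absolute link: path only, starting from the site root -->
<a href="/people/john.html">John's Contact Information</a>

<!-- Relative link: path only, resolved from the current page's folder -->
<a href="people/john.html">John's Contact Information</a>
```

Only the first form would need to be edited if the site moved to a different domain name.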

Emphasis

When working with almost any kind of content, some parts of that content will need to be emphasized more than others. For instance, many times when marking up a document, you need to make words bold or italic. Back in the old days, before web developers realized the importance of having semantically rich and meaningful HTML documents, this was most commonly done using the <b> and <i> elements. If you were to wrap a word or phrase in the <b> element, it would render as bold in the browser window. That seems simple enough, right? But let’s analyze this for a minute.

What does “bolding” some part of a sentence really mean? It doesn’t mean “make it bigger”—that’s just how it ends up looking. What it means is, “make it stand out.” The key concept, of course, is that when you think of making something bold, all you really want to change is its presentation, or how it looks. Well, as we all know, thanks to being the enlightened semantic web developers that we are, markup should be about what the content means and not what it looks like.

In modern HTML, you typically replace all occurrences of the <b> element with the <strong> element, which simply means “strong emphasis.” Similarly, you replace all occurrences of the <i> element with the <em> element, meaning simply “emphasis.” When you enclose words or phrases inside the <em> element, the browser automatically defaults to italicizing the text. Similarly, the browser defaults to bolding words or phrases inside the <strong> element.

Note: We’ll see how <b> and <i> are treated in HTML5 in the next section.

Be sure to use the <em> and <strong> tags only when you mean to emphasize text. If you’re looking to achieve italic or bold on text without meaning to emphasize it (e.g., on a book title), then you can stylize the text appropriately using CSS (this topic is discussed in future chapters). Don’t put <em> or <strong> tags around text just because you want it to be italicized or bold; instead, use a <span> tag with a class specified. Be sure to keep your markup meaningful and accurate.
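
A short sketch of the distinction follows. The class name product-name is invented for this example, and the styling itself would live in CSS:

```html
<!-- Emphasis is part of the meaning, so em and strong are appropriate -->
<p>You <strong>must</strong> back up your files before you <em>ever</em> upgrade.</p>

<!-- The styling here is purely visual, so a span with a class is used instead -->
<p>Our best seller is the <span class="product-name">Swingline Stapler</span>.</p>
```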

Other inline elements

<strong> and <em> are called inline elements because they can be included in the middle of a block of text, and they won’t cause breaks or added spacing within that text. <cite>, which we discussed earlier, is also considered an inline element. Let’s take a look at a few other inline elements that will allow you to give your content even more meaning.

The old: redefined

In previous versions of HTML, <small> and <big> were used to format text as either bigger or smaller than regular paragraph text. In HTML5, <big> is gone, but <small> has taken on a new meaning: it’s now used to represent that small print that lawyers love to use so often. HTML5 makes no lawyer jokes here; it makes them feel included by giving them their very own element to splash about liberally.

Similarly, the <b> and <i> elements mentioned previously have been redefined. Instead of just making something bold, <b> now has the semantic meaning of “an element to be stylistically offset from the normal prose without conveying added importance.” So basically, it still does the same thing, but now it actually does it with feeling! <i> is a little more interesting; instead of meaning “italicize,” <i> now means “in an alternate voice or mood.” I don’t know about you, but I’ve been dying for a semantic way to represent my sarcasm online.

The previously mentioned <span> is really a semantically void inline element used for wrapping things up (very similar to <div> at the block level). Spans will often have a class or id assigned to them, and they’re used primarily as formatting hooks for a piece of text. In other words, if you can avoid using a <span> by using something a little more semantically rich, do it!
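
Here is a rough sketch of the redefined elements at work, using invented sample text: <b> offsets a product name without adding importance, <i> marks a change of voice, and <small> holds the fine print:

```html
<p>The <b>Swingline Stapler</b> ships within two days. <i>As if anyone is in a hurry for a stapler.</i></p>

<p><small>Offer valid while supplies last. Shipping times are estimates.</small></p>
```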

Look, shiny new elements!

HTML5 isn’t just a redefinition of the same old stuff. It also introduces a wide range of new elements for your semantic enjoyment. For example, how many times have you said to yourself: “Self, I wish there was a way I could highlight a passage of text that is relevant within its context.” (If your answer is anything other than “never,” maybe we should have a talk over a warm cup of tea).

Fear not! HTML5 rides to the rescue with the introduction of the <mark> element. The <mark> element (besides being a snappy dresser) is intended to highlight a piece of text, but to not give it any added importance. The best example I’ve read of a situation where you’d like to do this would be in a set of search results. If somebody searches for the word “HTML5” in your document, and you want to highlight all instances of that word in the search results, <mark>’s your man.
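
As a minimal sketch, a pair of search results for “HTML5” might be marked up like this (the result text is made up for the example):

```html
<!-- Each match for the search term is highlighted with the mark element -->
<p>Giving Your Pages Structure: <mark>HTML5</mark></p>

<p>A tour of the new elements introduced in <mark>HTML5</mark></p>
```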

HTML5 introduces a pile of new elements for marking up your data (and by a pile, I mean two). <time> and <meter> are handy for your date and time needs, as well as for any measurements you need to include. They allow you to include some machine-readable attributes that keep things consistent, but that also make it so that you can use any format you’re comfortable with for display:

<time datetime="2012-02-14">Valentine's Day in 2012</time> will fall on a Tuesday. Remember to buy your sweetheart <meter max="36" min="12" optimum="24" value="24">two dozen roses</meter> to show how much you love him.

I went a little crazy in the preceding example, but it helps to show you the great attributes available to you.

Well, we’ve made some great progress with this chapter so far. Wait, that reminds me of one more element worth mentioning: <progress>. The <progress> element can be used to mark up a measure of progress (think: download or upload of a file). Just like <meter>, it’s got some great attributes to enhance its meaning:

 <progress max="100" value="30">You're now about 30% of the way through this chapter!</progress>

Lists

Lists are another common item in publishing. Using HTML, you can have three different kinds of lists: ordered, unordered, and definition lists. Ordered lists are meant to itemize sequential items, so they are prefixed by numbers that are listed in order by default. In contrast, items in an unordered list are meant to itemize things that don’t imply a sequence, so they’re prefixed by bullets by default. Definition lists are meant to hold glossary terms or other title and description items in a list, so they’re not prefixed by anything by default.

The elements you use to define these lists are <ol> for ordered lists, <ul> for unordered lists, and <dl> for definition lists. In each case, the only children these elements are allowed to have are the appropriate list items. So, let’s start building a list.

Unordered and ordered lists

For this example, let’s mark up a grocery list:

 Milk

 Eggs

 Butter

 Flour

 Sugar

 Chocolate Chips

Here, an unordered list makes perfect sense because the order you buy these items in really makes no difference. If this were not a shopping list but, say, a list of directions for making chocolate brownies, then perhaps the order you add these items to your mixing bowl would matter, and an ordered list would be more appropriate. But let’s stick with your shopping example and add the <ul> element to your list:

 <ul>

 Milk

 Eggs

 Butter

 Flour

 Sugar

 Chocolate Chips

 </ul>

So, you have all the items in your list inside the appropriate element. Next, each item of an unordered (or ordered) list must be contained within a list item element, which is defined with <li>. Let’s insert the <li> elements into your unordered list:

 <ul>

  <li>Milk</li>

  <li>Eggs</li>

  <li>Butter</li>

  <li>Flour</li>

  <li>Sugar</li>

  <li>Chocolate Chips</li>

 </ul>

Great! Aided by your shopping list, you go out and gather your ingredients for chocolate brownies. When you are ready to mark up your brownie recipe, however, you find out that these were actually put in this order for a reason (that’s the order you need to add them to the mixing bowl). Marking that up is as simple as replacing your <ul> element with an <ol> element. The <li> elements work inside both ordered and unordered lists, and the browser will render the appropriate prefix to each list item.

So, let’s see the markup for your newly ordered list:

 <ol>

  <li>Milk</li>

  <li>Eggs</li>

  <li>Butter</li>

  <li>Flour</li>

  <li>Sugar</li>

  <li>Chocolate Chips</li>

 </ol>

Notice that you didn’t need to put the numbers in the list at all; the browser does that on its own. This turns out to be especially convenient when you need to add items to the top or middle of an ordered list. There is no need to redo the numbering yourself because it’s handled automatically. This is one example of the power that can be harnessed by using semantic, meaningful markup.

It’s interesting to see how you can start to use multiple elements in concert, if needed, to continue encoding semantic meaning to your list’s content. For instance, if you wanted to emphasize the text inside a particular list item, you could nest the <em> element inside the <li> element, like this:

 <li><em>Chocolate Chips</em></li>

Similarly, you could link a particular list item to another page on the Internet:

 <li><a href="http://www.aeb.org/">Eggs</a></li>

You could even, if you wanted, emphasize the text that’s being linked:

 <li><em><a href="http://www.aeb.org/">Eggs</a></em></li>

You can also nest lists inside each other to create a list of lists. For example, you could expand your shopping list into item categories and their actual ingredients:

 <ul>

  <li>Baking Ingredients

  <ul>

  <li>Milk</li>

  <li>Eggs</li>

  <li>Butter</li>

  <li>Flour</li>

  <li>Sugar</li>

  <li>Chocolate Chips</li>

  </ul>

  </li>

  <li>Cereal

  <ul>

  <li>Fruity Pebbles</li>

  <li>Cheerios</li>

  <li>Cinnamon Toast Crunch</li>

  </ul>

  </li>

 </ul>

What you have here is an unordered list nested inside the list item of another unordered list.

By default, most web browsers will automatically switch the bullet styles of the inner lists when you create nested lists to make it easier to tell them apart.

Many elements in HTML can be nested inside the same elements (like these lists). Some, however, cannot. These rules are all defined in the HTML specification, although oftentimes the best rule of thumb is just to think about whether it makes sense for the element to be found inside another element of the same kind. Lists, as you can see, make perfect sense. Paragraphs, however, do not. What sense would it make to put a paragraph inside another paragraph? It wouldn’t make much sense at all (but you can certainly put a paragraph inside a list item).
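
For example, here is a sketch of a list item that holds two paragraphs of its own (the shopping notes are invented for illustration):

```html
<ul>
  <li>
    <p>Milk</p>
    <p>Check the date on the carton before you buy it.</p>
  </li>
  <li>Eggs</li>
</ul>
```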

Definition lists

Let’s quickly examine how to mark up definition lists because they are slightly different from ordered and unordered lists. Instead of the single <li> element for the list items, you have two elements to work with inside definition lists. These are <dt> for the definition’s title and <dd> for the definition’s description:

 <dl>

  <dt>body</dt>

  <dd>Holds content of the document</dd>

  <dt>head</dt>

  <dd>Holds additional information about the document</dd>

  <dt>html</dt>

  <dd>Root element of the document</dd>

 </dl>

You can see the basic premise is what you’re already used to: a <dt> element wrapping the definition’s title, followed by a <dd> element wrapping the description. All of these are held together by being enclosed inside a definition list.

Images

So far, you’ve explored various kinds of elements that deal solely with text. However, web pages can be a lot more interesting than that. Let’s find out how you can add images to your pages next.

When placing images in a document, you use the <img> element. Image elements are empty elements because they will be replaced with the image you’re referencing. However, to work properly, you need to tell the web browser where to find the image you’d like to insert, as well as give it a brief textual description of what the image is.

These two pieces of metadata are specified inside the two required attributes that <img> elements hold. First, the src attribute points to the actual image file to be loaded into the document. The same rules apply to the src attribute as apply to the href attribute on the <a> element. Relative or absolute, the src attribute gives the browser the URL that tells it where to look for the image.

The second attribute is alt, short for the alternate text for the image. (You might also hear people calling this the image’s alternative text, or alt text.) It’s used to describe what the image is and what meaning it has in the document:

 <img src="/images/moose.jpg" alt="The Majestic Moose">

Recall that this is an empty element, so it contains no other content, and thus it is opened and closed with a single tag. The attributes here indicate that you want to load in the moose.jpg file found in the images folder relative to the website’s root and that you want to provide the alternate text of “The Majestic Moose.”

The src attribute’s purpose is obvious, but the alt attribute’s purpose isn’t. Usually, the alternate text doesn’t even show up on the page when you look at it in your web browser. So, what’s it there for?

The alternate text actually provides several very important features. First, in keeping with the semantic importance of markup, it’s what lets the image provide some meaning to the document. In the previous example, if this were a page about wildlife, then the picture of the moose would likely be considered part of the content of the page. Without the alternate text in the <img> element, the image would be essentially “meaningless,” and the document would lose some of its semantic value.

On the other hand, if the rest of the page were about grocery shopping, and the picture of the moose were just there for visual ambiance, then the image would probably be superfluous and might be considered “not content.” In this case, you would leave the alternate text completely blank to indicate that the image is used only for visual reasons and shouldn’t be considered an important part of the content of the page (so that it’s not picked up by assistive technologies):

 <img src="/images/moose.jpg" alt="" />

Note that you still need to include the alt attribute in the <img> element code, but that the value of the alt attribute is left empty. Determining which images on your page should be “real content” and which ones shouldn’t is part of the art of web development. There often isn’t a clear or right answer, so you simply need to use your best judgment.

Assuming the picture does add some value to the page, the alternate text also helps out in a few additional, concrete ways. For example, even though it does not show up on the page in the normal sense, some older web browsers will show the alternate text for an image as a tooltip if you hover your cursor over the image for a few seconds. (Current browsers generally reserve that behavior for the title attribute.) You can think of it like a transient caption; it’s there if the visitor to your web page requests to see it, but it’s otherwise hidden.

Of course, some visitors to your website won’t be able to see at all. These visitors might be humans with visual impairments or they might be machines, such as Google’s search engine spider (googlebot). For these visitors, the presence (or absence) of the alternate text is more important than the image itself.

Visually impaired people browsing the Web often do so with a special kind of web browser that reads the content of web pages aloud to them. These programs are called screen readers. When a screen reader happens upon an image in your document, it uses the alternate text to describe what the image shows. Providing good alternate text for your images is crucial to making your web pages accessible and understandable to these visitors.

Similarly, when Google or Yahoo is reading the markup of your page, good alternate text in your <img> elements can help a search engine make sense of the purpose and content of a given image, allowing it to be more effective in ranking your web pages appropriately in the search result listings. Generally, the better your alternate text describes the content in your images, the easier it is for search engines to index them, which can help your search result rankings.

Finally, if for any reason the image you’ve told the web browser to include in the page can’t be found (the image file itself) or loaded, most web browsers will replace the <img> element with the text of the alt attribute. This provides a sort of contingency plan to help ensure that visitors to your website get what they came for, even if they can’t get it in the preferred form. Similar techniques are used all the time in web development, and collectively they’re called graceful degradation.

When using images on your sites, be mindful of how many you have put on each page, how big they are, and what purpose they serve. Excessive use of images will make a page slow to load, and it might also be visually disruptive. Compared to HTML, images are extremely large. On most pages, the bulk of the time it takes to download a page is caused by its images.

What if you want to draw?

While your friend the <img> tag is great for displaying photos and other pre-generated images, what if you want to draw a picture or generate a chart of some data on the fly? HTML5’s got you covered on that front with the <canvas> tag.

The <canvas> element is very simple (by itself). All you do is specify the dimensions (and give it an id so that you can target it with the JavaScript you’re going to use to draw the picture):

 <canvas id="a_simple_box" width="640" height="480"></canvas>

If you put anything between the canvas tags, browsers that don’t yet support <canvas> will render that instead:

 <body onload="drawIt();">

  <canvas id="a_simple_box" width="640" height="480">

  <p>Sorry, you can't see the beautiful box drawn here. Get with the times.</p>

  </canvas>

 </body>

It’s a bit of a tease, but this chapter is already long enough without discussing the JavaScript necessary to actually draw the box. We’ll look at a listing of some sample code, but avoid discussing it in any detail (we will go into more about JavaScript in later chapters):

 <script type="text/javascript">

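 function drawIt() {

  // Find the canvas element by its id and get its 2D drawing context

  var canvas=document.getElementById('a_simple_box');

  var context=canvas.getContext('2d');

  // Use a black outline and a red fill

  context.strokeStyle='#000000';

  context.fillStyle='red';

  // Fill a 300 x 300 square at (10, 20), and then draw its outline

  context.fillRect(10, 20, 300, 300);

  context.strokeRect(10, 20, 300, 300);

 }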

 </script>

I want moving pictures (and sound)!

You kids these days are never satisfied with just still pictures. You’re always striving to engage more of your senses and incorporate video and audio. Luckily for you, the folks working on HTML5 are with you, and they’ve added a pair of new elements for just such outlandish purposes. Reader, meet <audio> and <video>.

Both <audio> and <video> have friends that play nicely together, but that come from very different backgrounds. There has been and continues to be some disagreement as to what file format should serve as “the video format” or “the audio format” for the Web. On the audio front, MP3 is hugely popular, and it’s supported by the majority of the players (Internet Explorer, Safari, and Chrome). Unfortunately, the MP3 format has a few patent issues surrounding it, whereas the OGG format is completely open and free. Currently, OGG is the only one of the two supported in Firefox, so to ensure maximum compatibility, both must be included. It’s pretty much an ideological battle at this point, and hopefully things will settle down eventually. But, for the time being, we’ve got to support both. Here’s what that looks like:

 <audio controls preload="auto">

  <source src="my_audio.ogg" type="audio/ogg" />

  <source src="my_audio.mp3" type="audio/mpeg" />

 </audio>

The <audio> element has only a few attributes, but these attributes may look a little different than what we’ve seen previously. In HTML5, not all attributes have to have a value assigned. Certain attributes of the audio and video tags, as well as certain form attributes, can be used in a standalone manner and act as a true/false type of toggle. If it’s present, the value is true; if it’s absent, it’s false. Here are a handful of standalone attributes for the <audio> element:

  • The controls attribute will show a player with play, pause, and volume controls (this is a true/false attribute).
  • The preload attribute will ensure that audio starts downloading “in the background” before somebody goes to play it.
  • There are also loop (one guess what that does!) and autoplay attributes. However, you should be careful with that last one because it can be really disorienting when you load up a web page and all of a sudden audio starts playing. Under most circumstances, it’s better to let your users control playback.

The <video> element has file-format patent problems similar to those of <audio>. The big options here are MP4 video and Theora; but just as with <audio>, there’s no harm in providing and specifying both. Here’s what that looks like:

 <video controls width="640" height="480" poster="img/slam_it.jpg">

  <source src="my_video.ogv" type="video/ogg" />

  <source src="my_video.mp4" type="video/mp4" />

 </video>

The controls attribute should be familiar to you from audio. We’re also specifying the height and width of the movie here, as well as providing a poster. Basically, that’s a static image that is displayed once the page has loaded but before the user has clicked the video’s play button.

Tables

When you have to mark up data that is best presented in rows and columns (tabular data), that calls for the <table> element. Some common examples of tabular data are calendars (where rows are weeks and columns are the different days of the week) or comparison charts. When you think of tabular data like this, you often think of spreadsheets and all the riveting data that goes along with them.

Fortunately, marking up tabular data is relatively easy, and HTML gives you plenty of semantic elements to organize your data appropriately. Let’s walk through the process of marking up some example data with semantic HTML elements, starting with an empty <table> element:

 <table>

 </table>

Next, let’s add the first row. Table rows are defined with the <tr> element:

 <table>

  <tr>

  </tr>

 </table>

This row has three distinct cells. HTML table cells are marked up using the <td> element, which stands for table data:

 <table>

  <tr>

  <td>First Name</td>

  <td>Last Name</td>

  <td>Birthday</td>

  </tr>

 </table>

You now have a table with one row and three cells in that row. But you can do even better, since this row is obviously a header that introduces what data you’ll find in the rows that follow. Using HTML5, you can use the <thead> element to mark up your header data and indicate that this row has this special purpose:

 <table>

  <thead>

   <tr>

   <td>First Name</td>

   <td>Last Name</td>

   <td>Birthday</td>

   </tr>

  </thead>

 </table>

This introductory row is special, but so too is each of the three cells, which are column headings, not ordinary table data. To indicate this in your markup, you’ll replace the <td> elements with <th> elements, which indicate table headings. The <th> element is used for any table cell that is a heading for either a column or a row (a column, in your case) to differentiate it from regular table data:

 <table>

  <thead>

   <tr>

   <th>First Name</th>

   <th>Last Name</th>

   <th>Birthday</th>

   </tr>

  </thead>

 </table>

Now that your header is in place, you can start adding the bulk of the table’s data. Just as your header data was enclosed in a <thead> element, your body data should be wrapped in a <tbody> element:

 <table>

  <thead>

   <tr>

   <th>First Name</th>

   <th>Last Name</th>

   <th>Birthday</th>

   </tr>

  </thead>

  <tbody>

  </tbody>

 </table>

To insert your individual data cells, you use the <td> element, with each row put inside a separate <tr> element, as before:

 <table>

  <thead>

   <tr>

   <th>First Name</th>

   <th>Last Name</th>

   <th>Birthday</th>

   </tr>

  </thead>

  <tbody>

   <tr>

   <td>George</td>

   <td>Washington</td>

   <td>February 22, 1732</td>

   </tr>

   <tr>

   <td>Abraham</td>

   <td>Lincoln</td>

   <td>February 12, 1809</td>

   </tr>

  </tbody>

 </table>

An important thing to note about HTML tables is that, when using this minimal markup, columns are created by implication based on the number of individual table data cells that exist in each row. With three table data cells per table row element, you create a three-column table. If you add a table data cell to the rows, you make a table with four columns. And so on. You can modify this behavior slightly using certain attributes that will be described in just a moment; but the important thing to remember is that (in most cases), only table rows are structurally grouped together in the markup, whereas the columns are usually implied.

Astute readers will point out that there is, in fact, a way to explicitly label columns using HTML markup. You can do this by defining a set of <col> elements that are optionally contained within one or more <colgroup> elements before the start of the table data. Although semantically rich and mostly harmless to old browsers, these elements aren’t widely implemented in the real world, so they are considered to be outside the scope of this book. Implementing them in this example is left as an exercise to the reader.

Before you call your table complete, however, you’ll add a few more things that will make it truly exemplary of semantic HTML markup. By design, a data table is a very visual element that relates data in columns and rows, so a viewer can grasp a complicated concept or more easily understand a large data set. For much the same reasons that you added the alt attribute to the <img> element, you’ll provide some additional data about this table in text form.

First, you’ll add a <caption> element, which is (obviously) meant to be a caption for the table. You can use this to provide a title for the table or perhaps an overview of its purpose. In this example, you could use “American Presidents” as the caption. Next, you’ll add a summary attribute on the <table> element. A table’s summary attribute is meant to hold more detail about what information is actually displayed in the table, and, just like the alt attribute on <img> elements, the summary is read aloud to visually impaired users who are browsing your web page with the aid of a screen reader. (Be aware that the HTML5 specification lists the summary attribute as obsolete, so a validator may warn about it.) Let’s see this in code:

 <table summary="This table compares the first name,

  last name, and birth date of George Washington

 and Abraham Lincoln.">

  <caption>American Presidents</caption>

  <thead>

   <tr>

   <th>First Name</th>

   <th>Last Name</th>

   <th>Birthday</th>

   </tr>

  </thead>

  <tbody>

   <tr>

   <td>George</td>

   <td>Washington</td>

   <td>February 22, 1732</td>

   </tr>

   <tr>

   <td>Abraham</td>

   <td>Lincoln</td>

   <td>February 12, 1809</td>

   </tr>

  </tbody>

 </table>

Tables can have as many rows and columns as you’d like. As mentioned earlier, the number of <th> or <td> elements inside the <tr> element(s) dictates the number of columns in the table, and the number of <tr> elements dictates the number of rows.

Both <th> and <td> elements accept attributes called rowspan and colspan. The rowspan attribute describes how many rows a particular cell spans, and colspan defines how many columns it spans. Let’s take a look at an example using your American Presidents table.

Instead of having the individual header cells for the first name and the last name, let’s merge those into one title cell called Name. Similarly, let’s take both birthday cells and merge them into one with new content that merely states, “A long time ago”:

 <table summary="This table compares the first name,

  last name, and birth date of George Washington

 and Abraham Lincoln.">

  <caption>American Presidents</caption>

  <thead>

   <tr>

   <th colspan="2">Name</th>

   <th>Birthday</th>

   </tr>

  </thead>

  <tbody>

   <tr>

   <td>George</td>

   <td>Washington</td>

   <td rowspan="2">A long time ago</td>

   </tr>

   <tr>

   <td>Abraham</td>

   <td>Lincoln</td>

   </tr>

  </tbody>

 </table>

You’ll notice in the markup that you need only two <th> elements now because the first one takes the place of two of them. Similarly, you don’t need a third <td> for the birthday column in the last row because that space is filled by the <td> in the previous row, which has a rowspan value of 2.
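To make the column arithmetic concrete, here is a minimal JavaScript sketch. The countColumns function and the plain objects standing in for table cells are hypothetical names invented for illustration; a browser does this bookkeeping for you when it lays out the table.

```javascript
// Hypothetical helper: each cell is a plain object standing in for a
// <th> or <td>; its colspan property mirrors the colspan attribute.
function countColumns(cells) {
  var total = 0;
  for (var i = 0; i < cells.length; i++) {
    // A cell with no colspan attribute occupies exactly one column.
    total += cells[i].colspan || 1;
  }
  return total;
}

// The merged header row: <th colspan="2">Name</th> and <th>Birthday</th>.
var headerRow = [{ text: 'Name', colspan: 2 }, { text: 'Birthday' }];
console.log(countColumns(headerRow)); // 3
```

Two header cells still add up to three columns, which matches the first body row. The sketch deliberately ignores rowspan: the last body row contains only two cells, and the third column there is supplied by the cell above it.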

As you can see, tables provide a lot of flexibility and can be a great way to display certain kinds of data. Before the advent of Cascading Style Sheets in modern web development, however, tables were more often used to control the visual layout of a web page, which polluted the document’s semantics and made web designers bend over backward to try to find clever ways of making their designs fit into the grid-like structure dictated by tables.

Even today, some websites that simply have to look right in old browsers use tables for this purpose. If you can avoid doing this, great! But if not, then remember this if you remember nothing else: include an empty summary attribute on the <table> element. Just as you would provide an empty alt attribute value on an image that serves no semantic purpose, so too must you declare a meaningless table if you use one!

Forms

Online forms are a great way to glean information from your visitors or provide enhanced services to them. Although the scripts that actually receive and process such form data are beyond the scope of this book, in this section we’ll discuss how some of the various form elements work in HTML markup.

A quick note of caution before we proceed: A lot of the new elements in HTML5 aren’t well supported by most browsers. That said, a browser will just fall back to a regular text box in cases where it doesn’t yet support a particular element.

The first thing you’ll need in any form you create is the <form> element:

 <form>

 </form>

<form> elements accept many attributes, but the one you’ll almost always need is the action attribute. (HTML 4 required it; in HTML5 it’s optional, and a form without one is submitted to the address of the current page.) The action attribute is similar to the href attribute in the <a> element. It specifies the URL where the script that processes the form data can be found. Let’s set the example form to be processed at /path/to/script:

 <form action="/path/to/script">

 </form>

When this form is submitted, it will send the browser to the /path/to/script page along with all the data from your form. Right now, however, there’s no data to be sent because you haven’t defined any fields to fill in. So, let’s add some fields to this form.

A <form> element is a lot like a <blockquote> in the sense that it can have only certain child elements. For simple forms, you can use the <p> or <div> elements that you’ve already seen used before, and you can place all the form’s controls into these elements. However, since you’re about to build a relatively complex form, you’ll use the HTML element specifically designed to organize form controls into groups: the <fieldset> element.

The fieldset element

The <fieldset> element, which (as you might expect) defines a set of related form fields, is a lot like the <div> element. Both elements are used to group related elements together in a single chunk of code. Unlike <div> elements, however, <fieldset> elements can contain a special element called <legend> that gives the <fieldset> element a name that is visible to your web page’s visitors. You might do well to think of this element in the same way you might think of the <caption> element for tables.

With the <fieldset> element and its legend in your form, your markup now looks like this:

 <form action="/path/to/script">

  <fieldset>

  <legend>Personal Information</legend>

  </fieldset>

 </form>

Adding an input element

The most common kind of form field is a simple text field where users can type something. These (and most other kinds of form fields) are added to forms by defining <input> elements. Form <input> elements can be of a number of different types, such as text fields, check boxes, or radio buttons. The kind of form control any particular <input> element becomes is specified by the value of its type attribute.

If a type attribute isn’t specified, the default of type="text" is inferred. For example, this is how you might ask for the visitor’s first and last name using a text input field inside your form:

 <form action="/path/to/script">

 <fieldset>

  <legend>Personal Information</legend>

  <input type="text" name="fullname" />

 </fieldset>

 </form>

There are two things you should notice about the <input> element used here. First, it’s an empty element, like an <img>. Second, you’ve also included an attribute called name, and you’ve given it a value of fullname. All form elements that need to provide some data to processing scripts need to have a name attribute with a value. This is the name that the processing script will use to refer to the data in this form control after the form has been submitted. If you don’t provide a name, then the processing script won’t be able to see any of the data in that control.

You can see that the name attribute isn’t displayed anywhere (nor should it be), so there’s currently no easy way for visitors to know what they’re expected to type into that text field. To give the text field a name that human visitors can use to refer to it, you have to use the <label> element.

Adding a label

You can place <label> elements anywhere in the form—they don’t have to appear right next to the input field that they label. For this reason, each <label> element needs to be uniquely associated with an input field in some way other than proximity. You’ve already learned how to uniquely identify elements on a web page using the id attribute, so let’s first uniquely identify this form field with an id of fullname_field:

 <input type="text" name="fullname" id="fullname_field" />

Now that you have a unique identifier to refer to this input field with, you can add the <label> element to your form. You’ll add it directly in front of the <input> element for simplicity’s sake; but again, you technically could have placed it anywhere inside the <form> element. When using a <label> element, the text inside the element should describe the field that the element labels (as the anchor text does for links), and the value of the for attribute should match the id of the field being labeled. In code, that might look something like this:

 <label for="fullname_field">First and Last Names:</label>

 <input type="text" name="fullname" id="fullname_field" />

New attributes

In the last section, we hinted at some new attributes available to form fields in HTML5. Let’s take a look at those now and see what they do:

 <input type="text" id="fullname_field" name="fullname" autocomplete="on" required autofocus />

The first part, you’ve seen before. We’re creating a text input with the name of fullname. We’ve also created an id for that element with the value of fullname_field. Here are the attributes you haven’t seen before, and what they do:

  • autocomplete tells the browser whether to use the built-in auto-completion functionality on that particular element. By default, it’s on (i.e., we didn’t have to specify it in the preceding example; we did so solely for illustrative purposes).
  • required indicates whether a form field is optional (if so, it’s omitted) or required. If it’s required, the browser won’t allow the form to be submitted until that field is filled in. A word to the wise, though: This new attribute isn’t very widely supported yet, so if you have a form where you absolutely need to have a certain field filled out, this isn’t the way to go (yet). Browsers that don’t support this attribute will just ignore it and treat the field as optional. You can turn to JavaScript and server-side processing to double-check the required fields (we’ll show you the JavaScript version in Chapters 9 and 10).
  • autofocus is a handy new attribute that replaces something that’s been done in JavaScript for some time. Have you ever loaded a web page and had your cursor immediately appear in the search box (think Google)? That behavior is called autofocusing, and the autofocus attribute gives it to you without any scripting.
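As a rough preview of the kind of double-check mentioned for the required attribute, here is a minimal JavaScript sketch. The function name findMissingFields and the idea of passing the typed values in as a plain object are assumptions made for illustration; they are not the approach the later chapters necessarily take.

```javascript
// Hypothetical helper: returns the names of required fields left empty.
// `values` maps each field's name attribute to the text the visitor typed.
function findMissingFields(values, requiredNames) {
  var missing = [];
  for (var i = 0; i < requiredNames.length; i++) {
    var name = requiredNames[i];
    var value = values[name];
    // Treat absent fields and whitespace-only input as "not filled in".
    if (value === undefined || value === null || String(value).trim() === '') {
      missing.push(name);
    }
  }
  return missing;
}

var typed = { fullname: 'George Washington', email: '   ' };
console.log(findMissingFields(typed, ['fullname', 'email'])); // [ 'email' ]
```

A check like this belongs on the server as well, since visitors can switch JavaScript off.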

Along with these new, timesaving attributes, HTML5 introduces quite a few new field types. Some look no different from current field types; they just behave differently under certain circumstances. Others are a huge sigh of relief for developers who have had to develop complex JavaScript analogs over the years. Let’s check them out.

New datatype fields

HTML5 ushers in a raft of contact data-type fields. Currently, in a desktop web browser, you won’t notice any difference in these fields from a standard text input (text box). The real magic happens when you access these fields with a mobile web browser (like the one on an iPhone). If you use <input type="tel">, a user will receive a numeric keyboard in Mobile Safari—because really, no one needs to enter a letter for her phone number:

 <input type="email" name="email" id="email" />

 <input type="tel" name="mobile" id="mobile" />

 <input type="url" name="web" id="web" />

<input type="email"> and <input type="url"> will also yield special keyboard configurations, including an @ sign for e-mail addresses and a menu of domain suffixes (e.g., .com, .net, or .org) for URLs. This is nothing earth shattering, but it is certainly helpful when people are trying to type things out with tiny on-screen keyboards. There’s also a new search type:

 <input type="search" name="search" id="search" />

The search input type is another subtle variation from the standard text input. The only difference is that some browsers will style it differently than a regular text input. Arguably, using a search type is also better semantically if you’re planning on including a search field in your pages.

Adding a check box

Another possible type value for the <input> element is checkbox. A check box is a simple “yes or no” input, and it is nearly the same as the text field’s <input> element in code. Let’s use a check box to ask somebody whether he is currently enrolled in a college or university:

 <input type="checkbox" name="enrolled" id="enrolled_field" value="1" />

 <label for="enrolled_field">I am currently enrolled in a college or university</label>

One of the attributes you’ve added to the check box is the value attribute (careful, this is an attribute named value). The value of the value attribute will be sent to the processing script only if the box is checked. Here you’ve used 1, a simple approach to indicate that “yes, this box was checked on the form.”
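The rules about name and value can be sketched in a few lines of JavaScript. This is only an illustration of what a browser does internally when a form is submitted: buildQueryString and the plain objects standing in for form controls are hypothetical, and a real browser writes spaces as + where this sketch writes %20 (both decode to a space).

```javascript
// Sketch of how form controls become name=value pairs on submission.
// Each entry in `controls` is a plain-object stand-in for a form control.
function buildQueryString(controls) {
  var pairs = [];
  controls.forEach(function (control) {
    // Controls without a name attribute are never submitted.
    if (!control.name) { return; }
    // Check boxes (and radio buttons) are submitted only when checked.
    var isToggle = control.type === 'checkbox' || control.type === 'radio';
    if (isToggle && !control.checked) { return; }
    pairs.push(encodeURIComponent(control.name) + '=' + encodeURIComponent(control.value));
  });
  return pairs.join('&');
}

var controls = [
  { type: 'text', name: 'fullname', value: 'George Washington' },
  { type: 'checkbox', name: 'enrolled', value: '1', checked: false }
];
console.log(buildQueryString(controls)); // fullname=George%20Washington
```

Because the box is unchecked, the processing script never hears about enrolled at all. Checking it would add enrolled=1 to the data that is sent.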

Let’s stop for a minute and think about how best to display a form. If you think about it, what you’re beginning to create is a list of inputs for the user to fill out. In this case, these inputs are in a particular order. It sounds like an ordered list fits this scenario perfectly. Let’s nest a few of the inputs and labels from previous examples in list items, and then nest the list items in an ordered list element:

 <form action="/path/to/script">

 <fieldset>

  <legend>Personal Information</legend>

  <ol>

  <li>

  <label for="fullname_field">First and Last Names:</label>

  <input type="text" name="fullname" id="fullname_field" />

  </li>

  <li>

  <input type="checkbox" name="enrolled" id="enrolled_field" value="1" />

  <label for="enrolled_field">I am currently enrolled in a college or university</label>

  </li>

  </ol>

 </fieldset>

 </form>

There’s one other attribute that you can use for a check box, for cases where you want it to be checked when the visitor first loads the form. In this case, perhaps a majority of your visitors will be enrolled in some college or university. All you need to do to make this happen is specify a checked attribute. The visitor can still uncheck the box; but in some situations, it might prove easier to have the field checked in the first place. Let’s see the markup for the check box when it’s checked by default:

 <input type="checkbox" name="enrolled" id="enrolled_field" value="1" checked />

 <label for="enrolled_field">I am currently enrolled in a college or university</label>

Adding radio buttons

Another input type that HTML forms permit you to use is a radio button. A radio button is similar to a check box, except it includes multiple options from which a user can choose, and only one can ultimately be selected. Radio buttons are thus almost always found in sets that ask the user to select one of a small number of items.

Each radio button is represented as an <input> element with its type set to radio. You associate multiple radio buttons as belonging to the same set by giving each one the same value in its name attribute. Each radio button’s value attribute, however, should be unique among the radio buttons in the set. Like a check box, each radio button needs a label, so each radio button’s id attribute should also be unique (in the entire document, not just for the set of radio buttons or even just for the <form> element they’re in).

For example, let’s ask a visitor her favorite color. You’ll give her four colors to select from: blue, red, yellow, or green. While marking up this question for your form, you’ll immediately notice that you have yet another list. This time, though, the items are not in any particular order, so you’ll use an unordered list.

You will also need to introduce the question because now your <label> elements are labeling the actual choices, instead of the question at hand. You’ll do that simply by including some text before you start the unordered list:

 <ol>

 <li>What is your favorite color?

 <ul>

  <li>

  <input type="radio" name="favorite_color" id="favorite_color_blue" value="blue" />

  <label for="favorite_color_blue">Blue</label>

  </li>

  <li>

  <input type="radio" name="favorite_color" id="favorite_color_red" value="red" />

  <label for="favorite_color_red">Red</label>

  </li>

  <li>

  <input type="radio" name="favorite_color" id="favorite_color_yellow" value="yellow" />

  <label for="favorite_color_yellow">Yellow</label>

  </li>

  <li>

  <input type="radio" name="favorite_color" id="favorite_color_green" value="green" />

  <label for="favorite_color_green">Green</label>

  </li>

 </ul>

 </li>

 </ol>

You may be thinking that there’s a problem using four radio buttons here: what if the user’s favorite color is something other than one of the listed options? HTML5 comes to the rescue again with the new <datalist> element:

 <input type="text" name="favorite_color" id="favorite_color" list="colors">

 <datalist id="colors">

  <option value="Red">

  <option value="Blue">

  <option value="Green">

  <option value="Orange">

  <option value="Yellow">

  <option value="Periwinkle">

 </datalist>

The datalist is an enhancement to a regular text input field. It provides a list of options that the user can choose from; or, the user can type in her own option if she doesn’t like any options in the list. This is ideal in a situation where you’re filling out a survey and being asked to select one of the options in the list or to select “other.” The datalist will save you the trouble of having two separate elements for this type of question.

Dropdown list

Let’s ask for your visitor’s time zone and provide a free-form area in which he can type whatever text he likes. Since these two items are somewhat unrelated to the previous three questions, let’s group them separately from the previous form items by using a new <fieldset> element.

To let the visitor select his time zone, you could display a set of radio buttons in a list, just as you did for his favorite color. However, radio buttons take up a lot of space. Furthermore, there are dozens of time zones in the world. If you used radio buttons again, your form would be gigantic. You could use the new datalist element, but there really are a finite number of options. We don’t want somebody typing in “-56 Wonkaland.”

Instead, let’s use a different form control that HTML makes available: a drop-down list. This control is defined by the <select> and <option> elements in tandem, much like a regular ordered or unordered list:

 <label for="timezone_field">Time zone</label>

 <select name="timezone" id="timezone_field">

  <option>-5</option>

  <option>-6</option>

  <option>-7</option>

  <option>-8</option>

 </select>

This markup gives the user four options to select her time zone from: –5, –6, –7, or –8. These are the offsets, in hours from UTC, of the time zones running from the eastern United States through to the Pacific coast. It’s how computers most often tell time (by keeping track of everything in UTC, and then calculating offsets per time zone); however, this is not how people think of their time zones. So, using the value attribute for each <option> element, let’s give the computers what they expect, but also show human visitors values that are more comfortable for them to read:

 <label for="timezone_field">Time zone</label>

 <select name="timezone" id="timezone_field">

  <option value="-5">Eastern</option>

  <option value="-6">Central</option>

  <option value="-7">Mountain</option>

  <option value="-8">Pacific</option>

 </select>

Now, when the user selects Eastern as her time zone, the processing script that sees this form will see the value –5.
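To see why a numeric offset is the convenient thing to hand a script, consider this minimal JavaScript sketch. The localHour function is a hypothetical example, not part of any processing script described in this book, and it ignores daylight saving time entirely.

```javascript
// Hypothetical helper: converts an hour of the day in UTC (0 to 23) to the
// visitor's local hour, using the value submitted by the <select> element.
function localHour(utcHour, offsetValue) {
  // Form values arrive as strings, so convert "-5" to the number -5 first.
  var offset = parseInt(offsetValue, 10);
  // Wrap around midnight so the result stays between 0 and 23.
  return ((utcHour + offset) % 24 + 24) % 24;
}

console.log(localHour(15, '-5')); // 10 (3 p.m. UTC is 10 a.m. Eastern)
console.log(localHour(3, '-8'));  // 19 (3 a.m. UTC is 7 p.m. Pacific, the day before)
```

Had the form submitted the word Eastern instead, the script would first need a lookup table to turn the name back into a number.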

You can place as many <option> elements inside <select> elements as you like. However, when there are a lot of options, or the options are representative of different categories, it makes sense to group these options. For that, you use the <optgroup> element, just as if you were using <div> or <fieldset>. Let’s add more time zones to the list, but let’s organize them inside <optgroup> elements, so that the categorization is obvious:

 <label for="timezone_field">Time zone</label>

 <select name="timezone" id="timezone_field">

  <optgroup label="Mainland America">

  <option value="-5">Eastern</option>

  <option value="-6">Central</option>

  <option value="-7">Mountain</option>

  <option value="-8">Pacific</option>

  </optgroup>

  <optgroup label="Outlying States">

  <option value="-9">Alaskan</option>

  <option value="-10">Hawaiian</option>

  </optgroup>

 </select>

Notice that inside the <optgroup> element, you’re using a label attribute to give a name to your group of <option> elements. Be careful not to confuse this with the <label> element used to label the entire <select> element.

Adding a textarea

As a final touch, let’s add that free-form area of text where you encourage your visitors to send you their comments. This is accomplished with a <textarea> element, which is one of the simplest form controls to use. It differs from the other form controls you’ve seen in just two ways. First, the <textarea> element does not need a value attribute because its content becomes its value. Second, it uses two new attributes, rows and cols, to indicate its default height and width, respectively. (This sizing can later be overridden with CSS.) A <textarea> element’s markup might look like this:

 <label for="comments_field">Comments</label>

 <textarea name="comments" id="comments_field" rows="6" cols="65"></textarea>

Adding a submit button

At long last you have your form controls all set up to collect input from the user. There’s only one thing you’re missing: a way for the visitor to actually submit the form to the processing script. For that, you need to use one more type of <input> element: the submit type. This type of <input> element will create a button that the user can click to submit the form.

Typically, when using elements that create form controls that operate on the form (such as a submit button), you can skip specifying a label for that element because the form control itself effectively serves that purpose. Since you’re not using a <label> element for your submit button, you don’t have to use an id attribute either, so your element is really quite simple.

 <input type="submit" value="Submit this Form" />

Now your visitors will be able to click the Submit this Form button to send the form to the processing script.

All the rest

Of course, there’s even more you can do with HTML5 forms. For instance, there are a few more options for the type attribute that you can specify on <input> elements. An important one to know about is the password type, which creates an element that is identical to the text type, except that it renders all the characters that the user types into it as asterisks (*) or bullets (•) instead of actual text. The text that’s typed is, in fact, unchanged, but it doesn’t show up on screen for security purposes. There’s also the file type, which allows the visitor to select a file from his computer to upload through the form using his operating system’s native file-selection dialog box.

There are also a few more kinds of buttons that are similar to the submit type of input. For example, there’s a button type that displays a button that you can control through JavaScript, although it won’t do anything at all by default. There’s also a <button> element that can be used to turn certain other elements (such as an <img>) into a button. (These kinds of buttons will, by default, submit the form.) There’s also the image type of <input> element, which provides another way to turn an image into a button. This element takes a src attribute, just like the <img> tag.

Finally, there’s also a reset type of the <input> element that also creates a button; however, instead of submitting the form, this kind of button will restore the form to its original state, removing any input the user may have entered. There’s rarely any real need for this kind of button (especially on longer forms), but it’s there if you ever need it.

As you can see, designing forms can be quite an undertaking on the Web. They introduce a kind of interactivity that you can’t get from other kinds of media like television or magazines. Forms—and the capabilities they provide to enable two-way communication between the website creator and the website visitor—are what make the Web a truly unique publishing platform.

Less well supported

In addition to the subtle additions discussed earlier, there have been some other great additions made to the form element library. Unfortunately, a lot of these elements are not yet supported at all by most browsers, so I would encourage you to use a great deal of caution if you choose to use them. Testing your form with a range of browsers and providing fallbacks or alternatives is a good idea here.

<input type="date"> scratches an itch that a lot of developers have had for a long time. It provides a browser-native date picker that lets you control the input and formatting of the date in a form. The problem this solves is that there are a lot of ways to write a date. For example, you can write any of the following: 12-Oct-2012; October 12th, 2012; 2012-10-12; or even 12/10/2012. All are acceptable formats for you, but a computer (database) generally expects dates to be in a particular format (the MySQL date type is YYYY-MM-DD, for example). If somebody provides you with something other than what’s expected, you either need to validate that and get her to fix it, or fix it yourself using JavaScript and/or some type of server-side script. <input type="date"> will give you better control over what can be input, and it will put it in a very user-friendly format!
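A date field could be sketched as follows; the arrival name and the date limits are invented for illustration. In a browser that supports this type, the value is submitted as YYYY-MM-DD no matter how the picker displays it. A browser that doesn’t support it simply shows a plain text box.

```html
<label for="arrival">Arrival date:</label>
<!-- min and max limit the selectable dates; both use the YYYY-MM-DD format -->
<input type="date" name="arrival" id="arrival" min="2012-06-01" max="2012-08-31">
```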

Similarly, <input type="color"> will help you control the input on your forms, so that they can prompt the user to enter a proper color value. Most on-screen color values are represented in hexadecimal format (#FFFFFF is white, for example). Eventually, when browsers support it, this input will yield a beautiful color picker (like the one you see in Photoshop) that will translate colors into six-character strings for you. Until that day, dream on!
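For reference, the markup itself is short. In this sketch the shirt_color name and the starting color are assumptions. Where the type is supported, the value is a hash mark followed by six hexadecimal digits; elsewhere the field falls back to an ordinary text box.

```html
<label for="shirt_color">T-shirt color:</label>
<!-- The value is a seven-character string: # plus six hexadecimal digits -->
<input type="color" name="shirt_color" id="shirt_color" value="#ff6600">
```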

The last two new input types focus on numerical values. <input type="range" min="10" max="1000"> will display a slider control that allows the user to slide a knob back and forth and select a value within a range of values. Meanwhile, <input type="number" min="0" max="1000"> will give you a text box with an up and down arrow that allows the user to pick a precise value within that range (the range input just gives you a graphical representation, allowing the user to “ballpark” it).
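Put into a form, the two might look like the sketch below. The field names and starting values are made up, and the step attribute (not discussed above) is added here only to show that the slider can be made to move in fixed increments.

```html
<label for="budget">Rough budget:</label>
<!-- A slider from 10 to 1000 that moves in steps of 10 -->
<input type="range" name="budget" id="budget" min="10" max="1000" step="10" value="500">

<label for="tickets">Number of tickets:</label>
<!-- A text box with up and down arrows, limited to 0 through 1000 -->
<input type="number" name="tickets" id="tickets" min="0" max="1000" value="1">
```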

Special characters

When marking up content, you’re bound to notice that there are certain characters that just don’t display properly when you put them directly in your markup. Perhaps the most obvious of these would be the less-than and greater-than symbols, which are also called left and right angle brackets (< and >). Instead of treating these like angle brackets, the browser assumes these are the beginnings and endings of tags.

Because of the double duty these characters serve, if you actually want to use < or > in your content, you need to use special codes called entity references to refer to them. In computer speak, this is called escaping the characters. Escaping characters in HTML5 is achieved simply by replacing the character you intend to type with a combination of other characters that represent it. In every case, escape sequences such as these begin with an ampersand (&) and end with a semicolon (;).

The entity references for the less-than and greater-than symbols are &lt; and &gt;, respectively. This, however, poses a new problem. How do you mark up an ampersand symbol? The answer is the same: you escape it by using its entity reference. The entity reference for an ampersand is &amp;.
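A few short paragraphs show all three references in use (the sentences themselves are invented for illustration):

```html
<p>In markup, 3 &lt; 5 and 5 &gt; 3 are written with entity references.</p>
<p>Barker &amp; Lane</p>
<!-- To display the text "&lt;" itself, escape its ampersand -->
<p>Type <code>&amp;lt;</code> to get a less-than symbol.</p>
```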

There are hundreds of different entity references, and an exhaustive glossary would be superfluous in this book. However, keep these in mind when marking up your content because they will probably prove very useful. For example, assume you want to make the following sentence appear on your page:

 I’m sure that using the <div> tag is more professional than “faking a layout” using tables.

First, you can see that we’ll need to replace the angle brackets with their entity references. Don’t forget to mark up the HTML tag as computer code, either:

 I’m sure that using the <code>&lt;div&gt;</code> tag is more professional than “faking a layout” using tables.

There is also the issue of the curly quotes around the words “faking a layout.” Let’s replace those:

 I’m sure that using the <code>&lt;div&gt;</code> tag is more professional than &ldquo;faking a layout&rdquo; using tables.

And finally, you’ll need to escape the curly apostrophe in the word “I’m”:

 I&rsquo;m sure that using the <code>&lt;div&gt;</code> tag is more professional than &ldquo;faking a layout&rdquo; using tables.

When you view this in a browser, it will look like your original sentence.

Using entity references is quite simple because you’re simply replacing one character with a short sequence of others. They’re helpful in keeping your markup portable across browsers, and they keep your content looking and acting properly. You can get a complete list of entity references at www.webstandards.org/learn/reference/charts/entities/.
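The same pattern covers many other characters. Here is a short sketch with a few common named references; the text and the price are invented for illustration.

```html
<!-- &copy; is the copyright sign, &nbsp; a non-breaking space -->
<p>&copy;&nbsp;2012 Summer Smash</p>
<!-- &euro; is the euro sign, &hellip; a horizontal ellipsis -->
<p>Tickets from &euro;20&hellip;</p>
```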

Let’s go to an example!

We’re going to run with an extended example throughout the next few chapters. This chapter on HTML has shown you how to structure your pages and how to outline your content. Unfortunately, by themselves, these pages would look pretty plain. In the next couple of chapters, we’re going to show you how to add some style to your pages using CSS, and then show you how to add some function and interactivity using JavaScript.

We thought that a good example to work with would be a registration form for an event. That will let us use a number of the new HTML5 elements and really kick the tires on things. So without further ado, let’s start with our basic page shell:

 <!DOCTYPE html>

 <html lang="en">

 <head>

  <meta charset="utf-8">

  <title>Summer Smash 2012 Registration</title>

  <link rel="stylesheet" href="css/style.css">

  <script src="js/javascript.js"></script>

 </head>

 <body>

 </body>

 </html>

The preceding code provides a good starting point. Next, let’s outline some of the basic elements of our page between the <body> tags:

 <body>

  <div id="pagewrap">

   <header>

   </header>

   <section id="main">

    <section id="registration_form">

    </section>

    <aside>

    </aside>

   </section>

   <footer>

   </footer>

  </div>

 </body>

So far, we’ve laid out the “block” structure for this page. It’s a registration form, so we want to do away with a lot of the interface clutter we’d find on a regular web page, so that we don’t distract our user from his goal here. (Have you ever noticed how the checkout process on Amazon lacks page navigation? It’s intentional!) That said, let’s provide a way to get back to the rest of our website, in case the user isn’t quite ready to register yet:

 <header>

  <a href="/">

   <h1>Summer Smash 2012</h1>

   <p>This summer's most smashing event!</p>

  </a>

 </header>

Let’s also put in a little contact information, just in case our user has some pre-registration questions. The <aside> is the perfect spot for this:

 <aside>

  <h2>Questions about whether Summer Smash is right for you?</h2>

  <p>We're here to help! Our team of smashtastic smashers will

 shatter any doubt in your mind. Why not give us a call, or send

 us an email?</p>

  <ul>

   <li>Call: 1-800-555-5555</li>

   <li>Email: <a href="mailto: [email protected]">[email protected]</a></li>

  </ul>

 </aside>

Finally, it wouldn’t hurt to say who’s behind this whole Summer Smash thing—you know, give it some credibility by associating our names with it. The footer seems like a good place to do this:

 <footer>

  <p>Summer Smash 2012. A Barker and Lane Production.</p>

 </footer>

We’ve got a lot of the peripheral elements assembled now, so let’s put them together and see what that looks like:

 <!DOCTYPE html>

 <html lang="en">

 <head>

  <meta charset="utf-8">

  <title>Summer Smash 2012 Registration</title>

  <link rel="stylesheet" href="css/style.css">

  <script src="js/javascript.js"></script>

 </head>

 <body>

  <div id="pagewrap">

   <header>

    <a href="/">

     <h1>Summer Smash 2012</h1>

     <p>This summer's most smashing event!</p>

    </a>

   </header>

   <section id="main">

    <section id="registration_form">

    </section>

    <aside>

     <h2>Questions about whether Summer Smash is right for you?</h2>

     <p>We're here to help! Our team of smashtastic

 smashers will shatter any doubt in your mind. Why not give us a

 call, or send us an email?</p>

     <ul>

      <li>Call: 1-800-555-5555</li>

      <li>Email: <a href="mailto: [email protected] ">[email protected]</a></li>

     </ul>

    </aside>

   </section>

   <footer>

    <p>Summer Smash 2012. A Barker and Lane Production.</p>

   </footer>

  </div>

 </body>

 </html>

Things are looking pretty good. Now we just need to drop our form into the registration form section. What do we need for a registration form? Let’s start with some basics:

 <section id="registration_form">

 <form action="/registration/" method="post">

  <fieldset>

  <legend>Registrant Information</legend>

  <ol>

   <li>

    <label for="name">Registrant name:</label>

    <input type="text" name="name" id="name" required autofocus />

   </li>

   <li>

    <label for="email">Email address:</label>

    <input type="email" name="email" id="email" required />

   </li>

   <li>

    <label for="phone">Phone number:</label>

    <input type="tel" name="phone" id="phone" />

   </li>

   <li>

    <label for="party">How many people do you

 have in your party (including yourself)?</label>

    <input type="number" name="party" id="party" min="1" max="10" />

   </li>

   <li>

    <label for="dob">Your date of birth:</label>

    <input type="date" name="dob" id="dob" />

   </li>

  </ol>

  </fieldset>

  <fieldset>

  <legend>A few quick questions</legend>

  <ol>

   <li>

   Is this your first Summer Smash event?

   <ul>

    <li>

    <input type="radio" name="first_time" id="yes_first" value="1" />

    <label for="yes_first">Yes, this is my first</label>

    </li>

    <li>

    <input type="radio" name="first_time" id="no_first" value="0" />

    <label for="no_first">No, I've been to one before</label>

    </li>

   </ul>

   </li>

   <li>

   <label for="how_hear">How did you hear about Summer Smash 2012?</label>

   <input type="text" name="how_hear" id="how_hear" list="media">

   <datalist id="media">

    <option value="Google search">

    <option value="Magazine ad">

    <option value="A friend told me">

    <option value="An enemy told me">

    <option value="My dog told me">

    <option value="My dead uncle Henry told me">

   </datalist>

   </li>

  </ol>

  </fieldset>

  <input type="submit" value="Register now!" />

 </form>

 </section>

We’ve definitely taken some chances here, using those brand-spanking-new HTML5 form elements; however, the Summer Smash crowd is generally pretty tech savvy, and it’s likely to be using the latest and greatest browsers available. Heck, even if it isn’t, we’ll give it some fallbacks—but not until we touch JavaScript, so that aspect of the site is just going to have to wait!

Our page is complete! It’s ugly, but it’s complete! We’ll address that ugliness factor in the next couple of chapters when we dive into using CSS (Cascading Style Sheets) to fancify our markup. Just note at this point that, while we’ve laid out the basics for our page in this chapter, we the authors do hereby reserve the right to make changes to this markup in later chapters. That’s life. Sometimes, the day before launch your client comes to you and demands that you ask for the user’s shoe size on the registration form. No big deal.
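Should that shoe-size request ever arrive, the change could be as small as one more list item in the first fieldset. This is only a sketch: the shoe_size name and the min, max, and step values are invented for illustration.

```html
<li>
 <label for="shoe_size">Your shoe size:</label>
 <!-- step="0.5" allows half sizes; the limits here are arbitrary -->
 <input type="number" name="shoe_size" id="shoe_size" min="1" max="20" step="0.5" />
</li>
```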
