If you ever have to develop HTML documents—when developing personal Web sites, completing a class project, or creating Web pages on the job—the tidy utility can be a handy resource for you. If you’re creating HTML pages by hand, you’ll likely make occasional errors. These errors probably won’t cause significant problems with using the pages, but they might make the pages harder to read, harder to maintain, and harder to subject to the scrutiny of your peers. Not to worry; tidy can help!
tidy is not usually included with Linux or Unix distributions, but you can download it (and install it, using the instructions in Chapter 14) from http://tidy.sourceforge.net.
1. vi sampledoc.html
   Use the editor of your choice to create an HTML document. Our sample document is called, well, sampledoc.html (Figure 17.1). Don’t worry about getting the tagging or syntax exactly right; tidy will take care of the details. Save and close your document.
   Figure 17.1. Even a flawed HTML document, like this one, can be fixed by tidy.
2. tidy sampledoc.html
   The tidy utility will apply HTML formatting rules and then output a massaged version of your document that is technically correct (Code Listing 17.1). Cool, huh?
   Code Listing 17.1. The tidy command is handy for cleaning up HTML documents.
3. tidy sampledoc.html > fixedupdoc.html
   If you like the results, redirect the output to a new filename, as shown here, or use tidy -m sampledoc.html to replace the original document.
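If you'd rather run the three steps above as one small script, here is a minimal sketch. The page contents are a hypothetical stand-in for Figure 17.1 (we simply planted a stray </ul>), and tidy-messages.txt is a filename we made up for tidy's warnings. The script relies on tidy's exit status (0 for a clean document, 1 for warnings, 2 for errors) and does nothing harmful if tidy isn't installed.

```shell
#!/bin/sh
# Step 1: create a deliberately flawed page.
# Hypothetical contents; note the extra </ul> near the end.
cat > sampledoc.html <<'EOF'
<html>
<head><title>Jdoe's Home Page</title></head>
<body>
<h1>Making Unix Work, One Day at a Time</h1>
<ul>
<li>To be written
<li>To be written later
</ul>
</ul>
</body>
</html>
EOF

if command -v tidy >/dev/null 2>&1; then
    # Steps 2 and 3: send the cleaned-up page to a new file and keep
    # tidy's messages (written to stderr) in a separate file.
    status=0
    tidy sampledoc.html > fixedupdoc.html 2> tidy-messages.txt || status=$?

    if [ "$status" -le 1 ]; then
        echo "tidy exit status $status: cleaned page is in fixedupdoc.html"
    else
        echo "tidy exit status $status: see tidy-messages.txt for errors"
    fi
else
    echo "tidy is not installed; sampledoc.html was left untouched"
fi
```

Because the output goes to fixedupdoc.html, the original stays as you wrote it, so you can compare the two before deciding which to keep.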
✓ Tips
For even spiffier results, we like using tidy -indent -quiet --doctype loose -modify sampledoc.html, which tidily indents the output, suppresses the informative messages from tidy, makes the output an HTML 4 document, and replaces the original with the modified file (Code Listing 17.2). All that, and only one command.
Consider using tidy with the sed script (described in the next section) to do a lot of cleanup at once.
[jdoe@frazz public_html]$ tidy -indent -quiet --doctype loose sampledoc.html
line 10 column 6 -- Warning: discarding unexpected </ul>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
  <head>
    <meta name="generator" content="HTML Tidy, see www.w3.org">
    <title>
      Jdoe's Home Page
    </title>
  </head>
  <body>
    <h1>
      Making Unix Work, One Day at a Time
    </h1>
    <p>
      Read these tips, when I get around to writing them, and weep.
    </p>
    <ul>
      <li>
        To be written
      </li>
      <li>
        To be written later
      </li>
      <li>
        To be written next week
      </li>
    </ul>
    <address>
      [email protected]
    </address>
  </body>
</html>
HTML&CSS specifications are available from http://www.w3.org/
To learn more about Tidy see http://www.w3.org/People/Raggett/tidy/
Please send bug reports to Dave Raggett care of <[email protected]>
Lobby your company to join W3C, see http://www.w3.org/Consortium
[jdoe@frazz public_html]$
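Since -modify overwrites your file, you might want a safety net around the one-command tip. The following is a cautious sketch, not the only way to do it: tipdemo.html and its contents are hypothetical, and the .bak copy is our own addition. It uses only the options discussed in the tips (-indent, -quiet, --doctype loose, and -modify).

```shell
#!/bin/sh
# Hypothetical demo page with an extra </ul> for tidy to clean up.
file=tipdemo.html
printf '%s\n' \
    '<html><head><title>Demo</title></head><body>' \
    '<h1>Tips</h1>' \
    '<ul><li>one<li>two</ul></ul>' \
    '</body></html>' > "$file"

# Keep a copy first, because -modify replaces the original in place.
cp "$file" "$file.bak"

if command -v tidy >/dev/null 2>&1; then
    status=0
    tidy -indent -quiet --doctype loose -modify "$file" || status=$?

    if [ "$status" -ge 2 ]; then
        # Exit status 2 means tidy found errors; put the backup back.
        cp "$file.bak" "$file"
        echo "tidy reported errors; $file restored from $file.bak"
    else
        echo "$file tidied in place; original saved as $file.bak"
    fi
else
    echo "tidy is not installed; $file was left untouched"
fi
```

Once you're happy with the tidied page, delete the .bak file.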