Chapter 12. Design patterns for advanced user interfaces
When introducing Greenstone in Section 1.7, we claimed that it is capable of implementing almost all of what is presented in Part I of this book. Take a moment to flip through the illustrations and review the panoply of digital libraries presented here. *Chapters 10 and 11 have taught you most of what you need to know to recreate in Greenstone the functionality described in earlier chapters:
• how to handle different media types and file formats
• how to extract and manually assign metadata
• how to build browsing structures and search indexes based on metadata and text
• how to support different languages
• how to present documents in different ways depending on their media form
• how to add authentication to individual documents or entire collections
• how to interoperate with other digital library systems.
What you have not learned, however, is how to control the presentation of Greenstone collections to give the appearance you want. Chapters 10 and 11 touched on this, but concentrated on functionality rather than appearance. In this chapter presentation is the primary focus.
*Note incidentally that Greenstone supports page flipping too (see Figure 3.5). To turn it on for HTML documents in your own collection, just add the option -use_realistic_book in HTMLPlugin.
Greenstone provides two mechanisms to control the presentation of Web pages: format statements and macros (see the examples of Chapter 10). We used a format statement to add volume and date information to the search results list of the paged image collection; we customized the appearance of the Greenstone home page by replacing the macro home.dm with yourhome.dm (see the end of Section 10.4). Now it is time to develop more advanced user interfaces. We will express examples in the form of design patterns, a software engineering term for general, reusable solutions to commonly occurring problems.
Format statements and macros work in concert with cascading style sheets (CSS, described in Section 4.4), but operate in a way that is more tightly coupled with the digital library software itself. As we saw in the branding example at the end of Section 10.4, CSS can be used to alter the overall style of your digital library: the size and typeface of the font used, the width of the margins, and so on. Format statements and macros control far more. For example, macros determine whether the query page has a single one-line text box, a large query box capable of holding whole paragraphs, or a form with boxes for different metadata elements. Format statements dictate the number of columns in a table and what to put in each one. Large, unwieldy format statements, which can arise when developing complex interfaces, can be stored in macro files. A further degree of control is provided by the programming capabilities of the JavaScript language. For example, a Greenstone collection can be made to respect local conventions for displaying dates, based on the locale set in the user's operating system. We encounter all these in the examples below.
Format statement and macro files are arcane—a fact that stems from their power rather than the whim of their designers!—but it is worth taking time to master them. They are the key to radical customization of a Greenstone digital library and can alter not only presentation but functionality as well. They are what allow collections like Papers Past (Figure 3.4) to be implemented with Greenstone. It is said that the true measure of a software tool's success is when it is used in ways that its designers never even imagined, and that is certainly the case here. An experienced hand can quickly transform a collection from Greenstone's default behavior to something breathtakingly different. You can see the possibilities by browsing the list of sites at www.greenstone.org/examples.
This chapter begins by reviewing and extending the material on format statements introduced in Chapter 10 by presenting some commonly used examples. Next we explain the macro language, which was mentioned in passing when describing Greenstone's institutional repository facility (see Section 11.6).
We follow with a series of five design patterns. The first shows how to use macros to add static content to Greenstone collections, a technique that can be repeated to create as many additional pages as needed—our illustration adds a page containing personal contact details. The second uses JavaScript to further control the format of a Web page—our illustration uses it to format dates more clearly and to convert file sizes from bytes to KB and MB as appropriate. Metadata cannot be represented directly in macros as it can be in format statements; the third design pattern shows how to overcome this by passing metadata as arguments to a macro.
In the late 1990s, dynamic HTML revolutionized the user experience of the Web by allowing pages that crackle with interactivity. The key is real-time manipulation of CSS in tandem with the document object model (DOM) that represents the structure of Web pages. The fourth design pattern illustrates this by showing how to inject header information into alphabetically ordered browsing lists and how to selectively expand and contract sections of a table of metadata. A further significant leap in Web interactivity came with AJAX—Asynchronous JavaScript and XML—and the fifth design pattern starts with a simple example and proceeds to describe a digital music page-turner developed within the Greenstone framework.

12.1. Format Statements and Macros

Greenstone ships with a selection of default macros and format statements. The practical work in Chapters 10 and 11 has familiarized you with these defaults and with the appearance they give digital library collections. However, one shoe does not fit all. This chapter shows, through examples, how to go beyond the default setup.
In broad terms, macros control the structure of pages: where page headers and footers are and what they show, where the results are displayed on the search results page, etc. Format statements fill in details within this structure: for instance, they control just what information the search result list displays. An important difference is that format statements are implicitly associated with documents and consequently can access their metadata (by enclosing the item name in square brackets, such as [dc.Title]; see Section 10.4). Macros are not associated with documents, and therefore cannot access metadata directly. In broad terms:
• Macros express static site-wide information, yielding headers, footers, help text, and so forth.
• Format statements generate dynamic metadata-driven content from a particular collection.
This is an over-simplification, however: for example, it is possible for macros to access metadata if they are invoked from within a format statement. We show how to do this in Section 12.2. But the simplification is useful to get us started. In the next two sections we examine these mechanisms in turn, starting with a series of format statements that demonstrate common ways of tweaking the default presentation, and proceeding to an overview of what macros do and how they are expressed.

Format statements

Format statements introduce document metadata into text marked up in HTML. Additional control is provided by {If} and {Or} statements. These concepts are imported from computer programming languages but are no more difficult than formulas in a spreadsheet, which is familiar territory for many non-programmers. The structure of an Excel if expression is the same as Greenstone's: Excel's If(test,value-if-true,value-if-false) corresponds to Greenstone's {If}{test,value-if-true,value-if-false}. Unlike Excel, in Greenstone such statements are intermingled with HTML-tagged text, and the role of the curly brackets is to allow the runtime system to sift them out. The {Or} statement is logically unnecessary and provides only a syntactic convenience in certain situations: everything it can do can also be done with a more verbose combination of {If}s.
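The relationship between {If} and {Or} can be sketched in ordinary JavaScript. This is only a model of the selection logic, not Greenstone code: the function names are invented for illustration, unset metadata is represented by an empty string, and {Or} is assumed to yield the first of its values that is set.

```javascript
// Model of {If}{test,value-if-true,value-if-false}.
// Assumption: a test succeeds when the metadata value is non-empty.
function formatIf(test, valueIfTrue, valueIfFalse = "") {
  return test !== "" ? valueIfTrue : valueIfFalse;
}

// Model of {Or}{a,b,c,...}.
// Assumption: the result is the first value that is set.
function formatOr(...values) {
  for (const value of values) {
    if (value !== "") return value;
  }
  return "";
}

// The same three-way choice expressed with nested If calls only.
function orViaIf(a, b, c) {
  return formatIf(a, a, formatIf(b, b, c));
}

// Hypothetical document with no dc.Title but an extracted Title.
const sampleMeta = { "dc.Title": "", "Title": "Annual Report" };
console.log(formatOr(sampleMeta["dc.Title"], sampleMeta["Title"], "Untitled"));
console.log(orViaIf(sampleMeta["dc.Title"], sampleMeta["Title"], "Untitled"));
```

Both calls select the same value, which is the sense in which {Or} adds convenience rather than expressive power.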
In writing, lists are usually organized vertically, so the most commonly customized format statements are for VLists—vertical lists (see Format features in Section 10.4). Here are some common patterns.
If a VList format statement starts with a <td> tag, it is assumed to be a row in a table. Greenstone automatically encloses it with <tr> … </tr> tags and encloses the entire table with <table> … </table>. Basic control over document items is provided by
<td>[link][icon][/link]</td>
<td>[dc.Title]</td>
This generates a two-column table, with a link to the document in the first column and title metadata in the second. The original source document—or anything else—could be added in an extra column:
<td>[srclink][srcicon][/srclink]</td>
<td>[link][icon][/link]</td>
<td>[dc.Title]</td>
Note that link, icon, srclink, and so on are actually treated as extracted metadata—they are generated by Greenstone and associated with each document. The Librarian interface presents names for extracted metadata as beginning with the prefix ex. However, this prefix is omitted internally: Greenstone assumes extracted metadata by default when it encounters unqualified metadata names like link, icon, and srclink.
Another pattern introduced in Chapter 10 is the use of image thumbnails instead of the icon that is automatically generated by [icon]:
<td>[srclink][thumbicon][/srclink]</td>
<td>[dc.Title]</td>
Note the similarity with the first of the two examples above. Exactly the same effect can be achieved by replacing the first table cell (i.e., line 1) above by this more complicated version:
<td>[srclink]<img src="_httpassocdir_/[assocfilepath]/[Thumb]"
width="[ThumbWidth]" height="[ThumbHeight]">[/srclink]</td>
In fact, behind the scenes the ImagePlugin sets thumbicon metadata to something equivalent to this expression (it also sets ThumbWidth and ThumbHeight metadata for the thumbnail image). While thumbnail images are common enough to warrant thumbicon metadata as a convenient shorthand, a huge variety of other associated files might be bound to a document, and this more explicit technique shows how to link to such files.
In the excerpt above, the src attribute with value _httpassocdir_/[assocfilepath]/[Thumb] creates a URL for the associated image. The image plug-in sets up the thumbnail using Greenstone's generic mechanism for a document's associated files. The macro _httpassocdir_ gives the name of the folder where Greenstone places associated files (in fact, this is the folder <collectionname>/index/assoc); within this file area, assocfilepath metadata gives the name of the associated-file folder for this document; and Thumb metadata gives the name of the actual thumbnail image. The same pattern can be used to construct <a href=…> hyperlinks to associated files. There is a range of plug-ins in Greenstone that set up associated files following this pattern, varying only the last part: the metadata name (or perhaps explicit filename) used for the actual resource associated with the document.
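The three-part structure of an associated-file URL can be made concrete with a small JavaScript sketch. The function name and the sample folder and file names are hypothetical; in Greenstone itself the parts are supplied by the _httpassocdir_ macro and by the assocfilepath and Thumb metadata.

```javascript
// Joins the three parts described above into one URL:
//   collection-wide associated-files area / per-document folder / file name
function assocFileUrl(httpAssocDir, assocFilePath, fileName) {
  return [httpAssocDir, assocFilePath, fileName].join("/");
}

// Hypothetical values standing in for _httpassocdir_, [assocfilepath], [Thumb].
const thumbUrl = assocFileUrl(
  "/greenstone/collect/demo/index/assoc",
  "HASH0123.dir",
  "thumb.gif"
);
console.log(thumbUrl);
```

Only the last argument changes when a different associated file is wanted, which mirrors how plug-ins vary just the final metadata name.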
Another useful pattern encountered in Section 10.4 (Formatting exercise 1) is
<td>[link][icon][/link]</td>
<td>{If}{[numleafdocs],[Title]([numleafdocs]),[dc.Title]}</td>
The first cell is the same as in the first example above. The second uses the variable numleafdocs in two different roles. Its first occurrence tests whether the variable has a value, in order to determine which of the two branches of the If to take. The metadata numleafdocs is only set at internal nodes of a hierarchy (e.g., bookshelves in a classifier), and gives the total number of leaves below this node, which is printed in parentheses by the second occurrence above. In either case, the title is printed: the extracted title [Title] for internal nodes, which is set to the metadata value corresponding to that node; and the Dublin Core title [dc.Title] for leaf nodes (i.e., documents).
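The branch taken by this format statement can be mimicked in JavaScript. The sketch below is illustrative only: the function name and sample metadata are invented, and a missing numleafdocs value is treated as the signal for a leaf node, as the paragraph above describes.

```javascript
// Mirrors {If}{[numleafdocs],[Title]([numleafdocs]),[dc.Title]}.
// Internal nodes carry numleafdocs; leaf documents do not.
function titleCell(meta) {
  const leaves = meta["numleafdocs"];
  const text = leaves
    ? `${meta["Title"]}(${leaves})` // bookshelf: title plus leaf count
    : meta["dc.Title"];             // document: Dublin Core title
  return `<td>${text}</td>`;
}

// Hypothetical bookshelf node and leaf document.
console.log(titleCell({ Title: "Reports", numleafdocs: "12" }));
console.log(titleCell({ "dc.Title": "Annual Review" }));
```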
When format statements become overly complicated, they can often be simplified by going back to the source documents and assigning metadata that supports what you want to present. For example, if different types of document need to be presented in different ways (article, paper, book, …), you could define a metadata value, say dc.Format, that specifies the type. If different types are in different folders in the input material, this can be quickly done by assigning metadata in the Librarian interface at the folder level. This metadata is not necessarily presented to the user. Rather, it serves as internal metadata that helps simplify the logic of If statements.

Macros

Greenstone macros control the structure of all Web pages generated; they also provide language-independent interfaces. Figure 12.1 gives an artificial excerpt that illustrates how macros are defined and used. Macro definitions usually comprise a name flanked by underscores, and a definition placed within braces { … }. Lines beginning with the hash character (#) are comments. A second form of definition is for macros with arguments (see below). Macro definitions contain a mix of plain text, HTML, and references to other macros. For conciseness, the definitions in Figure 12.1 are short. In practice they often span 20 lines or more, indented to improve readability. Macros do not contain references to metadata values like [dc.Title], because macro definitions are not bound to particular documents.
Figure 12.1: A macro file
Macros are grouped into packages, and an inheritance scheme is used to determine which definitions are in effect at any given time. This allows the content generated for a page to be embedded within a global formatting style. For instance, most pages have a standard header and footer that enclose page-specific content. Figure 12.1 shows a baseline page defined in the Global package, whose _content_ macro is not intended to be seen. It is overridden by the immediately following query package that generates a page inviting users to enter search terms and perform queries.
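The inheritance scheme can be pictured as a two-level lookup: consult the current package first, then fall back to Global. The JavaScript below is a simplified model under that assumption, with invented macro bodies; the real macro language also takes page parameters into account.

```javascript
// Hypothetical macro definitions, grouped by package.
const packages = {
  Global: {
    _header_: "<h1>Digital Library</h1>",
    _content_: "(placeholder that should never be seen)",
    _footer_: "<hr>",
  },
  query: {
    _content_: "<p>Enter search terms</p>",
  },
};

// Look in the named package first, then fall back to Global.
function lookupMacro(packageName, macroName) {
  const pkg = packages[packageName] || {};
  if (Object.prototype.hasOwnProperty.call(pkg, macroName)) {
    return pkg[macroName];
  }
  return packages.Global[macroName];
}

// A page is the standard header and footer around package-specific content.
function buildPage(packageName) {
  return ["_header_", "_content_", "_footer_"]
    .map((name) => lookupMacro(packageName, name))
    .join("\n");
}

console.log(buildPage("query"));
```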
Macros can include parameters interposed in square brackets between name and content. These are known as page parameters because they control the overall generation of a page, and they are expressed as [x=y], which gives parameter x the value y. The _header_ macro in Figure 12.1 uses the page parameter l, which determines the interface language. Another parameter (not shown in this example) is v, which controls whether images are used in the interface. A precedence ordering for page parameters is built into the macro language to resolve conflicting definitions.
Figure 12.1 gives versions of the _header_ macro in English, French, and Spanish. They set the parameter l to the two-letter international standard abbreviation (ISO 639), enabling the system to present the appropriate version when the page is generated. If a macro has no definition for a given language, the parameter is simply omitted, which in standard Greenstone produces an English version because this is the most complete and up-to-date language for the interface. However, it is easy to change the default. It is specified in the main configuration file (see Configuration files in Section 11.2) using the cgiarg option, specifying (for French) cgiarg shortname=l argdefault=fr.
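The effect of the l parameter and its default can be approximated by a lookup with a fallback. This JavaScript sketch is a simplification: the function name and the header texts are invented, and the defaultLanguage argument merely stands in for the argdefault setting.

```javascript
// Hypothetical per-language definitions of one macro, keyed by ISO 639 code.
const headerDefinitions = {
  en: "Search the library",
  fr: "Chercher dans la bibliothèque",
  es: "Buscar en la biblioteca",
};

// Use the requested language if a definition exists; otherwise fall back
// to the default language (English in standard Greenstone).
function selectDefinition(definitions, language, defaultLanguage = "en") {
  if (Object.prototype.hasOwnProperty.call(definitions, language)) {
    return definitions[language];
  }
  return definitions[defaultLanguage];
}

console.log(selectDefinition(headerDefinitions, "fr"));       // French version
console.log(selectDefinition(headerDefinitions, "de"));       // falls back to English
console.log(selectDefinition(headerDefinitions, "de", "fr")); // default changed to French
```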
The macro language has its own version of the If statement _If_—an example appears in Figure 12.1's _content_ macro. In this case it uses the macro _cgiargqb_ to determine whether to display a normal query box or a large one on the search page. The value of _cgiargqb_ is set at runtime by the Greenstone system; its value is ultimately determined by the user's setting on the Preferences page, which can be controlled through a pull-down menu that offers the choice of query style as single line or large. Choosing large causes the Web page to set the HTML form field qb to large, which in turn leads the runtime system to set _cgiargqb_ to large. Several other system-defined macros have values that are determined at runtime—for example, the number of documents returned by a search. We return to this point below.
The final macro definition in Figure 12.1, _matches_, illustrates the use of arguments. If invoked by _matches_(5), this generates the text 5 document(s) matched your query. An If statement could be added to separate the singular and plural cases, rather than resorting to the clumsy “document(s)”, but that is not the point this part of the figure is looking to demonstrate. Notice the use of _1_ in the macro definition to refer to the first argument passed in when the macro is invoked. The pattern continues with _2_ for the second argument (if required), and so on; and _matches_(5,3) would be used for a two-argument macro. It is, however, not normal to call a macro with fixed numeric arguments such as 5 and 3. Values set by the runtime system (such as _numdocs_) or metadata from a document are more common.
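Argument substitution of this kind is easy to imitate in JavaScript. The sketch below is not how Greenstone implements it; the function names are invented, and the second function shows the singular/plural refinement mentioned above.

```javascript
// Replace _1_, _2_, ... in a macro body with the corresponding argument.
// Placeholders with no matching argument are left untouched.
function expandArguments(definition, args) {
  return definition.replace(/_(\d+)_/g, (placeholder, index) => {
    const value = args[Number(index) - 1];
    return value === undefined ? placeholder : String(value);
  });
}

// The body of _matches_ as described in the text.
const matchesBody = "_1_ document(s) matched your query.";
console.log(expandArguments(matchesBody, [5]));

// A variant that separates the singular and plural cases.
function matchesText(count) {
  const noun = count === 1 ? "document" : "documents";
  return `${count} ${noun} matched your query.`;
}
console.log(matchesText(1));
```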
Figure 12.1 is artificial in both content and layout. Although macro definitions from different packages can be mixed together in a single file, as here, this is not the norm. Usually, different files control the layout of different pages, such as about.dm, query.dm, and home.dm for the Greenstone About, Query, and Home pages. There are two exceptions to this convention. One is a collection-specific macro file called extra.dm, which contains any macros defined in the Macros section of the Librarian interface's Format panel. Another is the set of macros containing textual language fragments, which are grouped into files by language (english.dm, french.dm, and so on). In fact, each language has two files, for example english.dm and english2.dm, the former containing phrases used in the core Greenstone system and the latter containing those in auxiliary subsystems. The distinction is made to avoid overwhelming translators and to help them focus on the text used most often.
The standard Greenstone Reader's interface is built entirely out of macros. It is an excellent source of additional examples of how to use them.

Commonly used macros

Table 12.1 lists three values defined in Greenstone's site configuration file (gsdlsite.cfg, Section 11.2) that are set as macros _httpprefix_, _httpweb_, and _gwcgi_. The purpose of these values is to map Greenstone into the Web server's space, with increasing levels of refinement. Greenstone is usually installed in a single folder in the file system (say greenstone-2.83), which is mapped to a unique prefix in the Web server's configuration file that is also recorded in _httpprefix_. For example, setting it to /greenstone would be a good choice. If a more up-to-date version of Greenstone were installed later, this could be placed in a new folder on the file system, collections moved across as needed, and the Web server's configuration file altered to map /greenstone to the new location.
Table 12.1: Global macros defined in gsdlsite.cfg
Definition | Description
_httpprefix_ | Specifies the HTTP address of the Greenstone home directory, from which you can build up URLs for images and collections
_httpweb_ | Specifies the HTTP address of the top-level web folder in the Greenstone home directory
_gwcgi_ | The URL of the Greenstone digital library, which is an executable CGI program named library.cgi (the gw, short for gateway, in gwcgi is an archaic historical hangover)
The _httpweb_ macro specifies how the root area for supporting files such as images, CSS, and JavaScript in Greenstone is mapped to the Web server's space. By default, the folder web, located in the folder where Greenstone is installed, forms the root, with sub-folders for images, style and script. Continuing the configuration example from before, having set _httpprefix_ to /greenstone, _httpweb_ would need to be set to /greenstone/web. In fact, on this occasion it can be omitted as this is its default value. The macro _gwcgi_ provides the URL for accessing the digital library. The norm is to leave this undefined, in which case, for the configuration being described, the library's URL is /greenstone/cgi-bin/library.cgi.
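The defaults just described can be summarized in a few lines of JavaScript. The function is a hypothetical model of the fallback rules for the configuration in this example, not part of Greenstone.

```javascript
// Fill in _httpweb_ and _gwcgi_ from _httpprefix_ when they are not set,
// following the defaults described for this configuration.
function resolveSiteUrls({ httpprefix, httpweb, gwcgi }) {
  return {
    httpprefix,
    httpweb: httpweb ?? `${httpprefix}/web`,
    gwcgi: gwcgi ?? `${httpprefix}/cgi-bin/library.cgi`,
  };
}

// Only the prefix is configured; the other two values are derived.
const siteUrls = resolveSiteUrls({ httpprefix: "/greenstone" });
console.log(siteUrls.httpweb);
console.log(siteUrls.gwcgi);
```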
These three macros are fundamental to Greenstone's operation, and numerous definitions in the macro files are based upon them. Table 12.2 shows some macros that are often used for defining other macros and in format statements. The upper entries define partial URLs that pinpoint general areas within the installation from which full URLs can be formed. The lower part gives fully formed URLs that can be used directly to link to pages like a collection's Preferences or Query page.
Table 12.2: Macros in base.dm that define URLs
Definition | Description
Partial URLs
_httpimages_ | URL prefix for where the site-wide images folder is located in the Greenstone home directory
_httpstyle_ | URL prefix for where the site-wide CSS style folder is located in the Greenstone home directory
_httpcollection_ | URL prefix for where the current collection is located
_httpcimages_ | URL prefix for where the collection-specific images folder is located
_httpassocdir_ | URL prefix for where the collection's associated files directory is located
Full URLs
_httppagex_(x) | Useful macro for forming URLs to specific pages
_httppagehome_ | Link to the home page of the digital library
_httppageabout_ | Link to the About this collection page
_httppagehelp_ | Link to the Help page
_httppagepref_ | Link to the Preferences page
_httpcurrentdocument_ | Link to the current document
Other macros dictate what goes into pages. We have already seen that the overall structure is controlled by _header_, _content_, and _footer_. Table 12.3 summarizes useful macros that splice in further information, be it graphics, text, or anything else that can be embedded into HTML—such as JavaScript. For example, _pagescriptextra_ is used in the design patterns below to introduce JavaScript functions that extend Greenstone's functionality by improving how items are presented. The lower part of the table gives macros that are set by the runtime system. For example, _navigationbar_ can only be defined once the collection's indexing and browsing structures have been determined.
Table 12.3: Macros that help generate pages
Definition | Description
Determined in advance
_pagetitle_ | What gets displayed at the top of the browser window
_pagescriptextra_ | Extra JavaScript to include in the header
_pagebannerextra_ | Anything extra to display in the page banner
_pagefooterextra_ | Anything extra to display in the page footer
Determined at runtime
_navigationbar_ | Navigation bar populated with search and browse elements
_builddate_ | When the collection was last built
_numbytes_ | Number of bytes indexed
_numdocs_ | Number of documents indexed
_numsections_ | Number of sections indexed
_numwords_ | Number of words indexed
_httpnextarrow_, _httpprevarrow_ | Hyperlinks to the next and previous sections of the document being viewed (only set when performing the document action, i.e., a=d in the CGI arguments)
_numpages_ | Number of pages in a document (only set when viewing a document processed by PagedImagePlugin)
_thisOID_ | File name given to document at import time (based on, but not necessarily exactly the same as, its document identifier)
_versionnum_ | Version number of the Greenstone installation
Another kind of macro definition that is set by the runtime system takes the form _cgiarg…_, where the ellipsis gives the name of an argument in the URL. We met this in Section 10.6 when switching between images and text in a collection of paged images. It is a useful way to incorporate new functionality. There are no setup requirements—nothing has to be done in advance to declare a new variable. Greenstone sets a macro of the form _cgiarg…_ for every CGI argument in the URL, and macro files can immediately start using these to access the variable. As a result, macros can change what is presented to the user. This is how Greenstone was extended to support the display of HTML documents as realistic books (Figure 3.5).
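The mapping from URL arguments to _cgiarg…_ macros can be modeled in JavaScript as follows. This is a sketch of the naming convention only; the function name is invented and the view argument is a made-up example of a new variable.

```javascript
// Build one _cgiargNAME_ entry for every argument in a query string.
function cgiArgMacros(queryString) {
  const macros = {};
  for (const [name, value] of new URLSearchParams(queryString)) {
    macros[`_cgiarg${name}_`] = value;
  }
  return macros;
}

// a=d and qb=large appear in the text; view=text is hypothetical.
const runtimeMacros = cgiArgMacros("a=d&qb=large&view=text");
console.log(runtimeMacros);
```

Because nothing has to be declared in advance, a new argument such as view becomes available to macro files as soon as it appears in a URL.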

12.2. Design Patterns

Now we study some larger examples, framed as design patterns because the focus is on the general approach rather than what the specific examples actually achieve.

Design pattern 1: Additional static pages

How can you add extra pages to a Greenstone collection? As you already know, the Greenstone home page is created by macros in the file home.dm. It starts with the phrase package home followed by several macro definitions, the most important being _content_, which supplies the page content. There are no definitions for _header_ or _footer_ macros, so the default definitions in the Global package will be used. The end result is to sandwich the new _content_ definition between the standard header and footer (see Section 12.1 for more details). You can repeat this pattern of defining new macro packages and populating them with macro definitions yourself, inventing new package names as you see fit.
We will describe how to add the page in Figure 12.2a, which gives contact details and photos of the three authors of the book, to either a Greenstone digital library site or an individual collection. Figure 12.2b shows the macro file that generates it. It shows the home page of each author as a hyperlink, and each one's e-mail address, hyperlinked using mailto to launch the user's mail client. The telephone numbers are formatted by a Skype plug-in that was installed in the browser used for the snapshot: it links values it thinks are telephone numbers to Skype and automatically dials when they are clicked. (As Figure 12.2a illustrates, it includes heuristics to suppress this for fax numbers.)
Figure 12.2: Adding a static page to a collection: (a) viewing the result in a Web browser; (b) macro file
These macro definitions are placed in a package called contact, specified in the new file contact.dm. Lines 10–31 define the _content_ macro, which is made more readable (and maintainable) using a set of support macros, defined above it. Points of interest include:
Line 3. The _pagetitle_ macro gives the text that appears in the Web browser's title. It overrides the default defined in the Global package that displays the name of the collection.
Line 5. The _homepage_ macro takes the user's login name as an argument and generates the home page URL in the University of Waikato Computer Science Department.
Lines 7–8. The _email_ macro takes an e-mail name (which does not have to be the same as the user's login name) as an argument and generates a full address. The @ symbol in the e-mail address is specified using its HTML entity form &#64; in an attempt to prevent spammers from automatically harvesting this information from the Web page. The _esubject_ macro defines the text How to build a digital library (encoded for inclusion in a URL using %20 to represent spaces) and is paired with the _email_ macro so that if a user clicks on the hyperlink his or her e-mail client will automatically populate the subject line with this text.
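The two encoding tricks used by _email_ and _esubject_ can be reproduced in JavaScript. This sketch is not the macro file of Figure 12.2b: the function name, the e-mail name, and the domain example.org are placeholders.

```javascript
// Build a mailto hyperlink whose address hides "@" behind the HTML
// entity &#64; and whose subject is URL-encoded (spaces become %20).
function mailtoLink(emailName, domain, subject) {
  const address = `${emailName}&#64;${domain}`;
  const href = `mailto:${address}?subject=${encodeURIComponent(subject)}`;
  return `<a href="${href}">${address}</a>`;
}

const contactLink = mailtoLink("jane", "example.org", "How to build a digital library");
console.log(contactLink);
```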
To link this new page into the digital library, two things must be done. First, the new macro package must be made visible to the Greenstone software. Second, one or more links to it must be embedded into an existing page. Macro definitions can be added to collections by placing them in a file called extra.dm in the collection's macros folder or by pasting them into the Macros tab of the Librarian interface's Format panel, which accomplishes exactly the same thing. This file is read automatically. Alternatively, the file contact.dm could be placed in Greenstone's top-level macros folder, in which case its name must be added to the main configuration file (Section 11.2; see also the end of Section 10.4). It is a good idea to use separate files for significant changes, making things easier to manage in the long run.
The second step, adding a link, can be done using the _httppagex_(x) macro in Table 12.2. For instance, this line could be added to the _content_ macro in home.dm:
<a href="_httppagex_(contact)">Click here to view contact information.</a>
This would make the new page with contact details available from the digital library's home page.
As noted, Greenstone automatically picks up the file extra.dm in a collection's macros folder and associates it with that collection. This is also true for other macro files. If desired, you can copy standard files such as query.dm and about.dm into the collection's macros folder and customize them: the definitions included there will override the site-wide definitions. (Note, however, that Greenstone only picks up standard macro files whose names appear in the main configuration file.) If you wanted to put the same information on the About page of an individual collection, you could copy about.dm into the collection's macros folder and add the above line to its _content_ macro. Then delete all the other macros from the copy, because they are identical to the original definitions.

Design pattern 2: Using JavaScript to adjust presentation

Format statements allow you to manipulate how things are presented. However, there are limits to what they can accomplish. For example, although they can include conditional tests (If statements), they cannot perform numeric calculations or manipulate strings. If you want to add two metadata values together and test the result, you will need to use JavaScript. This design pattern illustrates the use of JavaScript through two examples that take raw data and display it on the Web page in a more readable form.
The first shows file sizes using KB and MB for large values. The second displays dates in a way that respects local conventions, determined by the locale set in the user's operating system. Both use the JavaScript document.write("….") function to inject fresh text at that point into an HTML document. JavaScript must be embedded in <script>…</script> tags to ensure that it is not interpreted as HTML. The whole thing can be placed directly into a format statement, but as soon as the JavaScript exceeds a couple of lines (which is invariably the case) it makes more sense to define a function, place it in a macro file, and call it from within the format statement using JavaScript. Greenstone already embeds a set of JavaScript functions into the pages generated. Extra functions can be included in this set by defining the macro _pagescriptextra_ (listed in Table 12.3), and that is what we use here.
Figure 12.3 defines two functions to accomplish these tasks: format_filesize() on lines 5–17 and format_date() on lines 19–29. The snapshot associated with the next design pattern (Figure 12.4a) shows the result. In the snapshot, a video is playing, and in the last two lines of the metadata tabulated below you can see the file size of the video, 7 MB (rounded to the nearest megabyte and hyperlinked to download the file), and its release date, Wednesday, 20 December 2006, which has been reformatted from the internal Greenstone value of 20061220. The format statement that produces the “Download size” table cells, for instance, simply calls the format_filesize() function in JavaScript:
<td>Download size:</td>
<td> [srclink]<script>format_filesize("[FileSize]")</script>[/srclink]</td>
Figure 12.3: Using JavaScript to present file size and date metadata
Figure 12.4: A video player in Greenstone: (a) Web browser view; (b) macro file; (c) the object created by the macro in (b)
Points of interest in Figure 12.3 include:
Lines 10, 13. Some ambiguity surrounds the values used for kilobytes and megabytes. Is a kilobyte 1000 or 1024 bytes? Is a megabyte 1000 or 1024 kilobytes? We have used 1024 here.
Lines 21–23. The built-in JavaScript function substr() extracts a substring: the first argument is the start position and the second is the length of the substring.
Lines 25–26. The built-in JavaScript Date object has a member function toLocaleDateString(), which converts dates to the appropriate format.
JavaScript uses curly braces { … } for grouping, but they are reserved characters in Greenstone's macro language. Consequently they are “escaped” by prefixing them with a backslash (\) in order to pass through unaltered to the Web browser's JavaScript interpreter. Underscore is also a reserved character, and is also escaped (for example, in format\_filesize).
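Because Figure 12.3 is not reproduced here, the following is a minimal sketch of what two such helpers might look like. It is not the code from the figure: the camel-case names, the assumption that the raw value is a byte count, and the choice to return strings (which a page could then hand to document.write()) are ours. It uses 1024 as the divisor, as discussed above, and substring() in place of substr(). When placed in a Greenstone macro file, the braces and underscores would additionally need escaping as just described.

```javascript
// Sketch only: these are not the functions from Figure 12.3.
// Assumes the raw metadata value is a byte count; uses 1024 as the divisor.
function formatFilesize(rawBytes) {
  const bytes = Number(rawBytes);
  if (!Number.isFinite(bytes) || bytes < 0) {
    return String(rawBytes); // leave unparseable values untouched
  }
  if (bytes >= 1024 * 1024) {
    return Math.round(bytes / (1024 * 1024)) + " MB";
  }
  if (bytes >= 1024) {
    return Math.round(bytes / 1024) + " KB";
  }
  return bytes + " bytes";
}

// Parse an eight-digit YYYYMMDD string into a Date in local time.
function parseCompactDate(yyyymmdd) {
  const year = Number(yyyymmdd.substring(0, 4));
  const month = Number(yyyymmdd.substring(4, 6)); // 1-12
  const day = Number(yyyymmdd.substring(6, 8));
  return new Date(year, month - 1, day); // Date months are zero-based
}

// Render the date according to the locale reported by the browser.
function formatDate(yyyymmdd) {
  return parseCompactDate(yyyymmdd).toLocaleDateString();
}
```

The exact text produced by formatDate() depends on the locale, which is the point of the exercise; the parsing step is kept separate so that it can be checked independently of any locale.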

Design pattern 3: Making format statements reusable through macro definitions

Figure 12.4a demonstrates a video collection whose DocumentText format statement has been customized to stream video into the open source Flowplayer, a Flash-based utility giving a similar experience to YouTube. Over a dozen configuration parameters are needed to set this up, and they would be unwieldy and difficult to manage in a format statement. They can be more conveniently laid out in a macro definition (and decomposed into smaller macros, if desired). But there is a snag—it is necessary to refer to metadata values associated with the video document and, as mentioned previously, this cannot be done directly in macro files.
Figure 12.4b shows how to achieve the best of both worlds by defining a macro to which the metadata values are passed as parameters. It generates the HTML <object> … </object> element shown in Figure 12.4c, which is exactly what is needed to play the video. Most of the code in Figure 12.4b (lines 8–32) defines a Greenstone macro called _videoplayer_ that generates this HTML. This macro has two arguments, _1_ and _2_; they are used on line 28 and represent metadata values that are needed to construct the video file's URL. The macro is called from the format statement as follows:
format DocumentText “ … _videoplayer_([assocfilepath],[streamablevideo]) …”
The arguments are two metadata values, [assocfilepath] and [streamablevideo]. The former is defined for any document that has associated files (such as the video) and provides the name of the folder in which they are stored; in fact, we encountered it in the thumbnail format statement in Section 12.1. The latter metadata is generated by the plug-in that processed the video file, and provides its filename within the associated file directory. These two metadata values are combined with the _baseurl_ macro defined in Figure 12.4b (line 3) to yield the complete URL of the video. This is then set as the value of the videoFile configuration field in the HTML <object>. The result is the line near the end of Figure 12.4c that reads
videoFile: '/gsdl/collect/nzonair/index/assoc/HASH7e4d.dir/Duchess_Raglan_City_stream.flv'
This macro file is not only easier to create and maintain than embedding all this directly in a format statement, but is also in a convenient form to share with other collections and other Greenstone installations.
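The effect of passing metadata values as macro arguments can be imitated in a few lines of JavaScript. This is purely an illustration of positional substitution: Greenstone's macro engine is not written in JavaScript, the function expandMacro() is hypothetical, and the template string is a simplified stand-in for the macro body in Figure 12.4b, with the URL prefix written out literally rather than taken from _baseurl_.

```javascript
// Hypothetical stand-in for macro parameter substitution; this is not
// how Greenstone itself is implemented.
function expandMacro(body, args) {
  // Replace each _N_ placeholder with the N-th argument (1-based).
  // Placeholders with no matching argument are left unchanged.
  return body.replace(/_(\d+)_/g, (placeholder, n) => {
    const value = args[Number(n) - 1];
    return value === undefined ? placeholder : value;
  });
}

// Simplified template: the prefix is hard-coded for illustration only.
const videoTemplate = "videoFile: '/gsdl/collect/nzonair/index/assoc/_1_/_2_'";
const videoLine = expandMacro(videoTemplate, [
  "HASH7e4d.dir",                   // plays the role of [assocfilepath]
  "Duchess_Raglan_City_stream.flv", // plays the role of [streamablevideo]
]);
```

Here videoLine reproduces the videoFile line quoted above. The substitution is done in a single pass over the template, so underscores inside the argument values are never mistaken for placeholders.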

Design pattern 4: Dynamic HTML

When a Web page is rendered in a browser, it takes the form of a tree structure in memory that encapsulates the sequencing and nesting of HTML elements. This tree forms an important component of the Document Object Model (DOM), and can be manipulated with JavaScript to dynamically alter what the Web browser displays, usually in conjunction with associated cascading style sheets (CSS). The term “tree” is used in its computer science sense to refer to an abstract structure that is hierarchically composed (as in Figure 11.1).
Figure 12.5 shows an abridged version of the DOM for the introductory HTML example in Chapter 4 (Figure 4.6). HTML elements map to nodes in the tree: nested elements are children of the parent tag that encloses them; sequential elements (such as the <title> and two <meta> tags at the top left) are siblings, ordered left to right at the same level, with the same parent. Text within an element, such as text to be italicized, also forms a node, shown in the figure as #text.
Figure 12.5: Document Object Model for the Web page in Figure 4.6
DOM, a World Wide Web Consortium specification, is the key to dynamic HTML. In addition to its hierarchical structuring of HTML elements, it includes methods to access the width and height of the browser window and to determine whether or not x and y scrollbars are in effect and what their positions are—and it includes ways of changing these values. What we see here is just the tip of the iceberg.
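To make the tree structure concrete, here is a small sketch that walks a node and its descendants and produces an indented outline. It relies only on the nodeName and childNodes properties, which real DOM nodes provide, so in a browser it could be called as outline(document.documentElement). The sampleTree literal is a hand-built stand-in made of plain objects, not a transcription of Figure 12.5.

```javascript
// Return an indented outline of a node and all of its descendants.
// Works on any object exposing nodeName and childNodes.
function outline(node, depth = 0) {
  const lines = ["  ".repeat(depth) + node.nodeName];
  for (const child of Array.from(node.childNodes || [])) {
    lines.push(...outline(child, depth + 1));
  }
  return lines;
}

// Hand-built stand-in for a very small page (hypothetical content).
const sampleTree = {
  nodeName: "HTML",
  childNodes: [
    {
      nodeName: "HEAD",
      childNodes: [
        { nodeName: "TITLE", childNodes: [{ nodeName: "#text", childNodes: [] }] },
        { nodeName: "META", childNodes: [] },
      ],
    },
    {
      nodeName: "BODY",
      childNodes: [
        { nodeName: "I", childNodes: [{ nodeName: "#text", childNodes: [] }] },
      ],
    },
  ],
};

const sampleOutline = outline(sampleTree).join("\n");
```

Siblings appear at the same indentation in the order they occur in the page, and text inside an element shows up as a #text child, mirroring the description above.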

Opening and closing tables interactively

It is often useful to be able to interactively open and close sections of a table, be it a table of contents or a table displaying metadata. For example, Figure 12.6a shows the metadata for an educational resource exported from the University of Calgary's Learning Commons Educational Object Repository in IEEE LOM format (Section 6.4; this is the same example as in Figure 6.10). Top-level headings appear as emphasized rows in the metadata table—Classification, General, Lifecycle, Educational, Rights, and Technical. Beside each are small triangles (▹ and ▿) that open and close the section. In the figure all are closed except General.
Figure 12.6: Opening and closing sections of a table: (a) Web browser view; (b) macro file, including JavaScript
You can place an ID attribute into any HTML tag to mark a salient position in the page with an easily identifiable name that is unique within the file (you can choose whatever name you want); in the example this has been done for each major section of the table in the DocumentText format statement. Figure 12.6b shows the JavaScript functions that support the opening and closing operations, show_description() and hide_description(), respectively, defined using the above-mentioned _pagescriptextra_ macro. The part of the table to change is passed in as the argument elem, and it is located by the built-in DOM function getElementById(). The appearance of the part to change (elem) is altered by switching its CSS display style from none (not displayed) to block (displayed) and vice versa: lines 7, 9, and 12; and 20, 22, and 25. In fact, there are three things to change. To close a table section, the “open” triangle is located (line 6) and visually suppressed (line 7). Then the “closed” triangle is displayed (lines 8–9). Finally, the section of the table that contains the metadata is suppressed (lines 11–12).
For conciseness, the code relies on the convention that the three related items share the same root for the id attribute, and it is this root that is passed in as the elem argument. With the default being the closed status, the following excerpt of format statement implements the open operation for the section of the table for General metadata:
<div id="generalopen" style="display: block;">
<img src="_httpimages_/close.gif"
onclick="javascript:show\_description('general')">
<b>General</b>
</div>
A similar excerpt is needed for the close operation, and the metadata itself is displayed more simply with:
<div id="generaltext" style="display: none;">
<!-- display the various metadata fields General Language, General Title … -->
</div>
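The toggling logic can be sketched as follows. This is an approximation written for this discussion, not the code of Figure 12.6b. It follows the excerpts above in using the suffixes open and text on the shared root, and it assumes that the third element, the control shown while the section is open, uses the suffix close. The optional doc parameter is our addition so that the functions can be exercised outside a browser; in a page it defaults to the global document.

```javascript
// Set the CSS display style of the element with the given id, if present.
function setDisplay(doc, id, value) {
  const node = doc.getElementById(id);
  if (node) {
    node.style.display = value;
  }
}

// Open a section: hide the control that offers "open", show the one that
// offers "close" (assumed id suffix), and reveal the metadata block.
function showDescription(root, doc = document) {
  setDisplay(doc, root + "open", "none");
  setDisplay(doc, root + "close", "block");
  setDisplay(doc, root + "text", "block");
}

// Close a section: the reverse of showDescription().
function hideDescription(root, doc = document) {
  setDisplay(doc, root + "close", "none");
  setDisplay(doc, root + "open", "block");
  setDisplay(doc, root + "text", "none");
}
```

Passing only the root, such as 'general', keeps the calls in the format statement short, at the cost of relying on the naming convention being followed for every section.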

Adding table headers

Figure 12.7 shows how to add headers to the information presented when browsing a collection by inserting a row into the table that displays a classifier's VList information. Figure 12.7a shows the result: the headings for Title, Subject, and Description beneath the main banner have been spliced into the document. Figure 12.7b shows the macros that achieve this.
Figure 12.7: Adding header information to a VList: (a) Web browser view; (b) macro file, including JavaScript
The inclusion of the string a=d in the Greenstone digital library URL indicates that the user is either displaying a document or performing a browsing action. Technically, this is called the document action: it assigns the value d (for document) to the CGI argument a (for action). The runtime system can readily distinguish document display from browsing by the other CGI arguments. For the former, the document in question is defined by the d argument; for the latter the cl argument is used to specify the classifier involved.
The document action (a=d) is controlled by macros defined in the package called document. Figure 12.7b defines the macros _pagescriptextra_ and _pagefooterextra_ (Table 12.3). The first defines two JavaScript functions. The main one, manipulateVList(), locates the relevant table in the Web page and adds in the necessary <th>…</th> tags for the Title, Subject, and Description table headings. The other, createVListth(), avoids repeating the code required to construct the header elements. The second macro, _pagefooterextra_, calls manipulateVList() to effect the desired transformation. It is placed in the footer to ensure that the main content of the page has been already constructed by the time it is called.
Points of note in the example include:
Line 16. This locates the table in question. Again, getElementById() is used, but unlike the tables described above, it relies on the identifier, group_top, which is automatically generated by the Greenstone runtime system when creating the top-level table within a VList.
Lines 19–22. This statement overcomes a discrepancy between Firefox and Internet Explorer that showed up during testing. Tables in Internet Explorer contain an extra node (a text node). Unfortunately, such differences are not uncommon when developing DOM-based JavaScript. In this case, it is possible to resolve the issue without needing to test explicitly which browser is being used, which has the added benefit that the code will work on other browsers, whether or not they introduce the extra text node.
There are several software libraries for DOM manipulation that aim to shield programmers from differences between browsers (such as YUI, Yahoo's User Interface library). Our illustrations use unadulterated JavaScript because it is more complicated to explain examples that use libraries.
Lines 41–43. These lines determine if the page has been generated by a classifier by testing whether _cgiargcl_ is defined. Macro code can safely be included within a section of JavaScript (marked with <script>…</script> tags) because Greenstone resolves all macros before sending the result to the Web browser. What the browser sees between the tags is correct JavaScript code.
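One way to sidestep the extra text node without testing for a particular browser is to skip over any child that is not an element. The sketch below shows that idea together with a simplified header insertion. It is not the code of Figure 12.7b: the function names are ours, the doc parameter is added so that the code can be tried outside a browser, and we assume that the first element child of the table is the container holding its rows.

```javascript
const ELEMENT_NODE = 1; // value of Node.ELEMENT_NODE in the DOM

// Return the first child that is an element, ignoring text nodes and
// anything else a browser may have placed before it.
function firstElementChildOf(parent) {
  for (const child of Array.from(parent.childNodes)) {
    if (child.nodeType === ELEMENT_NODE) {
      return child;
    }
  }
  return null;
}

// Insert a row of <th> cells at the top of the table with the given id.
// Returns the new row, or null if the table is not on the page.
function addHeaderRow(doc, tableId, headings) {
  const table = doc.getElementById(tableId);
  if (!table) {
    return null;
  }
  // Assumption: the first element child holds the rows; otherwise fall
  // back to the table itself.
  const container = firstElementChildOf(table) || table;
  const row = doc.createElement("tr");
  for (const heading of headings) {
    const cell = doc.createElement("th");
    cell.appendChild(doc.createTextNode(heading));
    row.appendChild(cell);
  }
  container.insertBefore(row, container.firstChild);
  return row;
}
```

In a page this might be invoked as addHeaderRow(document, "group_top", ["Title", "Subject", "Description"]) from the footer, once the table exists.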

Design pattern 5: Exploiting Asynchronous JavaScript and XML (AJAX)

As we have seen, some nifty effects can be created using DOM manipulation. They give Web pages an interactive feel—for example, users can open and close sections of tables without reloading the page. But these adjustments can use only information stored in the page when it was first generated or information provided by the user when interacting with it. AJAX adds the ability to communicate with the server that produced the page. This greatly extends the range of what is possible:
• A user interaction can prompt a request to the server for further information that is used to update the display.
• Changes in the state of the interface can be sent to the server incrementally and can be stored there.
• Multi-way communication channels routed through the server can support simultaneous collaboration with other users.
For example, AJAX could be used to support a more responsive form of query refinement or faceted browsing, avoiding the expense of reloading pages when users narrow the focus of their search. Less data is transmitted overall because the server need only provide information about matching documents that have not already been sent.
Our first example below returns to the Depositor, Greenstone's institutional repository system, and shows how AJAX is used to support the checksum feature. A surprisingly short piece of JavaScript is needed to effect such a radical change in how the Web is used. The second example is a digital music stand that is integrated into Greenstone. Rather than going into technical details, we focus on the bigger picture and describe the role that AJAX plays.

Server-side checksums

Under the discussion of institutional repositories in Section 11.6, we saw how Greenstone can be configured to provide an author-submission workflow comparable to that of DSpace (described in Section 7.6). Most of this is accomplished using features of the macro language that we have studied above. However, it is a little more complicated to calculate the checksum of the uploaded file. To do this, a small server-side CGI program called checksum was written; AJAX was used to call it from within the Web page; and the DOM was updated with the data returned. Figure 12.8 shows the page before and after clicking the “calculate …” hyperlink embedded in the table.
Figure 12.8: Using AJAX to retrieve information from the server: (a) before the checksum is calculated; (b) after the checksum is calculated; (c) AJAX JavaScript code for asynchronous loading
Figure 12.8c gives the necessary JavaScript. The function urlGetAsync(), defined at the beginning, provides a general routine for AJAX communication that can be used elsewhere in the digital library. Its first argument is the URL of the CGI program to communicate with on the server (more than one such program could exist on the server, to augment functionality in different ways). The second argument names the DOM element to be altered as a result of calling the server. With this general routine established, the checksum feature is accomplished by add_checksum(filename) (lines 43–46), whose filename argument gives the name of the uploaded file on the server. The macro _httpchecksum_ is defined at the beginning (line 3): it resolves to the URL of the CGI-executable program that computes the checksum based on the file name provided.
There are different ways to transmit data to a Web server and receive the result; the name urlGetAsync() reflects the choices made here. The GET method is the standard way to send data on the Web and is suitable for lightweight data items such as numbers or short strings; it produces the CGI arguments we often see in URLs, such as a=p&p=about. (There is an alternative called POST that is typically used for transmitting heavyweight data, such as files.) Data can be returned either synchronously or asynchronously. In the former case, the calling program (i.e., the JavaScript in the Web page) makes a connection with the server and awaits a response from it. In the latter case, execution continues and the server returns the result by calling a stipulated “callback function”; this is what urlGetAsync() uses.
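As a small illustration of how GET arguments are assembled, the standard URLSearchParams class (available in modern browsers and in Node.js) can build the query string from name and value pairs and takes care of encoding. The helper name buildQuery() is ours, and the second example uses a made-up argument q purely to show the encoding of a space.

```javascript
// Build a GET query string from an object of name/value pairs.
function buildQuery(args) {
  return new URLSearchParams(args).toString();
}

const aboutQuery = buildQuery({ a: "p", p: "about" });     // "a=p&p=about"
const encodedQuery = buildQuery({ q: "digital library" }); // "q=digital+library"
```

The resulting string is appended to the URL of the CGI program after a question mark.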
The function urlGetAsync() first primes the variable xmlHttp for server-side communication. This object is created in different ways for different browsers, accounting for the bulk of the code in this routine (lines 9–28). Lines 30–37 provide the callback function, and lines 39–40 transmit the data. Callback functions can do many sophisticated things, but this one is simple: it takes the second parameter elem and sets its HTML to whatever the server returns. Incidentally, the call could be made synchronous by changing xmlHttp.open()'s third argument from true to false (line 39). The callback function on lines 30–37 would then be unnecessary and should be removed.
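A condensed sketch of such a routine is given below. It is not the code of Figure 12.8c: it assumes a browser that provides the standard XMLHttpRequest object, so the browser-specific creation code is omitted; it takes elem to be the element itself rather than its name; and the third parameter, createRequest, is our addition so that the request object can be substituted when trying the code outside a browser.

```javascript
// Fetch a URL with an asynchronous GET and place the response text
// inside elem. Sketch only; assumes the standard XMLHttpRequest API.
function urlGetAsync(url, elem, createRequest = () => new XMLHttpRequest()) {
  const xmlHttp = createRequest();

  // Callback: invoked by the browser each time the request changes state.
  xmlHttp.onreadystatechange = function () {
    // readyState 4 means the response is complete; 200 is HTTP "OK".
    if (xmlHttp.readyState === 4 && xmlHttp.status === 200) {
      elem.innerHTML = xmlHttp.responseText;
    }
  };

  xmlHttp.open("GET", url, true); // third argument true = asynchronous
  xmlHttp.send(null);
}
```

Because the call returns immediately, the page remains responsive while the server works; the element is updated only when the callback fires with a successful response.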
All that remains is to set up a hyperlink to invoke the checksum calculation. When discussing Figure 11.11 in Section 11.6, we explained that a different macro is used to define the page content for each step of the process of depositing a new item in the repository. These are called step1content, step2content, … . Step 4 is the one that creates Figure 12.8a and the “calculate …” hyperlink embedded in the table is generated by
<a href="javascript:add\_checksum('tmp/_cgiargdi1tmp_/
_cgiargdi1timestamp_/_di1userfile_')">calculate …</a>
This calls the JavaScript add_checksum function with the appropriate filename as its argument. Files uploaded to the digital library Web server are placed in a subfolder of tmp that encodes the timestamp when they arrived. The macros _cgiargdi1tmp_, _cgiargdi1timestamp_, and _di1userfile_ encode the folders and filename used.

Digital music stand

In Section 10.6 (Non-textual documents) we show the basic mechanism Greenstone provides for displaying documents that comprise paged images. The facilities provided are rather austere and also a little cumbersome, because the browser loads a new Web page every time the user turns the page. Figure 12.9 illustrates a digital music stand that is integrated with Greenstone. It is also page based, but uses AJAX to create a smoother interaction with more capabilities. To reach this screen, the user has located a piece of music through the standard browsing and searching features; it is only when she has clicked to view the document that the differences become apparent. For instance, the digital music stand supports textual annotations: a non-destructive—and clearly typed!—analog of the hasty penciled notes that all musicians make. Halfway down the page in Figure 12.9a is a warning to watch the fingering. Notes can be positioned anywhere on the page.
Figure 12.9: An AJAX-based digital music stand: (a) adding an annotation; (b) an animated page-wipe
The digital music stand also supports an animated fast-to-slow page wipe. Turning pages is a critical problem for performing musicians, who are already under real-time stress and may be sight-reading the music for the first time. Turning a physical page suddenly obscures the current one, but in this electronic version the next page is overlaid gradually. The transition can be seen as a full-width horizontal bar halfway down the page in Figure 12.9b (colored red, in fact) that wipes downward, carrying the new page with it—so that the top of the next page can be seen along with the bottom of the current one. The motion is fast at first but slows down on reaching the point marked by the scroll-bar (on the right). This gives the musician plenty of time to finish playing the last line of the current page. Even if she reaches the end of the last few notes on the current page before the transition has completed, there is no need to panic because the beginning of the next page has already appeared at the top.
By default, the transition point from fast to slow is between the last two staff systems on the page. If the user changes this, the value is immediately transmitted (via an AJAX call) to the server and stored as metadata with that document, so that on returning to this page (later in the performance, or on a different day) the user's preference is restored. Likewise, annotations are stored as metadata. AJAX is also used to asynchronously load next and previous pages in the background before they are needed, and the dimensions of the user's screen window are sent to the server so it can generate an appropriately paginated version that makes full use of the available space.

12.3. The Greenstone Research Project

The Greenstone digital library software grew out of an ongoing project at the University of Waikato in New Zealand. It has been widely used internationally, and (as noted in the Preface) a joint project with UNESCO is promoting its use for practical digital libraries in developing countries. As a result, members of the project have strong motivation and a great deal of personal satisfaction. However, widespread adoption of Greenstone also has the side effect of suppressing innovation. There is a disincentive to introduce new research-led concepts and facilities into a production system because they are less reliable, difficult to test on multiple platforms, and often entail substantial upheavals in the software. Because of Greenstone's wide range of applications and computer platforms, modest changes can introduce bugs that affect users in unforeseen ways. Moreover, the interface and documentation have been translated into many languages, which itself discourages new development.

Research with Greenstone3

Our solution has been to create a new version of the software with the express intention that it constitute a framework for innovative research. We named it Greenstone3 to differentiate it from the production version described in this book, retrospectively dubbed Greenstone2 to coincide with the version number we had already been using. Extensive backward-compatibility was built in right from the start—Greenstone is the digital analog of the physical library building, and you don't have to rewrite the books or card catalogs when libraries move to new premises! Consequently, Greenstone3 can run collections built for earlier versions of Greenstone without any modification whatsoever. Other goals include different levels of customization, software modularity, dynamic services, distributed architecture, future compatibility, integrated documentation, self-describing modules, and integration into the computing environment.
Greenstone3 is a service-based architecture. A network of modules that provide these services communicate with each other via XML messages, which can optionally be set up to communicate in a distributed fashion using the Simple Object Access Protocol (SOAP) described in Section 7.4. Messages can be customized at any point using XSL transformations (mentioned in Section 4.4). Dynamically loadable Web services are layered on top of the communication channel. A “describe-yourself” call is a mandatory fixture that allows services to be discovered in a heterogeneous world of communicating applications; it also assists future compatibility.
From a research viewpoint, Greenstone3 has been a resounding success. We have used it to investigate such things as dynamic text mining, query visualization, alerting services, spatial searching, ontologies, and many more. We are capitalizing on the contents of digital libraries to automate the production and delivery of practice exercises for students who are learning English. We are exploring the role of AJAX in digital libraries to create richly interactive interfaces—for example, it is used for the Koru knowledge-based information retrieval system described in Section 9.3. This research tests the ability to provide future compatibility, because the rise of AJAX postdates Greenstone3's design.
Greenstone3 is an open source project and is under active development. However, demands from users for further enhancements to Greenstone2 have drawn more heavily on our time than we had foreseen when we embarked upon the re-implementation.

Reconciling research and production values

Greenstone3 was envisaged purely as a research framework: its use would require IT skills. It has achieved many of its goals—for example, it is now far easier for external projects to build upon the digital library core, as in the examples mentioned above. By and large, these external projects develop new or enhanced functionality that augments the document searching, browsing, and access already provided. Some projects, notably ones utilizing AJAX, produce new Web-browser interfaces that issue requests to the digital library infrastructure and manipulate the XML messages that are returned. Others develop standalone applications (using the programming language and toolkit that suit their needs) that integrate with the infrastructure at as deep a level as required. In applications that run on the same computer as the digital library system, the communication protocol can be bypassed in favor of direct procedure calls using an application programming interface.
We have found that maintaining two independent versions of the software—in particular, ensuring backward-compatibility when new and enhanced features are still being added to Greenstone2—stretches our resources too thinly. Consequently, we are now developing Greenstone3 to the point that, by default, users will find its installation and operation indistinguishable from those of Greenstone2. For example, the original Greenstone3 was installed by obtaining the code from a software repository and compiling it for your machine. Now, however, it uses Greenstone2's cross-platform installation wizard, which allows end users to install it easily. Greenstone's interface translations into different languages have been migrated to Greenstone3. Enhancements have been made to the default services so that what readers see through a Web browser can be matched, step for step, with what they already experience with Greenstone2. Automated regression testing helps minimize the problem of checking for undesired side effects of recently committed changes. Workshop and tutorial material have been tested to ensure that they work equally well with Greenstone3.
In terms of ingesting documents into a collection, we now utilize Greenstone2's exporting ability, which was originally designed to facilitate interoperability, to output to the registered METS profile utilized by Greenstone3. Greenstone2's collection-building code was upgraded to parse Greenstone3 configuration files, which allows the Librarian interface to operate equally well with either version. Overall, this pattern of gradually moving the two code bases closer together seems to be working well. For example, the realistic book facility illustrated in Figure 3.5 was originally planned for Greenstone3 alone. However, it turned out that an extra optional CGI parameter could be introduced into Greenstone2 that made it generate XML in the Greenstone3 document format, which allowed the work to be “back ported” to Greenstone2 and used in any existing collection.
There is one area where Greenstone3 still differs radically from Greenstone2, and that is in the format statements and macro language described in this chapter. In Greenstone3 their counterparts are based on XSL transformations, a mechanism that is more powerful but, unfortunately, even more difficult to use. However, XSL transformations can more readily support automatic construction of format statements and macros. We are designing an interactive editor for Greenstone digital library pages that will eventually render format statements and macros obsolete, much to the relief of users. However, this is a major undertaking, which is why we have based this book on Greenstone2 rather than the newer version. From the user's point of view, all other changes when upgrading to Greenstone3 will be invisible. In fact, you might like to download it (from www.greenstone.org/greenstone3-home) and see for yourself.
Looking to the future, our overarching goal is to broaden the realms in which digital libraries are utilized. We are now focusing on multimedia and enhancing Greenstone to incorporate content-based analysis for images, audio, and video, with particular specializations to newspapers, music, and maps. We foresee an extension mechanism as the key to managing this ever-widening and diversifying spectrum of research-inspired functionality, and we will produce an administration tool that facilitates general maintenance of an installation along with the ability to download and incorporate particular extensions.

Closing words

We believe that digital libraries are a key aspect of civil society. Their importance grows with every passing day. People are beginning to recognize the dangers of relying on one or two giant universal search engines for access to our society's treasure-house of information—which constitutes our entire literary, scientific, and cultural heritage. What the world needs, we believe, are focused collections of information, created and curated by people with an intellectual stake in their contents. We mean digital libraries, built by librarians!
We are proud and very pleased that the Greenstone Digital Library Software is playing a part in this international endeavor. Year by year, its technical strength grows and its rate of adoption increases. Yet this success has come at a price: although we are at heart a research group, we feel obliged to provide support for the growing user base of our production system—particularly in developing countries. This final section has given an insight into the development processes of the Greenstone software and conveyed our approach to the difficult problem of reconciling the production values of stability and reliability with a vigorous research framework of innovation in digital library software.