Format statements
Format statements introduce document metadata into text marked up in HTML. Additional control is provided by {If} and {Or} statements. These concepts are imported from computer programming languages but are no more difficult than formulas in a spreadsheet, which is familiar territory for many non-programmers. The structure of an Excel if expression is the same as Greenstone's: Excel's If(test,value-if-true,value-if-false) corresponds to Greenstone's {If}{test,value-if-true,value-if-false}. Unlike Excel, in Greenstone such statements are intermingled with HTML-tagged text, and the role of the curly brackets is to allow the runtime system to sift them out. The {Or} statement is logically unnecessary, providing a syntactic convenience in certain situations: everything it can do can also be done with a more verbose combination of {If}s.
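The claim that {Or} is only a convenience can be illustrated with a small Python model. This is a sketch, not Greenstone's implementation: the function names are hypothetical, and it assumes that a test succeeds when its value is non-empty and that {Or} yields the first non-empty value.

```python
def gs_if(test, if_true, if_false=""):
    """Model of {If}{test,value-if-true,value-if-false}.

    Assumption: a test succeeds when its value is non-empty.
    """
    return if_true if test else if_false


def gs_or(*values):
    """Model of {Or}{v1,v2,...}: assumed to yield the first non-empty value."""
    for value in values:
        if value:
            return value
    return ""


def or_via_if(first, second, third):
    """The same three-way choice written only with nested If statements."""
    return gs_if(first, first, gs_if(second, second, third))
```

For any three values, `gs_or(a, b, c)` and `or_via_if(a, b, c)` agree in this model; the nested form simply repeats each value as both test and result.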
In writing, lists are usually organized vertically, so the most commonly customized format statements are for VLists, or vertical lists (see Format features in Section 10.4). Here are some common patterns.
If a VList format statement starts with a <td> tag, it is assumed to be a row in a table. Greenstone automatically encloses it with <tr> … </tr> tags and encloses the entire table with <table> … </table>. Basic control over document items is provided by:
<td>[link][icon][/link]</td>
<td>[dc.Title]</td>
This generates a two-column table, with a link to the document in the first column and title metadata in the second. The original source document—or anything else—could be added in an extra column:
<td>[srclink][srcicon][/srclink]</td>
<td>[link][icon][/link]</td>
<td>[dc.Title]</td>
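The automatic table wrapping described for VLists can be pictured with a short Python sketch. It is only a model of the behavior stated in the text; the function name is hypothetical, and each input string stands for one list item's format statement after its metadata has been filled in.

```python
def wrap_vlist(rows):
    """Wrap each item's cells in <tr>...</tr> and the whole list in <table>...</table>.

    `rows` holds one string of <td> cells per list item (already expanded).
    """
    wrapped = ["<tr>{}</tr>".format(cells) for cells in rows]
    return "<table>\n{}\n</table>".format("\n".join(wrapped))
```

Calling `wrap_vlist` with two strings of cells yields a table with two rows, one per list item.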
Note that link, icon, srclink, and so on are actually treated as extracted metadata: they are generated by Greenstone and associated with each document. The Librarian interface presents names for extracted metadata as beginning with the prefix ex. However, this prefix is omitted internally: Greenstone assumes extracted metadata by default when it encounters unqualified metadata names like link, icon, and srclink.
Another pattern introduced in Chapter 10 is the use of image thumbnails instead of the icon that is automatically generated by [icon]:
<td>[srclink][thumbicon][/srclink]</td>
<td>[dc.Title]</td>
Note the similarity with the first of the two examples above. Exactly the same effect can be achieved by replacing the first table cell (i.e., line 1) above by this more complicated version:
<td>[srclink]<img src="_httpassocdir_/[assocfilepath]/[Thumb]"
width="[ThumbWidth]" height="[ThumbHeight]">[/srclink]</td>
In fact, behind the scenes the ImagePlugin sets thumbicon metadata to something equivalent to this expression (it also sets ThumbWidth and ThumbHeight metadata for the thumbnail image). While thumbnail images are common enough to warrant thumbicon metadata as a convenient shorthand, a huge variety of other associated files might be bound to a document, and this more explicit technique shows how to link to such files.
In the excerpt above, the src attribute with value _httpassocdir_/[assocfilepath]/[Thumb] creates a URL for the associated image. The image plug-in sets up the thumbnail using Greenstone's generic mechanism for a document's associated files. The macro _httpassocdir_ gives the name of the folder where Greenstone places associated files (in fact, this is the folder <collectionname>/index/assoc); within this file area, assocfilepath metadata gives the name of the associated-file folder for this document; and Thumb metadata gives the name of the actual thumbnail image. The same pattern can be used to construct <a href=…> hyperlinks to associated files. There is a range of plug-ins in Greenstone that set up associated files following this pattern, varying only the last part: the metadata name (or perhaps explicit filename) used for the actual resource associated with the document.
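The three-part URL pattern can be sketched in Python as follows. This is an illustration of the string assembly only, not Greenstone code; the function names are hypothetical, and the caller supplies the value of the _httpassocdir_ macro and the document's metadata as plain strings.

```python
from html import escape


def assoc_file_url(httpassocdir, assocfilepath, filename):
    """Join the macro value, the document's folder, and the file name into one URL."""
    parts = [httpassocdir.rstrip("/"), assocfilepath.strip("/"), filename.lstrip("/")]
    return "/".join(parts)


def thumb_img_tag(httpassocdir, metadata):
    """Build the <img> tag from Thumb, ThumbWidth, and ThumbHeight metadata."""
    src = assoc_file_url(httpassocdir, metadata["assocfilepath"], metadata["Thumb"])
    return '<img src="{}" width="{}" height="{}">'.format(
        escape(src, quote=True),
        escape(str(metadata["ThumbWidth"]), quote=True),
        escape(str(metadata["ThumbHeight"]), quote=True),
    )
```

Swapping `metadata["Thumb"]` for another metadata name, or for an explicit file name, gives the URL of any other associated file, which is the variation the paragraph above describes.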
Another useful pattern encountered in Section 10.4 (Formatting exercise 1) is:
<td>[link][icon][/link]</td>
<td>{If}{[numleafdocs],[Title]([numleafdocs]),[dc.Title]}</td>
The first cell is the same as in the first example above. The second uses the variable numleafdocs in two different roles. Its first occurrence tests whether the variable has a value, in order to determine which of the two branches of the If to take. The metadata numleafdocs is only set at internal nodes of a hierarchy (e.g., bookshelves in a classifier), and gives the total number of leaves below this node, which is printed in parentheses by the second occurrence above. In either case, the title is printed: the extracted title [Title] for internal nodes, which is set to the metadata value corresponding to that node; and the Dublin Core title [dc.Title] for leaf nodes (i.e., documents).
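The two roles of numleafdocs can be traced in a short Python model of the second table cell. It is a sketch under the assumption that unset metadata behaves like an empty string; the function name and the dictionary representation of a node's metadata are hypothetical.

```python
def title_cell(metadata):
    """Model of <td>{If}{[numleafdocs],[Title]([numleafdocs]),[dc.Title]}</td>."""
    numleafdocs = metadata.get("numleafdocs", "")
    if numleafdocs:
        # Internal node (e.g., a bookshelf): extracted title plus the leaf count.
        text = "{}({})".format(metadata.get("Title", ""), numleafdocs)
    else:
        # Leaf node (a document): Dublin Core title only.
        text = metadata.get("dc.Title", "")
    return "<td>{}</td>".format(text)
```

A bookshelf with Title "History" and numleafdocs "12" produces `<td>History(12)</td>`, while a document with only dc.Title set produces a cell containing just that title.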
When format statements become overly complicated, they can often be simplified by going back to the source documents and assigning metadata that supports what you want to present. For example, if different types of document need to be presented in different ways (article, paper, book, …), you could define a metadata value, say dc.Format, that specifies the type. If different types are in different folders in the input material, this can be quickly done by assigning metadata in the Librarian interface at the folder level. This metadata is not necessarily presented to the user; rather, it is used as internal metadata that helps simplify the logic of If statements.
Macros
Greenstone macros control the structure of all Web pages generated; they also provide language-independent interfaces.
Figure 12.1 gives an artificial excerpt that illustrates how macros are defined and used. Macro definitions usually comprise a name flanked by underscores, and a definition placed within braces { … }. Lines beginning with the hash character (#) are comments. A second form of definition is for macros with arguments (see below). Macro definitions contain a mix of plain text, HTML, and references to other macros. For conciseness, the definitions in Figure 12.1 are short. In practice they often span 20 lines or more, indented to improve readability. Macros do not contain references to metadata values like [dc.Title], because macro definitions are not bound to particular documents.
Macros are grouped into packages, and an inheritance scheme is used to determine which definitions are in effect at any given time. This allows the content generated for a page to be embedded within a global formatting style. For instance, most pages have a standard header and footer that enclose page-specific content. Figure 12.1 shows a baseline page defined in the Global package, whose _content_ macro is not intended to be seen. It is overridden by the immediately following query package that generates a page inviting users to enter search terms and perform queries.
Macros can include parameters interposed in square brackets between name and content. These are known as page parameters because they control the overall generation of a page, and they are expressed as [x=y], which gives parameter x the value y. The _header_ macro in Figure 12.1 uses the page parameter l, which determines the interface language. Another parameter (not shown in this example) is v, which controls whether images are used in the interface. A precedence ordering for page parameters is built into the macro language to resolve conflicting definitions.
Figure 12.1 gives versions of the _header_ macro in English, French, and Spanish. They set the parameter l to the two-letter international standard abbreviation (ISO 639), enabling the system to present the appropriate version when the page is generated. If a macro has no definition for a given language, the parameter is simply omitted, which in standard Greenstone produces an English version because this is the most complete and up-to-date language for the interface. However, it is easy to change the default. It is specified in the main configuration file (see Configuration files in Section 11.2) using the cgiarg option, specifying (for French) cgiarg shortname=l argdefault=fr.
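The language fallback can be modeled in a few lines of Python. This is a deliberately simplified sketch, not Greenstone's precedence algorithm: the function name, the dictionary layout (with `None` standing for a definition that has no l parameter), and the example texts are all hypothetical.

```python
def resolve_macro(definitions, language):
    """Pick the definition whose l parameter matches, else the one without a parameter.

    `definitions` maps a two-letter language code (or None for the
    parameterless definition) to the macro text.
    """
    if language in definitions:
        return definitions[language]
    return definitions[None]


# Hypothetical definitions of one macro; the parameterless one is in English.
HEADER_TEXT = {
    None: "Search page",
    "fr": "Page de recherche",
    "es": "Página de búsqueda",
}
```

Here `resolve_macro(HEADER_TEXT, "fr")` selects the French text, while a language with no definition of its own, such as `"de"`, falls back to the parameterless English text.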
The macro language has its own version of the If statement, _If_; an example appears in Figure 12.1's _content_ macro. In this case it uses the macro _cgiargqb_ to determine whether to display a normal query box or a large one on the search page. The value of _cgiargqb_ is set at runtime by the Greenstone system; its value is ultimately determined by the user's setting on the Preferences page, which can be controlled through a pull-down menu that offers the choice of query style as single line or large. Choosing large causes the Web page to set the HTML form field qb to large, which in turn leads the runtime system to set _cgiargqb_ to large. Several other system-defined macros have values that are determined at runtime, for example the number of documents returned by a search. We return to this point below.
The final macro definition in Figure 12.1, _matches_, illustrates the use of arguments. If invoked by _matches_(5), this generates the text 5 document(s) matched your query. An If statement could be added to separate the singular and plural cases, rather than resorting to the clumsy “document(s)”, but that is not the point this part of the figure is meant to demonstrate. Notice the use of _1_ in the macro definition to refer to the first argument passed in when the macro is invoked. The pattern continues with _2_ for the second argument (if required), and so on; and _matches_(5,3) would be used for a two-argument macro. It is, however, not normal to call a macro with fixed numeric arguments such as 5 and 3. Values set by the runtime system (such as _numdocs_) or metadata from a document are more common.
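Positional argument substitution can be imitated in Python to make the mechanism concrete. This is a sketch of the idea only, not how Greenstone expands macros; the function name is hypothetical, and the definition text is the one quoted above for _matches_.

```python
import re


def expand_macro(definition, *args):
    """Replace _1_, _2_, ... in a macro definition with the matching argument."""

    def substitute(match):
        index = int(match.group(1))
        if 1 <= index <= len(args):
            return str(args[index - 1])
        # No such argument was passed: leave the placeholder untouched.
        return match.group(0)

    return re.sub(r"_(\d+)_", substitute, definition)


# Definition text of _matches_ as described in the paragraph above.
MATCHES = "_1_ document(s) matched your query."
```

In this model `expand_macro(MATCHES, 5)` yields the same sentence that _matches_(5) is said to generate, and a definition containing both _1_ and _2_ would consume two arguments.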
Figure 12.1 is artificial in both content and layout. Although macro definitions from different packages can be mixed together in a single file, as here, this is not the norm. Usually, different files control the layout of different pages, such as about.dm, query.dm, and home.dm for the Greenstone About, Query, and Home pages. There are two exceptions to this convention. One is a collection-specific macro file called extra.dm, which contains any macros defined in the Macros section of the Librarian interface's Format panel. Another is the set of macros containing textual language fragments, which are grouped into files by language (english.dm, french.dm, and so on). In fact, each language has two files, for example english.dm and english2.dm, the former containing phrases used in the core Greenstone system and the latter containing those in auxiliary subsystems. The distinction is made to avoid overwhelming translators and to help them focus on the text used most often.
The standard Greenstone Reader's interface is built entirely out of macros. It is an excellent source of additional examples of how to use them.
Commonly used macros
Table 12.1 lists three values defined in Greenstone's site configuration file (gsdlsite.cfg, Section 11.2) that are set as macros _httpprefix_, _httpweb_, and _gwcgi_. The purpose of these values is to map Greenstone into the Web server's space, with increasing levels of refinement. Greenstone is usually installed in a single folder in the file system (say greenstone-2.83), which is mapped to a unique prefix in the Web server's configuration file that is also recorded in _httpprefix_. For example, setting it to /greenstone would be a good choice. If a more up-to-date version of Greenstone were installed later, this could be placed in a new folder on the file system, collections moved across as needed, and the Web server's configuration file altered to map /greenstone to the new location.
Table 12.1: Global macros defined in gsdlsite.cfg
| Definition | Description |
|---|---|
| _httpprefix_ | Specifies the HTTP address of the Greenstone home directory, from which you can build up URLs for images and collections |
| _httpweb_ | Specifies the HTTP address of the top-level web folder in the Greenstone home directory |
| _gwcgi_ | The URL of the Greenstone digital library, which is an executable CGI program named library.cgi (the gw, short for gateway, in gwcgi is a historical hangover) |
The _httpweb_ macro specifies how the root area for supporting files such as images, CSS, and JavaScript in Greenstone is mapped to the Web server's space. By default, the folder web, located in the folder where Greenstone is installed, forms the root, with sub-folders for images, style, and script. Continuing the configuration example from before, having set _httpprefix_ to /greenstone, _httpweb_ would need to be set to /greenstone/web. In fact, on this occasion it can be omitted as this is its default value. The macro _gwcgi_ provides the URL for accessing the digital library. The norm is to leave this undefined, in which case, for the configuration being described, the library's URL is /greenstone/cgi-bin/library.cgi.
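The way the two optional values default from _httpprefix_ can be summarized in a small Python function. It is a sketch based only on the defaults stated above; the function name and the returned dictionary are hypothetical and do not correspond to any Greenstone API.

```python
def resolve_site_urls(httpprefix, httpweb=None, gwcgi=None):
    """Fill in the defaults described in the text when httpweb or gwcgi is omitted."""
    prefix = httpprefix.rstrip("/")
    return {
        "httpprefix": prefix,
        # Default: the web folder directly beneath the prefix.
        "httpweb": httpweb or prefix + "/web",
        # Default: the library CGI program beneath cgi-bin.
        "gwcgi": gwcgi or prefix + "/cgi-bin/library.cgi",
    }
```

With only `"/greenstone"` supplied, the function returns `/greenstone/web` and `/greenstone/cgi-bin/library.cgi`, matching the configuration example in the text.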
These three macros are fundamental to Greenstone's operation, and numerous definitions in the macro files are based upon them.
Table 12.2 shows some macros that are often used for defining other macros and in format statements. The upper entries define partial URLs that pinpoint general areas within the installation from which full URLs can be formed. The lower part gives fully formed URLs that can be used directly to link to pages like a collection's Preferences or Query page.
Table 12.2: Macros in base.dm that define URLs
| Group | Definition | Description |
|---|---|---|
| Partial URLs | _httpimages_ | URL prefix for where the site-wide images folder is located in the Greenstone home directory |
| | _httpstyle_ | URL prefix for where the site-wide CSS style folder is located in the Greenstone home directory |
| | _httpcollection_ | URL prefix for where the current collection is located |
| | _httpcimages_ | URL prefix for where the collection-specific images folder is located |
| | _httpassocdir_ | URL prefix for where the collection's associated files directory is located |
| Full URLs | _httppagex()_ | Useful macro for forming URLs to specific pages |
| | _httppagehome_ | Link to the home page of the digital library |
| | _httppageabout_ | Link to the About this collection page |
| | _httppagehelp_ | Link to the Help page |
| | _httppagepref_ | Link to the Preferences page |
| | _httpcurrentdocument_ | Link to the current document |
Other macros dictate what goes into pages. We have already seen that the overall structure is controlled by _header_, _content_, and _footer_. Table 12.3 summarizes useful macros that splice in further information, be it graphics, text, or anything else that can be embedded into HTML, such as JavaScript. For example, _pagescriptextra_ is used in the design patterns below to introduce JavaScript functions that extend Greenstone's functionality by improving how items are presented. The lower part of the table gives macros that are set by the runtime system. For example, _navigationbar_ can only be defined once the collection's indexing and browsing structures have been determined.
Table 12.3: Macros that help generate pages
| Group | Definition | Description |
|---|---|---|
| Determined in advance | _pagetitle_ | What gets displayed at the top of the browser window |
| | _pagescriptextra_ | Extra JavaScript to include in the header |
| | _pagebannerextra_ | Anything extra to display in the page banner |
| | _pagefooterextra_ | Anything extra to display in the page footer |
| Determined at runtime | _navigationbar_ | Navigation bar populated with search and browse elements |
| | _builddate_ | When the collection was last built |
| | _numbytes_ | Number of bytes indexed |
| | _numdocs_ | Number of documents indexed |
| | _numsections_ | Number of sections indexed |
| | _numwords_ | Number of words indexed |
| | _httpnextarrow_, _httpprevarrow_ | Hyperlinks to the next and previous sections of the document being viewed (only set when performing the document action, i.e., a=d in the CGI arguments) |
| | _numpages_ | Number of pages in a document (only set when viewing a document processed by PagedImagePlugin) |
| | _thisOID_ | File name given to document at import time (based on, but not necessarily exactly the same as, its document identifier) |
| | _versionnum_ | Version number of the Greenstone installation |
Another kind of macro definition that is set by the runtime system takes the form _cgiarg…_, where the ellipsis gives the name of an argument in the URL. We met this in Section 10.6 when switching between images and text in a collection of paged images. It is a useful way to incorporate new functionality. There are no setup requirements: nothing has to be done in advance to declare a new variable. Greenstone sets a macro of the form _cgiarg…_ for every CGI argument in the URL, and macro files can immediately start using these to access the variable. As a result, macros can change what is presented to the user. This is how Greenstone was extended to support the display of HTML documents as realistic books (Figure 3.5).
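The mapping from CGI arguments to _cgiarg…_ macros can be pictured with a short Python sketch built on the standard library's URL parser. It models the naming convention only and is not Greenstone's runtime code; the function name and the example URL are hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit


def cgiarg_macros(url):
    """Return one _cgiarg<name>_ entry for every CGI argument in the URL."""
    query = urlsplit(url).query
    return {
        "_cgiarg{}_".format(name): value
        for name, value in parse_qsl(query, keep_blank_values=True)
    }
```

For a URL ending in `?a=d&qb=large&l=fr`, the result contains `_cgiarga_`, `_cgiargqb_`, and `_cgiargl_`, with no declaration needed for any of them; an argument that nobody anticipated would appear in exactly the same way.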