Chapter 27. Using SharePoint Server for Search

Introduction

In previous chapters, we went through installation of the Microsoft Office SharePoint Server 2007 (MOSS) multiserver farm, as well as basic usage scenarios of a majority of the MOSS features. Installation and configuration of Search, even for simple use of out-of-the-box features, is very important to the overall health of your MOSS farm, as well as the health of your end users’ search results.

Search in MOSS 2007 offers many significant improvements over SharePoint Portal Server 2003, and starting with the Office 2007 release, it is also available as a standalone product. Search is an increasingly important tool for end users as more and more data is stored in repositories that are indexed by MOSS Search. But why is it important to an end user? Consider the task of finding a document stored either on a local computer or somewhere on a corporate file share. Finding the document in the first scenario may take some time, depending on the user’s personal folder organization or use of a desktop search tool, but the second scenario may take hours. This is where the Search part of MOSS steps in. Not only will it index data in MOSS sites, but it will also index other data sources such as websites, file shares, Microsoft Exchange, and, with additional work, many line-of-business applications via its Business Data Catalog. A well-tuned Search can save hours of productivity per person.

Whether you are migrating from SharePoint Portal Server 2003 or just starting out, the Search components require some planning and configuration; just installing Search will not yield any results. Hence, working with Search will be one of the key tasks for an administrator while configuring all of the elements of the MOSS farm. For medium and large companies, implementing and tuning Search will become a significant project task. Many different options should be researched before the implementation, including the number of servers involved, use of dedicated servers, scheduled crawls, and use of content access accounts.

More significantly, there will be day-to-day operational tasks associated with Search, ranging from service monitoring and dedicated backups to reviewing usage reports and involving content owners in tuning best bets and relevancy results.

Finally, developers can work with Search in many different ways, as many aspects of the technology are meant to be extensible. It is important to think of Search as yet another potential development platform. Not only can the developers write indexing add-ons, such as IFilters, but the developers can also write custom search applications or even customize existing elements of Search. A nice example of such a Search customization is the People search, a distinct tab that offers different options from a standard Search tab. The People search offers unique Web Parts and improved display of the results specifically for finding other people.

At the end of this chapter, you will be able to:

  • Understand Search from the end user’s perspective

  • Understand different features of Search

  • Understand the architecture of Indexing and Search

  • Understand the basic administration elements of Search services

  • Understand advanced configuration options

  • Understand the extensibility options associated with Search

Search from the End User’s Perspective

Search is one of the few applications that have become somewhat commoditized in the eyes of the end users. With Internet growth being fueled by many search services—such as Yahoo, Google, and MSN—everyone has an opinion and decent knowledge of what “search” is, what it should do, and how fast it should work. Search applications that deviate from common characteristics can be misunderstood by users and their adoption can fail. Fortunately, Search services in MOSS follow the majority of these characteristics, and therefore should be very familiar to most users.

Basic use of search is intuitive enough that almost no training is necessary, but it may be worthwhile to educate users on advanced searches, and eventually on enhanced application searches. MOSS can search almost all of the data stored on the server, including lists, content stored on different pages, document properties (metadata), and document contents (full-text search).

Warning

There are some small differences in Search when moving from Windows SharePoint Services 2.0 or SharePoint Portal Server 2003 to Windows SharePoint Services 3.0 or MOSS 2007. Although we will not dwell on the details, it is worth mentioning that WSS-style site searches now automatically include subsites, and automatically use Portal features (if available). Additionally, use of multiple search words automatically uses AND, not OR as in the past, which significantly improves the search results.

We’ll start off by discussing the common elements of Search that will be utilized by the end users, and then move on to the common tasks that power users or web designers can undertake to customize the search experience under MOSS.

Search Elements Across Pages

Search as an application includes a number of different elements that are visible to the end user (Table 27-1). Figure 27-1 shows a modified tab in a Search center application, with custom Web Parts from a Knowledge Network application. Conceptually, the tab is similar to Google but provides some improvements, such as the Refine Your Search Web Part, which allows for interactive filtering of search results.

Table 27-1. Search elements

| Element | Purpose/Placement |
| --- | --- |
| Search Box | All Portal Pages, except settings and property pages |
| Search Scopes (Figure 27-2) | Link to advanced search page; Search scopes drop-down (context-sensitive up to a list or site); Specialized use of search box with certain property keywords or symbols (implicit AND between search words) |
| Search Center (Figure 27-1) | Search-specific part of a portal enabled via use of features, also available as template; Customizable by site owners, in terms of location, appearance, new types of searches, or utilization of certain properties; Typically contains a search box and links to advanced search; Can contain specialized tabs for unique searches, e.g., people, external people, etc. |
| Search Results (Figure 27-3) | Same as Search center, plus results Web Parts; Any restrictions (language, number of results) can trigger advanced search |
| Advanced Search (Figure 27-4) | Ability to use complex logic; Additional filters available: language, content type, property; Search multiple scopes |

As seen in Figure 27-1, a customized MOSS-based Search application, such as the Knowledge Network enhancement, can provide tremendous value to an organization. A collection of Web Parts provides a web-friendly way to:

  • Locate exact matches

  • Locate relevant matches (organized in an easily understood way)

  • View other relevant information (such as web advertisements)

  • Work with search refinement options

  • Leverage additional help

Tip

Knowledge Network will be a freely downloadable add-on to MOSS as a Technical Preview in the first half of 2007. Watch the KN blog at http://blogs.msdn.com/kn/.

Figure 27-1. Search Web Parts in action for Knowledge Network

In this sample, many different parts of Search have been utilized, but in most circumstances, the basics of Search will suffice. Next, we will go through the very simple scenarios of searching basic content embedded directly within a MOSS site.

Using Search

A user’s first direct experience with Search will be via the Search Box, available in the top-right corner of the page (Figure 27-2). The search box allows the user to select a scope of the search, which is either context-sensitive (this site) or preconfigured by the administrator. Next, the user can enter search text, and either click on the magnifying glass to proceed with the search or select Advanced Search.

The search text itself can contain keywords, special characters, and property filters. If multiple search terms are used, MOSS assumes an implicit AND. Hence, when a user types in “quick brown fox,” the search is interpreted as “quick AND brown AND fox.” In the previous versions, SharePoint Search used OR, which led to too many results. Search also recognizes different styles of keywords. Keywords can be classified as words (one or more characters with no spaces or punctuation), phrases (multiple words enclosed in quotation marks), or prefixes (part of a word). Further, keywords can be combined with special characters, which signify inclusion in or exclusion from search results. The default behavior of using multiple words without special characters is a simple AND, but without the guarantee of inclusion.

Figure 27-2. Use of Search Scopes

Table 27-2 lists these special characters.

Table 27-2. Special characters

| Character | Action |
| --- | --- |
| + | Results must contain the term that follows “+” |
| - | Results must not contain the term that follows “-” |
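To make the keyword rules above more concrete, here is a minimal Python sketch of how a query string could be split into words, quoted phrases, and terms marked with + or -. It is not the MOSS query parser; the names `ParsedQuery`, `parse_keyword_query`, and `matches` are our own, matching is a plain substring test, and the sketch treats plain terms and + terms alike, which simplifies the behavior described above.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ParsedQuery:
    required: list = field(default_factory=list)  # terms prefixed with "+"
    excluded: list = field(default_factory=list)  # terms prefixed with "-"
    terms: list = field(default_factory=list)     # plain words and phrases (implicit AND)


# A token is an optional +/- prefix followed by a quoted phrase or a bare word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')


def parse_keyword_query(text):
    """Split a keyword query into required, excluded, and plain terms."""
    parsed = ParsedQuery()
    for sign, phrase, word in TOKEN.findall(text):
        term = (phrase or word).lower()
        if not term:
            continue
        if sign == "+":
            parsed.required.append(term)
        elif sign == "-":
            parsed.excluded.append(term)
        else:
            parsed.terms.append(term)
    return parsed


def matches(parsed, document_text):
    """Simplified matching: every wanted term present, no excluded term present."""
    text = document_text.lower()
    wanted = parsed.required + parsed.terms
    return all(t in text for t in wanted) and not any(t in text for t in parsed.excluded)


query = parse_keyword_query('+sharepoint -"draft copy" search')
print(query.required, query.excluded, query.terms)
```

A real engine works against an index rather than raw text, so this only illustrates how the syntax is interpreted.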

The next major enhancement to MOSS Search is the ability to use Property Filters without using the Advanced Search option. This is a popular extension to the major search engines, and users should be educated about this feature in order to utilize it successfully. Table 27-3 lists some of the popular properties.

Table 27-3. Properties

| Name | Description |
| --- | --- |
| site | The URL property that must be specified in full. It can be a property from a SharePoint site or an external site, if such was indexed. This should not contain a trailing “/” symbol. |
| author | Author’s name. It must be enclosed in quotes if it contains a space, or a network ID can be used. |
| title | Title property. It should be enclosed in quotes if it contains spaces. |
| duplicate | Allows the search to identify duplicate results (which are typically collapsed). |
| scope | Specifies the friendly name for the scope, as in the Scope drop-down box. |

Armed with these different elements of search syntax, we can take a look at some examples and interpret what they would represent in Table 27-4.

Table 27-4. Queries and interpretation

| Query | Interpretation |
| --- | --- |
| Author:"Piotr Prussak" site:http://portal/sites/training | Find documents authored by Piotr Prussak in a training site. |
| Title:sharepoint -author:"bob fox" | Find anything with a title “sharepoint” AND where Bob Fox is not the author. |
| Department:HR department:IT | Find anything where the custom column “department” is set to HR OR IT. |
| Site:http://portal1 site:http://portal2 | Find results from either of the portals. |

One key difference between use of standard keywords and properties is that when two Property Filters for the same property are used, the resulting query is an OR query and not an AND query.
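The OR-within-a-property, AND-across-properties rule can be sketched in a few lines of Python. This is an illustration under our own assumptions, not the MOSS implementation: the function names are hypothetical, documents are plain dictionaries of property values, and values are compared by exact, case-insensitive equality.

```python
import re
from collections import defaultdict

# name:value or name:"quoted value", optionally prefixed with "-" to exclude.
FILTER = re.compile(r'(-?)(\w+):(?:"([^"]*)"|(\S+))')


def parse_property_filters(query):
    """Group filters by property: {name: {"include": [...], "exclude": [...]}}."""
    filters = defaultdict(lambda: {"include": [], "exclude": []})
    for negated, name, quoted, bare in FILTER.findall(query):
        value = (quoted or bare).lower()
        filters[name.lower()]["exclude" if negated else "include"].append(value)
    return dict(filters)


def matches(filters, properties):
    """AND across different properties, OR across values of the same property."""
    for name, rule in filters.items():
        value = str(properties.get(name, "")).lower()
        if rule["include"] and value not in rule["include"]:
            return False  # none of the OR-ed values matched
        if value in rule["exclude"]:
            return False  # explicitly excluded value
    return True


filters = parse_property_filters("department:HR department:IT")
print(matches(filters, {"department": "IT"}))
print(matches(filters, {"department": "Sales"}))
```

Grouping by property name first is what makes the two `department` filters behave as alternatives rather than as contradictory requirements.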

What about security? At query time, Enterprise Search hides all results that the current user of the website is not entitled to view. In other words, a user will see a result only if they already have access to the underlying item by other means.
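Conceptually, query-time trimming is a filter applied to the raw result set before anything is shown. The following Python sketch illustrates only the idea; the `Result` type and its `allowed` field are hypothetical stand-ins for whatever access information the index actually stores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    url: str
    allowed: frozenset  # users and groups permitted to read the item (hypothetical ACL)


def trim_results(results, user, groups):
    """Keep only the results that the user, or one of the user's groups, may read."""
    identities = {user, *groups}
    return [r for r in results if r.allowed & identities]


results = [
    Result("http://portal/hr/salaries.xls", frozenset({"HR"})),
    Result("http://portal/docs/handbook.doc", frozenset({"Everyone"})),
]
visible = trim_results(results, "bob", ["Everyone"])
print([r.url for r in visible])
```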

The results screen, in its simplest form (Figure 27-3), has a number of interesting elements. First, there is a repeat of the search query in the query box. Next, near the top, there are links allowing the user to change the ordering of the results, as well as to set up alerts or an RSS feed for the underlying search (which is processed once a day). The following line represents an approximate number of results, the processing time, and a simple paging mechanism.

Finally, the results are broken down into High Confidence Results, or Exact Matches (if there are any), and partial matches. The results display an icon, signifying the type of the result, title of the result, and partial text with hit highlights. Underneath each result, the URL as well as some additional metadata are shown.

For those who would prefer to change any of the elements of MOSS search, the layouts and the contents of the Search and results pages are quite customizable. Indeed, many things can be easily modified or reconfigured. Later parts of the chapter are dedicated to the possible modifications of the Search features.

Tip

There are also several third-party products that further enhance the MOSS search capability. You can find a number of them at the Microsoft Partner Site: http://directory.partners.extranet.microsoft.com/.

Finally, the advanced search screen (Figure 27-4) offers clear options to users who would like a very targeted and specific search, but who may not be familiar with all of the keywords and property filters that could otherwise be entered in the search box. Advanced search allows users to specify the search terms in a text entry box, to disambiguate a potential use of AND and OR, and to select the use of the + and - symbols.

Figure 27-3. Standard search results

Additionally, the screen allows additional specification of the language, type of the target documents, and entry of any of the additional properties that would mimic the use of a property filter in the earlier examples. In order for a property to be listed in the Pick Property drop-down, the property value must be both crawled and managed. Both of these settings are covered in the “Administering Search Services” section.

Tip

For the truly hardcore users who like to tinker with different tools and APIs, there are two more ways to utilize Search services without dragging out Visual Studio 2005 and using the .NET APIs. One of them is the simple URL access, and the second is the Search web service.
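As a rough illustration of the simple URL access approach, the following Python sketch builds a link to a results page. It assumes that the results page accepts the keywords in a `k` query string parameter and the scope name in an `s` parameter, and the page path shown is only an example; verify both against your own Search Center before relying on them.

```python
from urllib.parse import urlencode


def build_search_url(results_page, keywords, scope=None):
    """Return a results-page URL for the given keywords and optional scope.

    The parameter names "k" (keywords) and "s" (scope) are assumptions
    about the results page, not something this chapter specifies.
    """
    params = {"k": keywords}
    if scope:
        params["s"] = scope
    return f"{results_page}?{urlencode(params)}"


url = build_search_url(
    "http://portal/searchcenter/Pages/results.aspx",  # example path only
    'author:"Piotr Prussak"',
    scope="All Sites",
)
print(url)
```

Letting `urlencode` escape the quotes, colons, and spaces avoids hand-built links that break on property filters.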

Search Web Parts and Search Center

The next natural step is customization of the Search features visible to the end user. Although there are many administrator-type and developer-type options available, there are also many things that can be done by information workers and power users of the portal. Search Center, after all, works on the same infrastructure as all other pages, and is based on the Web Part architecture. MOSS’s Search Center ships with five pages and 11 Web Parts. Although information workers should also be involved in fine-tuning configuration of Search services, knowledge of the Search Center pages is important, as it introduces concepts such as Keywords, Best Bets, Search Statistics, and content targeting.

Figure 27-4. Advanced search screen

Tip

You can’t really edit pages behind the simple WSS Search. WSS Search is typically employed when the This Site scope is used in the text box.

Before you start with Search Center, you have to make sure that your site has Search deployed, or that your site is linked to a portal with a Search Center. If there is no Search Center deployed, it can simply be created as a subsite.

To create a Search Center subsite from the Site Actions menu:

  1. Select Create.

  2. Select Sites and Workspaces.

  3. Fill in typical site details (name, URL).

  4. Select either Search Center With Tabs or Search Center from the Enterprise templates box.

  5. Click Create.

The Search Center with Tabs template includes an in-page tab-like interface (which does not interfere with top navigation) and provides an additional People search tab. This template is a preferred interface when different Search screens might be developed for more specialized end user searches.

Warning

There may be a temptation to allow for a lot of personalization, but Search is one of the areas where this can lead to unnecessary confusion. With out-of-the-box settings, a Contributor can edit a search page only if she created it. As an administrator, you can always prevent people from creating pages by modifying permissions in the Pages library.

The Results page features seven Web Part zones, which allow you to add any Web Part from the Web Part catalog (the default query page has only two zones, and the Advanced Search has four zones). You can switch to the Edit Mode (Figure 27-5) by clicking on the Site Actions menu. In order to edit the page, it will need to be checked out, and in order for everyone to see the changes, the page will have to follow the standard process associated with editing Web Part pages: it must be checked in, published, and approved.

Figure 27-5. Search page tab in Edit Mode

With the page in Edit Mode, you can now move, add, delete, or configure any of the Web Parts. Typically, the Results page is of most interest, as most of the prebuilt Web Parts are available. All of the Web Parts on this page work very well with each other, and they plug into the same result set available on the page. When adding new Web Parts within the Search Center, the Add Web Parts pop-up groups all Search-related Web Parts in one logical unit (Figure 27-6).

Figure 27-6. Search Web Parts

Table 27-5 shows the Web Parts that are directly related to Search. The majority of them are customizable even further, and some can provide data to other Web Parts.

Table 27-5. Web Part names and purposes

| Web Part name | Purpose |
| --- | --- |
| Advanced Search Box | Used for advanced searches on the Advanced Search page (an example of customization of this Web Part is provided in the “Adding Custom Column to Search” section). |
| People Search Box | Used for people-specific searches. |
| People Search Core Results | Displays core results for People-related searches. |
| Search Action Links | Search action links include RSS, Alert, and sorting links. |
| Search Best Bets | Displays related Keyword, Best Bet, and High Relevancy items. These are covered in more detail in “Advanced Configuration Options,” later in this chapter. |
| Search Box | The standard search box. |
| Search Core Results | Displays the results of most common searches. This is the most frequently customized Web Part. |
| Search High Confidence Results | Shows Keywords, Best Bets, and other high-confidence results. |
| Search Paging | Displays links for navigation between pages. |
| Search Statistics | Displays basic search statistics. |
| Search Summary | Shows autocorrection suggestions in a “Did you mean” format. |

For instance, the Search Core Results Web Part can be easily configured to show different results, including the internal contents of the resulting text. The display is controlled by a selection of columns and a configurable XSLT. This is a very flexible Web Part and should be the first element of customization of any of the result pages.

In order to configure the Core Results Web Part, the page needs to be in Edit Mode, and the user needs to click the Edit button next to the Web Part. Unfortunately, due to the unusual size of the Web Part zone, the property editor may have to be minimized and the browser maximized in order to see the properties. Some of the most useful properties are listed in Table 27-6.

Table 27-6. Useful properties

| Property | Purpose |
| --- | --- |
| Results Per Page | Number of hits to show per page |
| Sentences In Summary | Length of the text under the title of the result |
| Highest Result Page | Maximum number of results that the user can reach |
| Default Results View | Ordering of the results |
| Remove Duplicate Results | Check if duplicate results should be collapsed |
| Enable Search Term Stemming | Check if keywords and results can be approximated using word stem (e.g., running versus run) |
| Permit Noise Word Queries | Check if previously defined noise words can be utilized in search |
| Selected Columns | XML definition of the columns to be retrieved |
| XSL Editor | XSLT style sheet that can be used to transform the results |

Working with existing pages and Web Parts may not necessarily be sufficient in every single case where some deeper level of customization is desired. For instance, there may be some fixed queries, or potentially a brand-new application developed with new Web Parts (similar to the Knowledge Network mentioned earlier). In such cases, new pages and new tabs can be added to the Search Center site, as seen in Figure 27-7.

Figure 27-7. Create Page in Search Center

The process is very familiar:

  1. From the Search Center, click on Site Actions.

  2. Select Create Page.

    Tip

    In order for the Search to work, you should add two or three separate pages to the site: a Search page, a Results page, and optionally an Advanced Search page. Once a Search page is added, you will need to configure Web Parts and edit Target Search options.

  3. On a newly created Search Page, select the Search Box Web Part and click Edit.

  4. In the Miscellaneous section (Figure 27-8), edit the Target Search Results Page URL to point to the newly created Results page.

  5. Optionally, you may also edit the Advanced Search Page if you have created one.

    Figure 27-8. Configuration of Search Box

  6. Modify any other Web Part settings on both pages.

  7. Click Add New Tab to complete navigation.

  8. Typically, you will also want to add a custom search Scope and content type via SSP to target search results.

Different Features of Search

So far we have taken a look at Search functionality from the end user’s (and perhaps a power user/developer’s) perspective. Now, switching to the IT professional or a business decision maker, we need to look at the features that make Search happen behind the scenes. From the overall feature perspective, there are some major features of Search that ship specifically with MOSS, and some that ship directly with Windows SharePoint Services. Following the direct comparison of the most important features, there are also some secondary and nontrivial features that are worth mentioning.

Warning

True wildcard search is not readily available within the basic User Interface, but it is supported via the APIs. Additionally, it is supported by some third-party tools, such as Ontolica.

Unlike the previous version, where WSS team sites were not capable of using the SharePoint 2003 Search, once a WSS site functions under the Office Server itself, it is capable of inheriting all of its features. Without the Office Server, the features are slightly limited (Table 27-7).

Table 27-7. WSS and MOSS Features

| Common Features | WSS | MOSS |
| --- | --- | --- |
| Index SharePoint Content | X | X |
| Index Web sites, Exchange, file shares, Notes, LOB |  | X |
| Rich Results | X | X |
| Alerts | X | X |
| RSS | X | X |
| Remove Duplicates | X | X |
| Scopes, Managed Properties |  | X |
| Best Bets, Results Removal, Query |  | X |
| Tabs |  | X |
| People Search, Knowledge Network |  | X |
| Business Data Catalog |  | X |
| Query API, Web Service | X | X |
| Admin API |  | X |

Tip

Results Removal is a new feature that allows quick removal of a URL from the results. As it occasionally happens, a popular document can be deleted. Instead of allowing the users to find the document in the search results and meet with disappointment (at least until the index is refreshed), the document can be manually removed from search results.

Clearly, the differences between the two products are quite significant in terms of Search technology, starting from the advanced indexing capabilities of MOSS down to the superior Web Parts. What about other esoteric features and improvements over other products and versions?

Word stemming

This is the ability to properly trim endings of words at index and query time so that proper matches can be made. This also means users don’t have to create wild-card searches to find words that are closely related to one another; for example, “bathing” and “bather” will both use the stem “bath.” To use this feature you have to turn on Enable Search Term Stemming in the Search Core Results Web Part.

Security trimming

The ability to properly filter results and display only the results to which the user has appropriate access.

Improved relevancy

New logic within the indexes maps the distance between terms as well as relevancy within a document. For example, a term that appears in a headline is treated as more important than one in the body of the document.

Query reporting

A great improvement for the server operator and the person who is responsible for managing overall quality of the data and search results within an organization. For example, people often look for particular HR documents, such as company vacations, but since every person may use a different search term, the administrator can set up keywords, best bets, and a thesaurus to point users to the document.

Shared engine

The search engine is now shared among all of the other Microsoft applications. This means that all of the bug fixes and improvements will ideally trickle in faster than if there were multiple engines to maintain.

Performance improvements (continuous propagation, security change crawl, change log crawl)

A lot of improvement went into the new indexing and gathering elements of the search, and they work significantly better and faster than in the past.
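To illustrate the word stemming feature described above, here is a deliberately naive Python sketch that strips a few common English suffixes so that related words share a stem. The suffix list and function names are our own; the stemmer that ships with Search is language-aware and far more sophisticated.

```python
# Naive suffix stripping for illustration only; longer suffixes are tried first.
SUFFIXES = ("ing", "ers", "er", "ed", "s")


def naive_stem(word):
    """Return the word with one common suffix removed, if the stem stays >= 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stems_match(query_term, document_word):
    """Two words match when they reduce to the same stem."""
    return naive_stem(query_term) == naive_stem(document_word)


print(naive_stem("bathing"), naive_stem("bather"))
```

Because stemming is applied to both the query term and the indexed word, users do not need wildcard searches to find closely related words.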

Tip

If the existing security trimmings are not sufficient, as may be the case with scans of external catalogs, a custom security trimming may be developed through the ISecurityTrimmer interface.

Architecture of Indexing and Search

Because the Search component is the only element of the MOSS infrastructure that tends to span just about every physical piece of the environment, it is also the most challenging to set up and to configure. A simple introduction to its architecture and to the inner workings of the system can help you avoid future setup nightmares and configuration pitfalls.

To start off, see Figure 27-9 for a graphical representation of communication paths between some of the core features and elements of Search. In the middle, we have the core elements behind Microsoft Search: the index engine and the query engine. The index engine retrieves search configuration data from the database and is responsible for crawling the data sources and compiling the index. The query engine, on the other hand, only works with the built index (as well as some additional configuration data) to serve the Search results.

Warning

Contrary to popular opinion, MOSS Search does not perform a full-text search directly against the data in the database. The gatherer part of the Index engine actually walks through every known piece of content against a Web Front End (WFE) server using the standard HTTP protocol, discovering the content before deciding whether it fits the criteria for being indexed.

Figure 27-9. Core elements of Search technology

Data flow across processes

Perhaps the best illustration is the actual flow of data within the Search, from the time a document is uploaded to the portal to the moment that the document is found via search by another end user:

  1. Document is uploaded to the portal by the end user.

  2. Index engine starts its gatherer to collate content.

  3. Index server retrieves data from the configuration database and retrieves rules, crawler rules, and impact rules.

  4. Index server selects the appropriate protocol handler to work with the MOSS data source.

  5. Index engine starts looking at the Web Front End server.

  6. Index engine finds a new document via the change log and retrieves its metadata.

  7. Index engine checks the rules to see whether the document should be included in the index.

  8. Index engine retrieves the document itself.

  9. Index engine opens an appropriate handler (IFilter) to do a full-text index of the document.

  10. Index engine processes the document, and places the appropriate information in the content index database.

  11. Database is propagated to the search engine.

  12. Document can be found via search.
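The steps above can be sketched as a small Python pipeline. Everything here is a simplified stand-in that we made up for illustration: the `FILTERS` table plays the role of IFilters, wildcard patterns play the role of crawl rules, `fetch` stands in for an HTTP request to the front end server, and the index is an in-memory dictionary rather than index files and database tables.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

# Hypothetical stand-ins for IFilters: one text extractor per file extension.
FILTERS = {
    ".txt": lambda raw: raw,
    ".htm": lambda raw: raw.replace("<p>", " ").replace("</p>", " "),
}


def is_crawlable(url, exclude_patterns):
    """Step 7: apply wildcard rules to decide whether a URL may be indexed."""
    return not any(fnmatchcase(url, pattern) for pattern in exclude_patterns)


def crawl(change_log, fetch, exclude_patterns):
    """Steps 6-10: build an inverted index (word -> set of URLs)."""
    index = defaultdict(set)
    for url in change_log:                       # step 6: new items from the change log
        if not is_crawlable(url, exclude_patterns):
            continue
        extension = url[url.rfind("."):].lower()
        extract = FILTERS.get(extension)         # step 9: pick a handler by file type
        if extract is None:
            continue                             # no filter installed for this type
        text = extract(fetch(url))               # step 8: retrieve the document
        for word in text.lower().split():
            index[word].add(url)                 # step 10: record the word
    return index


def propagate(index):
    """Step 11: hand a read-only copy of the index to the query side."""
    return {word: frozenset(urls) for word, urls in index.items()}


def search(query_index, words):
    """Step 12: implicit AND over all query words."""
    hits = [set(query_index.get(word.lower(), ())) for word in words]
    return set.intersection(*hits) if hits else set()


documents = {
    "http://portal/docs/a.txt": "Quick brown fox",
    "http://portal/private/b.txt": "quick secret",
}
index = crawl(list(documents), documents.get, ["http://portal/private/*"])
print(search(propagate(index), ["quick", "fox"]))
```

Keeping `crawl` and `search` separate, with `propagate` in between, mirrors the split between the Index server and the Search servers.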

Warning

Because the index propagation is done via NetBIOS, your propagation may run into trouble if there is a firewall between the Index and the Search servers.

Backup of indexes

Although the data in the portal can be backed up with an off-the-shelf SQL backup tool, the backup of the index is slightly more complicated because the index includes database tables as well as the resulting index files. As such, you must use the built-in backup utilities to achieve a full fidelity index backup. With a basic SQL backup, you will back up the database part of the index, which means that upon a restore, index propagation will commence, and the end users will not be able to search immediately. Thus, as part of your disaster recovery plan, you should include the time to propagate the indexes.

Server architecture

Depending on the needs and characteristics of the deployment, you may choose a variety of configurations. The typical choices will be influenced by server geography, volume of data, and volume of searches. Table 27-8 describes these factors and their solutions. There are several important rules to observe:

  • There can be multiple Search servers that will answer the queries made by the end users or via the API.

  • There can be only one Index server per SSP.

  • You can have a dedicated front end server to be used for crawling content.

Table 27-8. Factor and solution

| Factor | Solution |
| --- | --- |
| Multiple geographic sites | Consider using multiple SSPs if the volume of data is significant in each site. |
| Large volume of content | Index daily, and use incremental updates combined with continuous propagation. If index times spill over to business hours, consider use of a dedicated server for crawling data. |
| High frequency of updates | Index more frequently, and observe closely the speed of updates as well as performance of the front end servers. |
| Large volume of search queries | Monitor search performance and add new Search servers as necessary. |
| High impact of indexing on front end servers | Utilize a dedicated front end server if the impact of the performance is noticeable to the end users. |

Warning

When using a dedicated Web Front End server, the underlying mechanism adds entries to the HOSTS file on the Index server. This may not work if multiple network cards are used and host headers are not, and because SharePoint adds the entries automatically, such failures can be hard to troubleshoot.

Administering Search Services

Administration of Search services can be a daunting task. The larger the server deployment, and the more content stored, the harder things become. Ideally, administration and configuration should be a task for a broad team, and could potentially include application architects, server and network operators, SharePoint administrators, and analysts and information workers. The majority of the administrative elements begin at the SharePoint Operations Center during the initial configuration of the portal and the Shared Services Provider (SSP), and then move on to the individual Web Applications and Site Collections. Options that are configured during the initial setup of the farm are accessible from the Manage Search Service page within the Application Management section, which is covered later in “Advanced Configuration Options.”

Tip

When initially installed, MOSS does not automatically crawl any content; crawls have to be initiated manually, or scheduled via the Content Sources and Crawl Schedules page.

Assuming that an SSP has already been configured in a farm, the bulk of the Search configuration lies in the Configure Search Settings Page, as seen in Figure 27-10.

Figure 27-10. Search Settings page

Key Search Crawl Settings properties and action pages that can be managed are listed in Table 27-9.

Table 27-9. Search Crawl settings

Crawl setting

Description and use

Content Sources and Crawl Schedules

This page allows immediate management of crawls, as well as configuration of various sources of data and their schedules, which can be Full or Incremental. This is covered in detail in the upcoming “MOSS Content Source configuration” section.

Crawl Rules

Administers the crawl rules that decide whether a given page or site will be crawled (typically at least one per Web Application). Rules are specified via wildcards, and different content access accounts can also be configured here.

File Types

Lists and manages extensions of files that will be indexed. Note that Access, ZIP, or PDF documents are not included here. If you add a new file extension to be indexed, ensure that the appropriate IFilter is also installed on the Index server. For instance, to index Acrobat files, you need to download and install the Adobe Acrobat IFilter.

Crawl Logs

Report pages that indicate the actual results of each crawl, which are useful in troubleshooting crawl problems. Results are broken down by hostname (or URL) and status of each crawl attempt. Further, you can filter on specific warning or error messages that are causing problems. Generally, if you start seeing a lot of errors in the crawl logs, it may mean that either data is changing quite often or the front end server may not be able to cope with the load induced by the indexer.

Default Content Access Account

This page allows you to configure an account that would be used for crawling. It should not be an administrator or a well-known account. This account will also be added to the Policy for Web Application with Full Read permission.

Metadata Property Mappings

Administer the ability to search site columns and map them to some meaningful names for use with Search and Advanced Search. Use of this feature is described in the upcoming “Adding Custom Column to Search” section.

Server Name Mappings

This page is used to provide translations between the addresses used in the crawl and addresses being returned in the results. This is similar to the Alternate Access Mappings page, but typically is used with other content sources.

Search Based Alerts

Use these settings to activate or deactivate subscriptions to alerts for search results (compiled daily).

Search Result Removal

Lists any URLs that should be removed from Search Results. This is useful for any deleted or embarrassing search results.

Reset All Crawled Content

Erases the content index.
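To make the Crawl Rules entry in Table 27-9 more concrete, here is a minimal Python sketch of how an ordered list of wildcard rules might decide whether a URL gets crawled. It is a model for reasoning only, not SharePoint code: the rule list, the URLs, and the should_crawl function are hypothetical, Python's fnmatch stands in for SharePoint's wildcard matching, and the sketch assumes that the first matching rule wins.

```python
from fnmatch import fnmatchcase

# Hypothetical rule list: (wildcard pattern, include?) pairs in priority order.
CRAWL_RULES = [
    ("http://intranet/hr/private/*", False),  # exclude this branch
    ("http://intranet/*", True),              # include the rest of the site
]

def should_crawl(url, rules=CRAWL_RULES, default=True):
    """Return the decision of the first rule whose pattern matches the URL.

    Assumption: rules are evaluated top to bottom and the first match wins.
    """
    for pattern, include in rules:
        # Compare case-insensitively, since URLs here are treated as such.
        if fnmatchcase(url.lower(), pattern.lower()):
            return include
    return default  # no rule matched

print(should_crawl("http://intranet/hr/private/salaries.aspx"))
print(should_crawl("http://intranet/news/default.aspx"))
```

Under that assumption, ordering matters: if the broad include rule were listed first, the narrower exclude rule would never be reached.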

Warning

If you use an administrative account to crawl site content, and hide unpublished drafts from ordinary readers, the search results may show some undesirable parts of the document in the document summary, as the crawl account will have access to the unpublished document.

Additionally, Shared Services are also responsible for the Scope Administration pages and the Authoritative Pages (covered in the “Advanced Configuration Options” section later in this chapter).

In the Scope Administration pages, you can manage farm-wide scopes that are visible within the search boxes. Scopes can be configured as either shared per farm or applicable to a specific site. Additionally, scopes can target specific Search Pages as well as specific content. Scope rules can be made out of the following:

  • Web addresses

  • Properties (based on Author, contentclass, Site, or SiteName)

  • Content sources

  • All content

Further, each rule can be Included, Required, or Excluded. As you add new scopes to the system, it is important to keep mental track of their exclusivity. If you want to have separate and independent scopes for certain document types or websites, make sure that one scope includes a rule (or requires it) and another one excludes that particular rule. In fact, the All Sites scope excludes the SPSPeople content class (shown in Figure 27-11), and as a result, would never display people on the main search results page.

Figure 27-11. Scope Properties and Rules

If, for instance, your portal depends on a particular document type, you can create a scope rule where a contentclass property restriction is equal to the document type in question. Similarly, a scope rule can be based on a subsite or a custom content source, such as your Exchange public folders.
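The interplay of Included, Required, and Excluded rules is easier to keep track of with a small model. The following Python sketch shows one plausible reading: an item belongs to a scope when it matches no Exclude rule, matches every Require rule, and matches at least one Include rule when any are defined. The in_scope function and the sample items are hypothetical, and the contentclass value simply uses the name given in the text.

```python
def in_scope(item, rules):
    """Decide scope membership for an item (a dict of properties).

    rules is a list of (behavior, predicate) pairs, where behavior is
    "include", "require", or "exclude". This is an assumed model of the
    rule semantics, not SharePoint's implementation.
    """
    include = [p for b, p in rules if b == "include"]
    require = [p for b, p in rules if b == "require"]
    exclude = [p for b, p in rules if b == "exclude"]
    if any(p(item) for p in exclude):
        return False              # a single Exclude match removes the item
    if not all(p(item) for p in require):
        return False              # every Require rule must match
    if include:
        return any(p(item) for p in include)
    return bool(require)          # only Require rules: they decide alone

# Model of the All Sites scope described above: all content, minus people.
all_sites = [
    ("include", lambda item: True),
    ("exclude", lambda item: item.get("contentclass") == "SPSPeople"),
]

print(in_scope({"contentclass": "SPSPeople"}, all_sites))
print(in_scope({"contentclass": "Document"}, all_sites))
```

Writing the rules down this way makes it easier to check that two scopes that are supposed to be independent really do exclude each other's content.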

Advanced Configuration Options

The next few options are typically used with bigger farms, or when there is a need to gather data from external sources or to fine-tune the querying or search results.

Application Management: Manage Search Service

One of the Search administration pages actually lies within the Application Management page, within the Manage Search Service section (Figure 27-12). It is a mishmash of farm-wide and application-specific settings, which generally ties the integration between the application, the farm, and the SSP on a single page. The Shared Services section simply indicates and links to the Shared Services Provider associated with a given Web Application, as well as the management pages we’ve worked with throughout this chapter.

Figure 27-12. Manage Search Service

The Manage Search Service page is the place where you can change the options that were originally set during the configuration of the farm, either on the Farm-Level Search Settings page or via the Office SharePoint Server Search Indexing and Query link.

The Manage Farm-Level Search Settings page deals with configuration for external searches, and allows you to configure information such as:

  • An email address associated with the crawler

  • Proxy settings to be used when crawling other servers

  • Connection and request acknowledgement timeouts

  • SSL certificate name warnings settings

Next, in Figure 27-13, you can see the Configure Office SharePoint Server Search Service Settings On Server servername administration page. This page mixes server and farm settings for configuration of service-related properties. Table 27-10 lists the search configuration elements.

Table 27-10. Search configuration elements

Configuration element

Description

Query and Indexing

Indicates the role for which the server will be used. Note that the Office SharePoint Server Search must be started for this to be available.

Contact E-mail Address

This is the email address associated with the crawling account.

Farm Search Service Account

This is an account that is used by the Windows service, not the account that is used to gather data. It must have privileges to use the search databases and to run as a service.

Index Server Default File Location

This is the location where index files are stored, but in order to edit it, you must use the STSADM utility in the BIN folder. You should monitor disk activity and place this on a fast, dedicated disk if necessary.

Indexer Performance

This allows the administrator to set the impact of indexing on SQL and internal server resources.

Web Front End and Crawling

This setting allows you to select a dedicated WFE as a target for crawling. Note that there are some situations in which a dedicated WFE does not work (described previously in the “Architecture of Indexing and Search” section).

The last element of the farm-level management is the Impact Rules management page, which allows us to edit Crawler Impact Rules, as in Figure 27-14.

Crawler Impact Rules are used to throttle the retrieval of requests that are made against the Web Front End (WFE) server. In many cases, the WFE simply is not capable of serving as many pages as the crawler can request. This results in poor performance at the WFE, along with crawl errors associated with the data source. Most likely this type of problem indicates an underpowered server, combined with heavy ASP.NET processing, which results in longer processing of the pages. This can be fixed in two ways: by disallowing the crawl of the ASPX pages within the site (this is configured within the Site administration settings) or by changing the Request Frequency within an Impact Rule.

Figure 27-13. Office SharePoint Server Search Service Settings

Figure 27-14. Add Crawler Impact Rule

Each Impact rule is associated with a URL that is used for crawling a SharePoint site, and then a request frequency, which can be represented either in terms of the number of simultaneous requests or in terms of a delay between requests. You must look at the performance of the WFE server to determine the optimal request frequency, while also balancing out the need for a short indexing time.
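The trade-off between WFE load and indexing time can be estimated with simple arithmetic before you settle on a request frequency. The Python sketch below is a back-of-the-envelope model, not a measurement tool: the function name, the page counts, and the per-page response time are hypothetical, and it assumes the WFE responds at a constant rate regardless of load.

```python
def estimated_crawl_hours(pages, seconds_per_page,
                          simultaneous_requests=None, delay_seconds=None):
    """Estimate crawl duration for one of the two request-frequency styles.

    Exactly one of simultaneous_requests or delay_seconds must be given,
    mirroring the two ways an Impact Rule expresses request frequency.
    """
    if (simultaneous_requests is None) == (delay_seconds is None):
        raise ValueError("specify exactly one of simultaneous_requests "
                         "or delay_seconds")
    if simultaneous_requests is not None:
        # N requests in flight: throughput scales with N (idealized).
        total_seconds = pages * seconds_per_page / simultaneous_requests
    else:
        # One request at a time, with a pause after each page.
        total_seconds = pages * (seconds_per_page + delay_seconds)
    return total_seconds / 3600

# Hypothetical site: 36,000 pages, each served in half a second.
print(estimated_crawl_hours(36000, 0.5, simultaneous_requests=4))
print(estimated_crawl_hours(36000, 0.5, delay_seconds=1.5))
```

Even with these idealized numbers, the gap between the two settings shows why a delay-based rule should be reserved for front ends that genuinely cannot keep up.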

MOSS Content Source configuration

Content Source configuration allows for indexing and management of schedules of different sources of data that flow through the index. Most commonly, this will be used to set up a schedule for a default data source, but also it will be used to add external content to the Search engine, such as external web sites or file shares.

Content Source configuration is available from the SSP, from the Content Sources And Crawl Schedules page. There you can see the listing of available Content Sources and their schedules. Clicking the New Content Source link opens the Content Source Configuration page (Figure 27-15).

After specifying the name, you have to specify the desired indexing handler or effectively indicate the appropriate Content Source Type, which can be one of the following:

  • SharePoint Sites

  • Web Sites

  • File Shares

  • Exchange Public Folders

  • Business Data

Next is the start address, which will typically be http://servername or \\servername\share for a folder share. Unlike previous versions of Index server, you can associate a single content source with multiple addresses. In order for the Business Data crawl to take place, the Business Data Catalog must first be configured within the Shared Services Provider. For the Exchange Server, you’ll typically configure the search of public folders via the HTTP protocol.

The Crawl Settings section adjusts dynamically and is context-sensitive to the Content Source Type (Table 27-11). Crawl Settings displays several options, ranging from single page crawls to unlimited crawl within the content source. It is inadvisable to use very broad settings, especially for web sites (e.g., unlimited page or server hops), as broad settings can overwhelm the content source, and the crawl may never finish indexing. In fact, it may be easier and more economical in the long run to integrate results from external search engines into MOSS rather than index big sites.

Tip

When adding a People source to an existing SharePoint Sites index, you have to use an sps3:// moniker instead of the usual http:// in order to leverage it.

Figure 27-15. Content Source Configuration page

Table 27-11. MOSS Crawl Settings

Content Source Type

Crawl Settings

SharePoint Sites

Crawl everything

 

Crawl only the top level site

Web Sites

Crawl all pages within the server

 

Crawl single page

 

Custom

 

Allow x page hops

 

Allow x server hops

File Shares

Crawl folder and all subfolders

Exchange Public Folders

Crawl folder only

Business Data

Crawl entire Business Data Catalog

 

Crawl a specific BDC application

The last setting for each Content Source is the schedule, which resembles a typical scheduling calendar. Besides the choice of Crawl Schedule frequency, which can be set at Daily, Weekly or Monthly intervals, there are additional settings that indicate how often the given schedule should be repeated and how long the crawl should last. Each Content Source can have two different types of crawls, Full Crawl or Incremental Crawl. Obviously, to build an index from scratch, you will need to start off with a Full Crawl, and the Incremental Crawl will intelligently pick up additions, deletions, and changes to the content source. There are, however, other times when the Full Crawl must be run:

  • Changes to the inclusion/exclusion rules

  • Changes to a crawl account

  • Changes to file types and IFilters

  • Changes to Property Mappings

  • Major changes to WSS sites that delete the Change Log
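The conditions in the preceding list can be condensed into a small decision helper, which is handy if you keep a change log for your farm. This Python sketch is purely illustrative: the event names and the crawl_type function are hypothetical labels for the situations listed above, not identifiers from any SharePoint API.

```python
# Hypothetical event names, one per situation that calls for a Full Crawl.
FULL_CRAWL_TRIGGERS = {
    "crawl_rules_changed",             # inclusion/exclusion rules
    "crawl_account_changed",
    "file_types_or_ifilters_changed",
    "property_mappings_changed",
    "change_log_deleted",              # major changes to WSS sites
}

def crawl_type(events, index_exists=True):
    """Return "full" or "incremental" for the next scheduled crawl."""
    if not index_exists:
        return "full"  # an index is always built from scratch first
    if FULL_CRAWL_TRIGGERS & set(events):
        return "full"
    return "incremental"

print(crawl_type(["content_added"]))
print(crawl_type(["content_added", "crawl_account_changed"]))
```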

Tip

If you have a specific site that requires high-frequency updates, consider adding a new Content Source specific to that site and fine-tuning the Incremental Crawl schedule to a point where the index would always be fresh enough.

In most circumstances, a daily Incremental Crawl schedule should be sufficient.

Authoritative pages

Generally, the Authoritative pages make sense only in the context of using multiple Content Sources, and the concept of Result Relevance may not be applicable to single portal sites. Essentially, the Authoritative pages increase the relevancy of a page by decreasing the click distance to the source of truth. Administration is very simple, as there are four entry boxes, allowing for entry of the following:

  • Most Authoritative Pages

  • Second-Level Authoritative Pages

  • Third-Level Authoritative Pages

  • Sites to Demote

As such, when indexing a number of content sources and observing the quality of the search results (as described in the next section), there is a possibility that a page with better information could be presented further down the results listing than a page with the best match. Thus, in order to increase the rank position, the administrator should add the site that hosts the more relevant data as the Most Authoritative Page, and perhaps add the useless site as the Site to Demote.

This feature is useful when indexing many content sources that potentially have similar data—for instance, your intranet site versus your public Internet site. Most likely, your intranet would have the most relevant information.
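The idea of click distance can be illustrated with a breadth-first search over a link graph. The Python sketch below only demonstrates the concept: the link graph and URLs are hypothetical, and the chapter does not describe how MOSS actually weighs click distance in its ranking, so no ranking formula is implied.

```python
from collections import deque

def click_distances(links, authoritative_pages):
    """Return the fewest clicks from any authoritative page to each page.

    links maps a page URL to the URLs it links to. The search starts
    from all authoritative pages at once (distance 0).
    """
    distance = {page: 0 for page in authoritative_pages}
    queue = deque(authoritative_pages)
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in distance:       # first visit is the shortest
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance

# Hypothetical intranet link graph.
links = {
    "http://intranet/": ["http://intranet/hr/"],
    "http://intranet/hr/": ["http://intranet/hr/vacation.aspx"],
}

print(click_distances(links, ["http://intranet/"]))
print(click_distances(links, ["http://intranet/", "http://intranet/hr/"]))
```

Adding the HR site as an authoritative page moves the vacation page one click closer to the source of truth, which is the effect the administrator is after.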

Search Query/Results Monitoring

One of SharePoint’s most desirable features for information officers is the ability to monitor query and results statistics via the Query Reporting Tool (Figure 27-16). The tool creates graphs as well as different types of reports that can be exported to Excel or PDF formats. These are available for each SSP within the Search Usage Reports section, and the Search Results page is available via the menu on the lefthand side of the page. Based on this information, you may want to fine-tune best bets, scopes, authoritative sites, and plan capacity of your Search services. This page displays the following reports:

  • Queries Over Previous 30 Days (bar chart)

  • Queries Over Previous 12 Months (bar chart)

  • Top Query Origin Site Collections Over Previous 30 Days (pie chart)

  • Queries Per Scope Over Previous 30 days (pie chart)

  • Top Queries Over Previous 30 Days (text summary)

Figure 27-16. SSP Search Queries Reports

Similarly, the Search Results Report has the following reports available:

  • Search Results Top Destination Pages (text summary)

  • Queries With Zero Results (text summary)

  • Most Clicked Best Bets (text summary)

  • Queries With Zero Best Bets (text summary)

  • Queries With Low Clickthrough (text summary)

The Search Results Reports page is very useful because it highlights the problem areas, or at least the areas where addition of Best Bets or Authoritative sites may increase the end users’ ability to find relevant information using Search.

Unfortunately, these pages are buried within the administrative features, and may not find an appropriate audience. The typical consumer of these reports should be the site collection administrator who is managing the Best Bets.

Search Configuration in Sites and Site Collections

Looking away from the SSP, and at individual sites and portals, there are many advanced search settings that are configurable only from the Site Collection or Site level. In order to open these settings, you have to select Site Settings (or Modify All Site Settings) from the Site Actions drop-down menu. Depending on the site template and privileges, your Site Settings page may be hidden under a particular submenu. Figure 27-17 displays the Site Settings page at the Site Collection level, and highlights management pages that are relevant to Search.

Figure 27-17. Site Settings page at Site Collection level

Depending on the context of the Site Settings page, some of the management options that are listed in Table 27-12 and visible in Figure 27-18 may not be available; however, if they are available, their meaning and management screens will be identical.

Table 27-12. Search Configuration options

Page

Description and use

Searchable Columns

This page allows you to view all of the columns used on the site and to select the columns that should be excluded from crawling. For columns that are not relevant or are frequently updated (via some automated means), this could be a way of exclusion from repetitive crawls.

Search Visibility

This page sets web-level availability of the results within Search. Additionally, the page allows fine-grained control over indexing of the ASPX pages on the site.

Related Links Scope Settings

Use these to manage local site scopes that will be visible only when using a Search Box within the site.

Search Settings

Set the ability to use a specific Search Center and advanced scopes for a Site Collection.

Search Scopes

Manage local Site Collection scopes against the scopes that are provided by the SSP.

Search Keywords

Ability to manage keywords, best bets, and any associated approvals. This is a place where a site collection administrator can easily influence the contents of the search results screen.

 

Expired Keywords

 

Keywords Requiring Review

Finally, the Add/Edit Keyword page, as seen in Figure 27-18, can be launched at a site collection level from the Manage Keywords page. It provides a quick way for an information officer to enhance the search experience. Based on the Query and Search Result data, you can easily steer the users in a proper direction by providing additional synonyms, a definition, or even best bets to increase the value of the Search Results page (this is where the contents will eventually show up).

Similar to the result advertisements in Google, the targeted results show up on the righthand side of the page (refer back to Figure 27-1). Typically this feature is used when a large number of queries target a particular search term—for instance, “vacation” on an HR site, with no proper exit pages. In such a case, the administrator can associate the term “vacation” with a couple of synonyms, such as “holidays” or “days off,” and provide a link to the company policy on vacations.

A Keyword or a Keyword Phrase consists of the following elements: synonyms, best bets, definition, contact, and various dates associated with publishing and visibility. Synonyms are important because they allow association of various elements with a best bet or a definition. Best bets are important because they allow the site administrator to advertise a particular link or a location as a good result for a particular query. Similarly, a definition, as well as a contact, can provide additional clues to the end user about the applicability of the suggested result.
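The relationship between a keyword, its synonyms, and its best bets can be pictured as a simple lookup. The following Python sketch models the “vacation” example above; the dictionary layout, the definition text, the URL, and the best_bets_for function are all hypothetical and do not reflect how SharePoint stores keywords.

```python
# Hypothetical keyword store modeled on the "vacation" example.
KEYWORDS = {
    "vacation": {
        "synonyms": {"holidays", "days off"},
        "definition": "Paid time away from work.",
        "best_bets": ["http://intranet/hr/vacation-policy.aspx"],
    },
}

def best_bets_for(query, keywords=KEYWORDS):
    """Return the best bets for a query that matches a keyword or synonym."""
    term = query.strip().lower()
    for keyword, entry in keywords.items():
        if term == keyword or term in entry["synonyms"]:
            return entry["best_bets"]
    return []  # no keyword applies; only regular results are shown

print(best_bets_for("Holidays"))
print(best_bets_for("payroll"))
```

The sketch shows why synonyms matter: without the “holidays” entry, the second most likely phrasing of the query would surface no best bet at all.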

Figure 27-18. Add Keyword

Warning

When managing a list, one of the options on the Customize List page is the Indexed Columns page. This indexing is not directly related to Search, but rather to CAML queries, similar to the way that SQL Server columns are indexed. If none of the Web Parts or custom code utilizes CAML, there is no need to add any columns for indexing.

Adding Custom Column to Search

Besides enhancing the search with a keyword or a best bet, the second typical scenario is the ability to add a site column that shows up in a custom list to the Advanced Search screen. For the sake of the exercise, we’re assuming that this column already exists and is called “Company Name”:

  1. Open SSP and navigate to the Search Settings Page.

  2. Open the Metadata Property Mappings page.

  3. Click New Managed Property:

    1. Provide a name for the new property, “Company Name” (making sure it is unique).

    2. Select the proper type; in our case, it is Text.

    3. Click Add Mappings and find the Company Name (Text) Property in the New Managed Property screen by searching “Company” and clicking Find.

    4. Click OK.

  4. Perform a Full Crawl.

  5. The content is now available to be searched via standard search.

  6. Modify the Advanced Search Box Web Part on the Advanced Search page:

    1. Expand the Properties section and open the Properties dialog box to add new properties that should be shown.

      Tip

      Open the contents in an XML editor, and do not edit this in the default entry box.

    2. Add the following to the <PropertyDefs> node:

      	<PropertyDef Name="Company Name" DataType="text" DisplayName="Company" />
      
    3. Add the following to the <ResultType> node:

      	<PropertyRef Name="Company Name" />
      
    4. Click OK twice and exit the page Edit Mode.

  7. Results should be visible on the screen.
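If you would rather prepare the Properties XML from step 6 outside the browser, the edit can be scripted. The Python sketch below applies the two additions to a deliberately minimal stand-in document: the real Web Part XML contains more nodes than shown here, the existing Author entries are placeholders, and the add_property function is a hypothetical helper. You would still paste the result back into the Web Part by hand.

```python
import xml.etree.ElementTree as ET

# Minimal stand-in for the Web Part's Properties XML (assumed structure).
PROPERTIES_XML = """<root>
  <PropertyDefs>
    <PropertyDef Name="Author" DataType="text" DisplayName="Author" />
  </PropertyDefs>
  <ResultTypes>
    <ResultType DisplayName="All Results" Name="default">
      <PropertyRef Name="Author" />
    </ResultType>
  </ResultTypes>
</root>"""

def add_property(xml_text, name, data_type, display_name):
    """Add a PropertyDef and a matching PropertyRef to the first ResultType."""
    root = ET.fromstring(xml_text)
    defs = root.find("PropertyDefs")
    ET.SubElement(defs, "PropertyDef",
                  Name=name, DataType=data_type, DisplayName=display_name)
    result_type = root.find(".//ResultType")
    ET.SubElement(result_type, "PropertyRef", Name=name)
    return ET.tostring(root, encoding="unicode")

print(add_property(PROPERTIES_XML, "Company Name", "text", "Company"))
```

Parsing and re-serializing the XML also acts as a well-formedness check, which is the main risk when editing this property in the default entry box.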

Extensibility Options Associated with Search

Although Web Part and application development areas are covered in different chapters, Search has its own set of APIs available for consumption. The basic APIs are fully covered in the Object Model chapter, and the Search Web Service in the chapter on Web Services, but there are some additional technologies of interest:

  • WSS .NET APIs

    Microsoft.SharePoint.Search.Administration namespace
    Microsoft.SharePoint.Search.Query namespace
  • MOSS .NET APIs

    Microsoft.Office.Server.Search.Administration namespace
    Microsoft.Office.Server.Search.Administration.Security namespace
    Microsoft.Office.Server.Search.Query namespace
    Microsoft.Office.Server.Search.WebControls namespace
  • Search Web Service

  • Specialized SQL Syntax

  • URL syntax for executing queries

  • IFilter technology

Warning

There are also several obsolete legacy Portal APIs that should not be used. They are in the Microsoft.SharePoint.Portal namespace, and have all been replaced by the Microsoft.Office.Server namespace.

The preceding list gives a good account of all extensibility options that are available to Search developers. The key APIs specific to execution and access to SharePoint are obviously the .NET APIs, but all of the work underneath is actually carried out via specialized SQL-like search queries. Also of some interest is the URL syntax for executing queries, as it essentially allows the developer to reuse the existing search infrastructure to quickly provide some customized searches.

Specialized SQL syntax

The Enterprise Search extends the SQL-92 and SQL-99 standards to provide additional functionality in the area of search. For those who already know standard SQL, the query format is very familiar:

	SELECT <columns>
	FROM SCOPE( )
	WHERE <conditions>
	ORDER BY <columns>

The following table describes these standard clauses.

Clause

Description

SELECT

Specifies columns to be returned

FROM SCOPE()

Somewhat deprecated; has to be FROM SCOPE(), and the actual scope is selected within the WHERE clause

WHERE

Specifies Search conditions that indicate a match

ORDER BY

Specifies sort order for the results

Although the basic clauses are the same as in standard SQL, some other features and keywords may be unavailable, and there are two specialized predicates: CONTAINS and FREETEXT. Additionally, the main areas where Search queries extend SQL Search include the ability to use 127-character column names, accent insensitivity, use of a thesaurus, and a looser interpretation of the NULL predicate. A typical query to display a title, path, and author is:

	SELECT title, path, author FROM Scope() WHERE CONTAINS('author:Piotr Prussak') AND
	"scope"='Books' AND FREETEXT(DEFAULTPROPERTIES, 'MOSS') ORDER BY Path ASC

Warning

There are some deprecated elements from SharePoint Portal 2003 Search: COALESCE_TABLE, RANKBY, UNIONALL, MATCHES, FROM<scope> (now part of the WHERE clause), and CAST, as well as column weighting.

Unlike SQL Server queries, where columns are defined in tables or views, the columns that are available for searching are defined only in the SSP’s Search Settings on the Metadata Property Mappings page. What makes the metadata search work well is the ability to map multiple site columns into a single search column, where, for instance, a Contact Name column and an Employee Name column could both be mapped to the same property LastName. Two additional common results columns are Rank and LastModifiedTime, which are often used to sort the results.
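When queries are assembled in code, it helps to build the string from its parts rather than by hand. The Python sketch below composes a query in the SELECT ... FROM Scope() WHERE ... ORDER BY form shown earlier. It only produces text and does not talk to SharePoint; the function names are hypothetical, and doubling embedded single quotes is the ordinary SQL convention, assumed here rather than confirmed for Enterprise Search.

```python
def sql_literal(value):
    """Quote a string literal, doubling embedded single quotes.

    Assumption: Enterprise Search SQL follows the standard SQL convention.
    """
    return "'" + value.replace("'", "''") + "'"

def build_query(columns, scope, free_text, order_by="Rank", descending=True):
    """Compose a scope-restricted FREETEXT query as a single string."""
    direction = "DESC" if descending else "ASC"
    return (
        f"SELECT {', '.join(columns)} FROM Scope() "
        f'WHERE "scope"={sql_literal(scope)} '
        f"AND FREETEXT(DEFAULTPROPERTIES, {sql_literal(free_text)}) "
        f"ORDER BY {order_by} {direction}"
    )

print(build_query(["title", "path", "author"], "Books", "MOSS",
                  order_by="Path", descending=False))
```

Column names are inserted as given, so they should come from a fixed list of managed properties rather than from user input.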

URL syntax

Another possible way of utilizing MOSS Search is via the URL syntax. The decision to have either a piece of code that submits HTTP GET queries or screen-scrapes pages is up to you. The syntax is very simple, but it is somewhat more limited as compared to the other forms of Search. Typically, the URL queries are submitted to a specialized search page, such as results.aspx within the Search Center. In most circumstances, all parameters are combined, but be careful when submitting only a partial URL syntax, as the results page may fail to render results. The parameters are outlined in the following table.

Parameter

Example

Description

k

Results.aspx?k=MOSS

Specifies keyword to be searched

s

Results.aspx?s=Piotr%20Site

Specifies the scope

v

Results.aspx?v=date

Specifies order, where v can be either date or relevance

start

Results.aspx?start=2

Specifies a page to display

A full URL query may look like this:

Results.aspx?k=moss%20books&s=people&v=relevance&start=3

That URL query would look for the following:

  • Keyword = moss books

  • Scope = people

  • Order view = relevance

  • Start page = 3
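For code that submits HTTP GET queries, the URL is best generated rather than concatenated by hand, so that spaces and other special characters are encoded consistently. The following Python sketch builds a URL from the four parameters in the table above using only the standard library; the build_search_url function is a hypothetical helper, and the page name defaults to the Results.aspx used in the examples.

```python
from urllib.parse import quote, urlencode

def build_search_url(keywords, scope=None, view=None, start=None,
                     page="Results.aspx"):
    """Build a search URL from the k, s, v, and start parameters."""
    params = {"k": keywords}
    if scope is not None:
        params["s"] = scope
    if view is not None:
        if view not in ("date", "relevance"):
            raise ValueError("view must be 'date' or 'relevance'")
        params["v"] = view
    if start is not None:
        params["start"] = start
    # quote_via=quote encodes spaces as %20, matching the examples above.
    return page + "?" + urlencode(params, quote_via=quote)

print(build_search_url("moss books", scope="people",
                       view="relevance", start=3))
```

Because the text warns that a partial URL may fail to render results, callers of a helper like this would normally supply all four parameters.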

Tip

IFilter development is outside of the scope of this book. There is more information to be found on various Microsoft and third-party websites specializing in this technology, such as the blog http://blogs.msdn.com/ifilter/ or MSDN’s http://msdn2.microsoft.com/en-us/library/ms691105.aspx and http://www.ifilter.org/.

Conclusion

Search Services are a very significant feature of MOSS. Because of Search’s complexity, it may take a diverse team some time to master all of its configurable elements. It will also be one area of your portal deployment that will typically require regular care and feeding. Not only should you be on the lookout for general performance of indexing and search in terms of numbers of errors, accuracy, or speed, but you should also study the search patterns of end users and tweak keywords along with best bets.

Last, for a happy Search Environment, here are some best practices:

  • Know the environment:

    —Utilize dedicated front end servers for large or busy sites.
    —Schedule for off-hours, especially full updates.
    —Schedule wisely, when speed of updates is critical.
    —Use Crawler Impact Rules if crawling against a live front end.
  • Use Automated Crawl Lists for each Web Application with a host header.

  • For global or multisite deployments, consider dedicated crawlers and SSPs for each site.

  • Use the same account for crawling, but use a non-administrator account.

  • Use MOSS Backup and Restore to back up indexes.

  • Pause indexes if needed, and do not stop them, as this triggers a full update.

  • Monitor performance and results.

  • Utilize Query and Results Reports to study user behavior.

  • Utilize Keywords, Best Bets, and Authoritative Sites to improve user experience.
