18

Introduction to Enterprise Search

WHAT’S IN THIS CHAPTER?

  • Understanding why search is more important and more complicated today
  • A survey of SharePoint 2010 search product offerings and how to choose the right one for you
  • A look at the search user experience, how to utilize it, and what information it provides to end users

Search has become a powerful and ubiquitous tool in business enterprises and personal life. Very few users in an enterprise have not had some experience with a search engine in one form or another. Search engines help us find such things as pictures of tree frogs, websites that sell office chairs, and the latest car-chase videos. They also help us answer questions like: Where do tree frogs live? How is the traffic on the Eisenhower? What is today’s horoscope?

In short, search connects people to data that is timely and relevant to their current needs. In today’s enterprises, the data is located on multiple systems that may or may not be under the control of the organization. Relevant data on those systems is in an increasingly abundant number of data formats. The corpus of content that our organizations are interested in is so large that end users can no longer simply browse for files or remember where all their documents are. In today’s enterprise, search is becoming a business-critical system.

The SharePoint 2010 stack of search products offers search solutions that address nearly every search need that an organization may have, from the small-scale team and small business search all the way up to worldwide enterprises. With this broad range of search coverage and the depth of features offered by SharePoint comes a complex set of technical and conceptual components. You cannot simply “turn on” search and expect it to work. An effective search solution requires a lot of analysis and a lot of up-front configuration. Once in place, it will require constant monitoring and tweaking so that it does not grow stale and useless over time.

This chapter focuses on getting you up to speed with search and the SharePoint search products. It does not include any technical directions for configuring search components; that is saved for Chapters 19 and 20. Rather than jump into the technical side, this chapter introduces you to each of the moving parts involved in a search engine. Then it covers each of the SharePoint products that provide search capabilities. Finally, the chapter wraps up with a discussion of the search user interface from the end user’s point of view.

UNDERSTANDING SEARCH

When you come to a search product for the first time, it’s important to understand how a search engine works. How do people use a search engine? What results do search users expect to see? How does the search engine gather results and present them to the end user? How much can I change the user interface to meet my organization’s unique style and needs?

Often we think of search as a simple query/response model where the end user types in a set of keywords and the search engine displays results in a series of pages. Search Engine Optimization experts leverage this model to help get targeted entries higher in the search results page. The error in this model is that it does not take into account all the things a search engine can accomplish, nor does it allow for the variety of ways in which a person uses a search engine. This section begins with a discussion about search behaviors — a cursory look at how people use search engines and the kinds of results they expect to see. (Spoiler alert: People do more than search for a specific website.) The section then explains the many working parts that make up the search engine system — you cannot learn how to select and configure search until you first know how it works and how to use it.

Search Behaviors

Users approach search engines in a variety of ways and for a variety of purposes. The user who has forgotten where he uploaded a sales report uses search to find his document. The user who is writing a statement of work for the very first time uses search to find a set of sample documents to help him get going. The marketing manager who is getting ready to launch a product line will use search to find all content related to the product. The human resources specialist will use search to locate areas of expertise in the organization. In short, every user interacts with the search system in his or her own way and brings individual problems to the search. To be able to design and operate an effective search solution, it is helpful to distill these search behaviors into a few broad categories that the search manager can understand and manage.

  • Content search: SharePoint is optimized for the content search approach. In a content search, the user searches to find results that are related to a specific search keyword or phrase. The search user is looking for a specific piece of content and may not know where it’s located or may have forgotten. Content search answers a question like “Where is this year’s benefits summary?” or “Where is the sales presentation that I uploaded yesterday?” It can also answer location-type questions like “Where is the Acme project team site?” or “Where are the corporate logos?”
  • Query search: Sometimes we don’t need a specific piece of content as much as we need to answer a question about a topic. In the query search approach the results are a collection of content and locations related to the search term in some way. This allows the search user to browse through a small set in a single location (the search results page) rather than haphazardly going through sites, file shares, databases, and other systems to answer the question. Some questions answered by the query search approach are “What do other statements of work documents look like?” or “Who’s contributing the most documents on network operations?”
  • Research search: Research search is a refinement of query search but is directed more to providing all information relative to a topic. More than answering a single question, the end user expects to browse and aggregate the information returned by the search engine. Perhaps the user is writing a report in response to an audit or is contributing to the company’s annual report. In these and similar cases, the user may want to see people results that show most frequent contributors, a complete listing of all team and project sites related to the topic, presentations, reports, and other information that the searcher can then use to complete the audit report or annual report.
  • Application search: Application search is a special situation used by developers who are building custom solutions. Many times, a developer may build an application that uses the search engine to provide data to the application. For example, a developer may build a Web Part that uses search against the company’s CRM solution to display client information that is specific to the logged-in user.

In most organizations, you’ll find all these search behaviors; some will be used more frequently than others. It is important to understand these categories of search behavior and how they are used in your organization, because each edition of SharePoint search products offers features that are optimized for each of these scenarios. This will not only affect your selection of a search product; it will also affect the operational maintenance and monitoring of your search solution. Knowing these behaviors will directly affect the success of your search solution.

I once consulted on a search project for a major food corporation where a senior vice president was campaigning to scrap the current search solution and bring in a new one, because he thought the current solution couldn’t deliver relevant results. The problem was that this particular person frequently uploaded documents to team sites and promptly forgot where they were, or he had someone else upload documents and did not get a document link from them. When he came back to the system on the same day or within a couple of days he would search for the documents by entering the document’s filename in the search box — a technique that should put the desired document at the top of the list. Consistently, the documents would not appear in the search results — not just on the first page, but not at all. When we examined his behavior against the search system, we found that the implementers of the search solution did not consider a user behavior in which the user would use search to find documents within a few days of uploading them. They assumed there would be a larger lag between uploading and searching and therefore set the crawl schedules at one-week intervals. When we changed the crawl schedule to every four hours, the VP began to find his documents every time he searched. Ultimately the problem was not the technology; it was the operational understanding of user behaviors.

Basic Understanding

The challenge to effectively learn search in the SharePoint product stack is to understand the basic components that make up a search solution. Although most users are quite familiar with using search engines on the Internet, this experience is limited to interactions with a small portion of the search system. A complete search system (Figure 18-1) includes five main components (crawl, index, query, user interface, and monitoring) and an application interface for developers.

Search begins with the crawler. This is the component of the search engine that accesses content in its various forms and locations. In each location, the crawler is responsible for building the initial search index by reading, categorizing, and identifying metadata properties of the content stored in those locations. The crawler does not simply bounce from location to location without direction; rather, it follows a directed path identified by a set of content sources.

A content source identifies a location for the crawler to index and the parameters by which the crawler must perform its duties at that location. The settings defined in the content source help to limit the crawler so that it doesn’t get out of control while roaming the content. These settings include limiting the number of hops that the crawler will follow, or limiting the crawler to a single server.

Beyond simple crawl settings, crawl rules allow you to set more restrictive controls on the crawler to include or exclude content. Unlike a filter that gathers all content together and then sifts out certain content for presentation, crawl rules force the crawler either to add entries to the index database or exclude them. In this way, the crawl rules affect the very content of the index and by extension what is available for the end users to view in search results.
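
To make the distinction between content source limits and crawl rules concrete, here is a minimal Python sketch of how the two could gate what enters an index. It is not SharePoint’s implementation: the names ContentSource, CrawlRule, and should_index, the example URLs, and the first-match-wins rule ordering are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from urllib.parse import urlparse


@dataclass
class CrawlRule:
    pattern: str   # wildcard pattern matched against the full URL
    include: bool  # True = add matching items to the index, False = exclude them


@dataclass
class ContentSource:
    start_url: str
    max_hops: int                 # how many links deep the crawler may follow
    same_host_only: bool = True   # keep the crawler on a single server
    rules: list[CrawlRule] = field(default_factory=list)


def should_index(source: ContentSource, url: str, hops: int) -> bool:
    """Decide at crawl time whether an item is added to the index."""
    if hops > source.max_hops:
        return False
    if source.same_host_only and urlparse(url).netloc != urlparse(source.start_url).netloc:
        return False
    for rule in source.rules:     # assumption: the first matching rule wins
        if fnmatchcase(url, rule.pattern):
            return rule.include
    return True                   # no rule matched: index by default


# Hypothetical content source that skips an archive folder.
source = ContentSource(
    start_url="http://intranet.example.com/",
    max_hops=2,
    rules=[CrawlRule("*/archive/*", include=False)],
)
```

Because the decision is made while crawling, an excluded item never reaches the index at all, which is why it cannot appear in any search result.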

In larger search systems where performance becomes an issue due to the quantity or size of content located in the content source, crawler impact rules and effective scheduling will aid in configuring the most efficient crawls. Each content source has its own crawl schedule, which is a powerful performance optimization technique: you can crawl frequently changing content often and crawl static content less often. The crawler impact rules let you control the load that the crawler places on a content location, either by limiting the number of documents that are requested simultaneously or by setting a delay between requests for locations that need more time to respond.
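
The trade-off behind a crawler impact rule can be estimated with simple arithmetic. The sketch below is a deliberately simplified model, not a SharePoint API: ImpactRule and estimated_crawl_seconds are hypothetical names, and the model assumes that a batch of simultaneous requests takes as long as a single request.

```python
import math
from dataclasses import dataclass


@dataclass
class ImpactRule:
    batch_size: int = 1          # documents requested simultaneously
    delay_seconds: float = 0.0   # pause between successive batches


def estimated_crawl_seconds(doc_count: int, seconds_per_doc: float, rule: ImpactRule) -> float:
    """Rough crawl duration: fetch in batches, pausing between batches."""
    batches = math.ceil(doc_count / rule.batch_size)
    pauses = max(batches - 1, 0)
    return batches * seconds_per_doc + pauses * rule.delay_seconds


# Aggressive rule: 8 documents at a time, no waiting.
fast = estimated_crawl_seconds(1000, 0.5, ImpactRule(batch_size=8))
# Gentle rule: one document at a time, 2 seconds between requests.
gentle = estimated_crawl_seconds(1000, 0.5, ImpactRule(batch_size=1, delay_seconds=2.0))
```

Under these assumptions the gentle rule protects a slow content server but stretches the crawl considerably, which in turn delays when new content becomes searchable.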

Whereas the crawler gathers data from the content sources, the indexer is responsible for organizing the content. It takes the content from the crawler and determines how things are stored in the index databases. As it performs this organization it also continuously propagates the data to the index databases that are located on each query server.

Some aspects of the indexer are configured by search administrators to enhance the search experience or to improve the relevancy of search results that are presented to the end user. As the crawler works through content, it identifies metadata properties that it indexes. These crawled properties can be document properties in Word, PowerPoint, and other document files, columns in SharePoint lists or Excel spreadsheets, metadata tags in HTML files, and other identifiable fields associated with the content.

In many cases, you will want to utilize crawled properties in search scopes and as parameters in the Advanced Search form in the Search Center. Mapping a crawled property to a managed property allows for this kind of promotion. The managed property not only makes a crawled property available to search scopes and filters; it can also combine multiple crawled properties into a single managed property. For example, in many SharePoint instances, multiple SharePoint lists may have columns that refer to a vendor using column names like Vendor or Supplier. Managed properties allow you to combine these similar properties into one unit that will ensure search results contain all information relevant to a search query.
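
The Vendor and Supplier example can be sketched in a few lines of Python. This is only an illustration of the mapping idea: the MANAGED_PROPERTIES table, the helper names, and the sample list items are made up, and SharePoint stores and evaluates these mappings quite differently.

```python
from __future__ import annotations

# Hypothetical mapping: one managed property fed by several crawled properties.
MANAGED_PROPERTIES = {
    "Vendor": ["Vendor", "Supplier"],
}


def to_managed(crawled: dict[str, str]) -> dict[str, list[str]]:
    """Collect crawled property values under their managed property name."""
    managed: dict[str, list[str]] = {}
    for managed_name, crawled_names in MANAGED_PROPERTIES.items():
        values = [crawled[name] for name in crawled_names if name in crawled]
        if values:
            managed[managed_name] = values
    return managed


def search_by_property(items: list[dict], managed_name: str, value: str) -> list[str]:
    """Return titles of items whose managed property contains the value."""
    return [
        item["title"]
        for item in items
        if value in to_managed(item["crawled"]).get(managed_name, [])
    ]


# Made-up list items whose columns use different names for the same concept.
items = [
    {"title": "Purchase orders", "crawled": {"Vendor": "Acme"}},
    {"title": "Contracts", "crawled": {"Supplier": "Acme"}},
    {"title": "Invoices", "crawled": {"Supplier": "Contoso"}},
]

acme_items = search_by_property(items, "Vendor", "Acme")
```

A query against the single managed property finds both the item that used the Vendor column and the item that used the Supplier column.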

The query server is the final step in the search process. It processes requests and returns result sets to the web server for display in the user interface. Each query server contains its own copy of the index in order to respond most efficiently to search requests. This separation ensures that the crawler’s interaction with content sources at one end does not degrade query performance at the other, and that query load does not interfere with crawling and indexing. If any query or crawl operation becomes resource-intensive, it will not adversely affect the overall operation of the search system.

Although fully functional search systems provide the ability to index content from a variety of sources and formats, it is not feasible to index all information that may be relevant to an end user’s search needs. Sometimes content is located on other systems that are indexed by other search engines, such as the Internet, which is serviced by engines such as Bing. Federated Search leverages open standards that will allow you to include search results from other search engines in your search results.

At the end of the search process, the user interface is the portion of a search system that all users are most familiar with. The user interface both accepts search criteria from the end user and presents search results back. The results presented in the user interface are sorted based on their relevance, which is a calculated ranking of each item based on the search terms and a variety of other criteria. Administrators are able to enhance the search experience by configuring Best Bets that will show specific search-result entries at the top of the results page above all others when specific keyword matches are made in the search criteria. Although the user interface is a rather passive presenter of information (the query server does the work of ranking, trimming, stemming, and sorting content), the interface is highly customizable in SharePoint.
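
How Best Bets sit on top of relevance-ranked results can be shown with a small Python sketch. The function name render_results, the sample URLs, and the choice to match a Best Bet keyword against any single word of the query are assumptions for illustration; the actual matching behavior in SharePoint is configured by the administrator.

```python
from __future__ import annotations


def render_results(query: str, ranked_hits: list[str], best_bets: dict[str, str]) -> list[str]:
    """Place Best Bets for matching keywords above the relevance-ranked hits."""
    words = query.lower().split()
    promoted: list[str] = []
    for keyword, url in best_bets.items():
        if keyword.lower() in words and url not in promoted:
            promoted.append(url)
    # Avoid listing a promoted entry a second time further down the page.
    return promoted + [hit for hit in ranked_hits if hit not in promoted]


# Hypothetical Best Bet and an already-ranked result list from the query server.
best_bets = {"benefits": "http://hr.example.com/benefits-summary"}
ranked = [
    "http://teams.example.com/doc1",
    "http://hr.example.com/benefits-summary",
    "http://teams.example.com/doc2",
]

page = render_results("benefits summary", ranked, best_bets)
```

Note that the ranking itself is untouched; the Best Bet is simply lifted above it when the keyword matches.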

With this basic understanding of the components of search in SharePoint, you will be better able to examine the many search offerings available in the SharePoint 2010 stack. Chapter 19 provides a more detailed look at each of these search components as well as in-depth discussions about configuring these features in SharePoint. The next section looks at each of these offerings in detail and follows that up with a discussion of the criteria that you need to consider when selecting a search solution.

PRODUCT OVERVIEW

Microsoft’s search offerings seem to grow exponentially from one version of SharePoint to the next. In early versions of SharePoint, basic search was the only option. With MOSS 2007 and WSS, you could choose between basic search and a more robust enterprise search offering that was built into SharePoint Server. In the past few years, Microsoft has introduced stand-alone server products in free and licensed versions, and it has acquired FAST. With the release of the SharePoint 2010 stack, selecting a search solution is now a complex task that requires careful consideration of your organization’s search needs. Therefore, you will need to understand what each search product brings to the table.

SharePoint Foundation 2010

SharePoint Foundation 2010 is an entry-level search offering that is integrated with the free version of SharePoint. This built-in offering limits the search capability to a single site collection within SharePoint Foundation and will only crawl SharePoint content — that is, lists, libraries, and web pages contained in the site collection. There is limited ability to configure the search system in SharePoint Foundation and the crawler is scheduled automatically.

The crawl server and the index server are combined and cannot be separated onto dedicated servers for redundancy or optimization purposes. On the other hand, the search capabilities can be separated onto a single dedicated search application server.

In most cases, SharePoint Foundation 2010 search will be insufficient to meet your requirements, not only because of the limitations in the technology, but also because Microsoft offers Search Server 2010 Express as a free download. When installing SharePoint Foundation 2010 as a collaboration solution, you should always download and install Search Server 2010 Express as well.

Search Server 2010 Express

Search Server 2010 Express is also an entry-level search offering, but it adds a number of important features that make search much more usable. This server is able to go beyond the boundaries of a single SharePoint site collection and can crawl a variety of external content sources such as SharePoint sites, websites, and file shares. Consider for a moment that most SharePoint Foundation 2010 installations are intended as collaborative environments for department- or project-level entities. Also consider that in most organizations the entire body of content that is relevant to the department or project is not usually contained within the SharePoint site: it is usually contained in SharePoint, file shares, other websites, databases, and possibly e-mail systems. The addition of Search Server provides your SharePoint Foundation installation with the necessary connectors to bring together the information from these disparate systems.

Similar to SharePoint Foundation, Search Server 2010 Express is very limited in its topology. You cannot deploy to multiple servers for redundancy or separate the index and crawl components for performance. Likewise, you are limited to 300,000 items in the index. For the most part, Search Server 2010 Express will fulfill your needs when you are adding search to a very small-scale solution or when your budget is extremely tight.

Search Server 2010

Search Server 2010 is the beginning of enterprise-level search solutions. As such, it is also the first product that requires a license to use. In the SharePoint stack, Search Server 2010 sits between Search Server 2010 Express and SharePoint 2010 as a stepping stone. In many cases, Search Server 2010 is deployed when your content overloads Search Server 2010 Express and requires a more substantial system. But it can also be used when SharePoint 2010 is not deployed or not available to provide search to your content.

Search Server 2010 provides most of the search capabilities that are found in the licensed SharePoint 2010 product with the exception of People Search, taxonomy, and social searching. This is an important point when making decisions about your search solution because it substantially impacts search-result relevance. One of the great advantages SharePoint Search has over many competing products is that its social searching capabilities help it produce highly relevant results. Search Server 2010 does not benefit from these improved relevancy features.

This is not to say that Search Server 2010 results are not relevant. On the contrary, Search Server 2010 is able to leverage multiple content sources, Federated Search, crawled and managed properties, scopes, and other core search components to provide very relevant results. What it lacks is the ability to provide a search result that places a particular document at the top of the result stack because it was authored by your boss or because you tagged it in the SharePoint interface.

Search Server 2010 is freed from the topology restrictions that constrain Search Server Express and SharePoint Foundation. It can be scaled to multiple servers for redundancy and performance purposes, giving you the ability to index up to 100 million items. It is a robust search solution for the organization that does not use SharePoint. But if your organization is using SharePoint or is moving to SharePoint, you should strongly consider moving your search solution under the SharePoint umbrella.

SharePoint Server 2010

SharePoint Server 2010 is a complete search solution that provides a better search experience than its competitors. Like Search Server, SharePoint is able to index content from a variety of enterprise locations such as multiple SharePoint servers, file shares, web servers, Exchange, Lotus Notes, and external line-of-business systems. It also provides the ability to tune, refine, and monitor search to improve the search experience. Finally, as noted earlier, SharePoint Server 2010 adds social-search capabilities and People Search.

SharePoint Server 2010 comes in a variety of editions, but the most important are the Standard and Enterprise editions. From a search point of view, the difference between these editions is in the smaller details; the core search components and the administrative components are the same. In all editions, the search component is highly scalable, allowing you to separate the crawl from the index components and leverage multiple servers for each.

FAST Search Server 2010 for SharePoint

FAST Search Server 2010 is a new search product that provides advanced enterprise search capabilities. Where SharePoint Server 2010 can scale to index up to 100 million items, a FAST Search Server can be scaled to index more than a billion items of content (although Microsoft’s published material lists it at 500 million plus).

Where FAST Search Server 2010 really shines is in the new features that it brings to the search experience. FAST adds Visual Best Bets, thumbnails, the ability to preview PowerPoint in the browser window, contextual search capabilities, and a variety of relevance-tuning options that take search from a generalized connection between users and content to a powerful search application that delivers timely and contextually relevant data to the right users. These new capabilities allow administrators and developers to create search applications such as research applications.

FAST Search Server 2010 is the high end of search products, but gains in productivity and expanded search experience will in most cases justify any costs that go with implementing this product.

WHICH PRODUCT IS RIGHT FOR ME?

In most organizations, the selection of a search product will come down to a decision between SharePoint 2010 search and FAST Search Server 2010 for SharePoint. The fact that you are reading this book suggests that you are looking at the entire SharePoint suite, rather than at search in isolation. That said, Search Server and FAST Server have stand-alone variants that are robust, fully featured search solutions in their own right.

For the purpose of this discussion, I will assume that your search decision will primarily fall between the two enterprise products. The analyses presented here will also assist you if you are trying to decide between SharePoint Foundation (enhanced by Search Server 2010 Express) and full SharePoint Server.

Location Analysis

The first thing to do when deciding on your search solution, as well as when planning your selected solution, is to determine where the content and users are located. In today’s global environment content is often located not only in many different systems but also across the globe. In completing your analysis, you must consider both aspects of location.

The first aspect to consider is the system location of the content. That is to say, where are the physical files that your search solution will crawl and index? It is very rare to have a search solution where the entire body of content will lie within a single system. In most cases, content will reside in file shares, other websites, e-mail servers, remote and local databases, ERP systems, and other file repositories. In this step of your analysis you are less concerned with the kinds of content that you will search (that will be the focus of the next phase of analysis); rather, you need to understand which systems hold the content and where those systems are located geographically.

To develop a complete understanding of the content locations, it is most effective to develop an audit document or spreadsheet that will list all the content locations that contain files you want your search solution to service, as shown in Table 18-1. This spreadsheet should capture the kind or type of content location, such as database, e-mail, or file share, and it should capture the geographical location of the content.

Table 18-1: Search Content Location Analysis Worksheet


The second aspect of location you’ll be concerned with at this phase is user location. That is, who will need to search each of the content locations and where are they physically located? Using the same spreadsheet, you can add columns that indicate the users and their location for each content location that you already identified. At this stage, you should be concerned only with groups of users in a general sense. It would be overkill to identify Active Directory groups at this point. The goal is to get a geographical idea of where users are accessing search and what content they are interested in.

The audit of the content and user locations will not only help you begin the process of deciding which search product is right for you, but will also help you to design the topology of your search solution. The topology will be informed by performance and network factors that will become more evident as you complete the analysis. You need to keep two key considerations in mind as you identify content and user locations:

  • Crawl performance: The crawler operates from the application servers out to the content locations and returns index information to the servers. Geographically dispersed content locations will increase the amount of time it takes for the crawler to send data back over the network to the application servers.
  • Query performance: Similar to the crawler, the query engine resides on application servers that communicate with the user interface over the network. End users in geographically remote locations may experience less efficient search-query performance as a result.

When you have content and users that are geographically spread out, possibly worldwide, you need to consider ways to resolve the performance problems that come with large topologies. Generally speaking, you have three ways to resolve these problems based on your location analysis and performance needs:

  • Regional installation: This is the best option when your content and users are located together in a small region. This solution focuses on a single installation of SharePoint Server 2010 that is scaled as content grows. Network performance is less of a concern in this scenario.
  • Central installation: If your network provides acceptable throughput for crawling and querying content, you may deploy a central SharePoint Server 2010 installation that services all the remote locations. Although this is one of the easier-to-manage scenarios, you have to closely monitor your network performance and user load.
  • Distributed installation: In scenarios where network performance is a concern or where the access requirements are regional, you may deploy many regional installations of SharePoint Server 2010 and tie them together only where it is needed. In the example presented in the Table 18-1 worksheet, the Engineering department in Munich has a database and file shares that need not be indexed and queried by anyone else in the organization. A regional installation allows the content to be indexed and queried for local usage, but frees the central installation of SharePoint from the responsibility for this local data.
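
The three options above can be read as a simple decision rule over the location worksheet. The Python sketch below is one possible reading, not an official sizing method: the function suggest_installation, the row fields, and the sample rows (including the “HQ” region) are hypothetical, and a real decision would weigh measured network performance rather than a single yes/no flag.

```python
from __future__ import annotations


def suggest_installation(rows: list[dict], network_ok: bool) -> str:
    """Suggest an installation type from content/user location audit rows."""
    regions = {row["content_region"] for row in rows}
    for row in rows:
        regions |= row["user_regions"]
    if len(regions) <= 1:
        return "regional"        # content and users sit together
    # Access is regional when every location is used only from its own region.
    access_is_regional = all(
        row["user_regions"] <= {row["content_region"]} for row in rows
    )
    if network_ok and not access_is_regional:
        return "central"         # one installation serving remote locations
    return "distributed"         # regional installations, tied together where needed


# Hypothetical audit rows in the spirit of the location worksheet.
munich = {"location": "Engineering file shares",
          "content_region": "Munich", "user_regions": {"Munich"}}
intranet = {"location": "Corporate intranet",
            "content_region": "HQ", "user_regions": {"HQ", "Munich"}}
```

For example, suggest_installation([munich, intranet], network_ok=False) points to a distributed installation, while the Munich content on its own is a regional case.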

The kind of installation you choose for your needs will affect your search solution decision, particularly if you are considering scenarios that will leverage SharePoint Foundation 2010 and Search Server 2010 Express. Because these two products are very limited in the scale of topology they support, they are only capable of regional-type installations, either as part of a larger distributed SharePoint Server topology or as a solution for smaller organizations.

When deciding between SharePoint Server 2010 and FAST Search Server 2010 for SharePoint, the key considerations will be how many users and locations you need to service. Both products are highly scalable in all three installation scenarios; however, extremely large organizations will benefit from many of the high-end performance and scalability features of FAST.

Content Analysis

Though analyzing content and user locations is an important first step in your product decision, it is not the only step. The amount of content, the types of content, and the growth rate of content will also factor in deciding which search product can service your needs. Thus, it is necessary to dig a little deeper into each content location to identify some important characteristics.

In the content analysis phase, you will want to go beyond the storage medium, such as database, file share, or web, and look at the files and file types of the content located in each of those systems that the crawler must index (Table 18-2). The total set of files and the amount of space they consume is known as your search corpus. When managing a search solution it will be important to understand your corpus at all times as it grows. Understanding the growth rate of content in each location will also help you to configure and maintain a search configuration that will continue to be responsive as the system matures.

Table 18-2: Search Content Analysis Worksheet


The first metric to keep in mind is that each search product has a finite number of items that it can index. Your content audit should give you a very close idea of the size of your current corpus, which will be your starting point for a search solution, but it is only the starting point. As time passes, the size of your corpus will change as business users create and remove files in each of your content locations. You will need to make sure that your search solution will be able to grow with the corpus over time. Table 18-3 summarizes the limitations of each search product.

Table 18-3: Search Scales

SEARCH PRODUCT                            MAX ITEMS IN INDEX
SharePoint Foundation                     10 million
Search Server 2010 Express                300,000 (with SQL Express)
                                          10 million (with SQL Server)
Search Server 2010                        100 million
SharePoint Server 2010                    100 million
FAST Search Server 2010 for SharePoint    500 million +
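
The limits in Table 18-3 are most useful when compared against where the corpus is heading rather than where it is today. The following Python sketch projects corpus size with a compound annual growth rate and lists the products that still have headroom. The item limits come from Table 18-3; the function names, the compound-growth model, and the sample numbers are assumptions for illustration.

```python
from __future__ import annotations

# Maximum items in the index, taken from Table 18-3.
MAX_ITEMS = {
    "SharePoint Foundation 2010": 10_000_000,
    "Search Server 2010 Express (SQL Express)": 300_000,
    "Search Server 2010 Express (SQL Server)": 10_000_000,
    "Search Server 2010": 100_000_000,
    "SharePoint Server 2010": 100_000_000,
    "FAST Search Server 2010 for SharePoint": 500_000_000,  # listed as "500 million +"
}


def projected_items(current_items: int, annual_growth: float, years: int) -> int:
    """Project corpus size assuming steady compound growth (an assumption)."""
    return round(current_items * (1 + annual_growth) ** years)


def products_with_headroom(current_items: int, annual_growth: float, years: int) -> list[str]:
    """List products whose index limit covers the projected corpus."""
    needed = projected_items(current_items, annual_growth, years)
    return [name for name, limit in MAX_ITEMS.items() if limit >= needed]


# Made-up example: 200,000 items today, growing 25 percent per year.
after_three_years = projected_items(200_000, 0.25, 3)
```

In this made-up case the corpus fits Search Server 2010 Express on SQL Express today, but it outgrows the 300,000-item limit within three years.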

One factor that can affect the number of items in your index is whether or not you will use Federated Search in your search results. Federated Search allows you to reduce the number of items in your index because the query is passed to another search engine and the results from that request are integrated into the local search results. Take, for example, an Engineering department site. This site will serve as a collaboration space for engineers who will use the search system to find files and information on local file shares and on SQL Server. Additionally, the engineers want to search content that is located on the corporate intranet site. The corporate intranet happens to be hosted on SharePoint Server 2010 and is being indexed by its own crawler. In this situation, you can use Federated Search to request search results from the corporate intranet rather than index the content on the engineering search server. In this example, the handful of web pages located on the intranet will probably not impact the overall size of the corpus on the engineering search server, but what if the engineers want to search content on an R&D file share that contains a few hundred thousand files? If that R&D file share is serviced by its own search engine, you can avoid duplication of indexes by using Federated Search.
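
The engineering scenario can be sketched in Python to show why federation keeps the local index small: the remote content is never indexed locally, only queried. Everything here is illustrative. The function federated_search, the stub intranet_search, and the sample URLs are hypothetical, and a real deployment would send the query to the other engine over the network instead of calling a local function.

```python
from __future__ import annotations

from typing import Callable


def federated_search(
    query: str,
    local_index: dict[str, list[str]],
    remote_engines: dict[str, Callable[[str], list[str]]],
) -> dict[str, list[str]]:
    """Answer a query from the local index and from other engines, grouped by source."""
    results = {"local": local_index.get(query.lower(), [])}
    for name, engine in remote_engines.items():
        try:
            results[name] = engine(query)   # the remote engine searches its own index
        except Exception:
            results[name] = []              # a failing remote should not break local search
    return results


# Hypothetical engineering search server: only local content is indexed here.
engineering_index = {"valve": ["file://eng-fs/designs/valve-spec.docx"]}


def intranet_search(query: str) -> list[str]:
    # Stand-in for a request to the corporate intranet's own search engine.
    if "valve" in query.lower():
        return ["http://intranet.example.com/policies/valve-safety.aspx"]
    return []


results = federated_search("valve", engineering_index, {"intranet": intranet_search})
```

The intranet page shows up for the engineer, yet engineering_index still holds only the local file share entry.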

The size of the corpus and of individual content sources becomes a factor when you are considering crawl performance and continuous propagation. Large files and certain file types slow the crawler, so the indexes take longer to update. This affects how long it will take before new files can show up in search results. If the files you are crawling are very large, you will need to consider either the high-end capabilities of FAST or a topology in which large content sources have dedicated crawlers.

All the search offerings from Search Server 2010 Express to FAST have the ability to handle multiple content sources and types of content in the index; therefore, the kind of content you are crawling will not typically affect your decision about which product to deploy. Where you’ll need to make platform decisions is when you begin separating out crawl and query servers to account for frequently changing content, large quantities of content, and large content sizes.

Feature Requirements

The last factor to consider when deciding on a search solution is the set of features you need. Up to this point, the location and content analyses have focused mainly on the performance aspects of search. From a performance point of view, the decision between products is rather straightforward: low performance needs call for Search Server 2010 Express, medium to high performance needs call for Search Server 2010 or SharePoint Server 2010, and high to extremely high performance needs call for FAST. Yet there are many cases in which your performance needs are low but you need the higher-end products like SharePoint Server 2010 or FAST because they have features that the other products do not. When it comes to FAST, the search features alone can justify the choice and cost of deploying it.

When looking at features, two major feature gateways will push your product decision one way or another. The first gateway is People, Social, and Taxonomy search. People Search allows the end user to query the MySite profiles of SharePoint users and view profile results. Social Search leverages the MySite and organizational information stored in user profiles to rank search results, making them more relevant based on user actions. For example, a document that is tagged by many users will rank higher in relevance than a similar document that has not been tagged at all. Finally, taxonomy integration leverages metadata and content tagging to refine search results. If these three capabilities (People, Social, and Taxonomy search) are required by your search solution, you can rule out SharePoint Foundation 2010 and both editions of Search Server 2010, because these features are available only in SharePoint Server 2010 and FAST.

The next gateway is Visual, Contextual, and Refinement search, and these are the features that separate FAST from SharePoint Server 2010. FAST allows the inclusion of Visual Best Bets and document thumbnails. If that is not enough, it includes a slide browser for PowerPoint files that allows the end user to browse through the slide deck within the search results page. Contextual search allows you to refine results based on a specific type of profile or audience. Lastly, FAST offers very powerful refinement options that substantially enhance the search experience. If search is an important tool for the end users in your organization, you should seriously consider deploying FAST even in cases where SharePoint Server can handle your performance needs.
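
The two gateways amount to a simple decision rule, sketched below. The feature labels are shorthand invented for this example, and the rule covers only the feature side of the decision; index size and performance still have to be checked separately.

```python
ALL_PRODUCTS = [
    "SharePoint Foundation 2010",
    "Search Server 2010 Express",
    "Search Server 2010",
    "SharePoint Server 2010",
    "FAST Search Server 2010 for SharePoint",
]

# First gateway: available only in SharePoint Server 2010 and FAST.
GATEWAY_1 = {"people", "social", "taxonomy"}
# Second gateway: the features that separate FAST from SharePoint Server 2010.
GATEWAY_2 = {"visual", "contextual", "deep_refinement"}


def candidate_products(required_features):
    """Narrow the product list according to the two feature gateways."""
    required = set(required_features)
    unknown = required - GATEWAY_1 - GATEWAY_2
    if unknown:
        raise ValueError(f"unrecognized feature labels: {sorted(unknown)}")
    if required & GATEWAY_2:
        return ALL_PRODUCTS[4:]
    if required & GATEWAY_1:
        return ALL_PRODUCTS[3:]
    return list(ALL_PRODUCTS)


print(candidate_products({"people", "taxonomy"}))
print(candidate_products({"social", "visual"}))
```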

Table 18-4 will help you examine the features that are available in each platform. The Microsoft Enterprise Search Center has many feature comparison tables that provide various looks at each server product and what they can do. The following table is extracted from the Search Model 1 of 4 – Search Technologies document that is available at http://tinyurl.com/28fy8sx.

Table 18-4: Feature Comparison

[image: Table 18-4 feature comparison, not reproduced]

USER EXPERIENCE

Now that you understand the basic concepts and components of search, the variety of search products available in the SharePoint stack, and the analyses necessary to select a search product, it is time to dig into the search experience in more detail. This chapter closes with a detailed look at the user experience provided by the search products. Chapter 19 digs deeper into the administrative and operational details of search, and Chapter 20 provides a deeper look at FAST.

Although significant feature differences exist between the search products in the SharePoint stack, the user experience is generally similar on all platforms. Where they differ is in the details, such as document thumbnails that appear only in FAST search results. For this reason, the following discussion and screenshots will not differentiate between the different product offerings. All the screenshots were taken on a system that has FAST Search Server 2010 for SharePoint deployed. Therefore, the search-results screenshots show things like document thumbnails and PowerPoint previews, which are not available in the other product lines. If you are unsure whether your edition of search has a feature that is shown here, consult Table 18-4.

In all the search products, end users interact with three key user interfaces to complete searches: the Simple Search Box, the Search Center, and the Advanced Search page. These three make up the input portion of the search interface, where users enter their search criteria.

The Simple Search Box (Figure 18-2) is an element that is included in the master page definition for sites. Out of the box, it includes a drop-down that lists scopes along with a textbox where a user enters the search query. The scopes available in the Simple Search Box are defined by the site collection administrator in the site settings. A search scope is a filter that helps to refine the search results. Because the simple search is part of the master page, the developer can change or even remove this component from the page. If you are developing a master page for your SharePoint installation, consider the impact on users before you change or remove this control.

The Search Center (Figure 18-3) provides an interface that is familiar to any user who has ever interacted with Internet search engines. The tabs that appear across the top of the Search Center (All Sites, People, and Reports) are scopes similar to the drop-down in the Simple Search Box. Out of the box, the configured tabs are All Sites and People. Like simple search, the administrator may configure which tabs are available to the end user.

Advanced Search (Figure 18-4) provides a much more detailed search capability, allowing the user to specify phrases, languages, and properties to search on. For an administrator or developer, the properties drop-down is an important component of the search page because the values presented in this control are managed properties. You learn more about configuring managed properties in Chapter 19.

Once the user enters a query into one of the search interfaces, the query server returns the results to a search results page. The results page consists of a set of highly customizable connected Web Parts. These Web Parts work together to provide the user with a rich search experience that goes well beyond the simple display of search results. It will help your understanding of search to go through each of the major sections of the results page.

Search Refinement

Search refinements, or facets as they were once called, allow the end user to drill deeper into the search results based on tags and metadata found in the result set. By using the search refinements that appear on the left-hand side of the results page, as shown in Figure 18-5, the end user may switch from a general search, browsing through pages of results, to a more directed search that follows a guided path to a specific result set. Let’s say an end user is searching for a project document related to the Gears project. She knows that the document she wants is a Word document. By using the refinement options, she can select all results from the result set that are Word documents. This works in a more research-oriented mode as well. For example, a user may be writing a statement of work and want to see some examples of other statements of work that have been written in the organization. If “statement of work” is a metadata property of all statement of work documents, the refinement bar will have an entry for statement of work that the end user may select.
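
Conceptually, a refinement panel does two things: it counts the metadata values present in the current result set, and it filters the set when the user picks one. The sketch below illustrates that idea with plain Python dictionaries; the property names (`file_type`, `doc_kind`) and sample results are made up for the Gears example and are not SharePoint managed property names.

```python
from collections import Counter

# Hypothetical result set for a search on "gears".
results = [
    {"title": "Gears project plan", "file_type": "Word", "doc_kind": "project plan"},
    {"title": "Gears kickoff deck", "file_type": "PowerPoint", "doc_kind": "presentation"},
    {"title": "Gears statement of work", "file_type": "Word", "doc_kind": "statement of work"},
]


def refiners(result_set, prop):
    """Count each value of a property so it can be listed as a refinement option."""
    return Counter(r[prop] for r in result_set if prop in r)


def refine(result_set, prop, value):
    """Keep only the results whose property matches the selected refinement."""
    return [r for r in result_set if r.get(prop) == value]


print(refiners(results, "file_type"))
word_docs = refine(results, "file_type", "Word")
print([r["title"] for r in word_docs])
```

Because the options are computed from the result set itself, a refinement such as "statement of work" appears only when at least one result carries that metadata value.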

Best Bets

Best Bets are a configured set of keywords that have a specific entry associated with them as shown in Figure 18-6. These are created and configured by a site administrator to show a particular result at the top of the results pane. For example, an administrator may create a Best Bet for the search term “Annual Report” that will display a link to the most recent annual report for the company at the top of the search results. The assumption is that when an exact match occurs, the Best Bet represents the result that most users are looking for when they use that term.

For an administrator, Best Bets are often informed by the search usage reports that show common search terms and destinations. When a search term has a large number of click-throughs on a single link, you have a good candidate for a Best Bet.
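
The "many click-throughs on a single link" test can be sketched as a small analysis over usage data. This assumes you have exported the usage report as (search term, clicked URL) pairs; that export format, the function name, and both thresholds are assumptions for illustration rather than anything SharePoint provides.

```python
from collections import Counter, defaultdict


def best_bet_candidates(click_log, min_clicks=20, min_share=0.6):
    """Find terms where one link receives most of the click-throughs.

    click_log is an iterable of (search_term, clicked_url) pairs.
    min_clicks and min_share are arbitrary thresholds to tune locally.
    """
    clicks_by_term = defaultdict(Counter)
    for term, url in click_log:
        clicks_by_term[term.lower()][url] += 1

    candidates = {}
    for term, clicks in clicks_by_term.items():
        top_url, top_count = clicks.most_common(1)[0]
        total = sum(clicks.values())
        if total >= min_clicks and top_count / total >= min_share:
            candidates[term] = top_url
    return candidates


# Hypothetical usage data.
log = (
    [("Annual Report", "/finance/annual-report-2010.pdf")] * 18
    + [("annual report", "/finance/archive")] * 4
    + [("gears", "/projects/gears")] * 3
)
print(best_bet_candidates(log))
```

Terms are lowercased before counting so that "Annual Report" and "annual report" are treated as the same search term.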

Thumbnails and Previews

Thumbnails and Previews are a FAST search feature that is worth its weight in gold. FAST search will show a thumbnail image of the first page of each Word and PowerPoint document in the result set (Figure 18-7), making document recognition much easier and more efficient. The presence of thumbnails means that a user can find the document he is looking for without opening each one. This saves a lot of time over the life of the search solution.

PowerPoint is further enhanced with the option to preview the entire slide deck (Figure 18-8) in the results page. The user clicks the link and a browsing window opens in which the user scrolls through the slides. In most PowerPoint searches, users are looking for a single slide or set of slides, but they often cannot remember which slide deck the slides are located in. Like document thumbnails, PowerPoint preview recovers much of the productivity lost to opening search results, one after another, in search of a slide or two.

Similar Results and Duplicates

FAST also enhances the search experience by combining duplicate entries with a hyperlink that allows the user to view all the duplicates together in a result set. Likewise, FAST uses the metadata and other factors to identify similar result sets for individual result entries.

Sort Results

A powerful feature in the user interface is sorting. SharePoint Server 2010 allows the user to sort results by relevance and date modified, which can aid in many general document searches. But additional sorting is necessary for more specific searches. FAST provides the ability to sort based on metadata and on relevance ranking profiles. In this way a user can sort based on a single rank or on a property, which provides a much more efficient way to browse through search results.

Query Syntax

Most users are familiar with entering keyword terms into a search box to query for results. SharePoint begins with basic keyword query syntax but adds more capability in the form of property, wildcard, and Boolean query syntax.

Keyword syntax is straightforward. The user enters a term or phrase into the search box, and the query engine delivers results based on the term or phrase. SharePoint allows the use of quotation marks around phrases to query for exact matches of a set of words. It also allows the use of the + and - operators for inclusion and exclusion of terms. For example, a search for “acme +project” will return results that contain both keywords (acme and project). Alternatively, a search for “acme -project” will return results that contain acme but not project.
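
To make the inclusion, exclusion, and phrase rules concrete, here is a toy matcher. It is a sketch of the semantics described above, not SharePoint's query parser: it treats unprefixed terms as required (consistent with the example in which both acme and project must appear) and matches on whole words only.

```python
import re

# An optional + or - sign, followed by either a quoted phrase or a single word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]+)"|(\S+))')


def parse_keywords(query):
    """Split a query into terms/phrases to include and terms/phrases to exclude."""
    include, exclude = [], []
    for sign, phrase, word in TOKEN.findall(query):
        text = (phrase or word).lower()
        (exclude if sign == "-" else include).append(text)
    return include, exclude


def matches(document_text, query):
    """True if the document has every included term and no excluded term."""
    include, exclude = parse_keywords(query)
    words = re.findall(r"\w+", document_text.lower())
    normalized = " " + " ".join(words) + " "  # padding gives whole-word matching

    def present(text):
        return f" {text} " in normalized

    return all(present(t) for t in include) and not any(present(t) for t in exclude)


doc = "Acme project charter"
print(matches(doc, "acme +project"))
print(matches(doc, "acme -project"))
print(matches(doc, '"project charter"'))
```

Note that the exclusion operator must be attached directly to the term with a plain hyphen; this sketch does not recognize a typographic dash or a dash followed by a space.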

Property filters allow the user to enter a search in the form property name:value. These are managed properties that are associated with content by the indexer, and they work in the same way as property searches on the Advanced Search page. The user enters the property name followed by a colon followed by the expected property value, with no space around the colon and with quotation marks around a value that contains spaces. For example, author:"Ken Schaefer" would return all content authored by Ken Schaefer. The limitation of property filter searches in SharePoint is that the end user has no easy way to know which properties are configured as managed properties and can therefore be typed into the search box.
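
The following sketch separates property filters from ordinary keywords. It assumes the value is written directly after the colon, with quotation marks around multi-word values, and it checks each property name against a list of managed properties that you supply; that list and the function name are hypothetical. It also illustrates the limitation noted above: a name that is not a managed property is not treated as a filter at all.

```python
import re

# name:value or name:"quoted value", with no space around the colon.
PROPERTY_FILTER = re.compile(r'(\w+):(?:"([^"]+)"|(\S+))')


def split_query(query, managed_properties):
    """Return (keywords, filters); unknown property names stay in the keywords."""
    filters = {}

    def grab(match):
        name = match.group(1).lower()
        if name not in managed_properties:
            return match.group(0)  # leave the text in place as an ordinary keyword
        filters[name] = match.group(2) or match.group(3)
        return ""

    keywords = PROPERTY_FILTER.sub(grab, query).split()
    return keywords, filters


# Assumed set of managed property names, written in lowercase.
managed = {"author", "filetype"}
print(split_query('gears author:"Ken Schaefer" filetype:docx', managed))
print(split_query("gears department:sales", managed))
```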

Wildcard search is a new feature in SharePoint 2010 that allows the user to enter an asterisk character at the end of a keyword. For example, a search for “ac*” could return results for academic, acme, and so on.
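
A trailing asterisk is a prefix match. One common way for an index to answer such a query is to keep its vocabulary sorted and scan from the first term at or after the prefix. The sketch below shows that idea on a made-up vocabulary; it is a generic technique and is not a description of how SharePoint's index is built.

```python
from bisect import bisect_left


def expand_prefix(pattern, sorted_terms):
    """Expand a pattern such as 'ac*' into the matching terms of a sorted vocabulary."""
    if not pattern.endswith("*"):
        return [pattern] if pattern in sorted_terms else []
    prefix = pattern[:-1]
    start = bisect_left(sorted_terms, prefix)  # first term >= prefix
    matched = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order means no later term can match
        matched.append(term)
    return matched


vocabulary = sorted(["budget", "acme", "academic", "action", "gears"])
print(expand_prefix("ac*", vocabulary))
```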

Another addition in SharePoint 2010 is Boolean syntax. Users may enter search terms using parentheses to organize the order in which search terms are utilized in the query. The user may also use AND, OR, and NOT to specify how matches are handled, as well as =, <, >, <=, and >= to refine the matches even more.
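
The way parentheses and AND, OR, and NOT combine can be illustrated with a small evaluator over sets of document IDs. This is a teaching sketch with a made-up index: it handles only the three Boolean operators and parentheses (not the comparison operators), gives AND higher precedence than OR, and makes no claim about how SharePoint evaluates queries internally.

```python
import re


def evaluate(query, index, all_docs):
    """Evaluate a query with AND, OR, NOT and parentheses against a term -> doc-ID index."""
    tokens = re.findall(r"\(|\)|[^\s()]+", query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        token = tokens[pos]
        pos += 1
        return token

    def factor():
        token = take()
        if token == "NOT":
            return all_docs - factor()
        if token == "(":
            docs = expr()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return docs
        return set(index.get(token.lower(), set()))

    def term():
        docs = factor()
        while peek() == "AND":
            take()
            docs &= factor()
        return docs

    def expr():
        docs = term()
        while peek() == "OR":
            take()
            docs |= term()
        return docs

    result = expr()
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]}")
    return result


# Hypothetical index: term -> IDs of the documents that contain it.
index = {"acme": {1, 2, 3}, "gears": {3, 4}, "draft": {2, 4}}
all_docs = {1, 2, 3, 4}
print(evaluate("(acme OR gears) AND NOT draft", index, all_docs))
```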

SOCIAL SEARCH

Social Search is a capability unique to SharePoint Server 2010 and FAST Search Server 2010 for SharePoint. SharePoint’s profile and MySite capabilities are integrated with search to provide a search experience that combines traditional personal contact search with the best features of social networking. This allows the organization to begin identifying areas of expertise and locating the most active contributors of content and institutional knowledge.

People Search

Searching for people in organizations presents special challenges. In the past, solutions that provided People Search–like capabilities were nothing more than enhanced address books. The challenge for People Search (Figure 18-9) today is that users are accustomed to looking up people through social networks to gather much more than simple e-mail address and contact information. Today’s social networks provide search users with visibility into people’s activities, their expertise, and other information about the person. Organizations have struggled for years to find solutions that can offer the organization the same kinds of information the public social networks provide. SharePoint addresses all these challenges in its Social Search capabilities.

People Search is limited to the two SharePoint products (SharePoint Server 2010 and FAST Search Server 2010 for SharePoint) because it relies on the user profile capabilities of the SharePoint platform. User profiles combined with the MySite features of SharePoint provide an information-rich platform for personnel data that in many ways resembles a social networking site and which leverages many of the best features of those social sites, such as microblogging, personal profile information, and organizational information.

Search results in the People Search results page reflect much of the information that is stored in user profiles and on the MySite page (Figure 18-9). Personal pictures, About Me descriptions, and Ask Me About content are gleaned from the data entered into a person’s MySite page and presented as part of the result, allowing search users to quickly browse important information about the person. This is important to the search user who is looking for experts in a particular area or isn’t sure if the person she is looking at is Erik from accounting or Erik from sales.

People Search also allows search users to locate content that is authored by a particular person. For example, a search user goes to a training course that was put together and presented by Sally Smith. Sally presented a PowerPoint slide deck that she uploaded to a SharePoint site that is indexed by the crawler. Perhaps the search user has forgotten the exact name of the training, or the particular PowerPoint slide deck isn’t coming up in the search results. People Search provides a solution to this problem by allowing the search user to dig into content authored by a person through a hyperlink in the results pane.

One really neat feature in MySite, presented by People Search results, is a view into the organizational chart using the Silverlight-based organizational viewer. This is a dynamic, animated organizational browser that shows the organization chart from the selected person’s position in the organization, in the form of sliding panes that show peers, subordinates, and superiors.

Despite the abundance of rich information that the People Search results page displays, it is highly dependent on profiles and MySite personal sites. If you plan to leverage this capability in any meaningful way, you must make a special effort to make sure that profiles are complete and imported into SharePoint. In most organizations, user profiles will be maintained in Active Directory. Active Directory allows the management of a great deal of contact and organizational information about each person in the organization. Similarly, SharePoint has the capability to maintain personnel information in its own profiles. The decision for most organizations is to determine which system will be the system of record that will be maintained with all the necessary information. If your organization determines that AD is the system of record, you will need to set up SharePoint to use the User Profile Synchronization Service (a topic beyond the scope of this chapter).

People results are also dependent on the content that individual users put into their MySite pages. If these users don’t fill in the About Me and Ask Me About sections of their MySite personal site, that information will not appear in the search results. To effectively leverage People Search in SharePoint it is important to make MySite available to the users and to encourage them to fill in the information in their MySite profiles. Otherwise, many of the social-search enhancements that SharePoint offers are useless.

How Social Behavior Influences Relevance

SharePoint Search’s relevance calculations take into account all aspects of social search: content that people author, the relative location of people in the organizational chart, content entered into MySite personal profiles, and tags that people apply to content. These additional capabilities, which are found only in the SharePoint search products, take search to a new level by presenting results in a context-oriented fashion that is personally relevant to the search user.

One of the key ways that SharePoint calculates relevance is through social distance based on the organizational chart. Content that is authored, modified, and tagged by people who are directly connected in the organization chart will appear before content of people who are many degrees away. SharePoint also takes into account any tagging or feedback that people close to the searcher have applied to content, which will also cause results to bubble up in relevance.

What if a person is far away on the organization chart from the search user, but the two are currently working together on a cross-departmental project? In the People Search results pane, each result entry contains a link to add a person as a colleague. By selecting this link, the search user places the person within his or her social network, which will in turn impact the relevance of search results.
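
Social distance can be pictured as the number of links between two people in a graph built from the organization chart, with colleague relationships adding shortcuts. The sketch below uses a breadth-first search over a made-up organization; the graph layout, the names, and the assumption that a colleague link works in both directions are illustrative only and say nothing about SharePoint's actual ranking model.

```python
from collections import deque


def add_link(graph, a, b):
    """Connect two people (treated here as a two-way relationship)."""
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def social_distance(graph, start, target):
    """Number of links on the shortest path between two people, or None if unconnected."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, dist = queue.popleft()
        for neighbor in graph.get(person, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


# Hypothetical organization chart: manager/report links only.
org = {}
for manager, report in [
    ("ceo", "eng_vp"), ("eng_vp", "searcher"),
    ("ceo", "sales_vp"), ("sales_vp", "sales_mgr"), ("sales_mgr", "author"),
]:
    add_link(org, manager, report)

before = social_distance(org, "searcher", "author")
add_link(org, "searcher", "author")  # the searcher adds the author as a colleague
after = social_distance(org, "searcher", "author")
print(before, after)
```

In this example the two people start five links apart through the chart, and the colleague link reduces that to one, which is the effect the cross-departmental scenario describes.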

SharePoint goes a step beyond the simple metrics of social distance and also factors in social frequency. This takes two forms: authorship and social participation. Authorship is a factor based on how much a person interacts with content in the system: the more content a user authors or edits, the higher that user’s content ranks in search results. Not only does this place a particular user’s content above others, it also helps the organization identify centers of expertise. Social participation works in much the same way by factoring in the amount of tagging that a person applies to content.

SUMMARY

Search in the enterprise using the SharePoint 2010 stack of products involves a lot of moving parts. The crawling engine gathers information about content from a variety of locations including SharePoint, databases and line-of-business systems, websites, file shares, Exchange, and Lotus Notes. The crawler communicates with the indexer, which organizes and stores the crawled content and properties in index databases. The query server pulls search results from the index databases in response to search queries, and the user interface accepts queries and presents results.

Selecting a search product in the SharePoint 2010 stack can be a daunting exercise because many search offerings are intended for different purposes, from entry level to full-featured enterprise-scale solutions. Deciding between products and when to migrate to higher levels requires that you complete regular audits of your users and the content that the system is crawling.

All the SharePoint search products provide a rich set of user interfaces that allows the user to submit search queries and view results. The search experience is greatly enhanced with features that allow for refinement of searches (previously known as search facets), the display of Best Bets and exact matches, and at the higher end thumbnails and previews.

After reading this chapter, you should have a good understanding of what search is, as well as what can be done with search in the SharePoint 2010 stack. The next chapter digs deeper into building your search solution so that it will successfully deliver relevant and timely results to your search users.
