2.2. Selecting Resources

When we talk about organizing systems, we often do so in terms of the contents of their collections. This implies that the most fundamental decision for an organizing system is determining its resource domain, the group or type of resources that are being organized. This decision is usually a constraint, not a choice; we acquire or encounter some resources that we need to interact with over time, and we need to organize them so we can do that effectively.

Selecting is the process by which resources are identified, evaluated, and then added to a collection in an organizing system. Selection is first shaped by the domain and then by the scope of the organizing system, which can be analyzed through six interrelated aspects:

  1. the number and nature of users

  2. the time span or lifetime over which the organizing system is expected to operate

  3. the size of the collection

  4. the expected changes to the collection

  5. the physical or technological environment in which the organizing system is situated or implemented

  6. the relationship of the organizing system to other ones that overlap with it in domain or scope.

(In Chapter 10, “The Organizing System Roadmap,” we discuss these six aspects in more detail.)

2.2.1. Selecting {and, or, vs.} Organizing

Many types of resources are inevitably evaluated one at a time. It is impossible to specify in advance every property or criterion that might be considered in making a selection decision, especially for unique or rare resources like those being considered by a museum or private collector. As a result, organizing activities typically occur after selection takes place, as in the closet organizing system with which we began this chapter.

When the resources being considered for a collection are more homogeneous and predictable, it is possible to specify selection criteria and organizing principles in advance. This makes selection and organizing into concurrent activities. You expect your email in-box will receive spam messages, so you might as well create a spam folder where the spam filter can deposit the messages it classifies as spam.

Some organizing systems acquire all resources of a particular type or from a particular source. Resources are then automatically added to the collection according to an organizing decision that need be made only for the first resource, with subsequent resources further organized according to properties that minimally distinguish them from each other, like their creation or acquisition dates. Syndicated or subscription resources like news feeds or serial publications are most often organized in this manner, where the organization imposed on the first resource acquired is replicated for each subsequent one. If you subscribe to a printed magazine, the magazines undoubtedly end up in a stack or pile with the most recent issue on the top.

Finally, as we pointed out in the sidebar, What about “Creating” Resources?, selection can sometimes follow organizing. An order management system cannot add new orders until it has a defined schema for creating them.

2.2.2. Selection Principles

Selection must be an intentional process because, by definition, an organizing system contains resources whose selection and arrangement was determined by human or computational agents, not by natural processes. Selection methods and criteria vary across resource domains. Resource selection policies are often shaped by laws, regulations or policies that require or prohibit the collection of certain kinds of objects or types of information.36[Law]

[36][Law] Some governments attempt to preserve and prevent misappropriation of “cultural property” by enforcing import or export controls on antiquities that might be stolen from archaeological sites (Merryman 2006). For digital resources, privacy laws prohibit the collection or misuse of personally identifiable information about healthcare, education, telecommunications, and video rental, and might soon restrict the information collected during web browsing.

Libraries typically select resources on the basis of their utility and relevance to their user populations, and try to choose resources that add the most value to their existing collections, given the cost constraints that most libraries are currently facing. In contrast, museums often emphasize intrinsic value, scarcity, or uniqueness as selection criteria, even if the resources lack any contemporary use. Both libraries and museums typically formalize their selection principles in collection development policies that establish priorities for acquiring resources that reflect the people they serve and the services they provide to them. Precise and formal selection principles enable users of a collection to be confident that it contains the most important and useful resources.

Adding a resource to a museum implies an obligation to preserve it forever, so many museums follow rigorous accessioning procedures before accepting it. Likewise, archives usually perform an additional appraisal step to determine the quality and value of materials offered to them. In archives, common appraisal criteria include uniqueness, the credibility of the source, the extent of documentation, and the rights and potential for reuse. To oversimplify: libraries decide what to keep, museums decide what to accept, and archives decide what to throw away.37[LIS]

[37][LIS] Large research libraries have historically viewed their collections as their intellectual capital and have policies that specify the subjects and sources that they intend to emphasize as they build their collections. See (Evans 2000). Museums are often wary of accepting items that might not have been legally acquired or that have claims on them from donor heirs or descendant groups; in the USA, much controversy exists because museums contain many human skeletal remains and artifacts that Native American groups want to be “repatriated.”

In the for-profit sector, well-run firms are similarly systematic in selecting the resources that must be managed and the information needed to manage them. “Selecting the right resource for the job” is a clichéd way of saying this, but this slogan nonetheless applies broadly to human resources, functional equipment, or information that drives business processes.38[Bus]

[38][Bus] Selection of a person involves assessing the match between competencies and capabilities (expressed verbally or in a resume, or demonstrated in some qualification test) and what is needed to do the required activities. Selection of athletes for sports teams can involve psychological, behavioral, and performance criteria and has become highly data-intensive, as the Moneyball book (Lewis 2003) and 2011 movie starring Brad Pitt demonstrate.

One way in which the selection of human resources differs notably from selection by libraries and museums is that people are often selected on the basis of predicted rather than current properties, capability, or suitability. Sports teams often sign promising athletes for their minor league teams, and businesses hire interns, train their employees, and run executive development programs to prepare promising low-level managers for executive roles. Some firms even pay to train potential workers in “workforce development” or “pipeline” programs.

The organizing systems for managing sales, orders, customers, inventory, personnel, and finance information are tailored to the specific information needed to run that part of the company’s operations. Identifying this information is the job of business analysts and data modelers. Much of this operational data is combined in huge “data warehouses” to support the “business analytics” function in which novel combinations and relationships among data items are explored by selecting subsets of the data.

When multiple sets of data resources are combined, it is essential that “data cleaning” is performed to eliminate redundancy, to ensure that entities in the data are described using the same units at the same point in time, and to conform with applicable syntactic and semantic standards. Some data cleaning applies to every resource in a data set, as when every “Zip Code” in a United States mailing directory is given the more universal “Postal Code” label. However, data cleaning more often involves analysis, repair, and validation of resource instances. Detecting duplicate records is especially important because including them can produce misleading statistics and predictions, as well as creating the nuisance for consumers of receiving multiple copies of product catalogs, each with a different misspelling in a name or address.39[Com]

[39][Com] On data modeling: see (Kent 2012), (Silverston 2000), (Glushko and McGrath 2005). For data warehouses see (Turban et al. 2010).

For a classification and review of data cleaning problems and methods, see (Rahm and Do 2000). A recent and popular analysis that describes data cleaning as "data wrangling, data munging, and data janitor work" is (Lohr 2014).
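The two kinds of data cleaning described above, a rule applied to every record and instance-level detection of duplicates, can be sketched as follows. This is only a minimal illustration under stated assumptions: the field names, the sample records, and the 0.85 similarity threshold are hypothetical, and the string comparison from Python's standard `difflib` module stands in for the more robust matching methods surveyed in the literature cited above.

```python
from difflib import SequenceMatcher


def normalize(record):
    """Apply cleaning rules that hold for every record in the data set."""
    cleaned = dict(record)
    if "zip_code" in cleaned:
        # The relabeling from the text: "Zip Code" becomes "Postal Code".
        cleaned["postal_code"] = cleaned.pop("zip_code")
    # Lowercase and collapse runs of whitespace so trivial differences vanish.
    cleaned["name"] = " ".join(cleaned["name"].lower().split())
    cleaned["address"] = " ".join(cleaned["address"].lower().split())
    return cleaned


def similarity(a, b):
    """Return a ratio between 0 and 1 describing how alike two strings are."""
    return SequenceMatcher(None, a, b).ratio()


def likely_duplicates(records, threshold=0.85):
    """Return index pairs of records that probably describe the same person.

    The threshold is an assumed value chosen for this example only.
    """
    cleaned = [normalize(r) for r in records]
    pairs = []
    for i in range(len(cleaned)):
        for j in range(i + 1, len(cleaned)):
            a, b = cleaned[i], cleaned[j]
            # Only compare records that share a postal code.
            if a.get("postal_code") != b.get("postal_code"):
                continue
            if (similarity(a["name"], b["name"]) >= threshold
                    and similarity(a["address"], b["address"]) >= threshold):
                pairs.append((i, j))
    return pairs


# Hypothetical mailing list with one misspelled duplicate.
mailing_list = [
    {"name": "Maria Alvarez", "address": "12 Oak Lane", "zip_code": "94720"},
    {"name": "Maria  Alvares", "address": "12 Oak Lane", "zip_code": "94720"},
    {"name": "John Smith", "address": "7 Elm Street", "zip_code": "10001"},
]

duplicate_pairs = likely_duplicates(mailing_list)
```

Comparing every pair of records is practical only for small data sets; a real system would typically group candidate records first and have a person or a validation rule confirm each match before merging.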

Selection is an essential activity in creating organizing systems whose purpose is to combine separate web services or resources to create a composite service or application according to the business design philosophy of Service Oriented Architecture (SOA).40[Com] When an information-intensive enterprise combines its internal services with outsourced ones provided by other firms, the resources are selected to create a combined collection of services according to the “core competency” principle: resources are selected and combined to exploit the enterprise’s internal capabilities and those of its service partners better than any other combination of services could.41[Bus]

[40][Com] See (Cherbakov et al. 2005), (Erl 2005a). The essence of SOA is to treat business services or functions as components that can be combined as needed. An SOA enables a business to quickly and cost-effectively change how it does business and whom it does business with (suppliers, business partners, or customers). SOA is generally implemented using web services that exchange Extensible Markup Language (XML) documents in real-time information flows to interconnect the business service components. If the business service components are described abstractly, it can be possible for one service provider to be transparently substituted for another (a kind of real-time resource selection) to maintain the desired quality of service. For example, a web retailer might send a Shipping Request to many delivery services, one of which is selected to provide the service. It probably does not matter to the customer which delivery service handles the package, and it might not even matter to the retailer.
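The Shipping Request example can be sketched as a small selection routine. This is a simplified illustration, not an SOA implementation: the provider names, prices, delivery times, and the "cheapest quote that meets the deadline" rule are all assumptions made for the example, and ordinary Python functions stand in for the web services that would exchange XML documents in practice.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    provider: str
    price: float
    days: int


def select_provider(providers, weight_kg, max_days):
    """Send the same abstract request to every provider and pick one.

    `providers` maps a provider name to a function that takes a weight and
    returns (price, delivery_days). Because every provider answers the same
    abstract request, any of them can be substituted for another.
    """
    quotes = []
    for name, quote_fn in providers.items():
        price, days = quote_fn(weight_kg)
        quotes.append(Quote(name, price, days))
    # Keep only the quotes that meet the required quality of service.
    acceptable = [q for q in quotes if q.days <= max_days]
    if not acceptable:
        return None
    # Assumed selection rule: the cheapest acceptable quote wins.
    return min(acceptable, key=lambda q: q.price)


# Hypothetical delivery services with made-up pricing formulas.
providers = {
    "FastShip": lambda kg: (12.0 + 2.0 * kg, 1),
    "EconoPost": lambda kg: (5.0 + 1.0 * kg, 5),
    "MidWay": lambda kg: (8.0 + 1.5 * kg, 3),
}

choice = select_provider(providers, weight_kg=2.0, max_days=3)
```

The retailer's code never names a particular delivery service, which is why neither the retailer nor the customer needs to care which one is chosen.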

[41][Bus] The idea that a firm’s long term success can depend on just a handful of critical capabilities that cut across current technologies and organizational boundaries makes a firm’s core competency a very abstract conceptual model of how it is organized. This concept was first proposed by (Prahalad and Hamel 1990), and since then there have been literally hundreds of business books that all say essentially the same thing: you cannot be good at everything; choose what you need to be good at and focus on getting better at it; let someone else do things that you do not need to be good at doing.

Even when the selection principles behind a collection are clear and consistent, they can be unconventional, idiosyncratic, or otherwise biased by the perspective and experience of the collector. This is sometimes the case in museum or library collections that began or grew opportunistically through the acquisition of private collections that reflect a highly individual point of view.

It is especially easy to see the collector’s point of view in personal collections. Most of the clothes and shoes you own have a reason for being in your closet, but could anyone else explain the contents of your closet and its organizing system, and why you bought that crazy-looking dress or shirt?

2.2.3. Selection of Digital and Web-based Resources

Digitization is substantially changing how libraries select resources. Digital content can be delivered anywhere quickly and cheaply, making it easier for a group of cooperating libraries to share resources. For example, while each campus of the University of California system has its own libraries and library catalogs, system-wide catalogs and digital content delivery reduce the need for every campus to have any particular resource in its own collection.42[LIS]

[42][LIS] See (Borgman 2000) on digitization and libraries. But while shared collections benefit users and reduce acquisition costs, if a library has defined itself as a physical place and emphasizes its holdings (the resources it directly controls), it might resist anything that reduces the importance of its physical reification, the size of its holdings or the control it has over resources (Sandler 2006). A challenge facing conventional libraries today is to make the transition from a perspective that emphasizes creation and preservation of physical collections to facilitating the use and creation of knowledge regardless of the medium of its representation and the physical or virtual location from which it is accessed.

Digitization has had extremely important impacts on the manner in which collections of information resources are created in information-intensive domains such as transportation, retailing, supply chain management, healthcare, energy management, and “big science” where a torrent of low-level information is captured from GPS devices, RFID tags, sensors and science labs. Businesses that once had to rely on limited historical data analysis and printed reports now have to deal with a constant stream of real-time information.

An analogous situation has evolved with personal collections of photographs. Less than two decades ago, before the digital camera became a consumer product, the time and expense of developing photographs induced people to take photos carefully and cautiously. Today the proliferation of digital cameras and photo-capable phones has made it so easy to take digital photos and videos that people are less selective and take many photos or videos of the same scene or event.

The nature and scale of the web change how we collect resources and fundamentally challenge how we think of resources in the first place. Web-based resources cannot be selected for a collection by consulting a centralized authoritative directory, catalog, or index because one does not exist. And although your favorite web search engine consults an index or directory of web resources when you enter a search query, you do not know where that index or directory came from or how it was assembled.43[Web]

[43][Web] (Arasu et al. 2001), (Manning et al. 2008). The web is a graph, so all web crawlers use graph traversal algorithms to find URIs of web resources and then add any hyperlinks they find to the list of URIs to visit. The sheer size of the web makes crawling its pages a bandwidth- and computation-intensive process, and since some pages change frequently and others not at all, an effective crawler must be smart about how it prioritizes the pages it collects and how it re-crawls pages. A web crawler for a search engine can determine the most relevant, popular, and credible pages from query logs and visit them more often. For other sites, a crawler adjusts its “revisit frequency” based on the “change frequency” (Cho and Garcia-Molina 2000).
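The graph traversal described in this note can be sketched with a breadth-first crawl over a tiny in-memory link graph. This is a minimal sketch under stated assumptions: the `toy_web` dictionary and its URIs are invented, no pages are actually fetched, and the `revisit_interval` function is a deliberately crude rule of thumb for tying revisit frequency to change frequency, not the method published by Cho and Garcia-Molina.

```python
from collections import deque


def crawl(link_graph, seed, max_pages=100):
    """Visit URIs breadth-first, adding newly found links to the frontier.

    `link_graph` maps each URI to the list of URIs it links to, standing in
    for fetching a page and extracting its hyperlinks.
    """
    visited = []
    seen = {seed}
    frontier = deque([seed])
    while frontier and len(visited) < max_pages:
        uri = frontier.popleft()
        visited.append(uri)
        for linked in link_graph.get(uri, []):
            if linked not in seen:  # never queue the same URI twice
                seen.add(linked)
                frontier.append(linked)
    return visited


def revisit_interval(observed_changes, observation_days,
                     min_days=1.0, max_days=30.0):
    """Assumed heuristic: revisit about as often as the page has changed."""
    if observed_changes == 0:
        return max_days
    average_days_between_changes = observation_days / observed_changes
    return max(min_days, min(max_days, average_days_between_changes))


# Hypothetical four-page web.
toy_web = {
    "a.example/": ["a.example/news", "b.example/"],
    "a.example/news": ["a.example/"],
    "b.example/": ["c.example/"],
    "c.example/": [],
}

crawl_order = crawl(toy_web, "a.example/")
```

A page seen to change 10 times in 30 days would be revisited about every 3 days under this rule, while a page that never changed would wait the full 30 days.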

The contents of a collection and how it is organized always reflect its intended users and uses. But the web has universal scope and global reach, making most of the web irrelevant to most people most of the time. Researchers have attacked this problem by treating the web as a combination of a very large number of topic-based or domain-specific collections of resources, and then developing techniques for extracting these collections as digital libraries targeted for particular users and uses.44[Web]

[44][Web] Web resources are typically discovered by computerized “web crawlers” that find them by following links in a methodical automated manner. Web crawlers can be used to create topic-based or domain-specific collections of web resources by changing the “breadth-first” policy of generic crawlers to a “best-first” approach. Such “focused crawlers” only visit pages that have a high probability of being relevant to the topic or domain, which can be estimated by analyzing the similarity of the text of the linking and linked pages, by examining terms in the linked page’s URI, or by locating explicit semantic annotation that describes their content or their interfaces if they are invokable services (Bergmark et al. 2002), (Ding et al. 2004).
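The change from a “breadth-first” to a “best-first” policy can be sketched by replacing the crawler's queue with a priority queue. Everything specific here is an assumption made for illustration: the toy pages, the topic terms, the scoring formula (share of topic words in the linking page's text plus a fixed bonus when a topic term appears in the linked URI), and the 0.3 cutoff. The focused crawlers in the cited work use more sophisticated relevance estimates.

```python
import heapq


def relevance(text, topic_terms):
    """Fraction of the words in `text` that are topic terms."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(1 for word in words if word in topic_terms) / len(words)


def link_priority(linking_text, linked_uri, topic_terms):
    """Estimate a link's relevance before visiting it (assumed formula)."""
    uri = linked_uri.lower()
    uri_bonus = 0.5 if any(term in uri for term in topic_terms) else 0.0
    return relevance(linking_text, topic_terms) + uri_bonus


def focused_crawl(pages, seed, topic_terms, min_priority=0.3, max_pages=50):
    """Visit the most promising link first and skip unpromising ones.

    `pages` maps a URI to (page_text, outgoing_links).
    """
    # heapq is a min-heap, so priorities are negated to pop the best first.
    frontier = [(-1.0, seed)]
    seen = {seed}
    collected = []
    while frontier and len(collected) < max_pages:
        _, uri = heapq.heappop(frontier)
        if uri not in pages:
            continue
        text, links = pages[uri]
        collected.append(uri)
        for linked in links:
            if linked in seen:
                continue
            priority = link_priority(text, linked, topic_terms)
            if priority >= min_priority:
                seen.add(linked)
                heapq.heappush(frontier, (-priority, linked))
    return collected


# Hypothetical pages: uri -> (text, links).
toy_pages = {
    "seed.example/": ("a directory of links to museum sites",
                      ["x.example/museum-collections", "y.example/sports"]),
    "x.example/museum-collections": ("museum collections and archive policy",
                                     ["z.example/archive"]),
    "y.example/sports": ("football scores today", []),
    "z.example/archive": ("archive appraisal criteria", []),
}
topic = {"museum", "library", "archive"}

collection = focused_crawl(toy_pages, "seed.example/", topic)
```

In this toy run the sports page is never visited because its link scores below the cutoff, so the resulting collection contains only the seed and the two topic-related pages.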
