CHAPTER 1 SETTING THE SCENE 

The universe (which others call the Library) is composed of an indefinite and perhaps infinite number of hexagonal galleries, with vast air shafts between, surrounded by very low railings …

Thus begins Jorge Luis Borges’s fable The Library of Babel, which conjures up an image not unlike the World Wide Web. He gives a surreal description of the Library, which includes spiral staircases that “sink abysmally and soar upwards to remote distances” and mirrors that lead the inhabitants to conjecture whether or not the Library is infinite (“… I prefer to dream that their polished surfaces represent and promise the infinite,” declares Borges’s anonymous narrator). Next he tells of the life of its inhabitants, who live and die in this bleak space, traveling from gallery to gallery in their youth and in later years specializing in the contents of a small locality of this unbounded labyrinth. Then he describes the contents: every conceivable book is here, “the archangels’ autobiographies, the faithful catalogue of the Library, thousands and thousands of false catalogues, the demonstration of the fallacy of those catalogues, the demonstration of the fallacy of the true catalogue….”

Although the celebrated Argentine writer wrote this enigmatic little tale in 1941, it resonates with echoes of today’s World Wide Web. “The impious maintain that nonsense is normal in the Library and the reasonable is an almost miraculous exception.” But there are differences: travelers confirm that no two books in Borges’s Library are identical—in sharp contrast with the web, replete with redundancy.

The universe (which others call the Web) is exactly what this book is about. And the universe is not always a happy place. Despite the apparent glut of information in Borges’s Library of Babel, its books are completely useless to the reader, leading the librarians to a state of suicidal despair. Today we stand at the epicenter of a revolution in how our society creates, organizes, locates, presents, and preserves information—and misinformation. We are battered by lies, from junk e-mail, to other people’s misconceptions, to advertisements dressed up as hard news, to infotainment in which the borders of fact and fiction are deliberately smeared. It’s hard to make sense of the maelstrom: we feel confused, disoriented, unconfident, wary of the future, unsure even of the present.

Take heart: there have been revolutions before. To gain a sense of perspective, let’s glance briefly at another upheaval, one that caused far more chaos by overturning not just information but science and society as well. The Enlightenment in the eighteenth century advocated rationality as a means of establishing an authoritative system of knowledge and governance, ethics, and aesthetics. In the context of the times, this was far more radical than today’s little information revolution. Up until then, society’s intellectual traditions, legal structure, and customs were dictated partly by an often tyrannical state and partly by the Church—leavened with a goodly dose of irrationality and superstition. The French Revolution was a violent manifestation of Enlightenment philosophy. The desire for rationality in government led to an attempt to end the Catholic Church and indeed Christianity in France, as well as bringing a new order to the calendar, clock, yardstick, monetary system, and legal structure. Heads rolled.

Immanuel Kant, a great German philosopher of the time, urged thinkers to have the courage to rely on their own reason and understanding rather than seeking guidance from other, ostensibly more authoritative, intellects as they had been trained to do. As our kids say today, “Grow up!” He went on to ask new philosophical questions about the present—what is happening “right now.” How can we interpret the present when we are part of it ourselves, when our own thinking influences the very object of study, when new ideas cause heads to roll? In his quest to understand the revolutionary spirit of the times, he concluded that the significance of revolutions is not in the events themselves so much as in how they are perceived and understood by people who are not actually front-line combatants. It is not the perpetrators—the actors on the world stage—who come to understand the true meaning of a revolution, but the rest of society, the audience who are swept along by the plot.

In the information revolution sparked by the World Wide Web, we are all members of the audience. We did not ask for it. We did not direct its development. We did not participate in its conception and launch, in the design of the protocols and the construction of the search engines. But it has nevertheless become a valued part of our lives: we use it, we learn from it, we put information on it for others to find. To understand it we need to learn a little of how it arose and where it came from, who the pioneers were who created it, and what they were trying to do.

The best place to begin understanding the web’s fundamental role, which is to provide access to the world’s information, is with the philosophers, for, as you probably recall from early university courses in the liberal arts, early savants like Socrates and Plato knew a thing or two about knowledge and wisdom, and how to acquire and transmit them.

ACCORDING TO THE PHILOSOPHERS …

Seeking new information presents a very old philosophical conundrum. Around 400 B.C., the Greek sage Plato spoke of how his teacher Socrates examined moral concepts such as “good” and “justice,” important everyday ideas that are used loosely without any real definition. Socrates probed students with leading questions to help them determine their underlying beliefs and map out the extent of their knowledge, and of their ignorance. The Socratic method does not supply answers but generates better hypotheses by steadily identifying and eliminating those that lead to contradictions. In a discussion about Virtue, Socrates’ student Meno stumbles upon a paradox.

Meno: And how will you enquire, Socrates, into that which you do not know? What will you put forth as the subject of enquiry? And if you find what you want, how will you ever know that this is the thing which you did not know?

Socrates: I know, Meno, what you mean; but just see what a tiresome dispute you are introducing. You argue that man cannot enquire either about that which he knows, or about that which he does not know; for if he knows, he has no need to enquire; and if not, he cannot; for he does not know the very subject about which he is to enquire.

Plato Meno, XIV 80d–e/81a (Jowett, 1949)

In other words, what is this thing called “search”? How can you tell when you have arrived at the truth when you don’t know what the truth is? Web users, this is a question for our times!

KNOWLEDGE AS RELATIONS

Socrates, typically, did not answer the question. His method was to use inquiry to compel his students into a sometimes uncomfortable examination of their own beliefs and prejudices, to unveil the extent of their ignorance. His disciple Plato was more accommodating and did at least try to provide an answer. In philosophical terms, Plato was an idealist: he thought that ideas are not created by human reason but reside in a perfect world somewhere out there. He held that knowledge is in some sense innate, buried deep within the soul, but can be dimly perceived and brought out into the light when dealing with new experiences and discoveries—particularly with the guidance of a Socratic interrogator.

Reinterpreting for the web user, we might say that we do not begin the process of discovery from scratch, but instead have access to some preexisting model that enables us to evaluate and interpret what we read. We gain knowledge by relating new information and experience to our existing model in order to make sense of our perceptions. At a personal level, knowledge creation—that is, learning—is a process without beginning or end.

The American philosopher Charles S. Peirce (1839–1914) founded a movement called “pragmatism” that strives to clarify ideas by applying the methods of science to philosophical issues. His work is highly respected by other philosophers. Bertrand Russell thought he was “certainly the greatest American thinker ever,” and Karl Popper called him one of the greatest philosophers of all time. When Peirce discussed the question of how we acquire new knowledge, or as he put it, “whether there is any cognition not determined by a previous cognition,” he concluded that knowledge consists of relations.

All the cognitive faculties we know of are relative, and consequently their products are relations. But the cognition of a relation is determined by previous cognitions. No cognition not determined by a previous cognition, then, can be known. It does not exist, then, first, because it is absolutely incognizable, and second, because a cognition only exists so far as it is known.

Peirce (1868a, p. 111)

What thinking, learning, or acquiring knowledge does is create relations between existing “cognitions”—today we would call them cognitive structures, patterns of mental activity. But where does it all begin? For Peirce, there is no such thing as the first cognition. Everything we learn is intertwined—nothing comes first, there is no beginning.

Peirce’s pragmatism sits at the very opposite end of the philosophical spectrum to Plato’s idealism. But the two reached strikingly similar conclusions: we acquire knowledge by creating relationships among elements that were formerly unconnected. For Plato, the relationships are established between the perfect world of ideas and the world of actual experience, whereas Peirce’s relations are established among different cognitions, different thoughts. Knowing is relating. When philosophers arrive at the same conclusion from diametrically opposing starting points, it’s worth listening.

The World Wide Web is a metaphor for the general knowledge creation process that both Peirce and Plato envisaged. We humans learn by connecting and linking information, the very activity that defines the web. As we will argue in the next chapters, virtually all recorded knowledge is out there on the web—or soon will be. If linking information together is the key activity that underlies learning, the links that intertwine the web will have a profound influence on the entire process of knowledge creation within our society. New knowledge will not only be born digital; it will be born fully contextualized and linked to the existing knowledge base at birth—or, more literally, at conception.

KNOWLEDGE COMMUNITIES

We often think of the acquisition of new knowledge as a passive and solitary activity, like reading a book. Nothing could be further from the truth. Plato described how Socrates managed to elicit Pythagoras’s theorem, a mathematical result commonly attributed to the eponymous Greek philosopher and mathematician who lived 200 years earlier, from an uneducated slave, an extraordinary feat. Socrates led the slave into “discovering” this result through a long series of simple questions. He first demonstrated that the slave (incorrectly) thought that if you doubled the side of a square, you doubled its area. Then he talked him through a series of simple and obvious questions that made him realize that to double the area, you must build the new square on the diagonal of the original, which is not the same thing as doubling the side.
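The geometry behind Socrates’ questions can be checked numerically. What follows is only a minimal sketch and is not drawn from Plato’s text: the function names `area`, `diagonal`, and `doubled_square_side` are illustrative assumptions, and the starting side of 2 is an arbitrary choice. It compares the slave’s first guess (double the side) with the square built on the diagonal of the original.

```python
import math


def area(side: float) -> float:
    """Area of a square with the given side length."""
    return side * side


def diagonal(side: float) -> float:
    """Diagonal of a square with the given side length (by Pythagoras)."""
    return side * math.sqrt(2)


def doubled_square_side(side: float) -> float:
    """Side of the square with twice the area: the original square's diagonal."""
    return diagonal(side)


side = 2.0                                   # arbitrary starting square (assumption)
naive = area(2 * side)                       # first guess: double the side
correct = area(doubled_square_side(side))    # square built on the diagonal

print("doubling the side multiplies the area by", naive / area(side))
print("building on the diagonal multiplies the area by", correct / area(side))
```

Doubling the side multiplies the area by four, whereas the square on the diagonal has twice the original area, up to floating-point rounding. The diagonal of that larger square is in turn twice the original side.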

We can draw two lessons from this parable. First, discovery is a dialogue. The slave could never have found the truth alone, but only when guided by a master who gave advice and corrected his mistakes. Learning is not a solitary activity. Second, the slave reaches his understanding through a dynamic and active process, gradually producing closer approximations to the truth by correcting his interpretation of the information available. Learning, even learning a one-off “fact,” is not a blinding flash of inspiration but a process of discovery that involves examining ideas and beliefs using reason and logic.

Turn now from Plato, the classical idealist, to Peirce, the modern pragmatist. He asked, what is “reality”? The complex relation between external reality, truth, and cognition has bedeviled philosophers since time immemorial, and we’ll tiptoe carefully around it. But in his discussion, Peirce described the acquisition and organization of knowledge with reference to a community:

The very origin of the conception of reality shows that this conception essentially involves the notion of a Community, without definite limits, and capable of a definite increase of knowledge.

Peirce (1868b, p. 153)

Knowledge communities are central to the World Wide Web—that is, the universe (which others call the Web). In fact, community and knowledge are so intertwined that one cannot be understood without the other. As Peirce notes, communities do not have crisp boundaries in terms of membership. Rather, they can be recognized by their members’ shared beliefs, interests, and concerns. Though their constituency changes and evolves over time, communities are characterized by a common intellectual heritage. Peirce’s “reality” implies the shared knowledge that a community, itself in constant flux, continues to sustain and develop into the future. This social interpretation of knowledge and reality is reflected in the staggering number of overlapping communities that create the web. Indeed, as we will learn in Chapter 4, today’s search engines analyze this huge network in an attempt to determine and quantify the degree of authority accorded to each page by different social communities.

KNOWLEDGE AS LANGUAGE

We learned from Plato that people gain knowledge through interaction and dialogue, and from Peirce that knowledge is community-based and that it develops dynamically over time. Another philosopher, Ludwig Wittgenstein (1889–1951), one of the twentieth century’s most influential and original thinkers, gave a third perspective on how information is transformed into knowledge. He was obsessed with the nature of language and its relationship with logic. Language is clearly a social construct: a language that others cannot understand is no use at all. Linguistic communication involves applying rules that allow people to understand one another even when they do not share the same world vision. Meaning is attributed to words through a convention that becomes established over time within a given community. Understanding, the process of transforming information into knowledge, is inextricably bound up with the linguistic habits of a social group. Thinking is inseparable from language, which is inseparable from community.

Though Wittgenstein was talking generally, his argument fits the World Wide Web perfectly. The web externalizes knowledge in the form of language, generated and disseminated by interacting communities.

We have discussed three very different thinkers from distant times and cultures (Plato, Peirce, and Wittgenstein) and discovered what they had to say about the World Wide Web, though of course they didn’t know it. Knowing is relating. Knowledge is dynamic and community-based; its creation is both discovery and dialogue. Thinking is inseparable from language, which is inseparable from community. Thus prepared, we are ready to proceed with Kant’s challenge of interpreting the revolution.

ENTER THE TECHNOLOGISTS

Norbert Wiener (1894–1964) was among the leaders of the technological revolution that took place around the time of the Second World War. He was the first American-born mathematician to win the respect of top intellects in the traditional European bastions of learning. He coined the term cybernetics and introduced it to a mass audience in a popular book entitled The Human Use of Human Beings. Though he did not foresee in detail today’s amazing diffusion of information and communication technologies, and its pivotal role in shaping our society, he had much to say about it.

THE BIRTH OF CYBERNETICS

Wiener thought that the way to understand society is by studying messages and the media used to communicate them. He wanted to analyze how machines can communicate with each other, and how people might interact with them. Kids today discuss on street corners whether their portable music player can “talk to” their family computer, or how ineptly their parents interact with TiVo, but in the 1950s it was rather unusual to use machines and interaction in the same sentence. After the war, Wiener assembled to work with him at MIT some of the brightest young researchers in electrical engineering, neuropsychology, and what would now be called artificial intelligence.

Wiener began the study of communication protocols and human-computer interaction, and these underpin the operation of the World Wide Web. Although systems like search engines are obviously the product of human intellectual activity, we interact with them as entities in their own right. Though they are patently not humanoid robots from some futuristic world or science fiction tale, we nevertheless take their advice seriously. We rely on them to sift information for us and do not think, even for a moment, about how they work inside. Even all the software gurus who developed the system would be hard pressed to explain the precise reason why a particular list of results came up for a particular query at a particular time. The process is too intricate and the information it uses too dynamic and distributed to be able to retrace all the steps involved. No single person is in control: the machine is virtually autonomous.

When retrieving information from the web, we have no option but to trust tools whose characteristics we cannot comprehend, just as in life we are often forced to trust people we don’t really know. Of course, no sources of information in real life are completely objective. When we read newspapers, we do not expect the reporter’s account to be unbiased. But we do have some idea where he or she is coming from. Prominent journalists’ biases are public knowledge; the article’s political, social, and economic orientation is manifest in its first few lines; the newspaper’s masthead sets up appropriate expectations. Web search agents give no hint of their political inclinations—to be fair, they probably have none. But the most dangerous biases are neither political nor commercial, but are implicit in the structure of the technology. They are virtually undetectable even by the developers, caught up as they are in leading the revolutionary vanguard.

All those years ago, Wiener raised ethical concerns that have, over time, been increasingly ignored. He urged us to consider what are legitimate and useful developments of technology. He worried about leaving delicate decisions to machines; yet we now uncritically rely on them to find relevant information for us. He felt that even if a computer could learn to make good choices, it should never be allowed to be the final arbiter, particularly when we are only dimly aware of the methods it uses and the principles by which it operates. People need to have a basis on which to judge whether they agree with the computer’s decision. Responsibility should never be delegated to computers, but must remain with human beings.

Wiener’s concern is particularly acute in web information retrieval. One aim of this book is to raise the issue and discuss it honestly and openly. We do not presume to have a final response, a definitive solution. But we do aspire to increase people’s awareness of the ethical issues at stake. As Kant observed, the true significance of a revolution comes not from its commanders or foot soldiers, but from its assimilation by the rest of us.

INFORMATION AS PROCESS

In 1905, not long after the Wright Brothers made the first successful powered flight by a heavier-than-air machine, Rudyard Kipling wrote a story that envisaged how technology (in this case, aeronautics) might eventually come to control humanity. He anticipated how communication shapes society and international power relationships today. With the Night Mail is set in A.D. 2000, when the world becomes fully globalized under the Aerial Board of Control (ABC), a small organization of “semi-elected” people who coordinate global transportation and communication. The ABC was founded in 1949 as an international authority with responsibility for airborne traffic and “all that that implies.” Air travel had so united the world that war had long since become obsolete. But private property was jeopardized: any building could be legitimately damaged by a plane engaged in a tricky landing procedure. Privacy was completely abandoned in the interests of technological communication and scientific progress. The machines were effectively in control.

This negative vision exasperated Wiener. He believed passionately that machines cannot in principle be in control, since they do their work at the behest of man. Only human beings can govern.

He [Kipling] has emphasized the extended physical transportation of man, rather than the transportation of language and ideas. He does not seem to realize that where a man’s word goes, and where his power of perception goes, to that point his control and in a sense his physical existence is extended. To see and to give commands to the whole world is almost the same as being everywhere.

Wiener (1950, p. 97)

Kipling’s dystopia was based on transportation technology, but Wiener took pains to point out that transporting information (i.e., bits) has quite different consequences from the transport of matter. (This was not so clear in 1950 as it is to us today.) Wiener deployed two arguments. The first was based on analyzing the kind of systems that were used to transport information. He argued that communicating machines, like communicating individuals, transcend their physical structure. Two interconnected systems comprise a new device that is greater than the sum of its parts. The whole acquires characteristics that cannot be predicted from its components. Today we see the web as having a holistic identity that transcends the sum of all the individual websites.

The second argument, even more germane to our topic, concerns the nature of information itself. In the late 1940s, Claude Shannon, a pioneer of information theory, likened information to thermodynamic entropy, for it obeys some of the same mathematical laws. Wiener inferred that information, like entropy, is not conserved in the way that physical matter is. The world is constantly changing, and you can’t store information and expect it to retain its value indefinitely. This led to some radical conclusions. For example, Wiener decried the secrecy that shrouded the scientific and technological discoveries of the Second World War; he felt that stealth was useless—even counterproductive—in maintaining the superiority of American research over the enemy’s. He believed that knowledge could best be advanced by ensuring that information remained open.

Information is not something that you can simply possess. It’s a process over time that involves producer, consumer, and intermediaries who assimilate and transmit it. It can be refined, increased, and improved by anyone in the chain. Technological tools play a relatively minor role: the actors are the beings who transform information into knowledge in order to pass it on. The activities of users affect the information itself. We filter, retrieve, catalogue, distribute, and evaluate information: we do not preserve it objectively. Even the acts of reading, selecting, transmitting, and linking transmute it into something different. Information is as delicate as it is valuable. Like an exquisite gourmet dish that is destroyed by transport in space or time, it should be enjoyed now, here at the table. Tomorrow may be too late. The world will have moved on, rendering today’s information stale.

THE PERSONAL LIBRARY

Vannevar Bush (1890–1974) is best remembered for his vision of the Memex, the forerunner of the personal digital assistant and the precursor of hypertext. One of America’s most successful scientists leading up to the Second World War, he was known not just for prolific scientific and technological achievements, but also for his prowess as a politician and scientific administrator. He became vice president and dean of engineering at MIT, his alma mater, in 1931. In 1940, he proposed an organization that would allow scientists to develop critical technologies as well as cutting-edge weapons, later named the Office of Scientific Research and Development. This placed him at the center of a network of leading scientists cooperating with military partners. With peacetime, the organization evolved under his direction into the National Science Foundation, which still funds research in the United States.

Bush’s experience as both scientist and technocrat provided the background for his 1945 vision:

A Memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility.

Bush (1945)

He put his finger on two new problems that scientists of the time were beginning to face: specialization and the sheer volume of the scientific literature. It was becoming impossible to keep abreast of current thought, even in restricted fields. Bush wrote that scientific records, in order to be useful, must be stored, consulted, and continually extended—echoing Wiener’s “information as process.”

The dream that technology would solve the problem of information overload turned out to be a mirage. But Bush proposed a solution that even today is thought-provoking and inspirational. He rejected the indexing schemes used by librarians as artificial and stultifying and suggested an alternative.

The human mind … operates by association. With one item in its grasp, it snaps instantly to the next that is suggested by the association of thoughts, in accordance with some intricate web of trails carried by the cells of the brain.

Bush (1945)

People make associative leaps when following ideas, leaps that are remarkably effective in retrieving information and making sense of raw data. Although Bush did not believe that machines could really emulate human memory, he was convinced that the Memex could augment the brain by suggesting and recording useful associations.

What Bush was suggesting had little in common with the giant calculating machines that were constructed during the 1940s. He was thinking of a desk-size workstation for information workers—lawyers, physicians, chemists, historians. Though he failed to recognize the potential of the new digital medium, his vision transcended technology and gave a glimpse of tools that might help deal with information overload. He foresaw the universe (which others call the Web) and inspired the pioneers who shaped it: Doug Engelbart, Ted Nelson, and Tim Berners-Lee.

THE HUMAN USE OF TECHNOLOGY

Although Bush did not participate directly in the artificial intelligence debate, he knew about it through his assistant Claude Shannon, who later created the theory of information that is still in use today (and also pioneered computer chess). The artificial intelligentsia of the day were striving to automate logical reasoning. But Bush thought that the highest form of human intelligence—the greatest accomplishment of the human mind, as he put it—was not logic but judgment. Judgment is the ability to select from a multitude of arguments and premises those that are most useful for achieving a particular objective. Owing more to experience than reasoning, it conjures up free association and loose connections of concepts and ideas, rather than the rigid classification structures that underlie library methods of information retrieval. He wanted machines to be able to exercise judgment:

Memex needs to graduate from its slavish following of discrete trails, even as modified by experience, and to incorporate a better way in which to examine and compare information it holds.

Bush (1959, p. 180)

Judgment is what stops people from making mistakes that affect human relationships—despite faulty data, despite violation of logic. It supplants logical deduction in the face of incomplete information. In real life, of course, data is never complete; rationality is always subject to particular circumstances and bounded by various kinds of limit. The next step for the Memex, therefore, was to exercise judgment in selecting the most useful links and trails according to the preferences of what Bush called its “master.” Today we call this “user modeling.” By the mid-1960s, his still-hypothetical machine embodied advanced features of present-day search engines.

How did Bush dream up a vision that so clearly anticipated future developments? He realized that if the information revolution was to bring us closer to what he called “social wisdom,” it must be based not just on new technical gadgets, but on a greater understanding of how to use them. “Know the user” is today a popular slogan in human-computer interface design, but in Bush’s day the technologists—not the users—were in firm control. New technology can only be revolutionary insofar as it affects people and their needs. While Wiener’s ethical concerns emphasized the human use of human beings, Bush wanted technologies that were well adapted to the needs of their human users.

THE INFORMATION REVOLUTION

The World Wide Web arose out of three major technical developments. First, with the advent of interactive systems, beginning with time-sharing and later morphing into today’s ubiquitous personal computer, people started to take the issue of human-computer interaction seriously. Second, advances in communication technology made it feasible to build large-scale computer networks. Third, changes in the way we represent knowledge led to the idea of explicitly linking individual pieces of information.

COMPUTERS AS COMMUNICATION TOOLS

J. C. R. Licklider (1915–1990) was one of the first to envision the kind of close interaction between user and computer that we now take for granted in our daily work and play. George Miller, doyen of modern psychologists, who worked with him at Harvard Laboratory during the Second World War, described him as the “all-American boy—tall, blond, and good-looking, good at everything he tried.” Unusually for a ground-breaking technologist, Lick (as he was called) was educated as an experimental psychologist and became expert in psychoacoustics, part of what we call neuroscience today. In the 1930s, psychoacoustics researchers began to use state-of-the-art electronics to measure and simulate neural stimuli. Though his background in psychology may seem tangential to his later work, it inspired his revolutionary vision of computers as tools for people to interact with.

Computers did not arrive on the scene until Lick was in mid-career, but he rapidly came to believe that they would become essential for progress in psychoacoustic research. His links with military projects gave him an opportunity to interact (helped by an expert operator) with a PDP-1, an advanced computer of the late 1950s. He described his meeting with the machine as akin to a religious conversion. As an early minicomputer, the PDP-1 was smaller and less expensive than the mainframes of the day, but nevertheless very powerful—particularly considering that it was only the size of a couple of refrigerators. An ancestor of the personal computer, it was far more suited to interactive use than other contemporary machines. Though inadequate for his needs, the PDP-1 stimulated a visionary new project: a machine that could become a scientific researcher’s assistant.

In 1957, Lick performed a little experiment: he noted down the activities of his working day. Fully 85 percent of his time was spent on clerical and mechanical tasks such as gathering data and taking notes—activities that he thought could be accomplished more efficiently by a machine. While others regarded computers as giant calculating engines that performed all the number-crunching that lies behind scientific work, as a psychologist Licklider saw them as interactive assistants that could interpret raw data in accordance with the aphorism that “the purpose of computing is insight, not numbers.”

Believing that computers could help scientists formulate models, Licklider outlined two objectives:

1) to let the computers facilitate formulative thinking as they now facilitate the solution of formulated problems, and 2) to enable men and computers to cooperate in making decisions and controlling complex situations without inflexible dependence on predetermined programs.

Licklider (1960)

He was more concerned with the immediate benefits of interactive machines than with the fanciful long-term speculations of artificial intelligence aficionados. He began a revolution based on the simple idea that, in order for computers to really help researchers, effective communication must be established between the two parties.

TIME-SHARING AND THE INTERNET

Licklider synthesized Bush’s concept of a personal library with the communication and control revolution sparked by Wiener’s cybernetics. He talked of “man-computer symbiosis”: cooperative and productive interaction between person and computer. His positive, practical attitude and unshakable belief in the fruits of symbiosis gave him credibility. Though others were thinking along the same lines, Lick soon found himself in the rare position of a man who could make his dream come true.

The U.S. Defense Department, alarmed by Russia’s lead in the space race (Sputnik, the world’s first satellite, was launched in 1957), created the Advanced Research Projects Agency (ARPA) to fund scientific projects that could significantly advance the state of the art in key technologies. The idea was to bypass bureaucracy and choose projects that promised real breakthroughs. And in 1962, Licklider was appointed director of ARPA’s Information Processing Techniques Office, with a mandate to raise awareness of the computer’s potential, not just for military command but for commercial enterprises and the advancement of laboratory science. Human-computer symbiosis was elevated from one person’s dream to a national priority.

The first advance was time-sharing technology. Interacting one-on-one with minicomputers was still too expensive to be practical on a wide scale, so systems were created that allowed many programmers to share a machine’s resources simultaneously. This technical breakthrough caused a cultural change. Suddenly programmers realized that they belonged to the same community as the computer’s end users: they shared objectives, strategies, and ways of thinking about their relationship with the machine. The idea that you could type on the keyboard and see an immediate output produced a seismic shift in how people perceived the machine and their relationship with it. This was a first step toward the symbiosis that Licklider had imagined.

The second advance was the world’s first wide-area computer network, designed to connect scientists in different institutions and facilitate the exchange of ideas. In a series of memos that foreshadowed almost everything the Internet is today, Licklider had, shortly before he was appointed, formulated the idea of a global (he light-heartedly baptized it “galactic”) computer network. Now he had the resources to build it. Time-sharing reformed communication between people and machines; the network spawned a new medium of communication between human beings. Called the ARPAnet, it went into operation in 1969 and later grew into the Internet.

In 1968, Licklider wrote of a time in which “men will be able to communicate more effectively through a machine than face to face.” He viewed the computer as something that would allow creative ideas to emerge out of the interaction of minds. Unlike passive communication devices such as the telephone, it would participate actively in the process alongside the human players. His historic paper explicitly anticipated today’s online interactive communities:

[They] will consist of geographically separated members, sometimes grouped in small clusters and sometimes working individually. They will be communities not of common location, but of common interest.

Licklider and Taylor (1968)

Although the future was bright, a caveat was expressed: access to online content and services would have to be universal for the communication revolution to achieve its full potential. If this were a privilege reserved for a few people, the existing discontinuity in the spectrum of intellectual opportunity would be increased; if it were a birthright for all, it would allow the entire population to enjoy what Licklider called “intelligence amplification.”

The same reservation applies today. Intelligence amplification will be a boon if it is available universally; a source of great inequity otherwise. The United Nations has consistently expressed profound concern at the deepening mal-distribution of access, resources, and opportunities in the information and communication field, warning that a new type of poverty, “information poverty,” looms. The Internet is failing the developing world. The knowledge gap between nations is widening. For the sake of equity, our society must focus on guaranteeing open, all-inclusive, and cooperative access to the universe of human knowledge—which others call the Web.

AUGMENTING HUMAN INTELLECT

Doug Engelbart (1925–) wanted to improve the human condition by inventing tools that help us manage our world’s growing complexity. Like Licklider, he believed that machines should assist people by taking over some of their tasks. He was the key figure behind the development of the graphical interface we all use every day. He invented the mouse, the idea of multiple overlapping windows, and an advanced collaborative computing environment of which today’s “groupware” is still but a pale reflection. He strove to augment human intellect through electronic devices that facilitate interaction and collaboration with other people. He came up with the radical new notion of “user-friendliness,” though his early users were programmers and their systems were not as friendly as one might hope.

He thought that machines and people would co-evolve, mutually influencing one another in a manner reminiscent of Licklider’s “man-computer symbiosis.” Engelbart’s ground-breaking hypermedia groupware system represented information as a network of relations in which all concepts could be reciprocally intertwined, an approach inspired by Bush’s vision of the “intricacy of the web of trails.” In fact, Engelbart wrote to Bush acknowledging his article’s influence on his own work. Links could be created at any time during the process of organizing information—the genesis of today’s hypertextual world.

Engelbart recognized from the outset that knowledge management was a crucial part of the enterprise. He foresaw a revolution that would “augment human intellect,” in which knowledge workers would be the principal actors. An essential step was to make the computer a personal device, another radical notion in the mid-1960s. Engelbart recognized that the greatest challenge was the usability of the data representation, which could be achieved only by increasing the collaborative capabilities of both individuals and devices. The key was to allow the “augmented person” to create relations easily, relations that the “augmented computer” kept track of automatically. His sci-fi vision was that human beings could evolve through interaction with their machines, and vice versa.

Engelbart’s innovative perspective caught the eye of the establishment. ARPA funded his work under the auspices of the prestigious Stanford Research Institute. When Xerox’s Palo Alto Research Center (PARC) was established at the beginning of the 1970s—it would soon become the world’s greatest human-computer research incubator—its founders recognized the importance of Engelbart’s work and began to entice researchers away from his group. In 1981, PARC produced the Star workstation, the culmination of a long line of development. Though not a commercial success in itself, Star inspired Apple’s Macintosh, which eventually provoked Microsoft into producing their now-ubiquitous Windows operating system. Through these developments, it was Engelbart and his collaborators who made the computer what it is today: a user-friendly hypertextual networked machine.

THE EMERGENCE OF HYPERTEXT

The spirit of hypertext was in the air during the 1960s. Humanities scholars imagined a tangled world of relations in which avid readers followed trails through intertwined webs of concepts. The first to try to actualize these “literary machines” was Ted Nelson (1937–), who coined the term hypertext. He discovered computers while at Harvard graduate school, and under Bush’s influence imagined that they might be used to keep track of his stream of ideas and of notes. While he thought this would be easy to realize, he later confessed that he mistook a clear view for a short distance, an endemic problem in computer programming.

Nelson saw the computer not just as a communication aid but as a completely new tool: a device to create the very content that would be communicated. He had no doubt that computers could help people in their creative thought processes. What he really wanted was something that kept track of all the author’s revisions and changes of mind, so they could revert to previous versions at will. He also imagined adding links that were like special footnotes where reader and author could skip hand in hand from one part of the work to another, transcending the traditional serial presentation of content. He developed these ideas independently of Engelbart, though he too was inspired by Bush.

Around the end of the 1970s, Nelson began Xanadu, a vision and embryo software design for a universal system of electronic information storage and access that provided inspiration to others but was never completed in itself. His contribution was to see computers as creative tools that would revolutionize the way in which literature is read and written. Computers had always been aimed at science; Nelson was among the first to direct their attention toward the humanities. This is as significant a gesture as Galileo’s four hundred years earlier when he pointed his telescope at the sky. Before that, it was a terrestrial instrument, used to spot faraway ships. Simply changing the direction of the lens transformed it into a new tool that shed fresh light onto our relationship with the world about us.

Nelson is a colorful and controversial figure who describes himself as a designer, generalist, and contrarian. He often repeats the four maxims by which he claims to lead his life: “most people are fools, most authority is malignant, God does not exist, and everything is wrong.” There’s something here to offend everyone! He imagined a community built around Xanadu, an anarchic group whose economy would be based on a system of reciprocal royalties for using one another’s text. This community was born some years later, but rested on a different economic principle: mutual cooperation in which members share their content, rewarded by little more than positive reputation and social recognition.

AND NOW, THE WEB

Nelson’s vision was transformed into reality by a young physicist. In 1980, Tim Berners-Lee (1955–) was a consultant at CERN, the European particle physics laboratory in Geneva. Stimulated by the need to communicate among dozens of institutions, he developed a program to help people track one another’s work. He whimsically called it Enquire, after a Victorian self-help book, Enquire Within Upon Everything, which had fascinated him as a boy. It was a remarkable project, even though—as we have seen—the basic idea of hypertext had already been established.

Following its conception, the web remained in utero for nearly a decade. The hypertext environment was intended only for CERN’s internal document repository. Berners-Lee continued to tinker with it, and in 1991 he sent to an Internet newsgroup a description of a project that would allow links to be made to any information, anywhere in the world. He announced a prototype hypertext editor for the NeXT, a powerful but little-used computer, and a browser for old-fashioned line-oriented terminals. He wrote: “If you’re interested in using the code, mail me. It’s very prototype, but available from [an Internet address]. It’s copyright CERN but free distribution and use is not normally a problem.”

One year later, the World Wide Web was demonstrated and distributed, along with browser software. But it only became popular when Marc Andreessen, a young programmer in the National Center for Supercomputing Applications at the University of Illinois, created a graphical web browser called Mosaic that significantly improved upon Berners-Lee’s original design. Then the web exploded and has now populated every corner of our planet.

The World Wide Web allows people to create and organize their own information spaces and share them with others. In Berners-Lee’s own words:

There was no central computer “controlling” the Web, no single network on which these protocols worked, not even an organization anywhere that “ran” the Web. The Web was not a physical “thing” that existed in a certain “place”. It was a “space” in which information could exist.

Berners-Lee (2000, p. 39)

The core ideas were a distributed, public information space freely accessible by all; organization of information into hypertext documents; no central authority to control the distribution of content; and a system for access based on the creation of web “trails” (as Bush had anticipated). Each of these characteristics is essential to the nature of the web as we know it.

The WWW is one of the greatest success stories in the history of technology. Although it exploded into the world without warning, like a supernova, the ground had been prepared over several decades: it is the culmination of the conjoint effort of philosophers, engineers, and humanities scholars. Between them, these people conjured up two revolutions, one in information dissemination and the other in human-computer interaction.

THE WORLD WIDE WEB

We opened with Borges’ Library of Babel as an analogy for the parlous state of the web, the universe with which this book is concerned. After briefly placing the information revolution in context by comparison with other revolutions (the Enlightenment and the French Revolution), we heard what three very different thinkers from distant times and cultures, Plato, Peirce and Wittgenstein, had to say about the World Wide Web. We turned to early technologists and cited some of their concerns: Wiener’s preoccupation with communication as the basis for society and with the dynamic nature of information, and Bush’s belief in the personal library and the primacy of judgment over logic. The information revolution itself was fertilized by technical developments in computer science and required a reconceptualization of the notion of who the “users” of computers really are. Finally, we recounted how the web was born in a corner of a small country at the center of Europe.

We close by reflecting on how the web is used. It’s created by people, and communities, and corporations—let’s call them “writers.” Its contents constitute a huge world archive of information. Although it certainly supports minority viewpoints, it is not a reflection of our world as it exists, as people often claim, but a reflection of our world as perceived by the writers. Although in principle almost anyone can be a writer, in practice few are—though wikis and blogs are redressing the balance. Ordinary people—let’s call them “readers”—consult the web for information every day, and they do so through search engines. For many readers, their favorite search engine is synonymous with the web. What we read is not the repository but a biased view of it, biased by our search engine.

A UNIVERSAL SOURCE OF ANSWERS?

Gottfried Wilhelm Leibniz (1646–1716) was one of the most prominent thinkers of his time. A brilliant scientist and illustrious philosopher, he was also an ambassador and influential political consultant. As a youth, he dreamed of discovering how to settle all philosophical and scientific questions by calculating the results of a specified mathematical procedure. His plan was to design a formal language (Lingua Characteristica) and inference technique (Calculus Ratiocinator) and then find a way to express questions in the language and apply the rules of the calculus mechanically. He fantasized that savants would sit together and calculate the outcome of their debate without any disagreement or confusion. Leibniz hoped that the project would avoid effort wasted in pointless discussion and argumentation. It would supply definitive, incontrovertible answers to all questions—along with an assurance that the solutions were wise, effective, and trustworthy. It would make reasoning as easy as talking and liberate us from the anxiety and injustice of wrong decisions and judgments.

Leibniz’s dream was hopelessly utopian. For a start, it presupposes that every philosophical, ethical, and scientific problem has a definitive answer, which unfortunately is not the case. The history of thought is littered with “dream of reason” projects like this. We all shrink from the insecurity inherent in decision-making, the stress involved in exercising judgment to come up with answers to delicate questions upon which our reputation rests. The myth that a fixed system of reasoning can provide a panacea to all doubts and difficulties is hard to dispel.

The web is the largest collection of information ever known. One might speculate that it contains the answer to every conceivable question—if not today, perhaps tomorrow, or next year—and revive the dream of a universal machine that can answer everything. As Leibniz suggested three centuries ago, the key would reside in the information representation. He wanted to couch everything in an artificial language in such a way that the answer to any question could be either retrieved or calculated. Fifty years ago, the computing world embarked on an analogous but far less ambitious venture: the construction of massive databases of company information. Data was normalized, fields were mandatory, data entry procedures were formalized, and rigid interrogation strategies were used to retrieve results. In the closed world of company databases, logical procedures returned the correct answer to every conceivable question.

When the web appeared, a painful adjustment took place. The logical foundation of document retrieval differs fundamentally from that of data retrieval. It is simply not possible to represent textual documents—web pages—uniformly in a way that supports answering questions about what they mean. The calculations performed by search engines recognize the lexical and statistical properties of text, but not its meaning.
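To see what it means for a calculation to recognize lexical and statistical properties but not meaning, consider a minimal sketch in Python. It assumes the simplest possible text representation, a "bag of words" that merely counts how often each word occurs. The function name and the sample sentences are our own illustrations; real search engines use far richer statistics, but the blind spot is of the same kind.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text, split it on whitespace, and count each word.

    Word order is thrown away: only the lexical content and its
    frequencies survive. (Illustrative helper, not a real engine's method.)
    """
    return Counter(text.lower().split())


first = bag_of_words("dog bites man")
second = bag_of_words("man bites dog")

# The two sentences mean very different things, yet their
# word statistics are indistinguishable.
same_statistics = first == second
print(same_statistics)
```

The comparison holds because both sentences contain exactly the same words with the same counts. Any procedure that sees only these counts cannot tell who bit whom.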

Contrast a database query with a web request. To the database, we pose a clear question that implies a direct and well-defined answer: “how many students are enrolled in Harvard courses this year?” In the worst case, the information is absent; perhaps this year’s figures are not yet available. Even then the answer is useful. But when users interrogate the web, they do not expect a unique reply but rather a set of documents that probably includes some useful information on the topic. We could ask about the U.S. government’s position on the Middle East. The response will necessarily be indeterminate and nondeterministic—and will require the user’s judgment in determining which links to follow. It is meaningless to classify the query’s outcome in stark terms of success or failure.
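The contrast can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of how any real database or search engine is built: the `enrollment` table, the four toy documents, and the crude scoring rule (count how often the query words occur) are all invented for the example.

```python
import sqlite3

# --- Data retrieval: a precise question with one well-defined answer ---
conn = sqlite3.connect(":memory:")  # hypothetical in-memory enrollment table
conn.execute("CREATE TABLE enrollment (student TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO enrollment VALUES (?, ?)",
    [("ann", 2006), ("bob", 2006), ("cy", 2005)],
)
(count,) = conn.execute(
    "SELECT COUNT(*) FROM enrollment WHERE year = ?", (2006,)
).fetchone()

# --- Document retrieval: a ranked list that the reader must still judge ---
docs = {  # toy stand-ins for web pages
    "speech.txt": "government position on the middle east peace talks",
    "travel.txt": "middle east travel guide hotels and flights",
    "recipes.txt": "bread recipes from the east coast",
    "sports.txt": "football results",
}


def rank(query: str, documents: dict) -> list:
    """Order documents by how many times the query words occur in them.

    Documents that match no query word are dropped; ties are broken
    alphabetically by document name.
    """
    terms = query.lower().split()
    scored = []
    for name, text in documents.items():
        words = text.lower().split()
        score = sum(words.count(term) for term in terms)
        if score > 0:
            scored.append((score, name))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


results = rank("government middle east", docs)
print(count)    # a single number
print(results)  # a list of candidates, some of them irrelevant
```

The database returns one number and nothing else. The ranking returns three documents, one of them about bread recipes, and it falls to the reader to decide which are worth following.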

The web user is faced with an indefinitely long list of results, most of them irrelevant. Contrast this with the precise answer obtained by a database user. Unfortunately, this deep shift in perspective is not always perceived clearly. When querying a database, the interrogator knows that there is a clear connection between request and reply. When the same question is posed to a web search engine, there is no guarantee of user satisfaction—indeed, we cannot even measure it. If the web is our sole information source and we interact using a particular tool, we have no way of evaluating the results obtained, comparing them with others, or knowing whether what we have found even scratches the surface of what is available.

Most users blindly trust their search tool, a single information source, when at their feet lies a subtle, collective, multiply-linked structure. Why? Like Leibniz, we are all seduced by the dream of reason. We feel, or hope, that it is possible to obtain a unique, definitive solution—just as we used to do when searching databases. We yearn to avoid ambiguity, the obligation to select results, the need to investigate the fallibility and evaluate the performance of search tools, to judge the quality of resources. There is no shame in this: Leibniz suffered the same misconception.

But be aware: The dream of reason is a dangerous nightmare. Users should not become victims. We must continually invent creative new procedures for searching. We must employ judgment to evaluate search tools. We must spend time and mental energy. We must not behave as though database-style integrity is guaranteed by the search tools we use every day.

WHAT USERS KNOW ABOUT WEB SEARCH

There is a mind-boggling supply of information on the web. Search engines—the “web dragons” of the book’s title—mediate between this treasure and its consumers, and purport to guarantee access to all that we require. We all use them in an attempt to find what we need to know. In the words of a recent report on Internet usage, searchers are confident, satisfied, and trusting—but they are also unaware and naïve.

The overwhelming majority of adult Internet users have used a search engine. In 2006, well over half the people online consulted one on any given day. Search engine usage rivals e-mail, which is impressive because e-mail is a private interpersonal communication activity. People employ search engines for important and trivial questions alike. They have become part of everyday life.

Though hundreds of search engines are freely and publicly available, a very few capture the overwhelming majority of the audience. According to the well-known 80/20 rule, 80 percent of users are concentrated on 20 percent of applications. The proportion is even more extreme with search engines because of a strong economic feedback loop: popular ones attract a greater volume of advertising and produce the highest revenues, which funds improved and diversified services. The dragons are out to capture our attention.

Users trust their own ability as web searchers. More than 90 percent of people who use search engines say they are confident in the answers; half are very confident. Users also judge their research activities as successful in most cases. “But how do you know,” Plato would retort, recalling Socrates’ conversation with Meno. Users should recognize Plato’s dilemma of knowledge: you cannot tell when you have arrived at the truth when you don’t know what the truth is.

The less Internet experience people have, the more successful they regard their own searches. Neophytes have overwhelming confidence that their search results are satisfactory; they grow more skeptical as their experience with search engines increases. Use is particularly prevalent among the younger generation, who feel daunted by the prospect of seeking information on the web (or anywhere else) without the help of search engines.

A report on the critical thinking of higher education students in the Internet era sums up its message in the title: “Of course it’s true; I saw it on the Internet!” Students tend to place supreme confidence in the results of search engines, without having any idea how they work or being aware that they are fundamentally commercial operations. Researchers asked students in a first-year university class on Computers and the Internet to answer several questions, some of which were prone to misinformation. The results were remarkable. Though they were not obliged to use the Internet for their information, fewer than 2 percent of students cited any other source. Students were extraordinarily confident in search engines and remained faithful to their dragon of choice throughout the survey, even when the desired answer eluded them. More interesting than the quantitative results were the students’ attitudes toward search capability. When their answers were correct, they rarely verified them against an alternative source, even though this would have been easy for them to do.

Given these findings, it is particularly disturbing that users think their search activity is successful. The students in this survey apparently place blind trust in whatever is presented to them by their favorite search engine. This is a dangerous vicious circle: users believe they are capable searchers precisely because they are uncritical toward the results that their search engine returns.

Surveys have revealed that more than two-thirds of users believe that search engines are a fair and unbiased source of information. In spite of the trust they place in these tools, the most confident users are those who are less knowledgeable and experienced in the world of search. In particular, many are unaware of two issues: commercialism, in the form of sponsored links, and privacy, or the fact that search engines have an opportunity to track each user’s search behavior. Only around 60 percent of users can identify commercially sponsored links in the search results, a proportion that has remained unchanged over the past two years. Even more prevalent is ignorance of potential privacy issues: nearly 60 percent of users are unaware that their online searches can be tracked.

SEARCHING AND SERENDIPITY

In a letter of January 28, 1754, to a British envoy in Florence, the English politician and writer Horace Walpole coined a new term: serendipity. Succinctly characterized as the art of finding something when searching for something else, the word comes from a tale called The Three Princes of Serendip (an old name for Sri Lanka). These princes journeyed widely, and as they traveled they continually made discoveries, by accident and sagacity, of things they were not seeking.

Serendipity shares salient characteristics with online information discovery. First, seeking answers to questions is not a static activity, but involves a quest, a journey. Second, exploration of new territory requires navigation tools: compass and sextant, perhaps a map and previous travelers’ diaries. Third, luck and intelligence are needed to make new discoveries: we must sagaciously interpret what we bump into by chance. Fourth, it is not possible to plan all research steps in full detail; we must be flexible enough to integrate our preliminary thoughts with hints collected in the field and adapt our strategy as required. Fifth, we often find what we are not seeking, but it nevertheless behooves us to understand and interpret the discovery. Sixth, while traveling, we will rarely encounter something completely new, for indigenous tribes already inhabit the spaces we are in the process of discovering. Innovation is about creating new connections and new relations with already known territory. Finally, serendipity implies that logical trails of discovery can never guarantee certainty. Every result is provisional and temporary, subject to revision as new vistas unfold.

Just as nature is continuously changing, so is the web. It’s a dynamic world: a collective memory that is in constant flux, not a static database that yields the same answer to the same question. Creating and using the web are activities for craftsmen as much as scholars. We strive to increase our knowledge despite the fact that we will never see the whole picture. The process is fascinating—but hardly reassuring, and to some, unnerving. There is no fixed point, no guiding star, and no guarantee. We travel with light hearts and a positive spirit, eager to face the continual challenges that a dynamic archive poses. We must be resourceful and embrace a diverse set of tools. To make sense of the universe (which others call the Web), we must recognize its social character, accept that discovery is never-ending, and exercise our judgment at all times. We must succeed where the librarians of Babel failed.

SO WHAT?

The World Wide Web is exerting a profound influence on the way we think, work, and play. It’s an absolutely unprecedented phenomenon. But although it exploded onto the scene in the space of a few years, in a way that was totally unexpected, it didn’t arise out of nowhere. The groundwork had been laid over centuries by philosophers and over decades by technological visionaries. Though they knew nothing of the web itself, looking back over their work we can see that they had a lot to say about it.

In the next chapter we will tease out another strand of our intellectual heritage, one that did not sow the seeds of the web but is now rushing headlong toward convergence with it: libraries and the world of books.

WHAT CAN YOU DO ABOUT ALL THIS?

• Read Jorge Luis Borges’ little story.

• Be creative about the tools you use to find information.

• Split your everyday search tasks into groups that call for different techniques.

• Don’t get stuck in a rut: try a different dragon for a change.

• Ask your friends whether they regard search engines as an unbiased source of information.

• Try the same search on the same search engine a week later.

• Enjoy serendipitous adventures when you next surf the web.

• Discuss the paradox of inquiry with your friends.

• Find MyLifeBits on the web and trace its origins back to Bush’s Memex.

• Express your view on the evolution of science in the Internet era.

• Read The Onion’s zany satirical article “Factual error found on Internet.”

NOTES AND SOURCES

To avoid breaking up the flow of the main text, all references are collected together at the end of each chapter. This first Notes and Sources section describes papers, books, and other resources relevant to the material covered in Chapter 1.

The Library of Babel originally appeared in Spanish in Borges’ 1941 collection of stories, El Jardín de Senderos que se Bifurcan (The Garden of Forking Paths). Borges (2000) is a translation illustrated with beautiful, evocative etchings by Erik Desmazières. We found inspiration in Michel Foucault’s writings (1984, 1994), in particular his interpretation of Kant’s essay What is Enlightenment (1784) and Kant’s views on the French Revolution in The Conflict of the Faculties (1798). First published in 1953, two years after the author’s death, Wittgenstein’s Philosophical Investigations (2003) was a major contribution to last century’s philosophical debate that raised new questions about language and its interpretation.

Turning to the technologists, Wiener sets out his ideas in two books on cybernetics (1948, 1950), while Heims (1980) provides an independent account of Wiener’s ideas and achievements. Vannevar Bush’s seminal paper (1945) is available in many anthologies on the history of technology; Nyce and Kahn (1991) give a compendium of his contributions, including his papers and archival material, along with editorial comment and discussion of his far-sighted ideas. Lick’s early ideas are set out in Licklider (1960), and his vision of how men (sic) will be able to communicate more effectively through a machine than face to face is in Licklider and Taylor (1968). Waldrop (2002) gives a detailed description of Lick’s influence on the development of man-computer symbiosis through personal computers, while Chapter 7 of Rheingold (2000) recounts his contributions to user-friendliness and the history of computer networks.

Engelbart’s Bootstrap Institute website¹ provides electronic copies of all his work, including the invention of the mouse, window, and hypermedia groupware. Bardini (2000) and Lana (2004) (the latter in Italian only) describe the role of people and machines in Engelbart’s “Augmentation of human intellect” project. You can read about Ted Nelson and how he pioneered two of the most influential cyberspace artifacts, hypertext and the Docuverse, on his website.² The fascinating history of the web, told by its inventor Berners-Lee in his book Weaving the Web (2000), recounts from the very first step the developments that brought us where we are now.

Pew Internet & American Life Project³ surveyed search engine users in a January 2005 report. The study of college students in a Computers and the Internet course was undertaken by Graham and Metaxas (2003).

 Inside the Library of Babel.

¹ www.bootstrap.org

² xanadu.com.au/ted; a detailed bibliography appears at www.mprove.de/diplom/referencesNelson.html

³ www.pewinternet.org
