Chapter 6

Big Data and Risk Assessment

Eileen R. Ridley

6.1 Introduction

The benefits of Big Data touch almost every aspect of digitized life: entertainment, academia, health, commercial enterprise, and governmental operations. However, with its breadth of reach comes greater exposure to risk and litigation. Significantly, the issue of privacy in the context of mass digitized information is relatively new. For a period of time, the public was enthralled with the benefits of the personal computer, the Internet, and personal mobile devices that made information available literally at the touch of a button. The conveniences provided by these technological advances distracted attention from the realities of how those conveniences were provided, that is, via the collection, analysis, and distribution of data. However, as the public became more educated in how the new technological world worked, it became more concerned with how personal information was retained and distributed. Indeed, public awareness has been further heightened by various scandals, such as the recent revelations regarding the National Security Agency’s (NSA’s) use of digitized information, which in turn have spawned further privacy litigation. Big Data (as distinct from the issue of privacy alone) is a relatively recent evolution of the use of data. It is therefore likely that, with increased public awareness of Big Data and its uses, there will be new legal challenges and litigation focused on individual privacy rights and the principles of transparency, notice, access, and choice in the context of Big Data.

Although there are relatively few published cases discussing litigation of Big Data issues, those that do exist provide instruction for companies engaged in Big Data analytics. In short, companies must ensure transparency and simultaneously establish the business rationale for the use of Big Data. Moreover, when constructing mechanisms to use Big Data, companies should build in processes to retain and preserve their analytics in the event of litigation.

6.2 What Is the Strategic Purpose for the Use of Big Data?

Although the commercial benefits of the use of Big Data are apparent, in the context of limiting risk, it is important for companies to be clear regarding the business purpose for the use of Big Data (and ensure their Big Data applications follow that purpose). This identification has proven to be particularly useful in the context of litigation. Indeed, courts frequently weigh the importance of the business purpose (and whether the use of Big Data exceeds that purpose) against the claimed violation of privacy (and whether the claimant was informed of the company’s intended use of the data). For companies with business models dependent on the use of Big Data, risk is best mitigated by establishing the commercial and public value of their business model.

Most recently, this principle was proven in litigation between Google, Inc. and the Authors Guild, Inc.1 The case concerned the Google Books Project: Google would scan books (including copyrighted books) and use optical character recognition technology to generate machine-readable text, thereby creating a digital copy of each book. Google would then analyze each scan and create an overall index of all the scanned books. The index, in turn, allowed users to search for a particular word or phrase throughout the scanned works. Google included security measures that prevented users from viewing a complete copy of a work, permitting only snippet views. In deciding in favor of Google against the authors’ claims of copyright infringement, the court noted that the project had many benefits, including (1) the creation of an efficient method of finding books; (2) the creation of a research tool; (3) improvement of interlibrary lending; and (4) the facilitation of finding and checking citations. Significantly, the court also noted (as a benefit) that the project promoted “data mining” or “text mining.” Data mining or text mining is essentially the analysis of Big Data to produce results specific to a particular inquiry (e.g., is a particular word used, is a particular product in demand, etc.). The court considered data mining a research tool and noted that the project permitted researchers to track the frequency of references and how word uses had changed over time, thereby providing insights about “‘fields as diverse as lexicography, the evolution of grammar, collective memory, the adoption of technology, the pursuit of fame, censorship, and historical epidemiology.’”2 Indeed, in ruling for Google, the court went out of its way to note that the public benefit of the use of Big Data and data mining supported its ruling:

In my view, Google Books provides significant public benefits. It advances the progress of the arts and sciences, while maintaining respectful consideration for the rights of authors and other creative individuals, and without adversely impacting the rights of copyright holders. It has become an invaluable research tool that permits students, teachers, librarians, and others to more efficiently identify and locate books. It has given scholars the ability, for the first time, to conduct full-text searches of tens of millions of books. It preserves books, in particular out-of-print and old books that have been forgotten in the bowels of libraries, and it gives them new life. It facilitates access to books for print-disabled and remote or underserved populations. It generates new audiences and creates new sources of income for authors and publishers. Indeed, all society benefits.3

Thus, Google successfully avoided liability by clearly defining the business purpose of its use of Big Data prior to the litigation, cogently presenting that vision to the court, and emphasizing the public benefits of the results. Significantly, Google was successful in the face of the authors’ claims to copyright, which typically trump mere claims of commercial interest in disputed works; the court nonetheless found Google’s competing commercial interests (e.g., attracting customers to purchase books) to be compelling. Moreover, Google’s use of Big Data analytics to create a public good (e.g., developing research tools) while providing some protection for the claimants’ rights (notably, Google prevented users from seeing a complete copyrighted work) enabled it to derive additional commercial benefit from its use of Big Data. The lesson: The litigation risk of using Big Data can be mitigated by a defined business purpose that (1) includes transparency, so that the consumer is informed regarding how the data is being used; (2) provides protections for any competing commercial interests; and (3) promotes the advancement of the public good.
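The mechanics the court credited (an index built over scanned text, with access limited to short snippets) can be made concrete with a minimal Python sketch. The corpus, limits, and function names below are hypothetical illustrations of the technique, not Google's actual implementation.

from collections import defaultdict

# Hypothetical corpus: book_id -> OCR-generated text.
CORPUS = {
    "moby-dick": "Call me Ishmael. Some years ago, never mind how long precisely.",
    "walden": "I went to the woods because I wished to live deliberately.",
}

SNIPPET_CHARS = 60   # show only a short window of text around each hit
MAX_SNIPPETS = 3     # cap results per book so a full copy cannot be reconstructed

def build_index(corpus):
    """Map each lowercased word to (book_id, character offset) positions."""
    index = defaultdict(list)
    for book_id, text in corpus.items():
        offset = 0
        for word in text.split():
            index[word.strip(".,!?").lower()].append((book_id, offset))
            offset += len(word) + 1
    return index

def snippet_search(index, corpus, term):
    """Return at most MAX_SNIPPETS short excerpts per book containing the term."""
    hits = defaultdict(list)
    for book_id, pos in index.get(term.lower(), []):
        if len(hits[book_id]) < MAX_SNIPPETS:
            hits[book_id].append(corpus[book_id][max(0, pos - 20):pos + SNIPPET_CHARS])
    return dict(hits)

print(snippet_search(build_index(CORPUS), CORPUS, "woods"))

The two constants carry the legal point: the index makes every word searchable (the research tool the court praised), while the snippet window and per-book cap supply the kind of security measure that protected the copyright holders' interests.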

6.3 How Does the Use of Big Data Have an Impact on the Market?

Another issue that companies using Big Data should consider in the context of risk assessment is how the use of Big Data will have an impact on the marketplace. Generally, these questions go to whether the use of the data might provide the basis for business claims like unfair competition. In reviewing these issues, companies should fully assess the market power Big Data analytics provide for the company, its vendors, and its competitors (see also Chapter 8, “The Antitrust Laws and Big Data”). This is particularly true when the company’s use of Big Data analytics provides it with a commercial benefit at the expense of another company’s commercial interest. Two recent cases highlight this issue.

In PeopleBrowsr, Inc. v. Twitter, Inc., the court noted that viable state court claims could be raised as a result of Twitter’s sudden exclusion of PeopleBrowsr from Twitter’s “Big Data analytics” market.4 That market consisted of companies that used data-mining techniques to derive insights from the flow of information generated on Twitter. In other words, Twitter provided companies with raw data that assisted those companies in marketing their products and services. Thus, a soft drink maker could gather information to determine whether its new product was trending on Twitter, which in turn could be used as a measure of the effectiveness of its marketing campaign. PeopleBrowsr participated in the market for over four years, receiving every tweet posted on Twitter through the Twitter “Firehose” and paying Twitter over $1 million per year for such access. As the court noted: “PeopleBrowsr analyzes tweets to sell information to its clients, such as insight regarding consumer reactions to products and services as well as identification of the Twitter users who have the most influence in certain locations and communities.”5 After providing such access for years, Twitter decided to identify favored companies in order to exert more control over the Twitter Big Data analytics market. PeopleBrowsr was not one of those favored and brought an action for, among other claims, unfair competition. PeopleBrowsr not only obtained a preliminary injunction against Twitter but also successfully defended against Twitter’s attempt to move the case to federal court and dismiss the action.
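The kind of insight PeopleBrowsr sold can be illustrated with a short sketch: counting daily mentions of a product term in a stream of tweets as a crude trend signal. The records, the product name, and the function are invented for illustration; this is the flavor of firehose analytics described in the case, not PeopleBrowsr's actual pipeline.

from collections import Counter
from datetime import datetime

# Hypothetical stream records; in practice these would arrive via a firehose feed.
TWEETS = [
    {"ts": "2013-03-01T09:00", "text": "Loving the new FizzCola flavor!"},
    {"ts": "2013-03-01T09:05", "text": "FizzCola is everywhere today"},
    {"ts": "2013-03-02T10:00", "text": "meh, back to coffee"},
]

def daily_mentions(tweets, term):
    """Count tweets mentioning `term`, bucketed by calendar day."""
    counts = Counter()
    for tweet in tweets:
        if term.lower() in tweet["text"].lower():
            counts[datetime.fromisoformat(tweet["ts"]).date()] += 1
    return counts

print(daily_mentions(TWEETS, "FizzCola"))  # a rising daily count suggests the product is trending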

Apparently, the court in PeopleBrowsr found Twitter’s actions to be arbitrary and potentially predatory (by unilaterally trying to control and narrow its self-created Big Data analytics market). The lesson: In the age of Big Data, companies not only must be sensitive to how they deal with consumer information but also must consider the market effects of providing their Big Data analytics to third parties—including how they determine which parties will receive such information. As brokers of Big Data analytics, companies face significant litigation risk if their actions to create, narrow, or redefine their market are considered capricious. Again, transparency and a defined business model expressly addressing the use of Big Data are critical to limit a company’s risk.

Another instructive case is Tiffany (NJ), Inc. v. eBay, Inc.6 In this case, Tiffany had identified items sold on eBay that were not genuine Tiffany products. Tiffany, of course, is a famous jeweler, and eBay is an online marketplace. Tiffany sought to protect its trademarks in a suit against eBay, contending that eBay was obligated to prohibit sellers from placing counterfeit Tiffany items on the market. How does this relate to Big Data? Tiffany presented the somewhat novel argument that eBay, as a vast online marketplace, had access to an enormous amount of data and had instituted fraud protocols that enabled it to analyze the data to assist in identifying suspect vendors. In essence, Tiffany contended that eBay was obligated to use its Big Data capabilities to root out forgeries and police the marketplace. Indeed, an expert for Tiffany testified that

using data mining techniques commonly used by corporations, eBay could have designed programs that identified listings of Tiffany items likely to be counterfeit, and that identified sellers thereof, using an algorithm to produce a “suspiciousness” score.7

Ultimately, the court rejected this contention, noting that, for purposes of trademark claims, the rights holder (i.e., Tiffany) was obligated to show that eBay actually knew that specific items purporting to be Tiffany products were in fact forgeries. Tiffany could not meet this standard. Further, the court noted that the law did not obligate eBay to use its Big Data capability to police the site. However, the court took pains to note the following:

The result of the application of this legal standard is that Tiffany must ultimately bear the burden of protecting its trademark. Policymakers may yet decide that the law as it stands is inadequate to protect rights of owners in light of the increasing scope of Internet commerce and the concomitant rise in potential trademark infringement. Nevertheless, under the law as it currently stands, it does not matter whether eBay or Tiffany could more efficiently bear the burden of policing the eBay website for Tiffany counterfeits—an open question left unresolved by this trial.8

Thus, the court seems to warn that companies with the capacity to employ Big Data analytics may one day be compelled to do so to protect fair competition and commercial marks (indeed, the court noted that Tiffany could have used the same data-mining techniques it suggested eBay employ to protect Tiffany’s trademark).9 In other words, although a company may develop its Big Data capabilities for its own commercial benefit, those same capabilities may require it to proactively protect not only its own commercial interests (such as copyrights, trademarks, and patents) but also those of others. This is particularly true when the business model, like eBay’s, entails the use of another company’s product (and the associated trade rights). It is unlikely that any court would require a company to incur extraordinary expense to protect another’s commercial interest. However, if doing so would subject a company to relatively nominal cost, courts will be more likely to impose that obligation on the entity. Thus, companies that employ Big Data analytics should not only consider how those analytics might increase their market share but also consider how the same analytic capability might be employed to deter claims by the public, competitors, and vendors. For example, data analytics can be used to police websites to identify possible breaches and forgeries while also providing the basis to thwart competitive challenges (e.g., if the data analytics not only provide a competitive advantage but also foster general public knowledge). Further, data analytics may also be employed to assist companies in responding to discovery should litigation ensue.
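To ground what a "suspiciousness" score might look like, here is a minimal heuristic sketch. The record fields, weights, and threshold are assumptions for illustration; neither the Tiffany expert's proposed algorithm nor eBay's fraud tooling is public in this form.

# Hypothetical listing fields and weights, for illustration only.
def suspiciousness(listing):
    """Heuristic score: higher suggests a counterfeit; flag for human review."""
    score = 0.0
    if listing["price"] < 0.25 * listing["typical_retail"]:
        score += 0.5  # steep discount on a luxury good
    if listing["quantity"] > 5:
        score += 0.3  # bulk lots of supposedly rare items
    if listing["seller_age_days"] < 30:
        score += 0.2  # brand-new seller account
    return score

listing = {"price": 80, "typical_retail": 500, "quantity": 10, "seller_age_days": 12}
if suspiciousness(listing) >= 0.5:
    print("flag for review, score:", suspiciousness(listing))

A scoring approach like this decides nothing by itself; it merely ranks listings so that human reviewers see the most suspect ones first, which is why the expert framed it as something eBay "could have designed."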

6.4 Does the Use of Big Data Result in Injury or Damage?

For any litigation claim to stand, the plaintiff must establish that the conduct attributed to the company resulted in injury or damage. In the privacy and Big Data context, however, proving injury or damage can frequently be a high hurdle to clear.

Two decisions offer a case study. The first, In re JetBlue Airways Corp. Privacy Litig., concerned the creation of passenger name records (PNRs) by airlines and their use by other entities.10 JetBlue (like other airlines) had a practice of compiling and maintaining personal information (the PNRs) of passengers. The PNRs typically included the passengers’ names, addresses, phone numbers, and travel itineraries and were compiled from flight bookings made by telephone or online. Acxiom, a company providing customer and information management solutions, separately maintained personally identifiable information on almost 80% of the US population. After September 11, 2001, a data-mining company (DMC), Torch, approached the Department of Defense (DoD) and suggested it could help enhance security by analyzing information contained in the PNRs to identify persons seeking access to military installations and predict which individuals might pose a security risk. The DoD agreed to the plan and allowed airline PNRs to be a data source for the project. Torch, through the DoD, contacted JetBlue to provide its PNR data, which JetBlue did (without compensation), turning over approximately five million electronically stored PNRs. Torch combined this information with data from Acxiom. Merging the data gave Torch a single database of JetBlue passenger information, including each passenger’s name, address, gender, home ownership or rental status, economic status, Social Security number, occupation, and the number of adults and children in the passenger’s family, as well as the number of vehicles owned or leased. Torch used this data to create a profiling scheme regarding high-risk passengers.11 JetBlue acknowledged that providing the PNRs violated the company’s privacy policy (e.g., no consent for the transfer of the information was obtained from JetBlue’s customers). A class of plaintiffs then brought litigation claiming violations of various privacy statutes (including the Electronic Communications Privacy Act, ECPA) and state common law claims. The court determined that there was no liability under the ECPA because the statute applies only to “electronic communication services,” which involve a “service which provides to users the ability to send or receive wire or electronic communications” (18 U.S.C. Section 2510(15)). JetBlue is not such a service and therefore was not liable under the ECPA. More important for this discussion, the court further ruled that JetBlue was not liable for the remaining claims because the plaintiffs could not establish damage or injury. Specifically, the court noted that “[i]t is apparent based on the briefing and oral argument held in this case that the sparseness of the damages allegations is a direct result of plaintiffs’ inability to plead or prove any actual contract [or other] damages. As plaintiffs’ counsel concedes, the only damage that can be read into the present complaint is a loss of privacy.”12 However, a loss of privacy alone (e.g., without an economic loss) does not constitute injury or damage that will support a claim.
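The privacy mechanics at issue are worth making concrete: two datasets that are each unremarkable on their own become a detailed profile once joined. The sketch below uses invented field names, not the actual PNR or Acxiom schemas.

# Hypothetical travel records (PNR-like) and demographic records.
pnrs = [
    {"name": "J. Doe", "address": "1 Main St", "itinerary": "JFK-LAX 2002-05-01"},
]
demographics = {
    ("J. Doe", "1 Main St"): {
        "ssn": "xxx-xx-1234", "occupation": "engineer",
        "home_owner": True, "household_size": 4,
    },
}

def merge_profiles(pnrs, demographics):
    """Join travel records with demographic data keyed on (name, address)."""
    merged = []
    for pnr in pnrs:
        extra = demographics.get((pnr["name"], pnr["address"]), {})
        merged.append({**pnr, **extra})  # one enriched profile per passenger
    return merged

print(merge_profiles(pnrs, demographics))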

In contrast, there is the decision in Fraley v. Facebook, Inc.13 Fraley concerned Facebook’s “Sponsored Stories” application of Big Data. Facebook is a social networking site that, as of 2011, had over 600 million members. Facebook generates its revenue through the sale of advertising targeted at its users. Sponsored Stories was an advertising practice whereby, if a Facebook member “liked” an advertiser’s Facebook page or advertisement, the advertiser’s information and ad would appear on the member’s friends’ pages with an indication that the member liked the advertiser. Essentially, it appeared that the member “sponsored” the advertiser’s ad on the member’s friends’ pages (thus suggesting that the member was recommending the advertiser to his or her Facebook friends). The court found that the plaintiffs’ claims against Facebook were viable and, distinguishing JetBlue, found that they sufficiently alleged damage or injury. Specifically, the Fraley court stated:

Here, by contrast, Plaintiffs have articulated a coherent theory of how they were economically injured by the misappropriation of their names, photographs, and likenesses for use in paid commercial endorsements targeted not at themselves, but at other consumers, without their consent. Unlike the plaintiffs in [other cases], Plaintiffs here do not allege that their personal browsing histories have economic value to advertisers wishing to target advertisements at Plaintiffs themselves, nor that their demographic information has economic value for general marketing and analytics purposes. Rather they allege that their individual, personalized endorsement of products, services, and brands to their friends and acquaintances has concrete, provable value in the economy at large, which can be measured by additional profit Facebook earns from selling Sponsored Stories compared to its sale of regular advertisements. . . . Furthermore, Plaintiffs do not merely cite abstract economic concepts in support of their theory of economic injury, but rather point to specific examples of how their personal endorsement is valued by advertisers. The [Second Amended Complaint] quotes Facebook CEO Mark Zuckerberg stating that “[a] trusted referral influences people more than the best broadcast message. A trusted referral is the Holy Grail of advertising.”14

Thus, Facebook, by recognizing the economic value of member-sponsored advertisements but failing to obtain members’ consent, created a damage model for plaintiffs through its use of Big Data analytics.
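The plaintiffs' damages theory is, at bottom, simple arithmetic: the premium an endorsement-style ad commands over an ordinary ad, multiplied by volume. The rates and volumes below are invented for illustration and are not figures from the Fraley record.

# Hypothetical figures illustrating the Fraley damages theory.
regular_cpm = 0.50      # revenue per 1,000 ordinary ad views ($), assumed
sponsored_cpm = 1.25    # revenue per 1,000 Sponsored Story views ($), assumed
views = 2_000_000       # ad views featuring a member's endorsement, assumed

premium = (sponsored_cpm - regular_cpm) * views / 1000
print(f"incremental profit attributable to endorsements: ${premium:,.2f}")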

The immediate lesson of these two cases is that a plaintiff must be able to show damage or injury to successfully present privacy claims. However, there is also a greater lesson. In both JetBlue and Fraley, there was a failure of transparency. Information was gathered and used for purposes of which consumers had no knowledge and to which they did not consent. In JetBlue, the gathering and transfer of the information was admittedly against the company’s stated privacy policy. Such violations of stated policies, combined with the failure to inform consumers regarding the use of their information, are a sure recipe for litigation. Indeed, as the public becomes more educated concerning the amount of information gathered and its uses, it has become more likely to bring lawsuits to limit the use of that information. Moreover, when information provided to a commercial entity is then transferred (without notice or permission) to a governmental entity, public concern and the risk of litigation are heightened. This is best and most recently illustrated by the NSA scandal regarding the monitoring of the public’s use of the Internet. It is a cautionary tale for companies that have harnessed the power of Big Data: Be clear regarding what information is obtained, how it will be used, and whether (and in what circumstances) it will be transferred. Failure to do so will result in increased exposure to litigation.

6.5 Does the Use of Big Data Analysis Have an Impact on Health Issues?

As discussed in Chapter 4, “Privacy and Big Data,” the benefits and dangers of using Big Data analytics may be most dramatic in the health field. This is not only because of the very personal and sensitive nature of the data but also because of the vast amounts of data involved, as almost all people have entered the health marketplace in some way. The key to limiting risk exposure in the health context, as the following cases illustrate, is deidentification.

London v. New Albertson’s, Inc. is a good example of this concept.15 The London case was primarily based on claimed violations of California’s Confidentiality of Medical Information Act (CMIA), California Civil Code Section 56 et seq. Factually, the suit concerned the alleged sale of pharmacy customer prescription information to DMCs, which used that information for marketing purposes. New Albertson’s owned several stores that contained pharmacies, and London had his prescriptions filled at one of those stores. According to the allegations of the suit, the DMCs installed software on the pharmacies’ mainframe computer servers that captured and collated patient prescription information as it was transferred to the DMCs’ offsite computer servers. The software deidentified the prescription information and assigned a number to each patient to allow correlation of that information without individually identifying patients. Once the DMCs harvested the deidentified data, they combined it with prescriber reference information and sold it to pharmaceutical companies, which in turn used it to structure drug-marketing programs directed at physicians.16 The court found that the plaintiff had not stated a viable claim under the CMIA because the information was deidentified; there was no transmission of “medical information” as defined by the CMIA because, once deidentified, the information could not be traced back to the individual. While the court’s holding in London speaks to the risk-limiting value of deidentifying health information, it is important to note that London was allowed to amend his complaint to assert a more viable claim. Further, the deidentification took place after the information was transferred by New Albertson’s to the DMCs; thus, there might be a viable claim based on the transfer itself, given that London never consented to that use of his information.
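A minimal sketch of the pseudonymization step described in London follows: direct identifiers are stripped and replaced with a stable token so records can still be correlated per patient. The field names and keyed-hash scheme are assumptions for illustration, not the DMCs' actual software; note that, per the court's reasoning, the step matters most when applied before any transfer.

import hashlib

SECRET_SALT = b"rotate-and-protect-this-key"  # must stay secret, or tokens can be reversed by brute force

def deidentify(record):
    """Strip direct identifiers; substitute a stable pseudonymous patient number."""
    token = hashlib.sha256(SECRET_SALT + record["patient_name"].encode()).hexdigest()[:12]
    return {
        "patient_token": token,  # same patient always yields the same token
        "drug": record["drug"],
        "prescriber_id": record["prescriber_id"],
        "fill_date": record["fill_date"],
    }

rx = {"patient_name": "Jane Q. Public", "drug": "atorvastatin",
      "prescriber_id": "NPI-1234567890", "fill_date": "2008-01-15"}
print(deidentify(rx))  # no name, but records remain correlatable per patient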

Another instructive case regarding the use of health data is IMS Health, Inc. v. Sorrell.17 There, the appellants challenged a Vermont statute banning the sale, transmission, or use of prescriber-identifiable data (PI data) for marketing or promoting a prescription drug without prescriber consent. The appellate court found the statute unconstitutional as a commercial speech restriction. In so doing, the court noted that the data was deidentified; therefore, the harm was not great enough to support the state’s interest in applying the statute. Notwithstanding the court’s ruling, the dissent raised important points, including the issue of when the information was deidentified:

Accordingly, before a detailer ever sets foot in a doctor’s office—that is, before the commercial speech the majority focuses on ever occurs—at least three events take place: first, a pharmacy gathers information from patients seeking to fill prescriptions; second, it collects and sells that data to third parties, principally, “data vendors” or “data miners” such as appellants here; and third, these data miners repackage that data and license it to pharmaceutical companies.18

Thus, the dissent noted there could be significant litigation risk in providing health-related Big Data to third parties prior to deidentification—whether or not the information was later deidentified for commercial purposes. There are two lessons here. First, given the sensitive nature of personal health information, the risks of failing to comply with privacy standards are great. Second, the key to reducing risk exposure is not only the use of deidentification but also when that process is employed. If personal health information is transferred improperly, subsequent deidentification will not protect a company from the risk of litigation.

6.6 The Impact of Big Data on Discovery

Putting aside potential liability exposure that might arise from the use of Big Data analytics, companies should also consider the impact of Big Data on discovery. This impact is twofold. First, Big Data has an impact on the amount of information that might be subject to discovery in litigation—especially if Big Data analytics play a part in company business strategy. Second, Big Data can be used in the context of discovery to assist in searching for relevant evidence. These points are discussed in detail in Chapter 11, “Big Data Discovery,” but are touched on in this section.

The first point was discussed at length in Chevron Corp. v. Weinberg Group.19 In Chevron, the court was dealing with a discovery motion related to the review of privilege in a wide-ranging environmental matter. The court noted that “. . . in the era of ‘big data,’ in which storage capacity is cheap and several bankers’ boxes of documents can be stored with a keystroke on a three inch thumb drive, there are simply more documents that everyone is keeping and a concomitant necessity to log more of them.”20 The judge further noted the court’s limited capacity to review the volume of information produced by Big Data discovery:

In an earlier time, the insufficiency of the log defaulted to in camera review by the judge. Yet, in a case such as this, the sheer number of documents on a log may make that impossible. Here, I would have to review 9,171 pages of documents. That seems inconceivable given my advanced years. In all seriousness, a judge, unlike lawyers, who have resources for culling through documents, cannot use technology-assisted review to do the review more efficiently.

The discussion in Chevron highlights the two issues regarding Big Data analytics in discovery. First, in the era of Big Data, discovery will necessarily cover huge amounts of data, which in turn can be very expensive. Indeed, court decisions indicate that the cost of e-vendors retained to review such data is not a recoverable cost.21 Thus, the expense of data review is borne by the company itself—even if it successfully defends against the litigation. Second, if a company has adopted the use of Big Data analytics, it should design its systems and programs to provide for the possibility of litigation. This means designing methods to document what analytics are used and to preserve the results of the programs—particularly if those analytics support strategic business plans.
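One minimal design (an assumption for illustration, not a named product) is an append-only log that records, for each analytics run, its parameters, a fingerprint of the input, and the results, so the run can later be produced and authenticated in discovery.

import hashlib
import json
from datetime import datetime, timezone

def log_analytics_run(logfile, job_name, params, input_bytes, results):
    """Append one JSON record per analytics run to an audit log."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "job": job_name,
        "params": params,
        "input_sha256": hashlib.sha256(input_bytes).hexdigest(),  # fingerprints what was analyzed
        "results": results,
    }
    with open(logfile, "a") as f:          # append-only: prior entries are never rewritten
        f.write(json.dumps(entry) + "\n")

log_analytics_run("analytics_audit.jsonl", "trend-report-q3",
                  {"term": "FizzCola", "window": "daily"},
                  b"raw input snapshot", {"peak_day_mentions": 2})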

Notes

1. Authors Guild, Inc. v. Google, Inc., 2013 U.S. Dist. LEXIS 162198 (U.S. Dist. Ct., So. Dist. of NY, November 14, 2013).

2. Id., 2013 U.S. Dist. LEXIS 162198, at *9–11.

3. Id., 2013 U.S. Dist. LEXIS 162198, at *27–28.

4. PeopleBrowsr, Inc. v. Twitter, Inc., 2013 U.S. Dist. LEXIS 31786; 2013 WL 843032 (U.S. Dist. Ct., Northern District of CA, March 6, 2013).

5. Id., 2013 U.S. Dist. LEXIS 162198, at *2–3.

6. Tiffany (NJ), Inc. v. eBay, Inc., 576 F.Supp.2d 463 (U.S. Dist. Ct., So. Dist. of NY, July 14, 2008).

7. Id., 576 F.Supp.2d 463, at 491–492.

8. Id., 576 F.Supp.2d 463, at 470.

9. Id., 576 F.Supp.2d 463, at 492.

10. In re JetBlue Airways Corp. Privacy Litig., 379 F.Supp.2d 299 (U.S. Dist. Ct., Eastern Dist. of NY, August 1, 2005).

11. Id., 379 F.Supp.2d 299, at 304–305.

12. Id., 379 F.Supp.2d 299, at 326.

13. Fraley v. Facebook, Inc., 830 F.Supp.2d 785 (U.S. Dist. Ct. Northern Dist. of CA, December 16, 2011).

14. Id., 830 F.Supp.2d 785, at 798–800.

15. London v. New Albertson’s, Inc., 2008 U.S. Dist. LEXIS 76246; 2008 WL 4492642 (U.S. Dist. Ct., So. Dist. of California, September 30, 2008).

16. Id., 2008 U.S. Dist. LEXIS 76246, at *2–3.

17. IMS Health, Inc. v. Sorrell, 630 F.3d 263 (Ct. of Appeals, 2nd Cir., November 23, 2010).

18. Id., 630 F.3d 263, at 283.

19. Chevron Corp. v. Weinberg Group, 286 F.R.D. 95 (U.S. Dist. Ct. D.C., September 26, 2012).

20. Id., 286 F.R.D. 95, at 98–99.

21. See, e.g., Race Tires Am., Inc. v. Hoosier Racing Tire Corp., 674 F.3d 158 (U.S. Ct. of Appeals, Third Circuit, March 16, 2012).
