Chapter 7

Licensing Big Data

Aaron K. Tantleff

7.1 Overview

This chapter discusses the licensing of databases. A license is nothing more than a contract between a licensor and licensee that defines the scope of activities a licensee may engage in with regard to the licensed database (e.g., use the data solely for internal use, distribute limited segments to others, combine the database with other data, etc.). Licenses are also used to ensure proper monetization of the data being licensed. Licenses in which a company will grant a third party the use of their database are referred to as “outbound” licenses. Licenses in which a business will be on the receiving end of a database license granted by a third party are called “inbound” licenses. Both types of licenses are discussed in this chapter.

One of the greatest mistakes in license agreements involving Big Data is the attempt to use a traditional license agreement (e.g., a form agreement used for licensing software or other forms of content) to govern a license of Big Data. As discussed in this material, a traditional license agreement is generally not appropriate and may result in a lack of adequate protection for the database, exposure for both the licensee and licensor, and a failure to realize appropriate revenue from exploitation of the data.

Under traditional database licensing agreements, the licensor was either the owner or, itself, a licensee (with the right to sublicense to others) of a database. The licensor would enter into an agreement with a licensee granting access to the database. Traditionally, the database was a structured set of data, made available either on a subscription basis or as a data feed. Under the traditional model, the license granted to the database was generally limited in scope; for example, to a defined set of data or for certain purposes. The licensee generally had a clear understanding of the data being made available to them and what they could do with it. However, it is not as straightforward with respect to Big Data. The data may consist of data that was generated by the licensor itself, from users and other third parties from which the licensor collects information, data licensed from third parties, and data that was scraped from the Internet, including via various social media tools. Accordingly, the licensor likely does not own or have an explicit license to all of the data it may offer to a licensee, and it is highly likely that a licensor may not be able to obtain one. Despite the absence of ownership or an explicit license to the data it is offering to license, a licensor may still have intellectual property rights in the data, which may permit the licensor the right to grant a license.

When drafting or negotiating a Big Data license agreement, licensors and licensees alike should consider certain key issues:

  • Contractual and other legal protections for databases;
  • Ownership of the data;
  • Ensuring the scope of the license balances the licensor’s desire to limit the scope of the rights granted as compared with a licensee’s desire for expansive rights designed for a maximum opportunity to exploit and mine the database;
  • Anonymization of the data;
  • Confidentiality, both as a protection of Big Data and of licensee’s information;
  • “Salting” of the database or the use of fake data to uncover unauthorized use and copying of the database;
  • Each party’s rights of termination;
  • Limitations of liability governing each party’s responsibility for damages;
  • Fees to be charged and protection of the licensee from “fee creep”;
  • Audit rights to ensure proper use of the database;
  • Warranties; and
  • Indemnification obligations.

In the remainder of this chapter, each of these issues is discussed from the perspective of both the licensee and the licensor.

7.2 Protection of the Data/Database under Intellectual Property Law

Over the years, there have been several unsuccessful attempts to create a new intellectual property right expressly designed to protect databases. In the absence of that clear right, database licensors have had to rely on a somewhat imperfect combination of copyright and trade secret law to protect their data. We say “imperfect” because neither law provides complete protection (e.g., copyright only protects the compilation of the database as a whole and trade secret law only protects databases that are not generally known and are the subject of efforts to protect their confidentiality). That said, the protections afforded under copyright and trade secret law should not be minimized. Each affords the licensor the ability to recover potentially substantial damages for misuse of its database. In some instances, the licensor may also be able to recover statutory damages (i.e., damages specified by law, without having to establish the amount of damages actually incurred by the licensor) and its attorney’s fees and costs. In some cases, recovery is even permitted when the misuse was “innocent” (i.e., no malicious or wrongful intent need be shown).

7.2.1 Copyright

Copyright protection affords the creator (referred to as the “author”) of an original work significant protections, such as the ability to control who has access to the copyrighted materials and how the copyrighted materials may be accessed, used, and modified. Copyright protection also provides the creator with the ability to take legal action against a party who improperly accessed, used, or disclosed the copyrighted material. The individual elements of data that make up a database are not generally copyrightable, but their compilation into a database is copyrightable.

A famous example illustrates the approach to copyright protection for databases. Consider someone’s name, address, and telephone number. This information, standing alone, is not copyrightable. It is fact, not a “work of authorship” protected under copyright law. Now, consider the assembly of thousands, perhaps tens of thousands, of those names, addresses, and telephone numbers into an ordered database (i.e., a phone book). While copyright will not protect the individual entries in that phone book, it will protect the resulting compilation/database in the form of the phone book. The same approach has been applied to afford many other types of databases copyright protection.

7.2.2 Trade Secrets

In some circumstances, the contents of a database may represent the trade secrets of a licensor. Trade secrets are governed by state law, and all states (excluding New York and Massachusetts), the District of Columbia, Puerto Rico, and the US Virgin Islands have adopted a variation of the Uniform Trade Secrets Act. A trade secret is a set of information or a compilation of information that is not generally known or reasonably ascertainable by others, by which a business can obtain an economic advantage over competitors or customers, and is the subject of reasonable efforts by its owner to maintain its confidentiality.

To ensure their databases have potential protection as a trade secret, licensors must include contractual protections to ensure the database is held in confidence and not disclosed to unauthorized parties. Licensees must be careful when accessing databases where the licensor is claiming trade secret protection. In many cases, the licensor may require the licensee to abide by certain safeguards that the licensee may not have the ability to comply with absent significant costs. However, in some instances, a licensee’s standard information security practices are no less protective than the licensor’s requirements. As a result, licensees need to thoroughly review the licensor’s information security requirements and what they require. In an effort to ensure all “reasonable” efforts are employed to protect their information and to avoid having to review a licensee’s information security standards, licensors often include generic language in their database license agreement with a reference holding the licensee accountable to the requirements as set forth in the licensor’s standard information security guidelines. If a licensee does not carefully review these additional terms, it is possible that a licensee could be in breach of the license agreement the moment they accept it. Licensees should also look out for licensors who attempt to provide information security requirements that are inconsistent with the nature of the data licensed.

7.2.3 Contractual Protections for Big Data

As a complement to the intellectual property protections discussed, contractual limitations should be included to further protect databases from unauthorized use and disclosure. Most important, those limitations should include a clearly drafted license grant defining the rights of the licensee, a clause defining the licensor’s ownership rights, and a properly worded confidentiality clause. Each party—licensor and licensee—will want to ensure these protections adequately represent their needs and expectations in the proposed license agreement. These and other contractual protections are discussed in the following sections.

7.3 Ownership Rights

One of the key provisions in any license agreement is the language defining the parties’ respective ownership rights with regard to the database. Data has intrinsic value. Owning the data gives a party the ability to control the right of a third party to access, create, modify, use, repurpose, make available to others, and sell data, as well as the ability to transfer and assign these rights to others. Accordingly, licensors and licensees regularly debate this issue. The licensor’s desire is to control all access to and use of the database, as well as all modifications, enhancements, and other revisions (sometimes called “derivative works”) of the database. This must be counterbalanced by the licensee’s desire to fully exploit the database for its own purpose, which may be internal or external. For example, a licensee may license a database and then spend hundreds of hours mining it, generating analyses and new sets of data that are derivative works of the licensed data. The question will then arise whether the licensee or the licensor owns derivative works such as these.

The owner of the data is generally the party that creates, generates, and collects the data. With respect to the traditional structured database, determining ownership is generally a straightforward process. However, with the rise of Big Data, resolving questions of ownership becomes more difficult, particularly because of the manner in which data is collected and generated. Big Data is the compilation of massive amounts of data collected from a variety of places using varying methods of collection. Ownership of Big Data is further complicated by the fact that, as data is collected, stored, and analyzed, new data is created based on the combination of the different elements from the database. In many cases, this new data is as valuable as, if not more valuable than, the data on which it is based.

There are two main ownership issues fought over between licensors and licensees. Failure to address either of these up front may result in disastrous outcomes for all parties. The two issues are who owns the underlying, licensed data, and who owns the derivative works produced as a result of the licensee’s analysis of the licensed data.

With respect to the underlying data, a licensor should state, and the licensee should confirm, that as between the licensee and licensor, the licensor is the owner of the database, its content, and any algorithms contained therein. If this is not addressed up front, some licensees may challenge the licensor’s ownership rights in and to the data or attempt to copy the data, claiming the licensor does not have any legal right to prevent a third party from copying the database and/or the database deserves no legal protection.

On the other hand, many license agreements are silent on the last concern: ownership of new data and other derivative works created from the exploitation of the licensed data. Failure to address this may interfere with or prevent the licensor from being able to license the database to other parties or frustrate a licensee that invested tremendous resources to mine and analyze the data. Given all the complications of Big Data, it is of paramount importance to a licensor that it ensures that there are no encumbrances on its ability to license and derive profit from its database. Depending on the nature of the data and the particular analysis performed, general principles of common law provide that, absent a contractual agreement to the contrary, and despite the ownership of the underlying data, ownership of the derived data may default to the licensee. It is also possible that the derivative works may be jointly owned by the licensee and licensor because the licensee performed analytics on data owned by the licensor. While in many instances licensors are willing to grant ownership of such derived works to the licensee, some may request that such ownership be limited to the actual derivative works created between the parties because the licensor would be unable to control or account for the analytics performed by other licensees or by other third parties. In other words, some licensors are concerned that two or more licensees may seek to perform the same or similar sets of analytics. Because the licensor does not control each licensee’s authorized use of the data, it cannot guarantee that other licensees are not performing the same or substantially similar types of analytics or that more than one licensee has not created similar derivative works.

If the licensor retains all rights in the derivative works, the licensee needs to take extra caution when combining the licensed data with its own. Where a licensor retains ownership in the derivative works, a licensee that combines its own data with that of the licensor may, arguably and unintentionally, assign ownership in the licensee’s own data to the licensor. If the licensor is instead granted a license to all derivative works, then the licensee may have inadvertently granted the licensor a license to access and use the licensee’s data. Either scenario could result in a loss of the licensee’s intellectual property, exposure of the licensee’s data if not properly protected by the licensor’s obligations of confidentiality, or a breach of third-party rights because of the use of the data beyond the consent or authorized use of such data.

On the other hand, licensees need to consider the efforts required to mine the database and whether the licensor’s ownership rights, and thus control rights, would interfere with the licensee’s ability to exploit the data. Licensees also have concerns as to their ability to restrict access to the data they generate. For example, what is to stop a licensor from selling a licensee’s results to the licensee’s competitors? Where a licensor demands ownership of a licensee’s generated data, a licensee must ensure that it has a fully paid, royalty-free license without restriction on the use of the results.

Some licensees try to avoid this discussion by taking a different approach. When licensing Big Data, the volume and variety are so massive that some believe that it is highly unlikely that two unrelated entities would license or access the identical set of data or perform the same set of analytics over the identical dataset. Therefore, some licensees avoid the ownership debate by restricting a licensor’s ability to provide the identical dataset to two competing companies. Such a provision may provide comfort to each of the parties: a licensee does not have to worry that its investment in licensing and mining the data will be wasted, and a licensor has little risk because Big Data databases change so rapidly that there is little chance of this occurring. For licensee and licensor alike, the value of this provision is highly dependent on the makeup of the database and how a licensor grants access to its database.

7.4 License Grant

The license grant is one of the most significant provisions in the license agreement. It sets the tone for the entire engagement and defines the scope of the license, the restrictions placed on the licensee, the extent of the licensee’s authorized use, and any other licensee obligations. This is often where the licensee and licensor first start to discuss the intentions that each party has with respect to the data. This analysis should also include a discussion of the rights and use of any results, analytics, or algorithms created by or to be used with the database.

In developing the license grant, it is important for both parties to think critically regarding all aspects of what is required to enable and protect both the licensee and licensor. One of the first questions any licensee should ask in contemplating a proposed license grant is whether this license grant will afford the licensee the ability to do everything currently contemplated with the data and whether and how to address predictable future uses.

Licensors should understand how they intend to license the applicable database. Generally, a licensor will only grant a nonexclusive license (i.e., the licensor is not precluded from licensing the same database to others, including potential competitors of the licensee, for the exact same purposes) for use of their database. A licensor generally intends to capitalize on its Big Data investment by licensing access to the database to as many licensees as possible. In addition, not only does a licensor want to reserve the right to offer the same license to other potential licensees, some licensors want to ensure their own right to mine the data. In addition to granting access to the database, some licensors may package and sell their own reports based on analytics performed by the licensor. Notwithstanding all this, some subsets of Big Data have the potential to provide a significant competitive advantage to a licensee, but only if such data is not provided to a licensee’s competitors. If it is, such data may have little or no economic value to a licensee. In such instances, the licensee should request an exclusive license or, alternatively, a sole license, under which the licensor may still utilize and analyze the data, but solely for its own internal purposes.

Licensors, as discussed further in the section on fees, may wish to consider a licensee’s ability to grant sublicenses to third parties to use the database, its ability to distribute the data, as well as any results from the licensee’s analysis of the database. Generally, licensors are hesitant to grant a licensee the right to offer sublicenses as such sublicensee would not be under contract directly with the licensor. Granting sublicenses may also affect the licensor’s ability to fully exploit the revenue potential derived from granting licenses to the database. However, in some cases, a licensee may not be the party best suited to analyze the data and therefore seeks to grant a sublicense to a third party to assist the licensee with the analysis. Failure to allow a licensee to grant such access may diminish the value of the license to a potential licensee. To address this concern, a licensor could establish reasonable limitations granting the applicable sublicense while restricting the sublicensee’s performance to the benefit and on behalf of the licensee.

Licensors may also wish to carefully consider whether to allow licensees the ability to combine the licensed database with the licensee’s database or any other unauthorized or preapproved data. There are a number of potential concerns, including for reasons affecting ownership, intellectual property rights, consent, and privacy. If the licensor is not careful in restricting what types of data are combined, the licensee could expose the licensor to significant liability arising from the combination of the licensor’s data with other data. For example, the combination or aggregation of data may exceed the authorized scope of the licensor’s data.

Data that has been collected and provided via a specific database may have been collected under a specific set of facts and circumstances, each with specific consents. These consents may limit the use of the data in a specific manner. When combining sets of disparate data, resulting in the aggregation of data into larger datasets, it becomes increasingly difficult to determine whether the use of such data is within the scope of the applicable original consent the data was collected under. It is also possible that by combining data with other databases one could engage in secondary activities or analyses that were not possible with the smaller dataset. With these new, secondary activities, new uses of the data become available, and one needs to consider whether these new uses were anticipated and consented to at the time the data was collected. If not, it might be possible that by aggregating the data, any such secondary use of the data would be out of the scope for which the consent was granted. It is also important from both the licensee’s and licensor’s perspective that, although the licensee’s use of data or performance of certain analytics was allowable as originally licensed, the new, aggregated dataset may result in infringing activity. In such case, the licensee and licensor may be subject to potential third-party claims of infringement. For example, given that algorithms are protectable via patent as a business method claim, it is possible that the use of a certain algorithm on an aggregated set of data that was not previously cleared could result in a claim of patent infringement of such third party’s business method patent.
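The consent problem described above can be made concrete with a minimal sketch. It assumes, purely for illustration, that each dataset carries an explicit set of consented purposes and that a combined dataset may be used only for purposes consented to in every source. The `Dataset` class, the purpose labels, and the intersection rule are all hypothetical modeling choices, not a statement of what any law or agreement requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """A dataset tagged with the purposes its data subjects consented to (hypothetical model)."""
    name: str
    consented_purposes: frozenset


def combined_purposes(datasets):
    """Return the purposes permitted for the combination of all datasets.

    Assumption: a conservative rule under which a purpose is permitted only
    if every source dataset was collected with consent for that purpose.
    """
    purposes = None
    for ds in datasets:
        if purposes is None:
            purposes = ds.consented_purposes
        else:
            purposes = purposes & ds.consented_purposes
    return purposes or frozenset()


def is_use_permitted(datasets, purpose):
    """Check whether a proposed use falls within the combined consent scope."""
    return purpose in combined_purposes(datasets)


# Hypothetical example data.
loyalty = Dataset("loyalty", frozenset({"marketing", "analytics"}))
support = Dataset("support", frozenset({"analytics", "service"}))
```

Under this rule, marketing is a permitted use of the loyalty data alone but not of the loyalty and support data combined, which mirrors the point that aggregation can take a previously permitted use outside the scope of the original consents.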

As with every license agreement, regardless of how tightly drafted, there will always be something that was unanticipated or forgotten, or new abilities and rights with respect to the data will emerge that could not be anticipated. Therefore, all well-drafted license grants should end with the statement that “all rights not expressly granted in this provision are expressly reserved by licensor,” ensuring that the licensor does not unintentionally give up its rights.

Given all the potential issues and liabilities, it is critical that the license grant does not exceed the scope of permitted use for the data. Any license grant provision should be clear and expressly state that the license to the data is limited to the authorized use as set forth in the agreement or in such other document attached to and made part of the license agreement. However, this need must be balanced against the broad and ever-increasing opportunities for use of Big Data and a licensee’s desire to fully exploit the licensed data as intended by the license, subject to any legal limitations. Accordingly, a licensee should ensure that the license grant is sufficiently broad to allow the licensee to exploit the database in its intended manner.

7.5 Anonymization

Both licensors and licensees face significant potential liability and public relations challenges when working with Big Data sets given the risk of reidentifying individuals. Big Data has become so large that no matter what one does to deidentify individuals, Big Data enables anyone with the appropriate tools to potentially reidentify any deidentified individual.

Various regulations address the issue of anonymization and deidentification of data. The Health Insurance Portability and Accountability Act (HIPAA) of 1996 addresses the issue of deidentified data by providing for an expert determination and a safe harbor if certain key pieces of information are removed from the medical record.1 The Gramm-Leach-Bliley Act also addresses the issue of anonymized data by defining personally identifiable financial information as that which “does not include information or blind data that does not contain personal identifiers such as account numbers, names or addresses.”2

Notwithstanding the regulatory limitations on the use of personal information and the restrictions on reidentifying and targeting individuals, there is a real risk and a precedent of people being able to take datasets that claim to be anonymized and deidentified and identifying individuals. Even with the stringent requirements promulgated by HIPAA that are designed to ensure that a deidentified health care record would not be identifiable, using just publicly available data, one could reidentify patients who were previously thought to be anonymous.
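The reidentification risk described above can be illustrated with a minimal sketch of a linkage check. It assumes a "deidentified" dataset that still carries a few indirect attributes and a separate public dataset that carries the same attributes alongside names. The field names (`zip`, `birth_date`, `sex`, `diagnosis`) and all records are hypothetical; the choice of attributes is an assumption made for illustration, not a description of any particular dataset or legal standard.

```python
def quasi_key(row):
    """Build a lookup key from indirect attributes (hypothetical field names)."""
    return (row["zip"], row["birth_date"], row["sex"])


def reidentify(deidentified_rows, public_rows):
    """Return (name, diagnosis) pairs for rows that match exactly one public record."""
    names_by_key = {}
    for person in public_rows:
        names_by_key.setdefault(quasi_key(person), []).append(person["name"])

    matches = []
    for row in deidentified_rows:
        candidates = names_by_key.get(quasi_key(row), [])
        if len(candidates) == 1:  # a unique match reidentifies the individual
            matches.append((candidates[0], row["diagnosis"]))
    return matches


# Hypothetical records: names were removed from the first dataset only.
deidentified = [
    {"zip": "02138", "birth_date": "1945-07-31", "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_date": "1980-01-15", "sex": "M", "diagnosis": "diabetes"},
]
public = [
    {"name": "Alice Example", "zip": "02138", "birth_date": "1945-07-31", "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "birth_date": "1980-01-15", "sex": "M"},
    {"name": "Carl Example", "zip": "02139", "birth_date": "1980-01-15", "sex": "M"},
]

matches = reidentify(deidentified, public)
```

In this sketch the first record is reidentified because only one public record shares its attributes, while the second remains ambiguous because two people share them. A licensor could run a similar check before release to estimate how many records are unique on the attributes it leaves in place.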

Depending on the circumstances, licensors should consider including the following provisions designed to minimize the licensor’s risk that an individual will be reidentified or targeted:

  • Limiting the licensee’s use of the database to datasets that have been anonymized;
  • Prohibiting a licensee from reidentifying any individuals or combining the dataset with other datasets that would enable any individuals to be reidentified;
  • Prohibiting licensees from using the data to take any action based on reidentified data;
  • Prohibiting licensees from using the datasets for unauthorized purposes; and
  • Requiring that the licensee notify the licensor in the event the licensee determines that any individual was reidentified or that it is determined that individuals could be reidentified.

Licensors should consider including a right to immediately suspend or terminate a licensee’s access to the database in the event the licensor has reason to know or suspect that deidentified individuals were or could be reidentified, thus compromising the database.

Licensees, on the other hand, who do not otherwise intend to try to identify individuals should be concerned that a licensor may provide them with datasets that are not properly deidentified, thus putting the licensee at greater risk, such as in the event of a breach. Licensees should consider provisions designed to:

  • Ensure that datasets provided are properly deidentified and comply with all applicable privacy and security laws;
  • Ensure that the licensor has the necessary rights to use and provide to the licensee the identifiable information, to the extent applicable; and
  • Provide the licensee with notice should the licensor discover that information provided is not properly deidentified or that it has reason to believe that such data could be reidentified.

7.6 Confidentiality

In addition to the intellectual property rights discussed previously and a well-worded license grant provision, licensors should impose a strong confidentiality obligation on their licensees to ensure the database is held in strict confidence. This is particularly critical to maintain any trade secrets in the database. The license agreement should require the licensee to acknowledge the database and its contents are the confidential information of the licensor and it may not disclose any information made available by the licensor, including the database, unless expressly authorized under the agreement or as otherwise required by law.

To avail itself of the protections of confidentiality, it is not sufficient for a licensor to claim the information as its confidential information. In fact, a licensor must treat such information as its confidential information, which includes employing and requiring the licensee to employ proper physical, administrative, and technical safeguards to ensure the confidentiality of such information as well as to prevent the improper disclosure to or access by a third party. Often, this obligation is accomplished through the use of appropriate language in the license agreement. In some cases, licensors include an attachment stating their minimum information security requirements for a licensee to access the licensor’s confidential information. Additional limitations may include restrictions on the ability of the licensee to use, disclose, copy, or otherwise make such information available to a third party. As discussed in the trade secrets section, licensees should carefully review any such information security requirements to determine the licensee’s ability to comply, the costs of complying, and whether such requirements are appropriate for the nature of the information being protected.

Licensees should consider requiring protection of their own confidential information. Depending on the nature of the engagement, a licensee may provide some of its own confidential information, such as certain data and other information, to the licensor in connection with an audit or otherwise in the performance of the agreement and should expect equal protection for their own confidential information.

Licensors generally restrict a licensee’s ability to use, share, and otherwise disclose and make the information available to third parties. Where the licensee is permitted to grant sublicenses or otherwise disclose information, the licensor may consider requiring that the licensee:

  • Provide only that information for which it is granted the right to disclose;
  • Provide the information only to those authorized parties to whom the licensee is allowed to disclose the information; and
  • Require that any authorized third party with access be subject to obligations of confidentiality no less stringent than those set forth in the agreement between the licensor and the licensee.

Failure to ensure these protections may result in the licensor’s (and licensee’s, as applicable) loss of protection in the database, its content, and any algorithms associated with it. For example, if the licensor was claiming trade secret protection over a customer list or an algorithm but failed to require the licensee to limit its disclosure of such customer list or algorithm, or did not require the recipient to maintain the confidentiality of such information, then it would be reasonable to conclude that a court would find that the owner failed to take adequate protective measures, thus denying the owner trade secret protection of such customer list or algorithm.

7.7 Salting the Database

“Salting” a database is a common technique used by licensors to protect their database and detect unauthorized copying. It refers to seasoning the database with dummy or fake data that is difficult, if not impossible, to detect by others. Consider use of salting in the context of the example given regarding copyright protection for a telephone book. Because the telephone book comprises information entirely available publicly, if a competitor of the publisher of the telephone book publishes an identical listing of names and numbers, it would be difficult to prove the competitor simply copied the original book. But, if the original had been salted and one of those fake addresses showed up in the competitor’s book, it would be clear that the competitor copied the original book. This same principle can be applied to almost any form of database. The larger the database, the more difficult it would be for a third party to detect and remove the salted data.
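The salting technique described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of how any particular licensor salts its data: the fake listings are derived from a secret key using an HMAC so that the licensor can regenerate them later, and the record fields, the `Resident` naming pattern, and the helper names are all hypothetical.

```python
import hashlib
import hmac


def make_salt_records(secret_key: bytes, count: int) -> list:
    """Derive deterministic fake listings from a secret key (hypothetical scheme)."""
    records = []
    for i in range(count):
        digest = hmac.new(secret_key, f"salt-{i}".encode(), hashlib.sha256).hexdigest()
        records.append({
            "name": f"Resident {digest[:8]}",
            "address": f"{int(digest[8:12], 16) % 9000 + 100} Elm Street",
            "phone": f"555-01{int(digest[12:16], 16) % 100:02d}",
        })
    return records


def record_key(record):
    """Identify a listing by all of its fields."""
    return (record["name"], record["address"], record["phone"])


def salt_database(real_records, salt_records):
    """Mix fake listings into the real ones and sort so they do not stand out by position."""
    return sorted(real_records + salt_records, key=lambda r: r["name"])


def find_salt_hits(suspect_records, salt_records):
    """Return the listings in a suspect database that match known fake listings."""
    salt_keys = {record_key(r) for r in salt_records}
    return [r for r in suspect_records if record_key(r) in salt_keys]
```

A licensor would call `make_salt_records` with its secret key, publish the output of `salt_database`, and later pass a suspected copy to `find_salt_hits`. Any hit is strong evidence of copying, because the fake listings do not correspond to anything an independent compiler could have collected. A production scheme would need fake entries that are far harder to distinguish from real ones than the pattern used here.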

A public example of salting a database was when Google started to suspect that Microsoft was copying the results of Google’s search engine to improve the results of Microsoft’s own Bing search engine. In an effort to confirm its suspicion, Google began to insert fake search results into its search engine and enlisted several engineers to run specific searches using Google via Microsoft’s Internet Explorer browser and enabled certain settings in the browser that sent information back to Microsoft. Soon thereafter, when running the same search on Bing, Bing began to return the same fake search results as Google. Google went public, claiming that Microsoft copied the search results and provided examples, including the use of the fake search results planted by Google.3

7.8 Termination

Licensors generally seek broad rights of termination with respect to Big Data license agreements. In particular, licensors typically seek the right to terminate the license agreement if:

  • The licensee uses the licensed data in excess of the rights granted under the agreement;
  • The licensor knows or suspects that deidentified individuals were or could be reidentified; or
  • The licensee breaches any of the privacy and security standards.

Licensees generally also seek broad termination rights in order to minimize their liability, given that data license agreements generally do little to protect a licensee if, for example, the database is found to be unreliable or outdated, or becomes the subject of an infringement claim. In particular, licensees generally should consider the ability to terminate the license agreement:

  • If the data becomes the subject of an intellectual property infringement claim;
  • If the licensee knows or has reason to believe that the licensor does not have the necessary rights or consents as required by the terms of the agreement or as necessary in order for the licensee to fully exploit the database as provided for under the agreement; or
  • For the licensee’s convenience.

The last licensee termination right is often heavily negotiated, but it is often the licensee’s best defense against poor performance or poor-quality data. It also enables the licensee to walk away from an engagement where there is no perceived value or the licensee is no longer interested in the particular database.

7.9 Fees/Royalties

7.9.1 Revenue Models

Typically, a licensee will provide some form of compensation to a licensor for regular access to a database. However, the traditional means of compensation may not be the best approach for Big Data licensing given the differences in how the traditional, structured databases are valued and how Big Data is valued. In fact, setting rates for access to Big Data might be one of the most distinct differences between the traditional data license agreement and the data license agreement for access to Big Data.

Mining for value in Big Data is much like mining for gold. Both have the potential for significant value that is waiting to be discovered. Both require substantial investment, time, and resources to discover the wealth hidden in all the rubble. Many who were lured by the prospect of hitting it rich invested all they had in the hopes of striking gold. Unfortunately, for every story of someone who struck gold, there are multitudes of stories of those who invested everything and found nothing. Big Data mining is not much different. It can be resource intensive and costly and take a fair amount of time. To truly unlock the value of Big Data, the cost of entry may need to be lowered. However, if the cost of entry is not set properly, there will be a disincentive for the collectors and purveyors of Big Data to continue collecting it and making it available to others. Therefore, any new model for Big Data licensing that lowers the cost of entry must be designed so that it does not sacrifice a licensor’s ability to profit and to continue to invest in and grow its database.

Consideration for the cost of entry is critical. It must be low enough that there are new licensees joining the ecosystem on a regular basis, yet high enough to ensure the licensors are rewarded for their investment. Licensees and licensors are at odds regarding how this should be accomplished. Licensees claim that mining Big Data is inconclusive, and the value has not yet been proven. As a result, some licensees avoid paying for access to Big Data and prefer to retain a consultant to advise on trends and other speculative matters or seek out other alternatives for gathering information. Although there have been many success stories regarding Big Data, generally licensors lack sufficient data points to determine the value in the large, unproven, ever-changing datasets.

Accordingly, other compensation models need to be considered. One approach is to liken the speculative value of Big Data to that of certain early-stage, speculative patent and know-how licenses granted for research and development purposes. In that case, a licensee enters into a license agreement to gain access to certain information and intellectual property with the hope of turning that information into commercial value. Under such an arrangement, fees take the form of a low, up-front payment, with future royalties tied to the ability of the licensee to commercialize the intellectual property and know-how. In some cases, milestone payments are also included to ensure the licensee continues to develop the licensed intellectual property and does not let it sit. The greater the value the licensee creates or is able to extract, the greater the fees payable to the licensor. This allows the licensor to offer a low-risk cost of entry to the licensee yet retain the ability to extract a return from the licensed material.

Following this example, licensors of Big Data may wish to consider a relatively low up-front, initial fee to provide greater access to the data and to expand to a potentially greater pool of licensees. This can be tied to certain interim or minimum periodic payments to ensure the licensee continues to seek new commercial or internal value and properly incentivize the licensee to engage in productive use. Given the lower initial costs, the license grant should be tied to the ability of a licensor to receive royalties based on the extracted value or such other metric that is tied to the extracted or mined value.
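To make the structure concrete, the fee model described above can be sketched as a small calculation. The following is only an illustration under stated assumptions: the class and function names are invented for the example, the dollar figures and the 5% royalty rate are hypothetical, and the treatment of the minimum periodic payment as a floor on each period's royalty is one possible drafting choice among several.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class RoyaltyTerms:
    upfront_fee: Decimal               # low initial fee to lower the cost of entry
    minimum_periodic_payment: Decimal  # floor that keeps the licensee engaged
    royalty_rate: Decimal              # fraction of extracted value owed as royalty


def period_payment(terms: RoyaltyTerms, extracted_value: Decimal) -> Decimal:
    """Payment for one period: the royalty, but never less than the minimum."""
    royalty = terms.royalty_rate * extracted_value
    return max(terms.minimum_periodic_payment, royalty)


def total_fees(terms: RoyaltyTerms, extracted_values: list[Decimal]) -> Decimal:
    """Up-front fee plus the payment due for each period."""
    return terms.upfront_fee + sum(period_payment(terms, v) for v in extracted_values)


# Hypothetical terms: $5,000 up front, $1,000 minimum per period, 5% royalty.
terms = RoyaltyTerms(Decimal("5000"), Decimal("1000"), Decimal("0.05"))

# Three periods in which the licensee extracts $0, $10,000, and $100,000 of value.
values = [Decimal("0"), Decimal("10000"), Decimal("100000")]
payments = [period_payment(terms, v) for v in values]
total = total_fees(terms, values)
```

In this sketch the licensee pays only the minimum while the data yields little, and the licensor's return grows once the licensee succeeds in extracting value, which is the alignment of incentives the model is meant to achieve.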

Setting the fees in this manner may eliminate discussions between the parties regarding the value of the licensed data, as well as the typical argument that each party is under- or overvaluing the database.

Another concern licensors have is how to ensure a licensee does not cannibalize the licensor’s ability to extract value by licensing access to the database to other parties. In some cases, licensees seek to monetize their own investment or lower the cost of access by turning around and granting sublicenses. The royalty scheme proposed herein would easily allow a licensee, subject to proper protections, to grant sublicenses without the licensor fearing a loss of potential revenue. In this case, the licensee’s sublicensing of the database would represent commercial value for which the licensor would be entitled to compensation in the form of a royalty. Thus, depending on the nature of the database and the licensees, allowing the licensee to grant further licenses under this model would not cannibalize the licensor’s potential revenue stream associated with the database.

7.9.2 Price Protection

Once the license agreement is entered into, a licensee will lose significant leverage when it comes to price protection. Accordingly, a licensee should negotiate price protections when entering into the relationship. In particular, licensees should require a period of fixed fees, such as the initial term of the license agreement, during which the licensor will be prohibited from raising the licensee’s rates. The licensor may, however, increase the rates at the start of a renewal period, subject to adequate prior written notice and provided that the increase is capped at the lesser of the percentage change in the Consumer Price Index (CPI) during the preceding calendar year and 4%.
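The renewal cap described above reduces to a short calculation. The sketch below is illustrative only: the function name and the sample fee are assumptions, and the decision to treat a negative CPI change as a zero increase (rather than a fee reduction) is a drafting assumption the parties would need to address expressly.

```python
from decimal import Decimal


def capped_renewal_fee(
    current_fee: Decimal,
    cpi_change_pct: Decimal,
    cap_pct: Decimal = Decimal("4"),
) -> Decimal:
    """Maximum fee for the renewal term: the increase is the lesser of CPI and the cap."""
    allowed_pct = min(cpi_change_pct, cap_pct)
    # Assumption: a negative CPI change does not reduce the fee.
    allowed_pct = max(allowed_pct, Decimal("0"))
    return current_fee * (Decimal("1") + allowed_pct / Decimal("100"))


# Hypothetical $10,000 annual fee.
fee_high_cpi = capped_renewal_fee(Decimal("10000"), Decimal("6.5"))  # 4% cap applies
fee_low_cpi = capped_renewal_fee(Decimal("10000"), Decimal("2.5"))   # CPI change applies
```

With a 6.5% CPI change the 4% ceiling governs and the renewal fee may not exceed $10,400; with a 2.5% CPI change the increase is limited to the CPI figure, for a maximum of $10,250.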

It is not uncommon for licensors to claim licensees owe additional fees, such as fees for access to additional databases, sharing in the cost of network storage devices, and software required to access, process, and analyze the databases. Accordingly, it is critical that a licensee includes a statement that unless otherwise stated in writing and signed by the licensee, there are no other fees to be paid by the licensee in connection with the agreement.

7.10 Audit

Licensors should always consider including an audit right in their license agreements permitting them (or a third-party auditor designated by the licensor) to inspect the licensee’s records and systems to confirm that the licensee’s use of the database complies with the scope of the licensee’s authorized use under the license agreement. In addition to the right to enter and inspect the licensee’s facilities and systems, it is important for licensors to require the licensee to properly maintain its books and records regarding its use of the database. These records should be kept throughout the term of the license agreement as well as for an appropriate period of time after the termination or expiration of the license agreement. Audits should cover the licensee’s use of the database, as well as the security measures the licensee uses to protect the database and access to it. This is a critical right designed to ensure protection of the licensor’s data. In the event the audit reveals noncompliance, it is typical for the costs of the audit to shift to the licensee and for the licensee to be responsible for additional license fees to compensate the licensor for any excess use of the data. Depending on the nature of the information licensed, some licensors have sought to include an indemnification directly within the audit provision. In addition to any other rights a licensor may have, including the right to seek indemnification under the license agreement, licensors may require a licensee to indemnify the licensor against any claims that may arise relating to the licensee’s noncompliance as determined by the audit. Even if the licensor never exercises its audit right, the threat is frequently sufficient to ensure licensee compliance with the terms of the agreement.

From the licensee’s perspective, audit rights can be problematic. Licensors have been known to abuse the audit process, conducting highly invasive audits that disrupt the licensee’s operations. Language should be added to the license agreement limiting the number of audits that can be conducted in a given period of time (e.g., once in any 12-month period) and making clear any audit must be conducted so as not to unreasonably interfere with or disrupt the licensee’s business. Licensees may also want to consider restricting a licensor’s ability to take multiple attempts at uncovering a licensee’s noncompliance by preventing a licensor from reauditing records that were previously audited and found to be compliant.

Because audits will almost certainly expose the licensor to confidential information of the licensee, the license agreement should include an appropriate confidentiality provision. If the licensee is a regulated entity, the licensee should consider refusing any on-site audit rights and limiting the audit to off-site review of the licensee’s records.

In some cases, the licensor may engage a third-party auditor whose compensation is based on whether it finds noncompliance. This type of compensation arrangement can create an adversarial relationship between the auditor and the licensee. Licensees should consider including language in the audit clause precluding such compensation arrangements.

Because the cost of the audit typically shifts to the licensee in the event noncompliance is found, licensees should revise the audit provision to ensure those costs do not become excessive. Although it is common to include language such as “fees must be reasonable,” this is frequently not enough. A better approach is to include language preventing the audit costs from exceeding some specified percentage of the noncompliance (e.g., “The costs of the audit shall not exceed 25% of the amount of any underpayment by the Licensee”). Under this example language, if the licensee has used the database such that additional license fees of $10,000 are due, the audit costs charged to the licensee may not exceed $2,500.
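The percentage cap in the sample clause can be expressed as a simple calculation. This sketch uses a hypothetical function name and hypothetical audit costs; the 25% figure and the $10,000 underpayment come from the example language above. It also assumes that no audit costs shift to the licensee when no underpayment is found.

```python
from decimal import Decimal


def recoverable_audit_costs(
    actual_audit_costs: Decimal,
    underpayment: Decimal,
    cap_fraction: Decimal = Decimal("0.25"),
) -> Decimal:
    """Audit costs the licensor may pass on: actual costs, capped at a share of the underpayment."""
    if underpayment <= 0:
        # Assumption: no noncompliance found, so the licensor bears its own audit costs.
        return Decimal("0")
    return min(actual_audit_costs, cap_fraction * underpayment)


# $10,000 of additional license fees are due; the auditor billed $4,000.
capped = recoverable_audit_costs(Decimal("4000"), Decimal("10000"))

# Same underpayment, but the audit cost only $1,800, which is below the cap.
uncapped = recoverable_audit_costs(Decimal("1800"), Decimal("10000"))
```

In the first case the licensee owes $2,500 of the $4,000 audit bill; in the second the cap is not reached and the licensee owes the full $1,800.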

7.11 Warranty

Big Data is merely a collection of large datasets, often unverified and unchecked. Although the licensor may have created some of the licensed data, given the complexity and size of the Big Data databases, it is unlikely that all data contained therein was generated by the licensor and even less likely that one is able to verify the data. As such, licensors generally provide the database on an “as-is” basis and therefore are unlikely to agree to any protection with respect to a licensee’s use of the data, any errors in the data, or losses resulting from the use of the data. However, many of these provisions will likely depend on the actual data licensed.

One common theme often repeated by many licensors is that Big Data is provided or made available to others as a research tool. Generally, research tools are made available at the licensee’s discretion and advisement. The licensee is therefore generally responsible for determining the applicability and legality of the use of the dataset and any results in its sole and absolute discretion. Therefore, any losses or liabilities incurred by the licensee based on any action or inaction taken by the licensee, as between the licensor and the licensee, are those of the licensee.

Under a traditional license agreement, the licensor was often asked to warrant that:

  • it was the owner or licensee (with the right to sublicense to others) of the data it was licensing;
  • it owned or had the necessary rights and consents to grant access to the data to the licensee; and
  • the licensee’s intended use of the data is allowable.

However, with Big Data, given the vast amounts and variety of data within the licensed database and the variety of ways in which such data is collected, it is difficult to know with any certainty the nature and scope of data contained therein or what rights a licensor may have in the licensed data. Accordingly, licensees may express some caution when entering into agreements granting access to Big Data, especially where licensors are hesitant and even unwilling to provide certain warranties that licensees are accustomed to receiving under typical software and even some database licensing agreements.

Stemming from the traditional data license agreements, licensees often attempt to pressure licensors into providing (additional) warranties with respect to the nature of the database. However, even traditional “acceptable” data license warranties are somewhat problematic for licensors of Big Data. Typical warranties that may be found in a standard data license agreement may include:

  • The licensor has all rights necessary, including those of third parties, in order to grant the rights provided under the agreement;
  • (To the best of its knowledge as of the effective date), licensee’s authorized use of the data does not and shall not infringe the rights of any third party; and
  • The licensed data (to the best of licensor’s knowledge) does not contain any errors, and that licensor will promptly notify licensee of such errors and will promptly resolve any such errors in the licensed data.

Although these warranties may be considered reasonable for a typical structured data license, licensors should approach such warranties with caution when dealing with Big Data. Given how the data is collected, the nature of the data, how and with what the data is combined, general principles of intellectual property rights, consents granted by individuals whose data was collected, and applicable privacy policies in effect when the information was collected, granting any of these warranties may be highly problematic for a licensor, potentially subjecting the licensor to significant liability. Further complicating this issue is the fact that, in many cases, the licensor did not collect or generate the data being licensed but is merely a licensee itself from a third party. In some cases, a licensor may only be providing access to a third party’s database. In each case, the licensor may not have been granted equivalent warranties from its upstream licensors. Thus, it may not be reasonable to expect a licensor to take on liability that belongs to other parties.

Licensees should consider the nature of the license, the nature of the database, and the intended purposes of the license agreement and determine whether certain warranties would be applicable. The following series of warranties may be applicable for licensees of Big Data:

  • Licensor, to the best of its knowledge, has the necessary rights to provide or otherwise make the data available to the licensee;
  • Licensor is not providing any data to the licensee where licensor knows, or should reasonably know, that it does not have the rights to provide such data; and
  • The licensed data has not been manipulated by the licensor or other parties in such a manner as to render the data or the results of any analytics performed on such data questionable or worthless.

The final warranty just presented relates to the issues of data compression and salting of the database. Licensees should consider obtaining some reassurance that the data they are licensing has value, is not missing potentially key pieces of information, and does not contain an amount of dummy data significant enough to render the database worthless. Given the size of Big Data, it is not uncommon for licensors to employ lossy compression (i.e., a form of compression by which some of the original information is lost to reduce file size). Licensees should question the licensor about its use of lossy compression and dummy data, because understanding both will enable the licensee to better determine the value of the Big Data licensed. Although licensors are generally willing to discuss their compression mechanisms, many are not willing to discuss how, or even whether, they salt their database for fear of undermining one of their intellectual property protections. As to compression, some licensees have been able to obtain warranties that any compression techniques applied will be lossless, enabling the licensee to reconstruct the data in its original form.
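A lossless-compression warranty is one of the few warranties in this area that can be checked mechanically. The following sketch assumes, hypothetically, that the licensor publishes a SHA-256 digest of the original data and delivers the data compressed with zlib; the function names and sample records are invented for the example, and rounding the readings merely stands in for a lossy process.

```python
import hashlib
import zlib


def sha256_hex(data: bytes) -> str:
    """Hex digest used as a fingerprint of the original, uncompressed data."""
    return hashlib.sha256(data).hexdigest()


def is_lossless(delivered: bytes, original_digest: str) -> bool:
    """True if decompressing the delivery reproduces the original data bit for bit."""
    return sha256_hex(zlib.decompress(delivered)) == original_digest


# Illustrative records and the digest the licensor would disclose.
original = b"id,reading\n1,20.4871\n2,19.9302\n"
digest = sha256_hex(original)

# Lossless delivery: zlib compression is fully reversible.
lossless_delivery = zlib.compress(original, level=9)

# Stand-in for a lossy process: precision was discarded before delivery.
lossy_delivery = zlib.compress(b"id,reading\n1,20.5\n2,19.9\n", level=9)

lossless_ok = is_lossless(lossless_delivery, digest)
lossy_ok = is_lossless(lossy_delivery, digest)
```

A check of this kind confirms only that nothing was lost between the licensor's original and the delivered copy. It says nothing about whether the original itself contains salted entries.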

Additional warranties that licensees may wish to consider with respect to the licensing of Big Data include warranties whereby the licensor warrants that:

  • The data is not corrupt;
  • The licensor did not insert malicious code; and
  • With respect to any “structured data,” the database is organized and formatted in a particular manner (which is disclosed to the licensee).

7.12 Indemnification

An indemnification is a contractual provision in which a party (the “indemnitor” or “indemnifying party”) promises to pay the losses and other damages incurred by the other party (the “indemnitee” or “indemnified party”) under certain conditions as set forth in the agreement. The most common form of indemnification found in license agreements is an indemnity for claims by a third party that the licensed materials infringe that third party’s intellectual property rights (e.g., the data in a database was copied without authorization from a third party and that copy infringes the third party’s copyrights). Similarly, it is common for the licensor to require an indemnity from the licensee protecting the licensor from claims and damages arising from the licensee’s use of the licensed materials in excess of the rights granted in the license agreement (e.g., the licensee is granted a license to use a database for its internal purposes but breaches the license by distributing the database to others, causing an infringement or other type of claim).

Given the nature of Big Data, the unanticipated uses of Big Data, and the risks and liabilities associated with the use and licensing of Big Data discussed in this book, most licensors of Big Data are hesitant or refuse to offer any form of indemnification to a licensee. Licensees have been able to achieve certain protection and/or indemnification given a variety of factors, including as a result of the licensee’s negotiating power, the relationship between the parties, and the experience level of the counsel and business team representing the licensor and licensee.

In some instances, a licensee may be well positioned for receipt of limited indemnification. In some cases, licensees and licensors enter into a license agreement for specific reasons, including with respect to specialized Big Data sets. Accordingly, it is not uncommon to receive a limited indemnification, generally subject to a cap on damages, that the licensor has all necessary rights and consents in the licensed database governing the licensed use and sublicense granted to licensee thereunder. In addition, the indemnification may expressly exclude, where appropriate, the delivery of the data and any technology, manipulation, alteration, or combination of the data.

7.13 Limitation of Liability

Almost every license agreement includes a limitation of liability defining the parties’ respective liability for damages. Limitations of liability typically have two parts: a disclaimer of all consequential damages (e.g., lost profits) and a cap on all other damages, which is typically linked to some portion of the contract. Licensors generally present a one-sided limitation of liability that protects the licensor and exposes the licensee to unlimited damages. This is common and frequently accepted by licensees.

Licensees generally request two types of changes to the limitation of liability: first that it be made mutual and second that, at minimum, the licensor’s confidentiality and indemnity obligations, if any, be excluded from all limitations of liability. If a licensor is inclined to grant mutuality, it must ensure that breach of the license grant or infringement of the licensor’s intellectual property rights by the licensee be excluded from the limitation of liability. Without those exclusions, the licensor has essentially sold its rights in the database for the value of the cap on damages. That is, if the agreement disclaims all liability for consequential damages and caps liability at one month of license fees and that language is made mutual, the licensor has just “sold” its rights in the database for one month of fees.
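The effect of a mutual limitation without the exclusions discussed above can be shown with simple arithmetic. In this sketch the function name and all dollar figures are hypothetical; it assumes a clause that disclaims consequential damages, caps all other damages at one month of license fees, and, where an exclusion applies, removes the claim from both the disclaimer and the cap.

```python
def recoverable_damages(direct: int, consequential: int, cap: int, carved_out: bool) -> int:
    """Damages recoverable under the limitation of liability sketched above."""
    if carved_out:
        # Claim is excluded from the limitation: neither the disclaimer nor the cap applies.
        return direct + consequential
    # Consequential damages are disclaimed; direct damages are limited to the cap.
    return min(direct, cap)


monthly_fee = 5_000          # hypothetical license fee; the cap is one month of fees
direct_loss = 50_000         # hypothetical direct damages from the licensee's infringement
lost_profits = 200_000       # hypothetical consequential damages

# Mutual limitation with no exclusion for infringement of the licensor's rights.
without_exclusion = recoverable_damages(direct_loss, lost_profits, monthly_fee, carved_out=False)

# Same facts, with infringement excluded from the limitation of liability.
with_exclusion = recoverable_damages(direct_loss, lost_profits, monthly_fee, carved_out=True)
```

On these assumed figures, the licensor recovers $5,000 without the exclusion and $250,000 with it, which is the sense in which an unqualified mutual cap "sells" the database for one month of fees.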

7.14 Conclusion

Although there are similarities between traditional license agreements and those used for Big Data, the key differences and issues highlighted in this chapter make clear that using a traditional license agreement is not appropriate for this new type of transaction. Big Data requires a fresh look at common provisions such as intellectual property ownership, indemnification, and the type and scope of license granted. Licensees and licensors can use this chapter as a checklist to mitigate risk in their Big Data license agreements.

Notes

1. Pub. L. No. 104-191, § 264 (1996), codified at 42 USC § 1320d; Standards for Privacy of Individually Identifiable Health Information, 45 CFR § 160 (2002), 45 CFR § 164 subpts. A, E (2002).

2. 27 CFR Part 248.

3. Danny Sullivan. Google: Bing Is Cheating, Copying Our Search Results. February 1, 2011. http://searchengineland.com/google-bing-is-cheating-copying-our-search-results-62914.
