Last fall, to little fanfare, the U.S. Copyright Office granted an exemption to a longstanding restriction on digital access to copyrighted books and movies, allowing academic researchers to bypass encryption so they can apply sophisticated datamining techniques to contemporary books and films. These same techniques have yielded powerful insights in the financial, scientific and medical fields for decades because the materials they depend on are not generally protected by encryption backed up by federal law. As a result, researchers have been able to, for example, rapidly survey a large body of coronavirus literature.

Some film scholars may be able to take advantage of the Copyright Office’s exemption by purchasing DVDs and bypassing their encryption. That would be a big win for our collective understanding of an important part of our culture, particularly given the global dominance of the United States film industry.

But for those wishing to study literature, the exemption has proved frustratingly unworkable. Virtually every e-book available on the market today is licensed with terms that prohibit bypassing encryption. So, while an academic breaking encryption for datamining no longer violates federal law, researchers could still be forced to retract a paper for failing to follow contractual terms, as has already happened to one paper about Covid-19 vaccine hesitancy. Also, researchers may be liable for money damages for violating the contractual terms.

That means that researchers in the humanities using text datamining techniques are still largely limited to the study of works in the public domain (i.e., before 1925). Imagine if a data scientist were limited to using population data from 1950, or if a medical researcher were prevented from conducting meta-analysis on DNA samples from the past 25 years.

While no one is likely to discover the cure for cancer by studying popular culture, this new copyright exemption has the potential to inform—and change—the cultural conversation in ways not previously possible. Given the enormous influence of American popular culture on our global society—not to mention our country’s ongoing reckoning with its history of racial injustice—this is no small thing.

Until the Copyright Office granted the exemption, section 1201 of the Digital Millennium Copyright Act (DMCA) prevented researchers from engaging in datamining of in-copyright works. The DMCA includes a provision that prevents anyone – including academics pursuing clearly legal research projects – from accessing copyrighted materials that are under a digital lock and key. Violators of the Act, which is meant to deter Internet piracy, face stiff criminal and civil penalties of up to $500,000 and up to 5 years in jail for the first offense and double the fines and jail time for the second offense. Even for a good cause, few academics are willing to go to jail in the pursuit of knowledge.

To remove this barrier, 14 researchers, as well as two experts in academic publishing and the Association for Computers and the Humanities, a professional organization, submitted letters supporting a petition filed by Authors Alliance, a digital advocacy group for writers, with the assistance of the Samuelson Law, Technology & Public Policy Clinic at Berkeley Law (which I direct). In October 2021, the Copyright Office granted an exemption permitting researchers to bypass encryption, removing one barrier to research moving forward. This is progress.

But the problem remains that academics who want to engage in datamining of e-books are still largely blocked from doing so. Academics will not carry out research projects, however valuable, that are not publishable because conducting them requires violating contract law. Moreover, few academics will be willing to take on personal liability for tens or hundreds of thousands of dollars in damages for contract violations to advance their research agendas.

There are a few possible ways to ensure that academics can bypass encryption to conduct datamining, but each of them brings its own challenges. The best solution would be for Congress to protect researchers’ rights under copyright by passing legislation that guarantees that publishers cannot, via contract, limit what the law otherwise allows researchers to do. But Congress is plagued by partisan gridlock, and the content industry’s lobbying power is formidable.

States, also, could act. After all, they administer robust systems of higher education and have an interest in making sure academics can continue to do cutting-edge work. In a related controversy regarding the contract restrictions that publishers impose on libraries buying e-books, some have proposed that states regulate the terms of e-book licenses. Assuming this novel approach is successful, states could also consider legislating that e-book contract provisions forbidding academics from bypassing encryption to conduct datamining are likewise against public policy and unenforceable. But this would result in only piecemeal protections, as not all states are likely to take action.

Finally, large university systems could attempt to leverage their market power to insist that e-book contracts permit their faculty and students to bypass encryption for datamining. In some recent battles between publishers and university systems, universities have succeeded in obtaining more favorable contract provisions than those originally on offer. However, university collections tend to underrepresent the popular works that generate the most research interest among digital humanities scholars. Thus, large platforms providing these works—like Amazon, Apple, and Google—also should use their considerable negotiating leverage to ensure that the rights their users enjoy under law are not taken away by contract.

To be sure, some authors and publishers worry that “rogue actors” will crack encryption on e-books and then make them available for free on the internet, depriving authors and publishers of compensation. But this concern has been addressed adequately. The Copyright Office already requires academic researchers to use strict security measures to safeguard e-books that have been unlocked for text datamining. Academic researchers routinely secure sensitive research data ranging from individuals’ medical data to national security information—surely these security measures are more than sufficient to secure e-books as well.

One thing is clear: Datamining is a valuable research technique across many spheres of learning. The U.S. Copyright Office finally opened the door for American academics to engage in this 21st-century technique by permitting researchers to bypass encryption on in-copyright works, but publishers’ outmoded policies are keeping this potential source of cultural advancement locked firmly in the past.