Anthropic Agrees to Pay $1.5 Billion to Settle Lawsuit with Book Authors | GeekNews

laeyoung
2025.09.14
· News · by Anonymous
#AI #LLM #Copyright #Lawsuit #Anthropic

Key Points

  1. Anthropic settled with authors for $1.5 billion, the largest copyright payout in US history, agreeing to pay $3,000 per work for up to 500,000 titles and destroy illegally obtained datasets.
  2. Crucially, this settlement establishes no legal precedent for AI training's fair use or data acquisition methods, meaning future similar cases will restart from scratch.
  3. The agreement highlights the financial muscle required for major AI companies to navigate legal challenges, prompting wider discussions about data access for LLM development and content protection for creators.

Anthropic has agreed to pay $1.5 billion to settle a copyright lawsuit brought by book authors, the largest settlement in U.S. copyright history. The agreement stipulates a payment of $3,000 per work for an estimated 500,000 works that were allegedly infringed. The initial $1.5 billion fund is based on that 500,000-work estimate; if more works are identified, Anthropic will pay an additional $3,000 for each one.
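The settlement arithmetic above can be sanity-checked directly (the `extra_works` count is a hypothetical illustration, not a figure from the settlement):

```python
# Settlement math from the article: $3,000 per work, 500,000-work baseline.
PER_WORK = 3_000          # dollars paid per infringed work
BASELINE_WORKS = 500_000  # estimated number of works covered by the fund

fund = PER_WORK * BASELINE_WORKS
print(f"Baseline fund: ${fund:,}")  # Baseline fund: $1,500,000,000

# Each work identified beyond the baseline adds another $3,000:
extra_works = 50_000  # hypothetical example count
total = fund + extra_works * PER_WORK
print(f"With {extra_works:,} extra works: ${total:,}")
```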

A central point clarified in the discussion is that the lawsuit's core contention was not the legality of AI training itself, nor whether such training constitutes "fair use." Instead, the dispute centered on Anthropic's unauthorized acquisition and use of pirated book data (specifically from sources such as LibGen and PiLiMi) for model training. While training a model on lawfully obtained data might fall under fair use, pirating copyrighted material from illicit sources was the conduct at issue. The discussion highlights that the "fair use" doctrine applies to the usage of *lawfully accessed* material, not to the initial, unauthorized acquisition of copyrighted content.

As part of the settlement, Anthropic is obligated to destroy all data obtained from LibGen and PiLiMi, irrespective of legal preservation demands. The agreement covers past infringement liability only for works included in an official "Works List" by August 25, 2025, and does not resolve future infringements or issues related to AI-generated output. Crucially, the settlement establishes no legal precedent and includes no admission of wrongdoing, which means similar future lawsuits would have to be litigated from scratch. Many commentators suggest that such settlements, especially those involving large corporations, are chosen precisely to avoid adverse rulings that could set unfavorable precedents or acknowledge culpability. The strategy is compared to Uber's early practice of operating without taxi licenses, relying on investor funds to cover fines and lobbying.

From a financial perspective, despite the substantial $1.5 billion payout, Anthropic's recent funding rounds, including a total of over $27 billion raised since its inception, suggest this amount is manageable. Investors might view this as a strategic move to de-risk future operations by resolving potential legal liabilities, thereby enhancing the company's valuation and investment appeal within the industry. The settlement can be framed as creating a narrative mutually agreeable to both parties: "training is acceptable, but pirated data is the issue," thereby averting a legal ruling that AI training itself is illegal.

The discussion also touches upon the broader implications for the AI industry:

  • Cost of Data Acquisition: The settlement's effective price of $3,000 per work suggests that only well-funded corporations like Anthropic might be able to afford legal datasets, potentially disadvantaging open-source AI initiatives that rely on more accessible (and sometimes illicit) data.
  • Feasibility of Legal Acquisition: The idea of legally acquiring and scanning "millions of books" is deemed impractical and too slow for the rapid development pace required by VC-backed AI companies. The efficiency of using readily available (even pirated) digital copies was a driving factor.
  • Content Protection for the Web: Ideas for protecting web content from AI crawlers while remaining accessible to humans are explored, such as "login walls," terms of service agreements, CAPTCHAs (acting as DMCA security measures), or offering content via paid APIs. However, the legal and technical complexity, especially regarding "fair use" and "transformative use" (where LLM training might be deemed transformative), makes such protections challenging to enforce effectively. For instance, if courts classify LLM training as transformative use, adding clauses like "LLM training prohibited" might not be legally enforceable, similar to how a musician cannot forbid sampling their music.
  • International Disparity: The settlement's impact might primarily be felt by Western AI companies, while AI industries in other countries, particularly China, might perceive a competitive advantage due to potentially fewer restrictions on data collection and usage.
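One widely used mechanism for the crawler-blocking ideas above is a robots.txt file targeting known AI crawler user agents. The agent names below are real, but a minimal sketch like this is advisory only: compliance is voluntary, which illustrates why the discussion turns to login walls, CAPTCHAs, and paid APIs for stronger protection.

```
# robots.txt — advisory only; compliant AI crawlers honor these directives
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

# Human visitors and ordinary search engines remain unaffected
User-agent: *
Allow: /
```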