Timely Tech Takeaways

AI, Copyrights, and the Book Scanning Saga

We unravel Anthropic’s controversial approach to training AI on millions of books, the court battles it sparked, and what it all means for copyright law. Dive into the ethical, legal, and business shockwaves reshaping the future of publishing and artificial intelligence.


Chapter 1

Inside Anthropic’s Book-Building Machine

Charlie Vox

Alright, welcome back to Timely Tech Takeaways. I’m Charlie Vox, and as always, I’m joined by the one and only Liam Harper. Today, we’re diving into a story that’s got the publishing world, well, kind of losing its collective mind—Anthropic’s wild ride to build a massive AI book library.

Liam Harper

Yeah, and when Charlie says “massive,” he’s not exaggerating. We’re talking millions of books, right? Some of them, uh, not exactly acquired through your local indie bookstore, if you catch my drift.

Charlie Vox

No, not at all. Anthropic started out by downloading over seven million books from places like Library Genesis and Pirate Library Mirror. Basically, the Napster of books, but with a lot more copyright headaches. And, I mean, internally, even they were worried about the legality of it all.

Liam Harper

Yeah, and then they tried to clean up their act—sort of. They went on this book-buying spree, buying millions of physical books in bulk. But here’s the kicker: they didn’t just read them or, you know, put them on a shelf. They literally ripped the bindings off, scanned every page, and then tossed the originals. Destructive scanning, they call it. Sounds like something out of a dystopian novel, honestly.

Charlie Vox

It’s a bit brutal, isn’t it? I mean, I get the technical side—scanning makes it searchable, easier to feed into an AI. But the idea of buying a book just to destroy it, it’s, well, it’s a bit unsettling. And it’s not just about the tech, there’s this whole ethical layer. Like, is it okay to destroy millions of books for the sake of building a smarter chatbot?

Liam Harper

Yeah, and it’s not just the destruction. There’s the whole question of, “Does buying a book give you the right to turn it into data for an AI?” I mean, we’ve talked about this kind of thing before—like with music and film in previous episodes—but books have this, I dunno, almost sacred vibe for a lot of people. It’s not just content, it’s culture.

Charlie Vox

Exactly. And the fact that they discarded the physical copies after scanning, it’s not like they were preserving anything. It’s pure data extraction. But, you know, from a technical perspective, it’s efficient. From an ethical one, it’s, well, a bit of a minefield.

Liam Harper

And let’s not forget, the whole thing started because they wanted to avoid the legal mess of using pirated books. So, they just made a different kind of mess. Classic tech move, right?

Chapter 2

The Legal Showdown: Fair Use vs. Infringement

Charlie Vox

Which brings us to the courtroom drama. So, in June 2025, Judge Alsup handed down this ruling that’s, honestly, kind of a landmark for AI and copyright. He said that using copyrighted books to train an AI—if you bought them legally, mind you—is “exceedingly transformative.” That’s the phrase. And that means it’s fair use under U.S. law.

Liam Harper

Yeah, and “exceedingly transformative” is, like, the legal equivalent of a gold star. The judge basically said, “Look, the AI isn’t just copying the book, it’s turning it into something new—statistical relationships, generative capabilities, all that jazz.” So, no, the AI isn’t just spitting out Harry Potter word-for-word. At least, not on purpose.

Charlie Vox

But—and it’s a big but—if you’re using pirated copies to build a permanent digital library, that’s a no-go. The court drew a pretty sharp line there. It’s one thing to use a book you bought for training, it’s another to hoard a bunch of pirated stuff for, well, whatever you want later.

Liam Harper

Right, and the judge was clear: keeping pirated books around for “potential future uses” isn’t transformative, it’s just, you know, copyright infringement. It’s like, if you buy a song, you can listen to it, maybe remix it for your own use, but you can’t download the whole record store and say, “It’s for research!”

Charlie Vox

That’s a terrible analogy, but I get what you mean. And it’s interesting because this isn’t just about Anthropic. We’ve seen similar cases with Meta and others—some lawsuits getting tossed, some sticking. The legal landscape is, well, shifting under everyone’s feet.

Liam Harper

Yeah, and it reminds me of that old Google Books lawsuit. Remember that? Google scanned millions of books, and after years of legal wrangling, the courts said it was fair use because they weren’t just handing out free copies—they were transforming the content, making it searchable. But here, the difference is, well, the pirated part. That’s where Anthropic tripped up.

Charlie Vox

And the distinction really matters. If you’re using stuff you bought, you’re on much safer ground. If you’re using pirated material, you’re, well, probably going to court. Or at least, your lawyers are going to be very busy.

Liam Harper

And probably very rich, too. But yeah, it’s a precedent that’s going to shape how AI companies build their datasets. You can’t just scrape the internet and hope for the best anymore. Or, well, you can, but you might end up in front of Judge Alsup.

Chapter 3

Publishers, Power, and the Future of AI Training

Charlie Vox

So, let’s talk about why publishers are, frankly, freaking out. It’s not just about losing a few book sales. The whole business model is under threat. If AI can generate content based on their books, what happens to their revenue? Their digital engagement? It’s a bit of an existential crisis.

Liam Harper

Yeah, and it’s not just books. News publishers are getting hammered, too. There’s this case where a major news publisher saw a big drop in ad revenue because people were getting summaries and answers from AI instead of clicking through to the original articles. That’s a direct hit to their bottom line.

Charlie Vox

And the response has been, well, a bit of a scramble. Some publishers are suing, some are trying to cut licensing deals with AI companies. But the power dynamic is, let’s be honest, pretty lopsided. The AI companies have the tech, the data, and, often, the money. Publishers are just trying to keep up.

Liam Harper

Yeah, and there’s no standard pricing for this stuff. It’s like the Wild West out there. Some publishers are getting deals, others are getting left out. And the legal frameworks are, uh, still catching up. It’s a mess, honestly.

Charlie Vox

And it’s not just about money. There’s this bigger question of, what does it mean for creativity, for culture, if AI can just remix everything that’s ever been written? We touched on this in our music and film episodes, but with books, it feels even more, I dunno, personal?

Liam Harper

Yeah, and I don’t think we’ve seen the end of it. Publishers are going to keep fighting, AI companies are going to keep pushing, and the courts are going to be busy for a long time. But, you know, it’s not all doom and gloom. There’s a chance for new licensing models, maybe even new ways for authors and publishers to get paid. If, uh, everyone can agree on what “fair” looks like.

Charlie Vox

And that’s the million-dollar question, isn’t it? What’s fair in a world where AI can read—and rewrite—everything? We’ll be keeping an eye on it, and I’m sure we’ll be back with more updates as this saga unfolds.

Liam Harper

Yeah, and if you’re a publisher, maybe don’t throw out your legal team just yet. You’re probably gonna need them. Alright, Charlie, always a pleasure.

Charlie Vox

Likewise, Liam. Thanks for listening, everyone. We’ll catch you next time on Timely Tech Takeaways. Take care!

Liam Harper

See ya, folks!