A group of news organizations led by The New York Times took ChatGPT maker OpenAI to federal court on Tuesday, in a hearing that could determine whether the tech company will have to face the publishers in a high-profile copyright infringement trial.
Three publishers’ lawsuits against OpenAI and its financial backer Microsoft have been consolidated into one case. Leading the three combined cases are The Times, the New York Daily News, and the Center for Investigative Reporting.
While other publishers, including The Associated Press, News Corp., and Vox Media, have reached content-sharing agreements with OpenAI, the three litigants in this case are taking the opposite route and going on the offensive.
Tuesday’s hearing centered on OpenAI’s motion to dismiss, a critical stage at which a judge decides whether to let the case proceed to trial or throw it out.
The publishers’ core argument is that the data powering ChatGPT includes millions of copyrighted works from the news organizations, articles that the publishers allege were used without consent or payment. They claim this amounts to copyright infringement on a massive scale.
“We have to follow the data,” Times attorney Jennifer Maisel said in court Tuesday. “It’s like following the money in a criminal case.”
Following the data, the publishers’ legal team argued, shows that OpenAI and Microsoft are profiting from journalistic work that was scanned, processed, and recreated without payment or consent. Microsoft has integrated OpenAI’s technology into its Bing search engine.
“This is a substitute,” said Ian Crosby, a lawyer for the Times, meaning that for some people, ChatGPT and Bing have become substitutes for the publishers’ original work. Proving that point could be crucial to winning the copyright infringement case.
In court filings, Crosby wrote that OpenAI’s “illegal use of The Times’ copyrighted material to develop artificial intelligence products that compete with The Times threatens The Times’ ability to provide its services.”
For OpenAI, “it was very lucrative to use other people’s valuable intellectual property in this way without paying for it,” he continued.
OpenAI has argued that its use of the vast amounts of data needed to train its artificial intelligence chatbots is protected by the “fair use” doctrine, a principle of American law that permits copyrighted material to be used for purposes such as teaching, research, and commentary.
To pass the fair use test, the work in question must transform the copyrighted work into something new, and the new work must not compete with the original in the same market, among other factors.
Lawyers for OpenAI and Microsoft walked Judge Sidney Stein, who was appointed by President Bill Clinton, through how large language models like ChatGPT work, to make the case that their use of the text is transformative.
Lawyers for the companies said that once data is fed into OpenAI’s artificial intelligence models, it is broken down into a series of “tokens,” units that make the data more manageable to analyze. Eventually, the model learns to recognize patterns.
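For readers curious what “breaking text into tokens” and “recognizing patterns” can look like, here is a deliberately simplified Python sketch. It is not OpenAI’s tokenizer: the regex word splitter, the `build_vocab` and `bigram_counts` helpers, and the sample sentence are hypothetical illustrations. Production models use learned subword schemes such as byte-pair encoding and learn patterns with neural networks rather than raw counts.

```python
import re
from collections import Counter

# Hypothetical toy tokenizer: splits text into words and punctuation marks.
# This is an illustrative assumption; production LLMs use learned subword
# schemes (such as byte-pair encoding), not this regex.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Break raw text into a list of string tokens."""
    return TOKEN_PATTERN.findall(text)


def build_vocab(tokens: list[str]) -> dict[str, int]:
    """Give each distinct token an integer ID, in order of first appearance."""
    vocab: dict[str, int] = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def encode(tokens: list[str], vocab: dict[str, int]) -> list[int]:
    """Replace each token with its integer ID; models operate on these numbers."""
    return [vocab[tok] for tok in tokens]


def bigram_counts(ids: list[int]) -> Counter:
    """Count which token ID follows which: a crude stand-in for pattern learning."""
    return Counter(zip(ids, ids[1:]))


text = "The court heard arguments. The court did not rule."
tokens = tokenize(text)
vocab = build_vocab(tokens)
ids = encode(tokens, vocab)
pairs = bigram_counts(ids)
print(ids)
print(pairs.most_common(1))
```

Here the pair “The” followed by “court” appears twice, so even this toy counter picks up that “court” tends to follow “The.” Real models capture far richer statistics, but the underlying idea of turning text into numbered units and learning which ones go together is the same.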
OpenAI attorney Joseph Gratz said that regurgitating entire articles is “not the intent or feature” of how ChatGPT operates.
“This is not a document retrieval system; it’s a large language model,” Gratz said.
Gratz argued that the instances of infringement the Times cited in its lawsuit could only have been produced after “thousands, tens of thousands” of prompts. Essentially, Gratz argued, the publishers tricked the chatbot into spitting out text from their own websites.
Microsoft says the Times is using its “power and megaphone” to take on a threatening technology
Lawyers for Microsoft, OpenAI’s largest investor, wrote in a motion to dismiss that it is not illegal for OpenAI to incorporate the publishers’ journalistic work.
“In this case, the New York Times is using its power and megaphone to challenge the latest significant technological advance: large language models (LLMs),” they wrote in a court filing that also explained the technology behind ChatGPT. “Despite the Times’ claims, copyright law is no more a barrier to LLMs than it was to VCRs (or player pianos, photocopiers, personal computers, the Internet, or search engines).”
But the news organizations argue that ChatGPT’s global success depends in part on scavenging their trove of copyrighted articles, and that ChatGPT, which many people now treat as a trusted source of information, has effectively become their competitor.
That was part of Tuesday’s courtroom discussion, as was another aspect of how ChatGPT works, known as “retrieval-augmented generation.” In lay terms, it pulls recent, more specific information from the web into the chatbot’s answers.
Some of this information, such as the latest news articles, may not have been part of the chatbot’s training data but can still appear in ChatGPT’s output.
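The general retrieval-augmented generation pattern can be sketched in a few lines of Python: find relevant documents at question time, then paste their text into the prompt the model answers from. This is a minimal sketch under stated assumptions, not how ChatGPT actually implements it. The `Article` type, the word-overlap scoring, the prompt wording, and the example.com URLs are all hypothetical; real systems typically rely on search indexes or vector embeddings, and the final step of sending the prompt to a model is omitted.

```python
import re
from dataclasses import dataclass


@dataclass
class Article:
    url: str  # hypothetical placeholder URL
    text: str


def words(s: str) -> set[str]:
    """Lowercased set of word tokens, used for naive keyword matching."""
    return set(re.findall(r"\w+", s.lower()))


def retrieve(query: str, corpus: list[Article], k: int = 1) -> list[Article]:
    """Return the k articles sharing the most words with the query.

    Toy scoring assumption: real systems typically use a search index or
    vector embeddings rather than raw word overlap.
    """
    q = words(query)
    ranked = sorted(corpus, key=lambda a: len(q & words(a.text)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, sources: list[Article]) -> str:
    """Splice the retrieved text into the prompt sent to the language model."""
    context = "\n".join(f"[{a.url}] {a.text}" for a in sources)
    return f"Answer using the sources below.\n{context}\n\nQuestion: {query}"


corpus = [
    Article("https://example.com/weather", "Forecast calls for rain on Tuesday."),
    Article("https://example.com/court-hearing",
            "Judge hears motion to dismiss in copyright case."),
]
query = "What happened with the motion to dismiss?"
prompt = build_prompt(query, retrieve(query, corpus))
print(prompt)
# The generation step (sending `prompt` to a model) is omitted here.
```

Because the retrieved article text is inserted into the prompt at answer time, content the model never saw during training can still surface in its response.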
“This allows for free riding,” said Stephen Lieberman, a lawyer for the New York Daily News, referring to readers who view OpenAI’s reproduction of a newspaper article instead of visiting the publisher’s website.
What happens next?
According to the Times’ complaint, OpenAI could be liable for billions of dollars in damages for unlawfully copying and using the newspaper’s archives. The lawsuit also seeks the destruction of ChatGPT’s dataset.
That would be a drastic remedy. If the publishers win and a federal judge orders the dataset destroyed, OpenAI could be completely upended, since it would be forced to rebuild the dataset using only copyrighted material it has been authorized to use.
Federal copyright law also carries stiff financial penalties: violators can be ordered to pay statutory damages of up to $150,000 for each infringement committed “willfully.”
“If you’re copying millions of works, you can see how that becomes a number that could be fatal for a company,” he told NPR in August 2023, when the Times was weighing legal action against OpenAI; the paper filed its lawsuit that December. “Copyright law is a sword that will hang over AI companies’ heads for years unless they find a way to negotiate a solution.”
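A back-of-the-envelope Python sketch shows how a per-infringement cap can reach an existential figure. It assumes, for illustration only, that each copied work counts as one willful infringement awarded at the $150,000 maximum. The work counts are hypothetical, not figures from the case, and actual awards are set by courts and can be far lower.

```python
# Statutory cap for willful infringement cited in the article (USD per work).
MAX_WILLFUL_DAMAGES_PER_WORK = 150_000


def max_statutory_damages(works_infringed: int) -> int:
    """Theoretical ceiling: number of works times the per-work willful cap.

    Courts choose an award within a statutory range, so actual damages
    can be far lower; this only computes the upper bound.
    """
    if works_infringed < 0:
        raise ValueError("works_infringed must be non-negative")
    return works_infringed * MAX_WILLFUL_DAMAGES_PER_WORK


# Hypothetical work counts, chosen only for illustration.
for n in (1_000, 10_000, 1_000_000):
    print(f"{n:>9,} works -> up to ${max_statutory_damages(n):,}")
```

At one million works, the theoretical ceiling reaches $150 billion, which is the scale of exposure the quote above gestures at.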
Judge Stein did not issue a ruling Tuesday, but said he would rule soon on whether the lawsuit against OpenAI can proceed or be dismissed.