
Column: OpenAI accuses China of stealing its content, the same accusation that authors have made against OpenAI

OpenAI CEO Sam Altman: Did China just show his business model to be flawed?
(Eric Risberg / Associated Press)

The unveiling Monday of a Chinese-made AI bot that seems cheaper, more efficient and in some ways more accurate than American-grown versions certainly kicked up a fuss in the AI space.

Nvidia, the maker of the high-priced chips indispensable for AI development, lost nearly $600 billion, or 17%, in stock market value, the largest one-day market drop ever for any U.S. stock. The loss wiped out the previous record, a $279-billion decline suffered last September by, yes, Nvidia. It also triggered a cascade of losses for other tech companies and consequently a major 3% downdraft in the Nasdaq index.

The announcement by the Chinese firm DeepSeek of its R1 model also provoked not a little hand-wringing over the idea that China could so easily have outpaced American tech companies, which have spent hundreds of billions of dollars trying to bring their AI performance to a level that DeepSeek seems to have achieved at a fraction of the cost.


We are aware of and reviewing indications that DeepSeek may have inappropriately distilled our models.

— OpenAI

The reaction resembles the thunderbolt that struck the U.S. aerospace community — and the government — in 1957, when the Soviet Union placed Sputnik in orbit while American rockets were still blowing up on their launchpads.

But another aspect of this fight should bring smiles to the faces of critics of OpenAI and other AI firms. It’s that OpenAI is accusing DeepSeek of, in effect, stealing its work to train R1.


That accusation bears a strong resemblance to the accusations that authors and artists have laid against OpenAI and other bot developers — specifically, that the developers have infringed the content creators’ copyrights by using their works to “train” their bots — plying the bots with content that the tools spit back to users, typically in somewhat altered form.

That charge is set forth in lawsuits brought in federal courts across the land by the authors and artists. Most of those lawsuits, including one filed by the New York Times against OpenAI in 2023, remain unresolved, as federal judges grapple with what may be a novel issue in copyright law.


Is poetic justice at play? Or, to put it as Shakespeare did in “Hamlet,” have the U.S. AI companies just been hoisted with their own petard?

Let’s take a look. First, a brief primer on how AI tools are developed, and why OpenAI says it’s acting legally and DeepSeek may not be.

Although AI chatbots may seem to the untutored user to be generating their own thoughts in responding to questions, they don’t create content, as such. They have to be “trained” by developers pumping their databases full of human-produced content — books, newspaper articles, junk scraped from the web, etc.


All this material allows the bots to produce superficially coherent answers to questions by reproducing prose patterns and sometimes repeating facts they dredge up from their hoards of scraped material.
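The dependence on ingested human writing can be caricatured in a few lines of code. The sketch below is a deliberately toy "language model" (a word-to-word tally, not a neural network, and the tiny corpus is invented for illustration), but it shows the same basic move the column describes: the program creates nothing itself; it can only recombine patterns from the human-written text it was fed.

```python
# Toy sketch of "training" on human text: tally which word follows which,
# then regurgitate those patterns. Real chatbots use neural networks, but
# the reliance on a corpus of human-produced writing is the same.
import random
from collections import defaultdict

# Hypothetical stand-in for the books and articles a real bot ingests.
corpus = ("the court will decide whether the use is fair "
          "the court will weigh whether the use harms the market").split()

# "Training": record every observed word-to-next-word transition.
model = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    model[prev].append(nxt)

def generate(start, length, seed=0):
    """Emit superficially coherent text by replaying learned transitions."""
    rng = random.Random(seed)
    words = [start]
    for _ in range(length - 1):
        followers = model.get(words[-1])
        if not followers:   # dead end: the corpus never continued this word
            break
        words.append(rng.choice(followers))
    return " ".join(words)
```

Every word the generator can ever emit comes straight from the corpus; strip the human text away and the model has nothing to say.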

The AI firms have said in their defense that they’re applying the “fair use” exception to copyright law. Fair use typically allows the use of copyrighted material without permission if it’s for a purpose “such as criticism, comment, news reporting, teaching, scholarship, and research,” according to the U.S. Copyright Office. But the definition is so inchoate that decisions about whether something rates as fair use are typically made by judges on a case-by-case basis.

OpenAI’s accusation about DeepSeek’s behavior falls into a somewhat different category. It involves a process common in the AI world known as “distillation.” That means using the output of one AI bot to train another AI bot, rather than training the second bot on the full global database used by the first.
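A minimal sketch can make the contrast concrete. In the toy example below (the models and data are hypothetical; real distillation involves large neural networks and soft probability targets), the "student" never touches the teacher's original training corpus at all: it simply queries the teacher like an API, records the answers, and fits itself to them.

```python
# Toy illustration of "distillation": the student is trained on the
# teacher's *outputs*, not on the data the teacher was trained on.
import math
import random

def teacher(x):
    # Hypothetical stand-in for a large model built from a huge corpus:
    # returns a probability that input x belongs to class 1.
    return 1 / (1 + math.exp(-(2.0 * x - 1.0)))

def train_student(samples, lr=0.1, epochs=500):
    # Fit a small logistic model to the teacher's soft outputs.
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, target in samples:
            pred = 1 / (1 + math.exp(-(w * x + b)))
            err = pred - target   # cross-entropy gradient w.r.t. the logit
            w -= lr * err * x
            b -= lr * err
    return w, b

random.seed(0)
# Query the teacher and record its answers; this recorded-output step is
# what terms of service restrict when it feeds a competing product.
queries = [random.uniform(-3, 3) for _ in range(200)]
distilled = [(x, teacher(x)) for x in queries]
w, b = train_student(distilled)

def student(x):
    return 1 / (1 + math.exp(-(w * x + b)))
```

After training, the student closely mimics the teacher's responses, despite never having seen the "full global database" the teacher learned from.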


At some level, “OpenAI may well have done analogous things to YouTube, New York Times, and countless artists and writers” that it now charges DeepSeek with, observes AI critic Gary Marcus. He adds, “Karma is a bitch.”

An OpenAI spokesperson told me by email that it’s aware that Chinese firms “are actively working to use methods, including what’s known as distillation, to try to replicate advanced U.S. AI models. We are aware of and reviewing indications that DeepSeek may have inappropriately distilled our models.”

The firm didn’t respond to my request for comment on whether it is accusing DeepSeek of doing what OpenAI has been accused of. Microsoft, a major partner of and investor in OpenAI, told me by email it “has nothing to share here.” DeepSeek hasn’t responded to my request for comment.

AI firms allow, even encourage, developers to distill material from their tools; indeed, they treat the process as a revenue-producing service. But they draw the line at using distillation to produce or improve competing products — such as R1, a potential competitor to OpenAI’s ChatGPT models. Doing so violates OpenAI’s terms of service, which is why the firm accuses DeepSeek of “inappropriate” distillation.


That brings us back to the wider landscape of AI development.

One reason that DeepSeek’s revelation caused such an earthquake is that the business model of U.S. AI developers has been based on absorbing almost unlimited resources in pursuit of a nirvana to come — billions in capital from venture investors and (in OpenAI’s case) Microsoft, gigawatts of energy, ever more potent and expensive graphics processing units from Nvidia.

“America’s most powerful tech companies sat back and built bigger, messier models powered by sprawling data centers and billions of dollars of NVIDIA GPUs, a bacchanalia of spending that strains our energy grid and depletes our water reserves,” writes AI critic Ed Zitron, “without, it appears, much consideration of whether an alternative was possible.”



They had no incentive to seek out a cheaper or more efficient path to development because the money and energy and chips were so abundant. DeepSeek, however, seemed to show that the same goals could be reached at less than 1/50th the cost.

I say “seemed,” because DeepSeek’s claim to have developed its AI tool for less than $5.6 million is misleading at best. That’s the figure DeepSeek has given for training its model, the step that comes after years of research and development. Nor is DeepSeek a shoestring operation: It’s a spinoff from the Chinese hedge fund High-Flyer, whose investment in the project is unknown.

DeepSeek also says it developed its model using Nvidia chips that have been superseded by more advanced and costly versions. But that’s because the Biden administration barred the export of the more advanced chips to China. That constraint may have forced the Chinese developers to find effective workarounds, but they evidently did so.

The revolution in technology and business thinking launched by DeepSeek’s unveiling of its AI tool may actually work to the benefit of the U.S. industry. American firms may come under pressure from their investors to do more with less, rather than trying to do more with more.

The resultant reduction in costs for AI applications may make them more appealing to business customers. That’s important, because thus far almost no one has found a use for AI bots or tools that couldn’t be accomplished without them, and more cheaply.

It’s proper to note, furthermore, that DeepSeek hasn’t solved the fundamental obstacle to a wide industry rollout of AI tools faced by OpenAI and other development firms — the tools’ tendency to make mistakes, known in the field as “hallucinations,” at a rate that destroys their reliability.


The DeepSeek shake-up of recent days will reverberate for a long time. It points to how much money has been wasted in the AI field up to now, and the shakiness of the myth that hundreds of billions more in capital is all that’s needed to solve technical problems that may not be solvable. The financial reckoning seen on Jan. 27 was well overdue. As for whether AI is actually all it’s cracked up to be, according to its promoters — that reckoning is still to come.
