The copyright lawsuits against OpenAI are piling up as the tech company seeks data to train its AI

Kenneth Niemeyer,Lakshmi Varanasi

30 June 2024 at 5:09 pm·3-min read

Publishers want compensation from OpenAI for using their works to train AI models.
The Center for Investigative Reporting filed a lawsuit against the company this week.
The New York Times and other outlets also have similar lawsuits against OpenAI.

OpenAI uses any and all publicly available data to train ChatGPT, including books and articles from the internet. Now, those who own them want to be paid for their work.

Training data is an essential part of creating the AI models that are taking over the tech world. Leading tech companies like Google, Meta, OpenAI, Anthropic, and Microsoft are all scrambling to find new sources of data. Meta at one point even considered buying Simon & Schuster, one of the world's biggest publishing houses.

Part of the problem is that publishers are increasingly accusing these companies of hoovering up copyrighted data. They'd like to be paid for their work. Meta and OpenAI have argued in comments to the US Copyright Office that putting copyrighted material on the internet makes it "publicly available" and thus under fair use.

But they'll still have to make that argument in court as the company faces lawsuits from several groups over the copyrighted material.

The Center for Investigative Reporting, a news nonprofit known sometimes by its acronym CIR and which merged with Mother Jones and Reveal earlier this year, sued OpenAI and Microsoft last week in federal court. The lawsuit accuses OpenAI of being "built on the exploitation of copyrighted works belonging to creators around the world, including CIR."

Lawyers for the CIR accused OpenAI and Microsoft of using copyrighted material from Mother Jones to train their GPT and Copilot AI models.

"OpenAI and Microsoft started vacuuming up our stories to make their product more powerful, but they never asked for permission or offered compensation, unlike other organizations that license our material," Monika Bauerlein, CEO of the Center for Investigative Reporting, said in an announcement about the lawsuit. "This free rider behavior is not only unfair, it is a violation of copyright."

The lawsuit says that "16,793 distinct URLs from Mother Jones's web domain" appeared in a published list of the top web domains present in the company's WebText training set.

In another class action lawsuit from the Author's Guild, two authors claimed that the company used information from their books to train ChatGPT. The New York Times also filed a similar lawsuit against the company in December 2023.

In May, court documents in the Author's Guild lawsuit revealed that OpenAI deleted two huge datasets used to train GPT-3. Lawyers for the guild said the two sets likely contained "more than 100,000 published books."

The two employees responsible for putting together the data no longer work for OpenAI, court documents say.

OpenAI has begun signing licensing agreements with news organizations to fairly use their work. The company has signed such agreements with The Associated Press, publishers of The Wall Street Journal and New York Post, The Atlantic, Prisa Media, Le Monde newspaper, Financial Times, and Business Insider parent Axel Springer.

But the scale of content required for these bots to continuously learn will require far more than a handful of licensing agreements.

One solution is synthetic data, which is artificially generated rather than collected from the real world, and can easily be generated by machine learning algorithms.

OpenAI has considered synthetic data as an option to train its models, but CEO Sam Altman has raised concerns about producing quality data.

"As long as you can get over the synthetic data event horizon, where the model is smart enough to make good synthetic data, everything will be fine," Altman said at a tech conference in May 2023. The company has also explored a process in which AI models work together — one AI system produces data, while another judges it.

OpenAI did not immediately return a request for comment from Business Insider.

Read the original article on Business Insider

PA Media: UK News
Trading is ‘near impossible’: The small business owners torn on how to vote
Small business owners explain which election issues are most important to them – and what could yet sway their decision.
Evening Standard
How to change your life, according to a top psychologist
Dissatisfied, bored, fed up with playing it safe? Dr Susan Kahn shares her guide on how to reinvent yourself, whatever your age
Malay Mail
Transport Ministry boosts staff at JB VEP centre as applications surge from Singapore vehicles, says minister
KUALA LUMPUR, July 2 — The Ministry of Transport is ramping up its VEP (Vehicle Entry Permit) operations to handle the s...
Yahoo Finance Video
Apple Intelligence is not enough to drive China sales: UBS
Apple's (AAPL) upcoming integration of Apple Intelligence software into its smartphone lineup may not be enough to boost sales in the Chinese market in 2025, according to UBS analyst David Vogt. He suggests that competition from Huawei's resurgence in China could slow Apple's regional growth prospects. Yahoo Finance tech reporter Dan Howley breaks down the details. For more expert insight and the latest market action, click here to watch this full episode of Market Domination. This post was written by Angel Smith
Deadline
Kanye West Sued By Former Employees He Allegedly Called “New Slaves” – Update
Kanye West is being sued by a group of former employees, who accuse him and former chief of staff Milo Yiannopoulos of “fostering a racist environment” and forcing employees to work “insanely long hours.” According to TMZ, the filing was by developers hired to work on West’s YZYVSN streaming service app, which he wanted to use to …
The Guardian
North Sea oil decline: ‘We can’t have a repeat of what happened to 80s miners’
Unlikely alliance of unions and climate groups call for ‘clear and funded’ transition plan for communities reliant on dwindling industry
Malay Mail
Thursday, not Tuesday: PM Anwar raps home minister for batik blunder in Parliament
KUALA LUMPUR, July 2 — Prime Minister Datuk Seri Anwar Ibrahim today admonished Home Minister Datuk Seri Saifuddin Nasut...
Malay Mail
Online users make fun of Wan Fayhsal, Radzi for final 5km effort during Syed Saddiq's 200-km run
KUALA LUMPUR, July 1 — Muar MP Syed Saddiq Abdul Rahman took a stand against government inaction in a rather unusual way...
The Telegraph
Prison officer accused of having sex with an inmate on video appears in court
A prison officer allegedly filmed having sex with an inmate in a cell has appeared in court charged with misconduct in public office.
INSIDER
Malaysia's $100 billion ghost town is good for at least one thing: filming documentaries and shows like Netflix's 'The Mole'
Embattled Forest City is a backdrop for a handful of reality shows and documentaries.
Malay Mail
Proton makes cheeky fun of Tesla’s Cybertruck in X50 social media ad (VIDEO)
KUALA LUMPUR, July 2 — Tesla’s Cybertruck has been gaining attention for more than its futuristic design. Various online...
Malay Mail
With 2.3 million Malaysian adults living with three NCDs, doctors warn of serious public health risks
KUALA LUMPUR, July 2 — The key findings from the newly-released National Health and Morbidity Survey 2023 by the Health...
The Telegraph
Macron ‘practically wiped out’, Marine Le Pen declares
Follow the latest updates in our French election liveblog
The Telegraph
John Terry brands BBC ‘disgrace’ after poking fun at Cristiano Ronaldo’s tears
Cristiano Ronaldo was left in floods of tears after missing a penalty in Portugal’s European Championship round of 16 win over Slovenia in Frankfurt on Monday night.
The Independent
A pair of siblings were stabbed and the sister’s dying words were that the killer was a man from Detroit. Thirty-four years later, he’s been arrested.
Pamela Sumpter, was raped and stabbed, and her brother John Sumpter was stabbed to death, at their Georgia home on July 15, 1990. Andrea Cavallier reports
Associated Press
Nepalese spiritual leader ‘Buddha Boy’ sentenced to 10 years in prison for sexual assault on minor
A controversial spiritual leader in Nepal known as “Buddha Boy” has been sentenced to a 10-year prison term Monday for sexually assaulting a minor, court officials said. Ram Bahadur Bamjan — believed by some to be the reincarnation of the founder of Buddhism — was also ordered to pay $3,700 in compensation to the victim by a judge at the Sarlahi District Court in southern Nepal. The man will have 70 days to appeal against the court order, court official Sadan Adhikari said.
People
Taylor Swift Gets Stuck on Platform After Stage Malfunction During Dublin Concert
One of the popstar's backup dancers, Jan Ravnik, came to her rescue during the Eras Tour mishap
People
Woman Was Killed by Man Her Mom Had Forced Her to Marry — and Now the Mother Is Convicted
Prosecutors in Australia said Sakina Muhammad Jan forced her daughter to marry a man she didn't want to wed in November 2019, weeks before he killed her
The Independent
Prostitute is fatally strangled then sexually assaulted inside Las Vegas casino by man who said he ‘snapped’
Jason Kendall, 35, is being held without bond in connection to the murder of Larissa Garcia at a Las Vegas casino
Malay Mail
Teh tarik satu, boss? Otters are surprise ‘regulars’ at a Perak mamak (VIDEO)
KUALA LUMPUR, July 2 — Visitors to a mamak eatery in Perak got a furry surprise when a pair of otters dashed into the es...

Latest stories