The internet is not a free-for-all—we shouldn't let big tech companies wish copyright out of existence

Jacob Ridley

3 July 2024 at 3:30 pm·13-min read

Why copyright matters, over a crudely drawn countryside scene. Credit: Future

Jacob Ridley, senior hardware editor

This week: I just so happened to be listening to Mustafa Suleyman's book, The Coming Wave: AI, Power and the 21st Century's Greatest Dilemma, in which the DeepMind co-founder goes into his thoughts on AI and the "technological revolution" he suggests has already begun.

When a generative AI system creates an image or some text, it all begins with training. Without an understanding of how words are statistically related to one another, or without knowledge of what an image is showing, a generative AI cannot successfully recreate it. The image generated by an AI might be a new work in itself, a complete original, though it's influenced by real works—millions of them—owned by millions of people.

How AI companies, or the firms which create datasets used by AI systems, continue to collect data is a source of much contention—an uncomfortable truth hanging over AI's exponential growth. Many AI firms have quietly assumed a position of acting as though they're allowed to use data freely from the web—be it images, videos, or text. Without this justification, they'd be stuck having to actually pay for the content they're using, threatening said growth. Meanwhile, artists, content creators, journalists, bloggers, producers, novelists, coders, developers, musicians, and many more argue that's absolute hogwash.

This split is best exemplified by comments made during a CNBC interview at the Aspen Ideas Festival (via The Verge) by the CEO of Microsoft AI, Mustafa Suleyman.

Suleyman is at the centre of AI development today. Not only is he leading Microsoft's AI efforts, he co-founded DeepMind, which was later bought by Google, and drove Google's AI efforts, too. He's had a large part to play in how two of the largest tech firms on the planet deliver their AI systems. I've been listening to the audiobook of Suleyman's book, The Coming Wave, these past few weeks, as he's someone well informed and with a lot to say about how AI has impacted, and will impact, our daily lives.

So, I say this with the utmost respect to a pioneer in his field: I believe his idea of a "social contract" for the internet is complete nonsense.

Suleyman, when asked by CNBC's Andrew Ross Sorkin whether AI companies have "effectively stolen the world's IP", had this to say:

"It's a very fair argument. With respect to content that is already on the open web, the social contract of that content since the '90s has been that it is fair use. Anyone can copy it, recreate with it, reproduce with it. That has been freeware, if you like. That's been the understanding."

Except that isn't the understanding. At least not mine, anyways, and if you've been taking content freely from anywhere on the internet this whole time, I have some very bad news for you.

If we ignore the fact that freeware is already a thing and, no, not everything on the internet is freeware—just think of the ramifications for a moment if it were so, especially for Suleyman's own employer, Microsoft—there's further legalese to prevent a free-for-all online.

There's something called copyright, which here in the UK was enshrined into law through the Copyright, Designs and Patents Act 1988. As a journalist, I have to be very conscious of the right I have to use anything on the internet, otherwise I may (rightly) be forced to pay a very large sum of money to the copyright holder.

Let's not get too into the weeds with this (he says, not even halfway through a 2,000 word column), but generally copyright law covers "original literary, dramatic, musical, or artistic works." That includes all manner of text, too, not just novels or short stories, and protection usually lasts 70 years. The rights are initially assigned to the "first owner" or creator of that work.

Some argue the creations of generative AI are original works, and therefore qualify for automatic copyright. To whom you grant the automatic copyright is a tricky question, as when animals have taken photos of themselves (search 'monkey selfie') our very human laws don't quite know what to make of it. We actually ended up with a ruling in 2014 by the United States Copyright Office that states works by non-humans are not copyrightable (PDF). That's despite a human playing a pivotal role in setting up the entire thing, which could have implications for AI-generated art, not least because that same ruling applies similar constraints to works created by a computer.

A crudely drawn monkey on a green background with the words 'Hail Satan' in the upper corner.

Whether you own the copyright to the art you prompted through a generative AI system, even finessing those prompts to get it just right, is an ongoing debate. However, US courts currently rule against granting copyright in these instances, and have even barred award-winning artwork from copyright.

But this is a tangent. Let's focus back on the use of copyrighted works for training purposes because clearly copyright has something to do with the mass collection and use of images, videos, and text, without permission, for an AI system likely run by a private business for commercial gain.

Within UK law, the copyright owner (automatically the author or creator, or employer of said author or creator) gets to say who can use its images and how. It's easy to waive your rights to images—I might see you've posted an image of a fun PC mod and message to ask if I have your permission to use it on PC Gamer, for example. If you say yes, providing I give you sufficient attribution for your work, everyone is happy and life moves on.

If I don't ask your permission and subsequently take the image or "substantial" part of it (which some do, no doubt about that), upon finding out that I've encroached on your copyright, you could demand I remove the offending material, sue for damages, or even get an injunction banning me from publishing or repeating an offence again.

This has been the case since the act was introduced in the UK in 1988—which I'd add was before the internet was a big deal. Similar protections also exist around the world, including the US and EU.

So there's really no excuse for saying we've all been living in some kind of wild west where anything goes on the internet. It doesn't. AI companies just want that to be the case, and they are fighting to protect their own interests.

There are a few defences for taking copyrighted works without permission in UK law. These mostly come under something called fair dealing. Fair use in the US is a similar concept but different in practice and applicability; as a UK national, it's mostly fair dealing that covers my actions. There are a few versions of fair dealing: one covers reporting of current events, another covers review or criticism, and quotations and parody are also covered. Unless AI is actually a big joke, that last one won't offer much of a defence.

The PC Gamer logo crudely drawn.

Neither will the rest. They don't cover photographs, for one, which are proactively defended in the law. They also require a user to not take unfair commercial advantage of the copyright owner's works and to use only what's necessary for the defined purpose. They also frequently require sufficient acknowledgement. None of this is the done thing in generative AI.

The right of some publishers to not share their content is something that Suleyman tends to agree exists, and which has already been violated, as he explained to CNBC (which, by the way, I can quote thanks to fair dealing):

"There's a separate category where a website or a publisher or a news organisation had explicitly said do not scrape or crawl me for any other reason than indexing me so other people can find that content. That's a grey area and I think that's going to work its way through the courts."

"So far, some people have taken that information. I don't know who, who hasn't, but that's going to get litigated and I think that's rightly so."

Except that the one form of content that doesn't generally come under copyright law is actually news articles.

I'm frustrated by the moves from Google and Microsoft to use AI to summarise my articles into little regurgitated bites that threaten to destroy the business of the internet, but I wouldn't want to argue that's copyright infringement in court. It's known as "lifting" a story when you take key information from something published by another and republish it yourself. Providing you don't use the same words and layout—you don't take the piss, basically—it's legally fine to do under existing law.

Plenty of publishers will argue against AI systems on the finer points of what constitutes lifting and what's just taking without asking and without fair recompense (see the New York Times vs. OpenAI case). I'll leave that to the lawyers. My argument is that, legal or not, an AI summarising stories with no kickback for the people working to create those stories will ultimately do a lot more harm than good in the long run.

Artificial intelligence drawn crudely with the words 'ooo, art' at the bottom right.

Simply put, I don't understand the argument from Suleyman here. Maybe it's a degree of wishful thinking from someone inside the AI inner circle looking out, or maybe he's looking around the internet and seeing some sort of wild west without any rules? But that's not the case, even considering the common exceptions to copyright law we'll get to in a moment.

Copyright infringement happens all the time on the web, and it's a debasement of both our rights as creators to not have our stuff nicked and the value of the content itself. Does that mean we should just lie down, admit defeat, and let an AI system or dataset crawler rewrite the rules so that copyright need not apply to them? I don't think so.

There are some measures coming into place to try to defend copyright in a world obsessed with AI. The EU has introduced the Artificial Intelligence (AI) Act, which includes a transparency requirement for "publishing summaries of copyrighted data used in training" and rules on compliance with EU copyright law, much of which is similar to that of the UK.

Though the EU also includes some get-outs allowing for data mining of copyrighted works in some instances. One allows the use in research and by cultural heritage institutions, and the other means users can opt out of further use by other organisations, including for machine learning. How exactly one opts out is, uh, not entirely clear (PDF).

The UK has something similar in place, as an exception to the 1988 Act, which allows data mining for non-commercial use. This is generally not considered a viable defence for large AI firms with public, and commercial, generative AI systems. The UK Government had also planned another exception, following the sudden popularity of AI systems, though that has since fallen through. That's probably to the benefit of people in the UK, who are technically safe from data mining for commercial purposes, but not to the benefit of the AI firms hoping to scrape data from within the UK's borders.

The exact ways in which companies hope to circumvent these limitations or how these laws look in practice are matters that lawyers, civil servants and politicians will have to debate for years to come. Though generally I just want to make it clear that these arguments exist because of copyright law—not for a lack of it.

By acting as though these rules don't apply to them, and putting pressure on governments to make allowances for AI due to the significant amount of money AI promises to deliver, AI firms have largely gotten away with it to date. I'd hold they're mostly running on a strategy of "it's easier to ask for forgiveness than to ask for permission" and have been for a couple of years now. They might continue to get away with it, too. By the time we've got to grips with copyright claims and whether they even exist for AI, will it be possible to untrain AI systems already trained on datasets filled to the brim with copyrighted content? Oops, turns out we can't really do that very well.

"What a pity," the AI exec might say.

It's my take, as a person who creates for the internet and without any claims of being a copyright lawyer, that in the creation of any loopholes for the purposes of data mining we may end up with one rule for big AI firms and another for regular folk like you and me. The presumed benefits of AI-generated art trained on the hard graft of your own creative work would be deemed too valuable to human existence to be held back by petty copyright infringement. It could feel like that, or we could hold the AI companies, some worth billions of dollars, to account for the copyrighted content they're benefiting from.

If copyright owners don't manage to fend off AI, what will become of the internet, or "open web", as we know it? Will an artist want to publish anything online? Will social media platforms arise with the promise to be 'AI-proof'? Will the internet become more siloed as a result, split off into smaller communities off the beaten track and away from the prying eyes of Google, Microsoft, and crawlers sent out by dataset companies?

Because, after all, it's not just my words in an article that an AI might look fondly upon, or even someone's publicly published artwork, but perhaps your wedding photos or your smutty fanfiction. And what then of that AI-generated advert for something you don't agree with that looks like you, or sounds like you? What of your ability to get it removed or your likeness untrained? Now, that sounds a lot more like the pandemonium free-for-all that Suleyman believes has been happening this entire time.
