Authors Suing OpenAI Will Get to See Its Secret Training Data in Heavily Locked Down Room

Secure All

Authors suing OpenAI for copyright infringement are going to get unprecedented access to its training data — but only in a performatively locked down room.

As The Hollywood Reporter reveals, lawyers for authors Sarah Silverman, Ta-Nehisi Coates, and Paul Tremblay announced in a new court filing this week that they have reached an agreement with OpenAI that will allow the writers' representatives to view the company's training data trove.

This is, notably, the first time OpenAI has allowed an outside party to view its training data, and there are some major caveats.

As the report explains, the training data can only be viewed in a "secure" room at OpenAI's San Francisco headquarters on a locked down computer that does not have access to the internet or any shared networks. No outside electronics will be allowed into the room, and although the reps may be allowed to take notes, making copies of any portion of the data is strictly forbidden — a wild request, we should note, since it's all material created by the public in the first place.

Whoever views the datasets, which were not given a size estimate but are undeniably massive, will be required to provide identification, give their name in a visitor's log, and sign a non-disclosure agreement, the report adds.

Head Case

These stipulations, which sound more like the protocols for viewing state secrets than looking at a bunch of AI training data, are the latest forte in the protracted battle between these authors and OpenAI that may well end up serving as the legal precedent for AI's use of copyrighted material in the future.

The attorneys representing Coates, Silverman, and Tremblay hail from SF's Joseph Saveri Law Firm and are representing them in a similar case against Meta arguing the same thing: that the authors' copyrighted work was used without permission or compensation, resulting in ChatGPT spitting out answers that infringe upon their copyrights.

Twice this year, other claims from these lawsuits have been thrown out.

In February, the majority of the suit against OpenAI was tossed as US District Judge Araceli Martinez-Olguin said that the attorneys' claims of negligence, unjust enrichment, and vicarious copyright infringement were without merit.

Later this year, the same judge threw out the portion of the suit that alleged OpenAI engaged in unfair business practices by using the authors' copyrighted works — though as THR notes, the direct copyright infringement claims have remained intact throughout.

As of now, it's unclear when these locked down training data viewing sessions will take place, how long they'll have with the data, and how many people will go in at once.

All the same, we'll be watching this one closely — while not quite envying those who will have to dig through all that data to try to find a smoking gun.

More on OpenAI: OpenAI Execs Mass Quit as Company Removes Control From Non-Profit Board and Hands It to Sam Altman