Public Ledger

Reported AI Data Deals.

A running ledger of publicly reported AI training data licensing deals. Every entry links to a primary news source. No projections, no rumors, no insider data — just the deals you can read about.

Reported Deals: 27

Years Covered: 2023–2026

Source Standard: Reuters, Bloomberg, WSJ, Axios, official press releases

Inclusion rule

Every deal on this page has been confirmed by at least one reputable outlet with a working URL. If the dollar amount isn't publicly disclosed, we write "terms undisclosed" rather than guess.

What we exclude

Product integrations (e.g., a company using GPT-4 in its app), in-house AI builds, lawsuits, and any deal we can't link to a primary source. Projections and rumors don't belong here.
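As a rough illustration, the inclusion and exclusion rules above can be sketched as a small check. This is a minimal sketch, not FileYield's actual tooling: the `Entry` fields, the `EXCLUDED_KINDS` labels, and both function names are hypothetical, and the check only tests that a source link is present, not that the URL still works.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical category labels for the exclusions listed above.
EXCLUDED_KINDS = {"product_integration", "in_house_build", "lawsuit"}


@dataclass
class Entry:
    seller: str
    buyer: str
    kind: str                     # e.g. "licensing_deal" (illustrative label)
    source_url: Optional[str]     # link to the primary source, if there is one
    amount: Optional[str] = None  # publicly disclosed figure, if any
    rumor: bool = False           # True for projections or rumors


def is_includable(entry: Entry) -> bool:
    """Return True only for sourced deals outside the excluded categories."""
    if entry.kind in EXCLUDED_KINDS or entry.rumor:
        return False
    # Presence check only; verifying that the link resolves is out of scope.
    return bool(entry.source_url)


def terms(entry: Entry) -> str:
    """Show the disclosed amount, or 'terms undisclosed' rather than a guess."""
    return entry.amount if entry.amount else "terms undisclosed"
```

An entry with a source link and no disclosed figure passes the check and displays as "terms undisclosed"; an entry without a source link is rejected regardless of its other fields.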

Why we publish it

These are the deals that hit the news. The market FileYield serves is mostly the deals that don't — private introductions, NDAs, custom licensing. The visible market is the tip of the iceberg.

2026

9 deals

MAR 26

Reach plc → Amazon

UK news (Mirror, Express, regional titles)

Usage-based compensation, terms undisclosed

UK publisher Reach signed a usage-based deal with Amazon allowing its content to power the Amazon Nova AI model and Alexa assistant.

Source: Press Gazette

MAR 26

News Corp → Meta

News (WSJ, NY Post, Times of London, The Sun)

Up to $50 million per year for at least three years (~$150M+ total)

Multi-year deal letting Meta AI retrieve current articles and train on archives across News Corp's US and UK titles.

Source: Editor & Publisher

FEB 26

AP, Vox, Hearst, Condé Nast, People Inc., USA Today, Business Insider → Microsoft

Publisher Content Marketplace launch

Usage-metered marketplace, terms undisclosed

Microsoft launched its Publisher Content Marketplace, co-designed with seven major US publishers on the supply side, with Yahoo as the first demand partner.

Source: Search Engine Land

FEB 26

Financial Times → Google

Financial journalism

Cash-for-content deal, terms undisclosed

The FT signed a cash-for-content licensing deal with Google to feed FT journalism into Google's AI pilot programs. FT CEO Jon Slade announced the deal at a London summit.

Source: Press Gazette

JAN 26

Wikimedia Foundation → Amazon

Wikipedia content / structured data

Paid Wikimedia Enterprise license, terms undisclosed

Amazon licenses high-throughput Wikimedia Enterprise access for AI model training and grounding.

Source: TechCrunch

JAN 26

Wikimedia Foundation → Meta

Wikipedia content / structured data

Paid Wikimedia Enterprise license, terms undisclosed

Meta licenses Wikimedia Enterprise for grounding Llama and Meta AI products with verified knowledge.

Source: TechCrunch

JAN 26

Wikimedia Foundation → Microsoft

Wikipedia content / structured data

Paid Wikimedia Enterprise license, terms undisclosed

Microsoft licenses Wikimedia Enterprise high-volume access for Copilot and Azure AI services.

Source: TechCrunch

JAN 26

Wikimedia Foundation → Mistral AI

Wikipedia content / structured data

Paid Wikimedia Enterprise license, terms undisclosed

Mistral AI licenses Wikimedia Enterprise for European-grounded model training.

Source: TechCrunch

JAN 26

Wikimedia Foundation → Perplexity

Wikipedia content / structured data

Paid Wikimedia Enterprise license, terms undisclosed

Perplexity licenses Wikimedia Enterprise for high-volume reference grounding in its answer engine.

Source: TechCrunch

2024

12 deals

SEP 24

GEDI Group → OpenAI

Italian news (La Repubblica, La Stampa)

Three-year deal, terms undisclosed

OpenAI's first major Italian publisher partnership. The Italian data protection authority (DPA) has since flagged personal-data concerns.

Source: OpenAI

AUG 24

Condé Nast → OpenAI

Editorial (Vogue, New Yorker, Wired, GQ, Vanity Fair, AD)

Multi-year, terms undisclosed

Covers Condé Nast's full title roster including Vogue, The New Yorker, Wired, GQ, Vanity Fair, Bon Appétit, and Architectural Digest.

Source: Bloomberg

JUN 24

TIME → OpenAI

Magazine archive

Multi-year, terms undisclosed (covers 101 years of TIME)

OpenAI gets training rights to TIME's full century-plus archive; ChatGPT surfaces TIME content with citations.

Source: TIME

MAY 24

Dotdash Meredith → OpenAI

Reference / lifestyle (People, Better Homes, Allrecipes)

Reported floor of at least $16M per year, plus a variable component

IAC subsidiary Dotdash Meredith licensed its full publishing portfolio to OpenAI — the floor was reported by Adweek.

Source: Adweek

MAY 24

Vox Media → OpenAI

Digital media (Vox, The Verge, Eater, NY Mag)

Multi-year, terms undisclosed

Covers Vox Media's full network, including The Verge, NY Magazine, Eater, The Cut, Vulture, and SB Nation.

Source: Axios

MAY 24

The Atlantic → OpenAI

Long-form journalism

Multi-year, terms undisclosed

Same announcement as Vox — content surfaces in ChatGPT with attribution.

Source: Bloomberg

MAY 24

News Corp → OpenAI

News (WSJ, NY Post, Times, Australian, Barron's)

Up to $250 million over five years

Largest publisher–AI deal of 2024. News Corp's own WSJ broke the figure.

Source: Variety

APR 24

Financial Times → OpenAI

Financial journalism

Multi-year, terms undisclosed

FT licensed its archive for ChatGPT training and search-style attribution.

Source: OpenAI

MAR 24

Le Monde → OpenAI

French news archive

Multi-year, terms undisclosed

OpenAI's first French-language news partnership.

Source: Bloomberg

MAR 24

Prisa Media → OpenAI

Spanish news (El País, AS, Cinco Días)

Multi-year, terms undisclosed

Same March 2024 announcement as Le Monde — covers El País, Cinco Días, AS, El HuffPost.

Source: Bloomberg

FEB 24

Reddit → Google

Conversational / forum data

$60 million per year

Google licensed Reddit's full post-and-comment corpus for AI training, signed shortly before Reddit's IPO.

Source: CBS News

FEB 24

Stack Overflow → Google

Code / Q&A via OverflowAPI

Terms undisclosed

Google Cloud became Stack Overflow's first OverflowAPI partner — Gemini for Google Cloud gets validated technical knowledge.

Source: Stack Overflow blog

Disclaimer

FileYield is not a party to any of the deals listed above. We publish this ledger as a public reference for understanding the AI training data market. All trademarks and company names belong to their respective owners. If you spot an error or have a deal we should add, email support@fileyield.com with the source link and we'll review it.

The market you don't see

Most data deals never hit the news. They happen quietly, under NDA, between people who know each other. FileYield exists to match data owners with AI buyers privately — no public marketplace, no leaked terms.
