Public Ledger
Reported AI Data Deals.
A running ledger of publicly reported AI training data licensing deals. Every entry links to a primary news source. No projections, no rumors, no insider data — just the deals you can read about.
Reported Deals
27
Years Covered
2023–2026
Source Standard
Reputable news outlets (e.g., Reuters, Bloomberg, WSJ, Axios, TechCrunch, Press Gazette) and official press releases
Inclusion rule
Every deal on this page has been confirmed by at least one reputable outlet with a working URL. If the dollar amount isn't publicly disclosed, we write "terms undisclosed" rather than guess.
What we exclude
Product integrations (e.g., a company using GPT-4 in its app), in-house AI builds, lawsuits, and any deal we can't link to a primary source. Projections and rumors don't belong here.
Why we publish it
These are the deals that hit the news. FileYield mostly serves the deals that don't: private introductions, NDAs, custom licensing. The visible market is only the tip of the iceberg.
2026
9 deals
Date
Seller → Buyer
Reported Value
Description & Source
MAR 26
Reach plc→Amazon
UK news (Mirror, Express, regional titles)
Usage-based compensation, terms undisclosed
UK publisher Reach signed a usage-based deal with Amazon allowing its content to power Amazon's Nova AI models and the Alexa assistant.
Source: Press Gazette
MAR 26
News Corp→Meta
News (WSJ, NY Post, Times of London, The Sun)
Up to $50 million per year for at least three years (~$150M+ total)
Multi-year deal letting Meta AI retrieve current articles and train on archives across News Corp's US and UK titles.
Source: Editor & Publisher
FEB 26
AP, Vox, Hearst, Condé Nast, People Inc., USA Today, Business Insider→Microsoft
Publisher Content Marketplace launch
Usage-metered marketplace, terms undisclosed
Microsoft launched its Publisher Content Marketplace, co-designed with seven major US publishers on the supply side, with Yahoo as the first demand partner.
Source: Search Engine Land
FEB 26
Financial Times→Google
Financial journalism
Cash-for-content deal, terms undisclosed
FT CEO Jon Slade announced at a London summit that the FT had signed a cash-for-content licensing deal with Google to feed FT journalism into Google's AI pilot programs.
Source: Press Gazette
JAN 26
Wikimedia Foundation→Amazon
Wikipedia content / structured data
Paid Wikimedia Enterprise license, terms undisclosed
Amazon licenses high-throughput Wikimedia Enterprise access for AI model training and grounding.
Source: TechCrunch
JAN 26
Wikimedia Foundation→Meta
Wikipedia content / structured data
Paid Wikimedia Enterprise license, terms undisclosed
Meta licenses Wikimedia Enterprise for grounding Llama and Meta AI products with verified knowledge.
Source: TechCrunch
JAN 26
Wikimedia Foundation→Microsoft
Wikipedia content / structured data
Paid Wikimedia Enterprise license, terms undisclosed
Microsoft licenses Wikimedia Enterprise high-volume access for Copilot and Azure AI services.
Source: TechCrunch
JAN 26
Wikimedia Foundation→Mistral AI
Wikipedia content / structured data
Paid Wikimedia Enterprise license, terms undisclosed
Mistral AI licenses Wikimedia Enterprise for European-grounded model training.
Source: TechCrunch
JAN 26
Wikimedia Foundation→Perplexity
Wikipedia content / structured data
Paid Wikimedia Enterprise license, terms undisclosed
Perplexity licenses Wikimedia Enterprise for high-volume reference grounding in its answer engine.
Source: TechCrunch
2025
4 deals
Date
Seller → Buyer
Reported Value
Description & Source
OCT 25
People Inc. (IAC)→Meta
Lifestyle / reference
Multi-year content partnership, terms undisclosed
People Inc. (formerly Dotdash Meredith) extended its AI licensing strategy to Meta after its 2024 OpenAI deal.
Source: IAC investor relations
AUG 25
Scale AI→US Army
Defense AI / data services
$99 million Army R&D services contract
Follow-on to Scale's earlier $250M DoD blanket purchase agreement — covers AI R&D services for the Army.
Source: GovConWire
MAY 25
Chegg→2 undisclosed AI companies
Education Q&A
$4 million in Q1 2025 from licensing Q&A content
Chegg disclosed in its Q1 2025 earnings that two AI companies are licensing its Q&A content (buyers not named).
Source: EdTech Innovation Hub
APR 25
Freepik→2 undisclosed tech firms
Vector / stock imagery
Approximately $0.03 per image across a 200-million-image archive
Freepik's CEO told TechCrunch the company had sold licensed access to its image archive to two large tech firms (buyers not named).
Source: TechCrunch
2024
12 deals
Date
Seller → Buyer
Reported Value
Description & Source
SEP 24
GEDI Group→OpenAI
Italian news (La Repubblica, La Stampa)
Three-year deal, terms undisclosed
OpenAI's first major Italian publisher partnership. Italy's data protection authority has since flagged personal-data concerns.
Source: OpenAI
AUG 24
Condé Nast→OpenAI
Editorial (Vogue, New Yorker, Wired, GQ, Vanity Fair, AD)
Multi-year, terms undisclosed
Covers Condé Nast's full title roster including Vogue, The New Yorker, Wired, GQ, Vanity Fair, Bon Appétit, and Architectural Digest.
Source: Bloomberg
JUN 24
TIME→OpenAI
Magazine archive
Multi-year, terms undisclosed (covers 101 years of TIME)
OpenAI gets training rights to TIME's full century-plus archive; ChatGPT surfaces TIME content with citations.
Source: TIME
MAY 24
Dotdash Meredith→OpenAI
Reference / lifestyle (People, Better Homes, Allrecipes)
Reported floor of $16M+ per year, plus a variable component
IAC subsidiary Dotdash Meredith licensed its full publishing portfolio to OpenAI; Adweek reported the annual floor.
Source: Adweek
MAY 24
Vox Media→OpenAI
Digital media (Vox, The Verge, Eater, NY Mag)
Multi-year, terms undisclosed
Vox Media's full network including The Verge, NY Magazine, Eater, The Cut, Vulture, and SB Nation.
Source: Axios
MAY 24
The Atlantic→OpenAI
Long-form journalism
Multi-year, terms undisclosed
Same announcement as Vox — content surfaces in ChatGPT with attribution.
Source: Bloomberg
MAY 24
News Corp→OpenAI
News (WSJ, NY Post, The Times, The Australian, Barron's)
Up to $250 million over five years
Largest publisher–AI deal of 2024. News Corp's own WSJ broke the figure.
Source: Variety
APR 24
Financial Times→OpenAI
Financial journalism
Multi-year, terms undisclosed
FT licensed its archive for ChatGPT training and search-style attribution.
Source: OpenAI
MAR 24
Le Monde→OpenAI
French news archive
Multi-year, terms undisclosed
OpenAI's first French-language news partnership.
Source: Bloomberg
MAR 24
Prisa Media→OpenAI
Spanish news (El País, AS, Cinco Días)
Multi-year, terms undisclosed
Same March 2024 announcement as Le Monde — covers El País, Cinco Días, AS, El HuffPost.
Source: Bloomberg
FEB 24
Reddit→Google
Conversational / forum data
$60 million per year
Google licensed Reddit's full post-and-comment corpus for AI training in a deal signed shortly before Reddit's IPO.
Source: CBS News
FEB 24
Stack Overflow→Google
Code / Q&A via OverflowAPI
Terms undisclosed
Google Cloud became Stack Overflow's first OverflowAPI partner, giving Gemini for Google Cloud access to validated technical knowledge.
Source: Stack Overflow blog
2023
2 deals
Date
Seller → Buyer
Reported Value
Description & Source
DEC 23
Axel Springer→OpenAI
News (Politico, Business Insider, Bild, Welt)
Multi-year, terms undisclosed (reportedly tens of millions of euros per year)
First global news publisher to formally license content for ChatGPT — covers Politico, Business Insider, Bild, and Welt.
Source: OpenAI
JUL 23
Shutterstock→OpenAI
Images / video / music
Six-year expansion (terms undisclosed)
Shutterstock expanded its OpenAI partnership into a six-year deal covering image, video, and music libraries for model training.
Source: Shutterstock investor relations
Disclaimer
FileYield is not party to any of the deals listed above. We publish this ledger as a public reference for understanding the AI training data market. All trademarks and company names belong to their respective owners. If you spot an error or have a deal we should add, email support@fileyield.com with the source link and we'll review.
The market you don't see
Most data deals never hit the news. They happen quietly, under NDA, between people who know each other. FileYield exists to match data owners with AI buyers privately — no public marketplace, no leaked terms.