Deepgram
Enterprise speech AI company providing industry-leading speech-to-text and text-to-speech APIs. Valued at $1.3 billion after raising $130 million in Series C, Deepgram has processed over 50,000 years of audio and serves 200,000+ developers with the fastest, most accurate voice AI platform.
Overview
The Voice AI Infrastructure Layer
Deepgram is the enterprise speech AI company that powers voice experiences for over 200,000 developers and some of the world's largest enterprises. Their speech-to-text and text-to-speech APIs are among the fastest and most accurate available, processing real-time voice data at scale across dozens of languages and accents.
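As an illustration of how developers consume the platform, the sketch below builds a pre-recorded transcription request against Deepgram's public `/v1/listen` endpoint using only the Python standard library. The endpoint, `Token` auth header, and `model`/`language`/`punctuate` parameters follow Deepgram's public documentation, but the model name and the response shape in the trailing comment are assumptions that may change between releases.

```python
import json
import os
import urllib.parse
import urllib.request

# Endpoint per Deepgram's public API docs; model name is an assumption.
API_URL = "https://api.deepgram.com/v1/listen"

def build_transcription_request(audio_url: str, api_key: str,
                                model: str = "nova-2",
                                language: str = "en") -> urllib.request.Request:
    """Build a pre-recorded transcription request for a hosted audio file."""
    query = urllib.parse.urlencode({"model": model, "language": language,
                                    "punctuate": "true"})
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        f"{API_URL}?{query}",
        data=body,
        headers={"Authorization": f"Token {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )

req = build_transcription_request("https://example.com/sample.wav",
                                  os.environ.get("DEEPGRAM_API_KEY", "demo-key"))
print(req.full_url)
# Sending the request requires a valid API key, e.g.:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
#     # Typical response path (may vary by API version):
#     # result["results"]["channels"][0]["alternatives"][0]["transcript"]
```

The same request works for local files by sending raw audio bytes with an appropriate `Content-Type` instead of the JSON `url` body.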
The company has processed over 50,000 years of audio and transcribed over 1 trillion words — an astounding scale that reflects their position as the go-to infrastructure provider for voice AI. Usage has grown 3.3x annually over the past four years, demonstrating sustained and accelerating demand.
Deepgram raised $130 million in Series C funding at a $1.3 billion valuation in 2025, with plans to scale their voice AI infrastructure globally. Major partnerships with IBM (first voice partner for watsonx) and Amazon Connect (contact center integration) position Deepgram at the center of enterprise voice AI.
For data sellers with audio and speech recordings, Deepgram is one of the most important buyers in the market. Their model accuracy depends directly on the diversity and quality of their training audio, and they operate a Model Improvement Partnership Program specifically designed to source underrepresented speech data.
This processing scale, paired with 3.3x annual usage growth over four consecutive years, reflects both the sheer volume of voice data being generated in the enterprise and Deepgram's dominant position in processing it. The trend suggests demand for voice AI infrastructure is accelerating, not plateauing.
The IBM partnership — making Deepgram IBM's first voice partner for watsonx Orchestrate — is particularly significant. IBM's enterprise customer base spans virtually every industry and geography, and the integration gives Deepgram access to one of the most extensive enterprise distribution channels in technology. Similarly, the Amazon Connect partnership positions Deepgram at the center of the contact center AI transformation, a market worth tens of billions of dollars.
Data Strategy
How Deepgram Trains Its Models
Deepgram's data strategy combines proprietary audio collection, customer-contributed data, synthetic data generation, and external data licensing.
The company's proprietary approach pairs novel lossless audio compression with synthetic data generation that matches real-world conditions. This means Deepgram can create synthetic training audio that mimics the noise, codec artifacts, and acoustic conditions of a specific customer's environment.
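Deepgram's actual augmentation pipeline is proprietary, but the core idea of matching a target noise condition can be sketched generically: scale a noise recording so that mixing it into clean speech yields a chosen signal-to-noise ratio. A minimal pure-Python version, with a synthetic tone and white noise standing in for real recordings:

```python
import math
import random

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that adding it to `clean` produces the target SNR in dB."""
    p_clean = sum(s * s for s in clean) / len(clean)
    p_noise = sum(n * n for n in noise) / len(noise)
    # Solve p_clean / (scale^2 * p_noise) = 10^(snr_db / 10) for scale.
    scale = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(clean, noise)]

random.seed(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]  # 1 s tone
noise = [random.gauss(0.0, 1.0) for _ in range(16000)]                   # white noise
noisy = mix_at_snr(clean, noise, snr_db=10.0)
```

A production pipeline would layer codec simulation and room impulse responses on top of simple additive noise, but the SNR-matching step is the common foundation.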
Deepgram's Model Improvement Partnership Program is a formal initiative to source training data that addresses gaps in their models, particularly for underrepresented speaker groups. The program focuses on ensuring representation across age, sex, accents, dialects, and speaking styles. This creates specific, well-defined data needs that data sellers can fill.
Customer-contributed data is another important source. As enterprise customers process audio through Deepgram's APIs, the resulting transcriptions and corrections provide feedback that improves model accuracy over time. The IBM and Amazon Connect partnerships amplify this effect by routing high volumes of enterprise voice data through Deepgram's platform.
External data licensing fills the remaining gaps. Deepgram actively purchases speech recordings, podcast archives, broadcast audio, and conversational data from data providers. They pay particularly well for audio in underrepresented languages and accents.
Deepgram's approach to audio compression and synthetic data generation sets them apart from competitors. Their proprietary lossless compression preserves audio detail that other speech AI companies discard, yielding models that are more accurate on real-world audio. Their synthetic data pipeline can generate training audio matched to a specific customer's acoustic conditions: the noise patterns, codec artifacts, and room acoustics of a particular call center or office.
The Model Improvement Partnership Program formalizes Deepgram's commitment to sourcing training data that addresses representation gaps. The program specifically targets underrepresented speaker groups based on age, sex, accent, and dialect, ensuring that Deepgram's models work equitably across diverse populations. Data providers who can supply recordings from underrepresented groups receive priority consideration and premium pricing.
Deepgram's enterprise customer base generates continuous training data through normal API usage. As customers process audio through Deepgram's platform, the resulting transcriptions, corrections, and feedback create a data flywheel that improves model accuracy over time. The IBM and Amazon Connect partnerships amplify this flywheel by routing massive volumes of enterprise voice data through Deepgram's infrastructure.
What They Need
Deepgram's data needs.
These are the specific data types Deepgram is actively seeking. If you have any of these, FileYield can broker a deal.
Detailed Breakdown
What Deepgram Is Buying
Deepgram's data needs are highly specific to voice and audio AI, with emphasis on diversity and real-world conditions.
Accent-diverse speech recordings are Deepgram's highest priority. They need speech data representing hundreds of English accents (American regional, British, Australian, Indian, Nigerian, Singaporean, and many more) plus dozens of other languages. Each accent and language has specific gaps that Deepgram is willing to pay premium rates to fill.
Multi-speaker audio — recordings with multiple people talking, including overlapping speech, interruptions, and natural turn-taking — is essential for meeting transcription and contact center applications.
Phone call recordings with typical telephony audio quality (codec compression, background noise, cellular artifacts) are critical for contact center AI. Customer support calls, sales calls, and general business communications are all in demand.
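One concrete source of the telephony artifacts mentioned above is codec compression. As a hedged illustration (not Deepgram's pipeline), a G.711-style mu-law companding curve can be round-tripped at 8-bit resolution to simulate the distortion a phone codec introduces:

```python
import math

MU = 255.0  # mu-law parameter used by G.711 (North America / Japan)

def mulaw_encode(x):
    """Compress a sample in [-1, 1] with the mu-law companding curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mulaw_decode(y):
    """Invert the mu-law curve back to a linear sample."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def through_8bit_codec(x):
    """Round-trip one sample through mu-law companding quantized to 8 bits,
    approximating the distortion typical of telephony audio."""
    q = round(mulaw_encode(x) * 127) / 127  # quantize the companded value
    return mulaw_decode(q)
```

Applying `through_8bit_codec` sample-by-sample to clean audio gives a rough stand-in for phone-quality recordings when real call audio is unavailable.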
Medical dictation recordings help Deepgram serve their healthcare customers. Physician dictation, clinical notes, radiology reports, and other medical audio require specialized vocabulary and are highly valuable.
Far-field microphone recordings — audio captured from across a room rather than close to the speaker's mouth — are important for meeting rooms, smart speakers, and conference call applications.
Ambient noise recordings and audio from challenging acoustic environments (restaurants, factories, outdoor settings, vehicles) help Deepgram improve accuracy in real-world conditions where audio quality is imperfect.
Podcast and long-form audio is valuable for training models that need to handle extended speech with topic changes, speaker switches, and varying audio quality. Podcast archives with transcriptions provide text-audio pairs that are useful for both speech recognition and speech synthesis training.
Broadcast audio — news broadcasts, radio programs, live event coverage — provides diverse speaking styles and audio conditions. Different broadcast formats (studio, field reporting, live sports, talk radio) each present unique challenges for speech AI.
Meeting recordings with multiple participants, cross-talk, and varying audio equipment are essential for enterprise meeting AI products. The shift to hybrid work has created massive demand for accurate meeting transcription, and Deepgram needs diverse meeting audio to train models that handle the full range of meeting environments.
Legal proceedings audio — courtroom recordings, depositions, arbitration hearings — represents a specialized niche where transcription accuracy is critical. Legal audio often includes technical vocabulary, rapid exchanges between speakers, and formal language patterns that require specialized training data.
Deal History
Recent deals.
2026 | Undisclosed | First voice partner integration into IBM watsonx Orchestrate for enterprise AI
2025 | Undisclosed | Expanded partnership for speech-to-text and voice AI in contact centers
2025 | $130M | Funding at $1.3B valuation for scaling real-time voice AI infrastructure
2025 | 200K+ developers | Contact center and enterprise voice AI deployments across industries
Sell Through FileYield
Selling Audio Data to Deepgram Through FileYield
FileYield connects audio data owners with Deepgram's data acquisition team. Whether you have call center recordings, podcast archives, broadcast audio, or specialized speech collections, Deepgram is likely interested.
Submit a data appraisal through FileYield describing your audio data. Include details about speaker demographics (languages, accents, age ranges), recording conditions (phone, microphone type, environment), total hours, and any existing transcriptions. Our team provides a valuation within 48 hours.
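FileYield's actual submission format is not public, so the field names below are purely illustrative, but a machine-readable manifest along these lines is a convenient way to capture the details an appraisal asks for:

```python
import json

# Hypothetical manifest schema; every field name here is an illustrative
# assumption, not FileYield's or Deepgram's actual intake format.
manifest = {
    "collection_name": "Regional call-center archive",
    "total_hours": 1250,
    "languages": ["en-NG", "en-IN"],
    "accents": ["Nigerian English", "Indian English"],
    "speaker_age_ranges": ["18-30", "31-50"],
    "recording_conditions": {
        "channel": "telephony",
        "sample_rate_hz": 8000,
        "codec": "G.711 mu-law",
        "environment": "open-plan office",
    },
    "transcriptions_available": True,
    "pii_redacted": True,
}
print(json.dumps(manifest, indent=2))
```

Capturing this metadata up front tends to shorten the valuation cycle, since speaker demographics and recording conditions are the first things a buyer evaluates.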
Deepgram evaluates audio data for speaker diversity, recording quality, and relevance to their current model gaps. Audio in underrepresented languages or accents commands the highest prices.
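Sellers can pre-screen their own archives before submitting. The sketch below is an assumption about what a basic screening pass checks, not Deepgram's actual criteria: it reads a mono 16-bit PCM WAV and reports sample rate, duration, and the fraction of clipped samples, using only the standard library.

```python
import io
import math
import struct
import wave

def audio_summary(wav_file):
    """Report basic properties of a mono 16-bit PCM WAV:
    sample rate, duration, and fraction of clipped samples."""
    with wave.open(wav_file, "rb") as wf:
        rate, nframes = wf.getframerate(), wf.getnframes()
        samples = struct.unpack(f"<{nframes}h", wf.readframes(nframes))
    clipped = sum(1 for s in samples if abs(s) >= 32767) / len(samples)
    return {"sample_rate_hz": rate,
            "duration_s": nframes / rate,
            "clipped_fraction": clipped}

# Build a one-second 440 Hz test tone in memory to demonstrate.
buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(16000)
    tone = [int(20000 * math.sin(2 * math.pi * 440 * t / 16000))
            for t in range(16000)]
    wf.writeframes(struct.pack("<16000h", *tone))
buf.seek(0)
print(audio_summary(buf))
```

Running this across an archive quickly surfaces files that would drag down a valuation, such as heavily clipped or unexpectedly low-sample-rate recordings.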
Deals are structured as licensing agreements with clear terms around usage, privacy (all personally identifiable information must be handled appropriately), and payment. Deepgram's Model Improvement Partnership Program provides a formal framework for ongoing data supply relationships.
Deepgram is one of FileYield's most consistent buyers of audio data. Their continuous need for diverse speech recordings means they are almost always actively purchasing, and their evaluation process is fast — they can assess audio quality and relevance within days.
For audio data providers who can supply recordings in underrepresented languages or accents, Deepgram's Model Improvement Partnership Program offers a structured framework for ongoing data supply with premium pricing. FileYield helps data owners identify which specific audio categories Deepgram is currently prioritizing and negotiate pricing accordingly.
Deals typically range from five figures for small specialized collections to seven figures for large, diverse audio archives with professional transcriptions.
Company Profile
Deepgram at a Glance
Founded: 2015
Headquarters: San Francisco, California
CEO: Scott Stephenson
Employees: ~200
Valuation: $1.3 billion (Series C, 2025)
Total Funding: $208 million
Key Investors: Tiger Global, Madrona, NVIDIA, Y Combinator
Scale: 50,000+ years of audio processed, 1 trillion+ words transcribed
Developers: 200,000+
Growth: 3.3x annual usage growth
Key Products: Speech-to-Text API, Text-to-Speech API (Aura), Voice Agent API, Audio Intelligence
Partnerships: IBM watsonx (first voice partner), Amazon Connect, enterprise contact centers
Deepgram is the leading voice AI infrastructure company. Their constant need for diverse, high-quality speech data makes them one of the most consistent buyers of audio training data in the market.
Technical Approach: Deepgram's models are end-to-end deep learning systems trained from scratch on raw audio, rather than using the traditional approach of converting audio to spectrograms first. This architecture gives them accuracy advantages, particularly on challenging audio with noise, accents, or poor recording quality.
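The architectural contrast can be sketched in miniature. A classical front end reduces each audio frame to hand-designed spectral magnitudes before the model sees it, while an end-to-end front end applies learned filters (here, arbitrary 1-D convolution kernels) directly to raw samples, discarding nothing up front. This is an illustrative toy, not Deepgram's actual architecture:

```python
import math

def frame(signal, size=400, hop=160):
    """Split raw audio into overlapping frames (25 ms / 10 ms at 16 kHz)."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]

def spectrogram_features(signal, n_bins=8):
    """Classical front end: fixed DFT magnitudes per frame."""
    feats = []
    for fr in frame(signal):
        row = []
        for k in range(n_bins):
            re = sum(s * math.cos(2 * math.pi * k * n / len(fr))
                     for n, s in enumerate(fr))
            im = -sum(s * math.sin(2 * math.pi * k * n / len(fr))
                      for n, s in enumerate(fr))
            row.append(math.hypot(re, im))
        feats.append(row)
    return feats

def conv1d_features(signal, kernels):
    """End-to-end front end: learned 1-D filters applied to raw samples,
    so the network itself decides what to keep from the waveform."""
    return [[sum(k * s for k, s in zip(kern, fr)) for kern in kernels]
            for fr in frame(signal)]

signal = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(2000)]
spec = spectrogram_features(signal)
raw = conv1d_features(signal, kernels=[[1.0 / 400] * 400])
print(len(spec), len(spec[0]), len(raw), len(raw[0]))
```

In a real end-to-end model the kernel weights are trained jointly with the rest of the network, which is why such systems can adapt to noise and accent conditions that fixed spectrogram front ends handle poorly.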
Market Position: Deepgram is the leading independent voice AI infrastructure company, competing against division-level products from Google (Cloud Speech-to-Text), Amazon (Transcribe), and Microsoft (Azure Speech). Their focus on speed, accuracy, and enterprise deployment options has won them a loyal developer community and growing enterprise customer base.
Sell data to Deepgram through FileYield.
Deepgram is actively acquiring training data. If you own data that matches their needs, we can broker a private deal with clear licensing terms, legal compliance, and fair pricing. No public listings, no bidding wars.