Audio
Buy and sell audio data — call center recordings, voicemail, podcast raw audio, courtroom proceedings, emergency dispatch, and environmental sound. Speech AI and voice recognition companies need diverse real-world audio datasets.
Available Now · 4 listings
Spanish-English Bilingual Support Transcripts — 480K Conversations, Code-Switching Annotated
Transcribed customer support conversations where agents and callers switch between Spanish and English. Each segment is language-tagged at the sentence level with code-switching points annotated. Sourced from insurance and healthcare support lines. Critical for training multilingual NLP models and bilingual virtual agents.
Customer Service Call Recordings — 1.2M Hours, Sentiment-Labeled, Multi-Industry
Inbound and outbound customer service calls from telecom, utilities, and financial services contact centers. Each call is transcribed, sentiment-scored at utterance level, and tagged with call disposition codes. Includes hold times, transfer events, and CSAT survey responses where available. Built for conversational AI training, agent coaching models, and IVR optimization.
Emergency Room Triage Call Recordings — 340K Hours, De-identified, HIPAA Compliant
Audio recordings from 14 Level I trauma center emergency departments spanning 2018-2026. Each call is transcribed, speaker-diarized, and tagged with chief complaint codes (ICD-10). Ideal for training medical triage AI, clinical NLP models, and patient routing systems.
Podcast Transcription Corpus — 890K Episodes, Speaker-Diarized, Topic-Classified
Full transcriptions of 890K English-language podcast episodes across 14 genres (true crime, business, technology, health, comedy, politics, etc.). Each episode is speaker-diarized, topic-modeled, and sentiment-scored at the segment level. Powers podcast search engines, content recommendation systems, and long-form audio AI.