21 Million Copyrighted Songs Found in AI Training Datasets

Published Jun 19, 2026 By Matt White

TL;DR

An investigation by The Atlantic has uncovered four datasets containing more than 21 million copyrighted music recordings being shared among artificial intelligence developers. The largest collection, LAION-DISCO-12M, holds over 12 million tracks and was released in November 2024 by German nonprofit LAION. The findings come as AI music companies Suno and Udio face at least 12 lawsuits over alleged copyright infringement.

“Tech companies are using 54 OF MY SONGS to train and sell their generative AI models without compensation or permission from me”

Four Datasets Shared Thousands of Times

The collections include music from Taylor Swift, Bad Bunny, Billie Eilish, Nirvana, Pearl Jam and the Beatles, alongside large numbers of independent music producers. Two of the datasets hold more than 100,000 recordings each, while the remaining two are considerably larger, containing roughly 9 million and 12 million tracks, respectively.

The Atlantic reported that Google and Stability AI have used tracks from the Free Music Archive, one of the smaller collections. Each dataset has reportedly been downloaded several thousand times, though the AI industry's practice of keeping training data confidential means it is largely unknown which companies have relied on which collections.

LAION-DISCO-12M Released for Academic Use

The largest of the four is LAION-DISCO-12M, a collection of more than 12 million tracks released in November 2024 by LAION, a German nonprofit that assembles open datasets for AI research. The organization is also behind the dataset used to train Stability AI's Stable Diffusion image generator.

LAION describes the music collection as intended for academic use and explicitly warns against deploying it commercially or using it in its original form to create finished products. The dataset contains links to publicly available YouTube tracks and their associated metadata rather than the audio files themselves, and the organization says it does not distribute the music directly.

Producers Discover Unauthorized Use

Korean-American producer Kato On The Track, who has crafted hits for Tyga, Snoop Dogg and members of Wu-Tang Clan among other major artists, said 54 of his songs were swept up without consent or compensation.

The investigation arrives as AI music companies face mounting legal pressure. Suno and Udio, two of the most prominent and controversial AI music generators, now face at least 12 lawsuits, according to The Atlantic. The litigation began in June 2024, when the Recording Industry Association of America, representing Sony, Warner and Universal Music Group (UMG), sued both companies for what it described as mass copyright infringement.

Major Labels Pursue Different Strategies

UMG settled with Udio in October 2025, announcing a compensatory legal settlement alongside new recorded music and publishing licenses for a jointly developed AI platform expected to launch in 2026. Under that arrangement, Udio's service will operate within what Universal described as a walled garden with audio fingerprinting and content filtering in place.

Warner reached its own settlement and licensing deal with Udio in November 2025 and, within days, became the first major label to reach a settlement with Suno as well. The Warner-Suno agreement, which the companies described as a first-of-its-kind partnership, also included Suno's acquisition of Songkick, the concert-discovery platform, from Warner. Sony, by contrast, has remained in active litigation against both companies.

Matt White

EDM Source Editor

Reporting on the latest in the electronic dance music community with verified accuracy.