For two years, the International Confederation of Music Publishers (ICMP) has been compiling data on the theft of copyrighted songs, as huge tech companies take intellectual property to train their generative AI programs. ICMP recently shared its findings with Billboard. The results are staggering, worrying, and far worse than they first appeared.

There’s been outrage for a long while about the way tech companies like Google, Microsoft, Meta, OpenAI, and X are scraping art, music, and writing to train their AI models. We’ve seen it in the way ChatGPT can be tasked with writing a short story in the style of Stephen King, for example. Or a song in the style of Nick Cave. The results are shoddy and embarrassing, but the more pressing issue has been the blatant theft of intellectual property.

ICMP has compiled substantial evidence over the past two years proving that these companies have stolen songs from big-name artists, among them The Beatles, Mariah Carey, The Weeknd, Beyoncé, Ed Sheeran, and Bob Dylan.

According to ICMP, the evidence is “comprehensive and clear” that companies are using stolen intellectual property to train their AI programs on a “global and highly extensive scale.” The use of this music has been unlicensed and for-profit. These companies are scraping millions of copyrighted songs without the barest hint of acknowledgement or guilt.

Generative AI companies have been stealing art to train their programs on an unimaginable scale

As Billboard reported, ICMP shared details from the investigation and included specific examples. Evidence came from public registries, leaked data, open-source AI training datasets, research papers, and independent studies by AI experts.

For example, ICMP found that Meta’s Llama 3 large language model pulled from copyrighted songs by popular artists. Research cited The Weeknd, Lorde, Bruno Mars, Childish Gambino, Imagine Dragons, Alicia Keys, Ed Sheeran, and Kanye West as examples. Several music publishers sued the AI corporation Anthropic for training its LLM, Claude, on lyrics from specific copyright-protected songs. The suit noted Don McLean’s “American Pie,” Lynyrd Skynyrd’s “Sweet Home Alabama,” and Beyoncé’s “Halo” as examples.

Furthermore, OpenAI admitted in 2020 to training its Jukebox music-making program on more than 1.2 million songs. Under closer scrutiny, the company went on to reveal which artists it had taken from.

The list goes on. Google’s Gemini, AudioSet, and MusicLM, Microsoft’s Copilot, and the AI research company Runway have all been implicated in the practice. ICMP reported that Grok, X’s AI chatbot, is the worst offender.

“This is the largest IP theft in human history. That’s not hyperbole. We are seeing tens of millions of works being infringed daily,” ICMP director general John Phelan told Billboard. “Within any one model training data set, you’re often talking about tens of millions of musical works often gained from individual YouTube, Spotify and GitHub URLs, which are being collated in direct breach of the rights of music publishers and their songwriter partners.” 

The issue is more than theft—it’s a disruption of revenue and livelihood

These tech companies are directly violating copyright and contract law in order to train their AI models, programs that have been pushed into our daily lives without consent. They’ve taken over apps, social media, the internet as a whole, even our cellular devices. They’re disrupting job security, the environment, and what it means to be human. Where does it end?

“This is not a victimless crime,” said a spokesperson for Concord Music Publishing. “These AI tools are being used in ways that will displace lyric writers and undermine existing royalty streams. Although Large Language Model (LLM) lyrics may never have the creativity of a human, LLMs trained on human lyrics coupled with their speed, scale and economy, will undermine the incentive to create new art, which is the core mission of copyright law.”

What this investigation proves is that AI tech companies believe they are the future. That future, however, seems to lack a moral compass. Ethically, creatively, and intellectually, there doesn’t seem to be any benefit to further AI development. Still, ChatGPT can write you a song in the style of Taylor Swift in under a minute, so why worry about the ethical implications? All of the reward with none of the hard work of using your brain or your own creativity. And isn’t that a comforting thought.

Photo by Andrey Rudakov/Bloomberg via Getty Images