Several tech giants, including Apple, Anthropic, Nvidia, and Salesforce, trained their artificial intelligence models on YouTube videos without the consent of platform owner Google and the authors of the videos, a Proof News investigative report found.
EleutherAI’s Role and the Pile Dataset
The alleged copyright infringer was EleutherAI, a non-profit organization that, by its own account, helps developers train AI models. Its target audience is small developers and scientists rather than tech giants. EleutherAI released the Pile dataset, a significant part of which is openly available to anyone on the Internet; all that is needed are the resources to download, store, and process it.
The Pile dataset included subtitles for 173,536 YouTube videos downloaded from more than 48,000 channels. Subtitle files are effectively transcripts of the videos, and YouTube's platform rules prohibit downloading its materials without permission. Nevertheless, Apple, Nvidia, and Salesforce, companies with market capitalizations ranging from hundreds of billions to trillions of dollars, have acknowledged in their own research papers that they used the Pile to train AI models. Apple, in particular, used the Pile to train its OpenELM models, introduced in April, and in June the company presented new AI features coming to the iPhone and Mac.
Legal Implications and Ongoing Updates
If copyright was indeed infringed in this incident, the original infringer was the non-profit organization EleutherAI, and the tech giants may have been good-faith users of a publicly available dataset. The case shows once again that the legal framework around AI training is still unsettled, concludes NIX Solutions. We’ll keep you updated on any developments in this ongoing situation.