UPDATED: July 18, 2024, 4:44 p.m. EDT: Salesforce reached out to Mashable with a comment in response to Wired's report.
A new report, based on an investigation by Proof News and published on the outlet's website, claims that tech giants like Apple, Nvidia, Anthropic, and Salesforce have used data from "thousands of YouTube videos" to train AI. Wired reported that captions from 173,000 YouTube videos were scraped for the companies' AI models.
Called "YouTube Subtitles," the dataset contains video transcripts from educational channels such as Khan Academy, MIT, and Harvard, as well as from news outlets like The Wall Street Journal, NPR, and the BBC. Content from YouTube stars like PewDiePie, Marques Brownlee, and MrBeast was also found in the dataset.
We have yet to hear back from Anthropic after reaching out for comment, but Apple and Salesforce have issued responses to Wired's report.
Will Apple use this data for Apple Intelligence and other AI services?
The short answer is no, but here’s the longer answer for those who don’t identify with the “TLDR” crowd:
In an email to Mashable, Apple said its open-source language model, OpenELM, did indeed use the dataset, but not in the way some might think.
The OpenELM project is part of Apple's ongoing efforts to benefit the broader research community. In other words, according to Apple, the OpenELM model was created for research purposes only and does not power any of Apple's machine learning-based AI hardware or services, including Apple Intelligence.
For the uninitiated, Apple Intelligence is the company's new suite of AI features, which were revealed at WWDC 2024 (Apple's annual event where the company reveals what's coming to its software offerings, including iOS and iPadOS).
Apple Intelligence, for example, can help summarize text, whether it's an email or a text message, for faster interactions with friends, family, colleagues, and so on. It will also support more entertainment-oriented features, such as Genmoji, which generates new iOS emoji from a prompt. There's also Image Playground, which lets users create AI-generated images on the fly.
New Genmoji feature coming to iOS 18.
Credit: Apple
When it comes to consumer-facing AI tools, Apple stressed that it gives websites the option to opt out of having their content used for AI training. Apple said its generative models are built and refined using high-quality data, including licensed content from publishers and image providers, as well as publicly available data from the web.
To put it succinctly, Apple doesn't deny that its open-source language model, OpenELM, used the dataset, but wants to make it clear that the model does not power any of its AI services, including Apple Intelligence.
Salesforce claims academic use
In an email to Mashable, Salesforce also gave its side of the story:
“The Pile dataset mentioned in the research paper was used to train an AI model in 2021 for academic and research purposes,” a Salesforce representative said. “The dataset was publicly available and released under a permissive license.”
What does Nvidia have to say?
We also reached out to Nvidia for comment, but the company, known for integrating AI into much of its gaming hardware and many of its services, declined to issue a statement.
We’ll update this article if we hear anything from Anthropic.