AI is Stealing my Videos
Description
So there's a story that came out today, and the headline was that Apple trained AI models on YouTube content without consent, including MKBHD videos. There's no statement from Apple on this yet, but here's my take. The real story is that Apple and a whole bunch of other tech companies are training their AI models using data that they buy from third-party data-scraping companies, some of which get their data in slightly illegal ways. One of them pulled tons of transcript data from thousands of YouTube videos, including MrBeast videos, my videos, and many more. So that company is a problem, and that's going to be an ongoing, evolving problem for many years. Apple can technically say they're not at fault for this, but they're going to keep getting data from companies like this, so that's also something they need to vet. But the double whammy is that I actually pay for more accurate manual transcriptions on every video that we put out, so that people who are hearing impaired aren't stuck with auto captions, which suck. That means the stolen transcriptions specifically are paid content that's being stolen more than once.
The short discusses a developing controversy over AI models being trained on YouTube content without the creators' consent. Noting that Apple has not yet commented, the speaker argues that the real story is how Apple and other tech companies buy training data from third-party data-scraping companies, some of which obtain it through questionable or possibly illegal means. One such company allegedly pulled transcripts from thousands of YouTube videos, including videos by MrBeast and by the speaker himself. The speaker contends that although Apple can technically claim it is not at fault, it will keep sourcing data from providers like this, so vetting them is an ongoing responsibility, and the scraping itself is a problem likely to persist for years. He then adds a personal twist: he pays for accurate manual transcriptions of every video so that hearing-impaired viewers are not left with poor-quality auto-generated captions. As a result, the scraped transcripts of his videos are paid content, which he says is being stolen more than once. The short frames the issue as a conflict between AI data practices, creators' rights, and the accessibility work that creators pay for, and it points to the need for AI companies to vet where their training data comes from.
Topics · technology · digital rights · ai ethics · video accessibility · content creation
Questions answered
- What is the main concern raised in the short about AI training data?
- The main concern is that AI models are being trained on YouTube content without consent, using data bought from third-party scraping companies that may obtain it illegally, including transcripts from many creators' videos.
- What action does the creator take to support accessibility for hearing-impaired viewers?
- The creator states that they pay for accurate manual transcriptions of every video so that hearing-impaired viewers have reliable captions instead of poor-quality auto-generated ones.