A new investigative report just revealed the dark side of training AI tools like ChatGPT.
Time reports that OpenAI used outsourced Kenyan laborers earning less than $2 per day to make ChatGPT less toxic.
To do that, workers had to review and label large amounts of disturbing text to make sure ChatGPT didn’t use it in its responses. That included violent, sexist, and racist content that was, in some cases, extremely graphic.
Some workers reported serious mental trauma resulting from the work, which was eventually suspended by OpenAI and Sama, the outsourcing company involved.
In Episode 31 of the Marketing AI Show, Marketing AI Institute founder/CEO Paul Roetzer broke down why this topic is critical for business leaders to follow. Here’s what you need to know.
1. Sadly, this isn't a new phenomenon.
ChatGPT has everyone paying attention to AI. But few newcomers to the space know how AI models are trained.
The practice detailed in the Time report, while disturbing, isn’t new. For over a decade, the AI systems used to moderate content on social media sites have been trained in a similar fashion.
This doesn’t mean OpenAI bears no responsibility. It does mean that any company training language models, image models, or content moderation systems is doing something similar to teach its models what toxic content looks like.
“The gray area is that this is how all these tools and platforms are trained,” says Roetzer.
2. And unfortunately, humans have to train AI on toxic content.
AI doesn’t recognize toxic content on its own.
“The only way for AI to automatically detect and remove that stuff before it spreads is for it to learn it's bad,” says Roetzer. It learns from humans who must identify and label toxic content over and over again, until the AI system learns enough to identify the content on its own.
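To make that human-in-the-loop idea concrete, here’s a minimal, hypothetical sketch of how human-labeled examples become an automated toxicity filter. It uses scikit-learn with a tiny made-up dataset and a simple baseline classifier; these are illustrative assumptions, not OpenAI’s actual pipeline, which relies on vastly larger labeled datasets and neural models.

```python
# A minimal sketch of supervised toxicity detection: humans label examples,
# and a classifier generalizes from those labels. Illustrative only; real
# systems use far larger datasets and more sophisticated models.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Human-labeled training data (1 = toxic, 0 = benign). In production, this
# is the painstaking, repetitive labeling work the Time report describes.
texts = [
    "I will hurt you",           # labeled toxic by a human reviewer
    "You people are worthless",  # labeled toxic
    "Have a great day",          # labeled benign
    "Thanks for your help",      # labeled benign
]
labels = [1, 1, 0, 0]

# TF-IDF text features plus logistic regression: a classic baseline.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# Once trained, the model can flag new content without a human in the loop.
print(model.predict_proba(["You are worthless"])[:, 1])  # probability toxic
```

The key point the sketch illustrates: the model only knows what “toxic” means because humans labeled examples first. Every automated filter starts with that manual work.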
3. You'll increasingly need to ask hard questions about the AI technology you buy.
Organizations are now moving quickly to integrate AI into all aspects of their businesses.
As you vet and buy more AI technology, you’ll increasingly run into scenarios where you need to understand where the training data came from and how the outputs were generated.
That’s not just because the model may have been trained on data labeled in problematic ways. (Though that matters.) It’s also because you need to know things like:
- Was the AI trained on data that is protected by copyright?
- Was the AI trained on data with inherent biases that call its outputs into question?
- Was the AI trained on complete data, or do gaps in the training data produce flawed outputs?
Here’s how to start figuring this out
It’s not easy to figure this out. But you can get ahead of AI-driven disruption, and do it fast, with our Piloting AI for Marketers course series: 17 on-demand courses designed as a step-by-step learning journey for marketers and business leaders to increase productivity and performance with artificial intelligence.
The course series contains 7+ hours of learning, dozens of AI use cases and vendors, a collection of templates, course quizzes, a final exam, and a Professional Certificate upon completion.
After taking Piloting AI for Marketers, you’ll:
- Understand how to advance your career and transform your business with AI.
- Have 100+ use cases for AI in marketing—and learn how to identify and prioritize your own use cases.
- Discover 70+ AI vendors across different marketing categories that you can begin piloting today.
Mike Kaput
As Chief Content Officer, Mike Kaput uses content marketing, marketing strategy, and marketing technology to grow and scale traffic, leads, and revenue for Marketing AI Institute. Mike is the co-author of Marketing Artificial Intelligence: AI, Marketing and the Future of Business (Matt Holt Books, 2022).