Big Problems Discovered with AI Training

Written by Mike Kaput | Apr 25, 2023 12:37:58 PM

AI companies like OpenAI are coming under fire for how AI tools are trained…

Reddit, which is often scraped to train language models, just announced it would charge for API access, in order to stop AI companies from training models on Reddit data without compensation.

Twitter recently made a similar move. Elon Musk also publicly threatened to sue Microsoft for what he called “illegally using Twitter data” to train models.

Don’t be surprised if other companies follow suit…

An investigative report by the Washington Post recently found that large language models from Google and Meta were trained on data from major websites like Wikipedia, The New York Times, and Kickstarter.

The report raises concerns that models may be using data from certain sites improperly. In one example, the Post found models had been trained on data from an ebook piracy site, so their developers likely did not have permission to use that data. Not to mention, the copyright symbol appeared more than 200 million times in the data set the Post studied.

What concerns does this raise for marketing and business professionals using these tools?

I spoke to Marketing AI Institute co-founder and CEO Paul Roetzer on Episode 44 of the Marketing AI Show to learn more.

This will change the value proposition of putting data out there for free. In the past, you gave free access to your data in exchange for valuable benefits like more users or traffic. That equation may now change, as free access means you could be training a model that replaces the need for your site or brand. Expect to see companies with proprietary data either train their own AI models and products (like Quora has) or charge for access to the data via API. Some might do both, says Roetzer.
AI training will need to change. In Europe, AI companies appear to be struggling to train models in ways that don’t violate European law. And everywhere, AI companies appear to be training models on copyrighted material. They may get hit with massive penalties or legal action, or they may dodge regulation entirely. But one thing is clear no matter what happens: “The way they build these models is going to have to evolve,” says Roetzer.
Business leaders need to be prepared. “You have to address the fact that you may be using technology that was built illegally,” says Roetzer. That doesn’t mean you’ll get in trouble for using the technology. (It’s highly doubtful, but please check with a lawyer.) But moving forward, you will likely train custom versions of models, trained largely on compliant data that you legally own. Also be prepared to hear about legal cases hitting big AI companies, including some whose tools you might use.

Don’t get left behind…

You can get ahead of AI-driven disruption, and fast, with our Piloting AI for Marketers course series: 17 on-demand courses designed as a step-by-step learning journey for marketers and business leaders to increase productivity and performance with artificial intelligence.

The course series contains 7+ hours of learning, dozens of AI use cases and vendors, a collection of templates, course quizzes, a final exam, and a Professional Certificate upon completion.

After taking Piloting AI for Marketers, you’ll:

Understand how to advance your career and transform your business with AI.
Have 100+ use cases for AI in marketing—and learn how to identify and prioritize your own use cases.
Discover 70+ AI vendors across different marketing categories that you can begin piloting today.
