ChatGPT can now see, meaning it can now understand what is happening in any image you give it—so we took it for a test drive.
We’re already blown away by ChatGPT’s vision capabilities, even at this early stage. There are transformative use cases in this tool for marketers—some obvious, some not.
On Episode 67 of The Marketing AI Show, Marketing AI Institute founder and CEO Paul Roetzer and I talked about Marketing AI Institute’s initial experiments with these new ChatGPT features.
This isn’t just ChatGPT identifying what’s in an image. In test after test, we found it capable of working logically through prompts to make educated guesses and produce smart analyses.
“It obviously has way more advanced abilities to reason, follow a chain of thought, and use a step-by-step process to do things,” says Roetzer.
ChatGPT still has plenty of flaws and gets many things wrong.
“But, you can’t use this and not have your head swimming,” says Roetzer.
Already, we’ve been highly impressed with what this tool can do despite its limitations. “And this is the least capable form of this we’re ever going to see,” Roetzer adds.
It was able to accurately diagnose a problem with a contact record in HubSpot based on a simple screenshot of the record’s history.
We used it to quickly identify typos in slides and visuals.
And it analyzed marketing data effectively just from a screenshot of a dashboard.
In each of these tests and others, it was able to produce competent results in a fraction of the time it would take a human marketer.
Online tests we’ve seen have the tool analyzing flowcharts to create strategies and turning wireframes of webpages into fully functional code.
ChatGPT with vision capabilities is just one of many systems that are or will be multimodal. We’re about to see a proliferation of general-purpose AI systems that can read, write, speak, and see.
These systems look like they’ll have broadly intelligent capabilities across many different verticals.
“You can just start to imagine all the applications,” says Roetzer.
It raises the question: What happens when a handful of generally useful systems can do many different things really, really well?
What does that mean for all the vertical-focused software we use and invest in?
It’s possible we won’t need many, or even any, vertical-specific solutions in the future.
Multimodal systems will change things very, very quickly—and this creates a huge amount of uncertainty when it comes to learning, buying, and adopting technology.
You must act with urgency to develop comprehension and competency across your team with readily available multimodal AI systems.
“You have to understand what the stuff is capable of. And then the competency comes from experimentation,” says Roetzer.