ChatGPT can now see, meaning it can now understand what is happening in any image you give it—so we took it for a test drive.
We’re already blown away by ChatGPT’s vision capabilities, even at this early stage. There are transformative use cases in this tool for marketers—some obvious, some not.
On Episode 67 of The Marketing AI Show, Marketing AI Institute founder and CEO Paul Roetzer and I talked about Marketing AI Institute’s initial experiments with these new ChatGPT features.
This isn’t just ChatGPT identifying what’s in an image. In test after test, we found it capable of working logically through prompts to make educated guesses and produce smart analyses.
“It obviously has way more advanced abilities to reason, follow a chain of thought, and use a step-by-step process to do things,” says Roetzer.
ChatGPT still has plenty of flaws and gets many things wrong.
“But, you can’t use this and not have your head swimming,” says Roetzer.
Already, we’ve been highly impressed with what this tool can do despite its limitations. “And this is the least capable form of this we’re ever going to see,” Roetzer adds.
It was able to accurately diagnose a problem with a contact record in HubSpot based on a simple screenshot of the record’s history.
We used it to quickly identify typos in slides and visuals.
And it analyzed marketing data effectively just from a screenshot of a dashboard.
In each of these tests and others, it was able to produce competent results in a fraction of the time it would take a human marketer.
Online tests we’ve seen have the tool analyzing flowcharts to create strategies and turning wireframes of webpages into fully functional code.
ChatGPT with vision capabilities is just one of many systems that are or will be multimodal. We’re about to see a proliferation of general-purpose AI systems that can read, write, speak, and see.
These systems look like they’ll have broadly intelligent capabilities across many different verticals.
“You can just start to imagine all the applications,” says Roetzer.
It raises the question: What happens when a handful of generally useful systems can do many different things really, really well?
What does that mean for all the vertical-focused software we use and invest in?
It’s possible we won’t need many, or even any, vertical-specific solutions in the future.
Multimodal systems will change things very, very quickly—and this creates a huge amount of uncertainty when it comes to learning, buying, and adopting technology.
You must act with urgency to develop comprehension and competency across your team with readily available multimodal AI systems.
“You have to understand what the stuff is capable of. And then the competency comes from experimentation,” says Roetzer.