Meet the "First AI Software Engineer"

Written by Mike Kaput | Mar 19, 2024 2:01:12 PM

The internet is blowing up with a jaw-dropping demo video of “the first AI software engineer.”

It shows an AI agent called Devin doing complex coding tasks without human oversight. Devin is the brainchild of Cognition, a small AI startup. And that brainchild can apparently do a lot. The demo shows Devin autonomously coding entire projects, including building a website.

According to Cognition CEO Scott Wu, Devin uses the same tools, reasoning, and problem-solving as a human software engineer. That's a claim tech leaders are gushing over.

Stripe CEO Patrick Collison called the demos "very impressive in practice." Perplexity CEO Aravind Srinivas said it's the first agent demo he's seen that crosses the threshold of human-level performance.

Does this mean AI software engineers are about to take all the coding jobs?

(Or that AI agents are about to take everyone’s jobs for that matter?)

I got the real story from Marketing AI Institute founder/CEO Paul Roetzer on Episode 88 of The Artificial Intelligence Show.

Make no mistake: AI agents are coming

The timing of Devin's release is almost comical, says Roetzer. The Artificial Intelligence Show's Episode 87, where Roetzer predicted an AI agent explosion in the 2025-2027 range, dropped the same day that Devin's demo began to go viral.

In that episode, Roetzer predicted a steady progression towards AI agents that can perform tasks autonomously. But he cautioned that we won't achieve full autonomy for quite some time.

Agents like Devin, while noteworthy, are "GPT-1, GPT-2 kind of agents," he says. They're impressive examples of AI acting autonomously, but still early.

The Devin demo is a perfect example of what he meant. It's an incredibly impressive agent, but it has plenty of flaws.

Roetzer points to AI expert Ethan Mollick's take. Mollick got early access to Devin and took it for a spin. He reports that, while Devin can indeed autonomously plan, research, code, and debug, it's still slow and breaks often.

A glimpse of the future of AI at work:

I got early access to Devin, the "AI developer" - it is slow & breaks often, but you can start to see what an AI agent can do.

It makes a plan and executes it autonomously, doing research, writing code & debugging, without you watching. pic.twitter.com/HHBQQDQZ9q
— Ethan Mollick (@emollick) March 15, 2024

But beware the hype

This type of nuance was lacking from the online discourse. The buzz around Devin was "insane," says Roetzer. Tech luminaries heaped praise on the demo while barely acknowledging that it was an early demo with a long way to go.

That can be dangerous, he says.

“If you don’t know what you’re looking at, you could think we just had the ChatGPT moment of agents and life just changed,” says Roetzer. “That is not what happened.”

Regardless, they will affect work

So, how will AI agents actually affect our work? Are we all losing our jobs to autonomous AI systems?

To assess when a task or job will truly be transformed by an AI advancement like Devin, Roetzer says we need to ask things like:

How reliable is it? What's the risk of it being wrong?
How repetitive and predictable is the task it's automating?
How much human oversight is still needed?
How complex is the task? Does it require reasoning, math, common sense, intuition?

To see how this plays out in practice, take the example of an AI agent trying to book a flight for you.

The agent may be able to complete the technical steps involved quite soon. But for some time it will likely lack an understanding of our individual preferences. (Think things like: which airports and travel experiences we prefer.)

“So it’s not like we’ll have these agents we turn on and things are just done for us,” says Roetzer. “There’s going to be a long period of time where we are observers, trainers, mentors, and managers of these agents.”

That likely means jobs will evolve in coming years to include AI agents as part of workflows, not as complete human replacements, says Roetzer.

“Agents are going to be a part of what we do, but your domain experience and expertise, your intuition and common sense, those things are going to be needed to make these agents economically viable,” says Roetzer.

That means we may get a future of work where AI agents are slotted into the org chart and do the work of multiple humans, but still need human oversight.

These agents will also likely be more specialized, rather than a general-purpose, ChatGPT-like agent for everything, says Roetzer. The narrower the domain an agent is trained on, the faster and more reliable it can become. A more limited, but highly optimized, scope can reduce errors and the need for human intervention.

So, at least initially, expect agents built for specific tasks, such as a social media manager agent, an email marketing agent, or an ad-buying agent.

And you can expect more incredible demos

More eye-popping agent demos are coming, says Roetzer. Many of them will be remarkable. You can expect to see the likes of Google, Microsoft, OpenAI, and others demonstrate this technology.

“They’re all working on the exact same things,” says Roetzer. “And what we know about this space is that people follow fast.”

While no one's work or life may change today because of Devin, “this is exactly the kind of stuff we’re going to start seeing more and more of this year.”
