We are so caught up right now in figuring out AI writing tools and large language models (LLMs) that most marketing and business leaders, as well as SaaS executives and investors, are missing the bigger picture. This is all just the foundation for what comes next.
Let’s look at email marketing as a practical example of what I mean.
Imagine you want to send an email promoting an upcoming event, product launch or promotion. But rather than working through a series of clicks and manual entries, you simply speak or type prompts for what you want the machine (aka the AI) to do.
I’m not talking about simple information retrieval and natural language generation (NLG), such as a chat feature that responds to queries and prompts. I’m saying that the AI will have the ability to perform actions (i.e. clicks, form fills, etc.) the same way humans do.
Based on a collection of public AI research papers related to a concept called World of Bits (WoB), and in light of recent events and milestones in the AI industry, including legendary AI researcher Andrej Karpathy announcing his return to OpenAI, it appears that the capabilities for AI systems to use a keyboard and mouse are being developed in major AI research labs right now. This has been attempted in the past, but recent advancements in language AI appear to be bringing this closer to reality.
If AI develops these abilities at scale, the UX of every SaaS company will have to be re-imagined, and it will have profound impacts on productivity, the economy and human labor.
More on the macro conversation later. Back to our email marketing case for now.
So, let’s use our Marketing AI Conference (MAICON) as a hypothetical example, and I’ll pretend this is happening in HubSpot, which is our primary CRM and email system:
Current Workflow to Send an Email in HubSpot:
Note, to my knowledge, HubSpot is not currently infusing AI into any of these steps. This process is 100% manual / human powered.
Now, I could find a collection of third-party AI tools to intelligently automate pieces of this workflow (e.g. subject line writing, image generation, copywriting, send-time optimization), but those capabilities are not native in the platform today, and they may become obsolete as stand-alone features in the near future.
So, rather than building or buying a bunch of individual pieces, what if instead I simply opened HubSpot and gave it a series of spoken or written prompts:
Prompt 1: Draft an email for the High Engagement list about www.maicon.ai. The email should include information about the dates, location, agenda and speakers. Focus the messaging on a sense of urgency to solve for AI in their marketing programs, and use inspirational messaging about the opportunities to transform their company and career. Personalize the opening paragraph based on whether or not they attended MAICON 2022, and if they are an Academy member.
Prompt 2: Use a simple email design with a CTA that takes users to the event registration page. Add a montage of keynote speaker headshots to the top of the email. Turn the images into illustrations. Include Cathy’s signature and headshot at the end.
Prompt 3: Optimize the subject lines and send times for open rates at an individual recipient level based on each user’s history.
Prompt 4: Send test emails to the internal team. Hold the full send until you receive approval.
Prompt 5: Send a summary performance report with open rates, click rates and conversions, as well as a narrative of 250 words or less about how the email performed vs. campaign benchmarks and goals. Include recommendations for improving future emails. Send reports at 1 day, 3 days, and 7 days.
This is not possible today. But, I believe it is where we are going in the coming years.
On Feb. 8, 2023, Andrej Karpathy, a founding member of OpenAI and former senior director of AI at Tesla, announced his return to OpenAI. My curiosity was immediately piqued as to why.
I knew it was a big deal, but I had to go back and re-listen to Karpathy’s interview with Lex Fridman to figure out why.
In the Fridman episode, Karpathy talks about transformers (the AI architecture kind, not the toys), language models, artificial general intelligence (AGI) and something called World of Bits (WoB), which I had not previously heard of.
The basic premise is that the World of Atoms is the physical world, our human domain. The World of Bits is the Internet, the machine’s domain.
In a 2017 research paper, World of Bits: An Open-Domain Platform for Web-Based Agents, Karpathy et al. explored the potential of AI agents to complete tasks such as booking flights and completing forms through simulated usage of a keyboard and mouse. They made progress, but obstacles remained:
“Conclusion: In this paper, we introduced World of Bits (WoB), a platform that allows agents to complete web tasks with keyboard and mouse actions. Unlike most existing reinforcement learning platforms, WoB offers the opportunity to tackle realistic tasks at scale. . . . we showed that while standard supervised and reinforcement learning techniques can be applied to achieve adequate results across these environments, the gap between agents and humans remains large, and welcomes additional modeling advances.”
In the October 2022 Fridman interview, Karpathy implied that those original barriers may be coming down, and a breakthrough could be imminent:
Fridman: You briefly worked on a project called World of Bits, training an RL system to take actions on the internet versus just consuming the internet like we talked about.
Karpathy: Yeah.
Fridman: Do you think there's a future for that kind of system interacting with the internet to help the learning?
Karpathy: Yes. I think that's probably the final frontier for a lot of these models, so as you mentioned, when I was at OpenAI, I was working on this project World of Bits and basically, it was the idea of giving neural networks access to a keyboard and a mouse. And the idea is that . . . basically, you perceive the input of the screen pixels and basically, the state of the computer is visualized for human consumption in images of the web browser and stuff like that. And then you give the neural network the ability to press keyboards and use the mouse and we're trying to get it to, for example, complete bookings and interact with user interfaces. . . .
Karpathy: Now, to your question as to what I learned from that, it's interesting because the World of Bits was basically too early I think at OpenAI, at the time. This is around 2015 or so. And the zeitgeist at that time was very different in AI from the zeitgeist today.
Karpathy: … it is time to revisit that and OpenAI is interested in this, companies like Adept are interested in this, and so on. And the idea is coming back because the interface is very powerful but now you're not training an agent from scratch. You are taking the GPT as an initialization. So GPT is pre-trained on all of text and it understands what's a booking, it understands what's a submit, it understands quite a bit more. And so it already has those representations. They are very powerful and that makes all the training significantly more efficient and makes the problem tractable.
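To make the keyboard-and-mouse idea a bit more concrete, here is a rough Python sketch of the observe-and-act loop Karpathy describes. Everything in it is an assumption made up for illustration: `ToyFormEnv`, `Click`, `TypeText`, `scripted_policy` and `run_episode` are hypothetical names, the "screen" is a small dictionary rather than real pixels, and the policy is a hand-written script standing in for a trained model. It is not the World of Bits code or any lab's actual system.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Click:
    """A mouse click at screen coordinates (x, y)."""
    x: int
    y: int


@dataclass(frozen=True)
class TypeText:
    """A burst of keyboard input."""
    text: str


Action = Union[Click, TypeText]


class ToyFormEnv:
    """Hypothetical one-field web form: a text box and a Submit button."""

    TEXT_BOX = (10, 10)  # assumed location of the text box
    SUBMIT = (10, 40)    # assumed location of the Submit button

    def __init__(self, goal: str):
        self.goal = goal
        self.focused = False
        self.field = ""
        self.submitted = False

    def observe(self) -> dict:
        # A real system would return screen pixels; a dict stands in here.
        return {
            "goal": self.goal,
            "field": self.field,
            "focused": self.focused,
            "submitted": self.submitted,
        }

    def step(self, action: Action) -> float:
        if isinstance(action, Click):
            if (action.x, action.y) == self.TEXT_BOX:
                self.focused = True
            elif (action.x, action.y) == self.SUBMIT:
                self.submitted = True
        elif isinstance(action, TypeText) and self.focused:
            self.field += action.text
        # Reward is 1.0 only once the form is submitted with the right text.
        return 1.0 if self.submitted and self.field == self.goal else 0.0


def scripted_policy(obs: dict) -> Action:
    """Hand-written stand-in for a learned model: focus, type, then submit."""
    if obs["field"] != obs["goal"]:
        if not obs["focused"]:
            return Click(*ToyFormEnv.TEXT_BOX)
        return TypeText(obs["goal"][len(obs["field"]):])
    return Click(*ToyFormEnv.SUBMIT)


def run_episode(goal: str, max_steps: int = 10):
    """Run the observe-act loop until the form is submitted or steps run out."""
    env = ToyFormEnv(goal)
    reward = 0.0
    actions = []
    for _ in range(max_steps):
        action = scripted_policy(env.observe())
        actions.append(action)
        reward = env.step(action)
        if env.submitted:
            break
    return reward, actions


if __name__ == "__main__":
    reward, actions = run_episode("MAICON")
    print(reward, actions)
```

The point of the sketch is the shape of the interface: the agent only sees what is on the screen and can only click or type, which is the same "API" a human uses. In the approach Karpathy outlines, the scripted policy would be replaced by a model that starts from a pre-trained GPT rather than from scratch.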
Then, on Feb. 16, Karpathy Tweeted: Nice followup on our earlier OpenAI "World of Bits" work teaching AIs to use keyboard + mouse. Imo powerful to match AI "APIs" to those of humans bc the world is built for humans - gives completeness, incrementality, demonstration data. Applies in realms both digital and physical.
In the Tweet, he linked to a Feb. 16, 2022 paper, A data-driven approach for learning to control computers. The abstract for this paper reads:
“It would be useful for machines to use computers as humans do so that they can aid us in everyday tasks. This is a setting in which there is also the potential to leverage large-scale expert demonstrations and human judgements of interactive behaviour, which are two ingredients that have driven much recent success in AI. Here we investigate the setting of computer control using keyboard and mouse, with goals specified via natural language. . . . results demonstrate the usefulness of a unified human-agent interface when training machines to use computers. Altogether our results suggest a formula for achieving competency beyond MiniWob++ and towards controlling computers, in general, as a human would.”
In conclusion, the authors state:
“Humans use digital devices for billions of hours every day. If we can develop agents that can assist with even a tiny fraction of these tasks, we can hope to enter a virtuous cycle of agent assistance, followed by human feedback on failures, and hence to agent improvement and new capabilities.”
When you start connecting the dots, it appears that we are moving toward a world in which AI agents will not only retrieve and present information and answers, but have the ability to take actions in the digital world.
This changes things, likely in ways for which marketers, business leaders, software entrepreneurs and humans in general are not prepared.
We need to be thinking further into the future as an industry and society in order to be ready for what happens next.
We discuss this topic more extensively on Episode 35 of the Marketing AI Show.