This week, returning from Thanksgiving, we are grateful for both Mike Kaput and Paul Roetzer joining us to share the latest AI news. Together, they tackle the latest OpenAI developments, analyze Andrej Karpathy's insightful video on LLMs, explore Ethan Mollick's new article on AI's business impact, and cover the latest AI advancements from various companies.
Listen or watch below, and scroll down for show notes and the transcript.
This episode is brought to you by our sponsor:
Meet Akkio, the generative business intelligence platform that lets agencies add AI-powered analytics and predictive modeling to their service offering. Akkio lets your customers chat with their data, create real-time visualizations, and make predictions. Just connect your data, add your logo, and embed an AI analytics service to your site or Slack. Get your free trial at akkio.com/aipod.
Listen Now
Watch the Video
Timestamps
00:02:41 — AGI, Q* and the latest drama at OpenAI
00:21:59 — Andrej Karpathy’s “The Busy Person’s Intro to LLMs,” is now on YouTube
00:41:14 — Ethan Mollick’s article on how AI should cause companies to reinvent themselves
00:49:54 — Anthropic releases a new version of Claude, Claude 2.1
00:52:42 — Inflection unveils Inflection-2, an AI model that may outperform Google and Meta
00:55:10 — Google’s Bard Chatbot can now answer questions about YouTube videos
00:56:37 — ElevenLabs Speech to Speech tool
00:58:06 — StabilityAI releases Stable Video Diffusion
00:59:39 — Cohere launches a suite of fine-tuning tools to customize AI models
Summary
The latest drama at OpenAI
This week, we’re coming up on the 1-year anniversary of ChatGPT’s release—a development that transformed the world of AI and made OpenAI one of the most important companies of all time.
We are also coming off perhaps the most insane week in the company’s history, which saw the company nearly implode after its board fired CEO Sam Altman.
What a difference a week makes. After getting fired, Altman is now back as CEO of OpenAI, and as part of his return, there is also a new board. (Though Adam D’Angelo, who served on the previous board, is returning as a board member.)
Co-founder Greg Brockman, who quit in protest of Altman’s firing, is also back. More details on the reason Altman was fired in the first place have emerged, though we don’t have anything close to the full story yet.
According to Reuters, before Altman was fired, “...several staff researchers wrote a letter to the board of directors warning of a powerful artificial intelligence discovery that they said could threaten humanity, two people familiar with the matter told Reuters.”
Those grievances within the letter included concerns over how fast the company was commercializing advances before understanding their consequences.
These concerns appear to revolve around a project at OpenAI called Q* (Q-star). Q* is rumored to be an advanced model the company developed that can solve math problems it hadn’t seen before.
Within the AI research community, a model’s ability to do math is seen as an important technical milestone. It’s also a potential indicator that we have the ability to build AI systems that resemble human intelligence—which raises fears of AGI among some.
These fears appear to be significant enough to have prompted Altman’s firing. The Information reported on these fears, saying:
“A demo of the model circulated within OpenAI in recent weeks, and the pace of development alarmed some researchers focused on AI safety.”
Q* was created by two researchers on Ilya Sutskever’s team. Sutskever was one of the leaders of the action to fire Altman.
The busy person’s intro to large language models
Andrej Karpathy, a leading expert in AI who works at OpenAI, released a public version of a talk called “The Busy Person’s Intro to LLMs.”
The 1+ hour video is a useful, accessible introduction to how large language models work, with highly practical examples.
In the video, Karpathy also speculates on the future of LLMs. In doing so, he highlights two important points that give us clues as to where AI is headed:
First, it’s more correct to think of LLMs not as chatbots, but as the kernel process of an emerging operating system.
Second, LLMs as operating systems in just a few years may increasingly be able to use different media and tools to solve problems like humans do.
This may include capabilities like: having more knowledge than any single human about subjects; browsing the internet; using existing software infrastructure to complete tasks; self-improving in certain narrow domains; understanding and generating text, images, video, and audio; thinking deeply on hard problems for a long time; and communicating with other LLMs.
Rebuilding organizations for AI
AI expert and Wharton professor Ethan Mollick just published a preliminary blueprint for how companies should be thinking about organizational change caused by AI.
Mollick notes that with today’s AI tools, we already have the ability to radically change the way we work.
“Theoretical discussions become practical. Drudge work is removed. And, even more importantly, hours of meetings are eliminated and the remaining meetings are more impactful and useful. A process that used to take a week can be reduced to a day or two.”
This is already happening today. And much more may be possible in the near future.
He writes: “We already can see a world where autonomous AI agents start with a concept and go all the way to code and deployment with minimal human intervention. This is, in fact, a stated goal of OpenAI’s next phase of product development. It is likely that entire tasks can be outsourced largely to these agents, with humans acting as supervisors.”
His point? “AI is impacting organizations, and managers need to start taking an active role in shaping what that looks like.”
Links Referenced in the Show
- The latest drama at OpenAI.
- The busy person’s intro to large language models.
- Rebuilding organizations for AI.
- Anthropic releases a new version of Claude with a 150,000-word context window and 2X decrease in hallucinations.
- Inflection unveils Inflection-2, a new AI model that may outperform Google and Meta.
- Google Bard can now answer questions about YouTube videos.
- Stability AI releases Stable Video Diffusion, their first foundation model for generative video.
- Cohere launches a suite of fine-tuning tools to customize AI models.
- ElevenLabs releases Speech to Speech tool to clone voices.
Read the Transcription
Disclaimer: This transcription was written by AI, thanks to Descript, and has not been edited for content.
[00:00:00] Mike Kaput: It's also an indicator that an AI system is starting to resemble what we might describe as human intelligence, and this raises fears of artificial general intelligence
[00:00:11] Paul Roetzer: Welcome to the Marketing AI Show, the podcast that helps your business grow smarter by making artificial intelligence approachable and actionable. You'll hear from top authors, entrepreneurs, researchers, and executives as they share case studies, strategies, and technologies that have the power to transform your business and your career.
[00:00:31] Paul Roetzer: My name is Paul Roetzer. I'm the founder of Marketing AI Institute, and I'm your host.
[00:00:40] Paul Roetzer: Welcome to episode 74 of the Marketing AI Show. I'm your host, Paul Roetzer, along with my co-host, Mike Kaput. It's good to have you back, Mike, after doing a solo session last week. It's good to be back. How was the trip?
[00:00:55] Mike Kaput: Really nice. Really nice. Good time away with family, as we're going to [00:01:00] discuss. Being on a trip typically is when everything happens in AI, it seems like. So, we've got a lot to catch up on.
[00:01:09] Paul Roetzer: I know, I tried so hard last week to just sort of shut down. I had a couple things on Tuesday I had to do.
[00:01:15] Paul Roetzer: And then I, I largely shut down Wednesday through today. So we're, this is Monday, November 27th at 10 a.m. Eastern time. But I was also just... kind of passively trying to connect the dots on what was going on. And so I think, for anybody who listened to episode 73 where I dove into sort of the story of what was going on at OpenAI and some possibilities of what may have been the cause of Sam Altman and Greg Brockman leaving and then returning.
[00:01:45] Paul Roetzer: We're going to kind of expand on that today. I think there's some interesting connections to be made with some other things that are happening, and some additional information that's come out a little bit. So we're gonna, we're gonna go a little bit more into that, but [00:02:00] first, today's episode is brought to us by Akkio, the generative business intelligence platform that lets agencies add AI-powered analytics and predictive modeling to their service offering.
[00:02:10] Paul Roetzer: Akkio lets your customers chat with their data, create real-time visualizations, and make predictions. Just connect your data, add your logo, and embed an AI analytics service to your site or Slack. You can get a free trial at Akkio, that's Akkio.com/aipod. Akkio.com/aipod. All right, Mike.
[00:02:33] Paul Roetzer: Let's let's get into it. We're back to our regularly scheduled programming with three main topics and rapid fire items. So it's all yours Mike.
00:02:41 — AGI, Q* and the latest drama at OpenAI
[00:02:41] Mike Kaput: All right. So this week we are coming up on a milestone here. It is the one-year anniversary of ChatGPT's release, which is a development that has certainly transformed our world and the world of AI and made OpenAI one of the most significant companies we're [00:03:00] likely to see in our lifetime. And this kind of coincides with coming off perhaps the most insane week in the company's history, where we saw the company nearly implode after its board fired CEO Sam Altman.
[00:03:14] Mike Kaput: As you are no doubt aware, in AI a day is like a week and a week is like a year. And in the last week, after getting fired, Altman is now back as CEO of OpenAI. As part of his return, there's also a new board, though it sounds like Adam D'Angelo from Quora, who was on the previous board, is also returning.
[00:03:36] Mike Kaput: Co-founder Greg Brockman, who quit in protest of Altman's firing, is also back, and significantly more details on the reason Altman was fired in the first place have started to emerge, though we don't have anything close to a full story yet. However, according to Reuters, before Altman was fired, quote, several staff researchers wrote a letter to the board of [00:04:00] directors warning of a powerful AI discovery.
[00:04:03] Mike Kaput: that they said could threaten humanity. Two people familiar with this topic reported this to Reuters. Now, this was one of many factors, it sounds like, that were involved in Altman's firing, according to these sources. And grievances also included concerns over how fast the company was commercializing advances in AI before understanding their consequences.
[00:04:26] Mike Kaput: Now, some of these concerns appear to revolve around a project at OpenAI called Q* (Q-Star). Q* is rumored to be an advanced model that the company developed which can solve math problems that it hasn't seen before. Now, for some context here, within the AI research community, a model's ability to do math is seen as a really important technical milestone.
[00:04:52] Mike Kaput: It's also a potential indicator that an AI system is starting to resemble what we might describe as human [00:05:00] intelligence, and this raises fears of artificial general intelligence among some people. These fears appear to have been significant enough to have prompted Altman's firing. The Information reported on this saying, quote, a demo of the model circulated within OpenAI in recent weeks, and the pace of development alarmed some researchers focused on AI safety.
[00:05:24] Mike Kaput: Now, interestingly, Q* was created by two researchers on Ilya Sutskever's team. Sutskever was one of the leaders of the action to fire Altman. So, Paul, based on the context you gave in episode 73 and what you're seeing now, can you walk us through what's going on here and what might have prompted Altman's ouster?
[00:05:48] Paul Roetzer: First, I'll say, the Reuters and The Information articles, which I think came out last Wednesday, I believe, are really all we have. There hasn't been much talked [00:06:00] about. There was apparently an email internally at OpenAI acknowledging this Q* program. But there's very little information about it, and so I'll kind of, I want to start by going through a little bit more from The Information, which is the media outlet, which, by the way, if you are fascinated by this stuff, pay the $159 a year, whatever it is, for The Information. Like, it's, it's a fantastic source and they seem to always get some great scoops.
[00:06:33] Paul Roetzer: So it's one of the sources Mike and I read all the time to kind of keep up on this stuff. So, I kind of, again, a lot of what they had and what Reuters had was very similar. The Information seemed to go a little bit deeper. And the thing that caught my attention was, if you recall from episode 73, if you listened to that episode where I kind of went into all of this, I kind of ended it with, what did Sam [00:07:00] see in that room that was the potential breakthrough?
[00:07:03] Paul Roetzer: So he referenced that a couple weeks earlier he had seen some major breakthrough, basically. So The Information article starts off with: one day before he was fired by OpenAI's board, Sam Altman alluded to a recent technical advance the company had made that allowed it to push the veil of ignorance back and the frontier of discovery forward.
[00:07:22] Paul Roetzer: The cryptic remarks at the APEC CEO summit went largely unnoticed as the company descended into turmoil. The article went on to say, but some OpenAI employees believe Altman's comments referred to an innovation by the company's researchers earlier this year that would allow them to develop far more powerful AI models.
[00:07:40] Paul Roetzer: A person familiar with the matter said the technical breakthrough, as you said, Mike, spearheaded by OpenAI chief scientist Ilya Sutskever, raised concerns among some staff that the company didn't have proper safeguards in place to commercialize such advanced AI models. Now, part of the friction appears to be that Sam and [00:08:00] Greg obviously were aware of this capability, and it sounds like maybe they were not only not stopping it, but they were potentially building some of these capabilities into GPT-5, and that the GPT's release was actually meant to accelerate some of what was being developed within this Q* program.
[00:08:22] Paul Roetzer: So again, not just that they were not stopping it, but that they were racing forward to commercialize it. So then, coming back to The Information article, it said in the following months, senior OpenAI researchers used the innovation to build systems that could solve basic math problems, a difficult task for existing AI models.
[00:08:40] Paul Roetzer: Then it says two of the top researchers used Sutskever's work to build a model called Q*, that was able to solve math problems that it hadn't seen before, an important technical milestone, and that a demo of this had circulated in recent weeks. And the pace of development alarmed some researchers.
[00:08:56] Paul Roetzer: So again, the timing of this demo seems to align [00:09:00] with Sam referencing something he had seen internally at that APEC CEO summit. It said Sutskever's team's work had not been previously reported, and concern inside the organization suggests that tensions within OpenAI about the pace of work will continue even after Altman was reinstated.
[00:09:17] Paul Roetzer: In the months following the breakthrough, Sutskever, who also sat on OpenAI's board until he got fired or released from the board, he did not get fired, released from the board, appears to have had reservations about the technology. In July, and now this is where the timing is going to start becoming relevant to that stuff we're going to talk about today.
[00:09:35] Paul Roetzer: So in July, if we recall from episode 73, Sutskever formed a team dedicated to limiting threats from AI systems vastly smarter than humans. On its webpage, the team says, while superintelligence seems far off now, we believe it could arrive this decade. Now that refers to the superalignment team that Sutskever formed.
[00:09:52] Paul Roetzer: I think they announced it July 5th of this year. Sutskever's breakthroughs allowed OpenAI to overcome limitations on obtaining [00:10:00] enough high-quality data to train new models, according to a person with knowledge, a major obstacle for developing next-generation models. The research involved using computer-generated, rather than real-world, data like text and images pulled from the internet to train models.
[00:10:15] Paul Roetzer: Now, I'll pause here because... If you're unfamiliar with the areas of research within the AI community, this may sound like some crazy thing. Synthetic data is being used by everyone. So every major research lab is not only talking about and researching, but using synthetic data. Tesla uses it to train their cars.
[00:10:37] Paul Roetzer: Anthropic, we talked in an earlier episode about how Anthropic is using it and how Dario Amodei, the founder and CEO of Anthropic, thinks that it can be a key unlock. So synthetic data, while in this article it's not given the context that this is a widespread thing, it is in fact a widespread thing. So this, this paragraph is not really anything, I don't think, significant, like [00:11:00] some advancement that OpenAI is making and other people aren't.
[00:11:02] Paul Roetzer: It went on to say, for years, Sutskever had been working on ways to allow language models like GPT-4 to solve tasks that involved reasoning, like math and science problems. In 2021, he launched a project called GPTZero, a nod to DeepMind's AlphaZero program that could play chess, Go, and Shogi. I boldfaced this one.
[00:11:24] Paul Roetzer: The team hypothesized that giving language models more time and computing power to generate responses to questions could allow them to develop new academic breakthroughs. This time concept is going to become very important. We're going to hit on this again and again in the coming topics, so we'll come back to that.
[00:11:45] Paul Roetzer: Lukasz Kaiser, one of the coauthors of the groundbreaking Transformer research paper from Google in 2017, which describes an invention that paved the way for more sophisticated models, held a key role on the GPTZero project. So again, you start [00:12:00] to see all these interconnections between that Attention Is All You Need paper, the eight or nine authors of that paper, and their relevance to everything that's going on here. Among the techniques the team experimented with
[00:12:12] Paul Roetzer: was a machine learning concept known as test-time computation, which is meant to boost language models' problem-solving abilities. Okay, so this sounds really complicated. It's not. It basically means if you give these things more time, they show more reasoning capability. If you don't give them that, if you ask for, like, an instant response,
[00:12:30] Paul Roetzer: if you go into ChatGPT, like, write me an article, write me an email, whatever, it just does it instinctually. Like, it just takes its data and goes. If you ask it to solve a more complex, like, story problem, it's not really good at it because it just immediately does something based on its training data.
[00:12:45] Paul Roetzer: What they are saying here is if you give it more time to think and to work through reasoning capabilities, it actually seems to do it. And so that's this idea of this test-time computation. Again, we're going to come back around to this topic in a [00:13:00] couple minutes. So earlier this year, Sutskever and his team discovered a variation of this test-time computation method that prompted far greater results in their efforts to train more sophisticated models.
[00:13:12] Paul Roetzer: Now this, to me, is where it starts to get interesting. This is where you start to see something different. So the thing I thought was a little misleading, or that people I saw on social media kind of latched onto right away, was this Q* thing, as though it was, like, on its own, some groundbreaking thing.
[00:13:29] Paul Roetzer: The general belief is that the Q* is referencing Q-learning, which is actually a known form of reinforcement learning in AI research and development. So it's this idea that, so I actually asked ChatGPT, I'd never heard of Q-learning. So I asked ChatGPT for an explanation and I went and kind of verified the definition on Google.
[00:13:52] Paul Roetzer: What ChatGPT said was it's a key algorithm in AI that helps machines learn optimal behaviors through trial and error, making it [00:14:00] crucial for the development of autonomous systems and decision making. Its simplicity combined with its power has made it a staple of AI research and applications. So again, Q-learning, which seems to be the origin of the name of Q*, is not
[00:14:15] Paul Roetzer: new. As a matter of fact, Yann LeCun, who we often talk about, who runs the research lab at Facebook/Meta, he tweeted, please ignore the deluge of complete nonsense about Q*. One of the main challenges to improve language model reliability is to replace, this gets technical, autoregressive token prediction with planning.
[00:14:37] Paul Roetzer: Meaning it's just predicting the next token or next word in a sequence, and you're trying to improve that by giving these things planning capabilities. It says pretty much every top lab, FAIR, which is Facebook's, DeepMind, OpenAI, etc., is working on that and some have already published ideas and results.
[00:14:55] Paul Roetzer: It is likely that Q* is OpenAI's attempt at planning. They [00:15:00] pretty much hired Noam Brown to work on that. And then he also added, note, I've been advocating for deep learning architectures capable of planning since 2016. Now, one other interesting thing real quick I will share to understand this Q-learning concept.
[00:15:16] Paul Roetzer: It was Thanksgiving week, I think this was Wednesday, and so I said to ChatGPT, explain it to, how can I explain Q-learning to my family at Thanksgiving dinner? I actually thought this was pretty good. So, it said, it can be a fun and engaging experience, blah, blah, blah. So it said, we'll use a turkey analogy.
[00:15:33] Paul Roetzer: Imagine you're trying to cook the perfect Thanksgiving turkey. You have a variety of actions to choose from: oven temperature, how often to baste, what seasonings to use. Each combination of these actions can lead to a different result: a delicious turkey, a dry turkey, or maybe even a burnt one. So, in Q-learning, the AI learns from trial and error.
[00:15:52] Paul Roetzer: It learns from mistakes. So it said, just like when you first learn to cook, you might make mistakes. Maybe the turkey gets overcooked, or it's not [00:16:00] seasoned enough, but with each attempt, you learn a little more about what works and what doesn't. So then, Q-learning has a quality score; it basically applies a quality score to each output, in essence.
[00:16:11] Paul Roetzer: So the Q stands for the quality of each action you take. Think of it as a score for each decision you make while cooking. A high score means a great-tasting turkey. A low score means you might need to order a pizza. So basically, Q-learning is a way for the AI to learn from its actions through trial and error, and then kind of remember what led to a greater Q score, in essence.
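(Editor's note: for readers who want to see the trial-and-error idea in code, here is a minimal tabular Q-learning sketch. The toy "corridor" environment, its rewards, and the hyperparameters are all invented for illustration; this is just the textbook algorithm, not anything to do with OpenAI's Q*.)

```python
import random

# Minimal tabular Q-learning on a made-up 5-cell corridor: the agent starts in
# cell 0 and earns +10 for reaching cell 4, with a -1 penalty per extra step.
N_STATES = 5
ACTIONS = ["left", "right"]
alpha, gamma, epsilon = 0.1, 0.9, 0.2  # learning rate, discount, exploration rate

# Q-table: a "quality" score for every (state, action) pair, starting at zero.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == "left" else min(N_STATES - 1, state + 1)
    if next_state == N_STATES - 1:
        return next_state, 10.0, True   # reached the goal
    return next_state, -1.0, False      # small penalty for wandering

for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: explore occasionally, otherwise take the best-known action.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state, reward, done = step(state, action)
        # Core Q-learning update: nudge Q toward reward + discounted best future value.
        best_next = 0.0 if done else max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state

# After training, "right" should have the higher quality score in the cells leading to the goal.
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)})
```

The trial-and-error loop and the Q-score update are the whole idea; whether and how something like this scales up to reasoning in large language models is exactly the open question being discussed here.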
[00:16:33] Paul Roetzer: So I think, at a high level, that's the key here. Now, I want to go back to the Noam Brown reference because I think this is actually a really key aspect of everything. So if we recall, on July 5th of this year, that's when Ilya Sutskever and Jan Leike introduced the superalignment team at OpenAI. So July 5th.[00:17:00]
[00:17:00] Paul Roetzer: On July 6th, Noam Brown announced he was joining OpenAI. Now I'm going to go to the tweet thread, and we'll put this in the show notes, when Noam Brown announced this. And I'm going to, I'm going to kind of stress a couple of key points here. So he announced, I am thrilled to share that I have joined OpenAI.
[00:17:19] Paul Roetzer: Now he was previously at Meta. I'll get to that in a minute. So he came from the Meta research lab under Yann LeCun. So he worked for Yann LeCun. So he said, for years, I have researched AI self-play and reasoning, you're going to hear the word reasoning a lot, in games like poker and Diplomacy. I'll now investigate how to make these methods truly general.
[00:17:40] Paul Roetzer: Artificial general intelligence. Remember, the term general is a, is a meaningful word here. If successful, we may one day see large language models that are 1,000 times better than GPT-4. I'm going to stop here and say, Noam Brown didn't leave Meta unless he [00:18:00] knew OpenAI was working on this stuff and that there was a path to apply his research immediately.
[00:18:07] Paul Roetzer: He's not leaving to go do this three to five years from now. So timing-wise, whatever this Q* program is, whatever the breakthrough is, probably happened earlier in 2023. That led to some major advancements and the creation of the superalignment team. So again, they didn't create the superalignment team overnight.
[00:18:25] Paul Roetzer: So they probably realized there was some breakthrough, the need for the superalignment team emerged, and they bring in Noam Brown to work on this exact thing. So Noam goes on to say, in 2016, AlphaGo beat Lee Sedol in a milestone for AI. Mike and I wrote about this in our book, Marketing Artificial Intelligence.
[00:18:43] Paul Roetzer: But the key to that was AI's ability to ponder. Again, go back to the time idea. To ponder for approximately one minute before each move. How much did that improve it? For AlphaGo Zero, it's the equivalent of scaling pre [00:19:00] training 100,000x. What that means is, the model improved its quality of output, its prediction, 100,000x,
[00:19:12] Paul Roetzer: by not taking the move immediately, instinctually, based on its data, but actually pondering the move, reasoning through what to do. He goes on to say, also in 2016, I observed a similar phenomenon in poker. That insight led to our Libratus poker AI, I don't know if I'm saying that right, that beat top humans for the first time.
[00:19:35] Paul Roetzer: Andy Jones investigated the train-time versus test-time compute trade-off in detail in a paper. And then it went on to say all those prior methods are specific to the game, but if, but there's a key, but if we can discover a general version, the benefits could be huge. Yes, inference may be 1,000 times slower.
[00:19:57] Paul Roetzer: That means coming up with the output, like the time it [00:20:00] takes to come up with the output, may be 1,000 times slower and more costly, but inference costs, but what inference cost would we pay for a new cancer drug or a proof of the Riemann hypothesis? Improved capabilities are always risky. The superalignment team was built the day before.
[00:20:19] Paul Roetzer: Improved capabilities are always risky, but if this research succeeds, it could be valuable for safety research as well. Imagine being able to spend $1 million on inference to see what a more capable future model might look like. It would give us a warning that we otherwise lack. So when I started kind of connecting that, and I'll put it in the show notes, we wrote about Noam; on February 1st of this year, I wrote a blog post called Meta AI's Cicero Provides a Glimpse Into the Future of Human-Machine Collaboration.
[00:20:53] Paul Roetzer: We talked about some of Noam's research. And then we also talked about that in a tweet. Or I'm sorry, in a, in an episode earlier this year. So when we start to consider all of this, we get beyond just the Q-learning thing, and we start looking at everything else that was going on, who was being hired. Keep in mind, Andrej Karpathy, who we're going to talk about next.
[00:21:14] Paul Roetzer: He rejoined OpenAI in February of this year. So I think they already were making these breakthroughs by February of 2023. So what could OpenAI be working on that could cause so much commotion, and what does the hiring of Noam Brown and, even earlier than that, Andrej Karpathy and their work
[00:21:34] Paul Roetzer: potentially tell us. And that takes us to our next main topic, Mike.
[00:21:40] Mike Kaput: There is some method to the madness behind the topics this week. So pay close attention because they do link up pretty well.
[00:21:47] Paul Roetzer: And you may want to go back and like re listen to this because I know we're, we're trying to pack a lot in here, but trust me, it all connects.
[00:21:55] Paul Roetzer: And I think that the picture is going to become more clear as we go through this.
00:21:59 — Andrej Karpathy’s “The Busy Person’s Intro to LLMs,” is now on YouTube
[00:21:59] Mike Kaput: So [00:22:00] in our next topic, Andrej Karpathy, who you just mentioned, a leading expert in AI who works at OpenAI, just released a public version of a talk that he's given called The Busy Person's Intro to LLMs, Large Language Models.
[00:22:16] Mike Kaput: Now, on the surface, this is a one-plus-hour video. It's a very useful, highly accessible introduction to how large language models work. It has a bunch of very practical examples. I'd highly recommend anyone with a deeper interest in this take a look at it. But as part of this video, Karpathy also speculates on the future of LLMs.
[00:22:39] Mike Kaput: And in doing so, he highlights a couple of really important points that I think give us some clues as to where all this may be headed. So first, he encourages us to think of LLMs not as chatbots. It's actually more correct to think of them as the kernel process of an emerging operating system. So LLMs [00:23:00] as an OS.
[00:23:01] Mike Kaput: And second, LLMs as operating systems, in just a few years by his estimation, may increasingly be able to use different media and actual tools, digital tools, to solve problems just like humans do. So he says we could see relatively soon LLMs that have capabilities like having more knowledge than any single human about a given subject, the ability to browse the internet, use existing software infrastructure to complete tasks, self-improve in certain narrow domains at how they get their results, and obviously understand and generate text, images, video, and audio.
[00:23:42] Mike Kaput: He also does call out the idea, Paul, that you highlighted in the previous topic, the ability to think deeply on hard problems for a longer time than just a snap judgment. And last but not least, he predicts we may have LLMs that could communicate with other LLMs. [00:24:00] So there's a little more going on in this video than just a 101 class on LLMs.
[00:24:06] Mike Kaput: So Paul, can you maybe connect those dots a little further for us?
[00:24:10] Paul Roetzer: So first, I think it's good to revisit Karpathy's background. So he was actually a founding member of OpenAI. So he was there from January 2016 to June 2017. We talked about Andrej on February 21st of this year, in episode 35, so if you want some more background on him, a paper he worked on called World of Bits was the key thing that we focused on, and we actually wrote a blog post about it at that time as well that we'll link to.
[00:24:45] Paul Roetzer: In essence, he, he then went on after OpenAI to work at Tesla. He led the AI team there and he developed the computer vision team that was critical to full self-driving. So he played a key role there, but then he announced he was going back to [00:25:00] OpenAI in February, actually February 8th of this year.
[00:25:02] Paul Roetzer: That was what caught my attention and led to me writing a blog post, going back and re-listening to an episode he had done with Lex Fridman, a podcast interview he had done. I was trying to figure out why was he going back to OpenAI, what was going on? And so he actually alluded to it. I'll read a quick excerpt from the transcript of his interview with Lex Fridman, because Lex asked him, like, why are you going back to OpenAI?
[00:25:25] Paul Roetzer: What is going on? Or like, really, what do you think the future is? This was before he actually announced he was going back. He said, do you think there's a future for that kind of system interacting with the internet to help learning? So they were talking specifically about, like, AI agents. We've talked a lot about how the future is going to be these agents that can take actions, not just produce outputs.
[00:25:42] Paul Roetzer: So Karpathy said, yes, I think that's probably the final frontier for a lot of these models. So as you mentioned, when I was at OpenAI, I was working on this project, World of Bits, and basically it was the idea of giving neural networks access to a keyboard and a mouse. And the idea is that basically you perceive the input of the [00:26:00] screen pixels, and the state of the computer is visualized for human consumption in images of the web browser and stuff like that.
[00:26:06] Paul Roetzer: And then you give the network the ability to press keyboards and use the mouse, and we're trying to get it to, for example, complete bookings and interact with user interfaces. So again, the reason I'm giving this context is, you have to understand, Karpathy went back to OpenAI in February of this year to work on what I'm explaining right now, followed by Noam Brown in July, followed by what has happened over the last two weeks.
[00:26:30] Paul Roetzer: It's all connected, trust me. So, Karpathy then goes on to say, now to your question as to what I learned from that, it's interesting because World of Bits was basically too early. So again, 2016, '17, it was too early to give these AI agents the ability to take action. This is around 2015 or so. And the zeitgeist at the time was very different in AI than today.
[00:26:54] Paul Roetzer: He said, it is time to revisit that. And OpenAI is interested in this. Companies like Adept, which we've [00:27:00] talked about before on the show, are interested in this and so on. And the idea is coming back because the interface is very powerful, but now you're not training agents from scratch. You are taking the GPT as an initialization, meaning the ability now for the AI to understand and generate language.
[00:27:16] Paul Roetzer: So GPT is pre-trained on all this text. And so now the agent understands what a booking is, it understands what a submit is, it understands quite a bit more. And so it already has those representations. These are very powerful and make all the training significantly more efficient. So Karpathy has been critical to a lot of the last, like, seven years of AI.
[00:27:40] Paul Roetzer: He's been a key player. And so he went back to OpenAI to work on these interactive agents, but in the process, obviously goes very deep on language models. So, he doesn't publish often on YouTube, and so when he does, you pay attention. And so I found this video extremely intriguing on a number of [00:28:00] levels.
[00:28:00] Paul Roetzer: So, first was, I think it gives a great overview of how a large language model works. I would say it's, like, it's not a beginner-level introduction, I wouldn't say. Like, I feel like you need some basic comprehension of large language models for this introduction to make sense. But it's definitely kind of like a 201 level.
[00:28:18] Paul Roetzer: It is not highly technical. Like, I really think everybody should go listen to it. Maybe listen to it a couple times, depending on your current familiarity with large language models. But I really do think it gives this great overview of how they work. But what I want to focus on is, like, the last 20 to 25 minutes of the video where he went into what comes next.
[00:28:37] Paul Roetzer: Now, I found this intriguing because 1. I think he's right, obviously. I mean, he knows more than I do. 2. It confirms everything I've been hearing from the other research labs. 3. The timing is really intriguing. Like, the fact that he's... presenting this as, Hey, I'm not saying this as OpenAI doing this, but this is what the research labs are doing.
[00:28:57] Paul Roetzer: Cause they are all connected. They've all worked with each other the last 10 [00:29:00] years. He knows what the other labs are doing. Just like Yann LeCun knows what the other labs are doing. So I think it's fair to say the things we're about to go through are things that not just OpenAI is working on, but OpenAI is definitely working on it.
[00:29:14] Paul Roetzer: So, one, large language models scaling laws. So we hear this term scaling laws all the time. What does it actually mean? What it means in their case, and this is one of the things I kind of learned from this, is it is all about the accuracy of the model at predicting the next word. So today large language models are basically predictive engines around what the next word is it should write.
[00:29:39] Paul Roetzer: And so what he's saying is, so far, the more parameters we give these things, the more training data we give them, and then the time we spend training them, they pretty, we can pretty much predict how accurately they are going to generate the next word. And that, his opinion and their data say, [00:30:00] shows no signs of topping out.
[00:30:02] Paul Roetzer: So in essence, train bigger models for longer and you get more powerful and accurate results. We're going to talk about a number of these foundation models in the rapid fire today. So what he said, and again, I find the use of the wording interesting: we can expect a lot more general capability across all areas of knowledge as these models get bigger.
[00:30:24] Paul Roetzer: So that was one. Two, connecting to and integrating tools makes the models more capable. So think about what they've been doing with ChatGPT. You had Code Interpreter, you have Bing search added, it has a calculator now, it has image generation and recognition, it will have video generation and recognition, it will have audio generation and recognition.
[00:30:44] Paul Roetzer: So what OpenAI has been doing thus far is connecting tools to the model. they are not baked into the model per se. Now I think GPT-5 will be different. And the reason I think that is because I just read an interview with Sundar Pichai of Google [00:31:00] and he said that Gemini, their approach is different in that these tools are built into the foundation model itself.
[00:31:06] Paul Roetzer: So they are not going out and building an image generation tool and then connecting it. That's what Bard is today. But the next iteration, these will all be built right in. These tools will be built right into the foundation model. So truly multimodal, which is number three, which we know, again, they are building in these capabilities of images, videos, audio, text, code.
[00:31:24] Paul Roetzer: All of that is getting built right into the models. Number four, this goes back to what we talked about earlier, and I think this is a really interesting concept. So he talked about a book called Thinking, Fast and Slow, by Daniel Kahneman. And in that book, which I bought, I haven't read it yet, they talk about system 1 thinking. System 1 thinking is instinctive.
[00:31:48] Paul Roetzer: It is what, like, large language models are right now. So with a system 1 model, you give it an input and it immediately gives you an output, just takes its training data and spits out the answer. System 2 thinking, you have to work things out in your head, you think through possible outcomes, you think through the implications of those outcomes; maybe in the Q-learning model, you think through the potential score or value of the future state once you take the action.
[00:32:17] Paul Roetzer: So he said, that is where we are going, is we're going to build models that have this reasoning capability, and so the more time you take to generate the output, the more accurate the prediction becomes. So right now you can sort of see parts of this where, if you tell ChatGPT, like, take your time or think it through step by step, it tends to improve the accuracy.
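(Editor's note: today that "more time" idea shows up mostly as prompting. Here is a rough sketch of the difference between a direct prompt and a step-by-step prompt; the ask_llm() helper is a made-up stand-in for whichever chat API you actually use.)

```python
# Hypothetical helper standing in for a real chat-completion call.
def ask_llm(prompt: str) -> str:
    return f"[model response to a {len(prompt)}-character prompt goes here]"

question = "A train leaves at 2:40 pm and arrives at 5:05 pm. How long is the trip?"

# "System 1"-style prompt: just ask, and the model answers instinctively.
fast_answer = ask_llm(question)

# "System 2"-style prompt: ask the model to lay out intermediate reasoning first,
# which in practice tends to improve accuracy on multi-step problems.
slow_answer = ask_llm(
    "Take your time and think through this step by step, "
    "then give the final answer on its own line.\n\n" + question
)
print(fast_answer, slow_answer, sep="\n")
```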
[00:32:40] Paul Roetzer: Now, interesting again, how it's all connected. Meta, on November 20th, just released a research paper called System 2 Attention, and then in parentheses, is something you might need too, which I actually laughed at. So you'll appreciate that, Mike. So, we go back to the origin of all this. 2017, the Transformer paper, the [00:33:00] title was Attention is All You Need.
[00:33:02] Paul Roetzer: So Meta's System 2 Attention (is something you might need too) is kind of funny. So in there, they basically talk about this premise that the more time you give these models, the better they get at reasoning. Number five of my takeaways, and what he highlighted in the future, was self-improvement. And so he talked a lot about AlphaGo.
[00:33:21] Paul Roetzer: So again, if you haven't watched the AlphaGo documentary that tells the story of DeepMind beating the world champion at Go, watch it. You will understand this even more. So he talks about how, traditionally, these models learn through imitating human expert players, but what AlphaGo did, what the DeepMind team did, was they, they enabled self-improvement by enabling it to play against itself.
[00:33:45] Paul Roetzer: And the reward function was, did you win or not? And so the way that AlphaGo got way better, got superhuman at the game of Go, was it played itself millions of times. And it learned what to do. So, [00:34:00] he asked the question, Karpathy was like, What is the equivalent for large language models? So, right now, these models use imitation of human labeling and training data.
[00:34:08] Paul Roetzer: So, he explains, like, you, you build the model, and then humans go through and provide labeling and training of the data. And so it just basically imitates us. So the question becomes, and I think this is where the real potential breakthrough in Q* is happening, how do we move to self-improvement of these models so that we don't have to rely on human labeling and training of the model once it's been created?
[00:34:34] Paul Roetzer: So the question that they grapple with is what is the reward function? So in a game like AlphaGo, you win or you lose, that's the reward. You get more points if you win than if you lose. But in a language model where it just creates an output, how do you, how do you reward that? How do you know if it was good or bad?
[00:34:50] Paul Roetzer: Right now there's a thumbs up or a thumbs down, like maybe that's part of their early effort. But he said it's not easy to evaluate a reward function in an LLM. [00:35:00] And so I made the note to myself, like, is this the breakthrough? Like, did they find a way to provide a reward system? Because right now they can only do it in narrow domains.
[00:35:09] Paul Roetzer: And so he said it's an open question in the field, but my thesis at the moment is maybe it's not as open a question as we think. That if they could find a way to, to drive self-improvement, that that would rapidly scale the capabilities of an AI toward AGI. Number six was customization. We want to be able to customize and have them become experts at specific tasks, thus the GPTs model.
[00:35:33] Paul Roetzer: So they enabled this early on through GPTs, where you can go in and create your own version of ChatGPT. It lets you add knowledge, and then a real key: it uses retrieval augmented generation, or RAG, as you've probably heard about lately if you follow the space, to create outputs. What that lets happen is,
[00:35:50] Paul Roetzer: rather than relying on a general ChatGPT model based on GPT-4, you give it documentation, like your internal documents from your company, and [00:36:00] say, look up the answer here. So it then goes and retrieves information from the source data you've provided. And he also gets into a little bit about fine-tuning models.
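(Editor's note: a bare-bones sketch of the retrieval augmented generation pattern being described. The keyword-overlap scoring, the sample documents, and the ask_llm() helper are simplified stand-ins for a real embedding search and a real model call.)

```python
# Toy RAG: score documents against the question, then paste the best match into
# the prompt so the model answers from your own data instead of general knowledge.
def ask_llm(prompt: str) -> str:
    # Hypothetical helper; swap in a real chat-completion call here.
    return f"[model response to a {len(prompt)}-character prompt goes here]"

company_docs = {
    "pto_policy.txt": "Employees accrue 1.5 days of paid time off per month worked.",
    "expense_policy.txt": "Meals under $50 do not require an itemized receipt.",
}

def retrieve(question: str, docs: dict, top_k: int = 1) -> list:
    # Crude relevance score: how many words the question shares with each document.
    q_words = set(question.lower().split())
    scored = sorted(
        docs.items(),
        key=lambda item: len(q_words & set(item[1].lower().split())),
        reverse=True,
    )
    return [text for _, text in scored[:top_k]]

def answer(question: str) -> str:
    context = "\n\n".join(retrieve(question, company_docs))
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return ask_llm(prompt)

print(answer("How many days of paid time off do employees accrue per month?"))
```

In a production setup the overlap score would be replaced by vector embeddings and the canned response by an actual model call, but the retrieve-then-generate shape is the same.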
[00:36:10] Paul Roetzer: And then the last thing deals with what you talked about, Mike, of this operating system idea. He shared a slide that's, I think, potentially really important, like, we'll look back two years from now and be like, oh, he just laid out the whole roadmap. And I think that's what he did. So I think at the end, he actually said, here's what we're working on.
[00:36:28] Paul Roetzer: Here's what everyone is working on. This is what we know is already happening. So I'll just highlight those bullet points and then turn it back over to you, Mike. So the first is, the LLM in a few years, is how he wrote it: it has more knowledge than any single human about all subjects, which you touched on, Mike.
[00:36:45] Paul Roetzer: It can browse the internet or reference local files through RAG, retrieval augmented generation. It can use the existing software infrastructure: calculator, Python, mouse, keyboard. Mouse and keyboard goes back to World of Bits. It can see and generate [00:37:00] images and video. We're starting to see that happening.
[00:37:02] Paul Roetzer: It can hear and speak and generate music. We're starting to see that happening. It can think for a long time using system 2. That's the one we just touched on. That's coming. It can self-improve in narrow domains that offer a reward function. That's what we just touched on. It can be customized and fine-tuned for specific tasks.
[00:37:22] Paul Roetzer: Many versions exist in app stores, and then it can communicate with other large language models. So, I don't know, I mean, I think when you, when you process everything he talks about, it actually starts to become pretty clear where the breakthroughs could have happened. And again, I think it could be any of them, but I think it's probably a safe bet it has something to do with reasoning capabilities and the reason Noam Brown joined OpenAI.
[00:37:52] Paul Roetzer: It has to do with giving it more time to process things, and I would make a big bet that self-improvement is a real key [00:38:00] aspect of it. They have found some way to self-improve, because once you do that, it becomes harder to keep these things from progressing at a rate that we're not prepared for in society.
[00:38:10] Paul Roetzer: So I know that was a lot. Mike, you watched it too. Is there anything that I didn't touch on that jumped out to you or just any other thoughts you had about it?
[00:38:19] Mike Kaput: No, I just think it's worth reemphasizing that what we're talking about is literally the speculation from a very informed person building this technology that we are going to very quickly have large language models, or call them multimodal models, whatever term you want to use, that can know more about subjects
[00:38:41] Mike Kaput: than any single human and can do all these things. And that to me sounds like something severe enough that they almost torpedoed and destroyed the entire company over fears about it. So I think the severity of the response, whether it was right or [00:39:00] wrong or indifferent, is something to keep in mind as you are evaluating this, because I don't think you torpedo one of the top companies on the planet over hypothetical concerns, personally.
[00:39:16] Paul Roetzer: I think it's reasonable to, to assume that the outline of those last few bullet points I said, like LLM in a few years, that that is not just the blueprint for GPT-5. Like that's what they are building. It's what everybody's building. It's what Gemini probably looks like. So yeah, I, I just, again, like you said, the key for us is we just go to the source.
[00:39:40] Paul Roetzer: Like we're not looking at some overnight AI expert or influencer online and, like, trying to draw some conclusions. This is years of tracking, like, the key players in the industry, trying to connect: why does Noam go there three months after Andrej? Why is superalignment announced the day before Noam [00:40:00] joined? Like, why has all this happened?
[00:40:02] Paul Roetzer: And I think when you start going to those, the actual sources of the information, when they are willing to put it out there in interviews or online courses or whatever it is, you have to really, again, like a language model, you have to add heavier weights to the information coming from the actual sources.
[00:40:21] Paul Roetzer: And I think in this case, there's probably a lot to what he's laying out beyond just his ideas. I think this is very real about the blueprint being followed within these research labs.
[00:40:33] Mike Kaput: Yeah, and I would just as a final note add that if you go back over the past year, especially, but even further back, your predictions on a lot of this stuff have been quite spot on.
[00:40:44] Mike Kaput: Obviously, we don't get everything right, but your strategy of going straight to the source and reading between the lines has predicted some of these things functionally months in advance.
[00:40:56] Paul Roetzer: Yeah, and I, like, that's, I think Noam is a good example of that, [00:41:00] and Andrej's too. Like, there's no coincidences in this space. Like, I've been studying this space long enough to know that there's usually reasons why all this stuff appears to be connected, because it actually is connected.
00:41:14 — Ethan Mollick’s article on how AI should cause companies to reinvent themselves
[00:41:14] Mike Kaput: That's actually a very good segue into the third and final topic here, because we just got an interesting article from AI expert and Wharton professor Ethan Mollick, who published this preliminary blueprint for how companies should be reinventing themselves due to the organizational changes caused by AI.
[00:41:35] Mike Kaput: Now, Mollick doesn't have, doesn't work for OpenAI, but he often sees behind the curtain quite a bit with some special access he gets. I think it's also quite interesting that he is publishing this now, because he basically notes that with the tools we already have, the AI technology that already exists, we already have the ability to radically change the way we work.
[00:41:59] Mike Kaput: He [00:42:00] says, quote, theoretical discussions become practical, drudge work is removed, and even more importantly, hours of meetings are eliminated and the remaining meetings are more impactful and useful. A process that used to take a week can be reduced to a day or two. Now this is where he sees us already being today, and much, much more may be possible in the near future.
[00:42:24] Mike Kaput: He writes, we can already see a world where autonomous AI agents start with a concept and go all the way to code and deployment with minimal human intervention. Importantly, this is, in fact, a stated goal of OpenAI's next phase of product development. It is likely that entire tasks can be largely outsourced to these agents with humans acting as
[00:42:48] Mike Kaput: supervisors. Now his point here is that AI is impacting companies already and managers need to start taking an active role in shaping what this looks like. He fully [00:43:00] admits, I don't have all the answers yet, but he does offer some advice for doing so. Just briefly, some of that advice is letting teams develop their own methods, because AIs perform more like people than software, even though
[00:43:14] Mike Kaput: they are software. They are best managed as additional team members. Teams will need to figure out their own ways to use AI through experimentation and a way of sharing those methods with each other and with leadership. He also says you should be building for the oncoming future. In this article, everything I've shown you is already possible today using GPT-4.
[00:43:37] Mike Kaput: If we learned one thing from the OpenAI leadership drama, it is clear that more advanced models are coming and coming fast. Organizational change takes time, so those adapting processes to AI should be considering future versions of AI rather than just building for the models of today. And this last piece of advice I found [00:44:00] interesting, and it's coming at this particular juncture.
[00:44:03] Mike Kaput: He says straight up, you don't have time. If the sort of efficiency gains we are seeing from early AI experiments continue, organizations that wait to experiment will fall behind very quickly. If we can truly trim a weeks-long process into a days-long one, that is a profound change to how work gets done.
[00:44:25] Mike Kaput: Now, Paul, we obviously are very close followers of Ethan Mollick. I found the timing of this interesting. I found the stark language he used interesting. What were your thoughts on his advice?
[00:44:37] Paul Roetzer: Yeah, a couple other excerpts that jumped out to me. He talked about how anyone can add intelligence to a project with AI, and evidence shows that people are already doing this.
[00:44:47] Paul Roetzer: They are just not telling their bosses. So he cited a survey that found over half of people using AI at work are doing so without approval, and 64 percent have passed off AI work as their own, which should be [00:45:00] terrifying to people who understand the copyright implications of that and other concerns. He said this sort of shadow AI use is possible as large language models are uniquely suited to handling organizational roles.
[00:45:13] Paul Roetzer: They work at a human scale. They can read documents, write emails, adapt to context, and assist with projects without requiring users to have specialized training or complex custom-built software. What does it mean for organizations when we acknowledge that this is happening? How do you rebuild an organization on a fundamental shift in the way work is done, organized, and communicated?
[00:45:33] Paul Roetzer: And as you said, he doesn't have the answers, nobody does. But it's impacting organizations and managers have to start taking an active role. Now, his three points at the end that you outlined, I just wanted to add a couple of thoughts on those. So the first is let teams develop their own methods. Great in theory. Like, I don't see how that works.
[00:45:54] Paul Roetzer: Like, we talk with a lot of big enterprises, and a lot of enterprises are not only limiting [00:46:00] but restricting their, their access to, to AI tools. So if you're going to follow this premise, which I actually believe in, like I do think that we have to democratize the access to these tools and the innovation that can come from them, I think that the practitioners will be the people finding the most interesting use cases when they are given the freedom to explore them and experiment. But in a lot of big enterprises, they are not going to be allowed to, no matter how much we say that that's the path forward.
[00:46:30] Paul Roetzer: So at minimum, I think organizations really need to get their generative AI policies and responsible AI principles in place to allow this to happen. So even if you're an organization that's like, yeah, go for it, like our organization is, like, yeah, test stuff, like whatever, go ahead, like, as long as you're not connecting it to sensitive data, like go ahead and test whatever you want and find some use cases.
[00:46:48] Paul Roetzer: You still need to give people the guardrails of what they are allowed to do and make sure that they understand how to properly use these tools. Otherwise you're putting your organization at risk. The second thing is build for the future. [00:47:00] I agree a hundred percent. We talk about this all the time in our, when we do presentations, when we run workshops, but I struggle to find any organizations that are correctly planning for the present.
[00:47:13] Paul Roetzer: So, like, most companies we talk to, most big enterprises, they don't have a plan for what to do about GPT-4. Like most people, when you poll them, haven't even tried ChatGPT Plus. Their perception of what AI does is the free version of ChatGPT. So, I agree 100 percent that the more forward-thinking organizations, the ones that will win, are thinking about everything we just went through in the previous topic.
[00:47:40] Paul Roetzer: They are thinking about, what is a large language model 6 to 12 months from now? What's it going to be capable of? What does that mean to us? But I just don't see organizations planning in that way. I hope that changes in 2024, but as of right now, we're talking to the companies that are building their plans for 2024 and they have not built this into [00:48:00] them.
[00:48:01] Paul Roetzer: And the last is: you don't have time, like this is going to happen fast. Agree a hundred percent. So what I'll reiterate is the way we teach this in workshops and when we do advisory work. There are five steps that every company needs to be taking, and it's a good time to revisit them as we're nearing the end of 2023.
[00:48:19] Paul Roetzer: Number one, education and training. You have to get your team on board with this. To get team experimentation, to give them the freedom to do this, they have to know how to use it. To find the right pilot projects in your company, you need a team-wide understanding of it. So prioritize education and training going into next year.
[00:48:36] Paul Roetzer: Number two, you have to have an AI council that guides the development of this and monitors the progress. That council should be cross-functional throughout your organization: not just marketing, but sales and service and operations and finance and legal and everything. Number three, policies and principles, which we talked about: get the generative AI policies and responsible AI principles in place, and then adhere to them, teach them, make them a part of the culture.
[00:48:59] Paul Roetzer: Number [00:49:00] four: if you're talking about building for the future, you have to do impact assessments of your team. How is AI going to impact the roles in your organization? How is it going to impact your products and services, your operations? And then the last piece is to build an AI roadmap.
[00:49:13] Paul Roetzer: That roadmap lays out the priority projects that are going to be initiated, how it's going to affect different teams, how it's going to affect the tech stack. So I love Ethan's thinking, obviously, and his writing. But beyond the three points he outlines, you really need to take action now and start thinking about those five steps I just outlined and what you can do in your organization starting in December to get that stuff moving.
[00:49:38] Mike Kaput: Yeah, I would hope the main thread people realize is running through these three main topics is that it's probably time to act with some urgency. Excellent. All right, let's jump into some quick rapid-fire topics. We have a bunch of other updates that are significant this past week.
00:49:54 — Anthropic releases a new version of Claude, Claude 2.1
So first up, Anthropic has [00:50:00] released Claude 2.1, which is the latest version of its powerful foundation model.
[00:50:01] Mike Kaput: Now, interestingly, the model has what they are calling an industry-leading context window of 200,000 tokens. It also has a 2x decrease in hallucination rates compared to the previous version of Claude. This new version also adds the ability to integrate Claude with other services, databases, and APIs, along with an upgraded developer experience.
[00:50:31] Mike Kaput: Now, the context window and decrease in hallucination rates seem particularly notable. Regarding this enlarged context window, Anthropic says you can now relay roughly 150,000 words, or over 500 pages of information, to Claude. That means you can upload entire codebases, financial statements, or long literary works for Claude to summarize,
[00:50:57] Mike Kaput: perform Q&A, forecast trends, [00:51:00] compare and contrast multiple documents, and more. They also say the decrease in hallucination rates, quote, enables enterprises to build high-performing applications that solve business problems with accuracy and reliability. Now you can actually test drive this new version of Claude by going to claude.ai
[00:51:19] Mike Kaput: or using Anthropic's API. So Paul, this seems like a fairly significant step forward for Anthropic. Can you give us some more context about just how significant it is?
[00:51:31] Paul Roetzer: I think just more of the context around the company. We talk about Anthropic a lot. So they were founded in 2021. Dario Amodei was a leading executive at OpenAI, focused on safety.
[00:51:43] Paul Roetzer: He felt OpenAI was deviating from their original mission of AI for the good of humanity. So he left and took 10% of the OpenAI team with him. They have since raised five and a half billion [00:52:00] dollars. And if you recall from episode 73, I think I talked about this: one of the OpenAI board's initiatives over the weekend, so they fired Sam on a Friday, and I think by Saturday they reached out to Dario to see if merging OpenAI and Anthropic was an option.
[00:52:15] Paul Roetzer: So Anthropic's a major player here. They are focused on AI safety, and they believe that the way to achieve AI safety and protect humanity from these AGI systems that are coming is to build big, powerful models so you can learn how to protect against them. So that's kind of the story of Anthropic.
[00:52:34] Paul Roetzer: You're going to continue to hear about them nonstop. They are a key company in the future of AI.
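Editor's note: for readers who want to try the expanded context window programmatically, here is a minimal sketch using Anthropic's Python SDK. The API key placeholder, file name, and prompt wording are illustrative assumptions, not an official Anthropic example; check Anthropic's documentation for current usage.

```python
# Minimal sketch: sending a long document to Claude 2.1 via Anthropic's Python SDK.
# Assumes the `anthropic` package is installed; file path and prompt are hypothetical.
import anthropic
from anthropic import HUMAN_PROMPT, AI_PROMPT

client = anthropic.Anthropic(api_key="YOUR_API_KEY")  # placeholder key

# Claude 2.1's 200K-token window holds roughly 150,000 words, so a long report
# or codebase excerpt can be passed in a single prompt.
with open("annual_report.txt") as f:  # hypothetical document
    document = f.read()

response = client.completions.create(
    model="claude-2.1",
    max_tokens_to_sample=1024,
    prompt=f"{HUMAN_PROMPT} Here is a document:\n\n{document}\n\n"
           f"Summarize the key points and flag anything inconsistent.{AI_PROMPT}",
)
print(response.completion)
```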
00:52:42 — Inflection unveils Inflection-2, an AI model that may outperform Google and Meta
[00:52:42] Mike Kaput: So we've also seen another big model update, at Inflection AI. This is the company that creates the Pi chatbot, and they unveiled a new AI model called Inflection-2. The company claims this model can actually outperform PaLM 2 from Google on a number of standard [00:53:00] benchmarks, and it says it can outperform Meta's Llama 2 on different measures.
[00:53:05] Mike Kaput: And they've said that this new, more powerful model is soon going to be integrated into Pi. So Paul, I'm kind of curious, given the context behind Anthropic, what is some similar context around Inflection that's helpful for understanding how they fit into this ecosystem?
[00:53:22] Paul Roetzer: Inflection was founded in 2022 by Mustafa Suleyman, who was one of the co-founders of DeepMind.
[00:53:27] Paul Roetzer: So when we talked about AlphaGo earlier from DeepMind, which is now the main research lab within Google, Mustafa was one of the co-founders. I believe he worked on AlphaGo; I think that was one of the things he was working on. They've raised over a billion dollars so far. He has publicly said that, I think it was in August, they had 6,000 GPUs from NVIDIA.
[00:53:49] Paul Roetzer: That was what they were doing their training on. By December of this year, he said they would have 22,000, and they were going to continue to train more powerful models. As we learned earlier, if you watch the intro [00:54:00] to AI from Karpathy, you will understand that the more GPUs from NVIDIA they have,
[00:54:04] Paul Roetzer: the more powerful training runs they can do. He recently released a book called The Coming Wave, which I have read. Have you read The Coming Wave yet, Mike? I have not yet, no. It's good. It's very macro-level. He's a very smart guy, obviously, on a lot of different topics. The Coming Wave is good if you want to go deeper and understand the broader implications for the world.
[00:54:25] Paul Roetzer: So again, Inflection is a key company. Their approach is personal assistants. They want this thing to be your friend, your therapist, your business coach, your strategist. Again, if you're only experimenting with ChatGPT, you're not getting the true sense of what's going on out there. I think with Inflection, if you, let's say once a month, dedicate
[00:54:46] Paul Roetzer: half a day to testing these models and seeing what's going on, you're going to test ChatGPT, you're probably going to test Google Bard, Anthropic's Claude for sure. Inflection is a different experience. So I think to understand where we are and where we're [00:55:00] going, playing around with Inflection will give you a greater sense of some of the other capabilities that are being developed and some of the different approaches to how to build these models.
00:55:10 — Google’s Bard Chatbot can now answer questions about YouTube videos
[00:55:10] Mike Kaput: So it turns out Google's Bard chatbot can now answer questions about YouTube videos. This happens through the YouTube extension for Bard. Previously that extension allowed you to simply find different videos, but now you can ask Bard specific questions about a video's contents. As an example, Google
[00:55:32] Mike Kaput: says, if you're looking for videos on how to make olive oil cake, you can now also ask how many eggs the recipe in the first video requires. So Paul, this sounds like they are baking in, essentially, the ability to chat with and query YouTube data that's locked into videos. How significant do you see that being?
[00:55:52] Paul Roetzer: No pun intended on the baking. Yeah, I think we're starting to see the path [00:56:00] forward for Google: the more they integrate their own solutions and tools into Bard, the more valuable Bard becomes. So connecting it to Gmail, connecting it to YouTube, having these kinds of video recognition capabilities where it's able to see, in theory, what's going on in the video, as well as interpreting the transcript and giving you the ability to search the transcript.
[00:56:20] Paul Roetzer: And again, I think this is just a prelude to Google's Gemini model, which I assume will have this all built right into the foundation model itself. Right now it's happening by connecting to these other things, but I assume it's going to kind of all be rolled together in Gemini.
00:56:37 — ElevenLabs Speech to Speech tool
[00:56:37] Mike Kaput: So ElevenLabs, which is a leading voice cloning tool, has released a speech-to-speech tool, which acts as an AI-powered voice converter.
[00:56:48] Mike Kaput: According to the company, it lets you turn a recording of one voice to sound as if spoken by another. It lets you control the emotion, tone, and pronunciation beyond what's [00:57:00] possible with text-to-speech prompts alone. So ElevenLabs sees the primary use cases for this tool revolving around extracting emotions from voices or fine-tuning the intonation
[00:57:13] Mike Kaput: in voices as desired. So Paul, we've talked a bit about voice cloning and voice converting technology. It sounds like this space is moving very, very fast.
[00:57:24] Paul Roetzer: Yeah, I haven't tested ElevenLabs myself yet, but I watch them pretty closely, and I think we're just going to see a lot of innovation in the audio and speech space moving into the coming year, and so much capability is going to be unlocked, for good and bad.
[00:57:42] Paul Roetzer: I mean, you can obviously see how this goes really wrong, and it seems like they just kind of put stuff out there. It's kind of like Stability, which we'll talk about next: they operate on the model of just putting it into the world and letting people figure out the good and bad uses for themselves.
[00:57:59] Paul Roetzer: [00:58:00] So yeah, I know some people in my network have been playing with ElevenLabs and been impressed by it. Cool.
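Editor's note: for the curious, here is a rough sketch of what calling a speech-to-speech conversion endpoint could look like over HTTP. The specific ElevenLabs endpoint path, request fields, and file names below are assumptions based on the company's public API conventions, not details from the episode; consult ElevenLabs' documentation before relying on this.

```python
# Rough sketch of a speech-to-speech request: convert a source recording so it
# sounds like a chosen target voice. Endpoint, headers, and fields are assumed.
import requests

API_KEY = "YOUR_XI_API_KEY"        # placeholder
TARGET_VOICE_ID = "voice_id_here"  # hypothetical target voice

url = f"https://api.elevenlabs.io/v1/speech-to-speech/{TARGET_VOICE_ID}"  # assumed path
headers = {"xi-api-key": API_KEY}

with open("source_recording.mp3", "rb") as audio_file:  # hypothetical input file
    resp = requests.post(url, headers=headers, files={"audio": audio_file})

resp.raise_for_status()
with open("converted.mp3", "wb") as out:
    out.write(resp.content)  # audio rendered in the target voice
```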
00:58:06 — StabilityAI releases Stable Video Diffusion
[00:58:06] Mike Kaput: So you did mention StabilityAI, and they just announced the release of Stable Video Diffusion, which is their first foundation model for generative video. And like some models from Runway, who we've talked about before, Stable Video Diffusion generates video using AI.
[00:58:24] Mike Kaput: And Stability is claiming Stable Video Diffusion even surpasses Runway's models in user preference studies. Right now, this model is only in a research preview, so the company has stated it's not ready yet for commercial use. They did announce, however, that you can start signing up via waitlist for a new text-to-video interface.
[00:58:45] Mike Kaput: And they said that this tool showcases the practical applications of Stable Video Diffusion in numerous sectors, including advertising, education, entertainment, and beyond. So, Paul, it sounds like an early release here, but... [00:59:00] you had seen some signals that Emad, the CEO of StabilityAI, actually might be hinting at something more to come.
[00:59:09] Paul Roetzer: Yeah. He's one of the best vague tweeters out there. He always tweets stuff about what's coming, what he's hearing from other labs and then from his own lab. So in this case, he was tweeting about his own. It just said, and this was last night, Sunday night, November 26th:
[00:59:26] Paul Roetzer: Should we release bunches of models at once or like one every few days? With the pondering emoji. So I would expect more is coming from Stability before the year ends.
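Editor's note: for those who want to experiment with the research preview locally, here is a minimal sketch using the Hugging Face diffusers library, which added a Stable Video Diffusion pipeline around the time of the release. The model ID, input image, and settings below are assumptions; check the model card and current diffusers documentation.

```python
# Minimal sketch: image-to-video generation with Stable Video Diffusion via diffusers.
# Requires a CUDA GPU with sufficient VRAM; model ID and file names are assumed.
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",  # assumed Hugging Face model ID
    torch_dtype=torch.float16,
)
pipe.to("cuda")

# The model animates a single conditioning image into a short clip.
image = load_image("product_shot.png")  # hypothetical input image
frames = pipe(image, decode_chunk_size=8).frames[0]
export_to_video(frames, "clip.mp4", fps=7)
```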
00:59:39 — Cohere launches a suite of fine-tuning tools to customize AI models.
[00:59:39] Mike Kaput: All right. And last but not least on the docket today, foundation model company Cohere just launched some new fine-tuning capabilities.
[00:59:47] Mike Kaput: So these include things like the ability to do fine-tuning for chat, fine-tuning for search and recommendation systems, and fine-tuning related to text analysis use cases. The company said that these [01:00:00] latest additions, alongside our existing generative fine-tuning solution, complete a comprehensive suite designed to cater to a diverse range of enterprise AI applications for fine-tuning.
[01:00:12] Mike Kaput: Now, Paul, given your knowledge and how you've followed along with Cohere over the years, how significant is this, and can you unpack the importance of these kinds of fine-tuning capabilities?
[01:00:25] Paul Roetzer: I think we just go back to Karpathy's outline: fine-tuning of models is one of the future things. Obviously, the more training data you can give a model that's specific to your company, your vertical, your use case, the more powerful the model becomes.
[01:00:42] Paul Roetzer: So, Cohere, again, in context: Aidan Gomez, one of the co-founders, was one of the authors of the Attention Is All You Need paper in 2017 at Google. He left and founded Cohere, which has raised like 400 million or something like that. I think they just raised a round a couple months ago. So, again, [01:01:00] another major player.
[01:01:01] Paul Roetzer: Their path, their go-to-market, seems to be going after enterprise and building custom versions of these models that can be on-premise or in the cloud. I think they are connected in AWS, and I believe they are also connected in Google Cloud. So if you're an AWS or Google Cloud customer, you can connect and fine-tune based on the data you already store in those clouds.
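Editor's note: here is a conceptual sketch of the fine-tuning workflow Paul describes: assemble company-specific examples, create a custom model from them, and then call that model. The dataset fields, file names, and model ID below are illustrative assumptions, not Cohere's documented format; see Cohere's fine-tuning docs for specifics.

```python
# Conceptual sketch: prepare domain-specific training data, then call a fine-tuned model.
# Field names, file names, and the custom model ID are hypothetical.
import json
import cohere

# 1. Collect examples specific to your company, vertical, or use case.
examples = [
    {"prompt": "Summarize this support ticket: ...", "completion": "Customer reports ..."},
    {"prompt": "Draft a renewal email for account X: ...", "completion": "Hi Jordan, ..."},
]
with open("training_data.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# 2. Upload the dataset and create the fine-tune through Cohere's dashboard or API
#    (omitted here); that step returns a custom model ID.

# 3. Call the fine-tuned model like any other Cohere model.
co = cohere.Client("YOUR_API_KEY")  # placeholder
reply = co.chat(
    message="Summarize this support ticket: printer offline since Tuesday ...",
    model="your-finetuned-model-id",  # hypothetical custom model ID
)
print(reply.text)
```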
[01:01:23] Paul Roetzer: So yeah, another major player. And it just makes me realize this whole episode was really all about the foundation model companies and large language models and what comes next, which honestly was not even by design. It just sort of happened that way, didn't it, Mike? Yeah, yeah. The topics for the week just sort of fell into place. But I think it's good, because, again, I know this episode we've covered a lot.
[01:01:45] Paul Roetzer: But I really think, if you have to go back and listen to it a couple times, I would do it. There's a lot we talked about today whose importance is going to become very apparent to you in the next couple [01:02:00] months. I really think this lays a pretty good foundation for thinking about what these language models are going to be capable of and where the major research labs are going in the coming three to six months; it's hard to predict anything beyond that.
[01:02:15] Paul Roetzer: But I think this gives us a pretty good roadmap for that. And so as you're thinking about the impact on your company and your business strategies and your tech stack and your team, this is the kind of thinking you really need to have. So yeah, I found this episode helpful to prepare for: watching the Karpathy video over the weekend and processing that, going back to some of the Noam Brown stuff we wrote and talked about, and the early Karpathy stuff, and even rethinking AlphaGo and that documentary, which I've probably watched like five times.
[01:02:49] Paul Roetzer: I may watch it a sixth time, because I feel like it's starting to take on new meaning again. You start to look at this stuff because, when you try and project what Google Gemini is going to be, [01:03:00] I think you have to understand what DeepMind is and what they've built with things like AlphaGo, because that was a breakthrough in 2015, 2016, and we're just now
[01:03:13] Paul Roetzer: trying to ponder, well, what if we could connect that capability to a large language model? What happens if Google pulls that off, and you give this model the ability to search for answers and apply reasoning and take its time to think through its outputs? That's wild. When you try and ponder it, and I even made a note to myself over the weekend, I have to think about what this means.
[01:03:35] Paul Roetzer: Some of the things I watched in the Karpathy video, and some of the things going back to Cicero with Noam... I don't know, I need another week off after Thanksgiving week to really think about all this. It's a lot, but in my head I can start to see all the pieces coming together: how all this connects and what it might imply for all of us trying to figure out,
[01:03:59] Paul Roetzer: for our [01:04:00] businesses and our careers, what it means.
[01:04:02] Mike Kaput: Well, it's definitely an exciting time, and I'll just wrap up here by saying if you're trying to keep up with all the news, all the topics we have covered today, there's lots that doesn't make the list each week. So check out the Marketing AI Institute newsletter at marketingAIinstitute.com/newsletter.
[01:04:20] Mike Kaput: It covers everything this week in AI that you need to know. I highly recommend signing up for that if you're new to our audience and haven't done so. And also, just as a final note here, Paul and I do many, many speaking engagements and workshops to help companies get clarity on where they are going with AI, where AI is going, and how that will impact them.
[01:04:43] Mike Kaput: So if you're interested in having one of us come speak at your organization, check out our website. Go to About and click on Speaking, and you can find all the details there. Paul, thanks again for unpacking another crazy week in AI.
[01:04:58] Paul Roetzer: Yeah, I have a feeling this [01:05:00] week's going to be crazy too. We will be back next week with our regularly scheduled program.
[01:05:04] Paul Roetzer: So thank you. Be sure to subscribe and share if you get value from the podcast. And again, Mike and I love to hear from you, so connect with us on LinkedIn. I know both of us are really active on LinkedIn. And Twitter. I'm just going to keep calling it Twitter. Twitter is my source for like 90 percent of what I learn about AI and how I stay connected.
[01:05:26] Paul Roetzer: It's invaluable to me, so I'm pretty active there. So, Twitter and LinkedIn, we'd love to hear from you and stay connected. And until next week, we'll see you again.
[01:05:37] Paul Roetzer: Thanks for listening to the Marketing AI Show. If you like what you heard, you can subscribe on your favorite podcast app, and if you're ready to continue your learning, head over to www.marketingaiinstitute.com. Be sure to subscribe to our weekly newsletter, check out our free monthly webinars, and explore dozens of online courses and professional certifications.
[01:05:59] Paul Roetzer: Until [01:06:00] next time, stay curious and explore AI.