Marketing AI Institute | Blog

[The AI Show Episode 84]: OpenAI Releases Sora, Google’s Surprise Launch of Gemini 1.5, and AI Rivals Band Together to Fight Deepfakes

Written by Claire Prudhomme | Feb 20, 2024 1:10:00 PM

After a whirlwind week of AI announcements, hosts Paul Roetzer and Mike Kaput break down and analyze some key updates! Episode 84 of The Artificial Intelligence Show discusses the potential of OpenAI's new video generation model, Sora, the advancements in Google's Gemini 1.5, and efforts by major tech companies, including Meta, to label AI-generated content through the C2PA standard.

Listen or watch below—and see below for show notes and the transcript.

Listen Now

Watch the Video

Timestamps

00:03:12 — OpenAI Releases Sora

00:18:59 — Google’s next-gen model: Gemini 1.5

00:29:50 — Big tech gets behind C2PA industry standards for labeling AI content

00:41:21 — Researcher Andrej Karpathy Departs OpenAI

00:44:20 — Meta releases Video Joint Embedding Predictive Architecture (V-JEPA) model

00:49:33 — Memory and new controls for ChatGPT

00:56:11 — OpenAI Develops Web Search Product

00:58:31 — Judge rejects most ChatGPT copyright claims from book authors

01:01:33 — Revisiting Paul’s MAICON 2023 Keynote

Summary

OpenAI Releases Sora, A New Text-to-Video Model

We have been saying 2024 would be the year of AI video, and this prediction appears to be trending in the right direction.

OpenAI has teased a stunning new text-to-video model called Sora that is shocking the internet.

Sora is an AI model that can create realistic video from a simple text prompt. But what has everyone talking is the apparent quality of the output: Sora can generate videos up to a minute long that appear incredibly realistic and smooth.

MIT Technology Review describes the initial videos shown by OpenAI as “high definition and full of detail,” and indeed they look stunning, showing hyper-realistic scenes like a woman walking through Tokyo at night and a movie trailer featuring an astronaut.

Based on these demos, Sora appears to be giving existing video generation tools a run for their money, and it looks night and day compared with where this technology was only a year or so ago.

Our next-generation model: Gemini 1.5

Google announced Gemini 1.0, its most advanced model, with Gemini Nano, Pro, and Ultra versions, in December 2023.

Just last week, the company released Ultra 1.0 as part of its new Gemini Advanced paid subscription tier. And now, in a somewhat surprising announcement, Google says Gemini 1.5 is ready for primetime.

Said Google CEO Sundar Pichai in a blog post this past week: “It shows dramatic improvements across a number of dimensions and 1.5 Pro achieves comparable quality to 1.0 Ultra, while using less compute.”

“The new generation [of Gemini] also delivers a breakthrough in long-context understanding. We’ve been able to significantly increase the amount of information our models can process — running up to 1 million tokens consistently, achieving the longest context window of any large-scale foundation model yet.”

Not to mention, the new model appears to have impressive “in-context learning,” which means it can learn new skills from information given in a long prompt—without additional fine-tuning.

New AI Image Labeling Could Combat Deepfakes

Telling what is real and what is not online is becoming increasingly difficult thanks to hyper-realistic deepfakes and synthetic content generated by AI.

Leading AI companies are making an attempt to fix the problem. In the past couple weeks, Meta, OpenAI, and Google have announced they will join Microsoft, Adobe, and others in embracing Content Credentials, a technical standard for media provenance from C2PA.

C2PA is a standards organization (the name stands for Coalition for Content Provenance and Authenticity) and it is working on ways, in partnership with over 100 companies, to identify where content came from online.

The C2PA standard has publishers and companies embed metadata in media files to verify the media’s origin. This metadata can be used to see if an image, for instance, was created with an AI tool.

For example, you can now view additional metadata in any image generated by ChatGPT’s DALL-E 3 capabilities, or the OpenAI API, and see the AI tools used to generate it. Meta is going one step further. The company says it is already using metadata to label images created with its Meta AI tool.

But the company is now “building industry-leading tools that can identify invisible markers at scale – specifically, the 'AI generated' information in the C2PA and IPTC technical standards – so we can label images from Google, OpenAI, Microsoft, Adobe, Midjourney, and Shutterstock as they implement their plans for adding metadata to images created by their tools.”
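To make the metadata idea concrete, here is a minimal sketch, assuming a JPEG file, of how a tool might look for the two kinds of markers Meta mentions. It rests on two assumptions worth flagging: that C2PA manifests in JPEGs travel in APP11 segments as JUMBF boxes labeled `c2pa`, and that IPTC's AI label is the `trainedAlgorithmicMedia` digital source type inside the XMP packet. The function names are hypothetical, and the sketch only detects markers; it does not validate C2PA signatures.

```python
import struct

# IPTC "digital source type" URI for media created by a generative AI model.
IPTC_AI_SOURCE = b"http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia"

def jpeg_segments(data: bytes):
    """Yield (marker, payload) for each JPEG header segment before the image data."""
    if data[:2] != b"\xff\xd8":  # every JPEG starts with the SOI marker
        raise ValueError("not a JPEG file")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at byte {pos}")
        marker = data[pos + 1]
        if marker in (0xDA, 0xD9):  # start of scan or end of image: no more headers
            break
        # The 2-byte big-endian length counts itself but not the 2 marker bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        yield marker, data[pos + 4:pos + 2 + length]
        pos += 2 + length

def provenance_hints(data: bytes) -> dict[str, bool]:
    """Report whether C2PA or IPTC AI-generation markers appear to be present.

    Detection only: no C2PA signature or manifest content is verified.
    """
    hints = {"c2pa_manifest": False, "iptc_ai_generated": False}
    for marker, payload in jpeg_segments(data):
        if marker == 0xEB and b"c2pa" in payload:  # APP11: JUMBF boxes (assumed C2PA home)
            hints["c2pa_manifest"] = True
        if marker == 0xE1 and IPTC_AI_SOURCE in payload:  # APP1: XMP metadata packet
            hints["iptc_ai_generated"] = True
    return hints
```

A `False` result proves little on its own, since re-saving or screenshotting an image can strip this metadata entirely.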

Today’s episode is brought to you by Marketing AI Institute’s AI for Writers Summit presented by Jasper, happening virtually on Wednesday, March 6 from 12pm - 4pm Eastern Time. To register, go to AIwritersummit.com


Read the Transcription

Disclaimer: This transcription was written by AI, thanks to Descript, and has not been edited for content.

[00:00:00] Mike Kaput: I don't want to pick on Meta here, but we're basically asking

[00:00:05] Mike Kaput: an organization that has routinely failed to regulate its platform adequately to now regulate this at scale.

[00:00:15] Welcome to the Artificial Intelligence Show, the podcast that helps your business grow smarter by making AI approachable and actionable. My name is Paul Roetzer. I'm the founder and CEO of Marketing AI Institute, and I'm your host. Each week, I'm joined by my co-host and Marketing AI Institute Chief Content Officer, Mike Kaput, as we break down all the AI news that matters and give you insights and perspectives that you can use to advance your company and your career.

[00:00:45] Join us as we accelerate AI literacy for all.

[00:00:52] Paul Roetzer: Welcome to episode 84 of the Artificial Intelligence Show. I'm your host, Paul Roetzer, along with my co-host as always, Mike Kaput. Hello, Mike.

[00:01:01] Mike Kaput: Hey Paul, how's it going?

[00:01:03] Paul Roetzer: Good. We are doing this on a Sunday morning due to travel schedules for the upcoming week. It is Sunday, February 18th; you are probably listening to this on February 20th or later.

[00:01:16] Paul Roetzer: So hopefully if anything crazy happens on Monday, I don't know.

[00:01:19] Paul Roetzer: I hope the AI industry just got the craziness out of its system this past week because it was a wild week.

[00:01:27] Paul Roetzer: We have a lot to talk about this morning. Man, I don't know about you, but Thursday is when everything kind of hit.

[00:01:37] Paul Roetzer: Yeah. And I had four presentations Thursday: a 7:30 a.m., a 7:30 p.m., a 1 p.m. workshop,

[00:01:44] Paul Roetzer: and then a 3:30 p.m. thing, and during the 1 p.m. workshop was when OpenAI dropped the video generation stuff we're going to talk about. By the end of the day Thursday, I was so mentally fried, but at the same time, I was like, man, I can't wait to do the

[00:02:00] Paul Roetzer: podcast, because there's so much to talk about.

[00:02:03] Paul Roetzer: Yeah. So we have a lot to unpack for you. Mike and I are going to do our best to try and make sense of one of the crazier weeks in AI that I can remember.

[00:02:11] Paul Roetzer: But before we do that, let's get into the sponsor. Today's episode is brought to us by Marketing AI Institute's AI for Writers Summit, which is coming up fast.

[00:02:21] Paul Roetzer: That is presented by Jasper. It is happening virtually on Wednesday, March 6th, from noon to 5 p.m. Eastern Time. We had over 4,000 writers, editors, and content marketers join us for the inaugural event in March 2023.

[00:02:36] Paul Roetzer: So we are back in 2024 with an amazing agenda: the state of..., tools and platforms to use,

[00:02:43] Paul Roetzer: implications for copyright and intellectual property, how to adopt AI writing platforms in the enterprise, and an AI in action demo session with Mike and Kathy. It's going to be incredible. Just an amazing day. It's a free event, with a free ticket option thanks to Jasper, so you can go to AIwriterssummit.com

[00:02:59] Paul Roetzer: to learn more about that. And we hope to see you there in a few short weeks. So, Mike, let's just go ahead and get into it all because, like we said up front, there is a lot to try and unpack here.

[00:03:12] OpenAI Releases Sora

[00:03:12] Mike Kaput: All right, Paul, so first up, OpenAI has teased a stunning new text-to-video model. It's called Sora, and it's blowing up. Sora is an AI model that can create

[00:03:26] Mike Kaput: realistic video from a simple text prompt. Now, what has everyone talking is the apparent quality of the output. Sora can generate videos up to a minute long that appear to be incredibly realistic and smooth.

[00:03:41] Mike Kaput: MIT Technology Review describes the initial videos shown by OpenAI as, quote, "high definition and full of detail." And indeed, the demos we've seen so far look pretty stunning. They show a hyper-realistic scene of a woman walking through Tokyo at night and a really cool, vivid, realistic-looking movie trailer featuring an astronaut.

[00:04:06] Mike Kaput: Based on these demos at least, it looks like Sora is giving existing video generation tools a run for their money. And honestly, it looks night and day from where this technology was only a year or so ago. Now, Paul, I don't think there's any question that Sora is blowing up the AI corners of the internet, even though we don't have access to the tool yet, so we're reliant simply on these

[00:04:37] Mike Kaput: cherry-picked examples. But regardless, it really seems like an insane amount of innovation in AI-generated video has happened compared to just a year ago. Do you agree with that?

[00:04:52] Mike Kaput: Is that what...

[00:04:53] Paul Roetzer: Yeah, definitely. I mean,

[00:04:56] Paul Roetzer: So we've been saying for a while now that 2024 was going to be the year of AI for video, that it was definitely trending in that direction, and that certainly seems to be holding true so far, because this isn't even the only announcement we've seen.

[00:05:10] Paul Roetzer: We're going to talk about Meta's announcements as well.

[00:05:13] Paul Roetzer: But if you go back to around this time last year (I was trying to remember the exact date and didn't pull it up), it was somewhere around February or March when Runway teased the future of storytelling, which ended up becoming Gen-2, its text-to-video model. Up until Thursday, maybe along with Pika, and there have been some other innovations in

[00:05:35] Paul Roetzer: the last couple months, Runway being able to generate four seconds of video at a time from a text prompt was sort of state of the art.

[00:05:45] Paul Roetzer: And then you could extend those videos to about 16 seconds, which I think is the max length in Runway. We've talked a lot about Runway on this podcast

[00:05:53] Paul Roetzer: before. I demo it all the time when I'm giving keynotes as an example of where video is going.

[00:05:59] Paul Roetzer: So to go from 16 seconds to a minute is pretty incredible, and honestly, my feeling was the minute seems kind of arbitrary. I'm sure a minute is within their research, but it

[00:06:12] Paul Roetzer: seems as though they can probably go further than that. So my initial take was: this isn't

[00:06:18] Paul Roetzer: the ChatGPT moment for AI video yet, maybe in part because it's not really available to anybody yet. It doesn't seem like we're quite there, but it's certainly a milestone in the advancement of AI video generation.

[00:06:35] Paul Roetzer: I'm guessing, based on OpenAI's past release schedules, it wouldn't be unrealistic to think it's probably two to three months out before we start seeing this built into ChatGPT or available as a standalone product, but their recent history would tell us that this is probably

[00:06:53] Paul Roetzer: going to get rolled into ChatGPT, although

[00:06:56] Paul Roetzer: maybe as a new pricing tier. You could start to see how they can add these other capabilities, where if you want video understanding and generation, you

[00:07:03] Paul Roetzer: can upgrade to a higher pricing tier.

[00:07:06] Paul Roetzer: No idea if that's what they'll do, but that was kind of my first take on it.

[00:07:10] Mike Kaput: So, I don't know. There's a lot of speculation out there. Given how insanely good this looks and how fast the innovation is progressing, it just strikes me, and I have to ask: are we looking at a near future where you don't really need to hire someone to shoot video?

[00:07:29] Mike Kaput: I mean, it sounds a little crazy, but we have to assume we're getting longer and longer hyper-realistic videos very, very fast. Can't anyone just make something incredibly good for cheap?

[00:07:42] Paul Roetzer: Yeah, I don't think that's going to be the case in the near term. I think we've all learned not to say that something won't happen.

[00:07:50] Paul Roetzer: We just don't know where this is going. But I'll dissect a few key excerpts from the blog post announcement, because I feel like OpenAI did a pretty

[00:08:00] Paul Roetzer: good job. They didn't release a ton of technical details about how

[00:08:03] Paul Roetzer: they did this. There's no open research report that says, here's all the training data. They didn't say what the training data was, but the

[00:08:10] Paul Roetzer: blog post had some really interesting aspects to it. So first, they say, "We're teaching AI to understand and simulate the physical world, with the goal of training models that help people solve problems that require real-world interaction." So think about it: when a 3D animator or a game designer builds something, they

[00:08:32] Paul Roetzer: follow the laws of physics. The objects

[00:08:35] Paul Roetzer: and the people, the characters, follow the rules of physics in the universe. These videos don't. There are pieces of them that do, but

[00:08:46] Paul Roetzer: then there are things that don't make any sense, like it drops

[00:08:49] Paul Roetzer: a glass and the glass doesn't shatter properly. It's not following the rules of physics.

[00:08:54] Paul Roetzer: So that's one thing: they're trying to get there. They're

[00:08:57] Paul Roetzer: trying to create something that can not only generate a minute or more of video, but can do it within the laws of the universe. So then it says, "Introducing Sora, our text-to-video model. Sora

[00:09:07] Paul Roetzer: can generate videos up to one minute long." So again, the context

[00:09:10] Paul Roetzer: here is that right now, best in class is about four seconds at a time to maintain consistency.

[00:09:17] Paul Roetzer: But again, the key aspect you mentioned, Mike, is that it's only available to red teamers right now. So basically they're giving the powerful model to a very select group of people

[00:09:26] Paul Roetzer: who are going to test this thing, find all the harms and risks, find where it's going to go off the rails, and then they'll grant access,

[00:09:33] Paul Roetzer: they said, to some visual artists, designers, and filmmakers to try and understand the impact on creative professionals. They went on to say

[00:09:41] Paul Roetzer: it can generate complex scenes with multiple characters and specific types of motion, and they claim the model understands not only what the user has asked for in the prompt, but

[00:09:52] Paul Roetzer: also how those things exist in the physical world. And this is where, and we're going to talk a little bit more about Meta's Yann LeCun in a minute, the difference seems to be happening.

[00:10:03] Paul Roetzer: OpenAI is saying that they're building something based on transformers and this diffusion model that is, in a way, actually understanding the physical world.

[00:10:13] Paul Roetzer: It appears to me Yann LeCun says that's impossible. Every interpretation I have of what he's saying is that they're mistaken if they think that's what's happening. So they go on to say the model has a deep understanding of language. This goes back to what you and I, Mike, have talked about many times: language models

[00:10:28] Paul Roetzer: are just the foundation.

[00:10:30] Paul Roetzer: So they're saying that the fact that these things can understand language is a real key, and it enables the model

[00:10:36] Paul Roetzer: to accurately interpret prompts and generate compelling characters, emotions, and things like that. They also

[00:10:43] Paul Roetzer: say it can create multiple shots within a single generated video that accurately persist the characters and visual style.

[00:10:48] Paul Roetzer: That's key because, for example, if they're following a character walking down the street and the camera pans past that character and then comes back to them, it's very hard for this AI to generate that character consistently once it has, in essence, disappeared.

[00:11:07] Paul Roetzer: And so they're saying that they're basically having these breakthroughs where they're able to not only generate that character, but maintain the consistency of that character as you kind of move throughout these scenes.

[00:11:18] Paul Roetzer: And so to your question, can this replace video design and animation? No, because they're only able to do it in limited environments where they can maintain this consistency. It's not like you can say, "Hey, design me a video game,"

[00:11:32] Paul Roetzer: and it just goes and keeps all these consistencies and follows the rules of physics.

[00:11:35] Paul Roetzer: They get into a

[00:11:37] Paul Roetzer: little bit about safety. They're aware that this could be used for misleading content, so they're building tools such as a detection classifier;

[00:11:44] Paul Roetzer: they want to be able to tell when a video was made by Sora. They did talk a little about

[00:11:48] Paul Roetzer: C2PA, which is a topic you and I are going to get into later on. There were a couple other things I thought were interesting.

[00:11:57] Paul Roetzer: They did talk about it being a diffusion model, which generates video by starting off with one that looks

[00:12:02] Paul Roetzer: like static noise and gradually transforms it by removing the noise over many steps. We're not getting into the highly technical stuff here, but I think

[00:12:09] Paul Roetzer: of the diffusion model, in essence, like sculpting. I don't know if this is even the right analogy, but when you start with a piece of marble

[00:12:16] Paul Roetzer: and you sculpt something out of it, think of the marble as the noise.

[00:12:20] Paul Roetzer: It's just this thing. And out of that, you create the sculpture. That's kind of how I envision diffusion working. You're starting with pure noise; it looks like nothing, it's just noise on

[00:12:31] Paul Roetzer: the screen. And it sort of diffuses down. It pulls the noise away.

[00:12:36] Paul Roetzer: And it's left with this video. So following this model, they're able to generate

[00:12:42] Paul Roetzer: entire videos all at once, and to extend existing videos and make them longer. And then at the end, they noted, "Sora serves as a foundation for models that can understand

[00:12:54] Paul Roetzer: and simulate the real world, a capability we believe will be an important milestone for achieving AGI."

[00:13:00] Paul Roetzer: Everything with OpenAI always comes back to AGI.
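Paul's sculpting analogy maps onto a sampling loop that starts from random noise and removes a little of it at each step. The sketch below is a toy, one-dimensional illustration, not Sora's method: it assumes the "data" is a single Gaussian (hypothetical mean `MU` and spread `S`) so the ideal denoiser has a closed form, where a real video model would use a trained neural network, and it uses a deterministic DDIM-style update with an arbitrary linear noise schedule.

```python
import math
import random
import statistics

# Toy data distribution (assumption): 1-D Gaussian with mean MU and spread S.
MU, S = 2.0, 0.5

def alpha_bar_schedule(steps: int) -> list[float]:
    """Fraction of signal kept at each noise level, from almost all noise to almost clean."""
    lo, hi = 0.001, 0.9999
    return [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]

def ideal_denoiser(x_t: float, a: float) -> float:
    """Best guess of the clean value x0 given x_t = sqrt(a)*x0 + sqrt(1-a)*noise.

    Exact for this Gaussian toy data; a real diffusion model learns it with a network.
    """
    var_t = a * S**2 + (1 - a)
    return MU + math.sqrt(a) * S**2 / var_t * (x_t - math.sqrt(a) * MU)

def sample(rng: random.Random, steps: int = 200) -> float:
    schedule = alpha_bar_schedule(steps)
    x = rng.gauss(0.0, 1.0)  # start from pure "static": standard Gaussian noise
    for i, a in enumerate(schedule):
        a_next = schedule[i + 1] if i + 1 < steps else 1.0  # 1.0 means no noise left
        x0_hat = ideal_denoiser(x, a)
        eps_hat = (x - math.sqrt(a) * x0_hat) / math.sqrt(1 - a)  # implied noise
        # Deterministic DDIM-style step: re-mix the clean guess with less noise.
        x = math.sqrt(a_next) * x0_hat + math.sqrt(1 - a_next) * eps_hat
    return x

rng = random.Random(0)
samples = [sample(rng) for _ in range(2000)]
print(round(statistics.mean(samples), 2), round(statistics.stdev(samples), 2))
```

Every sample begins as noise centered on zero, yet the printed mean lands near `MU` and the spread near (slightly under) `S`, which is the "pulling the noise away" idea in miniature.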

[00:13:05] Paul Roetzer: A couple of quick notes. They did have a technical report, which again, if you're really interested in the technical side, go read it.

[00:13:11] Paul Roetzer: They don't tell you a bunch about how they did it. It's just kind of like what it's technically capable of. So a few interesting notes here,

[00:13:19] Paul Roetzer: They highlighted that they think this is a promising path toward building general-purpose simulators of the physical world. Again, Yann LeCun may disagree.

[00:13:27] Paul Roetzer: They do call out that it is capable

[00:13:29] Paul Roetzer: of generating images as well, which makes me wonder whether this and DALL-E,

[00:13:33] Paul Roetzer: their image generation tool, eventually become the same

[00:13:37] Paul Roetzer: thing, or whether it replaces DALL-E. I'm not really sure.

[00:13:41] Paul Roetzer: They talked a little bit about patches, a word I think we're going to hear more about. We've talked about how language models predict words; actually, they predict tokens, which are like parts of words. And we'll talk more about

[00:13:52] Paul Roetzer: tokens in a minute with the Google announcement. This model basically follows a similar pattern, but it uses patches instead.

[00:14:00] Paul Roetzer: So it's just a word you're probably going to hear.
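The patch idea can be sketched as cutting a video into small space-time blocks and flattening each block into one vector, the way text is cut into tokens. This is a minimal illustration assuming a grayscale video stored as nested lists `video[t][y][x]` and a hypothetical patch size; the function name is made up, and OpenAI's report describes extracting patches from a compressed latent version of the video rather than from raw pixels as done here.

```python
def to_spacetime_patches(video, pt, ph, pw):
    """Cut video[t][y][x] into (pt x ph x pw) blocks, each flattened into one list."""
    frames, height, width = len(video), len(video[0]), len(video[0][0])
    if frames % pt or height % ph or width % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    patches = []
    for t0 in range(0, frames, pt):          # step through time
        for y0 in range(0, height, ph):      # then down the rows
            for x0 in range(0, width, pw):   # then across the columns
                patches.append([
                    video[t][y][x]
                    for t in range(t0, t0 + pt)
                    for y in range(y0, y0 + ph)
                    for x in range(x0, x0 + pw)
                ])
    return patches

# Hypothetical 4-frame, 8x8 video whose pixel value encodes its own position.
video = [[[100 * t + 10 * y + x for x in range(8)] for y in range(8)] for t in range(4)]
patches = to_spacetime_patches(video, pt=2, ph=4, pw=4)
print(len(patches), len(patches[0]))  # 8 32
```

Each of the 8 patches bundles 2 frames of a 4x4 region into 32 numbers, so a model can treat the video as a sequence of patch vectors much as a language model treats text as a sequence of tokens.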

[00:14:03] Paul Roetzer: And then one of the things I thought was fascinating is they said, "We find that the video model exhibits a number of interesting emergent capabilities when trained at scale. These capabilities enable Sora to simulate some aspects

[00:14:17] Paul Roetzer: of people, animals, and environments from the physical world." Basically, it wasn't explicitly trained to do that. So this goes back to OpenAI's idea that if we just

[00:14:25] Paul Roetzer: keep giving it more data and more computing power, it seems to develop these emergent capabilities. But they also noted that it exhibits numerous limitations as a simulator, which gets back to how I started: "For example, it does not accurately model

[00:14:41] Paul Roetzer: the physics of many basic interactions, like glass shattering. Other interactions, like eating food, do not always yield

[00:14:47] Paul Roetzer: correct changes." So I take a bite of the burger and there's nothing missing from the burger, that kind of thing.

[00:14:54] Paul Roetzer: So then, two final thoughts here. Jim Fan, who I think you and I had talked about, tweeted, "If you think OpenAI Sora is a creative toy like DALL-E, think again.

[00:15:08] Paul Roetzer: It is a data-driven physics engine. It is a simulation of many worlds, real and fantastical." Honestly, it was hurting my brain to read that tweet. I've read it like 15 times, and I even used Perplexity to try to understand what he was saying.

[00:15:24] Paul Roetzer: But anyway, the synopsis is that he's saying it's something much bigger than you think it is. He equated it to the GPT-3 moment.

[00:15:34] Paul Roetzer: So he's basically saying, hey, this isn't GPT-4 yet.

[00:15:38] Paul Roetzer: This is about where we were with GPT-3 in relation to text. And then the final note I'll put in here is that Yann LeCun did comment. People were taking shots at him, saying, "Oh, I thought you said

[00:15:51] Paul Roetzer: this wasn't going to work.

[00:15:51] Paul Roetzer: And here it seems to be working." And he basically said, this is not what you think it is.

[00:15:58] Paul Roetzer: So

[00:15:59] Paul Roetzer: he said, "The generation of mostly realistic-looking videos from prompts does not indicate that a system understands the physical world." And this is going to be a theme throughout

[00:16:07] Paul Roetzer: today's episode. There is disagreement among the leading AI experts about how we move forward and what the next generation looks like.

[00:16:15] Paul Roetzer: But here's the final context from him on why he says the video approach they're taking isn't going to get us to the next level.

[00:16:23] Paul Roetzer: He said words or tokens are easy to predict because there is a finite number of them. Typically there are about 30,000 possible tokens for text in any language. Even if you

[00:16:37] Paul Roetzer: don't know precisely which token comes next,

[00:16:40] Paul Roetzer: you can produce a score or probability for each possible token. So in essence, what he's saying is that as AI is writing things, there are only roughly 30,000 variations of what could come next, and there are ways to narrow that down, so you have a probability for a few select words.
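LeCun's point about a finite vocabulary can be illustrated with a softmax, which turns a score for every candidate token into a probability distribution, set next to a rough count of how many distinct frames even a tiny image allows. The three-word vocabulary and its scores are made up for illustration, and the function names are hypothetical; a real model scores every token in a vocabulary of tens of thousands.

```python
import math

def next_token_distribution(logits: dict[str, float]) -> dict[str, float]:
    """Softmax: turn a raw score for every token in a finite vocabulary into probabilities."""
    m = max(logits.values())  # subtract the max score for numerical stability
    exps = {tok: math.exp(score - m) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def frame_count_digits(height: int, width: int, channels: int = 3, bits: int = 8) -> int:
    """Decimal digits in (2**bits) ** (height * width * channels),
    the number of distinct frames of that size."""
    total_bits = height * width * channels * bits
    return math.floor(total_bits * math.log10(2)) + 1

# Hypothetical scores for the word after "The cat sat on the".
probs = next_token_distribution({"mat": 4.0, "floor": 2.5, "moon": 0.1})
print(max(probs, key=probs.get))   # mat
print(frame_count_digits(64, 64))  # 29593
```

The contrast is the point: text offers a list of tens of thousands of options you can score one by one, while a single small 64x64 color frame has a number of possible values that is itself about 30,000 digits long.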

[00:16:57] Paul Roetzer: So he's basically saying language prediction is comparatively easy. He goes on to say, "But the number of possible video frames is, for all practical purposes,

[00:17:06] Paul Roetzer: infinite, continuous, and high-dimensional." So he's basically saying predicting a word

[00:17:11] Paul Roetzer: is child's play compared to predicting what's going to happen in a physical world where anything can happen

[00:17:17] Paul Roetzer: and you're trying to predict frames. And so he does not seem to believe that what OpenAI is doing is ultimately going to get to this AGI outcome, but they obviously disagree. So, I don't know. My big overall

[00:17:32] Paul Roetzer: take here is it seems like a massive leap forward. It's obviously very noteworthy technology.

[00:17:38] Paul Roetzer: I think once people get their hands on it, it'll be really fascinating to see what it's capable of

[00:17:42] Paul Roetzer: doing. They're going to have to red team this thing like crazy, because what we know from GPT-4 is that it had all kinds of capabilities that were nerfed out of it, to use a technical term,

[00:17:58] Paul Roetzer: which means made safe, like Nerf guns. So they made it safer, and this thing's going to be able to do all kinds of horrible things

[00:18:06] Paul Roetzer: if you ask it to, I'm sure, depending on what its training data was, and they're going to have to protect it from doing those things. I don't know,

[00:18:16] Paul Roetzer: man. My head's still spinning. And again, all we have are OpenAI's outputs, so they're handpicking the best

[00:18:23] Paul Roetzer: of the best examples. But even within those examples, I saw a great takedown of each of these videos. For the one where the little monster was playing with the candle, this guy, an animator, called out something like 50 different things that were wrong with this 17-second clip.

[00:18:37] Paul Roetzer: So to go back to your first question, will this replace anybody? Not any time soon, because it has the equivalent of hallucinations in language models. It's hallucinating things that would never happen in a

[00:18:50] Paul Roetzer: physical world that follows the laws of the universe.

[00:18:54] Mike Kaput: Alright, so in our second big piece of news today, we

[00:18:59] Google’s next-gen model: Gemini 1.5

[00:18:59] Mike Kaput: have some big announcements again from Google. If you recall, back in December of 2023, Google announced Gemini 1.0, its most advanced model, with versions listed as Gemini Nano, Pro, and Ultra. Just last week, we covered the company releasing Ultra 1.0

[00:19:19] Mike Kaput: as part of its new Gemini Advanced paid subscription tier.

[00:19:24] Mike Kaput: And now, in a little bit of a surprise announcement, Google says that Gemini 1.5 is ready for primetime.

[00:19:32] Mike Kaput: Google CEO Sundar Pichai, in a blog post this past week, wrote, quote, "It shows dramatic improvements across a number of dimensions, and 1.5 Pro

[00:19:41] Mike Kaput: achieves comparable quality to 1.0 Ultra while using less compute. The new generation of Gemini also delivers a breakthrough in long-context understanding. We've been able

[00:19:54] Mike Kaput: to significantly increase the amount of information our models can process, running up to 1 million tokens consistently,

[00:20:02] Mike Kaput: achieving the longest context window of any large-scale foundation model yet." That last part is important. According to Google, quote, "This means 1.5 Pro can process vast amounts of information in one go, including one hour of video, 11 hours of audio, codebases with over 30,000 lines of code, or over 700,000 words."
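Google's own figures imply roughly 1.4 tokens per English word (1 million tokens for about 700,000 words). The sketch below uses that ratio, an assumption that varies by tokenizer, language, and content, to estimate whether a document of a given length fits a context window such as 128,000 or 1,000,000 tokens. The function names and the `reserve_for_output` parameter are hypothetical.

```python
# Assumption: about 10 tokens per 7 English words, the ratio implied by Google's
# "1 million tokens ~ 700,000 words". Real tokenizers vary by language and content.
TOKENS_PER_WORD_NUM, TOKENS_PER_WORD_DEN = 10, 7

def estimate_tokens(word_count: int) -> int:
    """Rough token estimate, rounded up; integer math avoids float rounding surprises."""
    return -(-word_count * TOKENS_PER_WORD_NUM // TOKENS_PER_WORD_DEN)

def fits_in_context(word_count: int, window_tokens: int, reserve_for_output: int = 0) -> bool:
    """True if the estimated prompt plus room reserved for the reply fits in the window."""
    return estimate_tokens(word_count) + reserve_for_output <= window_tokens

print(fits_in_context(700_000, 1_000_000))  # True
print(fits_in_context(700_000, 128_000))    # False
```

Under this ratio, a 128,000-token window holds roughly 89,600 words of input before leaving any room for the model's answer.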

[00:20:27] Mike Kaput: Not to mention, this new model appears to have impressive, quote, "in-context learning." This means it can learn new skills from information given in a long prompt without any additional fine-tuning. This was actually on display in one of the examples given by Google. They had the model learn a very, very rare language called Kalamang, which has just 200 speakers in the world. The model learned the language and how to translate it simply by using the context in a grammar manual. So, Paul, I'm kind

[00:21:04] Mike Kaput: of surprised how quickly we have a new version of the Gemini model, and one that's apparently, at least in the 1.5 Pro version, comparable to Ultra 1.0

[00:21:15] Mike Kaput: but uses less compute, because we literally talked about Gemini Ultra 1.0 being released last week. What were your thoughts here on how quickly this happened?

[00:21:26] Paul Roetzer: Yeah, the timing was just so bizarre. We talked about Thursday being a crazy day; this was Thursday as well. This came out around 8 or 9 a.m. Eastern Time Thursday morning, about two to three hours before OpenAI dropped Sora.

[00:21:42] Paul Roetzer: There was a part of me that was like, OpenAI totally had this announcement just sitting there and was waiting for Google to announce something,

[00:21:47] Paul Roetzer: and then they one-upped them just for fun. I don't know why else they would have both been announced on the same

[00:21:52] Paul Roetzer: day. So, I don't understand the timing.

[00:21:55] Paul Roetzer: It does create quite a bit of confusion when you just came out with Ultra 1.0 and now you're saying we have a Pro 1.5 version that's probably more powerful than Ultra 1.0, which, by the way, literally just

[00:22:09] Paul Roetzer: became available for developers.

[00:22:11] Paul Roetzer: So I don't know. I don't understand it. I've taken a moment and tried to figure out what is going on here with the timing, but I don't know.

[00:22:20] Paul Roetzer: My one theory is that some other things are coming soon and they just needed

[00:22:23] Paul Roetzer: to get this out from a timing perspective. So, yeah, that was my first take on the timing.

[00:22:30] Paul Roetzer: The context window you mentioned, I think, is so critical. And we already talked about

[00:22:34] Paul Roetzer: tokens in the previous topic, but you start to see a theme building here. So yeah, the timing was weird. It's only available in a limited preview to developers

[00:22:45] Paul Roetzer: and enterprise customers. So again, this isn't something you or I are going to go try in our Gemini Advanced accounts.

[00:22:51] Paul Roetzer: We don't have access to it yet.

[00:22:54] Paul Roetzer: They said that 1.5 Pro, when it's available to the rest of us, will have a standard 128,000-token context window, which

[00:23:00] Paul Roetzer: isn't life-changing. Anthropic, I believe, has 200,000 right now, so

[00:23:06] Paul Roetzer: the 128,000 isn't anything major, but then it sounds like they're going to allow you to

[00:23:11] Paul Roetzer: pay more to get the million-plus tokens or even further up. The other thing they stressed was that it is a mixture-of-experts architecture. We've talked about that previously on the show, but to recap here:

[00:23:27] Paul Roetzer: they're not the only ones doing this, but the way I always explain the significance of this

[00:23:32] Paul Roetzer: is

[00:23:33] Paul Roetzer: when you ask a human a question and the human answers you, they usually pull from, well, we don't know exactly how it works,

[00:23:42] Paul Roetzer: but not every neuron in the brain fires to do each thing

[00:23:46] Paul Roetzer: a human does. There are very specific parts of the brain that fire to answer questions or take actions or whatever it may be.

[00:23:54] Paul Roetzer: And so this mixture-of-experts architecture tries to follow a similar [00:24:00] concept within the machine. So when you ask it to do something, like analyze a video, analyze audio, or take in a bunch of text as input, it only fires parts of the model.

[00:24:12] Paul Roetzer: Historically, the whole neural network would fire to do a single thing. Now what they're doing here is almost saying, okay, it's being asked

[00:24:22] Paul Roetzer: to do this one thing. Here's the section

[00:24:25] Paul Roetzer: of the model that is able to do that thing best.

[00:24:28] Paul Roetzer: And we're only going to fire that. That allows it to be more efficient with the energy it uses and more efficient in its output.
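To make the mixture-of-experts idea a little more concrete, here is a minimal Python sketch of top-k expert routing. It is a toy built on our own assumptions (random linear "experts", a random linear gate, k = 2, and a hypothetical class name `ToyMoELayer`); it is not how Gemini 1.5 is actually built. What it illustrates is the point above: a gate scores every expert, but only the few top-scoring experts are actually run for a given input.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a short list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    # Multiply a list-of-rows matrix by a vector.
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


class ToyMoELayer:
    """Hypothetical toy mixture-of-experts layer (illustration only)."""

    def __init__(self, dim, num_experts, k, seed=0):
        rng = random.Random(seed)
        # Assumption: each "expert" is just a random dim x dim linear map.
        self.experts = [
            [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
            for _ in range(num_experts)
        ]
        # Assumption: the gate is a random linear scorer, one row per expert.
        self.gate = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
        self.k = k
        self.calls = [0] * num_experts  # how often each expert actually runs

    def forward(self, x):
        # 1. Score every expert (cheap), then keep only the k best.
        scores = matvec(self.gate, x)
        top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[: self.k]
        # 2. Turn the selected scores into mixing weights that sum to 1.
        weights = softmax([scores[i] for i in top])
        # 3. Run only the selected experts and blend their outputs.
        out = [0.0] * len(x)
        for w, i in zip(weights, top):
            self.calls[i] += 1
            y = matvec(self.experts[i], x)
            out = [o + w * yi for o, yi in zip(out, y)]
        return out, top


layer = ToyMoELayer(dim=4, num_experts=8, k=2)
output, used = layer.forward([0.5, -1.0, 2.0, 0.1])
print(f"ran experts {used}: {len(used)} of {len(layer.experts)}")
```

In published mixture-of-experts language models, the gate is learned during training and routing typically happens per token inside each layer, rather than per task as the brain analogy suggests, so treat this only as an illustration of the routing idea.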

[00:24:35] Paul Roetzer: I thought

[00:24:36] Paul Roetzer: it was interesting. They

[00:24:37] Paul Roetzer: mentioned that while a million tokens is what they're

[00:24:41] Paul Roetzer: kind of

[00:24:42] Paul Roetzer: allowing people to have access to in this limited release, they successfully tested up to 10 million tokens in their research, which is kind of wild to consider.

[00:24:53] Paul Roetzer: And then they explained that the bigger the context window, whether it's 128,000 or a million or 10 million, the [00:25:00] more information you can put

[00:25:01] Paul Roetzer: into the prompt, and the more consistent, relevant, and useful the output can become.

[00:25:05] Paul Roetzer: They gave a few examples just to bring this home and make it a little more tangible.

[00:25:11] Paul Roetzer: The first example was that it could reason across very long documents, from comparing details across contracts, to synthesizing and analyzing themes and opinions across analyst reports, research studies, and

[00:25:22] Paul Roetzer: even a series of books.

[00:25:23] Paul Roetzer: Now, Anthropic and others would enable you to do this. Like, Mike and I do this sometimes.

[00:25:28] Paul Roetzer: We'll give it research reports and things like that.

[00:25:30] Paul Roetzer: What has traditionally happened with these models is that the more context you give them, the more tokens you give them, the less accurate they become as they go further on. So they become

[00:25:41] Paul Roetzer: less and less reliable as you give them more context. What it seems like Google is

[00:25:46] Paul Roetzer: saying is they're finding ways to maintain accuracy and reliability

[00:25:51] Paul Roetzer: as you expand the amount of context. Well, that becomes huge, especially in examples like this, like analyst reports and contracts. When you really start to think about knowledge [00:26:00] work, we need to be able to trust these models and their outputs, or else it's really just

[00:26:04] Paul Roetzer: redundant work. The humans still have to do all the work to verify.

[00:26:08] Paul Roetzer: They talked about another example: analyzing and comparing content across hours of video, such as finding specific details in sports footage or getting caught up on

[00:26:16] Paul Roetzer: detailed information from

[00:26:18] Paul Roetzer: video meeting summaries that support precise question-and-answer. They used this

[00:26:22] Paul Roetzer: great example where they gave it a whole movie and said, you know, find the part where the paper was taken out of the person's pocket, and it finds it.

[00:26:30] Paul Roetzer: And then it's actually able to tell you what was on that piece of

[00:26:33] Paul Roetzer: paper; it's able to see and analyze it. So in marketing, business, communications, sports, media, and entertainment, you can start to envision all

[00:26:43] Paul Roetzer: these ways you could use this if this technology becomes broadly available.

[00:26:49] Paul Roetzer: Another one was enabling chatbots to hold long conversations without forgetting details, even over complex tasks or many follow-up interactions. And then the last one was hyper-personalized experiences

[00:26:59] Paul Roetzer: by [00:27:00] pulling relevant user information into the prompt without the complexity of fine-tuning a model.

[00:27:04] Paul Roetzer: To me, this is buried within the technical side, like the Vertex AI stuff. But this to me was like, oh my gosh, now you can look at the business uses of these, just

[00:27:15] Paul Roetzer: those four I just went through, and you can start thinking, what if chatbots became reliable? What if they were accurate no matter how long the thread?

[00:27:23] Paul Roetzer: And what if they had memory about everything we've previously talked about? So knowing this is a 1.5 release, knowing they appear to have accelerated the announcement of it for some unknown

[00:27:34] Paul Roetzer: reason,

[00:27:36] Paul Roetzer: you can start to piece together what Google is doing and how, by later in 2024, the implications for business and knowledge work will start to happen.

[00:27:47] Paul Roetzer: When this stuff becomes truly reliable, it's kind of crazy to step back and think about.

[00:27:55] Mike Kaput: Yeah, it's certainly one of those things where we've talked a lot about [00:28:00] the future potential of these tools once they really get good enough at things like the in-context learning that Google has referenced here, and we're just starting to see the first glimmers of that actually happening, and I don't know if

[00:28:15] Paul Roetzer: Yeah, and the only other oddity I'll note, because again, it's so weird how the timing's happening, is that Google's doing this weird thing where every time they make a major announcement about AI, they're doing a blog post from Sundar Pichai, the CEO of

[00:28:30] Paul Roetzer: Alphabet and Google, and Demis

[00:28:32] Paul Roetzer: Hassabis, who runs Google DeepMind.

[00:28:35] Paul Roetzer: And I think this is the second or third time

[00:28:37] Paul Roetzer: now they've done this with the Gemini models, which come out of DeepMind, the AI lab. But it's just odd to me that their format is, here's a quote

[00:28:45] Paul Roetzer: from Sundar and here's an excerpt from Demis. And they basically say the same thing. I don't know why they're doing that.

[00:28:52] Paul Roetzer: And again, you and I spent time in the PR

[00:28:55] Paul Roetzer: world, and there has to be some strategic communications reason why they're doing [00:29:00] that. It's

[00:29:00] Paul Roetzer: of no value to the reader. Just tell me the information.

[00:29:03] Paul Roetzer: Just say it's bylined by both of them. Why do I care if it's broken up by, this is what Demis says,

[00:29:07] Paul Roetzer: and this is what Sundar says? Because I know the PR people wrote it anyway. So why are, I don't

[00:29:11] Paul Roetzer: know. It's just, there's something there. I can't put my finger

[00:29:14] Paul Roetzer: on it yet, what exactly the reason is, but it's not an

[00:29:17] Paul Roetzer: insignificant thing that they're doing. It's a very intentional choice that

[00:29:24] Paul Roetzer: is laying the groundwork for something. But it's just interesting to have Demis constantly be on that

[00:29:32] Paul Roetzer: level, just one right below Sundar, but intentionally keeping them together in these announcements.

[00:29:37] Paul Roetzer: What I'm telling you is I think at some point something is going to happen and we will look back and be like, Ah, that's why they were doing that.

[00:29:45] Paul Roetzer: I have some theories, but I'll hold off on those for now.

[00:29:49] Mike Kaput: Alright,

[00:29:51] Mike Kaput: in our third big topic for this week, it's

[00:29:54] New AI Image Labeling from C2PA Could Combat Deepfakes

[00:29:54] Mike Kaput: becoming harder and harder to tell what's real and what's not, thanks [00:30:00] to hyper-realistic deepfakes and synthetic content generated by AI, all of which we've highlighted as a problem many times. Well, it turns out leading AI

[00:30:10] Mike Kaput: companies, even some AI rivals, are coming together to at least make some type of attempt to address the problem.

[00:30:19] Mike Kaput: In the last couple of weeks, Meta, OpenAI, and Google have announced that they're going to join companies like Microsoft, Adobe, and others in embracing something called Content Credentials, which is a technical standard for media provenance from C2PA. Now, C2PA is a standards organization. The name stands for the Coalition for Content Provenance and Authenticity.

[00:30:45] Mike Kaput: This organization, which has over a hundred companies involved so far, is bringing together these companies and leaders to work on ways to identify where content actually came from online. So the C2PA [00:31:00] standard, which is named after the organization, basically gives publishers and companies the ability to embed metadata into media

[00:31:10] Mike Kaput: to verify its origins. So

[00:31:13] Mike Kaput: this metadata could be used to see, for instance, whether an image was created with an AI tool when you view the image. For example, you can now view additional metadata in any image generated by DALL-E 3 in ChatGPT or the OpenAI API, and it'll actually tell you, okay, this was AI-generated, here's what tool it came from, and a bunch of other information.

[00:31:39] Mike Kaput: Now it turns out as part of this process, Meta appears to be trying to go one

[00:31:44] Mike Kaput: step further.

[00:31:45] Mike Kaput: So the company said that it's already using metadata to label images that are created with its Meta AI tool. But now they're, quote, "building industry-leading tools that can identify invisible markers at scale, [00:32:00] specifically the 'AI generated' information in the C2PA and other technical standards being used, so that we can label images from Google, OpenAI, Microsoft, Adobe, Midjourney, and Shutterstock as they implement their plans for adding metadata to images created by their tools."

[00:32:18] Mike Kaput: Now, this might seem a little technical or esoteric, but really what they're trying to do is come together on a joint standard and effort to label AI-generated content online and detect when those labels exist, presumably, at least in Meta's case, to be able to regulate how those images, and potentially video and audio, show up on their platforms.

[00:32:42] Mike Kaput: So I want to kind of first take a step back here, Paul, and just ask,

[00:32:46] Mike Kaput: like, why now? Why are these companies devoting resources to this, given everything else they're working on?

[00:32:54] Paul Roetzer: As we've talked about on the show so many times, my biggest concern for near-term [00:33:00] AI misuse is disinformation, misinformation, and synthetic content, because the average citizen has no idea AI is capable of doing these

[00:33:08] Paul Roetzer: things. And so you see an advancement like Sora, and it's incredible, but it's just one step closer. When you watch the Sora videos, you have to have a trained eye or be looking for the breakdown in the laws of physics.

[00:33:23] Paul Roetzer: It's not obvious right away that things aren't working. The person who's looking for it can easily say, oh,

[00:33:31] Paul Roetzer: okay, that totally isn't real.

[00:33:33] Paul Roetzer: Obviously the thing went from five fingers to six fingers and then back to five fingers in the blink of an eye. But the average citizen isn't

[00:33:41] Paul Roetzer: going to do that. They're just going to see something. Everything is

[00:33:45] Paul Roetzer: in shorts now, quick videos of everything, and you're not going to stop and process whether that was real or not.

[00:33:52] Paul Roetzer: So, all of these companies are aware that the things they're building will be misused and are being misused.

[00:33:59] Paul Roetzer: [00:34:00] And the more advanced they become, with advancements like Sora, the more likely it is that this is going to become a major problem in society.

[00:34:10] Mike Kaput: So this seems, in my opinion, at least initially, like a good-faith effort to deal with deepfakes and synthetic content, largely through smarter engineering, which is awesome. Realistically, how feasible is it to tackle this problem? I mean,

[00:34:30] Mike Kaput: the companies involved, like Meta, readily admit, with many caveats in their blog post about this, that image labeling only works if all the major players actually do it. It's apparently easy to remove labels and watermarks. And

[00:34:46] Mike Kaput: it's not possible today to identify all AI-generated content. For instance, I think this is largely happening right now with image generation, whereas video and audio are lagging behind. So, [00:35:00] how do you view that, and what are your thoughts on how likely this is to actually make an impact?

[00:35:06] Paul Roetzer: I mean, the way I look at this is, it's critical that they're doing something in a unified way, or an apparently unified way,

[00:35:14] Paul Roetzer: where they're trying to find ways to address it. I think it's authentic. I think they truly do realize this is going to be a major problem and they are trying to find some solutions. But it does appear that even

[00:35:29] Paul Roetzer: this unified approach has massive limitations. So, you know, you mentioned the metadata can be removed. People can just take screenshots of things and spread the screenshots. You can even take a screen recording of something and spread it, and it appears real.

[00:35:44] Paul Roetzer: So there are going to be limitations on the technical side.

[00:35:47] Paul Roetzer: Then you need the social networks to be willing to detect and flag and remove this content. We saw the Taylor Swift example with Twitter/X, where it was like 48 hours before they did [00:36:00] anything about

[00:36:00] Paul Roetzer: fake content that they knew was fake and that was spreading. So even when they know it, they still have to do something about it.

[00:36:07] Paul Roetzer: So then you rely on the distribution channels to do something about the fact that this content isn't real, and not just say, you know, it's freedom of speech, that people are

[00:36:19] Paul Roetzer: allowed to create deepfakes and spread them. People are making arbitrary decisions around this stuff.

[00:36:24] Paul Roetzer: And then you have to deal with the fact that there's a whole bunch of open source capabilities. Whatever Sora enables, three to six months behind it there's going to be some open source model that can do the exact same thing, and they're not going to care.

[00:36:39] Paul Roetzer: And so I feel like we have to make these efforts. The more people that are involved, the better,

[00:36:46] Paul Roetzer: the more brainpower and computing power that is going toward solving this, it's good, but I don't think any of it is going to solve this in a

[00:36:58] Paul Roetzer: very uniform way. [00:37:00] And that goes back to what you and I talk about all the time: the only true way to address this is through AI literacy, to make people aware

[00:37:09] Paul Roetzer: that this stuff exists and it's possible to develop very real

[00:37:15] Paul Roetzer: looking videos and images and audio, that they can't trust what they see online, and that people have to be able to vet things. But that

[00:37:24] Paul Roetzer: is a monumental task. We live in a society where people want to believe what fits within their view of the world, their political beliefs, their religious

[00:37:35] Paul Roetzer: beliefs, whatever it is, whatever they want validation for, they will believe anything that validates their beliefs. And so if people can create images and videos and text and audio

[00:37:47] Paul Roetzer: that validate what they want to be true, it doesn't matter to them

[00:37:52] Paul Roetzer: if it was AI-generated. They don't even want to know. They're just

[00:37:55] Paul Roetzer: like, I just want to ignore that. And so

[00:37:58] Paul Roetzer: I don't see a solution to that in [00:38:00] society anytime soon. And so I think it's important that the technical side is

[00:38:05] Paul Roetzer: doing what they're doing. I think it's important that more people drive AI literacy throughout society. Combined, I think that's the best we're going to get, but

[00:38:14] Paul Roetzer: people have to accept that there is no solution to this. Moving forward, we are going to live in a society filled with misinformation, disinformation, and synthetic media that is spread through social networks regardless of what those social networks try to do to stop it.

[00:38:32] Paul Roetzer: This is just the world we're going to live in, and I think we just have to accept that and start doing our part with our own

[00:38:40] Paul Roetzer: kids, our own families, schools, businesses, wherever you have influence. I think we just need to do our part to try and raise awareness around this and get people to

[00:38:51] Paul Roetzer: be more responsible about how they consume and share information. Do you have any other thoughts

[00:38:56] Paul Roetzer: on that, Mike? Like, I don't know what else to do. I really feel like that is the [00:39:00] only real answer.

[00:39:00] Mike Kaput: Yeah, I completely agree because, you know, I don't want to pick on Meta here, but we're basically asking

[00:39:09] Mike Kaput: an organization that has routinely failed to regulate its platform adequately to now regulate this at scale. So I very much appreciate what they are doing, and I'm glad we're doing something, but at the end of the day, outrunning this is impossible, so I think you just have to accept that we need to change the paradigm in our own minds about what is real.

[00:39:34] Mike Kaput: Basically, move forward assuming nothing you see is real until verified. But I almost wonder if it would be helpful at some point to have an industry association or something that's running ads. It needs to be some public service campaign around what this technology is capable of.

[00:39:53] Mike Kaput: I want to see a Super Bowl ad that educates people about this.

[00:39:56] Paul Roetzer: Yeah, I agree. And

[00:39:58] Paul Roetzer: I think, you [00:40:00] know, just as we're talking about this, I don't even know that

[00:40:04] Paul Roetzer: legal solutions are the answer, because you get into, and I think this has already been litigated, but are the social networks responsible? Do they have a liability for the spread of misinformation that does harm,

[00:40:16] Paul Roetzer: for the spread of deepfakes that do harm? Can they be financially liable, criminally liable?

[00:40:21] Paul Roetzer: I don't know. But I don't even know that that's going to solve

[00:40:24] Paul Roetzer: it. And I think that depending on which political party is in office, and again, when we speak politically, it's just a general awareness of how the United States government works, the motivation to do something about this is going to vary based on who's in office for four years.

[00:40:44] Paul Roetzer: And so I don't even think that's the solution. So yeah, I'd love to see not only this,

[00:40:51] Paul Roetzer: C2PA, with all these companies buying in, but I like your idea that, as part of this, each of you needs to put in

[00:40:59] Paul Roetzer: [00:41:00] $10 million for a public awareness campaign around synthetic media. I think that's a great idea, Mike.

[00:41:06] Paul Roetzer: Put your money behind this as well as your technical prowess, and let's actually try and change the understanding across society, because we're going to run out

[00:41:14] Paul Roetzer: of time before the next election cycle. We're in the

[00:41:17] Paul Roetzer: next election cycle already. Yeah.

[00:41:20] Andrej Karpathy Departs OpenAI

[00:41:20] Mike Kaput: Alright, so diving into some of the rapid fire topics this week. First up, Andrej Karpathy, one of the founding members of OpenAI and one of its top

[00:41:32] Mike Kaput: AI researchers, has left the company. In a post on X, Karpathy said that, quote, "nothing happened and it's not a result of any particular event, issue or drama."

[00:41:43] Mike Kaput: He had nothing but compliments for the team. And he said his immediate plan is to work on personal projects and see what happens. In the post, he also hinted that his longtime followers might have some idea of what his next project might be. So, Paul, we don't know a [00:42:00] ton here, but you've followed him for a while. What do you think is going on here? Do you buy the story that nothing really happened? Any ideas on what he might be working on next?

[00:42:10] Paul Roetzer: No, I mean,

[00:42:11] Paul Roetzer: If I had to guess, I think he'll be working on AI agents of some sort. He's a huge proponent of open source. The only thing that seemed odd to me in his time at OpenAI. And again, we talked in

[00:42:23] Paul Roetzer: depth about Andrej in last week's episode. So if you look,

[00:42:26] Paul Roetzer: that's what was weird: this happened on, like, Tuesday.

[00:42:29] Paul Roetzer: I think he announced, he told OpenAI he was leaving on Monday. It came out on Tuesday of this week, this past week.

[00:42:35] Paul Roetzer: And we had just talked in depth about World of Bits and his work on AI agents. We talked about OpenAI going aggressively into the AI agents space, which seemed

[00:42:43] Paul Roetzer: to align with why Andrej went back. So on its surface, it didn't make a heck of a lot of sense. There wasn't much other than like rumors

[00:42:52] Paul Roetzer: and theories online. The only thing that jumped out to me, because again, he went back to OpenAI

[00:42:57] Paul Roetzer: right around a year ago. [00:43:00] It was about a year; February 8th, I think, is when he started back at OpenAI, and he left on

[00:43:05] Paul Roetzer: February 13th or so. So he was back for one year. He did that busy person's intro to LLMs YouTube video

[00:43:17] Paul Roetzer: over the holidays, which I thought was interesting, because he was very clear. It was like, hey, this isn't OpenAI or what they're working on. This is my understanding of what's going on in the larger research community.

[00:43:27] Paul Roetzer: And there were some elements within that I thought diverged a little from what he was working on or appeared to be working

[00:43:34] Paul Roetzer: on at OpenAI. But the biggest thing to me is he appears to be a very big proponent of open research and open source models, which is not the path OpenAI is going down.

[00:43:47] Paul Roetzer: And so it wouldn't surprise me if he did something more in that realm. I can't imagine he's going to go start his own, you know, AI company, though he could certainly raise as much money as he wanted if he wanted to do that. [00:44:00] I don't know.

[00:44:01] Paul Roetzer: I think he's going to work on AI agents and I think he's going to do something in the open source world more

[00:44:06] Paul Roetzer: so than what was happening at OpenAI, but it's perplexing. I don't know.

[00:44:12] Paul Roetzer: Other than some theories, nobody seems to know, and he's not really saying much. We'll see. Definitely intriguing.

[00:44:20] Meta releases V-JEPA model

[00:44:20] Mike Kaput: Yeah, there's probably some more coming out soon about

[00:44:24] Mike Kaput: him. Yeah.

[00:44:25] Mike Kaput: Alright, in other news, Meta has announced that it's publicly releasing a model called

[00:44:30] Mike Kaput: V-JEPA.

[00:44:32] Mike Kaput: It's an acronym that stands for Video Joint Embedding Predictive Architecture. Now, that sounds like a mouthful, and

[00:44:40] Mike Kaput: it

[00:44:40] Mike Kaput: is,

[00:44:43] Mike Kaput: but it is important because V-JEPA is basically a model that's trained on video data.

[00:44:48] Mike Kaput: And as a result of the way it's trained, it can efficiently learn concepts about the physical world. So it can learn new concepts and do new tasks using only a few examples. And this [00:45:00] sounds like it kind of gives

[00:45:00] Mike Kaput: the model the ability to learn more like a human, by simply observing the world. Now, according to Meta VP and Chief AI Scientist Yann LeCun, V-JEPA is "a step toward a more grounded understanding of the world so machines can achieve more generalized reasoning and planning."

[00:45:20] Mike Kaput: "Our goal is to build advanced machine intelligence that can learn more like humans do, forming internal models of the world around them to learn, adapt, and forge plans efficiently in the service of completing complex tasks." Right now, this model is open to researchers under a Creative Commons license.

[00:45:40] Mike Kaput: So, Paul, this definitely seems to have some relation to what we saw with Sora. We have LeCun basically saying that they're designing machines to have more generalized reasoning and planning. And this becomes a little interesting when you start thinking about how this might relate [00:46:00] to Meta's wearable AI products, like the Ray-Bans they're selling now, and other things they might be building.

[00:46:09] Mike Kaput: The ability to reason about visuals on the fly as you move through the world could be at play here. What were your thoughts on this?

[00:46:17] Paul Roetzer: So, a couple thoughts, I guess.

[00:46:21] Paul Roetzer: One, it's very technical. Like, I mean,

[00:46:24] Paul Roetzer: this isn't

[00:46:24] Paul Roetzer: like the average marketer or business leader is going to go in and read this stuff and really have a deep comprehension of what in the heck they're talking about. I think it's always important to come back to the bigger picture, which is that Yann LeCun does not subscribe to the OpenAI and, it appears,

[00:46:41] Paul Roetzer: Google approach to throw more training data and more computing power at it, build more data centers,

[00:46:47] Paul Roetzer: get more NVIDIA chips, and just keep brute forcing intelligence through language models and transformers and diffusion models. He doesn't believe in that.

[00:46:57] Paul Roetzer: And so it's interesting a lot of times, because it [00:47:00] seems like Meta is doing a lot of that. They have teams within the

[00:47:04] Paul Roetzer: AI research lab that he runs that are doing things with language models and diffusion models. And

[00:47:10] Paul Roetzer: yet he believes we need some scientific breakthroughs to get to the next level, to learn the way a

[00:47:18] Paul Roetzer: child would learn, which, if you've ever listened to him give talks, is what he talks about.

[00:47:23] Paul Roetzer: He's like, you know, a two-year-old, a toddler, doesn't learn the way we're brute forcing these things to learn. They learn through a worldview. They observe the world, they understand how physics works, they understand gravity, they learn to understand time and space and the things around them, and how to interact with those environments.

[00:47:40] Paul Roetzer: And so he doesn't believe that just brute forcing large language models will get us to that toddler

[00:47:47] Paul Roetzer: level of understanding of the world. And so he's done some talks where, honestly, I've tried very, very hard to understand what he's saying. And it's

[00:47:58] Paul Roetzer: very [00:48:00] complicated. Andrej Karpathy is wonderful at simplifying concepts and making them tangible.

[00:48:06] Paul Roetzer: Yann is just brilliant, and sometimes it's really hard to comprehend what exactly he's saying. But that's my general

[00:48:15] Paul Roetzer: takeaway is when I try and simplify it down,

[00:48:19] Paul Roetzer: a toddler learns through observation of the world around them and they learn to understand a world model and

[00:48:24] Paul Roetzer: they can interact with that world.

[00:48:27] Paul Roetzer: Machines can't, and he doesn't think the way that it's being done in some of these other research labs is going to get

[00:48:32] Paul Roetzer: us to that toddler level. And

[00:48:35] Paul Roetzer: so I think it might be six months, 12 months, two years before what we're talking about here finds its way into some Meta product or leads to some breakthrough, and all of a sudden it's like, oh, Meta was right, they did need this to get a worldview.

[00:48:53] Paul Roetzer: And here's how it's happening. But even Elon Musk tweeted in reply to this, like, oh, we've had the ability to do

[00:48:59] Paul Roetzer: these [00:49:00] worldviews for over a year with Tesla Full Self-Driving. It's like, oh my God, I can't even go down this route. So my main takeaway here is it's really technical. It's cool to know this stuff is happening.

[00:49:12] Paul Roetzer: I think it's really important that people understand there are different beliefs and approaches being taken to make the leap forward to

[00:49:20] Paul Roetzer: general intelligence. And this is one of them, to kind of keep in the back of your mind, I guess.

[00:49:29] Mike Kaput: So, we also got an announcement that OpenAI is testing,

[00:49:33] Memory and new controls for ChatGPT

[00:49:33] Mike Kaput: giving, giving ChatGPT a memory. So this means ChatGPT will be able to remember things you discuss across all of your chats. So you don't have to repeat information. The way they describe this working is that as you chat with ChatGPT, you can ask it to remember specific, or let it pick up details itself.

[00:49:54] Mike Kaput: ChatGPT's memory will get better the more you use it, and you'll start to notice improvements [00:50:00] You can also control what it forgets, or turn off this feature entirely. So, some examples provided by OpenAI that I thought were pretty interesting for our world of how you might benefit from this include things like ChatGPT could remember your tone, format preferences, then automatically apply those to your blog post drafts every time you write one.

[00:50:24] Mike Kaput: It could remember your preferred programming language and frameworks when generating code for you. Or it could remember, say, the format and outputs required for a monthly that you pull regularly using your company's data each and every month. Now this memory feature right now is rolling out to a small portion of ChatGPT Free and Plus according to OpenAI.

[00:50:49] Mike Kaput: And they said that they'll share plans. Soon about a broader rollout.

[00:50:55] Mike Kaput: So, Paul, the first question that kind of comes to mind for me here when hearing these new capabilities is, one, they're awesome; this sounds really interesting. But does this undercut some of the capabilities of the startup ecosystem?

[00:51:09] Mike Kaput: Like often a selling point of some of these tools is they can learn your brand voice, your tone, your style, customized to you. That sounds like just a feature of ChatGPT.

[00:51:21] Paul Roetzer: Yeah, I think that's right. And honestly, like, I haven't really spent a ton of time

[00:51:27] Paul Roetzer: thinking deeply about this one yet, but there's a reasonable chance this is the biggest news of the week. Like, as a prelude to what they're going to do with this and a prelude to building AGI, this is a critical step.

[00:51:38] Paul Roetzer: So I think this will have very tangible implications for users, like temporary chat versus, "Yeah, go ahead and remember this."

[00:51:47] Paul Roetzer: It'll play into, you know, we've talked numerous times about these truly virtual, intelligent assistants, like Siri and Alexa and Google Assistant and, in theory, you know, OpenAI's chatbots or Inflection's Pi: for them to become your true personal assistant and be truly intelligent, they have to remember everything.

[00:52:07] Paul Roetzer: So memory is absolutely, like, essential to where they want to go with general intelligence. But on the

[00:52:15] Paul Roetzer: page where they announced this, it even gets into Team and Enterprise customers. So it says for Team and Enterprise, it can learn style preferences

[00:52:23] Paul Roetzer: as you alluded to, and build upon past interactions to save you time. And then it has bullet points, like:

[00:52:27] Paul Roetzer: it can remember your tone, voice, and format preferences, and automatically apply

[00:52:30] Paul Roetzer: them to blog post drafts without needing repetition. When

[00:52:34] Paul Roetzer: coding, it'll remember, you know, preferences for subsequent tasks, streamlining the process.

[00:52:38] Paul Roetzer: For monthly business reviews, you can upload your data to ChatGPT and it creates your preferred charts with three takeaways each.

[00:52:44] Paul Roetzer: Like, hmm. Yeah, I mean, this

[00:52:46] Paul Roetzer: is moving in a really interesting direction. And then it says GPTs, so the custom ones you can build yourself, like, those can have memory too. So this seems to not only be playing in, as you alluded to, the startup space, where in some cases the differentiation is that you can train it on style guides and certain documentation and knowledge graphs. It seems like the direction they'll go, and

[00:53:10] Paul Roetzer: I'm sure Gemini is going in the same direction with Google, is that you can just train it on these things and it'll remember everything. And as the, you know, the context window we talked about earlier with Gemini 1.5, as

[00:53:25] Paul Roetzer: its memory becomes greater, you know, 10 million tokens, which is probably where we'll be a year from now, it'll be able to remember better than a human.

[00:53:35] Paul Roetzer: Because if you think about it, like, we keep trying to pretend like these things need to be perfect, because that's what we expect from software,

[00:53:43] Paul Roetzer: that it's just perfect. But the reality is, like, Mike, if you and I watch a

[00:53:48] Paul Roetzer: two-hour movie, there's going to be a whole bunch of things in there

[00:53:51] Paul Roetzer: we have no recollection of.

[00:53:52] Paul Roetzer: My 12-year-old daughter has, like, attention to detail far beyond anything

[00:53:57] Paul Roetzer: I have. And she will remember, like, little things from a movie. Like, "Remember when this happens?" No,

[00:54:01] Paul Roetzer: I have no recollection of that happening. We just watched that last week. And so, like,

[00:54:07] Paul Roetzer: I think that there's a very near point where not only do they have the ability to have

[00:54:14] Paul Roetzer: a million tokens of context, or ten million tokens of context, but memory that far surpasses human-level memory of everything

[00:54:22] Paul Roetzer: within those tokens. And that, to me, is why I think this is

[00:54:27] Paul Roetzer: probably the most important piece when you combine it with what we talked about earlier: the ability to understand and generate

[00:54:34] Paul Roetzer: video, to, you know, understand audio, long research reports, you know, 100,000 words,

[00:54:40] Paul Roetzer: and instant recall of everything. That's crazy. Like when you really

[00:54:46] Paul Roetzer: stop and think that we may be, like, one to two years out from one to ten million tokens of context, and maybe, like, 99 percent accuracy of outputs; like, it doesn't hallucinate anymore, certainly no more than a human would. And I think that's the benchmark, and that's

[00:55:06] Paul Roetzer: where a lot of people look and say,

[00:55:07] Paul Roetzer: Do we really need AGI? Like, does it even matter? I've said this before. It's like, who cares? Like, maybe they'll get to whatever they call AGI. But if we get to a world a year from now,

[00:55:17] Paul Roetzer: where for $30 a month, we have ChatGPT Team in our company and it has access to all of our data, all of our videos, our audio, our images, and I don't have to go keyword search things. I can just

[00:55:28] Paul Roetzer: talk to it and say like, find me the video where Mike and I talked about memory and ChatGPT and boom, it's just like right there. Give me a summary of

[00:55:36] Paul Roetzer: like what we said when that happened. And it's like going through the transcript and the video and like, just, oh, what shirt was I wearing that day?

[00:55:43] Paul Roetzer: Boom, you're wearing an old PR 20/20 shirt. Like, it's a wild thing when you really step back and think about what it would mean to have

[00:55:53] Paul Roetzer: almost infinite memory of, multimodal, everything we have, and to have better-than-human recall of that. And I don't think that we're far off from either of those.

[00:56:06] Mike Kaput: So it

[00:56:08] OpenAI Develops Web Search Product

[00:56:08] Mike Kaput: turns out also OpenAI has been busy developing a web search product that would essentially bring them into more direct competition with Google, according to reporting from The Information, which is citing a person with knowledge of the plans. This person also said the service would be partly powered by Bing.

[00:56:31] Mike Kaput: But it isn't clear if this is going to be separate from ChatGPT or baked right into it. And Microsoft and OpenAI have both declined to comment. So this is certainly very much in the rumor phase right now, but very, very interesting to consider, because it does seem like it would be huge if they pull something like this off,

[00:56:53] Mike Kaput: because, I mean, at least from my perspective, the financial incentives are really there.

[00:56:58] Mike Kaput: Google has way more to lose than Microsoft if people stop seeing search ads. Microsoft and OpenAI don't necessarily need search revenue to operate their businesses. Google does. It seems like high reward for Microsoft and OpenAI, high risk for Google here. Do you agree with that? What did you think when you saw this?

[00:57:18] Paul Roetzer: It seems like an obvious play. It's so bizarre to me how everything Microsoft and OpenAI do appears to be competing with each other, and yet Microsoft owns like 49 percent of OpenAI, and

[00:57:30] Paul Roetzer: I don't know, that relationship is so bizarre to me.

[00:57:35] Paul Roetzer: Perplexity, man, like they're, they're either going to be like a

[00:57:40] Paul Roetzer: hundred billion dollar company or OpenAI is just going to like replicate what they do in ChatGPT. Like, I don't know if there's ever been like an all or nothing business that could

[00:57:50] Paul Roetzer: like literally redefine search or just get obsoleted tomorrow.

[00:57:54] Paul Roetzer: Like, I don't know. I don't know, man.

[00:57:56] Paul Roetzer: This is why I struggle so greatly with, like, personal investing. Like, what startups would I even invest in? Like, I love Perplexity. Like, you got me turned on to it after that episode we did, and I use it daily. Like, it's one of, like, the key apps, but I could easily see Google just

[00:58:13] Paul Roetzer: Basically, you know, emulating it or making it better, and I just start using Google because I'm in there all day anyway.

[00:58:21] Paul Roetzer: Yeah, I don't know. Like you said, it's kind of like, just, The Information has it. They're usually extremely accurate in their reporting, but there's not a heck of a lot to go on other

[00:58:29] Paul Roetzer: than that. Certainly worth paying attention to.

[00:58:31] Judge rejects most ChatGPT copyright claims from book authors

[00:58:31] Mike Kaput: For sure. So in other news, a U.S. district judge in California has largely sided with OpenAI in three separate copyright lawsuits that have been brought against the company by a group of authors that included Sarah Silverman, Michael Chabon, and Paul Tremblay. According to Ars Technica, quote, "By allegedly repackaging original works as ChatGPT outputs, the authors alleged, OpenAI's most popular chatbot was just a high-tech grift that seemingly violated copyright laws, as well as state laws preventing unfair business practices and unjust enrichment."

[00:59:07] Mike Kaput: Now, we won't get into all the legalities and legal terms here; we're certainly not lawyers. But basically it appears a judge agreed with OpenAI that these claims about every one of ChatGPT's outputs being an infringement of copyright were not actually the case. Quote, "Authors failed to convince the judge that OpenAI violated the Digital Millennium Copyright Act by allegedly removing copyright management information, like authors' names, titles of works, and terms and conditions for the use of their work, from training data." So all of these claims and issues that authors had with OpenAI using some of this work in ChatGPT have basically been thrown out by a judge, except for the claim under California's unfair competition law

[00:59:59] Mike Kaput: that OpenAI used copyrighted works to train ChatGPT without permission. So, to kind of step back from this, we don't have an overall ruling on whether or not OpenAI used copyrighted works to train its models, but on all the other stuff around that claim that some of these authors tagged them with, it sounds like a judge is siding with OpenAI.

[01:00:24] Mike Kaput: So it certainly sounds like we could see a future here where OpenAI ends up maybe paying some fines and we all move on, whether we agree with what they've done here or not. Like, what did you think of this ruling?

[01:00:36] Paul Roetzer: I still,

[01:00:38] Paul Roetzer: in our uneducated, non-attorney opinion, think that's the most likely outcome, because I think these

[01:00:47] Paul Roetzer: cases are going to take years to try, and by the time we have any finality to it, like Supreme Court-level finality, they'll be on GPT-8 and they'll have done it all through synthetic data and licensed data

[01:01:01] Paul Roetzer: and, like, all this stuff about whether it was or was not fair use is just going to be irrelevant. So I could definitely see a scenario

[01:01:09] Paul Roetzer: where they just end up paying some big hefty fines and who cares because they're worth five trillion dollars by then and you pay your hundred billion and move on.

[01:01:16] Paul Roetzer: So I don't know. I just, there's going to be so many cases like this and, you know, this really happened, but it doesn't really mean anything. And I

[01:01:24] Paul Roetzer: think we're just going to, it's going to be that ongoing theme. And again, like, nobody knows. So anybody who tells you with a high level of confidence that this is or is not what's going to happen,

[01:01:34] Paul Roetzer: they're just driving clicks. Like, we don't know. They don't know,

[01:01:39] Paul Roetzer: let the, let the courts figure it all out, I guess.

[01:01:42] Mike Kaput: Alright,

[01:01:43] Mike Kaput: so as we wrap up this week, Paul, you posted

[01:01:47] Revisiting Paul’s MAICON 2023 Keynote

[01:01:47] Mike Kaput: this week on LinkedIn about all these crazy advancements we saw in AI this week, and how you're kind of personally thinking about AI's ability to create more time for us in our lives, in our work, in our personal lives, etc. Could you maybe share a little bit more to close us out here

[01:02:05] Mike Kaput: about your thoughts.

[01:02:06] Paul Roetzer: Yeah, I'll just read what

[01:02:08] Paul Roetzer: I wrote, because I think it just sort of speaks for itself. I was talking with my friend Jeff Roars about, like, time and AI and the impact it has. And

[01:02:17] Paul Roetzer: it's something I've thought a lot about, as you know, Mike. Like, you and I have talked

[01:02:21] Paul Roetzer: a bunch about this, and, you know, the more and more

[01:02:23] Paul Roetzer: we meet with, you know, company leaders who are trying to drive efficiency and productivity and looking for ways to increase profits, drive revenue, and reduce costs.

[01:02:32] Paul Roetzer: And, like, these are the things that come up all the time. And so for my MAICON,

[01:02:38] Paul Roetzer: if you're not familiar, our Marketing AI Conference we run every year, MAICON 2023, the keynote

[01:02:43] Paul Roetzer: I did, I ended with this line, and I'll just kind of read it and we'll call it a day for the

[01:02:49] Paul Roetzer: podcast, because I think it's just something good for people to reflect on. So what I said at MAICON was, part of the reason I began pursuing artificial intelligence

[01:02:57] Paul Roetzer: 12 years ago was because I saw it as a path to extend time.

[01:03:01] Paul Roetzer: I'd always wondered why time seemed to move faster as we got older. I realized at least for me that the busier I was and the longer hours I worked, the faster the days and weeks seemed to fly by.

[01:03:14] Paul Roetzer: When our first child was born in 2012,

[01:03:17] Paul Roetzer: I began to truly appreciate every second of every day. I knew I couldn't get more than 24 hours out of a day, but I thought

[01:03:24] Paul Roetzer: it might be possible to slow those 24 hours down. I didn't understand exactly what AI was back then, but it seemed to hold the potential to unlock productivity gains, which would allow us to redistribute the time saved and live more

[01:03:37] Paul Roetzer: fulfilling lives.

[01:03:40] Paul Roetzer: What I've since learned is that AI on its own won't extend time for me or anyone else. It will increase efficiency and productivity at a scale we have never seen in human history, but we have to

[01:03:53] Paul Roetzer: make the choice

[01:03:54] Paul Roetzer: to use the increases to benefit humanity. Otherwise, we'll just fill the time with more work and find new ways to maximize profits at the expense of people.

[01:04:04] Paul Roetzer: We have one chance to get this right. AI can give us the greatest gift of all, more time. Or it can be just another technological revolution

[01:04:13] Paul Roetzer: that expands our work, fills our hours, and leads us down the path of never ending productivity gains for profits. We can choose to make

[01:04:20] Paul Roetzer: the future more intelligent and more human.

[01:04:24] Mike Kaput: That is an awesome way to end this crazy week. I'm sure a lot of people are, you know, certainly excited, but also a... with how things are

[01:04:38] Paul Roetzer: Just such a... Yeah, I think it's just good perspective. It's good for me too, honestly. Like, sometimes I just go back and think about why we're doing this, like, why did I pursue this path initially? And with that,

[01:04:50] Paul Roetzer: I'm

[01:04:50] Paul Roetzer: going to go spend Sunday with my kids.

[01:04:52] Mike Kaput: I

[01:04:52] Mike Kaput: love it. Paul, as always, thank you so much for breaking down everything in AI this week for us. I would just encourage everyone, if you have not subscribed yet to our newsletter, This Week in AI, I'd highly encourage you to do so. Go to

[01:05:06] Mike Kaput: marketingaiinstitute.com

[01:05:09] Mike Kaput: forward slash newsletter to sign up. We break down both the stories we just discussed even further and all the ones we don't have time for in a podcast episode.

[01:05:19] Mike Kaput: And there's usually at least half a dozen other things going on that you should know about. So This Week in AI is a comprehensive digest that you can quickly read to learn everything that's going on in AI.

[01:05:33] Paul Roetzer: All right. Thanks, Mike. Everyone, we'll

[01:05:35] Paul Roetzer: talk with you again next week.

[01:05:36] Paul Roetzer: Have a great week.

[01:05:37] Mike Kaput: Thanks, Paul!

[01:05:39] Thanks for listening to The AI Show. Visit MarketingAIInstitute.com to continue your AI learning journey. And join more than 60,000 professionals and business leaders who have subscribed to the weekly newsletter, downloaded the AI blueprints, attended virtual and in-person events, taken our online AI courses, and engaged in the Slack community.

[01:06:03] Until next time, stay curious and explore AI.