O3 "Arc AGI" Postmortem

garymarcus.substack.com

38 points by signa11 5 hours ago

Summary: "O3 is probably not AGI, despite hot debates on short-form social media sites."

jsheard 4 hours ago

Also "marketing will make misleading claims or lies-by-omission to hype a product", which is a lesson we seem to keep forgetting. That goes in general but doubly so for AI companies, which have to keep investors in a perpetual state of manic hype until they figure out a way to actually make any money.
- margalabargala 2 hours ago
  
  > That goes in general but doubly so for AI companies
  Agree but I'd like to expand. It goes doubly so for "whatever industry/technology is popular right now". AI is that at the moment, but won't be forever. The extra lying is a function of popularity and thus available capital, not something inherent to "AI".

cs702 3 hours ago

Yes, there's a lot of hype, and also quite a bit of drama, but there's also real progress.

Every time I read anything by Gary Marcus, I feel like we're stuck in a silly loop:

  while True:
    progress = HardWorkByMany_function(progress)
    print(summarize(progress))
    # >>> Models can now do this previously impossible thing!
    criticism = GaryMarcus_function(progress)
    print(criticism)
    # >>> It's not intelligence, because it can't do this other thing.

tim333 29 minutes ago

There was quite a funny bit on Hinton using Marcus as an example of a human who doesn't understand neural nets but confabulates: https://www.reddit.com/r/singularity/comments/1ajemjc/spoile...

keepingscore 2 hours ago

Fwiw I don't think he's talking to you. My feed is filled with ai influencers getting all kinds of likes and comments, ready to declare AGI is here!
- cs702 2 hours ago
  
  Yes, there's a lot of hype, much of it fueled by all those "ai influencers."

CliveBloomers an hour ago

This is the moment businesses have been waiting for—the game-changer that’s about to redefine everything. Expect to see it rolled out across every industry, in every country, everywhere, by next year. And it’s not just hype; it’s that good. It’s so simple and effective that even the receptionist could deploy it, and the CEO? They can trim down the staff, skip the meetings, and spend their day just asking it questions.

No need for tech experts anymore. Software Engineers? DevOps? Gone. The whole IT department? Practically obsolete. All that’s left is the CEO, the cloud, and this. It’s lean, it’s powerful, and it’s set to shake the foundations of how businesses operate. The future isn’t coming—it’s already here, and it’s going to be wild.

bugglebeetle 3 hours ago

I’m as skeptical of OpenAI as just about anyone, but Marcus just seems to engage in the kind of contrarian ranting that happens to people when they spend too much time on social media. I would say a better indication of o3’s performance is François Chollet seeming kind of bummed that LLMs have reached this point, contrary to his expectations.

EDIT:

In the interest of not being overly negative myself, I found Nathan Lambert’s more measured and detailed breakdown far more instructive:

https://www.interconnects.ai/p/openais-o3-the-2024-finale-of...

jagrsw 3 hours ago

Chollet's shift in tone is interesting. As recently as 2022, he expressed strong skepticism about LLMs solving ARC-AGI within a decade. Now, he seems to suggest that significant progress on the challenge is possible within the next 1-2 years.
Of course, he strives to maintain a consistent public persona, but this shift strongly implies that o3's performance has significantly altered his views.
This field, much like economics, is full of experts who can convincingly explain why they were wrong :)
PS: ARC-AGI is a very well-thought-out/interesting/useful test.
- dimmuborgir 3 hours ago
  
  The result announcement blog post sounded too hypey and gave the impression that he changed his tone, but for the past year he has been consistently saying that LLM + "discrete program search" is the way to go. All the top-scoring submissions before o3 had followed the same brute-force strategy. Even o3 is more or less doing the same brute force under the hood, maybe not discrete.

SpicyLemonZest 3 hours ago

> The video should have been much clearer about what was actually tested and what was actually trained. To the average listener it may have sounded like the AI took the test cold, with a few sample items, like a human would, but that’s not actually what happened.

I have to agree with this. I haven't watched the video, but that's exactly what the hype surrounding it made me think up until I read this sentence. An AI system that can be fine-tuned to achieve human-level performance on a variety of intellectual puzzles still sounds like an important advance, don't get me wrong, but being able to manufacture a model that does what you need is very different from having a model that's already capable of meeting everyone's needs.