The amount of effort this took, literally co-opting AMD engineers and working for five months only to get closer but still not usable, means they are not even close to usable. What team wanting to do ML training/inference can afford that much downtime for zero benefit? And how many teams, apart from a few big ones, can get AMD to devote that many resources to them?
And if you're training a model costing you millions, the last thing you need is a buggy, untested stack breaking training, or perhaps worse, silently adding noise that makes your models perform worse or increases training time.
By the time AMD is usable out of the box, Nvidia will have moved further ahead.
It sure doesn't sound done. It's a one-off hacked set of scripts tied to incompatible chunks of a ton of libraries. What happens when you want or need other parts of the PyTorch ecosystem and its billion libraries? Are you going to grab more AMD engineers and waste another 5 months getting those to work?
Meanwhile those libraries ship with CUDA support that runs on Nvidia's oldest and newest hardware out of the box.
So no, it cannot be reused by others in production any more than my custom hacked car engine mod can be added by Ford to every car in existence.
Have you done any deep professional production work on any of these stacks? I have, and would never, ever put stuff like the stuff in the article in production. It's nowhere near ready for production use.
Just like that? So a little work and now they are competitive? You know how much work "just a little bit of work" is doing there? They have a cultural issue, and it will take months to fix if they are lucky, and then you start tackling the tech debt they've built up. By that time it will be another generation.
Anthony has been doing training on 4 of our MI300x systems and has been getting great results... but of course, he is a genius, and writing his own code...
It would be a buy signal if their actions (better drivers) showed that they are seriously working on improving the software. "This could be great _if_ you go through the trouble of doing it right!" is not persuasive, and any sane person who has the choice between troubleshooting shitty software and things just working will go with green. Look at the george hotz archive YouTube channel and watch the videos where he's debugging the AMD drivers; it damn near ruins the man. And George is not the type to give up at the first roadblock: there are multiple 5-8 hour videos where he just tries to get the thing to work. The madman ended up just writing his own driver lol.
It does seem like an improvement. Six or twelve months ago, I recall a lot of crashes and even more basic problems. “If you tune it right, it’s awesome” is a big step forward compared to that.
I expect Nvidia shares to increase tomorrow because of the article while AMD shares are not likely to do well. It is odd how we read the same thing and came to opposite conclusions.
> Give AMD Engineers more compute and engineering resources to fix and improve the AMD ecosystem, they have very few internal gpu boxes relative to what Nvidia provides to their engineers.
This is real. We’ve found ourselves having to give hardware to engineers at AMD because they’re unable to get allocation of it internally.
Sadly common at hardware companies. The most extreme case I've heard of is ASML, who supposedly doesn't keep any machines of their own. They test against "almost-ready" machines right before they go out the door to customers.
ASML might be an extreme outlier though, don't those things cost like $50 million+ each?
That would be for last-gen process nodes, and from a second- or third-hand supplier, if you could even find one. ASML makes very few fully working machines each year, and the cost and throughput of those machines are astronomical.
They have spare parts, you'd bet, and I'd bet they have an SLA with each customer where an engineer is basically on call nearby in case a single thing doesn't work or a random part breaks or needs servicing.
Asianometry did a great video on the cost of downtime when an ASML machine goes down in a fab. While I am not directly in this field and can't speak to the accuracy of the numbers Jon gives, he does not seem like someone who just makes stuff up; his video production quality for niche topics is quite good.
Almost a decade ago KFAB had a fire, power was cut, and everything in process was dumped. They planned to restart, but it ended up being cheaper to close the whole facility.
Probably for the best though, KFAB had been discharging several tons of solvents, cleaning agents, and reagents per year into the surrounding area [for as long as it ran](https://enviro.epa.gov/facts/tri/ef-facilities/#/Release/640...)
Put a 1 or 2 in front of that
try 400 mil
Coming up next: "We bought AMD stock on the open market and used it to compensate AMD engineers".
You joke, but it is almost a genuine investment opportunity here for a large player.
Spend a billion on AMD shares, then spend another billion on an out-of-house software team to solve the software problem and more than double the share price.
Taking into account that there are players that already own billions in AMD shares, they could probably do that as well. On the other hand perhaps it would be better for them, as major shareholders, to have a word with AMD management.
I don't have the inside baseball but I have seen those weird as hell interviews with Lisa Su where she gets asked point blank about the software problems and instead of "working on it, stay tuned" -- an answer that costs nothing to give -- she deflects into "performance is what matters," which is the kind of denial that rhymes exactly with the problems they are having. No, the horsepower of your F1 racecar doesn't matter if the engine doesn't start and there's a wheel missing! You need to fix those problems before the horsepower can matter! Please tell me you are fixing the starter and the wheel!
Hopefully I am reading too much into this. Hopefully she doesn't have any weird hangups over investing in software and it all just takes time to Do It Right after GPGPU got starved in the AMD winter. But if it is a weird hangup then yeah, 100%, ownership needs to get management in line because whiffing a matmul benchmark years into a world where matmul is worth trillions just ain't it.
> she deflects into "performance is what matters," which is the kind of denial that rhymes exactly with the problems they are having.
It's not a deflection, but a straightforward description of AMD's current top-down market strategy of partnering with big players instead of doubling down to have a great OOBE for consumers & others who don't order GPUs by the pallet. It's an honest reflection of their current core competencies, and of the opportunity presented by Nvidia's margins.
They are going for bang-for-buck right now, aiming at data center workloads, and the hyperscalers care a lot more about perf/$ than about raw performance. Hyperscalers are also more self-sufficient at software: they have entire teams working on PyTorch, Jax, and writing kernels.
>Hyperscalers are also more self-sufficient at software: they have entire teams working on PyTorch, Jax, and writing kernels.
None of this matters because AMD drivers are broken. No one is asking AMD to write a PyTorch backend. The idea that AMD will have twice the silicon performance of Nvidia to make up for the performance lost to bad software is a pipe dream.
> None of this matters because AMD drivers are broken
Do you honestly think the MI300 has broken drivers?
Engineers at hyperscalers are struggling through all the bugs too. It's coming at notable opportunity cost for them, at a time when they also want an end to the monopoly. Do they buy AMD and wade through bug after bug, regression after regression, or do they shell out slightly more money for Nvidia GPUs and have it "just work".
AMD has to get on top of their software quality issues if they're ever going to succeed in this segment, or they need to be producing chips so much faster than Nvidia that it's worth the extra time investment and pain.
That's the excuse used by every big company shitting out software so broken that it needs intensive professional babysitting.
I've been on both sides of this shitshow, I've even said those lines before! But I've also been in the trenches making the broken shit work and I know that it's fundamentally an excuse. There's a reason why people pay 80% margin to Nvidia and there's a reason why AMD is worth less than the rounding error when people call NVDA a 3 trillion dollar company.
It's not because people can't read a spec sheet, it's because people want their expensive engineers training models not changing diapers on incontinent equipment.
I hope AMD pulls through but denial is _not_ the move.
What exactly are they in denial about? They are aware that software is not a strength of theirs, so they partner with those who are great at it.
Would you say AMD is "shitting the bed" by not building its own consoles too? You know AMD could build a kick-ass console, since they are doing the heavy lifting for the PlayStation and the Xbox[1], but AMD knows as well as anybody that they don't have the skills to wrangle studio relationships or figure out which games to finance. Instead, they lean hard on their HW skills and let Sony Entertainment and the Xbox division do what they do best.
1. And the Steam Deck, plus half a dozen Deck clones.
There is probably one employee - either a direct report of Su's or maybe one of her grandchildren in the org chart - who needs to "get it". If they replaced that one manager with someone who sees graphics cards as a tool to accelerate linear algebra then AMD would be participating more effectively in a multi-trillion dollar market. They are so breathtakingly close to the minimum standards of competence on this one. We know from the specs that the cards they produce should be able to perform.
This is a case-specific example of failure, it doesn't generalise very well to other markets. AMD is really well positioned for this very specific opportunity of historic proportions and the only thing holding them back is a somewhat continuous stream of unforced failures when writing a high quality compute driver. It seems to be pretty close to one single team of people holding the company back although organisational issues tend to stem from a level or two higher than the team. This could be the most visible case of value destruction by a public company we'll see in our lifetimes.
Optimistically speaking maybe they've already found and sacked the individual responsible and we're just waiting for improvement. I'm buying Nvidia until that proves to be so.
I was surprised to hear recently that the same happens at NVIDIA! Hopefully less frequently, but I can understand why it's hard to keep many units on hand given the level of external demand.
This seems so insane, is anyone actually doing the work to provide an alternative to CUDA? Maybe Google?
https://www.intel.com/content/www/us/en/developer/articles/t...
Honestly, probably NVIDIA itself, since they contribute significantly to many open-source projects (MLIR), and also make their SoTA GEMM/Conv implementations open-source and available for study (Cutlass).
> AMD is attempting to vertically integrate next year with their upcoming Pollara 400G NIC, which supports Ultra Ethernet, hopefully making AMD competitive with Nvidia.
Infiniband is an industry standard. It is weird to see the industry invent yet another standard to do effectively the same thing just because Nvidia is using it. This “Nvidia does things this way so let’s do it differently” mentality is hurting AMD:
* Nvidia has a unified architecture so let's split ours into RDNA and CDNA.
* Nvidia has a unified driver, so let's make a different driver for every platform.
* Nvidia made a virtual ISA (PTX) for backward compatibility. Let's avoid that.
* Nvidia is implementing tensor cores. Let's avoid those on RDNA. Then implement them on CDNA and call them matrix cores.
* Nvidia is using Infiniband like the rest of the HPC community. Let's use Ethernet.

I am sure people can find more examples. Also, they seem to have realized their mistake in splitting their architecture into RDNA and CDNA, since they are introducing UDNA in the future to unify them like Nvidia does.

You're painting this like AMD is off to play in their own sandbox when it's more like the entire industry is trying to develop an alternative to Infiniband.
Ultra Ethernet is a joint project between dozens of companies organized under the Linux Foundation.
https://www.phoronix.com/news/Ultra-Ethernet-Consortium
>> The Linux Foundation has established the Ultra Ethernet Consortium "UED" as an industry-wide effort founded by AMD, Arista, Broadcom, Cisco, Eviden, HPE, Intel, Meta, and Microsoft for designing a new Ethernet-based communication stack architecture for high performance networking.
You probably can't call it "industry standard" yet but the goal is obviously for it to become one.
I wrote:
> It is weird to see the industry invent yet another standard to do effectively the same thing just because Nvidia is using it.
This is a misstep for all involved, AMD included. Even if AMD is following everyone else by jumping off a bridge, AMD is still jumping too.
That MatMul performance is fairly shocking: that far below theoretical maximum on what should be a fairly low-overhead operation.
I would at least hope that they know where the speed is going, but the issue of torch.matmul and F.Linear using different libraries with different performance suggests that they don't even know which code they are running, let alone where the slow bits in that code are.
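A minimal way to check that claim yourself (a sketch, assuming a CUDA or ROCm build of PyTorch with a GPU visible; the 4096-square fp16 shapes and repetition count are arbitrary choices): time torch.matmul against torch.nn.functional.linear on the same data and see whether throughput diverges, which would suggest they are dispatching to different GEMM backends.

```python
import time
import torch
import torch.nn.functional as F

def bench(fn, *args, reps=50):
    # Warm up, then time `reps` calls; synchronize so GPU work is included.
    for _ in range(5):
        fn(*args)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(reps):
        fn(*args)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / reps

n = 4096
a = torch.randn(n, n, device="cuda", dtype=torch.float16)
w = torch.randn(n, n, device="cuda", dtype=torch.float16)

t_matmul = bench(torch.matmul, a, w.t())
t_linear = bench(F.linear, a, w)  # linear(x, W) computes x @ W.T

flops = 2 * n**3
print(f"torch.matmul: {flops / t_matmul / 1e12:.1f} TFLOP/s")
print(f"F.linear:     {flops / t_linear / 1e12:.1f} TFLOP/s")
```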
Low overhead in what sense? matmul is kinda complicated and there are varying, complex state-of-the-art algorithms for it, no? And then if you know things about the matrices in advance you can start optimizing for that, which adds another layer of complexity.
There are, but everyone uses variations of the same O(n^3) algorithm taught in introductory college linear algebra classes, because it is numerically stable and can be made extremely fast through tweaks that give spatial locality and good cache characteristics. Meanwhile, the asymptotically faster algorithms have such large constants hidden in their big-O notation that they are not worth using. FFT-based matrix multiplication, which is O((n^2)log(n)), also has numerical instability on top of running slower.
By overhead I'm talking about the things that have to be done supplementary to the algorithm.
While there are complex state-of-the-art algorithms, those algorithms exist for everyone. The overhead is the bit that had to be done to make the algorithm work.
For instance for sorting a list of strings the algorithm might be quick sort. The overhead would be in the efficiency of your string compare.
For matmul I'm not sure what your overhead is beyond moving memory, multiplying, and adding. A platform touting a memory bandwidth and raw compute advantage should have that covered. Where is the performance being lost?
I guess the only real options are stalls, unnecessary copies, or unnecessary computations.
Which algorithm to pick for which matrix shapes differs and is not straightforward to figure out. AMD currently wants you to "tune" ops, likely searching for the right algorithm for your shapes, while Nvidia has accurate heuristics for picking the right algorithm.
> For matmul I'm not sure what your overhead is beyond moving memory, multiplying, and adding. A platform touting a memory bandwidth and raw compute advantage should have that covered. Where is the performance being lost?
The use of the word 'algorithm' is incorrect.
Look... I do this sort of work for a living. There has been no useful significant change to matmul algorithms.
What has changed is the matmul process.
Modern perf optimization on GPUs has little to do with algorithms and everything to do with process optimization. This is akin to factory floor planning and such. You have to make sure the data is there when the processing units need it, and the data is coming in at the fastest rate possible, while keeping everything synchronized to avoid wrong results or deadlocks.
Really compute power has nothing to do with it. It's a waste of time to even consider it. We can compute matmuls much faster than you can naively bring memory to the processing units. Whoever solves that problem will become very rich.
To that end, NVIDIA ships libraries that will choose from a wide variety of implementations the appropriate trade-offs necessary for SoTA perf on matmuls of all shapes and data types.
To be fair, GEMV is memory bandwidth bound and that is what token generation in transformers uses. GEMM is the compute bound one, provided you do not shoehorn GEMV into it. That special case is memory bandwidth bound.
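A rough back-of-the-envelope sketch of that distinction (made-up but representative sizes, FP16 elements assumed): arithmetic intensity, i.e. FLOPs per byte moved, is around 1 for GEMV but grows with n for square GEMM, which is why the former is bandwidth-bound and the latter compute-bound.

```python
# Arithmetic intensity = FLOPs / bytes moved (higher => more compute-bound).
BYTES = 2  # FP16

def gemv_intensity(n):
    # y = A @ x : 2*n*n FLOPs, roughly n*n + 2*n elements touched.
    flops = 2 * n * n
    bytes_moved = BYTES * (n * n + 2 * n)
    return flops / bytes_moved

def gemm_intensity(n):
    # C = A @ B : 2*n^3 FLOPs, roughly 3*n*n elements touched (ideal reuse).
    flops = 2 * n**3
    bytes_moved = BYTES * 3 * n * n
    return flops / bytes_moved

for n in (4096, 8192):
    print(n, f"GEMV ~{gemv_intensity(n):.1f} FLOP/byte, GEMM ~{gemm_intensity(n):.0f} FLOP/byte")
```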
Yes and no. Conceptually it's just three nested loops. The fiddly part is unrolling the inner loop and swizzling the data layouts in such a way that the cores can be kept "fed" efficiently. This usually means breaking things up into cache-sized chunks along some axis.
It's easy enough that there's blog articles showing single developers getting within spitting distance of NVIDIA's highly optimised code. As in, 80-something-percent of the best available algorithms!
All NVIDIA did was "put the effort in", where the effort isn't some super clever algorithm implemented by a unique genius, but they simply made hundreds of variants of the matmul algorithm optimised for various scenarios. It's a kind of algorithmic brute force for eking out every last percentage point for every shape and size of input matrices on every GPU model and even for various SLI configurations.
From what I've seen, AMD has done... none of this.
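To make the "three nested loops plus cache-sized chunks" idea concrete, here is a toy NumPy sketch (the 64-wide block and matrix sizes are arbitrary; real GPU kernels do the same thing with tiles staged through shared memory and registers rather than array slices):

```python
import numpy as np

def matmul_naive(a, b):
    # The textbook three nested loops, with the innermost loop as a dot product.
    n, k = a.shape
    _, m = b.shape
    c = np.zeros((n, m), dtype=a.dtype)
    for i in range(n):
        for j in range(m):
            c[i, j] = a[i, :] @ b[:, j]
    return c

def matmul_blocked(a, b, block=64):
    # Same arithmetic, done block by block so each tile of A and B is reused
    # while it is still hot in cache instead of being streamed in repeatedly.
    n, k = a.shape
    _, m = b.shape
    c = np.zeros((n, m), dtype=a.dtype)
    for i in range(0, n, block):
        for j in range(0, m, block):
            for p in range(0, k, block):
                c[i:i+block, j:j+block] += a[i:i+block, p:p+block] @ b[p:p+block, j:j+block]
    return c

a = np.random.rand(256, 256).astype(np.float32)
b = np.random.rand(256, 256).astype(np.float32)
assert np.allclose(matmul_naive(a, b), a @ b, atol=1e-2)
assert np.allclose(matmul_blocked(a, b), a @ b, atol=1e-2)
```

Picking the right tile sizes per matrix shape, data type, and GPU is exactly the per-scenario variant work described above.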
> It’s not just that it’s immature software, they need to change how they do development.
I remember geohot saying something similar about a year ago
I expect everyone has been saying it for a while, the calls are just getting more strident and public as it becomes clear that AMD's failures are strategic rather than tactical. And as people try to build business on their half-hearted attempts.
I still think it is a mistake to say that CUDA is a moat. IMO the problem here is that AMD still doesn't seem to think that GPGPU compute is a thing. They don't seem to understand that someone might want to use their graphics cards to multiply matrices independently of a graphics pipeline. All the features CUDA supports are irrelevant compared to the fact that AMD can't handle GEMM performantly out of the box. In my experience it just can't do it; back in the day my attempts to multiply matrices would crash drivers. That isn't a moat, but it certainly is something spectacular.
If they could manage an engineering process that delivered good GEMM performance then the other stuff can probably get handled. But without it there really is a question of what these cards are for.
I wonder to what extent Vulkan compute could be used for this. Of course, it is only an option on their RDNA GPUs, since CDNA is not for graphics, even though that is the G in GPU.
Yeah, 80% margins on matrix multiplication should be a puddle not a moat but AMD is more scared of water than the witch that melts in Wizard of Oz so I guess the puddle is a moat after all.
Anyone who looks at the mess that is ROCm and the design choices they made could easily see that.
GPU support that lagged behind for years, no support for APUs, and no guaranteed forward compatibility were clear signs that, as a whole, they have no idea what they are doing when it comes to building and shipping a software ecosystem.
To that you can add the long history of both AMD and ATI, before they merged, of releasing dog shit software and then dropping support for it.
On the other hand you can take any CUDA binary even one that dates back to the original Tesla and run it on any modern NVIDIA GPU.
> GPU support that lagged behind for years, no support for APUs, and no guaranteed forward compatibility were clear signs that, as a whole, they have no idea what they are doing when it comes to building and shipping a software ecosystem.
This is likely self-inflicted. They decided to make two different architectures: CDNA for HPC and RDNA for graphics. They are reportedly going to rectify this with UDNA in the future, but that is what they really should have done from the start. Nvidia builds one architecture with different chips based on it to accommodate everything, and code written for one easily works on another because it is the same architecture. This is before even considering that they have PTX as an intermediate language, which serves a similar purpose to Java bytecode in allowing write once, run anywhere.
This was happening before CDNA was even a thing.
They didn't release support even for all GPUs of the same generation, and they dropped support for some GPUs within 6 months of releasing a version that actually "worked".
The entire core architecture behind ROCM is rotten.
P.S. NVIDIA usually has multiple CUDA feature levels even within a generation. The difference is that a) they always provide a fallback option, usually without any manual intervention, and b) as long as you define the minimum target when you build the binary, you are guaranteed to run on all past hardware supported by the feature level you targeted and on all future hardware.
> On the other hand you can take any CUDA binary even one that dates back to the original Tesla and run it on any modern NVIDIA GPU
This particular difference stems from the fact that NVIDIA has PTX and AMD does not have any such thing, i.e. this kind of backwards compatibility will never be possible on AMD.
Backward compatibility is one thing, but not having forward compatibility is a killer.
Having to create a binary that targets a very specific set of hardware, having no guarantees, and in fact having a guarantee that it won't run on future hardware, is what makes ROCM unusable for anything you intend to ship.
What’s worse is that they also drop support for their GPUs faster than Leo drops support for his girlfriends once they reach 25…
So not only do you have to recompile, there is no guarantee that your code will work with future versions of ROCM, or that future versions of ROCM can still produce binaries compatible with your older hardware.
Like how is this not the first design goal to address when you are building a CUDA competitor I don’t fucking know.
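A small sketch of how that shows up in practice (assuming a ROCm build of PyTorch; `torch.cuda.get_arch_list()` reports the architectures compiled into the binary): if your GPU's gfx target is not in that list, the binary simply has no code object for it, and there is no PTX-style intermediate to JIT from.

```python
import torch

# Architectures baked into this PyTorch binary at build time:
# gfx targets (e.g. gfx90a, gfx1100) on ROCm builds, sm_* on CUDA builds.
print("built for:", torch.cuda.get_arch_list())

if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
    # If the device's architecture is missing from the list above, kernels
    # simply are not there to launch on ROCm; a CUDA build would instead JIT
    # the embedded PTX for a newer GPU it has never seen.
```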
> Like how is this not the first design goal to address when you are building a CUDA competitor I don’t fucking know.
The words "tech debt" do not have any meaning at AMD. No one understands why this is a problem.
In buggy numerical code, many bugs go through the software stack without any problems: no crash, no errors. For example, you might switch two double parameters to a function, and if their value ranges are similar, everything appears to work fine, except the results are all bullshit.
If there are bugs in AMD code that prevent running tests, I bet there are even more bugs that don't manifest until you look at results.
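A toy illustration of that failure mode (the function and values are hypothetical, not from the article): swap two same-range parameters and nothing crashes, the output stays finite and plausible, and only a comparison against a trusted reference reveals the problem.

```python
import numpy as np

def scaled_activation(x, scale, shift):
    # Intended: scale * tanh-based activation + shift.
    return scale * np.tanh(x) + shift

x = np.linspace(-3.0, 3.0, 1000)

good = scaled_activation(x, 1.5, 0.1)
bad = scaled_activation(x, 0.1, 1.5)   # arguments swapped: no crash, no error

# Both outputs are finite and smooth, so nothing in the stack complains...
print("finite:", np.isfinite(bad).all(), "range:", bad.min(), bad.max())

# ...and only checking against a known-good reference exposes the bug.
print("matches reference:", np.allclose(bad, good, rtol=1e-3))
```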
"The software needs to be better" is (and was) an easy call to make for anyone paying attention. The problem is that "AMD just needs to do better" is not and will never be an implementable strategy. Engineering isn't just about money. It's also about the process of exploring all the edge cases.
"We recommend that AMD to fix their GEMM libraries’ heuristic model such that it picks the correct algorithm out of the box instead of wasting the end user’s time doing tuning on their end." Is such a profoundly unhelpful thing to say unless you imagine AMDs engineers just sitting around wondering what to do all day.
AMD needs to make their drivers better, and they have. Shit just takes time.
Sounds more like they were (and still are) being sloppy. "Be better" is one thing; "runs without a fatal crash" is what Semianalysis is talking about.
I find the calling out of Lisa Su by name a bit odd in this kind of product review. I've seen the same from George Hotz. Is it a kind of misogynism? What about the weaknesses of AMD software for AI GPU compute makes this so directly related to the CEO? When Intel's GPUs underwhelmed, I didn't see anyone calling out Gelsinger and telling him how to run his business.
People call out Gelsinger all the time.
And yes, she is definitely responsible for this. Probably more than Gelsinger.
At Intel it is not so obvious what they should have done to improve their fabs to better compete with TSMC, which is groundbreaking tech where you often have to make risky bets.
At AMD it was pretty obvious what had to be done to better compete in AI, and it was basic software engineering, not lithography wizardry. Totally achievable by just spending money, increasing head count, hiring top talent, and firing underperformers.
They have so much low hanging fruit that could have been solved just by hiring 5 or 10 software engineers.
In this case it’s because Dylan Patel of Semianalysis interviews Lisa Su regularly and presumably has a direct line to her, and because Lisa and the rest of AMD leadership are absolutely reading the article. It’s unclear if Pat would have (e.g. I don’t think Pat ever sat down for a chat with Semianalysis like Lisa has).
> Is it a kind of misogynism?
-.-
At what point did any of the criticism have anything to do with her gender? Honest question, I'm scratching my head trying to see where misogyny comes into play. Surely it's not that _because_ she's a woman any criticism from men must be misogynistic? Would it be different if Intel's CEO was female? Or do the people criticising need to be of the same gender as those they're criticising in order for there to be no misogyny?
Truly just trying to get an idea of what sort of perspective it takes to get to
> Is it a kind of misogynism?
It might be because Lisa has been so outstandingly effective at making AMD competitive across multiple product lines against bigger competitors and this feels to some people like an oversight that AMD could easily solve. I suspect the current situation with ML software at AMD is a consequence of a very focused company and not an easy fix without sacrificing something else.
I don't think many people can keep track of who's running Intel let alone have hope that with a little work they can deliver reasonable substitutes for NVIDIA's products.
This is Lisa Su's fault.
Lisa Su is an exemplary CEO, and widely recognized as such. She is exemplary for doing what she did with AMD, and did it without appealing at all to her sex... just on sheer competence. I think it's a bit presumptuous to suddenly call out her sex as if it matters. In reality, she's being talked about exactly like any male CEO. I have great faith in her though. She is clearly extraordinarily capable, and honestly a real inspiration to women in tech
This is the exact type of victim mentality that we don't need. There are absolutely insane numbers of people calling out Gelsinger by name and blaming him solely for failures at Intel.
I once tried installing AMD ROCm to run a small LLM on a consumer-grade AMD GPU. It was the most horrible software install experience I have ever had. I never did manage to get it working.
AMD could spend their market cap in one year to get this done in three and it would be a coup for the shareholders. They could hire all of the best NVIDIA engineers at double their current comp, crush the next TSMC node on Apple levels, and just do it and if it got them a quarter of NVDA’s cap it would be a bargain.
They don’t fucking want to! Believing this is anything like a market is fucking religion.
Try to make sense...
They can spend their market cap by either:
1: issuing new shares worth their market cap, diluting existing shareholders to 50%.
2: Or borrow their market cap and pay interest by decreasing profits. "AMD operating margin for the quarter ending September 30, 2024 was 5.64%" so profits would be extremely impacted by interest repayments.
Either way your suggestion would be unlikely to be supported by shareholders.
> crush the next TSMC node on Apple levels
I would guess Apple is indirectly paying for the hardware (to avoid repatriating profits) or guaranteeing usage to get to the front of the line at TSMC. Good luck AMD competing with Apple: there's a reason AMD sold GlobalFoundries and there's a reason Intel is now struggling with their foundry costs.
And it comes across as condescending to assume you know better than a successful company.
When 4 trillion dollars are at stake, the financing is available or it fucking better be.
What in God’s name do we pay these structured finance, bond-issue assholes 15% of GDP for if not to finance a sure thing like that?
It sure as hell ain’t for their taste in Charvet and Hermes ties, because the ones they pick look like shit.
So basically Nvidia is like Windows and AMD is like Wine. I think trying to emulate CUDA and using forked Nvidia libraries is not the best strategy for AMD. They should have made a clean break and come out with a fresh API, like Apple's Metal.
Back in the late 1990s I met the ATI guys, and they were slipshod then as well. That the ATI legacy of special-casing things lives on is sadly, not too surprising for me.
I made the mistake of clicking on one of the links to commits they mentioned, only to end up at an MR changing multiple autogenerated YAML files with 10k-line diffs and incomprehensible names. I guess this is where the whole "bad talent" thing comes in: a year later you are thousands of YAML files deep, but still no one can run a simple PyTorch compiled op and get the performance you sold. Absolutely unhinged.
B100, B200 ramp-up is only 4-6 months away.
What I couldn't find are inference benchmarks for consumer hardware. Just pick a reasonable workload with llama.cpp or ollama and show us some numbers (even something as rough as the sketch below would do).
I'm particularly interested in building a Home Assistant machine that can run the voice assistant locally (STT/TTS/LLM) while using the least amount of power / generating the least amount of heat and noise.
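For what it's worth, a rough local-inference timing run with the llama-cpp-python bindings looks something like the following. This is not a standardized benchmark, just a way to get a tokens-per-second number on whatever consumer GPU you have; the model path, prompt, and token counts are placeholders:

    # Crude tokens/sec measurement with llama-cpp-python; model path is a placeholder.
    import time
    from llama_cpp import Llama

    llm = Llama(model_path="./model.gguf", n_ctx=2048, n_gpu_layers=-1)  # -1 offloads all layers to the GPU

    start = time.perf_counter()
    out = llm("Explain what a heat pump does in two sentences.", max_tokens=128)
    elapsed = time.perf_counter() - start

    tokens = out["usage"]["completion_tokens"]
    print(f"{tokens} tokens in {elapsed:.2f}s -> {tokens / elapsed:.1f} tok/s")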
AMD's software for consumer GPUs demonstrates a lack of seriousness. ROCm only officially supports RDNA2 and RDNA3 GPUs (their last two generations of hardware), and for some reason most of them are supported only on Windows (https://rocm.docs.amd.com/projects/install-on-windows/en/lat...) and not Linux (https://rocm.docs.amd.com/projects/install-on-linux/en/lates...), where most AI training and inference happens. In particular, Linux users can only start playing with ROCm on a top-of-the-line, power-guzzling card, whereas they can get started with CUDA on basically any Nvidia GPU, desktop or laptop.
In practice, consumer Navi 21 based cards (RX 6900 XT etc.) and Navi 31 cards (RX 7900 XTX etc.) are compatible with PyTorch on Linux.
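If you want to check whether a given card is actually picked up, here is a minimal sketch, assuming a ROCm build of PyTorch is installed. ROCm builds expose HIP devices through the torch.cuda API, and for cards outside the official support list people commonly report needing the HSA_OVERRIDE_GFX_VERSION environment variable as a workaround:

    # Minimal check that a ROCm build of PyTorch sees the GPU and can run a kernel.
    import torch

    print(torch.version.hip)            # a version string on ROCm builds, None on CUDA builds
    print(torch.cuda.is_available())    # True if the driver/runtime is healthy
    if torch.cuda.is_available():
        print(torch.cuda.get_device_name(0))        # e.g. a Navi 21 / Navi 31 board
        x = torch.randn(1024, 1024, device="cuda")
        print((x @ x).sum().item())                 # tiny matmul to confirm kernels actually run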
What they write about ROCm and Windows is equivocation: they target only one application, Blender. PyTorch + ROCm + Windows does not work.
I bought a 6900 XT myself around launch time (the RTX 3080 I ordered never arrived; it was the chip-shortage era...), and it took around two years for PyTorch to become actually usable on it.
In practice, everyone who wants to do ML at home buys Nvidia and pays the premium.
Sad but true. Years ago, pre-2018, Nvidia was already the go-to hardware supplier if you were doing anything with neural networks.
I remember CUDA being much buggier back then, but it still worked pretty well.
Back then AMD wasn't considered real competition for ML/AI hardware.
Glad, as always, to see more competition in the market to drive innovation. AMD seems to be putting larger VRAM configurations on consumer cards, which is nice to see; I just hope the AI/ML experience of their software ecosystem gets better.
The obligatory link:
https://xkcd.com/644/
That said, I would not expect it to stay working for long as long as ROCm is a dependency, since AMD drops support for its older GPUs quickly, while Nvidia continues to support older GPUs with less frequent legacy driver updates.
Based on a machine we had bought at my university with 4 AMD W6800s (which are just RX 6800s with double the VRAM), it's bad _even if it works at all_.
You might just check out the Home Assistant Voice:
https://ameridroid.com/products/home-assistant-voice-preview...
Yes, that's exactly what I was checking out. You need fast enough hardware to run the speech to text, text to speech and (most importantly) LLM locally: https://www.youtube.com/watch?v=XvbVePuP7NY (he has dual 3090 GPUs but that's not a practical setup for most people - budget / power / noise).
It would be cool to see these benchmarks on the newly released Jetson Orin Nano Super, like faster-whisper.
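As a rough idea of what such a benchmark could look like, here is a minimal STT timing sketch with faster-whisper; the model size, audio file, and settings are placeholders, and actual numbers depend heavily on quantization and hardware:

    # Time a single transcription with faster-whisper; file name and model size are placeholders.
    import time
    from faster_whisper import WhisperModel

    model = WhisperModel("small", device="cuda", compute_type="float16")

    start = time.perf_counter()
    segments, info = model.transcribe("sample.wav", beam_size=5)
    text = " ".join(s.text for s in segments)   # iterating the generator runs the actual decoding
    elapsed = time.perf_counter() - start

    print(f"{info.duration:.1f}s of audio transcribed in {elapsed:.1f}s "
          f"({info.duration / elapsed:.1f}x realtime)")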
My anecdata on AMD hiring: they just aren't moving fast enough. They still wanted to fly people out, scheduled three weeks in advance, for AI compiler roles. That's just not going to work. Startups and companies like NVIDIA and OpenAI are hiring much faster, with much less onerous interview processes and higher compensation. This is not a mystery: people work for money and aren't going to jump through more hoops to be paid less.
Disappointed that there wasn’t anything on inference performance in the article at all. That’s what the major customers have announced they use it for.
> CUDA Moat Still Alive
Wrong conclusion. AMD is slower than NVidia, but not _that_ much slower. They are actually pretty cost-competitive.
They just need to make some improvements, and they'll be a very viable competitor.
The amount of effort this team needed (literally co-opting AMD engineers and working for 5 months) to get results that are closer but still not usable means the platform is nowhere near usable. What team wanting to do ML training/inference can afford that much downtime for zero benefit? And how many teams, outside a few big ones, can get AMD to devote that many resources just for them?
And if you're training a model costing you millions, the last thing you need is a buggy, untested stack breaking training, or, perhaps worse, silently giving you noise that degrades your models or increases training time.
By the time AMD gets usable out of the box at this point, NVidia will have moved further ahead.
Sure. But this work is done, and can be reused by others.
Meanwhile, Nvidia hardware is expensive and still is in short supply. AMD might look quite tempting.
It sure doesn't sound done. It's a one-off, hacked-together set of scripts tied to incompatible chunks of a ton of libraries. What happens when you want or need other parts of the PyTorch ecosystem and its billion libraries? Are you going to get more AMD engineers and spend another 5 months getting those to work?
Meanwhile, those libraries ship with CUDA running on Nvidia's old and newest releases out of the box.
So no, it cannot be reused by others in production any more than my custom hacked car engine mod can be added by Ford to every car in existence.
Have you done any deep professional production work on any of these stacks? I have, and I would never, ever put the kind of thing described in the article into production. It's nowhere near ready for production use.
There is a difference between doing something just for yourself and making it usable by others.
Like the article says, if the model changes a little, this work needs to be almost entirely thrown out.
All the cloud providers list MI300x as more expensive than H100. So if you compare performance/cost it is even worse.
Just like that? So a little work and now they are competitive? Do you know how much heavy lifting "just a little bit of work" is doing there? Theirs is a cultural issue, and it will take months to fix if they are lucky, and then you get to start tackling the tech debt they've built up. By that time it will be another hardware generation.
Sounds like a buy signal for AMD. If you run the right branch and set the right env vars, the thing flies.
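For context, the knobs people mean by "the right env vars" are typically along these lines. Treat this as an illustrative sketch rather than the specific flags from the article: TunableOp is PyTorch's mechanism for picking the fastest ROCm GEMM backend at runtime, and the exact set of flags a given workload needs will differ.

    # Illustrative only: set the TunableOp flags before importing torch so they
    # take effect; the first matmul triggers tuning, later calls reuse the cache.
    import os
    os.environ["PYTORCH_TUNABLEOP_ENABLED"] = "1"
    os.environ["PYTORCH_TUNABLEOP_TUNING"] = "1"
    os.environ["PYTORCH_TUNABLEOP_FILENAME"] = "tunableop_results.csv"

    import torch

    a = torch.randn(4096, 4096, device="cuda", dtype=torch.bfloat16)
    b = torch.randn(4096, 4096, device="cuda", dtype=torch.bfloat16)
    c = a @ b   # tuned GEMM; results are cached to the CSV for future runs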
Anthony has been doing training on 4 of our MI300x systems and has been getting great results... but of course, he is a genius, and writing his own code...
https://x.com/HotAisle/status/1870984996171006035
https://x.com/zealandic1/status/1869857713280430349
https://x.com/zealandic1/status/1868810042168033623
It would be a buy signal if their actions (better drivers) showed that they are seriously working on improving the software. "This could be great _if_ you go through the trouble of doing it right!" is not persuasive, and any sane person who has the choice between troubleshooting shitty software and things just working will go with green. Look at the George Hotz archive YouTube channel and watch the videos where he's debugging the AMD drivers; it damn near ruins the man. And George is not the type to give up at the first roadblock: there are multiple 5-8 hour videos where he just tries to get the thing to work. The madman ended up just writing his own driver, lol.
It does seem like an improvement. Six or twelve months ago, I recall a lot of crashes and even more basic problems. “If you tune it right, it’s awesome” is a big step forward compared to that.
Unfortunately,
> Getting reasonable training performance out of AMD MI300X is an NP-Hard problem.
I expect Nvidia shares to increase tomorrow because of the article while AMD shares are not likely to do well. It is odd how we read the same thing and came to opposite conclusions.
1. AMD always had a lot of hype already priced in, it is no different with AI.
2. AMD has always shipped a bad software stack, it is no different with AI.