I think I’d buy something with Strix Halo or Strix Point if there were official ROCm support. As of 6.4.1 from earlier this month, there still isn't, as I understand it. I'd be delighted to be corrected on this matter.
michaellarabel 11 hours ago [-]
There is (unofficial) ROCm support for Strix Halo with ROCm 6.4.1, but Llama.cpp and the like were segfaulting, while ROCR-based OpenCL and other workloads were working:
ROCm GPU Compute Performance With AMD Ryzen AI MAX+ "Strix Halo": https://www.phoronix.com/review/amd-strix-halo-rocm-benchmar...
By the way thanks for working on this! I read all of your reviews on this device and it's been very informative.
pantalaimon 8 hours ago [-]
Does it work with RustiCL?
michaellarabel 7 hours ago [-]
I haven't gotten around to trying it, but it's on my TODO list if I have the time before needing to send the review unit back (likely next week or so, I'd expect).
perplexe 11 hours ago [-]
What’s AMD's strategy in not supporting consumer chips with ROCm? It's puzzling. There's no way to get a critical mass of development interest if the bar to entry is this high.
_aavaa_ 10 hours ago [-]
Their plan appears to be TheRock [0]: further open-sourcing, engaging the community, and leveraging that to expand support faster.
There are some recent discussions on YouTube about it [1], including one with a senior VP [2].
[0]: https://github.com/ROCm/TheRock [1]: https://www.youtube.com/watch?v=6tASUo7UqNw&t=4551 [2]: https://www.youtube.com/watch?v=0B8JOtS2Tew
I think that part of the issue is the split between CDNA for data centers [1] and RDNA for consumer products [2], with AMD only having the money to focus on the bigger data center market. There are rumors that both architectures will be merged into UDNA in the future, which will hopefully improve ROCm support, but for now it's lacking.
[1] https://www.amd.com/en/technologies/cdna.html [2] https://www.amd.com/en/technologies/rdna.html
It's not rumor. It came straight from an executive: https://www.tomshardware.com/pc-components/cpus/amd-announce...
The strategy seems to be targeting data centres and focusing support efforts on the cards most likely to be used in one. There is an expectation that ROCm will work on pretty much everything, but their drivers aren't good, so in practice it is dicey whether it actually does.
rcarmo 8 hours ago [-]
You can theoretically run ollama on it, as with the earlier APUs (I did it on a Radeon 780M by allocating 16 of the machine's 32GB to the iGPU). I am _very_ interested in getting my hands on one because I see it as a decent compromise between power, RAM capacity (with soldered-on RAM, it's got pretty good latency) and performance.
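For anyone poking at the same setup, a small sketch of how to check what the amdgpu driver sees as dedicated VRAM (the BIOS carve-out) versus GTT (system RAM it can additionally map for the GPU); the card index is an assumption and may differ on your machine:

    # Dedicated VRAM reserved for the iGPU, in bytes:
    cat /sys/class/drm/card0/device/mem_info_vram_total
    # GTT: system RAM the driver can additionally map into the GPU, in bytes:
    cat /sys/class/drm/card0/device/mem_info_gtt_total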
xeeeeeeeeeeenu 9 hours ago [-]
It seems that AMD/ATI reuses the Radeon 8000 name approximately every 10 years:
2001: Radeon 8000 series
2013: Radeon HD 8000 series
2025: Radeon 8000S series
_aavaa_ 7 hours ago [-]
Checks out. Increase number by 1,000 every year, modulo 10,000.
SecretDreams 11 hours ago [-]
I would have liked to see some discussion of cost and comparisons to dGPUs in the laptop space. I can see it's beating the Intel laptops, but that is expected based on specs.
Maybe I'm missing something?
roenxi 11 hours ago [-]
We have a GPU/CPU fusion chip that is unremarkable, performant, and runs well under Linux out of the box. There isn't a lot of novelty in that, which is, in itself, pretty remarkable.
Plus, although I can't really swear to understand how these chips work, my read is that this is basically a graphics card that can be configured with 64GB of memory. If I'm not misreading that, it actually sounds quite interesting; even AMD's hopeless compute drivers might potentially be useful for AI work if enough RAM gets thrown into the mix. Although my burn wounds from buying AMD haven't healed yet, so I'll let someone else fund that experiment.
jakogut 9 hours ago [-]
I've done it. I have a GPD Pocket 4 with 64 GB of RAM and the less capable HX 370 Strix Point chip.
With ollama, hardware acceleration doesn't really work through ROCm. The framework doesn't officially support gfx1150 (Strix Point, RDNA 3.5), though you can override it to fake gfx1151 (Strix Halo, also RDNA 3.5 and UMA), and it works.
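For reference, that kind of override is typically done with ROCm's HSA_OVERRIDE_GFX_VERSION environment variable; a hedged sketch (whether your ollama build honors it, and the exact spoofed target, are assumptions on my part):

    # Make the ROCm runtime report gfx1151 (11.5.1) instead of the real target;
    # spoofing an unsupported ISA can still crash, so treat this as experimental:
    HSA_OVERRIDE_GFX_VERSION=11.5.1 ollama serve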
I think I got it to work for smaller models that fit entirely into the preallocated VRAM buffer, but my machine only allows for statically allocating up to 16 GB for the GPU, and where's the fun in that? This is a unified memory architecture chip, I want to be able to run 30+ GB models seamlessly.
It turns out, you can. Just build llama.cpp from source with the Vulkan backend enabled. You can use a 2 GB static VRAM allocation, and any additional data spills into GTT, which the driver maps into the GPU's address space seamlessly.
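A minimal sketch of that build, assuming llama.cpp's standard CMake flow and its GGML_VULKAN option (the model file and -ngl value are illustrative, not a recommendation):

    git clone https://github.com/ggml-org/llama.cpp
    cd llama.cpp
    cmake -B build -DGGML_VULKAN=ON
    cmake --build build --config Release -j
    # Offload all layers; whatever exceeds the static VRAM carve-out
    # spills into GTT (system RAM mapped in by the amdgpu driver):
    ./build/bin/llama-cli -m models/gemma-3-27b-Q4_K_M.gguf -ngl 99 -p "Hello"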
You can see a benchmark I performed of a small model on GitHub [0], but I've run up to Gemma3 27b (~21 GB) and other large models with decent performance, and Strix Halo is supposed to have 2-3x the memory bandwidth and compute performance. Even 8b models perform well with the GPU in power-saving mode, inside ~8W.
Come to think of it, those results might make a good blog post.
[0] https://github.com/ggml-org/llama.cpp/discussions/10879
Search for "HX 370"
> We have a GPU/CPU fusion chip that is unremarkable, performant, and runs well under Linux out of the box. There isn't a lot of novelty in that, which is, in itself, pretty remarkable.
By this reasoning, I'd probably argue all modern CPUs and GPUs aren't particularly remarkable/novel. That could even be fine.
At the end of the day, these benchmarks are all meant to inform on relative performance, price, and power consumption so end users can make informed decisions (imo). The *relative* comparisons are low-key just as important as the new bench data point.
jauntywundrkind 7 hours ago [-]
Agreed that for GPU oomph you'd do better with a dGPU; there are some very good deals on dGPU laptops, while even this next-tier-down 8050S is still a rather expensive new purchase by comparison (for now; Strix Halo is brand new). But the dGPU's power consumption will likely be much higher.
Strix Halo as an APU has two very clear advantages. First, I expect power consumption is somewhat better, due to using LPDDR5(x?) and not needing to go over PCIe.
But the real win is that you can get a 64GB or 128GB GPU (well, somewhat less than that)! And there's not really anything stopping 192GB or 256GB builds from happening, now that bigger RAM sizes are finally available. But so far all Strix Halo offerings have soldered-on RAM (non-user-upgradeable, no CAMM2 offerings yet), and no one's doing more than 128GB. That's still a huge LLM compared to what consumers could run before, though! Or lots of LLMs loaded and ready to go! We see similar things with the large unified memory on Mac APUs; it's why Mac minis are sometimes popular for LLMs.
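As a rough worked example (the quantization and overhead figures are ballpark assumptions, not from the article): a 70B-parameter model at 4-bit quantization is about 70e9 x 0.5 bytes ≈ 35 GB of weights, plus a few more GB for KV cache and activations, so it fits easily in the 64GB or 128GB pools discussed here while being far beyond the VRAM of any consumer dGPU.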
Meanwhile Nvidia is charging $20k+ for an A100 GPU with 80GB of RAM. You won't have that level of performance, but you'll be able to fit an even bigger LLM than it can, for 1/10th the price.
There are also a lot of neat applications for DBs or any kind of data-intensive processing, because unified memory means the work can move between CPU and GPU without having to move the data. Normally to use a GPU you end up reading data out of main memory, writing it to the GPU*, then reading it there to do work; here you can skip two thirds of those read/write steps.
There's some very awesome potential for doing query processing on the GPU (ex: PG-Strom). Might also be pretty interesting for a GPU-based router, a la PacketShader (2010).
* Note that PCIe p2p-dma / device RDMA / dma-buf has been getting progressively better, and a lot of attention, across the past half decade, such that e.g. a NIC can send network data direct to GPU memory, or an NVMe drive can send data direct to the GPU or network, without bouncing through main memory. One recent example of many: https://www.phoronix.com/news/Device-Memory-TCP-TX-Linux-6.1...
Giving an APU actually fast RAM has some cool use cases. I'm excited to see the lines blur in computing like this.
dragonwriter 7 hours ago [-]
> Meanwhile Nvidia is charging $20k+ for an A100 GPU with 80GB ram.
Or, sometime in the next month or so, NVidia GB10-based miniPC form factor devices with 128GB (with high-speed interconnect to allow two to serve as a single 256GB system) from various brands (including direct from Nvidia) for $3000-4000 depending on exact configuration and who is assembling the completed system.
rbanffy 11 hours ago [-]
It appears it's using significantly more power than the Intel ones as well. I would be more interested in GPU computing performance for LLMs than graphics, as all I need is a frame buffer.
opencl 10 hours ago [-]
The Strix Halo GPU is roughly around RTX 4060 (laptop version) performance.
Phoronix just doesn't do much mobile dGPU testing in general to have any data to compare with there.
michaellarabel 10 hours ago [-]
Right, unfortunately I was limited by the laptops I have on hand for (re)testing... Since I routinely re-test all laptops fresh, in this case on Ubuntu 25.04, I wasn't able to compare to prior dGPU-enabled laptops that have since had to be returned to vendors, etc.
SecretDreams 8 hours ago [-]
It makes sense, it just leaves the article feeling kind of incomplete as a result. It's more a data point to be compiled against other existing data. They could pull more clicks if they either had some of that other data done in-house or collaborated with another shop that does have the data (or a laptop to loan) readily available.
mjevans 11 hours ago [-]
Valve, could we please get a Ryzen 390-ish Steam Deck platform refresh? Or maybe, if Intel wants game devs to test on their GPUs, they'd cut a small deal to make an Intel Xe refresh.
adgjlsfhk1 10 hours ago [-]
My guess is that Valve will wait at least 1 more generation, since Zen 6 will be on N2P, which will be the first nanosheet generation (with promised 30% power reductions compared to the N3E that Zen 5 uses). Valve is clearly more interested in a console model, with new releases only when there is a massive performance uplift over the prior generation, so I think we are at least 1 gen away. Specifically, I think their target will be 1080p/60fps on low settings (or higher settings using FSR).
ho_schi 8 hours ago [-]
Yep. And that's good for customers and Linux.
Maybe Zen 6. AMD already provides the Z-series specifically for handhelds. The Steam Deck currently uses regular Zen 2 cores with RDNA 2 graphics (the GPU architecture from the Zen 3+ APUs). Best would be passive cooling, which would probably run well with native Linux ports like Counter-Strike 2. But I'm worried that they'll need to use fans again.
adgjlsfhk1 7 hours ago [-]
A fan seems very likely to me. To avoid one, you need to go down to ~5 watts, and then you lose the ability to run faster when plugged in.
shantara 9 hours ago [-]
Valve cares about battery life and user experience more than just raw power, and after watching multiple other handhelds deliver very insubstantial improvements over the Steam Deck OLED at 3x the power consumption, I tend to agree. I'm convinced that SD2 won't happen until there are measurable improvements at the same 15W power limit the current Deck has. There are plenty of older games and indies in the Steam catalog that play perfectly fine on the Deck, and chasing the latest power-hungry UE5 juggernauts is a losing proposition.
One thing I wouldn't mind having on my Deck is FSR4 support, though AMD still hasn't submitted their Vulkan FP8 support proposal, requiring unofficial Mesa hacks to enable it even on desktop Linux.
bryanlarsen 9 hours ago [-]
Shoving a 45W+ chip into a Steam Deck would result in more compromises than most would like.
nexle 7 hours ago [-]
It is mind-blowing how AMD managed to squeeze such a powerful iGPU (essentially a low-end dGPU) into an APU while being much more energy efficient than a dGPU (and thus requiring less heatsink = more compact).
Of course the major problem with Strix Halo is the price. I'm just wondering how much the iGPU contributed to the insane price tag compared to the NPU. If AMD can release a similar APU without the (at least on Linux) useless NPU at more accessible pricing (e.g. like the 8745HS), they could easily dominate the low-end mobile dGPU market.
z3ratul163071 3 hours ago [-]
Oh, amdgpu, my favorite topic. Less than half an hour ago:
https://pbs.twimg.com/media/GsyOHOEW0AAVogO?format=jpg&name=...
How come integrated graphics is in the CPU rather than being part of the chipset? For an actual single-chip SoC, I suppose it has to be, but even my Ryzen 5 7600X has graphics. I would have thought resources on the CPU would be at a premium, so you'd put them all towards compute, particularly since integrated graphics doesn't need to be that powerful.
dragontamer 9 hours ago [-]
The chipset doesn't attach to any RAM.
Today's northbridge (aka the memory controller) is on the CPU. GPUs need a powerful memory controller, and the most powerful memory controller between the CPU and the southbridge/chipset is the one on the CPU itself.
fulafel 8 hours ago [-]
Strix Halo is the powerful-GPU product version.
More generally, there isn't really a place for low-performance integrated graphics any more, and southbridge-style chips are made on old, cheap processes; crippled with poor memory access, they would probably not run any modern desktop well.
A second option for the memory would be to put a small amount of local memory on the mobo along with the chipset, which again would be slow and still costly while losing the normal iGPU advantage of unified GPU & CPU access to the same data (UMA).
dragontamer 8 hours ago [-]
Powerful for iGPU.
I think Strix Halo is up to 40 CUs? Which is more than a 7600 XT (32 CUs) but less than a 7700 XT (54 CUs).
So a bit on the low-end in the scheme of dGPUs. But Strix Halo might be the most powerful iGPU ever made.
fulafel 8 hours ago [-]
High-end Apple M chips still have it beat, I think.
Edit: there's a new marketing claim from AMD that it beats the M4 in some configuration by 2.6x: https://www.amd.com/en/developer/resources/technical-article... ...but that's against a small-memory model of the M4 Pro; I wonder if there are independently benchmarked numbers of the M4 Max vs the 395 out there.
scoopertrooper 9 hours ago [-]
As I understand it, chiplets were introduced to address this problem while still being an SoC.