Taalas HC1: 17,000 tokens/sec on Llama 3.1 8B vs Nvidia H200’s 233 tokens/sec. 73x faster at one-tenth the power. Each chip runs ONE model, hardwired into the transistors.
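For scale, the quoted figures work out roughly like this (a back-of-envelope check; the absolute power draw isn't given, only the "one-tenth" ratio, so the perf-per-watt number is an inference from the claim, not a measurement):

```python
# Sanity check on the headline numbers, as quoted above.
hc1_tps = 17_000       # Taalas HC1, Llama 3.1 8B, tokens/sec (claimed)
h200_tps = 233         # Nvidia H200, same model, tokens/sec (claimed)

speedup = hc1_tps / h200_tps
print(round(speedup))            # ~73x throughput, matching the headline

power_ratio = 1 / 10             # HC1 power relative to H200 ("one-tenth", claimed)
perf_per_watt_gain = speedup / power_ratio
print(round(perf_per_watt_gain)) # ~730x tokens per joule, if both claims hold
```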
AI really needs dedicated hardware. I feel like if there were more chip manufacturing in the West, we might have more diverse chips.
Frankly, I’m really confused as to why this LLM demand for RAM isn’t encouraging new companies to manufacture RAM. If this is a bubble, then we all just wait it out; if it’s not, someone else will swoop in to take up the market.
It’s not easy to scale up chip production, because it relies on extremely precise machines, which take a long time to build, and many different steps on the way from raw materials to a finished chip.
If you wanted to set up a new RAM chip factory with competitive performance, it’d be an investment of over a billion USD at the bare minimum, and it’d take a few years to set up all the processes before the first chips can roll off the assembly line.
If the bubble has popped by then, your new factory would probably run at a loss, because it’s nearly impossible to compete with companies that have had decades to optimise their production processes.
Even if the bubble hasn’t popped by then, the next problem will likely be wafer supply. Just as only a few companies have the infrastructure to build modern, high-performance computer chips, only a few companies have the infrastructure to produce silicon wafers of high enough quality to build those chips from. And they have only just enough capacity to supply their current customers.
So to solve the wafer problem, someone needs to be willing to invest at least a few hundred million USD to build a new factory for those, which again would struggle to compete in a post-scarcity market. And wafers are far from the only resource with that issue.
TL;DR: It’d be a huge investment and a huge gamble, and would likely end up just moving the problem anyway.