Taalas HC1: 17,000 tokens/sec on Llama 3.1 8B vs Nvidia H200’s 233 tokens/sec. 73x faster at one-tenth the power. Each chip runs ONE model, hardwired into the transistors.

  • keepthepace@tarte.nuage-libre.fr
    20 hours ago

    You know what I want? A Whisper chip. Or whatever model is better now, but Whisper is good enough for so many applications. Give the sense of hearing to appliances. I guarantee you that in 20 years it will still be used.