It is small but still really good

  • d0nkey@lemmy.zip · 6 days ago

    I have used the micro variant primarily with Perplexica, and I must say it is really good at summarization and answering follow-up questions. On these tasks, in my testing, it has outclassed instruct models 2-3 times its size.

  • afk_strats@lemmy.world · 7 days ago

    You are not alone. It blew my mind how good it is per billion parameters. As an example, I can’t think of another model at 4B or under that will give you working code. I haven’t tried it on agentic tasks, but that would be interesting.

  • SmokeyDope@lemmy.world (mod) · edited 18 days ago

    Hadn’t heard of it until this post. What about it impressed you over something like Llama, Mistral, or Qwen?

    For anyone who wants more info: it’s a 7B mixture-of-experts model released under Apache 2.0!

    " Granite-4-Tiny-Preview is a 7B parameter fine-grained hybrid mixture-of-experts (MoE) instruct model fine-tuned from Granite-4.0-Tiny-Base-Preview using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets tailored for solving long context problems. This model is developed using a diverse set of techniques with a structured chat format, including supervised fine-tuning, and model alignment using reinforcement learning."

    Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. However, users may fine-tune this Granite model for languages beyond these 12 languages.

    Intended Use: This model is designed to handle general instruction-following tasks and can be integrated into AI assistants across various domains, including business applications.

    Capabilities

    Thinking
    Summarization
    Text classification
    Text extraction
    Question-answering
    Retrieval Augmented Generation (RAG)
    Code related tasks
    Function-calling tasks
    Multilingual dialog use cases
    Long-context tasks including long document/meeting summarization, long document QA, etc.
    

    https://huggingface.co/ibm-granite/granite-4.0-tiny-preview
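    If you want to poke at it locally, here’s a minimal sketch using Hugging Face transformers with the model ID from the link above. Assumptions on my part: the standard AutoModelForCausalLM/AutoTokenizer interface works for this architecture, and the prompt and generation settings are illustrative only; check the model card for IBM’s recommended setup.

    ```python
    # Minimal sketch: load granite-4.0-tiny-preview and run one chat turn.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "ibm-granite/granite-4.0-tiny-preview"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # ~14 GB at bf16; quantize if VRAM is tight
        device_map="auto",
    )

    # Instruct models expect the chat template, not raw text.
    messages = [{"role": "user", "content": "Summarize what a mixture-of-experts model is in two sentences."}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output = model.generate(input_ids, max_new_tokens=200)
    # Decode only the newly generated tokens, skipping the prompt.
    print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
    ```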

    • Xylight@lemdro.id · edited 18 days ago

      There are also “small” and “micro” variants, which are a 32B-A9B MoE (32B total parameters, ~9B active per token) and a 3B dense model, respectively.