It is small but still really good
I have used the micro variant primarily with Perplexica, and I must say it is really good at summarization and at answering follow-up questions. In my testing it has outclassed instruct models 2-3 times its size on these tasks.
You are not alone. It blew my mind how good it is per billion parameters. As an example, I can't think of another model at 4B or less that will give you working code. I haven't tried it on agentic tasks, but that would be interesting.
Hadn't heard of it until this post. What about it impressed you over something like Llama, Mistral, or Qwen?
For anyone who wants more info: it's a 7B mixture-of-experts model released under Apache 2.0!

"Granite-4-Tiny-Preview is a 7B parameter fine-grained hybrid mixture-of-experts (MoE) instruct model fine-tuned from Granite-4.0-Tiny-Base-Preview using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets tailored for solving long context problems. This model is developed using a diverse set of techniques with a structured chat format, including supervised fine-tuning, and model alignment using reinforcement learning."
Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. However, users may fine-tune this Granite model for languages beyond these 12 languages.
Intended Use: This model is designed to handle general instruction-following tasks and can be integrated into AI assistants across various domains, including business applications.
Capabilities
- Thinking
- Summarization
- Text classification
- Text extraction
- Question-answering
- Retrieval Augmented Generation (RAG)
- Code related tasks
- Function-calling tasks
- Multilingual dialog use cases
- Long-context tasks, including long document/meeting summarization, long document QA, etc.

There's also a "small" and a "micro" variant, which are a 32B-A6B MoE model and a 3B dense model respectively.
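If you want to poke at it yourself, a minimal transformers sketch looks something like this. The repo id is my guess from the card quoted above (check the ibm-granite org on Hugging Face), and the rest is standard chat-template boilerplate:

```python
# Rough sketch: chatting with the 7B preview via Hugging Face transformers.
# The repo id below is an assumption based on the model card quoted above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-4.0-tiny-preview"  # assumed repo id, verify on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Summarize this meeting transcript: ..."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the prompt
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```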
granite4:micro-h should be able to run on machines with 4 GB of RAM.
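That's the Ollama tag, so assuming it's actually live in the Ollama library (I haven't pulled it myself), trying it is just a couple of lines with the ollama Python client:

```python
# Minimal sketch using the ollama Python client (pip install ollama).
# Assumes the granite4:micro-h tag mentioned above exists in Ollama's library.
from ollama import chat

response = chat(
    model="granite4:micro-h",
    messages=[{"role": "user", "content": "Give me a one-paragraph summary of MoE models."}],
)
print(response["message"]["content"])
```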
You can run Qwen3 4B Thinking at q4 quantization in about 2.5 GB, and it's probably a better model, too.
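That 2.5 GB figure is roughly what the q4 math gives you:

```python
# Back-of-the-envelope memory math for a 4B-parameter model at 4-bit (q4).
# Illustrative numbers only; real runtimes add KV cache and other overhead.
params = 4e9
bytes_per_param = 0.5                          # 4 bits = half a byte per weight
weights_gb = params * bytes_per_param / 1e9
print(f"weights alone: ~{weights_gb:.1f} GB")  # ~2.0 GB; overhead pushes it toward 2.5 GB
```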
Do you mean IBM Granite 4?
Good catch