Last month, Microsoft released a paper showing that LLMs with ternary (-1, 0, 1) weights can match the performance of 16-bit LLMs: https://lnkd.in/e6JXMSch

In concrete terms, this *greatly* improves LLMs' efficiency in terms of memory, throughput, and energy consumption. It's a complete shift in the way we pre-train LLMs.

Since then, two independent studies managed to reproduce these results:

- 1bitLLM: models have been trained on the RedPajama dataset (100B tokens) and are available
on the HF Hub: https://lnkd.in/ewQ_fE_v
- Nous Research: models have been trained on the Dolma dataset (60B tokens) and are also
available on the HF Hub. See the announcement with the WandB charts: https://lnkd.in/edpChHcr
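Both reproductions can be loaded like any other checkpoint on the HF Hub. Here is a minimal sketch with the transformers library; the repo id is a placeholder, not an actual model name, so swap in one of the checkpoints from the links above:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id: replace it with one of the reproduction
# checkpoints linked above (1bitLLM or Nous Research).
repo_id = "some-org/bitnet-b1.58-reproduction"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
# Some of these repos may need trust_remote_code=True for custom model code.
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Ternary weights are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```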

The quantization scheme is super straightforward: an absmean function maps each weight to -1, 0, or 1. You can learn more about it in this article: https://lnkd.in/epnr8bBZ
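For intuition, here is a minimal PyTorch sketch of absmean ternary quantization as described in the paper (the function name and the epsilon handling are my own): divide the weight matrix by its mean absolute value, round to the nearest integer, and clip to {-1, 0, 1}.

```python
import torch

def absmean_quantize(weights: torch.Tensor, eps: float = 1e-5):
    """Sketch of absmean (1.58-bit) weight quantization.

    Scale the weights by their mean absolute value, then round each one
    to the nearest value in {-1, 0, 1}. Returns the ternary weights and
    the scale, so outputs can be rescaled after the matmul.
    """
    # gamma = average magnitude of the weight matrix
    gamma = weights.abs().mean().clamp(min=eps)
    # scale, round, and clip to the ternary set {-1, 0, 1}
    quantized = (weights / gamma).round().clamp(-1, 1)
    return quantized, gamma

# Example: quantize a random weight matrix
w = torch.randn(4, 4)
w_q, scale = absmean_quantize(w)
print(w_q)    # entries in {-1., 0., 1.}
print(scale)  # w ≈ w_q * scale
```

Only one full-precision scale is kept per weight matrix, which is why the matmul itself can be done with additions and sign flips instead of multiplications.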

We still need to see how it scales (~30B parameters), but I'm super curious about 1.58-bit Mamba and MoE models. That would be a tremendous breakthrough.
