1-Bit LLMs Proof
- 1bitLLM: models have been trained on the RedPajama dataset (100B tokens) and are available
on the HF Hub: https://lnkd.in/ewQ_fE_v
- Nous Research: models have been trained on the Dolma dataset (60B tokens) and are also
available on the HF Hub. See the announcement with the WandB charts: https://lnkd.in/edpChHcr
The quantization scheme is super straightforward, based on an absmean function. You can learn more
about it in this article: https://lnkd.in/epnr8bBZ
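As a rough illustration of the idea, here is a minimal NumPy sketch of absmean quantization as described in the BitNet b1.58 paper: weights are scaled by their mean absolute value, then rounded and clipped to the ternary set {-1, 0, 1}. Function and variable names are my own, not from either codebase.

```python
import numpy as np

def absmean_quantize(w, eps=1e-6):
    # gamma is the absmean: the mean absolute value of the weight matrix.
    gamma = np.abs(w).mean()
    # Scale by gamma, round to the nearest integer, clip to {-1, 0, 1}.
    w_q = np.clip(np.round(w / (gamma + eps)), -1, 1)
    return w_q, gamma

# Toy example: a 2x3 weight matrix becomes purely ternary.
w = np.array([[0.8, -0.05, -1.2],
              [0.3,  0.0,  -0.4]])
w_q, gamma = absmean_quantize(w)
# w_q contains only -1, 0, or 1; gamma is kept to rescale at inference time.
```

Every weight thus fits in ~1.58 bits (log2 of 3 states), which is where the name comes from.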
We still need to see how it scales (~30B parameters), but I'm super curious about 1.58-bit Mamba and MoE models.
It would be a tremendous breakthrough.