21127043NLPA
UNIVERSITY OF SCIENCE
FACULTY OF INFORMATION TECHNOLOGY
LAB 02
Natural Language Processing
Applications
Set load_in_4bit=True to specify that the quantized 4-bit model should be used.
The batch size is set to 14 to keep evaluation efficient.
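As a rough illustration, 4-bit loading of this kind is typically configured through transformers together with bitsandbytes. The snippet below is a hypothetical configuration sketch, not the exact command from this lab: the base model name and the NF4 setting are assumptions, and exact arguments depend on the installed library versions.

```python
# Hypothetical sketch: loading a causal LM in 4-bit precision with
# transformers + bitsandbytes (model name and quant type are assumptions).
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,            # load weights in 4-bit precision
    bnb_4bit_quant_type="nf4",    # NF4 data type (assumed here)
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",   # assumption: the base model under evaluation
    quantization_config=quant_config,
)
```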
NF4 (4-bit NormalFloat) is a quantization data type that reduces the precision of numerical
values in neural networks to four bits. Its sixteen representable levels are spaced to match
the distribution of normally distributed weights, so quantizing the model to NF4 significantly
reduces memory usage and computational requirements while aiming to preserve the fidelity of
the model's representations.
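The core idea can be sketched in a few lines: scale each block of weights by its absolute maximum, then snap each value to the nearest of sixteen codebook levels so it fits in 4 bits. This is a toy illustration only; it uses a uniform stand-in codebook, whereas real NF4 derives its levels from quantiles of a standard normal distribution.

```python
# Toy sketch of block-wise 4-bit quantization in the NF4 spirit.
# Assumption: a uniform codebook in [-1, 1]; real NF4 spaces its 16 levels
# to match normally distributed weights.
LEVELS = [i / 7.5 - 1.0 for i in range(16)]  # 16 levels => 4 bits per weight

def quantize_block(weights):
    """Map each float to the index (0..15) of its nearest codebook level."""
    absmax = max(abs(w) for w in weights) or 1.0
    indices = []
    for w in weights:
        scaled = w / absmax                      # normalize into [-1, 1]
        idx = min(range(16), key=lambda i: abs(LEVELS[i] - scaled))
        indices.append(idx)
    return indices, absmax

def dequantize_block(indices, absmax):
    """Recover approximate floats from the 4-bit indices plus one scale."""
    return [LEVELS[i] * absmax for i in indices]

weights = [0.12, -0.5, 0.33, 0.9, -0.07]
idx, scale = quantize_block(weights)
approx = dequantize_block(idx, scale)
# Each index fits in 4 bits; the round-trip error per weight is bounded by
# half the codebook spacing times the block's absmax.
```

Storing one scale per block plus 4-bit indices is what yields the memory savings: sixteen 4-bit weights occupy roughly a quarter of the space of the same weights in 16-bit floats.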
Use the !lm_eval command with the same parameters as above, but specify
the path to the quantized GPTQ model (pretrained=kaitchup/Llama-2-7b-
gptq-4bit).
The batch size is set to 16 to keep evaluation efficient.
GPTQ is a post-training quantization method designed for pre-trained language models in the
GPT family. It reduces the precision of the model's weights to a lower bit-width
representation, such as four bits, quantizing them layer by layer while compensating for the
error each quantized weight introduces. This error-aware quantization process reduces memory
usage and computational complexity while preserving performance on downstream tasks.
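The compensation idea can be caricatured as follows. This is a heavily simplified, hypothetical sketch: real GPTQ weights the error update by the inverse Hessian computed from the layer's calibration inputs, whereas here each weight's quantization error is simply carried onto the next weight.

```python
# Toy sketch of the idea behind GPTQ-style sequential quantization:
# quantize weights one at a time, folding each step's error into the
# not-yet-quantized weights so later choices can compensate.
# Assumption: a uniform 4-bit grid and a plain error carry-over stand in
# for GPTQ's Hessian-weighted update.

def quantize_4bit(x, step=0.1):
    """Round x to a uniform 16-level (4-bit) grid with the given step."""
    level = max(-8, min(7, round(x / step)))
    return level * step

def gptq_like(weights, step=0.1):
    """Sequentially quantize, pushing each error onto the next weight."""
    out = []
    carry = 0.0
    for w in weights:
        q = quantize_4bit(w + carry, step)
        carry = (w + carry) - q      # residual error to compensate downstream
        out.append(q)
    return out
```

Because the running error is never discarded, the quantized weights track the original sum to within half a grid step, which hints at why error-compensating schemes lose less accuracy than naive rounding.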
II. Detailed steps to evaluate the LLaMA 2 model without using Hugging Face.
1. Clone the repository:
2. Install the packages required for running model evaluation (lm-evaluation-harness and bitsandbytes):
3. Run the evaluation:
--model hf: Selects the Hugging Face (transformers) backend, so the model is loaded
with the transformers library, e.g. from the Hugging Face Hub.
III. References: