
VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY

UNIVERSITY OF SCIENCE
FACULTY OF INFORMATION TECHNOLOGY

LAB 02
Natural Language Processing
Applications

Student: 21127043 - Lư Trung Hậu

Class: 21CNTT

Instructors: Mr. Nguyễn Hồng Bửu Long
Mr. Lương An Vinh

I. Detailed steps to evaluate the LLaMA 2 model using Hugging Face:


1. Install lm-evaluation-harness and bitsandbytes:

 First, we install the necessary packages lm-evaluation-harness and bitsandbytes. lm-evaluation-harness is a framework for evaluating language models, while bitsandbytes is a quantization library that lets models be loaded in 8-bit or 4-bit precision (the install commands are sketched below).
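
A minimal install sketch, assuming a notebook environment (lm-evaluation-harness is published on PyPI as lm-eval; it can also be installed from its GitHub repository):

!pip install lm-eval bitsandbytes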

2. Hugging Face Hub Authentication:

 Evaluating gated models from the Hugging Face Hub (such as meta-llama/Llama-2-7b-hf) requires authentication.
 We use notebook_login() from huggingface_hub for authentication.
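
In a notebook, this is:

from huggingface_hub import notebook_login
notebook_login()  # opens a prompt for a Hugging Face access token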

3. List Available Evaluation Tasks:

 Before running an evaluation, we can list all available evaluation tasks using the command lm-eval --tasks list.
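
In a notebook cell:

!lm-eval --tasks list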

4. Evaluate Llama 2 7B on TruthfulQA, HellaSwag, and Winogrande:

Evaluate the LLaMA 2 model on the TruthfulQA, HellaSwag, and Winogrande tasks.

Use the !lm_eval command with the following parameters (the assembled command is shown after the list):

--model hf: Use a model from the Hugging Face Hub.

--model_args pretrained=meta-llama/Llama-2-7b-hf,dtype="float16": Load the LLaMA 2 7B model with its weights in float16 precision.

--tasks truthfulqa,hellaswag,winogrande: Evaluate on the TruthfulQA, HellaSwag, and Winogrande tasks.

--device cuda:0: Run the evaluation on the first CUDA GPU.

--batch_size 6: Set the batch size to 6.

--output_path ./eval_llama2_7b: Save the results to the directory eval_llama2_7b.

--log_samples: Record per-sample results for later inspection.
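
Putting the parameters together, with every value taken from the list above:

!lm_eval --model hf \
    --model_args pretrained=meta-llama/Llama-2-7b-hf,dtype="float16" \
    --tasks truthfulqa,hellaswag,winogrande \
    --device cuda:0 \
    --batch_size 6 \
    --output_path ./eval_llama2_7b \
    --log_samples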

5. Evaluate Quantized LLaMA 2 Model with bitsandbytes NF4:

 Use similar parameters as before, with the additional model argument load_in_4bit=True to load the model quantized with NF4 (see the sketch after this step).
 The batch size is raised to 14; the 4-bit weights use less GPU memory, so larger batches fit.

NF4 (4-bit NormalFloat) is a quantization data type used to reduce the precision of a neural network's weights to four bits. By quantizing the model to four bits, NF4 significantly reduces memory usage and computational requirements while aiming to maintain the fidelity of the model's representations.
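
A sketch of the corresponding command, assuming the same model, tasks, and device as step 4 (the output directory name is illustrative; the lab does not specify one for this step):

!lm_eval --model hf \
    --model_args pretrained=meta-llama/Llama-2-7b-hf,load_in_4bit=True \
    --tasks truthfulqa,hellaswag,winogrande \
    --device cuda:0 \
    --batch_size 14 \
    --output_path ./eval_llama2_7b_nf4 \
    --log_samples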

6. Evaluate Quantized LLaMA 2 Model with bitsandbytes NF4 and LoRA Adapter:

 Use similar parameters as before, with the additional model argument peft=kaitchup/Llama-2-7B-oasstguanaco-adapter to load the fine-tuned LoRA adapter on top of the quantized base model (a sketch follows below).

A LoRA adapter is a lightweight adaptation mechanism added to pre-trained large language models (LLMs) for efficient fine-tuning on downstream tasks. It introduces low-rank update matrices alongside the frozen original weights, sharply reducing the number of trainable parameters and the computational cost. LoRA adapters enable task-specific fine-tuning without modifying the original LLM's parameters.
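
A sketch under the same assumptions as step 5 (batch size and output directory are illustrative; peft is the argument named above):

!lm_eval --model hf \
    --model_args pretrained=meta-llama/Llama-2-7b-hf,load_in_4bit=True,peft=kaitchup/Llama-2-7B-oasstguanaco-adapter \
    --tasks truthfulqa,hellaswag,winogrande \
    --device cuda:0 \
    --batch_size 14 \
    --output_path ./eval_llama2_7b_nf4_lora \
    --log_samples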

7. Evaluate Quantized LLaMA 2 Model with GPTQ:

 Use the !lm_eval command with similar parameters as above, but point pretrained at the GPTQ-quantized model (pretrained=kaitchup/Llama-2-7b-gptq-4bit); a sketch follows below.
 The batch size is raised to 16, which the 4-bit model's smaller memory footprint allows.

GPTQ is a post-training quantization method designed for GPT-style pre-trained language models. It reduces the precision of the model's weights to a lower bit-width representation, such as four bits, quantizing one layer at a time while minimizing the error introduced in that layer's outputs. GPTQ aims to reduce memory usage and computational complexity while preserving performance on downstream tasks.
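
A sketch of the command (the output directory is illustrative; section II uses eval_llama2_7b_gptq for the locally cloned copy):

!lm_eval --model hf \
    --model_args pretrained=kaitchup/Llama-2-7b-gptq-4bit \
    --tasks truthfulqa,hellaswag,winogrande \
    --device cuda:0 \
    --batch_size 16 \
    --output_path ./eval_llama2_7b_gptq \
    --log_samples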

II. Detailed steps to evaluate the LLaMA 2 model without using Hugging Face:

1. Clone Repository:

 It clones the repository containing the GPTQ-quantized version of the LLaMA 2 model from the specified URL.
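
A plausible sketch of this step; the lab does not spell out the URL, so the repository location below is an assumption based on the model name kaitchup/Llama-2-7b-gptq-4bit from step I.7 and the local path used in step 3:

!git lfs install  # assumption: the model weights are stored with Git LFS
!git clone https://huggingface.co/kaitchup/Llama-2-7b-gptq-4bit /content/Llama-2-7b-gptq-4bit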

2. Install Packages:

 It installs the Python packages needed for evaluating the quantized model with GPTQ (auto-gptq and optimum) and the other packages required for running model evaluation (lm-evaluation-harness and bitsandbytes).
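
A minimal sketch, assuming the PyPI package names auto-gptq, optimum, lm-eval, and bitsandbytes:

!pip install auto-gptq optimum
!pip install lm-eval bitsandbytes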

3. Run Evaluation:

 It runs the evaluation process using the lm_eval command. Here's a breakdown of the parameters used (the assembled command is shown after the list):

--model hf: Specifies the Hugging Face transformers backend; here the model is loaded from a local path rather than the Hub.

--model_args pretrained=/content/Llama-2-7b-gptq-4bit: Specifies the path to the cloned, GPTQ-quantized LLaMA 2 model.

--tasks truthfulqa,hellaswag,winogrande: Specifies the evaluation tasks to run (TruthfulQA, HellaSwag, and Winogrande).

--device cuda:0: Specifies the device used for evaluation (the CUDA-compatible GPU with ID 0).

--batch_size 16: Specifies the batch size for evaluation.

--output_path ./eval_llama2_7b_gptq: Specifies the directory where evaluation results will be saved.

--log_samples: Indicates that per-sample results should be logged for later inspection.
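
Assembled from the parameters above:

!lm_eval --model hf \
    --model_args pretrained=/content/Llama-2-7b-gptq-4bit \
    --tasks truthfulqa,hellaswag,winogrande \
    --device cuda:0 \
    --batch_size 16 \
    --output_path ./eval_llama2_7b_gptq \
    --log_samples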

