
VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY

UNIVERSITY OF SCIENCE
FACULTY OF INFORMATION TECHNOLOGY

LAB 02
Natural Language Processing
Applications

Student: 21127043 - Lư Trung Hậu

Class: 21CNTT

Instructors: Mr. Nguyễn Hồng Bửu Long
Mr. Lương An Vinh

I. Detailed steps to evaluate the LLaMA 2 model using Hugging Face:


1. Install lm-evaluation-harness and bitsandbytes:

 First, we install the necessary packages lm-evaluation-harness and bitsandbytes. lm-evaluation-harness is a framework for evaluating language models, while bitsandbytes is a quantization library that lets models be loaded in 8-bit or 4-bit precision (the install commands are sketched below).
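
A minimal install sketch, assuming a notebook environment (lm-evaluation-harness is published on PyPI as lm-eval; it can also be installed from its GitHub repository):

!pip install lm-eval bitsandbytes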

2. Hugging Face Hub Authentication:

 Evaluating gated models from the Hugging Face Hub (such as meta-llama/Llama-2-7b-hf) requires authentication.
 We use notebook_login() from huggingface_hub for authentication.
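
In a notebook, this is:

from huggingface_hub import notebook_login
notebook_login()  # opens a prompt for a Hugging Face access token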

3. List Available Evaluation Tasks:

 Before running an evaluation, we can list all available evaluation tasks using the command lm-eval --tasks list.
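
In a notebook cell:

!lm-eval --tasks list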

4. Evaluate Llama 2 7B on TruthfulQA, HellaSwag, and Winogrande:

Evaluate the LLaMA 2 model on the TruthfulQA, HellaSwag, and Winogrande tasks.

Use the !lm_eval command with the following parameters (the assembled command is shown after the list):

--model hf: Use a model from the Hugging Face Hub.

--model_args pretrained=meta-llama/Llama-2-7b-hf,dtype="float16": Load the LLaMA 2 7B model with its weights in float16 precision.

--tasks truthfulqa,hellaswag,winogrande: Evaluate on the TruthfulQA, HellaSwag, and Winogrande tasks.

--device cuda:0: Run the evaluation on the first CUDA GPU.

--batch_size 6: Set the batch size to 6.

--output_path ./eval_llama2_7b: Save the results to the directory eval_llama2_7b.

--log_samples: Record per-sample results for later inspection.
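
Putting the parameters together, with every value taken from the list above:

!lm_eval --model hf \
    --model_args pretrained=meta-llama/Llama-2-7b-hf,dtype="float16" \
    --tasks truthfulqa,hellaswag,winogrande \
    --device cuda:0 \
    --batch_size 6 \
    --output_path ./eval_llama2_7b \
    --log_samples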

5. Evaluate Quantized LLaMA 2 Model with bitsandbytes NF4:

 Use similar parameters as before, with the additional model argument load_in_4bit=True to load the model quantized with NF4 (see the sketch after this step).
 The batch size is raised to 14; the 4-bit weights use less GPU memory, so larger batches fit.

NF4 (4-bit NormalFloat) is a quantization data type used to reduce the precision of a neural network's weights to four bits. By quantizing the model to four bits, NF4 significantly reduces memory usage and computational requirements while aiming to maintain the fidelity of the model's representations.
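
A sketch of the corresponding command, assuming the same model, tasks, and device as step 4 (the output directory name is illustrative; the lab does not specify one for this step):

!lm_eval --model hf \
    --model_args pretrained=meta-llama/Llama-2-7b-hf,load_in_4bit=True \
    --tasks truthfulqa,hellaswag,winogrande \
    --device cuda:0 \
    --batch_size 14 \
    --output_path ./eval_llama2_7b_nf4 \
    --log_samples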

6. Evaluate Quantized LLaMA 2 Model with bitsandbytes NF4 and LoRA Adapter:

 Use similar parameters as before, with the additional model argument peft=kaitchup/Llama-2-7B-oasstguanaco-adapter to load the fine-tuned LoRA adapter on top of the quantized base model (a sketch follows below).

A LoRA adapter is a lightweight adaptation mechanism added to pre-trained large language models (LLMs) for efficient fine-tuning on downstream tasks. It introduces low-rank update matrices alongside the frozen original weights, sharply reducing the number of trainable parameters and the computational cost. LoRA adapters enable task-specific fine-tuning without modifying the original LLM's parameters.
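
A sketch under the same assumptions as step 5 (batch size and output directory are illustrative; peft is the argument named above):

!lm_eval --model hf \
    --model_args pretrained=meta-llama/Llama-2-7b-hf,load_in_4bit=True,peft=kaitchup/Llama-2-7B-oasstguanaco-adapter \
    --tasks truthfulqa,hellaswag,winogrande \
    --device cuda:0 \
    --batch_size 14 \
    --output_path ./eval_llama2_7b_nf4_lora \
    --log_samples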

7. Evaluate Quantized LLaMA 2 Model with GPTQ:

 Use the !lm_eval command with similar parameters as above, but point pretrained at the GPTQ-quantized model (pretrained=kaitchup/Llama-2-7b-gptq-4bit); a sketch follows below.
 The batch size is raised to 16, which the 4-bit model's smaller memory footprint allows.

GPTQ is a post-training quantization method designed for GPT-style pre-trained language models. It reduces the precision of the model's weights to a lower bit-width representation, such as four bits, quantizing one layer at a time while minimizing the error introduced in that layer's outputs. GPTQ aims to reduce memory usage and computational complexity while preserving performance on downstream tasks.
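
A sketch of the command (the output directory is illustrative; section II uses eval_llama2_7b_gptq for the locally cloned copy):

!lm_eval --model hf \
    --model_args pretrained=kaitchup/Llama-2-7b-gptq-4bit \
    --tasks truthfulqa,hellaswag,winogrande \
    --device cuda:0 \
    --batch_size 16 \
    --output_path ./eval_llama2_7b_gptq \
    --log_samples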

II. Detailed steps to evaluate the LLaMA 2 model without using Hugging Face:

1. Clone Repository:

 It clones the repository containing the GPTQ-quantized version of the LLaMA 2 model from the specified URL.
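
A plausible sketch of this step; the lab does not spell out the URL, so the repository location below is an assumption based on the model name kaitchup/Llama-2-7b-gptq-4bit from step I.7 and the local path used in step 3:

!git lfs install  # assumption: the model weights are stored with Git LFS
!git clone https://huggingface.co/kaitchup/Llama-2-7b-gptq-4bit /content/Llama-2-7b-gptq-4bit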

2. Install Packages:

 It installs the Python packages needed for evaluating the quantized model with GPTQ (auto-gptq and optimum) and the other packages required for running model evaluation (lm-evaluation-harness and bitsandbytes).
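
A minimal sketch, assuming the PyPI package names auto-gptq, optimum, lm-eval, and bitsandbytes:

!pip install auto-gptq optimum
!pip install lm-eval bitsandbytes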

3. Run Evaluation:

 It runs the evaluation process using the lm_eval command. Here's a breakdown of the parameters used (the assembled command is shown after the list):

--model hf: Specifies the Hugging Face transformers backend; here the model is loaded from a local path rather than the Hub.

--model_args pretrained=/content/Llama-2-7b-gptq-4bit: Specifies the path to the cloned, GPTQ-quantized LLaMA 2 model.

--tasks truthfulqa,hellaswag,winogrande: Specifies the evaluation tasks to run (TruthfulQA, HellaSwag, and Winogrande).

--device cuda:0: Specifies the device used for evaluation (the CUDA-compatible GPU with ID 0).

--batch_size 16: Specifies the batch size for evaluation.

--output_path ./eval_llama2_7b_gptq: Specifies the directory where evaluation results will be saved.

--log_samples: Indicates that per-sample results should be logged for later inspection.
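
Assembled from the parameters above:

!lm_eval --model hf \
    --model_args pretrained=/content/Llama-2-7b-gptq-4bit \
    --tasks truthfulqa,hellaswag,winogrande \
    --device cuda:0 \
    --batch_size 16 \
    --output_path ./eval_llama2_7b_gptq \
    --log_samples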

