6 Finetuning For Classification - Build A Large Language Model (From Scratch)
Figure 6.1 A mental model of the three main stages of coding an LLM, pretraining the LLM on a general text dataset, and finetuning it. This chapter focuses on finetuning a pretrained LLM as a classifier.
Figure 6.1 shows two main ways of finetuning an LLM: finetuning for
classification (step 8) and finetuning an LLM to follow instructions (step
9). In the next section, we will discuss these two ways of finetuning in
more detail.
Note that these are text messages typically sent via phone, not email. However, the same steps also apply to email classification, and interested readers can find links to email spam classification datasets in the References section in appendix B.
The first step is to download the dataset via the following code:
from pathlib import Path

url = "https://archive.ics.uci.edu/static/public/228/sms+spam+collection.zip"
zip_path = "sms_spam_collection.zip"
extracted_path = "sms_spam_collection"
data_file_path = Path(extracted_path) / "SMSSpamCollection.tsv"
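The download-and-extraction code itself is not shown in this excerpt; a minimal sketch using only the Python standard library might look like the following (the helper name download_and_unzip_spam_data is an assumption):

import urllib.request
import zipfile

def download_and_unzip_spam_data(url, zip_path, extracted_path):
    # Download the zip archive and extract its contents
    with urllib.request.urlopen(url) as response, open(zip_path, "wb") as out_file:
        out_file.write(response.read())
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        zip_ref.extractall(extracted_path)
    # Note: the extracted file may need to be renamed to match data_file_path

download_and_unzip_spam_data(url, zip_path, extracted_path)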
import pandas as pd

df = pd.read_csv(data_file_path, sep="\t", header=None, names=["Label", "Text"])
df  #A
The resulting data frame of the spam dataset is shown in figure 6.5.
print(df["Label"].value_counts())
Executing the previous code, we find that the data contains "ham" (i.e., not spam) far more frequently than "spam":
Label
ham 4825
spam 747
Name: count, dtype: int64
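The definition of the create_balanced_dataset function used below is not shown in this excerpt. A minimal sketch that undersamples the "ham" class so that it matches the number of "spam" messages (the random_state value is an assumption):

def create_balanced_dataset(df):
    num_spam = df[df["Label"] == "spam"].shape[0]  # count the spam messages
    ham_subset = df[df["Label"] == "ham"].sample(
        num_spam, random_state=123  # undersample "ham" to match the spam count
    )
    return pd.concat([ham_subset, df[df["Label"] == "spam"]])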
balanced_df = create_balanced_dataset(df)
print(balanced_df["Label"].value_counts())
After executing the previous code to balance the dataset, we can see that we now have equal amounts of spam and non-spam messages:
Label
ham 747
spam 747
Name: count, dtype: int64
Next, we convert the "string" class labels "ham" and "spam" into integer class labels 0 and 1, respectively:
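The conversion code is not shown in this excerpt; a one-line sketch using pandas could be:

balanced_df["Label"] = balanced_df["Label"].map({"ham": 0, "spam": 1})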
This process is similar to converting text into token IDs. However, instead of using the GPT vocabulary, which consists of more than 50,000 words, we are dealing with just two token IDs: 0 and 1.
We create a random_split function to split the dataset into three parts: 70% for training, 10% for validation, and 20% for testing. (These ratios are common in machine learning to train, adjust, and evaluate models.)
def random_split(df, train_frac, validation_frac):
    df = df.sample(frac=1, random_state=123).reset_index(drop=True)  #A shuffle the DataFrame (seed value is an assumption)
    train_end = int(len(df) * train_frac)  #B calculate the split indices
    validation_end = train_end + int(len(df) * validation_frac)
    train_df = df[:train_end]  #C split the DataFrame
    validation_df = df[train_end:validation_end]
    test_df = df[validation_end:]
    return train_df, validation_df, test_df
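Assuming the random_split signature sketched above, the splits can then be created as follows (the 20% test fraction is implied by the remainder):

train_df, validation_df, test_df = random_split(balanced_df, 0.7, 0.1)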
train_df.to_csv("train.csv", index=None)
validation_df.to_csv("validation.csv", index=None)
test_df.to_csv("test.csv", index=None)
In this section, we downloaded the dataset, balanced it, and split it into training and evaluation subsets. In the next section, we will set up the PyTorch data loaders that will be used to train the model.
To implement option 2, where all messages are padded to the length of the longest message in the dataset, we add padding tokens to all shorter messages. For this purpose, we use "<|endoftext|>" as a padding token, as discussed in chapter 2.
Figure 6.6 presumes that 50,256 is the token ID of the padding token "<|endoftext|>". We can double-check that this is indeed the correct token ID by encoding "<|endoftext|>" using the GPT-2 tokenizer from the tiktoken package that we used in previous chapters:
import tiktoken
tokenizer = tiktoken.get_encoding("gpt2")
print(tokenizer.encode("<|endoftext|>", allowed_special={"<|endoftext|>"}))
For this purpose, we define the SpamDataset class, which implements the concepts illustrated in figure 6.6. This SpamDataset class handles several key tasks: it identifies the longest sequence in the training dataset, encodes the text messages, and ensures that all other sequences are padded with a padding token to match the length of the longest sequence.
import torch
from torch.utils.data import Dataset


class SpamDataset(Dataset):
    def __init__(self, csv_file, tokenizer, max_length=None, pad_token_id=50256):
        self.data = pd.read_csv(csv_file)

        #A Pre-tokenize the text messages
        self.encoded_texts = [
            tokenizer.encode(text) for text in self.data["Text"]
        ]

        if max_length is None:
            self.max_length = self._longest_encoded_length()
        else:
            self.max_length = max_length
            #B Truncate sequences that are longer than max_length
            self.encoded_texts = [
                encoded_text[:self.max_length]
                for encoded_text in self.encoded_texts
            ]

        #C Pad all sequences to the longest sequence
        self.encoded_texts = [
            encoded_text + [pad_token_id] * (self.max_length - len(encoded_text))
            for encoded_text in self.encoded_texts
        ]

    def __getitem__(self, index):
        # Return the token IDs and class label of one example as tensors
        encoded = self.encoded_texts[index]
        label = self.data.iloc[index]["Label"]
        return (
            torch.tensor(encoded, dtype=torch.long),
            torch.tensor(label, dtype=torch.long),
        )

    def __len__(self):
        return len(self.data)

    def _longest_encoded_length(self):
        max_length = 0
        for encoded_text in self.encoded_texts:
            encoded_length = len(encoded_text)
            if encoded_length > max_length:
                max_length = encoded_length
        return max_length
The SpamDataset class loads data from the CSV files we created earlier, tokenizes the text using the GPT-2 tokenizer from tiktoken, and allows us to pad or truncate the sequences to a uniform length determined by either the longest sequence or a predefined maximum length. This ensures each input tensor is of the same size, which is necessary to create the batches in the training data loader we implement next:
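The instantiation of the training dataset is not shown in this excerpt; a sketch consistent with the SpamDataset class above would be:

train_dataset = SpamDataset(
    csv_file="train.csv",
    max_length=None,
    tokenizer=tokenizer
)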
print(train_dataset.max_length)
The code outputs 120, showing that the longest sequence contains no more than 120 tokens, a common length for text messages. It's worth noting that the model can handle sequences of up to 1,024 tokens, given its context length limit. If your dataset includes longer texts, you can pass max_length=1024 when creating the training dataset in the preceding code to ensure that the data does not exceed the model's supported input (context) length.
Next, we pad the validation and test sets to match the length of the longest training sequence. It's important to note that any validation and test set samples exceeding the length of the longest training example are truncated using encoded_text[:self.max_length] in the SpamDataset code we defined earlier. This truncation is optional; you could also set max_length=None for both the validation and test sets, provided there are no sequences exceeding 1,024 tokens in these sets.
val_dataset = SpamDataset(
    csv_file="validation.csv",
    max_length=train_dataset.max_length,
    tokenizer=tokenizer
)
test_dataset = SpamDataset(
    csv_file="test.csv",
    max_length=train_dataset.max_length,
    tokenizer=tokenizer
)
As an exercise, pad the inputs to the maximum number of tokens the model supports and observe how it impacts the predictive performance.
Using the datasets as inputs, we can instantiate the data loaders similarly to what we did in chapter 2. However, in this case, the targets represent class labels rather than the next tokens in the text. For instance, choosing a batch size of 8, each batch will consist of 8 training examples of length 120 and the corresponding class label of each example, as illustrated in figure 6.7.
The following code creates the training, validation, and test set data loaders that load the text messages and labels in batches of size 8, as illustrated in figure 6.7:
from torch.utils.data import DataLoader

num_workers = 0  #A
batch_size = 8

torch.manual_seed(123)

train_loader = DataLoader(
    dataset=train_dataset,
    batch_size=batch_size,
    shuffle=True,
    num_workers=num_workers,
    drop_last=True,
)
val_loader = DataLoader(
    dataset=val_dataset,
    batch_size=batch_size,
    num_workers=num_workers,
    drop_last=False,
)
test_loader = DataLoader(
    dataset=test_dataset,
    batch_size=batch_size,
    num_workers=num_workers,
    drop_last=False,
)
To ensure that the data loaders are working and are indeed returning batches of the expected size, we iterate over the training loader and then print the tensor dimensions of the last batch:
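The corresponding check is not shown in this excerpt; as a sketch:

for input_batch, target_batch in train_loader:
    pass  # iterate to the last batch

print("Input batch dimensions:", input_batch.shape)
print("Label batch dimensions:", target_batch.shape)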
As we can see, the input batches consist of 8 training examples with 120 tokens each, as expected. The label tensor stores the class labels corresponding to the 8 training examples.
Lastly, to get an idea of the dataset size, let's print the total number of batches in each dataset:
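The print statements themselves are not shown in this excerpt; a sketch:

print(f"{len(train_loader)} training batches")
print(f"{len(val_loader)} validation batches")
print(f"{len(test_loader)} test batches")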
Now, before we start finetuning the model as a spam classifier, let's see if the model can perhaps already classify spam messages by prompting it with instructions:
text_2 = (
    "Is the following text 'spam'? Answer with 'yes' or 'no':"
    " 'You are a winner you have been specially"
    " selected to receive $1000 cash or a $2000 award.'"
)

token_ids = generate_text_simple(
    model=model,
    idx=text_to_token_ids(text_2, tokenizer),
    max_new_tokens=23,
    context_size=BASE_CONFIG["context_length"]
)

print(token_ids_to_text(token_ids, tokenizer))
Is the following text 'spam'? Answer with 'yes' or 'no': 'You are a winner you
The following text 'spam'? Answer with 'yes' or 'no': 'You are a winner
Based on the output, it's apparent that the model struggles with following instructions.
Figure 6.9 This figure illustrates adapting a GPT model for spam classification by altering its architecture. Initially, the model's linear output layer mapped 768 hidden units to a vocabulary of 50,257 tokens. For spam detection, this layer is replaced with a new output layer that maps the same 768 hidden units to just two classes, representing "spam" and "not spam."
We could technically use a single output node since we are dealing with a binary classification task. However, this would require modifying the loss function, as discussed in an article referenced in the References section in appendix B. Therefore, we choose a more general approach where the number of output nodes matches the number of classes. For example, for a 3-class problem, such as classifying news articles as "Technology", "Sports", or "Politics", we would use three output nodes, and so forth.
Before we attempt the modification illustrated in figure 6.9, let's print the model architecture via print(model), which prints the following:
GPTModel(
  (tok_emb): Embedding(50257, 768)
  (pos_emb): Embedding(1024, 768)
  (drop_emb): Dropout(p=0.0, inplace=False)
  (trf_blocks): Sequential(
    ...
    (11): TransformerBlock(
      (att): MultiHeadAttention(
        (W_query): Linear(in_features=768, out_features=768, bias=True)
        (W_key): Linear(in_features=768, out_features=768, bias=True)
        (W_value): Linear(in_features=768, out_features=768, bias=True)
        (out_proj): Linear(in_features=768, out_features=768, bias=True)
        (dropout): Dropout(p=0.0, inplace=False)
      )
      (ff): FeedForward(
        (layers): Sequential(
          (0): Linear(in_features=768, out_features=3072, bias=True)
          (1): GELU()
          (2): Linear(in_features=3072, out_features=768, bias=True)
        )
      )
      (norm1): LayerNorm()
      (norm2): LayerNorm()
      (drop_resid): Dropout(p=0.0, inplace=False)
    )
  )
  (final_norm): LayerNorm()
  (out_head): Linear(in_features=768, out_features=50257, bias=False)
)
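The code that freezes the pretrained weights and swaps in the new classification head is not shown in this excerpt. A minimal sketch, assuming the attribute names from the architecture printout above (out_head, trf_blocks, final_norm) and an embedding dimension of 768:

num_classes = 2

# Freeze all pretrained parameters
for param in model.parameters():
    param.requires_grad = False

# Replace the output layer: 768 hidden units -> 2 classes
model.out_head = torch.nn.Linear(in_features=768, out_features=num_classes)

# Make the last transformer block and the final LayerNorm trainable again
for param in model.trf_blocks[-1].parameters():
    param.requires_grad = True
for param in model.final_norm.parameters():
    param.requires_grad = True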
Even though we added a new output layer and marked certain layers as trainable or non-trainable, we can still use this model in a similar way to previous chapters. For instance, we can feed it an example text identical to how we have done it in earlier chapters. For example, consider the following example text:
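The example text and its encoding are not shown in this excerpt; as a sketch, any short input that the GPT-2 tokenizer encodes into four tokens works, for instance:

inputs = tokenizer.encode("Do you have time")
inputs = torch.tensor(inputs).unsqueeze(0)  # add a batch dimension
print("Inputs:", inputs)
print("Inputs dimensions:", inputs.shape)   # shape: (batch_size, num_tokens)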
As the print output shows, the preceding code encodes the inputs into a tensor consisting of 4 input tokens:
Then, we can pass the encoded token IDs to the model as usual:
with torch.no_grad():
    outputs = model(inputs)

print("Outputs:\n", outputs)
print("Outputs dimensions:", outputs.shape)  # shape: (batch_size, num_tokens, num_classes)
To extract the last output token, illustrated in figure 6.11, from the output tensor, we use the following code:
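The extraction code is not shown in this excerpt; as a sketch:

print("Last output token:", outputs[:, -1, :])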
Before we proceed to the next section, let's recap our discussion. We will focus on converting the values into a class-label prediction. But first, let's understand why we are particularly interested in the last output token, and not the 1st, 2nd, or 3rd output token.
Given the causal attention mask setup shown in figure 6.12, the last token in a sequence accumulates the most information, since it is the only token with access to data from all the previous tokens. Therefore, in our spam classification task, we focus on this last token during the finetuning process.
Having modified the model, the next section will detail the process of transforming the last token into class label predictions and calculating the model's initial prediction accuracy. Following this, we will finetune the model for the spam classification task in the subsequent section.
As an exercise, rather than finetuning the last output token, try finetuning the first output token and observe the changes in predictive performance when finetuning the model in later sections.
Figure 6.14 The model outputs corresponding to the last token are
converted into probability scores for each input text. Then, the class
labels are obtained by looking up the index position of the highest
probability score. Note that the model predicts the spam labels
incorrectly because it has not yet been trained.
To illustrate figure 6.14 with a concrete example, let's consider the last token output from the previous section:
The values of the tensor corresponding to the last token are as follows:
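The conversion code is not shown in this excerpt; a sketch that obtains the class label via softmax and argmax:

probas = torch.softmax(outputs[:, -1, :], dim=-1)  # probability scores for the two classes
label = torch.argmax(probas)
print("Class label:", label.item())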
In this case, the code returns 1, meaning the model predicts that the input text is "spam." Using the softmax function here is optional because the largest outputs directly correspond to the highest probability scores, as mentioned in chapter 5. Hence, we can simplify the code as follows, without using softmax:
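A sketch of the simplified version:

logits = outputs[:, -1, :]
label = torch.argmax(logits)
print("Class label:", label.item())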
def calc_accuracy_loader(data_loader, model, device, num_batches=None):
    model.eval()
    correct_predictions, num_examples = 0, 0

    if num_batches is None:
        num_batches = len(data_loader)
    else:
        num_batches = min(num_batches, len(data_loader))

    for i, (input_batch, target_batch) in enumerate(data_loader):
        if i < num_batches:
            input_batch, target_batch = input_batch.to(device), target_batch.to(device)
            with torch.no_grad():
                logits = model(input_batch)[:, -1, :]  #A Logits of the last output token
            predicted_labels = torch.argmax(logits, dim=-1)
            num_examples += predicted_labels.shape[0]
            correct_predictions += (predicted_labels == target_batch).sum().item()
        else:
            break
    return correct_predictions / num_examples
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)  # move the model to the GPU if one is available

torch.manual_seed(123)
train_accuracy = calc_accuracy_loader(train_loader, model, device, num_batches=10)
val_accuracy = calc_accuracy_loader(val_loader, model, device, num_batches=10)
test_accuracy = calc_accuracy_loader(test_loader, model, device, num_batches=10)
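The print statements are not shown in this excerpt; a sketch:

print(f"Training accuracy: {train_accuracy*100:.2f}%")
print(f"Validation accuracy: {val_accuracy*100:.2f}%")
print(f"Test accuracy: {test_accuracy*100:.2f}%")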
Via the device setting, the model automatically runs on a GPU if a GPU with Nvidia CUDA support is available and otherwise runs on a CPU. The output is as follows:
As we can see, the prediction accuracies are near a random prediction, which would be 50% in this case. To improve the prediction accuracies, we need to finetune the model.
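The loss helpers themselves are not shown in this excerpt. As noted in the summary, classification finetuning uses the same cross entropy loss as pretraining, only computed on the last token's logits; a minimal per-batch sketch (the helper name calc_loss_batch is an assumption):

def calc_loss_batch(input_batch, target_batch, model, device):
    input_batch = input_batch.to(device)
    target_batch = target_batch.to(device)
    logits = model(input_batch)[:, -1, :]  # logits of the last output token
    loss = torch.nn.functional.cross_entropy(logits, target_batch)
    return loss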
The training function implementing the concepts shown in figure 6.15 also closely mirrors the train_model_simple function used for pretraining the model in chapter 5.
The only two distinctions are that we now track the number of training examples seen (examples_seen) instead of the number of tokens, and we calculate the accuracy after each epoch instead of printing a sample text:
#F
if global_step % eval_freq == 0:
    train_loss, val_loss = evaluate_model(
        model, train_loader, val_loader, device, eval_iter)
    train_losses.append(train_loss)
    val_losses.append(val_loss)
    print(f"Ep {epoch+1} (Step {global_step:06d}): "
          f"Train loss {train_loss:.3f}, Val loss {val_loss:.3f}")

#G
train_accuracy = calc_accuracy_loader(
    train_loader, model, device, num_batches=eval_iter
)
val_accuracy = calc_accuracy_loader(
    val_loader, model, device, num_batches=eval_iter
)
Next, we initialize the optimizer, set the number of training epochs, and initiate the training using the train_classifier_simple function. We will discuss the choice of the number of training epochs after we evaluate the results. The training takes about 6 minutes on an M3 MacBook Air laptop computer and less than half a minute on a V100 or A100 GPU:
import time
start_time = time.time()
torch.manual_seed(123)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5, weight_decay=0.1)
num_epochs = 5
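# The call to train_classifier_simple itself is omitted in this excerpt.
# A sketch of how it might be invoked (the returned values and the
# eval_freq/eval_iter settings below are assumptions):
train_losses, val_losses, train_accs, val_accs, examples_seen = \
    train_classifier_simple(
        model, train_loader, val_loader, optimizer, device,
        num_epochs=num_epochs, eval_freq=50, eval_iter=5
    )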
end_time = time.time()
execution_time_minutes = (end_time - start_time) / 60
print(f"Training completed in {execution_time_minutes:.2f} minutes.")
Similar to chapter 5, we then use matplotlib to plot the loss function for the training and validation sets:
import matplotlib.pyplot as plt


def plot_values(epochs_seen, examples_seen, train_values, val_values, label="loss"):
    fig, ax1 = plt.subplots(figsize=(5, 3))

    #A Plot training and validation values against epochs
    ax1.plot(epochs_seen, train_values, label=f"Training {label}")
    ax1.plot(epochs_seen, val_values, linestyle="-.", label=f"Validation {label}")
    ax1.set_xlabel("Epochs")
    ax1.set_ylabel(label.capitalize())
    ax1.legend()

    #B Create a second x-axis for the number of examples seen
    ax2 = ax1.twiny()
    ax2.plot(examples_seen, train_values, alpha=0)  # Invisible plot for aligning ticks
    ax2.set_xlabel("Examples seen")

    fig.tight_layout()  #C Adjust layout to make room
    plt.savefig(f"{label}-plot.pdf")
    plt.show()
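The call that produces figure 6.16 is not shown in this excerpt; a sketch, assuming the losses and examples_seen values returned by the training run above:

epochs_tensor = torch.linspace(0, num_epochs, len(train_losses))
examples_seen_tensor = torch.linspace(0, examples_seen, len(train_losses))
plot_values(epochs_tensor, examples_seen_tensor, train_losses, val_losses)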
The resulting loss curves are shown in the plot in figure 6.16.
Figure 6.16 This graph shows the model's training and validation loss
over the five training epochs. The training loss, represented by the
solid line, and the validation loss, represented by the dashed line, both
sharply decline in the first epoch and gradually stabilize towards the
fifth epoch. This pattern indicates good learning progress and suggests
that the model learned from the training data while generalizing well
to the unseen validation data.
As we can see based on the sharp downward slope in figure 6.16, the model is learning well from the training data, and there is little to no indication of overfitting; that is, there is no noticeable gap between the training and validation set losses.
Using the same plot_values function, let's now also plot the classification accuracies:
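A sketch of the corresponding call (arguments assumed analogous to the loss plot):

epochs_tensor = torch.linspace(0, num_epochs, len(train_accs))
examples_seen_tensor = torch.linspace(0, examples_seen, len(train_accs))
plot_values(
    epochs_tensor, examples_seen_tensor, train_accs, val_accs, label="accuracy"
)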
Figure 6.17 Both the training accuracy (solid line) and the validation
accuracy (dashed line) increase substantially in the early epochs and
then plateau, achieving almost perfect accuracy scores of 1.0. The
close proximity of the two lines throughout the epochs suggests that
the model does not overfit the training data much.
Based on the accuracy plot in figure 6.17, the model achieves a relatively high training and validation accuracy after epochs 4 and 5.
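The code computing the final accuracies over the complete datasets is not shown in this excerpt; a sketch using the calc_accuracy_loader function from earlier:

train_accuracy = calc_accuracy_loader(train_loader, model, device)
val_accuracy = calc_accuracy_loader(val_loader, model, device)
test_accuracy = calc_accuracy_loader(test_loader, model, device)

print(f"Training accuracy: {train_accuracy*100:.2f}%")
print(f"Validation accuracy: {val_accuracy*100:.2f}%")
print(f"Test accuracy: {test_accuracy*100:.2f}%")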
A slight discrepancy between the training and test set accuracies suggests minimal overfitting of the training data. Typically, the validation set accuracy is somewhat higher than the test set accuracy because model development often involves tuning hyperparameters to perform well on the validation set, which might not generalize as effectively to the test set.
Finally, let's use the finetuned GPT-based spam classification model. The following classify_review function follows data preprocessing steps similar to those we used in the SpamDataset implemented earlier in this chapter. Then, after processing the text into token IDs, the function uses the model to predict an integer class label, similar to what we implemented in section 6.6, and then returns the corresponding class name:
def classify_review(text, model, tokenizer, device, max_length=None, pad_token_id=50256):
    model.eval()
    input_ids = tokenizer.encode(text)  #A Prepare the inputs to the model
    supported_context_length = model.pos_emb.weight.shape[1]
    # ... (steps #B-#D, truncating and padding input_ids, are omitted in this excerpt)
    input_tensor = torch.tensor(input_ids, device=device).unsqueeze(0)
    with torch.no_grad():  #E Inference without gradient tracking
        logits = model(input_tensor)[:, -1, :]  #F Logits of the last output token
    predicted_label = torch.argmax(logits, dim=-1).item()
    return "spam" if predicted_label == 1 else "not spam"
text_1 = (
    "You are a winner you have been specially"
    " selected to receive $1000 cash or a $2000 award."
)

print(classify_review(
    text_1, model, tokenizer, device, max_length=train_dataset.max_length
))
The resulting model correctly predicts "spam". Next, let's try another example:
text_2 = (
    "Hey, just wanted to check if we're still on"
    " for dinner tonight? Let me know!"
)

print(classify_review(
    text_2, model, tokenizer, device, max_length=train_dataset.max_length
))
Also, here, the model makes a correct prediction and returns a "not spam" label.
Finally, let's save the model in case we want to reuse it later without having to train it again, using the torch.save method we introduced in the previous chapter.
torch.save(model.state_dict(), "review_classifier.pth")
6.9 Summary
There are different strategies for finetuning LLMs, including
classification-finetuning (this chapter) and instruction-
finetuning (next chapter).
Classification-finetuning involves replacing the output layer of
an LLM with a small classification layer.
In the case of classifying text messages as "spam" or "not
spam," the new classification layer consists of only 2 output
nodes; in previous chapters, the number of output nodes was
equal to the number of unique tokens in the vocabulary, namely,
50,257.
Instead of predicting the next token in the text as in pretraining,
classification-finetuning trains the model to output a correct
class label, for example, "spam" or "not spam."
The model input for finetuning is text converted into token IDs,
similar to pretraining.
Before finetuning an LLM, we load the pretrained model as a
base model.
Evaluating a classification model involves calculating the
classification accuracy (the fraction or percentage of correct
predictions).
Finetuning a classification model uses the same cross entropy
loss function that is used for pretraining the LLM.