Deep Learning Based Image Classification On Smartphones

1. Table of Contents

1. Table of Contents
2. Summary (rezumat în limba română)
   2.1. State of the art
   2.2. Theoretical Fundamentals
   2.3. Implementation
   2.4. Experimental results
   2.5. Conclusions
3. Work planning
4. State of the Art
5. Theoretical Fundamentals
   5.1. Image Classification
   5.2. Artificial Neural Networks
      5.2.1. Architecture
      5.2.2. Data flow through neural networks
      5.2.3. Activation functions
   5.3. Learning algorithm
      5.3.1. Gradient calculation. Cost function
      5.3.2. Learning Rate
      5.3.3. Stochastic gradient descent – SGD optimizer
      5.3.4. Adaptive moment estimation – Adam optimizer
   5.4. Convolutional Neural Networks
      5.4.1. Architecture
      5.4.2. Convolution algorithm
      5.4.3. Pooling operation
   5.5. Overfitting. Underfitting
      5.5.1. Data augmentation
      5.5.2. Fine-tuning. Transfer learning
6. Implementation
   6.1. Technologies
      6.1.1. Anaconda environment. Jupyter Notebook
      6.1.2. Python Packages for Machine Learning
      6.1.3. Android Studio
   6.2. Implementation of the CNN classifier
      6.2.1. Dataset Description
      6.2.2. Data augmentation
      6.2.3. Transfer learning
      6.2.4. Training
      6.2.5. Performances Evaluation. TensorFlow Lite conversion
   6.3. Implementation of the Smartphone Application
      6.3.1. Import the TFLite model
      6.3.2. Android Manifest
      6.3.3. Choose Model Activity
      6.3.4. Classify Activity
7. Experimental results
   7.1. CNN models
      7.1.1. Assorted leaves model with 38 classes
      7.1.2. Tomato leaves model with 10 classes
   7.2. Android application
8. Conclusions
9. References
10. Appendix
2. Summary (rezumat în limba română)

2.1. State of the art


The technological revolution has had a significant impact on agriculture, leading to a substantial industrialization of crop production management. This phenomenon is referred to as Precision Agriculture, and its purpose is to improve the environmental quality and the performance related to crop production.
The prosperity of agricultural production is strongly related to various attributes that involve a high degree of uncertainty, such as soil properties, topography, weather and natural irrigation. These aspects can be handled with systems built on Artificial Intelligence (AI), due to their capability to deal with large amounts of data and to operate on non-linear objectives.
Image classification represents a subfield of pattern recognition in images, with wide applicability in precision agriculture. It is often used for plant pathology detection, preventing in this way the crop yield from being negatively affected and thus leading to an improvement of agricultural production.
The process of extracting feature patterns from images can be carried out manually or in an automatic manner, by the system itself, using powerful and complex computational algorithms. The manually extracted features are also called "handcrafted" and imply a trade-off between mathematical efficiency and system performance. On the other hand, the feature extraction problem is also approached by creating complex neural network systems that have the capability to automatically detect and extract the meaningful features. The difference between the two methods is illustrated in Figure 1 from section 5.1. Image Classification below:

Figure 1.Traditional and deep learning approaches of image classification

The detection and classification of plant diseases can be achieved through several artificial intelligence (AI) algorithms. A few examples of such AI classification methods are logistic regression, decision trees, convolutional neural networks (CNNs) and Support Vector Machines (SVMs). These methods are often combined with different data preprocessing techniques, namely handcrafted features, in order to enhance the significance of the features.
The current work intends to implement a deep learning based image classifier for precision agriculture. Two methods are proposed: the first one consists of a CNN classifier trained on the plant diseases contained in the PlantVillage dataset, and the second one involves the classification only for the diseases specific to the tomato crop.

2.2. Theoretical Fundamentals
Computer vision is one of the most robust forms of the field of science known as artificial intelligence, whose goal is to reproduce the complex way in which the visual and neural systems of human beings work, giving a device the "knowledge" needed to identify, recognize and process specific objects from images or videos.
Image classification exploits the functionalities of computer vision techniques to associate labels to images, which are described through specific patterns, so as to obtain the ability to assign a given image to its belonging class. It relies on two principles, namely descriptive analysis and predictive analysis, which represent the two main characteristics of an image classification system.
The principle of deep learning techniques is to teach a device to filter a given input through multiple hidden layers composed of units called neurons, in order to predict and label the information. To train a deep neural network (DNN), a considerable amount of labeled data and neural architectures is required, and its purpose is to automatically determine and recognize precise traits from the input dataset, without the need for their manual extraction.
The algorithms used in machine learning can be split into three categories based on the way they learn. Therefore, there are supervised, semi-supervised and unsupervised learning. Supervised learning is characterized by the fact that the input data is properly labeled and has a very well-known result. Unsupervised learning consists of using unlabeled data, which means that the input data does not have an already known result.
Artificial Neural Networks (ANNs), the structures under which deep learning is known, represent a complex type of machine learning technique, used whenever a very large amount of data needs to be processed. An ANN has at its basis the neural units known as nodes, which are further structured following a layered approach. A neural network usually consists of three categories of layers, as follows: an input layer, followed by one or more hidden layers and, finally, an output layer. The basic architecture of a neural model is shown in Figure 2 (section 5.2.1. Architecture).

Figure 2. Basic Architecture of an Artificial Neural Network

The data on which the model trains is fed into the input layer, then it flows through a succession of hidden layers until it reaches the output layer of the network, which provides the signals needed to properly classify the received information. The output of the top layer provides the predictions. The data flowing through the network undergoes distinct transformations depending on the layer it passes through. The complexity of the transformations applied to the data increases with the depth of the neural network. Based on the data flow through the network, there are two different types of ANNs: feedback and feedforward neural networks.
Each specific neuron of a given layer takes as input numeric values: either the features extracted from the training data, in the case of the input layer, or the outputs of the nodes from the preceding layer. For the signals to pass through the network, they travel over uniquely described weighted connections. When a signal enters a neuron, an activation function is needed for that node to fire. This function determines whether a given neuron produces an output value or not. If an output value is produced, it is also mapped through the activation function. The working principle of a neuron is illustrated in Figure 4, from section 5.2.3. Activation functions.

Figure 4. Neuron's working principle

One of the activation functions regularly used for the hidden layers of a neural network is the "Rectified Linear Unit", also known as ReLU. From a biological point of view, this function most accurately simulates the actual decision-making process of the neurons in the human brain. ReLU does not require pre-normalization of the input data and produces a value of 0 for any negative input, while letting the value pass through when the input is greater than 0.
For classification models with more than two classes, the SoftMax activation function is very often used in the output layer, because its output provides a probability distribution over a vector of real values.
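For reference, the two activation functions mentioned above can be written compactly; these are the standard definitions, stated here for convenience rather than quoted from the thesis body:

ReLU(x) = max(0, x)
SoftMax(z_i) = exp(z_i) / Σ_j exp(z_j), for a vector of real values z = (z_1, …, z_K)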
When the neural network is created, the weight values are initialized randomly. These values are used during the first training epoch. An epoch represents a complete pass through the entire input dataset. During an epoch, at every step the data is fed to the first layer, being segmented into smaller portions called batches. The data is processed by the network and, in the end, a prediction is made. The difference between the predicted value and the real value results in a cost function, also known as the loss function. The main goal of a neural network is to find the minimum of this function. An example of such a function can be found in Figure 8 (section 5.3.3. Stochastic gradient descent).

Figure 8. Stochastic gradient descent

Therefore, in order to reach this objective, a gradient must be computed, so that the weights can be adjusted to the appropriate values in order to increase the accuracy of a model. Minimizing the loss characteristic of the model is not a straightforward process. In fact, it is done in a certain number of steps. The size of a step determines what percentage of the error is corrected after one examination of all the data. This aspect is closely related to a hyperparameter of a deep learning model, called the "learning rate" (LR).
The learning rate can have a major impact on the learning process. If its value is too large, the minimum of the cost function is overshot, because of the larger correction step given by the LR. On the other hand, a very small learning rate value can lead to a longer processing time and, in some situations, may cause the learning process to stall.
Since the gradient descent optimizer is unstable in terms of the generated performance, gradual improvements were introduced until the Adam optimizer was reached, which allows an adaptive learning rate for each parameter, thus offering better performance on the classification problem.
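As a brief reminder of the update rules involved (standard textbook formulas, not reproduced from the thesis body), plain SGD updates a weight w with gradient g and learning rate η as:

w ← w − η·g

while Adam keeps running estimates of the first and second moments of the gradient and scales the step for each parameter individually:

m ← β1·m + (1 − β1)·g
v ← β2·v + (1 − β2)·g²
w ← w − η·m̂ / (√v̂ + ε), with m̂ = m / (1 − β1^t) and v̂ = v / (1 − β2^t)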
Convolutional neural networks (CNNs) form a subdivision of artificial neural networks, designed mainly to perform object recognition, detection, image classification or other complex tasks that involve working with images. Their working principle consists in assigning learnable biases and weights based on the meaningful patterns in the input image.
The main advantage of a CNN compared to a classical neural network is that a CNN reduces the dimensions of the input image while preserving its main characteristics. The simplest architecture of a convolutional neural network is composed of five types of layers, namely: an input layer, a convolution layer immediately followed by an activation layer, a pooling layer, a series of dense layers and the output layer. Figure 9 presents the basic architecture of a CNN (section 5.4.1. Architecture).

Figure 9. Convolutional neural network architecture
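A minimal Keras sketch of such a five-part CNN layout (input, convolution + activation, pooling, dense layers, output) is given below; the filter counts and layer sizes are illustrative assumptions, not the VGG16 configuration used later in the work:

```python
from tensorflow.keras import layers, models

# Simplest CNN architecture described above: convolution + activation, pooling,
# dense layers and a softmax output (all sizes are illustrative assumptions).
model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),             # input layer: the digital image
    layers.Conv2D(32, (3, 3), activation="relu"),  # convolution + activation
    layers.MaxPooling2D(pool_size=(2, 2)),         # pooling layer
    layers.Flatten(),
    layers.Dense(128, activation="relu"),          # dense layer
    layers.Dense(38, activation="softmax"),        # output layer (38 classes)
])
model.summary()
```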

The input layer of a convolutional neural network contains the image represented in digital format. The next layer is the convolution layer, whose task is to extract the important characteristics from the subregions received from the previous layer. The main objective of this layer is to create a feature map based on the meaningful patterns found in the image. Each convolutional layer must be followed by an activation layer in order to give importance to the meaningful traits in the previously created feature map.
The next layer is known as pooling. It is similar to the convolution layer, using a mask to reduce the spatial dimension of the feature map. This is necessary in order to reduce the useless information provided by the image, into a format that requires less computational power. In addition, this layer also has an important role in extracting patterns that are invariant to position and rotation, thus increasing the generalization capacity of the model.
The output of the pooling layer is connected to a dense layer in order to provide the format needed for feeding the following layers, whose function is to learn the non-linear combinations of features needed for classification. The data flows further through the network and is processed, the last activation layer using the SoftMax function.
To train a neural model, the entire dataset is split into three components: training data, consisting of the majority of the images, validation data and test data, which represent a small percentage of the total amount of input images. The images in the three datasets must be different and unique. The number of samples in the training dataset and their quality strongly influence the performance of the model and its generalization capacity.
When the model performs well on the training set but is not able to classify and predict data it has not been trained on, the phenomenon called overfitting occurs. The presence of overfitting can be detected by analyzing the accuracy and the loss during the training of the model. Also, if the model is trained for too long, overfitting may occur because the model starts to learn irrelevant details, such as the noise present in the images of the dataset. To avoid this, a preprocessing of the images used to train the model is needed, using special noise-removal filters.
The opposite situation to overfitting is called underfitting. This phenomenon occurs for various reasons, which could be: the complexity of the model is not high enough, or the model is not trained for long enough. Therefore, underfitting means that the model does not have the capacity to achieve good performance on either the training or the validation data.
To address the overfitting problem, different solutions can be adopted, such as adding more images to the training dataset, reducing the complexity level of the model or using Dropout layers. For the model to have a high capacity to generalize and to classify new images, the technique called data augmentation is commonly used.
This technique provides a new set of images derived from the original dataset, by applying a set of operations meant to offer a new viewing perspective of the data. Data augmentation is implemented through a series of image-oriented algorithms, such as zoom, scaling, rotation, shearing, shifting, cropping or flipping; an example is given in Figure 22, from section 6.2.2. Data augmentation.

Figure 22. Augmented images

A CNN classifier can be created from scratch, which means that every layer of the model must be manually created and configured according to the task the network is intended to perform. There is also transfer learning, a technique that uses the knowledge already acquired by an already trained network and further retrains it for a different dataset.
The fine-tuning process involves training only the top layers of such a predefined convolutional neural network. The layers that deal with feature extraction are usually not retrained. The architecture of these models can be modified, meaning that new layers can be added to the network or that the already existing layers can be removed or modified.

2.3. Implementation
The proposed method implements a classifier based on convolutional neural networks (CNNs) and will serve as an application for precision agriculture. The system will be trained with different types of leaves and will classify diseases specific to each leaf type. The model will also be able to predict whether a given leaf is healthy or not.
The implementation of the topic involves the use of a wide variety of technologies. First of all, the classifier is built using the Python programming language, in a Jupyter Notebook application within the Anaconda development environment. The deep learning functions are available through the Keras API, which wraps TensorFlow. The model is then used in an application created with Android technology.
The models described in this work were implemented using the two technologies mentioned above, namely Keras and TensorFlow. TensorFlow (TF) is the open-source API intended to provide easy-to-use functions for building complex artificial intelligence applications. TF represents the core of Keras, also being referred to as its computational engine.
Keras offers the ease of working with neural networks, implementing the abstraction features of TensorFlow and creating an easy-to-use environment for the faster and easier development of neural networks. Keras contains a series of deep learning models that are already built and trained and can be further used by developers to create their own applications.
Other Python utilities used for the implementation of this project are Pandas, NumPy, MatplotLib and Scikit-Image. For better management and visualization of the images in the dataset, Pandas was used, known for its ability to manipulate and query data sequences.
Android Studio is an integrated development environment used for developing Android applications designed to run on mobile devices, and it offers powerful tools.
The model built in this project with Keras and TensorFlow is further converted into a TensorFlow Lite model, so that it can be used on mobile devices. The classifier and its functionality are integrated into the Android environment in order to create the final application proposed by this work.
The original dataset contains 54,309 images and includes 14 leaf species, covering 4 diseases caused by bacteria, 2 viral diseases, 2 mold allergies, 17 fungal infections, one mite-based disease and healthy leaves for 12 of the included leaf species. A few sample images from the dataset are shown in Figure 14 (section 6.2.1. Dataset Description).

Figure 14. PlantVillage Dataset

The method proposed for this project consists of using two datasets created with images from the segmented PlantVillage dataset. The first method is based on a dataset that includes all 38 classes contained in the entire image set. For further processing, this dataset is split into 3 groups, namely training data, validation data and test data. The split was made based on a 6:2:2 ratio, meaning that 60% of the total data was used for training, 20% for validation and 20% for testing. For the second dataset, only 10 classes were selected from the entire image set. These classes are specific to tomato leaves.
After the images were organized into folders, a data augmentation algorithm was applied. This is done in order to prevent the overfitting of the models and also to increase the number of images in the dataset.
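A minimal sketch of how such an augmentation step is typically set up with Keras' ImageDataGenerator is given below; the parameter values and the directory path are illustrative assumptions rather than the exact configuration reported in section 6.2.2:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Illustrative augmentation pipeline: rotation, shifting, shearing, zooming and flipping,
# i.e. the kinds of transformations listed in section 2.2 (values are assumptions).
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,        # normalize pixel values to [0, 1]
    rotation_range=30,        # random rotations
    width_shift_range=0.1,    # horizontal shifts
    height_shift_range=0.1,   # vertical shifts
    shear_range=0.2,          # shearing
    zoom_range=0.2,           # zoom in/out
    horizontal_flip=True,     # random horizontal flips
)

# Stream augmented batches directly from the folder structure (hypothetical path).
train_generator = train_datagen.flow_from_directory(
    "dataset/train",
    target_size=(224, 224),   # VGG16 input size
    batch_size=64,            # batch size mentioned in section 2.3
    class_mode="categorical",
)
```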
The aim of this work is to retrain only the top layers of the VGG16 model, a technique called fine-tuning. The last layer of the base model was removed, because the number of nodes in that layer does not correspond to the number of classes in the datasets. Therefore, a new dense layer was added for each model, with 38 nodes for the first dataset and 10 nodes for the second dataset. The resulting architectures are illustrated in Figure 24 from section 6.2.3. Transfer learning.

Figure 24. Layers parameters configuration
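A hedged sketch of this fine-tuning setup in Keras is shown below; the number of frozen layers is an assumption made for illustration, while the actual layer configuration is the one shown in Figure 24:

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Model

NUM_CLASSES = 38  # 10 for the tomato-leaves model

# Load the full VGG16 (including its fully connected layers) pretrained on ImageNet.
vgg = VGG16(weights="imagenet", include_top=True)

# Freeze the feature-extraction layers; only the top dense layers stay trainable
# (how many layers are left trainable is an assumption here).
for layer in vgg.layers[:-3]:
    layer.trainable = False

# Drop the original 1000-class output layer and attach a new dense layer
# with as many nodes as there are classes in the dataset (38 or 10).
x = vgg.layers[-2].output
outputs = Dense(NUM_CLASSES, activation="softmax")(x)
model = Model(inputs=vgg.input, outputs=outputs)
```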

After several attempts to configure the model with different optimizers and learning rates, the final model was implemented with a small learning rate, namely 0.0001, which fits small data batch sizes, in this case 64. For the optimizer, the Adam optimizer offered by Keras was chosen. The metric used to compile the model was chosen to be the accuracy.
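Assuming the hyperparameters stated above, the compilation and training step would look roughly as follows in Keras; the loss function, the generator names and the epoch count are placeholders rather than values quoted from the implementation chapter:

```python
from tensorflow.keras.optimizers import Adam

# Compile with the reported settings: Adam optimizer, learning rate 0.0001, accuracy metric.
model.compile(
    optimizer=Adam(learning_rate=1e-4),
    loss="categorical_crossentropy",   # assumed multi-class loss
    metrics=["accuracy"],
)

# Train on the augmented generators (batch size 64 is configured in the generators).
history = model.fit(
    train_generator,
    validation_data=validation_generator,  # hypothetical validation generator
    epochs=200,                            # 200 epochs for the 38-class model, 150 for the tomato model
)
```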
The performance of the models was evaluated by visualizing the confusion matrix built from the predictions made on the test images. For the model to be included in the Android application, it was converted into a TensorFlow Lite model. A TensorFlow Lite model is designed to be used on mobile devices. A TFLite model runs with low latency and at higher speeds on smartphones.
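A minimal sketch of the Keras-to-TensorFlow Lite conversion step; the output file name is hypothetical, and whether any optimization flags were applied is not stated in this summary:

```python
import tensorflow as tf

# Convert the trained Keras model into a TensorFlow Lite flat buffer.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Save the converted model so it can be bundled as an asset of the Android application.
with open("plant_disease_classifier.tflite", "wb") as f:
    f.write(tflite_model)
```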
To obtain image classification on Android devices, the previously created TensorFlow Lite model was imported into the application on the mobile device. For the application to run on the mobile device, some permissions must be granted. If the user grants the necessary permissions, a new activity is created, namely ChooseModel. This activity allows loading the model, capturing an image, cropping it and sending the necessary information to the next activity, which is Classify. Here, the image is resized to fit the input of the model and is classified. From this activity, the user can return to the ChooseModel activity to capture another image, or the application can be closed.

2.4. Experimental results


The final architectures of the implemented classifiers consist of two sequential models formed of 23 layers. The corresponding structures follow the working principle of a VGG16 network, containing 12 convolutional layers, 5 max-pooling layers and 3 dense layers. The last dense layer is composed of 38 nodes, over which the Softmax activation function is applied. This layer provides the output in the form of a probability distribution. Except for the output layer, both models were configured using the same hyperparameters.
Training the model on all the classes in the dataset, i.e. 38 distinct categories, leads to an accuracy of 80.87% and a loss of 2.874. Training the model on a part of the dataset, i.e. 10 distinct categories, leads to an accuracy of 87.84% and a loss of 1.5867.
The Android application mainly consists of four activities, presented in Figure 37 (section 7.2. Android application): a first activity, namely ChooseModel, where the user can choose which model to use in order to classify the desired leaf type, and a second one, called Classify, where the output of the classifier is displayed. Besides these main activities, two other actions are performed, namely taking a picture using the device camera and cropping the acquired image to fit the input of the classifier.

Figure 37. Application's working principle

2.5. Conclusions
The objectives of this project were the theoretical study of deep learning based classification methods and the design and implementation of an Android application for the classification of diseases characteristic to plant leaves, in precision agriculture.
For the final version of the application, two CNN classifiers were used: one classifier that provides labeling for 38 different leaf classes and a second classifier that focuses on a specific leaf type with its associated diseases, namely tomato leaves. The two models are embedded into an application intended for Android smartphones, with practical applicability in the field of precision agriculture.
The same architecture was used to train both classifiers. The model trained on the entire dataset obtained an accuracy of 80.87% during a 200-epoch training process, while the model focused on tomato leaves reached an accuracy of 87.84% in only 150 epochs, as presented in Figure 31 and Figure 34 from sections 7.1.1 and 7.1.2 respectively, where the results obtained when evaluating the performance of the two models can be found. The resulting models do not show overfitting effects, due to the appropriate selection of the number of training epochs and also due to the appropriate splitting and augmentation of the data. The choice of the adequate number of training epochs for each network was made by trying different values.

Figure 31. Assorted leaves model accuracy and loss Figure 34. Tomato leaves model - Accuracy and loss

A disadvantage of using the VGG16 model for Android applications consists in its size in terms of on-device storage. Each of the two built models has approximately 138 million parameters, which leads to an application with a size of 1.08 GB.
As future improvements, the performance of the models could be increased through a more precise preprocessing of the data. Moreover, a server-client connection could be realized by connecting the application to a database for storing the newly captured and properly classified images, in order to update the models at a certain time interval.
3. Work planning

4. State of the Art

The technology revolution has had a great impact on the agriculture domain, leading to a substantial industrialization of crop yield management. This phenomenon is referred to as Precision Agriculture and implies the usage of high-standard technologies for the administration of temporal and spatial volatility. It aims to improve the environmental quality and the performances related to crop production.
The prosperity of yield is strongly related to various production attributes that involve a high degree of uncertainty, such as soil properties, topography, weather and natural irrigation. These aspects are suitable to be handled with Machine Learning (ML) dependent systems due to their capability to deal with large amounts of data and to operate with non-linear objectives.
ML techniques bring the significant advantage of individually solving consistent non-linear tasks using the amount of information enclosed within properly designed datasets. An ML based system provides better decision-making abilities and learned actions in practical scenarios, excluding the need for human intervention. Besides that, ML algorithms incorporate the necessary knowledge into a system in order to deal with data-controlled decisions, making ML techniques widely applicable in the field of Precision Agriculture (PA) [1].
One of the most widely used ML techniques in agriculture is referred to as Pattern Recognition. This technique was initially used together with Remote Sensing (RS) for various problems such as the identification of soil properties, the detection of crop species or the recognition and classification of plant diseases.
One of the most important aspects in precision agriculture is identifying diseases within plant leaf images. The majority of plant diseases are characterized by specific visual patterns. Therefore, a ML algorithm is suitable to detect, recognize and classify the specific characteristics of a given disease.
Image classification represents a subfield of Pattern Recognition with wide applicability
in PA. It is often used for plant pathology detection, preventing in this way the crop yield from being negatively affected and leading to an enhancement of the farm productions in an environmentally
dependent manner.
The field of ML which deals with Image Classification is referred to as Computer Vision
(CV). Most of the CV techniques used to solve image classification problems are based on the
process of detecting and extracting local features within images. Therefore, there is a debate
focused on visualizing, characterizing, understanding and improving the features extracted from
the images. This is strongly related to the manner in which the meaningful information is extracted
from an image.
The feature extraction process can be manually realized through human intervention or in an automatic manner, accomplished by the system itself using powerful computational backend algorithms. The features designed manually are also called "handcrafted" and imply a trade-off between computational efficiency and performance. On the other hand, the problem of feature extraction is also approached by creating complex neural network systems that present the capability to automatically detect and extract the meaningful features.
The computation process of handcrafted characteristics implies a few steps that have to be followed to obtain the desired accuracy. First of all, in order to actually extract the right feature, it is required to select the region of the image which contains the targeted characteristic. This process is referred to as the extraction of the Region of Interest. Once the feature is localized on the image surface, the next step is to calculate a descriptor meant to distinguish each particular feature from another. Building the right descriptor strongly depends on the aim of the research in which it is used.
Furthermore, the Deep Learning (DL) techniques provide feature extraction at the level of the deep layers within neural networks. DL algorithms provide a set of features observed and learned directly from the input image. These algorithms rely on the pyramidal approach, which states that higher-level features can be extracted as the network goes deeper.
Both of these approaches perform better depending on the task they are used for. They can be used separately or they can be combined, meaning the use of handcrafted features for training deep learning based image classifiers. The meaningful traits are further fed into a classifier which is going to perform the classification task.
The detection and classification of plant diseases can be achieved through several Artificial
Intelligence (AI) algorithms. Some examples of such AI classification methods are the logistic
regression, K-nearest neighbors (K-NN), decision tree, deep convolutional networks (CNNs) and
Support Vector Machines (SVM). These methods are often combined with different data preprocessing techniques, i.e. handcrafted features, for the enhancement of the features' significance. These algorithms are split into two categories, namely supervised and unsupervised learning algorithms. These aspects will be detailed in the subsection "5.1. Image Classification".
The decision tree algorithm represents a supervised learning technique where the nodes are referred to as decision points, the connections are associated with a feasible output of a given node and the leaves represent the classes. This method implies node overlapping and data overfitting, which are major disadvantages in classification systems [2].
Another supervised learning technique is represented by the SVM algorithm. This method
is widely used to classify handcrafted features. Its basic idea is to build a hyperplane that separates
the classes in an optimum manner. It is mainly used in classification problems and in statistical
learning based linear regression [3].
Deep learning algorithms represent a class of ML techniques, also referred to as an extension of traditional machine learning techniques, due to the fact that they add complexity to the data representation within the model. The theory behind neural networks will be discussed in the "5. Theoretical Fundamentals" section.
The problem of leaf disease identification and classification has always been a challenging task within Precision Agriculture. Various combinations of feature extraction methods and machine learning classifiers were used in a trial-and-error process in order to build systems aimed at obtaining the best performance.
An example of such research is described in [4], where a deep convolutional neural network was used to perform the classification task on the PlantVillage dataset. The architecture proposed by this research consists of a 9-layer deep CNN formed of 3 convolutional layers, 3 pooling layers and 3 fully connected layers. The related work compares the results obtained using the proposed method with those obtained by using some pretrained CNN architectures such as AlexNet, Inception-V3, ResNet and VGG16. The 9-layer neural network reached an accuracy of 97%, which is higher than the performance obtained through transfer learning. The pretrained models' accuracies had values starting from 87% for AlexNet and increasing up to 94% for the Inception-V3 model.
The approach presented in reference [4] uses strong preprocessing algorithms such as
Gamma Correction, noise insertion, image flipping and rotation transformations in order to create
augmented data for better performance results.
Another work that uses DL algorithms for leaf disease image classification is described in reference [5]. This approach uses the same dataset as the research presented above and relies on the AlexNet and GoogLeNet architectures. In order to accomplish the desired task, two approaches were taken into consideration. The first method implies the transfer learning technique, while the second one involves training the two models from scratch. Different dataset partitioning methods were tried with distinct types of images (color, segmented and gray-scale) for comparison reasons. The results obtained range from an accuracy of 85.53% for AlexNet, trained from scratch with a data partitioning of 8:2 (80% training data and 20% validation data), up to 99.34% for GoogLeNet using transfer learning on the color dataset organized through the same 8:2 ratio.

Another study focused on the same dataset is presented in reference [6]. This study approaches the plant disease detection problem using both deep learning and traditional machine learning classifiers. Their work considers two manners of solving the detection problem: the first one implies using transfer learning on ResNet50, GoogLeNet and VGG16, while the second one relies on using the two networks and some traditional methods for feature extraction, implementing the classification through SVM and K-NN algorithms. A comparative study was made in order to test the performance of different feature extraction methods (traditional and deep learning) and of ML or DL classifiers.
The deep learning approach within the described study reached the highest accuracy for the VGG16 model, as compared to the other networks. Furthermore, the research also presents the results obtained for traditional machine learning classification models using manually extracted features. Examples of the extracted features are the color feature, the Histogram of Oriented Gradients (HOG), the Local Binary Pattern (LBP) and Gamma based Feature Extraction (GFE).
Reference [7] presents another SVM application for a particular crop family, i.e. potato leaves, that uses image segmentation for detection purposes. The feature extraction method used in the presented study relies on 10 characteristics specific to each image within the dataset. Examples of such features are the Gray Level Co-occurrence Matrix (GLCM) and the color plane histogram. The GLCM is used to describe the texture of the image and provides representative attributes such as correlation, energy, contrast and homogeneity. From the color histogram, features like the mean, standard deviation, skewness, energy and entropy were extracted.
The results of the previously presented study led to a traditional machine learning classifier
with the highest accuracy of 95%.
The current work intends to implement a deep learning based image classifier for precision agriculture. Two methods are proposed. The first one consists of a CNN classifier focused on the plant diseases present in the PlantVillage dataset, and the second one involves the classification only for the diseases specific to the tomato crop.

5. Theoretical Fundamentals

5.1. Image Classification


Computer vision is one of the most robust forms of the Artificial Intelligence field of science, whose goal is to reproduce the complex visual system of human beings by giving a computer the necessary "knowledge" to identify, recognize and process specific objects from images or videos. It has at its basis pattern recognition, the process of identifying the meaningful information within an image, known as a pattern. The applications derived from computer vision lead towards a wide variety of fields of research, such as object detection, identification, classification and recognition [8].
Image classification further exploits the functionalities of computer vision techniques, to
associate labels to images, which are described through specific patterns, to achieve the ability to
classify a given image to its belonging class. It relies on two principles, namely descriptive analysis,
and predictive analysis, which represent the two main characteristics of an image classification
system.
To build such a system, it is necessary to define a coherent numerical model of the reality which is intended to be observed. This model is used to interpret the meaningful traits within the data, followed by the validation of the formulated hypotheses by comparing the outcome of the descriptive model with the real output. Furthermore, the model must have the ability to predict, which means it has to be able to handle an unknown entity and to assign it to a specific class. This is attained by finding an appropriate function meant to minimize the difference between the expected outcome and the output offered by the model [9].
There are two ways of realizing an image classification system: one following the classical approach, where computer vision techniques are used to define the characteristic descriptors, and a second one, which makes use of deep learning techniques.

Figure 1.Traditional and deep learning approaches of image classification

As shown in Figure 1, the main difference between the two approaches is that in classical image classification the first step in building a model is to manually extract the essential characteristics from the input data. This is a crucial phase of the classification mechanism, because the performance of a classifier is strongly dependent on the features used as input. The traits of an image can be related to different properties such as texture, color, shapes, edges, corners, or even more complex attributes. For efficient extraction of features, it is first necessary to identify the region of interest within the image, followed by the actual extraction.

A feature descriptor is a method through which unique attributes are collected; two descriptors have similar properties if they describe two related interest points from two different images. There exist many types of feature descriptors, such as the "Scale-Invariant Feature Transform (SIFT)", the "Speeded Up Robust Features (SURF)" and the "Histogram of Oriented Gradients (HOG)", presented in [10].
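To make the idea of a handcrafted descriptor concrete, the short sketch below computes a HOG descriptor with scikit-image (one of the libraries listed later in section 6.1.2); the image file name and the HOG parameters are illustrative assumptions:

```python
from skimage import color, io
from skimage.feature import hog

# Load an example leaf image and convert it to grayscale (hypothetical file name).
image = color.rgb2gray(io.imread("leaf_sample.jpg"))

# Compute the HOG descriptor: gradient orientations are histogrammed over small cells
# and normalized over larger blocks (parameter values are illustrative).
features, hog_image = hog(
    image,
    orientations=9,
    pixels_per_cell=(8, 8),
    cells_per_block=(2, 2),
    visualize=True,
)

# "features" is a 1-D vector that could be fed to a classical classifier such as an SVM.
print(features.shape)
```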
Deep learning is a division of a wider science branch, so-called Machine Learning (ML), whose purpose is to emulate the human brain's working principle. Its basis consists of the ability of a computer to reproduce the neurobiological functions of the neurons inside the brain and, through structured machine learning algorithms, to simulate human intellect. As mentioned in [11], a ML system must be capable of generalizing, that is, of recognizing convoluted information patterns that are inter-correlated through similarity. Such a system aims to recognize an object in distinct situations, perspectives, colors, sizes, and over a diversity of backgrounds.
The principle of deep learning techniques is to teach a computer how to filter a given input
through multiple hidden layers composed of units called neurons, in order to predict and label the
information. Based on the way the information is “filtered” by the human brain, deep learning is
often called deep neural learning, and it is known under the name of deep neural networks. In order
to train a deep neural network (DNN), a considerable supply of labeled data and neural architectures
is required, and its purpose is to determine and recognize features precisely from the input set of
data without the need for manual extraction of meaningful features [12].
The algorithms used in machine learning can be split into three categories based on the way they learn: supervised, semi-supervised and unsupervised learning. Supervised learning is characterized by the fact that the input data is properly labeled and has a very well-known result. When training a supervised system, the model is adapted during the training process, in which predictions are required to be made and, in case of erroneous predictions, the model must be adjusted. The training process is continuous and it stops when a desired level of accuracy is obtained.
Unsupervised learning consists of using unlabeled data, which means the input data does not have a well-known result. The purpose of this type of system is to classify data based on similarity patterns or to extract generic rules and, through mathematical procedures, to consistently diminish the redundancy. Semi-supervised learning methods are known for using, as input data, a mixture of labeled and unlabeled samples. In this case, the model must gain "knowledge" of the data structures in order to organize the input data so that it can make the desired predictions [13].
Most classical ML follows the supervised learning method, which means that a preprocessing phase of the input data is needed. In this stage, the user has to properly extract the most significant features from the raw data. This process is referred to as feature extraction and it has to be done manually by the user. The success of a learning algorithm is conditioned by this preprocessing stage; therefore, the features used in the training process have to be meaningful and to concretely describe the data samples.
On the other hand, deep learning methods imply the unsupervised learning technique. The user is no longer compelled to perform the manual extraction of the specific characteristics representing the training data. This type of learning is based on the extraction of essential elements from the input data by the program itself, with no further need for supervision. Unsupervised learning provides higher processing speeds and, most of the time, better accuracy.
Another important aspect to keep in mind is that the resulting accuracy of a training process
is strongly related to the dataset used to feed the model. A well-built and structured database
containing clear samples of data that provides a wide variety of augmented data leads to higher
performances of the model. Using a proper dataset has a critical impact on the ability of the system
to generalize.
Artificial Neural Networks (ANNs), the structures under which deep learning is known, represent a complex type of ML technique that is most often used whenever a very large amount of data needs to be processed [14].

5.2. Artificial Neural Networks

5.2.1. Architecture
Deep learning attempts to reproduce the layered activity of the neurons inside the human
brain. As an analogy with the biological neural system of human beings, the neurons themselves are not able to perform convoluted tasks, but when there are many of them, the network obtained by interconnecting all the neurons results in a very complex system with high computational capability and impressive performance [15].
An ANN has at its basis the neural units known as neurons, which are further structured following a layered approach. A neural network usually consists of three categories of layers, as follows: an input layer, followed by one or more hidden layers and, in the end, an output layer. The data on which the model trains is fed into the input layer, then it flows through a succession of hidden layers until it reaches the output layer of the network, which provides the desired signals to properly classify the input data. The output of the top (output) layer provides the predictions. The data flowing through the network undergoes distinct transformations depending on the layer it passes through. The complexity of the transformations applied to the data increases as it goes deeper into the network.

Figure 2. Basic Architecture of an Artificial Neural Network

As we can see in Figure 2, the number of neurons in the input layer is three and each node corresponds to one component of the input data. The number of nodes in a hidden layer can vary and its value is chosen depending on how complex the network is required to be, while in the output layer there are as many neurons as the number of desired outputs. In this case, the number of artificial neurons in the top layer is three, which means that there are only three possible output classes for each input sample that has passed through the layered structure.
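A minimal Keras sketch of the kind of three-input, three-output network shown in Figure 2 is given below; the hidden-layer size and the activation functions are illustrative assumptions, not details taken from the figure:

```python
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

# Three input features, one hidden layer, three output classes, as in Figure 2
# (the number of hidden units and the activations are assumptions).
model = Sequential([
    Dense(4, activation="relu", input_shape=(3,)),   # hidden layer
    Dense(3, activation="softmax"),                  # output layer: three possible classes
])
model.summary()
```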

5.2.2. Data flow through neural networks

Based on the data flow through the network, there exist two different types of ANNs: feedback and feedforward neural networks. The feedforward networks are based on the classical architecture composed of an input layer, hidden layers and an output layer. The data flows in only one direction, from the input layer towards the output one. Except for the first layer, the input of the neurons from the rest of the layers is a weighted sum of the outputs of the interconnected neurons from the previous layers, defining in this way the forward propagation of processed data. Feedback neural networks, also known as Recurrent Neural Networks (RNNs), differ through the fact that they incorporate feedback paths. In a feedback network, some signals can flow in different directions by implementing feedback loops. These loops make RNNs dynamic and non-linear neural systems that permanently change their parameters until they find an equilibrium point. A detailed comparison between these two categories of networks is presented in [16].
At the input level, the information is fed into the network following a numerical pattern.
Therefore, each node is represented through a given number which is referred to as the activation
value. The greater the value is, the higher the activation becomes, which means that the neuron has
a greater chance to output a non-zero value.

Figure 3. Weights representation in ANNs

As it is shown in Figure 3, the interconnection between two neurons is called a "weight", also known as the connection strength. These values are used in order to compute the weighted sum of the outputs shaped by each particular neuron in a specific layer. The obtained result is further passed through an activation function at the neuron level, which decides whether the signal passes through that specific node or not. In this way, the activation value flows through all the layers of the network until it enters the final layer, which provides as output a value that can be easily interpreted. The predicted value is further compared with the expected output value, giving the cost function, also referred to as the loss function. This property describes the performance of a system, the goal being to minimize the loss as much as possible. A deep neural model with minimum loss produces mostly accurate predictions and therefore has increased performance. To minimize the cost, specific information is sent back through the network. Based on the backpropagation algorithm, new values are assigned to the weights and the entire process is repeated [15].

neuron's output = activation(weighted sum of all the inputs)   (1)

This algorithm defines the learning process. Each particular neuron from a given layer is connected to all the nodes from the very next layer and every connection between two nodes is uniquely characterized by its weight. By adapting the weights to optimal values in order to obtain the minimum possible loss, the model learns the common features of each class. Therefore, a deep learning model is also known as a method that implies learning by example.
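To make equation (1) and the forward propagation described above concrete, here is a small NumPy sketch of one fully connected layer; the weight values and the choice of ReLU are illustrative assumptions:

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit, used here as the layer's activation function."""
    return np.maximum(0, x)

# Three activation values entering the layer (one per input node).
a = np.array([0.5, 0.1, 0.9])

# One row of weights per neuron in the current layer, plus one bias per neuron
# (values are arbitrary, chosen for illustration).
W = np.array([[0.2, -0.4, 0.7],
              [0.6,  0.1, -0.3]])
b = np.array([0.05, -0.1])

# Equation (1): neuron's output = activation(weighted sum of all the inputs).
layer_output = relu(W @ a + b)
print(layer_output)
```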

5.2.3. Activation functions

Each specific neuron of a given layer takes as input numeric values provided either by the features extracted from the training data, in the case of the input layer, or by the outputs of the nodes from the preceding layer. In order for the signals to flow through the network, they pass over uniquely described weighted connections. When a signal enters a neuron, an activation function is needed to decide whether that node fires or not. This function determines whether a specific neuron outputs a value or not. When a value is given as output, it is mapped through this function, as shown in Figure 4.

Figure 4. Neuron's working principle

There are different types of activation functions, but they mainly split into two major categories, linear and non-linear, depending on the mathematical expression used.
Suppose a sequence a1, a2, a3, …, an represents all the incoming values for a given node. Each value has its own associated weight, so that w1, w2, w3, …, wn forms the sequence of weights. The weighted sum of all the incoming inputs represents the linear output of the node and it is defined as:

𝑦𝑙𝑖𝑛𝑒𝑎𝑟 = 𝑤1 ∙ 𝑎1 + 𝑤2 ∙ 𝑎2 + ⋯ + 𝑤𝑛 ∙ 𝑎𝑛 + 𝑏 (2)

In equation (2), the term "b" is known as the bias value; it is a supplementary parameter whose role is to fine-tune the output after the weights are applied. In order for the network to be able to recognize complex patterns, these linear outputs have to be converted into non-linear ones:

𝑦𝑛𝑜𝑛−𝑙𝑖𝑛𝑒𝑎𝑟 = 𝑎𝑓(𝑤1 ∙ 𝑎1 + 𝑤2 ∙ 𝑎2 + ⋯ + 𝑤𝑛 ∙ 𝑎𝑛 + 𝑏) (3)

The role of the activation function depends on the layer on which it is applied: if the activation is applied over a hidden layer, it maps the output into a non-linear form, while in the case of the output layer, the activation function delivers the predictions. [17]
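As an illustration of equations (2) and (3), the following minimal NumPy sketch (with hypothetical input values, weights, and bias) computes the linear output of a single node and then applies an activation function to obtain the non-linear output:

import numpy as np

def neuron_output(a, w, b, activation):
    """Compute af(w1*a1 + ... + wn*an + b) for a single node."""
    y_linear = np.dot(w, a) + b          # weighted sum plus bias, equation (2)
    return activation(y_linear)          # non-linear mapping, equation (3)

# hypothetical incoming values, weights and bias
a = np.array([0.5, -1.2, 0.3])
w = np.array([0.8, 0.1, -0.4])
b = 0.05

print(neuron_output(a, w, b, activation=np.tanh))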
There are various functions used for activation, but only a few of them are commonly used to
perform tasks inside a neural network.
The simplest such function is the threshold function, also referred to as binary activation due to its functionality.
This is also known as the step function: it maps the output of the node either to 0, if the weighted sum of the inputs is lower than a given threshold, or to 1, if the sum is greater. [15]
Another commonly used type of activation function is the Sigmoid Function, whose
formula is presented in the following relation:

af(x) = 1 / (1 + e^(−x))    (4)

This function derives from the linear threshold function and represents a smooth, gradual progression with values going from 0 to 1; it is therefore very popular in logistic regression [17]. The sigmoid function is often used as an activation at the output layer, where it provides the statistical probabilities of the resulting predictions.
Figure 5. Sigmoid Function

Another heavily used activation function is the hyperbolic tangent, which works like the sigmoid function, but instead of providing an output with values between 0 and 1, it widens the possible output range to (-1, 1). Since the output can take values below 0, using the hyperbolic tangent as an activation function can improve the results obtained with the sigmoid by preventing the network from getting stuck when many strongly negative values reach the input of a node.
One of the most frequently used activation functions for hidden layers in a neural network is the Rectified Linear Unit, also known as the ReLU function. From a biological point of view, this function is considered to emulate most closely the actual decision process inside the neurons of the human brain. ReLU does not require pre-normalization of the input data: it outputs exactly 0 for any negative input and passes the input value through unchanged when the input is greater than 0. [18]

Figure 6. ReLU function

For multi-class classification models, the SoftMax activation function is very often used in the output layer, because its output provides a probability distribution over an array of real values, in which the true class should receive the highest probability. SoftMax is the multi-class counterpart of the Sigmoid function. Its mathematical form is described in the following relationship [19]:

af(x_i) = e^(x_i) / Σ_j e^(x_j)    (5)
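As a reference, the commonly used activation functions discussed above can be sketched in NumPy as follows (a minimal illustration, not the exact implementations used by deep learning frameworks):

import numpy as np

def step(x, threshold=0.0):
    return np.where(x >= threshold, 1.0, 0.0)   # binary / threshold activation

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))             # equation (4), output in (0, 1)

def tanh(x):
    return np.tanh(x)                           # output in (-1, 1)

def relu(x):
    return np.maximum(0.0, x)                   # 0 for negatives, identity otherwise

def softmax(x):
    e = np.exp(x - np.max(x))                   # shift for numerical stability
    return e / e.sum()                          # equation (5), sums to 1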

5.3. Learning algorithm


When the neural network is created, the values of the weights are randomly initialized. These values are used during the first epoch of training. An epoch represents one complete pass through the whole input data set. During one epoch, at each step, the data is fed to the first layer grouped into smaller chunks called batches. It then flows through the network and, in the end, a prediction is made. The difference between the expected value and the actual predicted value defines the cost function, also known as the loss function.
The main goal of training a deep neural network is to find the minimum of this function. To achieve this goal, a gradient has to be calculated so that the weights can be adjusted to proper values in order to increase the accuracy of the model.

5.3.1. Gradient calculation. Cost function

Suppose an artificial neural network architecture with N layers, where y represents the desired output value and C is the cost function of the system. There is a wide variety of cost functions, but a very frequently used one is presented below:

C = (1/s) · Σ_{k=1..s} (y_k − ŷ_k)²    (6)

This function consists in summing up the squares of the differences between the expected output value and the output value provided by the network, over all s samples. The gradient is used to find the dependency between the network's cost function and the network's parameters, such as activation values, weights, and biases. Consider, by abstraction, that "a" represents the activations of all the nodes within a given layer and "w" the weights associated with the connections between any two neurons from two adjoining layers. For the interconnections between the last layer and its neighboring layer, the corresponding cost can be defined as follows [20]:

x^(N) = w^(N) · a^(N−1) + bias^(N)    (7)

a^(N) = af(x^(N))    (8)

Cost = (a^(N) − y)²    (9)

Therefore, to compute the gradient, the partial derivative of the cost function is needed:

∂Cost/∂w^(N) = (∂Cost/∂a^(N)) · (∂a^(N)/∂x^(N)) · (∂x^(N)/∂w^(N)) = 2 · (a^(N) − y) · af'(x^(N)) · a^(N−1)    (10)

In order to optimize the system loss, it is required to compute the contribution of each of
the parameters (weights, activations, and biases) over the cost function:

∂Cost/∂w^(N) = (∂Cost/∂a^(N)) · (∂a^(N)/∂x^(N)) · (∂x^(N)/∂w^(N))
∂Cost/∂a^(N−1) = (∂Cost/∂a^(N)) · (∂a^(N)/∂x^(N)) · (∂x^(N)/∂a^(N−1))    (11)
∂Cost/∂bias^(N) = (∂Cost/∂a^(N)) · (∂a^(N)/∂x^(N)) · (∂x^(N)/∂bias^(N))

The parameters which can be tuned during the learning phase are the weights and the bias values. Thus, to complete this calculation, the individual contributions of all the weights and biases are collected into an array whose dimension equals the total number of weights and biases. The gradient is denoted using the symbol ∇ and it is also referred to as the "gradient vector". Its expression is the following:

∇Cost(w_1, bias_1, w_2, bias_2, …, w_M, bias_M) = [ ∂Cost/∂w_1,  ∂Cost/∂bias_1,  …,  ∂Cost/∂w_M,  ∂Cost/∂bias_M ]ᵀ    (12)
The gradient vector is computed at the batch level. For each sample examined during a batch, new contributions to the weight and bias gradients are computed, and the numerical mean of these contributions over the batch represents the output of the gradient vector. Therefore, by knowing the output value of the gradient vector, the cost function is gradually reduced whenever a new data batch is evaluated.
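Equations (7)–(11) can be summarized in a short NumPy sketch for a single output neuron with a sigmoid activation (a simplified illustration with hypothetical values, not the full backpropagation over all layers):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# hypothetical previous-layer activations, weights, bias and target
a_prev = np.array([0.2, 0.7, -0.1])
w = np.array([0.5, -0.3, 0.8])
bias = 0.1
y = 1.0

x_N = np.dot(w, a_prev) + bias                  # equation (7)
a_N = sigmoid(x_N)                              # equation (8)
cost = (a_N - y) ** 2                           # equation (9)

af_prime = a_N * (1.0 - a_N)                    # derivative of the sigmoid at x_N
dCost_dw = 2 * (a_N - y) * af_prime * a_prev    # equation (10)
dCost_dbias = 2 * (a_N - y) * af_prime          # bias term from equation (11)
dCost_da_prev = 2 * (a_N - y) * af_prime * w    # activation term from equation (11)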

5.3.2. Learning Rate


The minimization of the model's loss is not done in a single step; it is achieved after a certain number of steps. The size of a step determines what percentage of the error is corrected after one batch examination. This aspect is strongly related to a hyperparameter of a deep learning model, referred to as the "learning rate" (LR).
The contribution of each weight to the cost function, described by its specific gradient, is multiplied by the value of the learning rate. Usually, the values chosen for the LR are quite small, so after one step only a small amount of error is corrected. This hyperparameter has a major impact on the speed of the learning process [21]. The error correction is applied to each particular weight and bias in the network as follows:

w_layer_new = w_layer_old − LR · ∂Cost/∂w_layer_old    (13)

bias_layer_new = bias_layer_old − LR · ∂Cost/∂bias_layer_old    (14)

The learning rate hyperparameter can have a major impact on the learning process. If the value is too high, it triggers the overshooting phenomenon, which consists of stepping over the minimum value of the cost function due to the large correction step given by the LR. On the other hand, a very small value of the learning rate can lead to a longer processing time and, in some situations, it can cause the network to get stuck.
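Equations (13) and (14) translate directly into the following update step (a minimal sketch with hypothetical parameter values and gradients, assumed to be already computed as in the previous sketch):

import numpy as np

LR = 0.001                                   # hypothetical learning rate

w_layer_old = np.array([0.5, -0.3, 0.8])     # hypothetical current weights
bias_layer_old = 0.1
dCost_dw = np.array([0.04, -0.02, 0.07])     # hypothetical gradients
dCost_dbias = 0.03

w_layer_new = w_layer_old - LR * dCost_dw            # equation (13)
bias_layer_new = bias_layer_old - LR * dCost_dbias   # equation (14)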

5.3.3. Stochastic gradient descent – SGD optimizer


Gradient descent is an algorithm used for optimization purposes in artificial neural networks. Its mathematical objective is to find the minimum value of a given function. In order for gradient descent to work properly, the function has to be convex, as shown in Figure 7.

Figure 7. Gradient descent

This algorithm is based on the slope calculation: depending on its value, the point on the graph moves from left to right if the slope is negative, and from right to left if the slope is positive. During each iteration, the point moves down the graph with a step whose size is given by the learning rate, until the minimum value is reached. [22]
In an artificial neural network, gradient descent is computed by considering and adjusting the weights using all the rows (samples) of the dataset at once. This method is also known as batch gradient descent. When the loss function is not convex, as in Figure 8, the gradient descent algorithm is only able to find the first local minimum. In order to identify the global minimum of the cost function, stochastic gradient descent is used.

Figure 8. Stochastic gradient descent

This method implies adjusting the weights sample by sample (row by row) within the dataset. Instead of using one large batch, the algorithm splits the data into mini-batches composed of random samples, hence the name "stochastic", and it updates the biases and weights based upon the mean value of the gradient within the given mini-batch. The values of the weights are randomly initialized at the beginning of the training process, while the biases can be initialized in various ways but are usually set to 0. [20]

5.3.4. Adaptive moment estimation – Adam optimizer


Gradient descent involves iteratively taking small steps until the minimum of the loss function is reached, for the proper values of the weights. The problem with this simplest optimizer is that each weight is updated only once after passing through the entire dataset. Due to the typically large values of the gradient, the error corresponding to each weight is also corrected with a large value. Thus, the weights might hover around their optimal values without ever being able to reach them.
As a solution to this, the SGD optimizer updates the weights more frequently, i.e. at the data sample level. This can generate noisy jumps that move further away from the optimal values. Therefore, the mini-batch SGD optimizer was introduced, which updates the weights after a few samples pass through the network.
Another way to address the SGD problem is to use the momentum property. By adding momentum to the optimizer, the model can learn faster by paying less attention to the data samples that push the weights away from their optimal values. However, if the momentum grows too large, the weights can pass over their optimal values. To combat this problem, acceleration is added to the SGD momentum optimizer; in this way, the amount of error correction applied to the weights is better managed. These properties, i.e. the momentum and the acceleration, are fixed for each parameter due to the fixed learning rate.
The Adam optimizer allows an adaptive learning rate for every parameter. Adaptive learning rate optimizers can learn more in one direction than in another and are widely used in neural networks.
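In Keras, the optimizers discussed above are available as ready-made classes; a short illustration of how they might be instantiated is shown below (the hyperparameter values are only examples, not the ones recommended for every task):

import tensorflow as tf

# plain SGD, SGD with momentum (and Nesterov acceleration), and Adam
sgd = tf.keras.optimizers.SGD(learning_rate=0.01)
sgd_momentum = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9, nesterov=True)
adam = tf.keras.optimizers.Adam(learning_rate=0.0001)  # adaptive learning rate per parameter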

5.4. Convolutional Neural Networks
Convolutional neural networks (CNNs) form a subdivision of deep learning artificial neural networks mainly designed to perform object recognition, detection, image classification, or other complex tasks that imply working with images. Their working principle consists in assigning learnable weights and biases to meaningful aspects of the input image, which also makes them able to distinguish between different objects.
Unlike traditional neural networks, CNNs do not require as much preprocessing of the images. The filters used to extract the most important characteristics of an image are not manually engineered as in the case of classical artificial neural networks; convolutional networks have the capability to determine these filters automatically. The main advantage of a CNN over a classical neural network is that the ConvNet reduces the dimensions of the input image while keeping its principal characteristics.

5.4.1. Architecture
The simplest architecture of a convolutional neural network is composed of five types of layers: an input layer, a convolution layer immediately followed by an activation layer, a pooling layer, and a series of fully-connected layers, among which the output layer.

Figure 9. Convolutional neural network architecture

The input layer of a convolutional neural network contains the image represented as digital data shaped into a single column. The given images are split into subregions and each subsection is further sent to a particular neuron from the very next layer. It is important for the images which feed the input layer to be preprocessed, so that they are scaled and resized to the same dimensions.
The next layer is the convolution layer, whose task is to extract the powerful features from the subareas received from the previous layer. Its operational principle is to perform the mathematical operation called convolution, by applying a filtering kernel on the subsection received as input. The kernel and the sample of the image must have the same dimensions, as well as the same depth, to conserve the spatial relationship among pixels. The main objective of this layer is to create a feature map that relies on the meaningful characteristics found within the image. Each convolutional layer needs an activation layer on top of it in order to grant importance to the significant traits within the previously created feature map. The first convolutional layer in a convolutional neural network extracts low-level characteristics, such as edges, corners, colors, and gradient orientations. As the data goes deeper into the network, high-level characteristics are extracted through the convolution layers, providing the network with a superior understanding of the processed data.
The next layer is known as the pooling layer. Similarly to the convolution layer, it uses a kernel in order to reduce the spatial size of the feature map. This is necessary in order to cut down unnecessary information, delivering the image in a format that requires less computational power. Additionally, this layer also has an important role in extracting position- and rotation-invariant features, increasing in this way the generalization capability of the model.
The output of the pooling layer is flattened into a fully connected layer in order to provide the necessary format to feed the next layers, whose function is to learn non-linear combinations of powerful characteristics for classification purposes. The data further flows through the network and is processed as described in the "5.3. Learning algorithm" section above, where the last activation layer uses the SoftMax function. [23]
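A minimal Keras sketch of such an architecture is shown below (the layer sizes and the number of classes are hypothetical and serve only to illustrate the layer order described above):

import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 10  # hypothetical number of output classes

model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),                             # input layer (RGB image)
    layers.Conv2D(32, (3, 3), activation='relu', padding='same'),  # convolution + activation
    layers.MaxPooling2D(pool_size=(2, 2)),                         # pooling layer
    layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Flatten(),                                              # flatten the feature maps
    layers.Dense(128, activation='relu'),                          # fully connected layer
    layers.Dense(NUM_CLASSES, activation='softmax')                # output layer with SoftMax
])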

5.4.2. Convolution algorithm


Convolution is a mathematical operation used to establish the relationship between two signals of the same dimensionality. In image processing, the 2D convolution is a very widely used technique: it performs the computation on two inputs, namely the image to be analyzed and the kernel matrix, also known as the filter. Depending on the kernel used, the convolution performs different operations, for instance edge detection, sharpening, blurring, noise reduction, or even more complex applications, as presented in [24].
y(k, l) = Σ_{p=−∞..+∞} Σ_{q=−∞..+∞} m(p, q) · x(k − p, l − q)    (15)

The mathematical expression of the convolution process is presented in equation (15), where y(k, l) represents the output obtained by convolving the kernel m(p, q) with the image x(k, l). In order to compute the convolution of an image, the first step is to choose the proper kernel depending on the task to be accomplished. The kernel is then flipped in both the vertical and horizontal directions, as shown in Figure 10:

Figure 10. Example of kernel used for convolution

The kernel then slides over the image, performing the dot product between the elements of the image and the elements of the filter. The kernels used to implement the convolutional layers in neural networks have odd dimensions so that they contain a reference point, referred to as the center of the matrix. The computation algorithm consists in placing the kernel's reference point over the first pixel and then sliding the filter across the image horizontally, to the right. When an entire row has been traversed, the kernel hops down to the next row and the process is repeated until the filter has passed over the entire image, as can be seen in Figure 11. The step by which the filter moves within the image is called the stride.
The convolution process can reduce the dimensions of the output image, depending on the padding technique used. When the filter does not properly fit the image, the picture is padded. There are two types of padding algorithms: "same padding", also known as zero-padding, when the picture is padded with zeros so that the filter fits, and "valid padding", when the parts of the image which do not fit are dropped. [23]
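The algorithm described above can be sketched in NumPy as follows (a simplified illustration with stride 1 and "valid" padding; the kernel flip is what distinguishes true convolution from correlation, and the image and kernel values are hypothetical):

import numpy as np

def convolve2d_valid(image, kernel):
    """Naive 2D convolution with stride 1 and no padding ("valid" mode)."""
    kernel = np.flipud(np.fliplr(kernel))         # flip the kernel vertically and horizontally
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):                        # slide the kernel over the image
        for j in range(out_w):
            region = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(region * kernel)   # dot product between region and kernel
    return out

# hypothetical 5x5 image and 3x3 edge-detection kernel
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]], dtype=float)
print(convolve2d_valid(image, kernel))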

Figure 11. Convolution algorithm

5.4.3. Pooling operation

As mentioned in section "5.4.1. Architecture", a convolution layer within a CNN is followed by a pooling layer. Its main objective is to trim the number of trainable parameters of a neural network in the cases when the processed data is too large. The pooling process, also referred to as down-sampling or sub-sampling, keeps the most compelling information within a feature map while its dimension is considerably reduced.
There are three categories of pooling layers, namely sum pooling, average pooling, and max pooling. Pooling is similar to a convolution operation with no padding and a stride equal to the size of the used mask, the justification being the dimensionality reduction of the feature map. The three variants work in similar ways, the difference being the mathematical rule used to obtain the output value of the pooling operation.
The max-pooling method consists of selecting the largest value from the region of the image covered by the mask, which becomes the value of the newly obtained pixel. As in the convolution process, the mask travels across the feature map from left to right horizontally and downward vertically. Based on the same principle, average pooling computes the arithmetic mean of the pixel intensities within the computational region wrapped by the mask. The obtained values are placed in a new feature map of reduced dimensions. Finally, the sum-pooling algorithm replaces the averaging step and simply sums up the values overlapped by the mask, obtaining the new pixel's intensity value. [23]
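A minimal NumPy sketch of these pooling variants with a non-overlapping mask and a stride equal to the mask size is shown below (a simplified illustration; rows and columns that do not fit under the mask are simply dropped, and the 4x4 feature map is hypothetical):

import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Down-sample a 2D feature map with a non-overlapping size x size mask."""
    h, w = feature_map.shape
    out = feature_map[:h - h % size, :w - w % size]       # drop rows/cols that do not fit
    out = out.reshape(h // size, size, w // size, size)   # group pixels under each mask position
    if mode == "max":
        return out.max(axis=(1, 3))        # max pooling
    if mode == "avg":
        return out.mean(axis=(1, 3))       # average pooling
    return out.sum(axis=(1, 3))            # sum pooling

fm = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(fm, mode="max"))
print(pool2d(fm, mode="avg"))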

Figure 12. Types of pooling


5.5. Overfitting. Underfitting
In order to train a neural model, the entire dataset is split into three components: training data, formed of the majority of the images, and validation and testing data, which represent a small percentage of the total amount of input pictures. The images from the three sets must be different and unique. The number of samples in the training dataset and their quality strongly influence the performance of the model and its capacity to generalize.
When the model performs well on the training set but is not able to classify and predict the data on which it was not trained, the phenomenon called "overfitting" appears. The presence of overfitting can be spotted by monitoring metrics during the training process, such as the validation/training accuracy and loss. Also, if the model is trained for too long, it can overfit, since the model starts to learn inappropriate details, such as the noise present in the images from the dataset. To avoid this, the images used to train the model need to be preprocessed using special noise-removal filters.
The opposite situation to overfitting is referred to as "underfitting". This phenomenon occurs for different reasons: the complexity of the model is not high enough, the model is not trained long enough, or it is over-regularized. Underfitting therefore means that the model does not have the capacity to perform well on either the training or the validation data.
In order to solve the overfitting problem, different solutions can be adopted, such as adding more images to the training dataset, reducing the complexity level of the model, or using dropout layers.
Increasing the number of samples in the training dataset also increases the diversity of the images on which the model is trained. The performance of a convolutional neural network grows with the number of training samples, meaning that the classifier performs better when the dataset contains a larger variety of images. If there is no possibility to supply the input data with new images, a technique called data augmentation is commonly used. This method implies the creation of new samples based on the already existing images by changing some of their spatial properties.
Another way to deal with overfitting is to reduce the complexity level of the model by removing some of the component layers or decreasing the number of nodes within the hidden layers. This can lead to a better generalization capability of the neural network. Furthermore, a commonly used technique that can reduce the overfitting effect is adding dropout layers to the network. Using such a layer, there is a specific probability that the network neglects some subgroups of neurons within the layers, restricting in this way the nodes from the given subset from providing any output with impact on the prediction.
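In Keras, this behaviour is available through the Dropout layer; a short hedged sketch (the layer sizes and the 0.5 dropout rate are only illustrative) is given below:

import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Dense(128, activation='relu', input_shape=(64,)),
    layers.Dropout(0.5),        # each training step ignores a random 50% of these nodes
    layers.Dense(10, activation='softmax'),
])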
In the case of underfitting, when the model is not capable of performing well even on the training data, it helps if the complexity of the model is increased. To realize this, a larger number of neurons actively participating in the training process is used. Moreover, the complexity of the network increases proportionally with its depth, so another solution to deal with underfitting is to add more layers.
The underfitting problem can further be fixed if the number of particular features used to feed the network is increased. Using combinations of high-level characteristics extracted from the image strongly improves the performance. If the network contains dropout layers, another way to eliminate underfitting consists in decreasing the percentage of the neurons which do not actively participate in the training process. [25]
The techniques used to deal with underfitting are the opposite of those used for solving the overfitting problem. These two aspects must therefore be carefully managed, since there is a great possibility that the solution for one issue leads to the occurrence of the other. A tradeoff is thus necessary in order to find the proper configuration which avoids both issues.
Underfitting and overfitting are two major obstacles in deep learning, and their impact on the model's performance can be devastating.
5.5.1. Data augmentation

A convolutional neural network learns based on the examples which it sees. Therefore, if it is trained on a specific type of image with a predefined spatial distribution of the objects, it is not capable of classifying images that it has never seen. This problem is referred to as low generalization capability. In order for the model to be able to generalize and classify new images, the technique called data augmentation is commonly used.
This technique provides a new set of images derived from the original dataset, by applying a set of operations meant to give a new viewing perspective of the image data. Data augmentation is implemented through a series of image-oriented operations like zooming, scaling, rotation, shearing, shifting, cropping, or flipping.
These operations are applied depending on the nature of the images to be classified, but most of the time combinations of them are used for better performance [26]. This preprocessing of the input data is performed before the learning takes place. The basic idea is that the model does not process the same image multiple times during training.
An important aspect related to data augmentation is interpolation. When the image undergoes modifications which imply dimensionality changes, such as rotation or shifting, vacant background regions remain. These regions are filled with proper values through an interpolation algorithm. There are two categories of interpolation, namely adaptive and non-adaptive techniques. They are designed to fill the vacant pixels with specific values following predefined rules, as described in [27].

5.5.2. Fine-tuning. Transfer learning


Transfer learning is a technique that makes use of the already gained knowledge of a pre-trained network and further trains it on a new, different set of data. Strongly related to this is the process known as fine-tuning.
A convolutional neural network classifier can be built from scratch, meaning that each layer within the model must be manually created and configured depending on the task that the network is meant to perform. This can be considered a disadvantage because it is time-consuming and also implies hard manual work to modify the architecture and the specific hyperparameters according to the achieved results.
As a solution to this, basic CNN models have been implemented, which can be found at [28]. These models are convolutional neural networks with already trained parameters and predefined architectures. They are meant to be further used to perform different tasks than the one they have been created for.
The predefined CNN models offer the possibility to train the entire network, meaning that all the weights specific to each layer are adjusted when the learning takes place. Alternatively, the pre-trained parameters can be kept, a process referred to as "freezing the weights". Transfer learning allows selecting the layers to be retrained, and it also allows building on top of already existing models.
The fine-tuning process implies training only the top layers of such a predefined convolutional neural network, also known as the classification stage. The layers dealing with feature extraction are usually not retrained. The architecture of these models can be modified, meaning that new layers can be added to the network or already existing layers can be removed or modified.
This procedure represents an advantage in working with neural networks, since building a CNN classifier based upon a predefined model is less time-consuming [29].

6. Implementation

The proposed method implements a classifier based on the Convolutional Neural Network (CNN) machine learning technique and will serve as an application for precision agriculture. The system will be trained with different types of leaves and it will distinguish between categories. Also, the model will be able to predict whether or not a specific leaf is healthy.
Different techniques were initially approached, involving training custom architectures from scratch and transfer learning over the InceptionV3, Xception, MobileNet, and VGG16/19 models. As a result of the theoretical analysis and preliminary experimental validation, the fine-tuned VGG16 architecture has been chosen as the core of the Android application.
The implementation of the approached subject involves a large variety of technologies. Firstly, the classifier is built using the Python programming language in the Jupyter Notebook app installed on top of the Anaconda development tool. Deep learning functions are available through the Keras Application Programming Interface (API), which is encapsulated in the TensorFlow API. Then, the model is transferred to an application created with Android Studio.

6.1. Technologies

6.1.1. Anaconda environment. Jupyter Notebook


Anaconda is a computing platform based on the Python and R programming languages, intended for scientific and data processing purposes. It is a free-to-use Python distribution meant to simplify the development of applications based on machine learning, predictive analysis, large-scale data visualization, neural networks, and bias mitigation. It encloses a series of automatically installed packages and allows the installation of new Python-related packages designed for ML projects and data science applications.
The Anaconda environment has a Graphical User Interface (GUI), also known as the Anaconda Navigator. The navigator grants access to a series of Integrated Development Environments (IDEs)/editors such as JupyterLab, Jupyter Notebook, QtConsole, Spyder, Glue, Orange, RStudio, and Visual Studio Code. It represents an effortless way of working with and managing packages and environments.
Anaconda provides support for different Python versions and their specific packages; therefore, it offers the possibility to work with environments. An environment has a single Python version and allows the installation of particular tools. The Anaconda Navigator encloses the necessary tools to work with environments, so that they can be created, updated, cloned, shared, restored, or deleted. They include pip [30] functionalities, which is the Python package installer.
The Jupyter Notebook application is a free-to-use IDE provided by the Anaconda distribution. It allows us to run and edit web documents, referred to as notebooks, within a web application. It offers files that can be run as executables, further used in other applications, or files that can contain both code and human-readable text. The code within a notebook is executed with the help of the IPython kernel, referenced in [31].
Anaconda is often used for machine learning applications because it provides a large variety of useful tools specially designed for data analysis. The packages used in this project are Pandas, NumPy, Keras, TensorFlow/TensorFlow Lite, Matplotlib, and scikit-image.
The models described in this paper have been implemented on top of the two Application Programming Interfaces (APIs) mentioned above, namely Keras and TensorFlow.

6.1.2. Python Packages for Machine Learning

TensorFlow (TF) is an open-source API meant to provide easy-to-use features for building complex machine learning applications. It wraps together a set of general-purpose machine learning and deep learning algorithms, and it also encloses high-level, useful facilities. It represents the backend core for Keras, being also referred to as its computational engine.
TensorFlow is suitable for applications which imply working with images and neural networks, since it provides the possibility to work with dataflow structures, which are graphs composed of a series of nodes where each node can represent a specific mathematical operation.
One of the major advantages that come with TensorFlow is its abstraction capability. It allows the developer to focus on the global logic of the end-user program, with TensorFlow dealing with the fine details of the underlying processes. Moreover, it allows eager execution, which means it provides the possibility to separately modify each particular operation and evaluate its results.
In this project, TensorFlow 2.0 was used, whose available packages for different operating systems can be found in [32]. It only works with Python versions 3.5 up to 3.8, and it also provides Graphical Processing Unit (GPU) support. Image classification approaches require the GPU version, due to the fact that it grants higher processing speeds compared to Central Processing Unit (CPU) based TensorFlow applications.
To be able to run the TF GPU version in a virtual environment, the Nvidia GPU drivers for the compatible CuDNN and CUDA versions need to be installed. "Compute Unified Device Architecture", also known as CUDA, is a computational platform created by NVIDIA, designed to effectively increase the computing speed by exploiting the full power of the GPUs. NVIDIA also developed a library meant to be used within deep neural networks (DNNs), namely CuDNN. TensorFlow works well with specific versions of CUDA and CuDNN, which can be found at [33].
On top of TensorFlow is built the Keras API, which provides the ease of working with DNNs. It carries over the abstraction characteristics of TensorFlow, creating a user-friendly environment for faster and easier development of neural network models. It contains a series of already built and trained deep learning models that can be found at [28] and which can further be used by developers to create their applications.
The Keras framework is included in TensorFlow 2.0 and it offers different functionalities, such as the possibility to work with datasets and to preprocess the data before the training takes place. It also encloses methods designed to create, train, update, visualize, and evaluate models.
Furthermore, TensorFlow offers models for mobile applications through the introduction of TensorFlow Lite [34]. It provides the possibility to convert a TF model into a TFLite model, so that its size is decreased while only slightly affecting its performance and accuracy metrics. In this way, TF models can be used on smartphones, tablets, or other mobile devices.
Other Python utilities used to implement this project are Pandas, NumPy, Matplotlib, and scikit-image. NumPy is a Python package meant to help with numerical scientific computations. It works with multidimensional arrays and it relies on linear algebra and other mathematical functions listed in [35].
For better management and visualization of the samples from the dataset, Pandas was used, known for its capability to manipulate and interrogate DataFrames. Its functionalities are presented in [36].
The results were shown and evaluated using the Matplotlib package, which provides a set of tools embedded into an object-oriented API meant to create interactive, static, or animated visualizations in Python.

6.1.3. Android Studio

Android Studio is an Integrated Development Environment used for developing Android applications designed to run on mobile devices. It provides performant tools whose purpose is to ease Android project management. It comes with a set of features such as a visual layout editor, which allows the developer to build the graphical interface in an easier way, an APK analyzer, and an emulator which offers the possibility to quickly visualize real-time results.
The model built in this project with Keras and TensorFlow is further converted into a TensorFlow Lite model so that it can be used for mobile projects. The classifier and its functionality are integrated into the Android environment for the creation of the final application proposed by this paper.

6.2. Implementation of the CNN classifier

Figure 13. Implementation Workflow


6.2.1. Dataset Description

The dataset used for the implementation of the proposed method is known as the "PlantVillage Dataset", referenced in [37].
The original dataset contains 54,309 images covering 14 leaf species, namely: Tomato, Strawberry, Squash, Soybean, Raspberry, Potato, Pepper Bell, Peach, Orange, Grape, Corn, Cherry, Blueberry, and Apple. The diseases covered by these species are: 4 diseases caused by bacteria, 2 viral diseases, 2 diseases caused by mold, 17 fungal infections, and one mite-based disease, together with healthy leaves for 12 species. The structure of the original dataset can be visualized in Figure 14.

Figure 14. PlantVillage Dataset

There are two variants of the PlantVillage dataset: one version contains raw images that include the background characteristics, and the second version consists of processed images that emphasize the region of interest (ROI), which is the leaf region. The segmentation process was realized based on the algorithm presented in [38]. Some examples of samples from both the original and the segmented datasets are presented in Figure 15.

Figure 15. (1.A/B) – Apple scab; (2.A/B) Blueberry healthy; (3.A/B) – Peach Bacterial Spot;
(4.A/B) – Corn Gray Leaf Spot;

The proposed method for this project consists of using two sets of data created with images from the segmented PlantVillage dataset. The first implementation is based on a dataset that includes all 38 classes of the entire image set. For further processing, this dataset is split into 3 groups, namely training data, validation data, and test data. The splitting was done based on a 6:2:2 ratio, which means that 60% of the total data was used for training, 20% for validation, and 20% for testing. The repartition of images within this dataset is shown in Figure 16.

Figure 16. Data partitioning for the first dataset

For the second dataset, only 10 classes were selected from the whole image set. These classes are representative of the tomato leaf diseases. The image set was partitioned based on the same principle as the previous dataset, the difference being that, in this case, only 150 test samples were taken for each particular class (except the class "Mosaic-virus", which contains 74 images). The unused samples from the test data were assigned to their corresponding classes in the training and validation sets. This set of images incorporates 4 fungal infections ("Early blight", "Septoria leaf-spot", "Target Leaf-Spot", "Late Blight"), one disease caused by bacteria ("Bacterial Spot"), one caused by mold ("Leaf Mold"), 2 virus-induced diseases ("Yellow curl virus", "Mosaic virus"), one mite-based infection ("Spider mites"), and healthy leaves. The structure of the image set is represented in Figure 17:

Figure 17. Data partitioning for the second dataset

For ease of working with the datasets within the Python environment, the Pandas DataFrame functionality and the OS library [39] were used. Python OS is a package that implicitly comes with any Python version and it offers an easy-to-use way to handle the project's dependencies on the operating system. The OS library allows the developer to manage data, system files, and folders directly from Python scripts, as shown in the example below:

import os
import pandas as pd

# dataset_path points to the root folder of the dataset, where each subfolder holds one class
train_data_df = pd.DataFrame(columns=['path', 'leaf', 'disease', 'label'])

df_index = 0
for folder in os.listdir(dataset_path):
    for file in os.listdir(dataset_path + '/' + folder):
        fName_split = folder.split('.')[0].split('_')
        path = dataset_path + '/' + folder + '/' + file
        leaf = fName_split[0]
        disease = fName_split[-1]
        train_data_df.loc[df_index] = [path, leaf, disease, folder]
        df_index += 1

Pandas DataFrame functionalities combined with the OS package offer the possibility to manage and visualize real-time data from specific locations on the computer. It is a proper way to keep track of the data used within the project. When working with images, an efficient way to use Pandas DataFrames is to store the path of each image file instead of storing the entire image, as shown in Figure 18.

Figure 18. Managing data using Pandas DataFrame

6.2.2. Data augmentation


After the images were organized in specific folders, a data augmentation algorithm was applied. This is done to prevent overfitting of the models and also to increase the number of samples within the dataset. The idea behind data augmentation is that, when the model is trained on augmented data, it never processes exactly the same image more than once. This aspect increases the overall performance of an image classifier and provides models with high generalization capabilities.

The data belonging to each group (training, validation, test) of the two datasets is augmented with a set of different techniques. TensorFlow provides special functions meant to realize this task. For the data augmentation part, the ImageDataGenerator class and its method flow_from_directory, referenced in [40], were used.
ImageDataGenerator is a class belonging to the Keras dataset preprocessing functionalities. It produces groups of images, referred to as batches, formatted as tensors. By tensor, we understand a generalization of matrices and vectors, which can be represented as a multi-dimensional array. The class constructor allows the adjustment of the parameters used in the augmentation process.
At first, different techniques were separately applied to the images in order to expand the set of data, so that, in the end, the effects of the transforms were combined to obtain more complex augmented data. Any sample created through an image generator and meant to be further used for training a classifier model has to be normalized, which means it has to be rescaled to values between 0 and 1, for faster processing of the data flowing through a neural network.
The images used in this dataset are based on the RGB color palette, which means they have 3 color channels, each channel having pixel values between 0 and 255. To achieve the normalization, the image generator can be given a particular scaling value that brings the image to the proper range needed for further processing.
The horizontal flip is a very often used transformation because, even if it is a simple
mathematical operation, it can bring major improvements to the model’s performance. The results
of both horizontal and vertical flip operations can be visualized in Figure 19.

Figure 19. Horizontal/Vertical Flip techniques

The images were also exposed to random rotations. The ImageDataGenerator class offers the possibility to set a degree range within which the image can suffer random rotations. The argument corresponding to the rotation range is an integer value, representing the range in degrees of the random rotation to be applied. The results obtained by applying a 45-degree range for random rotations are shown in Figure 20 below.

Figure 20. Rotation Transform

Another technique that has been used in this project for data augmentation is zoom augmentation. Based on the same principle as the rotation transformation, an object of type ImageDataGenerator allows applying a random zoom within a range which can be set as a parameter (either a float or an interval). The effect of random zooming is shown in Figure 21.

Figure 21. Zoom augmentation


A few more augmentation techniques were used in order to obtain more complex augmented data. All the operations mentioned above were combined with horizontal and vertical shift operations. Another important transformation very often used for image augmentation is referred to as shear. This has a great impact on the network's performance because, by using shear, the neural network learns to recognize and classify the given features under different viewing perspectives. This dramatically increases the capability of the model to fit new images.
Whenever such an operation is applied over an image (e.g. rotation, shifting), there are vacant pixels in some regions of the image (usually corners) that remain unfilled. Therefore, the interpolation technique is applied, and the ImageDataGenerator provides this functionality. It allows us to set the type of the fill_mode so that the image will not contain erroneous information after its processing. The class also offers the possibility to set other important parameters, like the preprocessing function, feature-wise normalization, or sample-wise normalization. The list of all the parameters of the ImageDataGenerator can be found at [40].
In the end, all the above-mentioned techniques were merged and a suitable image generator
for the desired task was created. The final result of the augmentation process is illustrated in Figure
22.

Figure 22. Augmented images


The images presented in Figure 22 were augmented using a rescale factor of 1/255, a zoom range of 20%, a rotation range of 40 degrees, and a shear range of 20%. Also, a horizontal and a vertical flip were applied, combined with a width, respectively height, shift of the image. The fill mode for the interpolation operation was chosen as constant, so that the vacant pixels are automatically filled with a given constant, which was set to 0. The images were generated with the following code sequence:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_image_generator = ImageDataGenerator(rotation_range=40,
                                           width_shift_range=0.2,
                                           height_shift_range=0.2,
                                           rescale=1./255,
                                           shear_range=0.2,
                                           zoom_range=0.2,
                                           horizontal_flip=True,
                                           fill_mode='constant')
To obtain a data generator, a particular method of the ImageDataGenerator class is used. The class offers two useful methods: flow_from_directory and flow_from_dataframe. These two methods allow the definition of an object that creates the augmented images at each call, based on the previously instantiated ImageDataGenerator object.
These methods provide the opportunity to work with images stored directly on disk, or in an indirect way through a Pandas DataFrame. When using a DataFrame, there are two possible approaches: either the directory where the images are found on disk is specified and the DataFrame contains the relative names of the files, or, if no directory is specified, the DataFrame must contain the absolute paths of the images.
In the present work, the flow_from_dataframe method was used. To augment all the available data for further processing, 6 different DataFrames were built, based on the data organization within the two datasets: a separate DataFrame was built for each of the training, validation, and test purposes. With the flow_from_dataframe method, new data was generated based on each of these DataFrames.
This method automatically labels the processed data based on the classes specified in the DataFrame. Each generated image has a target size that can be specified among the method's parameters. It also offers the possibility to select the color mode of the images to be generated, i.e. RGB or grayscale. Overall, this method creates a data flow by generating mini-batches of images (the batch size has to be specified as a parameter of the method) that are characterized by a specific class mode, depending on the classification purpose.
In this work the flow_from_dataframe method was used in order to generate a data flow
described by a target size of 224x224, a batch size of 64 and a categorical class mode. The samples
are generated randomly and they do not respect the order in which they have been read from the
DataFrame, as it is shown below:

train_data_gen = train_image_generator.flow_from_dataframe(
    dataframe=train_data_df,
    directory=None,
    x_col="path",
    y_col="label",
    weight_col=None,
    target_size=(224, 224),
    class_mode="categorical",
    batch_size=64,
    shuffle=True)

6.2.3. Transfer learning


Now that the data is ready to be processed, the next step was to create the model to be trained. To achieve this, the chosen solution was the transfer learning technique based on the VGG16 Keras application.
VGG16 was initially created for the ImageNet dataset by A. Zisserman and K. Simonyan, referenced in [41]. This model was designed to classify large-scale images by using a 23-layer deep convolutional neural network. It is offered by the Keras applications and it comes with weights already pre-trained on the ImageNet dataset. This dataset contains millions of images, and the model classifies them into 1000 classes.
Its predefined input size is 224x224, but it offers the possibility to feed images of other sizes; the minimum width and height for the input images is 32x32. It is composed of 13 convolutional layers, 5 pooling layers, and 3 top fully connected layers.
Keras offers the possibility to import this model and to use it as it is, or to fine-tune it as desired. There are different versions of the VGG16 model that can be imported into personal projects, depending on the parameters set when the model is downloaded.
In this project, the following setup was used in order to import the pretrained VGG16 model:
base_model = tf.keras.applications.VGG16(
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=(img_height, img_width, 3),
    pooling='Max')
base_model.trainable = False

The purpose of this work is to retrain only the top layers of the VGG16 model, a technique called fine-tuning. When the model was initially imported, all its weights were set to non-trainable; afterwards, the model was prepared for this project.
The architecture of the VGG16 model used in this work is shown in Figure 23:

Figure 23. VGG16 Architecture

All the layers within the model are non-trainable (0 trainable parameters, as illustrated in Figure 23). In order for the model to match the two datasets described above, two new Sequential models were built over the VGG16 base model, using the following code sequence:
first_model = Sequential()
for layer in base_model.layers:
    first_model.add(layer)
first_model.set_weights(base_model.get_weights())

Keras provides the ease of building models through the Sequential class, which delivers models as a stack of layers that can be individually edited to obtain the desired results.
The last layer of the base model was removed, since the number of nodes within this layer does not correspond with the number of classes in the datasets. Therefore, after the last top layer was removed, a new Dense layer was added to each model, one having 38 nodes, as required for the first dataset, and the other 10 nodes, for the second set of data. This was achieved using the ".pop" and ".add" methods provided by the Sequential class, as shown below:

first_model.pop()
first_model.add(Dense(38, activation = 'softmax'))
second_model.pop()
second_model.add(Dense(10, activation = 'softmax'))

The Keras Sequential models provide the possibility of using different types of layers. In this project, a limited set of sequential layers was used, namely InputLayer, Conv2D, MaxPooling2D, Flatten, and Dense.
The InputLayer is the first layer within a Sequential model and it specifies the dimensions of the input tensors. It is followed by a Conv2D layer, which is a 2-dimensional convolutional layer that computes the convolution between a kernel and the layer input, providing in this way the input for the next MaxPooling2D layer.
The Flatten layer is used to reshape the tensor provided by the last MaxPooling2D layer, so that the new shape is reduced to the number of elements within the tensor, ignoring the batch size. For the last layer, the Softmax activation function is used, which is the most suitable for this work.
The Conv2D layer offers the possibility to set the number of filters used within the layer, the size of the filter, the stride of the convolution, and the padding type, through its parameters. In the architecture presented above, the padding used for all the convolutional layers is "Same" padding and the stride of the convolution is set to 1. The detailed structure of the layers within the model is presented in Figure 24:

Figure 24. Layers parameters configuration

The number of filters is 64 for the first two convolutional layers, 128 for the following two, 256 for the next three, and 512 for the rest of them.
The method proposed in this work implies retraining only the first fully connected layer of the VGG16 model. This layer contains the largest number of trainable parameters and its role is only to classify based on the features extracted within the previous convolutional layers. The Sequential class properties allow updating the trainable attribute of each particular layer by using the layer.trainable property, where the variable layer is a particular layer from the architecture. The model has a total of 21 blocks, as shown in Figure 25; therefore, to set the first fully connected layer of each model as trainable, the following code sequence was used:

first_model.layers[19].trainable=True
second_model.layers[19].trainable=True

Figure 25. Layer trainable property

6.2.4. Training
The model was compiled using the compile method offered by the Keras Sequential model. After a few training attempts with different optimizers and learning rates, a small learning rate was chosen for the final model, i.e. 0.0001, which works properly with small batch sizes, in this case 64. For the optimizer, the Adam optimizer offered by Keras was a suitable option.
In order to compile the model, besides the optimizer, a loss function is also needed. The Keras API provides a series of loss functions that can be found at [42]. The appropriate loss for this project was CategoricalCrossentropy, given that the expected output consists of more than two classes.
This loss function relies on the cross-entropy between the true labels and the predicted ones. The labels are provided in the "one-hot" format, which means the categorical labels are converted into binary vectors with a single 1 on the position of the true class.
The metric used when compiling the model was the Accuracy metric. Keras provides a series of different metrics supported at compilation time. The Accuracy class relies on counting how many times a predicted label is the same as the true label, i.e. it computes the rate of equality between the predicted labels and the actual class labels.
The model was compiled using the following code sequence:

first_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
                    loss=tf.keras.losses.CategoricalCrossentropy(from_logits=False),
                    metrics=['accuracy'])

The from_logits parameter of the CategoricalCrossentropy class is used so that the loss function "knows" that a Softmax activation function was applied on the output layer of the model and that the predicted values are already normalized and represent a probability distribution.

The hyperparameters used to train the models are presented in Figure 26:

Figure 26. Training parameters

In order to train the model, the fit_generator method provided by Keras within the Sequential class was used, as shown below:

history = new_model.fit_generator(
    train_data_gen,
    steps_per_epoch=total_train_images // 64,
    epochs=200,
    validation_data=validation_data_gen,
    validation_steps=total_val_images // 64)

The fit_generator method provides the possibility to train a model based on an ImageDataGenerator object. Both the training and the validation data generators on which the training of the model is based have to be specified. The method also needs the number of training epochs to be passed among its parameters, as well as the number of steps to be executed when processing the training set, respectively the validation set.
As described in the "5.3. Learning algorithm" section, the input data is split into small chunks of data called batches. These batches are processed one by one until the entire dataset has passed through the network. The complete delivery of the whole set of data is referred to as an epoch. The number of training epochs was set to 200, since the learning rate of the model was chosen to be a small value, i.e. 0.0001.

6.2.5. Performance Evaluation. TensorFlow Lite conversion


The trained model can be saved to disk by using the tf.saved_model.save function provided by the TensorFlow API. Once the model is saved, it can be loaded back from disk using the corresponding tf.saved_model.load method.
In order to make predictions, a new image generator was created based on the test dataset. The method used is predict_generator, provided by the Keras API, as in the code snippet below:

predictions = first_model.predict_generator(test_data_gen,
steps = total_val_images // batch_size+1)
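Since predict_generator returns one probability per class for each sample, the class indices used below as y_predictions can be obtained, for example, by taking the index of the maximum probability (a hedged sketch, as this step is not shown in the listing above):

import numpy as np

# convert per-class probabilities into predicted class indices
y_predictions = np.argmax(predictions, axis=1)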

The performance of the model was evaluated by visualizing the confusion matrix built
from the predictions previously made and the true classes of the test images. To construct the
confusion matrix, the sklearn.metrics.confusion_matrix method was used:

c=confusion_matrix(test_data_gen.classes, y_predictions)

For the model to be further embedded in the Android application, it has to be converted into
a TensorFlow Lite model. A TensorFlow Lite model is designed to be used on embedded and
mobile devices. It is the lightweight adaptation of a normal TensorFlow model, and it runs with low
latency and faster speeds on smartphones.
In this work, the conversion to TF Lite was made through the following code sequence:

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
tflite_model = converter.convert()

with open(target_path, 'wb') as f:
    f.write(tflite_model)

The TensorFlow API provides the TFLiteConverter class to perform the conversion. This class
can convert a model depending on its original format; therefore, the from_saved_model method
was used, which matches the tf.saved_model.save method previously used to save the model on
disk. The conversion process provided a TensorFlow Lite model, which was further written into a
file with the extension .tflite. The resulting model is further used to develop the Android project.
Moreover, to achieve this, a text file containing the class names is needed.
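The class-name file can be produced directly from the training generator, as in the appendix listing (the output path there points to the project's model folder; it is abbreviated here):

# Write the sorted class names, one per line, for the Android application.
labels = '\n'.join(sorted(train_data_gen.class_indices.keys()))
with open('labels.txt', 'w') as f:
    f.write(labels)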

6.3. Implementation of the Smartphone Application

Figure 27. Application's workflow


Figure 27 illustrates the working principle of the Android application proposed by
this work. For the application to run on the mobile device, some permissions must be granted. If
the user grants the necessary permissions, a new activity is created, namely ChooseModel. This
activity allows the user to load the model, capture an image, crop it, and send the needed information
to the next activity, which is Classify. Here the image is resized to match the model’s input and is
further classified. From this activity, the user can go back to the ChooseModel activity to capture
another image, or there is the possibility to close the application.

6.3.1. Import the TFLite model


In order to achieve image classification on Android devices, the previously created
TensorFlow Lite model must be imported into the mobile application. To do this, a new folder
called assets was created within the app\src\main\ location.
This folder contains the resource files that are accessed by the activities within the
application to perform the desired tasks. The TFLite models and the text files containing their
specific class names were placed in the assets folder, as shown in Figure 28:

Figure 28. Placement of the TFLite models

6.3.2. Android Manifest


For the proper workflow of the application, it is necessary to work with the device’s camera,
so the application must request the user’s permission to use the camera’s functionalities. The image
captured using the camera activity also has to be saved onto the device’s storage for further
processing, because the captured images contain a large quantity of information and it is not
recommended to send them directly from one activity to another in raw format.
Therefore, the AndroidManifest.xml file specifies that the application is going to use
the camera and the device’s storage options. These features and permissions have to be declared
so that the permissions can be granted, using the following code sequence:

<uses-feature
android:name="android.hardware.camera"
android:required="true" />
<uses-feature
android:name="android.hardware.camera.any"
android:required="true" />

<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />


<uses-permission android:name="android.permission.CAMERA" />

In the same manifest file, the activity for the cropping process has to be declared (the
corresponding <activity> element can be seen in the Manifest.xml listing in the Appendix). The
cropping activity is required after a picture is captured, to select a square region from the original
image; this is needed for further processing because the TFLite classifier works with square-sized
images.
For the cropping activity to work properly and for the models to run within the application,
new dependencies were added to the build.gradle file, as follows:

implementation 'com.soundcloud.android:android-crop:1.0.1@aar'
implementation 'org.tensorflow:tensorflow-lite:+'

Moreover, in the same build.gradle file, the noCompress aaptOption was specified for
the TFLite models from the assets folder, so that they can be used exactly as they were imported
and their performance is not affected.

6.3.3. Choose Model Activity


The ChooseModel activity is the first one created when the application starts running.
It consists of two buttons which allow the user to choose which model is going to be used for
classification. To create the design of these buttons, a new choose_model_activity.xml file was
created in the layout resource folder.
The onCreate method was configured so that the application sends the permission request
message when it starts. If the permissions are granted, the activity keeps running; if the permissions
are denied, the application automatically closes and displays a popup message informing the user
that it cannot run without the requested permissions.
Also within the onCreate method, the setOnClickListener functionality was configured
for the two buttons previously created, as shown below:

modelFloat.setOnClickListener(new View.OnClickListener() {
@Override
public void onClick(View view) {
chosen = "model.tflite";
openCameraIntent();
}});

As illustrated in the example above, the buttons were configured so that, when they are
clicked, the corresponding model is selected and the openCameraIntent method is called.
Within this function, the application asks the system to open the camera and a new
imageUri variable is created in order to reference the captured image within the storage medium.
This was achieved through MediaStore.Images.Media.EXTERNAL_CONTENT_URI, as
illustrated below:

imageUri = getContentResolver().insert(
MediaStore.Images.Media.EXTERNAL_CONTENT_URI, values);

Therefore, a new intent was created and associated with the previously created URI
reference, in order to be further used to request the system to open the camera.
The onActivityResult method was implemented so that, depending on the requestCode
received after the picture was captured within the camera activity, the image is further sent to
the next activity, which crops the image. To be able to start a new activity after the crop is
performed, a new intent is created. The generated intent contains the information about the
reference URI and the chosen model. This is sent to the next activity when it starts, and the current
activity then ends, as illustrated in the code sequence below:

Intent i = new Intent(ChooseModel.this, Classify.class);
i.putExtra("uri_ID", imageUri);
i.putExtra("chosen", chosen);
startActivity(i);
finish();

6.3.4. Classify Activity

The intent previously created in the ChooseModel activity is further sent to the
ClassifyActivity, containing the URI associated with the captured image and the name of the chosen
model. This information is needed for the ClassifyActivity to read the image from the device’s
storage to classify it using the TFLite model.
The front end of this activity consists of a few layout elements, such as an ImageView, two
buttons, and a group of LinearLayouts and TextViews. The ImageView was set to display the actual
image read from the external storage, after the resizing procedure. The two buttons were designed
to display the classification results for the given image and to return to the previous activity,
respectively. The group of TextViews and LinearLayouts was placed within the activity layout so
that it displays the results returned by the model, namely the labels and their corresponding
confidences.
The Java code for this activity is more complex than the back-end part of the initial
activity. The working principle of this activity consists of loading the label list and the model from
the assets directory, resizing the image, and feeding it as input to the model. Furthermore, it runs
the prediction on the given image and finally displays the results on the screen.
In order to achieve this, the chosen property was extracted in the onCreate method from the
intent received from the previous activity, as follows:

chosen = (String) getIntent().getStringExtra("chosen");

Within the onCreate method, a TFLite Interpreter was initialized, whose purpose is to load
the TFLite model from the assets directory and configure it with the default tfliteOptions. Together
with the model, the text file containing the class-name list corresponding to the classifier is also
loaded. The Interpreter initialization and the label loading process are illustrated in the next code
sequence:

tflite = new Interpreter(loadModelFile(), tfliteOptions);
labelList = loadLabelList();

The loadModelFile method uses the chosen extra received from the previous activity to
open the model file from the assets. The content of the file is read as a buffer of bytes and returned
for further processing. On the same principle, the loadLabelList method was built, which reads,
byte-wise, the content of labels.txt, also found in the assets folder.
Furthermore, a byte buffer was initialized in order to store the image data that is to be fed to
the model. In TFLite, two types of models can be used, namely float or quantized. The
from_saved_model method used to convert the TF model into a TFLite one provides a float model,
meaning that 4 bytes are used to store each of the model’s weights. Therefore, to manage the image
data for the model, the byte buffer was allocated with 4 times the number of pixel values of the
image, as follows:

imgData = ByteBuffer.allocateDirect(4 * DIM_IMG_SIZE_X * DIM_IMG_SIZE_Y * DIM_PIXEL_SIZE);
labelProbArray = new float[1][labelList.size()];

A probability array was also used to store the output predictions of the model, which are
likewise in float format.
Once the model and the labels are loaded, the image provided by the initial activity is also
read from the storage with the help of its corresponding URI. After it is loaded, it is placed within
the ImageView layout element of the activity, using the getBitmap function offered by the
MediaStore API:

Bitmap img_bitmap=MediaStore.Images.Media.getBitmap(getContentResolver(),
uri_image);
selected_image = (ImageView) findViewById(R.id.selected_image);
selected_image.setImageBitmap(img_bitmap);

As mentioned above, this activity also contains two buttons, namely Classify and Back.
Within the onClick listener of the Back button, a new intent was created to be sent
to the next activity, and the startActivity() function was called to return to the initial
ChooseModel activity screen.
The functionality of the Classify button was configured so that the image is read from the
ImageView display element and then resized to match the model’s input dimensions. The resizing
process simply consists of scaling the image until the (224, 224, 3) dimensions are achieved. Once
the image has the proper size, it is converted into a byte buffer, the format needed for further
processing.
The conversion to the byte buffer is meant to turn the image into an array of RGB values.
The algorithm implies passing through each pixel of the image and converting it into RGB values
using a bit manipulation technique. In order to do this, predefined values were used for the image
mean and image standard deviation. The normalization values were taken from the TensorFlow
GitHub repository, referenced in [43]. The normalization process is illustrated in the code sequence
below:

private static final int IMAGE_MEAN = 128;
private static final float IMAGE_STD = 128.0f;

int pixel = 0;
for (int i = 0; i < DIM_IMG_SIZE_X; ++i) {
    for (int j = 0; j < DIM_IMG_SIZE_Y; ++j) {
        final int val = intValues[pixel++];
        // extract the R, G and B channels and normalize them using IMAGE_MEAN and IMAGE_STD
        imgData.putFloat((((val >> 16) & 0xFF) - IMAGE_MEAN) / IMAGE_STD);
        imgData.putFloat((((val >> 8) & 0xFF) - IMAGE_MEAN) / IMAGE_STD);
        imgData.putFloat(((val & 0xFF) - IMAGE_MEAN) / IMAGE_STD);
    }
}

When the button is pressed, the model is executed and the output provided by the classifier is
placed within the previously created labelProbArray probability array. This is done using the
run method offered by the TensorFlow Lite dependency within the Android environment, as shown
in the following code snippet.

tflite.run(imgData, labelProbArray);

In order to display the results, a PriorityQueue was used to store the three highest
confidences, which are displayed as the result after the classification is performed. The
values within the priority queue are placed within the TextView layout elements.
The group of LinearLayouts and TextViews is set to be invisible when the activity
starts and becomes visible the moment the Classify button is activated.
After the prediction was done, the onClick method of the Back button allows the user to
navigate back to the initial activity.

7. Experimental results

This section presents the experimental results obtained during the
implementation stage of the project. The results consist of the architectures of the two realized
models and the evaluation of their performance, followed by the presentation of the resulting
Android application and its functionality, which is the final objective of the current work.

7.1. CNN models


The final architectures of the implemented classifiers consist of two Sequential models
made of 23 layers. The corresponding structures follow the working principle of a VGG16
network, containing 13 convolutional layers, 5 max-pooling layers, and 3 fully connected layers.
The last fully connected layer is composed of 38 nodes (10 for the tomato leaves model), over
which the Softmax activation function is applied. This layer provides the output as a probability
distribution, meaning that the sum of all the output values is going to be 1. The complete
architecture of the two models is illustrated in Figure 29:

Figure 29. The architecture of the implemented models


As can be seen in Figure 29, both models have a large number of trainable parameters.
There is a slight difference of 114,716 parameters, since the tomato leaf disease classifier has fewer
nodes within its output layer, i.e. 10 nodes (10 output classes). Therefore, the classifier based on
tomato leaf diseases has 114,716 fewer trainable parameters compared to the second model, which
consists of 38 nodes within its classification layer. The disparity is quite small, but together with
the significant difference in the total number of samples on which the models were trained, it led
to a noticeable difference in the execution speed of the training process.
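The 114,716 figure can be checked directly, assuming the layer feeding the output has 4096 units (the second fully connected layer of the standard VGG16): each additional output node adds 4096 weights plus one bias.

extra_nodes = 38 - 10                 # extra output nodes of the assorted leaves model
params_per_node = 4096 + 1            # weights from the 4096-unit layer plus one bias
print(extra_nodes * params_per_node)  # 114716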
Except for the output layer, both models were configured using the same hyperparameters,
which are illustrated in Figure 26 from section 6.2.4. Training above. The total number of images
on which the classifiers were trained, as well as the execution time difference, can be seen in Figure 30
below:

Figure 30. Differences between the two models

7.1.1. Assorted leaves model with 38 classes

Training the model on all the classes within the dataset, i.e. 38 distinct categories, leads to
a resulting accuracy of 80.87% and a loss of 2.874. The evolution of these two metrics after
training the model over 200 epochs is illustrated in Figure 31:

Figure 31. Assorted leaves model accuracy and loss

As shown in the figure above, the model’s accuracy increases with a steep slope during
the first iterations of the training process, after which the slope decreases as the number of
trained epochs grows.
The motivation for this is the fact that, when the model starts training, the weights are
initially randomly generated, meaning that the loss is going to have high values. The weights are
further updated during each epoch with a small proportion of the error (the error gradient associated
with a specific weight is multiplied by the learning rate factor, as described in the “5.3.2. Learning
Rate” section). If the loss is high, then the product between the learning rate and the error term will
also be high, implying that the value used to update the weight is significant; thus a steeply
decreasing slope of the loss function is noticeable, concurring with a high increase of accuracy from
one iteration to another.
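A minimal sketch of the update rule referred to above (plain gradient descent rather than the exact Adam update used in the project): early in training, a large error gradient combined with the 0.0001 learning rate still produces a comparatively large step.

learning_rate = 0.0001
gradient = 50.0                      # illustrative gradient value early in training
weight = 0.3                         # illustrative weight value
weight -= learning_rate * gradient   # update step of 0.005
print(round(weight, 4))              # 0.295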
The graphical representation also demonstrates that the model is not overfitted, since the
values of the training accuracy and the validation accuracy closely follow each other.
This aspect denotes that the model is capable of generalizing the features learned from the training
set and applying them to a new set of data, i.e. the validation dataset.
In order to evaluate the model’s performance when predicting a new set of data, it was used
as an image generator (as the one presented in “6.2.2. Data augmentation”) which processes the
data from the Test directory. The test dataset is described in “6.2.1.Dataset Description” section.
In Figure 32 is presented the confusion matrix associated with the predictions over the test
data:

Figure 32. Assorted leaves model - Confusion matrix

This matrix shows exactly how many images were correctly predicted for each given class
and, in the case of misclassifications, the number of samples that were classified as belonging to a
different category than their true class.
Each row of the matrix represents a true class, while each column is associated with a
predicted label; the value at the intersection of row X and column Y counts the images whose true
class is X that were predicted as belonging to class Y, so the off-diagonal entries correspond to
misclassifications. If the model has good performance, meaning that the images are correctly
predicted as belonging to their true class, then the main diagonal of the matrix will be strongly
populated with the corresponding values for each class (i.e. it would contain the total number of
samples within each class for an accuracy of 100%).
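A toy three-class example (not from the project data) of how such a matrix is read: rows are true classes, columns are predicted classes, and the diagonal holds the correct predictions.

from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 2, 1, 1, 2, 2]
print(confusion_matrix(y_true, y_pred))
# [[1 0 1]   <- one class-0 image was misclassified as class 2
#  [0 2 0]
#  [0 0 2]]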
For example, in the figure above, for the Apple_Apple-scab class, a number of 107 images
were correctly predicted, 2 images were predicted as being Apple_Black-rot, 1 sample was
misclassified as Apple_Cedar-apple-rust, 7 images were erroneously predicted as Apple_healthy
and so on.
An easier representation of the most significant data which can be read from the confusion
matrix is illustrated in Figure 33, below:

Figure 33. Confusion matrix interpretation - Assorted leaves model

The figure above presents the confusion matrix in an easily understandable way. The second
column, i.e. ActualValues, shows how many images from each class are found in the test dataset.
The number of correctly predicted samples corresponding to a specific class is shown in the next
column, Correct predicted values.
The next column, i.e. Total predicted values, shows how many images were predicted as
belonging to a specific class. For example, for the Apple_Apple-scab class, from a total of 126
images within the dataset, only 107 were correctly classified, but a total of 113 images were
predicted as belonging to this class. This means that 6 other images were misclassified as
Apple_Apple-scab.
The last column represents the percentage of images from a specific class that were
misclassified. Relying on the same Apple_Apple-scab example, from the total of 126 samples,
107 were correctly classified, meaning that 19 images from this class were misclassified,
representing 15.08% of the total number of test samples contained within the Apple_Apple-scab
class.
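This percentage is computed as in the appendix listing, from the per-class totals of the confusion matrix; with the figures quoted above:

actual_values = 126
correct_predictions = 107
misclassification_percentage = (actual_values - correct_predictions) / actual_values
print("{0:.2f}%".format(misclassification_percentage * 100))   # 15.08%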

7.1.2. Tomato leaves model with 10 classes


This subsection presents the experimental results obtained for the second model,
used to classify the tomato leaves.
Training the model on a part of the entire dataset, i.e. 10 distinct categories, leads to a
resulting accuracy of 87.84% and a loss of 1.5867. The evolution of the model’s characteristics
after training it over 150 epochs is illustrated in Figure 34, below:

Figure 34. Tomato leaves model - Accuracy and loss

As shown in the figure above, after approximately 150 epochs the model starts to
overfit. By training the model for only 150 epochs, a difference of 0.0344 was obtained between
the accuracy on the training set and the validation accuracy. This value of roughly 3% is relatively
small, but, as can be seen in the two graphs, if the model is trained over more than 150 epochs,
the difference in accuracy slowly increases, leading to the onset of the overfitting phenomenon.
A solution for this would be to add a Dropout layer after the first fully connected layer, with the
dropout value chosen through a trial and error process.
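One possible realization of this suggestion (a sketch under the assumption that the classifier head is rebuilt layer by layer; it is not part of the implemented model, and the 0.5 rate is only a starting point for the trial-and-error tuning mentioned above):

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout

head = Sequential([
    Dense(4096, activation='relu', input_shape=(25088,)),  # first FC layer (VGG16 flatten size)
    Dropout(0.5),                                          # randomly disables half of the units during training
    Dense(4096, activation='relu'),
    Dense(10, activation='softmax')                        # 10 tomato leaf classes
])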
The dataset used for the performance evaluation of the classifier, by predicting a new set of
test images, was described in the subsection “6.2.1. Dataset Description”. It mainly consists of 150
samples for each class, except for Tomato_Mosaic-virus. The confusion matrix corresponding
to the model’s predictions is presented in Figure 35.

Figure 35. Tomato leaves model - Confusion matrix

In this case, the matrix is limited to fewer rows and columns because the test dataset contains
only 10 classes. In the figure above it can be observed that there is a problematic class, i.e.
Tomato_early-blight, which has a considerable impact on the accuracy of the model. To better
visualize the information offered by the confusion matrix, the same approach was used as for the
other model, namely the representation in Figure 36 below:

Figure 36. Confusion matrix interpretation - Tomato leaves model

As can be seen in the previous figure, 46% of the images within the Tomato_Early-blight
class were erroneously predicted, and most of the misclassifications are associated with the
Tomato_Late-blight class. Even if the model has an overall better accuracy and loss, a relatively
high percentage of misclassifications can be noticed for most of the classes within the dataset. A
solution for this is discussed in the “8. Conclusions” section.

7.2. Android application


As illustrated in Figure 27 from section 6.3, the application mainly consists of four
activities: a first activity, namely ChooseModel, where the user can choose which model to use to
classify the desired type of leaf, and a second one, called Classify, where the output of the classifier
is displayed.
Besides these main activities, two more actions are performed: taking a
picture using the device’s camera and cropping the acquired image to match the classifier’s input.
Relationships were established between the activities so that the main workflow of the
application that the user has to follow is intuitive, as shown in Figure 37:

Figure 37. Application's working principle

For the application to be able to run on the smartphone, the user first needs to grant the
required permissions, i.e. the permission to use the device’s camera functions and to store the
picture within the device’s storage, as shown in Figure 38:

Figure 38. Permission requirements

Once the permissions are granted, ChooseModel is the first activity that appears on
the screen. Its main elements are a TextView and two buttons. These buttons allow
the user to choose which model is going to be used for further processing. The layout of this activity
is illustrated in Figure 39.
The two buttons presented in Figure 39 were created with rectangular shapes, with a
50dp radius applied to each corner and a linear color gradient applied in the vertical
direction.

Figure 39. ChooseModel Activity

When a specific button is pressed, its corresponding model is loaded into the application
from the assets folder to be used within the next activity.
Furthermore, the camera activity starts running to capture the image containing the leaf
desired to be classified. After the image is captured, the next action that needs to be executed is
cropping the picture so that it becomes a square image. These two processes are required for a
better overall performance of the application since the classifier model accepts as input only square
images.
The CameraActivity and the CropActivity screens are illustrated in Figure 40 below:

Figure 40. Camera and cropping activities


The cropping has to be done manually by the user, by moving and resizing the square
selection tool shown on the screen. In this way, the region of interest, i.e. the part of the
image containing the leaf, is further sent to the next activity, while the rest of the image is discarded.
This action is necessary in order to get rid of non-useful information which can harm the
classification process.
The useful information, i.e. the leaf image, is then received by the last activity, namely
ClassifyActivity. Within this activity, the image undergoes a resizing operation through which its
sizes are adjusted to match the size of the input layer of the CNN model used as a classifier.
The resized image is then placed within an ImageView so that the user can visualize it and check
whether it was properly captured.
If the cropped image does not contain the desired information, the user can go back to the
first activity by hitting the Back button and capture a new image. Once the picture contains
the desired details, it can be classified within the Classify activity through the functionality offered
by the Classify button.
The two buttons, namely Back and Classify, were customized using the same layout as
the buttons from the initial activity, the difference being that the linear color gradient is applied at
a 45-degree angle, as shown in the following figure.
The graphical user interface of the Classify activity is illustrated in Figure 41:

Figure 41. Classify Activity Screen

The results are set to become visible on the screen within the onClickListener associated with
the Classify button. For displaying the results, a set of LinearLayouts and TextViews was used
(Figure 42), which holds the classification results once the TFLite model is called to predict.

Figure 42. Classification results holder


The container illustrated in Figure 42 shows up when the onClickListener of the
Classify button is activated. Therefore, the final layout of the current activity also contains the
top 3 most appropriate classes for the image fed into the classifier, as shown in Figure 43 below:

Figure 43. Classify Activity

The figure above shows the result of a classification performed using the
assorted leaves model. Figure 44 shows results from both models, using images
from the test dataset directory captured with the phone’s camera.

Figure 44. a) Correct classification; b) Misclassification; c) Correct classification

8. Conclusions

The objectives of this project were the theoretical study of deep learning-based image
classification methods and the implementation and design of an Android application for leaf disease
classification in precision agriculture.
For the final version of the application, two CNN classifiers were used: one which provides
labeling for 38 different classes of assorted leaves and diseases, and another that focuses on a
specific type of leaf with its associated infections, i.e. tomato leaves. The two models are embedded
into an application designed for Android smartphones, with practical applicability in the precision
agriculture domain.
The classifiers were built using the transfer learning technique based on the VGG16
CNN provided by Keras. The VGG16 model was adjusted to match the objective of this work; thus
it was fine-tuned and retrained on the PlantVillage dataset. The model optimization process relied
on a trial and error approach.
The same architecture was used to train both classifiers. The model trained on the entire
dataset achieved an accuracy of 80.87% during a training process of 200 epochs, while the model
focused on the tomato leaves reached an accuracy of 87.84% in just 150 epochs. It is noticeable
that the second model obtained a better accuracy even though it was trained for a smaller number of
epochs. This is explained by the smaller number of classes that the second model is
designed to classify. Thus, the overall performance of a CNN model is strongly affected by the
number of distinct categories used for training.
The resulting models are not overfitted due to the appropriate selection of the number of
training epochs and also due to the proper data partitioning and data augmentation. Choosing the
proper number of training epochs for each network was done through trial and error. During the
experiments, it was observed that if the number of epochs is larger than the specified values for
each classifier, then the model slowly starts to overfit.
It should be noted that there are problematic classes within the dataset (Figure 33 and Figure 36
from the 7. Experimental results section). These so-called problematic classes are characterized by
the fact that none of the images from the given classes were correctly classified. Examples
of such classes are Corn(maize)_Gray-Leaf-Spot and six other categories. These classes affect
the overall performance of the model, thus, in order to improve the results, the simplest solution
would be to remove the respective categories from the dataset. However, this cannot be done in the
case when these classes represent important categories within the dataset. Therefore, to improve
the model’s performance, a series of supplementary preprocessing operations are required to
manually select the region of interest within the image, i.e. extracting only features focused on the
disease spots targeted to be classified.
A disadvantage of using the VGG16 model for Android applications is its size on the
device’s storage. Each of the two models has approximately 138 million parameters, leading to
a resulting application of 1.08 GB.
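This order of magnitude can be checked from the model format: assuming 4 bytes per parameter (float, non-quantized models, as discussed in section 6.3.4), two models of about 138 million parameters each occupy roughly the reported size:

params_per_model = 138_000_000
bytes_per_param = 4                              # float32 weights
total_gb = 2 * params_per_model * bytes_per_param / 1e9
print(round(total_gb, 2))                        # ≈ 1.1 GB, consistent with the reported 1.08 GB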
One of the problems encountered during the implementation stage of the project was
represented by the hardware requirements. TF offers support for CPU, GPU, and TPU. Training
such models on the CPU takes a very long time, therefore the models were built and trained using
the GPU version of TF. However, even when running on the GPU, the models still required a
long processing time. A barrier imposed by TF GPU in this project was the limitation of the
graphics card: when training the models, TF uses the memory of the GPU, therefore running
more complex architectures requires more GPU memory.
For future work, the performance of the models could be increased by more precise data
preprocessing and by adjusting the depth of the networks, as well as the number of parameters
within the last fully connected layers, to decrease the application’s size. In addition, a client-server
connection could be realized by linking the application with a database that stores all the newly
captured images (properly classified) and updating the models periodically.
9. References

[1] A. Chlingaryan, S. Sukkarieh and B. Whelan, "Machine learning approaches for
crop yield prediction and nitrogen status estimation in precision agriculture: A review,"
Computers and Electronics in Agriculture, pp. 61-69, 2018.
[2] Chun-Chieh Yang, Shiv O Prasher, Peter Enright, Chandra Madramootoo,
Magdalena Burgess, Pradeep K Goel and Ian Callum, "Application of decision tree
technology for image classification using remote," Agricultural Systems, vol. 76, no. 3, pp.
1101-1117, 2003.
[3] M. Ebrahimi, M. Khoshtaghaza, S. Minaei and B. Jamshidi, "Vision-based pest
detection based on SVM classification method," Comput Electron Agric, vol. 137, pp. 52-58,
2017.
[4] G. Geetharamani and J. Arun Pandian, "Identification of plant leaf diseases using a
nine-layer deep convolutional neural network," Computers & Electrical Engineering, vol.
76, pp. 323-338, 2019.
[5] S. P. Mohanty, D. P. Hughes and M. Salathé, "Using Deep Learning for Image-
Based Plant Disease Detection," Frontiers in Plant Science, vol. 7, p. 1419, 2016.
[6] F. Mohameth, C. Bingcai and K. Sada, "Plant Disease Detection with Deep
Learning and Feature Extraction Using Plant Village," Journal of Computer and
Communications, vol. 8, pp. 10-22, 2020.
[7] M. Islam, A. Dinh, K. Wahid and P. Bhowmik, "Detection of potato diseases using
image segmentation and multiclass support vector machine," in 2017 IEEE 30th Canadian
Conference on Electrical and Computer Engineering (CCECE), Windsor, 2017, pp. 1-4.
[8] "Towards Data Science," [Online]. Available:
https://towardsdatascience.com/everything-you-ever-wanted-to-know-about-computer-
vision-heres-a-look-why-it-s-so-awesome-e8a58dfb641e. [Accessed 2020].
[9] J. Walsh, N. O' Mahony, S. Campbell, A. Carvalho, L. Krpalkova, G. Velasco-
Hernandez, S. Harapanahalli and D. Riordan, "Deep Learning vs. Traditional Computer
Vision," in Computer Vision Conference (CVC) 2019, Las Vegas, Nevada, United States,
2019.
[10] R. M. Kumar and K. Sreekumar, "A survey on image feature descriptors," Int J
Comput Sci Inf Technol 5, pp. 7668-7673, 2014.
[11] M. Pasquinelli, "Machines that morph logic: neural networks and the distorted
automation of intelligence as statistical inference," Glass Bead Journal, no. Oct, 2017,
2017.
[12] J. Schmidhuber, "Deep learning in neural networks: An overview," Neural
Networks, vol. 61, pp. 85-117, 2015.
[13] "Machine Learning Mastery," [Online]. Available:
https://machinelearningmastery.com/a-tour-of-machine-learning-algorithms/. [Accessed
2020].
[14] "SearchEnterpriseAI - TechTarget," [Online]. Available:
https://searchenterpriseai.techtarget.com/definition/deep-learning-deep-neural-network.
[Accessed 2020].

[15] "Towards Data Science," [Online]. Available:
https://towardsdatascience.com/simply-deep-learning-an-effortless-introduction-
45591a1c4abb. [Accessed 2020].
[16] R. L. Welch, S. M. Ruffing and G. K. Venayagamoorthy, "Comparison of
feedforward and feedback neural network architectures for short term wind speed
prediction," International Joint Conference on Neural Networks, no. 14-19 June, pp. 3335-
3340, 2009.
[17] C. Nwankpa, W. Ijomah, A. Gachagan and S. Marshall, "Activation functions:
Comparison of trends in practice and research for deep learning," arXiv preprint
arXiv:1811.03378, 2018.
[18] P. Ramachandran, B. Zoph and Q. V. Le, "Searching for activation functions,"
arXiv preprint arXiv:1710.05941, 2017.
[19] G. Bouchard, "Efficient bounds for the softmax function, applications to inference
in hybrid models," in Presentation at the Workshop for Approximate Bayesian Inference in
Continuous/Hybrid Systems at NIPS-07, Citeseer, 2007.
[20] C. Hansen, "Machine Learning From Scratch," [Online]. Available:
https://mlfromscratch.com/neural-networks-explained/#cost-function.
[21] "Deep Lizard," [Online]. Available: https://deeplizard.com/learn/video/jWT-
AX9677k. [Accessed 2020].
[22] "ML Glossary," [Online]. Available: https://ml-
cheatsheet.readthedocs.io/en/latest/gradient_descent.html. [Accessed 2020].
[23] "Towards Data Science," [Online]. Available: https://towardsdatascience.com/a-
comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53.
[Accessed 2020].
[24] C. J. a. C. Burges and E. a. G. Hans Peter, "Method of image enhancement using
convolution kernels," Google Patents, 1997.
[25] "Deep lizard," [Online]. Available:
https://deeplizard.com/learn/video/0h8lAm5Ki5g. [Accessed 2020].
[26] Tensorflow. [Online]. Available:
https://www.tensorflow.org/tutorials/images/classification. [Accessed 2020].
[27] "Nanonets," [Online]. Available: https://nanonets.com/blog/data-augmentation-
how-to-use-deep-learning-when-you-have-limited-data-part-2/. [Accessed 2020].
[28] "Keras," [Online]. Available: https://keras.io/api/applications/. [Accessed 2020].
[29] "Deep Lizard," [Online]. Available: https://deeplizard.com/learn/video/5T-
iXNNiwIs. [Accessed 2020].
[30] "Pyip," [Online]. Available: https://pypi.org/project/pip/. [Accessed 2020].
[31] "Jupyter," [Online]. Available: https://jupyter.readthedocs.io/en/latest/#kernels.
[Accessed 2020].
[32] "TensorFlow," [Online]. Available: https://www.tensorflow.org/install. [Accessed
2020].
[33] "TensorFlow," [Online]. Available: https://www.tensorflow.org/install/gpu.
[Accessed 2020].
[34] "Tensorflow," [Online]. Available: https://www.tensorflow.org/lite/guide.
[Accessed 2020].
[35] "NumPy," [Online]. Available: https://numpy.org/doc/stable/about.html. [Accessed
2020].

[36] "Pandas Pydata," [Online]. Available: https://pandas.pydata.org/about/. [Accessed
2020].
[37] J. ARUN PANDIAN and G. GEETHARAMANI, "Data for: Identification of Plant
Leaf Diseases Using a 9-layer Deep Convolutional Neural Network," Mendeley Data, vol.
I, 2019.
[38] "GitHub," [Online]. Available: https://github.com/YaredTaddese/leaf-image-
segmentation. [Accessed 2020].
[39] "Python Docs," [Online]. Available: https://docs.python.org/3/library/os.html.
[Accessed 2020].
[40] "Keras," [Online]. Available: https://keras.io/api/preprocessing/image/. [Accessed
2020].
[41] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale
image recognition," arXiv preprint arXiv:1409.1556, 2014.
[42] "Keras," [Online]. Available: https://keras.io/api/losses/. [Accessed 2020].
[43] "TensorFlow," [Online]. Available:
https://github.com/tensorflow/examples/blob/master/lite/examples/image_classification/an
droid/EXPLORE_THE_CODE.md. [Accessed 2020].

10. Appendix

• Assorted leaf model – Python

# Import the necessary Python and deep learning packages

from __future__ import absolute_import, division, print_function, unicode_literals

import tensorflow as tf

from tensorflow.keras.models import Sequential


from tensorflow.keras.layers import Activation, Dense, Conv2D, Flatten,
Dropout, MaxPooling2D, GlobalAveragePooling2D, BatchNormalization
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from skimage import io

import seaborn as sns


import os
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Select the working folder i.e. the directory where the dataset is found

dataset_path = 'E:/Leaves_diseases_Classification/Datasets/PlantVillageSegmented_6_2_2_First_Commit'

train_dir = os.path.join(dataset_path, 'Train')


validation_dir = os.path.join(dataset_path, 'Validation')
test_dir = os.path.join(dataset_path, 'Test')

dir_list = os.listdir(train_dir)
dir_list.sort()
print (dir_list)

#Create a DataFrame for the train data used


train_data_df = pd.DataFrame(columns=['path', 'leaf', 'disease','label'])

count = 0
for i in dir_list:
file_list = os.listdir(train_dir + '/' + i)
for f in file_list:
nm = i.split('.')[0].split('_')
path=train_dir + '/' + i + '/' + f
#img = io.imread(path)
leaf = nm[0]
disease = nm[-1]
label=i
#data_df.loc[count] = [path, leaf, disease, img]
train_data_df.loc[count] = [path, leaf, disease, i]
count+=1

#Create a DataFrame for the validation data used


validation_data_df = pd.DataFrame(columns=['path', 'leaf',
'disease','label'])

count = 0
for i in dir_list:
file_list = os.listdir(validation_dir + '/' + i)
for f in file_list:
nm = i.split('.')[0].split('_')
path=validation_dir + '/' + i + '/' + f
#img = io.imread(path)
leaf = nm[0]
disease = nm[-1]
label=i
#data_df.loc[count] = [path, leaf, disease, img]
validation_data_df.loc[count] = [path, leaf, disease, i]
count+=1

#Create a DataFrame for the test data


test_data_df = pd.DataFrame(columns=['path', 'leaf', 'disease','label'])

count = 0
for i in dir_list:
file_list = os.listdir(test_dir + '/' + i)
for f in file_list:
nm = i.split('.')[0].split('_')
path=test_dir + '/' + i + '/' + f
#img = io.imread(path)
leaf = nm[0]
disease = nm[-1]
label=i
#data_df.loc[count] = [path, leaf, disease, img]
test_data_df.loc[count] = [path, leaf, disease, i]
count+=1

# Display some samples within the dataFrames

print (len(train_data_df))
train_data_df.sample(n=10)

print (len(validation_data_df))
validation_data_df.sample(n=10)

print (len(test_data_df))
test_data_df.sample(n=10)

# Bar plot for visualizing the data distribution

def plot_dist(dist, color_code='#C2185B', title="Plot"):


"""
To plot the data distribution by class.
Arg:
dist: pandas series of label count.
"""
tmp_df = pd.DataFrame()
tmp_df['Label'] = list(dist.keys())
tmp_df['Count'] = list(dist)

fig, ax = plt.subplots(figsize=(40, 20))

result = tmp_df.groupby(["Label"])['Count'].aggregate(np.median).reset_index().sort_values('Label')
ax = sns.barplot(x="Count", y='Label', color=color_code,
data=tmp_df,order=result['Label'])
for index, value in enumerate(dist):
plt.text(value, index, str(value))
ax.set_title(title)
ax.set_xticklabels(ax.get_xticklabels(),rotation=45)

# Count each label within the dataset

train_aux = train_data_df.label.value_counts().sort_index()
print(train_aux)
%matplotlib qt
plot_dist(train_aux, "#2962FF", "Data Distribution")

validation_aux = validation_data_df.label.value_counts().sort_index()
print(validation_aux)
plot_dist(validation_aux, "#2962FF", "Data Distribution")

test_aux = test_data_df.label.value_counts().sort_index()
print(test_aux)
plot_dist(test_aux, "#2962FF", "Data Distribution")

# Display the content of the dataset for better understanding


g = train_data_df.groupby('label')
keys=list(g.groups.keys())
class_names = keys

actualdf = pd.DataFrame({'Label': class_names, 'Train': train_aux.path.values,
                         'Validation': validation_aux.path.values,
                         'Test': test_aux.path.values})
actualdf

batch_size = 64
epochs = 200
IMG_HEIGHT = 224
IMG_WIDTH = 224

# Define the image data generators


train_image_generator = ImageDataGenerator(rotation_range=40,
width_shift_range=0.2,
height_shift_range=0.2,
rescale=1./255,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True,
fill_mode='constant')
validation_image_generator = ImageDataGenerator(rotation_range=40,
width_shift_range=0.2,
height_shift_range=0.2,
rescale=1./255,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True,
fill_mode='constant')
test_image_generator = ImageDataGenerator(rotation_range=40,

width_shift_range=0.2,
height_shift_range=0.2,
rescale=1./255,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True,
fill_mode='constant')
train_data_gen = train_image_generator.flow_from_dataframe(
dataframe=train_data_df,
directory=None,
x_col="path",
y_col="label",
weight_col=None,
target_size=(IMG_HEIGHT, IMG_WIDTH),
class_mode="categorical",
batch_size=batch_size,
shuffle=True)

validation_data_gen = train_image_generator.flow_from_dataframe(
dataframe=validation_data_df,
directory=None,
x_col="path",
y_col="label",
weight_col=None,
target_size=(IMG_HEIGHT, IMG_WIDTH),
class_mode="categorical",
batch_size=batch_size,
shuffle=True)

test_data_gen = test_image_generator.flow_from_dataframe(
dataframe=test_data_df,
directory=None,
x_col="path",
y_col="label",
weight_col=None,
target_size=(IMG_HEIGHT, IMG_WIDTH),
class_mode="categorical",
batch_size=batch_size,
shuffle=False)

# This function will plot images in the form of a grid with 1 row and 5 columns where images are placed in each column.
def plotImages(images_arr):
fig, axes = plt.subplots(1, 5, figsize=(20,20))
axes = axes.flatten()
for img, ax in zip( images_arr, axes):
ax.imshow(img)
ax.axis('off')
plt.tight_layout()
plt.show()

total_train = len(train_data_df)
total_val = len(validation_data_df)

#Fine tune the VGG16 architecture

base_model = tf.keras.applications.VGG16(
include_top=True,

weights="imagenet",
input_tensor=None,
input_shape=(IMG_HEIGHT,IMG_WIDTH,3),
pooling='Max')
base_model.trainable=False

new_model=Sequential()
for layer in base_model.layers:
new_model.add(layer)

new_model.pop()
new_model.add(Dense(38, activation = 'softmax'))
new_model.layers[19].trainable = True

# Compile the model


# from_logits=False because the output layer already applies a softmax activation
new_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
                  loss=tf.keras.losses.CategoricalCrossentropy(from_logits=False),
                  metrics=['accuracy'])
new_model.summary()

# Fit the model


history = new_model.fit_generator(
train_data_gen,
steps_per_epoch=total_train // batch_size,
epochs=epochs,
validation_data=validation_data_gen,
validation_steps=total_val // batch_size
)

# Plot the model's accuracy and loss


acc = history.history['accuracy']
val_acc = history.history['val_accuracy']

loss=history.history['loss']
val_loss=history.history['val_loss']

epochs_range = range(epochs)

plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()

labels = '\n'.join(sorted(train_data_gen.class_indices.keys()))
with open('E:/Leaves_diseases_Classification/Models/Assorted leaves/labels.txt', 'w') as f:
    f.write(labels)

def print_confusion_matrix(confusion_matrix, class_names, figsize = (20,15),
fontsize=14):
df_cm = pd.DataFrame(
confusion_matrix, index=class_names, columns=class_names,
)
fig = plt.figure(figsize=figsize)
try:
heatmap = sns.heatmap(df_cm, annot=True, fmt="d")
except ValueError:
raise ValueError("Confusion matrix values must be integers.")

heatmap.yaxis.set_ticklabels(heatmap.yaxis.get_ticklabels(), rotation=0,
ha='right', fontsize=fontsize)
heatmap.xaxis.set_ticklabels(heatmap.xaxis.get_ticklabels(),
rotation=45, ha='right', fontsize=fontsize)
plt.ylabel('True label')
plt.xlabel('Predicted label')

# Predict on the test dataset


total_test = len(test_data_df)
print(total_test)
preds = new_model.predict_generator(test_data_gen, steps = total_test //
batch_size+1, verbose=1)

# Compute the confusion matrix


from sklearn.metrics import confusion_matrix

y_pred = np.argmax(preds, axis=1)


print('Confusion Matrix')
c=confusion_matrix(test_data_gen.classes, y_pred)
c

# Print the confmat


g = validation_data_df.groupby('label')
keys=list(g.groups.keys())
class_names = keys
print(keys)

class_names = keys

print_confusion_matrix(c, class_names)

# Save the model onto disk and convert it to TFLite


saved_model_dir='E:/Leaves_diseases_Classification/Models/Assorted leaves/'
tf.saved_model.save(new_model, saved_model_dir)
converter=tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
tflite_model=converter.convert()

with open('E:/Leaves_diseases_Classification/Models/Assorted leaves/model.tflite', 'wb') as f:
    f.write(tflite_model)

# Display the information from the confmat in an easily understandable way


actualvalues_count = c.sum(axis=1)
predictedvalues_count = c.sum(axis=0)
correctpreds_count = np.diagonal(c)
missclass_percentage = (actualvalues_count-
correctpreds_count)/actualvalues_count

actualdf = pd.DataFrame({'Label':class_names,'Actualvalues':
actualvalues_count,
'Correct predicted values':correctpreds_count,
'Total predicted values':predictedvalues_count,
'Wrong predictions':predictedvalues_count-
correctpreds_count,
'Missclassification percentage':
missclass_percentage})
actualdf['Missclassification percentage'] = pd.Series(["{0:.2f}%".format(val
* 100) for val in actualdf['Missclassification percentage']], index =
actualdf.index)
actualdf

• Tomato leaf model – Python

# The code is exactly the same as the one presented above, except that two
# aspects have to be modified

dataset_path = 'E:/Leaves_diseases_Classification/Datasets/PlantVillageTomatoes'
epochs=150

• Manifest.xml - Android

<?xml version="1.0" encoding="utf-8"?>


<manifest xmlns:android="http://schemas.android.com/apk/res/android"
xmlns:tools="http://schemas.android.com/tools"
package="com.example.tflite_leaf_classifier">

<!-- Needed to use camera and store photos -->


<uses-feature
android:name="android.hardware.camera"
android:required="true" />
<uses-feature
android:name="android.hardware.camera.any"
android:required="true" />
<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
<uses-permission android:name="android.permission.CAMERA" />

<application
android:allowBackup="true"
android:icon="@mipmap/ic_launcher"
android:label="@string/app_name"
android:roundIcon="@mipmap/ic_launcher_round"
android:supportsRtl="true"
android:theme="@style/AppTheme">

<activity android:name=".ChooseModel">
<intent-filter>
<action android:name="android.intent.action.MAIN" />

<category android:name="android.intent.category.LAUNCHER" />


</intent-filter>
</activity>
<activity android:name=".Classify"></activity>

<!--cropping activity-->
<activity android:name="com.soundcloud.android.crop.CropImageActivity"
android:screenOrientation="portrait"
tools:ignore="LockedOrientationActivity" />

</application>

</manifest>

• ChooseModel.java - Android

package com.example.tflite_leaf_classifier;

import android.Manifest;
import android.annotation.SuppressLint;
import android.content.ContentValues;
import android.content.Intent;
import android.content.pm.ActivityInfo;
import android.content.pm.PackageManager;
import android.net.Uri;
import android.os.Build;
import android.provider.MediaStore;
import androidx.annotation.NonNull;
import androidx.appcompat.app.AppCompatActivity;
import androidx.core.app.ActivityCompat;
import androidx.core.content.ContextCompat;

import android.os.Bundle;
import android.util.Log;
import android.view.View;
import android.widget.Button;
import android.widget.Toast;

import com.soundcloud.android.crop.Crop;

import java.io.File;

public class ChooseModel extends AppCompatActivity {


// button for each available classifier
private Button modelFloat;
private Button modelFloat1;

// for permission requests


public static final int REQUEST_PERMISSION = 300;

// request code for permission requests to the os for image


public static final int REQUEST_IMAGE = 100;

// will hold uri of image obtained from camera


private Uri imageUri;

// string to send to next activity that describes the chosen classifier


private String chosen;

//boolean value dictating if chosen model is quantized version or not.


private boolean quant;

@Override

protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_choose_model);

// request permission to use the camera on the user's phone


if (ActivityCompat.checkSelfPermission(this.getApplicationContext(),
android.Manifest.permission.CAMERA) != PackageManager.PERMISSION_GRANTED){
ActivityCompat.requestPermissions(this, new String[]
{android.Manifest.permission.CAMERA}, REQUEST_PERMISSION);
}

// request permission to write data (aka images) to the user's external storage of their phone
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.M
&& ContextCompat.checkSelfPermission(this,
Manifest.permission.WRITE_EXTERNAL_STORAGE) != PackageManager.PERMISSION_GRANTED) {
ActivityCompat.requestPermissions(this, new
String[]{Manifest.permission.WRITE_EXTERNAL_STORAGE},
REQUEST_PERMISSION);
}

// request permission to read data (aka images) from the user's external storage of their phone
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.M
&& ContextCompat.checkSelfPermission(this,
Manifest.permission.READ_EXTERNAL_STORAGE) != PackageManager.PERMISSION_GRANTED) {
ActivityCompat.requestPermissions(this, new
String[]{Manifest.permission.READ_EXTERNAL_STORAGE},
REQUEST_PERMISSION);
}

// on click for inception float model


modelFloat = (Button)findViewById(R.id.diverse_leaves);
modelFloat.setOnClickListener(new View.OnClickListener() {
@Override
public void onClick(View view) {
// filename in assets
chosen = "model.tflite";
// model is not quantized
quant = false;
// open camera
openCameraIntent();
}
});

// on click for inception float model


modelFloat1 = (Button)findViewById(R.id.tomato_leaves);
modelFloat1.setOnClickListener(new View.OnClickListener() {
@Override
public void onClick(View view) {
// filename in assets
chosen = "model_TomatoLeaves.tflite";
// model is not quantized
quant = false;
// open camera
openCameraIntent();
}
});
} // end of onCreate

// opens camera for user
@SuppressLint("SourceLockedOrientationActivity")
private void openCameraIntent(){
ContentValues values = new ContentValues();
values.put(MediaStore.Images.Media.TITLE, "New Picture");
values.put(MediaStore.Images.Media.DESCRIPTION, "From your Camera");
// tell camera where to store the resulting picture
imageUri = getContentResolver().insert(
MediaStore.Images.Media.EXTERNAL_CONTENT_URI, values);
Intent intent = new Intent(MediaStore.ACTION_IMAGE_CAPTURE);
intent.putExtra(MediaStore.EXTRA_OUTPUT, imageUri);
setRequestedOrientation(ActivityInfo.SCREEN_ORIENTATION_PORTRAIT);
//setRequestedOrientation(ActivityInfo.SCREEN_ORIENTATION_PORTRAIT);
// start camera, and wait for it to finish
startActivityForResult(intent, REQUEST_IMAGE);
}

// checks that the user has allowed all the required permission of read and write and camera. If not, notify the user and close the application
@Override
public void onRequestPermissionsResult(final int requestCode, @NonNull final
String[] permissions, @NonNull final int[] grantResults) {
super.onRequestPermissionsResult(requestCode, permissions, grantResults);
if (requestCode == REQUEST_PERMISSION) {
if (!(grantResults.length > 0 && grantResults[0] ==
PackageManager.PERMISSION_GRANTED)) {
Toast.makeText(getApplicationContext(), "This application needs read, write, and camera permissions to run. Application now closing.", Toast.LENGTH_LONG).show();
System.exit(0);
}
}
}

// dictates what to do after the user takes an image, selects an image, or crops an image
@Override
protected void onActivityResult(int requestCode, int resultCode, Intent data){
super.onActivityResult(requestCode, resultCode, data);
// if the camera activity is finished, obtain the uri, crop it to make it square, and send it to 'Classify' activity
if(requestCode == REQUEST_IMAGE && resultCode == RESULT_OK) {
try {
Uri source_uri = imageUri;
Uri dest_uri = Uri.fromFile(new File(getCacheDir(), "cropped"));
// need to crop it to square image as the CNN always requires square input
Crop.of(source_uri, dest_uri).asSquare().start(ChooseModel.this);
} catch (Exception e) {
e.printStackTrace();
}
}

// if cropping activity is finished, get the resulting cropped image uri and send it to 'Classify' activity
else if(requestCode == Crop.REQUEST_CROP && resultCode == RESULT_OK){
imageUri = Crop.getOutput(data);
Intent i = new Intent(ChooseModel.this, Classify.class);
// put image data in extras to send
i.putExtra("resID_uri", imageUri);
// put filename in extras

i.putExtra("chosen", chosen);
// put model type in extras
i.putExtra("quant", quant);
// send other required data
startActivity(i);
finish();
}
}
} // end of ChooseModel class

• Classify.java - Android

package com.example.tflite_leaf_classifier;

import androidx.appcompat.app.AppCompatActivity;

import android.content.Intent;
import android.content.res.AssetFileDescriptor;
import android.graphics.Bitmap;
import android.graphics.Matrix;
import android.graphics.drawable.BitmapDrawable;
import android.net.Uri;
import android.os.SystemClock;
import android.provider.MediaStore;
import android.os.Bundle;
import android.view.View;
import android.widget.Button;
import android.widget.ImageView;
import android.widget.LinearLayout;
import android.widget.TextView;

import org.tensorflow.lite.Interpreter;

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class Classify extends AppCompatActivity {


// presets for rgb conversion
private static final int RESULTS_TO_SHOW = 3;
private static final int IMAGE_MEAN = 128;
private static final float IMAGE_STD = 128.0f;

// options for model interpreter
private final Interpreter.Options tfliteOptions = new Interpreter.Options();
// tflite graph
private Interpreter tflite;
// holds all the possible labels for model
private List<String> labelList;
// holds the selected image data as bytes
private ByteBuffer imgData = null;
// holds the probabilities of each label for non-quantized graphs
private float[][] labelProbArray = null;
// holds the probabilities of each label for quantized graphs
private byte[][] labelProbArrayB = null;
// array that holds the labels with the highest probabilities
private String[] topLables = null;
// array that holds the highest probabilities
private String[] topConfidence = null;

// selected classifier information received from extras
private String chosen;
private boolean quant;

// input image dimensions for the Inception Model
private int DIM_IMG_SIZE_X = 224;
private int DIM_IMG_SIZE_Y = 224;
private int DIM_PIXEL_SIZE = 3;

// int array to hold image data
private int[] intValues;

// activity elements
private ImageView selected_image;
private Button classify_button;
private Button back_button;
private LinearLayout result_layout;
private TextView label1;
private TextView label2;
private TextView label3;
private TextView Confidence1;
private TextView Confidence2;
private TextView Confidence3;

// priority queue that will hold the top results from the CNN
private PriorityQueue<Map.Entry<String, Float>> sortedLabels =
new PriorityQueue<>(
RESULTS_TO_SHOW,
new Comparator<Map.Entry<String, Float>>() {
@Override
public int compare(Map.Entry<String, Float> o1,
Map.Entry<String, Float> o2) {
return (o1.getValue()).compareTo(o2.getValue());
}
});
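// the comparator orders entries by ascending probability, so the queue behaves as a
// min-heap: printTopKLabels() offers every label and polls whenever the size exceeds
// RESULTS_TO_SHOW, leaving only the highest-probability entries, which are then
// retrieved lowest-first (hence the reversed indices when the text views are filled)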

@Override
protected void onCreate(Bundle savedInstanceState) {
// get all selected classifier data from classifiers
chosen = (String) getIntent().getStringExtra("chosen");
quant = (boolean) getIntent().getBooleanExtra("quant", false);

// initialize array that holds image data
intValues = new int[DIM_IMG_SIZE_X * DIM_IMG_SIZE_Y];

super.onCreate(savedInstanceState);

// initialize graph and labels
try{
tflite = new Interpreter(loadModelFile(), tfliteOptions);
labelList = loadLabelList();
} catch (Exception ex){
ex.printStackTrace();
}

// initialize the input buffer; its size depends on whether the model expects
// quantized (1 byte per channel) or float (4 bytes per channel) input
if (quant) {
    imgData = ByteBuffer.allocateDirect(DIM_IMG_SIZE_X * DIM_IMG_SIZE_Y * DIM_PIXEL_SIZE);
} else {
    imgData = ByteBuffer.allocateDirect(4 * DIM_IMG_SIZE_X * DIM_IMG_SIZE_Y * DIM_PIXEL_SIZE);
}
imgData.order(ByteOrder.nativeOrder());

// initialize the output probability array; its datatype also depends on whether
// the graph is quantized or not
if (quant) {
    labelProbArrayB = new byte[1][labelList.size()];
} else {
    labelProbArray = new float[1][labelList.size()];
}

setContentView(R.layout.activity_classify);

// labels that hold top three results of CNN
label1 = (TextView) findViewById(R.id.label1);
label2 = (TextView) findViewById(R.id.label2);
label3 = (TextView) findViewById(R.id.label3);
// displays the probabilities of top labels
Confidence1 = (TextView) findViewById(R.id.Confidence1);
Confidence2 = (TextView) findViewById(R.id.Confidence2);
Confidence3 = (TextView) findViewById(R.id.Confidence3);
result_layout= (LinearLayout) findViewById(R.id.labels);
// initialize imageView that displays selected image to the user
selected_image = (ImageView) findViewById(R.id.selected_image);

// initialize array to hold top labels
topLables = new String[RESULTS_TO_SHOW];
// initialize array to hold top probabilities
topConfidence = new String[RESULTS_TO_SHOW];

// allows user to go back to activity to select a different image
back_button = (Button)findViewById(R.id.back_button);
back_button.setOnClickListener(new View.OnClickListener() {
@Override
public void onClick(View view) {
Intent i = new Intent(Classify.this, ChooseModel.class);
startActivity(i);
finish();
}
});

// classify the currently displayed image
classify_button = (Button)findViewById(R.id.classify_image);
classify_button.setOnClickListener(new View.OnClickListener() {
@Override
public void onClick(View view) {
// get current bitmap from imageView
Bitmap bitmap_orig =
((BitmapDrawable)selected_image.getDrawable()).getBitmap();
// resize the bitmap to the required input size to the CNN
Bitmap bitmap = getResizedBitmap(bitmap_orig, DIM_IMG_SIZE_X,
DIM_IMG_SIZE_Y);

// convert bitmap to byte array
convertBitmapToByteBuffer(bitmap);
// pass byte data to the graph
if(quant){
tflite.run(imgData, labelProbArrayB);
} else {
tflite.run(imgData, labelProbArray);
}
// display the results
printTopKLabels();
}
});

// get image from previous activity to show in the imageView
Uri uri = (Uri)getIntent().getParcelableExtra("resID_uri");
try {
Bitmap bitmap = MediaStore.Images.Media.getBitmap(getContentResolver(),
uri);
selected_image.setImageBitmap(bitmap);
// the captured image arrives rotated, so rotate the view by 90 degrees to compensate
selected_image.setRotation(selected_image.getRotation() + 90);
} catch (IOException e) {
e.printStackTrace();
}
}

// loads the tflite graph from file
private MappedByteBuffer loadModelFile() throws IOException {
AssetFileDescriptor fileDescriptor = this.getAssets().openFd(chosen);
FileInputStream inputStream = new
FileInputStream(fileDescriptor.getFileDescriptor());
FileChannel fileChannel = inputStream.getChannel();
long startOffset = fileDescriptor.getStartOffset();
long declaredLength = fileDescriptor.getDeclaredLength();
return fileChannel.map(FileChannel.MapMode.READ_ONLY, startOffset,
declaredLength);
}

// converts the bitmap to a byte buffer which is passed to the tflite graph
private void convertBitmapToByteBuffer(Bitmap bitmap) {
if (imgData == null) {
return;
}
imgData.rewind();
bitmap.getPixels(intValues, 0, bitmap.getWidth(), 0, 0, bitmap.getWidth(),
bitmap.getHeight());
// loop through all pixels
int pixel = 0;
for (int i = 0; i < DIM_IMG_SIZE_X; ++i) {
for (int j = 0; j < DIM_IMG_SIZE_Y; ++j) {
final int val = intValues[pixel++];
// get the rgb values from intValues, where each int holds the rgb values for a pixel;
// if quantized, store each rgb value as a byte, otherwise normalize it to a float
if(quant){
imgData.put((byte) ((val >> 16) & 0xFF));
imgData.put((byte) ((val >> 8) & 0xFF));
imgData.put((byte) (val & 0xFF));
} else {
imgData.putFloat((((val >> 16) & 0xFF)-IMAGE_MEAN)/IMAGE_STD);
imgData.putFloat((((val >> 8) & 0xFF)-IMAGE_MEAN)/IMAGE_STD);
imgData.putFloat((((val) & 0xFF)-IMAGE_MEAN)/IMAGE_STD);
}

}
}
}
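// worked example for the float path above: a pure red pixel (0xFFFF0000) has R = 255,
// G = 0, B = 0, so the buffer receives (255 - 128)/128 = 0.9921875, (0 - 128)/128 = -1.0
// and (0 - 128)/128 = -1.0, i.e. every channel is normalized into the range [-1, 1)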

// loads the labels from the label txt file in assets into a list of strings
private List<String> loadLabelList() throws IOException {
List<String> labelList = new ArrayList<String>();
BufferedReader reader;
if(chosen.equals("model.tflite"))
{
reader = new BufferedReader(new
InputStreamReader(this.getAssets().open("labels.txt")));
}
else
{
reader = new BufferedReader(new
InputStreamReader(this.getAssets().open("labels_TomatoLeaves.txt")));
}
String line;
while ((line = reader.readLine()) != null) {
labelList.add(line);
}
reader.close();
return labelList;
}
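// note: each line of the label file must correspond, in order, to one output index of
// the model, because printTopKLabels() pairs labelList.get(i) with the i-th probability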

// print the top labels and respective confidences
private void printTopKLabels() {
// add all results to priority queue
for (int i = 0; i < labelList.size(); ++i) {
if(quant){
sortedLabels.add(
new AbstractMap.SimpleEntry<>(labelList.get(i),
(labelProbArrayB[0][i] & 0xff) / 255.0f));
} else {
sortedLabels.add(
new AbstractMap.SimpleEntry<>(labelList.get(i),
labelProbArray[0][i]));
}
if (sortedLabels.size() > RESULTS_TO_SHOW) {
sortedLabels.poll();
}
}

// get top results from priority queue
final int size = sortedLabels.size();
for (int i = 0; i < size; ++i) {
Map.Entry<String, Float> label = sortedLabels.poll();
topLables[i] = label.getKey();
topConfidence[i] = String.format("%.0f%%",label.getValue()*100);
}

// set the corresponding textviews with the results
label1.setText("1. "+topLables[2]);
label2.setText("2. "+topLables[1]);
label3.setText("3. "+topLables[0]);

Confidence1.setText(topConfidence[2]);
Confidence2.setText(topConfidence[1]);
Confidence3.setText(topConfidence[0]);
result_layout.setVisibility(View.VISIBLE);
}

// resizes bitmap to given dimensions
public Bitmap getResizedBitmap(Bitmap bm, int newWidth, int newHeight) {
int width = bm.getWidth();
int height = bm.getHeight();
float scaleWidth = ((float) newWidth) / width;
float scaleHeight = ((float) newHeight) / height;
Matrix matrix = new Matrix();
matrix.postScale(scaleWidth, scaleHeight);
Bitmap resizedBitmap = Bitmap.createBitmap(
bm, 0, 0, width, height, matrix, false);
return resizedBitmap;
}
}
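
For quantized graphs, printTopKLabels() converts the unsigned 8-bit scores returned by the interpreter into probabilities through the expression (labelProbArrayB[0][i] & 0xff) / 255.0f. The following standalone sketch only illustrates that conversion; the class name and the sample value are made up for the example and are not part of the application:

public class DequantExample {
    // map an unsigned 8-bit score (stored in a signed Java byte) to a probability in [0, 1]
    static float dequantize(byte raw) {
        return (raw & 0xff) / 255.0f;
    }

    public static void main(String[] args) {
        byte raw = (byte) 204;                         // example raw score from a quantized model
        float p = dequantize(raw);                     // (204 & 0xff) / 255 = 0.8
        System.out.println(String.format("%.0f%%", p * 100)); // prints "80%", the same format used by the app
    }
}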

• custom_back_button.xml - Android

<?xml version="1.0" encoding="utf-8"?>


<shape xmlns:android="http://schemas.android.com/apk/res/android"
android:shape="rectangle">

<gradient android:startColor="#00c3ff" android:endColor="#ffff1c"
android:type="linear" android:angle="135"/>
<stroke android:width="1dp" android:color="#5472d3"/>
<corners android:radius="50dp"/>

</shape>

• custom_capture_button.xml - Android

<?xml version="1.0" encoding="utf-8"?>


<shape xmlns:android="http://schemas.android.com/apk/res/android"
android:shape="rectangle">

<gradient android:startColor="#00C9FF" android:endColor="#92FE9D"
android:type="linear"/>
<corners android:radius="50dp"/>

</shape>

• custom_classify_button.xml - Android

<?xml version="1.0" encoding="utf-8"?>


<shape xmlns:android="http://schemas.android.com/apk/res/android"
android:shape="rectangle">

<gradient android:startColor="#00c3ff" android:endColor="#ffff1c"
android:type="linear" android:angle="45"/>
<stroke android:width="1dp" android:color="#5472d3"/>

<corners android:radius="50dp"/>

</shape>

• activity_choose_model.xml - Android

<?xml version="1.0" encoding="utf-8"?>

<!-- Use RelativeLayout instead of the default ConstraintLayout. -->
<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
xmlns:app="http://schemas.android.com/apk/res-auto"
xmlns:tools="http://schemas.android.com/tools"
android:layout_width="match_parent"
android:layout_height="match_parent"
tools:context=".ChooseModel"
android:background="@drawable/background1">

<LinearLayout
android:id="@+id/b_group"
android:layout_width="wrap_content"
android:layout_height="wrap_content"
android:layout_marginTop="119dp"
android:layout_gravity="center"
android:layout_centerHorizontal="true"
android:gravity="center"
android:orientation="horizontal">

<Button
android:id="@+id/tomato_leaves"
android:layout_width="wrap_content"
android:layout_height="45dp"
android:background="@drawable/custom_capture_button"
android:text=" Tomato leaves " />

<Button
android:id="@+id/diverse_leaves"
android:layout_width="wrap_content"
android:layout_height="45dp"
android:background="@drawable/custom_capture_button"
android:text=" Assorted leaves " />
</LinearLayout>

<TextView
android:id="@+id/textView2"
android:layout_width="wrap_content"
android:layout_height="wrap_content"
android:layout_below="@+id/b_group"
android:layout_marginTop="84dp"
android:layout_alignParentTop="true"
android:layout_centerHorizontal="true"
android:text="Choose your model"
android:textColor="#FFF"
android:textSize="20sp"
android:textStyle="bold"/>

</RelativeLayout>

• activity_classify.xml - Android

<?xml version="1.0" encoding="utf-8"?>


<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
xmlns:app="http://schemas.android.com/apk/res-auto"
xmlns:tools="http://schemas.android.com/tools"
android:layout_width="match_parent"
android:layout_height="match_parent"
tools:context=".Classify"
android:background="@drawable/background2">

<ImageView
android:id="@+id/selected_image"
android:layout_width="300dp"
android:layout_height="300dp"
android:layout_alignParentTop="true"
android:layout_centerHorizontal="true"
android:layout_marginTop="24dp"
app:srcCompat="@android:color/background_light" />

<LinearLayout
android:id="@+id/btn_group"
android:layout_width="wrap_content"
android:layout_height="wrap_content"
android:layout_below="@+id/selected_image"
android:layout_gravity="center"
android:layout_centerHorizontal="true"
android:gravity="center"
android:orientation="horizontal">

<Button
android:id="@+id/classify_image"
android:layout_width="wrap_content"
android:layout_height="45dp"
android:background="@drawable/custom_classify_button"
android:text="Classify!" />

<Button
android:id="@+id/back_button"
android:layout_width="wrap_content"
android:layout_height="45dp"
android:background="@drawable/custom_back_button"
android:text="Back" />
</LinearLayout>

<LinearLayout
android:id="@+id/labels"
android:layout_width="match_parent"
android:layout_height="97dp"
android:layout_below="@+id/btn_group"
android:layout_alignParentBottom="true"
android:layout_gravity="center"
android:layout_marginTop="221dp"
android:visibility="invisible"
android:layout_marginBottom="44dp"
android:orientation="vertical">

<LinearLayout
android:layout_width="370dp"
android:layout_height="wrap_content"
android:layout_gravity="center"
android:gravity="center"
android:orientation="horizontal">

<TextView
android:id="@+id/lableText"
android:layout_width="215dp"
android:layout_height="wrap_content"
android:text="Class"
android:textColor="#FFF"
android:textSize="20sp"
android:textStyle="bold" />

<TextView
android:id="@+id/ConfidenceText"
android:layout_width="120dp"
android:layout_height="wrap_content"
android:text="Likelihood"
android:textColor="#FFF"
android:textSize="20sp"
android:textStyle="bold" />

</LinearLayout>

<LinearLayout
android:layout_width="370dp"
android:layout_height="wrap_content"
android:layout_gravity="center"
android:gravity="center"
android:orientation="horizontal">

<TextView
android:id="@+id/label1"
android:layout_width="215dp"
android:layout_height="wrap_content"
android:layout_marginRight="10dp"
android:text="1."
android:textColor="#FFF"
android:textSize="12sp" />

<TextView
android:id="@+id/Confidence1"
android:layout_width="110dp"
android:layout_height="match_parent"
android:text=""
android:textColor="#FFF"
android:textSize="14sp" />

</LinearLayout>

<LinearLayout
android:layout_width="370dp"
android:layout_height="wrap_content"
android:layout_gravity="center"
android:gravity="center"
android:orientation="horizontal">

<TextView
android:id="@+id/label2"
android:layout_width="215dp"
android:layout_height="wrap_content"
android:layout_marginRight="10dp"
android:text="2."
android:textColor="#FFF"
android:textSize="12sp" />

<TextView
android:id="@+id/Confidence2"
android:layout_width="110dp"
android:layout_height="match_parent"
android:text=""
android:textColor="#FFF"
android:textSize="14sp" />

</LinearLayout>

<LinearLayout
android:layout_width="370dp"
android:layout_height="wrap_content"
android:layout_gravity="center"
android:gravity="center"
android:orientation="horizontal">

<TextView
android:id="@+id/label3"
android:layout_width="215dp"
android:layout_height="wrap_content"
android:layout_marginRight="10dp"
android:text="3."
android:textColor="#FFF"
android:textSize="12sp" />

<TextView
android:id="@+id/Confidence3"
android:layout_width="110dp"
android:layout_height="match_parent"
android:text=""
android:textColor="#FFF"
android:textSize="14sp" />

</LinearLayout>
</LinearLayout>

</RelativeLayout>

• build.gradle - Android

apply plugin: 'com.android.application'

android {
compileSdkVersion 28

defaultConfig {
applicationId "com.example.tflite_leaf_classifier"
minSdkVersion 15
targetSdkVersion 28
versionCode 1
versionName "1.0"

testInstrumentationRunner "androidx.test.runner.AndroidJUnitRunner"
}

buildTypes {
release {
minifyEnabled false
proguardFiles getDefaultProguardFile('proguard-android.txt'),
'proguard-rules.pro'
}
}

aaptOptions {
noCompress "tflite"
noCompress "lite"
}
}

dependencies {
implementation fileTree(dir: 'libs', include: ['*.jar'])
implementation 'androidx.appcompat:appcompat:1.1.0'
implementation 'androidx.constraintlayout:constraintlayout:1.1.3'
testImplementation 'junit:junit:4.12'
androidTestImplementation 'androidx.test.ext:junit:1.1.1'
androidTestImplementation 'androidx.test.espresso:espresso-core:3.2.0'
// dependency to allow us to crop square images
implementation 'com.soundcloud.android:android-crop:1.0.1@aar'
implementation 'org.tensorflow:tensorflow-lite:+'
}
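
Two details of this build file tie directly into the Java code above: the aaptOptions block keeps the .tflite (and .lite) assets uncompressed inside the APK, which is what allows loadModelFile() in Classify.java to memory-map the model straight from its AssetFileDescriptor, and the com.soundcloud.android:android-crop dependency provides the Crop activity that ChooseModel.java uses to force the selected picture into a square.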

Curriculum vitae

PERSONAL INFORMATION Alexandru Mihai Spoiala


2 Primaverii Street, 400540 Cluj-Napoca (Romania)
0742690326
alexandruspoiala@yahoo.com
Skype mihaita_alex1997

Sex Male | Date of birth 02/11/1997 | Nationality Romanian

JOB APPLIED FOR Network DevOps Engineer

WORK EXPERIENCE

17 Jul 2017–10 Sep 2017 Intern


TEC Software Solutions, Cluj-Napoca (Romania)
- Study OOP concepts and fundamental elements related to C# Programming Language
- Use SQL Server to create a database containing information about customers
- Create an API using the WebAPI Framework to lay out HTTP endpoints using the data stored in the
previously created database
- Build a web application using HTML, CSS, JavaScript and jQuery

20 Jun 2018–30 Sep 2018 Cook


Flatlanders, Greenville (United States)

1 Jul 2018–15 Sep 2018 Stocker


Indian Hill Trading Post, Greenville (United States)

1 Jul 2019–30 Sep 2019 Cook


Flatlanders, Greenville (United States)

8 Jul 2019–30 Sep 2019 Stocker


Indian Hill Trading Post, Greenville (United States)

EDUCATION AND TRAINING

15 Sep 2012–21 Jun 2016 Baccalaureate Diploma EQF level 5


National College "Petru Rareș" Suceava, Suceava (Romania)
Real profile: "Mathematics and Informatics"
Main subjects:
- English, French, C/C++ programming language, Mathematics, Physics
Skills covered:
- understanding the teamwork concept
- communicate concisely, clearly and accurately with others
- basic knowledge of C/C++
- efficient interpretation, usage and evaluation of informational data


1 Oct 2016–Present Bachelor's Degree in Telecommunications and Systems of Telecommunications EQF level 6

Technical University of Cluj-Napoca, Faculty of Electronics, Telecommunications and
Information Technology, Cluj-Napoca (Romania)
Main subjects:
- Advanced Mathematics, Computer Programming (C/C++), Software Engineering (Java), Information
Theory and Coding, Computer Aided Design, Modulation Techniques, Telephony, Cellular Radio
Communications, Computers Networking, Switching and Routing Systems, Internet protocols, Mobile
Communications, Television, Numerical processing of images

PERSONAL SKILLS

Mother tongue(s) Romanian

Foreign language(s)   Listening   Reading   Spoken interaction   Spoken production   Writing
English               B2          B2        B2                   B2                  B2
French                A1          A1        A1                   A1                  A1
German                A1          A1        A1                   A1                  A1
Levels: A1 and A2: Basic user - B1 and B2: Independent user - C1 and C2: Proficient user
Common European Framework of Reference for Languages - Self-assessment grid

Communication skills - good communication skills in an international language (English), gained and developed during my
work experience in the United States
- active listener, always providing feedback / asking questions
- non-verbal communication skills
- ability to motivate and cooperate with teammates in order to create a pleasant working
environment

Organisational / managerial skills - good organisational skills, developed during my studies by carrying out various projects
- adapt easily to new working environments
- leadership and managerial skills, developed by training two new cooks during my second
work experience in the United States

Digital skills SELF-ASSESSMENT

Information processing   Communication     Content creation   Safety            Problem-solving
Proficient user          Proficient user   Independent user   Proficient user   Independent user

Digital skills - Self-assessment grid

ECDL Certificate

- Skills acquired due to ECDL Certificate: ability to work with Microsoft Word, Microsoft Excel and
Microsoft PowerPoint
- Good knowledge of programming languages such as: C/C++, Java, C# and JavaScript
- Video editing skills achieved by working with Adobe Premiere Pro / After Effects and Sony Vegas Pro
- Ability to work with simulators like Matlab, LTSpice, Mefisto-2D, Orcad, Emu8086

- Good knowledge of development environments for computer programming such as: Visual Studio,
Eclipse and Anaconda

Driving licence AM, B1, B

ADDITIONAL INFORMATION

Projects 1. Create a basic Paint Application using Java Language


2. Create a video (15 mins long) using Adobe Premiere Pro / After Effects
3. Client-server TCP/IP architecture: Stream Socket-based communication in IPv4/IPv6
4. Asterisk PBX: Implement SIP/IAX functions
5. Create an API using the WebAPI Framework to lay out HTTP endpoints using the data stored in a
previously created database
6. Digital Code Lock Using Arduino (include creating an android bluetooth application: Keypad)
7. Zooming and Shrinking Images by Pixel Replication and Bicubic Interpolation using C#
programming language
