
CSE3506

Essentials of Data Analytics

J Component Report

A project report titled

Apparel Categorisation using Image Classification Model

By

Utkarsh 18BEC1230
Hemant Pamnani 18BEC1241

BACHELOR OF TECHNOLOGY

IN

ELECTRONICS AND COMMUNICATION ENGINEERING

Submitted to
Dr. R. KARTHIK
MAY 2021
School of Electronics Engineering
DECLARATION BY THE CANDIDATE

We hereby declare that the report entitled “Apparel Categorisation using Image
Classification Model” submitted by us to VIT Chennai is a record of bonafide work
undertaken by us under the supervision of Dr. R. Karthik, Senior Assistant
Professor, SENSE, VIT Chennai.

Hemant Pamnani Utkarsh

Chennai

05/06/2021.
ACKNOWLEDGEMENT

We wish to express our sincere thanks and deep sense of gratitude to our project
guide, Dr. R. Karthik, School of Electronics Engineering for his consistent
encouragement and valuable guidance offered to us in a pleasant manner throughout
the course of the project work.

We are extremely grateful to Dr. Sivasubramanian. A, Dean of the School
of Electronics Engineering (SENSE), VIT University Chennai, for extending the
facilities of the School towards our project and for his unstinting support.

We express our thanks to our Head of The Department Dr. Vetrivelan. P (for
B.Tech-ECE) for his support throughout the course of this project.

We also take this opportunity to thank all the faculty of the School for their
support and their wisdom imparted to us throughout the courses till date.

We thank our parents, family, and friends for bearing with us throughout the
course of our project and for the opportunity they provided us in undergoing this
course in such a prestigious institution.
BONAFIDE CERTIFICATE

Certified that this project report entitled “Apparel Categorisation using Image
Classification Model” is a bonafide work of Utkarsh (18BEC1230) and Hemant
Pamnani (18BEC1241), who carried out the “J”-Project work under my supervision and
guidance for CSE 3506 ESSENTIALS OF DATA ANALYTICS.

Dr. R. Karthik

School of Electronics Engineering

VIT University, Chennai

Chennai – 600 127.


TABLE OF CONTENTS

S.NO    Chapter

I       Abstract
1       Chapter 1 - Introduction
2       Chapter 2 - Requirements and Proposed System
3       Chapter 3 - Module Description
4       Chapter 4 - Results and Discussion
5       Chapter 5 - Conclusion
6       Code
7       References
I. ABSTRACT

Apparel classification from images has practical applications in e-commerce and
online advertising. This project describes our approach to classifying an item of
apparel from a given image using convolutional neural networks (CNNs), which have
been shown to perform remarkably well in image recognition. The fashion industry is
always evolving, and it is important to keep up with the latest trends. For example,
we often like a particular type of apparel or clothing while watching a TV show and
want to know where we can buy a similar piece. With our project, we aim to lay the
groundwork for a system that can provide a set of similar apparel items available
for online purchase. This first requires us to classify clothing with high
precision. The task has its own challenges because clothes are very often deformed,
folded in unusual ways and not stretched out to reveal their actual shape. If the
picture contains only the clothing and no person wearing it, it can be hard even for
a human to classify it accurately. Pictures are also not always taken from the
front, and this variation in angle adds significant difficulty. We believe that with
a good amount of data covering many such variations, CNNs will do a good job of
learning the features most indicative of each class.
1. INTRODUCTION

The fashion industry is always evolving, and it is important to keep up with the
latest trends. For instance, we often like a particular type of apparel or clothing
while watching a TV show and want to know where we can buy a similar piece. With our
prototype, we aim to lay the groundwork for a system that can provide a set of
similar apparel items available for online purchase. This first requires us to
classify clothing with high precision. The task has its own challenges because
clothes are very often deformed, folded in unusual ways and not stretched out to
reveal their actual shape. If the picture contains only the clothing and no person
wearing it, it can be hard even for a human to classify it accurately. Pictures are
also not always taken from the front, and this variation in angle adds significant
difficulty.

We believe that with a good amount of data covering many such variations, CNNs will
do a good job of learning the features most indicative of each class. Depending on
the particular application of fashion classification, the most relevant problems to
solve will differ. We focus on optimizing fashion classification for the purposes of
annotating images and discovering the fashion items most similar to the item in a
query image. Some of the challenges for this task are: classes of clothing can share
similar characteristics (e.g. Handbag, Travelling Bag, School Bag etc. are all given
the broader label of ‘Bag’ because of their similar characteristics), clothing
deforms easily because of its material, certain types of clothing can be small, and
clothing types can look very different depending on aspect ratio and angle.
1.1 PROBLEM STATEMENT

More than 25% of the entire revenue in e-commerce is attributed to apparel and
accessories. A major problem retailers face is categorizing these items from the
images alone, especially when the categories provided by the brands are quite
inconsistent.

This poses an interesting computer vision problem which has caught the attention of
several deep learning researchers.

Thus, the aim of our project is to categorise apparel into the correct categories
(Shoe, Top, Dress etc.) using the image of the apparel.

1.2 PRIMARY OBJECTIVES

● To categorise apparel into the correctly defined categories (Shoe, Top, Dress,
Bag, Pullover, Trouser etc.) using only the basic image of the apparel.

● To analyze the overall effect of changes to the hyper-parameters during the
neural-network model evaluation/enhancement phase, and to keep a check on the
factors that affect the robustness of the model.

● To analyze the various supporting operations used to build the model, such as
2D convolution layers, 2D max pooling, etc.

● To draw inferences from the various metrics obtained and from multiple
cross-checks against the actual categorisation list (with further optimization, if
required).

1.3 LITERATURE REVIEW

In the literature, many techniques have been tried for image classification on
Fashion-MNIST and MNIST. Levi & Hassner (2015) proposed a CNN architecture to
overcome the overfitting problem caused by a small number of images. Their results
show that a CNN provides improved gender and age classification even on the smaller
contemporary unconstrained image sets. Large-scale image classification using a CNN
on a dataset of 1 million YouTube videos belonging to 487 classes was studied by
Karpathy et al. (2014). The authors found that the CNN learned powerful features
even from weakly-labeled data. The performance of the proposed model was compared
using the UCF-101 Action Recognition dataset, and a significant performance
improvement, up to 63.3%, is presented. Jmour et al. (2018) trained a CNN on the
ImageNet dataset for a traffic sign classification system. The CNN is used to learn
features and classify RGB-D images. Various parameters have an effect on the
accuracy of the training results.

The authors presented the effect of mini-batch size on the trained models. The model
gave 93.33% accuracy on the test set with a mini-batch size of 10. A CNN for
document image classification is presented by Kang et al. (2014). The authors used a
CNN with rectified linear units, trained with dropout. The results obtained using
the CNN are better than those of traditional methods. The Tobacco litigation and
NIST tax-form datasets are used for experimentation. For the Tobacco dataset, with
10 classes of images, 80% of the data is used for training and 20% for validation. A
median accuracy of 65.37% is achieved for 100 samples. A median accuracy of 100% is
achieved over 100 partitions of training and test data on the NIST tax-form dataset.
A CNN is used for house number digit classification by Sermanet et al. (2012). They
improved the traditional CNN with multistage features and the Lp pooling method, and
obtained an accuracy of 94.85% on the SVHN dataset, a 45.2% improvement in error.

Hu et al. (2015) proposed a five-layer CNN architecture and tested it on several
hyperspectral image data sets to classify hyperspectral images directly in the
spectral domain, which gives better performance. CNN frameworks such as Caffe were
used to reduce training and testing time, and 90% accuracy was achieved on the MNIST
dataset. Manessi & Rozza (2018) introduced two approaches to learn different
combinations of base activation functions, namely the identity function, ReLU and
tanh. The proposed approaches were compared with well-known architectures, namely
LeNet-5, AlexNet and ResNet-56, using three standard datasets.
2. REQUIREMENTS AND PROPOSED SYSTEM

2.1 REQUIREMENTS

Tools Required:

● RStudio (R 4.1.0)

Dataset Employed:

Fashion-MNIST is a dataset of Zalando's article images, consisting of a training
set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28
grayscale image, associated with a label from 10 classes. Zalando intends Fashion-
MNIST to serve as a direct drop-in replacement for the original MNIST dataset for
benchmarking machine learning algorithms. It shares the same image size and
structure of training and testing splits.

In a commercial version of our proposed prototype, we would need to convert it
into a web application which can take real-time images and classify the apparel
properly.

2.2 PROPOSED SYSTEM

● Baseline-I

As a baseline model, we employed a convolutional neural network to predict the style
of the clothing items. A batch of 128 inputs of size 28x28 is passed through a
convolution layer of 32 filters of size 5 × 5, followed by batch normalization, a
relu activation to bring in non-linearity, and max-pooling over a 2×2 region.

The output is passed through the exact same series of layers once more. The result
is then passed into a dense fully connected layer of size 128 before going through
another set of batch normalization and relu activation. The last set of layers
consists of another dense layer of size 64 and a softmax layer which converts the
outputs to probability scores for each class.

While training, the model weights are updated by backpropagation so as to reduce the
softmax loss at each iteration. To achieve good performance on the training set, the
model was trained for 13 epochs. While testing, a forward pass is run on an input
image and the label with the highest score is predicted to be its clothing category.

In summary, the architecture looks like: [conv - batch - relu - 2x2 max pool]*2 -
affine - batch - relu - affine - softmax. A test accuracy of 90.57% was obtained and
set as our baseline for future experiments.
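
The spatial sizes implied by this architecture can be traced with a few lines of R. This is only a sketch under stated assumptions: no padding ("valid" convolutions), a stride of 1, and 32 filters in the second convolution as in the first. The helper names conv_out and pool_out are hypothetical and are not part of keras.

```r
# Output size of a convolution along one spatial dimension
# (n = input size, k = kernel size, p = padding per side, s = stride).
conv_out <- function(n, k, p = 0, s = 1) floor((n + 2 * p - k) / s) + 1

# Output size of non-overlapping pooling along one spatial dimension.
pool_out <- function(n, pool = 2) floor(n / pool)

size <- 28                       # Fashion-MNIST images are 28 x 28
size <- conv_out(size, k = 5)    # first 5x5 convolution
size <- pool_out(size)           # first 2x2 max pool
size <- conv_out(size, k = 5)    # second 5x5 convolution
size <- pool_out(size)           # second 2x2 max pool

# Number of values fed to the first dense layer (assumed 32 filters).
flat_units <- size * size * 32
cat("final feature map:", size, "x", size, "- flattened units:", flat_units, "\n")
```

Under these assumptions the feature map shrinks as 28, 24, 12, 8, 4, so the first dense layer would receive 4 × 4 × 32 values per image.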

● Baseline-II

For the baseline case, we considered only five attributes: gender, necktie, skin
exposure, scarf and collar. As mentioned before, for many of the images certain
attributes were not available because of ambiguity among the human workers
classifying them.

Hence we have a varying number of attributes for each image. To account for this, we
trained a separate neural network for each of the chosen attributes. For each
network, the training set contained only the images that have the corresponding
attribute. In all cases, we trained a simple 3-layer CNN.

The CNN used was: conv - relu - 2x2 max pool - affine - relu - affine - softmax. We
obtained a mean accuracy of 88.26% for the attributes considered in the baseline
case. This was not really impressive considering that the labels were binary and the
data set was unbalanced in most of the categories. The accuracy we obtained was not
much higher than the guess accuracy for most of the selected attributes.
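
To see why a high accuracy can be unimpressive on unbalanced binary labels, the sketch below computes the accuracy of a classifier that always predicts the more frequent class. The 85/15 split is a made-up illustration, not a figure from the Clothing attribute dataset, and reading "guess accuracy" as this majority-class baseline is our interpretation.

```r
# Hypothetical binary labels for one attribute: 85 "no" (0) and 15 "yes" (1).
labels <- c(rep(0, 85), rep(1, 15))

# A trivial classifier that always predicts the most frequent class.
majority_class <- as.numeric(names(which.max(table(labels))))
predictions <- rep(majority_class, length(labels))

# Fraction of correct predictions without learning anything from the images.
guess_accuracy <- mean(predictions == labels)
cat("majority class:", majority_class, "- guess accuracy:", guess_accuracy, "\n")
```

With labels this skewed, a model has to beat 85% before it can be said to have learned anything about the attribute.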
3. MODULE DESCRIPTION
3.1 UNDERLYING MODULES

1. After loading the training and testing datasets, we unflatten the data entries,
that is, we transform the data into matrices of pixel values, which are easier to
index and interpret.
2. We then define helper functions for matrix rotation and for plotting
characteristic images (16 cherry-picked) from the matrix, from both the training
and the testing sets.
3. Next comes the neural network model (building, compiling and
testing/evaluating). Data visualisation is also an important part of this module.
4. Finally, we check the correctness of the chosen model by analysing the
classification metrics and the feasibility of the model.
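
The unflattening in step 1 can be illustrated on synthetic data. This is a minimal sketch, assuming each row holds one image as 784 consecutive pixel values; the matrix flat is a hypothetical stand-in for the pixel columns of the CSV file.

```r
img_rows <- img_cols <- 28

# Two hypothetical flattened images, one per row, 784 pixel columns each.
flat <- matrix(seq_len(2 * img_rows * img_cols), nrow = 2)

# Reshape to (samples, rows, cols, channels), the layout a 2D convolution expects.
images <- flat
dim(images) <- c(nrow(flat), img_rows, img_cols, 1)

# Extract the first sample as a plain 28 x 28 matrix.
first_image <- images[1, , , 1]

# R stores arrays column by column, so consecutive pixels of a flattened
# row run down the columns of the resulting matrix.
cat("dims:", dim(images), "- second pixel lands at [2, 1]:",
    first_image[2, 1] == flat[1, 2], "\n")
```

Because of this column-wise fill, the matrix is not oriented the way the picture was stored, which is one reason a rotation step is needed before plotting.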

3.2 STYLE CLASSIFICATION

Given images of apparel, we try to classify them into different classes (for
example: shirt, blouse, undergarment etc.). For this particular problem, we assume
that the input images are already cropped and that each image contains only one
clothing item. The input image is then passed through the CNN to generate one of the
labels as the output. For this part, the input is 16 cherry-picked images from the
training set, each belonging to one of the following categories:

1. Blouses
2. Cloak
3. Coat
4. Jacket
5. Long Dress
6. Polo shirt or Sport shirt
7. Robe
8. Shirt
9. Short Dress
10. Suit, Suit of clothes
11. Sweater
12. Jersey, T-shirt
13. Undergarment, Upper body
14. Uniform
15. Vest, waist-coat

Data is transformed to matrix format for two main reasons:

(a) Indexability for better interpretation

(b) Separation of pixels from labels

The image is first converted into a contiguous array which stores the individual
pixel values. For example, if we choose a resolution of 64 × 64, the input will have
shape 64 × 64 × 3, where the 3 refers to the RGB pixel values. Matrix rotation is
required because R's image() function draws matrix rows along the x-axis starting
from the bottom-left corner, so a matrix plotted as-is appears rotated; rotating it
first is a tried and tested approach among R users for pixel-image data. Thus, the
order of the rows and columns is reversed and the matrix is transposed so that the
image is displayed with the correct orientation.
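
The rotation helper used in the code listing can be checked on a small matrix. The sketch below reuses the same one-line definition, t(apply(x, 2, rev)); the 2 × 3 test matrix is only an illustration.

```r
# Rotate a matrix by 90 degrees clockwise: reverse each column, then transpose.
rotate <- function(x) t(apply(x, 2, rev))

m <- matrix(1:6, nrow = 2)
# m is:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

quarter_turn <- rotate(m)          # 3 x 2 matrix
half_turn <- rotate(rotate(m))     # 2 x 3 matrix, the original turned upside down

print(quarter_turn)
print(half_turn)
```

Applying rotate() twice, as plot_image() does in the listing, gives a 180 degree turn; four applications return the original matrix.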

3.3 HYPER-PARAMETER TUNING-I

Instead of employing the popular network architectures, we decided to modify and
experiment on our baseline model. We started with a deeper version of our baseline
with 5 convolutional layers instead of 2. We also altered the stride of the kernel
in each successive layer, and we tried both same and valid padding. With same
padding, when we pad a 5x5x1 image into a 7x7x1 image (one pixel of zero padding on
each side) and then apply a 3x3x1 kernel over it, the convolved matrix turns out to
have dimensions 5x5x1, the same as the original input.
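
The difference between same and valid padding can be reproduced with a small base-R convolution. This is a sketch, assuming a stride of 1 and zero padding; zero_pad and conv2d_valid are hypothetical helper names, and the all-ones kernel is chosen only to make the sums easy to check.

```r
# Surround a matrix with p rows/columns of zeros on every side.
zero_pad <- function(img, p) {
  padded <- matrix(0, nrow(img) + 2 * p, ncol(img) + 2 * p)
  padded[(p + 1):(p + nrow(img)), (p + 1):(p + ncol(img))] <- img
  padded
}

# Slide a square kernel over the image with stride 1 and no padding.
conv2d_valid <- function(img, kernel) {
  k <- nrow(kernel)
  out <- matrix(0, nrow(img) - k + 1, ncol(img) - k + 1)
  for (i in seq_len(nrow(out))) {
    for (j in seq_len(ncol(out))) {
      out[i, j] <- sum(img[i:(i + k - 1), j:(j + k - 1)] * kernel)
    }
  }
  out
}

img <- matrix(1:25, 5, 5)
kernel <- matrix(1, 3, 3)

valid_out <- conv2d_valid(img, kernel)   # no padding: output shrinks
padded <- zero_pad(img, 1)               # one pixel of zeros on each side
same_out <- conv2d_valid(padded, kernel) # same padding: output keeps input size

cat("valid:", dim(valid_out), "- padded input:", dim(padded),
    "- same:", dim(same_out), "\n")
```

Valid padding turns the 5 × 5 input into a 3 × 3 output, while padding to 7 × 7 first keeps the output at 5 × 5.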

The results seemed promising, as the performance increased by 6% on the validation
set and around 5% on the test set. Next, we decided to perform a hyper-parameter
search on the same network. We could only proceed by trial and error because
training takes around 4-5 hours, which made grid search infeasible.

While doing the hyper-parameter tuning we realized that the training accuracy never
went over 90%. Hence we tried a deeper network with less dropout and pooling and the
Adadelta optimizer. Optimizers are algorithms used to update the attributes of a
neural network, such as its weights and learning rate, in order to reduce the loss.
The reason for choosing Adadelta is that it improves on the earlier Adagrad
algorithm by introducing a history window that limits the number of past gradients
taken into consideration during training. In this way, we generally do not encounter
the issue of a vanishing learning rate. As we are using a ConvNet with several deep
layers, it is essential that the learning rate is maintained across iterations
rather than shrinking towards zero.

We thought Adadelta might be useful because it is the fastest to reach near
convergence, even if it might take a long time to converge fully. This proved to be
correct, and we observed a higher training accuracy and a slight improvement in
validation and test accuracy.
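
The Adadelta update described above can be sketched for a single parameter. This is a minimal illustration rather than the keras implementation: the decay rate rho and the constant eps are illustrative values, the objective f(x) = x^2 is a toy example, and adadelta_minimise is a hypothetical function name.

```r
# Minimise a one-dimensional function with Adadelta, given its gradient.
adadelta_minimise <- function(grad, x0, steps = 200, rho = 0.95, eps = 1e-6) {
  x <- x0
  avg_sq_grad <- 0    # decaying average of squared gradients (the "history window")
  avg_sq_delta <- 0   # decaying average of squared parameter updates
  path <- numeric(steps + 1)
  path[1] <- x
  for (t in seq_len(steps)) {
    g <- grad(x)
    avg_sq_grad <- rho * avg_sq_grad + (1 - rho) * g^2
    # Step size is the ratio of the two running RMS values; no global
    # learning rate appears and the denominator cannot grow without bound.
    delta <- -sqrt(avg_sq_delta + eps) / sqrt(avg_sq_grad + eps) * g
    avg_sq_delta <- rho * avg_sq_delta + (1 - rho) * delta^2
    x <- x + delta
    path[t + 1] <- x
  }
  path
}

# Toy objective f(x) = x^2 with gradient 2x, started at x = 5.
path <- adadelta_minimise(function(x) 2 * x, x0 = 5)
cat("start:", path[1], "- after one step:", path[2], "\n")
```

Because old squared gradients decay away instead of accumulating forever, the effective step size does not shrink to zero the way it can with Adagrad.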

3.4 ATTRIBUTE CLASSIFICATION

In the Clothing attribute dataset, each image has at most 23 clothing attribute
labels. For some of the images, certain attributes were not available because of
ambiguity among the human workers classifying them. The images in the Clothing
attribute dataset are in JPEG format.

For the classification, we considered only attributes with binary classes (most of
them being a yes/no classification). There are 23 such attributes, which include 11
colors, 5 clothing patterns, and scarf and collar identification. To account for the
varying image sizes, we resized the images to 100 × 100 for the baseline test.

3.5 HYPER-PARAMETER TUNING-II

Going forward, we decided to build a single multi-label binary classification CNN
instead of training multiple neural networks. This was a much faster and more robust
approach and gave us much better results than the baseline case. To account for the
missing labels in the training set, we randomly assigned them one of the classes out
of the 10 mentioned in the MNIST dataset. This approach is acceptable considering
that humans had difficulty identifying the classes themselves.

We used 2 convolution layers (with batchnorm, relu and dropout layers included), 1
pooling layer and a fully-connected layer. We used a sigmoid activation function
instead of the softmax activation used in the baseline, as it gave much better
accuracy. This is to be expected, since we were dealing with multi-label attributes:
softmax forces the class scores to compete and sum to 1, while sigmoid scores each
attribute independently. We employed cross-entropy loss and the Adadelta optimizer
to train the CNN. The cross-entropy metric between the labels and the predictions is
computed by an inbuilt function. It is usually employed when there are multiple
label classes (2 or more). It takes 4 optional arguments: name, dtype, from_logits
(by default the output is assumed to encode a probability distribution) and
label_smoothing. We performed hyper-parameter tuning to improve the best model.
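
The reason sigmoid suits multi-label attributes better than softmax can be shown numerically. The sketch below uses base R only; the three scores and labels are hypothetical, and binary_crossentropy is a hand-written helper, not the keras function.

```r
# Softmax: scores compete and the outputs sum to 1.
softmax <- function(z) { e <- exp(z - max(z)); e / sum(e) }

# Sigmoid: each score is mapped to a probability independently.
sigmoid <- function(z) 1 / (1 + exp(-z))

# Mean binary cross-entropy over all attributes, clipped for numerical safety.
binary_crossentropy <- function(y_true, y_prob, eps = 1e-7) {
  y_prob <- pmin(pmax(y_prob, eps), 1 - eps)
  -mean(y_true * log(y_prob) + (1 - y_true) * log(1 - y_prob))
}

logits <- c(2.0, 1.5, -1.0)   # hypothetical raw scores for three attributes
y_true <- c(1, 1, 0)          # two attributes are present at the same time

p_softmax <- softmax(logits)
p_sigmoid <- sigmoid(logits)

loss_softmax <- binary_crossentropy(y_true, p_softmax)
loss_sigmoid <- binary_crossentropy(y_true, p_sigmoid)

cat("softmax:", round(p_softmax, 3), "loss", round(loss_softmax, 3), "\n")
cat("sigmoid:", round(p_sigmoid, 3), "loss", round(loss_sigmoid, 3), "\n")
```

With softmax, the two attributes that are both present must share a total probability of 1, so neither can be scored highly without penalising the other; sigmoid has no such constraint.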

3.6 TRAINING AND EVALUATION

After building and compiling the CNN model comes the training and evaluation phase.
We used model %>% fit() to train the model. The main arguments passed are x_train,
y_train, the number of epochs for which the model is trained, batch_size (set
earlier) and verbose, which controls how much progress information is printed
during training.

Here, verbose is set to 1 so that a progress bar with the running loss and accuracy
is shown for each epoch. The validation data is supplied by passing the test set as
a list, validation_data = list(x_test, y_test). With 10,000 test images against
60,000 training images, the validation set is one-sixth the size of the training
set. The model loss and the corresponding accuracy for each successive epoch are
plotted, and the numeric values are printed on the console after each epoch
completes.

We also visually verified that the model attaches the correct labels to the
corresponding apparel images by using the which.max() function. It returns the
position of the first maximum value in a numeric vector. Here the numeric vector
holds the predicted probability of every label for one apparel image, so the
position of the highest probability identifies the label to attach to that image.
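
The label lookup with which.max() can be tried on a single prediction vector. The probabilities below are hypothetical; only the label names are taken from the code listing.

```r
clothes_labels <- c("T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
                    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot")

# Hypothetical softmax output for one image: one probability per class.
pred <- c(0.01, 0.00, 0.05, 0.02, 0.70, 0.00, 0.20, 0.00, 0.02, 0.00)

# which.max() is 1-based, so it indexes the label vector directly even
# though the class codes in the dataset run from 0 to 9.
best <- which.max(pred)
predicted_label <- clothes_labels[best]

# On ties, the first maximum wins.
tie_position <- which.max(c(0.5, 0.5))

cat("position:", best, "- label:", predicted_label, "\n")
```

The same indexing explains the "+ 1" in the plotting loops of the listing, where a stored class code c corresponds to position c + 1 in the label vector.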

4. RESULTS AND DISCUSSION

4.1 RESULT

Fig.1: Plot of 16 cherry-picked images from the training set


Fig.2: Plot of 36 cherry-picked images from the training set

From the above figure we can observe that a broader label is given to articles. For
instance, Travelling Bag, Hand Bag and Clutches are all given the broader label of
“Bag”, which leaves scope for future study of finer-grained classification.

The model is run for 13 epochs; the values of loss and accuracy are recorded in
Table 1.
Fig.3: Accuracy and loss for 13 epochs (also recorded in Table 1 given below)

Table 1: Epoch no. vs Validation Loss/Accuracy

Epoch no.   Validation loss   Validation accuracy
1           0.3560            0.8693
2           0.2937            0.8922
3           0.2593            0.9049
4           0.2498            0.9069
5           0.2359            0.9134
6           0.2270            0.9174
7           0.2199            0.9185
8           0.2189            0.9198
9           0.2248            0.9173
10          0.2165            0.9178
11          0.2143            0.9183
12          0.2120            0.9213
13          0.2019            0.9298
Fig.4: Plot of Loss and Accuracy of deployed CNN Model on Fashion MNIST Dataset

Fig.5: 16 cherry-picked images chosen from the test data and successfully classified.

Hence, apparel classification is successfully performed by deploying the CNN model
for three different numbers of epochs: 5, 7 and 13. We also tuned several
hyper-parameters and found that the accuracy is maximum (equal to 0.9298) at
epoch 13.
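
The best epoch can be read off the values of Table 1 programmatically. The sketch below simply copies the table into two vectors; the variable names are our own.

```r
# Validation loss and accuracy per epoch, copied from Table 1.
val_loss <- c(0.3560, 0.2937, 0.2593, 0.2498, 0.2359, 0.2270, 0.2199,
              0.2189, 0.2248, 0.2165, 0.2143, 0.2120, 0.2019)
val_acc <- c(0.8693, 0.8922, 0.9049, 0.9069, 0.9134, 0.9174, 0.9185,
             0.9198, 0.9173, 0.9178, 0.9183, 0.9213, 0.9298)

best_acc_epoch <- which.max(val_acc)    # epoch with the highest accuracy
best_loss_epoch <- which.min(val_loss)  # epoch with the lowest loss

cat("best accuracy", max(val_acc), "at epoch", best_acc_epoch, "\n")
cat("lowest loss", min(val_loss), "at epoch", best_loss_epoch, "\n")
```

Both criteria point to the last epoch, which also means the curves had not clearly flattened when training stopped.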

4.2 CONSTRAINTS AND TRADE-OFFS

The few misclassifications can be due to the presence of a large number of similar
variants within a particular category of apparel, the lack of proper image
resolution, and information lost through missing data entries or through activations
discarded by dropout around the flattening step, which is used to inhibit
over-fitting. These misclassifications can be viewed as arising from noisy
non-linear activations within the CNN; they can be kept in check by adding further
layers, the most important of which is the max-pooling layer. It acts as a noise
suppressant, helping to denoise the non-linear activations and providing the
dimensionality reduction needed for the desired classification at the end.
4.3 FUTURE WORK

● Broader labelling: A broader label is given to some apparel items, which leaves
room for future work on more explicit labelling. For example, Travelling Bag, Hand
Bag and Clutches are all given the broader label of “Bag”, which leaves scope for
future study of finer-grained classification.
● For this project, we built on top of the default AlexNet architecture using the
ImageNet pre-trained weights as a starting point for our work. As a next step, we
will try modified architectures more tailored to practical fashion classification
applications. For example, we are in the process of implementing spatial pyramid
pooling layers, which have been shown to speed up the R-CNN method by 24-102x.
● We can employ further data augmentation techniques, such as rotating the image and
flipping it along the horizontal axis, to further perturb the input image.
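
The flipping mentioned in the last point can be sketched on a pixel matrix in base R. Both flip directions are shown because "along the horizontal axis" can be read either way; the function names and the tiny test image are hypothetical.

```r
# Mirror an image left to right by reversing the column order.
flip_left_right <- function(img) img[, ncol(img):1, drop = FALSE]

# Mirror an image top to bottom by reversing the row order.
flip_up_down <- function(img) img[nrow(img):1, , drop = FALSE]

img <- matrix(1:6, nrow = 2)
# img is:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

mirrored <- flip_left_right(img)
upside_down <- flip_up_down(img)

print(mirrored)
print(upside_down)
```

Each flipped copy keeps the original label, so a training set can be enlarged without collecting new images. Whether a top-to-bottom flip is sensible for apparel is a separate question, since upside-down garments are unlikely in real queries.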

5. CONCLUSION

To summarize, we successfully implemented CNNs to perform the tasks of apparel
classification and attribute classification. The results obtained were decent and
show great potential for doing better with higher resolution images and more
sophisticated neural networks. The accuracy of the deployed model reaches its
maximum of 0.9298 after epoch 13 (several hyper-parameters were tuned to achieve
this accuracy, as described in Chapter 2 and Chapter 3). We could increase the
accuracy further by increasing the number of epochs, but that would also increase
the training time and the computational cost. So, there is a trade-off between
training time and classification accuracy. For applications where the emphasis is
on speed, we can choose a lower number of epochs, but where the emphasis is on
accuracy of classification, we can rely on a higher number of epochs.

In a nutshell, an accuracy of almost 93% indicates that out of the 10,000 validation
data entries we used, almost 9,300 apparel items are correctly classified by the
model we employed.
6. CODE

setwd("C:/Users/dj/Desktop/Winter SEM -2020-2021/EDA_J_COMPONENT")


#install.packages('readr')
library(readr)
#install.packages('keras')
library(keras)
train.data <- read_csv("fashion-mnist_train.csv",col_types = cols(.default = "i"))
test.data <- read_csv("fashion-mnist_test.csv",col_types = cols(.default = "i"))
# Data is 28 pixels big in width and height
img_rows <- img_cols <- 28
#' Data is transformed to matrix (because they are easier indexable) and pixels
#' are separated from labels.
x_train <- as.matrix(train.data[, 2:dim(train.data)[2]])
y_train <- as.matrix(train.data[, 1])
# Unflattening the data.
dim(x_train) <- c(nrow(x_train), img_rows, img_cols, 1)
# The same for the test set
x_test <- as.matrix(test.data[, 2:dim(test.data)[2]])
y_test <- as.matrix(test.data[, 1])
dim(x_test) <- c(nrow(x_test), img_rows, img_cols, 1)
clothes.labels <-c( "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
"Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot")
# Function to rotate a matrix by 90 degrees clockwise
rotate <- function(x) t(apply(x, 2, rev))
# Function to plot an image from a matrix x
plot_image <- function(x, title = "", title.color = "black") {
  dim(x) <- c(img_rows, img_cols)
  image(rotate(rotate(x)), axes = FALSE,
        col = grey(seq(0, 1, length = 256)),
        main = list(title, col = title.color))
}
# Plotting 16 cherry-picked images from the training set
par(mfrow = c(4, 4), mar = c(0, 0.2, 1, 0.2))
for (i in 1:16) {
  nr <- i * 10
  plot_image(x_train[nr, , , 1],
             clothes.labels[as.numeric(train.data[nr, 1] + 1)])
}
# Plotting 25 cherry-picked images from the training set
par(mfrow = c(5, 5), mar = c(0, 0.2, 1, 0.2))
for (i in 1:25) {
  nr <- i * 10
  plot_image(x_train[nr, , , 1],
             clothes.labels[as.numeric(train.data[nr, 1] + 1)])
}
# Plotting 36 cherry-picked images from the training set
par(mfrow = c(6, 6), mar = c(0, 0.2, 1, 0.2))
for (i in 1:36) {
  nr <- i * 10
  plot_image(x_train[nr, , , 1],
             clothes.labels[as.numeric(train.data[nr, 1] + 1)])
}
# Neural network model
batch_size <- 128
num_classes <- 10
epochs <- 13
input_shape <- c(img_rows, img_cols, 1)
x_train <- x_train / 255
x_test <- x_test / 255
# Convert class vectors to binary class matrices
y_train <- to_categorical(y_train, num_classes)
y_test <- to_categorical(y_test, num_classes)
#install.packages('magrittr')
#library(magrittr)
#install.packages('dplyr')
#install.packages('keras')
#library(keras)
#install.packages('tensorflow')
#tensorflow::install_tensorflow()
#tensorflow::tf_config()
#library(tensorflow)
#library(dplyr)
model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = c(5, 5), activation = 'relu',
                input_shape = input_shape) %>%
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = 'relu') %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_dropout(rate = 0.25) %>%
  layer_flatten() %>%
  layer_dense(units = 128, activation = 'relu') %>%
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = num_classes, activation = 'softmax')
# compile model
model %>% compile(
  loss = loss_categorical_crossentropy,
  optimizer = optimizer_adadelta(),
  metrics = c('accuracy')
)
# train and evaluate
model %>% fit(
  x_train, y_train,
  batch_size = batch_size,
  epochs = epochs,
  verbose = 1,
  validation_data = list(x_test, y_test)
)
scores <- model %>% evaluate(
  x_test, y_test, verbose = 0
)
cat('Test loss:', scores[[1]], '\n')
cat('Test accuracy:', scores[[2]], '\n')
# Test loss: 1.472362
# Test accuracy: 0.9057
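# Let's visually check the correctness of our model predictions.
# Use a 4 x 4 grid so that the 16 images fill one plot.
par(mfrow = c(4, 4), mar = c(0, 0.2, 1, 0.2))
for (i in 1:16) {
  nr <- i * 10
  tmpimg <- x_train[nr, , , 1]
  # the input layer accepts a 4D array: (samples, rows, cols, channels)
  dim(tmpimg) <- c(1, img_rows, img_cols, 1)
  pred <- model %>% predict(tmpimg)
  # which.max() gives the position of the most probable class
  plot_image(x_train[nr, , , 1],
             clothes.labels[which.max(pred)],
             "blue")
}
# Reset the plotting layout
par(mfrow = c(1, 1))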
7. REFERENCES

[1] Liu, S., Song, Z., Liu, G., Xu, C., Lu, H., Yan, S.: Street-to-Shop: Cross-Scenario
Clothing Retrieval via Parts Alignment and Auxiliary Set. CVPR (2012).

[2] Yang, M., Yu, K.: Real-time clothing recognition in surveillance videos. In: 18th
IEEE International Conference on Image Processing (2011).

[3] Bossard, Lukas, et al. “Apparel classification with style.” Computer
Vision - ACCV 2012. Springer Berlin Heidelberg, 2013. 321-335.

[4] Chen, Huizhong, Andrew Gallagher, and Bernd Girod. “Describing clothing by
semantic attributes.” Computer Vision - ECCV 2012. Springer Berlin Heidelberg,
2012. 609-623.

[5] Liu, Si, et al. “Fashion parsing with weak color-category labels.” Multimedia,
IEEE Transactions on 16.1 (2014): 253-265.

[6] Lorenzo-Navarro, Javier, et al. “Evaluation of LBP and HOG Descriptors for
Clothing Attribute Description.” Video Analytics for Audience Measurement.
Springer International Publishing, 2014. 53-65.

[7] Hara, Kota, Vignesh Jagadeesh, and Robinson Piramuthu. “Fashion Apparel
Detection: The Role of Deep Convolutional Neural Network and Pose-dependent
Priors.” arXiv preprint arXiv:1411.5319 (2014).

[8] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for
accurate object detection and semantic segmentation. CVPR, 2014.
