
Proposal Document For StyleBusters

Walid Mohamed, Sherif Wael, Abdelrahman Hesham, Adham Moataz


Supervised by: Dr. Fatma Helmy, Eng. Mahmoud ElSahhar
May 16, 2023

Proposal Version Date Reason for Change


1.0 6-October-2022 Proposal First version’s specifications are defined

Table 1: Document version history

GitHub: https://github.com/walidwalid23/StyleBustersApplication

Abstract
The main idea of this project is to apply deep learning techniques to extract style features from images of clothes, artworks, and logos, and then use these features to retrieve similar clothes, logos, and artworks from various websites. This will help online shoppers shop faster, help fashion designers find out whether their designs have been stolen, and help prevent the theft of artworks and the replication of logos.

1 Introduction
1.1 Background
According to Ecommerce benchmarks 2022 [1], there are currently over 2.14 billion online shoppers buying through 9.1 million online stores, and these numbers are increasing rapidly. According to art market research [2], the global arts market grew from $275.58 billion in 2021 to $448.92 billion in 2022. This growth in online stores, logo-making websites, and artworks brings new problems for online shoppers, fashion designers, artists, and logo makers. Online shoppers have to scroll through millions of products on millions of websites to find clothes in the styles they want, and fashion designers have to scroll through the same volume of products to find out whether an online retailer has stolen their designs. Artists have no way to discover whether their artworks are being stolen, and users of logo-making websites have no guarantee that their generated logos are unique. This project aims to use deep learning techniques to build a model that extracts features from clothes, logos, and artworks in order to search for items with similar styles. The model will then be integrated into an easy-to-use mobile application, providing a quick way for anyone to search for clothes, logos, and artworks by style.

1.2 Motivation
1.2.1 Academic
As e-commerce businesses, logo generators, and the artworks industry grew rapidly, online shoppers, fashion designers, and artists began to face the problem of scrolling through millions of websites and items to find a particular clothing design or artwork, while logo-generating websites have no guarantee of producing a unique logo for each customer. There is currently no permanent solution to this problem. A few e-commerce websites have made partial attempts: they offer an option to search for products by design, but this option is limited to their own products. From the academic perspective, we decided to take on the challenge of building an easy-to-use mobile application, based on state-of-the-art machine learning techniques, that lets users search for any clothing piece, logo, or artwork by its style attributes across many e-commerce, logo, and artwork websites with one click. Moreover, we plan to publish an academic paper on our solution.

1.2.2 Business
According to E-commerce benchmarks 2022 [1], 27% of the global population shops online, and global online sales during 2022 are estimated at around 5 trillion dollars. The global apparel market grew from $551.36 billion in 2021 to $606.19 billion in 2022 [3], and according to Statista [4], the global art market was valued at 65.1 billion dollars in 2021. Large logo-making websites also make millions of dollars in revenue each year. Any solution that gives online shoppers a better shopping experience, helps fashion designers search for stolen copies of their designs, lets logo makers verify that their created logos are unique, and helps detect stolen artworks will therefore have a great impact on these industries. There is a clear business need for our project.

1.3 Problem Statement


The main problems our system addresses are helping online shoppers shop faster across e-commerce websites and helping fashion designers find out whether their styles are being copied, by allowing them to search for clothes by design. The system will also help prevent artworks from being stolen, by allowing users to search for artworks with similar styles, and help prevent the replication of logos, all using deep learning techniques.

2 Project Description
Figure 1 below shows our project workflow:

Figure 1: StyleBusters System

2.1 Objectives
Our objective is to build an easy-to-use mobile application that helps users find clothes, logos, and artworks similar in style to the images they upload. The application should also rest on a solid technical foundation, since we will use advanced deep learning techniques to find similar styles.

2.2 Scope
The Main Features Of The System:

• The system will allow users to search for clothes by style across e-commerce websites by uploading an image of the desired piece of clothing.

• The system will allow the users to search for artworks that have a similar drawing style by uploading
an image of an artwork.

• The system will allow the users to search for identical logos by uploading an image of a logo.

The Expected Outcomes Of The System:

• A mobile application that will allow the users to search for similar style clothes or artworks or logos.

The Boundaries Of The System:

• The system will not be able to query through all the websites on the internet as this is not technically
possible.

2.3 Project Overview

Figure 2: Clothes System Overview

Logos System Overview Diagram:

Figure 3: Logos System Overview

Artworks System Overview Diagram:

Figure 4: Artworks System Overview

Systems Description:

• Dataset Gathering:

– For the Clothes System: We are going to use the DeepFashion2 dataset [?], which contains 491K diverse images across 13 clothing categories from both consumers and commercial shopping stores, to train the object detection model.
– For the Artworks System: We are going to use the refined WikiArt dataset [?], which contains more than 80,000 unique images from 1,119 different artists across 27 styles.

• Data Preprocessing: the gathered data will be preprocessed by adding suitable annotations to each image, rescaling the images, and normalizing the pixel values; the data will then be split into 80% for training and 20% for testing.

• Deep Learning Models:

– For the Clothes System: YOLO V8 [?] is used as an object detector and classifier to detect objects in the images uploaded by users, and a VGG model will be trained for classification and then used as a feature extractor for the detected objects and the objects retrieved from the web scraper query server.
– For the Logos System: a MobileNet model [?] pretrained on the ImageNet dataset [?] will be used as a feature extractor for the uploaded logo and the logos retrieved from the web scraper query server.
– For the Artworks System: a VGG (Visual Geometry Group) deep learning model will be trained for classification; after training, the feature extraction part of the model will be separated and used to extract style features from the uploaded artwork and the artworks retrieved from the web scraper query server.

• Similarity Comparison: cosine similarity [?] will be used to compute the similarity between the feature embeddings produced by the feature extractors; a retrieved image will be displayed to the user if its similarity is above a certain threshold.

• Web Scraping: a Node.js server will be responsible for web scraping, retrieving clothes, logos, and artworks from various reliable websites.

• Models Deployment: the trained feature extraction models will be deployed on a Flask server, which serves requests from the users of the mobile application, forwards queries to the Node.js web scraper, and responds to the users with the results.

• Mobile Application: a Flutter mobile application will be developed that allows users to upload images of clothes, logos, and artworks and retrieve items of similar styles.

2.4 Stakeholders
2.4.1 Internal
The internal stakeholders for this project are the team members, each of whom has a specific responsibility. The team members and their responsibilities are as follows:

1. Walid Mohamed: Team Leader - Building and Training the Convolutional neural network model and
developing the mobile application.

2. Sherif Wael: Implementing Object Detection Techniques and developing the mobile application.

3. Abdelrahman Hesham: Building and Training the Convolutional neural network model and develop-
ing the mobile application.

4. Adham Moataz: Performing Preprocessing and developing the mobile application.

2.4.2 External
• Online shoppers, fashion designers, logo makers, and artists.

This application’s main external stakeholders are people who shop for clothes online and want a faster way to search for certain clothing styles, fashion designers who want to know whether their designs are being stolen, logo makers who want to make sure their logos are unique, and artists who want to make sure a certain painting style is not copied from another artist.

3 Similar Systems
3.1 Academic
1. Laenen et al. [5]:

This paper targets the extraction of fashion attributes for use in a cross-modal search tool. The solution develops a neural network that learns from organic e-commerce data. The dataset used in this research, the Amazon Dresses dataset, consists of 53,689 images of dresses and their descriptions. The images show dresses from different categories, such as casual, bridesmaid, night out, mother of the bride, special occasion, cocktail, wedding, and wear to work. A noted limitation is that the retrieval and identification of fabrics need further refinement.

Figure 5: Image search results

2. Tuinhof et al. [?]:
This paper discusses building a product recommendation system aimed at increasing the probability of a user's purchase. To enhance the model's performance, a convolutional neural network (CNN) is trained to solve classification tasks and to extract hidden features by composing many convolutional layers. The researchers used two datasets, Fashion Category and Fashion Texture. The Fashion Category dataset consists of 11,851 images, and the Fashion Texture dataset consists of 7,342 images; further information is shown in Figure ??. The highest accuracy on the category dataset was 87.0% with BN-Inception, and the highest accuracy on the texture dataset was 80.0% with BN-Inception. Considering Siamese networks might have yielded higher accuracy than the reported results.

Figure 6: Classification accuracy for each attribute.

3. Du et al. [6]:
This paper targets building a web-scale visual search system for fashion and home products. The researchers trained detection and feature extraction models on a dataset containing a large number of home products and Amazon fashion clothes. Their best accuracy came from an image embedding model. Figure 7 shows the precision and recall of the localization models. The work could be extended by experimenting with user feedback signals to improve the customer experience.

Figure 7: Precision and recall of the localization models

4. Sun et al. [7]:

This paper addresses image style recognition. The researchers proposed a two-pathway version of deep CNNs such as VGG-19 and AlexNet, pretrained on the ImageNet classification dataset and fine-tuned for this task. They used three benchmark datasets: Flickr Style, AVA Style, and WikiPaintings. The AVA Style dataset has about 14,000 images with different styles, while the Flickr Style and WikiPaintings datasets consist of 85,000 and 80,000 images respectively. The differences between the datasets are shown in Figure 10. The best result on the WikiPaintings dataset with the VGG-19-based MNet was 60.4% mAP; on the Flickr Style dataset it was 41.0% mAP, and on the AVA Style dataset it was 65.5% mAP. These results would need to improve further for the approach to be practical.

5. Park et al. [8]:

This paper discusses accurately retrieving exact products from a massive dataset of fashion products given a query image. To enhance fashion product detection performance, pretrained detection models built on multiple modern convolutional neural network architectures are utilized. The researchers use three benchmarks on the DeepFashion dataset, with label information for attributes, categories, landmarks, and bounding boxes. The highest accuracy was 90.9% with SEResNeXt50+OC, as shown in Figure 8. Trying other CNN models, such as Faster R-CNN and R-CNN, might have yielded higher accuracy.

Figure 8: Quantitative comparison of category classification on category prediction of DeepFashion dataset

6. Tuinhof et al. [9]: This paper discusses building a product recommendation system aimed at increasing the probability of a user's purchase. To enhance the model's performance, a convolutional neural network (CNN) is trained to solve classification tasks and to extract hidden features by composing many convolutional layers. The researchers used two datasets, Fashion Category and Fashion Texture, consisting of 11,851 and 7,342 images respectively. The highest accuracy on the category dataset was 87.0% with BN-Inception, and the highest accuracy on the texture dataset was 80.0% with BN-Inception, as shown in Figure ??. Additional classification tasks, such as gender or color, could be added to the model to further increase accuracy.

Figure 9: Final BN-Inception classification results on the Fashion datasets for category and texture

7. Jia et al. [10]:

The problem in this paper is the visual analysis of clothing. The researchers proposed a version of the Faster R-CNN model trained on a dataset of fashion images. They use the DeepFashion dataset, which contains more than 800,000 fashion images ranging from well-posed shop images to unconstrained consumer photos, as shown in Figure 10. The model achieved 8.9% precision and 24% recall per attribute type. The training data covered about 46 categories; some categories have more than 70,000 images while others have only tens or hundreds. This imbalance makes it harder to train the model to detect the minor categories.

Figure 10: Test Results

8. Li et al. [11]:

In this paper, the researchers discuss two main problems faced by the e-commerce industry. The first is faced by sellers when uploading images of products for sale, because of the manual tagging involved. The second is faced by customers who do not know the right keywords to search for a product but have a visual impression of it. The researchers used deep convolutional neural networks (CNNs) to address both problems. The dataset was real-life fashion images from an Indian e-commerce website, consisting of 44,424 images along with product descriptions and attributes. The model was tested on three category groupings: Gender and Master Category, Sub-Category, and Article Type. It achieved 88.4% accuracy for Gender and Master Category, 96.6% for Sub-Category, and 87.3% for Article Type, as shown in Figure 11. A limitation is that the dataset included mislabelled products and low-resolution images.

Figure 11: model performance

9. Kim et al. [12]:

This paper addresses the analysis of fashion images, a problem that has attracted significant research attention due to the availability of large-scale fashion datasets with rich annotations. The authors proposed a convolutional neural network (CNN) model to detect fashion items. They used the DeepFashion2 dataset, which consists of more than 200,000 fashion images with annotations for 13 classes; each class has 294 unique landmarks and specific keypoints, and the training set averages about 1,000 images per class. The proposed model's highest accuracy was 79.9%, while an R-CNN they built achieved 82%, the highest accuracy for this dataset, as shown in Figure 12. The proposed model did not achieve the highest accuracy because no image preprocessing was applied.

Figure 12: Results of performance comparison with the existing methods

10. Shen et al. [13]:

The main problem in this research is discovering near-duplicate patterns across large numbers of artworks. The researchers approach it by adapting a standard deep feature, fine-tuning it on a specific art collection with self-supervised learning. They used a dataset of 1,587 artworks by Jan Brueghel and his workshop, in which they annotated 273 near-duplicate details. Their method achieved 88.5% as the highest accuracy on LTLL and 85.7% as the highest accuracy on Oxford, as shown in Figure 13. The system's performance and implementation could be improved further.

Figure 13: Classification accuracy on LTLL and retrieval mAP on Oxford5K

3.2 Business Applications
1. StyleSnap [14]: StyleSnap is an AI-powered feature created by Amazon that helps online shoppers by letting them upload screenshots of the designs they want to search for and returning Amazon products with similar designs.

Figure 14: StyleSnap

2. Oxford Painting Search [15]: Oxford Painting Search is an AI-powered tool created by Oxford that helps people search for paintings by uploading screenshots of the paintings they want to find; it returns similar paintings from the BBC and Art UK collections.

Figure 15: Oxford Painting Search

4 What is new in the Proposed Project?
We are going to create a new kind of visual search engine that searches for similar-style clothes, logos, and artworks remotely. It will be implemented in a mobile application for public use. We aim to achieve that by:

• Creating web scrapers to scrape various clothing, logo, and artwork websites.

• Training deep learning feature extractors using datasets of clothes and artworks which are classified
by style.

• Training an object detection model to detect and classify clothes in the uploaded images.

• Developing a mobile application and connecting all the system components together.
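The web scrapers will run on Node.js, but the core idea behind them, pulling candidate product image URLs out of a retailer's listing page, can be sketched with Python's standard library. The markup and the `product-image` class name below are made up purely for illustration; real sites use their own markup and require per-site parsing rules.

```python
from html.parser import HTMLParser

class ProductImageParser(HTMLParser):
    """Collects the src of every <img> tagged with a (hypothetical) product-image class."""

    def __init__(self):
        super().__init__()
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and "product-image" in (attrs.get("class") or ""):
            self.image_urls.append(attrs.get("src"))

# Made-up listing-page HTML for illustration only.
sample_html = """
<div class="grid">
  <img class="product-image" src="https://example.com/shirt1.jpg">
  <img class="logo" src="https://example.com/sitelogo.png">
  <img class="product-image" src="https://example.com/shirt2.jpg">
</div>
"""
parser = ProductImageParser()
parser.feed(sample_html)
print(parser.image_urls)  # the two product images, not the site logo
```

The extracted URLs would then be downloaded and passed to the feature extractors so their embeddings can be compared against the user's query image.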

5 Proof of concept
So far, we have worked on the following:

• We have developed logo and artwork web scrapers.

• We have been able to extract logo features using a pretrained MobileNet deep learning feature extractor.

• We have trained a YOLO V8 object detection model to detect and classify clothes.

• We have trained a deep learning feature extraction model to be able to extract style features from
artworks.

Figure 16 below is a screenshot of our GitHub page:

Figure 16: GitHub

6 Project Management and Deliverables
6.1 Deliverables
This project will produce a mobile application that allows users to search for clothes, logos, and artworks with similar styles.

The deliverables are:

1. Mobile Application: 30 January 2023

2. Academic Research Papers: 10 October 2022

3. SRS document: 30 November 2022

4. SDD document: 1 February 2023

5. Final Thesis document: 1 May 2023

6.2 Tasks and Time Plan

Figure 17: Time Plan

Figure 18: GANTT CHART

6.3 Budget and Resource Costs


The only cost needed to be paid is the cost of the Google Play developer account.
Costs:

• 25 USD for the Google Play developer account.

7 Supportive Documents

References
[1] https://www.tidio.com/blog/online-shopping-statistics/.

[2] https://www.thebusinessresearchcompany.com/report/arts-global-market-report.

[3] https://www.thebusinessresearchcompany.com/report/apparel-global-market-report.

[4] https://www.statista.com/topics/1119/art-market/.

[5] Katrien Laenen, Susana Zoghbi, and Marie-Francine Moens. Cross-modal search for fashion at-
tributes. In Proceedings of the KDD 2017 Workshop on Machine Learning Meets Fashion, volume
2017, pages 1–10. ACM, 2017.

[6] Ming Du, Arnau Ramisa, Amit Kumar KC, Sampath Chanda, Mengjiao Wang, Neelakandan Rajesh,
Shasha Li, Yingchuan Hu, Tao Zhou, Nagashri Lakshminarayana, et al. Amazon shop the look: A
visual search system for fashion and home. 2022.

[7] Tiancheng Sun, Yulong Wang, Jian Yang, and Xiaolin Hu. Convolution neural networks with two
pathways for image style recognition. IEEE Transactions on Image Processing, 26(9):4102–4113,
2017.

[8] Sanghyuk Park, Minchul Shin, Sungho Ham, Seungkwon Choe, and Yoohoon Kang. Study on fashion
image retrieval methods for efficient fashion visual search. In Proceedings of the IEEE/CVF Confer-
ence on Computer Vision and Pattern Recognition Workshops, pages 0–0, 2019.

[9] Hessel Tuinhof, Clemens Pirker, and Markus Haltmeier. Image-based fashion product recommen-
dation with deep learning. In International conference on machine learning, optimization, and data
science, pages 472–481. Springer, 2018.

[10] Menglin Jia, Yichen Zhou, Mengyun Shi, and Bharath Hariharan. A deep-learning-based fashion
attributes detection model. arXiv preprint arXiv:1810.10148, 2018.

[11] Fengzi Li, Shashi Kant, Shunichi Araki, Sumer Bangera, and Swapna Samir Shukla. Neural networks
for fashion image classification and visual search. arXiv preprint arXiv:2005.08170, 2020.

[12] Hyo Jin Kim, Doo Hee Lee, Asim Niaz, Chan Yong Kim, Asif Aziz Memon, and Kwang Nam Choi.
Multiple-clothing detection and fashion landmark estimation using a single-stage detector. IEEE Ac-
cess, 9:11694–11704, 2021.

[13] Xi Shen, Alexei A Efros, and Mathieu Aubry. Discovering visual patterns in art collections with
spatially-consistent feature learning. In Proceedings of the IEEE/CVF conference on computer vision
and pattern recognition, pages 9278–9287, 2019.

[14] https://www.amazon.com/stylesnap.

[15] http://zeus.robots.ox.ac.uk/artsearch/.
