A Machine Learning Based Adult Content Detection Using Support Vector Machine

Conference Paper · March 2020
DOI: 10.23919/INDIACom49435.2020.9083700


Proceedings of the 14th INDIACom; INDIACom-2020; IEEE Conference ID: 49435
2020 7th International Conference on “Computing for Sustainable Global Development”, 12th - 14th March, 2020
Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi (INDIA)

A Machine Learning Based Adult Content Detection Using Support Vector Machine
Ganesh Gajula Ajinkya Hundiwale Shreyas Mujumdar
SIES Graduate School of Technology SIES Graduate School of Technology SIES Graduate School of Technology
University of Mumbai University of Mumbai University of Mumbai
Mumbai India Mumbai India Mumbai India
ganeshgajula22@gmail.com ajinkyahundiwale@gmail.com shreyasmujumdar@gmail.com

Mrs. Saritha L.R


SIES Graduate School of Technology
University of Mumbai
Mumbai India
saritha.r@siesgst.ac.in

Abstract—In the era of the internet, recognizing pornographic images is of great significance for protecting children’s physical and mental health. Young children surfing the internet are just one click away from pornographic images. However, this task is very challenging because the key pornographic content (e.g. breasts, private parts) in an image often lies in small local regions. The proposed model is based on the supervised Support Vector Machine (SVM) algorithm, which returns whether an image is safe or unsafe. The proposed model not only classifies an image as safe or unsafe but also blurs/colors the exposed skin portion completely black, using an image processing technique, if the image is found to be unsafe (i.e. pornographic), so that the end user cannot see exposed private parts in the image. Tests on our newly collected large-scale dataset demonstrate the effectiveness of the proposed method, which achieves an accuracy of ~91% on 4k pornographic images and 4k normal images.

Keywords—adult content detection, SVM, pornographic image recognition, pornographic image blur, safe browsing

I. INTRODUCTION
Adult content detection is an important and challenging task, especially given the large amount of freely available content on the web, as it involves filtering adult images and blurring them before they reach the end user. Also, many film production boards have implemented rating models for movies so that viewers know about the presence of adult content in those films. In this model, pornographic images are detected on the basis of the percentage of skin exposed in the image. If the image is found to be pornographic, it is blurred. This ensures that the end user does not see a pornographic image that suddenly pops up while surfing the internet. The model is first trained on the available dataset. To classify images as pornographic or not, we use the Support Vector Machine (SVM) algorithm. The SVM has a hyperplane which separates the data points into two partitions according to their class. After an image is determined to be pornographic, it is coloured black using image processing [16]. Thus, the adult content detection model helps filter unwanted/adult images.

II. LITERATURE SURVEY
A. Pornographic image recognition using feature based approach
Early works focus on classifying the image based on the percentage of skin exposed in the image. A fixed threshold value is set, and if the percentage of skin exposed in the image is above this threshold, the image is classified as pornographic [6],[7],[8]. Further classification involves feature-based, region-based and body-part-based approaches. The feature-based approach involves extracting important features from the entire image. Such features include the bag-of-features (BoF) approach [1],[2],[3] as well as the deep convolutional neural network (CNN) approach [4],[5]. The BoF approach captures local patterns of the entire image but lacks discriminative power over the image as a whole. In contrast, the CNN-based approach can automatically learn to discriminate images from a large dataset [8]. But, since such methods directly adopt a CNN architecture to model the entire pornographic image, some crucial local details (e.g. private parts) are largely ignored.
B. Pornographic image recognition using region based approach
The region-based approach extracts features based on the detection of regions in the image [9],[10]. Further, based on region detection, features such as hand geometry and shapes pertaining to private parts (e.g. breasts) are widely detected through the region-based approach [14]. Thus, as compared to the previous
feature-based approach for classification of images, the region-based approach plays a vital role in classification. The region-based approach is much more robust, but there is a risk of detecting inappropriate regions, since skin detection is a challenging task.

III. PROPOSED APPROACH
The proposed approach is completely based on machine learning. The model is first trained on the training dataset, on which the accuracy obtained was 94%, while on testing the accuracy was 91%. The model gives 4% false positives by classifying some non-adult images as adult. The classification of an image as pornographic or not is based entirely on the percentage of skin exposed in the image. The Support Vector Machine (SVM) algorithm is used for this classification [11]. The SVM algorithm works on the basis of a hyperplane which separates the data points of the two classes. The hyperplane is also called the decision boundary, as it decides to which class new data belongs. The hyperplane is selected such that it has the maximum margin, i.e. the width of the margin is maximum. This is because the maximum-margin hyperplane helps to classify future data points accurately as pornographic or not. Our dataset is non-linear in nature, therefore we use a non-linear SVM with a kernel function. The kernel function's task is to convert the low-dimensional feature space into a high-dimensional feature space [11]. When the data is given to the kernel function as input it is in non-separable form; once it is converted into the high-dimensional feature space it becomes separable and can be classified. After classification, an image found to be pornographic is coloured completely black using an image processing technique that converts the RGB colour model into the HSV colour model, so that the end user cannot see such content [16]. Thus, the model works on the basis of the SVM classification algorithm and, if the image is found to be pornographic, it is turned black, ensuring the safety of kids while browsing the web and giving users a smooth experience on social media sites.

IV. SUPPORT VECTOR MACHINE ALGORITHM
Support Vector Machine (SVM) has a similar origin to neural networks. Initially, we trained the model so that it can be used in further testing with random images. SVM builds up this model based on statistical learning, and the process of building up a model and tuning parameters can be finished within a certain duration. Each record of information is a vector of attributes that should be as representative as possible for that record of data. Since each record is formed as a vector, and the records closest to the decision boundary (the support vectors) determine it, the method is called a Support Vector Machine [11],[12],[13].
The SVM algorithm is used to classify images as pornographic or not. Our dataset is non-linear, i.e. the data points of the two classes (porn and non-porn) cannot be separated linearly [15]. For that we have used a non-linear SVM with a kernel function. The kernel function's task is to map the low-dimensional feature space into a high-dimensional feature space [11],[12],[15]. When the data is given to the kernel function as input it is in non-separable form; once it is converted into the high-dimensional feature space it becomes separable and can be classified.
The SVM diagram below shows how the separation between two data classes takes place.

Fig. 1. Support Vector Machine Implementation Diagram

In fig. 1 the slanted middle line is the hyperplane which separates the data points of the two classes. The two lines on either side of the hyperplane are drawn such that each passes through the closest data points of one class, i.e. the support vectors. Here, the two classes are separated by the hyperplane so that the model's prediction is accurate and the images can be classified according to their class (i.e. porn and non-porn) [15].

Fig. 2. Hyperplane Selection in Support Vector Machine

In fig. 2 the hyperplane selection criterion is explained. The hyperplane is selected such that the margin is maximum, i.e. the distance across the hyperplane between the closest data points of the two classes (the support vectors) is as large as possible, so that the model's prediction is as accurate as possible. Further, even if the amount of data grows in the future, the accuracy of the model should not decrease, because the maximal-margin hyperplane keeps the data points of the two classes from lying close to each other [11],[12].
The dataset we have contains 4k pornographic images and 4k normal images. Some of these images are such that they
cannot be directly distinguished as pornographic or not. So, in such cases we use a non-linear SVM with a kernel function, as stated in [11],[12],[15]. The kernel function converts the low-dimensional feature space into a high-dimensional feature space so that the data points of the two classes, which are initially non-separable, become separable. In the figures below, we can see that one-dimensional data whose points are non-separable at first is converted, after passing through the kernel function, into two-dimensional data with separable data points [15]. Likewise, two-dimensional data which is initially non-separable is converted by the kernel function into three-dimensional data with separable data points.

Fig. 3. Kernel function used in non-linear SVM.

In fig. 3 the kernel function is used to convert the low-dimensional feature space (i.e. the non-separable data points) into a high-dimensional feature space, in which the data becomes separable.

Fig. 4. Non-linear SVM and kernel function on 1-dimensional non-linear data points.

In fig. 4 the non-linear data points in one dimension, which cannot be separated into two classes by a hyperplane, are given to the kernel function as input in the form of a low-dimensional feature space. The kernel function converts it into a high-dimensional feature space, where the data points lie in two dimensions and can now be separated using a hyperplane.

Fig. 5. Non-linear SVM and kernel function on 2-dimensional non-linear data points.

In fig. 5 the non-linear data points in two dimensions, which cannot be separated into two classes by a hyperplane, are given to the kernel function as input in the form of a low-dimensional feature space. The kernel function converts it into a high-dimensional feature space, where the data points lie in three dimensions and can now be separated using a hyperplane.

V. RESULTS AND DISCUSSION
A. Implementation when pornographic image is being uploaded onto the model

Fig. 6. Command prompt execution

In fig. 6 the initial command prompt window is shown, in which the model is executed.

Fig. 7. After the pornographic image is uploaded onto the model, it displays whether the image is safe or unsafe, and it also displays the accuracy.
In fig. 7 the pornographic image is loaded onto the model and, based on the skin exposed in the image, it is classified as an unsafe image, with a reported accuracy of the image being pornographic close to 89.094%.

Fig. 8. The pornographic image is then colored black.

In fig. 8 the pornographic image that was previously loaded onto the model and detected as unsafe is colored completely black wherever large areas of skin are exposed.

Fig. 9. After the pornographic image is uploaded onto the model, it displays whether the image is safe or unsafe, and it also displays the accuracy.

In fig. 9 the pornographic image is loaded onto the model and, based on the skin exposed in the image, it is classified as an unsafe image with an accuracy of 88.99%.

Fig. 10. The pornographic image is then colored black.

In fig. 10 the pornographic image that was previously loaded onto the model and detected as unsafe is colored completely black wherever large areas of skin are exposed. According to the classes defined in the algorithm, such as buttocks, breast and belly, the model detects the parts that match the classes in the trained model and makes those parts black. Detection depends to some extent on the clarity of the input image: the clearer the image, the faster the trained model or detector detects these parts. Fig. 10 is an animated adult image, so there is little difference in processing time between animated and real images (depending on training).

B. Implementation when non-pornographic image is being uploaded onto the model

Fig. 11. After the non-pornographic image is uploaded onto the model, it displays whether the image is safe or unsafe.

In fig. 11 the non-pornographic image is loaded onto the model and, based on the skin exposed in the image, it is classified as a safe image.

Fig. 12. The non-pornographic image is not colored black; it remains the same.

In fig. 12 the non-pornographic image that was previously loaded onto the model and detected as safe is not colored black, as the skin area exposed is very small or negligible.

Fig. 13. After the non-pornographic image is uploaded onto the model, it displays whether the image is safe or unsafe.

In fig. 13 the non-pornographic image is loaded onto the model and, based on the skin exposed in the image, it is classified as a safe image.

Fig. 14. The non-pornographic image is not colored black; it remains the same.

In fig. 14 the non-pornographic image that was previously loaded onto the model and detected as safe is not colored black, as the skin area exposed is very small or negligible.
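The behaviour illustrated in these figures (an unsafe image has its exposed skin regions coloured black, a safe image is left unchanged) can be sketched in a few lines of Python. This is only a minimal illustration and not the authors' implementation: a fixed skin-fraction threshold stands in for the trained SVM classifier, and the HSV skin bounds, the 0.30 threshold and the names `is_skin`, `skin_fraction` and `censor` are assumptions chosen for the example. Only the standard-library `colorsys` module is used for the RGB to HSV conversion.

```python
import colorsys

# Assumed values for illustration only; the paper does not state them.
SKIN_HUE_MAX = 50 / 360          # assumed upper hue bound for skin tones (hue in [0, 1])
SKIN_SAT_RANGE = (0.15, 0.70)    # assumed saturation band for skin
SKIN_MIN_VALUE = 0.35            # assumed minimum brightness for skin
UNSAFE_SKIN_FRACTION = 0.30      # assumed threshold standing in for the SVM decision


def is_skin(rgb):
    """Return True if an (R, G, B) pixel with 0-255 channels lies in the assumed HSV skin band."""
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return (h <= SKIN_HUE_MAX
            and SKIN_SAT_RANGE[0] <= s <= SKIN_SAT_RANGE[1]
            and v >= SKIN_MIN_VALUE)


def skin_fraction(image):
    """Fraction of pixels classified as skin; image is a list of rows of (R, G, B) tuples."""
    pixels = [p for row in image for p in row]
    return sum(is_skin(p) for p in pixels) / len(pixels)


def censor(image, threshold=UNSAFE_SKIN_FRACTION):
    """Return (label, image); skin pixels are blacked out only when the image is labelled unsafe."""
    if skin_fraction(image) < threshold:
        return "safe", image  # safe images are returned unchanged
    blacked = [[(0, 0, 0) if is_skin(p) else p for p in row] for row in image]
    return "unsafe", blacked


if __name__ == "__main__":
    skin, blue = (224, 172, 105), (30, 60, 200)
    print(censor([[skin, skin], [skin, blue]]))  # mostly skin pixels
    print(censor([[blue, blue], [blue, skin]]))  # mostly non-skin pixels
```

In a full system the label would come from the trained classifier rather than from a fixed threshold; the sketch only shows how the safe/unsafe decision and the blackening step fit together.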
Fig. 15. After the non-pornographic image is uploaded onto the model, it displays whether the image is safe or unsafe.

In fig. 15 the non-pornographic image is loaded onto the model and, based on the skin exposed in the image, it is classified as a safe image.

Fig. 16. The non-pornographic image is not colored black; it remains the same.

In fig. 16 the non-pornographic image that was previously loaded onto the model and detected as safe is not colored black, as the skin area exposed is very small or negligible.

In the above figures we can see that after a non-adult image is uploaded onto the model it is classified as safe or unsafe. If the image is safe, it is not colored black. Thus, the model ensures that if an image that pops up at random while surfing the internet is pornographic, it is immediately detected and colored completely black. On the contrary, if the image is non-pornographic, the original image is preserved and hence it is not colored black.

VI. ADVANTAGES AND APPLICATIONS
The adult content detection model can be used to protect children from exposure to widely available adult content on the internet. It can also be used as a model that works similarly to child mode on a mobile phone, except that here the child will not even see the image.
In many countries, before a movie is released in theatres it is passed through a censor board, which reviews the movie; if it contains a large number of adult scenes, the board is likely to cut those scenes completely, or sometimes it blurs those scenes and rates the movie as adult (displaying A or 18+ only on the poster of the movie). So, instead of the censor board checking each and every movie for the presence of adult content, the existing model can be extended to work on videos, so that wherever adult scenes are present in the movie they are blurred completely during playback. It will provide an API that examines an image and returns a result, including possible false positives; it can also provide safe search and place an image in the medical, adult or violence category. The API of this model can be used in any website or Android application so that any content or advertisements containing adult content are automatically blurred. YouTube Kids works in a similar way.

VII. CONCLUSION AND FUTURE SCOPE
An adult content detection model is essential for maintaining privacy and security while surfing the web, for kids as well as adults, as no one wants a random advertisement popping up and displaying unwanted content. We can further improve efficiency by reducing the time taken to blur an image from the moment it pops up on the screen.
The adult content detection model can also be extended to work on videos, so that one can determine whether a video contains adult content at any point in time. This can be done by dividing the entire video into segments of 3 seconds each and then applying the model described above to each segment.

REFERENCES
[1] A. P. Lopes, S. E. de Avila, A. N. Peixoto, R. S. Oliveira, and A. de A Araujo. A bag-of-features approach based on hue-sift descriptor for nude detection. In Signal Processing Conference, 2009 17th European, pages 1552–1556. IEEE, 2009.
[2] A. Ulges and A. Stahl. Automatic detection of child pornography using color visual words. In Multimedia and Expo (ICME), 2011 IEEE International Conference on, pages 1–6. IEEE, 2011.
[3] L. Sui, J. Zhang, L. Zhuo, and Y. Yang. Research on pornographic images recognition method based on visual words in a compressed domain. IET Image Processing, 6(1):87–93, 2012.
[4] M. Moustafa. Applying deep learning to classify pornographic images and videos. arXiv preprint arXiv:1511.08899, 2015.
[5] F. Nian, T. Li, Y. Wang, M. Xu, and J. Wu. Pornographic image detection utilizing deep convolutional neural networks. Neurocomputing, 2016.
[6] S. Ren, K. He, R. Girshick, J. Sun, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, IEEE Transactions on Pattern Analysis and Machine Intelligence (2016) 1–1.
[7] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[8] A.P.B. Lopes, S. E. F. d. Avila, A. N. A. Peixoto, R. S. Oliveira, M.d.M. Coelho, A.d.A. Araújo, Nude detection in video using bag-of-visual-features, in: XXII Brazilian Symposium on Computer Graphics and Image Processing, 2009, pp. 224–231.
[9] Q.-F. Zheng, W. Zeng, W.-Q. Wang, and W. Gao. Shape-based adult image detection. International Journal of Image and Graphics, Volume 6, Issue 1, pp. 115-124, 2006.
[10] Q. Zhu, C.-T. Wu, K.-T. Cheng, and Y.-L. Wu. An adaptive skin model and its application to objectionable image filtering. In Proceedings of the 12th annual ACM international conference on Multimedia, pages 56–63. ACM, 2004.
[11] Lin, Yu-Chun & Tseng, Hung-Wei & Chiou-Shann. (2003). Pornography detection using support vector machine.
[12] Holger Frohlich, Olivier Chapelle, and Bernhard Scholkopf. Feature selection for support vector machines by means of genetic algorithms. In Proc. International journal on artificial intelligence tools, pages 142-148. IEEE Computer Society, 2003.
[13] S. Ji, W. Xu, M. Yang, K. Yu, 3d convolutional neural networks for human action recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (1) (2013) 221–231.
[14] Y. Huang and A. W. K. Kong, "Using a CNN ensemble for detecting pornographic and upskirt images," in Biometrics Theory, Applications and Systems (BTAS), 2016 IEEE 8th International Conference on, 2016, pp. 1-7.
[15] Hajar Bouirouga, Sanaa El Fkihi, Abdeilah Jilbab, and Driss Aboutajdine. Comparison of performance between different SVM kernels for the identification of adult video. World Academy of Science, Engineering and Technology, 2011.
[16] Basilio, Jorge & Torres, Gualberto & Sanchez-Perez, Gabriel & Medina, Linda & Perez-Meana, Hector & Escamilla-Hernandez, Enrique. (2011). Explicit Content Image Detection. Signal & Image Processing. 1. 10.5121/sipij.2010.1205.
