BaitAlarm: Detecting Phishing Sites Using Similarity in Fundamental Visual Features

2013 5th International Conference on Intelligent Networking and Collaborative Systems
Jian Mao (1,2), Pei Li (1), Kun Li (1), Tao Wei (3), and Zhenkai Liang (4)
(1) School of Electronic and Information Engineering, BeiHang University, China
(2) The State Key Laboratory of Integrated Services Networks, Xidian University, China
(3) Institute of Computer Science and Technology, Peking University, China
(4) School of Computing, National University of Singapore, Singapore
978-0-7695-4988-0/13 $26.00 © 2013 IEEE
DOI 10.1109/INCoS.2013.151

Abstract: In this paper, we present a new solution, BaitAlarm, to detect phishing attacks using features that are hard to evade. The intuition of our approach is that phishing pages need to preserve the visual appearance of the target pages. We present an algorithm to quantify the suspiciousness ratings of web pages based on the similarity of visual appearance between web pages. Since CSS is the standard technique to specify page layout, our solution uses CSS as the basis for detecting visual similarities among web pages. We prototyped our approach as a Google Chrome extension and used it to rate the suspiciousness of web pages. The prototype shows the correctness and accuracy of our approach with a relatively low performance overhead.

I. INTRODUCTION

Phishing is a form of social engineering attack in which an attacker mimics electronic communications to lure users into providing their confidential information. Such communications trick users into visiting phishing web sites, which collect users' private information, such as passwords, credit card numbers, and social security numbers. According to the investigation report from APWG [1], phishing attacks increased 50% per month, and around 5% of phishing mails attract users to visit the phishing web sites.

A widely used type of solution detects phishing URLs and alerts users before they visit the URLs. For example, the Bayesian anti-phishing toolbar [2], [3] maintains a blacklist database of phishing sites. Special characteristics of web sites hosting phishing pages, such as the lifetime and the registration date of a web site, can also be used to detect phishing attacks [4]–[6]. However, the features that such solutions are based on, such as URL strings, are not fundamental features of phishing pages. As a result, it is not hard for attackers to find ways to evade such defense mechanisms.

Since phishing pages need to lure users by their visual appearance, i.e., page contents and page layouts, they are usually similar to the target pages. Recent solutions [7], [8] check whether the contents of the page being visited are similar to other pages indexed by search engines. However, such solutions can be confused by attackers who embed invisible contents. To capture the similarity in visual appearance, a few solutions [9]–[11] are based on comparing images of rendered pages. However, these solutions are not efficient, and they can be affected by slight differences caused by different browser rendering engines. Moreover, if the target page cannot be indexed by search engines, such as a page that can be displayed only after a user login, the above solutions cannot be applied.

To robustly detect phishing sites, we aim to use fundamental visual features of a web page's appearance as the basis for detecting page similarities. In this paper, we propose a novel solution, BaitAlarm, to efficiently detect phishing web pages. Note that page layouts and contents are fundamental features of a web page's appearance. Since the standard way to specify page layouts is through style sheets (CSS), we develop an algorithm to detect similarities in key elements related to CSS.

We implemented BaitAlarm as a Google Chrome extension. Our evaluation on more than 7000 phishing pages confirms our assumptions. BaitAlarm achieved accurate results in detecting hundreds of samples from phishtank.com, a web collection of phishing attack samples.

II. OVERVIEW

A. Page Layout and CSS

The visual appearance of a web page is decided by its page layout and contents. To achieve a consistent appearance across all variants of web browsers, Cascading Style Sheets (CSS) is the standard technology for web pages to specify their visual appearance. When the user opens a web page, the browser captures the CSS structure of the page, which is a series of rules specifying visual properties for page elements.

A CSS rule includes two main components: a selector and one or more declarations. The selector is usually an HTML element, and each declaration consists of a property and a value. The property is the style attribute of the HTML element. Each property has a value [12].

Selectors can be split into several categories, such as tag selectors, id selectors, .class selectors, and other selectors (e.g., some attribute selectors). Properties describe the attributes of the elements that are selected by
the selectors. For example, a text paragraph has color, font-size, font-family, border, margin, and padding properties. The values of the properties determine the display effect of the selected element.

B. Overview of Our Approach

Intuitively, the higher the similarity between the phishing page and the target page, the more likely users will be deceived. For this reason, attackers try their best to clone the target pages. To maintain a consistent look across browsers, attackers also need to rely on CSS, as the target website does. Attackers need to use CSS that results in a visual appearance similar to that of the target page to lure users successfully.

To validate our intuition, we analyzed over 7000 samples from the database of phishtank.com. We found that there were two major ways for attackers to develop a phishing page. One is to link to the target web page's CSS directly, or to create a new CSS document and copy the content from the original CSS. Most of the phishing pages we analyzed were developed in this way. The other way is to completely rewrite the CSS document to generate an appearance similar to that of the original page. As a commercial web page has many page elements and complicated CSS rules, it is difficult for the attacker to use a completely different CSS that generates the same visual effect as the target web page. As a result, the attackers' CSS usually has the same set of properties as the target page's CSS after being parsed by the browser.

The main idea of our approach is to detect phishing pages based on the fundamental feature of phishing pages, that is, similarity in visual page layout. To effectively attract victims, attackers usually try their best to make a phishing page look similar to the target page.

However, comparing details of rendered pages is not efficient. We aim to address this problem by using key features of page layouts without actually rendering the pages. As CSS is the standard technology for specifying page layouts, it is not practical for attackers to achieve a uniform look across browsers without using CSS. Therefore, our solution extracts the key features in CSS and page contents, and uses them as a basis to decide the similarity of pages.

Advantage over existing solutions: There are two other kinds of existing phishing detection approaches that are based on the similarity of web pages: those based on page text and those based on rendered page images. Text-content-based methods [7], [8] detect phishing pages according to the frequency of web pages' keywords, some sensitive words, or the matching ratio between the suspicious page and the target page. These solutions have their limitations. For example, attackers may replace the text contents with an image showing the same content. In this way, the phishing page displays the same content as the original one, but anti-phishing tools cannot get the useful text content of the phishing page for comparison and detection. Attackers may also embed noise contents in the page's background color so that they are invisible to users. These methods can bypass text-content comparison without losing the visual similarity to the target page. Rendered-page-based methods [9]–[11] decide the similarity between two pages by comparing the pixels of their rendered pages. This approach has high overhead incurred by image parsing.

III. DESIGN

A. Phishing Page Detection Using Visual Features

The key component of our solution is an algorithm to measure page similarities. The algorithm takes two pages, including page contents and layout specifications, and outputs a similarity score based on the visual layout similarity. For each page, it first extracts the page layout and represents it in a normalized representation. Then it compares the normalized representations of the two pages and decides their similarity.

Page layout normalization

To extract formatting features from CSS, we first convert the CSS and page DOM into our own representation.

Step 1: Rule set extraction: When the user opens a web page, the browser can capture the CSS structure of the page. The CSS structure is a series of rules with the general representation

Selector_1 {Property_{1-1}: Value_{1-1}; Property_{1-2}: Value_{1-2}; ...};
Selector_2 {Property_{2-1}: Value_{2-1}; Property_{2-2}: Value_{2-2}; ...};
...

Step 2: Convert CSS rules into comparison-units: In order to calculate the similarity value more effectively, we convert the CSS rules into a new representation, which we call the comparison-unit.

Definition 1: (Comparison-Unit) Given a web page's CSS rule set,

\[ CSS(\cdot) = \{\ldots, [Selector_i \{\ldots; [Property_j : Value_k; \ldots], \ldots\}], \ldots\}, \]

the corresponding comparison-unit set of the web page is represented as

\[ CompUnit(\cdot) = \{\ldots, [Property_j : [\ldots; Value_k^j : [\ldots, Selector_i^{j,k}; \ldots], \ldots], \ldots\}. \]

A comparison-unit consists of two main parts: a property and one or more declarations. Each declaration consists of a value and one or more selectors.

In addition, we classify the selectors into four categories: Tag, ID, Class, and Others. For example, the selectors p and div belong to the Tag category. Class selectors belong to the Class category, ID selectors belong to the ID category, and all other selectors belong to the Others category.
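The conversion in Step 2 and the four selector categories can be sketched in a few lines of Python. This is a minimal illustration rather than the BaitAlarm implementation: the function names `classify_selector` and `to_comparison_units`, the input format (a list of selector and declaration pairs), the regular expressions used to recognize simple selectors, the lowercasing of property names, and the treatment of a grouped selector such as `p, div` as separate selectors are all assumptions made for this example.

```python
import re
from collections import defaultdict

# Hypothetical patterns for simple selectors; the paper names the four
# categories but does not specify how selectors are recognized.
_ID_RE = re.compile(r"#[\w-]+")
_CLASS_RE = re.compile(r"\.[\w-]+")
_TAG_RE = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def classify_selector(selector):
    """Map a selector to one of the categories Tag, ID, Class, Others."""
    s = selector.strip()
    if _ID_RE.fullmatch(s):
        return "ID"
    if _CLASS_RE.fullmatch(s):
        return "Class"
    if _TAG_RE.fullmatch(s):
        return "Tag"
    return "Others"  # e.g. attribute, pseudo-class, or compound selectors


def to_comparison_units(css_rules):
    """Invert (selector, {property: value}) rules into
    {property: {value: {selector, ...}}}, following Definition 1."""
    units = defaultdict(lambda: defaultdict(set))
    for selector_group, declarations in css_rules:
        # Assumption: a grouped selector such as "p, div" counts as
        # separate selectors that share the same declarations.
        for selector in selector_group.split(","):
            selector = selector.strip()
            if not selector:
                continue
            for prop, value in declarations.items():
                units[prop.strip().lower()][value.strip()].add(selector)
    # Convert the nested defaultdicts to plain dicts for easier comparison.
    return {prop: dict(values) for prop, values in units.items()}


rules = [
    ("p, div", {"color": "#333", "margin": "0"}),
    ("#login", {"color": "#333"}),
    (".btn", {"color": "red", "margin": "0"}),
]
units = to_comparison_units(rules)
for prop in sorted(units):
    for value in sorted(units[prop]):
        selectors = sorted(units[prop][value])
        print(prop, value, [(s, classify_selector(s)) for s in selectors])
```

Indexing by property and then by value means that two pages can be compared by walking matching (property, value) pairs and counting the selectors they share per category. Splitting on commas is a simplification that would mis-handle selectors containing commas inside parentheses or attribute values.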
Page similarity detection/computing

Before we describe our visual similarity computing algorithm, we first define three notions.

Definition 2: (Complexity Score) The Complexity Score of a web page is a fundamental visual layout metric. Given the comparison-unit set of web page A, CompUnit(A), the complexity of web page A is

\[ S_A = \sum_{n=1}^{N_A} \sum_{m=1}^{M_n} \left( k_t^{n,m} w_t + k_c^{n,m} w_c + k_i^{n,m} w_i + k_o^{n,m} w_o \right), \]

where $N_A$ is the number of web page A's properties; $M_n$ is the number of the n-th property's optional values $\{Value_m^n\}$; $k_t^{n,m}$, $k_c^{n,m}$, $k_i^{n,m}$, and $k_o^{n,m}$ represent the number of Tag, Class, ID, and Others selectors with the value $Value_m^n$, respectively; and $w_t$, $w_c$, $w_i$, $w_o$ are the corresponding weight values.

Definition 3: (Match Score) Given the comparison-units of web pages A and B, the Match Score of A and B, denoted $S_{match}^{A,B}$, is

\[ S_{match}^{A,B} = \sum_{n=1}^{N_A} \sum_{m=1}^{M_n} \left( e_t^{n,m} w_t + e_c^{n,m} w_c + e_i^{n,m} w_i + e_o^{n,m} w_o \right), \]

where $e_t^{n,m}$, $e_c^{n,m}$, $e_i^{n,m}$, and $e_o^{n,m}$ represent the number of equal selectors with the value $Value_m^n$ that belong to the Tag, Class, ID, and Others categories, respectively.

Definition 4: (Similarity) Given the comparison-units of web pages A and B, the Similarity between A and B is

\[ Sim(A, B) = \frac{match\_score(A, B)}{\min\{score(A), score(B)\}} = \frac{S_{match}^{A,B}}{\min\{S_A, S_B\}}. \]

Based on our analysis of phishing pages, the ID and Class selectors have more influence on visual layout similarity. Generally, different web pages should have different ID and Class selectors, especially when an ID selector has an unusual name.

Summary of our approach. Our visual-layout-similarity-based phishing detection scheme includes three phases.

Phase I: Extracting and normalizing the CSS structure of the suspicious page: Given a suspicious page Sus, we can get the CSS structure of the page, CSS(Sus). Then we convert CSS(Sus) into the normalized comparison-unit model of web page Sus, CompUnit(Sus).

Phase II: Computing the similarity between the suspicious page and the victim page: After we obtain the normalized model CompUnit(Sus), we match the two comparison-units of the suspicious page and the victim page, and compute the similarity score of the two pages, Sim(Sus, Vic).

Phase III: Making a decision based on the page similarity and additional features of the web pages: If the similarity Sim(Sus, Vic) exceeds a preset threshold, the suspicious page and the victim page should be the same page. If there is other evidence that the two pages are different, for example, the URLs of the two pages have different domains, we conclude that the suspicious page Sus is a phishing page and output our decision.

This is a first step toward our high-level idea of detecting page similarity using fundamental page features. It confirms our assumptions on the role of CSS in detecting phishing attacks.

B. BaitAlarm Architecture

The overall architecture of the BaitAlarm extension is shown in Figure 1. BaitAlarm includes three main components: Pre-Processor, Layout Monitor, and Network Library.

Figure 1: BaitAlarm Architecture. (Labels recoverable from the figure: Similarity Checker, Layout Model Builder, <Similarity Score>, <Comparison Unit>, <Decision>, <Page Info.>, <Whitelist/Blacklist>, <Surfing History Info.: Page URL, Layout Model, ...>, <Account Info.-Webpage Mapping Table>.)

The Pre-Processor consists of the Page Filter, the DOM, and the HTML Parser. After a web page is loaded, the Page Filter checks it. If the web page has been loaded before, it does not need further analysis. If the loaded page is new and contains some specific UI (e.g., a login form), the Page Filter triggers the detection process. The HTML Parser and the DOM extract the layout information of the suspicious page. When the user inputs personal information, such as a login ID, the browser holds the page and the Pre-Processor sends the layout information to the Layout Monitor.

The Layout Monitor consists of a Layout Model Builder and a Similarity Checker. When the Layout Monitor gets the layout information of the suspicious page from the Pre-Processor, the Layout Model Builder models it as a "comparison-unit" and sends it to the Similarity Checker, together with additional page features (e.g., the page domain). After the Similarity Checker gets the comparison unit of the suspicious page, it searches the Network Library for the victim page's feature model (comparison unit) indexed
by the same personal information that the user has entered before.

If the Similarity Checker does not find a matched page, it informs the browser to release the page and treat it as a newly registering web site. The Similarity Checker reports the page information and its layout model to the Network Library.

If the Similarity Checker finds a matched page (or pages), it gets its (their) layout model and additional page information. The checker calculates the similarity score of the pages and outputs the decision based on their similarity score and additional page information.

In our scheme, if a page's similarity score is less than the preset threshold, the page is considered innocent. The browser then releases the page, and the Similarity Checker reports the page information and its layout model to the Network Library. Otherwise, the Similarity Checker checks additional page information to make the decision. For example, if the pages have a relatively high similarity but their URLs have different domains, the suspicious page is regarded as a phishing page. The checker submits the related information to the Network Library and informs the browser to pop up a warning page.

The Network Library maintains the user's surfing history information (e.g., URL, layout model), a Whitelist/Blacklist, and a "Personal Info-Historical Page Mapping Table". The table is used to search for the victim pages based on users' information captured by the browser.

IV. IMPLEMENTATION AND EVALUATION

We developed BaitAlarm as an extension for the Google Chrome browser and used it to implement real-time phishing detection.

Our evaluation was performed on a computer with an Intel(R) Core(TM)2 Duo CPU (3.00GHz) and 2GB of memory. We used Google Chrome v21.0.1180.15. The phishing pages were collected from Phishtank.com, and the sample data set consists of 7764 phishing sites.

A. Training and Threshold Determining

We analyzed the 7764 phishing samples in our dataset and counted how frequently specific victim pages were forged. The statistical result is shown in Table I. We can see that Paypal is the most popular target page for phishing attacks (with a forging ratio of 25.48%); the next three most popular phishing target pages are Sulake Corp., AOL, and Blizzard. Altogether, 46.41% of the phishing samples target these four websites. We use them and their phishing pages for BaitAlarm system training and threshold adjustment.

Table I: Statistical Distribution of Target Pages
Target Page     Number   Ratio
Paypal          1978     25.48%
Sulake Corp.    1029     13.25%
AOL             329      4.24%
Blizzard        267      3.44%
Orkut           207      2.67%
Cielo           167      2.15%
Tibia           162      2.10%
Facebook        109      1.40%
Other           3516     45.28%

Similarity between phishing pages and their target page

First, we analyzed the similarity between the phishing pages and their victim page. We use the Paypal site and the Paypal-phishing pages in the phishtank database as the test samples. There are in total 1680 phishing Paypal login pages in the database. Among them, 784 pages were no longer available online, and 396 pages have a visual layout different from the Paypal site, for which the similarity ratio reported by BaitAlarm ranges from 0 to 0.216.

We analyzed the remaining 510 Paypal-phishing pages in the database and show their similarity ratios measured by BaitAlarm in Table II. We can see that 82.35% of the Paypal-phishing pages got a similarity score of 1. Only 0.59% of the pages have similarity scores less than 0.3. According to our manual analysis, pages with a similarity score less than 0.3 are visually different from the Paypal page, and users can distinguish them easily.

Table II: Similarity Ratios of PayPal-Phishing Sites
Similarity p      Number   Ratio
0.24 < p < 0.3    3        0.59%
0.3 ≤ p < 0.4     16       3.14%
0.4 ≤ p < 0.6     9        1.76%
0.6 ≤ p < 0.8     20       3.92%
0.8 ≤ p < 1       42       8.24%
1                 420      82.35%

For AOL websites, we ran the same experiment on 276 samples selected from phishtank.com that were labeled as AOL-phishing pages. The visual appearance of 242 pages was distinct from AOL. For the remaining 36 phishing pages, BaitAlarm reported a similarity score of 1.

Similarity to Other Web Pages

We ran experiments to study false positives by measuring the similarity between other web pages and some target pages (without loss of generality, we took Paypal as the target page). In this experiment, we randomly chose 302 web pages, including university websites, government homepages, e-business websites, and social network sites. We show the results in Table III. 86.30% of the benign pages have a similarity score less than 0.04, and no benign page has a similarity score beyond 0.18.

We also tested the similarity score between phishing pages and their non-target pages. In this case, we randomly selected 276 phishing pages cited by phishtank.com
and their similarity scores with respect to the AOL page were calculated; the results are shown in Table IV. We can see that most of the similarity scores between the AOL page and non-AOL phishing pages are less than 0.1. However, there are 8 similarity scores in the interval [0.2, 0.5], and the similarity results in Table IV are a little higher than the scores in Table III. This is the result of the normalization in our algorithm: we normalize the similarity score by the minimum complexity score of the two pages, so that we can prevent attackers' attempts to bypass detection by forging web pages with few CSS rules, which would decrease the page's similarity score. As a trade-off, however, this induces some false positives: some tested pages with few CSS rules might be wrongly treated as phishing pages.

Table III: Similarity scores of normal web pages and Paypal
Similarity    Number   Ratio
0-0.04        252      86.30%
0.04-0.08     20       6.85%
0.08-0.12     9        3.08%
0.12-0.18     11       3.77%
0.18-1        0        0

Table IV: Similarity scores of non-AOL phishing web pages and AOL page
Similarity    Number   Ratio
0-0.1         222      95.69%
0.1-0.2       2        0.86%
0.2-0.3       3        1.29%
0.3-0.5       5        2.16%
0.5-1         0        0

To clarify this situation, we manually analyzed the real phishing pages that are visually similar to their target web pages and computed their complexity scores. All these pages' complexity scores are beyond 100. Based on this analysis and experience, we set 100 as an empirical threshold to filter the input of similarity testing. If BaitAlarm gets a page with a complexity score less than 100, it sends an alert to remind users to check the page content carefully before they input their private information, because the current page has a suspiciously simple layout. Otherwise, BaitAlarm computes the similarity score of the tested page and makes the corresponding decision based on the preset threshold (e.g., an empirical value of 0.3 in our implementation).

B. Accuracy

To evaluate the accuracy of BaitAlarm, we selected 300 phishing pages from phishtank.com that are labeled as phishing pages of Google, Hotmail, ASB Bank, and Blizzard Corporation, respectively. BaitAlarm filtered out 149 phishing pages that were not visually similar to the target page claimed by phishtank.com. The remaining phishing pages were detected by BaitAlarm successfully. For all these 300 testing samples, the detection rate of BaitAlarm is 100% and the false negative rate is 0%.

Figure 2: Detection of French Phishing site

V. RELATED WORK

URL-based detection: SpoofStick [13] is a browser toolbar that shows the real domain of the current page. The approach detects attempts to hide the true domain of the site through URL manipulation. For example, an attacker may provide a page with the URL https://online.citibank.com/US/JPS/portal/Index.do@10.30.12.14/, whose real domain is 10.30.12.14. The Bayesian Anti-phishing toolbar [2] is based on a whitelist. For a given web site, the toolbar checks with an analyzer whether the given URL is a legitimate web site. If the URL is not in the whitelist, the DOM analyzer labels the given web site with a token and sends the token to a scoring module. When the output score exceeds a threshold, the URL is treated as phishing.

Content-based detection: Medvet et al. proposed a visual-similarity-based phishing detection scheme [14] to visually compare the suspicious phishing page with the legitimate one. They consider and identify three typical features, i.e., text pieces and styles, images in the page, and the overall visual appearance of the page. Chen et al. [15] propose an approach for detecting visual similarity between two web pages. The method objectifies and directly compares these indivisible super signals using algorithmic complexity theory. CANTINA [7] detects phishing web sites based on page contents. It is based on an algorithm using term frequency-inverse document frequency (TF-IDF), combined with other heuristics to detect phishing sites. SpoofGuard [16] is a browser plug-in that uses domain name, URL, link, and image checks to determine if a given page is part of a spoof attack. Its false alarm rate depends on how frequently users establish new accounts and how frequently users empty the browser history. The above approaches are not based on fundamental features of phishing pages, and thus are not resilient to evasion.

Phishing detection using fundamental features: An SLN-based scheme [17] is proposed by Liu et al. The method is based on constructing and reasoning over the semantic link network (SLN) of web pages. It constructs the SLN from the given suspicious web page and its associated web pages. The SLN discloses implicit relations among web pages. By analyzing the relations, suspicious web pages, including phishing pages, can be identified.
Reputation Scoring: WOT [18] and iTrustPage [4], [5] aim to rate a page on the possibility of phishing using reputation scores, which are either reported by the anti-phishing community or computed from the given web page. Nevertheless, the two approaches listed above are user assisted, and WOT's rating scheme is based on subjective comments submitted by users.

Unlike the anti-phishing methods discussed above, BaitAlarm is based on the fundamental display features of web pages. These features are monitored by browsers and treated automatically as objective metrics in phishing web-page detection. Compared to BaitAlarm, whitelist-based techniques cannot identify newly set-up benign web pages, and the whitelist needs to be updated manually with a learning and verification period, which might cause a high false positive rate.

VI. CONCLUSION

Phishing is a popular social engineering attack used by attackers to collect sensitive information from victim users. This paper introduces a novel anti-phishing approach, BaitAlarm, which is based on efficient similarity comparison between the suspicious page and the target page. In particular, BaitAlarm uses CSS and related elements to represent visual features of a web page. Our evaluation using a large number of phishing pages supports the key idea of our approach. In future work, we will work on improving BaitAlarm's resilience to evasion attacks.

Acknowledgment. The authors thank the anonymous reviewers for their insightful comments. This work was supported in part by the Beijing Natural Science Foundation (No. 4132056), the National Key Basic Research Program (NKBRP) (973 Program) (No. 2012CB315905), the Beijing Natural Science Foundation (No. 4122024), and the National Natural Science Foundation of China (No. 61272501, 61173154, 61003214).

REFERENCES

[1] APWG, "Investigation report," http://www.antiphishing.org/reports/apwg_trends_report_h2_2011.pdf, 2011.
[2] L. P., E. Jung, D. D., H. T.E., and H. J.P., "B-APT: Bayesian anti-phishing toolbar," in Proceedings of IEEE International Conference on Communications, ICC'08. IEEE Press, May 2008.
[3] C.Inc., "Cloudmark toolbar," http://www.cloudmark.com/desktop/ie-toolbar.
[4] T. Ronda, S. Saroiu, and A. Wolman, "iTrustPage: A user-assisted anti-phishing tool," in Proceedings of Eurosys'08. ACM, April 2008.
[5] iTrustPage, http://www.cs.toronto.edu/~ronda/itrustpage/.
[6] I. Fette, N. Sadeh, and A. Tomasic, "Learning to detect phishing emails," in Proceedings of the International World Wide Web Conference (WWW), May 2007.
[7] Y. Zhang, J. Hong, and L. Cranor, "Cantina: A content-based approach to detecting phishing web sites," in Proceedings of the International World Wide Web Conference (WWW), May 2007.
[8] A. Nourian, S. Ishtiaq, and M. Maheswaran, "Castle: A social framework for collaborative anti-phishing databases," ACM Transactions on Internet Technology, 2009.
[9] C. Y., H. W., and Y. Le, "Anti-phishing based on automated individual white-list," in Proceedings of the 4th ACM Workshop on Digital Identity Management, 2008, pp. 51–60.
[10] D. Xiaotie, H. Guanglin, and F. A.Y., "An antiphishing strategy based on visual similarity assessment," Internet Computing, vol. 10, no. 2, pp. 58–65, 2006.
[11] L. Wenyin and D. Xiaotie, "Detecting phishing web pages with visual similarity assessment based on earth mover's distance," IEEE Transactions on Dependable and Secure Computing, vol. 3, no. 4, pp. 301–311, 2006.
[12] W3CSchool, "CSS tutorial-w3cschool," http://www.w3schools.com/css/.
[13] SpoofStick, "Spoofstick," http://www.corestreet.com/spoofstick/.
[14] E. Medvet, E. Kirda, and C. Kruegel, "Visual-similarity-based phishing detection," in Proceedings of SecureComm 2008. ACM, September 2008.
[15] T.-C. Chen, S. Dick, and J. Miller, "Detecting visually similar web pages: Application to phishing detection," ACM Transactions on Internet Technology, vol. 10, no. 2, pp. 1–38, May 2010.
[16] D. Boneh, "Spoofguard," http://crypto.stanford.edu/SpoofGuard.
[17] L. Wenyin, N. Fang, X. Quan, B. Qiu, and G. Liu, "Discovering phishing target based on semantic link network," Future Generation Computer Systems, no. 26, pp. 381–388, 2010.
[18] WOT, "Web of trust," http://www.antiphishing.org/reports/apwg_trends_report_h2_2011.pdf, 2011.
