2013 IEEE Symposium on Computers & Informatics

An Efficient False Alarm Reduction Approach in HTTP-based Botnet Detection
Meisam Eslahi
Computer System and Technology Dept., University of Malaya

H. Hashim
Faculty of Electrical Engineering, Universiti Teknologi MARA

N.M. Tahir
Faculty of Electrical Engineering, Universiti Teknologi MARA

instead of having a central C&C server, the botmaster sends a command to one or more bots, and they deliver it to their neighbours. Since the botmaster's commands are distributed by other bots, the botmaster is not able to monitor the delivery status of the commands [2]. Moreover, the implementation of a P2P botnet is difficult and complex. Therefore, botmasters have begun to use the central C&C model again, where the HTTP protocol is used to publish the commands on certain web servers [6, 7]. Instead of remaining in connected mode, the HTTP bots periodically visit certain web servers to get updates or new commands. This model is called the PULL style and continues at a regular interval that is defined by the botmaster. Because of the wide range of HTTP services in use, HTTP botnet traffic is not easy to block [7, 9]. Moreover, the HTTP protocol is used by a wide range of normal applications and services on the Internet, so detecting HTTP botnets with a low rate of false detection (i.e. false negatives and false positives) has become a challenge for botnet detection studies [6, 7]. Therefore, this paper proposes a method to detect HTTP botnets with low false positive and false negative rates. The major contributions can be summarised as follows:

Abstract: In recent years, bots and botnets have become one of the most dangerous infrastructures for carrying out nearly every
type of cyber-attack. Their dynamic and flexible nature along
with sophisticated mechanisms makes them difficult to detect.
One of the latest generations of botnet, called HTTP-based, uses
the standard HTTP protocol to impersonate normal web traffic
and bypass the current network security systems (e.g. firewalls).
Besides, the HTTP protocol is commonly used by normal applications and services on the Internet, so detecting HTTP botnets with a low rate of false alarms (i.e. false negatives and false positives) has become a notable challenge. In this paper,
we review the current studies on HTTP-based botnet detection in
addition to their shortcomings. We also propose a detection
approach to improve the HTTP-based botnet detection regarding
the rate of false alarms and the detection of HTTP bots with
random patterns. The testing result shows that the proposed
method is able to reduce the false alarm rates in HTTP-based
botnet detection successfully.
Keywords: Network Security, Botnet Detection, Command and Control Mechanism, HTTP Botnet, False Alarm Rate.

A botnet threat comes from three main elements: the bots, the command and control (C&C) servers, and the botmasters. A bot is a small application which is designed to infect computers and use them as part of a botnet without their owners' knowledge. The infected computers, or zombies, are controlled by skilful remote attackers called botmasters. They use C&C servers as an interface to send orders to all the bots and control the entire botnet [1]. In general, there are different types of botnet command and control models based on the communication style (i.e. PUSH or PULL), architecture (e.g. centralised and decentralised) and protocols (e.g. IRC, HTTP and P2P) [2].
The Internet Relay Chat (IRC) protocol was used in the first generation of botnets, where the IRC servers and the relevant channels are employed to establish a central C&C server to distribute the botmasters' commands [3]. The IRC bots follow the PUSH approach as they connect to selected channels and remain in connected mode [4]. Although the IRC botnets are easy to use, control and manage, they suffer from a central point of failure [5]. To overcome this issue, in the P2P model,

A botnet detection technique based on a behavioural analysis approach to detect malicious activities in a given network.

Two filter algorithms, High Access Rate (HAR) and Low Access Rate (LAR), to remove a wide range of unwanted traffic without using any white or black lists, in order to reduce the rate of false alarms (i.e. false positives and false negatives).

A periodic pattern detector called Periodic Access Analysis (PAA) to detect PULL style HTTP botnet traffic with both fixed and random intervals.

The remainder of this paper is organised as follows. Section II presents current studies on HTTP-based botnet
detection along with their weaknesses. Section III proposes a
data reduction and analysis approach to overcome current
challenges regarding false alarm rates. The experiment and
resulting analysis are considered in Section IV, followed by
discussion and future works in Section V. Finally, Section VI
gives the overall conclusions of this paper.


To overcome this issue, they propose a fuzzy cross-association classifier which uses synchronisation activity as a metric, based on the fact that bots may perform abnormal activities to stay in synchronisation with other bots in the same botnet. This method also requires a large number of bots in one botnet and may generate false alarms in small-scale botnets.
Finally, in order to detect small-scale botnets with fewer false alarms, Binbin et al. [6] used request byte, response byte, and the number of packets as common features of an HTTP connection to classify the similar connections generated by a single bot. Their method can detect small-scale botnets, but techniques such as random request delays or random packet numbers can evade it and produce high false negative rates in the results. In addition, as with the other HTTP-based botnet detection approaches, normal programs which generate periodic connections (e.g. auto-refresh web pages) can be detected as bots and increase the number of false positives.
Each of the aforementioned methods comes with different trade-offs regarding false alarm rates and efficiency in detecting HTTP-based botnets with random patterns. Therefore, this paper proposes new data filtering approaches to reduce the false positive and false negative rates in the detection results.

A considerable number of studies on botnet detection have adopted passive analysis by collecting the network traffic for a specific period and analysing it in order to identify any evidence of bot and botnet activities [7]. Table 1 lists several previous studies that proposed methods to detect HTTP-based botnets. It shows their false negative and false positive rates and their efficiency
in detecting bots with random pattern (e.g. interval, packet
Jae-Seo et al. [3] and Tung-Ming et al. [9] introduced a parameter based on one of the pre-defined characteristics of HTTP-based botnets. They suggested a Degree of Periodic Repeatability (DPR) to show the pattern of regular connections (i.e. PULL style) of HTTP-based bots to certain servers. In their method, an activity is considered a bot if its DPR is low; however, the DPR becomes low only if a bot uses fixed connection intervals. By changing the connection intervals (e.g. using a random pattern), botmasters can evade this technique and cause false negatives in the results [10]. Moreover, the authors observed that, with this technique, normal automatic software such as updaters can be detected as a bot and cause false positives in the results.
To reduce the false alarm rates, Gu et al. proposed BotSniffer [8] and its extension BotMiner [4], which analyse similarities in the abnormal or malicious activities generated by a group of bots from the same botnet. Although these methods can detect bots with random intervals, the authors observed that some services, such as Gmail sessions that periodically check for new email, can generate high rates of false positives in the results. Moreover, these methods are designed around cooperative behaviour analysis, which requires an adequate number of members (bots) in one botnet for detection to succeed. Therefore, their proposed group correlation analysis is less efficient (e.g. it has a high false negative rate) in the detection of small-scale botnets and single bots. To overcome this shortcoming, BotSniffer includes a sub-system for single bot detection but, as noted by its authors, it is not as robust as their group analysis technique.
Accordingly, Lu et al. [11] categorised the service and application flows using payload signatures, examining the bit strings in the packet payload as a signature. These signatures were used to separate known traffic from unknown traffic in order to decrease the false alarm rates. Like traditional signature-based techniques, the proposed classifier is less effective because it is unable to identify new or encrypted patterns, which may increase the false negative rate.


This paper employed the passive behaviour analysis approach to collect information about particular network traffic and analyse it in order to identify any signs of bot and botnet activities. However, its main objective was to propose an efficient filtering approach that reduces the false negative and false positive rates and thereby improves current detection solutions.
A. Data Preparation:
Before applying the proposed data reduction approaches, two simple data filtering models called HTTP Traffic Separator (HTS) and Get and Post Separator (GPS) are applied to the collected traffic to select only HTTP traffic that uses the GET and POST methods. These filtering approaches are used by almost every HTTP-based botnet detection study, as HTTP-based bots use these methods to contact their C&C server [8, 12].
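The two preparation steps can be sketched as follows. This is only an illustration under stated assumptions: the `Packet` record, its field names and the sample addresses are hypothetical and stand in for whatever the traffic sensors actually emit; the paper does not describe its implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical minimal record for one captured packet."""
    protocol: str  # e.g. "HTTP", "DNS"
    method: str    # HTTP request method, empty for non-HTTP traffic
    src_ip: str
    dst_ip: str


def http_traffic_separator(packets):
    """HTS step: keep HTTP packets only."""
    return [p for p in packets if p.protocol.upper() == "HTTP"]


def get_post_separator(packets):
    """GPS step: keep HTTP requests that use the GET or POST method."""
    return [p for p in packets if p.method.upper() in {"GET", "POST"}]


# Made-up sample capture (documentation addresses, not real data).
capture = [
    Packet("HTTP", "GET", "10.0.0.5", "203.0.113.7"),
    Packet("DNS", "", "10.0.0.5", "10.0.0.1"),
    Packet("HTTP", "HEAD", "10.0.0.6", "203.0.113.9"),
    Packet("HTTP", "POST", "10.0.0.7", "203.0.113.7"),
]
prepared = get_post_separator(http_traffic_separator(capture))
```

Chaining the two functions mirrors the order in which the text applies HTS and then GPS.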
B. Grouping and Sorting:
The Grouping and Sorting process sorts the collected traffic packets and divides them into different groups based on the source IP address, destination IP address, URL, User-Agent string and timestamp. While other studies mostly use the source IP, destination IP and domain names to divide the collected traffic packets into different groups [3, 4, 9, 13], this paper uses one of the HTTP header fields, known as the User-Agent, as an additional parameter alongside the previous ones to make the grouping and classification of the collected network packets more accurate.
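A minimal sketch of such a grouping step is shown below. It assumes each request has already been reduced to a small record; the `Request` type, its field names and the sample values are hypothetical. The group key uses the source IP, destination IP, URL and User-Agent, and the timestamps inside each group are sorted.

```python
from collections import defaultdict
from typing import NamedTuple


class Request(NamedTuple):
    """Hypothetical record for one prepared HTTP request."""
    timestamp: float  # seconds since the start of the capture
    src_ip: str
    dst_ip: str
    url: str
    user_agent: str


def group_and_sort(requests):
    """Group requests by (src, dst, URL, User-Agent); sort each group by time."""
    groups = defaultdict(list)
    for r in requests:
        groups[(r.src_ip, r.dst_ip, r.url, r.user_agent)].append(r.timestamp)
    return {key: sorted(times) for key, times in groups.items()}


requests = [
    Request(300.0, "10.0.0.5", "203.0.113.7", "/stat.php", "AgentA"),
    Request(60.0, "10.0.0.5", "203.0.113.7", "/stat.php", "AgentA"),
    Request(90.0, "10.0.0.5", "203.0.113.7", "/stat.php", "AgentB"),
]
groups = group_and_sort(requests)
```

In this made-up example the third request shares its addresses and URL with the first two but carries a different User-Agent, so it lands in its own group. That is the effect of adding the User-Agent to the key.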


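[Table 1: Previous studies on HTTP-based botnet detection. Columns: False Alarm Rate (False Negative, False Positive) and Efficiency in Random Pattern. Table body not recovered.]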

C. High Access Rate Filter:

The High Access Rate (HAR) filter detects and eliminates groups of similar HTTP connections or requests that have been generated within a short period of time, for example, a group with more than one request per second. This is important because automatic software (e.g. updaters and downloaders) transmits a similar periodic pattern of traffic, which can be falsely identified as HTTP bot activity and increase the false alarm rates [4, 7, 9]. However, the number of requests generated by the aforementioned applications is extremely high when compared to HTTP bots (e.g. more than one request per minute). Moreover, Strayer et al. [14] observed that a bot does not generate bulk data transfer. Therefore, this filter is proposed to remove any traffic that generates a high rate of requests and to label it as automatic software instead of bot activity.
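One way to express the HAR idea in code is sketched below. The paper does not state how the access rate is measured, so this sketch assumes it is the number of request intervals divided by the group's active time span, and it takes the text's example of one request per second as the default threshold. The function names and the sample groups are hypothetical.

```python
def request_rate(sorted_times):
    """Requests per second over the group's active span (assumed definition)."""
    if len(sorted_times) < 2:
        return 0.0
    span = sorted_times[-1] - sorted_times[0]
    if span <= 0:
        return float("inf")  # several requests at the same instant
    return (len(sorted_times) - 1) / span


def high_access_rate_filter(groups, max_rate=1.0):
    """Split groups into those kept and those labelled as automatic software."""
    kept, automatic = {}, {}
    for key, times in groups.items():
        ordered = sorted(times)
        if request_rate(ordered) > max_rate:
            automatic[key] = ordered
        else:
            kept[key] = ordered
    return kept, automatic


# Made-up groups: a fast downloader-like burst and a slow periodic caller.
groups = {
    "updater": [0.0, 0.2, 0.4, 0.6, 0.8],
    "bot": [0.0, 240.0, 480.0, 720.0],
}
kept, automatic = high_access_rate_filter(groups)
```

Returning the removed groups separately reflects the text's point that high-rate traffic is labelled as automatic software rather than silently discarded.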







D. Low Access Rate Filter:

The Low Access Rate (LAR) filter acts on the results from the HAR filter and removes traffic with a low rate of access, for example, only a few packets over a long period of time (e.g. the entire data collecting period). This filter is based on the observation by Strayer et al. [14] that bots are designed to perform bigger tasks, and to perform them much faster, than humans; hence they do not generate brief traffic.
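The LAR step can be illustrated with a simple count threshold. The paper gives no numeric threshold for "a few packets", so `min_requests=5` below is purely an assumed value, and the function name and sample groups are hypothetical.

```python
def low_access_rate_filter(groups, min_requests=5):
    """Drop groups with fewer than `min_requests` requests in the whole capture."""
    return {key: times for key, times in groups.items()
            if len(times) >= min_requests}


# Made-up groups that are assumed to have already passed the HAR filter.
groups = {
    "one_off_visit": [125.0, 131.0],
    "candidate": [0.0, 240.0, 480.0, 720.0, 960.0, 1200.0],
}
remaining = low_access_rate_filter(groups)
```

A count over the whole collection period is the simplest reading of the description; a rate per hour would serve equally well if the capture length varies between runs.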





Figure 3: The Testbed Overview


In order to evaluate the proposed architecture, several experiments were conducted. Figure 3 illustrates the experimental schema, which is based on the topology proposed by Lu et al. [11]. The experiment requires several infected computers from different networks (e.g. different VLANs), an analyser, and command and control servers. Two HTTP-based botnets, BlackEnergy [15] and Bobax [16], are employed in this research as they are used in most of the previous studies discussed in the literature. The evaluation used four different bot configurations to generate botnet activities, as shown in Table 2.
PC1 was infected by the real BlackEnergy bot and the others were infected by modified bots modelled after BlackEnergy and Bobax. PC2 was infected by HBot1, which is modelled on the BlackEnergy description but modified to be stealthier: it contacts the command and control server periodically at random intervals (i.e. three to ten minutes). HBot2 was generated from the descriptions of Bobax and connects to the command and control server every four minutes. Finally, HBot3 adopted a similar structure to HBot2 but contacts the command and control server at random intervals. The bots on PC1 and PC2 periodically connect to command and control server 1, and the bots on PC3 and PC4 connect to command and control server 2.
In addition to the bots, a set of small software sensors was designed and placed on the experimental clients to collect the traffic and send it to the analyser engine, which is located in VLAN 5. Moreover, the Tcpreplay [17] application was used during the experiment to regenerate normal traffic that had previously been captured from the university campus.

E. Periodic Access Analysis:

The Periodic Access Analysis (PAA) process selects the HTTP connections that are generated at periodic intervals. This filter is based on the nature of HTTP-based botnets, which follow the PULL style: they connect to their command and control server periodically in order to get commands and updates. It is an improved model of the concept used by the existing HTTP-based botnet detection methods in [4, 9] to select only suspicious activities which initiate periodic communication to specific servers.
This process calculates the total data collecting time and divides it into equal partitions (i.e. P1 to Pn), as shown in Figure 1. Based on the experimental observations of this paper, a length of one hour is used for each partition. To illustrate the difference between normal groups and botnet groups, Figure 1 depicts an example of their distributions. The circles represent the botnet activities and the triangles represent the normal groups. As can be seen in the figure, the circles are repeated at variable intervals, representing PULL style command and control mechanisms with random intervals. Moreover, they appear in all partitions, which can be considered a periodic pattern; therefore they can be identified as botnet activities. The remaining groups, represented by triangles, are considered non-periodic and are removed by the filter.
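The partition test described above can be sketched as follows. The sketch assumes timestamps are in seconds, uses the one-hour partition length from the text, and keeps a group only if it has at least one request in every partition. The function name, parameters and sample data are hypothetical, and the authors' implementation may differ.

```python
import math
from itertools import cycle


def periodic_access_analysis(groups, start, end, partition_seconds=3600):
    """Keep groups that have at least one request in every partition P1..Pn."""
    n_partitions = max(1, math.ceil((end - start) / partition_seconds))
    periodic = {}
    for key, times in groups.items():
        seen = {int((t - start) // partition_seconds)
                for t in times if start <= t < end}
        if len(seen) == n_partitions:
            periodic[key] = times
    return periodic


# Made-up 3-hour capture: a caller with varying 3 to 10 minute gaps and a
# normal group that only appears in the first two hours.
bot_times, t = [], 0
for gap in cycle([180, 540, 300, 600, 240]):
    if t >= 10800:
        break
    bot_times.append(t)
    t += gap

groups = {"bot": bot_times, "normal": [100, 200, 4000]}
suspicious = periodic_access_analysis(groups, start=0, end=10800)
```

Because the test only asks whether a group is present in each partition, it does not depend on the gaps being equal, which is how randomised intervals shorter than the partition length are still caught.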








Figure 1: Normal and Suspicious Activities Distribution



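[Table 2: Bot configurations used in the evaluation. Surviving column headers: Name of Bot, Infected by. Table body not recovered.]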




As discussed above, a number of experiments were conducted to evaluate the performance of the proposed approaches in reducing false alarms. Table 3 shows the results of one of the experiments, in which 305,861 packets were collected in 3 hours. Five main algorithms are used to filter the traffic. The first data preparation filter removed about 66.45% of the data, reducing it from 305,861 to 102,623 packets. The second filter, GPS, removed another 87.86% of the data, leaving only 12,461 packets.
The data reduction and analysis process continued with the other filter algorithms: HAR and LAR, followed by Periodic Access Analysis (PAA). The purpose of the last three filters is to separate the HTTP-based botnet command and control traffic flows from the normal flows. The HAR filter proved very effective, removing 91.93% of the unwanted traffic and reducing it from 12,461 to 1,005 packets. The LAR filter reduced the traffic from 1,005 to only 226 packets, removing 77.51% of it. Finally, the PAA filter removed further unwanted traffic, reducing it from 226 to only 125 packets (i.e. 44.69% of the traffic was removed).
In total, the five steps reduced the traffic from 305,861 to 125 packets, which means that 99.96% of the traffic data packets were removed. As can be seen in Table 3, this reduction is significant because it is achieved without using any white or black lists. The experimental results demonstrate that the proposed method is able to reduce the amount of unwanted data and detect all the HTTP-based bots used in the aforementioned experiment.
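The reported percentages follow directly from the packet counts quoted in the text, as the short check below shows. The stage label "HTS" for the first data preparation filter is inferred from the order in which the preparation steps are described; the variable and function names are hypothetical.

```python
# Packet counts after each stage, as quoted in the text.
stages = [
    ("All collected data", 305_861),
    ("HTS", 102_623),
    ("GPS", 12_461),
    ("HAR", 1_005),
    ("LAR", 226),
    ("PAA", 125),
]


def reduction_percent(before, after):
    """Percentage of packets removed between two stages, to two decimals."""
    return round(100 * (before - after) / before, 2)


# Reduction achieved by each filter relative to its own input.
per_stage = {
    name: reduction_percent(prev_count, count)
    for (_, prev_count), (name, count) in zip(stages, stages[1:])
}
# Overall reduction from the raw capture to the final output.
total = reduction_percent(stages[0][1], stages[-1][1])

for name, pct in per_stage.items():
    print(f"{name}: {pct}% removed")
print(f"Total: {total}% removed")
```

Each per-stage figure is relative to the output of the previous filter, which is why the individual percentages do not add up to the overall 99.96%.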







monitored clients and end devices.

Although the experimental results show that our method significantly reduced the false alarm rate, in some circumstances normal applications can still be detected as suspicious activities. For instance, if users keep using normal auto-refresh web sites constantly over a long period of time (e.g. a whole day), this may generate the same pattern as HTTP-based botnets. To overcome this issue, our future work, called botsAnalytics, focuses on the User-Agent string to evaluate the origin environment (e.g. standard web browsers) of collected requests. The User-Agent is the part of the HTTP request header that indicates the application or browser generating a request. However, as observed in [18], botnets can use fake User-Agents in order to impersonate normal applications. Therefore, future efforts should focus on designing a pattern recognition method to distinguish original User-Agents from fake ones, and on using this method to improve botnet detection. We are also working on two unique algorithms to prioritise the detected suspicious activities by ranking them qualitatively (i.e. low and high). This helps network security experts to make better decisions on detected suspicious activities. Moreover, future work should study the detection of HTTP-based botnets with multiple C&C servers rather than only those which communicate with a single C&C server.
Finally, based on our investigation, mobile devices and networks have been targeted by a new generation of botnets called MoBots [19]. Mobile networks are now well integrated with the Internet (e.g. through 3G, 4G and LTE technologies) and provide efficient environments which attract botmasters [20]. Moreover, Knysz et al. [21] discovered that mobile botnet activities over WiFi connections (i.e. mobile HTTP-based botnets) are more difficult to monitor and detect. The techniques and filtering approaches discussed in this paper, however, are mainly designed around the characteristics of computers and computer networks and may not be fully applicable to mobile HTTP botnets. Therefore, we are working on extending the current approaches to design a central security management solution (e.g. cloud-based), called mobAnalytics, to detect, mitigate and respond to a new generation of botnets on mobile devices (smartphones in particular).


The current HTTP-based botnet detection methods are mostly based on the fact that bots periodically connect to their command and control servers to update themselves or get new commands from their botmaster. Jae-Seo et al. [3] observed that some normal automatic applications, such as download managers, updaters, and auto-refresh web pages, exhibit similar activity and generate false alarms. Likewise, Gu et al. [8] found that some services, such as Gmail sessions which periodically check emails for updates, also have the potential to generate false alarms. In order to reduce the false alarm rates, the aforementioned studies proposed a white list technique. In contrast, this paper proposed two filters, HAR and LAR, which removed a wide range of unwanted traffic (i.e. about 96% of non-related data) without using any white or black lists. Proper data reduction and filtering play an important role in the analysis process, as they can reduce the processing load at the analyser engine and effects on the

This paper proposed several approaches to reduce the false alarm rate in HTTP-based botnet detection. The proposed methods were evaluated on their false positive and false negative rates and on their efficiency in detecting botnets with random intervals. The test results show that the proposed method achieved higher efficiency in detecting HTTP-based botnets. The very low false positive ratio obtained through the use of the newly proposed HAR and LAR filters shows that the proposed method is able to reduce false alarm rates and improve on current studies of HTTP-based botnet detection.




This work was supported in part by the Research Management Institute, Universiti Teknologi MARA, Malaysia.











