
IMPLEMENTING APPLICATION FILTERING FEATURE ON NON-STANDARD FIREWALL

Supervisor: TRINH Ngoc Minh


Student: TRAN Thi Dung

January 2013
Ho Chi Minh City, Vietnam
INTERNSHIP REPORT

Table of Contents
ACKNOWLEDGMENTS
ABSTRACT
1 INTRODUCTION
1.1 ISeLAB Introduction
1.1.1 General information
1.1.2 Mission and vision
1.1.3 Organization
1.2 Internship Introduction
1.2.1 Motivation
1.2.2 Internship requirements and expected result
1.2.3 Internship Report Structure
1.2.4 Internship schedule
2 TECHNOLOGY BACKGROUND
2.1 Non-standard Firewall
2.2 Application filtering
2.2.1 Introduction
2.2.2 Linux Firewall – Netfilter/Iptables
2.3 Application Layer Packet Classifier for Linux – L7-filter
2.4 Machine learning – C5.0 algorithm
2.5 HTTP overview
3 IMPLEMENTATION AND RESULT
3.1 HTTP analyzer
3.2 L7-filter expression for different applications

3/46 17/01/13

3.2.1 Implementing topology
3.2.2 Video/audio
3.2.3 Download
3.3 Machine learning – C5.0 algorithm
3.3.1 Method overview
3.3.2 Obtaining training data
3.3.3 Classification attributes
3.3.4 Result
3.3.5 Applying result to match module for iptables
4 CONCLUSION AND FUTURE WORK
4.1 Internship conclusion
4.2 Future work
REFERENCE
Appendix 1: Implementation of l7-filter
Installation
Usage
Appendix 2: Internship registration information


List of Figures
Figure 1-1: ISeLAB Logo
Figure 1-2: ISeLAB Organization
Figure 2-1: Nonstandard – Firewall
Figure 2-2: Packet flow through iptables [5]
Figure 3-1: HTTP analyzer
Figure 3-2: Implementing topology
Figure 3-3: Result of l7-filter on the firewall and client machine
Figure 3-4: Statistic of HTTP entity header field
Figure 3-5: List of value of Content-Disposition field
Figure 3-6: Result of l7-filter on client machine
Figure 3-7: Machine-learning method
Figure 3-8: Average number of bytes corresponds to the first N packets
Figure 3-9: Average time to store first N packets of a HTTP flow
Figure 3-10: Obtaining data
Figure 3-11: Evaluation on the training data in case of whole flow
Figure 3-12: Decision tree within the whole flow
Figure 3-13: Predication using decision tree
Figure 3-14: Evaluation on training data in case of first 50 packets
Figure 3-15: Decision tree within first 50 packets
Figure 3-16: Matching module for iptables
Figure 0-1: Kernel configuration


List of Tables
Table 4-1: List of video website
Table 4-2: The error rate table


ACKNOWLEDGMENTS
I am grateful to my advisor, Dr. Trinh Ngoc Minh, for his support and advice during the six months of my internship at ISeLAB.

I also thank Mr. Bui Thanh Phong and my colleagues at ISeLAB, especially those working on the Non-standard Firewall project, for their help and suggestions. Without their support, this internship could not have been finished.

I am thankful to all the professors who taught me during my two years at PUF and gave me knowledge that proved useful for my internship.

My thanks also go to my family and friends for their encouragement and support throughout my life.


ABSTRACT
Non-standard Firewall is a firewall product under development at ISeLAB. It contains many features that can compete with commercial products [14]. However, one advanced feature that Non-standard Firewall lacks is application filtering. The requirement is that the firewall can filter network traffic generated by an application based on the information in the application header, regardless of the protocol or port it uses at Layer 4. Moreover, more and more applications run on the web, such as web games, online video and file download, that an administrator does not want employees to access on the company network during working hours. This leads to a more specific requirement: the filter must handle at least 3 kinds of HTTP traffic: video, file download and game. To fulfill this requirement, I first worked with pattern matching using l7-filter. With l7-filter and iptables (Sections 3.2.2 and 3.2.3), I could filter 2 kinds of HTTP traffic: video and file download. However, this approach could not filter web games (Section 3.2). Hence, I turned to another approach, machine learning with the C5.0 algorithm. I examined this technique to classify 4 kinds of HTTP traffic: video, file download, game and normal HTTP traffic (Section 3.3). Based on the result of this experiment, I proposed building the application firewall by creating a match module for iptables (Section 3.3.5). Besides these 2 classification approaches, I also created an HTTP analyzer tool to help with (i) finding the regular expressions for l7-filter and (ii) obtaining training data for the C5.0 algorithm (Section 3.1).


1 INTRODUCTION

1.1 ISeLAB Introduction

1.1.1 General information

Figure 1-1: ISeLAB Logo

The Information Security Laboratory (ISeLAB) was officially established in March 2006 within the Information Technology Park of Vietnam National University Ho Chi Minh City (VNU-ITP).

Address: Community 6, Linh Trung Ward, Thu Duc Dist., HCM City, Vietnam

Tel: 08 37244004

Fax: 08 37242058

Website: www.iselab.edu.vn

Email: info@iselab.edu.vn

1.1.2 Mission and vision

ISeLAB has the following missions:

- Developing scientific research in information security, transferring research results from universities to industry, and conducting research to enhance training quality for VNU-HCM.
- Providing information security services.
- Teaching and training information security and network security experts in graduate and postgraduate programs.
- Collaborating with joint ventures and foreign companies on research, training and technology transfer in network security and other sectors.

Vision:

“To be a pioneering applied research center, providing professional services and training programs in information security”

1.1.3 Organization

ISeLAB has 9 permanent research staff members. Most of them hold a Ph.D. or Master's degree, with varied backgrounds in networking, programming, Windows systems, Linux systems, ISO-2700X, computer viruses, database technology, etc. The organization diagram of ISeLAB is shown in Figure 1-2:

Figure 1-2: ISeLAB Organization

1.2 Internship Introduction

1.2.1 Motivation

ISeLAB has developed a proprietary firewall called Non-standard Firewall. The idea of Non-standard Firewall is to divide the firewall into two parts: one (the inside part) connects to the inside network and the other (the outside part) connects to the outside network. The two parts are interconnected by a non-standard connection (here, non-IP and layer 2 only). If the part connected to the outside is compromised, an attacker will find it much more difficult to gain privileges on the inside part in order to penetrate deeper into the network or shut down the filter running on the inside part of the firewall.

One of the advanced features planned for Non-standard Firewall is application filtering. A normal stateless or stateful firewall filters traffic by matching TCP packets against a source or destination port (e.g. port 80, the standard HTTP port). However, some applications or services, such as web servers, can be configured to use any port (8080 or 8800) besides the default one, and unwanted traffic can use port 80 to evade the firewall, so port-based filters do not work for this traffic. We therefore need higher-level filtering that can filter network traffic generated by an application based on the information in the application header, regardless of the protocol or port it uses at Layer 4. Moreover, more and more applications run on the web, such as web games, online video and file download, that an administrator does not want employees to access during working hours alongside their normal activities. The main purpose of application filtering in Non-standard Firewall is to filter network traffic generated by an application regardless of the protocol or port it uses at Layer 4, with particular attention to web applications.

Before implementing application filtering, we need to find an appropriate technique to classify network traffic, especially HTTP traffic. The main purpose of the internship is to examine different classification techniques and to apply each of them to identify three kinds of HTTP traffic: video, file download and web games.

1.2.2 Internship requirements and expected result

Based on this motivation, I needed to fulfill the following requirements during the internship:

- Research the Non-standard Firewall.
o Research the TCP/IP implementation in Non-standard Firewall
o Research application filtering
o Research netfilter, iptables and layer 7 filters
o Identify a method to build application filtering on top of iptables
o Research web applications (online video, file download, web games…) and how to filter these kinds of network traffic
- Implement the application filter on Non-standard Firewall.
o Install and configure the application filter on Non-standard Firewall
o Apply the filter to basic applications: HTTP, SMTP
o Apply the filter to web applications: web games, online video…

The expected results follow from the above requirements:

- Implement application filtering that works correctly with the Non-standard Firewall.
- Test and deploy application filtering with specific kinds of network traffic.
- Collect and analyze statistics from the tests.

However, during the internship I found that l7-filter has some disadvantages: it degrades performance, and it is hard to find the right pattern for a specific kind of network traffic. I therefore needed to examine a different classification technique: machine learning.

The 6 months of the internship produced the following achievements:

- Implementing l7-filter on Non-standard Firewall
- Testing l7-filter with 2 kinds of HTTP traffic: video and file download
- Developing the HTTP analyzer tool to obtain more information on HTTP traffic
- Classifying 3 kinds of HTTP traffic with machine learning (C5.0 algorithm)

1.2.3 Internship Report Structure

This report is divided into 4 chapters:

- Chapter 1: introduces the department where the internship was done and gives a brief introduction to the internship.
- Chapter 2: provides background knowledge on Non-standard Firewall, application filtering and different classification techniques.
- Chapter 3: describes the work done during the internship:
o Developing the HTTP analyzer tool
o Implementing l7-filter on Non-standard Firewall and applying it to identify different kinds of HTTP traffic
o Classifying HTTP traffic with the C5.0 algorithm and proposing a scheme to build a match module for Non-standard Firewall
- Chapter 4: discusses and concludes the work done in the internship and outlines future work.

1.2.4 Internship schedule

No.  Task                                                   Start date  End date
1    Research on Non-standard firewall                      1st June    15th June
2    Research on Linux firewall - iptables                  16th June   30th June
3    Research on l7-filter                                  1st Jul     14th Jul
4    Implementing l7-filter on Non-standard firewall        15th Jul    31st Jul
5    Building the HTTP analyzer                             1st Aug     31st Aug
6    Applying l7-filter to different kinds of HTTP traffic  1st Sep     30th Sep
7    Research on machine learning                           1st Oct     15th Oct
8    Upgrading the HTTP analyzer                            16th Oct    15th Nov
9    Classifying the HTTP traffic using C5.0 algorithm      16th Nov    15th Dec
10   Writing the report                                     16th Dec    15th Jan


2 TECHNOLOGY BACKGROUND

2.1 Non-standard Firewall

A firewall is a part of a computer system or network that is designed to protect that system or network. Its main function is to block unauthorized access while permitting authorized communications. Because a firewall may itself contain vulnerabilities that can be exploited to compromise it, an attacker can use a hop-by-hop attack, first taking over the firewall and then moving on from it, to compromise the whole system.

ISeLAB has proposed a solution to defend against hop-by-hop attacks: the Non-standard Firewall (NS-FW). The idea of NS-FW is shown in Figure 2-1:

[Diagram: LAN/inside network, internal box, Ethernet/non-IP link, external box, Internet]

Figure 2-1: Nonstandard – Firewall

NS-FW includes two components: an internal box and an external box. The connection between the two boxes is Ethernet (non-IP); the term “non-standard” means non-IP. The objective of this special design is to resist hop-by-hop attacks and protect the inside LAN. An attacker who targets the firewall may gain access to the External Box, but cannot reach the Internal Box over IP as usual, because the connection between the two boxes is Ethernet (non-IP).

2.2 Application filtering

2.2.1 Introduction

One of the advanced features planned for Non-standard Firewall is application filtering. A normal stateless or stateful firewall filters traffic by matching TCP/UDP packets against a source or destination port (e.g. port 80, the standard HTTP port). However, some applications or services, such as web servers, can be configured to use any port (8080 or 8800) besides the default one, and unwanted traffic can use port 80 to evade the firewall, so normal filters do not work for that traffic. We therefore need higher-level filtering that can filter network traffic generated by an application based on the information in the application header, regardless of the protocol or port it uses at Layer 4. Moreover, more and more applications run on the web, such as web games and online video, which an administrator does not want employees to access during working hours. Hence, we need to filter network traffic generated by an application regardless of the protocol or port it uses at Layer 4, with particular attention to web applications.

2.2.2 Linux Firewall – Netfilter/Iptables

The netfilter/iptables firewall is developed by the Netfilter Project and is available in all major Linux distributions [11]. The firewall has two parts:

- Iptables: the front end of the firewall (the user-land program), which instructs the kernel what to do with the IP traffic that flows through the Linux box (arriving, passing through, or leaving).
- Netfilter: the back end of the firewall, the part of the Linux kernel responsible for packet filtering, packet mangling and network address translation. It inspects every packet going through the box, finds the rule that matches the packet, and processes the packet according to that rule.

Moreover, netfilter/iptables has four important components [4] [5]:

- Table: an iptables structure that defines a category of functionality. There are four tables: filter, nat, mangle and raw. Each table contains a set of rules with a specific function, such as filtering rules in the filter table and NAT rules in the nat table. This internship focuses on the filter function of netfilter/iptables.
- Chain: each table has its own set of chains, which can be built-in or user-defined. For the filter table, the most important chains are:
o The INPUT chain handles packets that are destined for the Linux system itself.
o The OUTPUT chain handles packets that are generated by the Linux system itself.
o The FORWARD chain handles packets that are routed through the Linux system (when the Linux system works as a router).

A packet flows through the tables and chains of netfilter/iptables as shown in the following figure:

Figure 2-2: Packet flow through iptables [5]

- Match: a condition that a packet must satisfy for a rule to apply, normally a classification of the network traffic. If the condition is matched, the packet is processed according to the action specified by the rule target. This internship focuses on the match module, i.e. a condition or method to classify HTTP traffic. Two kinds of classification are considered:
o Pattern matching with the Application Layer Packet Classifier for Linux (L7-filter)
o Machine learning with the C5.0 algorithm
- Target: an action that is applied when a condition is matched. Popular targets include DROP, ACCEPT and LOG.
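
The relationship between chain, match and target can be illustrated with a small Python sketch. It is only a conceptual model, not how netfilter is implemented: the `Rule` and `Chain` classes and the dictionary-based packet are hypothetical names introduced for illustration. A chain evaluates its rules in order, the first terminating target decides the packet, and the chain policy applies when nothing matches.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Packet = Dict[str, object]  # hypothetical packet representation


@dataclass
class Rule:
    match: Callable[[Packet], bool]  # the "match": a condition on the packet
    target: str                      # the "target": ACCEPT, DROP, LOG, ...


@dataclass
class Chain:
    policy: str        # verdict used when no terminating rule matches
    rules: List[Rule]

    def verdict(self, packet: Packet) -> str:
        for rule in self.rules:
            if rule.match(packet):
                if rule.target == "LOG":
                    continue  # LOG does not end the traversal
                return rule.target
        return self.policy


# A FORWARD-like chain that drops telnet and accepts everything else.
forward = Chain(
    policy="ACCEPT",
    rules=[Rule(match=lambda p: p.get("dport") == 23, target="DROP")],
)

print(forward.verdict({"dport": 23}))
print(forward.verdict({"dport": 80}))
```

In this model an application filter is simply a different `match` function: instead of comparing a port number, it inspects the application data of the connection.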

2.3 Application Layer Packet Classifier for Linux – L7-filter

The Application Layer Packet Classifier for Linux (l7-filter) is a match module for iptables. Unlike match modules that test the value of a header field, l7-filter applies regular expressions to the application layer data in order to classify the network application.

L7-filter consists of three important parts:

- A kernel patch: applied to the kernel, it provides a method for the kernel to analyze packets.
- A patch for iptables: applied to iptables as a module, it provides the matching options for iptables.
- A collection of protocol definition files: they provide sample regular expressions that define popular protocols [12].

L7-filter works together with the ip-conntrack module [5] [13]. It inspects the first N packets (N=20 by default) of each connection and searches the application data for a matching expression. Each connection is defined by five attributes: source IP, destination IP, source port, destination port and transport protocol. If a match is found in one packet, the whole connection is classified as the corresponding application.

For example, an HTTP transaction consists of a request and a response (Section 2.5), and a response contains the HTTP version and status code in the status line, followed by some HTTP response headers. Assume that the expression for an HTTP response is “http/(0\.9|1\.0|1\.1) [1-5][0-9][0-9][\x09-\x0d -~]*(connection:|content-type:|content-length:)”. If l7-filter finds that part of any packet matches this expression, the whole connection is identified as an HTTP connection.
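
The classification logic described above can be sketched in Python as follows. This is a simplified model under stated assumptions, not the l7-filter kernel code: the `FlowClassifier` class and its method names are hypothetical, the character class is assumed to be `[\x09-\x0d -~]` (whitespace controls plus printable ASCII), and the data is matched case-insensitively, packet by packet.

```python
import re
from typing import Dict, Optional, Tuple

# Source IP, destination IP, source port, destination port, protocol.
FlowKey = Tuple[str, str, int, int, str]

MAX_PACKETS = 20  # N, the number of packets inspected per connection

# Assumed form of the HTTP response expression quoted in the text.
HTTP_RESPONSE = re.compile(
    rb"http/(0\.9|1\.0|1\.1) [1-5][0-9][0-9][\x09-\x0d -~]*"
    rb"(connection:|content-type:|content-length:)",
    re.IGNORECASE,
)


class FlowClassifier:
    """Hypothetical per-connection classifier, for illustration only."""

    def __init__(self) -> None:
        self.seen: Dict[FlowKey, int] = {}
        self.label: Dict[FlowKey, str] = {}

    def inspect(self, key: FlowKey, payload: bytes) -> Optional[str]:
        if key in self.label:          # connection already classified
            return self.label[key]
        count = self.seen.get(key, 0) + 1
        self.seen[key] = count
        if count > MAX_PACKETS:        # stop inspecting after N packets
            return None
        if HTTP_RESPONSE.search(payload):
            self.label[key] = "http"   # one hit labels the whole connection
            return "http"
        return None


key = ("10.0.0.2", "192.0.2.10", 51000, 80, "tcp")
clf = FlowClassifier()
first = clf.inspect(key, b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")
second = clf.inspect(key, b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n")
third = clf.inspect(key, b"<html>...</html>")
print(first, second, third)
```

The sketch uses a single key for both directions of the connection; a real implementation relies on connection tracking to associate requests and responses.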


The most difficult part is finding the most appropriate regular expression to define an application or protocol. The best way is to understand clearly how the application or protocol communicates over the network, using its specification (RFC, Request for Comments) or a network protocol analyzer (TCPDump or Wireshark).

2.4 Machine learning – C5.0 algorithm

C5.0 is a data-mining tool that uses input statistics to distinguish between different types of applications or protocols and generates classification rules (decision trees) to make predictions. According to [2][3], C5.0 can distinguish 3 main kinds of HTTP traffic: web browsing, file download and video/audio. However, that classification is mainly based on the content-type field in the HTTP header and on the application that the user runs. The objective of this internship is a firewall that can filter three kinds of HTTP traffic: file download, video/audio and game. For this purpose content-type alone is not enough, and the firewall has no knowledge of the application that the user runs. We therefore need C5.0 with a different set of attributes so that the firewall can recognize 3 kinds of HTTP traffic: video, file download and game.

2.5 HTTP overview

The Hypertext Transfer Protocol (HTTP) is the application-layer protocol behind the World Wide Web (WWW) and is used to transfer web pages across a network. HTTP provides a way for clients to begin the communication by requesting data, and for servers to respond to these requests [1][8].

An HTTP communication contains two types of message:

- Request message: used by the client to request data. It has 3 basic elements:
o Request line: specifies the request method, the resource location and the HTTP version.
o HTTP headers: provide additional information with which the client describes the request in more detail. They can be general headers, request headers or entity headers (if content exists).
o Content: optional, present in case of data upload.
- Response message: used by the server to respond to the client's request. It also has 3 basic elements:
o Status line: gives the summary of the response and contains the HTTP version, the status code and a brief description of the status code.
o HTTP headers: provide information that helps the client understand the response. They can be general headers, response headers or entity headers.
o Content: the HTTP data.

The HTTP entity headers are an important source of information about the HTTP content:

- Allow: lists the request methods supported by the specified resource
- Content-Encoding: specifies the encoding applied to the HTTP content. The encoding can be deflate, compress or gzip.
- Content-Length: the size of the content
- Content-Location: specifies the location (URI) of the content carried in the message
- Content-MD5: checksum of the content
- Content-Range: if the content is divided into parts, specifies which part is included in the message
- Content-Type: the kind of content
- Content-Disposition: indicates whether the content should be displayed (value: inline) or requires some action from the user to open it (value: attachment)
- Expires: the time after which the content is considered expired
- Last-Modified: the time at which the content was last modified on the server

In this internship, I mainly use the values of the HTTP entity headers as the basis of the classification.
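
To make the message structure concrete, the following Python sketch splits a raw response into status line, headers and content. It is a minimal illustration that assumes the whole message is available as one byte string and ignores details such as folded or repeated headers; `Response` and `parse_response` are hypothetical names, not part of any tool used in the internship.

```python
from typing import Dict, NamedTuple


class Response(NamedTuple):
    version: str
    status: int
    reason: str
    headers: Dict[str, str]  # header names are stored in lower case
    content: bytes


def parse_response(raw: bytes) -> Response:
    # An empty line separates the header section from the content.
    head, _, content = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line: HTTP version, status code, brief description.
    version, status, reason = (lines[0].split(" ", 2) + ["", ""])[:3]
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return Response(version, int(status), reason, headers, content)


raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: video/mp4\r\n"
       b"Content-Length: 4\r\n"
       b"\r\n"
       b"DATA")
resp = parse_response(raw)
print(resp.status, resp.headers["content-type"], resp.content)
```

The entity headers listed above, such as Content-Type and Content-Disposition, end up in the `headers` dictionary, which is the information the classification relies on.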


3 IMPLEMENTATION AND RESULT

3.1 HTTP analyzer

Figure 3-1: HTTP analyzer

Before examining the different classification methods and applying them to the firewall, we need to understand clearly how an application or service communicates over HTTP. The HTTP specification is helpful, but it is not enough to understand what a given application or service does internally, nor to obtain a pattern for l7-filter or statistics for machine learning. A network protocol analyzer such as Wireshark is well suited to this, but analyzing the behavior of an application or service over HTTP with it takes a long time. I therefore needed a tool that helps the administrator analyze the traffic and obtain the pattern for l7-filter or the statistics for the C5.0 algorithm.

Using jNetPcap, a Java library that provides functions for working with captured network packets, I developed a tool that helps the administrator analyze HTTP applications and services. The tool is called HTTP analyzer and has the following functions:

- Read a specific .pcap file
- Extract the HTTP flows from the .pcap file
- Compute the frequency of each HTTP header field and the frequency of each value of that field (the basis for finding a pattern for l7-filter)
- Compute the statistics of each HTTP flow and write them to a file that is used as input for the C5.0 algorithm
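
The frequency computation can be sketched as follows. The actual tool is written in Java on top of jNetPcap and reads .pcap files; the Python sketch below only illustrates the counting step, assumes the response headers have already been extracted into dictionaries, and uses made-up sample headers, not measured data. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def header_frequency(responses: List[Dict[str, str]]) -> Dict[str, float]:
    """Fraction of responses in which each header field appears."""
    counts: Counter = Counter()
    for headers in responses:
        # Count each field once per response, ignoring case.
        counts.update({name.lower() for name in headers})
    total = len(responses)
    return {name: n / total for name, n in counts.items()} if total else {}


def value_frequency(responses: List[Dict[str, str]], field: str) -> Counter:
    """How often each value of one header field occurs."""
    return Counter(
        value
        for headers in responses
        for name, value in headers.items()
        if name.lower() == field.lower()
    )


# Made-up sample headers, for illustration only.
video = [
    {"Content-Type": "video/mp4"},
    {"Content-Type": "video/x-flv", "Content-Length": "10"},
]
freq = header_frequency(video)
values = value_frequency(video, "content-type")
print(freq)
print(values.most_common())
```

Comparing such frequency tables between traffic classes is what points to a field that occurs in one class only and can therefore serve as a pattern.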

3.2 L7-filter expression for different applications

The goal of this internship is to differentiate three kinds of HTTP traffic (video, file download and game) from other HTTP traffic. Using the HTTP specification and the HTTP analyzer, I could find matching patterns for video and download, but not for game, for the following reasons:

- There are many technologies for building browser-based games.
- Each time a game is played, many components are loaded, and no specific component can be used as a matching pattern.

3.2.1 Implementing topology

[Diagram: inside machine, Non-standard firewall running l7-filter (two parts joined by an Ethernet non-IP link), Internet]

Figure 3-2: Implementing topology

To implement and test l7-filter, I only need 2 components: a PC as the client machine and a Non-standard firewall. L7-filter is installed on the inside part of the Non-standard firewall.

3.2.2 Video/audio

To watch online video from a video server, the client sends a request and the server responds with a message carrying the video content. To tell the client that the content is video or audio, the HTTP header of the message contains the “content-type” field with the value “video/…” for video or “audio/…” for audio.


The resulting regular expressions for online audio/video are:

- Video: http/(0\.9|1\.0|1\.1)[\x09-\x0d][1-5][0-9][0-9][\x09-\x0d -~]*(content-type: video)
- Audio: http/(0\.9|1\.0|1\.1)[\x09-\x0d][1-5][0-9][0-9][\x09-\x0d -~]*(content-type: audio)
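
The two expressions can be tried outside the firewall with Python's `re` module, as in the sketch below. Two assumptions are made and should be checked against the actual pattern files: the class that follows the HTTP version is written here with a blank, `[\x09-\x0d ]`, so that it matches the space in a status line such as "HTTP/1.1 200 OK", and matching is case-insensitive. The sample responses and the `media_kind` helper are made up for illustration.

```python
import re
from typing import Optional

# Assumption of this sketch: the separator class includes a blank.
SEPARATOR = rb"[\x09-\x0d ]"
ANY_TEXT = rb"[\x09-\x0d -~]*"  # whitespace controls plus printable ASCII
STATUS = rb"http/(0\.9|1\.0|1\.1)" + SEPARATOR + rb"[1-5][0-9][0-9]"

VIDEO = re.compile(STATUS + ANY_TEXT + rb"(content-type: video)", re.IGNORECASE)
AUDIO = re.compile(STATUS + ANY_TEXT + rb"(content-type: audio)", re.IGNORECASE)


def media_kind(payload: bytes) -> Optional[str]:
    """Return 'video', 'audio' or None for one packet payload."""
    if VIDEO.search(payload):
        return "video"
    if AUDIO.search(payload):
        return "audio"
    return None


samples = [
    b"HTTP/1.1 200 OK\r\nServer: demo\r\nContent-Type: video/mp4\r\n\r\n",
    b"HTTP/1.1 200 OK\r\nContent-Type: audio/mpeg\r\n\r\n",
    b"HTTP/1.1 200 OK\r\nContent-Type: application/octet-stream\r\n\r\n",
]
kinds = [media_kind(s) for s in samples]
print(kinds)
```

The third sample shows the limit of this approach: a video delivered with a generic content type is not recognized by these patterns.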

Then I applied this regular expression in the l7-filter configuration:

root@trustix /etc/l7-protocols# iptables -L FORWARD -n
Chain FORWARD (policy ACCEPT)
target  prot  opt  source     destination
DROP    all   --   0.0.0.0/0  0.0.0.0/0    LAYER7 l7proto video
ACCEPT  all   --   0.0.0.0/0  0.0.0.0/0

I tried to watch videos on different online video sites (Table 3-1) and obtained the results shown on the firewall and on the client in Figure 3-3. The browser can load the other components of the website but cannot load the video.

root@trustix /etc/l7-protocols# iptables -L FORWARD -n -v
Chain FORWARD (policy ACCEPT 10189 packets, 13M bytes)
pkts  bytes  target  prot  opt  in    out  source     destination
44    32213  DROP    all   --   *     *    0.0.0.0/0  0.0.0.0/0    LAYER7 l7proto video
6314  506K   ACCEPT  all   --   eth1  *    0.0.0.0/0  0.0.0.0/0

Figure 3-3: Result of l7-filter on the firewall and client machine


However, there is one special case of online video that l7-filter cannot filter with the above regular expressions: real-time streaming video. The reason is that this kind of website embeds a special player, which uses its own protocol, for example RTSP or RTMP, to get the video from the server (especially for online TV streaming).

Website              L7-filter  Note
Tv.zing.vn           Yes
Dailymotion.com      Yes
Phim3s.com           Yes
Youtube.com          Yes
Veoh.com             Yes
Kenh14.vn/video.chn  Yes
www.bing.com/videos  Yes
Xemphimon.com        Yes
Tv24.vn              No         Using RTMP
Movies.hdviet.com    No         Content-type: application/octet-stream

Table 3-1: List of video website

3.2.3 Download

For the download class, we cannot use the Content-Type field, because downloaded files come with many different Content-Type values (audio file: audio/, video file: video/, gzip file: application/gzip). Moreover, some download websites use the Content-Type application/octet-stream. I needed to find another pattern for file download with the help of the HTTP analyzer. The statistics of the HTTP header fields produced by the HTTP analyzer show that the Content-Disposition field appears only in download traffic.


[Bar chart: frequency (0% to 60%) of each HTTP entity header field in Download, Game and Video traffic]

Figure 3-4: Statistic of HTTP entity header field

Inspecting further with the HTTP analyzer, we can see the values of Content-Disposition in download traffic in the following figure:

Figure 3-5: List of value of Content-Disposition field

The specific expression for download is:

http/(0\.9|1\.0|1\.1)[\x09-\x0d][1-5][0-9][0-9][\x09-\x0d -~]*(content-disposition: attachment)
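The download expression can be tried on sample HTTP response heads before it is loaded into l7-filter. The sketch below uses Python's re module rather than the regular expression engine of l7-filter, so it is only an approximation. It assumes that matching is case-insensitive and that the whitespace class after the HTTP version is meant to include the space character (written here as [\x09-\x0d ]). The sample responses are made up for illustration.

```python
import re

# Assumption: same expression as above, with a space added to the first
# character class so that "HTTP/1.1 200 OK" can match in Python's re.
DOWNLOAD_PATTERN = re.compile(
    r"http/(0\.9|1\.0|1\.1)[\x09-\x0d ][1-5][0-9][0-9]"
    r"[\x09-\x0d -~]*(content-disposition: attachment)",
    re.IGNORECASE,  # assumption: the filter compares case-insensitively
)


def looks_like_download(http_response_head: str) -> bool:
    """Return True if the response head matches the download expression."""
    return DOWNLOAD_PATTERN.search(http_response_head) is not None


# Hypothetical response heads, not captured traffic.
download_head = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: application/octet-stream\r\n"
    "Content-Disposition: attachment; filename=\"setup.zip\"\r\n\r\n"
)
page_head = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n\r\n"
)

print(looks_like_download(download_head))
print(looks_like_download(page_head))
```

A response that carries Content-Disposition: inline is not matched, which is consistent with using the attachment value as the download marker.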

Verifying with l7-filter, we get the following result on the firewall and the client (figure 3-6).


root@trustix /etc/l7-protocols/extra# iptables -L FORWARD -n -v
Chain FORWARD (policy ACCEPT 18522 packets, 23M bytes)
pkts bytes target prot opt in out source destination
5 3953 DROP all -- * * 0.0.0.0/0 0.0.0.0/0 LAYER7 l7proto download
3392 363K ACCEPT all -- eth1 * 0.0.0.0/0 0.0.0.0/0

Figure 3-6: Result of l7-filter on client machine

3.3 Machine learning – C5.0 algorithm

It is difficult to find a specific pattern that matches game traffic (Section 3.2). Moreover,
there are many kinds of games built on different technologies, and each technology has its
own implementation and operation. I therefore examine another classification approach,
machine learning, to differentiate browser-based game traffic from other traffic. The
difficulty of this method is finding an appropriate set of attributes for the classification.
This internship proposes a hypothesis and verifies whether it can be used in firewall
technology.

3.3.1 Method overview

Many machine-learning algorithms can be used for classification, such as C4.5, the Naive
Bayes classifier and Bayesian networks. In this internship, I use a tool called See5/C5.0,
which implements C5.0, the successor of the C4.5 algorithm. The advantage of this tool is
that it can analyze substantial databases containing thousands of records and generate a
decision tree or a set of if-then rules that we can use to predict the class of other records.

The idea of using machine learning is described in the following steps:


- Using Wireshark to capture the communication each time I use a different
application/service
- Using HTTP analyzer to extract the HTTP flows and compute their statistics
- Using these statistics as the input vector for the C5.0 algorithm. The C5.0 algorithm
gives us two important pieces of information: the evaluation of the algorithm and the
decision tree/rule set used to predict the category of a new case
- Using the decision tree/rule set to implement a condition/matching module for iptables

[Diagram boxes: Packet capture (.pcap); HTTP analyzer/Wireshark; HTTP flow statistic; C5.0 algorithm; Result evaluation; Decision tree/rule set; Matching module]

Figure 3-7: Machine-learning method

However, if a firewall has to keep the whole flow in order to categorize and filter it, this
affects the performance, the consumed resources and the reaction time of the firewall. I
computed some statistics on the HTTP flows; the results are in figures 3-8 and 3-9. Figure
3-8 shows the average number of bytes in the first N packets of an HTTP flow. The larger the
number of bytes the firewall needs to process, the more resources it needs to store and
handle them. In addition, figure 3-9 shows the time needed to store the HTTP flow before
analyzing it. This time contributes to the reaction time of the firewall before it decides
whether to filter the flow. Hence, we need to reduce the number of packets examined in an
HTTP flow so that they still provide the right information to categorize the HTTP traffic
while preserving the performance of the firewall.
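The two quantities plotted in figures 3-8 and 3-9 can be computed per flow as sketched below. This is only an illustration, not the code of HTTP analyzer: the Packet record, its field names and the sample values are assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Packet:
    """Hypothetical packet record: capture time in seconds and size in bytes."""
    timestamp: float
    size: int


def first_n_cost(flow: List[Packet], n: int) -> Tuple[int, float]:
    """Return (bytes to store, seconds to wait) for the first n packets of a flow."""
    head = flow[:n]
    if not head:
        return 0, 0.0
    total_bytes = sum(p.size for p in head)
    waiting_time = head[-1].timestamp - head[0].timestamp
    return total_bytes, waiting_time


# Made-up flow of four packets.
flow = [Packet(0.00, 74), Packet(0.02, 66), Packet(0.05, 512), Packet(0.30, 1514)]
print(first_n_cost(flow, 3))
```

Averaging these two values over many captured flows, for each candidate N, would give curves of the kind shown in the two figures.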


Figure 3-8: Average number of bytes corresponding to the first N packets

Figure 3-9: Average time to store the first N packets of an HTTP flow

3.3.2 Obtaining training data

I have to classify 4 kinds of HTTP traffic: video, file download, game, normal web
browsing.

There are 2 input files for See5/C5.0:

- .names file: lists the classification attributes


- .data file: provides information on the training cases from which See5 will extract the
decision tree/rule set. The entry for each case consists of the values of all explicitly
defined attributes (defined in the .names file).

To collect the information for the .data file, I need the help of HTTP analyzer and
Wireshark. Using Wireshark, I capture the traffic while I access different kinds of HTTP
traffic and save it in .pcap files. Before using HTTP analyzer, I use a little trick to
select the right traffic to analyze. We already know that video traffic contains the
Content-Type value <video/> and download traffic contains the Content-Disposition value
<attachment>, so I filter this traffic before getting the statistics with HTTP analyzer.
Another trick concerns webpages that load many components from other sites (advertising
sites): I only choose the IP communication that carries the most traffic, as in the
following figure:

Figure 3-10: Obtaining data

This filtered communication is saved into a new .pcap file and becomes the input of HTTP
analyzer. Inside the communication, I extract only the HTTP traffic and analyze each HTTP
flow to provide the information for the C5.0 algorithm.


3.3.3 Classification attributes

The .names file defines the following attributes:

- http-ratio: continuous. The ratio of HTTP packets in the flow
- in-out: continuous. The ratio of inbound packets to outbound packets
- avr-packet-size: continuous. Average packet size
- avr-content-length: continuous. Average content-length of the objects in one HTTP flow
- GET-request-ratio: continuous. The ratio of "GET" requests over all HTTP requests
- POST-request-ratio: continuous. The ratio of "POST" requests over all HTTP requests
- content-disposition: inline, attachment. Value of Content-Disposition that appears in the
HTTP flow
- video: continuous. The ratio of the content-type video in the HTTP flow
- audio: continuous.
- text: continuous.
- image: continuous.
- multipart: continuous.
- application/x-shockwave-flash: continuous.
- application/javascript: continuous.
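For illustration, the attribute list above could be written out as a See5/C5.0 names file, and one training case as a line of the .data file, as follows. This is a sketch under assumptions: it assumes the usual See5/C5.0 layout (classes listed on the first line, one attribute per line, the class as the last value of each case, "?" for an unknown value), and the sample values are made up, not taken from the training data.

```python
# Classes and attributes as listed in this section.
CLASSES = ["video", "download", "game", "normal"]
ATTRIBUTES = [
    ("http-ratio", "continuous"),
    ("in-out", "continuous"),
    ("avr-packet-size", "continuous"),
    ("avr-content-length", "continuous"),
    ("GET-request-ratio", "continuous"),
    ("POST-request-ratio", "continuous"),
    ("content-disposition", "inline, attachment"),
    ("video", "continuous"),
    ("audio", "continuous"),
    ("text", "continuous"),
    ("image", "continuous"),
    ("multipart", "continuous"),
    ("application/x-shockwave-flash", "continuous"),
    ("application/javascript", "continuous"),
]


def build_names_file() -> str:
    """Build the names file text: the class list, then one attribute per line."""
    lines = [", ".join(CLASSES) + "."]
    lines += [f"{name}: {kind}." for name, kind in ATTRIBUTES]
    return "\n".join(lines) + "\n"


def build_data_line(values: dict, label: str) -> str:
    """Build one training case: attribute values in order, then the class."""
    # Assumption: "?" marks an unknown value (e.g. no Content-Disposition seen).
    fields = [str(values.get(name, "?")) for name, _ in ATTRIBUTES]
    return ",".join(fields + [label])


# Hypothetical values for one download flow.
sample = {"http-ratio": 0.4, "in-out": 1.8, "avr-packet-size": 1210,
          "content-disposition": "attachment"}
print(build_names_file())
print(build_data_line(sample, "download"))
```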

As the list shows, there are 2 types of attribute:

- Attributes based on the statistics of the flow: http-ratio, in-out, avr-packet-size.
- Attributes based on the statistics of the HTTP header fields. As described in the
technology background, the entity headers carry information about the content of the
HTTP transaction. Most of these attributes are values of the Content-Type field because
many objects are loaded within an HTTP flow and each object has its own content type.
Giving the percentage of each content type within an HTTP flow may help us find the
characteristics of the flow.
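The header-based attributes can be derived from the list of HTTP messages of a flow, as in the sketch below. It is an illustration rather than the HTTP analyzer code: the input format (one method per request, one Content-Type value per response) and the choice of dividing by the number of responses are assumptions.

```python
from collections import Counter
from typing import Dict, List

# Content-Type families used as attributes in the list above.
TOP_LEVEL_TYPES = ["video", "audio", "text", "image", "multipart"]
FULL_TYPES = ["application/x-shockwave-flash", "application/javascript"]


def header_attributes(methods: List[str], content_types: List[str]) -> Dict[str, float]:
    """Compute request-method and Content-Type ratios for one HTTP flow."""
    attrs: Dict[str, float] = {}
    n_req = len(methods) or 1          # avoid division by zero on empty flows
    n_resp = len(content_types) or 1
    method_count = Counter(m.upper() for m in methods)
    attrs["GET-request-ratio"] = method_count["GET"] / n_req
    attrs["POST-request-ratio"] = method_count["POST"] / n_req

    # Drop parameters such as "; charset=utf-8" and normalize case.
    cleaned = [ct.split(";")[0].strip().lower() for ct in content_types]
    for family in TOP_LEVEL_TYPES:
        attrs[family] = sum(ct.startswith(family + "/") for ct in cleaned) / n_resp
    for full in FULL_TYPES:
        attrs[full] = cleaned.count(full) / n_resp
    return attrs


# Made-up flow: three GETs, one POST, four responses.
attrs = header_attributes(
    ["GET", "GET", "POST", "GET"],
    ["text/html; charset=utf-8", "image/png", "image/jpeg", "application/javascript"],
)
print(attrs)
```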


3.3.4 Result

The following two sections give examples of the results obtained when using the C5.0
algorithm to categorize the HTTP traffic. Each result contains two parts:

- The decision tree/rule set
- The evaluation on the training data: it shows the size of the decision tree or the number
of rules, and the error rate when the decision tree is applied to the training data.

Then, the third section discusses the results of different trials with different sets of
training cases.

3.3.4.1 Classification with whole HTTP flow

Evaluation on the training data: The error rate is 1.3% and the misclassified cases belong
to the game and normal classes.

Figure 3-11: Evaluation on the training data in case of whole flow

Decision tree: We can use this decision tree to predict the class for any HTTP traffic.


Figure 3-12: Decision tree within the whole flow

To verify the error of the decision tree, I apply it to a set of test cases (figure 3-12).
These cases belong to the game class. Of the 30 cases of game traffic, 3 are classified as
normal traffic, so the prediction error rate on game traffic is 10%.

Figure 3-13: Prediction using decision tree


3.3.4.2 Classification with first 50 packets of HTTP flow

Evaluation on the training data: The error rate is 2.2% and the misclassified cases belong
to the game and normal classes.

Figure 3-14: Evaluation on training data in case of first 50 packets

We also have the decision tree based on the training data.

Figure 3-15: Decision tree within first 50 packets

For this case, I also apply the decision tree to a set of test cases from the game class.
Of the 32 cases of game traffic, 5 are classified as normal traffic, so the prediction error
rate on game traffic is 15.6%.
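The test error rates quoted in sections 3.3.4.1 and 3.3.4.2 follow directly from the counts of misclassified game flows. A minimal check, where the function name is only illustrative:

```python
def error_rate(misclassified: int, total: int) -> float:
    """Percentage of test cases assigned to the wrong class, rounded to one decimal."""
    return round(100.0 * misclassified / total, 1)


print(error_rate(3, 30))   # whole flow: 3 of 30 game flows predicted as normal
print(error_rate(5, 32))   # first 50 packets: 5 of 32 game flows predicted as normal
```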


3.3.4.3 Discussion

In this internship, I examine four classes of HTTP traffic (video, download, game and
normal) within 230 cases of HTTP flow. These training cases contain 38 cases of video,
27 cases of file download, 51 cases of game and 113 cases of others. As discussed in
section 3.3.1, we need to reduce the number of examined packets in an HTTP flow so that
they still provide the right information to categorize the HTTP traffic while preserving
the performance of the firewall. To do that, for each class of traffic, I take the
classification information from the first N packets of the HTTP flow. For each value of N,
I use the boost option of See5/C5.0 with 10 trials to get the best decision tree with the
lowest error rate. The results are in the following table:

N            Best error rate (%)
Whole flow   1.3
50           2.2
35           3
30           4.6
25           5.4

Table 3-2: The error rate table

From table 3-2, we see that the error rate increases as N decreases. Based on this table
and the statistics in figure 3-8, and depending on the capacity of the machine that hosts
the firewall and the error tolerance, we can choose an appropriate value of N for the HTTP
flow. Moreover, we need more experiments with other sets of attributes and more training
cases. This may be clarified in future work, together with the implementation and
performance analysis of this classification technique in the Non-standard firewall.
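The trade-off can be made concrete with the error rates of Table 3-2. The sketch below picks the smallest N whose error rate stays within a given tolerance. The function name and the tolerance values are assumptions for illustration; the capacity of the machine is not modelled.

```python
from typing import Optional

# Best error rate (%) for each N, copied from Table 3-2.
# The whole-flow row is left out because it has no packet limit.
ERROR_RATE_BY_N = {50: 2.2, 35: 3.0, 30: 4.6, 25: 5.4}


def smallest_n_within(tolerance_percent: float) -> Optional[int]:
    """Return the smallest N whose error rate does not exceed the tolerance."""
    candidates = [n for n, err in ERROR_RATE_BY_N.items() if err <= tolerance_percent]
    return min(candidates) if candidates else None


print(smallest_n_within(5.0))
print(smallest_n_within(2.0))
```

When no listed N satisfies the tolerance, the function returns None, which corresponds to keeping the whole flow or accepting a higher error rate.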

3.3.5 Applying result to match module for iptables

This internship provides an idea for building a matching module for iptables based on the
result of the classification method above and leaves the implementation to future work.
This module can be implemented on any kind of Linux firewall, not only the Non-standard
firewall.

This module contains three processes and needs three inputs, as in figure 3-16:

- Three inputs:
o TCP flow
o Decision tree from the result above
o Name of the class that we want to match
- Three processes:
o HTTP flow filter: chooses the HTTP flows for the next process
o Calculate the statistic: calculates the value of each attribute in section 3.3.3
o Classification mechanism: uses the decision tree/rule set to determine the class of
the HTTP flow
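The three processes can be prototyped in user space before any kernel module is written. The sketch below is hypothetical throughout: the way an HTTP flow is recognised, the two toy attributes and the threshold of the placeholder tree are invented for the example and do not come from the C5.0 result.

```python
from typing import Callable, Dict, List, Optional

Attributes = Dict[str, float]
DecisionTree = Callable[[Attributes], str]


def is_http_flow(payloads: List[bytes]) -> bool:
    """HTTP flow filter: keep flows whose first payload starts like an HTTP request."""
    methods = (b"GET ", b"POST ", b"HEAD ", b"PUT ", b"DELETE ", b"OPTIONS ")
    return bool(payloads) and payloads[0].startswith(methods)


def calculate_statistic(payloads: List[bytes]) -> Attributes:
    """Calculate the statistic: only two toy attributes are computed here."""
    sizes = [len(p) for p in payloads]
    http_like = sum(b"HTTP/1." in p for p in payloads)
    return {
        "avr-packet-size": sum(sizes) / len(sizes),
        "http-ratio": http_like / len(payloads),
    }


def toy_tree(attrs: Attributes) -> str:
    """Placeholder decision tree; the threshold is invented for the example."""
    return "download" if attrs["avr-packet-size"] > 800 else "normal"


def match(payloads: List[bytes], tree: DecisionTree, wanted_class: str) -> Optional[bool]:
    """Return True/False for HTTP flows, None when the flow is not HTTP."""
    if not is_http_flow(payloads):
        return None
    return tree(calculate_statistic(payloads)) == wanted_class


flow = [b"GET /file.zip HTTP/1.1\r\nHost: example.org\r\n\r\n", b"x" * 1400, b"x" * 1400]
print(match(flow, toy_tree, "download"))
```

Passing the tree as a function keeps the matching logic independent of the particular decision tree, so a tree generated from another training set could be swapped in.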

[Diagram: the TCP flow passes through the HTTP flow filter, Calculate statistic and Classification mechanism (which takes the decision tree); the resulting class name is compared with the requested class name, giving Yes/No]

Figure 3-16: Matching module for iptables


4 CONCLUSION AND FUTURE WORK

4.1 Internship conclusion

This internship presents two different approaches to implementing application filtering on
the Non-standard firewall. The first one is pattern matching with l7-filter and iptables.
This technique is successfully implemented on the Non-standard firewall. With the help of
HTTP analyzer, the tool I have made, I can apply l7-filter to filter 2 kinds of HTTP
traffic: video and file download. Because of the difficulty of finding a regular expression
for game traffic (Section 3.2), I turned to another approach: machine learning with the
C5.0 algorithm (Section 3.3). With this approach, I only provided the method to classify
the HTTP traffic (Section 3.3.1). From the result of this method, I proposed an idea for
building a match module for iptables (Section 3.3.5).

After this internship, I have gained deep knowledge in many fields: Linux firewalls, HTTP,
application filtering and the C5.0 machine-learning algorithm. Besides the technical
knowledge, the working environment at ISeLAB has helped me improve many important skills:
(1) expressing my ideas clearly and (2) working in a team. These skills are very important
for my professional development.

4.2 Future work

As stated in sections 3.3.4 and 3.3.5, to have a complete application filtering feature, we
need more work on the following tasks:

- Examining different sets of classification attributes
- Increasing the number of training cases to get a better evaluation
- Creating a new match module for iptables based on the decision tree/rule set from the
C5.0 algorithm
- Analyzing the performance of the module and comparing it with l7-filter to decide the
classification method for application filtering on the Non-standard firewall


REFERENCE
[1] Leon Shklar, Richard Rosen. "Birth of the World Wide Web: HTTP." Web Application
Architecture: Principles, Protocols and Practices, pp. 32-68. John Wiley & Sons Ltd, 2003.

[2] Tomasz Bujlow, Tahir Riaz, Jens Myrup Pedersen. "A method for classification of
network traffic based on C5.0 Machine Learning Algorithm." 2012 International Conference
on Computing, Networking and Communications (ICNC), pp. 237-241, Jan. 30 - Feb. 2, 2012.

[3] Tomasz Bujlow, Tahir Riaz, Jens Myrup Pedersen. "Classification of HTTP traffic
based on C5.0 Machine Learning Algorithm." 2012 IEEE Symposium on Computers and
Communications (ISCC), pp. 882-887, July 1-4, 2012.

[4] Bert Hubert. Linux Advanced Routing & Traffic Control HOWTO. 29/10/2003.

[5] Lucian Gheorghe. "Layer 7 filtering." Designing and Implementing Linux Firewalls
and QoS using netfilter, iproute2, NAT, and L7-filter, pp. 119-136. Packt Publishing, 2006.

[6] Chris Sinclair, Lyn Pierce, Sara Matzner. "An application of machine learning to
network intrusion detection." Proceedings of the 15th Annual Computer Security
Applications Conference (ACSAC '99), pp. 371-377, 1999.

[7] Sebastian Zander, Thuy Nguyen, Grenville Armitage. "Automated traffic classification
and application identification using machine learning." The IEEE Conference on Local
Computer Networks, 30th Anniversary, pp. 250-257, Nov. 17, 2005.

[8] David Gourley, Brian Totty, Marjorie Sayer, Anshu Aggarwal, Sailu Reddy. "Chapter
15. Entities and Encodings." HTTP: The Definitive Guide. pp. 317-342. September 2002

[9] L7-filter Kernel Version HOWTO. http://l7-filter.sourceforge.net/HOWTO-kernel

[10] JNetPcap Open source | Protocol analysis SDK. http://jnetpcap.com/

[11] Netfilter/iptables project. http://www.netfilter.org/

[12] L7-filter Supported Protocols. http://l7-filter.sourceforge.net/protocols


[13] The netfilter.org "libnetfilter_conntrack" project.
http://www.netfilter.org/projects/libnetfilter_conntrack/index.html

[14] NGUYEN Anh Dung. Implementing and Testing Non-Standard Firewall. 2009


Appendix 1: Implementation of l7-filter

Installation

Getting the source code from http://l7-filter.clearfoundation.com/

root@trustix /usr/src# wget http://download.clearfoundation.com/l7-filter/netfilter-layer7-v2.22.tar.gz

Extracting the compressed file

root@trustix /usr/src# tar -zxvf netfilter-layer7-v2.22.tar.gz

Downloading kernel source if it does not exist.

Changing the directory to the kernel source and applying the layer-7 patch to the kernel
source.

root@trustix /usr/src/kernel-source-2.6.19.7-3tr# patch -p1 < ../netfilter-layer7-v2.22/for_older_kernels/kernel-2.6.18-2.6.19-layer7-2.9.patch

Configuring the kernel using one of these commands: make config, make menuconfig or
make xconfig. The following options need to be enabled:

- Code maturity level options | Prompt for development and/or incomplete code/drivers
- Netfilter (Device Drivers | Networking support | Networking Options | Network
packet filtering)
- Connection tracking (Network packet filtering | IP: Netfilter Configuration |
Connection tracking)
- Connection tracking flow accounting and IP tables support (on the same screen)
- Layer 7 match support


Figure 0-1: Kernel configuration

Compiling and installing using these commands: make, make modules, make
modules_install and make install

Downloading the source of iptables from netfilter.org

root@trustix /usr/src# wget http://www.netfilter.org/projects/iptables/files/iptables-1.4.13.tar.bz2

Extracting the compressed file


root@trustix /usr/src# tar xjvf iptables-1.4.13.tar.bz2

Copying 2 files, libxt_layer7.c and libxt_layer7.man, from the folder
netfilter-layer7-v2.22/1.4.3forward-for-kernel-2.6.20forward into the folder extensions/ of
iptables

Configuring and installing iptables with the following steps:

- "./configure --with-ksource=/usr/src/kernel-source-2.6.19.7-3tr/"
(the path to the directory containing the source of the kernel that we have just
configured and installed)
- "make"
- "make install"


Downloading the protocol set

root@trustix /usr/src# wget http://download.clearfoundation.com/l7-filter/l7-protocols-2009-05-28.tar.gz

Extracting and copying them into the folder /etc/l7-protocols

root@trustix /usr/src# tar -zxvf l7-protocols-2009-05-28.tar.gz

Before we can use l7-filter, we need to check whether it is successfully installed by using
the command modinfo (the module name is ipt_layer7)

root@trustix ~# modinfo ipt_layer7
filename: /lib/modules/2.6.19.7-3tr/kernel/net/ipv4/netfilter/ipt_layer7.ko
author: Matthew Strait <quadong@users.sf.net>, Ethan Sommer <sommere@users.sf.net>
license: GPL
description: iptables application layer match module
version: 2.0
vermagic: 2.6.19.7-3tr mod_unload 586
depends: ip_conntrack
srcversion: C5460962D1CE10F665D072A
parm: maxdatalen:maximum bytes of data looked at by l7-filter (int)

Checking whether the modules ipt_layer7 and ip_conntrack are loaded; if not, we use
modprobe to load them

root@trustix ~# modprobe ipt_layer7
root@trustix ~# lsmod
Module Size Used by
ipt_layer7 10376 0
ipv6 208992 12
iptable_filter 2432 0
iptable_nat 6020 0
iptable_mangle 2432 0
ip_nat_snmp_basic 9988 0
ip_nat_pptp 4100 0
ip_nat_irc 2176 0
ip_nat_ftp 2816 0
ip_nat 14252 4 iptable_nat,ip_nat_pptp,ip_nat_irc,ip_nat_ftp
ip_conntrack_pptp 8080 1 ip_nat_pptp
ip_conntrack_irc 5520 1 ip_nat_irc
ip_conntrack_ftp 6160 1 ip_nat_ftp
ip_conntrack 38796 10 ipt_layer7,iptable_nat,ip_nat_snmp_basic,ip_nat_pptp,ip_nat_irc,ip_nat_ftp,ip_nat,ip_conntrack_pptp,ip_conntrack_irc,ip_conntrack_ftp

Using l7-filter as a match module in iptables with the option -m and the sub-option
--l7proto to identify the protocol that needs to be matched.

root@trustix /usr/src# iptables -A INPUT -m layer7 --l7proto http
root@trustix /usr/src# iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
all -- anywhere anywhere LAYER7 l7proto http

Chain FORWARD (policy ACCEPT)
target prot opt source destination

Chain OUTPUT (policy ACCEPT)
target prot opt source destination

Usage

The protocol I use to test is HTTP. Each protocol has a definition file with the extension
.pat. The content of http.pat is as follows:

http

http/(0\.9|1\.0|1\.1) [1-5][0-9][0-9] [\x09-\x0d -~]*(connection:|content-type:|content-length:|date:)|post [\x09-\x0d -~]* http/[01]\.[019]

The content of this definition file has 2 lines:

- First line: the name of the protocol, which must be the same as the name of the file.
- Second line: the regular expression that is used to match the flow of HTTP traffic.
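A definition file with this two-line layout can be read as sketched below. The parser is a hypothetical helper, not part of l7-filter. It assumes that blank lines and lines starting with # are to be skipped, and the shortened pattern is only sample input.

```python
from typing import Tuple


def parse_pat(text: str) -> Tuple[str, str]:
    """Return (protocol name, regular expression) from the text of a .pat file."""
    # Assumption: comment lines start with '#' and carry no definition data.
    lines = [ln.strip() for ln in text.splitlines()]
    useful = [ln for ln in lines if ln and not ln.startswith("#")]
    if len(useful) < 2:
        raise ValueError("expected a protocol name line and a pattern line")
    return useful[0], useful[1]


def name_matches_file(filename: str, text: str) -> bool:
    """Check that the protocol name equals the file name without '.pat'."""
    name, _ = parse_pat(text)
    return filename == name + ".pat"


sample = "# shortened sample\nhttp\nhttp/(0\\.9|1\\.0|1\\.1) [1-5][0-9][0-9]\n"
print(parse_pat(sample))
print(name_matches_file("http.pat", sample))
```

Such a check can catch a mismatch between the file name and the protocol name before the pattern is referenced with --l7proto.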


Using the following command to apply filtering to HTTP traffic

root@trustix /usr/src# iptables -A INPUT -m layer7 --l7proto http

Testing the filtering by loading the Google page

root@trustix /usr/src# wget http://google.com
--09:25:07-- http://google.com/
=> `index.html'
Resolving www.google.com.vn... 74.125.128.94, 2404:6800:4005:c00::5e
Connecting to www.google.com.vn|74.125.128.94|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
[ <=> ] 11,804 --.--K/s
09:25:08 (275.49 KB/s) - `index.html' saved [11804]

Showing the result, we see that there is a matched flow on the iptables.

root@trustix /usr/src# iptables -L -n -v
Chain INPUT (policy ACCEPT 67 packets, 18306 bytes)
pkts bytes target prot opt in out source destination
15 14888 all -- * * 0.0.0.0/0 0.0.0.0/0 LAYER7 l7proto http

The two following sections provide the specific regular expression to define different
kinds of HTTP traffic.


Appendix 2: Internship registration information

Student

Surname: TRẦN First name: Thị Dung

Email : trandung1369@gmail.com Tél. : +84907116939

Program: Master Informatics

Internship location

Company or laboratory: Information Security Lab - Information Technology Park,
Vietnam National University Ho Chi Minh City (VNU-ITP)

Address: Community 6, Linh Trung Ward, Thu Duc Dist., HCM City, Vietnam

Internship supervisor:

Surname: TRINH First name: Ngọc Minh

Email : minhtn@isepro.vn Tél :

Internship tutor (if different):

Surname: First name:

Email : Tél :


Terms

Duration: from June 1st to November 30th

Allowance (optional): ……………. € / month

Internship subject

Title: Research and implement the application filtering in Non-standard Firewall

Keywords: Firewall, non-standard, application filtering

Detailed description of the subject:

Specification

Nowadays, firewalls are important to many organizations because they can protect
their network from many kinds of attack. They work at the border which separates the
inside and outside network to protect the inside network. If the firewall is hacked or
attacked, it may lead to the intrusion into the inside network.

Meanwhile, ISeLAB has developed a proprietary firewall called Non-standard Firewall. The
idea of Non-standard Firewall is to divide the firewall into two parts: one (the
inside-part) connects to the inside network and the other (the outside-part) connects to
the outside network. The two parts are interconnected using a non-standard (here meaning
non-IP, layer 2 only) connection so that if the part connected to the outside is
compromised, a hacker will find it much more difficult to gain privileges on the inside
part in order to hack deeper into the network or shut down the filter running on the
inside-part of the firewall.

One of the advanced features of Non-standard Firewall is application filtering. We know
that a normal stateless or stateful firewall filters traffic by matching TCP packets
against the source or destination port (e.g. port 80, which is the standard HTTP port).
However, some applications or services such as web servers can be configured to use any
port (8080 or 8800) besides the default port, and some unwanted traffic uses port 80 to
evade the firewall, so port-based filters do not work for that traffic. Therefore, we need
higher-level filtering that can filter the network traffic generated by an application
based on the information in the application header, regardless of the protocol or port it
uses at Layer 4. Moreover, more and more applications run on the web, such as web games and
online video, which the administrator does not want employees to access during working
hours, in contrast to normal webpages. The internship will focus on filtering network
traffic generated by an application regardless of the protocol or port it uses at Layer 4,
especially web applications.

Objective

- Research on the Non-standard Firewall

+ Research on TCP/IP implementation in Non-standard Firewall

+ Research on application filtering

+ Research on Netfilter, iptables, layer 7 filters.

+ Identifying a method to build application filtering upon iptables

+ Research on web applications: video online and web game and how to filter
these kinds of network traffic

- Implementing the application filter on Non-standard Firewall

+ Installing and configuring the application filter on Non-standard firewall

+ Applying filter on basic application: HTTP, SMTP

+ Applying filter on web application: web game, video online…


Facilities and equipment

- Non-standard Firewall

Expected results

- Implement an application filtering complying correctly with the Non-standard Firewall.

- Test and deploy application filtering with specific kind of network traffic.

- Take and analyze the statistic from the test.
