MAnagement of Security information and events in Service InFrastructures

MASSIF
FP7-257475

D3.2.1 - Scenarios analysis and external languages specification

Activity: A3
Workpackage: WP3.2
Due Date: December 2010
Submission Date: 2011-02-04
Main Author(s): Hervé Debar (TSP)
Version: v1.0 (Rev: 92)
Status: Final
Dissemination Level: CO
Nature:
Keywords: security languages, event languages, alert languages
Reviewers: Luigi Romano (CINI), Claudio Soriente (UPM)

Part of the Seventh Framework Programme
Funded by the EC - DG INFSO

MASSIF - FP7-257475
D3.2.1 - Scenarios analysis and external languages specification

Version history
Rev    Date         Author               Comments
V0.1   2011-01-14   Hervé Debar          First draft for review
V1.0   2011-02-03   Hervé Debar          Final version after 2nd review cycle
V1.0   2011-02-04   Elsa Prieto (Atos)   Final review and delivery

2011 by MASSIF Consortium


Glossary of Acronyms
Abbreviation   Meaning
BSCW           Be Smart - Cooperate Worldwide
CEF            Common Event Format
CLF            Common Log Format
CSS            Cascading Style Sheets
DoW            Description of Work
EC             European Commission
EU             European Union
FP7            Seventh Framework Programme
FTP            File Transfer Protocol
IETF           Internet Engineering Task Force
LEA            Log Extraction API
MASSIF         MAnagement of Security information and events in Service InFrastructures
MSS            Managed Security Service
MSSP           Managed Security Service Provider
OASIS          Organization for the Advancement of Structured Information Standards
ODBC           Open Database Connectivity
PU             Public Usage
R&D            Research & Development
RSS            Really Simple Syndication
SCP            Secure Copy
SFTP           Secure File Transfer Protocol
SIEM           Security Information and Event Management
SNMP           Simple Network Management Protocol
SSH            Secure Shell
WMI            Windows Management Instrumentation
W3C            World Wide Web Consortium


Executive Summary
Deliverable D3.2.1 is one of the first technical productions of the MASSIF project. The description of
work specifies that this document is an analysis of input and output formats from use case scenarios,
and a specification of common message formats for these data streams. This document therefore has
two objectives. The first is to enumerate the data formats and models that the project partners have
used in SIEM-related projects. The second is to provide a first glimpse at the use cases from a data
point of view, which will spread knowledge and understanding of these use cases among partners and
provide a first evaluation of the importance of the aforementioned data formats. The document consists
of two parts: Alert and Event Languages, which describes security alerts and events, and Use-case
specific data streams, which describes log formats specific to the proposed use cases. The document
concludes with an analysis highlighting several characteristics shared by these languages and event
formats, among which are the simplicity of an information representation that must be easily readable,
timestamping, and the modularity of the format structure.


Contents
1 Introduction
1.1 Deliverable objectives
1.2 MASSIF architecture sketch

2 Alert and Event Languages
2.1 Languages selection rationale
2.1.1 Analysis of Commercial SIEMs
2.1.2 Presentation of log sources selection
2.2 The Common Event Format (CEF)
2.2.1 Reference
2.2.2 Objectives
2.2.3 Structure
    Structure overview
    Links with other data formats
    Relationship with MASSIF
2.2.4 Critical assessment of the format
    Advantages
    Issues
    Uses
2.3 The Common Log Format (CLF)
2.3.1 Reference
2.3.2 Objectives
2.3.3 Structure
    Structure overview
    Links with other data formats
    Relationship with MASSIF
2.3.4 Critical assessment of the format
    Advantages
    Issues
    Uses
2.4 The Intrusion Detection Message Exchange Format (IDMEF)
2.4.1 Reference
2.4.2 Objectives
2.4.3 Structure
    Structure overview
    Links with other data formats
    Relationship with MASSIF
2.4.4 Critical assessment of the format
    Advantages
    Issues
    Uses
2.5 InterFace to Metadata Access Point (IF-MAP)
2.5.1 Reference
2.5.2 Objectives
2.5.3 Structure
    Structure overview
    Links with other data formats
    Relationship with MASSIF
2.5.4 Critical assessment of the format
    Advantages
    Issues
    Uses
2.6 Incident Object Description and Exchange Format (IODEF)
2.6.1 Reference
2.6.2 Objectives
2.6.3 Structure
    Structure overview
    Links with other data formats
    Relationship with MASSIF
2.6.4 Critical assessment of the format
    Advantages
    Issues
    Uses
2.7 IP Flow Information Export (ipfix)
2.7.1 Reference
2.7.2 Objectives
2.7.3 Structure
    Structure overview
    Links with other data formats
    Relationship with MASSIF
2.7.4 Critical assessment of the format
    Advantages
    Issues
    Uses
2.8 The Syslog Format
2.8.1 Reference
2.8.2 Objectives
2.8.3 Structure
    Structure overview
    Links with other data formats
    Relationship with MASSIF
2.8.4 Critical assessment of the format
    Advantages
    Issues
    Uses
2.9 Windows Management Instrumentation (WMI)
2.9.1 Reference
2.9.2 Objectives
2.9.3 Structure
    Structure overview
    Links with other data formats
    Relationship with MASSIF
2.9.4 Critical assessment of the format
    Advantages
    Issues
    Uses
2.10 WS-Eventing and WS-Notification
2.10.1 Objectives
2.10.2 Structure
    Architecture
    Function
    Delivery mode
    Filters
    Links with other data formats
    Relationship with MASSIF
2.10.3 Advantages of the formats

3 Use-case specific data streams
3.1 Olympic Games Scenario
3.1.1 Motivation and description
3.1.2 Novell Sentinel Interface: Syslog data format
    Description
    Advantages
    Drawbacks and issues
    Examples
3.1.3 Novell Sentinel Interface: LEA API
    Description
    Drawbacks and issues
3.2 Mobile Money Transfer Service scenario
3.2.1 Motivation and description
3.2.2 Mobile Money Service: proprietary data format
3.3 Managed Enterprise Service Infrastructures scenario
3.3.1 Motivation and description
3.3.2 Tivoli TSOM interface: SNMP data format
3.3.3 Tivoli TSOM interface: Syslog data format
3.4 Critical Infrastructure Process Control (Dam) scenario
3.4.1 Motivation and description
3.4.2 Dam scenario: Modbus data format
    Structure overview (Modbus)
    Modbus Advantages
    Issues (Modbus)
    Modbus Example
3.4.3 Dam scenario: WSN and CTP data formats
    WSN Example
    Advantages (WSN)
    Issues (WSN)
3.4.4 Links with other data formats

4 Analysis and Conclusion
4.1 Analysis of alert and event languages
4.2 Analysis of use case specific data streams

List of Figures
1.1 MASSIF Blueprint Architecture (proposed)
2.1 An example metadata graph
2.2 Windows Management Infrastructure architecture data flow
3.1 Log example
3.2 General Modbus Frame
3.3 Modbus transaction (error free)
3.4 Modbus transaction (exception response)

List of Tables
2.1 RSA Envision collectors summary
2.2 Included log sources summary
2.3 Eliminated log sources summary
3.1 Money Transfer Message Elements
3.2 TIVOLI TSOM SNMP Trap content example

Chapter 1

Introduction

1.1 Deliverable objectives


This deliverable is one of the first technical productions of the MASSIF project. The description of work
specifies that this document is an analysis of input and output formats from use case scenarios, and a
specification of common message formats for these data streams. This document therefore has two
objectives:
- enumerate data formats and models that have been used by the partners of the project in
  SIEM-related projects, in order to give a broad overview of the richness of the field, and prepare
  the definition of the ontology (MASSIF Deliverable 3.2.2);
- provide a first glimpse at use cases, from a data point of view, that will spread knowledge and
  understanding among partners on these use cases, and provide a first evaluation of the importance
  of the aforementioned data formats.
As one can see from these two items, data is at the core of the MASSIF project, since Security
Information and Event Management is, at heart, about gathering data, analyzing it, and making informed
decisions in the ICT security domain. With respect to data gathering, this document concentrates on
the syntax and semantics of the information, regardless of location or actual transport mechanisms.
Resilient event collection is handled in workpackage 3.1 (scalable event processing engine). The only
assumption of this document is that whatever format is chosen will be available without restrictions
through WP3.1. With respect to data analysis, methods will be studied in WP3.3 (event collection,
parsing and propagation) on the sensor side and WP3.4 (event filtering, aggregation, abstraction, and
correlation) on the SIEM platform side. We will thus focus on the syntax and semantics of as many data
formats as the project's partners consider pertinent.
In accordance with the objectives of the document, we have divided it into two main parts, as
follows:
Alert and Event Languages: Chapter 2 gathers all formats and languages that represent transient
information, that is, information that is time-driven and that has to be handled by the MASSIF SIEM
system to manage the security status of the monitored system. In this area, we will focus on
languages that are considered to have standard status, either through their publication mechanism
or because of their widespread use.
Use-case specific data streams: Chapter 3 describes the data stream formats of the use cases. We are
particularly interested in describing the specificities of the content of the data streams, such as
the way they build syslog message contents, as most of the syntax should be covered in the previous
chapters.

1.2 MASSIF architecture sketch


The work on data stream analysis has to be considered in relationship with the definition of the MASSIF
platform architecture. While the mandate of this document is not to specify an architecture for the
MASSIF project, we introduce the discussion with a very simple architecture sketch, shown in figure 1.1.

Figure 1.1: MASSIF Blueprint Architecture (proposed)


Figure 1.1 separates the world into two parts: the MASSIF SIEM Platform plane and the monitored
business system plane. The former is under full control of the project, and the latter should be left as
undisturbed as possible, or at least the capabilities required by the MASSIF SIEM system in terms of
monitoring and countermeasures should be fixed and acceptable to the business system owners.
Within the monitored system, we have separated three functions: intrusion detection sensors, business
process components, and access control. The primary function of business process components is
to service users; however, they also have auditing capabilities in the form of log files, and minimal policy
enforcement capabilities such as startup and shutdown. The primary function of sensors is to detect and
report sensitive events, either attacks or anomalies. Access control and identity management are
security policy components, whose interaction with the MASSIF SIEM system will be the primary means
for security response. In the current security literature, intrusion prevention systems should be
considered as belonging to the last two categories.
Within the SIEM platform, we separate the operational decision support subsystem, handling the
alerts in real time, and the model management subsystem, which evaluates and updates the decision
support system according to its past performance, to the evolution of the monitored system, and to the
evolution of the global knowledge (vulnerabilities, etc.).
The most relevant part of this architecture for the present deliverable is the set of exchanges between
the two planes, which we model as follows:
Events (push): This stream describes events being pushed by the monitored business system to the
MASSIF SIEM platform. These events are typically alerts or logs driven by the interactions that the
monitored business system has with the outside world (users, updates, etc.). The formats used in
this data stream are described in section 4.1, alert and event languages.
Events (pull): This stream describes events being requested by the MASSIF SIEM platform from the
monitored business system. This allows the business system to store data and make it available to
the MASSIF SIEM only if necessary. It is a way for the SIEM platform to ask questions or verify
information that it has on the monitored system. The formats used in this data stream should be
similar to the ones described in section 4.1, alert and event languages.
Configurations (Commands): This stream describes modifications of the behaviour of the business
system that are driven by the MASSIF SIEM system, mainly for update or response purposes.
This stream is important for alert correlation, but is outside the scope of this document.
Audits: This stream represents the interaction of the model management subsystem with the monitored
business system. While it is analytically a different data stream, it might be assimilated to the
combination of both event (push + pull) streams, and might be implemented in this way, to simplify
the plane interface management. This stream is particularly important for model acquisition and
maintenance, but is outside the scope of this document.
The finer small blue arrows in figure 1.1 specify the data stream names in the case of sensors and
should be treated as examples only for the purpose of this deliverable.
This blueprint architecture will further evolve as the specifications of the MASSIF SIEM prototype are
developed.
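
The four plane-to-plane streams described above can be summarized as a small data model. This is only a sketch for illustration: the class and field names (Stream, Initiator, out_of_scope) are hypothetical and are not part of any MASSIF specification, and the initiator given for the Audits stream is an assumption, since the text describes that stream as a combination of push and pull.

```python
from dataclasses import dataclass
from enum import Enum


class Initiator(Enum):
    """Which plane starts the exchange (illustrative naming)."""
    BUSINESS_SYSTEM = "monitored business system"
    SIEM_PLATFORM = "MASSIF SIEM platform"


@dataclass(frozen=True)
class Stream:
    name: str
    initiator: Initiator
    purpose: str
    out_of_scope: bool  # True when the text says "outside the scope of this document"


STREAMS = [
    Stream("Events (push)", Initiator.BUSINESS_SYSTEM,
           "alerts or logs pushed to the SIEM platform", False),
    Stream("Events (pull)", Initiator.SIEM_PLATFORM,
           "events requested by the SIEM platform when needed", False),
    Stream("Configurations (Commands)", Initiator.SIEM_PLATFORM,
           "behaviour changes for update or response purposes", True),
    # Assumption: audits are driven by the model management subsystem,
    # which sits on the SIEM platform side.
    Stream("Audits", Initiator.SIEM_PLATFORM,
           "model acquisition and maintenance", True),
]


def in_scope_streams(streams=STREAMS):
    """Return the names of the streams this deliverable covers."""
    return [s.name for s in streams if not s.out_of_scope]


print(in_scope_streams())
```

Keeping the scope flag on each stream makes it explicit that only the two event streams drive the format analysis in the following chapters.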


Chapter 2

Alert and Event Languages

2.1 Languages selection rationale


Since it is impossible to produce a comprehensive list of all formats, we have specified selection criteria
to include only a subset of the available data formats. One first needs to note that we are interested in
formats, not in transport protocols. Unfortunately, data formats and transport protocols are very closely
associated in several cases, which makes it difficult to understand exactly the motivations
of developers and users. Another consideration is that we do not need to describe all formats, but we
need to identify formats that are also generic representations of information.
The following elements are the foundations of our rationale:
SIEM-market supported: We have looked at the SIEM market, and specifically at the adapters that SIEM
products provide. We have analyzed five major commercial SIEMs, OSSIM and Prelude (represented
in the MASSIF project), as well as Envision from RSA Security, Novell Sentinel and ArcSight, to
understand what kind of data formats they collect. This analysis is further detailed in
section 2.1.1.
Standards-body driven: We are interested in using formats that are supported by open standards
organizations, and that are freely available. In that group, we have selected standards defined by the
Internet Engineering Task Force (IETF), the World Wide Web Consortium (W3C) and the Organization
for the Advancement of Structured Information Standards (OASIS). Even though they may
not be in production use today, they do provide an interesting and collective vision of the problem
that we are addressing, and some of them have actually been used.
User-supported: Finally, we have also drawn from the collective experience and knowledge of the
project's partners, particularly the commercial users and use case providers, to complement and
confirm the first two criteria.

2.1.1 Analysis of Commercial SIEMs


Commercial SIEM vendors have a strong marketing incentive to collect information from as many data
sources as possible, in order to market their products as a data warehouse for logging and compliance.
They also have a strong technical incentive to limit the number of protocols they understand, in order to
simplify not only development but also integration. Therefore, we expect the documentation of the
capabilities of SIEM products to list many data sources but few protocols.
A summary of the list of connectors for the RSA EnVision SIEM is presented in table 2.1. It lists 186
products but only about 15 different connectors. We have counted in table 2.1 the number of times each
connector type appears in the documentation (see footnote 1). This summary shows that a majority of
log sources are connected via Syslog. The three other important mechanisms are acquisition of log files
via FTP, ODBC and SNMP; however, the SNMP documentation does not even mention whether traps are
involved, or which management information bases are used. The other connectors are dedicated to a
specific set of tools (e.g. Check Point's LEA or Windows WMI).
Connector identity                    Number of instances   Percentage
Syslog                                95                    51%
Log File FTP                          25                    13%
ODBC                                  25                    13%
SNMP                                  20                    11%
File Reader                                                 2%
Agentless Windows                                           2%
Other connectors                      13                    7%
Total number of interfaced products   186                   100%

Table 2.1: RSA Envision collectors summary
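
The percentages in table 2.1 can be re-derived from the instance counts and the total of 186 interfaced products. The short sketch below does this for the rows whose counts are given; the two rows without a legible count (File Reader, Agentless Windows) are left out rather than guessed, and the variable and function names are illustrative only.

```python
# Instance counts as given in table 2.1. Rows without a count in the table
# (File Reader, Agentless Windows) are deliberately omitted.
counts = {
    "Syslog": 95,
    "Log File FTP": 25,
    "ODBC": 25,
    "SNMP": 20,
    "Other connectors": 13,
}
TOTAL_PRODUCTS = 186


def share(count, total=TOTAL_PRODUCTS):
    """Return count/total as a percentage rounded to the nearest integer."""
    return round(100 * count / total)


shares = {name: share(n) for name, n in counts.items()}

# Products not covered by the rows above, i.e. the rows without a count.
unlisted = TOTAL_PRODUCTS - sum(counts.values())

for name, pct in shares.items():
    print(f"{name}: {pct}%")
print(f"Products in rows without a count: {unlisted}")
```

Rounding each row independently explains why the published percentages do not add up to exactly 100%.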


Novell Sentinel documents connectivity to at least 61 products, using 11 different connectors. The
identification of the collectors is extremely similar to the one shown in table 2.1. Even though we do not
have the same level of detail available, we surmise that the results would be quite similar.
One of the major issues when dealing with SIEM tools is the lack of separation between the data
format and the transport protocol. In fact, the operation of these products requires understanding
not only of the protocol, but also of the content and semantics of the message. That is why ArcSight
has published its interface specification, the Common Event Format, hoping for wide adoption by
the community of security tool vendors. While this adoption has not materialized, the format provides an
interesting and important viewpoint on the way SIEM vendors see their data providers today.
Finally, one needs to note that Prelude, one of the SIEM tools we are looking at in MASSIF, uses
the IETF standard IDMEF for its data format, even though it does not use the companion IDXP protocol.
1 Whenever a product listed several connectors, we selected the most represented one.


2.1.2 Presentation of log sources selection


One also needs to realize that this analysis gives us information about deployment in the field only in
an approximate way, if at all. We have therefore added a third element, the experience of the partners
in the field, to evaluate the data sources and reinforce our selection criteria. Table 2.2 presents our
selection of log sources that are included in the alert and event languages description.
As one can see from table 2.2, our selection points us to 8 different alert and event languages.
Beyond the ubiquitous syslog, we have included languages that are important either because of their
standard status or because they will help us reach the goals of the project, even though they are not
currently used in SIEM environments (to the best of our knowledge). When analyzing the existing SIEM
environments, we have also eliminated the description of some log sources from this deliverable. The
reasons for not selecting these sources are presented in table 2.3.
We will now proceed to the description of the alert and event languages, following a homogeneous
template as much as possible. The description itself is kept short, as the reader is referred to
existing documentation. We have focused instead on our experience with these data sources,
and their relationship to the project.

2.2 The Common Event Format (CEF)

2.2.1 Reference
The Common Event Format (CEF) (see footnote 2) is specified and provided without charge by
ArcSight Inc. (see footnote 3), a SIEM vendor, as part of its strategy to foster interoperability between
its SIEM and sensor vendors.

2.2.2 Objectives
The Common Event Format (CEF) is an open log management standard that improves the interoperability of security-related information from different security and network devices and applications. CEF
has been designed to enable technology companies and customers to use a common event log format
so that data can easily be collected and aggregated for analysis by an enterprise management system.
2 http://www.arcsight.com/collateral/CEFstandards.pdf
3 http://www.arcsight.com/


2.2.3 Structure

Structure overview
CEF is an extensible, text-based, high-performance format designed to support multiple device types
from both security and non-security devices and applications in the simplest manner possible, unlike
other standards that target a single component of the security infrastructure, are tied to a specific transport protocol, or are designed specifically for applications and cannot support today's high-performance,
real-time security requirements.
To simplify integration, the syslog message format is used as a transport mechanism. However, if an
event producer is unable to write syslog messages, it is still possible to write the events to a file.
The basic grammar of the format includes the self-explanatory fields:

CEF:Version|Device Vendor|Device Product|Device Version|Signature ID|Name|Severity|Extension

An example of a CEF message taken from the documentation is:

Sep 19 08:26:10 zurich CEF:0|security|threatmanager|1.0|100|worm successfully stopped|10|src=10.0.0.1 dst=2.1.2.2 spt=1232
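
As an illustration of how a consumer might split such a message, the following is a minimal Python sketch, not ArcSight's reference parser. The function name parse_cef and the dictionary keys are hypothetical. It assumes that a literal pipe inside a header field is escaped with a backslash and that extension values contain no unescaped "key=" sequences.

```python
import re

# Header field names, in the order given by the CEF grammar (names are our own).
CEF_HEADER_FIELDS = (
    "version", "device_vendor", "device_product", "device_version",
    "signature_id", "name", "severity",
)


def parse_cef(line: str) -> dict:
    """Split one CEF line into header fields and an extension dictionary."""
    start = line.find("CEF:")
    if start < 0:
        raise ValueError("not a CEF message")
    syslog_prefix = line[:start].strip()  # timestamp and host added by syslog
    # Split on pipes that are not preceded by a backslash (assumed escaping rule).
    parts = re.split(r"(?<!\\)\|", line[start + len("CEF:"):], maxsplit=7)
    if len(parts) < 7:
        raise ValueError("incomplete CEF header")
    event = {
        name: value.replace("\\|", "|").replace("\\\\", "\\")
        for name, value in zip(CEF_HEADER_FIELDS, parts)
    }
    event["syslog_prefix"] = syslog_prefix
    extension_text = parts[7] if len(parts) == 8 else ""
    # A value runs until the next " key=" token or the end of the line.
    event["extension"] = {
        m.group(1): m.group(2).strip()
        for m in re.finditer(r"(\w+)=(.*?)(?=\s+\w+=|$)", extension_text)
    }
    return event
```

Applied to the example message above, this sketch yields the name "worm successfully stopped", severity "10", and an extension dictionary with the keys src, dst and spt.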

Links with other data formats


CEF is fairly close to syslog in spirit, and also shares similarities with the Security Device Event Exchange
(SDEE)4 , a joint effort between Cisco and SourceFire to standardize events coming out of network-based
intrusion detection sensors.

Relationship with MASSIF


This format should be considered in the light of competition. Owning the base data format is a way
to lock customers into a specific SIEM platform, in this case ArcSight's, because of the investment in
developing translation agents for custom logs and in deploying these agents in the field. It might be
useful to have at least import capabilities from CEF into MASSIF.
4 http://www.cisco.com/en/US/docs/security/ips/specs/CIDEE_Specification.htm


2.2.4 Critical assessment of the format

Advantages
The Common Event Format promotes interoperability between various event (or log) generating devices.
Although each vendor has its own format for reporting event information, these event formats often lack
the key information necessary to integrate the events from their devices.
The ArcSight standard attempts to improve the interoperability of infrastructure devices by aligning
the logging output from various technology vendors.
The Extension Dictionary of CEF provides a broad set of predefined extension keys that
cover most event log requirements.

Issues
Custom extension keys are recommended for use only when no reasonable mapping of the information
can be established for a predefined CEF key. While the custom extension key mechanism can be used
to safely send information to CEF consumers for persistence, there are certain limitations as to when
and how to access the data mapped into them.
Data submitted to ArcSight Logger using custom key extensions is retained in the system; however,
it is not available for use in the Logger reporting infrastructure.

Uses
Use of the CEF format is limited to ArcSight deployments, despite the vendor's lobbying efforts.

2.3 The Common Log Format (CLF)

2.3.1 Reference
The Common Log Format (CLF) and its sibling the Extended Common Log Format (ECLF) are specified
by the W3C community5 and by the Apache developer community6 . This format falls into the category
of de-facto standards; while it is widely adopted by web servers, there is no normative reference.
5 http://www.w3.org/Daemon/User/Config/Logging.html#common-logfile-format
6 http://httpd.apache.org/docs/2.2/logs.html#common


2.3.2 Objectives
The Common Log Format is used by web servers, in particular the Apache web server, to trace all
requests processed by the server. It is generally shared by all log files (access.log, error.log, and others).
While the Apache web server offers the possibility to customize the log format, the users tend to keep
the default configuration, using either the simple CLF format, or its extension the ECLF format, which
shares the same initial description.

2.3.3 Structure

Structure overview
The CLF format stores the following information:
IP address of the origin of the request as presented to the server. If the requesting browser is behind
a proxy, the address of the proxy will show up in the logs.
identd identity of the client as specified in RFC 1413[8].
userid of the requester as determined by HTTP authentication.
Timestamp of the request.
Request line presented by the client, including the method, the URI and the protocol.
Status code that was returned to the client, indicating how the server was able to fulfil the request.
Size of the object returned to the client.
The ECLF format includes in parentheses, after the information provided by CLF, additional information provided by the client, such as the referring URL and the user agent identifying the client's browser.
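
The seven CLF fields listed above can be extracted with a single regular expression. The following Python sketch is illustrative only: the function name parse_clf and the key names are our own, the timestamp layout is assumed to be the Apache default (day/month/year:time zone), and anything after the size field (for instance ECLF additions) is kept as an unparsed string.

```python
import re
from datetime import datetime

# host identd userid [timestamp] "request" status size
CLF_RE = re.compile(
    r'(?P<host>\S+) (?P<identd>\S+) (?P<userid>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)(?P<extra>.*)'
)


def parse_clf(line: str) -> dict:
    """Parse one CLF line; trailing (e.g. ECLF) content is returned unparsed."""
    match = CLF_RE.match(line)
    if match is None:
        raise ValueError("not a CLF line")
    record = match.groupdict()
    # Assumed Apache default timestamp layout, e.g. 10/Oct/2000:13:55:36 -0700
    record["timestamp"] = datetime.strptime(
        record["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    )
    record["status"] = int(record["status"])
    # A dash means that no object was returned to the client.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    # The request line holds the method, the URI and the protocol.
    method, _, remainder = record["request"].partition(" ")
    uri, _, protocol = remainder.rpartition(" ")
    record.update(method=method, uri=uri, protocol=protocol)
    record["extra"] = record["extra"].strip()
    return record
```

A line such as `127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326` would be split into the host, the user "frank", the method GET, the status 200 and the size 2326.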

Links with other data formats


This format is similar in spirit to syslog (one line of timestamped textual information), but is tailored for
web servers.

Relationship with MASSIF


We expect that all web servers providing information to the MASSIF platform will use one of these formats.


2.3.4 Critical assessment of the format

Advantages
The CLF format is very easy to use and very informative. Even though it limits itself to HTTP header
information, it synthesizes the important aspects of the activity of the web server from the point of view
of security: who asked what, when, and how the server reacted. It is extremely compact and thus
efficient in terms of processing. Being widely adopted by web server developers and proxy developers,
it provides a solid basis for analysis and detection of malicious activity aiming to subvert the web server
through the use of the HTTP protocol.

Issues
The CLF format does suffer from several issues that have an impact on the detection and diagnosis of
attacks:
Multiplicity of lines Since the HTTP server may serve multiple requests for a single page view, a complete diagnosis may require the analysis of multiple lines which do not necessarily share an
identifying token.
Lack of payload information The log file does not contain HTTP payload information. This means that
for methods such as POST, the complete information is not available for diagnosis. This may be a
serious limitation for diagnosing attacks such as XSS or SQL injection, for example if content is
pushed into comments in dynamic web sites.
Lack of server-side information The log file does not contain information identifying the web server
(such as the virtual server accessed). This is a serious limitation in identifying the exact target of
the attacker.

Uses
The CLF format is very widely used by web servers.


2.4 The Intrusion Detection Message Exchange Format (IDMEF)

2.4.1 Reference
The Intrusion Detection Message Exchange Format (IDMEF) is specified by the Internet Engineering
Task Force (IETF) in RFC 4765[5].

2.4.2 Objectives
The Intrusion Detection Message Exchange Format (IDMEF)[13] is intended to be a standard data format that automated intrusion detection systems can use to report alerts about events that they deem
suspicious. The development of this standard format aims at enabling interoperability among commercial, open source, and research systems, allowing users to mix and match the deployment of these
systems according to their strong and weak points to obtain an optimal implementation. It standardizes
messages between a sensor providing security analysis and detecting threats, and a manager which
receives and processes these messages. In the MASSIF context, the manager should be either the SIEM
platform itself or a gateway to it.

2.4.3 Structure

Structure overview
IDMEF is built as a UML class diagram of components. The standard defines two types of messages,
Alert (for security information) and Heartbeat (for management information). A message is an aggregation of components, modeling various entities that are part of an intrusion-detection sensor. At the
top level, a message requires a timestamp (CreateTime in IDMEF), a meaning (Classification in IDMEF)
and a generating sensor (Analyzer in IDMEF). The two other major components are the target and the
source of the attack. Each of these blocks has a complex structure that attempts to capture the various
facets that characterize a component of an information system. One example of the elementary components that compose these larger blocks is the notion of Node, which models a machine and is found
in Analyzer, Source and Target.
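
To make the aggregation concrete, the following Python sketch assembles a deliberately reduced Alert containing only the components named above. It is an illustration under stated assumptions and has not been validated against the IDMEF DTD: XML namespaces are omitted, the helper names (build_alert, ntp_stamp) are our own, and the timestamp passed in is assumed to be expressed in UTC.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def ntp_stamp(when: datetime) -> str:
    """Render a datetime as the hexadecimal NTP timestamp used by CreateTime."""
    ts = when.timestamp()
    seconds = int(ts)
    fraction = int((ts - seconds) * 2**32)
    return "0x%08x.0x%08x" % (seconds + NTP_EPOCH_OFFSET, fraction)


def _add_node(alert: ET.Element, role: str, ip: str) -> None:
    """Attach a Source or Target whose Node carries a single IPv4 address."""
    entity = ET.SubElement(alert, role)
    node = ET.SubElement(entity, "Node")
    address = ET.SubElement(node, "Address", category="ipv4-addr")
    ET.SubElement(address, "address").text = ip


def build_alert(message_id, analyzer_id, text, source_ip, target_ip, when):
    """Return a minimal IDMEF-like Alert as an XML string (UTC time assumed)."""
    message = ET.Element("IDMEF-Message", version="1.0")
    alert = ET.SubElement(message, "Alert", messageid=message_id)
    ET.SubElement(alert, "Analyzer", analyzerid=analyzer_id)   # generating sensor
    create = ET.SubElement(alert, "CreateTime", ntpstamp=ntp_stamp(when))
    create.text = when.strftime("%Y-%m-%dT%H:%M:%SZ")          # timestamp
    _add_node(alert, "Source", source_ip)
    _add_node(alert, "Target", target_ip)
    ET.SubElement(alert, "Classification", text=text)          # meaning
    return ET.tostring(message, encoding="unicode")
```

The same Node helper serves both Source and Target, which mirrors the reuse of elementary components across the larger blocks.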


Links with other data formats


IDMEF per se does not have links with other formats. However, several tools including Prelude provide
mechanisms for parsing log formats, for example syslog or CLF, and transforming these log formats into
IDMEF messages. This parsing requires knowledge not only of the source format but also
of its semantics, in order to provide meaningful conversion.

Relationship with MASSIF


The IDMEF format is the back-end format of Prelude. It is also used by 6cure and Télécom SudParis for
their research activities, to represent and transmit alert information.

2.4.4 Critical assessment of the format

Advantages
Semantics IDMEF is extremely conscious of the semantics of the information it manipulates, and does
much more than provide a syntax. Furthermore, it provides rationales and explanations to limit
interpretation by developers and thus reduce ambiguity. IDMEF also includes many constants that
strongly type objects. While the manner in which these constants are defined may not be the best,
the idea of strongly typing objects is very important in contributing to strong and clear semantics.
Modularity IDMEF is built of a set of components and thus is extremely modular. It also provides
facilities for referencing components instead of including them in the message, which contributes
to the efficiency of transferring and sharing identical information.
Extensibility IDMEF provides facilities for including structured information in a message, in the
form of the AdditionalData blob. This facility enables including original messages within IDMEF, or
information that becomes available at a later stage.

Issues
Dissemination Even though IDMEF is an RFC, it is only an experimental one and it has not been
widely picked up by security product developers, as sensor developers prefer simpler and less
constrained solutions, and as SIEM developers have preferred to own their base formats.


XML IDMEF is an XML format, thus it is quite verbose. While for transport purposes it compresses
quite well, it should not be used for storing information, nor for developing DB schemas. Also, the
normative reference is the XML DTD and not the XML schema, thus type checking is less precise.
Extensibility IDMEF is extensible through the use of XML blobs. The idea is useful, but there
is currently no way of creating and sharing standard or useful patterns out of these blobs.

Uses
The IDMEF format is used mostly in the research community as a standard back-end for intrusion detection and alert correlation research projects and communities. It is also used by the Prelude SIEM
environment7 as its back-end data format (although the companion transport protocol IDXP is not used
by Prelude).

2.5 InterFace to Metadata Access Point (IF-MAP)

2.5.1 Reference
trustedcomputing.org
http://www.trustedcomputinggroup.org/developers/trusted_network_connect/
Specification document of IF-MAP 2.0 [11]
Specification document of IF-MAP Metadata for Network Security [12]

2.5.2 Objectives
IF-MAP is an interface specification between a Metadata Access Point (MAP) Server and entities that
either publish metadata or subscribe to metadata from the MAP. The entities are called IF-MAP
clients, while the Server is referred to as Metadata Access Point (MAP) or as IF-MAP Server. The latter
provides functionalities to publish metadata, to search through the stored metadata, and to enable clients
to subscribe to specific data and be notified in the event of data changes.
As IF-MAP aims to enable the structured collection and provision of data, it is not only a language to
describe (security) events. Nevertheless, a specification of a metadata language for network security is
part of IF-MAP [12]. As IF-MAP has been created by the TNC working group of the Trusted Computing
Group, its foremost purpose is the gathering of information that can be used in order to apply access
decisions in a networking environment. Thus the metadata comprises elements like registered address
7 http://www.prelude-ids.org/


Figure 2.1: An example metadata graph


bindings, authentication status, endpoint policy compliance status, endpoint behavior, and authorization
status. However, the specification is open and the process is not finished, which makes it possible to
influence the definition of metadata models describing any kind of information.

2.5.3 Structure

Structure overview
The IF-MAP specification currently comprises two documents. One is the general description
and SOAP binding, TNC_IFMAP_v2_0r36.pdf, also referred to as IF-MAP 2.0 [11]. The other is the
specification of IF-MAP Metadata for Network Security, which is v1.0 revision 25 at the time of writing this document [12]. Additionally, for a quick overview, we propose reading the IF-MAP FAQ at
www.trustedcomputing.org.
The session-based communication between a MAP client and server is always initiated by the client
and is based on SOAP. The commands comprise different kinds of publish (update, delete etc.), subscribe (e.g. notification poll) and search.
The data model of IF-MAP comprises two types of data: identifiers (e.g. identities of several
types, mac-address, ip-address) and metadata. Identifiers can be related to each other by links. Figure 2.1 visualises the data model used in IF-MAP, where identifiers are represented by ovals, metadata
is represented by rectangles, and links are represented by lines connecting identifiers.
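
The publish, subscribe and search operations on this graph can be pictured with a small in-memory model. The Python sketch below is a toy under stated assumptions, not an IF-MAP implementation: there is no SOAP binding or session handling, the class name ToyMap and its method signatures are hypothetical, identifiers are modelled as (type, value) tuples, and metadata items are plain strings.

```python
from collections import defaultdict


class ToyMap:
    """Toy metadata graph: metadata hangs on one identifier or on a link."""

    def __init__(self):
        self._metadata = defaultdict(list)     # frozenset of 1 or 2 identifiers
        self._subscribers = defaultdict(list)  # identifier -> callbacks

    def publish(self, metadata, identifier, other=None):
        """Attach metadata to an identifier, or to the link identifier-other."""
        key = frozenset(i for i in (identifier, other) if i is not None)
        self._metadata[key].append(metadata)
        for ident in key:                      # notify interested clients
            for callback in self._subscribers[ident]:
                callback(key, metadata)

    def subscribe(self, identifier, callback):
        """Register callback(key, metadata) for changes touching identifier."""
        self._subscribers[identifier].append(callback)

    def search(self, start, max_depth=1):
        """Collect metadata reachable from start over at most max_depth links."""
        visited, frontier = {start}, {start}
        seen_keys, found = set(), []
        for depth in range(max_depth + 1):
            next_frontier = set()
            for key, items in self._metadata.items():
                if key in seen_keys or not (key & frontier):
                    continue
                if len(key) == 2 and depth == max_depth:
                    continue  # following this link would exceed the depth limit
                seen_keys.add(key)
                found.extend(items)
                next_frontier |= key - visited
            visited |= next_frontier
            frontier = next_frontier
        return found
```

For example, after publishing a link between an ip-address and a mac-address identifier, a search starting at the IP address with max_depth=1 returns the metadata on that link together with the metadata attached to either endpoint.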


Links with other data formats


The metadata description language is XML, thus any event description based on XML should be easily
introduced.

Relationship with MASSIF


The IF-MAP specification allows a publish and subscribe model for information collection and processing. This could have a major impact on the different tasks of MASSIF, as it might provide an interface
for the security information. This does not apply to a single use case only but to all four use cases,
and could even enable a combination of the security information of the use cases and the different SIEM
tools in order to enable convergence and collaboration as well as a uniform presentation of the MASSIF
appliances.

2.5.4 Critical assessment of the format


As some of these points have been described in previous subsections, this section mainly provides
bullet points.

Advantages
Provision of an interface for various kinds of security information
A central database for information based on one protocol
A simple publish/subscribe data collector
The standard enables integration of application and system input and output from different vendors
Opportunity to create a vocabulary explicitly for the needs of MASSIF and thereby have an influence on the standardisation process
IF-MAP is intrinsically defined to be extensible
Close contact of SIT with FHH (open source IF-MAP server irond) and Infoblox (IF-MAP server IBOS and IF-MAP starter kit), and opportunities for cooperation (user group) and dissemination (through Infoblox, who are actively advertising every adoption of IF-MAP)


Issues
As the specification of the metadata is not concluded and currently covers only NAC information, there is no fully-fledged vocabulary. Nevertheless, one could add additional metadata types
through the use of other tags.
The standardisation of IF-MAP is not finished, so the specification might evolve during the lifetime
of MASSIF. Standardisation with the IETF is planned for summer 2011 but usually takes several
years.

Uses
As the metadata definition does not yet exceed that of network security information, normal applications
according to the TCG are:
Federation between remote access and network access control (NAC).
Integration of NAC with endpoint monitoring and e.g. data leak detection.
Integration of physical access control with NAC.
Federation of authentication information, single sign on/off.
Real time information gathering and processing.
There are many potential applications that are specifically interesting for the goals of MASSIF. The TCG mentions applications in the fields of smart grid and cloud security, where IF-MAP can facilitate
SIEM integration by aggregating, correlating and distributing data from various applications and
systems.

2.6 Incident Object Description and Exchange Format (IODEF)

2.6.1 Reference
The Incident Object Description and Exchange Format (IODEF) is standardized by the Internet Engineering Task Force (IETF) as RFC 5070[4].


2.6.2 Objectives
The Incident Object Description Exchange Format (IODEF) is a format for representing computer security information commonly exchanged between Computer Security Incident Response Teams (CSIRTs).
It provides an XML representation for conveying incident information across administrative domains
between parties that have an operational responsibility of remediation or a watch-and-warning over a
defined constituency. The data model encodes information about hosts, networks, and the services running on these systems; attack methodology and associated forensic evidence; impact of the activity; and
limited approaches for documenting workflow. The structured format provided by the IODEF allows for
increased automation in processing of incident data; decreased effort in normalizing similar data from
different sources; and a common format on which to build interoperable tools for incident handling and
subsequent analysis, specifically when data comes from multiple constituencies.

2.6.3 Structure

Structure overview
The IODEF implementation is specified as an Extensible Markup Language (XML) document type definition. The data model is composed of nineteen classes that describe the data related to the incident
(e.g. incident ID, related activity, time, assessment, history, etc). The data model serves as a transport
format; it does not attempt to dictate a definition for an incident, it rather assumes a broad understanding
of an incident that is flexible enough to encompass most operators. Since describing an incident for all
definitions requires an extremely complex data model, the IODEF intends to be a framework to convey
commonly exchanged incident information, ensuring that there are ample mechanisms for extensibility
to support organization-specific information and techniques to reference the information kept outside the
model.

Links with other data formats


The data model of the Intrusion Detection Message Exchange Format (IDMEF) influenced the design of
the IODEF. The classes of the data model can be extended through the use of extensible classes, which
provide the ability to have new atomic or XML-encoded data elements in all of the top-level classes of
the Incident class and a few of the more complicated subordinate classes.
Similarly, while the IODEF supports different languages, the data model relies heavily on standardized enumerated attributes that can crudely approximate the contents of the document. With this approach, a CSIRT should be able to make some sense of an IODEF document it receives even if the text
based data elements are written in a language unfamiliar to the analyst.


Relationship with MASSIF

2.6.4 Critical assessment of the format

Advantages
The overriding purpose of the IODEF is to enhance the operational capabilities of CSIRTs. Community
adoption of the IODEF provides an improved ability to resolve incidents and convey situational awareness by simplifying collaboration and data sharing.
Implementing the IODEF in XML provides numerous advantages. Its extensibility makes it ideal for
specifying a data encoding framework that supports various character encodings, such as UTF-8 and
UTF-16. Likewise, the abundance of related technologies (e.g., XSL, XPath, XML-Signature) makes for
simplified manipulation.
The data model supports multiple translations of free-form text. The intent is to allow the identical
text to be encoded in different instances of the same class, but each being in a different language. This
approach allows an IODEF document author to send recipients speaking different languages an identical
document.

Issues
XML is fundamentally a text representation, which makes it inherently inefficient when binary data must
be embedded or large volumes of data must be exchanged.
In order to support the changing activity of CSIRTs, the IODEF data model will need to evolve along
with them. Internationalization and localization are of specific concern to the IODEF, since it is only through
collaboration, often across language barriers, that certain incidents can be resolved. The IODEF supports
this goal by depending on XML constructs, and through explicit design choices in the data model.
The domain of security analysis is not fully standardized and must rely on free-form textual descriptions. The IODEF attempts to strike a balance between supporting this free-form content, while still
allowing automated processing of incident information.
As the data encoded by the IODEF might be considered privacy sensitive by the parties exchanging
the information or by those described by it, care needs to be taken in ensuring the appropriate disclosure
during both document exchange and subsequent processing. Similarly, care must be taken by the parser
to properly authenticate the recipient of the document and ascribe an appropriate confidence to the data
prior to action.


Uses
We do not have specific information about the actual use of the IODEF by FIRST or CERT organizations.

2.7 IP Flow Information Export (IPFIX)

2.7.1 Reference
The Internet Protocol Flow Information Export (IPFIX) requirements are published by the Internet Engineering Task Force (IETF) as RFC 3917[10]. The protocol itself is specified in RFC 5101[2].

2.7.2 Objectives
The Internet Protocol Flow Information Export (IPFIX) was created to meet the need for a standard for
exporting Internet Protocol flow information collected from routers, probes and other devices used by
mediation systems, accounting/billing systems and network management systems. The IPFIX standard
defines how IP flow information has to be formatted and transferred from an exporter to a collector. Previously, many data network operators relied on the proprietary Cisco Systems NetFlow format
for traffic flow information export. The IPFIX Working Group chose NetFlow version 9 as the basis for the
standardization. The working group submitted the IPFIX Protocol Specification to the IESG for approval
in 2006.

2.7.3 Structure

Structure overview
IPFIX defines a flow as any number of packets observed in a specific timeslot and sharing a number of
properties, such as "same source, same destination, same protocol". The IPFIX protocol defines a precise
architecture for exporting flow data. This architecture includes an observation point for
collecting IP packets belonging to a specific observation domain. A metering process filters data packets
and aggregates information about these packets; this information defines the Flow Records. A Flow
Record contains metrics related to packet header data, timestamping, sampling and classification. Flow
Records are sent by the IPFIX exporter to an IPFIX collector, in charge of receiving and cataloguing
IPFIX packets; exporters and collectors are in a many-to-many relation and work on a push-based paradigm.


The layout of the IPFIX data format is transmitted to the collector by means of Template Records, which
can be standard or user-defined. A Template Record is an n-tuple of (type, length) pairs that
entirely defines the structure and the semantics of a specific set of metrics sent to the collector. The collector
distinguishes different Data Records by means of their Template ID. Data Records are composed of a certain
number of Information Elements, representing the described attributes.
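
The interplay between Template Records and Data Records can be sketched as follows. This Python fragment is a simplified illustration, not a complete IPFIX collector: it assumes fixed-length fields only, rejects enterprise-specific Information Elements, ignores message and set headers, and knows only a handful of Information Element identifiers. The function names are our own.

```python
import ipaddress
import struct

# A few Information Element identifiers from the IANA IPFIX registry.
IE_NAMES = {
    1: "octetDeltaCount", 2: "packetDeltaCount", 4: "protocolIdentifier",
    7: "sourceTransportPort", 8: "sourceIPv4Address",
    11: "destinationTransportPort", 12: "destinationIPv4Address",
}
IPV4_ELEMENTS = {8, 12}


def parse_template_record(buf: bytes, offset: int = 0):
    """Read a Template ID and its list of (element id, length) pairs."""
    template_id, field_count = struct.unpack_from("!HH", buf, offset)
    offset += 4
    fields = []
    for _ in range(field_count):
        element_id, length = struct.unpack_from("!HH", buf, offset)
        offset += 4
        if element_id & 0x8000:
            raise NotImplementedError("enterprise-specific elements not handled")
        if length == 0xFFFF:
            raise NotImplementedError("variable-length elements not handled")
        fields.append((element_id, length))
    return template_id, fields


def decode_data_record(buf: bytes, fields, offset: int = 0) -> dict:
    """Decode one Data Record according to a previously received template."""
    record = {}
    for element_id, length in fields:
        raw = buf[offset:offset + length]
        offset += length
        name = IE_NAMES.get(element_id, "ie%d" % element_id)
        if element_id in IPV4_ELEMENTS:
            record[name] = str(ipaddress.IPv4Address(raw))
        else:
            record[name] = int.from_bytes(raw, "big")  # unsigned, network order
    return record
```

The collector would keep the parsed field list indexed by Template ID and apply it to every Data Record that references that identifier.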

Links with other data formats


IPFIX is not strictly related to other data formats, apart from Cisco Systems NetFlow 9, its predecessor
before the standardization. Despite this isolation, the IPFIX data format can carry information suitable for feeding
an IDMEF message parser/sender: IP source and destination addresses, IP of target machines, timestamps, data information. The format translation needs a proper IPFIX collector, in charge of extracting
and classifying the needed information.

Relationship with MASSIF


IPFIX messages and protocol architecture supply information sent by several network devices, routers,
sensors and critical nodes and machines, like network management systems. These different devices
are present in turn in almost all the scenarios.

2.7.4 Critical assessment of the format

Advantages
Modularity The IPFIX architecture and its many-to-many paradigm is operatively modular and fits perfectly the needs of MASSIF for a distributed data metering system and for collecting data from
remote sites.
Flexibility The IPFIX standard, by means of Template Records, provides solutions to extend the data
message format with user defined fields, for example for introducing non-standard Information
Elements. Moreover it allows the definition of the messages structure. The standard works on
different transmission protocols like TCP, UDP or SCTP.
Interoperability The IPFIX protocol is standard and can rely on a widespread number of compliant
devices from several vendors, reducing the number of ad-hoc solutions.
Extensibility IPFIX information is not limited to flows: it can also describe, among others, network behavior, performance behavior, application behavior, host behavior and security analysis.


Issues
Encryption Analysis of encrypted packets is a relevant issue for proper data inspection. In encrypted
scenarios, IP packet fields are encrypted and unobservable at several layers, so some metrics,
related for example to protocol headers, cannot be evaluated.
Hardware requirements Probes must be deployed on every link to be monitored. Moreover, deep inspection on high-bandwidth networks cannot be sustained by a simple router device.
Collector flooding Since the protocol is push-based, the collector could suffer from excessive load coming
from the probes. The export configuration must therefore be designed carefully.

Uses
The IPFIX format is widely implemented and adopted by generic network devices, like routers, and
network analysis devices provided by several vendors. IPFIX-compliant devices are used as support
for effective network measurement, providing vital information on the health of the managed networks.
The collected network information can be used for several purposes; in particular, the standard provides a strong
back-end for security functionalities, like intrusion detection.

2.8 The Syslog Format

2.8.1 Reference
The Syslog Protocol is standardized by the Internet Engineering Task Force (IETF) as RFC 5424[6].

2.8.2 Objectives
The need for a new layered specification has arisen because standardization efforts for reliable and
secure syslog extensions suffer from the lack of a Standards-Track and transport-independent RFC.
Without this, each other standard needs to define its own syslog packet format and transport mechanism,
which over time will introduce subtle compatibility issues. The goal of this architecture is to separate
message content from message transport while enabling easy extensibility for each layer.


2.8.3 Structure

Structure overview
This protocol utilizes a layered architecture, which allows the use of any number of transport protocols
for transmission of syslog messages. It also provides a message format that allows vendor-specific
extensions to be provided in a structured way. The syslog protocol does not provide acknowledgment
of message delivery. Though some transports may provide status information, conceptually, syslog is a
pure simplex communication protocol.
The syslog message has the following ABNF[3] definition:

SYSLOG-MSG      = HEADER SP STRUCTURED-DATA [SP MSG]

HEADER          = PRI VERSION SP TIMESTAMP SP HOSTNAME
                  SP APP-NAME SP PROCID SP MSGID
PRI             = "<" PRIVAL ">"
PRIVAL          = 1*3DIGIT ; range 0 .. 191
VERSION         = NONZERO-DIGIT 0*2DIGIT
HOSTNAME        = NILVALUE / 1*255PRINTUSASCII

APP-NAME        = NILVALUE / 1*48PRINTUSASCII
PROCID          = NILVALUE / 1*128PRINTUSASCII
MSGID           = NILVALUE / 1*32PRINTUSASCII

TIMESTAMP       = NILVALUE / FULL-DATE "T" FULL-TIME
FULL-DATE       = DATE-FULLYEAR "-" DATE-MONTH "-" DATE-MDAY
DATE-FULLYEAR   = 4DIGIT
DATE-MONTH      = 2DIGIT ; 01-12
DATE-MDAY       = 2DIGIT ; 01-28, 01-29, 01-30, 01-31
                         ; based on month/year
FULL-TIME       = PARTIAL-TIME TIME-OFFSET
PARTIAL-TIME    = TIME-HOUR ":" TIME-MINUTE ":" TIME-SECOND
                  [TIME-SECFRAC]
TIME-HOUR       = 2DIGIT ; 00-23
TIME-MINUTE     = 2DIGIT ; 00-59
TIME-SECOND     = 2DIGIT ; 00-59
TIME-SECFRAC    = "." 1*6DIGIT
TIME-OFFSET     = "Z" / TIME-NUMOFFSET
TIME-NUMOFFSET  = ("+" / "-") TIME-HOUR ":" TIME-MINUTE

STRUCTURED-DATA = NILVALUE / 1*SD-ELEMENT
SD-ELEMENT      = "[" SD-ID *(SP SD-PARAM) "]"
SD-PARAM        = PARAM-NAME "=" %d34 PARAM-VALUE %d34
SD-ID           = SD-NAME
PARAM-NAME      = SD-NAME
PARAM-VALUE     = UTF-8-STRING ; characters '"', '\' and ']'
                               ; MUST be escaped.
SD-NAME         = 1*32PRINTUSASCII
                  ; except '=', SP, ']', %d34 (")

MSG             = MSG-ANY / MSG-UTF8
MSG-ANY         = *OCTET ; not starting with BOM
MSG-UTF8        = BOM UTF-8-STRING
BOM             = %xEF.BB.BF

UTF-8-STRING    = *OCTET ; UTF-8 string as specified
                         ; in RFC 3629

OCTET           = %d00-255
SP              = %d32
PRINTUSASCII    = %d33-126
NONZERO-DIGIT   = %d49-57
DIGIT           = %d48 / NONZERO-DIGIT
NILVALUE        = "-"

Syslog message size limits are dictated by the syslog transport mapping in use. There is no upper
limit per se. Each transport mapping defines the minimum maximum required message length support,
and the minimum maximum must be at least 480 octets in length.
The TIMESTAMP field is a formalized timestamp derived from [RFC3339].
The HOSTNAME field identifies the machine that originally sent the syslog message.
The APP-NAME field should identify the device or application that originated the message. It is a
string without further semantics. It is intended for filtering messages on a relay or collector.
The PROCID field is a value that is included in the message, having no interoperable meaning,
except that a change in the value indicates there has been a discontinuity in syslog reporting. The
field does not have any specific syntax or semantics; the value is implementation-dependent and/or
operator-assigned.
The MSGID should identify the type of message. For example, a firewall might use the MSGID
TCPIN for incoming TCP traffic and the MSGID TCPOUT for outgoing TCP traffic. Messages with the
same MSGID should reflect events of the same semantics. The MSGID itself is a string without further
semantics. It is intended for filtering messages on a relay or collector.

STRUCTURED-DATA provides a mechanism to express information in a well-defined, easily parseable
and interpretable data format. There are multiple usage scenarios.
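
The header fields described above can be extracted with a regular expression that follows the ABNF. The following is a minimal sketch, not a complete parser: the function name parse_header and the returned dictionary layout are illustrative assumptions, STRUCTURED-DATA and MSG are left unparsed in a "rest" field, and timestamp conversion relies on datetime.fromisoformat, which on older Python versions only accepts 3 or 6 fractional digits.

```python
import re
from datetime import datetime

# Header layout from the ABNF:
# PRI VERSION SP TIMESTAMP SP HOSTNAME SP APP-NAME SP PROCID SP MSGID
# [!-~] is the PRINTUSASCII range %d33-126.
HEADER_RE = re.compile(
    r"<(?P<prival>\d{1,3})>(?P<version>[1-9]\d{0,2}) "
    r"(?P<timestamp>[!-~]+) (?P<hostname>[!-~]{1,255}) "
    r"(?P<app_name>[!-~]{1,48}) (?P<procid>[!-~]{1,128}) "
    r"(?P<msgid>[!-~]{1,32}) (?P<rest>.*)",
    re.DOTALL,
)


def parse_header(line):
    """Split a syslog line into header fields; NILVALUE ("-") becomes None."""
    match = HEADER_RE.match(line)
    if match is None:
        raise ValueError("not a well-formed syslog header")
    fields = match.groupdict()
    fields["prival"] = int(fields["prival"])
    if fields["prival"] > 191:
        raise ValueError("PRIVAL out of range 0..191")
    fields["version"] = int(fields["version"])
    for name in ("timestamp", "hostname", "app_name", "procid", "msgid"):
        if fields[name] == "-":
            fields[name] = None
    if fields["timestamp"] is not None:
        # "Z" is rewritten so that older Python versions accept the offset.
        text = fields["timestamp"].replace("Z", "+00:00")
        fields["timestamp"] = datetime.fromisoformat(text)
    return fields
```

The "rest" field still holds STRUCTURED-DATA and the optional MSG, so a relay or collector could filter on APP-NAME or MSGID without decoding the remainder of the message.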

Links with other data formats


Relationship with BSD Syslog, RFC 3164[9].

Relationship with MASSIF


Given its widespread use, we expect many of the use cases to partially rely on it. Beyond the project,
supporting syslog is an absolute requirement for commercial success of a SIEM platform, be it as software or as a managed security service.

2.8.4 Critical assessment of the format

Advantages
The syslog format tries to provide a solid basis that allows code to be written once for each syslog feature
rather than once for each transport. Without this format, each other standard would need to define its
own syslog packet format and transport mechanism, which over time will introduce subtle compatibility
issues.

Issues
Syslog messages may contain control characters, including the NUL character. In addition, invalid UTF-8
sequences may be used by an attacker to inject ASCII control characters. Similarly, message truncation
can be misused by an attacker to hide vital log information.
There is no mechanism in the syslog protocol to detect message replay. An attacker may record a
set of messages that indicate normal activity of a machine. At a later time, that attacker may remove
that machine from the network and replay the syslog messages to the relay or collector.
Some messages may be lost because there is no mechanism to ensure delivery, and the underlying
transport may be unreliable (e.g., UDP).
Syslog can generate unlimited amounts of data. The transfer of this data over UDP is generally
problematic, since UDP lacks congestion control mechanisms.
The syslog protocol does not have mechanisms to provide confidentiality for the messages in transit.

Network administrators must take the time to estimate the appropriate capacity of the syslog collector.
An attacker may perform a Denial of Service attack by filling the disk of the collector with false messages.

Uses
Syslog is in widespread use, both for UNIX operating system hosts and for networking equipment.

2.9 Windows Management Instrumentation (WMI)

2.9.1 Reference
Windows Management Instrumentation (WMI) is the Microsoft implementation (footnote 8) of Web-based Enterprise Management (WBEM), an industry initiative to develop a standard technology for accessing management information in an enterprise environment.
WMI uses the Common Information Model (CIM) (footnote 9) industry standard to represent systems, applications, networks, devices, and other managed components. CIM is developed and maintained by the
Distributed Management Task Force (DMTF). The Managed Object Format (MOF) (footnote 10) language is used
to create new CIM classes.

2.9.2 Objectives
The main aim of WMI is to provide a standard way to share management information between Windows-based
management applications across the network. This set of specifications establishes a uniform model
that works in different environments and interacts with other existing management standards, such as
DMI (Desktop Management Interface) or SNMP, to access information from any source.
8 http://msdn.microsoft.com/en-us/library/aa384642(v=VS.85).aspx
9 http://www.dmtf.org/standards/cim
10 http://msdn.microsoft.com/en-us/library/aa823192%28v=vs.85%29.aspx

2.9.3 Structure

Structure overview
Microsoft WMI implements the three-tiered model of the WBEM architecture for working with management data. In this case the model includes the following components: a standard mechanism for storing
object definitions (a CIM-compliant object repository), a standard protocol for collecting and distributing
management data (such as COM/DCOM), and one or more Win32 dynamic-link libraries (DLLs) that
function as WMI data providers.
Figure 2.2 shows the data flow in the WMI architecture (footnote 11):

Figure 2.2: Windows Management Infrastructure architecture data flow


11 http://msdn.microsoft.com/en-us/library/ff566343%28v=VS.85%29.aspx

It is important to highlight that WMI is an object model and not a language. Several scripting languages, such as VBScript or Windows PowerShell, can be used with WMI to manage Windows-based servers locally and remotely.
Windows Management Instrumentation defines the objects, methods and properties that are
needed to access management information from the different parts of the operating system.
The model that WMI uses to store this information is the standard Common Information Model (CIM).
According to the CIM Specification 2.3 (footnote 12), there are three levels of classes in the CIM model
for storing information: the core, common and extension classes.
The core model defines an information model that applies to all areas of management.
The common model applies to information that is common to particular management areas (such
as systems, applications, networks and devices) but is independent of a particular implementation or technology.
The extension schemas are extensions to the common model for a specific technology, for example
for different operating systems such as Microsoft Windows or Unix.
In addition, according to the CIM definition provided by the DMTF, CIM is composed of a
specification and a schema. The specification defines the details for integration with other management
models, while the schema provides the actual model descriptions.
The specification can be described in Unified Modeling Language (UML), Managed Object Format
(MOF), or Extensible Markup Language (XML). To create and describe classes in the Common
Information Model (CIM), however, the Managed Object Format (MOF) (footnote 13) is the most widely used language.

Links with other data formats


WMI is an implementation of Web-Based Enterprise Management (WBEM) and is fully compliant
with the Common Information Model (CIM), defined by the DMTF, which is based upon UML.
MOF, the language that is used for describing the CIM classes, is based on the Interface Definition
Language (IDL).
It is possible to use Windows Remote Management (WinRM) instead of the Distributed Component
Object Model (DCOM) to obtain remote WMI management data. WinRM uses the SOAP-based WS-Management protocol, whose messages are formatted in XML.

Relationship with MASSIF


In the Olympic Games scenario there are Windows systems where WMI might be used to retrieve the logs,
but at present these logs are converted to the standard format and forwarded through syslog.
12 http://www.dmtf.org/sites/default/files/standards/documents/DSP0004V2.3_final.pdf
13 http://www.dmtf.org/sites/default/files/standards/documents/DSP0004_2.6.0_0.pdf

2.9.4 Critical assessment of the format

Advantages
WMI is widely present in Windows-based applications, so it is a common way to access and share management information from local and remote computers. In addition, a variety of scripting languages
(such as VBScript or Perl) can be used in enterprise applications and administrative scripts to obtain
WMI data or take actions through WMI.
CIM provides both a common model that applies to all areas and particular extensions
that define management information for systems, networks, applications, devices and services.
This feature allows building semantically rich management information that can be exchanged throughout
the network.

Issues
The WMI log files are being replaced by Event Tracing for Windows (ETW).
Applications that use Windows Management Instrumentation may be vulnerable.
For example, in some applications, insufficient security protections on WMI providers could allow a local
attacker to gain elevated privileges on the local system and use them to take control of it.

Uses
WMI scripts and applications are used to obtain and exchange management information on Windows-based systems. These scripts make it possible to perform administrative tasks on parts of the operating system
and to share management data with other products, such as Microsoft
System Center Operations Manager or Windows Remote Management (WinRM).

2.10 WS-Eventing and WS-Notification

2.10.1 Objectives
WS-Eventing[1] and WS-Notification[7] are two competing specifications that standardize message formats and Web services interfaces for subscription management and notification delivery in WS-based event notification systems. A WS-based event notification system uses Web services technologies to deliver event notifications and manage subscriptions. In such a system, a SOAP-formatted
subscription is sent to an event producer Web service, requesting that a certain kind of event notification be sent to
one or more event consumer Web services. As events occur, the event consumer Web services
receive SOAP-formatted notification messages. The notification messages can be transported through
intermediaries and can use different transport mechanisms.

2.10.2 Structure

Architecture
The architectures presented in WS-Eventing and WS-Notification are remarkably similar irrespective of
their incompatibility. In fact, subsequent versions of each specification have converged towards each
other, borrowing concepts from the other to mitigate their own deficiencies.
WS-Eventing and WS-Notification share the same WS-based architecture and follow the publisher/subscriber design. Both define subscriber and subscription manager entities. The event sink
defined in WS-Eventing is comparable to the notification consumer defined in WS-Notification. The
subscribers are separated from notification consumers, so that notification consumers are required to
handle only the received notification messages. They are not required to know the message broker location or to manage subscriptions. WS-Eventing does not separate the publisher from the event source.
The event source in WS-Eventing combines the functions of the notification producer and the publisher defined
in WS-Notification.

Function
WS-Eventing defines five operations, namely Subscribe, Renew, GetStatus, Unsubscribe and SubscriptionEnd. The Subscribe operation is used to create a subscription for an event sink. The Renew, GetStatus and Unsubscribe operations are provided by subscription managers to manage existing
subscriptions. If an event source terminates unexpectedly, a SubscriptionEnd message is generated
and sent to the address specified in the subscription request. If that address is not present in the
subscription request, the SubscriptionEnd message is not generated.
WS-Notification has operations comparable to these five. Even though it does not
define GetStatus and SubscriptionEnd operations, they can be implemented using the (optional) WS-ResourceFramework, since WS-Notification can treat subscriptions as WS-Resources as defined in the WS-ResourceFramework
specification.

Delivery mode
Both WS-Eventing and WS-Notification can use push, pull and wrapped modes to deliver notification
messages. The wrapped delivery mode can encapsulate several notification messages into one for
efficient delivery. The pull mode enables the event sink or notification manager to check an event source
periodically for relevant events. In push mode, the event source waits for an acknowledgement for the
notification message it sends.

Filters
WS-Notification defines three types of message filters, namely TopicExpression, ProducerProperties and
MessageContent. A subscriber can use any or all of these filters. WS-Eventing allows at most one filter
in a subscription request. The default filter is a content-based filter using an XPath expression in a specified
dialect that evaluates to a Boolean value as the filtering criterion. WS-Eventing does not specify a way to
filter messages using ProducerProperties of publishers.
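
To make the subscription and filtering mechanism more concrete, the following sketch builds a simplified WS-Eventing Subscribe message carrying a single XPath content filter. It is an illustration under assumptions, not a complete or validated request: the namespace and dialect URIs are those of the 2004 WS-Eventing and WS-Addressing submissions as recalled here and should be checked against the specification, other addressing headers (such as To, MessageID and ReplyTo) are omitted, and the helper names q and build_subscribe are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs recalled from the 2004 specifications; treat as assumptions.
SOAP = "http://www.w3.org/2003/05/soap-envelope"
WSA = "http://schemas.xmlsoap.org/ws/2004/08/addressing"
WSE = "http://schemas.xmlsoap.org/ws/2004/08/eventing"
XPATH_DIALECT = "http://www.w3.org/TR/1999/REC-xpath-19991116"


def q(namespace, name):
    """Return a namespace-qualified tag in ElementTree's {uri}name notation."""
    return "{%s}%s" % (namespace, name)


def build_subscribe(sink_address, xpath_filter):
    """Build a Subscribe envelope with one event sink and one XPath filter."""
    envelope = ET.Element(q(SOAP, "Envelope"))
    header = ET.SubElement(envelope, q(SOAP, "Header"))
    action = ET.SubElement(header, q(WSA, "Action"))
    action.text = WSE + "/Subscribe"

    body = ET.SubElement(envelope, q(SOAP, "Body"))
    subscribe = ET.SubElement(body, q(WSE, "Subscribe"))

    # Where notifications are delivered (the event sink).
    delivery = ET.SubElement(subscribe, q(WSE, "Delivery"))
    notify_to = ET.SubElement(delivery, q(WSE, "NotifyTo"))
    address = ET.SubElement(notify_to, q(WSA, "Address"))
    address.text = sink_address

    # At most one filter per subscription request.
    flt = ET.SubElement(subscribe, q(WSE, "Filter"), {"Dialect": XPATH_DIALECT})
    flt.text = xpath_filter
    return ET.tostring(envelope, encoding="unicode")
```

A call such as build_subscribe("http://sink.example.org/events", "/alert[@severity='high']") uses a hypothetical sink address and a hypothetical event structure; only notifications for which the XPath expression evaluates to true would be delivered.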

Links with other data formats


The WS-Eventing and WS-Notification specifications are composable with other WS-* specifications. Hence
they define only the key publisher/subscriber functions and rely on other WS-* specifications to provide
value additions such as security, reliability and transactions. For instance, WS-Security can be
used with WS-Eventing or WS-Notification to provide secure delivery of messages.

Relationship with MASSIF


Both specifications are candidates for receiving events from web services platforms.

2.10.3 Advantages of the formats


Both specifications provide means to develop distributed event notification systems utilizing existing
Web services technology, which intrinsically provides vendor-independent, platform-independent
and programming-language-independent interoperability.
They are composable with other WS-* specifications to provide various value additions such as
secure delivery, reliability and transactions.
They fit well with the asynchronous Web services invocation paradigm.

Data source | Characteristics (SIEM / Standard / Experience) | Rationale summary
CEF | Y(1) | CEF is an interesting glance at data collection from an important SIEM vendor and is a public specification.
CLF | Y(all) | CLF is a major log format for web servers, being supported by Apache out of the box. It can be directly integrated in many SIEMs, e.g. Prelude and RSA.
IDMEF | Y(1) | While IDMEF is not widely used in the community, and its significant overhead may prevent its further diffusion, it does provide a reference viewpoint for modeling alert information. At least 2 MASSIF partners have experience with IDMEF.
IF-MAP | | IF-MAP is a recent newcomer and has industrial backing, although outside the SIEM community so far. One MASSIF partner has experience with IF-MAP.
IODEF | | IODEF addresses a different community than the classic SIEM world, so it provides an additional, alternative viewpoint on decision support modeling that has, to our knowledge, no equivalent, and that is important for the MASSIF decision support components.
IPFIX | | IPFIX is becoming increasingly important in the networking world, where it may provide an alternative or a complement to syslog.
Syslog | Y(all) | This is the major data source. It is clearly used a lot in SIEMs, has standards backing and is used by professionals. It is the de facto data source standard for the ATOS use case and for many network operators. While the analysis of syslog messages needs to be refined to really understand the content, it does provide a first entry point for syntactic and semantic analysis.
WMI | | WMI is one of the major interfaces for managing Microsoft Windows systems, and as such is a way to retrieve information from them that is of interest to the MASSIF project.
WS-Eventing | | While these languages are currently rarely included in SIEM environments, the focus of MASSIF on business process attack detection makes these languages important.

Table 2.2: Included log sources summary

Data source | Characteristics (SIEM / Standard / Experience) | Rationale summary
ODBC | | While ODBC is used as a collection mechanism, it should be considered with caution. We believe that its use is oriented to Windows environments, and WMI provides a better alternative. Also, it is purely about transport and does not provide us with information about the data, and thus is considered out of the scope of this deliverable.
SNMP | | While SNMP is cited as a collection mechanism by several SIEMs, its use seems to be limited to transporting data. The management information bases used by SIEMs would have been in scope, but SIEM products do not publicly document them, and the transport protocol alone is out of the scope of this deliverable.
Log file pull | | Several methods for pulling log files are mentioned in SIEM documentation, such as FTP, SFTP, SSH or SCP. These do not provide information about the content of the information handled and thus do not fall into the scope of this deliverable.

Table 2.3: Eliminated log sources summary

Chapter 3

Use-case specific data streams

3.1 Olympic Games Scenario

3.1.1 Motivation and description


The Olympic Games SIEM definition follows business drivers; that is, the definition is tied to the specific
technology that the customer (the Local Organizing Committee) selects. Usually this decision follows
sponsorship interests.
Hence, the event processing languages in the Olympic Games scenario are tied to the specific SIEM
product deployed. The choice of event processing protocol influences the
internal representation, transmission and storage of the event data, but it is in any case usually tied
to the specific SIEM product. Current deployments are based on the Novell SIEM product (i.e. Novell Sentinel
6.1 in the Vancouver Winter Olympic Games project), and only two different protocols were used in the
last Olympic Games: Syslog and LEA.
The Olympic Games SIEM uses the Novell Sentinel product. Novell Sentinel 6.1 (footnote 1) delivers real-time monitoring and remediation for automated security and compliance. With a single view of security
and compliance events across the enterprise, Sentinel 6.1 combines identity management and security
event management in real time. Sentinel 6 streamlines labor-intensive and error-prone processes,
cuts costs through automation, and enables a more rigorous security and compliance
program.
1 http://www.novell.com/products/sentinel/

3.1.2 Novell Sentinel Interface: Syslog data format

Description
Syslog (footnote 2; see section 2.8) is a standard for logging program messages. It separates the
software that generates messages from the system that stores them and the software that reports and
analyzes them. It also gives devices that would otherwise be unable to communicate a means to
notify administrators of problems or performance.
There are three main topics when defining the Olympic Games related events and languages:
1. How to collect the data: transmission, syslog, WMI, SNMP, etc.
2. How to parse the data: format, spaces and commas.
3. How to make sense of the collected data: meaning/logic of the fields imposed by the monitored
application/system.
Mapping these three topics onto Novell Sentinel 6.1, we get the following Novell components:
Sources are the systems that are being monitored.
Connectors define connectivity protocols. Only two different protocols were used in the last Olympic
Games: Syslog and LEA.
Collectors define parsing rules and the mapping of the internal data representation onto the Sentinel taxonomy.
Examples of collectors used in the Olympic Games are Windows (through Snare agents), Sourcefire, Nortel switches/routers and Sophos Antivirus.

Advantages
Syslog provides flexibility when dealing with different SIEM products and is a widely used
log format.
Syslog is the preferred (de facto) format in the Olympic Games scenario.

Drawbacks and issues


We have used Syslog as the native logging function built into the network devices, e.g. switches/routers, IDS,
FW appliances, etc. These devices cannot speak IDMEF or similar formats.
2 http://www.syslog.org/

When monitoring Windows systems we could have used WMI to retrieve the logs, but we chose to keep a single
standard format and moved to syslog by deploying Snare agents on each Windows system
to translate Eventlog entries into syslog messages.

Examples
The following are examples of valid syslog messages. A description of each example can be found below
it. The examples are based on similar examples from RFC 3164[9] and may be familiar to readers. The
otherwise-unprintable Unicode BOM is represented as "BOM" in the examples.

Example 1 - with no STRUCTURED-DATA


<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 - BOM'su root' failed for lonvick on /dev/pts/8
In this example, the VERSION is 1 and the Facility has the value of 4. The Severity is 2. The message
was created on 11 October 2003 at 10:14:15pm UTC, 3 milliseconds into the next second. The message
originated from a host that identifies itself as "mymachine.example.com". The APP-NAME is "su" and
the PROCID is unknown. The MSGID is "ID47". The MSG is "'su root' failed for lonvick...", encoded in
UTF-8. The encoding is defined by the BOM. There is no STRUCTURED-DATA present in the message;
this is indicated by "-" in the STRUCTURED-DATA field.

Example 2 - with no STRUCTURED-DATA


<165>1 2003-08-24T05:14:15.000003-07:00 192.0.2.1 myproc 8710 - - %% It's time to make the do-nuts.
In this example, the VERSION is 1. The Facility is 20, the Severity 5. The message was created
on 24 August 2003 at 5:14:15am, with a -7 hour offset from UTC, 3 microseconds into the next second.
The HOSTNAME is "192.0.2.1", so the syslog application did not know its FQDN and used one of its
IPv4 addresses instead. The APP-NAME is "myproc" and the PROCID is "8710" (for example, this could
be the UNIX PID). There is no STRUCTURED-DATA present in the message; this is indicated by "-" in
the STRUCTURED-DATA field. There is no specific MSGID and this is indicated by the "-" in the MSGID
field. The message is "%% It's time to make the do-nuts.". As the Unicode BOM is missing, the syslog
application does not know the encoding of the MSG part.
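
The Facility and Severity values quoted in Examples 1 and 2 follow from the PRIVAL field, which encodes both numbers as facility * 8 + severity. A short sketch of the decoding (the function name decode_pri is an illustrative choice):

```python
def decode_pri(prival):
    """Split a PRIVAL (0..191) into a (facility, severity) pair."""
    if not 0 <= prival <= 191:
        raise ValueError("PRIVAL must be in the range 0..191")
    # PRIVAL = facility * 8 + severity
    return divmod(prival, 8)


print(decode_pri(34))   # Example 1: (4, 2)
print(decode_pri(165))  # Example 2: (20, 5)
```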

Example 3 - with STRUCTURED-DATA


<165>1 2003-10-11T22:14:15.003Z mymachine.example.com evntslog - ID47 [exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"] BOMAn application event log entry...
This example is modeled after Example 1. However, this time it contains STRUCTURED-DATA, a single element with the value [exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"].
The MSG itself is "An application event log entry...". The BOM at the beginning of MSG indicates UTF-8
encoding.

Example 4 - STRUCTURED-DATA Only


<165>1 2003-10-11T22:14:15.003Z mymachine.example.com evntslog - ID47 [exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"][examplePriority@32473 class="high"]
This example shows a message with only STRUCTURED-DATA and no MSG part. This is a valid
message.
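
The STRUCTURED-DATA field of Examples 3 and 4 can be decomposed into its SD-ELEMENTs and SD-PARAMs as sketched below. This is a simplified illustration that assumes the field has already been isolated from the rest of the message; the function name parse_structured_data and the nested-dictionary result are assumptions, and repeated SD-IDs are not checked.

```python
import re

# SD-NAME: 1*32PRINTUSASCII except '=', SP, ']' and '"'
SD_NAME = r'(?:(?![="\]])[!-~]){1,32}'
# PARAM-VALUE: '"', '\' and ']' appear only in escaped form
PARAM_VALUE = r'(?:\\.|[^"\\\]])*'
SD_PARAM_RE = re.compile(r' (' + SD_NAME + r')="(' + PARAM_VALUE + r')"')
SD_ELEMENT_RE = re.compile(
    r'\[(' + SD_NAME + r')((?: ' + SD_NAME + r'="' + PARAM_VALUE + r'")*)\]'
)


def parse_structured_data(text):
    """Return {SD-ID: {PARAM-NAME: PARAM-VALUE}} for a STRUCTURED-DATA field."""
    if text == "-":  # NILVALUE: no structured data
        return {}
    elements = {}
    pos = 0
    while pos < len(text):
        match = SD_ELEMENT_RE.match(text, pos)
        if match is None:
            raise ValueError("malformed SD-ELEMENT at offset %d" % pos)
        sd_id, params = match.group(1), match.group(2)
        elements[sd_id] = {
            # Remove the backslash in front of escaped '"', '\' and ']'.
            name: re.sub(r'\\(["\\\]])', r'\1', value)
            for name, value in SD_PARAM_RE.findall(params)
        }
        pos = match.end()
    return elements
```

Applied to the STRUCTURED-DATA of Example 4, the function yields two entries, keyed by exampleSDID@32473 and examplePriority@32473, each holding its parameters as strings.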

3.1.3 Novell Sentinel Interface: LEA API

Description
Check Point (footnote 3) has two APIs, LEA (Log Export API) and ELA (Event Logging API), that allow third parties
to access log data. This ability to access a granular level of connection detail enables robust reporting
capabilities for specialized security products, network reporting products, help desk and event management systems, security audits, accounting and billing, and network management systems. This
integration is accomplished through two client-server APIs which enable events to be passed between
the Check Point Management Console and other products through secure channels.
The Log Export API enables applications to read the VPN-1/FireWall-1 log database. The LEA client,
written by an OPSEC (Open Platform for Security) partner, can retrieve both real-time and historical log
data from the Management Console with the LEA server. A reporting application can use the LEA
client in on-line or off-line mode to process the logged events that are generated by the VPN-1/FireWall-1 security policy. OPSEC partners rely on LEA as a mission-critical source for granular traffic
connection information driven by the VPN-1/FireWall-1 kernel engine. The SSL-enabled version of LEA
provides additional security to applications, ensuring that all data traversing the network between the
LEA application and the firewall management system is encrypted.
3 http://www.checkpoint.com/

Drawbacks and issues


Check Point supports only its proprietary protocol, LEA, for real-time event export. As a proprietary
protocol, LEA has disadvantages such as higher costs, slower distribution models and a lack of customization.

3.2 Mobile Money Transfer Service scenario

3.2.1 Motivation and description


The Money Transfer Service is a system where virtual money, called mMoney, is used to carry out
various types of money transfers. As it is forbidden to create money in a country, the system's operator
has to be associated with a bank, which issues mMoney in exchange for the equivalent amount of real
money. The operator and its associated bank have to report specific data and activities to the central
bank, which is responsible for a country's financial policy, in order to fight fraudulent activities such
as money laundering.

3.2.2 Mobile Money Service: proprietary data format


The Mobile Money Transfer Service can be divided into three processes which run concurrently:
Management of the system's stock of mMoney: The various users of the Money Transfer Service
(operator, merchants, billers, retailers and customers) form a system in which the mMoney stock
has to remain constant.
Before the Mobile Money Transfer Service is launched, mMoney has to be issued by the Partner
Bank. For this purpose, the operator gives non-virtual money to the bank and receives the
equivalent amount of mMoney. This is the initial stock of mMoney that can be distributed by the
operator to the other users of the Mobile Money Transfer Service in exchange for cash. The operator
is able to increase or decrease the amount of mMoney that is used in the system by exchanging
mMoney and cash with the Partner Bank.
Transactions made by the service's subscribers: The customers of the Mobile Money Transfer Service can perform the following operations:

Recharge: This service enables customers to buy extra time for their telecom prepaid account
with mMoney.
Cash in / Cash out: This transaction allows customers to deposit money into or withdraw money from
their mobile wallet (mWallet) through a retailer.
National / International Money Transfer: This enables customers to transfer mMoney from
their mWallet to another person within the country or outside the country. The receiver may or may not
be registered with the same money transfer service and may also be a user of a different
operator.
Bill Payment: It enables customers to receive and pay bills using their mWallet account.
Salary Payment: It allows customers to have their salary paid into their mWallet account.
Social Security Payment: It allows users to have their social security benefits paid into their
mWallet account.
Merchant Payment: It enables users to buy goods and services with mMoney from their
mWallet accounts.
Third-Party Payments: It enables users to pay through a third party such as PayPal.
Financial Operations: It allows users to perform financial operations such as credit and savings.
The retailers, billers and merchants can also interact with the operator to exchange mMoney for
cash or vice versa.
Reporting to the Central Bank: Periodically, a report is generated for the Central Bank by the Partner
Bank. The Central Bank also has the right to access the information of any transaction it wishes to
investigate.
All of these operations must be included in the audit trail provided by the application's log files.
Table 3.1 summarizes the description of the Money Transfer Service actors.
In the money transfer service, the following information is necessary for each transaction:
MSISDN The phone number of the customer (sender/receiver)
User ID The identifier of the actor (sender/receiver)
Transaction ID The transaction identifier
Transaction Type The transaction type (money transfer, withdrawal, ...)
Transaction Status The transaction status (success, fail, waiting, ...)
Request Type The type of transaction (Request, Reply, Signaling, ...)
Transfer ID, Date and Time of Transfer
Actor Category Sender/Receiver Category (Customer, Merchant, Biller, ...)
Balance Sender/Receiver mMoney Balance (Customer, Merchant, Biller, ...)
Figure 3.1 shows an example of a log message.
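
As an illustration of how such log records could be turned into structured events, the sketch below parses one record shaped like those in the log example of Figure 3.1. It is a minimal sketch under assumptions: each MESSAGE block is taken to be well-formed XML once isolated, the date field is assumed to always use the bracketed format visible in the figure, and the function name parse_record is hypothetical. Real log files contain many records without a common root element and fields that are elided in the figure.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# One record copied from the log example of Figure 3.1.
SAMPLE = """<MESSAGE>
<date> [2010-03-25 16:12:20,020] </date>
<Type1> Reply To MSISDN: 1111111111 </Type1>
<TYPE>Type1_Reply</TYPE>
</MESSAGE>"""


def parse_record(text):
    """Turn one <MESSAGE> block into a dict of stripped field values."""
    root = ET.fromstring(text)
    record = {child.tag: (child.text or "").strip() for child in root}
    # The timestamp is wrapped in square brackets, with a comma before
    # the milliseconds.
    stamp = record["date"].strip("[]")
    record["date"] = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")
    return record


record = parse_record(SAMPLE)
print(record["TYPE"], record["date"].isoformat())
```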

Table 3.1: Money Transfer Message Elements

Actor | Description
Customer | An individual user of the Money Service holding an active Money Account.
mMoney | An electronic monetary unit denominated in local currency and issued by the Bank.
mWallet | Virtual account hosted in the Operator system, allowing a customer to deposit money in his mWallet, transfer amounts from his own mWallet to other customers' mWallets, recharge a mobile account from the mWallet, pay bills through the mWallet, make balance inquiries, etc.
User Money Account | The mMoney account opened by the Operator in the user's name, for the purpose of holding and managing the mMoney held by the relevant Participant.
Participant | Any service actor (Customer, Retailer, Merchant, ...)
Retailer | Local interface between the operator and the end user
Wholesaler | Retailer provider
Merchant | The merchant sells goods or services.
Biller | The biller sells goods or services.

3.3 Managed Enterprise Service Infrastructures scenario

3.3.1 Motivation and description


The Managed Enterprise Service Infrastructures scenario uses the IBM Tivoli Security Operations Manager (TSOM) (footnote 4)
SIEM. This product offers centralization and storage of security data, improving security operations and
aiding in information risk management. The TSOM SIEM system offers a platform for managed security services, using automation to reduce operational costs. TSOM comprises four components, namely
an Event Aggregation Module (EAM), a Universal Collection Module (UCM), a Central Management System
(CMS) and a database.
The aim of the Managed Enterprise Service Infrastructure scenario is to improve the functionality
of SIEM systems by providing input to the analysis and detection modules of MASSIF and receiving
feedback from the analysis modules. Analysis of events can shed light on those that create risk
or violate system security, improving the effectiveness of security management within such a
managed environment. The intention is to create a feedback loop whereby security alerts created in
MASSIF can be fed back into the Managed Enterprise Service environment to improve the quality and
proactiveness with which security management teams can respond.
Two main components within IBM's TSOM manage the collection of events within the security environment:
4 http://www-01.ibm.com/software/tivoli/products/security-operations-mgr/


Figure 3.1: Log example


<MESSAGE>
<date> [2010-03-25 16:12:19,761] </date>
<Type1> Request From MSISDN: 1111111111 </Type1>
<TYPE>Type1_Request</TYPE>
<MSISDN>1111111111</MSISDN>
<AMOUNT>2000</AMOUNT>
...
..
</MESSAGE>
<MESSAGE>
<date> [2010-03-25 16:12:20,020] </date>
<Type1> Reply To MSISDN: 1111111111 </Type1>
<TYPE>Type1_Reply</TYPE>
</MESSAGE>
<MESSAGE>
<date> [2010-03-25 16:47:51,177] </date>
<Type1> Request From MSISDN: 2222222222 </Type1>
<TYPE>Type2_Request</TYPE>
<MSISDN>3333333333</MSISDN>
<STATUS></STATUS>
</MESSAGE>
<MESSAGE>
<date> [2010-03-25 17:21:25,826] </date>
<Type1> Request From MSISDN: 1111111111 </Type1>
<TYPE>Type3_Request</TYPE>
<MSISDN>1111111111</MSISDN>
<AMOUNT>2000</AMOUNT>
</MESSAGE>

Event Aggregation Module (EAM): Data from various network devices and applications are gathered
by the EAM, via conduits such as Syslog or SNMP. The EAM normalizes, filters, batches and
transmits incoming data streams to the Central Management System (CMS) for further processing.
Central Management System (CMS): Data streams received from EAM servers are correlated and
events categorized and stored within a connected database. Using a deterministic threat analysis
technique, the CMS determines the level of threat an event poses, applying pre-configured rules
from the stateful rules engine to respond to threatening events and attack signatures.
TSOM allows actions to be performed in response to events, such as transmission of SNMP traps or
Syslog messages.

3.3.2 Tivoli TSOM interface: SNMP data format


An SNMP trap consisting of various OIDs and data is illustrated in Table 3.2 below, using a Windows
Event Logging example:

OID Name         Value
sysUpTime
snmpTrapOID      1.3.6.1.4.1.13978.2.0.2
EamTime          1294301091056
SensorTime       1294301855000
SecurityDomain   All Event, MF
SensorName       OMRSNA016
SensorType       Windows EventLog
...              ...
25 Information   EventLog = Security
                 RecordNumber = 6049430
                 TimeGenerated = 2011-01-06 10:17:35
                 TimeWritten = 2011-01-06 10:17:35
                 EventID = 529
                 EventType = 16
                 EventTypeName = Failure Audit event

Table 3.2: TIVOLI TSOM SNMP Trap content example

Tasks include identifying the most significant OIDs within SNMP traps, and pre-processing this data
into CSV files. An anonymisation tool is responsible for providing anonymous sample event data for
testing with MASSIF.
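
The pre-processing step described above could look roughly like the following Python sketch. It is only an illustration under stated assumptions: the `trap` dictionary is a hypothetical in-memory representation of the values in Table 3.2, the choice of `COLUMNS` as the "significant" fields is ours, and the assumption that `EamTime` and `SensorTime` are epoch milliseconds is inferred from the sample values rather than from TSOM documentation.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

# Hypothetical trap content, copied from the values shown in Table 3.2.
trap = {
    "snmpTrapOID": "1.3.6.1.4.1.13978.2.0.2",
    "EamTime": "1294301091056",
    "SensorTime": "1294301855000",
    "SensorName": "OMRSNA016",
    "SensorType": "Windows EventLog",
    "Information": "EventLog = Security\nRecordNumber = 6049430\n"
                   "EventID = 529\nEventType = 16",
}

# Assumption: the fields judged significant; the real selection is project-specific.
COLUMNS = ["snmpTrapOID", "EamTime", "SensorTime", "SensorName", "SensorType", "EventID"]
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_iso(ms: str) -> str:
    """Convert epoch milliseconds (assumed unit) to an ISO 8601 UTC timestamp."""
    return (EPOCH + timedelta(milliseconds=int(ms))).isoformat()


def flatten(trap: dict) -> dict:
    """Split the multi-line Information value into fields and normalize times."""
    row = {k: v for k, v in trap.items() if k != "Information"}
    for line in trap.get("Information", "").splitlines():
        key, sep, value = line.partition("=")
        if sep:
            row[key.strip()] = value.strip()
    for field in ("EamTime", "SensorTime"):
        if field in row:
            row[field] = ms_to_iso(row[field])
    return row


def traps_to_csv(traps) -> str:
    """Write one CSV row per trap, keeping only the selected columns."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for t in traps:
        writer.writerow(flatten(t))
    return buf.getvalue()
```

Anonymisation (for example replacing `SensorName` with a pseudonym) would be a further step applied to each row before writing; it is not shown here.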

3.3.3 Tivoli TSOM interface: Syslog data format


The Syslog format has proven useful, as many open source and proprietary tools exist for reporting
and analysis. Message packets are limited to 1024 bytes and contain facility, severity, hostname,
timestamp and message fields. An example Syslog message is shown below:

Mar 6 22:48:34.452 UTC: %LINEPROTO-5-UPDOWN: Line protocol on Interface Loopback0, changed state to
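
As a minimal sketch of how such a message might be decomposed, the Python fragment below decodes a BSD syslog priority value (PRI = facility * 8 + severity) and splits the `%FACILITY-SEVERITY-MNEMONIC` tag visible in the example. The PRI value 189 used in the usage note is hypothetical, since the example above does not show its PRI header, and the tag in the message is the sending device's own facility name, which is distinct from the numeric syslog facility. The function and pattern names are ours.

```python
import re

SEVERITIES = ["emergency", "alert", "critical", "error",
              "warning", "notice", "informational", "debug"]


def decode_pri(pri: int) -> tuple:
    """Split a syslog PRI value into (facility, severity)."""
    return divmod(pri, 8)


# Device-style tag as seen in the example: %FACILITY-SEVERITY-MNEMONIC: text
TAG = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+):\s*(?P<text>.*)"
)


def parse_tagged_message(message: str):
    """Return the tag fields of a message, or None if no tag is present."""
    match = TAG.search(message)
    if match is None:
        return None
    fields = match.groupdict()
    fields["severity"] = int(fields["severity"])
    fields["severity_name"] = SEVERITIES[fields["severity"]]
    return fields
```

For instance, `decode_pri(189)` yields facility 23 and severity 5, and parsing the example message gives the mnemonic `UPDOWN` with severity 5 (notice).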

3.4 Critical Infrastructure Process Control (Dam) scenario


This use case section is more detailed than the previous ones, as the partners of the MASSIF project
are less familiar with this kind of data.


3.4.1 Motivation and description


This section describes selected formats related to the Critical Infrastructure Process Control scenario.
The scenario considers an Automatic Data Acquisition System (ADAS) developed for controlling dams.
A typical ADAS is organized as a three-level hierarchical architecture. The lowest level is composed
of a number of sensors and actuators. A new trend at this level is the adoption of wireless sensor
networks, which brings a number of opportunities but also new challenges.
The second level is composed of Remote Transmission Units (RTUs) responsible for interfacing the
field devices with a remote control unit (Master Terminal Unit, MTU) placed at the third level of
the hierarchy. Sometimes RTUs are replaced by more complex devices referred to as Master Control Units.
Typically the MTU is connected to a Control Station and a Visualization Station. The Control Station
is a complex service infrastructure, which includes applications, systems, networks, security, storage
and mainframe environments. In the Visualization Station, a web interface is available for remote
monitoring of the dam.
The scenario includes a large number of heterogeneous formats and data flows, including widely used
formats such as Syslog or Common Log Format (CLF), industrial control specific formats, and legacy
formats.
This section focuses on a selection of modern industrial control specific formats, namely
Modbus, CTP, and DIP. A more comprehensive description of the scenario can be found in chapter 6 of
deliverable D2.1.1 (Scenario Requirements). A Modbus reference can be found at the Modbus homepage
(footnote 5). The Collection Tree Protocol (CTP, footnote 6) and the Dissemination Protocol (DIP,
footnote 7) are part of TinyOS (footnote 8), a BSD-licensed operating system designed for low-power
wireless devices. Other formats of interest for the scenario (e.g. syslog, CLF) are also described
in this deliverable.

3.4.2 Dam scenario: Modbus data format


The Modbus protocol is a messaging structure developed by Modicon in 1979. It is used to establish
master-slave/client-server communication between intelligent devices. It is a de facto standard, truly
open, and the most widely used network protocol in the industrial manufacturing environment. It has been
implemented by hundreds of vendors on thousands of different devices to transfer discrete/analog I/O and
register data between control devices. The Modbus protocol was transferred from Schneider Electric to
Modbus-IDA in April 2004, as a commitment to openness. The specification is available free of charge
for download, and there are no subsequent licensing fees required for using Modbus or Modbus TCP/IP
protocols.
5 http://www.modbus.org
6 http://www.tinyos.net/tinyos-2.x/doc/html/tep123.html
7 http://www.tinyos.net/tinyos-2.x/doc/html/tep118.html
8 http://www.tinyos.net/


The CTP and DIP protocols are used in the Wireless Sensor Networks (WSNs) deployed in the dam
scenario. CTP is a tree-based collection protocol in which nodes in a wireless network act as
tree roots. In our context the protocol is used to collect data and information from the wireless
sensor nodes constituting a WSN. DIP is a dissemination protocol used in the WSN to send commands
through the tree nodes.

Structure overview (Modbus)


Modbus is an application-layer messaging protocol, positioned at level 7 of the OSI model. It provides
client/server communication between devices connected on different types of buses or networks.
Modbus is used in multiple master-slave applications to monitor and program devices, to communicate
between intelligent devices, sensors and instruments, and to monitor field devices using PCs and
HMIs. Modbus is also well suited to RTU applications where wireless communication is required.
For this reason, it is used in many gas, oil and substation applications. Modbus is not only an
industrial protocol; it is also used in building, infrastructure, transportation and energy applications.
It is currently implemented using:
TCP/IP over Ethernet
Asynchronous serial transmission over a variety of media (wire: EIA/TIA-232-E, EIA-422, EIA/TIA-485-A; fiber, radio, etc.)
Modbus PLUS, a high speed token passing network.
The Modbus protocol allows easy communication in all types of network architectures.
Every type of device (PLC, HMI, Control Panel, Driver, Motion control, I/O Device, ...) can use the
Modbus protocol to initiate a remote operation. The same communication can be carried out equally
well on a serial line or on an Ethernet TCP/IP network. Gateways allow communication between several
types of buses or networks using the Modbus protocol.
Modbus is a request/reply protocol and offers services specified by function codes. Modbus function
codes are elements of Modbus request/reply PDUs (Protocol Data Units). The TCP port number 502 is
reserved for this protocol.
The Modbus application data unit is built by the client that initiates a Modbus transaction. The
function code indicates to the server which kind of action must be performed. The Modbus application
protocol establishes the format of a request initiated by a client.

Figure 3.2: General Modbus Frame


The function code field of a Modbus data unit is coded in one byte. Valid codes are in the range of 1
to 255 decimal (the range 128 to 255 is reserved and used for exception responses). When a message
is sent from a client to a server device, the function code field tells the server which kind of action
must be performed. Function code "0" is not valid. Sub-function codes are added to some function codes
to define multiple actions. The data field, contained in the messages sent from a client to a server
device, contains additional information that the server uses to take the action defined by the function
code. This can include items like discrete and register addresses, the quantity of items to be handled,
and the count of actual data bytes in the field. The data field may be non-existent (of zero length) in
certain kinds of requests; in this case the server does not require any additional information and the
function code alone specifies the action. If no error related to the requested Modbus function occurs
(in a properly received Modbus ADU), the data field of a response from a server to a client contains
the data requested. If an error related to the requested Modbus function occurs, the field contains an
exception code that the server application can use to determine the next action to be taken. For example,
a client can read the ON/OFF states of a group of discrete outputs or inputs, or it can read/write the
data contents of a group of registers. When the server responds to the client, it uses the function code
field to indicate either a normal (error-free) response or that some kind of error occurred (called an
exception response). For a normal response, the server simply echoes the original function code of the
request.

Figure 3.3: Modbus transaction (error free)


For an exception response, the server returns a code that is equivalent to the original function code
from the request PDU with its most significant bit set to logic 1.

Figure 3.4: Modbus transaction (exception response)

The size of the Modbus PDU is limited by the size constraint inherited from the first Modbus
implementation on Serial Line network (max. RS485 ADU = 256 bytes).
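
The function code handling described above can be illustrated with a short Python sketch. It assumes a response PDU is available as raw bytes with the function code in the first byte; the function names are ours and do not come from any Modbus library.

```python
EXCEPTION_FLAG = 0x80  # most significant bit of the one-byte function code


def exception_function_code(request_fc: int) -> int:
    """Function code a server returns when the request fails."""
    if not 1 <= request_fc <= 127:
        raise ValueError("request function codes lie in the range 1..127")
    return request_fc | EXCEPTION_FLAG


def classify_response(request_fc: int, response_pdu: bytes):
    """Return ('normal', data) or ('exception', exception_code)."""
    response_fc = response_pdu[0]
    if response_fc == request_fc:
        # Normal response: the server echoes the request function code.
        return ("normal", response_pdu[1:])
    if response_fc == exception_function_code(request_fc):
        # Exception response: the next byte carries the exception code.
        return ("exception", response_pdu[1])
    raise ValueError("response does not match the request function code")
```

A request with function code 0x03 is therefore answered either with 0x03 followed by data, or with 0x83 followed by a one-byte exception code.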


The Modbus protocol defines three PDUs:

Modbus Request PDU, mb_req_pdu
Modbus Response PDU, mb_rsp_pdu
Modbus Exception Response PDU, mb_excep_rsp_pdu
Modbus uses the big-endian representation for addresses and data items.

Modbus Advantages
Simplicity Modbus TCP/IP simply takes the Modbus instruction set and wraps TCP/IP around it. Development
costs are exceptionally low. Minimum hardware is required, and development is easy
under any operating system.
Open The Modbus specification is available free of charge for download, and there are no subsequent
licensing fees required for using Modbus or Modbus TCP/IP protocols. Additional sample code,
implementation examples, and diagnostics are available in the Modbus TCP toolkit, a free benefit
to Modbus Organization members and available for purchase by nonmembers.
Availability of many devices Interoperability among different vendors' devices and compatibility with a
large installed base of Modbus-compatible devices.

Issues (Modbus)
Non-encrypted Modbus is not encrypted. There is no protection from message eavesdropping and
spoofing.
No data description The Modbus protocol does not natively support data object description.

Modbus Example
Modbus is the standard protocol used for communication between an RTU and a supervisory server,
such as a SCADA system. Several versions of the standard are available, and the differences include
the data format used. The samples below relate to the Modbus ASCII and Modbus RTU versions and
contain measurement data:

Modbus ASCII Frame Format


START:     1 char, starts with colon ( : )
ADDR:      2 chars, Station Address
FUNCT:     2 chars, Function codes
DATA:      n chars, Data
LRC Check: 2 chars, Error checks
End:       2 chars, carriage return line feed (CR-LF) pair

Modbus ASCII Frame Format


   :    11    03       0 0 6 B 0 0 0 3      7E   CR LF
|START|ADDR |FUNCT|---------DATA----------|-LRC-|-END-|
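
The LRC value in the ASCII frame above can be reproduced with a few lines of Python. This is a minimal sketch: it assumes the usual Modbus definition of the LRC as the two's complement of the 8-bit sum of the address, function and data bytes, and the helper names are ours.

```python
def lrc(payload: bytes) -> int:
    """Longitudinal redundancy check: two's complement of the 8-bit byte sum."""
    return (-sum(payload)) & 0xFF


def build_ascii_frame(address: int, function: int, data: bytes) -> bytes:
    """Encode address, function, data and LRC as hex characters between ':' and CR-LF."""
    payload = bytes([address, function]) + data
    body = (payload + bytes([lrc(payload)])).hex().upper()
    return b":" + body.encode("ascii") + b"\r\n"
```

Calling `build_ascii_frame(0x11, 0x03, bytes.fromhex("006B0003"))` gives the frame shown above, with `7E` as the LRC characters.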

Modbus RTU Frame Format


ADDR:  8 bits, Station Address
FUNCT: 8 bits, Function codes
DATA:  n * 8 bits, Data
CRC:   Checksum

  11    03       00 6B 00 03      76 87
|ADDR |FUNCT|-------DATA---------| CRC |
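
The checksum of the RTU frame above can be checked with the following Python sketch. It assumes the CRC-16 variant commonly used by Modbus RTU (initial value 0xFFFF, reflected polynomial 0xA001, low byte transmitted first); the helper names are ours.

```python
def crc16_modbus(payload: bytes) -> int:
    """Bitwise CRC-16 as used on Modbus serial lines."""
    crc = 0xFFFF
    for byte in payload:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def build_rtu_frame(address: int, function: int, data: bytes) -> bytes:
    """Append the CRC to address, function and data, low byte first."""
    payload = bytes([address, function]) + data
    return payload + crc16_modbus(payload).to_bytes(2, "little")
```

For the sample request, the computed CRC is 0x8776, which appears on the wire as the bytes `76 87`.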
These are other typical Modbus operative commands:

Read Input Status (On/Off)


  11   01      0013       0025     0E84
|ADDR|FUNCT| INPUT ADDR| #INPUTS | CRC |
Response
  11   01         05          CD6BB20E1B     45E6
|ADDR|FUNCT| # DATA BYTES | response flags | CRC |
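
To make the response flags readable, the status bits can be unpacked as in the Python sketch below. It assumes the usual Modbus packing, where the least significant bit of the first data byte holds the first requested item; CRC verification is omitted, and the function name is ours.

```python
def decode_status_response(frame: bytes, quantity: int) -> list:
    """Unpack ON/OFF flags from a status response frame (CRC not verified here)."""
    byte_count = frame[2]
    data = frame[3:3 + byte_count]
    if byte_count != (quantity + 7) // 8 or len(data) != byte_count:
        raise ValueError("byte count does not match the requested quantity")
    # Item i is stored in bit (i % 8) of data byte (i // 8).
    return [bool((data[i // 8] >> (i % 8)) & 1) for i in range(quantity)]
```

The request above asks for 0x25 (37) items, which matches the 5 data bytes of the response; the first byte `CD` decodes to ON, OFF, ON, ON, OFF, OFF, ON, ON.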

3.4.3 Dam scenario: WSN and CTP data formats


The Collection Tree Protocol is used to send metering or alerting data produced by the sensors to
a collector server that feeds the RTU device used in the Infrastructure Protection system. Metering
concerns the evaluation of the physical parameters relevant to dam protection.
The Dissemination Protocol is similar to multicast protocols, in that it is used to send the same
information to all the nodes. A central server uses DIP to send commands or configuration to the
sensor network nodes, for example to change their metering tasks or to change alerting thresholds.


The Collection Tree Protocol (CTP) is a tree-based collection protocol. CTP is used to collect
data from the sensors in a Wireless Sensor Network by means of data messages and to send routing
information to other nodes by means of routing messages. Data in this context are measurements.
Regarding the routing mechanism, CTP assumes that every node is root for other nodes. The
first root node is the real root node and is called the Base Station (BS). All the nodes that are
part of a WSN send data to the Base Station. The CTP data message carries, among other fields, the
ID of the node originating the message and routing flags for congestion control. Moreover, CTP is
address-free, in that a node does not send a packet to a particular root; instead, it implicitly
chooses a root by choosing a next hop. Nodes generate routes to roots using a routing gradient. For
the next hop choice, CTP uses a shortest path first algorithm, which gives priority to the route to
the base station having the lowest cost. The cost function can be based either on the hop count to
the base station or on an estimate of the link bandwidth.
CTP therefore estimates the link quality with a certain number of neighbors; the protocol used to
exchange information with other nodes about the transmission cost is called LEEP (Link Estimation
Exchange Protocol, footnote 9). The quality values are used to select the parent node, that is, the
neighbor node with the best path metric. The nodes periodically send route update messages with
routing information to their neighbors. The routing message contains the measured Expected
Transmission cost (ETX) to the base station and a measure of the link quality for every neighbor
node. It also contains the generating node's current parent ID and the node's current routing
metric value.
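
The parent selection just described can be sketched in Python as follows. This is an illustration under assumptions, not the TinyOS implementation: we assume the path metric through a neighbor is the ETX that neighbor advertises in its routing messages plus the estimated ETX of the link to that neighbor, and the `Neighbor` structure and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Neighbor:
    node_id: int
    advertised_etx: int  # neighbor's own route cost to the base station
    link_etx: int        # estimated cost of the link to this neighbor


def path_etx(neighbor: Neighbor) -> int:
    """Cost of reaching the base station through this neighbor (assumed additive)."""
    return neighbor.advertised_etx + neighbor.link_etx


def select_parent(neighbors: Iterable[Neighbor]) -> Optional[Neighbor]:
    """Pick the neighbor offering the lowest-cost path, or None if there is none."""
    candidates = list(neighbors)
    return min(candidates, key=path_etx) if candidates else None
```

With neighbors costing 20+15, 10+12 and 0+40, the second one is chosen as parent because its total of 22 is the lowest, even though the third is the base station itself behind a poor link.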
The Dissemination Protocol (DIP), in contrast, serves different aims in the WSN: common
uses include network reconfiguration and reprogramming. These operations rely on variables shared
among the nodes, and keeping those shared variables consistent is the service offered by DIP.
The dissemination service tells nodes when a value changes, and exchanges packets so that the
value reaches eventual consistency across the network. At any given time two nodes may disagree,
but over time the number of disagreements shrinks and the network converges on a single value.
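
The convergence behaviour can be illustrated with a deliberately simplified Python model. It is not the DIP algorithm itself: it assumes each node stores a version number alongside the shared value and that neighbors in a line topology exchange their copies once per round, keeping the higher version. Timers, hashing and packet formats are left out, and all names are ours.

```python
def exchange(a: dict, b: dict) -> None:
    """Both nodes keep whichever copy carries the higher version number."""
    newest = dict(a if a["version"] >= b["version"] else b)
    a.update(newest)
    b.update(newest)


def disseminate(nodes: list) -> int:
    """Exchange between adjacent nodes until all versions agree; return the rounds used."""
    rounds = 0
    while len({node["version"] for node in nodes}) > 1:
        for left, right in zip(nodes, nodes[1:]):
            exchange(left, right)
        rounds += 1
    return rounds
```

Starting from five nodes where only one holds a newer alerting threshold, repeated exchanges reduce the number of disagreeing nodes each round until every node holds the same value.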

WSN Example
In a Wireless Sensor Network the messages from the Wireless Sensors to the Base Station contain
routing data or measurement data and are transported by means of the CTP and DIP protocols. Some
sample packets (header and data) follow:

PCR = Routing Pull flag (P), Congestion Notification flag (C), Reserved Bits (R)
THL = Time Has Lived
ETX = Expected Transmission cost
ORIGIN = Origin node
SQ = Origin Sequence Number
CI = Collect_id
Data = Data payloads (i.e. measurements)
9 http://www.tinyos.net/tinyos-2.x/doc/html/tep124.html


Data packet (type 1):

|PCR|THL| ETX |ORIGIN|SQ|CI| Data              |
 00  01  00 15 00 03  02 0D  03 53 01 DE 00 03
 00  01  00 14 00 03  02 0D  03 53 01 DE 00 03
 00  01  00 2D 00 05  02 0D  02 F6 01 E1 00 05

Data packet (type 2):

|PCR|THL| ETX |ORIGIN|SQ|CI| Data                                               |
 00  00  00 20 00 01  00 0C  00 05 00 05 00 01 00 00 00 00 00 00 00 00 02 00 03
 00  02  00 10 00 01  00 0C  00 05 00 05 00 01 00 00 00 00 00 00 00 00 02 00 03
 00  00  00 20 00 01  00 0C  00 05 00 05 00 01 00 00 00 00 00 00 00 00 02 00 03
 00  01  00 15 00 03  01 0C  00 09 00 0B 00 06 00 02 00 00 00 01 00 01 02 00 00
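
The header layout in the sample packets can be decoded with a short Python sketch. It rests on assumptions that should be checked against the TinyOS documentation: multi-byte fields are taken to be big-endian, and the P and C flags are taken to be the two most significant bits of the first byte. The class and function names are ours.

```python
import struct
from typing import NamedTuple


class CtpDataFrame(NamedTuple):
    pull: bool        # P flag
    congested: bool   # C flag
    thl: int
    etx: int
    origin: int
    seqno: int
    collect_id: int
    payload: bytes


# PCR (1 byte), THL (1), ETX (2), ORIGIN (2), SQ (1), CI (1)
HEADER = struct.Struct(">BBHHBB")


def parse_ctp_data_frame(frame: bytes) -> CtpDataFrame:
    """Split a CTP data frame into its header fields and payload."""
    if len(frame) < HEADER.size:
        raise ValueError("frame shorter than the 8-byte CTP data header")
    options, thl, etx, origin, seqno, collect_id = HEADER.unpack_from(frame)
    return CtpDataFrame(bool(options & 0x80), bool(options & 0x40),
                        thl, etx, origin, seqno, collect_id,
                        frame[HEADER.size:])
```

Applied to the first type 1 packet, this yields THL 1, ETX 0x0015, origin node 3, sequence number 2 and collect_id 0x0D, followed by a six-byte payload.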

Advantages (WSN)
The CTP and DIP protocols are particularly suited to mobile devices, which have strict requirements
in terms of energy saving.

Issues (WSN)
The CTP and DIP protocols are weak in terms of data transmission security, due to poor or null
authentication, cryptography and integrity support. WSN nodes could fall victim to several cyber
attacks. For example, Sinkhole and Sleep Deprivation attacks try to exploit the routing mechanisms
of the protocols. The main weakness is related to energy consumption, even in the case of topology
and functionality restoration.

3.4.4 Links with other data formats


Modbus, CTP and DIP have no links with other formats. Nevertheless, some of the data collected by
the RTU could feed an anomaly detection system. These metric messages could be forwarded in IPFIX
format. Moreover, the IPFIX format could be used for exporting user-defined metrics.


Chapter 4

Analysis and Conclusion

4.1 Analysis of alert and event languages


The list of event languages compiled in this deliverable covers those we consider the most important.
Several points stand out from this list:
Widespread use The number of event languages listed shows that there is no shortage of information
available to SIEM environments, even though the formats and access methods vary.
Simplicity There are several very successful yet very simple formats, such as syslog and CLF. These
formats are lightweight, and therefore easy to transport, sort, compress or process. They
are de facto industry standards and widely deployed. These formats have also inspired CEF, the
interface data format of the ArcSight SIEM tool.
Simplicity has been a justified target for many formats in the past, but we believe that it is reaching
its limits. All data analysis tools need to transform these simple unstructured data formats into
structured form, which often requires tradeoffs and interpretation based on insufficient information.
Also, this simplicity comes at the cost of often unreliable transport, absence of any proof of origin,
integrity or authenticity, and limitations in the amount and type of information transmitted (e.g.
syslog messages are seriously limited in size by today's standards). These constraints may be too
strong for software developers, who are tempted to create their own formats, convenient for
themselves but hard to exploit.
Also, recent work on historical data analysis and on on-the-fly data analysis, carried out in business
environments under the terms business intelligence or decision support systems, may provide modern
alternatives that enable more complex data structures.
Timestamping All events carry carefully managed time information. Recent formats introduce
time-management and synchronization requirements, whereas older formats rely on sound operational
practices to manage time externally.


The ability to accurately manage time will be a primary operational requirement for the MASSIF
platform.
Modularity Many of these data formats have some sort of hierarchical structure. The older formats may
have only one or two levels of indirection (e.g. the CLF format has two levels), whereas more recent
formats such as IDMEF and IODEF use a fairly complex class structure. We consider this trend to
be a corroborating example that simplicity is not enough.
Furthermore, several of the components defined in these formats are fairly similar. The notions of
address, machine, sensor and timestamp are quite similar, both in syntax and in semantics,
across formats. It will thus be important, when working on deliverable 3.2.2, to define these
components precisely and extensively, in order to reach consensus on both syntax and semantics.
The existence of these components justifies the choice of defining an ontology in deliverable 3.2.2,
and in addition to the format we might also need to define the major constants that are important
in a SIEM environment, such as localhost (127.0.0.1) or the IANA port assignments.
Also, the actual instances of these components are likely to be shared by many components of the
SIEM system. Thus, instead of including in an event all the information that qualifies it, it might be
more useful to simply provide common references to shared objects, this sharing happening either
in real time or during separate information synchronization sessions.
XML XML does not appear to be widely adopted in event languages. Thus, there will be a need in
the MASSIF project to reach consensus on the use (or not) of XML, and more specifically XML
schemas, to define event streams. Two of the major advantages of XML are the built-in syntactic
verification of messages (including typing with carefully specified schemas) and the ability to
project a base language into others using XSLT transformations.

4.2 Analysis of use case specific data streams


The Olympic Games scenario is a classic IT application. As such, the data formats described largely
match the ones described in chapter 2, with the addition of the Windows-specific WMI and the proprietary
LEA protocol from Check Point. As mentioned in the use case description, we can safely consider that
the basic common denominator will be syslog.
The Mobile Money Transfer Service scenario brings us to the higher layers of the stack. The components
described (user, bank, transaction, etc.) are of a higher level than the ones mentioned in section
4.1. However, even though the log mixes multiline text and XML-like syntax and is clearly a custom
development by the application integrator, it does share many of the characteristics already highlighted.
The Dam scenario uses sensors which are of a different nature than the others. It clearly highlights
security issues, particularly related to the way sensors will be able to communicate with the central
management platform, in our case the SIEM environment. Further studies are required to analyze how
they will integrate with the MASSIF environment, and to define the kind of data that will be received
and correlated.


Bibliography
[1] Don Box, Luis Felipe Cabrera, Craig Critchley, Francisco Curbera, Donald Ferguson, Steve
Graham, David Hull, Gopal Kakivaya, Amelia Lewis, Brad Lovering, Peter Niblett, David Orchard,
Shivajee Samdarshi, Jeffrey Schlimmer, Igor Sedukhin, John Shewchuk, Sanjiva Weerawarana, and
David Wortendyke. Web Services Eventing. W3C Member Submission, March 2006.
http://www.w3.org/Submission/WS-Eventing/.
[2] B. Claise. Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of
IP Traffic Flow Information. RFC 5101, January 2008. http://www.ietf.org/rfc/rfc5101.txt.
[3] D. Crocker and P. Overell. Augmented BNF for Syntax Specifications: ABNF. RFC 5234, January
2008. http://www.ietf.org/rfc/rfc5234.txt.
[4] R. Danyliw, J. Meijer, and Y. Demchenko. The Incident Object Description Exchange Format. RFC
5070, December 2007. http://www.ietf.org/rfc/rfc5070.txt.
[5] H. Debar, D. Curry, and B. Feinstein. Intrusion Detection Message Exchange Format. RFC 4765,
March 2007. http://www.ietf.org/rfc/rfc4765.txt.
[6] R. Gerhards. The Syslog Protocol. RFC 5424, March 2009. http://www.ietf.org/rfc/rfc5424.txt.
[7] Steve Graham, David Hull, and Bryan Murray. Web Services Base Notification 1.3. OASIS Standard,
October 2006. http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=wsn.
[8] M. St. Johns. Identification Protocol. RFC 1413, February 1993. http://www.ietf.org/rfc/rfc1413.txt.
[9] C. Lonvick. The BSD Syslog Protocol. RFC 3164, August 2001. http://www.ietf.org/rfc/rfc3164.txt.
[10] J. Quittek, T. Zseby, B. Claise, and S. Zander. Requirements for IP Flow Information Export (IPFIX).
RFC 3917, October 2004. http://www.ietf.org/rfc/rfc3917.txt.
[11] TCG Trusted Network Connect. TNC IF-MAP Binding for SOAP. Technical report, Trusted
Computing Group, 2010.
[12] TCG Trusted Network Connect. TNC IF-MAP Metadata for Network Security. Technical report, Trusted
Computing Group, 2010.
[13] M. Wood and M. Erlinger. Intrusion Detection Message Exchange Requirements. RFC 4766, March
2007. http://www.ietf.org/rfc/rfc4766.txt.
