NetGuardian Screener Admin Guide 7.3.1
Version 7.3.1
NetGuardians SA <info@netguardians.ch>
Table of Contents
1. NG|Screener Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2. NG|Screener Setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2.1. NG|Screener . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2.2. NG|ScreenerUI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2.3. NG|Messaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2.4. NG|Storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2.5. NG|Discover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2.6. Global . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4. NG|CaseManager. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.5. Syslog-NG. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3. NG|Connectors Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4. Licensing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
5.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Edit Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
LDAP Mappers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Password Hashing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
SP Descriptor. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
6. User Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
6.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Role creation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
User creation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
7.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
7.2. Multi-tenancy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
7.5.1. Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
7.6.1. Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
7.7.1. Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
7.8.1. Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
8. NG|Integrity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
8.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
8.2. Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
8.3. Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
8.3.1. Sample Configuration File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
8.3.2. Parameters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
9. Reference Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
9.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
9.2. Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
9.3. Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
event_tracking_handling_sample.py . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
event_tracking_monitoring_sample.py . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
Selecting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
Sorting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
Joining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
19.3.9. UI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
19.9.1. Problem #1: My cluster status is red or yellow. What should I do? . . . . . . . . 186
19.9.2. Problem #2: Help! Data nodes are running out of disk space. . . . . . . . . . . . . 187
19.10.2. Controls runs very slow with huge amount of data . . . . . . . . . . . . . . . . . . . . 190
/etc/ng-screener/daemon/modules/executor.conf . . . . . . . . . . . . . . . 199
/etc/ng-screener/common/ng-screener.conf . . . . . . . . . . . . . . . . . . . . . 199
/etc/ng-screener/common/referenceData.conf . . . . . . . . . . . . . . . . . . . 200
/etc/ng-screener/daemon/modules/{forensic,feeding}.conf . . . . 200
/etc/ng-screener/common/controlCommon.conf . . . . . . . . . . . . . . . . . . . 201
/etc/ng-screener/daemon/modules/control.conf. . . . . . . . . . . . . . . . . 202
/etc/ng-screener/common/security.conf . . . . . . . . . . . . . . . . . . . . . . . . . 203
/etc/syslog-ng-rules/syslog-ng.conf . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
D.1.4. Migrate LDAP configuration for users/roles (in each tenant) . . . . . . . . . . . . . . 212
D.1.7. Install new versions of NG|Screener and NG|CaseManager (and other required packages) . . . . . . . . . . . . 213
D.1.8. Configure the Apache reverse proxy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
D.1.9. Make sure the certificates are included in the keystore . . . . . . . . . . . . . . . . . . 218
D.1.10. Restart the Apache reverse proxy and the applications. . . . . . . . . . . . . . . . . . 218
D.2. LDAP user/role migration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
1.1. Overview
NG|Screener uses big data technology and predictive analytics to combine and standardize
data from across the entire banking system. Within the framework of your selected controls,
probes and connectors capture and analyze large volumes of data related to user activity
and transactions. By associating user behavior with core banking transactions, NG|Screener
flags activity that may indicate fraud. A user-friendly graphical interface provides a
consolidated “control tower” dashboard view that helps turn the data captured through
controls into actionable information. You have all the information you need to effectively
identify and investigate fraud, conduct forensic analysis and implement measures to prevent
future incidents. All captured audit trails and transactions are copied and stored
permanently. They cannot be corrupted or erased, which is critical for a successful
prosecution case.
NG|Screener supports the following audit trail collection mechanisms:
• Syslog
• SNMP Traps
• WMI Polling
• JDBC Polling
• Checkpoint OPSEC
• Flat files
• SAP Polling
• LDAP Polling
• Syslog agent
Once collected, audit trails are stored in their original format for long term conservation on
the file system.
The normalization process, which translates the original audit trail format into a unified
data model, is then executed on demand when a forensic analysis (a precise, operator-led
investigation) is needed.
Audit trail analysis is achieved using NG|ScreenerUI to drill down into the vast amount of
heterogeneous audit trails. Operational issues (intrusions, performance issues, internal
security threats, etc.) are detected via an intelligent collation and correlation of gathered
audit trails. Furthermore, the solution includes a report-generation tool that automates
regular controls and indicator generation.
NG|Console Tracking System is an SSH proxy that tracks the activities of all Unix server
administrators and sends its audit trails to the NG|Analytic Server for processing.
This chapter presents NG|Screener global architecture and its main components.
This layer is responsible for collecting audit trails. Audit trail collection is the essence of
the solution, thus NG|Screener allows many audit trail collection mechanisms to
integrate with as many systems as possible. Most collection mechanisms offered by
NG|Screener are non-intrusive and do not require installing additional agents on the
audit trail source system.
1. Passive collection is the easiest way to collect audit trails as it relies on standard
protocols. NG|Screener directly receives audit trails from their sources and does not
need to access the sources of the audit trails.
2. Active collection (polling) is an alternative way to retrieve audit trails when the
source system does not support any of the standard protocols presented above. In
such a situation, NG|Screener will regularly poll the source device to obtain the
latest audit trails available. Generally, read access needs to be configured on the
audit trail source to allow NG|Screener to gather audit trails. NG|Screener supports
the following active collection mechanisms: WMI Polling (Microsoft), Database
Polling, T24 Polling, Checkpoint OPSEC, SAP Polling, LDAP Polling, and flat file reading.
3. Agent-based collection is used in cases where neither passive collection nor active
collection is applicable; an agent is installed on the source device to gather local
audit trails and forward them to NG|Screener.
• Raw Data Storage and Dispatching Layer
Once collected, audit trails are passed to syslog-ng to be processed and indexed in
NG|Storage; simultaneously, they are stored on the file system for archive and recovery
purposes. Note that, for compliance reasons, NG|Screener stores audit trails in their
original format.
Chapter 1. NG|Screener Overview | 3
NG|Screener Administration Guide
Audit trails are stored on the file system under the /log-collector directory and are
organized in a tree as described in Figure [log_storage_organization].
Once audit trail collection and storage are complete, the remaining task is the
interpretation of heterogeneous audit trails for analysis. Standardization of collected
audit trails is achieved using a normalization process that translates each proprietary
audit trail format into a unified data model.
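As an illustration, the normalization step can be pictured as a field mapping from a proprietary record onto a common model. The following sketch is purely hypothetical: the field names do not reflect the actual unified data model.

```python
# Hypothetical raw audit record from a source system
raw = {"usr": "jdoe", "act": "LOGIN", "ts": "2023-04-01 13:45:00"}

# Mapping onto a (purely illustrative) unified data model
unified = {
    "user": raw["usr"],        # who performed the action
    "action": raw["act"],      # what was done
    "timestamp": raw["ts"],    # when it happened
}

print(unified["user"])  # jdoe
```

In the real product this translation is driven by the connector's parsing rules, not by hand-written mappings like the one above.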
As shown in the previous diagram, there are four data flows inside NG|Screener.
This flow normalizes and enriches raw events from NG|Messaging and stores them in
NG|Storage. Those normalized events are ready for forensic investigation or control
execution.
This flow is followed when a control is executed. It first fetches data from NG|Storage
and stores the result in the Thrift server. The results are then used to generate reports,
which are stored in the database. In addition, the report may be published to an external
channel if needed.
The reference data flow reads information from external data sources (SQL, CSV, etc.) and
stores it in NG|Storage, where it can then be used to enrich normalized events during the
normalization process.
The realtime analysis flow computes the number of events fetched for each host/service
and raises an alert if it is not in the pre-configured range.
There is a notion of different data windows in NG|Daemon. The following section gives a
detailed description of all of them.
This window defines the period of data that is normalized and kept in NG|Storage. The
Daemon uses a dedicated thread to clean data outside of this window at midnight. This
window is present to ensure that a controlled amount of data is stored in NG|Storage to
preserve disk space.
To load data outside of the storage window for investigation, the Custom processing job
can be used. This functionality is reachable from the Admin / Processing menu in the UI.
The job can load data for specific services/hosts and the desired period.
If the job is requested to load data which already exists in NG|Storage, the existing data
is overwritten; this ensures there is no duplicated data in NG|Storage. All data outside of
the storage window is automatically removed every night at midnight.
This window defines the maximum period of data that a user can analyze in the UI. It is
configurable per forensic view (e.g. violations, transaction) to limit the maximum time
period a user can choose on each view.
Its purpose is to achieve reasonable response times and user experience of the forensic
views.
When creating such a file for a new tenant, one has to make sure it is
readable by the ng-screener user.
To save space in storage, log files under /log-collector are compressed periodically.
The logs archiving window defines the period in which logs are left uncompressed.
Normally, this window is set so that all logs arrive in the system in this period. This
prevents multiple archived files from the same day. By default, this window is set to 2 days,
which means that today’s and yesterday’s logs are not archived.
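Purely as an illustration of the default 2-day window, the set of days whose logs remain uncompressed can be computed like this (the function is hypothetical, not part of the product):

```python
from datetime import date, timedelta

def uncompressed_days(today: date, window: int) -> list:
    """Days whose logs are left uncompressed under /log-collector,
    given an archiving window of `window` days."""
    return [today - timedelta(days=i) for i in range(window)]

# With the default window of 2 days, today and yesterday stay uncompressed
print(uncompressed_days(date(2023, 4, 10), 2))
# [datetime.date(2023, 4, 10), datetime.date(2023, 4, 9)]
```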
When the daemon is installed, it sets up a cron job to maintain this window every day. The cron job
is located in /etc/cron.d/logrotate. The maintenance script itself is located in
/usr/local/ng-screener/daemon/script/logrotate.sh.
This window defines the period for which logs are kept in /log-collector. All logs
outside of this period are removed automatically to preserve storage space.
2.1. Introduction
2.2.1. NG|Screener
By default, the memory allocated for NG|Screener is equal to 15% of the available machine
memory minus 400MB.
The minimum memory necessary for NG|Screener is 512MB. That setting is located in
/usr/local/ng-screener/tools/packaging/generate-daemon-systemd-env.
After changing its value, restart the ng-screener service to apply the change.
2.2.2. NG|ScreenerUI
By default, the memory allocated for NG|ScreenerUI is 10% of the available machine
memory minus 400MB.
The minimum memory necessary for NG|ScreenerUI is 400MB. That setting is located in
/usr/local/ng-screener/ui/tools/generate-ui-systemd-env. After changing
its value, restart the ng-screener-ui service to apply the change.
2.2.3. NG|Messaging
By default, the memory allocated for NG|Messaging is 1GB. To change this value, edit the
KAFKA_HEAP_OPTS parameter in
/usr/local/ng-screener/ngmessaging/bin/kafka-server-start.sh. Restart the ng-messaging
service to apply the change.
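For example, to raise the NG|Messaging heap to 2 GB, the relevant line in kafka-server-start.sh would look roughly as follows (the value is illustrative; keep it within your machine's available memory):

```sh
# /usr/local/ng-screener/ngmessaging/bin/kafka-server-start.sh
export KAFKA_HEAP_OPTS="-Xmx2G -Xms2G"
```

Then restart the ng-messaging service for the new heap size to take effect.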
2.2.4. NG|Storage
By default, the memory allocated for NG|Storage is 50% of the available machine memory
minus 400MB.
The minimum memory allocation for NG|Storage is 1GB and the maximum value is 30GB.
Those settings are located in
/usr/local/ng-screener/ngstorage/bin/generate-ngstorage-systemd-env. After changing
their values, restart the ng-storage service to apply the changes.
2.2.5. NG|Discover
2.2.6. Global
Most of the above settings (and others) can be configured globally in
/etc/ng-screener/global.env. All settings placed in that file will take precedence over
any default values. Refer to that file for available settings.
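A sketch of such an override is shown below. The variable name is hypothetical; consult global.env itself for the settings your installation actually supports.

```sh
# /etc/ng-screener/global.env
# Hypothetical example: pin the daemon heap instead of the computed default.
NG_DAEMON_MEMORY=1024m
```

After editing the file, restart the affected services for the overrides to take effect.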
In the ng-screener.conf configuration file, one can adjust a few display switches that
influence the links shown in the UI (all default to true):
2.4. NG|CaseManager
NG|CaseManager is usually installed on the same machine as NG|Screener, but this is not
required. If it is installed elsewhere, the custom description link 'investigate hit' (the
main relation from NG|CaseManager to NG|Screener) will not work out of the box, and the
Apache configuration needs to be modified.
This will redirect all requests from the CM to the right server. If you have only
NG|CaseManager on this server, the configuration will look like the following:
<VirtualHost *:443>
SSLCipherSuite EECDH+AESGCM:EDH+AESGCM:AES256+EECDH:AES256+EDH
SSLCertificateFile /etc/httpd/conf.d/netguardians.crt
#SSLCertificateKeyFile /etc/httpd/conf.d/server.key
RewriteEngine On
ProxyPreserveHost On
# Proxy to caseManager
ProxyPass /cm/ http://127.0.0.1:3000/cm/
ProxyPassReverse /cm/ http://127.0.0.1:3000/cm/
</VirtualHost>
2.5. Syslog-NG
Syslog-NG has a maximum connection limit for TCP, with the default set to 10. If many
sources post events to Syslog-NG, this limit might need to be increased. This can be done
in /etc/syslog-ng-rules/syslog-ng.conf as follows:
# Standard syslog
source s_collector {
tcp(ip(0.0.0.0) port(514) encoding("iso-8859-1") flags(no-multi-line)
max_connections(50));
udp(ip(0.0.0.0) port(514) encoding("iso-8859-1") flags(no-multi-line));
tcp(ip(0.0.0.0) port(63514) encoding("utf-8") flags(no-multi-line)
max_connections(50));
udp(ip(0.0.0.0) port(63514) encoding("utf-8") flags(no-multi-line));
};
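After raising the limit and restarting syslog-ng, reception can be checked by sending a test event over TCP from a client machine, assuming the util-linux logger tool is available there (replace the placeholder with your collector's address):

```sh
logger --server <collector-host> --port 514 --tcp "NG|Screener connectivity test"
```

The test message should then appear under /log-collector on the NG|Screener server.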
3.1. Introduction
A connector is a component that allows NG|Screener to collect and analyze audit trails
from various sources (e.g. core banking applications, firewall devices, Windows AD, custom
applications, etc.).
A connector includes:
• Audit trail collection mechanisms and configurations adapted for the source (syslog,
agent, polling, etc.)
• Translation dictionary to transform audit trails to the Business Data Model
• Pre-configured controls, packaged in the corresponding Solutions
A connector has to be installed for every type of audit trail that is expected to be
ingested by the NG|Analytics Server.
• username & password: credentials to log in to the Update Center. Each client has unique
credentials for authentication. These credentials are provided by the vendor.
After modifying the configuration file, ng-screener and ng-screener-ui services need
to be restarted for the changes to take effect.
The connectors currently installed on the system can be listed in one of the following
ways:
Do the following steps to see all the connectors installed on the server:
The selected packages will be installed in the background. The result of the operation will
be presented to the user as a notification.
A new/updated connector can be provided by the vendor in the form of an RPM file. Run the
following steps to apply it manually:
A new/updated solution can be provided by the vendor in the form of a zip file. Run the
following steps to apply it manually:
• serviceName
This is the name of the service used in the business data model.
• serviceDescription
• indexPattern
Defines the indexPattern used to store the event in ngStorage. It can be one of ngt, ngc,
ngi or ngv.
• indexGranularity
This setting indicates when to create a new ngStorage index for this kind of event. It can
be one of day, month or year. The default value is month.
• dateFormat
This setting indicates the format to use for date parsing during the normalization phase.
It follows the Java patterns as defined in the documentation of the
DateTimeFormatter class. Its default value is yyyy-MM-dd HH:mm:ss.
• numberFormatThousandsSeparator
This setting indicates the character to consider for thousands separators when parsing
numbers. Its default value is a comma (,). Possible values are:
• numberFormatDecimalSeparator
This setting indicates the character to consider for decimal separator when parsing
numbers (i.e. the character that separates the integer part from the non-integer part in
a number). Its default value is a dot (.). Possible values are:
Two possible values:
• syslogNgUseReceivedTime
• syslogService_xx
For example, a service configuration might look as follows:
serviceName = temenosT24Transaction
serviceDescription = Temenos T24 Audit Trails
parsingRule = temenosT24Transaction.rules
indexPattern = ngt
indexGranularity = month
syslogNgSource = s_collector
syslogNgUseReceivedTime = false
syslogService_1 = temenosT24Transaction
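To illustrate two of the parsing defaults above, here is a small sketch in Python; the strptime format %Y-%m-%d %H:%M:%S is the equivalent of the Java pattern yyyy-MM-dd HH:mm:ss used as the dateFormat default, and the number-parsing helper is hypothetical, written only to show the default separators in action:

```python
from datetime import datetime

# Default dateFormat: Java "yyyy-MM-dd HH:mm:ss" == strptime "%Y-%m-%d %H:%M:%S"
ts = datetime.strptime("2023-04-01 13:45:00", "%Y-%m-%d %H:%M:%S")
print(ts.isoformat())  # 2023-04-01T13:45:00

# Default separators: comma for thousands, dot for decimals
def parse_number(raw: str, thousands: str = ",") -> float:
    """Strip thousands separators, keep the decimal dot (illustrative only)."""
    return float(raw.replace(thousands, ""))

print(parse_number("1,234.56"))  # 1234.56
```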
This section explains how to generate, activate, update or obtain information about your
license.
A C2V (client-to-vendor) file is a fingerprint of the server on which NG|Screener is
installed. This file allows NetGuardians or a NetGuardians reseller to create a license. A
V2C (vendor-to-client) file is the license created specifically for your server system.
To show information about NG|Screener license, click the icon at the top right of the
screen, then select the second tab. The side panel shows system information (Figure
License Information).
◦ PROVISIONAL: a trial license, it is valid for a time-limited period. After the trial
period, its state changes to EXPIRED. You need to activate the license before its trial
period expires.
◦ ACTIVATED: the license is activated, it is valid for a time-limited period. After that
period, its state changes to EXPIRED. You need to update the license before it
expires.
◦ EXPIRED: the license is expired, you can only use a limited functionality of
NG|Screener. You need to contact the vendor to activate/update the license.
• License Features: lists which connectors may be used. The license may specify the
number of connectors allowed, or it may contain a list of specific connectors. In the
former case, the application can install any connectors as long as their number does not
exceed the number specified in the license feature. In the latter case, you can only
install connectors specified in the list. In addition, the dialog also shows the state of
each license feature and its expiration date (in UTC). The license feature state may be:
◦ AVAILABLE: the feature is available
◦ EXPIRED: the feature is expired, you need to contact the vendor to update the license
Note that if the server detects that something is wrong with the license (e.g. system
clock tampering), it may lock the license, and the license features may be reported as
expired even though the expiration date has not passed. In that case, please contact the
vendor for support.
A C2V file is used by the vendor to generate the license. To generate a C2V file, follow
these steps:
Ask the vendor for an activation license (V2C file) to activate the product. The vendor may
ask you for a C2V file (Section Generate C2V file). After getting the activation file,
follow these steps:
Ask the vendor for an update license (V2C file) to update the license, then follow these
steps:
1. Go to Admin / Licensing menu, a new page appears to manage license (Figure Update
License)
2. Click on the Choose File button. A dialog box appears asking you to choose the
location of the update license file (V2C file).
3. Specify the update license file, then click the Open button. The file name is displayed
in the text box beside the Choose File button.
4. Click the Update button to update the license. It may take some minutes to finish the
process.
5. Refresh the web browser to apply the changes.
Chapter 5. Authentication with ngAuth
5.1. Introduction
All authentication in ngScreener is managed by the ngAuth service. This service is a
repackaging of the open source project Keycloak (https://www.keycloak.org). This chapter
explains the basic operations that an ngScreener admin should know. For more advanced
configuration settings, please refer directly to Keycloak's documentation at
https://www.keycloak.org/docs/.
To access the admin console, connect to https://myhost/auth/admin and use the superadmin
user that is created at installation. The default password is netguardians; it should be
changed.
In the Keycloak nomenclature, a Realm corresponds to a Tenant on the ngScreener side. The
easiest way to add a new Realm is to use the script
/usr/local/ng-screener/tools/multi-tenancy/createTenant.py.
A role in ngAuth is only a name. Functionalities are defined in each business application
(ngBrowser or ngCaseManager).
Realm-level roles are a global namespace to define your roles. You can see the list of
built-in and created roles by clicking the Roles left menu item.
To create a role, click Add Role on this page, enter in the name and description of the role,
and click Save.
To create a user, after selecting the right tenant, click on Users in the left menu bar.
Figure 3. Users
This menu option brings you to the user list page. On the right side of the empty user list,
you should see an Add User button. Click that to start creating your new user.
The only required field is Username. Click Save. This will bring you to the management
page for your new user.
User role mappings can be assigned individually to each user through the Role Mappings
tab for that single user.
In the above example, we are about to assign the role NG_Admin that was created in the
Create new Roles chapter.
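Because ngAuth is repackaged Keycloak, the role and user operations described above can usually also be scripted with Keycloak's kcadm.sh admin CLI, if it is present in your installation. The server URL, realm name, user names and role names below are all illustrative:

```sh
# Authenticate the admin CLI (adjust URL and credentials to your setup)
kcadm.sh config credentials --server https://myhost/auth --realm master --user superadmin

# Create a realm-level role in the tenant's realm
kcadm.sh create roles -r mytenant -s name=NG_Admin -s 'description=ngScreener admin role'

# Create a user and assign the role to it
kcadm.sh create users -r mytenant -s username=jdoe -s enabled=true
kcadm.sh add-roles -r mytenant --uusername jdoe --rolename NG_Admin
```

This is convenient for provisioning several tenants consistently; the admin console steps above remain the supported path described in this guide.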
Many companies have existing user databases that hold information about users and their
credentials.
The way it works is that when a user logs in, ngAuth will look into its own internal user
store to find the user. If it can’t find it there it will iterate over every User Storage provider
you have configured for the realm until it finds a match. Data from the external store is
mapped into a common user model that is consumed by the ngAuth runtime. This common
user model can then be mapped to OIDC token claims and SAML assertion attributes.
External user databases rarely have every piece of data needed to support all the features
that ngAuth has. In this case, the User Storage Provider can opt to store some things locally
in the ngAuth user store. Some providers even import the user locally and sync periodically
with the external store. All this depends on the capabilities of the provider and how it’s
configured. For example, your external user store may not support OTP. Depending on the
provider, this OTP can be handled and stored by ngAuth.
To add a storage provider go to the User Federation left menu item in the Admin
Console.
On the center, there is an Add Provider list box. Choose the provider type you want to
add and you will be brought to the configuration page of that provider.
If a User Storage Provider fails, that is, if your LDAP server is down, you may have trouble
logging in and may not be able to view users in the admin console. ngAuth does not catch
failures when using a Storage Provider to look up a user; it will abort the invocation. So, if
you have a Storage Provider with a higher priority that fails during user lookup, the login or
user query will fail entirely with an exception and abort. It will not fail over to the next
configured provider.
The local ngAuth user database is always searched first to resolve users before any LDAP
or custom User Storage Provider. You may want to consider creating an admin account that
is stored in the local ngAuth user database just in case any problems come up in
connecting to your LDAP and custom back ends.
Each LDAP and custom User Storage Provider has an enable switch on its admin console
page. Disabling the User Storage Provider will skip the provider when doing user queries so
that you can view and login with users that might be stored in a different provider with
lower priority. If your provider is using an import strategy and you disable it, imported
users are still available for lookup, but only in read only mode. You will not be able to
modify these users until you re-enable the provider.
ngAuth comes with a built-in LDAP/AD provider. It is possible to federate multiple different
LDAP servers in the same ngAuth realm. You can map LDAP user attributes into the ngAuth
common user model. By default, it maps username, email, first name, and last name, but
you are free to configure additional mappings. The LDAP provider also supports password
validation via LDAP/AD protocols and different storage, edit, and synchronization modes.
To configure a federated LDAP store go to the Admin Console. Click on the User
Federation left menu option. When you get to this page there is an Add Provider
select box. You should see ldap within this list. Selecting ldap will bring you to the LDAP
configuration page.
Storage Mode
By default, ngAuth will import users from LDAP into the local ngAuth user database. This
copy of the user is either synchronized on demand, or through a periodic background task.
The one exception to this is passwords. Passwords are not imported and password
validation is delegated to the LDAP server. The benefit of this approach is that all ngAuth
features will work, as any extra per-user data that is needed can be stored locally. This
approach also reduces load on the LDAP server as uncached users are loaded from the
ngAuth database the 2nd time they are accessed. The only load your LDAP server will have
is password validation. The downside to this approach is that when a user is first queried,
this will require a ngAuth database insert. The import will also have to be synchronized with
your LDAP server as needed.
Alternatively, you can choose not to import users into the ngAuth user database. In this
case, the common user model that the ngAuth runtime uses is backed only by the LDAP
server. This means that if LDAP does not support a piece of data that an ngAuth feature
needs, that feature will not work. The benefit of this approach is that you avoid the
overhead of importing and synchronizing a copy of each LDAP user into the ngAuth user
database.
This storage mode is controlled by the Import Users switch. Set it to On to import users.
Edit Mode
Users, through the User Account Service, and admins through the Admin Console have the
ability to modify user metadata. Depending on your setup you may or may not have LDAP
update privileges. The Edit Mode configuration option defines the edit policy you have
with your LDAP store.
WRITABLE
Username, email, first name, last name, and other mapped attributes and passwords
can all be updated and will be synchronized automatically with your LDAP store.
UNSYNCED
Any changes to username, email, first name, last name, and passwords will be stored
in ngAuth local storage. It is up to you to figure out how to synchronize back to LDAP.
This allows ngAuth deployments to support updates of user metadata on a read-only
LDAP server. This option only applies when you are importing users from LDAP into
the local ngAuth user database.
Priority
The priority of this provider when looking up users or adding a user.
Sync Registrations
Enable this switch if your LDAP store supports adding new users and you want users
created by ngAuth (in the Admin Console or on the registration page) to be added to LDAP.
Other options
The rest of the configuration options should be self-explanatory. You can hover over
the tooltips in the Admin Console to see more details about them.
When you configure a secured connection URL to your LDAP store (for example
ldaps://myhost.com:636), ngAuth will use SSL for the communication with the LDAP
server. It is important to properly configure a truststore on the ngAuth server side;
otherwise ngAuth cannot trust the SSL connection to LDAP.
The global truststore for ngAuth can be configured with the Truststore SPI. Please
check out the {installguide_name} for more detail. If you do not configure the Truststore
SPI, the truststore will fall back to the default mechanism provided by Java (either the file
provided by the system property javax.net.ssl.trustStore, or the cacerts file from
the JDK if the system property is not set).
There is a configuration property Use Truststore SPI in the LDAP federation provider
configuration, where you can choose whether the Truststore SPI is used. By default, the
value is Only for ldaps, which is fine for most deployments. The Truststore SPI will
only be used if the connection to LDAP starts with ldaps.
If you have import enabled, the LDAP Provider will automatically take care of
synchronization (import) of needed LDAP users into the ngAuth local database. As users log
in, the LDAP provider will import the LDAP user into the ngAuth database and then
authenticate against the LDAP password. This is the only time users will be imported. If you
go to the Users left menu item in the Admin Console and click the View all users
button, you will only see those LDAP users that have been authenticated at least once by
ngAuth. It is implemented this way so that admins don’t accidentally try to import a huge
LDAP DB of users.
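The import-on-first-login behavior described above can be sketched as a toy model. This is not ngAuth's implementation: the function name, the dictionary shapes, and the plain-text password comparison are illustrative assumptions standing in for real LDAP bind semantics.

```python
# Illustrative sketch (not ngAuth internals): users are copied into the
# local database only the first time they authenticate, which is why
# "View all users" shows only LDAP users that logged in at least once.

def authenticate(username, password, ldap, local_db):
    """ldap: dict username -> {'attrs': ..., 'password': ...} (toy stand-in).
    local_db: dict acting as the ngAuth user database."""
    entry = ldap.get(username)
    if entry is None:
        return False
    if username not in local_db:
        # First login: import the user's attributes, but never the password.
        local_db[username] = dict(entry["attrs"])
    # Password validation is always delegated to the LDAP server.
    return password == entry["password"]
```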
If you want to sync all LDAP users into the ngAuth database, you may configure and enable
the Sync Settings of the LDAP provider you configured. There are two types of
synchronization:
The best way to handle syncing is to click the Synchronize all users button when you
first create the LDAP provider, then set up a periodic sync of changed users. The
configuration page for your LDAP Provider has several options to support you.
LDAP Mappers
LDAP mappers are listeners that are triggered by the LDAP Provider at various
points and provide another extension point to LDAP integration. They are triggered when a
user logs in via LDAP and needs to be imported, during ngAuth-initiated registration, or
when a user is queried from the Admin Console. When you create an LDAP Federation
provider, ngAuth will automatically provide a set of built-in mappers for this provider. You
are free to change this set, create new mappers, or update/delete existing ones.
FullName Mapper
This allows you to specify that the full name of the user, which is saved in some LDAP
attribute (usually cn), will be mapped to the firstName and lastName attributes in the
ngAuth database. Having cn contain the full name of the user is a common case for some
LDAP deployments.
Role Mapper
This allows you to configure role mappings from LDAP into ngAuth role mappings.
One Role mapper can be used to map LDAP roles (usually groups from a particular
branch of LDAP tree) into roles corresponding to either realm roles or client roles of
a specified client. You can configure multiple Role mappers for the same
LDAP provider. For example, you can specify that role mappings from groups under
ou=main,dc=example,dc=org will be mapped to realm role mappings and role
mappings from groups under ou=finance,dc=example,dc=org will be mapped to
client role mappings of client finance .
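The two-branch example above can be sketched as follows. This is a toy illustration, not ngAuth's mapper logic: the DN parsing is deliberately simplified and the function name is an assumption for the example.

```python
# Illustrative sketch of the example above: two Role mappers on the same
# LDAP provider, one turning groups under ou=main,dc=example,dc=org into
# realm roles and one turning groups under ou=finance,dc=example,dc=org
# into client roles of the client "finance".

def map_group_dn(group_dn):
    """Return (scope, role_name) for a toy group DN like
    'cn=admins,ou=main,dc=example,dc=org'. Parsing is simplified."""
    parts = group_dn.split(",")
    role = parts[0].split("=", 1)[1]   # cn=<role name>
    branch = ",".join(parts[1:])       # the remainder of the DN
    if branch == "ou=main,dc=example,dc=org":
        return ("realm", role)
    if branch == "ou=finance,dc=example,dc=org":
        return ("client:finance", role)
    return (None, role)                # no mapper matches this branch
```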
Group Mapper
This allows you to configure group mappings from LDAP into ngAuth group mappings.
Group mapper can be used to map LDAP groups from a particular branch of an LDAP
tree into groups in ngAuth. It will also propagate user-group mappings from LDAP
into user-group mappings in ngAuth.
By default, there are User Attribute mappers that map basic ngAuth user attributes like
username, firstname, lastname, and email to corresponding LDAP attributes. You are free
to extend these and provide additional attribute mappings. The Admin Console provides
tooltips with more details.
Password Hashing
When the password of a user is updated from ngAuth and sent to LDAP, it is always sent in
plain text. This is different from updating the password in the built-in ngAuth database,
where hashing and salting are applied to the password before it is stored. In the case of
LDAP, ngAuth relies on the LDAP server to provide hashing and salting of passwords.
Most LDAP servers (Microsoft Active Directory, RHDS, FreeIPA) provide this by default.
Some others (OpenLDAP, ApacheDS) may store passwords in plain text by default, and
you may need to explicitly enable password hashing for them. See the documentation of
your LDAP server for more details.
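To illustrate what "hashing and salting" means for locally stored passwords, here is a minimal sketch. PBKDF2-SHA256 is chosen only for the illustration; it is an assumption, not necessarily the algorithm ngAuth's local database actually uses.

```python
import hashlib
import os

# Illustration of salted password hashing as applied before a password is
# stored in a local database. The algorithm and iteration count here are
# assumptions for the sketch, not ngAuth's documented settings.

def hash_password(password, salt=None, iterations=27500):
    salt = salt if salt is not None else os.urandom(16)  # random per-user salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, digest

def verify_password(password, salt, digest, iterations=27500):
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations) == digest
```

Because each password gets its own random salt, two users with the same password still end up with different digests in the database.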
Click on the "User Federation" left menu to access the Federation part, then choose ldap to
create a new connection.
After clicking on Save, you reach the LDAP administration page. Now it is time to configure
some mappers to map data from LDAP to the ngAuth data model. The most important one
for us is the Role mapper, which makes it possible to map roles from the LDAP server to
LDAP users.
An Identity Broker is an intermediary service that connects multiple service providers with
different identity providers. As an intermediary service, the identity broker is responsible
for creating a trust relationship with an external identity provider in order to use its
identities to access internal services exposed by service providers.
From a user perspective, an identity broker provides a user-centric and centralized way to
manage identities across different security domains or realms. An existing account can be
linked with one or more identities from different identity providers or even created based
on the identity information obtained from them.
An identity provider is usually based on a specific protocol that is used to authenticate and
communicate authentication and authorization information to their users. It can be a social
provider such as Facebook, Google or Twitter. It can be a business partner whose users
need to access your services. Or it can be a cloud-based identity service that you want to
integrate with.
• SAML v2.0
• OpenID Connect v1.0
In the next sections we’ll see how to configure and use ngAuth as an identity broker,
covering some important aspects such as:
When using ngAuth as an identity broker, users are not forced to provide their credentials
in order to authenticate in a specific realm. Instead, they are presented with a list of identity
providers from which they can authenticate.
You can also configure a default broker. In this case the user will not be given a choice, but
instead be redirected directly to the parent broker.
The following diagram demonstrates the steps involved when using ngAuth to broker an
external identity provider:
There are some variations of this flow that we will talk about later. For instance, instead of
presenting a list of identity providers, the client application can request a specific one. Or
you can tell ngAuth to force the user to provide additional information before federating his
identity.
As you may notice, at the end of the authentication process ngAuth will always issue its own
token to client applications. What this means is that client applications are completely
decoupled from external identity providers. They don’t need to know which protocol (e.g.
SAML, OpenID Connect, OAuth) was used or how the user’s identity was validated. They
only need to know about ngAuth.
It is possible to automatically redirect to an identity provider instead of displaying the login
form. To enable this, go to Authentication and select the Browser flow. Then click on Config
for the Identity Provider Redirector authenticator. Set Default Identity
Provider to the alias of the identity provider you want to automatically redirect users to.
If the configured default identity provider is not found the login form will be displayed
instead.
ngAuth can broker identity providers based on the OpenID Connect protocol. These IDPs
must support the Authorization Code Flow as defined by the specification in order to
authenticate the user and authorize access.
To begin configuring an OIDC provider, go to the Identity Providers left menu item
and select OpenID Connect v1.0 from the Add provider drop down list. This will
bring you to the Add identity provider page.
You must define the OpenID Connect configuration options as well. They basically describe
the OIDC IDP you are communicating with.
You can also import all this configuration data by providing a URL or file that points to
OpenID Provider Metadata (see OIDC Discovery specification). If you are connecting to a
ngAuth external IDP, you can import the IDP settings from the url
<root>/auth/realms/{realm-name}/.well-known/openid-configuration.
This link is a JSON document describing metadata about the IDP.
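Constructing the discovery URL from a root URL and realm name can be sketched as follows; the helper name is an assumption, and the URL pattern is the one given above.

```python
# Build the OIDC discovery (well-known) URL for an external ngAuth IDP
# from its root URL and realm name, following the pattern shown above.

def discovery_url(root, realm):
    return "{}/auth/realms/{}/.well-known/openid-configuration".format(
        root.rstrip("/"), realm)
```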
ngAuth can broker identity providers based on the SAML v2.0 protocol.
To begin configuring a SAML v2.0 provider, go to the Identity Providers left menu
item and select SAML v2.0 from the Add provider drop down list. This will bring you to
the Add identity provider page.
You must define the SAML configuration options as well. They basically describe the SAML
IDP you are communicating with.
You can also import all this configuration data by providing a URL or XML file that points to
the entity descriptor of the external SAML IDP you want to connect to.
SP Descriptor
Once you create a SAML provider, there is an EXPORT button that appears when viewing
that provider. Clicking this button will export a SAML SP entity descriptor which you can use
to import into the external IDP.
http[s]://{host:port}/auth/realms/{realm-name}/broker/{broker-alias}/endpoint/descriptor
6.1. Introduction
The User Management module is used to handle user authorization in the system (user
authentication is managed by the NG|Auth module) through user roles.
NG|Screener is always installed in multi-tenant mode, which enables each and every login
to be contextual to one tenant (i.e. one of the hosted banks or a specific internal bank unit)
and, as such, isolated from the other tenants.
#--------------------------------------------------------------------
# Multi-Tenancy
# List of tenants, must be in Upper case. Must not be empty.
# Example: multiTenancy.tenants = TENANT1,TENANT2,TENANT3
multiTenancy.tenants = DEFAULT
Each of the tenants defined through the multiTenancy.tenants property should have a
corresponding realm in NG|Auth.
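The constraints stated in the configuration comment (upper case, non-empty, comma-separated) can be sketched as a small validation helper; the function name is an assumption for the example.

```python
# Sketch of how the multiTenancy.tenants property value can be parsed and
# validated: a non-empty, comma-separated list of upper-case tenant names.

def parse_tenants(value):
    tenants = [t.strip() for t in value.split(",") if t.strip()]
    if not tenants:
        raise ValueError("multiTenancy.tenants must not be empty")
    for t in tenants:
        if t != t.upper():
            raise ValueError("tenant names must be upper case: " + t)
    return tenants
```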
In addition to pure authentication management (i.e. checking that a user is who she claims
to be), NG|Auth also associates so-called roles with each user. Roles are only plain names
at this level. The default installation creates only one role, NG_Admin.
In NG|Screener, this mapping may be customized through the UI: new applicative roles may
be defined there (their names must correspond to role names configured in NG|Auth), and
the corresponding functionalities associated with them.
The script createRoleKeycloak.py must be used to add roles inside
NG|Auth. The parameters are the following:
User creation
The script createUserKeycloak.py must be used to add users with roles inside
NG|Auth. The parameters are the following:
7.1. Introduction
NG|Admin is the preferred tool to perform operations through the command line instead of
using NG|ScreenerUI, where only a small subset of administration operations is available.
NG|Admin can be used locally or remotely. For example you can add, delete, extract and list
the controls or channels/targets to/from the server.
It can be reached using the ngadmin command line tool directly with the command name
and parameters; this allows you to issue commands quickly without the need to specify
credentials, since those are found in two files located in the current user’s home directory:
Its first parameter must be the NG|Screener user to use for login, whereas
the password can be supplied by one of the following means:
$ generate_ngadmin_credentials.sh MyUser
Please enter MyUser's password:
Once the script has gathered both user name and password, it will generate
the corresponding files for the current user (i.e. the user running the
script).
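The two credential files the wrapper relies on (named in the multi-tenancy section below as ~/.ngadmin/ngadminUser and ~/.ngadmin/ngadminPassword) can be sketched as follows. This is illustrative only: whether the real script encodes the password before writing it is not covered here, and the helper names are assumptions.

```python
import os

# Sketch of the credential files used by ngadmin: the user name is read
# from ~/.ngadmin/ngadminUser and the password from
# ~/.ngadmin/ngadminPassword. Real files may store an encoded password.

def write_credentials(home, user, password):
    cred_dir = os.path.join(home, ".ngadmin")
    os.makedirs(cred_dir, exist_ok=True)
    with open(os.path.join(cred_dir, "ngadminUser"), "w") as f:
        f.write(user + "\n")
    with open(os.path.join(cred_dir, "ngadminPassword"), "w") as f:
        f.write(password + "\n")

def read_credentials(home):
    cred_dir = os.path.join(home, ".ngadmin")
    with open(os.path.join(cred_dir, "ngadminUser")) as f:
        user = f.read().strip()
    with open(os.path.join(cred_dir, "ngadminPassword")) as f:
        password = f.read().strip()
    return user, password
```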
7.2. Multi-tenancy
There is a tenant parameter on the wrapper script’s command line (the user is, as usual,
taken from the ~/.ngadmin/ngadminUser, password from
~/.ngadmin/ngadminPassword file)
NAME DESCRIPTION
aggregator_exportProfilingAggregations Export profiling aggregations
aggregator_importProfilingAggregations Import profiling aggregations
aggregator_renameProfilingAggregation Rename profiling aggregations
aggregator_recomputeAggregations Recompute profiling aggregations
aggregator_recomputePeerGroups Recompute profiling peer groups
control_addClassification Add classification
control_addControls Add the controls
control_addOrUpdateControls Add or update controls
control_addTargets Add the report targets
control_delControls Delete the controls
control_delTargets Delete the report targets
control_exportSolutionsDoc Export Solutions document
control_exportReports Run controls and export their output (report)
as JasperPrint, PDF, and a PNG thumbnail of
the first page
control_extractControls Extract controls
control_extractTargets Extract the report targets
control_listControls List the controls
control_listSolutions List Solutions
control_listProfilingVariableWeights List profiling controls' variable weights
Escape any spaces in command arguments or options with a backslash character (\).
For example, the following command exports all targets named My Target:
7.5.1. Syntax
ngadmin showDaemonVersion
7.6.1. Syntax
• Add control(s):
Command: control_addControls
Usage examples:
Command: control_addOrUpdateControls
Usage example:
• Delete controls:
Command: control_delControls
• SOL_1/control1
• SOL_1/contr*
• SOL/*
• …
Usage examples:
• Extract controls:
Command: control_extractControls
• SOL_1/control1
• SOL_1/contr*
• SOL/*
• …
Usage example:
# export all controls following given pattern into given ZIP file
ngadmin control_extractControls -f /home/MyReports.zip '*/*/My*'
Command: control_exportReports
-s, --save Format(s) to save output in, as a list among pdf, png,
csv and jrprint (use the option several times to ask for
several formats)
• SOL_1/control1
• SOL_1/contr*
• SOL/*
• …
Usage example:
# run all controls following the given pattern, for the given time
# frame, without exporting any actual report (no --save option
# used), although PDF reports are of course generated and accessible
# from the UI later
ngadmin control_exportReports --from 2014-01-25T00:00:00 \
--to 2014-01-30T23:59:59 'MySolution/*'
Command: control_exportSolutionsDoc
• SOL*
• *
• …
Usage example:
• List controls:
Command: control_listControls
<arg>… List of IDs or patterns to search for the controls (if empty,
all controls will be listed)
• SOL1/control1
• SOL/contr*
• SOL/*
• …
Usage examples:
# list all controls with specific naming convention, wherever they are
ngadmin control_listControls "*/My*"
# list all controls using several filters (only one match is necessary
# for a control to be listed)
ngadmin control_listControls '*/*open*' '*/*net*'
• List Solutions:
Command: control_listSolutions
• SOL*
• *
• …
Usage examples:
Command: control_listProfilingVariableWeights
<arg>… List of IDs or patterns to search for the controls (if empty,
all simple profiling controls will be listed)
• SOL1/control1
• SOL/contr*
• SOL/*
• …
Output is a JSON file like the following one (pretty-printed here for readability):
[
{
"name":"Pr03 - Unusual Applications",
"id":99,
"variables": [
{
"id":1184,
"type":"AGGREGATION",
"name":"application_day",
Usage examples:
Command: control_setProfilingVariableWeights
-f, --file JSON file from which to extract the new variable weights
As input file, a JSON file like the following one (pretty-printed here for readability) is
expected:
{
"application_day": 3.14,
"user_application_day": 2.71,
"user_source_ip": 1.414,
"user_source_terminal": 1.732
}
If the variables' names do not match those in the control, the operation fails. This behavior
prevents accidentally resetting one control’s variable weights with data actually intended
for another control.
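The safety check described above can be sketched as follows, assuming an exact-match rule between the weight file's variable names and the control's variables; the helper name is an assumption.

```python
# Sketch of the safety check: new weights are applied only if the variable
# names in the JSON file exactly match the control's variables; otherwise
# the operation fails, protecting against weights meant for another control.

def apply_weights(control_variables, new_weights):
    """control_variables: dict name -> current weight.
    new_weights: dict name -> new weight, e.g. loaded from the JSON file."""
    if set(new_weights) != set(control_variables):
        raise ValueError("variable names do not match the control")
    control_variables.update(new_weights)
    return control_variables
```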
Usage examples:
The following commands are used to import, export and delete report targets.
7.7.1. Syntax
Command: control_addTargets
Usage examples:
Command: control_delTargets
• channelType1/channelName1/target1
• channelType*/*/target*
• */*/*
• …
Usage examples:
Command: control_extractTargets
• channelType1/channelName1/target1
• channelType*/*/target*
• */*/*
• …
Usage examples:
Command: control_listTargets
<arg>… List of IDs or patterns to search (if empty, all targets will
be listed)
• channelType1/channelName1/target1
• channelType*/*/target*
• */*/*
• …
Usage examples:
# list all targets with specific name prefix, wherever they are
ngadmin control_listTargets '*/*/my*'
The following are some useful utilities to interact with the control database.
7.8.1. Syntax
• Add classification:
Command: control_addClassification
Usage examples:
Command: control_removeOldExecutions
Usage example:
Profiling aggregations are used by profiling controls. A profiling control may utilize one or
more profiling aggregations. When importing or exporting a profiling control with ngadmin,
its corresponding aggregations are imported or exported too, so there is no need to handle
the aggregations separately.
These import/export commands are useful when you want to import/export a profiling
aggregation which is not associated with a profiling control, or in case you want to
import/export an aggregation without importing/exporting the associated profiling control.
Command: aggregator_exportProfilingAggregations
This command is used to export all profiling aggregations into an XML file on the
filesystem. The -f option is used to indicate the destination file path (location should be
writeable by the ng-screener user).
Usage example:
Command: aggregator_importProfilingAggregations
This command is used to import profiling aggregations from an XML file. The -f option is
used to indicate the source file path, which should be readable by the ng-screener user.
Usage example:
Command: aggregator_recomputeAggregations
This command is used to recompute some or all of the profiling aggregations. The
recalculation process itself is performed asynchronously.
Usage example:
Command: aggregator_renameProfilingAggregation
Usage example:
Profiling peer groups are used by profiling controls. A profiling control may utilize at most
one profiling peer group.
Command: aggregator_recomputePeerGroups
<arg>… Peer group names separated by spaces (if empty, all peer
groups will be recomputed)
This command is used to recompute some or all of the profiling peer groups. The
recalculation process itself is performed asynchronously.
Usage example:
Command: control_importSolution
-ot, --owner-type Type of forced owner ('user' vs. 'role', defaults to 'user')
--exclude-targets If true then don’t import targets and channels. Default: false
This command is used to import a solution ZIP containing the definition of:
• controls
• aggregations
• targets
• channels
Usage example:
Command: dashboard_listDashboards
Accepted values:
• control
• forensic
• all
• dashboard1
• dashboard*
• …
-h This help
Usage example:
Command: dashboard_importDashboards
-h This help
Usage examples:
Command: dashboard_exportDashboards
Accepted values:
• control
• forensic
• all
• dashboard1
• dashboard*
• …
-h This help
Usage example:
Command: referencedata_reloadCaches
This command is used to reload caches in the system by executing the corresponding
queries. Notice that it does not reload the configuration files; if the cache configuration
files need to be reloaded, a restart of NG|Screener is required.
Reloading a cache only adds or updates values and does not remove values that no longer
exist, which may leave stale entries in the cache. To reload a cache with up-to-date values,
use the --clear option to clear the cache before reloading.
Option -g is used to reload specific cache groups. Its parameters are the names of the
cache groups, separated by commas or provided through several -g options. If this option
is not specified, all cache groups in the system are reloaded.
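The -g handling described above can be sketched as a small helper (hypothetical, not part of ngadmin): group names may be comma-separated within one -g value or spread over several -g options, and an empty list means all groups.

```python
# Sketch: normalize the -g option values into one flat list of cache
# group names. An empty result means "reload all cache groups".

def collect_groups(g_options):
    """g_options: list of raw -g values, e.g. ["users,accounts", "terminals"]."""
    groups = []
    for value in g_options:
        groups.extend(name.strip() for name in value.split(",") if name.strip())
    return groups
```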
Usage example:
Command: referencedata_listCacheEntries
-t, --keyFormats Formats of input keys, currently only used for date formats
-v, --values Value column names, used to restrict the listed values
associated with the cache keys
Usage examples:
Command: referencedata_listCaches
Usage example:
Field mapping is a module that maps technical names to business names in ngScreenerUI.
Command: fieldMapping_importFieldMapping
Users can remove all field mappings by importing an empty JSON array ( [] ) file with the
-force option.
Remember to restart ngScreenerUI after executing this command for the changes to
become visible!
Command: fieldMapping_exportFieldMapping
Command: datacapturealerting_listAlertingPolicies
• policy1
• policy*
• …
-h This help
Usage example:
Command: datacapturealerting_importAlertingPolicies
-h This help
Usage examples:
Command: datacapturealerting_exportAlertingPolicies
<args> Policies' name patterns to export (if empty, all policies will
be exported)
• policy1
• policy*
• …
-h This help
Usage example:
When a poll finishes, the polling system saves its last status to the status file. At the next
poll, it polls only the logs that are new since the last status, which avoids polling duplicate
logs from the server. For more information on the polling system, please refer to its own
documentation, ngPollingSystem_Admin_Guide.
This command presents the contents of all .pollstatus and .nextpoll files located in
a specific folder.
Syntax
Command: polling_listStatus
-o, --outputFormat Next poll time output format (for instance dd-MM-yy
HH:mm:ss z, which is the default)
Usage examples:
Syntax
Command: polling_readPollStatusFile
Syntax
Command: polling_readNextPollFile
-o, --outputFormat Next poll time output format (for instance dd-MM-yy
HH:mm:ss z, which is the default)
# read and display the next poll time from given file
ngadmin polling_readNextPollFile \
-f /home/user/pollingStatsBackup/test@test2.nextpoll
# read and display the next poll time from given file with given format
ngadmin polling_readNextPollFile \
-f /home/user/pollingStatsBackup/test@test2.nextpoll \
-o "dd-MM-yy HH:mm:ss z"
This command updates the content of a .pollstatus file. If the given file doesn’t exist, it
will be created automatically. This does not necessarily mean that the value will be picked
up by a running connector: if you want to update the .pollstatus file of a running
connector, you need to stop the daemon, update the status file and restart the daemon to
be sure that the connector picks up the new status.
The polling status file keeps the last polling status. At the next poll, it polls only new logs
since the last status. For that reason, the polling status must be comparable to determine
the new logs to poll.
Syntax
Command: polling_updatePollStatusFile
-t, --type Polling status comparable object type (see above, STRING
being the default)
Usage examples:
This command updates the content of a .nextpoll file. If the given file doesn’t exist, it will be
created automatically.
For that new value to be picked up by the polling system, you need to restart it by running
service polling-system restart.
Command: polling_updateNextPollFile
-i, --inputFormat Time value input format (e.g. dd-MM-yy HH:mm:ss z; if not
set, the time value is expected to be milliseconds since
January 1st 1970, 00:00 GMT)
Usage examples:
# set the next poll time to Tuesday, September 18, 2012 6:00:00 PM (GMT)
ngadmin polling_updateNextPollFile \
-f /home/user/pollingStatsBackup/test@test2.nextpoll 1347991200000
# set the next poll time to Tuesday, September 18, 2012 8:00:00 PM (CEST)
ngadmin polling_updateNextPollFile \
-f /home/user/pollingStatsBackup/test@test2.nextpoll \
-i "dd-MM-yy HH:mm:ss z" "18-09-2012 20:00:00 CEST"
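The two examples above denote the same instant: 20:00 CEST (UTC+2) is 18:00 GMT, which is 1347991200000 milliseconds since the epoch. This can be checked with a short sketch (the helper name is an assumption):

```python
from datetime import datetime, timezone, timedelta

# The next-poll value is milliseconds since January 1st 1970, 00:00 GMT.
# Convert a timezone-aware datetime to that representation.

def to_epoch_millis(dt):
    return int(dt.timestamp() * 1000)

cest = timezone(timedelta(hours=2))  # CEST is UTC+2
next_poll = to_epoch_millis(datetime(2012, 9, 18, 20, 0, 0, tzinfo=cest))
# next_poll == 1347991200000, the value used in the first example
```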
The next sections present licensing related commands - showing a license’s information,
extracting a C2V file and updating a license.
Syntax
Command: licensing_showLicenseInformation
This command extracts a C2V file from the appliance. This file is needed by NetGuardians to
activate your license.
Syntax
Command: licensing_extractC2V
Usage example:
Syntax
Command: licensing_updateLicense
-s, --skip Skip checking if the file is valid for update (default is false)
To skip checking the C2V file before updating, the -s or --skip option can be used. This is
useful if you want to install multiple licenses on the same machine.
Usage examples:
# update the license with the given file and without checking
# first if the existing license is the same, potentially resulting
# in multiple licenses being installed
ngadmin licensing_updateLicense -s -f /home/user/license.v2c
7.18.1. Syntax
Command: util_encodePassword
When using the util_encodePassword command directly, it asks for the password in
clear text as an argument.
When using the ngadmin wrapper, it does not accept the password in clear text as a
parameter; it asks for the password at a safe prompt instead. This prevents the clear-text
password from being captured in the command history.
This command recreates the search index used by NG|ScreenerUI from scratch. It can be
used to solve problems with objects that cannot be found through the search functionality
in the user interface. Objects (SOL’s, Controls) have to be indexed in order to be searchable.
If one object is modified outside of NG|ScreenerUI or NG|Admin, this command might be
helpful to make those changes visible to the search component.
Command: search_reindexAll
Usage example:
This section presents commands that are useful when working with NG|Storage.
It might happen that new log files are added to log-collector with a past date (when, for
example, a new service has been configured) or data was pruned using
data_removeEntries. data_launchInitialProcessing forces a re-analysis and
reloading of the missing entries in NG|Storage.
Syntax
Command: data_launchInitialProcessing
Usage example:
There are various cases where partially removing some data might be helpful:
Without a host/service and time frame provided, this command will remove
all the data from NG|Storage.
Syntax
Command: data_removeEntries
-o, --host Host pattern to select data to clean (* is the default, stands
for all)
-f, --from Date pattern to select data to clean (lower bound), expected
format is dd-MM-yyyy
Usage examples:
Command used to remove violations for a specified control from the log-collector and
NG|Storage.
There are various cases where partially removing some violations might be helpful:
• During a POC
• When a control has been deleted and we won’t keep its violations
Syntax
Usage examples:
In case log files are deleted from log-collector and we want data in NG|Storage to be kept
in sync, data_sanitize should be used. The command will remove all the data in
NG|Storage for the specified period and files not present in the log-collector.
Syntax
Command: data_sanitize
Usage examples:
This section presents commands that can be used to import/export forensic filters and
transformations.
Command used to extract all defined forensic transformations and filters to a zip file.
Syntax
Command: forensic_extractFilters
Usage example:
Command used to import forensic transformations and filters from a zip file.
Syntax
Command: forensic_importFilters
Usage example:
8.1. Introduction
Logs are collected and written into /log-collector; they are then used for forensics and to
generate reports. Logs are written in plain text and could be modified. The NG|Integrity
module is designed to detect the modification of logs under /log-collector.
Depending on the running mode, the module can detect that a log file under /log-
collector was modified, or it can detect which log line has been modified, including the
old and the new line before and after the modification.
8.2. Overview
• Daemon: Generate the integrity database periodically with a recurrence interval defined
in the configuration file.
• Single: Generate the integrity database once from the .log.gz and .log audit trail
files, excluding the current day.
• file : Ensure that files were not modified. This mode is lightweight but does not allow
you to know exactly which line was changed.
• line : Ensure the integrity line by line. This mode is heavier but allows you to know
exactly what was modified.
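The difference between the two modes can be illustrated with a small sketch. This is not NG|Integrity's actual implementation; SHA-1 is used here only because it mirrors the signatureType default shown in the configuration, and the helper names are assumptions.

```python
import hashlib

# Illustration of the two integrity modes: "file" keeps one digest per
# file (cheap: detects *that* something changed), while "line" keeps one
# digest per line (heavier: pinpoints *which* line changed).

def file_signature(lines):
    return hashlib.sha1("\n".join(lines).encode()).hexdigest()

def line_signatures(lines):
    return [hashlib.sha1(line.encode()).hexdigest() for line in lines]

def changed_lines(old_sigs, new_lines):
    new_sigs = line_signatures(new_lines)
    return [i for i, (a, b) in enumerate(zip(old_sigs, new_sigs)) if a != b]
```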
defined in the NG|Integrity configuration.
3. NG|Integrity updates the integrity database with the new audit trails.
4. NG|Integrity deletes the output folder after updating the integrity database.
5. NG|Integrity periodically reads the state of the log-collector files and compares
their signatures with the integrity database to ensure integrity is guaranteed.
6. NG|Integrity sends an alert if files were tampered with.
8.3. Configuration
# The signature type is used to create the digester. It can be SHA-1 or MD5, etc.
signatureType=SHA-1
# Destination alert
alert_destination=localhost
8.3.2. Parameters
• service_to_check_n: Defines a service and a host that must be controlled and its
integrity mode (line or file).
• signatureType: The hash algorithm to be used for verification. It can be SHA-1 or MD5.
• integrityDirectory: Directory path where the integrity database structure is created.
• intervalChecking: Number of seconds between two validations.
• intervalDiscovery: Number of seconds between two database updates.
• logFolder: Folder that NG|Integrity ensures integrity of.
The option "-v" or "--verbose" can be used to get verbose output. You can find NG|Integrity’s
log file in the following location:
/var/log/ng-screener/ngIntegrity.log
A user wishes to ensure the integrity of the core banking system audit trails (temenosT24).
The user would like to be notified when the stored core banking audit trails files are
modified. For this case, the NG|Integrity’s configuration would be the following:
# The signature type is used to create the digester. It can be SHA-1, MD5, etc.
signatureType=SHA-1
# Destination alert
alert_destination=localhost
After modifying the configuration file, NG|Integrity should be restarted using the command:
service ngintegrity.ngc start
Now, the integrity of the core banking audit trail files is being monitored. If an integrity violation occurs, an audit trail is written under the NG-Integrity service of the NG-SCREENER host. You can define a control to receive an alert when this happens.
Chapter 9. Reference Data
9.1. Introduction
The data is stored in a key-value format. When a module requires information about a key, it
returns all the corresponding values. For example, Reference Data may store user
information with the key being 'user id'. A request on a specific 'user id' would then return
the corresponding user information (e.g. name, account, department, etc.).
9.2. Overview
When the Reference Data module is initialized, it consults the configuration files to locate
the data sources (e.g. SQL Database, CSV directory, LDAP Directory), then executes the
query on those data sources and stores the results in a specific format. When it finishes
executing those queries and all the data is cached, it is ready to serve other modules.
Data in Reference Data module is constructed as key-value pairs, with a set of key-value
pairs called a 'cache' and denoted as 'keys → values'. 'Keys' and 'values' have multiple
attributes. An example association of user information would be 'user_id → user_name,
branch_id', or 'currency, date → rate'.
A cache is represented as a graph, where 'keys' are the source nodes and 'values' are the
target nodes. From a node, a module can access any attribute in another node if it has a
path from the source node to the target node. For example, with the 2 caches:
• Standard cache: the cache contains entries as key-value pairs and fetches an entry by
exact match. Imagine an account cache which maps account IDs to account names. A
search for an account ID will search for a cache entry with the exact match of that
account ID and if no such account ID is found in the cache, it will return a cache missed
reply.
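The exact-match behaviour of a standard cache can be illustrated with a minimal Python sketch (a hypothetical dict-backed stand-in, not the actual NG|Screener implementation):

```python
# Sentinel object representing a "cache missed" reply
MISS = object()

class StandardCache:
    """Minimal sketch of a standard (exact-match) cache."""

    def __init__(self, entries):
        self._entries = dict(entries)

    def get(self, key):
        # Exact match only: any key not present yields a cache miss
        return self._entries.get(key, MISS)

# Example: an account cache mapping account IDs to account names
accounts = StandardCache({"ACC-1": "Alice", "ACC-2": "Bob"})
```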
9.3. Configuration
The module has two types of configuration: module configuration and cache configuration.
Module configuration defines default values for cache parameters, data source parameters,
etc. for cases where they are not defined in the cache/data source configuration. In
addition, it defines global parameters related to the module. Cache configuration defines
the cache structure and how to populate data in the cache. It also defines additional
parameters to manage cache efficiently.
# CACHE CONFIGURATION
defaultCacheInMemorySize = 50000
defaultCacheEviction = LRU
defaultCacheCaseSensitive = true
defaultCacheKeyMatch = exact
defaultCacheKeyClass = String
defaultCacheKeyFormat = ''
# CACHE MANAGEMENT
defaultCacheRefresh = 0
maxReturnedKeys = 1000
cacheSyncIntervalSeconds = 5
The configuration file has 3 main sections: data source configuration, cache configuration,
and cache management. The data source configuration section declares default
parameters of data sources. Each type of data source (i.e. JDBC, CSV, LDAP) has its
specific parameters. The cache configuration section declares default parameters of a
cache. The cache management section declares other parameters.
• defaultFetchSize: Gives a hint to the underlying DBMS about the maximum number of rows to fetch in a single round trip.
• csvDriver: JDBC driver used to connect to a CSV database. By default, it uses the
csvjdbc driver to establish a connection to a CSV file.
• csvUrlFormat: The URL used to connect to a CSV database. It is built using Java String formatting (https://docs.oracle.com/javase/8/docs/api/java/util/Formatter.html#syntax), with the 'path' provided in the CSV data source configuration as its parameter. In brief, '%s' in this parameter is replaced by the whole 'path' value from the CSV data source configuration. By default, the csvjdbc driver is used to establish a connection to a CSV file, and the default URL would be jdbc:relique:csv:/path/to/csv/folder.
This parameter is optional. The default value is the static part of the URL with the dynamic part as parameter: 'jdbc:relique:csv:%s'.
• csvDefaultHeaderLine: Defines the header columns if they are not present in the first line of the CSV file. This parameter is only considered if the 'suppressHeaders' parameter is set to 'true'.
• defaultCacheEviction: Set the algorithm used to evict existing cache entries from
memory. Possible values:
◦ 'LFU': removes the least frequently used entries from memory
◦ 'LRU': removes the least recently used entries from memory
• defaultCacheKeyMatch: Sets the key matching mode. If the value is 'exact', it will create a simple cache. If the value is 'upper' or 'lower', it will create a comparable cache.
• defaultCacheKeyClass: Java type of the cache key. The key value extracted from the data source is converted to the key class before being stored in the cache. For a normal cache, the key would be 'String'. For comparable caches (i.e. 'keyMatch' is 'upper' or 'lower'), the key class must be comparable. Possible values: String, Long, Int, Integer, Short, Byte, Float, Double, Date, IP. The IP class denotes an IPv4 address in the format XXX.XXX.XXX.XXX, where XXX is in the range 0-255. The input value is case-sensitive. Currently, all supported key classes are comparable.
• maxReturnedKeys: The maximum number of keys returned when getting all cache keys. It is used to prevent OutOfMemoryErrors when a cache has a large number of keys. If this value is set to zero or a negative number, the parameter is ignored and all keys in the cache are returned.
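The 'LRU' eviction policy listed under defaultCacheEviction can be sketched as follows (a hypothetical helper built on OrderedDict, not NG|Screener code):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU eviction sketch: when the cache is full,
    the least recently used entry is removed from memory."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
```

An 'LFU' cache would instead track access counts and evict the least frequently used entry.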
<cachegroup name="group1"
cacheInMemorySize="50000"
cacheRefresh="86400">
<query>
select col1, col2, col3 from table
</query>
<domains>
<domain forValue="col2">
<allowed value="Whitelisted"/>
<allowed value="Blacklisted"/>
<allowed value="Neutral"/>
</domain>
</domains>
<cache name="col1_col2col3" inMemorySize="30000">
col1 -> col2, col3
</cache>
<cache name="col1col2_col3"
keyClass="string, date" keyFormat=", 'dd.MM.yyyy'">
col1, col2 -> col3
</cache>
</cachegroup>
</cacheconfig>
The 'datasource' tag defines a data source and there can only be one such tag in a cache
configuration file. The 'cachegroup' tag defines a cache group, which may contain multiple
caches. There may be multiple cache groups in a cache configuration file.
The data source configuration contains information on how to connect to a data source to
fetch data from.
The data source is defined by the 'type' property. It may take one of the following values:
Different data source types require different parameters. Following is the list of
parameters of a data source and the scope they apply to.
• jdbcDriver: Set the driver used to connect to the data source. Different DBMS
correspond to different drivers. Some examples of those drivers are listed in Table
[tab:jdbc_parameters].
• url: Set the URL for the data source connection. Different data sources expect different
URL formats. Some examples of JDBC URL formats are listed in Table
[tab:jdbc_parameters]. The URLs themselves take the following parameters:
◦ HOSTNAME: host name of the database server, it could be an IP address or a DNS
name
◦ DATABASE_NAME: the database name on the server
• username: The user name used to connect to the database. It binds with the password
to form valid credentials.
• password: The password used to connect to the database. It binds with the username
to form valid credentials. This entry can be either in plain text or an encoded form
provided by the NgAdmin encodePassword command.
• initialQuery: The initial query executed right after opening a new connection to a
database and before executing the query defined in the cache group. This query should
be idempotent, since it could be executed multiple times before executing the queries
defined in cache groups.
• path: The path to the CSV folder. This folder is used as a datasource and all CSV files in
this folder are served as tables.
Cache group configuration contains information on how to create a group of caches with a
common query. It always has a query specified, one or more caches, and may have default
properties for caches in the same group.
a. Properties
Cache groups inherit all the cache properties defined in the module configuration file. If a
property is defined in the cache group, this value will overwrite the corresponding value
from the module configuration file. Following are all valid properties of a cache group:
• name: The name of the cache group. This property is used to distinguish different cache
groups in the same cache configuration file. Therefore different cache groups must
have different names in the same cache configuration file.
• cacheInMemorySize: Same meaning as the defaultCacheInMemorySize
parameter in the module configuration file. If this parameter is defined in the cache
group, it overwrites the value from the module configuration file and becomes the
default value for all caches in the group.
• cacheEviction: Same meaning as the defaultCacheEviction parameter in
module configuration file. If this parameter is defined in the cache group, it overwrites
the value from the module configuration file and becomes the default value for all
caches in the group.
• cacheCaseSensitive: Same meaning as the defaultCacheCaseSensitive
parameter in module configuration file. If this parameter is defined in the cache group,
it overwrites the value from the module configuration file and becomes the default value
for all caches in the group.
b. Query
It specifies the query to execute on the data source to get the results in multiple columns.
Those columns are then used to build the caches in the same group.
<domains>
<domain forValue="col2">
<allowed value="Whitelisted"/>
<allowed value="Blacklisted"/>
<allowed value="Neutral"/>
</domain>
<domain forValue=...>
...
</domain>
</domains>
This defines that variable "col2" may only take one of the three enumerated values:
"Whitelisted", "Blacklisted" or "Neutral".
d. Caches
Cache configuration defines the cache structure as well as its properties. Cache is built
from the result of the query execution. A cache inherits all properties of its cache group
except for 'cacheRefresh'. Following are all valid properties accepted by a cache:
Also, note that the defaultValues and addMissing attributes are only applicable to
simple caches, and that the number of comma-separated values should be the same as the
number of values defined in the cache (see below).
For comparable caches with multiple keys, only one comparable key is currently supported. All other keys must have an 'EXACT' type. The comparable key can have either an 'UPPER' or 'LOWER' type and must be put at the end of the key set. For example, with a cache of key1, key2 → val1, val2 and attribute keyMatch=exact,lower, key1 has match type 'EXACT' and key2 has match type 'LOWER'.
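A comparable cache lookup can be sketched with Python's bisect module. The exact 'LOWER' semantics is an assumption here: we take it to return the entry with the greatest key not exceeding the searched key (e.g. the latest exchange rate on or before a given date); the actual NG|Screener behaviour may differ.

```python
import bisect

class ComparableCache:
    """Illustrative sketch of a comparable cache with a 'LOWER' match."""

    def __init__(self, entries):
        # entries: {comparable_key: value}; keys kept sorted for bisect
        self.keys = sorted(entries)
        self.values = [entries[k] for k in self.keys]

    def get_lower(self, key):
        # Greatest key <= searched key; cache miss below the smallest key
        i = bisect.bisect_right(self.keys, key)
        return self.values[i - 1] if i else None

# Example: 'currency, date -> rate' style lookup reduced to the date key
rates = ComparableCache({"2016-07-01": 1.08, "2016-07-08": 1.10})
```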
Notice that all similar properties in a cache, a cache group and a module configuration have
the same name but different prefixes. The properties in cache have no prefix, the properties
in cache group have 'cache' as prefix and those in module configuration file have
'defaultCache' as prefix.
An example cache structure is col1, col2 → col3, col4, with col1 and col2 as the keys and col3 and col4 as their respective values. The value for col3 or col4 can be fetched for the combination of the col1 and col2 keys.
As mentioned in the previous section, a cache defines a key and multiple values which form a cache structure. A cache structure can be represented as a simple graph where each node is a key or value, and arrows connect key nodes to value nodes. For example, with a cache structure of node1 → node2, node3, the respective graph would be:
Multiple simple graphs like this together compose a larger graph and form a complete cache system. For example, the three following caches:
Note that caches are not reloaded on ng-screener start if the cache was already loaded
before and the cache metadata has not been changed.
If a new cache configuration file is added, ng-screener needs to be restarted to load it.
10.1. Introduction
The feeding module provides the ability to normalize events coming from the log-collector
to the business model used for both control execution and forensic investigation.
The feeding module is responsible for making sure that the data needed for a specific task
is available in NG|Storage.
In day-to-day operation, the content held in NG|Storage for each index pattern is managed by a time-window (meaning that events that do not lie in this time-window are candidates for purging). Time-windows are expressed as a number of days in the past from today (the default is 365 days for all of the following index patterns: ngv-*, ngi-*, ngc-* and ngt-*).
Incoming events from the source systems (mostly through Syslog-NG) are normalized
on-the-fly and continuously contribute to NG|Storage content.
• through the new violations/hits they generate, obviously (this part is clearly outside the
scope of the feeding module, except for the purging-when-out-of-time-window part)
• if the data required by a control’s execution is currently not present in NG|Storage,
◦ either because it lies partially or completely outside of the configured time-window,
◦ or because one service is totally excluded from the feeding module’s stream-like
actions
then the feeding module is also responsible for fetching the required data from the
log-collector, normalizing and feeding it to NG|Storage before the control can be
run. It is therefore possible that the time-window is temporarily overlooked for the
concerned services. In any case the nightly purging operation will erase superfluous
data.
The feeding module relies on an interpretation dictionary provided with each connector,
which contains rules to normalize raw audit trails into the unified data model described
below. For more information on the business model refer to the corresponding chapter.
Each and every event present in Channels (ngc-*), IT Layers (ngi-*), Transactions (ngt-*)
or Violations (ngv-*) indices has the following technical attributes:
For ngv-* indices @timestamp refers to time of violation processing. For all other indices
@timestamp field stores event timestamp.
Each and every event present in Channels (ngc-*), IT Layers (ngi-*), Transactions (ngt-*)
or Violations (ngv-*) indices has the following business attributes:
Events in the Transactions (ngt-*) and Violations (ngv-*) indices have the following specific
attributes:
Events in the Channels (ngc-*) and Violations (ngv-*) indices have the following specific
attributes:
Events in the IT Layers (ngi-*) and Violations (ngv-*) indices have the following specific
attributes:
Events in the Violations (ngv-*) indices have the following specific attributes:
The raw log file may contain technical information such as a user id or account number, whereas we want to display rich information to the end user, such as the user name or account holder. To do that, the feeding module can enrich the normalized event with additional information, by adding to or replacing the content of some fields with other information. All those field mappings are kept in the Reference Data module (Chapter Reference Data). We need to configure the feeding module to use those mappings and translate the events.
Some global JavaScript scriptlets are already supplied by either the daemon itself or the connectors that need them. Such supplied scripts are deposited in the /etc/ng-screener/daemon/modules/feeding/translators/scripts folder.
In multi-tenant mode, the wildcard (*) character is not allowed and the tenant is mandatory in the source definition. The folder contains a sample file in the samples sub-folder. A configuration file looks like the following example:
<translator>
<key>Initiator_User_Name=user_id</key>
<value>Initiator_User_UserId=username</value>
<value action="replace">Initiator_Process_Pid=branch_id</value>
<value action="append">
Initiator_User_Domain=branch_id.branch_name
</value>
</translator>
<scriptedField overwrite="true">
<field>day_of_week</field>
<script>
["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
"Saturday"][new Date(event['@timestamp']).getDay()]
</script>
</scriptedField>
<scriptedField>
<field>part_of_day</field>
<script>
['0h-6h', '6h-12h', '12h-18h', '18h-24h'][~~(new
Date(event['@timestamp']).getHours() / 6)]
</script>
</scriptedField>
<scriptedField>
<field>transaction_ord_cre_hour</field>
<script>new Date(event['@timestamp']).getHours()</script>
<type>INTEGER</type>
</scriptedField>
</translatorconfig>
• The sources tag specifies the sources of the events for which translators and scripted
fields defined in the file apply; this tag is mandatory and may only appear once
Scripted fields and translators are executed in the order they are defined in the XML file.
10.4.1. Sources
The sources tag may have one or multiple source tags. Each source defines the source
of the event (i.e. host, service) on which the file’s translators and scripted fields will apply.
A source has the format service@host where service is the service name (e.g.
ngaudit, nglicensing) and host the host name (e.g. NG-SCREENER) of the event. Wildcards
(*) can be used to match all the service / host names. For example the source *@* matches
all hosts and services.
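The service@host matching described above can be sketched as follows (a hypothetical helper using Python's fnmatch for the '*' wildcard):

```python
from fnmatch import fnmatch

def source_matches(pattern, service, host):
    # A source pattern 'service@host' matches an event when both the
    # service part and the host part match; '*' matches any name.
    pat_service, pat_host = pattern.split("@")
    return fnmatch(service, pat_service) and fnmatch(host, pat_host)
```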
10.4.2. Translators
Each translator has one or multiple key and value tags. The keys are used as a reference to find the corresponding values in the Reference Data module. Keys and values have the same format, namely Event Field = cache node. The Event Field is a field of the event; it is case-sensitive. The cache node is a node of the cache system as described in Section Cache structure. Chained nodes are specified using a dot separator to give the path from the key node to the target node in the value cache node. Those nodes are used to make requests to the Reference Data module to get the corresponding values of the specified keys.
The keys are extracted from the key event fields of the current event. From these keys,
corresponding values for value tags are retrieved, which are assigned (using the specified
valueSeparator in case of append, otherwise overwriting any existing value) to the value
event fields of the current event.
We can also configure a global valueSeparator in the feeding module configuration file -
feeding.conf. This value is overwritten by any valueSeparator defined in the
translator definition. Default value for valueSeparator is ", ".
The key tag has three properties regex, class and format:
• regex: the regex applied on the key string. It aggregates all matching groups to form a
new key string if the regex matches, otherwise it skips translating the event. This
property is processed before class and format.
The value tag has one property: action. It accepts the following values (case-insensitive):
• Replace: replaces the requested value in the event field. This is the default value if the
property is not specified.
• Append: appends the requested value to current value of the event field (using the
specified valueSeparator).
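The two actions can be sketched as follows (an illustrative helper; the field handling and separator default are based on the description above, not on the actual feeding module code):

```python
def apply_value(event, field, new_value, action="replace", value_separator=", "):
    # 'replace' (the default) overwrites the event field; 'append' joins
    # the new value to the existing one using the valueSeparator.
    if action == "append" and event.get(field):
        event[field] = event[field] + value_separator + new_value
    else:
        event[field] = new_value
    return event
```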
• overwrite (attribute): indicates whether the field value should be replaced if the target field already exists. The default value is false.
• import: optional, multi-valued element indicating that more global script(s) should be imported prior to defining the specific script (from the script field); they are imported in the order they are defined in the scriptedField element, from the base location defined by the importBaseDir global translator config element.
• field: indicates which field the result of the script is put into.
• script: a JavaScript expression used to compute the value for the field. All existing fields are available in a JavaScript array called event (e.g. event['field']).
• type: allows casting value to numeric types. By default, a field’s value is stored as a
string, even though the actual type might be numeric. To force a field to use another
type, use one of:
◦ INTEGER
◦ LONG
◦ DOUBLE
Make sure to set a proper index mapping type for that field in NG|Storage as well.
Example scripted field with type set and import functionality activated:
<scriptedField>
<field>transaction_ord_cre_hour</field>
<import>common.js</import>
<script>new Date(event['@timestamp']).getHours()</script>
<type>INTEGER</type>
</scriptedField>
When the NG|Screener platform is started for the first time, NG|Storage is empty.
NG|Screener will use a background process called initial loading to populate it with data.
Depending on the configured time-windows, the process loads archive events from the log
collector, normalizes and enriches them before pushing them to NG|Storage. During the
initial loading events are loaded from newest to oldest.
If the NG|Screener daemon is stopped for some reason and then re-started, it will resume its loading process from the point where it stopped. The time it actually takes to finish the loading task depends on the performance of the provided infrastructure and the volume of data to be loaded.
NG|Storage creates the following Lucene indices for storing event data:
Example:
• directory: /log-collector/2016/T24Server1/temenosT24Transaction/09-07-2016.log
• created index: ngt-t24server1-temenost24transaction-20160709
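The derivation of the index name from the log-collector path can be sketched as follows (a hypothetical helper; the path layout and the choice of the ngt- prefix per event type are assumptions based on the example above):

```python
import re

def index_name(path, prefix="ngt"):
    # Assumed path layout: /log-collector/<year>/<host>/<service>/<DD-MM-YYYY>.log
    parts = path.strip("/").split("/")
    host, service, filename = parts[2], parts[3], parts[4]
    # The DD-MM-YYYY date in the file name becomes YYYYMMDD in the index
    day, month, year = re.match(r"(\d{2})-(\d{2})-(\d{4})\.log", filename).groups()
    return "%s-%s-%s-%s%s%s" % (prefix, host.lower(), service.lower(), year, month, day)
```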
For further information about clearing the NG|Storage database, please refer to ngAdmin
commands: data_launchInitialProcessing and data_removeEntries.
11.1. Introduction
NG|Screener’s control module gives the ability to extract valuable audit information in a
customized PDF report. This module provides automatic and use-case oriented report
generation where users are able to fully customize their reports (add logo, define charts,
add text around reported information, etc.). NG|Screener control module’s objectives are
the following:
• Provide a clear and comprehensive overview of a specific situation (use case driven)
• Group different information in a single document (charts, listings, textual description,
etc.)
• Schedule controls for periodic delivery of reports or for alerting purpose
Three types of controls are currently available in NG|Screener, ranging from simple PBI ones to the most sophisticated Machine Learning ones. They differ vastly in implementation complexity and, as a result, in the capabilities offered:
11.2. Overview
You can set up a control from the ground up on the control edition page (picture Control View, Ad. 1) or duplicate the current control (picture Control View, Ad. 2). The duplication
You can pick and configure report templates on Template Selection and Template
Configuration tabs (picture Control Edit, Ad. 2 and 3).
This section provides a brief overview of report templates available within NG|Screener.
This tab is used to configure the layout of the generated PDF report. The following options
are available and a live preview is also provided on the right side of the page:
Simple mode
The template selected on the previous screen can be configured here. The exact layout of the page depends on the template selected.
Some elements (titles, descriptions, etc.) may be edited directly in the preview pane on the right hand side of the page (see Template Configuration (simple), Ad. 1 to 4).
On the left hand side of the page, the timeline, chart and table elements relevant to the
current template can be configured. Mandatory fields are highlighted in red, as shown on
Template Configuration (simple), Ad. 6.
The syntax for the Filter field in the tab’s General section (see Template Configuration
(simple), Ad. 5) conforms to ElasticSearch’s query string syntax (https://www.elastic.co/
guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#query-
string-syntax). This filter is the place to act on when implementing whitelisting (or, on the
contrary, blacklisting) for PBI controls (for profiling controls, such concepts are only defined
at each profiling variable’s level, not globally), as presented in chapter Managing
whitelisting and blacklisting.
Advanced mode
In advanced mode, the tab looks similar to the following screenshot. Again, the exact layout
of the page depends on the template selected.
The attributes used by the control have to be chosen in Template Configuration (advanced),
Ad. 1. Those fields will be provided to the code as data frames to operate on in Template
Configuration (advanced), Ad. 2. The administrator is then free to implement the control’s
functionality in Python 2.7.
The table’s order clause (see picture Template Configuration (advanced), Ad. 3) has a
syntax resembling a standard SQL order-by clause, where the usable column names are
the following:
• section for the section column (i.e. the first one, whatever its actual name in the data
frame)
Note: the section does not exist if the report type is set to Export (as opposed to
Status).
• column1 to columnN (N being the total number of columns in the table) for the table data.
Other code input fields are also present on the tab, as shown on picture Timeline, chart and
table code in advanced configuration, to describe how the timeline, charts and tables
should be populated.
• Input parameter dfMap is a map (indexed by service name) of all input data frames (as
already brought up above, so-called data frames are matrix-like distributable
representations of data), with columns corresponding to the fields selected in the
Selected Fields area (see picture Template Configuration (advanced), Ad. 1).
• Output result can be either a data frame itself or a map (Python’s dict) of data frames.
Note: Our data frames are implemented using the PySpark library, whose documentation can be found at https://spark.apache.org/docs/0.9.0/python-programming-guide.html
Note: The value returned by this function will be provided as an input parameter to all the
other custom functions described below.
aggDF = spark.read.format("es").options(**es_base_read_conf) \
.load(resource="na-my_aggregation/document")
Timeline code
This function receives df as input parameter, which corresponds to what was returned by
the common code's function described above.
The expected output of this function is a data frame with at least one column (any other
column is actually ignored), named @timestamp, containing the data timestamp
(truncating it to the requested interval is taken care of by the framework). The number of
rows associated with a given timestamp (once truncated) will be reflected in the height of
the timeline curve.
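The truncation-and-count step performed by the framework can be pictured with a plain-Python sketch (illustrative only, operating on epoch-millisecond timestamps rather than a data frame):

```python
from collections import Counter

def timeline_buckets(timestamps_ms, interval_ms):
    # Truncate each @timestamp to the start of its interval, then count
    # the rows per bucket; the count drives the height of the timeline curve.
    return Counter(ts - ts % interval_ms for ts in timestamps_ms)
```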
This function receives df as input parameter, which corresponds to what was returned by
the common code's function described above.
The function’s output is expected to be a data frame with at least two columns, the first one
being the distinct key, and the second being the number of elements to associate with the
said key. Ordering in decreasing element number and limiting to a maximum number of
different keys is taken care of by the framework.
This function receives df as input parameter, which corresponds to what was returned by
the common code's function described above.
• its first column used for the section headers (only applicable if the report type is set to
Status, not if it is set to Export, as shown on picture Template Selection, Ad. 2).
• all other columns (up to the number of columns selected in the table section of the
Template Selection tab) are table data, in the order they should be present in the
report.
This tab allows you to specify aggregation variables and scoring variables, give them weights and set a score threshold that marks an event as a hit (i.e. an anomaly).
For discrete variables (i.e. aggregation variables which can only take discrete values, like day-of-week, currency, country…), the following scoring methods are available:
• STATISTICAL
• LOGARITHMIC
The Ignore missing values toggle is only relevant when the aggregation’s dimensions
values, as seen from the to-be-scored data, do not match with any of the aggregated data.
Missing value would describe the case where the aggregated data does
not know of the found customer (yet), or of this (known) customer using the
current transaction’s type before (in the aggregation period).
On the other hand, if this very customer is already known to have used the
same transaction type, just never in that currency (which is not part of the
aggregation’s dimensions, only being the aggregated variable), then the case
cannot be described as a missing value case (see warning below).
In the case of an identified missing value, the variable’s partial score would normally be set highest (i.e. to 1), based on the safe-side reaction to a behavior never seen before. After some business analysis, it can very well be decided that such cases should on the contrary be scored lowest (i.e. with a 0).
This is exactly what activating the Ignore missing values toggle does: ignore the
behavioral anomaly represented by cases identified as new in the axes of the aggregation’s
dimensions.
In the case where only the variable's value is seen as new, the partial score will always be set highest (i.e. to 1), regardless of the value of the Ignore missing values toggle.
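The interplay between missing dimension values, never-seen variable values and the toggle can be summarised in a small sketch (a hypothetical scoring helper with a made-up frequency-based normal score, not the actual scoring engine):

```python
def partial_score(aggregated, dims, value, ignore_missing=False):
    # aggregated: {dimension_tuple: {seen_value: count}}
    if dims not in aggregated:
        # Missing value case: the dimensions were never seen in the
        # aggregated data; scored highest unless the toggle is active.
        return 0.0 if ignore_missing else 1.0
    seen_values = aggregated[dims]
    if value not in seen_values:
        # Known dimensions but a never-seen value: always scored highest,
        # regardless of the toggle.
        return 1.0
    # Otherwise score normally (here: a made-up frequency-based score)
    total = sum(seen_values.values())
    return 1.0 - seen_values[value] / total
```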
When NG|Screener is down, it may miss some scheduled control executions. This can happen if a control is scheduled to run more frequently than the downtime lasted. If you wish to recover some or all of the missed runs, two parameters need to be set in the Control Module’s configuration file (located in /etc/ng-screener/daemon/modules/control.conf).
A restart of NG|Screener is required for any changes to take effect. In the above case, a maximum of 30 of the newest missed runs will be recovered for each scheduled control.
Most of this chapter’s content applies only to simple controls, meaning that advanced ones will have to deal with the functionality explicitly in their specific Python code sections, as with most other things. The part about reference data usage, though, remains relevant for all kinds of controls.
For PBI controls, blacklisting (resp. whitelisting) an event only consists of having it forcefully
appear (resp. not appear) in the control’s results. Doing so can be triggered through the
Filter attribute on the control editor’s Template Configuration tab.
As an example, the following filter snippets may be added to an already existing filter to
make sure
1. an event with an attribute (aptly named dubious) with a specific value (yes here) always
appears in the control’s output report (= is blacklisted)
(...) OR dubious:yes
2. an event with an attribute (named clear hereafter) with a specific value (sure chosen here) never appears in the control’s output report (= is whitelisted)
(...) AND NOT clear:sure
In case of profiling controls, the Filter attributes previously used to have an event be
included in the control’s output report has a slightly different meaning, as it governs which
events get scored, i.e. which events a profiling score is computed for.
Therefore, whitelisting can still be achieved this way, since not scoring an event is a sure guarantee that it will not end up with a high score and appear among the control’s violations. Blacklisting is another matter, though, since it not only requires the event to be scored, but also to be scored high.
White- and blacklisting are rather dealt with at each variable’s level, through the
corresponding settings attached to the variables in the control editor’s Profiling tab.
When activated, each option compares the value of the given attribute (whose name is in the first field) to the given value (entered in the second field). If a difference is found, nothing specific happens (i.e. variable scoring occurs normally). If the values are found to be the same, though, then the variable’s partial score is forced to 0 (whitelisting) or 1 (blacklisting), regardless of the other settings. Whatever happens, this partial score will then contribute to the global score in the same way.
If both options are activated, and both expressions match, then blacklisting
wins.
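The override logic just described can be sketched in plain Python (the attribute names and the helper function are hypothetical; the real computation happens inside the control template):

```python
def partial_score(event, normal_score, whitelist=None, blacklist=None):
    """Apply the white/blacklisting overrides to one variable's partial score.

    whitelist / blacklist are optional (attribute_name, trigger_value) pairs,
    mirroring the two fields of each option in the Profiling tab.
    """
    # Blacklisting wins when both expressions match.
    if blacklist and event.get(blacklist[0]) == blacklist[1]:
        return 1.0          # forced maximum partial score
    if whitelist and event.get(whitelist[0]) == whitelist[1]:
        return 0.0          # forced minimum partial score
    return normal_score     # no match: normal scoring applies

event = {'dubious': 'yes', 'clear': 'sure'}
# Both options match here, so blacklisting takes precedence.
print(partial_score(event, 0.37,
                    whitelist=('clear', 'sure'),
                    blacklist=('dubious', 'yes')))  # → 1.0
```

Whatever the outcome, the resulting partial score then contributes to the global score exactly as a normally computed one would.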
In the general case, the attribute(s) on which the black- or whitelisting decision is based do
not come straight out of the scored event’s original data (i.e. the event first has to be
enriched with them).
In case the trigger value(s) can be deduced from a combination of the event’s other
attributes, then this combination can be either
There are cases also where the information has to come from other, external sources. One
can think of an explicit list of highly risky, or even embargoed destinations for financial
transactions, for instance (think blacklisting). Or, on the contrary, an explicit list of known
trusted counterparties (state institutions and the like…), which could be used to whitelist
financial transactions.
That’s where reference data comes in handy. The data source can be either completely
automatic (extracted from a provided CSV file, a database…) or manually managed.
• let’s assume that, in each financial transaction entered in the system, the destination
account is present - using the business model - in the
transaction_receiver_account_id attribute;
• one could build a reference data model, taking an account_identifier as a key, and
an account_status value (status being something like trusted, resulting in
whitelisting, or forbidden, rather resulting in blacklisting, or also unknown, resulting in
normal processing…); this model could be either filled manually or directly fed from an
external data source;
• the content for this reference data model can then be edited in NG|Screener UI
(Admin menu, Reference data section)
• the event data source’s translator ought to be configured so that it adds a new
transaction_receiver_account_type attribute based on the reference data
content:
<translator>
<key>transaction_receiver_account_id=account_identifier</key>
<value>transaction_receiver_account_type=account_status</value>
</translator>
11.7.1. Context
For simple profiling controls, there is the possibility to activate the so-called event tracking
functionality by setting the appropriate switch in the control’s general configuration panel.
This functionality enables tracking of all events that pass through the control. The goal here
is to gather all the information required to be able to notify external systems about hits or
groups of events without any hits.
11.7.2. Configuration
The following parametrisation options are available for the event tracking functionality:
11.7.3. Usage
When activated and properly configured, a simple profiling control will dump data
consisting of
To understand what a group is, let’s take an analogy with several payments (= events) being
recorded through a single order (= the group). The group is the entity that may be cleaned
from any suspicion as a whole, while suspicious events are still notified individually.
This table is cleaned up by control executions (all data loaded before the beginning of the
current day is systematically removed). For reference, this cleanup process is performed in
the upload_event_tracking_data() function of the simple_profiling.template file.
There are also two other scripts that have to be tuned to complete the analysis, both
located in the /usr/local/ng-screener/python/packages directory:
event_tracking_handling_sample.py
This script - or a spin-off of it - is intended to be run on a regular basis (every few minutes
or so), for instance via the crond daemon. Its parameters are the identifiers of the
controls that must all have declared an event as genuine for the event to be considered
ultimately genuine.
Inside the script, the following variables have to be set properly. They are all gathered at
the beginning of the file for accessibility.
• DB_* (DB_USER and DB_PWD): user name and associated password to access the
ngscreener database where the CTRL_EVENT_TRACKING table is located.
• SP_* (SP_USER, SP_PWD, SP_DBTYPE, SP_DBHOST, SP_DBPORT, SP_DBNAME):
connection information for the external DB where the stored procedures (hence SP as
name prefix) are to be called. SP_DBTYPE currently accepts the following possible
values: mysql, oracle, postgres and sqlserver.
• CM_* (CM_USER, CM_PWD, CM_TARGET): user name, associated password for the
connection to the Case-Manager application, and the name of the "Case Manager"
target to use for custom fields resolution.
Additionally, the following methods, both located in the Handling class close to the
beginning of the file, most probably have to be adapted to each local situation:
All errors encountered during DB access (be it the ngscreener database itself or the one
concerned by the stored procedure calls) are logged in
/var/log/ng-screener/event-tracking-db.log.
This script is another template, this time to check that there are no long-running
unhandled events. It is also intended to be run at regular intervals, though with a lower
frequency than the first one. The following variables have to be tuned in the script:
• DB_* (DB_USER and DB_PWD): user name and password to access the ngscreener
database where the CTRL_EVENT_TRACKING table is located.
• THRESHOLD_MINUTES: the time (in minutes) above which an un-handled event should
be highlighted (which should be high enough to take into account any treatment action
taking place, and low enough to allow for quick detection if something failed to work).
11.8.1. Context
To be able to tune the profiling parameters, it may be necessary to analyze the statistical
spread of generated scores periodically. This is why all profiled elements are dumped in a
raw CSV file. This is done automatically for simple controls, and has to be done manually
for advanced controls (in the advanced profiling control template, a variable named
PROFILING_AUDIT_PATH contains the name of the directory into which those CSV files
are expected to be dumped).
Special care must be taken to use the same columns in the same sort
order for all runs of a given control, so that the daily logs can be
concatenated properly.
Due to the way profiling controls are run (i.e. extending the relevant control’s time frame
towards the past slightly so that we are sure to treat all incoming events, even if they only
arrive after a slight delay), some input data may actually be analyzed more than once,
resulting in duplicates in the generated audit.
A cron job has therefore been added, run every day at 1 AM, which removes duplicate lines
in the previous days' profiling audit logs and compacts the result into one .gz file per
control (identified by its id) per day. It is defined in the /etc/cron.d/profilingAudit
file and refers to the script located at
/usr/local/ng-screener/tools/profiling/profiling_audit_daily_aggreg.sh.
The folder hierarchy where those audit logs are dumped is defined by the
profilingAuditBasePath variable in /etc/ng-screener/daemon/modules/control.conf.
Its default value is /data/control/profiling.
For a given control run, the folder in which the CSV log files are dumped (the
PROFILING_AUDIT_PATH variable) is composed of the following components:
1. the control’s id
2. the run date (in form yyyyMMdd)
3. the run time (in form yyyyMMdd_hhmmssSSSSSS)
/data/control/profiling/<id>/<yyyyMMdd>/<yyyyMMdd_hhmmssSSSSSS>
If, for some reason, the de-duplication and compacting actions should be performed earlier
(for a specific run, for example, to be able to examine quickly what the partial scores for a
specific transaction were), the following commands can be used to generate the compacted
file:
(1) changing to the directory where the run audit logs were deposited
(if the aim is to gather all existing runs of the same day into one
archive, one can also use one level higher, i.e. at day level)
(2) finding all *.csv files below current directory, removing duplicate
lines, sorting them, and compacting the resulting CSV file into
/tmp/run.csv.gz.
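The commands themselves are not reproduced above; based on descriptions (1) and (2), the equivalent operation can be sketched in Python (the function name and paths are placeholders):

```python
import glob
import gzip
import os

def compact_audit_logs(run_dir, output='/tmp/run.csv.gz'):
    """Gather all *.csv files below run_dir, drop duplicate lines,
    sort them, and compact the result into a single gzip archive."""
    lines = set()
    for path in glob.glob(os.path.join(run_dir, '**', '*.csv'), recursive=True):
        with open(path) as f:
            lines.update(f.read().splitlines())
    with gzip.open(output, 'wt') as out:
        out.write('\n'.join(sorted(lines)) + '\n')
    return output
```

Sorting all lines globally is only safe because, as noted above, every run of a given control uses the same columns in the same order.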
Controls now all use the Business Data Model in place of the obsolete NGE model.
Moreover, all data available to the controls is stored in ElasticSearch.
As described in section Advanced mode of the Template configuration chapter, the main
data holder is now a data frame, which is a kind of matrix holding rows (the events) and
columns (the attributes).
• filtering: the data frame’s content can be filtered by providing a predicate (on the row
level)
• selecting: only a subset of columns can be kept for each row, reducing the amount of
attributes analyzed
• sorting: changing the data frame’s sorting order
• joining: two data frames may be merged together, generating a third one in the process;
rows from the first two data frames are associated with each other using a join
expression.
One important point about Spark data frames is that they are lazily evaluated (computed
only when their content is actually needed).
Moreover, data frames are immutable: every operation performed will actually create
another data frame, leaving the original one(s) intact.
Filtering / Limiting
• attribute matching
◦ business_reference attribute value starting with CUST
df = df.filter(col('business_reference').like('CUST%'))
df = df.filter(df.score > k)
df = df.limit(100)
Selecting
◦ Using a constant
df = df.withColumn('my_new_column_name', lit(42))
df = df.withColumn('my_new_column_name',
col('attribute1') + col('attribute2') * 2)
Sorting
Provided we have a data frame with at least the '@timestamp' and 'currency' attributes, we
would like to sort the data frame by currency (ascending) and timestamp (descending, i.e.
reversed chronological).
df = df.orderBy(df['currency'].asc(), df['@timestamp'].desc())
Aggregating / Reducing
When we want to aggregate data (count the number of events with a given characteristic,
sum the transaction amounts per currency, etc.), this is called aggregating or reducing.
For instance, when we have a data frame with columns user_id, trans_type and
amount and we want the number of transactions per user in a column named
'trans_per_user', it could be implemented like:
per_user = df.groupBy('user_id').agg(
count(lit(1)).alias('trans_per_user')
)
If we want the sum and average of the amounts of transactions per user and transaction
type, the following expression could come in handy:
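Such an expression follows the same groupBy/agg pattern as the count example above, grouping by both user_id and trans_type and aggregating amount. As a plain-Python sketch of the computation it performs (sample data and result column names are hypothetical):

```python
from collections import defaultdict

rows = [
    {'user_id': 1, 'trans_type': 'wire', 'amount': 100.0},
    {'user_id': 1, 'trans_type': 'wire', 'amount': 300.0},
    {'user_id': 1, 'trans_type': 'card', 'amount': 50.0},
]

# Group by (user_id, trans_type), then aggregate the amounts,
# mirroring df.groupBy('user_id', 'trans_type').agg(...)
groups = defaultdict(list)
for row in rows:
    groups[(row['user_id'], row['trans_type'])].append(row['amount'])

per_user_and_trans_type = {
    key: {'amount_sum': sum(vals), 'amount_avg': sum(vals) / len(vals)}
    for key, vals in groups.items()
}
print(per_user_and_trans_type[(1, 'wire')])  # → {'amount_sum': 400.0, 'amount_avg': 200.0}
```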
Note: if the aggregated and source data have to be joined again into a single data frame,
please refer to the Joining section below.
Joining two data frames means creating a new one with data from both, following given
association rules (similarly to what joins are in relational databases).
Picking up on the joining point from the Aggregating / Reducing section above, to join the
aggregated data with the source data, one of the following can be used:
• if the joining columns have the same name in both data frames and an Equi join is
performed:
df = df.join(per_user, ('user_id'))
• if the joining columns have different names, or if the join condition should be a non-
strict-equality comparison, the join condition has to be explicit:
agg = per_user_and_trans_type
df = df.join(agg,
(df.user_id == agg.user_id) & (df.trans_type == agg.trans_type),
'left_outer')
In the last code snippet, the agg variable was only introduced to improve readability and
decrease the overall length of the statement.
Regarding the last parameter of the above join call: its allowed values are the following:
• inner (default): a row in the source data frame will only be carried onto the destination
data frame if it has at least one corresponding row (according to the join condition) in
the data frame being joined
• outer: rows in the source data frame are always carried onto the destination data
frame; if there is no corresponding row in the other data frame, then the corresponding
fields will remain empty
• left_outer: rows in the first source data frame (on the left of the expression) are
always carried onto the destination data frame, even when they have no corresponding
row in the second source data frame (in that case, the fields coming from the second
data frame remain empty); rows from the second source data frame, however, must
have at least one corresponding row in the first source data frame to be represented in
the destination data frame
• right_outer: same as left_outer, with the roles of the first and second data frames
inverted
• leftsemi: columns from the second source data frame are never output to the
destination data frame; the second data frame is only used to choose the rows from the
first source data frame which have at least one corresponding row (according to the
join condition) in it.
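The semantics of these join modes can be sketched in plain Python over two small row sets (the data and the helper function are hypothetical, for illustration only):

```python
left = [{'user_id': 1, 'name': 'a'}, {'user_id': 2, 'name': 'b'}]
right = [{'user_id': 1, 'total': 10}, {'user_id': 3, 'total': 30}]

def join(left, right, key, how='inner'):
    """Minimal equi-join on a single key (assumes unique keys on the right),
    mimicking a subset of the Spark join modes."""
    right_by_key = {r[key]: r for r in right}
    out = []
    for l in left:
        match = right_by_key.get(l[key])
        if how == 'leftsemi':
            if match is not None:
                out.append(dict(l))           # right columns never output
        elif match is not None:
            out.append({**l, **match})
        elif how == 'left_outer':
            out.append({**l, 'total': None})  # no match: right fields stay empty
    return out

print(len(join(left, right, 'user_id')))                # inner → 1
print(len(join(left, right, 'user_id', 'left_outer')))  # → 2
print(join(left, right, 'user_id', 'leftsemi'))         # → [{'user_id': 1, 'name': 'a'}]
```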
Due to the lazy characteristics of actual data frame evaluation, it can happen that the
same calculation has to be done several times, which in turn can cause serious
performance issues.
...
high = df.filter(df.score >= 0.8)
low = df.filter(df.score < 0.4)
...
When evaluating these two resulting data frames (high and low) later, if nothing specific is
done, then the df dataframe itself will have to be evaluated twice. That’s because its
intermediate state is not persisted between the two calls.
One solution is to enable the so-called caching of the data frame which is used in more than
one evaluation tree:
...
df.cache()
high = df.filter(df.score >= 0.8)
low = df.filter(df.score < 0.4)
...
Doing so will ensure that the data frame’s content is persisted, so that its evaluation is
only done once. Unfortunately, it also comes at a price: in the presented example, the data
frame content is evaluated (and cached) prior to any filtering, which can make it rather
big. As usual, it is a trade-off between performance and memory consumption…
Processing always starts with the Business Data Model in ElasticSearch. This data is
fetched by Spark, which does the heavy lifting of data processing (Ad. 1).
The process is performed in one of two ways: either in batch mode, with plain Spark SQL,
or in real-time mode, using Spark Structured Streaming (Ad. 2). It always results in
potential hits being written back to the relevant index in ElasticSearch (Ad. 3), and in the
raw results being stored in .parquet files. The latter only serve as a pivot format for the
report generation run later down the road (Ad. 4).
12.1. Introduction
Although we use NG|Storage to store big data, we still use an RDBMS in the application to
store metadata (e.g. control definitions, user information, …). In addition, an RDBMS
supports transactions and the ACID properties (atomicity, consistency, isolation,
durability).
NG|Screener uses MariaDB as its RDBMS, with a single database instance (ngscreener).
It listens on the default port 3306, and its data is located on the /storage partition, which
is normally a RAID5 disk.
• FEEDING_ : these tables contain data for the feeding module (for more information
about feeding, please refer to chapter Feeding)
• UI_ : these tables contain ngBrowser data - filters, notifications, etc.
• REALTIMEANALYSIS_ : these tables contain the defined policies and blackouts (for
more information about alerting, please refer to NGBrowserGuide chapter Realtime
Analysis)
• DCA_ : these tables contain the definitions of data capture alerting objects
• CONTROL_ : these tables store control configuration (Python code, related target,
scheduling information, etc.)
• SECURITY_ : these tables hold information about NG|Screener user roles and the
authentication method (Local, LDAP)
• databaseConfig.url
This is the jdbc url of the MariaDB server to use.
Default: "jdbc:mariadb://localhost:3306/ngscreener"
• databaseConfig.username
This is the MariaDB username to use.
Default: prelude
mysql
mysql -u prelude -p
After this command, a prompt will appear to enter the password of the prelude user. To
exit the MariaDB shell, enter the command:
exit
show databases;
use database_name;
show tables;
describe tablename;
desc tablename;
• ngscreener database
• System files
◦ /etc/hosts
◦ /etc/cron.d /etc/cron.daily /etc/cron.deny /etc/cron.hourly /etc/cron.monthly
/etc/crontab /etc/cron.weekly
◦ /etc/ng-screener (this will include all config files of all NG applications)
◦ /etc/syslog-ng-rules
◦ /usr/local/prelude-runtime/etc/prelude-lml/ruleset
◦ /usr/local/ng-screener/ngprocessing/ngmesos/etc
◦ /usr/local/ng-screener/ngprocessing/ngspark/conf
• /log-collector (in case the --data option is used)
• Objects managed by ngadmin commands
◦ UI Forensic Filters (using ngadmin forensic_extractFilters)
13.1.1. Usage
• option
◦ No options: backs up the database, the configuration files and the UI objects
managed by ngadmin commands. In this mode, /log-collector is not backed up.
◦ --data: in addition to the above, backs up /log-collector as well. Note that
the final backup file can be big if there are many logs under /log-collector.
• backupFile: the path can be absolute or relative. The file name only accepts the
extension .tar.gz; if the file has no extension, .tar.gz is appended to the file name.
Example
The restore script is used to restore the information stored by the backup script
• ngscreener database
• System files
◦ /etc/hosts
◦ /etc/cron.d /etc/cron.daily /etc/cron.deny /etc/cron.hourly /etc/cron.monthly
/etc/crontab /etc/cron.weekly
◦ /etc/ng-screener (this will include all config files of all NG applications)
◦ /etc/syslog-ng-rules
◦ /usr/local/prelude-runtime/etc/prelude-lml/ruleset
◦ /usr/local/ng-screener/ngprocessing/ngmesos/etc
◦ /usr/local/ng-screener/ngprocessing/ngspark/conf
• /log-collector (in case the --data option is used)
• Objects managed by ngadmin commands
◦ UI Forensic Filters (using ngadmin forensic_extractFilters)
13.2.1. Usage
• option
◦ No options: restores the database, the configuration files and the UI objects
managed by ngadmin commands from the backup file. In this mode, /log-collector
is not restored.
◦ --data: in addition to the above, restores /log-collector as well.
/log-collector is skipped if it does not exist in the backup file.
Example
14.1. Introduction
To make the terminology used in NG|Storage clearer, the table below shows a comparison
of the various elements with their RDBMS counterparts:
An ElasticSearch cluster can contain multiple Indices (databases), which in turn contain
multiple Types (tables). These types hold multiple Documents (rows), and each document
has Properties (columns).
14.2. Architecture
The architectural diagram in External Architecture shows how NG|Storage is used in the
context of NG|Screener.
NG|Storage is used as the main data store for NG|Discover - forensic sessions are run (and
can only be run) on the data loaded into NG|Storage.
Its secondary role is to serve as the backend of NG|Discover’s search feature.
In ElasticSearch, all data in every field is indexed by default. That is, every field has a
dedicated inverted index for fast retrieval. All those inverted indexes can be used in a single
query, regardless of the types or indices queried.
Fields can be configured to be indexed or not. A field set as not indexed is only stored,
and such a field cannot be searched or used in filters.
Each field has its own type, which determines both the data type stored and the way the
field is queried. For a string field, the type can be text or keyword. A text field is
tokenized into words and can be queried by those words. A keyword field is stored as a
whole and can only be queried by its whole value.
14.3.1. Indexes
<prefix>-<host_name>-<service_name>-<date>
By default, the daemon uses its own mapping template file to set the correct number of
shards and replicas. The default setting is 2 shards and 0 replicas for each index.
You can overwrite these settings by creating a new mapping file in /etc/ng-
screener/daemon/indextemplates. The mapping template should have a name
corresponding to the index pattern to match (ng*.json, ngt-*.json, …).
All mapping template files are applied when NG|Daemon starts. New indices matching the
index_patterns will have the settings applied. If an index matches multiple mapping
template files, the file with the higher order number takes precedence.
The effective index mapping template applied to each index can be found in Ng|Storage
Admin (check Tools) at https://server_ip/ui/storageadmin/index.html#!/cluster. Click on
the arrow in the top right corner of the index and select show mappings in the popup
menu. Kopf can also be used to update index settings by clicking on edit setting in the
popup menu. Those settings are applied temporarily, and will be reset when ng-storage is
restarted.
Since we cannot keep all data indefinitely (data kept in NG|Storage takes 8 times the
space it would take compressed in /log-collector), data in NG|Storage is maintained
using a sliding window approach.
The default window size for NG|Storage is 365 days. During data loading, the most recent
log files are loaded first, followed by older ones.
14.5. Tools
It offers an easy way of performing common tasks on an elasticsearch cluster. Not every
14.6. Limitations
NG|Storage uses significantly more disk space to store the same data than compressed
files do. The rule of thumb is that data held in NG|Storage occupies 8 times the amount of
disk space as the same data held in /log-collector.
Some examples on how to estimate disk space consumption by NG|Storage are provided
below:
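As a back-of-the-envelope sketch of the 8× rule of thumb described above (the input figure is hypothetical):

```python
# Compressed data currently held in /log-collector, in GB (hypothetical figure)
log_collector_gb = 50

# Rule of thumb: NG|Storage needs roughly 8 times the compressed size
ngstorage_gb = log_collector_gb * 8
print(ngstorage_gb)  # → 400
```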
To verify if the API is available, enter the following URL in the web browser:
http://localhost:9200.
Expected response:
Cluster health
Example response:
{
"cluster_name" : "elasticsearch",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 534,
"active_shards" : 534,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 12,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 97.8021978021978
}
The most important field here is "status". The cluster health status can be green, yellow
or red. On the shard level, a red status indicates that the specific shard is not allocated in
the cluster. Yellow means that the primary shard is allocated but replicas are not, and
green means that all shards are allocated. The index level status is determined by the
lowest shard status. The cluster status is determined by the lowest index status.
Example response:
{
"test" : {
"mappings" : {
"document" : {
"dynamic_templates" : [
{
"strings" : {
"match_mapping_type" : "string",
"mapping" : {
"index" : false,
"type" : "keyword"
}
}
},
{
"geopoints" : {
"match" : "*_geo",
"mapping" : {
"type" : "geo_point"
}
}
}
]
}
}
}
}
The output provides information like the number of replicas, number of shards, refresh
interval, etc.
Example response:
List indexes
When there are no indexes in the cluster the response will show an empty list:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
You can retrieve documents by some fields. For example, you can get documents where
Inserting documents
Deleting documents
Counting documents
Example response:
In case you have an installation on multiple nodes, here is the procedure to configure
multiple instances of NG|Storage to run as a cluster.
Verify that the communication on port 9300 is allowed between all nodes. You might need to
open the firewall on port 9300 on each node using the following command:
Before starting the daemon, you need to set the replication factor of the indexes. For that
edit your index templates located in /etc/ng-screener/daemon/indextemplates
and change the "number_of_replicas": "0" to the number of replicas you want in
your cluster.
NG|Storage can be fine-tuned in multiple ways, but the most common use case is tuning
the number of indexes/shards in a cluster. Most of the time, performance issues come
from the "shards gazillion" problem: index partitioning needs to be thought through
according to each client’s needs.
Data in NG|Storage is organized into indices. Each index is made up of one or more shards.
Each shard is an instance of a Lucene index, which you can think of as a self-contained
search engine that indexes and handles queries for a subset of the data in an NG|Storage
cluster.
As the number of segments grows, these are periodically consolidated into larger
segments. This process is referred to as merging. As all segments are immutable, this
means that the disk space used will typically fluctuate during indexing, as new, merged
segments need to be created before the ones they replace can be deleted. Merging can be
quite resource intensive, especially with respect to disk I/O.
The shard is the unit at which NG|Storage distributes data around the cluster. The speed at
which NG|Storage can move shards around when rebalancing data, e.g. following a failure,
will depend on the size and number of shards as well as network and disk performance.
Each shard has data that needs to be kept in memory and uses heap space. This includes
data structures holding information at the shard level, but also at the segment level, in
order to define where data resides on disk. The size of these data structures is not fixed
and will vary depending on the use case.
One important characteristic of the segment related overhead is however that it is not
strictly proportional to the size of the segment. This means that larger segments have less
overhead per data volume compared to smaller segments. The difference can be
substantial.
In order to be able to store as much data as possible per node, it becomes important to
manage heap usage and reduce the amount of overhead as much as possible. The more
heap space a node has, the more data and shards it can handle.
Indices and shards are therefore not free from a cluster perspective, as there is some level
of resource overhead for each index and shard.
As the overhead per shard depends on the segment count and size, forcing
smaller segments to merge into larger ones through a forcemerge
operation can reduce overhead and improve query performance. This
should ideally be done once no more data is written to the index. Be aware
that this is an expensive operation that should ideally be performed during
off-peak hours.
The number of shards you can hold on a node will be proportional to the
amount of heap you have available, but there is no fixed limit enforced by
NG|Storage. A good rule of thumb is to keep the number of shards per
node below 20 to 25 per GB of configured heap. A node with a 30GB heap
should therefore have a maximum of 600-750 shards, and the further below
this limit you can keep it, the better. This will generally help the cluster
stay in good health.
In summary: to size your NG|Storage cluster, you can simply determine the minimum heap
size needed for the whole cluster with the following function:
The minimum heap to set for a node is 8GB and the maximum is 30GB.
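A minimal sketch of such a sizing computation, based on the 20-25 shards per GB rule of thumb quoted above (the function name and the shard count used are hypothetical):

```python
import math

MIN_NODE_HEAP_GB = 8   # minimum heap to set for a node
MAX_NODE_HEAP_GB = 30  # maximum heap to set for a node
SHARDS_PER_GB = 20     # conservative end of the 20-25 rule of thumb

def cluster_heap_plan(total_shards):
    """Return (minimum total heap in GB, minimum node count) for the cluster."""
    total_heap_gb = max(MIN_NODE_HEAP_GB,
                        math.ceil(total_shards / SHARDS_PER_GB))
    nodes = math.ceil(total_heap_gb / MAX_NODE_HEAP_GB)
    return total_heap_gb, nodes

print(cluster_heap_plan(1200))  # 1200 shards → (60, 2)
```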
If we were to have one big index for documents, we would soon run out of space. Logging
events just keep on coming, without pause or interruption. We could delete the old events
with a scroll query and bulk delete, but this approach is very inefficient. When you delete a
document, it is only marked as deleted (see Deletes and Updates). It won’t be physically
deleted until the segment containing it is merged away.
Instead, use an index per time frame. You could start out with an index per year (logs_2014)
or per month (logs_2014-10). Perhaps, when your website gets really busy, you need to
switch to an index per day (logs_2014-10-24). Purging old data is easy: just delete old
indices.
This approach has the advantage of allowing you to scale as and when you need to. You
don’t have to make any difficult decisions up front. Every day is a new opportunity to change
your indexing time frames to suit the current demand. Apply the same logic to how big you
make each index. Perhaps all you need is one primary shard per week initially. Later,
maybe you need five primary shards per day. It doesn’t matter—you can adjust to new
circumstances at any time.
When you index a document, it is stored on a single primary shard. How does Elasticsearch
know which shard a document belongs to? When we create a new document, how does it
know whether it should store that document on shard 1 or shard 2?
The process can’t be random, since we may need to retrieve the document in the future. In
fact, it is determined by a simple formula:
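The formula, as documented for Elasticsearch, is shard = hash(routing) % number_of_primary_shards. A plain-Python sketch (zlib.crc32 stands in for the murmur3 hash Elasticsearch actually uses):

```python
import zlib

def shard_for(routing, number_of_primary_shards):
    """shard = hash(routing) % number_of_primary_shards
    (zlib.crc32 stands in for Elasticsearch's murmur3 hash)."""
    return zlib.crc32(routing.encode()) % number_of_primary_shards

# The same routing value always lands on the same shard...
assert shard_for('doc-42', 5) == shard_for('doc-42', 5)
# ...and the result is always in range 0 .. number_of_primary_shards - 1
assert 0 <= shard_for('doc-42', 5) < 5
```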
The routing value is an arbitrary string, which defaults to the document’s _id but can also
be set to a custom value. This routing string is passed through a hashing function to
generate a number, which is divided by the number of primary shards in the index to return
the remainder. The remainder will always be in the range 0 to number_of_primary_shards
- 1, and gives us the number of the shard where a particular document lives.
This explains why the number of primary shards can be set only when an index is created
and never changed: if the number of primary shards ever changed in the future, all previous
routing values would be invalid and documents would never be found.
All document APIs (get, index, delete, bulk, update, and mget) accept a routing parameter
that can be used to customize the document-to-shard mapping. A custom routing value
could be used to ensure that all related documents—for instance, all the documents
belonging to the same user—are stored on the same shard.
PUT /ng-test/document/1?routing=shardkey
{
"name": "abc",
"title": "lorem ipsum",
...
}
The number of shards in an index is configured at index creation and is immutable. Shards
are configured in index templates with the number_of_shards parameter - see Indexes
chapter.
Changing the number of shards in an existing index requires complete index rebuilding.
To display shards statistics directly from NG|Storage, make a REST API call to the following
URL providing your index name. For example:
http://localhost:9200/_cat/shards/ngt-*?v
Spark is a general-purpose data processing engine that is suitable for use in a wide range
of circumstances. Application developers and data scientists incorporate Spark into their
applications to rapidly query, analyze, and transform data at scale. Tasks most frequently
associated with Spark include interactive queries across large data sets, processing of
streaming data from sensors or financial systems, and machine learning tasks.
Spark is designed to run large-scale data processing applications on clusters of machines,
distributing the workload to achieve much faster run times. Although Spark is generally
very performant, Spark jobs can still fail, get stuck, or take long hours to finish. Here is a
collection of best practices and optimization tips to achieve better performance and
cleaner Spark code.
Most of our Spark jobs read data from Elasticsearch, so it is important to make sure we do
it efficiently. If filtering operations are used properly on a Spark DataFrame, Spark can
translate those filters into an Elasticsearch query, which speeds things up by reading and
processing only the necessary data. An important hidden feature of using Elasticsearch as
a Spark source is that the Spark-ES connector understands the operations performed
within the DataFrame/SQL and, by default, translates them into the appropriate QueryDSL.
In other words, the connector pushes the operations down to the source, where the data is
efficiently filtered so that only the required data is streamed back to Spark. This
significantly increases query performance and minimizes CPU, memory and I/O on both
the Spark and Elasticsearch clusters, as only the needed data is returned (as opposed to
returning the data in bulk only to be processed and discarded by Spark).
The following pySpark code generates the Elasticsearch query shown below it:
(
    sparkSession.read.format("es")
    .options(**self.es_base_read_conf)
    .load(resource="ngt-default_finnovaserver-swisscomfinnovacorebankingtransaction-201904")
    .select('business_reference', 'transaction_receiver_amount')
    .where(col('transaction_receiver_amount') > 128)
)
{
  "size": 10000,
  "query": {
    "bool": {
      "must": [{ "match_all": { "boost": 1.0 } }],
      "filter": [
        { "exists": { "field": "transaction_receiver_amount", "boost": 1.0 } },
        {
          "range": {
            "transaction_receiver_amount": {
              "from": 128.0, ①
              "to": null,
              "include_lower": false,
              "include_upper": true,
              "boost": 1.0
            }
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1.0
    }
  },
  "_source": {
    "includes": ["business_reference", "transaction_receiver_amount"], ②
    "excludes": []
  },
  "sort": [{ "_doc": { "order": "asc" } }]
}
To verify that Spark generates the proper Elasticsearch query, you can temporarily turn on
logging of all queries on a given index:
PUT /ngt-default_finnovaserver-swisscomfinnovacorebankingebankingtransaction-201904/_settings
{
  "index.search.slowlog.threshold.query.warn": "0s",
  "index.search.slowlog.threshold.fetch.warn": "0s",
  "index.indexing.slowlog.threshold.index.warn": "0s"
}
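These settings can also be prepared programmatically. The sketch below builds the request body shown above, plus a body for switching the logging off again once you are done (in stock Elasticsearch a threshold of -1 disables slow logging; verify this against the NG|Storage version you run):

```python
import json

# Log every query on the index (threshold 0s). Remember to reset
# afterwards: logging all queries is expensive on a busy index.
enable = {
    "index.search.slowlog.threshold.query.warn": "0s",
    "index.search.slowlog.threshold.fetch.warn": "0s",
    "index.indexing.slowlog.threshold.index.warn": "0s",
}

# Assumed reset behaviour: "-1" disables the slow log threshold.
disable = {k: "-1" for k in enable}

body = json.dumps(enable)
print(body)
```

Either dictionary can then be sent as the body of the PUT _settings call shown above.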
From now on you will see all queries run on the
ngt-default_finnovaserver-swisscomfinnovacorebankingebankingtransaction-201904 index in
/var/log/ng-screener/ngstorage/NGELK_index_search_slowlog.log.
Very often Spark is able to push down filters and selections, even after joins.
Check the execution plan of your pySpark code to see how Spark will actually apply the
filters and in which order they will be executed.
customers = spark.createDataFrame(
[ (1, 1, 'John', 30),
(2, 1, 'Andy', 35),
(3, 2, 'Roger', 40),
(4, None, 'Sarah', 45),
], ['id', 'type_id', 'name', 'age']
)
types = spark.createDataFrame(
[ (1, 'Normal', 10),
(2, 'Premium', 20),
], ['type_id', 'type_name', 'fee']
)
df = (
customers
.join(types, 'type_id', 'left_outer')
.filter(col('age') >= 40)
)
df.explain()
== Physical Plan == ①
*(5) Project [type_id#462L, id#461L, name#463, age#464L, type_name#483, fee#484L]
+- SortMergeJoin [type_id#462L], [type_id#482L], LeftOuter
:- *(2) Sort [type_id#462L ASC NULLS FIRST], false, 0
: +- Exchange hashpartitioning(type_id#462L, 8)
: +- *(1) Filter (isnotnull(age#464L) && (age#464L >= 40)) ②
: +- Scan ExistingRDD[id#461L,type_id#462L,name#463,age#464L]
+- *(4) Sort [type_id#482L ASC NULLS FIRST], false, 0
+- Exchange hashpartitioning(type_id#482L, 8)
+- *(3) Filter isnotnull(type_id#482L)
+- Scan ExistingRDD[type_id#482L,type_name#483,fee#484L]
① Read from the bottom up. The plan is also visible in the SQL tab of the Spark web
frontend while a job is running.
② Spark pushed the filter down before the join
But it’s not always possible to push down filters, for example when the DataFrame is
cached. Take a look at the example (the code below is a reconstruction consistent with the
two plans that follow):
df = types.select('type_id', 'fee')
print 'No cache:'
df.filter(col('fee') >= 20).explain()
df.cache()
print '\nCached:'
df.filter(col('fee') >= 20).explain()
No cache:
== Physical Plan ==
*(1) Project [type_id#482L, fee#484L]
+- *(1) Filter (isnotnull(fee#484L) && (fee#484L >= 20))
+- Scan ExistingRDD[type_id#482L,type_name#483,fee#484L]
Cached:
== Physical Plan ==
*(1) Filter (isnotnull(fee#484L) && (fee#484L >= 20)) ①
+- *(1) InMemoryTableScan [type_id#482L, fee#484L], [isnotnull(fee#484L), (fee#484L >= 20)]
      +- InMemoryRelation [type_id#482L, fee#484L], StorageLevel(disk, memory, deserialized, 1 replicas)
            +- *(1) Project [type_id#482L, fee#484L]
               +- Scan ExistingRDD[type_id#482L,type_name#483,fee#484L]
① The filter is applied after caching the data, which can entail a huge performance penalty.
To conclude, be careful and pay special attention to where you apply your filters and
column selections.
A critical component of scalability is parallelism: splitting a task into multiple smaller
ones that execute at the same time, on different nodes in the cluster. Shards play a critical
role when reading information from Elasticsearch. Since Elasticsearch acts as the source,
the connector creates one Spark partition per Elasticsearch / NG|Storage shard. Roughly
speaking, more input splits mean more tasks that can read different parts of the source at
the same time, and more shards mean more buckets from which the content of an index
can be read (in parallel).
To sum up, the number of shards determines the number of Spark tasks that can read and
process the data: Spark creates one task to read one partition (shard).
Broadcast join
When joining two tables, if one of them is small enough to fit into memory, it is advisable to
broadcast it to avoid shuffles. And if tasks across multiple stages require the same
data, it is better to broadcast the value than to send it to the executors with each task.
To figure out whether your DataFrame is a good candidate for broadcasting, check the
amount of data that is shuffled during a normal join. Use the Spark UI to do so.
To tell Spark that it can use a broadcast join, use the broadcast hint in your pySpark code:
from pyspark.sql.functions import broadcast

joined_data = (
    large_df
    .join(
        broadcast(small_df), ①
        'business_contract_name')
)
15.2.3. Tools
Using the Spark UI is a good way to track job execution and detect performance issues.
See the technical guide to read more about the Spark history server, which is available on
the NG|Screener platform.
15.2.4. Spark performance advice in a nutshell
1. make it work
◦ select only the necessary columns
◦ filter out data as soon as possible
◦ consider using precomputed aggregations where possible
◦ enrich your data while loading it into the system (use scripted fields, translators,
etc.)
◦ plan ahead how much data you expect to have in the NG|Screener platform
▪ set a proper number of shards for the relevant indices
2. make it right
◦ have a look at the ES queries generated by Spark
◦ review your code again; maybe there is still a column or row to drop
◦ have a quick look at the Spark UI
▪ check that the number of created tasks isn’t suspicious
▪ check that all the tasks are busy
◦ remove unnecessary lines from your code, such as df.show() or df.explain()
3. make it fast (only if there is a performance problem)
◦ use broadcast where possible
◦ try to find a proper value for spark.sql.shuffle.partitions
◦ check the execution plan (df.explain())
Apache Mesos is a cluster manager that provides efficient resource isolation and sharing
across distributed applications or frameworks. It sits between the application layer and the
operating system and makes it easier to deploy and manage applications in large-scale
clustered environments. It can run many applications on a dynamically shared pool of
nodes.
15.4. Settings
For a POC, we do not need to run 3 controls at a time, but only 1 with most of the resources.
• Mesos
◦ Set the total available memory to half of the machine’s RAM in
/usr/local/ng-screener/ngprocessing/ngmesos/etc/mesos-slave/resources/mem.
For example, for a machine with 32 GB of RAM, execute the following command:
echo 16384 > /usr/local/ng-screener/ngprocessing/ngmesos/etc/mesos-slave/resources/mem
◦ Set the total number of CPUs to use to half of the machine’s CPUs plus 1 in
/usr/local/ng-screener/ngprocessing/ngmesos/etc/mesos-slave/resources/cpus.
For example, for a machine with 8 CPUs, execute the following command:
echo 5 > /usr/local/ng-screener/ngprocessing/ngmesos/etc/mesos-slave/resources/cpus
• Spark
◦ Edit the following settings in
/usr/local/ng-screener/ngprocessing/ngspark/conf/spark-default.conf
▪ spark.executor.memory
Set this value to the Mesos memory minus 15%, minus a further 512m. If Mesos has 16384
available, you should set this value to 13400m.
15.6.1. allocateResource.sh
allocateResource.sh is a bash script that calculates the CPUs and memory for Spark
executors.
When the calculation is done, it writes the calculated values to the relevant Spark
configuration file.
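The arithmetic described in the settings section above can be sketched as follows (a simplified Python rendition of the sizing rules; the authoritative logic lives in allocateResource.sh itself):

```python
def mesos_resources(total_ram_mb, total_cpus):
    # Half the machine's RAM for Mesos, and half the CPUs plus one,
    # as recommended in the settings section.
    return total_ram_mb // 2, total_cpus // 2 + 1

def spark_executor_memory_mb(mesos_mem_mb):
    # Mesos memory minus 15%, minus a further 512 MB.
    return int(mesos_mem_mb * 0.85) - 512

mem, cpus = mesos_resources(32768, 8)
print(mem, cpus)                        # 16384 5
print(spark_executor_memory_mb(16384))  # 13414 (the guide rounds down to 13400m)
```

The guide's example of 13400m for a 16384 MB Mesos allocation is this same calculation, rounded down.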
15.6.2. enableSparkAllocationMode.sh
• dynamicMode: [true|false]
◦ true: enable dynamic allocation mode with mesos
◦ false: enable static allocation mode with mesos
• masterNode: [true|false]
◦ true: this is the mesos master node
◦ false: this is the mesos slave node
• mesosMasterIpAddress: external IP address of mesos master (e.g. 192.168.56.11)
• mesosSlaveIpAddress: external IP address of mesos slave (e.g. 192.168.56.12)
• With dynamicMode=false, only the first parameter needs to be passed
16.1. Introduction
The NG|Screener platform consists of multiple services. Most of these services are Java
applications, managed as systemd services by the operating system.
16.2. NG|Screener
16.2.1. ng-screener.service
Logs:
Configuration:
The service needs at least 512MB to run but is usually configured to use 2GB of memory.
Exposed ports:
Dependencies:
• NG|Storage
• NG|Processing
• NG|Messaging
• MariaDB
16.2.2. ng-screener-ui.service
Configuration:
The service needs at least 400MB to run but is usually configured to use 1GB of memory.
Exposed ports:
Dependencies:
• NG|Storage
• NG|Discover
• MariaDB
16.3. NG|Messaging
16.3.1. ng-messaging.service
16.3.2. ng-zookeeper.service
The Zookeeper service uses 512MB of memory and starts the ZooKeeper server on port 2181.
16.4. NG|Discover
16.4.1. ng-discover.service
Add the following option to limit the Node.js heap size:
NODE_OPTIONS="--max-old-space-size=1024"
# Allows to specify a path to mount Kibana at if you are running behind a proxy. This only
# affects the URLs generated by Kibana, your proxy is expected to remove the basePath value
# before forwarding requests to Kibana. This setting cannot end in a slash.
server.basePath: "/ui/ngdiscover"
server.xsrf.disableProtection: true
16.5. NG|Storage
16.5.1. ng-storage.service
NG|Storage service.
The amount of memory it uses depends on the available memory of the host machine. A
script called generate-ngstorage-systemd-env is used to calculate that amount for each
host.
16.6. NG|Processing
16.6.1. ng-history-server.service
Uses /usr/local/ng-screener/ngprocessing/ngspark/sbin/start-history-server.sh for
service startup.
Exposes port:
spark.eventLog.enabled=true
spark.eventLog.dir=file:/usr/local/ng-screener/ngprocessing/ngspark/history
spark.history.fs.logDirectory=file:/usr/local/ng-screener/ngprocessing/ngspark/history
spark.history.fs.update.interval=5s
spark.history.fs.cleaner.enabled=true
spark.history.fs.cleaner.maxAge=2d
spark.serializer=org.apache.spark.serializer.KryoSerializer
16.6.2. ng-mesos-master.service
Uses /usr/local/ng-screener/ngprocessing/ngmesos/sbin/mesos-init-wrapper.sh master
for service startup.
The script runs Mesos in master mode, loads environment files, sets up logging and loads
configuration parameters as appropriate.
16.6.3. ng-mesos-shuffle.service
Uses /usr/local/ng-screener/ngprocessing/ngspark/sbin/start-mesos-shuffle-service.sh
for service startup.
The external Shuffle Service used is the Mesos Shuffle Service. It provides shuffle data
cleanup functionality on top of the Shuffle Service, since Mesos doesn’t yet support
notifying another framework of its termination.
16.6.4. ng-mesos-slave.service
Uses /usr/local/ng-screener/ngprocessing/ngmesos/sbin/mesos-init-wrapper.sh slave
for service startup.
The script runs Mesos in slave mode, loads environment files, sets up logging and loads
configuration parameters as appropriate.
Every Mesos slave runs Spark and opens port 4040 on the host machine during the
execution of an application. A SparkContext web UI accessible at this port displays
useful information about the application. This includes:
16.6.5. ng-thrift-server.service
The Thrift server runs permanently as a Mesos framework. It is configured to use only one
core. Uses /usr/local/ng-screener/ngprocessing/ngspark/sbin/start-thriftserver.sh for
service startup.
Since version 7.2 of the platform, a new so-called pseudo-service exists:
ng-platform.service.
It is deployed from the NG|Storage RPM and gathers together all of the following services,
provided they are installed (storage and processing nodes can each be master or slave in a
cluster, so the list applies to both master and slave installations):
• NG|Auth
◦ ng-screener-auth.service
• NG|Discover
◦ ng-discover.service
• NG|MapServer
◦ ng-mapserver.service
• NG|Messaging
◦ ng-kafka-manager.service
◦ ng-messaging.service
◦ ng-zookeeper.service
• NG|Processing
◦ ng-history-server.service
◦ ng-mesos-master.service
◦ ng-mesos-slave.service
◦ ng-mesos-shuffle.service
◦ ng-thrift-server.service
• NG|Scoring
◦ ng-scoring-api.service
◦ ng-scoring-api-ui.service
• NG|Screener
◦ httpd.ngc.service
The following commands can therefore be used to manage all these services together:
To access the dashboard list in NG|Discover, use the Dashboard menu in the left sidebar.
The view shows a list of all dashboards available in NG|Screener.
For detailed information on how to create and edit dashboards see Kibana User Guide:
https://www.elastic.co/guide/en/kibana/current/dashboard.html.
The main configuration of the dashboards and forensic views (this does not apply to control
dashboards) accessible from the left menu in NG|Screener is located at:
/etc/ng-screener/common/forensicMenu.json.
When creating such a file for a new tenant, one has to make sure it is
readable by the ng-screener user.
{
  "title": "sidebar.menu.forensic.violations",
  "forensicView": "violations",
  "iconClass": "forensic-icon forensic-violations-icon",
  "dashboardMapping": ["dashboard_ngv_[username]", "dashboard_ngv_[rolename]", "dashboard_ngv"],
  "displayMenu": false,
  "maxPeriod": "1y",
  "order": 0
}
The Home Dashboard is displayed as the default page of the NG|Screener UI after user login.
Its definition is stored in NG|Discover under the name dashboard_ngv and can be customized
per user or role.
Controls may use dashboard templates to show execution output. A dashboard definition
may be provided as JSON in the control editor, or a predefined dashboard from NG|Discover
may be used (see the NG|Screener User Guide).
Starting with version 6.1, you need to limit the maximum analysis period per dashboard.
This is configurable per dashboard because each has a different complexity. For
example, a simple dashboard can provide analysis over 1 year, whereas a very complicated
one should be limited to only 1 month, or even 1 day, to be displayable in a reasonable time.
In this file you will find an entry for every forensic view, and for each view you need to set a
correct value (depending on the client system’s performance) for the maxPeriod parameter.
"maxPeriod": "3M"
The default value (installed with the RPM) is 3 months. But if the client has a very large
number of events, you need to tune this value so that the forensic view responds in a
reasonable time (<10s). The possible units are:
• Y for years
• M for months
• D for days
• H for hours
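A small sketch of how such maxPeriod values could be interpreted (the parser below is illustrative, not the actual NG|Screener implementation; month and year lengths are approximations):

```python
import re

# Rough number of days per maxPeriod unit (Y/M/D/H as listed above);
# month and year lengths are approximations.
UNIT_DAYS = {"Y": 365.0, "M": 30.0, "D": 1.0, "H": 1.0 / 24}

def max_period_days(value):
    # Accept both "3M" and "1y" style values (the guide uses both cases).
    m = re.fullmatch(r"(\d+)([YMDHymdh])", value.strip())
    if not m:
        raise ValueError(f"bad maxPeriod: {value!r}")
    count, unit = int(m.group(1)), m.group(2).upper()
    return count * UNIT_DAYS[unit]

print(max_period_days("3M"))  # 90.0
print(max_period_days("1y"))  # 365.0
```

This makes it easy to sanity-check, for instance, that a dashboard's maxPeriod does not exceed the retention of the underlying indices.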
Based on https://www.elastic.co/guide/en/elasticsearch/guide/current/_preventing_combinatorial_explosions.html
If a visualization has multiple levels of aggregations, you need to change the search
strategy from depth-first to breadth-first. To do so, add the following JSON:
{
"collect_mode" : "breadth_first"
}
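As a sketch of where this snippet belongs: the collect_mode flag goes inside the terms block of the outer aggregation of a multi-level aggregation. The field names below are illustrative only, borrowed from the business attribute naming convention used elsewhere in this guide:

```python
# A two-level terms aggregation with breadth-first collection on the
# outer level. Field names (source_user, source_domain) are examples.
agg_body = {
    "size": 0,
    "aggs": {
        "by_user": {
            "terms": {
                "field": "source_user",
                "collect_mode": "breadth_first",  # instead of the default depth_first
            },
            "aggs": {
                "by_domain": {
                    "terms": {"field": "source_domain"}
                }
            },
        }
    },
}

print(agg_body["aggs"]["by_user"]["terms"]["collect_mode"])  # breadth_first
```

Breadth-first collection prunes low-ranking outer buckets before descending into the inner aggregation, which is what prevents the combinatorial explosion.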
18.1. Introduction
18.2. Structure
The Business Data Model contains four types of records: Layers, Channels, Transactions and
Violations. All of them share a common set of fields, which allows for the creation and
navigation of forensic views regardless of the actual type of the data. Beyond that, each
data type has its own specific set of fields. It is important to note that this is not an
exhaustive set - it just defines the minimum set of attributes that the service has to
provide. The diagram below shows the exact set of fields and their common subset (Figure
Business Data Model structure):
18.3. Mapping
To check if the services needed by NG|Screener are running, run the following:
• Check NG|Screener:
This section provides information about the location of all NG|Screener configuration files.
The individual configuration files are grouped into the following categories:
19.3.1. General
19.3.2. Licensing
19.3.5. Feeding
19.3.6. Control
19.3.7. Updater
• /etc/ng-screener/daemon/modules/realtimeAnalysis.conf: restart
NG|Screener to apply changes
19.3.9. UI
tail -f /var/log/ng-screener/daemon-all.log
Since it’s a Java application, errors usually include a stack trace of where the error
happened. If an issue occurred, please send this file to support@netguardians.ch for
assistance.
• Disk space: Check the amount of available disk space with df -h.
• Resource usage: Check the load of the system with top. The load average indicator
should be below 1.00 in normal circumstances.
top - 15:12:07 up 11 days, 23:22, 3 users, load average: 0.00, 0.00, 0.00
Tasks: 121 total, 2 running, 119 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.3%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 2058764k total, 1965376k used, 93388k free, 445544k buffers
Swap: 8193140k total, 80384k used, 8112756k free, 653448k cached
In some time zones MariaDB is not able to use the system time zone defined in the system.
The symptoms include failing connection attempts when using JDBC (CLI access usually
works fine however), for instance in the event tracking handling scripts, throwing an
exception/message which looks like the following:
One solution is to explicitly set the server’s time zone on the client’s JDBC connection URL.
Apart from being conceptually awkward (the client side having to know where the server is
located geographically), this solution only works if all connections to the database are made
sure to assume the same server time zone (including those not made through any JDBC
layer, directly using the command line interface). The following parameter can be added to
the connection URL: &serverTimezone=UTC.
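As a sketch, appending the parameter to a connection URL (host, port, database and user below are placeholders, not the actual NG|Screener settings):

```python
def with_server_timezone(jdbc_url, tz="UTC"):
    # Append serverTimezone, using '?' or '&' depending on whether the
    # URL already carries query parameters.
    sep = "&" if "?" in jdbc_url else "?"
    return f"{jdbc_url}{sep}serverTimezone={tz}"

# Hypothetical connection URL for illustration only.
url = with_server_timezone("jdbc:mariadb://localhost:3306/ngscreener?user=ng")
print(url)  # jdbc:mariadb://localhost:3306/ngscreener?user=ng&serverTimezone=UTC
```

As the text notes, this client-side fix must be applied to every JDBC connection, which is why fixing the server's time zone is preferable.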
A better solution is to set a proper time zone on the server itself without expecting any
configuration from the clients. Since the system time zone in our case does not seem to be
recognized properly, it has to be explicitly set. Three steps are necessary:
The script shown above can also be used to upload time zone
information one by one, if only one is missing, for instance, or if it is not
acceptable to upload all system-known time zones into the database; as
a second, optional parameter, the script also accepts the time zone
name to be loaded from a directory (given as first parameter).
2. Select the right time zone and set it in the server’s configuration
[mysqld]
default_time_zone=Europe/Zurich
Once this has been done, connections to the database using the JDBC API should occur
normally, with the right time zone conversions between client and server if necessary.
In usual circumstances, when the configured partition size is big enough and there is
enough data, all partitions are expected to be balanced. That does not mean, however, that
partitions can never become hugely imbalanced. In such a case, two configuration
parameters in /etc/ng-screener/common/ngStorage.conf can be used to fine-tune this
behavior:
If an aggregation with multiple terms does not seem to take all data into account, have a
look at the daemon logs for messages like:
Normally, in this folder you should have a file called netguardians.conf.rpmsave. You
simply need to restore it and restart httpd.ngc (systemctl restart httpd.ngc).
Like a car, ngStorage was designed to allow its users to get up and running quickly, without
having to understand all of its inner workings. However, it’s only a matter of time before
you run into engine trouble here or there. This chapter will walk through five common
ngStorage challenges, and how to deal with them.
19.9.1. Problem #1: My cluster status is red or yellow. What should I do?
Cluster status is reported as red if one or more primary shards (and their replicas) are
missing, and yellow if one or more replica shards are missing. Normally, this happens when
a node drops off the cluster for whatever reason (hardware failure, long garbage collection
time, etc.). Once the node recovers, its shards will remain in an initializing state before
they transition back to active status.
The number of initializing shards typically peaks when a node rejoins the cluster, and then
drops back down as the shards transition into an active state, as shown in the graph below.
However, if you notice that your cluster status is lingering in red or yellow state for an
extended period of time, verify that the cluster is recognizing the correct number of
ngStorage nodes.
If the number of active nodes is lower than expected, it means that at least one of your
nodes lost its connection and hasn’t been able to rejoin the cluster. To find out which
node(s) left the cluster, check the logs (located by default in
/var/log/ng-screener/ngstorage/NGELK.log) for a line similar to the following:
Reasons for node failure can vary, ranging from hardware or hypervisor failures, to out-of-
memory errors. Check any of the monitoring tools outlined here for unusual changes in
performance metrics that may have occurred around the same time the node failed, such
as a sudden spike in the current rate of search or indexing requests. Once you have an idea
of what may have happened, if it is a temporary failure, you can try to get the disconnected
node(s) to recover and rejoin the cluster. If it is a permanent failure, and you are not able to
recover the node, you can add new nodes and let ngStorage take care of recovering from
any available replica shards; replica shards can be promoted to primary shards and
redistributed on the new nodes you just added.
However, if you lost both the primary and replica copy of a shard, you can try to recover as
much of the missing data as possible by using ngStorage’s snapshot and restore module. If
you’re not already familiar with this module, it can be used to store snapshots of indices
over time in a remote repository for backup purposes.
19.9.2. Problem #2: Help! Data nodes are running out of disk space
If all of your data nodes are running low on disk space, you will need to add more data
nodes to your cluster. You will also need to make sure that your indices have enough
primary shards to be able to balance their data across all those nodes.
However, if only certain nodes are running out of disk space, this is usually a sign that you
initialized an index with too few shards. If an index is composed of a few very large shards,
it’s hard for ngStorage to distribute these shards across nodes in a balanced manner.
ngStorage takes available disk space into account when allocating shards to nodes. By
default, it will not assign shards to nodes that have over 85 percent of their disk in use,
and it will switch the ngStorage node to read-only mode.
There are two remedies for low disk space. One is to remove outdated data and store it off
the cluster. This may not be a viable option for all users, but, if you’re storing time-based
data, you can store a snapshot of older indices’ data off-cluster for backup, and update the
index settings to turn off replication for those indices.
The second approach is the only option for you if you need to continue storing all of your
data on the cluster: scaling vertically or horizontally. If you choose to scale vertically, that
means upgrading your hardware. However, to avoid having to upgrade again down the line,
you should take advantage of the fact that ngStorage was designed to scale horizontally. To
better accommodate future growth, you may be better off reindexing the data and
specifying more primary shards in the newly created index (making sure that you have
enough nodes to evenly distribute the shards).
Another way to scale horizontally is to roll over the index by creating a new index, and using
an alias to join the two indices together under one namespace. Though there is technically
no limit to how much data you can store on a single shard, ngStorage recommends a soft
upper limit of 50 GB per shard, which you can use as a general guideline that signals when
it’s time to start a new index or to split the index to more shards.
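That soft limit gives a quick rule of thumb for sizing a new index (a sketch, not an official formula):

```python
import math

SOFT_LIMIT_GB = 50  # recommended soft upper limit per shard (see above)

def recommended_shards(index_size_gb, limit_gb=SOFT_LIMIT_GB):
    # At least one shard, and enough of them to keep each under the limit.
    return max(1, math.ceil(index_size_gb / limit_gb))

print(recommended_shards(120))  # 3
print(recommended_shards(40))   # 1
```

Combine this with the expected growth of the index: it is far cheaper to pick the right shard count up front than to reindex later.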
ngStorage comes pre-configured with many settings that try to ensure that you retain
enough resources for searching and indexing data. However, if your usage of ngStorage is
heavily skewed towards writes, you may find that it makes sense to tweak certain settings
to boost indexing performance, even if it means losing some search performance or data
replication. Below, we will explore a number of methods to optimize your use case for
indexing, rather than searching, data.
Shard allocation
As a high-level strategy, if you are creating an index that you plan to update
frequently, make sure you designate enough primary shards so that you can spread
the indexing load evenly across all of your nodes. The general recommendation is to
allocate one primary shard per node in your cluster, and possibly two or more
primary shards per node, but only if you have a lot of CPU and disk bandwidth on
those nodes. However, keep in mind that shard overallocation adds overhead and may
negatively impact search, since search requests need to hit every shard in the index.
On the other hand, if you assign fewer primary shards than the number of nodes, you
may create hotspots, as the nodes that contain those shards will need to handle more
indexing requests than nodes that don’t contain any of the index’s shards.
The first thing that new users do when they learn about shard overallocation is to say to
themselves:
I don’t know how big this is going to be, and I can’t change the index size later on, so to
be on the safe side, I’ll just give this index 1,000 shards…
One thousand shards—really? And you don’t think that, perhaps, between now and the time
you need to buy one thousand nodes, you may need to rethink your data model once or
twice and have to reindex?
• A shard is a Lucene index under the covers, which uses file handles, memory, and CPU
cycles.
• Every search request needs to hit a copy of every shard in the index. That’s fine if every
shard is sitting on a different node, but not if many shards have to compete for the same
resources.
• Term statistics, used to calculate relevance, are per shard. Having a small amount of
data in many shards leads to poor relevance.
If you have a 4-node cluster with 16 GB of memory on each node, your max
number of shards will be 1600.
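The note above implies a budget of roughly 25 shards per GB of node memory (4 nodes × 16 GB × 25 = 1600). A minimal sketch of that rule of thumb:

```python
# Implied by the example above: 4 nodes * 16 GB * 25 shards/GB = 1600.
SHARDS_PER_GB = 25

def max_shards(nodes, mem_gb_per_node, per_gb=SHARDS_PER_GB):
    # Upper bound on the total number of shards the cluster should carry.
    return nodes * mem_gb_per_node * per_gb

print(max_shards(4, 16))  # 1600
```

Staying under this bound keeps file handles, memory and CPU overhead per shard manageable.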
You launched a control a while ago and, based on daemon-all.log, you do not see any
progress logs. The root cause may be that Mesos is unable to assign resources to Spark.
If no other control runs and both values are equal to 0, then you have a Spark
misconfiguration. Verify in spark-default.conf or in the global.env file how much
memory is set for each Spark executor. You probably set a value higher than the amount of
CPUs/memory available (idle state).
Most of the time in POC mode, we have only 1 node but a big amount of data.
20.1. Introduction
This chapter provides information about NG|Screener and open-source licenses, and the
location of the sources of open-source software used by NetGuardians.
20.2. NG|Screener
The configuration allows specifying the required password complexity. The parameters
below allow for fine-grained tuning:
Setting lcredit, ucredit, dcredit or ocredit to a negative value makes that parameter not
count towards the total complexity.
Example:
21.1.1. Important
In order to enforce the password policy for the root user (which is strongly suggested), it is
necessary to add the "enforce_for_root" parameter.
This appendix provides an example of static data configuration files in version 5.0 and the
corresponding configuration files in versions 6.0 and 7.0. It thus provides a clue on how to
migrate static data from version 5.0 to version 6.0 and above.
<storerPath value="/data/staticdata/"/>
<translators name="userid2name">
    <sources>
        <source service="temenosT24Protocol"/>
    </sources>
    <translator name="translator1" connector-ref="connector_jdbc_oracle">
        <cache size="1000000">
            <load-query>select user_id, username from account</load-query>
        </cache>
        <initials>
            <initial id="userid" name="Initiator_User_Name"/>
        </initials>
        <target name="Initiator_User_UserId" action="Replace"/>
    </translator>
</translators>
The above configuration from version 5.0 and lower, when migrated to version 6 and above
needs to be split into two files. One is the cache configuration file to construct caches and it
is located in /etc/ng-screener/common/referencedata/. The other is the translator
configuration file to translate events, which is located:
<cacheconfig>
    <cachegroup name="account_group">
        <query>
            select user_id, username, branch_id from account
        </query>
        <cache name="account">
            user_id -> username, branch_id
        </cache>
    </cachegroup>
    <cachegroup name="branch_group">
        <query>
            select branch_id, branch_name from branch
        </query>
        <cache name="branch">
            branch_id -> branch_name
        </cache>
    </cachegroup>
</cacheconfig>
And below is the corresponding translator configuration file for version 6.x:
<translatorconfig>
    <sources>
        <source>temenosT24Protocol@*</source>
    </sources>
    <translator>
        <key>Initiator_User_Name=user_id</key>
        <value>Initiator_User_UserId=username</value>
        <value>Initiator_Process_Pid=branch_id</value>
        <value action="append">Initiator_User_Domain=branch_id.branch_name</value>
    </translator>
</translatorconfig>
And finally here comes the corresponding translator configuration file for version 7.x and
above (using business attributes and their naming convention, all lowercase):
<translatorconfig>
    <translator>
        <key>source_user=user_id</key>
        <value>source_user_id=username</value>
        <value>source_pid=branch_id</value>
        <value action="append">source_domain=branch_id.branch_name</value>
    </translator>
</translatorconfig>
• Static data had only one configuration file named config.xml, while the new version
has multiple cache and translator configuration files.
• Each connector tag in static data is migrated to a datasource tag in separate cache
configuration files in reference data.
• connectionRetryDelay tag in static data is provided in minutes, whereas in
reference data it is provided in seconds.
• storerPath tag is moved to cacheLocation property in referenceData.conf.
• For translators with the same source, we can aggregate them into one translator
configuration file.
• Load queries on the same table in different translators can be merged into one query in
cache configuration file
• The notion of cache size in static data is not valid anymore in the new reference data; we
currently use the attribute inMemorySize to keep part of the cache in memory, and the rest
is kept on disk. Leaving this attribute at its default value should be sufficient in most cases.
• The notion of orderId of translators in static data is not valid anymore. Events will
be translated in the order defined in the translator configuration file.
Static data from version 6.0 is compatible with version 7.0. The only difference is that static
data should be loaded now from /etc/ng-screener/common/referenceData instead of
/etc/ngscreener/common/referenceData.
/etc/ng-screener/daemon/modules/executor.conf
This new file was added to deal with the so-called executor module (able to launch Spark
jobs). Its content is as follows:
/etc/ng-screener/common/ng-screener.conf
The encoding of the logs in log-collector is not configurable any more (it is always UTF-8).
Therefore, the following variables have now disappeared:
• SyslogStorageFileReadEncoding
• cacheLocation
/etc/ng-screener/daemon/modules/{forensic,feeding}.conf
This file was renamed and underwent some modifications. Most notably, all NRT and
forensic-related configuration variables were removed, namely:
• forensicSessionTimeoutInMillis
• controlSessionTimeoutInMillis
• threadPoolSizeNormalization
• jdbcDriver
• serverAddress
• serverPort
• dbUsername
• dbPassword
• all nrt*
# A unique string that identifies the consumer group this consumer belongs to
# Default: ngDaemon
#kafkaGroupId = ngDaemon
# The frequency in milliseconds that the consumer offsets are auto-committed to Kafka
# if kafkaEnableAutoCommit is set to true
# Default: 1000
#kafkaAutoCommitIntervalMs = 1000
# The timeout used to detect consumer failures when using Kafka's group management
# facility
# Default: 15000
#kafkaSessionTimeoutMs = 15000
# The maximum delay between invocations of poll() when using consumer group management
# Default: 30000
#kafkaMaxPollIntervalMs = 30000
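The commented-out lines above follow the usual key = value convention, where a value commented out keeps its default. A minimal sketch of how such a file could be read (a hypothetical parser, not the actual daemon code):

```python
def parse_conf(text, defaults):
    """Parse 'key = value' lines; commented or absent keys keep their defaults."""
    settings = dict(defaults)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # comments and blank lines are ignored
        key, _, value = line.partition('=')
        settings[key.strip()] = value.strip()
    return settings

defaults = {'kafkaGroupId': 'ngDaemon', 'kafkaSessionTimeoutMs': '15000'}
conf = parse_conf('kafkaGroupId = myGroup\n#kafkaSessionTimeoutMs = 20000\n', defaults)
print(conf['kafkaGroupId'])           # myGroup
print(conf['kafkaSessionTimeoutMs'])  # 15000
```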
/etc/ng-screener/common/controlCommon.conf
Here, as control templates and forensic templates completely changed forms between the
two releases, the following variables were removed:
• controlTemplatesDirectoryPath
• forensicTemplatesDirectoryPath
The target description templates were moved from
/usr/local/ng-screener/targetTemplates
to
/etc/ng-screener/daemon/modules/targetDescriptionTemplates
/etc/ng-screener/daemon/modules/control.conf
In this file, the following variables were removed:
• nrtConcurrentOnlineControls
• nrtConcurrentScheduledControls
• joinScriptsEnabled
• joinScriptsDirectoryPath
• joinScriptExecutionTimeout
Some new configuration variables were added, related to the new way controls are now run
(using Python scripts, Spark and Mesos…):
# The root path to store the results when executing spark script
# Those results will be used to fill jasper report
# Default: /data/control
#controlResultPath = /data/control
# The connection used to connect to the Thrift server to read spark result
# Thrift server is located at the same server as NgProcessing with
# default port 10000
# Default: jdbc:hive2://localhost:10000
#hiveConnection = jdbc:hive2://localhost:10000
/etc/ng-screener/common/security.conf
In those files, a new attribute is now present, called indexPattern, which may take one of
the following values:
/etc/httpd/conf.d/netguardians.conf
SSLCipherSuite EECDH+AESGCM:EDH+AESGCM:AES256+EECDH:AES256+EDH
To allow direct access to the Spark and Mesos consoles from the NG|Screener UI, some
new statements were added:
ProxyHTMLEnable On
ProxyHTMLExtended On
SetOutputFilter INFLATE;proxy-html;DEFLATE;
ProxyHTMLURLMap /static/ /sparkhistory/static/
# Proxy to mesos UI
RewriteRule "^/master/(.+)" "/mesos/master/$1" [R]
RewriteRule "^/metrics/(.+)" "/mesos/metrics/$1" [R]
/etc/httpd/conf.d/httpd.ngc.conf
For security reasons, the following statement was added, because the TRACE HTTP verb is
not used at all:
/etc/syslog-ng-rules/syslog-ng.conf
Due to violations now being written to the log collector and the new NG|Messaging product,
the following statements were added.
#Kafka destination
destination d_kafka_normal_r {
program("/usr/local/ng-screener/tools/kafkacat -P -b 127.0.0.1 -t ng-syslogEvents -z snappy" template(dt_default_r) );
};
destination d_kafka_normal_s {
program("/usr/local/ng-screener/tools/kafkacat -P -b 127.0.0.1 -t ng-syslogEvents -z snappy" template(dt_default_s) );
};
# Polling logs
log {
source(s_pipe_polling);
destination(d_file_normal_s);
destination(d_kafka_normal_s);
};
The way ngDiscover stores its objects has changed from version 6 and version 7. To support
porting objects from the old version to the new one, we developed a tool called
importDiscoverObjects.py and put it under
/usr/local/ng-screener/tools/migration-script/.
• In v7
◦ Upload the discover_objects.json file generated above to this server (assume
the file is put at /tmp/discover_objects.json)
◦ Run the following commands to import those objects to ngDiscover
cd /usr/local/ng-screener/tools/migration-script
python importDiscoverObjects.py /tmp/discover_objects.json
The following procedure only applies if the previous installation does not have multi-
tenancy. If the previous installation is multi-tenant, just skip it and apply the standard
migration.
• Uninstall ngScreener
Described here are the steps to migrate from a version without NG|Auth to a version
including it. Please read them through before starting the first step.
By default, NG|Auth is installed with only one tenant (named DEFAULT), having only one
defined role (the famous NG_Admin) and one user belonging to that role (named admin,
initial password netguardians).
If the previous installation was multi-tenant (or if the sole existing tenant’s name had been
changed from the default DEFAULT), create the missing tenant(s) in NG|Auth (tenants
correspond to so-called realms there):
To do that, one first has to grab the create_realm.zip file containing the necessary
scripts (provided with the RPM files) and run the following commands:
[root@NG-SCREENER ~] cd /tmp
[root@NG-SCREENER /tmp] unzip /path/to/create_realm.zip
[root@NG-SCREENER /tmp] create_realm/createKeycloakRealm.py \
--realm NEWREALM \
--kc-super-user superadmin \
--kc-pwd netguardians \
--auth-url https://ngscreener.bankdomain.com/auth
All added tenants now have a default admin user (initial password: netguardians).
Additionally, in case the DEFAULT tenant is not used at all on the installation, one may
want to remove it from NG|Auth entirely (we shall come to that later on).
[root@NG-SCREENER ~] cd /tmp
[root@NG-SCREENER /tmp] unzip /path/to/auth_local_migration.zip
[root@NG-SCREENER /tmp] python auth_local_migration/authMigrations.py
This will take all local (= non-tenant-related) users and roles from the previous MariaDB
database and push them (once for each tenant) into NG|Auth. Tenant-related roles are only
pushed to the relevant realm. Please see the next chapter to migrate their LDAP mappings.
It is possible that some errors are raised during the script's run. If a user or a role
already exists in the target realm, such an error will be raised, and it can safely be ignored
(for example, the admin user, which indeed already exists):
At the end of the procedure, do not forget to remove the zip file and its exploded version
from the filesystem:
This only applies in case such a configuration existed in the past. Please refer to Section
D.2 for details.
The user migration for SSO login should be transparent for Case Manager if the
authentication providers are properly configured.
212 | Appendix D: Migration from version 7.1.x to 7.2.x
NG|Screener Administration Guide
When connecting using SSO in Case Manager, the user lookup is done using the username
defined in Keycloak. If there is no match between the login on CM and the username defined
in Keycloak, a new user is created on the fly using the following attributes provided by
Keycloak: username, first name, last name and email address.
Make sure those attributes are defined for all new users, otherwise an error will occur
when you connect with the user for the first time using SSO on Case Manager.
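The on-the-fly creation described above can be pictured as follows (a hypothetical helper, not the actual Case Manager code), including the failure when a required Keycloak attribute is missing:

```python
REQUIRED_ATTRIBUTES = ('username', 'first_name', 'last_name', 'email')

def create_cm_user(keycloak_attrs):
    """Create a Case Manager user from the attributes provided by Keycloak.
    Raises if any required attribute is undefined (the error mentioned above)."""
    missing = [a for a in REQUIRED_ATTRIBUTES if not keycloak_attrs.get(a)]
    if missing:
        raise ValueError('missing Keycloak attributes: ' + ', '.join(missing))
    return dict(keycloak_attrs)

user = create_cm_user({'username': 'jdoe', 'first_name': 'John',
                       'last_name': 'Doe', 'email': 'jdoe@mybank.com'})
print(user['username'])  # jdoe
```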
Please make sure to have migrated local users prior to executing that
step.
As already mentioned above, the NG|Screener applications determine which tenant a user
is connected to from the hostname used to access the application.
In the case where the SSL certificate covers all existing tenants (for instance in a bank
where tenants correspond to business units, all associated with the same top-level domain
name mybank.com - country1.mybank.com, country2.mybank.com… - and the
certificate covers *.mybank.com), the configuration is somewhat simpler: only two virtual
host sections are necessary in the configuration file.
<VirtualHost *:80>
ServerName country1.mybank.com
ServerAlias country2.mybank.com country3.mybank.com
RewriteEngine On
</VirtualHost>
<VirtualHost *:443>
ServerName country1.mybank.com
ServerAlias country2.mybank.com country3.mybank.com
SSLEngine On
SSLCertificateFile ...path-to-the-ssl-certificate-file...
SSLCertificateKeyFile ...path-to-the-certificate-key-file...
#############################################################
# HEADERS
# Change the name of the tenant and duplicate for each tenant
RequestHeader set X-NG-TENANTID "COUNTRY1" "expr=%{HTTP_HOST} == 'country1.mybank.com'"
RequestHeader set X-NG-TENANTID "COUNTRY2" "expr=%{HTTP_HOST} == 'country2.mybank.com'"
RequestHeader set X-NG-TENANTID "COUNTRY3" "expr=%{HTTP_HOST} == 'country3.mybank.com'"
ProxyPreserveHost On
...
# Proxy to ngAuthServer
ProxyPass /auth/ http://127.0.0.1:9090/auth/
ProxyPassReverse /auth/ http://127.0.0.1:9090/auth/
...
</VirtualHost>
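The RequestHeader rules above boil down to a host-to-tenant lookup. In Python terms (an illustrative sketch only; the real mapping lives in the Apache configuration):

```python
# Host header to tenant id, mirroring the RequestHeader statements above.
TENANT_BY_HOST = {
    'country1.mybank.com': 'COUNTRY1',
    'country2.mybank.com': 'COUNTRY2',
    'country3.mybank.com': 'COUNTRY3',
}

def tenant_for_host(host):
    """Return the X-NG-TENANTID value for a given Host header, or None."""
    return TENANT_BY_HOST.get(host)

print(tenant_for_host('country2.mybank.com'))  # COUNTRY2
```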
If each tenant is actually covered by a specific SSL certificate, then a solution is to duplicate
the netguardians.conf file, once for each tenant. Each copy of the file can then be
adapted specifically for each tenant.
Only configuration files whose name ends with .conf will be taken into
account by the httpd.ngc service.
<VirtualHost *:80>
ServerName ngscreener.mybank.com
RewriteEngine On
</VirtualHost>
<VirtualHost *:443>
ServerName ngscreener.mybank.com
SSLEngine On
SSLCertificateFile ...path-to-the-ssl-certificate-file...
SSLCertificateKeyFile ...path-to-the-certificate-key-file...
#############################################################
# HEADERS
# Change the name of the tenant
RequestHeader set X-NG-TENANTID "MYBANK"
ProxyPreserveHost On
...
# Proxy to ngAuthServer
ProxyPass /auth/ http://127.0.0.1:9090/auth/
ProxyPassReverse /auth/ http://127.0.0.1:9090/auth/
...
</VirtualHost>
We could imagine the above example duplicated for another tenant with, in the second copy:
The following command should be used to import each referenced certificate into the Java
keystore.
In order to configure LDAP users and/or roles, one must access the specific NG|Auth UI.
This can be done in two ways, depending on the current state of the installation.
• if the new version of the NG|Screener packages was not installed yet, the only way to
access the NG|Auth administration console is through the use of an SSH tunnel.
From a local shell (Linux machine or Windows machine with Cygwin installed) or using a
specific PuTTY session configuration (Windows machine), a local port has to be mapped
to the appliance's port 9090 (bound only on the loopback interface, i.e. on localhost).
As an example, the following command binds local port 9999 to the appliance's port
9090, so that using the http://localhost:9999/auth/admin URL in a local
browser will connect to the appliance's NG|Auth administration console:
• if the new version of the NG|Screener packages was installed already, and the
httpd.ngc service restarted, then the administration console may be reached directly
on the appliance's public network address at the following URL:
https://appliance.client.com/auth/admin
Right after the login page, choose the tenant (denoted realm in NG|Auth) for which the
LDAP configuration should occur.
Once the tenant/realm has been chosen (the steps will have to be taken for each concerned
tenant), click on the User Federation section on the left pane, and choose a new LDAP
provider.
provider.
Fill in the presented properties, making sure to include at least the following set (and read
the tooltips that pop up when hovering the mouse pointer):
The password entered here should be the plain-text version, not the
potentially encrypted form present in the
/etc/ng-screener/common/security.conf configuration file.
Encryption/hashing will now occur at NG|Auth level.
• Periodic Full Sync and Periodic Changed Users Sync: consider activating
one or both of these settings to enable synchronization between the LDAP provider and
NG|Auth (when users are created, removed…)
When all fields have been filled properly, click the Save button to make the setting
persistent. From this moment, a new Mappers tab appears for the LDAP configuration.
The role names to which the LDAP groups will be mapped are taken from
an attribute on the LDAP group.
This constraint did not exist in previous versions of NG|Screener where role
names could be set arbitrarily, independently of the actual LDAP groups'
attribute values.
• Name: unique name for the mapper, no specific meaning except for documentation (and
maintainability), mandatory attribute
• Mapper Type: should be role-ldap-mapper
• LDAP Roles DN: DN of the hierarchy where the LDAP groups may be found (does not
have to cover all groups, as several mappers of the same type may be added to a LDAP
provider, each of them potentially covering only part of the whole spectrum)
• Role Name LDAP Attribute: name of the attribute found in the LDAP group whose
value is used to build the role name
• Role Object Classes: LDAP class(es) to qualify an LDAP entry as a group in the
given hierarchy of objects
Once connected to the NG|Auth admin console (see previous chapter), one can destroy
realms that are no longer used, by clicking on the small bin icon next to the realm
name (highlighted in red on the following screenshot).
This brings the huge advantage of allowing Single Sign-On functionality on the appliance, but
it also has an impact on the various scripts that want to connect to those applications as
well.
Especially for Python scripts, a new class has been defined which intends to abstract those
SSO considerations away when authenticating to the applications (to access their respective
REST API endpoints, for instance).
E.1. Authentication
• line 1: import of the AuthToken class (it may be necessary to either put the
authtoken.py file in a system location or explicitly add its location to the system
path, using a construct similar to
import sys
sys.path.insert(0, '/usr/local/ng-screener/tools/auth')
• lines 4 to 6: user (existing in the tenant defined at line 6) name and password are required
Examples:
The following example calls for the list of available channels from the NG|Screener
application (through the UI application) and displays a pretty-printed JSON representation
of the returned list on the standard output:
The following example calls for the list of existing issues from NG|CaseManager and
dumps them to the standard output:
1 import json
2
3 ...
4
5 url = 'https://ngscreener.mybank.com/cm/issues.json'
6 headers = {
7 'X-Redmine-API-Key': '862af85646b3a929d94b7601a72c33eba52e4a5d'
8 }
9
10 with AuthToken(**param) as token:
11 response = token.get_call(url, headers)
12 if response.status_code == 200:
13 print json.dumps(response.json(), indent=2, separators=(',', ': '))
14 else:
15 print 'Error code returned: %d' % response.status_code
E.4. Troubleshooting
Normally, each time a request is sent through the token instance, the OAuth2
authentication token is refreshed if it happens to have expired since the last request was
made. This refresh is only attempted once (per request), when the request's return code is
401. If another reason was responsible for the 401 return code, the second attempt will fail
again, token refreshing will not be attempted again, and the 401 response will be
transmitted to the caller.
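The refresh-once behaviour can be sketched as follows (FakeSession is a stand-in for the real AuthToken instance, whose interface is assumed here for illustration):

```python
class FakeSession:
    """Stand-in for the AuthToken wrapper; returns pre-canned status codes."""
    def __init__(self, codes):
        self.codes = list(codes)
        self.refreshes = 0

    def get(self, url):
        return self.codes.pop(0)

    def refresh_token(self):
        self.refreshes += 1

def get_with_refresh(session, url):
    """Send a request; on a 401, refresh the token and retry exactly once."""
    status = session.get(url)
    if status == 401:
        session.refresh_token()
        status = session.get(url)
    return status

# Token expired: the first call returns 401, the retry succeeds.
session = FakeSession([401, 200])
print(get_with_refresh(session, '/cm/issues.json'))  # 200
```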
To get more information about what is actually going on, it may be worth activating the so-
called DEBUG mode, which dumps a lot of information to the standard output.
To activate this DEBUG mode, just configure Python's logging mechanism accordingly:
1 import sys
2 import logging
3
4 logging.basicConfig(stream=sys.stdout,
5 format='%(asctime)s [%(levelname)s] %(message)s',
6 level=logging.DEBUG)
To allow the new client's access tokens to grant access to other applications, one needs
so-called client scopes, preferably one client scope per targeted application (so that they
can be re-used and mixed for several clients if necessary later).
By convention, the existing client scopes created specifically for this purpose on our
solution are named targetApplication-audience. Per default only the ngDaemon-
To create a new one, please follow these steps (example here with an ngBrowser-
audience client scope):
1. Fill the scope name, check that the protocol is openid-connect and save
2. Switch to the Mappers tab and add a new mapper of type Audience (the name is
mandatory and should be explicit, obviously, but does not have to match the client
scope’s), for which the Included Client Audience field should be filled with the
client representing the target application (ngBrowser in our example).
2. Give it a new name (this name will be its so-called client_id) and save
4. Go to the Client Scopes tab and select the required client scopes from the
Available Client Scopes section before clicking on the Add selected button
Done! This client (through its client id) can now be used as seen in Section E.1.
The first reason for this framework comes from the Event Handling solution.
The Event Handling solution runs after control execution, and for a good reason:
the goal of Event Handling is to analyze multiple hits generated by some profiling controls
to validate whether a hit is really a hit (e.g. ControlA and ControlB and ControlC raise a hit,
so this is really a hit. Result: create a hit in CM).
Due to this fact, we cannot use the standard way to create a hit (the NGScreener target
solution). To solve this problem, a framework has been put in place to create a hit with
Python code (the same language as the Event Handling solution).
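The validation idea from the example above (a hit only counts when several controls agree) can be reduced to a short check. This is an illustrative sketch, not the actual Event Handling code; the control names are the ones from the example:

```python
def is_real_hit(raised_by, required=('ControlA', 'ControlB', 'ControlC')):
    """A hit is confirmed only if every required control raised it."""
    return all(control in raised_by for control in required)

print(is_real_hit({'ControlA', 'ControlB', 'ControlC'}))  # True
print(is_real_hit({'ControlA'}))                          # False
```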
F.2. Prerequisites
The Python exporter framework uses the official channel/target solution for configuration
purposes. To use it, you must create a dedicated target in NGScreener.
To explain this point, we will start by analyzing some code and explaining it.
6.  import sys
7.  if __name__ == "__main__":
8.      target_manager = ng_target.TargetManagement()
9.      channel_json = target_manager.get_channel("Case Manager")
10.     c = channel.Channel()
11.     c.parse_channel(channel_json, target_manager.get_target(channel_json, 'Pr02UnusualLocation'))
12.     parser = ng_parser.Parser()
13.     parser.template_filename = "/usr/local/ng-screener/targetTemplates/" + c.target.custom_description.template
14.     parser.parse_arguments(sys.argv[1:])
Let us start with the first 5 lines. These lines are mandatory; all these imports provide the
necessary code to run the hit exportation.
Line 6 is a contextual import (it depends on the use case). The explanation for it will come
with line 14.
Now we enter the real subject. We will start with the channel and target part:
• At line 8, we initialize an object to manage a target element. Basically, this class
provides two important things:
◦ get_channel: with it, we can recover a dedicated channel based on its name.
◦ get_target: with it (and only when we have a channel), we can recover a dedicated
target based on its name.
• At line 9, we use the get_channel method (described previously) to recover all the
information regarding the channel (Case Manager in this case), and we also get all the
targets related to the channel.
• At line 10, we initialize an object to manage the channel and target elements received from
the TargetManagement object.
• At line 12, we initialize a parser object. This object has two main goals:
◦ The first is to parse the HTML template to generate the description render.
◦ The second is the replacement of the special tokens by the control/source values
(e.g. in the target definition, we define that custom field 3 receives the value of
column 6 of the control).
• At line 13, we provide the HTML template source file.
• At line 14, this is a specific line of code. In our example, we don't use the Event Handler
solution but the NGScreener Command channel. This line parses the command
arguments to parse the file (basically, in shell script we have the special variables $1 .. $n,
but not in Python; to parse them, we go through this method).
• At line 15, we initialize an object to map the data from the source file into a dedicated
object.
• At line 16, we do the mapping explained in the previous line.
• At line 17, we have all the required elements to create a custom description. Therefore,
we initialize an object to create it at this stage.
Now that we have seen a complete example of how to use this framework, we will see
which parts could change in this example.
The first thing could be line 14. This part depends on the information source (in our
example, a CSV that comes from the control result). Depending on your source, this might
have to change.
The second thing is lines 19-20: you could send to different targets depending
on a certain value in the result (in this case, don't forget to instantiate multiple
channel and target elements).
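Routing rows to different targets based on a result value could look like this (the threshold, the field and the target names are purely illustrative):

```python
def route_target(row, targets):
    """Pick a target name based on a value in the result row."""
    return targets['priority'] if row['amount'] > 10000 else targets['default']

targets = {'priority': 'Pr02UnusualLocationHigh',
           'default': 'Pr02UnusualLocation'}
print(route_target({'amount': 25000}, targets))  # Pr02UnusualLocationHigh
print(route_target({'amount': 100}, targets))    # Pr02UnusualLocation
```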
F.4. Conclusion
With this framework, you have more granularity to manage hit creation in
NGCaseManager based on result values.
• Normalization
• Control Execution
G.1. Normalization
• Syslog doesn't care about timezones; it takes the date/time and writes them into
ngMessaging and log-collector.
If the syslog process is run in Singapore with timezone GMT+8, the log sent to
ngMessaging and log-collector is:
We can see that syslog only takes the date/time presented in the message header
without considering its timezone. The event dates passed to ngMessaging and
log-collector are local to the server's timezone.
Example: the following log line in log-collector means that the event occurred at
26/01/2019 00:00:00 server time.
{
...
"@timestamp": "2019-01-26T00:00:00+08:00",
"host": "Host",
"service": "Service"
...
}
When importing logs from a remote server, try to convert their timestamps
into the server's time before sending them to syslog.
A translator scripted field may extract the hour from an event and save it as part of
the normalized event. From the users' point of view, this hour is not the one they
expect. Example: a user in Geneva does a transaction at 10am; it is stored on the
server in Singapore as 5pm, and the hour part of the event is 17 instead of
10. This leads to rewriting controls for tenants in different
timezones. It also affects aggregations based on hour-range buckets.
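The Geneva/Singapore shift in the example above can be verified with Python's zoneinfo module (a sketch for illustration; the product itself does not necessarily use this module):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# A transaction at 10am in Geneva (GMT+1 in January)...
geneva = datetime(2019, 1, 26, 10, 0, tzinfo=ZoneInfo('Europe/Zurich'))
# ...is 5pm on a server running in Singapore (GMT+8).
singapore = geneva.astimezone(ZoneInfo('Asia/Singapore'))
print(singapore.hour)  # 17
```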
Example: the user is in Geneva (GMT+1); he schedules a control to run for the first time at
2019-01-26 00:00:00. The date string on the wire is 2019-01-25T23:00:00.000Z.
• UI backend: it converts the date string in UTC format into its local date time (no
timezone information).
Example: for the example above, if the server is installed in Singapore (GMT+8), the date
is represented as 2019-01-26 07:00:00 server time.
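These two conversions (user local time to UTC on the wire, then UTC to the server's local time) can be reproduced with Python's zoneinfo module (illustrative only):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# The user in Geneva (GMT+1 in January) schedules the control for local midnight.
local = datetime(2019, 1, 26, 0, 0, tzinfo=ZoneInfo('Europe/Zurich'))

# On the wire the date travels as a UTC string.
wire = local.astimezone(timezone.utc)
print(wire.strftime('%Y-%m-%dT%H:%M:%S.000Z'))  # 2019-01-25T23:00:00.000Z

# The backend in Singapore (GMT+8) converts it back to its own local time.
server = wire.astimezone(ZoneInfo('Asia/Singapore'))
print(server.strftime('%Y-%m-%d %H:%M:%S'))  # 2019-01-26 07:00:00
```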
• if the control execution period is last N TimeUnit, that information is sent to the UI
backend; the control execution period is calculated into an absolute period by the backend,
using its local date time. Then it stores this local date time in the database.
• if the control execution period is an absolute date range: similar to the case in Section
G.3.1, the date in the user's timezone is converted into a string in UTC format, then
converted back to local date time in the server's timezone and stored in the database as
local date time without timezone.
• Control Execution: similar to the UI backend, dates presented in this module are in local
date time. When sending date information to the NgProcessing module, it converts the date
object into a string with its timezone information.
• NgProcessing: it processes the string passed from the Control Execution module, converts
it into a Python date object with timezone information, and processes it based on this date
object. All requests to NgStorage are timezone-aware.
• NgStorage: timestamps are stored as date objects with timezone information.
The control filter in the control Template Configuration and Spark custom code
are unaffected by those timezone conversions.
If it exports the whole PDF file to NgCaseManager, we have the problem described in
Section G.3.4.
If it exports each line of the report to generate a case in NgCaseManager, the date is
presented as a string without timezone information. Then a date 2019-01-26 07:00:00
on the Singapore server is displayed as 2019-01-26 07:00:00 in Geneva.
Jupyter Notebook can be set up with the help of virtualenv. This helps keep your system
clean, since you don't install system-wide libraries that you are only going to need in a
Jupyter Notebook environment.
To set up Jupyter Notebook, execute the following commands on your VM (logged in as the
ng-screener user).
#install requirements
pip install pyspark==2.4.0 jupyter
Then you can just start the jupyter notebook server by typing the command:
jupyter notebook
To access the Jupyter Notebook site locally, you need to forward port 8888 via an SSH tunnel:
To activate an already existing environment, just run the source bin/activate command in
the folder. Then you can start jupyter notebook.