IBM InfoSphere BigInsights

Data security best practices


A practical guide to implementing
data encryption for InfoSphere BigInsights

Walid Rjaibi
Chief Security Architect for Information Management
Nisanth Simon
InfoSphere BigInsights Software Developer
Monty Wright
Senior Solutions Architect, Vormetric Data Security

Issued: June 2013


1. Introduction
2. Requirements for a data encryption solution
2.1 Run-time component requirements
2.2 Key management component requirements
3. Guardium data encryption architecture
4. Installing Guardium data encryption
5. Configuring encryption policies
5.1 Creating a policy for encrypting existing data
5.2 Creating a policy for encrypting new data
7. Conclusion
Further reading
Reviewers

A practical guide to implementing data encryption for InfoSphere BigInsights Page 2 of 28


1. Introduction
Encryption is the process of transforming data into a form that only its intended
recipients can read and process. It is an effective way of protecting sensitive
information as it is stored on media or transmitted through untrusted communication
channels. Encryption is mandatory for complying with many government regulations
and industry standards such as the Payment Card Industry Data Security Standard (PCI
DSS).

In an encryption scheme, the data requiring protection (referred to as plaintext) is
transformed into an unreadable form (referred to as ciphertext) by applying an
encryption algorithm and encryption key. Encryption keys are randomly generated using
a key-generation algorithm.

There are two main encryption schemes: symmetric encryption and asymmetric
encryption. In symmetric encryption, the same key is used to encrypt and decrypt a
given piece of data. The Advanced Encryption Standard (AES) is an example of a
symmetric encryption scheme. In asymmetric encryption, data is encrypted using one
key (usually referred to as the public key) and is decrypted using another key (usually
referred to as the private key). The Rivest, Shamir, Adleman (RSA) algorithm is an
example of an asymmetric encryption scheme. In practice, asymmetric and symmetric
encryption schemes are often combined to offer an encryption solution. Generally, a
symmetric algorithm is used to protect actual data using some encryption key, and an
asymmetric algorithm is used to protect that encryption key.
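
To make the combination of the two schemes concrete, the following Python sketch walks
through hybrid encryption end to end. It is illustrative only and does not describe how
GDE is implemented. The symmetric cipher is a toy SHA-256 keystream standing in for AES,
the asymmetric cipher is textbook RSA without padding built from two well-known Mersenne
primes, and every function name is hypothetical. Neither toy is secure; a real deployment
would rely on a vetted cryptographic library.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher (stand-in for AES, NOT secure).

    XORs the data with a SHA-256 counter keystream, so the same call
    both encrypts and decrypts.
    """
    stream = b"".join(
        hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        for i in range(len(data) // 32 + 1)
    )
    return bytes(a ^ b for a, b in zip(data, stream))


# Toy asymmetric key pair: textbook RSA without padding (NOT secure).
P = 2**127 - 1                       # Mersenne prime
Q = 2**89 - 1                        # Mersenne prime
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)


def hybrid_encrypt(plaintext: bytes):
    """Encrypt the data with a fresh symmetric key, then wrap that key with the public key."""
    data_key = secrets.token_bytes(16)
    ciphertext = toy_cipher(data_key, plaintext)
    wrapped_key = pow(int.from_bytes(data_key, "big"), E, N)
    return wrapped_key, ciphertext


def hybrid_decrypt(wrapped_key: int, ciphertext: bytes) -> bytes:
    """Unwrap the symmetric key with the private key, then decrypt the data."""
    data_key = pow(wrapped_key, D, N).to_bytes(16, "big")
    return toy_cipher(data_key, ciphertext)


wrapped, ct = hybrid_encrypt(b"sensitive record")
print(hybrid_decrypt(wrapped, ct))   # prints b'sensitive record'
```

The bulk data only ever passes through the fast symmetric cipher, while the slower
asymmetric operation is applied to a 16-byte key. That division of labor is the reason
the two schemes are usually combined.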

While Transport Layer Security (TLS) is widely accepted as the solution for protecting
data in transit, no single solution has achieved similar status for protecting data at rest,
although some solutions, such as the one described in this paper, are clearly emerging as
leaders in this area.

This paper focuses on encryption for data at rest, specifically for data stored within IBM
InfoSphere BigInsights Hadoop. The rest of this paper is organized as follows. Section 2
reviews the requirements for a sound data encryption solution. Section 3 introduces IBM
InfoSphere Guardium Data Encryption (GDE). Sections 4 and 5 describe how to install
and configure GDE to protect data stored within IBM InfoSphere BigInsights Hadoop.
Lastly, we present our concluding thoughts in section 7.

2. Requirements for a data encryption solution


An encryption solution for data at rest consists of two main components: a run-time
component and a key management component. The run-time component is responsible
for the efficient encryption and decryption of data blocks for which an encryption policy
exists. Data blocks are typically protected with data encryption keys (DEKs) that are
stored locally within the run-time component. For example, in a file system, the DEK
may be stored together with the file metadata. A DEK is typically protected with a
master key that is stored in the key management component. In some encryption
solutions such as GDE, both data encryption keys and master keys are stored in the key
management component and provided to the run-time component as needed following a
well-defined secure protocol.
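
The two-level key hierarchy can be sketched in a few lines of Python. The sketch models
the typical arrangement described above, where the wrapped DEK travels with the data and
only the master key lives in the key management component. It does not model GDE, which
keeps both kinds of keys in the key management component. The KeyManager class, the
record layout, and the function names are hypothetical, and the cipher is a toy SHA-256
keystream standing in for AES (not secure).

```python
import hashlib
import secrets
from typing import Dict, Tuple


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Toy XOR keystream cipher (NOT secure); the same call encrypts and decrypts."""
    stream = b"".join(
        hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        for i in range(len(data) // 32 + 1)
    )
    return bytes(a ^ b for a, b in zip(data, stream))


class KeyManager:
    """Hypothetical key management component holding versioned master keys."""

    def __init__(self) -> None:
        self._master_keys: Dict[int, bytes] = {}
        self.current_version = 0
        self.rotate()

    def rotate(self) -> int:
        """Generate a new master key; older versions stay available for unwrapping."""
        self.current_version += 1
        self._master_keys[self.current_version] = secrets.token_bytes(32)
        return self.current_version

    def wrap(self, dek: bytes) -> Tuple[int, bytes]:
        version = self.current_version
        return version, toy_cipher(self._master_keys[version], dek)

    def unwrap(self, version: int, wrapped_dek: bytes) -> bytes:
        return toy_cipher(self._master_keys[version], wrapped_dek)


def encrypt_file(km: KeyManager, content: bytes) -> dict:
    """Run-time side: encrypt with a fresh DEK and store only the wrapped DEK."""
    dek = secrets.token_bytes(32)
    version, wrapped = km.wrap(dek)
    return {"mk_version": version, "wrapped_dek": wrapped,
            "ciphertext": toy_cipher(dek, content)}


def decrypt_file(km: KeyManager, record: dict) -> bytes:
    dek = km.unwrap(record["mk_version"], record["wrapped_dek"])
    return toy_cipher(dek, record["ciphertext"])


def rewrap(km: KeyManager, record: dict) -> dict:
    """After a master key rotation, re-protect the DEK only; the data is untouched."""
    dek = km.unwrap(record["mk_version"], record["wrapped_dek"])
    version, wrapped = km.wrap(dek)
    return {**record, "mk_version": version, "wrapped_dek": wrapped}


km = KeyManager()
record = encrypt_file(km, b"block of HDFS data")
km.rotate()
rotated = rewrap(km, record)
print(rotated["ciphertext"] == record["ciphertext"])   # prints True
print(decrypt_file(km, rotated))                       # prints b'block of HDFS data'
```

One consequence of the hierarchy shows up in rewrap: rotating the master key costs one
small unwrap and wrap per DEK, regardless of how much data that DEK protects.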

2.1 Run-time component requirements


To comply with industry standards for encryption, a run-time encryption component
should adhere to the following requirements:
• Use FIPS 140-2 level 1 certified encryption modules.
• Use NIST SP 800-131 compliant algorithms for encryption, hashing, and random
  number generation.
• Exchange data with the key management component over TLS after mutual
  authentication has been established.
• Provide a means for key rotation.
• Provide a means for encrypting database backups (for a database system).

Although not mandatory, the following are highly desirable properties of the run-time
component:
• The ability to exploit recent innovations in hardware acceleration for
  cryptography, such as the AES-NI instructions on Intel processors.
• The ability to perform in-place encryption so that existing data can be handled in a
  non-intrusive way.

2.2 Key management component requirements


To comply with industry standards for encryption, a key management
component should adhere to the following requirements:
• Support high availability so that access to data is not lost when the primary key
  management component becomes unavailable.
• Provide a means for key backup and recovery so that keys can be recovered after
  a crash or a major disruption.
• Enforce authentication and access control before returning the keys to the
  requester.
• Achieve FIPS 140-2 level 2 certification in order to meet the requirements of high
  assurance environments such as those within government agencies.

Although not mandatory, the following are highly desirable properties of the key
management component:
• Allow flexibility in authoring encryption policies (time of day, day of week,
  digital signature of executables, etc.).

3. Guardium data encryption architecture


InfoSphere Guardium Data Encryption is a comprehensive data protection solution
that meets all the requirements outlined in section 2. It manages access control to files,
directories, and executables, and provides strong encryption of file content. It consists
of two main components (figure 1):

• Security server: The central point of administration for encryption, access
  control, and audit policies.
• File system agent: Provides encryption and access control services for data in
  online storage accessed by file systems.

When InfoSphere Guardium Data Encryption is used to protect a database system such
as DB2 or Informix IDS, a backup agent is also provided. The backup agent integrates
with the database system backup command to allow the generation of encrypted
database backups. This ensures that the same data is consistently protected whether it is
online or offline.

Figure 1: InfoSphere Guardium Data Encryption Architecture

An important distinction between InfoSphere Guardium Data Encryption and other
solutions that offer encryption is how the encryption is performed. InfoSphere Guardium
Data Encryption employs a technique in which the file metadata is left in clear text
(unencrypted) while the file content is encrypted. This technique provides a level of
file access control in addition to what the file system offers: access without
viewability. Effectively, an application can be granted access to a file for the purpose of
management without decrypting its contents. Privileged super users can continue to
manage their environments and access the file, but are restricted from having clear-text
access to the file content. This capability helps mitigate risks from internal malicious
activity targeted at sensitive data.
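
Access without viewability can be pictured with a small Python model. The effect names
permit and apply_key are the ones used in the policies in section 5. Everything else is
an assumption made for illustration: the GuardedFile class, the fixed demo key, the toy
SHA-256 keystream cipher (not secure), and the reading that a permit effect without
apply_key hands back the stored ciphertext.

```python
import hashlib


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Toy XOR keystream cipher (NOT secure); the same call encrypts and decrypts."""
    stream = b"".join(
        hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        for i in range(len(data) // 32 + 1)
    )
    return bytes(a ^ b for a, b in zip(data, stream))


DEMO_KEY = hashlib.sha256(b"demo key").digest()   # hypothetical fixed key for the demo


class GuardedFile:
    """Hypothetical file whose metadata stays in clear text while its content is encrypted."""

    def __init__(self, name: str, owner: str, content: bytes) -> None:
        self.metadata = {"name": name, "owner": owner, "size": len(content)}
        self._ciphertext = toy_cipher(DEMO_KEY, content)

    def read(self, effects: set) -> bytes:
        if "permit" not in effects:
            raise PermissionError("access denied by policy")
        if "apply_key" in effects:
            return toy_cipher(DEMO_KEY, self._ciphertext)   # clear-text view
        return self._ciphertext                              # access without viewability


f = GuardedFile("part-00000", "biadmin", b"secret rows")
print(f.metadata["size"])                    # prints 11
print(f.read({"permit", "apply_key"}))       # prints b'secret rows'
print(f.read({"permit"}) == b"secret rows")  # prints False
```

An administrator holding only the permit effect can still copy, move, or back up the
file by using its metadata and raw bytes, which is the management access the paragraph
above describes.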

4. Installing Guardium data encryption
Installing the InfoSphere Guardium Data Encryption solution requires installing the
security server component and the file system agent component. The security server
needs to be installed once, on the server of your choice. The file system agent needs to be
installed on every server where you need protection. For example, if you have a
BigInsights Hadoop cluster of three nodes, then you need to install the file system agent
on each of those three nodes. The installation procedure itself is well
documented in the InfoSphere Guardium Data Encryption product documentation and is
beyond the scope of this paper. See the Further reading section at the end of this
paper for product installation documentation and hardware/software requirements. The
rest of this paper assumes that you have installed InfoSphere Guardium Data Encryption
on a supported environment. For the testing conducted as part of this paper, the
environment was InfoSphere BigInsights HDFS on Red Hat Linux. This environment did
not include GPFS.

5. Configuring encryption policies


When configuring encryption policies, you need to consider whether you have existing
data that needs to be encrypted. If so, you need to create an encryption policy that allows
you to encrypt that data in place. If you don't have any existing data in the files or
directories you are going to protect, then you can skip the step of encrypting existing
data. Your new data will be automatically encrypted by the encryption policies in place
as it is ingested into the files or directories you have protected.

5.1 Creating a policy for encrypting existing data


This section describes the steps required to encrypt existing data. In a nutshell, this
process associates an encryption policy with a directory located on a particular
node or host. In the steps below, the directory containing the data to encrypt
is /hadoop. Our BigInsights Hadoop cluster consists of three nodes:
hdtest021.svl.ibm.com, hdtest022.svl.ibm.com, and hdtest036.svl.ibm.com.

__1. Creating the policy

__a. Log in to the security server administration console as secadmin.

__b. In Manage Policies, click Add Online Policy.

__c. Click the Action button, add key_op (Key operations), and click OK.

__d. Click the Effect button, add permit & apply_key as the effects, and click OK.

__e. Click the Add button to add the rule.

__f. Open the Key Selection Rules tab, select clear_key as the key, and press Add.

__g. Open the Data Transformation Rules tab, select test-aes256-key as the key, and
press Add.

__h. Open the Security Rules tab and click the Reset button.

__i. Click Effects and add deny & audit as the effects.

__j. Click the Add button to add the rule.

__k. Save the policy as NewDataEncryptionPolicy1.

__2. Linking the policy to the host

__a. Go to the Hosts tab.

__b. Click the host name (hdtest021.svl.ibm.com).

__c. Click the Guard FS tab and click the Guard button. Add the policy and the folder
where the data has to be encrypted. All Hadoop data is stored under the /hadoop
folder.

__d. After adding the guard, refresh to ensure that the status is green.

__3. Performing the encryption on existing HDFS data

__a. Open a terminal and log in as the root user.

__b. Run secfsd -status guard

__c. Run dataxform --rekey --gp /hadoop

__d. Run dataxform --cleanup --gp /hadoop

Repeat steps __2 and __3 on the other host machines to ensure that existing data is
encrypted across all nodes.
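
Because the same three commands have to be run on every node, it can help to write the
per-node plan down before starting. The following Python sketch only prints that plan;
it does not connect to any host or execute anything. The commands, host names, and guard
point are the ones used in this section, while the function names are hypothetical. The
sketch assumes that the policy has already been linked to each host in the administration
console (step __2) before the commands are run there.

```python
from typing import Dict, List

# Nodes of the test cluster and the guarded directory used in this paper.
HOSTS = [
    "hdtest021.svl.ibm.com",
    "hdtest022.svl.ibm.com",
    "hdtest036.svl.ibm.com",
]
GUARD_POINT = "/hadoop"


def encryption_commands(guard_point: str) -> List[str]:
    """Return the commands of step __3 for one guard point, in the order given in the text."""
    return [
        "secfsd -status guard",
        f"dataxform --rekey --gp {guard_point}",
        f"dataxform --cleanup --gp {guard_point}",
    ]


def build_plan(hosts: List[str], guard_point: str) -> Dict[str, List[str]]:
    """Map each host to the commands to run on it as root."""
    return {host: encryption_commands(guard_point) for host in hosts}


if __name__ == "__main__":
    for host, commands in build_plan(HOSTS, GUARD_POINT).items():
        print(f"# on {host}, as root:")
        for command in commands:
            print(f"  {command}")
```

Keeping the plan as data makes it easy to tick off each node as it completes, so that no
node is left with unencrypted existing data.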

__4. Removing (un-guarding) the policy from the host

__a. Open the Hosts tab and click the host name (hdtest021.svl.ibm.com).

__b. Select the policy and click the Unguard button.

__c. Click the Refresh button to ensure that the policy is removed.

Repeat step __4 on the other host machines to ensure that the policy is removed from
all the nodes. At this point, all existing data is encrypted and
we are now ready to create a permanent policy for encrypting any new data that will be
ingested going forward.

5.2 Creating a policy for encrypting new data


In a nutshell, this process associates an encryption policy with a directory
located on a particular node or host. In the steps below, the directory
containing the data to encrypt is /hadoop. Our BigInsights Hadoop cluster
consists of three nodes: hdtest022.svl.ibm.com, hdtest036.svl.ibm.com, and
hdtest021.svl.ibm.com.

__1. Creating the policy

__a. Select Manage Policies and click Add Online Policy.

__b. Click the Effect button, add permit, apply_key & audit as the effects, and
click OK.

__c. Open Key Selection Rules, select test-aes256-key as the key, and press the Add
button.

__d. Press the Add button in the Security Rules tab.

__e. Click the Reset button.

__f. Click Effect, add deny & audit as the effects, and click OK.

__g. Click the Add button so that the effect is added to the security rules.

__h. Save the policy.

__i. The policy is now added to the server.

__2. Linking the policy to the host

__a. Go to the Hosts tab.

__b. Click the host name (hdtest021.svl.ibm.com).

__c. Click the Guard FS tab and click the Guard button. Add the policy and the folder
where the data has to be encrypted. All HDFS data is stored under the /hadoop
folder.

__d. After adding the guard, refresh to ensure that the status is green.

__e. Repeat step __2 on the other host machines so that the policy is linked to
all the nodes.

__3. Changing the log and host settings on all host machines

__a. Go to the Hosts tab.

__b. Click the host name (hdtest021.svl.ibm.com).

__c. Click the FS Agent Log tab and set the Policy Evaluation level to INFO.

__d. Click the OK button.

__e. Click Host Settings and add |trust|*.

__f. Click the OK button.

__g. Repeat step __3 on the other host machines so that the log and host settings
are changed on all the nodes.

__4. Adding more rules to the existing policy. Here we add one more rule to the policy.
Note that this new rule does not audit the BIADMIN user, which is typically a trusted
user ID. This is fine for a test environment, but for a production environment we
recommend that this user also be audited. This is particularly important because many
breaches are due to compromised privileged user credentials or to a privileged user gone
rogue.

__a. Select Manage Policies and click the policy newlyCreatedData1.

__b. Click the Effects button, add permit & apply_key as the effects, and click OK.

__c. Click the User button and select BIADMIN as the user.

__d. Press the Add button in the Security Rules tab. The new rule is added to the policy.

__e. Click the Up button to move the new rule to the top.
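
Moving the BIADMIN rule to the top matters if security rules are evaluated in order and
the first matching rule decides the outcome. The following Python sketch models that
reading; it is an assumption drawn from the step above, not a description of the product
internals. The effect names and the BIADMIN user come from this section. The Rule class,
the evaluate function, the hdfs and mapred account names used as match criteria for the
audited rule, and the default deny when nothing matches are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Rule:
    users: Optional[FrozenSet[str]]   # None means "any user"
    effects: FrozenSet[str]


# Rule order after step __4: the unaudited BIADMIN rule sits on top.
POLICY: List[Rule] = [
    Rule(frozenset({"BIADMIN"}), frozenset({"permit", "apply_key"})),
    # Hypothetical match criteria; the text does not list them for this rule.
    Rule(frozenset({"hdfs", "mapred"}), frozenset({"permit", "apply_key", "audit"})),
    Rule(None, frozenset({"deny", "audit"})),   # catch-all for everyone else
]


def evaluate(policy: List[Rule], user: str) -> FrozenSet[str]:
    """Return the effects of the first rule that matches the user (first match wins)."""
    for rule in policy:
        if rule.users is None or user in rule.users:
            return rule.effects
    return frozenset({"deny"})   # assumed default when no rule matches


if __name__ == "__main__":
    for user in ("BIADMIN", "hdfs", "root"):
        print(user, sorted(evaluate(POLICY, user)))
```

Under this model, a BIADMIN rule placed below the catch-all rule would never be reached,
and BIADMIN would be denied and audited like any other unlisted user. For a production
policy, adding audit to the BIADMIN rule's effects follows the recommendation given in
step __4.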

At this point, we have added a permanent policy to ensure that all newly ingested data
across all nodes is encrypted going forward. Now simply start BigInsights. Data will be
encrypted and decrypted transparently to your BigInsights applications from now on.

7. Conclusion
More and more customers from all sectors would like to take Hadoop to the next level by
integrating big data with mission-critical systems and sensitive data. For this to
happen, big data solutions need to integrate enterprise security solutions such as
encryption, access control, and auditing. In this regard, the InfoSphere Guardium activity
monitoring and the InfoSphere Guardium data encryption solutions clearly emerge as
leaders. They allow you to seamlessly integrate your InfoSphere BigInsights Hadoop
data protection into your existing enterprise data security strategy and meet your
regulatory compliance needs.

Further reading
IBM InfoSphere Guardium Data Encryption V2.0 secures data through
encryption to help you meet rigorous data governance and compliance
requirements,
http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=897/ENUS212-224&infotype=AN&subtype=CA

Big data security and auditing with IBM InfoSphere Guardium,
http://www.ibm.com/developerworks/data/library/techarticle/dm-1210bigdatasecurity/

Install IBM InfoSphere Guardium Data Encryption on the IBM PureApplication
System, http://www.ibm.com/developerworks/cloud/library/cl-installguardium/

Reviewers
Ron Ben Natan
IBM Distinguished Engineer
VP and CTO, Data Security, Compliance and Optimization

James Giles
IBM Distinguished Engineer
Senior Manager, Big Data Development

Ashvin Kamaraju
VP of Product Development
Vormetric Data Security

Hui Liao
Senior Development Manager
BigInsights Development

Kan Zhang
Senior Technical Staff Member
BigInsights Development

Paul Zikopoulos
Director
World Wide Big Data Tiger Team

Notices
This information was developed for products and services offered in the U.S.A.

IBM may not offer the products, services, or features discussed in this document in other
countries. Consult your local IBM representative for information on the products and services
currently available in your area. Any reference to an IBM product, program, or service is not
intended to state or imply that only that IBM product, program, or service may be used. Any
functionally equivalent product, program, or service that does not infringe any IBM
intellectual property right may be used instead. However, it is the user's responsibility to
evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in
this document. The furnishing of this document does not grant you any license to these
patents. You can send license inquiries, in writing, to:

IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785
U.S.A.

The following paragraph does not apply to the United Kingdom or any other country where
such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES
CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER
EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-
INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do
not allow disclaimer of express or implied warranties in certain transactions, therefore, this
statement may not apply to you.

Without limiting the above disclaimers, IBM provides no representations or warranties
regarding the accuracy, reliability or serviceability of any information or recommendations
provided in this publication, or with respect to any results that may be obtained by the use of
the information or observance of any recommendations provided herein. The information
contained in this document has not been submitted to any formal IBM test and is distributed
AS IS. The use of this information or the implementation of any recommendations or
techniques herein is a customer responsibility and depends on the customer's ability to
evaluate and integrate them into the customer's operational environment. While each item
may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee
that the same or similar results will be obtained elsewhere. Anyone attempting to adapt
these techniques to their own environment does so at their own risk.

This document and the information contained herein may be used solely in connection with
the IBM products discussed in this document.

This information could include technical inaccuracies or typographical errors. Changes are
periodically made to the information herein; these changes will be incorporated in new
editions of the publication. IBM may make improvements and/or changes in the product(s)
and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM Web sites are provided for convenience only
and do not in any manner serve as an endorsement of those Web sites. The materials at
those Web sites are not part of the materials for this IBM product and use of those Web sites is
at your own risk.

IBM may use or distribute any of the information you supply in any way it believes
appropriate without incurring any obligation to you.

Any performance data contained herein was determined in a controlled environment.
Therefore, the results obtained in other operating environments may vary significantly. Some
measurements may have been made on development-level systems and there is no
guarantee that these measurements will be the same on generally available systems.
Furthermore, some measurements may have been estimated through extrapolation. Actual
results may vary. Users of this document should verify the applicable data for their specific
environment.

Information concerning non-IBM products was obtained from the suppliers of those products,
their published announcements or other publicly available sources. IBM has not tested those
products and cannot confirm the accuracy of performance, compatibility or any other
claims related to non-IBM products. Questions on the capabilities of non-IBM products should
be addressed to the suppliers of those products.

All statements regarding IBM's future direction or intent are subject to change or withdrawal
without notice, and represent goals and objectives only.

This information contains examples of data and reports used in daily business operations. To
illustrate them as completely as possible, the examples include the names of individuals,
companies, brands, and products. All of these names are fictitious and any similarity to the
names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE: Copyright IBM Corporation 2013. All Rights Reserved.

This information contains sample application programs in source language, which illustrate
programming techniques on various operating platforms. You may copy, modify, and
distribute these sample programs in any form without payment to IBM, for the purposes of
developing, using, marketing or distributing application programs conforming to the
application programming interface for the operating platform for which the sample
programs are written. These examples have not been thoroughly tested under all conditions.
IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these
programs.

Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International
Business Machines Corporation in the United States, other countries, or both. If these and
other IBM trademarked terms are marked on their first occurrence in this information with a
trademark symbol (® or ™), these symbols indicate U.S. registered or common law
trademarks owned by IBM at the time this information was published. Such trademarks may
also be registered or common law trademarks in other countries. A current list of IBM
trademarks is available on the Web at "Copyright and trademark information" at
www.ibm.com/legal/copytrade.shtml

Windows is a trademark of Microsoft Corporation in the United States, other countries, or
both.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

Other company, product, or service names may be trademarks or service marks of others.
