Data Security Best Practices - A Practical Guide To Implementing Data Encryption For InfoSphere BigInsights - Issued - June 2013 - (IBM Corporation)
Walid Rjaibi, Chief Security Architect for Information Management
Nisanth Simon, InfoSphere BigInsights Software Developer
Monty Wright, Senior Solutions Architect, Vormetric Data Security
1. Introduction
2. Requirements for a data encryption solution
2.1 Run-time component requirements
2.2 Key management component requirements
3. Guardium data encryption architecture
4. Installing Guardium data encryption
5. Configuring encryption policies
5.1 Creating a policy for encrypting existing data
5.2 Creating a policy for encrypting new data
7. Conclusion
Further reading
Reviewers
1. Introduction
Encryption is the process of storing and transmitting data in a form that only its intended recipients can read and process. It is an effective way of protecting sensitive information as it is stored on media or transmitted through untrusted communication channels. Encryption is mandatory for complying with many government regulations and industry standards such as the Payment Card Industry Data Security Standard (PCI DSS).
In an encryption scheme, the data requiring protection (referred to as plaintext) is transformed into an unreadable form (referred to as ciphertext) by applying an encryption algorithm and an encryption key. Encryption keys are randomly generated using a key-generation algorithm. There are two main encryption schemes: symmetric encryption and asymmetric encryption. In symmetric encryption, the same key is used to encrypt and decrypt a given piece of data. The Advanced Encryption Standard (AES) is an example of a symmetric encryption scheme. In asymmetric encryption, data is encrypted using one key (usually referred to as the public key) and is decrypted using another key (usually referred to as the private key). The Rivest, Shamir, Adleman (RSA) algorithm is an example of an asymmetric encryption scheme. In practice, asymmetric and symmetric encryption schemes are often combined to offer an encryption solution. Generally, a symmetric algorithm is used to protect the actual data using some encryption key, and an asymmetric algorithm is used to protect that encryption key.
While Transport Layer Security (TLS) is widely accepted as the solution for protecting data in transit, no single solution has achieved similar status for protecting data at rest, although some solutions, such as the one described in this paper, are clearly emerging as leaders in this area. This paper focuses on encryption for data at rest, specifically for data stored within IBM InfoSphere BigInsights Hadoop.
The rest of this paper is organized as follows. Section 2 reviews the requirements for a sound data encryption solution. Section 3 introduces IBM InfoSphere Guardium Data Encryption (GDE). Sections 4 and 5 describe how to install and configure GDE to protect data stored within IBM InfoSphere BigInsights Hadoop. Lastly, we present our concluding thoughts in section 7.
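The combination of symmetric and asymmetric encryption described above can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not how GDE is implemented: the symmetric cipher is a SHA-256 counter keystream standing in for AES, the asymmetric step is textbook RSA without padding built from two well-known Mersenne primes, and all function names are hypothetical. None of this is suitable for protecting real data; it only shows the structure in which a random data key encrypts the data and a public key protects that data key.

```python
import hashlib
import secrets

# Toy primitives for illustration only. They are NOT secure and are
# assumptions of this sketch, not the algorithms or APIs used by GDE.


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Symmetric toy cipher: XOR the data with a SHA-256 counter keystream.

    The same call with the same key and nonce both encrypts and decrypts.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + nonce + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


# Textbook RSA key pair built from two known Mersenne primes (toy sizes).
P = 2**127 - 1
Q = 2**89 - 1
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (Python 3.8+)


def wrap_key(data_key: bytes) -> int:
    """Protect the symmetric data key with the public key (E, N)."""
    return pow(int.from_bytes(data_key, "big"), E, N)


def unwrap_key(wrapped: int, length: int) -> bytes:
    """Recover the symmetric data key with the private key (D, N)."""
    return pow(wrapped, D, N).to_bytes(length, "big")


def hybrid_encrypt(plaintext: bytes):
    data_key = secrets.token_bytes(16)   # randomly generated symmetric key
    nonce = secrets.token_bytes(16)
    ciphertext = keystream_xor(data_key, nonce, plaintext)
    return wrap_key(data_key), nonce, ciphertext


def hybrid_decrypt(wrapped: int, nonce: bytes, ciphertext: bytes) -> bytes:
    data_key = unwrap_key(wrapped, 16)
    return keystream_xor(data_key, nonce, ciphertext)
```

Only the holder of the private exponent can recover the data key, so the bulk data can be encrypted with a fast symmetric cipher while the key itself travels or rests in protected form.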
management component and provided to the run-time component as needed following a well-defined secure protocol.
Security server: This is the central point of administration for encryption, access control, and audit policies.
File system agent: It provides encryption and access control services for data in online storage accessed by file systems.
When InfoSphere Guardium Data Encryption is used to protect a database system such as DB2 or Informix IDS, a backup agent is also provided. The backup agent integrates with the database system backup command to allow the generation of encrypted database backups. This ensures that the same data is consistently protected whether it is online or offline.
Figure 1: InfoSphere Guardium Data Encryption Architecture
An important distinction between InfoSphere Guardium Data Encryption and other solutions that offer encryption is how the encryption is performed. InfoSphere Guardium Data Encryption employs a technique in which the file metadata is left in clear text (unencrypted) while the file content is encrypted. This technique provides a level of file access control beyond what the file system offers: access without viewability. Effectively, an application can be granted access to a file for the purpose of management without decrypting its contents. Privileged super users can continue to manage their environments and access the file, but are restricted from having clear-text access to the file content. This capability helps mitigate risks from internal malicious activity targeted at sensitive data.
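The idea of access without viewability can be pictured with a small Python model. This is a sketch under assumptions, not the GDE file system agent: the `GuardedFile` class, the `read` function, the user names, and the repeating-key XOR cipher are all hypothetical stand-ins chosen to keep the example self-contained. It shows metadata staying readable for everyone with file access while only designated users receive decrypted content.

```python
from dataclasses import dataclass


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Repeating-key XOR, a stand-in for a real cipher such as AES. Not secure."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


@dataclass
class GuardedFile:
    name: str          # metadata: left in clear text
    size: int          # metadata: left in clear text
    ciphertext: bytes  # content: stored encrypted


def store(name: str, content: bytes, key: bytes) -> GuardedFile:
    """Encrypt the content on write; keep the metadata readable."""
    return GuardedFile(name, len(content), toy_cipher(key, content))


def read(f: GuardedFile, user: str, key: bytes, may_decrypt: set) -> bytes:
    """Any user with file access can read the file, but only users in
    may_decrypt have the key applied and therefore see clear text."""
    if user in may_decrypt:
        return toy_cipher(key, f.ciphertext)
    return f.ciphertext


KEY = b"demo-key"                                 # hypothetical key material
f = store("part-00000", b"account=12345", KEY)    # hypothetical file and data
```

In this model a super user such as `root` can still list, copy, or back up `f` because the name and size are visible and the bytes are readable, yet a call like `read(f, "root", KEY, {"biadmin"})` returns only ciphertext.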
__c. Click the Action button, add key_op (Key operations), and click OK.
__d. Click the Effect button, add permit and apply_key as effects, and click OK.
__f. Open the Key Selection Rules tab, select clear_key as the key, and press Add.
__g. Open the Data Transformation Rules tab, select test-aes256-key as the key, and press Add.
__h. Open the Security Rules tab and click the Reset button.
__i. Click Effects and add deny and audit as effects.
__c. Click the Guard FS tab and click the Guard button. Add the policy and the folder where the data has to be encrypted. All the Hadoop data will be stored under the /hadoop folder.
__d. After adding the guard, refresh to ensure that the status is green.
__3. Performing the encryption on existing HDFS data
__a. Open the terminal and log in as the root user.
__b. Run secfsd -status guard
Perform the same operations (sections __2 and __3) on the other host machines. This ensures that existing data is encrypted across all nodes.
__4. Removing/un-guarding the policy from the host
__a. Open the Host tab and click on the host name (hdtest021.svl.ibm.com).
__c. Click the Refresh button to ensure that the policy is deleted.
Perform the same operation (section __4) on the other host machines. This ensures that the policy is removed from all the nodes. At this point, all existing data is encrypted and we are now ready to create a permanent policy for encrypting any new data that will be ingested going forward.
__b. Click the Effect button, add permit, apply_key, and audit as effects, and click OK.
__c. Open the Key Selection Rules tab, select test-aes256-key as the key, and press the Add button as shown below.
__e. Click the Reset button.
__f. Click Effect, add deny and audit as effects, and click OK.
__g. Click the Add button so that the effect is added to the security rules.
__c. Click the Guard FS tab and click the Guard button. Add the policy and the folder where the data has to be encrypted. All the HDFS data will be stored under the /hadoop folder.
__d. After adding the guard, refresh to ensure that the status is green.
__e. Perform the same operations in section __2 on the other host machines. This links the policy with all the nodes.
__3. Changing the log info and host settings on all host machines
__a. Go to the Hosts tab.
__c. Click the FS Agent Log tab and set the Policy Evaluation level to INFO.
__d. Click the OK button.
__e. Click Host Settings and add |trust|* as shown below.
__f. Click the OK button.
__g. Perform the same operations in section __3 on the other host machines. This changes the log info and host settings on all the nodes.
__4. Adding more rules to the existing policy
Here we add one more rule to the policy. Note that this new rule does not audit the BIADMIN user, which is typically a trusted user ID. This is fine for a test environment, but for a production environment it is recommended that this user also be audited. This is particularly important because many breaches are due to compromised privileged user credentials or to a privileged user gone rogue.
__b. Click the Effects button, add permit and apply_key as effects, and click OK.
__c. Click the User button and select BIADMIN as the user, as shown below.
__d. Press the "Add" button in the "Security Rules" tab. The new rule is added to the policy.
__e. Click the Up button to move the new rule to the top, as shown below.
At this point, we have added a permanent policy to ensure that all newly ingested data across all nodes is encrypted going forward. Now simply start BigInsights. Data will be encrypted and decrypted transparently to your BigInsights applications from now on.
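Moving the BIADMIN rule to the top matters only if rule order affects the outcome. The following Python sketch models that under an explicit assumption: security rules are evaluated from top to bottom and the first matching rule decides the effects. The source implies this ordering but does not state it, and the `Rule` class, the `evaluate` function, the default-deny fallback, and the user name `guest` are hypothetical. The intermediate rule with the permit, apply_key, and audit effects is left out because its selector is not shown in this paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    user: Optional[str]   # None means the rule matches any user
    effects: frozenset    # effects applied when the rule matches


# Rule order after the new rule is moved to the top.
POLICY = [
    Rule("BIADMIN", frozenset({"permit", "apply_key"})),  # trusted user, not audited
    Rule(None, frozenset({"deny", "audit"})),             # catch-all rule
]


def evaluate(policy, user: str) -> frozenset:
    """Return the effects of the first rule that matches the user.

    First-match semantics and the default deny are assumptions of this sketch.
    """
    for rule in policy:
        if rule.user is None or rule.user == user:
            return rule.effects
    return frozenset({"deny"})
```

With this ordering, `evaluate(POLICY, "BIADMIN")` yields permit and apply_key, while any other user falls through to deny and audit. Reversing the list would let the catch-all rule match first and deny BIADMIN as well, which is why the new rule is moved up.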
7. Conclusion
More and more customers from all sectors would like to take Hadoop to the next level by integrating big data with mission-critical systems and sensitive data. In order for this to happen, big data solutions need to integrate enterprise security solutions such as encryption, access control, and auditing. In this regard, the InfoSphere Guardium activity monitoring and the InfoSphere Guardium data encryption solutions clearly emerge as leaders. They seamlessly allow you to integrate your InfoSphere BigInsights Hadoop data protection into your existing enterprise data security strategy and meet your regulatory compliance needs.
Further reading
IBM InfoSphere Guardium Data Encryption V2.0 secures data through encryption to help you meet rigorous data governance and compliance requirements, http://www-01.ibm.com/common/ssi/cgibin/ssialias?htmlfid=897/ENUS212-224&infotype=AN&subtype=CA
Big data security and auditing with IBM InfoSphere Guardium, http://www.ibm.com/developerworks/data/library/techarticle/dm1210bigdatasecurity/
Install IBM InfoSphere Guardium Data Encryption on the IBM PureApplication System, http://www.ibm.com/developerworks/cloud/library/cl-installguardium/
Reviewers
Ron Ben Natan, IBM Distinguished Engineer, VP and CTO, Data Security, Compliance and Optimization
James Giles, IBM Distinguished Engineer, Senior Manager, Big Data Development
Ashvin Kamaraju, VP of Product Development, Vormetric Data Security
Hui Liao, Senior Development Manager, BigInsights Development
Kan Zhang, Senior Technical Staff Member, BigInsights Development
Paul Zikopoulos, Director, World Wide Big Data Tiger Team
Notices
This information was developed for products and services offered in the U.S.A. IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service. IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing IBM Corporation North Castle Drive Armonk, NY 10504-1785 U.S.A. The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you. Without limiting the above disclaimers, IBM provides no representations or warranties regarding the accuracy, reliability or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. 
The information contained in this document has not been submitted to any formal IBM test and is distributed AS IS. The use of this information or the implementation of any recommendations or techniques herein is a customer responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. While each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere. Anyone attempting to adapt these techniques to their own environment does so at their own risk.
This document and the information contained herein may be used solely in connection with the IBM products discussed in this document.
This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.
Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.
Any performance data contained herein was determined in a controlled environment. Therefore, the results obtained in other operating environments may vary significantly. Some measurements may have been made on development-level systems and there is no guarantee that these measurements will be the same on generally available systems. Furthermore, some measurements may have been estimated through extrapolation. Actual results may vary. Users of this document should verify the applicable data for their specific environment.
Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. All statements regarding IBM's future direction or intent are subject to change or withdrawal without notice, and represent goals and objectives only. This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental. COPYRIGHT LICENSE: Copyright IBM Corporation 2013. All Rights Reserved. This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.
Windows is a trademark of Microsoft Corporation in the United States, other countries, or both. UNIX is a registered trademark of The Open Group in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Other company, product, or service names may be trademarks or service marks of others.