Like what you hear?

Tweet it using: #Sec360


HADOOP SECURITY

HADOOP SECURITY
About Robert:
School: UW Madison, U St. Thomas
Programming: 15 years, C, C++, Java
Security Work:
-  Surescripts, Minneapolis (present)
-  Big Retail Company, Minneapolis
-  Big Healthcare Company, Minnetonka
OWASP Local Volunteer
CISSP, CISM, CISA, CHPS
Email: bob@confidentialsoftware.com
Twitter: @msp_sullivan
HADOOP SECURITY
History
What is new?
Common Applications
Threats
Security Architecture
Secure Baseline and Testing
Policy Impact
HADOOP HISTORY
•  2002: Doug Cutting & Mike Cafarella: Nutch
•  Crawl and index hundreds of millions of pages
•  2003: Google File System paper released
•  2004: Google MapReduce paper released
•  2006: Hadoop formed at Yahoo, running on 5 to 20 nodes
•  2008: Yahoo, Hadoop “behind every click”
•  2008: Cloudera founded by alumni of Google, Yahoo and Facebook; 2,000 Hadoop nodes
•  2008: Facebook open sourced Hive for Hadoop
•  2011: Yahoo spins out Hortonworks
•  Yahoo’s Hadoop at the time: 42,000 nodes, hundreds of petabytes

Derrick Harris, “The History of Hadoop from 4 nodes to the future of data”, gigaom.com
HADOOP IS
The Apache Hadoop software library is a framework that allows for the
distributed processing of large …
-  Software Framework
-  Distributed Processing
-  Large Data Sets
-  Clusters of Computers
-  High Availability
-  Scale to Thousands of Machines
Link:
https://developer.yahoo.com/hadoop/tutorial
MAPREDUCE IS NEW

[Diagram: Map phase, then Reduce phase]
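To make the two phases concrete, here is a minimal single-process Python sketch of the classic word-count job. It is not Hadoop's Java MapReduce API: in a real cluster the framework runs many map tasks next to the HDFS blocks, performs the shuffle over the network, and runs reduce tasks in parallel. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Map: emit (word, 1) for every word in one input record
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group every value emitted for the same key
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: combine all values for one key into a single result
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))
```

Calling `word_count(["the bee", "the hive"])` counts "the" twice and each other word once. The map and reduce functions never see each other's data directly, which is what lets Hadoop spread them across thousands of machines.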
HADOOP COMMON APPLICATIONS

1. Web Search
2. Advertising & recommendations
3. Security Threat Identification
4. Fraud Detection
5. Patient Record Search
Source: Yahoo:
https://developer.yahoo.com/blogs/ydn/hadoop-yahoo-more-ever-54421.html
PATIENT MATCHING AT SURESCRIPTS
-  Surescripts provides a Patient Matching service
-  230 Million Patients
-  Over 1 Billion matches last year
-  Requirements:
   -  Reliability and performance
   -  Data Protection at rest is required
   -  Data Protection in transit is required
   -  Comprehensive security logging is needed
   -  ISO 27001 & EHNAC Audit Accreditation status must be maintained
NOW WHAT?

SECURE THE BEES


HADOOP THREAT MODEL
1)  Unauthorized data access (protected health information access)
2)  Unauthorized data change
3)  Unauthorized job submission, delete or change
4)  Task may access other tasks or access local data
5)  Rogue DataNode, NameNode or JobTracker
6)  User spoofing to submit workflow as another user
From:
“Adding Security to Apache Hadoop”, Das, O’Malley, Radia, Zhang, 2011,
http://hortonworks.com/wp-content/uploads/2011/10/security-design_withCover-1.pdf
HADOOP SECURITY
[Diagram: Data Nodes, Management Nodes, Admins, Applications and Application Users, backed by Enterprise Identity, Logging, Encryption and Key Management services]
-  Network Security
-  Authentication
-  Authorization
-  Auditing
-  Data Protection
DATA PROTECTION
-  Network Security
-  Authentication
-  Authorization
-  Auditing
-  Data Protection
   -  Encryption at rest: volume, file
   -  Encryption in transit: HTTPS
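For encryption in transit, Hadoop secure mode is switched on through configuration properties: `hadoop.rpc.protection` (core-site.xml), plus `dfs.encrypt.data.transfer` and `dfs.http.policy` (hdfs-site.xml). The sketch below parses a `*-site.xml` document and lists required settings that are missing. It assumes you have already merged core-site.xml and hdfs-site.xml properties into one dict, and it deliberately accepts only the exact values shown. Encryption at rest (volume encryption, HDFS encryption zones) is not checked here.

```python
import xml.etree.ElementTree as ET

# Settings that enable in-transit protection in Hadoop secure mode
IN_TRANSIT_SETTINGS = {
    "hadoop.rpc.protection": "privacy",      # core-site.xml: encrypt RPC traffic
    "dfs.encrypt.data.transfer": "true",     # hdfs-site.xml: encrypt block transfers
    "dfs.http.policy": "HTTPS_ONLY",         # hdfs-site.xml: serve web UIs over HTTPS only
}


def load_properties(xml_text: str) -> dict[str, str]:
    """Parse a Hadoop *-site.xml document into a name -> value dict."""
    root = ET.fromstring(xml_text)
    props: dict[str, str] = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def missing_in_transit_settings(props: dict[str, str]) -> list[str]:
    """Return required setting names that are absent or have a different value."""
    return [
        name
        for name, expected in IN_TRANSIT_SETTINGS.items()
        if props.get(name, "").lower() != expected.lower()
    ]
```

A configuration that sets RPC privacy and encrypted data transfer but leaves the web UIs on HTTP would be reported as missing `dfs.http.policy`.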
SECURITY AUDITING
-  Network Security
-  Authentication
-  Authorization
-  Auditing
   -  Failed/Successful Authn.
   -  System changes
   -  Access to PHI
   -  Application logs: HDFS, YARN, MapReduce…
-  Data Protection
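HDFS writes an audit record for each filesystem request. The sketch below parses one such line and flags records worth reviewing: denied requests and any access under a PHI directory. It assumes the common layout (a log prefix ending in `audit: `, followed by tab-separated `key=value` fields such as `allowed`, `ugi`, `ip`, `cmd`, `src`). The exact prefix depends on your log4j configuration, and the `/data/phi/` path is hypothetical.

```python
from dataclasses import dataclass

PHI_PREFIX = "/data/phi/"  # hypothetical HDFS directory that holds PHI


@dataclass
class AuditEvent:
    allowed: bool
    user: str
    ip: str
    cmd: str
    src: str


def parse_audit_line(line: str) -> AuditEvent:
    # Everything after "audit: " is assumed to be tab-separated key=value fields
    _, _, fields_text = line.partition("audit: ")
    fields = dict(
        f.split("=", 1) for f in fields_text.strip().split("\t") if "=" in f
    )
    return AuditEvent(
        allowed=fields.get("allowed") == "true",
        user=fields.get("ugi", "").split(" ")[0],  # drop the "(auth:...)" suffix
        ip=fields.get("ip", "").lstrip("/"),
        cmd=fields.get("cmd", ""),
        src=fields.get("src", ""),
    )


def needs_review(event: AuditEvent) -> bool:
    """Flag denied requests and any access to the PHI directory."""
    return (not event.allowed) or event.src.startswith(PHI_PREFIX)
```

Feeding these flags into the enterprise logging service covers two of the slide's items, failed access and access to PHI, without reading every raw log line.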
AUTHORIZATION
-  Network Security
-  Authentication
-  Authorization
   -  Limit user access to function
   -  Limit user access to objects
   -  Manage delegation of access
-  Auditing
-  Data Protection
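"Limit user access to objects" is enforced in HDFS by POSIX-style owner/group/other permission bits on files and directories. Below is a simplified Python sketch of that check. Like POSIX, it picks exactly one class (owner, else group, else other) and does not fall through when that class is denied. It ignores the HDFS superuser, HDFS ACLs and the sticky bit, and the user and group names are hypothetical.

```python
from dataclasses import dataclass

READ, WRITE, EXECUTE = 4, 2, 1


@dataclass(frozen=True)
class HdfsEntry:
    owner: str
    group: str
    mode: int  # permission bits, e.g. 0o750


def is_permitted(entry: HdfsEntry, user: str, user_groups: set[str], wanted: int) -> bool:
    # Choose exactly one permission class, as POSIX does
    if user == entry.owner:
        bits = (entry.mode >> 6) & 0o7
    elif entry.group in user_groups:
        bits = (entry.mode >> 3) & 0o7
    else:
        bits = entry.mode & 0o7
    # Every requested bit must be granted
    return (bits & wanted) == wanted
```

With mode `0o750` on a PHI directory owned by `hdfs:clinical`, members of `clinical` can read but not write, and everyone else is refused.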
AUTHENTICATION
-  Network Security
-  Authentication
   -  All users, all applications, all access paths
   -  Apache Knox Gateway (HTTPS)
-  Authorization
-  Auditing
-  Data Protection
NETWORK SECURITY
-  Network Security
-  Authentication
-  Authorization
-  Auditing
-  Data Protection
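The slide names network security without detail. One common pattern is to place cluster nodes on an isolated network that accepts connections only from other cluster nodes and from designated gateway hosts. The sketch below shows the allow-list logic such firewall rules would encode. The subnet and gateway address are hypothetical placeholders.

```python
import ipaddress

# Hypothetical layout: cluster nodes live in one subnet, users enter via a gateway
CLUSTER_NET = ipaddress.ip_network("10.20.0.0/16")    # DataNodes, NameNode, ...
GATEWAY_HOSTS = {ipaddress.ip_address("10.30.0.5")}   # e.g. an edge/gateway node


def may_connect(source_ip: str) -> bool:
    """True if traffic from source_ip may reach internal cluster service ports."""
    addr = ipaddress.ip_address(source_ip)
    return addr in CLUSTER_NET or addr in GATEWAY_HOSTS
```

Only the gateway is reachable from user networks, so every request funnels through one authenticated, audited entry point.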
HADOOP SECURE MODE
Apache Hadoop Secure Mode: 2.6.0 (March 14’)
-  Authentication
-  Covers HDFS, YARN, MapReduce & Web Console
-  Uses central LDAP Server or Active Directory
-  Requires Kerberos keytabs for each application
-  Authorization
-  Each Hadoop service has a list of users and groups
-  Group permissions on HDFS filesystem components
-  Audit
-  Hadoop log, YARN log, other logs
-  Data Protection
-  Encryption in transit between Hadoop services & clients
-  Encryption in transit between DataNodes
-  Encryption in transit between web console & clients (HTTPS)
-  Encryption at rest for HDFS encryption zones (directories)
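Each secure-mode service authenticates with a Kerberos principal loaded from its keytab. Hadoop configurations usually write service principals as `service/_HOST@REALM` (for example `dfs.namenode.kerberos.principal` set to `nn/_HOST@EXAMPLE.COM`), so one file can be deployed to every node and `_HOST` is filled in with that node's fully qualified name. The sketch below mimics that expansion. `EXAMPLE.COM` and the host names are placeholders.

```python
import re
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    service: str
    host: Optional[str]
    realm: str


# primary[/instance]@REALM
_PRINCIPAL_RE = re.compile(r"^([^/@]+)(?:/([^/@]+))?@([^/@]+)$")


def parse_principal(principal: str) -> Principal:
    m = _PRINCIPAL_RE.match(principal)
    if m is None:
        raise ValueError(f"not a Kerberos principal: {principal!r}")
    service, host, realm = m.groups()
    return Principal(service, host, realm)


def expand_host(principal: str, fqdn: str) -> str:
    """Replace a _HOST instance component with this machine's FQDN."""
    p = parse_principal(principal)
    if p.host == "_HOST":
        # Lowercased: Kerberos host principals are conventionally lowercase
        p = p._replace(host=fqdn.lower())
    return f"{p.service}/{p.host}@{p.realm}" if p.host else f"{p.service}@{p.realm}"
```

User principals such as `alice@EXAMPLE.COM` have no host component and pass through unchanged.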
HADOOP SECURE MODE
Apache Hadoop Secure Mode: 2.6.0 (March 14’)

[Matrix: threats (Data Access, Data Change, Job Submission, Task Access, Rogue Node, User Spoofing) against controls (Network Security, Authentication, Authorization, Audit, Data Protection)]
APACHE KNOX
The Apache Knox Gateway is a REST API Gateway for interacting with
Hadoop clusters. The Knox Gateway provides a single access point for all
REST interactions with Hadoop clusters.
Knox can provide:
•  Authentication (LDAP and Active Directory Authentication Provider)
•  Federation/SSO (HTTP Header Based Identity Federation)
•  Authorization (Service Level Authorization)
•  Auditing
Integrations:
-  WebHDFS (HDFS), Templeton (HCatalog), Stargate (HBase), Oozie, Hive/JDBC
Status: Incubating
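Through Knox, a client calls the same REST APIs at a gateway URL instead of contacting cluster nodes directly. The usual layout is `https://{gateway-host}:{port}/gateway/{topology}/webhdfs/v1/{path}?op=...`, and with LDAP authentication the client sends HTTP Basic credentials, which is why HTTPS is essential. The sketch below only builds the request, using the standard library. The host, port, topology name (`sandbox`) and credentials are placeholders; check your own gateway and topology configuration.

```python
import base64
import urllib.parse
import urllib.request


def knox_webhdfs_request(gateway: str, topology: str, hdfs_path: str, op: str,
                         user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a WebHDFS request routed through a Knox gateway."""
    # Knox exposes each cluster ("topology") under /gateway/<topology>/...
    path = urllib.parse.quote(hdfs_path.lstrip("/"))
    url = f"https://{gateway}/gateway/{topology}/webhdfs/v1/{path}?op={op}"
    # HTTP Basic credentials, checked by Knox against LDAP/AD; only safe over HTTPS
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
```

Passing the result to `urllib.request.urlopen` would send it. Only the gateway's address needs to be reachable from user networks.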
APACHE RANGER
A centralized security framework to manage fine-grained access control.
Status: Incubating

Authentication
•  Kerberos in native Apache Hadoop
•  Secured by the Apache Knox Gateway via the HTTP/REST API

Authorization
•  on the folder and file level, via HDFS
•  on the database, table and column level, via Hive
•  on the table, column family and column level, via HBase

Audit
•  User access auditing in HDFS, Hive and HBase, recording IP address, resource/resource type, timestamp, and whether access was granted or denied

Data Protection
•  Wire, volume and file/column encryption
•  HDFS Transparent Encryption (TDE)
•  Third-Party Partners (Hortonworks)

Administration
•  Policy management, administration and delegation

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.2.0/Ranger_U_Guide_v22/index.html#Item1.1
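To show what column-level authorization plus auditing means in practice, here is a toy Python policy check. It is hypothetical and is not Ranger's policy engine, API or policy format. Access is denied unless some policy grants it, and every decision produces an audit record with the fields listed under Audit (IP address, resource and resource type, timestamp, granted or denied).

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Policy:
    database: str
    table: str
    columns: frozenset   # column names covered; "*" means every column
    groups: frozenset    # groups granted access
    access: str          # e.g. "select"


def check_access(policies, user, groups, ip, database, table, column, access):
    """Default-deny check of one column access; returns (allowed, audit_record)."""
    allowed = any(
        p.database == database and p.table == table and p.access == access
        and ("*" in p.columns or column in p.columns)
        and bool(p.groups & set(groups))
        for p in policies
    )
    audit_record = {
        "user": user,
        "ip": ip,
        "resource": f"{database}/{table}/{column}",
        "resource_type": "column",
        "access": access,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "result": "granted" if allowed else "denied",
    }
    return allowed, audit_record
```

An analyst granted `zip` and `birth_year` on a patients table is refused the `ssn` column, and both decisions land in the audit trail.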
HADOOP SECURITY POLICY
Authentication of processes:
-  May go into existing application security policy
Security Logging requirements:
-  Which applications must be logged?
-  Add node identifier to standard log records
De-anonymization Issues
-  Sparse data can be de-anonymized through matching to public sources
-  Could 200 days of tweets be matched to any of my de-identified data?
Key Management & Business Continuity
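"Add node identifier to standard log records" means every security log line should say which machine produced it, so records aggregated from hundreds of nodes can be traced back. Hadoop itself logs through log4j. As a language-neutral illustration, this sketch uses Python's `logging` module with a filter that stamps each record with a node ID (defaulting to the host's FQDN). The format string and names are hypothetical.

```python
import logging
import socket
from typing import Optional

LOG_FORMAT = "%(asctime)s node=%(node_id)s %(levelname)s %(message)s"


class NodeIdFilter(logging.Filter):
    """Attach a node identifier to every record passing through a handler."""

    def __init__(self, node_id: str):
        super().__init__()
        self.node_id = node_id

    def filter(self, record: logging.LogRecord) -> bool:
        record.node_id = self.node_id
        return True  # never drop records, only annotate them


def make_security_logger(node_id: Optional[str] = None) -> logging.Logger:
    logger = logging.getLogger("security")
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.addFilter(NodeIdFilter(node_id or socket.getfqdn()))
        handler.setFormatter(logging.Formatter(LOG_FORMAT))
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

For example, `make_security_logger("datanode-07").info("login failed")` writes a line containing `node=datanode-07`.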
BUILD A SECURITY BASELINE
-  Start with your vendor’s distribution
-  Add your company’s sauce
-  Review the Hadoop Security Benchmark project at the Center for Internet Security (CIS):
-  Apache Hadoop 2.6.0 Benchmark
-  Community Discussion
-  Editors and members get free access to validation tools
-  Everyone gets free access to baselines
-  Registration is moderated: a person approves each registrant, who then receives a welcome email.
-  Link:
-  http://tinyurl.com/HadoopSecurityBenchmark
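"Start with your vendor's distribution, add your company's sauce" can be read as layering: vendor defaults, overridden by company-required values, then compared against each node's actual configuration. A minimal sketch follows. The property names are real Hadoop settings, but the values are illustrative (modeled on Apache's shipped defaults, which a vendor distribution may change), and the function names are hypothetical.

```python
# Illustrative values: Apache-style defaults, then company-required overrides
VENDOR_BASELINE = {
    "hadoop.security.authentication": "simple",
    "hadoop.security.authorization": "false",
    "dfs.permissions.enabled": "true",
}
COMPANY_OVERRIDES = {
    "hadoop.security.authentication": "kerberos",
    "hadoop.security.authorization": "true",
}


def build_baseline(vendor: dict, overrides: dict) -> dict:
    """Company overrides win over vendor defaults."""
    merged = dict(vendor)
    merged.update(overrides)
    return merged


def deviations(baseline: dict, actual: dict) -> dict:
    """Map each non-compliant property to (expected, actual) values."""
    return {
        name: (expected, actual.get(name))
        for name, expected in baseline.items()
        if actual.get(name) != expected
    }
```

Running `deviations` against every node's merged configuration turns the baseline into a repeatable test rather than a one-time document.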
HADOOP SECURITY REVIEW
1.  Start with the threats
2.  Choose your diagram
3.  Ask the standard security questions:
u Network Security
u Authentication
u Authorization
u Security Audit
u Data Protection
4.  Update your policy
5.  Build a Security Baseline
HADOOP SECURITY RESOURCES
1.  Apache: “Hadoop in Secure Mode”
http://tinyurl.com/hadoopSecureMode
2.  Yahoo Hadoop Tutorial
https://developer.yahoo.com/hadoop/tutorial
3.  Securosis: “Securing Big Data: Security Recommendations for Hadoop and NoSQL
Environments”, 10/12/2012, Adrian Lane
https://securosis.com/assets/library/reports/SecuringBigData_FINAL.pdf
4.  Cloudera: “Introduction to Hadoop Security”
http://tinyurl.com/cloudera50security
5.  Hortonworks: “Security for Enterprise Hadoop”
http://hortonworks.com/innovation/security/
6.  Center for Internet Security: Hadoop Security Baseline
http://tinyurl.com/HadoopSecurityBenchmark
QUESTIONS

Updates at http://www.confidentialsoftware.com
