Hadoop Security S360 2015v8 PDF
MAP
REDUCE
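As a reminder of what the two phases do, here is a minimal single-process sketch of the classic word-count job. It is plain Python, not Hadoop code; the function names `map_phase`, `reduce_phase` and `word_count` are hypothetical, and the shuffle step that a real cluster performs between the phases is folded into the reducer.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def reduce_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Shuffle and reduce: group the pairs by key and sum each group."""
    totals: Dict[str, int] = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the map phase, then feed its output to the reduce phase."""
    return reduce_phase(map_phase(lines))
```

On a cluster the map calls run in parallel on the nodes that hold the data blocks, which is why every node ends up touching sensitive data.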
HADOOP COMMON APPLICATIONS
1. Web Search
2. Advertising & recommendations
3. Security Threat Identification
4. Fraud Detection
5. Patient Record Search
Source: Yahoo:
https://developer.yahoo.com/blogs/ydn/hadoop-yahoo-more-ever-54421.html
PATIENT MATCHING AT SURESCRIPTS
- Surescripts provides a Patient Matching service
- 230 Million Patients
- Over 1 Billion matches last year
- Requirements:
- Reliability and performance
- Data Protection at rest is required
- Data Protection in transit is required
- Comprehensive security logging is needed
- ISO 27001 & EHNAC Audit Accreditation status must be
maintained
NOW WHAT?
[Diagram: Application Users and Applications; Enterprise Identity, Logging, Encryption, Key Management]
DATA PROTECTION
- Network Security
- Authentication
- Authorization
- Auditing
- Data Protection
  - Encryption at rest: volume, file
  - Encryption in transit: HTTPS
[Diagram: Application Users, Applications, Admins, Data Nodes and Management Nodes, connected over HTTPS; Enterprise Identity, Logging, Encryption, Key Management]
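The encryption-in-transit requirement can be sketched from the client side with Python's standard `ssl` module. This is an illustration under stated assumptions, not a Hadoop setting: the helper name `make_client_tls_context` is hypothetical, and the TLS 1.2 floor is an assumed policy choice that does not come from the slides.

```python
import ssl


def make_client_tls_context() -> ssl.SSLContext:
    """Build a client-side TLS context that refuses unverified servers."""
    # create_default_context() turns on certificate and hostname verification.
    context = ssl.create_default_context()
    # Assumed policy: reject anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

A client would pass the context to its HTTPS call, for example `urllib.request.urlopen(url, context=make_client_tls_context())`.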
SECURITY AUDITING
- Network Security
- Authentication
- Authorization
- Auditing
  - Failed/Successful Authn.
  - System changes
  - Access to PHI
  - Application logs: HDFS, YARN, MapReduce…
- Data Protection
[Diagram: Application Users, Applications, Admins, Data Nodes and Management Nodes; Enterprise Identity, Logging, Encryption, Key Management]
AUTHORIZATION
- Network Security
- Authentication
- Authorization
  - Limit user access to function
  - Limit user access to objects
  - Manage delegation of access
- Auditing
- Data Protection
[Diagram: Application Users, Applications, Admins, Data Nodes and Management Nodes; Enterprise Identity, Logging, Encryption, Key Management]
AUTHENTICATION
- Network Security
- Authentication
  - All users, all applications, all access paths
  - Apache Knox Gateway
- Authorization
- Auditing
- Data Protection
[Diagram: Application Users, Applications, Admins, Data Nodes and Management Nodes, with HTTPS on the access path; Enterprise Identity, Logging, Encryption, Key Management]
NETWORK SECURITY
- Network Security
- Authentication
- Authorization
- Auditing
- Data Protection
[Diagram: Application Users, Applications, Admins, Data Nodes and Management Nodes; Enterprise Identity, Logging, Encryption, Key Management]
HADOOP SECURE MODE
Apache Hadoop Secure Mode: 2.6.0 (March 14’)
- Authentication
- Covers HDFS, YARN, MapReduce & Web Console
- Uses central LDAP Server or Active Directory
- Requires Kerberos keytabs for each application
- Authorization
- Each Hadoop service has a list of users and groups
- Group permissions on HDFS filesystem components
- Audit
- Hadoop log, YARN log, other logs
- Data Protection
- Encryption in transit between Hadoop services & clients
- Encryption in transit between DataNodes
- Encryption in transit between web console & clients (HTTPS)
- Encryption at rest for HDFS files (transparent encryption, per encryption zone)
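The group permissions on HDFS filesystem components follow a POSIX-style owner/group/other model. The sketch below shows that check in isolation; it is not Hadoop's implementation. `Inode` and `is_allowed` are hypothetical names, and the superuser bypass and ACL entries are deliberately left out.

```python
from dataclasses import dataclass
from typing import AbstractSet

READ, WRITE, EXECUTE = 4, 2, 1  # the usual rwx bit values


@dataclass(frozen=True)
class Inode:
    """A file or directory with POSIX-style ownership and mode bits."""
    owner: str
    group: str
    mode: int  # for example 0o750


def is_allowed(inode: Inode, user: str, user_groups: AbstractSet[str], wanted: int) -> bool:
    """Return True if `user` holds every permission bit in `wanted`."""
    if user == inode.owner:
        bits = (inode.mode >> 6) & 7  # owner class
    elif inode.group in user_groups:
        bits = (inode.mode >> 3) & 7  # group class
    else:
        bits = inode.mode & 7         # everyone else
    return (bits & wanted) == wanted
```

With mode `0o750`, the owner can read and write, members of the group can read but not write, and all other users are denied.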
HADOOP SECURE MODE
Apache Hadoop Secure Mode: 2.6.0 (March 14’)
Authentication
Authorization
• on the folder and file level, via HDFS
• on the database, table and column level, via Hive
• on the table, column family and column level, via HBase
Audit
• User access auditing in HDFS, Hive and HBase, recording IP address, resource and resource type, timestamp, and whether access was granted or denied
Data Protection
• Wire, volume and file/column encryption
• HDFS Transparent Encryption (TDE)
• Third-Party Partners (Hortonworks)
Administration
• Policy management, administration and delegation
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.2.0/Ranger_U_Guide_v22/index.html#Item1.1
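The audit fields listed on this slide can be captured in one structured record. The sketch below is a hypothetical layout, not the format Ranger or any Hadoop component actually writes; `AccessAuditEvent` and `to_json_line` are made-up names, and the `user` field is an assumption added so that each event can be attributed to someone.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AccessAuditEvent:
    """One access decision, carrying the fields named on the slide."""
    user: str           # assumed field, not listed on the slide
    ip_address: str
    resource: str
    resource_type: str  # for example "hdfs-path", "hive-table", "hbase-column"
    timestamp: str      # ISO 8601 string, UTC
    granted: bool


def to_json_line(event: AccessAuditEvent) -> str:
    """Serialize the event as one JSON line for a log collector."""
    return json.dumps(asdict(event), sort_keys=True)
```

One JSON object per line keeps the records easy to ship to a central logging service and to search when reviewing access to PHI.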
HADOOP SECURITY POLICY
Authentication of processes:
- May go into existing application security policy
Security Logging requirements:
- Which applications must be logged?
- Add node identifier to standard log records
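Adding a node identifier to standard log records can be sketched with Python's `logging` module. This is a minimal illustration, not a Hadoop logging configuration: `NodeIdFilter`, `LOG_FORMAT` and the `node_id` attribute are hypothetical names, and the host name is assumed to be an acceptable node identifier.

```python
import logging
import socket
from typing import Optional

# Format string that places the node identifier right after the timestamp.
LOG_FORMAT = "%(asctime)s %(node_id)s %(levelname)s %(name)s %(message)s"


class NodeIdFilter(logging.Filter):
    """Stamp every log record with the identifier of the emitting node."""

    def __init__(self, node_id: Optional[str] = None) -> None:
        super().__init__()
        # Assumption: fall back to the host name when no identifier is given.
        self.node_id = node_id or socket.gethostname()

    def filter(self, record: logging.LogRecord) -> bool:
        record.node_id = self.node_id
        return True  # never drop the record, only annotate it
```

Attach the filter to a handler with `handler.addFilter(NodeIdFilter())` and set `logging.Formatter(LOG_FORMAT)` on the same handler, so that every line written through it names its node.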
De-anonymization Issues
- Sparse data can be de-anonymized through matching to public sources
- Could 200 days of tweets be matched to any of my de-identified data?
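The matching risk can be made concrete with a small linkage sketch: join de-identified rows to a public list on shared attributes and keep the rows that match exactly one person. Everything here is hypothetical, including the function name `link_records`, the choice of quasi-identifiers and the field names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple

# Assumed quasi-identifiers; real data sets may leak through other fields.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def link_records(deidentified: Iterable[Mapping], public: Iterable[Mapping]) -> Dict[str, str]:
    """Map record_id -> name for rows that match exactly one public person."""
    index: Dict[Tuple, List[str]] = defaultdict(list)
    for person in public:
        key = tuple(person[q] for q in QUASI_IDENTIFIERS)
        index[key].append(person["name"])

    matches: Dict[str, str] = {}
    for row in deidentified:
        key = tuple(row[q] for q in QUASI_IDENTIFIERS)
        candidates = index.get(key, [])
        if len(candidates) == 1:  # a unique match re-identifies the row
            matches[row["record_id"]] = candidates[0]
    return matches
```

The sparser the data, the more rows have a unique combination of attributes, and the more of them this kind of join re-identifies.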
Key Management & Business Continuity
BUILD A SECURITY BASELINE
- Start with your Vendor’s distribution
- Add your company’s sauce
- Review Hadoop Security Benchmark project at the Center For Internet
Security:
- Apache Hadoop 2.6.0 Benchmark
- Community Discussion
- Editors and members get free access to validation tools
- Everyone gets free access to baselines
- Registration is moderated. That means human registrants are approved and
receive a welcome email.
- Link:
- http://tinyurl.com/HadoopSecurityBenchmark
HADOOP SECURITY REVIEW
1. Start with the threats
2. Choose your diagram
3. Ask the standard security questions:
   - Network Security
   - Authentication
   - Authorization
   - Security Audit
   - Data Protection
4. Update your policy
5. Build a Security Baseline
HADOOP SECURITY RESOURCES
1. Apache “Hadoop in Secure Mode”
http://tinyurl.com/hadoopSecureMode
2. Yahoo Hadoop Tutorial
https://developer.yahoo.com/hadoop/tutorial
3. Securosis: “Securing Big Data: Security Recommendations for Hadoop and NoSQL
Environments”, 10/12/2012, Adrian Lane
https://securosis.com/assets/library/reports/SecuringBigData_FINAL.pdf
4. Cloudera: “Introduction to Hadoop Security”
http://tinyurl.com/cloudera50security
5. Hortonworks: “Security for Enterprise Hadoop”
http://hortonworks.com/innovation/security/
6. Center for Internet Security: Hadoop Security Baseline
http://tinyurl.com/HadoopSecurityBenchmark
QUESTIONS
Updates at http://www.confidentialsoftware.com