1 © Hortonworks Inc. 2011–2018. All rights
reserved
MANAGING ENTERPRISE USERS IN
HADOOP ECOSYSTEM
Sailaja Polavarapu
Staff Software Engineer, Hortonworks
spolavarapu@apache.org
Velmurugan Periasamy
Director, Engineering, Hortonworks
vel@apache.org
DataWorks Summit 2018, San Jose
Agenda
◆ Introduction
◆ Enterprise User Management in Hadoop environment
◆ Enabling technologies
◆ Integrating Hadoop cluster with Multiple LDAP stores
◆ Key takeaways
◆ Demo
Managing Enterprise Users - Why?
Key Security Goals
● Confidentiality: Prevent unauthorized disclosure of information
● Integrity: Assure no unauthorized data modifications
● Availability: Readily available info for authorized use
Managing Enterprise Users - Why?
● Average cost of a data breach: $3.62M
● Factor in the loss of business continuity, regulatory/PR/legal issues, damage to the brand, etc.
● Security is serious business
● Enterprise user management is critical
Source: https://www.ibm.com/security/data-breach
Managing Enterprise Users - Foundations
Foundations: Authentication, Authorization, Auditing, Administration, Data Protection
Key Security Goals: Confidentiality, Integrity, Availability
Managing Enterprise Users in Hadoop Env
[Diagram labels: Enterprise Directory (AD/LDAP); Enterprise Users; Systems; Engg, Support, Finance, Administration; Authentication; Authorization; Kerberos; Resource Access in Hadoop Cluster]
Kerberos
● Open standard authentication protocol
● Developed by MIT for distributed systems
● Kerberos is a MUST for Hadoop Env
○ Authenticating to the KDC provides access to all services in the realm
● Directory server integration
Apache Knox
● Centralized authentication with SSO
● Complements, does not replace, Kerberos
● Single access point for all REST and HTTP interactions with Apache Hadoop clusters
● Directory server integration
Apache Ranger
● Centralized authorization & auditing for the Hadoop ecosystem
● Fine-grained access control with flexible ABAC models
● Centralized security policy administration
● Directory server integration
User Management in Hadoop: Drill Down
[Diagram labels: Authentication; User accessing Hadoop resources; Authorization]
Requirements for User Management
⬢ Understanding your deployment
– What kind of directory server(s): Active Directory, OpenLDAP server, etc.?
– OS version of the Hadoop cluster nodes: CentOS, Ubuntu, etc.
– User group mapping on Hadoop clusters: using SSSD, core-site.xml, manual, etc.
– Authentication mechanism: Kerberos, Knox gateway, etc.
– Authorization policies to be configured at user level or group level?
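When group mapping is delegated to the operating system (for example through SSSD), one quick check is to ask a cluster node which groups it resolves for a given user. The sketch below does that with Python's standard `pwd` and `grp` modules, which go through the OS name service. The function name `groups_for_user` is hypothetical, and this shows only what the OS reports, not how any particular Hadoop group-mapping setting in core-site.xml behaves.

```python
import grp
import os
import pwd


def groups_for_user(username):
    """Return the sorted group names the OS reports for `username`."""
    # Raises KeyError if the name service cannot resolve the user.
    entry = pwd.getpwnam(username)
    names = set()
    for gid in os.getgrouplist(username, entry.pw_gid):
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            # A gid without a resolvable name is reported numerically.
            names.add(str(gid))
    return sorted(names)
```

Calling `groups_for_user("someuser")` on each node and comparing the results is one way to spot nodes whose directory integration differs from the rest of the cluster.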
User Management in Knox
⬢ Supports authentication using either LDAP or a federation provider
– Active Directory/LDAP
– SPNEGO/Kerberos
– PAM (Pluggable Authentication Module)
Authentication flow using Knox
Directory servers: Active Directory, FreeIPA
Services:
Ambari
WebHDFS (HDFS)
YARN RM
Stargate (Apache HBase)
Apache Oozie
Apache Hive/JDBC
Apache Hive WebHCat (Templeton)
Apache Storm
Apache Tinkerpop - Gremlin
Apache Avatica/Phoenix
Apache SOLR
Apache Livy (Spark REST Service)
Kafka REST Proxy
UI:
Name Node UI
Job History UI
YARN UI
Apache Oozie UI
Apache HBase UI
Apache Spark UI
Apache Ambari UI
Apache Ranger Admin Console
Apache Zeppelin
User Management in Ranger
⬢ Users and groups in Ranger are used for
– Configuring policies
– Authorization or access control
– Auditing
⬢ Users and groups management in Ranger is handled by the UserSync module
⬢ UserSync module's key responsibilities
– Gives Ranger access to enterprise users and groups during policy creation
– Interacts with directory servers and provides users, groups, and membership updates to Ranger
– Provides flexible configuration options
– Provides UserSync audits (introduced in HDP 3.0)
User sources
⬢ AD/LDAP
– Syncs users and groups from LDAP Organizational Units (OU)
⬢ Unix Native Users
– Syncs users and groups from /etc/passwd and /etc/group files
– Syncs users and groups provided by NSS (Name Service Switch) (introduced in HDP 3.0)
⬢ File Sources
– Syncs users and groups from a file specified in the configuration
– Supports file formats such as CSV and JSON
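As an illustration of a file-based user source, the sketch below reads user-to-group memberships from CSV and JSON text. The layouts are assumptions made for the example (CSV rows of the form `user,group1,group2,...` and a JSON object mapping each user to a list of groups); the slides do not specify the exact format, so check the Ranger documentation before relying on it. The function names are hypothetical.

```python
import csv
import io
import json


def parse_csv_source(text):
    """Parse rows of the assumed form: user,group1,group2,..."""
    mapping = {}
    for row in csv.reader(io.StringIO(text)):
        fields = [f.strip() for f in row if f.strip()]
        if not fields:
            continue  # skip blank lines
        user, groups = fields[0], fields[1:]
        # Merge groups when the same user appears on several rows.
        mapping.setdefault(user, set()).update(groups)
    return mapping


def parse_json_source(text):
    """Parse an object of the assumed form: {"user": ["group1", ...]}."""
    return {user: set(groups) for user, groups in json.loads(text).items()}
```

Both functions return a dictionary from user name to a set of group names, so the two formats can be compared or merged in the same way.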
User/Group Synchronization in Ranger
[Diagram labels: Active Directory; FreeIPA; Ranger UserSync; Sync Users/Groups; Ranger Admin; Database]
Use case
● LDAP requirements
● Multiple LDAP stores (IBM Tivoli, Lotus Notes Domino, on-prem Active Directory, Azure Active Directory)
● Single enterprise data lake cluster across multiple countries
Sample User from two different domains
Reference Architecture
Steps Involved
● Setting up a secure cluster (this demo uses HDP 3.0 on CentOS 7)
● Active Directory and FreeIPA setup
● Setting up SSSD with PAM and NSS services
● Configuring Knox with PAM + SSSD for authentication
● Configuring Ranger admin & usersync with PAM + SSSD
● Configuring services like HDFS & Hive with Kerberos authentication
SSSD configuration with AD and FreeIPA servers
Multiple domains:
– RANGER.QE.HORTONWORKS.COM is an Active Directory
– RANGERDEV.HORTONWORKS.COM is a FreeIPA server
Using fully qualified names for users and groups from RANGERDEV.HORTONWORKS.COM in order to resolve name conflicts between the two domains
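The two-domain layout described above can be sketched as a skeleton sssd.conf. The snippet below generates one with Python's `configparser`; the option names (`services`, `domains`, `id_provider`, `use_fully_qualified_names`) are standard SSSD options, but the values are illustrative assumptions, and a working configuration needs further provider-specific settings (server addresses, credentials, and so on) that are not shown on the slide. The function name `build_sssd_conf` is hypothetical.

```python
import configparser
import io

AD_DOMAIN = "RANGER.QE.HORTONWORKS.COM"   # Active Directory (from the slide)
IPA_DOMAIN = "RANGERDEV.HORTONWORKS.COM"  # FreeIPA (from the slide)


def build_sssd_conf():
    """Return skeleton sssd.conf text for the two domains (incomplete on purpose)."""
    conf = configparser.ConfigParser()
    conf["sssd"] = {
        "services": "nss, pam",
        "domains": f"{AD_DOMAIN}, {IPA_DOMAIN}",
    }
    conf[f"domain/{AD_DOMAIN}"] = {
        "id_provider": "ad",
        # Assumption: short names are kept for the Active Directory domain.
        "use_fully_qualified_names": "False",
    }
    conf[f"domain/{IPA_DOMAIN}"] = {
        "id_provider": "ipa",
        # Fully qualified names avoid clashes with same-named AD users/groups.
        "use_fully_qualified_names": "True",
    }
    out = io.StringIO()
    conf.write(out)
    return out.getvalue()
```

Printing `build_sssd_conf()` yields one `[sssd]` section and one `[domain/...]` section per directory, which is the shape SSSD expects for multiple domains.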
Knox topology config for PAM Authentication
<value>org.apache.hadoop.gateway.shirorealm.KnoxPamRealm</value>
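The slide shows only the realm class value. As a rough sketch of where it might sit, the snippet below builds an authentication provider element for a Knox topology using Python's `xml.etree.ElementTree`. The parameter names (`main.pamRealm`, `main.pamRealm.service`, `urls./**`) and the `login` PAM service are assumptions based on the usual Shiro provider conventions, not taken from the slide, so verify them against the Knox documentation for your version. The function name `build_pam_provider` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Realm class as shown on the slide.
PAM_REALM = "org.apache.hadoop.gateway.shirorealm.KnoxPamRealm"


def build_pam_provider(pam_service="login"):
    """Build a <provider> element for PAM authentication (parameter names assumed)."""
    provider = ET.Element("provider")
    ET.SubElement(provider, "role").text = "authentication"
    ET.SubElement(provider, "name").text = "ShiroProvider"
    ET.SubElement(provider, "enabled").text = "true"
    params = [
        ("main.pamRealm", PAM_REALM),
        ("main.pamRealm.service", pam_service),  # PAM service name (assumed)
        ("urls./**", "authcBasic"),              # HTTP Basic on all URLs (assumed)
    ]
    for name, value in params:
        param = ET.SubElement(provider, "param")
        ET.SubElement(param, "name").text = name
        ET.SubElement(param, "value").text = value
    return provider
```

`ET.tostring(build_pam_provider(), encoding="unicode")` gives the XML text, which would be placed inside the `<gateway>` section of a topology file.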
Ranger Configuration for PAM and NSS
./native/pamCredValidator.uexe
PAM
nss
From Ambari 2.7/HDP 3.0
Demo
Key Takeaways
● Kerberos is essential in Hadoop env
● Integration with the enterprise directory is critical for seamless security
● Centralized Security Administration
● Ranger for Centralized Authorization and Auditing
● Knox for Centralized REST/HTTP Authentication
● Multiple domains can be integrated via SSSD
References
● Apache Ranger
http://ranger.apache.org/
https://hortonworks.com/apache/ranger/
https://cwiki.apache.org/confluence/display/RANGER
● Apache Knox
http://knox.apache.org/
https://hortonworks.com/apache/knox-gateway/
https://cwiki.apache.org/confluence/display/KNOX/
● Configuring Knox with SSSD + PAM
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=66854729
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_security/content/setting_up_pam_authentication.html
● Configuring Ranger admin with SSSD + PAM
https://issues.apache.org/jira/browse/RANGER-842
● Configuring Ranger Usersync with SSSD + PAM
https://issues.apache.org/jira/browse/RANGER-827
Join Apache Ranger Community
user@ranger.apache.org
dev@ranger.apache.org

More Related Content

What's hot

Transparent Encryption in HDFS
Transparent Encryption in HDFSTransparent Encryption in HDFS
Transparent Encryption in HDFS
DataWorks Summit
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in HadoopBackup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
larsgeorge
 
Elastic search overview
Elastic search overviewElastic search overview
Elastic search overview
ABC Talks
 
Apache Hadoop Security - Ranger
Apache Hadoop Security - RangerApache Hadoop Security - Ranger
Apache Hadoop Security - Ranger
Isheeta Sanghi
 
Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the Field
DataWorks Summit
 
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
DataWorks Summit
 
Managing your Hadoop Clusters with Apache Ambari
Managing your Hadoop Clusters with Apache AmbariManaging your Hadoop Clusters with Apache Ambari
Managing your Hadoop Clusters with Apache Ambari
DataWorks Summit
 
Apache hive introduction
Apache hive introductionApache hive introduction
Apache hive introduction
Mahmood Reza Esmaili Zand
 
Ozone and HDFS's Evolution
Ozone and HDFS's EvolutionOzone and HDFS's Evolution
Ozone and HDFS's Evolution
DataWorks Summit
 
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearch
hypto
 
Apache Ranger Hive Metastore Security
Apache Ranger Hive Metastore Security Apache Ranger Hive Metastore Security
Apache Ranger Hive Metastore Security
DataWorks Summit/Hadoop Summit
 
Hadoop Security Today and Tomorrow
Hadoop Security Today and TomorrowHadoop Security Today and Tomorrow
Hadoop Security Today and Tomorrow
DataWorks Summit
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
Owen O'Malley
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
joelcrabb
 
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
Edureka!
 
Hive: Loading Data
Hive: Loading DataHive: Loading Data
Hive: Loading Data
Benjamin Leonhardi
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to Elasticsearch
Ruslan Zavacky
 
Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
Prashant Gupta
 
Introduction à ElasticSearch
Introduction à ElasticSearchIntroduction à ElasticSearch
Introduction à ElasticSearch
Fadel Chafai
 
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Simplilearn
 

What's hot (20)

Transparent Encryption in HDFS
Transparent Encryption in HDFSTransparent Encryption in HDFS
Transparent Encryption in HDFS
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in HadoopBackup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 
Elastic search overview
Elastic search overviewElastic search overview
Elastic search overview
 
Apache Hadoop Security - Ranger
Apache Hadoop Security - RangerApache Hadoop Security - Ranger
Apache Hadoop Security - Ranger
 
Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the Field
 
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
 
Managing your Hadoop Clusters with Apache Ambari
Managing your Hadoop Clusters with Apache AmbariManaging your Hadoop Clusters with Apache Ambari
Managing your Hadoop Clusters with Apache Ambari
 
Apache hive introduction
Apache hive introductionApache hive introduction
Apache hive introduction
 
Ozone and HDFS's Evolution
Ozone and HDFS's EvolutionOzone and HDFS's Evolution
Ozone and HDFS's Evolution
 
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearch
 
Apache Ranger Hive Metastore Security
Apache Ranger Hive Metastore Security Apache Ranger Hive Metastore Security
Apache Ranger Hive Metastore Security
 
Hadoop Security Today and Tomorrow
Hadoop Security Today and TomorrowHadoop Security Today and Tomorrow
Hadoop Security Today and Tomorrow
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
 
Hive: Loading Data
Hive: Loading DataHive: Loading Data
Hive: Loading Data
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to Elasticsearch
 
Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
 
Introduction à ElasticSearch
Introduction à ElasticSearchIntroduction à ElasticSearch
Introduction à ElasticSearch
 
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
 

Similar to Managing enterprise users in Hadoop ecosystem

Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...
DataWorks Summit
 
Best Practices for Enterprise User Management in Hadoop Environment
Best Practices for Enterprise User Management in Hadoop EnvironmentBest Practices for Enterprise User Management in Hadoop Environment
Best Practices for Enterprise User Management in Hadoop Environment
DataWorks Summit/Hadoop Summit
 
Curb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure ClusterCurb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure Cluster
ahortonworks
 
Saving the elephant—now, not later
Saving the elephant—now, not laterSaving the elephant—now, not later
Saving the elephant—now, not later
DataWorks Summit
 
Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!
Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!
Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!
Pardeep Kumar Mishra (Big Data / Hadoop Consultant)
 
Curb your insecurity with HDP
Curb your insecurity with HDPCurb your insecurity with HDP
Curb your insecurity with HDP
DataWorks Summit/Hadoop Summit
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
DataWorks Summit
 
Cloudy with a chance of Hadoop - real world considerations
Cloudy with a chance of Hadoop - real world considerationsCloudy with a chance of Hadoop - real world considerations
Cloudy with a chance of Hadoop - real world considerations
DataWorks Summit
 
Security and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache AtlasSecurity and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache Atlas
DataWorks Summit/Hadoop Summit
 
Paris FOD meetup - Streams Messaging Manager
Paris FOD meetup - Streams Messaging ManagerParis FOD meetup - Streams Messaging Manager
Paris FOD meetup - Streams Messaging Manager
Abdelkrim Hadjidj
 
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
DataWorks Summit
 
Hadoop & devOps : better together
Hadoop & devOps : better togetherHadoop & devOps : better together
Hadoop & devOps : better together
Maxime Lanciaux
 
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
DataWorks Summit/Hadoop Summit
 
Future of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep DiveFuture of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep Dive
Aldrin Piri
 
Running Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudRunning Enterprise Workloads in the Cloud
Running Enterprise Workloads in the Cloud
DataWorks Summit
 
Dynamic Column Masking and Row-Level Filtering in HDP
Dynamic Column Masking and Row-Level Filtering in HDPDynamic Column Masking and Row-Level Filtering in HDP
Dynamic Column Masking and Row-Level Filtering in HDP
Hortonworks
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
Chris Nauroth
 
Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...
Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...
Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...
huguk
 
An Approach for Multi-Tenancy Through Apache Knox
An Approach for Multi-Tenancy Through Apache KnoxAn Approach for Multi-Tenancy Through Apache Knox
An Approach for Multi-Tenancy Through Apache Knox
DataWorks Summit/Hadoop Summit
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
DataWorks Summit/Hadoop Summit
 

Similar to Managing enterprise users in Hadoop ecosystem (20)

Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...
 
Best Practices for Enterprise User Management in Hadoop Environment
Best Practices for Enterprise User Management in Hadoop EnvironmentBest Practices for Enterprise User Management in Hadoop Environment
Best Practices for Enterprise User Management in Hadoop Environment
 
Curb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure ClusterCurb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure Cluster
 
Saving the elephant—now, not later
Saving the elephant—now, not laterSaving the elephant—now, not later
Saving the elephant—now, not later
 
Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!
Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!
Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!
 
Curb your insecurity with HDP
Curb your insecurity with HDPCurb your insecurity with HDP
Curb your insecurity with HDP
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Cloudy with a chance of Hadoop - real world considerations
Cloudy with a chance of Hadoop - real world considerationsCloudy with a chance of Hadoop - real world considerations
Cloudy with a chance of Hadoop - real world considerations
 
Security and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache AtlasSecurity and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache Atlas
 
Paris FOD meetup - Streams Messaging Manager
Paris FOD meetup - Streams Messaging ManagerParis FOD meetup - Streams Messaging Manager
Paris FOD meetup - Streams Messaging Manager
 
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
 
Hadoop & devOps : better together
Hadoop & devOps : better togetherHadoop & devOps : better together
Hadoop & devOps : better together
 
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
 
Future of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep DiveFuture of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep Dive
 
Running Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudRunning Enterprise Workloads in the Cloud
Running Enterprise Workloads in the Cloud
 
Dynamic Column Masking and Row-Level Filtering in HDP
Dynamic Column Masking and Row-Level Filtering in HDPDynamic Column Masking and Row-Level Filtering in HDP
Dynamic Column Masking and Row-Level Filtering in HDP
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...
Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...
Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...
 
An Approach for Multi-Tenancy Through Apache Knox
An Approach for Multi-Tenancy Through Apache KnoxAn Approach for Multi-Tenancy Through Apache Knox
An Approach for Multi-Tenancy Through Apache Knox
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
 

More from DataWorks Summit

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
DataWorks Summit
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
DataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Practical NoSQL: Accumulo's dirlist Example
HBase Global Indexing to support large-scale data ingestion at Uber
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Security Framework for Multitenant Architecture
Presto: Optimizing Performance of SQL-on-Anything Engine
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Extending Twitter's Data Platform to Google Cloud
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Computer Vision: Coming to a Store Near You
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

 

Managing enterprise users in Hadoop ecosystem

  • 1. MANAGING ENTERPRISE USERS IN HADOOP ECOSYSTEM
      Sailaja Polavarapu, Staff Software Engineer, Hortonworks (spolavarapu@apache.org)
      Velmurugan Periasamy, Director, Engineering, Hortonworks (vel@apache.org)
      Dataworks Summit 2018 San Jose
      © Hortonworks Inc. 2011–2018. All rights reserved
  • 2. Agenda
      ◆ Introduction
      ◆ Enterprise User Management in Hadoop environment
      ◆ Enabling technologies
      ◆ Integrating Hadoop cluster with Multiple LDAP stores
      ◆ Key takeaways
      ◆ Demo
  • 3. Managing Enterprise Users - Why?
      Key Security Goals
      ● Confidentiality: prevent unauthorized disclosure of information
      ● Integrity: assure no unauthorized data modifications
      ● Availability: readily available info for authorized use
  • 4. Managing Enterprise Users - Why?
      ● Average cost of a data breach: $3.62M
      ● Factor in the loss of business continuity, regulatory/PR/legal issues, damage to the brand, etc.
      ● Security is serious business
      ● Enterprise user management is critical
      Source: https://www.ibm.com/security/data-breach
  • 5. Managing Enterprise Users - Foundations
      Foundations: Authentication, Authorization, Auditing, Administration, Data Protection
      Key Security Goals: Confidentiality, Integrity, Availability
  • 6. Managing Enterprise Users in Hadoop Env
      [Diagram labels: Enterprise Directory (AD/LDAP); Enterprise Users; Systems; Engg; Support; Finance; Administration; Authentication; Authorization; Kerberos; Resource Access in Hadoop Cluster]
  • 7. Kerberos
      ● Open standard authentication protocol
      ● Developed by MIT for distributed systems
      ● Kerberos is a MUST for Hadoop Env
        ○ Authenticating to the KDC provides access to all services in the REALM
      ● Directory server integration
  • 8. Apache Knox
      ● Centralized authentication with SSO
      ● Complements, does not replace, Kerberos
      ● Single access point for all REST and HTTP interactions with Apache Hadoop clusters
      ● Directory server integration
  • 9. Apache Ranger
      ● Centralized authorization & auditing for the Hadoop ecosystem
      ● Fine-grained access control with flexible ABAC models
      ● Centralized security policy administration
      ● Directory server integration
  • 10. User Management in Hadoop: Drill Down
      [Diagram labels: User accessing Hadoop resources; Authentication; Authorization]
  • 11. Requirements for User Management
      ⬢ Understanding your deployment
        – What kind of directory server(s): Active Directory, OpenLDAP server, etc.?
        – OS version of the Hadoop cluster nodes: CentOS, Ubuntu, etc.
        – User group mapping on Hadoop clusters: using SSSD, core-site.xml, manual, etc.
        – Authentication mechanism: Kerberos, Knox gateway, etc.
        – Authorization policies to be configured at user level or group level?
  • 12. User Management in Knox
      ⬢ Supports authentication using either LDAP or a Federation provider
        – Active Directory/LDAP
        – SPNEGO/Kerberos
        – PAM (Pluggable Authentication Module)
  • 13. Authentication flow using Knox
      [Diagram labels]
      Directory servers: Active Directory, FreeIPA
      Services: Ambari, WebHDFS (HDFS), Yarn RM, Stargate (Apache HBase), Apache Oozie, Apache Hive/JDBC, Apache Hive WebHCat (Templeton), Apache Storm, Apache Tinkerpop - Gremlin, Apache Avatica/Phoenix, Apache SOLR, Apache Livy (Spark REST Service), Kafka REST Proxy
      UI: Name Node UI, Job History UI, Yarn UI, Apache Oozie UI, Apache HBase UI, Apache Spark UI, Apache Ambari UI, Apache Ranger Admin Console, Apache Zeppelin
  • 14. User Management in Ranger
      ⬢ Users and Groups in Ranger are used for
        – Configuring policies
        – Authorization or access control
        – Auditing
      ⬢ Users and Groups management in Ranger is handled by the UserSync module
      ⬢ UserSync module's key responsibilities
        – Give Ranger access to enterprise users and groups during policy creation
        – Interact with directory servers and provide user, group, and membership updates to Ranger
        – Provide flexible configuration options
        – Provide UserSync audits (introduced in HDP 3.0)
  • 15. User sources
      ⬢ AD/LDAP
        – Syncs users and groups from LDAP Organizational Units (OU)
      ⬢ Unix Native Users
        – Syncs users and groups from /etc/passwd and /etc/group files
        – Syncs users and groups provided by NSS (Name Service Switch) (introduced in HDP 3.0)
      ⬢ File Sources
        – Syncs users and groups from a file specified in the configuration
        – Supports file formats such as CSV and JSON
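
The Unix native source on slide 15 reads users and groups from files in the /etc/passwd and /etc/group formats. As a rough illustration of what such a sync has to derive (user-to-group membership), here is a minimal Python sketch. It is not Ranger UserSync code; the function name `parse_unix_sources` and the sample users and groups are hypothetical, and the text is passed in as strings so the example does not depend on the local machine.

```python
from collections import defaultdict


def parse_unix_sources(passwd_text, group_text):
    """Build {user: sorted group names} from passwd- and group-format text.

    Illustrative only: this is not how Ranger UserSync is implemented.
    """
    gid_to_name = {}
    members = defaultdict(set)

    # group format: name:password:gid:comma-separated-members
    for line in group_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _password, gid, user_list = line.split(":", 3)
        gid_to_name[gid] = name
        for user in filter(None, user_list.split(",")):
            members[user].add(name)

    # passwd format: name:password:uid:gid:gecos:home:shell
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        user, primary_gid = fields[0], fields[3]
        # The primary group comes from the gid field; fall back to the raw
        # gid when no matching group entry exists.
        members[user].add(gid_to_name.get(primary_gid, primary_gid))

    return {user: sorted(groups) for user, groups in members.items()}


# Hypothetical sample data in the standard file formats.
PASSWD = """\
alice:x:1001:2001:Alice:/home/alice:/bin/bash
bob:x:1002:2002:Bob:/home/bob:/bin/bash
"""
GROUP = """\
engg:x:2001:
finance:x:2002:
hadoop-users:x:3000:alice,bob
"""

USER_GROUPS = parse_unix_sources(PASSWD, GROUP)
for user, groups in sorted(USER_GROUPS.items()):
    print(user, groups)
```

Each user ends up with the primary group from the passwd entry plus any supplementary groups listed in the group file, which is the membership information policies at group level rely on.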
  • 16. User/Group Synchronization in Ranger
      [Diagram labels: Active Directory; FreeIPA; Ranger UserSync; Sync Users/Groups; Ranger Admin; Database]
  • 17. Use case
      ● LDAP requirements
      ● Multiple LDAP stores (IBM Tivoli, Lotus Notes Domino, Active Directory on-prem, Azure Active Directory)
      ● Single Enterprise Data Lake cluster across multiple countries
  • 18. Sample User from two different domains
  • 19. Reference Architecture
  • 20. Steps Involved
      ● Setting up a secure cluster (this demo uses HDP 3.0 on CentOS 7)
      ● Active Directory and FreeIPA setup
      ● Setting up SSSD with PAM and NSS services
      ● Configuring Knox with PAM + SSSD for authentication
      ● Configuring Ranger admin & usersync with PAM + SSSD
      ● Configuring services like HDFS & Hive with Kerberos authentication
  • 21. SSSD configuration with AD and FreeIPA servers
      Multiple domains:
        – RANGER.QE.HORTONWORKS.COM is an Active Directory
        – RANGERDEV.HORTONWORKS.COM is a FreeIPA server
      Fully qualified names are used for users and groups from RANGERDEV.HORTONWORKS.COM to resolve name conflicts between the two domains
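
To see why fully qualified names resolve conflicts between the two domains on slide 21, the following Python sketch models the naming rule only. It does not call SSSD. The `name@domain` form is an assumption based on SSSD's usual fully qualified name format, and the function `resolve_names` and the sample account names are hypothetical.

```python
def resolve_names(domain_users, qualified_domains):
    """Map each visible login name to the domain that owns it.

    domain_users: {domain: [short user names]}
    qualified_domains: domains whose users are exposed as name@domain
    Raises ValueError when two domains expose the same visible name.
    """
    resolved = {}
    for domain, users in domain_users.items():
        for short_name in users:
            # Assumed format: short name, "@", then the domain name.
            if domain in qualified_domains:
                visible = f"{short_name}@{domain}"
            else:
                visible = short_name
            if visible in resolved:
                raise ValueError(
                    f"name conflict: {visible!r} exists in "
                    f"{resolved[visible]} and {domain}"
                )
            resolved[visible] = domain
    return resolved


AD_DOMAIN = "RANGER.QE.HORTONWORKS.COM"      # Active Directory (from the slide)
IPA_DOMAIN = "RANGERDEV.HORTONWORKS.COM"     # FreeIPA (from the slide)

# Hypothetical accounts: "sam" exists in both directories.
DOMAIN_USERS = {
    AD_DOMAIN: ["sam", "hive_admin"],
    IPA_DOMAIN: ["sam", "dev1"],
}

# Qualifying only the FreeIPA domain, as on the slide, removes the clash.
RESOLVED = resolve_names(DOMAIN_USERS, {IPA_DOMAIN})
for name, domain in RESOLVED.items():
    print(f"{name} -> {domain}")
```

With no domain qualified, the same input raises a `ValueError` for `sam`, which is the conflict the slide's configuration avoids.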
  • 22. Knox topology config for PAM Authentication
      <value>org.apache.hadoop.gateway.shirorealm.KnoxPamRealm</value>
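
Slide 22 shows only the realm class name from the Knox topology. The sketch below embeds a plausible authentication provider fragment and checks it with Python's standard XML parser. Only the `KnoxPamRealm` class name comes from the slide. The surrounding elements and the parameter names (`main.pamRealm`, `main.pamRealm.service`, `urls./**`) follow the Knox PAM documentation as best recalled and should be treated as assumptions, and the PAM service name `login` is a placeholder.

```python
import xml.etree.ElementTree as ET

# Assumed topology fragment; only the KnoxPamRealm value is from the slide.
TOPOLOGY = """\
<topology>
  <gateway>
    <provider>
      <role>authentication</role>
      <name>ShiroProvider</name>
      <enabled>true</enabled>
      <param>
        <name>main.pamRealm</name>
        <value>org.apache.hadoop.gateway.shirorealm.KnoxPamRealm</value>
      </param>
      <param>
        <name>main.pamRealm.service</name>
        <value>login</value>
      </param>
      <param>
        <name>urls./**</name>
        <value>authcBasic</value>
      </param>
    </provider>
  </gateway>
</topology>
"""


def auth_params(topology_xml):
    """Return the name/value params of the authentication provider, if any."""
    root = ET.fromstring(topology_xml)
    for provider in root.iter("provider"):
        if provider.findtext("role") == "authentication":
            return {
                param.findtext("name"): param.findtext("value")
                for param in provider.findall("param")
            }
    return {}


PARAMS = auth_params(TOPOLOGY)
# True when the authentication provider is wired to the PAM realm class.
USES_PAM = PARAMS.get("main.pamRealm", "").endswith("KnoxPamRealm")
print("PAM realm configured:", USES_PAM)
print("PAM service:", PARAMS.get("main.pamRealm.service"))
```

A check like this can be handy before deploying a topology, since a mistyped realm class otherwise only surfaces when authentication fails at the gateway.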
  • 23. Ranger Configuration for PAM and NSS
      [Screenshot text: ./native/pamCredValidator.uexe; PAM; nss]
      From Ambari 2.7/HDP 3.0
  • 24. Demo
  • 25. Key Takeaways
      ● Kerberos is essential in a Hadoop env
      ● Integration with the enterprise directory is critical for seamless security
      ● Centralized Security Administration
      ● Ranger for Centralized Authorization and Auditing
      ● Knox for Centralized REST/HTTP Authentication
      ● Multiple domains can be integrated via SSSD
  • 27. References
      ● Apache Ranger
        http://ranger.apache.org/
        https://hortonworks.com/apache/ranger/
        https://cwiki.apache.org/confluence/display/RANGER
      ● Apache Knox
        http://knox.apache.org/
        https://hortonworks.com/apache/knox-gateway/
        https://cwiki.apache.org/confluence/display/KNOX/
      ● Configuring Knox with SSSD + PAM
        https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=66854729
        https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_security/content/setting_up_pam_authentication.html
      ● Configuring Ranger admin with SSSD + PAM
        https://issues.apache.org/jira/browse/RANGER-842
      ● Configuring Ranger Usersync with SSSD + PAM
        https://issues.apache.org/jira/browse/RANGER-827
  • 28. Join Apache Ranger Community
      user@ranger.apache.org
      dev@ranger.apache.org