© Hortonworks Inc. 2011–2018. All rights reserved.
Sailaja Polavarapu
Staff Software
DataWorks Summit 2018, San Jose
Velmurugan Periasamy
Director, Engineering
◆ Introduction
◆ Enterprise User Management in Hadoop environment
◆ Enabling technologies
◆ Integrating Hadoop cluster with Multiple LDAP stores
◆ Key takeaways
◆ Demo
Managing Enterprise Users - Why?
Prevent unauthorized disclosure of information
Assure no unauthorized data modifications
Readily available info for authorized use
Managing Enterprise Users - Why?
● Average cost of a data breach: $3.62M
● Factor in the loss of business continuity, regulatory/PR/legal issues, damage to the brand, etc.
● Security is serious business
● Enterprise user management is critical
Managing Enterprise Users - Foundations
Security Goals
Authorization Auditing Administration Data Protection
Integrity Availability
Managing Enterprise Users in Hadoop Env
Enterprise Directory (AD/LDAP)
Users, Systems
Resource Access in Hadoop Cluster
Kerberos
● Open standard authentication protocol
● Developed by MIT for distributed systems
● Kerberos is a must for Hadoop environments
○ Authenticating to the KDC provides access to all services in the realm
● Directory server integration
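To make the realm idea concrete, here is a minimal Python sketch. It assumes the conventional `primary/instance@REALM` layout of a Kerberos principal; the helper names `parse_principal` and `same_realm` are hypothetical and not from the talk.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str             # user or service name
    instance: Optional[str]  # usually the host for service principals
    realm: str


def parse_principal(name: str) -> Principal:
    """Split 'primary[/instance]@REALM' into its parts."""
    local, sep, realm = name.rpartition("@")
    if not sep or not local or not realm:
        raise ValueError(f"not a fully qualified principal: {name!r}")
    primary, _, instance = local.partition("/")
    return Principal(primary, instance or None, realm)


def same_realm(a: str, b: str) -> bool:
    """True when both principals belong to the same realm."""
    return parse_principal(a).realm == parse_principal(b).realm
```

A user principal and a service principal that share a realm are served by the same KDC, which is why one login can reach every Kerberized service in that realm.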
Apache Knox
● Centralized authentication with SSO
● Complements, does not replace Kerberos
● Single access point for all REST and HTTP interactions with Apache Hadoop
● Directory server integration
Apache Ranger
● Centralized authorization & auditing for the Hadoop ecosystem
● Fine-grained access control with flexible ABAC models
● Centralized security policy administration
● Directory server integration
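A toy illustration of user- and group-based policy evaluation follows. It is a sketch only, not Ranger's policy engine: the `Policy` class, the prefix matching, and the sample names are assumptions, and real Ranger policies carry more (for example deny conditions).

```python
from dataclasses import dataclass, field
from typing import Iterable, Set


@dataclass
class Policy:
    resource: str                  # resource prefix the policy covers
    accesses: Set[str]             # access types granted, e.g. {"read"}
    users: Set[str] = field(default_factory=set)
    groups: Set[str] = field(default_factory=set)


def is_allowed(policies: Iterable[Policy], user: str,
               user_groups: Iterable[str], resource: str, access: str) -> bool:
    """Allow when some policy covers the resource and access type
    for the user or one of the user's groups; deny otherwise."""
    member_of = set(user_groups)
    for p in policies:
        if resource.startswith(p.resource) and access in p.accesses:
            if user in p.users or p.groups & member_of:
                return True
    return False
```

Granting access to groups rather than individual users keeps policies stable while directory membership changes, which is why directory integration matters here.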
User Management in Hadoop: Drill Down
User accessing Hadoop
Requirements for User Management
⬢ Understanding your deployment
– What kind of directory server(s): Active Directory, OpenLDAP server, etc.?
– OS version of the Hadoop cluster nodes: CentOS, Ubuntu, etc.
– User-group mapping on Hadoop clusters: using SSSD, core-site.xml, manual, etc.
– Authentication mechanism: Kerberos, Knox gateway, etc.
– Authorization policies to be configured at the user level or group level
User Management in Knox
⬢ Supports authentication using either an LDAP or a federation provider
– Active Directory/LDAP
– SPNEGO/Kerberos
– PAM (Pluggable Authentication Module)
Authentication flow using Knox
Yarn RM
Stargate (Apache HBase)
Apache Oozie
Apache Hive/JDBC
Apache Hive WebHCat
Apache Storm
Apache Tinkerpop - Gremlin
Apache Avatica/Phoenix
Apache SOLR
Apache Livy (Spark REST
Kafka REST Proxy
Name Node UI
Job History UI
Yarn UI
Apache Oozie UI
Apache HBase UI
Apache Spark UI
Apache Ambari UI
Apache Ranger Admin
Apache Zeppelin
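For a client, the flow above comes down to calling the gateway instead of the service. The sketch below builds a WebHDFS URL routed through Knox plus an HTTP Basic header. The host, the `default` topology name, port 8443, and both helper names are assumptions for illustration; check them against your own gateway.

```python
import base64
from urllib.parse import quote, urlencode


def knox_webhdfs_url(host: str, topology: str, path: str, op: str,
                     port: int = 8443) -> str:
    """Build a WebHDFS URL that goes through a Knox gateway topology."""
    clean = quote(path.lstrip("/"))
    query = urlencode({"op": op})
    return f"https://{host}:{port}/gateway/{topology}/webhdfs/v1/{clean}?{query}"


def basic_auth_header(user: str, password: str) -> dict:
    """HTTP Basic credentials, which Knox checks against its configured provider."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

The client never needs a Kerberos ticket in this setup: Knox authenticates the caller and then talks to the Kerberized service on the caller's behalf.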
User Management in Ranger
⬢ Users and Groups in Ranger are used for
– Configuring policies
– Authorization or access control
– Auditing
⬢ Users and groups management in Ranger is handled by the UserSync module
⬢ UserSync module's key responsibilities
– Gives Ranger access to enterprise users and groups during policy creation
– Interacts with directory servers and provides user, group, and membership updates to Ranger
– Provides flexible configuration options
– Provides UserSync audits (introduced in HDP 3.0)
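The "membership updates" responsibility can be pictured as a diff between two snapshots. This is an illustrative sketch, not UserSync's actual code: the function name `sync_delta` and the `{user: set_of_groups}` snapshot shape are assumptions.

```python
from typing import Dict, List, Set, Tuple

Snapshot = Dict[str, Set[str]]  # user name -> group names


def sync_delta(directory: Snapshot, known: Snapshot) -> Tuple[Snapshot, Snapshot, List[str]]:
    """Compare the directory snapshot with what is already known.

    Returns (users to add, users whose groups changed,
    users no longer present in the directory).
    """
    added = {u: g for u, g in directory.items() if u not in known}
    changed = {u: g for u, g in directory.items()
               if u in known and known[u] != g}
    missing = sorted(set(known) - set(directory))
    return added, changed, missing
```

Sending only the delta keeps each sync cycle small even when the directory holds many users.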
User sources
⬢ LDAP/AD
– Syncs users and groups from LDAP Organizational Units (OU)
⬢ Unix Native Users
– Syncs users and groups from the /etc/passwd and /etc/group files
– Syncs users and groups provided by NSS (Name Service Switch) (introduced in HDP 3.0)
⬢ File Sources
– Syncs users and groups from a file specified in the configuration
– Supports file formats such as CSV and JSON
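A file source is easy to picture with a small parser. The layouts assumed here (CSV rows of a user followed by its groups, and a JSON object mapping each user to a list of groups) are illustrative assumptions; confirm the exact format your Ranger version expects before relying on them.

```python
import csv
import io
import json
from typing import Dict, Set


def parse_csv_source(text: str) -> Dict[str, Set[str]]:
    """Each row: a user name followed by zero or more group names."""
    mapping: Dict[str, Set[str]] = {}
    for row in csv.reader(io.StringIO(text)):
        cells = [c.strip() for c in row if c.strip()]
        if cells:
            mapping.setdefault(cells[0], set()).update(cells[1:])
    return mapping


def parse_json_source(text: str) -> Dict[str, Set[str]]:
    """A JSON object mapping each user name to a list of group names."""
    return {user: set(groups) for user, groups in json.loads(text).items()}
```

Both parsers return the same user-to-groups mapping, so the rest of a sync pipeline does not need to know which file format was used.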
User/Group Synchronization in Ranger
Use case
● LDAP requirements
● Multiple LDAP stores (IBM Tivoli, Lotus Notes Domino, on-prem Active Directory, Azure Active Directory)
● Single Enterprise Data Lake Cluster across multiple
Sample User from two different domains
Reference Architecture
Steps Involved
● Setting up a secure cluster (this demo uses HDP 3.0 on CentOS 7)
● Active Directory and FreeIPA setup
● Setting up SSSD with PAM and NSS services
● Configuring Knox with PAM + SSSD for authentication
● Configuring Ranger admin & usersync with PAM +
● Configuring services like HDFS & Hive with Kerberos
SSSD configuration with AD and FreeIPA servers
Multiple Domains:
is an Active Directory
is a FreeIPA server
Using Fully qualified names for users
and groups from
in order to resolve name conflicts
between two domains
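A two-domain SSSD setup of this kind can be sketched by generating an `sssd.conf` skeleton. The domain names below are hypothetical placeholders (the slide's own names did not survive), and a working file needs more per-domain settings than shown; only `services`, `domains`, `id_provider`, `auth_provider`, and `use_fully_qualified_names` are set here.

```python
import configparser
import io
from typing import Dict


def build_sssd_conf(domains: Dict[str, str]) -> str:
    """Render a minimal sssd.conf for the given {domain_name: provider} map."""
    conf = configparser.ConfigParser()
    conf.optionxform = str  # keep option names exactly as written
    conf["sssd"] = {
        "services": "nss, pam",
        "domains": ", ".join(domains),
    }
    for name, provider in domains.items():
        conf[f"domain/{name}"] = {
            "id_provider": provider,
            "auth_provider": provider,
            # user@domain names avoid clashes between the two directories
            "use_fully_qualified_names": "True",
        }
    out = io.StringIO()
    conf.write(out)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical domains: one Active Directory, one FreeIPA.
    print(build_sssd_conf({"ad.example.com": "ad", "ipa.example.com": "ipa"}))
```

With fully qualified names enabled, a user called `alice` in both directories resolves as two distinct accounts, one per domain.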
Knox topology config for PAM Authentication
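The slide's own topology did not survive, so the following is a hedged reconstruction of what a PAM authentication provider in a Knox topology typically looks like, generated with Python. The realm class name is written with the `org.apache.knox` package used by Knox 1.x (older releases use `org.apache.hadoop.gateway`), and the PAM service name `login` and the session timeout are assumptions.

```python
import xml.etree.ElementTree as ET

# Assumed Shiro parameters for PAM authentication; verify against your Knox version.
PAM_PARAMS = [
    ("sessionTimeout", "30"),
    ("main.pamRealm", "org.apache.knox.gateway.shirorealm.KnoxPamRealm"),
    ("main.pamRealm.service", "login"),  # name of a file under /etc/pam.d
    ("urls./**", "authcBasic"),
]


def pam_topology_xml(params=PAM_PARAMS) -> str:
    """Build the <gateway> section of a topology with a PAM-backed ShiroProvider."""
    topology = ET.Element("topology")
    gateway = ET.SubElement(topology, "gateway")
    provider = ET.SubElement(gateway, "provider")
    ET.SubElement(provider, "role").text = "authentication"
    ET.SubElement(provider, "name").text = "ShiroProvider"
    ET.SubElement(provider, "enabled").text = "true"
    for name, value in params:
        param = ET.SubElement(provider, "param")
        ET.SubElement(param, "name").text = name
        ET.SubElement(param, "value").text = value
    return ET.tostring(topology, encoding="unicode")


if __name__ == "__main__":
    print(pam_topology_xml())
```

Because PAM on the gateway host delegates to SSSD, Knox can authenticate users from every domain SSSD knows about without holding any LDAP settings itself.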
Ranger Configuration for PAM and NSS
From Ambari 2.7/HDP 3.0
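The settings involved can be summarized as properties. The names and values below are recalled from Ranger's configuration and should be treated as assumptions to verify in Ambari; the helper functions are hypothetical and only render and re-read a `key=value` listing.

```python
from typing import Dict

# Assumed Ranger Admin setting: authenticate UI/REST logins through PAM.
RANGER_ADMIN = {"ranger.authentication.method": "PAM"}

# Assumed UserSync settings: use the Unix source and enumerate users via NSS.
RANGER_USERSYNC = {
    "ranger.usersync.source.impl.class":
        "org.apache.ranger.unixusersync.process.UnixUserGroupBuilder",
    "ranger.usersync.unix.backend": "nss",
}


def to_properties(settings: Dict[str, str]) -> str:
    """Render settings as sorted key=value lines."""
    return "".join(f"{key}={value}\n" for key, value in sorted(settings.items()))


def from_properties(text: str) -> Dict[str, str]:
    """Parse key=value lines, skipping blanks and '#' comments."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            key, _, value = line.partition("=")
            result[key.strip()] = value.strip()
    return result
```

With NSS as the backend, UserSync sees whatever SSSD exposes on the host, so users from both domains reach Ranger through one source.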
Key Takeaways
● Kerberos is essential in Hadoop env
● Integration with enterprise directory critical for seamless security
● Centralized Security Administration
● Ranger for Centralized Authorization and Auditing
● Knox for Centralized REST/HTTP Authentication
● Multiple domains can be integrated via SSSD
Join Apache Ranger Community

More Related Content

What's hot

Transparent Encryption in HDFS
Transparent Encryption in HDFSTransparent Encryption in HDFS
Transparent Encryption in HDFS
DataWorks Summit
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in HadoopBackup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
Elastic search overview
Elastic search overviewElastic search overview
Elastic search overview
ABC Talks
Apache Hadoop Security - Ranger
Apache Hadoop Security - RangerApache Hadoop Security - Ranger
Apache Hadoop Security - Ranger
Isheeta Sanghi
Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the Field
DataWorks Summit
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
DataWorks Summit
Managing your Hadoop Clusters with Apache Ambari
Managing your Hadoop Clusters with Apache AmbariManaging your Hadoop Clusters with Apache Ambari
Managing your Hadoop Clusters with Apache Ambari
DataWorks Summit
Apache hive introduction
Apache hive introductionApache hive introduction
Apache hive introduction
Mahmood Reza Esmaili Zand
Ozone and HDFS's Evolution
Ozone and HDFS's EvolutionOzone and HDFS's Evolution
Ozone and HDFS's Evolution
DataWorks Summit
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearch
Apache Ranger Hive Metastore Security
Apache Ranger Hive Metastore Security Apache Ranger Hive Metastore Security
Apache Ranger Hive Metastore Security
DataWorks Summit/Hadoop Summit
Hadoop Security Today and Tomorrow
Hadoop Security Today and TomorrowHadoop Security Today and Tomorrow
Hadoop Security Today and Tomorrow
DataWorks Summit
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
Owen O'Malley
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
Hive: Loading Data
Hive: Loading DataHive: Loading Data
Hive: Loading Data
Benjamin Leonhardi
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to Elasticsearch
Ruslan Zavacky
Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
Prashant Gupta
Introduction à ElasticSearch
Introduction à ElasticSearchIntroduction à ElasticSearch
Introduction à ElasticSearch
Fadel Chafai
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...

What's hot (20)

Transparent Encryption in HDFS
Transparent Encryption in HDFSTransparent Encryption in HDFS
Transparent Encryption in HDFS
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in HadoopBackup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
Elastic search overview
Elastic search overviewElastic search overview
Elastic search overview
Apache Hadoop Security - Ranger
Apache Hadoop Security - RangerApache Hadoop Security - Ranger
Apache Hadoop Security - Ranger
Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the Field
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
Managing your Hadoop Clusters with Apache Ambari
Managing your Hadoop Clusters with Apache AmbariManaging your Hadoop Clusters with Apache Ambari
Managing your Hadoop Clusters with Apache Ambari
Apache hive introduction
Apache hive introductionApache hive introduction
Apache hive introduction
Ozone and HDFS's Evolution
Ozone and HDFS's EvolutionOzone and HDFS's Evolution
Ozone and HDFS's Evolution
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearch
Apache Ranger Hive Metastore Security
Apache Ranger Hive Metastore Security Apache Ranger Hive Metastore Security
Apache Ranger Hive Metastore Security
Hadoop Security Today and Tomorrow
Hadoop Security Today and TomorrowHadoop Security Today and Tomorrow
Hadoop Security Today and Tomorrow
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
Hive: Loading Data
Hive: Loading DataHive: Loading Data
Hive: Loading Data
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to Elasticsearch
Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
Introduction à ElasticSearch
Introduction à ElasticSearchIntroduction à ElasticSearch
Introduction à ElasticSearch
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...

Similar to Managing enterprise users in Hadoop ecosystem

Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...
DataWorks Summit
Best Practices for Enterprise User Management in Hadoop Environment
Best Practices for Enterprise User Management in Hadoop EnvironmentBest Practices for Enterprise User Management in Hadoop Environment
Best Practices for Enterprise User Management in Hadoop Environment
DataWorks Summit/Hadoop Summit
Curb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure ClusterCurb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure Cluster
Saving the elephant—now, not later
Saving the elephant—now, not laterSaving the elephant—now, not later
Saving the elephant—now, not later
DataWorks Summit
Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!
Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!
Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!
Pardeep Kumar Mishra (Big Data / Hadoop Consultant)
Curb your insecurity with HDP
Curb your insecurity with HDPCurb your insecurity with HDP
Curb your insecurity with HDP
DataWorks Summit/Hadoop Summit
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
DataWorks Summit
Cloudy with a chance of Hadoop - real world considerations
Cloudy with a chance of Hadoop - real world considerationsCloudy with a chance of Hadoop - real world considerations
Cloudy with a chance of Hadoop - real world considerations
DataWorks Summit
Security and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache AtlasSecurity and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache Atlas
DataWorks Summit/Hadoop Summit
Paris FOD meetup - Streams Messaging Manager
Paris FOD meetup - Streams Messaging ManagerParis FOD meetup - Streams Messaging Manager
Paris FOD meetup - Streams Messaging Manager
Abdelkrim Hadjidj
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
DataWorks Summit
Hadoop & devOps : better together
Hadoop & devOps : better togetherHadoop & devOps : better together
Hadoop & devOps : better together
Maxime Lanciaux
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
DataWorks Summit/Hadoop Summit
Future of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep DiveFuture of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep Dive
Aldrin Piri
Running Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudRunning Enterprise Workloads in the Cloud
Running Enterprise Workloads in the Cloud
DataWorks Summit
Dynamic Column Masking and Row-Level Filtering in HDP
Dynamic Column Masking and Row-Level Filtering in HDPDynamic Column Masking and Row-Level Filtering in HDP
Dynamic Column Masking and Row-Level Filtering in HDP
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
Chris Nauroth
Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...
Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...
Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...
An Approach for Multi-Tenancy Through Apache Knox
An Approach for Multi-Tenancy Through Apache KnoxAn Approach for Multi-Tenancy Through Apache Knox
An Approach for Multi-Tenancy Through Apache Knox
DataWorks Summit/Hadoop Summit
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
DataWorks Summit/Hadoop Summit

Similar to Managing enterprise users in Hadoop ecosystem (20)

Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...
Best Practices for Enterprise User Management in Hadoop Environment
Best Practices for Enterprise User Management in Hadoop EnvironmentBest Practices for Enterprise User Management in Hadoop Environment
Best Practices for Enterprise User Management in Hadoop Environment
Curb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure ClusterCurb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure Cluster
Saving the elephant—now, not later
Saving the elephant—now, not laterSaving the elephant—now, not later
Saving the elephant—now, not later
Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!
Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!
Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!
Curb your insecurity with HDP
Curb your insecurity with HDPCurb your insecurity with HDP
Curb your insecurity with HDP
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
Cloudy with a chance of Hadoop - real world considerations
Cloudy with a chance of Hadoop - real world considerationsCloudy with a chance of Hadoop - real world considerations
Cloudy with a chance of Hadoop - real world considerations
Security and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache AtlasSecurity and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache Atlas
Paris FOD meetup - Streams Messaging Manager
Paris FOD meetup - Streams Messaging ManagerParis FOD meetup - Streams Messaging Manager
Paris FOD meetup - Streams Messaging Manager
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Hadoop & devOps : better together
Hadoop & devOps : better togetherHadoop & devOps : better together
Hadoop & devOps : better together
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Future of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep DiveFuture of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep Dive
Running Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudRunning Enterprise Workloads in the Cloud
Running Enterprise Workloads in the Cloud
Dynamic Column Masking and Row-Level Filtering in HDP
Dynamic Column Masking and Row-Level Filtering in HDPDynamic Column Masking and Row-Level Filtering in HDP
Dynamic Column Masking and Row-Level Filtering in HDP
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...
Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...
Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...
An Approach for Multi-Tenancy Through Apache Knox
An Approach for Multi-Tenancy Through Apache KnoxAn Approach for Multi-Tenancy Through Apache Knox
An Approach for Multi-Tenancy Through Apache Knox
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future

More from DataWorks Summit

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
DataWorks Summit
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
DataWorks Summit
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
Presto: Optimizing Performance of SQL-on-Anything Engine
Managing enterprise users in Hadoop ecosystem

  • 1. MANAGING ENTERPRISE USERS IN HADOOP ECOSYSTEM
      Sailaja Polavarapu, Staff Software Engineer, Hortonworks
      Velmurugan Periasamy, Director, Engineering, Hortonworks
      Dataworks Summit 2018, San Jose
      © Hortonworks Inc. 2011–2018. All rights reserved
  • 2. Agenda
      ◆ Introduction
      ◆ Enterprise User Management in Hadoop environment
      ◆ Enabling technologies
      ◆ Integrating Hadoop cluster with Multiple LDAP stores
      ◆ Key takeaways
      ◆ Demo
  • 3. Managing Enterprise Users - Why? Key Security Goals
      ● Confidentiality: Prevent unauthorized disclosure of information
      ● Integrity: Assure no unauthorized data modifications
      ● Availability: Readily available info for authorized use
  • 4. Managing Enterprise Users - Why?
      ● Average cost of a data breach: $3.62M
      ● Factor in the loss of business continuity, regulatory/PR/legal issues, damage to the brand, etc.
      ● Security is serious business
      ● Enterprise user management is critical
      Source:
  • 5. Managing Enterprise Users - Foundations
      Foundations: Authentication, Authorization, Auditing, Administration, Data Protection
      Key Security Goals: Confidentiality, Integrity, Availability
  • 6. Managing Enterprise Users in Hadoop Env
      [Diagram labels: Enterprise Directory (AD/LDAP); Enterprise Users; Systems; Engg; Support; Finance; Administration; Authentication; Authorization; Kerberos; Resource Access in Hadoop Cluster]
  • 7. Kerberos
      ● Open standard authentication protocol
      ● Developed by MIT for distributed systems
      ● Kerberos is a MUST for Hadoop Env
          ○ Authenticating to the KDC provides access to all services in the REALM
      ● Directory server integration
  • 8. Apache Knox
      ● Centralized authentication with SSO
      ● Complements, does not replace Kerberos
      ● Single access point for all REST and HTTP interactions with Apache Hadoop clusters
      ● Directory server integration
  • 9. Apache Ranger
      ● Centralized authorization & auditing for Hadoop ecosystem
      ● Fine-grained access control with flexible ABAC models
      ● Centralized security policy administration
      ● Directory server integration
  • 10. User Management in Hadoop: Drill Down
      [Diagram labels: User accessing Hadoop resources; Authentication; Authorization]
  • 11. Requirements for User Management
      ⬢ Understanding your deployment
          – What kind of directory server(s): Active Directory, OpenLDAP server, etc.?
          – OS version of the Hadoop cluster nodes: CentOS, Ubuntu, etc.
          – User group mapping on Hadoop clusters: using SSSD, core-site.xml, manual, etc.
          – Authentication mechanism: Kerberos, Knox gateway, etc.
          – Authorization policies to be configured at user level or group level?
  • 12. User Management in Knox
      ⬢ Supports authentication using either LDAP or Federation provider
          – Active Directory/LDAP
          – SPNEGO/Kerberos
          – PAM (Pluggable Authentication Module)
  • 13. Authentication flow using Knox
      Directory servers: Active Directory, FreeIPA
      Services: Ambari, WebHDFS (HDFS), Yarn RM, Stargate (Apache HBase), Apache Oozie, Apache Hive/JDBC, Apache Hive WebHCat (Templeton), Apache Storm, Apache Tinkerpop - Gremlin, Apache Avatica/Phoenix, Apache SOLR, Apache Livy (Spark REST Service), Kafka REST Proxy
      UI: Name Node UI, Job History UI, Yarn UI, Apache Oozie UI, Apache HBase UI, Apache Spark UI, Apache Ambari UI, Apache Ranger Admin Console, Apache Zeppelin
  • 14. User Management in Ranger
      ⬢ Users and Groups in Ranger are used for
          – Configuring policies
          – Authorization or access control
          – Auditing
      ⬢ Users and Groups management in Ranger is handled by the UserSync module
      ⬢ UserSync module's key responsibilities
          – Give Ranger access to enterprise users and groups during policy creation
          – Interact with directory servers and provide users, groups, and membership updates to Ranger
          – Provide flexible configuration options
          – Provide Usersync audits (introduced in HDP 3.0)
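The "membership updates" responsibility on slide 14 can be pictured as a diff between two snapshots of user-to-group memberships. The following is a minimal sketch of that idea only; it is not Ranger UserSync's actual implementation, and the function name and data shapes are hypothetical.

```python
def membership_delta(previous, current):
    """Compare two snapshots shaped as {user: set_of_groups}.

    Returns (added, removed), each mapping a user to the groups that were
    gained or lost between the snapshots. Hypothetical helper for illustration.
    """
    added, removed = {}, {}
    for user in previous.keys() | current.keys():
        before = previous.get(user, set())
        after = current.get(user, set())
        if after - before:
            added[user] = after - before
        if before - after:
            removed[user] = before - after
    return added, removed
```

A sync loop built on this idea would only need to push the two returned dictionaries to the policy store, rather than the full user list on every run.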
  • 15. User sources
      ⬢ AD/LDAP
          – Syncs users and groups from LDAP Organizational Units (OU)
      ⬢ Unix Native Users
          – Syncs users and groups from /etc/passwd and /etc/group files
          – Syncs users and groups provided by NSS (Name Service Switch) (introduced in HDP 3.0)
      ⬢ File Sources
          – Syncs users and groups from a file specified in the configuration
          – Supports file formats such as CSV, JSON, etc.
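To make the Unix-native source on slide 15 concrete, here is a small sketch that derives user-to-group memberships from passwd-style and group-style entries. It assumes nothing about how Ranger does this internally; `PasswdEntry`, `GroupEntry`, `build_memberships`, and `load_from_nss` are hypothetical names. The loader relies on Python's standard `pwd` and `grp` modules, which answer through the system's name service, so on a host where NSS is backed by SSSD they may also return directory users (subject to the host's enumeration settings).

```python
from typing import NamedTuple, Tuple


class PasswdEntry(NamedTuple):
    name: str
    gid: int  # primary group id


class GroupEntry(NamedTuple):
    name: str
    gid: int
    members: Tuple[str, ...]  # supplementary members


def build_memberships(users, groups):
    """Return {user: sorted list of group names} from passwd/group entries."""
    gid_to_group = {g.gid: g.name for g in groups}
    memberships = {u.name: set() for u in users}
    for u in users:
        primary = gid_to_group.get(u.gid)
        if primary is not None:
            memberships[u.name].add(primary)
    for g in groups:
        for member in g.members:
            if member in memberships:  # ignore members with no passwd entry
                memberships[member].add(g.name)
    return {name: sorted(gs) for name, gs in memberships.items()}


def load_from_nss():
    """Read live entries via the standard pwd/grp modules (Unix only)."""
    import grp
    import pwd

    users = [PasswdEntry(p.pw_name, p.pw_gid) for p in pwd.getpwall()]
    groups = [GroupEntry(g.gr_name, g.gr_gid, tuple(g.gr_mem)) for g in grp.getgrall()]
    return users, groups
```

On a cluster node, `build_memberships(*load_from_nss())` would give a quick view of what a Unix-based sync could see.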
  • 16. User/Group Synchronization in Ranger
      [Diagram labels: Active Directory; FreeIPA; Ranger UserSync; Sync Users/Groups; Ranger Admin; Database]
  • 17. Use case
      ● LDAP requirements
      ● Multiple LDAP (IBM Tivoli, Lotus Notes Domino, Active Directory On-prem, Azure Active Directory)
      ● Single Enterprise Data Lake Cluster across multiple countries
  • 18. Sample User from two different domains
  • 19. Reference Architecture
  • 20. Steps Involved
      ● Setting up secure cluster (HDP version used for this demo is 3.0 on CentOS 7)
      ● Active Directory and FreeIPA setup
      ● Setting up SSSD with PAM and NSS services
      ● Configuring Knox with PAM + SSSD for authentication
      ● Configuring Ranger admin & usersync with PAM + SSSD
      ● Configuring services like HDFS & Hive with Kerberos authentication
  • 21. SSSD configuration with AD and FreeIPA servers
      Multiple domains:
          – RANGER.QE.HORTONWORKS.COM is an Active Directory
          – RANGERDEV.HORTONWORKS.COM is a FreeIPA server
      Fully qualified names are used for users and groups from RANGERDEV.HORTONWORKS.COM in order to resolve name conflicts between the two domains
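The conflict-resolution idea on slide 21 can be sketched as follows: names from one domain stay short, while names from the other are made fully qualified so that identical short names no longer collide. This is an illustration only, not SSSD code; the `name@domain` format, the function name `merge_users`, and the sample user names are assumptions.

```python
AD_DOMAIN = "RANGER.QE.HORTONWORKS.COM"    # Active Directory (from the slide)
IPA_DOMAIN = "RANGERDEV.HORTONWORKS.COM"   # FreeIPA (from the slide)


def merge_users(users_by_domain, qualified_domains):
    """Merge users from several domains into one namespace.

    users_by_domain: {domain: iterable of short user names}
    qualified_domains: domains whose users are exposed as name@domain
    Returns {visible_name: domain}; raises ValueError on a name conflict.
    """
    merged = {}
    for domain, names in users_by_domain.items():
        for name in names:
            visible = f"{name}@{domain}" if domain in qualified_domains else name
            if visible in merged:
                raise ValueError(
                    f"name conflict: {visible!r} exists in {merged[visible]} and {domain}"
                )
            merged[visible] = domain
    return merged
```

With hypothetical users `sam` and `tom` in the Active Directory domain and another `sam` in FreeIPA, calling `merge_users({AD_DOMAIN: ["sam", "tom"], IPA_DOMAIN: ["sam"]}, {IPA_DOMAIN})` keeps both `sam` accounts distinct, whereas passing an empty set of qualified domains raises the conflict error.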
  • 22. Knox topology config for PAM Authentication
      <value>org.apache.hadoop.gateway.shirorealm.KnoxPamRealm</value>
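Slide 22 shows only the realm class value from the topology. As a rough sketch of where such a value typically sits, the snippet below embeds a sample authentication provider and checks that it points at the PAM realm. Only the `KnoxPamRealm` class name comes from the slide; the surrounding parameter names (`main.pamRealm`, `main.pamRealm.service`, `urls./**`), the `login` PAM service, and the helper functions are assumptions modeled on a typical Shiro provider layout and should be verified against the Knox documentation for the version in use.

```python
import xml.etree.ElementTree as ET

# Sample topology fragment. Everything except the KnoxPamRealm class name
# is an assumption for illustration.
SAMPLE_TOPOLOGY = """\
<topology>
  <gateway>
    <provider>
      <role>authentication</role>
      <name>ShiroProvider</name>
      <enabled>true</enabled>
      <param>
        <name>main.pamRealm</name>
        <value>org.apache.hadoop.gateway.shirorealm.KnoxPamRealm</value>
      </param>
      <param>
        <name>main.pamRealm.service</name>
        <value>login</value>
      </param>
      <param>
        <name>urls./**</name>
        <value>authcBasic</value>
      </param>
    </provider>
  </gateway>
</topology>
"""


def authentication_params(topology_xml):
    """Return {param name: value} for the first authentication provider."""
    root = ET.fromstring(topology_xml)
    for provider in root.iter("provider"):
        if provider.findtext("role") == "authentication":
            return {
                p.findtext("name"): p.findtext("value")
                for p in provider.findall("param")
            }
    return {}


def uses_pam_realm(topology_xml):
    """True if any authentication param value names the KnoxPamRealm class."""
    params = authentication_params(topology_xml)
    return any(v is not None and v.endswith("KnoxPamRealm") for v in params.values())
```

A check like `uses_pam_realm(open(path).read())` could serve as a quick sanity test before redeploying a topology, with `path` pointing at the topology file of the deployment in question.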
  • 23. Ranger Configuration for PAM and NSS (from Ambari 2.7/HDP 3.0)
      [Screenshot values: ./native/pamCredValidator.uexe; PAM; nss]
  • 24. Demo
  • 25. Key Takeaways
      ● Kerberos is essential in Hadoop env
      ● Integration with enterprise directory critical for seamless security
      ● Centralized Security Administration
      ● Ranger for Centralized Authorization and Auditing
      ● Knox for Centralized REST/HTTP Authentication
      ● Multiple domains can be integrated via SSSD
  • 27. References
      ● Apache Ranger -
      ● Apache Knox -
      ● Configuring Knox with SSSD + PAM - 2.6.0/bk_security/content/setting_up_pam_authentication.html
      ● Configuring Ranger admin with SSSD + PAM -
      ● Configuring Ranger Usersync with SSSD + PAM -
  • 28. Join Apache Ranger Community