Practical Cloud Security
- 2. Agenda
• Background and Disclaimers
• Netflix in the Cloud
• Model-Driven Deployment Architecture
• APIs, Automation, and the Security Monkey
• Cloud Firewall and Connectivity Analysis
• Practical Cloud Security Gaps
Tuesday, October 11, 2011
- 9. Background and Disclaimers
• No cloud definitions, but . . .
• Focus on IaaS
• Netflix uses Amazon Web Services
• Guidance should be generally applicable
• Works in progress, still many problems to solve . . .
- 15. [Slide title garbled in source]
Netflix could not build data centers fast enough
Capacity requirements accelerating, unpredictable
Product launch spikes - iPhone, Wii, PS2, Xbox
- 19. Outgrowing Data Center
http://techblog.netflix.com/2011/02/redesigning-netflix-api.html
Netflix API: Growth in Requests
37x Growth 1/10 - 1/11
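As a rough worked figure for the growth above, here is the implied monthly rate. This assumes steady month-over-month compounding across the 12 months from 1/10 to 1/11, which the slide does not state; it gives only the 37x total.

```python
# Assumption: growth compounds evenly over 12 months.
# The slide reports only the 37x total, not the shape of the curve.
total_growth = 37.0
months = 12

# Factor by which request volume would multiply each month.
monthly_factor = total_growth ** (1 / months)
monthly_percent = (monthly_factor - 1) * 100

print(f"Implied monthly growth factor: {monthly_factor:.3f}")
print(f"Implied monthly growth: {monthly_percent:.1f}%")
```

Under that assumption, request volume grows by roughly a third every month, which helps explain why fixed data center capacity was hard to plan.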
- 21. netflix.com is now ~100% Cloud
Remaining components being migrated
- 28. Data Center Patterns
• Long-lived, non-elastic systems
• Push code and config to running systems
• Difficult to enforce deployment patterns
• ‘Snowflake phenomenon’
• Difficult to sync or reproduce environments (e.g. test and prod)
- 34. Cloud Patterns
• Ephemeral nodes
• Dynamic scaling
• Hardware is abstracted
• Orchestration vs. manual steps
• Trivial to clone environments
- 38. When Moving to the Cloud, Leave Old Ways Behind . . .
Generic forklift is generally a mistake
Adapt development, deployment, and management models appropriately
- 48. Netflix Build and Deploy
http://techblog.netflix.com/2011/08/building-with-legos.html
• Perforce - SCM
• Jenkins - Continuous Integration
• Artifactory - Binary Repository
• Yum - App-Specific Packages and Configuration
• Bakery - Combine Base and App-Specific Configuration
• AMI - Customized, Cloud-Ready Image
• ASG - Dynamic Scaling
• Instance - Live System!
Every change is a new push
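The "every change is a new push" idea can be sketched as a pure function from inputs to an image, plus a push that creates a new group rather than editing running nodes. This is an illustration only, not Netflix's tooling: `Image`, `bake`, `push`, the ID scheme, and the package names are all hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    """Immutable image record (stands in for an AMI in the pipeline above)."""
    image_id: str
    base: str
    packages: tuple


def bake(base: str, packages: dict) -> Image:
    """Combine a base image name with app-specific packages into a new image.

    The ID is derived from the contents, so any change yields a different image.
    """
    pinned = tuple(sorted(packages.items()))
    digest = hashlib.sha256(json.dumps([base, pinned]).encode()).hexdigest()[:12]
    return Image(image_id=f"img-{digest}", base=base, packages=pinned)


def push(image: Image, desired_size: int) -> dict:
    """Describe a new scaling group for the image; running nodes are never edited."""
    return {"group": f"asg-{image.image_id}", "image_id": image.image_id, "size": desired_size}


v1 = bake("base-2011.10", {"api-server": "1.0.0"})
v2 = bake("base-2011.10", {"api-server": "1.0.1"})  # a small change still means a new image
print(push(v1, 3))
print(push(v2, 3))
```

Because the image is frozen and identified by its contents, two deployments with the same ID are known to run the same bits, which is what removes the need to patch or reconfigure live systems.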
- 53. Results
• No changes to running systems
• No CMDB
• No systems management infrastructure
• Fewer logins to prod systems
- 58. Impact on Security
• File integrity monitoring
• User activity monitoring
• Vulnerability management
• Patch management
- 63. Common Challenges for Security Engineers
• Lots of data from different sources, in different formats
• Too many administrative interfaces and disconnected systems
• Too few options for scalable automation
- 75. How do you . . .
• Add a user account? • CreateUser()
• Inventory systems? • DescribeInstances()
• Change a firewall config? • AuthorizeSecurityGroupIngress()
• Snapshot a drive for forensic analysis? • CreateSnapshot()
• Disable a multi-factor authentication token? • DeactivateMFADevice()
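The point of the pairing above is that each administrative task becomes one API action. A minimal sketch of that mapping follows. It only builds an unsigned query string and does no signing or network I/O; `ACTIONS`, `build_query`, and the sample values are hypothetical, and the parameter names shown are illustrative rather than a complete request.

```python
from urllib.parse import urlencode

# Task -> API action, exactly as paired on the slide.
ACTIONS = {
    "add a user account": "CreateUser",
    "inventory systems": "DescribeInstances",
    "change a firewall config": "AuthorizeSecurityGroupIngress",
    "snapshot a drive for forensic analysis": "CreateSnapshot",
    "disable a multi-factor authentication token": "DeactivateMFADevice",
}


def build_query(task: str, **params: str) -> str:
    """Return the unsigned query string for a task (no signing, no network I/O)."""
    action = ACTIONS[task.lower()]
    return urlencode({"Action": action, **params})


print(build_query("Add a user account", UserName="alice"))
print(build_query("Snapshot a drive for forensic analysis", VolumeId="vol-00000000"))
```

With every task reduced to a named action plus parameters, the same script can drive account changes, inventory, firewalling, and forensics, which is the premise the Security Monkey builds on.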
- 81. Security Monkey
http://techblog.netflix.com/2011/07/netflix-simian-army.html
• Leverages cloud APIs
• Centralized framework for cloud security monitoring and analysis
• Certificate and cipher monitoring
• Firewall configuration checks
• User/group/policy monitoring
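One of the checks listed above, certificate monitoring, can be sketched as a simple expiry scan. This is not the Security Monkey's implementation, which the deck does not show; `expiring_soon`, the 30-day window, and the sample inventory are assumptions made for illustration.

```python
from datetime import date, timedelta


def expiring_soon(certs: dict, today: date, warn_days: int = 30) -> list:
    """Return names of certificates that expire within warn_days of today (or already have)."""
    cutoff = today + timedelta(days=warn_days)
    return sorted(name for name, not_after in certs.items() if not_after <= cutoff)


# Hypothetical inventory: certificate name -> expiry date.
certs = {
    "api.example.com": date(2011, 10, 25),
    "www.example.com": date(2012, 6, 1),
    "old.example.com": date(2011, 9, 1),
}
print(expiring_soon(certs, today=date(2011, 10, 11)))
```

In practice the inventory would be gathered through the cloud APIs rather than hard-coded, and the result would feed an alert rather than a print statement.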
- 86. Analyzing Traditional Firewalls
• Positioned at network chokepoints, providing optimal internetwork visibility
• Use tools like tcpdump, NetFlow, centralized logging to gather data
• Review traffic patterns and optimize
- 92. AWS Firewalls (Briefly)
• “Security Group” is unit of measure for firewalling
• Policy-driven and network-agnostic, configuration follows an instance
• Network diagram irrelevant
• Chokepoints and sniffing are not possible
• Outbound connections not filterable (!)
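Because rules attach to groups rather than to network positions, reachability can be derived from configuration and inventory alone. The sketch below shows the idea on simplified, hypothetical data shapes; real API responses carry more fields, and `reachable_pairs` is an illustrative name.

```python
# Hypothetical, simplified shapes: group name -> ingress rules, instance -> group.
groups = {
    "web": [{"port": 80, "source_cidr": "0.0.0.0/0"}],
    "app": [{"port": 8080, "source_group": "web"}],
    "db": [{"port": 3306, "source_group": "app"}],
}
instances = {"i-web1": "web", "i-app1": "app", "i-app2": "app", "i-db1": "db"}


def reachable_pairs(groups, instances):
    """Return sorted (source instance, destination instance, port) tuples
    allowed by group-to-group ingress rules."""
    members = {}
    for instance_id, group in instances.items():
        members.setdefault(group, []).append(instance_id)
    pairs = []
    for dest_group, rules in groups.items():
        for rule in rules:
            source_group = rule.get("source_group")
            if source_group is None:
                continue  # CIDR-based rule: not an instance-to-instance edge
            for src in members.get(source_group, []):
                for dst in members.get(dest_group, []):
                    pairs.append((src, dst, rule["port"]))
    return sorted(pairs)


for pair in reachable_pairs(groups, instances):
    print(pair)
```

No network diagram is consulted at any point: the policy and the membership list fully determine who may talk to whom.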
- 99. Security Group Analysis
• Use config and inventory to map reachability
• Leverage APIs to evaluate reachability and detect violations:
• Security groups with no members
• “Insecure” services (e.g. Telnet, FTP)
• Rules that use “any” keyword
• Visualize config into data flow diagram
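The three violation checks above can be sketched over the same kind of data. The shapes and names here (`find_violations`, `INSECURE_PORTS`) are hypothetical, and reading the "any" keyword as a 0.0.0.0/0 source is an assumption about what the slide means.

```python
# Assumption: Telnet and FTP are identified by their well-known ports.
INSECURE_PORTS = {21: "FTP", 23: "Telnet"}
# Assumption: "any" on the slide means a rule open to every source address.
ANY_CIDR = "0.0.0.0/0"


def find_violations(groups, instances):
    """Return sorted (group, finding) tuples for the three checks on the slide."""
    used = set(instances.values())
    findings = []
    for name, rules in groups.items():
        if name not in used:
            findings.append((name, "no members"))
        for rule in rules:
            if rule["port"] in INSECURE_PORTS:
                findings.append((name, f"insecure service {INSECURE_PORTS[rule['port']]}"))
            if rule.get("source_cidr") == ANY_CIDR:
                findings.append((name, f"open to any source on port {rule['port']}"))
    return sorted(findings)


# Hypothetical configuration and inventory.
groups = {
    "web": [{"port": 80, "source_cidr": "0.0.0.0/0"}],
    "legacy": [{"port": 23, "source_cidr": "10.0.0.0/8"}],
    "unused": [],
}
instances = {"i-1": "web", "i-2": "legacy"}
print(find_violations(groups, instances))
```

A finding is not automatically a defect (a public web tier is expected to be open on port 80), so a real check would likely compare findings against an approved list.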
- 105. Connectivity Analysis
• Reachability shows what “can” communicate
• What about what is communicating?
• Take same approach, leverage APIs for firewall and inventory and combine with host data
• Visualize data into connectivity diagram
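Combining the allowed edges with host data can be sketched as a set comparison: translate observed connections into group-level edges, then subtract them from what the firewall permits. All names and sample data below are hypothetical, and the source of the host data (for example, per-instance connection listings) is an assumption.

```python
def to_group_edges(connections, ip_to_group):
    """Translate observed (src_ip, dst_ip, dst_port) tuples into group-level edges."""
    edges = set()
    for src_ip, dst_ip, port in connections:
        src, dst = ip_to_group.get(src_ip), ip_to_group.get(dst_ip)
        if src and dst:
            edges.add((src, dst, port))
    return edges


def unused_allowances(allowed, observed):
    """Return allowed (source group, destination group, port) edges with no observed traffic."""
    return sorted(set(allowed) - set(observed))


# Hypothetical inventory, firewall policy, and host-reported connections.
ip_to_group = {"10.0.0.1": "web", "10.0.0.2": "app", "10.0.0.3": "db"}
allowed = {("web", "app", 8080), ("app", "db", 3306), ("web", "db", 3306)}
connections = [
    ("10.0.0.1", "10.0.0.2", 8080),
    ("10.0.0.2", "10.0.0.3", 3306),
    ("10.0.0.2", "10.0.0.3", 3306),
]
observed = to_group_edges(connections, ip_to_group)
print(unused_allowances(allowed, observed))
```

Edges that are allowed but never observed are candidates for tightening; the observed edge set is also the input for a connectivity diagram.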
- 114. Common Security Product Model
• Examples - AV, FIM, etc.
• “Management” station with client “nodes”
• Limited tagging or abstraction
• Strong “manager” and “managed” model
• Push and pull approaches
• Per node licensing
- 119. “Thundering Herd”
• Mass deployments
• “Red/Black” push - concurrent clusters of 500+ nodes
• Elasticity related to traffic spikes
• Licensing constraints
- 123. Node Ephemerality and Service Abstraction
• Data related to individual nodes becomes less important
• Dealing with short-lived systems, IP and ID reuse
• Event and log archives and data relationships
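The IP reuse problem above means an address in an archived log only identifies a system when paired with a timestamp. A minimal sketch of that lookup follows; `HISTORY`, `owner_at`, and the records are hypothetical, and it assumes an inventory history with launch and termination times is being kept.

```python
from datetime import datetime

# Hypothetical inventory history: (ip, instance_id, service, launched, terminated or None).
HISTORY = [
    ("10.0.0.5", "i-aaa", "api", datetime(2011, 10, 1, 0, 0), datetime(2011, 10, 3, 12, 0)),
    ("10.0.0.5", "i-bbb", "billing", datetime(2011, 10, 4, 9, 0), None),
]


def owner_at(ip, when, history=HISTORY):
    """Return (instance_id, service) that held ip at time `when`, or None."""
    for rec_ip, instance_id, service, launched, terminated in history:
        if rec_ip == ip and launched <= when and (terminated is None or when < terminated):
            return instance_id, service
    return None


print(owner_at("10.0.0.5", datetime(2011, 10, 2)))
print(owner_at("10.0.0.5", datetime(2011, 10, 5)))
```

Resolving to the service as well as the instance reflects the first bullet: once the node is gone, the service it belonged to is usually the more useful key for correlating events.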
- 127. Resource Usage Logging and Auditing
• Public-facing APIs make access controls more difficult and more important
• Programmable infrastructure needs robust logging and auditing capabilities
• Can metering data be repurposed?
- 130. Identity Integration
• Federation use cases
• On-instance credentials
- 135. “Trusted Cloud”
• Various components related to providing higher assurance/trust levels in the cloud
• Virtual TPM / hardware root of trust
• Controlled execution
• HSM in the cloud
- 136. Thanks!
Questions?
chan@netflix.com
(I’m hiring!)
- 137. References
• http://www.slideshare.net/adrianco
• http://aws.amazon.com
• http://techblog.netflix.com
• http://nordsecmob.tkk.fi/Thesisworks/Soren%20Bleikertz.pdf
• https://cloudsecurityalliance.org/
• http://www.nist.gov/itl/cloud/index.cfm