BrightTALK: Converged Infrastructure and IT Operations Management
- 2. Andrew White
Cloud and Smarter Infrastructure Solution Specialist
IBM Corporation
Mr. White has fifteen years of experience designing and managing the
deployment of Systems Monitoring and Event Management software. Prior
to joining IBM, Mr. White held various positions including the leader of the
Monitoring and Event Management organization of a Fortune 100 company
and developing solutions as a consultant for a wide variety of organizations,
including the Mexican Secretaría de Hacienda y Crédito Público, Telmex,
Wal-Mart of Mexico, JP Morgan Chase, Nationwide Insurance and the US
Navy Facilities and Engineering Command.
- 4. Ground rules for this
session…
• If you can’t tell if I am trying to be funny…
– GO AHEAD AND LAUGH!
• Feel free to text, tweet, yammer, or whatever
to share with the rest of the attendees
• If you have a question, no need to wait until
the end. Just interrupt me. Seriously… I
don’t mind.
- 6. A lack of trust and communication
are common…
- 9. According to the Federal Reserve Bank…
http://www.federalreserve.gov/pubs/feds/2007/200763/
- 10. The 1990s saw historically high
growth in productivity attributed
largely to…
- 13. Productivity Paradox
[Chart: IT spend (million USD, 0 to 600) and year-over-year percent change in productivity (0.0% to 5.0%), each with a linear trend line]
http://www.federalreserve.gov/pubs/feds/2007/200763/
- 17. Plus, we created “Technical Debt,” adding
unnecessary complexity.
- 18. Architecture by Accident
The Humble Start…
Meeting Demand…
The First Bottleneck…
The Second Bottleneck…
Becoming Mission
Critical…
Enabling SOA…
The Fun Begins…
How Did We Get Here?
- 20. This was always a people problem
According to Gartner:
“…when asked what the single biggest challenge for
companies deploying cloud services, respondents cited
management and operational processes at the top of the list
rather than technical issues… Most legacy management
software and infrastructure solutions are mainly point products
instead of end-to-end solutions and rely on manual processes
and a high level of experience and specialized skill sets. The
lack of integration and collective complexity impact operational
efficiencies, slow business agility, and increase operational
costs for IT environments.”
OUCH!!!
- 21. What Is a System?
It is a set of interconnected actors that change
over time when they are influenced by other
elements of the system.
[Diagram: eight interconnected actors]
- 22. Two Important Properties
• The causal effect between two actors will
always impact the entire system
• Correlation != Causation
- 23. Systems are Volatile
These properties make it difficult to control the
behavior of the system. The good news is that
systems are perfect. They always deliver the
optimum result given a specific stimulus.
- 24. Feedback Loops
Unfortunately, feedback has taken on both positive and negative
connotations. In reality, positive feedback is not “praise” and
negative feedback is not “criticism.” Positive feedback
reinforces while negative feedback balances.
[Diagram: two feedback loops around Profits, Cost Cutting and Productivity, one labeled Reinforcing and one labeled Balancing]
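The difference between reinforcing and balancing feedback can be sketched numerically. This is a minimal illustration, not a model from the talk: the `simulate` function, the gains, and the target value are all assumptions chosen for the example.

```python
def simulate(initial, gain, target=None, steps=5):
    """Step one quantity through a feedback loop.

    With target=None the loop is reinforcing: each step adds a share
    (gain) of the current value, so change feeds further change.
    With a target the loop is balancing: each step closes a share
    (gain) of the remaining gap to the target.
    """
    values = [initial]
    for _ in range(steps):
        current = values[-1]
        if target is None:
            change = gain * current
        else:
            change = gain * (target - current)
        values.append(current + change)
    return values


# Hypothetical numbers, for illustration only.
reinforcing = simulate(100.0, gain=0.10)             # grows faster each step
balancing = simulate(100.0, gain=0.50, target=60.0)  # settles toward 60

print(reinforcing)
print(balancing)
```

The reinforcing series moves away from its starting point at an increasing rate, while the balancing series converges on its target. Neither is "good" or "bad" in itself.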
- 25. The Profit Equation
[Diagram: the Reinforcing and Balancing loops around Profits, Cost Cutting and Productivity, with Business Growth added; links are marked (+) or (-)]
- 27. The Plot Thickens
[Diagram: the profit loops extended with Leverage IT, IT Expense, Sustaining Engineering, Application Portfolio, Server Count, Storage Consumption, IT Developers, Supportability, Complexity and Facilities; links are marked (+) or (-)]
- 29. We have to get better
• Rigid and aging infrastructure
• Inefficient and unnecessary processes
• Application and information complexity are increasing
exponentially, requiring more work to maintain
• The portion of IT’s focus on new business capabilities
is decreasing at an increasing rate
• Technical debt is being accumulated
• Organizations are separated into “stovepipes” and
technology decisions are heavily influenced by
“religion” and self-serving interest
64% of IT spending is “Run the Engine”
72% of IT budgets are OPEX
Personnel represent 63% of IT expenses
Source: Gartner IT Key Metrics Database 2013
- 32. The CIO Agenda
• Top 3 Areas of Focus
– Reduce the time to deliver value to the business
– Reduce the cost of IT and create cost efficiencies
– Improve Operations by driving simplicity
• Problems to Solve
– How to add scale without adding complexity
– How to support increased consumption without additional cost
– How to deliver these without sacrificing customer experience
- 33. What a Coincidence!
In 2012, IDC conducted a study titled “Converged Systems: State of the
Market and Future Outlook 2012: Market Analysis” which concluded the top
three reasons for adopting Converged Infrastructure over the traditional
component-based approach are:
1. Time to Service – IT needs to respond more quickly to new business
requests
2. Cost Efficiency – Converged Infrastructure delivers overall reduced cost
of ownership through workload consolidation, reduced space, and
reduced power and cooling costs
3. Operations Improvements – Consolidation of vendors, streamlined
support, pre-validated interoperability greatly simplify data center
operations
- 34. Cleaning Up the Landscape
Adapted from: Akella, Janaki. “IT Architecture: Cutting costs and complexity.” McKinsey Quarterly 13 Nov 2009
https://www.mckinseyquarterly.com/IT_architecture_Cutting_costs_and_complexity_2391
[Diagram labels: Silo, Monolithic, Framework, Niche; Launch Pad, Information Bus; Management, Security, Business Continuity]
- 35. Perceived Value
According to Gartner’s “Market Share Analysis:
Data Center Hardware Integrated Systems,
1Q11-2Q12,” the perceived benefits of integrated systems are:
• Better performance
• Improved cost/performance ratio
• Simplified deployment
• Increased optimization
• Increased automation
• Lower cost of IT operations
• Simplified sourcing and support
• Change in focus from IT maintenance to IT innovation
- 37. Like any journey…
We have a beginning [What is our product]
and a map [the 4 part plan]
the destination [Software Defined Environment]
something to lighten the load [Converged Infrastructure]
and some required skills [Cost and Capacity Management]
- 38. The experience of working with IT has
become the product offered to the Business
http://www.flickr.com/photos/anneacaso/3693155059/sizes/l/in/photostream/
- 40. 1. Understand cost
2. Identify and remove waste
3. Manage to capacity
4. Execute good change management
- 41. Software Defined Environments provide abstractions of workloads,
services and infrastructure, and end-to-end mappings
Workload Abstraction
Based on patterns and functional and non-functional requirements
Resource Abstraction
Semantically rich abstractions of heterogeneous
resource capabilities and system components
Mapping to Resources
Map requirements to potential system architectures.
Proactively orchestrate infrastructure and workload
Continuous Optimization
Autonomously construct available system
architecture to optimize workload outcome
[Diagram: Software Defined Environments. Agile Workload Development Services; Workload Abstraction (Analytics: Map/Reduce; Web: Web 2.0 Pattern; Transactional: J2EE/OLTP); Continuous, Autonomous Mapping; Resource Abstraction (PowerVM, x86 KVM; SSD, HDD, Tape; RDMA, Ethernet); Software Defined Compute, Network and Storage; Agility, Consumability, Efficiency (ACE)]
- 42. Where we are headed
Traditional IT
Appliances, pre-integrated systems and
standard hardware, software and networking.
Private cloud
On or off premises cloud infrastructure
operated solely for an organization and
managed by the organization or a third party
Hybrid IT
Traditional IT and clouds (public and/or private) that
remain separate but are bound together by technology
that enables data and application portability
Public cloud
Available to the general public or a large
industry group and owned by an
organization selling cloud services.
- 43. Architecture on Purpose
[Diagram labels: Banking Application across DEV, TEST, QA and PROD environments; Application Lifecycle; Applications; Application template; Infrastructure template; Hardware (Network, Server, Storage); IBM UrbanCode Deploy; IBM Cloud Orchestrator; OpenStack Heat; Heat Orchestration Template (HOT); IBM Platform Resource Scheduler; Traditional IT, Private, Dedicated, Public]
- 45. Top 5 reasons IT projects fail
1. The inability to challenge assumptions
2. Poor role definitions and unclear priorities
3. A “silo” mentality
4. The unwillingness to compromise
5. A focus on the technology rather than on the solution
- 46. How many times have we
documented these lessons learned?
• The expectations of the stakeholders were not in
touch with the reality of what IT could deliver
• We underestimated the complexity
• The market changed before we finished
• The request was driven by the perception of a need
and not the reality
• Assumptions were undocumented and requirements
were hastily defined
- 47. Not having a common
understanding of quality puts more
pain into an organization than
anything else I have ever known.
Philip Crosby, Let’s Talk Quality, 1989
- 49. ITIL Overview of Capacity Management
Business Objective
IT Strategy
Tactical Processes
Service Desk, Incidents, Problems,
Changes, Releases, Configuration
Strategic Processes
SLM, Finance, Capacity, Availability,
Business Continuity
- 50. Why capacity?
• This process is typically run ad-hoc (e.g. spreadsheets and
“gut feel”)
• Planning is typically limited to individual silos
Requirements Business Case
Return on
Investment
Total Cost of
Ownership
Availability
Performance
Risk
- 51. Summary of CLOUD Recommendations
In 2011, the TechAmerica Foundation published a report for the Obama Administration
titled “US Deployment of the Cloud (CLOUD2)”
Trust
• Ensuring the combination of factors that allows consumers of cloud services to be
confident that the services are meeting their computing needs
Transnational Data Flows
• Need for collaboration & standardization of data access across national borders
Transparency
• Require vendors to share relevant information about their capabilities, offerings
and service levels
Transformation
• Recommendations in policy, infrastructure, and training to help facilitate broader
adoption of the cloud
- 52. CLOUD Recommendations on Trust
Ensuring that the cloud is meeting consumers’ needs for security, privacy, and availability
Factors Contributing to Trust
• Transparency of practices
• Accountability
• Resiliency
• Redundancy
• Access and Connectivity
• Supply chain provenance
• Life cycle integrity
• Governance
- 53. Capacity Sub-Processes
Sub-processes: Business Capacity, Service Capacity, Resource Capacity
Iterative Activities
• Monitor
• Analysis
• Tuning
• Implement
Demand Mgmt
Modeling
• Trend Analysis
Application Sizing
Capacity Data Storage
• Business
• Service
• Technical
• Utilization
Capacity Plan
Data Warehouse
- 54. The 3 Needs of the Business
Service Level
Management
Meet the consumer’s expectations for service availability
Performance
Management
Ensure good performance for each consumer’s application
Resource
Optimization
Continuously rebalance resources to limit unnecessary capital expenses
- 55. Capacity Management at
the Resource Level
• Identify and understand the Capacity and utilization of
each component part of the IT infrastructure
• Recommend optimization of hardware and software
• Measure and store resource usage at a process level
• Identify bottlenecks and potential future problems
• Characterize workloads and business drivers
• Evaluate alternative upgrades to meet workloads
• Proactive rather than reactive
• No surprises in performance or IT budgets
- 56. Capacity Management at
the Service Level
• Identify and understand the IT services
• Assess their use of resources
• Identify their working patterns, peaks & troughs
• Ensure that SLA targets are viable
• Monitor performance to identify violations
• Resource data aggregated by application
• Pre-empt difficulties wherever possible
• Proactive rather than reactive
- 57. Capacity Management at
the Business Level
• Published corporate performance objectives
– Standard local metrics defining contribution
– Unification of analytical information
– Improved managers’ business insight
– Greater local accountability via KPIs
– Resource data aggregated by application and then weighted
• Enterprise framework for measurement
– Published Reports and exception reports
– Automated alarms and interpretation
– Interactive Dashboard for alert/drill down
– Predicted outcomes across framework
• Business agility to adjust as necessary
– Strategic modeling to view scenarios
– Ensured focus and drive to growth
– Effective liaison between IT & Management
- 58. Capacity Management Imperatives
Trending Organic Growth:
Analytic tools to help forecast demand and
identify opportunities for efficiency
Modeling Capacity Consumption:
Leveraging elastic resource capacity can help
delay capital expenditures
Providing Cost Transparency:
Metering allows you to affect behavior through
service pricing and helps control “sprawl”
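As one way to picture "trending organic growth," the sketch below fits a straight line to past consumption and estimates when it would reach installed capacity. This is an assumption-laden illustration, not the analytic tooling the slide refers to: the function names, the linear model, and the sample figures are all hypothetical.

```python
def linear_trend(samples):
    """Least-squares slope and intercept for equally spaced samples.

    Needs at least two samples; x is the sample index (0, 1, 2, ...).
    """
    n = len(samples)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until_full(samples, installed_capacity):
    """Periods after the last sample until the trend line hits capacity.

    Returns None when the trend is flat or falling.
    """
    slope, intercept = linear_trend(samples)
    if slope <= 0:
        return None
    crossing = (installed_capacity - intercept) / slope
    return max(0.0, crossing - (len(samples) - 1))


# Hypothetical monthly storage consumption in TB.
monthly_used_tb = [40, 44, 47, 52, 56, 60]
print(periods_until_full(monthly_used_tb, installed_capacity=80))
```

A straight line is the simplest possible model. Real demand with seasonality or short-lived tenants would need something richer, which is the point the later slides make.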
- 59. Looking back to a simpler time
Answering “what if” questions…
• Change in technology, demand, etc… impact?
• Focus on Optimizing Server Cost versus Performance
Extremely Technology-centric
• Servers, Mainframes
• Occasionally Storage or Network – in isolation
• Few distributed servers, even fewer critical apps running on them
• No web-based applications or e-commerce
Big Value and Return, but also effort
• Highly trained staff
• Requires building a central, long term repository (CMIS)
• Scalability of Staff, Tools, …, Politics!
• Many analysts, few systems
Capacity planning was Resource-oriented, not Business/Service oriented
- 60. Capacity Models Used to Be Simple
[Chart: capacity over time, showing Installed Capacity, Consumed Capacity and Forecasted Demand for a Rising Demand Scenario and a Falling Demand Scenario, with CAPEX, Overhead and Downtime marked]
- 61. A new thought process
In the past, the approach to capacity
was similar to an apartment complex.
Tenants arrive and occupy space for
several years at a time.
Consumption was fairly static and
easy to predict…
… In the future, our approach to
capacity will need to be more like a
hotel. Some tenants may be long
term consumers but most will occupy
the space for a short time and then
vacate. This will make forecasting
demand more difficult.
- 63. The evolution of cost and capacity
[Diagram: capacity layers (Raw Capacity, Usable Capacity, Allocated Capacity, Used Capacity) with Stranded Capacity marked, shown for a Stand Alone Deployment and a Cloud Deployment]
Allocated Capacity:
The sum of all assignments granted to all customers.
Each individual customer is paying for and expects to have
access to their entire assignment regardless of whether it
exists or not.
Usable Capacity:
The capability of the infrastructure after losses to
administration, hypervisors, redundancy, etc.
Rebalancing Threshold:
When consumption crosses this threshold, the
environment is rebalanced. If consumption does
not fall below the threshold, then more capacity is
purchased.
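The definitions above can be turned into a small worked example. This is a sketch under stated assumptions: the `CapacityPool` class, its field names, the `next_action` helper, and the 75% threshold are hypothetical and only mirror the wording of the slide.

```python
from dataclasses import dataclass


@dataclass
class CapacityPool:
    raw: float        # everything installed
    overhead: float   # lost to administration, hypervisors, redundancy
    allocated: float  # sum of all assignments granted to customers
    used: float       # what customers actually consume

    @property
    def usable(self) -> float:
        """Raw capacity minus losses."""
        return self.raw - self.overhead

    @property
    def unbacked_allocation(self) -> float:
        """Allocated capacity that does not physically exist."""
        return max(0.0, self.allocated - self.usable)


def next_action(pool, threshold_fraction, already_rebalanced=False):
    """Apply the rebalancing rule: rebalance first, purchase if that fails."""
    threshold = threshold_fraction * pool.usable
    if pool.used <= threshold:
        return "ok"
    return "purchase" if already_rebalanced else "rebalance"


# Hypothetical figures in arbitrary units.
pool = CapacityPool(raw=100.0, overhead=20.0, allocated=120.0, used=70.0)
print(pool.usable, pool.unbacked_allocation, next_action(pool, 0.75))
```

In this example customers hold 120 units of assignments against 80 usable units, so 40 units are promised but not present. That is acceptable only while used capacity stays under the threshold.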
- 64. Where does oversubscription occur?
[Diagram: Corporate LANs & VPNs, Firewall, Load Balancers, Switch, Web Servers, VM Server Farm, Appliances, Database, NAS, Storage Frame, with the oversubscription points marked]
Common Locations
1. Hypervisor
2. CPU Cycles
3. Memory
4. Blade Backplane I/O
5. SAN Fabric
6. Network Interfaces
7. Host Bus Adapters
8. Backup Device
9. WAN Circuits
10. Storage Processors
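A simple way to quantify oversubscription at any of these locations is the ratio of what has been committed to what physically exists. The sketch below assumes that definition; the function name and the resource figures are hypothetical.

```python
def oversubscription_ratio(committed, physical):
    """Committed capacity divided by physical capacity.

    A ratio above 1.0 means more has been promised than exists.
    """
    if physical <= 0:
        raise ValueError("physical capacity must be positive")
    return committed / physical


# Hypothetical figures: name -> (committed to tenants, physically installed)
resources = {
    "CPU cores": (256, 64),
    "Memory GB": (768, 512),
    "WAN Mbps": (900, 1000),
}

report = {
    name: oversubscription_ratio(committed, physical)
    for name, (committed, physical) in resources.items()
}
oversubscribed = [name for name, ratio in report.items() if ratio > 1.0]

print(report)
print(oversubscribed)
```

Tracking the ratio per location matters because a service is constrained by its most oversubscribed component, not by the average.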
- 65. More complexity
[Chart: capacity over time in a cloud model, showing Installed Capacity, Subscribed Capacity, “Cloud” Capacity, Consumed Capacity and Forecasted Demand, with a Trending Alert and a Threshold Alert, and CAPEX, Waste, Overhead, Outage Risk and Downtime marked]
- 66. The new KPIs
• Buffering Capacity:
The amount of capacity kept in reserve to absorb spikes in demand
• Flexibility vs Stiffness:
The system’s ability to restructure itself as used capacity increases
beyond the balancing threshold
• Margin:
The maximum acceptable load before measurable degradation occurs in
application performance
• Tolerance:
How the applications behave as the system reaches the margin. This
can be either observed or forced behavior.
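Two of these KPIs lend themselves to a quick calculation. The sketch below is one possible reading, not a definition from the talk: it treats buffering capacity as usable minus used capacity, and estimates the margin as the highest observed load before response time slows by more than an allowed fraction. All names, the 10% slowdown limit, and the observations are hypothetical.

```python
def buffering_capacity(usable, used):
    """Capacity held in reserve to absorb spikes in demand."""
    return usable - used


def estimate_margin(observations, baseline_ms, allowed_slowdown=0.10):
    """Highest observed load before the first measurable slowdown.

    observations: iterable of (load, response_time_ms) pairs.
    Returns None if even the lightest load is already too slow.
    """
    limit = baseline_ms * (1 + allowed_slowdown)
    margin = None
    for load, response_ms in sorted(observations):
        if response_ms > limit:
            break
        margin = load
    return margin


# Hypothetical load test: (requests per second, response time in ms)
observations = [(100, 50), (200, 52), (300, 54), (400, 70), (500, 120)]

print(buffering_capacity(usable=80.0, used=70.0))
print(estimate_margin(observations, baseline_ms=50))
```

How response time behaves beyond the estimated margin, here the jump from 70 ms to 120 ms, is what the Tolerance KPI describes.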
- 68. The importance of cost management
The “Showback” Model – A Pragmatic Approach
A “showback” system shows individual business units
or projects how much they are spending on cloud services.
The “Chargeback” Model – The Ideal
A chargeback system holds business units or projects
accountable for cloud costs. Costs are “charged back” to
the units or projects responsible for consumption.
An Ideal Chargeback/Showback Cycle:
1. Increase transparency of costs and usage
2. Increase accountability within business units
3. Promote cost-conscious consumption
4. Reduce IT services costs
5. Improve business/IT alignment
6. Associate costs with actual benefits
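A showback report is, at its simplest, metered usage multiplied by a unit rate and totaled per business unit. The sketch below assumes that shape; the `showback` function, the service names, the rates, and the usage records are all hypothetical.

```python
from collections import defaultdict


def showback(usage_records, rates):
    """Total cost per business unit.

    usage_records: iterable of (business_unit, service, quantity).
    rates: mapping of service -> price per unit of quantity.
    """
    totals = defaultdict(float)
    for unit, service, quantity in usage_records:
        totals[unit] += quantity * rates[service]
    return dict(totals)


# Hypothetical prices and metered usage.
rates = {"vm_hours": 0.05, "storage_gb_month": 0.10}
usage = [
    ("Retail", "vm_hours", 1000),
    ("Retail", "storage_gb_month", 500),
    ("Claims", "vm_hours", 200),
]

report = showback(usage, rates)
print(report)
```

The same totals could feed a chargeback system. The difference is organizational: showback only reports the figures, while chargeback bills them to the unit's budget.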
- 69. Tool Requirements
• The ability to collect performance and resource
consumption metrics from all systems that contribute to
the service
• A repository to warehouse the historical data
• The ability to import cost data and calculate consumption
in Natural Forecast Units
• Provide a facility to generate reports automatically
• Offer a policy engine to direct workload placement and
generate events to trigger a capacity review
• Include a modeling engine that can help forecast
consumption and provide recommendations for
rebalancing
• The tool needs to be “VM-aware”
- 70. Let’s keep the
conversation going…
APWhite@us.ibm.com
Andrew.P.White@Gmail.com
@SystemsMgmtZen
SystemsManagementZen.Wordpress.com
systemsmanagementzen.wordpress.com/feed/
ReverendDrew
614-306-3434