CMG 101 - Understanding performance
- 2. CMG 101
Computer Cloud Measurement Group
Understand:
• Definitions of availability and response time
• Psychological and business effects of delay/response time: user interfaces, cost of downtime
• Transactions and their structure
• Waterfall diagrams for transactions and web page downloads
• Performance measures (seconds, bytes, bits per second, IOPS, etc.)
• Reporting measures/metrics
• How to visualize quantitative data
• Resources (CPU, memory, disk, network, software)
• Elementary queuing theory
• Phases in development (analysis, design, etc.) and how to incorporate performance and capacity; performance engineering
• Typical free and commercial tools, or at least their functionality
– monitoring, reporting, alerting, analysis, modelling
- 3. Availability and Response Time
• Availability: Ability of a Configuration Item or IT Service to perform its agreed Function when required. […] Availability is usually calculated as a percentage.
• Response Time: A measure of the time taken to complete an Operation or Transaction.
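The availability percentage mentioned above can be sketched as a small calculation. This is a minimal sketch, assuming the common convention of dividing uptime by agreed service time; the function name and the example figures (a 720-hour month with 4 hours of downtime) are illustrative and do not come from the slides.

```python
def availability_pct(agreed_service_hours, downtime_hours):
    """Availability as a percentage of the agreed service time.

    Assumes the common convention:
    (agreed service time - downtime) / agreed service time * 100.
    """
    if agreed_service_hours <= 0:
        raise ValueError("agreed_service_hours must be positive")
    if not 0 <= downtime_hours <= agreed_service_hours:
        raise ValueError("downtime_hours must lie within the agreed service time")
    return 100.0 * (agreed_service_hours - downtime_hours) / agreed_service_hours


if __name__ == "__main__":
    # Hypothetical example: a 30-day month (720 hours) with 4 hours of downtime.
    print(f"{availability_pct(720, 4):.2f}%")
```

Note that the result depends heavily on what counts as agreed service time: the same outage weighs more against a business-hours service than against a 24x7 one.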
- 6. Pageviews
[Chart: pageviews over time, 1 Jan 2008 to 19 May 2009; y-axis 0 to 700,000 pageviews. Annotation: IceSave failure.]
Sudden surges can kill you
Source: SiteStat
- 7. KNMI.nl
[Chart: pageviews per hour by hour of day (1 to 23); y-axis 0 to 180,000. Series: weather alarm day and ordinary day. Date labels: 30-dec, 31-dec.]
- 8. Transactions and their structure
waterfall diagrams
A single user level transaction decomposes into
multiple transactions on components
[Diagram: client/server waterfall. Sequence: Query, network latency, Ack, server turnaround time, Reply, Ack. Inset: YSlow detail.]
- 9. Transactions:
from visits to bandwidth
• Visits (SiteStat measurement): 6,380/hour = 1.7 visits/sec
– 7.42 pageviews per visit (according to SiteStat), though lower during the crisis
– 79 GETs per visit, according to the log file and SiteStat
• Pageviews (SiteStat measurement, server logs): 47,338/hour = 13 pageviews/sec
– 10.6 (= 79/7.42) GETs per pageview, effective
– 32 GETs for the homepage (according to the browser; page composition via FireBug)
• GET requests (HTTP server logs): 140 requests/sec
– About 6,800 bytes per request on average (HTTP server logs)
• Bandwidth: 0.95 Mbyte/sec = 7.6 Megabit/sec
© Digital Infrastructures
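The chain from visits to bandwidth on this slide is plain arithmetic, and can be sketched as follows. The input figures are the ones on the slide; the function and variable names are illustrative, and the sketch assumes the averages hold uniformly over the hour.

```python
def visits_to_bandwidth(visits_per_hour, pageviews_per_visit,
                        gets_per_visit, bytes_per_request):
    """Convert an hourly visit count into request rate and bandwidth."""
    visits_per_sec = visits_per_hour / 3600
    pageviews_per_sec = visits_per_sec * pageviews_per_visit
    gets_per_pageview = gets_per_visit / pageviews_per_visit
    requests_per_sec = pageviews_per_sec * gets_per_pageview
    bytes_per_sec = requests_per_sec * bytes_per_request
    return {
        "visits_per_sec": visits_per_sec,
        "pageviews_per_sec": pageviews_per_sec,
        "gets_per_pageview": gets_per_pageview,
        "requests_per_sec": requests_per_sec,
        "megabytes_per_sec": bytes_per_sec / 1e6,
        "megabits_per_sec": bytes_per_sec * 8 / 1e6,
    }


if __name__ == "__main__":
    # Figures from the slide: 6,380 visits/hour, 7.42 pageviews/visit,
    # 79 GETs/visit, about 6,800 bytes per request.
    result = visits_to_bandwidth(6380, 7.42, 79, 6800)
    for name, value in result.items():
        print(f"{name:>20}: {value:8.2f}")
```

Working from visits rather than raw server logs makes it easy to ask what-if questions, for example what a tenfold surge in visits would mean for the required bandwidth.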
- 10. How to diagnose a problem,
where to look? Resource = capacity
[Diagram: end-to-end path from users to storage. Components shown: users and (test) client, WAN link, router/switch (CPE), firewall, proxy, load balancer, LAN switches, HTTP front end, application, server network, MySQL DB, NAS, SAN, network lines.]
Example breakdowns
- 11. Resource contribution to response time,
modeling different resource allocations
Modelling the effect of different network bandwidths on response time
[Chart: stacked bars of response time in seconds (x-axis 0 to 500) for four network scenarios: 64K, 256K, ICTRO 2Mb, GBO. Components: server time (sec), client time (sec), network time due to delay (sec), network time due to bandwidth (sec). Based on a 50 ms round trip on the WAN.]
Excessive client/server chatter leads to a user interaction time of more than 7 minutes!
After the employee numbers are queried (there are 373 Janssens), the employment details are requested one at a time (612 in total). On the GBO LAN this results in a measured elapsed time of 30 sec.
How much faster will this be with:
• Very fast network?
• Very fast client?
• Very fast server?
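The effect of a chatty client/server exchange can be approximated with a simple additive model: every sequential request pays one round trip, plus transfer time, plus server and client processing. This is a rough sketch, not the model used for the slide. The 50 ms round trip and the 612 detail requests come from the slide; the payload size per request and the assumption of one round trip per request are hypothetical.

```python
def interaction_time(n_requests, rtt_s, bytes_per_request, bandwidth_bps,
                     server_s_per_request=0.0, client_s_per_request=0.0):
    """Break down the elapsed time of n sequential request/reply exchanges."""
    parts = {
        "network_delay_s": n_requests * rtt_s,
        "network_bandwidth_s": n_requests * bytes_per_request * 8 / bandwidth_bps,
        "server_s": n_requests * server_s_per_request,
        "client_s": n_requests * client_s_per_request,
    }
    parts["total_s"] = sum(parts.values())
    return parts


if __name__ == "__main__":
    n_requests = 1 + 612        # one employee-number lookup plus 612 detail requests
    wan_rtt_s = 0.050           # 50 ms WAN round trip, from the slide
    assumed_bytes = 2000        # hypothetical payload per exchange, NOT from the slide
    for label, bps in [("64K", 64_000), ("256K", 256_000), ("2Mb", 2_000_000)]:
        t = interaction_time(n_requests, wan_rtt_s, assumed_bytes, bps)
        print(f"{label:>5}: delay {t['network_delay_s']:6.1f} s, "
              f"bandwidth {t['network_bandwidth_s']:6.1f} s, total {t['total_s']:6.1f} s")
```

Under these assumptions the delay component stays at about 30 seconds regardless of bandwidth, which is why a faster line alone cannot fix a chatty protocol; reducing the number of round trips can.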
- 12. Queuing theory
Response depends on capacity. At higher loads, congestion can set in.
[Chart: delay factor (y-axis 0 to 12) against utilisation (x-axis 10% to 90%), with the sweet spot marked.]
[Chart: actual throughput against traffic load, with the perfect line, the sweet spot, and the congestion region marked.]
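The shape of the delay curve can be illustrated with the simplest textbook model. The slide does not say which queueing model it uses; the sketch below assumes a single-server M/M/1 queue, where response time equals service time divided by (1 - utilisation), so the delay factor is 1 / (1 - utilisation).

```python
def delay_factor(utilisation):
    """Response time as a multiple of service time, assuming an M/M/1 queue."""
    if not 0 <= utilisation < 1:
        raise ValueError("utilisation must be in the range [0, 1)")
    return 1.0 / (1.0 - utilisation)


if __name__ == "__main__":
    # Tabulate the delay factor over the utilisation range shown on the slide.
    for pct in range(10, 100, 10):
        print(f"{pct:3d}%  delay factor {delay_factor(pct / 100):5.2f}")
```

In this model the delay factor is 2 at 50% utilisation and 10 at 90%, which shows why running a resource close to full utilisation leaves little headroom for surges.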
- 13. So what was the bottleneck?
• KNMI: static page served from database (1000/sec)
• Ministry: very chatty client/server interaction
• DNB: JSP application server serves static content
• Anne Frank: many, large digital assets, no use of CDN
• Hospital information system: client (front-end) code
- 15. Typical free and commercial tools
and their functionality
Functionality:
• Monitoring
• Reporting
• Alerting
• Analysis
• Modelling
• Etc …
Example tools:
• Nagios
• Cacti
• WatchMouse
• PDQ
• R
• YSlow
• …
- 16. CMG 101
• We want to develop a ‘standard’ body of knowledge
– To educate our people
– To speak more of the same language
– To enable tool vendors to express their offerings more easily
• Note: defining what is in the course is not the same as developing a course
- 17. Call for Action
• Want to know more?
• Want to collaborate, contribute?
• Want to get a course?
• Want to sponsor?
• Talk to me
Peter HJ van Eijk
@petersgriddle
inbox@peterhjvaneijk.nl
+31 2268 4939
www.nlcmg.nl
NLCMG is a chapter of CMG.org
- 18. Some of my performance projects
• KNMI (Weather service): website meltdown after weather emergency (“weeralarm”)
• DNB (Dutch Banks Authority): website meltdown during 2008 financial crisis
• Unnamed Ministry: information system with multi-minute response times
• Crisis.nl: ….
• Anne Frank website: … anticipated surge after major redesign
• Hospital information system: storage sizing