Disaster Recovery: Understanding Trend, Methodology, Solution, and Standard

Disaster Recovery:
Understanding Implication,
Methodology, Solution & Standard
Sutedjo Tjahjadi, Managing Director

Table of Contents
Introduction to Datacomm Cloud Business
Understanding Natural Disaster Trend
Risk & Implication to Business
Methodology & Strategy Protection
Solutions for Building Disaster Recovery System
Available Standard for Reference


Who is Datacomm Cloud Business?
ABOUT
Datacomm Cloud Business is the business division of PT. Datacomm Diangraha that focuses on the Cloud opportunity in Indonesia. Our philosophy is Enterprise, Secure and Local.
PRODUCTS
Sentriciti – Security as a Service (SecaaS)
Cloudciti – Cloud Services (XaaS)
Datacomm Datacenter Federation
Datacomm Azure Stack
WHY US?
We are backed by 29+ years of experience and 500+ people.
We have a strong financial commitment, having invested in our own local data center. We are also certified to international standards.


Certified Infrastructure & Facilities
Local Support and Data Center Location
• 24x7 Help Desk Support Center
• 24x7 Network Support Center
• 24x7 Security Desk Support Center
Certified
• Rated 3 Constructed Facilities by TIA-942
• DCOS Maturity 4 by TIA-942
• ISO 9001:2008
• ISO 27001:2013
• ISO 20000:2011
• PCI-DSS


NEWS
A data center outage left a number of major websites inaccessible.
Source: detik.com

NEWS
Frequent power cuts damaged the server used to print electronic identity cards (KTP-E) at the Lebong Population and Civil Registration Office (Dukcapil). As a result, roughly 500 KTP-E cards could not be printed.
Source: bengkuluekspress.com

NEWS
The Jakarta blackout left payment gateways unable to operate.
Source: indotelko.com


Reality of Disaster
• Structural Damages
• Natural Disasters
• Socio-political

Not Just Natural Disaster
• Power Failures: 26%
• Hardware Failures: 19%
• Network Outages: 10%
• Software Failures: 9%
• Human Errors: 8%
• Everything else: 30%
Source: Forrester Disaster Recovery Journal 2013

Disaster Impact: Labor Cost
Labor Cost = # of People × AVG % Affected × AVG Rate/Hr × # of Hours
• # of People: How many employees are affected by the outage?
• AVG % Affected: How reliant are they on uptime to produce?
• AVG Rate/Hr: What is the average employee cost per hour?
• # of Hours: How many hours is the outage?
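The labor-cost estimate on this slide is a plain product of four factors, which can be sketched in a few lines of Python. The function name, parameter names and the sample figures are illustrative assumptions, not values from the presentation.

```python
def labor_cost(people: int, pct_affected: float,
               rate_per_hour: float, outage_hours: float) -> float:
    """Labor cost = # of people x average % affected x average rate/hr x # of hours."""
    # pct_affected is expressed as a fraction (0.5 means 50% reliant on uptime).
    if not 0.0 <= pct_affected <= 1.0:
        raise ValueError("pct_affected must be a fraction between 0 and 1")
    return people * pct_affected * rate_per_hour * outage_hours


# Hypothetical figures: 200 employees, 50% reliant on uptime,
# an average cost of 20 per hour, and a 4-hour outage.
example_labor_cost = labor_cost(200, 0.5, 20.0, 4.0)
print(example_labor_cost)
```

The currency is left open on purpose; the result is in whatever unit the hourly rate uses.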

Disaster Impact: Revenue Lost
Revenue Lost = (Yearly Revenue / Total Yearly Operating Hours) × % Impact × # of Hours
• Total Yearly Operating Hours: Days per Week × Hours per Day × 52
• % Impact: How reliant on uptime are you to produce?
• # of Hours: How many hours is the outage?
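The revenue-lost estimate on this slide divides yearly revenue by total yearly operating hours and then scales by the impact percentage and the outage duration. The sketch below assumes that yearly operating hours are derived as days per week times hours per day times 52; the function names and the sample figures are illustrative assumptions.

```python
def yearly_operating_hours(days_per_week: float, hours_per_day: float) -> float:
    """Assumed derivation: days per week x hours per day x 52 weeks."""
    return days_per_week * hours_per_day * 52


def revenue_lost(yearly_revenue: float, operating_hours: float,
                 pct_impact: float, outage_hours: float) -> float:
    """Revenue lost = (yearly revenue / yearly operating hours) x % impact x # of hours."""
    if operating_hours <= 0:
        raise ValueError("operating_hours must be positive")
    revenue_per_hour = yearly_revenue / operating_hours
    return revenue_per_hour * pct_impact * outage_hours


# Hypothetical figures: 5 days x 8 hours per week, yearly revenue of 2,080,000,
# 50% of revenue dependent on uptime, and a 4-hour outage.
hours = yearly_operating_hours(5, 8)
example_revenue_lost = revenue_lost(2_080_000, hours, 0.5, 4)
print(hours, example_revenue_lost)
```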

Disaster Impact: Intangible Cost
• Increased Demand for Service
• Unhappy Customer & Brand Reputation
• Overtime Cost to Cover Decreased Production
• Customer Loyalty & Attrition
• Missed Deadlines, Agreements & Contracts

Defining Risk Tolerance
• Business Unit Personnel: Business Impact Analysis, producing Maximum Tolerable Downtimes (MTDs)
• Information Technology Group: Current State Assessment, producing Current Recovery Capabilities (CRCs)
• Management Negotiation Based on Risk Tolerance: Recovery Time Objectives (RTOs) and Recovery Point Objectives (RPOs)
• Other diagram labels: $ and Operational Impacts; Manual Processing; Application ‘X’ in 30 Mins; Application ‘X’ in 1 Hour
Copyright © 2013 Accenture. All rights reserved.

RPO & RTO Impact to Business

RTO & RPO Metrics Definition
RPO:
Maximum acceptable time between the last data backup on the DR site and the failure event. In other words, it is the maximum time interval of data loss that can be accepted.
[Figure: timeline with points t-1, t0, t1, t2 on the Time axis]
RTO:
Maximum acceptable time for fixing the infrastructure issue and having 100% of hardware resources back on the DR site.
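The two definitions above can be checked against a concrete incident. The sketch below is a minimal illustration: it takes the time of the last backup on the DR site, the failure time, and the time at which resources were back, and compares the two intervals with the agreed objectives. The function name and the sample timestamps are assumptions made for the example.

```python
from datetime import datetime, timedelta


def check_objectives(last_backup: datetime, failure: datetime,
                     restored: datetime, rpo: timedelta, rto: timedelta) -> dict:
    """Compare the observed data-loss window and downtime with RPO and RTO."""
    data_loss_window = failure - last_backup   # data written in this window is lost
    downtime = restored - failure              # time until resources are back
    return {
        "data_loss_window": data_loss_window,
        "downtime": downtime,
        "rpo_met": data_loss_window <= rpo,
        "rto_met": downtime <= rto,
    }


# Hypothetical incident: backup at 08:00, failure at 08:45, service restored at 11:15,
# with an RPO of 1 hour and an RTO of 2 hours.
result = check_objectives(
    last_backup=datetime(2020, 1, 1, 8, 0),
    failure=datetime(2020, 1, 1, 8, 45),
    restored=datetime(2020, 1, 1, 11, 15),
    rpo=timedelta(hours=1),
    rto=timedelta(hours=2),
)
print(result["rpo_met"], result["rto_met"])
```

In this example the 45-minute data-loss window is within the RPO, while the 2.5-hour downtime exceeds the RTO.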


Disaster Recovery System Components

Definition: Business Continuity, Disaster Recovery & High Availability
Business Continuity (BC)
The process that uses prevention and crisis management, as well as alternate resources and procedures, to sustain the minimum required business functionality during a crisis. In many cases this takes place before IT recovery.
Disaster Recovery (DR)
Provides the technical ability to maintain critical services in the event of any unplanned incident that threatens these services or the technical infrastructure required to maintain them.
High Availability (HA)
The ability to switch automatically to alternate resources when a portion of the system is not functional or cannot remain functional.
Minimizing Business Continuity risks requires thorough planning, using a business-requirement-driven approach and a proven Business Continuity Planning methodology.

Differences Between BCP & DRP
Business Continuity Plan (BCP)
• Focused on recovery of individual business processes, departments, functions, facilities, etc. (revenue production and operational management)
• Recovery Time Objective (RTO) is typically measured in days or weeks, sometimes months
• Active business and IT participation
• Recovery addresses people, process and support technologies required to continue the business
• Continuity plans are usually by process, department, function and/or facility
Disaster Recovery Plan (DRP)
• Focused on recovery of enterprise information technology (IT) applications and supporting infrastructure (support the business)
• Recovery Time Objective (RTO) is typically measured in minutes or hours, sometimes days
• Active IT participation with, normally, little to no business participation during an event
• Recovery addresses enterprise data center/computing, facility and support staff needs
• Recovery plans are usually by application suite, platform and/or data center facility

RTO & RPO Technological Impacts
RTO and RPO parameters are used to evaluate possible disaster recovery solutions along two different dimensions: Platform Recovery and Data Recovery.

Platform Recovery Strategies
Hot Standby
• Description: Computer hardware that is pre-configured with software and business data so that it is ready to accept the production load as soon as the primary server fails. The fail-over is typically through a stretched cluster or load balancing.
• Indicative RTO coverage: Minutes
• Comment: This option requires high levels of operational attention because it is a fail-over solution. The age of the data is dependent on the data restore method.
• Hardware source: Dedicated
Warm Standby
• Description: Computer hardware that is pre-configured with software (or uses dynamic provisioning). Once a disaster occurs, business data is restored, the network is switched to the backup site, and the server then accepts the production load.
• Indicative RTO coverage: Hours
• Comment: This option has the resources required to recover the system available, but work is required to make them live.
• Hardware source: Dedicated
Cold Standby (incl. shared risk)
• Description: Computer hardware that requires the necessary software and data to be built or restored before the system would be in a productive state.
• Indicative RTO coverage: Days
• Comment: This option requires a rebuild of the system to recover at the alternate location.
• Hardware source: Test / Development / Shared Risk
No DR Standby
• Description: No pre-built hardware for disaster recovery.
• Indicative RTO coverage: Weeks
• Comment: This option should at least include DR procedures.
• Hardware source: Procure on invocation

Data Recovery Strategies
Synchronous Disk Mirroring
• Description: Synchronous replication from one set of disks to another set of disks at an alternate location (often SAN based).
• Minimum RPO: No transactional data loss
• Comment: High I/O applications limit the distance between primary and alternative data centres.
Asynchronous Disk Mirroring
• Description: Asynchronous replication from one set of disks to another set of disks at an alternate location (often SAN based).
• Minimum RPO: Seconds or minutes
• Comment: Can run over very large distances but does not guarantee replication of transactional data.
Disk Copy (Periodic Snapshot)
• Description: Snapshot data replication technologies ensure point-in-time replication of data from one set of disks to another (often SAN based).
• Minimum RPO: Hours (typically up to 24 hours)
• Comment: Implementation may use synchronous or asynchronous mirroring but does not necessarily preserve write order during copy.
Tape Recovery (Regular Backup)
• Description: Regular backup from disk to tape followed by an off-siting process (either inline or duplication or manual transportation).
• Minimum RPO: 12 hours to many days
• Comment: RPO depends on the time from backup to off-siting.

Combining Platform & Data Strategies
Based on BIA Results
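One way to picture this combination is a lookup that maps the RTO and RPO required by the BIA to the cheapest platform and data strategy that still fits. The sketch below is only an illustration: the strategy names come from the two strategy tables, but every numeric threshold is an assumption chosen to make "minutes", "hours", "days" and "weeks" concrete, and the function names are hypothetical.

```python
from datetime import timedelta

# Tiers are ordered from slowest (least infrastructure) to fastest.
# Each threshold is the smallest required objective for which the tier is
# treated as adequate. All thresholds are illustrative assumptions.
PLATFORM_TIERS = [
    ("No DR Standby", timedelta(weeks=2)),
    ("Cold Standby", timedelta(days=2)),
    ("Warm Standby", timedelta(hours=4)),
    ("Hot Standby", timedelta(0)),
]
DATA_TIERS = [
    ("Tape Recovery (Regular Backup)", timedelta(days=1)),
    ("Disk Copy (Periodic Snapshot)", timedelta(hours=1)),
    ("Asynchronous Disk Mirroring", timedelta(minutes=1)),
    ("Synchronous Disk Mirroring", timedelta(0)),
]


def pick_tier(tiers, required: timedelta) -> str:
    """Return the first (slowest) tier whose threshold the requirement tolerates."""
    if required < timedelta(0):
        raise ValueError("required objective cannot be negative")
    for name, threshold in tiers:
        if required >= threshold:
            return name
    raise ValueError("no tier matches")  # not reached: the last threshold is zero


def choose_strategies(required_rto: timedelta, required_rpo: timedelta):
    """Pair a platform strategy (driven by RTO) with a data strategy (driven by RPO)."""
    return pick_tier(PLATFORM_TIERS, required_rto), pick_tier(DATA_TIERS, required_rpo)


# Hypothetical BIA result: the application tolerates 6 hours of downtime
# and 5 minutes of data loss.
platform, data = choose_strategies(timedelta(hours=6), timedelta(minutes=5))
print(platform, "+", data)
```

A real selection would also weigh cost, distance between sites and application I/O profile, which this lookup ignores.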


Backup Your Data
• Focus is on protecting data
• Tape backup
• Imaging
• Poor performance
• Slow RTO, RPO (days)
• Hidden costs
• How do we get the data back into a usable state?
• How long does it take to rebuild a server?

Double Infrastructure
• Focus is on protecting application
• Clustering
• Like-for-like infrastructure
• Performance, but at what price?
• Near-zero RTO, RPO
• High cost
• Duplicate infrastructure
• Management complexity

Cloud Disaster Recovery as a Service
Migrating entire IT operations, or only the DR solution, to the cloud, together with replicating or moving data to the cloud, brings significant cost savings and lower recovery times.
• Can shrink and grow in response to demand. ‘Replication Mode’ requires fewer resources and incurs low cost. When a business disruption occurs, the system enters ‘Failover Mode’, which requires more resources; these scale smoothly without large upfront investments.
• Cloud computing removes the need to match hardware between the primary data center and the cloud.
• Cloud server start-up can be easily automated and managed.

Cloud Disaster Recovery Approach
New storage, compute, network and backup functionality provided by the cloud offers a way to keep redundant resources outside the primary data center.

4 Types of Mandatory DRaaS Operations
• Replication
• DR Testing
• Failover
• Failback
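The four operations can be read as a lifecycle. The sketch below models them as a small state machine so that an invalid order (for example, a failback without a preceding failover) is rejected. The allowed transitions are an assumption made for illustration; the slide only names the four operations.

```python
from enum import Enum


class Operation(Enum):
    REPLICATION = "Replication"
    DR_TESTING = "DR Testing"
    FAILOVER = "Failover"
    FAILBACK = "Failback"


# Assumed ordering: replication is the steady state, a DR test returns to
# replication, a failover must be followed by a failback, and a failback
# resumes replication.
ALLOWED_NEXT = {
    Operation.REPLICATION: {Operation.DR_TESTING, Operation.FAILOVER},
    Operation.DR_TESTING: {Operation.REPLICATION},
    Operation.FAILOVER: {Operation.FAILBACK},
    Operation.FAILBACK: {Operation.REPLICATION},
}


def run_sequence(operations, start=Operation.REPLICATION) -> Operation:
    """Walk through the operations and return the final one, or raise on an invalid step."""
    current = start
    for op in operations:
        if op not in ALLOWED_NEXT[current]:
            raise ValueError(f"{op.value} cannot follow {current.value}")
        current = op
    return current


# Hypothetical year: one DR test, then a real incident and the return to normal.
final = run_sequence([
    Operation.DR_TESTING,
    Operation.REPLICATION,
    Operation.FAILOVER,
    Operation.FAILBACK,
    Operation.REPLICATION,
])
print(final.value)
```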


“Generally Accepted Standards” & “Best Practices” for BCM
CoBIT Framework for Assessing IT Controls (www.isaca.org): The IT Governance Institute provides a guidance publication on IT governance, security and assurance: Control Objectives for Information and Related Technology (CoBIT), third edition (2000). CoBIT is internationally accepted as good practice for control over IT and related risks (see Domain: Deliver and Support, Process: Ensure Continuous Service in the CoBIT framework).
National Institute of Standards and Technology (NIST) Special Publication 800-34 (www.nist.gov): NIST has produced a Contingency Planning Guide for Information Technology Systems, which sets out fundamental planning principles and practices to help personnel develop and maintain effective IT contingency plans. It is used by Federal departments and agencies.
ISO 17799 (www.iso-17799.com): The information security standard of the International Organization for Standardization (ISO). ISO 17799 has an entire section entitled Business Continuity Management, in which testing, maintaining and reassessing the plan are called for directly.
British Standards Institute (www.bsiglobal.com): The BSI has released a Publicly Available Specification, PAS56, which sets out a high-level process for implementing BCM within an organization. This is currently being developed into a full British Standard and in time will likely evolve into an ISO standard.

Business Continuity Institute (www.thebci.org): The BCI has developed its "Business Continuity Management - Good Practice Guidelines." The BCI works closely with the BSI in the development of a BCM Standard, and these guidelines represent the latest thinking. Its website offers free downloads of the Good Practice Guidelines. The BCI also provides certification at a number of different levels (e.g. Member, Fellow, etc.). Basic certification is via references from previous BCM projects, with the higher levels requiring certification interviews. Membership of the BCI is more commonplace in Europe than in the US.
DRI International (www.drii.org): The DRII's "Professional Practices for Business Continuity Planners" sets out a seven-step model for Business Continuity. DRII also provides certification of Business Continuity professionals through an extensive training and examination program. Whilst currently predominantly US-based, DRII is making inroads in Europe with professionals who are looking for an exam-based certification, although some clients are sceptical because these exams often come at the end of paid-for training courses.
IT Infrastructure Library (www.itil.co.uk): ITIL is possibly the most widely accepted approach to IT Service Management, and includes an extensive process concerning "IT Service Continuity Management." Whilst this may sound IT-focused, much of the process focuses on "The Business Continuity Lifecycle" and covers many of the elements addressed by the Accenture BCM Methods. This is particularly useful when working with clients who have implemented or are in the process of implementing ITIL processes.

U.S. Federal Government Certification & Accreditation (C&A) Methodologies
There are generally three methodologies used for C&A initiatives:
DITSCAP is an acronym for the Department of Defense Information Technology Security Certification and Accreditation Process. It is defined in Department of Defense (DoD) Instruction 5200.40. DITSCAP is typically used only for defense agencies, although civilian agencies may opt to apply DITSCAP principles to their own customized C&A process.
NIACAP stands for National Information Assurance Certification and Accreditation Process. It is based on the process published as National Security Telecommunications and Information Systems Security Instruction (NSTISSI) No. 1000.
NIST is the National Institute of Standards and Technology, and its C&A methodology is described in Special Publication 800-37. While many civilian agencies have traditionally used either the NIACAP or NIST methodologies, the current trend is that most agencies are moving away from NIACAP to embrace the new NIST methodology.

THANK YOU
Sutedjo Tjahjadi, Managing Director
