HA/DR Architecture
for SAP HANA
on IBM Power Systems
Liwen Yeow
SAP HANA on Power Product Management
yeow@ca.ibm.com
Andreas Müller
IBM Technology Engineering for SAP
andreas.r.mueller@ibm.com
Business Continuity
$5,600/minute
According to a Gartner industry survey, the figure cited is an average cost of
downtime, which translates to well over $300K/hour
Unplanned downtime incidents
Planned downtime
Planned downtime is time specifically
scheduled to address equipment
performance, hardware/software
upgrades, maintenance, inspections,
and other necessary upkeep
SLA availability
| Number of nines | Uptime % | Maximum annual downtime |
| Six (6) | 99.9999 | 32 seconds |
| Five (5) | 99.999 | 5 minutes 16 seconds |
| Four (4) | 99.99 | 52 minutes 36 seconds |
| Three (3) | 99.9 | 8 hours 45 minutes 57 seconds |
| Two (2) | 99.0 | 3 days 15 hours 39 minutes 29 seconds |
| One (1) | 90.0 | 36.5 days |
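These downtime budgets follow directly from the uptime percentage: maximum annual downtime = (1 − availability) × one year. A quick sketch to reproduce the table (assuming a 365-day year):

    # Reproduce the "number of nines" table: max annual downtime =
    # (1 - availability) * one year. A 365-day year is assumed.
    SECONDS_PER_YEAR = 365 * 24 * 60 * 60

    for availability in (0.999999, 0.99999, 0.9999, 0.999, 0.99, 0.90):
        downtime = (1 - availability) * SECONDS_PER_YEAR
        days, rest = divmod(downtime, 86400)
        hours, rest = divmod(rest, 3600)
        minutes, seconds = divmod(rest, 60)
        print(f"{availability:.4%}: {int(days)}d {int(hours)}h {int(minutes)}m {seconds:.0f}s")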
Typical SAP ecosystem/integration domain
Source: Integration Architecture Guide for Cloud and Hybrid Landscapes Based on SAP Integration Solution Advisory Methodology
HA/DR = Eliminate single points of failure
[Diagram: redundant infrastructure across Site A, Site B, and IBM Cloud: RAID-protected storage and UPS units at each site, so no single component is a point of failure]
What are Recovery Point Objective (RPO)
and Recovery Time Objective (RTO)?
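In short: the Recovery Point Objective is the maximum tolerable data loss, measured backwards in time from the failure (how old the last recoverable state may be), and the Recovery Time Objective is the maximum tolerable time from failure until service is restored. A minimal sketch of checking a setup against both targets (the helper names are illustrative, not an SAP API):

    from dataclasses import dataclass

    @dataclass
    class Objectives:
        rpo_seconds: float  # max tolerable data loss, as time since the last recoverable state
        rto_seconds: float  # max tolerable outage, from failure until service is restored

    def meets_objectives(last_replicated_ago_s, expected_takeover_s, target):
        """True if a failure right now would still satisfy the SLA."""
        return (last_replicated_ago_s <= target.rpo_seconds
                and expected_takeover_s <= target.rto_seconds)

    # Example: synchronous replication (RPO = 0) with a warm standby taking ~3 minutes:
    print(meets_objectives(0, 180, Objectives(rpo_seconds=0, rto_seconds=600)))  # True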
SAP Built-in high availability options
SAP high availability architecture
Example: SAP HA ABAP System
Hardware redundancy - distribute
across multiple servers
• SAP Web dispatchers (load balancers)
• SAP Application Servers (AS) with n ≥ 2
Identify and cluster SPOFs
• (ABAP) SAP Central Services (A)SCS (message + enqueue server):
  • The Enqueue Replication Server (ERS) replicates a copy of the lock table
  • If (A)SCS fails, the cluster manager (e.g., Pacemaker) starts a new (A)SCS and
    syncs it with the ERS data (sketched after the diagram below)
• Shared file system (e.g., NFS or IBM Spectrum Scale) for central directories
  like /sapmnt and /usr/sap/trans
• SAP HANA database server
[Diagram: DMZ/intranet topology: users reach redundant SAP Web Dispatchers through SAProuter and SAP Cloud Connector; behind them, clustered (A)SCS/ERS instances, application servers AS 1..AS n, a shared file system, and SAP HANA primary/secondary on RAID storage]
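To make the (A)SCS/ERS interplay above concrete: when the node running (A)SCS fails, the cluster manager restarts (A)SCS on the node holding the ERS replica, so the enqueue lock table can be rebuilt from the replicated copy. A toy model of that step (illustrative only; real failover is done by a resource agent under Pacemaker):

    # Toy model of (A)SCS failover with the Enqueue Replication Server (ERS).
    class Node:
        def __init__(self, name, running):
            self.name = name
            self.running = set(running)  # services on this node, e.g. {"ASCS"} or {"ERS"}
            self.alive = True

    def fail_over_ascs(nodes):
        ascs_node = next(n for n in nodes if "ASCS" in n.running)
        ascs_node.alive = False                   # node crash takes (A)SCS down
        ers_node = next(n for n in nodes if n.alive and "ERS" in n.running)
        ers_node.running.add("ASCS")              # cluster restarts (A)SCS beside ERS ...
        lock_table = "rebuilt from ERS replica"   # ... so enqueue locks are not lost
        ers_node.running.discard("ERS")           # ERS then moves on to another node
        return ers_node.name, lock_table

    nodes = [Node("hostA", {"ASCS"}), Node("hostB", {"ERS"})]
    print(fail_over_ascs(nodes))  # ('hostB', 'rebuilt from ERS replica')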
SAP HANA high availability and disaster recovery options
Host Auto-Failover: cluster-like solution, cold standby
• One data pool
• Includes an internal cluster manager for HA
• Uses Storage Connector APIs for communication with the environment
• Covers HW problems with additional servers

System Replication: log-shipping with regular snapshot propagation, warm or cold standby
• Both for HA and DR
• Automation with an external cluster manager like Pacemaker
• Active/Active for single-node, scale-out & MDC
• Covers HW and data integrity problems with an additional set of individually driven data pools

Storage Replication: data propagation on disk block level, cold standby
• Usually used for DR
• Automation with an external cluster manager like VM Recovery Manager
• Usually driven by the HW or storage partner (no SAP certification)
• Covers HW & site failures on a broader scale
[Diagrams: host auto-failover (name servers on Hosts 1-3 plus a standby host, attached to SAN storage via the Storage Connector API); system replication (primary HANA shipping data and log to a secondary with data preload, after an automatic initial data load); storage replication (block-level copy of data and log volumes to a secondary that can also run Dev&QA)]
SAP HANA high availability: Host auto-failover
[Diagrams: a scale-out cluster with worker Hosts 1-6 plus a standby host, and a minimalistic setup with one worker host (Host 1) and one standby host; in both, the name servers coordinate access to SAN storage through the Storage Connector API]
• Scale-out clusters address two business problems:
  • Scale to a RAM size bigger than one host
  • HA solution with one or more standby servers
• Host Auto-Failover is managed by the Name Server (see the sketch below):
  • Component of SAP HANA
  • Heartbeat checks of all cluster members
  • Fully automated takeover in case of failure
• For SAP systems that support a scale-out configuration:
  • BW/4HANA and BW: no restrictions
  • S/4HANA: max 4 nodes
  • SAP recommendation: scale up first, only then scale out
• Disadvantages/shortcomings:
  • Restarting the HANA instance on the standby server can be slow due to the
    data load (comparable to cost-optimized HANA System Replication)
  • Administration and performance management can be more complex than scale-up
  • Expensive for a minimalistic setup
  • Host Auto-Failover does not protect against failure of the storage or of the
    single shared copy of the database. Additional measures are required.
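The takeover decision described above boils down to a heartbeat timeout check by the name server: a worker whose heartbeat is overdue gets its role reassigned to a standby host, which then mounts that worker's volumes. The sketch below is purely illustrative; the timeout value and the actual logic live inside SAP HANA:

    import time

    HEARTBEAT_TIMEOUT_S = 60  # illustrative value, not a HANA default

    def check_cluster(last_heartbeat, standby_hosts, now=None):
        """Return {failed_host: standby_host} for hosts whose heartbeat timed out."""
        now = time.time() if now is None else now
        takeovers = {}
        for host, last_seen in last_heartbeat.items():
            if now - last_seen > HEARTBEAT_TIMEOUT_S and standby_hosts:
                # The standby takes over the failed host's role; it mounts that
                # host's data/log volumes via the storage connector API.
                takeovers[host] = standby_hosts.pop(0)
        return takeovers

    heartbeats = {"host1": time.time(), "host2": time.time() - 300}
    print(check_cluster(heartbeats, ["standby1"]))  # {'host2': 'standby1'}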
SAP HANA system replication – performance optimized
[Diagram: primary (active) and secondary (active, data pre-loaded) systems, each with name server, index server, and RAID-protected data and log volumes; the transfer is done by the HANA database kernel]
Database organizes the replication process:
• Keeps a secondary shadow instance
• Preload of data on secondary
• All log changes on the primary system are
redone on secondary
• Two flavors possible:
• Performance optimized option
• Cost optimized option
Performance optimized option:
• Secondary system completely used for the
preparation of a possible take-over
• Warm standby, takeover RTO = n minutes
Notes:
• The secondary needs to be sized identically (1:1, e.g., the same number of
  nodes if the primary is a cluster)
• Additional cluster software like Pacemaker is necessary for automated
  failover (the setup commands are sketched below)
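System replication itself is typically set up with SAP's hdbnsutil tool: -sr_enable on the primary, then -sr_register on the stopped secondary. The flags below follow common documentation but should be verified against your HANA revision; the Python wrapper is just a convenience sketch:

    import subprocess

    def run(cmd):
        """Run an hdbnsutil command (as the <sid>adm user; no error handling shown)."""
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # On the primary (Site A):
    run(["hdbnsutil", "-sr_enable", "--name=SiteA"])

    # On the stopped secondary (Site B), register it against the primary:
    run(["hdbnsutil", "-sr_register", "--name=SiteB",
         "--remoteHost=primary-host", "--remoteInstance=00",
         "--replicationMode=sync", "--operationMode=logreplay"])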
SAP HANA system replication – cost optimized
[Diagram: primary (active) and secondary (active) systems, each with name server, index server, and RAID data/log volumes; the transfer is done by the HANA database kernel]
Cost optimized option:
• Column tables are not pre-loaded into the RAM of the secondary system
  (the HANA core and the row store are still pre-loaded)
• The free RAM can be used to run DEV/QA instances on the secondary system
• The PRD shadow system processes the log changes
• During takeover, non-production operation must be ended and the column
  tables are loaded from disk into RAM
• Cold standby, takeover RTO = n minutes
Notes:
• The row store size needs to be monitored
• Independent disk volume and I/O capacity is required for DEV/QA → use
  separate storage infrastructure for PRD and DEV/QA
[Diagram: the secondary site runs the PRD shadow alongside DEV and QA systems (DEV, QA1, QA2) on separate RAID storage]
System replication modes – FULL-SYNC
Log Replication “Synchronous with Full Sync Option”
• A transaction is committed only when the log buffer has been written to the log
  volumes of both the primary and the secondary instance.
• When the secondary system gets disconnected (e.g., because of a network failure),
  the primary system suspends transaction processing until the connection to the
  secondary system is re-established.
• No data loss can happen in this scenario.
[Diagram: the primary writes the log buffer to its log volume and ships it to the secondary, which loads it to RAM, writes it to its own log volume, and only then acknowledges the commit]
System replication modes - SYNC
Log Replication “Synchronous”
• When the connection to the secondary system is lost (after a timeout period
  defined by the parameter logshipping_timeout, default 30 seconds), the primary
  system continues transaction processing and writes the changes only to the
  local disk.
• No data loss can happen in this scenario as long as the secondary system is
  connected.
• Data loss can occur when a takeover is executed after the secondary system was
  disconnected and didn't receive updates.
[Diagram: as with FULL-SYNC, the secondary acknowledges after writing the shipped log buffer to its log volume; the difference is the primary's behavior while the secondary is disconnected]
System replication modes – SYNCMEM (default)
Log Replication “Synchronous in Memory”
• When the connection to the secondary system is lost, the primary system
  continues transaction processing and writes the changes only to the local disk.
• Data loss can occur when the primary and the secondary fail at the same time
  and the secondary system afterwards takes over.
[Diagram: the secondary acknowledges as soon as the shipped log buffer is loaded into its RAM, before writing it to its own log volume]
System replication modes - ASYNC
Log Replication “Asynchronous”
• A transaction is committed when the log buffer has been written to the log
  volume of the primary and sent to the secondary via the network channel,
  without waiting for an acknowledgement.
• Data loss can therefore occur when the secondary needs to take over.
• Synchronous or asynchronous replication is chosen based on the workload
  intensity on the primary server and the network latency; e.g., for distances
  >100 km, ASYNC is recommended. (The four modes are contrasted in the sketch below.)
[Diagram: the primary commits after the local log write and the network send; the secondary writes the received log buffer without the primary waiting for an acknowledgement]
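The four modes differ only in when the primary acknowledges a commit and in what happens while the secondary is unreachable. A compact (and deliberately simplified) model of those semantics:

    # When does a commit return, per replication mode? (Simplified model.)
    def commit(mode, secondary_connected):
        if mode == "FULLSYNC":
            if not secondary_connected:
                return "blocked (primary suspends transactions)"       # RPO = 0, always
            return "after local log write + secondary log write"
        if mode == "SYNC":
            if not secondary_connected:      # after logshipping_timeout (default 30 s)
                return "after local log write only (RPO > 0 until reconnect)"
            return "after local log write + secondary log write"      # RPO = 0 while connected
        if mode == "SYNCMEM":
            # Acknowledged on receipt in secondary RAM; lost if both sides fail at once.
            return "after local log write + receipt in secondary RAM"
        if mode == "ASYNC":
            return "after local log write + async send (RPO > 0 on takeover)"
        raise ValueError(mode)

    for m in ("FULLSYNC", "SYNC", "SYNCMEM", "ASYNC"):
        print(f"{m:8s} connected: {commit(m, True)}")
    print("SYNC disconnected:", commit("SYNC", False))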
Multi-tier and multi-target landscape
[Diagram: multi-tier and multi-target replication: the Tier 1 primary (Site A) replicates synchronously (SYNC/SYNCMEM) with cluster automation to a Tier 2 secondary at a nearby site (Site B, <100 km, e.g., London West to London North), and asynchronously (ASYNC) over worldwide distances to a Tier 3 secondary/tertiary (Site C, e.g., Dallas)]
Failover between sites is a DR scenario: it is a human decision and is triggered manually.
Cluster Managers
Failover automation
An HA failover cluster is a set of computer servers that work together to
provide high availability (HA):
• Linux Pacemaker
• IBM PowerHA
VM management software provides HA for virtual machines (VMs) by pooling the
VMs and the physical servers they reside on into a cluster. If a failure
occurs, the VMs on the failed host are restarted on alternate hosts:
• IBM VM Recovery Manager
• VMware vSphere HA
[Diagram: two-node failover cluster: Cluster Node 1 (primary) and Cluster Node 2 (secondary) linked by a private heartbeat network, with shared storage (attached on failover) and a virtual IP address for clients]
Linux HA architecture with Pacemaker
[Diagram: three servers, each running resource scripts, Corosync (communication/sync layer), and Pacemaker with CIB, CRM, STONITH fencing agent, LRM, and resource agents (RA); Server 1 is the Designated Coordinator and runs the Policy Engine (PE), Server 2 holds a CIB replica, and heartbeats flow over Corosync]
DC = Designated Coordinator
• The CRM of one node, elected as "primary"
RA = Resource Agent
• Program to start/stop/monitor services like SAP HANA
CRM = Cluster Resource Manager
• The brain of Pacemaker
• All cluster actions are routed through the CRM daemon (crmd)
CIB = Cluster Information Base
• In-memory XML representation of the cluster configuration & status
LRM = Local Resource Manager
• Triggered by the CRM to call RAs
• Reports the results back
PE = Policy Engine
• When the DC triggers a cluster-wide change, the PE calculates the next (ideal)
  state of the cluster based on the current state and configuration
• Always runs on the DC
Split Brain, Fencing, and STONITH
• Split Brain
  • The connection between the cluster nodes is disrupted, but the nodes still
    have connections to the outside, e.g., to the SAP application servers
  • The cluster nodes continue to run and update the database
  • Both parts of the cluster have only a subset of the changes
    → logical inconsistency and data corruption
• Fencing agent
  • Watchdog in combination with an external device (like a power unit or an I/O
    adapter) or a shared disk
  • Restricts access to shared resources by an errant node, e.g., by a hard
    reboot of that node
• STONITH = Shoot The Other Node In The Head
  • The cluster partition with fewer nodes stops operation, reboots, and tries
    to reconnect to the cluster (Server 1)
  • The cluster partition with more nodes (quorum), here Servers 2 & 3, elects a
    new Designated Coordinator and reboots the errant node via the fencing agent
  • When the fencing agent confirms that the errant node has stopped operation,
    the remaining cluster continues operation
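The quorum rule that resolves a split brain is strict majority: only the partition holding more than half of the votes keeps running, and it fences every node outside it. A sketch, with an illustrative fence callback standing in for a real STONITH device:

    def resolve_split_brain(partitions, total_nodes, fence):
        """Keep the partition with quorum (> half of all votes); fence everyone else."""
        for part in partitions:
            if len(part) > total_nodes / 2:      # quorum: strict majority
                survivors = part
                break
        else:
            raise RuntimeError("no partition has quorum; all nodes stop")
        for part in partitions:
            if part is not survivors:
                for node in part:
                    fence(node)                  # e.g. hard reboot via the fencing device
        return survivors

    # Three-node cluster splits into {server2, server3} and {server1}:
    keep = resolve_split_brain([["server2", "server3"], ["server1"]], 3,
                               fence=lambda n: print(f"STONITH: rebooting {n}"))
    print("surviving partition:", keep)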
Example of the takeover process
with SAP HANA system replication and a cluster manager
[Diagram, three phases: (1) normal operation: application servers connect via DBSL to a virtual IP/hostname pointing at the primary SAP HANA, which replicates (a)synchronously to the secondary; (2) takeover: the cluster manager promotes the secondary and moves the virtual IP/hostname to it; (3) the failed node is rebuilt, reconfigured as the new secondary, and replication is re-initiated]
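In sequence, the takeover in the figure amounts to: promote the secondary (hdbnsutil -sr_takeover), move the virtual IP/hostname so the application servers reconnect unchanged, then rebuild the failed node, re-register it as the new secondary, and re-initiate replication. An illustrative orchestration sketch, with callbacks standing in for cluster-manager actions:

    def takeover(primary, secondary, promote, move_virtual_ip, register_as_secondary):
        """Illustrative takeover flow; in practice the cluster manager drives this."""
        promote(secondary)                 # e.g. hdbnsutil -sr_takeover on the secondary
        move_virtual_ip(to=secondary)      # application servers (DBSL) reconnect unchanged
        # Later, once the failed node is repaired (steps 1-3 in the figure):
        register_as_secondary(primary, new_primary=secondary)
        return secondary

    new_primary = takeover(
        "hana-a", "hana-b",
        promote=lambda n: print(f"takeover on {n}"),
        move_virtual_ip=lambda to: print(f"virtual IP -> {to}"),
        register_as_secondary=lambda old, new_primary: print(
            f"{old} re-registered as secondary of {new_primary}"),
    )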
VM Recovery Manager product use cases
[Diagram: a KSYS LPAR running the VMR code, administered via CLI or GUI, controls managed and unmanaged VMs in Host Groups A and B at the primary and secondary sites through the HMCs; each host runs redundant VIO servers, and disk groups are replicated between the sites]
HA Use Cases:
• LPM Management
• Automatic VM Restart [ Host or VM Level ]
• Application Monitoring / Restart
DR Use Cases:
• Planned DR [ Site | Host Group | VM Workgroup ]
• Unplanned DR [ Site | Host Group | VM Workgroup ]
• DR Testing [ Site | Host Group | VM Workgroup ]
VM Recovery Manager – HA for SAP HANA
Clustered HANA DB with HSR
• Redundant OS images
• No shared data volumes
• HANA HSR replicates data over IP
• Separate hardware
• Dedicated or virtualized resources
[Diagram: two HANA DB VMs in one datacenter, each with its own OS and data volumes on redundant VIO servers, replicating data and logs via HANA HSR over IP (block-level replication as an alternative); a KSYS VM runs VMR-HA, and a VM Agent for HANA runs in each database VM]
VMR HA works with HSR by installing the VM Agent for HANA integration
(a control-flow sketch follows below):
• The VM agent monitors the application (HANA)
• Switches roles [PRIMARY / SECONDARY]
• Controls redirection of replication [SYNC / SYNCMEM / ASYNC]
• Swings the virtual IP between primary and secondary
OR
VMR + Pacemaker:
• Pacemaker manages the cluster
• VMR (KSYS) monitors for VM failure
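Conceptually, the VM agent is a monitor loop: probe HANA health, try a local restart first, and only then switch HSR roles and swing the virtual IP. Every name below is hypothetical and only illustrates the control flow; the real agent is configured through VMR, not hand-written:

    import time

    def vm_agent_loop(probe_hana, restart_local, switch_roles, swing_virtual_ip,
                      max_local_restarts=1, poll_s=30, max_iterations=None):
        """Hypothetical sketch of the VM agent's control flow; not the VMR product API."""
        failures, i = 0, 0
        while max_iterations is None or i < max_iterations:
            i += 1
            if probe_hana():                  # application-level HANA health check
                failures = 0
            else:
                failures += 1
                if failures <= max_local_restarts:
                    restart_local()           # try restarting HANA in place first
                else:
                    switch_roles()            # promote the HSR secondary to PRIMARY
                    swing_virtual_ip()        # clients simply follow the virtual IP
                    return "takeover"
            time.sleep(poll_s)
        return "healthy"

    print(vm_agent_loop(probe_hana=lambda: False,
                        restart_local=lambda: print("local restart"),
                        switch_roles=lambda: print("switch HSR roles"),
                        swing_virtual_ip=lambda: print("swing virtual IP"),
                        poll_s=0, max_iterations=5))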
VM Recovery Manager – DR for SAP HANA
[Diagram: DR-only solution: the SAP HANA VM (with VM Agent + SAP HANA monitoring) runs only in the primary datacenter; KSYS with VMR-DR in the secondary datacenter drives user-initiated DR moves over block-level replication]
• Logical volumes for OS, application, HANA data and logs
are block level replicated to secondary site
• No SAP HANA VM instance exists on secondary site
• Server at secondary site can be used for other applications
VM Recovery Manager – combined HA and DR for SAP HANA
[Diagrams, two combinations: (1) HSR SYNC with Pacemaker clustering between two SAP HANA VMs in the primary datacenter for HA, plus KSYS/VMR-DR block-level replication for user-initiated DR moves to the secondary datacenter; (2) HSR SYNC with VMR (KSYS/VMR-HADR) providing auto restart for HA plus user-initiated DR moves over block-level replication; in both, a VM Agent with SAP HANA monitoring runs in each database VM]
VM Recovery Manager – DR rehearsal functionality
[Diagram: active primary site and backup secondary site; an AIX 7.2 KSYS LPAR runs the VMR code and the VMR UI server, administered via CLI or GUI]
DR Rehearsals:
• Site Level
• HG Level
• VM Workgroup Level
[Diagram: block-level replication to the secondary site; a DR rehearsal starts clones of the SAP HANA VMs on FlashCopy target LUNs, so production replication continues undisturbed]
Pacemaker vs. VMR
| Criterion | Pacemaker | IBM VM Recovery Manager HA | IBM VM Recovery Manager DR |
| Base technology | Linux OS | IBM PowerVM & shared storage | IBM PowerVM & storage replication |
| Supported by SAP | ✓ | ✓ | ✓ |
| Automated failover | ✓ | ✓ | ✗ (user-initiated DR moves) |
| Ideal for DR | ✗ | ✗ | ✓ |
| Ideal for HA | ✓ | ✓ | ✗ * |
| Ease of management | + | +++ | ++ |
| Cloud support | ✓ | ✗ | ✗ |
| Reuse secondary for non-production workload | Partly | Partly | ✓ |
| License cost | Part of Linux OS | $ | $$ |
| DR rehearsal feature for non-disruptive DR testing | ✗ | ✗ | ✓ |
* Depends on the topology deployed
Libelle: for SAP landscape DR in the cloud, hybrid, or on-premises
BusinessShadow
• Solution for disaster recovery and high availability that can mirror SAP
  landscapes and other application systems with a time delay
• Automatically creates the standby system. The sequences for switching to
  emergency mode as well as switching back to normal mode can be initiated
  automatically or at the push of a button
• Protects not only against the consequences of hardware and application
  errors, but also against the consequences of elemental damage, sabotage,
  or data loss due to human error
• A dynamically adjustable time funnel temporarily stores the change logs
  before they are mirrored to the standby system
• Supports legacy databases (for NetWeaver-based systems) and SAP HANA
  (e.g., S/4HANA)
CloudShadow
Libelle CloudShadow for SAP landscapes on IBM Power Systems Virtual Server is a
sophisticated disaster recovery monitoring, management, and replication
solution. It supports SAP clients in managing their system replication,
replicates application servers, and manages post-DR automation tasks.
CloudShadow is based on Libelle’s data replication solution. In cooperation
with IBM, Libelle optimized and expanded the solution to run optimally within
IBM Power Systems Virtual Server (PowerVS).
RAS features of IBM Power for SAP
[6] The Cost of Downtime (Gartner Blog Network)
[7] Global hourly enterprise server downtime cost 2019 | Statista
[8] ITIC 2020 Global Server Hardware, Server OS Reliability Report (ibm.com)
IBM Power – reliability, availability,
performance
Independent studies confirm IBM Power Systems
is among the most resilient systems[8]:
• Memory protection and Chipkill
• Predictive failure alerts
• IBM PowerVM Live Partition Mobility
• Virtual persistent memory
Live Partition Mobility
(LPM)
Live Partition Mobility is the actual movement of
a running partition from one physical machine to
another without disrupting the operation of the
operating system and applications running in
that partition
Usage:
• Workload consolidation (move from many systems
to one system)
• Workload balancing (move a partition to a system
that has a lighter workload)
• Planned CEC outages for maintenance/upgrades
• Impending CEC outages (as an option to keep a
partition running if hardware warnings are received)
Requirements:
• POWER7 or newer
• All I/O must be virtual and supported by VIOS
• Configuration must use only external storage,
and it must be accessible to both source and
destination systems
• Both source and destination systems must be
on the same Ethernet network
[Diagram: a running partition is moved between two Power servers, each with redundant VIO servers, over shared virtual SAN and network infrastructure; an HMC is optional]
IBM Power – data protection & security
• Full control over RPO and RTO
• Security built in at all layers in the stack
• IBM PowerVM has fewer reported vulnerabilities
than other hypervisors by orders of magnitude[9]
• Dedicated crypto engines for transparent
memory encryption
• Hardware-enforced container protection and
isolation capabilities
• IBM PowerSC: security and compliance for
virtualized environments on Power Systems
[9] How does PowerVM provide security between different LPARs - (IBM Power Systems Community)
SAP HANA data tiering
End-to-end data tiering with optimized data access. Types of HANA data
(an NSE example follows below):

HOT data (DRAM / PMEM)
• Persistent memory (PMEM) is used to keep hot data pre-loaded

WARM data (native storage extension (NSE), extension node, dynamic tiering, NLS IQ for BW only)
• NSE is an intelligent, built-in buffer cache for the SAP HANA in-memory
  database that leaves warm data on disk
• NSE is the primary warm store option for HANA on-premises and HANA Service
  for eligible SAP applications
• Extension node and dynamic tiering will continue to be offered

COLD data (SAP IQ, Spark Controller / Hadoop)
• SAP HANA cold data tiering provides persistence capabilities for HANA cold
  data in external data stores like SAP IQ, Hadoop Distributed File System
  (HDFS), Azure Data Lake, and SAP Big Data Services
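Warm data is placed under NSE control per table, partition, or column by changing its load unit. A sketch using SAP's hdbcli Python driver; the connection details are placeholders, and the exact DDL should be checked against your HANA version:

    from hdbcli import dbapi  # SAP's Python driver for HANA

    # Placeholder connection details.
    conn = dbapi.connect(address="hana-host", port=30015, user="ADMIN", password="***")
    cur = conn.cursor()

    # Demote a table to the NSE warm store: pages stay on disk and are faulted
    # into the buffer cache on access instead of being fully loaded into RAM.
    cur.execute('ALTER TABLE "SALES"."ORDERS_2019" PAGE LOADABLE CASCADE')

    # Promote it back to the hot store (fully column-loadable in memory):
    # cur.execute('ALTER TABLE "SALES"."ORDERS_2019" COLUMN LOADABLE CASCADE')
    conn.close()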
Vendor solutions for SAP HANA PMEM
Intel Optane DC Persistent Memory
• Dedicated hardware
• Cascade Lake and newer processor systems
• NVM

SAP HANA Fast Restart
• Linux kernel capability (tmpfs)
• No prerequisites
• DRAM

IBM Virtual Persistent Memory
• PowerVM feature
• Power9/10 servers
• DRAM

[Diagram: DRAM holds the working & volatile memory area, PMEM holds the main area, and disk holds /hana/data & /hana/log]
SAP HANA with DRAM as vPMEM on IBM Power
How it works
– DRAM is split into two regions: DRAM1 (delta region) and DRAM2 (main region)
– DRAM2 is configured as a PMEM device
– The DRAM2 region is initialized with the main region when it is added for
  the first time
– Data is written from DRAM2 to the data volume after a delta merge (which
  creates a new main)
– Changes to the database are continuously logged to the log volume
– A restart of HANA or Linux does not require the main region to be loaded
  from the storage layer into DRAM2
Client value
✅ 17x faster restart of the SAP HANA database in case of planned and
   unplanned downtime
✅ Available on existing Power9/10 systems with PowerVM
✅ No additional cost
✅ No performance degradation or latency
[Diagram: under PowerVM, Linux sees DRAM1 (delta region) and DRAM2 (main region, exposed as vPMEM); deltas are merged into the main region and persisted to the data volume, while changes are continuously logged to the log volume]
Backup
The last line of defense - database backups and recovery
• High availability measures focus on keeping the database up and running
• They offer no protection against a different class of threats:
  • Data deletion
  • Data corruption through HW errors or logical errors, e.g., application bugs
  • Data corruption through malicious attacks, e.g., ransomware
  • Permanent destruction of all resources in the HA/DR architecture and the
    necessity to rebuild from scratch
  • The HA solution not working as expected
• Backups can recover from these types of disaster
• Backups are an important part of DR strategies
• As such, executing them is a human decision, and the RTO is high, measured
  in hours
Data persistency
Although SAP HANA is an in-memory database (IMDB), it still needs persistent
storage to survive failures.
In-memory data
• User data, metadata, changed data
• Modeling data
• Undo log information
• Data compression is 4-5x
Log
• Redo log containing the changed data
• Saved to disk when a transaction commits
• Overwritten cyclically, and only after a log backup
Savepoint
• Changed data and undo logs are written
• Automatic
• At least every 5 minutes
SAP HANA backups
Data Backup
• Can be initiated from HANA Cockpit/Studio or the command line interface
  (see the SQL sketch below)
• A data backup contains only the actual data
• The backup does not include:
  • Unused space
  • The log area/files
• Data integrity checks are performed at block level during backups (the
  content is not checked)
• Delta backups (differential and incremental) are supported
Storage Snapshot
• Captures the content of the SAP HANA data area at a particular point in time
• Includes all the data that is required to recover the database to a
  consistent state
Log Backup
• By default, SAP HANA log segments are backed up automatically
Backint
• Interface to a third-party tool to back up and recover the SAP HANA database
  and logs
• A Backint agent must be installed
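Backups can equally be triggered by SQL, which is what Cockpit/Studio issue under the covers. A sketch with the hdbcli driver (placeholder connection details; verify the BACKUP DATA variants against the SQL reference of your HANA version, and note that MDC tenants are backed up from SYSTEMDB with BACKUP DATA FOR <tenant>):

    from hdbcli import dbapi  # SAP's Python driver for HANA

    conn = dbapi.connect(address="hana-host", port=30015, user="ADMIN", password="***")
    cur = conn.cursor()

    # Full data backup to file; the prefix 'MONDAY_FULL' names the backup:
    cur.execute("BACKUP DATA USING FILE ('MONDAY_FULL')")

    # Delta backups relative to the last full backup:
    cur.execute("BACKUP DATA DIFFERENTIAL USING FILE ('TUESDAY_DIFF')")
    cur.execute("BACKUP DATA INCREMENTAL USING FILE ('TUESDAY_INCR')")

    # Or hand the stream to a third-party tool through Backint:
    # cur.execute("BACKUP DATA USING BACKINT ('MONDAY_FULL')")
    conn.close()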
IBM Spectrum Protect
• Integrated with the SAP HANA backup and recovery utilities via the Backint
  interface and SAP HANA Studio
• Ensures that all backup objects belonging to the same full database backup
  get assigned the same backup ID and are handled as a single entity
• Integrates offline log files backed up through the BRARCHIVE command
Differentiators
• Highly scalable two-tier architecture: a single site can have many servers,
  managed by a single interface
• Highly efficient, secure data transfer
• Fully software-defined: no hardware required to enable features or scale
• Can migrate data to tape, public cloud services, and on-premises object
  storage
Solution Bundles
• Spectrum Protect Entry
• Spectrum Protect Suite – Front End
• Spectrum Protect Suite
Hybrid Cloud Integration
• Increases client flexibility to utilize cloud solutions based on business
  needs
• Supports IBM Cloud, AWS, Azure, and GCP
Comparing delta backup types
| | Differential Backup (Cumulative) | Incremental Backup |
| What data is backed up? | The data changed since the last full data backup | The data changed since the last full data backup or the last delta backup (incremental or differential) |
| Backup size | The amount of data to be saved with each differential backup increases | If data remains unchanged, it is not saved to more than one backup; incremental backups are therefore the smallest backup type |
| Backup and recovery strategy | With only full and differential backups, only two backups are needed for a recovery: one full data backup and one differential backup | With only full and incremental backups, SAP HANA needs the full data backup on which the incrementals are based plus each incremental backup made since that full backup; in some situations, many incremental backups may be needed for a recovery |
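The recovery-chain difference in the table can be stated as a small function: with differentials you need the last full backup plus only the newest differential; with incrementals you need the last full backup plus every incremental since it. An illustrative sketch:

    def recovery_chain(backups, strategy):
        """backups: list of ('full'|'delta', label) in chronological order."""
        last_full = max(i for i, (kind, _) in enumerate(backups) if kind == "full")
        deltas = backups[last_full + 1:]
        if strategy == "differential":
            # Each differential is cumulative: the newest one alone suffices.
            return [backups[last_full]] + deltas[-1:]
        if strategy == "incremental":
            # Incrementals chain: every one since the full backup is required.
            return [backups[last_full]] + deltas
        raise ValueError(strategy)

    history = [("full", "sun"), ("delta", "mon"), ("delta", "tue"), ("delta", "wed")]
    print(recovery_chain(history, "differential"))  # [('full','sun'), ('delta','wed')]
    print(recovery_chain(history, "incremental"))   # full + mon + tue + wed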
Comparison of data backups and data snapshots
| | Data Backup to File | Data Backup Using Backint | Data Snapshot |
| Advantages | Integrity checks at block level; can be encrypted | Integrity checks at block level; integrated into existing data center infrastructure; third-party backup tool offers additional features | Backups are immediately available for recovery; fast; negligible impact on network; can be encrypted |
| Disadvantages | Requires additional storage; generates additional network load; the file system needs to be monitored; more time is needed to make backups available for recovery | Generates additional network load | No integrity checks at block level; no third-party backup integration |
| Backup duration | I/O-bound (reading from data volume, writing to target); network-bound (writing to target file system) | I/O-bound (reading from data volume); network-bound (writing to backup server) | Negligible |
SAP HANA backup and recovery using
IBM Cloud Object Storage
Backup and recovery in a hybrid environment
• Via Backint (agent for Amazon S3) to supported cloud backend storage types:
  • S3 storage offered by AWS
  • IBM Cloud Object Storage
• In HANA Cockpit, specify Backint as the destination type
[Diagram: the SAP HANA DB backs up its data and log volumes through Backint to IBM Cloud Object Storage or AWS S3 storage]
Additional references on HA/DR topics
• SAP HANA Network Requirements - SAP
• Supported HA Scenarios for SAP HANA, SAP S/4HANA, and SAP NetWeaver - Red Hat
• Need SAP HANA Running 24/7? Challenge Accepted! (SUSE) - openSAP
• IBM Power Systems High Availability and Disaster Recovery Updates: Planning for a Multicloud Environment - IBM
Redbooks
• SAP HANA Data Management and Performance on IBM Power Systems - IBM Redbooks
• SAP HANA on IBM Power Systems Backup and Recovery Solutions - IBM Redbooks
• Implementing High Availability and Disaster Recovery Solutions with SAP HANA on IBM Power Systems - IBM
Redbooks
• Implementing IBM VM Recovery Manager for IBM Power Systems - IBM Redbooks
• SAP HANA Agent for IBM VM Recovery Manager - IBM System Technical White Paper
• MrPowerHA - YouTube
No affiliation, endorsement, sponsorship, or approval with or by SAP, Red Hat,
SUSE, or YouTube of this document is implied by the use of the links above
This document was developed for IBM offerings in the United States as of the date of publication. IBM may not make these offerings available in other countries, and the
information is subject to change without notice. Consult your local IBM business contact for information on the IBM offerings available in your area.
Information in this document concerning non-IBM products was obtained from the suppliers of these products or other public sources. Questions on the capabilities of
non-IBM products should be addressed to the suppliers of those products.
IBM may have patents or pending patent applications covering subject matter in this document. The furnishing of this document does not give you any license to these
patents. Send license inquiries, in writing, to IBM Director of Licensing, IBM Corporation, New Castle Drive, Armonk, NY 10504-1785 USA.
All statements regarding IBM future direction and intent are subject to change or withdrawal without notice and represent goals and objectives only.
The information contained in this document has not been submitted to any formal IBM test and is provided "AS IS" with no warranties or guarantees either expressed or
implied.
All examples cited or described in this document are presented as illustrations of the manner in which some IBM products can be used and the results that may be
achieved. Actual environmental costs and performance characteristics will vary depending on individual client configurations and conditions.
IBM is not responsible for printing errors in this document that result in pricing or information inaccuracies. IBM hardware products are manufactured from new parts, or
new and serviceable used parts. Regardless, our warranty terms apply.
Any performance data contained in this document was determined in a controlled environment. Actual results may vary significantly and are dependent on many factors
including system hardware configuration and software design and configuration. Some measurements quoted in this document may have been made on development-
level systems. There is no guarantee these measurements will be the same on generally-available systems. Some measurements quoted in this document may have
been estimated through extrapolation. Users of this document should verify the applicable data for their specific environment.
SAP, SAP R/3, SAP NetWeaver, ABAP, SAP HANA, SAP S/4HANA, SAP Fiori, and other SAP products and services mentioned herein as well as their respective logos are
trademarks or registered trademarks of SAP SE in Germany and other countries. See www.sap.com/trademark for additional trademark information and notices.
IBM, the IBM logo, ibm.com, AIX, Power, POWER, POWER7, POWER8, POWER9, Power Architecture, PowerHA, PowerPC, PowerVM, RACF, Redbooks, Storwize,
System i5, and System p5 are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. A full list of U.S.
trademarks owned by IBM may be found at: http://www.ibm.com/legal/copytrade.shtml.
For all of the following trademarks, no affiliation, endorsement, sponsorship, or approval of this document with or by the trademark owner is implied by the use of these
marks:
• Linux is the registered trademark of Linus Torvalds in the U.S. and other countries.
• Adobe, the Adobe logo, Acrobat, PostScript, and Reader are either trademarks or registered trademarks of Adobe Systems Incorporated in the United States and/or
other countries.
• “Microsoft”, “Azure” are trademarks of the Microsoft group of companies.
• Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.
• UNIX is a registered trademark of The Open Group in the United States, other countries or both.
• The Power Architecture and Power.org wordmarks and the Power and Power.org logos and related marks are trademarks and service marks licensed by Power.org.
• Red Hat, Open Shift, Red Hat® Ansible® Automation Platform and Ansible are registered trademarks of Red Hat, Inc. in the U.S. and other countries.
• SUSE and the SUSE logo are trademarks of SUSE LLC or its subsidiaries or affiliates in the U.S. and other countries.
• APACHE HADOOP® SOFTWARE and APACHE CAMEL™ SOFTWARE are registered trademark of The Apache Software Foundation in the United States, European
Union, and/or other countries.
• TWITTER, TWEET, RETWEET and the Twitter Bird logo are trademarks of Twitter Inc. or its affiliates.
• “Amazon”, “Amazon Web Services” and “aws” are trademarks of Amazon.com, Inc. in the United States and/or other countries.
• “Google” and “Google Cloud Platform” are trademarks of Google LLC in the United States and/or other countries.
• “Alibaba cloud” is a trademark of Alibaba Group Holding Limited in China and/or other countries.
• Other company, product and service names may be trademarks or service marks of others.