WINDOWS SERVER “10”:
WHAT'S NEW IN CLUSTERING
A. Kibkalo
CLUSTER ROLLING UPGRADES
Decrease Time to Value
Ability to upgrade a cloud platform OS to WS.vNext
without interrupting cluster workloads, adopting new
capabilities with no downtime or SLA penalties.
WHAT IS THE CLUSTER OS ROLLING UPGRADE PROCESS?
Scenario:
• Start with a Hyper-V cluster with 7 WS2012R2 nodes and 15 VMs
• Support both storage topologies for Hyper-V
• Disaggregated – File Based Storage with SMB (Scale-out File Server)
• Converged – Local block storage and Cluster Shared Volumes
FOR EACH NODE IN CLUSTER:
STEP 1: PAUSE NODE | DRAIN ROLES
Node is paused and gracefully drained of all running virtual machines
VMs are live migrated to other nodes – with no downtime
STEP 2: EVICT A NODE, CLEAN INSTALL NEW OS
WS.vNext is installed
OS is wiped and a clean install of WS.vNext is done
STEP 3: REJOIN NODE TO CLUSTER
Done from WS.vNext or Windows 10 with RSAT
Node is added back to cluster
Cluster runs with “Mixed OS versions”
Cluster Functional Level stays WS2012R2
Enhancements of WS.vNext node will operate in compatibility mode
New features which impact downlevel compatibility on WS.vNext
node will not be enabled
STEP 4: REBALANCE THE CLUSTER WORKLOAD
VMs can fail over and live migrate anywhere in mixed mode
Uplevel to WS.vNext node
Downlevel to WS2012R2 node
STEP 5: REPEAT STEPS 1-4 FOR THE NEXT NODE
Process is repeated on the next node
UIs can manage downlevel nodes
Entire cluster is manageable from WS.vNext node
WS.vNext can manage downlevel WS2012R2 nodes
Cannot manage uplevel (WS.vNext) nodes from WS2012R2 node
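Steps 1 to 5 can be sketched as a small simulation. This is only an illustrative model, not cluster tooling: the Node class, the helper functions and the least-loaded placement rule are assumptions made for the example, and the 7-node, 15-VM cluster mirrors the scenario slide.

```python
from dataclasses import dataclass, field

OLD, NEW = "WS2012R2", "WS.vNext"


@dataclass
class Node:
    # Hypothetical stand-in for a cluster node; not a real API object.
    name: str
    os: str = OLD
    paused: bool = False
    vms: list = field(default_factory=list)


def build_cluster(node_count=7, vm_count=15):
    """Create the scenario cluster with VMs spread round-robin."""
    cluster = [Node(f"node{i + 1}") for i in range(node_count)]
    for v in range(vm_count):
        cluster[v % node_count].vms.append(f"vm{v + 1}")
    return cluster


def drain(node, cluster):
    """Step 1: pause the node and move its VMs to the other active nodes."""
    node.paused = True
    targets = [n for n in cluster if n is not node and not n.paused]
    if not targets:
        raise RuntimeError("no active node left to host the VMs")
    while node.vms:
        # Assumed placement rule: pick the least-loaded target each time.
        min(targets, key=lambda n: len(n.vms)).vms.append(node.vms.pop())


def rebalance(cluster):
    """Step 4: move VMs until node loads differ by at most one."""
    while True:
        busiest = max(cluster, key=lambda n: len(n.vms))
        idlest = min(cluster, key=lambda n: len(n.vms))
        if len(busiest.vms) - len(idlest.vms) <= 1:
            return
        idlest.vms.append(busiest.vms.pop())


def rolling_upgrade(cluster):
    """Step 5: repeat steps 1-4 per node; returns the OS mix after each pass."""
    history = []
    for node in list(cluster):      # iterate over a snapshot of the membership
        drain(node, cluster)        # Step 1: pause and drain
        cluster.remove(node)        # Step 2: evict ...
        node.os = NEW               # ... and clean install the new OS
        node.paused = False
        cluster.append(node)        # Step 3: rejoin (cluster is now mixed)
        rebalance(cluster)          # Step 4: rebalance the workload
        history.append(sorted(n.os for n in cluster))
    return history


if __name__ == "__main__":
    cluster = build_cluster()
    for os_mix in rolling_upgrade(cluster):
        print(os_mix.count(NEW), "of", len(os_mix), "nodes upgraded")
```

In this model no VM is ever left without a host, which is the property the slides describe as "no downtime"; a real cluster also has to satisfy capacity and quorum constraints that the sketch ignores.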
UPGRADE ALL NODES
Process continues until all nodes are upgraded to WS.vNext
UPGRADE CLUSTER FUNCTIONAL LEVEL
Once all nodes are upgraded to WS.vNext, the Cluster Functional
Level is upgraded via the Update-ClusterFunctionalLevel cmdlet
Cluster Functional Level considerations:
Cannot be upgraded until all nodes are running WS.vNext
Point of no return – no WS2012R2 nodes can be added after
All vNext features disabled by compatibility mode are unlocked
Some features require additional steps to be done:
Spaces require Update-StoragePool cmdlet to upgrade the pool
VMs require Update-VMConfigurationVersion (VM is off) to unlock
some new features, like vTPM
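The functional level rules above behave like a one-way gate. A minimal sketch follows, assuming a hypothetical Cluster class that tracks only node OS versions; the method names are illustrative and are not the real cmdlets.

```python
OLD, NEW = "WS2012R2", "WS.vNext"


class Cluster:
    """Hypothetical model of the functional level gate (not a real API)."""

    def __init__(self, node_os_versions):
        self.nodes = list(node_os_versions)
        self.functional_level = OLD

    def update_functional_level(self):
        # Cannot be upgraded until all nodes are running WS.vNext.
        if any(os != NEW for os in self.nodes):
            raise RuntimeError("downlevel nodes are still in the cluster")
        self.functional_level = NEW

    def add_node(self, os):
        # Point of no return: no WS2012R2 node can join after the upgrade.
        if self.functional_level == NEW and os != NEW:
            raise RuntimeError("cluster functional level is already vNext")
        self.nodes.append(os)

    def can_update_vm_configuration(self, vm_running):
        # The VM must be off and the functional level already upgraded.
        return self.functional_level == NEW and not vm_running


if __name__ == "__main__":
    mixed = Cluster([NEW, NEW, OLD])
    try:
        mixed.update_functional_level()
    except RuntimeError as err:
        print("refused:", err)
```

Modelling the level as state on the cluster, rather than deriving it from the nodes, reflects the point that upgrading the last node does not raise the level by itself; an explicit step is still required.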
UPGRADE COMPLETE
Private Cloud Upgrade:
All nodes are running WS.vNext
Cluster Functional Level is vNext
No downtime to tenant VMs
Lower cost of adopting vNext
vNext features for the cluster:
Storage Replication
Cloud Witness in Azure
VM Resiliency
Node isolation
Quarantine
CLUSTER OS ROLLING UPGRADE GUIDANCE
Running in Mixed mode for more than two weeks is not recommended
Do not create or resize storage on WS.vNext nodes while in Mixed mode
Test an upgrade from WS2012R2 to Technical Preview now
Not supported in Technical Preview release:
Cluster OS Rolling Upgrade of cluster with data de-duplication
Cluster OS Rolling Upgrade of VMs with SCDPM backups
Cluster OS Rolling Upgrade of Shared VHDX guest clusters
HYPER-V CONFIGURATION VERSIONING
What is Configuration version?
Configuration files
Saved State and Snapshot Files
Which OS supports which configuration versions?
INTERACTING WITH HYPER-V CONFIGURATION VERSION
PowerShell: Update-VMConfigurationVersion cmdlet
UI will come in future builds
Virtual Machine must be off
Saved state and online checkpoints are discarded
Cluster functional level must be upgraded
STORAGE REPLICA
TECHNICAL PREVIEW SCENARIOS
Stretch Cluster
Server to Server
BLOCKS, NOT FILES
SR is not DFSR
Replicating storage blocks underneath the volume
Don’t care if files are in use
Write IOs are all that matter for Storage Replica
DRIVER LAYERING
SYNCHRONOUS WORKFLOW
ASYNCHRONOUS WORKFLOW
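A minimal sketch of the usual difference between the two workflows, assuming the common definition: a synchronous write is acknowledged to the application only after the destination has logged it, while an asynchronous write is acknowledged after the source log write and shipped later. The Replica class and its methods are hypothetical and model block writes only, not the actual driver.

```python
class Replica:
    """Toy model of block-level replication; not the Storage Replica driver."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.source_log = []        # blocks logged at the source site
        self.destination_log = []   # blocks logged at the destination site
        self.in_flight = []         # async only: acknowledged, not yet shipped

    def write(self, offset, data):
        """Return once the application would see the write acknowledged."""
        block = (offset, data)
        self.source_log.append(block)
        if self.synchronous:
            # Wait for the destination log write before acknowledging.
            self.destination_log.append(block)
        else:
            # Acknowledge immediately; replication happens afterwards.
            self.in_flight.append(block)
        return "ack"

    def ship(self):
        """Async only: deliver the pending blocks to the destination."""
        self.destination_log.extend(self.in_flight)
        self.in_flight.clear()

    def lost_if_source_fails(self):
        """Acknowledged writes that the destination does not have yet."""
        return [b for b in self.source_log if b not in self.destination_log]


if __name__ == "__main__":
    for mode in (True, False):
        replica = Replica(synchronous=mode)
        replica.write(0, b"a")
        replica.write(4096, b"b")
        print("sync" if mode else "async", replica.lost_if_source_fails())
```

The model works on (offset, data) pairs and never looks at files, which matches the "blocks, not files" point: whether a file is open or in use does not enter into it.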
REQUIREMENTS
Kerberos (for SMB)
>= 1Gbps between servers
Disks:
GPT, not MBR
Yes: JBOD, iSCSI, Local SCSI or SATA, SAN
No: USB, thumb drives, tapes, floppy disks, etc
Same disk geometry
Free space for logs on Windows volume
No %systemroot% or page file
Firewall: SMB and WS-MAN
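The disk and network requirements can be read as a pre-flight checklist. The sketch below assumes a hypothetical Disk record filled in by hand; it does not query Windows, and treating "same disk geometry" as "same sector size" is a simplification made for the example.

```python
from dataclasses import dataclass

# Bus types listed as acceptable on the slide; anything else is rejected.
ALLOWED_BUS_TYPES = {"JBOD", "iSCSI", "SCSI", "SATA", "SAN"}


@dataclass
class Disk:
    # Hypothetical description of a data disk, entered manually.
    partition_style: str            # "GPT" or "MBR"
    bus_type: str
    sector_size: int                # assumed proxy for disk geometry
    hosts_system_root: bool = False
    hosts_page_file: bool = False


def check_pair(source, destination, link_gbps):
    """Return a list of requirement violations (empty when all checks pass)."""
    problems = []
    if link_gbps < 1:
        problems.append("network link is slower than 1 Gbps")
    for role, disk in (("source", source), ("destination", destination)):
        if disk.partition_style != "GPT":
            problems.append(f"{role} disk is not GPT")
        if disk.bus_type not in ALLOWED_BUS_TYPES:
            problems.append(f"{role} disk uses unsupported bus {disk.bus_type}")
        if disk.hosts_system_root or disk.hosts_page_file:
            problems.append(f"{role} disk holds %systemroot% or a page file")
    if source.sector_size != destination.sector_size:
        problems.append("disks do not share the same geometry")
    return problems


if __name__ == "__main__":
    good = Disk("GPT", "SAN", 4096)
    bad = Disk("MBR", "USB", 512)
    print(check_pair(good, bad, link_gbps=10))
```

Kerberos, log space and firewall rules are left out because they are properties of the servers rather than of the disk pair.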
RECOMMENDATIONS FOR SYNCHRONOUS
Latency:
<=5ms round trip average
Most cases 30-50km on 10GBASE or dark fibre
Write IO:
Perfmon (logical disk) and DISKSPD
Use micro benchmarks before and after SR
Log sizing and backing
SSD or bust
Larger logs allow faster recovery from larger outages, but cost space
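Two back-of-the-envelope calculations help put these recommendations in context. Both rest on assumptions that are not in the deck: a refractive index of about 1.47 for the fibre, and a log that must hold every write issued during an outage. The function names are illustrative.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458
FIBRE_REFRACTIVE_INDEX = 1.47   # assumption: typical silica fibre


def fibre_round_trip_ms(distance_km):
    """Propagation delay only; ignores switches, routers and storage."""
    one_way_s = distance_km * FIBRE_REFRACTIVE_INDEX / SPEED_OF_LIGHT_KM_PER_S
    return 2 * one_way_s * 1000


def log_size_gb(write_mb_per_s, outage_minutes):
    """Log space needed to buffer all writes issued during an outage."""
    return write_mb_per_s * outage_minutes * 60 / 1024


if __name__ == "__main__":
    for km in (30, 50):
        print(f"{km} km: {fibre_round_trip_ms(km):.2f} ms round trip")
    print(f"{log_size_gb(100, 30):.1f} GB for 30 minutes at 100 MB/s")
```

Under these assumptions, propagation over 50 km accounts for roughly half a millisecond of the 5 ms budget, so the remainder is available to network equipment and to the log writes themselves. The write rate for the log estimate should come from the Perfmon or DISKSPD measurements mentioned above.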
RECOMMENDATIONS FOR ASYNCHRONOUS
Latency:
Doesn’t matter
Write IO:
Perfmon (logical disk) and DISKSPD
Use micro benchmarks before and after SR
Log sizing and backing
SSD or bust
Larger logs allow faster recovery from larger outages, but cost
space
PHILOSOPHY
Async crash consistency versus application consistency
SR guarantees mountable volume
App guarantees a readable file
Snapshots…
Are not working in Technical Preview
Async means some data loss
No RPO in Technical Preview
How much money is your data worth?
GOOD IDEAS TO FOLLOW
Drivers, drivers and… drivers
Filters!
Performance envelopes
WHAT ARE THE OPTIONS?
WHAT ISN’T STORAGE REPLICA
Storage Replica is not “shared nothing” clustering
Storage Replica is not a backup
Will easily replicate deleted/damaged data
Storage Replica is not DFS-R
No file-level
No multi-master
No multi-endpoint
No low bandwidth
Therefore, Storage Replica is not a general branch office solution
TOPOLOGY
Stretch cluster
Server-to-server
No 1-to-many
No a->b->c
No a->b->c->d
Cluster-to-cluster is coming
Not working in Technical Preview
Fun fact: you can set up Storage Replica to replicate server to itself
Clone disks, or even volumes
Create a local mirror for future remote initial sync as seed
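The topology limits can be expressed as a check over a list of (source, destination) server pairs. This is a sketch at the server level only; the function name and the problem strings are illustrative, and a server replicating to itself is accepted, as in the "fun fact" above.

```python
def validate_topology(partnerships):
    """Return the topology rules violated by a list of (source, destination)."""
    problems = []
    targets = {}
    for src, dst in partnerships:
        if src != dst:                      # server-to-itself is allowed
            targets.setdefault(src, set()).add(dst)
    for src, dsts in sorted(targets.items()):
        if len(dsts) > 1:
            problems.append(f"one-to-many: {src} -> {sorted(dsts)}")
        for dst in sorted(dsts):
            # A destination that replicates onward to a third server is a chain.
            onward = targets.get(dst, set()) - {src}
            if onward:
                problems.append(f"chain: {src} -> {dst} -> {sorted(onward)}")
    return problems


if __name__ == "__main__":
    print(validate_topology([("a", "b")]))
    print(validate_topology([("a", "b"), ("b", "c"), ("c", "d")]))
```

The second call reports two chain violations, one for each hop beyond the first.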
STRETCH CLUSTERS
Synchronous only
Asymmetric storage only
Two sites
Two sets of shared storage
Cluster storage: CSV or role assigned
Configure and manage via Failover Cluster Manager or PowerShell
Designed to increase the DR capabilities of a cluster
Hyper-V and General Use File Server are main use cases in TP
Scale-out File Server is not supported as stretched solution with SR
FAMILIAR FAILOVER CLUSTER MANAGER UI
CONFIGURE AND MANAGE
1. Add a source data disk to a role or CSV
2. Enable replication on that source disk
3. Select a source log
4. Select a destination data disk
5. Select a destination log
All these steps (except 1) will change in the Consumer Preview UI
SERVER TO SERVER
Synchronous or Asynchronous
Any type of fixed storage (same rules as previously)
Configure and manage via PowerShell (no UI)
Designed to increase DR capabilities of a server
Cluster to cluster is coming
File Server is the main use case in Technical Preview
WINDOWS POWERSHELL
Module WVR
Get-SRGroup
Get-SRPartnership
New-SRPartnership (Create)
Remove-SRGroup
Remove-SRPartnership
Set-SRGroup
Set-SRPartnership (Change direction)
Suspend-SRGroup
Sync-SRGroup
NON-CLUSTERED ARCHITECTURE
CLUSTERED ARCHITECTURE
SMB 3.1.1 TRANSPORT
Uses the scalability and performance built into SMB3
Multichannel
SMB Direct (RDMA)
Encryption and signing
REPLICATION METADATA
Hidden disk partition
Hidden logs
Always write through disk
Placed inside System Volume Information
Registry (real and cluster hive)
HKLM\Software\Microsoft\WVR
HKLM\Cluster\WVR
SUPPORTABILITY
Performance Counters
Dozens of counters
Many changes done after TP already
Event logs
Hundreds of clear, guided, low-noise events
Many changes done after TP already
KNOWN ISSUES IN TECHNICAL PREVIEW
Removal of replication in Failover Cluster Manager doesn’t work
PowerShell remoting doesn’t work
Performance
Failover Cluster Manager UI
The name... WVR
PLANS FOR CHANGES
Azure Site Recovery integration
Cluster-to-cluster
Ease of management
Performance
Migration
Inventory
GET THE GUIDE
Technical Preview Step-by-Step Guide: Storage Replica:
http://go.microsoft.com/fwlink/?LinkID=514902
