Audience
Getting Space Pirate Trainer* to Perform on Intel® Graphics
Dirk Van Welden – Founder of I-Illusions
Cristiano Ferreira – Intel Developer Relations Engineer
Seth Schneider – Intel® GPA Product Owner
Agenda
▪ Getting cozy with Space Pirate Trainer*
▪ Why target mainstream VR?
▪ Dive into optimizations found with Intel® GPA
▪ Questions?
What is Space Pirate Trainer*?
spacepiratetrainer.com
A Few Facts
● Launch title for HTC* Vive, Oculus* Touch and Microsoft* Mixed Reality
● Early access title since April 2016
● 1.0 since October 2017
● 150,000+ units sold
● Used worldwide in VR arcades and as a demo experience
“I’m never going to create a VR game”
● Rift backer
● Valve / Vive kit
● Here’s a VR demo
Evolution
Pre-beta to 1.0
While keeping min spec of NVIDIA* GTX 970
Why Mainstream?
● Initial MSMR port
○ Together with Microsoft*
○ Up & running in 3 days
○ Same min. spec of 970
● Mainstream audience demo feedback
○ “Cool, can I run it on my ultrabook?”
○ Affordable HMD <-> Expensive PC/Laptop
● Huge market opportunity, but at the time, limited quality content
Let’s do it! … But
● Initial mainstream version (Intel® Core™ i5 processor family, Intel® HD Graphics 620, Intel® NUC)
○ 12 FPS
● No point-lights, no post-effects
○ Almost 30 FPS, ugly
● “I seriously doubt this, but let’s try”
Before and After…
Before: 12 FPS, constant hitches, fantastic visual quality
After: 60 FPS, no hitches, good visual quality
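To put the 12 FPS and 60 FPS figures in terms of a per-frame time budget, the conversion can be worked out as below. This is only an arithmetic sketch; the function name `frame_time_ms` is made up for illustration and does not come from the talk.

```python
def frame_time_ms(fps: float) -> float:
    """Return the time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


# Frame rates quoted on the slide; the labels are just for printing.
for label, fps in [("before", 12), ("after", 60)]:
    print(f"{label}: {fps} FPS = {frame_time_ms(fps):.1f} ms per frame")
# before: 12 FPS = 83.3 ms per frame
# after: 60 FPS = 16.7 ms per frame
```

Read this way, the millisecond savings reported on the later slides are being taken out of a budget of roughly 16.7 ms per frame at the 60 FPS target.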
Tools and Hardware
Development Configuration
Today’s Configuration
Microsoft* Surface Pro 1796
Surface Compatibility: https://goo.gl/YAuQ3X
What’s Inside Intel® Graphics Performance Analyzer (Intel® GPA)?
● System Analyzer / HUD: in-game analysis
● Graphics Frame Analyzer: single frame analysis
● Graphics Trace Analyzer: timeline analysis
● Graphics Monitor: launch & config tool
Performance Feature Focused
● Hotspot Analysis: identifies most expensive sets of events grouped by state and/or bottleneck
● Metrics Analysis: identifies exact hardware bottleneck
● Playback Experiments: test performance optimizations and quantify improvements
Optimizations: Shaders and Materials
Shaders – Floor
Before (Standard Shader): 267 instructions, ~1.5ms GPU duration
After (Lambert Shader): 47 instructions, ~0.3ms GPU duration
(5x Performance Improvement)
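The 5x figure follows directly from the two GPU durations on the slide. As a quick check, the sketch below computes both the duration ratio and the instruction-count ratio; `speedup` is an illustrative helper name, not something from the talk.

```python
def speedup(before: float, after: float) -> float:
    """Ratio of a 'before' cost to an 'after' cost; values above 1 mean faster."""
    return before / after


# Numbers taken from the floor shader slide.
duration_gain = speedup(1.5, 0.3)      # GPU duration in ms
instruction_gain = speedup(267, 47)    # shader instruction count

print(f"GPU duration: {duration_gain:.1f}x")        # GPU duration: 5.0x
print(f"Instruction count: {instruction_gain:.1f}x")  # Instruction count: 5.7x
```

The two ratios are close but not identical, which is expected: instruction count is only a proxy for shader cost, and the measured GPU duration is the number that matters for the frame budget.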
Shaders – Floor Visual Quality
Before (Standard Shader) After (Lambert Shader)
Shaders – Microsoft* Windows Mixed Reality Toolkit
▪ Microsoft* has recreated all of Unity’s* built-in shaders to contain significantly fewer math ops
▪ Quick action:
– Download the kit: https://github.com/Microsoft/MixedRealityToolkit-Unity
– When detecting mainstream WinMR, swap materials to the ones contained in the kit
Shaders - Material Batching With Unlit Shaders
…To the Frame Analyzer!
No Batching or Instancing
• # of Draws: 1300
• # of Vertices: 1.5M
• GPU Duration: 3.5ms
Batching and Instancing
• # of Draws: 8
• # of Vertices: 2M
• GPU Duration: 1.7ms
(2x Performance Improvement)
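The drop in draw calls comes from objects that share a material being submitted together. The following is a simplified model of that idea, not Unity's actual batching logic (which has further constraints such as vertex limits). The scene contents and material names are hypothetical.

```python
from collections import defaultdict


def count_draws(objects, batch_by_material: bool) -> int:
    """Count draw calls for a list of (object_name, material_name) pairs.

    Without batching every object is its own draw call; with batching,
    all objects that share a material are merged into one call.
    """
    if not batch_by_material:
        return len(objects)
    batches = defaultdict(list)
    for name, material in objects:
        batches[material].append(name)
    return len(batches)


# Hypothetical scene: six panels sharing one unlit material, two droids sharing another.
scene = [(f"panel_{i}", "unlit_atlas") for i in range(6)]
scene += [("droid_0", "droid_unlit"), ("droid_1", "droid_unlit")]

print(count_draws(scene, batch_by_material=False))  # 8
print(count_draws(scene, batch_by_material=True))   # 2
```

The model also hints at why the vertex count on the slide went up while the GPU duration went down: merging geometry can submit more vertices in total, but far fewer state changes and draw submissions.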
Shader LODs - Droid Lazers
Optimizations: Lighting and Post Effects
Remove Dynamic Lights
• Dynamic lights need to render multiple passes, one for each light, contributing 5ms of frame time!
• Events 83 and 84 are the base pass, then 123-130 are the additional passes for each dynamic light
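The growth in passes can be modelled with simple arithmetic, assuming a forward renderer in which each lit object is drawn once in a base pass and once more per dynamic light. The split into two objects and four lights below is an assumption chosen only because it reproduces the event counts on the slide (two base-pass events, eight additional-pass events). The actual scene setup is not stated in the source.

```python
def forward_pass_draws(num_objects: int, dynamic_lights: int) -> dict:
    """Estimate draw events under a one-additive-pass-per-light forward model."""
    base = num_objects                       # one base pass per object
    additive = num_objects * dynamic_lights  # one extra pass per object per light
    return {"base": base, "additive": additive, "total": base + additive}


# Assumed split: 2 lit objects, 4 dynamic lights (hypothetical, see lead-in).
print(forward_pass_draws(2, 4))  # {'base': 2, 'additive': 8, 'total': 10}
print(forward_pass_draws(2, 0))  # {'base': 2, 'additive': 0, 'total': 2}
```

Under this model, removing the dynamic lights removes all additive passes, which is consistent with the frame-time saving the slide attributes to this change.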
Post Processing Stacks - Bloom
…To the Frame Analyzer! Again!
Low Settings
• Ended up using the Mobile Bloom PFX Stack
• Consolidated all PFX into one pass
• GPU Duration: 0.6ms
(4x Performance Improvement)
High Settings
• Initially reduced number of passes to 14
• GPU Duration: 2.6ms
Post Processing - HDR – Vertical Flip
▪ There is a required vertical flip step that happens when using HDR effects
▪ The way to avoid the 0.3ms / frame penalty is to uncheck all HDR boxes on scene cameras and to remove image effects that may require it (usually tonemapping, bloom, etc.)
▪ If using post effects, use the post effects stack and not image effects
▪ Removed any effects with a depth pass (fog, etc.)
Post Processing - Antialiasing
Post Processing - TSCMAA
▪ Temporally stable post-effect antialiasing techniques like CMAA can provide equivalent functionality at half the cost
▪ If necessary, use Temporally Stable CMAA (TSCMAA) – good if rendering at less than 1280x1280 and upscaling
Performance: 1.5x performance improvement with TSCMAA over 4x MSAA
(Comparison images: 4x MSAA and TSCMAA)
Optimizations: CPU Performance and Power Sharing
Raycasting CPU Side Improvement For the Lasers
Optimizations: General Best Practices
Render at a Lower Res & Upscale
Render at Low Res & Upscale - Code Sample
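The code sample shown on this slide did not survive in the transcript, and the block below is not a reconstruction of it. It is only an arithmetic sketch of why a lower render resolution helps: the number of shaded pixels scales with the square of the resolution scale. The 1440x1440 per-eye base resolution and the 0.7 scale are assumed values for illustration.

```python
def scaled_resolution(width: int, height: int, scale: float) -> tuple:
    """Return the render-target size after applying a per-axis resolution scale."""
    return round(width * scale), round(height * scale)


def pixel_fraction(scale: float) -> float:
    """Fraction of the original pixel count that remains at a given scale."""
    return scale * scale


# Assumed per-eye base resolution and scale; neither value comes from the talk.
base_w, base_h, scale = 1440, 1440, 0.7

w, h = scaled_resolution(base_w, base_h, scale)
print(f"{w}x{h}")                                  # 1008x1008
print(f"{pixel_fraction(scale):.0%} of the pixels")  # 49% of the pixels
```

Because the cost falls quadratically, even a modest scale reduction roughly halves the pixel shading work, leaving the upscale step as the remaining cost.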
RenderQueue Order for WinMR
1. Draw VR hands and any interactables (weapons, etc.)
2. Draw scene dressings, dynamic / small static objects
3. Draw large set pieces (buildings, ship, etc.)
4. Draw the floor
5. Draw skybox (usually already done last if using built-in Unity skybox)
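The ordering above draws near objects first so that the depth test can reject hidden pixels of later, larger objects before they are shaded. The toy model below illustrates that effect on a single row of 100 pixels. The coverage ranges and depths are invented for the example, and it assumes the hardware skips shading for fragments that fail the depth test.

```python
def shaded_fragments(draws, width: int) -> int:
    """Count fragments that pass a 'closer than current' depth test, in draw order."""
    depth = [float("inf")] * width
    shaded = 0
    for start, end, z in draws:
        for x in range(start, end):
            if z < depth[x]:  # fragment is visible so far: shade it and record depth
                depth[x] = z
                shaded += 1
    return shaded


# (first_pixel, last_pixel_exclusive, depth); smaller depth means closer. All hypothetical.
hands = (40, 60, 1.0)
props = (20, 80, 5.0)
set_pieces = (10, 90, 20.0)
floor = (0, 100, 50.0)
skybox = (0, 100, 1000.0)

front_to_back = [hands, props, set_pieces, floor, skybox]
back_to_front = list(reversed(front_to_back))

print(shaded_fragments(front_to_back, 100))  # 100
print(shaded_fragments(back_to_front, 100))  # 360
```

In this model the front-to-back order shades each pixel exactly once, while the reverse order shades 3.6 times as many fragments for the same final image.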
…To the Frame Analyzer! One more time!
Check RenderQueue Order with Intel® GPA
How to Change RenderQueue Order in Unity*
Skybox Compression
Low Settings
• 1K texture
• GPU Duration: 0.2ms
(5x Performance Improvement)
High Settings
• 2K texture (originally uncompressed at 4K)
• GPU Duration: 1.1ms
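To get a feel for how much data each skybox option involves, the sketch below estimates texture size from resolution and bits per pixel. The slide does not state the texture formats or layout, so the values here are assumptions: 32 bits per pixel for the uncompressed case, 4 bits per pixel for a block-compressed format such as BC1/DXT1, a single square 2D texture, and no mipmaps.

```python
MIB = 1024 * 1024


def texture_bytes(size_px: int, bits_per_pixel: int, faces: int = 1) -> int:
    """Size in bytes of a square texture without mipmaps."""
    return size_px * size_px * bits_per_pixel // 8 * faces


# Formats are assumed (see lead-in); only the 4K/2K/1K sizes come from the slide.
options = [
    ("4K uncompressed (32 bpp)", 4096, 32),
    ("2K compressed (4 bpp)", 2048, 4),
    ("1K compressed (4 bpp)", 1024, 4),
]
for label, size, bpp in options:
    print(f"{label}: {texture_bytes(size, bpp) / MIB:.1f} MiB")
# 4K uncompressed (32 bpp): 64.0 MiB
# 2K compressed (4 bpp): 2.0 MiB
# 1K compressed (4 bpp): 0.5 MiB
```

Under these assumptions the skybox shrinks by well over an order of magnitude, which reduces the memory bandwidth needed to sample it on every frame.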
Summary
Results – 4x Faster!
● 12 FPS on default settings
● Without lights and PFX, about 30 FPS
● 60 FPS (low) and 35 FPS (high)
● Better performance on all platforms!
Questions?
Getting Started with WinMR Optimization
Helpful Links:
- Perf Recommendations for immersive headset apps: https://goo.gl/4V4kpr
- Porting Guides: https://goo.gl/QTbWYp
- Enthusiast’s Guide: https://goo.gl/gKZE2w
- Development Hardware: https://goo.gl/gNG5oK
Tools:
- Download Intel® GPA for FREE at software.intel.com/gpa
Tech:
- TSCMAA Article and Sample: https://goo.gl/6FFnKp
Legal Disclaimers and Optimization Notices
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as
any warranty arising from course of performance, course of dealing, or usage in trade.
You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel a
non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein.
The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are
available on request.
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system
configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at [intel.com].
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are
measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult
other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other
products. For more complete information visit www.intel.com/benchmarks.
Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These
optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on
microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to
Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your
system hardware, software or configuration may affect your actual performance.
Intel, Core and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others
© Intel Corporation.