Improving performance using .NET Core 3.0
USING THE GREAT WORK OF OTHERS TO MAKE US LOOK AWESOME
Awesome .NET Performance
https://github.com/adamsitnik/awesome-dot-net-performance
Framework Improvements
WHAT HAS HAPPENED IN .NET CORE THAT HELPS PERFORMANCE?
Reduced memory allocations
Less time spent in GC collections and less overall GC pressure
Less time allocating and deallocating objects means more CPU for you
Across the framework, lots of small improvements over many classes.
Span<T>
C# 7.2
https://apisof.net/ & https://source.dot.net
Span<T>
https://adamsitnik.com/Span/
Stack-only type: a ref struct that can never live on the heap (use Memory<T> for the heap)
Can’t use it as a field in a class (since a class is on the heap) but can use it in a ref struct.
Can’t do async/await with it (since the compiler creates a state machine… on the heap)
Substring Comparison
someText.Substring(startIndex: someText.Length / 2);
someText.AsSpan().Slice(start: someText.Length / 2);
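A minimal sketch of the difference between the two calls above. The class and method names (SliceDemo, SecondHalfCopy, SecondHalfView, ParseAfter) are hypothetical and only for illustration. Substring copies characters into a new string, while the span is a view over the original string's memory.

```csharp
using System;

public static class SliceDemo
{
    // Allocates a new string that holds a copy of the second half.
    public static string SecondHalfCopy(string someText) =>
        someText.Substring(startIndex: someText.Length / 2);

    // Returns a view over the same characters; nothing is copied.
    public static ReadOnlySpan<char> SecondHalfView(string someText) =>
        someText.AsSpan().Slice(start: someText.Length / 2);

    // Parses the digits after the separator without creating a substring.
    public static int ParseAfter(string text, char separator)
    {
        int index = text.IndexOf(separator);
        return int.Parse(text.AsSpan(index + 1));
    }
}
```

The span overload of int.Parse is one of the span-accepting methods added in .NET Core 2.1.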
Memory<T>
Has a .Span property that you can use to get a Span<T> inside a method
Create it from an array or from something implementing IMemoryOwner<T>; a string gives a ReadOnlyMemory<char> via AsMemory().
Lots of methods in .NET Core 2.1+ take Spans as arguments.
Many more do so in .NET Core 3.0 (.NET Standard 2.1)
https://apisof.net/
Base64.EncodeToUtf8(ReadOnlySpan<Byte>,Span<Byte>,Int32,Int32,Boolean)
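A short sketch of both ideas on this slide, assuming hypothetical names (MemoryDemo, Sum, ToBase64). Memory<T> is what gets passed around or stored; the method that does the work takes .Span. The Base64 call uses the span-based overload listed above.

```csharp
using System;
using System.Buffers;
using System.Buffers.Text;
using System.Text;

public static class MemoryDemo
{
    // ReadOnlyMemory<T> can sit in a field or cross an await; the span is taken locally.
    public static int Sum(ReadOnlyMemory<byte> data)
    {
        ReadOnlySpan<byte> span = data.Span;
        int total = 0;
        foreach (byte b in span) total += b;
        return total;
    }

    // Encodes bytes to Base64 as UTF-8 text using the span-based API.
    public static string ToBase64(byte[] input)
    {
        // Worst-case output size for this input length.
        byte[] utf8 = new byte[Base64.GetMaxEncodedToUtf8Length(input.Length)];
        OperationStatus status = Base64.EncodeToUtf8(input, utf8, out _, out int written);
        if (status != OperationStatus.Done)
            throw new InvalidOperationException(status.ToString());
        return Encoding.ASCII.GetString(utf8, 0, written);
    }
}
```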
System.Buffers.ArrayPool
Object pooling pattern - https://www.codeproject.com/articles/20848/c-object-pooling
In .NET Core (System.Buffers) - https://adamsitnik.com/Array-Pool/
var samePool = ArrayPool<byte>.Shared;
byte[] buffer = samePool.Rent(minLength);
try {
Use(buffer);
} finally {
samePool.Return(buffer);
}
Cheaper as soon as you need 1K of memory (or more) – and no allocations required.
String interning
https://taagung.com/string-interning/
https://docs.microsoft.com/en-us/dotnet/api/system.string.intern?view=netframework-4.7.2
The compiler puts all hardcoded strings in an assembly into an “intern pool”, and references point to them to avoid duplication.
String.Intern() applies the same concept at runtime.
Warning: strings in the intern pool are NEVER GC’ed. Great for unplanned memory leaks! Used with caution, it can reap large benefits in certain scenarios.
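A small sketch of runtime interning, assuming hypothetical names (InternDemo, SameInstanceAfterIntern). Two strings built at runtime with equal contents are separate objects until they are interned, after which both references point at the one pooled instance.

```csharp
using System;

public static class InternDemo
{
    // Pass non-empty parts so that Concat really builds new strings.
    public static bool SameInstanceAfterIntern(string prefix, string suffix)
    {
        // Built at runtime: two distinct objects with equal contents.
        string a = string.Concat(prefix, suffix);
        string b = string.Concat(prefix, suffix);
        bool distinctBefore = !ReferenceEquals(a, b);

        // Intern returns the pooled instance for that content.
        string ia = string.Intern(a);
        string ib = string.Intern(b);
        return distinctBefore && ReferenceEquals(ia, ib);
    }
}
```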
ref locals and ref returns
ref int Max(ref int first, ref int second, ref int third) {
ref int max = ref first;
// "max = ref x" re-points the reference (C# 7.3+); plain "max = x" would overwrite first
if (max < second) max = ref second;
if (max < third) max = ref third;
return ref max;
}
The method result is simply a reference to whichever value was the largest.
It has zero allocations.
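A usage sketch for ref returns, assuming hypothetical names (RefDemo, Largest, ZeroLargest). Because the caller receives a reference rather than a copy, writing to it changes the original storage.

```csharp
using System;

public static class RefDemo
{
    // Returns a reference to the largest element's slot in the array, not a copy.
    public static ref int Largest(int[] values)
    {
        ref int max = ref values[0];
        for (int i = 1; i < values.Length; i++)
        {
            if (values[i] > max) max = ref values[i]; // ref reassignment (C# 7.3+)
        }
        return ref max;
    }

    public static int[] ZeroLargest(int[] values)
    {
        ref int largest = ref Largest(values);
        largest = 0; // writes through the reference into the array
        return values;
    }
}
```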
Reduce casting and boxing
Warning: Casting to generic interfaces is sloooow!
https://www.danielcrabtree.com/blog/191/casting-to-ienumerable-t-is-two-orders-of-magnitude-slower
Boxing operations create invisible allocations. Some boxing operations are hard to spot.
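A sketch of two boxing cases, one obvious and one easy to miss, assuming hypothetical names (BoxingDemo and its methods). Both Sum methods return the same result; only the interface version goes through a boxed enumerator.

```csharp
using System;
using System.Collections.Generic;

public static class BoxingDemo
{
    // Obvious boxing: object is a reference type, so the int is copied to the heap.
    public static object Box(int value) => value;

    // Hidden boxing: through the interface, List<T>'s struct enumerator is boxed.
    public static int SumViaInterface(IEnumerable<int> values)
    {
        int total = 0;
        foreach (int v in values) total += v;
        return total;
    }

    // No boxing: foreach binds to List<T>.GetEnumerator(), which returns a struct.
    public static int SumViaList(List<int> values)
    {
        int total = 0;
        foreach (int v in values) total += v;
        return total;
    }
}
```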
LINQ & Closures
class Symbol { public string Name { get; private set; } /*...*/
}
class Compiler {
private List<Symbol> symbols;
public Symbol FindMatchingSymbol(string name) {
return symbols.FirstOrDefault(s => s.Name == name);
}
}
The lambda call is equivalent to:
Func<Symbol, bool> predicate = s => s.Name == name;
return symbols.FirstOrDefault(predicate);
Compiles to…
private class Lambda1Environment {
public string capturedName;
public bool Evaluate(Symbol s) {
return s.Name == this.capturedName;
}
}
Lambda1Environment l = new Lambda1Environment { capturedName = name };
var predicate = new Func<Symbol, bool>(l.Evaluate);
return symbols.FirstOrDefault(predicate);
Two hidden allocations: the closure object and the delegate.
Boxing operation: FirstOrDefault() is an extension method on IEnumerable<T>, so the List<T> struct enumerator is boxed.
Alternative implementation?
Not as pretty, but no allocations.
foreach will use the List<T> iterator. No casting and no hidden lambda code.
public Symbol FindMatchingSymbol(string name)
{
foreach (Symbol s in symbols)
{
if (s.Name == name) return s;
}
return null;
}
MemoryMarshal (helps with Spans)
public Span<byte> FloatsToSpanOfBytes() => MemoryMarshal.Cast<float, byte>(arrayOfFloats);
----
[StructLayout(LayoutKind.Explicit)]
public struct Bid {
[FieldOffset(0)] public float Value;
[FieldOffset(4)] public long ProductId;
[FieldOffset(12)] public long UserId;
[FieldOffset(20)] public DateTime Time;
}
…
public Bid Deserialize(ReadOnlySpan<byte> serialized) => MemoryMarshal.Read<Bid>(serialized);
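A round-trip sketch of the same idea on a smaller struct. Header and MarshalDemo are hypothetical names, not part of the slide's code. The struct must contain no references for MemoryMarshal to accept it.

```csharp
using System;
using System.Runtime.InteropServices;

public static class MarshalDemo
{
    // Hypothetical type for illustration; it contains only value-type fields.
    [StructLayout(LayoutKind.Sequential, Pack = 1)]
    public struct Header
    {
        public int Id;
        public short Flags;
    }

    // Reinterprets the struct's memory as bytes and copies them out.
    public static byte[] Serialize(Header header)
    {
        Header[] one = { header };
        return MemoryMarshal.AsBytes(one.AsSpan()).ToArray();
    }

    // Reads the struct back from the raw bytes.
    public static Header Deserialize(ReadOnlySpan<byte> serialized) =>
        MemoryMarshal.Read<Header>(serialized);

    // Views a float array as bytes without copying: 4 bytes per float.
    public static int ByteCount(float[] values) =>
        MemoryMarshal.Cast<float, byte>(values.AsSpan()).Length;
}
```

The bytes are in the machine's native layout and endianness, so this suits same-platform caches and IPC better than a portable file format.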
stackalloc Keyword
Allows you to directly allocate memory on the stack
Don’t overdo it and keep it for short-lived usage
Beware: It’s easy to misuse this and make things worse
Span<byte> bytes = length <= 128 ?
stackalloc byte[length] :
new byte[length];
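A sketch that puts the pattern above to work, assuming hypothetical names (StackallocDemo, ToHex). The scratch buffer lives on the stack when it is small and falls back to the heap otherwise; the 128 threshold is taken from the slide, not a measured value.

```csharp
using System;

public static class StackallocDemo
{
    public static string ToHex(ReadOnlySpan<byte> data)
    {
        int length = data.Length * 2;
        // Small buffers go on the stack; larger ones fall back to a heap array.
        Span<char> chars = length <= 128 ? stackalloc char[length] : new char[length];
        const string digits = "0123456789abcdef";
        for (int i = 0; i < data.Length; i++)
        {
            chars[2 * i] = digits[data[i] >> 4];      // high nibble
            chars[2 * i + 1] = digits[data[i] & 0xF]; // low nibble
        }
        return new string(chars);
    }
}
```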
Platform Intrinsics
System.Runtime.Intrinsics lets you use hardware-accelerated SIMD instructions specific to ARM, x64, etc.
https://bits.houmus.org/2018-08-18/netcoreapp3.0-instrinsics-in-real-life-pt1
For general use, the platform-independent Vector<T> SIMD types are preferred.
(check System.Numerics.Vector.IsHardwareAccelerated)
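A sketch of the platform-independent route with System.Numerics.Vector<T>, assuming hypothetical names (SimdDemo, Sum). It adds several elements per loop iteration and finishes the remainder with a scalar loop.

```csharp
using System;
using System.Numerics;

public static class SimdDemo
{
    public static int Sum(int[] values)
    {
        int i = 0;
        var acc = Vector<int>.Zero;
        int width = Vector<int>.Count; // lanes per vector, decided by the hardware

        // Add 'width' elements per iteration.
        for (; i <= values.Length - width; i += width)
        {
            acc += new Vector<int>(values, i);
        }

        // Collapse the lanes into one number, then handle the scalar tail.
        int total = Vector.Dot(acc, Vector<int>.One);
        for (; i < values.Length; i++) total += values[i];
        return total;
    }
}
```

When Vector.IsHardwareAccelerated is false the code still runs correctly, just without the speed-up.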
Theory Time is Over
LET’S IMPROVE THE PERFORMANCE OF “SOMETHING”
Tip #1:
Understand the “Why?”
BLOCKING & I/O CAN HURT MORE THAN HEAVY CPU USE
Tip #2:
Stay Focused
DON’T OPTIMISE THE UNIMPORTANT STUFF. THINK “HOT PATH”
Tip #3:
Provable Improvements
MEASURE, CHANGE, MEASURE AGAIN.
Let’s work with some real code!
Our target library: PdfPig
Features:
* Targets .NET Standard 2.0
* Port of Apache PDFBox to C#
* Has lots of tests
(And it’s not something I’d seen before prepping this session)
Tooling
PerfView
◦ https://github.com/microsoft/perfview
BenchmarkDotNet
◦ https://benchmarkdotnet.org/
ILSpy
◦ https://github.com/icsharpcode/ILSpy
Visual Studio 2019 Diagnostic Tools (optional)
Speedscope
◦ https://www.speedscope.app/
---
For X-Plat: dotnet-counters, dotnet-trace, dotnet-dump
◦ https://github.com/dotnet/diagnostics/tree/master/documentation
What we’ll do
Measure current performance (using .NET Core 2.2)
Upgrade to .NET Core 3.0 Preview 7 & compare performance
Analyse performance using PerfView
Run microbenchmarks to measure specific performance areas
What you’ll do
Clone https://github.com/rbanks54/PdfPig
◦ use the benchmarks branch
Identify an area you want to improve
Go ahead. Try and improve it. And prove it.
Suggested developer loop:
1. Ensure all unit tests pass & baseline current performance
2. Make a change
3. Check unit tests still pass
4. Measure new performance and compare with baseline
5. Repeat from step 2 until happy
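For a rough check between steps of the loop above, a stopwatch-and-allocation helper can be sketched as below. QuickMeasure and Result are hypothetical names. This is a crude stand-in only; BenchmarkDotNet handles warm-up, outliers and statistics far more carefully and is the better tool for numbers you intend to trust.

```csharp
using System;
using System.Diagnostics;

public static class QuickMeasure
{
    public readonly struct Result
    {
        public Result(TimeSpan elapsed, long allocatedBytes)
        {
            Elapsed = elapsed;
            AllocatedBytes = allocatedBytes;
        }

        public TimeSpan Elapsed { get; }
        public long AllocatedBytes { get; }
    }

    public static Result Run(Action action, int iterations)
    {
        action(); // warm-up, so JIT compilation is not part of the measurement
        var stopwatch = new Stopwatch();
        long before = GC.GetAllocatedBytesForCurrentThread();
        stopwatch.Start();
        for (int i = 0; i < iterations; i++) action();
        stopwatch.Stop();
        long allocated = GC.GetAllocatedBytesForCurrentThread() - before;
        return new Result(stopwatch.Elapsed, allocated);
    }
}
```

Run it on the baseline and on the changed code with the same input, and compare both the time and the allocated bytes.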
