Dictionaries, Hash Tables and Sets
Dictionaries, Hash Tables, Hashing, Collisions, Sets
SoftUni Team
Technical Trainers
Software University
http://softuni.bg
Table of Contents
1. Dictionary (Map) Abstract Data Type
2. Hash Tables, Hashing and Collision Resolution
3. Dictionary<TKey, TValue> Class
4. Sets: HashSet<T> and SortedSet<T>
Dictionaries
Data Structures that Map Keys to Values
The Dictionary (Map) ADT
• The abstract data type (ADT) "dictionary" maps keys to values
  - Also known as "map" or "associative array"
  - Holds a set of {key, value} pairs
• Dictionary ADT operations:
  - Add(key, value)
  - FindByKey(key) → value
  - Delete(key)
• Many implementations
  - Hash table, balanced tree, list, array, ...
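The three ADT operations can be sketched as a C# interface with the simplest possible implementation behind it. The names IDictionaryAdt and ListBackedDictionary are hypothetical and chosen only for this illustration; the list-based storage is deliberately naive (O(n) per operation) to show that the ADT does not prescribe a data structure.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical interface mirroring the ADT operations Add / FindByKey / Delete.
public interface IDictionaryAdt<TKey, TValue>
{
    void Add(TKey key, TValue value);
    bool FindByKey(TKey key, out TValue value);
    bool Delete(TKey key);
}

// Naive implementation: an unsorted list of pairs, linear search for every operation.
public class ListBackedDictionary<TKey, TValue> : IDictionaryAdt<TKey, TValue>
{
    private readonly List<KeyValuePair<TKey, TValue>> pairs =
        new List<KeyValuePair<TKey, TValue>>();

    public void Add(TKey key, TValue value)
    {
        if (IndexOf(key) >= 0)
        {
            throw new ArgumentException("Key already exists.");
        }
        pairs.Add(new KeyValuePair<TKey, TValue>(key, value));
    }

    public bool FindByKey(TKey key, out TValue value)
    {
        int index = IndexOf(key);
        value = index >= 0 ? pairs[index].Value : default(TValue);
        return index >= 0;
    }

    public bool Delete(TKey key)
    {
        int index = IndexOf(key);
        if (index < 0)
        {
            return false;
        }
        pairs.RemoveAt(index);
        return true;
    }

    // Returns the position of the key in the list, or -1 if it is missing.
    private int IndexOf(TKey key)
    {
        for (int i = 0; i < pairs.Count; i++)
        {
            if (EqualityComparer<TKey>.Default.Equals(pairs[i].Key, key))
            {
                return i;
            }
        }
        return -1;
    }
}
```

A hash table or a balanced tree could implement the same interface with much better performance; only the internal storage would change.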
ADT Dictionary – Example
• Sample dictionary:

Key       Value
C#        Modern general-purpose object-oriented programming language for the Microsoft .NET platform
PHP       Popular server-side scripting language for Web development
compiler  Software that transforms a computer program to executable machine code
…         …
Hash Tables
What Is a Hash Table? How Does It Work?
Hash Table
• A hash table is an array that holds a set of {key, value} pairs
• The process of mapping a key to a position in the table is called hashing
[Figure: the hash function h: k → 0 … m-1 maps a key k to a slot of the hash table T of size m, with slots indexed 0, 1, 2, …, m-1]
Hash Functions and Hashing
• A hash table has m slots, indexed from 0 to m-1
• A hash function h(k) maps the keys to positions:
  - h: k → 0 … m-1
• For an arbitrary key k in the key range and some hash function h, we have h(k) = p and 0 ≤ p < m
[Figure: h(k) selects one of the slots 0 … m-1 of the table T]
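A minimal sketch of the mapping h: k → 0 … m-1 in C#, assuming the key's built-in GetHashCode() is used as the raw hash. The class and method names (SlotMapper, SlotFor) are hypothetical. Since GetHashCode() can return a negative number, the sign bit is cleared before taking the remainder.

```csharp
using System;

public static class SlotMapper
{
    // Maps a non-null key to a slot index p with 0 <= p < m.
    public static int SlotFor(object key, int m)
    {
        if (key == null)
        {
            throw new ArgumentNullException(nameof(key));
        }
        if (m <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(m));
        }

        // Clear the sign bit so the remainder is never negative.
        int nonNegativeHash = key.GetHashCode() & 0x7FFFFFFF;
        return nonNegativeHash % m;
    }
}
```

For example, SlotMapper.SlotFor(42, 16) gives 10, because the hash code of an int is the int itself and 42 % 16 = 10.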
Hashing Functions
• Perfect hashing function (PHF)
  - h(k): one-to-one mapping of each key k to an integer in the range [0, m-1]
  - The PHF maps each key to a distinct integer within some manageable range
• Finding a perfect hashing function is impractical in most cases, because the full set of keys is rarely known in advance
• More realistically
  - A hash function h(k) maps most of the keys onto unique integers, but not all
Collisions in a Hash Table
• A collision occurs when different keys have the same hash value:
  h(k1) = h(k2) for k1 ≠ k2
• When the number of collisions is sufficiently small, hash tables work quite well (fast)
• Several collision resolution strategies exist
  - Chaining collided keys (+ values) in a list
  - Re-hashing (second hash function)
  - Using the neighbor slots (linear probing)
  - Many others
Collision Resolution: Chaining
h("Pesho") = 4
h("Kiro") = 2
h("Mimi") = 1
h("Ivan") = 2  (collision with "Kiro")
h("Lili") = m-1

Slot:  0     1     2            3     4      …  m-1
T:     null  Mimi  Kiro → Ivan  null  Pesho  …  Lili

Chaining the elements in case of collision: each slot holds a linked list of all elements that hash to it.
Collision Resolution: Open Addressing
• Open addressing as a collision resolution strategy means taking another slot in the hash table in case of collision, e.g.
  - Linear probing: take the next empty slot just after the collision
    h(key, i) = (h(key) + i) mod m
    where i is the attempt number: 0, 1, 2, … and m is the table size
  - Quadratic probing: the i-th next slot is calculated by a quadratic polynomial (c1 and c2 are some constants)
    h(key, i) = (h(key) + c1*i + c2*i^2) mod m
  - Re-hashing (double hashing): use a separate (second) hash function for collisions
    h(key, i) = (h1(key) + i*h2(key)) mod m
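The three probe formulas can be written down directly. This is only a sketch: the class ProbeSequences and its method names are hypothetical, all inputs are assumed to be non-negative, and the result is reduced modulo the table size m so that it is always a valid slot index.

```csharp
using System;

public static class ProbeSequences
{
    // All methods assume hash >= 0, i >= 0 and m > 0.
    // Long arithmetic avoids int overflow before the modulo is applied.

    // Linear probing: (h(key) + i) mod m
    public static int Linear(int hash, int i, int m)
    {
        return (int)(((long)hash + i) % m);
    }

    // Quadratic probing: (h(key) + c1*i + c2*i^2) mod m, with c1, c2 >= 0
    public static int Quadratic(int hash, int i, int m, int c1, int c2)
    {
        return (int)(((long)hash + (long)c1 * i + (long)c2 * i * i) % m);
    }

    // Double hashing: (h1(key) + i*h2(key)) mod m
    public static int DoubleHashing(int h1, int h2, int i, int m)
    {
        return (int)(((long)h1 + (long)i * h2) % m);
    }
}
```

With m = 8, a key hashing to slot 7 wraps around under linear probing: attempt 0 tries slot 7 and attempt 1 tries slot 0.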
How Big Should the Hash Table Be?
• The load factor (fill factor) = used cells / all cells
  - How much the hash table is filled, e.g. 65%
• A smaller fill factor leads to:
  - Fewer collisions (faster average seek time)
  - More memory consumption
• Recommended fill factors:
  - When chaining is used as collision resolution → less than 75%
  - When open addressing is used → less than 50%
Adding Item to Hash Table With Chaining
Example: Add("Tanio"), where hash("Tanio") % m = 3

Slot:  0     1     2            3     4     …  m-1
T:     null  Mimi  Kiro → Ivan  null  null  …  Lili

1. Fill factor >= 75%?
   - yes → resize & rehash, then continue
   - no → continue
2. map[3] == null?
   - yes → initialize a linked list in slot 3
3. Insert("Tanio") into the list in slot 3
Lab Exercise
Implement a Hash-Table with Chaining as Collision Resolution
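One possible starting point for the lab is sketched below. It is not the official lab solution: the class name ChainedHashTable, the initial capacity of 16 and the doubling strategy are assumptions made for this illustration. The Add method follows the flow from the previous slide: check the fill factor, resize and rehash if needed, create the linked list for an empty slot, then insert.

```csharp
using System;
using System.Collections.Generic;

// Sketch of a hash table with chaining; keys are assumed to be non-null.
public class ChainedHashTable<TKey, TValue>
{
    private const double MaxLoadFactor = 0.75;
    private LinkedList<KeyValuePair<TKey, TValue>>[] slots;

    public ChainedHashTable(int capacity = 16)
    {
        if (capacity < 1) throw new ArgumentOutOfRangeException(nameof(capacity));
        slots = new LinkedList<KeyValuePair<TKey, TValue>>[capacity];
    }

    public int Count { get; private set; }
    public int Capacity { get { return slots.Length; } }

    public void Add(TKey key, TValue value)
    {
        // Step 1: fill factor >= 75%? Then resize and rehash first.
        if ((double)(Count + 1) / slots.Length >= MaxLoadFactor) Resize();

        TValue ignored;
        if (TryFind(key, out ignored)) throw new ArgumentException("Key already exists.");

        // Steps 2 and 3: create the chain if the slot is empty, then insert.
        Insert(slots, key, value);
        Count++;
    }

    public bool TryFind(TKey key, out TValue value)
    {
        var chain = slots[SlotFor(key, slots.Length)];
        if (chain != null)
        {
            foreach (var pair in chain)
            {
                if (EqualityComparer<TKey>.Default.Equals(pair.Key, key))
                {
                    value = pair.Value;
                    return true;
                }
            }
        }
        value = default(TValue);
        return false;
    }

    public bool Remove(TKey key)
    {
        var chain = slots[SlotFor(key, slots.Length)];
        if (chain == null) return false;
        for (var node = chain.First; node != null; node = node.Next)
        {
            if (EqualityComparer<TKey>.Default.Equals(node.Value.Key, key))
            {
                chain.Remove(node);
                Count--;
                return true;
            }
        }
        return false;
    }

    // Maps the key's hash code to a slot index in the range [0, m-1].
    private static int SlotFor(TKey key, int m)
    {
        return (key.GetHashCode() & 0x7FFFFFFF) % m;
    }

    private static void Insert(
        LinkedList<KeyValuePair<TKey, TValue>>[] table, TKey key, TValue value)
    {
        int index = SlotFor(key, table.Length);
        if (table[index] == null)
        {
            table[index] = new LinkedList<KeyValuePair<TKey, TValue>>();
        }
        table[index].AddLast(new KeyValuePair<TKey, TValue>(key, value));
    }

    // Doubles the number of slots and re-inserts every pair (rehash).
    private void Resize()
    {
        var bigger = new LinkedList<KeyValuePair<TKey, TValue>>[slots.Length * 2];
        foreach (var chain in slots)
        {
            if (chain == null) continue;
            foreach (var pair in chain) Insert(bigger, pair.Key, pair.Value);
        }
        slots = bigger;
    }
}
```

Rehashing is needed on resize because the slot index depends on the table size: the same key usually lands in a different slot once m changes.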
Implementing a Good Hash Function
• The hash-table performance depends on the probability of collisions
  - Fewer collisions → faster add / find / delete operations
• How to implement a good (efficient) hash function?
  - A good hash function should distribute the input values uniformly
  - The hash code calculation process should be fast
  - Integer n → use n as hash value (n % size as hash-table slot)
  - Real number r → use the bitwise representation of r
  - String s → use a formula over the Unicode representation of s
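The last two rules can be illustrated with a small sketch. The class SimpleHashes is hypothetical, and the constants 17 and 31 are a common textbook choice, not the algorithm .NET uses internally. The string hash is a polynomial over the UTF-16 code units, so the order of characters matters; the hash for a real number folds the 64-bit representation into 32 bits.

```csharp
using System;

public static class SimpleHashes
{
    // Polynomial hash over the UTF-16 code units of the string.
    public static int HashString(string s)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));
        unchecked // overflow is expected and harmless here
        {
            int hash = 17;
            foreach (char c in s)
            {
                hash = hash * 31 + c;
            }
            return hash;
        }
    }

    // Uses the bitwise representation of the double: XOR of the low and high 32 bits.
    public static int HashDouble(double r)
    {
        long bits = BitConverter.DoubleToInt64Bits(r);
        return unchecked((int)bits ^ (int)(bits >> 32));
    }
}
```

Because each step multiplies the running value before adding the next character, "ab" and "ba" get different hash values, which a plain sum of character codes would not achieve.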
Built-In Hash Functions in C# / Java
• All C# objects already have a GetHashCode() method (in Java the equivalent is hashCode())
  - Primitive types like int, long, float, double, decimal, …
  - Built-in types like: string, DateTime and Guid
Hash function for System.String (excerpt from unsafe code; src points to the characters of the string):
    int c, hash1 = (5381<<16) + 5381; int hash2 = hash1;
    char *s = src;
    while ((c = s[0]) != 0) {
        hash1 = ((hash1 << 5) + hash1) ^ c;  // hash1 * 33, XOR with the first char of the pair
        c = s[1];
        if (c == 0)
            break;
        hash2 = ((hash2 << 5) + hash2) ^ c;  // hash2 * 33, XOR with the second char of the pair
        s += 2;
    }
    return hash1 + (hash2 * 1566083941);
Hash Functions on Composite Keys
• What if we have a composite key
  - E.g. FirstName + MiddleName + LastName?
1. Convert the keys to a string and get its hash code:
    var key = string.Format("{0}-{1}-{2}", FirstName, MiddleName, LastName);
2. Use a custom hash-code function:
    var hashCode = (this.FirstName != null ? this.FirstName.GetHashCode() : 0);
    hashCode = (hashCode * 397) ^ (this.MiddleName != null ?
        this.MiddleName.GetHashCode() : 0);
    hashCode = (hashCode * 397) ^ (this.LastName != null ?
        this.LastName.GetHashCode() : 0);
    return hashCode;
Hash Tables and Efficiency
• Hash table efficiency depends on:
  - Efficient hash functions
    Most implementations use the built-in hash functions in C# / Java
  - The number of collisions, which should be as low as possible
  - Fill factor (used buckets / all buckets)
    Typically 70% fill → resize and rehash
    Avoid frequent resizing! Define the hash table capacity in advance
  - Collision resolution algorithm
    Most implementations use chaining with a linked list
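The advice to define the capacity in advance maps onto the Dictionary<TKey,TValue> constructor that takes an initial capacity. The sketch below assumes the number of elements is known up front; the class and method names (PreSizedDictionaryDemo, BuildLabels) are hypothetical.

```csharp
using System.Collections.Generic;

public static class PreSizedDictionaryDemo
{
    // Builds a dictionary with room for expectedCount entries reserved up front,
    // so the internal table does not have to be resized while it is being filled.
    public static Dictionary<int, string> BuildLabels(int expectedCount)
    {
        var labels = new Dictionary<int, string>(expectedCount);
        for (int i = 0; i < expectedCount; i++)
        {
            labels[i] = "item" + i;
        }
        return labels;
    }
}
```

The result is the same as with the parameterless constructor; the difference is only in how many resize-and-rehash steps happen during the fill.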
Hash Tables and Efficiency
• Hash tables are the most efficient dictionary implementation in the average case
  - Add / Find / Delete take just a few primitive operations
  - On average, speed does not depend on the size of the hash table
  - Average-case complexity O(1) – constant time (amortized for Add, because of occasional resizing)
• Example:
  - Finding an element in a hash table holding 1 000 000 elements takes on average just 1-2 steps
  - Finding an element in an unsorted array holding 1 000 000 elements takes on average 500 000 steps
Hash Tables in C#
The Dictionary<TKey,TValue> Class
Dictionaries: .NET Interfaces and Classes
Dictionary<TKey,TValue>
• Implements the ADT dictionary as a hash table
  - The size is dynamically increased as needed
  - Contains a collection of key-value pairs
  - Collisions are resolved by chaining
  - Enumeration order is unspecified: it depends on the internal layout of the table, not on the sort order of the keys
• Dictionary<TKey,TValue> relies on:
  - Object.Equals() – compares the keys
  - Object.GetHashCode() – calculates the hash codes of the keys
Dictionary<TKey,TValue> (2)
• Major operations:
  - Add(key, value) – adds an element by key + value; throws an exception when the key already exists
  - Remove(key) – removes a value by key; returns true / false
  - this[key] = value – adds / replaces an element by key
  - this[key] – gets an element by key; throws an exception on a non-existing key
  - Clear() – removes all elements
  - Count – returns the number of elements
  - Keys – returns a collection of all keys (in unspecified order)
  - Values – returns a collection of all values (in unspecified order)
Dictionary<TKey,TValue> (3)
• Major operations:
  - ContainsKey(key) – checks if a given key exists in the dictionary
  - ContainsValue(value) – checks whether the dictionary contains a given value
    Warning: slow operation – O(n)
  - TryGetValue(key, out value)
    If the key is found, returns true and stores the element in the value parameter
    Otherwise returns false
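TryGetValue combines the existence check and the lookup in one call, so the key is hashed only once and no exception is thrown for a missing key. A short sketch follows; the class GradeLookup and the method Describe are hypothetical names for this illustration.

```csharp
using System.Collections.Generic;

public static class GradeLookup
{
    // Returns a description of the student's grade without throwing on a missing key.
    public static string Describe(Dictionary<string, int> grades, string name)
    {
        int grade;
        if (grades.TryGetValue(name, out grade))
        {
            return name + ": " + grade;
        }
        return name + ": no grade recorded";
    }
}
```

The alternative, ContainsKey(name) followed by grades[name], gives the same result but performs two lookups.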
Dictionary<TKey,TValue> – Example
    var studentGrades = new Dictionary<string, int>();
    studentGrades.Add("Ivan", 4);
    studentGrades.Add("Peter", 6);
    studentGrades.Add("Maria", 6);
    studentGrades.Add("George", 5);

    // The indexer throws an exception if the key does not exist
    int peterGrade = studentGrades["Peter"];
    Console.WriteLine("Peter's grade: {0}", peterGrade);
    Console.WriteLine("Is Peter in the hash table: {0}",
        studentGrades.ContainsKey("Peter"));

    Console.WriteLine("Students and their grades:");
    foreach (var pair in studentGrades)
    {
        Console.WriteLine("{0} --> {1}", pair.Key, pair.Value);
    }
Dictionary<TKey,TValue>
Live Demo
Counting the Words in a Text
    string text = "a text, some text, just some text";
    var wordsCount = new Dictionary<string, int>();

    // RemoveEmptyEntries skips the empty strings produced by ", "
    string[] words = text.Split(new[] { ' ', ',', '.' },
        StringSplitOptions.RemoveEmptyEntries);
    foreach (string word in words)
    {
        int count = 1;
        if (wordsCount.ContainsKey(word))
        {
            count = wordsCount[word] + 1;
        }
        wordsCount[word] = count;
    }

    foreach (var pair in wordsCount)
    {
        Console.WriteLine("{0} -> {1}", pair.Key, pair.Value);
    }
Counting the Words in a Text
Live Demo
Nested Data Structures
• Data structures can be nested, e.g. dictionary of lists: Dictionary<string, List<int>>
    static Dictionary<string, List<int>> studentGrades =
        new Dictionary<string, List<int>>();

    private static void AddGrade(string name, int grade)
    {
        // Create the list of grades on first use
        if (!studentGrades.ContainsKey(name))
        {
            studentGrades[name] = new List<int>();
        }
        studentGrades[name].Add(grade);
    }
Nested Data Structures (2)
    var countriesAndCities =
        new Dictionary<string, Dictionary<string, int>>();
    countriesAndCities["Bulgaria"] = new Dictionary<string, int>();
    countriesAndCities["Bulgaria"]["Sofia"] = 1000000;
    countriesAndCities["Bulgaria"]["Plovdiv"] = 400000;
    countriesAndCities["Bulgaria"]["Pernik"] = 30000;

    foreach (var city in countriesAndCities["Bulgaria"])
    {
        Console.WriteLine("{0} : {1}", city.Key, city.Value);
    }

    // Sum() is a LINQ extension method and requires "using System.Linq;"
    var totalPopulation = countriesAndCities["Bulgaria"]
        .Sum(c => c.Value);
    Console.WriteLine(totalPopulation);
Dictionary of Lists
Live Demo
Balanced Tree-Based Dictionaries
The SortedDictionary<TKey,TValue> Class
SortedDictionary<TKey,TValue>
• SortedDictionary<TKey,TValue> implements the ADT "dictionary" as a self-balancing search tree
  - Elements are arranged in the tree ordered by key
  - Traversing the tree returns the elements in increasing order
  - Add / Find / Delete take O(log n) time
• Use SortedDictionary<TKey,TValue> when you need the elements sorted by key
  - Otherwise use Dictionary<TKey,TValue> – it has better performance
Counting Words (Again)
    string text = "a text, some text, just some text";
    IDictionary<string, int> wordsCount =
        new SortedDictionary<string, int>();

    // RemoveEmptyEntries skips the empty strings produced by ", "
    string[] words = text.Split(new[] { ' ', ',', '.' },
        StringSplitOptions.RemoveEmptyEntries);
    foreach (string word in words)
    {
        int count = 1;
        if (wordsCount.ContainsKey(word))
        {
            count = wordsCount[word] + 1;
        }
        wordsCount[word] = count;
    }

    // The pairs are enumerated in increasing order of the keys
    foreach (var pair in wordsCount)
    {
        Console.WriteLine("{0} -> {1}", pair.Key, pair.Value);
    }
Counting the Words in a Text
Live Demo
Comparing Dictionary Keys
Using Custom Key Classes in
Dictionary<TKey, TValue> and
SortedDictionary<TKey,TValue>
IComparable<T>
• Dictionary<TKey,TValue> relies on
  - Object.Equals() – for comparing the keys
  - Object.GetHashCode() – for calculating the hash codes of the keys
• SortedDictionary<TKey,TValue> relies on IComparable<T> for ordering the keys
• Built-in types like int, long, float, string and DateTime already implement Equals(), GetHashCode() and IComparable<T>
• Other types, when used as dictionary keys, should provide custom implementations
Implementing Equals() and GetHashCode()
    public struct Point
    {
        public int X { get; set; }
        public int Y { get; set; }

        public override bool Equals(Object obj)
        {
            // "is" already evaluates to false for null
            if (!(obj is Point)) return false;
            Point p = (Point)obj;
            return (X == p.X) && (Y == p.Y);
        }

        public override int GetHashCode()
        {
            // Swap the upper and lower halves of X, then mix in Y
            return (X << 16 | X >> 16) ^ Y;
        }
    }
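The effect of overriding both methods shows up when a key is looked up with a different but equal instance. The sketch below uses a separate hypothetical struct, GridPoint, so that it is self-contained; the multiplier 397 is borrowed from the composite-key slide and is only one possible way to mix the fields.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type with value-based equality and a matching hash code.
public struct GridPoint : IEquatable<GridPoint>
{
    public GridPoint(int x, int y)
    {
        X = x;
        Y = y;
    }

    public int X { get; }
    public int Y { get; }

    public bool Equals(GridPoint other)
    {
        return X == other.X && Y == other.Y;
    }

    public override bool Equals(object obj)
    {
        return obj is GridPoint && Equals((GridPoint)obj);
    }

    // Equal points must produce equal hash codes.
    public override int GetHashCode()
    {
        unchecked
        {
            return (X * 397) ^ Y;
        }
    }
}

public static class GridPointDemo
{
    // Looks up a label using a freshly created key instance.
    public static string Lookup(int x, int y)
    {
        var labels = new Dictionary<GridPoint, string>();
        labels[new GridPoint(0, 0)] = "origin";
        labels[new GridPoint(3, 4)] = "target";

        string label;
        return labels.TryGetValue(new GridPoint(x, y), out label) ? label : "unknown";
    }
}
```

GridPointDemo.Lookup(3, 4) finds "target" even though the key passed to TryGetValue is a new instance, because the dictionary first compares hash codes and then calls Equals.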
Implementing IComparable<T>
    public struct Point : IComparable<Point>
    {
        public int X { get; set; }
        public int Y { get; set; }

        // Orders the points by X first, then by Y
        public int CompareTo(Point otherPoint)
        {
            if (X != otherPoint.X)
            {
                return this.X.CompareTo(otherPoint.X);
            }
            else
            {
                return this.Y.CompareTo(otherPoint.Y);
            }
        }
    }
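A key type that implements IComparable<T> can be used directly in a SortedDictionary<TKey,TValue>, which then enumerates the pairs in the order defined by CompareTo. The sketch uses a separate hypothetical struct, OrderedPoint, with the same ordering rule (X first, then Y) so that it is self-contained.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type ordered by X, then by Y.
public struct OrderedPoint : IComparable<OrderedPoint>
{
    public OrderedPoint(int x, int y)
    {
        X = x;
        Y = y;
    }

    public int X { get; }
    public int Y { get; }

    public int CompareTo(OrderedPoint other)
    {
        int byX = X.CompareTo(other.X);
        return byX != 0 ? byX : Y.CompareTo(other.Y);
    }
}

public static class OrderedPointDemo
{
    // Inserts three points out of order and returns them as enumerated by the tree.
    public static List<string> SortedKeys()
    {
        var names = new SortedDictionary<OrderedPoint, string>();
        names[new OrderedPoint(2, 1)] = "c";
        names[new OrderedPoint(1, 5)] = "b";
        names[new OrderedPoint(1, 2)] = "a";

        var result = new List<string>();
        foreach (var pair in names)
        {
            result.Add("(" + pair.Key.X + "," + pair.Key.Y + ")=" + pair.Value);
        }
        return result;
    }
}
```

The returned list is (1,2)=a, (1,5)=b, (2,1)=c, regardless of the insertion order.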
Sets
Sets of Elements
Set and Bag ADTs
• The abstract data type (ADT) "set" keeps a collection of elements with no duplicates
  - A collection that allows duplicates is known as the ADT "bag" (multiset)
• Set operations:
  - Add(element)
  - Contains(element) → true / false
  - Delete(element)
  - Union(set) / Intersect(set)
• Sets can be implemented in several ways
  - List, array, hash table, balanced tree, ...
Sets: .NET Interfaces and Implementations
HashSet<T>
• HashSet<T> implements the ADT set with a hash table
  - Elements are in no particular order
• All major operations are fast:
  - Add(element) – adds an element to the set
    Does nothing if the element already exists
  - Remove(element) – removes the given element
  - Count – returns the number of elements
  - UnionWith(set) / IntersectWith(set) – performs union / intersection with another set
HashSet<T> – Example
    ISet<string> firstSet = new HashSet<string>(
        new string[] { "SQL", "Java", "C#", "PHP" });
    ISet<string> secondSet = new HashSet<string>(
        new string[] { "Oracle", "SQL", "MySQL" });

    // Copy the first set, then add all elements of the second one
    ISet<string> union = new HashSet<string>(firstSet);
    union.UnionWith(secondSet);

    foreach (var element in union)
    {
        Console.Write("{0} ", element);
    }
    Console.WriteLine();
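Intersection and difference work the same way as the union above: copy one set, then modify the copy in place. A small sketch with hypothetical helper names (SetOperationsDemo, Common, OnlyInFirst); IntersectWith and ExceptWith are the in-place HashSet<T> operations used.

```csharp
using System.Collections.Generic;

public static class SetOperationsDemo
{
    // Elements that occur in both collections.
    public static HashSet<string> Common(
        IEnumerable<string> first, IEnumerable<string> second)
    {
        var result = new HashSet<string>(first);
        result.IntersectWith(second);
        return result;
    }

    // Elements of the first collection that do not occur in the second.
    public static HashSet<string> OnlyInFirst(
        IEnumerable<string> first, IEnumerable<string> second)
    {
        var result = new HashSet<string>(first);
        result.ExceptWith(second);
        return result;
    }
}
```

For the two sets from the example, the intersection contains only "SQL", and the difference contains "Java", "C#" and "PHP". Copying first keeps the original sets unchanged, since both operations modify the set they are called on.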
SortedSet<T>
• SortedSet<T> implements the ADT set with a balanced search tree (red-black tree)
  - Elements are sorted in increasing order
• Example (PrintSet is a helper that prints the elements of a set; it is not shown on the slide):
    ISet<string> firstSet = new SortedSet<string>(
        new string[] { "SQL", "Java", "C#", "PHP" });
    ISet<string> secondSet = new SortedSet<string>(
        new string[] { "Oracle", "SQL", "MySQL" });
    ISet<string> union = new SortedSet<string>(firstSet);
    union.UnionWith(secondSet);
    PrintSet(union); // C# Java MySQL Oracle PHP SQL
HashSet<T> and SortedSet<T>
Live Demo
Dictionaries and Sets Comparison

Data Structure                        Internal Structure    Time Complexity (Add/Update/Delete)
Dictionary<K,V>, HashSet<K>           Hash table            O(1) on average
SortedDictionary<K,V>, SortedSet<K>   Balanced search tree  O(log(n))
Summary
• Dictionaries map keys to values
  - Can be implemented as a hash table or a balanced search tree
• Hash tables map keys to values
  - Rely on hash functions to distribute the keys in the table
  - Collisions need a resolution algorithm (e.g. chaining)
  - Very fast add / find / delete – O(1) on average
• Sets hold a group of elements
  - Hash-table or balanced tree implementations
Questions?
Dictionaries, Hash Tables and Sets
https://softuni.bg/courses/data-structures/
License
• This course (slides, examples, labs, videos, homework, etc.) is licensed under the "Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International" license
• Attribution: this work may contain portions from
  - "Fundamentals of Computer Programming with C#" book by Svetlin Nakov & Co. under CC-BY-SA license
  - "Data Structures and Algorithms" course by Telerik Academy under CC-BY-NC-SA license
Free Trainings @ Software University
• Software University Foundation – softuni.org
• Software University – High-Quality Education, Profession and Job for Software Developers
  - softuni.bg
• Software University @ Facebook
  - facebook.com/SoftwareUniversity
• Software University @ YouTube
  - youtube.com/SoftwareUniversity
• Software University Forums – forum.softuni.bg


Editor's Notes

  1. (c) 2007 National Academy for Software Development - http://academy.devbg.org. All rights reserved. Unauthorized copying or re-distribution is strictly prohibited.*