18. Dictionaries, Hash-Tables and Set

Dictionaries, Hash Tables and Sets
Dictionaries, Hash Tables, Hashing, Collisions, Sets
SoftUni Team
Technical Trainers
Software University
http://softuni.bg

Table of Contents
1. Dictionary (Map) Abstract Data Type
2. Hash Tables, Hashing and Collision Resolution
3. Dictionary<TKey, TValue> Class
4. Sets: HashSet<T> and SortedSet<T>

Dictionaries
Data Structures that Map Keys to Values

The Dictionary (Map) ADT
• The abstract data type (ADT) "dictionary" maps keys to values
  - Also known as "map" or "associative array"
  - Holds a set of {key, value} pairs
• Dictionary ADT operations:
  - Add(key, value)
  - FindByKey(key) → value
  - Delete(key)
• Many implementations
  - Hash table, balanced tree, list, array, ...

ADT Dictionary – Example
• Sample dictionary:

Key        Value
C#         Modern general-purpose object-oriented programming language for the Microsoft .NET platform
PHP        Popular server-side scripting language for Web development
compiler   Software that transforms a computer program to executable machine code
…          …

Hash Tables
What Is a Hash Table? How Does It Work?

Hash Table
• A hash table is an array that holds a set of {key, value} pairs
• The process of mapping a key to a position in the table is called hashing
[Figure: a hash table T of size m with slots indexed 0 … m-1; the hash function h: k → 0 … m-1 maps a key k to its slot h(k)]

Hash Functions and Hashing
• A hash table has m slots, indexed from 0 to m-1
• A hash function h(k) maps the keys to positions:
  - h: k → 0 … m-1
• For an arbitrary key k in the key range and a hash function h, we have h(k) = p with 0 ≤ p < m
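A minimal sketch of the mapping h: k → 0 … m-1, assuming the key's built-in GetHashCode() serves as the raw hash. The class and method names (HashSlots, SlotOf) are hypothetical and used only for illustration.

```csharp
using System;

public static class HashSlots
{
    // Maps a key to a slot index p with 0 <= p < m.
    // GetHashCode() may be negative, so the sign bit is cleared before
    // taking the remainder; otherwise % could return a negative index.
    public static int SlotOf<TKey>(TKey key, int m)
    {
        if (m <= 0) throw new ArgumentOutOfRangeException(nameof(m));
        int hash = key == null ? 0 : key.GetHashCode();
        return (hash & 0x7FFFFFFF) % m;
    }
}
```

For int keys the hash code is the number itself, so SlotOf(42, 8) lands in slot 2. String hash codes can differ between program runs, so only the range of the result is predictable for them.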

Hashing Functions
• Perfect hashing function (PHF)
  - h(k): one-to-one mapping of each key k to an integer in the range [0, m-1]
  - The PHF maps each key to a distinct integer within some manageable range
• Finding a perfect hashing function is impractical in most cases, because the full set of keys is rarely known in advance
• More realistically
  - A hash function h(k) maps most of the keys onto unique integers, but not all

Collisions in a Hash Table
• A collision occurs when different keys have the same hash value: h(k1) = h(k2) for k1 ≠ k2
• When the number of collisions is sufficiently small, hash tables work quite well (fast)
• Several collision resolution strategies exist
  - Chaining collided keys (+ values) in a list
  - Re-hashing (second hash function)
  - Using the neighbor slots (linear probing)
  - Many others

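Collision Resolution: Chaining
h("Pesho") = 4
h("Kiro") = 2
h("Mimi") = 1
h("Ivan") = 2
h("Lili") = m-1
[Figure: table T with slots 0 … m-1. Slot 1 holds Mimi, slot 2 holds Kiro, slot 4 holds Pesho, slot m-1 holds Lili; the other slots are null. "Kiro" and "Ivan" collide in slot 2, so the elements are chained: Ivan is linked after Kiro in the list of slot 2.]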

Collision Resolution: Open Addressing
• Open addressing as a collision resolution strategy means taking another slot in the hash table in case of collision, e.g.
  - Linear probing: take the next empty slot just after the collision
    h(key, i) = (h(key) + i) mod m
    where i is the attempt number: 0, 1, 2, … and m is the table size
  - Quadratic probing: the i-th next slot is calculated by a quadratic polynomial (c1 and c2 are some constants)
    h(key, i) = (h(key) + c1*i + c2*i^2) mod m
  - Re-hashing (double hashing): use a separate (second) hash function for collisions
    h(key, i) = (h1(key) + i*h2(key)) mod m
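The probe sequences can be sketched as plain functions. This is an illustration only: the names (Probing, Linear, Quadratic, DoubleHash, FindFreeSlotLinear) are hypothetical, the hash values are assumed to be non-negative and small enough not to overflow, and each result is reduced modulo the table size m so that it stays inside the table.

```csharp
public static class Probing
{
    // Slot tried on attempt i (i = 0, 1, 2, ...) with linear probing.
    public static int Linear(int hash, int i, int m) => (hash + i) % m;

    // Slot tried on attempt i with quadratic probing; c1 and c2 are constants.
    public static int Quadratic(int hash, int i, int m, int c1, int c2) =>
        (hash + c1 * i + c2 * i * i) % m;

    // Slot tried on attempt i when a second hash function gives the step.
    public static int DoubleHash(int hash1, int hash2, int i, int m) =>
        (hash1 + i * hash2) % m;

    // Walks the linear probe sequence until a free slot is found.
    // Returns -1 when every slot is occupied.
    public static int FindFreeSlotLinear(bool[] occupied, int hash)
    {
        int m = occupied.Length;
        for (int i = 0; i < m; i++)
        {
            int slot = Linear(hash, i, m);
            if (!occupied[slot]) return slot;
        }
        return -1;
    }
}
```

With slots 1, 2 and 3 of a four-slot table occupied, a key hashing to 1 is probed at 1, 2, 3 and finally placed in slot 0.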

How Big Should the Hash Table Be?
• The load factor (fill factor) = used cells / all cells
  - How much of the hash table is filled, e.g. 65%
• A smaller fill factor leads to:
  - Fewer collisions (faster average seek time)
  - More memory consumption
• Recommended fill factors:
  - When chaining is used for collision resolution → less than 75%
  - When open addressing is used → less than 50%

Adding Item to Hash Table With Chaining
[Figure: flowchart for Add("Tanio") on a table T with slots 0 … m-1, where slot 1 holds Mimi, slot 2 holds Kiro → Ivan and slot m-1 holds Lili]
1. Compute the slot: hash("Tanio") % m = 3
2. Fill factor >= 75%? If yes: resize & rehash
3. map[3] == null? If yes: initialize a linked list in that slot
4. Insert("Tanio")
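The add procedure above can be sketched in C# as follows. This is a teaching sketch under stated assumptions, not the implementation used by .NET: the class name ChainedHashTable, the 0.75 threshold constant, the doubling growth policy and the default capacity are all illustrative choices. Null keys are not supported, and the fill-factor check runs before the slot is computed so that the index always refers to the current array.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical teaching class: a hash table with chaining.
public class ChainedHashTable<TKey, TValue>
{
    private const double MaxFillFactor = 0.75;
    private LinkedList<KeyValuePair<TKey, TValue>>[] slots;

    public int Count { get; private set; }
    public int Capacity => slots.Length;

    public ChainedHashTable(int capacity = 16)
    {
        if (capacity < 1) throw new ArgumentOutOfRangeException(nameof(capacity));
        slots = new LinkedList<KeyValuePair<TKey, TValue>>[capacity];
    }

    private static int SlotOf(TKey key, int size) =>
        (key.GetHashCode() & 0x7FFFFFFF) % size;

    public void Add(TKey key, TValue value)
    {
        // Resize and rehash when the new element would reach the fill-factor limit.
        if ((double)(Count + 1) / slots.Length >= MaxFillFactor) Grow();

        int index = SlotOf(key, slots.Length);
        if (slots[index] == null)
        {
            // Empty slot: initialize its linked list (the chain).
            slots[index] = new LinkedList<KeyValuePair<TKey, TValue>>();
        }
        foreach (var pair in slots[index])
        {
            if (EqualityComparer<TKey>.Default.Equals(pair.Key, key))
                throw new ArgumentException("Key already exists.");
        }
        slots[index].AddLast(new KeyValuePair<TKey, TValue>(key, value));
        Count++;
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        var chain = slots[SlotOf(key, slots.Length)];
        if (chain != null)
        {
            foreach (var pair in chain)
            {
                if (EqualityComparer<TKey>.Default.Equals(pair.Key, key))
                {
                    value = pair.Value;
                    return true;
                }
            }
        }
        value = default(TValue);
        return false;
    }

    // Doubles the slot array and re-inserts every pair at its new index.
    private void Grow()
    {
        var old = slots;
        slots = new LinkedList<KeyValuePair<TKey, TValue>>[old.Length * 2];
        foreach (var chain in old)
        {
            if (chain == null) continue;
            foreach (var pair in chain)
            {
                int index = SlotOf(pair.Key, slots.Length);
                if (slots[index] == null)
                    slots[index] = new LinkedList<KeyValuePair<TKey, TValue>>();
                slots[index].AddLast(pair);
            }
        }
    }
}
```

Starting from capacity 4, the third Add reaches the 75% limit and doubles the table to 8 slots; all stored pairs remain reachable because each one is rehashed into the new array.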

Lab Exercise
Implement a Hash-Table with
Chaining as Collision Resolution

Implementing a Good Hash Function
• Hash-table performance depends on the probability of collisions
  - Fewer collisions → faster add / find / delete operations
• How to implement a good (efficient) hash function?
  - A good hash function should distribute the input values uniformly
  - The hash code calculation process should be fast
  - Integer n → use n as hash value (n % size as hash-table slot)
  - Real number r → use the bitwise representation of r
  - String s → use a formula over the Unicode representation of s
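The three rules for integers, real numbers and strings could look like this in C#. It is a sketch under assumptions: the class name SimpleHashes and the constants 17 and 31 are illustrative choices, and the string formula is a common polynomial scheme rather than the one .NET uses.

```csharp
using System;

public static class SimpleHashes
{
    // Integer n: use n itself as the hash value.
    public static int OfInt(int n) => n;

    // Real number r: hash its 64-bit binary representation.
    public static int OfDouble(double r) =>
        BitConverter.DoubleToInt64Bits(r).GetHashCode();

    // String s: a polynomial over the UTF-16 code units of s.
    // unchecked lets the arithmetic wrap around on overflow.
    public static int OfString(string s)
    {
        unchecked
        {
            int hash = 17;
            foreach (char c in s)
            {
                hash = hash * 31 + c;
            }
            return hash;
        }
    }
}
```

Because each character is multiplied into the running value, the order of the characters matters: "ab" and "ba" get different hash values, which a plain sum of character codes would not achieve.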

Built-In Hash Functions in C# / Java
• All C# objects already have a GetHashCode() method (in Java the equivalent is hashCode())
  - Primitive types like int, long, float, double, decimal, …
  - Built-in types like: string, DateTime and Guid
Hash function for System.String (the exact algorithm varies between .NET versions):
int c, hash1 = (5381 << 16) + 5381;
int hash2 = hash1;
char *s = src;
while ((c = s[0]) != 0) {
    hash1 = ((hash1 << 5) + hash1) ^ c;
    c = s[1];
    if (c == 0)
        break;
    hash2 = ((hash2 << 5) + hash2) ^ c;
    s += 2;
}
return hash1 + (hash2 * 1566083941);

Hash Functions on Composite Keys
• What if we have a composite key
  - E.g. FirstName + MiddleName + LastName?
1. Convert the keys to a string and get its hash code:
var key = string.Format("{0}-{1}-{2}", FirstName, MiddleName, LastName);
var hashCode = key.GetHashCode();
2. Use a custom hash-code function:
var hashCode = (this.FirstName != null ? this.FirstName.GetHashCode() : 0);
hashCode = (hashCode * 397) ^ (this.MiddleName != null ?
    this.MiddleName.GetHashCode() : 0);
hashCode = (hashCode * 397) ^ (this.LastName != null ?
    this.LastName.GetHashCode() : 0);
return hashCode;
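For a composite key to work in a hash table, its hash code has to agree with its equality check: two keys that are equal must produce the same hash code. A minimal sketch of a complete key class follows; the class name PersonName is hypothetical, and the multiplier 397 is taken from the snippet above.

```csharp
using System;

// Hypothetical composite key made of three name parts.
public sealed class PersonName : IEquatable<PersonName>
{
    public string FirstName { get; }
    public string MiddleName { get; }
    public string LastName { get; }

    public PersonName(string firstName, string middleName, string lastName)
    {
        FirstName = firstName;
        MiddleName = middleName;
        LastName = lastName;
    }

    // Two keys are equal when all three parts are equal.
    public bool Equals(PersonName other)
    {
        return other != null
            && FirstName == other.FirstName
            && MiddleName == other.MiddleName
            && LastName == other.LastName;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as PersonName);
    }

    // Combines the hash codes of the parts; null parts contribute 0.
    public override int GetHashCode()
    {
        unchecked
        {
            int hashCode = FirstName != null ? FirstName.GetHashCode() : 0;
            hashCode = (hashCode * 397) ^ (MiddleName != null ? MiddleName.GetHashCode() : 0);
            hashCode = (hashCode * 397) ^ (LastName != null ? LastName.GetHashCode() : 0);
            return hashCode;
        }
    }
}
```

Two separately created PersonName objects with the same three parts then behave as the same dictionary key.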

Hash Tables and Efficiency
• Hash table efficiency depends on:
  - Efficient hash functions
    - Most implementations use the built-in hash functions in C# / Java
    - Collisions should be as few as possible
  - Fill factor (used buckets / all buckets)
    - Typically 70% fill → resize and rehash
    - Avoid frequent resizing! Define the hash table capacity in advance
  - Collision resolution algorithm
    - Most implementations use chaining with a linked list
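Defining the capacity in advance can be done through the Dictionary<TKey,TValue> constructor that accepts an initial capacity. A small sketch, where the class and method names (CapacityDemo, BuildSquares) are hypothetical:

```csharp
using System.Collections.Generic;

public static class CapacityDemo
{
    // Builds a dictionary of n entries. Passing n as the initial capacity
    // lets the dictionary allocate enough room once instead of growing
    // and rehashing repeatedly while the entries are added.
    public static Dictionary<int, long> BuildSquares(int n)
    {
        var squares = new Dictionary<int, long>(n);
        for (int i = 0; i < n; i++)
        {
            squares.Add(i, (long)i * i);
        }
        return squares;
    }
}
```

The capacity is only a sizing hint: the dictionary still grows automatically if more elements are added later.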

Hash Tables and Efficiency (2)
• Hash tables are typically the most efficient dictionary implementation
  - Add / Find / Delete take just a few primitive operations
  - With a good hash function, speed does not depend on the size of the hash table
  - Amortized average-case complexity O(1) – constant time
• Example:
  - Finding an element in a hash table holding 1 000 000 elements takes on average just 1-2 steps
  - Finding an element in an unsorted array holding 1 000 000 elements takes on average 500 000 steps

Hash Tables in C#
The Dictionary<TKey,TValue> Class

Dictionaries: .NET Interfaces and Classes

Dictionary<TKey,TValue>
• Implements the ADT dictionary as a hash table
  - The size is dynamically increased as needed
  - Contains a collection of key-value pairs
  - Collisions are resolved by chaining
  - Elements have no guaranteed order: it depends on the internal layout of the table, so do not rely on it
• Dictionary<TKey,TValue> relies on:
  - Object.Equals() – compares the keys
  - Object.GetHashCode() – calculates the hash codes of the keys

Dictionary<TKey,TValue> (2)
• Major operations:
  - Add(key, value) – adds an element by key + value (throws an exception when the key already exists)
  - Remove(key) – removes an element by key (returns true / false)
  - this[key] = value – adds / replaces an element by key
  - this[key] – gets an element by key (throws an exception on a non-existing key)
  - Clear() – removes all elements
  - Count – returns the number of elements
  - Keys – returns a collection of all keys (in unspecified order)
  - Values – returns a collection of all values (in unspecified order)
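The difference between Add, the indexer and Remove is easiest to see side by side. The sketch below records what happens in a list of messages; the names DictionaryOperationsDemo and Run are hypothetical, while the dictionary calls and exception types are the standard .NET ones.

```csharp
using System;
using System.Collections.Generic;

public static class DictionaryOperationsDemo
{
    public static List<string> Run()
    {
        var log = new List<string>();
        var grades = new Dictionary<string, int>();

        grades.Add("Ivan", 4);      // new key: added
        grades["Ivan"] = 5;         // existing key: value replaced
        grades["Peter"] = 6;        // new key: added through the indexer

        // Add() refuses to overwrite an existing key.
        try { grades.Add("Ivan", 6); }
        catch (ArgumentException) { log.Add("Add: key already exists"); }

        // Reading a missing key through the indexer throws.
        try { log.Add("Maria: " + grades["Maria"]); }
        catch (KeyNotFoundException) { log.Add("Indexer: key not found"); }

        // Remove() reports whether anything was removed.
        log.Add("Remove(Ivan): " + grades.Remove("Ivan"));
        log.Add("Remove(Ivan) again: " + grades.Remove("Ivan"));
        log.Add("Count: " + grades.Count);
        return log;
    }
}
```

The returned log contains, in order: "Add: key already exists", "Indexer: key not found", "Remove(Ivan): True", "Remove(Ivan) again: False" and "Count: 1".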

Dictionary<TKey,TValue> (3)
• Major operations:
  - ContainsKey(key) – checks if a given key exists in the dictionary
  - ContainsValue(value) – checks whether the dictionary contains a given value
    - Warning: slow operation – O(n)
  - TryGetValue(key, out value)
    - If the key is found, returns true and stores the element in the value parameter
    - Otherwise returns false
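TryGetValue is handy when a missing key is an expected case rather than an error, because it needs a single lookup and no exception handling. A minimal sketch with a hypothetical helper (LookupHelpers.GetOrDefault):

```csharp
using System.Collections.Generic;

public static class LookupHelpers
{
    // Returns the value stored for key, or fallback when the key is missing.
    public static TValue GetOrDefault<TKey, TValue>(
        IDictionary<TKey, TValue> dictionary, TKey key, TValue fallback)
    {
        TValue value;
        return dictionary.TryGetValue(key, out value) ? value : fallback;
    }
}
```

For a dictionary that holds only { "Peter", 6 }, GetOrDefault(grades, "Peter", 2) yields 6 and GetOrDefault(grades, "Maria", 2) yields the fallback 2.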

Dictionary<TKey,TValue> – Example
var studentGrades = new Dictionary<string, int>();
studentGrades.Add("Ivan", 4);
studentGrades.Add("Peter", 6);
studentGrades.Add("Maria", 6);
studentGrades.Add("George", 5);
int peterGrade = studentGrades["Peter"];
Console.WriteLine("Peter's grade: {0}", peterGrade);
Console.WriteLine("Is Peter in the hash table: {0}",
    studentGrades.ContainsKey("Peter"));
Console.WriteLine("Students and their grades:");
foreach (var pair in studentGrades)
{
    Console.WriteLine("{0} --> {1}", pair.Key, pair.Value);
}

Dictionary<TKey,TValue>
Live Demo

Counting the Words in a Text
string text = "a text, some text, just some text";
var wordsCount = new Dictionary<string, int>();
// Skip the empty entries produced by adjacent separators such as ", "
string[] words = text.Split(
    new[] { ' ', ',', '.' }, StringSplitOptions.RemoveEmptyEntries);
foreach (string word in words)
{
    int count = 1;
    if (wordsCount.ContainsKey(word))
        count = wordsCount[word] + 1;
    wordsCount[word] = count;
}
foreach (var pair in wordsCount)
{
    Console.WriteLine("{0} -> {1}", pair.Key, pair.Value);
}

Counting the Words in a Text
Live Demo

Nested Data Structures
• Data structures can be nested, e.g. dictionary of lists: Dictionary<string, List<int>>
static Dictionary<string, List<int>> studentGrades =
    new Dictionary<string, List<int>>();
private static void AddGrade(string name, int grade)
{
    if (!studentGrades.ContainsKey(name))
    {
        studentGrades[name] = new List<int>();
    }
    studentGrades[name].Add(grade);
}

Nested Data Structures (2)
var countriesAndCities =
    new Dictionary<string, Dictionary<string, int>>();
countriesAndCities["Bulgaria"] = new Dictionary<string, int>();
countriesAndCities["Bulgaria"]["Sofia"] = 1000000;
countriesAndCities["Bulgaria"]["Plovdiv"] = 400000;
countriesAndCities["Bulgaria"]["Pernik"] = 30000;
foreach (var city in countriesAndCities["Bulgaria"])
{
    Console.WriteLine("{0} : {1}", city.Key, city.Value);
}
// Sum() is a LINQ extension method: requires "using System.Linq;"
var totalPopulation = countriesAndCities["Bulgaria"]
    .Sum(c => c.Value);
Console.WriteLine(totalPopulation);

Balanced Tree-Based Dictionaries
The SortedDictionary<TKey,TValue> Class

SortedDictionary<TKey,TValue>
• SortedDictionary<TKey,TValue> implements the ADT "dictionary" as a self-balancing search tree
  - Elements are arranged in the tree ordered by key
  - Traversing the tree returns the elements in increasing order of their keys
  - Add / Find / Delete take O(log n) steps
• Use SortedDictionary<TKey,TValue> when you need the elements sorted by key
  - Otherwise use Dictionary<TKey,TValue> – it has better performance

Counting Words (Again)
string text = "a text, some text, just some text";
IDictionary<string, int> wordsCount =
    new SortedDictionary<string, int>();
// Skip the empty entries produced by adjacent separators such as ", "
string[] words = text.Split(
    new[] { ' ', ',', '.' }, StringSplitOptions.RemoveEmptyEntries);
foreach (string word in words)
{
    int count = 1;
    if (wordsCount.ContainsKey(word))
        count = wordsCount[word] + 1;
    wordsCount[word] = count;
}
foreach (var pair in wordsCount)
{
    Console.WriteLine("{0} -> {1}", pair.Key, pair.Value);
}

Comparing Dictionary Keys
Using Custom Key Classes in
Dictionary<TKey, TValue> and
SortedDictionary<TKey,TValue>

IComparable<T>
• Dictionary<TKey,TValue> relies on
  - Object.Equals() – for comparing the keys
  - Object.GetHashCode() – for calculating the hash codes of the keys
• SortedDictionary<TKey,TValue> relies on IComparable<T> for ordering the keys
• Built-in types like int, long, float, string and DateTime already implement Equals(), GetHashCode() and IComparable<T>
• Other types, when used as dictionary keys, should provide custom implementations

Implementing Equals() and GetHashCode()
public struct Point
{
    public int X { get; set; }
    public int Y { get; set; }
    public override bool Equals(Object obj)
    {
        // "is" already returns false for null, so no separate null check is needed
        if (!(obj is Point)) return false;
        Point p = (Point)obj;
        return (X == p.X) && (Y == p.Y);
    }
    public override int GetHashCode()
    {
        return (X << 16 | X >> 16) ^ Y;
    }
}

Implementing IComparable<T>
public struct Point : IComparable<Point>
{
    public int X { get; set; }
    public int Y { get; set; }
    public int CompareTo(Point otherPoint)
    {
        if (X != otherPoint.X)
        {
            return this.X.CompareTo(otherPoint.X);
        }
        else
        {
            return this.Y.CompareTo(otherPoint.Y);
        }
    }
}
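Both pieces can live in one type, so that the same key works in Dictionary<TKey,TValue> and in SortedDictionary<TKey,TValue>. The sketch below uses a separate, hypothetical struct named GridPoint (and a helper class GridPointDemo) to stay self-contained; the hash formula with the multiplier 397 is an illustrative choice.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type that supports both hashing and ordering.
public struct GridPoint : IEquatable<GridPoint>, IComparable<GridPoint>
{
    public int X { get; }
    public int Y { get; }

    public GridPoint(int x, int y)
    {
        X = x;
        Y = y;
    }

    public bool Equals(GridPoint other) => X == other.X && Y == other.Y;

    public override bool Equals(object obj) => obj is GridPoint && Equals((GridPoint)obj);

    public override int GetHashCode() => unchecked((X * 397) ^ Y);

    // Orders by X first, then by Y.
    public int CompareTo(GridPoint other) =>
        X != other.X ? X.CompareTo(other.X) : Y.CompareTo(other.Y);
}

public static class GridPointDemo
{
    // Returns the distinct points in increasing key order.
    public static List<GridPoint> SortedKeys(IEnumerable<GridPoint> points)
    {
        var sorted = new SortedDictionary<GridPoint, string>();
        foreach (var p in points)
        {
            sorted[p] = "(" + p.X + ", " + p.Y + ")";
        }
        return new List<GridPoint>(sorted.Keys);
    }
}
```

Passing the points (2,1), (1,5), (1,2), (2,1) to SortedKeys yields (1,2), (1,5), (2,1): the duplicate collapses into one key, and the rest come back ordered by X and then by Y.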

Set and Bag ADTs
• The abstract data type (ADT) "set" keeps a set of elements with no duplicates
  - Sets with duplicates are also known as ADT "bag"
• Set operations:
  - Add(element)
  - Contains(element) → true / false
  - Delete(element)
  - Union(set) / Intersect(set)
• Sets can be implemented in several ways
  - List, array, hash table, balanced tree, ...

Sets: .NET Interfaces and Implementations

HashSet<T>
• HashSet<T> implements the ADT set by a hash table
  - Elements are in no particular order
• All major operations are fast:
  - Add(element) – adds an element to the set
    - Does nothing if the element already exists
  - Remove(element) – removes a given element
  - Count – returns the number of elements
  - UnionWith(set) / IntersectWith(set) – performs union / intersection with another set
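Two details of these operations are worth seeing in code: Add() reports through its return value whether the element was new, and IntersectWith() modifies the set it is called on. A short sketch with hypothetical helper names (HashSetDemo, AddAll, Intersection):

```csharp
using System.Collections.Generic;

public static class HashSetDemo
{
    // Adds the items and returns how many were actually new.
    // HashSet<T>.Add returns false (and changes nothing) for a duplicate.
    public static int AddAll(HashSet<string> set, params string[] items)
    {
        int added = 0;
        foreach (string item in items)
        {
            if (set.Add(item)) added++;
        }
        return added;
    }

    // Returns a new set with the elements present in both inputs.
    // A copy is made first because IntersectWith changes its receiver.
    public static HashSet<string> Intersection(
        IEnumerable<string> first, IEnumerable<string> second)
    {
        var result = new HashSet<string>(first);
        result.IntersectWith(second);
        return result;
    }
}
```

Adding "SQL", "Java", "SQL" to an empty set reports 2 new elements, and intersecting { SQL, Java, C#, PHP } with { Oracle, SQL, MySQL } leaves only SQL.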

HashSet<T> – Example
ISet<string> firstSet = new HashSet<string>(
    new string[] { "SQL", "Java", "C#", "PHP" });
ISet<string> secondSet = new HashSet<string>(
    new string[] { "Oracle", "SQL", "MySQL" });
ISet<string> union = new HashSet<string>(firstSet);
union.UnionWith(secondSet);
foreach (var element in union)
{
    Console.Write("{0} ", element);
}
Console.WriteLine();

SortedSet<T>
• SortedSet<T> implements the ADT set by a balanced search tree (red-black tree)
  - Elements are sorted in increasing order
• Example:
ISet<string> firstSet = new SortedSet<string>(
    new string[] { "SQL", "Java", "C#", "PHP" });
ISet<string> secondSet = new SortedSet<string>(
    new string[] { "Oracle", "SQL", "MySQL" });
// The union must also be a SortedSet<T> to keep the elements sorted
ISet<string> union = new SortedSet<string>(firstSet);
union.UnionWith(secondSet);
Console.WriteLine(string.Join(" ", union)); // C# Java MySQL Oracle PHP SQL

HashSet<T> and SortedSet<T>
Live Demo

Dictionaries and Sets Comparison

Data Structure                           Internal Structure       Time Complexity (Add/Update/Delete)
Dictionary<K,V>, HashSet<K>              Hash table               O(1)
SortedDictionary<K,V>, SortedSet<K>      Balanced search tree     O(log(n))

Summary
• Dictionaries map keys to values
  - Can be implemented as a hash table or a balanced search tree
• Hash tables map keys to values
  - Rely on hash functions to distribute the keys in the table
  - Collisions need a resolution algorithm (e.g. chaining)
  - Very fast add / find / delete – O(1)
• Sets hold a group of elements
  - Hash-table or balanced tree implementations

Questions?
Dictionaries, Hash Tables and Sets
https://softuni.bg/courses/data-structures/

License
• This course (slides, examples, labs, videos, homework, etc.) is licensed under the "Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International" license
• Attribution: this work may contain portions from
  - "Fundamentals of Computer Programming with C#" book by Svetlin Nakov & Co. under CC-BY-SA license
  - "Data Structures and Algorithms" course by Telerik Academy under CC-BY-NC-SA license

Free Trainings @ Software University
• Software University Foundation – softuni.org
• Software University – High-Quality Education, Profession and Job for Software Developers
  - softuni.bg
• Software University @ Facebook
  - facebook.com/SoftwareUniversity
• Software University @ YouTube
  - youtube.com/SoftwareUniversity
• Software University Forums – forum.softuni.bg
