WEBVTT

00:00.090 --> 00:04.920
Hello again! In this video, we are going to look at unordered associative containers.

00:06.290 --> 00:12.920
Associative containers store their elements in an order which depends on the key. For containers like

00:12.920 --> 00:16.190
set and map, these use a tree.

00:16.940 --> 00:21.590
The less-than operator of the element decides where the element goes, in the tree.

00:22.520 --> 00:30.050
C++ 11 introduced so-called "unsorted" or unordered associative containers, and these use a hash table

00:30.350 --> 00:31.250
instead of a tree.

00:34.130 --> 00:39.860
If you are not sure about hash tables, I will try and explain them fairly quickly. In C++ terms,

00:40.310 --> 00:45.760
the hash table is an array of so-called "buckets". And these buckets could be a linked list or a vector

00:45.770 --> 00:54.320
or some other sequential container. If we have, say, a lot of random numbers between one and 100,

00:54.770 --> 00:57.200
and we want to put them into a hash table,

00:57.650 --> 01:03.130
we would arrange them into buckets. So we could say, for example, random numbers between one and 10

01:03.150 --> 01:06.380
go into the first bucket. Between 11 and 20,

01:06.410 --> 01:12.320
they go into the second bucket and so on. Up to 91 to 100, which goes into the last bucket.

01:13.990 --> 01:16.030
How do we know which bucket a number goes into?

01:16.750 --> 01:18.910
We can work it out. If it is between one and 10,

01:18.940 --> 01:21.040
it goes into bucket with index zero.

01:21.640 --> 01:25.300
So we need some way to map "between one and 10" into zero.

01:26.020 --> 01:30.490
We can do that by subtracting one and dividing by 10. An integer division.

01:31.820 --> 01:37.340
That is known as a hash function. So it calculates the so-called hash of the number. And then that

01:37.340 --> 01:40.220
hash number is also the index into the array.

01:41.280 --> 01:46.980
So, for example, four. 4 - 3 is 1, 3 divided by 10 is zero - integer division.

01:47.430 --> 01:49.920
So it goes into the element with index 0.

01:51.270 --> 01:54.300
So let's look at a fairly simple hash map.

01:55.350 --> 01:57.810
This is our array of linked lists.

01:58.350 --> 01:59.680
So those are going to be the buckets.

01:59.700 --> 02:00.660
There's 10 of them.

02:01.560 --> 02:03.650
We are going to be generating random numbers.

02:03.660 --> 02:07.830
So there is our engine and the distribution, between 1 and 100.

02:09.580 --> 02:13.850
And then we have our loop. I am going to use one 150 random numbers, because that more

02:13.870 --> 02:18.520
or less fits onto the screen. Then we get a random number on each iteration.

02:19.180 --> 02:22.870
Then we work out its hash, so there is our "quote" hash function.

02:24.590 --> 02:29.810
So this hash number is going to be the index into this array. And then we go to the bucket which has

02:29.810 --> 02:33.620
that index, and we push the random number onto the linked list.

02:34.430 --> 02:39.670
And then we do that for all the random numbers. And then we can iterate over all the elements in the

02:39.690 --> 02:43.820
array, and we can print each of the numbers from the linked list in that element.

02:44.720 --> 02:45.710
So let's see what we get.

02:48.150 --> 02:52.470
Well, they do not actually quite fit on the screen, so maybe I should use less than 150!

02:53.270 --> 02:57.930
But you can see we have elements between 11 and 20 in bucket 1, and so on.

02:58.680 --> 03:01.920
If we want to find an element in our hash map, how do we do that?

03:02.430 --> 03:04.250
Well, that's pretty straightforward, actually.

03:04.860 --> 03:10.410
We are going to call the find() algorithm. So we need the <algorithm> header. Then we create the same hash

03:10.410 --> 03:10.950
map again.

03:13.730 --> 03:18.140
Then let's say, for example, we want to find the number 43.

03:18.260 --> 03:19.730
So we make that our target.

03:20.690 --> 03:24.530
We calculate the hash of 43, using our hash function.

03:25.520 --> 03:29.540
So that should be, 43 minus 1 is 42. 42

03:29.540 --> 03:31.930
divided by 10 is 4.

03:32.660 --> 03:34.610
So we are going to look in bucket number 4.

03:35.450 --> 03:41.360
So we pass the iterators for that bucket as the arguments to find(), and the target that we are looking

03:41.360 --> 03:41.660
for.

03:44.180 --> 03:48.770
So we are looking in buckets 4, apparently and yes, 43 is in there.

03:49.730 --> 03:51.230
And yes, we have found it.

03:55.510 --> 04:02.290
So in our simple hash map, we had an array of linked lists, and the linked lists contain a copy of

04:02.290 --> 04:07.270
the numbers. In the actual C++11 unordered container implementation,

04:07.570 --> 04:10.540
those would be pointers, so they actually have pointers to the elements.

04:10.930 --> 04:16.180
So we have an array of lists of pointers to elements. Got that? :)

04:18.500 --> 04:22.550
So all the details of the hash map are actually implemented for us by the container. So we do not

04:22.550 --> 04:27.200
actually need to compute hashes or search around in linked lists ourselves.

04:28.350 --> 04:31.740
The index of the array is calculated from the element.

04:32.430 --> 04:37.020
There is the hash function, which will generate a number based on the key, which is the hash of the

04:37.020 --> 04:37.380
key.

04:37.950 --> 04:43.320
So that is the index into the array. And usually, this should be a very fast operation.

04:44.340 --> 04:47.940
And then the hash of that key is used as the index into the array.

04:49.540 --> 04:54.250
To insert an element, we calculate the hash number for its key.

04:55.000 --> 04:59.550
Then we find the bucket which corresponds to that hash number, and then we go to the linked list

04:59.560 --> 05:00.310
for that bucket.

05:00.790 --> 05:03.820
And we push a pointer to the new element, onto that list.

05:06.240 --> 05:08.850
To search for an element, we calculate its hash number again.

05:09.270 --> 05:15.300
We go to the bucket with that hash number, and then it searches the bucket for the element or elements

05:15.630 --> 05:16.560
which have that key.

05:17.040 --> 05:22.710
And this will use the equality operator of the key, and not the less than operator which the ordered

05:22.950 --> 05:23.790
containers use.

05:28.890 --> 05:34.320
The most efficient hash table is one in which each element has its own bucket, so there is no need

05:34.320 --> 05:36.360
to process linked lists.

05:37.020 --> 05:42.240
So that means that each key will produce a different hash value, and that requires a hash function which

05:42.240 --> 05:44.580
can perform a so-called "perfect" hashing.

05:48.330 --> 05:53.610
In the real world, perfection is difficult or even impossible to attain, and often very expensive.

05:53.880 --> 05:57.900
So in practice, people usually write hash functions which give the same results for different keys.

05:58.500 --> 06:03.630
And that's known as a "hash collision". And generally that makes the hash map less efficient, because you have

06:03.630 --> 06:05.940
to process a linked list instead of a single element.

06:06.450 --> 06:08.730
On the other hand, the hashing function is much faster.

06:09.390 --> 06:12.150
So it is a compromise, as is often the case in the real world.

06:13.650 --> 06:18.690
In C++ containers, we have multimaps and multisets. And we can have several elements which have

06:18.690 --> 06:19.230
the same key.

06:19.260 --> 06:22.050
So we are definitely going to get collisions there.

06:23.720 --> 06:29.660
And, in general, the more elements there are in the bucket, the longer it will take to perform operations

06:29.990 --> 06:30.800
on that bucket.

06:33.420 --> 06:38.940
So, as far as actually using unordered containers is concerned, they are very similar to using the ordered

06:38.940 --> 06:39.460
versions.

06:40.020 --> 06:45.600
For example, if we want unordered map, so a map which uses a hash table instead of a tree.

06:46.350 --> 06:48.360
We just include the <unordered_map> header.

06:48.990 --> 06:54.150
We create an unordered_map object, and then we can call all the usual member functions.

06:54.630 --> 07:00.060
So, for example, with insert(), again, the keys have to be unique, so duplicates will be ignored.

07:01.350 --> 07:03.750
So we get one element with the key "Graham".

07:04.950 --> 07:07.380
If we use an unordered multimap.

07:11.470 --> 07:15.940
Then we can have duplicate keys. And we have three elements with the key "Graham".

07:16.990 --> 07:18.770
Okay, so that is it for this video.

07:19.240 --> 07:22.380
I will see you next time, but until then, keep coding!
