WEBVTT

00:00.150 --> 00:00.660
Hello again!

00:00.990 --> 00:03.090
In this video, we are going to look at unions.

00:04.350 --> 00:07.500
Unions are inherited from the C programming language.

00:08.370 --> 00:10.740
A union is a compound data structure.

00:11.100 --> 00:14.910
It is similar to a struct in some ways, and different in some ways as well.

00:15.870 --> 00:18.630
Each member of a union can have a different type.

00:18.990 --> 00:21.390
Two members may share a type, although they usually differ.

00:22.500 --> 00:28.740
All the members of a union are stored at the same address in memory, and only one member of a union

00:28.950 --> 00:31.080
can be in use at any one time.

00:32.110 --> 00:33.220
So what does this mean?

00:34.900 --> 00:36.670
Let's try and visualize this.

00:37.150 --> 00:42.760
If we imagine we have a struct whose members are character, int and double, the character will be stored

00:42.760 --> 00:44.800
at the beginning of the struct memory.

00:45.550 --> 00:49.960
The int member will follow the character, and the double will come after the int.

00:50.080 --> 00:52.000
So they all come after each other in memory.

00:53.110 --> 00:56.020
The compiler may also put in some so-called "padding" bytes.

00:56.410 --> 00:57.820
These keep the members aligned, so that the program

00:57.820 --> 01:02.050
can read the int as a whole word and not byte by byte.

01:02.320 --> 01:03.550
And that is much more efficient.

01:04.210 --> 01:07.840
There may also be some padding bytes here as well, depending on the architecture.

01:09.670 --> 01:15.400
If we have a union with the same members, then the character will again begin at the start of the union's

01:15.430 --> 01:15.850
memory.

01:16.660 --> 01:22.060
The int will also begin at the start of the union's memory, and its storage will overlap the character

01:22.060 --> 01:23.850
storage. And the double

01:23.870 --> 01:29.770
will also begin at the start of the union's memory and its storage will overlap the character and

01:29.770 --> 01:30.280
the int.
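
NOTE
A minimal sketch of the two layouts just described. The names Record and Value
are hypothetical, and the exact struct offsets depend on the compiler and the
architecture, so the helpers only expose values you can inspect yourself.
```cpp
#include <cstddef>  // offsetof, std::size_t
// Hypothetical struct: the members are laid out one after another.
struct Record {
    char c;
    int i;
    double d;
};
// Hypothetical union: every member starts at the same address.
union Value {
    char c;
    int i;
    double d;
};
// Offset of each struct member from the start of the struct.
// Padding may push a member further along than the sizes before it suggest.
inline std::size_t record_offset_c() { return offsetof(Record, c); }
inline std::size_t record_offset_i() { return offsetof(Record, i); }
inline std::size_t record_offset_d() { return offsetof(Record, d); }
// True if all three union members share the address of the union itself.
inline bool value_members_overlap() {
    Value v;
    v.c = 'a';
    const void* base = &v;
    return base == static_cast<const void*>(&v.c) &&
           base == static_cast<const void*>(&v.i) &&
           base == static_cast<const void*>(&v.d);
}
```
On a typical 64-bit platform sizeof(Record) is 16 and sizeof(Value) is 8,
but neither figure is guaranteed.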

01:32.420 --> 01:34.760
So you may think this is a trick for saving memory.

01:35.060 --> 01:40.340
In fact, the main application of unions is when you are processing data which could be one of

01:40.340 --> 01:41.030
several types.

01:41.330 --> 01:46.100
And that comes in useful if you are parsing source code, for example, or processing data that you have

01:46.100 --> 01:47.240
received over the network.

01:50.040 --> 01:52.260
All the members of a union are public by default.

01:52.380 --> 01:55.530
The same as a struct. In older versions of C++,

01:55.530 --> 02:01.530
the members of a union had to be simple data types: no constructor, copy constructor, assignment

02:01.530 --> 02:02.880
operator or destructor.

02:03.930 --> 02:08.040
So that means, for example, that you cannot have a string as a union member. Which is pretty limiting,

02:08.400 --> 02:13.190
because strings come up all the time when you are processing data. In C++11,

02:13.230 --> 02:18.720
this was relaxed a bit, but it is a bit messy, because the compiler will not generate calls to things

02:18.720 --> 02:19.500
like destructors.

02:19.890 --> 02:23.070
So you have to call them yourself. And I am not going to go into that here.

02:24.450 --> 02:30.330
Unions can have member functions, but they cannot have virtual member functions. And you cannot use

02:30.330 --> 02:33.060
a union as a base or a derived class.
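
NOTE
A small sketch of the rules above, assuming a hypothetical union named Number.
It has a constructor and ordinary member functions, and its members are public
by default. The two forbidden cases are left as comments because they do not
compile.
```cpp
#include <type_traits>  // std::is_union
// Hypothetical union with a constructor and ordinary member functions.
union Number {
    int i;
    double d;
    Number() : i(0) {}                          // starts with the int member in use
    void set_int(int value) { i = value; }
    void set_double(double value) { d = value; }
    int as_int() const { return i; }            // only valid while i is in use
    double as_double() const { return d; }      // only valid while d is in use
    // virtual void print();                    // error: no virtual member functions
};
// struct Derived : Number {};                  // error: a union cannot be a base class
static_assert(std::is_union<Number>::value, "Number is a union");
```
Because the members are public, code outside the union can also write n.i = 7
directly on a Number object called n.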

02:36.340 --> 02:37.840
To use a union,

02:37.840 --> 02:40.150
we start by assigning to one of the members.

02:40.540 --> 02:45.430
So we could assign to the character member for example, and that means that the character member is

02:45.430 --> 02:46.090
now in use.

02:46.720 --> 02:51.810
So the character member will have a well-defined value, and the int and the double will have undefined values.

02:53.110 --> 02:58.480
The only safe things we can do are to read the character member, or to assign to one of the other

02:58.690 --> 02:59.170
members.

02:59.500 --> 03:04.060
And then, that member will come into play and the character member will have an undefined value.
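
NOTE
A minimal sketch of that sequence. The union name Token and the helper
switch_members are hypothetical. Every read is from the member that was
assigned most recently.
```cpp
// Hypothetical union with the three members used in this video.
union Token {
    char c;
    int i;
    double d;
};
// Assigns to one member, reads it back, then switches to another member.
inline double switch_members() {
    Token t;
    t.c = 'A';          // the char member is now in use
    char seen = t.c;    // safe: reads the member in use
    t.d = seen + 0.5;   // the double member is now in use; t.c is no longer valid
    return t.d;         // safe: reads the member in use
}
```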

03:06.120 --> 03:07.050
Let's have an example.

03:07.530 --> 03:09.010
So we have a union.

03:09.030 --> 03:11.170
The members are character, int and double.

03:12.420 --> 03:14.160
We create an object of this union.

03:14.580 --> 03:16.230
And then we assign to the character member.

03:16.560 --> 03:22.740
So the character member is now in use. And it has a well-defined value, which is the letter, capital

03:22.740 --> 03:23.040
'Z'.

03:24.270 --> 03:28.680
If we try to access the double member, this will have an undefined value.

03:29.580 --> 03:35.190
What will happen is that the program tries to interpret the memory used by the union as a double.

03:35.970 --> 03:41.730
The first byte of this memory will contain the ASCII code for capital 'Z' and the rest of the bytes will

03:41.730 --> 03:43.230
contain - well, who knows?

03:44.100 --> 03:46.410
So we will probably get some kind of garbage number.

03:48.030 --> 03:49.080
Like that, for example.
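
NOTE
A sketch of the example above, assuming the union is called Token. The read of
the double member is left as a comment, because in C++ reading a member that is
not in use is undefined behavior. The helper first_byte_after_char_write is
hypothetical and only inspects the first byte of the union's memory.
```cpp
// The union from the example; the name Token is an assumption.
union Token {
    char c;
    int i;
    double d;
};
// Stores 'Z' in the char member and returns the first byte of the union's
// memory. Inspecting an object's bytes through unsigned char is permitted.
inline unsigned char first_byte_after_char_write() {
    Token t;
    t.c = 'Z';                      // the char member is now in use
    // double bad = t.d;            // undefined behavior: d is not in use
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&t);
    return bytes[0];                // the stored character code for 'Z'
}
```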

03:51.750 --> 03:54.210
So this is fairly typical of the C legacy.

03:54.630 --> 03:55.380
It is low level.

03:55.890 --> 04:00.000
It is easy to make bad mistakes, and it is difficult to use safely.

04:00.660 --> 04:05.190
The main problem here is that the programmer is expected to know which type is in use.

04:05.940 --> 04:09.360
If you are working on a small program on your own, that is not really a problem.

04:09.990 --> 04:14.910
If you have a large code base with several programmers working on it, then sooner or later there will be

04:14.910 --> 04:15.450
a mistake.

04:16.830 --> 04:21.030
One of the things you can do is to add a so-called "tag" member to the union.

04:21.450 --> 04:25.650
So this is an extra member, which will just keep track of which member is in use.

04:26.160 --> 04:27.930
And that is known as a tagged union.

04:30.070 --> 04:31.180
So let's try that out.

04:31.660 --> 04:37.480
We start off by creating an enumeration, with a value for each allowed type.

04:37.480 --> 04:39.280
So we have character, int and double.

04:40.330 --> 04:43.270
Then we have an extra member, which is this token type.

04:44.500 --> 04:50.010
If we are going to bring the character member into use, then we need to set the token type to character.

04:50.920 --> 04:55.210
And then later on, if we want to use the double member, we can just check this.

04:55.570 --> 04:58.270
And if the token type is double, then we know it is safe to use this.

04:58.810 --> 05:01.870
And if it is not, then we have avoided something which is dangerous.
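
NOTE
A minimal sketch of a tagged union along these lines. Placing the tag in a
struct beside an anonymous union is an assumption, as are the names Token,
TokenType, make_char, make_double and try_get_double.
```cpp
// One enumerator for each type a token can hold.
enum class TokenType { Char, Int, Double };
// Hypothetical tagged union: the tag sits beside an anonymous union.
struct Token {
    TokenType type;
    union {
        char c;
        int i;
        double d;
    };
};
// Brings the char member into use and records that in the tag.
inline Token make_char(char value) {
    Token t;
    t.type = TokenType::Char;
    t.c = value;
    return t;
}
// Brings the double member into use and records that in the tag.
inline Token make_double(double value) {
    Token t;
    t.type = TokenType::Double;
    t.d = value;
    return t;
}
// Copies the double into out only if the tag says the double is in use.
inline bool try_get_double(const Token& t, double& out) {
    if (t.type != TokenType::Double) {
        return false;   // some other member is in use: do not touch t.d
    }
    out = t.d;
    return true;
}
```
The check in try_get_double is only as reliable as the code that set the tag.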

05:05.490 --> 05:08.130
And there we are: we get a much more sensible result this time.

05:10.300 --> 05:15.080
But there is still a problem: we have to rely on the programmer who assigned to the character member.

05:15.340 --> 05:18.670
We have to assume that they also remembered to set the token type.

05:19.570 --> 05:22.350
We will come back to this in the next video. But

05:22.780 --> 05:24.220
meanwhile, that is all for this one.

05:24.610 --> 05:25.420
I will see you next time.

05:25.420 --> 05:27.700
But until then, keep coding!
