1
00:00:03,920 --> 00:00:08,580
Hello, bonjour, zdravstvuyte, mambo,

2
00:00:09,570 --> 00:00:15,700
welcome to the fourth tutorial of Statistical Mechanics: Algorithms and Computations

3
00:00:15,700 --> 00:00:19,700
from the Physics Department of Ecole Normale Superieure.

4
00:00:19,700 --> 00:00:23,280
This week we will study random numbers

5
00:00:23,280 --> 00:00:30,170
and the sampling of discrete and continuous one-dimensional distributions.

6
00:00:30,650 --> 00:00:35,140
You need to understand these relatively simple problems

7
00:00:35,140 --> 00:00:38,610
in order to attack the more complex

8
00:00:38,610 --> 00:00:42,610
sampling problem that we are interested in, for their physical applications.

9
00:00:43,280 --> 00:00:49,960
For three weeks already, we heavily relied on random numbers Upsilon

10
00:00:49,960 --> 00:00:55,730
generated through Upsilon=random.uniform(0,1)

11
00:00:56,910 --> 00:01:03,300
This means that we called on the module 'random' of Python

12
00:01:03,300 --> 00:01:07,460
and use the function 'uniform' between 0 and 1

13
00:01:07,460 --> 00:01:14,950
to generate random numbers between 0 and 1, uniformly distributed with a flat distribution.

14
00:01:15,890 --> 00:01:21,630
We left out many things, for example how these random numbers

15
00:01:21,630 --> 00:01:23,690
are generated

16
00:01:23,690 --> 00:01:27,690
on a computer, the most deterministic of all devices

17
00:01:27,690 --> 00:01:30,520
and whether we can trust them.

18
00:01:32,450 --> 00:01:36,860
To get an idea of what random numbers really are

19
00:01:36,860 --> 00:01:43,470
let us consider a baby version of a random number generator:

20
00:01:43,470 --> 00:01:47,470
it is called the congruential linear random number generator.

21
00:01:48,440 --> 00:01:53,190
Starting from a seed, an initial integer value,

22
00:01:53,190 --> 00:01:56,340
it generates a deterministic

23
00:01:56,340 --> 00:02:00,720
but seemingly erratic sequence of integers,

24
00:02:02,060 --> 00:02:07,950
that are folded back into a given interval through a modulo operation.

25
00:02:09,360 --> 00:02:18,200
It is this modulo operation that gives the name congruential to this and this whole class of generators

26
00:02:19,620 --> 00:02:30,190
Then, these integer numbers are divided by a large number, to give the familiar objects

27
00:02:30,190 --> 00:02:34,190
the random floats between 0 and 1.

28
00:02:37,110 --> 00:02:40,510
Of course, this program is deterministic

29
00:02:41,440 --> 00:02:45,000
and if we run it several times

30
00:02:45,740 --> 00:02:49,000
we will obtain exactly the same output.

31
00:02:51,450 --> 00:02:55,670
Now, we can set the initial seed

32
00:02:55,670 --> 00:02:59,670
equal for example to the system time

33
00:03:00,400 --> 00:03:06,430
and then the output of our random number generator

34
00:03:06,430 --> 00:03:11,740
will be a different sequence at each time that we run it.

35
00:03:12,740 --> 00:03:22,640
This is also done in the random.uniform that you have in Python, unless you take special precautions.

36
00:03:23,290 --> 00:03:30,410
Let us briefly discuss the properties of our naive generator, naive_ran.py

37
00:03:30,980 --> 00:03:35,610
but as it always more fun to find answers by yourself

38
00:03:35,610 --> 00:03:37,910
than to have them told to you

39
00:03:37,910 --> 00:03:42,630
I ask you to download the program from the Coursera website

40
00:03:42,630 --> 00:03:45,630
and then to answer the questions.

41
00:03:45,630 --> 00:03:49,630
You may have to run the program in order to find them.

42
00:03:51,250 --> 00:03:53,630
So, question 1:

43
00:03:53,630 --> 00:04:00,850
the program naive_ran.py generates a sequence of integers idum

44
00:04:01,310 --> 00:04:05,840
and of floats idum / float(m)

45
00:04:07,580 --> 00:04:12,230
what is the range of the integers idum?

46
00:04:16,580 --> 00:04:22,610
Well, the answer is clearly that the integers idum

47
00:04:22,870 --> 00:04:28,390
are contained in an interval including 0 and (m-1).

48
00:04:29,200 --> 00:04:33,600
This is done by the modulo operation: modulo m

49
00:04:33,600 --> 00:04:37,600
Now question 2:

50
00:04:39,330 --> 00:04:47,620
we have understood that the integers idum are between 0 and (m-1)

51
00:04:49,720 --> 00:04:56,690
does the generator generate all the numbers between 0 and (m-1)?

52
00:04:57,680 --> 00:05:01,210
or does it leave out a few of them? Are there holes?

53
00:05:05,580 --> 00:05:09,990
Well, as you may find out for yourself,

54
00:05:10,390 --> 00:05:19,620
for example by sorting the output of the random number generator along the first column,

55
00:05:20,060 --> 00:05:27,350
you see that all the numbers between 0 and (m-1) are represented.

56
00:05:28,570 --> 00:05:30,370
Question 3:

57
00:05:31,920 --> 00:05:38,760
we have seen that the sequence of random integers idum is bounded

58
00:05:40,210 --> 00:05:47,370
so sure enough after certain time we'll hit a number that we have seen before

59
00:05:47,370 --> 00:05:53,810
what happens then? Run the program and see for yourself.

60
00:05:57,140 --> 00:06:02,870
Well, this answer is very simple: the output sequence is periodic

61
00:06:03,500 --> 00:06:10,560
if you hit a number a certain time, you'll generate the same sequence as you did the first time

62
00:06:11,610 --> 00:06:19,860
it is this, plus the fact that there are no holes in the distribution that makes that this distribution is totally flat.

63
00:06:21,910 --> 00:06:27,460
You can learn about the Python generator by reading the documentation

64
00:06:27,460 --> 00:06:31,460
or by running this little program.

65
00:06:31,460 --> 00:06:35,460
In our applications, we need to sample

66
00:06:35,460 --> 00:06:44,380
integers and real, but also physical objects, from a general distribution pi(x)

67
00:06:46,100 --> 00:06:51,100
this is of course what the Markov chain method is all about

68
00:06:52,290 --> 00:07:02,960
but the direct sampling of discrete and one-dimensional continuous systems is an important subject

69
00:07:02,960 --> 00:07:06,960
that Michael will next treat in this tutorial.

70
00:07:08,460 --> 00:07:14,940
Before continuing, please take a moment to download, run and modify the program

71
00:07:14,940 --> 00:07:18,180
naive_ran.py, if you haven't done so already

72
00:07:18,180 --> 00:07:26,230
and also to familiarize yourself with basic usage of random numbers by running this program.

73
00:07:31,180 --> 00:07:35,590
Throughout this course we use the Metropolis algorithm

74
00:07:35,590 --> 00:07:39,590
many times, as Werner explained in the first lecture

75
00:07:39,590 --> 00:07:45,180
the Metropolis acceptance rate is given by pi(a->b)

76
00:07:45,180 --> 00:07:50,020
= minimum of 1 and the ratio of the statistical weights

77
00:07:50,020 --> 00:07:55,660
pi(b) and pi(a) of the states b and a

78
00:07:56,630 --> 00:08:03,670
But what did we actually do in order to decide whether or not a particular move is accepted or rejected?

79
00:08:04,160 --> 00:08:08,860
This can be determined by comparing the acceptance rate

80
00:08:08,860 --> 00:08:11,880
to a random number between 0 and 1.

81
00:08:11,880 --> 00:08:19,560
If the random number is smaller than the acceptance rate the move is accepted, and otherwise it is rejected

82
00:08:19,560 --> 00:08:23,270
Of course we do not even need to make this comparison

83
00:08:23,270 --> 00:08:26,660
if the ratio of the statistical weights is larger than one

84
00:08:26,660 --> 00:08:35,630
cause then - according to Metropolis - the acceptance rate will be 1 and the move is certainly accepted.

85
00:08:36,340 --> 00:08:42,110
We have already used the Metropolis algorithm to sample non-constant probabilities.

86
00:08:42,110 --> 00:08:46,460
For example in last week's homework (homework number 3)

87
00:08:46,730 --> 00:08:51,320
we sampled non-constant distributions of hard-disks

88
00:08:51,870 --> 00:08:55,320
but now let us use the Metropolis algorithm

89
00:08:55,320 --> 00:09:01,970
to sample a simple one-dimensional function, in fact the Gaussian normal distribution

90
00:09:02,350 --> 00:09:07,610
How does this look in Python? We start at x=0

91
00:09:07,610 --> 00:09:13,860
then we propose a random move with a step size between (-delta) and delta

92
00:09:14,410 --> 00:09:19,870
the acceptance rate of this proposed move, according to Metropolis, is shown here

93
00:09:19,870 --> 00:09:25,500
and we have really understood that the minimum does not need to appear in the code.

94
00:09:26,950 --> 00:09:32,530
As we see, the algorithm samples the Gaussian distribution nicely

95
00:09:32,890 --> 00:09:35,980
but it much better to choose another route.

96
00:09:35,980 --> 00:09:39,370
We implement a direct sampling algorithm

97
00:09:39,370 --> 00:09:42,450
in fact a direct sampling rejection algorithm.

98
00:09:43,370 --> 00:09:47,400
Once again, we are now talking about rejections

99
00:09:47,400 --> 00:09:55,330
in a direct sampling algorithm, and not about rejections of the algorithmic transition in a Markov chain algorithm.

100
00:09:55,330 --> 00:09:57,230
So how do we proceed?

101
00:09:58,350 --> 00:10:06,050
Well, we can put our Gaussian into a box between (-x_cut) and x_cut

102
00:10:06,050 --> 00:10:10,320
and if you want, we can do this even on the Monte Carlo beach

103
00:10:10,320 --> 00:10:16,050
in a set-up which is only slightly modified from what we used in lecture 1.

104
00:10:17,220 --> 00:10:22,590
Just like the children threw pebbles randomly into a circle and a square

105
00:10:22,590 --> 00:10:24,760
to estimate the number pi

106
00:10:24,760 --> 00:10:28,760
we can now randomly throw pebbles into our rectangular box

107
00:10:28,760 --> 00:10:36,690
whenever a pebble falls below the Gaussian curve we accept it, otherwise we reject it.

108
00:10:37,790 --> 00:10:44,030
So what is the probability that a pebble that is fallen at a position x

109
00:10:44,030 --> 00:10:46,890
landed below the Gauss function?

110
00:10:46,890 --> 00:10:54,280
Well, clearly this probability is proportional to the function itself

111
00:10:54,280 --> 00:10:58,880
as there's the same number of pebbles landing at each position x

112
00:10:58,880 --> 00:11:02,880
due to the constant box side for all x

113
00:11:03,360 --> 00:11:10,120
then the fraction of the accepted pebbles is proportional to pi(x)

114
00:11:10,120 --> 00:11:14,830
In Python, our new pebble game can be implemented as follows

115
00:11:14,830 --> 00:11:18,830
we first select a random position within the rectangle

116
00:11:18,830 --> 00:11:23,260
then we check whether it is above or below the Gaussian curve

117
00:11:23,260 --> 00:11:26,950
and only if it is below we accept it

118
00:11:27,810 --> 00:11:34,950
This direct sampling rejection algorithm again samples the Gaussian distribution very nicely

119
00:11:35,370 --> 00:11:42,430
but note that we had to introduce this inelegant cutoff (-x_cut), x_cut.

120
00:11:42,430 --> 00:11:45,520
Rejection sampling can be problematic

121
00:11:45,520 --> 00:11:49,520
It is not due to the presence of rejections per se,

122
00:11:49,520 --> 00:11:54,050
but because the rejection rate can become really enormous, even prohibitive.

123
00:11:54,410 --> 00:12:00,970
To see this, let's look at a function that is not quite as friendly as a Gaussian distribution:

124
00:12:00,970 --> 00:12:04,970
the distribution 2 / sqrt(x)

125
00:12:05,760 --> 00:12:10,910
Maybe it troubles you that this distribution diverges for x->0

126
00:12:10,910 --> 00:12:13,710
as sqrt(x)->0

127
00:12:13,710 --> 00:12:16,590
and 1/sqrt(x) becomes infinite.

128
00:12:16,590 --> 00:12:19,980
However, such distributions exist

129
00:12:19,980 --> 00:12:27,090
and Vivien will later show you how to sample them without problems using a simple sample transformation

130
00:12:28,090 --> 00:12:33,840
But for our rejection method, that works so nicely in the simple Gaussian case,

131
00:12:33,840 --> 00:12:40,740
this presents a real headache, and this is because we simply don't know what box size to choose

132
00:12:41,220 --> 00:12:46,930
Should we set y_max equal to 1, 10, 1000, 1000000

133
00:12:46,930 --> 00:12:54,160
Our rejection rate would sky-rocket, and whatever value we choose some part of the curve will still be above our box.

134
00:12:54,630 --> 00:13:02,660
Let's give it a try, we implement the rejection method (just like we did in the Gaussian case) and set y_max=100

135
00:13:03,300 --> 00:13:09,800
although the result does not look so bad, in order to get a feeling for the potential danger of cutting off

136
00:13:09,800 --> 00:13:15,820
the divergent probability distribution at finite y_max, we compare with a Markov  chain algorithm.

137
00:13:15,820 --> 00:13:27,260
Note that we do not need to introduce any cut-off as the Markov chain algorithm does not have any problem coping with the infinite probability density.

138
00:13:28,440 --> 00:13:34,680
The programs records the highest values of y and the lowest values of x that it obtains

139
00:13:34,680 --> 00:13:38,680
while it randomly moves along the x-axis between 0 and 1.

140
00:13:40,480 --> 00:13:45,790
This is the output of our program, in a few second of runtime

141
00:13:45,790 --> 00:13:58,090
it goes down to values in x as small as 10^(-7) and up to values in y of a few thousand, way beyond our chosen cutoff of 100.

142
00:13:58,090 --> 00:14:02,570
Should we actually use this Markov chain algorithm?

143
00:14:02,830 --> 00:14:09,550
Alberto in a few moments and Vivien later will show you that this is not at all called for.

144
00:14:10,140 --> 00:14:18,920
So before continuing, please familiarize yourself thoroughly with the way the Metropolis acceptance probability is implemented.

145
00:14:20,410 --> 00:14:31,970
Furthermore take a moment to download, run and modify the programs we just discussed. On the Coursera website you will find markov_gauss.py

146
00:14:32,180 --> 00:14:35,390
that implements the Metropolis choice directly.

147
00:14:35,390 --> 00:14:39,960
There's also the graphic version, markov_gauss_movie.py.

148
00:14:40,060 --> 00:14:45,920
Furthermore you will find the program reject_direct_gauss_cut.py

149
00:14:46,570 --> 00:14:49,920
note that there's the word 'cut' in its name

150
00:14:49,920 --> 00:14:56,350
as we have to cut-off the tails of the distribution at (-x_cut), x_cut.

151
00:14:56,350 --> 00:15:02,630
In this example this was not a problem, as the distribution is suppressed

152
00:15:02,630 --> 00:15:06,630
exponentially for large absolute values of x.

153
00:15:07,270 --> 00:15:13,430
There's also the program reject_direct_inv_sqrt.py

154
00:15:13,730 --> 00:15:19,250
Now we had to introduce a cut-off in y, and it was not exponentially small.

155
00:15:19,250 --> 00:15:24,260
This is really dangerous and we must avoid it at any price.

156
00:15:26,510 --> 00:15:31,280
Before Alberto will next introduce the method of tower sampling

157
00:15:31,280 --> 00:15:37,390
I invite you to take a look at markov_inv_sqrt_movie.py

158
00:15:37,390 --> 00:15:43,540
and take it apart, just in case you still doubt the existence of divergent distributions.

159
00:15:43,540 --> 00:15:48,370
Get it to run for other values of the exponent: positive ones

160
00:15:48,370 --> 00:15:53,640
pi proportional to x, x^2, x^3, for x between 0 and 1

161
00:15:53,640 --> 00:15:58,610
but also for negative ones: pi proportional to x^(-1/2)

162
00:15:58,610 --> 00:16:05,960
x^(-1) and x^(-3/2), again for x between 0 and 1

163
00:16:05,960 --> 00:16:13,800
There are many strange phenomena to observe, and we'll get back to them in week 10, the final week of this course.

164
00:16:18,890 --> 00:16:22,260
Sampling non-uniform finite distributions

165
00:16:22,260 --> 00:16:27,310
has an archetypical example in what we call the Saturday night problem.

166
00:16:27,310 --> 00:16:31,310
As discussed by Werner in lecture 4,

167
00:16:31,310 --> 00:16:36,710
we imagine to have k possible evening activities

168
00:16:36,710 --> 00:16:41,860
and we do not feel equally enthusiastic about them:

169
00:16:43,570 --> 00:16:48,580
go out with friend, visit family, do your homework,

170
00:16:48,580 --> 00:16:52,580
sports, laundry, and so on..

171
00:16:52,580 --> 00:16:56,580
We know all the pi_k probabilities

172
00:16:56,580 --> 00:17:00,580
but we can still have troubles in deciding what to do.

173
00:17:01,490 --> 00:17:08,410
This means that we have troubles in sampling the probability pi_0, pi_1, pi_2 and so on

174
00:17:09,190 --> 00:17:17,970
we can adapt Michael's rejection method, and instead of sampling x between -x_cut and x_cut,

175
00:17:17,970 --> 00:17:22,490
we can draw integer random numbers

176
00:17:22,490 --> 00:17:27,930
0, 1, .., K-1, select an activity

177
00:17:27,930 --> 00:17:34,280
which we'll be accepted if the pebble falls inside the red box.

178
00:17:34,280 --> 00:17:39,560
This occurs with a probability which is proportional to the weight of the activity.

179
00:17:40,290 --> 00:17:47,230
As Michael explained, this algorithm can have problems because of the high rejection rate.

180
00:17:47,230 --> 00:17:57,740
But for the Saturday night problem this is ok, we can program the algorithm and plan our activities week by week.

181
00:18:00,960 --> 00:18:02,440
Bad luck..

182
00:18:04,510 --> 00:18:07,010
Let's look again to the last figure.

183
00:18:08,070 --> 00:18:14,500
You can see that the number of pebbles you should draw before having a single hit

184
00:18:14,990 --> 00:18:20,870
is typically equal to the ratio between pi_max and pi_average.

185
00:18:21,460 --> 00:18:29,060
Where pi_max is the maximal weight and pi_average is the mean value between all the activities.

186
00:18:29,060 --> 00:18:33,060
This number can be very large.

187
00:18:33,060 --> 00:18:38,370
The tower sampling method is much faster and much more elegant.

188
00:18:39,420 --> 00:18:48,140
Instead of placing the boxed one next to the other, we pile up them, one on top of each other,

189
00:18:48,140 --> 00:18:54,460
and the pebble is drawn from the bottom to the top of the tower.

190
00:18:55,870 --> 00:19:03,670
Once again, you choose your activity with a probability which is proportional to the weight of the activity

191
00:19:03,670 --> 00:19:09,390
But the algorithm is much faster, and the output is statistically the same.

192
00:19:10,560 --> 00:19:16,040
in Python, this gives the following rejection-free program

193
00:19:17,440 --> 00:19:22,540
How we can actually find the box j where the pebble falls?

194
00:19:23,400 --> 00:19:26,540
for a small number of activities K

195
00:19:26,540 --> 00:19:32,920
we can just go through the ordered table pi_0, pi_1, ..

196
00:19:32,920 --> 00:19:41,470
until we find a j for which pi_(j-1)<Upsilon<pi_j

197
00:19:42,340 --> 00:19:46,770
For larger K, it is better to use the bisection method

198
00:19:47,320 --> 00:19:55,540
We should first check if Upsilon is smaller or larger than pi_(K/2)

199
00:19:55,540 --> 00:20:04,160
and then cut the range of possible indices j in two for any subsequent iteration.

200
00:20:05,250 --> 00:20:15,300
After log_2(k) operations, we will find the box, and this number gives the complexity of the algorithm.

201
00:20:16,230 --> 00:20:27,940
The tower sampling method is quite amazing because it implements a rejection-free algorithm for sampling, but it is not optimal.

202
00:20:28,970 --> 00:20:36,890
The best method to sample discrete distributions is the Walker method.

203
00:20:40,070 --> 00:20:43,190
Let's consider the colored boxes in the figure.

204
00:20:44,000 --> 00:20:48,330
They represent the discrete distribution that we want to sample.

205
00:20:48,850 --> 00:20:52,330
Let us arrange them in a special way.

206
00:20:53,610 --> 00:21:00,160
We first compute the box height average, as shown in the figure

207
00:21:00,160 --> 00:21:05,210
there are bigger boxes and smaller boxes than average

208
00:21:05,820 --> 00:21:10,930
Take one bigger box and a smaller box,

209
00:21:10,930 --> 00:21:15,960
put on the top of the smaller box the bigger box,

210
00:21:15,960 --> 00:21:23,620
cut it off, return what is left on its original position.

211
00:21:23,620 --> 00:21:32,930
You can continue and repeat this operation for other bigger boxes and other smaller boxes.

212
00:21:34,430 --> 00:21:38,120
Like here, and here.

213
00:21:39,110 --> 00:21:48,600
At the end, you will have a boxing scheme, where for each position you have at most two boxes

214
00:21:48,600 --> 00:21:54,620
which add up to the average. And now sampling is very easy.

215
00:21:55,350 --> 00:21:59,930
You notice that the logarithmic complexity has improved.

216
00:21:59,930 --> 00:22:07,270
We achieved a sampling method that takes a constant work per pebble.

217
00:22:07,270 --> 00:22:12,110
for an arbitrary number of evening activities.

218
00:22:12,110 --> 00:22:16,210
And this ingenious algorithm can be

219
00:22:16,370 --> 00:22:19,460
programmed in Python in a few lines.

220
00:22:21,560 --> 00:22:37,060
Sampling a discrete distribution is so easy that you can think that to sample a continuous one-dimensional distribution you should cut it up in small boxes

221
00:22:37,060 --> 00:22:41,720
and use the tower sampling or the Walker algorithm.

222
00:22:42,670 --> 00:22:44,430
This is not so.

223
00:22:44,430 --> 00:22:53,840
In fact, next in this tutorial, Vivien will show directly the continuous limit of the tower sampling,

224
00:22:53,840 --> 00:22:57,840
And come up with very nice and amazing algorithms.

225
00:22:58,470 --> 00:23:03,200
But before listening to him, please take a moment

226
00:23:03,200 --> 00:23:09,120
to download, run and modify the programs we discussed in this section.

227
00:23:10,610 --> 00:23:13,820
Here you find the tower sampling method,

228
00:23:14,250 --> 00:23:18,510
with the logarithmic search that locates the boxes.

229
00:23:21,000 --> 00:23:31,280
Here we find also the ingenious algorithm by Walker, that produces the nice two-slate parquet.

230
00:23:36,270 --> 00:23:43,410
In the final part of this tutorial, let us now take the continuous limit of the discrete distributions

231
00:23:45,180 --> 00:23:53,540
This corresponds to cases where instead of having a finite or enumerable series of probabilities,

232
00:23:53,540 --> 00:23:57,540
pi_0, pi_1, pi_2...

233
00:23:57,540 --> 00:24:10,470
you have a distribution pi(x) which is a function of the continuous variable x, taking values inside the real line.

234
00:24:11,980 --> 00:24:18,450
To sample such distributions, let's come back to Alberto's tower sampling trick

235
00:24:18,450 --> 00:24:26,520
and remain with a discretized version of the function for a little while, as described here.

236
00:24:27,830 --> 00:24:42,560
To apply Alberto's method we take the different boxes with colors indicated here (red, green, blue, yellow, pink)

237
00:24:42,560 --> 00:24:46,560
and place them on top of each other.

238
00:24:49,100 --> 00:24:54,080
With respect to the Saturday night problem, here

239
00:24:54,080 --> 00:24:58,080
each box corresponds to a position x,

240
00:24:58,080 --> 00:25:02,080
rather than to a specific activity.

241
00:25:04,810 --> 00:25:12,750
To keep track of these x positions, the best way is the one which is displayed here.

242
00:25:14,160 --> 00:25:22,790
Again, if we place a pebble randomly between the bottom and the top of the tower (that we can take equal to one)

243
00:25:22,790 --> 00:25:29,660
we'll fall in the red box with a probability which is proportional to the height of this box.

244
00:25:29,660 --> 00:25:33,660
But this is exactly what we want.

245
00:25:33,660 --> 00:25:44,930
Using this scheme, one can trace from the position of the pebble, to the box and to the position x that we want to sample

246
00:25:44,930 --> 00:25:52,310
according to the distribution (which is discrete) pi(x). This is what is displayed here.

247
00:25:54,030 --> 00:25:59,670
Now let's make the discretization finer and finer.

248
00:26:00,380 --> 00:26:03,670
As you observe, there is an envelope

249
00:26:03,670 --> 00:26:07,670
which develops around the tower construction

250
00:26:07,670 --> 00:26:11,670
and the value of the envelope at a given position x

251
00:26:11,670 --> 00:26:17,970
is in fact simply the integral of the distribution pi(x) up to this point.

252
00:26:21,410 --> 00:26:29,380
In fact, in this example we have taken pi(x) to be a Gaussian normal distribution

253
00:26:30,840 --> 00:26:39,930
so this means that we have a second algorithm, that we call gauss_transform.py, to sample this distribution.

254
00:26:41,060 --> 00:26:47,350
It is a bit complicated, because we first have to compute the integral of this function

255
00:26:48,800 --> 00:26:56,430
Sometimes we can compute analytically the definite integral between minus infinity and x

256
00:26:58,160 --> 00:27:01,690
For the Gaussian distribution - however - Werner's trick

257
00:27:01,690 --> 00:27:07,430
does not work. It only works between minus infinity and plus infinity.

258
00:27:09,110 --> 00:27:17,680
Fortunately for us, the definite integral which is called the error function and which is displayed here

259
00:27:17,680 --> 00:27:21,680
has been studied by mathematicians for centuries.

260
00:27:22,580 --> 00:27:26,680
Although it cannot be reduced to a simpler formula,

261
00:27:26,680 --> 00:27:31,650
we can evaluate it numerically with high efficiency

262
00:27:32,310 --> 00:27:35,650
but now wait a minute, we don't need this function!

263
00:27:35,650 --> 00:27:45,050
So, generically, let's call phi(x) the integral of the distribution pi up to the position x.

264
00:27:46,000 --> 00:27:53,320
We don't need the value phi(x) associated to a value of x, but rather the contrary

265
00:27:54,140 --> 00:28:01,430
we need the position x corresponding to phi, which is the random number between 0 and 1.

266
00:28:02,060 --> 00:28:06,070
All in all, we will need the inverse function of phi(x)

267
00:28:06,070 --> 00:28:14,100
Again, for the Gaussian distribution, the inverse error function cannot be reduced to a simple form

268
00:28:14,100 --> 00:28:19,730
but it can be evaluated numerically as displayed in this program

269
00:28:19,730 --> 00:28:26,950
where its value can be computed by a simple call to the erfinv function

270
00:28:26,950 --> 00:28:30,950
The output of the program is shown here.

271
00:28:32,980 --> 00:28:40,620
To summarize in a mathematical way the mathematical way the discussion that we have had up to now on a specific example,

272
00:28:40,620 --> 00:28:46,580
let us make explicit the correspondence between the discrete and continuous cases.

273
00:28:48,360 --> 00:28:58,250
As displayed here, the first step consists in computing the definite integral of pi between minus infinity and x.

274
00:28:58,250 --> 00:29:02,250
This is the function that we call phi(x).

275
00:29:03,970 --> 00:29:10,120
The second step, which is illustrated there, in the discrete cases

276
00:29:10,120 --> 00:29:17,710
corresponds to determining the position at which the pebble has fallen into.

277
00:29:20,140 --> 00:29:29,950
In the continuous case, it consists in determining the value x which corresponds to the random number Upsilon

278
00:29:29,950 --> 00:29:33,950
distributed between 0 and 1.

279
00:29:36,270 --> 00:29:41,330
Now, let us remark that the procedure that we have described up to now

280
00:29:41,330 --> 00:29:49,680
is in fact the sample transformation described by Werner in this week's lecture, lecture 4.

281
00:29:50,420 --> 00:29:54,170
In any case, the two hurdles that we have at hand

282
00:29:54,170 --> 00:29:59,960
computing the integral and solving the equation phi(x)=Upsilon

283
00:29:59,960 --> 00:30:07,200
can be overcome in general for one-dimensional distribution with various degrees of freedom.

284
00:30:07,200 --> 00:30:11,200
Sometimes the integral can be computed analytically,

285
00:30:11,200 --> 00:30:15,200
sometimes one has to resort to numerical integration

286
00:30:15,200 --> 00:30:17,630
and to interpolations.

287
00:30:19,500 --> 00:30:26,480
Let us also remark that if pi(x) presents divergences or singularities,

288
00:30:26,480 --> 00:30:33,660
however its integral phi(x) up to point x is in fact a much more regular function.

289
00:30:33,660 --> 00:30:39,140
Indeed, since pi is positive and can be normalized,

290
00:30:39,140 --> 00:30:44,900
phi is growing from 0 to 1 in a monotonous manner.

291
00:30:45,940 --> 00:30:53,140
This means - in the end- that the process we have described to sample pi(x) is very stable in general

292
00:30:53,740 --> 00:31:02,130
AS a second example we now turn our attention to the case of a distribution pi(x) equal to a constant

293
00:31:02,130 --> 00:31:05,030
times x to the power of gamma.

294
00:31:05,760 --> 00:31:11,730
where x is a real number between 0 and 1, 0 being excluded.

295
00:31:12,670 --> 00:31:20,380
A few minutes ago, Michael has studied this distribution in the case where gamma = -1/2.

296
00:31:20,380 --> 00:31:25,540
This was a distribution which blew up when x was going to zero.

297
00:31:25,540 --> 00:31:31,810
In fact, we have already considered this distribution for gamma=2

298
00:31:32,040 --> 00:31:35,980
in the program direct_spheres_3d.py.

299
00:31:37,490 --> 00:31:43,610
Let us now apply the sample transformation to sample this distribution.

300
00:31:43,610 --> 00:31:47,610
This is what we use in the following program

301
00:31:49,950 --> 00:31:59,090
with the proper normalization, the function phi(x) can be computed, and it is equal to x^(gamma+1).

302
00:32:00,580 --> 00:32:05,210
The sampling of the distribution is thus given by the following relation

303
00:32:05,790 --> 00:32:11,240
which is x^(1+gamma)=Upsilon

304
00:32:11,240 --> 00:32:17,290
which means that x=Upsilon^(1/(1+gamma))

305
00:32:17,290 --> 00:32:25,500
where as usual Upsilon is a random number between 0 and 1 with a flat distribution.

306
00:32:26,640 --> 00:32:33,500
So, in the end it means that to sample pi you simply have to take this random number

307
00:32:33,500 --> 00:32:37,500
and to take it to the power of 1/(1+gamma).

308
00:32:38,550 --> 00:32:41,800
Isn't this remarkably easy?

309
00:32:42,380 --> 00:32:50,490
It took us a bit of brain work to think about it, but in the end the results justify our effort.

310
00:32:51,100 --> 00:32:57,920
Now it is your turn to download, run and modify the programs we have presented in this section.

311
00:32:59,400 --> 00:33:07,480
Have a look at gauss_transform.py and make sure that you have understood how the sample transformation works.

312
00:33:07,480 --> 00:33:14,280
Further, try out gamma_transform.py

313
00:33:14,280 --> 00:33:18,540
and compare directly the results that you obtain

314
00:33:18,540 --> 00:33:23,080
to the two other algorithms: the Markov chain

315
00:33:23,080 --> 00:33:31,390
algorithm and the direct sampling algorithm that Michael presented earlier.

316
00:33:33,360 --> 00:33:40,950
In conclusion, we have discussed in this tutorial random numbers and probability distribution

317
00:33:40,950 --> 00:33:46,910
and how to reshape them using rejection methods or transformation methods.

318
00:33:47,930 --> 00:33:56,350
We found great rejection-free sampling programs for discrete and continuous one-dimensional distributions,

319
00:33:56,350 --> 00:34:00,490
and solved the direct sampling problem in these cases.

320
00:34:00,840 --> 00:34:04,840
Markov chain methods could generally be avoided.

321
00:34:05,740 --> 00:34:10,800
The situation is much more complicated in high-dimensional situations

322
00:34:10,800 --> 00:34:20,100
for example the hard-spheres simulations that we did last week or the quantum problem that we will next turn our attention to.

323
00:34:21,220 --> 00:34:25,860
In the meantime have fun with Homework session 4.