WEBVTT

1
00:00:00.840 --> 00:00:04.350
<v ->With tools like LM Studio, but also Ollama,</v>

2
00:00:04.350 --> 00:00:09.240
you can control how the loaded model generates tokens

3
00:00:09.240 --> 00:00:11.370
and data for its output.

4
00:00:11.370 --> 00:00:12.510
And specifically,

5
00:00:12.510 --> 00:00:15.240
you can typically tweak the temperature,

6
00:00:15.240 --> 00:00:20.070
and values called top_k, top_p, and sometimes also min_p,

7
00:00:20.070 --> 00:00:22.203
but what do these terms actually mean?

8
00:00:23.070 --> 00:00:25.710
The temperature setting can be used

9
00:00:25.710 --> 00:00:30.180
to modify the probabilities that are assigned

10
00:00:30.180 --> 00:00:33.960
to the different tokens or token candidates.

11
00:00:33.960 --> 00:00:35.430
And to understand that,

12
00:00:35.430 --> 00:00:37.620
it's important to understand

13
00:00:37.620 --> 00:00:40.860
that these large language models generate tokens

14
00:00:40.860 --> 00:00:42.870
or token candidates.

15
00:00:42.870 --> 00:00:46.110
So if we, for example, have a sequence like "the sky is",

16
00:00:46.110 --> 00:00:48.930
then as a next token or word,

17
00:00:48.930 --> 00:00:52.200
the model may generate blue, or clear,

18
00:00:52.200 --> 00:00:54.930
or visible, or some other word.

19
00:00:54.930 --> 00:00:57.840
And each token it considers

20
00:00:57.840 --> 00:01:00.270
has a probability assigned to it.

21
00:01:00.270 --> 00:01:01.103
Now,

22
00:01:01.103 --> 00:01:02.130
it's this probability

23
00:01:02.130 --> 00:01:05.310
and this token that are derived from all these weights,

24
00:01:05.310 --> 00:01:07.830
all these parameters that make up the model.

25
00:01:07.830 --> 00:01:12.210
So based on your input, "the sky is", based on these tokens,

26
00:01:12.210 --> 00:01:14.100
these token candidates,

27
00:01:14.100 --> 00:01:18.210
and their probabilities are generated by the model.

28
00:01:18.210 --> 00:01:21.630
And with temperature, top_k, top_p, and min_p,

29
00:01:21.630 --> 00:01:25.470
you can configure which of these possible tokens

30
00:01:25.470 --> 00:01:27.810
are actually considered for sampling.

31
00:01:27.810 --> 00:01:31.890
So which tokens are actually considered to be chosen

32
00:01:31.890 --> 00:01:34.800
for the actual output, so to say.

33
00:01:34.800 --> 00:01:37.350
And with temperature, as mentioned,

34
00:01:37.350 --> 00:01:41.370
you can modify these probabilities, for example.

35
00:01:41.370 --> 00:01:44.790
To be precise, if you set a low temperature value,

36
00:01:44.790 --> 00:01:48.930
zero, or 0.1, or something like this,

37
00:01:48.930 --> 00:01:53.930
then differences in probabilities are exaggerated,

38
00:01:54.090 --> 00:01:56.220
which means, for example, that blue,

39
00:01:56.220 --> 00:02:01.220
which in my example here has a made-up probability of 45%

40
00:02:01.500 --> 00:02:03.120
would become even more likely.

41
00:02:03.120 --> 00:02:05.850
The probability might get boosted to 90

42
00:02:05.850 --> 00:02:10.230
or 95% behind the scenes after the tokens were generated,

43
00:02:10.230 --> 00:02:13.590
so these tweaks are made after the token candidates

44
00:02:13.590 --> 00:02:15.660
have been generated by the model.

45
00:02:15.660 --> 00:02:18.120
So a low temperature value will boost

46
00:02:18.120 --> 00:02:20.790
the likely tokens to be even more likely

47
00:02:20.790 --> 00:02:24.600
and make the unlikely tokens even less likely.

48
00:02:24.600 --> 00:02:25.433
On the other hand,

49
00:02:25.433 --> 00:02:28.200
a high temperature flattens those differences

50
00:02:28.200 --> 00:02:31.080
and leads to more equal probabilities.
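
NOTE
Here is a minimal sketch of what temperature does to a set of token probabilities. It is not how any specific tool implements it. The function name apply_temperature and the toy numbers are my assumptions: only blue = 45% comes from the lecture, the other values are invented so that everything sums to 100%. Real tools usually divide the raw logits by the temperature before the softmax; dividing the log-probabilities, as done here, gives the same result.
```python
import math
def apply_temperature(probs, temperature):
    """Rescale a probability distribution with a temperature (must be > 0)."""
    # Dividing log-probabilities by T is equivalent to dividing logits by T.
    scaled = {tok: math.log(p) / temperature for tok, p in probs.items()}
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scaled.values())
    exps = {tok: math.exp(s - top) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}
# Toy distribution (assumption): only "blue" = 0.45 is taken from the lecture.
probs = {"blue": 0.45, "visible": 0.20, "clear": 0.13, "cloudy": 0.10, "dark": 0.08, "gray": 0.04}
cold = apply_temperature(probs, 0.3)  # low temperature: differences get exaggerated
hot = apply_temperature(probs, 2.0)   # high temperature: differences get flattened
print({tok: round(p, 3) for tok, p in cold.items()})
print({tok: round(p, 3) for tok, p in hot.items()})
```
With the low temperature, "blue" ends up far above its original 45%; with the high temperature it drops and the unlikely tokens gain. A temperature of exactly zero would divide by zero in this sketch, which is why it requires a value above zero.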

51
00:02:31.080 --> 00:02:34.590
So again, in this example with a high temperature setting,

52
00:02:34.590 --> 00:02:38.430
blue might go down from 45 to 20%,

53
00:02:38.430 --> 00:02:43.200
but clear might go up to 20%, for example.

54
00:02:43.200 --> 00:02:45.540
That's what the temperature setting does.

55
00:02:45.540 --> 00:02:49.650
And we'll see it in action in the next lecture, of course,

56
00:02:49.650 --> 00:02:51.810
but we don't just have the temperature.

57
00:02:51.810 --> 00:02:54.630
We also, for example, have the top_k value,

58
00:02:54.630 --> 00:02:56.640
which can be set.

59
00:02:56.640 --> 00:02:58.080
And with top_k,

60
00:02:58.080 --> 00:03:00.210
we can limit the number of candidates

61
00:03:00.210 --> 00:03:02.070
that are considered at all.

62
00:03:02.070 --> 00:03:04.800
So here, it's not about boosting probabilities,

63
00:03:04.800 --> 00:03:08.430
it's simply about getting rid of some candidates.

64
00:03:08.430 --> 00:03:11.460
For example, if you have a K value of five,

65
00:03:11.460 --> 00:03:14.610
only the five most likely candidates are considered.

66
00:03:14.610 --> 00:03:15.443
If it's one,

67
00:03:15.443 --> 00:03:19.140
it's definitely the most likely candidate that will be used.

68
00:03:19.140 --> 00:03:22.590
So if the top K parameter were set to one here,

69
00:03:22.590 --> 00:03:24.840
we would definitely pick blue

70
00:03:24.840 --> 00:03:28.110
because it's the most likely token

71
00:03:28.110 --> 00:03:30.570
in this made-up example here.

72
00:03:30.570 --> 00:03:34.260
If I set top_k to two instead, for example,

73
00:03:34.260 --> 00:03:36.480
blue and visible would be considered

74
00:03:36.480 --> 00:03:39.810
because these are the two highest percentages.

75
00:03:39.810 --> 00:03:42.450
The dot dot dot with 22%

76
00:03:42.450 --> 00:03:45.330
is meant to represent multiple alternatives

77
00:03:45.330 --> 00:03:48.720
where each alternative has less than 22%,

78
00:03:48.720 --> 00:03:51.630
just to be clear, so that's top_k.
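
NOTE
The top_k idea can be sketched in a few lines. The helper name top_k_filter and all probabilities except blue = 45% are assumptions for illustration. Renormalizing the surviving candidates so they sum to 1 again is also an assumption about how a tool would continue from here.
```python
def top_k_filter(probs, k):
    """Keep the k most likely tokens and renormalize their probabilities."""
    # Sort candidates from most likely to least likely.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = dict(ranked[:k])
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}
# Toy distribution (assumption): only "blue" = 0.45 is taken from the lecture.
probs = {"blue": 0.45, "visible": 0.20, "clear": 0.13, "cloudy": 0.10, "dark": 0.08, "gray": 0.04}
print(top_k_filter(probs, 1))  # only "blue" survives
print(top_k_filter(probs, 2))  # "blue" and "visible" survive
```
With k = 1 there is nothing left to choose from, so the output becomes deterministic for a given input.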

79
00:03:51.630 --> 00:03:56.630
Top_p is about limiting the candidates based

80
00:03:56.670 --> 00:03:59.220
on their combined probabilities.

81
00:03:59.220 --> 00:04:02.550
If P is set to 0.5, for example,

82
00:04:02.550 --> 00:04:04.410
the most likely candidates would be considered

83
00:04:04.410 --> 00:04:07.530
until combined they reach at least 50%.

84
00:04:07.530 --> 00:04:08.490
With 0.9,

85
00:04:08.490 --> 00:04:12.630
candidates are added until combined they reach 90%.

86
00:04:12.630 --> 00:04:17.370
So again, here, if top_p were set to 0.5,

87
00:04:17.370 --> 00:04:19.740
blue and visible would be considered

88
00:04:19.740 --> 00:04:22.470
because combined they have more than 50%,

89
00:04:22.470 --> 00:04:26.820
and we don't need any other candidates to go above 50%.

90
00:04:26.820 --> 00:04:30.120
If top_p were set to 0.9,

91
00:04:30.120 --> 00:04:32.910
we would need blue, visible, and clear,

92
00:04:32.910 --> 00:04:35.880
but even then we would still be below 90%.

93
00:04:35.880 --> 00:04:38.190
So we would also include some other candidates

94
00:04:38.190 --> 00:04:40.800
from those remaining tokens.
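
NOTE
A rough sketch of top_p, assuming it means "walk down the candidates from most to least likely and stop once the combined probability reaches p". The helper name top_p_filter and every number except blue = 45% are made up for this example.
```python
def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose combined probability reaches p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = {}, 0.0
    for tok, prob in ranked:
        kept[tok] = prob
        cumulative += prob
        # Stop as soon as the running total reaches the threshold.
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {tok: q / total for tok, q in kept.items()}
# Toy distribution (assumption): only "blue" = 0.45 is taken from the lecture.
probs = {"blue": 0.45, "visible": 0.20, "clear": 0.13, "cloudy": 0.10, "dark": 0.08, "gray": 0.04}
print(list(top_p_filter(probs, 0.5)))  # blue + visible already exceed 50%
print(list(top_p_filter(probs, 0.9)))  # needs more candidates to reach 90%
```
Unlike top_k, the number of surviving candidates is not fixed here. It depends on how concentrated the probabilities are.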

95
00:04:40.800 --> 00:04:42.390
And then from all these tokens

96
00:04:42.390 --> 00:04:44.940
that are included in the actual candidates,

97
00:04:44.940 --> 00:04:47.250
one candidate will be chosen,

98
00:04:47.250 --> 00:04:49.410
and that will then be the actual token

99
00:04:49.410 --> 00:04:51.930
that makes it into the actual output.

100
00:04:51.930 --> 00:04:52.770
And by the way,

101
00:04:52.770 --> 00:04:56.700
it's not like the highest probability then always wins.

102
00:04:56.700 --> 00:05:00.030
Instead you can think of it as a weighted dice roll,

103
00:05:00.030 --> 00:05:02.190
so the higher probability candidate

104
00:05:02.190 --> 00:05:04.440
is more likely to be chosen,

105
00:05:04.440 --> 00:05:08.280
but the less likely ones also have a chance of being chosen.
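
NOTE
The weighted dice roll can be illustrated with Python's random.choices, which picks items in proportion to their weights. This is only a sketch: sample_token is a hypothetical helper, the two candidates assume top_k = 2 was applied to the made-up example, and the fixed seed is there just to make the run repeatable.
```python
import random
from collections import Counter
def sample_token(probs, rng):
    """Pick one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = list(probs.values())
    # random.choices does not require the weights to sum to 1.
    return rng.choices(tokens, weights=weights, k=1)[0]
# Candidates left after filtering (assumed values for illustration).
candidates = {"blue": 0.45, "visible": 0.20}
rng = random.Random(0)
counts = Counter(sample_token(candidates, rng) for _ in range(10_000))
print(counts)
```
Over many draws "blue" should win about twice as often as "visible", but "visible" still gets picked regularly.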

106
00:05:08.280 --> 00:05:11.460
That's how temperature, top_k, and top_p work,

107
00:05:11.460 --> 00:05:13.470
and they can be combined.

108
00:05:13.470 --> 00:05:17.070
You can exaggerate or flatten differences with temperature,

109
00:05:17.070 --> 00:05:19.980
then restrict the number of candidates with top_k,

110
00:05:19.980 --> 00:05:24.090
and then in addition also use top_p, for example.
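
NOTE
To show how the three settings could work together, here is a small sketch that applies them in the order just described: temperature, then top_k, then top_p, then the weighted pick. All function names and all numbers except blue = 45% are assumptions, and real tools may apply these steps in a different order.
```python
import random
def normalize(probs):
    total = sum(probs.values())
    return {tok: p / total for tok, p in probs.items()}
def apply_temperature(probs, temperature):
    # p ** (1 / T) followed by renormalization equals softmax(logits / T).
    return normalize({tok: p ** (1.0 / temperature) for tok, p in probs.items()})
def top_k_filter(probs, k):
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return normalize(dict(ranked[:k]))
def top_p_filter(probs, p):
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = {}, 0.0
    for tok, prob in ranked:
        kept[tok] = prob
        cumulative += prob
        if cumulative >= p:
            break
    return normalize(kept)
def sample_next_token(probs, temperature, k, p, rng):
    """Temperature, then top_k, then top_p, then a weighted random pick."""
    candidates = apply_temperature(probs, temperature)
    candidates = top_k_filter(candidates, k)
    candidates = top_p_filter(candidates, p)
    token = rng.choices(list(candidates), weights=list(candidates.values()), k=1)[0]
    return token, candidates
# Toy distribution (assumption): only "blue" = 0.45 is taken from the lecture.
probs = {"blue": 0.45, "visible": 0.20, "clear": 0.13, "cloudy": 0.10, "dark": 0.08, "gray": 0.04}
token, candidates = sample_next_token(probs, temperature=0.7, k=5, p=0.9, rng=random.Random(42))
print(token, candidates)
```
Each step only narrows or reshapes the candidate list. The final pick is still random among whatever survives.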

111
00:05:24.090 --> 00:05:25.890
There also is min_p.

112
00:05:25.890 --> 00:05:29.670
And min_p can be used to discard certain candidates

113
00:05:29.670 --> 00:05:33.570
that don't meet a minimum probability threshold.

114
00:05:33.570 --> 00:05:37.440
So for example, if you set top_k to five,

115
00:05:37.440 --> 00:05:40.350
and you therefore have five candidates,

116
00:05:40.350 --> 00:05:44.220
you could in addition set min_p to 0.05

117
00:05:44.220 --> 00:05:45.960
to remove all candidates

118
00:05:45.960 --> 00:05:50.880
that have an individual probability of less than 5%.

119
00:05:50.880 --> 00:05:52.950
So if one of your five candidates

120
00:05:52.950 --> 00:05:55.620
only had a probability of 4%,

121
00:05:55.620 --> 00:05:59.580
it would be excluded with min_p set to 0.05,

122
00:05:59.580 --> 00:06:03.450
even if it were one of the five most likely ones,

123
00:06:03.450 --> 00:06:05.100
that's the idea here.
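
NOTE
A small sketch of min_p as described here, where 0.05 means "drop everything below 5%". The helper min_p_filter, its relative_to_top flag and the toy numbers are assumptions. I added the flag because, as far as I know, llama.cpp-based tools scale the threshold by the probability of the most likely token instead of using it as an absolute cutoff, so check your tool's documentation.
```python
def min_p_filter(probs, min_p, relative_to_top=False):
    """Drop tokens whose probability falls below a threshold, then renormalize."""
    # Absolute mode: threshold is min_p. Relative mode: min_p times the top probability.
    threshold = min_p * max(probs.values()) if relative_to_top else min_p
    kept = {tok: p for tok, p in probs.items() if p >= threshold}
    if not kept:
        # Always keep at least the most likely token.
        top = max(probs, key=probs.get)
        kept = {top: probs[top]}
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}
# Toy distribution (assumption): only "blue" = 0.45 is taken from the lecture.
probs = {"blue": 0.45, "visible": 0.20, "clear": 0.13, "cloudy": 0.10, "dark": 0.08, "gray": 0.04}
absolute = min_p_filter(probs, 0.05)
relative = min_p_filter(probs, 0.05, relative_to_top=True)
print(list(absolute))  # "gray" (4%) is below the 5% cutoff and is removed
print(list(relative))  # cutoff is 0.05 * 0.45, so "gray" stays
```
The two interpretations give different results for the same setting, which is worth keeping in mind when comparing tools.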

124
00:06:05.100 --> 00:06:07.980
Now, I will say that in reality,

125
00:06:07.980 --> 00:06:11.820
you likely will not tweak these settings all the time.

126
00:06:11.820 --> 00:06:14.100
The defaults are typically pretty good,

127
00:06:14.100 --> 00:06:17.970
giving you a nice mixture of creativity, randomness,

128
00:06:17.970 --> 00:06:21.240
and expected behavior you could say.

129
00:06:21.240 --> 00:06:23.610
But for specific use cases,

130
00:06:23.610 --> 00:06:26.220
you can definitely tweak these settings.
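
NOTE
If you want to set these values from code instead of a UI, a request body for Ollama's local generate endpoint might look roughly like this. Treat it as a sketch: build_generate_request is a hypothetical helper, the model name is a placeholder, the numbers are illustrative and not claimed defaults, and the option names follow Ollama's documented generation options as I understand them, so verify them against the version you run.
```python
import json
def build_generate_request(prompt, temperature=0.8, top_k=40, top_p=0.9, min_p=0.05):
    """Build a JSON body with sampling options for a text-generation request."""
    return {
        "model": "llama3.2",  # placeholder; use a model you have pulled locally
        "prompt": prompt,
        "stream": False,
        # Sampling settings discussed in this lecture (illustrative values).
        "options": {"temperature": temperature, "top_k": top_k, "top_p": top_p, "min_p": min_p},
    }
payload = build_generate_request("The sky is", temperature=0.2, top_k=5)
print(json.dumps(payload, indent=2))
```
The sketch only builds and prints the body. Sending it would be an HTTP POST to the server's generate endpoint, which I left out so the example runs without a local server.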

131
00:06:26.220 --> 00:06:28.950
And we will play around with them to get a better feeling

132
00:06:28.950 --> 00:06:30.993
for them in the next lecture.