WEBVTT

1
00:00:01.029 --> 00:00:03.290
Hi, and welcome to this AI in C#

2
00:00:03.290 --> 00:00:06.090
video on the Microsoft Agent Framework.

3
00:00:06.760 --> 00:00:09.070
Today we're going to look at controlling the

4
00:00:09.070 --> 00:00:12.530
reasoning effort on reasoning models like, for example,

5
00:00:12.670 --> 00:00:13.710
GPT-5.

6
00:00:16.980 --> 00:00:21.040
So, to understand this properly, the

7
00:00:21.040 --> 00:00:23.720
best way is to see the difference.

8
00:00:24.600 --> 00:00:27.920
So, I'm going to quickly set this up

9
00:00:27.920 --> 00:00:34.200
so we can run this sample, which is

10
00:00:34.200 --> 00:00:36.180
in my sample repo that you can get

11
00:00:36.180 --> 00:00:37.460
for free on GitHub.

12
00:00:38.780 --> 00:00:41.320
So, I'm just going to get my configuration,

13
00:00:42.220 --> 00:00:44.940
and what I'm going to do here as

14
00:00:44.940 --> 00:00:51.380
well is, in my OpenAI client here, or

15
00:00:51.380 --> 00:00:54.820
the Azure OpenAI client, if that's what you're

16
00:00:54.820 --> 00:00:58.560
using, I'm also setting the network timeout.

17
00:00:59.640 --> 00:01:02.940
You might not need this, but, of course,

18
00:01:03.220 --> 00:01:04.860
if you have a reasoning model and you

19
00:01:04.860 --> 00:01:07.220
give it a very, very hard question, it

20
00:01:07.220 --> 00:01:09.440
can take multiple minutes to actually get an

21
00:01:09.440 --> 00:01:09.760
answer.

22
00:01:10.400 --> 00:01:13.120
So, I just included this for the sake

23
00:01:13.120 --> 00:01:14.720
of how you do it.

24
00:01:14.720 --> 00:01:18.080
You simply new up the client, and then there's

25
00:01:18.080 --> 00:01:20.220
extra options where you can set the network

26
00:01:20.220 --> 00:01:20.800
timeout.

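NOTE
A minimal sketch of the timeout setup just described, assuming the official OpenAI .NET package.
The environment variable name and the five-minute value are illustrative assumptions, not taken from the video.
```csharp
using System;
using System.ClientModel;
using OpenAI;
// Assumption: the API key is stored in an environment variable with this name.
string apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY")
    ?? throw new InvalidOperationException("OPENAI_API_KEY is not set.");
// NetworkTimeout is inherited from the client options base class (ClientPipelineOptions).
var clientOptions = new OpenAIClientOptions
{
    NetworkTimeout = TimeSpan.FromMinutes(5) // illustrative value for slow reasoning calls
};
var client = new OpenAIClient(new ApiKeyCredential(apiKey), clientOptions);
```
For Azure OpenAI, AzureOpenAIClientOptions should expose the same NetworkTimeout property.
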
27
00:01:21.300 --> 00:01:22.540
So, fairly simple to do.

28
00:01:23.540 --> 00:01:24.900
But why do we need it?

29
00:01:25.340 --> 00:01:27.700
Well, if we, for example, go in and

30
00:01:27.700 --> 00:01:32.100
use GPT-5 mini here, and we use

31
00:01:32.100 --> 00:01:38.700
it without any extra configuration and ask, what

32
00:01:38.700 --> 00:01:41.200
is the capital of France and how many

33
00:01:41.200 --> 00:01:43.060
people live there?

34
00:01:44.040 --> 00:01:46.200
If we do that, it will take a

35
00:01:46.200 --> 00:01:49.300
little while to come back because GPT-5

36
00:01:49.300 --> 00:01:52.780
is a reasoning model, and the default, if

37
00:01:52.780 --> 00:01:57.320
you don't set anything, contrary to what many people believe,

38
00:01:57.500 --> 00:01:59.240
is that it's set to medium.

39
00:01:59.700 --> 00:02:01.980
There's low, high, medium, and then there's something

40
00:02:01.980 --> 00:02:05.920
called minimal, which is the lowest of them

41
00:02:05.920 --> 00:02:06.140
all.

42
00:02:07.050 --> 00:02:11.240
But right now, we have used medium, which

43
00:02:11.240 --> 00:02:15.320
means when I write out this answer and

44
00:02:15.320 --> 00:02:17.980
show the token count, we can see that

45
00:02:17.980 --> 00:02:23.600
we used 19 tokens for input, but 623

46
00:02:23.600 --> 00:02:27.640
for output, and 512 of these for reasoning.

47
00:02:28.980 --> 00:02:31.080
And reasoning is good if you have a

48
00:02:31.080 --> 00:02:32.920
very hard question, but if it's an easy

49
00:02:32.920 --> 00:02:38.900
question, then it's not smart to just use

50
00:02:38.900 --> 00:02:39.280
reasoning.

51
00:02:39.420 --> 00:02:42.820
That's the reason why ChatGPT-5, if you

52
00:02:42.820 --> 00:02:44.460
give it an easy question, it can answer

53
00:02:44.460 --> 00:02:46.200
quickly, but if you give it a hard

54
00:02:46.200 --> 00:02:47.900
question, it takes longer.

55
00:02:48.740 --> 00:02:50.800
And that's because it's turning on and off

56
00:02:50.800 --> 00:02:51.340
the reasoning.

57
00:02:52.640 --> 00:02:54.620
With the APIs, you don't get that.

58
00:02:54.720 --> 00:02:57.260
It doesn't happen automatically, so you are in

59
00:02:57.260 --> 00:02:57.760
control.

60
00:02:57.760 --> 00:03:01.380
But if you just don't care and just

61
00:03:01.380 --> 00:03:04.820
run the models, you are actually using the

62
00:03:04.820 --> 00:03:09.060
medium reasoning capability.

63
00:03:10.040 --> 00:03:12.220
And that might not be what you want,

64
00:03:12.780 --> 00:03:16.320
because first of all, it takes longer to

65
00:03:16.320 --> 00:03:20.900
give an answer, and second, every token counts,

66
00:03:21.160 --> 00:03:25.960
so you're spending more money on some easy

67
00:03:25.960 --> 00:03:26.560
questions.

68
00:03:26.560 --> 00:03:31.370
So let's see how we can control this.

69
00:03:31.750 --> 00:03:37.870
And this is unfortunately quite cumbersome in my

70
00:03:37.870 --> 00:03:41.390
mind here, because what we do is we

71
00:03:41.390 --> 00:03:43.870
make our agent, but then we need to

72
00:03:43.870 --> 00:03:47.130
give our options to that, where we need

73
00:03:47.130 --> 00:03:49.570
to give a ChatClientAgentOptions, and

74
00:03:49.570 --> 00:03:51.990
inside that we need to give ChatOptions.

75
00:03:52.930 --> 00:03:55.430
And in that, we give a new option

76
00:03:55.430 --> 00:04:03.390
where we set this RawRepresentationFactory, which

77
00:04:03.390 --> 00:04:06.730
can finally set the reasoning effort level to,

78
00:04:06.850 --> 00:04:07.770
in my case, minimal.

79
00:04:08.430 --> 00:04:14.210
But we have the options of minimal, low,

80
00:04:17.660 --> 00:04:25.480
medium, which is the default, or high.

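NOTE
The nesting just described could look roughly like this. It is a sketch, assuming the Microsoft.Agents.AI,
Microsoft.Extensions.AI and OpenAI packages; type names follow the preview packages and may differ in your version.
The resulting options object is what you would pass when creating the agent.
```csharp
using Microsoft.Agents.AI;
using Microsoft.Extensions.AI;
using OpenAI.Chat;
// Outer layer: agent options. Middle layer: ChatOptions. Inner layer: OpenAI-specific options.
var agentOptions = new ChatClientAgentOptions
{
    ChatOptions = new ChatOptions
    {
        // The factory returns the provider's own options object for each request.
        RawRepresentationFactory = _ => new ChatCompletionOptions
        {
#pragma warning disable OPENAI001 // some SDK versions mark this property as experimental
            ReasoningEffortLevel = new ChatReasoningEffortLevel("minimal") // or "low", "medium", "high"
#pragma warning restore OPENAI001
        }
    }
};
```
The effort value is a plain string here, which is why a typo only shows up at request time.
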
81
00:04:26.540 --> 00:04:30.660
And this is specific for OpenAI and Azure

82
00:04:30.660 --> 00:04:31.100
OpenAI.

83
00:04:31.500 --> 00:04:35.460
Other models, like Claude and Gemini, do this

84
00:04:35.460 --> 00:04:36.980
in completely different ways.

85
00:04:37.560 --> 00:04:40.000
So what I'm showing is something that only

86
00:04:40.000 --> 00:04:42.520
works with OpenAI and Azure OpenAI, and you need

87
00:04:42.520 --> 00:04:44.900
to go in and check the documentation of the

88
00:04:44.900 --> 00:04:48.000
other models on how you set

89
00:04:48.000 --> 00:04:48.760
that reasoning effort.

90
00:04:52.370 --> 00:04:56.690
So if we do this, we can actually

91
00:04:56.690 --> 00:04:58.150
get a response.

92
00:04:58.350 --> 00:04:59.470
And in my case, I've set it to

93
00:04:59.470 --> 00:05:02.350
minimal, which is a new effort they recently

94
00:05:02.350 --> 00:05:06.430
introduced, because of this problem that everyone is

95
00:05:06.430 --> 00:05:10.110
going to GPT-5, but they also need it

96
00:05:10.110 --> 00:05:12.470
for simple answers back.

97
00:05:13.210 --> 00:05:16.090
And for that, minimal is the best for

98
00:05:16.090 --> 00:05:16.830
a reasoning model.

99
00:05:17.640 --> 00:05:20.670
So if we ask exactly the same question,

100
00:05:21.670 --> 00:05:24.370
first of all, we should get our answer

101
00:05:24.370 --> 00:05:27.510
back faster.

102
00:05:29.740 --> 00:05:32.860
Roughly the same answer, but you can see

103
00:05:32.860 --> 00:05:37.940
the output tokens are now 64, and zero

104
00:05:37.940 --> 00:05:38.920
was used for reasoning.

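NOTE
A small worked comparison of the two runs, using only the token counts quoted in the video.
The variable names are illustrative.
```csharp
using System;
// Token counts quoted in the video for the same question.
(int Output, int Reasoning) medium = (623, 512);
(int Output, int Reasoning) minimal = (64, 0);
// Tokens that ended up in the visible answer.
int mediumVisible = medium.Output - medium.Reasoning;    // 111
int minimalVisible = minimal.Output - minimal.Reasoning; // 64
// Share of the medium run's output that was spent on reasoning.
double reasoningShare = (double)medium.Reasoning / medium.Output;
Console.WriteLine($"Visible tokens: medium {mediumVisible}, minimal {minimalVisible}");
Console.WriteLine($"Reasoning share at medium: {reasoningShare:P0}");
Console.WriteLine($"Output tokens saved: {medium.Output - minimal.Output}");
```
Most of the medium run's output went to reasoning rather than to the answer itself.
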
105
00:05:42.300 --> 00:05:48.000
And that, of course, is much cheaper and

106
00:05:48.000 --> 00:05:49.980
much faster compared to up here.

107
00:05:50.620 --> 00:05:52.200
And then you need, of course, to begin

108
00:05:52.200 --> 00:05:56.040
to evaluate, is this answer better because we

109
00:05:56.040 --> 00:05:57.800
were thinking than this answer?

110
00:05:58.240 --> 00:06:00.060
And that's not the point here.

111
00:06:01.280 --> 00:06:04.460
You need to know in more real scenarios

112
00:06:04.460 --> 00:06:06.440
if you need reasoning or not.

113
00:06:07.140 --> 00:06:11.000
So you definitely need this if you want

114
00:06:11.000 --> 00:06:13.620
to use GPT-5 for some simple stuff.

115
00:06:13.960 --> 00:06:16.600
And you definitely need this if you need

116
00:06:16.600 --> 00:06:18.600
to set it up to, for example, high

117
00:06:18.600 --> 00:06:20.760
and use even more tokens because it's a

118
00:06:20.760 --> 00:06:25.900
very advanced thing, like code reviews or something

119
00:06:25.900 --> 00:06:27.060
that is more advanced.

120
00:06:30.320 --> 00:06:32.060
As I mentioned, this is cumbersome.

121
00:06:32.200 --> 00:06:35.500
I have actually asked the Agent Framework team

122
00:06:35.500 --> 00:06:38.660
about whether they want to do something more

123
00:06:38.660 --> 00:06:39.400
about this.

124
00:06:39.860 --> 00:06:41.520
And right now, the answer is no.

125
00:06:41.740 --> 00:06:43.320
This is the way you do it.

126
00:06:44.520 --> 00:06:48.460
And the reason for it is that all

127
00:06:48.460 --> 00:06:50.400
the different models do it in different ways,

128
00:06:50.480 --> 00:06:52.660
and they don't really want to settle

129
00:06:52.660 --> 00:06:57.460
on one specific way with a string here

130
00:06:57.460 --> 00:06:59.720
because others need, say, max allowed

131
00:06:59.720 --> 00:07:01.940
tokens and stuff like that.

132
00:07:01.940 --> 00:07:04.680
So right now, they have said, we're not

133
00:07:04.680 --> 00:07:07.420
going to change it.

134
00:07:07.680 --> 00:07:09.380
Again, there's nothing wrong.

135
00:07:09.520 --> 00:07:10.340
It's just cumbersome.

136
00:07:13.160 --> 00:07:17.220
So this is what we need to live

137
00:07:17.220 --> 00:07:17.460
with.

138
00:07:17.960 --> 00:07:20.980
And, of course, what I have done in

139
00:07:20.980 --> 00:07:23.360
my case is I have abstracted this away

140
00:07:23.360 --> 00:07:26.720
because I'm beginning to find more and more

141
00:07:26.720 --> 00:07:31.120
things that are not strange, but cumbersome with

142
00:07:31.120 --> 00:07:34.000
the CreateAIAgent, there are multiple ways of

143
00:07:34.000 --> 00:07:35.900
doing it and stuff like that.

144
00:07:36.640 --> 00:07:39.520
So I have made my own CreateAIAgent

145
00:07:39.520 --> 00:07:43.620
for Azure OpenAI and for OpenAI.

146
00:07:44.780 --> 00:07:46.360
And these are in the repo as well

147
00:07:46.360 --> 00:07:47.900
if you want to copy them.

148
00:07:48.420 --> 00:07:50.300
And at some point, I might release them

149
00:07:50.300 --> 00:07:51.340
as a NuGet package.

150
00:07:52.380 --> 00:07:54.480
But they're down here in my extensions.

151
00:07:54.480 --> 00:07:58.820
And what I have made is just a

152
00:07:58.820 --> 00:08:00.360
copy of them with a new name.

153
00:08:02.960 --> 00:08:05.440
And here, I'm putting in the reasoning effort

154
00:08:05.440 --> 00:08:09.440
so we can abstract this away.

155
00:08:10.280 --> 00:08:13.280
And I also fixed that there is actually

156
00:08:13.280 --> 00:08:15.780
a bug in the current one.

157
00:08:15.980 --> 00:08:19.060
If you set both reasoning and tools at

158
00:08:19.060 --> 00:08:21.260
the same time, this is fixing that as

159
00:08:21.260 --> 00:08:21.440
well.

160
00:08:23.060 --> 00:08:25.520
So it's annoying that we need this, but

161
00:08:25.520 --> 00:08:29.020
I understand the team needs to be generic

162
00:08:29.020 --> 00:08:31.640
and work with every single AI model out

163
00:08:31.640 --> 00:08:35.679
there while I can make these for Azure

164
00:08:35.679 --> 00:08:38.360
OpenAI and OpenAI, which work the same.

165
00:08:40.659 --> 00:08:42.820
So in my case here, I can then

166
00:08:42.820 --> 00:08:44.280
– and this is what I'm going to

167
00:08:44.280 --> 00:08:46.800
do in my code – is I'm going

168
00:08:46.800 --> 00:08:50.580
to use these new overloads of CreateAIAgent

169
00:08:50.580 --> 00:08:54.360
in order to get something as simple as

170
00:08:54.360 --> 00:08:54.700
this.

171
00:08:55.300 --> 00:08:57.500
And it will give exactly the same answer

172
00:08:57.500 --> 00:08:58.860
as this one up here.

173
00:09:01.620 --> 00:09:05.420
So depending on what you want to do,

174
00:09:05.940 --> 00:09:07.520
you choose yours.

175
00:09:08.080 --> 00:09:10.320
But just be aware that if you use

176
00:09:10.320 --> 00:09:13.760
it raw, it is a bit extra to

177
00:09:13.760 --> 00:09:16.060
write in here in order to do it.

178
00:09:16.060 --> 00:09:20.720
So do some kind of abstraction away from

179
00:09:20.720 --> 00:09:20.880
it.

180
00:09:20.920 --> 00:09:25.600
Make a helper method, extension method, in my

181
00:09:25.600 --> 00:09:25.840
mind.

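NOTE
One way to hide the nesting behind a helper, as suggested above. This is a hypothetical sketch, not the
extension methods from the author's repo; the class and method names are made up for illustration, and it
assumes the Microsoft.Extensions.AI and OpenAI packages.
```csharp
using Microsoft.Extensions.AI;
using OpenAI.Chat;
// Hypothetical helper, not part of the Agent Framework or the author's repo.
public static class ReasoningChatOptions
{
    // Builds ChatOptions whose raw OpenAI options carry the given effort ("minimal", "low", "medium", "high").
    public static ChatOptions WithReasoningEffort(string effort) => new ChatOptions
    {
        RawRepresentationFactory = _ => new ChatCompletionOptions
        {
#pragma warning disable OPENAI001 // some SDK versions mark this property as experimental
            ReasoningEffortLevel = new ChatReasoningEffortLevel(effort)
#pragma warning restore OPENAI001
        }
    };
}
```
Call sites then shrink to something like ChatOptions = ReasoningChatOptions.WithReasoningEffort("minimal").
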
182
00:09:26.900 --> 00:09:28.680
But that's everything there is to reasoning.

183
00:09:29.040 --> 00:09:31.400
The most important thing is, of course, that

184
00:09:31.400 --> 00:09:34.640
you remember that any reasoning model by default

185
00:09:34.640 --> 00:09:35.460
is medium.

186
00:09:36.420 --> 00:09:39.600
And if you don't do anything about this,

187
00:09:39.720 --> 00:09:43.160
this will cost your users a lot of

188
00:09:43.160 --> 00:09:45.980
time and you a lot of money based

189
00:09:45.980 --> 00:09:50.690
on the tokens, if you aren't aware

190
00:09:50.690 --> 00:09:51.430
of it at least.

191
00:09:51.970 --> 00:09:52.730
So we're done.

192
00:09:53.270 --> 00:09:53.710
Thank you.

193
00:09:53.970 --> 00:09:54.790
See you in the next one.
