WEBVTT

1
00:00:00.000 --> 00:00:04.800
Hi, and welcome back to this AI in C# video on the Microsoft Agent

2
00:00:04.800 --> 00:00:10.359
Framework and our Reasoning Deep Dive mini-series. This is the third

3
00:00:10.359 --> 00:00:15.680
installment of this, where we had the introduction as part one and OpenAI as

4
00:00:15.680 --> 00:00:23.879
part two. So now we're going to look into how Google does reasoning. So we have

5
00:00:23.879 --> 00:00:30.120
exactly the same setup as OpenAI here, but now we're just using Google instead.

6
00:00:30.120 --> 00:00:41.080
And all Google's modern models, meaning 2.5 and forward, can do thinking. And they

7
00:00:41.080 --> 00:00:46.319
do thinking in a slightly different manner, in that they are more auto

8
00:00:46.319 --> 00:00:55.880
thinking and much easier to work with. I really like the approach to thinking in

9
00:00:55.880 --> 00:01:03.560
the Gemini models, especially version 3 of them. Because what I have

10
00:01:03.560 --> 00:01:10.199
here is, I first have a baseline of Gemini 2.5 with auto thinking on,

11
00:01:10.199 --> 00:01:18.360
meaning that we just create the Google GenAI client, make the chat client, and

12
00:01:18.360 --> 00:01:25.160
use Gemini 2.5 Flash. We could also use the Pro model, but they work in the

13
00:01:25.160 --> 00:01:31.000
same manner. I'm just using Flash so we don't sit and wait too long. But if we do

14
00:01:31.000 --> 00:01:36.120
this for Gemini and we have our question, which is: answer back in

15
00:01:36.120 --> 00:01:40.879
three words, what's the capital of France and how many people live there, in

16
00:01:40.879 --> 00:01:46.319
order for it to have a little bit to think about. So if we go in here and we

17
00:01:46.319 --> 00:01:54.440
see how Gemini 2.5 does, in this case Flash, just out of the box and

18
00:01:54.440 --> 00:02:02.000
nothing more. So if I press F10, we can see that it's still thinking quite a lot

19
00:02:02.000 --> 00:02:12.119
about it and we can see that it used 715 tokens to do it. This is quite high and I

20
00:02:12.119 --> 00:02:16.440
would definitely go in and steer this model some more and we will see that

21
00:02:16.440 --> 00:02:28.600
happening a little later on. But if we do the same baseline with Gemini 3, so

22
00:02:28.600 --> 00:02:34.520
exactly the same code as baseline 2.5, but the only thing different being that

23
00:02:34.520 --> 00:02:45.679
it's now the Gemini 3 Flash version. If we do that and we just press F10 here

24
00:02:45.679 --> 00:02:55.119
to run that code, we'll see that it's much faster to answer. Because it has

25
00:02:55.119 --> 00:03:01.080
internally figured out better that this was not a hard question and for that

26
00:03:01.080 --> 00:03:09.399
reason it didn't use a lot of reasoning. It still used some, but it didn't need to

27
00:03:09.399 --> 00:03:19.039
do a lot because of that. So Gemini is much more advanced, or rather Gemini 3 is much

28
00:03:19.119 --> 00:03:25.320
more advanced than 2.5 when it comes to this. This is to a level where I actually

29
00:03:25.320 --> 00:03:30.399
think we don't really need to steer too much in general. You might want to say,

30
00:03:30.399 --> 00:03:36.600
oh, it's a really, really advanced thing, but otherwise it's sort of acceptable that

31
00:03:36.600 --> 00:03:47.639
we use quite a lot of output tokens for reasoning. Then if we want to actually

32
00:03:47.759 --> 00:03:53.679
control things, in Gemini 2.5 we do that by setting something called a

33
00:03:53.679 --> 00:03:59.399
thinking budget. Because in OpenAI you set low, medium, high and so on. In 2.5

34
00:03:59.399 --> 00:04:06.440
you set the maximum number of tokens it is allowed to use. And again we need to go

35
00:04:06.440 --> 00:04:13.919
into this annoying extra step of breaking glass, all the way down to the raw

36
00:04:13.960 --> 00:04:19.119
representation factory, where we can set the generate content config. And in

37
00:04:19.119 --> 00:04:24.839
that config we can set the thinking config, where we set the thinking budget. And

38
00:04:24.839 --> 00:04:31.200
here we can also control whether it should include the thoughts back or not. So in

39
00:04:31.200 --> 00:04:40.040
OpenAI it's between auto, concise and detailed. In Google you need to just say

40
00:04:40.040 --> 00:04:45.559
if you want it or not. You can't really control more than that. There's also something called

41
00:04:45.559 --> 00:04:51.160
thinking level, but that is not supported by Gemini 2.5. It's only something for

42
00:04:51.160 --> 00:04:57.920
Gemini 3, which we will see in a second. So we go in and set a number here. And that

43
00:04:57.920 --> 00:05:04.040
number can mean different things. If you set it to minus one then it means auto,

44
00:05:04.040 --> 00:05:09.559
meaning the same as we saw up here without us setting anything. We can also

45
00:05:09.559 --> 00:05:17.119
set it to zero, meaning off, meaning it will not use any reasoning at all. Or we

46
00:05:17.119 --> 00:05:21.720
set it to a number higher than zero and that would be how many tokens we max

47
00:05:21.720 --> 00:05:28.959
allow for reasoning. In this case we're setting a high number, 2,000. And the

48
00:05:28.959 --> 00:05:34.760
reason for that becomes clear if we run this and let it run.

49
00:05:35.760 --> 00:05:45.200
And again it will think a lot like the first one. And let me quickly make it

50
00:05:45.200 --> 00:05:53.480
move down here so we can see. This ended up using more tokens than the first one,

51
00:05:53.480 --> 00:05:59.559
but it didn't use all the tokens. So there's still some auto reasoning in

52
00:05:59.640 --> 00:06:08.119
here, where it says, okay, I'm allowed to use 2,000, but I don't need to because I

53
00:06:08.119 --> 00:06:15.480
can answer this question without using every single token at my disposal. So

54
00:06:15.480 --> 00:06:23.399
this just means it is allowed a maximum of 2,000, and it thought in this case that 1,165

55
00:06:23.399 --> 00:06:29.320
was the right number. And we can see the reasoning text coming back and it works

56
00:06:29.320 --> 00:06:36.880
exactly the same as with OpenAI; the Microsoft Agent Framework is really good at

57
00:06:36.880 --> 00:06:44.320
packaging this so it always goes into these TextReasoningContent items, which is

58
00:06:44.320 --> 00:06:51.279
really nice. So we saw it thought a little less than OpenAI if you saw that

59
00:06:51.279 --> 00:06:59.559
video, but it still came back with the correct answer. If we do the same,

60
00:06:59.559 --> 00:07:08.519
and begin to control Gemini 3, we now do it in a slightly different way.

61
00:07:08.519 --> 00:07:12.880
It's different because now we are not setting the

62
00:07:12.880 --> 00:07:18.559
thinking budget, but we are setting the thinking level. We can still set the

63
00:07:18.559 --> 00:07:25.079
thinking budget for backwards compatibility reasons, but we are

64
00:07:25.079 --> 00:07:30.760
recommended not to by the Google documentation. And if we set both, it

65
00:07:30.760 --> 00:07:41.920
will actually produce an exception. So in the Gemini 3 model family, we set the

66
00:07:41.920 --> 00:07:50.079
thinking level instead. And Gemini 3 Pro can only be set to high or low. As I read

67
00:07:50.079 --> 00:07:55.359
the documentation, they're not done yet, so they might add more

68
00:07:55.359 --> 00:08:00.519
levels at some point. But for the Pro model, you can only choose between high

69
00:08:00.519 --> 00:08:05.880
and low, while for Flash you can choose between high, low, medium, and I think it's

70
00:08:05.880 --> 00:08:12.440
minimal as well. So it's nice to see that this is slowly happening,

71
00:08:12.440 --> 00:08:20.119
because it sort of shows a way toward perhaps an industry standard that these

72
00:08:20.119 --> 00:08:27.519
high, medium, low levels and so on are the way to go. So we also

73
00:08:27.519 --> 00:08:33.599
can get rid of needing to do these very, very elaborate ways of setting these

74
00:08:33.599 --> 00:08:45.000
values. But we're not there yet, and if I do this in high mode, we will see some

75
00:08:45.000 --> 00:08:52.119
interesting things. It will take longer, but still not too much longer.

76
00:08:52.119 --> 00:08:59.400
So we see it went in and did its reasoning, and we get our reasoning text. You

77
00:08:59.400 --> 00:09:04.919
can pause the video if you want to read what it thought about. And then we get

78
00:09:04.919 --> 00:09:11.200
our output, where we can see that despite us saying, hey, you're allowed to think a

79
00:09:11.200 --> 00:09:18.599
lot about this, Gemini 3 is intelligent enough to know that, yeah, I don't need to.

80
00:09:18.599 --> 00:09:24.280
You might have given me the allowance to do it, but I don't need to because I know

81
00:09:24.400 --> 00:09:32.760
the answer pretty well. So I only use a little more than we did up here, and then

82
00:09:32.760 --> 00:09:38.520
I was sure, speaking as the model right now, and just give the answer back.

83
00:09:38.520 --> 00:09:47.320
So no matter how much we steer Gemini, we still get some fairly quick

84
00:09:47.320 --> 00:09:56.400
answers back, which I really like. Moving on, I can do exactly the same in the

85
00:09:56.400 --> 00:10:00.840
Agent Framework Toolkit, and so can you if you go and grab the free NuGet

86
00:10:00.840 --> 00:10:08.520
packages. And it works exactly the same. As we saw with OpenAI, we make a

87
00:10:08.520 --> 00:10:13.719
factory, in this case a Google agent factory. And to get a Google agent, we can

88
00:10:13.760 --> 00:10:17.960
set a thinking budget, and we can have a thinking level. But again, the thinking

89
00:10:17.960 --> 00:10:25.200
level is not supported, or rather, both of them at the same time is not supported. It's

90
00:10:25.200 --> 00:10:32.000
just a more convenient way of doing exactly the same thing as we saw before.

91
00:10:32.000 --> 00:10:39.119
And again, I have the reasoning.getReasoningContent extension method,

92
00:10:39.119 --> 00:10:46.239
so we don't need to sit and do all these annoying for loops as well. But we end up

93
00:10:46.239 --> 00:10:56.640
with exactly the same way of responding back. And the same goes for Gemini 3,

94
00:10:56.640 --> 00:11:05.280
where we can set the thinking level as we saw before. And it comes back fairly

95
00:11:05.280 --> 00:11:15.200
quickly, and we can just get our reasoning. So that's Gemini. I would say

96
00:11:15.200 --> 00:11:26.239
Gemini is probably the most likable in the way this works, and their way of trying to go

97
00:11:26.239 --> 00:11:32.440
to the same model as OpenAI also points toward this industry standard. But

98
00:11:32.440 --> 00:11:37.200
they are really the most intelligent right now, as I see it, when it comes to

99
00:11:37.200 --> 00:11:41.599
reasoning. Because OpenAI is more crude, you actually need to go and say, hey,

100
00:11:41.599 --> 00:11:48.760
don't think so much, even on the simple questions, while Google just goes in and

101
00:11:48.760 --> 00:11:54.840
says, yeah, it's okay, I'm allowed to think a lot, but I don't need to. So for that

102
00:11:54.840 --> 00:11:59.679
reason, I would say they are in front when it comes to the reasoning now,

103
00:11:59.880 --> 00:12:07.119
compared to OpenAI, who invented it. So that's everything for Google. In the last

104
00:12:07.119 --> 00:12:12.320
installment of this deep dive, we are going to show

105
00:12:12.719 --> 00:12:17.119
Anthropic.