WEBVTT

1
00:00:00.000 --> 00:00:08.199
Hi, and welcome to this last installment of the Reasoning Deep Dive mini-series for AI

2
00:00:08.199 --> 00:00:11.720
and C# and the Microsoft Agent Framework.

3
00:00:11.720 --> 00:00:16.920
We have already looked at an intro, and at how OpenAI and Google do it.

4
00:00:16.920 --> 00:00:23.120
Now we're going to look at Anthropic and how their Claude models do it.

5
00:00:23.120 --> 00:00:35.319
So let's jump into Visual Studio and go to the ReasoningEffort.Anthropic sample.

6
00:00:35.319 --> 00:00:43.799
So with Anthropic, all modern models can do thinking. There's nothing special; they don't have reasoning

7
00:00:43.799 --> 00:00:48.159
models and non-reasoning models, all of them can do it.

8
00:00:48.159 --> 00:00:55.639
And in this one, I'm going to ask a slightly more advanced question.

9
00:00:55.639 --> 00:01:04.400
And the main reason for that is to show you how even a simple question, once it's

10
00:01:04.400 --> 00:01:12.879
a little more complex, can go wrong when there isn't any thinking.

11
00:01:12.879 --> 00:01:19.519
Because out of the box, Anthropic doesn't do any thinking with this baseline.

12
00:01:19.519 --> 00:01:25.639
So if I do this, and you can see the question is now, what is the capital of the country

13
00:01:25.639 --> 00:01:33.400
where the Eiffel Tower sits, and how is it to live there, and how many people live there?

14
00:01:33.400 --> 00:01:35.360
Answer back in max three words.

15
00:01:35.360 --> 00:01:43.839
So if we look at the baseline, we are again using these chat options, but that is not

16
00:01:43.839 --> 00:01:49.639
because we're setting anything default here, but because we always need to set this max

17
00:01:49.639 --> 00:01:54.080
output tokens on Claude models, which is really annoying.

18
00:01:54.080 --> 00:02:00.019
But again, I will later show the Agent Framework Toolkit, which makes it easier.

19
00:02:00.019 --> 00:02:06.500
But we need to set it, and for that reason, we set this up, but otherwise we just run it and

20
00:02:06.500 --> 00:02:09.860
don't set anything for thinking.

21
00:02:09.860 --> 00:02:20.380
And if we do that and let it run, now it runs, and you can see it's very

22
00:02:20.380 --> 00:02:26.179
fast to answer back, zero reasoning.

23
00:02:26.179 --> 00:02:31.660
And that is because Anthropic doesn't report reasoning tokens back, so this will be zero all the

24
00:02:31.660 --> 00:02:37.539
way through, no matter how much reasoning we do, because they simply don't report it back in

25
00:02:37.539 --> 00:02:40.820
their LLM calls.

26
00:02:40.820 --> 00:02:46.100
And we might say, oh, that's good, out of the box, there's no reasoning, I can choose

27
00:02:46.100 --> 00:02:50.000
whenever I want to have it more advanced.

28
00:02:50.000 --> 00:02:55.380
But you already need to have it more advanced, because look at the answer back.

29
00:02:55.380 --> 00:02:59.619
It says Paris, France, expensive, vibrant, 2.2 million.

30
00:02:59.619 --> 00:03:08.179
The issue is not the 2.2 (the others said 2.1), but that it's not following the rules.

31
00:03:08.179 --> 00:03:14.179
And Anthropic is not too good at following rules in general, but it actually uses more

32
00:03:14.179 --> 00:03:16.979
than three words.

33
00:03:16.979 --> 00:03:21.940
So you can see there's five words here.

34
00:03:21.940 --> 00:03:28.860
So with no thinking and a slightly harder question, it fails; had I not put in the extra

35
00:03:28.860 --> 00:03:34.979
thing, how it is to live there and so on, it would still have used three words.

36
00:03:34.979 --> 00:03:43.500
But this is where reasoning comes in, because if we run this again, and this works a little

37
00:03:43.500 --> 00:03:52.139
like Gemini 2.5, where we set max allowed tokens; in this case, it's called budget tokens.

38
00:03:52.139 --> 00:04:00.699
So if we do that, we, next to the max tokens, also set our thinking, where we need to set

39
00:04:00.699 --> 00:04:09.100
some message creation parameters, set the max tokens, set the model again, so we need

40
00:04:09.100 --> 00:04:12.580
to set the model twice now.

41
00:04:12.660 --> 00:04:18.899
And then we have something called thinking config param, where we need to say it's enabled

42
00:04:18.899 --> 00:04:26.339
and how many tokens we allow, and the minimum is 1,024.

43
00:04:26.339 --> 00:04:37.739
But if we do this, and let's set a breakpoint up here after it has called this, it will

44
00:04:37.739 --> 00:04:40.140
begin to take time.

45
00:04:40.140 --> 00:04:44.619
But now it can actually understand that it needs to do it in three words.

46
00:04:44.619 --> 00:04:54.179
So without reasoning, at least this very cheap model of Claude is too unintelligent to even

47
00:04:54.179 --> 00:05:01.179
follow the rule of answering back in max three words.

48
00:05:01.179 --> 00:05:08.299
But if we give it a budget, meaning throw more money at it, it is possible for it to use

49
00:05:08.299 --> 00:05:10.700
the three words.

50
00:05:10.700 --> 00:05:19.100
And the reasoning also shows that, down here it says, but they want it back in three words

51
00:05:19.100 --> 00:05:22.299
max, but let's think about what to prioritize.

52
00:05:22.299 --> 00:05:30.179
So it already thought about how it could answer in a more elaborate way, but oh yeah, it checks:

53
00:05:30.179 --> 00:05:35.899
that's three words, that's three words, that's three words, which one of them should I use,

54
00:05:35.899 --> 00:05:40.779
and goes on talking back and forth in order to figure it out.

55
00:05:40.779 --> 00:05:49.059
And we see that it's reported back as just output tokens. Again, it's zero reasoning,

56
00:05:49.059 --> 00:05:54.220
but some of these were reasoning; they just don't tell us how much is reasoning and how much

57
00:05:54.220 --> 00:05:57.059
is not, which is annoying.

58
00:05:57.059 --> 00:06:03.779
But that's just the way they do things.

59
00:06:03.779 --> 00:06:11.859
So we can do this, and as we can see, even with such a simple question, reasoning is

60
00:06:11.859 --> 00:06:20.940
actually kind of needed in Anthropic, meaning it's kind of annoying that they default to zero.

61
00:06:20.940 --> 00:06:26.940
Because before we thought, yeah, that's nice, but if they can't even follow rules like this,

62
00:06:26.940 --> 00:06:30.619
we need to begin to put all this in all the time.

63
00:06:30.619 --> 00:06:36.779
And for that reason, again, the Agent Framework Toolkit, my NuGet packages for that, makes it

64
00:06:36.779 --> 00:06:44.500
very simple, and we make an Anthropic agent, we make the agent, and we just set the budget

65
00:06:44.500 --> 00:06:48.579
tokens here directly.

66
00:06:48.579 --> 00:06:56.619
And again, we can get our reasoning content back with the extension method.

67
00:06:56.619 --> 00:07:01.500
So once it's thought about it, it should hopefully do the exact same.

68
00:07:01.500 --> 00:07:06.760
Now we're going from beautiful to expensive, but fair enough, and it forgot.

69
00:07:06.760 --> 00:07:16.299
So even 2,000 tokens is not enough for it to remember that it also needed to answer with the 2.2.

70
00:07:16.299 --> 00:07:24.619
Again, that's life with Anthropic: they're intelligent, they're good at coding, but following rules,

71
00:07:24.619 --> 00:07:33.339
that's not their strong suit.

72
00:07:33.339 --> 00:07:40.600
And yes, it comes back still as output tokens instead of reasoning tokens.

73
00:07:40.600 --> 00:07:48.540
So Anthropic is not my preferred choice, to be honest.

74
00:07:48.540 --> 00:07:56.500
And I really like my models to be able to follow rules, and they certainly are not good at it.

75
00:07:56.500 --> 00:08:03.820
Only by throwing a lot of reasoning at it do they begin to become good at it,

76
00:08:03.820 --> 00:08:04.980
which is annoying.

77
00:08:04.980 --> 00:08:10.179
And again, annoying to set all this up.

78
00:08:10.179 --> 00:08:14.899
Even if we use the Agent Framework Toolkit, we need to put in extra lines just to make

79
00:08:14.899 --> 00:08:18.019
it something that can actually talk.

80
00:08:18.019 --> 00:08:22.179
And now, of course, I'm choosing the very small model.

81
00:08:22.179 --> 00:08:26.179
The Opus and Sonnet models are much more capable.

82
00:08:26.179 --> 00:08:33.900
So of course, you can make an Anthropic agent tell the truth and follow rules, but it just

83
00:08:33.900 --> 00:08:39.780
takes more effort and more money and more time.

84
00:08:39.780 --> 00:08:43.659
So that is all three model providers.

85
00:08:43.659 --> 00:08:48.659
If we talk a little more about the other models, there are things like Grok.

86
00:08:48.659 --> 00:08:57.580
Grok uses the OpenAI API format, but if you try to set the various reasoning efforts on it, it just

87
00:08:57.580 --> 00:08:59.460
throws an exception.

88
00:08:59.460 --> 00:09:06.419
They talk a little about having internal things for setting reasoning, but it seems to not

89
00:09:06.419 --> 00:09:12.460
be possible to make it work together with the Agent Framework.

90
00:09:13.260 --> 00:09:18.619
In the same manner, if you take something like DeepSeek via OpenRouter or one of the

91
00:09:18.619 --> 00:09:25.859
others that expose DeepSeek, you can again set these reasoning efforts, but it seems

92
00:09:25.859 --> 00:09:33.820
to just ignore them and use its own level of reasoning.

93
00:09:33.820 --> 00:09:41.900
So it doesn't really seem you can do much more than the baseline there, at least if you

94
00:09:41.900 --> 00:09:45.659
want to use the Agent Framework.

95
00:09:45.659 --> 00:09:53.820
So in summary of these four videos, reasoning is important.

96
00:09:53.820 --> 00:09:56.780
Too many people don't know why it's important.

97
00:09:56.780 --> 00:10:04.299
I don't want to make my users sit and wait for the answer back.

98
00:10:04.299 --> 00:10:07.580
I don't want to spend more tokens than needed.

99
00:10:07.580 --> 00:10:15.219
So I really, really want to control it, especially because I'm using default OpenAI, and I really

100
00:10:15.219 --> 00:10:23.780
need to steer that with the minimal and low efforts for most of the things, only bumping

101
00:10:23.780 --> 00:10:27.820
it up when I know it's really, really important.

102
00:10:27.820 --> 00:10:35.900
Whereas if I switched to Google, it sounds like I could have a much easier time with its new

103
00:10:35.940 --> 00:10:44.059
auto-reasoning capabilities, which are really, really cool from a technical perspective.

104
00:10:44.059 --> 00:10:52.219
Google seems to be ahead, and then comes OpenAI because it can follow the rules, and then

105
00:10:52.219 --> 00:11:00.700
Anthropic, in my mind, is behind on this in quality of reasoning.

106
00:11:00.700 --> 00:11:04.239
But that's everything from this miniseries.

107
00:11:04.239 --> 00:11:10.799
If you want to see more or have any questions, feel free to put them in the comments. See you around.