WEBVTT

1
00:00:00.000 --> 00:00:06.000
Hi, and welcome to this AI and C# video on the Microsoft Agent Framework.

2
00:00:06.000 --> 00:00:12.000
Today we're going to revisit a topic that is old to me,

3
00:00:12.000 --> 00:00:16.000
but new to the Agent Framework, and that is agent skills.

4
00:00:16.000 --> 00:00:21.000
So agent skills now have an official implementation,

5
00:00:21.000 --> 00:00:25.000
while I, a couple of months ago, made my own implementation.

6
00:00:25.000 --> 00:00:29.000
So we're going to look at Agent Framework's new implementation,

7
00:00:29.000 --> 00:00:34.000
and compare it a little to what I have, to see if I should retire mine,

8
00:00:34.000 --> 00:00:40.000
or if the official implementation lacks some features.

9
00:00:40.000 --> 00:00:42.000
So let's get into it.

10
00:00:42.000 --> 00:00:46.000
And just a reminder of what agent skills are.

11
00:00:46.000 --> 00:00:54.000
It's a new format that is complementary to MCP servers,

12
00:00:54.000 --> 00:01:01.000
again, invented by Anthropic and open-sourced on agentskills.io.

13
00:01:01.000 --> 00:01:07.000
And it's a simple and open format that gives agents capabilities and expertise.

14
00:01:07.000 --> 00:01:13.000
And it's mostly for local agents because it requires a folder structure.

15
00:01:13.000 --> 00:01:17.000
So, for example, if you have a folder called agent skills,

16
00:01:17.000 --> 00:01:21.000
and you add three folders, employee handbook, secret formulas,

17
00:01:21.000 --> 00:01:23.000
and speak like a pirate.

18
00:01:23.000 --> 00:01:27.000
Each one of these needs to have a SKILL.md.

19
00:01:27.000 --> 00:01:29.000
That's part of the spec.

20
00:01:29.000 --> 00:01:33.000
And inside such a skill, you have a front matter up here.

21
00:01:33.000 --> 00:01:38.000
The name in the front matter needs to be exactly the same as the folder name

22
00:01:38.000 --> 00:01:41.000
in order to be a valid skill.

23
00:01:41.000 --> 00:01:45.000
Then you give it a description, and then you have an instruction body.

24
00:01:46.000 --> 00:01:53.000
And what happens is that LLMs and AI in general

25
00:01:53.000 --> 00:01:58.000
need to get just the short name and description,

26
00:01:58.000 --> 00:02:03.000
and then load the rest of the skill on demand,

27
00:02:03.000 --> 00:02:06.000
so it doesn't cost too many tokens if not used.
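
NOTE
A minimal sketch of the "name and description first, body on demand" idea described above, assuming the front matter is plain "key: value" lines between two "---" markers. The function names and the sample skill text are hypothetical illustrations, not the Agent Framework API.
```python
# Sketch only: a hypothetical loader, not Agent Framework's API.
def parse_skill(text):
    """Split SKILL.md text into (front-matter dict, instruction body)."""
    parts = text.split("---", 2)
    if len(parts) < 3 or parts[0].strip():
        raise ValueError("missing front matter")
    meta = {}
    for line in parts[1].strip().splitlines():
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta, parts[2].strip()
#
def catalog_entry(text):
    """What the model sees up front: name and description only."""
    meta, _body = parse_skill(text)
    return f"{meta['name']}: {meta['description']}"
#
# Hypothetical sample skill, written as one string for brevity.
SAMPLE = "---\nname: speak-like-a-pirate\ndescription: Answer in pirate speak.\n---\nAlways say Arr."
print(catalog_entry(SAMPLE))
print(parse_skill(SAMPLE)[1])
```
Only the catalog line would go into the prompt up front. The body is returned when the skill is actually loaded, so unused skills cost few tokens.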

28
00:02:07.000 --> 00:02:12.000
So we can just have a skill file, but we can also have extra files

29
00:02:12.000 --> 00:02:16.000
that can be reference files, other MD files, for example,

30
00:02:16.000 --> 00:02:20.000
assets in terms of pictures and stuff like that,

31
00:02:20.000 --> 00:02:24.000
but also scripts, in this case a Python script here.

32
00:02:25.000 --> 00:02:29.000
So, the idea is that you have this folder,

33
00:02:29.000 --> 00:02:35.000
you load it up into the system,

34
00:02:36.000 --> 00:02:40.000
and then the AI is going to use it in various ways.

35
00:02:40.000 --> 00:02:45.000
We'll see how Agent Framework does it in one second.

36
00:02:45.000 --> 00:02:51.000
So, if we have a look, it's under AI context provider

37
00:02:51.000 --> 00:02:55.000
in my sample repo because it's implemented using that.

38
00:02:56.000 --> 00:02:59.000
And I have this test folder as we just saw

39
00:02:59.000 --> 00:03:04.000
with the employee handbook, speak like a pirate,

40
00:03:04.000 --> 00:03:08.000
and a script for some secret formula.

41
00:03:10.000 --> 00:03:14.000
And the way Agent Framework has implemented it

42
00:03:14.000 --> 00:03:20.000
is they have a context provider called FileAgentSkillsProvider.

43
00:03:21.000 --> 00:03:24.000
And the only real thing we need to give to it is

44
00:03:24.000 --> 00:03:31.000
a list of different paths where you have skills.

45
00:03:32.000 --> 00:03:33.000
You can give some options.

46
00:03:33.000 --> 00:03:36.000
They're fairly limited and quite buggy at the moment.

47
00:03:37.000 --> 00:03:41.000
You should be able to change the instruction prompt,

48
00:03:42.000 --> 00:03:46.000
but if you try to do it, it just doesn't add a prompt

49
00:03:46.000 --> 00:03:49.000
for some reason, and breaks down.

50
00:03:49.000 --> 00:03:52.000
So there's some bug around that, but at some point,

51
00:03:52.000 --> 00:03:56.000
it should be able to help you set the instructions.

52
00:03:59.000 --> 00:04:03.000
Once we have that, we don't need to do anything more.

53
00:04:03.000 --> 00:04:06.000
In my case, I'm just adding a little extra tool

54
00:04:06.000 --> 00:04:10.000
to run Python scripts because that's not part of the spec,

55
00:04:10.000 --> 00:04:13.000
to be able to run scripts at the moment.

56
00:04:13.000 --> 00:04:15.000
A little more on that later.

57
00:04:16.000 --> 00:04:21.000
Also, to show this, I'm showing the raw call,

58
00:04:21.000 --> 00:04:23.000
and I'm using tool-calling middleware

59
00:04:23.000 --> 00:04:25.000
because what happens behind the scenes

60
00:04:25.000 --> 00:04:31.000
is that agent skills are injected on the fly

61
00:04:31.000 --> 00:04:36.000
as extra instructions that tell what skills there are, plus two tools: one

62
00:04:38.000 --> 00:04:43.000
to read a specific skill, and another to go in

63
00:04:43.000 --> 00:04:46.000
and read resources for a skill.

64
00:04:47.000 --> 00:04:49.000
So if I run this...

65
00:04:53.000 --> 00:04:56.000
So before that, I'm just getting an agent,

66
00:04:56.000 --> 00:04:59.000
and after that, I'm just using a...

67
00:05:01.000 --> 00:05:04.000
simple chat loop with sessions.

68
00:05:04.000 --> 00:05:09.000
So if I say, answer in pirate mode,

69
00:05:12.000 --> 00:05:16.000
what we will see is that, first of all,

70
00:05:17.000 --> 00:05:23.000
in the call, we can see that instructions are being added on the fly,

71
00:05:23.000 --> 00:05:26.000
that it has these skills,

72
00:05:28.000 --> 00:05:32.500
one for each of the skills, the employee skill, the pirate skill,

73
00:05:32.500 --> 00:05:35.000
the secret formula skill, and so on.

74
00:05:35.000 --> 00:05:39.000
So this text is what you should be able to manipulate,

75
00:05:39.000 --> 00:05:43.000
but at the moment, it doesn't seem like you can do that.

76
00:05:45.500 --> 00:05:49.000
Then, we can see that what happens behind the scenes

77
00:05:49.000 --> 00:05:53.000
is we have a load skill function or tool,

78
00:05:53.000 --> 00:05:57.000
and we have a read skill resource option.

79
00:06:01.000 --> 00:06:06.000
So that's what is there, and we get our calls back, and so on,

80
00:06:06.000 --> 00:06:09.000
and we see that we end up in pirate mode.

81
00:06:13.000 --> 00:06:16.000
We could also then say...

82
00:06:20.000 --> 00:06:28.000
"Is parental leave paid?",

83
00:06:28.000 --> 00:06:32.000
which is an HR skill.

84
00:06:32.000 --> 00:06:36.000
So here we are using one of the reference documents,

85
00:06:36.000 --> 00:06:40.000
and if we have a look again...

86
00:06:44.000 --> 00:06:46.000
to go down here...

87
00:06:49.500 --> 00:06:56.000
So we used a read skill call here, and it comes back.

88
00:07:00.000 --> 00:07:04.000
If we try to call the secret formula with the Python script,

89
00:07:04.000 --> 00:07:06.000
it tends to not work.

90
00:07:07.000 --> 00:07:14.000
Read the secret formula...

91
00:07:15.000 --> 00:07:23.000
It fails because it can only give the relative path to the script,

92
00:07:23.000 --> 00:07:26.000
and not the real path.

93
00:07:26.000 --> 00:07:30.000
The relative path is here, but it doesn't know

94
00:07:30.000 --> 00:07:34.000
where that is in relation to the rest of the system.

95
00:07:34.000 --> 00:07:39.000
So there are some limitations. I know they are working on script support.

96
00:07:39.000 --> 00:07:43.000
I don't know how they want to have script support,

97
00:07:44.000 --> 00:07:48.000
specifically, but right now it's not really possible

98
00:07:48.000 --> 00:07:52.000
compared to my own setup here,

99
00:07:52.000 --> 00:07:55.000
where I have Agent Skills, the same folders,

100
00:07:55.000 --> 00:08:00.000
and up here we are using my interpretation of this,

101
00:08:00.000 --> 00:08:03.000
which is Agent Skills, where we can get the instructions,

102
00:08:03.000 --> 00:08:06.000
and get them as tools.

103
00:08:06.000 --> 00:08:11.000
I also expose them as tools,

104
00:08:11.000 --> 00:08:15.000
three tools: one for what skills are available,

105
00:08:15.000 --> 00:08:18.000
one to read a specific skill, and one to read a resource.

106
00:08:18.000 --> 00:08:22.000
So I'm using three tools instead of two tools for that.

107
00:08:23.000 --> 00:08:30.000
So let's do a little side-by-side comparison of what we can do with each.

108
00:08:30.000 --> 00:08:33.000
So the official implementation is here,

109
00:08:33.000 --> 00:08:40.000
and my version of agentskills.net, which is the NuGet package,

110
00:08:40.000 --> 00:08:43.000
which is also part of my Agent Framework Toolkit,

111
00:08:43.000 --> 00:08:48.000
which is a set of NuGet packages on top of Agent Framework.

112
00:08:48.000 --> 00:08:53.000
So the new one is built in, it's from Release Candidate 2 and higher,

113
00:08:53.000 --> 00:08:58.000
and mine is a NuGet package that you need to install if you want to.

114
00:08:59.000 --> 00:09:02.000
The delivery mode that they have chosen is AI context providers,

115
00:09:02.000 --> 00:09:06.000
but behind the scenes they are simply just injecting extra instructions

116
00:09:06.000 --> 00:09:09.000
and adding two tools.

117
00:09:09.000 --> 00:09:13.000
I do something similar. I can do two generic tools,

118
00:09:13.000 --> 00:09:17.000
but I also have an option to have each skill be its own tool

119
00:09:17.000 --> 00:09:20.000
if you would rather have that for better discovery.

120
00:09:21.000 --> 00:09:28.000
And then you can get the instructions back as text that you can paste in.

121
00:09:28.000 --> 00:09:31.000
So I'm not using AI context providers for this,

122
00:09:31.000 --> 00:09:34.000
because I want to have more control here.

123
00:09:34.000 --> 00:09:36.000
Here we don't really have any control,

124
00:09:36.000 --> 00:09:39.000
in that we can't really control the instructions,

125
00:09:39.000 --> 00:09:42.000
where they are compared to the rest of the instructions,

126
00:09:42.000 --> 00:09:47.000
and here we just get every tool no matter what.

127
00:09:47.000 --> 00:09:49.000
Here we can actually control it.

128
00:09:51.000 --> 00:09:55.000
So instructions are automatic in the official implementation,

129
00:09:55.000 --> 00:09:58.000
while it's configurable in my version.

130
00:09:58.000 --> 00:10:01.000
Logging-wise, they have standard ILoggers,

131
00:10:01.000 --> 00:10:05.000
while I instead give a log error object back

132
00:10:05.000 --> 00:10:09.000
on which of the skills perhaps couldn't load.

133
00:10:10.000 --> 00:10:14.000
So that's a matter of opinion, you could put that into a logger yourself.

134
00:10:14.000 --> 00:10:17.000
But otherwise they have more out-of-the-box logging.

135
00:10:19.000 --> 00:10:23.000
Then there's skill validation, if the skill actually follows the format.

136
00:10:23.000 --> 00:10:27.000
And it seems that they don't care at all.

137
00:10:27.000 --> 00:10:30.000
There are these rules, for example,

138
00:10:30.000 --> 00:10:34.000
that the name in the front matter needs to be exactly the same as the folder name.

139
00:10:34.000 --> 00:10:38.000
You can write something different here, and it will still load as a skill.

140
00:10:39.000 --> 00:10:46.000
In mine, you can choose strict mode and get errors back if it's wrong,

141
00:10:46.000 --> 00:10:48.000
or you can go into loose mode.

142
00:10:48.000 --> 00:10:52.000
That is sort of what the official implementation does.
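
NOTE
A minimal sketch of the strict versus loose validation described above, limited to the one rule mentioned here (the front matter name must equal the folder name). All names are hypothetical and do not come from either implementation.
```python
# Sketch only: hypothetical names, not either library's API.
def front_matter_name(skill_md):
    """Return the name field from SKILL.md front matter, or None."""
    parts = skill_md.split("---", 2)
    if len(parts) < 3:
        return None
    for line in parts[1].splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "name":
            return value.strip()
    return None
#
def load_skill(folder, skill_md, strict=True):
    """Strict mode raises on a name mismatch; loose mode loads anyway."""
    name = front_matter_name(skill_md)
    problems = []
    if name != folder:
        problems.append(f"name {name!r} does not match folder {folder!r}")
    if strict and problems:
        raise ValueError("; ".join(problems))
    return {"folder": folder, "name": name, "problems": problems}
```
In loose mode the mismatch is only recorded in the returned problems list, which is roughly the "load it anyway" behavior described for the official implementation.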

143
00:10:53.000 --> 00:10:58.000
Then a huge limitation they have is that you just get these skills,

144
00:10:58.000 --> 00:11:00.000
and you just get these tools.

145
00:11:00.000 --> 00:11:02.000
But you can't really get them back as objects,

146
00:11:02.000 --> 00:11:08.000
so you're not in any way able to do a subset of the skills.

147
00:11:08.000 --> 00:11:13.000
It's all or nothing, and you can't get individual instructions

148
00:11:13.000 --> 00:11:17.000
if you want to use those together with some more curated parts

149
00:11:17.000 --> 00:11:20.000
of your instructions file.

150
00:11:20.000 --> 00:11:26.000
That is fully configurable in my implementation

151
00:11:26.000 --> 00:11:30.000
because you get each skill back as a .NET object

152
00:11:30.000 --> 00:11:33.000
with all the information in it.

153
00:11:34.000 --> 00:11:37.000
So that is much simpler to do.

154
00:11:39.000 --> 00:11:43.000
You need to do a little more code, but you have more control over it.

155
00:11:44.000 --> 00:11:48.000
Another huge problem with the official implementation right now

156
00:11:48.000 --> 00:11:52.000
is that it doesn't give the sub-file paths.

157
00:11:52.000 --> 00:11:55.000
We saw that with the script not being able to be run

158
00:11:55.000 --> 00:11:59.000
because it just gives back the name and description,

159
00:11:59.000 --> 00:12:04.000
and when it reads it, it's only what's in the body of the Markdown file.

160
00:12:07.000 --> 00:12:10.000
I give the body of the Markdown file back,

161
00:12:10.000 --> 00:12:15.000
but then I also say that these files are available for this skill

162
00:12:15.000 --> 00:12:20.000
with the full paths, and that makes the AI able to actually read

163
00:12:20.000 --> 00:12:28.000
and execute the various scripts and subassets.
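
NOTE
A minimal sketch of returning the skill body together with the full paths of its extra files, as described above. The function name, the output wording and the example paths are assumptions for illustration only.
```python
import os
#
def describe_skill(skill_dir, body, relative_files):
    """Append the absolute path of each extra file to the skill body."""
    root = os.path.abspath(skill_dir)
    lines = [body, "Files available for this skill:"]
    lines += ["- " + os.path.join(root, rel) for rel in relative_files]
    return "\n".join(lines)
#
# Hypothetical folder and script names.
print(describe_skill("skills/secret-formulas", "Run the script.", ["scripts/formula.py"]))
```
With absolute paths in the text, a separate bring-your-own tool for running scripts can be called without guessing where the skill folder lives.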

164
00:12:31.000 --> 00:12:36.000
In terms of reading skill metadata, not really something that's too important,

165
00:12:36.000 --> 00:12:41.000
but you can have extra things like license and metadata

166
00:12:41.000 --> 00:12:44.000
on an official agent skill.

167
00:12:45.000 --> 00:12:49.000
The official implementation only takes out the name and description and uses it,

168
00:12:49.000 --> 00:12:55.000
while I grab everything that you get back in the objects if you need to.

169
00:12:55.000 --> 00:13:00.000
Honestly, you probably wouldn't need this for anything at the moment.

170
00:13:02.000 --> 00:13:05.000
And finally, the ability to actually execute scripts.

171
00:13:05.000 --> 00:13:08.000
Neither of us actually has it,

172
00:13:08.000 --> 00:13:12.000
because each script execution can be very different.

173
00:13:12.000 --> 00:13:16.000
How do you want to run your Python,

174
00:13:16.000 --> 00:13:20.000
what kind of environment, what kind of restrictions, and so on,

175
00:13:20.000 --> 00:13:25.000
and how to run a BAT file, how to run an executable, and so on.

176
00:13:25.000 --> 00:13:28.000
You need to bring your own tools, in my opinion.

177
00:13:29.000 --> 00:13:33.000
Right now, in the official implementation, there's no support at all.

178
00:13:33.000 --> 00:13:38.000
You could bring your own tool, but again, the problem is the sub-file paths

179
00:13:38.000 --> 00:13:43.000
that are not exposed to the LLM,

180
00:13:43.000 --> 00:13:48.000
which causes it to actually not be able to call it correctly.

181
00:13:50.000 --> 00:13:55.000
Again, they have an open issue that they want to enhance it with that,

182
00:13:55.000 --> 00:13:59.000
but I don't see a way they can actually make it easy

183
00:13:59.000 --> 00:14:02.000
and cheap token-wise as well.

184
00:14:03.000 --> 00:14:07.000
So, right now, it is cool that they added it,

185
00:14:07.000 --> 00:14:14.000
but it's a very, very limited set of features that they gave,

186
00:14:14.000 --> 00:14:20.000
and I'm really, really surprised that they don't even validate the skills correctly.

187
00:14:21.000 --> 00:14:24.000
So, that is my review so far.

188
00:14:24.000 --> 00:14:30.000
I will definitely not use the official one yet, while I have the other one,

189
00:14:31.000 --> 00:14:38.000
because I simply have a much higher level of customization here.

190
00:14:38.000 --> 00:14:44.000
But perhaps, over time, the official implementation might add more features

191
00:14:44.000 --> 00:14:48.000
and come to a point where we can actually throw away

192
00:14:48.000 --> 00:14:52.000
third-party skills like mine.

193
00:14:52.000 --> 00:14:57.000
But for now, I would definitely not use the internal implementation.

194
00:14:57.000 --> 00:15:01.000
So, that's everything. See you in the next one.