WEBVTT

1
00:00:00.000 --> 00:00:05.000
Hi, and welcome to this AI in C# video on the Microsoft Agent Framework.

2
00:00:05.000 --> 00:00:11.000
Today we're going to look into Gemini and its code execution options.

3
00:00:11.000 --> 00:00:15.000
Code execution is something OpenAI already has,

4
00:00:15.000 --> 00:00:17.000
but Gemini has it as well.

5
00:00:17.000 --> 00:00:19.000
So let's go into it.

6
00:00:19.000 --> 00:00:23.000
So let me bring up a sample here,

7
00:00:23.000 --> 00:00:26.000
and that sample being, under Google,

8
00:00:26.000 --> 00:00:30.000
Google Gemini Specific Features, Google Code Execution.

9
00:00:30.000 --> 00:00:35.000
So code execution is a way to run Python

10
00:00:35.000 --> 00:00:39.000
in a sandbox environment up at Google,

11
00:00:39.000 --> 00:00:47.000
and then either use it to do some advanced calculations,

12
00:00:47.000 --> 00:00:50.000
like prime numbers or something like that,

13
00:00:50.000 --> 00:00:52.000
or use it to generate, for example, charts

14
00:00:52.000 --> 00:00:56.000
that you would like to get back as images in the tool.

15
00:00:56.000 --> 00:00:59.000
So I have both here, first being the chart,

16
00:00:59.000 --> 00:01:04.000
and I'm asking it to make a chart, using the code execution tool,

17
00:01:04.000 --> 00:01:11.000
of the top five countries by the number of cars they produce per year.

18
00:01:11.000 --> 00:01:17.000
So as we have seen with the other special tools from Gemini,

19
00:01:17.000 --> 00:01:21.000
of course we can use normal tool calling,

20
00:01:21.000 --> 00:01:23.000
but if we need these special tools,

21
00:01:23.000 --> 00:01:29.000
we need to go in and use a little of the break-glass option here,

22
00:01:29.000 --> 00:01:31.000
where we go into the chat client options,

23
00:01:31.000 --> 00:01:34.000
the chat options, and set the raw representation factory

24
00:01:34.000 --> 00:01:37.000
to a generate content config.

25
00:01:37.000 --> 00:01:41.000
If we do that, we can get a tool called code execution.

26
00:01:41.000 --> 00:01:44.000
There's nothing more to it, there are no options or anything,

27
00:01:44.000 --> 00:01:49.000
it's just that we get a sandbox environment that can run Python,

28
00:01:49.000 --> 00:01:52.000
and it can only run Python.

29
00:01:52.000 --> 00:01:56.000
That's very much specified in the documentation.

30
00:01:56.000 --> 00:02:00.000
But it doesn't really matter for us, we can just write it,

31
00:02:00.000 --> 00:02:04.000
and we can just leverage the benefit of it.

32
00:02:04.000 --> 00:02:09.000
In this case, if we go down here and let it run,

33
00:02:09.000 --> 00:02:15.000
it will now go in, use its world knowledge, find the top five countries,

34
00:02:15.000 --> 00:02:23.000
make some Python, and then generate an image.

35
00:02:23.000 --> 00:02:29.000
And we can get the data back here, so we see the top five countries

36
00:02:29.000 --> 00:02:32.000
according to 2023 data,

37
00:02:32.000 --> 00:02:36.000
and we can see that we spent quite a lot of tokens.

38
00:02:36.000 --> 00:02:40.000
Let's come back to those at the end.

39
00:02:40.000 --> 00:02:46.000
If we want to know a little more, not about how this was taken

40
00:02:46.000 --> 00:02:50.000
from the world knowledge, but about what Python was executed,

41
00:02:50.000 --> 00:02:55.000
and how we can get the image, we need to go in, take the response,

42
00:02:55.000 --> 00:02:58.000
and get the raw representation as a chat response,

43
00:02:58.000 --> 00:03:02.000
and after that into a generate content response.

44
00:03:02.000 --> 00:03:05.000
This is the Microsoft Extensions AI version,

45
00:03:05.000 --> 00:03:09.000
and this is the raw Google version that we need.

46
00:03:09.500 --> 00:03:15.500
So once we have that, we can go in and get the executable code,

47
00:03:15.500 --> 00:03:17.500
which in this case is this.

48
00:03:17.500 --> 00:03:21.000
So if you, for some reason, wanted to run the code yourself

49
00:03:21.000 --> 00:03:25.500
and generate the image, that would be possible.

50
00:03:25.500 --> 00:03:30.000
But inline in this, there's also something called parts.

51
00:03:30.000 --> 00:03:34.500
And in here, there are multiple parts of what it has done.

52
00:03:34.500 --> 00:03:39.000
So the first one is the code that it generated,

53
00:03:39.000 --> 00:03:44.500
then the result of that code, which was okay.

54
00:03:44.500 --> 00:03:48.500
Then it gives us back the image in a blob,

55
00:03:48.500 --> 00:03:52.000
so we get the raw data inside here.

56
00:03:52.000 --> 00:03:56.000
It also seems to have a bug where it gives that back twice.

57
00:03:56.000 --> 00:03:59.000
I have seen sometimes that it gives it back once,

58
00:03:59.000 --> 00:04:01.000
and sometimes it gives it back twice.

59
00:04:01.000 --> 00:04:03.000
In our case, we get it back twice,

60
00:04:03.000 --> 00:04:06.500
so you need to take into account that there may be multiple images

61
00:04:06.500 --> 00:04:08.500
and you didn't expect that.

62
00:04:08.500 --> 00:04:13.000
And the final one is actually the text coming back.

63
00:04:13.000 --> 00:04:15.000
So that is what we get,

64
00:04:15.000 --> 00:04:20.500
and if we go in and find the parts that have inline data,

65
00:04:20.500 --> 00:04:24.000
so the first one didn't, the second one didn't either,

66
00:04:24.000 --> 00:04:26.000
and now we get to the data.

67
00:04:26.000 --> 00:04:29.500
We can then make a path, in my case in my temp folder,

68
00:04:29.500 --> 00:04:35.500
and write all the raw bytes into that and run the image.

69
00:04:35.500 --> 00:04:37.500
So the image looks like this,

70
00:04:37.500 --> 00:04:42.500
and that was generated up in the sandbox environment using Python,

71
00:04:42.500 --> 00:04:45.500
and sent back to us as raw bytes.

72
00:04:47.500 --> 00:04:50.500
If we go further,

73
00:04:50.500 --> 00:04:54.500
we will see that we will end up with one more image.

74
00:04:54.500 --> 00:04:57.500
Again, in my opinion, this is a bug

75
00:04:57.500 --> 00:05:00.000
because it's exactly the same two images,

76
00:05:00.000 --> 00:05:05.500
so we need to take care of that in our code,

77
00:05:05.500 --> 00:05:09.500
unfortunately, until Google fixes this.
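
NOTE
Editor's sketch, not the code shown in the video: one possible way to guard
against the duplicated image is to hash each blob's raw bytes and keep only the
first copy before writing files to a temp folder. The function names, the
chart_N.png file names, and the PNG extension are assumptions for illustration.
```python
import hashlib
import tempfile
from pathlib import Path
#
def unique_blobs(blobs):
    """Return the blobs in order, dropping any whose bytes were already seen."""
    seen = set()
    unique = []
    for data in blobs:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(data)
    return unique
#
def save_images(blobs, folder=None):
    """Write each unique blob to its own file and return the paths."""
    # Assumption: a fresh temp folder is used when no folder is given.
    target = Path(folder or tempfile.mkdtemp())
    paths = []
    for index, data in enumerate(unique_blobs(blobs)):
        path = target / f"chart_{index}.png"
        path.write_bytes(data)
        paths.append(path)
    return paths
```
Passing the inline-data bytes of every part to save_images would then write one
file per distinct image, whether the service returns the image once or twice.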

78
00:05:10.500 --> 00:05:16.000
But at the end, we just have our image and our text,

79
00:05:16.000 --> 00:05:19.500
and if we want to, our executed code.

80
00:05:21.000 --> 00:05:26.000
Another way we can use the code tool is, for example,

81
00:05:26.000 --> 00:05:29.500
give us the first 50 prime numbers,

82
00:05:29.500 --> 00:05:35.500
generate them and give some calculations on them. If we do this,

83
00:05:38.500 --> 00:05:43.500
we can again see that we get a response back here,

84
00:05:43.500 --> 00:05:48.500
and that it used a fair amount of tokens.

85
00:05:49.500 --> 00:05:53.500
And again, we can go and get the raw contents,

86
00:05:54.000 --> 00:05:56.000
and we can get the executable code,

87
00:05:56.000 --> 00:06:00.500
so this is how it calculated the first 50 prime numbers,

88
00:06:00.500 --> 00:06:06.000
and the raw result, if we want that for some reason,

89
00:06:06.000 --> 00:06:10.000
which is what this code generated.
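
NOTE
Editor's sketch: the exact Python that Gemini wrote is not captured in this
transcript, so the snippet below is only an illustration of the kind of code
the sandbox might run for this prompt. The function name and the chosen
statistics are assumptions.
```python
def first_primes(count):
    """Return the first `count` primes using trial division by earlier primes."""
    primes = []
    candidate = 2
    while len(primes) < count:
        # Only earlier primes up to the square root of the candidate need checking.
        if all(candidate % p != 0 for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes
#
primes = first_primes(50)
summary = {"count": len(primes), "largest": primes[-1], "total": sum(primes)}
print(summary)
```
The printed summary corresponds to the raw result part of the response, and the
model's text answer is then written on top of it.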

90
00:06:11.500 --> 00:06:14.000
So we have full access to the raw data,

91
00:06:14.000 --> 00:06:19.500
but we can also just let the AI respond back if we want to.

92
00:06:20.500 --> 00:06:24.500
The reason why I've talked a little about the tokens,

93
00:06:24.500 --> 00:06:28.500
the input and output tokens here and up here,

94
00:06:28.500 --> 00:06:38.500
is that they actually just bill you the tokens you use for code execution,

95
00:06:38.500 --> 00:06:48.000
meaning we pay for this part of the code and so on,

96
00:06:48.000 --> 00:06:51.000
as if it were input tokens and output tokens.

97
00:06:51.000 --> 00:06:54.000
And that's the reason why these are fairly high,

98
00:06:54.000 --> 00:06:58.000
but of course it's still good that it can do this.

99
00:06:58.000 --> 00:07:01.000
And this deviates from OpenAI,

100
00:07:01.000 --> 00:07:05.000
where you actually need to pay per session you run.

101
00:07:05.500 --> 00:07:08.500
Google has decided to say, okay, you pay for tokens,

102
00:07:08.500 --> 00:07:13.000
but you don't pay for running a sandbox environment.

103
00:07:13.000 --> 00:07:17.000
They take the charge for that and instead bill you in tokens,

104
00:07:17.000 --> 00:07:22.000
which I think will be much cheaper than actually paying

105
00:07:22.000 --> 00:07:29.000
for the container apps running up in the cloud.

106
00:07:31.000 --> 00:07:35.500
In my book, that is a better solution and a cheaper solution for us,

107
00:07:35.500 --> 00:07:38.500
but of course if you make some very, very advanced stuff,

108
00:07:38.500 --> 00:07:42.000
you pay a lot of tokens in that code execution.

109
00:07:43.000 --> 00:07:45.500
But that's actually everything there is to it.

110
00:07:45.500 --> 00:07:48.000
We still need to use the break-glass approach like this,

111
00:07:48.000 --> 00:07:51.500
but beyond that, it's very simple to just use,

112
00:07:51.500 --> 00:07:56.500
and then a little cumbersome to get the results out of the system again.

113
00:07:57.000 --> 00:07:59.500
But that's everything. See you on the next one.