WEBVTT

1
00:00:00.070 --> 00:00:03.170
In this lecture, we will talk about the

2
00:00:03.170 --> 00:00:04.030
code interpreter.

3
00:00:04.710 --> 00:00:08.310
The code interpreter is a tool somewhat similar

4
00:00:08.310 --> 00:00:10.710
to web search: it's a hosted tool,

5
00:00:10.850 --> 00:00:13.050
so you don't need to build it yourself.

6
00:00:13.950 --> 00:00:16.610
What it can do is

7
00:00:16.610 --> 00:00:17.550
run code.

8
00:00:18.550 --> 00:00:22.590
Specifically, it can run Python

9
00:00:22.590 --> 00:00:23.110
code.

10
00:00:23.870 --> 00:00:26.270
We don't need to write the

11
00:00:26.270 --> 00:00:27.210
code ourselves, though.

12
00:00:27.210 --> 00:00:30.370
We get the AI to write the Python

13
00:00:30.370 --> 00:00:32.030
code and then execute it.

14
00:00:33.610 --> 00:00:36.430
The main reason for having such a tool

15
00:00:36.430 --> 00:00:40.890
is that, for example, if you give your

16
00:00:40.890 --> 00:00:46.670
AI a very complex mathematical formula, it might

17
00:00:46.670 --> 00:00:49.930
not be good at using next-token prediction

18
00:00:49.930 --> 00:00:52.330
to work out what the number

19
00:00:52.330 --> 00:00:52.850
is.

20
00:00:53.690 --> 00:00:57.830
But it can certainly write a formula in

21
00:00:57.830 --> 00:01:01.850
Python and then execute the Python code and

22
00:01:01.850 --> 00:01:06.110
just get the answer back instead, which is

23
00:01:06.110 --> 00:01:09.690
more efficient and far more likely to be correct.

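NOTE
Illustration (not from the lecture): a minimal Python sketch of the kind of snippet the model might write and run for a formula, instead of predicting the digits token by token. The loan-payment formula, the name monthly_payment, and the numbers are hypothetical examples. The lecture itself uses C#.
```python
# Hypothetical example of model-written code: evaluate a formula exactly
# instead of "guessing" the number through next-token prediction.
def monthly_payment(principal, annual_rate, years):
    """Fixed monthly payment on an amortized loan: P * r / (1 - (1 + r) ** -n)."""
    r = annual_rate / 12   # monthly interest rate
    n = years * 12         # total number of monthly payments
    return principal * r / (1 - (1 + r) ** -n)
print(f"{monthly_payment(250_000, 0.05, 30):.2f}")
```
Executing the code gives an exact, reproducible figure (about 1342.05 here), which the model can then report back in its answer.
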
24
00:01:10.550 --> 00:01:12.810
It can also use the code interpreter

25
00:01:12.810 --> 00:01:17.670
tool to generate things, for example, images

26
00:01:17.670 --> 00:01:18.570
and similar files.

27
00:01:19.010 --> 00:01:20.430
And this is what we are actually going

28
00:01:20.430 --> 00:01:21.350
to do here.

29
00:01:22.410 --> 00:01:24.510
So I'm going to show it first, and

30
00:01:24.510 --> 00:01:26.670
then we'll go into the code, because this

31
00:01:26.670 --> 00:01:29.730
is probably the most advanced example we have

32
00:01:29.730 --> 00:01:30.750
seen so far.

33
00:01:31.690 --> 00:01:33.550
What I'm going to ask it is:

34
00:01:33.790 --> 00:01:36.290
"Create me a pie chart of the top five

35
00:01:36.290 --> 00:01:38.950
most populous countries in the world."

36
00:01:41.270 --> 00:01:44.350
But first, let me just start it,

37
00:01:44.370 --> 00:01:45.850
because it will take a little while.

38
00:01:46.410 --> 00:01:49.750
So far, we have only gotten text back.

39
00:01:49.750 --> 00:01:54.490
But we can actually get things like images

40
00:01:54.490 --> 00:01:54.930
back.

41
00:01:55.070 --> 00:01:57.890
This isn't image generation, though;

42
00:01:57.890 --> 00:02:00.810
we're using the code interpreter tool to

43
00:02:00.810 --> 00:02:06.710
produce the image by running Python

44
00:02:06.710 --> 00:02:07.130
code.

45
00:02:08.770 --> 00:02:11.010
So right now, what's happening is that it is

46
00:02:11.010 --> 00:02:14.090
spinning up a small container in the

47
00:02:14.090 --> 00:02:17.890
cloud and generating this image.

48
00:02:19.970 --> 00:02:23.430
The result is okay, but not great.

49
00:02:23.510 --> 00:02:26.670
You can see that "top five" is not

50
00:02:26.670 --> 00:02:28.490
reflected in the title.

51
00:02:28.750 --> 00:02:30.730
We could tell it a

52
00:02:30.730 --> 00:02:33.210
bit more precisely what the title should be,

53
00:02:33.330 --> 00:02:34.270
and so on and so forth.

54
00:02:34.510 --> 00:02:36.430
But it has given us the right information,

55
00:02:37.090 --> 00:02:40.710
and I now have a PNG open

56
00:02:40.710 --> 00:02:41.930
from my temp folder.

57
00:02:42.910 --> 00:02:44.930
So let's see how this is done.

58
00:02:46.650 --> 00:02:49.470
We also see it gets the information back,

59
00:02:50.170 --> 00:02:52.550
but we are most interested in getting the

60
00:02:52.550 --> 00:02:53.430
image right now.

61
00:02:55.200 --> 00:02:59.880
So first up, this only works with OpenAI.

62
00:03:02.040 --> 00:03:06.520
Azure OpenAI has the code interpreter too, and you can

63
00:03:06.520 --> 00:03:09.200
use it for mathematical formulas and run things

64
00:03:09.200 --> 00:03:12.100
like that, but it cannot return generated images.

65
00:03:13.200 --> 00:03:17.240
It's a limitation or a bug, call it what you

66
00:03:17.240 --> 00:03:17.640
want.

67
00:03:18.560 --> 00:03:20.680
So as you can see here, instead of

68
00:03:20.680 --> 00:03:24.320
the usual Azure OpenAI, I'm using OpenAI directly.

69
00:03:26.300 --> 00:03:28.740
And then I'm making a client.

70
00:03:28.960 --> 00:03:31.920
In this case, it's GPT-5 nano,

71
00:03:32.460 --> 00:03:35.220
and I'm giving it one tool: the

72
00:03:35.220 --> 00:03:37.280
hosted code interpreter tool.

73
00:03:38.160 --> 00:03:41.040
And as we saw with web search, these

74
00:03:41.040 --> 00:03:44.640
hosted tools only work with the Responses client,

75
00:03:45.840 --> 00:03:48.600
which, as we saw, is still marked for evaluation.

76
00:03:48.980 --> 00:03:50.360
So for that reason, we have it up

77
00:03:50.360 --> 00:03:53.640
here, because more of this code uses

78
00:03:53.640 --> 00:03:54.160
evaluation-only APIs.

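NOTE
For comparison, here is a minimal Python sketch of the same setup against the OpenAI Responses API directly. The lecture uses a C# Responses client instead. It assumes the official openai package and that the hosted tool is declared as {"type": "code_interpreter", "container": {"type": "auto"}}. The names build_request and RUN_LIVE_DEMO are hypothetical, chosen for this sketch.
```python
# Sketch: a Responses API request with the hosted code interpreter enabled.
# Assumes the official `openai` package; RUN_LIVE_DEMO is a hypothetical opt-in switch.
import os
PROMPT = "Create me a pie chart of the top five most populous countries in the world."
def build_request(prompt, model="gpt-5-nano"):
    """Return keyword arguments for client.responses.create(...)."""
    return {
        "model": model,
        # Hosted tool: the code runs in an OpenAI-managed container, not locally.
        "tools": [{"type": "code_interpreter", "container": {"type": "auto"}}],
        "input": prompt,
    }
if os.environ.get("RUN_LIVE_DEMO") == "1":
    from openai import OpenAI  # imported lazily so the sketch runs without the SDK
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.responses.create(**build_request(PROMPT))
    print(response.output_text)
```
Because the payload lives in a plain function, you can inspect it without network access. The live call only runs when you opt in.
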
79
00:03:56.940 --> 00:04:00.520
And then we do our chat loop, but

80
00:04:00.520 --> 00:04:03.580
instead of just writing out the answer, we're

81
00:04:03.580 --> 00:04:07.740
also keeping the response object, because the response

82
00:04:07.740 --> 00:04:10.440
has a lot of extra information.

83
00:04:11.400 --> 00:04:12.980
And let me set a breakpoint and

84
00:04:12.980 --> 00:04:15.880
do it again, so we can better understand

85
00:04:15.880 --> 00:04:16.620
this code.

86
00:04:18.140 --> 00:04:20.880
So I'm just going to ask the same

87
00:04:20.880 --> 00:04:21.740
question again.

88
00:04:24.560 --> 00:04:27.980
And once it comes back, we will start

89
00:04:27.980 --> 00:04:32.000
looping through it, because the response

90
00:04:32.000 --> 00:04:37.960
holds one or more messages. For example, when we

91
00:04:37.960 --> 00:04:41.180
just say hello and the AI says hello

92
00:04:41.180 --> 00:04:43.560
back, we get two messages, one from the

93
00:04:43.560 --> 00:04:46.160
user and one from the assistant.

94
00:04:47.120 --> 00:04:50.000
And inside each message, there are

95
00:04:50.000 --> 00:04:50.680
content items.

96
00:04:53.160 --> 00:04:56.940
All content items derive from the base AIContent type, but

97
00:04:56.940 --> 00:04:59.700
we can check which specific type

98
00:04:59.700 --> 00:05:00.420
each one is.

99
00:05:01.340 --> 00:05:06.880
So for example, annotation content, which is how

100
00:05:06.880 --> 00:05:11.020
we get the ID of a file generated

101
00:05:11.020 --> 00:05:12.740
by the tool.

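NOTE
A hypothetical, self-contained Python model of the structure described above. A response holds messages, each message holds content items, and every item shares one base type that we narrow with isinstance checks. The class names are illustrative stand-ins, not the framework's real types.
```python
# Hypothetical stand-ins for the real content types; names are illustrative only.
from dataclasses import dataclass, field
@dataclass
class Content:
    """Base type shared by every content item."""
    annotations: list = field(default_factory=list)
@dataclass
class TextContent(Content):
    text: str = ""
@dataclass
class ToolCallContent(Content):
    code: str = ""  # e.g. the Python the model wrote for the code interpreter
@dataclass
class Message:
    role: str
    contents: list
def describe(messages):
    """Label each content item by its concrete type, message by message."""
    labels = []
    for message in messages:
        for item in message.contents:
            if isinstance(item, ToolCallContent):
                labels.append(f"{message.role}: tool call")
            elif isinstance(item, TextContent):
                labels.append(f"{message.role}: text with {len(item.annotations)} annotation(s)")
            else:
                labels.append(f"{message.role}: other content")
    return labels
conversation = [
    Message("user", [TextContent(text="hello")]),
    Message("assistant", [TextContent(text="hello")]),
]
print(describe(conversation))
```
The isinstance checks are ordered from most to least specific, so anything unrecognized falls through to the generic base-type branch.
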
102
00:05:13.540 --> 00:05:15.860
So now it has finished, so let's

103
00:05:15.860 --> 00:05:19.880
look at this response object in the debugger.

104
00:05:21.120 --> 00:05:23.640
So what we can see is that we have

105
00:05:23.640 --> 00:05:25.700
one message inside it.

106
00:05:26.380 --> 00:05:28.780
This message is from the assistant.

107
00:05:29.580 --> 00:05:30.760
Let it come back.

108
00:05:31.940 --> 00:05:39.500
And inside that, we'll find various content items.

109
00:05:41.340 --> 00:05:44.380
So we have content describing its reasoning.

110
00:05:45.180 --> 00:05:48.800
We have content about it calling the tool,

111
00:05:51.560 --> 00:05:54.660
where we see the input: it

112
00:05:54.660 --> 00:05:56.130
wrote some Python code.

113
00:05:56.540 --> 00:06:00.800
It used matplotlib, for example.

114
00:06:04.600 --> 00:06:08.740
And what we are most interested in is that

115
00:06:10.500 --> 00:06:13.860
you can see it came back with further

116
00:06:13.860 --> 00:06:18.000
things here, where we actually get a container

117
00:06:18.000 --> 00:06:21.700
ID, identifying which container in the cloud

118
00:06:21.700 --> 00:06:23.560
ran this code.

119
00:06:24.960 --> 00:06:27.080
We need that later on.

120
00:06:30.380 --> 00:06:36.200
We also see, let me see if I

121
00:06:36.200 --> 00:06:37.160
can find it here.

122
00:06:42.840 --> 00:06:44.900
Let's just run this code, so we can

123
00:06:44.900 --> 00:06:46.960
see that when we go to the messages,

124
00:06:48.220 --> 00:06:52.520
we can go through the content and check

125
00:06:52.520 --> 00:06:56.300
whether that content has any annotations.

126
00:06:57.500 --> 00:06:59.320
So the first content item didn't have any.

127
00:07:00.600 --> 00:07:02.000
The second didn't either.

128
00:07:03.880 --> 00:07:05.440
Nor did the third.

129
00:07:05.440 --> 00:07:07.180
We have eight of them in total.

130
00:07:12.740 --> 00:07:13.940
The fourth didn't.

131
00:07:15.780 --> 00:07:17.080
The fifth didn't.

132
00:07:18.760 --> 00:07:19.400
The sixth didn't.

133
00:07:20.620 --> 00:07:22.560
The seventh had an annotation.

134
00:07:23.440 --> 00:07:25.960
So if we look at this content, we

135
00:07:25.960 --> 00:07:29.380
see we get one annotation, and that annotation

136
00:07:29.380 --> 00:07:34.620
tells us the file ID of what

137
00:07:34.620 --> 00:07:40.700
we generated and the container ID,

138
00:07:41.220 --> 00:07:42.660
which are the two things we need.

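NOTE
A minimal Python sketch of the annotation search. It assumes the JSON shape the OpenAI Responses API returns (for example response.model_dump() in the Python SDK): output items of type "message" whose content items carry annotations of type "container_file_citation", with container_id, file_id and filename fields. In that raw shape, reasoning and code_interpreter_call items are separate output items rather than content inside one message. find_container_files is a hypothetical helper.
```python
# Sketch: collect every container file citation in a Responses API result.
# Assumes the raw JSON shape (e.g. response.model_dump() in the Python SDK).
def find_container_files(response_dict):
    """Return (container_id, file_id, filename) tuples for generated files."""
    found = []
    for item in response_dict.get("output", []):
        if item.get("type") != "message":
            continue  # skip reasoning and code_interpreter_call items
        for content in item.get("content", []):
            for annotation in content.get("annotations") or []:
                if annotation.get("type") == "container_file_citation":
                    found.append((annotation["container_id"], annotation["file_id"], annotation.get("filename")))
    return found
```
The container ID and file ID in each tuple are the two values the download step needs.
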
139
00:07:46.200 --> 00:07:48.680
So if we go in here, we can

140
00:07:48.680 --> 00:07:52.360
check that its raw representation is a

141
00:07:52.360 --> 00:07:54.720
ContainerFileCitationMessageAnnotation.

142
00:07:54.720 --> 00:07:57.220
And if it is, which it is here,

143
00:07:57.620 --> 00:08:01.760
we now have what we need, because we

144
00:08:01.760 --> 00:08:04.380
can now use the raw OpenAI client to get

145
00:08:04.380 --> 00:08:07.860
a container client, meaning now we are not

146
00:08:07.860 --> 00:08:10.620
working with chat messages anymore; we

147
00:08:10.620 --> 00:08:13.760
are actually going to get files that are

148
00:08:13.760 --> 00:08:17.700
associated with my OpenAI account.

149
00:08:18.500 --> 00:08:21.760
So we get a container client in order to

150
00:08:21.760 --> 00:08:24.340
download our file.

151
00:08:24.340 --> 00:08:27.440
From our citation, we take the

152
00:08:27.440 --> 00:08:30.160
container ID and the file ID.

153
00:08:31.400 --> 00:08:34.280
What happens is that it downloads the

154
00:08:34.280 --> 00:08:34.500
whole file.

155
00:08:34.720 --> 00:08:39.940
It gives me back the file in a

156
00:08:39.940 --> 00:08:42.159
binary format, so I can't really see anything

157
00:08:42.159 --> 00:08:43.560
other than that it's there.

158
00:08:45.300 --> 00:08:48.160
And then I take the file name from

159
00:08:48.160 --> 00:08:52.860
the citation, which is this pie chart file.

160
00:08:53.540 --> 00:08:55.920
And then I write all the data, in

161
00:08:55.920 --> 00:08:58.120
my case, to a file in a temp folder.

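NOTE
A minimal Python sketch of the download-and-save step. fetch_bytes is a hypothetical callable that returns the file's bytes for a container ID and file ID. With the Python SDK it could wrap the container file-content endpoint, but check the SDK reference for the exact method. The .png fallback name is also an assumption.
```python
# Sketch: download a generated file and save it to the temp folder.
# fetch_bytes is a hypothetical callable: (container_id, file_id) returns bytes.
import tempfile
from pathlib import Path
def save_container_file(fetch_bytes, container_id, file_id, filename=None):
    """Write the downloaded bytes to the temp folder and return the path."""
    data = fetch_bytes(container_id, file_id)  # raw binary, e.g. PNG bytes
    # Keep only the final path component so a filename can't point outside the temp folder.
    name = Path(filename).name if filename else f"{file_id}.png"  # .png fallback is an assumption
    target = Path(tempfile.gettempdir()) / name
    target.write_bytes(data)
    return target
```
Injecting the downloader keeps the file handling testable without network access.
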
162
00:09:00.120 --> 00:09:02.300
And once I do that, I just use

163
00:09:02.300 --> 00:09:07.080
normal C# to open that file, meaning

164
00:09:07.080 --> 00:09:10.140
it will open up in my

165
00:09:10.140 --> 00:09:12.460
default image viewer.

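NOTE
The lecture opens the file with plain C#. A rough Python equivalent assumes the usual per-platform launchers: os.startfile on Windows, open on macOS, xdg-open on most Linux desktops. open_command and open_with_default_app are hypothetical helper names.
```python
# Sketch: open a saved file with the operating system's default application.
import os
import subprocess
import sys
def open_command(path, platform=sys.platform):
    """Return the launcher command for `path`, or None on Windows (os.startfile is used there)."""
    if platform.startswith("win"):
        return None
    if platform == "darwin":
        return ["open", str(path)]
    return ["xdg-open", str(path)]  # typical on Linux desktops
def open_with_default_app(path):
    command = open_command(path)
    if command is None:
        os.startfile(path)  # only exists on Windows
    else:
        subprocess.run(command, check=False)
```
Separating command selection from execution lets you check the platform logic without actually launching a viewer.
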
166
00:09:15.200 --> 00:09:16.880
So it will open up.

167
00:09:17.180 --> 00:09:19.220
Now it looks a little better here in

168
00:09:19.220 --> 00:09:19.960
this case.

169
00:09:19.960 --> 00:09:23.680
But again, we could use prompt engineering to

170
00:09:23.680 --> 00:09:26.580
specify exactly how we want it to look.

171
00:09:29.760 --> 00:09:32.020
We then go through the remaining content and

172
00:09:32.020 --> 00:09:32.400
messages.

173
00:09:32.760 --> 00:09:33.360
We're done.

174
00:09:34.400 --> 00:09:37.640
And we have our image that we could

175
00:09:37.640 --> 00:09:39.700
then use in our system.

176
00:09:41.220 --> 00:09:45.500
So it's very easy to just give

177
00:09:45.500 --> 00:09:48.600
it the tool, and if we had not used

178
00:09:48.600 --> 00:09:51.020
it to get images back but just used

179
00:09:51.020 --> 00:09:53.900
it for a very advanced formula, we wouldn't

180
00:09:53.900 --> 00:09:56.300
need to do anything down here.

181
00:09:56.680 --> 00:09:58.800
But if we get files back, we need

182
00:09:58.800 --> 00:10:01.560
to do stuff like this, going through the

183
00:10:01.560 --> 00:10:05.640
content, figuring out where the annotations are, and

184
00:10:05.640 --> 00:10:09.480
figuring out where the container file citations are,

185
00:10:09.600 --> 00:10:12.960
so we can actually get our files and

186
00:10:12.960 --> 00:10:15.760
show them to the user.

187
00:10:17.960 --> 00:10:20.320
So this part down here is more advanced.

188
00:10:21.040 --> 00:10:23.340
And in general, you will now see that

189
00:10:23.340 --> 00:10:26.400
this response, which so far we have only

190
00:10:26.400 --> 00:10:29.800
used via Text or ToString(), has so

191
00:10:29.800 --> 00:10:33.060
much more information in it, which we need

192
00:10:33.060 --> 00:10:35.740
when we want to get back more advanced,

193
00:10:35.740 --> 00:10:37.500
multimodal results.
