WEBVTT

1
00:00:00.000 --> 00:00:07.400
Hi, and welcome to this AI and CGI video on an OpenAI and Azure OpenAI specific topic,

2
00:00:07.400 --> 00:00:09.319
which is batch jobs.

3
00:00:09.319 --> 00:00:16.639
So batch jobs is a way for you to get your AI much cheaper, meaning 50% off, if you can

4
00:00:16.639 --> 00:00:18.700
wait for the responses.

5
00:00:18.700 --> 00:00:22.639
So let's see what that is in details.

6
00:00:22.639 --> 00:00:29.440
Because there's a lot to go for batches, because if you check a normal LLM call, it has the

7
00:00:29.440 --> 00:00:35.319
normal model price, depending on the model you choose, of course, while a batch job will

8
00:00:35.319 --> 00:00:40.479
actually only cost you 50% of that, meaning half.

9
00:00:40.479 --> 00:00:47.599
It also gets you much higher rate limits, like 10x rate limits on certain models.

10
00:00:47.599 --> 00:00:50.580
Newer models is slightly less.

11
00:00:50.580 --> 00:00:58.439
So what's not to like, well, there is the difference of that a normal job, you will

12
00:00:58.439 --> 00:01:04.199
be guaranteed instant processing, it begins to work with your data, while in a batch job,

13
00:01:04.199 --> 00:01:10.279
you can technically wait up to 24 hours before you get your job back.

14
00:01:10.279 --> 00:01:17.959
In real life, for low number of requests, it's much faster, which we'll see in one second

15
00:01:17.959 --> 00:01:24.379
in a demo, but they are not guaranteeing it to happen.

16
00:01:24.379 --> 00:01:31.580
So what is actually happening is, when they feel there's no spikes in the number of requests

17
00:01:31.580 --> 00:01:37.260
going to the LLMs, they will process these batch jobs on the fly.

18
00:01:37.260 --> 00:01:41.300
While there's high traffic, you will get less true.

19
00:01:41.300 --> 00:01:51.419
So when there's much traffic in, for example, US waking up, then batching will take longer,

20
00:01:51.419 --> 00:01:57.699
but within 24 hours, you will get your result.

21
00:01:57.699 --> 00:02:03.500
Both of them can do chats, client and responses API, and embeddings.

22
00:02:03.500 --> 00:02:09.460
So you can also take your embeddings and do in batches if need be.

23
00:02:09.460 --> 00:02:18.259
There's some limits in that you can max send out 50,000 requests per file of a batch, because

24
00:02:18.259 --> 00:02:26.419
a batch is happening by sending a file to the LLM, instead of just the individual request.

25
00:02:26.419 --> 00:02:31.860
And it can max be 200 megabytes, that's quite a lot, so we're probably not going to limit you

26
00:02:31.860 --> 00:02:35.539
going to hit, or else you can send multiple batches.

27
00:02:35.539 --> 00:02:38.580
And tool calls are not really supported.

28
00:02:38.580 --> 00:02:43.860
You can technically give a tool, but then the response back from the batch will just be,

29
00:02:43.860 --> 00:02:49.460
hey, I want to run this tool, and then you need to send a new batch with the tool in it.

30
00:02:49.460 --> 00:02:56.100
So it is possible, but in reality, you probably wouldn't want to use that part of it for this.

31
00:02:59.300 --> 00:03:02.339
So let's see this in action.

32
00:03:02.339 --> 00:03:06.740
So I have, under other topics here, I have something called batch jobs,

33
00:03:06.740 --> 00:03:12.179
and right now I'm using OpenAI, but Azure OpenAI is also possible.

34
00:03:12.179 --> 00:03:17.699
We will talk about that at the end, because there's slight differences in what we need to do.

35
00:03:19.220 --> 00:03:20.339
So let's run this.

36
00:03:26.339 --> 00:03:30.020
And the first thing we do is, of course, we make our client like normal,

37
00:03:30.899 --> 00:03:33.539
and then we make a file client.

38
00:03:34.100 --> 00:03:41.940
We have seen file clients before in other topics, but that is just to put up a file into OpenAI.

39
00:03:43.139 --> 00:03:49.460
And, again, I could switch out the OpenAI client with the Azure OpenAI client to be exactly the same.

40
00:03:50.580 --> 00:03:52.979
And then I get something called a batch client as well,

41
00:03:54.179 --> 00:04:01.059
because we need to upload what we want to have processed by the LLM using a file,

42
00:04:01.059 --> 00:04:06.100
and we need to upload one called JSON-L for JSON lists.

43
00:04:06.820 --> 00:04:12.740
And if we look at that file over here, it is just a set of JSON.

44
00:04:12.740 --> 00:04:14.740
So one line equals one JSON.

45
00:04:14.740 --> 00:04:17.140
You can't even put in line breaks.

46
00:04:17.140 --> 00:04:18.579
You need to have them on one line.

47
00:04:19.220 --> 00:04:25.299
And you essentially tell, I want to have a job being done, giving it a unique ID.

48
00:04:25.299 --> 00:04:26.420
It needs to be post.

49
00:04:26.420 --> 00:04:32.019
It needs to go against the completion endpoint.

50
00:04:32.019 --> 00:04:34.579
It could be responses endpoint or embedding endpoint.

51
00:04:35.540 --> 00:04:39.540
It needs to be this model that does it, and then you send your messages.

52
00:04:39.540 --> 00:04:44.179
So in this case, we're just sending a system message with nice AI,

53
00:04:44.899 --> 00:04:49.140
and a role of the user where we can send that along.

54
00:04:50.739 --> 00:04:55.779
So if we do it like this, we can send three requests.

55
00:04:55.779 --> 00:04:58.500
So in this case, we are asking, why is the sky blue?

56
00:04:59.299 --> 00:05:00.579
What is the capital of France?

57
00:05:00.579 --> 00:05:02.820
And what is a JSON-L file?

58
00:05:02.820 --> 00:05:05.540
Which I actually didn't know about these files before that.

59
00:05:08.820 --> 00:05:13.779
So in OpenAI, we just give the model name like normal.

60
00:05:15.140 --> 00:05:20.019
If you're going to do this against Azure OpenAI, it's slightly different,

61
00:05:20.579 --> 00:05:25.220
because there, what you put in model here is the deployments.

62
00:05:26.179 --> 00:05:34.019
And up in Azure, when we do deployments up here with our models,

63
00:05:36.420 --> 00:05:43.059
each model are either set as a global standard or a global batch.

64
00:05:43.059 --> 00:05:49.299
So I made one here called global batch.

65
00:05:49.299 --> 00:05:54.820
And it's only the deployment type global batch that will work together with the system.

66
00:05:54.820 --> 00:05:59.140
But when you deploy a model, it's fairly simple to do.

67
00:06:00.579 --> 00:06:08.179
We take and see if we can choose batch here on this one.

68
00:06:08.179 --> 00:06:12.579
But if you go into custom, you choose here.

69
00:06:12.579 --> 00:06:14.820
In this case, it's codec, so it can't do it.

70
00:06:14.820 --> 00:06:18.260
But you will normally choose the model here.

71
00:06:18.260 --> 00:06:27.540
So let's go to nano and have this one we want to deploy as a batch.

72
00:06:28.660 --> 00:06:33.380
And we can choose global batch or data zone batch if we want to do that as well.

73
00:06:34.579 --> 00:06:36.660
So we can definitely do that up here.

74
00:06:37.220 --> 00:06:43.459
And that's the reason why I have a separate file if I want to do it using Azure.

75
00:06:45.380 --> 00:06:46.420
But back to the code.

76
00:06:47.380 --> 00:06:52.980
So in our case, we're just using the one that used HTTP 4.1 mini.

77
00:06:54.339 --> 00:06:56.500
So we get our file.

78
00:06:56.500 --> 00:06:59.459
And now it's uploaded to the system.

79
00:06:59.459 --> 00:07:04.019
And we need to tell the purpose of that file is batch, else it can't pick it up.

80
00:07:05.619 --> 00:07:09.459
And then we're going to make something called a create batch request.

81
00:07:09.459 --> 00:07:12.179
Because this is a very open-ended system.

82
00:07:13.140 --> 00:07:17.140
So there's no real out-of-the-box objects for these.

83
00:07:17.700 --> 00:07:21.059
So instead, we need to make our own object here,

84
00:07:21.059 --> 00:07:26.260
where we need to have an input file ID, an endpoint, and a completion window.

85
00:07:27.459 --> 00:07:30.980
And we're giving the ID of the file we just uploaded.

86
00:07:30.980 --> 00:07:32.820
We're telling it's completions.

87
00:07:34.019 --> 00:07:37.140
And we're telling it's a window of 24 hours.

88
00:07:37.140 --> 00:07:40.899
It's a little odd that they have this, because it's the only option you can give.

89
00:07:42.179 --> 00:07:45.700
But when we do that, we get something called a binary content.

90
00:07:46.980 --> 00:07:50.100
And we can say we want to create a batch.

91
00:07:50.820 --> 00:07:52.899
We can wait for the batch to complete.

92
00:07:53.540 --> 00:07:54.980
So it will just sit and wait.

93
00:07:55.540 --> 00:07:59.380
But in real life, you will probably not wait for the batch.

94
00:08:00.980 --> 00:08:08.339
I've seen batches come back between 20 seconds and 4 minutes for these three things.

95
00:08:09.380 --> 00:08:11.779
But of course, with much more, it will take longer.

96
00:08:13.140 --> 00:08:16.899
But just to simulate that we're not waiting,

97
00:08:17.859 --> 00:08:19.299
we start a batch.

98
00:08:19.299 --> 00:08:21.459
So you can see it didn't take a long time.

99
00:08:22.019 --> 00:08:25.380
And we just get back that we have a batch ID.

100
00:08:27.540 --> 00:08:30.179
So we're writing out our batch ID here.

101
00:08:31.299 --> 00:08:40.580
And then we're going to go into a while loop and ask, how is our batch doing?

102
00:08:41.219 --> 00:08:46.659
So there's three termination statuses that we need to listen for.

103
00:08:46.659 --> 00:08:48.900
Completed, expired, or canceled.

104
00:08:48.900 --> 00:08:52.900
So we can either complete it, get it completed.

105
00:08:52.900 --> 00:08:56.419
It can expire if it didn't finish within the 24 hours.

106
00:08:56.979 --> 00:09:01.940
Or we have an endpoint that can say, we want to cancel this batch.

107
00:09:01.940 --> 00:09:03.460
We don't want to run it anymore.

108
00:09:03.460 --> 00:09:04.659
Then it's possible to do that.

109
00:09:05.380 --> 00:09:09.219
So we have the options of validating, failed, in progress, and so on.

110
00:09:09.219 --> 00:09:18.500
So if we take these different termination statuses and just start a clock,

111
00:09:18.500 --> 00:09:22.340
we can go in and check if the status of a batch,

112
00:09:22.340 --> 00:09:24.820
which we use the batch client again and say, get batch.

113
00:09:28.099 --> 00:09:31.700
Right now, I have a delay here of 10 seconds we need to wait for first.

114
00:09:34.419 --> 00:09:38.659
Because in real life, it would not complete within those.

115
00:09:38.659 --> 00:09:40.739
We will now get a batch result back.

116
00:09:41.380 --> 00:09:45.539
And this batch result don't really have anything directly for us.

117
00:09:45.539 --> 00:09:47.940
Because again, it's a very open-ended system.

118
00:09:47.940 --> 00:09:50.979
So the real thing we get back is actually this JSON.

119
00:09:52.099 --> 00:09:54.580
And in this case, we can see now it's in progress.

120
00:09:55.619 --> 00:10:01.299
When it started, what is the output file and all kinds of things of the request.

121
00:10:01.299 --> 00:10:02.820
But right now, it's still running.

122
00:10:03.700 --> 00:10:14.659
And if we go to OpenAI, we can go in and check the batches up here.

123
00:10:15.539 --> 00:10:20.500
And it's actually now finished because it ran for 12.

124
00:10:20.500 --> 00:10:24.260
So let's just quickly restart the code with a new batch

125
00:10:24.260 --> 00:10:26.900
so we can see this in more seconds.

126
00:10:28.500 --> 00:10:31.059
So let's get it running.

127
00:10:31.780 --> 00:10:33.140
Let now have a new batch.

128
00:10:35.059 --> 00:10:39.380
We will be able to see there's a new batch up here. It's running.

129
00:10:39.380 --> 00:10:42.020
It's already completed the first of the three.

130
00:10:43.940 --> 00:10:52.179
And if we go to the code, I've just set up that every 10 seconds,

131
00:10:52.179 --> 00:10:55.859
it should give us back how far it's along.

132
00:10:57.219 --> 00:11:00.020
And we can set a breakpoint here.

133
00:11:00.659 --> 00:11:04.739
Because we are always asking for the response back here,

134
00:11:04.739 --> 00:11:08.659
which have a status, some counts, and output files.

135
00:11:10.500 --> 00:11:12.900
So this time, it takes a little longer.

136
00:11:14.820 --> 00:11:18.340
And we are essentially waiting for one of the statuses

137
00:11:18.340 --> 00:11:21.140
to be completed, expired, or canceled.

138
00:11:24.179 --> 00:11:27.859
And once we get that, it will go out of our loop.

139
00:11:27.859 --> 00:11:30.659
Of course, in real life, you wouldn't make a while loop like this.

140
00:11:30.659 --> 00:11:36.340
You would instead have some kind of job that checked every 30 minutes

141
00:11:36.340 --> 00:11:40.500
or however long you feel like your batches will be.

142
00:11:41.700 --> 00:11:45.140
And up here, we can now see it's finalizing.

143
00:11:45.140 --> 00:11:46.900
So we can also see that status here.

144
00:11:50.739 --> 00:11:54.900
So we should be very, very close to getting our result back,

145
00:11:54.900 --> 00:11:58.979
meaning our output file here will end up having a value.

146
00:12:03.140 --> 00:12:06.419
So now our output and status is completed.

147
00:12:08.340 --> 00:12:09.780
We go out of the while loop.

148
00:12:10.419 --> 00:12:12.260
We now have our output file.

149
00:12:13.700 --> 00:12:15.859
And we can download the output.

150
00:12:16.659 --> 00:12:19.940
So in this case, I'm just downloading it to a temp folder.

151
00:12:20.739 --> 00:12:23.460
And then I'm showing it to you in Notepad here.

152
00:12:24.419 --> 00:12:28.979
So if we take this and look at a JSON viewer,

153
00:12:28.979 --> 00:12:31.940
there's no real good JSON viewer for JSONL files.

154
00:12:31.940 --> 00:12:38.260
But if we look in here, we can see we have the first result back,

155
00:12:38.820 --> 00:12:42.260
the second result back, and the third result back.

156
00:12:43.059 --> 00:12:45.780
And the first result was, why is the sky blue?

157
00:12:45.780 --> 00:12:48.099
And you can see we only get the response back.

158
00:12:48.099 --> 00:12:51.859
We don't even get what the questions were.

159
00:12:51.859 --> 00:12:59.859
So we need to match our own ID of the request with our own response here.

160
00:12:59.859 --> 00:13:04.739
So we need to save the request as well and then merge them.

161
00:13:04.739 --> 00:13:07.619
But that's normal, C sharp.

162
00:13:07.619 --> 00:13:09.140
So we won't go into that.

163
00:13:09.940 --> 00:13:14.020
But we can get back why the sky is blue.

164
00:13:14.979 --> 00:13:20.580
We can get back what is the capital of France in the second request.

165
00:13:20.580 --> 00:13:23.700
And the third one was, what is a JSONL file?

166
00:13:26.580 --> 00:13:31.460
So if we have a look at the final part here,

167
00:13:31.460 --> 00:13:34.419
it's just we download the file and we save it.

168
00:13:36.020 --> 00:13:40.979
So we need a little of this extra thing with some objects here.

169
00:13:40.979 --> 00:13:43.219
You can just copy paste mine here.

170
00:13:43.219 --> 00:13:47.299
And there's a bunch more data in them, of course, as we saw.

171
00:13:48.260 --> 00:13:53.219
But this way, we have now asked three questions.

172
00:13:53.219 --> 00:13:55.940
We spent a couple of minutes.

173
00:13:55.940 --> 00:13:57.619
We can see that up here.

174
00:14:03.380 --> 00:14:07.219
I need to refresh here so we can see.

175
00:14:07.859 --> 00:14:14.900
So it took one minute and 26 seconds to get those three messages.

176
00:14:15.619 --> 00:14:17.780
And that would, of course, not be good for a chatbot.

177
00:14:17.780 --> 00:14:20.820
But it would be good if it's just some background job.

178
00:14:21.859 --> 00:14:29.059
And now we have saved 50% of our price just for waiting.

179
00:14:29.059 --> 00:14:33.539
So this is really, really good for some big advanced stuff

180
00:14:33.539 --> 00:14:37.299
where you really, really hit your rate limits.

181
00:14:37.299 --> 00:14:40.739
Or you just want to have it more cheap.

182
00:14:41.539 --> 00:14:46.260
There's more work for you because you need to do all these extra steps in the code.

183
00:14:47.059 --> 00:14:52.500
And the biggest problem, of course, is things like structured output and tools

184
00:14:52.500 --> 00:14:54.900
that is more difficult to work with here.

185
00:14:57.059 --> 00:14:58.500
But we can do it.

186
00:14:59.219 --> 00:15:01.380
And it's really nice that we can.

187
00:15:02.260 --> 00:15:04.739
Azure have exactly the same.

188
00:15:05.460 --> 00:15:10.580
If we go to the models again, go into build.

189
00:15:11.219 --> 00:15:12.340
Go to the models.

190
00:15:12.340 --> 00:15:13.859
We can see our batch jobs.

191
00:15:15.539 --> 00:15:19.940
The only real difference I have seen is it can't really on the fly,

192
00:15:21.380 --> 00:15:25.140
like we saw here, tell how many of them were completed.

193
00:15:26.340 --> 00:15:28.099
It seems like it stays at zero.

194
00:15:28.099 --> 00:15:30.659
And then it suddenly goes to complete it all.

195
00:15:30.659 --> 00:15:36.020
But that's the only difference in what I have seen in the two differences.

196
00:15:36.979 --> 00:15:41.299
But again, we need to deploy a dedicated model for batching.

197
00:15:43.059 --> 00:15:47.859
But else it works exactly the same just with the Azure OpenAI Cloud.

198
00:15:49.299 --> 00:15:52.979
So that's everything there is to know about batching.

199
00:15:53.619 --> 00:16:01.380
Really nice subject if you need to have some really, really big payloads.

200
00:16:02.179 --> 00:16:03.539
So see you in the next one.