WEBVTT

1
00:00:00.000 --> 00:00:08.000
Hi, and welcome to this AI and TCI video on an OpenAI specific topic, which is their service tiers.

2
00:00:08.000 --> 00:00:16.000
So, again, this is OpenAI specific, because they have something called Flex Standard and Priority,

3
00:00:16.000 --> 00:00:22.000
where you can choose your speed versus your pricing.

4
00:00:23.000 --> 00:00:29.000
Azure OpenAI doesn't have the same, or they have something,

5
00:00:29.000 --> 00:00:33.000
but it's an invite only, so we will leave it out of this video.

6
00:00:33.000 --> 00:00:37.000
So, this is for OpenAI specific.

7
00:00:37.000 --> 00:00:44.000
So, if you go to the pricing model of OpenAI, you can see there's something called Batch.

8
00:00:44.000 --> 00:00:49.000
We have a separate video for that, but there's Flex, Standard, and Priority.

9
00:00:49.000 --> 00:00:57.000
Standard is what you normally always see, and we see the different input and output pricing,

10
00:00:57.000 --> 00:01:03.000
while Priority has been there for a while and is double the price.

11
00:01:03.000 --> 00:01:15.000
For the double the price, you get less errors, less timeouts, and faster latency in general.

12
00:01:15.000 --> 00:01:23.000
A new thing they have is Flex, which goes the other way, which is slower and more latency,

13
00:01:23.000 --> 00:01:31.000
but if you can manage that, you get it for half the price, meaning the same as the batch prices.

14
00:01:31.000 --> 00:01:37.000
So, these are the three options, and the way you control it is either one of two ways.

15
00:01:37.000 --> 00:01:44.000
One way is that every API key you have is linked to what is called a project,

16
00:01:44.000 --> 00:01:51.000
and in projects, you can go in and set if it should run Default or Priority.

17
00:01:51.000 --> 00:01:58.000
It cannot run Flex at the moment, but you simply just switch to Priority,

18
00:01:58.000 --> 00:02:03.000
and then you're paying double the price now with better performance,

19
00:02:03.000 --> 00:02:09.000
and if you want to switch back, we switch back to Default.

20
00:02:09.000 --> 00:02:12.000
So, this is how you control it.

21
00:02:12.000 --> 00:02:19.000
If you don't want to put anything in code, you simply switch, and you get a better quality of the service.

22
00:02:19.000 --> 00:02:29.000
And I've seen that, in reality, this is actually quite needed if you have very big requests

23
00:02:29.000 --> 00:02:34.000
like several hundred thousand tokens in one go.

24
00:02:34.000 --> 00:02:42.000
Before we switched in our project to Priority, we simply couldn't use OpenAI for that,

25
00:02:42.000 --> 00:02:50.000
but that might have been a period where they had problems with delivering service.

26
00:02:50.000 --> 00:02:57.000
But you can switch. If you want to switch in code, you can also do that,

27
00:02:57.000 --> 00:03:03.000
and we do that down in the chat completion options or the response API options,

28
00:03:03.000 --> 00:03:06.000
where we have something called Service Tier.

29
00:03:06.000 --> 00:03:16.000
And Service Tier can be Auto, meaning it follows what is in the project. It can be Flex.

30
00:03:16.000 --> 00:03:21.000
If we look at the pricing again, Flex doesn't work on every single model.

31
00:03:21.000 --> 00:03:24.000
You can only see it's JGP 5 and 03.

32
00:03:24.000 --> 00:03:29.000
So, some of the earlier models, if you use Flex, it won't work.

33
00:03:29.000 --> 00:03:36.000
While Standard works for everything, and Priority also, more or less, works for everything.

34
00:03:36.000 --> 00:03:39.000
So, we can set that.

35
00:03:39.000 --> 00:03:47.000
We can set default, meaning even if you have a project that is Priority, you can switch it back to default,

36
00:03:47.000 --> 00:03:56.000
or you can have a default project that sometimes, on a case-by-case basis, needs to be Priority.

37
00:03:56.000 --> 00:04:02.000
And that's actually everything there is to it, just setting this extra thing.

38
00:04:02.000 --> 00:04:11.000
In real life, you would probably just go up in the portal and switch everything to the Premium Service,

39
00:04:11.000 --> 00:04:19.000
or have two projects, one that are Service Tier and one that are Default Tier.

40
00:04:19.000 --> 00:04:22.000
But that's everything. See you in the next one.

