WEBVTT

00:00.060 --> 00:00.893
-: Hey, welcome back.

00:00.893 --> 00:03.030
So far, you've learned about how to manage messages

00:03.030 --> 00:04.680
in both a stateful environment,

00:04.680 --> 00:06.990
i.e. OpenAI saves the messages,

00:06.990 --> 00:08.220
and also the stateless,

00:08.220 --> 00:11.190
where the store=False with the Responses API.

00:11.190 --> 00:12.420
The next thing we're gonna have a look at

00:12.420 --> 00:15.510
is how we can do custom token counting

00:15.510 --> 00:17.970
if we want to restrict the token size

00:17.970 --> 00:20.460
when we're using the Responses API.

00:20.460 --> 00:21.990
Now, you can either use the notebook,

00:21.990 --> 00:24.690
or you can follow along and get that muscle memory building.

00:24.690 --> 00:26.580
The notebook you would want to open

00:26.580 --> 00:30.270
is the managing_token_messages_manually_responses_api

00:30.270 --> 00:33.303
inside of the openai_features_and_functionality" folder.

00:34.320 --> 00:35.730
All right, so the first thing that we're gonna do

00:35.730 --> 00:38.040
is we're gonna install the tiktoken package

00:38.040 --> 00:39.840
and the openai package.

00:39.840 --> 00:41.340
After that, what we're then gonna do

00:41.340 --> 00:44.160
is we're gonna import the OpenAI package,

00:44.160 --> 00:46.110
import the model that we're gonna be using,

00:46.110 --> 00:48.197
is GPT-4.1-mini.

00:48.197 --> 00:50.310
You're going to need to update the client

00:50.310 --> 00:51.597
and add in your API key here.

00:51.597 --> 00:52.530
Now if you scroll down,

00:52.530 --> 00:54.480
we've got this managing tokens manually

00:54.480 --> 00:56.850
using tiktoken and a custom function.

00:56.850 --> 00:58.770
So we've copied across that function

00:58.770 --> 01:00.240
from the other notebook,

01:00.240 --> 01:01.200
which is basically,

01:01.200 --> 01:02.547
we have this import on tiktoken

01:02.547 --> 01:05.550
and we can easily find out how many messages we have

01:05.550 --> 01:07.590
for all of these different types of models

01:07.590 --> 01:09.600
as a custom helper function.

01:09.600 --> 01:11.820
And we're gonna have a look at a more complicated example

01:11.820 --> 01:14.670
of how to manage tokens for the input tokens.

01:14.670 --> 01:17.340
We have a bunch of article headings here

01:17.340 --> 01:18.900
and we're gonna set up the system prompts.

01:18.900 --> 01:20.562
So we'll say system_prompt

01:20.562 --> 01:24.600
= You are a helpful assistant

01:24.600 --> 01:28.320
for a financial news website.

01:28.320 --> 01:30.888
And then we are also going to say

01:30.888 --> 01:35.888
system_prompt += All of the subheadings.

01:37.140 --> 01:38.040
So at this point here,

01:38.040 --> 01:42.300
we're gonna do for heading in article_headings.

01:42.300 --> 01:46.560
Then we're gonna do system_prompt +=

01:46.560 --> 01:50.013
and then we're gonna add in the f string here with heading,

01:51.490 --> 01:55.020
\n for the new line.

01:55.020 --> 01:57.270
After that we're gonna do

01:57.270 --> 02:00.960
messages.append,

02:00.960 --> 02:04.500
and then we're gonna paste in the role of system,

02:04.500 --> 02:08.220
and the content of that will be equal to our system_prompt,

02:14.610 --> 02:16.743
and we need to put a comma here.

02:19.050 --> 02:21.270
We're also gonna set the maximum token size,

02:21.270 --> 02:23.250
so MAX_TOKEN_SIZE,

02:23.250 --> 02:26.163
we're gonna set this to 2048.

02:27.750 --> 02:29.280
Now we have some pseudo code here,

02:29.280 --> 02:30.420
where we're gonna loop over

02:30.420 --> 02:32.940
every article_heading in the headings.

02:32.940 --> 02:34.740
And whilst the chat history object

02:34.740 --> 02:37.350
is more than 2,048 tokens,

02:37.350 --> 02:41.940
we're gonna remove the oldest non-system/developer message.

02:41.940 --> 02:43.290
Basically how we'll do that

02:43.290 --> 02:45.000
is find the index of the first message.

02:45.000 --> 02:46.950
It's not a system or developer message,

02:46.950 --> 02:49.140
and if there isn't a non-system message,

02:49.140 --> 02:50.490
we're going to remove it.

02:50.490 --> 02:53.250
So the first thing that we need to do for this

02:53.250 --> 02:55.620
is we need to add on a messages here

02:55.620 --> 02:57.300
telling them that we want to write

02:57.300 --> 02:59.670
a very large paragraph about that heading.

02:59.670 --> 03:00.810
The next thing we're gonna do

03:00.810 --> 03:03.600
is tell ChatGPT to generate a response,

03:03.600 --> 03:05.940
and notice how we are using the store=False

03:05.940 --> 03:08.130
so that we can control the message history.

03:08.130 --> 03:10.470
We're putting in the messages that already exist.

03:10.470 --> 03:13.440
And then we are gonna then update the conversation history

03:13.440 --> 03:14.940
with what comes out of that.

03:14.940 --> 03:19.050
So we'll just create a new role with a role of assistant

03:19.050 --> 03:22.290
putting in the content from the response.output_text.

03:22.290 --> 03:23.520
And then our while loop,

03:23.520 --> 03:25.860
we're gonna have some custom logic here

03:25.860 --> 03:28.620
where we iterate over all of these.

03:28.620 --> 03:32.280
And what you'll see is we basically end up doing this

03:32.280 --> 03:36.840
where we look to see in our messages

03:36.840 --> 03:40.350
if the message role is not in the system or developer,

03:40.350 --> 03:44.550
then we found the first index of that specific message.

03:44.550 --> 03:47.580
Then what we do is if it's not equal to None,

03:47.580 --> 03:49.410
then we pop that message out

03:49.410 --> 03:52.590
and we also then print out the current number of tokens.

03:52.590 --> 03:54.000
So if you run this,

03:54.000 --> 03:58.470
what you'll see is it will actually grow in the token size

03:58.470 --> 04:00.330
as this runs again and again.

04:00.330 --> 04:02.700
So we'll just wait for this to come back.

04:02.700 --> 04:06.480
The first token count was 1,163 tokens.

04:06.480 --> 04:08.460
The second token count will be even larger

04:08.460 --> 04:10.200
because we're just continually adding in

04:10.200 --> 04:11.790
all of the message history.

04:11.790 --> 04:14.940
So this is why I wanted to show this to you

04:14.940 --> 04:16.050
and demonstrate this to you

04:16.050 --> 04:18.570
because as you add on these messages,

04:18.570 --> 04:21.390
the input length is gonna get increasingly large,

04:21.390 --> 04:23.280
which will add onto your costs.

04:23.280 --> 04:26.070
So what we are doing here is having a custom function

04:26.070 --> 04:29.880
that allows us to basically reduce the token count.

04:29.880 --> 04:31.080
And then as we see here,

04:31.080 --> 04:34.500
the current token count is roughly hovering just above 2,000

04:34.500 --> 04:37.260
because we decided to remove some messages from this.

04:37.260 --> 04:39.690
So hopefully this gives you a good way

04:39.690 --> 04:43.080
of you can combine the store=False

04:43.080 --> 04:46.230
when you're using the responses.create method

04:46.230 --> 04:50.100
as well as the custom token utility function that we have

04:50.100 --> 04:53.130
to basically generate any type of long output

04:53.130 --> 04:55.410
whilst truncating earlier messages

04:55.410 --> 04:57.900
that aren't your system or developer message.

04:57.900 --> 04:59.310
In the next video, we're gonna have a look

04:59.310 --> 05:02.040
at the differences in the API reference

05:02.040 --> 05:04.560
and how you use the Chat Completions endpoint

05:04.560 --> 05:06.840
versus the Responses API endpoint.

05:06.840 --> 05:08.850
Both endpoints are currently fresh.

05:08.850 --> 05:10.650
And the Chat Completions endpoint,

05:10.650 --> 05:12.480
you will see that in some of our content.

05:12.480 --> 05:14.280
It is being continued indefinitely,

05:14.280 --> 05:15.660
so it's good for you to be aware

05:15.660 --> 05:17.910
of how to use both the Responses API

05:17.910 --> 05:19.620
and the Chat Completions API.

05:19.620 --> 05:21.370
Cool, I'll see you in the next one.
