In this chapter, we'll take a deeper dive into the rich text and chat functionality the API has to offer.

So far, we've used the Completions endpoint to answer questions, but the model's capabilities go far beyond this.

To understand where these capabilities come from, let's take a step back and discuss how text completion works.

When we send a prompt to the Completions endpoint, the model returns the text that it judges most likely to complete the prompt, which it infers from the data it was trained on.

If we send "Life is like a box of chocolates" to the model, it correctly completes the quote with high probability. We say high probability because the model's results are non-deterministic: it might correctly complete the quote, say, 98 times out of 100.

There are many use cases where randomness is undesirable. Think of a customer service chatbot: we wouldn't want it to provide different guidance to customers with the same issue. However, we would like the model to be flexible to different inputs, so there's often a trade-off in the amount of randomness.

We can control the amount of randomness in the response using the temperature parameter. temperature is set to one by default, but can range from zero to two, where zero is almost entirely deterministic and two is extremely random.

If we add a temperature of two here, we can see the model completes the prompt by putting its own bizarre spin on Forrest Gump's famous quote.
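To build some intuition for what temperature does, here is a toy sketch of temperature-scaled sampling. It is not the API's internal code; the candidate words and their scores are made up for illustration, and `softmax_with_temperature` is a hypothetical helper name.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        # Treat zero as "always pick the top-scoring option" (a greedy choice).
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-word candidates and scores, invented for this demo.
candidates = ["chocolates", "surprises", "rocks"]
logits = [6.0, 2.0, 0.5]

# Lower temperatures concentrate probability on the top candidate;
# higher temperatures spread it across the alternatives.
for t in (0, 1, 2):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])

# Draw one word at temperature 2; the fixed seed keeps the demo repeatable.
rng = random.Random(0)
print(rng.choices(candidates, weights=softmax_with_temperature(logits, 2))[0])
```

As the temperature rises, the less likely words receive a larger share of the probability, which is why high-temperature completions can wander away from the expected quote.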

Because the text completion model returns the most likely text to follow the prompt, it can be used to solve a number of tasks besides answering questions, including text content generation and transformation.

Text transformation involves changing text based on an instruction, and examples include find and replace, summarization, and copyediting.

For example, we can use the API to update the name, pronouns, and job title in a bio.

Notice that the prompt starts with the instruction, then the text to transform. We've also used triple quotes to define a multi-line prompt for ease of readability and processing.

Then, as before, we send this prompt to the Completions endpoint of the API using Completion-dot-create.

Voilà! We have our updated text. Even with a find and replace tool, this task would normally require us to specify every word to update.
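The steps above can be sketched roughly as follows. The bio, the names in it, and the model name are invented for illustration, `build_request` and `send` are hypothetical helpers, and the call assumes the legacy (pre-1.0) `openai` Python package, where `openai.Completion.create` is available.

```python
# Instruction first, then the text to transform. Triple quotes let the
# prompt span several lines. The bio itself is made up for this example.
prompt = """Update the name to Alex Rivera, the pronouns to he/him, and the job title to Senior Data Analyst in the following text:

Jane Smith is a Data Analyst at Example Corp. She enjoys building dashboards, and her favorite language is Python.
"""

def build_request(prompt, model="gpt-3.5-turbo-instruct", **options):
    """Collect the keyword arguments for a Completions request."""
    # The default model name is an assumption; use whichever completion
    # model is available to your account.
    return {"model": model, "prompt": prompt, **options}

def send(request):
    """Send the request and return the generated text."""
    import openai  # imported lazily so the rest of the sketch runs without it
    response = openai.Completion.create(**request)
    return response["choices"][0]["text"]

request = build_request(prompt)
print(request["prompt"])

# Uncomment to call the API; this needs the openai package (<1.0) and an API key.
# print(send(request))
```

Keeping the request as a plain dictionary makes it easy to inspect what will be sent, or to add options such as temperature, before making a billable call.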

Text completions are also used to generate new text content from a prompt that provides an instruction.

For example, we can create a request to generate a tagline for a new hot dog stand - the API does a good job, and even includes a subtle pun!

By default, the response from the API is quite short, which may be unsuitable for many use cases.

The max_tokens parameter can be used to control the maximum length of the response.

A token is a unit of one or more characters that language models use to understand and interpret text.

In English, one token corresponds to about four characters, and 100 tokens to about 75 words, so if our use case requires no more than around 150 words, a max_tokens of 200 would be a good choice.
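The arithmetic behind that choice can be written as a small helper. This is a sketch built on the 75-words-per-100-tokens rule of thumb for English; real token counts vary with the text, and `max_tokens_for_words` is a hypothetical name.

```python
import math

WORDS_PER_100_TOKENS = 75  # rule of thumb for English text

def max_tokens_for_words(word_limit):
    """Estimate a max_tokens value that leaves room for about word_limit words."""
    return math.ceil(word_limit * 100 / WORDS_PER_100_TOKENS)

max_tokens = max_tokens_for_words(150)
print(max_tokens)  # → 200

# The estimate can then be passed along with the other request options.
request_options = {"max_tokens": max_tokens}
```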

Increasing max_tokens will likely also increase the usage cost for each request.

Recall that usage costs depend on the model used and the amount of generated text. Each model is priced per 1000 tokens, and input tokens (those in the prompt) and output tokens (those in the generated text) can be priced differently.

When scoping the potential cost of a new AI feature, the first step is often a back-of-the-envelope calculation to determine the cost per unit time.
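A back-of-the-envelope calculation like this might look as follows. The prices, token counts, and request volume are placeholders, not real pricing; check the current price list for the model you plan to use. `cost_per_request` is a hypothetical helper.

```python
def cost_per_request(input_tokens, output_tokens,
                     input_price_per_1k, output_price_per_1k):
    """Estimate the cost of one request from per-1000-token prices."""
    input_cost = (input_tokens / 1000) * input_price_per_1k
    output_cost = (output_tokens / 1000) * output_price_per_1k
    return input_cost + output_cost

# Placeholder prices in dollars per 1000 tokens (assumptions, not real rates).
INPUT_PRICE = 0.0015
OUTPUT_PRICE = 0.002

# Assume a 100-token prompt and a response capped at 200 tokens.
per_request = cost_per_request(100, 200, INPUT_PRICE, OUTPUT_PRICE)

# Scale by an assumed request volume to get a cost per unit of time.
requests_per_day = 1000
per_day = per_request * requests_per_day

print(f"${per_request:.5f} per request, ${per_day:.2f} per day")
```

Using max_tokens as the output token count gives an upper bound, since responses can finish before reaching the limit.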

Onward to the exercises!