Welcome back! In this video, we'll take it up a gear and look at how we can combine models together.

So far, when we've been using functionality from the OpenAI API, we've completed a task by passing an input to the model and receiving an output.

This got us pretty far, from simple question answering to multi-turn conversations and audio translation.

But what if we could do more with the model's output.

Enter model chaining. Chaining is when models are combined by feeding the output from one model directly into another model as an input.

We can chain multiple calls to the same model together or use different models.

If we chain two text models together, we can ask the model to perform a task in one call to the API and send it back with an additional instruction. For example, let's say we draft an email template for a customer, and we want to validate that the template follows some guidelines, we can send it back to the model with an instruction to check whether the guidelines are followed.

We can also combine two different types of models, say the Whisper model into a text model, to perform tasks like summarizing discussion points and next steps from a meeting recording.

Let's have a go at chaining Whisper with a chat model.

Let's use Whisper to extract the attendees from a meeting recording, where we know it starts with introductions from each of the attendees.

We start by opening the audio file and assigning it audio_file.

Next, we send the audio to the Whisper model and request a transcript with the transcribe method. To extract the transcript from the response, we extract the value from the text key.

Now that we have the meeting transcript, we can use it to create a prompt for the chat model.

The prompt starts with an instruction to extract the attendee names from the start of the transcript, then we append the transcript to the end.

We're now ready to send the prompt to the chat model! We create a request to the ChatCompletion endpoint using the create method. Inside, we specify the model to use and the messages to send, which is just the prompt in this case.

Finally, we extract the response from the chat model by digging into the nested JSON response.

And there we have it!

As with all of these models, there's no guarantee that the models will be 100% accurate - Whisper could make a mistake in the transcript or the chat model may summarize incorrectly. It's important that any application of these models is well-tested, iterating on the prompts if necessary, to understand its performance thresholds. Additionally, usage should be restricted to only non-sensitive data, as we may risk breaching data governance laws by unjustly exposing employee or customer data.

Over to the final exercises!