WEBVTT

1
00:00:00.000 --> 00:00:02.200
Up to this point,

3
00:00:02.240 --> 00:00:05.920
we've used OpenAI's models for completing text in some

4
00:00:05.920 --> 00:00:12.440
way, whether that's answering questions, classifying statements, or editing text.

7
00:00:12.440 --> 00:00:17.360
In this chapter, we'll go further into the API capabilities, including text

8
00:00:17.360 --> 00:00:23.960
moderation, audio transcription and translation, and even combining models together.

11
00:00:23.960 --> 00:00:25.480
Let's start with text moderation.

14
00:00:26.480 --> 00:00:32.560
Text moderation is the process of identifying text that is inappropriate for the context it is being used in.

17
00:00:33.760 --> 00:00:37.400
In online communities like social networks and chat rooms,

18
00:00:37.400 --> 00:00:41.800
moderation is crucial to prevent the exchange of harmful and offensive content.

19
00:00:41.800 --> 00:00:48.520
Traditionally, this moderation was done by hand, where a team of moderators flagged the content that

20
00:00:48.520 --> 00:00:55.800
breached usage rules, and more recently, by algorithms that detected and flagged content containing particular words.

23
00:00:56.680 --> 00:01:01.600
Manual moderation is extremely time-consuming and, if multiple moderators

24
00:01:01.600 --> 00:01:07.760
are involved, introduces a subjective element that may result in inconsistencies.

27
00:01:07.760 --> 00:01:12.280
Word-detection algorithms, although much quicker and able to run round-the-clock,

28
00:01:12.280 --> 00:01:17.560
can be a clumsy tool that misses some malicious content while accidentally flagging

29
00:01:17.560 --> 00:01:22.480
perfectly good content because it doesn't understand nuance or the context of the discussion.

32
00:01:22.480 --> 00:01:27.240
To prevent the misuse of their own models, OpenAI have developed

33
00:01:27.240 --> 00:01:32.400
moderation models to flag content that breaches their usage policies.

36
00:01:32.400 --> 00:01:38.680
The OpenAI moderation models can not only detect violations of their terms of use, but also

37
00:01:38.680 --> 00:01:45.080
differentiate the type of violation across different categories, including violence and hate speech.

40
00:01:45.080 --> 00:01:51.360
To create a request to the Moderations endpoint, we call the create method on

41
00:01:51.360 --> 00:01:59.720
openai-dot-Moderation, and specify that we want the latest moderation model, which often performs the best.

42
00:01:59.720 --> 00:02:08.360
If a use case requires greater stability in classifications over time, we can also specify particular model versions.

45
00:02:08.360 --> 00:02:12.200
Next is the input, which is the content that the model will consider.

46
00:02:12.200 --> 00:02:15.880
This statement could easily be classed as violent by

47
00:02:15.880 --> 00:02:20.160
traditional moderation systems that worked by flagging particular keywords.

50
00:02:20.160 --> 00:02:23.160
Let's see what OpenAI's models make of it.


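NOTE
A minimal sketch of the request just described, assuming the pre-1.0 openai
Python package (the one that exposes openai.Moderation) and an API key in the
OPENAI_API_KEY environment variable. The helper names and the input sentence
are hypothetical; the statement shown on screen is not part of this transcript.
```python
import os
# Alias named in the narration; it tracks the newest moderation model.
LATEST_MODEL = "text-moderation-latest"
def build_moderation_request(text, model=LATEST_MODEL):
    """Return the keyword arguments for a Moderations request."""
    return {"model": model, "input": text}
def moderate(text, model=LATEST_MODEL):
    """Call the Moderations endpoint; needs openai<1.0 installed and an API key."""
    import openai  # lazy import so the helper above runs without the package
    openai.api_key = os.environ["OPENAI_API_KEY"]
    return openai.Moderation.create(**build_moderation_request(text, model))
# Hypothetical input: a book title that a keyword filter might flag as violent.
statement = "My favorite book is To Kill a Mockingbird."
request = build_moderation_request(statement)
print(request)
```
Passing a pinned version name as model, instead of the alias, is how a use
case would trade the newest model for classifications that stay stable over time.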

53
00:02:24.240 --> 00:02:32.080
Like other endpoints, the response is still a JSON object, and contains three useful indicators: categories, representing

54
00:02:32.080 --> 00:02:39.040
whether the model believed that the statement violated a particular category, category_scores, an indicator of the

55
00:02:39.040 --> 00:02:41.760
model's confidence of a violation, and

56
00:02:41.760 --> 00:02:47.120
finally, flagged, whether it believes the terms of usage have been violated in any way.


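NOTE
To make the three indicators concrete, here is a rough sketch that reads them
from a hand-written dictionary laid out like a Moderations response. The
numbers are made up for illustration, other response fields are omitted, and
summarize is a hypothetical helper name.
```python
# Hand-written sample in the shape of a Moderations response; values are made up.
sample_response = {
    "results": [
        {
            "flagged": False,
            "categories": {"hate": False, "violence": False},
            "category_scores": {"hate": 0.0002, "violence": 0.0031},
        }
    ]
}
def summarize(response):
    """Pull the three indicators out of the first (and only) result."""
    result = response["results"][0]
    return result["flagged"], result["categories"], result["category_scores"]
flagged, categories, scores = summarize(sample_response)
print("flagged:", flagged)
print("categories:", categories)
print("category_scores:", scores)
```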

59
00:02:48.120 --> 00:02:50.560
Let's take a closer look at the category_scores.

62
00:02:50.560 --> 00:02:57.640
The category_scores are float values for each category indicating the model's confidence of a violation.

63
00:02:57.640 --> 00:03:02.920
The scores can be between 0 and 1, and although higher values

64
00:03:02.920 --> 00:03:07.280
represent higher confidence, they should not be interpreted as probabilities.

67
00:03:08.400 --> 00:03:12.560
Notice from the small numbers, including in the violence category, that

68
00:03:12.560 --> 00:03:17.800
OpenAI's moderation model did not interpret the statement as containing violent content.

69
00:03:17.800 --> 00:03:25.040
The model used the rest of the sentence to interpret the context and accurately infer the statement's meaning.


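NOTE
One way to inspect the scores is to rank the categories from highest to lowest.
This is only a sketch: the dictionary below is made up in the shape of
category_scores and is not real output from the model.
```python
# Made-up scores in the shape of category_scores; not real API output.
category_scores = {"hate": 0.0004, "self-harm": 0.0001, "sexual": 0.0002, "violence": 0.0120}
# Sort categories from the highest score to the lowest.
ranked = sorted(category_scores.items(), key=lambda item: item[1], reverse=True)
top_category, top_score = ranked[0]
for name, score in ranked:
    print(f"{name:<10} {score:.4f}")
```
Even the top-ranked category here has a small score, which is the pattern the
narration describes for a statement that is not actually violent.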

72
00:03:25.040 --> 00:03:30.120
The beauty of having access to these category scores is that we don't have to

73
00:03:30.120 --> 00:03:35.840
depend on the final true/false results output by the model; we can instead test the

74
00:03:35.840 --> 00:03:42.040
model on data from our own particular use case, and set our own thresholds based on the results.


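NOTE
Setting our own thresholds could look roughly like this. The threshold values,
the scores, and the flag_with_thresholds helper are all hypothetical; in
practice the thresholds would come from testing on data from the use case.
```python
# Hypothetical per-category thresholds; real values should come from your own testing.
THRESHOLDS = {"violence": 0.40, "hate": 0.25}
DEFAULT_THRESHOLD = 0.50  # used for any category not listed above
def flag_with_thresholds(category_scores, thresholds=THRESHOLDS, default=DEFAULT_THRESHOLD):
    """Return the sorted categories whose score meets or exceeds its threshold."""
    return sorted(
        name for name, score in category_scores.items()
        if score >= thresholds.get(name, default)
    )
# Made-up scores for two messages.
calm = {"violence": 0.01, "hate": 0.02, "sexual": 0.01}
heated = {"violence": 0.55, "hate": 0.30, "sexual": 0.05}
print(flag_with_thresholds(calm))
print(flag_with_thresholds(heated))
```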

77
00:03:42.040 --> 00:03:46.200
For some use cases, such as student communications in a

78
00:03:46.200 --> 00:03:54.360
school, strict thresholds may be chosen that flag more content, even if it means accidentally flagging some non-violations.

79
00:03:54.360 --> 00:04:01.200
The goal here would be to minimize the number of missed violations, so-called false negatives.

82
00:04:01.200 --> 00:04:06.720
Other use cases, such as communications in law enforcement, may use

83
00:04:06.720 --> 00:04:11.240
more lenient thresholds so reports on crimes aren't accidentally flagged.

84
00:04:11.240 --> 00:04:16.520
Incorrectly flagging a crime report here would be an example of a false positive.


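NOTE
The trade-off between strict and lenient thresholds can be sketched by counting
false positives and false negatives on labelled examples. The scores, labels,
and the two threshold values below are made up purely for illustration.
```python
# Made-up (violence score, is_real_violation) pairs standing in for labelled data.
LABELLED = [(0.05, False), (0.20, False), (0.35, True), (0.60, False), (0.80, True), (0.95, True)]
def count_errors(examples, threshold):
    """Return (false_positives, false_negatives) when flagging scores >= threshold."""
    false_positives = sum(1 for score, violation in examples if score >= threshold and not violation)
    false_negatives = sum(1 for score, violation in examples if score < threshold and violation)
    return false_positives, false_negatives
strict = count_errors(LABELLED, threshold=0.30)   # flags more content
lenient = count_errors(LABELLED, threshold=0.70)  # flags less content
print("strict :", strict)
print("lenient:", lenient)
```
On this toy data the strict threshold misses no violations but flags one
harmless message, while the lenient one flags nothing harmless but misses a violation.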

87
00:04:16.520 --> 00:04:20.600
Time for some practice!

