WEBVTT

1
00:00:01.050 --> 00:00:04.130
Let's go in and see a little more

2
00:00:04.130 --> 00:00:06.190
than what we did in the Hello World,

3
00:00:06.350 --> 00:00:08.510
because the Hello World was a quick and

4
00:00:08.510 --> 00:00:09.930
dirty introduction.

5
00:00:10.410 --> 00:00:13.430
Let's go a little deeper this time, look

6
00:00:13.430 --> 00:00:15.850
at normal calls and look at streaming calls.

7
00:00:16.810 --> 00:00:18.930
So here's some similar code.

8
00:00:19.030 --> 00:00:21.710
It's called "Normal versus streaming response" in

9
00:00:21.710 --> 00:00:25.770
section four in the sample repo, and we

10
00:00:25.770 --> 00:00:28.050
just make our client as before.

11
00:00:29.910 --> 00:00:32.630
And then we make our chat client, and

12
00:00:32.630 --> 00:00:34.670
we can create our agent.

13
00:00:36.290 --> 00:00:39.350
The agent actually has further information we can

14
00:00:39.350 --> 00:00:42.750
give here, like instructions, name, description, and tools,

15
00:00:42.850 --> 00:00:43.250
and so on.

16
00:00:43.630 --> 00:00:47.190
I will add a bonus video to this

17
00:00:47.190 --> 00:00:51.050
section, going through all the different things, but

18
00:00:51.050 --> 00:00:53.590
throughout the system, we will learn about instructions

19
00:00:53.590 --> 00:00:56.510
and tools and all the common things, but

20
00:00:56.510 --> 00:00:58.230
I will have a section that will go

21
00:00:58.230 --> 00:00:59.470
through every single one of them.

22
00:01:01.310 --> 00:01:06.890
Beyond that, when we have the agent, we

23
00:01:06.890 --> 00:01:09.450
can just get some information about it, like

24
00:01:09.450 --> 00:01:11.990
instructions and stuff, and we have this RunAsync

25
00:01:11.990 --> 00:01:13.130
that we used.

26
00:01:14.250 --> 00:01:15.670
And as you can see, RunAsync

27
00:01:15.670 --> 00:01:18.370
can have a lot of different overloads, which

28
00:01:18.370 --> 00:01:20.690
we'll also cover as we go along.

29
00:01:23.290 --> 00:01:27.310
There's also a RunAsync with angle brackets

30
00:01:27.310 --> 00:01:27.590
here.

31
00:01:27.910 --> 00:01:31.290
This is for being able to do structured

32
00:01:31.290 --> 00:01:34.450
outputs, which we'll save for a dedicated video

33
00:01:34.450 --> 00:01:35.070
on that.

34
00:01:36.530 --> 00:01:38.730
And then there's streaming, which we'll see

35
00:01:38.730 --> 00:01:39.470
in just a second.

36
00:01:41.150 --> 00:01:44.790
So if we do this and let the

37
00:01:44.790 --> 00:01:48.370
system run, we can ask our question and

38
00:01:48.370 --> 00:01:50.070
we can get our response back.

39
00:01:51.130 --> 00:01:52.950
And if we do, we, of course, get

40
00:01:52.950 --> 00:01:55.750
the response, but this response is, as we

41
00:01:55.750 --> 00:01:57.410
see it right here, just the ToString

42
00:01:57.410 --> 00:01:59.870
of this agent response.

43
00:02:00.630 --> 00:02:02.570
Behind the scenes, the agent response is a

44
00:02:02.570 --> 00:02:05.470
much bigger object that has, for example, the

45
00:02:05.470 --> 00:02:08.449
ID of the agent we used, when it

46
00:02:08.449 --> 00:02:13.970
was created, the response ID, the usage, as

47
00:02:13.970 --> 00:02:16.870
we saw in that video, and a bunch

48
00:02:16.870 --> 00:02:19.090
more that we will cover later on.

49
00:02:21.130 --> 00:02:24.690
So it's only when I press this that we

50
00:02:24.690 --> 00:02:26.430
actually get the response back here.

51
00:02:27.190 --> 00:02:29.150
We can also do streaming.

52
00:02:30.090 --> 00:02:33.850
And if we do streaming, we say

53
00:02:33.850 --> 00:02:36.710
RunStreamingAsync, and then I'm asking, in this

54
00:02:36.710 --> 00:02:37.810
case, how to make soup.

55
00:02:38.310 --> 00:02:39.990
The reason why I'm doing that instead of

56
00:02:39.990 --> 00:02:42.670
the capital of France is because it gives

57
00:02:42.670 --> 00:02:44.970
us the option to better see that it's

58
00:02:44.970 --> 00:02:46.050
actually streaming back.

59
00:02:46.270 --> 00:02:48.130
So if I press F5 here and go

60
00:02:48.130 --> 00:02:52.750
to after, look, it's coming back word

61
00:02:52.750 --> 00:02:53.530
by word here.

62
00:02:54.590 --> 00:02:57.810
And the way it does that is, if we

63
00:02:57.810 --> 00:02:59.990
go and look, we get an update.

64
00:03:01.330 --> 00:03:03.930
And the first update is actually not even

65
00:03:03.930 --> 00:03:05.010
any of the text.

66
00:03:05.290 --> 00:03:10.670
It's just, hey, I'm the agent, and it

67
00:03:10.670 --> 00:03:11.910
was created at this point.

68
00:03:12.710 --> 00:03:15.350
So it's not only the text streaming back,

69
00:03:15.430 --> 00:03:19.770
it's the entire request response back.

70
00:03:21.930 --> 00:03:26.630
So in the second update, we now see that we

71
00:03:26.630 --> 00:03:28.830
have an assistant that is about to give

72
00:03:28.830 --> 00:03:29.730
us some text back.

73
00:03:30.330 --> 00:03:32.350
So we are called the user.

74
00:03:32.910 --> 00:03:34.310
The AI is called the assistant.

75
00:03:35.170 --> 00:03:38.070
And it has now identified that the assistant will

76
00:03:38.070 --> 00:03:40.310
say something to us, but it hasn't done

77
00:03:40.310 --> 00:03:40.590
it.

78
00:03:43.650 --> 00:03:46.990
And now comes the first word in everything,

79
00:03:47.510 --> 00:03:47.850
"Making."

80
00:03:47.850 --> 00:03:50.230
So you can see it has written "Making"

81
00:03:50.230 --> 00:03:50.590
here.

82
00:03:52.750 --> 00:03:58.110
And this will continue for almost 400 updates

83
00:03:58.110 --> 00:04:00.530
in this case because there's so much to

84
00:04:00.530 --> 00:04:00.910
write.

85
00:04:01.250 --> 00:04:03.890
So you can see it's writing each word.

86
00:04:04.790 --> 00:04:07.030
If I let it run, it will write

87
00:04:07.030 --> 00:04:07.750
all the words.

88
00:04:09.550 --> 00:04:12.050
This is a cool way of doing things,

89
00:04:12.270 --> 00:04:15.329
but it has the drawback that whenever we

90
00:04:15.329 --> 00:04:18.810
do these updates, we can't really get usage

91
00:04:18.810 --> 00:04:19.130
back.

92
00:04:19.209 --> 00:04:20.670
It's not like when we have the first

93
00:04:20.670 --> 00:04:22.890
three words that we can say how many

94
00:04:22.890 --> 00:04:25.250
tokens were used for making those three words.

95
00:04:26.430 --> 00:04:29.370
It's only the response object that we get

96
00:04:29.370 --> 00:04:32.330
from a normal call that can give such things back,

97
00:04:32.430 --> 00:04:34.610
and all the other things that might be

98
00:04:34.610 --> 00:04:36.590
in a call that we are interested in.

99
00:04:38.110 --> 00:04:41.690
So for that reason, the Agent Framework team

100
00:04:41.690 --> 00:04:44.610
has made it possible to store all these

101
00:04:44.610 --> 00:04:45.110
updates.

102
00:04:46.410 --> 00:04:48.350
So if we go in here, it has

103
00:04:48.350 --> 00:04:49.190
the same question.

104
00:04:50.370 --> 00:04:52.710
We will now store the first update,

105
00:04:53.030 --> 00:04:54.730
the second update, and so on.

106
00:04:55.710 --> 00:04:57.350
And if we let it run, you can

107
00:04:57.350 --> 00:05:00.330
see there are 361 updates.

108
00:05:01.410 --> 00:05:03.970
And if we open it, it takes a

109
00:05:03.970 --> 00:05:04.390
little while.

110
00:05:04.390 --> 00:05:07.210
You can actually see the entire sentence coming

111
00:05:07.210 --> 00:05:12.090
back as object-by-object updates here.

112
00:05:13.490 --> 00:05:15.450
And if we want to take all these

113
00:05:15.450 --> 00:05:18.430
updates and turn them into one of these responses

114
00:05:18.430 --> 00:05:22.230
up here, we have a handy extension method on the

115
00:05:22.230 --> 00:05:26.430
object called ToAgentResponse.

116
00:05:27.010 --> 00:05:28.890
And if we do that, we actually get

117
00:05:28.890 --> 00:05:29.870
our response back.

118
00:05:30.790 --> 00:05:34.090
And the response is, of course, the entire

119
00:05:34.090 --> 00:05:36.730
thing as if we had run RunAsync

120
00:05:36.730 --> 00:05:37.090
instead.

121
00:05:38.490 --> 00:05:42.630
So it looks exactly the same, and we

122
00:05:42.630 --> 00:05:44.010
did get the entire text.

123
00:05:44.130 --> 00:05:45.750
Of course, we have already written that to

124
00:05:45.750 --> 00:05:46.110
the user.

125
00:05:46.350 --> 00:05:48.930
We don't really care too much here, but

126
00:05:48.930 --> 00:05:51.310
we're interested in the input tokens and the

127
00:05:51.310 --> 00:05:54.470
output tokens and whatever else is in here

128
00:05:54.470 --> 00:05:55.370
as we go along.

129
00:05:57.270 --> 00:05:59.490
So in this case, we're just getting the

130
00:05:59.490 --> 00:06:00.290
output tokens.

131
00:06:02.710 --> 00:06:06.510
So, on whether to use RunAsync or

132
00:06:06.510 --> 00:06:10.090
streaming: streaming is mostly used only for chatbots

133
00:06:10.090 --> 00:06:12.730
because whenever we want to have it behind

134
00:06:12.730 --> 00:06:17.310
the scenes, it's just extra code, and nobody

135
00:06:17.310 --> 00:06:20.170
will see that it was streaming back to

136
00:06:20.170 --> 00:06:20.890
us anyway.

137
00:06:20.890 --> 00:06:25.170
So it's only when I do chatbots that

138
00:06:25.170 --> 00:06:27.430
I actually use the streaming part.

139
00:06:28.170 --> 00:06:30.510
95% of the time, I end up

140
00:06:30.510 --> 00:06:33.430
just using RunAsync for the work I do

141
00:06:33.430 --> 00:06:36.830
because I rarely do chatbots.

142
00:06:36.950 --> 00:06:39.150
I mostly do behind-the-scenes work with AI.

143
00:06:40.510 --> 00:06:43.350
But that's everything there is to normal versus

144
00:06:43.350 --> 00:06:43.930
streaming.
