1
00:00:00,360 --> 00:00:07,290
Hi, guys, and welcome to another exciting session in which we are going to look at the ONNX format

2
00:00:07,290 --> 00:00:09,750
of representing machine learning models.

3
00:00:09,780 --> 00:00:14,040
ONNX actually stands for Open Neural Network Exchange.

4
00:00:14,340 --> 00:00:22,140
This open standard for machine learning interoperability was co-developed by Microsoft, Facebook and

5
00:00:22,560 --> 00:00:23,190
AWS.

6
00:00:23,730 --> 00:00:32,040
And in this session, we'll learn how to convert our already trained TensorFlow model into this ONNX

7
00:00:32,160 --> 00:00:40,070
format and then carry out inference on this newly created ONNX model.

8
00:00:40,080 --> 00:00:47,940
So we've gotten to the point where we've fine-tuned our Hugging Face-based vision transformer model.

9
00:00:48,120 --> 00:00:58,050
Now we have this TensorFlow model which we have created and evaluated, and it may happen that another

10
00:00:58,050 --> 00:01:05,310
developer using a different framework, like, for example, PyTorch, wants to make use of this model

11
00:01:05,310 --> 00:01:07,410
which has been trained in TensorFlow.

12
00:01:08,370 --> 00:01:14,940
So thanks to the ONNX format, we can now convert this code which was written in TensorFlow, or rather

13
00:01:14,940 --> 00:01:24,360
convert this model which was built in TensorFlow into an ONNX model and then later on convert this

14
00:01:25,020 --> 00:01:35,190
model from the ONNX format into PyTorch, such that we now have this PyTorch model which this other

15
00:01:35,190 --> 00:01:36,750
practitioner can use.

16
00:01:37,050 --> 00:01:40,970
Now another possibility is the reverse.

17
00:01:40,980 --> 00:01:47,970
That is, we could go from this PyTorch model to a TensorFlow model.

18
00:01:48,690 --> 00:01:52,800
Thanks to the ONNX format's interoperability.

19
00:01:53,400 --> 00:02:01,110
Another reason is that a developer may want to take this model from PyTorch and make use of it in, say,

20
00:02:01,110 --> 00:02:02,820
Caffe, for example.

21
00:02:03,390 --> 00:02:11,100
Maybe because that model is more efficiently run in this other framework.

22
00:02:11,100 --> 00:02:16,770
And the reason why we have these kinds of differences for different frameworks is because, for example,

23
00:02:16,770 --> 00:02:24,510
you could have a convolutional neural network or a convolutional layer which is built in PyTorch with

24
00:02:24,510 --> 00:02:32,850
a certain implementation, and the implementation in Caffe may be slightly different and maybe more efficient

25
00:02:32,850 --> 00:02:35,910
as compared to that which was done in PyTorch.

26
00:02:37,200 --> 00:02:43,830
Now this is just for demonstrative purposes and we are not saying that the implementations in Caffe are

27
00:02:43,830 --> 00:02:46,290
better than those in PyTorch.

28
00:02:46,920 --> 00:02:55,110
And so in summary, the ONNX format allows models to be represented in a common format that can be executed

29
00:02:55,110 --> 00:03:00,260
across different hardware platforms using ONNX Runtime.

30
00:03:00,270 --> 00:03:06,930
So now developers can feel free to build their models with just any framework.

31
00:03:06,960 --> 00:03:14,130
Say, for example, TensorFlow or PyTorch or Paddle or whatever framework they actually want to work

32
00:03:14,130 --> 00:03:21,150
with, knowing that they can deploy their model on whatever hardware they want to, since they can

33
00:03:21,150 --> 00:03:31,590
now convert these models to the ONNX format and then run inference on these models via ONNX Runtime,

34
00:03:31,590 --> 00:03:40,410
which is in fact an inference engine which is lightweight and modular and permits us to run our ONNX

35
00:03:40,410 --> 00:03:44,940
models on just any hardware we choose to work with.

36
00:03:44,970 --> 00:03:48,000
So here let's look at the list of supported hardware.

37
00:03:48,000 --> 00:03:57,090
You see, for example, TensorRT, which is very popular with NVIDIA GPUs and permits us to

38
00:03:57,090 --> 00:04:03,480
attain very high speeds when carrying out inference on neural network models.

39
00:04:03,510 --> 00:04:09,110
The other great advantage of working with ONNX models is the ease with which we can convert

40
00:04:09,150 --> 00:04:12,230
TensorFlow models into this ONNX format.

41
00:04:12,240 --> 00:04:19,170
So right here you see we are evaluating our model, this Hugging Face model, which we fine-tuned previously,

42
00:04:19,980 --> 00:04:24,270
and we are getting 90% validation accuracy.

43
00:04:24,990 --> 00:04:27,300
Now we could go ahead and save this model.

44
00:04:27,300 --> 00:04:30,380
So let's take this off. HF model.

45
00:04:30,390 --> 00:04:31,830
Let's call this hugging.

46
00:04:31,980 --> 00:04:36,330
Let's call this ViT, ViT fine-tuned. Okay.

47
00:04:36,330 --> 00:04:36,980
So that's it.

48
00:04:36,990 --> 00:04:41,160
We have this fine-tuned ViT model, which we're going to save.

49
00:04:41,220 --> 00:04:42,930
We check this out here.

50
00:04:44,340 --> 00:04:45,090
There we go.

51
00:04:45,090 --> 00:04:48,270
We see ViT fine-tuned, and you can check.

52
00:04:48,270 --> 00:04:54,390
You see you have the saved model, the Keras metadata, the variables, and the assets folder.

53
00:04:54,390 --> 00:05:02,280
And then here you can see that this is almost a one-gigabyte model. You can view this here:

54
00:05:02,280 --> 00:05:07,650
you see 984, just check as I hover on this.

55
00:05:07,650 --> 00:05:13,350
You see here a 984.39-megabyte model.

56
00:05:14,220 --> 00:05:20,100
And so this means that if you are to deploy this in a real world scenario, then you will need to always

57
00:05:20,100 --> 00:05:24,990
allocate this amount of space in order to run this model.

58
00:05:25,200 --> 00:05:30,540
Now, let's move on to converting this model into the ONNX format.

59
00:05:31,350 --> 00:05:33,630
We shall have these two installs.

60
00:05:33,630 --> 00:05:42,600
So we start by installing the tf2onnx tool and then we'll install ONNX Runtime.

61
00:05:42,600 --> 00:05:52,170
So we run this, and now we'll go ahead and convert our model, which is this one here, ViT fine-tuned.

62
00:05:52,170 --> 00:05:56,100
So we just have to specify here ViT fine-tuned, to be converted into the ONNX format.

63
00:05:56,100 --> 00:06:06,450
So here we just say ViT fine-tuned, and then we give the name of our ONNX

64
00:06:06,780 --> 00:06:07,560
file.

65
00:06:07,860 --> 00:06:10,500
We'll call this ViT ONNX.

66
00:06:11,190 --> 00:06:12,010
That's it.

67
00:06:12,060 --> 00:06:13,260
Or rather, ViT ONNX.

68
00:06:13,910 --> 00:06:16,740
tf2onnx.convert.

69
00:06:16,740 --> 00:06:18,380
And that should be it.
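
The conversion step described here can be sketched as follows. This is a minimal sketch, not the exact cell from the session: the directory name `vit_finetuned` and the file name `vit_onnx.onnx` are assumed spellings of the names heard in the recording, and the helper functions are hypothetical. `--saved-model`, `--output`, and `--opset` are options of the tf2onnx command-line converter, which is typically installed together with the runtime via `pip install tf2onnx onnxruntime`.

```python
import subprocess
import sys


def build_convert_command(saved_model_dir, output_path, opset=None):
    """Build the argument list for the tf2onnx command-line converter."""
    cmd = [
        sys.executable, "-m", "tf2onnx.convert",
        "--saved-model", saved_model_dir,
        "--output", output_path,
    ]
    if opset is not None:
        # Only pass --opset when the caller asks for a specific version.
        cmd += ["--opset", str(opset)]
    return cmd


def convert_saved_model(saved_model_dir, output_path, opset=None):
    """Run the converter; needs tf2onnx and TensorFlow to be installed."""
    subprocess.run(build_convert_command(saved_model_dir, output_path, opset), check=True)


# Hypothetical names, assumed for illustration only.
command = build_convert_command("vit_finetuned", "vit_onnx.onnx")
print(" ".join(command[1:]))
```

Building the argument list separately from running it makes it easy to inspect the command before starting a conversion that may take several minutes.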

70
00:06:18,390 --> 00:06:26,340
So now we'll run this. Then, while waiting for that run to complete, you can go to the ONNX GitHub

71
00:06:26,520 --> 00:06:36,030
repo, tensorflow-onnx, and there you'll find the documentation for how to convert from TensorFlow to

72
00:06:36,030 --> 00:06:36,750
ONNX.

73
00:06:37,230 --> 00:06:47,580
So let's get back. This is still running. You see, we have a series of warnings, then it specifies

74
00:06:47,580 --> 00:06:55,660
the TensorFlow, ONNX and tf2onnx versions, then the opset, then it's

75
00:06:55,680 --> 00:06:56,730
optimizing.

76
00:06:57,480 --> 00:07:00,060
And now we have this successful conversion.

77
00:07:00,060 --> 00:07:02,890
So the model is saved as vit_onnx.onnx.

78
00:07:02,910 --> 00:07:04,560
Now let's open this up.

79
00:07:04,590 --> 00:07:07,710
You see vit_onnx.onnx.

80
00:07:07,710 --> 00:07:09,900
We've moved from this.

81
00:07:10,380 --> 00:07:11,910
Let's open this again.

82
00:07:11,910 --> 00:07:22,890
We've moved from this one here, where we had 984.39 megabytes, to this optimized ONNX version, which is at

83
00:07:22,890 --> 00:07:25,850
327 megabytes.
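
The size comparison above was read off the file browser. A minimal sketch for checking it in code follows; `size_in_megabytes` is a hypothetical helper, and it assumes 1 MB = 1024 * 1024 bytes, which may differ slightly from how the notebook's file browser rounds.

```python
import os


def size_in_megabytes(path):
    """Size of a file, or total size of a directory tree, in MB (1 MB = 1024 * 1024 bytes)."""
    if os.path.isfile(path):
        total = os.path.getsize(path)
    else:
        total = 0
        for root, _dirs, files in os.walk(path):
            for name in files:
                total += os.path.getsize(os.path.join(root, name))
    return total / (1024 * 1024)


# Sizes quoted in the session, in megabytes.
saved_model_mb = 984.39
onnx_mb = 327
print(f"The ONNX file is about {saved_model_mb / onnx_mb:.1f}x smaller")
```

A SavedModel is a directory, so its size is the sum over all files in the tree, while the ONNX model is a single file.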

84
00:07:25,860 --> 00:07:32,610
Now, another option is to convert the model from the Keras format to the ONNX format.

85
00:07:32,610 --> 00:07:39,990
So this first one, we saved this as a TensorFlow saved model, and then now we could just have this

86
00:07:39,990 --> 00:07:40,280
here.

87
00:07:40,290 --> 00:07:43,290
So let's say we have the model.

88
00:07:43,290 --> 00:07:48,180
We save this as a Keras model.

89
00:07:48,180 --> 00:07:53,520
So we have the .h5 file, and we run that. Checking here.

90
00:07:54,480 --> 00:08:00,990
You see we still have 984.98 megabytes, close to one gigabyte.

91
00:08:00,990 --> 00:08:06,570
And then from this Keras format, we are now going to convert this to ONNX.

92
00:08:07,200 --> 00:08:14,070
Now here we're going to have this specification, and we'll pass in the image size.

93
00:08:14,070 --> 00:08:22,010
So here we have the batch by 256 by 256 by 3, it's float32, and it's our input.

94
00:08:22,020 --> 00:08:24,450
Then we could specify this output path.

95
00:08:24,450 --> 00:08:26,730
So this one was ViT ONNX.

96
00:08:26,730 --> 00:08:39,030
Let's call this one vit_keras.onnx.

97
00:08:39,030 --> 00:08:39,540
Okay.

98
00:08:39,570 --> 00:08:42,210
So this is going to be our output path here.

99
00:08:42,360 --> 00:08:52,860
And then we have tf2onnx here, which contains this from_keras method, which

100
00:08:52,860 --> 00:09:01,530
takes in our model, the Hugging Face model, and then the specification which we just mentioned right here.

101
00:09:01,530 --> 00:09:04,980
So the specification right here, we pass this in.

102
00:09:04,980 --> 00:09:08,910
It is passed as our input signature.

103
00:09:08,910 --> 00:09:14,970
We have this opset value, and then we also have our output path, which we have already specified here.

104
00:09:14,970 --> 00:09:18,420
Then we have our output names, which we get automatically.
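
The Keras-to-ONNX conversion just described can be sketched as below. This is a sketch under stated assumptions, not the exact notebook cell: the function name `export_keras_to_onnx`, the default opset of 13, and the file name in the usage comment are illustrative. The input name `"input"` and the 256 x 256 x 3 float32 shape follow what is said in the session.

```python
def export_keras_to_onnx(model, output_path, image_size=256, opset=13):
    """Convert an in-memory Keras model to ONNX and return its output names.

    TensorFlow and tf2onnx are imported lazily, so defining this function
    does not require either package to be installed.
    """
    import tensorflow as tf
    import tf2onnx

    # The batch dimension is left as None so any batch size is accepted.
    spec = (tf.TensorSpec((None, image_size, image_size, 3), tf.float32, name="input"),)

    # from_keras writes the file to output_path and also returns the model proto.
    model_proto, _ = tf2onnx.convert.from_keras(
        model, input_signature=spec, opset=opset, output_path=output_path
    )

    # The output names are read from the converted graph rather than typed by hand.
    return [node.name for node in model_proto.graph.output]


# Usage (hypothetical names):
# output_names = export_keras_to_onnx(hf_model, "vit_keras.onnx")
```

Reading the output names from the graph is what "we get automatically" refers to; they are needed later when running inference.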

105
00:09:18,780 --> 00:09:19,680
So that's it.

106
00:09:19,680 --> 00:09:22,720
Let's run this while that's running.

107
00:09:22,740 --> 00:09:30,900
Also, note that you can check out these conversions on the ONNX Runtime platform, where you can get

108
00:09:30,900 --> 00:09:32,820
the details of everything we are doing.

109
00:09:33,120 --> 00:09:34,800
So let's get back here.

110
00:09:34,800 --> 00:09:38,820
Still running, and that's it, complete.

111
00:09:38,820 --> 00:09:45,120
Let's check out the size: 327.59 megabytes.

112
00:09:46,140 --> 00:09:52,920
Then from here, we move on to the inference, where we'll see whether what we get from the ONNX model

113
00:09:52,920 --> 00:09:58,830
coincides with the initial Keras or TensorFlow formats.

114
00:09:58,830 --> 00:10:01,080
So right here we have this provider.

115
00:10:01,110 --> 00:10:09,390
Now, we specify this provider to be the CPUExecutionProvider, as we'll be running this on

116
00:10:09,390 --> 00:10:10,530
the CPU.

117
00:10:10,530 --> 00:10:19,560
And then we have onnxruntime as rt, which we imported right here and which we'll be

118
00:10:19,560 --> 00:10:21,330
making use of here.

119
00:10:21,330 --> 00:10:28,650
So when we want to run an ONNX model, the first thing to notice is that we do not need TensorFlow

120
00:10:28,680 --> 00:10:29,280
anymore.

121
00:10:29,280 --> 00:10:38,250
So even if we restart this whole process, all we need to do is install ONNX Runtime

122
00:10:38,250 --> 00:10:41,280
and then import it this way.

123
00:10:41,280 --> 00:10:43,950
So now we have our ONNX Runtime.

124
00:10:44,100 --> 00:10:53,310
We have this inference session, which we create by just specifying this path, our output path

125
00:10:53,310 --> 00:10:53,940
here.

126
00:10:54,390 --> 00:11:01,920
Our output path is the path to this model, the ONNX model, and then the provider is this one here.

127
00:11:02,460 --> 00:11:07,320
So we just need to specify this path and the provider and we're good to go.

128
00:11:07,410 --> 00:11:08,940
So now we have this.

129
00:11:09,990 --> 00:11:13,380
The next thing we want to do is to run the inference.

130
00:11:13,380 --> 00:11:15,750
So the ONNX prediction is this:

131
00:11:15,750 --> 00:11:20,790
m.run, and then we specify the output names.

132
00:11:21,150 --> 00:11:24,900
Now these output names are coming from here.

133
00:11:24,900 --> 00:11:29,790
So let's run this, let's print out output names so you see what it contains.

134
00:11:31,050 --> 00:11:31,740
That's it.

135
00:11:31,740 --> 00:11:33,120
You see, we have "dense".

136
00:11:33,120 --> 00:11:39,330
And if we get back to when we're creating this model, let's get back to this.

137
00:11:39,330 --> 00:11:39,820
Okay.

138
00:11:39,840 --> 00:11:42,480
You see, the name here we have is dense.

139
00:11:42,480 --> 00:11:47,700
So if you specify the model name to be different from this, you would have a different output name.

140
00:11:48,060 --> 00:11:48,930
So that's it.

141
00:11:48,930 --> 00:11:52,290
Let's get back to our ONNX inference.

142
00:11:53,070 --> 00:11:55,500
We've converted already and that's it.

143
00:11:55,500 --> 00:11:56,070
So.

144
00:11:56,070 --> 00:11:56,790
So that's it.

145
00:11:56,790 --> 00:11:59,070
We have our output names, which we created.

146
00:11:59,070 --> 00:11:59,400
All right.

147
00:11:59,400 --> 00:12:02,390
They were generated automatically from here.

148
00:12:02,400 --> 00:12:03,420
That's it.

149
00:12:03,420 --> 00:12:09,090
And then obviously it's a list because we could have several outputs.

150
00:12:09,090 --> 00:12:12,840
And then from here we pass in our input image.

151
00:12:12,840 --> 00:12:16,800
Now our input image is simply what we've been using already.

152
00:12:16,800 --> 00:12:23,790
So we can copy this and then test it out here with our ONNX model.

153
00:12:24,660 --> 00:12:28,230
Let's add a code cell here and paste this in.

154
00:12:28,770 --> 00:12:31,500
All we need here is just this basically.

155
00:12:31,500 --> 00:12:33,480
So we can run this.

156
00:12:33,480 --> 00:12:34,740
Let's run this.

157
00:12:34,740 --> 00:12:35,310
That's it.

158
00:12:35,310 --> 00:12:39,840
We have our image, and then let's run this and let's run this.

159
00:12:39,840 --> 00:12:44,580
And then we print out onnx_pred. So we are getting: input

160
00:12:44,580 --> 00:12:47,430
must be a list of dictionaries or a single NumPy array.

161
00:12:47,430 --> 00:12:51,630
So what we're going to do here is, instead of TensorFlow, use NumPy.

162
00:12:51,660 --> 00:12:54,600
So you see already that we do not really need TensorFlow.

163
00:12:54,600 --> 00:12:58,890
Even with this, we can get the test image from here.

164
00:12:58,890 --> 00:13:06,420
Let's get this test image directly, so we have test_image, and that's it.

165
00:13:06,420 --> 00:13:14,220
We run that: "image" is not defined. It should be our test image.

166
00:13:14,220 --> 00:13:15,510
Let's run that again.

167
00:13:16,020 --> 00:13:17,070
That's fine.

168
00:13:17,070 --> 00:13:20,760
We get this other error because this input isn't a float.

169
00:13:20,760 --> 00:13:30,570
So here, instead of this, we're going to have test_image.astype(np.float32).

170
00:13:31,200 --> 00:13:32,910
Okay, so we have that set.
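
The inference steps covered so far (creating the session from the model path and provider, casting the image to float32, and calling `run` with the output names) can be sketched as follows. This is a sketch, not the exact cell: `load_session` and `predict` are hypothetical helper names, and the input name `"input"` assumes the model was exported with that name in its input signature.

```python
def load_session(model_path, providers=("CPUExecutionProvider",)):
    """Create an ONNX Runtime inference session (onnxruntime is imported lazily)."""
    import onnxruntime as rt

    return rt.InferenceSession(model_path, providers=list(providers))


def predict(session, image, output_names, input_name="input"):
    """Run one forward pass; `image` should already be a float32 NumPy array."""
    # The second argument is a dict mapping input names to arrays.
    return session.run(output_names, {input_name: image})


# Usage (hypothetical file and variable names):
# import numpy as np
# session = load_session("vit_keras.onnx")
# onnx_pred = predict(session, test_image.astype(np.float32), ["dense"])
```

Passing a TensorFlow tensor instead of a NumPy array, or an integer array instead of float32, produces the two errors seen in the session, which is why the cast comes first.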

171
00:13:32,940 --> 00:13:34,770
Now let's run this again.

172
00:13:34,770 --> 00:13:38,460
No, before doing this, let's make sure we pass in this image instead here.

173
00:13:38,460 --> 00:13:39,630
So we have that.

174
00:13:39,630 --> 00:13:40,710
We run that.

175
00:13:41,400 --> 00:13:42,300
That should be fine.

176
00:13:42,300 --> 00:13:44,760
Okay, so let's rerun this.

177
00:13:46,290 --> 00:13:47,190
Again.

178
00:13:47,190 --> 00:13:50,730
And you see, we now have our onnx_pred.

179
00:13:50,730 --> 00:13:51,960
So there we go.

180
00:13:51,960 --> 00:13:57,420
You see, it shows us that it's a sad image because this is angry, happy, sad.

181
00:13:57,420 --> 00:13:59,100
So it's sad because "sad"

182
00:13:59,130 --> 00:14:00,450
has the highest probability.

183
00:14:00,450 --> 00:14:02,200
And you see, the image here is sad.

184
00:14:02,220 --> 00:14:09,270
Now, if we get back to the top here and simply run this the same image, let's run this.

185
00:14:10,780 --> 00:14:12,610
Oh, let's get the probabilities, actually.

186
00:14:12,610 --> 00:14:19,450
So let's print out the model's output for the image.

187
00:14:19,450 --> 00:14:20,950
Let's get these probabilities.

188
00:14:21,460 --> 00:14:25,030
You see, this is what this model gives us as output.

189
00:14:25,030 --> 00:14:30,130
And here is what we get from the ONNX model right here.

190
00:14:31,450 --> 00:14:38,470
Now, before we continue, another thing we'd like to check out is the speed or the time or the latency

191
00:14:38,470 --> 00:14:39,780
of the model.

192
00:14:39,790 --> 00:14:45,720
So let's get back up here and we could add this code cell.

193
00:14:45,730 --> 00:14:49,420
Let's import time or we could just do it below.

194
00:14:49,420 --> 00:14:52,990
So yeah, let's just do that below.

195
00:14:52,990 --> 00:15:02,470
So let's take this off, and then here we say: import time. There we go.

196
00:15:02,470 --> 00:15:08,380
And then we have t1, which is time.time().

197
00:15:08,920 --> 00:15:13,600
And then we have the model taking in the input image.

198
00:15:14,470 --> 00:15:25,570
Then we record the time, the current time minus t1, so we get the time which has elapsed

199
00:15:25,570 --> 00:15:27,880
after running this model.
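
The timing pattern just described (record `time.time()`, call the model, subtract) can be sketched as below. This is a minimal sketch: `time_call` is a hypothetical helper, and the workload is a stand-in for the model call, which is not reproduced here.

```python
import time


def time_call(fn, *args):
    """Return (result, elapsed_seconds) for a single call of fn(*args)."""
    t1 = time.time()
    result = fn(*args)
    elapsed = time.time() - t1
    return result, elapsed


# Stand-in workload for illustration; in the session this is the model call.
result, elapsed = time_call(sum, range(1_000_000))
print(f"elapsed: {elapsed:.3f} s")
```

A single measurement like this is noisy, and the first call of a model often includes one-off setup cost, so it is only a rough indication.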

200
00:15:27,880 --> 00:15:31,540
You see 0.14, that's 0.14.

201
00:15:31,540 --> 00:15:35,950
And you should note that here we are running this on a GPU.

202
00:15:35,950 --> 00:15:41,500
So if you check Manage sessions, we are running this on a GPU.

203
00:15:41,860 --> 00:15:54,610
Now for the ONNX model, we have t1 = time.time() here, and then we print out the

204
00:15:54,610 --> 00:15:55,570
difference in time.

205
00:15:55,570 --> 00:15:57,790
So we have that minus our t1.

206
00:15:58,120 --> 00:16:01,180
Let's run this and see what we get.

207
00:16:02,170 --> 00:16:05,320
Input must be a list of dictionaries or a single NumPy array.

208
00:16:05,830 --> 00:16:07,210
Oh, let's run this again.

209
00:16:10,180 --> 00:16:10,450
Okay.

210
00:16:10,450 --> 00:16:20,440
So you see, here we're making use of the CPU and we're getting 0.279 seconds, while with the GPU here

211
00:16:20,440 --> 00:16:22,780
we get 0.143 seconds.

212
00:16:23,140 --> 00:16:34,810
Now, we can't be comparing the ONNX Runtime results on the CPU with those of our Hugging Face model

213
00:16:34,810 --> 00:16:36,910
with TensorFlow on a GPU.

214
00:16:36,940 --> 00:16:38,590
You can even see from here.

215
00:16:38,590 --> 00:16:45,810
If you call get_device, you'll see that what ONNX Runtime is using is the CPU.

216
00:16:45,820 --> 00:16:51,250
And so if we want to make use of the GPU, we have to install the onnxruntime-gpu version so we can

217
00:16:51,250 --> 00:16:55,870
compare the two models.

218
00:16:56,770 --> 00:17:07,240
So for now, just note that TensorFlow with the GPU is 0.15 seconds, and then we will need to get

219
00:17:07,240 --> 00:17:10,420
TensorFlow with a CPU.

220
00:17:10,420 --> 00:17:21,490
And then we also got ONNX with the CPU. ONNX with the CPU gives 0.3 seconds, about

221
00:17:21,490 --> 00:17:24,250
0.3. Or let's run this again.

222
00:17:24,250 --> 00:17:25,180
We run it again.

223
00:17:25,180 --> 00:17:28,840
We should get from this again.

224
00:17:29,680 --> 00:17:31,030
I'm getting an error.

225
00:17:31,060 --> 00:17:36,010
Now that was because I had moved this file into the drive.

226
00:17:36,010 --> 00:17:39,490
So here we have the Keras one here, and we run this again.

227
00:17:39,490 --> 00:17:45,060
We have about 0.38. That's it. Okay, let's say 0.5.

228
00:17:45,070 --> 00:17:54,460
So let's say this is 0.5 here, and then we need to get ONNX on the GPU, and then we also need to get

229
00:17:54,460 --> 00:17:56,580
TensorFlow with the CPU.

230
00:17:56,590 --> 00:17:58,120
So that's it.

231
00:17:58,600 --> 00:18:02,530
What we'll do now is go ahead and install ONNX Runtime for the GPU.

232
00:18:03,340 --> 00:18:09,150
So here we have pip install onnxruntime-gpu, and that should be it.

233
00:18:09,160 --> 00:18:14,950
We have already installed this, okay, so we have onnxruntime-gpu.

234
00:18:14,950 --> 00:18:27,070
And then if you do import onnxruntime as rt, then you'll

235
00:18:27,070 --> 00:18:33,510
see that with rt.get_device() we are getting the CPU.

236
00:18:33,520 --> 00:18:40,510
So what we'll do now is we are going to restart this runtime, so basically restart this runtime so

237
00:18:40,510 --> 00:18:46,540
that the GPU version of ONNX Runtime will be taken into consideration.

238
00:18:47,440 --> 00:18:51,910
Now this time around we're going to install onnxruntime with the GPU.

239
00:18:54,430 --> 00:18:58,840
And as you can see now, the device we're using is a GPU.

240
00:18:59,300 --> 00:18:59,740
Okay.

241
00:18:59,740 --> 00:19:01,000
So we have that.

242
00:19:01,000 --> 00:19:02,920
Let's now get back here.

243
00:19:04,000 --> 00:19:09,370
We already have our model in the ONNX format, so we just run this.

244
00:19:09,700 --> 00:19:18,430
And then if you run this, you'll see that since we specified the provider to be the CPU, we shouldn't

245
00:19:18,430 --> 00:19:20,260
have much difference here.

246
00:19:20,260 --> 00:19:21,760
So let's run this again.

247
00:19:21,790 --> 00:19:25,750
We import time. output_names is not defined.

248
00:19:26,500 --> 00:19:28,090
Our output names are basically this.

249
00:19:28,090 --> 00:19:32,650
So let's have this list and let's define this here.

250
00:19:32,650 --> 00:19:34,600
So let's have output names.

251
00:19:36,040 --> 00:19:38,020
Output names.

252
00:19:38,020 --> 00:19:39,460
There we go.

253
00:19:40,240 --> 00:19:41,470
Output names dense.

254
00:19:41,470 --> 00:19:42,960
So let's run that again.

255
00:19:42,970 --> 00:19:43,990
That's fine.

256
00:19:44,020 --> 00:19:46,540
And then we get back here.

257
00:19:46,540 --> 00:19:52,780
But note that these output names were generated from here, so you could go back and regenerate

258
00:19:52,780 --> 00:19:53,080
them.

259
00:19:53,080 --> 00:19:54,900
Or you could just write the output names yourself.

260
00:19:54,910 --> 00:20:00,340
As you can see, since we restarted this runtime, we do not have those files anymore, apart from the ONNX

261
00:20:00,730 --> 00:20:02,460
which we stored in the drive.

262
00:20:02,470 --> 00:20:07,870
So if we have to do this, we need to retrain our model since we lost them already.

263
00:20:08,590 --> 00:20:12,820
So we have our output names here which was specified, and then we get back here.

264
00:20:12,820 --> 00:20:17,560
We run this again and check out the time it takes to run the model.

265
00:20:17,560 --> 00:20:19,050
You see 0.34.

266
00:20:19,060 --> 00:20:19,930
Not bad.

267
00:20:21,050 --> 00:20:23,990
0.34, okay. This is 0.34.

268
00:20:23,990 --> 00:20:27,360
So here, normally, this should be zero point...

269
00:20:27,390 --> 00:20:29,030
Yeah, that's 35. Okay.

270
00:20:29,030 --> 00:20:33,360
So TF with GPU is 0.15, ONNX with CPU is 0.35.

271
00:20:33,380 --> 00:20:37,280
Now, the way we're going to use the GPU is we'll go here.

272
00:20:37,280 --> 00:20:46,460
You see in this documentation you have the provider, you specify the CUDAExecutionProvider, so let's

273
00:20:46,460 --> 00:20:50,230
copy this and then paste it out here.

274
00:20:50,240 --> 00:20:55,700
So instead of the CPU, now we have the GPU, so we can make use

275
00:20:55,700 --> 00:20:57,830
of the CUDAExecutionProvider.

276
00:20:57,830 --> 00:21:01,580
So here we'll take this off and then paste this out.

277
00:21:01,580 --> 00:21:03,440
So this is our provider now.

278
00:21:03,560 --> 00:21:12,140
Note that since the providers argument is a list, you can have several providers. Like here, when we

279
00:21:12,140 --> 00:21:22,190
do this, when we have the CUDAExecutionProvider before the CPUExecutionProvider, it means that the

280
00:21:22,190 --> 00:21:25,480
priority goes to the CUDAExecutionProvider.

281
00:21:25,490 --> 00:21:30,440
And so in the case where we only have a CPU, we still start with this one.

282
00:21:30,440 --> 00:21:35,210
And given that this cannot work in the situation of a CPU, we will then move on to this.

283
00:21:35,210 --> 00:21:43,750
But if we have a GPU, then we will directly use this CUDAExecutionProvider, and that's it.
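
The priority rule just described can be illustrated in plain Python. This is only an illustration of the ordering idea, not ONNX Runtime's own implementation: `first_usable_provider` is a hypothetical helper, and in practice the runtime performs the fallback itself when given the list. With the real library, the list of available providers can be obtained from `onnxruntime.get_available_providers()`.

```python
def first_usable_provider(preferred, available):
    """Return the first provider in `preferred` that also appears in `available`."""
    for provider in preferred:
        if provider in available:
            return provider
    raise ValueError("none of the preferred providers is available")


preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]

# Machine without a GPU: CUDA is skipped and the CPU provider is used.
cpu_only = first_usable_provider(preferred, ["CPUExecutionProvider"])

# Machine with a GPU: the first entry in the list wins.
with_gpu = first_usable_provider(preferred, ["CUDAExecutionProvider", "CPUExecutionProvider"])

print(cpu_only, with_gpu)
```

Keeping the CPU provider last in the list means the same code runs on machines with and without a GPU.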

284
00:21:43,780 --> 00:21:46,820
You see 0.048 seconds.

285
00:21:46,850 --> 00:21:53,960
Let's say 0.05 seconds, which is three times less than what we had when running our model, or rather when

286
00:21:53,960 --> 00:21:55,600
running the TensorFlow model.

287
00:21:55,610 --> 00:22:03,350
So you see already that the ONNX framework permits us to optimize our initial TensorFlow model.

288
00:22:03,350 --> 00:22:05,300
So here we have ONNX with GPU.

289
00:22:05,330 --> 00:22:13,700
Now this is 0.05, so it now takes us 50 milliseconds to run this Hugging Face model.

290
00:22:13,700 --> 00:22:14,630
So that's it.

291
00:22:15,620 --> 00:22:22,910
Now, before we proceed, it's also important to note that generally, when we measure the time it

292
00:22:22,910 --> 00:22:25,870
takes for the model to predict an output,

293
00:22:25,880 --> 00:22:34,070
we usually test with several input images, or we repeat the process several times.

294
00:22:34,070 --> 00:22:38,600
So let's say for _ in range,

295
00:22:38,720 --> 00:22:45,620
let's say 10. We run this and then we get the time elapsed. Instead of just testing

296
00:22:45,620 --> 00:22:48,260
once, we test ten times so we can get the average.

297
00:22:48,260 --> 00:22:49,760
So let's run that.

298
00:22:50,840 --> 00:22:55,490
And you see dividing this by ten, we have 0.035.

299
00:22:55,490 --> 00:23:02,840
So that's 35 milliseconds per prediction.

300
00:23:03,200 --> 00:23:13,010
So this means that for us to have 100 predictions, we would take 3.5 seconds.

301
00:23:13,010 --> 00:23:14,060
We could test that out.

302
00:23:14,060 --> 00:23:15,200
Let's run this.

303
00:23:17,600 --> 00:23:18,260
And there we go.

304
00:23:18,260 --> 00:23:21,320
We take 2.3 seconds for 100 predictions.

305
00:23:21,350 --> 00:23:26,390
Now, if we divide... let's call this the number of predictions.

306
00:23:27,890 --> 00:23:31,250
We set this to be 100 and then the time taken.

307
00:23:31,250 --> 00:23:39,980
This is the time for a single prediction.

308
00:23:41,510 --> 00:23:42,360
There we go.

309
00:23:42,380 --> 00:23:43,920
Time for a single prediction.

310
00:23:43,940 --> 00:23:48,680
All this divided by the number of predictions.

311
00:23:48,680 --> 00:23:49,850
So we get the average.
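
The averaging described here (repeat the prediction a fixed number of times and divide the total time by the number of predictions) can be sketched as below. This is a sketch with a hypothetical helper, `average_latency`; it uses `time.perf_counter()` rather than the `time.time()` used in the session, since it is intended for measuring short intervals, and the workload is a stand-in for the model call.

```python
import time


def average_latency(fn, number_of_predictions=100):
    """Call fn() repeatedly and return the mean time per call in seconds."""
    start = time.perf_counter()
    for _ in range(number_of_predictions):
        fn()
    total = time.perf_counter() - start
    return total / number_of_predictions


# Stand-in workload for illustration; in the session this is the ONNX or TensorFlow call.
time_for_single_prediction = average_latency(lambda: sum(range(10_000)), number_of_predictions=100)
print(f"{time_for_single_prediction * 1000:.3f} ms per prediction")
```

Averaging over many runs spreads out one-off costs such as the first-call warm-up, which is consistent with the per-prediction time dropping as the number of runs grows in the session.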

312
00:23:50,300 --> 00:23:51,020
That's fine.

313
00:23:51,020 --> 00:23:52,400
So let's run that again.

314
00:23:53,030 --> 00:23:54,170
So to get this time.

315
00:23:56,780 --> 00:24:01,250
Okay, so we see 0.023 seconds.

316
00:24:01,250 --> 00:24:03,470
That is 23 milliseconds.

317
00:24:04,490 --> 00:24:08,540
Now we repeat the same for the Hugging Face model.

318
00:24:09,710 --> 00:24:15,020
You see, it takes 0.15 seconds, just as we had already.

319
00:24:16,040 --> 00:24:19,900
Now you can see the difference in speed here.

320
00:24:19,970 --> 00:24:24,770
0.15 divided by 0.025.

321
00:24:25,430 --> 00:24:30,440
You see, the ONNX model is six times faster than the TensorFlow model.

322
00:24:32,790 --> 00:24:38,540
Now what if we get back and run the TensorFlow model on the CPU?

323
00:24:38,550 --> 00:24:43,140
So we change this runtime to None and we save that.

324
00:24:43,290 --> 00:24:45,900
So we'll have to rerun all this again.

325
00:24:46,800 --> 00:24:55,110
Now, running our Hugging Face model here, you see this takes 0.8 seconds.

326
00:24:55,110 --> 00:25:02,880
So with the CPU, we have 0.8 divided by 0.35.

327
00:25:03,630 --> 00:25:10,440
The ONNX model runs about twice as fast as the TensorFlow model.
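
The speed comparisons quoted in the session can be reproduced with a few lines of arithmetic. The timings below are the rounded values stated in the recording (the ONNX GPU figure is the 0.025 used in the division, although the measured average was 0.023 seconds); the dictionary keys and the `speedup` helper are hypothetical names.

```python
# Per-prediction times in seconds, as quoted in the session.
timings = {
    "tf_gpu": 0.15,
    "tf_cpu": 0.8,
    "onnx_cpu": 0.35,
    "onnx_gpu": 0.025,
}


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


gpu_speedup = speedup(timings["tf_gpu"], timings["onnx_gpu"])
cpu_speedup = speedup(timings["tf_cpu"], timings["onnx_cpu"])
print(f"GPU: {gpu_speedup:.1f}x faster, CPU: {cpu_speedup:.1f}x faster")
```

These ratios come from single-image measurements on one machine, so they indicate the order of magnitude rather than a general benchmark.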
