1
00:00:05,240 --> 00:00:09,890
In this section we're going to talk about aggregate functions, which are just summary functions that

2
00:00:09,890 --> 00:00:12,590
reduce groups of values down to one value.

3
00:00:12,620 --> 00:00:19,310
More specifically, they produce generalizations about a group of values like count, average, sum,

4
00:00:19,310 --> 00:00:20,600
min, and max.

5
00:00:20,600 --> 00:00:27,680
So for example, I could use max to search for the employee who sold the most products, or I could

6
00:00:27,680 --> 00:00:33,870
use average to search for the average salary for each employee in a given department.

7
00:00:33,870 --> 00:00:37,590
So let's go over to SSMs and see how they work.

8
00:00:37,590 --> 00:00:41,550
So I'm going to open a new query window here.

9
00:00:41,550 --> 00:00:43,920
And I'm just going to do some examples.

10
00:00:43,920 --> 00:00:47,730
I'll give you these as a resource at the end of the lecture.

11
00:00:47,730 --> 00:00:50,160
And these are going to be aggregate functions.

12
00:00:50,160 --> 00:00:55,360
And we are going to start with the first one which is the count aggregate.

13
00:00:55,360 --> 00:01:01,120
And basically it just returns the number of rows that result from a SQL query.

14
00:01:01,120 --> 00:01:05,170
So let's first take a look at our customer table.

15
00:01:05,170 --> 00:01:09,520
And you can see down here the editor tells you there's 107 customers.

16
00:01:09,520 --> 00:01:11,140
And this is our data.

17
00:01:11,140 --> 00:01:14,260
It's always nice to kind of see the data first before I start doing this.

18
00:01:14,260 --> 00:01:21,770
And so if I wanted to know how many customers we have, I could just do select and then just do count.

19
00:01:21,770 --> 00:01:29,480
And the argument here for count is really just going to be either star or it's going to be a field name,

20
00:01:29,480 --> 00:01:33,620
but it's just going to be the number of results returned by the query.

21
00:01:33,620 --> 00:01:36,380
So we would do select count from.

22
00:01:36,380 --> 00:01:39,170
And then I will just drag out customers.

23
00:01:39,170 --> 00:01:43,910
And if we execute that, then it's just going to be 107.

24
00:01:43,910 --> 00:01:49,190
So instead of listing all the customers out, we can just say, hey, give me the count and it will

25
00:01:49,190 --> 00:01:49,910
return that.

26
00:01:49,910 --> 00:01:55,250
Now count acts kind of like a column name, so we can alias that.

27
00:01:55,250 --> 00:01:57,800
So we could call this number customers.

28
00:01:57,800 --> 00:02:02,960
And that's just going to give us a little bit better header name for our results.

29
00:02:02,960 --> 00:02:05,910
Now let's say that we want to refine it further.

30
00:02:05,910 --> 00:02:08,610
I'll put a comment up here.

31
00:02:08,610 --> 00:02:11,910
How many customers do we have.

32
00:02:11,910 --> 00:02:14,640
And then I'm just going to copy all of this.

33
00:02:14,640 --> 00:02:19,470
And then the next one, let's say that we want to know a little bit more.

34
00:02:19,470 --> 00:02:24,930
So we'll add who live in Florida or California.

35
00:02:24,930 --> 00:02:26,970
So I can change this.

36
00:02:26,970 --> 00:02:31,690
And I'll just put for our field name FL or CA.

37
00:02:31,690 --> 00:02:34,300
And then I can just put a Where clause.

38
00:02:34,300 --> 00:02:42,430
And looking at customers, the state they live in is state ID, so I'll do state ID and I'll do N.

39
00:02:42,430 --> 00:02:46,540
And then we're just going to list the list of states.

40
00:02:46,540 --> 00:02:48,700
And then I can just execute this.

41
00:02:48,700 --> 00:02:53,930
And I get a quick number if we want to know the details of who actually lives there.

42
00:02:53,930 --> 00:03:02,060
And we did this back several sections ago, we will just do select star from customers and we do get

43
00:03:02,060 --> 00:03:02,990
that list.

44
00:03:02,990 --> 00:03:04,970
And there are 16 rows.

45
00:03:04,970 --> 00:03:08,540
So for now I'm going to go ahead and just save this.

46
00:03:08,540 --> 00:03:13,640
And this is going to be our 6.1 aggregates example.

47
00:03:13,640 --> 00:03:17,150
Just so I don't lose it I'll just hit save okay.

48
00:03:17,150 --> 00:03:19,890
And so count is pretty easy to understand.

49
00:03:19,890 --> 00:03:24,030
You can use it with your wear filters and you can use n.

50
00:03:24,060 --> 00:03:27,750
You can use between like anything like that.

51
00:03:27,750 --> 00:03:29,970
So let's take a look at the next one.

52
00:03:29,970 --> 00:03:31,860
And the next one is going to be some.

53
00:03:31,860 --> 00:03:38,100
And basically that is going to be the arithmetic sum of whatever list of values that we give it.

54
00:03:38,100 --> 00:03:42,400
And so let's go take a look at employees.

55
00:03:42,400 --> 00:03:44,950
So I'll just right click and select all the rows.

56
00:03:44,950 --> 00:03:47,980
We have 75 employees and they have a salary.

57
00:03:47,980 --> 00:03:49,900
And they also have a department.

58
00:03:49,900 --> 00:03:53,020
So let's ask some more questions.

59
00:03:53,020 --> 00:03:57,160
Let's say we want to know select some.

60
00:03:57,160 --> 00:03:59,950
And in this case we're not going to put a star.

61
00:03:59,950 --> 00:04:02,050
So we want to sum up the salary.

62
00:04:02,050 --> 00:04:04,660
So I'll just go down here and get it.

63
00:04:04,660 --> 00:04:05,890
So here's salary.

64
00:04:05,890 --> 00:04:11,540
Select the sum of the salary from and I'll just drag over employees okay.

65
00:04:11,540 --> 00:04:13,850
So this is all employees.

66
00:04:13,850 --> 00:04:16,100
And that's a lot okay.

67
00:04:16,100 --> 00:04:20,360
So let's say first we'll go ahead and put our comments in here.

68
00:04:20,360 --> 00:04:26,300
What is the sum of our annual payroll which is the sum of all salaries.

69
00:04:26,300 --> 00:04:32,120
And I like to put the problems in plain language like this, because that's usually how you get the

70
00:04:32,120 --> 00:04:32,870
request.

71
00:04:32,870 --> 00:04:35,150
Let's refine this a little bit further.

72
00:04:35,150 --> 00:04:39,740
So I'm just going to copy this and we'll put where.

73
00:04:39,740 --> 00:04:45,080
And we want to look maybe at their department ID equals and 100.

74
00:04:45,080 --> 00:04:52,790
And so if we do this then basically we are going to get the sum of employees who work in this department.

75
00:04:52,790 --> 00:04:55,110
So sum is pretty simple.

76
00:04:55,110 --> 00:05:00,750
We just have to get our air clause to refine the results to exactly what we want.

77
00:05:00,750 --> 00:05:03,390
So I'll go ahead and update my comments here.

78
00:05:03,390 --> 00:05:07,740
What is sum of annual payroll for Department 100?

79
00:05:07,740 --> 00:05:12,870
And so I think once you know how to use some, you know how to use the other ones.

80
00:05:12,870 --> 00:05:17,430
So average is going to return the average of all the values.

81
00:05:17,430 --> 00:05:24,190
And so we can just copy this and change sum to average.

82
00:05:24,190 --> 00:05:28,900
And our average salary is $50,416.

83
00:05:28,900 --> 00:05:36,550
And again of course we could alias like that and make our results look a little bit cleaner.

84
00:05:36,580 --> 00:05:37,240
Okay.

85
00:05:37,240 --> 00:05:41,440
And of course we could do our minimum salary.

86
00:05:41,440 --> 00:05:43,160
So I'll just update the header.

87
00:05:43,160 --> 00:05:46,370
And we can also do our max salary.

88
00:05:46,940 --> 00:05:50,150
And then I can just highlight all of these execute it.

89
00:05:50,150 --> 00:05:54,890
So our average is 50,000 men is 24 Max is 138.

90
00:05:54,890 --> 00:06:03,200
And of course if I wanted to get all this information in just one query, I could combine these basically

91
00:06:03,200 --> 00:06:04,040
as columns.

92
00:06:04,040 --> 00:06:07,440
So I could do average salary.

93
00:06:07,650 --> 00:06:11,040
And then I'll just copy this, put a comma.

94
00:06:11,070 --> 00:06:12,300
There's my men salary.

95
00:06:12,300 --> 00:06:13,740
Put another comma.

96
00:06:13,920 --> 00:06:15,810
There's my max salary.

97
00:06:15,810 --> 00:06:18,660
And then I could just do this.

98
00:06:18,660 --> 00:06:24,630
And we've got a single query that gives us our average our minimum and our maximum.

99
00:06:24,630 --> 00:06:30,820
So a couple of other tips about aggregate functions your sum and your average.

100
00:06:30,820 --> 00:06:32,710
These are always numeric.

101
00:06:32,710 --> 00:06:40,330
However, your min and your max can be used for things like dates or character based data.

102
00:06:40,330 --> 00:06:45,100
And when you use min and max with character based data, it's going to convert it to Ascii and do that.

103
00:06:45,100 --> 00:06:48,250
So let's take a look at a couple of examples there.

104
00:06:48,250 --> 00:06:51,850
Maybe we'll go look at our orders table.

105
00:06:51,850 --> 00:06:54,490
So I'm just going to do select rows.

106
00:06:54,500 --> 00:06:56,960
And it's got some dates here.

107
00:06:56,960 --> 00:07:00,410
So we could use that on our min and max function.

108
00:07:00,410 --> 00:07:10,220
So going back here let's say we want to ask the question what is the minimum date in our orders table.

109
00:07:10,220 --> 00:07:15,950
And so I could just select min and we'll go to our orders table.

110
00:07:15,950 --> 00:07:17,630
Look at our columns.

111
00:07:17,630 --> 00:07:19,130
And here's our order date.

112
00:07:19,130 --> 00:07:22,160
So here's our minimum order date from orders.

113
00:07:22,160 --> 00:07:26,060
So it looks like it was January 3rd of 2003.

114
00:07:26,060 --> 00:07:29,750
Let's just call this earliest order.

115
00:07:30,050 --> 00:07:34,880
And of course we could also get the max date here.

116
00:07:35,270 --> 00:07:37,520
This would be our last order here.

117
00:07:37,520 --> 00:07:40,400
So your min and max can be used with dates.

118
00:07:40,400 --> 00:07:42,990
And it can also be used with names.

119
00:07:42,990 --> 00:07:49,590
So if I did something like select min and I'm going to go back to my employees table, I'm just going

120
00:07:49,590 --> 00:07:59,460
to grab first name and I'll do Max first name from and I will do from our employees.

121
00:07:59,460 --> 00:08:01,470
Let's see what that gives us.

122
00:08:01,470 --> 00:08:05,460
And so so you can see it's going in alphabetical order.

123
00:08:05,460 --> 00:08:09,820
So you could use it for that type of operation if you wanted.

124
00:08:10,510 --> 00:08:17,140
One of the apparent limitations of using aggregate is the aggregate is limited to one value.

125
00:08:17,140 --> 00:08:20,890
And I can't have multiple aggregates in a query.

126
00:08:20,890 --> 00:08:24,010
But each one of these is going to return one value.

127
00:08:24,010 --> 00:08:31,580
And so what if you are asked the question what is the average salary of employees in all departments?

128
00:08:31,610 --> 00:08:37,880
Now we know how to do an average salary, say, in one of our departments where we have a Where clause

129
00:08:37,880 --> 00:08:40,880
where department ID equals 100.

130
00:08:40,910 --> 00:08:47,810
But what if we wanted the output to be the average salary for department 100 average for 200, 304 hundred.

131
00:08:47,810 --> 00:08:51,710
And this is where we can use the group by clause.

132
00:08:51,710 --> 00:08:59,340
And so the group by clause basically allows us to put another column name in here, which is going to

133
00:08:59,340 --> 00:09:02,940
be department ID, and I'll do as department.

134
00:09:02,970 --> 00:09:07,740
Now I can't run it like this because it doesn't know what to do with that aggregate.

135
00:09:07,740 --> 00:09:15,930
So what we want to do is we want to group by our non aggregate fields, which is going to be department

136
00:09:15,930 --> 00:09:16,350
ID.

137
00:09:16,350 --> 00:09:25,360
And what this does is it will give us all departments and calculate the average salary of each department.

138
00:09:25,360 --> 00:09:32,110
So in this way we're able to return multiple rows just by using the groupby.

139
00:09:32,110 --> 00:09:39,400
And we can use multiple non-aggregate columns as long as they are in our group by statement.

140
00:09:39,400 --> 00:09:42,190
So I will put the question up here.

141
00:09:42,190 --> 00:09:47,500
What is the average salary for each department.

142
00:09:47,920 --> 00:09:50,380
And this is the solution to that.

143
00:09:50,380 --> 00:09:53,350
So let's look at another example.

144
00:09:53,350 --> 00:10:01,060
Let's say the question is how many employees do live in each state.

145
00:10:01,060 --> 00:10:11,840
So here we will do select state ID and then count as employees.

146
00:10:11,840 --> 00:10:14,900
And then we're going to group it by each state.

147
00:10:14,900 --> 00:10:19,100
So we'll just say group by state ID.

148
00:10:19,340 --> 00:10:27,920
And when we execute this we see we've got three California residents, three Georgia residents, 64

149
00:10:27,920 --> 00:10:30,410
Massachusetts and five in Texas.

150
00:10:30,440 --> 00:10:35,040
Now let's say that we are going to further expand upon this.

151
00:10:35,040 --> 00:10:38,430
So I'm just going to copy this kind of a typo there.

152
00:10:38,430 --> 00:10:39,780
I take care of that.

153
00:10:40,650 --> 00:10:47,400
And let's say we're going to extend this to state and zip code.

154
00:10:47,400 --> 00:10:52,170
Then we could do I have to go back and look at our table.

155
00:10:52,170 --> 00:10:56,200
And it's going to be just the column name is zip.

156
00:10:56,200 --> 00:10:58,510
So I could just do zip.

157
00:10:58,510 --> 00:11:04,000
And then I have to also add that to my group by statement.

158
00:11:04,000 --> 00:11:05,800
And when we execute that.

159
00:11:05,800 --> 00:11:11,050
So it's going to list every zip code in each state.

160
00:11:11,050 --> 00:11:15,610
And it's going to give us the number of residents in that zip code.

161
00:11:15,610 --> 00:11:20,390
And then once it's done with the state then we'll go down to the next state.

162
00:11:20,390 --> 00:11:21,920
There's only one zip code.

163
00:11:21,920 --> 00:11:23,420
All three people live in it.

164
00:11:23,420 --> 00:11:24,470
Same with Texas.

165
00:11:24,470 --> 00:11:27,500
And then we have two zip codes in California.

166
00:11:27,500 --> 00:11:30,560
So one less concept for this lecture.

167
00:11:30,830 --> 00:11:35,480
And we're going to go back to our first group by example.

168
00:11:35,480 --> 00:11:42,440
And basically what this does is it gives us the average salary for each department.

169
00:11:42,440 --> 00:11:48,990
So it's still basically turning many rows into one by generalizing the data.

170
00:11:48,990 --> 00:11:53,670
In this case it is averaging the salaries by department.

171
00:11:53,670 --> 00:11:56,100
And so we put in our group by department.

172
00:11:56,100 --> 00:11:58,890
So let's take that a little further.

173
00:11:58,890 --> 00:12:05,550
And let's say that you wanted to only show departments and their average salaries, as long as the average

174
00:12:05,550 --> 00:12:17,080
salary was say more than 45,000, we could try to just put where average salary is greater than 45,000.

175
00:12:17,080 --> 00:12:23,110
But notice that where is kind of got the red squiggly under it and we're going to get an error.

176
00:12:23,110 --> 00:12:26,800
And it's basically saying the syntax is incorrect.

177
00:12:26,800 --> 00:12:32,590
Basically, once you put down a group by clause then where is no longer valid.

178
00:12:32,590 --> 00:12:37,990
And that's because where works on individual rows returned by a Select statement.

179
00:12:37,990 --> 00:12:45,730
But we've already consolidated the information for our department and average salary by using groups.

180
00:12:45,730 --> 00:12:51,340
So we can magically fix that by using the having directive.

181
00:12:51,340 --> 00:12:56,390
And basically the way I think about it is having is where for groups.

182
00:12:56,390 --> 00:13:03,260
And so it will look at each department ID and give it a yes or no based on whether the group average

183
00:13:03,260 --> 00:13:09,680
salary value is meeting this predicate, which is greater than 45,000.

184
00:13:09,680 --> 00:13:17,810
So if we execute this, then we're able to interrogate each group and filter out the results so that

185
00:13:17,810 --> 00:13:25,890
we have only the department's return who have an average salary that is greater than 45,000.

186
00:13:26,190 --> 00:13:29,520
So aggregate functions are powerful, but they're pretty simple.

187
00:13:29,550 --> 00:13:34,650
Coming up, we're going to do an assignment where you will get some practice using aggregate functions

188
00:13:34,650 --> 00:13:35,700
in queries.

189
00:13:35,730 --> 00:13:37,440
See you in the next assignment.
