WEBVTT

00:00.110 --> 00:00.770
Hey, guys.

00:00.770 --> 00:04.250
Welcome to day 31 of 100 Days of Code.

00:04.250 --> 00:07.040
And you've been learning quite a lot of things.

00:07.040 --> 00:15.050
Now, we've been looking at handling exceptions, using the JSON data format, parsing and reading CSVs,

00:15.050 --> 00:20.090
using pandas, opening and writing to files, and a whole lot more.

00:20.090 --> 00:24.530
So it's time for your capstone project.

00:24.590 --> 00:30.470
And in this capstone project, we're going to be building a flashcard program to help you study.

00:30.470 --> 00:34.610
And it's especially great for studying languages.

00:34.730 --> 00:42.470
Now, when I was in school, I studied French, and it was a lot of vocab that we had to learn for tests.

00:42.470 --> 00:48.710
Pomme is apple and I would memorize all these words, all of the grammar tables.

00:48.710 --> 00:53.450
And yet after 4 or 5 years, I couldn't really speak much French.

00:53.450 --> 00:58.160
So I decided to go for immersive language learning.

00:58.160 --> 01:07.370
I went to France, I hung out with friends and I tried to immerse myself in the language and culture,

01:07.430 --> 01:09.230
but that also failed.

01:09.230 --> 01:13.940
I had a lot of fun, but my French didn't seem to improve all that much.

01:14.120 --> 01:22.280
But then I discovered a new way of learning languages, and it all started with looking at Chinese characters.

01:22.340 --> 01:24.650
There's a lot of Chinese characters out there.

01:24.680 --> 01:31.700
There's something like 50,000 Chinese characters in total from history to now.

01:31.730 --> 01:36.140
There's a lot of characters that you could learn that each have a different meaning, and they each

01:36.140 --> 01:38.030
have a different pronunciation.

01:38.120 --> 01:39.200
Imagine that.

01:39.200 --> 01:43.310
Trying to learn 50,000 characters, that's no easy feat.

01:43.310 --> 01:49.190
But then a friend told me that actually, you don't really need 50,000 characters.

01:49.190 --> 01:55.790
Your average professor who is very eloquent, who can write a lot of the characters and use them with

01:55.790 --> 02:04.010
ease, only knows about 10,000, and your average person probably only uses about 8000 in their day

02:04.010 --> 02:05.150
to day lives.

02:05.210 --> 02:11.600
And if you basically just want to get by in life, you can pretty much rely on the 3000 words that an

02:11.600 --> 02:13.310
average teenager would know.

02:13.460 --> 02:19.550
And finally, if you actually just want to be able to watch some simple movies, read some simple books,

02:19.550 --> 02:24.560
then you could use the average kid vocabulary of about 1000 characters.

02:24.590 --> 02:27.590
And at this point, I think to myself, 1000.

02:27.620 --> 02:28.760
That's quite doable.

02:28.760 --> 02:32.420
I could do 1000 if I learned just ten characters a day.

02:32.450 --> 02:35.840
That will take me less than a year to learn all of these characters.

02:36.230 --> 02:40.190
But it's not just 1000 random characters either.

02:40.220 --> 02:42.860
There's such a thing as a frequency dictionary.

02:42.860 --> 02:49.550
So a dictionary that's not listed by A, B, C, D, but it's actually listed by the frequency that

02:49.550 --> 02:52.640
a particular word occurs in common usage.

02:52.910 --> 03:00.980
For example, if you take the first 1000 characters that are most commonly used, then you can pretty

03:00.980 --> 03:03.110
much read most of the newspapers.

03:03.110 --> 03:08.600
You can watch most of the TV shows because these are the words that are the bread and butter of the

03:08.600 --> 03:09.410
language.

03:09.410 --> 03:13.910
It's like in English: a, the, of, from, why.

03:13.940 --> 03:14.540
Yes, no.

03:14.570 --> 03:18.140
These are words that we use every day again and again.

03:18.170 --> 03:24.410
The crazy words like anti-establishment or glioblastoma.

03:24.440 --> 03:29.150
These are not words that you need to really know for day to day life.

03:29.540 --> 03:34.310
So let me show you the program that you will build where you can learn the most frequently used words

03:34.310 --> 03:35.750
in any language.

03:35.780 --> 03:41.900
It's a flashcard program, and it shows you the front and the back of the card.

03:41.900 --> 03:47.300
So for example, the French word demande in English means request.

03:47.450 --> 03:53.990
After three seconds, the card flips and I can check whether I knew the right answer. If I got it

03:53.990 --> 03:56.000
right, I'll press the tick.

03:56.000 --> 03:58.670
And if I got it wrong, I'll press the cross.

03:58.700 --> 04:00.500
So let's try another word.

04:00.500 --> 04:07.580
Parti means left or to leave. Attendre means to wait.

04:07.580 --> 04:09.950
And I think I knew that word.

04:09.950 --> 04:12.560
So I'm going to click the check mark.

04:12.590 --> 04:17.960
And what that's going to do is it's going to take the flashcard out of the list of flashcards.

04:17.960 --> 04:24.110
So it doesn't show me the things I already know, and instead it only shows me the things I don't know

04:24.110 --> 04:28.790
so I can review it and say, oh, I'm not sure what la means.

04:28.790 --> 04:35.690
So I'll say cross, and that will go back into the deck and it might come up again at some point.
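
NOTE
The tick and cross behaviour described above can be sketched in Python roughly as follows. This is only a minimal illustration, not the course's actual solution: the names deck, next_card, mark_known and mark_unknown are hypothetical, and the three sample words are placeholders.
```python
import random
# Hypothetical sample deck; the words are only illustrative.
deck = [
    {"French": "demande", "English": "request"},
    {"French": "parti", "English": "left"},
    {"French": "attendre", "English": "to wait"},
]
def next_card(cards):
    """Pick a random card from the cards still to learn."""
    return random.choice(cards)
def mark_known(cards, card):
    """Tick: return a new deck without the card the learner knew."""
    return [c for c in cards if c != card]
def mark_unknown(cards, card):
    """Cross: the card stays in the deck, so the deck is returned unchanged."""
    return cards
card = deck[0]
deck = mark_known(deck, card)  # the learner pressed the tick
print(len(deck))
```
Returning a new list from mark_known keeps the original deck untouched, which is one possible design choice; removing the card in place would work just as well.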

04:35.750 --> 04:40.610
So this beautiful piece of software is what we're going to be creating.

04:40.610 --> 04:47.120
But more specifically you're going to be creating because after all, this is your capstone project.

04:47.120 --> 04:53.510
But don't worry, I've divided it up into four steps and I've got some step by step instructions for you

04:53.510 --> 04:55.640
in the next lesson.

04:55.970 --> 05:00.920
Now, if you're wondering, how did you get the most frequent words for the flashcard app in the first

05:00.920 --> 05:01.460
place?

05:01.460 --> 05:09.560
Well, let me show you. There's a wiki for the frequency lists of different languages, and it lists

05:09.560 --> 05:11.330
most of the common languages.

05:11.330 --> 05:17.720
If we go to French, you can see that there are loads of different lists that people have compiled that

05:17.720 --> 05:21.290
list the top most frequently occurring words.

05:21.290 --> 05:27.860
And one of the ones that I thought was really relevant is the words that are based on subtitles.

05:27.890 --> 05:34.550
These subtitles come from all sorts of shows and movies that are relevant to modern culture.

05:34.550 --> 05:39.470
And when you look at one of the subtitles (this is one of my favorite shows, by the way),

05:39.470 --> 05:47.090
you can see that the subtitles are listed by language, and if we pick out one, which is in English

05:47.090 --> 05:54.800
and we take a look at it, then you can see it's basically just all the words that are spoken in the

05:54.800 --> 05:59.270
movie or in the show, and it's been transcribed into subtitles.

05:59.300 --> 06:05.190
Now then, if we take all of these words that are from the most commonly watched movies and shows,

06:05.190 --> 06:07.740
we end up with these frequency lists.

06:07.740 --> 06:15.960
So if we take a look here, it shows the most frequent words from 1 to 5000.

06:15.960 --> 06:20.190
And at the very beginning it's your I, of, is.

06:20.220 --> 06:22.350
All of these things are really common.

06:22.350 --> 06:25.770
And then as you scroll down, you get to some longer words.

06:25.770 --> 06:31.710
And if you scroll to the bottom, you can see you're getting some more and more niche words.

06:32.370 --> 06:42.360
These frequency lists are compiled by a user called Hermitd, and Hermitd is Hermit Dave, and he

06:42.360 --> 06:48.570
has a GitHub repository where he's compiled all of the frequency words.

06:48.630 --> 06:52.530
And you can see the latest version from 2018.

06:52.560 --> 06:59.070
Now he's got all of the frequency words for many languages, and it's listed by the language code.

06:59.070 --> 07:02.130
So French would be fr for example.

07:02.160 --> 07:09.600
And here you can see the list of the top 50,000 most frequent words or the full list.

07:09.630 --> 07:13.470
We're probably not going to learn more than 1000.

07:13.500 --> 07:16.410
And I'm certainly not going to get to 50,000.

07:16.410 --> 07:22.770
But this data here lists all the words that he found in these subtitles and the frequency that they

07:22.770 --> 07:23.550
occurred.

07:23.580 --> 07:29.220
And once they've been sorted in order of frequency, this is what you end up with.
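
NOTE
The idea of counting words in subtitles and sorting them by frequency can be sketched in Python as below. This is a simplified illustration under stated assumptions, not how the real lists were produced: the subtitle lines are made up, and frequency_list is a hypothetical helper name.
```python
from collections import Counter
# Made-up subtitle lines standing in for real subtitle files.
subtitle_lines = [
    "je ne sais pas",
    "je ne peux pas",
    "je sais",
]
def frequency_list(lines):
    """Count every word and return (word, count) pairs, most frequent first."""
    counts = Counter(word for line in lines for word in line.lower().split())
    # Sort by descending count, then alphabetically so ties come out in a stable order.
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
ranked = frequency_list(subtitle_lines)
top_two = [word for word, _ in ranked[:2]]
print(ranked)
```
Slicing the ranked list, as top_two does, is all it takes to keep only the first 100 or 1000 entries.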

07:30.540 --> 07:33.930
So I've already studied some of the first 200 words.

07:33.930 --> 07:40.800
So if I take 100 words from this frequency dictionary and I put it into a Google sheet, then I end

07:40.830 --> 07:42.450
up with something like this.

07:42.600 --> 07:48.840
Now what I want to be able to do is to create a flashcard where the front of the flashcard is the word

07:48.840 --> 07:55.470
in French, and then on the back of the flashcard is the answer in English for what that word means.

07:55.680 --> 08:01.080
Instead of having to flip through a dictionary, finding out the meaning of each of these words, there's

08:01.110 --> 08:04.200
actually a really neat trick in Google Sheets that I want to show you.

08:04.290 --> 08:13.170
If you hit equals to start a new formula and you type in GOOGLETRANSLATE, you can see it expects some

08:13.170 --> 08:13.920
inputs.

08:13.920 --> 08:16.800
First is the piece of text that you want to translate.

08:16.800 --> 08:18.690
So I'm going to click on this cell.

08:18.690 --> 08:20.760
And then it's the source language.

08:20.760 --> 08:23.490
So this is the language as a code.

08:23.490 --> 08:28.680
So for example Spanish is es and French is fr.

08:28.710 --> 08:34.440
And then the final input it expects is the language code that you want to translate it to.

08:34.470 --> 08:37.320
So in this case I want to translate it to English.

08:37.320 --> 08:41.310
So I'm going to use en and then we can close off the parentheses.

08:41.310 --> 08:42.150
Hit enter.

08:42.150 --> 08:47.730
And after a little while with good internet you'll see the English translation for this word.

08:47.730 --> 08:53.910
And of course because we're in Excel, we can simply just drag this across all the way down to all of

08:53.910 --> 08:54.630
our words.

08:54.630 --> 08:57.120
And after a little while, bam!

08:57.120 --> 09:00.900
It's translated all of those words into English.

09:01.380 --> 09:07.320
So this is a really neat trick, and I'll link to the docs for this particular formula.

09:07.380 --> 09:13.830
And also you can take a look at the language support that Google's translation service has.

09:13.860 --> 09:17.370
And you can see the language code for each of these languages.

09:17.370 --> 09:24.000
So if you want to try learning Macedonian or Malay, then this is going to be your best bet.

09:24.030 --> 09:31.470
So now that I've created my Excel sheet essentially of French and English words, I've got potentially

09:31.470 --> 09:38.010
100 flashcards with the front and back data already saved inside this Google sheet.

09:38.040 --> 09:45.450
Now all I have to do is simply download it as a CSV, and we'll be able to work with it very easily.

09:45.750 --> 09:51.570
Now you don't have to worry about downloading this or getting hold of this, because I've already included

09:51.570 --> 09:56.760
the final CSV data in the starting project for you to be able to use.
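
NOTE
Reading such a CSV into front and back card data might look like the Python sketch below. It assumes a header row with columns named French and English, which the lesson does not confirm, and it uses an in-memory string instead of the real file; load_cards is a hypothetical helper name.
```python
import csv
import io
# Stand-in for the downloaded file; the column names "French" and "English" are assumptions.
sample_csv = "French,English\ndemande,request\nparti,left\nattendre,to wait\n"
def load_cards(csv_file):
    """Read a CSV with one row per card into a list of dicts."""
    return list(csv.DictReader(csv_file))
cards = load_cards(io.StringIO(sample_csv))
front = cards[0]["French"]   # shown first
back = cards[0]["English"]   # shown after the card flips
print(front, back)
```
With a real file, an open file object could be passed to load_cards in place of io.StringIO.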

09:57.180 --> 10:05.040
So head over to the next lesson and get started building your very own study aid, the Flashy flashcard

10:05.070 --> 10:05.670
app.
