Women Who Code is a fantastic non-profit organization whose goal is to expose more women to tech-related careers. Their chapter, Women Who Code Data Science, is putting on a 6-week introduction to machine learning course on Saturdays from 5/16/20 – 6/20/20. Each week, on the following Monday, I will be posting a recap of the course on my blog. Here’s the recap of Part 2; the full video is available on YouTube.
Part 2 Focus: Classification
- Classification is a form of supervised learning that is intended for predicting variables that are categorical (occupation, team name, color, etc.)
- Need a refresh on supervised learning? Check out last week’s recap of supervised learning here.
2. Conditional Probability
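- Conditional probability is the probability of one event occurring given that another event has already occurred. It is what you need when calculating the probability of two or more dependent events occurring together.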
- Dependent events are events whose outcomes have some kind of effect on each other
- for example, the probability that a person draws a King and then an 8 from a deck of cards
- these events are dependent because taking the first card out makes the deck smaller, which changes the odds on the second try (4 chances in 51 instead of 4 in 52)
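To make the card example concrete, here is a minimal Python sketch that multiplies the probability of the first draw by the conditional probability of the second. It assumes a standard 52-card deck with four cards of each rank and that the first card is not put back; the function name `prob_king_then_eight` is just an illustrative choice.

```python
from fractions import Fraction


def prob_king_then_eight(deck_size=52, kings=4, eights=4):
    """P(King first, then 8) when the first card is not put back."""
    # Probability the first card is a King.
    p_king = Fraction(kings, deck_size)
    # Probability the second card is an 8, given a King was already removed.
    p_eight_given_king = Fraction(eights, deck_size - 1)
    return p_king * p_eight_given_king


print(prob_king_then_eight())         # 4/663
print(float(prob_king_then_eight()))  # roughly 0.006
```

Using `Fraction` keeps the arithmetic exact, so the result can be checked by hand: (4/52) × (4/51) = 4/663.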
3. Bayes Theorem
- There are many cases where a scientist needs to know the likelihood of one event happening given another. Also, they often have the data of previous events to help in their calculations. In comes Bayes Theorem, to calculate the probability of some event, Event B, given an event that already happened, Event A.
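- By writing the probability of both events happening in two ways, as P(B|A) · P(A) and as P(A|B) · P(B), setting them equal, and doing some rearranging, the formula for Bayes Theorem becomes: P(B|A) = P(A|B) · P(B) / P(A)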
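Here is a small Python sketch of Bayes Theorem, P(B|A) = P(A|B) · P(B) / P(A), applied to the deck of cards again. The events are my own illustration rather than something from the class: Event A is "the card is a face card" (Jack, Queen, or King) and Event B is "the card is a King". The helper name `bayes` is hypothetical.

```python
from fractions import Fraction


def bayes(p_a_given_b, p_b, p_a):
    """Bayes Theorem: P(B|A) = P(A|B) * P(B) / P(A)."""
    return p_a_given_b * p_b / p_a


p_king = Fraction(4, 52)         # P(B): four Kings in the deck
p_face = Fraction(12, 52)        # P(A): four each of Jack, Queen, King
p_face_given_king = Fraction(1)  # P(A|B): every King is a face card

p_king_given_face = bayes(p_face_given_king, p_king, p_face)
print(p_king_given_face)  # 1/3
```

The answer matches intuition: once you know the card is a face card, one in three face cards is a King.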
4. Naïve Bayes
- When working with Naïve Bayes, the data set you are working with should be split into two parts:
- Features (x) – the columns in the dataset that will somehow affect the outcome variable. Think of these as a bunch of potential “Event As”.
- Class Variable (y) – the column in the dataset that we are trying to predict, or classify. Think of these as potential “Event Bs”.
- You’re probably thinking – why is Bayes so naïve? That’s because this particular version of the theorem assumes that the features of the dataset are independent of each other once the class is known (meaning one has no effect on the other).
- To use the Naïve Bayes formula, all you need to do is translate your tabular data into one or more contingency tables, which are just tables with counts for each combination of feature value and class. These counts will be substituted into the formula where appropriate.
- The example below uses a weather data set with four features (outlook, temperature, humidity, windy) and one class variable (whether or not a golf game was played). Here is what the contingency tables for this data set would look like.
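As a rough sketch of how contingency-table counts feed into Naïve Bayes, the Python below builds the counts and then scores a new day. The eight rows are made-up illustrative data, not the weather data set from the class, and only two of the four features (outlook and windy) are used to keep it short. The function names are hypothetical, and no smoothing is applied, so a count of zero wipes out that class's score.

```python
from collections import Counter
from fractions import Fraction

FEATURES = ("outlook", "windy")

# Hypothetical rows: (outlook, windy, played golf?)
rows = [
    ("sunny", False, "no"),
    ("sunny", True, "no"),
    ("overcast", False, "yes"),
    ("rainy", False, "yes"),
    ("rainy", True, "no"),
    ("overcast", True, "yes"),
    ("sunny", False, "yes"),
    ("rainy", False, "yes"),
]


def contingency_tables(rows):
    """Count rows per class, and per (feature value, class) pair."""
    class_counts = Counter(row[-1] for row in rows)
    tables = {name: Counter() for name in FEATURES}
    for *values, label in rows:
        for name, value in zip(FEATURES, values):
            tables[name][(value, label)] += 1
    return class_counts, tables


def posterior(rows, observation):
    """Naive Bayes: P(class) times each P(feature value | class), normalized."""
    class_counts, tables = contingency_tables(rows)
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        score = Fraction(n_label, total)  # prior P(class)
        for name, value in zip(FEATURES, observation):
            # Counter returns 0 for combinations that never occurred.
            score *= Fraction(tables[name][(value, label)], n_label)
        scores[label] = score
    norm = sum(scores.values())
    return {label: score / norm for label, score in scores.items()}


print(posterior(rows, ("sunny", True)))
```

For a sunny, windy day this toy data gives "no" a probability of 20/23 and "yes" 3/23, so the classifier would predict that no game is played.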
5. Naïve Bayes Variations
- There are numerous variations of the Naïve Bayes algorithm that can be extremely handy for certain datasets.
- Gaussian Naïve Bayes
- When you have continuous numerical features, you can take advantage of the Gaussian Naïve Bayes algorithm to predict a categorical outcome.
- Multinomial Naïve Bayes
- This algorithm is useful when you have a bunch of text data that you want to use to predict an outcome that is categorical.
- Multinomial Naïve Bayes uses word frequencies in its calculations to make its predictions.
- Bayesian Networks
- Bayesian Networks are helpful when you want to model the conditional dependencies between subsets of features more closely, instead of assuming that every feature is independent of the others.
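To give a feel for the first two variations, here is a simplified from-scratch sketch in Python using only the standard library. Everything in it is an assumption made for illustration: the function names, the tiny temperature/humidity samples, and the spam/ham messages are invented, and the multinomial version uses add-one smoothing, which the class recap does not specify. For real work, a tested library implementation (for example scikit-learn's `GaussianNB` and `MultinomialNB`) is the safer choice.

```python
from collections import Counter, defaultdict
from math import log
from statistics import NormalDist


def gaussian_nb_predict(samples, labels, x):
    """Gaussian variation: model each numeric feature per class as a normal curve."""
    by_class = defaultdict(list)
    for features, label in zip(samples, labels):
        by_class[label].append(features)
    scores = {}
    for label, rows in by_class.items():
        score = log(len(rows) / len(samples))  # log prior
        for column, value in zip(zip(*rows), x):
            # Fit mean and standard deviation from this class's column.
            score += log(NormalDist.from_samples(column).pdf(value))
        scores[label] = score
    return max(scores, key=scores.get)


def multinomial_nb_predict(documents, labels, text):
    """Multinomial variation: score classes by word frequencies."""
    word_counts = defaultdict(Counter)
    for doc, label in zip(documents, labels):
        word_counts[label].update(doc.lower().split())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    class_counts = Counter(labels)
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = log(class_counts[label] / len(labels))  # log prior
        for word in text.lower().split():
            if word in vocabulary:  # ignore words never seen in training
                # Add-one smoothing (an assumption) avoids log(0).
                score += log((counts[word] + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical (temperature, humidity) readings and whether golf was played.
samples = [(85.0, 90.0), (80.0, 95.0), (83.0, 88.0),
           (68.0, 70.0), (70.0, 65.0), (72.0, 75.0)]
played = ["no", "no", "no", "yes", "yes", "yes"]
print(gaussian_nb_predict(samples, played, (69.0, 68.0)))

# Hypothetical text messages.
docs = ["win cash prize now", "cheap prize win",
        "meeting agenda for monday", "lunch meeting tomorrow"]
kinds = ["spam", "spam", "ham", "ham"]
print(multinomial_nb_predict(docs, kinds, "win a prize"))
```

Both functions work with log-probabilities, which are added rather than multiplied, so that many small probabilities do not underflow to zero.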
Alright, you’ve got all the information from Part 2 – which means you can click this link now and sign up for Part 3: Classification (continued), and hopefully Parts 4-6 too. Women Who Code has done an amazing job with the classes so far – and the content is only getting better every week.
I sure hope I can learn more about classification with you all next Saturday. But, remember, if you can’t make it to the class on the weekend, you can catch the recap next Monday on my blog!