Women Who Code is a fantastic non-profit organization whose goal is to expose more women to tech-related careers. Their chapter Women Who Code Data Science is putting on a 6-week introduction to machine learning course on Saturdays from 5/16/20 – 6/20/20. Each week, on the following Monday, I will be posting a recap of the course on my blog. Here’s the recap of Part 3; the full video is available on YouTube.
Part 3 Focus: Classification (continued)
If you missed last week’s session on classification, you can read the recap first to get caught up here.
1. Decision Trees
- A decision tree is a type of classification algorithm that learns a set of rules from a data set and uses them to make predictions.
- A decision tree is represented like a flow chart.
- Both a root node (square) and an internal node (circle) represent some sort of question that will be asked. Ex: Is it sunny? Is the humidity high, medium, or low?
- The branch (line) represents an answer to a question at a root or internal node. The answer determines which branch you follow from a node. Ex: Yes, it is sunny or no, it is not sunny.
- The leaf node (triangle) represents a final prediction for the data point in question. Ex: Since it is sunny, and the humidity is low, we predict that the golf match will be played.
- A decision tree is constructed by observing two measures, entropy and information gain, which will be explained below.
2. Entropy
- Entropy is basically a measure of uncertainty within a sample of data.
- Think of it this way: if all 100 golf matches in your data set were played and none were cancelled, there is LOW entropy, because you have LOW doubt about whether the next match will be played.
- If you have 100 golf matches in your data set, and 50 were played, and 50 were cancelled, there is HIGH entropy. You have HIGH doubt about playing the next match, because the data is “hit or miss”.
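- There are two entropy formulas needed for this process: the entropy of the target response Y, H(Y) = −Σ p(y) log2 p(y), and the entropy of the target response given some feature X, H(Y|X) = Σ p(x) H(Y | X = x), which is the weighted average of the entropy within each value of X.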
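To make the two entropy measures concrete, here is a minimal Python sketch. It assumes the standard Shannon definition of entropy measured in bits; the function names and the two sample lists are made up for illustration and are not taken from the course notebook.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    # Sum p * log2(1/p) over every class that appears in the sample.
    return sum((n / total) * math.log2(total / n) for n in Counter(labels).values())

def conditional_entropy(feature_values, labels):
    """Entropy of the labels given a feature: the weighted average of the
    entropy inside each group of rows that share a feature value."""
    groups = {}
    for x, y in zip(feature_values, labels):
        groups.setdefault(x, []).append(y)
    return sum(len(g) / len(labels) * entropy(g) for g in groups.values())

# Hypothetical samples matching the two golf scenarios described above.
all_played = ["played"] * 100
half_and_half = ["played"] * 50 + ["cancelled"] * 50

print(entropy(all_played))     # 0.0 -> no doubt at all
print(entropy(half_and_half))  # 1.0 -> maximum doubt for two classes
```

With two classes, entropy runs from 0 (every match had the same outcome) to 1 (a perfect 50/50 split).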
3. Information Gain
- Information gain describes the amount of knowledge that we learn about the dataset by adding a node to a decision tree.
- Here’s a real example: let’s say there’s a golf match coming up next week. If we know it’s going to rain, that makes a big difference in our prediction, because rain has a big impact on whether a golf match gets cancelled. Knowing whether it will rain would yield HIGH information gain.
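- The formula for information gain uses the entropy formulas from above: it is the entropy of the target response minus the entropy of the target response given the feature X. (This makes sense, because as we lower uncertainty, we gain information.)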
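Here is a small sketch of information gain as "entropy before the split minus entropy after the split". The eight matches and both features (`rain`, `shirt`) are invented for illustration, and the helper names are my own.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return sum((n / total) * math.log2(total / n) for n in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus the entropy left after splitting on the feature."""
    groups = {}
    for x, y in zip(feature_values, labels):
        groups.setdefault(x, []).append(y)
    remaining = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remaining

# Made-up data: eight matches, cancelled exactly when it rained.
outcome = ["cancelled"] * 4 + ["played"] * 4
rain = ["yes"] * 4 + ["no"] * 4
# A made-up feature that has nothing to do with the outcome.
shirt = ["red", "blue"] * 4

print(information_gain(rain, outcome))   # 1.0 -> rain settles the question
print(information_gain(shirt, outcome))  # 0.0 -> shirt colour tells us nothing
```

In this toy data, knowing about rain removes all of the uncertainty, while the shirt colour removes none of it.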
4. Building the Tree
- Basically, the process for building a decision tree goes like this: Calculate the potential information gain for each feature in your data sample. Pick the one with the highest information gain as the “decision making” feature in the current node. Then split the data along that feature’s branches and repeat the calculations on each branch with the remaining features.
- You can stop once all of your branches lead to a leaf node (which implies a final decision).
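The steps above can be sketched as a short recursive function. This is a simplified, ID3-style illustration under my own assumptions (categorical features only, no pruning, majority vote when features run out), not the implementation from the course. The seven golf rows are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return sum((n / total) * math.log2(total / n) for n in Counter(labels).values())

def information_gain(rows, feature, target):
    groups = {}
    for r in rows:
        groups.setdefault(r[feature], []).append(r[target])
    remaining = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r[target] for r in rows]) - remaining

def build_tree(rows, features, target):
    labels = [r[target] for r in rows]
    # Leaf node: every row agrees, or there are no questions left to ask.
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    # Ask about the feature with the highest information gain.
    best = max(features, key=lambda f: information_gain(rows, f, target))
    remaining = [f for f in features if f != best]
    branches = {}
    for value in set(r[best] for r in rows):
        subset = [r for r in rows if r[best] == value]
        branches[value] = build_tree(subset, remaining, target)
    return {best: branches}

def predict(tree, row):
    # Follow branches until we reach a leaf (a plain label, not a dict).
    while isinstance(tree, dict):
        feature = next(iter(tree))
        tree = tree[feature][row[feature]]
    return tree

# Hypothetical golf data.
rows = [
    {"outlook": "sunny", "humidity": "high", "played": "no"},
    {"outlook": "sunny", "humidity": "low", "played": "yes"},
    {"outlook": "sunny", "humidity": "low", "played": "yes"},
    {"outlook": "rainy", "humidity": "high", "played": "no"},
    {"outlook": "rainy", "humidity": "low", "played": "no"},
    {"outlook": "rainy", "humidity": "low", "played": "no"},
    {"outlook": "sunny", "humidity": "low", "played": "yes"},
]

tree = build_tree(rows, ["outlook", "humidity"], "played")
print(tree)
print(predict(tree, {"outlook": "sunny", "humidity": "low"}))
```

On this data, outlook has the higher information gain, so it becomes the root question; humidity is only asked on the sunny branch, because every rainy match was cancelled.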
5. Advantages and Disadvantages
- Decision trees are easy to understand and explain, which makes them an intuitive way to make predictions.
- Also, they are an upgrade from Naïve Bayes, because features are not assumed to be equally important. The whole idea of decision trees is putting the most important features first.
- However, decision trees require costly and complicated calculations as the number of class labels grows. Also, they cannot estimate missing data.
- Finally, decision trees are susceptible to overfitting, and they tend to be biased toward the values with more samples. For example, if you have 100 golf matches in your data sample, and 87 of them had low humidity, it could appear that low humidity causes the matches to be played.
6. Overcoming Disadvantages of Decision Trees
- Two ways to reduce the disadvantages of decision trees are pruning and ensemble learning.
- Pruning a tree means removing nodes and branches that yield little to no predictive power. We can consider these branches “edge cases” or “random noise”, so cutting them off reduces overfitting.
- Ensemble learning is a process of putting together multiple “weak” machine learning models to make one large, better-performing learning unit.
7. Random Forest
- A random forest is a specific type of ensemble learning built from decision trees.
- Each tree is built on a random subset of the data, and each tree only focuses on a random subset of features.
- Each individual tree (or “weak learner”) will make a prediction. The outcome with the most votes from the forest will become the final prediction.
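To show the three ideas together (random rows, random features, majority vote), here is a deliberately tiny pure-Python sketch. It is a simplification under my own assumptions: each "tree" is a one-question stump that looks at a single randomly chosen feature, and the data and function names are made up. For real work you would more likely reach for a library such as scikit-learn’s `RandomForestClassifier`.

```python
import random
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def train_stump(rows, feature, target):
    """A one-question 'tree': the majority label for each value of one feature."""
    groups = {}
    for r in rows:
        groups.setdefault(r[feature], []).append(r[target])
    return {
        "feature": feature,
        "answers": {value: majority(labels) for value, labels in groups.items()},
        # Fallback for a feature value this stump never saw in its sample.
        "default": majority([r[target] for r in rows]),
    }

def train_forest(rows, features, target, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # random rows, drawn with replacement
        feature = rng.choice(features)             # random feature subset (size 1 here)
        forest.append(train_stump(sample, feature, target))
    return forest

def predict_forest(forest, row):
    # Every stump votes; the most common vote is the forest's prediction.
    votes = [t["answers"].get(row[t["feature"]], t["default"]) for t in forest]
    return majority(votes)

# Hypothetical golf data.
rows = [
    {"outlook": "sunny", "humidity": "high", "played": "no"},
    {"outlook": "sunny", "humidity": "low", "played": "yes"},
    {"outlook": "sunny", "humidity": "low", "played": "yes"},
    {"outlook": "rainy", "humidity": "high", "played": "no"},
    {"outlook": "rainy", "humidity": "low", "played": "no"},
    {"outlook": "sunny", "humidity": "low", "played": "yes"},
]

forest = train_forest(rows, ["outlook", "humidity"], "played")
print(predict_forest(forest, {"outlook": "sunny", "humidity": "low"}))
```

Because every stump sees different rows and a different feature, their individual mistakes tend to differ, and the vote smooths them out.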
OK, that’s a basic rundown of decision trees and their stronger big sister, AKA the random forest. If you want to try implementing a random forest in Python, you can check out the full course, or you can find the notebook on the WWCode Data Science GitHub.
We’re halfway through our Introduction to Machine Learning courses, but WWCode Data Science still has three more weeks planned that will be filled with linear regression, logistic regression, model evaluation, and more. YOU DON’T WANNA MISS IT! Sign up for Parts 4-6 here. Hope to virtually see you there again Saturday!