*Women Who Code is a fantastic non-profit organization whose goal is to expose more women to tech-related careers. Their chapter, Women Who Code Data Science, is putting on a 6-week introduction to machine learning course on Saturdays from 5/16/20 – 6/20/20. Each week, on the following Monday, I will be posting a recap of the course on my blog. Here's the recap of Part 6; the full video is available on YouTube.*

## Part 6 Focus: Model Evaluation

1. Model Evaluation

- **Model evaluation** is comparing the values predicted by the model to the actual values in the dataset to determine how well the model is performing.
- To evaluate a classification model, we use a confusion matrix.
- To evaluate a regression model, we use a loss function.

2. Confusion Matrix

- A **confusion matrix** represents, in tabular form, how well a classification model is doing by accounting for all possible combinations of actual vs. predicted values.
- It weighs the outcomes where the prediction matches the actual value against those where it doesn't, to see if there are any commonly *confused* classes (ex: what if the model keeps confusing apples and oranges?).

- True Positive: Correctly classify as positive (“Yes, this is an apple, and I’m right.”)
- False Positive: Incorrectly classify as positive (“Yes, this is an apple, but I’m wrong.”)
- True Negative: Correctly classify as negative (“No, this isn’t an apple, and I’m right.”)
- False Negative: Incorrectly classify as negative (“No, this isn’t an apple, but I’m wrong.”)
- There are many metrics derived from the confusion matrix. Some of the most popular include accuracy, precision, recall, F1-score, and the ROC curve.
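
As a minimal sketch, these metrics can be computed directly from the four confusion-matrix counts. The apple/not-apple labels below are made up for illustration:

```python
# Hypothetical binary labels: 1 = "apple", 0 = "not apple".
actual    = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]

# Count each cell of the 2x2 confusion matrix.
tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

accuracy  = (tp + tn) / len(actual)  # all correct / all predictions
precision = tp / (tp + fp)           # of predicted apples, how many really were
recall    = tp / (tp + fn)           # of actual apples, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
```

In practice a library such as scikit-learn computes these for you; the point here is just that every metric is a different summary of the same four counts.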

3. Class Imbalance

- Very few datasets are balanced in real life.
- This can lead models to be biased toward the majority class.
- For example, imagine trying to develop a model that predicts the presence of a rare disease that affects 0.3% of the population.
- We can handle **class imbalance** with appropriate metric selection, intentional sampling, and by introducing higher costs in the model where the stakes are higher.
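
One intentional-sampling fix is naive random oversampling: duplicate minority-class rows until the classes are roughly the same size. A minimal sketch with a made-up 97/3 split (the `patient_` names are purely hypothetical):

```python
# Hypothetical imbalanced dataset: label 1 is the rare "disease present" class.
data = [(f"patient_{i}", 0) for i in range(97)] + \
       [(f"patient_{i}", 1) for i in range(97, 100)]

minority = [row for row in data if row[1] == 1]
majority = [row for row in data if row[1] == 0]

# Duplicate minority rows until the classes are (roughly) balanced.
# Real pipelines would shuffle, or use smarter techniques like SMOTE,
# rather than exact duplicates.
oversampled = minority * (len(majority) // len(minority))
balanced = majority + oversampled
```

Note why metric selection matters here too: a model that always predicts "no disease" on the original data would score 97% accuracy while catching zero cases, which is exactly why recall is often the metric of choice for rare classes.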

4. Loss Function

- A **loss function** determines how well a model is performing by calculating the distance (or deviation) of the predicted value from the actual value.
- Commonly used loss functions and related metrics are mean absolute error, mean squared error, root mean squared error, R-squared, and adjusted R-squared.
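
A minimal sketch of the first three, on made-up regression values:

```python
import math

# Hypothetical actual vs. predicted values from a regression model.
actual    = [3.0, 5.0, 2.0, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]

errors = [p - a for a, p in zip(actual, predicted)]

mae  = sum(abs(e) for e in errors) / len(errors)  # mean absolute error
mse  = sum(e * e for e in errors) / len(errors)   # mean squared error
rmse = math.sqrt(mse)                             # root mean squared error
```

Because MSE squares each deviation, it punishes large misses much more heavily than MAE does; RMSE brings that penalty back into the same units as the target.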

5. Cross Validation

- **Cross validation** tests the model's ability to predict values for data that it has not seen before.
- This helps identify overfitting or sampling bias.
- Exhaustive cross validation (e.g., leave-p-out): tests all the possible ways to divide the dataset.
- Non-exhaustive cross validation (e.g., k-fold): tests only a subset of the ways to divide the dataset.
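
As a sketch of the non-exhaustive (k-fold) idea, this hypothetical helper splits n examples into k folds, and each fold takes one turn as the unseen test set:

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k roughly equal, contiguous folds."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)  # spread any remainder
        folds.append(list(range(start, start + size)))
        start += size
    return folds

# Each fold is held out once; the model trains on everything else.
n = 10
for test_idx in k_fold_indices(n, k=5):
    train_idx = [j for j in range(n) if j not in test_idx]
    # ... fit the model on train_idx, evaluate on test_idx ...
```

Averaging the evaluation metric over all k rounds gives a more honest performance estimate than a single train/test split, since every example gets a turn as unseen data.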

WOW! What an amazing 6 weeks learning the fundamentals of machine learning. I hope that you learned as much as I did on this journey. If you haven't checked out the previous five weeks of this course, I highly recommend you check them out. I'll even give you all the links below!
