# Introduction To Machine Learning By WWCode Data Science, Part 6 Recap

Women Who Code is a fantastic non-profit organization whose goal is to expose more women to tech-related careers. Their chapter Women Who Code Data Science is putting on a six-week introduction to machine learning course on Saturdays from 5/16/20 – 6/20/20. Each week, on the following Monday, I will be posting a recap of the course on my blog. Here’s the recap of Part 6; the full video is available on YouTube.

## Part 6 Focus: Model Evaluation

1. Model Evaluation

• Model evaluation compares the values predicted by the model to the actual values in the dataset to determine how well the model is performing.
• To evaluate a classification model, we use a confusion matrix.
• To evaluate a regression model, we use a loss function.

2. Confusion Matrix

• A confusion matrix represents, in tabular form, how well a classification model is doing by accounting for all possible combinations of actual vs. predicted values.
• Tallying the outcomes where the prediction matches the actual value and where it doesn’t reveals whether there are any commonly confused classes. (ex: what if the model keeps confusing apples and oranges?)
• True Positive: Correctly classify as positive (“Yes, this is an apple, and I’m right.”)
• False Positive: Incorrectly classify as positive (“Yes, this is an apple, but I’m wrong.”)
• True Negative: Correctly classify as negative (“No, this isn’t an apple, and I’m right.”)
• False Negative: Incorrectly classify as negative (“No, this isn’t an apple, but I’m wrong.”)
• There are many metrics derived from the confusion matrix. Some of the most popular include accuracy, precision, recall, F1-score, and the ROC curve.
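
The four outcomes and the derived metrics above can be sketched with scikit-learn (assumed installed); the labels and predictions here are made up for illustration:

```python
from sklearn.metrics import (confusion_matrix, accuracy_score,
                             precision_score, recall_score, f1_score)

# 1 = "apple" (positive class), 0 = "not an apple"
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))

print("accuracy: ", accuracy_score(y_true, y_pred))   # (TP + TN) / total
print("precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("recall:   ", recall_score(y_true, y_pred))     # TP / (TP + FN)
print("f1:       ", f1_score(y_true, y_pred))         # harmonic mean of precision and recall
```

With these toy labels the model has 3 true positives, 1 false negative, 1 false positive, and 3 true negatives, so all four metrics come out to 0.75.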

3. Class Imbalance

• Very few datasets are balanced in real life.
• This can bias models toward the majority class, and it relates to the bias-variance trade-off.
• For example, imagine trying to develop a model that predicts the presence of a rare disease that affects 0.3% of the population; a model that always predicts “no disease” would be 99.7% accurate yet useless.
• We can handle class imbalance with appropriate metric selection, intentional sampling, and assigning a higher cost to errors where the stakes are higher.
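
As a minimal sketch of two of those remedies, assuming scikit-learn: score the rare class with recall instead of accuracy, and raise the cost of misclassifying it via `class_weight`. The dataset here is synthetic, not the disease example:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic dataset where the positive class is rare (~3% of samples).
X, y = make_classification(n_samples=2000, weights=[0.97, 0.03], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" raises the penalty for misclassifying the rare class.
plain = LogisticRegression().fit(X_train, y_train)
weighted = LogisticRegression(class_weight="balanced").fit(X_train, y_train)

# Recall on the rare class is a more honest metric here than accuracy.
plain_recall = recall_score(y_test, plain.predict(X_test))
weighted_recall = recall_score(y_test, weighted.predict(X_test))
print("plain recall:   ", plain_recall)
print("weighted recall:", weighted_recall)
```

On a dataset this skewed, the weighted model typically catches far more of the rare positives, at the price of more false positives.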

4. Loss Function

• A loss function determines how well a model is performing by calculating the distance (or deviation) of the predicted value from the actual value.
• Commonly used loss functions are mean absolute error, mean squared error, root mean squared error, R-squared, and adjusted R-squared.
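
The metrics above can be computed directly with NumPy on made-up actual vs. predicted values:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

errors = y_true - y_pred
mae = np.mean(np.abs(errors))   # mean absolute error
mse = np.mean(errors ** 2)      # mean squared error
rmse = np.sqrt(mse)             # root mean squared error

# R-squared: fraction of the variance in y_true explained by the model.
r2 = 1 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(mae, mse, rmse, r2)
```

Note that MSE punishes large deviations much more heavily than MAE, while RMSE brings the units back to those of the target variable.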

5. Cross Validation

• Cross validation tests the model’s ability to predict values for data that it has not seen before.
• This will help identify overfitting or sampling bias.
• Exhaustive cross validation: tests all possible ways to divide the dataset (e.g., leave-p-out, which validates on every possible set of p held-out observations).
• Non-exhaustive cross validation: tests only a subset of ways to divide the dataset (e.g., k-fold, which splits the data into k folds and holds each one out once).
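
A minimal sketch of non-exhaustive k-fold cross validation with scikit-learn, using the built-in iris dataset as stand-in data:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# cv=5: each of 5 folds is held out once while the model trains on the rest.
scores = cross_val_score(model, X, y, cv=5)
print(scores)         # one accuracy score per held-out fold
print(scores.mean())  # overall estimate of performance on unseen data
```

A large spread between fold scores is one sign of overfitting or sampling bias; the mean gives a steadier estimate than a single train/test split.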

WOW! What an amazing six weeks learning the fundamentals of machine learning. I hope that you learned as much as I did on this journey. If you haven’t checked out the previous five weeks of this course, I highly recommend you check them out. I’ll even give you all the links below 🙂