Introduction To Machine Learning By WWCode Data Science, Part 6 Recap

Source: https://opencollective.com/wwcode

Women Who Code is a fantastic non-profit organization whose goal is to expose more women to tech-related careers. Their chapter Women Who Code Data Science is putting on a 6-week introduction to machine learning course on Saturdays from 5/16/20 – 6/20/20. Each week, on the following Monday, I will be posting a recap of the course on my blog. Here’s the recap of Part 6; the full video is available on YouTube.

Part 6 Focus: Model Evaluation

1. Model Evaluation

  • Model evaluation is comparing the values that were predicted by the model to the actual values in the dataset to determine how well the model is performing.
  • To evaluate a classification model, we use a confusion matrix.
  • To evaluate a regression model, we use a loss function.

2. Confusion Matrix

  • A confusion matrix represents, in tabular form, how well a classification model is doing by accounting for all possible combinations of actual vs. predicted values.
  • It tallies the outcomes where the prediction matches the actual value and where it doesn’t – revealing whether there are any commonly confused classes. (ex: what if the model keeps confusing apples and oranges?)
Photo Courtesy of WWCode Data Science
  • True Positive: Correctly classify as positive (“Yes, this is an apple, and I’m right.”)
  • False Positive: Incorrectly classify as positive (“Yes, this is an apple, but I’m wrong.”)
  • True Negative: Correctly classify as negative (“No, this isn’t an apple, and I’m right.”)
  • False Negative: Incorrectly classify as negative (“No, this isn’t an apple, but I’m wrong.”)
  • There are many metrics derived from the confusion matrix. Some of the most popular include accuracy, precision, recall, F1-score, and the ROC curve.
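To make these metrics concrete, here’s a minimal sketch that counts the four confusion-matrix cells by hand and computes accuracy, precision, recall, and F1 for the apple example. The labels are hypothetical toy data; in practice a library like scikit-learn would compute all of this for you.

```python
# Toy binary labels: 1 = apple, 0 = not apple (hypothetical data)
actual    = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]

# Count each cell of the confusion matrix
tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)  # true positives
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # false positives
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)  # true negatives
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # false negatives

accuracy  = (tp + tn) / (tp + tn + fp + fn)   # fraction of all predictions that are right
precision = tp / (tp + fp)                    # of predicted apples, how many really are
recall    = tp / (tp + fn)                    # of actual apples, how many we found
f1        = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(tp, fp, tn, fn)                  # 3 1 3 1
print(accuracy, precision, recall, f1) # 0.75 0.75 0.75 0.75
```

Notice that precision and recall answer different questions about the same matrix, which is why no single number tells the whole story.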

3. Class Imbalance

  • Very few datasets are balanced in real life.
  • This can lead models to be biased toward the majority class, and relates to the bias-variance trade-off.
  • For example, imagine trying to develop a model that predicts the presence of a rare disease that affects 0.3% of the population.
  • We can handle class imbalance with appropriate metric selection, intentional sampling, and introducing cost in the model where the stakes are higher.
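The rare-disease example shows why metric selection matters. Below is a small sketch (with made-up numbers matching the 0.3% prevalence) of a naive model that always predicts “healthy”: its accuracy looks excellent, but its recall exposes that it never finds a single sick patient.

```python
# Hypothetical imbalanced dataset: 3 sick patients out of 1000 (0.3%)
actual    = [1] * 3 + [0] * 997   # 1 = has disease, 0 = healthy
predicted = [0] * 1000            # naive model: always predicts "healthy"

# Accuracy rewards the model for the overwhelming majority class
accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)

# Recall asks: of the patients who are actually sick, how many did we catch?
tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))
recall = tp / (tp + fn)

print(accuracy)  # 0.997 -- looks great on paper
print(recall)    # 0.0   -- but the model never catches a sick patient
```

This is exactly the situation where intentional sampling or cost-sensitive training (weighting errors on the rare class more heavily) earns its keep.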

4. Loss Function

  • A loss function determines how well a model is performing by calculating the distance (or deviation) of the predicted value from the actual value.
  • Commonly used loss functions are mean absolute error, mean squared error, root mean squared error, R-squared, and adjusted R-squared.
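The loss functions listed above are simple enough to compute by hand. Here’s a sketch on hypothetical regression predictions, showing MAE, MSE, RMSE, and R-squared (adjusted R-squared additionally penalizes the number of predictors, which is omitted here).

```python
import math

# Hypothetical actual vs. predicted values from a regression model
actual    = [3.0, 5.0, 2.0, 7.0]
predicted = [2.5, 5.0, 3.0, 6.0]
n = len(actual)

# Mean absolute error: average distance, all errors weighted equally
mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / n

# Mean squared error: squaring punishes large deviations more
mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n

# Root mean squared error: back in the units of the target variable
rmse = math.sqrt(mse)

# R-squared: fraction of the target's variance the model explains
mean_actual = sum(actual) / n
ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
ss_tot = sum((a - mean_actual) ** 2 for a in actual)
r2 = 1 - ss_res / ss_tot

print(mae, mse, rmse)  # 0.625 0.5625 0.75
print(round(r2, 4))    # 0.8475
```

Note how MSE and RMSE penalize the two 1.0-unit misses more heavily than MAE does; that sensitivity to outliers is often the deciding factor when choosing between them.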

5. Cross Validation

  • Cross validation tests the model’s ability to predict values for data that it has not seen before.
  • This will help identify overfitting or sampling bias.
  • Exhaustive cross validation: tests every possible way to divide the dataset into training and test sets (e.g., leave-p-out).
  • Non-exhaustive cross validation: tests only a subset of the possible splits (e.g., k-fold).
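A quick sketch of the non-exhaustive idea: the hand-rolled k-fold splitter below (a hypothetical helper, not from any library) divides the sample indices into k folds, holding out a different fold as the test set each time. Libraries like scikit-learn provide this ready-made as KFold.

```python
# Minimal k-fold index splitter (assumes n_samples is divisible by k
# for simplicity; real implementations handle remainders and shuffling)
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    fold_size = n_samples // k
    for i in range(k):
        test = indices[i * fold_size:(i + 1) * fold_size]
        train = indices[:i * fold_size] + indices[(i + 1) * fold_size:]
        yield train, test

folds = list(k_fold_indices(10, 5))
for train, test in folds:
    print(test)  # each sample lands in exactly one test fold
```

Because every sample is used for testing exactly once, the averaged score across folds estimates how the model will perform on data it has never seen, which is exactly what flags overfitting.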

WOW! What an amazing 6 weeks learning the fundamentals of machine learning. I hope that you learned as much as I did on this journey. If you haven’t checked out the previous five weeks of this course, I highly recommend you check them out. I’ll even give you all the links below 🙂

  1. Basics of Supervised Learning
  2. Classification
  3. Classification (continued)
  4. Linear Regression
  5. Logistic Regression
