ISLP selected readings
Dive deeper into the content we’ll be covering. If anything is unclear, don’t hesitate to reach out to your instructor. Aim to grasp the big-picture ideas so you can explain each topic to a non-technical audience.
More Models
Naive Bayes (NB), section 2.2.3, is used in text classification and spam detection, and it works well on large datasets. The “naive” part comes from assuming the features are independent of (unrelated to) each other, which rarely holds in practice. It assigns probabilities to features (usually words) that can indicate the thing you’re looking for.
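Here’s a minimal sketch of the spam-detection idea (assuming scikit-learn is installed; the messages and labels are made up for illustration, not from the book):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up messages and labels (1 = spam, 0 = not spam)
texts = ["win a free prize now", "meeting moved to 3pm", "free cash offer", "lunch tomorrow?"]
labels = [1, 0, 1, 0]

vec = CountVectorizer()          # features = counts of each word
X = vec.fit_transform(texts)

model = MultinomialNB()          # treats words as independent clues for each class
model.fit(X, labels)

# Words like "free" and "prize" push the probability toward spam
print(model.predict(vec.transform(["free prize inside"])))
```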
k Nearest Neighbors (KNN), section 2.2.3 and section 3.5, is used in pattern recognition and recommender systems and handles non-linear boundaries well, but it is not great on large datasets. We call it a “lazy” learner because it doesn’t learn or build a model until it’s asked to make a prediction on new data.
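A minimal sketch of the “lazy” behavior (again assuming scikit-learn; the 2D points are made up):

```python
from sklearn.neighbors import KNeighborsClassifier

# Two small clusters of made-up 2D points
X = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
y = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3)  # look at the 3 closest training points
knn.fit(X, y)                              # "lazy": this mostly just stores the data

# Each new point is classified by a majority vote of its nearest neighbors
print(knn.predict([[2, 2], [8, 7]]))
```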
Logistic regression, section 4.3, is used in binary classification and outputs probabilities between 0 and 1, but it needs roughly linear boundaries. It uses a function (usually the sigmoid) to “squish” the model’s raw linear output into a probability between 0 and 1.
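A minimal sketch with made-up pass/fail data (hours studied vs. passing an exam), assuming scikit-learn:

```python
from sklearn.linear_model import LogisticRegression

# Made-up data: hours studied -> passed (1) or failed (0)
X = [[1], [2], [3], [4], [5], [6]]
y = [0, 0, 0, 1, 1, 1]

model = LogisticRegression()
model.fit(X, y)

# The sigmoid squashes the linear score into a probability between 0 and 1
print(model.predict_proba([[2.5], [5.5]]))  # class probabilities
print(model.predict([[2.5], [5.5]]))        # hard 0/1 predictions
```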
Support Vector Machine (SVM), sections 9.1-9.3, is used in a ton of ways and is good because it can handle both linear and non-linear boundaries. It uses the “kernel trick” to essentially project data into a higher dimension so it has an easier time separating groups. Sort of like how it’s easier to see a person when you’re facing them head-on versus looking down at them from a bird’s-eye view.
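A minimal sketch of the kernel idea (assuming scikit-learn; the 1D data is made up so that no single straight cut separates the classes, but an RBF kernel can):

```python
from sklearn.svm import SVC

# Made-up 1D data: the class depends on distance from 0, so it is not linearly separable
X = [[-3], [-2], [-1], [0], [1], [2], [3]]
y = [1, 1, 0, 0, 0, 1, 1]

# The RBF kernel implicitly maps points into a higher-dimensional space
# where a straight boundary can separate the two groups
svm = SVC(kernel="rbf")
svm.fit(X, y)

print(svm.predict([[-2.5], [0.5], [2.5]]))  # predictions for new points
```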
Principal Component Analysis (PCA), section 12.2, can simplify data by finding the most important directions (combinations of features) that capture the most information. (Admittedly a rough analogy) Kind of like how, if someone dug around in your trash, they would learn a lot about you (what food you like, how often you clean, which items you use frequently).
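A minimal sketch (assuming scikit-learn and NumPy; the data is randomly generated so that one feature is nearly a copy of another, meaning fewer directions carry most of the information):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
# Third feature is almost a duplicate of the first, so the data is really ~2-dimensional
X = np.column_stack([x1, x2, x1 + 0.01 * rng.normal(size=100)])

pca = PCA(n_components=2)          # keep the 2 most informative directions
X_reduced = pca.fit_transform(X)

# Fraction of the total variation captured by each kept direction
print(pca.explained_variance_ratio_)
```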
You can read more about ensemble models in section 8.2 of our TREE readings.