Approaching (almost) any Machine Learning problem

This post was originally published by Matthew Mayo at KDnuggets

There are a growing number of works out there addressing how to approach machine learning problems, many of them quite good. But how many of them are written by a 4x Kaggle Grandmaster?

Abhishek Thakur, the 4x Grandmaster in question, who now works on NLP at Hugging Face, wrote and released his book Approaching (Almost) Any Machine Learning Problem (AAAMLP) last year. The book can be purchased through Amazon at a very reasonable price, more reasonable than that of most other books with similar content.

Abhishek has also recently released the entire book online for free, available as a PDF via its GitHub repo.


There are two details you need to know about AAAMLP right off the bat, directly from the author:

This is not a traditional book.

The book expects you to have basic knowledge of machine learning and deep learning.

The second point is easily explained: this is definitely not a “start from zero” machine learning book. If you aren’t already comfortable with basic machine learning concepts and Python programming, you need to look elsewhere for learning materials.

The first point can be explained in part by the fact that AAAMLP is designed as a code-along book, which is why its code is not shared separately on its GitHub repo. If you want to understand and implement the suggested approaches, you have to read and type out the code as you go, which better ensures that you understand what you are doing. This is not like most other books.

Also, the latter half of the book focuses on comprehensive approaches to solving particular machine learning problem types, with chapters appropriately dedicated to these different types. This is also, generally speaking, not like most other books.

Coming in at right around 300 pages, the book covers a lot of ground. See the full table of contents below for more information:

  1. Setting up your working environment
  2. Supervised vs unsupervised learning
  3. Cross-validation
  4. Evaluation metrics
  5. Arranging machine learning projects
  6. Approaching categorical variables
  7. Feature engineering
  8. Feature selection
  9. Hyperparameter optimization
  10. Approaching image classification & segmentation
  11. Approaching text classification/regression
  12. Approaching ensembling and stacking
  13. Approaching reproducible code & model serving

You can see that some time is spent early on with more basic concepts, but even these aren’t your typical book-based explanations. You get right into their practical usefulness as well as how to code the concepts for your own projects.

Once the vital foundational topic chapters of supervised vs unsupervised learning, cross-validation and evaluation metrics are out of the way, the task of project-centric approaches to arranging your code and other files is taken up, followed by several more practical data pre-processing and preparation chapters. After this, practical hyperparameter optimization is covered.

Once these chapters are finished, the book tackles more comprehensive treatments of specific problem types, including image classification, text classification, ensembling, and reproducibility and model serving. It is in these chapters that the book really shines, moving from being useful in a practical sense for what you might be trying to accomplish, to being a great guidebook to what you can accomplish by following Thakur’s map.

A great companion piece to this book is Thakur’s YouTube channel. Besides some great discussions and input from others, Thakur has video coding walkthroughs of a number of the concepts that have chapters dedicated to them in the book. This makes for a great way to reinforce these topics and gain additional insight into them. I also highly recommend his channel as a resource in its own right, beyond its role as a companion to the book.

If you enjoy the book, you should consider supporting Thakur’s work by purchasing a copy of AAAMLP, as both the Kindle and paperback editions are very reasonably priced. Regardless, Thakur asks that anyone who reads the book consider leaving a review on Google, Amazon, or Goodreads.

AAAMLP is a unique book, blending practical “how to” with demonstrative “could do” and serving as a great guidebook to anyone looking to get serious about implementing machine learning solutions. I highly recommend both the book and author Abhishek Thakur’s YouTube channel.
