The most under-rated programming books

Ask any programmer what their favourite programming book is, and the answer will be one of the usual suspects: Code Complete, The Pragmatic Programmer, or Design Patterns. And rightly so; these are outstanding and highly regarded works that belong on every programmer’s bookshelf. (If you’re just starting to build up your bookshelf, Jeff Atwood has some great recommendations.)

But once you get past the “essential” books, you’ll find many excellent programming books that people rarely talk about. Several of them were essential in taking me to the next level of my professional growth.

Here’s a partial list of such books. I’m sure there are many others, so feel free to mention them in the comments.

Growing Object-Oriented Software, Guided by Tests


Feature standardization considered harmful

Many statistical learning algorithms perform better when the covariates are on similar scales. For example, it is common practice to standardize the features used by an artificial neural network so that the gradient of its objective function doesn’t depend on the physical units in which the features are described.
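As a rough illustration of what standardizing features means here, the sketch below rescales each column to zero mean and unit variance. The function name `standardize` and the sample data (height in metres, income in dollars) are made up for this example and do not come from any particular library.

```python
import numpy as np

def standardize(X):
    """Rescale each column to zero mean and unit variance (z-scores)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unchanged instead of dividing by zero
    return (X - mean) / std

# Hypothetical features on very different scales: height (m) and income ($).
X = np.array([[1.60, 30_000.0],
              [1.75, 52_000.0],
              [1.82, 41_000.0],
              [1.68, 75_000.0]])
Z = standardize(X)
```

After this transformation both columns of `Z` have the same spread, so neither feature dominates a gradient or a distance computation merely because of its units.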

The same advice is frequently given for K-means clustering (see Do Clustering algorithms need feature scaling in the pre-processing stage?, Are mean normalization and feature scaling needed for k-means clustering?, and In cluster analysis should I scale (standardize) my data if variables are in the same units?), but there’s a great counter-example given in The Elements of Statistical Learning that I try to reproduce here.

Consider two point clouds ($n=50$ each), randomly drawn around two centers, each 3 units away from the origin:
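One way such data could be simulated is sketched below. The text only says that each cloud sits 3 units from the origin, so the placement of the centers on the x-axis at (-3, 0) and (3, 0), the unit-variance Gaussian noise, and the random seed are all assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily for reproducibility
n = 50                          # points per cloud, as stated in the text

# Assumed layout: both centers lie on the x-axis, 3 units from the origin.
centers = np.array([[-3.0, 0.0],
                    [3.0, 0.0]])

# Assumed noise model: isotropic Gaussian with unit variance around each center.
clouds = [center + rng.standard_normal((n, 2)) for center in centers]

X = np.vstack(clouds)            # shape (100, 2): all points stacked together
labels = np.repeat([0, 1], n)    # true cloud membership of each row of X
```

Under these assumptions the two groups are separated along the x-axis only, which makes the x-coordinate much more variable than the y-coordinate across the pooled data.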
