David's blog

Err and err and err but less and less and less

Uncategorized

Things I wish they taught in school

Before he became Spider-Man in the 1960s, Peter Parker was a chemistry and physics genius, an expert photographer, and–get this–even knew how to tie a tie. But he was also a shy, nerdy high-school student who couldn’t have been more than 16-18 years old: Wait, a high school student going around in a suit and […]

How to set up a reverse SSH tunnel with Amazon Web Services

When the startup shut down there were still dozens of netbooks out there in the wild collecting data on the residential houses fitted with our adaptive heating control algorithms, hopelessly attempting to connect to our VPN server that didn’t exist anymore in order to upload all that data to our now-defunct database. That’s a lot […]

Working with that data scientist

In my current team we have decided to split the work into a number of workstreams, which are in effect subteams responsible for different aspects of the product. One workstream might be responsible for product instrumentation, another for improving the recommendation algorithms, another for the application’s look and feel. Each workstream has its […]

Getting into data science

A while back I had the pleasure of addressing a team of user experience researchers at YouTube, and I was asked for a few resources that could help someone pretty good at science, math, and programming who wanted to get into data science. Here’s the list I gave. These have worked for me in the […]

The law of total probability applied to a conditional probability

Dear future self, I’ve just lost (again) about half an hour of my life trying to find a vaguely remembered formula that generalizes the law of total probability to the case of conditional probabilities. Here it is. You’re welcome. The law of total probability says that if you can decompose the set of possible events […]

Quick note about bootstrapping

Cross-validation—the act of keeping a subset of data to measure the performance of a model trained on the rest of the data—never sounded right to me. It just doesn’t feel optimal to retain an arbitrary fraction of the data when you train your model. Oh and then you’re also supposed to keep another fraction for […]

The most under-rated programming books

Ask any programmer what their favourite programming book is, and their answer will be one of the usual suspects: Code Complete, The Pragmatic Programmer, or Design Patterns. And rightly so; these are outstanding and highly regarded works that belong on every programmer’s bookshelf. (If you’re just starting out building up your bookshelf, Jeff Atwood has some […]

Scraping real estate for fun

Here’s a fun weekend project: scrape the real estate classifieds of the website of your choice, and do some analytics on the data. I did just that last weekend, using the Scrapy Python library for web scraping, which I then let loose on one of the major real estate classifieds websites in Switzerland (can’t tell […]
