How to set up a reverse SSH tunnel with Amazon Web Services

When the startup shut down, there were still dozens of netbooks out in the wild, collecting data on the residential houses fitted with our adaptive heating control algorithms and hopelessly trying to connect to a VPN server that no longer existed, in order to upload all that data to our now-defunct database. That’s a lot of data, sitting and growing on a lot of internet-connected devices.

Some of us came together and figured it might be possible to resume collecting that data and showcase the benefits of having our system installed in your house. The first problem was: how do we connect to these netbooks? And at near-zero cost?

Warning: hacks ahead.

We figured that step one would be to establish a reverse SSH tunnel to each of these netbooks. A reverse SSH tunnel is set up when an otherwise-inaccessible device (in our case, a netbook) connects to a publicly available SSH server, opens a port on that server, and forwards (“tunnels”) all incoming connections to that port back to the device. Short of setting up a proper VPN, this is the best way to connect to a device that isn’t exposed to the public internet.

To set up a reverse SSH tunnel you first need a publicly available machine running an SSH server that will accept reverse tunnels. The good news is that anyone can have one by signing up to Amazon Web Services (AWS) and going to the Elastic Compute Cloud (EC2) service:

Next you want to launch an instance:

You really want the smallest, freest possible machine here that runs Linux:

Make sure you have generated a key pair for this instance (and that you have saved the private key!) and that the machine accepts SSH from anywhere:

When you set up an SSH tunnel you will also need to make sure the EC2 instance accepts traffic on the ports that the tunnel will open. The port numbers are up to you; I created two tunnels, one on port 7030 and one on port 7040, so navigate to the settings of your instance’s security group and make sure the instance accepts TCP traffic on those ports:
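If you prefer the command line to the AWS console, the same thing can be done with the AWS CLI. This is just a sketch; the security group ID below is a placeholder for your own group’s ID:

# Open the tunnel ports on the instance's security group.
aws ec2 authorize-security-group-ingress --group-id <security-group-id> --protocol tcp --port 7030 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-id <security-group-id> --protocol tcp --port 7040 --cidr 0.0.0.0/0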

That’s all on the server side. On the netbook side you need to do three things: 1) get the private key, 2) change the file permissions on the key, 3) establish the tunnel.

Getting the private key onto the netbook is entirely up to you. What I did, which is absolutely not recommended, was to place the private key neurobat.pem on the same web server that hosts this blog. I could then fetch the key with

wget --no-check-certificate davidlindelof.com/<path-to-key>

(Notice the --no-check-certificate argument. Those netbooks are hopelessly out of date and can no longer validate HTTPS certificates.)

Next you need to set the right permissions on the key, or SSH will refuse to use it:

chmod 400 <path-to-key>

And finally you can set up the tunnel, say on port 7000. Here -f sends SSH to the background, -N skips running a remote command, and -R :7000:localhost:22 asks the server to listen on port 7000 and forward any connection to it back to the device’s own SSH daemon on port 22:

ssh -i <path-to-key> -fN -R :7000:localhost:22 ec2-user@<ec2-ip-address>

If all went well you’ll now be able to ssh into the remote device by sshing to your EC2 instance on port 7000:

ssh <username-on-device>@<ec2-ip-address> -p 7000
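One hedged caveat I should add: by default OpenSSH binds remote-forwarded ports to the server’s loopback interface only, so if the command above cannot connect from your own machine, you may need to enable GatewayPorts on the EC2 instance (the service name below assumes a systemd-based distribution such as Amazon Linux 2):

# On the EC2 instance: let remote-forwarded ports bind on all interfaces,
# not just localhost, then restart the SSH daemon.
echo "GatewayPorts yes" | sudo tee -a /etc/ssh/sshd_config
sudo systemctl restart sshd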

As an extra precaution you might also want to look into using the autossh program, which can detect connection drops and attempt to reconnect.
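I haven’t shown it here, but a minimal sketch of what that might look like, reusing the same key and port, follows; -M 0 turns off autossh’s separate monitoring port and relies on SSH’s own keep-alive options instead:

# Same tunnel as above, but supervised by autossh so it reconnects after drops.
autossh -M 0 -o "ServerAliveInterval 30" -o "ServerAliveCountMax 3" \
  -i <path-to-key> -fN -R :7000:localhost:22 ec2-user@<ec2-ip-address>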

Clunky? Sure. Hacky? You bet. Brittle? Oh my god. But it did the job and I can now work on doing things the “right” way, i.e. setting up a proper VPN solution, probably based on OpenVPN or something.

Deep silence or deep work

It’s Monday afternoon. It’s a holiday, but I have a couple of things from last week to catch up on. The rest of the family is either at holiday camp or taking a nap in the bedroom. I’m working from home. But the home is anything but silent.

I can hear the girls’ muffled chatting; from the sound of it they’re making up some story with their dolls. The village church bell just tolled a single note for the quarter past the hour. My phone’s notification just dinged, and in a rare moment of self-discipline I don’t pick it up. Some birds are chirping outside. The convection oven in the kitchen has had a malfunction for years and emits a beep every 10 seconds that I have learned to ignore. Occasionally a plane comes in overhead to land at Geneva’s airport; there’s only one runway, and depending on the direction of the wind, planes come in from the direction of our village. And on top of it all I hear some kind of very soft background whine; I usually don’t notice it, but it’s definitely there and I don’t know whether it comes from outside of me or from inside my head.

That’s a lot of noise. These are also the best working conditions I’ve ever experienced. Today I’ve chosen to deliberately notice all these sounds, and now I cannot unhear them.

Then there are the visual distractions. For the past three years I’ve been working from a corner of the living room, the rest of which fills my field of view, along with part of the kitchen.

These working conditions sound bad but they can be fixed. I usually set a screen between me and the rest of the living room, and almost always do my deep focus work wearing noise-canceling over-the-ear headphones, playing focus-friendly music. My family knows that when daddy wears the headphones, he is not to be disturbed unless there’s blood or fire. It mostly works.

Like many others, I used to work in an open-space office. Noise-wise and visual-distraction-wise, open-space offices are possibly better than working from home. On more than one occasion, visitors from abroad have been impressed by the museum-grade silence filling a Swiss open-space office. But open-space offices offer a richer set of options for not concentrating on your deep work. Entire days can go by: interruptions from colleagues, walks to the cafeteria, neighboring conversations to listen in on, more meetings than you should attend because you fear missing out. And the siren song of office perks, of course.

The choice is between perfect quiet filled with distractions, or constant information-free background sounds that you can learn to ignore with monk-like focus. I’ve tried it all and I know what works for me. Do you?

Is The Ratio of Normal Variables Normal?

In Trustworthy Online Controlled Experiments I came across this quote, referring to a ratio metric $M = \frac{X}{Y}$:

Because $X$ and $Y$ are jointly bivariate normal in the limit, $M$, as the ratio of the two averages, is also normally distributed.

That’s only partially true. According to https://en.wikipedia.org/wiki/Ratio_distribution, the ratio of two uncorrelated noncentral normal variables $X = N(\mu_X, \sigma_X^2)$ and $Y = N(\mu_Y, \sigma_Y^2)$ is approximately normal, with mean $\mu_X / \mu_Y$ and variance approximately $\frac{\mu_X^2}{\mu_Y^2}\left( \frac{\sigma_X^2}{\mu_X^2} + \frac{\sigma_Y^2}{\mu_Y^2} \right)$ — but only when $Y$ is unlikely to take negative values, say $\mu_Y > 3 \sigma_Y$.

As always, the best way to believe something is to see it yourself. Let’s generate some uncorrelated normal variables far from 0 and their ratio:

ux <- 100
sdx <- 2
uy <- 50
sdy <- 0.5

X <- rnorm(1000, mean = ux, sd = sdx)
Y <- rnorm(1000, mean = uy, sd = sdy)
Z <- X / Y

Their ratio looks normal enough:

hist(Z)

Which is confirmed by a q-q plot:

qqnorm(Z)
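If you’d rather not rely on eyeballing alone, a formal normality test is a cheap complement (this wasn’t part of the original check; shapiro.test ships with base R and accepts samples of up to 5000 values):

shapiro.test(Z)  # null hypothesis: Z was drawn from a normal distribution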

What about the mean and variance?

mean(Z)
[1] 1.998794
ux / uy
[1] 2
var(Z)
[1] 0.001783404
ux^2 / uy^2 * (sdx^2 / ux^2 + sdy^2 / uy^2)
[1] 0.002

Both the mean and variance are very close to their theoretical values.

But what happens now when the denominator $Y$ has a mean close to 0?

ux <- 100
sdx <- 2
uy <- 10
sdy <- 2

X <- rnorm(1000, mean = ux, sd = sdx)
Y <- rnorm(1000, mean = uy, sd = sdy)
Z <- X / Y

Hard to call the resulting ratio normally distributed:

hist(Z)

Which is also clear with a q-q plot:

qqnorm(Z)

In other words, ratio metrics whose denominator is far from 0 will generally be close enough to a normal distribution for practical purposes. But when the denominator’s mean is only a few sigmas from 0 (five, in this example), that assumption breaks down.

Working with that data scientist

In my current team we have decided to split the work into a number of workstreams, which are in effect subteams responsible for different aspects of the product. One workstream might be responsible for product instrumentation, another for improving the recommendation algorithms, another for the application’s look and feel. Each workstream has its own backlog and its own set of quarterly commitments, which map nicely onto quarterly OKRs.

Workstreams aren’t necessarily disjoint: the same person might contribute to more than one workstream. Indeed, for specialists (UX researchers, UI specialists, data science), that is almost the norm. As an aspiring data scientist myself, I contribute to several workstreams; I may entirely own a key result assigned to one workstream, or provide input (e.g. statistical advice or experiment sizing) to another.

We don’t do daily standups, not even among the software engineers. Instead we meet twice weekly for 30 minutes and review the current plans, update the board, and make sure no one is blocked.

We adopted this process early this year. The response from the team has been generally positive. Compared to a more traditional front-end vs back-end division of labour, the team has cited the following benefits:

  • tighter team cohesion
  • better understanding of what the others are working on
  • more productive team meetings
  • a greater sense of accomplishment

The main drawback with this system affects those of us in a more specialized role, such as UI, UX, or Data Science, who contribute to more than one workstream. We find ourselves compelled to attend the semi-weekly meetings of all the workstreams we are involved with, and never know which ones we can safely skip. On top of this I also have a weekly Data Science sync with the product manager.

At a recent retrospective we agreed to mitigate these issues as follows:

  • notes should be taken at all meetings, and the note-taker should remember to tag any team member who is absent but might need to know something important;
  • we will shorten the sync meetings to 15 minutes and consolidate them so that two workstreams can hold their syncs in the same half-hour (and sometimes the same room).

I can’t say this is the final, perfect way to embed a data scientist in a product team, but at least we have an adaptive process in place: a system to regularly iterate on our processes and give the team permission to adapt its working agreements.

Are you a specialist embedded in a product team mostly made up of software engineers? How do you interact with the rest of the team? I’d love to hear your story in the comments below.

Controlling for covariates is not the same as “slicing”

To detect small effects in experiments you need to reduce the experimental noise as much as possible. You can do that by working with larger sample sizes, but that doesn’t scale well. A far better approach is to control for covariates that are correlated with your response.

I recently gave a talk at our company on the design of online experiments, and someone pointed out that our automated experiment analysis tool implemented “slicing”, that is, running separate analyses on subsets of the data. Wasn’t that the same thing as controlling for covariates?

Controlling for covariates means including them in your statistical model. Running separate analyses means each sub-analysis has a smaller sample size; you may gain some precision because the response is less variable within each subset, but you lose the benefits of the larger sample size.

Let’s illustrate this with a simulation. Let’s say we wish to measure the impact of some treatment, whose effect is about 10 times smaller than the standard deviation of the error term:

mu <- 10  # some intercept
err <- 1  # standard deviation of the error
treat_effect <- 0.1  # the treatment effect to estimate

Let’s say we have a total of 1000 units in each arm of this two-sample experiment, and that they belong to 4 equal-sized groups labeled A, B, C, and D:

n <- 1000

predictor <- 
  data.frame(
    group = gl(4, n / 4, n * 2, labels = c('A', 'B', 'C', 'D')),
    treat = gl(2, n, labels = c('treat', 'control')))

Let’s simulate the response. For simplicity, let’s say that the group membership has an impact on the response equal to the treatment effect:

group_effect <- treat_effect

response <-
  with(
    predictor,
    mu + as.integer(group) * group_effect + (treat == 'treat') * treat_effect + rnorm(n * 2, sd = err))

df <- cbind(predictor, response)

summary(df)
 group       treat         response     
 A:500   treat  :1000   Min.   : 6.368  
 B:500   control:1000   1st Qu.: 9.607  
 C:500                  Median :10.335  
 D:500                  Mean   :10.325  
                        3rd Qu.:11.015  
                        Max.   :13.757  

The following plot shows how the response is distributed in each group. This is one of those instances where you need statistical models to detect effects that are hard to see in a plot:

ggplot2::ggplot(df, ggplot2::aes(x = group, y = response)) +
  ggplot2::geom_boxplot()

Fitting the full model yields the following confidence intervals:

mod_full <- lm(response ~ group + treat, df)
confint(mod_full)
                    2.5 %      97.5 %
(Intercept)  10.122391874 10.32334506
groupB        0.009174272  0.26336218
groupC        0.106049411  0.36023732
groupD        0.116794148  0.37098206
treatcontrol -0.193199676 -0.01346168

All coefficients are estimated correctly, and the width of the confidence interval for the treatment effect is about 0.18. The treatment effect is statistically significant. Recall that the error term has a standard deviation of 1 and that $n = 1000$ per arm, so we would expect the 95% confidence interval on the treatment effect to have width $2 \times 1.96 \times \sigma \times \sqrt{2/n}$, or about 0.18. We are not very far off.

What happens now if, instead of controlling for the group, we “slice” the analysis, i.e. fit four separate models, one per group? On one hand we will have a smaller error term than a global model that did not control for the group covariate; on the other hand we will have fewer observations per group, which will hurt our confidence intervals. Let’s check:

confint(lm(response ~ treat, df, subset = group == 'A'))
                  2.5 %      97.5 %
(Intercept)  10.0482082 10.30293094
treatcontrol -0.2967421  0.06349024
confint(lm(response ~ treat, df, subset = group == 'B'))
                  2.5 %      97.5 %
(Intercept)  10.2324904 10.48601591
treatcontrol -0.3688051 -0.01026586
confint(lm(response ~ treat, df, subset = group == 'C'))
                 2.5 %      97.5 %
(Intercept)  10.303706 10.55182387
treatcontrol -0.309723  0.04116836
confint(lm(response ~ treat, df, subset = group == 'D'))
                  2.5 %      97.5 %
(Intercept)  10.4170354 10.66328177
treatcontrol -0.3825046 -0.03425957

The estimates of the treatment effect remain unbiased, but the confidence intervals are now about 0.35 wide, or $2 \times 1.96 \times \sigma \times \sqrt{2/(n/4)}$, which is what you would expect from a sample size that’s four times smaller. That’s twice as wide as when fitting a single model on the whole data that includes the group covariate. In fact, in most of the groups the treatment effect is no longer statistically significant.
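As a quick sanity check on those widths (plain arithmetic, not part of the original analysis), with $\sigma = 1$ and $n = 1000$ per arm:

2 * 1.96 * sqrt(2 / 1000)  # full model, n per arm: ~0.175
2 * 1.96 * sqrt(2 / 250)   # one slice, n/4 per arm: ~0.351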

I’m all for automated experiment analysis tools; but when the goal is to detect small effects, I think there’s currently no substitute for a manual analysis by a trained statistician (which I am not). Increasing sample sizes can only take you so far; remember that confidence intervals shrink only as $1/\sqrt{n}$. It is almost always better to look for a set of covariates correlated with the response and include them in your statistical model. And that’s what controlling for a covariate means.

Getting into data science

A while back I had the pleasure to address a team of user experience researchers at YouTube, and I got asked for a few resources that could help someone pretty good at science, math, and programming who wanted to get into data science. Here’s the list I gave. These have worked for me in the past, with the caveat that I’m very partial towards books.

Absolute must-reads

An Introduction to Statistical Learning 
Python Data Science Handbook

Both are freely available, outstanding books that cover a LOT of ground. The former uses R and goes somewhat deeper into the theory, while the latter uses Python and is perhaps more practical, covering IPython, NumPy, and the scikit-learn ecosystem.

Great too

Learning Statistics with R

One of the clearest expositions of fundamental statistical concepts I’ve read. It’s also well written and avoids dry, lifeless prose; the author does a great job at discussing the pros and cons of each technique, and frequently gives templates on how to present the results. One of the most memorable passages was his/her (read the text to understand…) rant against the use of p-values AFTER looking at the data. Free book.

R for Data Science

Hadley Wickham’s companion book to the tidyverse. Essential reading if you’re into R and use the tidyverse. More oriented towards data manipulation and programming than actual statistical modeling. Free book.

For the brave

The Elements of Statistical Learning

The “grown-up” version of ISLR (mentioned above). Covers a lot of theoretical ground, including a great discussion of the bias-variance tradeoff so beloved of interviewers. That book taught me to stop blindly normalizing covariates before running clustering algorithms.

Regression Modeling Strategies

Harrell is to statistics what Wickham is to data manipulation: the opinionated author of some amazing R packages that do a better job than the ones provided in base R. It’s a very dry text though, and probably better read in conjunction with some explanatory blog posts. Furthermore, it can be difficult to find resources online because these packages are not as widely adopted as the tidyverse.

Summer reading

Data Science from Scratch

Joel Grus is amazing. In this book he shows how to code (and test!) many constructs used in Data Science, culminating with a pseudo-relational database.

Oh you think you know statistics?

Statistical Evidence
Causal Inference in Statistics: A Primer

I’m including these two books because I think reading them will make you a better statistician. The former is a short but mind-blowing read that will make you rethink every analysis you’ve ever done. The latter is the must-read text if you’re going to do any kind of causal inference.

Non-book resources

Machine Learning

Deep Learning

AI nanodegree

These are online courses I’ve taken and can wholeheartedly recommend, especially the first one, which covers most of the concepts used in DS/ML. The Deep Learning specialization is more oriented towards neural networks, while Udacity’s AI nanodegree probably has little to do with DS but is a great introduction to topics like building game-playing AIs or path-finding algorithms.

Am I missing something? Feel free to add your own recommendations in the comments below.

The law of total probability applied to a conditional probability

Dear future self,

I’ve just lost (again) about half an hour of my life trying to find a vaguely remembered formula that generalizes the law of total probability to the case of conditional probabilities. Here it is. You’re welcome.

So what is the probability of dying from a lightning strike if you’re an American who knows this statistic?

The law of total probability says that if you can decompose the set of possible events into disjoint subsets (say $B$ and $\overline{B}$), then (with obvious generalization to more than two subsets):

$$\Pr(A) = \Pr(A \mid B) \Pr(B) + \Pr(A \mid \overline{B}) \Pr(\overline{B})$$

But what if you’re dealing with $\Pr(A \mid C)$ instead of just $\Pr(A)$? What’s the formula for the law of total probability in that case? What you’re searching for can be found by googling for “total law probability conditional”:

$$\Pr(A \mid C) = \Pr(A \mid B, C) \Pr(B \mid C) + \Pr(A \mid \overline{B}, C) \Pr(\overline{B} \mid C) $$

There’s a great derivation here: https://math.stackexchange.com/questions/2377816/applying-law-of-total-probability-to-conditional-probability.
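In short, the derivation just partitions $A \cap C$ into its intersections with $B$ and $\overline{B}$, then applies the definition of conditional probability twice:

$$\Pr(A \mid C) = \frac{\Pr(A \cap B \cap C) + \Pr(A \cap \overline{B} \cap C)}{\Pr(C)} = \Pr(A \mid B, C) \Pr(B \mid C) + \Pr(A \mid \overline{B}, C) \Pr(\overline{B} \mid C)$$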

XKCD on Data Science

I’ve been collecting all XKCD comics related to Data Science and/or Statistics. Here they are, but if you think I’m missing any please let me know in the comments. Use at will in your data visualizations but remember to attribute. Sorted in reverse chronological order.

Effect Size
K-Means Clustering
Methodology Trial
Euler Diagrams
Data Point
Change in Slope
Proxy Variable
Health Data
Garbage Math
Selection Bias
Spacecraft Debris Odds Ratio
Control Group
Confounding Variables
Bayes’ Theorem
Slope Hypothesis Testing
Flawed Data
Error Types
Modified Bayes’ Theorem
Curve-Fitting
Machine Learning
Linear Regression
P-Values
t Distribution
Increased Risk
Seashell
Log Scale
Cell Phones
Significant
Conditional Risk
Correlation
Boyfriend

Quick note about bootstrapping

Cross-validation—the act of keeping a subset of data to measure the performance of a model trained on the rest of the data—never sounded right to me.

It just doesn’t feel optimal to hold back an arbitrary fraction of the data when you train your model. Oh, and then you’re also supposed to keep another fraction for validating the model. So one set for training, one set for testing (to find the best model structure), and one set for validating the model, i.e. measuring its performance. That’s throwing away quite a lot of data that could be used for training.

That’s why I was excited to learn that bootstrapping provides an alternative. Bootstrapping is an elegant way to maximize the use of the available data, typically when you want to estimate confidence intervals or any other statistic.

In “Applied Predictive Modeling”, the authors discuss resampling techniques, including bootstrapping and cross-validation (p. 72). They explain that bootstrap validation consists in building N models on bootstrapped data and estimating their performance on the out-of-bag samples, i.e. the samples not used to build each model.

I think that may be an error. I don’t have Efron’s seminal book on the bootstrap anymore but I’m pretty sure the accuracy was evaluated against the entire data set, not just the out-of-bag samples.

In “Regression Modeling Strategies”, Frank Harrell describes model validation with the bootstrap thus (emphasis mine):

With the “simple bootstrap” [178, p. 247], one repeatedly fits the model in a bootstrap sample and evaluates the performance of the model on the original sample. The estimate of the likely performance of the final model on future data is estimated by the average of all of the indexes computed on the original sample.

Frank Harrell, Regression Modeling Strategies
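To make the distinction concrete, here’s a minimal R sketch of that “simple bootstrap” validation, with a made-up dataset, a linear model, and R-squared as the performance index (all illustrative choices of mine, not from either book):

# Simulate a toy dataset (purely illustrative).
set.seed(42)
df <- data.frame(x = rnorm(100))
df$y <- 2 * df$x + rnorm(100)

# Performance index: R-squared of a model's predictions on a given dataset.
r_squared <- function(model, data) {
  pred <- predict(model, newdata = data)
  1 - sum((data$y - pred)^2) / sum((data$y - mean(data$y))^2)
}

# Simple bootstrap: fit on a bootstrap sample, evaluate on the ORIGINAL sample,
# and average the index over many repetitions.
indexes <- replicate(200, {
  boot_rows <- sample(nrow(df), replace = TRUE)
  fit <- lm(y ~ x, data = df[boot_rows, ])
  r_squared(fit, df)
})

mean(indexes)  # estimate of the model's likely performance on future data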

The most under-rated programming books

Ask any programmer what their favourite programming book is, and their answer will be one of the usual suspects: Code Complete, The Pragmatic Programmer, or Design Patterns. And rightly so; these are outstanding, highly regarded works that belong on every programmer’s bookshelf. (If you’re just starting to build up your bookshelf, Jeff Atwood has some great recommendations.)

But once you get past the “essential” books you’ll find that there are many incredibly good programming books out there that people don’t talk much about, but which were essential in taking me to the next level in my professional growth.

Here’s a partial list of such books; I’m sure there are many others, so feel free to mention them in the comments.

Growing Object-Oriented Software, Guided by Tests
