2020 - David's blog

A/B testing my resume

By David Lindelöf. Posted on November 24, 2020. Posted in R. 4 Comments.

Internet wisdom is divided on whether one-page resumes are more effective at landing you an interview than two-page ones. Most of the advice out there seems to rest on opinion or anecdote, with very little scientific basis. Well, let’s fix that. Being currently open to work, I thought this would be the right time to test this […]

Unit testing SQL with PySpark

By David Lindelöf. Posted on November 16, 2020. Posted in Python. 3 Comments.

Machine-learning applications frequently feature SQL queries, which range from simple projections to complex aggregations over several join operations. There doesn’t seem to be much guidance on how to verify that these queries are correct. All mainstream programming languages have embraced unit tests as the primary tool to verify the correctness of the language’s smallest building […]

Scraping real estate for fun

By David Lindelöf. Posted on November 6, 2020. Posted in Uncategorized.

Here’s a fun weekend project: scrape the real estate classifieds of the website of your choice, and do some analytics on the data. I did just that last weekend, using the Scrapy Python library for web scraping, which I then let loose on one of the major real estate classifieds websites in Switzerland (can’t tell […]

Testing Scientific Software with Hypothesis

By David Lindelöf. Posted on October 28, 2020. Posted in Python.

Writing unit tests for scientific software is challenging because frequently you don’t even know what the output should be. Unlike business software, which automates well-understood processes, here you cannot simply work your way through use case after use case, unit test after unit test. Your program is either correct or it isn’t, and you have […]

Monty Hall: a programmer’s explanation

By David Lindelöf. Posted on October 2, 2020. Posted in R. 3 Comments.

I take it we’re all familiar with the infamous Monty Hall problem: Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say A, and the host, who knows what’s behind the doors, opens another door, say […]
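The excerpt above stops before the puzzle's question, so the following is only a minimal sketch of my own, not the code from the post. It assumes the standard version of the problem: the host always opens a door hiding a goat and then offers you the chance to switch. The function names `play` and `win_rate` are hypothetical.

```python
import random


def play(switch: bool, rng: random.Random) -> bool:
    """Play one game; return True if the contestant ends up with the car."""
    doors = ["A", "B", "C"]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host knows where the car is and opens a door that is
    # neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Switch to the only door that is still closed and not yet picked.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch: bool, n_games: int = 100_000, seed: int = 0) -> float:
    """Estimate the probability of winning over n_games simulated games."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(n_games)) / n_games


if __name__ == "__main__":
    print("stay:  ", win_rate(switch=False))
    print("switch:", win_rate(switch=True))
```

Under these assumptions the simulated win rate when switching should land near 2/3, and near 1/3 when staying, which is the classic answer to the puzzle.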

Reading S3 data from a local PySpark session

By David Lindelöf. Posted on September 25, 2020. Posted in Python.

For the impatient: to read data on S3 into a local PySpark dataframe using temporary security credentials, you need to: (1) download a Spark distribution bundled with Hadoop 3.x; (2) build and install the pyspark package; (3) tell PySpark to use the hadoop-aws library; (4) configure the credentials. The problem: when you attempt to read S3 data from a local […]
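The last two steps can be sketched as a set of Spark configuration entries. This is an illustration under stated assumptions rather than the post's own code: the helper name `s3a_temporary_credentials_conf` is hypothetical, the `hadoop-aws` version is left as a parameter because it has to match the Hadoop build bundled with your Spark distribution, and the property names are the ones the Hadoop S3A connector documents for session credentials.

```python
def s3a_temporary_credentials_conf(
    access_key: str,
    secret_key: str,
    session_token: str,
    hadoop_aws_version: str,
) -> dict:
    """Build Spark settings for reading s3a:// paths with temporary credentials.

    hadoop_aws_version is an assumption the caller must supply; it should
    match the Hadoop version bundled with the local Spark distribution.
    """
    return {
        # Ask Spark to fetch the hadoop-aws library (and its dependencies).
        "spark.jars.packages": f"org.apache.hadoop:hadoop-aws:{hadoop_aws_version}",
        # Use the S3A provider that understands session tokens.
        "spark.hadoop.fs.s3a.aws.credentials.provider":
            "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider",
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        "spark.hadoop.fs.s3a.session.token": session_token,
    }


# Usage sketch (requires pyspark to be installed; not executed here):
#
#   from pyspark.sql import SparkSession
#   builder = SparkSession.builder
#   for key, value in conf.items():
#       builder = builder.config(key, value)
#   spark = builder.getOrCreate()
#   df = spark.read.parquet("s3a://<bucket>/<path>")  # placeholder path
```

Keeping the settings in a plain dictionary makes them easy to inspect or log (minus the secrets) before a session is created.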

Probability’s not fair

By David Lindelöf. Posted on August 10, 2020. Posted in Maths.

I’m currently reviewing a book for a well-known tech publisher, in which the author discusses the assignment of probabilities to each face of a die, when the faces are not of equal size. In the absence of more information, he claims that this is impossible. I beg to differ. What follows will depend a lot […]
