Review: RESTful Web Services

RESTful Web Services by Leonard Richardson. My rating: 5 of 5 stars

I began reading “RESTful Web Services” while researching technical solutions for Neurobat Online, the web service version of our intelligent heating controller. Prior to this, most (all?) of the web service projects I had been involved in were based on SOAP.

REST is a heavily overloaded term in our industry, and can mean different things to different people. The author avoids that controversy by coining the term “Resource-Oriented Architecture”, and shows different examples of web services that can be built using this approach: a social bookmarking service inspired by Delicious, and a mapping service. Both examples are RESTful, but the author does an excellent job of showing why being RESTful is not enough. To fully leverage the existing web architecture (including the full HTTP protocol) you need, he argues, to do more than merely be RESTful, and he shows how.

The author never says so explicitly, but after reading this book I found that RESTful web services have at least two significant advantages over what the author coyly calls “Big Web Services” (aka SOAP):

  1. Testability: do not underestimate the advantage of exposing a service that your testing team can test with cURL instead of having to set up a tool such as SoapUI.

  2. Discoverability: everything is a resource, and all resources will respond to a limited number of HTTP verbs. You don’t have to worry whether adding a bookmark is done through addBookmark() or appendBookmark(); if your service is RESTful, you know that you need to send a POST to some URI.

The biggest takeaway from this book for me was to realise that it’s possible to design an application where “everything is a resource”, and that all resources respond to the same set of methods. Think of it for a moment. Will this not change the way you design non-web services too? Are not all your objects resources? Imagine, for a minute, if all your classes were restricted to expose not more than 5 public methods, and if these methods had the same names. It may sound crazy, but it’s quite possible that you’d end up with a cleaner design built out of many small classes with small interfaces. Is this not an easy way to clean code?

What makes this book great and not merely good is that the author doesn’t simply explain what RESTful web services are about. He is also clearly opinionated about it; however, he is never patronising, never condescending. He takes very occasional jabs at systems he calls “Big Web Services” but never belittles them. His message comes across as entirely believable and convincing. Example after example, he shows how popular services can be designed as collections of resources, and it is left to the intelligence of the reader to judge which kind of design is better.

Pair-programming girls did just as well as boys

For the past three years I’ve taught a freshman-level programming course at the Swiss Federal Institute of Technology in Lausanne. Students are asked to form pairs and to work on a semester project, which consists of developing a simple library of numeric routines (e.g. a square root function, integrals, etc.). I then submit their code to a suite of unit tests (including the Valgrind memory checker) and assign them a grade linearly proportional to the number of unit tests that pass. The same grade is assigned to both members of the pair.

Most students pair with a fellow student of the same sex. In the spring 2014 session, 43 pairs out of 52 were of the same sex. This year’s class was large enough to consider carrying out a meaningful statistical analysis of the students’ grades. More specifically, I wanted to examine whether pairs of girls obtained significantly different results from pairs of boys.

Here I show the boxplots of the grades assigned to the 52 pairs, depending on whether it was two females, mixed sex, or two males. The median grade for females is 5.5 out of 6, while the median grade for males is 5 out of 6.

Final grades

The Welch two-sample t-test (used to determine whether two samples are drawn from populations with the same mean) yields a p-value of 0.32. The 95% confidence interval for the difference in means between all-female and all-male pairs ranges from -0.27 to 0.80. Since this interval contains zero, there is no statistically significant difference between the grades obtained by two-female pairs of students and two-male ones.
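As a rough sketch of how such a comparison could be run in R: the helper name `compare_groups` and the two grade vectors below are assumptions made up for illustration (they are not the course data and will not reproduce the p-value quoted above). R's `t.test` performs the Welch variant when `var.equal = FALSE`, which is its default.

```r
# Hypothetical helper: compare the grades of two groups of pairs
# with Welch's unequal-variance two-sample t-test.
compare_groups <- function(grades_a, grades_b) {
  res <- t.test(grades_a, grades_b, var.equal = FALSE, conf.level = 0.95)
  list(
    method   = res$method,                # name of the test actually run
    p_value  = res$p.value,               # two-sided p-value
    conf_int = as.numeric(res$conf.int)   # 95% CI for mean(a) - mean(b)
  )
}

# Placeholder grades, invented for illustration only (NOT the course data)
female_pairs <- c(6, 5.5, 5.5, 5, 4.5, 6)
male_pairs   <- c(5, 5.5, 4, 6, 5, 4.5, 5, 5.5)

out <- compare_groups(female_pairs, male_pairs)
print(out)
```

If the confidence interval returned in `conf_int` contains zero, the difference in means is not significant at the 5% level.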

And what about the pairs of mixed sex? The boxplot suggests that their results are lower, and I can think of a hypothesis to explain that. But with a sample size of only 9 it is hard to draw any conclusion.

Statistically significant energy savings: how many buildings are enough?

From Neurobat’s website it is now possible to download a brochure with the 2012-2013 test results. It summarises the findings we published at the CISBAT 2013 conference, describing the energy savings that we have achieved on four experimental test sites. Of these four, one is an administrative office. Another included the (domestic) hot water in its energy metering. Therefore, only two of them are single-family houses whose energy savings concern the space heating alone. The energy savings we measured on these two sites are 23% and 35%.

It is natural to ask oneself what the average energy savings on a typical house might be. It’s possible to give an estimate, provided that we make several assumptions. We’re going to assume that the energy savings that Neurobat can achieve on a single-family house in Switzerland are a random variable drawn from a normal distribution. Under that assumption, our best estimates for the mean and standard deviation of that parent distribution are:

$$\mu = \frac{23 + 35}{2} = 29$$

and

$$s = \sqrt{\frac{(23-29)^2 + (35-29)^2}{1}} = 8.49$$

The sample mean estimated from $n$ samples of a normal parent distribution, once centred on the true mean and scaled by $s/\sqrt{n}$, is distributed according to a t-distribution with $n-1$ degrees of freedom. The 95% confidence interval for the true (parent) mean can therefore be found by looking up the 0.975 and 0.025 quantiles of the t-distribution with $n-1$ degrees of freedom. In our case, $n = 2$ and the 95% confidence interval of the true mean is therefore:

$$\left[\mu - t_{n-1, 0.975} \frac{s}{\sqrt{n}}, \mu + t_{n-1, 0.975} \frac{s}{\sqrt{n}}\right] = \left[ -47, 105 \right],$$

where $\mu = 29$, $s$ is the sample standard deviation calculated above and $n = 2$. Not the most helpful estimate ever.
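The numbers above can be checked with a few lines of R. This is only a sketch; the variable names (`savings`, `m`, `half_width`, `ci`) are my own choices for illustration.

```r
# Measured savings (%) on the two single-family houses
savings <- c(23, 35)
n <- length(savings)

m <- mean(savings)   # sample mean
s <- sd(savings)     # sample standard deviation (n - 1 in the denominator)

# Half-width of the 95% confidence interval, from the t-distribution
# with n - 1 degrees of freedom
half_width <- qt(0.975, n - 1) * s / sqrt(n)
ci <- c(m - half_width, m + half_width)

round(s, 2)   # 8.49
round(ci)     # -47 105
```

With a single degree of freedom the 0.975 quantile is about 12.7, which is what makes the interval so wide.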

We are currently repeating the experiment for this 2013-2014 heating season. A natural question that has come up is “How many buildings do we need to have a usable confidence interval for the average energy savings?”

As always, reformulating the question in precise terms is half the battle. We want a narrow confidence interval around the mean energy savings. We can make it as narrow as we want by increasing $n$, or by relaxing our confidence requirement. Suppose then that we want a 95% confidence interval no wider than 10 percentage points; i.e., we want to state that Neurobat achieves $X\pm5\%$ energy savings with 95% confidence.

By a bit of arithmetic, what we are looking for is the number $n$ such that

$$n \ge 4t^2_{n-1, 0.975}\frac{s^2}{w^2},$$

where $s$ is our sample standard deviation and $w$ is the desired width of our confidence interval. There is no closed-form solution for this equation (except for large $n$, where the t-distribution can be approximated with a normal distribution), so finding $n$ is an iterative process. In R, the right-hand side of this formula can be computed with:

 4 * qt(.975,n-1)^2 * s^2 / w^2 

Evaluate this expression with increasing values of $n$ until it is no larger than $n$.
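For completeness, here is the bit of arithmetic behind the inequality, taking $w$ to be the full width of the confidence interval. The full width is twice the half-width used earlier, so requiring it to be at most $w$ gives:

$$2\, t_{n-1, 0.975} \frac{s}{\sqrt{n}} \le w \quad\Longrightarrow\quad \sqrt{n} \ge \frac{2\, t_{n-1, 0.975}\, s}{w} \quad\Longrightarrow\quad n \ge 4t^2_{n-1, 0.975}\frac{s^2}{w^2}.$$

Because $n$ also appears on the right-hand side through the quantile $t_{n-1, 0.975}$, the inequality cannot simply be solved for $n$.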

Assuming that $s = 8.49$ as above, we obtain:

$$n = 14.$$

And that’s it. Again, assuming that the energy savings are drawn from a normal distribution whose parent standard deviation is about 8.49, we will need experimental results on 14 buildings to give a 95% confidence interval no wider than 10% on the expected energy savings. For $n = 10$, the width of the confidence interval increases to 12% and for $n = 5$, it is 21%.
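The iterative search can be sketched in R as follows. The function names `required_n` and `ci_width`, and the upper bound `n_max`, are assumptions introduced for this example.

```r
s <- sd(c(23, 35))   # sample standard deviation from the two houses, about 8.49
w <- 10              # desired full width of the confidence interval (percentage points)

# Smallest n such that n >= 4 * t^2 * s^2 / w^2, found by trying n = 2, 3, ...
required_n <- function(s, w, conf = 0.95, n_max = 1000) {
  q <- 1 - (1 - conf) / 2
  for (n in 2:n_max) {
    rhs <- 4 * qt(q, n - 1)^2 * s^2 / w^2
    if (n >= rhs) return(n)
  }
  NA_integer_   # no solution found below n_max
}

# Full width of the confidence interval for a given number of buildings
ci_width <- function(n, s, conf = 0.95) {
  2 * qt(1 - (1 - conf) / 2, n - 1) * s / sqrt(n)
}

n_needed <- required_n(s, w)
widths <- round(ci_width(c(5, 10, 14), s))

n_needed   # 14
widths     # 21 12 10
```

Starting the search at $n = 2$ is deliberate: with a single building the sample standard deviation is undefined and the t-distribution would have zero degrees of freedom.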

The next time you hear someone claim suspiciously precise energy savings with their miracle device, you have my permission to ask them what their confidence interval is, how they calculated it, and what underlying assumptions they are making.

Welcome back, Climate Charts & Graphs

I was happy to learn a few days ago that the Climate Charts & Graphs blog is being reactivated by its author. I used to subscribe to it back in the Google Reader days. In the current climate change conversation we need more blogs like CCG, where arguments can be conclusively settled with (preferably graphical) evidence.

So welcome back, and there’s no reason to apologise!