How to set up a reverse SSH tunnel with Amazon Web Services

When the startup shut down there were still dozens of netbooks out there in the wild collecting data on the residential houses fitted with our adaptive heating control algorithms, hopelessly attempting to connect to our VPN server that didn’t exist anymore in order to upload all that data to our now-defunct database. That’s a lot of data, sitting and growing on a lot of internet-connected devices.

Some of us came together and figured it could be possible to resume collecting that data, and showcase the benefits of having our system installed in your house. The first problem was, how do we connect to these netbooks? And at near-zero cost?

Warning: hacks ahead.

We figured that step one would be to establish a reverse SSH tunnel to each of these netbooks. A reverse SSH tunnel is set up when an otherwise-inaccessible device (in our case, a netbook) connects to a publicly reachable SSH server, asks the server to listen on a port, and has all incoming connections to that port forwarded (“tunneled”) back to the device. Short of setting up a proper VPN, this is the best way to reach a device that’s not exposed to the public internet.

To set up a reverse SSH tunnel you first need a publicly reachable machine that runs an SSH server and accepts reverse tunnels. The good news is that anyone can have one by signing up to Amazon Web Services (AWS) and going to the Elastic Compute Cloud (EC2) service:

Next you want to launch an instance:

You really want the smallest, freest possible machine here that runs Linux:

Make sure you have generated a key pair for this instance (and that you have saved the private key!) and that the machine accepts SSH from anywhere:

But when you set up an SSH tunnel you will also need to make sure the EC2 instance accepts SSH traffic on the ports that will be opened by the tunnel. These are up to you; I have created two tunnels, one on port 7030 and one on 7040, so navigate to the settings for the security group of your instance and make sure the instance will accept TCP traffic to these ports:
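
The same rule can also be added from the command line instead of the console. Here is a minimal sketch, assuming the AWS CLI is installed and configured; the security group ID is a made-up placeholder, and the script only prints the commands so you can review them before running anything:

```shell
#!/bin/sh
# Hypothetical security group ID; substitute the one attached to your instance.
SG_ID="${SG_ID:-sg-0123456789abcdef0}"

# Print (rather than run) the AWS CLI command that opens one TCP port to the world.
open_port_cmd() {
    printf 'aws ec2 authorize-security-group-ingress --group-id %s --protocol tcp --port %s --cidr 0.0.0.0/0\n' "$SG_ID" "$1"
}

# One ingress rule per tunnel port.
for port in 7030 7040; do
    open_port_cmd "$port"
done
```

Opening the ports to 0.0.0.0/0 mirrors the “SSH from anywhere” setting above; restricting the CIDR to your own address would be tighter.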

That’s all on the server side. On the netbook side you need to do three things: 1) get the private key, 2) change the file permissions on the key, 3) establish the tunnel.

Getting the private key to the netbook is entirely up to you. What I did, which is absolutely not recommended, was to place the private key neurobat.pem on the same web server that hosts this blog. Then I was able to get the key with

wget --no-check-certificate davidlindelof.com/<path-to-key>

(Notice the --no-check-certificate argument. Those netbooks are hopelessly out of date and won’t accept HTTPS certificates anymore.)

Next you need to set the right permissions on the key, or SSH will refuse to use it:

chmod 400 <path-to-key>

And finally you can set up the tunnel, say on port 7000 (whichever port you pick must be one you opened in the security group):

ssh -i <path-to-key> -fN -R :7000:localhost:22 ec2-user@<ec2-ip-address>

If all went well you’ll now be able to ssh into the remote device by sshing to your EC2 instance on port 7000:

ssh <username-on-device>@<ec2-ip-address> -p 7000
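
If the connection is refused, one thing worth checking is where the forwarded port is actually bound on the server. By default OpenSSH's sshd binds remote forwards to the loopback address only, unless GatewayPorts is set to yes or clientspecified in sshd_config. The sketch below, which assumes the ss tool from iproute2 is available on the EC2 instance, prints the local address the tunnel port is listening on (the helper name listen_addrs is my own):

```shell
#!/bin/sh
# Reads `ss -tln` output on stdin and prints the local address(es) bound to the given port.
# Prints nothing if no socket is listening on that port.
listen_addrs() {
    awk -v port="$1" 'NR > 1 {
        n = split($4, parts, ":")          # the port is the last ":"-separated piece
        if (parts[n] == port) print $4
    }'
}

# Run on the EC2 instance; skipped quietly if `ss` is not installed.
if command -v ss >/dev/null 2>&1; then
    ss -tln | listen_addrs 7000
fi
```

An address of 127.0.0.1:7000 means the tunnel is up but only reachable from the instance itself; 0.0.0.0:7000 means it is reachable from outside, subject to the security group.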

As an extra precaution you might also want to look into using the autossh program, which can detect connection drops and attempt to reconnect.
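
For what it's worth, here is roughly what the autossh variant of the tunnel command could look like. This is a sketch and not something I ran on the netbooks: it assumes autossh is installed there, the keepalive values are arbitrary, and the script only prints the command line so it can be inspected first.

```shell
#!/bin/sh
# Same placeholders as in the ssh command above; override them via the environment.
KEY="${KEY:-<path-to-key>}"
EC2_HOST="${EC2_HOST:-ec2-user@<ec2-ip-address>}"
REMOTE_PORT="${REMOTE_PORT:-7000}"

# Build the autossh command line.
#   -M 0                      disable autossh's own monitoring port, rely on SSH keepalives
#   -f -N                     go to the background, run no remote command
#   ServerAliveInterval/Max   make ssh exit when the link is dead, so autossh restarts it
#   ExitOnForwardFailure      make ssh exit if the remote port cannot be bound
autossh_cmd() {
    printf 'autossh -M 0 -f -N -o ServerAliveInterval=30 -o ServerAliveCountMax=3 -o ExitOnForwardFailure=yes -i %s -R :%s:localhost:22 %s\n' "$KEY" "$REMOTE_PORT" "$EC2_HOST"
}

autossh_cmd
```

Once the printed line looks right, it can be run as-is on the netbook in place of the plain ssh command.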

Clunky? Sure. Hacky? You bet. Brittle? Oh my god. But it did the job and I can now work on doing things the “right” way, i.e. setting up a proper VPN solution, probably based on OpenVPN or something.

Deep silence or deep work

It’s Monday afternoon. It’s a holiday, but I have a couple of things from last week to catch up on. The rest of the family is either at holiday camp or taking a nap in the bedroom. I’m working from home. But the home is anything but silent.

I can hear the girls’ muffled chatting; from the sound of it they’re making up some story with their dolls. The village church bell just tolled a single note for the quarter past the hour. My phone’s notification just dinged, and in a rare moment of self-discipline I don’t pick it up. Some birds are chirping outside. The convection oven in the kitchen has had a malfunction for years and emits a beep every 10 seconds that I have learned to ignore. Occasionally a plane comes in overhead to land at Geneva’s airport; there’s only one landing strip and, depending on the direction of the wind, planes come in from the direction of our village. And on top of it all I hear some kind of background whine that’s very soft. I usually don’t notice it but it’s definitely there, and I don’t know if it comes from outside of me or from inside my head.

That’s a lot of noise. These are also the best working conditions I’ve ever experienced. Today I’ve chosen to deliberately notice all these sounds and now I cannot unhear them.

Then there are the visual distractions. I’ve been working for the past three years from a corner of the living room, the rest of which fills my field of view, along with parts of the kitchen.

These working conditions sound bad but they can be fixed. I usually set a screen between me and the rest of the living room, and almost always do my deep focus work wearing noise-canceling over-the-ear headphones, playing focus-friendly music. My family knows that when daddy wears the headphones, he is not to be disturbed unless there’s blood or fire. It mostly works.

Like many others, I used to work in an open-space office. In terms of noise and visual distraction, open-space offices are possibly better than working from home. On more than one occasion, visitors from abroad have been impressed by the museum-grade silence filling a Swiss open-space office. But open-space offices offer a richer set of options for not concentrating on your deep work. Entire days can go by in interruptions from colleagues, walks to the cafeteria, eavesdropping on neighboring conversations, and more meetings than you should attend because you fear you’ll miss out. And the siren song of office perks, of course.

The choice is between perfect quiet filled with distractions, or constant information-free background sounds that you can learn to ignore with monk-like focus. I’ve tried it all and I know what works for me. Do you?

Is The Ratio of Normal Variables Normal?

In Trustworthy Online Controlled Experiments I came across this quote, referring to a ratio metric $M = \frac{X}{Y}$, which states that:

Because $X$ and $Y$ are jointly bivariate normal in the limit, $M$, as the ratio of the two averages, is also normally distributed.

That’s only partially true. According to https://en.wikipedia.org/wiki/Ratio_distribution, the ratio of two uncorrelated noncentral normal variables $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$ is approximately normal, with mean $\mu_X / \mu_Y$ and variance $\frac{\mu_X^2}{\mu_Y^2}\left( \frac{\sigma_X^2}{\mu_X^2} + \frac{\sigma_Y^2}{\mu_Y^2} \right)$. The article implies that this holds when $Y$ is unlikely to assume negative values, say $\mu_Y > 3 \sigma_Y$.
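
The variance formula is what you get from a first-order Taylor expansion of $X / Y$ around $(\mu_X, \mu_Y)$, also known as the delta method. This is my own sketch, assuming $X$ and $Y$ are independent and $\sigma_Y$ is small compared to $\mu_Y$; I am not claiming it is how the article derives it.

$$
\frac{X}{Y} \approx \frac{\mu_X}{\mu_Y} + \frac{X - \mu_X}{\mu_Y} - \frac{\mu_X}{\mu_Y^2}\,(Y - \mu_Y)
$$

Taking the variance of the right-hand side, with no covariance term because of independence:

$$
\mathrm{Var}\left(\frac{X}{Y}\right) \approx \frac{\sigma_X^2}{\mu_Y^2} + \frac{\mu_X^2 \sigma_Y^2}{\mu_Y^4} = \frac{\mu_X^2}{\mu_Y^2}\left( \frac{\sigma_X^2}{\mu_X^2} + \frac{\sigma_Y^2}{\mu_Y^2} \right)
$$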

As always, the best way to believe something is to see it yourself. Let’s generate some uncorrelated normal variables far from 0 and their ratio:

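# Numerator and denominator both far from 0 (uy is 100 standard deviations away)
ux <- 100
sdx <- 2
uy <- 50
sdy <- 0.5

# Independent draws, then their element-wise ratio
X <- rnorm(1000, mean = ux, sd = sdx)
Y <- rnorm(1000, mean = uy, sd = sdy)
Z <- X / Y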

Their ratio looks normal enough:

hist(Z)

Which is confirmed by a q-q plot:

qqnorm(Z)

What about the mean and variance?

mean(Z)
[1] 1.998794
ux / uy
[1] 2
var(Z)
[1] 0.001783404
ux^2 / uy^2 * (sdx^2 / ux^2 + sdy^2 / uy^2)
[1] 0.002

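The sample mean matches the theoretical value to three significant figures; the sample variance comes out about 11% below it, which is in the right ballpark for a single sample of 1000 draws.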

But what happens now when the denominator $Y$ has a mean close to 0?

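# Same numerator, but the denominator's mean is now only 5 standard deviations from 0
ux <- 100
sdx <- 2
uy <- 10
sdy <- 2

# Independent draws, then their element-wise ratio
X <- rnorm(1000, mean = ux, sd = sdx)
Y <- rnorm(1000, mean = uy, sd = sdy)
Z <- X / Y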

Hard to call the resulting ratio normally distributed:

hist(Z)

Which is also clear with a q-q plot:

qqnorm(Z)

In other words, ratio metrics whose denominator is far from 0, relative to its standard deviation, will generally be close enough to a normal distribution for practical purposes. But when the denominator’s mean is, say, within 5 sigmas of 0, that assumption breaks down.
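
One way to get a feel for where the approximation starts to strain is to look at the next term of the Taylor expansion. This is again a sketch under the independence assumption, not something taken from the book or the article. Expanding $1 / Y$ to second order gives

$$
E\left[\frac{X}{Y}\right] \approx \frac{\mu_X}{\mu_Y}\left( 1 + \frac{\sigma_Y^2}{\mu_Y^2} \right)
$$

so the relative size of the neglected correction is $\sigma_Y^2 / \mu_Y^2$. For the two simulations above:

$$
\left(\frac{0.5}{50}\right)^2 = 0.0001 \qquad \text{versus} \qquad \left(\frac{2}{10}\right)^2 = 0.04
$$

A 0.01% correction is invisible; a 4% one is not, and the same curvature of $1 / Y$ that shifts the mean also skews the distribution.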