It’s 2015 and your firm has decided that it’s finally time that you stop using your primary production systems as the first place you routinely run brand new versions of your software. And, after realizing that configuration files are often just software written in a domain-specific programming language, someone on the team dove deep down the dark devops rabbit hole, and, congratulations, your infrastructure is code now!
Somewhere on the journey into cloud server APIs, orchestration frameworks, configuration management systems, and continuous deployment pipelines, something interesting happened. Setting up a new server used to involve considerable labour, discussion, and ceremony. Now, setting up a new server is a rather prosaic affair that happens hundreds of times per day – often automatically.
Is that being done in a secure way? If you are using many of the popular devops tools, sadly, the answer is a resounding “no”.
Here’s how many of them work:
- “Hello Cloud Server API: I’d like a new machine, please” – “OK, there’s a new machine available at 192.0.2.3”
- “Hello 192.0.2.3: here’s some confidential data”
And that’s where things go horribly wrong. Modern IP networks aren’t especially trustworthy. In fact, in some countries, man-in-the-middle attacks are entirely routine. When your devops tool eagerly transmits the confidential data required to add a new server to your systems, it doesn’t actually know if it’s sending that data to a server at a company you’ve decided is trustworthy, or if it’s sending that data to a complete stranger.
Fortunately, there’s been an easy-to-use solution to this problem since March of 2010, when OpenSSH 5.4 was released with support for certificates. Unfortunately, that solution isn’t getting used – instead, many tools simply ignore the dire security warnings the OpenSSH developers helpfully provide. In all fairness, the solution I describe (1) may not be intuitive to developers who aren’t familiar with cryptography, (2) is not very widely documented, and (3) is not typically described as a solution to this specific problem.
Here’s how to use SSH’s certificate feature to more securely bootstrap a new system:
- Each time a new server is bootstrapped, generate a new temporary SSH key pair. Pass the private key of this key pair to the cloud server API as data to be made available to the server instance. If the API provides features for limiting the distribution of that data, use those features.
- As part of the new server’s initial boot, it will generate a public/private key pair for itself. After doing so, it can use the private key that’s been provided to sign its own public key (with “ssh-keygen -s”), and then present the resulting certificate to clients using the “HostCertificate” option of the SSH daemon.
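- During the initial connection to the server, verify the server’s SSH host key using the public portion of the temporary SSH key pair created in the first step: the connection should only proceed if the server presents a certificate signed by that temporary key (for example, by listing the public key as a “@cert-authority” entry in the known_hosts file used for that connection).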
- Consider setting up the server so that it pauses its boot process with only the SSH daemon running until its SSH fingerprint has been verified. Some cloud server APIs don’t allow you to control which processes on the machine will have access to the private key.
- Discard and cease trusting the temporary key pair immediately after using it. It’s done its job. It’s also likely been exposed to various systems which may not be designed with adequate protection for secure key storage.
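The steps above can be sketched in Python as a set of small helpers that build the relevant OpenSSH command lines and configuration entries. This is only an illustration under some assumptions that are not part of the recipe itself: the function names, file paths, the “bootstrap-” certificate identity, the one-hour validity, and the choice of Ed25519 keys are all hypothetical, and the new server is assumed to know the address the controller will connect to (here it is used as the certificate principal).

```python
# Sketch only: helper names, paths and defaults are illustrative assumptions.
# Each helper returns an argv list or config text; nothing is executed here.

def new_temp_key_cmd(ca_key_path):
    """Controller: create a throwaway key pair (no passphrase) for one bootstrap."""
    return ["ssh-keygen", "-q", "-t", "ed25519", "-N", "",
            "-C", "temporary-bootstrap-key", "-f", ca_key_path]


def sign_host_key_cmd(ca_key_path, host_pub_path, hostname, validity="+1h"):
    """New server: sign its own freshly generated host public key.

    -s selects the signing key, -h requests a host (not user) certificate,
    -n lists the names the certificate is valid for, -V limits its lifetime.
    ssh-keygen writes the certificate next to the key as <name>-cert.pub.
    """
    return ["ssh-keygen", "-s", ca_key_path, "-I", "bootstrap-" + hostname,
            "-h", "-n", hostname, "-V", validity, host_pub_path]


def sshd_config_lines(host_key_path):
    """New server: sshd_config entries that make sshd present the certificate."""
    return ["HostKey " + host_key_path,
            "HostCertificate " + host_key_path + "-cert.pub"]


def known_hosts_line(hostname, temp_public_key):
    """Controller: trust host certificates signed by the temporary key, for this host only."""
    return "@cert-authority {} {}".format(hostname, temp_public_key.strip())


def first_connection_cmd(hostname, known_hosts_path, user="root"):
    """Controller: connect while refusing any host key that cannot be verified."""
    return ["ssh",
            "-o", "UserKnownHostsFile=" + known_hosts_path,
            "-o", "GlobalKnownHostsFile=/dev/null",
            "-o", "StrictHostKeyChecking=yes",
            user + "@" + hostname, "true"]
```

Each returned list could be passed to something like subprocess.run(cmd, check=True). With StrictHostKeyChecking=yes and a known_hosts file containing only the “@cert-authority” line, the first connection fails unless the server presents a certificate signed by the temporary key. After a successful connection, the controller would record the server’s real host key and delete both the temporary key pair and the “@cert-authority” entry.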
One day, perhaps we can worry about concerns like “does my server even have enough entropy to securely generate a key pair?” This problem isn’t nearly as bad as it used to be, and it turns out that with the solution I’ve described, we are already providing every new instance with unique new random data. If we felt like it, we could also use that data to seed the PRNG. I realize that allowing the network to be involved in PRNG seeding is considered heresy in some circles, but on some platforms it might be pretty much the only source of entropy available. One day, perhaps we can also start actually using the specially designed TPM hardware built into many systems, which gives us a place to store our private keys where it’s harder for mistakes to lead to the exposure of those keys.
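As a rough illustration of the PRNG-seeding idea, here is a minimal Python sketch for Linux. It assumes the temporary private key bytes are already available on the new server, and it relies on the documented behaviour that data written to /dev/urandom is mixed into the kernel’s pool without being credited as entropy. The function names and the label string are hypothetical.

```python
import hashlib


def derive_seed(temp_private_key_bytes, label=b"bootstrap-prng-seed"):
    """Hash the provisioned secret so the raw key itself is never written anywhere."""
    return hashlib.sha512(label + temp_private_key_bytes).digest()


def mix_into_kernel_pool(seed, device="/dev/urandom"):
    """Write the seed to the random device; on Linux this stirs the pool
    but does not increase the kernel's entropy estimate."""
    with open(device, "wb", buffering=0) as dev:
        dev.write(seed)
```

To be of any use, this would have to run before the server generates its own host key pair.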
In the meantime, hundreds of thousands of times per day, confidential data – often the data required for complete administrative control over a system – is sent with little regard for ensuring the data gets to its intended destination unread. We can do better than that.
At this point, I’m not going to identify the specific tools I’m referring to. It’s such a widespread problem that there is no point naming and shaming. My hope and belief is that greater developer awareness is all it will take for this particular issue to become rare.