
Setting up a secure etcd cluster

etcd is a highly available key-value store for performing service discovery and storing application configuration. It’s a key component of CoreOS – if you set up a simple CoreOS cluster you’ll wind up with etcd running on each node in your cluster.

One of the appealing things about etcd is that its API is very easy to use – simple HTTP endpoints delivering easily consumable JSON data. However, by default it’s not secured in any way.
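To give a sense of what that means, here’s the kind of thing anyone who can reach an unsecured etcd can do with its v2 HTTP API (the address and key name are just illustrative):

#write a key over plain HTTP - no authentication required
curl -XPUT http://127.0.0.1:2379/v2/keys/message -d value="hello"

#read it back as JSON
curl http://127.0.0.1:2379/v2/keys/message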

etcd supports TLS based encryption and authentication, but the documentation isn’t the easiest to follow. In this post, I’ll share my experience of setting up a secured etcd installation from scratch.

Let’s build an etcd cluster that spans 3 continents!

I’m going to walk through how you could build a highly available etcd cluster using 3 cheap Digital Ocean machines in London, New York and Singapore. This cluster will tolerate the failure of any one location. You could throw in San Francisco and Amsterdam and tolerate *two* failures. I’ll leave that as an exercise for the reader!

I’m going to demonstrate this using Ubuntu 15.04 rather than CoreOS – that’s simply because I wanted to learn about etcd without having CoreOS perform any configuration for me.

Ladies and gentlemen, start your engines!

Fire up 3 Ubuntu 15.04 machines. The only reason I chose 15.04 is that I wanted to use systemd, but you should be able to use whatever you prefer. If you’re not already a Digital Ocean customer, use this referral link for a $10 credit – that’ll let you play with this setup for a couple of weeks.

Each machine need only be the most basic $5/mo offering – so go ahead and create a machine in London, New York and Singapore.

You need to know their IPs and domain names – for the rest of this post I’ll refer to them as ETCD_IP1..3 and ETCD_HOSTNAME1..3. Note that you don’t need to set up DNS entries, you just need the name to create the certificate signing request for each host.
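The commands that follow use those as shell variables, so it’s worth exporting them wherever you’re working. The values below are placeholders, not real machines:

#example values only - substitute your own droplets' addresses and names
export ETCD_IP1=203.0.113.10 ETCD_HOSTNAME1=etcd1.example.com
export ETCD_IP2=203.0.113.20 ETCD_HOSTNAME2=etcd2.example.com
export ETCD_IP3=203.0.113.30 ETCD_HOSTNAME3=etcd3.example.com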

Creating a certificate authority

To create the security certificates we need to set up a Certificate Authority (CA). There’s a tool called etcd-ca we can use to do this.

There are no binary releases of etcd-ca available, but it’s fairly straightforward to build your own binary in a golang docker container.

#get a shell in a golang container
docker run -ti --rm -v /tmp:/out golang /bin/bash 

#build etcd-ca and copy it back out
git clone https://github.com/coreos/etcd-ca
cd etcd-ca
./build
cp /go/etcd-ca/bin/etcd-ca /out
exit

#now we have etcd-ca in /tmp ready to copy wherever we need it
cp /tmp/etcd-ca /usr/bin/

Now we can initialise our CA. To keep things simple, I’ll use an empty passphrase

etcd-ca init --passphrase ''

This will set up the CA and store its key in .etcd-ca – you can change where etcd-ca stores such data with the --depot-path option.

Create certificates

Now we have a CA, we can create all the certificates we need for our cluster.

etcd-ca new-cert --passphrase '' --ip $ETCD_IP1 --domain $ETCD_HOSTNAME1 server1
etcd-ca sign --passphrase '' server1
etcd-ca export --insecure --passphrase '' server1 | tar xvf -
etcd-ca chain server1 > server1.ca.crt

etcd-ca new-cert --passphrase '' --ip $ETCD_IP2 --domain $ETCD_HOSTNAME2 server2
etcd-ca sign --passphrase '' server2
etcd-ca export --insecure --passphrase '' server2 | tar xvf -
etcd-ca chain server2 > server2.ca.crt

etcd-ca new-cert --passphrase '' --ip $ETCD_IP3 --domain $ETCD_HOSTNAME3 server3
etcd-ca sign --passphrase '' server3
etcd-ca export --insecure --passphrase '' server3 | tar xvf -
etcd-ca chain server3 > server3.ca.crt

The keys and certificates are retained in the depot directory, but the export creates the files we need for each of our etcd servers as serverX.crt and serverX.key.insecure. We also create a CA chain for each in serverX.ca.crt.

We also need a client key which we’ll use with etcdctl. etcd will reject client requests if they aren’t using a certificate signed by your CA, which is how we’ll be preventing unauthorized access to the etcd cluster.

etcd-ca new-cert  --passphrase '' client
etcd-ca sign  --passphrase '' client
etcd-ca export --insecure  --passphrase '' client | tar xvf -

This will leave us with client.crt and client.key.insecure
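If you want a quick sanity check at this point (optional, and assuming openssl is installed locally), both certificates should verify against the CA chain we exported for server1:

#both should report OK
openssl verify -CAfile server1.ca.crt server1.crt
openssl verify -CAfile server1.ca.crt client.crt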

Setting up each etcd server

Here’s how we set up server 1. First, we install etcd

#install curl and ntp to keep our clock in sync
apt-get update
DEBIAN_FRONTEND=noninteractive apt-get -y install curl ntp

#now grab binary release of etcd
curl -L  https://github.com/coreos/etcd/releases/download/v2.1.0-alpha.0/etcd-v2.1.0-alpha.0-linux-amd64.tar.gz -o etcd.tar.gz
tar xfz etcd.tar.gz

#install etcd and etcdctl, then clean up
cp etcd-v*/etcd* /usr/bin/
rm -Rf etcd*

#create a directory where etcd can store persistent data
mkdir -p /var/lib/etcd

Copy the server1.crt, server1.key.insecure and server1.ca.crt we created earlier to /root on the first server.
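If you generated the certificates on a separate machine, something like this gets them across (scp and these paths are just one way to do it; the filenames come from the export commands above):

#copy server1's certificate, key and CA chain to the first server
scp server1.crt server1.key.insecure server1.ca.crt root@$ETCD_IP1:/root/

With the certificates in place, we’ll create a systemd unit which will start etcd, in /etc/systemd/system/etcd.service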

[Unit]
Description=etcd
After=network.target

[Install]
WantedBy=multi-user.target

[Service]
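#NOTE: systemd does not expand shell variables in Environment= lines;
#replace $ETCD_IP1..3 throughout with your machines' actual addresses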
#basic config
Environment=ETCD_DATA_DIR=/var/lib/etcd
Environment=ETCD_NAME=etcd1
Environment=ETCD_LISTEN_PEER_URLS=https://$ETCD_IP1:2380
Environment=ETCD_LISTEN_CLIENT_URLS=https://$ETCD_IP1:2379
Environment=ETCD_ADVERTISE_CLIENT_URLS=https://$ETCD_IP1:2379

#initial cluster configuration
Environment=ETCD_INITIAL_CLUSTER=etcd1=https://$ETCD_IP1:2380,etcd2=https://$ETCD_IP2:2380,etcd3=https://$ETCD_IP3:2380
Environment=ETCD_INITIAL_CLUSTER_TOKEN=your-unique-token
Environment=ETCD_INITIAL_CLUSTER_STATE=new
Environment=ETCD_INITIAL_ADVERTISE_PEER_URLS=https://$ETCD_IP1:2380

#security
Environment=ETCD_TRUSTED_CA_FILE=/root/server1.ca.crt
Environment=ETCD_CERT_FILE=/root/server1.crt
Environment=ETCD_KEY_FILE=/root/server1.key.insecure
Environment=ETCD_CLIENT_CERT_AUTH=1

Environment=ETCD_PEER_TRUSTED_CA_FILE=/root/server1.ca.crt
Environment=ETCD_PEER_CERT_FILE=/root/server1.crt
Environment=ETCD_PEER_KEY_FILE=/root/server1.key.insecure
Environment=ETCD_PEER_CLIENT_CERT_AUTH=1

#tuning see https://github.com/coreos/etcd/blob/master/Documentation/tuning.md
Environment=ETCD_HEARTBEAT_INTERVAL=100
Environment=ETCD_ELECTION_TIMEOUT=2500

ExecStart=/usr/bin/etcd
Restart=always

The etcd documentation recommends setting the election timeout to around 10x the ping time. In my test setup, I was seeing 250ms pings from London to Singapore, so I went for a 2500ms timeout.
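If you’d rather base the timeout on your own measurements than mine, a quick round-trip check to the most distant location is enough (run from the London machine, for example):

#measure latency to the furthest peer before picking an election timeout
ping -c 5 $ETCD_IP3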

It should be clear how to adjust that unit for each server – note that the ETCD_INITIAL_CLUSTER setting is the same for each server, and simply tells etcd where it can find its initial peers.
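For example, server 2’s unit would differ from server 1’s only in its name, addresses and certificate paths, something like:

Environment=ETCD_NAME=etcd2
Environment=ETCD_LISTEN_PEER_URLS=https://$ETCD_IP2:2380
Environment=ETCD_LISTEN_CLIENT_URLS=https://$ETCD_IP2:2379
Environment=ETCD_ADVERTISE_CLIENT_URLS=https://$ETCD_IP2:2379
Environment=ETCD_INITIAL_ADVERTISE_PEER_URLS=https://$ETCD_IP2:2380
Environment=ETCD_TRUSTED_CA_FILE=/root/server2.ca.crt
Environment=ETCD_CERT_FILE=/root/server2.crt
Environment=ETCD_KEY_FILE=/root/server2.key.insecure
Environment=ETCD_PEER_TRUSTED_CA_FILE=/root/server2.ca.crt
Environment=ETCD_PEER_CERT_FILE=/root/server2.crt
Environment=ETCD_PEER_KEY_FILE=/root/server2.key.insecure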

Now we can tell the system about our new unit and start it

systemctl daemon-reload
systemctl enable etcd.service
systemctl restart etcd.service

Do that on all three servers and you’re up and running!
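If one of the nodes doesn’t join the cluster, the systemd logs are the first place to look. These are standard systemd commands, nothing etcd-specific:

#check the service state and follow its logs
systemctl status etcd.service
journalctl -u etcd.service -f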

Setting up etcdctl

We can set up some environment variables on the server so that etcdctl uses our client certificate. Copy client.crt and client.key.insecure to /root, then create this file as /etc/profile.d/etcd.sh (substituting your machines’ actual addresses for $ETCD_IP1..3) so that you have these environment variables on each login.

export ETCDCTL_CERT_FILE=/root/client.crt
export ETCDCTL_KEY_FILE=/root/client.key.insecure
export ETCDCTL_CA_FILE=/root/server1.ca.crt
export ETCDCTL_PEERS=https://$ETCD_IP1:2379,https://$ETCD_IP2:2379,https://$ETCD_IP3:2379

Log back in and you should be able to play with etcdctl

etcdctl set /foo bar
etcdctl get /foo
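With the same environment variables in place, etcdctl can also confirm that all three members are present and healthy:

etcdctl cluster-health
etcdctl member list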

Here’s how you could talk to a specific node with curl

curl --cacert /root/server1.ca.crt \
--cert /root/client.crt \
--key /root/client.key.insecure \
-L https://$ETCD_IP1:2379/v2/keys/foo
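Writing with curl works the same way, since the v2 API takes the new value as form data (the key name is just an example):

curl --cacert /root/server1.ca.crt \
--cert /root/client.crt \
--key /root/client.key.insecure \
-XPUT -d value="bar" \
-L https://$ETCD_IP1:2379/v2/keys/foo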

What next?

As it stands, you could use this setup as a secure replacement for https://discovery.etcd.io to bootstrap a CoreOS cluster. You could also use this as the basis for a CoreOS cluster which is distributed across multiple datacentres.

While exploring this, I found the following pages useful

Blue-Turquoise-Green Deployment

In this post I’m putting a name to something I’ve found myself doing in order to deliver zero-downtime deployments without any loss of database consistency.

The idea of Blue-Green deployment is well-established and appealing. Bring up an entire new stack when you want to deploy, and when you’re ready, flip over to it at the load balancer.

Zero downtime deployment. It makes everyone happy.

But…data synchronization is hard

Cloud environments make it easy to bring up a new stack for blue-green deployments. What’s not so easy is dealing with transactions during the flip from blue to green.

During that time, some of your blue services might be writing data into the blue database, and on a subsequent request, trying to read it out of green.

You either have to live with a little inconsistency, or drive yourself crazy trying to get it synchronized.

What about a common database?

This won’t suit all applications, but you can do blue-green deployment with a common data storage backend. The actual blue and green elements of the stack consist of application code and any data migration upgrade/downgrade handling logic.

Most of the time, if you’re trying to push out frequent updates, those updates are software changes with infrequent database schema changes.

So, you can happily make several releases a day with zero downtime. However, sooner or later you’re going to make a breaking schema change.

The horror of backwards incompatible schema changes

So, you’re barrelling along with your shared data backend, and you find the current live blue deployment will fail when the new green deployment performs its database migrations on your common data store.

Now you can’t deploy green without a scheduled downtime.

But you don’t want scheduled downtime! How can we do a zero downtime deployment and still retain the green-blue rollback capability?

Introducing Blue-Turquoise-Green deployment!

You need to create a turquoise stack. That’s the blue release, patched to run without failure on both a blue and a green database schema. This means it might have to detect the availability of certain features and adapt its behaviour at runtime. It might look ugly, but you’re not planning to keep it for long.
Diagram illustrating how 'turquoise' stack allows zero-downtime deployment on a shared data store
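As a very rough sketch of what that runtime detection could look like (the table, column and use of a PostgreSQL startup check here are entirely hypothetical):

#hypothetical startup check: has the green migration added the new column yet?
HAS_SHIPPING_STATUS=$(psql -tAc "SELECT count(*) FROM information_schema.columns WHERE table_name='orders' AND column_name='shipping_status'")

if [ "$HAS_SHIPPING_STATUS" = "1" ]; then
  #green schema present: the new code path is safe to use
  export FEATURE_SHIPPING_STATUS=1
else
  #still on the blue schema: fall back to the old behaviour
  export FEATURE_SHIPPING_STATUS=0
fi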

Now, you can perform a deployment of turquoise. It runs just fine on the blue database, and you can run the database migrations for green. It keeps on trucking. Now that you’re safely running turquoise on a green-compatible database, you can go ahead and deploy the green stack.

If you do run into problems, you’ve got everything in place to downgrade. Flip from green back to turquoise. Revert the database migrations, and you can then flip from turquoise to blue, and you’re back where you started.

Thinking in turquoise

For me, this has been more of a thought experiment. I’ve found that if you plan to do blue-green deployment on a shared data backend, you naturally adopt a ‘turquoise’ mindset to the migrations.

That means ensuring you design schema changes carefully, and deploy them in advance of the code which actually requires them. In other words, you build in that turquoise-coloured forward compatibility ahead of time, and you’re back to low-risk, blue-green deployments!

Finally, why turquoise?

Because turquoise is a much nicer word than cyan. I should also say that I don’t claim this is a new idea. Giving a name to things makes it easier to discuss with others – I was trying to describe this approach to someone and wrote this as a result. Comments are welcome.