Deploying Should Be a Non-Event (Lessons Learned)

A major tenet of Agile development is that, with good testing and continuous integration, deploying your web app to production should be a non-event: no surprises, no held breath and no fingers crossed when you push your latest changes “live”.

For our small team of three, deploying has been pretty easy and straightforward, for the most part. We’ve been pushing iterations of the project live since week one so that “the client” can show it to his team. After the initial deploy at the end of the first week, we’ve been iterating about every three days and pushing each update live. But not everything has been dandy in deploy-ville, and hopefully any other rookies out there can learn from our two crucial mistakes.

Our first big mistake was simply not getting into the right rhythm before deploying: making sure the tests pass, committing our changes, pulling from master to check for changes from the other guys, running the tests again, and only then deploying. Since this is the first big project in which any of us has used tests, we still haven’t completely settled into the testing rhythm, but after one major deploy hiccup we learned fast. About a week ago Paul did a batch of refactoring to pay down technical debt we’d built up. Everything looked OK on the front end, so we pushed it live. We brought up the live site and you could hear the explosion as it failed to load (I imagine the sound of canned goods, pots and pans falling off the shelves in a pantry).

We quickly ran the tests… and immediately open palms met foreheads. The tests told us exactly which changes we hadn’t made after the refactoring. A quick pass through the error messages from the failing tests, another push, and the site was up and running. (Yes, we need to set up tests that run automatically.) The lesson: always use a solid, test-based workflow, especially with all of the pushes, pulls, merges and deploys.
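Here is a minimal sketch of that rhythm as a small Python script. It is not our actual setup, just an illustration: it runs each step in order and stops at the first failure, so a red test run never reaches the push. The test command (`bundle exec rake test`), the remotes `origin` and `heroku`, the commit message, and the `deploy` helper are all assumptions made for the example.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

# Hypothetical steps; swap in your own test command, branch and remotes.
STEPS: List[Tuple[str, List[str]]] = [
    ("run tests", ["bundle", "exec", "rake", "test"]),
    ("commit", ["git", "commit", "-am", "Pre-deploy commit"]),
    ("pull master", ["git", "pull", "origin", "master"]),
    ("re-run tests", ["bundle", "exec", "rake", "test"]),
    ("deploy", ["git", "push", "heroku", "master"]),
]


def run_command(cmd: Sequence[str]) -> int:
    """Run a command and return its exit status (0 means success)."""
    return subprocess.run(list(cmd)).returncode


def deploy(
    steps: Sequence[Tuple[str, List[str]]],
    runner: Callable[[Sequence[str]], int] = run_command,
) -> List[str]:
    """Run each step in order, stopping at the first non-zero exit status.

    Returns the names of the steps that succeeded."""
    completed: List[str] = []
    for name, cmd in steps:
        if runner(cmd) != 0:
            print(f"Step '{name}' failed; stopping before anything else runs.")
            break
        completed.append(name)
    return completed
```

Calling `deploy(STEPS)` runs the real commands, while passing a fake `runner` lets you check the stop-on-failure behavior without touching git or Heroku. One wrinkle: `git commit` exits with a non-zero status when there is nothing to commit, so this sketch would halt at that step in that case.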

Our second major snafu came last Friday afternoon. Again, it was near the end of the day. Paul had spent some time refactoring to nip and tuck the week’s code (we have some queries that he’s trying to keep as lean as possible), and after checking his tests locally, merging with my code and checking the tests again, I pushed everything live. Paul was working remotely from Vermont, so he had wrapped up the day at four, and Dan had headed home to cover for his babysitter, who had cancelled… and I loaded up the page to the sound of more pots and pans falling off imaginary shelves. This time not a single page would load. Fortunately for me and my Friday evening, the “live” in-progress site isn’t actually something the client requested; it has mainly been for our own experience (and props to JC for recommending we shoot for three-day iterations).

So although I was able to go home without worrying about it too much, the broken site felt like a sliver under my skin, and I spent a little time on Saturday morning looking at the logs to figure out what happened. Fortunately, the problem wasn’t that severe and I knew we could fix it in a few minutes on Monday morning, but the lesson stared me straight in the face: we hadn’t set up our development environment to mirror our production environment. Specifically, we were using SQLite for our database locally while Heroku uses PostgreSQL, and the two occasionally handle numbers differently (integers, numbers as strings, Fixnums). Some of the refactored code was returning values that PostgreSQL couldn’t handle. We actually knew this was a problem because of an earlier error, and we’d spent at least half a day trying to get PostgreSQL running locally on my machine or Paul’s, but to no avail. Apparently OS X Lion includes a version of PostgreSQL that causes some conflicts, so we decided to de-prioritize that task. Again, a rookie move we don’t plan to repeat.
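For a feel of the kind of mismatch involved, here is a small sketch using Python’s standard-library `sqlite3` module. It isn’t our app’s code, and the `scores` table and `stored_types` helper are hypothetical; it only illustrates one general difference. SQLite uses type affinity rather than strict column types, so an INTEGER column quietly accepts a string like `'abc'` and stores it as text, whereas PostgreSQL rejects the same insert with an “invalid input syntax for type integer” error.

```python
import sqlite3


def stored_types(values):
    """Insert each value into an INTEGER column of a throwaway in-memory
    SQLite database and return (stored value, SQLite storage class) pairs."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical table; the column is declared INTEGER.
        conn.execute("CREATE TABLE scores (points INTEGER)")
        conn.executemany(
            "INSERT INTO scores (points) VALUES (?)",
            [(v,) for v in values],
        )
        # typeof() reports how SQLite actually stored each value.
        return conn.execute(
            "SELECT points, typeof(points) FROM scores ORDER BY rowid"
        ).fetchall()
    finally:
        conn.close()


print(stored_types([42, "42", "abc"]))
# → [(42, 'integer'), (42, 'integer'), ('abc', 'text')]
```

The well-formed string `'42'` is coerced to an integer, but `'abc'` slips through as text, so code that only ever runs against SQLite locally may never notice it is sending the wrong type until PostgreSQL sees it in production.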

And there’s actually one more lesson in there too: deploy early, and well before any deadline. As much as deploying should be a non-event, sh*t happens, so give yourself some room to breathe. Our final demo for the client will be Friday at 3 P.M., so we’ve scheduled our last major deploy for Thursday afternoon. That hopefully means a nice, relaxing Friday morning for gathering our notes and prepping for the demo, but you know how those best-laid plans go.