Get Good at Rolling Back
The only thing constant in enterprise technology is change itself. If you boil it down, that’s the essence of our jobs. We change things: applications, web services, networks, desktops, databases, architectures, you name it. That’s the bulk of the work. Keeping systems humming without any change isn’t really that hard. It’s all of the new business requirements, refactoring, security patches, technology upgrades, organic growth, and audit remediation that makes life exciting, isn’t it?
With all of this change, you’d think we’d be naturally good at it, but I’ve found that many organizations struggle with it. Due to the 24 x 7 x 365 nature of global enterprise business, it can be hard to find a change window. When we get a change window, we hold onto it because we don’t know when the next one is going to be. Also, because we are overly optimistic estimators, we tend to chew up most of our lead time developing our change, not testing it or preparing its deployment and rollback. All of this culminates in highly risky do-or-die change deployments. We put tremendous pressure on ourselves with all sorts of heroics because “failure is not an option.”
This is a pretty tough challenge. One solution is to increase the budget and schedule of every project so there is ample time and resources to do everything that’s needed. As much as I’d love for that to happen, I don’t see that as realistic. The business expects us to deliver value faster and cheaper, not slower and more expensive. Bigger budgets and longer schedules are just not going to happen.
There is another solution that is worth exploring. Move away from huge, infrequent changes. Break things up into small, frequent deployments. Make the architectural investments necessary to make your systems changeable without business disruption. This is easy to say, hard to do, but the pay-off is big.
First, there is a technical modernization component to this. Second, there is a process modernization component to this, namely DevOps. Third, there’s a mental component. I’m going to zero in on the mental component.
Let’s be honest. We hate rolling back. It feels like failure. We may even call it failure. If so, that’s a problem. Say we muster the discipline to deploy our change, validate our change, and, if it doesn’t pass validation, roll back our change within the change window. That’s still success. The alternative is to try to fix on failure, pull an all-nighter, and let the business deal with the fallout in the morning.
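To make that discipline concrete, here is a minimal sketch of a change run that treats a timely rollback as a normal outcome. Everything in it is hypothetical: run_change, the deploy, validate, and roll_back callables, and the idea of reserving part of the window for the rollback are illustrative assumptions, not a prescribed tool or process.

```python
import time
from typing import Callable


def run_change(
    deploy: Callable[[], None],
    validate: Callable[[], bool],
    roll_back: Callable[[], None],
    window_seconds: float,
    rollback_reserve_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Deploy, validate, and roll back inside one change window.

    All names here are hypothetical. Returns "deployed" or "rolled_back".
    """
    start = clock()
    # Assumption: we hold back enough of the window to roll back safely,
    # so the keep-or-roll-back decision has its own, earlier deadline.
    decision_deadline = start + window_seconds - rollback_reserve_seconds

    deploy()

    # Keep the change only if we are still ahead of the decision deadline
    # and validation passes. Anything else triggers a rollback.
    if clock() <= decision_deadline and validate():
        return "deployed"

    roll_back()
    return "rolled_back"
```

Both return values count as success in the sense described above: the business is protected either way, and a rolled-back change can be tried again in the next window.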
One way to get past the mental hurdle is to have another change window right around the corner. Getting access to schedule another change should be easy. If it’s not, then you are going to load up the risk into the window you have. The way to earn those change windows is to make your changes small and non-disruptive. Then, if it doesn’t work today, you can try again tomorrow without penalty. You shouldn’t have to wait another month or quarter.
Get good at rolling back. I like to celebrate rollbacks. I recognize my team when they deploy a change and quickly roll back before sustaining significant business impact. There’s always time to try again soon. Frankly, it’s good practice. If you are constantly operating in “failure is not an option” mode, then chances are you don’t know how to roll back even if you really need to, and you might have a full-blown disaster recovery event in your future. I like my teams to be good at rolling back and comfortable making the call to roll back early in the window.
Rollbacks in enterprise technology have a negative connotation. We need to become more like Walmart, where rollback means “price reduction.” We should have the same smiley emoji excitedly declaring that we are always rolling back because we know how to protect the business from failed changes. How do you know if it’s working? Track your change-induced incidents. Getting good at rolling back should drive down their frequency, severity, and duration. Go ahead, be a little like Walmart. It’s ok.
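As a rough illustration of tracking change-induced incidents, the sketch below summarizes frequency, severity, and duration for a reporting period. The Incident record, its field names, and the severity scale (1 is most severe) are assumptions made for the example, not an established schema.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional, Union


@dataclass
class Incident:
    """A hypothetical record of one incident caused by a change."""
    change_id: str
    severity: int               # assumed scale: 1 = most severe
    minutes_to_restore: float   # how long the business felt the impact


def summarize(
    changes_deployed: int, incidents: List[Incident]
) -> Dict[str, Union[float, Optional[int]]]:
    """Summarize frequency, severity, and duration for one period."""
    if changes_deployed <= 0:
        raise ValueError("changes_deployed must be positive")
    return {
        # Frequency: incidents per 100 changes deployed.
        "incidents_per_100_changes": 100 * len(incidents) / changes_deployed,
        # Severity: the worst (lowest-numbered) severity seen, if any.
        "worst_severity": min((i.severity for i in incidents), default=None),
        # Duration: mean time to restore service, 0.0 when nothing broke.
        "mean_minutes_to_restore": (
            mean(i.minutes_to_restore for i in incidents) if incidents else 0.0
        ),
    }
```

Comparing these numbers period over period is one way to check whether smaller changes and earlier rollback calls are moving all three in the right direction.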