Software Gambles
Bear with me, this isn’t going to sound like a software essay for a little bit. But trust me, I’ll get there.
Ask a random sampling of people who know me at a level somewhere north of “mere acquaintance” what one word they would use to describe me and I’d bet at least 30% of them would say “gambler”. I’m not going to get into the details of why that might be the case here on the public internet, but the moniker might be warranted based on certain extracurricular activities. On the surface, this seems weird because I’m not much of a chance-seeking, thrill-riding enthusiast in regular life. But I do love action when it comes to football games, casinos and golf matches. Early on in my gambling career, there were lots of losses and not so many wins. Gambling, like any other skill, involves some experiential instruction that can’t be readily gained from reading about it on the internet. But there’s a dirty little secret to gambling that most people on the outside looking in don’t understand. Good gamblers rarely take risks on the unknown or on combinations of bets because they know that, as a general rule, you have a much higher chance of going broke when you do. The fat tail of gambling failure is similar to the old adage about the stock market: it can stay irrational a lot longer than you can stay solvent. Betting on 10 games on Sunday is a fast way to go broke unless you have very real, very hard empirical data that says you can win 56% of the time. That’s all it takes to be a very successful sports bettor, which is a shocking fact to many people and one reason why you should never, ever trust someone selling picks who claims greater than 57% winners. If they could really pick ’em at that clip, they wouldn’t be selling picks.
Let’s say you really can pick football (or basketball or whatever) winners at 55%. Let’s say you have a $2000 bankroll and you bet the recommended amount of 5% of your bankroll on any given bet. If you bet one game on Sunday, you have a 55% chance of winning $100 and a 45% chance of losing $110 (the extra $10 is the service charge the book extracts, that’s another post all to itself), all else being equal, for an expected profit of $5.50. Sweet, we’re going to be rich! Seems like we should bet as many games as we can then, right? Well, no. For one thing, chances are you don’t actually pick at 55%. You pick 55% right on games you fully understand and that you have studied. Others, you might not have a clue about. Also, even if the results follow a perfectly ordinary distribution AND you actually do pick every game at 55%, there is a chance you will lose every single game over the course of 2 weeks and go broke. The chance is astronomically low but it exists. And that’s why most professional gamblers don’t bet lots and lots of games every weekend. Limit your risk by taking singular and calculated gambles that you can control for.
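For anyone who wants to check the arithmetic, here is a rough sketch of the numbers above. The function names are made up for illustration, and the 20-bet figure is my own assumption (10 games a Sunday for 2 weeks, flat $110 risked each time), not anything a sportsbook publishes.

```python
def expected_profit(p_win, win_amount, risk_amount):
    """Average profit per bet: win `win_amount` with probability p_win,
    lose `risk_amount` otherwise."""
    return p_win * win_amount - (1 - p_win) * risk_amount


def break_even_rate(win_amount, risk_amount):
    """Win rate at which the expected profit is exactly zero."""
    return risk_amount / (win_amount + risk_amount)


def prob_lose_all(p_win, n_bets):
    """Chance of losing every one of n_bets, assuming independent games."""
    return (1 - p_win) ** n_bets


P_WIN = 0.55          # the hypothetical picking rate from the text
WIN, RISK = 100, 110  # risk $110 to win $100
N_BETS = 20           # assumption: 10 games a Sunday for 2 weeks

print(f"expected profit per bet: ${expected_profit(P_WIN, WIN, RISK):.2f}")
print(f"break-even win rate: {break_even_rate(WIN, RISK):.1%}")
print(f"chance of losing all {N_BETS} bets: {prob_lose_all(P_WIN, N_BETS):.2e}")
```

Under those assumptions the edge is about $5.50 a bet, the break-even rate sits a bit above 52%, and the lose-everything streak is on the order of one in ten million. Tiny, but not zero, and it only gets worse if your real win rate on the games you haven’t studied is lower than 55%.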
What does this have to do with software? This essay on building stable systems contains a treasure trove of important ideas for developing good software but one that stood out to me was this paragraph:
A project usually have a single gamble only. Doing something you’ve never done before or has high risk/reward is a gamble. Picking a new programming language is a gamble. Using a new framework is a gamble. Using some new way to deploy the application is a gamble. Control for risk by knowing where you have gambled and what is the stable part of the software. Be prepared to re-roll (mulligan) should the gamble come out unfavorably.
Note the intersection of ideas between actual gambling and gambling on your software projects. Limit your risk by limiting your gambles. At work, I’m currently involved in a high-priority project that has the potential to substantially shift the types of products we can offer our customers. It’s actually been on the books for over two years with fits and starts but finally has the political backing to get it done. Now to me, a high-priority, high-visibility project like this is in and of itself a gamble. On top of that, this particular project is different from our current setup in a few important ways, which increases the risk. That alone should be enough to say: “let’s not introduce any more risk into the project.” Instead, for a variety of reasons both political and technical in nature, we are attempting to deliver this project using a new communication framework (RabbitMQ), integrating a new database (Couchbase), monitoring it using a new stack (ELK), deploying it using a new tool (Octopus Deploy) and possibly utilizing an offshore team in Russia. As exciting as all that sounds technically (except for that last part, that gives me nightmares), it seems to me a project fraught with risk. If our chance of success at just one of those things is somewhere in the realm of 80%, the chance of getting all five right is only about one in three (0.8 multiplied by itself five times is roughly 0.33). That math assumes each event is unrelated to the others. This isn’t necessarily true if some of the probabilities are related and work in each other’s favor (see Bayes’ Theorem), but I seriously doubt implementing RabbitMQ is going to drastically increase the success rate of a Couchbase implementation. Instead of limiting our risk, this project is taking on scope like the Lusitania took on water.
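To see how quickly stacked gambles eat into your odds, here is a back-of-the-napkin sketch. It assumes every gamble is independent and uses the 80% guess from above for each one; both are my assumptions for illustration, not measured numbers, and the helper name is made up.

```python
from math import prod

# Assumption: each new piece has the same guessed 80% chance of going well.
gambles = {
    "RabbitMQ": 0.8,
    "Couchbase": 0.8,
    "ELK": 0.8,
    "Octopus Deploy": 0.8,
    "offshore team": 0.8,
}


def chance_all_succeed(success_rates):
    """Probability that every gamble pays off, assuming independence."""
    return prod(success_rates)


# Show the odds shrinking as each gamble is added to the project.
running = 1.0
for name, p in gambles.items():
    running *= p
    print(f"after adding {name:<15} {running:.0%}")

print(f"all {len(gambles)} together: {chance_all_succeed(gambles.values()):.1%}")
```

One gamble at 80% is a bet I’d take. Five of them together come out to roughly one in three, which is a bet I’d walk away from at the window.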
None of this means the project will be a failure. But what it likely means is that many of the gambles added to the project will result in poor implementations that hurt our chances of success in the medium to long term. This is not the way to build a stable system. So how do we manage the risk? One way is to push back on all the technological scope. This is possible but difficult in an environment where there are competing interests above and beyond the success of the project. Delivering X is great for the company but delivering X with Y new technologies is better for N teams. Saying no means some teams have their darlings at least pushed off into the future if not killed. The problem with this is that my team doesn’t control all these decisions. Another way might be to utilize one technology (RabbitMQ for instance) to ease the risk of another one (Couchbase): by doing database writes via a queue, we could write to both the new and old databases to ensure success. This is something the team does have control over and that we will probably implement. Another way is to leverage the expertise of other teams/people for particular pieces (DevOps controls Octopus). But each of these is just a Band-Aid on the larger wound of too much risk in a single project.
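The queue-based hedge is easier to picture with a toy version. This is only a sketch of the idea: an in-process `queue.Queue` stands in for RabbitMQ and plain dictionaries stand in for the old and new databases. Every class and function name here is hypothetical, and none of it is real RabbitMQ or Couchbase client code.

```python
import queue


class InMemoryStore:
    """Hypothetical stand-in for a database: just a dict of key -> value."""

    def __init__(self, name):
        self.name = name
        self.rows = {}

    def write(self, key, value):
        self.rows[key] = value


def drain(write_queue, old_store, new_store):
    """Apply each queued write to the old (trusted) store first, then try the
    new store. A failure on the new store is recorded, not fatal, so the
    gamble on the new database can't take the whole system down with it."""
    failed_on_new = []
    while True:
        try:
            key, value = write_queue.get_nowait()
        except queue.Empty:
            break
        old_store.write(key, value)      # the stable part of the system
        try:
            new_store.write(key, value)  # the gamble
        except Exception:
            failed_on_new.append(key)    # keep the key so it can be replayed
    return failed_on_new


writes = queue.Queue()
writes.put(("order-1", {"total": 100}))
writes.put(("order-2", {"total": 110}))

old_db = InMemoryStore("old")
new_db = InMemoryStore("new")
missed = drain(writes, old_db, new_db)
print(f"old: {len(old_db.rows)} rows, new: {len(new_db.rows)} rows, missed: {missed}")
```

The point of the design is the mulligan: if the new database turns out to be a bad bet, the old one still has every write and we can re-roll without losing data.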
The right way to have a successful project and move towards a stable system is to bite off only as much risk as you can hedge. Each of the tenets in that essay can be used to build a stable system but it involves engineering discipline and political understanding to get there. If you watched the Republican debate tonight, you know political understanding is a dying characteristic in our society. In the interim, the best I can do is protect the team from the risks to the best of my ability and let strong engineering rise to the top. And hope that the next big bet I make only includes a single gamble. I may or may not like the Steelers at 10 to 1 to win the Super Bowl next year. 🙂