The Right Tool For The Job Isn’t What You Think It Is
This tweet recently took me down a rabbit hole of ideas about software and the epiphenomena that we produce when we write it. As is often the case when I start thinking about something, other seemingly random events or articles bubble to the top of my consciousness or Twitter feed or whatever. Choose Boring Technology had recently popped up, linked from another article on architectural working groups and the idea of talking about technology choices. Outside of all that, I’ve recently been waking up at 1 in the morning thinking about some looming changes at work in our technology stack. It’s weird how the universe knows when you are ready for an idea and suddenly, you can tie multiple streams of thought into a coherent whole. Well, you can at least try. This post is an attempt to do that.
An epiphenomenon is a secondary effect of an action that occurs in parallel to the primary effect. The medical world is rife with examples of epiphenomena. I assert the software world is too, but that they are poorly documented or catalogued because they are primarily negative. I believe epiphenomena are what Michael Feathers is talking about in the lede. If you only see the effects of your software choices, you don’t really understand what you have built. It is only when you see the effects of the effect, the epiphenomena, that you really understand. I contend these are rarely technological in nature but are instead cultural and have wide ranging effects, many of them negative.
How is this related to choosing boring technology? Epiphenomena are much better known and much less widespread in boring, well understood technologies. When you choose exciting technologies, the related effects of the effects of your choices are deeper and broader because you understand fewer of the implications of the choice. These are the unknown unknowns that Dan talks about. We see this over and over in the tech space where people think that choices are made in a total vacuum with no organizational effects outside the primary technological ones.
Amazon is famous for its service oriented architecture. It sounds so dreamy. We’ll have services that allow us to iterate independently and deploy pieces independently and we’ll all be so independent. The problem is that independence requires incredible discipline, discipline that is paradoxically very dependent on everyone being on the same page about what a service looks like and what it has access to and how it goes about getting the data it needs to function. Without any of that very hard discipline that rarely seems to exist outside the Amazons of the world, what you have is not your dreamy Service Oriented Architecture but instead a distributed monolith that is actually a hundred times worse than the actual monolith you replaced.
I saw several people disagreeing with that tweet and wondered why it was so controversial. It dawned on me that the people disagreeing with it were developers, people deep down in the corporate food chain who have this idea of using the right tool for the job in all instances, which is great if you are a carpenter but fucking insane if you are a software shop. When a carpenter uses a miter saw instead of a hammer, it’s because you can’t cut a 2×4 with a hammer unless you are very very dedicated and also the shittiest carpenter in the world. However, when an engineer says “This is the job for Super Document Database (which by the way we’ve never once run in production)!” in his best Superman voice, he’s saying that in a total vacuum, a vacuum that doesn’t exist for the carpenter (and actually doesn’t exist for the engineer, he just doesn’t know it). Now you have your data in two places. Now you need different engineering rules for how it’s accessed, what its SLAs are, how it’s monitored, how it gets to your analytics team who just got blindsided for the fourth time this year with some technology, the adoption of which they had no input into, etc, etc, etc, until everyone in the company wants to go on a homicidal rampage.
Logical conclusion time: Imagine a team of 5 developers with 100 microservices. Imagine the cognitive overload required to know where something happens in the system. Imagine the operational overload of trying to track down a distributed system bug in 100 microservices when you have 5 developers and 1 very sad operations person. Ciaran isn’t saying it’s technologically a bad idea to have more services than developers. He’s saying it’s a cultural/organizational bad idea. He didn’t say it in the tweet or the thread because he didn’t have #280Characters or just doesn’t know how to express it. But that’s what he’s saying. It introduces a myriad of problems that a monolith or a very small set of team-owned or developer-owned services does not.
Our industry has spread this “right tool for the job” meme and to our benefit, it’s stuck. It’s to our benefit because we developers get to play with shiny jangly things and then move on to some other job. People who don’t have such fluid career options are then stuck supporting or trying to get information out of a piece of technology that isn’t the right tool for THEIR particular job. “The Job” is so much broader than the technological merits and characteristics of a particular decision. As Dan points out in his post, it’s amazing what you can do with boring technology like PHP, Postgres and Python. If you want to use something else, you better have a really damn good reason that you can defend to a committee of highly skeptical people. If you can’t do that, you use the same old boring technology.
Our industry and by extension our careers live in a paradox. On the one hand, a developer can’t write VB.NET his entire career because he’ll watch his peers get promoted and his salary not keep up with inflation and his wife leave him for the sexy Kotlin developer who just came to town. On the other hand, taking a multimillion dollar company that happens to use VB.NET and using that as an excuse to scorch the earth technologically speaking is in my mind very nearly a crime. There is a middle ground of course but it’s a difficult one, fraught with large falling rocks, slippery corners with no guard rails and a methed out semi driver careening down the mountain in the opposite direction you are going.
Changing technologies has impacts for different arms of the organization and I’ve found it useful to frame these in terms of compile time versus runtime impacts. Developers and development teams get to discover things at compile time. When you choose a new language, you learn it slowly over the course of a project or 4. But if you operate in a classic company where you throw software over the wall for operations, they get to find out about the new tech stack at runtime, i.e. at 3 AM when something is segfaulting in production. The pain of choosing a new technology is felt differently by different groups of the organization. Development teams have a tendency to locally optimize for pain, i.e. push it off into the distant future, because they are under a deadline and trying to get something, anything to work, and so decisions are made that put off a great deal of pain.
Technological change requires understanding the effects of the effects of your decisions. Put more succinctly, it requires empathy. It’s a good thing most developers I’ve known are such empathetic creatures. Sigh. Perhaps it’s time we start enforcing empathy more broadly. The only way I know to do that is oddly a technological solution. If you want to roll out some new piece of technology (language, platform, database, source control, build tool, deployment model or in the case of where I currently work all of the above), you have to support it from the moment it’s a cute little wonderful baby in your hands all the way up to when it’s a creaky old geezer shitting its pants and mumbling about war bonds. Put more directly, any time someone has a question or a problem with your choice, you have to answer it. You don’t get to put them off or say it’s someone else’s job or hire a consultancy to tell you what to do. If it’s broken at 3 AM, you get the call. If analytics doesn’t know how to get data out of the database, you get to teach them. If you fucked up a Kubernetes script and deployed 500 instances of your 200 line microservice, you get to explain to the CFO why the AWS bill is the same amount as he’s paying to send his daughter to Yale. Suddenly, that boring technology that you totally understand sounds fantastic because you’d like to go back to sleeping or drinking Dewar’s straight from the bottle or whatever.
We cannot keep existing as an industry by pushing the pain we create off onto other people. On the flip side, those people we have been pushing pain to need to make it easier for us to run small experiments and not say no to everything just because “it’s production”. There has to be a discussion. That’s where things seem to completely fall apart because frankly, almost no developer or operations person I’ve known has, when faced with a technological question, said “I know, I’ll go talk to this other team I don’t really ever interface with and see what they think of the idea.”
Software is just as much cultural as it is technological. Nothing exists in a vacuum. The earlier we understand that and the more dedicated we are to the impact and effects of that understanding, the happier we’ll be as teams of people trying to deliver value to the business. Because in the end, as Dan puts it, the actual job we’re doing is keeping the business in business. All decisions about tooling have to be made in that framework. Any tool that doesn’t serve that job and end is most decidedly NOT the right tool for the job.