There’s a very undervalued skill I’ve observed over the years writing software, and that is complaining (to be valuable, you do have to be good at it). I’ve seen it in Amazon’s trouble ticket system, in open source communities, in questions on Stack Overflow, and pretty much anywhere else engineers collaborate. Learning to complain accurately and precisely is incredibly valuable and is a hallmark of great software engineers.
Creating and working tickets at Amazon was a great experience for developing this skill, and one I greatly under-appreciated at the time. At Amazon, every team has a ticket queue (or 5), and every problem can be assigned to a ticket queue. Tickets can move between queues on a whim, but tickets must always remain in a queue, and every queue has an owner — thus, every ticket has an owner. Additionally, there’s pressure from management to have fewer tickets in your queue at all times, and to have fewer ‘high severity’ tickets pass through your queue. The result of this environment is that when you open a ticket against another team, you want to be careful it doesn’t come back to your own queue (more on how this works below). Your software quality is measured indirectly through how many tickets you receive. (It’s more complicated than that, but the way I’ve seen it used, it is a valid proxy.)
Let’s take an example. I’m working on some software, and I find what I believe to be a defect in someone else’s code (Gasp!). I might open a ticket against the owner of that software to request a fix. Should they determine that their systems are working as designed (read as “you’re the one who made the mistake; check your own code”), they might reassign the ticket back to my queue — after all, they have determined they have no action to take to resolve the supposed issue. Yup, now both our queues’ metrics show that ticket passing through them, but I own the next step. This pattern creates at least a moral incentive not to let tickets you create come back to your own queue (from firsthand experience, it is rather embarrassing when it happens). The only way to get the other team to ‘accept’ the defect report is to demonstrate or provide evidence of the error. The result is that you learn to write great tickets. This same skill applies to (some) questions on Stack Overflow or issues on GitHub. Being able to precisely and accurately diagnose the issue by identifying expected behavior and contrasting it with the observed behavior takes time and diligence. Furthermore, it requires you to eliminate possible sources of error in code or systems you control before concluding the error lies outside of your domain.
Great complaints usually include enough detail to find and trace the issue in question (a request ID for log searching, along with the approximate date and time of the request), logs clearly demonstrating the symptom in question, and/or code that deterministically reproduces the issue. (If it’s non-deterministic, it’s an order of magnitude harder to diagnose … good luck.)
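To make that last point concrete, here’s a minimal sketch of what a deterministic reproduction might look like. Everything in it is invented for illustration — `parse_price` stands in for the suspect third-party code — but the shape is the point: state the environment, the input, the expected behavior, and the observed behavior, with no dependence on anything only you can access.

```python
"""Minimal reproduction: parse_price drops a cent on some inputs.

parse_price is a hypothetical stand-in for the code under suspicion;
it converts a decimal price string to integer cents.
"""
import sys


def parse_price(text: str) -> int:
    # The suspect implementation: float arithmetic then truncation.
    return int(float(text) * 100)


def report() -> tuple[int, int]:
    value = "19.99"
    expected = 1999            # what the documentation says we should get
    observed = parse_price(value)
    # State the environment up front so the owner can reproduce it.
    print("Python:", sys.version.split()[0])
    print(f"input={value!r} expected={expected} observed={observed}")
    return expected, observed


if __name__ == "__main__":
    report()
```

Run as-is, this prints `observed=1998` — the float nearest 19.99, times 100, truncates to 1998 — so the recipient can see the defect in seconds instead of asking you for more information.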
Once you’ve gone through this exercise a few times (and had other people politely reject your complaints due to lack of information), you start to get pretty good at creating tickets. Also, the exercise is valuable in itself, and often leads to a resolution of the issue before you can even complain. On numerous occasions I’ve started asking a question on Stack Overflow or GitHub and, in the course of writing the question and creating reproduction steps, solved my own problem. I’ll see a behavior that appears to be an error in someone else’s code; start drafting an issue; go looking for an example, some code, or logs to make my point; and discover the issue lies elsewhere. I’d say this happens 80% of the time I start a question or issue.
In my interactions with other engineers, it’s become clear that the ability to speak precisely about what is happening in code correlates with the ability to complain clearly. It’s hard to establish causality, but complaint quality appears to be highly correlated with engineering talent. I’m not sure I could actually screen for it, but it’s an interesting question.
On a final note, while on one of my previous teams, we used to joke that we would get a large stuffed bear. Before bothering another engineer for help, you would have to get the stuffed bear and explain your issue to the bear. If you were unable to demonstrate your issue to the bear, you couldn’t bother another engineer. This would simply force the thought process of stepping back from an issue to a higher level, explaining the inputs and outputs, and tracing the code through holistically — things like saying out loud what different variables contained. 90% of the time, you’d find your issue was a misassigned variable, or an instance of a class you were not expecting, or some other ‘menial’ error.
I swear, if we had actually gotten the bear, he would have immediately become the most helpful member of the team and surely improved our ability to troubleshoot issues.
Note: Just so there’s no confusion, I’m discussing one of many metrics Amazon uses internally, and even this one isn’t applied as bluntly as I might have described. The abstraction works for my purposes here, but please don’t jump to conclusions about Amazon’s management practices based on it. Also worth noting: it’s been more than a year since I last worked at Amazon, so it’s possible the anecdote is no longer accurate.