A systems analysis rubric

This is a systems analysis document rubric I’ve written several variations on in recent years. I’ve genericized it a bit and updated it with my current thinking. The form of this document is something a team would have in their official processes library somewhere, as a guide to how to do analysis of a fresh problem. I’ve had this blog post sitting 90% finished for a year now, so hey, here it is!

NB: I have come to believe that there is no one process that works for every team. The process that makes a team most effective is a process designed for that team, for their current project. Don’t be dogmatic about anything! Think about the true goal, which is to write good software that does what it needs to do, making its users happy while its authors have chill weekends. Take the ideas here and adapt them to what your team needs.

I no longer call this document an RFC, because I think this term comes with the implication of a slow-moving process, which has to solicit a lot of feedback because of its importance. This is perfect when you’re designing the fundamental protocols of the Internet; it is not quite what I find myself wanting my colleagues to do. I am using the term “systems analysis rubric” as I think about this task right now, because systems analysis is where my head is, and what I see missing from a lot of problem-solving.

“Problem statement” might also be a good name for this document, although I think it’s good to explore possible solutions in it as well as problems. Coming to a clear problem statement is possibly the most important task you have when you’re thinking about changing something or making something new.

Design documents: a systems analysis rubric

A design document is a structured way to have and record a conversation about a problem. It is not appropriate for all problems you might be solving. The formality and length of the conversation depend on the scope and complexity of the problem. For a bug fix, you might need a short conversation with a single colleague, plus commentary in a commit message. For a major project, this process might take weeks to complete and you might write several of these documents.

While the process does produce a document, the document is not the most important result. The important result of the design process is the exploration of the problem that writing the document encourages. The conversation that accompanies the exploration aligns you and your team on an understanding of the problem. Yes, a design doc might describe a proposed solution, but this proposal is secondary to a team’s collective understanding of the problem to be solved.

I’m going to hammer on this point as I go here. The document exists to promote exploration and shared understanding of the problem. The document is a tool in service of a more important goal.

The widening conversation

My design documents start as notes to myself. I attempt to structure my own thoughts about a problem by writing down what I’m thinking. The stakes are low; the document is so informal that it’s likely nothing more than bulleted lists of things that come to mind. As you go, your writing should tighten up and be more complete, but remember: the document is not the point. Don’t stress about sentence perfection.¹

The audience for the design document changes as it matures. When you are writing your first notes about a problem, you might share them only with a pairing partner to get immediate feedback. As you gain confidence in your understanding of the problem, widen the audience for your document. Seek out feedback from domain experts and from your team as a whole.

Show your design document to its stakeholders in advance of any public discussion, to give them a chance to think and give you feedback. Follow the principle of least surprise. People can react badly to surprises even if they agree with the proposal in the main. If you can, avoid introducing complex technical topics in meetings. Meetings are best used to solidify alignment or discuss specific known open questions.

When you reach the step of sharing your proposal with the entire engineering organization, it will be a solid document that you feel confident about.

The process of exploration

Step one: Research.

Investigate the background of the problem and document the current solutions, if they exist.
Document why the current solutions are inadequate, if relevant.
Gather relevant product documentation, if it exists. A product requirements document is ideal, and this phase might be focused on collaborating on requirements with a product team.

Step two: Write a clear problem statement.

What change would you like to effect upon the system?
What is happening today that you’d like to be different after the work you’re considering?
What are the properties of a successful solution? How will you know it’s successful?
Identify constraints on the solution space. Development time? Budget? Performance? A fixed point of integration?
Why is this the right problem to solve now?
What problems are you choosing not to solve right now?
Refine your problem statement until the team aligns on it.

Step three: Explore possible solutions.

Identify and consider possible solutions.
Discuss tradeoffs inherent in the solutions. Evaluate them against the constraints.
Estimate costs of the solutions, in time / effort / complexity / maintenance / hiring.
If necessary, do spike implementations to test the validity of assumptions or the viability of a specific approach.

Step four: Reach consensus on a solution that solves the stated problem while making acceptable tradeoffs.

Sometimes step four does not end in consensus on a solution, but instead ends in a decision to do further research. This is a good result and should not be treated as a negative by the team.

The design document should now be a document describing the problem, the research, and the possible solutions, and conclude with a plan of action. Congratulations! Archive the final version in the corporate wiki or in a docs folder for the resulting project. Its next audience is the person working on its replacement, who you’ve just given a good head start.

Now let’s review the parts again, in more detail.

The problem statement

You’ll start with something you think is a good problem statement, but you will often find that it doesn’t go into enough detail to support a good technical decision. Constraints might be missing. Stakeholders might disagree on what success looks like. Important implicit requirements might need to be unearthed.

The initial problem statement informs your research, but expect to change it. Push on it and iterate until you have something the team agrees on.

Among the constraints you implicitly take on for any project are your team’s shared values. If your team hasn’t discussed those values, now is a good time to do so. Your shared values are partly a reflection of your team’s personality and culture, and partly a reflection of where your business is. A team at a new startup trying to ship something quickly for survival might value a minimal solution that can be produced rapidly. The same team following up after a successful first ship might value flexibility instead. Make the implicit explicit and state any values that might affect this project.

Detail on the research step

Do not short-change this step! This is critical to understanding the problem. Do the background research if there is extant code. Summarize that research, with relevant links, so your readers can also understand the context.

Answer scaling questions if they’re relevant. Gather numbers for today, a year from now, and as far in advance as a reasonable guess can be made. Does your solution to the problem have a lifespan? If so, don’t look beyond it.
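A rough way to turn “today, a year from now, and further out” into numbers is to compound today’s figure forward. This is only a sketch: the function name, the 2 million row starting point, and the 5% monthly growth rate are all made-up assumptions, and real growth is rarely this smooth.

```python
def project_volume(current: float, monthly_growth: float, months: int) -> float:
    """Compound a current volume forward at a fixed monthly growth rate."""
    return current * (1 + monthly_growth) ** months


# Hypothetical inputs: 2 million rows today, growing 5% per month.
rows_today = 2_000_000
for label, months in [("today", 0), ("one year", 12), ("three years", 36)]:
    rows = project_volume(rows_today, 0.05, months)
    print(f"{label}: {rows:,.0f} rows")
```

The point of running the numbers is less the exact figure and more the order of magnitude, which tells you whether the questions that follow matter yet.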

For data being stored and manipulated, you might ask questions like these:

How much data is being discussed? Is it large in total size or in quantity?
What actions are taken on this data? How often does it change? In what quantity?
Who is changing this data?
What are the constraints on data changes? Are there any conflict resolution requirements? Do operations need to be serializable (expensive) or will idempotency suffice (cheap)?
What happens if data mutations are lost?
How is this data expected to grow over time? Is it shardable if massive growth is expected?

If the data is very, very large, the questions become more specialized. If you are not a data engineer, you might want to consult one.
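To make the “will idempotency suffice” question concrete, here is a minimal sketch of one common way to make a mutation idempotent: tag each operation with an ID and ignore replays. The `Account` class and the operation IDs are hypothetical, and this says nothing about ordering between different operations, which is the harder problem serializability addresses.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    balance: int = 0
    applied_ops: set[str] = field(default_factory=set)

    def apply_credit(self, op_id: str, amount: int) -> bool:
        """Apply a credit once; a replay with the same op_id is a no-op."""
        if op_id in self.applied_ops:
            return False
        self.balance += amount
        self.applied_ops.add(op_id)
        return True


account = Account()
account.apply_credit("op-1", 50)
account.apply_credit("op-1", 50)  # retried delivery of the same operation, ignored
account.apply_credit("op-2", 25)
print(account.balance)
```

If a scheme like this covers your conflict cases, a lost or duplicated message becomes something you can retry safely rather than something you have to coordinate around.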

For APIs, the questions might look like this:

What other systems are expected to call this API? To do what tasks?
What are the latency requirements?
Is this operation write heavy or read heavy?
How many requests/sec do we experience at peak? How will this number change over time in relation to business growth?
Does peak load differ from steady state load? When is the load heaviest? Does this correlate with other usage patterns in the system?
If you’re caching expensive work product, identify how you’ll be invalidating that cache. What fails if the cache is stale? (Do you really need a cache? Really?)
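If the answer to the cache question is yes, it helps to write down both invalidation paths before building anything. The sketch below is an illustration under assumptions, not a recommendation: `TTLCache` is a hypothetical in-process cache with time-based expiry plus explicit invalidation, and the injectable clock exists only to make it easy to test.

```python
import time
from typing import Any, Callable


class TTLCache:
    """Tiny in-process cache: entries expire after ttl_seconds or on invalidate()."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}

    def get(self, key: str, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh
        value = compute()  # missing or stale: redo the expensive work
        self._entries[key] = (now, value)
        return value

    def invalidate(self, key: str) -> None:
        """Call this from whatever code path changes the underlying data."""
        self._entries.pop(key, None)
```

Every write path that forgets to call `invalidate` serves stale data for up to the TTL, so the “what fails if the cache is stale?” question decides how long that TTL can be.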

Failure analysis is next. This topic can be where engineers shine, because we love discussing how things fall over.

How might this system fail?
What are the consequences of failure for this system?
Should any of these failures be visible to or actionable by the end-user? If so, how should they be presented?
How should we handle the most important or unusual invisible-to-users errors? Retry? Escalate to human beings? Log and move on?
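The three options in the last question (retry, escalate, log and move on) can be written down as an explicit policy. Here is a minimal sketch, assuming a hypothetical `TransientError` type that marks failures worth retrying and an `escalate` callback that stands in for paging a human; backoff between attempts is left out for brevity.

```python
import logging
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
log = logging.getLogger("failure_policy")


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def run_with_policy(
    task: Callable[[], T],
    max_attempts: int,
    escalate: Callable[[Exception], None],
) -> Optional[T]:
    """Retry transient failures; escalate anything else or exhausted retries."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Exception = RuntimeError("task never ran")
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except TransientError as exc:
            # Log and move on to the next attempt.
            log.warning("attempt %d/%d failed: %s", attempt, max_attempts, exc)
            last_error = exc
        except Exception as exc:
            # Unknown failure: do not retry blindly, hand it to a human.
            escalate(exc)
            return None
    escalate(last_error)
    return None
```

Writing the policy as code forces the conversation about which errors belong in which bucket, which is usually the part the team has not agreed on yet.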

What are the security concerns? Do a threat modeling exercise with security experts early, particularly if you’re doing something new or not handled by existing tools.

Are you accepting untrusted user input? How do you need to handle it?
Who is allowed to perform these operations or see this data?
Are you managing data that needs to be protected or encrypted?
What would an attacker gain if they got access to your data or your API?
What would a person with bad motives do if they had normal access to this new functionality?

The appropriate questions to ask depend on what your area of work is and what “affair of the world” it addresses. These questions are intended to get you started.

Problem statement (slight return)

Come back to your initial problem statement. Can you sharpen it? Can you clearly define what a successful solution might look like now? If you’ve done the research, you probably can.

Don’t move forward until you have consensus that the problem statement is good.

Solutioneering

This is where programmers love to be. We are problem-solvers and we want to jump right to solving problems, especially if we can write code to do it. Resist this urge. Your solutions have a better chance of success if they are informed by a solid grasp of the problem you need to solve. Your second and third refinements of a solution are likely to be better than your first.

This step is often focused on navigating tradeoffs. The problem statement, if it’s sharp enough, gives you a good razor to use to evaluate solutions against your success criteria.

What are the costs of a possible solution? How complex is it?

Give the solution a t-shirt size. Does it match the time budget the project has?

What are the risks in the solution? How might it fail to solve the problem or otherwise fail as a project?

What’s the solution’s blast radius? That is, how many other systems would be affected by the work? How many teams?

Does the solution introduce new technologies to the overall system, or does it leverage tools your team understands well? If it spends novelty points, do they buy you something worth the expense?

Does the solution align with the team’s values?
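If it helps to see the questions above side by side, you can write the candidates down as data and check them against the constraints from the problem statement. This is a sketch only: the candidate names, the mapping from t-shirt size to weeks, and the limits are all invented for illustration, and a filter like this supports the team’s discussion rather than replacing it.

```python
from dataclasses import dataclass

# Assumed mapping from t-shirt size to engineer-weeks; calibrate for your team.
SIZE_WEEKS = {"S": 1, "M": 3, "L": 8, "XL": 16}


@dataclass(frozen=True)
class Candidate:
    name: str
    size: str              # t-shirt size
    teams_affected: int    # blast radius
    new_technologies: int  # novelty points spent


def within_constraints(
    c: Candidate, budget_weeks: int, max_teams: int, max_novelty: int
) -> bool:
    """True if the candidate fits the time, blast-radius, and novelty limits."""
    return (
        SIZE_WEEKS[c.size] <= budget_weeks
        and c.teams_affected <= max_teams
        and c.new_technologies <= max_novelty
    )


# Hypothetical candidates for a hypothetical project.
candidates = [
    Candidate("extend existing service", "M", 1, 0),
    Candidate("new event pipeline", "XL", 4, 2),
    Candidate("adopt managed queue", "L", 2, 1),
]
viable = [
    c.name
    for c in candidates
    if within_constraints(c, budget_weeks=8, max_teams=2, max_novelty=1)
]
print(viable)
```

Anything that survives the filter still has to pass the harder tests: the risks, and whether it fits the team’s values.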

In many cases the right solution will feel good to the team discussing it, and you’ll reach consensus smoothly. When information, values, and understanding of the problem are shared, alignment is easy. If consensus is not happening, make an attempt to figure out why the team is not aligned. Is there a disagreement about values? An information disparity? Is more research needed? Bring in senior staff to help break stalemates. Bring in somebody from another team who has relevant experience. Remember that the project might need to move forward anyway because of business needs, and a half-good solution might be better than no solution in the short term.

The document’s final home

I end up making a design or docs subfolder in the code repo for these documents. Your organization might have an official home for documents that isn’t the repo. I suggest that you at least store a copy next to the code, where it will survive as long as the code does. The document will drift out of sync with reality the instant anybody starts implementing the plan, but that is fine. The document exists to help future maintainers understand what their predecessors were thinking at the time.

Remember: the act of writing the document is more important than the document. The sharp problem statement and shared understanding of the solution were the goals of the exercise. If it got you there, it was good enough.

Additional reading

The Rust RFC process discusses the importance of the conversations.
Architectural Decision Records
A Structured RFC Process by Phil Calçado talks about the benefits of widening the circle of review.

¹ Correctness in these details can make an unconscious impression on readers that matters, so if you have the time, hey, spell-check yourself. The opposite side of this is that you as a reader of design documents need to set aside your own fussiness about spelling and grammar, should you have any, especially if the author is not a native speaker of the language they’re writing in. These things are to the side of the problem.