Easily understood data processing

Marcus • June 30, 2020

Rolling up a piece of Sushi by hand.

A lot of what is written as part of a software implementation is very concrete: Using the tools that the standard libraries and your external and internal dependencies provide, you tell the computer very explicitly what to do and when.

Your implementation is written on a completely different level than any business requirements you’re tasked with solving, though. To provide the most useful tools, understanding the workflows that software solves is essential. The output that’s relevant to you might mean nothing to the business, though, and if you want other departments to understand how data is processed, it needs to be abstract enough.

Imagine you had no code

Here’s what I can do, and what a business department typically can’t: I can look at the code with my own eyes and learn from what I can or can’t see. Given a language you’re reasonably familiar with, that shouldn’t be impossible.

But for the business departments, here’s the essential question: What would they need to understand your process?

The only reasonable way I can see to reach a basic level of understanding is to look at the inputs and outputs and try to work out how they match. When the association is trivial, I can tell from data alone how the process works. The more complicated the software is, the more insight is necessary.

And just before you start randomly hacking things together for your next solution where you want to provide business-readable output, just consider this for a moment: What is the most helpful approach to support the business departments’ understanding?

Abstractions

Software is a very precisely defined order of steps that are required: Looking into a single function provides you no useful meaning in the grander scheme of things, and even on a class level you’re often left with finely tuned building blocks for the larger process to use.

Now if you were to explain this process to the business on this level, you would be aiming for a high level of confusion: Your software is essentially a black box to most if not all of your end users, and many are quite happy with not knowing the finer details.

Finding the right fit for abstractions in code is a herculean challenge, to be sure. If you have the wrong abstractions, your code can be vastly harder to understand – while the right abstractions make it plenty easier for new developers to work with.

If you’re a developer who’s putting a greater process together, it may very well be that you have all the details necessary to understand the inner workings: You need to know how the database is structured, how certain objects are loaded and the like. Will that information, if you write it to standard output, help any other developer understand your process?

Reasoning about processes

There are about two states from which I would estimate most of us peek into any particular problem, and whether you start at the first or the second one can largely be influenced by how knowledgeable the people around you are.

If I have trouble understanding the process to begin with, the last thing I need is highly detailed output that goes into every little aspect of what is done when. Looking at problems, I personally need to have a very general idea of how things fit together. Wouldn’t I be much better off if there were reasonable generalizations in place, where I could see what logical (rather than technical) steps are undertaken as part of a process? Wouldn’t that also benefit the business side a lot?

And when I already have an understanding of a process, wouldn’t tests, documentation or debugging help provide deeper insights into singular units of code? Wouldn’t they also help me understand the structure? Isn’t that sort of knowledge also entirely foreign to a business department, for good reason?
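One way to serve both states from the same code is to split output by log level. The following is only a minimal sketch: the logger name, the function and the order fields are made up for illustration. Logical steps go to INFO, which is what a business reader would see, while technical details go to DEBUG and stay hidden until a developer asks for them.

```python
import logging

# Hypothetical logger name, chosen for this sketch only.
log = logging.getLogger("order_import")


def import_orders(rows):
    """Keep orders with a positive amount and narrate the logical steps."""
    # Logical step: readable without knowing the code.
    log.info("Step 1: received %d orders", len(rows))

    accepted = []
    for row in rows:
        # Technical detail: only interesting while debugging.
        log.debug("checking row %r", row)
        if row.get("amount", 0) > 0:
            accepted.append(row)

    # Logical step: the outcome, in business terms.
    log.info(
        "Step 2: accepted %d orders, skipped %d",
        len(accepted),
        len(rows) - len(accepted),
    )
    return accepted


if __name__ == "__main__":
    # INFO shows only the logical steps; switch to DEBUG for the details.
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    import_orders([{"id": 1, "amount": 10}, {"id": 2, "amount": 0}])
```

Running it at INFO level yields two lines that describe the process; switching to DEBUG adds the per-row output for whoever needs to dig deeper.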

When things break

Bizarrely enough, we’re not living in a perfect world, and there’s nothing else that’s as guaranteed as the eventual day where you’ll run into a constellation of data and behaviour you didn’t expect. That may be as soon as you test your code, after it’s shipped to production, or some months later.

There’s a multitude of things that can go wrong. The database could stop working, or you’re working with an unexpected constellation of data, but alas: If your processing is aimed at being transparent, then any such failure should include the details to understand what went wrong.

When things go okay, limiting yourself to outputting only the logical steps is fine; after all, by now that should simply describe what your software does on a higher level.

When something fails, it’s very possible that the library you use to interact with the failing component can already tell you what’s wrong – anything from a SQL error code to something completely meaningless. The more context you can provide as to where you were when it all fell apart, the better.

If your business department can fix the error by providing other inputs, configuring something, etc., all necessary steps to address the issue should be present – in as simple a way as possible. And if you can offer to fix it, then that’s making everyone’s life easier.
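As a rough illustration of carrying that context along, here is a sketch with a hypothetical error type. `ProcessingError`, `parse_amount` and the wording of the messages are assumptions made for this example, not part of any library. The error records the logical step, the affected item, the cause and, where one exists, a hint on how to fix it, while the original technical exception stays attached for developers.

```python
class ProcessingError(Exception):
    """Hypothetical error type that carries business-readable context."""

    def __init__(self, step, item, cause, hint=None):
        self.step = step    # logical step that was running
        self.item = item    # which input was affected
        self.cause = cause  # what went wrong, in plain words
        self.hint = hint    # how the business side can fix it, if known
        super().__init__(self.describe())

    def describe(self):
        text = f"Failed while {self.step} for {self.item}: {self.cause}."
        if self.hint:
            text += f" To fix this: {self.hint}"
        return text


def parse_amount(order_id, raw):
    """Convert the raw amount of an order into a number."""
    try:
        return float(raw)
    except ValueError as exc:
        # Keep the technical exception as the cause, add the context on top.
        raise ProcessingError(
            step="reading the order amount",
            item=f"order {order_id}",
            cause=f"'{raw}' is not a number",
            hint="correct the amount in the upload and run the import again.",
        ) from exc
```

Calling `parse_amount(42, "abc")` raises an error whose message names the step, the order and the fix, and whose `__cause__` still holds the original `ValueError` for anyone who needs the technical side.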

Aggregations

Now that we’ve established that all output provided should be reasonably useful to your business department, we can take this a step further.

Rather than delving deep into the details of a full run of your process, we can aggregate the resulting data: If no problem occurred while handling a certain subset of inputs, their presence is important, but their details are not.

Just imagine if you were to monitor your business processes for a moment: If everything went smoothly, the number of items handled may be relevant just to see if it’s in a good ballpark.

Yet if you had problems or worse, that’s especially what the focus should be on – make it clear and concise that problems occurred and how many items are affected vs. non-affected. The key problems should be obvious, without scrolling through hundreds of lines of “this was processed without problems” and hoping that you’re not accidentally scrolling past the issues present.
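A summary along those lines could look like the following sketch. The shape of the results (pairs of item id and problem text, with `None` meaning success) and the function name are assumptions made for this example. Successful items only show up as a count, while problems are grouped and listed with the most frequent first.

```python
from collections import Counter


def summarize(results):
    """Summarize (item_id, problem) pairs; problem is None on success."""
    results = list(results)

    # Group identical problems instead of printing one line per item.
    problems = Counter(
        problem for _, problem in results if problem is not None
    )
    failed = sum(problems.values())
    ok = len(results) - failed

    lines = [
        f"{len(results)} items handled: {ok} OK, {failed} with problems"
    ]
    for problem, count in problems.most_common():
        lines.append(f"  {count} x {problem}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(summarize([
        (1, None),
        (2, "missing customer number"),
        (3, None),
        (4, "missing customer number"),
        (5, "amount is not a number"),
    ]))
```

For a run without problems the output shrinks to a single line, which is enough to check whether the number of items is in a good ballpark.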

Key Takeaways
