The burden of complexity

As creators of software, we face the overwhelming task of delivering quickly while anticipating the future at the same time. If we are slow at delivering, our managers complain, but if we don’t design the solution properly, the architect complains! Finding the balance is hard, and in this post I will talk about some common pitfalls that lead to overly complex solutions: solutions that take longer than necessary to build and that are difficult to maintain.

Premature use of Patterns

Patterns are a great way to capture a concept in a general description that can be reused over and over again. The Gang of Four presented a large collection of patterns in 1994. These have been used widely since, and many of them have either been incorporated into the technical vocabulary or even built natively into programming languages. While patterns are great at providing a collection of well-thought-out solutions to common problems, they are also easy to overuse. Patterns often come with abstractions, which in turn increase complexity. If a pattern is applied to a system that is already complex, it is important to consider whether the added complexity is worth the advantage the pattern gives.

Most patterns are meant for scaling applications: the pattern introduces some abstraction that makes it easy to add functionality without changing much code. That is also why a pattern can be a bigger burden than relief if you never end up utilizing those facilities. A factory that can produce two variants, or a builder with two builder methods, might be more complex to use than the same thing implemented without the abstractions.
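To make this concrete, here is a minimal sketch (all class and variable names here are hypothetical, invented for illustration) of a factory serving just two variants next to plain instantiation:

```python
import json

# Hypothetical exporters: the names are made up for illustration.
class CsvExporter:
    def export(self, rows):
        return "\n".join(",".join(map(str, r)) for r in rows)

class JsonExporter:
    def export(self, rows):
        return json.dumps(rows)

# A factory serving only two variants: an extra concept the reader
# must unravel before reaching the actual work.
class ExporterFactory:
    _registry = {"csv": CsvExporter, "json": JsonExporter}

    @classmethod
    def create(cls, kind):
        return cls._registry[kind]()

via_factory = ExporterFactory.create("csv").export([[1, 2], [3, 4]])

# The plain route produces the same result with no indirection.
via_direct = CsvExporter().export([[1, 2], [3, 4]])
assert via_factory == via_direct
```

Until a third or fourth exporter shows up, the two lines at the bottom do the same job, and only one of them requires the reader to learn a registry.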

This brings me to YAGNI, a concept from Extreme Programming that Martin Fowler explains well. It is an acronym for “You aren’t gonna need it”, which is exactly the risk of overusing patterns. Spending the time to add a pattern to the system incurs a cost both on understanding the system and on delivering the features currently in the pipeline. Furthermore, if the pattern is never used to its full potential, or worse, turns out not to solve the problem properly, it is either wasted effort or may have to be completely rewritten. It is therefore very important to consider whether a pattern will pay off immediately, or only after it has been applied in a few more instances. If the latter is the case, it might very well not be the right time to add the pattern.

Anticipating the future

When building a new feature we are often told that it will be needed for more variants of the same problem. The business has a plan to sell the feature to this and that customer, and they tell you the cases will be similar. And then the anticipation race begins. Knowing that the feature should be future proof, you begin thinking about how to make it usable for customers with all sorts of requirements. Little by little the feature becomes littered with hooks and class hierarchies that accommodate various changes that surely will come when the feature gets more use. Until it turns out that we anticipated the wrong requirements.

Once again YAGNI must be your mantra. By attempting to anticipate the future, you end up with a much more complicated structure than needed, which will hinder you when you want to make changes for any other reason. The fact is that we can never anticipate the future, and starting out simple and adding to it is always easier than starting with a complex structure and bending it to fit a new case that is not exactly what was anticipated.

Instead of adding complexity in anticipation, it is much more valuable to start out simple and get the customer to actually use the feature. Looking at real usage, you gain information about what needs improvement and how the feature is actually used. When a customer arrives that needs a slight variation of the feature, you can put them in front of the production version, which helps gather the exact requirements.

Overhead throughout the cycle

Looking further down the development pipeline, unneeded complexity adds overhead at every step, which means its cost is amplified many times over the lifetime of the code.

The reviewer will need to spend more time understanding the code. The additional complexity and abstractions will be a source of confusion and further add to the time it takes to deliver the feature that was actually needed.

When the code arrives in production it will need to be maintained, and any bugs found in the code will take longer to debug because of the added complexity.

Conclusion

Finding the right level of complexity is difficult, but simplicity will more often than not be an advantage. It makes the whole pipeline smoother, and once the feature is actually in production, you have the opportunity to investigate what the customers want and can make a justified decision on the right abstractions to use in the future. Manage your complexity or gain perplexity!

Django group by

I have often been confused by the way the Django ORM implements the equivalent of SQL’s “group by” statement. When I want to group some data, I tend to think in a structure very similar to SQL, and because Django works at a higher level of abstraction, I often forget how to string it together properly. In this post, I will describe how I understand grouping in Django after experimenting a bit with the feature.

Values

values() is the method that does the grouping of the data. Calling values() without a subsequent annotate() call simply pulls out the values of the specified fields.

Calling annotate() after a values() call on a queryset groups the data according to the specified fields, such that each group has a unique combination of the fields. The annotation is then performed on the resulting groups.

Annotate

The annotate() call defines what information to add to each group. Usually this is some kind of aggregation: Count, Sum, Avg, Min or Max. Each row returned by the values() call is annotated with an aggregation over all the rows that share that combination of fields. A Sum annotation will therefore be the sum over all rows that share a given combination of the fields used in the values() call.
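To make the semantics concrete, here is a sketch using a hypothetical Book model (the model and its fields are assumptions for illustration), together with a plain-Python equivalent of what the database computes for the grouped Sum:

```python
from collections import defaultdict

# The Django query this mirrors (hypothetical Book model):
#   Book.objects.values("author").annotate(total=Sum("pages"))
# i.e. group the rows on `author`, then sum `pages` within each group.

rows = [
    {"author": "Ann", "pages": 100},
    {"author": "Ann", "pages": 250},
    {"author": "Bo", "pages": 80},
]

totals = defaultdict(int)
for row in rows:                      # one bucket per unique `author`
    totals[row["author"]] += row["pages"]

grouped = [{"author": a, "total": t} for a, t in totals.items()]
# → [{'author': 'Ann', 'total': 350}, {'author': 'Bo', 'total': 80}]
```

The queryset returns one dict per unique field combination, exactly like the `grouped` list here: the two “Ann” rows collapse into a single entry carrying the aggregated value.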

Order_by

An important quirk is the interaction between annotate() and order_by(). If a model has a default ordering, or the query has an order_by() clause, this will be used when grouping data: q.values('name').order_by('id') will be treated as grouping on distinct (name, id) pairs, which might not be what you intended. To ensure that an ordering defined elsewhere does not interfere with a grouping, you can call order_by() without arguments to clear any previous ordering.
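The effect of the extra grouping field is easy to see with a plain-Python stand-in (the queryset `q` and its fields are hypothetical; the counting below only illustrates the grouping semantics):

```python
from collections import Counter

# In Django (hypothetical queryset q with `name` and `id` fields):
#   q.values("name").order_by("id")   # effectively groups on (name, id)
#   q.values("name").order_by()       # cleared ordering: groups on name only

rows = [
    {"id": 1, "name": "Ann"},
    {"id": 2, "name": "Ann"},
    {"id": 3, "name": "Bo"},
]

# Grouping on (name, id): every row becomes its own group.
by_name_and_id = Counter((r["name"], r["id"]) for r in rows)

# Grouping on name alone: the two "Ann" rows collapse into one group.
by_name = Counter(r["name"] for r in rows)

assert len(by_name_and_id) == 3
assert len(by_name) == 2
```

With the stray ordering in place you get three “groups” instead of two, which is usually the surprise people hit in practice.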

I hope this cleared up how to group data and add annotations with the Django ORM. The official documentation is brilliant as always, so also check that out for more information on aggregation: https://docs.djangoproject.com/en/2.2/topics/db/aggregation/#values

Latency in debugging

Debugging can be a frustrating couple of hours, or it can be the most rewarding work of the month. You might end up in places you never dreamed of. I have scrolled through the source code of libraries to understand how a parameter manipulates some function. I have dived through dependencies of dependencies to find out why a particular piece of code is not working. When debugging, you are always looking for some way to unearth the missing puzzle piece.

One of the things I have recognized is that different methods of debugging come with different latencies. Going into a shell and pasting in the exact piece of code gives you a result immediately. Committing a piece of code to a repo, getting it reviewed by a coworker, deploying it to test, deploying it to production, and then getting the result takes quite a bit longer. Depending on the task it might be necessary to take the cumbersome last option, but as a rule of thumb, we always want to keep the latency of our debugging as low as possible.

In this post I will explore the various levels at which debugging can be done, and how they differ in the latency of the feedback loop.

Levels of debugging

The following are the levels I have identified as the major methods of debugging, ordered from the lowest latency to the highest.

Shell / script

The ultimate fast-feedback method. You write some code and get the result immediately. If you want to try the same method with several inputs, you can do it right away. This method can get cumbersome if you are debugging something that requires a lot of setup, because you need to figure out how to recreate that setup for your particular case.

Unit tests

This is very similar to using a script, as mentioned above, but has the advantage that you probably (hopefully!) already have a test suite. You can use existing tests as a starting point for a new case that captures the problem, and then use your standard unit test toolchain. I use the debugger in PyCharm, which helps me step through the tests and inspect variables. This helps my debugging immensely.
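As a sketch of that workflow (the `median` function and its bug report are invented for illustration), a case from a bug report can be pinned down as an ordinary unit test and then driven through the debugger:

```python
import unittest

def median(values):
    """Hypothetical function from a bug report claiming wrong
    results for even-length input."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

class TestMedianBugReport(unittest.TestCase):
    # Capture the reported case as a test; once it passes, the fix
    # is locked in and the session leaves a permanent artifact.
    def test_even_length_input(self):
        self.assertEqual(median([4, 1, 3, 2]), 2.5)

    def test_single_element(self):
        self.assertEqual(median([7]), 7)

# Run with: python -m unittest <module_name>
```

Once the failing case is a test, every rerun is a keystroke away, and the reproduction survives long after the debugging session ends.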

Local instance

Booting up the application and clicking through it to provoke the bug is often an easy way to detect what context the bug happens in, and what methods should be focused on. The difficulty with this method is that it is not always easy to translate a click on a button in the application to the exact path of execution that contains the problem. If the bug is dependent on a particular data configuration or some other edge conditions, it might not be obvious how to replicate it. On the other hand, the actual application might be the best way of finding out how the bug arises.

Test environment

Particularly difficult problems might need to be debugged in a test environment. This is often because the bug has some complex data dependency that is not easy to replicate locally. If your test environment has data similar to real customers’, it might also be easier to find a case resembling the bug that was originally reported.

Because the test environment is a remote system, the latency gets really high at this point. The right code needs to be on the system, logs need to be traversed to find the appropriate piece of information, and other users using the same environment might interfere with the debugging.

Because this system is a test system, you can usually do whatever you want, just like on a local instance. The difference is that you need to jump through some hoops that might be more difficult than you initially think. This creates overhead that adds to the inherent latency when using the test environment.

This is a dangerous level because it often feels like you are making progress towards a solution, but oftentimes you will end up facing some restriction that complicates the debugging process. Bypassing those restrictions feels necessary, but the work would often be much faster and easier on a lower level. When debugging on a test system, always consider whether it is really necessary to be on this level.

Production environment

Using the production environment should always be the last resort. If the production environment is not yelling the answer at you, it will not be worth trying to make it do something it was not supposed to. Trying requires special care, and you might end up breaking something. Don’t do it!

Keep low on the ladder

Proponents of test-first development will say that you should always create a test case for the bug and then fix it. I agree, but sometimes you do not know all the details of the case, and then you need to investigate. If you can capture the case in a unit test, you have captured the problem. If you cannot, you need to move to another level and find out what you are missing. When you find that missing piece of information, move down again, and keep as low as possible on the ladder.

It might seem like obvious advice to always stay on the level with the least latency, but it can be tempting to stay on a high level for too long because it is more convenient. The hidden costs often end up being far greater than the cost of moving to a method with lower latency. The trick is to identify when it is necessary to stay and when it is a waste.

Identifying and automating repetitive tasks

In my daily work at a product company, I often get requests from our support department to help them with a problem they are facing while helping a customer. This can be simple things that are simply too tedious to do through the interface, or something complex that needs to be investigated before a solution can be applied.

Do not underestimate the time you spend on these small repetitive tasks. Every time you perform one of these seemingly small tasks, the time adds up. In addition, every time you get one of these requests, you may be pulled out of your train of thought and lose much more time than the task itself takes.

I am prone to thinking that this class of small tasks will only appear once, and that it would thus be a waste to spend too much time on them. I am also inclined to postpone a proper solution because I know that a full-fledged solution is already in the plan. Then I end up doing the task over and over again with some middle-of-the-road workaround, instead of getting it out of the way once and for all.

After a discussion with a colleague I have come up with the following pattern for identifying which tasks to automate, how to do it and when to do it:

1. Identify the repetitive task

The first step is to actually recognize that a task is repetitive. Some tasks really do come up only once, and spending time on those would be a waste. If two or more colleagues handle the same kind of task at separate times, it might not feel repetitive to any of them, so it is very important that these kinds of support tasks are discussed in a public forum, making it possible to get an overview.

2. Assess the task

Ask yourself some questions about the task. These will help you make a good decision on how to improve the situation.

  1. Why is this task arriving at your desk?
  2. Is the system missing a feature, or is the documentation not good enough for the customer or the support team?
  3. Is it possible to perform in the system, but takes too many clicks?
  4. Is it possible in the system but the requirements have changed slightly, making it awkward to perform?
  5. Has the frequency of the task changed for the customer, making the current process too cumbersome?

Also, remember that many problems come in different shades. You need to identify whether there are variations of the task that need to be solved in different ways.

3. Ease the task

What is the easiest way to fix this? It might be a script, or it might be a change in the software. At this point, the assessment from the previous step needs to be taken into account. The correct solution might require a lot of work, and a short-term solution might be necessary. But remember that short term versus long term is a false dichotomy: there are often several possible short-term solutions, so it is important to choose one that will actually ease the task, with an appropriate amount of work.

There are also different contexts where the solution can be placed. Sometimes a local code snippet is sufficient, while other times a script available for the whole team is necessary. Sometimes when the full solution requires a lot of work, an admin page can be created that is not necessarily pretty or ready for public viewing, but it does the job and saves you the work. This can also be a good way of prototyping the real solution without committing too much.
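As an illustration of the “script available for the whole team” context (the task and every name below are invented for this sketch), even a small repetitive job can be wrapped in a shared command-line tool:

```python
import argparse

# Hypothetical repetitive task: support regularly asks for a
# customer's data file to be re-exported. Wrapping the steps in a
# small CLI turns a manual ritual into one command anyone can run.

def reexport(customer_id: int, dry_run: bool = False) -> str:
    # The real work would call into the application here; this stub
    # just reports what would happen so the sketch is self-contained.
    action = f"re-export queued for customer {customer_id}"
    return f"[dry-run] {action}" if dry_run else action

def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Queue a re-export of a customer's data file.")
    parser.add_argument("customer_id", type=int)
    parser.add_argument("--dry-run", action="store_true",
                        help="show what would happen without doing it")
    args = parser.parse_args(argv)
    print(reexport(args.customer_id, args.dry_run))

# Example: main(["1042", "--dry-run"])
```

The argparse wrapper is a few minutes of work, but it documents the task, makes it discoverable, and lets anyone on the team run it without pulling a developer out of their train of thought.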

4. Eliminate

When you have implemented a solution, keep assessing whether the situation has improved. This applies to small scripts as well as fully implemented solutions. There is always a chance that you missed the mark and improvements are needed. There might also be variations of the problem that require some tweaking of your solution, or even a separate solution.

I hope that my ideas can inspire you to find the appropriate solution to the tasks that keep coming and save time that can be used for the real problems!