How to work with legacy code -- Jacek's Blog

Lately I've been working on a project where I dealt with legacy code. Here are some take-aways for future me.

Note

By legacy code I mostly mean: code written by people no longer involved in the project, which carries an unreasonable amount of technical debt.

Main take-aways

Some helpful tips you might use if dealing with legacy code yourself:

When in doubt, test it. assert is your friend;
End-to-end tests are a very easy way to get good coverage; unit tests make bugs easier to find;
Run your tests often;
Add documentation as you go along. In Python, use PEP 484 annotations;
Favour immutable data structures;
Don't be afraid to refactor, but don't refactor for the sake of refactoring;
You can refactor for the sake of understanding the code better, but only if it doesn't impede other people's progress;
Keep backward compatibility in mind, and test it;
It's much easier to edit parts that have good test coverage.

When in doubt, test it

Write tests. If you look at a function and think that something is true, or needs to be true, for the code to work, check it! Add a test or an assertion.

I often add assertions that are somewhat exotic:

assert sorted(self.work_items) == self.work_items, "work_items need to be sorted"
assert some_matrix is self.cached_matrix, "Cache mismatch"
assert np.all(allocations <= 1), "Binary matrix holds values greater than one"
assert len({employee.id for employee in employees}) == len(employees), "Employee ids need to be unique!"

The application ran in production in optimized mode, so assertions were only executed during automated and manual tests (if you are wondering what Python's optimized mode is, read this article).
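As a minimal sketch of how such checks can sit inside ordinary code, the hypothetical function below states its assumptions as assertions before doing any work. The name assign_shifts and its arguments are illustrative and not taken from the project. When Python is started with the -O flag the assert statements are skipped, so the checks cost nothing in production.

```python
def assign_shifts(work_items, employee_ids):
    """Hypothetical example: hand out sorted work items to employees round-robin."""
    # Invariants the rest of the function silently relies on.
    assert employee_ids, "At least one employee is required"
    assert sorted(work_items) == list(work_items), "work_items need to be sorted"
    assert len(set(employee_ids)) == len(employee_ids), "Employee ids need to be unique!"

    return {
        item: employee_ids[index % len(employee_ids)]
        for index, item in enumerate(work_items)
    }


if __name__ == "__main__":
    print(assign_shifts([1, 2, 3], ["ann", "bob"]))
    try:
        assign_shifts([3, 1, 2], ["ann", "bob"])  # violates the sorting invariant
    except AssertionError as error:
        print("caught:", error)
```

Because assertions disappear under -O, they should only guard programmer assumptions, never validate user input.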

End-to-end tests

Adding end-to-end tests lets you easily make sure that the program works the same before and after every change you introduce. So add as many of them as possible at a very early stage of the project (this is when they give the most value).

The problem with such tests is that 90% of the time it is super hard to guess what broke them. So it makes sense to run them as often as possible: if you know they worked 5 minutes ago and now they don't, your last 5 minutes of code introduced a deviation in behaviour.

Another problem with such tests is that they check whether the program behaves as before, so fixing things might break them.
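One common way to build such tests is to record the program's output once and compare every later run against that recording. The sketch below assumes the output can be serialized to JSON. The names legacy_report and check_against_snapshot are hypothetical stand-ins, not part of the original project.

```python
import json
import tempfile
from pathlib import Path


def legacy_report(orders):
    """Stand-in for the legacy entry point whose behaviour we want to pin down."""
    return {"total": sum(o["price"] * o["qty"] for o in orders), "count": len(orders)}


def check_against_snapshot(name, actual, snapshot_dir):
    """Record the output on the first run; compare against the recording afterwards."""
    path = Path(snapshot_dir) / f"{name}.json"
    if not path.exists():
        path.write_text(json.dumps(actual, indent=2, sort_keys=True))
        return "recorded"
    expected = json.loads(path.read_text())
    assert actual == expected, f"{name}: output changed, expected {expected}, got {actual}"
    return "matched"


if __name__ == "__main__":
    orders = [{"price": 10, "qty": 2}, {"price": 5, "qty": 1}]
    with tempfile.TemporaryDirectory() as snapshot_dir:
        # First call records the snapshot, second call compares against it.
        print(check_against_snapshot("report", legacy_report(orders), snapshot_dir))
        print(check_against_snapshot("report", legacy_report(orders), snapshot_dir))
```

When a deliberate fix changes the output, the recording has to be reviewed and re-created, which is exactly the cost described above.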

Run tests often

It is well known that the sooner you identify a bug, the easier it is to fix. So run your tests as often as possible.

Add documentation as you go forward

A lot of the work with legacy code is reading the code and trying to understand how it works.

Try to leave as much documentation as possible:

When you figure something out, write it in a comment.
When you think something is fishy, add a TODO: comment (but don't feel pressured to fix them all). At the very least it will serve as a warning for the next person.
If you are unsure of something, write that in the docs too.

In Python, use type annotations. Period. They serve as great documentation.
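To illustrate, here is a small sketch of the same function with and without annotations. The function names and the shifts parameter are made up for this example. The annotated signature tells the reader the shape of the data without opening the body.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


# Before: the caller has to read the body to learn what `shifts` contains.
def total_hours_untyped(shifts):
    return sum((end - start).total_seconds() for start, end in shifts) / 3600


# After: the signature documents the shape of the data.
def total_hours(shifts: List[Tuple[datetime, datetime]]) -> float:
    """Return the combined length of all (start, end) ranges, in hours."""
    return sum((end - start).total_seconds() for start, end in shifts) / 3600


if __name__ == "__main__":
    start = datetime(2020, 1, 1, 8, 0)
    print(total_hours([(start, start + timedelta(hours=8))]))
```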

Using dataclasses, typing.NamedTuple, or python-attrs is a great benefit. Each of these libraries lets you define classes declaratively and get dunder methods for free (that is: __init__, __eq__, ...). If in doubt, use dataclasses.

Compare these two snippets:

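# Before
class Worker:

    def __init__(self, hour_range, **kwargs):
        # ...
        # The expected shape of hour_range is only visible by reading this line.
        self.hour_range = [tuple(map(ut.ensure_date, h)) for h in hour_range]

# After
import dataclasses
import typing

class Schedule:
    # ...
    pass

@dataclasses.dataclass
class Worker:
    # ...
    # The annotation documents what the field holds; the empty tuple is an immutable default.
    availability: typing.Sequence[Schedule] = dataclasses.field(default=(), repr=False)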
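To make the "dunder methods for free" point concrete, here is a small runnable sketch. The name field and the string-based availability are simplifications invented for this example, not the project's real model.

```python
import dataclasses
import typing


@dataclasses.dataclass
class Worker:
    name: str  # hypothetical field, added so the example has something to print
    availability: typing.Sequence[str] = dataclasses.field(default=(), repr=False)


if __name__ == "__main__":
    first = Worker("Ann", availability=("Mon", "Tue"))
    second = Worker("Ann", availability=("Mon", "Tue"))
    # The generated __eq__ compares instances field by field.
    print(first == second)
    # The generated __repr__ leaves out availability because of repr=False.
    print(first)
```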

Favour immutable data structures

If you think something shouldn't change, make it immutable. If you are wrong (and something else is changing it), you'll get a loud exception instead of a silent bug.
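A minimal sketch of what that loud exception looks like, using a frozen dataclass and a read-only mapping view from the standard library. The Shift class and DEFAULT_LIMITS are hypothetical names chosen for illustration.

```python
import dataclasses
from types import MappingProxyType


@dataclasses.dataclass(frozen=True)
class Shift:
    employee_id: int
    hours: tuple  # a tuple, so entries cannot be appended or reordered


# A read-only view over a dict: reads work, writes raise TypeError.
DEFAULT_LIMITS = MappingProxyType({"max_hours": 40})


if __name__ == "__main__":
    shift = Shift(employee_id=1, hours=(8, 8, 4))
    try:
        shift.employee_id = 2  # assignment on a frozen dataclass
    except dataclasses.FrozenInstanceError as error:
        print("caught:", error)
    try:
        DEFAULT_LIMITS["max_hours"] = 60  # write through the read-only view
    except TypeError as error:
        print("caught:", error)
```

Note that frozen=True only blocks attribute assignment. A mutable object stored in a field can still be changed, which is why the sketch stores hours as a tuple.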

Don't be afraid to refactor

Don't be afraid to refactor. But don't refactor for the sake of refactoring.

If you don't need to change a module, and you understand the API between it and the things you do need to change, then don't touch it. Refactoring things you don't need to touch wastes your time and might introduce bugs.

You can refactor for the sake of understanding the code better, but only if it doesn't impede other people's progress.

Refactoring code to make it more understandable is usually a good idea. Understanding a hard piece of code often takes longer than refactoring it into something easy to understand.

Beware: if multiple people are working on the codebase, your refactors might break their work.

Make tests for backward compatibility

Think about backward compatibility. Test it often, test it early.
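One possible shape for such a test is sketched below. It assumes a field was renamed from hour_range to availability and that old payloads must still load. The load_worker function and both payload formats are hypothetical, and a real project would pin down whatever its own callers depend on.

```python
def load_worker(payload):
    """Hypothetical loader that accepts both the old and the new payload shape."""
    if "hour_range" in payload:  # legacy key, kept for old callers
        availability = tuple(payload["hour_range"])
    else:
        availability = tuple(payload.get("availability", ()))
    return {"name": payload["name"], "availability": availability}


def test_legacy_payload_still_loads():
    # The same data expressed in the old and the new format must load identically.
    legacy = {"name": "Ann", "hour_range": [("08:00", "16:00")]}
    current = {"name": "Ann", "availability": [("08:00", "16:00")]}
    assert load_worker(legacy) == load_worker(current)


if __name__ == "__main__":
    test_legacy_payload_still_loads()
    print("legacy payload still loads")
```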

Editing parts of code that have good test coverage is much easier.