Refactoring by Martin Fowler is a book about how to refactor code: restructuring a codebase to improve it without changing its functionality. The book provides general principles, a list of code smells, and guidance on how each should be refactored. Most valuable is the catalog of refactorings, such as extracting duplicated code into a common function or encapsulating a variable. Each entry in the catalog explains the motivation behind the refactoring and works through an example of it applied to real code, with the author noting common pitfalls.
While I can’t share the entire catalog in a book review, I can share the general principles the author espouses for refactoring:
This is the most important part of refactoring. If you don’t have a suite of tests, you won’t know when (not if) you break something. For example, when renaming a function, you also have to make sure all references to that function are changed as well. If you have a suite of tests that cover your code base, the error will become apparent quickly. If you don’t have tests, don’t refactor blind; refactoring is a perfect time to start writing tests. And when writing tests, make sure they fail when they’re supposed to—a test that always passes is useless.
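To make the rename example concrete, here is a minimal sketch in Python. It is my own illustration, not code from the book, and every name in it (`order_total`, `format_receipt`, `calc_total`) is hypothetical. It shows a test catching a call site that was missed during a rename, and doubles as a check that the test really does fail when it should.

```python
def order_total(prices, tax_rate):
    """Return the sum of prices plus tax."""
    return sum(prices) * (1 + tax_rate)


def format_receipt(prices, tax_rate):
    # Correct call site: uses the current function name.
    return f"Total: {order_total(prices, tax_rate):.2f}"


def broken_format_receipt(prices, tax_rate):
    # Simulates a call site missed during a rename: calc_total is the
    # (hypothetical) old name and no longer exists, so calling this
    # function raises NameError.
    return f"Total: {calc_total(prices, tax_rate):.2f}"


def receipt_test_passes(formatter):
    """Run the receipt test against a formatter; report pass or fail."""
    try:
        assert formatter([10.0, 20.0], 0.1) == "Total: 33.00"
    except (AssertionError, NameError):
        return False
    return True


if __name__ == "__main__":
    print(receipt_test_passes(format_receipt))         # healthy code
    print(receipt_test_passes(broken_format_receipt))  # missed rename
```

Running the same test against a deliberately broken version is a cheap way to confirm it isn't one of those tests that always passes.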
Refactoring is different from the general restructuring of code. Refactoring involves making one small change at a time. When you combine many small refactorings together, you get something much greater than the sum of each change. Like Unix tools, each refactoring in the catalog is designed to be small and composable. Each one preserves the existing behavior of the program and only does one thing. Don’t try to take shortcuts and apply many refactorings at the same time. While this seems inefficient, by making small changes that are guaranteed to preserve behavior, you will spend little time debugging those changes.
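As a rough illustration of what "small and composable" looks like, the sketch below applies two tiny function extractions one after the other. The code and names are my own, not the book's, and the point is only that the output stays identical after each step.

```python
def invoice_summary_before(customer, amounts):
    """Original version: banner, calculation, and details all inline."""
    lines = ["*** Customer Owes ***"]
    outstanding = 0
    for amount in amounts:
        outstanding += amount
    lines.append(f"name: {customer}")
    lines.append(f"amount: {outstanding}")
    return "\n".join(lines)


# Step 1: extract the banner into its own function, then run the tests.
def banner():
    return "*** Customer Owes ***"


# Step 2: extract the calculation into its own function, then run the tests.
def outstanding_total(amounts):
    outstanding = 0
    for amount in amounts:
        outstanding += amount
    return outstanding


def invoice_summary_after(customer, amounts):
    """Same behavior, now composed from the two extracted functions."""
    lines = [
        banner(),
        f"name: {customer}",
        f"amount: {outstanding_total(amounts)}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(invoice_summary_after("Ada", [30, 12]))
```

Neither step is impressive on its own, but each one is easy to verify, and together they leave a function that reads like a summary of what it does.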
If someone says their code was broken for a couple of days while they are refactoring, you can be pretty sure they were not refactoring.
As The Pragmatic Programmer says: “The coding isn’t done until all the tests run”. If you run the tests frequently, you’ll know as soon as you break something. Bind the tests you need to a hotkey and run them with each refactoring you make. Run the subset of tests that covers the code you’re changing every few minutes, and the entire test suite with every git commit you make. Ideally, each refactoring should be an individual commit. This will allow you to identify exactly which refactoring caused a problem and roll it back when a test fails. When you’re done, you can squash them all into a single commit with an interactive rebase.
Refactoring involves making many small changes across a codebase, such as changing the parameters of a function. If you work with large feature branches that are weeks out of date with the main branch, there will be frequent merge conflicts that are difficult to resolve. The author thinks you should adopt Continuous Integration techniques and merge your branch into mainline at the end of each day. This will prevent branches from diverging too far from each other, and ensure your colleagues are aware of your refactorings.
Refactoring isn’t meant to change what your program does; it’s meant to make the code easier to understand and build upon. The author recommends a test-refactor-code cycle, where each phase is separate from the others. Trying to add functionality while refactoring brings needless complexity to your work. If you find that building a feature would be easier with refactoring, back out your changes, refactor, then add the feature.
Refactoring should be an automatic part of your development methodology. If you dedicate a sprint to refactoring, you’ll find that you have other priorities and never get around to refactoring. It’s easier to fix something when the problem is initially found than to let it fester. Refactoring should be opportunistic and done as needed. The author lists some opportunities to refactor:
When shouldn’t you refactor? If you can treat the code as an API you don’t need to understand, it can stay ugly. The author also suggests that you don’t refactor code you aren’t actively working on.
Many IDEs provide automated refactoring features, such as renaming a function across an entire codebase. Take advantage of your IDE’s ability to do this rather than manually refactor. Even if you are using a basic text editor without refactoring extensions, consider writing scripts that manipulate the text for refactorings you often employ. Even with automated refactoring, the tool can make mistakes, so still run the tests after making changes.
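For the "write your own script" case, a minimal sketch might look like the following. It is an assumption of mine about what such a script could be, not something from the book: `rename_identifier` is a hypothetical helper that renames whole-word matches using Python's standard `re` module.

```python
import re


def rename_identifier(source, old_name, new_name):
    """Rename whole-word occurrences of old_name in a source string.

    This is purely textual: it has no idea about scopes, and it will also
    rewrite matches inside string literals and comments.
    """
    pattern = re.compile(rf"\b{re.escape(old_name)}\b")
    # A callable replacement keeps new_name from being parsed as a template.
    return pattern.sub(lambda match: new_name, source)


if __name__ == "__main__":
    code = "def calc(x):\n    return recalc(x) + calc(x)\n"
    print(rename_identifier(code, "calc", "total"))
```

The word boundaries keep `recalc` from being touched, but a script like this is much blunter than an IDE's rename, which is one more reason to run the tests afterwards.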
Some refactoring will make your program slower. However, most performance optimizations make code harder to read, and most of your code is not being run often enough to impact performance. Instead, profile your code, find out which sections are being run the most often, and optimize those as needed. Well-factored software will make it easier to optimize the hot spots in your code. The sections that need to be optimized will be clear, and comments can explain the optimizations to future editors. You shouldn’t ignore the performance of your software, but focus on the parts that matter.
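If you haven't profiled before, here is a small sketch using Python's built-in `cProfile` and `pstats` modules. The workload functions are made up for illustration; the idea is just to let the profiler, rather than intuition, say where the time goes.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately heavy loop standing in for a hot spot."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def cheap_setup():
    """Cheap work that is not worth optimizing."""
    return list(range(10))


def workload():
    cheap_setup()
    return slow_sum_of_squares(200_000)


def profile_report(func, limit=5):
    """Profile func and return a text report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(limit)
    return stream.getvalue()


if __name__ == "__main__":
    print(profile_report(workload))
```

In a well-factored program the report lines up with small, named functions, which makes it much easier to see which one deserves the optimization effort.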
Code blocks with identical or very similar behaviors are a code smell. If you have to edit a duplicated code block, you will also have to edit it everywhere else it appears. If you forget to edit one of those duplicates, you will probably introduce a hard-to-track-down bug. In the easiest case, the blocks will be exactly identical, such as a switch or if-else if-else selecting common values. However, some duplicated code may not be exactly identical but vary in behavior based on a parameter. Consider what the code is trying to do when judging similarity, rather than whether the code is exactly identical. If you find duplicated code, extract it into a common function or class, and use a parameter to change behavior as needed.
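Here is a small sketch of the "similar but not identical" case. The shipping functions and rates are invented for illustration and are not from the book; the two originals differ only in one number, so that number becomes a parameter.

```python
# Before: two near-duplicates that differ only in the per-kilogram rate.
def us_shipping_cost(weight_kg):
    base = 5.0
    return base + weight_kg * 1.2


def eu_shipping_cost(weight_kg):
    base = 5.0
    return base + weight_kg * 1.5


# After: one function, with the varying value passed in as a parameter.
def shipping_cost(weight_kg, rate_per_kg):
    base = 5.0
    return base + weight_kg * rate_per_kg


if __name__ == "__main__":
    print(shipping_cost(10, 1.2) == us_shipping_cost(10))
    print(shipping_cost(10, 1.5) == eu_shipping_cost(10))
```

Now a change to the base fee happens in exactly one place instead of two.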
The longer a section of code is, the harder it is to understand. Long functions or classes with lots of member variables are a sign they are trying to do too much at the same time. For large functions, consider breaking them up into multiple sub-functions that are called as steps. For large classes, extract common functionality and variables into another class that is a member of the large class. You may also look at the clients of a class and see if they only use a subset of the functionality of the class. Each subset used by a client can be extracted into another class.
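As a sketch of breaking a function into steps, consider the made-up report function below (my own example, with hypothetical names). The long version parses, summarizes, and formats in one body; the split version gives each step its own function and leaves a top-level function that reads as a list of steps.

```python
# Before: parsing, summarizing, and formatting all in one function.
def report_long(raw_lines):
    values = []
    for line in raw_lines:
        line = line.strip()
        if line:
            values.append(float(line))
    total = sum(values)
    average = total / len(values) if values else 0.0
    return f"count={len(values)} total={total:.1f} average={average:.1f}"


# After: each step is a small function with a descriptive name.
def parse_values(raw_lines):
    """Convert non-blank lines to floats."""
    return [float(line.strip()) for line in raw_lines if line.strip()]


def summarize(values):
    """Return (count, total, average) for a list of numbers."""
    total = sum(values)
    average = total / len(values) if values else 0.0
    return len(values), total, average


def format_summary(count, total, average):
    return f"count={count} total={total:.1f} average={average:.1f}"


def report(raw_lines):
    values = parse_values(raw_lines)
    count, total, average = summarize(values)
    return format_summary(count, total, average)


if __name__ == "__main__":
    print(report(["1.5", " 2.5 ", "", "4.0"]))
```

The split version is longer in total, but each piece can be read, tested, and reused on its own.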
Refactoring is a great, if opinionated, book for thinking about how to restructure your program. I recommend that everyone interested in this process get a copy and look through the catalog; it will give you ideas on how to improve your own programs.