For me, debugging comes in two different shapes:
- The day-to-day activities associated with fixing errors as they occur during regular development or...
- ...the high-octane work related to ironing out production bugs.
The latter generally calls for more of a Sherlock Holmes approach to identify and fix the issue. After all, the bug made its way to production! 🤯
If you’ve been in the software development business for a while, you know it’s almost impossible to ship completely bug-free code. It’s all about weighing testing effort against the potential downside of a production incident. Obviously, the higher the stakes of a production issue, the more rigorous the testing needed to keep it from happening, and vice versa.
So, if you ever find yourself in a situation where you need to identify and fix a bug that’s made it into production, using git bisect is my suggested Holmes way of doing it. Hint: it even works effortlessly for non-linear histories!
A typical case
Before we get going, humor me with a short example. Let’s say you are working on a web app that’s been live for years, where new updates get released almost daily. All of a sudden, a bug is discovered in a secluded part of the app. From my experience, these discoveries are generally made by the product owner (usually on holiday when he should be thinking about anything but the product).
The conversation goes like this:
- PO: “Hi, I’ve just discovered a bug in production related to the user’s Settings section. When I try to update the email, nothing happens. As far as I can remember, it worked three weeks ago when I last used it. Can you please take a look at it?”
- Dev: “Sounds weird. We haven’t touched that part for ages. I’ll look into it and get back to you. By the way, shouldn’t you be on vacation?”
- PO: “Yeah, I know… I can’t take my eyes off this lovely app of ours!”
If you’re lucky, the bug can be easily reproduced and fixed. But other times, the root cause is a silent error that can be difficult to identify — especially if the issue has been live for weeks or months. What could be the root cause? Is it due to a dependency update? Or did the server protocol change? Could it be that some other developer made a change that caused ripple effects?
Whenever you don’t know where to start, here is a structured and battle-tested process for tackling problems like these. On a conceptual level, the steps are as follows:
- Identify when the bug was first introduced (i.e., what commit) using git bisect.
- Figure out what’s causing the issue.
- Create a bug fix.
- Prepare for release.
OK, let’s see the steps in action!
1. Identify When the Bug Was Introduced (i.e., What Commit)
How do we figure out where the bug was first introduced when we only know roughly when it worked last time? Remember the PO saying, “It worked three weeks ago”? Our only problem is that the bug could have been introduced anytime between then and now.
This is where git bisect comes in!
git bisect is an automatic way of applying the “divide and conquer” approach to find an offending commit within your history. It uses a binary search algorithm to narrow down an interval of commits, bounded by a known working commit and a known broken one, by halving the interval on every attempt until it arrives at the offending commit.
In theory, you could follow the same process manually. However, if you are working in a repository with a non-linear history (which most of us are), knowing which commit to try next can prove quite tricky. Luckily, git bisect takes care of this for you!
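To make the halving idea concrete, here is a minimal Python sketch of a binary search over commits. It is only an illustration under a simplifying assumption (a strictly linear history), not how git implements bisect. The function name first_bad_commit, the is_bad callback, and the commit IDs are all made up for this example. In real life, is_bad would be you (or a script) checking whether the bug reproduces on a given commit.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit in `commits`, ordered oldest to newest.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the offending one is also bad (linear history only).
    """
    good, bad = 0, len(commits) - 1  # indices of the known-good and known-bad ends
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # bug already present here: the culprit is this commit or earlier
        else:
            good = mid  # still working here: the culprit is later
    return commits[bad]


# Hypothetical history of 16 commits, with the bug introduced in "c11".
history = [f"c{i}" for i in range(16)]
tested = []  # records which commits we had to check


def check(commit: str) -> bool:
    tested.append(commit)
    return int(commit[1:]) >= 11


culprit = first_bad_commit(history, check)
print(f"First bad commit: {culprit}, found after testing {len(tested)} commits")
```

In this made-up 16-commit history, the sketch pins down the culprit after testing only four commits, rather than walking through them one by one. With merges in the picture there is no single "middle" commit to pick, which is exactly the bookkeeping git bisect handles for you.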