How to Split Commits and Remove Unwanted Files

Did you perhaps commit that log file or binary by accident? No worries! In this post I'll provide you with three ways to remove unwanted files from a series of commits.


As a developer, you're occasionally faced with the need to undo (or rather redo) a series of commits – just like the original poster of this Stack Overflow question (with over 10M views). If you've ever accidentally committed unwanted files, such as logs, binaries, or libraries, as part of a series of commits, you know what I'm talking about.

No one wants to bloat the repository with unintentional commit noise that ends up confusing the team. When this happens, the unwanted files need to be removed from each commit – but how can that be done?

In this post, I'll go through three different ways to remove unwanted files from a series of commits using reset, rebase, and filter-branch.

Start case

Before we look at the different options, let's contextualize the problem with a concrete example. Below is a short linear history consisting of three commits, where the latter two (C1 & C2) contain changes to a log file (tmp.log) that were accidentally committed.

Start case where changes to the file "tmp.log" have been accidentally committed as part of commits "C1" and "C2".
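To follow along, the start case can be recreated in a scratch repository. This is only a sketch under a few assumptions: app.txt, the commit contents, and the user identity are hypothetical stand-ins, and only tmp.log and the commit names C1 and C2 come from the example above (the first commit is called C0 here for convenience).

```shell
set -e

# Work in a throwaway directory so no real repository is touched.
cd "$(mktemp -d)"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# C0: the commit that should stay untouched (app.txt is a made-up file).
echo "base" > app.txt
git add app.txt
git commit -q -m "C0"

# C1: a wanted change plus the accidentally committed log file.
echo "feature 1" >> app.txt
echo "log line 1" > tmp.log
git add app.txt tmp.log
git commit -q -m "C1"

# C2: another wanted change plus further changes to the log file.
echo "feature 2" >> app.txt
echo "log line 2" >> tmp.log
git add app.txt tmp.log
git commit -q -m "C2"

# List each commit together with the files it touched.
git log --oneline --name-only
```

The log should list tmp.log under both C1 and C2, which is exactly the situation the rest of this post sets out to clean up.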

Notice that neither of the two commits (C1 & C2) has been pushed to a remote; they exist only in the local master branch, which allows us to rewrite history as we please. Now, let's see how the accidentally committed file tmp.log can be removed from the history and C1 & C2 redone.

If the whole concept of rewriting history sounds strange to you, make sure to first revisit this post on Immutable Snapshots - One of Git's Core Concepts to get your bearings.

Redoing a series of commits

There are several ways to "undo" or "redo" a series of commits, depending on the outcome you're after. Considering the start case above, reset, rebase and filter-branch can all be used to rewrite your history.

Alternative 1: reset

With reset, a branch can be moved back to a previous state while the accumulated changes are left in the Staging Area, from where any unwanted changes can then be discarded. The illustration below shows what undoing the changes from our start case looks like using a "soft reset":

$ git reset --soft t56pi

Following a soft reset the compounded changes are left in the Staging Area.

Following the soft reset, any unwanted changes can be removed from the Staging Area using restore (for example, git restore --staged tmp.log), and a new commit can then be created containing only the desired changes.
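Put together, the reset approach might look like the sketch below. It rebuilds the start case in a scratch repository first so that it can be run on its own. A few assumptions: app.txt and the new commit message are hypothetical, HEAD~2 stands in for the commit ID from the illustration (t56pi), and git restore requires Git 2.23 or newer.

```shell
set -e

# Scratch repository mirroring the start case (app.txt is a made-up file).
cd "$(mktemp -d)"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
echo "base" > app.txt
git add app.txt
git commit -q -m "C0"
for n in 1 2; do
  echo "feature $n" >> app.txt
  echo "log line $n" >> tmp.log
  git add app.txt tmp.log
  git commit -q -m "C$n"
done

# Move the branch back two commits; the changes from C1 + C2 stay staged.
git reset --soft HEAD~2

# Unstage the log file. It stays in the working tree as an untracked file.
git restore --staged tmp.log

# Commit only the desired changes.
git commit -q -m "C1 + C2 without tmp.log"

# tmp.log should now show up as untracked rather than committed.
git status --short
```

If the log file is not needed at all, it can simply be deleted afterwards, or added to .gitignore so that it does not get staged again.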

Note: Because the soft reset combines all previous changes (C1 + C2) in the Staging Area, per-commit metadata is lost from the new history; e.g. the individual commit messages and the split of changes between commits. If this is not OK with you, you're probably better off with rebase or filter-branch instead.
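If only the old commit messages matter, they can still be read back right after the reset, because git reset records the previous branch tip as ORIG_HEAD. The sketch below shows one way to use that, under the same assumptions as before (a scratch repository, a hypothetical app.txt, and HEAD~2 as the reset target); reusing the message of C2 via commit -C is just one possible choice.

```shell
set -e

# Scratch repository mirroring the start case (app.txt is a made-up file).
cd "$(mktemp -d)"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
echo "base" > app.txt
git add app.txt
git commit -q -m "C0"
for n in 1 2; do
  echo "feature $n" >> app.txt
  echo "log line $n" >> tmp.log
  git add app.txt tmp.log
  git commit -q -m "C$n"
done

git reset --soft HEAD~2

# The pre-reset tip is still reachable as ORIG_HEAD, so the messages of the
# commits that were just undone can be listed.
git log --format='%h %s' HEAD..ORIG_HEAD

git restore --staged tmp.log

# Reuse the message and author information of the old tip (C2).
git commit -q -C ORIG_HEAD

# Show the subject line of the new commit.
git log --format=%s -1
```

This recovers the messages only. The changes themselves still end up in a single commit, so the split between C1 and C2 is not restored.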
