How to Split Commits and Remove Unwanted Files
Did you perhaps commit that log file or binary by accident? No worries! In this post I'll provide you with three ways to remove unwanted files from a series of commits.
As a developer you're occasionally faced with the need to undo (or rather redo) a series of commits – just like the original poster of this Stack Overflow question (with over 10M views). If you've ever accidentally committed unwanted files, such as logs, binaries, or libraries as part of a series of commits, you know what I'm talking about.
No one wants to bloat the repository by adding unintentional commit noise, ultimately causing confusion among your team. Whenever this happens, removing these unwanted files from each commit is necessary – but how can it be done?
In this post I'll go through three different ways to remove unwanted files from a series of commits using reset, rebase, and filter-branch.
Before we look at the different options, let's contextualize the problem with a concrete example. Below is a short linear history consisting of three commits, where the latter two (C1 & C2) contain changes to a log file (tmp.log) that have been accidentally committed.
Notice that neither of the two commits (C1 & C2) has been pushed remotely; they exist only in the local master branch, which allows us to rewrite history as we please. Now, let's see how the accidentally committed file tmp.log can be removed from the history and C1 & C2 redone.
If the whole concept of rewriting history sounds strange to you, make sure to first revisit this post on Immutable Snapshots - One of Git's Core Concepts to get your bearings.
Redoing a series of commits
There are several ways to "undo" or "redo" a series of commits, depending on the outcome you're after. Considering the start case above, reset, rebase, and filter-branch can all be used to rewrite your history.
Alternative 1: reset
With reset, a branch can be reset to a previous state, with all accumulated changes moved back to the Staging Area, from where any unwanted changes can then be discarded. The illustration below shows what undoing the changes from our start case looks like using a "soft reset":
$ git reset --soft t56pi
Following the soft reset, any unwanted changes can be removed using restore, and a new commit can then be created containing only the desired changes.
Note: As the soft reset clusters all previous changes (C1 + C2) into the Staging Area, individual commit metadata is lost, e.g. commit messages and the separation of changes per commit. If this is not OK with you, chances are you're better off with rebase or filter-branch.
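The soft-reset flow can be sketched end to end in a throwaway repository. This is a minimal sketch under stated assumptions: the file app.txt, the commit contents, and the variable `$base` (which plays the role of t56pi) are made up for illustration, and `git restore` requires Git 2.23 or later.

```shell
set -e
# Build a scratch repo that mimics the start case: C0 <- C1 <- C2
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example" && git config user.email "example@example.com"

echo "v0" > app.txt && git add . && git commit -q -m "C0"
base=$(git rev-parse HEAD)        # hypothetical stand-in for t56pi

echo "v1" > app.txt && echo "log 1" >  tmp.log && git add . && git commit -q -m "C1"
echo "v2" > app.txt && echo "log 2" >> tmp.log && git add . && git commit -q -m "C2"

git reset --soft "$base"          # branch points at C0; C1 + C2 changes are staged together
git restore --staged tmp.log      # unstage the log file; it stays on disk, untracked
git commit -q -m "C1 + C2 without tmp.log"

git log --oneline                 # C0 plus the single combined commit
git ls-files                      # tracked files after the redo
```

Note how the two original commit messages are gone: the redo produces one new commit, which is exactly the trade-off described in the note above.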
Alternative 2: rebase
By using an interactive rebase, each offending commit in the branch can be rewritten, allowing you to modify and discard unwanted changes in the process. In the infographic below, the source tree on the right illustrates the state post rebase.
$ git rebase --interactive t56pi
An interactive rebase is done using a step-by-step approach, where each commit to modify is pre-selected and then manually edited as part of the rebase process.
- With master checked out, select the commit from which the rebase should start (e.g. t56pi, as we are only interested in rewriting C1 & C2).
- When prompted with the commit range, select which commits you'd like to modify by replacing pick with edit. Save and close.
- Git will now stop on each selected commit, allowing you to manually modify it by removing the unwanted file with $ git rm tmp.log, followed by $ git rebase --continue to persist the change in the selected commit and move on to the next.
With rebase, commit metadata can be kept as is, in contrast to the first reset alternative above. This is most likely the preferred option if you want to keep much of your history and only remove the unwanted files. Notice how commits C1 & C2 are still kept separate post rebase, with only the unwanted file removed.
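The steps above can be scripted against a throwaway repository. This is a sketch under stated assumptions rather than a recipe: app.txt, the commit contents, and `$base` (standing in for t56pi) are made up, `GIT_SEQUENCE_EDITOR` with `sed` replaces the manual pick-to-edit step in your editor, and `GIT_EDITOR=true` accepts each commit message unchanged. In this setup C2 also modifies tmp.log, so replaying it on top of the rewritten C1 stops on a modify/delete conflict instead of a plain edit stop; the same two commands resolve it.

```shell
set -e
# Build a scratch repo that mimics the start case: C0 <- C1 <- C2
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example" && git config user.email "example@example.com"

echo "v0" > app.txt && git add . && git commit -q -m "C0"
base=$(git rev-parse HEAD)        # hypothetical stand-in for t56pi

echo "v1" > app.txt && echo "log 1" >  tmp.log && git add . && git commit -q -m "C1"
echo "v2" > app.txt && echo "log 2" >> tmp.log && git add . && git commit -q -m "C2"

export GIT_EDITOR=true            # keep every commit message as is

# Mark every commit after $base as "edit" instead of "pick"
GIT_SEQUENCE_EDITOR="sed -i.bak 's/^pick/edit/'" git rebase --interactive "$base"

# Stop 1: C1 is checked out for editing
git rm -q tmp.log
git rebase --continue || true     # folds the removal into C1, then halts while replaying C2

# Stop 2: C2 conflicts because tmp.log no longer exists in the rewritten C1
git rm -q tmp.log
git rebase --continue

git log --oneline                 # C0, C1, C2 with new hashes for C1 and C2
git ls-files                      # tracked files after the rewrite
```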
Alternative 3: filter-branch
filter-branch can be used to wipe unwanted files from a subsection of a branch. Instead of manually editing each commit through the rebase process, filter-branch can automatically perform the desired action on each commit.
$ git filter-branch --tree-filter 'rm -f ./tmp.log' t56pi..HEAD
The command above would filter out the file ./tmp.log from all commits in the desired range t56pi..HEAD (assuming our initial start case from above). See the illustration below for clarity.
Note: Just like rebase, filter-branch would preserve the rest of the commit metadata, discarding only the desired file. Again, notice how C1 and C2 have been rewritten and the log file discarded from each commit; in this particular case, rebase and filter-branch would produce the exact same outcome.
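To see the filter in action without touching a real project, here is a minimal sketch in a throwaway repository. The file app.txt, the commit contents, and `$base` (standing in for t56pi) are assumptions made for illustration. `FILTER_BRANCH_SQUELCH_WARNING` silences the notice that newer Git versions print before running filter-branch, and `rm -f` keeps the filter from failing on a commit where the file is absent.

```shell
set -e
# Build a scratch repo that mimics the start case: C0 <- C1 <- C2
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example" && git config user.email "example@example.com"

echo "v0" > app.txt && git add . && git commit -q -m "C0"
base=$(git rev-parse HEAD)        # hypothetical stand-in for t56pi

echo "v1" > app.txt && echo "log 1" >  tmp.log && git add . && git commit -q -m "C1"
echo "v2" > app.txt && echo "log 2" >> tmp.log && git add . && git commit -q -m "C2"
branch=$(git branch --show-current)

export FILTER_BRANCH_SQUELCH_WARNING=1
# Run the tree filter on every commit after $base
git filter-branch --tree-filter 'rm -f ./tmp.log' "$base"..HEAD

git log --oneline                 # C0, C1, C2 with new hashes for C1 and C2
git ls-files                      # tracked files after the rewrite
git rev-parse "refs/original/refs/heads/$branch"   # backup of the pre-rewrite tip
```

filter-branch keeps the original branch tip under refs/original/, so the old commits remain reachable until that backup ref is deleted.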
As with anything related to software development, there are always multiple ways to achieve the same (or similar) outcome for a given problem. You just need to pick the most suitable option for your particular case. 🤓
You now know how accidentally committed files can be removed from a series of commits using three different options. I hope you found it useful!
Some friendly advice
Do note that all three alternatives above rewrite the history completely (i.e. by generating new commits). Unless you know exactly what you're doing and have good communication within your team, only rewrite commits that have not yet been published remotely!
If you need to rewrite already published commits, it can be done the same way. You just have to force push your local changes once done, which discards the original commits from your remote as well, so be careful!
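For illustration, here is a sketch of that last step using a local bare repository as a stand-in remote. Everything in it is an assumption made for the demo: the directory names, the file app.txt, and the remote name origin. It also swaps the plain force push for `--force-with-lease`, a more cautious variant that refuses to overwrite the remote branch if it has moved since you last fetched or pushed.

```shell
set -e
# A bare repo plays the role of the remote; "local" is the working clone
work=$(mktemp -d) && cd "$work"
git init -q --bare remote.git
git init -q local && cd local
git config user.name "Example" && git config user.email "example@example.com"
git remote add origin ../remote.git

echo "v0" > app.txt && git add . && git commit -q -m "C0"
echo "v1" > app.txt && echo "log 1" > tmp.log && git add . && git commit -q -m "C1"
branch=$(git branch --show-current)
git push -q origin "$branch"      # C1 is now published, log file included

# Rewrite the published commit locally
git rm -q tmp.log
git commit -q --amend --no-edit

# A plain push is refused because the histories have diverged
git push -q origin "$branch" 2>/dev/null || echo "plain push rejected"

# Force push, but only if the remote branch is still where we last saw it
git push -q --force-with-lease origin "$branch"
```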
😎 Thanks for reading and good luck improving your source code management skills!