Git Commands & Concepts – Demystified (part 1)
When I started working with Git (10 years ago) I was overwhelmed by the tsunami of commands and concepts that washed over me; I had big problems sorting them out to make any sense of the program. 🤯
I bet you recognise the feeling. Don't lose faith – there's light at the end of the tunnel!
In Git, many terms share the same name and are only distinguished by the context they're in. Take commit: as a noun it refers to the commit object, and as a verb it denotes the action of creating that commit object. Furthermore, as Git has evolved over more than 15 years, so has the terminology, causing the same parts of it to be known under different names, such as Staging Area and Index. That doesn't make it any easier for anyone new to the game!
With Git now being the primary tool for source code management, knowing its commands and concepts is key for any software professional looking to communicate efficiently with fellow peers (particularly when searching for answers online).
In this first post out of three, I'll provide you with a mental model highlighting all key parts of Git, including a short description of each. In the subsequent posts, I'll then go through the standard Git workflow (or commands) using the same mental model.
This model will hopefully make your life with Git more bearable! 🙃
Bits and pieces
Below is a visualization of the main bits and pieces that make up a traditional Git setup, followed by a short description of each component. Learning this mental model will aid you in your day-to-day Git usage – I resort to it regularly!
Git components
The list below describes each component in the visualization above, ordered by descending generality; basics first!
- Repository (or "repo"): The storage, or database, where all source code is stored along with its historical metadata. As Git is a Distributed Version Control System (DVCS), the entire repository exists both locally (on each developer's computer) as well as remotely (typically on a shared server).
- Remote: Generic term used to reference anything that does not reside on your local machine. For example, a remote branch is a branch that's present in a remote repository; if master is the name of your local branch, then origin/master would be the default name of its remote counterpart (strictly speaking, origin/master is a remote-tracking reference kept in your local repo that mirrors the branch on the remote). Likewise, a remote repo refers to a repo that, for example, sits on a remote server.
- Local: Generic term used to reference anything that exists on your machine, such as a repo or a branch. See Remote for a more in-depth description.
- Origin: Default name of remote repository. When an existing repository is initially cloned, the remote reference is named origin by default; it's just a name, there's nothing "magical" about it.
- Clone: Your local copy of a remote repository, including the entire history with commits, branches, tags, etc.
- History (or history graph): The chain of commits that makes up all changes leading up to today. It can be viewed (textually) from the terminal using $ git log --graph, or more graphically using Gitk or any other visual tool. In the illustration, commits C0 + C1 + C2 make up the history of the super-cool-feature branch.
- Branch: Lightweight, movable pointer referencing one particular commit. The term is used somewhat inconsistently, as it sometimes refers to the string of commits making up the divergent history to which changes are made, and sometimes to the actual pointer itself. For example, whenever a new commit is made on a branch, the branch pointer is automatically moved to reference the new commit.
- Master: Default name for the initial branch. When a new repository is initialized with $ git init, a branch named master is created by default. Recently, collaboration platforms such as GitHub have changed the default setting to a more neutral name: main. Just as with Origin, there's nothing magical about either master or main – it's just a name!
- Commit: Immutable snapshot of your entire codebase at a given time; formally called a commit object. Perplexed? See Immutable Snapshots - One of Git's Core Concepts
- Hash (SHA-ID, "commit hash"): Unique commit identifier, derived from the content and metadata provided at the time of commit, using the cryptographic Secure Hash Algorithm (SHA-1 by default); the algorithm produces a 20-byte hash value, written as 40 hexadecimal digits and also known as a message digest. In Git, a commit can be referenced using its entire hash, e.g. 87735044984fe97760d40d3097feef8c6a3c2219, or the shortest prefix necessary to identify the commit in the repo – generally the first seven digits, e.g. 8773504.
- Tag: Similar to a branch (when lightweight), but without the automatic updating feature – it's just a pointer to a specific commit. When annotated, on the other hand, the tag is stored as a full object including metadata such as tagger, date, and message; it can even be signed and verified with GNU Privacy Guard (GPG). Tags are typically used to mark releases, e.g. v1.0.
- Working Tree: The actual files and folders on disk currently checked out for editing, i.e. what you see in your editor or file explorer. Changes made there that are not yet staged (and not ignored) can be shown using $ git status. The Working Tree is also commonly referred to as the Working Directory or Workspace.
- Staging Area: Conceptual area containing all files staged for commit. In essence, it's an "open commit" still able to accept changes; i.e. the Staging Area contains everything that will go into the next commit.
- Index: Physically, it's the .git/index file that makes up the content of the Staging Area; conceptually, it's the same as the Staging Area. E.g. "Add README.md to Index." = "Add README.md to Staging Area." = "Stage README.md."
- HEAD: Pointer to the currently checked out branch, or commit, referencing the immutable snapshot on which your Working Tree is currently based. In short, HEAD answers the question: "Where am I right now in the repository?" For a more thorough rundown, see What is HEAD in Git?
- Stash: A small "hideaway" for local work that is not yet completed, e.g. not ready for commit, but still has to be stowed away for later. Stashing is typically done when immediate context switching is necessary and you don't have time, or don't want, to commit. It stows your unfinished work and resets your Working Tree to what HEAD points to.
- Fork: A complete copy of a remote repository (generally one to which you don't have write/push permission) under a new namespace, typically your own. E.g. the Bootstrap project resides on GitHub under the namespace twbs/bootstrap; a personal fork of it would then look like {userName}/bootstrap. Don't confuse fork with clone, which refers to a local copy of a remote repository. In short, you can first fork a repo and then clone the newly forked repo onto your own machine.
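To make a few of these components more tangible, here is a minimal sketch that builds a throwaway repository and pokes at the Staging Area, HEAD, a branch, and a lightweight tag. The identity, file content, commit messages, and tag name are made up for the illustration, and it assumes a reasonably recent Git on your PATH; treat it as something to experiment with rather than a recommended workflow.

```shell
#!/bin/sh
set -e

# Work in a throwaway directory so no existing repo is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Hypothetical identity, stored only in this repo's config.
git config user.name "Demo User"
git config user.email "demo@example.com"

# Working Tree -> Staging Area (Index) -> commit
echo "hello" > README.md
git add README.md
staged=$(git diff --cached --name-only)   # what would go into the next commit
git commit -q -m "C0: add README"

# HEAD points at the current branch; the branch points at a commit.
branch=$(git symbolic-ref --short HEAD)   # "master" or "main", depending on config
first=$(git rev-parse HEAD)               # full commit hash
short=$(git rev-parse --short HEAD)       # abbreviated form of the same hash

# A lightweight tag is a pointer that stays where it was put.
git tag v1.0

# A second commit moves the branch (and HEAD with it), but not the tag.
echo "more" >> README.md
git commit -q -a -m "C1: extend README"
second=$(git rev-parse "$branch")
tagged=$(git rev-parse v1.0)

echo "staged before C0: $staged"
echo "branch:           $branch"
echo "C0:               $first (short: $short)"
echo "C1:               $second"
echo "v1.0 still at:    $tagged"
```

Running it should show the branch now resolving to the C1 hash while v1.0 still resolves to C0, which is the "moveable pointer" versus "fixed pointer" distinction from the list.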
Other general terms
- Source Code Management (SCM) system / Version Control System (VCS): System used to keep track of changes made to source code over time.
- Checksum: A small-sized block of data derived from another block of digital data, typically using a cryptographic hash function, for the purpose of verifying data integrity. The commit hash is a checksum.
- Tree (Directory tree): Term, common on Linux and other Unix-like systems, for all files and folders under a specific directory. See this Wikipedia article for a close-up on the matter.
- Terminal: Computer program allowing textual interaction over a command line interface, as opposed to a graphical interface (which is point-and-click).
- Command Line Interface (CLI): An interface that passes commands to a computer program in the form of lines of text.
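The Checksum idea is easy to try from the terminal. The sketch below uses a blob (plain file content) rather than a commit, since a commit's hash also depends on metadata such as author and timestamp. It assumes Git is on your PATH and uses the default SHA-1 object format; the sample strings are arbitrary.

```shell
#!/bin/sh
set -e

# git hash-object --stdin prints the ID Git would assign to a blob with
# this content, without writing anything to a repository.
a=$(printf 'hello\n'  | git hash-object --stdin)
b=$(printf 'hello\n'  | git hash-object --stdin)
c=$(printf 'hello!\n' | git hash-object --stdin)

echo "hello  -> $a"
echo "hello  -> $b   (same content, same hash)"
echo "hello! -> $c   (one character added, different hash)"
```

Identical content always maps to the same ID, and any change to the content changes the ID, which is what lets Git use the hash to verify data integrity.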
Conclusion
With the most fundamental components of Git ironed out and visualized, I hope you now feel you have some additional tools to better cope with it. Learning the commands and components by heart will surely speed up your communication with your peers, and definitely aid you when searching for answers online!
You are now ready to continue reading the second post in this series, where we'll look at how the standard Git workflow fits into this mental model. We'll go through the different file statuses (untracked, tracked, ignored) and common operations such as fetch, checkout, and commit, among others.
😎 Thanks for reading and good luck improving your source code management skills!