Git Internals — What Happens When You Commit

Updated 2026-03-04

Most developers use Git daily without ever looking at what’s inside the .git directory. Understanding Git’s object model turns it from a black box into a transparent, debuggable system — and makes the mental model for rebasing, merging, and reflog much clearer.

The Object Store

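Everything in Git (file content, directory structures, commits, and tags) is stored as a content-addressed object in .git/objects/. Each object is named by the SHA-1 hash of its uncompressed content plus a short header: the first two hex characters of the hash form a subdirectory and the remaining 38 form the filename. The file itself holds the zlib-compressed data.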

# After making a commit, inspect the object store
find .git/objects -type f | head -20
# .git/objects/3b/18e512dba79e4c8300dd08aeb37f8e728b8dad
# .git/objects/84/4af7d34...
# ...
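
The path of a loose object follows directly from its hash, which a short experiment can confirm. This is a minimal sketch, assuming a throwaway repository created with mktemp and the default SHA-1 object format; the variable names are illustrative only.

```shell
set -eu
repo=$(mktemp -d)            # scratch repository, an assumption for this sketch
cd "$repo"
git init -q

# Write a blob into the object store and capture its hash
hash=$(echo "hello world" | git hash-object --stdin -w)

# First two hex characters name the directory, the rest name the file
dir=$(printf '%s' "$hash" | cut -c1-2)
file=$(printf '%s' "$hash" | cut -c3-)
path=".git/objects/$dir/$file"

ls -l "$path"
```

Splitting objects across up to 256 subdirectories keeps any single directory from growing too large.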

There are four object types:

Type     Description
blob     File contents (no filename, no metadata)
tree     Directory listing: maps filenames to blob/tree hashes
commit   Points to a tree, parent commit(s), author, and message
tag      Annotated tag: points to an object (usually a commit) with extra metadata
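
To see all four types in one place, git cat-file -t reports the type of any object. The sketch below assumes a scratch repository and reuses the Alice identity from the commit example as a placeholder; the tag name v1.0 is hypothetical.

```shell
set -eu
repo=$(mktemp -d); cd "$repo"; git init -q
# Placeholder identity so the commit and tag can be created anywhere
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com

echo "hello world" > README.md
git add README.md
git commit -q -m "Initial commit"
git tag -a v1.0 -m "First release"     # annotated tag creates a tag object

blob_type=$(git cat-file -t HEAD:README.md)
tree_type=$(git cat-file -t 'HEAD^{tree}')
commit_type=$(git cat-file -t HEAD)
tag_type=$(git cat-file -t v1.0)
echo "$blob_type $tree_type $commit_type $tag_type"
# blob tree commit tag
```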

Blobs: Storing File Content

When you stage a file, Git creates a blob from its contents:

echo "hello world" | git hash-object --stdin -w
# 3b18e512dba79e4c8300dd08aeb37f8e728b8dad

The blob stores only the raw content: no filename, no timestamp. Two files with identical content share one blob, which is how Git deduplicates content.

git cat-file -p 3b18e512dba7
# hello world
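
The hash is not computed over the bare content. Git prepends a header of the form "blob <size>" followed by a NUL byte, then hashes the uncompressed result. The sketch below reproduces that by hand, assuming sha1sum from GNU coreutils is available (on macOS, shasum plays the same role) and that Git is using its default SHA-1 format.

```shell
set -eu
# Header ("blob 12" + NUL) followed by the 12 bytes of "hello world\n"
manual=$(printf 'blob 12\0hello world\n' | sha1sum | cut -d' ' -f1)

# Without -w, hash-object only computes the hash and writes nothing
via_git=$(echo "hello world" | git hash-object --stdin)

echo "manual:  $manual"
echo "via git: $via_git"
```

The two values should be identical, which is the whole point of content addressing: the same bytes always produce the same object name.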

Trees: Storing Directories

A tree object maps names to hashes:

git cat-file -p HEAD^{tree}
# 100644 blob 8c7e5... README.md
# 100644 blob a4b3c... _config.yml
# 040000 tree 9f8d2... source

Each line contains:

  • Mode: 100644 for a regular file, 100755 for an executable, 040000 for a directory
  • Type: blob or tree
  • Hash: the SHA-1 of the object
  • Name: the filename or directory name

Nested directories are represented by nested tree objects. The entire working tree at any commit is a single root tree with sub-trees.
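
A small experiment makes the nesting visible. This sketch assumes a scratch repository with hypothetical file names; it also puts the same content in two places to show that both tree entries point at one blob.

```shell
set -eu
repo=$(mktemp -d); cd "$repo"; git init -q
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com

mkdir source
echo "hello world" > README.md
echo "hello world" > source/copy.txt     # same content as README.md
git add .
git commit -q -m "Add files"

git ls-tree HEAD            # root tree: one blob entry and one tree entry
git ls-tree HEAD:source     # sub-tree reached through the "source" entry

readme_blob=$(git rev-parse HEAD:README.md)
copy_blob=$(git rev-parse HEAD:source/copy.txt)
source_type=$(git cat-file -t HEAD:source)
```

Both files resolve to the same blob hash, even though they live in different trees under different names.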


Commits: The Spine of History

git cat-file -p HEAD
# tree 9f8d2a3f1b8e4c6d2a1b3e4f5c6d7e8f9a0b1c2d
# parent a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0
# author Alice <alice@example.com> 1706745600 +0000
# committer Alice <alice@example.com> 1706745600 +0000
#
# Add syntax highlighting showcase post

A commit stores:

  1. Tree: a pointer to the root tree (a snapshot of every tracked file at that commit)
  2. Parent(s): none for the initial commit, one for a normal commit, two or more for a merge
  3. Author and committer: name, email, timestamp, and timezone (the two can differ after a rebase or cherry-pick)
  4. Message: the commit message

A commit is a snapshot, not a diff. Git computes diffs on the fly by comparing tree objects.
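
One way to convince yourself of this is to compare two consecutive commits. In the sketch below (scratch repository, hypothetical file names), only b.txt changes, yet both commits carry a complete tree. The unchanged file simply reuses the same blob, and the diff is derived by comparing the two trees.

```shell
set -eu
repo=$(mktemp -d); cd "$repo"; git init -q
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com

echo "one" > a.txt
echo "two" > b.txt
git add .
git commit -q -m "First"

echo "two, edited" > b.txt
git commit -q -am "Second"

# The unchanged file keeps the same blob hash in both snapshots
a_before=$(git rev-parse HEAD~1:a.txt)
a_after=$(git rev-parse HEAD:a.txt)

# The diff is computed on demand from the two trees
changed=$(git diff-tree -r --name-only HEAD~1 HEAD)
echo "$changed"
# b.txt
```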


How Branches Work

cat .git/refs/heads/main
# 3b18e512dba79e4c8300dd08aeb37f8e728b8dad

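A branch is just a small file containing a commit hash: 40 hex characters plus a newline, 41 bytes in total. (Git may later move refs into .git/packed-refs, so the loose file is not guaranteed to exist.) Creating a branch is therefore nearly instantaneous: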

git branch feature-x
# Creates .git/refs/heads/feature-x pointing to HEAD's hash
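
The sketch below checks this on a fresh scratch repository. It assumes Git's default "files" ref storage and SHA-1 hashes; with other settings the ref may not exist as a loose file of this size.

```shell
set -eu
repo=$(mktemp -d); cd "$repo"; git init -q
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com
git commit -q --allow-empty -m "Initial commit"

git branch feature-x

# The new ref is a plain text file: 40 hex characters and a newline
size=$(( $(wc -c < .git/refs/heads/feature-x) ))
ref_content=$(cat .git/refs/heads/feature-x)
head_hash=$(git rev-parse HEAD)
echo "$size bytes: $ref_content"
```

In scripts, git rev-parse is the safer way to resolve a branch, since it also handles refs that have been packed.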

HEAD is a symbolic ref — usually pointing to a branch:

cat .git/HEAD
# ref: refs/heads/main

When HEAD points directly to a commit hash (not a branch), you’re in detached HEAD state.
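
The difference between the two states can be observed with git symbolic-ref, which succeeds only when HEAD names a branch. A minimal sketch on a scratch repository, assuming the default "files" ref storage so that .git/HEAD can be read directly:

```shell
set -eu
repo=$(mktemp -d); cd "$repo"; git init -q
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com
git commit -q --allow-empty -m "Initial commit"

# Attached: HEAD is a symbolic ref naming a branch
attached=$(git symbolic-ref -q HEAD)

# Detach HEAD at the current commit; .git/HEAD now holds a raw hash
git checkout -q --detach
if git symbolic-ref -q HEAD >/dev/null; then
  state=attached
else
  state=detached
fi
detached_content=$(cat .git/HEAD)
echo "$attached -> $state ($detached_content)"
```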


The Index (Staging Area)

The staging area (.git/index) is a binary file that maps each tracked path to a blob hash, along with its mode and cached file metadata. It represents the content of the next commit: the previous commit plus whatever you have staged.

# See what the index contains
git ls-files --stage
# 100644 8c7e5a... 0 README.md
# 100644 a4b3c8... 0 _config.yml

When you run git add, Git:

  1. Hashes the file content → creates a blob object
  2. Updates the index entry for that path to point to the new blob

When you run git commit, Git:

  1. Reads the current index
  2. Creates tree objects for each directory
  3. Creates a commit object pointing to the root tree and the previous commit
  4. Updates the current branch ref to point to the new commit
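
The steps above can be reproduced with Git's low-level plumbing commands. This is a rough sketch of the same sequence, not what git add and git commit literally execute; it assumes a scratch repository, a single file, and a first commit with no parent.

```shell
set -eu
repo=$(mktemp -d); cd "$repo"; git init -q
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com

echo "hello world" > README.md

# "git add": write a blob, then point the index entry at it
blob=$(git hash-object -w README.md)
git update-index --add --cacheinfo 100644,"$blob",README.md

# "git commit": write the index as a tree, wrap it in a commit, move the branch
tree=$(git write-tree)
commit=$(echo "Add README" | git commit-tree "$tree")
git update-ref "$(git symbolic-ref HEAD)" "$commit"

git log --oneline
```

For any commit after the first, git commit-tree would also take -p with the parent hash, which is how the history chain is formed.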

Rebasing Under the Hood

Running git rebase main on a feature branch replays that branch’s commits:

  1. Finds the common ancestor (merge base) of feature and main
  2. For each commit on feature after that ancestor, computes the diff from its parent
  3. Applies each diff on top of the tip of main, creating new commit objects with new hashes
  4. Moves the feature branch ref to the last new commit

The original commits remain in the object store until their reflog entries expire and garbage collection prunes them. Until then, they’re reachable via git reflog.
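
A scratch repository shows both effects: the rebased commit gets a new hash, and the old object is still present afterwards. The branch and file names below are hypothetical, and the base branch name is read from HEAD rather than assumed to be main.

```shell
set -eu
repo=$(mktemp -d); cd "$repo"; git init -q
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com
git commit -q --allow-empty -m "Base"
base_branch=$(git symbolic-ref --short HEAD)

git checkout -q -b feature
echo "feature work" > feature.txt
git add feature.txt
git commit -q -m "Feature commit"
old=$(git rev-parse HEAD)

git checkout -q "$base_branch"
echo "mainline work" > main.txt
git add main.txt
git commit -q -m "Mainline commit"

git checkout -q feature
git rebase -q "$base_branch"
new=$(git rev-parse HEAD)

echo "before: $old"
echo "after:  $new"
git cat-file -t "$old"      # the original commit object is still in the store
```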


Reflog: The Safety Net

# See where HEAD has been
git reflog
# 3b18e51 HEAD@{0}: commit: Add internals post
# a1b2c3d HEAD@{1}: rebase: Add syntax highlighting
# 9f8d2a3 HEAD@{2}: checkout: moving from main to feature

# Recover from a bad reset
git reset --hard HEAD@{2}

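The reflog is local-only, and its entries expire after 90 days by default (30 days for entries no longer reachable from the current tip). It’s what makes git reset --hard recoverable for committed work; uncommitted changes discarded by the reset never reach the reflog.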
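
Here is the recovery in miniature, as a sketch on a scratch repository with two empty commits. The expiry settings live in gc.reflogExpire and gc.reflogExpireUnreachable, which are unset unless you have overridden the defaults.

```shell
set -eu
repo=$(mktemp -d); cd "$repo"; git init -q
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com
git commit -q --allow-empty -m "First"
git commit -q --allow-empty -m "Second"
lost=$(git rev-parse HEAD)

git reset -q --hard HEAD~1          # "Second" is no longer reachable from the branch
git reflog -n 2                     # HEAD@{1} still records where HEAD was

git reset -q --hard 'HEAD@{1}'      # jump back to the pre-reset position
recovered=$(git rev-parse HEAD)

git config --get gc.reflogExpire || echo "gc.reflogExpire unset (default: 90 days)"
```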


Packfiles

As a repo grows, Git packs loose objects into binary packfiles for efficiency:

ls .git/objects/pack/
# pack-3b18e512...idx
# pack-3b18e512...pack

Inside a packfile, Git can store an object as a delta against a similar object, and each entry is zlib-compressed, which saves considerable space. Running git gc triggers repacking.
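
git count-objects -v makes the transition from loose objects to a packfile visible. The sketch below assumes a scratch repository with three small commits to a hypothetical notes.txt, and default garbage-collection settings.

```shell
set -eu
repo=$(mktemp -d); cd "$repo"; git init -q
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com

for i in 1 2 3; do
  echo "revision $i" >> notes.txt
  git add notes.txt
  git commit -q -m "Revision $i"
done

git count-objects -v        # before: objects are loose ("count"), none "in-pack"
git gc -q
git count-objects -v        # after: loose count drops, "in-pack" rises

loose=$(git count-objects -v | awk '/^count:/ {print $2}')
packed=$(git count-objects -v | awk '/^in-pack:/ {print $2}')
ls .git/objects/pack/
```

To look inside a pack, git verify-pack -v on the .idx file lists each object along with its delta base, if it has one.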


Practical Takeaways

  • Commits are snapshots, not diffs: understanding this clarifies why git rebase creates new commits
  • Branches are pointers: moving, renaming, and deleting them is cheap
  • Staging is explicit: the index gives you fine-grained control over what goes into a commit
  • Committed work is hard to lose: until reflog entries expire and GC runs, the reflog can recover any committed state you’ve been in

Understanding the object model makes commands like git cherry-pick, git bisect, and git stash far less mysterious — they’re all just manipulating the same four object types.