In this chapter, we will take a look at Git's data model. We will learn how Git references its objects and how the history is recorded. We will learn how to navigate the history, from finding certain text snippets in commit messages to finding the commit that introduced a particular string in the code.
Git's data model differs from those of other common version control systems (VCSs) in the way it handles its data. Traditionally, a VCS stores its data as an initial file, followed by a list of patches for each new version of the file:

Git is different: instead of an initial file and a list of patches, Git records a snapshot of all the files it tracks, together with their paths relative to the repository root; that is, the tree of tracked files in the filesystem. Each commit in Git records the full tree state. If a file does not change between commits, Git does not store the file again; instead, the new snapshot refers to the object that is already stored. The following diagram shows what the files look like after every commit/version:

This is what makes Git different from most other VCSs, and, in the following chapters, we will explore some of the benefits of this powerful model.
The way Git references files and directories is directly built into the data model. In short, the Git data model can be summarized as shown in the following diagram:

The commit object points to the root tree. The root tree points to subtrees and files. Branches and tags point to a commit object, and the HEAD pointer points to the branch that is currently checked out. So, for every commit, the full tree state and snapshot are identified by the root tree.
Now that you know Git stores every commit as a full tree state or snapshot, let's take a closer look at the objects Git stores in the repository.
Git's object storage is a key-value store, the key being the ID of the object and the value being the object itself. The key is a SHA-1 hash of the object's content together with a short header that records the object's type and size. There are four types of objects in Git, as well as branches (which are not objects, but are important nonetheless) and the special HEAD pointer that refers to the branch/commit currently checked out. The four object types are as follows:
- Files, or blobs as they are also called in the Git context
- Directories, or trees in the Git context
- Commits
- Tags
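The key for a blob can be reproduced by hand: Git hashes a header made of the object type, a space, the content size in bytes, and a NUL byte, followed by the content itself. The following is a minimal sketch, assuming a POSIX shell, Git with the default SHA-1 object format, and the GNU coreutils sha1sum tool (on macOS, shasum behaves the same way); the content is made up for illustration:

```shell
# Content to store as a blob: "hello" plus a trailing newline (6 bytes)
content='hello
'
# Size of the content in bytes
size=$(printf '%s' "$content" | wc -c | tr -d ' ')
# Hash "blob <size>\0<content>" ourselves; \0 in the printf format is a NUL byte
manual=$(printf 'blob %s\0%s' "$size" "$content" | sha1sum | cut -d' ' -f1)
# Ask Git for the ID of the same content (no -w, so nothing is written)
from_git=$(printf '%s' "$content" | git hash-object --stdin)
echo "computed by hand: $manual"
echo "git hash-object:  $from_git"
```

Both lines print the same ID, ce013625030ba8dba906f756967f9e9ca394464a, for this content.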
We will start by looking at the most recent commit object in the example repository, keeping in mind that the special HEAD pointer points to the branch that is currently checked out.
To view the objects in the Git database, we first need a repository to be examined. For this recipe, we will clone an example repository in the following location:
$ git clone https://github.com/PacktPublishing/Git-Version-Control-Cookbook-Second-Edition.git
$ cd Git-Version-Control-Cookbook-Second-Edition
Now you are ready to look at the objects in the database. We will start with the commit object, followed by the trees, the files, and finally, the branches and tags.
Let's take a closer look at the objects Git stores in the repository.
Git's special HEAD pointer always points to the current snapshot/commit, so we can use it as the target when requesting the commit we want to look at:
$ git cat-file -p HEAD
tree 34fa038544bcd9aed660c08320214bafff94150b
parent 5c662c018efced42ca5e9cce709787c40a849f34
author John Doe <john.doe@example.com> 1386933960 +0100
committer John Doe <john.doe@example.com> 1386941455 +0100
This is the subject line of the commit message. It should be followed by a blank line and then the body, which is this text. Here, you can use multiple paragraphs to explain your commit. It's like an email with a subject and a body to try to attract people's attention to the subject.
The cat-file command with the -p option prints the object given on the command line; in this case, HEAD, which points to master, which, in turn, points to the most recent commit on the branch.
We can now see the commit object, consisting of the root tree (tree), the parent commit object's ID (parent), the author and timestamp information (author), the committer and timestamp information (committer), and the commit message.
To see the tree object, we can run the same command, but with the tree ID (34fa038544bcd9aed660c08320214bafff94150b) as the target:
$ git cat-file -p 34fa038544bcd9aed660c08320214bafff94150b
100644 blob f21dc2804e888fee6014d7e5b1ceee533b222c15 README.md
040000 tree abc267d04fb803760b75be7e665d3d69eeed32f8 a_sub_directory
100644 blob b50f80ac4d0a36780f9c0636f43472962154a11a another-file.txt
100644 blob 92f046f17079aa82c924a9acf28d623fcb6ca727 cat-me.txt
100644 blob bb2fe940924c65b4a1cefcbdbe88c74d39eb23cd hello_world.c
We can also ask for the tree object of the commit pointed to by HEAD with git cat-file -p HEAD^{tree}, which gives the same result as the previous command. The special notation HEAD^{tree} means that, starting from the given reference (HEAD), Git recursively dereferences the object until it finds a tree object.
The first tree object found this way is the root tree object of the commit pointed to by the master branch, which, in turn, is pointed to by HEAD. The generic form of the notation is <rev>^{<type>}, which returns the first object of type <type>, searching recursively from <rev>.
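The HEAD^{tree} notation can be tried out in a throwaway repository. This minimal sketch assumes Git and a POSIX shell; the repository, identity, and file name are made up for illustration. It checks that HEAD^{tree} resolves to the same ID as the tree line of the commit object, and uses git cat-file -t to print the type of each object along the way:

```shell
set -e
# Throwaway repository with a made-up identity
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "This is just a file" > cat-me.txt
git add cat-me.txt
git commit -q -m "Initial commit"

# The tree line of the commit object...
tree_from_commit=$(git cat-file -p HEAD | sed -n 's/^tree //p')
# ...is the object that HEAD^{tree} peels to
tree_peeled=$(git rev-parse 'HEAD^{tree}')
echo "$tree_from_commit"
echo "$tree_peeled"

# -t prints the type of an object: commit, tree, and blob respectively
git cat-file -t HEAD
git cat-file -t "$tree_peeled"
git cat-file -t HEAD:cat-me.txt
```

HEAD:cat-me.txt uses the <rev>:<path> notation to name the blob stored at that path in the commit's tree.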
From the tree object, we can see what it contains: the file type/permissions, the object type (tree/blob), the ID, and the pathname:
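| Type/Permissions | Type | ID/SHA-1 | Pathname |
|---|---|---|---|
| 100644 | blob | f21dc2804e888fee6014d7e5b1ceee533b222c15 | README.md |
| 040000 | tree | abc267d04fb803760b75be7e665d3d69eeed32f8 | a_sub_directory |
| 100644 | blob | b50f80ac4d0a36780f9c0636f43472962154a11a | another-file.txt |
| 100644 | blob | 92f046f17079aa82c924a9acf28d623fcb6ca727 | cat-me.txt |
| 100644 | blob | bb2fe940924c65b4a1cefcbdbe88c74d39eb23cd | hello_world.c |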
Now, we can investigate the blob (file) object. We can do this using the same command, giving the blob ID of the cat-me.txt file as the target:
$ git cat-file -p 92f046f17079aa82c924a9acf28d623fcb6ca727
The content of the file is cat-me.txt.
Not really that exciting, huh?
This is simply the content of the file, which we can also get by running a normal cat cat-me.txt command. So, the objects are tied together, blobs to trees, trees to other trees, and the root tree to the commit object, all connected by the SHA-1 identifiers of the objects.
A branch is not really an object like the other Git objects; you can't print it using the cat-file command as we can with the others (if you specify the -p pretty-print option, you'll just get the commit object it points to), as shown in the following code:
$ git cat-file master
usage: git cat-file (-t|-s|-e|-p|<type>|--textconv) <object>
or: git cat-file (--batch|--batch-check) < <list_of_objects>
<type> can be one of: blob, tree, commit, tag.
...
$ git cat-file -p master
tree 34fa038544bcd9aed660c08320214bafff94150b
parent a90d1906337a6d75f1dc32da647931f932500d83
...
Instead, we can look at the branch inside the .git folder, where the whole Git repository is stored. If we open the text file .git/refs/heads/master, we can see the commit ID that the master branch points to (unless the reference has been packed into the .git/packed-refs file). We can do this using cat, as follows:
$ cat .git/refs/heads/master
13dcada077e446d3a05ea9cdbc8ecc261a94e42d
We can verify that this is the latest commit by running git log -1:
$ git log -1
commit 34acc370b4d6ae53f051255680feaefaf7f7850d (HEAD -> master, origin/master, origin/HEAD)
Author: John Doe <john.doe@example.com>
Date: Fri Dec 13 12:26:00 2013 +0100
This is the subject line of the commit message
...
We can also see that HEAD is pointing to the active branch by using cat with the .git/HEAD file:
$ cat .git/HEAD
ref: refs/heads/master
A branch is simply a pointer to a commit, identified by the commit's SHA-1 hash.
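The same inspection can be scripted. This minimal sketch assumes Git with the default files-based reference storage (where a freshly created branch is a loose file under .git/refs/heads) and a throwaway repository with a made-up identity. Since the default branch name depends on your Git configuration, it asks git symbolic-ref which branch HEAD points to instead of assuming master:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "content" > file.txt
git add file.txt
git commit -q -m "First commit"

# HEAD is a symbolic reference naming the checked-out branch, e.g. refs/heads/master
branch_ref=$(git symbolic-ref HEAD)
cat .git/HEAD

# The branch itself is just a file holding one commit ID
from_file=$(cat ".git/$branch_ref")
from_git=$(git rev-parse HEAD)
echo "$branch_ref -> $from_file"
```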
The last object to be analyzed is the tag object. There are three different kinds of tag: a lightweight tag (just a label), an annotated tag, and a signed tag. In the example repository, there are two annotated tags:
$ git tag
v0.1
v1.0
Let's take a closer look at the v1.0 tag:
$ git cat-file -p v1.0
object f55f7383b57ad7c11cf56a7c55a8d738af4741ce
type commit
tag v1.0
tagger John Doe <john.doe@example.com> 1526017989 +0200
We got the hello world C program merged, let's call that a release 1.0
As you can see, the tag consists of an object—which, in this case, is the latest commit on the master branch—the object's type (commits, blobs, and trees can be tagged), the tag name, the tagger and timestamp, and finally the tag message.
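The difference between a lightweight and an annotated tag shows up in the type of object the tag reference points to. In this minimal sketch (a throwaway repository with a made-up identity and made-up tag names), the lightweight tag refers directly to the commit, whereas the annotated tag refers to a separate tag object, which can be peeled back to the commit with the ^{commit} notation:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "content" > file.txt
git add file.txt
git commit -q -m "First commit"

git tag v0.1                       # lightweight: only a reference to the commit
git tag -a v1.0 -m "Release 1.0"   # annotated: a tag object with tagger and message

git cat-file -t v0.1               # commit
git cat-file -t v1.0               # tag
git cat-file -p v1.0               # object, type, tag, tagger, and the message
# Peel the annotated tag to the commit it points to
git rev-parse 'v1.0^{commit}'
```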
The git cat-file -p command prints the object given as input. It is rarely needed in everyday Git work, but it is quite useful for investigating how Git ties the objects together.
We can also verify the output of git cat-file by rehashing it with the git hash-object command; for example, if we want to verify the commit object at HEAD (34acc370b4d6ae53f051255680feaefaf7f7850d), we can run the following command:
$ git cat-file -p HEAD | git hash-object -t commit --stdin
13dcada077e446d3a05ea9cdbc8ecc261a94e42d
If the printed hash is the same as the ID of the commit at HEAD, the object checks out; you can confirm the ID of HEAD with git log -1.
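The round trip can be checked in a throwaway repository. This minimal sketch (made-up identity and content) rehashes the commit at HEAD and compares the result with git rev-parse HEAD. For commits, cat-file -p prints the raw object, but for trees it prints a formatted listing, so the sketch rehashes the tree from its raw form, printed by git cat-file tree <id>:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "content" > file.txt
git add file.txt
git commit -q -m "First commit"

# Rehash the commit object as printed by cat-file
rehashed_commit=$(git cat-file -p HEAD | git hash-object -t commit --stdin)
echo "rehashed: $rehashed_commit"
echo "HEAD:     $(git rev-parse HEAD)"

# Trees must be rehashed from their raw (binary) form, not from the -p listing
tree=$(git rev-parse 'HEAD^{tree}')
rehashed_tree=$(git cat-file tree "$tree" | git hash-object -t tree --stdin)
echo "tree:     $tree"
echo "rehashed: $rehashed_tree"
```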
We have seen the different objects in Git, but how do we create them? In this example, we'll see how to create blob, tree, and commit objects in the repository. We'll also learn about the three stages of creating a commit.
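Before looking at the porcelain commands used in this recipe, it may help to see the three object types created one at a time with Git's plumbing commands. This is a minimal sketch for illustration, not what git add and git commit literally run; it assumes a throwaway repository with a made-up identity, file name, and messages:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# 1. Blob: write the content straight into the object database (-w)
blob=$(echo "Hello from plumbing" | git hash-object -w --stdin)
# 2. Tree: put the blob in the index under a path, then write the index as a tree
git update-index --add --cacheinfo 100644 "$blob" hello.txt
tree=$(git write-tree)
# 3. Commit: wrap the tree in a commit object (no -p, so it has no parent)
commit=$(echo "Commit created with plumbing" | git commit-tree "$tree")
# Move the current branch to the new commit so it shows up in the history
git update-ref HEAD "$commit"

# hello.txt exists only in the index and the commit, not in the working directory
git cat-file -t "$blob"
git cat-file -t "$tree"
git log --oneline
```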
We'll use the same Git-Version-Control-Cookbook-Second-Edition repository that we saw in the last recipe:
$ git clone https://github.com/PacktPublishing/Git-Version-Control-Cookbook-Second-Edition.git
$ cd Git-Version-Control-Cookbook-Second-Edition
- First, we'll make a small change to a file and check git status:
$ echo "Another line" >> another-file.txt
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: another-file.txt
no changes added to commit (use "git add" and/or "git commit -a")
This, of course, just tells us that we have modified another-file.txt and need to use git add to stage it.
- Let's add the another-file.txt file and run git status again:
$ git add another-file.txt
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes to be committed:
(use "git reset HEAD <file>..." to unstage)
modified: another-file.txt
The file is now ready to be committed, just as you have probably seen before. But what happens during the add command? Generally speaking, the add command copies file content from the working directory to the staging area; however, that is not all that happens, even though you don't see it. When a file is added to the staging area, its SHA-1 hash is computed and the blob object is written to Git's database. This happens every time a file is added; if the file's content has not changed, the blob is already in the database and is not stored again. At first, it might seem that the database will grow quickly, but this is not the case. Garbage collection kicks in from time to time, compressing and cleaning up the database and keeping only the objects that are required.
- We can edit the file again and run git status:
$ echo 'Whoops almost forgot this' >> another-file.txt
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes to be committed:
(use "git reset HEAD <file>..." to unstage)
modified: another-file.txt
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: another-file.txt
Now, the file shows up in both the Changes to be committed and Changes not staged for commit sections. This looks a bit weird at first, but there is, of course, a reason for this. When we added the file the first time, its content was hashed and stored in Git's database. The second change to the file has not yet been hashed and written to the database; it exists only in the working directory. Therefore, the file shows up in both the Changes to be committed and Changes not staged for commit sections; the first change is ready to be committed, the second is not. Let's also add the second change:
$ git add another-file.txt
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes to be committed:
(use "git reset HEAD <file>..." to unstage)
modified: another-file.txt
- Now, all the changes we have made to the file are ready to be committed, and we can record a commit:
$ git commit -m 'Another change to another file'
[master 99fac83] Another change to another file
1 file changed, 2 insertions(+)
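The claim that git add already writes the blob can be checked directly. In this minimal sketch (a throwaway repository; the file content is made up), git hash-object computes the blob ID without writing anything, git cat-file -e reports whether that object exists, and git ls-files -s shows the blob ID recorded in the index:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
echo "This is just another file" > another-file.txt

# Compute the blob ID only (no -w, so nothing is written)
id=$(git hash-object another-file.txt)
# cat-file -e exits with status 0 only if the object exists in the database
if git cat-file -e "$id" 2>/dev/null; then before=present; else before=absent; fi

git add another-file.txt
if git cat-file -e "$id" 2>/dev/null; then after=present; else after=absent; fi
# Staged entry format: <mode> <blob ID> <stage><TAB><path>
staged=$(git ls-files -s another-file.txt | cut -d' ' -f2)
echo "before add: $before, after add: $after, index: $staged"
```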
As we learned previously, the add command creates the blob objects; the tree and commit objects are created when we run the commit command. We can view these objects using the cat-file command, as we saw in the previous recipe:
$ git cat-file -p HEAD
tree 162201200b5223d48ea8267940c8090b23cbfb60
parent 13dcada077e446d3a05ea9cdbc8ecc261a94e42d
author John Doe <john.doe@example.com> 1524163792 +0200
committer John Doe <john.doe@example.com> 1524163792 +0200
Another change to another file
The root-tree object from the commit is as follows:
$ git cat-file -p HEAD^{tree}
100644 blob f21dc2804e888fee6014d7e5b1ceee533b222c15 README.md
040000 tree abc267d04fb803760b75be7e665d3d69eeed32f8 a_sub_directory
100644 blob 35d31106c5d6fdb38c6b1a6fb43a90b183011a4b another-file.txt
100644 blob 92f046f17079aa82c924a9acf28d623fcb6ca727 cat-me.txt
100644 blob bb2fe940924c65b4a1cefcbdbe88c74d39eb23cd hello_world.c
From the previous recipe, we know that the SHA-1 of the root tree was 34fa038544bcd9aed660c08320214bafff94150b and the SHA-1 of the another-file.txt file was b50f80ac4d0a36780f9c0636f43472962154a11a, and, as expected, both changed in our latest commit when we updated the another-file.txt file. We added the same file, another-file.txt, twice before we created the commit that recorded the changes in the repository's history. We also learned that the add command creates a blob object when called. So, the Git database must also contain an object matching the content of another-file.txt at the time we first added the file to the staging area. We can use the git fsck command to check for dangling objects, that is, objects that are not referred to by other objects or references:
$ git fsck --dangling
Checking object directories: 100% (256/256), done.
dangling blob ad46f2da274ed6c79a16577571a604d3281cd6d9
Let's check the content of the blob using the following command:
$ git cat-file -p ad46f2da274ed6c79a16577571a604d3281cd6d9
This is just another file
Another line
As expected, the blob matches the content of another-file.txt at the time we first added it to the staging area.
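The whole experiment can be reproduced in a throwaway repository. This minimal sketch (made-up identity and content) stages two versions of a file before committing, records the ID of the first staged blob, and checks that git fsck reports exactly that blob as dangling:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "This is just another file" > another-file.txt
git add another-file.txt
git commit -q -m "Add another file"

echo "Another line" >> another-file.txt
git add another-file.txt                    # writes a blob for this version
first=$(git hash-object another-file.txt)   # ID of that intermediate blob
echo "Whoops almost forgot this" >> another-file.txt
git add another-file.txt                    # the index now refers to a newer blob
git commit -q -m "Another change to another file"

# Nothing refers to the intermediate blob any more, so fsck lists it as dangling
git fsck --dangling 2>/dev/null
git cat-file -p "$first"
```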
The following diagram describes the three stages and the commands used to move between them:

For more examples and information on the cat-file and fsck commands, please consult the Git documentation at https://git-scm.com/docs/git-cat-file and https://git-scm.com/docs/git-fsck.
The history in Git is formed from the commit objects; as development advances, branches are created and merged, and the history forms a directed acyclic graph (DAG) because of the way Git ties each commit to its parent commit(s). The DAG makes it easy to follow the development of a project based on its commits.
Please note that the arrows in the following diagram are dependency arrows, meaning that each commit points to its parent commit(s), which is why the arrows point in the opposite direction to the normal flow of time:

A graph of the example repository with abbreviated commit IDs
You can view the history (the DAG) in Git by using the git log command. There are also a number of visual Git tools that can display the history graphically. This section will show some features of git log.
We will use the example repository from the last section and ensure that the master branch is pointing to 34acc37:
$ git checkout master && git reset --hard 34acc37
In the previous command, we only use the first seven characters (34acc37) of the commit ID; this is fine as long as the abbreviated ID is unique in the repository.
$ git log -3
- This will display the following result:
commit 34acc370b4d6ae53f051255680feaefaf7f7850d
Author: John Doe <john.doe@example.com>
Date: Fri Dec 13 12:26:00 2013 +0100
This is the subject line of the commit message.
It should be followed by a blank line then the body, which is this text. Here
you can have multiple paragraphs etc. and explain your commit. It's like an
email with subject and body, so try to get people's attention in the subject
commit a90d1906337a6d75f1dc32da647931f932500d83
Author: John Doe <john.doe@example.com>
Date: Fri Dec 13 12:17:42 2013 +0100
Instructions for compiling hello_world.c
commit 485884efd6ac68cc7b58c643036acd3cd208d5c8
Merge: 44f1e05 0806a8b
Author: John Doe <john.doe@example.com>
Date: Fri Dec 13 12:14:49 2013 +0100
Merge branch 'feature/1'
Adds a hello world C program.
- By default, git log prints the commit ID, the author's name and email address, the timestamp, and the commit message. However, this output isn't very graphical, especially if you want to see branches and merges. To display this information and limit some of the other data, you can use the following options with git log:
$ git log --decorate --graph --oneline --all
- The previous command shows one commit per line (--oneline), identified by its abbreviated commit ID and the commit message subject. A graph depicting the dependencies between the commits is drawn (--graph). The --decorate option shows the branch and tag names after the abbreviated commit ID, and the --all option shows all the branches, instead of just the current one:
$ git log --decorate --graph --oneline --all
* 34acc37 (HEAD, tag: v1.0, origin/master, origin/HEAD, master) This is the sub...
* a90d190 Instructions for compiling hello_world.c
* 485884e Merge branch 'feature/1'
...
This output, however, gives neither the timestamp nor the author information, because of the way the --oneline option formats the output.
- Fortunately, the log command lets us create our own output format, so we can make a history view similar to the previous one that also includes the author and timestamp information. Colors are added with the %C<color-name>text-to-be-colored%Creset syntax:
$ git log --all --graph \
    --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%ci) %C(bold blue)<%an>%Creset'

- This is a bit cumbersome to write, but luckily, it can be saved as an alias so you only have to write it once; afterward, running git graph shows the same view:
$ git config --global alias.graph "log --all --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%ci) %C(bold blue)<%an>%Creset'"
Git traverses the DAG by following the parent IDs (hashes) from the given commit(s). The options passed to git log
can format the output in different ways; this can serve several purposes—for example, to give a nice graphical view of the history, branches, and tags, as seen previously, or to extract specific information from the history of a repository to use, for example, in a script.
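For scripts, a custom format with an easy-to-split separator is usually more convenient than the decorated graph. In this minimal sketch (a throwaway repository with a made-up identity and commits), each commit becomes one line holding the subject, author name, and parent IDs (%P). It uses --format=<fmt>, which behaves like --pretty=tformat:<fmt> and ends every record, including the last one, with a newline, as line-oriented tools expect:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
for n in 1 2 3; do
  echo "line $n" >> notes.txt
  git add notes.txt
  git commit -q -m "Change $n"
done

# One record per commit: subject|author name|parent IDs
git log --format='%s|%an|%P' > history.txt
cat history.txt
# Example consumer: root commits are the records with an empty parent field
roots=$(awk -F'|' '$3 == ""' history.txt | wc -l | tr -d ' ')
echo "root commits: $roots"
```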
A common task when creating a release is to write a release note, which contains, among other things, the bugs fixed in the release. A good practice is to state in the commit message whether the commit fixes a bug. A better practice is to have a standard way of doing this, for example, a line with the string "Fixes-bug: " followed by the bug identifier in the last part of the commit message. This makes it easy to compile a list of fixed bugs for a release note. The JGit project is a good example of this; its commit messages identify bugs with a simple "Bug: " string followed by the bug ID.
This recipe will show you how to limit the output of git log to list only the commits since the last release (tag) that contain a bug fix.
Clone the JGit repository using the following command lines:
$ git clone https://git.eclipse.org/r/jgit/jgit
$ cd jgit
If you want exactly the same output as in this example, reset your master branch to b14a93971837610156e815ae2eee3baaa5b7a44b:
$ git checkout master && git reset --hard b14a939
You are now ready to look through the commit log for commit messages that describe the bugs fixed.
- First, let's limit the log to only look through the history since the last tag (release). To find the last tag, we can use git describe:
$ git describe
v3.1.0.201310021548-r-96-gb14a939
The preceding output tells us three things:
- The last tag was v3.1.0.201310021548-r
- The number of commits since the tag was 96
- The current commit in abbreviated form is b14a939
Now, the log can be parsed from HEAD back to v3.1.0.201310021548-r. But just running git log v3.1.0.201310021548-r..HEAD will give us all 96 commits, and we only want the commits whose commit messages contain "Bug: xxxxxx" for our release note. The xxxxxx is the bug identifier, which is a number. We can use the --grep option with git log for this purpose, giving git log --grep "Bug: ". This will give us all the commits containing "Bug: " in the commit message; all we need to do now is format the output into something that we can use for our release note.
- Now, let's say we want the release note format to look like the following template:
Commit-id: Commit subject
Fixes-bug: xxx
- Our command line so far is as follows:
$ git log --grep "Bug: " v3.1.0.201310021548-r..HEAD
This gives us all the bug fix commits, but we can format the output so it is easily parsed by using the --pretty option.
- First, we will print the abbreviated commit ID (%h), followed by a separator of our choice (|), then the commit subject (%s, the first line of the commit message), followed by a newline (%n) and the body (%b):
--pretty="%h|%s%n%b"
The output, of course, needs to be parsed, but that's easy with regular Linux tools, such as grep and sed.
- Next, we keep only the lines that contain "|" or "Bug: ":
grep -E "\||Bug: "
- Then, we rewrite these lines with sed, replacing the first "|" with ": " and "Bug:" with "Fixes-bug:":
sed -e 's/|/: /' -e 's/Bug:/Fixes-bug:/'
- The entire command put together is as follows:
$ git log --grep "Bug: " v3.1.0.201310021548-r..HEAD --pretty="%h|%s%n%b" | grep -E "\||Bug: " | sed -e 's/|/: /' -e 's/Bug:/Fixes-bug:/'
- The previous set of commands gives the following output:
f86a488: Implement rebase.autostash
Fixes-bug: 422951
7026658: CLI status should support --porcelain
Fixes-bug: 419968
e0502eb: More helpful InvalidPathException messages (include reason)
Fixes-bug: 413915
f4dae20: Fix IgnoreRule#isMatch returning wrong result due to missing reset
Fixes-bug: 423039
7dc8a4f: Fix exception on conflicts with recursive merge
Fixes-bug: 419641
99608f0: Fix broken symbolic links on Cygwin.
Fixes-bug: 419494
...
Now, we can extract the bug information from the bug tracker and put the preceding output in the release note as well, if necessary.
First, we limit the git log command to show only the range of commits we are interested in, and then we further limit the output by filtering on the "Bug: " string in the commit message. We pretty-print the output so we can easily format it into the style we need for the release note, and finally, we use grep to keep the relevant lines and sed to replace "Bug: " with "Fixes-bug: " to completely match the style of the release note.
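The whole recipe can be rehearsed on a small scale. This minimal sketch creates a throwaway repository with a made-up identity, an annotated tag v1.0, and hypothetical commits whose bodies contain "Bug: " lines. Plain git describe prints <tag>-<number of commits since the tag>-g<abbreviated ID>, while git describe --abbrev=0 prints only the tag name, so the sketch uses the latter to build the range instead of typing it, and then runs the same git log, grep, and sed pipeline as this recipe:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo base > file.txt && git add file.txt && git commit -q -m "Base"
git tag -a v1.0 -m "Release 1.0"   # git describe uses annotated tags by default

# Hypothetical commits: two mention a bug in the message body, one does not
echo a >> file.txt && git add file.txt && git commit -q -m "Fix crash on empty input" -m "Bug: 1001"
echo b >> file.txt && git add file.txt && git commit -q -m "Tidy up README"
echo c >> file.txt && git add file.txt && git commit -q -m "Handle missing config" -m "Bug: 1002"

desc=$(git describe)                # e.g. v1.0-3-g<abbreviated ID>
last_tag=$(git describe --abbrev=0) # just the tag name
echo "describe: $desc (last tag: $last_tag)"

# The same pipeline as in this recipe, with the range computed instead of typed
git log --grep "Bug: " "$last_tag..HEAD" --pretty="%h|%s%n%b" \
  | grep -E "\||Bug: " \
  | sed -e 's/|/: /' -e 's/Bug:/Fixes-bug:/' > release-note.txt
cat release-note.txt
```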
If we just wanted to extract the bug IDs from the commit messages and didn't care about the commit IDs, we could have simply used grep after the git log command, still limiting the log to the last tag:
$ git log v3.1.0.201310021548-r..HEAD | grep "Bug: "
If we just want the commit IDs and their subjects, but not the actual bug IDs, we can use the --oneline feature of git log combined with the --grep option:
$ git log --grep "Bug: " --oneline v3.1.0.201310021548-r..HEAD
As we saw in the previous recipe, where a list of fixed issues was extracted from the history, a list of all the files that have been changed since the last release can also easily be extracted. The files can be further filtered to find those that have been added, deleted, modified, and so on.
We will use the same repository and HEAD position (HEAD pointing to b14a939) as in the previous recipe. The release is also the same: v3.1.0.201310021548-r.
The following command lists all the files that have changed since the last release (v3.1.0.201310021548-r):
$ git diff --name-only v3.1.0.201310021548-r..HEAD
org.eclipse.jgit.packaging/org.eclipse.jgit.target/jgit-4.3.target
org.eclipse.jgit.packaging/org.eclipse.jgit.target/jgit-4.4.target
org.eclipse.jgit.pgm.test/tst/org/eclipse/jgit/pgm/DescribeTest.java
org.eclipse.jgit.pgm.test/tst/org/eclipse/jgit/pgm/FetchTest.java
org.eclipse.jgit.pgm/src/org/eclipse/jgit/pgm/Describe.java
...
The git diff command accepts the same range notation as git log did in the previous recipe, although for git diff, A..B simply compares the two end points, A and B. By specifying --name-only, Git outputs only the paths of the files that changed between the two end points of the range.
The output of the command can be filtered further: if we only want to show which files have been deleted from the repository since the last release, we can use the --diff-filter switch with git diff:
$ git diff --name-only --diff-filter=D v3.1.0.201310021548-r..HEAD
org.eclipse.jgit.junit/src/org/eclipse/jgit/junit/SampleDataRepositoryTestCase.java
org.eclipse.jgit.packaging/org.eclipse.jgit.target/org.eclipse.jgit.target.target
org.eclipse.jgit.test/tst/org/eclipse/jgit/internal/storage/file/GCTest.java
There are also filter letters for files that have been added (A), copied (C), deleted (D), modified (M), renamed (R), and so on.
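The filter letters can be seen at work in a throwaway repository. In this minimal sketch (made-up identity, file names, and tag), one commit after the v1.0 tag modifies one file, deletes another, and adds a third; --name-status prints the letter next to each path, and --diff-filter accepts several letters at once. The deleted and added files have completely different content, so rename detection does not pair them up:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "one" > keep.txt
echo "two" > old.txt
git add keep.txt old.txt
git commit -q -m "Initial files"
git tag -a v1.0 -m "Release 1.0"

echo "changed" >> keep.txt     # modified (M)
git rm -q old.txt              # deleted (D)
echo "three" > new.txt         # added (A)
git add keep.txt new.txt
git commit -q -m "Add, modify, and delete files"

git diff --name-status v1.0..HEAD
deleted=$(git diff --name-only --diff-filter=D v1.0..HEAD)
added_or_modified=$(git diff --name-only --diff-filter=AM v1.0..HEAD)
echo "deleted: $deleted"
```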
We saw earlier how we can view the history (the DAG) and visualize it by using git log. However, as the history grows, the terminal representation of the history can become cumbersome to navigate. Fortunately, there are many graphical tools for Git, one of them being gitk, which works on multiple platforms (Linux, Mac, and Windows).
This recipe will show you how to get started with gitk.
Make sure you have gitk installed:
$ which gitk
/usr/local/bin/gitk
If nothing shows up, then gitk is not installed on your system, or at least is not available on your $PATH.
Change the directory to the Git-Version-Control-Cookbook-Second-Edition repository from the objects and DAG examples. Make sure the master branch is checked out and pointing to 13dcad:
$ git checkout master && git reset --hard 13dcad
In the repository, run gitk --all & to bring up the gitk interface. You can also specify the commit range or branches you want, just as you did with git log (or provide --all to see everything):
$ gitk --all &
Gitk shows the commit history of the repository:

Gitk parses the information for every commit and the objects attached to it to provide an easy-to-read graphical screen that shows a graph of the history, together with the author and timestamp of each commit. The bottom half shows the details of the selected commit: the commit message and the patch for each changed file. A list of the changed files is displayed on the right.
Though very lightweight and fast, gitk is a very powerful tool. Many context menus are available when you click on a commit, a branch, or a tag in the history view. From them, you can create and delete branches, revert and cherry-pick commits, diff selected commits, and much more.
You already saw in the previous recipe how we can filter the output of git log to only list commits with the "Bug: " string in the commit message. In this example, we will use the same technique to find specific commits in the entire history.
Again, we will use the JGit repository, this time trying to find commits related to the "Performance" keyword. In this recipe, we will look through the entire history, so we don't need the master branch to point to a specific commit.
As we tried earlier, we can use the --grep option to find specific strings in commit messages. In this recipe, we look at the entire history and search for every commit that has "Performance" in its commit message:
$ git log --grep "Performance" --oneline --all
e3f19a529 Performance improvement on writing a large index
83ad74b6b SHA-1: collision detection support
48e245fc6 RefTreeDatabase: Ref database using refs/txn/committed
087b5051f Skip redundant 'OR-reuse' step in tip commit bitmap setup
9613b04d8 Merge "Performance fixes in DateRevQueue"
84afea917 Performance fixes in DateRevQueue
7cad0adc7 DHT: Remove per-process ChunkCache
d9b224aeb Delete DiffPerformanceTest
e7a3e590e Reuse DiffPerformanceTest support code to validate algorithms
fb1c7b136 Wait for JIT optimization before measuring diff performance
In this example, we specifically ask Git to consider all of the commits in the history by supplying the --all switch. Git runs through the DAG and checks whether the "Performance" string is included in the commit message. For an easy overview of the results, the --oneline switch is also used to limit the output to the abbreviated commit ID and the subject of the commit message. Hopefully, the commit(s) we are looking for can then be identified from this much shorter list.
Note that the search is case sensitive: had we searched for "performance" (all in lowercase), the list of commits would have been very different:
$ git log --grep "performance" --oneline --all
d7deda98d Skip ignored directories in FileTreeIterator
5a87d5040 Teach UploadPack "include-tag" in "fetch"
7d9246f16 RawParseUtils#lineMap: Simplify by using null sentinel internally
4bfc6c2ae Significantly speed up FileTreeIterator on Windows
4644d15bc GC: Replace Files methods with File alternatives
d3021788d Use bitmaps for non-commit reachability checks
6b1e3c58b Run auto GC in the background
db7761025 Pack refs/tags/ with refs/heads/
30eb6423a Add GC_REST PackSource to better order DFS packs
... more output
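If the case of the search term doesn't matter, git log can ignore it with the -i (--regexp-ignore-case) option. This minimal sketch uses a throwaway repository with a made-up identity and made-up commit subjects to compare the two searches:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo a > f.txt && git add f.txt && git commit -q -m "Performance fixes in the parser"
echo b >> f.txt && git add f.txt && git commit -q -m "Improve diff performance"
echo c >> f.txt && git add f.txt && git commit -q -m "Update documentation"

# Case-sensitive (the default): only the first commit matches
exact=$(git log --grep "Performance" --oneline --all | wc -l | tr -d ' ')
# Case-insensitive: both performance-related commits match
any_case=$(git log -i --grep "performance" --oneline --all | wc -l | tr -d ' ')
echo "case-sensitive: $exact, case-insensitive: $any_case"
```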
We could also have used the Find feature in gitk to find the same commits. Open gitk with the --all switch, type Performance in the Find field, and hit Enter. This will highlight the matching commits in the history view, and you can navigate to the previous/next result by pressing Shift + up arrow or Shift + down arrow, or by using the buttons next to the Find field. You will still, however, be able to see the entire history in the view, with the matching commits highlighted:

Sometimes, it is not enough to list the commit messages. You may want to know which commits touched a specific method or variable. This is also possible using git log. You can search for a string, such as a variable or method name, and git log will list the commits that added or deleted that string in the history. In this way, you can easily get the full commit context for the piece of code.
Again, we will use the JGit repository with the master branch pointing to b14a939:
$ git checkout master && git reset --hard b14a939
We would like to find all the commits that have changed lines containing the "isOutdated" method. Again, we will display just one line per commit; we can then check them individually later:
$ git log -G"isOutdated" --oneline
f32b861 JGit 3.0: move internal classes into an internal subpackage
c9e4a78 Add isOutdated method to DirCache
797ebba Add support for getting the system wide configuration
ad5238d Move FileRepository to storage.file.FileRepository
4c14b76 Make lib.Repository abstract and lib.FileRepository its implementation
c9c57d3 Rename Repository 'config' as 'repoConfig'
5c780b3 Fix unit tests using MockSystemReader with user configuation
cc905e7 Make Repository.getConfig aware of changed config
We can see that eight commits have patches that involve the string "isOutdated".
Git walks through the history (the DAG), checking each commit for the "isOutdated" string in the patch between the parent commit and the current commit. This is quite convenient for finding out when a given string was introduced or deleted, and for getting the full context and commit at that point in time.
The -G option used with git log looks for differences whose patch text contains added or deleted lines that match the given regular expression. However, these lines could also have been added or removed because of some other refactoring/renaming of a variable or method. There is another option that can be used with git log, namely -S, which looks through the differences in a similar way to the -G option, but only matches commits that change the number of occurrences of the specified string. A line that contains the string and is merely rewritten (removed and added again in the same commit) does not change that number, so such a commit is not matched.
Let's see the output of the -S option:
$ git log -S"isOutdated" --oneline
f32b861 JGit 3.0: move internal classes into an internal subpackage
c9e4a78 Add isOutdated method to DirCache
797ebba Add support for getting the system wide configuration
ad5238d Move FileRepository to storage.file.FileRepository
4c14b76 Make lib.Repository abstract and lib.FileRepository its implementation
5c780b3 Fix unit tests using MockSystemReader with user configuation
cc905e7 Make Repository.getConfig aware of changed config
The search matches seven commits, whereas the search with the -G option matches eight. The difference is that the commit with the ID c9c57d3 is only found with the -G option. A closer look at this commit shows that the isOutdated string is only touched because another object is renamed, which is why the -S option filters it out of the list of matching commits. We can see the content of the commit with the git show command, and use grep -C4 to limit the output to the four lines before and after each match of the search string:
$ git show c9c57d3 | grep -C4 "isOutdated"
@@ -417,14 +417,14 @@ public FileBasedConfig getConfig() {
throw new RuntimeException(e);
}
}
- if (config.isOutdated()) {
+ if (repoConfig.isOutdated()) {
try {
- loadConfig();
+ loadRepoConfig();
} catch (IOException e) {