Often, the development team knows where the bottlenecks in the source tree are, but it can be challenging to convince management that you need resources to rewrite some code. With Git, however, it is fairly simple to extract supporting data from the repository.
Start by checking out the stable-3.1 release:
$ git checkout stable-3.1
Branch stable-3.1 set up to track remote branch stable-3.1 from origin.
Switched to a new branch 'stable-3.1'
We want to start by listing some stats for one commit, and then we can extend the examples to larger chunks of commits:
The first option is --dirstat for git log:
$ git log -1 --dirstat
commit da6e87bc373c54c1cda8ed563f41f65df52bacbf
Author: Matthias Sohn <[email protected]>
Date:   Thu Oct 3 17:22:08 2013 +0200

    Prepare post 3.1.0 builds

    Change-Id: I306a3d40c6ddb88a16d17f09a60e3d19b0716962
    Signed-off-by: Matthias Sohn <[email protected]>

   5.0% org.eclipse.jgit.http.server/META-INF/
   6.9% org.eclipse.jgit.http.test/META-INF/
   3.3% org.eclipse.jgit.java7.test/META-INF/
   4.3% org.eclipse.jgit.junit.http/META-INF/
   6.6% org.eclipse.jgit.junit/META-INF/
   5.5% org.eclipse.jgit.packaging/
   5.9% org.eclipse.jgit.pgm.test/META-INF/
  13.7% org.eclipse.jgit.pgm/META-INF/
  15.4% org.eclipse.jgit.test/META-INF/
   3.7% org.eclipse.jgit.ui/META-INF/
  13.1% org.eclipse.jgit/META-INF/
The --dirstat option shows which directories have changed in the commit and how much they have changed relative to each other. The default mode, changes, counts the lines removed from the source or added to the destination and ignores pure code movement within a file, so rearranging code counts for less than other changes. You can compensate for this by using --dirstat=lines, which performs the regular line-based diff analysis and sums the added and removed line counts:
$ git log -1 --dirstat=lines
commit da6e87bc373c54c1cda8ed563f41f65df52bacbf
Author: Matthias Sohn <[email protected]>
Date:   Thu Oct 3 17:22:08 2013 +0200

    Prepare post 3.1.0 builds

    Change-Id: I306a3d40c6ddb88a16d17f09a60e3d19b0716962
    Signed-off-by: Matthias Sohn <[email protected]>

   4.8% org.eclipse.jgit.http.server/META-INF/
   6.5% org.eclipse.jgit.http.test/META-INF/
   3.2% org.eclipse.jgit.java7.test/META-INF/
   4.0% org.eclipse.jgit.junit.http/META-INF/
   6.1% org.eclipse.jgit.junit/META-INF/
   6.9% org.eclipse.jgit.packaging/
   5.7% org.eclipse.jgit.pgm.test/META-INF/
  13.0% org.eclipse.jgit.pgm/META-INF/
  14.6% org.eclipse.jgit.test/META-INF/
   3.6% org.eclipse.jgit.ui/META-INF/
  13.8% org.eclipse.jgit/META-INF/
To limit the output, add a minimum percentage:
$ git log -1 --dirstat=lines,10
commit da6e87bc373c54c1cda8ed563f41f65df52bacbf
Author: Matthias Sohn <[email protected]>
Date:   Thu Oct 3 17:22:08 2013 +0200

    Prepare post 3.1.0 builds

    Change-Id: I306a3d40c6ddb88a16d17f09a60e3d19b0716962
    Signed-off-by: Matthias Sohn <[email protected]>

  13.0% org.eclipse.jgit.pgm/META-INF/
  14.6% org.eclipse.jgit.test/META-INF/
  13.8% org.eclipse.jgit/META-INF/
By adding 10 to the --dirstat=lines command, we are asking Git to show only the directories that account for 10 percent or more of the changes; you can use any number you like here. By default, Git counts only the files directly in a directory, not the changes in its subdirectories. Suppose Dir A1 contains File A1 and a subdirectory, Dir A2, which contains File B1. Only changes in File A1 are counted as changes for the Dir A1 directory; a change in the File B1 file is counted as a change in Dir A2.
To cumulate this, we can add cumulative to the --dirstat=lines,10 command, so that changes in a subdirectory are also counted for its parent directories. Be aware that the sum of the reported percentages can exceed 100 when cumulative is used. The following example also switches from lines to files, which counts the number of changed files instead of the number of changed lines:
$ git log -1 --dirstat=files,10,cumulative
commit da6e87bc373c54c1cda8ed563f41f65df52bacbf
Author: Matthias Sohn <[email protected]>
Date:   Thu Oct 3 17:22:08 2013 +0200

    Prepare post 3.1.0 builds

    Change-Id: I306a3d40c6ddb88a16d17f09a60e3d19b0716962
    Signed-off-by: Matthias Sohn <[email protected]>

  31.3% org.eclipse.jgit.packaging/
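The dirstat modes are easier to judge side by side. The following is a minimal sketch, not part of the original recipe: dirstat_opt and compare_dirstat_modes are hypothetical helper names, and the fallback values (the changes mode and a limit of 3 percent) are assumed to mirror Git's documented defaults.

```shell
#!/bin/sh
# Build a --dirstat option from a mode (changes, lines, or files),
# a minimum percentage, and an optional "cumulative" flag.
# (Hypothetical helper; the defaults are assumed to match Git's own.)
dirstat_opt() {
  mode=${1:-changes}
  limit=${2:-3}
  opt="--dirstat=${mode},${limit}"
  if [ "${3:-}" = "cumulative" ]; then
    opt="${opt},cumulative"
  fi
  printf '%s\n' "$opt"
}

# Print the dirstat of one commit (default: HEAD) in all three modes.
compare_dirstat_modes() {
  commit=${1:-HEAD}
  limit=${2:-3}
  for mode in changes lines files; do
    printf '== %s ==\n' "$mode"
    # --format= suppresses the commit header so only dirstat lines remain.
    git log -1 --format= "$(dirstat_opt "$mode" "$limit")" "$commit"
  done
}
```

For example, running compare_dirstat_modes HEAD 10 inside the repository prints, for each mode, the directories that account for at least 10 percent of the current commit.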
With git log --dirstat, you can get some information about what goes on in the repository. You can also do this for all the commits between two releases or two commit hashes. Let's try this, but instead of git log we will use git diff, because git diff shows the accumulated diff between the two releases, whereas git log shows the dirstat for each commit between the releases:
$ git diff origin/stable-3.1..origin/stable-3.2 --dirstat
   4.0% org.eclipse.jgit.packaging/org.eclipse.jgit.target/
   3.9% org.eclipse.jgit.pgm.test/tst/org/eclipse/jgit/pgm/
   4.1% org.eclipse.jgit.pgm/
  20.7% org.eclipse.jgit.test/tst/org/eclipse/jgit/api/
  21.3% org.eclipse.jgit.test/tst/org/eclipse/jgit/internal/storage/file/
   5.2% org.eclipse.jgit.test/tst/org/eclipse/jgit/
  14.5% org.eclipse.jgit/src/org/eclipse/jgit/api/
   6.5% org.eclipse.jgit/src/org/eclipse/jgit/lib/
   3.9% org.eclipse.jgit/src/org/eclipse/jgit/transport/
   4.6% org.eclipse.jgit/src/org/eclipse/jgit/
By comparing the origin/stable-3.1 and origin/stable-3.2 branches, we can see which directories have the highest percentage of changes. We can then dig a little deeper into one directory using --stat or --numstat, again with git diff. The --numstat option prints the number of added lines, the number of deleted lines, and the path for each file. We will also use --relative="org.eclipse.jgit.test/tst/org/eclipse/jgit/internal/", which restricts the output to that directory and shows the file paths relative to it. This looks better in the console. Feel free to try the command without this option to see the difference:
$ git diff --pretty origin/stable-3.1..origin/stable-3.2 --numstat --relative="org.eclipse.jgit.test/tst/org/eclipse/jgit/internal/" org.eclipse.jgit.test/tst/org/eclipse/jgit/internal/
4       2       storage/file/FileRepositoryBuilderTest.java
8       1       storage/file/FileSnapshotTest.java
0       741     storage/file/GCTest.java
162     0       storage/file/GcBasicPackingTest.java
119     0       storage/file/GcBranchPrunedTest.java
119     0       storage/file/GcConcurrentTest.java
85      0       storage/file/GcDirCacheSavesObjectsTest.jav
104     0       storage/file/GcKeepFilesTest.java
180     0       storage/file/GcPackRefsTest.java
120     0       storage/file/GcPruneNonReferencedTest.java
146     0       storage/file/GcReflogTest.java
78      0       storage/file/GcTagTest.java
113     0       storage/file/GcTestCase.java
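A long --numstat listing is easier to present once it is totalled. The following is a small sketch that assumes its input is the tab-separated output of git diff --numstat; numstat_totals is a hypothetical helper name, and binary files, which Git reports with - in both count columns, are skipped.

```shell
#!/bin/sh
# Sum the added/removed columns of `git diff --numstat` output read
# from stdin. Lines whose counts are not numeric (binary files are
# shown as "-<TAB>-<TAB>path") are ignored.
numstat_totals() {
  awk -F '\t' '
    $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ {
      added += $1
      removed += $2
      files++
    }
    END {
      printf "%d files, %d added, %d removed\n", files, added, removed
    }
  '
}
```

To use it, pipe the output of git diff --numstat origin/stable-3.1..origin/stable-3.2 into numstat_totals.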
We have used git log, git diff, and git shortlog to find information about the repository, and these commands have many more options that can help locate bottlenecks in the source code.
If we want to find the files with the most commits, which are not necessarily the files with the most line additions or deletions, we can use git log:
We can run git log between the origin/stable-3.1 and origin/stable-3.2 branches and list all the files changed in each commit. Then, we just need to sort and accumulate the result with some standard shell tools:
$ git log origin/stable-3.1..origin/stable-3.2 --format=format: --name-only
org.eclipse.jgit.ant.test/META-INF/MANIFEST.MF
org.eclipse.jgit.ant.test/pom.xml
The --format=format: option tells Git not to display any commit-message-related information, and --name-only tells Git to list the files changed in each commit. Now all we have to do is count them:
$ git log origin/stable-3.1..origin/stable-3.2 --format=format: --name-only | sed '/^$/d' | sort | uniq -c | sort -r | head -10
     12 se.jgit/src/org/eclipse/jgit/api/RebaseCommand.java
     12 est/tst/org/eclipse/jgit/api/RebaseCommandTest.java
      9 org.eclipse.jgit/META-INF/MANIFEST.MF
      7 org.eclipse.jgit.pgm.test/META-INF/MANIFEST.MF
      7 org.eclipse.jgit.packaging/pom.xml
      6 pom.xml
      6 pse.jgit/src/org/eclipse/jgit/api/RebaseResult.java
      6 org.eclipse.jgit.test/META-INF/MANIFEST.MF
      6 org/eclipse/jgit/pgm/internal/CLIText.properties
      6 org.eclipse.jgit.pgm/META-INF/MANIFEST.MF
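The counting step can also be wrapped in a small function so that it is easy to reuse for other revision ranges. This is a sketch rather than the recipe's own command: top_changed_files is a hypothetical name, and it uses sort -rn (numeric comparison) and head -n in place of sort -r and head -10, which should give the same ordering for this kind of output.

```shell
#!/bin/sh
# Read file names (one per line, blank lines allowed) from stdin and
# print the N most frequent names, each prefixed with its count.
top_changed_files() {
  n=${1:-10}
  sed '/^$/d' |   # drop the empty separator lines between commits
    sort |        # put identical names next to each other
    uniq -c |     # collapse each run into one line with a count
    sort -rn |    # highest count first, compared numerically
    head -n "$n"  # keep only the top N entries
}
```

For example, pipe the output of git log origin/stable-3.1..origin/stable-3.2 --format=format: --name-only into top_changed_files 10.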
In this pipeline, we used sed '/^$/d' to remove empty lines from the output. After this, we used sort to sort the list of files so that identical names end up next to each other. Then, we used uniq -c, which collapses each run of identical lines into a single line prefixed with the number of occurrences. Finally, we sorted in reverse order using sort -r and displayed only the top ten results using head -10.
To proceed from here, we should list all the commits between the branches that change the top file:
$ git log origin/stable-3.1..origin/stable-3.2 org.eclipse.jgit/src/org/eclipse/jgit/api/RebaseCommand.java
commit e90438c0e867bd105334b75df3a6d640ef8dab01
Author: Stefan Lay <[email protected]>
Date:   Tue Dec 10 15:54:48 2013 +0100

    Fix aborting rebase with detached head

    Bug: 423670
    Change-Id: Ia6052867f85d4974c4f60ee5a6c820501e8d2427

commit f86a488e32906593903acb31a93a82bed8d87915
By adding the file path to the git log command, we see only the commits between the two branches that touch that file. Now all we have to do is grep for the commits that reference a bug, so we can tell our manager the number of bugs we fixed in this file.
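The grep step could look like the following sketch. It assumes that bug references appear as a Bug: line in the commit message, as in the sample output above; count_bug_lines is a hypothetical helper name. Note that it counts matching lines, so a commit with several Bug: lines is counted more than once.

```shell
#!/bin/sh
# Count "Bug:" lines in `git log` output read from stdin.
# git log indents commit message lines, so leading whitespace is allowed.
count_bug_lines() {
  # grep -c prints 0 and exits non-zero when nothing matches;
  # "|| true" keeps the function's exit status at zero in that case.
  grep -c '^[[:space:]]*Bug:' || true
}
```

For example, pipe the output of git log origin/stable-3.1..origin/stable-3.2 org.eclipse.jgit/src/org/eclipse/jgit/api/RebaseCommand.java into count_bug_lines.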