Finding bottlenecks in the source tree

Development teams often know where the bottlenecks in the source tree are, but it can be hard to convince management that resources are needed to rewrite that code. With Git, however, it is fairly simple to extract data from the repository that supports such a case.

Getting ready

Start by checking out the stable-3.1 release:

$ git checkout stable-3.1
Branch stable-3.1 set up to track remote branch stable-3.1 from origin.
Switched to a new branch 'stable-3.1'

How to do it...

We will start by listing some statistics for a single commit and then extend the examples to larger ranges of commits:

  1. The first option we will be using is --dirstat for git log:
    $ git log -1 --dirstat
    commit da6e87bc373c54c1cda8ed563f41f65df52bacbf
    Author: Matthias Sohn <[email protected]>
    Date:   Thu Oct 3 17:22:08 2013 +0200
    
        Prepare post 3.1.0 builds
    
        Change-Id: I306a3d40c6ddb88a16d17f09a60e3d19b0716962
        Signed-off-by: Matthias Sohn <[email protected]>
    
       5.0% org.eclipse.jgit.http.server/META-INF/
       6.9% org.eclipse.jgit.http.test/META-INF/
       3.3% org.eclipse.jgit.java7.test/META-INF/
       4.3% org.eclipse.jgit.junit.http/META-INF/
       6.6% org.eclipse.jgit.junit/META-INF/
       5.5% org.eclipse.jgit.packaging/
       5.9% org.eclipse.jgit.pgm.test/META-INF/
      13.7% org.eclipse.jgit.pgm/META-INF/
      15.4% org.eclipse.jgit.test/META-INF/
       3.7% org.eclipse.jgit.ui/META-INF/
      13.1% org.eclipse.jgit/META-INF/
    
  2. The --dirstat option shows which directories changed in the commit and how large a share of the total changes each directory accounts for. By default, Git counts the lines removed from the source or added to the destination and ignores pure code movement within a file, so rearranging code may count as little or no change. You can compensate for this by using --dirstat=lines, which runs a regular line-based diff on each file and sums the added and removed lines, so rearranged lines count as much as any other change:
    $ git log -1 --dirstat=lines
    commit da6e87bc373c54c1cda8ed563f41f65df52bacbf
    Author: Matthias Sohn <[email protected]>
    Date:   Thu Oct 3 17:22:08 2013 +0200
    
        Prepare post 3.1.0 builds
    
        Change-Id: I306a3d40c6ddb88a16d17f09a60e3d19b0716962
        Signed-off-by: Matthias Sohn <[email protected]>
    
       4.8% org.eclipse.jgit.http.server/META-INF/
       6.5% org.eclipse.jgit.http.test/META-INF/
       3.2% org.eclipse.jgit.java7.test/META-INF/
       4.0% org.eclipse.jgit.junit.http/META-INF/
       6.1% org.eclipse.jgit.junit/META-INF/
       6.9% org.eclipse.jgit.packaging/
       5.7% org.eclipse.jgit.pgm.test/META-INF/
      13.0% org.eclipse.jgit.pgm/META-INF/
      14.6% org.eclipse.jgit.test/META-INF/
       3.6% org.eclipse.jgit.ui/META-INF/
      13.8% org.eclipse.jgit/META-INF/
    
  3. As you can see, this gives a slightly different result. To show only the directories that account for at least a certain percentage of the changes, add a limit to the option:
    $ git log -1 --dirstat=lines,10
    commit da6e87bc373c54c1cda8ed563f41f65df52bacbf
    Author: Matthias Sohn <[email protected]>
    Date:   Thu Oct 3 17:22:08 2013 +0200
    
        Prepare post 3.1.0 builds
    
        Change-Id: I306a3d40c6ddb88a16d17f09a60e3d19b0716962
        Signed-off-by: Matthias Sohn <[email protected]>
    
      13.0% org.eclipse.jgit.pgm/META-INF/
      14.6% org.eclipse.jgit.test/META-INF/
      13.8% org.eclipse.jgit/META-INF/
    
  4. By adding 10 to the --dirstat=lines option, we ask Git to show only the directories that account for 10 percent or more of the changes; you can use any number you like here. By default, Git does not count the changes in a subdirectory as changes in its parent directory; it only counts the files directly in that directory. So, in this diagram, only changes in File A1 are counted as changes. For the Dir A1 directory and the File B1 file, the changes are counted as changes in Dir A2:
    [Diagram: example directory tree with Dir A1, Dir A2, File A1, and File B1]
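  5. To accumulate the changes in subdirectories into their parent directories, add the cumulative parameter. In the following command, we also use files instead of lines, so Git counts the number of changed files rather than changed lines. Be aware that the sum of the percentages can exceed 100 percent, because the changes in a child directory are counted for the parent directory as well: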
    $ git log -1 --dirstat=files,10,cumulative
    commit da6e87bc373c54c1cda8ed563f41f65df52bacbf
    Author: Matthias Sohn <[email protected]>
    Date:   Thu Oct 3 17:22:08 2013 +0200
    
        Prepare post 3.1.0 builds
    
        Change-Id: I306a3d40c6ddb88a16d17f09a60e3d19b0716962
        Signed-off-by: Matthias Sohn <[email protected]>
    
      31.3% org.eclipse.jgit.packaging/
    
  6. As you can see, the output differs from what we saw earlier. git log --dirstat gives you some insight into what goes on in the repository, and you can of course do the same for all the commits between two releases or two commit hashes. Let's try this, but with git diff instead of git log: git diff shows the dirstat of the accumulated diff between the two releases, whereas git log shows the dirstat for each commit between them:
    $ git diff  origin/stable-3.1..origin/stable-3.2 --dirstat
       4.0% org.eclipse.jgit.packaging/org.eclipse.jgit.target/
       3.9% org.eclipse.jgit.pgm.test/tst/org/eclipse/jgit/pgm/
       4.1% org.eclipse.jgit.pgm/
      20.7% org.eclipse.jgit.test/tst/org/eclipse/jgit/api/
      21.3% org.eclipse.jgit.test/tst/org/eclipse/jgit/internal/storage/file/
       5.2% org.eclipse.jgit.test/tst/org/eclipse/jgit/
      14.5% org.eclipse.jgit/src/org/eclipse/jgit/api/
       6.5% org.eclipse.jgit/src/org/eclipse/jgit/lib/
       3.9% org.eclipse.jgit/src/org/eclipse/jgit/transport/
       4.6% org.eclipse.jgit/src/org/eclipse/jgit/
    
  7. So, for the range between the origin/stable-3.1 and origin/stable-3.2 branches, we can see which directories have the highest percentage of changes. We can then dig deeper into one of these directories by running git diff again, this time with --stat or --numstat and the directory as a path argument. We also use --relative="org.eclipse.jgit.test/tst/org/eclipse/jgit/internal/", which shows the paths of the files relative to that directory and makes the output easier to read in the console. Feel free to try the command without this option:
    $ git diff --pretty origin/stable-3.1..origin/stable-3.2 --numstat \
        --relative="org.eclipse.jgit.test/tst/org/eclipse/jgit/internal/" \
        org.eclipse.jgit.test/tst/org/eclipse/jgit/internal/
    4       2       storage/file/FileRepositoryBuilderTest.java
    8       1       storage/file/FileSnapshotTest.java
    0       741     storage/file/GCTest.java
    162     0       storage/file/GcBasicPackingTest.java
    119     0       storage/file/GcBranchPrunedTest.java
    119     0       storage/file/GcConcurrentTest.java
    85      0       storage/file/GcDirCacheSavesObjectsTest.jav
    104     0       storage/file/GcKeepFilesTest.java
    180     0       storage/file/GcPackRefsTest.java
    120     0       storage/file/GcPruneNonReferencedTest.java
    146     0       storage/file/GcReflogTest.java
    78      0       storage/file/GcTagTest.java
    113     0       storage/file/GcTestCase.java
    
  8. In the output, the first number is the number of lines added to a file and the second is the number of lines removed from it between the two branches.
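
The difference between the default (noncumulative) and the cumulative behavior is easiest to see on a tiny repository. The following sketch is not taken from the JGit history: it creates a throwaway repository in a temporary directory, with a hypothetical dirA/ directory that contains one file directly and one file in the subdirectory dirA/sub/. It changes both files in one commit and then runs git diff --dirstat with the files parameter, a limit of 0 so that every directory is reported, and first noncumulative and then cumulative.

```shell
#!/bin/sh
# Sketch: compare noncumulative and cumulative --dirstat output on a
# throwaway repository. dirA/, dirA/sub/ and the file names are hypothetical.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

# dirA/ holds one file directly and one file in the subdirectory dirA/sub/
mkdir -p dirA/sub
printf 'one\n' > dirA/fileA.txt
printf 'one\n' > dirA/sub/fileB.txt
git add dirA
git commit -q -m "Initial import"

# Change both files in a single commit
printf 'two\n' > dirA/fileA.txt
printf 'two\n' > dirA/sub/fileB.txt
git commit -q -a -m "Change both files"

# files: every changed file counts equally
# 0: no cut-off, so every directory is reported
noncum=$(git diff --dirstat=files,0,noncumulative HEAD~1 HEAD)
cum=$(git diff --dirstat=files,0,cumulative HEAD~1 HEAD)

printf 'noncumulative:\n%s\n' "$noncum"
printf 'cumulative:\n%s\n' "$cum"
```

Without cumulative, dirA/sub/ and dirA/ should each be reported at 50.0%, because the change in dirA/sub/fileB.txt is counted only for the subdirectory. With cumulative, dirA/ should rise to 100.0%, so the reported percentages add up to 150%, which is why the sum can exceed 100 percent.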

There's more...

We have used git log, git diff, and git shortlog to find information about the repository, and these commands have many more options that can help you find bottlenecks in the source code.

If we want to find the files that are changed by the most commits, which are not necessarily the files with the most added or deleted lines, we can use git log:

  1. We can run git log on the range between the origin/stable-3.1 and origin/stable-3.2 branches and list all the files changed in each commit. Then we just need to sort and count the results with a few shell tools:
    $ git log origin/stable-3.1..origin/stable-3.2 --format=format: --name-only
    
    org.eclipse.jgit.ant.test/META-INF/MANIFEST.MF
    org.eclipse.jgit.ant.test/pom.xml
    
  2. First, we run the command without any of the shell tools. The output is long and contains nothing but file names, with empty lines between the commits. This is due to the options used: --format=format: tells Git not to display any information about the commit itself, and --name-only tells Git to list the files changed in each commit. Now all we have to do is count them:
    $ git log origin/stable-3.1..origin/stable-3.2 --format=format: --name-only | sed '/^$/d'  | sort | uniq -c | sort -r | head -10
         12 se.jgit/src/org/eclipse/jgit/api/RebaseCommand.java
         12 est/tst/org/eclipse/jgit/api/RebaseCommandTest.java
          9 org.eclipse.jgit/META-INF/MANIFEST.MF
          7 org.eclipse.jgit.pgm.test/META-INF/MANIFEST.MF
          7 org.eclipse.jgit.packaging/pom.xml
          6 pom.xml
          6 pse.jgit/src/org/eclipse/jgit/api/RebaseResult.java
          6 org.eclipse.jgit.test/META-INF/MANIFEST.MF
          6 org/eclipse/jgit/pgm/internal/CLIText.properties
          6 org.eclipse.jgit.pgm/META-INF/MANIFEST.MF
    
  3. Now we have a list of the ten most frequently changed files between the two releases. Before we proceed, let's go through what we did. We got the list of files and used sed '/^$/d' to remove the empty lines from the output. Then we used sort to sort the file names, so that identical names end up next to each other, and uniq -c to collapse each run of identical lines into a single line prefixed with the number of occurrences. Finally, we sorted the result in reverse order using sort -r and displayed only the top ten results using head -10. To proceed from here, we should list all the commits between the branches that change the top file:
    $ git log origin/stable-3.1..origin/stable-3.2 \
        org.eclipse.jgit/src/org/eclipse/jgit/api/RebaseCommand.java
    commit e90438c0e867bd105334b75df3a6d640ef8dab01
    Author: Stefan Lay <[email protected]>
    Date:   Tue Dec 10 15:54:48 2013 +0100
    
        Fix aborting rebase with detached head
    
        Bug: 423670
        Change-Id: Ia6052867f85d4974c4f60ee5a6c820501e8d2427
    
    commit f86a488e32906593903acb31a93a82bed8d87915
    
  4. Because we added the file path to the end of the git log command, we see only the commits between the two branches that changed this file. Now all we have to do is grep for the commits that reference a bug, so that we can tell our manager how many bugs we fixed in this file.
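
The whole workflow, ranking the files by the number of commits that touched them and then counting the bug-fix commits for the top file, can be tried out on a throwaway repository. In the following sketch, the file names Main.java and Util.java, the commit messages, and the Bug: lines are hypothetical; we assume, as the JGit commit shown above suggests, that bug-fix commits carry a line starting with Bug: in the commit message. We use sort -rn so that the counts are compared numerically, and git rev-list --count to count the matching commits. The awk step assumes that the file names contain no spaces.

```shell
#!/bin/sh
# Sketch: rank files by the number of commits that touched them, then count
# the bug-fix commits for the top file. All names and messages are hypothetical.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

printf 'v1\n' > Main.java
printf 'v1\n' > Util.java
git add Main.java Util.java
git commit -q -m "Initial import"
base=$(git rev-parse HEAD)   # start of the range, like origin/stable-3.1

printf 'v2\n' > Main.java
git commit -q -a -m "Fix crash on empty input" -m "Bug: 101"

printf 'v3\n' > Main.java
printf 'v2\n' > Util.java
git commit -q -a -m "Fix off-by-one error" -m "Bug: 102"

printf 'v4\n' > Main.java
git commit -q -a -m "Refactor Main"

# Same pipeline as in the text, but with a numeric reverse sort (-rn)
ranking=$(git log "$base"..HEAD --format=format: --name-only \
  | sed '/^$/d' | sort | uniq -c | sort -rn | head -10)
printf '%s\n' "$ranking"

# File name of the first (most frequently changed) entry
top=$(printf '%s\n' "$ranking" | awk 'NR == 1 { print $2 }')

# Commits in the range that touch $top and have a line starting with "Bug:"
bugs=$(git rev-list --count --grep='^Bug:' "$base"..HEAD -- "$top")
echo "$top: $bugs bug-fix commits"
```

In this repository, Main.java is changed by all three commits after the initial import and Util.java by only one, so the script should report Main.java with 2 bug-fix commits. On the JGit repository, you would use origin/stable-3.1..origin/stable-3.2 as the range instead of "$base"..HEAD.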