We expand the domain of software defect research into Trace Analysis Patterns. In addition to Citrix CDF / Microsoft ETW traces we cover other variants based on extensive software engineering background in the past where we used tracing in software products ranging from soft multi-platform real-time systems to static code analysis tools. Connection with memory dump analysis is covered too because, sometimes, the combination of static and dynamic data leads to interesting observations and helps to troubleshoot and resolve customer problems especially when not all data can be collected dynamically.
In fact, stack traces (Volume 1, page 395) and their collections (Volume 1, page 409) are specializations of the more general traces. Another example is historical information (Volume 1, page 457) in memory dump files especially when it is somehow time stamped.
This is the obvious and to some extent the trivial pattern. This is an error or a status value that is observed periodically many times:
No PID TID Date Time Message [...] 664957 1788 22504 4/23/2009 17:59:14.600 MyClass::Initialize:Cannot open connection
"Client ID: 310", status=5
[...] 668834 1788 19868 4/23/2009 19:11:52.979 MyClass::Initialize:Cannot open connection
"Client ID: 612", status=5
[...]or
No PID TID Date Time Message [...] 202314 1788 19128 4/21/2009 16:03:46.861 HandleDataLevel:Error 12005 Getting Mask
[...] 347653 1788 17812 4/22/2009 13:26:00.735 HandleDataLevel :Error 12005 Getting Mask
[...]
Here single trace entries can be isolated from the trace and studied in detail. We should be aware though that some modules might report periodic errors that are false positive, in the sense, that they are expected as a part of implementation details, for example, when a function returns an error to indicate that the bigger buffer is required or to estimate its size for a subsequent call.
A typical trace is a detailed narrative. It is accompanied by a problem description that lists essential facts. Therefore the first task of any trace analysis is to check the presence of Basic Facts in the trace. If they are not visible or do not correspond then the trace was possibly not recorded during the problem occurrence or was taken from a different computer or under different conditions. Here is an example. A user "test01″ can-not connect to a published application in a terminal services environment. We look at the trace and find this statement:
No PID TID Date Time Statement [...] 3903 3648 5436 4/29/2009 16:17:36.150 User Name: test01 [...]
At least we can be sure that this trace was taken for the user test01 especially when we expect this or similar trace statements. If we don't see this trace statement we may suppose that the trace was taken at the wrong time, for example, when the problem had already happened.
This is an obvious structural trace analysis pattern. Sometimes information about circularity is missing in the problem description or the trace metadata doesn't reflect it. Then circular traces can be detected by trace file size (usually large) and from time-stamps, like this 100Mb CDF trace snippet:
No Module PID TID Date Time Statement [Begin of trace listing] 1 ModuleA 4280 1736 5/28/2009 08:53:50.496 [... Trace statement 1] 2 ModuleB 6212 6216 5/28/2009 08:53:52.876 [... Trace statement 2] 3 ModuleA 4280 4776 5/28/2009 08:54:13.537 [... Trace statement 3] [... Some traced exceptions helpful for analysis ...] 3799 ModuleA 4280 37765/28/2009 09:15:00.853
[... Trace statement 3799] 3800 ModuleA 4280 1736[... Trace statement 3800] [... Skipped ...] [... Skipped ...] [... Skipped ...] 579210 ModuleA 4280 4776 5/28/2009 08:53:35.989 [... Trace statement 579210] [End of trace listing]
5/27/2009
09:42:12.029
In such traces the analysis region usually can be found at the beginning at the trace because as soon as elusive and hard to reproduce problem happens the trace is stopped.
Sometimes we see a functional activity and basic facts (page 345) in a trace. Then we might want to find a correlation between that activity and facts in another part of the trace. If that intra-correlation fits into our problem description we may claim a possible explanation or, if we are lucky, we have just found, an inference to the best explanation, as philosophers of science like to say. Here is an example, but this time using Citrix WindowHistory tracing tool[33]. A third-party application was frequently loosing the focus and the suspicion was on a terminal services client process. It was found that the following WindowHistory trace fragment corresponds to that application:
Handle: 00050586 Class: "Application A Class" Title: "" Title changed at 15:52:4:3 to "Application A" Title changed at 15:52:10:212 to "Application A - File1" [...] Process ID: 89c Thread ID: d6c [...] Visible: true Window placement command: SW_SHOWNORMAL Placement changed at 15:54:57:506 to SW_SHOWMINIMIZED Placement changed at 15:55:2:139 to SW_SHOWNORMAL Foreground: false Foreground changed at 15:52:4:3 to true Foreground changed at 15:53:4:625 to false Foreground changed at 15:53:42:564 to true Foreground changed at 15:53:44:498 to false Foreground changed at 15:53:44:498 to true Foreground changed at 15:53:44:592 to false Foreground changed at 15:53:45:887 to true Foreground changed at 15:53:47:244 to false Foreground changed at 15:53:47:244 to true Foreground changed at 15:53:47:353 to false Foreground changed at 15:54:26:416 to true Foreground changed at 15:54:27:55 to false Foreground changed at 15:54:27:55 to true Foreground changed at 15:54:27:180 to false Foreground changed at 15:54:28:428 to true Foreground changed at 15:54:28:771 to false Foreground changed at 15:54:28:865 to true Foreground changed at 15:54:29:99 to false Foreground changed at 15:54:30:877 to true Foreground changed at 15:54:57:521 to false Foreground changed at 15:55:2:76 to true Foreground changed at 15:57:3:378 to false
Foreground changed at 15:57:11:396 to true Foreground changed at 15:57:29:601 to false Foreground changed at 15:57:39:803 to true Foreground changed at 15:58:54:41 to false Foreground changed at 15:59:8:96 to true Foreground changed at 16:1:19:478 to false Foreground changed at 16:1:27:527 to true Foreground changed at 16:1:39:914 to false Foreground changed at 16:2:0:515 to true Foreground changed at 16:7:14:628 to false Foreground changed at 16:7:24:246 to true Foreground changed at 16:9:53:523 to false Foreground changed at 16:10:15:919 to true Foreground changed at 16:10:31:426 to false Foreground changed at 16:11:12:818 to true Foreground changed at 16:11:59:538 to false Foreground changed at 16:12:39:456 to true Foreground changed at 16:13:6:364 to false
Corresponding terminal services client window trace fragment doesn't show any foreground changes but another application main window has lots of them:
Handle: 000D0540 Class: "Application B Class" Title: "Application B" [...] Process ID: 3ac Thread ID: bd4 [...] Foreground: false Foreground changed at 15:50:36:972 to true Foreground changed at 15:50:53:732 to false Foreground changed at 15:50:53:732 to true Foreground changed at 15:50:53:826 to false Foreground changed at 15:51:51:352 to true Foreground changed at 15:51:53:941 to false Foreground changed at 15:53:8:135 to true Foreground changed at 15:53:8:182 to false Foreground changed at 15:53:10:178 to true Foreground changed at 15:53:13:938 to false Foreground changed at 15:53:30:443 to true Foreground changed at 15:53:31:20 to false Foreground changed at 15:53:31:20 to true Foreground changed at 15:53:31:129 to false Foreground changed at 15:53:34:78 to true Foreground changed at 15:53:34:795 to false Foreground changed at 15:53:34:795 to true Foreground changed at 15:53:34:873 to false Foreground changed at 15:53:36:901 to true Foreground changed at 15:53:42:502 to false Foreground changed at 15:53:42:502 to true Foreground changed at 15:53:42:564 to false Foreground changed at 15:57:3:425 to true Foreground changed at 15:57:4:595 to false Foreground changed at 15:57:10:507 to true
Foreground changed at 15:57:11:318 to false Foreground changed at 15:57:29:632 to true Foreground changed at 15:57:31:67 to false Foreground changed at 15:57:32:721 to true Foreground changed at 15:57:33:844 to false Foreground changed at 15:58:54:88 to true Foreground changed at 15:58:56:178 to false Foreground changed at 15:59:6:505 to true Foreground changed at 15:59:7:987 to false Foreground changed at 16:1:19:525 to true Foreground changed at 16:1:19:961 to false Foreground changed at 16:1:26:607 to true Foreground changed at 16:1:27:434 to false Foreground changed at 16:1:39:914 to true Foreground changed at 16:1:39:992 to false Foreground changed at 16:1:49:798 to true Foreground changed at 16:2:0:437 to false Foreground changed at 16:7:14:628 to true Foreground changed at 16:7:14:847 to false Foreground changed at 16:7:18:76 to true Foreground changed at 16:7:24:106 to false Foreground changed at 16:9:58:790 to true Foreground changed at 16:10:4:16 to false Foreground changed at 16:10:4:874 to true Foreground changed at 16:10:4:890 to false Foreground changed at 16:10:8:634 to true Foreground changed at 16:10:15:779 to false Foreground changed at 16:10:56:766 to true Foreground changed at 16:10:59:402 to false Foreground changed at 16:10:59:652 to true Foreground changed at 16:10:59:667 to false Foreground changed at 16:12:9:397 to true Foreground changed at 16:12:39:347 to false Foreground changed at 16:13:18:375 to true Foreground changed at 16:14:33:656 to false
We can see that most of the time when Application A window looses focus, Application B window gets it.