A

 

Abnormal Value

While preparing a presentation on malware narratives4, we found that one essential pattern is missing from the current log analysis pattern catalog. Most of the time, we see some abnormal or unexpected value in a software trace or log, such as a network address outside the expected range, and this triggers further investigation. The message structure may have the same Message Invariant, but the variable part may contain such values as depicted graphically:

Please note that we also have Significant Event pattern that is more general and also covers messages without variable part or just suspicious log entries.

 

Activity Disruption

Sometimes a few Error Messages or Periodic Errors with low Statement Density for specific Activity Regions or Adjoint Threads of Activity (for a specific component, file or function) may constitute Activity Disruption. If the particular functionality was no longer available at the logging time, then its unavailability may not be explained by such disruptions, and such messages may be considered False Positive Errors in relation to the reported problem:

But, if we have Periodic Message Blocks containing only Periodic Errors, Activity Region or Adjoint Thread Discontinuity, or simply No Activity, then we may have the complete cessation of activity that may correlate with the unavailable functionality:

 

Activity Divergence

Sometimes we have several Threads of Activity, for example, from the same process) visible for a certain period and then suddenly we see only one such thread till the end of a trace (or even none). It may be an indication of an application hang or some other abnormal behavior if several active threads doing logging are normal. If we consider such activities (including Adjoint Threads) as vectors running through some temporal “surface,” we can use an analogy of a divergence5:

 

Activity Overlap

Sometimes specific parts of simultaneous Use Case Trails, blocks of Significant Events or Message Sets in general may overlap. It may point to possible synchronization problems such as race conditions (prognostics) or be visible root causes of them if such problems are reported (diagnostics). We call this pattern Activity Overlap:

For example, a first request may start a new session, and we expect the second request processed by the same already established session:

However, users report the second session started upon the second request. If we filter execution log by session id, we find out that session initialization prologs (Trace Partition, page 256) are overlapped. The new session started because the first session initialization was not completed:

 

Activity Packet

Sometimes we are interested in Message Set that has the same time attribute value (or rounded to some digit). We call this analysis pattern Activity Packet by analogy with wave packets6. It may allow the identification of related threads and activities.

It is different from Activity Quantum analysis pattern where time attribute value may change for continuous Message Set with the same thread ID.

 

Activity Quantum

Having chosen a trace message, we are interested in its Message Context which can span all “continuous” messages before and after from the same Thread of Activity. We call it Activity Quantum, which is variable and independent from the so-called CPU quantum. Different messages from Activity Quantum may be executed on different CPUs. The following diagram depicts this analysis pattern:

 

Activity Region

When looking at lengthy traces with thousands and millions of messages (trace statements), we can see regions of activity where the statement current (Jm, msg/s) is much higher than in surrounding temporal regions (Statement Current). Here is an illustration for a typical ETW7 / CDF8 trace where a middle region of activity (Jm2) signifies a system performing some response function as a user session initialization and application launch:

 

Activity Theatre

In addition to Message Patterns, there are higher-level patterns of specific activities and Motives. Such activities may or may not coincide with specific components (modules) because they may be grouped based on implementation messages, software internals semantics and not on architectural and design entities (as in Use Case Trail analysis pattern). Moreover, the same components may “play” different activity roles. Once assigned, Activity Theatre “scripts” can be compared with “scripts” from other traces and logs (Inter-Correlation) or different parts of the same log (Intra-Correlation). This pattern is illustrated in the following diagram:

 

Adjoint Message

By analogy with Adjoint Thread of Activity, we introduce Adjoint Message analysis pattern. Most, if not all, analysis patterns focus on log message text and consider TID, PID, Module, source file, and function as its attributes. However, we can choose one of the attributes and consider it as a message in its own right with the original message text consigned now as another attribute. Then we can analyze the structure of the trace from the perspective of that newly selected message:

Since the number of different message values now is smaller (for example, module names) compared to normal trace messages, we can use them in protein-like encoding and structure analysis schemes (see Software Trace and Logs as Proteins9). We metaphorically name Adjoint Messages as Amino-acid-Messages (A-Messages). We can also compress same message sequences into one message which may be useful for pattern matching (and even use different color intensities to represent message cardinalities):

 

Adjoint Space

Sometimes we need memory reference information not available in software traces and logs, for example, to see the pointer dereferences, to follow pointers and linked structures. In such cases, memory dumps saved during logging sessions may help. In the case of process memory dumps, we can even have several Step Dumps10. We may force complete or kernel memory dumps after saving a log file. We call such pattern Adjoint Space:

Then we can analyze logs and memory dumps together, for example, to follow pointer data further in memory space:

There is also a reverse situation when we use logs to see past data changes before memory snapshot time (Paratext memory analysis pattern11):

 

Adjoint Thread of Activity

This pattern is an extension of Thread of Activity based on the concept of multibraiding (see below). There is also an article published in Debugged! MZ/PE magazine12.

Having considered computational threads as braided strings13 and after discerning software trace analysis patterns we can see formatted and tabulated software trace output in a new light and employ the “fabric of traces” and braid metaphors for Adjoint Thread concept. This new concept was motivated by reading about Extended Phenotype14 and extensive analysis of Citrix ETW15-based CDF16 traces using CDFAnalyzer17. The term Adjoint was borrowed from mathematics18 because the concept we discuss below resembles this metaphorical formula: (Thread A, B) = [A, Thread B]. Let me first illustrate adjoint threading using simplified trace tables. Consider this generalized software trace example (we omitted date and time column for visual clarity):

#

Source Dir

PID

TID

File Name

Function

Message

1

srcsubsystemA

2792

5676

file1.cpp

fooA

Message text…

2

srcsubsystemA

2792

5676

file1.cpp

fooA

Message text…

3

srcsubsystemA

2792

5676

file1.cpp

fooA

Message text…

4

srclib

2792

5680

file2.cpp

barA

Message text…

5

srcsubsystemA

2792

5680

file1.cpp

fooA

Message text…

6

srcsubsystemA

2792

5676

file1.cpp

fooA

Message text…

7

srclib

2792

5680

file2.cpp

fooA

Message text…

8

srclib

2792

5680

file2.cpp

fooA

Message text…

9

srcsubsystemB

2792

3912

file3.cpp

barB

Message text…

10

srcsubsystemB

2792

3912

file3.cpp

barB

Message text…

11

srcsubsystemB

2792

3912

file3.cpp

barB

Message text…

12

srcsubsystemB

2792

3912

file3.cpp

barB

Message text…

13

srcsubsystemB

2792

3912

file3.cpp

barB

Message text…

14

srcsubsystemB

2792

3912

file3.cpp

barB

Message text…

15

srcsubsystemB

2792

2992

file4.cpp

fooB

Message text…

16

srcsubsystemB

2792

3008

file4.cpp

fooB

Message text…

We see several threads in a process PID 2792. In CDFAnalyzer, we can filter trace messages that belong to any column, and if we filter by TID, we get a view of any Thread of Activity. However, each thread can “run” through any source directory, file name or function. If a function belongs to a library, then multiple threads access it. This source location (we can consider it as a subsystem), file or function view of activity is called an Adjoint Thread. For example, if we filter only subsystemA column in the trace above we get this table:

#

Source Dir

PID

TID

File Name

Function

Message

1

srcsubsystemA

2792

5676

file1.cpp

fooA

Message …

2

srcsubsystemA

2792

5676

file1.cpp

fooA

Message …

3

srcsubsystemA

2792

5676

file1.cpp

fooA

Message …

5

srcsubsystemA

2792

5680

file1.cpp

fooA

Message …

6

srcsubsystemA

2792

5676

file1.cpp

fooA

Message …

7005

srcsubsystemA

2792

5664

file1.cpp

fooA

Message …

10198

srcsubsystemA

2792

5664

file1.cpp

fooA

Message …

10364

srcsubsystemA

2792

5664

file1.cpp

fooA

Message …

10417

srcsubsystemA

2792

5664

file1.cpp

fooA

Message …

10420

srcsubsystemA

2792

5676

file1.cpp

fooA

Message …

10422

srcsubsystemA

2792

5680

file1.cpp

fooA

Message …

10587

srcsubsystemA

2792

5664

file1.cpp

fooA

Message …

10767

srcsubsystemA

2792

5680

file1.cpp

fooA

Message …

11126

srcsubsystemA

2792

5668

file1.cpp

fooA

Message …

11131

srcsubsystemA

2792

5680

file1.cpp

fooA

Message …

11398

srcsubsystemA

2792

5676

file1.cpp

fooA

Message …

11501

srcsubsystemA

2792

5668

file1.cpp

fooA

Message …

11507

srcsubsystemA

2792

5668

file1.cpp

fooA

Message …

11509

srcsubsystemA

2792

5664

file1.cpp

fooA

Message …

11513

srcsubsystemA

2792

5680

file1.cpp

fooA

Message …

11524

srcsubsystemA

2792

5668

file1.cpp

fooA

Message …

We can graphically view subsystemA as a braid string that “permeates the fabric of threads”:

We can get many different braids by changing filters, hence multbraiding. Here is another example of a driver source file view initially permeating two process contexts and four threads:

#

Source Dir

PID

TID

File Name

Function

Message

41

srcsysdriver

3636

3848

entry.c

DriverEntry

IOCTL …

80

srcsysdriver

3636

3896

entry.c

DriverEntry

IOCTL …

99

srcsysdriver

3636

3896

entry.c

DriverEntry

IOCTL …

102

srcsysdriver

3636

3896

entry.c

DriverEntry

IOCTL …

179

srcsysdriver

3636

3896

entry.c

DriverEntry

IOCTL …

180

srcsysdriver

3636

3896

entry.c

DriverEntry

IOCTL …

311

srcsysdriver

3636

3896

entry.c

DriverEntry

IOCTL …

447

srcsysdriver

3636

3896

entry.c

DriverEntry

IOCTL …

448

srcsysdriver

3636

3896

entry.c

DriverEntry

IOCTL …

457

srcsysdriver

2792

5108

entry.c

DriverEntry

IOCTL …

608

srcsysdriver

3636

3896

entry.c

DriverEntry

IOCTL …

614

srcsysdriver

3636

3896

entry.c

DriverEntry

IOCTL …

655

srcsysdriver

3636

3896

entry.c

DriverEntry

IOCTL …

675

srcsysdriver

3636

3896

entry.c

DriverEntry

IOCTL …

678

srcsysdriver

3636

3896

entry.c

DriverEntry

IOCTL …

680

srcsysdriver

3636

3896

entry.c

DriverEntry

IOCTL …

681

srcsysdriver

3636

3896

entry.c

DriverEntry

IOCTL …

1145

srcsysdriver

3636

4960

entry.c

DriverEntry

IOCTL …

1153

srcsysdriver

3636

4960

entry.c

DriverEntry

IOCTL …

1154

srcsysdriver

3636

4960

entry.c

DriverEntry

IOCTL …

The following diagram illustrates Adjoint Thread of Activity for a selected message:

 

Anchor Messages

When a software trace is lengthy, it is useful to partition it into several regions based on a sequence of Anchor Messages. We can determine the choice of them by Vocabulary Index or Adjoint Thread of Activity. For example, an ETW trace with almost 900,000 messages recorded during a desktop connection for 6 minutes can be split into 14 segments by the adjoint thread of DLL_PROCESS_ATTACH message (the message was generated by DllMain of an injected module, not shown in the trace output for formatting clarity):

#      PID  TID  Time         Message​
24226  2656 3480 10:41:05.774 AppA.exe: DLL_PROCESS_ATTACH     ​
108813 4288 4072 10:41:05.774 AppB.exe: DLL_PROCESS_ATTACH​
112246 4180 3836 10:41:05.940 DllHost.exe: DLL_PROCESS_ATTACH​
135473 2040 3296 10:41:12.615 AppC.exe: DLL_PROCESS_ATTACH​
694723 1112 1992 10:44:23.393 AppD.exe: DLL_PROCESS_ATTACH​
703962 5020 1080 10:44:42.014 DllHost.exe: DLL_PROCESS_ATTACH​
705511 4680 3564 10:44:42.197 DllHost.exe: DLL_PROCESS_ATTACH​
705891 1528 2592 10:44:42.307 regedit.exe: DLL_PROCESS_ATTACH​
785231 2992 4912 10:45:26.516 AppE.exe: DLL_PROCESS_ATTACH​
786523 3984 1156 10:45:26.605 powershell.exe: DLL_PROCESS_ATTACH​
817979 4188 4336 10:45:48.707 wermgr.exe: DLL_PROCESS_ATTACH​
834875 3976 1512 10:45:52.342 LogonUI.exe: DLL_PROCESS_ATTACH​
835229 4116 3540 10:45:52.420 AppG.exe: DLL_PROCESS_ATTACH

Each region can be analyzed independently for any anomalies, for example, to look for the answer to a question why wermgr.exe was launched. We illustrate partitioning in the following schematic diagram:

It is also possible to make different trace segmentation by interleaving regions above with another set of Anchor Messages comprising of the adjoint thread of DLL_PROCESS_DETACH message:

 

 

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset