Chapter 9. Troubleshooting Exchange Server 2013

Microsoft Exchange Server 2013 is critically important to your organization, and to be a successful Exchange administrator, you need to know how to diagnose and resolve problems as quickly as possible. Throughout this book, I’ve discussed techniques you can use to configure, maintain, and troubleshoot Exchange Server 2013. In this chapter, I discuss additional techniques you can use to perform comprehensive troubleshooting.

Troubleshooting essentials

Client Access and Mailbox servers running Exchange 2013 can experience many types of issues that require troubleshooting to resolve. These issues can range from performance problems, to denied logins, to service outages. To help you resolve problems as they occur, you need a solid understanding of Exchange architecture, which I’ve covered throughout this book as part of the core discussion. Now let’s look at architecture components specific to maintaining Exchange services and to diagnosing and resolving problems with them.

Tracking server health

In Exchange Server 2013, the Managed Availability architecture is used to automatically detect and correct many types of system problems with a goal of helping to ensure the overall availability of Exchange services. Managed Availability is implemented as part of both the Client Access server role and the Mailbox server role. All servers running Exchange 2013 have this architecture.

As part of Managed Availability, hundreds of probes, monitors, and responders are running constantly on Exchange 2013 to analyze, monitor, and maintain services. If a problem is identified, it often can be fixed automatically. Figure 9-1 provides an overview of how Managed Availability works. Managed Availability has three asynchronous components:

A diagram of the workflow for Managed Availability in Exchange Server 2013, showing the flow from the probe engine to monitors to responders to final escalation.
Figure 9-1. Overview of Managed Availability in Exchange Server 2013.
  • Probe engine. Takes measurements on the server and collects data samples. The collected data flows to the monitor engine.

  • Monitor engine. Uses the measurements and collected data to determine the status of Exchange services and components. The processed data flows to the responder engine.

  • Responder engine. Takes recovery actions based on unhealthy states reported by the monitor engine. If automated recovery is unsuccessful, escalates by issuing event log notifications.

By delving deeper into the Managed Availability architecture, you can get a better understanding of how the automated monitoring and response processes work. As Figure 9-2 shows, the workflow has three phases:

A diagram of the probe, monitor, and recovery components of Managed Availability, showing workflow from probe sampling to monitor detection to responses for recovery.
Figure 9-2. The probe, monitor, and recovery components of Managed Availability.
  • Sampling. The probe engine checks the state of Exchange services and components according to specific probes. Each probe has a top-level identifier and one or more related probe definitions. Each probe definition identifies the name of the associated probe, the health set to which the probe belongs, the target resource being tracked, a recurrence interval, and a timeout value.

  • Detection. The monitor engine analyzes the sampled data and issues alerts related to changes in the state of Exchange services and components according to specific monitors. Each monitor has a top-level identifier and one or more related monitor definitions. Each monitor definition identifies the name of the associated monitor, the health set to which the monitor belongs, and a sample mask that specifies the top-level identifier for related probes.

  • Recovery. The responder engine responds to unhealthy states identified in alerts. Each responder has an associated responder definition that identifies the recovery action to be taken, the name of the responder, the target resource that will be acted on, and an alert mask that specifies the top-level identifier for related monitors.

Note

Rather than list each associated monitor or probe, Managed Availability components use name masking. Here, a top-level identifier is provided and then used as a mask to identify the related monitors and probes.
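
To make the masking idea concrete, here is a minimal sketch in plain PowerShell. It assumes that a mask matches any item whose name begins with the mask text (a prefix match). The helper name Find-ByMask is hypothetical and is not an Exchange cmdlet, and the sample definitions are made up for illustration.

```powershell
# Hypothetical helper: returns the items whose Name starts with the given mask.
# Assumption: a mask behaves like a prefix. This is an illustration, not Exchange code.
function Find-ByMask {
    param(
        [Parameter(Mandatory)] [string] $Mask,
        [Parameter(Mandatory)] [object[]] $Items
    )
    $Items | Where-Object {
        $_.Name.StartsWith($Mask, [System.StringComparison]::OrdinalIgnoreCase)
    }
}

# Made-up sample monitors, with names patterned on those shown in this chapter.
$sampleMonitors = @(
    [pscustomobject]@{ Name = 'OutlookSelfTestMonitor';    HealthSet = 'Outlook.Protocol' }
    [pscustomobject]@{ Name = 'ECPProxyTestMonitor';       HealthSet = 'ECP.Proxy' }
    [pscustomobject]@{ Name = 'ActiveSyncDeepTestMonitor'; HealthSet = 'ActiveSync.Protocol' }
)

# A responder's alert mask picks out its related monitor.
$matched = Find-ByMask -Mask 'OutlookSelfTestMonitor' -Items $sampleMonitors
$matched | Format-Table Name, HealthSet
```

The same helper could be pointed at a list of probes with a monitor's sample mask, which is the other direction in which masks are used.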

Collections of monitors are grouped together in health sets. Exchange 2013 has health sets for everything from Microsoft ActiveSync to User Throttling. Each health set has a number of associated monitors. As part of automated recovery, responders use the alerts issued by monitors to take recovery actions. There are three levels of recovery:

  • Tier 1. Provides the initial recovery response. As an initial response to an unhealthy state, responders typically will try to restart the service that uses the affected components.

  • Tier 2. Provides more advanced and customized recovery response. If restarting the service doesn’t resolve the issue, the monitor state is escalated to the next level. The action or actions taken at this level to recover depend on the component but could include failover, bug checking, re-initialization of components to bring them back online, and more.

  • Tier 3. Uses the escalate responder to issue event log notifications regarding the problem. If you’ve installed the Exchange Server 2013 Management Pack, escalated issues are sent to Microsoft System Center Operations Manager via the event logs as well.
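
The three tiers can be pictured as a chain that stops at the first action that works. The following sketch is purely illustrative and says nothing about how Managed Availability is implemented internally. The function name Invoke-TieredRecovery and the simulated actions are hypothetical.

```powershell
# Illustrative only: run recovery actions in order and stop at the first success.
# The function name and the sample actions are hypothetical.
function Invoke-TieredRecovery {
    param([Parameter(Mandatory)] [scriptblock[]] $Actions)

    for ($tier = 1; $tier -le $Actions.Count; $tier++) {
        $action = $Actions[$tier - 1]
        # Each action returns $true when the resource is healthy again.
        if (& $action) {
            return "Recovered at tier $tier"
        }
    }
    # No automated action worked, so the issue is handed to an administrator.
    return 'Escalated'
}

$result = Invoke-TieredRecovery -Actions @(
    { $false },   # Tier 1: restart the service (simulated failure)
    { $true }     # Tier 2: component-specific recovery (simulated success)
)
$result
```

When every action returns $false, the function falls through to the escalation result, which mirrors the role of Tier 3.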

Although designed to resolve many typical problems, Managed Availability cannot resolve every problem, and this escalation is built into the architecture. As part of diagnosing and resolving problems, you can check the status of monitors and health sets by using:

  • Get-HealthReport. Details the state and health of Exchange resources, monitors, and services

    Get-HealthReport -Identity ServerID [-GroupSize SizeOfRollup]
    [-HaImpactingOnly <$true | $false>] [-HealthSet HealthSet]
    [-MinimumOnlinePercent MinToDegraded]
    [-RollupGroup <$true | $false>]
  • Get-ServerHealth. Returns the state of monitored resources in addition to alert values

    Get-ServerHealth -Identity ServerID [-HaImpactingOnly <$true |
    $false>] [-HealthSet HealthSet]

To check the state of resources, enter the following command:

Get-ServerHealth -Identity ServerID

ServerID is the host name or fully qualified name of the Exchange server to check, such as:

Get-ServerHealth -Identity MailServer42

In the following sample, I’ve omitted the server name and server component columns from the default output:

State     Name                TargetResource     HealthSetName   AlertValue
-----     -----               --------------     --------------  ----------
Online    AutodiscoverProxy... MSExchangeAutoDis... Autodiscover... Healthy
Online    ActiveSyncProxyTe... MSExchangeSyncApp... ActiveSync.P... Healthy
Repairing ECPProxyTestMonitor  MSExchangeECPAppPool ECP.Proxy     Unhealthy

Real World

Often when you work with Exchange Management Shell, you’ll find that the output is too long for the default screen buffer size or that the output has too many columns for the default window size. Because of this, I prefer to use a screen buffer height of 2,999 and width of 120, along with a window width of 120 and height of 74. This makes Exchange Management Shell easier to work with. If you are using Windows 8 or Windows Server 2012, you’ll find that you can’t customize all of these settings from the Start screen. Instead, press and hold or right-click the tile for the shell on the Start screen, and then select Open File Location. This opens File Explorer to the folder in which the shortcut for Exchange Management Shell is located. Press and hold or right-click this shortcut, and then select Properties. In the Properties dialog box, you’ll then be able to use the options on the Layout tab to customize the shell.

From the State value, you can determine the online status of a monitored resource that is used for transport, connections, or communications. State values you might see include:

  • Online. All the components of the monitored resource are online.

  • Partially Online. Some of the components of the monitored resource are not online.

  • Offline. All the components of the monitored resource are offline.

  • Sidelined. The monitored resource is sidelined and might not be in a fully online state.

  • Functional. The monitored resource is functional but might not be in a fully online state.

  • NotApplicable. An online or offline status is not applicable to this monitored resource.

  • Unavailable. The monitored resource is unavailable.

From the alert value, you can determine the general health status of a monitored resource. Alert values you might see include:

  • Healthy. All the components of the monitored resource are healthy.

  • Degraded. Some of the components of the monitored resource are not healthy.

  • Disabled. The components of the monitored resource have been disabled.

  • Unhealthy. All the components of the monitored resource are unhealthy.

  • Sidelined. The monitored resource is sidelined and might not be in a fully healthy state.

  • Repairing. The monitored resource is functional but is recovering from a degraded or unhealthy state.

  • Unavailable. The monitored resource is unavailable.

  • Uninitialized. The monitored resource hasn’t been initialized.
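
On a busy server, most monitors report Healthy, so it often helps to filter the list down to the ones that don't. The following sketch uses made-up sample objects that carry the same column names as the output shown earlier, so it runs without an Exchange server. The filter name Select-UnhealthyMonitor is hypothetical.

```powershell
# Hypothetical filter: passes through only monitors whose AlertValue is not Healthy.
filter Select-UnhealthyMonitor {
    if ($_.AlertValue -ne 'Healthy') { $_ }
}

# Made-up sample data using the column names from the Get-ServerHealth output.
$sampleHealth = @(
    [pscustomobject]@{ State = 'Online';    Name = 'ActiveSyncDeepTestMonitor'; HealthSetName = 'ActiveSync.Protocol'; AlertValue = 'Healthy' }
    [pscustomobject]@{ State = 'Online';    Name = 'ActiveSyncCTPMonitor';      HealthSetName = 'ActiveSync';          AlertValue = 'Degraded' }
    [pscustomobject]@{ State = 'Repairing'; Name = 'ECPProxyTestMonitor';       HealthSetName = 'ECP.Proxy';           AlertValue = 'Unhealthy' }
)

$problems = $sampleHealth | Select-UnhealthyMonitor
$problems | Format-Table State, Name, HealthSetName, AlertValue -AutoSize
```

Against a live server, the same filter should work at the end of a pipeline such as Get-ServerHealth -Identity MailServer42 | Select-UnhealthyMonitor, assuming the AlertValue property compares cleanly against the string shown in the output.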

If a health set has a status other than healthy or online, you can take a closer look at it by using the -HealthSet parameter. List the properties of the health set as shown in this example:

Get-ServerHealth -Identity MailServer42 -HealthSet ECP.Proxy | fl

You can get a formatted list of every monitor, target resource, and its related health set by entering the following command:

Get-ServerHealth localhost | ft name,targetresource,healthsetname

The output lists the name of the monitor, the target resource, and the name of the corresponding health set. You can store the output for later reference by redirecting the output to a file. In the following example, c:\data is the name of an existing folder, and healthset-reference.txt is the name of the file to create:

(Get-ServerHealth localhost | ft name,targetresource,healthsetname) > c:\data\healthset-reference.txt

The output will look similar to the following:

Name                         TargetResource    HealthSetName
----                         --------------    -------------
ActiveSyncV2CTPMonitor       ActiveSync        ActiveSync
ActiveSyncCTPMonitor         ActiveSync        ActiveSync
ActiveSyncV2DeepTestMonitor  ActiveSync        ActiveSync.Protocol
ActiveSyncDeepTestMonitor    ActiveSync        ActiveSync.Protocol
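
Once you have this list, a quick count of monitors per health set shows how the monitors are distributed. The sketch below rebuilds the four rows above as sample objects and groups them with the standard Group-Object cmdlet. The variable names are arbitrary.

```powershell
# Sample rows copied from the reference output above.
$monitorRows = @(
    [pscustomobject]@{ Name = 'ActiveSyncV2CTPMonitor';      TargetResource = 'ActiveSync'; HealthSetName = 'ActiveSync' }
    [pscustomobject]@{ Name = 'ActiveSyncCTPMonitor';        TargetResource = 'ActiveSync'; HealthSetName = 'ActiveSync' }
    [pscustomobject]@{ Name = 'ActiveSyncV2DeepTestMonitor'; TargetResource = 'ActiveSync'; HealthSetName = 'ActiveSync.Protocol' }
    [pscustomobject]@{ Name = 'ActiveSyncDeepTestMonitor';   TargetResource = 'ActiveSync'; HealthSetName = 'ActiveSync.Protocol' }
)

# Group by health set and keep only the set name and the number of monitors in it.
$healthSetSummary = $monitorRows |
    Group-Object -Property HealthSetName |
    Select-Object Name, Count

$healthSetSummary | Format-Table -AutoSize
```

With a live server, replacing $monitorRows with Get-ServerHealth localhost should produce the same kind of summary for every health set.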

Tracking user and workload throttling

Whenever you are trying to diagnose and resolve problems with Exchange 2013, you need to keep in mind how user and workload throttling might be affecting performance. All users with mailboxes on servers running Exchange 2013 are subject to user throttling policy.

The default user throttling policy is named the Global Throttling Policy. As the name implies, this policy has global scope and applies throughout the organization. User throttling policies also can have organization and regular scope. If you want to configure user throttling, you should create policies with these scopes rather than modify the Global Throttling Policy.

You can list currently defined user throttling policies by entering Get-ThrottlingPolicy at the shell prompt. To create and manage user throttling policies, you can use New-ThrottlingPolicy, Set-ThrottlingPolicy, and Remove-ThrottlingPolicy. You can view throttling policies assigned to users by using Get-ThrottlingPolicyAssociation, and assign user throttling policies to users by using Set-ThrottlingPolicyAssociation.

In addition to user throttling, Exchange Server manages workloads for protocols, features, and services using workload throttling policy. Workloads are automatically throttled to prevent overuse of system resources and to try to ensure managed resources maintain a healthy state.

Each defined workload has an associated policy and classification. Workload policies are used to enable and configure workloads. Workload classifications set the default priority of the workload. Classifications that can be assigned to workloads include:

  • Urgent

  • Customer Expectation

  • Internal Maintenance

  • Discretionary

You can view the current workload policies and their associated workload classifications by entering Get-WorkloadPolicy at the Shell prompt. To create and manage workload policies, you can use New-WorkloadPolicy, Set-WorkloadPolicy, and Remove-WorkloadPolicy.

Managed resources have health indicators and resource thresholds. Health indicators are used to measure the relative health of the workload in terms of the resources used. Health indicators tracked include:

  • Percent CPU utilization

  • Mailbox database RPC latency

  • Mailbox database replication health

  • Content indexing age of last notification

  • Content indexing retry queue size

Resource thresholds are used to configure usage limits for a system resource. Within each workload classification, one of three thresholds can be assigned: underloaded, overloaded, or critical. As an example:

  • Discretionary workloads are considered underloaded at 70 percent utilization, overloaded at 80 percent utilization, and critical at 100 percent utilization.

  • Internal Maintenance workloads are considered underloaded at 75 percent utilization, overloaded at 85 percent utilization, and critical at 100 percent utilization.

  • Customer Expectation workloads are considered underloaded at 80 percent utilization, overloaded at 90 percent utilization, and critical at 100 percent utilization.
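
The thresholds above can be turned into a small lookup to see how a given utilization figure would be classified. This is a sketch under stated assumptions: utilization at or below the underloaded figure is treated as underloaded, utilization at or above the overloaded or critical figure is treated as overloaded or critical, and anything in between is labeled Normal, which is a label invented here. The function name Get-LoadState is hypothetical.

```powershell
# Threshold percentages quoted in the text, keyed by workload classification.
$thresholds = @{
    'Discretionary'        = @{ Underloaded = 70; Overloaded = 80; Critical = 100 }
    'Internal Maintenance' = @{ Underloaded = 75; Overloaded = 85; Critical = 100 }
    'Customer Expectation' = @{ Underloaded = 80; Overloaded = 90; Critical = 100 }
}

# Hypothetical helper: classifies a utilization percentage for a classification.
function Get-LoadState {
    param(
        [Parameter(Mandatory)] [string] $Classification,
        [Parameter(Mandatory)] [ValidateRange(0, 100)] [double] $Utilization
    )
    $t = $thresholds[$Classification]
    if (-not $t) { throw "Unknown classification: $Classification" }

    if ($Utilization -ge $t.Critical)    { return 'Critical' }
    if ($Utilization -ge $t.Overloaded)  { return 'Overloaded' }
    if ($Utilization -le $t.Underloaded) { return 'Underloaded' }
    return 'Normal'   # Assumed label for the band between the two thresholds.
}

Get-LoadState -Classification 'Discretionary' -Utilization 85
Get-LoadState -Classification 'Customer Expectation' -Utilization 85
```

The two calls show why classification matters: the same 85 percent utilization is over the limit for a Discretionary workload but still within bounds for a Customer Expectation workload.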

You can view the current resource threshold settings for each workload classification by entering the following command:

Get-ResourcePolicy | fl

To create and manage resource policies, you can use New-ResourcePolicy, Set-ResourcePolicy, and Remove-ResourcePolicy. After you’ve defined custom workload and resource policies, you can create a policy object based on a particular policy by using New-WorkloadManagementPolicy. You then assign the workload management policy to a server by using Set-ExchangeServer with the -WorkloadManagementPolicy and -Server parameters.

Tracking configuration changes

As part of your standard operating procedures, you should track changes in the configuration of your Exchange servers. The Exchange Management Shell provides the following cmdlets for obtaining detailed information on the current configuration of your Exchange servers:

  • Get-ClientAccessServer. Displays configuration details for servers with the Client Access server role

  • Get-ExchangeServer. Displays the general configuration details for Exchange servers

  • Get-MailboxServer. Displays configuration details for servers with the Mailbox server role

  • Get-OrganizationConfig. Displays summary information about your Exchange organization

  • Get-TransportService. Displays configuration details for servers with the Mailbox or Edge Transport server role

To get related details for a specific server, you pass the Get-TransportService cmdlet the identity of the server you want to work with, as shown in the following example:

Get-TransportService mailserver36 | fl

To get related details for all servers, omit the -Identity parameter, as shown in the following example:

Get-TransportService | fl

When you finalize the configuration of your Exchange servers, you should use these cmdlets to store the configuration details for each server role. To store the configuration details in a file, redirect the output to a file, as shown in the following example:

Get-TransportService mailserver36 | fl > c:\SavedConfigs\transport2014-0211.txt

If you then store the revised configuration any time you make significant changes, you can use this information during troubleshooting to help resolve problems that might be related to configuration changes. To compare two configuration files, you can use the file compare command, fc, at an elevated, administrator command prompt. When you use the following syntax with the fc command, the output is the difference between two files:

fc FilePath1 FilePath2

FilePath1 is the full file path to the first file and FilePath2 is the full file path to the second file. Here is an example:

fc c:\SavedConfigs\transport2014-0211.txt c:\SavedConfigs\transport2014-0221.txt

Because the files contain configuration details for specific dates, the changes shown in the output represent the configuration changes that you’ve made to the server.
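
If you prefer to stay in the shell, the standard Compare-Object cmdlet can do a similar line-by-line comparison. The following sketch writes two small sample files to the temporary folder so that it runs anywhere. The property values in the files are made up, and the file names are arbitrary.

```powershell
# Create two small sample files standing in for saved configuration snapshots.
$tempFolder = [System.IO.Path]::GetTempPath()
$before = Join-Path $tempFolder 'transport-before.txt'
$after  = Join-Path $tempFolder 'transport-after.txt'
Set-Content -Path $before -Value 'MaxOutboundConnections : 1000', 'MessageTrackingLogEnabled : True'
Set-Content -Path $after  -Value 'MaxOutboundConnections : 2000', 'MessageTrackingLogEnabled : True'

# Compare line by line. SideIndicator shows which file each differing line came from:
# "<=" means the line is only in the first file, "=>" only in the second.
$diff = Compare-Object -ReferenceObject (Get-Content $before) -DifferenceObject (Get-Content $after)
$diff | Format-Table InputObject, SideIndicator -AutoSize

# Clean up the sample files.
Remove-Item $before, $after
```

Lines that are identical in both files are omitted by default, so the output contains only what changed between the two snapshots.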

Testing service health, mail flow, replication, and more

As part of troubleshooting, you’ll often want to determine the status of required services, which can be done by using Test-ServiceHealth. The basic syntax is:

Test-ServiceHealth [-Server ServerName]

ServerName is the name of the server to test. If you omit a server name, the local server is tested. As shown in the following sample output, Test-ServiceHealth shows you which required services are running and which aren’t:

Role                    : Mailbox Server Role
RequiredServicesRunning : True
ServicesRunning         : {IISAdmin, MSExchangeADTopology,
MSExchangeDelivery, MSExchangeIS, MSExchangeMailboxAssistants,
MSExchangeRepl, MSExchangeRPC, MSExchangeServiceHost,
MSExchangeSubmission, MSExchangeThrottling, MSExchangeTransportLogSearch,
W3Svc, WinRM}
ServicesNotRunning      : {}
Role                    : Client Access Server Role
RequiredServicesRunning : True
ServicesRunning         : {IISAdmin, MSExchangeADTopology,
MSExchangeMailboxReplication, MSExchangeRPC, MSExchangeServiceHost, W3Svc,
WinRM}
ServicesNotRunning      : {}
Role                    : Unified Messaging Server Role
RequiredServicesRunning : True
ServicesRunning         : {IISAdmin, MSExchangeADTopology,
MSExchangeServiceHost, MSExchangeUM, W3Svc, WinRM}
ServicesNotRunning      : {}
Role                    : Hub Transport Server Role
RequiredServicesRunning : True
ServicesRunning         : {IISAdmin, MSExchangeADTopology,
MSExchangeEdgeSync, MSExchangeServiceHost, MSExchangeTransport,
MSExchangeTransportLogSearch, W3Svc, WinRM}
ServicesNotRunning      : {}

The server in this example has the Client Access server role and the Mailbox server role installed. Although Exchange 2013 no longer has separate UM and Hub Transport roles, Test-ServiceHealth continues to list separately the related required services and their status.
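
When a server hosts several roles, it can be quicker to reduce this output to just the roles that have a stopped service. The following sketch works on made-up sample objects that use the same property names as the output above. The filter name Select-RoleWithStoppedService is hypothetical.

```powershell
# Hypothetical filter: emits one summary row per role that is missing a required service.
filter Select-RoleWithStoppedService {
    if (-not $_.RequiredServicesRunning) {
        [pscustomobject]@{
            Role    = $_.Role
            Stopped = ($_.ServicesNotRunning -join ', ')
        }
    }
}

# Made-up sample data; the second role simulates a stopped service.
$sampleRoles = @(
    [pscustomobject]@{ Role = 'Mailbox Server Role';       RequiredServicesRunning = $true;  ServicesNotRunning = @() }
    [pscustomobject]@{ Role = 'Client Access Server Role'; RequiredServicesRunning = $false; ServicesNotRunning = @('MSExchangeRPC') }
)

$stoppedRoles = $sampleRoles | Select-RoleWithStoppedService
$stoppedRoles | Format-Table Role, Stopped -AutoSize
```

On an Exchange server, the same filter should work at the end of a Test-ServiceHealth pipeline, assuming the properties behave as they appear in the sample output.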

As part of troubleshooting, you’ll often need to test mail flow and replication. If you suspect a problem with mail flow, you can quickly send a test message by using Test-Mailflow. This cmdlet verifies whether mail can be successfully sent from and delivered to the system mailbox as well as whether email is sent between Mailbox servers within a defined latency threshold.

To test mail flow from one mailbox server to another or from one mailbox server to a target mailbox database, you can use the following syntax:

Test-MailFlow -Identity OriginatingMailServer [-TargetMailboxServer
DestinationMailServer | -TargetDatabase DestinationDatabase]

In the following example, a test message is sent from MailboxServer18 to MailboxServer96:

Test-MailFlow -Identity MailboxServer18 -TargetMailboxServer MailboxServer96

As shown in this sample, the output of the command tells you whether the message was sent and received successfully:

TestMailflowResult : Success
MessageLatencyTime : 00:00:04.0077377
IsRemoteTest       : False
Identity           :
IsValid            : True
ObjectState        : New

If you suspect a problem with replication, you can quickly determine the status of replication components by using Test-ReplicationHealth. This cmdlet checks the status of all aspects of replication, replay, and availability on a Mailbox server in a database availability group. Use Test-ReplicationHealth to help you monitor the status of continuous replication, availability of Active Manager, and the general status of availability components.

The basic syntax is:

Test-ReplicationHealth [-Identity MailboxServerId]

Such as:

Test-ReplicationHealth MailServer42

As shown in this sample, the output of the command tells you the status of each replication component on the Mailbox server:

Server          Check                      Result     Error
------          -----                      ------     -----
MAILSERVER42    ReplayService              Passed
MAILSERVER42    ActiveManager              Passed
MAILSERVER42    TasksRpcListener           Passed
MAILSERVER42    DatabaseRedundancy         *FAILED*   Failures:...
MAILSERVER42    DatabaseAvailability       *FAILED*   Failures:...

If errors are found, you’ll want to get more details by formatting the output in a list, such as:

Test-ReplicationHealth MailServer42 | fl server, check*, result, error

The error details should help you identify the problem. In this example, the Mailbox database doesn’t have enough copies to be fully redundant:

Server           : MAILSERVER42
Check            : DatabaseRedundancy
CheckDescription : Verifies that databases have sufficient redundancy. If
this check fails, it means that some databases are at risk of losing data.
Result           : *FAILED*
Error            : Failures:
There were database redundancy check failures for database 'Engineering
Mailbox Database' that may be lowering its redundancy and
putting the database at risk of data loss. Redundancy Count: 1. Expected
Redundancy Count: 2.

In this example, the Engineering Mailbox Database does not have enough copies for full redundancy. This could be because an administrator forgot to make a passive copy of the database or because a Mailbox server hosting a copy of the database is offline or otherwise unavailable.

Other useful cmdlets for checking the Exchange organization include:

  • Test-ActiveSyncConnectivity. Performs a full synchronization against a specified mailbox to test the configuration of Exchange ActiveSync

  • Test-ArchiveConnectivity. Verifies archive functionality for a mailbox user

  • Test-AssistantHealth. Verifies that the Exchange Mailbox Assistant service is running as expected

  • Test-CalendarConnectivity. Verifies that calendar sharing as part of Outlook Web App is working properly

  • Test-EcpConnectivity. Verifies that the Exchange Admin Center is running as expected

  • Test-EdgeSynchronization. Verifies that the subscribed Edge Transport servers have a current and accurate synchronization status

  • Test-ExchangeSearch. Verifies that Exchange Search is currently enabled and is indexing new email messages in a timely manner

  • Test-FederationTrust. Verifies that the federation trust is properly configured and functioning as expected

  • Test-FederationTrustCertificate. Verifies the status of certificates used for federation on all Mailbox and Client Access servers

  • Test-ImapConnectivity. Verifies that the IMAP4 service is running as expected

  • Test-IPAllowListProvider. Verifies the configuration for a specific IP allow list provider

  • Test-IPBlockListProvider. Verifies the configuration for a specific IP block list provider

  • Test-IRMConfiguration. Verifies Information Rights Management (IRM) configuration and functionality

  • Test-MapiConnectivity. Verifies server functionality by logging on to the mailbox that you specify

  • Test-MRSHealth. Verifies the health of the Microsoft Exchange Mailbox Replication Service

  • Test-OAuthConnectivity. Verifies that OAuth authentication is working properly

  • Test-OutlookConnectivity. Verifies end-to-end Microsoft Outlook client connectivity and also tests for Outlook Anywhere (RPC/HTTP) and TCP-based connections

  • Test-OutlookWebServices. Verifies the Autodiscover service settings for Outlook

  • Test-OwaConnectivity. Verifies that Outlook Web App is running as expected

  • Test-PopConnectivity. Verifies that the POP3 service is running as expected

  • Test-PowerShellConnectivity. Verifies whether Windows PowerShell remoting on the target Client Access server is functioning correctly

  • Test-SenderId. Verifies whether a specified IP address is the legitimate sending address for a specified SMTP address

  • Test-SmtpConnectivity. Verifies SMTP connectivity for a specified server

  • Test-UMConnectivity. Verifies the operation of a computer that has Unified Messaging installed

  • Test-WebServicesConnectivity. Verifies the functionality of Exchange Web Services

Diagnosing and resolving problems

As discussed previously in this chapter in the Troubleshooting essentials section, you can use Get-ServerHealth to list monitors, target resources, and corresponding health sets. Knowing which monitor, target resource, and health set you want to work with is important for troubleshooting. To diagnose and resolve problems, you often need to work backward from the reported problem to the source of the problem, as shown here:

  1. Find recovery actions.

  2. Trace recovery actions to their responder.

  3. Use the responses logged by a responder to find the related monitor.

  4. Find the probes for a monitor.

  5. Locate the error messages being logged by probes.

  6. Verify probe errors still exist.

The sections that follow examine the related procedures.

Identifying recovery actions

During recovery, the responder engine uses responders to take appropriate recovery actions, based on the type of alert and the affected target resource. Whenever a responder takes a recovery action, it logs related events in the Microsoft-Exchange-ManagedAvailability/RecoveryActionResults event log. An entry with an event ID of 500 indicates that a recovery action has started. An entry with an event ID of 501 indicates that the recovery action was completed.

Although you can view the events in Event Viewer, you can also view them at the Shell prompt. To collect the events in the RecoveryActionResults event log so you can process them, enter the following commands:

$Results = Get-WinEvent -ComputerName ServerName -LogName Microsoft-Exchange-ManagedAvailability/RecoveryActionResults
$ResultsXML = ($Results | ForEach-Object -Process {[xml]$_.ToXml()}).event.userData.eventXml

ServerName is the name of the Client Access or Mailbox server that you want to work with. The first command collects the events. The second command formats the event entries so that they are easier to work with. These commands can be combined and shortened to:

$ResultsXML = (Get-WinEvent -ComputerName ServerName -LogName Microsoft-Exchange-ManagedAvailability/RecoveryActionResults |
    % {[xml]$_.ToXml()}).event.userData.eventXml
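
If the chain of property names in these commands looks opaque, the following sketch shows the same technique on two tiny, made-up XML strings so that it runs without an event log. Real events carry many more fields and an XML namespace. The element layout here is a simplified assumption.

```powershell
# Two made-up event documents in a simplified layout.
$sampleEventXml = @(
    '<Event><UserData><EventXML><Id>RestartService</Id><Result>Succeeded</Result><RequestorName>ClusterEndpointRestart</RequestorName></EventXML></UserData></Event>'
    '<Event><UserData><EventXML><Id>RestartService</Id><Result>Failed</Result><RequestorName>OutlookSelfTestRestart</RequestorName></EventXML></UserData></Event>'
)

# Cast each string to [xml], then walk down to the EventXML element of every document.
# PowerShell exposes child elements as properties, and the names are not case-sensitive.
$parsedEvents = ($sampleEventXml | ForEach-Object { [xml]$_ }).Event.UserData.EventXML

# Each result now behaves like an object with Id, Result, and RequestorName properties.
$failedEvents = @($parsedEvents | Where-Object { $_.Result -eq 'Failed' })
$failedEvents | Format-Table Id, RequestorName, Result -AutoSize
```

The point of the conversion is that the event payload becomes a set of ordinary properties, which is what makes the Where-Object filters in the rest of this section possible.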

Next, you need to identify a response that you want to look at more closely. If you want to review corrective actions taken by Managed Availability, you’d look for events that occurred on a particular day and completed successfully. The following example parses the previously collected event data and looks for events from 2013-07-01 that have a successful result:

$ResultsXML | Where-Object {$_.Result -eq "Succeeded" -and $_.EndTime -like "2013-07-01*"} |
    ft -AutoSize StartTime,RequestorName

As shown in this example, you also could look for events that occurred but where the responder failed to correct the issue:

$ResultsXML | Where-Object {$_.Result -eq "Failed" -and $_.EndTime -like "2013-07-01*"} |
    ft -AutoSize StartTime,RequestorName

With either approach, you’ll then get a list of issues by start time and requestor name, such as:

StartTime                    RequestorName
---------                    -------------
2013-07-01t21:00:10.1008312Z SearchLocalCopyStatusRestartSearchService
2013-07-01t21:00:06.1162578Z RWSProxyTestRecycleAppPool
2013-07-01t21:00:00.4597184Z ClusterEndpointRestart
2013-07-01t20:59:36.1601996Z RWSProxyTestRecycleAppPool
2013-07-01t20:57:17.8657794Z OutlookSelfTestRestart
2013-07-01t20:58:03.7958299Z RWSProxyTestRecycleAppPool
2013-07-01t20:55:24.6591276Z ServiceHealthActiveManagerRestartService
2013-07-01t20:57:11.2223574Z ClusterEndpointRestart
2013-07-01t20:55:06.9326525Z OutlookSelfTestRestart
2013-07-01t20:57:02.6438007Z RWSProxyTestRecycleAppPool
2013-07-01t20:54:34.5391633Z OutlookMailboxDeepTestRestart
2013-07-01t20:56:32.4360908Z RWSProxyTestRecycleAppPool
2013-07-01t20:54:41.4926429Z ClusterEndpointRestart
2013-07-01t20:53:34.1596832Z ActiveDirectoryConnectivityRestart
2013-07-01t20:52:11.0579430Z ClusterEndpointRestart

In this example, the value in the RequestorName column is the responder that took the action. To examine the properties of a recovery action, run a query for a specific responder, such as:

$ResultsXML | Where-Object {$_.Result -eq "Failed" -and $_.EndTime -like "2013*" -and $_.RequestorName -eq "OutlookSelfTestRestart"} | fl

The output includes the details logged for events in which the recovery action initiated by the OutlookSelfTestRestart responder failed. Each entry will look similar to the following:

auto-ns2            : http://schemas.microsoft.com/win/2004/08/events
xmlns               : myNs
Id                  : RestartService
InstanceId          : 130629.015717.86577.001
ResourceName        : MSExchangeRPC
StartTime           : 2013-07-01T20:57:17.8657794Z
EndTime             : 2013-07-01T20:59:19.4994266Z
State               : Finished
Result              : Failed
RequestorName       : OutlookSelfTestRestart
ExceptionName       : TimeoutException
ExceptionMessage    : System error.
Context             : [null]
CustomArg1          : [null]
CustomArg2          : [null]
CustomArg3          : [null]
LamProcessStartTime : 7/01/2013 1:12:28 PM
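
The StartTime and EndTime values in an entry like this one tell you how long the recovery action ran before it finished, which is useful context when the exception is a timeout. The following sketch computes the duration from the two timestamps shown above by using standard .NET date parsing. The variable names are arbitrary.

```powershell
# Timestamps copied from the sample entry above.
$startTime = '2013-07-01T20:57:17.8657794Z'
$endTime   = '2013-07-01T20:59:19.4994266Z'

# Parse with the invariant culture so the result doesn't depend on regional settings.
$culture = [System.Globalization.CultureInfo]::InvariantCulture
$duration = [datetimeoffset]::Parse($endTime, $culture) - [datetimeoffset]::Parse($startTime, $culture)

# Show the elapsed time in whole and fractional seconds.
[math]::Round($duration.TotalSeconds, 1)
```

In this entry, the action ran for roughly two minutes before it was recorded as failed.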

Although the responder name and details will often help you identify the type of problem that occurred, you can keep working toward the exact problem that occurred by finding the monitor that triggered the responder.

Identifying responders

Whenever the Health Manager service starts, it logs related events in the Microsoft-Exchange-ActiveMonitoring/ResponderDefinition event log that you can use to get properties of responders. To collect the events in the ResponderDefinition event log so that you can process them, enter the following command:

$Responders = (Get-WinEvent -ComputerName ServerName -LogName Microsoft-Exchange-ActiveMonitoring/ResponderDefinition |
    % {[xml]$_.ToXml()}).event.userData.eventXml

ServerName is the name of the Client Access or Mailbox server with which you want to work. If you examine the definition of a responder, the AlertMask property will identify the monitor associated with the responder. Thus, one way to display the required information is to look for the responder and list the responder name and the associated alert mask in the output as shown in this example:

$Responders | ? {$_.Name -eq "OutlookSelfTestRestart"} |
    ft name, alertmask

The output will then be similar to the following:

Name                                   AlertMask
----                                   ---------
OutlookSelfTestRestart                 OutlookSelfTestMonitor
OutlookSelfTestRestart                 OutlookSelfTestMonitor

You’ll know the related monitor is named OutlookSelfTestMonitor. Before examining the related monitor, you might want to display the full details for the responder to help you understand exactly how the responder works. To display the full details for a responder, simply list its properties in a formatted list as shown in this example:

$Responders | ? {$_.Name -eq "OutlookSelfTestRestart"} | fl

The wait interval specifies the minimum amount of time a responder must wait before running again. As shown in this partial output, the definition details can help you learn more about the responder:

Id                             : 452
AssemblyPath                   : C:\Program Files\Microsoft\Exchange
 Server\V15\Bin\Microsoft.Exchange.Monitoring.ActiveMonitoring
.Local.Components.dll
TypeName                       : Microsoft.Exchange.Monitoring
.ActiveMonitoring.Responders.ResetIISAppPoolResponder
Name                           : OutlookSelfTestRestart
WorkItemVersion                : [null]
ServiceName                    : Outlook.Protocol
DeploymentId                   : 0
ExecutionLocation              : [null]
CreatedTime                    : 2013-07-01T20:02:32.2527661Z
Enabled                        : 1
TargetResource                 : MSExchangeRpcProxyAppPool
RecurrenceIntervalSeconds      : 0
TimeoutSeconds                 : 300
StartTime                      : 2013-07-01T20:02:32.2527661Z
UpdateTime                     : 2013-07-01T17:55:07.9754209Z
MaxRetryAttempts               : 3
ExtensionAttributes            : <ExtensionAttributes AppPoolName=
"MSExchangeRpcProxyAppPool" MinimumSecondsBetweenRestarts="300"
MaximumAllowedRestartsInAnHour="3" MaximumAllowedRestartsInADay="-1"
DumpOnRestart="FullDump" DumpPath="C:\Program Files\Microsoft\Exchange
Server\V15\Dumps" MinimumFreeDiskPercent="15" MaximumDumpsPerDay="9"
MaximumDumpDurationInSeconds="180" />
AlertMask                      : OutlookSelfTestMonitor
WaitIntervalSeconds            : 30
MinimumSecondsBetweenEscalates : 0
NotificationServiceClass       : 0
AlwaysEscalateOnMonitorChanges : 0
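
The ExtensionAttributes value in a responder definition is itself a small XML fragment, so its settings can be read individually rather than by eye. The following is a minimal sketch, assuming the value is available as a string; here it is pasted from the sample output (with DumpPath left out for brevity), and the $restartPolicy object and its property names are illustrative only:

```powershell
# Sample value copied from the ExtensionAttributes property in the output
# above (DumpPath omitted). On a live server you would read the string from
# the responder definition instead of pasting it.
$extensionAttributes = '<ExtensionAttributes AppPoolName="MSExchangeRpcProxyAppPool" ' +
    'MinimumSecondsBetweenRestarts="300" MaximumAllowedRestartsInAnHour="3" ' +
    'MaximumAllowedRestartsInADay="-1" DumpOnRestart="FullDump" ' +
    'MinimumFreeDiskPercent="15" MaximumDumpsPerDay="9" ' +
    'MaximumDumpDurationInSeconds="180" />'

# Cast the string to XML and take the root element.
$node = ([xml]$extensionAttributes).DocumentElement

# Collect the restart-related settings as typed values so they can be
# compared numerically. The property names here are hypothetical.
$restartPolicy = [pscustomobject]@{
    AppPoolName               = $node.GetAttribute('AppPoolName')
    MinSecondsBetweenRestarts = [int]$node.GetAttribute('MinimumSecondsBetweenRestarts')
    MaxRestartsPerHour        = [int]$node.GetAttribute('MaximumAllowedRestartsInAnHour')
    MaxRestartsPerDay         = [int]$node.GetAttribute('MaximumAllowedRestartsInADay')
}

$restartPolicy | Format-List
```

Reading the attributes this way makes it easier to compare the restart limits of several responders side by side.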

Identifying monitors

Monitor definitions are written in the Microsoft-Exchange-ActiveMonitoring/MonitorDefinition event log. If you examine the properties of events, you can learn more about monitors and learn their related probes. To collect the events in the MonitorDefinition event log so that you can process them, enter the following command:

$Monitors = (Get-WinEvent -ComputerName ServerName -LogName `
Microsoft-Exchange-ActiveMonitoring/MonitorDefinition |
% {[xml]$_.ToXml()}).event.userData.eventXml

ServerName is the name of the Client Access or Mailbox server with which you want to work. If you examine the definition of a monitor, the SampleMask property will identify the probes associated with the monitor. List the monitor name and the associated sample mask in the output as shown in this example:

$Monitors | ? {$_.Name -eq "OutlookSelfTestMonitor"} |
 ft name, samplemask

The output will then be similar to the following:

Name                                   SampleMask
----                                   ----------
OutlookSelfTestMonitor                 OutlookSelfTestProbe

As shown in the output, probes related to this monitor have the top-level identifier: OutlookSelfTestProbe. To display the full details for a monitor, simply list its properties in a formatted list as shown in this example:

$Monitors | ? {$_.Name -eq "OutlookSelfTestMonitor"} | fl

During detection, the monitor engine uses monitors to analyze the sampled data. Whether a monitor issues an alert depends on the state of the target resource. As shown in this partial output, the monitor details provide a lot of information, including the exact definition of each transition state for the monitor:

Id                                 : 339
AssemblyPath                       : C:\Program Files\Microsoft\Exchange
Server\V15\Bin\Microsoft.Exchange.Monitoring.ActiveMonitoring.Local.
Components.dll
TypeName                           : Microsoft.Exchange.Monitoring.
ActiveMonitoring .ActiveMonitoring.Monitors
.OverallConsecutiveProbeFailuresMonitor
Name                               : OutlookSelfTestMonitor
WorkItemVersion                    : [null]
ServiceName                        : Outlook.Protocol
DeploymentId                       : 0
ExecutionLocation                  : [null]
CreatedTime                        : 2013-07-01T20:02:32.2215111Z
Enabled                            : 1
RecurrenceIntervalSeconds          : 0
TimeoutSeconds                     : 30
StartTime                          : 2013-07-01T20:02:32.2215111Z
UpdateTime                         : 2013-07-01T19:59:57.2971492Z
MaxRetryAttempts                   : 0
ExtensionAttributes                : [null]
SampleMask                         : OutlookSelfTestProbe
MonitoringIntervalSeconds          : 300
MinimumErrorCount                  : 0
MonitoringThreshold                : 2
SecondaryMonitoringThreshold       : 0
ServicePriority                    : 0
ServiceSeverity                    : 0
IsHaImpacting                      : 1
CreatedById                        : 0
InsufficientSamplesIntervalSeconds : 28800
StateAttribute1Mask                : [null]
FailureCategoryMask                : 0
ComponentName                      : ServiceComponents/
Outlook.Protocol/Critical
StateTransitionsXml                : <StateTransitions>
<Transition ToState="Degraded" TimeoutInSeconds="0" />
<Transition ToState="Degraded1" TimeoutInSeconds="10" />
<Transition ToState="Degraded2" TimeoutInSeconds="240" />
<Transition ToState="Unhealthy" TimeoutInSeconds="300" />
<Transition ToState="Unhealthy1" TimeoutInSeconds="600" />
<Transition ToState="Unrecoverable" TimeoutInSeconds="1200" />
</StateTransitions>
Version                            : 65536
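
Because StateTransitionsXml is plain XML, the transition states can be turned into a small table for easier reading. The following is a minimal sketch that uses the sample value shown above; the $transitions variable and the TimeoutMinutes column are illustrative additions, not part of the monitor definition:

```powershell
# Sample value copied from the StateTransitionsXml property shown above.
$stateTransitionsXml = @'
<StateTransitions>
<Transition ToState="Degraded" TimeoutInSeconds="0" />
<Transition ToState="Degraded1" TimeoutInSeconds="10" />
<Transition ToState="Degraded2" TimeoutInSeconds="240" />
<Transition ToState="Unhealthy" TimeoutInSeconds="300" />
<Transition ToState="Unhealthy1" TimeoutInSeconds="600" />
<Transition ToState="Unrecoverable" TimeoutInSeconds="1200" />
</StateTransitions>
'@

# Convert each Transition element into an object with numeric timeouts.
$transitions = ([xml]$stateTransitionsXml).StateTransitions.Transition |
    ForEach-Object {
        [pscustomobject]@{
            ToState        = $_.ToState
            TimeoutSeconds = [int]$_.TimeoutInSeconds
            # Convenience column (hypothetical): the same timeout in minutes.
            TimeoutMinutes = [math]::Round([int]$_.TimeoutInSeconds / 60, 1)
        }
    }

$transitions | Format-Table -AutoSize
```

Viewed this way, it is easy to see at which timeout the monitor definition lists each state, from Degraded through Unrecoverable.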

Identifying probes

To identify the probes associated with the OutlookSelfTestProbe identifier, you need to examine the probe definitions. Probe definitions are written in the Microsoft-Exchange-ActiveMonitoring/ProbeDefinition event log. If you examine the properties of events, you can learn more about each probe. To collect the events in the ProbeDefinition event log so that you can process them, enter the following command:

$Probes = (Get-WinEvent -ComputerName ServerName -LogName `
Microsoft-Exchange-ActiveMonitoring/ProbeDefinition |
% {[xml]$_.ToXml()}).event.userData.eventXml

ServerName is the name of the Client Access or Mailbox server with which you want to work. Next, examine the associated probes to learn more about them as shown in this example:

$Probes | ? {$_.Name -eq "OutlookSelfTestProbe"} | fl

The output will then list the definition of each associated probe. Although many monitors have many associated probes, the OutlookSelfTestMonitor has only one associated probe. In this partial sample of the output, note the recurrence interval, timeout, and max retry values for this probe:

Id                          : 106
AssemblyPath                : C:\Program Files\Microsoft\Exchange
Server\V15\Bin\Microsoft.Exchange.Monitoring.ActiveMonitoring
.Local.Components.dll
TypeName                    : Microsoft.Exchange.Monitoring.ActiveMonitoring
.RpcClientAccess.LocalRpcProbe+SelfTest
Name                        : OutlookSelfTestProbe
WorkItemVersion             : [null]
ServiceName                 : Outlook.Protocol
DeploymentId                : 0
ExecutionLocation           : [null]
CreatedTime                 : 2013-07-01T20:02:32.2058880Z
Enabled                     : 1
RecurrenceIntervalSeconds   : 10
TimeoutSeconds              : 8
StartTime                   : 2013-07-01T20:02:41.2215111Z
UpdateTime                  : 2013-07-01T19:59:57.2190196Z
MaxRetryAttempts            : 0
ExtensionAttributes         : <ExtensionAttributes AccountLegacyDN="
/o=First Organization/ou=Monitoring Mailboxes/cn=Recipients
/cn=HealthMailbox3d899a319e1e4c019f5362ead47f0185"
PersonalizedServerName="278c17fc-8adc-49d7-affa-90f0ea7679b6@
pocket-consultant.com" StartupNotificationId="MSExchangeRPC"
StartupNotificationMaxStartWaitInSeconds="12
/>
CreatedById                 : 0
Account                     : <r at="Kerberos" ln="POCKET-CONSULTA\SM_
fef8fb0aaba040c19"><s>S-1-5-21-1487214957-3235876329-
1606252878-1151</s><s a="7" t="1">
S-1-5-21-1487214957-3235876329-1606252878-513</s>
<s a="7" t="1">S-1-1-0</s><s a="7" t="1">S-1-5-2</s>
<s a="7" t="1">S-1-5-11</s><s a="7" t="1">S-1-5-15</s>
<s a="3221225479" t="1">S-1-5-5-0-8194354</s><s a="7"
t="1">
S-1-18-2</s></r>
AccountDisplayName          : HealthMailbox3d899a319e1e4c019f5362ead47f0185
Endpoint                    : MailServer21.pocket-consultant.com
SecondaryAccount            : [null]
SecondaryAccountDisplayName : [null]
SecondaryEndpoint           : MailServer21.pocket-consultant.com
ExtensionEndpoints          : [null]
Version                     : 65536
ExecutionType               : 0

During sampling, the probe engine runs probes against target resources. How often a probe runs depends on its recurrence interval. How long a probe waits before reporting failure depends on its timeout value. Also listed in the output are the system account under which the probe runs and the authentication method used for that account.
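
The lookups in the preceding sections follow one chain: a responder's AlertMask names its monitor, and the monitor's SampleMask identifies its probes. The following is a minimal sketch of that chain. The function name Get-ResponderChain is hypothetical, and the three collections are small in-memory stand-ins for the $Responders, $Monitors, and $Probes variables so that the example runs without an Exchange server; it also assumes the masks are plain names, as in the sample output:

```powershell
# Hypothetical stand-ins for the collections gathered from the definition
# logs; only the properties needed for the lookup are included.
$Responders = @([pscustomobject]@{ Name = 'OutlookSelfTestRestart'; AlertMask = 'OutlookSelfTestMonitor' })
$Monitors   = @([pscustomobject]@{ Name = 'OutlookSelfTestMonitor'; SampleMask = 'OutlookSelfTestProbe' })
$Probes     = @([pscustomobject]@{ Name = 'OutlookSelfTestProbe'; RecurrenceIntervalSeconds = 10; TimeoutSeconds = 8 })

function Get-ResponderChain {
    param([Parameter(Mandatory = $true)][string]$ResponderName)

    # The same definition can be logged more than once, so keep one match.
    $responder = $Responders | Where-Object { $_.Name -eq $ResponderName } |
        Select-Object -First 1
    $monitor = $Monitors | Where-Object { $_.Name -eq $responder.AlertMask } |
        Select-Object -First 1

    # A monitor can have several probes whose names begin with its sample mask.
    $matchingProbes = @($Probes | Where-Object { $_.Name -like "$($monitor.SampleMask)*" })

    [pscustomobject]@{
        Responder = $responder.Name
        Monitor   = $monitor.Name
        Probes    = $matchingProbes.Name
    }
}

$chain = Get-ResponderChain -ResponderName 'OutlookSelfTestRestart'
$chain | Format-List
```

With the real collections in place of the stand-ins, the same function would report the monitor and probes behind any responder name you pass in.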

Viewing error messages for probes

After you know which probes are associated with the issue you are tracking, you can get the error messages for the probes. Probe results are written in the Microsoft-Exchange-ActiveMonitoring/ProbeResult event log. Because this log is quite extensive, filter it for the exact information you are seeking. Properties for related events include:

  • ServiceName. Identifies the related health set.

  • ResultName. Identifies the name of the probe. When there are multiple probes for a monitor, the name includes the monitor’s sample mask and the resource it verifies.

  • Error. Lists the error returned by this probe, if it failed.

  • Exception. Lists the call stack of the error, if it failed.

  • ResultType. Lists an integer value that indicates the result type: 1 for timeout, 2 for poisoned, 3 for succeeded, 4 for failed, 5 for quarantined, and 6 for rejected.

  • ExecutionStartTime. Lists when the probe started.

  • ExecutionEndTime. Lists when the probe completed.

  • ExecutionContext. Provides additional information about the probe’s execution context.

  • FailureContext. Provides additional information about the probe’s failure.

Knowing this, you can collect the events in the ProbeResult event log and filter them. In this example, you look for failure results related to OutlookSelfTestProbe:

$Errors = (Get-WinEvent -ComputerName ServerName -LogName `
 Microsoft-Exchange-ActiveMonitoring/ProbeResult -FilterXPath `
"*[UserData[EventXML[ResultName='OutlookSelfTestProbe'][ResultType='4']]]" |
% {[XML]$_.ToXml()}).event.userData.eventXml

ServerName is the name of the Client Access or Mailbox server with which you want to work. After you filter the log, you can display the results you want to see, such as:

$Errors | select -Property *Time,Result*,Error*,*Context

In this example, the output lists the time-, result-, error-, and context-related properties, which will help you identify the exact problem that occurred. Consider the following example:

ExecutionStartTime : 2013-07-01T21:24:26.9816420Z
ExecutionEndTime   : 2013-07-01T21:24:27.7508864Z
ResultId           : 644887342
ResultName         : OutlookSelfTestProbe
ResultType         : 4
Error              : The request was aborted: Could not create SSL/TLS
                     secure channel.
ExecutionContext   : RpcProxy connectivity verification
                     Task produced output:
                     - TaskStarted = 7/01/2013 2:24:26 PM
                     - TaskFinished = 7/01/2013 2:24:27 PM
                     - Exception = System.Net.WebException: The request
                       was aborted: Could not create SSL/TLS secure channel.
                     - ErrorDetails = Status: SecureChannelFailure
                       HttpStatusCode:
                       HttpStatusDescription:
                       ProcessedBody:
                     - Latency = 00:00:00.5617493
                     - RpcProxyUrl = https://mailserver21.pocket-consultant.com:444/rpc/rpcproxy.dll?MailServer21.pocket-consultant.com:6001
                     - ResponseStatusCode = <null>
                     RpcProxy connectivity verification failed.
FailureContext     : Status: SecureChannelFailure
                     HttpStatusCode:
                     HttpStatusDescription:
                     ProcessedBody:

As you can see from the output, the probe error details provide a lot of information regarding the exact problem that occurred. In this example, an RPC Proxy error occurred that prevented creation of a secure SSL/TLS channel. If this were a problem preventing access to the server or causing other issues, you would then know that you need to look at related components to continue your troubleshooting. You would look at the RPC, RPC Proxy, SSL, and TLS configuration in Internet Information Services (IIS) in addition to the related settings in Exchange.
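
If you filter the ProbeResult log often, typing the XPath expression by hand becomes error-prone. The following is a minimal sketch of a helper that builds the same filter string used in this section; the function name New-ProbeResultFilter and its parameters are hypothetical, and it simply assembles text rather than querying anything:

```powershell
function New-ProbeResultFilter {
    param(
        # Probe name to match against the ResultName property.
        [Parameter(Mandatory = $true)][string]$ResultName,
        # Result type to match; defaults to 4 (failed).
        [ValidateRange(1, 6)][int]$ResultType = 4
    )

    # Same shape as the -FilterXPath expression shown earlier in this section.
    "*[UserData[EventXML[ResultName='$ResultName'][ResultType='$ResultType']]]"
}

$filter = New-ProbeResultFilter -ResultName 'OutlookSelfTestProbe'
$filter
```

You could then pass the resulting string to Get-WinEvent by using -FilterXPath $filter in place of the literal expression.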

Tracing probe errors

Now that you know how to trace a reported problem to its source, let’s take a look at additional ways in which you can put this knowledge to use. You view the overall health of a server by using Get-ServerHealth. As discussed earlier in this chapter, if a health set has a status other than healthy or online, you can take a closer look at it by using the -HealthSet parameter. List the properties of the health set as shown in this example:

Get-ServerHealth -Identity MailServer42 -HealthSet FrontEndTransport | fl

The Name property in the output of Get-ServerHealth lists the name of the monitor reporting the health status. Table 9-1 lists the health sets associated with key Exchange features and components.

Table 9-1. Health sets associated with key Exchange features and components

FEATURE/COMPONENT            RELATED HEALTH SETS
-----------------            -------------------
ActiveSync                   ActiveSync, ActiveSync.Protocol, ActiveSync.Proxy
Active Directory             AD
Anti-virus                   Antimalware, AntiSpam
Autodiscover                 Autodiscover, Autodiscover.Protocol, Autodiscover.Proxy
Mailbox databases            Clustering, Database, DataProtection, MailboxMigration, MailboxSpace, MRS, Store
Exchange Admin Center        ECP.Proxy
Exchange Web Services        EWS, EWS.Protocol, EWS.Proxy
Front End Transport Service  FrontendTransport
Transport Service            HubTransport, MailboxTransport, Transport, TransportSync
Offline Address Book         OAB, OAB.Proxy
Outlook, Outlook Web Access  Outlook, Outlook.Proxy, OWA.Protocol, OWA.Protocol.Dep, OWA.Proxy
Unified Messaging            UM.Callrouter, UM.Protocol
User Throttling              UserThrottling

You can quickly identify all the related probes, monitors, and responders for a health set by using Get-MonitoringItemIdentity. The basic syntax is:

Get-MonitoringItemIdentity -Identity HealthSetName -Server ServerName

HealthSetName identifies the health set to examine and ServerName is the name of an Exchange server. In the following example, you list items by type, item name, and target resource:

Get-MonitoringItemIdentity -Identity FrontEndTransport -Server mailserver21 |
ft itemtype, name, targetresource

As shown in the following partial output, each associated probe, monitor, and responder is listed by name:

ItemType  Name                                             TargetResource
--------  ----                                             --------------
Probe     FrontendTransportServiceRunning                  msexchangefrontendtransport
Probe     FrontendTransportRepeatedlyCrashing              msexchangefrontendtransport
Monitor   FrontendTransportServiceRunningMonitor
Monitor   FrontendTransportRepeatedlyCrashingMonitor
Responder FrontendTransportServiceRunningEscalateResponder Transport
Responder FrontendTransportRepeatedlyCrashingResponder     Transport

If the name of the monitor reporting a status other than online or healthy is FrontendTransportRepeatedlyCrashingMonitor, you can analyze the problem by looking at errors for the FrontendTransportRepeatedlyCrashing probe. Collect events for this probe from the ProbeResult event log and filter them as discussed earlier in “Viewing error messages for probes.” Here is an example:

$Errors = (Get-WinEvent -ComputerName ServerName -LogName `
 Microsoft-Exchange-ActiveMonitoring/ProbeResult -FilterXPath `
"*[UserData[EventXML[ResultName='FrontendTransportRepeatedlyCrashing'][ResultType='4']]]" |
% {[XML]$_.ToXml()}).event.userData.eventXml

ServerName is the name of the Client Access or Mailbox server with which you want to work. Remember, the result type can be 1 for timeout, 2 for poisoned, 3 for succeeded, 4 for failed, 5 for quarantined, or 6 for rejected.
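
When you review many results at once, the numeric result types are easier to read as names. The following is a minimal sketch that adds a descriptive column by using a lookup table built from the values listed above; the $sample rows are hypothetical stand-ins for the contents of $Errors, and ResultTypeName is an illustrative column name:

```powershell
# Lookup table built from the result type values listed above.
$resultTypeNames = @{
    1 = 'Timeout'; 2 = 'Poisoned'; 3 = 'Succeeded'
    4 = 'Failed';  5 = 'Quarantined'; 6 = 'Rejected'
}

# Hypothetical sample rows standing in for $Errors. Values read from
# event XML arrive as strings, so they are cast to [int] for the lookup.
$sample = @(
    [pscustomobject]@{ ResultName = 'FrontendTransportRepeatedlyCrashing'; ResultType = '4' }
    [pscustomobject]@{ ResultName = 'FrontendTransportRepeatedlyCrashing'; ResultType = '1' }
)

# Add a calculated column that translates the number into its name.
$labeled = $sample | Select-Object ResultName, ResultType,
    @{ Name = 'ResultTypeName'; Expression = { $resultTypeNames[[int]$_.ResultType] } }

$labeled | Format-Table -AutoSize
```

The same calculated property could be appended to the select statement you use against $Errors.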

After you filter the log, you can display the results you want to see, such as:

$Errors | select -Property *Time,Result*,Error*,*Context

Before you begin deeper troubleshooting, you might want to rerun the associated probe for the monitor to ensure it’s still not in a healthy or online state. You can rerun probes by using Invoke-MonitoringProbe. The basic syntax is:

Invoke-MonitoringProbe HealthSetName\ProbeName -Server ServerName | fl

HealthSetName is the name of the health set with which to work, ProbeName is the name of the probe within the specified health set, and ServerName is the name of the Exchange server to check, such as:

Invoke-MonitoringProbe FrontEndTransport\FrontendTransportRepeatedlyCrashing -Server MailServer21 | fl

As shown in this partial sample of the output, the command returns a lot of information about the test:

Server             : MailServer21
MonitorIdentity    : FrontEndTransport\FrontendTransportRepeatedlyCrashing
RequestId          : 84dc68cd-c2f8-487f-a5e2-20b43f6f9207
ExecutionStartTime : 7/2/2013 10:20:42 PM
ExecutionEndTime   : 7/2/2013 10:20:42 PM
Error              :
Exception          :
PoisonedCount      : 0
ExecutionId        : 18902819
SampleValue        : 2015
ExecutionContext   :
FailureContext     :
ExtensionXml       :
ResultType         : Succeeded
RetryCount         : 0
ResultName         : 84dc68cdc2f8487fa5e220b43f6f9207-
FrontendTransportRepeatedlyCrashing
IsNotified         : False
ResultId           : 1289896134
ServiceName        : InvokeNow
StateAttribute1    : No relevant crash events found for service

The ResultType value in the output will tell you whether the probe succeeded or failed. If the probe succeeded, the problem no longer exists. If the probe fails, the problem still exists and you’ll need to continue trying to diagnose and resolve it. Step-by-step procedures for troubleshooting issues with Exchange services were provided in Chapter 6; specifically, see the “Troubleshooting Outlook Web App” and “Working with virtual directories and web applications” sections.

Using Log Parser Studio

Log Parser Studio is a graphical interface for Log Parser. Both tools are excellent for processing log files and have been extended specifically to analyze the Exchange protocol logs.

Getting started with Log Parser Studio

Before you can parse and analyze Exchange logs, you’ll need to install Log Parser and then add Log Parser Studio. At the time of this writing, the current version of Log Parser was available at http://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=24659 and the current version of Log Parser Studio was available at http://gallery.technet.microsoft.com/Log-Parser-Studio-cd458765. After you install Log Parser, you can run Log Parser Studio.

Log Parser Studio runs from an executable named LPS.exe. Unless you copy logs to folders that you can access with standard user privileges, you’ll usually want to run Log Parser Studio with elevated, administrator privileges. To do this, press and hold or right-click the executable and then select Run As Administrator.

When you run Log Parser Studio, you’ll see dozens of preloaded queries that can be used to examine various Exchange protocols and other protocols. As shown in Figure 9-3, queries begin with a prefix that identifies the protocol they examine, including:

A screen shot of Log Parser Studio, showing queries available on the Library tab.
Figure 9-3. Viewing the query library in Log Parser Studio.
  • ActiveSync and ActiveSync Proxy for analyzing Exchange ActiveSync and the Exchange ActiveSync Proxy.

  • CAS and CAS-Proxy for analyzing requests related to Client Access server protocols and proxies.

  • ECP for analyzing requests related to Exchange Admin Center.

  • EWS for analyzing requests related to Exchange Web Services.

  • ExRCA for tracking requests made by the Exchange Remote Connectivity Analyzer.

  • OWA for analyzing requests related to Outlook Web App.

  • Windows PowerShell for analyzing requests related to the remote Windows PowerShell gateway.

Performing queries in Log Parser Studio

Log Parser Studio is designed to run queries against several different types of logs, including Event Viewer logs, Exchange protocol logs, and IIS protocol logs. Queries in Log Parser Studio are listed by name, description, query, and log type.

Before you can run a query in Log Parser Studio, you must specify the folders and types of logs with which to work. Keep the following in mind:

  • Logging for protocols and services that run on top of IIS is handled by IIS, and these logs have the log type IISW3CLOG. By default, IIS logs are stored in the %SystemDrive%\inetpub\logs\LogFiles folder.

  • Logging for Exchange services and components is performed by Exchange, and these logs have the type EELLOG or EELXLOG. By default, Exchange logs are stored within the Logging folder under %ExchangeInstallPath%.

  • Logging is also performed by the operating system, and these logs have the type EVTLOG. By default, Windows logs are stored in the %SystemRoot%\System32\winevt\Logs folder.

In Log Parser Studio, you can specify the logs with which to work and their type by completing these steps:

  1. Select the Choose Log… button on the toolbar.

  2. In the Log File Manager dialog box, select Add Folder. Adding folders ensures any available log in the folder can be used.

  3. In the Add Folder dialog box, navigate to the folder with which you want to work, such as %SystemDrive%\inetpub\logs\LogFiles.

  4. Next, select a log with the log type you want to use, and then select Open. When you select a log, Log Parser Studio tries to automatically detect the log type. If Log Parser Studio can’t detect the log type, you’ll need to select the log type when prompted.

  5. Select OK.

You can run queries against all logs of the selected type in the selected folder. To run a query, double-tap or double-click the query on the Library tab to open the query in a new tab. On the new query tab, select Execute Active Query to run the query. How long it takes to run a query depends on the size and number of logs in the specified folder or folders. When Log Parser Studio finishes analyzing the logs, you’ll see the results and can use this information for troubleshooting. Figure 9-4 shows the results of a sample query.

A screen shot of the results of a query in Log Parser Studio, showing who recently accessed Outlook Web Access.
Figure 9-4. Viewing the results of a query in Log Parser Studio.