Microsoft Exchange Server 2013 is critically important to your organization, and to be a successful Exchange administrator, you need to know how to diagnose and resolve problems as quickly as possible. Throughout this book, I’ve discussed techniques you can use to configure, maintain, and troubleshoot Exchange Server 2013. In this chapter, I discuss additional techniques you can use to perform comprehensive troubleshooting.
Client Access and Mailbox servers running Exchange 2013 can experience many types of issues that require troubleshooting to resolve. These issues can range from performance problems, to denied logins, to service outages. To help you resolve problems as they occur, you need a solid understanding of Exchange architecture, which I’ve covered throughout this book as part of the core discussion. Now let’s look at architecture components specific to maintaining, diagnosing, and resolving Exchange services.
In Exchange Server 2013, the Managed Availability architecture is used to automatically detect and correct many types of system problems with a goal of helping to ensure the overall availability of Exchange services. Managed Availability is implemented as part of both the Client Access server role and the Mailbox server role. All servers running Exchange 2013 have this architecture.
As part of Managed Availability, hundreds of probes, monitors, and responders are running constantly on Exchange 2013 to analyze, monitor, and maintain services. If a problem is identified, it often can be fixed automatically. Figure 9-1 provides an overview of how Managed Availability works. Managed Availability has three asynchronous components:
Probe engine. Takes measurements on the server and collects data samples. The collected data flows to the monitor engine.
Monitor engine. Uses the measurements and collected data to determine the status of Exchange services and components. The processed data flows to the responder engine.
Responder engine. Takes recovery actions based on unhealthy states reported by the monitor engine. If automated recovery is unsuccessful, escalates by issuing event log notifications.
By delving deeper into the Managed Availability architecture, you can get a better understanding of how the automated monitoring and response processes work. As Figure 9-2 shows, the workflow has three phases:
Sampling. The probe engine checks the state of Exchange services and components according to specific probes. Each probe has a top-level identifier and one or more related probe definitions. Each probe definition identifies the name of the associated probe, the health set to which the probe belongs, the target resource being tracked, a recurrence interval, and a timeout value.
Detection. The monitor engine analyzes the sampled data and issues alerts related to changes in the state of Exchange services and components according to specific monitors. Each monitor has a top-level identifier and one or more related monitor definitions. Each monitor definition identifies the name of the associated monitor, the health set to which the monitor belongs, and a sample mask that specifies the top level identifier for related probes.
Recovery. The responder engine responds to unhealthy states identified in alerts. Each responder has an associated responder definition that identifies the recovery action to be taken, the name of the responder, the target resource that will be acted on, and an alert mask that specifies the top-level identifier for related monitors.
Rather than list each associated monitor or probe, Managed Availability components use name masking. Here, a top-level identifier is provided and then used as a mask to identify the related monitors and probes.
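The masking idea can be sketched in plain PowerShell. The probe names below are hypothetical, and treating the mask as a simple prefix match is an assumption made for illustration; the actual matching rules used by the engines may differ.

```powershell
# Hypothetical probe names; real ones are logged in the probe definitions.
$probeNames = @(
    'OutlookSelfTestProbe',
    'OutlookSelfTestProbe/MSExchangeRPC',
    'EcpSelfTestProbe'
)

# Top-level identifier, as it might appear in a monitor's sample mask.
$sampleMask = 'OutlookSelfTestProbe'

# Assumption: a probe is related when its name starts with the mask.
$related = @($probeNames | Where-Object { $_ -like "$sampleMask*" })
$related
```

The point of the sketch is that one short identifier can select several related items without each one being listed explicitly.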
Collections of monitors are grouped together in health sets. Exchange 2013 has health sets for everything from Microsoft ActiveSync to User Throttling. Each health set has a number of associated monitors. As part of automated recovery, responders use the alerts issued by monitors to take recovery actions. There are three levels of recovery:
Tier 1. Provides the initial recovery response. As an initial response to an unhealthy state, responders typically will try to restart the service that uses the affected components.
Tier 2. Provides more advanced and customized recovery response. If restarting the service doesn’t resolve the issue, the monitor state is escalated to the next level. The action or actions taken at this level to recover depend on the component but could include failover, bug checking, re-initialization of components to bring them back online, and more.
Tier 3. Uses the escalate responder to issue event log notifications regarding the problem. If you’ve installed the Exchange Server 2013 Management Pack, escalated issues are sent to Microsoft System Center Operations Manager via the event logs as well.
Although designed to resolve many typical problems, Managed Availability cannot resolve every problem, and this escalation is built into the architecture. As part of diagnosing and resolving problems, you can check the status of monitors and health sets by using:
Get-HealthReport. Details the state and health of Exchange resources, monitors, and services
Get-HealthReport -Identity ServerID [-GroupSize SizeOfRollup] [-HaImpactingOnly <$true | $false>] [-HealthSet HealthSet] [-MinimumOnlinePercent MinToDegraded] [-RollupGroup <$true | $false>]
Get-ServerHealth. Returns the state of monitored resources in addition to alert values
Get-ServerHealth -Identity ServerID [-HaImpactingOnly <$true | $false>] [-HealthSet HealthSet]
To check the state of resources, enter the following command:
Get-ServerHealth -Identity ServerID
ServerID is the host name or fully qualified name of the Exchange server to check, such as:
Get-ServerHealth -Identity MailServer42
In the following sample, I’ve omitted the server name and server component columns from the default output:
State     Name                 TargetResource       HealthSetName   AlertValue
-----     ----                 --------------       -------------   ----------
Online    AutodiscoverProxy... MSExchangeAutoDis... Autodiscover... Healthy
Online    ActiveSyncProxyTe... MSExchangeSyncApp... ActiveSync.P... Healthy
Repairing ECPProxyTestMonitor  MSExchangeECPAppPool ECP.Proxy       Unhealthy
Often when you work with Exchange Management Shell, you’ll find that the output is too long for the default screen buffer size or that the output has too many columns for the default window size. Because of this, I prefer to use a screen buffer height of 2,999 and width of 120, along with a window width of 120 and height of 74. This makes Exchange Management Shell easier to work with. If you are using Windows 8 or Windows Server 2012, you’ll find that you can’t customize all of these settings from the Start screen. Instead, press and hold or right-click the tile for the shell on the Start screen, and then select Open File Location. This opens File Explorer to the folder in which the shortcut for Exchange Management Shell is located. Press and hold or right-click this shortcut, and then select Properties. In the Properties dialog box, you’ll then be able to use the options on the Layout tab to customize the shell.
From the State value, you can determine the online status of a monitored resource that is used for transport, connections, or communications. State values you might see include:
Online. All the components of the monitored resource are online.
Partially Online. Some of the components of the monitored resource are not online.
Offline. All the components of the monitored resource are offline.
Sidelined. The monitored resource is sidelined and might not be in a fully online state.
Functional. The monitored resource is functional but might not be in a fully online state.
NotApplicable. An online or offline status is not applicable to this monitored resource.
Unavailable. The monitored resource is unavailable.
From the alert value, you can determine the general health status of a monitored resource. Alert values you might see include:
Healthy. All the components of the monitored resource are healthy.
Degraded. Some of the components of the monitored resource are not healthy.
Disabled. The components of the monitored resource have been disabled.
Unhealthy. None of the components of the monitored resource are healthy.
Sidelined. The monitored resource is sidelined and might not be in a fully healthy state.
Repairing. The monitored resource is functional but is recovering from a degraded or unhealthy state.
Unavailable. The monitored resource is unavailable.
Uninitialized. The monitored resource hasn’t been initialized.
If a health set has a status other than healthy or online, you can take a closer look at it by using the -HealthSet parameter. List the properties of the health set as shown in this example:
Get-ServerHealth -Identity MailServer42 -HealthSet ECP.Proxy | fl
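As a rough sketch of how you might decide which health sets to inspect, the following filters a set of rows down to those whose alert value is not Healthy. The rows are hand-built sample objects that reuse the column names from the sample output (State, Name, TargetResource, HealthSetName, AlertValue); the state and alert values assigned to the ActiveSync monitors are invented for the illustration.

```powershell
# Sample rows shaped like the Get-ServerHealth columns; values are illustrative.
$rows = @(
    [pscustomobject]@{ State = 'Online';    Name = 'ActiveSyncCTPMonitor';      TargetResource = 'ActiveSync';           HealthSetName = 'ActiveSync';          AlertValue = 'Healthy' }
    [pscustomobject]@{ State = 'Online';    Name = 'ActiveSyncDeepTestMonitor'; TargetResource = 'ActiveSync';           HealthSetName = 'ActiveSync.Protocol'; AlertValue = 'Degraded' }
    [pscustomobject]@{ State = 'Repairing'; Name = 'ECPProxyTestMonitor';       TargetResource = 'MSExchangeECPAppPool'; HealthSetName = 'ECP.Proxy';           AlertValue = 'Unhealthy' }
)

# Keep only monitors that are not healthy.
$needAttention = @($rows | Where-Object { $_.AlertValue -ne 'Healthy' })

# Reduce to the distinct health sets worth a closer look.
$healthSets = @($needAttention | Select-Object -ExpandProperty HealthSetName -Unique)
$healthSets
```

On a live server, the same Where-Object filter could be applied to the output of Get-ServerHealth instead of the sample rows, and each resulting name passed to the -HealthSet parameter.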
You can get a formatted list of every monitor, target resource, and its related health set by entering the following command:
Get-ServerHealth localhost | ft name,targetresource,healthsetname
The output lists the name of the monitor, the target resource, and the name of the corresponding health set. You can store the output for later reference by redirecting the output to a file. In the following example, c:\data is the name of an existing folder, and Healthset-Reference.txt is the name of the file to create:
(get-serverhealth localhost|ft name,targetresource,healthsetname) > c:\data\healthset-reference.txt
The output will look similar to the following:
Name                        TargetResource HealthSetName
----                        -------------- -------------
ActiveSyncV2CTPMonitor      ActiveSync     ActiveSync
ActiveSyncCTPMonitor        ActiveSync     ActiveSync
ActiveSyncV2DeepTestMonitor ActiveSync     ActiveSync.Protocol
ActiveSyncDeepTestMonitor   ActiveSync     ActiveSync.Protocol
Whenever you are trying to diagnose and resolve problems with Exchange 2013, you need to keep in mind how user and workload throttling might be affecting performance. All users with mailboxes on servers running Exchange 2013 are subject to user throttling policy.
The default user throttling policy is named the Global Throttling Policy. As the name implies, this policy has global scope and applies throughout the organization. User throttling policies also can have organization and regular scope. If you want to configure user throttling, you should create policies with these scopes rather than modify the Global Throttling Policy.
You can list currently defined user throttling policies by entering Get-ThrottlingPolicy at the shell prompt. To create and manage user throttling policies, you can use New-ThrottlingPolicy, Set-ThrottlingPolicy, and Remove-ThrottlingPolicy. You can view throttling policies assigned to users by using Get-ThrottlingPolicyAssociation, and assign user throttling policies to users by using Set-ThrottlingPolicyAssociation.
In addition to user throttling, Exchange Server manages workloads for protocols, features, and services using workload throttling policy. Workloads are automatically throttled to prevent overuse of system resources and to try to ensure managed resources maintain a healthy state.
Each defined workload has an associated policy and classification. Workload policies are used to enable and configure workloads. Workload classifications set the default priority of the workload. Classifications that can be assigned to workloads include:
Urgent
Customer Expectation
Internal Maintenance
Discretionary
You can view the current workload policies and their associated workload classifications by entering Get-WorkloadPolicy at the Shell prompt. To create and manage workload policies, you can use New-WorkloadPolicy, Set-WorkloadPolicy, and Remove-WorkloadPolicy.
Managed resources have health indicators and resource thresholds. Health indicators are used to measure the relative health of the workload in terms of the resources used. Health indicators tracked include:
Percent CPU utilization
Mailbox database RPC latency
Mailbox database replication health
Content indexing age of last notification
Content indexing retry queue size
Resource thresholds are used to configure usage limits for a system resource. Within each workload classification, one of three thresholds can be assigned: underloaded, overloaded, or critical. As an example:
Discretionary workloads are considered underloaded at 70 percent utilization, overloaded at 80 percent utilization, and critical at 100 percent utilization.
Internal Maintenance workloads are considered underloaded at 75 percent utilization, overloaded at 85 percent utilization, and critical at 100 percent utilization.
Customer Expectation workloads are considered underloaded at 80 percent utilization, overloaded at 90 percent utilization, and critical at 100 percent utilization.
You can view the current resource threshold settings for each workload classification by entering the following command:
Get-ResourcePolicy | fl
To create and manage resource policies, you can use New-ResourcePolicy, Set-ResourcePolicy, and Remove-ResourcePolicy. After you’ve defined custom workload and resource policies, you can create a policy object based on a particular policy by using New-WorkloadManagementPolicy. You then assign the workload management policy to a server by using Set-ExchangeServer with the –WorkloadManagementPolicy and –Server parameters.
As part of your standard operating procedures, you should track changes in the configuration of your Exchange servers. The Exchange Management Shell provides the following cmdlets for obtaining detailed information on the current configuration of your Exchange servers:
Get-ClientAccessServer. Displays configuration details for servers with the Client Access server role
Get-ExchangeServer. Displays the general configuration details for Exchange servers
Get-MailboxServer. Displays configuration details for servers with the Mailbox server role
Get-OrganizationConfig. Displays summary information about your Exchange organization
Get-TransportService. Displays configuration details for servers with the Mailbox or Edge Transport server role
To get related details for a specific server, you pass the Get-TransportService cmdlet the identity of the server you want to work with, as shown in the following example:
Get-TransportService mailserver36 | fl
To get related details for all servers, omit the –Identity parameter, as shown in the following example:
Get-TransportService | fl
When you finalize the configuration of your Exchange servers, you should use these cmdlets to store the configuration details for each server role. To store the configuration details in a file, redirect the output to a file, as shown in the following example:
Get-TransportService mailserver36 | fl > c:\SavedConfigs\transport2014-0211.txt
If you store the revised configuration each time you make significant changes, you can use this information during troubleshooting to help resolve problems that might be related to configuration changes. To compare two configuration files, you can use the file compare command, fc, at an elevated, administrator command prompt. When you use the following syntax with the fc command, the output is the difference between two files:
fc FilePath1 FilePath2
FilePath1 is the full file path to the first file and FilePath2 is the full file path to the second file. Here is an example:
fc c:\SavedConfigs\transport2014-0211.txt c:\SavedConfigs\transport2014-0221.txt
Because the files contain configuration details for specific dates, the changes shown in the output represent the configuration changes that you’ve made to the server.
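If you would rather stay in the shell, a similar comparison can be sketched with Compare-Object and Get-Content. The example below is self-contained: it writes two small files to a temporary folder so that there is something to compare. The folder name, file names, and setting values are made up for the illustration.

```powershell
# Create two small sample files in a temporary folder (illustrative content).
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'SavedConfigsDemo'
New-Item -ItemType Directory -Path $dir -Force | Out-Null
$before = Join-Path $dir 'transport-before.txt'
$after  = Join-Path $dir 'transport-after.txt'
Set-Content -Path $before -Value @('MaxOutboundConnections : 1000', 'MessageTrackingLogEnabled : True')
Set-Content -Path $after  -Value @('MaxOutboundConnections : 2000', 'MessageTrackingLogEnabled : True')

# Compare the files line by line; only lines that differ are returned.
$diff = @(Compare-Object -ReferenceObject (Get-Content $before) -DifferenceObject (Get-Content $after))
$diff | Format-Table -AutoSize InputObject, SideIndicator

# Remove the temporary files again.
Remove-Item -Path $dir -Recurse -Force
```

In the result, a SideIndicator of <= marks a line found only in the first file and => marks a line found only in the second, so a changed setting shows up as a pair of entries.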
As part of troubleshooting, you’ll often want to determine the status of required services, which can be done by using Test-ServiceHealth. The basic syntax is:
Test-ServiceHealth [-Server ServerName]
ServerName is the name of the server to test. If you omit a server name, the local server is tested. As shown in the following sample output, Test-ServiceHealth shows you which required services are running and which aren’t:
Role                    : Mailbox Server Role
RequiredServicesRunning : True
ServicesRunning         : {IISAdmin, MSExchangeADTopology, MSExchangeDelivery, MSExchangeIS, MSExchangeMailboxAssistants, MSExchangeRepl, MSExchangeRPC, MSExchangeServiceHost, MSExchangeSubmission, MSExchangeThrottling, MSExchangeTransportLogSearch, W3Svc, WinRM}
ServicesNotRunning      : {}

Role                    : Client Access Server Role
RequiredServicesRunning : True
ServicesRunning         : {IISAdmin, MSExchangeADTopology, MSExchangeMailboxReplication, MSExchangeRPC, MSExchangeServiceHost, W3Svc, WinRM}
ServicesNotRunning      : {}

Role                    : Unified Messaging Server Role
RequiredServicesRunning : True
ServicesRunning         : {IISAdmin, MSExchangeADTopology, MSExchangeServiceHost, MSExchangeUM, W3Svc, WinRM}
ServicesNotRunning      : {}

Role                    : Hub Transport Server Role
RequiredServicesRunning : True
ServicesRunning         : {IISAdmin, MSExchangeADTopology, MSExchangeEdgeSync, MSExchangeServiceHost, MSExchangeTransport, MSExchangeTransportLogSearch, W3Svc, WinRM}
ServicesNotRunning      : {}
The server in this example has the Client Access server role and the Mailbox server role installed. Although Exchange 2013 no longer has separate UM and Hub Transport roles, Test-ServiceHealth continues to list separately the related required services and their status.
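When the output spans several roles, it can help to reduce it to only the roles that have a problem. The following is a minimal sketch that works on hand-built sample objects using the property names from the sample output (Role, RequiredServicesRunning, ServicesNotRunning); the failing entry is invented for the illustration.

```powershell
# Sample results shaped like the Test-ServiceHealth output; values are illustrative.
$results = @(
    [pscustomobject]@{ Role = 'Mailbox Server Role';       RequiredServicesRunning = $true;  ServicesNotRunning = @() }
    [pscustomobject]@{ Role = 'Client Access Server Role'; RequiredServicesRunning = $false; ServicesNotRunning = @('MSExchangeRPC') }
)

# Keep only roles where at least one required service is not running.
$problems = @($results | Where-Object { -not $_.RequiredServicesRunning })

# Build one summary line per affected role.
$report = @(foreach ($p in $problems) {
    '{0}: {1}' -f $p.Role, ($p.ServicesNotRunning -join ', ')
})
$report
```

The same filter could be applied to live Test-ServiceHealth output in place of the sample objects.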
As part of troubleshooting, you’ll often need to test mail flow and replication. If you suspect a problem with mail flow, you can quickly send a test message by using Test-Mailflow. This cmdlet verifies whether mail can be successfully sent from and delivered to the system mailbox as well as whether email is sent between Mailbox servers within a defined latency threshold.
To test mail flow from one mailbox server to another or from one mailbox server to a target mailbox database, you can use the following syntax:
Test-MailFlow -Identity OriginatingMailServer [-TargetMailboxServer DestinationMailServer | -TargetDatabase DestinationDatabase]
In the following example, a test message is sent from MailboxServer18 to MailboxServer96:
Test-MailFlow -Identity MailboxServer18 -TargetMailboxServer MailboxServer96
As shown in this sample, the output of the command tells you whether the message was sent and received successfully:
TestMailflowResult : Success
MessageLatencyTime : 00:00:04.0077377
IsRemoteTest       : False
Identity           :
IsValid            : True
ObjectState        : New
If you suspect a problem with replication, you can quickly determine the status of replication components by using Test-ReplicationHealth. This cmdlet checks the status of all aspects of replication, replay, and availability on a Mailbox server in a Database Availability group. Use Test-ReplicationHealth to help you monitor the status of continuous replication, availability of Active Manager, and the general status of availability components.
The basic syntax is:
Test-ReplicationHealth [-Identity MailboxServerId]
Such as:
Test-ReplicationHealth MailServer42
As shown in this sample, the output of the command tells you the status of each replication component on the Mailbox server:
Server       Check                Result   Error
------       -----                ------   -----
MAILSERVER42 ReplayService        Passed
MAILSERVER42 ActiveManager        Passed
MAILSERVER42 TasksRpcListener     Passed
MAILSERVER42 DatabaseRedundancy   *FAILED* Failures:...
MAILSERVER42 DatabaseAvailability *FAILED* Failures:...
If errors are found, you’ll want to get more details by formatting the output in a list, such as:
Test-ReplicationHealth MailServer42 | fl server, check*, result, error
The error details should help you identify the problem. In this example, the Mailbox database doesn’t have enough copies to be fully redundant:
Server           : MAILSERVER42
Check            : DatabaseRedundancy
CheckDescription : Verifies that databases have sufficient redundancy. If this check fails, it means that some databases are at risk of losing data.
Result           : *FAILED*
Error            : Failures: There were database redundancy check failures for database 'Engineering Mailbox Database' that may be lowering its redundancy and putting the database at risk of data loss. Redundancy Count: 1. Expected Redundancy Count: 2.
In this example, the Engineering Mailbox Database does not have enough copies for full redundancy. This could be because an administrator forgot to make a passive copy of the database or because a Mailbox server hosting a copy of the database is offline or otherwise unavailable.
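If you have several failures to review, the key figures can be pulled out of the error text with a regular expression. This is only a sketch: it assumes the error message keeps the wording shown in the sample above, which might not hold for every failure type or product update.

```powershell
# Error text copied from the sample output.
$errorText = "Failures: There were database redundancy check failures for database 'Engineering Mailbox Database' that may be lowering its redundancy and putting the database at risk of data loss. Redundancy Count: 1. Expected Redundancy Count: 2."

# Capture the database name and the actual and expected copy counts.
$pattern = "database '(?<Db>[^']+)'.*?Redundancy Count: (?<Actual>\d+)\. Expected Redundancy Count: (?<Expected>\d+)"

if ($errorText -match $pattern) {
    $summary = [pscustomobject]@{
        Database = $Matches['Db']
        Actual   = [int]$Matches['Actual']
        Expected = [int]$Matches['Expected']
        Missing  = [int]$Matches['Expected'] - [int]$Matches['Actual']
    }
    $summary
}
```

Here the Missing value is simply the expected count minus the actual count, which gives the number of additional copies needed to reach the expected redundancy.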
Other useful cmdlets for checking the Exchange organization include:
Test-ActiveSyncConnectivity. Performs a full synchronization against a specified mailbox to test the configuration of Exchange ActiveSync
Test-ArchiveConnectivity. Verifies archive functionality for a mailbox user
Test-AssistantHealth. Verifies that the Exchange Mailbox Assistant service is running as expected
Test-CalendarConnectivity. Verifies that calendar sharing as part of Outlook Web App is working properly
Test-EcpConnectivity. Verifies that the Exchange Admin Center is running as expected
Test-EdgeSynchronization. Verifies that the subscribed Edge Transport servers have a current and accurate synchronization status
Test-ExchangeSearch. Verifies that Exchange Search is currently enabled and is indexing new email messages in a timely manner
Test-FederationTrust. Verifies that the federation trust is properly configured and functioning as expected
Test-FederationTrustCertificate. Verifies the status of certificates used for federation on all Mailbox and Client Access servers
Test-ImapConnectivity. Verifies that the IMAP4 service is running as expected
Test-IPAllowListProvider. Verifies the configuration for a specific IP allow list provider
Test-IPBlockListProvider. Verifies the configuration for a specific IP block list provider
Test-IRMConfiguration. Verifies Information Rights Management (IRM) configuration and functionality
Test-MapiConnectivity. Verifies server functionality by logging on to the mailbox that you specify
Test-MRSHealth. Verifies the health of the Microsoft Exchange Mailbox Replication Service
Test-OAuthConnectivity. Verifies that OAuth authentication is working properly
Test-OutlookConnectivity. Verifies end-to-end Microsoft Outlook client connectivity and also tests for Outlook Anywhere (RPC/HTTP) and TCP-based connections
Test-OutlookWebServices. Verifies the Autodiscover service settings for Outlook
Test-OwaConnectivity. Verifies that Outlook Web App is running as expected
Test-PopConnectivity. Verifies that the POP3 service is running as expected
Test-PowerShellConnectivity. Verifies whether Windows PowerShell remoting on the target Client Access server is functioning correctly
Test-SenderId. Verifies whether a specified IP address is the legitimate sending address for a specified SMTP address
Test-SmtpConnectivity. Verifies SMTP connectivity for a specified server
Test-UMConnectivity. Verifies the operation of a computer that has the Unified Messaging installed
Test-WebServicesConnectivity. Verifies the functionality of Exchange Web Services
As discussed previously in this chapter in the Troubleshooting essentials section, you can use Get-ServerHealth to list monitors, target resources, and corresponding health sets. Knowing which monitor, target resource, and health set you want to work with is important for troubleshooting. To diagnose and resolve problems, you often need to work backward from the reported problem to the source of the problem, as shown here:
Find recovery actions.
Trace recovery actions to their responder.
Use the responses logged by a responder to find the related monitor.
Find the probes for a monitor.
Locate the error messages being logged by probes.
Verify probe errors still exist.
The sections that follow examine the related procedures.
During recovery, the responder engine uses responders to take appropriate recovery actions, based on the type of alert and the affected target resource. Whenever a responder takes a recovery action, it logs related events in the Microsoft-Exchange-ManagedAvailability/RecoveryActionResults event log. An entry with an event ID of 500 indicates that a recovery action has started. An entry with an event ID of 501 indicates that the recovery action was completed.
Although you can view the events in Event Viewer, you can also view them at the Shell prompt. To collect the events in the RecoveryActionResults event log so you can process them, enter the following commands:
$Results = Get-WinEvent -ComputerName ServerName -LogName Microsoft-Exchange-ManagedAvailability/RecoveryActionResults
$ResultsXML = ($Results | Foreach-object -Process {[xml]$_.toXml()}).event.userData.eventXml
ServerName is the name of the Client Access or Mailbox server that you want to work with. The first command collects the events. The second command formats the event entries so that they are easier to work with. These commands can be combined and shortened to:
$ResultsXML = (Get-WinEvent -ComputerName ServerName -LogName Microsoft-Exchange-ManagedAvailability/RecoveryActionResults | % {[xml]$_.toXml()}).event.userData.eventXml
Next, you need to identify a response that you want to look at more closely. If you want to review corrective actions taken by Managed Availability, you’d look for events that occurred today and completed successfully. The following example parses the previously collected event data and looks for events from 2013-07-01 that have a successful result:
$ResultsXML | Where-Object {$_.Result -eq "Succeeded" -and $_.EndTime -like "2013-07-01*"}| ft -AutoSize StartTime,RequestorName
As shown in this example, you also could look for events that occurred but where the responder failed to correct the issue:
$ResultsXML | Where-Object {$_.Result -eq "Failed" -and $_.EndTime -like "2013-07-01*"}| ft -AutoSize StartTime,RequestorName
With either approach, you’ll then get a list of issues by start time and requestor name, such as:
StartTime                    RequestorName
---------                    -------------
2013-07-01t21:00:10.1008312Z SearchLocalCopyStatusRestartSearchService
2013-07-01t21:00:06.1162578Z RWSProxyTestRecycleAppPool
2013-07-01t21:00:00.4597184Z ClusterEndpointRestart
2013-07-01t20:59:36.1601996Z RWSProxyTestRecycleAppPool
2013-07-01t20:57:17.8657794Z OutlookSelfTestRestart
2013-07-01t20:58:03.7958299Z RWSProxyTestRecycleAppPool
2013-07-01t20:55:24.6591276Z ServiceHealthActiveManagerRestartService
2013-07-01t20:57:11.2223574Z ClusterEndpointRestart
2013-07-01t20:55:06.9326525Z OutlookSelfTestRestart
2013-07-01t20:57:02.6438007Z RWSProxyTestRecycleAppPool
2013-07-01t20:54:34.5391633Z OutlookMailboxDeepTestRestart
2013-07-01t20:56:32.4360908Z RWSProxyTestRecycleAppPool
2013-07-01t20:54:41.4926429Z ClusterEndpointRestart
2013-07-01t20:53:34.1596832Z ActiveDirectoryConnectivityRestart
2013-07-01t20:52:11.0579430Z ClusterEndpointRestart
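When the list is long, counting how often each responder appears can point you to the components that are being recycled repeatedly. The following sketch uses a handful of requestor names copied from the sample listing rather than live event data.

```powershell
# A few RequestorName values copied from the sample listing.
$requestors = @(
    'SearchLocalCopyStatusRestartSearchService',
    'RWSProxyTestRecycleAppPool',
    'ClusterEndpointRestart',
    'RWSProxyTestRecycleAppPool',
    'OutlookSelfTestRestart',
    'RWSProxyTestRecycleAppPool'
)

# Count occurrences per responder and list the most frequent first.
$counts = @($requestors | Group-Object |
    Sort-Object Count -Descending |
    Select-Object Name, Count)
$counts
```

With collected event data, the same grouping could be applied to the RequestorName property of the entries in $ResultsXML.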
In this example, the value in the RequestorName column is the responder that took the action. To examine the properties of a recovery action, run a query for a specific responder, such as:
$ResultsXML | Where-Object {$_.Result -eq "Failed" -and $_.EndTime -like "2013*" -and $_.RequestorName -eq "OutlookSelfTestRestart"}| fl
The output includes the details logged for events in which the recovery action initiated by the OutlookSelfTestRestart responder failed. Each entry will look similar to the following:
auto-ns2            : http://schemas.microsoft.com/win/2004/08/events
xmlns               : myNs
Id                  : RestartService
InstanceId          : 130629.015717.86577.001
ResourceName        : MSExchangeRPC
StartTime           : 2013-07-01T20:57:17.8657794Z
EndTime             : 2013-07-01T20:59:19.4994266Z
State               : Finished
Result              : Failed
RequestorName       : OutlookSelfTestRestart
ExceptionName       : TimeoutException
ExceptionMessage    : System error.
Context             : [null]
CustomArg1          : [null]
CustomArg2          : [null]
CustomArg3          : [null]
LamProcessStartTime : 7/01/2013 1:12:28 PM
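Because the entry records both a start time and an end time, you can work out how long the recovery action ran before it failed. The following sketch parses the two timestamps from the sample entry; the variable names are arbitrary.

```powershell
# Timestamps copied from the sample entry (UTC, round-trip format).
$start = [datetimeoffset]::Parse('2013-07-01T20:57:17.8657794Z', [cultureinfo]::InvariantCulture)
$end   = [datetimeoffset]::Parse('2013-07-01T20:59:19.4994266Z', [cultureinfo]::InvariantCulture)

# Subtracting the two values yields a TimeSpan.
$elapsed = $end - $start
'{0:N1} seconds' -f $elapsed.TotalSeconds
```

For this entry the action ran for roughly two minutes before ending with a TimeoutException.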
Although the responder name and details will often help you identify the type of problem that occurred, you can keep working toward the exact problem that occurred by finding the monitor that triggered the responder.
Whenever the Health Manager service starts, it logs related events in the Microsoft-Exchange-ActiveMonitoring/ResponderDefinition event log that you can use to get properties of responders. To collect the events in the ResponderDefinition event log so that you can process them, enter the following command:
$Responders = (Get-WinEvent -ComputerName ServerName -LogName Microsoft-Exchange-ActiveMonitoring/ResponderDefinition | % {[xml]$_.toXml()}).event.userData.eventXml
ServerName is the name of the Client Access or Mailbox server with which you want to work. If you examine the definition of a responder, the AlertMask property will identify the monitor associated with the responder. Thus, one way to display the required information is to look for the responder and list the responder name and the associated alert mask in the output as shown in this example:
$Responders | ? {$_.Name -eq "OutlookSelfTestRestart"} | ft name, alertmask
The output will then be similar to the following:
Name                   AlertMask
----                   ---------
OutlookSelfTestRestart OutlookSelfTestMonitor
OutlookSelfTestRestart OutlookSelfTestMonitor
You’ll know the related monitor is named OutlookSelfTestMonitor. Before examining the related monitor, you might want to display the full details for the responder to help you understand exactly how the responder works. To display the full details for a responder, simply list its properties in a formatted list as shown in this example:
$Responders | ? {$_.Name -eq "OutlookSelfTestRestart"} | fl
During recovery, the responder engine uses responders to take appropriate recovery actions based on the alert type and the affected target resource. The wait interval specifies the minimum amount of time a responder must wait before running again. As shown in this partial output, the definition details can help you learn more about the responder:
Id                             : 452
AssemblyPath                   : C:\Program Files\Microsoft\Exchange Server\V15\Bin\Microsoft.Exchange.Monitoring.ActiveMonitoring.Local.Components.dll
TypeName                       : Microsoft.Exchange.Monitoring.ActiveMonitoring.Responders.ResetIISAppPoolResponder
Name                           : OutlookSelfTestRestart
WorkItemVersion                : [null]
ServiceName                    : Outlook.Protocol
DeploymentId                   : 0
ExecutionLocation              : [null]
CreatedTime                    : 2013-07-01T20:02:32.2527661Z
Enabled                        : 1
TargetResource                 : MSExchangeRpcProxyAppPool
RecurrenceIntervalSeconds      : 0
TimeoutSeconds                 : 300
StartTime                      : 2013-07-01T20:02:32.2527661Z
UpdateTime                     : 2013-07-01T17:55:07.9754209Z
MaxRetryAttempts               : 3
ExtensionAttributes            : <ExtensionAttributes AppPoolName="MSExchangeRpcProxyAppPool"
                                 MinimumSecondsBetweenRestarts="300"
                                 MaximumAllowedRestartsInAnHour="3"
                                 MaximumAllowedRestartsInADay="-1"
                                 DumpOnRestart="FullDump"
                                 DumpPath="C:\Program Files\Microsoft\Exchange Server\V15\Dumps"
                                 MinimumFreeDiskPercent="15"
                                 MaximumDumpsPerDay="9"
                                 MaximumDumpDurationInSeconds="180" />
AlertMask                      : OutlookSelfTestMonitor
WaitIntervalSeconds            : 30
MinimumSecondsBetweenEscalates : 0
NotificationServiceClass       : 0
AlwaysEscalateOnMonitorChanges : 0
Monitor definitions are written in the Microsoft-Exchange-ActiveMonitoring/MonitorDefinition event log. If you examine the properties of events, you can learn more about monitors and learn their related probes. To collect the events in the MonitorDefinition event log so that you can process them, enter the following command:
$Monitors = (Get-WinEvent -ComputerName ServerName -LogName Microsoft-Exchange-ActiveMonitoring/MonitorDefinition | % {[xml]$_.toXml()}).event.userData.eventXml
ServerName is the name of the Client Access or Mailbox server with which you want to work. If you examine the definition of a monitor, the SampleMask property will identify the probes associated with the monitor. List the monitor name and the associated sample mask in the output as shown in this example:
$Monitors | ? {$_.Name -eq "OutlookSelfTestMonitor"} | ft name, samplemask
The output will then be similar to the following:
Name                   SampleMask
----                   ----------
OutlookSelfTestMonitor OutlookSelfTestProbe
As shown in the output, probes related to this monitor have the top-level identifier: OutlookSelfTestProbe. To display the full details for a monitor, simply list its properties in a formatted list as shown in this example:
$Monitors | ? {$_.Name -eq "OutlookSelfTestMonitor"} | fl
During detection, the monitor engine uses monitors to analyze the sampled data. Whether a monitor issues an alert depends on the state of the target resource. As shown in this partial output, the monitor details provide a lot of information, including the exact definition of each transition state for the monitor:
Id : 339
AssemblyPath : C:\Program Files\Microsoft\Exchange Server\V15\Bin\Microsoft.Exchange.Monitoring.ActiveMonitoring.Local.Components.dll
TypeName : Microsoft.Exchange.Monitoring.ActiveMonitoring.ActiveMonitoring.Monitors.OverallConsecutiveProbeFailuresMonitor
Name : OutlookSelfTestMonitor
WorkItemVersion : [null]
ServiceName : Outlook.Protocol
DeploymentId : 0
ExecutionLocation : [null]
CreatedTime : 2013-07-01T20:02:32.2215111Z
Enabled : 1
RecurrenceIntervalSeconds : 0
TimeoutSeconds : 30
StartTime : 2013-07-01T20:02:32.2215111Z
UpdateTime : 2013-07-01T19:59:57.2971492Z
MaxRetryAttempts : 0
ExtensionAttributes : [null]
SampleMask : OutlookSelfTestProbe
MonitoringIntervalSeconds : 300
MinimumErrorCount : 0
MonitoringThreshold : 2
SecondaryMonitoringThreshold : 0
ServicePriority : 0
ServiceSeverity : 0
IsHaImpacting : 1
CreatedById : 0
InsufficientSamplesIntervalSeconds : 28800
StateAttribute1Mask : [null]
FailureCategoryMask : 0
ComponentName : ServiceComponents/Outlook.Protocol/Critical
StateTransitionsXml : <StateTransitions>
  <Transition ToState="Degraded" TimeoutInSeconds="0" />
  <Transition ToState="Degraded1" TimeoutInSeconds="10" />
  <Transition ToState="Degraded2" TimeoutInSeconds="240" />
  <Transition ToState="Unhealthy" TimeoutInSeconds="300" />
  <Transition ToState="Unhealthy1" TimeoutInSeconds="600" />
  <Transition ToState="Unrecoverable" TimeoutInSeconds="1200" />
  </StateTransitions>
Version : 65536
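If you want to compare the transition timeouts at a glance, you can expand the StateTransitionsXml value into one row per state. The following is a minimal sketch, not a built-in Exchange feature: Get-MonitorTransition is a hypothetical helper name, and the sketch assumes the property holds the XML as text, as it appears in the sample output. The sample XML is copied from that output so the block runs without a server.

```powershell
# Hypothetical helper: turn a monitor's StateTransitionsXml into rows.
function Get-MonitorTransition {
    param([Parameter(Mandatory = $true)][string]$StateTransitionsXml)

    # Parse the text as XML, then emit one object per <Transition> element.
    $doc = [xml]$StateTransitionsXml
    foreach ($t in $doc.StateTransitions.Transition) {
        [pscustomobject][ordered]@{
            ToState          = $t.ToState
            # Attribute values arrive as strings; cast so they sort numerically.
            TimeoutInSeconds = [int]$t.TimeoutInSeconds
        }
    }
}

# Sample copied from the OutlookSelfTestMonitor output shown earlier.
$sample = @'
<StateTransitions>
  <Transition ToState="Degraded" TimeoutInSeconds="0" />
  <Transition ToState="Degraded1" TimeoutInSeconds="10" />
  <Transition ToState="Degraded2" TimeoutInSeconds="240" />
  <Transition ToState="Unhealthy" TimeoutInSeconds="300" />
  <Transition ToState="Unhealthy1" TimeoutInSeconds="600" />
  <Transition ToState="Unrecoverable" TimeoutInSeconds="1200" />
</StateTransitions>
'@

Get-MonitorTransition $sample | Sort-Object TimeoutInSeconds | Format-Table ToState, TimeoutInSeconds
```

Against a live server, you would pass the StateTransitionsXml property of an entry selected from $Monitors instead of the sample text.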
To identify the probes associated with the OutlookSelfTestProbe identifier, you need to examine the probe definitions. Probe definitions are written to the Microsoft-Exchange-ActiveMonitoring/ProbeDefinition event log. If you examine the properties of these events, you can learn more about each probe. To collect the events in the ProbeDefinition event log so that you can process them, enter the following command:
$Probes = (Get-WinEvent -ComputerName ServerName -LogName Microsoft-Exchange-ActiveMonitoring/ProbeDefinition | % {[xml]$_.toXml()}).event.userData.eventXml
ServerName is the name of the Client Access or Mailbox server with which you want to work. Next, examine the associated probes to learn more about them as shown in this example:
$Probes | ? {$_.Name -eq "OutlookSelfTestProbe"} | fl
The output will then list the definition of each associated probe. Although many monitors have many associated probes, the OutlookSelfTestMonitor has only one associated probe. In this partial sample of the output, note the recurrence interval, timeout, and max retry values for this probe:
Id : 106
AssemblyPath : C:\Program Files\Microsoft\Exchange Server\V15\Bin\Microsoft.Exchange.Monitoring.ActiveMonitoring.Local.Components.dll
TypeName : Microsoft.Exchange.Monitoring.ActiveMonitoring.RpcClientAccess.LocalRpcProbe+SelfTest
Name : OutlookSelfTestProbe
WorkItemVersion : [null]
ServiceName : Outlook.Protocol
DeploymentId : 0
ExecutionLocation : [null]
CreatedTime : 2013-07-01T20:02:32.2058880Z
Enabled : 1
RecurrenceIntervalSeconds : 10
TimeoutSeconds : 8
StartTime : 2013-07-01T20:02:41.2215111Z
UpdateTime : 2013-07-01T19:59:57.2190196Z
MaxRetryAttempts : 0
ExtensionAttributes : <ExtensionAttributes AccountLegacyDN="/o=First Organization/ou=Monitoring Mailboxes/cn=Recipients/cn=HealthMailbox3d899a319e1e4c019f5362ead47f0185" PersonalizedServerName="278c17fc-8adc-49d7-affa-90f0ea7679b6@pocket-consultant.com" StartupNotificationId="MSExchangeRPC" StartupNotificationMaxStartWaitInSeconds="12" />
CreatedById : 0
Account : <r at="Kerberos" ln="POCKET-CONSULTA\SM_fef8fb0aaba040c19"><s>S-1-5-21-1487214957-3235876329-1606252878-1151</s><s a="7" t="1">S-1-5-21-1487214957-3235876329-1606252878-513</s><s a="7" t="1">S-1-1-0</s><s a="7" t="1">S-1-5-2</s><s a="7" t="1">S-1-5-11</s><s a="7" t="1">S-1-5-15</s><s a="3221225479" t="1">S-1-5-5-0-8194354</s><s a="7" t="1">S-1-18-2</s></r>
AccountDisplayName : HealthMailbox3d899a319e1e4c019f5362ead47f0185
Endpoint : MailServer21.pocket-consultant.com
SecondaryAccount : [null]
SecondaryAccountDisplayName : [null]
SecondaryEndpoint : MailServer21.pocket-consultant.com
ExtensionEndpoints : [null]
Version : 65536
ExecutionType : 0
During sampling, the probe engine runs probes against target resources. How often a probe runs depends on its recurrence interval. How long a probe waits before reporting failure depends on its timeout value. Also listed in the output are the system account under which the probe runs and the authentication method used for that account.
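When a health set has several probes, it can help to see their scheduling values side by side. The following sketch assumes you have already collected probe definitions as described above; Select-ProbeSchedule is a hypothetical helper name, and the output column names RecurrenceSeconds, TimeoutSeconds, and MaxRetryAttempts are chosen here for illustration. Because values read from event XML are strings, the sketch casts them to integers so that sorting is numeric.

```powershell
# Hypothetical helper: list scheduling values for the probes of one service.
function Select-ProbeSchedule {
    param(
        [Parameter(Mandatory = $true)][object[]]$Probes,
        [Parameter(Mandatory = $true)][string]$ServiceName
    )

    $Probes |
        Where-Object { $_.ServiceName -eq $ServiceName } |
        Select-Object Name,
            @{ Name = 'RecurrenceSeconds'; Expression = { [int]$_.RecurrenceIntervalSeconds } },
            @{ Name = 'TimeoutSeconds';    Expression = { [int]$_.TimeoutSeconds } },
            @{ Name = 'MaxRetryAttempts';  Expression = { [int]$_.MaxRetryAttempts } } |
        Sort-Object RecurrenceSeconds
}

# Usage with the $Probes collection gathered earlier (requires a server):
# Select-ProbeSchedule -Probes $Probes -ServiceName 'Outlook.Protocol' | Format-Table
```

Probes with the shortest recurrence interval appear first, which makes it easier to spot the ones that generate the most entries in the ProbeResult log.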
After you know which probes are associated with the issue you are tracking, you can get the error messages for the probes. Probe results are written to the Microsoft-Exchange-ActiveMonitoring/ProbeResult event log. Because this log is quite extensive, you’ll want to filter it for the exact information you are seeking. Properties for related events include:
ResultName. Identifies the name of the probe. When there are multiple probes for a monitor, the name includes the monitor’s sample mask and the resource it verifies.
Error. Lists the error returned by this probe, if it failed.
Exception. Lists the call stack of the error, if it failed.
ResultType. Lists an integer value that indicates the result type: 1 for timeout, 2 for poisoned, 3 for succeeded, 4 for failed, 5 for quarantined, and 6 for rejected.
ExecutionStartTime. Lists when the probe started.
ExecutionEndTime. Lists when the probe completed.
ExecutionContext. Provides additional information about the probe’s execution context.
FailureContext. Provides additional information about the probe’s failure.
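Because ResultType is stored as an integer in the event log, a small lookup makes filtered output easier to read. The sketch below uses only the six values listed above; ConvertTo-ProbeResultTypeName is a hypothetical helper name, and the "Unknown" fallback is an assumption for values outside that list.

```powershell
# Hypothetical helper: map a ProbeResult ResultType integer to its meaning.
function ConvertTo-ProbeResultTypeName {
    param([Parameter(Mandatory = $true)]$ResultType)

    $names = @{
        1 = 'Timeout'
        2 = 'Poisoned'
        3 = 'Succeeded'
        4 = 'Failed'
        5 = 'Quarantined'
        6 = 'Rejected'
    }

    # Values read from event XML are strings, so convert before the lookup.
    $key = [int]$ResultType
    if ($names.ContainsKey($key)) { $names[$key] } else { "Unknown ($key)" }
}

ConvertTo-ProbeResultTypeName '4'   # Failed
```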
Knowing this, you can collect the events in the ProbeResult event log and filter them. In this example, you look for failure results related to OutlookSelfTestProbe:
$Errors = (Get-WinEvent -ComputerName ServerName -LogName Microsoft-Exchange-ActiveMonitoring/ProbeResult -FilterXPath "*[UserData[EventXML[ResultName='OutlookSelfTestProbe'][ResultType='4']]]" | % {[XML]$_.toXml()}).event.userData.eventXml
ServerName is the name of the Client Access or Mailbox server with which you want to work. After you filter the log, you can display the results you want to see, such as:
$Errors | select -Property *Time,Result*,Error*,*Context
In this example, the output lists the time-, result-, error-, and context-related properties, which will help you identify the exact problem that occurred. Consider the following example:
ExecutionStartTime : 2013-07-01T21:24:26.9816420Z
ExecutionEndTime : 2013-07-01T21:24:27.7508864Z
ResultId : 644887342
ResultName : OutlookSelfTestProbe
ResultType : 4
Error : The request was aborted: Could not create SSL/TLS secure channel.
ExecutionContext : RpcProxy connectivity verification
  Task produced output:
  - TaskStarted = 7/01/2013 2:24:26 PM
  - TaskFinished = 7/01/2013 2:24:27 PM
  - Exception = System.Net.WebException: The request was aborted: Could not create SSL/TLS secure channel.
  - ErrorDetails = Status: SecureChannelFailure HttpStatusCode: HttpStatusDescription: ProcessedBody:
  - Latency = 00:00:00.5617493
  - RpcProxyUrl = https://mailserver21.pocket-consultant.com:444/rpc/rpcproxy.dll?MailServer21.pocket-consultant.com:6001
  - ResponseStatusCode = <null>
  RpcProxy connectivity verification failed.
FailureContext : Status: SecureChannelFailure HttpStatusCode: HttpStatusDescription: ProcessedBody:
As you can see from the output, the probe error details provide a lot of information regarding the exact problem that occurred. In this example, an RPC Proxy error occurred that prevented creation of a secure SSL/TLS channel. If this was a problem preventing access to the server or causing other issues, you would then know that you need to look at related components to continue your troubleshooting. You would look at the RPC, RPC Proxy, SSL and TLS configuration in Internet Information Services (IIS) in addition to the related settings in Exchange.
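One more detail you can draw from a probe result is how long the probe ran, which you can compare with the TimeoutSeconds value in the probe definition. The following sketch computes the elapsed time from the ExecutionStartTime and ExecutionEndTime strings; Get-ProbeDuration is a hypothetical helper name, and the sketch assumes the timestamps use the round-trip format shown in the sample output.

```powershell
# Hypothetical helper: elapsed time between a probe's start and end stamps.
function Get-ProbeDuration {
    param(
        [Parameter(Mandatory = $true)][string]$ExecutionStartTime,
        [Parameter(Mandatory = $true)][string]$ExecutionEndTime
    )

    # RoundtripKind preserves the UTC marker (the trailing Z) while parsing.
    $style   = [System.Globalization.DateTimeStyles]::RoundtripKind
    $culture = [System.Globalization.CultureInfo]::InvariantCulture

    $start = [datetime]::Parse($ExecutionStartTime, $culture, $style)
    $end   = [datetime]::Parse($ExecutionEndTime, $culture, $style)

    # Subtracting two DateTime values yields a TimeSpan.
    $end - $start
}

# Values taken from the sample failure shown earlier.
$duration = Get-ProbeDuration '2013-07-01T21:24:26.9816420Z' '2013-07-01T21:24:27.7508864Z'
$duration.TotalSeconds
```

In the sample, the probe ran for well under one second, far short of its 8-second timeout, which fits a result type of 4 (failed) rather than 1 (timeout).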
Now that you know how to trace a reported problem to its source, let’s take a look at additional ways in which you can put this knowledge to use. You view the overall health of a server by using Get-ServerHealth. As discussed earlier in this chapter, if a health set has a status other than healthy or online, you can take a closer look at it by using the -HealthSet parameter. List the properties of the health set as shown in this example:
Get-ServerHealth -Identity MailServer42 -HealthSet FrontEndTransport | fl
The Name property in the output of Get-ServerHealth lists the name of the monitor reporting the health status. Table 9-1 lists the health sets associated with key Exchange features and components.
FEATURE/COMPONENT | RELATED HEALTH SETS |
ActiveSync | ActiveSync, ActiveSync.Protocol, ActiveSync.Proxy |
Active Directory | AD |
Anti-virus | Antimalware, AntiSpam |
Autodiscover | Autodiscover, Autodiscover.Protocol, Autodiscover.Proxy |
Mailbox databases | Clustering, Database, DataProtection, MailboxMigration, MailboxSpace, MRS, Store |
Exchange Admin Center | ECP.Proxy |
Exchange Web Services | EWS, EWS.Protocol, EWS.Proxy |
Front End Transport Service | FrontendTransport |
Transport Service | HubTransport, MailboxTransport, Transport, TransportSync |
Offline Address Book | OAB, OAB.Proxy |
Outlook, Outlook Web Access | Outlook, Outlook.Proxy, OWA.Protocol, OWA.Protocol.Dep, OWA.Proxy |
Unified Messaging | UM.Callrouter, UM.Protocol |
User Throttling | UserThrottling |
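To narrow a full health report down to the health sets that need attention, you can filter the Get-ServerHealth output. The following is a minimal sketch: Select-UnhealthyMonitor is a hypothetical helper name, and it assumes each entry exposes HealthSetName and AlertValue properties alongside the Name property discussed above, and that a healthy monitor reports the value Healthy.

```powershell
# Hypothetical helper: keep only monitors that are not reporting Healthy.
function Select-UnhealthyMonitor {
    param([Parameter(Mandatory = $true)][object[]]$HealthEntries)

    $HealthEntries |
        # Convert AlertValue to text so the comparison works whether the
        # property is an enumeration or an already-serialized string.
        Where-Object { "$($_.AlertValue)" -ne 'Healthy' } |
        Sort-Object HealthSetName, Name |
        Select-Object HealthSetName, Name, AlertValue
}

# Usage against a live server (requires the Exchange Management Shell):
# Select-UnhealthyMonitor (Get-ServerHealth -Identity MailServer42) | Format-Table
```

The HealthSetName values in the result can then be matched against Table 9-1 to see which feature is affected.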
You can quickly identify all the related probes, monitors, and responders for a health set by using Get-MonitoringItemIdentity. The basic syntax is:
Get-MonitoringItemIdentity -Identity HealthSetName -Server ServerName
HealthSetName identifies the health set to examine and ServerName is the name of an Exchange server. In the following example, you list items by type, item name, and target resource:
Get-MonitoringItemIdentity -Identity FrontEndTransport -Server mailserver21 | ft itemtype, name, targetresource
As shown in the following partial output, each associated probe, monitor, and responder is listed by name:
ItemType  Name                                              TargetResource
--------  ----                                              --------------
Probe     FrontendTransportServiceRunning                   msexchangefrontendtransport
Probe     FrontendTransportRepeatedlyCrashing               msexchangefrontendtransport
Monitor   FrontendTransportServiceRunningMonitor
Monitor   FrontendTransportRepeatedlyCrashingMonitor
Responder FrontendTransportServiceRunningEscalateResponder  Transport
Responder FrontendTransportRepeatedlyCrashingResponder      Transport
If the name of the monitor reporting a status other than online or healthy is FrontendTransportRepeatedlyCrashingMonitor, you can analyze the problem by looking at errors for the FrontendTransportRepeatedlyCrashing probe. Collect events for this probe from the ProbeResult event log and filter them as discussed earlier in “Viewing error messages for probes.” Here is an example:
$Errors = (Get-WinEvent -ComputerName ServerName -LogName Microsoft-Exchange-ActiveMonitoring/ProbeResult -FilterXPath "*[UserData[EventXML[ResultName='FrontendTransportRepeatedlyCrashing'][ResultType='4']]]" | % {[XML]$_.toXml()}).event.userData.eventXml
ServerName is the name of the Client Access or Mailbox server with which you want to work. Remember, the result type can be 1 for timeout, 2 for poisoned, 3 for succeeded, 4 for failed, 5 for quarantined, or 6 for rejected.
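Because the filter differs only in the probe name and the result type, you might generate it instead of retyping it. The sketch below builds the same XPath expression used in the examples; New-ProbeResultFilter is a hypothetical helper name, and it assumes the probe name contains no apostrophes, which would break the quoting.

```powershell
# Hypothetical helper: build the -FilterXPath string for the ProbeResult log.
function New-ProbeResultFilter {
    param(
        [Parameter(Mandatory = $true)][string]$ResultName,
        # 1 timeout, 2 poisoned, 3 succeeded, 4 failed, 5 quarantined, 6 rejected
        [ValidateRange(1, 6)][int]$ResultType = 4
    )

    "*[UserData[EventXML[ResultName='$ResultName'][ResultType='$ResultType']]]"
}

New-ProbeResultFilter -ResultName 'FrontendTransportRepeatedlyCrashing'

# Usage with Get-WinEvent to look for timeouts instead of failures
# (requires a server; ServerName is a placeholder):
# Get-WinEvent -ComputerName ServerName -LogName Microsoft-Exchange-ActiveMonitoring/ProbeResult -FilterXPath (New-ProbeResultFilter 'OutlookSelfTestProbe' 1)
```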
After you filter the log, you can display the results you want to see, such as:
$Errors | select -Property *Time,Result*,Error*,*Context
Before you begin deeper troubleshooting, you might want to rerun the associated probe for the monitor to ensure it’s still not in a healthy or online state. You can rerun probes by using Invoke-MonitoringProbe. The basic syntax is:
Invoke-MonitoringProbe HealthSetName\ProbeName -Server ServerName | fl
HealthSetName is the name of the health set with which to work, ProbeName is the name of the probe within the specified health set, and ServerName is the name of the Exchange server to check, such as:
Invoke-MonitoringProbe FrontEndTransport\FrontendTransportRepeatedlyCrashing -Server MailServer21 | fl
As shown in this partial sample of the output, the command returns a lot of information about the test:
Server : MailServer21
MonitorIdentity : FrontEndTransport\FrontendTransportRepeatedlyCrashing
RequestId : 84dc68cd-c2f8-487f-a5e2-20b43f6f9207
ExecutionStartTime : 7/2/2013 10:20:42 PM
ExecutionEndTime : 7/2/2013 10:20:42 PM
Error :
Exception :
PoisonedCount : 0
ExecutionId : 18902819
SampleValue : 2015
ExecutionContext :
FailureContext :
ExtensionXml :
ResultType : Succeeded
RetryCount : 0
ResultName : 84dc68cdc2f8487fa5e220b43f6f9207-FrontendTransportRepeatedlyCrashing
IsNotified : False
ResultId : 1289896134
ServiceName : InvokeNow
StateAttribute1 : No relevant crash events found for service
The ResultType value in the output will tell you whether the probe succeeded or failed. If the probe succeeded, the problem no longer exists. If the probe fails, the problem still exists and you’ll need to continue trying to diagnose and resolve it. Step-by-step procedures for troubleshooting issues with Exchange services were provided in Chapter 6; specifically, see the “Troubleshooting Outlook Web App” and “Working with virtual directories and web applications” sections.
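If you rerun several probes, a short summary per result can be easier to scan than the full property list. The following sketch condenses a result object to the fields discussed above; Get-ProbeRerunSummary is a hypothetical helper name, and it assumes the result exposes the MonitorIdentity, ResultType, Error, and StateAttribute1 properties shown in the sample output.

```powershell
# Hypothetical helper: condense an Invoke-MonitoringProbe result.
function Get-ProbeRerunSummary {
    param([Parameter(Mandatory = $true)]$ProbeResult)

    # Compare as text so an enumeration value and a string behave the same.
    $succeeded = "$($ProbeResult.ResultType)" -eq 'Succeeded'

    # On failure show the error text; on success show the probe's own note.
    $detail = if ($succeeded) { $ProbeResult.StateAttribute1 } else { $ProbeResult.Error }

    [pscustomobject][ordered]@{
        Probe     = $ProbeResult.MonitorIdentity
        Succeeded = $succeeded
        Detail    = $detail
    }
}

# Usage against a live server (requires the Exchange Management Shell):
# Invoke-MonitoringProbe FrontEndTransport\FrontendTransportRepeatedlyCrashing -Server MailServer21 | ForEach-Object { Get-ProbeRerunSummary $_ }
```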
Log Parser Studio is a graphical interface for Log Parser. Both tools are excellent for processing log files and have been extended specifically to analyze the Exchange protocol logs.
Before you can parse and analyze Exchange logs, you’ll need to install Log Parser and then add Log Parser Studio. At the time of this writing, the current version of Log Parser was available at http://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=24659 and the current version of Log Parser Studio was available at http://gallery.technet.microsoft.com/Log-Parser-Studio-cd458765. After you install Log Parser, you can run Log Parser Studio.
Log Parser Studio runs from an executable named LPS.exe. Unless you copy logs to folders that you can access with standard user privileges, you’ll usually want to run Log Parser Studio with elevated, administrator privileges. To do this, press and hold or right-click the executable and then select Run As Administrator.
When you run Log Parser Studio, you’ll see dozens of preloaded queries that can be used to examine various Exchange protocols and other protocols. As shown in Figure 9-3, queries begin with a prefix that identifies the protocol they examine, including:
ActiveSync and ActiveSync Proxy for analyzing Exchange ActiveSync and the Exchange ActiveSync Proxy.
CAS and CAS-Proxy for analyzing requests related to Client Access server protocols and proxies.
ECP for analyzing requests related to Exchange Admin Center.
EWS for analyzing requests related to Exchange Web Services.
ExRCA for tracking requests made by the Exchange Remote Connectivity Analyzer.
OWA for analyzing requests related to Outlook Web Access.
Windows PowerShell for analyzing requests related to the remote Windows PowerShell gateway.
Log Parser Studio is designed to run queries against several different types of logs, including Event Viewer logs, Exchange protocol logs, and IIS protocol logs. Queries in Log Parser Studio are listed by name, description, query, and log type.
Before you can run a query in Log Parser Studio, you must specify the folders and types of logs with which to work. Keep the following in mind:
Logging for protocols and services that run on top of IIS is handled by IIS, and these logs have the log type IISW3CLOG. By default, IIS logs are stored in the %SystemDrive%\inetpub\logs\LogFiles folder.
Logging for Exchange services and components is performed by Exchange, and these logs have the type EELLOG or EELXLOG. By default, Exchange logs are stored within the Logging folder under the %ExchangeInstallPath%.
Logging is also performed by the operating system, and these logs have the type EVTLOG. By default, Windows logs are stored in the %SystemRoot%\System32\winevt\Logs folder.
In Log Parser Studio, you can specify the logs with which to work and their type by completing these steps:
Select the Choose Log… button on the toolbar.
In the Log File Manager dialog box, select Add Folder. Adding folders ensures any available log in the folder can be used.
In the Add Folder dialog box, navigate to the folder with which you want to work, such as %SystemDrive%\inetpub\logs\LogFiles.
Next, select a log with the log type you want to use, and then select Open. When you select a log, Log Parser Studio tries to automatically detect the log type. If Log Parser Studio can’t detect the log type, you’ll need to select the log type when prompted.
Select OK.
You can run queries against all logs of the selected type in the selected folder. To run a query, double-tap or double-click the query on the Library tab to open the query in a new tab. On the new query tab, select Execute Active Query to run the query. How long it takes to run a query depends on the size and number of logs in the specified folder or folders. When Log Parser Studio finishes analyzing the logs, you’ll see the results and can use this information for troubleshooting. Figure 9-4 shows the results of a sample query.