© Nikolas Charlebois-Laprade et al. 2017

Nikolas Charlebois-Laprade, Evgueni Zabourdaev, Daniel Brunet, Bruce Wilson, Mike Farran, Kip Ng, Andrew Stobart, Roger Cormier, Colin Hughes-Jones, Rhoderick Milne and Shawn Cathcart, Expert Office 365, https://doi.org/10.1007/978-1-4842-2991-0_10

10. Understanding and Troubleshooting Office 365 Mail Flow

Nikolas Charlebois-Laprade, Evgueni Zabourdaev2, Daniel Brunet3, Bruce Wilson4, Mike Farran5, Kip Ng6, Andrew Stobart4, Roger Cormier6, Colin Hughes-Jones6, Rhoderick Milne6 and Shawn Cathcart7

(1)Gatineau, Québec, Canada

(2)Ottawa, Ontario, Canada

(3)Laval, Québec, Canada

(4)Winnipeg, Manitoba, Canada

(5)Strathmore, Alberta, Canada

(6)Mississauga, Ontario, Canada

(7)Edmonton, Alberta, Canada

BY BRUCE WILSON

Office 365 Exchange Online (ExO) with Exchange Online Protection (EOP) is the successor to Business Productivity Online Service (BPOS) and Forefront Online Protection for Exchange (FOPE) services. While BPOS and FOPE were separate services managed completely independently, Office 365 brings them together into one service and administration center, running on a common platform, Exchange. Since the service initially launched, it has received many updates, bringing new and improved functionality to users along the way.

In this chapter, we will look at the various aspects of mail flow as they relate to Office 365, starting with what happens to an e-mail message when it is first received by the service. We will then review the steps the message takes before it is handed off to the destination e-mail system or target mailbox. We will also cover where anti-malware and antispam checks fit in, although I will not go into depth about them in this chapter. Next, we will look at the most common deployment scenarios and what properly functioning mail flow looks like in each. Then we will look at several troubleshooting techniques that can be used to isolate where an e-mail delivery failure is occurring. Finally, we will cover the different kinds of logging that are available, which one to use when, and how to use them.

Office 365 Mail-Flow Architectural Overview

Office 365 is based on the current generation of Exchange, which means that mail flow does not differ from what you would expect from an on-premises server deployment. When a message is received by Office 365, it first passes through an edge server, where connection and reputation filtering occurs, before being passed on to a mailbox server for filtering and further processing. The message is then processed by anti-malware, Transport & Data Loss Prevention (DLP) rules, antispam agents, and, if enabled, Advanced Threat Protection (ATP) safe links and safe attachments, before being delivered to its destination (Figure 10-1).

Figure 10-1. Overview of the overall Exchange Online architecture

The service uses Opportunistic TLS (Transport Layer Security) to secure communication between itself and external servers, encrypting each SMTP connection for which both servers support TLS. This means that if TLS is offered, we will always prefer it over a non-TLS connection.

For messages being sent to Office 365, the sending server is responsible for setting up the TLS connection. Connectors can be configured to force TLS communication for messages coming in to the service. Messages being sent from the service to external parties will always attempt TLS first. If the receiving MTA does not offer the STARTTLS SMTP (Simple Mail Transfer Protocol) verb or there is an issue establishing the connection, the service will fall back on unencrypted SMTP communication. Connectors can also be used to specify the security and routing of messages leaving Office 365.
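The outbound fallback described above can be sketched as a small decision function. This is only an illustration of the logic, not how the service is implemented: the function name, the handshake_ok flag, the require_tls flag, and the "defer" outcome for a connector that forces TLS are all assumptions made for the example.

```python
def choose_transport(ehlo_lines, handshake_ok=True, require_tls=False):
    """Pick a delivery mode from the extension lines a server returned to EHLO.

    handshake_ok and require_tls are hypothetical inputs for this sketch.
    """
    # The extension keyword is the first token on each line and is case-insensitive.
    keywords = {line.split()[0].upper() for line in ehlo_lines if line.strip()}
    if "STARTTLS" in keywords and handshake_ok:
        return "tls"
    if require_tls:
        # Assumed behavior for a connector that forces TLS: hold the message
        # rather than send it in clear text.
        return "defer"
    # Opportunistic TLS: fall back to unencrypted SMTP.
    return "plaintext"
```

Called with the extensions a receiving MTA advertised, the function returns "tls" whenever STARTTLS is offered and the handshake succeeds, and only falls back to "plaintext" when nothing requires encryption.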

There are two broad categories of customers who use Office 365: Exchange Online Protection (a.k.a. Filtering Only) and Exchange Online (a.k.a. Fully Hosted or Hybrid) customers. Aside from the physical servers that messages for these customers pass through, there is no difference between how messages are processed.

Fully hosted is by far the simplest configuration, as connectors are not required to set up mail flow to or from these customers. When a message is sent to a fully hosted customer, once it passes through filtering, it will be delivered to the destination mailbox. Messages sent from these customers will be delivered, based on the destination domain’s MX record or via an outbound partner connector, if configured.

For filtering-only customers, once messages have passed through filtering, the service uses an outbound on-premises connector to route the message to the customer’s on-premises environment, where it will be delivered to its destination. Messages sent from these customers enter the service through an inbound on-premises connector, then go through the same filtering as inbound messages, before being delivered, based on the destination domain’s MX record or via outbound partner connectors to enforce TLS delivery.

As there are many variations of hybrid configurations, we will look at the simplest setup here. After a message passes through filtering, the service will first look for a mailbox in the cloud to deliver to. If a mailbox cannot be found, Office 365 then uses the domain type, Internal Relay or Authoritative, to determine what happens to the message. When the domain is set to Internal Relay, the service will route the message out of the service for delivery to the destination mailbox. If the domain is set to Authoritative, a mail object is required, to tell the service that a mailbox exists on-premises. In both cases, an outbound on-premises connector is required, to ensure that the message is routed to the correct end point for delivery to the destination mailbox.
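The lookup order in the simple hybrid case can be sketched as follows. The function name, the plain sets and dictionary standing in for directory data, and the return labels are hypothetical. Rejecting a recipient in an Authoritative domain that has no mail object is an assumption of this sketch, as the text only states that the object is required.

```python
def route_after_filtering(recipient, cloud_mailboxes, domain_types, mail_objects):
    """Return where a filtered message goes next (illustrative labels only)."""
    address = recipient.lower()
    domain = address.rsplit("@", 1)[1]
    # 1. The service first looks for a mailbox in the cloud.
    if address in cloud_mailboxes:
        return "cloud mailbox"
    # 2. Otherwise the accepted domain's type decides.
    domain_type = domain_types.get(domain)
    if domain_type == "InternalRelay":
        return "on-premises"  # via the outbound on-premises connector
    if domain_type == "Authoritative":
        if address in mail_objects:
            return "on-premises"  # the mail object points at an on-premises mailbox
        return "reject"  # assumption: no object means the recipient is unknown
    return "not an accepted domain"
```

For example, with contoso.com set to Internal Relay, any address without a cloud mailbox is relayed on-premises, while the same address in an Authoritative domain is relayed only if a mail object exists for it.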

How Does the Service Know If a Message Is Being Sent to or from a Customer?

When a message is sent to Office 365, the edge servers that process the message determine whether it is originating (coming from a customer) or incoming (going to a customer). This process is called attribution.

  • Originating: A message that originates from a tenant’s on-premises server (matches the tenant’s inbound on-premises connector) or from a cloud mailbox. If Office 365 is unable to identify the message as originating, and the recipient domain is an accepted domain in an Office 365 organization, the service will identify the message as incoming to the recipient organization instead.

  • Incoming: Messages sent to accepted domains that are in Office 365 that do not match an inbound on-premises connector.

During the attribution process, the service uses the following information to make the determination as to whether the message is originating or incoming:

  • Certificate Details and/or Connecting IP

  • MailFrom domain

  • RcptTo domain
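As a rough illustration, the attribution decision can be modeled with the three inputs listed above. The Tenant class, its field names, and the exact matching rule (connector match plus a sender domain owned by the same tenant) are simplifying assumptions for this sketch. It also leaves out messages that originate from a cloud mailbox.

```python
from dataclasses import dataclass, field


@dataclass
class Tenant:
    """Hypothetical view of one Office 365 organization."""
    name: str
    accepted_domains: set
    connector_ips: set = field(default_factory=set)         # inbound on-premises connector, by IP
    connector_cert_names: set = field(default_factory=set)  # or by certificate name


def attribute(tenants, connecting_ip, cert_name, mail_from, rcpt_to):
    """Classify a message as originating from, or incoming to, a tenant."""
    sender_domain = mail_from.rsplit("@", 1)[-1].lower()
    rcpt_domain = rcpt_to.rsplit("@", 1)[-1].lower()
    # Originating: the connection matches a tenant's inbound on-premises connector.
    for tenant in tenants:
        matches_connector = (connecting_ip in tenant.connector_ips
                             or cert_name in tenant.connector_cert_names)
        if matches_connector and sender_domain in tenant.accepted_domains:
            return "originating", tenant.name
    # Otherwise: incoming to whichever tenant owns the recipient domain.
    for tenant in tenants:
        if rcpt_domain in tenant.accepted_domains:
            return "incoming", tenant.name
    return "unattributed", None
```

The order of the two loops mirrors the text: a message is treated as incoming only after it has failed to match as originating.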

Common Mail-Flow Deployment Styles

Office 365 allows for many different mail-flow configurations, from very simple filtering only or fully hosted to complex hybrid deployments. When deploying Office 365, understanding how you plan to configure your MX record in the long term is important. For example, if you plan to keep your MX record pointing to a third party, you will have to create Partner connectors to enable secured mail flow.

Filtering Only

With a filtering-only configuration, your MX record is pointed to Office 365, and you then configure inbound and outbound on-premises-type connectors to enable mail flow for your organization (Figure 10-2).

Figure 10-2. Filtering-only configuration

In a filtering-only configuration, there are four different failure points that you must be aware of.

  • From the Internet to Exchange Online Protection

  • Exchange Online Protection to on-premises

  • On-premises to Exchange Online Protection

  • Exchange Online Protection to the Internet

Fully Hosted

With a fully hosted configuration, your MX record is pointed to Office 365. No connectors are required with this configuration, unless you have a business necessity, such as TLS with business partners (Figure 10-3).

Figure 10-3. Fully hosted configuration

In a fully hosted configuration, there are two different failure points that you must be aware of.

  • From the Internet to Exchange Online

  • From Exchange Online to the Internet

Note

If you configure connectors to enforce business requirements, these introduce additional failure points that you must be aware of.

Hybrid

As there are many different hybrid configurations that can be deployed, I will only cover the main scenarios. Additionally, I will not go deep into the on-premises environment but instead will reference it at a high level. With hybrid configurations, as the complexity increases, so does the number of failure points.

MX to On-Premises

With this configuration, your MX record will continue to be pointed to the on-premises server. Messages destined to mailboxes not hosted within the on-premises environment will have their e-mail address rewritten to the TargetAddress, which is the service domain in Office 365, and relayed up to the cloud for delivery (Figure 10-4).

Figure 10-4. Hybrid mail flow with MX pointing to on-premises server

In this configuration, there are five failure points that you must be aware of.

  • From the Internet to on-premises

  • Within the on-premises environment

  • On-premises to Exchange Online

  • Exchange Online to on-premises for internal recipients

  • Exchange Online to Internet for external recipients

MX to Cloud

With this configuration, your MX record is updated to point to Exchange Online, and connectors are configured to route mail between the cloud and on-premises for users not hosted in Exchange Online (Figure 10-5). When a message is sent from an on-premises user to a user that is hosted in Exchange Online, the recipient’s address is rewritten to the TargetAddress, which is the service domain in Office 365, and relayed to the cloud for delivery.

Figure 10-5. Hybrid mail flow with MX pointing to Office 365

In this configuration, there are five failure points that you must be aware of.

  • From the Internet to Exchange Online

  • Exchange Online to on-premises

  • Within the on-premises environment

  • On-premises to Exchange Online

  • Exchange Online to the Internet

Centralized Mail Transport

Centralized Mail Transport, sometimes referred to as Centralized Mail Control, requires that all messages be routed through the on-premises environment first, before being delivered (Figure 10-6). This type of configuration is typically used when there are compliance requirements that must be enforced within the on-premises environment.

Figure 10-6. Centralized Mail Control

In this configuration, owing to the complexity, I will break down the failure points to inbound and then outbound.

For inbound:

  • From the Internet to Exchange Online

  • Exchange Online to on-premises

  • Within on-premises environment

  • On-premises to Exchange Online for hosted recipients

For outbound:

  • Exchange Online to on-premises

  • Within on-premises environment

  • On-premises to Exchange Online

  • Exchange Online to the Internet

Common Troubleshooting Techniques

When troubleshooting mail flow, it is critical to isolate the fault as much as possible. To do this, it is best to break the process into multiple steps: scoping, data collection, solution identification, and solution implementation.

During the scoping phase, you will want to look at the following: When did the issue begin? This is important to know, as issues that suddenly occur are often a result of a change within the environment or an unexpected service incident. If mail flow has never worked, this would indicate an initial setup issue, and the approach to resolving it will be different.

Who is affected by the issue? What you are looking for here is whether the issue is isolated to a subset of users or is affecting all users. If the issue is isolated to a specific set of users, start by looking at commonalities between those users. For example, are they all in the same branch office, and are other tools also impacted? If so, this could indicate a general connectivity issue.

Are there any error messages generated? In mail flow, we call these NDRs (non-delivery reports/receipts) or DSNs (delivery status notifications). Both will often tell you exactly what issue is affecting mail delivery or what behavior is being seen.

During the data-collection phase, you want to collect additional information that you will use with the information collected during scoping, to determine the solution. Generate test messages to identify which aspects of mail flow are not working. Often, mail-flow issues affect only some aspects of mail flow, meaning that some mail will still be delivered. You just have to identify which paths are working. To do this, you can test the following scenarios:

  • External sender to hosted mailbox

  • Hosted mailbox to External Recipient

  • External sender to on-premises mailbox

  • On-premises mailbox to External Recipient

  • Hosted mailbox to on-premises mailbox

  • On-premises mailbox to hosted mailbox

  • Hosted mailbox to hosted mailbox (Note: This should always work. If it does not work, and you do not have a criteria-based routing rule configured in Office 365, you should engage support).

Collect any Message Tracking Logs, NDRs, and e-mail headers showing the issue. With mail-flow issues, this data will likely show you exactly where the issue is, reducing the need for further troubleshooting.

Because mail flow is linear, once you have collected the scoping data and the empirical data outlined previously, troubleshooting becomes a matter of determining where in the message flow the issue is occurring and then looking at the configuration of the step directly before the break.
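Because the flow is linear, locating the break amounts to walking the expected hops in order and stopping at the first one the message never reached. The sketch below illustrates that idea only. The hop names and the set of hops where the message was seen are hypothetical inputs that you would fill in from message traces, NDRs, and headers.

```python
def first_failure(hops, reached):
    """Return (first hop the message never reached, hop directly before it).

    hops is the expected path in order; reached is the set of hops where the
    message was actually seen. Returns (None, None) if every hop was reached.
    """
    for index, hop in enumerate(hops):
        if hop not in reached:
            previous = hops[index - 1] if index > 0 else None
            return hop, previous
    return None, None


# Example path: external sender to an on-premises mailbox with MX pointing to the cloud.
path = ["Internet sender", "Exchange Online", "On-premises server", "On-premises mailbox"]
missing, check_config_of = first_failure(path, {"Internet sender", "Exchange Online"})
```

In the example, the message never reached the on-premises server, so the configuration to review is that of the step before the break, Exchange Online, which in practice would point at the outbound on-premises connector.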

Once you have identified the configuration change that is necessary to resolve the mail delivery issue, it is time to implement the fix. When you update connectors in Office 365, changes will typically replicate out in 15 minutes, but they will sometimes take longer. This is important to note, as you will not be able to test your changes immediately. Changes within your on-premises environment will replicate out faster, except for changes to DNS, if necessary, as these are dependent on the Time-To-Live (TTL) that was configured during the last DNS update.

Reading NDRs

When an NDR is generated, it will provide valuable information needed to determine where a delivery failure is occurring and the cause of the delivery failure. An NDR is always generated by the last mail server to have the message in the queue. In the case of Office 365, it will generate a friendly NDR format that is broken into multiple sections. The first section of the NDR is intended to provide end users with a quick explanation of what happened and what steps need to be taken to resolve the issue (Figure 10-7).

Figure 10-7. Undelivered message in Office 365

The next section of the NDR is intended for administrators and provides more details about the most common resolutions (Figure 10-8).

Figure 10-8. More information content from Office 365

You will see the exact error details, which server generated the NDR, which server it was attempting to communicate with, and a table showing the individual hops that the message took before the NDR was generated (Figure 10-9).

Figure 10-9. Message hops table

The final section of an Office 365–generated NDR contains the original message headers.

If the NDR was not generated by Office 365, at a minimum, you will be presented with the SMTP response code generated by the remote server and the server that generated the NDR. The SMTP response code will be either a 4xx series, indicating a temporary delivery issue, or a 5xx series, indicating a permanent delivery issue. With a 5xx series response, the sending mail server will make no further attempts to deliver the message. For a 4xx series response, the sending server will attempt to deliver the message again, based on the configured retry interval and time frame.
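The 4xx/5xx distinction is easy to apply mechanically when you are sorting through many NDRs. The following is a minimal sketch. The function name and the returned labels are invented for the example, and only the 2xx, 4xx, and 5xx series are handled.

```python
import re


def classify_smtp_response(response):
    """Classify an SMTP reply line by the first digit of its three-digit code."""
    match = re.match(r"\s*([245])\d\d\b", response)
    if match is None:
        raise ValueError(f"not a 2xx, 4xx, or 5xx SMTP reply: {response!r}")
    return {
        "2": "success",
        "4": "temporary",  # the sending server retries on its configured interval
        "5": "permanent",  # the sending server makes no further attempts
    }[match.group(1)]
```

Passing the reply text quoted in an NDR, such as a line beginning with 550, returns "permanent", which tells you that waiting for a retry will not help and that a configuration change is needed.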

Reading Headers

While troubleshooting mail flow, you will often end up looking at lots of e-mail headers, to determine the exact path that a message took. This can be done manually using Notepad or your favorite text editor, but this can become tiresome, owing to formatting and the sheer amount of information that is contained within the header. Microsoft provides a great tool that makes this process much easier called the Header Analyzer, available at https://testconnectivity.microsoft.com/ (Figure 10-10).

Figure 10-10. Remote Connectivity Analyzer

The Header Analyzer parses the header for you and then displays the contents in a table format broken down as follows:

  • Summary: This section contains the Subject, Message ID, Message Creation Date/Time, From address, and To address.

  • Received headers: This section contains the Received headers, presented chronologically, with the oldest at the top (note: e-mail headers show the newest Received headers at the top) and with the date/times all converted to UTC.

  • Forefront Antispam Report header: This section parses the X-Forefront-Antispam-Report header, if present, to display the Country/Region the sending IP belongs to, Language, Spam Confidence Level (SCL), Spam Filtering Verdict, IP Filter Verdict, HELO/EHLO String of the sending server, as well as the PTR-Record.

  • Microsoft Antispam header: This section parses the X-Microsoft-Antispam header to display the Bulk Complaint Level and the Phishing Confidence Level.

  • Other headers: This section displays all the other headers that were present in the message.
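To make the Received headers step concrete, here is a minimal sketch of the same idea using only the Python standard library. It is not the analyzer's own code. The function name is hypothetical, and it assumes each Received header ends with a semicolon followed by a timestamp that carries a time-zone offset.

```python
from datetime import timezone
from email import message_from_string
from email.utils import parsedate_to_datetime


def received_hops(raw_headers):
    """Return (UTC timestamp, hop description) pairs, oldest hop first."""
    message = message_from_string(raw_headers)
    hops = []
    for value in message.get_all("Received", []):
        # The timestamp follows the last semicolon of a Received header.
        details, _, stamp = value.rpartition(";")
        when = parsedate_to_datetime(stamp.strip()).astimezone(timezone.utc)
        # Collapse the whitespace left over from header folding.
        hops.append((when, " ".join(details.split())))
    # Headers list the newest hop first, so reverse for chronological order.
    hops.reverse()
    return hops
```

Reversing rather than sorting keeps the hops in the order the servers recorded them, which matters when server clocks disagree by a few seconds.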

When working with e-mail headers, it is important to ensure that you always have the complete headers. For this reason, it is recommended that you copy the headers from the source mailbox. If the message has been forwarded as an attachment, there is the potential for the client sending the mail to strip out extra headers, to conserve space. Once you have a complete e-mail header, the parts that you will be looking at will be the following:

  • Message details: The To, From, Date/Time, and Message-ID fields, as these are what you will use, should you have to trace a message

  • Message path: The Received headers, as these tell you what path a message took

  • Extra properties: These are the X-headers appended to the bottom of the message that provide details about what happened to a message passing through the network.
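Among the X-headers, the X-Forefront-Antispam-Report value is a semicolon-separated list of KEY:value pairs, which makes it straightforward to split by hand. The sketch below assumes that layout. The function name and the sample value are invented for illustration.

```python
def parse_antispam_report(value):
    """Split an X-Forefront-Antispam-Report value into a dict of its fields."""
    fields = {}
    for part in value.split(";"):
        # Split on the first colon only, so values that contain colons
        # (for example an IPv6 address) stay intact.
        key, separator, field_value = part.strip().partition(":")
        if separator:
            fields[key.upper()] = field_value
    return fields


# Invented sample value, for illustration only.
sample = "CIP:203.0.113.10;CTRY:CA;LANG:en;SCL:1;SFV:NSPM;IPV:NLI;H:mail.example.com;PTR:mail.example.com;"
report = parse_antispam_report(sample)
```

Looking up report["SCL"] or report["SFV"] then gives the Spam Confidence Level and the Spam Filtering Verdict without having to scan the raw header by eye.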

Available Logging and Usage

As an administrator of Office 365, you will, at some point, have to see exactly what happened to a message as it passed through the service. To assist administrators with these queries, the service provides multiple options.

  • Message trace via the Office 365 Exchange Administration Center (EAC).

  • Via PowerShell cmdlets:

    • Get-MessageTrace and Get-MessageTraceDetail

    • Start-HistoricalSearch

As there are multiple options, the key becomes knowing which option to use, as there are limitations that must be understood. To decide which option to use, you must first start by determining how far back you have to trace: the preceding 7 days or up to 90 days in the past.
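The choice between the two options can be expressed as a simple age check. This sketch only encodes the 7-day and 90-day boundaries stated above. The function name is hypothetical, and the returned strings are simply the cmdlet names used in this chapter.

```python
from datetime import datetime, timedelta, timezone


def choose_trace_option(message_time, now=None):
    """Suggest a tracing option from the age of the message (timezone-aware datetimes)."""
    now = now or datetime.now(timezone.utc)
    age = now - message_time
    if age < timedelta(0):
        raise ValueError("message_time is in the future")
    if age <= timedelta(days=7):
        return "Get-MessageTrace"  # or a message trace in the EAC
    if age <= timedelta(days=90):
        return "Start-HistoricalSearch"  # results are produced as a downloadable report
    return None  # older than the service retains trace data
```

A message sent three days ago maps to Get-MessageTrace, one sent thirty days ago maps to Start-HistoricalSearch, and anything older than ninety days returns None.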

Seven Days or Fewer

If the message is from the previous seven days, the easiest option is to perform the trace via the EAC (see Figure 10-11). For very recent messages, there is typically a delay of just a few minutes between the time the message was sent and the time it can be traced. The trace can be initiated as follows:

  • Log in to Office 365 as an administrator

  • Navigate to the Exchange Administration Center

  • Click mail flow

  • Click message trace

  • Enter your search criteria

  • Click Search

Figure 10-11. Exchange Administration Center

Once the search has been initiated, you will be presented a results box showing the matches that were found. You can then get more details on individual messages, by double-clicking the message in question or by clicking the edit icon (Figure 10-12).

Figure 10-12. Additional message events information

Alternatively, the same trace can be performed via PowerShell. Often when tracing a message, you already have the Message-ID, so start by using the Get-MessageTrace cmdlet and filter on that Message-ID, which will return a list of matching messages (see Figure 10-13).

Figure 10-13. Get-MessageTrace cmdlet results

Next, to get additional details about the message, you can then use the Get-MessageTraceDetail cmdlet, which will return more information, such as what we saw through the EAC (Figure 10-14).

Figure 10-14. Get-MessageTrace in PowerShell filtered by MessageId

You will notice that this is the same detail presented through the EAC, but it is possible to get additional details. To do so via PowerShell, add | FL (the alias for Format-List) to the previous cmdlet (Figure 10-15).

Figure 10-15. Full details from the Get-MessageTrace PowerShell cmdlet

Greater Than Seven Days to Ninety Days

These traces can be initiated from the EAC the same way as a trace for the previous seven days. The key difference, though, is that the traces are run on the back end, and once they complete, the results can be downloaded by administrators. When setting up the trace, it is important to select the Include message events and routing details with a report option, as this will ensure that all details about what happened to the message are included in the results (Figure 10-16).

Figure 10-16. Creating new trace in Office 365

The results will be presented in a comma-separated values (CSV) file, which can be opened and reviewed in Excel. After opening the file in Excel, start by sorting the report on the Date column, and then filter to the Message-ID in question, to restrict the amount of data requiring review (see Figure 10-17).

Figure 10-17. Exported traces in Excel
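If you would rather script the review than use Excel, the same sort-and-filter step can be sketched with the standard csv module. The column names date_time_utc and message_id are assumptions about the report layout, so check them against the header row of your own file. The sketch also assumes the timestamps are in a format that sorts correctly as text, such as ISO 8601.

```python
import csv
import io


def events_for_message(csv_text, message_id,
                       date_column="date_time_utc", id_column="message_id"):
    """Return the rows for one Message-ID, sorted by the date column.

    The default column names are assumptions; pass the real ones from your report.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [row for row in reader if row[id_column] == message_id]
    # Plain text sort: correct only for timestamps that sort lexically.
    rows.sort(key=lambda row: row[date_column])
    return rows
```

Reading the downloaded file into a string and passing it with the Message-ID of interest returns only that message's events in time order, which is the same view the Excel steps produce.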

Alternatively, the same trace can be initiated via the Start-HistoricalSearch cmdlet with the -ReportType MessageTraceDetail parameter, to ensure that all the events and routing details are returned (Figure 10-18).

Figure 10-18. Start-HistoricalSearch cmdlet in PowerShell

Sometimes, you want to get the downloadable message tracking report for messages that are less than seven days old. To do this via the EAC, you can set the Start Date to eight days prior to the current date/time. The most efficient option, though, is to use the Start-HistoricalSearch cmdlet.

Additional Resources

The following resources are recommended for further information.

Summary

In this chapter, we looked at how a message gets routed from the Internet through to an intended recipient in Exchange Online or an on-premises messaging environment. As mail flow is performed via SMTP, we can rely on built-in capabilities, such as NDRs and DSNs, to get alerts when something is not working correctly. Using message headers and message-tracing capabilities built into Office 365, we can dig deep into exactly what happened to a given message or set of messages. I hope this chapter leaves you with a deeper understanding and set of tools that you can use in the future.
