Packet Loss

Reasons a Packet Could Be Discarded in Transit

A request or response packet can be discarded in transit for several reasons, including the following:

  • A packet CRC error is detected by a device in the packet's path between the two QPs.

  • Link Version error (LRH:LVer should be 0). If a device receives the packet and its Link Layer doesn't support the Link version indicated in the packet's LRH:LVer field, it discards the packet.

  • If the packet length exceeds the MTU size of the port on the other end of a link, the packet is dropped rather than transmitted to that port.

  • If a packet's LRH:DLID field contains 0000h (the reserved DLID), then the packet is dropped by a switch or a router.

  • VL error. If the VL buffer that is required to accept or to transmit a packet is inoperative, then the packet cannot be accepted or transmitted.
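The checks listed above can be sketched as a single function that a device applies to a packet before accepting or forwarding it. This is only an illustrative sketch: the `Packet` and `Port` types, their field names, and the reason strings are hypothetical, and the order of the checks is an assumption rather than something the text specifies.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

SUPPORTED_LVER = 0      # the text states LRH:LVer should be 0
RESERVED_DLID = 0x0000  # the reserved DLID named in the text


@dataclass
class Packet:           # hypothetical model of the fields the checks need
    crc_ok: bool        # result of the device's CRC check
    lver: int           # LRH:LVer
    length: int         # packet length, in bytes
    dlid: int           # LRH:DLID
    vl: int             # virtual lane the packet travels on


@dataclass
class Port:             # hypothetical model of the forwarding port
    neighbor_mtu: int                # MTU of the port at the far end of the link
    operational_vls: FrozenSet[int]  # VLs whose buffers are operative


def discard_reason(pkt: Packet, port: Port) -> Optional[str]:
    """Return why the packet would be discarded, or None if it passes."""
    if not pkt.crc_ok:
        return "crc_error"
    if pkt.lver != SUPPORTED_LVER:
        return "link_version_error"
    if pkt.length > port.neighbor_mtu:
        return "exceeds_mtu"
    if pkt.dlid == RESERVED_DLID:
        return "reserved_dlid"
    if pkt.vl not in port.operational_vls:
        return "vl_error"
    return None
```

For example, `discard_reason(Packet(True, 0, 256, 0x0000, 0), Port(2048, frozenset({0})))` yields `"reserved_dlid"`.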

Request Packet Lost

Detection and Handling by the RQ Logic
RQ Logic Detects Lost Request(s)

If one or more request packets are lost while in transit to the responder QP's RQ Logic, the RQ Logic will detect the loss upon its receipt of the next request packet. In this case, the PSN in the next request packet received:

- is not equal to the RQ Logic's ePSN (see Figure 17-16 on page 395), and

- does not fall within the duplicate request region.

This indicates that one or more request packets are missing. If, however, the requester QP's SQ Logic has not sent any request packets after the lost one, the RQ Logic receives nothing further and therefore cannot detect that a request packet was lost in transit. In that case, the loss is detected by the SQ Logic (see “SQ Logic Detects Lost Request Packet(s)” on this page).
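The RQ Logic's test can be sketched as a comparison of the received PSN against the ePSN using modular arithmetic. The sketch assumes a 24-bit PSN and a duplicate request region consisting of the half of the PSN space immediately behind the ePSN; neither detail is stated in this section, and the function name and return strings are hypothetical.

```python
PSN_BITS = 24                   # assumption: PSNs are 24 bits wide
PSN_MOD = 1 << PSN_BITS
HALF_SPACE = PSN_MOD >> 1       # assumption: duplicate region spans half the PSN space


def classify_request_psn(psn: int, epsn: int) -> str:
    """Classify a received request PSN relative to the responder's ePSN."""
    # Distance ahead of the ePSN, computed modulo the PSN space so that
    # wraparound from the largest PSN back to 0 is handled.
    delta = (psn - epsn) % PSN_MOD
    if delta == 0:
        return "expected"
    if delta >= HALF_SPACE:
        return "duplicate"          # behind the ePSN: duplicate request region
    return "out_of_sequence"        # ahead of the ePSN: request(s) are missing
```

With an ePSN of 100, a request carrying PSN 102 is classified as `"out_of_sequence"`, which is the lost-request case described above, while PSN 99 falls in the duplicate region.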

RQ Logic Issues a PSN Sequence Error Nak Packet

Upon detecting one or more missing request packets, the RQ Logic returns a Nak packet to the requester QP's SQ Logic. A Nak is carried in the Ack packet format: Figure 17-6 on page 369 illustrates the format of an Ack packet, Table 17-3 on page 370 illustrates the format of the Syndrome field within the Ack packet's AETH, and Table 17-4 on page 371 lists the currently defined Nak error codes. In this case, the error code returned is the PSN Sequence Error Nak.

After Sending the Nak...

After the RQ Logic returns the PSN Sequence Error Nak, it resumes waiting for the requester QP's SQ Logic to send a request packet containing the responder's ePSN. Until it receives a valid request with a PSN equal to the ePSN, the RQ Logic ignores any other new requests, although it still responds to duplicate requests.
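The responder behavior before and after the Nak can be sketched as a small state machine. This is a simplified illustration under stated assumptions: PSNs are taken to be 24 bits, the duplicate region is taken to be the half of the PSN space behind the ePSN, and the ePSN is advanced by one per accepted request packet. The class name, method name, and action strings are hypothetical.

```python
PSN_MOD = 1 << 24               # assumption: 24-bit PSN space
HALF_SPACE = PSN_MOD >> 1       # assumption: size of the duplicate region


class ResponderRQ:
    """Hypothetical model of the RQ Logic's PSN handling."""

    def __init__(self, epsn: int) -> None:
        self.epsn = epsn
        self.nak_sent = False   # True after a PSN Sequence Error Nak was returned

    def on_request(self, psn: int) -> str:
        delta = (psn - self.epsn) % PSN_MOD
        if delta == 0:
            # The awaited request arrived: leave the post-Nak state.
            self.nak_sent = False
            self.epsn = (self.epsn + 1) % PSN_MOD  # simplification: +1 per packet
            return "execute"
        if delta >= HALF_SPACE:
            # Duplicate requests are still answered, even after a Nak.
            return "respond_to_duplicate"
        if self.nak_sent:
            # Already Nak'd: ignore further new requests until PSN == ePSN.
            return "ignore"
        self.nak_sent = True
        return "nak_psn_sequence_error"
```

Only the first out-of-sequence request triggers a Nak in this sketch; later ones are dropped silently until the request carrying the ePSN arrives.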

SQ Logic Detects Lost Request Packet(s)

The SQ Logic can detect that one of the request packets it issued has been lost in one of two ways:

  1. If no other request packet has been issued after the one in question, and if the request issued required a response (see “Triggering the SQ Logic's Transport Timer” on page 390), the SQ Logic's Transport Timer will eventually time out awaiting a response and will retransmit (i.e., retry) the request (assuming that the Retry Count is not exhausted).

  2. If one or more request packets were issued by the SQ Logic subsequent to the one in question, those packets may or may not make it to the responder QP's RQ Logic:

    - If one or more of them do arrive at the RQ Logic, the RQ Logic responds as described in “RQ Logic Issues a PSN Sequence Error Nak Packet” on page 398.

    - If none of the subsequently issued request packets arrive at the RQ Logic, and if the request in question required a response (see “Triggering the SQ Logic's Transport Timer” on page 390), the SQ Logic's Transport Timer will eventually time out awaiting a response and will retransmit (i.e., retry) the request (assuming that the Retry Count is not exhausted).
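The timer-driven path in both cases above reduces to the same decision when the Transport Timer expires. The following is a minimal sketch, not an implementation of any real verbs interface: `SendQueueState`, its fields, and the action strings are hypothetical, and what happens once the Retry Count is exhausted is not covered in this section, so it is represented only by a placeholder action.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SendQueueState:           # hypothetical model of the SQ Logic's retry state
    oldest_unacked_psn: int     # PSN of the oldest request still awaiting a response
    retries_left: int           # remaining Retry Count


def on_transport_timer_expired(sq: SendQueueState) -> Tuple[str, Optional[int]]:
    """Decide what the SQ Logic does when its Transport Timer times out."""
    if sq.retries_left == 0:
        # Placeholder: the handling of an exhausted Retry Count is outside
        # this section.
        return ("retry_exhausted", None)
    sq.retries_left -= 1
    # Retransmit starting at the request that never received a response.
    return ("retransmit_from", sq.oldest_unacked_psn)
```

Each expiration consumes one retry, so a request that is lost repeatedly eventually reaches the exhausted state instead of being retried forever.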

Response Packet Lost

A response packet may be lost while in transit from the responder QP's RQ Logic back to the requester QP's SQ Logic. There are several possible scenarios:

  • Transport Timer timeout. If no response packets are received by the SQ Logic for a protracted period of time, the SQ Logic's Transport Timer may time out. In this event, the actions taken by the SQ Logic are described in “Transport Timer Expiration” on page 392.

  • Automatically fixed by Ack coalescing. If an Ack packet acknowledging receipt of a Send or an RDMA Write request packet is lost, the requester's receipt of a subsequent response packet with a higher PSN will implicitly acknowledge the outstanding request packet for which the earlier Ack packet was lost.

  • Lost RDMA Read response packet(s). This is detected by the receipt of a response packet (of any type) with a PSN higher than the next expected RDMA Read response PSN. It is referred to as an Implied Sequence Error Nak. The RDMA Read request must therefore be retried (assuming that the SQ Logic's Retry Count isn't exhausted). In addition, the SQ Logic must retry all request packets that had been launched into the fabric after the RDMA Read request.

  • Lost Atomic response packet. This is detected by the receipt of a response packet (of any type) with a PSN higher than the next expected Atomic response PSN. The Atomic request must therefore be retried (assuming that the SQ Logic's Retry Count isn't exhausted). In addition, the SQ Logic must retry all request packets that had been launched into the fabric after the Atomic request.
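The last three scenarios can be sketched as one decision the SQ Logic makes when a response arrives. This is a simplified illustration: each outstanding request is modeled as occupying a single PSN (a real RDMA Read may produce several response packets), PSN wraparound is ignored, and the `Outstanding` type, the `on_response` function, and the action strings are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Outstanding:   # hypothetical record of a request awaiting its response
    psn: int
    kind: str        # "send", "rdma_write", "rdma_read", or "atomic"


def on_response(outstanding: List[Outstanding], resp_psn: int) -> Tuple[str, List[int]]:
    """Classify a response against the ordered list of outstanding requests."""
    for i, req in enumerate(outstanding):
        if req.psn >= resp_psn:
            break
        # This request was skipped over by the response just received.
        if req.kind in ("rdma_read", "atomic"):
            # Its response carries data and cannot be implied: retry this
            # request and everything launched after it.
            return ("retry_from", [r.psn for r in outstanding[i:]])
    # Only Sends and RDMA Writes were skipped, so the response implicitly
    # acknowledges them (Ack coalescing).
    return ("acked", [r.psn for r in outstanding if r.psn <= resp_psn])
```

For instance, with outstanding requests Send (PSN 5), RDMA Read (PSN 6), and Send (PSN 7), a response with PSN 7 returns `("retry_from", [6, 7])`, whereas if PSN 6 had been an RDMA Write the same response would acknowledge all three.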
