Information About SAN Congestion
Information About SAN Congestion Caused by Slow-Drain Devices
Most SAN edge devices use Class 2 or Class 3 Fibre Channel services that have link-level flow control. The
Flow Control feature allows a receiving port to back-pressure the upstream sending port whenever the receiving
port reaches its capacity to accept frames. When an edge device does not accept frames from the fabric for
an extended time, it creates a condition in the fabric that is known as slow drain. If the frames destined for a
slow edge device arrive over an ISL, the result is credit starvation, or slow drain, on that ISL. This credit
starvation then affects unrelated flows that use the same ISL. Congestion can occur in both Fibre Channel and
FCoE, although their flow control mechanisms differ. Regardless of the protocol of the device
causing the congestion, the congestion can propagate back to the source of the frames over both Fibre Channel
and FCoE links.
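The effect on unrelated flows can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions, not a model of any particular switch: the ISL receive buffers are treated as a single FIFO, one frame moves per tick, and the names `simulate_isl`, `isl_buffers`, and `slow_drain_every` are hypothetical.

```python
from collections import deque


def simulate_isl(ticks, isl_buffers, slow_drain_every):
    """Count frames delivered per flow over one shared ISL.

    Two flows alternate on the ISL: "slow" goes to an edge device that
    accepts a frame only every `slow_drain_every` ticks, and "fast" goes
    to a device that always accepts. Assumption: the downstream switch
    holds ISL frames in a single FIFO of `isl_buffers` entries.
    """
    credits = isl_buffers            # upstream switch's remaining BB_credits
    queue = deque()                  # frames held in the downstream ISL buffers
    delivered = {"slow": 0, "fast": 0}
    next_flow = "slow"

    for tick in range(ticks):
        # Upstream may send one frame per tick, but only if it holds a credit.
        if credits > 0:
            queue.append(next_flow)
            credits -= 1
            next_flow = "fast" if next_flow == "slow" else "slow"

        # Downstream forwards the head frame only if its destination accepts it.
        if queue:
            head = queue[0]
            accepts = head == "fast" or tick % slow_drain_every == 0
            if accepts:
                queue.popleft()
                delivered[head] += 1
                credits += 1         # buffer freed, credit returned upstream

    return delivered


healthy = simulate_isl(ticks=100, isl_buffers=4, slow_drain_every=1)
congested = simulate_isl(ticks=100, isl_buffers=4, slow_drain_every=10)
print("healthy:", healthy)
print("congested:", congested)
```

In the congested run, frames for the slow device occupy the ISL buffers, the upstream switch runs out of credits, and the fast flow is throttled to roughly the slow device's rate even though its own destination is always ready.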
Fibre Channel buffer-to-buffer credits (BB_credits) are a flow-control mechanism to ensure that each side of
the Fibre Channel link is able to control the rate of incoming frames. BB_credits are set on a per-hop basis.
Each side of a Fibre Channel connection informs the other side of the number of buffers that are available for
it to receive frames. The sender can only send frames if the receiver has buffers. For each frame received, the
receiver transmits an R_RDY (also known as a BB_credit) to the sender of that frame. If there is some processing
delay in the receiver, it can withhold BB_credits, thereby limiting the rate at which it receives frames.
If the receiver withholds BB_credits for a significant amount of time, it causes congestion in the SAN. This
BB_credit mechanism works independently in each direction of the traffic flow.
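The credit accounting described above can be sketched from the sender's point of view. This is an illustrative model only; the class name `BBCreditSender` and its methods are hypothetical, and one such counter would exist for each direction of a link.

```python
class BBCreditSender:
    """Transmit side of one direction of a Fibre Channel link (illustrative)."""

    def __init__(self, advertised_buffers):
        # The receiver advertises how many receive buffers it has;
        # the sender starts with that many BB_credits.
        self.credits = advertised_buffers
        self.frames_sent = 0

    def try_send(self):
        """Send one frame if a credit is available; return True on success."""
        if self.credits == 0:
            return False             # no credit: the frame must wait
        self.credits -= 1
        self.frames_sent += 1
        return True

    def on_r_rdy(self):
        """The receiver freed a buffer and returned one credit (R_RDY)."""
        self.credits += 1


sender = BBCreditSender(advertised_buffers=3)
results = [sender.try_send() for _ in range(4)]   # the fourth attempt has no credit
sender.on_r_rdy()                                 # receiver returns one credit
resumed = sender.try_send()                       # sending can continue
```

A receiver that delays its R_RDY responses keeps the sender at zero credits, which is how a slow device limits the rate of frames it receives.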
In FCoE, the flow control mechanism is called priority flow control (PFC). PFC consists of a receiver sending
class-based pause frames to a sender. PFC pause frames contain a value that is called a quanta. The quanta
determines how long a class of traffic is paused. There are two types of PFC pause frames: nonzero quanta
and zero quanta. A PFC pause frame with a nonzero quanta signals the sender to stop sending frames
immediately for a specified amount of time. A PFC pause frame with a zero quanta signals the sender that
it can resume sending frames immediately. When the receiver experiences some processing delay or its buffers
reach a defined threshold, it can transmit a PFC pause frame with a nonzero quanta. After the buffers are
sufficiently freed, the receiver can transmit another PFC pause frame containing a zero quanta, which in turn
signals the sender to resume traffic. This PFC pause mechanism works in each direction of the traffic flow
independently of the other.
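To give a sense of scale for the quanta value, the sketch below converts it to a pause time. It relies on a detail that this guide does not state and that is taken from the IEEE Ethernet flow control definitions: one pause quantum equals 512 bit times, and the field is 16 bits wide. The function name `pause_duration_seconds` is hypothetical.

```python
PAUSE_QUANTUM_BITS = 512     # assumption from IEEE 802.3/802.1Qbb: 512 bit times
MAX_QUANTA = 0xFFFF          # the quanta field is 16 bits wide


def pause_duration_seconds(quanta, link_bps):
    """Return how long a PFC pause frame asks the sender to stay quiet."""
    if not 0 <= quanta <= MAX_QUANTA:
        raise ValueError("quanta must fit in 16 bits")
    if link_bps <= 0:
        raise ValueError("link speed must be positive")
    return quanta * PAUSE_QUANTUM_BITS / link_bps


# A zero quanta means "resume immediately".
resume = pause_duration_seconds(0, 10_000_000_000)

# The largest quanta on a 10-Gbps link pauses for a few milliseconds.
longest = pause_duration_seconds(MAX_QUANTA, 10_000_000_000)
print(f"max pause at 10 Gbps: {longest * 1000:.3f} ms")
```

Because a single pause frame covers only a short interval at these speeds, a receiver that remains congested has to keep sending nonzero-quanta pause frames to hold the sender off.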
Devices that do not accept frames at the rate at which the sender generates them are referred to as slow-drain
devices. A slow-drain device can be either a Fibre Channel device or an FCoE device. Although the underlying
flow control mechanism differs between Fibre Channel and FCoE, devices of either type can equally cause
congestion in the SAN.
Slow-drain devices can be detected, and actions can be taken to drop all frames, or only old frames that exceed
the configured threshold, that are queued to the slow-drain devices; reset credits on the affected ports; flap the
affected ports; error-disable the affected ports; or isolate the traffic flows to the slow-drain devices. The
Congestion Detection, Congestion Avoidance, and Congestion Isolation features are used to detect slow-drain
devices and take appropriate actions on them.
The slow drain condition can be classified in the following four levels:
• Level 3—Indicates severe congestion. Ports are without credits for a continuous amount of time and
Cisco MDS 9000 Series Interfaces Configuration Guide, Release 8.x
Credit Loss Recovery to be initiated. An F port must be without credits continuously for 1 second, and an
E port for 1.5 seconds. When this type of congestion
occurs, a Fibre Channel primitive Link Credit Reset (LR) is sent to restore the BB_credits on the link in
Congestion Detection, Avoidance, and Isolation