How Root handles error messages from other PCIe devices

This article covers the Advanced Error Reporting (AER) registers for correctable and uncorrectable errors, and explains how the Root handles error messages received from other PCIe devices.

Ø Advanced Correctable Error Handling

o Advanced correctable error status

The Advanced Correctable Error Status register is shown in the figure below. When an error occurs, the hardware automatically sets the corresponding bit to 1. Software clears the bit by writing 1 to it (write-1-to-clear).
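To make the write-1-to-clear behaviour concrete, here is a minimal Python sketch that models the register in software. The bit positions are those defined for the Correctable Error Status register in the PCIe Base Specification (only a subset is listed). The class and function names are hypothetical, and no real hardware is accessed.

```python
from __future__ import annotations

# Correctable Error Status bit positions (subset, per the PCIe Base Specification).
COR_ERRORS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
}


class CorrectableStatus:
    """Toy model of a write-1-to-clear (RW1C) status register."""

    def __init__(self) -> None:
        self.value = 0

    def hw_set(self, bit: int) -> None:
        # Hardware sets the bit when the corresponding error is detected.
        self.value |= 1 << bit

    def sw_write(self, data: int) -> None:
        # Writing 1 clears a bit; writing 0 leaves it unchanged.
        self.value &= ~data & 0xFFFFFFFF


def decode(value: int) -> list[str]:
    """Return the names of the listed errors whose status bits are set."""
    return [name for bit, name in sorted(COR_ERRORS.items()) if value & (1 << bit)]
```

Writing 0 to a bit has no effect, which is why software can clear one error without disturbing the others.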

o Advanced correctable error masking

The Advanced Correctable Error Mask register is shown in the figure below. By default these bits are 0, with one exception: the Advisory Non-Fatal Error Mask bit defaults to 1. In other words, as long as error reporting is enabled, an error whose mask bit is 0 is reported rather than masked. Software can mask the reporting of a particular error by setting the corresponding bit to 1.
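The reporting decision described above can be sketched as a simple bit operation. This is an illustrative model rather than driver code: `reported_correctable` is a hypothetical helper, and the `reporting_enabled` flag is an assumption standing in for the device's correctable error reporting enable setting.

```python
# Default Correctable Error Mask value per the PCIe Base Specification:
# only the Advisory Non-Fatal Error Mask (bit 13) is set.
DEFAULT_COR_MASK = 1 << 13


def reported_correctable(status: int, mask: int, reporting_enabled: bool) -> int:
    """Return the status bits that would actually be reported.

    A set status bit is reported only if its mask bit is 0 and reporting
    is enabled. `reporting_enabled` is a simplifying assumption of this model.
    """
    if not reporting_enabled:
        return 0
    return status & ~mask & 0xFFFFFFFF
```

With a status of `0x41` (Receiver Error and Bad TLP) and only bit 6 masked, the helper returns `0x1`: Bad TLP stays recorded in the status register but is not reported.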

Ø Advanced Uncorrectable Error Handling

o Advanced uncorrectable error status

The Advanced Uncorrectable Error Status register is shown in the figure below. When an error occurs, the corresponding bit is set to 1 regardless of whether the error is reported to the Root.

Recall the First Error Pointer described in the previous article. Suppose the pointer holds the value 18 (decimal). This indicates that the error corresponding to bit 18 of the Uncorrectable Error Status register, Malformed TLP, is handled first. Once that error has been handled, software writes 1 to bit 18 of the Uncorrectable Error Status register to clear it, and the First Error Pointer is then updated to the next value.
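The following sketch shows how software might use the First Error Pointer. It assumes the pointer sits in bits 4:0 of the Advanced Error Capabilities and Control register, as in the PCIe Base Specification. `first_error` and `write_1_to_clear` are hypothetical helpers operating on plain integers.

```python
from __future__ import annotations

# Uncorrectable Error Status bit positions (subset, per the PCIe Base Specification).
UNC_ERRORS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request Error",
}

# The First Error Pointer occupies bits 4:0 of the
# Advanced Error Capabilities and Control register.
FEP_MASK = 0x1F


def first_error(cap_ctrl: int) -> tuple[int, str, int]:
    """Return (bit, error name, value software writes to clear that status bit)."""
    bit = cap_ctrl & FEP_MASK
    name = UNC_ERRORS.get(bit, "not in this table")
    return bit, name, 1 << bit


def write_1_to_clear(status: int, written: int) -> int:
    """Model an RW1C register: every bit written as 1 is cleared."""
    return status & ~written & 0xFFFFFFFF
```

For example, `first_error(0xB2)` yields `(18, "Malformed TLP", 0x40000)`: the upper bits of the register are ignored, and writing `0x40000` back to the status register clears only bit 18.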

Through the Advanced Uncorrectable Error Severity register, software can choose whether each uncorrectable error is treated as fatal or non-fatal, so that the two classes can be handled separately. As shown in the figure below, a bit value of 0 means Non-Fatal and 1 means Fatal.
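A minimal sketch of how the Severity register splits errors into the two classes, assuming a set bit means Fatal as described above. The helper names are hypothetical, and the severity value in the usage example is chosen for illustration only; it is not a default.

```python
from __future__ import annotations


def error_message(bit: int, severity: int) -> str:
    """Message an uncorrectable error is reported with (severity bit 1 = fatal)."""
    return "ERR_FATAL" if severity & (1 << bit) else "ERR_NONFATAL"


def split_by_severity(status: int, severity: int) -> tuple[list[int], list[int]]:
    """Split the set status bits into (fatal, non_fatal) lists of bit positions."""
    set_bits = [b for b in range(32) if status & (1 << b)]
    fatal = [b for b in set_bits if severity & (1 << b)]
    non_fatal = [b for b in set_bits if not (severity & (1 << b))]
    return fatal, non_fatal
```

With `severity = (1 << 18) | (1 << 4)`, a status of `(1 << 18) | (1 << 14)` splits into `([18], [14])`: Malformed TLP is treated as fatal and Completion Timeout as non-fatal.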

o Advanced uncorrectable error masking

The Advanced Uncorrectable Error Mask register is shown in the figure below. When a bit is set to 1, the corresponding error type is not reported.
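Masking and severity combine as follows: a masked error is still recorded in the status register but produces no error message. The sketch below is a hypothetical model on plain integers, not a real API.

```python
from __future__ import annotations


def reportable(status: int, mask: int, severity: int) -> list[tuple[int, str]]:
    """List (bit, message) for every uncorrectable error that is not masked.

    Masked errors still show up in `status`; they are simply not reported.
    """
    pending = status & ~mask & 0xFFFFFFFF
    return [
        (b, "ERR_FATAL" if severity & (1 << b) else "ERR_NONFATAL")
        for b in range(32)
        if pending & (1 << b)
    ]
```

Masking bit 14 in a status of `(1 << 18) | (1 << 14)` leaves only `(18, ...)` in the result, even though both status bits remain set.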

The AER structure in the configuration space contains a 4DW region that stores the header of a received TLP that triggered an unmasked uncorrectable error. The PCIe specification requires a device that supports AER to be able to log at least one TLP header (4DW); some devices can log more. This region is called the Header Log register, and the error types for which a header is logged are shown in the figure below.
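As an illustration of what software can do with the logged header, this sketch extracts a few fields from the 4 DWs. It assumes the Header Log is read as four 32-bit values with header byte 0 in bits 31:24 of the first DW, and that Fmt is the 3-bit field in bits 31:29. The function name and the sample values are hypothetical.

```python
from __future__ import annotations


def decode_header_log(dws: list[int]) -> dict:
    """Decode a few fields of a TLP header captured in the 4-DW Header Log."""
    if len(dws) != 4:
        raise ValueError("the Header Log holds exactly 4 DWs")
    dw0, dw1 = dws[0], dws[1]
    fmt = (dw0 >> 29) & 0x7          # Fmt: header size and data presence
    tlp_type = (dw0 >> 24) & 0x1F    # Type field
    length = dw0 & 0x3FF             # Length in DW
    return {
        "fmt": fmt,
        "type": tlp_type,
        "has_data": bool(fmt & 0b010),
        "header_dws": 4 if fmt & 0b001 else 3,
        "length_dw": length or 1024,  # a Length field of 0 encodes 1024 DW
        # Requester ID for request TLPs; for completions these bits
        # hold the Completer ID instead.
        "requester_id": (dw1 >> 16) & 0xFFFF,
    }
```

For the sample header `[0x40000001, 0x01000FFF, 0xFEE00000, 0x0]`, the decoder reports Fmt 2 (3DW header with data), Type 0 (a memory write), a length of 1 DW and requester ID `0x0100`.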

In a PCIe topology, the Root is the destination of error reports from all other PCIe devices. When the Root receives an error message from another device, it decides, according to the system settings, whether to report the error to the system and by what means (for example, an interrupt).

Note: The interrupt mechanism of PCIe will be introduced in detail in a subsequent article.

When the Root receives an error message, it sets the corresponding bit in the Root Error Status register. Note that the Root is itself a PCIe device: when it detects an error of its own, the corresponding bit in the Root Error Status register is set as though it had received an error message. The register is shown in the figure below:
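The way these status bits accumulate can be modelled as a small state update. The bit layout follows the Root Error Status register in the PCIe Base Specification. `receive` is a hypothetical helper; errors detected by the Root itself would update the register in the same way.

```python
# Root Error Status bits (per the PCIe Base Specification).
COR_RCV = 1 << 0          # ERR_COR Received
MULTI_COR_RCV = 1 << 1    # Multiple ERR_COR Received
UNCOR_RCV = 1 << 2        # ERR_FATAL/NONFATAL Received
MULTI_UNCOR_RCV = 1 << 3  # Multiple ERR_FATAL/NONFATAL Received
FIRST_FATAL = 1 << 4      # First Uncorrectable Fatal
NONFATAL_RCV = 1 << 5     # Non-Fatal Error Messages Received
FATAL_RCV = 1 << 6        # Fatal Error Messages Received


def receive(status: int, msg: str) -> int:
    """Return the Root Error Status value after one error message arrives."""
    if msg == "ERR_COR":
        # A second ERR_COR while the first is still pending sets "Multiple".
        status |= MULTI_COR_RCV if status & COR_RCV else COR_RCV
    elif msg in ("ERR_NONFATAL", "ERR_FATAL"):
        if status & UNCOR_RCV:
            status |= MULTI_UNCOR_RCV
        else:
            status |= UNCOR_RCV
            if msg == "ERR_FATAL":
                # The first uncorrectable message received was fatal.
                status |= FIRST_FATAL
        status |= FATAL_RCV if msg == "ERR_FATAL" else NONFATAL_RCV
    else:
        raise ValueError(f"not an error message: {msg}")
    return status
```

A fatal message arriving after a non-fatal one sets the Multiple and Fatal Messages Received bits but not First Uncorrectable Fatal, because the first uncorrectable message was non-fatal.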

As mentioned in the previous article, error messages are one type of PCIe message. An error message carries the ID of the device that reported the error, i.e. its BDF (Bus, Device, and Function numbers). From this ID the location of the error source can be determined, and the Root stores it in the Error Source Identification register, as shown in the figure below.
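The BDF is packed into a 16-bit ID (bus in bits 15:8, device in bits 7:3, function in bits 2:0). The sketch below decodes the register, assuming the ERR_COR source ID occupies bits 15:0 and the ERR_FATAL/NONFATAL source ID bits 31:16. The helper names are hypothetical.

```python
from __future__ import annotations


def bdf(requester_id: int) -> str:
    """Format a 16-bit requester ID as bus:device.function."""
    bus = (requester_id >> 8) & 0xFF
    dev = (requester_id >> 3) & 0x1F
    fn = requester_id & 0x7
    return f"{bus:02x}:{dev:02x}.{fn}"


def decode_source_id(reg: int) -> dict[str, str]:
    """Split the Error Source Identification register into its two IDs."""
    return {
        "ERR_COR": bdf(reg & 0xFFFF),
        "ERR_FATAL/NONFATAL": bdf((reg >> 16) & 0xFFFF),
    }
```

For a register value of `0x03080100`, the correctable error came from `01:00.0` and the uncorrectable one from `03:01.0`.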

Through the corresponding bits of the Root Error Command register, software can enable or disable the reporting of each error type to the system, as shown below:
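A minimal sketch of the gating performed by the Root Error Command register: an interrupt is only raised when a received error class is also enabled. Bit positions follow the PCIe Base Specification. `should_interrupt` is a hypothetical helper and ignores any other interrupt configuration a real Root Port depends on.

```python
# Root Error Command bits.
COR_EN = 1 << 0       # Correctable Error Reporting Enable
NONFATAL_EN = 1 << 1  # Non-Fatal Error Reporting Enable
FATAL_EN = 1 << 2     # Fatal Error Reporting Enable

# Matching Root Error Status bits.
COR_RCV = 1 << 0       # ERR_COR Received
NONFATAL_RCV = 1 << 5  # Non-Fatal Error Messages Received
FATAL_RCV = 1 << 6     # Fatal Error Messages Received


def should_interrupt(root_status: int, root_cmd: int) -> bool:
    """True if some received error class is also enabled for reporting."""
    return bool(
        (root_status & COR_RCV and root_cmd & COR_EN)
        or (root_status & NONFATAL_RCV and root_cmd & NONFATAL_EN)
        or (root_status & FATAL_RCV and root_cmd & FATAL_EN)
    )
```

If only fatal reporting is enabled, a pending correctable error alone does not raise an interrupt, but one arriving together with a fatal error does.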

Shenzhen Xcool Vapor Technology Co.,Ltd , http://www.xcoolvapor.com
