[Scrap] WHEA Hardware Errors

99iberty 2015. 3. 25. 11:22

https://msdn.microsoft.com/en-us/library/windows/hardware/ff560537(v=vs.85).aspx

WHEA Hardware Error Events (Windows Server 2008, Windows Vista SP1 and Later)

Beginning with Windows Server 2008 and Windows Vista SP1, when a hardware error occurs, the operating system creates an error record that describes the error condition. It then sends an Event Tracing for Windows (ETW) event that contains the error record to user mode. The format of the error record is based on the Common Platform Error Record (CPER), as described in Appendix N of version 2.2 of the Unified Extensible Firmware Interface (UEFI) Specification. The operating system also creates an error-specific hardware error event from the ETW hardware error event and logs that error-specific event in the system event log.
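You can check the record format directly. The Python sketch below assumes the record is available as a hex string, like the RawData field of the Event ID 1 entry quoted from the TechNet forum later in this post. It decodes only the first 24 bytes of the 128-byte CPER record header, using the UEFI field layout. The function and constant names are illustrative and are not part of any Windows API.

```python
import struct

# Error severity codes used in the CPER record header (UEFI spec, Appendix N).
SEVERITY_NAMES = {0: "Recoverable", 1: "Fatal", 2: "Corrected", 3: "Informational"}

# Leading fields of the 128-byte CPER record header, little-endian and packed:
# SignatureStart (4 bytes), Revision (2), SignatureEnd (4), SectionCount (2),
# ErrorSeverity (4), ValidationBits (4), RecordLength (4) = 24 bytes.
_HEADER_PREFIX = struct.Struct("<4sHIHIII")


def parse_cper_header(raw_hex: str) -> dict:
    """Decode the leading CPER header fields from a hex string (illustrative helper)."""
    data = bytes.fromhex(raw_hex)
    if len(data) < _HEADER_PREFIX.size:
        raise ValueError("data too short for a CPER record header")
    (signature, revision, signature_end, section_count,
     severity, validation_bits, record_length) = _HEADER_PREFIX.unpack_from(data, 0)
    # A CPER record starts with ASCII "CPER" and has SignatureEnd = 0xFFFFFFFF.
    if signature != b"CPER" or signature_end != 0xFFFFFFFF:
        raise ValueError("not a CPER record")
    return {
        "revision": revision,
        "section_count": section_count,
        "severity": SEVERITY_NAMES.get(severity, f"Unknown ({severity})"),
        "validation_bits": validation_bits,
        "record_length": record_length,
    }


if __name__ == "__main__":
    # First 24 bytes of the RawData in the forum's Event ID 1 entry.
    sample = "435045521002FFFFFFFF02000100000002000000DC010000"
    print(parse_cper_header(sample))
```

For that sample, the sketch reports a Fatal record with two sections and a RecordLength of 476. That value matches the event's Length field. To identify the failing component, you would also need to decode the section descriptors that follow the 128-byte header, which this sketch does not attempt.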

User-mode applications can process WHEA error records in two ways. They can register for notification of the ETW events sent by the operating system, or they can query the system event log for the error-specific WHEA hardware error events.
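For the event-log route, you can restrict a query to the WHEA-Logger provider by using the XPath filter syntax that Windows Event Log accepts, for example through `wevtutil qe ... /q:`. The Python sketch below only builds the filter string and an argument list for `wevtutil`. The helper names are hypothetical, and running the resulting command assumes a Windows host. Consuming the ETW events in real time is a separate mechanism and is not shown here.

```python
from typing import Iterable, List, Optional

# Provider name of the error-specific events written to the System log.
WHEA_LOGGER_PROVIDER = "Microsoft-Windows-WHEA-Logger"


def whea_logger_query(event_ids: Optional[Iterable[int]] = None) -> str:
    """Build an event-log XPath filter for WHEA-Logger events (hypothetical helper)."""
    condition = f"Provider[@Name='{WHEA_LOGGER_PROVIDER}']"
    ids = sorted(set(event_ids or []))
    if ids:
        condition += " and (" + " or ".join(f"EventID={i}" for i in ids) + ")"
    return f"*[System[{condition}]]"


def wevtutil_command(event_ids: Optional[Iterable[int]] = None, count: int = 20) -> List[str]:
    """Argument list for: wevtutil qe System /q:<filter> /c:<count> /rd:true /f:text."""
    return [
        "wevtutil", "qe", "System",
        f"/q:{whea_logger_query(event_ids)}",
        f"/c:{count}",  # maximum number of events to return
        "/rd:true",     # newest events first
        "/f:text",      # human-readable output instead of XML
    ]


if __name__ == "__main__":
    print(whea_logger_query([1, 18]))
    # On Windows (assumption): subprocess.run(wevtutil_command([1, 18]), capture_output=True, text=True)
```

The function sorts and de-duplicates the event IDs, so the same input set always produces the same filter string.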

The set of defined WHEA hardware error events, and the data templates that describe them, differ between two kinds of events. The first kind is the ETW hardware error events that the operating system sends up to user mode. The second kind is the error-specific hardware error events that are logged in the system event log.

WHEA ETW Hardware Error Events

Beginning with Windows Server 2008 and Windows Vista SP1, the GUID for the provider of the WHEA ETW hardware error events that the operating system sends up to user mode is WHEA_ETW_PROVIDER. The name for the provider is Microsoft-Windows-Kernel-WHEA.

The following table describes the WHEA ETW hardware error event that is defined for Windows Server 2008, Windows Vista SP1 and later versions of Windows.

Event               Description
EVENT_WHEA_ERROR    ETW hardware error event.

The WHEA ETW hardware error event has an associated event identifier. An application can determine if a received WHEA ETW event is a hardware error event by examining the EventID system value that is associated with the WHEA ETW event. The WHEA ETW hardware error event uses a data template to describe the hardware error data that is associated with the event.

The following table lists the event identifier and the data template that is associated with the WHEA ETW hardware error event.

Event               Event ID    Data template
EVENT_WHEA_ERROR    20          WHEAEvent

The data template is described in the following section.

WHEAEvent

Note: This data template is used only for Windows Server 2008, Windows Vista SP1, and later versions of Windows. For more information about the data templates used for Windows Vista, see WHEA Hardware Error Events (Windows Vista).
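The numeric event ID alone is ambiguous across the two providers. ID 20 is EVENT_WHEA_ERROR for Microsoft-Windows-Kernel-WHEA, but it is WHEALOGR_XPF_AMD64NB_MCA_ERROR for Microsoft-Windows-WHEA-Logger (see the identifier tables below). Here is a minimal sketch of a check that keys on both the provider name and the event ID. The function name is illustrative, and comparing provider names case-insensitively is an assumption.

```python
# Provider of the ETW hardware error event that the OS sends up to user mode.
KERNEL_WHEA_PROVIDER = "Microsoft-Windows-Kernel-WHEA"
EVENT_WHEA_ERROR_ID = 20  # EventID of EVENT_WHEA_ERROR (data template: WHEAEvent)


def is_whea_etw_hardware_error(provider_name: str, event_id: int) -> bool:
    """True only for EVENT_WHEA_ERROR; the same ID from another provider means something else."""
    # Assumption: provider names are compared case-insensitively.
    return (provider_name.casefold() == KERNEL_WHEA_PROVIDER.casefold()
            and event_id == EVENT_WHEA_ERROR_ID)


if __name__ == "__main__":
    print(is_whea_etw_hardware_error("Microsoft-Windows-Kernel-WHEA", 20))  # True
    print(is_whea_etw_hardware_error("Microsoft-Windows-WHEA-Logger", 20))  # False
```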

WHEA Error-Specific Hardware Error Events

Beginning with Windows Server 2008 and Windows Vista SP1, the GUID for the provider of the WHEA error-specific hardware error events that are logged in the system event log is WHEA_TS_ETW_PROVIDER. The name for the provider is Microsoft-Windows-WHEA-Logger.

The following table describes each of the WHEA error-specific hardware error events that are defined for Windows Server 2008, Windows Vista SP1, and later versions of Windows.

Event                               Description
WHEALOGR_DEFAULT_ERROR              Generic uncorrected hardware error.
WHEALOGR_DEFAULT_WARNING            Generic corrected hardware error.
WHEALOGR_DEFAULT_INFO               Generic error information.
WHEALOGR_PCIE_ERROR                 Uncorrected PCI Express error.
WHEALOGR_PCIE_WARNING               Corrected PCI Express error.
WHEALOGR_XPF_MCA_ERROR              Uncorrected machine check error.
WHEALOGR_XPF_MCA_WARNING            Corrected machine check error.
WHEALOGR_XPF_AMD64NB_MCA_ERROR      Uncorrected AMD64 northbridge machine check error.
WHEALOGR_XPF_AMD64NB_MCA_WARNING    Corrected AMD64 northbridge machine check error.
WHEALOGR_PLATFORM_MEMORY_ERROR      Uncorrected platform memory error.
WHEALOGR_PLATFORM_MEMORY_WARNING    Corrected platform memory error.
WHEALOGR_PCIXBUS_ERROR              Uncorrected PCI/PCI-X bus error.
WHEALOGR_PCIXBUS_WARNING            Corrected PCI/PCI-X bus error.
WHEALOGR_PCIXDEVICE_ERROR           Uncorrected PCI/PCI-X device error.
WHEALOGR_PCIXDEVICE_WARNING         Corrected PCI/PCI-X device error.

All these hardware error events are recorded in the system event log.

WHEA Error-Specific Hardware Error Identifiers

Each of the WHEA error-specific hardware error events has an associated event identifier. An application can determine which types of errors have occurred by examining the EventID system value that is associated with each WHEA error-specific hardware error event found in the system event log.

The following table lists the event identifier and the data template that is associated with each of the WHEA error-specific hardware error events that are defined for Windows Server 2008, Windows Vista SP1, and later versions of Windows.

Event                               Event ID    Data template
WHEALOGR_DEFAULT_ERROR              1           tidDefaultError
WHEALOGR_DEFAULT_WARNING            2           tidDefaultError
WHEALOGR_DEFAULT_INFO               3           tidDefaultError
WHEALOGR_PCIE_ERROR                 16          tidPCIExpressError
WHEALOGR_PCIE_WARNING               17          tidPCIExpressError
WHEALOGR_XPF_MCA_ERROR              18          tidMachineCheck
WHEALOGR_XPF_MCA_WARNING            19          tidMachineCheck
WHEALOGR_XPF_AMD64NB_MCA_ERROR      20          tidAMD64NBMachineCheck
WHEALOGR_XPF_AMD64NB_MCA_WARNING    21          tidAMD64NBMachineCheck
WHEALOGR_PLATFORM_MEMORY_ERROR      22          tidPlatformMemoryError
WHEALOGR_PLATFORM_MEMORY_WARNING    23          tidPlatformMemoryError
WHEALOGR_PCIXBUS_ERROR              24          tidPciXBusError
WHEALOGR_PCIXBUS_WARNING            25          tidPciXBusError
WHEALOGR_PCIXDEVICE_ERROR           26          tidPciXDeviceError
WHEALOGR_PCIXDEVICE_WARNING         27          tidPciXDeviceError

WHEA Error-Specific Hardware Error Events (Windows 7 and Later Versions of Windows)

Beginning with Windows 7, additional WHEA error-specific hardware error events are defined. The following table describes each of the WHEA error-specific hardware error events that are defined for Windows 7 and later versions of Windows.

Event                                      Description
WHEALOGR_IPF_MCA_ERROR                     Uncorrected Itanium machine check error.
WHEALOGR_IPF_MCA_WARNING                   Corrected Itanium machine check error.
WHEALOGR_PCIE_NODEVICEID_ERROR             Uncorrected PCI Express error.
WHEALOGR_PCIE_NODEVICEID_WARNING           Corrected PCI Express error.
WHEALOGR_PCIXBUS_NODEVICEID_ERROR          Uncorrected PCI/PCI-X bus error.
WHEALOGR_PCIXBUS_NODEVICEID_WARNING        Corrected PCI/PCI-X bus error.
WHEALOGR_PCIXDEVICE_NODEVICEID_ERROR       Uncorrected PCI/PCI-X device error.
WHEALOGR_PCIXDEVICE_NODEVICEID_WARNING     Corrected PCI/PCI-X device error.
WHEALOGR_PLATFORM_MEMORY_NOTYPE_ERROR      Uncorrected platform memory error of an undetermined type.
WHEALOGR_PLATFORM_MEMORY_NOTYPE_WARNING    Corrected platform memory error of an undetermined type.

All these hardware error events are recorded in the system event log.

WHEA Error-Specific Hardware Error Identifiers (Windows 7 and Later Versions of Windows)

Beginning with Windows 7, additional WHEA error event identifiers are defined. The following table lists the event identifier and the data template that is associated with each of the WHEA error-specific hardware error events that are defined for Windows 7 and later versions of Windows.

Event                                      Event ID    Data template
WHEALOGR_IPF_MCA_ERROR                     38          tidIpfCheckInfo
WHEALOGR_IPF_MCA_WARNING                   39          tidIpfCheckInfo
WHEALOGR_PCIE_NODEVICEID_ERROR             40          tidPCIExpressError
WHEALOGR_PCIE_NODEVICEID_WARNING           41          tidPCIExpressError
WHEALOGR_PCIXBUS_NODEVICEID_ERROR          42          tidPciXBusError
WHEALOGR_PCIXBUS_NODEVICEID_WARNING        43          tidPciXBusError
WHEALOGR_PCIXDEVICE_NODEVICEID_ERROR       44          tidPciXDeviceError
WHEALOGR_PCIXDEVICE_NODEVICEID_WARNING     45          tidPciXDeviceError
WHEALOGR_PLATFORM_MEMORY_NOTYPE_ERROR      46          tidPlatformMemoryError
WHEALOGR_PLATFORM_MEMORY_NOTYPE_WARNING    47          tidPlatformMemoryError
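When reading the System log, you can combine the two identifier tables above into a single lookup. The Python sketch below transcribes only the IDs listed in this post, and the dictionary and function names are hypothetical. The corrected/uncorrected label is derived from the _ERROR, _WARNING, or _INFO suffix, which matches the event descriptions earlier. IDs that are not in these tables return None.

```python
from typing import Optional, Tuple

# Event ID -> (event name, data template), transcribed from the tables above.
WHEA_LOGGER_EVENTS = {
    # Windows Server 2008, Windows Vista SP1 and later
    1: ("WHEALOGR_DEFAULT_ERROR", "tidDefaultError"),
    2: ("WHEALOGR_DEFAULT_WARNING", "tidDefaultError"),
    3: ("WHEALOGR_DEFAULT_INFO", "tidDefaultError"),
    16: ("WHEALOGR_PCIE_ERROR", "tidPCIExpressError"),
    17: ("WHEALOGR_PCIE_WARNING", "tidPCIExpressError"),
    18: ("WHEALOGR_XPF_MCA_ERROR", "tidMachineCheck"),
    19: ("WHEALOGR_XPF_MCA_WARNING", "tidMachineCheck"),
    20: ("WHEALOGR_XPF_AMD64NB_MCA_ERROR", "tidAMD64NBMachineCheck"),
    21: ("WHEALOGR_XPF_AMD64NB_MCA_WARNING", "tidAMD64NBMachineCheck"),
    22: ("WHEALOGR_PLATFORM_MEMORY_ERROR", "tidPlatformMemoryError"),
    23: ("WHEALOGR_PLATFORM_MEMORY_WARNING", "tidPlatformMemoryError"),
    24: ("WHEALOGR_PCIXBUS_ERROR", "tidPciXBusError"),
    25: ("WHEALOGR_PCIXBUS_WARNING", "tidPciXBusError"),
    26: ("WHEALOGR_PCIXDEVICE_ERROR", "tidPciXDeviceError"),
    27: ("WHEALOGR_PCIXDEVICE_WARNING", "tidPciXDeviceError"),
    # Windows 7 and later
    38: ("WHEALOGR_IPF_MCA_ERROR", "tidIpfCheckInfo"),
    39: ("WHEALOGR_IPF_MCA_WARNING", "tidIpfCheckInfo"),
    40: ("WHEALOGR_PCIE_NODEVICEID_ERROR", "tidPCIExpressError"),
    41: ("WHEALOGR_PCIE_NODEVICEID_WARNING", "tidPCIExpressError"),
    42: ("WHEALOGR_PCIXBUS_NODEVICEID_ERROR", "tidPciXBusError"),
    43: ("WHEALOGR_PCIXBUS_NODEVICEID_WARNING", "tidPciXBusError"),
    44: ("WHEALOGR_PCIXDEVICE_NODEVICEID_ERROR", "tidPciXDeviceError"),
    45: ("WHEALOGR_PCIXDEVICE_NODEVICEID_WARNING", "tidPciXDeviceError"),
    46: ("WHEALOGR_PLATFORM_MEMORY_NOTYPE_ERROR", "tidPlatformMemoryError"),
    47: ("WHEALOGR_PLATFORM_MEMORY_NOTYPE_WARNING", "tidPlatformMemoryError"),
}


def describe_whea_logger_event(event_id: int) -> Optional[Tuple[str, str, str]]:
    """Return (name, data template, kind) for a WHEA-Logger event ID, or None."""
    entry = WHEA_LOGGER_EVENTS.get(event_id)
    if entry is None:
        return None
    name, template = entry
    # _ERROR = uncorrected, _WARNING = corrected, _INFO = informational.
    if name.endswith("_ERROR"):
        kind = "uncorrected"
    elif name.endswith("_WARNING"):
        kind = "corrected"
    else:
        kind = "informational"
    return name, template, kind


if __name__ == "__main__":
    print(describe_whea_logger_event(18))
```

For instance, the forum's Event ID 1 maps to WHEALOGR_DEFAULT_ERROR, and the wiki article's Event ID 18 maps to WHEALOGR_XPF_MCA_ERROR.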

 

WHEA Error-Specific Hardware Error Event Data Templates

Each of the data templates is described in the following sections.

tidAMD64NBMachineCheck

tidDefaultError

tidIpfCheckInfo

tidMachineCheck

tidPCIExpressError

tidPciXBusError

tidPciXDeviceError

tidPlatformMemoryError

WHEAEvent

 


=======================================================================================================================

https://social.technet.microsoft.com/Forums/windowsserver/en-US/aac123d6-bd5d-4080-9732-c342ca29110f/whealogger-error

Whea-Logger Error

    Question

  • Event ID: 1

    Execution Process 1844 refers to Windows Firewall.

    The event log has the following;

    Log Name: System
    Source: Microsoft-Windows-WHEA-Logger
    Date: 1/31/2012 2:19:01 PM
    Event ID: 1
    Task Category: None
    Level: Error
    Keywords: WHEA Error Event Logs
    User: LOCAL SERVICE
    Computer: ANAHEIM-SERVER.bcwirerope.local
    Description:
    A fatal hardware error has occurred. A record describing the condition is contained in the data section of this event.
    Event Xml:
    <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
    <System>
    <Provider Name="Microsoft-Windows-WHEA-Logger" Guid="{C26C4F3C-3F66-4E99-8F8A-39405CFED220}" />
    <EventID>1</EventID>
    <Version>0</Version>
    <Level>2</Level>
    <Task>0</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000000000000002</Keywords>
    <TimeCreated SystemTime="2012-01-31T22:19:01.753893000Z" />
    <EventRecordID>104478</EventRecordID>
    <Correlation ActivityID="{FFD9176B-38C7-43CC-B1D8-6785F3A495CB}" />
    <Execution ProcessID="1844" ThreadID="8128" />
    <Channel>System</Channel>
    <Computer>ANAHEIM-SERVER.bcwirerope.local</Computer>
    <Security UserID="S-1-5-19" />
    </System>
    <EventData>
    <Data Name="Length">476</Data>
    <Data Name="RawData">435045521002FFFFFFFF02000100000002000000DC0100001D3215001F010C140000000000000000000000000000000000000000000000000000000000000000BDC407CF89B7184EB3C41F732CB57131FF89AD5BE6B7C942814ACF2485D6E98A044ECD672DE0CC0100000000455200000000000000000000000000000000000010010000C00000000102000000000000ADCC7698B447DB4BB65E16F193C4F3DB00000000000000000000000000000000030000000000000000000000000000000000000000000000D00100000C0000000102000001000000E75412E7B9C14049AB76909703A4320F0000000000000000000000000000000001000000000000000000000000000000000000000000000043010000000000000002000000000000C206020000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000013000000000000000000000000000000000000000000000000000000000000000000000000000000310000000000000000000000</Data>
    </EventData>
    </Event>

    This has happened twice. It completely shuts down the server and says to call the hardware manufacturer. Intel, of course, refers me back to Microsoft.

    Any ideas which piece of hardware is generating this?


    Jay Doyle
    Wednesday, February 01, 2012 7:22 PM

Answers

  • OK, thank you for your response. I will update the drivers. All OS updates have been installed. The problem has only happened twice in about 45 days, so I will monitor it. There is no specific hardware reference in the logs.

    Jay


    Jay Doyle
    • Marked as answer by JayAAE Wednesday, February 08, 2012 12:28 AM
    Friday, February 03, 2012 12:23 PM

All replies

  • Hi,

    Thanks for posting in Windows Server forum.

    Are there any other errors in the System log related to hardware failure? From this event log, we cannot determine which piece of hardware caused the error.

    To troubleshoot, please update all hardware drivers to the latest versions and install the latest updates for your Windows OS. If the issue still persists, it is likely a hardware problem. In that situation, it’s better to contact the hardware manufacturer to resolve this issue.

    Best Regards,

    Aiden


    Aiden Cao

    TechNet Community Support

    Friday, February 03, 2012 5:49 AM
  • In the end, the server failed and would not restart. I contacted Intel and replaced the server motherboard, and the problem was finally solved.

    In this case, the WHEA-Logger error was caused by hardware.


    Jay Doyle

    • Proposed as answer by Klaus Bremer Saturday, July 14, 2012 6:37 AM
    Wednesday, March 21, 2012 11:21 PM

-----> Reportedly, the problem was resolved after the motherboard was replaced.

=======================================================================================================================

 

http://social.technet.microsoft.com/wiki/contents/articles/3567.event-id-18-microsoft-windows-whea-logger.aspx

Event ID 18: Microsoft-Windows-WHEA-Logger



Applies to:


Windows Server 2008, Windows Vista, Windows Server 2008 R2, Windows 7

Details

Product:

Windows Operating System

Event ID:

18

Source:

Microsoft-Windows-WHEA-Logger

Version:

6.1

Symbolic Name:

WHEALOGR_XPF_MCA_ERROR

Message:

A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Unknown Error
Processor ID: 1

The details view of this entry contains further information.
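When many of these entries accumulate, you can extract the "Key: Value" lines from the rendered message and tally them, for example to see whether the same Processor ID keeps reporting. The Python sketch below assumes the message text looks exactly as shown above; field labels may differ in localized or other Windows versions. The function names are illustrative. A tally is only a triage aid and does not replace a hardware diagnosis.

```python
from collections import Counter
from typing import Dict, Iterable


def parse_whea_message(message: str) -> Dict[str, str]:
    """Collect 'Key: Value' lines from a rendered WHEA-Logger message."""
    fields = {}
    for line in message.splitlines():
        key, sep, value = line.partition(":")
        # Skip free-text lines (no colon) and lines with an empty key or value.
        if sep and key.strip() and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def tally_processor_ids(messages: Iterable[str]) -> Counter:
    """Count how often each Processor ID appears across messages."""
    return Counter(parse_whea_message(m).get("Processor ID", "unknown") for m in messages)


if __name__ == "__main__":
    sample = (
        "A fatal hardware error has occurred.\n\n"
        "Reported by component: Processor Core\n"
        "Error Source: Machine Check Exception\n"
        "Error Type: Unknown Error\n"
        "Processor ID: 1\n\n"
        "The details view of this entry contains further information."
    )
    print(parse_whea_message(sample))
    print(tally_processor_ids([sample, sample]))
```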


Explanation


This error indicates that there is a hardware problem. A machine check exception indicates a computer hardware error that occurs when a computer's central processing unit detects a hardware problem.

Note: WHEA stands for Windows Hardware Error Architecture.

Some of the main hardware problems which cause machine check exceptions include:


  • System bus errors (error communicating between the processor and the motherboard)
  • Memory errors that may include parity and error correction code (ECC) problems. Error checking ensures that data is stored correctly in the RAM; if information is corrupted, then random errors occur.
  • Cache errors in the processor; the cache stores important data and code. If this is corrupted, errors often occur.
  • Poor voltage regulation (i.e. power supply problem, voltage regulator malfunction, capacitor degradation)
  • Damage due to power spikes
  • Static damage to the motherboard
  • Incorrect processor voltage setting in the BIOS (too low or too high)
  • Overclocking
  • Permanent motherboard or power supply damage caused by prior overclocking
  • Excessive temperature caused by insufficient airflow (possibly caused by fan failure or blockage of air inlet/outlet)
  • Improper BIOS initialization (the BIOS configuring the motherboard or CPU incorrectly)
  • Installation of a processor that is too much for your motherboard to handle (excessive power requirement, incompatibility)
  • Defective hardware that may be drawing excessive power or otherwise disrupting proper voltage regulation

User Action

  • Update the BIOS and the drivers for the motherboard chipset.
  • Update all the hardware drivers, if updates are available from your manufacturer.
  • Check the temperature inside the computer to make sure your processor and related peripherals are not overheating.
  • Check the fan on your CPU to make sure it is properly attached to the CPU.
  • If you have overclocked your CPU, reset your settings to the default settings.
  • Make sure your power supply fan is working correctly.

Related Information


WHEA Design Guide

http://msdn.microsoft.com/en-us/library/ff559288(v=vs.85).aspx

WHEA - Windows Hardware Error Architecture Overview

http://msdn.microsoft.com/en-us/windows/hardware/gg463286