Intel® Microarchitecture Code Named Haswell Events

This section provides a reference for the hardware events that can be monitored on the following CPU(s):

  • 4th generation Intel® Core™ processor family

    EventName Description
    INST_RETIRED.ANY This event counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. INST_RETIRED.ANY is counted by a designated fixed counter, leaving the programmable counters available for other events. Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.
    CPU_CLK_UNHALTED.THREAD This event counts the number of thread cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. The core frequency may change from time to time due to power or thermal throttling.
    CPU_CLK_UNHALTED.REF_TSC This event counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state.
    LD_BLOCKS.STORE_FORWARD This event counts loads that followed a store to the same address, where the data could not be forwarded inside the pipeline from the store to the load. The most common reason why store forwarding would be blocked is when a load's address range overlaps with a preceding smaller uncompleted store. The penalty for blocked store forwarding is that the load must wait for the store to write its value to the cache before it can be issued.
    LD_BLOCKS.NO_SR The number of times that split load operations are temporarily blocked because all resources for handling the split accesses are in use.
    MISALIGN_MEM_REF.LOADS Speculative cache-line split load uops dispatched to L1D.
    MISALIGN_MEM_REF.STORES Speculative cache-line split store-address uops dispatched to L1D.
    LD_BLOCKS_PARTIAL.ADDRESS_ALIAS Aliasing occurs when a load is issued after a store and their memory addresses are offset by 4K. This event counts the number of loads that aliased with a preceding store, resulting in an extended address check in the pipeline which can have a performance impact.
    DTLB_LOAD_MISSES.MISS_CAUSES_A_WALK Misses in all TLB levels that cause a page walk of any page size.
    DTLB_LOAD_MISSES.WALK_COMPLETED_4K Completed page walks due to demand load misses that caused 4K page walks in any TLB levels.
    DTLB_LOAD_MISSES.WALK_COMPLETED_2M_4M Completed page walks due to demand load misses that caused 2M/4M page walks in any TLB levels.
    DTLB_LOAD_MISSES.WALK_COMPLETED_1G Load miss in all TLB levels causes a page walk that completes. (1G)
    DTLB_LOAD_MISSES.WALK_DURATION This event counts cycles when the page miss handler (PMH) is servicing page walks caused by DTLB load misses.
    DTLB_LOAD_MISSES.STLB_HIT_4K This event counts load operations from a 4K page that miss the first DTLB level but hit the second and do not cause page walks.
    DTLB_LOAD_MISSES.STLB_HIT_2M This event counts load operations from a 2M page that miss the first DTLB level but hit the second and do not cause page walks.
    DTLB_LOAD_MISSES.PDE_CACHE_MISS DTLB demand load misses with low part of linear-to-physical address translation missed.
    INT_MISC.RECOVERY_CYCLES This event counts the number of cycles spent waiting for a recovery after an event such as a processor nuke, JEClear, assist, or HLE/RTM abort.
    UOPS_ISSUED.ANY This event counts the number of uops issued by the Front-end of the pipeline to the Back-end. This event is counted at the allocation stage and will count both retired and non-retired uops.
    UOPS_ISSUED.FLAGS_MERGE Number of flags-merge uops allocated. Such uops add delay.
    UOPS_ISSUED.SLOW_LEA Number of slow LEA or similar uops allocated. Such a uop has 3 sources (for example, 2 sources + immediate), regardless of whether it is the result of an LEA instruction.
    UOPS_ISSUED.SINGLE_MUL Number of multiply packed/scalar single precision uops allocated.
    UOPS_ISSUED.STALL_CYCLES Cycles when Resource Allocation Table (RAT) does not issue Uops to Reservation Station (RS) for the thread.
    UOPS_ISSUED.CORE_STALL_CYCLES Cycles when Resource Allocation Table (RAT) does not issue Uops to Reservation Station (RS) for all threads.
    ARITH.DIVIDER_UOPS Any uop executed by the Divider. (This includes all divide uops, sqrt, ...)
    L2_RQSTS.DEMAND_DATA_RD_MISS Demand data read requests that missed L2, no rejects.
    L2_RQSTS.DEMAND_DATA_RD_HIT Counts the number of demand Data Read requests, initiated by load instructions, that hit L2 cache.
    L2_RQSTS.L2_PF_MISS Counts all L2 HW prefetcher requests that missed L2.
    L2_RQSTS.L2_PF_HIT Counts all L2 HW prefetcher requests that hit L2.
    L2_RQSTS.ALL_DEMAND_DATA_RD Counts any demand and L1 HW prefetch data load requests to L2.
    L2_RQSTS.ALL_RFO Counts all L2 store RFO requests.
    L2_RQSTS.ALL_CODE_RD Counts all L2 code requests.
    L2_RQSTS.ALL_PF Counts all L2 HW prefetcher requests.
    L2_DEMAND_RQSTS.WB_HIT Not rejected writebacks that hit L2 cache.
    LONGEST_LAT_CACHE.MISS This event counts each cache miss condition for references to the last level cache.
    LONGEST_LAT_CACHE.REFERENCE This event counts requests originating from the core that reference a cache line in the last level cache.
    CPU_CLK_THREAD_UNHALTED.REF_XCLK Increments at the frequency of XCLK (100 MHz) when not halted.
    CPU_CLK_THREAD_UNHALTED.ONE_THREAD_ACTIVE Count XClk pulses when this thread is unhalted and the other thread is halted.
    L1D_PEND_MISS.PENDING Increments the number of outstanding L1D misses every cycle. Set Cmask = 1 and Edge = 1 to count occurrences.
    L1D_PEND_MISS.REQUEST_FB_FULL Number of times a request needed an FB entry but no entry was available for it. That is, FB unavailability was the dominant reason for blocking the request. A request includes cacheable/uncacheable demands, that is, load, store, or SW prefetch. HWP are e.
    L1D_PEND_MISS.PENDING_CYCLES Cycles with L1D load Misses outstanding.
    DTLB_STORE_MISSES.MISS_CAUSES_A_WALK Miss in all TLB levels causes a page walk of any page size (4K/2M/4M/1G).
    DTLB_STORE_MISSES.WALK_COMPLETED_4K Completed page walks due to store misses in one or more TLB levels of 4K page structure.
    DTLB_STORE_MISSES.WALK_COMPLETED_2M_4M Completed page walks due to store misses in one or more TLB levels of 2M/4M page structure.
    DTLB_STORE_MISSES.WALK_COMPLETED_1G Store misses in all DTLB levels that cause completed page walks. (1G)
    DTLB_STORE_MISSES.WALK_DURATION This event counts cycles when the page miss handler (PMH) is servicing page walks caused by DTLB store misses.
    DTLB_STORE_MISSES.STLB_HIT_4K This event counts store operations from a 4K page that miss the first DTLB level but hit the second and do not cause page walks.
    DTLB_STORE_MISSES.STLB_HIT_2M This event counts store operations from a 2M page that miss the first DTLB level but hit the second and do not cause page walks.
    DTLB_STORE_MISSES.PDE_CACHE_MISS DTLB store misses with low part of linear-to-physical address translation missed.
    LOAD_HIT_PRE.SW_PF Non-SW-prefetch load dispatches that hit fill buffer allocated for S/W prefetch.
    LOAD_HIT_PRE.HW_PF Non-SW-prefetch load dispatches that hit fill buffer allocated for H/W prefetch.
    EPT.WALK_CYCLES Cycle count for an Extended Page table walk.
    L1D.REPLACEMENT This event counts when new data lines are brought into the L1 Data cache, which cause other lines to be evicted from the cache.
    TX_MEM.ABORT_CONFLICT Number of times a transactional abort was signaled due to a data conflict on a transactionally accessed address.
    TX_MEM.ABORT_CAPACITY_WRITE Number of times a transactional abort was signaled due to a data capacity limitation for transactional writes.
    TX_MEM.ABORT_HLE_STORE_TO_ELIDED_LOCK Number of times a HLE transactional region aborted due to a non XRELEASE prefixed instruction writing to an elided lock in the elision buffer.
    TX_MEM.ABORT_HLE_ELISION_BUFFER_NOT_EMPTY Number of times an HLE transactional execution aborted due to NoAllocatedElisionBuffer being non-zero.
    TX_MEM.ABORT_HLE_ELISION_BUFFER_MISMATCH Number of times an HLE transactional execution aborted due to XRELEASE lock not satisfying the address and value requirements in the elision buffer.
    TX_MEM.ABORT_HLE_ELISION_BUFFER_UNSUPPORTED_ALIGNMENT Number of times an HLE transactional execution aborted due to an unsupported read alignment from the elision buffer.
    TX_MEM.HLE_ELISION_BUFFER_FULL Number of times HLE lock could not be elided due to ElisionBufferAvailable being zero.
    MOVE_ELIMINATION.INT_ELIMINATED Number of integer move elimination candidate uops that were eliminated.
    MOVE_ELIMINATION.SIMD_ELIMINATED Number of SIMD move elimination candidate uops that were eliminated.
    MOVE_ELIMINATION.INT_NOT_ELIMINATED Number of integer move elimination candidate uops that were not eliminated.
    MOVE_ELIMINATION.SIMD_NOT_ELIMINATED Number of SIMD move elimination candidate uops that were not eliminated.
    CPL_CYCLES.RING0 Unhalted core cycles when the thread is in ring 0.
    CPL_CYCLES.RING123 Unhalted core cycles when the thread is not in ring 0.
    CPL_CYCLES.RING0_TRANS Number of intervals between processor halts while thread is in ring 0.
    TX_EXEC.MISC1 Counts the number of times a class of instructions that may cause a transactional abort was executed. Since this is the count of execution, it may not always cause a transactional abort.
    TX_EXEC.MISC2 Counts the number of times a class of instructions (e.g., vzeroupper) that may cause a transactional abort was executed inside a transactional region.
    TX_EXEC.MISC3 Counts the number of times an instruction execution caused the transactional nest count supported to be exceeded.
    TX_EXEC.MISC4 Counts the number of times a XBEGIN instruction was executed inside an HLE transactional region.
    TX_EXEC.MISC5 Counts the number of times an HLE XACQUIRE instruction was executed inside an RTM transactional region.
    RS_EVENTS.EMPTY_CYCLES This event counts cycles when the Reservation Station (RS) is empty for the thread. The RS is a structure that buffers allocated micro-ops from the Front-end. If there are many cycles when the RS is empty, it may represent an underflow of instructions delivered from the Front-end.
    OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD Offcore outstanding demand data read transactions in SQ to uncore. Set Cmask=1 to count cycles.
    OFFCORE_REQUESTS_OUTSTANDING.DEMAND_CODE_RD Offcore outstanding Demand code Read transactions in SQ to uncore. Set Cmask=1 to count cycles.
    OFFCORE_REQUESTS_OUTSTANDING.DEMAND_RFO Offcore outstanding RFO store transactions in SQ to uncore. Set Cmask=1 to count cycles.
    OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD Offcore outstanding cacheable data read transactions in SQ to uncore. Set Cmask=1 to count cycles.
    OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_DATA_RD Cycles when offcore outstanding Demand Data Read transactions are present in SuperQueue (SQ), queue to uncore.
    OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD Cycles when offcore outstanding cacheable Core Data Read transactions are present in SuperQueue (SQ), queue to uncore.
    OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_RFO Offcore outstanding demand RFO read transactions in SuperQueue (SQ), queue to uncore, every cycle.
    LOCK_CYCLES.SPLIT_LOCK_UC_LOCK_DURATION Cycles in which the L1D and L2 are locked, due to a UC lock or split lock.
    LOCK_CYCLES.CACHE_LOCK_DURATION Cycles in which the L1D is locked.
    IDQ.EMPTY Counts cycles the IDQ is empty.
    IDQ.MITE_UOPS Increments each cycle by the number of uops delivered to IDQ from the MITE path. Set Cmask = 1 to count cycles.
    IDQ.DSB_UOPS Increments each cycle by the number of uops delivered to IDQ from the DSB path. Set Cmask = 1 to count cycles.
    IDQ.MS_DSB_UOPS Increments each cycle by the number of uops initiated by the DSB that are delivered to IDQ while the MS is busy. Set Cmask = 1 to count cycles. Add Edge = 1 to count the number of deliveries.
    IDQ.MS_MITE_UOPS Increments each cycle by the number of uops initiated by MITE that are delivered to IDQ while the MS is busy. Set Cmask = 1 to count cycles.
    IDQ.MS_UOPS This event counts uops delivered by the Front-end with the assistance of the microcode sequencer. Microcode assists are used for complex instructions or scenarios that can't be handled by the standard decoder. Using other instructions, if possible, will usually improve performance.
    IDQ.MS_CYCLES This event counts cycles during which the microcode sequencer assisted the Front-end in delivering uops. Microcode assists are used for complex instructions or scenarios that can't be handled by the standard decoder. Using other instructions, if possible, will usually improve performance.
    IDQ.MITE_CYCLES Cycles when uops are being delivered to Instruction Decode Queue (IDQ) from MITE path.
    IDQ.DSB_CYCLES Cycles when uops are being delivered to Instruction Decode Queue (IDQ) from Decode Stream Buffer (DSB) path.
    IDQ.MS_DSB_CYCLES Cycles when uops initiated by Decode Stream Buffer (DSB) are being delivered to Instruction Decode Queue (IDQ) while Microcode Sequencer (MS) is busy.
    IDQ.MS_DSB_OCCUR Deliveries to Instruction Decode Queue (IDQ) initiated by Decode Stream Buffer (DSB) while Microcode Sequencer (MS) is busy.
    IDQ.ALL_DSB_CYCLES_4_UOPS Counts cycles in which the DSB delivers four uops. Set Cmask = 4.
    IDQ.ALL_DSB_CYCLES_ANY_UOPS Counts cycles in which the DSB delivers at least one uop. Set Cmask = 1.
    IDQ.ALL_MITE_CYCLES_4_UOPS Counts cycles in which MITE delivers four uops. Set Cmask = 4.
    IDQ.ALL_MITE_CYCLES_ANY_UOPS Counts cycles in which MITE delivers at least one uop. Set Cmask = 1.
    IDQ.MITE_ALL_UOPS Number of uops delivered to IDQ from any path.
    ICACHE.HIT Number of Instruction Cache, Streaming Buffer and Victim Cache reads, both cacheable and noncacheable, including UC fetches.
    ICACHE.MISSES This event counts Instruction Cache (ICACHE) misses.
    ICACHE.IFETCH_STALL Cycles where a code fetch is stalled due to L1 instruction-cache miss.
    ITLB_MISSES.MISS_CAUSES_A_WALK Misses in ITLB that cause a page walk of any page size.
    ITLB_MISSES.WALK_COMPLETED_4K Completed page walks due to misses in ITLB 4K page entries.
    ITLB_MISSES.WALK_COMPLETED_2M_4M Completed page walks due to misses in ITLB 2M/4M page entries.
    ITLB_MISSES.WALK_COMPLETED_1G Completed page walks due to misses in ITLB 1G page entries.
    ITLB_MISSES.WALK_DURATION This event counts cycles when the page miss handler (PMH) is servicing page walks caused by ITLB misses.
    ITLB_MISSES.STLB_HIT_4K ITLB misses that hit STLB (4K).
    ITLB_MISSES.STLB_HIT_2M ITLB misses that hit STLB (2M).
    ILD_STALL.LCP This event counts cycles where the decoder is stalled on an instruction with a length changing prefix (LCP).
    ILD_STALL.IQ_FULL Stall cycles because the IQ is full.
    BR_INST_EXEC.NONTAKEN_CONDITIONAL Not taken macro-conditional branches.
    BR_INST_EXEC.TAKEN_CONDITIONAL Taken speculative and retired macro-conditional branches.
    BR_INST_EXEC.TAKEN_DIRECT_JUMP Taken speculative and retired macro-unconditional branch instructions excluding calls and indirects.
    BR_INST_EXEC.TAKEN_INDIRECT_JUMP_NON_CALL_RET Taken speculative and retired indirect branches excluding calls and returns.
    BR_INST_EXEC.TAKEN_INDIRECT_NEAR_RETURN Taken speculative and retired indirect branches with return mnemonic.
    BR_INST_EXEC.TAKEN_DIRECT_NEAR_CALL Taken speculative and retired direct near calls.
    BR_INST_EXEC.TAKEN_INDIRECT_NEAR_CALL Taken speculative and retired indirect calls.
    BR_INST_EXEC.ALL_CONDITIONAL Speculative and retired macro-conditional branches.
    BR_INST_EXEC.ALL_DIRECT_JMP Speculative and retired macro-unconditional branches excluding calls and indirects.
    BR_INST_EXEC.ALL_INDIRECT_JUMP_NON_CALL_RET Speculative and retired indirect branches excluding calls and returns.
    BR_INST_EXEC.ALL_INDIRECT_NEAR_RETURN Speculative and retired indirect return branches.
    BR_INST_EXEC.ALL_DIRECT_NEAR_CALL Speculative and retired direct near calls.
    BR_INST_EXEC.ALL_BRANCHES Counts all near executed branches (not necessarily retired).
    BR_MISP_EXEC.NONTAKEN_CONDITIONAL Not taken speculative and retired mispredicted macro conditional branches.
    BR_MISP_EXEC.TAKEN_CONDITIONAL Taken speculative and retired mispredicted macro conditional branches.
    BR_MISP_EXEC.TAKEN_INDIRECT_JUMP_NON_CALL_RET Taken speculative and retired mispredicted indirect branches excluding calls and returns.
    BR_MISP_EXEC.TAKEN_RETURN_NEAR Taken speculative and retired mispredicted indirect branches with return mnemonic.
    BR_MISP_EXEC.ALL_CONDITIONAL Speculative and retired mispredicted macro conditional branches.
    BR_MISP_EXEC.ALL_INDIRECT_JUMP_NON_CALL_RET Mispredicted indirect branches excluding calls and returns.
    BR_MISP_EXEC.ALL_BRANCHES Counts all mispredicted near executed branches (not necessarily retired).
    IDQ_UOPS_NOT_DELIVERED.CORE This event counts the number of undelivered (unallocated) uops from the Front-end to the Resource Allocation Table (RAT) while the Back-end of the processor is not stalled. The Front-end can allocate up to 4 uops per cycle, so this event can increment 0-4 times per cycle depending on the number of unallocated uops. This event is counted on a per-core basis.
    IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE This event counts the number of cycles during which the Front-end allocated exactly zero uops to the Resource Allocation Table (RAT) while the Back-end of the processor is not stalled. This event is counted on a per-core basis.
    IDQ_UOPS_NOT_DELIVERED.CYCLES_LE_1_UOP_DELIV.CORE Cycles per thread when 3 or more uops are not delivered to Resource Allocation Table (RAT) when backend of the machine is not stalled.
    IDQ_UOPS_NOT_DELIVERED.CYCLES_LE_2_UOP_DELIV.CORE Cycles with 2 or fewer uops delivered by the front end.
    IDQ_UOPS_NOT_DELIVERED.CYCLES_LE_3_UOP_DELIV.CORE Cycles with 3 or fewer uops delivered by the front end.
    IDQ_UOPS_NOT_DELIVERED.CYCLES_FE_WAS_OK Counts cycles FE delivered 4 uops or Resource Allocation Table (RAT) was stalling FE.
    UOPS_EXECUTED_PORT.PORT_0 Cycles in which a uop is dispatched on port 0 in this thread.
    UOPS_EXECUTED_PORT.PORT_1 Cycles in which a uop is dispatched on port 1 in this thread.
    UOPS_EXECUTED_PORT.PORT_2 Cycles in which a uop is dispatched on port 2 in this thread.
    UOPS_EXECUTED_PORT.PORT_3 Cycles in which a uop is dispatched on port 3 in this thread.
    UOPS_EXECUTED_PORT.PORT_4 Cycles in which a uop is dispatched on port 4 in this thread.
    UOPS_EXECUTED_PORT.PORT_5 Cycles in which a uop is dispatched on port 5 in this thread.
    UOPS_EXECUTED_PORT.PORT_6 Cycles in which a uop is dispatched on port 6 in this thread.
    UOPS_EXECUTED_PORT.PORT_7 Cycles in which a uop is dispatched on port 7 in this thread.
    RESOURCE_STALLS.ANY Cycles allocation is stalled due to resource related reason.
    RESOURCE_STALLS.RS Cycles stalled due to no eligible RS entry available.
    RESOURCE_STALLS.SB This event counts cycles during which no instructions were allocated because no Store Buffers (SB) were available.
    RESOURCE_STALLS.ROB Cycles stalled due to re-order buffer full.
    CYCLE_ACTIVITY.CYCLES_L2_PENDING Cycles with pending L2 miss loads. Set Cmask=2 to count cycles.
    CYCLE_ACTIVITY.CYCLES_L1D_PENDING Cycles with pending L1 data cache miss loads. Set Cmask=8 to count cycles.
    CYCLE_ACTIVITY.CYCLES_LDM_PENDING Cycles with pending memory loads. Set Cmask=2 to count cycles.
    CYCLE_ACTIVITY.CYCLES_NO_EXECUTE This event counts cycles during which no instructions were executed in the execution stage of the pipeline.
    CYCLE_ACTIVITY.STALLS_L2_PENDING Execution stalls due to L2 cache miss loads.
    CYCLE_ACTIVITY.STALLS_LDM_PENDING This event counts cycles during which no instructions were executed in the execution stage of the pipeline and there were memory instructions pending (waiting for data).
    CYCLE_ACTIVITY.STALLS_L1D_PENDING Execution stalls due to L1 data cache miss loads. Set Cmask=0CH.
    LSD.UOPS Number of uops delivered by the LSD.
    DSB2MITE_SWITCHES.PENALTY_CYCLES Decode Stream Buffer (DSB)-to-MITE switch true penalty cycles.
    ITLB.ITLB_FLUSH Counts the number of ITLB flushes, including 4K/2M/4M pages.
    OFFCORE_REQUESTS.DEMAND_DATA_RD Demand data read requests sent to uncore.
    OFFCORE_REQUESTS.DEMAND_CODE_RD Demand code read requests sent to uncore.
    OFFCORE_REQUESTS.DEMAND_RFO Demand RFO read requests sent to uncore, including regular RFOs, locks, ItoM.
    OFFCORE_REQUESTS.ALL_DATA_RD Data read requests sent to uncore (demand and prefetch).
    UOPS_EXECUTED.CORE Counts total number of uops to be executed per-core each cycle.
    UOPS_EXECUTED.STALL_CYCLES Counts number of cycles no uops were dispatched to be executed on this thread.
    OFFCORE_REQUESTS_BUFFER.SQ_FULL Offcore requests buffer cannot take more entries for this thread core.
    PAGE_WALKER_LOADS.DTLB_L1 Number of DTLB page walker loads that hit in the L1+FB.
    PAGE_WALKER_LOADS.ITLB_L1 Number of ITLB page walker loads that hit in the L1+FB.
    PAGE_WALKER_LOADS.EPT_DTLB_L1 Counts the number of Extended Page Table walks from the DTLB that hit in the L1 and FB.
    PAGE_WALKER_LOADS.EPT_ITLB_L1 Counts the number of Extended Page Table walks from the ITLB that hit in the L1 and FB.
    PAGE_WALKER_LOADS.DTLB_L2 Number of DTLB page walker loads that hit in the L2.
    PAGE_WALKER_LOADS.ITLB_L2 Number of ITLB page walker loads that hit in the L2.
    PAGE_WALKER_LOADS.EPT_DTLB_L2 Counts the number of Extended Page Table walks from the DTLB that hit in the L2.
    PAGE_WALKER_LOADS.EPT_ITLB_L2 Counts the number of Extended Page Table walks from the ITLB that hit in the L2.
    PAGE_WALKER_LOADS.DTLB_L3 Number of DTLB page walker loads that hit in the L3.
    PAGE_WALKER_LOADS.ITLB_L3 Number of ITLB page walker loads that hit in the L3.
    PAGE_WALKER_LOADS.EPT_DTLB_L3 Counts the number of Extended Page Table walks from the DTLB that hit in the L3.
    PAGE_WALKER_LOADS.EPT_ITLB_L3 Counts the number of Extended Page Table walks from the ITLB that hit in the L3.
    PAGE_WALKER_LOADS.DTLB_MEMORY Number of DTLB page walker loads from memory.
    PAGE_WALKER_LOADS.ITLB_MEMORY Number of ITLB page walker loads from memory.
    PAGE_WALKER_LOADS.EPT_DTLB_MEMORY Counts the number of Extended Page Table walks from the DTLB that hit in memory.
    PAGE_WALKER_LOADS.EPT_ITLB_MEMORY Counts the number of Extended Page Table walks from the ITLB that hit in memory.
    TLB_FLUSH.DTLB_THREAD DTLB flush attempts of the thread-specific entries.
    TLB_FLUSH.STLB_ANY Count number of STLB flush attempts.
    INST_RETIRED.ANY_P Number of instructions at retirement.
    INST_RETIRED.X87 This is a non-precise version (that is, does not use PEBS) of the event that counts FP operations retired. For X87 FP operations that have no exceptions counting also includes flows that have several X87, or flows that use X87 uops in the exception handling.
    INST_RETIRED.PREC_DIST Precise instruction retired event with HW to reduce effect of PEBS shadow in IP distribution.
    OTHER_ASSISTS.AVX_TO_SSE Number of transitions from AVX-256 to legacy SSE when penalty applicable.
    OTHER_ASSISTS.SSE_TO_AVX Number of transitions from SSE to AVX-256 when penalty applicable.
    OTHER_ASSISTS.ANY_WB_ASSIST Number of microcode assists invoked by HW upon uop writeback.
    UOPS_RETIRED.ALL Counts the number of micro-ops retired. Use Cmask=1 and invert to count active cycles or stalled cycles.
    UOPS_RETIRED.RETIRE_SLOTS This event counts the number of retirement slots used each cycle. There are potentially 4 slots that can be used each cycle - meaning, 4 uops or 4 instructions could retire each cycle.
    UOPS_RETIRED.ALL_PS Actually retired uops.
    UOPS_RETIRED.RETIRE_SLOTS_PS Retirement slots used.
    UOPS_RETIRED.STALL_CYCLES Cycles without actually retired uops.
    UOPS_RETIRED.TOTAL_CYCLES Cycles with less than 10 actually retired uops.
    UOPS_RETIRED.CORE_STALL_CYCLES Cycles without actually retired uops.
    MACHINE_CLEARS.CYCLES Cycles there was a Nuke. Account for both thread-specific and All Thread Nukes.
    MACHINE_CLEARS.MEMORY_ORDERING This event counts the number of memory ordering machine clears detected. Memory ordering machine clears can result from memory address aliasing or snoops from another hardware thread or core to data inflight in the pipeline. Machine clears can have a significant performance impact if they are happening frequently.
    MACHINE_CLEARS.SMC This event is incremented when self-modifying code (SMC) is detected, which causes a machine clear. Machine clears can have a significant performance impact if they are happening frequently.
    MACHINE_CLEARS.MASKMOV This event counts the number of executed Intel AVX masked load operations that refer to an illegal address range with the mask bits set to 0.
    BR_INST_RETIRED.CONDITIONAL Counts the number of conditional branch instructions retired.
    BR_INST_RETIRED.NEAR_CALL Direct and indirect near call instructions retired.
    BR_INST_RETIRED.ALL_BRANCHES Branch instructions at retirement.
    BR_INST_RETIRED.NEAR_RETURN Counts the number of near return instructions retired.
    BR_INST_RETIRED.NOT_TAKEN Counts the number of not taken branch instructions retired.
    BR_INST_RETIRED.NEAR_TAKEN Number of near taken branches retired.
    BR_INST_RETIRED.FAR_BRANCH Number of far branches retired.
    BR_INST_RETIRED.CONDITIONAL_PS Conditional branch instructions retired.
    BR_INST_RETIRED.NEAR_CALL_PS Direct and indirect near call instructions retired.
    BR_INST_RETIRED.ALL_BRANCHES_PS All (macro) branch instructions retired.
    BR_INST_RETIRED.NEAR_RETURN_PS Return instructions retired.
    BR_INST_RETIRED.NEAR_TAKEN_PS Taken branch instructions retired.
    BR_INST_RETIRED.NEAR_CALL_R3 Direct and indirect macro near call instructions retired (captured in ring 3).
    BR_INST_RETIRED.NEAR_CALL_R3_PS Direct and indirect macro near call instructions retired (captured in ring 3).
    BR_MISP_RETIRED.CONDITIONAL Mispredicted conditional branch instructions retired.
    BR_MISP_RETIRED.ALL_BRANCHES Mispredicted branch instructions at retirement.
    BR_MISP_RETIRED.CONDITIONAL_PS Mispredicted conditional branch instructions retired.
    BR_MISP_RETIRED.ALL_BRANCHES_PS This event counts all mispredicted branch instructions retired. This is a precise event.
    HLE_RETIRED.START Number of times an HLE execution started.
    HLE_RETIRED.COMMIT Number of times an HLE execution successfully committed.
    HLE_RETIRED.ABORTED Number of times an HLE execution aborted due to any reasons (multiple categories may count as one).
    HLE_RETIRED.ABORTED_MISC1 Number of times an HLE execution aborted due to various memory events (e.g., read/write capacity and conflicts).
    HLE_RETIRED.ABORTED_MISC2 Number of times an HLE execution aborted due to uncommon conditions.
    HLE_RETIRED.ABORTED_MISC3 Number of times an HLE execution aborted due to HLE-unfriendly instructions.
    HLE_RETIRED.ABORTED_MISC4 Number of times an HLE execution aborted due to incompatible memory type.
    HLE_RETIRED.ABORTED_MISC5 Number of times an HLE execution aborted due to none of the previous 4 categories (e.g. interrupts).
    RTM_RETIRED.START Number of times an RTM execution started.
    RTM_RETIRED.COMMIT Number of times an RTM execution successfully committed.
    RTM_RETIRED.ABORTED Number of times an RTM execution aborted due to any reasons (multiple categories may count as one).
    RTM_RETIRED.ABORTED_MISC1 Number of times an RTM execution aborted due to various memory events (e.g. read/write capacity and conflicts).
    RTM_RETIRED.ABORTED_MISC2 Number of times an RTM execution aborted due to uncommon conditions.
    RTM_RETIRED.ABORTED_MISC3 Number of times an RTM execution aborted due to HLE-unfriendly instructions.
    RTM_RETIRED.ABORTED_MISC4 Number of times an RTM execution aborted due to incompatible memory type.
    RTM_RETIRED.ABORTED_MISC5 Number of times an RTM execution aborted due to none of the previous 4 categories (e.g. interrupt).
    FP_ASSIST.X87_OUTPUT Number of X87 FP assists due to output values.
    FP_ASSIST.X87_INPUT Number of X87 FP assists due to input values.
    FP_ASSIST.SIMD_OUTPUT Number of SIMD FP assists due to output values.
    FP_ASSIST.SIMD_INPUT Number of SIMD FP assists due to input values.
    FP_ASSIST.ANY Cycles with any input/output SSE* or FP assists.
    ROB_MISC_EVENTS.LBR_INSERTS Count cases of saving new LBR records by hardware.
    MEM_TRANS_RETIRED.LOAD_LATENCY_GT_4 Randomly selected loads with latency value being above 4.
    MEM_TRANS_RETIRED.LOAD_LATENCY_GT_8 Randomly selected loads with latency value being above 8.
    MEM_TRANS_RETIRED.LOAD_LATENCY_GT_16 Randomly selected loads with latency value being above 16.
    MEM_TRANS_RETIRED.LOAD_LATENCY_GT_32 Randomly selected loads with latency value being above 32.
    MEM_TRANS_RETIRED.LOAD_LATENCY_GT_64 Randomly selected loads with latency value being above 64.
    MEM_TRANS_RETIRED.LOAD_LATENCY_GT_128 Randomly selected loads with latency value being above 128.
    MEM_TRANS_RETIRED.LOAD_LATENCY_GT_256 Randomly selected loads with latency value being above 256.
    MEM_TRANS_RETIRED.LOAD_LATENCY_GT_512 Randomly selected loads with latency value being above 512.
    MEM_UOPS_RETIRED.STLB_MISS_LOADS Retired load uops that miss the STLB.
    MEM_UOPS_RETIRED.STLB_MISS_STORES Retired store uops that miss the STLB.
    MEM_UOPS_RETIRED.LOCK_LOADS Retired load uops with locked access.
    MEM_UOPS_RETIRED.SPLIT_LOADS Retired load uops that split across a cacheline boundary.
    MEM_UOPS_RETIRED.SPLIT_STORES Retired store uops that split across a cacheline boundary.
    MEM_UOPS_RETIRED.ALL_LOADS All retired load uops.
    MEM_UOPS_RETIRED.ALL_STORES All retired store uops.
    MEM_UOPS_RETIRED.STLB_MISS_LOADS_PS Retired load uops that miss the STLB. (precise Event)
    MEM_UOPS_RETIRED.STLB_MISS_STORES_PS Retired store uops that miss the STLB. (precise Event)
    MEM_UOPS_RETIRED.LOCK_LOADS_PS Retired load uops with locked access. (precise Event)
    MEM_UOPS_RETIRED.SPLIT_LOADS_PS This event counts load uops retired which had memory addresses split across 2 cache lines. A line split is across 64B cache-lines which may include a page split (4K). This is a precise event.
    MEM_UOPS_RETIRED.SPLIT_STORES_PS This event counts store uops retired which had memory addresses split across 2 cache lines. A line split is across 64B cache-lines which may include a page split (4K). This is a precise event.
    MEM_UOPS_RETIRED.ALL_LOADS_PS All retired load uops. (precise Event)
    MEM_UOPS_RETIRED.ALL_STORES_PS This event counts all store uops retired. This is a precise event.
    MEM_LOAD_UOPS_RETIRED.L1_HIT Retired load uops with L1 cache hits as data sources.
    MEM_LOAD_UOPS_RETIRED.L2_HIT Retired load uops with L2 cache hits as data sources.
    MEM_LOAD_UOPS_RETIRED.L3_HIT Retired load uops with L3 cache hits as data sources.
    MEM_LOAD_UOPS_RETIRED.L1_MISS Retired load uops that missed the L1 cache as data sources.
    MEM_LOAD_UOPS_RETIRED.L2_MISS Retired load uops that missed L2. Excludes unknown data source.
    MEM_LOAD_UOPS_RETIRED.L3_MISS Retired load uops that missed L3. Excludes unknown data source.
    MEM_LOAD_UOPS_RETIRED.HIT_LFB Retired load uops whose data sources were loads that missed L1 but hit the FB due to a preceding miss to the same cache line with data not ready.
    MEM_LOAD_UOPS_RETIRED.L1_HIT_PS Retired load uops with L1 cache hits as data sources.
    MEM_LOAD_UOPS_RETIRED.L2_HIT_PS Retired load uops with L2 cache hits as data sources.
    MEM_LOAD_UOPS_RETIRED.L3_HIT_PS This event counts retired load uops in which data sources were data hits in the L3 cache without snoops required. This does not include hardware prefetches. This is a precise event.
    MEM_LOAD_UOPS_RETIRED.L1_MISS_PS This event counts retired load uops in which data sources missed in the L1 cache. This does not include hardware prefetches. This is a precise event.
    MEM_LOAD_UOPS_RETIRED.L2_MISS_PS Retired load uops with L2 cache misses as data sources.
    MEM_LOAD_UOPS_RETIRED.L3_MISS_PS Miss in last-level (L3) cache. Excludes Unknown data-source.
    MEM_LOAD_UOPS_RETIRED.HIT_LFB_PS Retired load uops that missed L1 but hit the fill buffer (FB) due to a preceding miss to the same cache line whose data was not yet ready.
    MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_MISS Retired load uops whose data source was an L3 hit for which the cross-core snoop missed in an on-pkg core cache.
    MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_HIT Retired load uops whose data source was an L3 hit for which the cross-core snoop hit in an on-pkg core cache.
    MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_HITM Retired load uops whose data source was a HitM response from the shared L3.
    MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_NONE Retired load uops whose data source was a hit in L3 without snoops required.
    MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_MISS_PS Retired load uops whose data source was an L3 hit for which the cross-core snoop missed in an on-pkg core cache.
    MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_HIT_PS This event counts retired load uops that hit in the L3 cache, but required a cross-core snoop which resulted in a HIT in an on-pkg core cache. This does not include hardware prefetches. This is a precise event.
    MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_HITM_PS This event counts retired load uops that hit in the L3 cache, but required a cross-core snoop which resulted in a HITM (hit modified) in an on-pkg core cache. This does not include hardware prefetches. This is a precise event.
    MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_NONE_PS Retired load uops whose data source was a hit in L3 without snoops required.
    MEM_LOAD_UOPS_L3_MISS_RETIRED.LOCAL_DRAM This event counts retired load uops where the data came from local DRAM. This does not include hardware prefetches.
    MEM_LOAD_UOPS_L3_MISS_RETIRED.LOCAL_DRAM_PS This event counts retired load uops where the data came from local DRAM. This does not include hardware prefetches. This is a precise event.
    CPU_CLK_UNHALTED.THREAD_P Counts the number of thread cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. The core frequency may change from time to time due to power or thermal throttling.
    L2_TRANS.DEMAND_DATA_RD Demand data read requests that access L2 cache.
    L2_TRANS.RFO RFO requests that access L2 cache.
    L2_TRANS.CODE_RD L2 cache accesses when fetching instructions.
    L2_TRANS.ALL_PF Any MLC or L3 HW prefetch accessing L2, including rejects.
    L2_TRANS.L1D_WB L1D writebacks that access L2 cache.
    L2_TRANS.L2_FILL L2 fill requests that access L2 cache.
    L2_TRANS.L2_WB L2 writebacks that access L2 cache.
    L2_TRANS.ALL_REQUESTS Transactions accessing L2 pipe.
    L2_LINES_IN.I L2 cache lines in I state filling L2.
    L2_LINES_IN.S L2 cache lines in S state filling L2.
    L2_LINES_IN.E L2 cache lines in E state filling L2.
    L2_LINES_IN.ALL This event counts the number of L2 cache lines brought into the L2 cache. Lines are filled into the L2 cache when there was an L2 miss.
    L2_LINES_OUT.DEMAND_CLEAN Clean L2 cache lines evicted by demand.
    L2_LINES_OUT.DEMAND_DIRTY Dirty L2 cache lines evicted by demand.
    SQ_MISC.SPLIT_LOCK Split locks in SQ
    BR_MISP_EXEC.TAKEN_INDIRECT_NEAR_CALL Taken speculative and retired mispredicted indirect calls.
    UOPS_EXECUTED_PORT.PORT_0_CORE Cycles per core when uops are executed in port 0.
    UOPS_EXECUTED_PORT.PORT_1_CORE Cycles per core when uops are executed in port 1.
    UOPS_EXECUTED_PORT.PORT_2_CORE Cycles per core when uops are dispatched to port 2.
    UOPS_EXECUTED_PORT.PORT_3_CORE Cycles per core when uops are dispatched to port 3.
    UOPS_EXECUTED_PORT.PORT_4_CORE Cycles per core when uops are executed in port 4.
    UOPS_EXECUTED_PORT.PORT_5_CORE Cycles per core when uops are executed in port 5.
    UOPS_EXECUTED_PORT.PORT_6_CORE Cycles per core when uops are executed in port 6.
    UOPS_EXECUTED_PORT.PORT_7_CORE Cycles per core when uops are dispatched to port 7.
    BR_MISP_RETIRED.NEAR_TAKEN Number of near branch instructions retired that were taken but mispredicted.
    BR_MISP_RETIRED.NEAR_TAKEN_PS Number of near branch instructions retired that were mispredicted and taken.
    DTLB_LOAD_MISSES.WALK_COMPLETED Completed page walks in any TLB of any page size due to demand load misses.
    DTLB_LOAD_MISSES.STLB_HIT Number of loads that miss the DTLB but hit the STLB. No page walk.
    L2_RQSTS.RFO_HIT Counts the number of store RFO requests that hit the L2 cache.
    L2_RQSTS.RFO_MISS Counts the number of store RFO requests that miss the L2 cache.
    L2_RQSTS.CODE_RD_HIT Number of instruction fetches that hit the L2 cache.
    L2_RQSTS.CODE_RD_MISS Number of instruction fetches that missed the L2 cache.
    L2_RQSTS.ALL_DEMAND_MISS Demand requests that miss L2 cache.
    L2_RQSTS.ALL_DEMAND_REFERENCES Demand requests to L2 cache.
    L2_RQSTS.MISS All requests that missed L2.
    L2_RQSTS.REFERENCES All requests to L2 cache.
    DTLB_STORE_MISSES.WALK_COMPLETED Completed page walks due to store miss in any TLB levels of any page size (4K/2M/4M/1G).
    DTLB_STORE_MISSES.STLB_HIT Store operations that miss the first TLB level but hit the second and do not cause page walks.
    ITLB_MISSES.WALK_COMPLETED Completed page walks in ITLB of any page size.
    ITLB_MISSES.STLB_HIT ITLB misses that hit STLB. No page walk.
    UOPS_EXECUTED.CYCLES_GE_1_UOP_EXEC This event counts the cycles where at least one uop was executed. It is counted per thread.
    UOPS_EXECUTED.CYCLES_GE_2_UOPS_EXEC This event counts the cycles where at least two uops were executed. It is counted per thread.
    UOPS_EXECUTED.CYCLES_GE_3_UOPS_EXEC This event counts the cycles where at least three uops were executed. It is counted per thread.
    UOPS_EXECUTED.CYCLES_GE_4_UOPS_EXEC Cycles where at least 4 uops were executed per-thread.
    BACLEARS.ANY Number of front end re-steers due to BPU misprediction.
    HLE_RETIRED.ABORTED_PS Number of times an HLE execution aborted due to any reasons (multiple categories may count as one).
    RTM_RETIRED.ABORTED_PS Number of times an RTM execution aborted due to any reasons (multiple categories may count as one).
    MACHINE_CLEARS.COUNT Number of machine clears (nukes) of any type.
    LSD.CYCLES_ACTIVE Cycles in which uops were delivered by the LSD rather than by the decoder.
    LSD.CYCLES_4_UOPS Cycles in which 4 uops were delivered by the LSD rather than by the decoder.
    RS_EVENTS.EMPTY_END Counts end of periods where the Reservation Station (RS) was empty. Could be useful to precisely locate Frontend Latency Bound issues.
    IDQ.MS_SWITCHES Number of switches from DSB (Decode Stream Buffer) or MITE (legacy decode pipeline) to the Microcode Sequencer.
    UOPS_DISPATCHED_PORT.PORT_0 Cycles per thread when uops are executed in port 0.
    UOPS_DISPATCHED_PORT.PORT_1 Cycles per thread when uops are executed in port 1.
    UOPS_DISPATCHED_PORT.PORT_2 Cycles per thread when uops are executed in port 2.
    UOPS_DISPATCHED_PORT.PORT_3 Cycles per thread when uops are executed in port 3.
    UOPS_DISPATCHED_PORT.PORT_4 Cycles per thread when uops are executed in port 4.
    UOPS_DISPATCHED_PORT.PORT_5 Cycles per thread when uops are executed in port 5.
    UOPS_DISPATCHED_PORT.PORT_6 Cycles per thread when uops are executed in port 6.
    UOPS_DISPATCHED_PORT.PORT_7 Cycles per thread when uops are executed in port 7.
    CPU_CLK_UNHALTED.THREAD_ANY Core cycles when at least one thread on the physical core is not in halt state.
    CPU_CLK_UNHALTED.THREAD_P_ANY Core cycles when at least one thread on the physical core is not in halt state.
    CPU_CLK_THREAD_UNHALTED.REF_XCLK_ANY Reference cycles when at least one thread on the physical core is unhalted (counts at 100 MHz rate).
    INT_MISC.RECOVERY_CYCLES_ANY Core cycles the allocator was stalled due to recovery from earlier clear event for any thread running on the physical core (e.g. misprediction or memory nuke).
    UOPS_EXECUTED.CORE_CYCLES_GE_1 Cycles at least 1 micro-op is executed from any thread on physical core.
    UOPS_EXECUTED.CORE_CYCLES_GE_2 Cycles at least 2 micro-ops are executed from any thread on physical core.
    UOPS_EXECUTED.CORE_CYCLES_GE_3 Cycles at least 3 micro-ops are executed from any thread on physical core.
    UOPS_EXECUTED.CORE_CYCLES_GE_4 Cycles at least 4 micro-ops are executed from any thread on physical core.
    UOPS_EXECUTED.CORE_CYCLES_NONE Cycles with no micro-ops executed from any thread on physical core.
    OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD_GE_6 Cycles with at least 6 offcore outstanding Demand Data Read transactions in uncore queue.
    L1D_PEND_MISS.PENDING_CYCLES_ANY Cycles with L1D load Misses outstanding from any thread on physical core.
    L1D_PEND_MISS.FB_FULL Cycles a demand request was blocked due to Fill Buffers unavailability.
    AVX_INSTS.ALL Note that a whole rep string only counts AVX_INSTS.ALL once.
    ICACHE.IFDATA_STALL Cycles where a code fetch is stalled due to L1 instruction-cache miss.
    CPU_CLK_UNHALTED.REF_XCLK Reference cycles when the thread is unhalted. (counts at 100 MHz rate)
    CPU_CLK_UNHALTED.REF_XCLK_ANY Reference cycles when at least one thread on the physical core is unhalted (counts at 100 MHz rate).
    CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE Count XClk pulses when this thread is unhalted and the other thread is halted.
    UNC_CBO_XSNP_RESPONSE.MISS_EXTERNAL An external snoop misses in some processor core.
    UNC_CBO_XSNP_RESPONSE.MISS_XCORE A cross-core snoop initiated by this Cbox due to processor core memory request which misses in some processor core.
    UNC_CBO_XSNP_RESPONSE.MISS_EVICTION A cross-core snoop resulting from L3 Eviction which misses in some processor core.
    UNC_CBO_XSNP_RESPONSE.HIT_EXTERNAL An external snoop hits a non-modified line in some processor core.
    UNC_CBO_XSNP_RESPONSE.HIT_XCORE A cross-core snoop initiated by this Cbox due to processor core memory request which hits a non-modified line in some processor core.
    UNC_CBO_XSNP_RESPONSE.HIT_EVICTION A cross-core snoop resulting from L3 Eviction which hits a non-modified line in some processor core.
    UNC_CBO_XSNP_RESPONSE.HITM_EXTERNAL An external snoop hits a modified line in some processor core.
    UNC_CBO_XSNP_RESPONSE.HITM_XCORE A cross-core snoop initiated by this Cbox due to processor core memory request which hits a modified line in some processor core.
    UNC_CBO_XSNP_RESPONSE.HITM_EVICTION A cross-core snoop resulting from L3 Eviction which hits a modified line in some processor core.
    UNC_CBO_CACHE_LOOKUP.READ_M L3 lookup read requests that access the cache and find the line in M-state.
    UNC_CBO_CACHE_LOOKUP.WRITE_M L3 lookup write requests that access the cache and find the line in M-state.
    UNC_CBO_CACHE_LOOKUP.EXTSNP_M L3 lookup external snoop requests that access the cache and find the line in M-state.
    UNC_CBO_CACHE_LOOKUP.ANY_M L3 lookup requests of any type that access the cache and find the line in M-state.
    UNC_CBO_CACHE_LOOKUP.READ_I L3 lookup read requests that access the cache and find the line in I-state.
    UNC_CBO_CACHE_LOOKUP.WRITE_I L3 lookup write requests that access the cache and find the line in I-state.
    UNC_CBO_CACHE_LOOKUP.EXTSNP_I L3 lookup external snoop requests that access the cache and find the line in I-state.
    UNC_CBO_CACHE_LOOKUP.ANY_I L3 lookup requests of any type that access the cache and find the line in I-state.
    UNC_CBO_CACHE_LOOKUP.READ_MESI L3 lookup read requests that access the cache and find the line in any MESI-state.
    UNC_CBO_CACHE_LOOKUP.WRITE_MESI L3 lookup write requests that access the cache and find the line in any MESI-state.
    UNC_CBO_CACHE_LOOKUP.EXTSNP_MESI L3 lookup external snoop requests that access the cache and find the line in any MESI-state.
    UNC_CBO_CACHE_LOOKUP.ANY_MESI L3 lookup requests of any type that access the cache and find the line in any MESI-state.
    UNC_CBO_CACHE_LOOKUP.ANY_ES L3 lookup requests of any type that access the cache and find the line in E or S-state.
    UNC_CBO_CACHE_LOOKUP.EXTSNP_ES L3 lookup external snoop requests that access the cache and find the line in E or S-state.
    UNC_CBO_CACHE_LOOKUP.READ_ES L3 lookup read requests that access the cache and find the line in E or S-state.
    UNC_CBO_CACHE_LOOKUP.WRITE_ES L3 lookup write requests that access the cache and find the line in E or S-state.
    UNC_ARB_TRK_OCCUPANCY.ALL Each cycle, counts the number of all valid Core outgoing entries. An entry is defined as valid from its allocation until the first of the IDI0 or DRS0 messages is sent out. Accounts for coherent and non-coherent traffic.
    UNC_ARB_TRK_REQUESTS.ALL Total number of Core outgoing entries allocated. Accounts for coherent and non-coherent traffic.
    UNC_ARB_TRK_REQUESTS.WRITES Number of writes allocated. Includes any write transactions: full/partial writes and evictions.
    UNC_ARB_COH_TRK_OCCUPANCY.All Each cycle, counts the number of valid entries in the Coherency Tracker queue from allocation until deallocation. Aperture requests (snoops) appear as NC decoded internally and become coherent (snoop L3, access memory).
    UNC_ARB_COH_TRK_REQUESTS.ALL Number of entries allocated. Accounts for any type: e.g. Snoop, Core aperture, etc.
    UNC_ARB_TRK_OCCUPANCY.CYCLES_WITH_ANY_REQUEST Cycles with at least one outstanding request waiting for data return from the memory controller. Accounts for coherent and non-coherent requests initiated by IA Cores, Processor Graphics Unit, or LLC.
    UNC_CLOCK.SOCKET This 48-bit fixed counter counts the UCLK cycles.
    OFFCORE_RESPONSE:request=ALL_REQUESTS: response=L3_MISS.ANY_RESPONSE Counts all requests that miss in the L3.
    OFFCORE_RESPONSE:request=ALL_REQUESTS: response=L3_HIT.ANY_RESPONSE Counts all requests that hit in the L3.
    OFFCORE_RESPONSE:request=ALL_READS: response=L3_MISS.LOCAL_DRAM Counts all reads that miss the L3, with the data returned from local DRAM.
    OFFCORE_RESPONSE:request=ALL_READS: response=L3_MISS.ANY_RESPONSE Counts all reads that miss in the L3.
    OFFCORE_RESPONSE:request=ALL_READS: response=L3_HIT.HITM_OTHER_CORE Counts all reads that hit in the L3, where the snoop to one of the sibling cores hits the line in M state and the line is forwarded.
    OFFCORE_RESPONSE:request=ALL_READS: response=L3_HIT.HIT_OTHER_CORE_NO_FWD Counts all reads that hit in the L3, where the snoops to sibling cores hit in either E/S state and the line is not forwarded.
    OFFCORE_RESPONSE:request=ALL_CODE_RD: response=L3_MISS.LOCAL_DRAM Counts all demand & prefetch code reads that miss the L3, with the data returned from local DRAM.
    OFFCORE_RESPONSE:request=ALL_CODE_RD: response=L3_MISS.ANY_RESPONSE Counts all demand & prefetch code reads that miss in the L3.
    OFFCORE_RESPONSE:request=ALL_CODE_RD: response=L3_HIT.HIT_OTHER_CORE_NO_FWD Counts all demand & prefetch code reads that hit in the L3, where the snoops to sibling cores hit in either E/S state and the line is not forwarded.
    OFFCORE_RESPONSE:request=ALL_RFO: response=L3_MISS.LOCAL_DRAM Counts all demand & prefetch RFOs that miss the L3, with the data returned from local DRAM.
    OFFCORE_RESPONSE:request=ALL_RFO: response=L3_MISS.ANY_RESPONSE Counts all demand & prefetch RFOs that miss in the L3.
    OFFCORE_RESPONSE:request=ALL_RFO: response=L3_HIT.HITM_OTHER_CORE Counts all demand & prefetch RFOs that hit in the L3, where the snoop to one of the sibling cores hits the line in M state and the line is forwarded.
    OFFCORE_RESPONSE:request=ALL_RFO: response=L3_HIT.HIT_OTHER_CORE_NO_FWD Counts all demand & prefetch RFOs that hit in the L3, where the snoops to sibling cores hit in either E/S state and the line is not forwarded.
    OFFCORE_RESPONSE:request=ALL_DATA_RD: response=L3_MISS.LOCAL_DRAM Counts all demand & prefetch data reads that miss the L3, with the data returned from local DRAM.
    OFFCORE_RESPONSE:request=ALL_DATA_RD: response=L3_MISS.ANY_RESPONSE Counts all demand & prefetch data reads that miss in the L3.
    OFFCORE_RESPONSE:request=ALL_DATA_RD: response=L3_HIT.HITM_OTHER_CORE Counts all demand & prefetch data reads that hit in the L3, where the snoop to one of the sibling cores hits the line in M state and the line is forwarded.
    OFFCORE_RESPONSE:request=ALL_DATA_RD: response=L3_HIT.HIT_OTHER_CORE_NO_FWD Counts all demand & prefetch data reads that hit in the L3, where the snoops to sibling cores hit in either E/S state and the line is not forwarded.
    OFFCORE_RESPONSE:request=PF_L3_CODE_RD: response=L3_MISS.ANY_RESPONSE Counts prefetch (that bring data to LLC only) code reads that miss in the L3.
    OFFCORE_RESPONSE:request=PF_L3_CODE_RD: response=L3_HIT.ANY_RESPONSE Counts prefetch (that bring data to LLC only) code reads that hit in the L3.
    OFFCORE_RESPONSE:request=PF_L3_RFO: response=L3_MISS.ANY_RESPONSE Counts all prefetch (that bring data to LLC only) RFOs that miss in the L3.
    OFFCORE_RESPONSE:request=PF_L3_RFO: response=L3_HIT.ANY_RESPONSE Counts all prefetch (that bring data to LLC only) RFOs that hit in the L3.
    OFFCORE_RESPONSE:request=PF_L3_DATA_RD: response=L3_MISS.ANY_RESPONSE Counts all prefetch (that bring data to LLC only) data reads that miss in the L3.
    OFFCORE_RESPONSE:request=PF_L3_DATA_RD: response=L3_HIT.ANY_RESPONSE Counts all prefetch (that bring data to LLC only) data reads that hit in the L3.
    OFFCORE_RESPONSE:request=PF_L2_CODE_RD: response=L3_MISS.ANY_RESPONSE Counts all prefetch (that bring data to L2) code reads that miss in the L3.
    OFFCORE_RESPONSE:request=PF_L2_CODE_RD: response=L3_HIT.ANY_RESPONSE Counts all prefetch (that bring data to L2) code reads that hit in the L3.
    OFFCORE_RESPONSE:request=PF_L2_RFO: response=L3_MISS.ANY_RESPONSE Counts all prefetch (that bring data to L2) RFOs that miss in the L3.
    OFFCORE_RESPONSE:request=PF_L2_RFO: response=L3_HIT.ANY_RESPONSE Counts all prefetch (that bring data to L2) RFOs that hit in the L3.
    OFFCORE_RESPONSE:request=PF_L2_DATA_RD: response=L3_MISS.ANY_RESPONSE Counts prefetch (that bring data to L2) data reads that miss in the L3.
    OFFCORE_RESPONSE:request=PF_L2_DATA_RD: response=L3_HIT.ANY_RESPONSE Counts prefetch (that bring data to L2) data reads that hit in the L3.
    OFFCORE_RESPONSE:request=DEMAND_CODE_RD: response=L3_MISS.LOCAL_DRAM Counts all demand code reads that miss the L3, with the data returned from local DRAM.
    OFFCORE_RESPONSE:request=DEMAND_CODE_RD: response=L3_MISS.ANY_RESPONSE Counts all demand code reads that miss in the L3.
    OFFCORE_RESPONSE:request=DEMAND_CODE_RD: response=L3_HIT.HITM_OTHER_CORE Counts all demand code reads that hit in the L3, where the snoop to one of the sibling cores hits the line in M state and the line is forwarded.
    OFFCORE_RESPONSE:request=DEMAND_CODE_RD: response=L3_HIT.HIT_OTHER_CORE_NO_FWD Counts all demand code reads that hit in the L3, where the snoops to sibling cores hit in either E/S state and the line is not forwarded.
    OFFCORE_RESPONSE:request=DEMAND_RFO: response=L3_MISS.LOCAL_DRAM Counts all demand data writes (RFOs) that miss the L3, with the data returned from local DRAM.
    OFFCORE_RESPONSE:request=DEMAND_RFO: response=L3_MISS.ANY_RESPONSE Counts all demand data writes (RFOs) that miss in the L3.
    OFFCORE_RESPONSE:request=DEMAND_RFO: response=L3_HIT.HITM_OTHER_CORE Counts all demand data writes (RFOs) that hit in the L3, where the snoop to one of the sibling cores hits the line in M state and the line is forwarded.
    OFFCORE_RESPONSE:request=DEMAND_RFO: response=L3_HIT.HIT_OTHER_CORE_NO_FWD Counts all demand data writes (RFOs) that hit in the L3, where the snoops to sibling cores hit in either E/S state and the line is not forwarded.
    OFFCORE_RESPONSE:request=DEMAND_DATA_RD: response=L3_MISS.LOCAL_DRAM Counts demand data reads that miss the L3, with the data returned from local DRAM.
    OFFCORE_RESPONSE:request=DEMAND_DATA_RD: response=L3_MISS.ANY_RESPONSE Counts demand data reads that miss in the L3.
    OFFCORE_RESPONSE:request=DEMAND_DATA_RD: response=L3_HIT.HITM_OTHER_CORE Counts demand data reads that hit in the L3, where the snoop to one of the sibling cores hits the line in M state and the line is forwarded.
    OFFCORE_RESPONSE:request=DEMAND_DATA_RD: response=L3_HIT.HIT_OTHER_CORE_NO_FWD Counts demand data reads that hit in the L3, where the snoops to sibling cores hit in either E/S state and the line is not forwarded.