Diagnosing and disabling bugged CPU cores

Yesterday my CPU died: constant reboots, BSODs, freezes, etc. Usually I would just buy new hardware, but I couldn’t waste time on that just yet, so instead I worked out exactly what was failing and how to work around it. Most people won’t bother with this kind of stuff, but I thought I should document the process I followed; it might help someone, some day.

Disclaimer: I’m stuck with Windows for various reasons, so if you use any of the OS master races, half of this stuff will be useless, I’m afraid.

  1. First of all, make sure the reboots are actually caused by the CPU. For that, follow the usual procedure: unplug all devices you don’t need, test your RAM, yada yada.

  2. Download prime95, open it, choose 1 torture test thread, choose the Small FFTs option, and don’t click “OK” just yet (or you risk an insta-BSOD).
  3. Open Task Manager, click More details, go to the Details tab, right-click the prime95 process, click “Set affinity”, and uncheck all CPUs except the first one.
  4. Go back to prime95 and click OK to start the test on that one core. Let it run for 5 minutes. If any numeric error or warning message shows up, or the machine freezes or ends in a BSOD, that core is probably busted.
  5. Repeat the test with a different single CPU selected in the affinity dialog each time, until every core has been tested.
  6. After that, I would also test pairs of cores. Hyperthreading, shared FPUs, shared L2 cache, heat dissipation problems, etc., can all cause failures that only appear when several cores are busy at the same time. For that, your best bet is to set the affinity to consecutive pairs of cores and set the number of torture test threads in prime95 to match (2 threads for a pair).
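Both the single-core and the pair tests boil down to affinity bitmasks: bit *i* set means logical CPU *i* (numbered from 0, as in Task Manager) is allowed. Instead of clicking through Task Manager every time, cmd’s `start /affinity <hexmask>` can launch a program already pinned to those CPUs. Below is a minimal sketch of a hypothetical Python helper (not part of prime95) that prints one such command per core and per consecutive pair. It assumes `prime95.exe` is in the current directory and pairs CPUs as (0,1), (2,3), and so on; you still pick the number of torture test threads inside prime95 itself.

```python
# Hypothetical helper: prints Windows "start /affinity" command lines that
# launch prime95 pinned to one logical CPU, then to each consecutive pair.
# ASSUMPTION: prime95.exe sits in the current directory.
import os


def single_core_masks(n_cpus):
    """(cpus, mask) for every logical CPU on its own; bit i = CPU i."""
    return [((i,), 1 << i) for i in range(n_cpus)]


def pair_masks(n_cpus):
    """(cpus, mask) for non-overlapping consecutive pairs: (0,1), (2,3), ..."""
    return [((i, i + 1), (1 << i) | (1 << (i + 1)))
            for i in range(0, n_cpus - 1, 2)]


def affinity_command(mask, exe="prime95.exe"):
    """cmd's start expects the affinity mask in hex, without a 0x prefix."""
    return f'start "" /affinity {mask:X} {exe}'


if __name__ == "__main__":
    n = os.cpu_count() or 1
    for cpus, mask in single_core_masks(n) + pair_masks(n):
        print(f"CPU(s) {cpus}: {affinity_command(mask)}")
```

For example, the 5th and 6th cores are CPUs 4 and 5, so their pair mask is 0x30 and the command is `start "" /affinity 30 prime95.exe`.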

In my case, this revealed problems in the 5th and 6th cores (they always failed when used together, and only rarely when run in isolation). I would bet the problem is their shared FPU path, but I have no idea how to find out for sure.

Once you have determined the failing core or cores, you can survive without a new CPU this way:

  • If your BIOS allows it, look for the option to selectively disable cores. My mobo only allows disabling pairs of cores rather than individual ones, but that was okay in my case, since I had to nuke one of those pairs anyway.
  • If that’s not possible, hit Win+R, run msconfig, go to the Boot tab, click Advanced options, check “Number of processors”, and choose the appropriate amount. You will probably lose some working cores, but it’s something ¯\_(ツ)_/¯
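Why you lose working cores with msconfig: the setting is a count, not a list of cores. Assuming Windows then uses only the lowest-numbered logical CPUs (0 to N-1), which is an assumption rather than something I have verified, the largest safe value is the index of the first faulty CPU, and every healthy CPU above it is lost too. A minimal hypothetical sketch of that arithmetic; the 8-CPU total is only an example:

```python
# Hypothetical helper. ASSUMPTION: limiting "Number of processors" to N makes
# Windows use only logical CPUs 0 .. N-1 (0-based, as in Task Manager).


def max_processor_count(failing_cpus):
    """Largest N such that CPUs 0..N-1 avoid every failing CPU."""
    return min(failing_cpus)


def working_cpus_lost(total_cpus, failing_cpus):
    """Healthy CPUs at or above the cutoff that get disabled with the bad ones."""
    cutoff = max_processor_count(failing_cpus)
    return sum(1 for cpu in range(cutoff, total_cpus) if cpu not in failing_cpus)


if __name__ == "__main__":
    failing = {4, 5}  # the 5th and 6th cores, 0-based
    total = 8         # example total only; use your own CPU count
    n = max_processor_count(failing)
    if n == 0:
        print("CPU 0 is faulty; msconfig cannot exclude it, try the BIOS instead.")
    else:
        print(f"Set 'Number of processors' to {n}; "
              f"{working_cpus_lost(total, failing)} healthy CPU(s) lost.")
```

Under that assumption, with 8 logical CPUs and CPUs 4 and 5 faulty, you would set 4 and also lose the healthy CPUs 6 and 7, which is exactly where disabling a pair in the BIOS does better.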

And of course, if you can afford the wait, just throw that CPU away and buy new parts. Otherwise your CPU will be limping around… and the rest of the cores are likely to follow the same path anyway.

November 9th, 2017
