Microway’s GPU-Checker

Graphical Tool for Validating Workstation and Clustered GPUs

Microway’s GPU-Checker utility validates a single GPU or a cluster of GPUs from a single interface. GPUs are automatically detected, queried, and tested on each system; the user only needs to supply a list of host systems to test.

Designed specifically for NVIDIA’s professional Quadro and Tesla GPU products, the tool monitors the health of each GPU while tests are run. Metrics include:

  • Correctable and Uncorrectable ECC memory errors
  • Retired and Pending memory pages
  • Power consumption (compared to TDP)
  • Temperature
  • Memory and GPU clock speeds
  • PCI-Express width and generation
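
For readers who want to inspect the same kinds of metrics by hand, the following is a minimal sketch that reads them through NVIDIA's `nvidia-smi` command-line tool. It is not GPU-Checker's implementation. The helper names (`parse_gpu_csv`, `power_fraction`, `query_local_gpus`) are hypothetical, and the driver's enforced power limit is used here as a stand-in for TDP, which is an assumption.

```python
import csv
import io
import subprocess

# Fields for `nvidia-smi --query-gpu`, chosen to mirror the metrics listed above.
# Assumption: the installed driver supports each field; unsupported ones are
# reported as "[N/A]" or "[Not Supported]".
FIELDS = [
    "index",
    "name",
    "ecc.errors.corrected.volatile.total",
    "ecc.errors.uncorrected.volatile.total",
    "retired_pages.pending",
    "power.draw",
    "power.limit",  # stand-in for TDP (assumption)
    "temperature.gpu",
    "clocks.sm",
    "clocks.mem",
    "pcie.link.gen.current",
    "pcie.link.width.current",
]


def parse_gpu_csv(text):
    """Turn nvidia-smi CSV output (no header, no units) into one dict per GPU."""
    rows = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [dict(zip(FIELDS, (v.strip() for v in row))) for row in rows if row]


def power_fraction(gpu):
    """Return power draw as a fraction of the power limit, or None if unreported."""
    try:
        return float(gpu["power.draw"]) / float(gpu["power.limit"])
    except (ValueError, ZeroDivisionError):
        return None


def query_local_gpus():
    """Query every GPU on the local machine (requires the NVIDIA driver)."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader,nounits",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(out)
```

On a machine with the NVIDIA driver installed, `query_local_gpus()` returns one dictionary per GPU, and `power_fraction(gpu)` gives a rough indication of how close that GPU is running to its power limit.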

Built on the same tools that Microway uses to verify GPU cluster health, GPU-Checker runs each graphics processing unit through a battery of compute-intensive and memory-intensive tests. High-intensity stress tests confirm that GPU-dense systems do not overheat under heavy loads. Memory-intensive modes validate the local and global memory systems. Memory check modes also catch errors on GPUs that are running with ECC disabled.

GPU-Checker supports a variety of run modes:

  • Single GPU on local computer
  • Multiple GPUs on local computer
  • Multiple GPUs on multiple remote computers
  • Multiple GPUs on local and multiple remote computers
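
As a rough illustration of the local-plus-remote run mode, the sketch below fans a single `nvidia-smi` query out over a list of hosts, running it directly on the local machine and over `ssh` elsewhere. This is an assumption about how such a sweep could be scripted, not a description of GPU-Checker's internals. The host names, the `build_command` and `collect` helpers, and the use of key-based `ssh` access are all hypothetical.

```python
import shlex
import subprocess

# Hosts treated as "this machine" (assumption for the sketch).
LOCAL_NAMES = {"localhost", "127.0.0.1"}

# A small query; any valid nvidia-smi invocation could be substituted.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,temperature.gpu",
    "--format=csv,noheader,nounits",
]


def build_command(host, query=QUERY):
    """Return the argv that runs the query locally or on a remote host via ssh."""
    if host in LOCAL_NAMES:
        return list(query)
    # ssh hands the remote command to a shell, so join it with shell quoting.
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    return ["ssh", "-o", "BatchMode=yes", host, shlex.join(query)]


def collect(hosts, timeout=30):
    """Run the query on each host; return (host -> output lines, host -> error)."""
    results, errors = {}, {}
    for host in hosts:
        try:
            proc = subprocess.run(
                build_command(host),
                capture_output=True,
                text=True,
                timeout=timeout,
                check=True,
            )
            results[host] = proc.stdout.splitlines()
        except (OSError, subprocess.SubprocessError) as exc:
            # Covers a missing binary, a non-zero exit status, and timeouts.
            errors[host] = str(exc)
    return results, errors
```

A call such as `collect(["localhost", "node01", "node02"])` (hypothetical host names) would return the per-GPU lines for each reachable host and an error message for each host that could not be queried.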

Screenshot: Microway GPU-Checker executing diagnostics on Tesla K40m GPUs in a remote compute node

Questions and Price Inquiries

If you would like to learn more about Microway GPU-Checker, please contact one of our HPC experts.
