Category Archives: Administration

nvidia-smi: Control Your GPUs

Most users know how to check the status of their CPUs, see how much memory is free or find out how much disk space remains. In contrast, keeping tabs on the health and status of GPUs has historically been …

Monitoring Hard Drive and RAID Health

By default, you won’t find out that one of your hard drives has failed until the data is gone. Even if you are using a software or hardware RAID, it will only continue to function if you replace failed drives. …

Managing a Linux Software RAID with MDADM

There are several advantages to assembling hard drives into a RAID: performance, redundancy and capacity. Microway workstations and servers are most commonly outfitted with software RAID to prevent a single drive failure from destroying your operating system installation. In most …

Take Care When Updating Your Cluster

Although modern Linux distributions have made it very easy to keep your software packages up to date, there are some pitfalls you might encounter when managing your compute cluster. Cluster software packages are usually not managed from the same software repository as …