Revision for “Common Maintenance Tasks (Workstations and Servers)” created on July 18, 2022 @ 09:50:30
Common Maintenance Tasks (Workstations and Servers)
Complete the following tasks regularly to keep your workstation or server healthy. For compute clusters, please see <a title="Common Maintenance Tasks for Clusters" href="https://www.microway.com/knowledge-center-articles/common-maintenance-tasks-clusters/">Common Maintenance Tasks (Clusters)</a>.
<h2>Back up non-replaceable data</h2> Remember that RAID is not a replacement for backups. RAID protects against a failed drive, but if your system is stolen, hacked, or catches fire, any data stored only on it will be gone forever. Automate your backups, or you will forget to run them.
<ul>
<li>For many groups, a weekly or monthly cron job is sufficient. Write a script that calls <code>rsync</code> or <code>tar</code> to copy the files to a separate server, NAS, or SAN, and place it in <code>/etc/cron.weekly/</code> or <code>/etc/cron.monthly/</code>.</li>
<li>Users with more complex requirements should look at <a title="Amanda Open Source Backup Software" href="http://www.amanda.org/" target="_blank" rel="noopener noreferrer">AMANDA</a> or <a href="https://www.bacula.org/blog/" target="_blank" rel="noopener noreferrer">Bacula</a>.</li>
<li>Tape backup systems are still available for those who prefer them. <a title="Contact Microway" href="https://www.microway.com/contact/" target="_blank" rel="noopener noreferrer">Contact us</a> for details.</li>
</ul>
<h2>Verify the health of the drive arrays (RAIDs)</h2> Drive sectors can go bad silently. A verify reads every block of the array, so regularly scheduled verifies surface bad sectors while the array still has the redundancy to repair them, rather than during a rebuild after a drive has already failed. Automate them, or you will forget to run them.
<ul>
<li>Linux software RAID (mdadm) arrays can easily be put into verify mode. Many distributions (Red Hat, CentOS, Ubuntu) ship their own utilities for scheduling these checks. To start a verify manually, run the following as root for each array, replacing <code>md#</code> with the array name (for example, <code>md0</code>): <code>echo check > /sys/block/md#/md/sync_action</code>. Then watch <code>/proc/mdstat</code> and the output of <code>dmesg</code> to follow the progress of each verify.</li>
<li>Hardware RAID controllers provide their own methods for scheduling verifies and sending alert notifications. Refer to the controller’s manual.</li>
</ul>
<h2>Monitor system alarms and system health</h2>
<ul>
<li><em>Preferred</em>: learn how to use your system’s IPMI (Intelligent Platform Management Interface) capability for remote monitoring and management.
You’ll spend a lot less time trekking to the datacenter.</li>
<li><em>Alternative</em>: listen for system alarms and check for warning LEDs.</li>
</ul>
<strong>Don’t ignore alarms! If you put off the fix, you’ll soon find that something else has gone wrong and the system needs major repair.</strong>
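The cron-based backup described under “Back up non-replaceable data” might look like the following minimal sketch. It uses <code>tar</code> rather than <code>rsync</code> and assumes a NAS or SAN share is already mounted at <code>/mnt/backup</code>; that mount point, the <code>SOURCE_DIRS</code> list, the 60-day retention, and the function name <code>backup_dirs</code> are hypothetical placeholders to adjust for your site.

```shell
#!/bin/sh
# Weekly backup sketch for /etc/cron.weekly/. Assumptions (all hypothetical,
# adjust for your site): the NAS/SAN share is mounted at BACKUP_MOUNT, and
# SOURCE_DIRS lists the data that cannot be re-created.
set -eu

BACKUP_MOUNT=${BACKUP_MOUNT:-/mnt/backup}
SOURCE_DIRS="/home /etc"   # space-separated; paths must not contain spaces
KEEP_DAYS=60               # delete this host's archives older than this

# backup_dirs DEST DIR...: write one dated, compressed archive of DIR... into
# DEST, prune this host's old archives, and print the new archive's path.
backup_dirs() {
    local dest="$1"
    shift
    local host
    host=$(uname -n)
    local archive="$dest/$host-$(date +%Y-%m-%d).tar.gz"
    tar -czf "$archive" "$@"
    find "$dest" -maxdepth 1 -name "$host-*.tar.gz" -mtime +"$KEEP_DAYS" -delete
    printf '%s\n' "$archive"
}

# Refuse to run if the share is not mounted; otherwise the "backup" would
# silently land on the local disk underneath the empty mount point.
if mountpoint -q "$BACKUP_MOUNT" 2>/dev/null; then
    # SOURCE_DIRS is deliberately unquoted so it splits into separate paths.
    backup_dirs "$BACKUP_MOUNT" $SOURCE_DIRS
else
    echo "backup skipped: $BACKUP_MOUNT is not mounted" >&2
fi
```

To install it, save the script under a name without a dot, such as <code>/etc/cron.weekly/backup-data</code> (hypothetical; the <code>run-parts</code> used by Debian and Ubuntu skips file names containing dots), and make it executable with <code>chmod +x</code>. If mail delivery is configured, cron emails anything the script prints, so the “backup skipped” warning reaches you when the share is missing.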
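For Linux software RAID, the manual <code>echo check</code> step can be wrapped in a small script that covers every md array at once. This is a sketch, not a distribution tool: it only starts a check on arrays whose <code>sync_action</code> reads <code>idle</code> (so it never interrupts a resync or rebuild), and its <code>report</code> mode prints each array’s <code>mismatch_cnt</code>, a counter the md driver exposes in the same sysfs directory. The <code>SYSFS_BLOCK</code> override and the function names are hypothetical, included so the logic can be tried against a scratch directory.

```shell
#!/bin/sh
# Sketch: start md verifies and report their results. Writing to
# sync_action requires root. SYSFS_BLOCK is a hypothetical override
# for testing; on a real system it is /sys/block.
set -eu

SYSFS_BLOCK=${SYSFS_BLOCK:-/sys/block}

# Start a "check" on every md array that is currently idle.
start_md_checks() {
    for action in "$SYSFS_BLOCK"/md*/md/sync_action; do
        [ -e "$action" ] || continue   # glob matched nothing: no md arrays
        # Skip arrays that are already resyncing, recovering or checking.
        if [ "$(cat "$action")" = idle ]; then
            echo check > "$action"
            echo "started check: $action"
        fi
    done
}

# Print "mdN mismatch_cnt=X" for each array once the checks have finished.
report_md_mismatches() {
    for cnt in "$SYSFS_BLOCK"/md*/md/mismatch_cnt; do
        [ -e "$cnt" ] || continue
        name=${cnt%/md/mismatch_cnt}   # e.g. /sys/block/md0
        name=${name##*/}               # e.g. md0
        echo "$name mismatch_cnt=$(cat "$cnt")"
    done
}

case ${1:-} in
    check)  start_md_checks ;;
    report) report_md_mismatches ;;
esac
```

Run it as root with <code>check</code> (for example, from a monthly cron job), watch <code>/proc/mdstat</code> until the checks finish, then run it with <code>report</code>; a non-zero count deserves a closer look. If your distribution already schedules these checks, use its utility rather than adding a second schedule.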
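For the IPMI monitoring recommended under “Monitor system alarms and system health”, this sketch assumes the widely used <code>ipmitool</code> client is installed. It reads the sensor table from <code>ipmitool sdr list</code>, whose lines have the form <code>name | reading | status</code>, and flags every sensor whose status is anything other than <code>ok</code> or <code>ns</code> (no reading). The function name <code>ipmi_problem_sensors</code> and the BMC address and user name in the comments are hypothetical.

```shell
#!/bin/sh
# Sketch: flag IPMI sensors that are not reporting "ok".
set -eu

# Read `ipmitool sdr list` output on stdin and print every line whose
# status column is neither "ok" nor "ns" (no reading).
ipmi_problem_sensors() {
    awk -F'|' 'NF >= 3 {
        status = $3
        gsub(/^[ \t]+|[ \t]+$/, "", status)   # strip the column padding
        if (status != "ok" && status != "ns") print
    }'
}

if [ "${1:-}" = check ]; then
    # Local BMC: needs root and the kernel IPMI driver (/dev/ipmi0).
    # Remote BMC (hypothetical host and user; -E reads the password from
    # the IPMI_PASSWORD environment variable):
    #   ipmitool -I lanplus -H bmc.example.com -U admin -E sdr list
    sdr=$(ipmitool sdr list)   # set -e aborts here if ipmitool fails
    problems=$(printf '%s\n' "$sdr" | ipmi_problem_sensors)
    if [ -n "$problems" ]; then
        echo "IPMI sensors needing attention on $(uname -n):" >&2
        printf '%s\n' "$problems" >&2
        exit 1
    fi
fi
```

Run with <code>check</code> from an hourly cron job, the script stays silent while every sensor is healthy and prints the offending sensors (exiting non-zero) when something needs attention, which is exactly the kind of alarm the article warns not to ignore.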