Revision for “Common Maintenance Tasks (Workstations and Servers)” created on August 26, 2019 @ 09:25:15
Common Maintenance Tasks (Workstations and Servers)
The following items should be completed to maintain the health of your workstation or server. For compute clusters, please see <a title="Common Maintenance Tasks for Clusters" href="http://www.microway.com/knowledge-center-articles/common-maintenance-tasks-clusters/">Common Maintenance Tasks (Clusters)</a>.
<h2>Backup non-replaceable data</h2>
Remember that RAID is not a replacement for backups. If your system is stolen, hacked or destroyed in a fire, your data will be gone forever. Automate this task or you will forget.
<li>For many groups, a weekly or monthly cron job is fine. Write a script that calls <code>rsync</code> or <code>tar</code> to copy the files to a separate server, NAS or SAN. Place the script in <code>/etc/cron.weekly/</code> or <code>/etc/cron.monthly/</code>.</li>
<li>Users with more complex requirements should look at <a title="Amanda Open Source Backup Software" href="https://www.zmanda.com/download-amanda.php" target="_blank" rel="noopener noreferrer">AMANDA</a> or <a href="http://blog.bacula.org/" target="_blank" rel="noopener noreferrer">Bacula</a></li>
<li>Tape backup systems are still available for those who prefer them. <a title="Contact Microway" href="http://www.microway.com/contact/" target="_blank" rel="noopener noreferrer">Contact us</a>.</li>
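The cron-driven approach above might be sketched as follows. This is a minimal example, not a complete backup policy: the function name <code>backup_dir</code> and the paths <code>/home</code> and <code>/mnt/backup</code> are placeholders, and it assumes the NAS, SAN or remote export is already mounted at the destination directory. It uses <code>tar</code>; <code>rsync</code> would be an equally valid choice.

```shell
#!/bin/sh
# Hypothetical weekly backup script, e.g. saved as /etc/cron.weekly/backup-data.
# Assumption: the backup target (NAS, SAN or remote export) is already
# mounted at the destination directory passed to backup_dir.

# backup_dir SRC DEST: write a dated, compressed tar archive of SRC into DEST
# and print the path of the archive that was created.
backup_dir() {
    src=$1
    dest=$2
    if [ ! -d "$src" ]; then
        echo "backup: source $src is not a directory" >&2
        return 1
    fi
    if [ ! -d "$dest" ]; then
        echo "backup: destination $dest is missing (is it mounted?)" >&2
        return 1
    fi
    archive="$dest/$(basename "$src")-$(date +%Y-%m-%d).tar.gz"
    # -C switches to the parent directory so the archive stores relative paths.
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    echo "$archive"
}

# Example call (placeholder paths, uncomment and adjust):
# backup_dir /home /mnt/backup
```

Checking that the destination exists before writing guards against silently filling the local disk when the backup share is not mounted.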
<h2>Verify the health of the drive arrays (RAIDs)</h2>
Drive sectors can go bad silently. Scheduling regular verifies will catch bad sectors before they lead to data loss. Automate them or you will forget.
<li>Linux Software RAID (mdadm) arrays can be easily kicked into verify mode. Many distributions (Red Hat, CentOS, Ubuntu) come with their own utilities for this. To start a verify manually, run this line as root for each RAID, replacing <code>md#</code> with the array's device name (for example, <code>md0</code>):
<code>echo check > /sys/block/md#/md/sync_action</code>
Check the text file <code>/proc/mdstat</code> and the output of <code>dmesg</code> to follow the status of each verify.</li>
<li>Hardware RAID controllers provide their own methods for automated verifies and alert notification. Reference the controller’s manual.</li>
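For Linux Software RAID, the manual command above can be wrapped in a small script and run from cron. The following is a sketch only: the function name <code>start_md_checks</code> is hypothetical, it must run as root, and it skips any array that is not currently idle (for example, one that is already resyncing). If your distribution already ships a scheduled check, use that instead.

```shell
#!/bin/sh
# start_md_checks [BLOCK_DIR]: start a verify on every idle md array.
# BLOCK_DIR defaults to /sys/block; the optional argument exists only so
# the function can be tried against a scratch directory.
start_md_checks() {
    block_dir=${1:-/sys/block}
    for action in "$block_dir"/md*/md/sync_action; do
        # If the glob matched nothing, the literal pattern is left over.
        [ -e "$action" ] || continue
        state=$(cat "$action")
        if [ "$state" = "idle" ]; then
            echo check > "$action"
            echo "started check: $action"
        else
            echo "skipped ($state): $action"
        fi
    done
}

# Example call (as root):
# start_md_checks
```

Once a verify completes, the file <code>mismatch_cnt</code> in the same <code>md</code> directory reports how many mismatched sectors were found.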
<h2>Monitor system alarms and system health</h2>
<li><em>Preferred</em>: learn how to use the IPMI capability of your system for remote monitoring and management. You’ll spend a lot less time trekking to the datacenter.</li>
<li><em>Alternative</em>: listen for system alarms and check for warning LEDs.</li>
<strong>Don’t ignore alarms! If you put it off, you’ll soon find that something else is wrong and the system needs major repair.</strong>
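As one possible starting point for IPMI-based monitoring, the sketch below filters sensor readings from the <code>ipmitool</code> utility and prints only those that are not healthy. It assumes <code>ipmitool</code> is installed and that <code>ipmitool sdr list</code> prints three pipe-separated columns (name, reading, status) with <code>ok</code> for healthy sensors and <code>ns</code> for sensors that are not present; output can vary between BMC vendors, so verify against your own hardware. The function name <code>report_bad_sensors</code> is hypothetical.

```shell
#!/bin/sh
# report_bad_sensors: read `ipmitool sdr list` output on stdin and print
# every sensor line whose status column is neither "ok" nor "ns".
# Returns 1 if at least one such sensor was found, 0 otherwise.
# Assumption: three pipe-separated columns (name | reading | status).
report_bad_sensors() {
    awk -F'|' '
        NF >= 3 {
            status = $3
            gsub(/^[ \t]+|[ \t]+$/, "", status)   # trim surrounding blanks
            if (status != "ok" && status != "ns") {
                print
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Example call on the local system (as root):
# ipmitool sdr list | report_bad_sensors
```

The same check can be run against a remote BMC by adding <code>-I lanplus -H</code> with the BMC address and <code>-U</code> with the user name to the <code>ipmitool</code> command. The hardware event log is available through <code>ipmitool sel list</code>.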