VISIT MICROWAY AT CLUSTERWORLD (BOOTH 618) June 24-26, 2003 in San Jose
Maximizing Cluster Price/Performance

In our industry one often hears the statement, “users need the maximum amount of computing capability at the lowest possible cost.” At least two questions should almost immediately come to mind. They are:
This issue of the HPC Times Newsletter addresses these two questions in detail. I hope you will gain some insights that will be useful in assessing the factors that are most relevant to you.

Microway’s NodeWatch™ and MCMS™ cluster management tools are designed to provide a cluster-wide view, with control capabilities down to the individual node. Environmental problems, such as air conditioning failures during odd hours, can cause catastrophic failures on a large percentage of the cluster nodes. The cost of such an event, adjusted by the probability of its occurrence, varies by facility. While it is not a hard expenditure, this adjusted cost should be factored into the equation. NodeWatch™, as an optional management tool, may be a more cost-effective alternative to unmonitored nodes.

Microway’s engineers and technical support personnel are well versed in the computing capability discussion. Our work with high-speed interconnects has demonstrated that applications which take advantage of low-latency, high-bandwidth interconnects can run faster on Linux clusters than on proprietary UNIX SMP machines. However, those same applications on clusters with Gigabit Ethernet may perform poorly due to the high CPU overhead of the communication protocol. Furthermore, CPU cycles can sit idle for lack of data bandwidth. This problem is addressed in more detail in an article by Bob Condon on Scalable NFS Servers elsewhere in this newsletter.

By taking a balanced look at the solution architecture, the user can invest in such a way that long-term costs are minimized and research work is not compromised by one-dimensional views.
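The idea of a failure cost "adjusted by the probability of its occurrence" can be made concrete with a few lines of Python. This is only an illustrative sketch: the function names, the probability, the per-node cost and the monitoring cost are all hypothetical placeholders, not Microway figures, and each facility would substitute its own estimates.

```python
def expected_annual_loss(event_probability_per_year, nodes_affected, cost_per_node):
    """Probability-adjusted (expected) yearly cost of a cluster-wide failure event."""
    if not 0.0 <= event_probability_per_year <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return event_probability_per_year * nodes_affected * cost_per_node


def monitoring_pays_off(expected_loss, monitoring_cost_per_year, loss_avoided_fraction):
    """True when the loss that monitoring is expected to prevent exceeds its cost."""
    return expected_loss * loss_avoided_fraction > monitoring_cost_per_year


# Hypothetical inputs, for illustration only: a 5% yearly chance of a cooling
# failure that damages 64 nodes at $3,000 each.
loss = expected_annual_loss(0.05, nodes_affected=64, cost_per_node=3000.0)
print(f"Expected annual loss: ${loss:,.0f}")
print("Monitoring pays off:", monitoring_pays_off(loss, 2000.0, 0.9))
```

The comparison is deliberately crude. It ignores lost research time and downtime, both of which would normally push the adjusted cost higher.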
NodeWatch™ Enhancements

Microway’s NodeWatch features the ability to monitor voltages, temperatures and fan speeds within each node of a Microway cluster. The data is fed to a web-based display and to the Ganglia performance logging system, allowing trends to be tracked and graphed over time. This enables the system administrator to chart environmental degradation and anticipate failures before they occur.

In addition to highlighting out-of-range measurements, the NodeWatch web-based user interface allows the administrator to execute commands on any or all nodes of the cluster, and to shut down, reboot, power on or off, or hard reset any node. The power and reset functions are equivalent to actually pressing the front-panel switches, a feature available elsewhere only on much more expensive retrofit monitoring systems.

Major enhancements to the NodeWatch software are currently in final test. The new version sends email to the system administrator if data is consistently seen to be out of range. It also has an additional set of administrator-established limits which, if exceeded, cause the affected machines to be automatically shut down and powered off, greatly reducing the chance of physical damage. The algorithm for emailing a warning or shutting down a node includes extensive re-checks of the measurement to be sure it is valid. Considerable work has been done to avoid false positives without significantly increasing the chance of a false negative.

Additional enhancements include user-adjustable configuration files listing all nodes in the cluster and the failsafe limits associated with them. When customers upgrade their NodeWatch-enabled cluster with additional nodes, the software can easily be configured to include those new nodes in the monitoring cycle.

HPC clusters are typically operated unattended in close quarters. Unlike a failure in your home or office desktop PC, a failure in a cluster can go unnoticed, compounding the issue. NodeWatch is the inexpensive solution for automatic monitoring, failsafe warning and shutdown of failing nodes.
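The two-tier limits and the re-check step described above can be sketched in Python. This is not the NodeWatch implementation, which is not shown in this newsletter. The `Limits` class, the `confirmed_action` function, the re-check count and the temperature thresholds are all assumptions made for illustration. The rule used here is that an action is taken only if every reading in a short series calls for it, so a single bad reading does not trigger a warning or a shutdown.

```python
from dataclasses import dataclass
from typing import Callable

SEVERITY = {"ok": 0, "warn": 1, "shutdown": 2}


@dataclass(frozen=True)
class Limits:
    """Hypothetical per-sensor limits: an email band and a wider failsafe band."""
    warn_low: float
    warn_high: float
    shutdown_low: float
    shutdown_high: float


def classify(value: float, limits: Limits) -> str:
    """Map one reading to 'ok', 'warn' or 'shutdown'."""
    if value < limits.shutdown_low or value > limits.shutdown_high:
        return "shutdown"
    if value < limits.warn_low or value > limits.warn_high:
        return "warn"
    return "ok"


def confirmed_action(read_sensor: Callable[[], float], limits: Limits,
                     rechecks: int = 3) -> str:
    """Take the first reading plus `rechecks` more and return the least severe
    classification seen, so an action needs every reading to agree.
    A real monitor would pause between readings; that delay is omitted here."""
    readings = [read_sensor() for _ in range(rechecks + 1)]
    return min((classify(r, limits) for r in readings), key=SEVERITY.__getitem__)


# Hypothetical CPU temperature limits in degrees Celsius.
cpu_temp = Limits(warn_low=5.0, warn_high=70.0, shutdown_low=0.0, shutdown_high=85.0)

spike = iter([91.0, 62.0, 61.0, 62.0])       # one bad reading, then normal
sustained = iter([88.0, 90.0, 91.0, 93.0])   # consistently past the failsafe limit
print(confirmed_action(lambda: next(spike), cpu_temp))      # ok
print(confirmed_action(lambda: next(sustained), cpu_temp))  # shutdown
```

Requiring agreement across readings trades a short delay for fewer false alarms. Raising `rechecks` lowers the false-positive rate further but lengthens the time before a genuinely failing node is powered off.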
Cluster News

The second cluster will be used for the solution of two fundamental problems in modern astrophysics. The first is related to the rapid flickering of black holes and neutron stars in our galaxy and requires the solution of the time-dependent three-dimensional equations of magneto-hydrodynamics. The second problem is related to the structure of rapidly rotating neutron stars in general relativity and the development of modern tests of theories of gravity.

The third cluster will be used to run high-performance computations on the mathematical physics of precipitative pattern formation, and for the analysis of experimental data.

“We acquired our three clusters from Microway as they offered an aggressive proposal with a very attractive price. Their technical support team has been very helpful in making sure the clusters are ready to run when they arrive in Tucson. They have paid attention to the small details that if overlooked can cause a lot of frustration on the user’s part,” commented Mike Eklund, Computer Systems Manager, Physics Department, at the University of Arizona.
Cluster Robustness, Total Cost of Ownership and the Price of Nails

Why is this so? A large percentage of the subassemblies in the equipment all HPC vendors sell are built in Taiwan and China. From those same locations, vendors can source PC-class material as well. For a portion of the PC market, high reliability is not a major issue. Unlike HPC clusters, PCs don’t have to run 24/7 and stay up for months at a time without crashing. An occasional crash, say once every three weeks on a PC, is hardly an earth-shattering event. However, in a 128-node cluster, a per-node crash rate of once every three weeks will basically keep the cluster from running, as it results in a node failure somewhere in the cluster roughly every four hours.

There are several “corner cutting” techniques with hardware that are undesirable for HPC applications. For example, a typical Xeon cluster node has anywhere from six to ten fans. The cost of fans can vary by about a factor of two, along with their throughput and expected life. Use the cheapest fans, and you “save” $60 per node! However, failure of an inexpensive fan can lead to the catastrophic loss of a cluster node.

The same concept also applies to the quality and number of capacitors used on motherboard power converters and power supplies. Altogether, a typical node has about 10 DC-to-DC power converters, including those in the power supply. At the lowest voltage, the 1.2V rail which drives the CPU, the power requirements are over 100 watts, which means that rail must carry on the order of 100 amps (100 watts at 1.2V is already about 83 amps). Consistently supporting this much current requires high quality components. The same holds true for PCB material (the better material is not as brittle and can endure more temperature cycles). In short, if jobs are to run reliably, hardware in HPC cluster nodes must be much more robust than in a typical PC.

The software is just as crucial as the hardware. Simple mistakes, like using the wrong device driver or Linux kernel, can result in failures that appear to be hardware related but are actually software driven. So, along with high quality hardware, it is very important to run validated operating systems. The only way to be really sure an OS is sound is to run it for a week or more on a reasonably large cluster, with MPI applications and validation suites which stress both the hardware and the software. It takes competently configured hardware and software to produce a robust, production-quality cluster. Microway has made significant investments in this area.

The bottom line is that on an HPC computational node that has close to 2,000 parts, there are nearly 2,000 ways to cut corners. Moreover, it only takes one of these 2,000 parts to bring down a node: “for the price of a nail, the shoe was lost…” Our primary responsibility as your vendor is to make sure that only the highest quality nails make it into a Microway Cluster.
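The 128-node crash-rate example and the 1.2V rail figure in this article each come down to a one-line calculation, sketched here in Python. The function names are illustrative. The failure estimate assumes nodes fail independently at a constant rate, which is a simplification.

```python
def hours_between_cluster_failures(nodes: int, node_hours_between_crashes: float) -> float:
    """Average time between node crashes anywhere in the cluster, assuming
    independent nodes that each crash at the same constant rate."""
    if nodes <= 0 or node_hours_between_crashes <= 0:
        raise ValueError("nodes and interval must be positive")
    return node_hours_between_crashes / nodes


def rail_current_amps(power_watts: float, rail_volts: float) -> float:
    """Current drawn from a supply rail, from P = V * I."""
    if rail_volts <= 0:
        raise ValueError("rail voltage must be positive")
    return power_watts / rail_volts


three_weeks_in_hours = 3 * 7 * 24  # 504 hours

interval = hours_between_cluster_failures(128, three_weeks_in_hours)
print(f"128 nodes, one crash per node every three weeks: "
      f"a crash somewhere every {interval:.1f} hours")

current = rail_current_amps(100.0, 1.2)
print(f"100 W on a 1.2 V rail: {current:.0f} A")
```

Under these assumptions the cluster sees a crash about every four hours, and the CPU rail carries roughly 83 amps at 100 watts, rising in proportion as the power requirement grows.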
Storage Considerations for HPC Cluster Design

A Storage Area Network (SAN) is a block-level storage pool that uses the Fibre Channel protocol and a switched technology referred to as a fabric. SANs have become popular as a way to pool heterogeneous RAID arrays, which can be used by heterogeneous systems. SANs have proven to be effective in commercial applications with large servers. The server itself is connected to a SAN switch using a Host Bus Adapter (HBA) attached to the server’s PCI bus. HBAs can be fairly expensive (up to $1,000).

Application for HPC Cluster Users

On the other hand, NFS running on NAS offers significant benefits versus direct attached SCSI storage. In direct attached storage, I/O bottlenecks can occur during the read and write process when using a RAID array on the master node and internal storage in each slave node. For example, a typical BLAST application uses large data sets ranging from 40 to 80 GB. These must be loaded through the master node to each slave node. However, only a fraction of this data may be used on any given node during computational analysis cycles. This creates unnecessary network loading and performance degradation. In a diskless cluster where each node has a direct connection to shared storage through a dedicated NFS server, the volume of data transferred is reduced and a network bottleneck is avoided, since each node connects through its own Gigabit Ethernet link. Unfortunately, most NFS/NAS solutions are standalone “islands” of storage that do not scale well when adding I/O bandwidth or storage capacity.

Getting the Benefits of SAN and NAS in an HPC Cluster

Microway takes this one step further by adding a software product called a Cluster File System (CFS). The CFS adds dynamic file locking and file sharing. It eliminates the need to partition the SAN, since files are locked and unlocked as they are accessed by one of the NFS servers. As the ratio of cluster nodes to NFS servers increases, the cost of the HBAs can be amortized across many systems. This arrangement combines the accessibility and cost effectiveness of NAS with the scalability of SAN.

Microway has launched an initiative to create storage solutions designed to enhance HPC cluster performance and manageability. We use various technologies, as appropriate, to integrate the right solution for each customer’s application. To learn more about your options, please call your Microway account manager or email storage@microway.com.