If you asked around at SC this year, some attendees might have told you there wasn’t much new going on. It’s true that not every company was launching new hardware with 2X performance, but there were significant announcements. Some are shipping now and some are looking forward into 2014 or 2015. See our top picks below.
Intel® Knights Landing
One of SC13’s most exciting announcements came from Intel, who gave a detailed explanation of the Knights Landing product. Scheduled for release in late Q4 2014/early 2015, Knights Landing will be the successor to Knights Corner, the first-generation Xeon Phi™ coprocessor.
Rather than a separate card, Knights Landing will actually be a standalone CPU. Although on the surface this approach might look like the CPU/GPU combination of AMD’s Heterogeneous System Architecture (HSA), the Knights Landing chip is a single, 72-core processor (based on the Silvermont Intel Atom™ CPU). Performance targets are 6 TFLOPS single-precision and 3 TFLOPS double-precision. In addition to accessing up to 384GB of six-channel DDR4 system memory, the chip itself has 16GB of directly attached, stacked 3D memory with 500GB/s bandwidth. These CPUs will also introduce Intel’s “Storm Lake” 100Gb/s Host Fabric Interface, which may be an interconnect to rival InfiniBand.
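For a rough sense of what those targets imply, here is a back-of-the-envelope sketch using only the announced figures (3 TFLOPS double-precision, 500GB/s stacked memory). The simple roofline model and the DAXPY example are our own illustration, not part of Intel's presentation, and the function names are hypothetical.

```python
# Announced Knights Landing targets quoted above (targets, not measurements).
PEAK_DP_FLOPS = 3e12      # 3 TFLOPS double-precision
STACKED_MEM_BW = 500e9    # 500 GB/s stacked-memory bandwidth, in bytes/s


def ridge_point(peak_flops, bandwidth):
    """FLOPs a kernel must perform per byte moved before compute,
    rather than memory bandwidth, becomes the limit."""
    return peak_flops / bandwidth


def attainable_flops(intensity, peak_flops, bandwidth):
    """Simple roofline bound: the lower of the compute and memory ceilings."""
    return min(peak_flops, intensity * bandwidth)


dp_ridge = ridge_point(PEAK_DP_FLOPS, STACKED_MEM_BW)

# Assumed example kernel: DAXPY (y = a*x + y) does 2 FLOPs per element
# while moving 24 bytes (read x, read y, write y as 8-byte doubles).
daxpy_intensity = 2 / 24
daxpy_bound = attainable_flops(daxpy_intensity, PEAK_DP_FLOPS, STACKED_MEM_BW)

print(f"Ridge point: {dp_ridge:.1f} FLOP/byte")
print(f"DAXPY upper bound: {daxpy_bound / 1e9:.1f} GFLOPS")
```

Under these assumptions, a kernel needs about 6 double-precision FLOPs per byte of stacked-memory traffic to approach the peak target; streaming kernels such as DAXPY would be limited by memory bandwidth well before that.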
NVIDIA® Tesla® K40
Another big announcement came from NVIDIA, who introduced the new Tesla K40. Microway was already familiar with the K40 before SC13, and had even gotten a chance to test out some K40s, so we published information on the card on the same day that NVIDIA did. In case you missed it, be sure to check out this blog post.
PLX Technology®: PCI Express as a Fabric
PCI-Express® is ubiquitous. The controllers are built into all the latest CPUs. These days, it’s hard to even find a server with PCI or PCI-X slots. The older standards for expansion cards have been replaced by something better.
For several years we’ve seen point-to-point connections using PCI-Express. Most commonly in the HPC world, it’s been an easy way to connect GPU accelerators to servers which don’t have the power and/or cooling capacity to house the GPUs within the server chassis. Instead, a separate chassis powers & cools the GPUs with a high-speed PCI-Express interface running between the two chassis. PLX builds the chips which make this work.
PLX has also been providing multi-host designs which allow multiple servers to connect to a PCI-Express connected resource (storage, GPUs, etc.). Now they’re developing a true interconnect allowing multiple servers to communicate directly over PCI-Express. The physical design remains largely the same (host cards in each server connect to a central switch), but the logical level operates differently from traditional interconnects.
The technology is still in the development phase (with dev kits available) but promises to provide low latencies and bandwidths in the range of 10-12 GB/s. We’ll have to see how the software stack pans out and how quickly HPC groups adopt this new paradigm.
If you’ve not yet heard of Bright Computing, be prepared to hear about them more and more in the near future. Their flagship product, Bright Cluster Manager®, has been a real hit within the HPC industry. Combining powerful cluster management capabilities with an intuitive user interface, Bright Cluster Manager has earned the company many accolades, including Deloitte’s “Rising Star 2013” award and one of Bio-IT World’s “Best of Show” awards.
At this year’s SC, Bright Computing previewed its next generation Bright Cluster Manager, which includes a very comprehensive list of features for a variety of HPC uses and applications. For example, companies interested in big data can use Bright Cluster Manager with a variety of Hadoop distributions to configure services, monitor systems, and ensure efficient hardware performance. For those wanting to manage their own cloud, Bright Cluster Manager bundles with OpenStack and offers seamless integration, managing virtual machines as if they were physical nodes.
Bright Computing also appeared at SC alongside Intel, helping the chip manufacturing giant’s “Cherry Creek” supercomputer achieve both TOP500 and Green500 status. Bright’s software managed not only the Intel Xeon CPUs and Phi coprocessors, but also the Intel True Scale fabric. Bright Cluster Manager helped optimize the workload management of Cherry Creek’s heterogeneous system with a single programming model, what Intel calls a neo-heterogeneous architecture. Feel free to try out Bright Cluster Manager by visiting Bright Computing’s website.
Big data was a popular topic at this year’s conference and ET International, the company that created the parallel computing framework SWARM, has a new offering called HAMR. ETI’s whitepaper describes HAMR as “a real-time data driven processing engine that runs on top of Hadoop 2.0. HAMR provides a framework for implementing complex analytics that allows pulling data dynamically from multiple sources and extending Hadoop’s popular MapReduce programming framework to permit multiple phases.” By keeping intermediate data in system memory HAMR is able to reduce both time to result and cluster size for either batch or streaming analytics.
For those less familiar with big data analytics, here is another way to explain HAMR’s value proposition. Under normal conditions, Hadoop needs to centralize data into HDFS before data can be analyzed for data mining, predictive modeling, statistics, or other applications. This centralization process uses up computing resources and time, making it difficult to support real-time data analysis. HAMR is able to pull data from various sources and deliver analytical data in batches or in real time. Those already familiar with Hadoop will find that HAMR does not have a steep learning curve because ETI’s product is actually an extension of Hadoop’s MapReduce programming framework. HAMR is still in the beta phase, so it’s uncertain what the eventual adoption rate will look like, but HAMR certainly seems like a compelling product.
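To make the idea of chaining multiple MapReduce phases with in-memory intermediate data more concrete, here is a toy sketch in plain Python. It is not HAMR's API (we have not seen its programming interface); the helper names `map_reduce` and `run_phases` are hypothetical, and a real engine would distribute this work across a cluster rather than run it in one process.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run one map/shuffle/reduce phase entirely in memory."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)          # "shuffle": group values by key
    return [reducer(key, values) for key, values in groups.items()]


def run_phases(records, phases):
    """Chain phases, handing each phase's output straight to the next
    without writing intermediate results to disk."""
    data = records
    for mapper, reducer in phases:
        data = map_reduce(data, mapper, reducer)
    return data


# Phase 1: classic word count.
def split_words(line):
    for word in line.lower().split():
        yield word, 1


def sum_counts(word, ones):
    return word, sum(ones)


# Phase 2: regroup the (word, count) pairs by count.
def by_count(pair):
    word, count = pair
    yield count, word


def collect_words(count, words):
    return count, sorted(words)


lines = ["big data big cluster", "data cluster big"]
result = dict(run_phases(lines, [(split_words, sum_counts),
                                 (by_count, collect_words)]))
print(result)
```

The second phase consumes the first phase's output directly from memory, which is the property the whitepaper credits for shorter time to result.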
Another interesting software offering is rCUDA, middleware that allows nodes within a cluster to remotely access GPUs physically located in another node, without fundamentally changing the CUDA code. Often, the GPUs within a cluster node are not fully utilized by that node. rCUDA allows any node within that cluster to access those under-utilized GPUs. The benefits of rCUDA are twofold. First, you can provision fewer GPUs in your cluster, since a shared pool needs fewer GPUs than dedicating them to individual nodes. Second, a single node can run a job using more GPUs than it could physically house. Understandably, there are some latency costs associated with rCUDA, but the benefits, including integration with SLURM and compatibility with ARM clusters, can potentially be significant.
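As a rough illustration of the provisioning argument, the sketch below compares dedicated GPUs against a shared pool. This is a deliberately simplified model of our own: the per-node utilization numbers are made up, and it assumes demand can be packed perfectly while ignoring the latency cost of remote access and any peaks in demand.

```python
def dedicated_gpus(utilization_percent):
    """One GPU for every node that needs any GPU time at all."""
    return sum(1 for pct in utilization_percent if pct > 0)


def pooled_gpus(utilization_percent):
    """GPUs needed if average demand could be packed onto a shared pool.
    Uses integer ceiling division to avoid floating-point rounding."""
    total = sum(utilization_percent)
    return -(-total // 100)


# Hypothetical average GPU utilization per node, in percent.
node_utilization = [25, 50, 10, 40, 25]

print("Dedicated:", dedicated_gpus(node_utilization))
print("Pooled:   ", pooled_gpus(node_utilization))
```

In practice the pooled figure is a lower bound; bursty workloads and network overhead would call for some headroom.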
There are a lot of choices out there to consider when selecting software for your cluster. The product Qlustar will likely be of great interest to those who prefer a Debian/Ubuntu-based approach. It stands out because building an HPC cluster from these distributions usually requires additional effort. Qlustar is also unique in its built-in support for ZFS, Lustre (on top of ZFS), and high availability (HA).
StarNet Communications, makers of X-Win32, introduced FastX. X-Win32 is widely used by groups that run Linux on their HPC cluster and Windows on their local desktops. It enables cluster users to run visualizations on the cluster with the display running in their office.
Although X-Win32 is functional, the speed does leave something to be desired (especially for visualization of 3D models). To address these issues, look to StarNet’s new FastX product. With support for OpenGL and DRI, FastX provides a much smoother Linux remote desktop experience.
Companies to Check Out
The Intel Xeon Phi was one of the hot topics of the show, but some people were uncertain about developing on Intel’s new parallel platform. Fortunately, Acceleware offers an Intel Xeon Phi optimization training course which can help developers get a better understanding of the technology.
Another interesting company at SC13 was Advania, an information technology service company based in Iceland. When you think about data center locations, Iceland might not be at the top of your list, but Advania nonetheless has a compelling value proposition. The company’s data centers draw power from either geothermal or hydroelectric sources, making their energy both inexpensive and ecologically friendly. Advania’s facilities further take advantage of Iceland’s unique environment by utilizing the naturally cold temperatures to cool the data centers, again at a very low cost. Iceland is geographically located between Europe and North America. With subsea cables as direct connections, utilizing data centers there can be an effective way to reduce transatlantic latency. Finally, for companies worried about data security, Iceland’s data protection laws make it a very attractive place to store data. In many ways, the country’s approach to data neutrality is similar to Switzerland’s position on banks. Although many readers manage their own clusters or data centers, it’s interesting to see how IT service companies can compete in sometimes unexpected ways.
NVM Express is a scalable host controller interface that is meant to standardize PCIe SSD implementation. This new standard is trying to make it easier for OEMs to validate PCIe SSD products while providing better overall performance for end users. Currently, each SSD vendor has unique drivers that require OEM validation. In addition, current implementations are burdened with legacy I/O support requirements that add complexity and increase latency. NVM Express simplifies the stack while still maintaining compatibility with software designed to work with older standards.
The value proposition of NVM Express hinges on the fact that while processors have become faster, multi-core, and more parallel, storage technology has not evolved in the same way. In addition to simplifying the storage stack, devices utilizing the NVMe standard realize increased performance across multiple processor cores, an optimized register and command set, increased scalability, and increased security standards.
The impact of NVMe will differ from user to user. The unified standard allows OEMs to more quickly and easily validate PCIe SSD products, providing more options for customers. Those concerned with reducing the latency and clock cycles associated with data access will be pleased to see both metrics roughly halved. Where data security is a concern, NVMe offers support for important standards, such as those advocated by the Trusted Computing Group. Although the technology is still new, NVM Express has the support of many big HPC players, including Intel, Seagate®, Oracle®, Cisco®, and Dell®. Be sure to look out for products supporting NVMe on the horizon.