
DGX + Parallel Storage Building Blocks for Scale-Out AI
NVIDIA DGX POD is an NVIDIA®-validated building block of AI Compute & Storage for scale-out deployments.
Designed for the largest datasets, DGX POD solutions deliver training performance far beyond what a single system can achieve. Each DGX POD also includes the AI data plane: storage with the capacity to hold training datasets, room to expand as they grow, and the throughput to keep pace with AI workloads.
Why DGX POD?
Fully Integrated Deployment
DGX POD solutions ship with the DGX AI appliance and parallel storage fully integrated.
Extendable
Build complete racks from multiple DGX POD configurations.
Validated Configuration
DGX POD solutions have been validated to deliver high storage throughput
Performance Improves Over Time
DGX POD automatically receives performance improvements from NVIDIA
Mid-Scale and Single Rack Solutions
1:1 DGX A100 POD with AI200X
1 NVIDIA DGX A100™ with DDN AI200X
First Deployment of AI Compute & AI-Ready Parallel Storage
- 5 PFLOPS of AI Performance
- DDN AI200X appliance with throughput up to 24GB/sec and 1.5 million IOPS (various data capacities available)
- Total of 640GB of GPU memory
- NVIDIA 200Gb HDR InfiniBand
- NVIDIA NGC™ Containers with NVIDIA-optimized performance
- Full parallel filesystem and DDN Management GUI
- Seamless AI compute scaling: add DGX systems to increase AI performance (existing DDN AI200X has headroom to continue to scale data throughput)
- Seamless storage throughput & capacity scaling: add AI200X appliances to double data bandwidth or grow data capacity (DDN EXAScaler Lustre filesystem is already built for expansion)
- Superior Performance with GPUDirect® Storage: realize a direct data path from storage to GPU over InfiniBand. Delivers faster performance for multiple users on a single DGX and on scale-out multi-DGX deployments
- Validated Configuration: the scale-out AI performance of this design is validated by NVIDIA and DDN
2:1 DGX A100 POD with AI400X
2 NVIDIA DGX A100s with DDN AI400X
Scale-Up AI Compute & AI-Ready Parallel Storage
- 10 PFLOPS of AI Performance
- DDN AI400X appliance with throughput up to 48GB/sec and 3 million IOPS (various data capacities available)
- Total of 1280GB of GPU memory
- NVIDIA 200Gb HDR InfiniBand
- NGC Containers with NVIDIA-optimized performance
- Full parallel filesystem and DDN Management GUI
- Seamless AI compute scaling: add DGX systems to increase AI performance (existing DDN AI400X has headroom to continue to scale data throughput)
- Seamless storage throughput & capacity scaling: add AI400X appliances to double data bandwidth or grow data capacity (DDN EXAScaler Lustre filesystem is already built for expansion)
- Superior Performance with GPUDirect Storage: realize a direct data path from storage to GPU over InfiniBand. Delivers faster performance for multiple users on a single DGX and on scale-out multi-DGX deployments
- Validated Configuration: the scale-out AI performance of this design is validated by NVIDIA and DDN
4:2 DGX A100 POD with AI400X
4 NVIDIA DGX A100s with 2 DDN AI400X
Full-Rack AI Compute & AI-Ready Parallel Storage
- 20 PFLOPS of AI Performance
- DDN AI400X appliances with throughput up to 96GB/sec and 6 million IOPS (various data capacities available)
- Total of 2.5TB of GPU memory
- NVIDIA 200Gb HDR InfiniBand
- NGC Containers with NVIDIA-optimized performance
- Full parallel filesystem and DDN Management GUI
- Seamless AI compute scaling: add DGX systems to increase AI performance (existing DDN AI400X systems have headroom to continue to scale data throughput)
- Seamless storage throughput & capacity scaling: add AI400X appliances to increase bandwidth or grow data capacity (DDN EXAScaler Lustre filesystem is already built for expansion)
- Superior Performance with GPUDirect Storage: realize a direct data path from storage to GPU over InfiniBand. Delivers faster performance for multiple users on a single DGX and on scale-out multi-DGX deployments
- Validated Configuration: the scale-out AI performance of this design is validated by NVIDIA and DDN
Something else?
Custom Parallel Storage Solutions
Looking for another filesystem (Spectrum Scale/GPFS, BeeGFS), scale, or capacity? Let us design a system that meets your needs.
Key Capabilities
- Capacities to 1PB or beyond
- Throughput of 100-500GB/sec
- Dynamic capacity expansion
- Lustre, BeeGFS, or Spectrum Scale (formerly GPFS)
Multi-Rack Solutions and NVIDIA DGX SuperPOD™
8:4 DGX A100 POD with AI400X
8 DGX A100 Systems with 4 DDN AI400X
Dual Rack of AI Compute & Ultra-High Throughput Parallel Storage
- 40 PFLOPS of AI Performance
- DDN AI400X appliances with throughput up to 192GB/sec and 12 million IOPS (various data capacities available)
- Total of 5TB of GPU memory
- NVIDIA 200Gb HDR InfiniBand
- NGC Containers with NVIDIA-optimized performance
- Full parallel filesystem and DDN Management GUI
- Superior Performance with GPUDirect Storage: realize a direct data path from storage to GPU over InfiniBand. Delivers faster performance for multiple users on a single DGX and on scale-out multi-DGX deployments
- Validated Configuration: the scale-out AI performance of this design is validated by NVIDIA and DDN
DGX SuperPOD 20-Node Deployment
20 NVIDIA DGX A100 AI Systems, 7 DDN AI400X
Record-Breaking, Large AI Cluster Building Block
- 100 PFLOPS of AI Performance
- 7 DDN AI400X appliances with aggregate throughput up to 336GB/sec and 21 million IOPS (various data capacities available)
- Total of 12.5TB of GPU memory
- NVIDIA 200Gb HDR InfiniBand
- NGC Containers with NVIDIA-optimized performance
- Full parallel filesystem and DDN Management GUI
- Record-Breaking Building Block: this scale-out design is the basis of NVIDIA's record-breaking DGX SuperPOD deployment
- Scales to Massive Deployments: deploy multiple 20-node building blocks for immense AI + storage deployments
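The aggregate figures quoted for each configuration follow directly from the per-unit specs in this document (5 PFLOPS and 640GB of GPU memory per DGX A100; 48GB/sec and 3 million IOPS per AI400X appliance). A minimal sketch of that arithmetic, for illustration only (the `aggregate` helper is our own, not an official sizing tool):

```python
# Per-unit figures taken from the configurations above.
DGX_PFLOPS = 5          # AI PFLOPS per DGX A100
DGX_GPU_MEM_GB = 640    # total GPU memory per DGX A100 (GB)
AI400X_GBPS = 48        # peak throughput per AI400X appliance (GB/sec)
AI400X_MIOPS = 3        # million IOPS per AI400X appliance

def aggregate(n_dgx, n_appliances):
    """Return (PFLOPS, GPU memory in GB, GB/sec, million IOPS)
    for a configuration, assuming linear scaling as described above."""
    return (n_dgx * DGX_PFLOPS,
            n_dgx * DGX_GPU_MEM_GB,
            n_appliances * AI400X_GBPS,
            n_appliances * AI400X_MIOPS)

# 8:4 dual-rack building block
print(aggregate(8, 4))   # (40, 5120, 192, 12) -> 40 PFLOPS, ~5TB, 192GB/sec, 12M IOPS
# 20-node DGX SuperPOD building block with 7 appliances
print(aggregate(20, 7))  # (100, 12800, 336, 21) -> 100 PFLOPS, ~12.5TB, 336GB/sec, 21M IOPS
```

This is why adding AI400X appliances doubles (or otherwise linearly grows) data bandwidth while adding DGX systems grows compute independently: the two scale along separate axes.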
Something else?
Custom Parallel Storage Solutions
Looking for another filesystem (Spectrum Scale/GPFS, BeeGFS), scale, or capacity? Let us design a system that meets your needs.
Key Capabilities
- Multi-PB Storage Capacities
- Throughput >500GB/sec
- Dynamic capacity expansion
- Lustre, Spectrum Scale (formerly GPFS), or BeeGFS