
DGX + Parallel Storage Building Blocks for Scale-Out AI
DGX POD is an NVIDIA-validated building block of AI compute and storage for scale-out deployments.
Designed for the largest datasets, DGX POD solutions enable training at far higher performance than single systems can deliver. DGX POD also includes the AI data plane: storage with the capacity for training datasets, the expandability for growth, and the speed to keep up with AI workloads.
Why DGX POD?
Fully Integrated Deployment
DGX POD solutions ship with the DGX AI appliance and parallel storage fully integrated.
Extendable
Build complete racks from multiple DGX POD configurations
Validated Configuration
DGX POD solutions are validated to deliver high storage throughput
Performance Improves Over Time
DGX POD receives ongoing performance improvements through NVIDIA software updates
Mid-Scale and Single Rack Solutions
1:1 DGX A100 POD with AI200X
1 NVIDIA DGX A100 with DDN AI200X
First Deployment of AI Compute & AI-Ready Parallel Storage
- 5 PFLOPS of AI Performance
- DDN AI200X appliance with throughput up to 24GB/sec and 1.5 million IOPS (various data capacities available)
- Total of 320GB or 640GB of GPU memory
- Mellanox 200Gb HDR InfiniBand
- NGC Containers with NVIDIA-optimized performance
- Full parallel filesystem and DDN Management GUI
- Seamless AI compute scaling: add DGX systems to increase AI performance (existing DDN AI200X has headroom to continue to scale data throughput)
- Seamless storage throughput & capacity scaling: add AI200X appliances to double data bandwidth or grow data capacity (DDN EXAScaler Lustre filesystem is already built for expansion)
- Superior Performance with GPUDirect Storage: a direct data path from storage to GPU memory over InfiniBand delivers faster performance for multiple users on a single DGX and across scale-out multi-DGX deployments
- Validated Configuration: the design's scale-out AI performance is validated by NVIDIA and DDN
2:1 DGX A100 POD with AI400X
2 NVIDIA DGX A100s with DDN AI400X
Scale-Up AI Compute & AI-Ready Parallel Storage
- 10 PFLOPS of AI Performance
- DDN AI400X appliance with throughput up to 48GB/sec and 3 million IOPS (various data capacities available)
- Total of 640GB or 1280GB of GPU memory
- Mellanox 200Gb HDR InfiniBand
- NGC Containers with NVIDIA-optimized performance
- Full parallel filesystem and DDN Management GUI
- Seamless AI compute scaling: add DGX systems to increase AI performance (existing DDN AI400X has headroom to continue to scale data throughput)
- Seamless storage throughput & capacity scaling: add AI400X appliances to double data bandwidth or grow data capacity (DDN EXAScaler Lustre filesystem is already built for expansion)
- Superior Performance with GPUDirect Storage: a direct data path from storage to GPU memory over InfiniBand delivers faster performance for multiple users on a single DGX and across scale-out multi-DGX deployments
- Validated Configuration: the design's scale-out AI performance is validated by NVIDIA and DDN
4:2 DGX A100 POD with AI400X
4 NVIDIA DGX A100s with 2 DDN AI400X
Full Rack AI Compute & AI-Ready Parallel Storage
- 20 PFLOPS of AI Performance
- DDN AI400X appliances with throughput up to 96GB/sec and 6 million IOPS (various data capacities available)
- Total of 1.25TB or 2.5TB of GPU memory
- Mellanox 200Gb HDR InfiniBand
- NGC Containers with NVIDIA-optimized performance
- Full parallel filesystem and DDN Management GUI
- Seamless AI compute scaling: add DGX systems to increase AI performance (existing DDN AI400X systems have headroom to continue to scale data throughput)
- Seamless storage throughput & capacity scaling: add AI400X appliances to increase bandwidth or grow data capacity (DDN EXAScaler Lustre filesystem is already built for expansion)
- Superior Performance with GPUDirect Storage: a direct data path from storage to GPU memory over InfiniBand delivers faster performance for multiple users on a single DGX and across scale-out multi-DGX deployments
- Validated Configuration: the design's scale-out AI performance is validated by NVIDIA and DDN
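The linear scaling described in the configurations above can be sketched as a quick back-of-the-envelope calculation. The per-unit figures (5 PFLOPS per DGX A100; 48GB/sec and 3 million IOPS per AI400X) come from this datasheet; the function and constant names are illustrative.

```python
# Back-of-the-envelope sizing for DGX POD configurations.
# Per-unit figures are taken from the datasheet; names are illustrative.

DGX_A100_PFLOPS = 5   # AI PFLOPS per DGX A100 system
AI400X_GBPS = 48      # GB/sec throughput per DDN AI400X appliance
AI400X_MIOPS = 3      # million IOPS per DDN AI400X appliance

def pod_specs(num_dgx: int, num_ai400x: int) -> dict:
    """Aggregate compute and storage figures, assuming linear scaling."""
    return {
        "pflops": num_dgx * DGX_A100_PFLOPS,
        "throughput_gbps": num_ai400x * AI400X_GBPS,
        "million_iops": num_ai400x * AI400X_MIOPS,
    }

# The rack configurations listed above:
print(pod_specs(2, 1))  # 2:1 POD -> 10 PFLOPS, 48GB/sec, 3M IOPS
print(pod_specs(4, 2))  # 4:2 POD -> 20 PFLOPS, 96GB/sec, 6M IOPS
```

Adding AI400X appliances scales bandwidth and IOPS independently of compute, which is why the datasheet notes each appliance has headroom as DGX systems are added.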
Something else?
Custom Parallel Storage Solutions
Looking for another filesystem (Spectrum Scale/GPFS, BeeGFS), scale, or capacity? Let us design a system that meets your needs.
Key Capabilities
- Capacities of 1PB and beyond
- Throughput from 100 to 500GB/sec
- Dynamic capacity expansion
- Lustre, BeeGFS, or Spectrum Scale (formerly GPFS)
Multi-Rack Solutions and DGX SuperPOD
8:4 DGX A100 POD with AI400X
8 DGX A100 Systems with 4 DDN AI400X
Dual Rack of AI Compute & Ultra-High-Throughput Parallel Storage
- 40 PFLOPS of AI Performance
- DDN AI400X appliances with throughput up to 192GB/sec and 12 million IOPS (various data capacities available)
- Total of 2.5TB or 5TB of GPU memory
- Mellanox 200Gb HDR InfiniBand
- NGC Containers with NVIDIA-optimized performance
- Full parallel filesystem and DDN Management GUI
- Superior Performance with GPUDirect Storage: a direct data path from storage to GPU memory over InfiniBand delivers faster performance for multiple users on a single DGX and across scale-out multi-DGX deployments
- Validated Configuration: the design's scale-out AI performance is validated by NVIDIA and DDN
DGX SuperPOD 20 Node Deployment
20 NVIDIA DGX A100 Systems with 7 DDN AI400X
Record-Breaking, Large AI Cluster Building Block
- 100 PFLOPS of AI Performance
- 7 DDN AI400X appliances with aggregate throughput up to 336GB/sec and 21 million IOPS (various data capacities available)
- Total of 6.25TB or 12.5TB of GPU memory
- Mellanox 200Gb HDR InfiniBand
- NGC Containers with NVIDIA-optimized performance
- Full parallel filesystem and DDN Management GUI
- Record-Breaking Building Block: this design's scale-out AI performance is the basis of NVIDIA's record-breaking DGX SuperPOD deployment
- Scales to Massive Deployments: deploy multiple 20-node building blocks for immense AI + storage deployments
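The 20-node building block above grows by replication. Assuming the near-linear scaling this datasheet describes, multi-block aggregates can be sketched as follows; the per-block figures come from the bullets above, and the names are illustrative.

```python
# DGX SuperPOD sizing sketch: replicate the 20-node building block.
# Per-block figures are taken from the datasheet above; near-linear
# scaling across blocks is an assumption of this sketch.

BUILDING_BLOCK = {
    "dgx_systems": 20,
    "ai400x_appliances": 7,
    "pflops": 100,
    "throughput_gbps": 336,
    "million_iops": 21,
}

def superpod(num_blocks: int) -> dict:
    """Aggregate figures for a deployment of num_blocks building blocks."""
    return {key: value * num_blocks for key, value in BUILDING_BLOCK.items()}

print(superpod(2))  # e.g. 40 DGX systems, 200 PFLOPS, 672GB/sec
```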
Something else?
Custom Parallel Storage Solutions
Looking for another filesystem (Spectrum Scale/GPFS, BeeGFS), scale, or capacity? Let us design a system that meets your needs.
Key Capabilities
- Multi-PB Storage Capacities
- Throughput >500GB/sec
- Dynamic capacity expansion
- Lustre, Spectrum Scale (formerly GPFS), or BeeGFS