NVIDIA DGX-1 User Manual

NVIDIA DGX-1
DU-08033-001 _v13.1 | December 2017
User Guide

www.nvidia.com
TABLE OF CONTENTS
Chapter 1. Introduction to the NVIDIA DGX-1 Deep Learning System
1.1. Using the DGX-1: Overview
1.2. Hardware Specifications
1.2.1. Components
1.2.2. Mechanical
1.2.3. Power Requirements
1.2.4. Connections and Controls
1.2.5. Rear Panel Power Controls
1.2.6. LAN LEDs
1.2.7. IPMI Port LEDs
1.2.8. Hard Disk Indicators
1.2.9. Power Supply Unit (PSU) LED
Chapter 2. Installation and Setup
2.1. Registering Your DGX-1
2.2. Obtaining Software and Software Updates
2.3. Choosing a Setup Location / Site Preparation
2.4. Unpacking the DGX-1
2.5. What's In the Box
2.6. Installing the DGX-1 Into a Rack
2.6.1. Installing the Rails
2.6.2. Mounting the DGX-1
2.7. Attaching the Bezel
2.8. Connecting the Power Cables
2.9. Connecting the Network Cables
2.10. Setting Up the DGX-1
2.11. Post Setup Instructions for DGX OS Server Software Version 2.x and Earlier
Chapter 3. Preparing for Using Docker Containers
3.1. Installing Docker and NVIDIA Docker on DGX OS Server Software 2.x or Earlier
3.2. Configuring Docker IP Addresses
3.2.1. Configuring Docker IP Addresses for DGX OS Server Software Version 2.x and Earlier
3.2.2. Configuring Docker IP Addresses for DGX OS Server Software Version 3.1.1 and Later
3.3. Letting Users Issue Docker Commands
3.3.1. Checking if a User is in the Docker Group
3.3.2. Creating a User
3.3.3. Adding a User to the Docker Group
3.4. Configuring a System Proxy
3.5. Configuring NFS Mount and Cache
Chapter 4. Configuring and Managing the DGX-1
4.1. Using the BMC
4.1.1. Creating a Unique BMC Password for Remote Access
4.1.2. Viewing System Information
4.1.3. Submitting BMC Log Files
4.1.4. Determining Total Power Consumption
4.1.5. Accessing the DGX-1 Console
4.1.6. Powering Off / Power Cycling the System Remotely
4.1.6.1. From the DGX-1 Console Window
4.1.6.2. From the BMC UI
4.2. Configuring a Static IP Address for the BMC
4.2.1. Configuring a BMC Static IP Address Using ipmitool
4.2.2. Configuring a BMC Static IP Address Using the System BIOS
4.2.3. Configuring a BMC Static IP Address Using the BMC Dashboard
4.3. Configuring Static IP Addresses for the Network Ports
4.4. Obtaining MAC Addresses
Chapter 5. Maintaining and Servicing the NVIDIA DGX-1
5.1. Problem Resolution and Customer Care
5.2. Restoring the DGX-1 Software Image
5.2.1. Obtaining the DGX-1 Software ISO Image and Checksum File
5.2.2. Re-Imaging the System Remotely
5.2.3. Creating a Bootable Installation Medium
5.2.3.1. Creating a Bootable USB Flash Drive by Using the dd Command
5.2.3.2. Creating a Bootable USB Flash Drive by Using Akeo Rufus
5.2.4. Re-Imaging the System From a USB Flash Drive
5.2.5. Retaining the RAID Partition While Installing the OS
5.3. Updating the System BIOS
5.4. Updating the BMC
5.5. Replacing the System and Components
5.5.1. Replacing the System
5.5.2. Replacing an SSD
5.5.3. Recreating the Virtual Drives
5.5.3.1. Access the BIOS Setup Utility
5.5.3.2. Clear the Drive Group Configuration
5.5.3.3. Recreate the OS Virtual Drive
5.5.3.4. Recreate the RAID0 Virtual Drive
5.5.4. Recreating the RAID 0 Array
5.5.5. Replacing the Power Supplies
5.5.6. Replacing the Fan Module
5.5.7. Replacing the DIMMs
5.5.8. Replacing the InfiniBand Cards
5.5.9. Setting Up the InfiniBand Cards
Chapter 6. Installing Software on Air-Gapped NVIDIA DGX-1 Systems
6.1. Installing NVIDIA DGX-1 Software
6.1.1. Re-Imaging the System
6.1.2. Creating a Local Mirror of the NVIDIA and Canonical Repositories
6.2. Installing Docker Containers
Chapter 7. Customer Support for the NVIDIA DGX-1
Chapter 8. Safety
8.1. Safety Warnings and Cautions
8.2. Intended Application Uses
8.3. Site Selection
8.4. Equipment Handling Practices
8.5. Electrical Precautions
8.6. System Access Warnings
8.7. Rack Mount Warnings
8.8. Electrostatic Discharge
8.9. Other Hazards
Chapter 9. Compliance
9.1. United States
9.2. United States / Canada
9.3. Canada
9.4. CE
9.5. Japan
9.6. Australia
9.7. China
9.8. Israel
9.9. South Korea
9.10. India

Chapter 1. INTRODUCTION TO THE NVIDIA DGX-1 DEEP LEARNING SYSTEM
The NVIDIA® DGX-1™ Deep Learning System is the world’s first purpose-built system
for deep learning with fully integrated hardware and software that can be deployed
quickly and easily.
1.1. Using the DGX-1: Overview
The NVIDIA DGX-1 comes with a base operating system consisting of an Ubuntu OS,
Docker, Docker Engine Utility for NVIDIA GPUs, and NVIDIA drivers. The system is
designed to run a number of NVIDIA-optimized deep learning framework applications
packaged in Docker containers. You can use your own scheduling and management
software to run jobs, and also build and run your own applications on the DGX-1.

1.2. Hardware Specifications
1.2.1. Components
Component                Qty  Description
Base Server              1    Dual Intel® Xeon® CPU motherboard with x2 9.6 GT/s QPI, 8 Channel
                              with 2 DPC DDR4, Intel® X99 Chipset, AST2400 BMC
                         1    GPU Baseboard supporting 8 SXM2 modules (Cube Mesh) and 4 PCIe x16
                              slots for InfiniBand NICs
                         1    Chassis with 3+1 1600 W power supplies and support for up to five
                              2.5-inch drives
                         1    10/100BASE-T IPMI Port
                         1    RS232 Serial Port
                         2    USB 3.0 Ports
Power Supply             4    1600 W each
CPU                      2    Intel® Xeon® E5-2698 v4, 20-core, 2.2 GHz, 135 W
GPU                      8    (Option 1) Tesla P100, featuring
                              ‣ 170 teraflops, FP16
                              ‣ 16 GB memory per GPU
                              ‣ 28,672 NVIDIA CUDA® Cores
                              (Option 2) Tesla V100, featuring
                              ‣ 960 teraflops, FP16
                              ‣ 16 GB memory per GPU
                              ‣ 40,960 NVIDIA CUDA® Cores
                              ‣ 5120 NVIDIA Tensor Cores
System Memory            16   32 GB DDR4 LRDIMM (512 GB total)
SAS RAID Controller      1    8 port LSI SAS 3108 RAID Mezzanine
Storage (RAID 0) (Data)  4    1.92 TB, 6 Gb/s, SATA 3.0 SSD
Storage (OS)             1    480 GB, 6 Gb/s, SATA 3.0 SSD
10 GbE NIC               1    Dual port, 10GBASE-T, network adapter Mezzanine
InfiniBand EDR NIC       4    Single port, x16 PCIe, Mellanox ConnectX-4 VPI MCX455A-ECAT
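
The system-wide figures in the Components table can be cross-checked with a little arithmetic. The sketch below is illustrative only and is not part of the DGX-1 software: it assumes the CUDA and Tensor core totals are split evenly across the eight GPUs, and it treats the RAID 0 capacity as the plain sum of the four data SSDs, ignoring any filesystem or controller overhead. All variable and function names are hypothetical.

```python
# Figures copied from the Components table; names are illustrative only.
GPU_COUNT = 8
DIMM_COUNT, DIMM_GB = 16, 32
DATA_SSD_COUNT, DATA_SSD_TB = 4, 1.92

# System-wide totals as listed for each GPU option.
TOTALS = {
    "Tesla P100": {"cuda_cores": 28_672},
    "Tesla V100": {"cuda_cores": 40_960, "tensor_cores": 5_120},
}


def per_gpu(totals, gpu_count=GPU_COUNT):
    """Divide each system-wide total evenly across the GPUs (assumed even split)."""
    result = {}
    for name, value in totals.items():
        if value % gpu_count:
            raise ValueError(f"{name} does not divide evenly by {gpu_count}")
        result[name] = value // gpu_count
    return result


# 16 DIMMs of 32 GB each should match the stated 512 GB total.
system_memory_gb = DIMM_COUNT * DIMM_GB

# Raw RAID 0 capacity: striping adds up the member drives (no redundancy).
raid0_capacity_tb = round(DATA_SSD_COUNT * DATA_SSD_TB, 2)

p100 = per_gpu(TOTALS["Tesla P100"])
v100 = per_gpu(TOTALS["Tesla V100"])

print(f"System memory: {system_memory_gb} GB")
print(f"Raw RAID 0 capacity: {raid0_capacity_tb} TB")
print(f"Per-GPU P100: {p100}")
print(f"Per-GPU V100: {v100}")
```

The memory product agrees with the "512 GB total" stated in the table; the per-GPU values are derived quantities, not figures quoted in this guide.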
1.2.2. Mechanical
Feature       Description
Form Factor   3U Rackmount
Height        5.16" (13.1 cm)
Width         17.5" (44.4 cm)
Depth         34.1" (86.6 cm)
Gross Weight  134 lbs (61 kg)
1.2.3. Power Requirements
Input                       Specification for Each Power Supply  Comments
200-240 V (ac) 3500 W max.  1600 W @ 200-240 V, 8 A, 50-60 Hz    The DGX-1 contains four load-balancing
                                                                 power supplies, with 3+1 redundancy.
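
The 3+1 redundancy figure can be checked against the numbers in the table. The following is a rough sketch, not an NVIDIA sizing tool: it assumes the 3500 W value is the maximum draw of the whole system, and it ignores power factor and supply efficiency. The function and constant names are hypothetical.

```python
# Values from the Power Requirements table; names are illustrative only.
PSU_COUNT = 4
PSU_WATTS = 1600
REDUNDANT_PSUS = 1        # the "+1" in 3+1 redundancy
MIN_INPUT_VOLTS = 200     # low end of the 200-240 V range
SYSTEM_MAX_WATTS = 3500   # assumption: read as the whole-system maximum


def capacity_with_failures(failed):
    """Return the watts available when `failed` power supplies are offline."""
    if not 0 <= failed <= PSU_COUNT:
        raise ValueError("failed must be between 0 and PSU_COUNT")
    return (PSU_COUNT - failed) * PSU_WATTS


def current_at(watts, volts):
    """Return the current in amps for a given power and voltage (I = P / V)."""
    return watts / volts


# Capacity that remains after losing the one redundant supply.
capacity_after_one_failure = capacity_with_failures(REDUNDANT_PSUS)
headroom_watts = capacity_after_one_failure - SYSTEM_MAX_WATTS

# Per-supply current at the lowest input voltage; compare with the 8 A rating.
psu_amps = current_at(PSU_WATTS, MIN_INPUT_VOLTS)

print(f"Capacity with one supply failed: {capacity_after_one_failure} W")
print(f"Headroom over system maximum: {headroom_watts} W")
print(f"Per-supply current at {MIN_INPUT_VOLTS} V: {psu_amps} A")
```

Under these assumptions, three working supplies still cover the stated maximum, and the per-supply current at 200 V matches the 8 A figure in the table.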
1.2.4. Connections and Controls
ID  Type             Qty  Description
1   Power button     1    Press to turn the DGX-1 on or off.
                          Blue: System power on
                          Off: System power off
                          Amber (blinking): DC Off and fault
                          Amber and blue (blinking): DC On and fault
2   ID button        1    Press to cause an LED on the back of the unit to flash as an identifier
                          during servicing.
3   InfiniBand       4    QSFP28 port; Mellanox ConnectX-4 VPI MCX455A-ECAT, EDR IB (100 Gb),
                          x16 PCIe
4   USB              2    USB 3.0 ports are available to connect a keyboard.
5   VGA              1    The VGA port connects to a VGA capable monitor for local viewing of
                          the DGX-1 setup console or base OS.
6   DB9              1    RS232 serial port for internal debugging
7   AC input         4    Power supply inputs
8   Ethernet (RJ45)  2    10GBASE-T dual port network adapter Mezzanine
9   IPMI (RJ45)      1    10/100BASE-T Intelligent Platform Management Interface (IPMI) port
1.2.5. Rear Panel Power Controls
ID  Type                   Qty  Description
1   Power button           1    Press and immediately release the power button for a graceful
                                shutdown of the host OS.
                                Press and hold the power button for at least four seconds to shut
                                down the system immediately. The BMC remains live.
2   Power LED              1    Off: Power off
                                Blue (steady): Power on
                                Blue (blinking): BMC reports system health fault.
3   Main Board Status LED  1    Off: Normal
                                Amber (blinking): BMC reports system health fault.

1.2.6. LAN LEDs
LEDs next to each Ethernet port indicate the connection status as described in the table
below:
LED                       Status            Description
1 (Port 1 Link/Activity)  Amber (steady)    LAN link
                          Amber (blinking)  LAN access (off when there is traffic)
                          Off               Disconnected
2 (Port 1 Speed)          Green             10 Gb/s
                          Amber             1 Gb/s
                          Off               100 Mb/s
3 (Port 0 Link/Activity)  Amber (steady)    LAN link
                          Amber (blinking)  LAN access (off when there is traffic)
                          Off               Disconnected
4 (Port 0 Speed)          Green             10 Gb/s
                          Amber             1 Gb/s
                          Off               100 Mb/s
1.2.7. IPMI Port LEDs
LEDs on the IPMI port indicate the connection status as described in the table below:
Link            Activity          Description
Off             Off               Unplugged
Green (steady)  Green (blinking)  100M active link
Off             Green (blinking)  10M active link
1.2.8. Hard Disk Indicators
ID  Feature           Description
1   Button and release lever for removing the HDD
2   HDD present LED   Blue (Steady): Drive present
                      Blue (Blinking twice/sec): Identification (such as when
                      initializing or locating through the SBIOS)
                      Blue (Blinking once/sec): Rebuilding (such as when creating a
                      RAID array)
                      Amber (Steady): Warning/failure
                      Off: Slot empty
3   HDD activity LED  Blue: Access
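
For anyone recording drive status in a checklist or script, the HDD present LED states listed above can be captured as a simple lookup. This is a minimal sketch with hypothetical names; it only restates the table and does not read the LEDs or query the hardware in any way.

```python
# HDD present LED states, restated from the Hard Disk Indicators table.
# Keys are (color, pattern); the names here are illustrative only.
HDD_PRESENT_LED = {
    ("blue", "steady"): "Drive present",
    ("blue", "blinking twice/sec"): "Identification",
    ("blue", "blinking once/sec"): "Rebuilding",
    ("amber", "steady"): "Warning/failure",
    ("off", None): "Slot empty",
}


def describe_hdd_present_led(color, pattern=None):
    """Map an observed LED color and blink pattern to its documented meaning."""
    key = (color.lower(), pattern.lower() if pattern else None)
    # States not listed in the table are reported as unknown rather than guessed.
    return HDD_PRESENT_LED.get(key, "Unknown state")


print(describe_hdd_present_led("Blue", "Blinking once/sec"))
print(describe_hdd_present_led("Off"))
```

Any combination that the table does not list falls through to "Unknown state" so that an unexpected observation is not silently mapped to a documented one.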
1.2.9. Power Supply Unit (PSU) LED
The PSU LED indicates the operation status of the PSU as described in the table below: