Cutting-edge Cloud GPU for Deep Learning
NVIDIA® A100 GPUs NOW AVAILABLE

The NVIDIA® A100 Tensor Core GPU is an advanced graphics processing unit (GPU) designed to accelerate deep learning workloads. Delivered from the cloud, A100 acceleration is available on demand and at any scale, so you don’t have to buy or rent expensive hardware to run your AI applications.

With 80GB of the fastest GPU memory available, the A100 lets researchers cut a 10-hour, double-precision simulation to under four hours. HPC applications can also leverage TF32 to achieve up to 11X higher throughput for single-precision, dense matrix-multiply operations.
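
To make the precision trade-off concrete, here is a minimal timing sketch, assuming PyTorch on an A100 (the matrix size and iteration count are arbitrary choices, not benchmark settings): it times a large dense matmul in FP64, then in FP32 with TF32 tensor cores enabled.

```python
import time

import torch

def time_matmul(dtype, n=8192, iters=10):
    """Average the wall-clock time of an n x n dense matmul at the given precision."""
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    torch.cuda.synchronize()  # make sure setup is done before timing
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()  # wait for the queued kernels to finish
    return (time.perf_counter() - start) / iters

fp64_time = time_matmul(torch.float64)

# Opt in to routing FP32 matmuls through TF32 tensor cores.
torch.backends.cuda.matmul.allow_tf32 = True
tf32_time = time_matmul(torch.float32)

print(f"FP64: {fp64_time:.3f} s/iter, TF32: {tf32_time:.3f} s/iter")
```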

Top 3 Use Cases

Deep Learning Training

AI models are becoming increasingly complex, and training them requires massive computational power as well as the ability to scale flexibly. NVIDIA A100 Tensor Cores with Tensor Float 32 (TF32) deliver up to 20X higher performance than the previous NVIDIA Volta generation with zero code changes, plus a further 2X boost from automatic mixed precision and FP16.
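
As a rough illustration of what mixed precision looks like in practice, here is a minimal automatic-mixed-precision training step, assuming PyTorch; the linear model, optimizer settings, and random batch are placeholders for a real network and data loader.

```python
import torch

# Placeholder model, optimizer, and batch; substitute your own network and loader.
model = torch.nn.Linear(1024, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()  # rescales the loss so FP16 gradients don't underflow

inputs = torch.randn(64, 1024, device="cuda")
targets = torch.randint(0, 10, (64,), device="cuda")

optimizer.zero_grad()
# Inside autocast, eligible ops run in FP16 on tensor cores; the rest stay in FP32.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    loss = loss_fn(model(inputs), targets)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```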

Deep Learning Inference

For state-of-the-art conversational AI models such as BERT, the A100 accelerates inference with up to 249X higher throughput than CPUs. Multi-Instance GPU (MIG) technology lets multiple networks operate simultaneously on a single A100, making optimal use of its compute resources. Support for structural sparsity delivers up to 2X more performance on top of the A100’s other inference gains.
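
Each MIG slice shows up as its own CUDA device, so pinning an inference process to one slice is a matter of device selection. Below is a minimal sketch, assuming PyTorch; the MIG UUID is a placeholder (list the real ones with `nvidia-smi -L`), and the linear layer stands in for a BERT-class model.

```python
import os

# Must be set before CUDA initializes; replace the placeholder with a real
# MIG device UUID reported by `nvidia-smi -L`.
os.environ["CUDA_VISIBLE_DEVICES"] = "MIG-<placeholder-uuid>"

import torch

model = torch.nn.Linear(768, 2).cuda().eval()  # stand-in for a BERT-class model
with torch.inference_mode():
    logits = model(torch.randn(8, 768, device="cuda"))
print(logits.shape)
```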

High-Performance Data Analytics

In a big data analytics benchmark, the A100 80GB delivered 83X higher throughput than CPUs and a 2X increase over the A100 40GB, making it the ideal choice for emerging workloads with exploding data volumes.
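
As a sketch of what GPU-side analytics can look like, the example below assumes the RAPIDS cuDF library; the CSV file and column names are hypothetical. The pandas-style group-by runs entirely on the A100.

```python
import cudf  # RAPIDS GPU DataFrame library

# Hypothetical dataset; the file and column names are placeholders.
df = cudf.read_csv("transactions.csv")  # parsed directly into GPU memory
summary = (
    df.groupby("merchant_id")["amount"]
    .agg(["count", "mean", "sum"])
    .sort_values("sum", ascending=False)
)
print(summary.head(10))
```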

Tech Specs

                                A100 80GB SXM                   A100 80GB PCIe
FP64                            9.7 TFLOPS                      9.7 TFLOPS
FP64 Tensor Core                19.5 TFLOPS                     19.5 TFLOPS
FP32                            19.5 TFLOPS                     19.5 TFLOPS
Tensor Float 32 (TF32)          156 TFLOPS | 312 TFLOPS*        156 TFLOPS | 312 TFLOPS*
BFLOAT16 Tensor Core            312 TFLOPS | 624 TFLOPS*        312 TFLOPS | 624 TFLOPS*
INT8 Tensor Core                624 TOPS | 1,248 TOPS*          624 TOPS | 1,248 TOPS*
GPU memory                      80 GB HBM2e                     80 GB HBM2e
GPU memory bandwidth            2,039 GB/s                      1,935 GB/s
Max thermal design power (TDP)  400W                            300W
Multi-Instance GPU              Up to 7 MIGs @ 10 GB each       Up to 7 MIGs @ 10 GB each
Form factor                     SXM                             Dual-slot air-cooled or
                                                                single-slot liquid-cooled
Interconnect                    NVLink: 600 GB/s                NVIDIA NVLink Bridge for
                                PCIe Gen4: 64 GB/s              2 GPUs: 600 GB/s**
                                                                PCIe Gen4: 64 GB/s

* With sparsity

** SXM4 GPUs via HGX A100 server boards; PCIe GPUs via NVLink Bridge for up to two GPUs