04/17/2024 | Press release | Distributed by Public on 04/17/2024 06:47
MLPerf™ Inference is an industry benchmark suite developed by MLCommons for measuring the performance of systems running AI/ML models under various deployment scenarios. OCI achieved stellar results across all benchmark cases, spanning vision (image classification, object detection, and medical imaging), natural language processing (NLP), recommendation, speech recognition, large language model (LLM), and text-to-image inference, on OCI's new BM.GPU.H100.8 shape powered by eight NVIDIA H100 Tensor Core GPUs and using NVIDIA TensorRT-LLM.
The table below shows performance numbers for OCI's BM.GPU.H100.8 shape. For an exhaustive list of submitters' performance, please visit MLPerf benchmark results1.
| Reference App | Benchmark | Server (queries/s) | Offline (samples/s) |
| --- | --- | --- | --- |
| Vision (Image Classification) | ResNet50 99 | 584,147 | 699,409 |
| Vision (Object Detection) | RetinaNet 99 | 12,876 | 13,997 |
| Vision (Medical Imaging) | 3D-UNet 99 | - | 52 |
| Vision (Medical Imaging) | 3D-UNet 99.9 | - | 52 |
| Speech to Text | RNN-T 99 | 143,986 | 139,846 |
| Recommendation | DLRMv2 99 | 500,098 | 557,592 |
| Recommendation | DLRMv2 99.9 | 315,013 | 347,177 |
| NLP | BERT 99 | 55,983 | 69,821 |
| NLP | BERT 99.9 | 49,587 | 61,818 |
| LLM | GPT-J 99 | 230 | 237 |
| LLM | GPT-J 99.9 | 230 | 236 |
| LLM | Llama2-70B 99 | 70 | 21,299 |
| LLM | Llama2-70B 99.9 | 70 | 21,032 |
| Text to Image Gen | Stable Diffusion XL 99 | 13 | 13 |
Source: MLPerf® v4.0 Inference Closed. Retrieved from https://mlcommons.org/benchmarks/inference-datacenter/ 14 April 2024, entry 4.0-0073.
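As a reading aid (an illustrative sketch, not part of the MLPerf submission): since the BM.GPU.H100.8 shape has eight NVIDIA H100 GPUs, an approximate per-GPU offline throughput can be derived from the table's shape-level numbers. The dictionary below copies a few offline values from the table above.

```python
# Illustrative only: approximate per-GPU offline throughput for BM.GPU.H100.8,
# derived by dividing the shape-level numbers from the table by the GPU count.
NUM_GPUS = 8  # BM.GPU.H100.8 has eight NVIDIA H100 GPUs

offline_samples_per_s = {
    "ResNet50 99": 699_409,
    "RetinaNet 99": 13_997,
    "BERT 99": 69_821,
}

per_gpu = {name: total / NUM_GPUS for name, total in offline_samples_per_s.items()}
print(per_gpu["ResNet50 99"])  # 87426.125 samples/s per GPU
```

Note that MLPerf results are reported at the system level, so this per-GPU view is only a rough normalization for comparing shapes with different GPU counts.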
The published results on MLPerf 4.0 and MLPerf 3.1 for BM.GPU.H100.8 (8 x NVIDIA H100 GPUs), BM.GPU.A100.8 (8 x NVIDIA A100 GPUs) and BM.GPU.A10.4 (4 x NVIDIA A10 GPUs) are shown below.1,2
| Benchmark | BM.GPU.H100.8* Server (queries/s) | BM.GPU.H100.8* Offline (samples/s) | vs. BM.GPU.A100.8* Server (queries/s) | vs. BM.GPU.A100.8* Offline (samples/s) | vs. BM.GPU.A10* Server (queries/s) | vs. BM.GPU.A10* Offline (samples/s) |
| --- | --- | --- | --- | --- | --- | --- |
| ResNet | 0% | -1% | 101% | 115% | N/A | N/A |
| RetinaNet | 0% | 0% | 98% | 150% (1.5x) | 1406% (14.1x) | 1368% (13.7x) |
| 3D U-Net 99 | N/A | 0% | N/A | 70% | N/A | 900% |
| 3D U-Net 99.9 | N/A | 0% | N/A | 70% | N/A | N/A |
| RNN-T | N/A | N/A | 38% | 30% | 1465% (14.7x) | 723% (7.2x) |
| BERT 99 | 0% | -1% | 100% | 175% (1.8x) | N/A | N/A |
| BERT 99.9 | 0% | -1% | 287% (2.9x) | 325% (3.3x) | N/A | N/A |
| DLRM v2 99 | 67% | 64% | 525% (5.3x) | 303% (3.0x) | N/A | N/A |
| DLRM v2 99.9 | 5% | 2% | N/A | N/A | N/A | N/A |
| GPT-J 99 | 187% | 122% | 1258% (12.6x) | 774% (7.7x) | N/A | N/A |
| GPT-J 99.9 | N/A | N/A | 1248% (12.5x) | 832% (8.3x) | N/A | N/A |
| Llama2-70B 99 | N/A | N/A | N/A | N/A | N/A | N/A |
| Llama2-70B 99.9 | N/A | N/A | N/A | N/A | N/A | N/A |
| SDXL | N/A | N/A | N/A | N/A | N/A | N/A |
* Comparisons were made between results obtained in MLPerf v4.0 and MLPerf v3.1 for three comparison groups. For the columns titled "vs. BM.GPU.A100.8" and "vs. BM.GPU.A10," MLPerf v3.1 benchmark results were used for the BM.GPU.A100.8 and BM.GPU.A10 instance families.1,2
From the table above, we see that with the growing importance of generative AI, two highly anticipated generative AI benchmarks, Llama2-70B and Stable Diffusion XL, were added to benchmark suite version 4.0. Both run exceptionally well on systems with NVIDIA H100 GPUs. On the GPT-J benchmark, BM.GPU.H100.8 delivers more than 12x the server-scenario performance of BM.GPU.A100.8.1,2
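A note on reading the comparison table (an assumption about its convention, inferred from the paired values such as "1258% (12.6x)"): the percentages express the BM.GPU.H100.8 result as a percentage of the baseline result, so a value can be converted to a speedup multiplier by dividing by 100. A minimal sketch:

```python
# Illustrative only: convert a comparison percentage from the table (the
# H100.8 result expressed as a percentage of the baseline) into a speedup
# multiplier, e.g. "1258%" -> ~12.6x.
def pct_to_multiplier(pct: float) -> float:
    """Convert a ratio percentage to an 'x' multiplier, rounded to 1 decimal."""
    return round(pct / 100, 1)

print(pct_to_multiplier(1258))  # 12.6  (GPT-J 99 server, vs. BM.GPU.A100.8)
print(pct_to_multiplier(150))   # 1.5   (RetinaNet offline, vs. BM.GPU.A100.8)
```

Under this convention, 100% means parity with the baseline, and the 0% / -1% entries in the first column pair would instead denote the change between v4.0 and v3.1 results on the same shape.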
The authors want to thank Dr. Sanjay Basu, Senior Director of OCI Engineering, and Ramesh Subramaniam, Principal Program Manager of OCI Engineering, for their assistance in publishing these results.
Footnotes:
[1] MLPerf® v4.0 Inference Closed. Retrieved from https://mlcommons.org/benchmarks/inference-datacenter/ 29 March 2024, entry 4.0-0073. Result verified by MLCommons Association. The MLPerf name and logo are registered and unregistered trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. See www.mlcommons.org for more information.
[2] MLPerf® v3.1 Inference Closed. Retrieved from https://mlcommons.org/benchmarks/inference-datacenter/ 29 March 2024, entries 3.1-0119, 3.1-0120, 3.1-0121. Result verified by MLCommons Association. The MLPerf name and logo are registered and unregistered trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. See www.mlcommons.org for more information.