Wheeeeeeeeeeeee, the CUDA cores are idle while the tensor cores are active.
"To accelerate the execution of Machine Learning applications, recent GPUs use Tensor cores to speed up the general matrix multiplication (GEMM), which is the heart of deep learning. The Streaming Processors in such GPUs also contain CUDA cores to implement general computations. While the Tensor cores can significantly improve the performance of GEMM, the CUDA cores remain idle when Tensor cores are running. This leads to inefficient resource utilization. In this work, we propose to offload part of the GEMM operations from Tensor cores to CUDA cores to fully utilize GPU resources. We investigated the performance bottleneck in such offloading schemes and proposed architectural optimization to maximize the GPU throughput. Our technique is purely hardware-based and does not require a new compiler or other software support. Our evaluation results show that the proposed scheme can improve performance by 19% at the maximum."
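To make "offload part of the GEMM" concrete, here is a minimal software sketch, assuming the split is done by output rows. This is NOT the paper's method (theirs is purely hardware-based); the names `split_rows`, `split_gemm` and `offload_fraction` are hypothetical, and both "units" are simulated by the same naive matmul. The point is only that the two partial results stitch back into the full product.

```python
def matmul(a, b):
    """Naive GEMM on lists of lists: returns a @ b."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def split_rows(n_rows, offload_fraction):
    """Hypothetical partition: the last `offload_fraction` of output rows
    go to the CUDA cores, the rest stay on the Tensor cores."""
    if not 0.0 <= offload_fraction <= 1.0:
        raise ValueError("offload_fraction must be in [0, 1]")
    n_cuda = round(n_rows * offload_fraction)  # nearest whole row
    return range(0, n_rows - n_cuda), range(n_rows - n_cuda, n_rows)


def split_gemm(a, b, offload_fraction):
    """Compute a @ b as two independent row blocks, one per 'unit'."""
    tensor_rows, cuda_rows = split_rows(len(a), offload_fraction)
    # On real hardware these two calls would run concurrently on
    # different execution units; here both are plain Python.
    top = matmul([a[i] for i in tensor_rows], b)     # "Tensor cores"
    bottom = matmul([a[i] for i in cuda_rows], b)    # "CUDA cores"
    return top + bottom  # row blocks concatenate back into a @ b


a = [[1, 2], [3, 4], [5, 6], [7, 8]]
b = [[1, 0], [0, 1]]
print(split_gemm(a, b, 0.25) == matmul(a, b))
```

Splitting by output rows keeps the two pieces fully independent (no reduction across units), which is why it is the simplest partition to illustrate.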
Using both doesn't help as much as you would think, because it's hard to offload part of the GEMM to the CUDA cores efficiently; even the paper's best case is a 19% improvement.
This is a pretty big issue for NVIDIA to solve; we'll see.
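A back-of-envelope way to see why the gain is capped: if Tensor cores process GEMM work at throughput T and CUDA cores at C, giving the CUDA cores a fraction f = C / (T + C) makes both finish together, and the ideal speedup is (T + C) / T = 1 + C / T. Since C is much smaller than T for GEMM, the bound stays modest. The sketch below assumes perfectly parallel, contention-free units (it ignores the shared-resource bottleneck the abstract mentions), and the 8:1 throughput ratio in the usage line is a hypothetical input, not a measured number.

```python
def finish_time(work, f, tensor_tput, cuda_tput):
    """Time until both units are done when CUDA cores get fraction f."""
    return max((1.0 - f) * work / tensor_tput, f * work / cuda_tput)


def balanced_split(tensor_tput, cuda_tput):
    """Offload fraction that equalizes finish times, plus the ideal
    speedup over running on Tensor cores alone (no contention assumed)."""
    f = cuda_tput / (tensor_tput + cuda_tput)
    speedup = (tensor_tput + cuda_tput) / tensor_tput  # = 1 + C/T
    return f, speedup


# Hypothetical 8:1 Tensor-core-to-CUDA-core GEMM throughput ratio.
f, speedup = balanced_split(8.0, 1.0)
print(f"offload {f:.3f} of the work -> at most {speedup:.3f}x")
```

With that hypothetical ratio the upper bound is about 1.125x, and any real contention between the two units only pulls it lower.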