Arm GPU Datasheet

This document provides a comprehensive overview of Arm's GPU architectures from the Bifrost Mali-G71 to the 5th Gen Immortalis-G1, detailing API support, core features, and microarchitecture specifications. It includes tables that outline the capabilities of various Mali GPU models, including thread counts, operations per cycle, and cache sizes. Additionally, it offers insights into texture handling and specific architectural features for each generation of GPUs.

Naming: Mali-G71, Mali-G72, Mali-G31, Mali-G51, Mali-G52, Mali-G76, Mali-G57, Mali-G77, Mali-G78, Mali-G710, Mali-G510, Mali-G310, Mali & Immortalis-G715, Mali & Immortalis-G720, Mali & Immortalis-G725, Mali G1
Architecture: Bifrost (Mali-G71 to Mali-G76), Valhall (Mali-G57 to Mali & Immortalis-G715), 5th Gen (Mali & Immortalis-G720 to Mali G1)

API Support  Mali-G71  Mali-G72  Mali-G31  Mali-G51  Mali-G52  Mali-G76  Mali-G57  Mali-G77  Mali-G78  Mali-G710  Mali-G510  Mali-G310  Mali & Immortalis-G715  Mali & Immortalis-G720  Mali & Immortalis-G725  Mali G1
OpenGL ES 1.1 - 3.2  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓
Vulkan 1.0 - 1.3  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓
OpenCL 1.0 - 1.2  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓
OpenCL 2.0  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓
OpenCL 2.1  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓
OpenCL 3.0  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓
Core Features  Mali-G71  Mali-G72  Mali-G31  Mali-G51  Mali-G52  Mali-G76  Mali-G57  Mali-G77  Mali-G78  Mali-G710  Mali-G510  Mali-G310  Mali & Immortalis-G715  Mali & Immortalis-G720  Mali & Immortalis-G725  Mali G1
ASTC  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓
AFBC  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓
AFBC – RGBA16  ✓  ✓  ✓  ✓  ✓  ✓  ✓
AFRC  ✓  ✓  ✓  ✓  ✓  ✓
Shader framebuffer access  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓
Multiple Render Target[1]  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓
2xMSAA  Automatically promoted to 4xMSAA  ✓  ✓  ✓
4xMSAA  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓
8xMSAA  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓
16xMSAA  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓
8-bit integer dot product  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓
MMUL FP16  ✓
FP16/R11G11B10 accelerated blending[5]  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓
Descriptor indexing  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓
Conservative rasterization  ✓  ✓  ✓  ✓
Variable Rate Shading  ✓  ✓  ✓  ✓
Ray Tracing  o  o  o  ⊿

Microarchitecture Features  Mali-G71  Mali-G72  Mali-G31  Mali-G51  Mali-G52  Mali-G76  Mali-G57  Mali-G77  Mali-G78  Mali-G710  Mali-G510  Mali-G310  Mali & Immortalis-G715  Mali & Immortalis-G720  Mali & Immortalis-G725  Mali G1
Transaction elimination  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓
Hidden surface removal  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓
Fragment prepass  ✓  ✓
IDVS  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓
DVS  ✓  ✓  ✓

o – Immortalis: Always supported. Mali: Implementation dependent


⊿ - Ultra: Always supported. Pro & Premium: Implementation dependent

© Arm Ltd. 2025 Confidential


Core Config  Mali-G71  Mali-G72  Mali-G31  Mali-G51  Mali-G52  Mali-G76  Mali-G57  Mali-G77  Mali-G78  Mali-G710  Mali-G510  Mali-G310  Mali & Immortalis-G715  Mali & Immortalis-G720  Mali & Immortalis-G725  Mali G1
Arithmetic units  3  3  1/2  2/3  2/3  3  2  2  2  4  3/4  1-4  4  4  4  4
Warp width  4  4  4  4  8  8  16  16  16  16  16  16  16  16  16  16
Thread count (max)  384  384  256/512  512/768  512/768  768  1024  1024  1024  2048  1536-2048  512-2048  2048  2048  2048  2048
FP16 operations/cycle  48  48  16/32  32/48  64/96  96  128  128  128  256  192-256  64-256  512  512  512  512
FP32 operations/cycle  24  24  8/16  16/24  32/48  48  64  64  64  128  96-128  32-128  256  256  256  256
Fragments/cycle  1  1  1/2  2  2  2  2  2  2  4  4  2-4  4  4  4  4
Pixels/cycle  1  1  1/2  1/2  2  2  2  2  2  4  4  2-4  4  4  4  4
Texels/cycle  1  1  1/2  1/2  2  2  4  4  4  8  4-8  2-8  8  8  8  8
Load/store cache size (bytes)  16K  16K  4K  16K  16K  16K  16K  16K  16K  32K  16K  8K-16K  32K  32K  32K  32K
Texture cache size (bytes)  8K  8K  16K  16K  16K  32K  32K  32K  32K  32K  32K  16K-32K  32K  32K  32K  32K
Tile bits/pixel[2]  128  256  256  256  256  256  256  256  256  256  256  256  256  256  256  256
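
As a rough illustration of how the operations/cycle rows can be used: the sheet counts each FMA as 2 operations, so a theoretical peak follows from multiplying the per-cycle figure by shader core count and clock frequency. The sketch below assumes the rows are per shader core (the table is titled Core Config) and uses a hypothetical 10-core part at 850 MHz; neither of those two numbers comes from this sheet, and the function names are illustrative only.

```python
# FP32 operations/cycle, copied from the Core Config table for a few GPUs.
# Assumption: the figures are per shader core.
FP32_OPS_PER_CYCLE = {
    "Mali-G71": 24,
    "Mali-G76": 48,
    "Mali-G77": 64,
    "Mali-G710": 128,
    "Mali G1": 256,
}


def fma_per_cycle(gpu: str) -> int:
    """FMAs per cycle: the sheet counts each FMA as 2 operations."""
    return FP32_OPS_PER_CYCLE[gpu] // 2


def peak_gflops(gpu: str, shader_cores: int, clock_mhz: float) -> float:
    """Theoretical peak FP32 GFLOPS for a given core count and clock."""
    ops_per_cycle = FP32_OPS_PER_CYCLE[gpu] * shader_cores
    return ops_per_cycle * clock_mhz * 1e6 / 1e9


if __name__ == "__main__":
    # Hypothetical configuration, not taken from the sheet.
    print(peak_gflops("Mali-G77", shader_cores=10, clock_mhz=850.0))
```

This is an upper bound only; real workloads rarely sustain one FMA per lane on every cycle.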

Texturing  Mali-G71  Mali-G72  Mali-G31  Mali-G51  Mali-G52  Mali-G76  Mali-G57  Mali-G77  Mali-G78  Mali-G710  Mali-G510  Mali-G310  Mali & Immortalis-G715  Mali & Immortalis-G720  Mali & Immortalis-G725  Mali G1
Bilinear samples/cycle  1  1  1/2  1/2  2  2  4  4  4  8  4-8  2-8  8  8  8  8
Trilinear filtering  x2  x2  x2  x2  x2  x2  x2  x2  x2  x2  x2  x2  x2  x2  x2  x2
Nx anisotropic filtering[3]  N/A  xN[4]  xN  xN[4]  xN  xN  xN  xN  xN  xN  xN  xN  xN  xN  xN  xN
Depth format w/out reference  x1  x1  x1  x1  x1  x1  x1  x1  x1  x1  x1  x1  x1  x1  x1  x1
Depth format with reference  x1  x1  x1  x1  x1  x1  x2  x2  x2  x2  x2  x2  x2  x2  x2  x2
Data size over 32 bits/texel  x1  x1  x1  x1  x1  x1  x2  x2  x2  x2  x2  x2  x2  x2  x1[6]  x1[6]
ASTC w/out EXT_decode_mode  x1  x1  x1  x1  x1  x1  x2  x2  x2  x2  x2  x2  x2  x2  x1[6]  x1[6]
3D format with linear filtering  x2  x2  x2  x2  x2  x2  x2  x2  x2  x2  x2  x2  x2  x2  x2  x2
N channel 32bit/channel format with linear filtering  x4N  x4N  x4N  x4N  x4N  x4N  xN  xN  xN  xN  xN  xN  xN  xN  xN  xN
N plane YUV format  xN  xN  x1  x1  x1  x1  x2  x2  x2  x2  x2  x2  x2  x2  x2  x2
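
A minimal sketch of how the multipliers in this table can be combined: invert the bilinear samples/cycle figure to get cycles/sample, then multiply by the factor for each more complex filter. Only three GPUs are included, with values copied from the rows above; the function and dictionary names are illustrative and not part of any Arm tool.

```python
# Bilinear samples/cycle from the Texturing table (subset of GPUs).
BILINEAR_SAMPLES_PER_CYCLE = {
    "Mali-G72": 1,
    "Mali-G77": 4,
    "Mali-G710": 8,
}
TRILINEAR_MULTIPLIER = 2  # "x2" for every GPU in the table


def cycles_per_sample(gpu: str, trilinear: bool = False,
                      max_anisotropy: int = 1) -> float:
    """Worst-case cycles per texture sample for the chosen filter."""
    if max_anisotropy < 1:
        raise ValueError("max_anisotropy must be >= 1")
    # Invert samples/cycle to get cycles/sample for plain bilinear.
    cycles = 1.0 / BILINEAR_SAMPLES_PER_CYCLE[gpu]
    if trilinear:
        cycles *= TRILINEAR_MULTIPLIER
    # "xN" row: worst case for MAX_ANISOTROPY = N, usually less in practice.
    cycles *= max_anisotropy
    return cycles


if __name__ == "__main__":
    print(cycles_per_sample("Mali-G72", trilinear=True))
    print(cycles_per_sample("Mali-G77", trilinear=True))
    print(cycles_per_sample("Mali-G77", trilinear=True, max_anisotropy=4))
```

The anisotropic factor is the worst-case figure from footnote 3, so the result is an upper bound on cost rather than a prediction.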

1: OpenGL ES has 4 render targets and Vulkan 8
2: Tile storage per pixel may be able to exceed this, but with reduced tile size. Theoretical limit is higher from Mali-G710 onward, but 256 is the recommendation
3: Worst-case anisotropic filtering performance with a MAX_ANISOTROPY = N
4: Mali-G72 r0p3 / Mali-G51 r1p1 or higher required
5: All have float blending. Valhall adds hardware acceleration for standard blend operations
6: Only fp16 and UNORM10 formats fully achieve x1

Bifrost ISA Config  Mali-G71  Mali-G72  Mali-G31  Mali-G51  Mali-G52  Mali-G76
Thread count (max)  384  384  256/512  512/768  512/768  768
Max work registers (32b)  64  64  64  64  64  64
Thread count with 0-32 work registers  384  384  256/512  512/768  512/768  768
Thread count with 33-64 work registers  384  384  128/256  256/384  256/384  384

Valhall ISA Config  Mali-G57  Mali-G77  Mali-G78  Mali-G710  Mali-G510  Mali-G310  Mali-G715  Immortalis-G715
Thread count (max)  1024  1024  1024  2048  1536-2048  512-2048  2048  2048
Max work registers (32b)  64  64  64  64  64  64  64  64
Thread count with 0-32 work registers  1024  1024  1024  2048  1536-2048  512-2048  2048  2048
Thread count with 33-64 work registers  512  512  512  1024  768-1024  256-1024  1024  1024

5th Gen ISA Config  Mali & Immortalis-G720  Mali & Immortalis-G725  Mali G1
Thread count (max)  2048  2048  2048
Max work registers (32b)  64  64  64
Thread count with 0-32 work registers  2048  2048  2048
Thread count with 33-64 work registers  1024  1024  1024
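
The ISA Config tables above can be read as a lookup from a shader's work register count to the thread capacity of a shader core. The sketch below encodes that lookup for a few GPUs that have a single fixed value in the tables; the names are illustrative, and where the register count comes from (for example, a shader compiler's statistics output) is an assumption outside this sheet.

```python
# (threads with 0-32 work registers, threads with 33-64 work registers),
# copied from the ISA Config tables for GPUs with one fixed value per row.
THREAD_CAPACITY = {
    "Mali-G76": (768, 384),
    "Mali-G77": (1024, 512),
    "Mali-G710": (2048, 1024),
    "Mali G1": (2048, 1024),
}
MAX_WORK_REGISTERS = 64  # "Max work registers (32b)" row


def max_threads(gpu: str, work_registers: int) -> int:
    """Thread capacity of one shader core for a given work register count."""
    if not 0 <= work_registers <= MAX_WORK_REGISTERS:
        raise ValueError("work_registers must be between 0 and 64")
    up_to_32, up_to_64 = THREAD_CAPACITY[gpu]
    return up_to_32 if work_registers <= 32 else up_to_64


if __name__ == "__main__":
    print(max_threads("Mali-G77", 32))
    print(max_threads("Mali-G77", 33))
```

For these GPUs, crossing from 32 to 33 work registers halves the thread capacity, so a shader sitting just over the boundary may be worth a closer look.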



This reference sheet covers Arm GPUs from the Bifrost Mali-G71 through to the 5th Gen GPUs, up to Immortalis-G1.

The API Support, Core Features and Microarchitecture Features tables show which GPUs support which technologies. For more on a given technology, see the links below.

The Core Config table details the specifications of the chips, rather than just whether features are available. For each GPU it gives the threads in a warp, the total threads, the operations, texels and so on per clock cycle, and the cache sizes. Operations/cycle metrics count FMA operations, with each FMA counted as 2 operations/cycle. Note that tile write rate on Arm chips covers both the fragments written into the tile and the pixels written back out of the tile. Thread count is the total shader core hardware capacity; note that for OpenGL ES only 128 threads are exposed. For Mali-G310 and Mali-G510, Core Config gives ranges that depend on the implementation; please check with the device manufacturer for the exact specification.

For Texturing, to work out cycles/sample for filters more complicated than bilinear, apply the multipliers in the table on top of the bilinear performance to build up the required filter. Remember to invert the bilinear samples/cycle to get cycles/sample. For example, a simple trilinear filter costs 2 x 1 cycles/sample on a Mali-G72, and 2 x 0.25 cycles/sample on a Mali-G77. To add 4x anisotropic filtering, multiply by a further 4x. Note that the anisotropic filter scaling is the worst-case number, caused by the maximum number of sample taps; the cost will usually be less than this. Texture performance will differ from Image performance. Depth performance with/without reference distinguishes, for example, a shadow sampler with reference comparison returning a weighted bool from a normal sample returning the actual depth value.

Finally, the architecture-specific tables give thread counts and registers for the chips. For more on the generations of Arm architectures, see the links below.

NOTE: Mali-G78AE has the same base configurations and support as Mali-G78, but includes extra safety features. Mali-G6 series has the same specifications as Mali-G7 series for the values in this sheet.

For a general picture of Arm GPU architectures see:

• Arm GPU Architectures

Specific architecture pages:

• Bifrost (Mali-G71 – Mali-G76)
• Valhall (Mali-G57 – Immortalis-G715)
• 5th Gen (Mali-G720 – Mali G1)
• Performance Counters

For further reference on the technologies mentioned in the sheet, please refer to these webpages:

• ASTC (Adaptive Scalable Texture Compression)
• AFBC (Arm FrameBuffer Compression)
• MSAA (Multi-Sample Anti-Aliasing)
• Transaction Elimination
• Hidden surface removal
• IDVS (Index-Driven Vertex Shading)
• DVS (Deferred Vertex Shading)
• Shader framebuffer access
• GLES
• Vulkan

For free GPU profiling tools, see:

• Arm Performance Studio
