Arm Mali GPU Analysis: Valhall vs. Bifrost Architecture, Specs & Benchmarks

Arm’s transition from the Bifrost architecture to Valhall represents a fundamental restructuring of mobile graphics processing.

This analysis contrasts the technical specifications and real-world performance of key GPU models, ranging from the legacy Mali-G51 to the performance-focused Mali-G610. By examining hardware-level distinctions, specifically the move from thread Quads to 16-wide Warps and the addition of the Command Stream Frontend, we determine how these engineering decisions affect frame rates in 3D applications.

The following sections detail the core configurations, execution engines, and throughput data that distinguish these architectures.

Arm Mali GPU Architecture Comparison

Valhall vs. Bifrost: The Generational Leap.

An interactive technical analysis of Arm’s mobile GPU architectures, performance density, and execution engines. Updated October 2025.

Performance Density Hierarchy

Raw GFXBench Manhattan 3.1 (Offscreen 1080p) performance density. Higher scores indicate superior peak throughput in complex shading scenarios relative to generation. Data aggregated from public benchmarks.

Technical Specification Matrix

| Model | Architecture | Year | Exec Engines | FMA Lanes/Core | Key Feature |
|---|---|---|---|---|---|
| Mali-G610 | Valhall 3 | 2021 | 1 Engine | 64 Lanes | Command Stream Frontend |
| Mali-G68 | Valhall 2 | 2020 | 1 Engine | 32 Lanes | Async Top Level |
| Mali-G77 | Valhall 1 | 2019 | 1 Superscalar | 32 Lanes | Quad Texture Mapper |
| Mali-G57 | Valhall 1 | 2019 | 1 Superscalar | 32 Lanes | High Density Mainstream |
| Mali-G76 | Bifrost 3 | 2018 | 3 Engines | 24 Lanes (3×8) | Triple-Engine Core |
| Mali-G52 | Bifrost 2 | 2018 | 1 Engine | 8 Lanes | Int8 Dot Product (ML) |
| Mali-G51 | Bifrost 1 | 2016 | 1 Engine | 4 Lanes | Dual-Pixel Core |
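
To make the lanes-per-core column easier to compare across core counts, here is a minimal Python sketch that multiplies it out. The lane figures are copied from the matrix above; the function names and the clock speed passed to `peak_gflops` are illustrative assumptions, not vendor data.

```python
# FMA lanes per core, copied from the specification matrix above.
LANES_PER_CORE = {
    "Mali-G610": 64,
    "Mali-G68": 32,
    "Mali-G77": 32,
    "Mali-G57": 32,
    "Mali-G76": 24,
    "Mali-G52": 8,
    "Mali-G51": 4,
}


def total_lanes(model: str, cores: int) -> int:
    """Total FMA lanes across all shader cores of one GPU configuration."""
    return LANES_PER_CORE[model] * cores


def peak_gflops(model: str, cores: int, clock_mhz: float) -> float:
    """Theoretical FP32 peak, counting each FMA as two floating-point operations."""
    return total_lanes(model, cores) * 2 * clock_mhz / 1000.0


if __name__ == "__main__":
    print(total_lanes("Mali-G610", 6))   # 384
    print(total_lanes("Mali-G77", 11))   # 352
    # 850.0 MHz is a placeholder clock, not a measured value for any SoC.
    print(peak_gflops("Mali-G610", 6, 850.0))
```

A 6-core G610 ends up with slightly more total lanes than an 11-core G77, which is the arithmetic behind the core-count comparisons later in this article. Real clocks vary by SoC, so treat the GFLOPS figure as a rough upper bound only.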

Real-World Gaming Benchmarks

Theoretical FLOPs rarely translate perfectly to gaming frame rates due to thermal throttling and memory bandwidth. The following table aggregates average GFXBench Manhattan 3.1 Offscreen (1080p) scores from commercial devices.

| GPU Model | Typical Device (SoC) | Score (FPS) | Performance Note |
|---|---|---|---|
| Mali-G610 MC6 | Dimensity 8100 (Realme GT Neo 3) | 102 – 115 | Near-flagship performance; beats older G77 due to 64 FMA lanes. |
| Mali-G77 MP11 | Exynos 990 (Galaxy S20) | 77 – 85 | High peak, but prone to thermal throttling in sustained loads. |
| Mali-G76 MP16 | Kirin 990 (Huawei P40) | 55 – 75 | Massive core count (16) limited by Bifrost scheduling overhead. |
| Mali-G68 MC4 | Dimensity 900 (OPPO Reno6 5G) | 40 – 45 | Strong mid-range. Matches older flagships in efficiency. |
| Mali-G57 MC2 | Dimensity 700 (Redmi Note 10 5G) | 24 – 26 | Entry-level 5G standard. Capable of 30fps in modern titles. |
| Mali-G52 MC2 | Helio G80 (Redmi 9) | 14 – 15 | Budget tier. Struggles with high-fidelity 3D rendering. |
| Mali-G51 MP4 | Kirin 710 (Honor 8X) | 12 – 14 | Legacy. Suitable only for casual 2D gaming or light 3D. |
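
One rough way to read these scores as "performance density" is to divide the midpoint of each range by the core count. The sketch below does that with the figures from the table above. The `fps_per_core` helper is a hypothetical name, and the metric ignores clock speed, process node, and memory bandwidth, so it is a loose indicator rather than a pure architectural measure.

```python
# (model, cores, low_fps, high_fps) copied from the benchmark table above.
RESULTS = [
    ("Mali-G610", 6, 102, 115),
    ("Mali-G77", 11, 77, 85),
    ("Mali-G76", 16, 55, 75),
    ("Mali-G68", 4, 40, 45),
    ("Mali-G57", 2, 24, 26),
    ("Mali-G52", 2, 14, 15),
    ("Mali-G51", 4, 12, 14),
]


def fps_per_core(cores: int, low: float, high: float) -> float:
    """Midpoint of the FPS range divided by the number of shader cores."""
    return (low + high) / 2 / cores


if __name__ == "__main__":
    # Rank configurations from densest to least dense.
    ranked = sorted(RESULTS, key=lambda r: fps_per_core(*r[1:]), reverse=True)
    for model, cores, low, high in ranked:
        print(f"{model:<10} {fps_per_core(cores, low, high):5.2f} FPS per core")
```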

Microarchitecture Deep Dive

The Control Logic Shift

CSF vs Job

The Mali-G610 introduces the Command Stream Frontend (CSF), a dedicated microcontroller inside the GPU. In older generations like the G77 and G76, a “Job Manager” relied on the CPU to handle scheduling, creating latency bubbles. The CSF parses command streams directly from memory, reducing CPU overhead.

Texture Mapping Units

4 vs 2

The Mali-G77 and G57 (Valhall) utilize a Quad Texture Mapper, processing 4 texels per clock. The previous Bifrost generation (G76, G52) used a Dual Texture Mapper (2 texels/clock). This architectural doubling allows Valhall GPUs to handle high-resolution textures significantly better.
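
The effect of the wider texture mapper on peak fill rate is simple multiplication. Here is a small sketch, assuming the texels-per-clock figures quoted above; the core count and clock in the example are placeholders chosen only to show the ratio.

```python
def texel_rate_gtexels(texels_per_clock: int, cores: int, clock_mhz: float) -> float:
    """Peak texel throughput in gigatexels per second for one GPU configuration."""
    return texels_per_clock * cores * clock_mhz / 1000.0


if __name__ == "__main__":
    # Same hypothetical core count and clock, so only the mapper width differs.
    quad = texel_rate_gtexels(4, cores=2, clock_mhz=850.0)  # Valhall-style mapper
    dual = texel_rate_gtexels(2, cores=2, clock_mhz=850.0)  # Bifrost-style mapper
    print(quad, dual, quad / dual)
```

These are peak rates. Sustained texture throughput also depends on cache behaviour and memory bandwidth, which this sketch does not model.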

Warp Execution

16 vs 4

Early Bifrost (G51) executes threads in “Quads” (groups of 4), and the later Bifrost parts (G52, G76) widen each engine to 8 lanes, as the specification matrix shows. Valhall (G77, G610) expands this to 16-wide Warps. This aligns Arm GPUs with desktop-class architectures, improving “Performance Density” by amortizing control logic over more threads.

Machine Learning: From Dot Products to Matrix Ops

The role of the GPU has expanded beyond pixels. The Mali-G52 was a pivotal chip, introducing hardware acceleration for Int8 dot products, enabling mid-range phones to run AI face detection locally.

The Mali-G77 and G57 refined this, but the Mali-G68 and G610 (Valhall Gen 2/3) integrated explicit Matrix Multiply instructions. These instructions are optimized for the matrix-heavy math of modern Neural Networks, providing a 2x-4x uplift in AI inference speed compared to the G76.
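
To show what an Int8 dot-product step computes, here is a plain-Python model of a 4-element signed 8-bit dot product with a wider accumulator. It illustrates the arithmetic only; the function names are hypothetical, and it does not represent the actual Mali instruction encoding or any driver API.

```python
def dot4_int8_accumulate(a, b, acc=0):
    """Multiply four int8 pairs, sum the products, and add them to a wide accumulator."""
    if len(a) != 4 or len(b) != 4:
        raise ValueError("expected exactly four elements per operand")
    for value in (*a, *b):
        if not -128 <= value <= 127:
            raise ValueError("operand does not fit in a signed 8-bit integer")
    return acc + sum(x * y for x, y in zip(a, b))


def dot_int8(a, b):
    """Longer dot product built from 4-wide steps (length must be a multiple of 4)."""
    if len(a) != len(b) or len(a) % 4 != 0:
        raise ValueError("vectors must have equal length divisible by 4")
    acc = 0
    for i in range(0, len(a), 4):
        acc = dot4_int8_accumulate(a[i:i + 4], b[i:i + 4], acc)
    return acc


if __name__ == "__main__":
    print(dot4_int8_accumulate([1, -2, 3, 4], [5, 6, -7, 8]))  # 5 - 12 - 21 + 32 = 4
    print(dot_int8([1, 2, 3, 4, 5, 6, 7, 8], [1] * 8))          # 36
```

Quantized neural network layers reduce largely to many such accumulations, which is why doing four multiply-adds per lane in a single operation helps inference throughput.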

Variable Rate Shading (VRS)

Supported: Mali-G610, Mali-G68 (Tier 1)

Not Supported: Mali-G77, G76, G57, G52

VRS allows the GPU to lower the shading rate, computing one shading result for a small block of pixels instead of one per pixel, in areas of the screen where the loss of detail is hard to notice (like shadows or fast-moving regions). The G610 and G68 support this in hardware, offering up to 30% power savings in supported games. Older architectures like the G77 must shade every pixel individually, consuming more battery for the same visual output.
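
As a back-of-the-envelope illustration of why VRS saves work, the sketch below counts fragment shader invocations when part of the frame is shaded once per block. The 2×2 block size and the 50% coverage figure are assumptions chosen for the example, and fewer invocations do not translate one-to-one into power savings.

```python
def shaded_invocations(width, height, coarse_fraction, block_w=2, block_h=2):
    """Fragment invocations per frame when `coarse_fraction` of the pixels
    are shaded once per block_w x block_h block and the rest once per pixel."""
    if not 0.0 <= coarse_fraction <= 1.0:
        raise ValueError("coarse_fraction must be between 0 and 1")
    pixels = width * height
    coarse_pixels = pixels * coarse_fraction
    full_rate = pixels - coarse_pixels
    return full_rate + coarse_pixels / (block_w * block_h)


if __name__ == "__main__":
    baseline = shaded_invocations(1920, 1080, 0.0)
    with_vrs = shaded_invocations(1920, 1080, 0.5)  # half the frame at 2x2 rate
    print(f"baseline: {baseline:.0f}, with VRS: {with_vrs:.0f}")
    print(f"invocations saved: {1 - with_vrs / baseline:.1%}")  # 37.5%
```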

Architectural Evolution Details

The Quad vs. Warp Paradigm Shift

One of the most defining differences between the generations in this comparison is how they group threads. The Bifrost architecture (G51, G52, G76) is built around “Quads”, groups of 4 threads that share control logic and execute each instruction in lockstep (the G52 and G76 widen each engine to 8 lanes). This was efficient for simple graphics but hit scaling limits.

Valhall (G77, G57, G68, G610) moved to a “Warp” model, grouping 16 threads. This matches the way modern desktop GPUs and APIs like Vulkan operate. By sharing the instruction fetch and decode logic across 16 threads instead of 4, Valhall dramatically reduces the silicon area spent on “management” and increases the area spent on “math”. This is why a 6-core Mali-G610 can outperform a 10-core Mali-G76.
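
The amortization argument can be put in numbers: every lockstep group needs its own instruction fetch and decode, so wider groups mean fewer of them for the same pixel count. Here is a minimal sketch, assuming one thread per pixel of a 1080p frame; `lockstep_groups` is a hypothetical helper name.

```python
import math


def lockstep_groups(threads: int, width: int) -> int:
    """Number of quads or warps needed to cover `threads` at a given group width."""
    return math.ceil(threads / width)


if __name__ == "__main__":
    pixels = 1920 * 1080                 # one fragment thread per pixel
    quads = lockstep_groups(pixels, 4)   # 518400 groups of 4
    warps = lockstep_groups(pixels, 16)  # 129600 groups of 16
    print(quads, warps, quads / warps)   # 4x fewer fetch/decode sequences
```

The flip side, which this sketch does not model, is that wider groups lose more efficiency when threads inside one group take different branches.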

Vulkan API Support & Optimization

While all GPUs in this list support Vulkan 1.0, the experience differs significantly due to hardware capabilities:

  • Mali-G610 & G68: Fully optimized for Vulkan 1.3 (driver dependent). The Command Stream Frontend (CSF) in the G610 specifically accelerates Vulkan draw calls, making it the best choice for emulation and modern 3D titles.
  • Mali-G77 & G57: Strong Vulkan support, but relies on the CPU for job scheduling (Job Manager), which can introduce overhead in high-draw-call scenarios.
  • Mali-G52 & G51: Limited to older Vulkan feature sets. They lack hardware support for some modern features like subgroup operations, often forcing the driver to emulate them, which costs performance.
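
Because the Vulkan version a device exposes depends on the driver, applications usually check it at runtime. Vulkan reports it as a packed 32-bit integer in the `apiVersion` field of `VkPhysicalDeviceProperties`. The sketch below mirrors the bit layout of the `VK_MAKE_API_VERSION` macros in plain Python; it does not call a real driver, and the function names are illustrative.

```python
def make_api_version(variant: int, major: int, minor: int, patch: int) -> int:
    """Pack a version the way VK_MAKE_API_VERSION does (3/7/10/12-bit fields)."""
    return (variant << 29) | (major << 22) | (minor << 12) | patch


def decode_api_version(version: int) -> tuple:
    """Unpack (major, minor, patch) from a packed Vulkan apiVersion value."""
    return ((version >> 22) & 0x7F, (version >> 12) & 0x3FF, version & 0xFFF)


def supports(version: int, major: int, minor: int) -> bool:
    """True if the reported version is at least major.minor."""
    return decode_api_version(version)[:2] >= (major, minor)


if __name__ == "__main__":
    reported = make_api_version(0, 1, 3, 0)  # stand-in for a value read from a driver
    print(decode_api_version(reported))      # (1, 3, 0)
    print(supports(reported, 1, 1))          # True
```

A version check like this only covers the core API level. Features such as subgroup operations still need to be queried individually through the device's feature and property structures.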

Frequently Asked Questions

Which GPU is best for gaming?

The Mali-G610 is the superior choice among these options. Its 64 FMA lanes per core and Command Stream Frontend provide the highest frame rates and best efficiency. The G77 is also powerful but lacks the modern efficiency features of the G610.

What is the difference between G57 and G52?

The G57 uses the newer Valhall architecture, while the G52 uses the older Bifrost. The G57 offers approximately 30% better performance density and significantly higher texture throughput (4 texels/cycle vs 2 texels/cycle), making it much better for high-resolution displays.

Does core count always matter?

No. A 6-core Mali-G610 is significantly faster than an 11-core Mali-G77. The architectural density (operations per clock cycle per core) of the G610 is much higher (64 FMA lanes vs 32 FMA lanes), meaning it can do more work with fewer cores.

Does the Mali-G68 support Ray Tracing?

No. Hardware Ray Tracing was introduced in the Immortalis-G715. The Mali-G68 and G610 rely on standard rasterization, though they support Variable Rate Shading (VRS) to improve performance.