Machine learning has become one of the most prominent and widely deployed workloads in modern computing, and Intel has recently published details of its Advanced Matrix Extensions (AMX) in this context.
AMX is an x86 extension designed to accelerate machine learning workloads. It is the third in a series of such extensions that Intel has launched recently. Shipped under the DL Boost technology brand, it has far-reaching implications.
Advanced Matrix Extensions (AMX), also known as Intel Advanced Matrix Extensions (Intel AMX), are extensions to the existing x86 instruction set architecture (ISA) designed to operate on matrices and thereby accelerate artificial intelligence (AI) and machine learning (ML) workloads.
AMX was first announced by Intel in June 2020 and is first supported in the Sapphire Rapids microarchitecture for Xeon servers, planned for this year and beyond. It introduces 2-dimensional registers called tiles, upon which accelerators can perform operations. It is designed as an extensible architecture; the first accelerator implemented is called the tile matrix multiply unit (TMUL).
The TMUL unit supports BF16 and INT8 input types. The register file consists of 8 tiles, each with up to 16 rows of 64 bytes (32 BF16 or 64 INT8 values per row). AMX thus uses eight 1 KiB (1,024-byte) registers as its fundamental data store, and the TMUL instructions operate on data moved between memory and those tile registers.
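To make those sizes concrete, the tile geometry above works out as follows (plain Python arithmetic, not AMX code):

```python
# Tile register geometry as described above: 8 tiles, each with up to
# 16 rows of 64 bytes; element counts follow from the element width.
TILES = 8
MAX_ROWS = 16
ROW_BYTES = 64

tile_bytes = MAX_ROWS * ROW_BYTES       # 1024 bytes (1 KiB) per tile
bf16_per_row = ROW_BYTES // 2           # BF16 is 2 bytes -> 32 values
int8_per_row = ROW_BYTES // 1           # INT8 is 1 byte  -> 64 values
total_state = TILES * tile_bytes        # 8192 bytes of tile state

print(tile_bytes, bf16_per_row, int8_per_row, total_state)  # 1024 32 64 8192
```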
TMUL is implemented as a dedicated co-processing engine built into each core, and the premise behind AMX is that TMUL is only the first such co-processor.
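To illustrate the kind of operation TMUL performs, here is a small behavioral sketch in Python of an INT8 tile multiply-accumulate in the spirit of AMX's TDPBSSD instruction (signed-byte products accumulated into 32-bit integers). This is a simplified model of the semantics, not the hardware's exact data layout:

```python
def tile_dp_int8(c, a, b):
    """Simulate C += A * B, where A is an MxK int8 tile, B is a KxN
    int8 tile, and C is an MxN tile of int32 accumulators. The real
    instruction interleaves B's bytes; this sketch keeps plain layout."""
    m, k, n = len(a), len(a[0]), len(b[0])
    for i in range(m):
        for j in range(n):
            c[i][j] += sum(a[i][p] * b[p][j] for p in range(k))
    return c

# Tiny example: a 2x4 tile times a 4x2 tile, accumulated into 2x2 int32.
A = [[1, 2, 3, 4], [5, 6, 7, 8]]
B = [[1, 0], [0, 1], [1, 0], [0, 1]]
C = [[0, 0], [0, 0]]
print(tile_dp_int8(C, A, B))  # [[4, 6], [12, 14]]
```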
Intel has designed AMX to be broader than this: if Intel pushes further into its multi-die silicon strategy, we could at some point see custom accelerators exposed through AMX.
As noted above, AMX, or Advanced Matrix Extensions, is the third in a series of AI-specific extensions that Intel has launched.
Image Source: Intel
The first was AVX512_VNNI, launched with Cascade Lake and explicitly targeted at speeding up CNN kernels. The next, AVX512_BF16, came with Cooper Lake and added support for the bfloat16 format, including instructions for converting single-precision floating-point values to bfloat16. Finally, AMX is the third in the series, introduced with the 4th-generation Xeon Scalable processors based on the Sapphire Rapids microarchitecture.
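The idea behind the bfloat16 conversion can be sketched in a few lines of Python: bfloat16 is simply the top 16 bits of an IEEE-754 single-precision value. (The hardware conversion instructions use round-to-nearest-even; plain truncation is shown here for clarity.)

```python
import struct

def fp32_to_bf16(x):
    """Take the top 16 bits of the single-precision encoding (truncation)."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    return bits >> 16

def bf16_to_fp32(h):
    """Zero-pad a bfloat16 bit pattern back to a 32-bit float."""
    return struct.unpack('<f', struct.pack('<I', h << 16))[0]

# 3.140625 needs only 7 mantissa bits, so it survives the round trip exactly.
print(hex(fp32_to_bf16(1.0)))                # 0x3f80
print(bf16_to_fp32(fp32_to_bf16(3.140625)))  # 3.140625
```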
AMX, or Advanced Matrix Extensions, is a new x86 extension, and it is somewhat more complex than the previous generations. Note that the earlier extensions, AVX512_VNNI and AVX512_BF16, were built on top of AVX-512, whereas AMX stands alone: it brings its own storage and its own operational components.
As mentioned before, with AMX Intel has introduced a new matrix register file containing eight rank-2 tensor (matrix) registers. These registers are called tiles. AMX also defines accelerators that operate on those tiles.
However, the extension follows the same implementation model as its predecessors and therefore requires no changes to the overall architecture. As is typical with other extensions, AMX instructions can be interleaved with other x86 code.
The matrix register file consists of eight tiles. Tiles are configured through the tile control register (TILECFG), which sets each tile's dimensions; the sizes can be chosen to match the algorithm operating on them.
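To illustrate, the 64-byte configuration image that the LDTILECFG instruction consumes can be sketched in Python. The byte offsets below follow the layout commonly shown in Intel's AMX sample code (palette id at byte 0, per-tile bytes-per-row at bytes 16-47, per-tile row counts at bytes 48-63) and should be verified against the SDM before use:

```python
import struct

def make_tilecfg(shapes, palette=1):
    """Build a 64-byte TILECFG memory image.

    shapes maps a tile number (0-7) to (rows, bytes_per_row). Offsets
    are illustrative; check Intel's SDM for the authoritative layout."""
    cfg = bytearray(64)
    cfg[0] = palette          # palette 1: eight 16-row x 64-byte tiles
    cfg[1] = 0                # start_row, used to resume faulted loads
    for tile, (rows, colsb) in shapes.items():
        struct.pack_into('<H', cfg, 16 + 2 * tile, colsb)  # bytes per row
        cfg[48 + tile] = rows                               # row count
    return bytes(cfg)

# Configure tiles 0 and 1 as full-size 16-row x 64-byte tiles.
cfg = make_tilecfg({0: (16, 64), 1: (16, 64)})
print(len(cfg), cfg[0], cfg[48])  # 64 1 16
```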
As things stand now, AMX defines 12 instructions, grouped into three categories:

- AMX-TILE: tile configuration and data movement (LDTILECFG, STTILECFG, TILELOADD, TILELOADDT1, TILESTORED, TILERELEASE, TILEZERO)
- AMX-BF16: the bfloat16 tile matrix multiply (TDPBF16PS)
- AMX-INT8: the 8-bit integer tile matrix multiplies (TDPBSSD, TDPBSUD, TDPBUSD, TDPBUUD)
AMX is enumerated in the same way as the other AVX-512 extensions. The base tile architecture, together with the instructions that accompany it, is referred to as AMX-TILE; it includes both the instructions for configuring the tiles and the instructions for moving data into and out of them.
However, the current ISA documentation is not entirely straightforward about Sapphire Rapids support. The ISA Extensions Reference Manual indicates that Sapphire Rapids supports the extension, but there is no clear statement of how it will be supported, which leaves some ambiguity. In any case, ever since AMX was announced, we have seen ecosystem support roll out in a gradual, phased manner.
The new AMX extension should definitely improve the performance of real-world AI workloads, and it can go a long way toward delivering efficient AI training and inference. AMX's role in the broader AI industry is expected to be significant, but its exact impact cannot be estimated at this moment.
Intel recently shared an interesting roadmap for Sapphire Rapids; more details on the chips and their features are still awaited. The full Advanced Matrix Extensions (AMX) specification has already been made available, and we see it as a significant improvement.
Let us know your thoughts and suggestions in the comment box below. Thanks for visiting!