X86 instruction set extension developed by Intel
From Wikipedia, the free encyclopedia
The FMA instruction set is an extension to the 128- and 256-bit Streaming SIMD Extensions instructions in the x86 microprocessor instruction set to perform fused multiply–add (FMA) operations.[1] There are two variants, FMA3 and FMA4.
FMA3 and FMA4 instructions have almost identical functionality, but are not compatible. Both contain fused multiply–add (FMA) instructions for floating-point scalar and SIMD operations, but FMA3 instructions have three operands, while FMA4 ones have four. The FMA operation has the form d = round(a · b + c), where round rounds the exact value of a · b + c once, so that the result fits within the destination register when it has too many significant bits.
The four-operand form (FMA4) allows a, b, c and d to be four different registers, while the three-operand form (FMA3) requires that d be the same register as a, b or c. The three-operand form makes the code shorter and the hardware implementation slightly simpler, while the four-operand form provides more programming flexibility.
See XOP instruction set for more discussion of compatibility issues between Intel and AMD.
Supported instructions include:
Mnemonic | Operation | Mnemonic | Operation |
---|---|---|---|
VFMADD | result = + a · b + c | VFMADDSUB | result = a · b + c for i = 1, 3, ...; result = a · b − c for i = 0, 2, ... |
VFNMADD | result = − a · b + c | | |
VFMSUB | result = + a · b − c | VFMSUBADD | result = a · b − c for i = 1, 3, ...; result = a · b + c for i = 0, 2, ... |
VFNMSUB | result = − a · b − c | | |
Note that VFNMADD computes result = − a · b + c, not result = − (a · b + c).
The explicit order of operands is included in the mnemonic using the numbers "132", "213", and "231":
Postfix 1 | Operation | possible memory operand | overwrites |
---|---|---|---|
132 | a = a · c + b | c (factor) | a (other factor) |
213 | a = b · a + c | c (summand) | a (factor) |
231 | a = b · c + a | c (factor) | a (summand) |
The mnemonic also encodes the operand format (packed or scalar) and the element size (single or double precision):
Postfix 2 | precision | size | Postfix 2 | precision | size |
---|---|---|---|---|---|
SS | Single | 32 bit | SD | Double | 64 bit |
PSx | Single | 4× 32 bit | PDx | Double | 2× 64 bit |
PSy | Single | 8× 32 bit | PDy | Double | 4× 64 bit |
PSz | Single | 16× 32 bit | PDz | Double | 8× 64 bit |
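The element counts in the table follow from dividing the register width by the element width. The sketch below assumes that the size letters x, y and z denote 128-bit (xmm), 256-bit (ymm) and 512-bit (zmm) registers; the function names are made up for this illustration.

```c
/* Register width in bits for the size letter used in the table.
   Returns 0 for any other letter. */
unsigned register_bits_for_suffix(char size_letter) {
    switch (size_letter) {
    case 'x': return 128;  /* xmm */
    case 'y': return 256;  /* ymm */
    case 'z': return 512;  /* zmm */
    default:  return 0;
    }
}

/* Number of packed elements: register width divided by element width. */
unsigned packed_lanes(char size_letter, unsigned element_bits) {
    return register_bits_for_suffix(size_letter) / element_bits;
}
```

For example, `packed_lanes('y', 64)` gives the 4 double-precision elements listed for PDy.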
This results in
Encoding | Mnemonic | Operands | Operation |
---|---|---|---|
VEX.256.66.0F38.W1 98 /r | VFMADD132PDy | ymm, ymm, ymm/m256 | a = a · c + b |
VEX.256.66.0F38.W0 98 /r | VFMADD132PSy | ymm, ymm, ymm/m256 | a = a · c + b |
VEX.128.66.0F38.W1 98 /r | VFMADD132PDx | xmm, xmm, xmm/m128 | a = a · c + b |
VEX.128.66.0F38.W0 98 /r | VFMADD132PSx | xmm, xmm, xmm/m128 | a = a · c + b |
VEX.LIG.66.0F38.W1 99 /r | VFMADD132SD | xmm, xmm, xmm/m64 | a = a · c + b |
VEX.LIG.66.0F38.W0 99 /r | VFMADD132SS | xmm, xmm, xmm/m32 | a = a · c + b |
VEX.256.66.0F38.W1 A8 /r | VFMADD213PDy | ymm, ymm, ymm/m256 | a = b · a + c |
VEX.256.66.0F38.W0 A8 /r | VFMADD213PSy | ymm, ymm, ymm/m256 | a = b · a + c |
VEX.128.66.0F38.W1 A8 /r | VFMADD213PDx | xmm, xmm, xmm/m128 | a = b · a + c |
VEX.128.66.0F38.W0 A8 /r | VFMADD213PSx | xmm, xmm, xmm/m128 | a = b · a + c |
VEX.LIG.66.0F38.W1 A9 /r | VFMADD213SD | xmm, xmm, xmm/m64 | a = b · a + c |
VEX.LIG.66.0F38.W0 A9 /r | VFMADD213SS | xmm, xmm, xmm/m32 | a = b · a + c |
VEX.256.66.0F38.W1 B8 /r | VFMADD231PDy | ymm, ymm, ymm/m256 | a = b · c + a |
VEX.256.66.0F38.W0 B8 /r | VFMADD231PSy | ymm, ymm, ymm/m256 | a = b · c + a |
VEX.128.66.0F38.W1 B8 /r | VFMADD231PDx | xmm, xmm, xmm/m128 | a = b · c + a |
VEX.128.66.0F38.W0 B8 /r | VFMADD231PSx | xmm, xmm, xmm/m128 | a = b · c + a |
VEX.LIG.66.0F38.W1 B9 /r | VFMADD231SD | xmm, xmm, xmm/m64 | a = b · c + a |
VEX.LIG.66.0F38.W0 B9 /r | VFMADD231SS | xmm, xmm, xmm/m32 | a = b · c + a |
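The encodings listed above follow a regular pattern: the high nibble of the opcode byte selects the operand order (9, A, B for 132, 213, 231), the low nibble selects packed (8) or scalar (9), and the W bit selects double (W1) or single (W0) precision. The sketch below decodes only that pattern as read off this table; the function name `vfmadd_mnemonic` is hypothetical, and it does not attempt to cover any other opcode or the vector-length field.

```c
#include <stddef.h>
#include <stdio.h>

/* Builds the VFMADD mnemonic (without the x/y size letter) for one of
   the six opcode bytes listed in the table. Returns 0 on success and
   -1 for any other opcode byte. */
int vfmadd_mnemonic(unsigned char opcode, int vex_w,
                    char *out, size_t out_size) {
    const char *order;
    char format;

    switch (opcode & 0xF0) {          /* high nibble: operand order */
    case 0x90: order = "132"; break;
    case 0xA0: order = "213"; break;
    case 0xB0: order = "231"; break;
    default:   return -1;
    }
    switch (opcode & 0x0F) {          /* low nibble: packed or scalar */
    case 0x8: format = 'P'; break;
    case 0x9: format = 'S'; break;
    default:  return -1;
    }
    /* W1 selects double precision, W0 single precision. */
    snprintf(out, out_size, "VFMADD%s%c%c", order, format,
             vex_w ? 'D' : 'S');
    return 0;
}
```

Calling `vfmadd_mnemonic(0xB8, 1, buf, sizeof buf)` fills `buf` with `VFMADD231PD`, matching the `B8` rows with `W1` in the table.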
The corresponding FMA4 instructions take four operands:
Mnemonic (AT&T) | Operands | Operation |
---|---|---|
VFMADDPDx | xmm, xmm, xmm/m128, xmm/m128 | a = b·c + d |
VFMADDPDy | ymm, ymm, ymm/m256, ymm/m256 | a = b·c + d |
VFMADDPSx | xmm, xmm, xmm/m128, xmm/m128 | a = b·c + d |
VFMADDPSy | ymm, ymm, ymm/m256, ymm/m256 | a = b·c + d |
VFMADDSD | xmm, xmm, xmm/m64, xmm/m64 | a = b·c + d |
VFMADDSS | xmm, xmm, xmm/m32, xmm/m32 | a = b·c + d |
The incompatibility between Intel's FMA3 and AMD's FMA4 is due to both companies changing plans without coordinating coding details with each other. AMD changed their plans from FMA3 to FMA4 while Intel changed their plans from FMA4 to FMA3 at almost the same time.
Different compilers provide different levels of support for FMA.