

# AMD x86-64 Architecture Programmer's Manual Volume 4: 128-Bit Media Instructions

| Publication No. | Revision | Date        |
|-----------------|----------|-------------|
| 26568           | 3.03     | August 2002 |

© 2002 Advanced Micro Devices, Inc. All rights reserved.

The contents of this document are provided in connection with Advanced Micro Devices, Inc. ("AMD") products. AMD makes no representations or warranties with respect to the accuracy or completeness of the contents of this publication and reserves the right to make changes to specifications and product descriptions at any time without notice. No license, whether express, implied, arising by estoppel or otherwise, to any intellectual property rights is granted by this publication. Except as set forth in AMD's Standard Terms and Conditions of Sale, AMD assumes no liability whatsoever, and disclaims any express or implied warranty, relating to its products including, but not limited to, the implied warranty of merchantability, fitness for a particular purpose, or infringement of any intellectual property right.

AMD's products are not designed, intended, authorized or warranted for use as components in systems intended for surgical implant into the body, or in other applications intended to support or sustain life, or in any other application in which the failure of AMD's product could create a situation where personal injury, death, or severe property or environmental damage may occur. AMD reserves the right to discontinue or make changes to its products at any time without notice.

#### Trademarks

AMD, the AMD arrow logo, AMD Athlon, AMD Duron, and combinations thereof, and 3DNow! are trademarks, and Am486, Am5<sub>x</sub>86, and AMD-K6 are registered trademarks of Advanced Micro Devices, Inc.

MMX is a trademark and Pentium is a registered trademark of Intel Corporation.

Windows NT is a registered trademark of Microsoft Corp.

Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.

## Contents

- Figures
- Tables
- Preface
  - About This Book
  - Audience
  - Organization
  - Definitions
  - Related Documents
- 1 128-Bit Media Instruction Reference
  - ADDPD
  - ADDPS
  - ADDSD
  - ADDSS
  - ANDNPD
  - ANDNPS
  - ANDPD
  - ANDPS
  - CMPPD
  - CMPPS
  - CMPSD
  - CMPSS
  - COMISD
  - COMISS
  - CVTDQ2PD
  - CVTDQ2PS
  - CVTPD2DQ
  - CVTPD2PI
  - CVTPD2PS
  - CVTPI2PD
  - CVTPI2PS
  - CVTPS2DQ
  - CVTPS2PD
  - CVTPS2PI
  - CVTSD2SI
  - CVTSD2SS
  - CVTSI2SD
  - CVTSI2SS
  - CVTSS2SD
  - CVTSS2SI
  - CVTTPD2DQ
  - CVTTPD2PI
  - CVTTPS2DQ
  - CVTTPS2PI
  - CVTTSD2SI
  - CVTTSS2SI
  - DIVPD
  - DIVPS
  - DIVSD
  - DIVSS
  - FXRSTOR
  - FXSAVE
  - LDMXCSR
  - MASKMOVDQU
  - MAXPD
  - MAXPS
  - MAXSD
  - MAXSS
  - MINPD
  - MINPS
  - MINSD
  - MINSS
  - MOVAPD
  - MOVAPS
  - MOVD
  - MOVDQ2Q
  - MOVDQA
  - MOVDQU
  - MOVHLPS
  - MOVHPD
  - MOVHPS
  - MOVLHPS
  - MOVLPD
  - MOVLPS
  - MOVMSKPD
  - MOVMSKPS
  - MOVNTDQ
  - MOVNTPD
  - MOVNTPS
  - MOVQ
  - MOVQ2DQ
  - MOVSD
  - MOVSS
  - MOVUPD
  - MOVUPS
  - MULPD
  - MULPS
  - MULSD
  - MULSS
  - ORPD
  - ORPS
  - PACKSSDW
  - PACKSSWB
  - PACKUSWB
  - PADDB
  - PADDD
  - PADDQ
  - PADDSB
  - PADDSW
  - PADDUSB
  - PADDUSW
  - PADDW
  - PAND
  - PANDN
  - PAVGB
  - PAVGW
  - PCMPEQB
  - PCMPEQD
  - PCMPEQW
  - PCMPGTB
  - PCMPGTD
  - PCMPGTW
  - PEXTRW
  - PINSRW
  - PMADDWD
  - PMAXSW
  - PMAXUB
  - PMINSW
  - PMINUB
  - PMOVMSKB
  - PMULHUW
  - PMULHW
  - PMULLW
  - PMULUDQ
  - POR
  - PSADBW
  - PSHUFD
  - PSHUFHW
  - PSHUFLW
  - PSLLD
  - PSLLDQ
  - PSLLQ
  - PSLLW
  - PSRAD
  - PSRAW
  - PSRLD
  - PSRLDQ
  - PSRLQ
  - PSRLW
  - PSUBB
  - PSUBD
  - PSUBQ
  - PSUBSB
  - PSUBSW
  - PSUBUSB
  - PSUBUSW
  - PSUBW
  - PUNPCKHBW
  - PUNPCKHDQ
  - PUNPCKHQDQ
  - PUNPCKHWD
  - PUNPCKLBW
  - PUNPCKLDQ
  - PUNPCKLQDQ
  - PUNPCKLWD
  - PXOR
  - RCPPS
  - RCPSS
  - RSQRTPS
  - RSQRTSS
  - SHUFPD
  - SHUFPS
  - SQRTPD
  - SQRTPS
  - SQRTSD
  - SQRTSS
  - STMXCSR
  - SUBPD
  - SUBPS
  - SUBSD
  - SUBSS
  - UCOMISD
  - UCOMISS
  - UNPCKHPD
  - UNPCKHPS
  - UNPCKLPD
  - UNPCKLPS
  - XORPD
  - XORPS
- Index

# Figures

Figure 1-1. Diagram Conventions for 128-Bit Media Instructions

AMD 64-Bit Technology

26568–Rev. 3.02–August 2002

# Tables

| Table      | Title                                                  |
|------------|--------------------------------------------------------|
| Table 1-1. | Immediate Operand Values for Compare Operations        |
| Table 1-2. | Immediate-Byte Operand Encoding for 128-Bit PEXTRW     |
| Table 1-3. | Immediate-Byte Operand Encoding for 128-Bit PINSRW     |
| Table 1-4. | Immediate-Byte Operand Encoding for PSHUFD             |
| Table 1-5. | Immediate-Byte Operand Encoding for PSHUFHW            |
| Table 1-6. | Immediate-Byte Operand Encoding for PSHUFLW            |
| Table 1-7. | Immediate-Byte Operand Encoding for SHUFPD             |
| Table 1-8. | Immediate-Byte Operand Encoding for SHUFPS             |

# Preface

## About This Book

This book is part of a multivolume work entitled the AMD x86-64 Architecture Programmer's Manual. The following table lists each volume and its order number.

| Title                                                      | Order No. |
|------------------------------------------------------------|-----------|
| Volume 1, Application Programming                          | 24592     |
| Volume 2, System Programming                               | 24593     |
| Volume 3, General-Purpose and System Instructions          | 24594     |
| Volume 4, 128-Bit Media Instructions                       | 26568     |
| Volume 5, 64-Bit Media and x87 Floating-Point Instructions | 26569     |

## Audience

This volume (Volume 4) is intended for all programmers writing application or system software for processors that implement the x86-64 architecture.

## Organization

Volumes 3, 4, and 5 describe the x86-64 architecture's instruction set in detail. Together, they cover each instruction's mnemonic syntax, opcodes, functions, affected flags, and possible exceptions.

The x86-64 instruction set is divided into five subsets:

- General-purpose instructions
- System instructions
- 128-bit media instructions
- 64-bit media instructions
- x87 floating-point instructions

Several instructions belong to—and are described identically in—multiple instruction subsets.

This volume describes the 128-bit media instructions. The index at the end cross-references topics within this volume. For other topics relating to the x86-64 architecture, and for information on instructions in other subsets, see the tables of contents and indexes of the other volumes.

## Definitions

Many of the following definitions assume an in-depth knowledge of the legacy x86 architecture. See "Related Documents" for descriptions of the legacy x86 architecture.

### Terms and Notation

In addition to the notation described below, "Opcode-Syntax Notation" in Volume 3 describes notation relating specifically to opcodes.

#### 1011b

A binary value—in this example, a 4-bit value.

#### F0EAh

A hexadecimal value—in this example a 2-byte value.

#### [1,2)

A range that includes the left-most value (in this case, 1) but excludes the right-most value (in this case, 2).

#### 7–4

A bit range, from bit 7 to 4, inclusive. The high-order bit is shown first.
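The numeric notation above maps directly onto ordinary integer operations. The following Python sketch is illustrative only; the helper name `bit_range` is an assumption of this example and is not defined by the architecture.

```python
def bit_range(value: int, high: int, low: int) -> int:
    """Return bits high..low of value, inclusive, high-order bit first."""
    if high < low:
        raise ValueError("high must be greater than or equal to low")
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


binary_value = 0b1011   # written 1011b in this manual (a 4-bit value)
hex_value = 0xF0EA      # written F0EAh in this manual (a 2-byte value)

# Bits 7-4 of F0EAh are the upper half of the low-order byte EAh.
upper_nibble = bit_range(hex_value, 7, 4)

# The half-open range [1,2) includes 1 but excludes 2.
half_open = list(range(1, 2))
```

Python's `range(1, 2)` happens to use the same half-open convention as the `[1,2)` notation, which is why it serves as the illustration here.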

#### 128-bit media instructions

Instructions that use the 128-bit XMM registers. These are a combination of the SSE and SSE2 instruction sets.

#### 64-bit media instructions

Instructions that use the 64-bit MMX<sup>TM</sup> registers. These are primarily a combination of MMX and 3DNow!<sup>TM</sup> instruction sets, with some additional instructions from the SSE and SSE2 instruction sets.

#### 16-bit mode

Legacy mode or compatibility mode in which a 16-bit address size is active. See *legacy mode* and *compatibility mode*.

#### 32-bit mode

Legacy mode or compatibility mode in which a 32-bit address size is active. See *legacy mode* and *compatibility mode*.

#### 64-bit mode

A submode of *long mode*. In 64-bit mode, the default address size is 64 bits and new features, such as register extensions, are supported for system and application software.

#### #GP(0)

Notation indicating a general-protection exception (#GP) with error code of 0.

#### absolute

Said of a displacement that references the base of a code segment rather than an instruction pointer. Contrast with *relative*.

#### biased exponent

The sum of a floating-point value's exponent and a constant bias for a particular floating-point data type. The bias makes the range of the biased exponent always positive, which allows reciprocation without overflow.
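As a worked illustration, the sketch below extracts the biased exponent field from IEEE 754 single-precision and double-precision values. The bias constants (127 and 1023) and field positions come from the IEEE 754 formats; the function names are assumptions of this example.

```python
import struct

SINGLE_BIAS = 127    # exponent bias of the IEEE 754 single-precision format
DOUBLE_BIAS = 1023   # exponent bias of the IEEE 754 double-precision format


def biased_exponent_single(x: float) -> int:
    """Return the 8-bit biased exponent field of x encoded as single precision."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return (bits >> 23) & 0xFF


def biased_exponent_double(x: float) -> int:
    """Return the 11-bit biased exponent field of x encoded as double precision."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    return (bits >> 52) & 0x7FF


# 0.5 is 1.0 x 2^-1, so its true exponent is -1 and its biased exponent is bias - 1.
true_exponent = biased_exponent_double(0.5) - DOUBLE_BIAS
```

Subtracting the bias recovers the true exponent, so the stored field is never negative even when the true exponent is.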

#### byte

Eight bits.

#### clear

To write a bit value of 0. Compare *set*.

#### compatibility mode

A submode of *long mode*. In compatibility mode, the default address size is 32 bits, and legacy 16-bit and 32-bit applications run without modification.

#### commit

To irreversibly write, in program order, an instruction's result to software-visible storage, such as a register (including flags), the data cache, an internal write buffer, or memory.

#### CPL

Current privilege level.

#### CR0–CR4

A register range, from register CR0 through CR4, inclusive, with the low-order register first.

#### CR0.PE = 1

Notation indicating that the PE bit of the CR0 register has a value of 1.

#### direct

Referencing a memory location whose address is included in the instruction's syntax as an immediate operand. The address may be an absolute or relative address. Compare *indirect*.

#### dirty data

Data held in the processor's caches or internal buffers that is more recent than the copy held in main memory.

#### displacement

A signed value that is added to the base of a segment (absolute addressing) or an instruction pointer (relative addressing). Same as *offset*.

#### doubleword

Two words, or four bytes, or 32 bits.

#### double quadword

Eight words, or 16 bytes, or 128 bits. Also called octword.

#### DS:rSI

The contents of a memory location whose segment address is in the DS register and whose offset relative to that segment is in the rSI register.

#### EFER.LME = 0

Notation indicating that the LME bit of the EFER register has a value of 0.

#### effective address size

The address size for the current instruction after accounting for the default address size and any address-size override prefix.

#### effective operand size

The operand size for the current instruction after accounting for the default operand size and any operand-size override prefix.
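The two definitions above can be sketched as small functions. This is a simplified model, not a full description: the prefix names (66h operand-size override, 67h address-size override, REX.W) are detailed in other volumes of this manual rather than in this section, the function names are hypothetical, and instructions whose default operand size is 64 bits in 64-bit mode are deliberately left out.

```python
def effective_operand_size(default_size: int,
                           opsize_prefix: bool = False,
                           rex_w: bool = False,
                           in_64bit_mode: bool = False) -> int:
    """Return the effective operand size in bits (simplified model)."""
    if default_size not in (16, 32):
        raise ValueError("this sketch handles 16-bit and 32-bit defaults only")
    if in_64bit_mode and rex_w:
        return 64                      # REX.W takes precedence over the 66h prefix
    if opsize_prefix:
        return 32 if default_size == 16 else 16   # 66h toggles between 16 and 32
    return default_size


def effective_address_size(default_size: int,
                           addrsize_prefix: bool = False) -> int:
    """Return the effective address size in bits (simplified model)."""
    if default_size not in (16, 32, 64):
        raise ValueError("default address size must be 16, 32, or 64")
    if not addrsize_prefix:
        return default_size
    # 67h toggles 16 <-> 32; in 64-bit mode it selects 32-bit addressing.
    return {16: 32, 32: 16, 64: 32}[default_size]
```

For example, `effective_operand_size(32, opsize_prefix=True)` models a 32-bit code segment in which a 66h prefix selects 16-bit operands.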

#### element

See vector.

#### exception

An abnormal condition that occurs as the result of executing an instruction. The processor's response to an exception depends on the type of the exception. For all exceptions except 128-bit media SIMD floating-point exceptions and x87 floating-point exceptions, control is transferred to the handler (or service routine) for that exception, as defined by the exception's vector. For floating-point exceptions defined by the IEEE 754 standard, there are both masked and unmasked responses. When unmasked, the exception handler is called, and when masked, a default response is provided instead of calling the handler.

#### FF /0

Notation indicating that FF is the first byte of an opcode, and a subfield in the second byte has a value of 0.

#### flush

An often ambiguous term meaning (1) writeback, if modified, and invalidate, as in "flush the cache line," or (2) invalidate, as in "flush the pipeline," or (3) change a value, as in "flush to zero."

#### GDT

Global descriptor table.

#### IDT

Interrupt descriptor table.

#### IGN

Ignore. Field is ignored.

#### indirect

Referencing a memory location whose address is in a register or other memory location. The address may be an absolute or relative address. Compare *direct*.

#### IRB

The virtual-8086 mode interrupt-redirection bitmap.

#### IST

The long-mode interrupt-stack table.

#### IVT

The real-address mode interrupt-vector table.

#### LDT

Local descriptor table.

#### legacy x86

The legacy x86 architecture. See "Related Documents" for descriptions of the legacy x86 architecture.

#### legacy mode

An operating mode of the x86-64 architecture in which existing 16-bit and 32-bit applications and operating systems run without modification. A processor implementation of the x86-64 architecture can run in either *long mode* or *legacy mode*. Legacy mode has three submodes, *real mode*, *protected mode*, and *virtual-8086 mode*.

#### long mode

An operating mode unique to the x86-64 architecture. A processor implementation of the x86-64 architecture can run in either *long mode* or *legacy mode*. Long mode has two submodes, 64-bit mode and compatibility mode.

#### lsb

Least-significant bit.

#### LSB

Least-significant byte.

#### main memory

Physical memory, such as RAM and ROM (but not cache memory) that is installed in a particular computer system.

#### mask

(1) A control bit that prevents the occurrence of a floating-point exception from invoking an exception-handling routine. (2) A field of bits used for a control purpose.

#### MBZ

Must be zero. If software attempts to set an MBZ bit to 1, a general-protection exception (#GP) occurs.

#### memory

Unless otherwise specified, main memory.

#### ModRM

A byte following an instruction opcode that specifies address calculation based on mode (Mod), register (R), and memory (M) variables.
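A minimal sketch of how the ModRM byte splits into its three fields is shown below. The field positions (mod in bits 7–6, reg in bits 5–3, r/m in bits 2–0) follow the standard x86 encoding; the `ModRM` class and `decode_modrm` function are hypothetical names used only for illustration. In the `FF /0` notation defined earlier, the `/0` refers to the reg field of this byte.

```python
from typing import NamedTuple


class ModRM(NamedTuple):
    mod: int   # bits 7-6: addressing mode
    reg: int   # bits 5-3: register, or opcode extension as in "FF /0"
    rm: int    # bits 2-0: register or memory operand selector


def decode_modrm(byte: int) -> ModRM:
    """Split a ModRM byte into its mod, reg, and r/m fields."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("ModRM must be a single byte")
    return ModRM(mod=(byte >> 6) & 0b11,
                 reg=(byte >> 3) & 0b111,
                 rm=byte & 0b111)
```

For instance, `decode_modrm(0xC1)` yields mod = 3 (register-direct), reg = 0, and r/m = 1.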

#### moffset

A direct memory offset. In other words, a displacement that is added to the base of a code segment (for absolute addressing) or to an instruction pointer (for addressing relative to the instruction pointer, as in RIP-relative addressing).

#### msb

Most-significant bit.

#### MSB

Most-significant byte.

#### multimedia instructions

A combination of 128-bit media instructions and 64-bit media instructions.

#### octword

Same as *double quadword*.

#### offset

Same as *displacement*.

#### overflow

The condition in which a floating-point number is larger in magnitude than the largest, finite, positive or negative number that can be represented in the data-type format being used.

#### packed

See vector.

#### PAE

Physical-address extensions.

#### physical memory

Actual memory, consisting of main memory and cache.

#### probe

A check for an address in a processor's caches or internal buffers. *External probes* originate outside the processor, and *internal probes* originate within the processor.

#### protected mode

A submode of *legacy mode*.

#### quadword

Four words, or eight bytes, or 64 bits.

#### RAZ

Read as zero (0), regardless of what is written.

#### real-address mode

See real mode.

#### real mode

A short name for *real-address mode*, a submode of *legacy mode*.

#### relative

Referencing with a displacement (also called offset) from an instruction pointer rather than the base of a code segment. Contrast with *absolute*.

#### REX

An instruction prefix that specifies a 64-bit operand size and provides access to additional registers.

#### RIP-relative addressing

Addressing relative to the 64-bit RIP instruction pointer. Compare *moffset*.

#### set

To write a bit value of 1. Compare *clear*.

#### SIB

A byte following an instruction opcode that specifies address calculation based on scale (S), index (I), and base (B).
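The scale-index-base calculation can be sketched as follows. This is a simplified illustration: it ignores the special encodings (such as the index value that means "no index") and the register extensions available in 64-bit mode, and the names `SIB`, `decode_sib`, and `scaled_address` are assumptions of this example.

```python
from typing import NamedTuple, Sequence


class SIB(NamedTuple):
    scale: int   # multiplier 1, 2, 4, or 8, decoded from bits 7-6
    index: int   # bits 5-3: index register number
    base: int    # bits 2-0: base register number


def decode_sib(byte: int) -> SIB:
    """Split an SIB byte into scale factor, index, and base fields."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("SIB must be a single byte")
    return SIB(scale=1 << ((byte >> 6) & 0b11),
               index=(byte >> 3) & 0b111,
               base=byte & 0b111)


def scaled_address(sib: SIB, regs: Sequence[int], displacement: int = 0) -> int:
    """Compute base + index * scale + displacement from register values."""
    return regs[sib.base] + regs[sib.index] * sib.scale + displacement
```

With `regs[0] = 0x1000` and `regs[1] = 3`, the byte `0x88` (scale 4, index 1, base 0) and a displacement of 8 give the address `0x1014`.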

#### SIMD

Single instruction, multiple data. See vector.

#### SSE

Streaming SIMD extensions instruction set. See 128-bit media instructions and 64-bit media instructions.

#### SSE2

Extensions to the SSE instruction set. See 128-bit media instructions and 64-bit media instructions.

#### sticky bit

A bit that is set or cleared by hardware and that remains in that state until explicitly changed by software.

#### TOP

The x87 top-of-stack pointer.

#### TPR

Task-priority register (CR8).

#### TSS

Task-state segment.

#### underflow

The condition in which a floating-point number is smaller in magnitude than the smallest nonzero, positive or negative number that can be represented in the data-type format being used.

#### vector

(1) A set of integer or floating-point values, called *elements*, that are packed into a single operand. Most of the 128-bit and 64-bit media instructions use vectors as operands. Vectors are also called *packed* or *SIMD* (single-instruction multiple-data) operands.

(2) An index into an interrupt descriptor table (IDT), used to access exception handlers. Compare *exception*.
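To make the first sense of *vector* concrete, the sketch below packs several fixed-width elements into a single integer standing in for a 128-bit operand, with element 0 in the low-order bits. The helper names are hypothetical, and a Python integer is used here only as a model of an XMM register.

```python
from typing import List, Sequence


def pack_elements(elements: Sequence[int], width: int) -> int:
    """Pack unsigned elements of the given bit width; element 0 is lowest."""
    mask = (1 << width) - 1
    value = 0
    for i, element in enumerate(elements):
        value |= (element & mask) << (i * width)
    return value


def unpack_elements(value: int, width: int, count: int) -> List[int]:
    """Split a packed value back into count elements of the given bit width."""
    mask = (1 << width) - 1
    return [(value >> (i * width)) & mask for i in range(count)]


# Eight 16-bit words fill one 128-bit operand.
words = [1, 2, 3, 4, 5, 6, 7, 8]
packed = pack_elements(words, 16)
```

The same helpers model sixteen bytes, four doublewords, or two quadwords by changing `width` and `count`.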

#### virtual-8086 mode

A submode of *legacy mode*.

#### word

Two bytes, or 16 bits.

#### x86

See *legacy x86*.

### Registers

In the following list of registers, the names are used to refer either to a given register or to the contents of that register:

#### AH–DH

The high 8-bit AH, BH, CH, and DH registers. Compare *AL–DL*.

#### AL–DL

The low 8-bit AL, BL, CL, and DL registers. Compare *AH–DH*.

#### AL–r15B

The low 8-bit AL, BL, CL, DL, SIL, DIL, BPL, SPL, and R8B–R15B registers, available in 64-bit mode.

#### BP

Base pointer register.

#### CR*n*

Control register number *n*.

#### CS

Code segment register.

#### eAX–eSP

The 16-bit AX, BX, CX, DX, DI, SI, BP, and SP registers or the 32-bit EAX, EBX, ECX, EDX, EDI, ESI, EBP, and ESP registers. Compare *rAX–rSP*.

#### EBP

Extended base pointer register.

#### EFER

Extended features enable register.

#### eFLAGS

16-bit or 32-bit flags register. Compare *rFLAGS*.

#### EFLAGS

32-bit (extended) flags register.

#### eIP

16-bit or 32-bit instruction-pointer register. Compare rIP.

#### EIP

32-bit (extended) instruction-pointer register.

#### FLAGS

16-bit flags register.

#### GDTR

Global descriptor table register.

#### GPRs

General-purpose registers. For the 16-bit data size, these are AX, BX, CX, DX, DI, SI, BP, and SP. For the 32-bit data size, these are EAX, EBX, ECX, EDX, EDI, ESI, EBP, and ESP. For the 64-bit data size, these include RAX, RBX, RCX, RDX, RDI, RSI, RBP, RSP, and R8–R15.

#### IDTR

Interrupt descriptor table register.

#### IP

16-bit instruction-pointer register.

#### LDTR

Local descriptor table register.

#### MSR

Model-specific register.

#### r8–r15

The 8-bit R8B–R15B registers, or the 16-bit R8W–R15W registers, or the 32-bit R8D–R15D registers, or the 64-bit R8–R15 registers.

#### rAX–rSP

The 16-bit AX, BX, CX, DX, DI, SI, BP, and SP registers, or the 32-bit EAX, EBX, ECX, EDX, EDI, ESI, EBP, and ESP registers, or the 64-bit RAX, RBX, RCX, RDX, RDI, RSI, RBP, and RSP registers. Replace the placeholder *r* with nothing for 16-bit size, "E" for 32-bit size, or "R" for 64-bit size.
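The placeholder substitution can be expressed as a short helper. This is only a sketch of the naming rule stated above; the function name `expand_register` is hypothetical, and the r8–r15 registers are not handled because they follow a different naming scheme.

```python
SIZE_PREFIX = {16: "", 32: "E", 64: "R"}


def expand_register(placeholder_name: str, size: int) -> str:
    """Expand a name such as 'rAX' to AX, EAX, or RAX for the given size."""
    if not placeholder_name.startswith("r") or len(placeholder_name) < 3:
        raise ValueError("expected a placeholder name such as 'rAX'")
    if size not in SIZE_PREFIX:
        raise ValueError("size must be 16, 32, or 64")
    return SIZE_PREFIX[size] + placeholder_name[1:]
```

For example, `expand_register("rSP", 64)` gives `"RSP"`, and `expand_register("rAX", 16)` gives `"AX"`.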

#### RAX

64-bit version of the EAX register.

#### RBP

64-bit version of the EBP register.

#### RBX

64-bit version of the EBX register.

#### RCX

64-bit version of the ECX register.

#### RDI

64-bit version of the EDI register.

#### RDX

64-bit version of the EDX register.

#### rFLAGS

16-bit, 32-bit, or 64-bit flags register. Compare RFLAGS.

#### RFLAGS

64-bit flags register. Compare rFLAGS.

#### rIP

16-bit, 32-bit, or 64-bit instruction-pointer register. Compare *RIP*.

#### RIP

64-bit instruction-pointer register.

#### RSI

64-bit version of the ESI register.

#### RSP

64-bit version of the ESP register.

#### SP

Stack pointer register.

#### SS

Stack segment register.

#### TPR

Task priority register, a new register introduced in the x86-64 architecture to speed interrupt management.

#### TR

Task register.

### Endian Order

The x86 and x86-64 architectures address memory using little-endian byte-ordering. Multibyte values are stored with their least-significant byte at the lowest byte address, and they are illustrated with their least-significant byte at the right side. Strings are illustrated in reverse order, because the addresses of their bytes increase from right to left.
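A brief sketch of little-endian storage, using Python's built-in integer conversions, is shown below. The example value `0x12345678` and the variable names are arbitrary choices for illustration.

```python
value = 0x12345678

# Bytes in order of increasing memory address: least-significant byte first.
stored = value.to_bytes(4, "little")

# Reading the same bytes back with little-endian ordering recovers the value.
restored = int.from_bytes(stored, "little")

# Illustration order used in this manual: least-significant byte at the right,
# so the bytes are drawn from the highest address down to the lowest.
illustrated = " ".join(f"{b:02X}" for b in reversed(stored))
```

Here `stored[0]` is `0x78`, the least-significant byte, because it occupies the lowest address.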

## Related Documents

- Peter Abel, *IBM PC Assembly Language and Programming*, Prentice-Hall, Englewood Cliffs, NJ, 1995.
- Rakesh Agarwal, 80x86 Architecture & Programming: Volume II, Prentice-Hall, Englewood Cliffs, NJ, 1991.
- AMD, AMD-K6<sup>TM</sup> MMX<sup>TM</sup> Enhanced Processor Multimedia *Technology*, Sunnyvale, CA, 2000.
- AMD, *3DNow!*<sup>TM</sup> *Technology Manual*, Sunnyvale, CA, 2000.
- AMD, AMD Extensions to the 3DNow!<sup>TM</sup> and MMX<sup>TM</sup> Instruction Sets, Sunnyvale, CA, 2000.
- Don Anderson and Tom Shanley, *Pentium Processor System Architecture*, Addison-Wesley, New York, 1995.
- Nabajyoti Barkakati and Randall Hyde, *Microsoft Macro Assembler Bible*, Sams, Carmel, Indiana, 1992.
- Barry B. Brey, 8086/8088, 80286, 80386, and 80486 Assembly Language Programming, Macmillan Publishing Co., New York, 1994.
- Barry B. Brey, Programming the 80286, 80386, 80486, and Pentium Based Personal Computer, Prentice-Hall, Englewood Cliffs, NJ, 1995.
- Ralf Brown and Jim Kyle, *PC Interrupts*, Addison-Wesley, New York, 1994.
- Penn Brumm and Don Brumm, 80386/80486 Assembly Language Programming, Windcrest McGraw-Hill, 1993.
- Geoff Chappell, *DOS Internals*, Addison-Wesley, New York, 1994.

- Chips and Technologies, Inc. Super386 DX Programmer's Reference Manual, Chips and Technologies, Inc., San Jose, 1992.
- John Crawford and Patrick Gelsinger, *Programming the* 80386, Sybex, San Francisco, 1987.
- Cyrix Corporation, 5x86 Processor BIOS Writer's Guide, Cyrix Corporation, Richardson, TX, 1995.
- Cyrix Corporation, *M1 Processor Data Book*, Cyrix Corporation, Richardson, TX, 1996.
- Cyrix Corporation, MX Processor MMX Extension Opcode Table, Cyrix Corporation, Richardson, TX, 1996.
- Cyrix Corporation, *MX Processor Data Book*, Cyrix Corporation, Richardson, TX, 1997.
- Jeffrey P. Doyer, *Introduction to Protected Mode Programming*, course materials for an onsite class, 1992.
- Ray Duncan, Extending DOS: A Programmer's Guide to Protected-Mode DOS, Addison Wesley, NY, 1991.
- William B. Giles, *Assembly Language Programming for the Intel 80xxx Family*, Macmillan, New York, 1991.
- Frank van Gilluwe, *The Undocumented PC*, Addison-Wesley, New York, 1994.
- John L. Hennessy and David A. Patterson, *Computer Architecture*, Morgan Kaufmann Publishers, San Mateo, CA, 1996.
- Thom Hogan, *The Programmer's PC Sourcebook*, Microsoft Press, Redmond, WA, 1991.
- Hal Katircioglu, *Inside the 486, Pentium, and Pentium Pro*, Peer-to-Peer Communications, Menlo Park, CA, 1997.
- IBM Corporation, *486SLC Microprocessor Data Sheet*, IBM Corporation, Essex Junction, VT, 1993.
- IBM Corporation, 486SLC2 Microprocessor Data Sheet, IBM Corporation, Essex Junction, VT, 1993.
- IBM Corporation, 80486DX2 Processor Floating Point Instructions, IBM Corporation, Essex Junction, VT, 1995.
- IBM Corporation, 80486DX2 Processor BIOS Writer's Guide, IBM Corporation, Essex Junction, VT, 1995.
- IBM Corporation, *Blue Lightening 486DX2 Data Book*, IBM Corporation, Essex Junction, VT, 1994.

- Institute of Electrical and Electronics Engineers, *IEEE Standard for Binary Floating-Point Arithmetic*, ANSI/IEEE Std 754-1985.
- Institute of Electrical and Electronics Engineers, *IEEE Standard for Radix-Independent Floating-Point Arithmetic*, ANSI/IEEE Std 854-1987.
- Muhammad Ali Mazidi and Janice Gillispie Mazidi, *80X86 IBM PC and Compatible Computers*, Prentice-Hall, Englewood Cliffs, NJ, 1997.
- Hans-Peter Messmer, *The Indispensable Pentium Book*, Addison-Wesley, New York, 1995.
- Karen Miller, *An Assembly Language Introduction to Computer Architecture: Using the Intel Pentium*, Oxford University Press, New York, 1999.
- Stephen Morse, Eric Isaacson, and Douglas Albert, *The 80386/387 Architecture*, John Wiley & Sons, New York, 1987.
- NexGen Inc., *Nx586 Processor Data Book*, NexGen Inc., Milpitas, CA, 1993.
- NexGen Inc., *Nx686 Processor Data Book*, NexGen Inc., Milpitas, CA, 1994.
- Bipin Patwardhan, *Introduction to the Streaming SIMD Extensions in the Pentium III*, www.x86.org/articles/sse\_pt1/simd1.htm, June, 2000.
- Peter Norton, Peter Aitken, and Richard Wilton, *PC Programmer's Bible*, Microsoft Press, Redmond, WA, 1993.
- *PharLap 386|ASM Reference Manual*, Pharlap, Cambridge MA, 1993.
- *PharLap TNT DOS-Extender Reference Manual*, Pharlap, Cambridge MA, 1995.
- Sen-Cuo Ro and Sheau-Chuen Her, *i386/i486 Advanced Programming*, Van Nostrand Reinhold, New York, 1993.
- Tom Shanley, *Protected Mode System Architecture*, Addison Wesley, NY, 1996.
- SGS-Thomson Corporation, *80486DX Processor SMM Programming Manual*, SGS-Thomson Corporation, 1995.
- Walter A. Triebel, *The 80386DX Microprocessor*, Prentice-Hall, Englewood Cliffs, NJ, 1992.
- John Wharton, *The Complete x86*, MicroDesign Resources, Sebastopol, California, 1994.

- Web sites and newsgroups:
  - www.amd.com
  - news.comp.arch
  - news.comp.lang.asm.x86
  - news.intel.microprocessors
  - news.microsoft

## 1 128-Bit Media Instruction Reference

This chapter describes the function, mnemonic syntax, opcodes, and affected flags of the 128-bit media instructions, along with the exceptions they can generate. These instructions load, store, or operate on data located in 128-bit XMM registers. Most of the instructions operate in parallel on sets of packed elements called *vectors*, although a few operate on scalars. These instructions define both integer and floating-point operations. They include the legacy SSE and SSE2 instructions.

Each instruction that performs a vector (packed) operation is illustrated with a diagram. Figure 1-1 on page 2 shows the conventions used in these diagrams. The particular diagram shows the PSLLW (packed shift left logical words) instruction. Arrowheads going *to* a source operand indicate the writing of the result. In this case, the result is written to the first source operand, which is also the destination operand.



#### Figure 1-1. Diagram Conventions for 128-Bit Media Instructions

Gray areas in diagrams indicate unmodified operand bits.

The 128-bit media instructions are useful in high-performance applications that operate on blocks of data. Because each instruction can independently and simultaneously perform a single operation on multiple elements of a vector, the instructions are classified as *single-instruction, multiple-data* (SIMD) instructions. A few 128-bit media instructions convert operands in XMM registers to operands in GPR, MMX<sup>TM</sup>, or x87 registers (or vice versa), or save or restore XMM state.

Hardware support for a specific 128-bit media instruction depends on the presence of at least one of the following CPUID functions:

- FXSAVE and FXRSTOR, indicated by bit 24 of CPUID standard function 1 and extended function 8000\_0001h.
- SSE, indicated by bit 25 of CPUID standard function 1.
- SSE2, indicated by bit 26 of CPUID standard function 1.

The 128-bit media instructions can be used in legacy mode or long mode. Their use in long mode is available if the following CPUID function is set:

- Long Mode, indicated by bit 29 of CPUID extended function 8000\_0001h.
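The feature bits listed above can be tested with a few shifts and masks. The following is a minimal sketch, not part of the architecture definition: it assumes the caller has already executed CPUID and passes in the EDX values returned by standard function 1 and extended function 8000\_0001h. The names `media_support`, `test_bit`, and `decode_media_support` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions of the feature flags listed above (assumed to be in EDX). */
enum {
    CPUID_FXSR_BIT      = 24, /* FXSAVE and FXRSTOR */
    CPUID_SSE_BIT       = 25, /* SSE, standard function 1 */
    CPUID_SSE2_BIT      = 26, /* SSE2, standard function 1 */
    CPUID_LONG_MODE_BIT = 29  /* Long Mode, extended function 8000_0001h */
};

/* Hypothetical result type used only for this illustration. */
struct media_support {
    bool fxsr;
    bool sse;
    bool sse2;
    bool long_mode;
};

/* Returns true if the given bit of value is set. */
static bool test_bit(uint32_t value, unsigned bit)
{
    assert(bit < 32);
    return ((value >> bit) & 1u) != 0;
}

/* std_edx: EDX returned by CPUID standard function 1.
   ext_edx: EDX returned by CPUID extended function 8000_0001h. */
struct media_support decode_media_support(uint32_t std_edx, uint32_t ext_edx)
{
    struct media_support s;
    s.fxsr      = test_bit(std_edx, CPUID_FXSR_BIT);
    s.sse       = test_bit(std_edx, CPUID_SSE_BIT);
    s.sse2      = test_bit(std_edx, CPUID_SSE2_BIT);
    s.long_mode = test_bit(ext_edx, CPUID_LONG_MODE_BIT);
    return s;
}
```

How the two EDX values are obtained is left to the caller. With GCC or Clang, one option is the compiler-specific `__get_cpuid` helper from `<cpuid.h>`, which is not defined by this manual.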

Compilation of 128-bit media programs for execution in 64-bit mode offers four primary advantages: access to the eight extended XMM registers (for a register set consisting of XMM0-XMM15), access to the eight extended, 64-bit general-purpose registers (for a register set consisting of GPR0-GPR15), access to the 64-bit virtual address space, and access to the RIP-relative addressing mode.

For further information, see:

- "128-Bit Media and Scientific Programming" in Volume 1.
- "Summary of Registers and Data Types" in Volume 3.
- "Notation" in Volume 3.
- "Instruction Prefixes" in Volume 3.

## ADDPD

## **Add Packed Double-Precision Floating-Point**

Adds each packed double-precision floating-point value in the first source operand to the corresponding packed double-precision floating-point value in the second source operand and writes the result of each addition in the corresponding quadword of the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.
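The data flow described above can be illustrated with a scalar reference model in C. This is only a sketch: `xmm_pd` and `addpd_model` are hypothetical names, element 0 is assumed to be the low-order quadword, and the model uses the host's floating-point arithmetic, so MXCSR rounding control and exception reporting are not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an XMM register holding two doubles.
   q[0] is the low-order quadword, q[1] the high-order quadword. */
typedef struct { double q[2]; } xmm_pd;

/* Scalar model of ADDPD: dst is both the first source and the destination. */
void addpd_model(xmm_pd *dst, const xmm_pd *src)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < 2; i++) {
        /* Each quadword is added independently of the other. */
        dst->q[i] = dst->q[i] + src->q[i];
    }
}
```

Compilers that provide `<emmintrin.h>` expose the instruction itself through the `_mm_add_pd` intrinsic; the model is meant only to show which elements are combined.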



#### **Related Instructions**

ADDPS, ADDSD, ADDSS

#### **rFLAGS** Affected

None

#### **MXCSR Flags Affected**

| FZ | RC |    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |    |    |    |    |    |    |    |    |     | M  | M  | M  |    | M  | M  |
| 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

*Note:* A flag that can be set to one or zero is M (modified). Unaffected flags are blank.

#### Exceptions

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|-----------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD | X | X | X | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
| | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
| | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| | X | X | X | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| SIMD Floating-Point<br>Exception, #XF | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| SIMD Floating-Point Exceptions | | | | |
| Invalid-operation<br>exception (IE) | X | X | X | A source operand was an SNaN value. |
| | X | X | X | +infinity was added to -infinity. |
| Denormalized-operand<br>exception (DE) | X | X | X | A source operand was a denormal value. |
| Overflow exception (OE) | X | X | X | A rounded result was too large to fit into the format of the destination operand. |
| Underflow exception (UE) | X | X | X | A rounded result was too small to fit into the format of the destination operand. |
| Precision exception (PE) | X | X | X | A result could not be represented exactly in the destination format. |
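The relationship between the MXCSR bit layout and the #UD and #XF rows above can be sketched in C. This is a simplified illustration under stated assumptions: the names are hypothetical, exception priority and masked responses are not modeled, and the mask bits (IM through PM, bits 7-12) are taken to correspond in order to the flag bits (IE through PE, bits 0-5), as the bit numbering in the MXCSR table suggests.

```c
#include <assert.h>
#include <stdint.h>

/* MXCSR exception-flag bits, numbered as in the MXCSR table above. */
enum {
    MXCSR_IE = 1 << 0,
    MXCSR_DE = 1 << 1,
    MXCSR_ZE = 1 << 2,
    MXCSR_OE = 1 << 3,
    MXCSR_UE = 1 << 4,
    MXCSR_PE = 1 << 5
};

#define MXCSR_FLAG_BITS  0x3Fu /* bits 5-0: PE, UE, OE, ZE, DE, IE */
#define MXCSR_MASK_SHIFT 7     /* bits 12-7: PM, UM, OM, ZM, DM, IM */

/* Flags marked M (modified) for this instruction: PE, UE, OE, DE, IE. */
#define ADD_MODIFIABLE_FLAGS \
    (MXCSR_PE | MXCSR_UE | MXCSR_OE | MXCSR_DE | MXCSR_IE)

/* Hypothetical classification of what software observes. */
typedef enum { DELIVER_NONE, DELIVER_XF, DELIVER_UD } simd_fp_delivery;

/* raised:     exception flags (bits 5-0) detected for one instruction.
   mxcsr:      current MXCSR value; only the mask bits are consulted.
   osxmmexcpt: value of CR4.OSXMMEXCPT (0 or 1). */
simd_fp_delivery classify_simd_fp(uint32_t raised, uint32_t mxcsr, int osxmmexcpt)
{
    assert(osxmmexcpt == 0 || osxmmexcpt == 1);
    uint32_t masks    = (mxcsr >> MXCSR_MASK_SHIFT) & MXCSR_FLAG_BITS;
    uint32_t unmasked = raised & MXCSR_FLAG_BITS & ~masks;
    if (unmasked == 0)
        return DELIVER_NONE;               /* every raised exception is masked */
    return osxmmexcpt ? DELIVER_XF : DELIVER_UD;
}
```

For example, an invalid-operation condition with IM cleared is classified as #XF when CR4.OSXMMEXCPT = 1 and as #UD when it is 0, matching the two table rows.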

## ADDPS Add Packed Single-Precision Floating-Point

Adds each packed single-precision floating-point value in the first source operand to the corresponding packed single-precision floating-point value in the second source operand and writes the result of each addition in the corresponding doubleword of the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.
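As an illustration of the memory-operand form, the following C sketch models the four 32-bit element additions together with the 16-byte alignment requirement that appears in the exception table for this instruction. The names `xmm_ps` and `addps_mem_model` are hypothetical, a return value of -1 merely stands in for the general-protection exception, and MXCSR behavior is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an XMM register holding four floats.
   d[0] is the low-order doubleword. */
typedef struct { float d[4]; } xmm_ps;

/* Scalar model of ADDPS with a 128-bit memory source.
   Returns 0 on success, or -1 where the manual lists #GP for a
   memory operand that is not aligned on a 16-byte boundary. */
int addps_mem_model(xmm_ps *dst, const float *mem128)
{
    assert(dst != NULL && mem128 != NULL);
    if (((uintptr_t)mem128 & 15u) != 0)
        return -1;                          /* misaligned: dst is left unchanged */
    for (size_t i = 0; i < 4; i++)
        dst->d[i] = dst->d[i] + mem128[i];  /* four independent additions */
    return 0;
}
```

Returning before any element is written reflects the assumption that a faulting instruction does not update its destination.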




#### **Related Instructions**

ADDPD, ADDSD, ADDSS

#### rFLAGS Affected

None

#### **MXCSR Flags Affected**

| FZ | RC |    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |    |    |    |    |    |    |    |    |     | M  | M  | M  |    | M  | M  |
| 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

*Note:* A flag that can be set to one or zero is M (modified). Unaffected flags are blank.

#### Exceptions

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|-----------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD | X | X | X | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1. |
| | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
| | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| | X | X | X | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| SIMD Floating-Point<br>Exception, #XF | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| SIMD Floating-Point Exceptions | | | | |
| Invalid-operation<br>exception (IE) | X | X | X | A source operand was an SNaN value. |
| | X | X | X | +infinity was added to -infinity. |
| Denormalized-operand<br>exception (DE) | X | X | X | A source operand was a denormal value. |
| Overflow exception (OE) | X | X | X | A rounded result was too large to fit into the format of the destination operand. |
| Underflow exception (UE) | X | X | X | A rounded result was too small to fit into the format of the destination operand. |
| Precision exception (PE) | X | X | X | A result could not be represented exactly in the destination format. |

## ADDSD

## **Add Scalar Double-Precision Floating-Point**

Adds the double-precision floating-point value in the low-order quadword of the first source operand to the double-precision floating-point value in the low-order quadword of the second source operand and writes the result in the low-order quadword of the destination (first source). The high-order quadword of the destination is not modified. The first source/destination operand is an XMM register. The second source operand is another XMM register or 64-bit memory location.
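The difference from the packed form is that only the low-order quadword is written. A minimal C sketch of this behavior follows; `xmm_pd` and `addsd_model` are hypothetical names, the second source is represented only by its low-order double (which covers both the register and the 64-bit memory form), and MXCSR behavior is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an XMM register holding two doubles.
   q[0] is the low-order quadword, q[1] the high-order quadword. */
typedef struct { double q[2]; } xmm_pd;

/* Scalar model of ADDSD: src_low is the low-order quadword of the
   second source operand, or the 64-bit memory operand. */
void addsd_model(xmm_pd *dst, double src_low)
{
    assert(dst != NULL);
    dst->q[0] = dst->q[0] + src_low;  /* low-order quadword receives the sum */
    /* dst->q[1] is deliberately left untouched. */
}
```

Because the high-order quadword is preserved, unrelated data can remain in the upper half of the destination register across the operation.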





#### **Related Instructions**

ADDPD, ADDPS, ADDSS

#### **rFLAGS** Affected

None

#### **MXCSR Flags Affected**

| FZ | RC |    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |    |    |    |    |    |    |    |    |     | M  | M  | M  |    | M  | M  |
| 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

*Note:* A flag that can be set to one or zero is M (modified). Unaffected flags are blank.

#### Exceptions

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|-----------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD | X | X | X | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
| | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
| | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |
| SIMD Floating-Point<br>Exception, #XF | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| SIMD Floating-Point Exceptions | | | | |
| Invalid-operation<br>exception (IE) | X | X | X | A source operand was an SNaN value. |
| | X | X | X | +infinity was added to -infinity. |
| Denormalized-operand<br>exception (DE) | X | X | X | A source operand was a denormal value. |
| Overflow exception (OE) | X | X | X | A rounded result was too large to fit into the format of the destination operand. |
| Underflow exception (UE) | X | X | X | A rounded result was too small to fit into the format of the destination operand. |
| Precision exception (PE) | X | X | X | A result could not be represented exactly in the destination format. |

## ADDSS

## **Add Scalar Single-Precision Floating-Point**

Adds the single-precision floating-point value in the low-order doubleword of the first source operand to the single-precision floating-point value in the low-order doubleword of the second source operand and writes the result in the low-order doubleword of the destination (first source). The three high-order doublewords of the destination are not modified. The first source/destination operand is an XMM register. The second source operand is another XMM register or 32-bit memory location.

| Mnemonic               | Opcode             | Description                                                                                                                                                                        |
|------------------------|--------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| ADDSS xmm1, xmm2/mem32 | F3 0F 58 <i>/r</i> | Adds low-order single-precision floating-point values in an XMM register and another XMM register or 32-bit memory location and writes the result in the destination XMM register. |
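The `mem32` form in the table above reads a single 32-bit value and updates only the low-order doubleword of `xmm1`. The following C sketch models that behavior; `xmm_ps` and `addss_model` are hypothetical names, and MXCSR rounding and exception behavior are not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an XMM register holding four floats.
   d[0] is the low-order doubleword. */
typedef struct { float d[4]; } xmm_ps;

/* Scalar model of ADDSS xmm1, xmm2/mem32: src32 points either at the
   low-order doubleword of xmm2 or at the 32-bit memory operand. */
void addss_model(xmm_ps *xmm1, const float *src32)
{
    assert(xmm1 != NULL && src32 != NULL);
    xmm1->d[0] = xmm1->d[0] + *src32;  /* only the low-order doubleword changes */
    /* d[1], d[2], and d[3] are deliberately left untouched. */
}
```

The model applies no alignment test to `src32`, because the exception table for this instruction lists no 16-byte alignment requirement.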



#### **Related Instructions**

ADDPD, ADDPS, ADDSD

#### **rFLAGS** Affected

None

#### **MXCSR Flags Affected**

| FZ | RC |    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |    |    |    |    |    |    |    |    |     | M  | M  | M  |    | M  | M  |
| 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

*Note:* A flag that can be set to one or zero is M (modified). Unaffected flags are blank.

#### Exceptions

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|-----------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD | X | X | X | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1. |
| | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
| | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |
| SIMD Floating-Point<br>Exception, #XF | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| SIMD Floating-Point Exceptions | | | | |
| Invalid-operation<br>exception (IE) | X | X | X | A source operand was an SNaN value. |
| | X | X | X | +infinity was added to -infinity. |
| Denormalized-operand<br>exception (DE) | X | X | X | A source operand was a denormal value. |
| Overflow exception (OE) | X | X | X | A rounded result was too large to fit into the format of the destination operand. |
| Underflow exception (UE) | X | X | X | A rounded result was too small to fit into the format of the destination operand. |
| Precision exception (PE) | X | X | X | A result could not be represented exactly in the destination format. |

## ANDNPD Logical Bitwise AND NOT Packed Double-Precision Floating-Point

Performs a bitwise logical AND of the two packed double-precision floating-point values in the second source operand and the one's-complement of the corresponding two packed double-precision floating-point values in the first source operand and writes the result in the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.
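Note that the first source (the destination) is the operand that is complemented. The C sketch below models the operation on raw 64-bit patterns and shows one possible use: clearing the sign bits of two doubles to obtain their absolute values. The names `xmm_bits`, `andnpd_model`, and `abs_pd` are hypothetical, and the sketch assumes the host stores `double` in IEEE 754 64-bit format.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an XMM register viewed as two 64-bit patterns. */
typedef struct { uint64_t q[2]; } xmm_bits;

/* Bitwise model of ANDNPD: dst = (NOT dst) AND src, per quadword. */
void andnpd_model(xmm_bits *dst, const xmm_bits *src)
{
    assert(dst != NULL && src != NULL);
    for (int i = 0; i < 2; i++)
        dst->q[i] = ~dst->q[i] & src->q[i];
}

/* Example use (an illustration, not taken from the manual): with a
   sign-bit mask in the destination, the result is the source with
   both sign bits cleared, that is, the absolute values. */
void abs_pd(double v[2])
{
    xmm_bits mask = {{0x8000000000000000ull, 0x8000000000000000ull}};
    xmm_bits src;
    memcpy(src.q, v, sizeof src.q);   /* reinterpret the doubles as bit patterns */
    andnpd_model(&mask, &src);        /* mask now holds ~signmask & src */
    memcpy(v, mask.q, sizeof mask.q); /* reinterpret the result as doubles */
}
```

The operation is purely bitwise, so the model treats the operands as integers even though the mnemonic names them as floating-point values.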



#### **Related Instructions**

ANDNPS, ANDPD, ANDPS, ORPD, ORPS, XORPD, XORPS

#### rFLAGS Affected

None

#### **MXCSR Flags Affected**

None

#### Exceptions

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|-----------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD | X | X | X | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
| | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
| | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| | X | X | X | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |

# ANDNPS Logical Bitwise AND NOT Packed Single-Precision Floating-Point

Performs a bitwise logical AND of the four packed single-precision floating-point values in the second source operand and the one's-complement of the corresponding four packed single-precision floating-point values in the first source operand and writes the result in the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.

| Mnemonic                 | Opcode          | Description                                                                                                                                                                                                             |
|--------------------------|-----------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| ANDNPS xmm1, xmm2/mem128 | 0F 55 <i>/r</i> | Performs bitwise logical AND NOT of four packed single-precision floating-point values in an XMM register and in another XMM register or 128-bit memory location and writes the result in the destination XMM register. |



#### **Related Instructions**

ANDNPD, ANDPD, ANDPS, ORPD, ORPS, XORPD, XORPS

#### rFLAGS Affected

None

#### **MXCSR Flags Affected**

None

#### **Exceptions**

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                                                                                  |
|---------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1.                                                        |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                                                                           |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.                                                                   |
|                           | X    | X               | X         | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details.     |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                                                                       |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                                                                             |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                                                                                |
|                           |      |                 | X         | A null data segment was used to reference memory.                                                                                                   |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                                                                           |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction.                                                                                        |

# ANDPD Logical Bitwise AND Packed Double-Precision Floating-Point

Performs a bitwise logical AND of the two packed double-precision floating-point values in the first source operand and the corresponding two packed double-precision floating-point values in the second source operand and writes the result in the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.

| Mnemonic                | Opcode             | Description                                                                                                                                                                                                        |
|-------------------------|--------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| ANDPD xmm1, xmm2/mem128 | 66 0F 54 <i>/r</i> | Performs bitwise logical AND of two packed double-precision floating-point values in an XMM register and in another XMM register or 128-bit memory location and writes the result in the destination XMM register. |



#### **Related Instructions**

ANDNPD, ANDNPS, ANDPS, ORPD, ORPS, XORPD, XORPS

#### **rFLAGS** Affected

None

#### **MXCSR Flags Affected**

None

#### **Exceptions**

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                                                                                  |
|---------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1.                                                       |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                                                                           |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.                                                                   |
|                           | X    | X               | X         | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details.     |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                                                                       |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                                                                             |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                                                                                |
|                           |      |                 | X         | A null data segment was used to reference memory.                                                                                                   |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                                                                           |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction.                                                                                        |

# ANDPS Logical Bitwise AND Packed Single-Precision Floating-Point

Performs a bitwise logical AND of the four packed single-precision floating-point values in the first source operand and the corresponding four packed single-precision floating-point values in the second source operand and writes the result in the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.
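As an illustration of the four-lane operation, the following C sketch models the instruction on raw 32-bit patterns and uses it to isolate the exponent field of each value. The `xmm_ps` type and function names are hypothetical, the `0x7F800000` mask assumes the IEEE 754 single-precision layout, and the final shift is ordinary C rather than part of ANDPS.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of an XMM register viewed as four 32-bit lanes. */
typedef struct { uint32_t lane[4]; } xmm_ps;

/* Reinterpret a float as its raw 32-bit pattern. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    assert(sizeof u == sizeof f);   /* assumes a 32-bit float */
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Models ANDPS dst, src: dst = dst AND src for each 32-bit lane. */
static void andps_model(xmm_ps *dst, const xmm_ps *src)
{
    for (int i = 0; i < 4; i++)
        dst->lane[i] &= src->lane[i];
}

/* Example use: mask off everything except the 8 exponent bits,
 * then shift them down to obtain the biased exponent of each lane. */
static void biased_exponents(const float in[4], uint32_t out[4])
{
    xmm_ps dst, mask;

    for (int i = 0; i < 4; i++) {
        dst.lane[i]  = float_bits(in[i]);
        mask.lane[i] = 0x7F800000u;   /* exponent field, bits 30-23 */
    }
    andps_model(&dst, &mask);
    for (int i = 0; i < 4; i++)
        out[i] = dst.lane[i] >> 23;
}
```

Because the operation is purely bitwise, the values are never interpreted as floating-point numbers, which is consistent with the instruction leaving the MXCSR flags unaffected.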





#### **Related Instructions**

ANDNPD, ANDNPS, ANDPD, ORPD, ORPS, XORPD, XORPS

#### rFLAGS Affected

None

#### **MXCSR Flags Affected**

None

#### **Exceptions**

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                                                                                  |
|---------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1.                                                        |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                                                                           |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.                                                                   |
|                           | X    | X               | X         | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details.     |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                                                                       |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                                                                             |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                                                                                |
|                           |      |                 | X         | A null data segment was used to reference memory.                                                                                                   |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                                                                           |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction.                                                                                        |

## CMPPD Compare Packed Double-Precision Floating-Point

Compares each of the two packed double-precision floating-point values in the first source operand with the corresponding packed double-precision floating-point value in the second source operand and writes the result of each comparison in the corresponding 64 bits of the destination (first source). The type of comparison is specified by the three low-order bits of the immediate-byte operand, as shown in Table 1-1. The result of each compare is a 64-bit value of all 1s (TRUE) or all 0s (FALSE). The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.

| Mnemonic                      | Opcode                | Description                                                                                                                            |
|-------------------------------|-----------------------|----------------------------------------------------------------------------------------------------------------------------------------|
| CMPPD xmm1, xmm2/mem128, imm8 | 66 0F C2 <i>/r ib</i> | Compares two pairs of packed double-precision floating-point values in an XMM register and an XMM register or 128-bit memory location. |

Some compare operations that are not directly supported by the immediate-byte encodings can be implemented by swapping the contents of the source and destination operands and then executing the appropriate compare instruction using the swapped values. These additional compare operations are shown, together with the directly supported compare operations, in Table 1-1. When swapping operands, the first source XMM register is overwritten by the result.

| Immediate-Byte Value<br>(bits 2–0) | Compare Operation                                 | Result If NaN Operand | QNaN Operand Causes<br>Invalid Operation Exception |
|------------------------------------|---------------------------------------------------|-----------------------|----------------------------------------------------|
| 000                                | Equal                                             | FALSE                 | No                                                 |
| 001                                | Less than                                         | FALSE                 | Yes                                                |
|                                    | Greater than (uses swapped operands)              | FALSE                 | Yes                                                |
| 010                                | Less than or equal                                | FALSE                 | Yes                                                |
|                                    | Greater than or equal (uses swapped operands)     | FALSE                 | Yes                                                |
| 011                                | Unordered                                         | TRUE                  | No                                                 |
| 100                                | Not equal                                         | TRUE                  | No                                                 |
| 101                                | Not less than                                     | TRUE                  | Yes                                                |
|                                    | Not greater than (uses swapped operands)          | TRUE                  | Yes                                                |
| 110                                | Not less than or equal                            | TRUE                  | Yes                                                |
|                                    | Not greater than or equal (uses swapped operands) | TRUE                  | Yes                                                |
| 111                                | Ordered                                           | FALSE                 | No                                                 |

Table 1-1. Immediate Operand Values for Compare Operations
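The predicate encodings in Table 1-1 can be modeled in C as shown below. This is a sketch under stated assumptions: the function names are hypothetical, only the result masks are modeled, and the invalid-operation and denormalized-operand exception behavior is deliberately left out. The last helper illustrates the swapped-operand technique for a compare that has no direct encoding.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Models one 64-bit lane of CMPPD: returns all 1s (TRUE) or all 0s (FALSE).
 * Only bits 2-0 of imm8 select the predicate. */
static uint64_t cmp_lane(double a, double b, unsigned imm8)
{
    int unordered = isnan(a) || isnan(b);
    int r;

    switch (imm8 & 7u) {
    case 0:  r = !unordered && a == b;   break;  /* equal                  */
    case 1:  r = !unordered && a <  b;   break;  /* less than              */
    case 2:  r = !unordered && a <= b;   break;  /* less than or equal     */
    case 3:  r = unordered;              break;  /* unordered              */
    case 4:  r = unordered || a != b;    break;  /* not equal              */
    case 5:  r = unordered || !(a <  b); break;  /* not less than          */
    case 6:  r = unordered || !(a <= b); break;  /* not less than or equal */
    default: r = !unordered;             break;  /* ordered                */
    }
    return r ? UINT64_MAX : 0;
}

/* Models CMPPD on both lanes; dst is the first source, src the second. */
static void cmppd_model(const double dst[2], const double src[2],
                        unsigned imm8, uint64_t out[2])
{
    for (int i = 0; i < 2; i++)
        out[i] = cmp_lane(dst[i], src[i], imm8);
}

/* "Greater than" has no direct encoding: swap the operands and
 * use the less-than predicate (001), so a > b becomes b < a. */
static void cmpgt_pd(const double a[2], const double b[2], uint64_t out[2])
{
    cmppd_model(b, a, 1u, out);
}
```

In this model a NaN in either operand yields FALSE for predicates 000 through 010 and 111, and TRUE for 011 through 110, matching the "Result If NaN Operand" column.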

#### **Related Instructions**

CMPPS, CMPSD, CMPSS, COMISD, COMISS, UCOMISD, UCOMISS

#### **rFLAGS** Affected

None

#### **MXCSR Flags Affected**

| FZ | RC    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|-------|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |       |    |    |    |    |    |    |     |    |    |    |    | M  | M  |
| 15 | 14–13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

<i>Note:</i> A flag that can be set to one or zero is M (modified). Unaffected flags are blank.

#### **Exceptions**

| Exception                              | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                                                                                  |
|----------------------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------------------------------------------------------------|
| Invalid opcode, #UD                    | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1.                                                       |
|                                        | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                                                                           |
|                                        | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.                                                                   |
|                                        | X    | X               | X         | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details.     |
| Device not available, #NM              | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                                                                       |
| Stack, #SS                             | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                                                                             |
| General protection, #GP                | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                                                                                |
|                                        |      |                 | X         | A null data segment was used to reference memory.                                                                                                   |
|                                        | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                                                                           |
| Page fault, #PF                        |      | X               | X         | A page fault resulted from the execution of the instruction.                                                                                        |
| SIMD Floating-Point Exception, #XF     | X    | X               | X         | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details.     |
| SIMD Floating-Point Exceptions         |      |                 |           |                                                                                                                                                     |
| Invalid-operation exception (IE)       | X    | X               | X         | A source operand was an SNaN value.                                                                                                                 |
|                                        | X    | X               | X         | A source operand was a QNaN value, and the comparison does not allow QNaN values (refer to Table 1-1 on page 24).                                   |
| Denormalized-operand exception (DE)    | X    | X               | X         | A source operand was a denormal value.                                                                                                              |

## CMPPS Compare Packed Single-Precision Floating-Point

Compares each of the four packed single-precision floating-point values in the first source operand with the corresponding packed single-precision floating-point value in the second source operand and writes the result of each comparison in the corresponding 32 bits of the destination (first source). The type of comparison is specified by the three low-order bits of the immediate-byte operand, as shown in Table 1-1 on page 24. The result of each compare is a 32-bit value of all 1s (TRUE) or all 0s (FALSE). The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.

| Mnemonic                      | Opcode             | Description                                                                                                                             |
|-------------------------------|--------------------|-----------------------------------------------------------------------------------------------------------------------------------------|
| CMPPS xmm1, xmm2/mem128, imm8 | 0F C2 <i>/r ib</i> | Compares four pairs of packed single-precision floating-point values in an XMM register and an XMM register or 128-bit memory location. |

Some compare operations that are not directly supported by the immediate-byte encodings can be implemented by swapping the contents of the source and destination operands and then executing the appropriate compare instruction using the swapped values. These additional compare operations are shown in Table 1-1 on page 24. When swapping operands, the first source XMM register is overwritten by the result.
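Because each compare result is a 32-bit mask of all 1s or all 0s, it can be combined with the related logical instructions to choose between two values without branching. The following C sketch models that idea for the less-than predicate (001); the function names are hypothetical, and the per-lane loop stands in for what would be a CMPPS, ANDPS, ANDNPS, ORPS sequence operating on whole registers.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its raw 32-bit pattern, and back. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    assert(sizeof u == sizeof f);   /* assumes a 32-bit float */
    memcpy(&u, &f, sizeof u);
    return u;
}

static float bits_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* One lane of CMPPS with imm8 = 001 (less than).
 * A NaN in either operand gives FALSE, as in Table 1-1. */
static uint32_t cmplt_lane(float a, float b)
{
    return (!isnan(a) && !isnan(b) && a < b) ? UINT32_MAX : 0u;
}

/* For each lane, select a[i] where a[i] < b[i], otherwise b[i]. */
static void select_smaller(const float a[4], const float b[4], float out[4])
{
    for (int i = 0; i < 4; i++) {
        uint32_t mask   = cmplt_lane(a[i], b[i]);   /* CMPPS, predicate 001   */
        uint32_t from_a = mask & float_bits(a[i]);  /* ANDPS                  */
        uint32_t from_b = ~mask & float_bits(b[i]); /* ANDNPS: (NOT mask) & b */
        out[i] = bits_float(from_a | from_b);       /* ORPS                   */
    }
}
```

With this arrangement a lane containing a NaN in either input always takes the value from `b`, since the less-than predicate returns FALSE for unordered operands.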

#### **Related Instructions**

CMPPD, CMPSD, CMPSS, COMISD, COMISS, UCOMISD, UCOMISS

#### rFLAGS Affected

None

#### **MXCSR Flags Affected**

| FZ | RC    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|-------|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |       |    |    |    |    |    |    |     |    |    |    |    | M  | M  |
| 15 | 14–13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

<i>Note:</i> A flag that can be set to one or zero is M (modified). Unaffected flags are blank.

#### **Exceptions**

| Exception                              | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                                                                                  |
|----------------------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------------------------------------------------------------|
| Invalid opcode, #UD                    | X    | X               | X         | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1.                                                        |
|                                        | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                                                                           |
|                                        | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.                                                                   |
|                                        | X    | X               | X         | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details.     |
| Device not available, #NM              | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                                                                       |
| Stack, #SS                             | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                                                                             |
| General protection, #GP                | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                                                                                |
|                                        |      |                 | X         | A null data segment was used to reference memory.                                                                                                   |
|                                        | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                                                                           |
| Page fault, #PF                        |      | X               | X         | A page fault resulted from the execution of the instruction.                                                                                        |
| SIMD Floating-Point Exception, #XF     | X    | X               | X         | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details.     |
| SIMD Floating-Point Exceptions         |      |                 |           |                                                                                                                                                     |
| Invalid-operation exception (IE)       | X    | X               | X         | A source operand was an SNaN value.                                                                                                                 |
|                                        | X    | X               | X         | A source operand was a QNaN value, and the comparison does not allow QNaN values (refer to Table 1-1 on page 24).                                   |
| Denormalized-operand exception (DE)    | X    | X               | X         | A source operand was a denormal value.                                                                                                              |

## CMPSD Compare Scalar Double-Precision Floating-Point

Compares the double-precision floating-point value in the low-order 64 bits of the first source operand with the double-precision floating-point value in the low-order 64 bits of the second source operand and writes the result in the low-order 64 bits of the destination (first source). The type of comparison is specified by the three low-order bits of the immediate-byte operand, as shown in Table 1-1 on page 24. The result of the compare is a 64-bit value of all 1s (TRUE) or all 0s (FALSE). The first source/destination operand is an XMM register. The second source operand is another XMM register or 64-bit memory location. The high-order 64 bits of the destination XMM register are not modified.

| Mnemonic                     | Opcode                | Description                                                                                                             |
|------------------------------|-----------------------|-------------------------------------------------------------------------------------------------------------------------|
| CMPSD xmm1, xmm2/mem64, imm8 | F2 0F C2 <i>/r ib</i> | Compares double-precision floating-point values in<br>an XMM register and an XMM register or 64-bit<br>memory location. |

Some compare operations that are not directly supported by the immediate-byte encodings can be implemented by swapping the contents of the source and destination operands and then executing the appropriate compare instruction using the swapped values. These additional compare operations are shown in Table 1-1 on page 24. When swapping operands, the first source XMM register is overwritten by the result.
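The predicate selection and the operand-swapping technique described above can be sketched as a small software model. This is only an illustration, not the hardware definition: the function names `cmpsd_model` and `cmpsd_gt` are hypothetical, and the eight predicate encodings are assumed to follow the usual SSE2 assignment (0 = EQ through 7 = ORD), which should be checked against Table 1-1.

```python
import math

# imm8[2:0] predicate encodings. ASSUMPTION: the standard SSE2 assignment;
# verify against Table 1-1 before relying on it.
PREDICATES = {
    0: lambda a, b: a == b,                                # EQ
    1: lambda a, b: a < b,                                 # LT
    2: lambda a, b: a <= b,                                # LE
    3: lambda a, b: math.isnan(a) or math.isnan(b),        # UNORD
    4: lambda a, b: not (a == b),                          # NEQ
    5: lambda a, b: not (a < b),                           # NLT
    6: lambda a, b: not (a <= b),                          # NLE
    7: lambda a, b: not (math.isnan(a) or math.isnan(b)),  # ORD
}

ALL_ONES_64 = 0xFFFFFFFFFFFFFFFF


def cmpsd_model(first, second, imm8):
    """Return the 64-bit mask written to the low quadword of the destination.

    first  -- low double of the first source (destination) operand
    second -- low double of the second source operand
    imm8   -- immediate byte; only the three low-order bits are used
    """
    return ALL_ONES_64 if PREDICATES[imm8 & 0b111](first, second) else 0


def cmpsd_gt(a, b):
    """a > b has no direct encoding here: swap the operands and use LT (b < a)."""
    return cmpsd_model(b, a, 1)
```

In this model `cmpsd_gt(3.0, 2.0)` yields the all-ones mask, and any comparison involving a NaN yields zero for the ordered predicates EQ, LT, and LE. On real hardware the swapped form leaves its result in the register that originally held `b`, which is the overwrite the paragraph above warns about.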

This CMPSD instruction should not be confused with the same-mnemonic CMPSD (compare strings by doubleword) instruction in the general-purpose instruction set. Assemblers can distinguish the instructions by the number and type of operands.

#### **Related Instructions**

CMPPD, CMPPS, CMPSS, COMISD, COMISS, UCOMISD, UCOMISS

#### **rFLAGS** Affected

None

#### **MXCSR Flags Affected**

| FZ | RC |    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |    |    |    |    |    |    |    |    |     |    |    |    |    | M  | M  |
| 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

Note:

*A flag that can be set to one or zero is M (modified). Unaffected flags are blank.*

| Exception                              | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                                                                                 |
|----------------------------------------|------|-----------------|-----------|----------------------------------------------------------------------------------------------------------------------------------------------------|
| Invalid opcode, #UD                    | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1.                                                      |
|                                        | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                                                                          |
|                                        | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.                                                                  |
|                                        | X    | X               | X         | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details.    |
| Device not available, #NM              | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                                                                      |
| Stack, #SS                             | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                                                                            |
| General protection, #GP                | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                                                                               |
|                                        |      |                 | X         | A null data segment was used to reference memory.                                                                                                  |
| Page fault, #PF                        |      | X               | X         | A page fault resulted from the execution of the instruction.                                                                                       |
| Alignment check, #AC                   |      | X               | X         | An unaligned memory reference was performed while alignment checking was enabled.                                                                  |
| SIMD Floating-Point<br>Exception, #XF  | X    | X               | X         | There was an unmasked SIMD floating-point exception<br>while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| <b>SIMD Floating-Point Exceptions</b>  |      |                 |           |                                                                                                                                                    |
| Invalid-operation<br>exception (IE)    | X    | X               | X         | A source operand was an SNaN value.                                                                                                                |
|                                        | X    | X               | X         | A source operand was a QNaN value, and the comparison does not allow QNaN values (refer to Table 1-1 on page 24).                                  |
| Denormalized-operand<br>exception (DE) | X    | X               | X         | A source operand was a denormal value.                                                                                                             |

## CMPSS Compare Scalar Single-Precision Floating-Point

Compares the single-precision floating-point value in the low-order 32 bits of the first source operand with the single-precision floating-point value in the low-order 32 bits of the second source operand and writes the result in the low-order 32 bits of the destination (first source). The type of comparison is specified by the three low-order bits of the immediate-byte operand, as shown in Table 1-1 on page 24. The result of the compare is a 32-bit value of all 1s (TRUE) or all 0s (FALSE). The first source/destination operand is an XMM register. The second source operand is another XMM register or 32-bit memory location. The three high-order doublewords of the destination XMM register are not modified.

| Mnemonic                     | Opcode                | Description                                                                                                       |
|------------------------------|-----------------------|-------------------------------------------------------------------------------------------------------------------|
| CMPSS xmm1, xmm2/mem32, imm8 | F3 0F C2 <i>/r ib</i> | Compares single-precision floating-point values in an XMM register and an XMM register or 32-bit memory location. |



Some compare operations that are not directly supported by the immediate-byte encodings can be implemented by swapping the contents of the source and destination operands and then executing the appropriate compare instruction using the swapped values. These additional compare operations are shown in Table 1-1 on page 24. When swapping operands, the first source XMM register is overwritten by the result.
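The register-level effect described above (a 32-bit mask in the low doubleword, with the three high-order doublewords left untouched) can be illustrated with a byte-level model. This is a sketch under stated assumptions: `cmpss_lt_model` is a hypothetical helper, registers are represented as 16-byte little-endian byte strings, and only the less-than predicate (assumed to be immediate value 1) is modelled.

```python
import struct


def cmpss_lt_model(xmm1, xmm2):
    """Model CMPSS with the less-than predicate on 16-byte register images.

    xmm1 -- first source / destination register image (16 bytes)
    xmm2 -- second source register image (16 bytes)
    Returns the new 16-byte image of xmm1.
    """
    # Only the low-order 32 bits of each operand take part in the comparison.
    a = struct.unpack_from("<f", xmm1, 0)[0]
    b = struct.unpack_from("<f", xmm2, 0)[0]
    mask = 0xFFFFFFFF if a < b else 0x00000000
    # The low doubleword receives the mask; bits 127-32 are carried over as-is.
    return struct.pack("<I", mask) + bytes(xmm1[4:])
```

For example, with `xmm1 = struct.pack("<4f", 1.0, 10.0, 20.0, 30.0)` and a second operand whose low element is `2.0`, the first four bytes of the result are all `0xFF` and the remaining twelve bytes still encode `10.0`, `20.0`, and `30.0`.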


#### **Related Instructions**

CMPPD, CMPPS, CMPSD, COMISD, COMISS, UCOMISD, UCOMISS

#### rFLAGS Affected

None

#### **MXCSR Flags Affected**

| FZ | RC |    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |    |    |    |    |    |    |    |    |     |    |    |    |    | M  | M  |
| 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

Note:

*A flag that can be set to one or zero is M (modified). Unaffected flags are blank.*

| Exception                              | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                                                                                 |
|----------------------------------------|------|-----------------|-----------|----------------------------------------------------------------------------------------------------------------------------------------------------|
| Invalid opcode, #UD                    | X    | X               | X         | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1.                                                       |
|                                        | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                                                                          |
|                                        | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.                                                                  |
|                                        | X    | X               | X         | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details.    |
| Device not available, #NM              | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                                                                      |
| Stack, #SS                             | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                                                                            |
| General protection, #GP                | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                                                                               |
|                                        |      |                 | X         | A null data segment was used to reference memory.                                                                                                  |
| Page fault, #PF                        |      | X               | X         | A page fault resulted from the execution of the instruction.                                                                                       |
| Alignment check, #AC                   |      | X               | X         | An unaligned memory reference was performed while alignment checking was enabled.                                                                  |
| SIMD Floating-Point<br>Exception, #XF  | X    | X               | X         | There was an unmasked SIMD floating-point exception<br>while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| <b>SIMD Floating-Point Exceptions</b>  |      |                 |           |                                                                                                                                                    |
| Invalid-operation<br>exception (IE)    | X    | X               | X         | A source operand was an SNaN value.                                                                                                                |
|                                        | X    | X               | X         | A source operand was a QNaN value, and the comparison does not allow QNaN values (refer to Table 1-1 on page 24).                                  |
| Denormalized-operand<br>exception (DE) | X    | X               | X         | A source operand was a denormal value.                                                                                                             |

## COMISD Compare Ordered Scalar Double-Precision Floating-Point

Compares the double-precision floating-point value in the low-order 64 bits of an XMM register with the double-precision floating-point value in the low-order 64 bits of another XMM register or a 64-bit memory location and sets the ZF, PF, and CF bits in the rFLAGS register to reflect the result of the comparison. The result is unordered if one or both of the operand values is a NaN. The OF, AF, and SF bits in rFLAGS are set to zero.

If the instruction causes an unmasked SIMD floating-point exception (#XF), the rFLAGS bits are not updated.

| Mnemonic                | Opcode             | Description                                                                                                                       |
|-------------------------|--------------------|-----------------------------------------------------------------------------------------------------------------------------------|
| COMISD xmm1, xmm2/mem64 | 66 0F 2F <i>/r</i> | Compares double-precision floating-point values in an XMM register and an XMM register or 64-bit memory location and sets rFLAGS. |



| Result of Compare | ZF | PF | CF |
|-------------------|----|----|----|
| Unordered         | 1  | 1  | 1  |
| Greater Than      | 0  | 0  | 0  |
| Less Than         | 0  | 0  | 1  |
| Equal             | 1  | 0  | 0  |
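
The flag encoding in the table above can be modelled in a few lines, together with how common conditional jumps would read the result. This is a minimal sketch: `comisd_flags` and `taken_branches` are hypothetical names, the jump conditions are the standard general-purpose definitions rather than anything stated in this section, and exception behavior is not modelled.

```python
import math


def comisd_flags(a, b):
    """Return (ZF, PF, CF) for comparing a (first operand) with b (second)."""
    if math.isnan(a) or math.isnan(b):
        return (1, 1, 1)  # Unordered
    if a > b:
        return (0, 0, 0)  # Greater Than
    if a < b:
        return (0, 0, 1)  # Less Than
    return (1, 0, 0)      # Equal


def taken_branches(zf, pf, cf):
    """Report which conditional jumps would be taken for the given flags.

    ASSUMPTION: standard Jcc conditions from the general-purpose
    instruction set (JA: CF=0 and ZF=0, JB: CF=1, and so on).
    """
    return {
        "JA": cf == 0 and zf == 0,
        "JAE": cf == 0,
        "JB": cf == 1,
        "JBE": cf == 1 or zf == 1,
        "JE": zf == 1,
        "JP": pf == 1,
    }
```

One consequence visible in the model: an unordered result sets ZF and CF as well as PF, so `JE`, `JB`, and `JBE` are all taken when either operand is a NaN. Code that must distinguish the unordered case would typically test PF (for example with `JP`) before the other conditions.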

#### **Related Instructions**

CMPPD, CMPPS, CMPSD, CMPSS, COMISS, UCOMISD, UCOMISS

#### **rFLAGS** Affected

| ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF |
|----|-----|-----|----|----|----|----|-------|----|----|----|----|----|----|----|----|----|
|    |     |     |    |    |    |    |       | 0  |    |    |    | 0  | M  | 0  | M  | M  |
| 21 | 20  | 19  | 18 | 17 | 16 | 14 | 13-12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0  |

Note:

*Bits 31–22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. If the instruction causes an unmasked SIMD floating-point exception (#XF), the rFLAGS bits are not updated.* 

#### **MXCSR Flags Affected**

| FZ | RC |    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |    |    |    |    |    |    |    |    |     |    |    |    |    | M  | M  |
| 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

Note:

*A flag that can be set to one or zero is M (modified). Unaffected flags are blank.*

| Exception                              | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                                                                                 |
|----------------------------------------|------|-----------------|-----------|----------------------------------------------------------------------------------------------------------------------------------------------------|
| Invalid opcode, #UD                    | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1.                                                      |
|                                        | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                                                                          |
|                                        | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.                                                                  |
|                                        | X    | X               | X         | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details.    |
| Device not available, #NM              | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                                                                      |
| Stack, #SS                             | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                                                                            |
| General protection, #GP                | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                                                                               |
|                                        |      |                 | X         | A null data segment was used to reference memory.                                                                                                  |
| Page fault, #PF                        |      | X               | X         | A page fault resulted from the execution of the instruction.                                                                                       |
| Alignment check, #AC                   |      | X               | X         | An unaligned memory reference was performed while alignment checking was enabled.                                                                  |
| SIMD Floating-Point<br>Exception, #XF  | X    | X               | X         | There was an unmasked SIMD floating-point exception<br>while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| <b>SIMD Floating-Point Exceptions</b>  |      |                 |           |                                                                                                                                                    |
| Invalid-operation<br>exception (IE)    | X    | X               | X         | A source operand was an SNaN or QNaN value.                                                                                                        |
| Denormalized-operand<br>exception (DE) | X    | X               | X         | A source operand was a denormal value.                                                                                                             |

## COMISS Compare Ordered Scalar Single-Precision Floating-Point

Performs an ordered comparison of the single-precision floating-point value in the low-order 32 bits of an XMM register with the single-precision floating-point value in the low-order 32 bits of another XMM register or a 32-bit memory location and sets the ZF, PF, and CF bits in the rFLAGS register to reflect the result of the comparison. The result is unordered if one or both of the operand values is a NaN. The OF, AF, and SF bits in rFLAGS are set to zero.

If the instruction causes an unmasked SIMD floating-point exception (#XF), the rFLAGS bits are not updated.

| Mnemonic                | Opcode          | Description                                                                                                                    |
|-------------------------|-----------------|--------------------------------------------------------------------------------------------------------------------------------|
| COMISS xmm1, xmm2/mem32 | 0F 2F <i>/r</i> | Compares single-precision floating-point values in an XMM register and an XMM register or 32-bit memory location. Sets rFLAGS. |



| Result of Compare | ZF | PF | CF |
|-------------------|----|----|----|
| Unordered         | 1  | 1  | 1  |
| Greater Than      | 0  | 0  | 0  |
| Less Than         | 0  | 0  | 1  |
| Equal             | 1  | 0  | 0  |
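
The rule that rFLAGS is left unchanged when an unmasked SIMD floating-point exception occurs can be sketched as follows. This is a simplified model and not the hardware behavior in full: `comiss_model` and `to_single` are hypothetical names, only the invalid-operation case (a NaN operand, per the exception table for this instruction) is modelled, the `invalid_masked` parameter stands in for the MXCSR invalid-operation mask, and CR4.OSXMMEXCPT and denormal handling are ignored.

```python
import math
import struct


def to_single(x):
    """Round a Python float to single precision (value must be in float32 range)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def comiss_model(a, b, flags, invalid_masked=True):
    """Return (new_flags, raised_xf) without modifying the flags argument.

    flags          -- dict with keys ZF, PF, CF, OF, AF, SF
    invalid_masked -- True if the invalid-operation exception is masked
    """
    a, b = to_single(a), to_single(b)
    unordered = math.isnan(a) or math.isnan(b)
    if unordered and not invalid_masked:
        # Unmasked exception: the flags are not updated.
        return dict(flags), True
    if unordered:
        zf, pf, cf = 1, 1, 1
    elif a > b:
        zf, pf, cf = 0, 0, 0
    elif a < b:
        zf, pf, cf = 0, 0, 1
    else:
        zf, pf, cf = 1, 0, 0
    new_flags = dict(flags)
    new_flags.update(ZF=zf, PF=pf, CF=cf, OF=0, AF=0, SF=0)
    return new_flags, False
```

With the exception masked, a NaN operand produces the unordered encoding (ZF, PF, and CF all 1) and clears OF, AF, and SF. With it unmasked, the model reports the fault and returns the incoming flags untouched.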

#### **Related Instructions**

CMPPD, CMPPS, CMPSD, CMPSS, COMISD, UCOMISD, UCOMISS

#### **rFLAGS** Affected

| ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF |
|----|-----|-----|----|----|----|----|-------|----|----|----|----|----|----|----|----|----|
|    |     |     |    |    |    |    |       | 0  |    |    |    | 0  | M  | 0  | M  | M  |
| 21 | 20  | 19  | 18 | 17 | 16 | 14 | 13-12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0  |

Note:

*Bits 31–22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. If the instruction causes an unmasked SIMD floating-point exception (#XF), the rFLAGS bits are not updated.* 

#### **MXCSR Flags Affected**

| FZ | RC |    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |    |    |    |    |    |    |    |    |     |    |    |    |    | M  | M  |
| 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

Note:

*A flag that can be set to one or zero is M (modified). Unaffected flags are blank.*

| Exception                              | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                                                                                 |
|----------------------------------------|------|-----------------|-----------|----------------------------------------------------------------------------------------------------------------------------------------------------|
| Invalid opcode, #UD                    | X    | X               | X         | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1.                                                       |
|                                        | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                                                                          |
|                                        | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.                                                                  |
|                                        | X    | X               | X         | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details.    |
| Device not available, #NM              | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                                                                      |
| Stack, #SS                             | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                                                                            |
| General protection, #GP                | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                                                                               |
|                                        |      |                 | X         | A null data segment was used to reference memory.                                                                                                  |
| Page fault, #PF                        |      | X               | X         | A page fault resulted from the execution of the instruction.                                                                                       |
| Alignment check, #AC                   |      | X               | X         | An unaligned memory reference was performed while alignment checking was enabled.                                                                  |
| SIMD Floating-Point<br>Exception, #XF  | X    | X               | X         | There was an unmasked SIMD floating-point exception<br>while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| <b>SIMD Floating-Point Exceptions</b>  |      |                 |           |                                                                                                                                                    |
| Invalid-operation<br>exception (IE)    | X    | X               | X         | A source operand was an SNaN or QNaN value.                                                                                                        |
| Denormalized-operand<br>exception (DE) | X    | X               | X         | A source operand was a denormal value.                                                                                                             |

## CVTDQ2PD Convert Packed Doubleword Integers to Packed Double-Precision Floating-Point

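Converts two packed 32-bit signed integer values in the low-order 64 bits of an XMM register or a 64-bit memory location to two packed double-precision floating-point values and writes the converted values in another XMM register.

| Mnemonic                  | Opcode             | Description                                                                                                                                                                  |
|---------------------------|--------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| CVTDQ2PD xmm1, xmm2/mem64 | F3 0F E6 <i>/r</i> | Converts packed doubleword signed integers in an XMM register or 64-bit memory location to packed double-precision floating-point values in the destination XMM register. |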
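
The data movement in this conversion can be illustrated with a byte-level model. It is a sketch only: `cvtdq2pd_model` is a hypothetical helper, and the source is represented as a little-endian byte string whose first eight bytes stand for the low-order 64 bits of the source operand.

```python
import struct


def cvtdq2pd_model(src):
    """Convert the two low 32-bit signed integers of src to two doubles.

    src -- at least 8 bytes; only the low-order 64 bits are read
    Returns the 16-byte image of the destination register.
    """
    low, high = struct.unpack_from("<2i", src, 0)
    # Every 32-bit integer is exactly representable as a double,
    # so no rounding takes place in this direction.
    return struct.pack("<2d", float(low), float(high))
```

Given a source whose low quadword holds the integers `-1` and `2147483647`, the result decodes to the doubles `-1.0` and `2147483647.0`. Any bytes above the low quadword of the source are ignored by the model, matching the description above.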



#### **Related Instructions**

CVTPD2DQ, CVTPD2PI, CVTPI2PD, CVTSD2SI, CVTSI2SD, CVTTPD2DQ, CVTTPD2PI, CVTTSD2SI

#### **rFLAGS** Affected

None

#### **MXCSR Flags Affected**

None

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                                                                               |
|---------------------------|------|-----------------|-----------|--------------------------------------------------------------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | Х    | Х               | Х         | The SSE2 instructions are not supported, as indicated by bit<br>26 of CPUID standard function 1.                                                 |
|                           | Х    | Х               | Х         | The emulate bit (EM) of CR0 was set to 1.                                                                                                        |
|                           | Х    | Х               | Х         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.                                                                |
|                           | Х    | Х               | Х         | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i> , below, for details. |
| Device not available, #NM | Х    | Х               | Х         | The task-switch bit (TS) of CR0 was set to 1.                                                                                                    |
| Stack, #SS                | Х    | Х               | X         | A memory address exceeded the stack segment limit or was non-canonical.                                                                          |
| General protection, #GP   | Х    | Х               | X         | A memory address exceeded a data segment limit or was non-canonical.                                                                             |
|                           |      |                 | Х         | A null data segment was used to reference memory.                                                                                                |
| Page fault, #PF           |      | Х               | Х         | A page fault resulted from the execution of the instruction.                                                                                     |
| Alignment check, #AC      |      | Х               | X         | An unaligned memory reference was performed while alignment checking was enabled.                                                                |

## CVTDQ2PS Convert Packed Doubleword Integers to Packed Single-Precision Floating-Point

Converts four packed 32-bit signed integer values in an XMM register or a 128-bit memory location to four packed single-precision floating-point values and writes the converted values in another XMM register. If the result of the conversion is an inexact value, the value is rounded as specified by the rounding control bits (RC) in the MXCSR register.

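| Mnemonic                   | Opcode          | Description                                                                                                                                                                 |
|----------------------------|-----------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| CVTDQ2PS xmm1, xmm2/mem128 | 0F 5B <i>/r</i> | Converts packed doubleword integer values in an XMM register or 128-bit memory location to packed single-precision floating-point values in the destination XMM register. |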
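
The conversion and its precision behavior can be sketched in software. The following is a minimal, hypothetical reference model, not AMD's definition of the operation: the names `to_float32` and `cvtdq2ps_model` are invented for illustration, and the model assumes the rounding control field selects round-to-nearest (RC = 00b). It uses Python's `struct` module to round a value to single precision.

```python
import struct


def to_float32(value: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def cvtdq2ps_model(src):
    """Hypothetical model of CVTDQ2PS, assuming MXCSR.RC = 00b (round to nearest).

    src holds four signed 32-bit integers, element 0 being the lowest doubleword.
    Returns (results, pe); pe is True if any element had to be rounded.
    """
    if len(src) != 4:
        raise ValueError("expected four doubleword elements")
    results, pe = [], False
    for x in src:
        if not -2**31 <= x <= 2**31 - 1:
            raise ValueError("element is not a signed 32-bit integer")
        # int -> double is exact for 32-bit values, so only one rounding happens.
        y = to_float32(float(x))
        pe = pe or (y != x)
        results.append(y)
    return results, pe


values, inexact = cvtdq2ps_model([1, -2, 16777216, 16777217])
print(values, inexact)
```

In this sketch the last element, 16777217 ($2^{24} + 1$), needs 25 significant bits, one more than single precision provides, so it is rounded and the model reports the precision condition.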



#### **Related Instructions**

CVTPI2PS, CVTPS2DQ, CVTPS2PI, CVTSI2SS, CVTSS2SI, CVTTPS2DQ, CVTTPS2PI, CVTTSS2SI

#### **rFLAGS** Affected

None

## **MXCSR Flags Affected**

| FZ | RC |    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |    |    |    |    |    |    |    |    |     | M  |    |    |    |    |    |
| 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

<i>Note:</i> A flag that can be set to one or zero is M (modified). Unaffected flags are blank.

| Exception                             | Real | Virtual<br>8086 | Protected   | Cause of Exception                                                                                                                                  |
|---------------------------------------|------|-----------------|-------------|-----------------------------------------------------------------------------------------------------------------------------------------------------|
| Invalid opcode, #UD                   | Х    | Х               | X           | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1.                                                       |
|                                       | Х    | х               | Х           | The emulate bit (EM) of CR0 was set to 1.                                                                                                           |
|                                       | Х    | Х               | Х           | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.                                                                   |
|                                       | Х    | Х               | Х           | There was an unmasked SIMD floating-point exception<br>while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i> , below, for details. |
| Device not available, #NM             | Х    | Х               | Х           | The task-switch bit (TS) of CR0 was set to 1.                                                                                                       |
| Stack, #SS                            | X    | X               | X           | A memory address exceeded the stack segment limit or was non-canonical.                                                                             |
| General protection, #GP               | Х    | X               | Х           | A memory address exceeded a data segment limit or was non-canonical.                                                                                |
|                                       |      |                 | Х           | A null data segment was used to reference memory.                                                                                                   |
|                                       | X    | Х               | Х           | The memory operand was not aligned on a 16-byte boundary.                                                                                           |
| Page fault, #PF                       |      | Х               | Х           | A page fault resulted from the execution of the instruction.                                                                                        |
| SIMD Floating-Point<br>Exception, #XF | Х    | Х               | X           | There was an unmasked SIMD floating-point exception<br>while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i> , below, for details. |
| <b>SIMD Floating-Point Exceptions</b> |      |                 |             |                                                                                                                                                     |
| Precision exception (PE)              | X    | X               | X           | A result could not be represented exactly in the destination format.                                                                                |

# CVTPD2DQ

## Convert Packed Double-Precision Floating-Point to Packed Doubleword Integers

Converts two packed double-precision floating-point values in an XMM register or a 128-bit memory location to two packed 32-bit signed integers and writes the converted values in the low-order 64 bits of another XMM register. The high-order 64 bits in the destination XMM register are cleared to all 0s.



If the result of the conversion is an inexact value, the value is rounded as specified by the rounding control bits (RC) in the MXCSR register. If the floating-point value is a NaN or infinity, or if the result of the conversion falls outside the range of a signed doubleword ($-2^{31}$ to $+2^{31} - 1$), the instruction returns the 32-bit indefinite integer value (8000\_0000h) when the invalid-operation exception (IE) is masked.
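
The rules above can be illustrated with a small software model. This is a hedged sketch rather than a definition of the instruction: `cvtpd2dq_model` and `INDEFINITE` are hypothetical names, and the model assumes RC = 00b (round to nearest, ties to even) and a masked invalid-operation exception. Python's built-in `round` also rounds ties to even, which is why it is used here.

```python
import math

# 8000_0000h interpreted as a signed doubleword.
INDEFINITE = -2**31


def cvtpd2dq_model(src):
    """Hypothetical model of CVTPD2DQ with RC = 00b and IE masked.

    src holds two doubles, element 0 being the low quadword.
    Returns (dest, ie, pe): dest is four doublewords (upper two cleared to 0),
    ie and pe model the MXCSR invalid-operation and precision flags.
    """
    if len(src) != 2:
        raise ValueError("expected two double-precision elements")
    low, ie, pe = [], False, False
    for x in src:
        if math.isnan(x) or math.isinf(x):
            low.append(INDEFINITE)
            ie = True
            continue
        r = round(x)  # ties go to the even integer, as with RC = 00b
        if -2**31 <= r <= 2**31 - 1:
            low.append(r)
            pe = pe or (r != x)
        else:
            low.append(INDEFINITE)
            ie = True
    return low + [0, 0], ie, pe


print(cvtpd2dq_model([2.5, 3.5]))
print(cvtpd2dq_model([float("nan"), 3e9]))
```

The first call shows ties rounding to the even neighbor; the second shows both a NaN and an out-of-range value producing the indefinite integer in the model.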

#### **Related Instructions**

CVTDQ2PD, CVTPD2PI, CVTPI2PD, CVTSD2SI, CVTSI2SD, CVTTPD2DQ, CVTTPD2PI, CVTTSD2SI

#### rFLAGS Affected

None

## **MXCSR Flags Affected**

| FZ | RC |    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |    |    |    |    |    |    |    |    |     | M  |    |    |    |    | M  |
| 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

<b>Note:</b> A flag that can be set to one or zero is M (modified). Unaffected flags are blank.

| Exception                             | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                                                                                  |
|---------------------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------------------------------------------------------------|
| Invalid opcode, #UD                   | Х    | Х               | Х         | The SSE2 instructions are not supported, as indicated by bit<br>26 of CPUID standard function 1.                                                    |
|                                       | Х    | Х               | Х         | The emulate bit (EM) of CR0 was set to 1.                                                                                                           |
|                                       | Х    | Х               | Х         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.                                                                   |
|                                       | Х    | Х               | Х         | There was an unmasked SIMD floating-point exception<br>while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i> , below, for details. |
| Device not available, #NM             | Х    | Х               | Х         | The task-switch bit (TS) of CR0 was set to 1.                                                                                                       |
| Stack, #SS                            | Х    | Х               | Х         | A memory address exceeded the stack segment limit or was non-canonical.                                                                             |
| General protection, #GP               | Х    | X               | Х         | A memory address exceeded a data segment limit or was non-canonical.                                                                                |
|                                       |      |                 | Х         | A null data segment was used to reference memory.                                                                                                   |
|                                       | Х    | Х               | Х         | The memory operand was not aligned on a 16-byte boundary.                                                                                           |
| Page fault, #PF                       |      | Х               | Х         | A page fault resulted from the execution of the instruction.                                                                                        |
| SIMD Floating-Point<br>Exception, #XF | X    | X               | X         | There was an unmasked SIMD floating-point exception<br>while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| <b>SIMD Floating-Point Exceptions</b> |      |                 |           |                                                                                                                                                     |
| Invalid-operation<br>exception (IE) | X    | Х       | X            | A source operand was an SNaN value, a QNaN value, or ±infinity.      |
|                                     | x    | Х       | Х            | A source operand was too large to fit in the destination format.     |
| Precision exception (PE)            | X    | Х       | Х            | A result could not be represented exactly in the destination format. |

# CVTPD2PI Convert Packed Double-Precision Floating-Point to Packed Doubleword Integers

Converts two packed double-precision floating-point values in an XMM register or a 128-bit memory location to two packed 32-bit signed integer values and writes the converted values in an MMX register.



If the result of the conversion is an inexact value, the value is rounded as specified by the rounding control bits (RC) in the MXCSR register. If the floating-point value is a NaN or infinity, or if the result of the conversion falls outside the range of a signed doubleword ($-2^{31}$ to $+2^{31} - 1$), the instruction returns the 32-bit indefinite integer value (8000\_0000h) when the invalid-operation exception (IE) is masked.

Executing this instruction sets all fields in the x87 tag word according to their corresponding data and clears the top-of-stack-pointer bit (TOP) in the x87 status word to 0. Any pending x87 exceptions are handled before the instruction executes. For details, see "Actions Taken on Executing 64-Bit Media Instructions" in Volume 1.
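
The effect of the rounding control bits on an inexact conversion can be sketched as follows. This is an illustrative model only: `ROUNDERS` and `convert_with_rc` are hypothetical names, the RC encodings are the standard MXCSR ones (00b nearest, 01b down, 10b up, 11b toward zero), and the x87 side effects described above are not modeled.

```python
import math

# Standard MXCSR.RC encodings mapped to equivalent Python rounding functions.
ROUNDERS = {
    0b00: round,       # round to nearest, ties to even
    0b01: math.floor,  # round down, toward negative infinity
    0b10: math.ceil,   # round up, toward positive infinity
    0b11: math.trunc,  # round toward zero (truncate)
}


def convert_with_rc(x: float, rc: int) -> int:
    """Hypothetical helper: convert one double to a signed doubleword under RC.

    Returns the indefinite integer (8000_0000h as a signed value) for NaN,
    infinity, or an out-of-range result, assuming IE is masked.
    """
    if math.isnan(x) or math.isinf(x):
        return -2**31
    r = ROUNDERS[rc](x)
    return r if -2**31 <= r <= 2**31 - 1 else -2**31


for rc in ROUNDERS:
    print(f"RC={rc:02b}:", [convert_with_rc(v, rc) for v in (2.5, -2.5)])
```

Running the loop shows how the same inputs, 2.5 and -2.5, land on different integers depending on the selected rounding mode.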

### **Related Instructions**

CVTDQ2PD, CVTPD2DQ, CVTPI2PD, CVTSD2SI, CVTSI2SD, CVTTPD2DQ, CVTTPD2PI, CVTTSD2SI

### rFLAGS Affected

None

## **MXCSR Flags Affected**

| FZ | RC |    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |    |    |    |    |    |    |    |    |     | M  |    |    |    |    | M  |
| 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

<b>Note:</b> A flag that can be set to one or zero is M (modified). Unaffected flags are blank.

| Exception                                    | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                                                                                  |
|----------------------------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------------------------------------------------------------|
| Invalid opcode, #UD                          | Х    | Х       | Х         | The SSE2 instructions are not supported, as indicated by bit<br>26 of CPUID standard function 1.                                                    |
|                                              | Х    | Х       | Х         | The emulate bit (EM) of CR0 was set to 1.                                                                                                           |
|                                              | х    | X       | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.                                                                   |
|                                              | Х    | Х       | x         | There was an unmasked SIMD floating-point exception<br>while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i> , below, for details. |
| Device not available, #NM                    | Х    | Х       | Х         | The task-switch bit (TS) of CR0 was set to 1.                                                                                                       |
| Stack, #SS                                   | Х    | Х       | X         | A memory address exceeded the stack segment limit or was non-canonical.                                                                             |
| General protection, #GP                      | Х    | Х       | X         | A memory address exceeded a data segment limit or was non-canonical.                                                                                |
|                                              |      |         | Х         | A null data segment was used to reference memory.                                                                                                   |
|                                              | Х    | Х       | x         | The memory operand was not aligned on a 16-byte boundary.                                                                                           |
| Page fault, #PF                              |      | Х       | Х         | A page fault resulted from the execution of the instruction.                                                                                        |
| x87 floating-point<br>exception pending, #MF | Х    | Х       | X         | An exception is pending due to an x87 floating-point instruction.                                                                                   |
| SIMD Floating-Point<br>Exception, #XF        | X    | X       | X         | There was an unmasked SIMD floating-point exception<br>while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| <b>SIMD Floating-Point Exceptions</b>        |      |         |           |                                                                                                                                                     |
| Invalid-operation<br>exception (IE) | X    | Х               | X                   | A source operand was an SNaN value, a QNaN value, or ±infinity.      |
|                                     | x    | Х               | х                   | A source operand was too large to fit in the destination format.     |
| Precision exception (PE)            | X    | Х               | X                   | A result could not be represented exactly in the destination format. |

# **CVTPD2PS**

## Convert Packed Double-Precision Floating-Point to Packed Single-Precision Floating-Point

Converts two packed double-precision floating-point values in an XMM register or a 128-bit memory location to two packed single-precision floating-point values and writes the converted values in the low-order 64 bits of another XMM register. The high-order 64 bits in the destination XMM register are cleared to all 0s.



If the result of the conversion is an inexact value, the value is rounded as specified by the rounding control bits (RC) in the MXCSR register.
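
The narrowing conversion can be approximated in software as shown below. This is a simplified, hypothetical model: `cvtpd2ps_model` is an invented name, the model assumes RC = 00b with all exceptions masked, and it tracks only the precision (PE) and overflow (OE) conditions. The invalid-operation, denormalized-operand, and underflow conditions are deliberately left out of the sketch.

```python
import math
import struct


def cvtpd2ps_model(src):
    """Hypothetical model of CVTPD2PS with RC = 00b and exceptions masked.

    src holds two doubles, element 0 being the low quadword.
    Returns (dest, flags): dest is four single-precision values with the
    upper two cleared to 0, flags is a set drawn from {"PE", "OE"}.
    """
    if len(src) != 2:
        raise ValueError("expected two double-precision elements")
    low, flags = [], set()
    for x in src:
        try:
            # struct rounds the double to the nearest single-precision value.
            y = struct.unpack("<f", struct.pack("<f", x))[0]
        except OverflowError:
            # Finite input too large for single precision. Assumption: the
            # masked response under round-to-nearest is a signed infinity,
            # with both overflow and precision reported.
            y = math.copysign(math.inf, x)
            flags.update({"OE", "PE"})
        else:
            if not math.isnan(x) and y != x:
                flags.add("PE")
        low.append(y)
    return low + [0.0, 0.0], flags


print(cvtpd2ps_model([1.5, 0.1]))
print(cvtpd2ps_model([1e39, -1e39]))
```

In the first call 1.5 converts exactly while 0.1 does not, so only the precision condition is reported; in the second call both inputs exceed the single-precision range.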

### **Related Instructions**

CVTPS2PD, CVTSD2SS, CVTSS2SD

### **rFLAGS** Affected

None

## **MXCSR Flags Affected**

| FZ | RC |    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |    |    |    |    |    |    |    |    |     | M  | M  | M  |    | M  | M  |
| 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

<i>Note:</i> A flag that can be set to one or zero is M (modified). Unaffected flags are blank.

| Exception                             | Real | Virtual<br>8086 | Protected    | Cause of Exception                                                                                                                                  |
|---------------------------------------|------|-----------------|--------------|-----------------------------------------------------------------------------------------------------------------------------------------------------|
| Invalid opcode, #UD                   | Х    | Х               | X            | The SSE2 instructions are not supported, as indicated by bit<br>26 of CPUID standard function 1.                                                    |
|                                       | Х    | х               | Х            | The emulate bit (EM) of CR0 was set to 1.                                                                                                           |
|                                       | X    | Х               | Х            | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.                                                                   |
|                                       | X    | Х               | Х            | There was an unmasked SIMD floating-point exception<br>while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i> , below, for details. |
| Device not available, #NM             | Х    | Х               | Х            | The task-switch bit (TS) of CR0 was set to 1.                                                                                                       |
| Stack, #SS                            | Х    | Х               | X            | A memory address exceeded the stack segment limit or was non-canonical.                                                                             |
| General protection, #GP               | Х    | X               | X            | A memory address exceeded a data segment limit or was non-canonical.                                                                                |
|                                       |      |                 | х            | A null data segment was used to reference memory.                                                                                                   |
|                                       | Х    | Х               | Х            | The memory operand was not aligned on a 16-byte boundary.                                                                                           |
| Page fault, #PF                       |      | Х               | Х            | A page fault resulted from the execution of the instruction.                                                                                        |
| SIMD Floating-Point<br>Exception, #XF | Х    | X               | X            | There was an unmasked SIMD floating-point exception<br>while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i> , below, for details. |
| <b>SIMD Floating-Point Exceptions</b> |      |                 |              |                                                                                                                                                     |
| Invalid-operation<br>exception (IE)   | X    | X               | X            | A source operand was an SNaN value.                                                                                                                 |
| Overflow exception (OE)               | X    | X               | X            | A rounded result was too large to fit into the format of the destination operand.                                                                   |
| Underflow exception (UE)               | X    | Х               | Х         | A rounded result was too small to fit into the format of the destination operand. |
| Denormalized-operand<br>exception (DE) | X    | Х               | Х         | A source operand was a denormal value.                                            |
| Precision exception (PE)               | X    | Х               | Х         | A result could not be represented exactly in the destination format.              |

# CVTPI2PD Convert Packed Doubleword Integers to Packed Double-Precision Floating-Point

Converts two packed 32-bit signed integer values in an MMX register or a 64-bit memory location to two double-precision floating-point values and writes the converted values in an XMM register.
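
Because a double-precision significand holds 53 bits, every 32-bit signed integer converts without rounding. The short sketch below illustrates that property; `cvtpi2pd_model` is a hypothetical name, and the x87 state changes associated with MMX register use are not modeled.

```python
def cvtpi2pd_model(src):
    """Hypothetical model of CVTPI2PD: two signed doublewords -> two doubles."""
    if len(src) != 2:
        raise ValueError("expected two doubleword elements")
    for x in src:
        if not -2**31 <= x <= 2**31 - 1:
            raise ValueError("element is not a signed 32-bit integer")
    # Every 32-bit integer fits in a 53-bit significand, so no rounding occurs.
    return [float(x) for x in src]


# The extremes of the doubleword range survive a round trip unchanged.
for v in (-2**31, -1, 0, 2**31 - 1):
    assert int(float(v)) == v

print(cvtpi2pd_model([2**31 - 1, -2**31]))
```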



#### **Related Instructions**

CVTDQ2PD, CVTPD2DQ, CVTPD2PI, CVTSD2SI, CVTSI2SD, CVTTPD2DQ, CVTTPD2PI, CVTTSD2SI

### **rFLAGS** Affected

None

### **MXCSR Flags Affected**

None

| Exception                                    | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                            |
|----------------------------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD                          | Х    | Х               | Х         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                                              | Х    | х               | Х         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                                              | Х    | Х               | Х         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.             |
| Device not available, #NM                    | Х    | Х               | Х         | The task-switch bit (TS) of CR0 was set to 1.                                                 |
| Stack, #SS                                   | Х    | Х               | Х         | A memory address exceeded the stack segment limit or was non-canonical.                       |
| General protection, #GP                      | Х    | Х               | Х         | A memory address exceeded a data segment limit or was non-canonical.                          |
|                                              |      |                 | х         | A null data segment was used to reference memory.                                             |
| Page fault, #PF                              |      | Х               | Х         | A page fault resulted from the execution of the instruction.                                  |
| x87 floating-point<br>exception pending, #MF | Х    | Х               | X         | An exception was pending due to an x87 floating-point instruction.                            |
| Alignment check, #AC                         |      | Х               | X         | An unaligned memory reference was performed while alignment checking was enabled.             |

# CVTPI2PS Convert Packed Doubleword Integers to Packed Single-Precision Floating-Point

Converts two packed 32-bit signed integer values in an MMX register or a 64-bit memory location to two single-precision floating-point values and writes the converted values in the low-order 64 bits of an XMM register. The high-order 64 bits of the XMM register are not modified.

| Mnemonic                | Opcode          | Description                                                                                                                                                                    |
|-------------------------|-----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| CVTPI2PS xmm, mmx/mem64 | 0F 2A <i>/r</i> | Converts packed doubleword integer values in an MMX <sup>™</sup> register or 64-bit memory location to single-precision floating-point values in the destination XMM register. |
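
The merge behavior, in which only the low-order 64 bits of the destination change, can be sketched as below. This is an illustrative model under stated assumptions: `to_float32` and `cvtpi2ps_model` are hypothetical names, RC = 00b is assumed, and the XMM register is represented as a list of four single-precision values with element 0 as the lowest.

```python
import struct


def to_float32(value: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def cvtpi2ps_model(dest, src):
    """Hypothetical model of CVTPI2PS, assuming RC = 00b.

    dest: current XMM contents as four single-precision values.
    src:  two signed 32-bit integers from the MMX register or memory.
    Returns the new XMM contents; elements 2 and 3 are carried over unchanged.
    """
    if len(dest) != 4 or len(src) != 2:
        raise ValueError("expected four destination and two source elements")
    low = [to_float32(float(x)) for x in src]
    return low + list(dest[2:])


print(cvtpi2ps_model([9.0, 9.0, 3.0, 4.0], [1, -2]))
```

Contrast this with instructions such as CVTPD2PS, where the upper 64 bits of the destination are cleared rather than preserved.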



#### **Related Instructions**

CVTDQ2PS, CVTPS2DQ, CVTPS2PI, CVTSI2SS, CVTSS2SI, CVTTPS2DQ, CVTTPS2PI, CVTTSS2SI

#### **rFLAGS Affected**

None

#### **MXCSR Flags Affected**

| FZ | RC |    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |    |    |    |    |    |    |    |    |     | M  |    |    |    |    |    |
| 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

**Note:** A flag that can be set to one or zero is M (modified). Unaffected flags are blank.
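The bit positions in the table above can be turned into a small decoder for an MXCSR value. This is a minimal sketch; the names `MXCSR_BITS` and `decode_mxcsr` are hypothetical, and the example value 1F80h is the commonly documented reset value (all exceptions masked, no flags set), used here only for illustration.

```python
# Bit positions taken from the MXCSR table: status flags, DAZ, masks, FZ.
MXCSR_BITS = {
    "IE": 0, "DE": 1, "ZE": 2, "OE": 3, "UE": 4, "PE": 5,
    "DAZ": 6,
    "IM": 7, "DM": 8, "ZM": 9, "OM": 10, "UM": 11, "PM": 12,
    "FZ": 15,
}


def decode_mxcsr(value: int) -> dict:
    """Split an MXCSR value into named single-bit fields plus the 2-bit RC field."""
    fields = {name: (value >> bit) & 1 for name, bit in MXCSR_BITS.items()}
    fields["RC"] = (value >> 13) & 0b11  # rounding control occupies bits 14-13
    return fields
```

For example, `decode_mxcsr(0x1F80)` reports every mask bit (IM through PM) as 1 and every status flag as 0.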

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|-----------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD | X | X | X | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1. |
|  | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
|  | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
|  | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
|  |  |  | X | A null data segment was used to reference memory. |
| Page fault, #PF |  | X | X | A page fault resulted from the execution of the instruction. |
| x87 floating-point exception pending, #MF | X | X | X | An exception was pending due to an x87 floating-point instruction. |
| Alignment check, #AC |  | X | X | An unaligned memory reference was performed while alignment checking was enabled. |
| SIMD Floating-Point Exception, #XF | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| **SIMD Floating-Point Exceptions** |  |  |  |  |
| Precision exception (PE) | X | X | X | A result could not be represented exactly in the destination format. |

# CVTPS2DQ Convert Packed Single-Precision Floating-Point to Packed Doubleword Integers

Converts four packed single-precision floating-point values in an XMM register or a 128-bit memory location to four packed 32-bit signed integer values and writes the converted values in another XMM register.



If the result of the conversion is an inexact value, the value is rounded as specified by the rounding control bits (RC) in the MXCSR register. If the floating-point value is a NaN or infinity, or if the result of the conversion falls outside the range of a signed doubleword ($-2^{31}$ to $+2^{31} - 1$), the instruction returns the 32-bit indefinite integer value (8000\_0000h) when the invalid-operation exception (IE) is masked.
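The masked-exception behavior in the paragraph above can be modelled in a few lines. This is a sketch under stated assumptions, not the hardware implementation: `cvtps2dq_lane` and `cvtps2dq_model` are hypothetical names, RC is assumed to select round-to-nearest-even, IE is assumed to be masked, and inputs are assumed to be values representable in single precision.

```python
import math

# The bit pattern 8000_0000h interpreted as a signed doubleword.
INDEFINITE_32 = -0x8000_0000


def cvtps2dq_lane(x: float) -> int:
    """Convert one lane, assuming round-to-nearest-even and a masked IE."""
    if math.isnan(x) or math.isinf(x):
        return INDEFINITE_32
    rounded = round(x)  # Python's round() resolves ties to even
    if not -2**31 <= rounded <= 2**31 - 1:
        return INDEFINITE_32
    return rounded


def cvtps2dq_model(src):
    """Apply the lane conversion to all four source lanes."""
    if len(src) != 4:
        raise ValueError("expected 4 source lanes")
    return [cvtps2dq_lane(x) for x in src]
```

Note that the indefinite value coincides with the legitimate result for an input of exactly $-2^{31}$, so software that needs to tell the two apart has to inspect the IE flag rather than the returned integer.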

#### **Related Instructions**

CVTDQ2PS, CVTPI2PS, CVTPS2PI, CVTSI2SS, CVTSS2SI, CVTTPS2DQ, CVTTPS2PI, CVTTSS2SI

#### **rFLAGS Affected**

None

#### **MXCSR Flags Affected**

| FZ | RC |    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |    |    |    |    |    |    |    |    |     | M  |    |    |    |    | M  |
| 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

**Note:** A flag that can be set to one or zero is M (modified). Unaffected flags are blank.

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|-----------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD | X | X | X | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|  | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
|  | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
|  | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
|  |  |  | X | A null data segment was used to reference memory. |
|  | X | X | X | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF |  | X | X | A page fault resulted from the execution of the instruction. |
| SIMD Floating-Point Exception, #XF | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| **SIMD Floating-Point Exceptions** |  |  |  |  |
| Invalid-operation exception (IE) | X | X | X | A source operand was an SNaN value, a QNaN value, or ±infinity. |
|  | X | X | X | A source operand was too large to fit in the destination format. |
| Precision exception (PE) | X | X | X | A result could not be represented exactly in the destination format. |
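The exception table distinguishes two outcomes for an unmasked SIMD floating-point exception, depending on CR4.OSXMMEXCPT. The sketch below models only that choice; it ignores every other exception source and any priority between them. The names `FLAG_BITS`, `MASK_SHIFT`, and `simd_fp_fault` are hypothetical, and the pairing of each status flag with the mask bit seven positions higher follows the MXCSR bit layout shown above.

```python
# Status-flag bit positions in MXCSR; the matching mask sits 7 bits higher
# (IE at bit 0 pairs with IM at bit 7, and so on up to PE/PM).
FLAG_BITS = {"IE": 0, "DE": 1, "ZE": 2, "OE": 3, "UE": 4, "PE": 5}
MASK_SHIFT = 7


def simd_fp_fault(raised, mxcsr, osxmmexcpt):
    """Return '#XF', '#UD', or None for a set of raised exception names.

    raised:     iterable of names such as {"IE", "PE"}
    mxcsr:      integer MXCSR value supplying the mask bits
    osxmmexcpt: truthy if CR4.OSXMMEXCPT = 1
    """
    for name in raised:
        masked = (mxcsr >> (FLAG_BITS[name] + MASK_SHIFT)) & 1
        if not masked:
            return "#XF" if osxmmexcpt else "#UD"
    return None  # every raised exception was masked; no fault is taken
```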

# CVTPS2PD Convert Packed Single-Precision Floating-Point to Packed Double-Precision Floating-Point

Converts two packed single-precision floating-point values in the low-order 64 bits of an XMM register or a 64-bit memory location to two packed double-precision floating-point values and writes the converted values in another XMM register.
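Because every single-precision value is exactly representable in double precision, this widening conversion never rounds. The sketch below illustrates that with raw bit patterns; the helper names are hypothetical, and the model does not cover NaN handling or denormal operands.

```python
import struct


def single_bits_to_double(bits: int) -> float:
    """Interpret a 32-bit pattern as a single and return it as a Python double."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def double_to_single_bits(value: float) -> int:
    """Round a double to single precision and return the 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def cvtps2pd_model(src_low_bits):
    """Widen the two low-order single-precision lanes (given as bit patterns)."""
    if len(src_low_bits) != 2:
        raise ValueError("expected 2 source lanes")
    return [single_bits_to_double(b) for b in src_low_bits]
```

Converting each widened value back with `double_to_single_bits` reproduces the original bit pattern for ordinary finite inputs, which is a quick way to confirm that no precision was lost.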





#### **Related Instructions**

CVTPD2PS, CVTSD2SS, CVTSS2SD

#### **rFLAGS Affected**

None

#### **MXCSR Flags Affected**

| FZ | RC |    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |    |    |    |    |    |    |    |    |     |    |    |    |    | M  | M  |
| 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

**Note:** A flag that can be set to one or zero is M (modified). Unaffected flags are blank.

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|-----------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD | X | X | X | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|  | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
|  | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
|  | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
|  |  |  | X | A null data segment was used to reference memory. |
| Page fault, #PF |  | X | X | A page fault resulted from the execution of the instruction. |
| Alignment check, #AC |  | X | X | An unaligned memory reference was performed while alignment checking was enabled. |
| SIMD Floating-Point Exception, #XF | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| **SIMD Floating-Point Exceptions** |  |  |  |  |
| Invalid-operation exception (IE) | X | X | X | A source operand was an SNaN value. |
| Denormalized-operand exception (DE) | X | X | X | A source operand was a denormal value. |

# CVTPS2PI Convert Packed Single-Precision Floating-Point to Packed Doubleword Integers

Converts two packed single-precision floating-point values in the low-order 64 bits of an XMM register or a 64-bit memory location to two packed 32-bit signed integers and writes the converted values in an MMX register.



If the result of the conversion is an inexact value, the value is rounded as specified by the rounding control bits (RC) in the MXCSR register. If the floating-point value is a NaN or infinity, or if the result of the conversion falls outside the range of a signed doubleword ($-2^{31}$ to $+2^{31} - 1$), the instruction returns the 32-bit indefinite integer value (8000\_0000h) when the invalid-operation exception (IE) is masked.
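The effect of the rounding control field on an inexact conversion can be illustrated with a small model. This is a sketch, not the hardware algorithm: `ROUNDERS` and `cvtps2pi_model` are hypothetical names, IE is assumed to be masked, and the RC encodings used (00 nearest-even, 01 down, 10 up, 11 toward zero) are the standard MXCSR assignments rather than something stated on this page.

```python
import math

INDEFINITE_32 = -0x8000_0000  # 8000_0000h as a signed doubleword

# Standard MXCSR.RC encodings mapped onto Python rounding functions.
ROUNDERS = {
    0b00: round,       # round to nearest, ties to even
    0b01: math.floor,  # round down, toward negative infinity
    0b10: math.ceil,   # round up, toward positive infinity
    0b11: math.trunc,  # round toward zero (truncate)
}


def cvtps2pi_model(src_low, rc=0b00):
    """Convert the two low-order lanes to signed doublewords under rounding mode rc."""
    if len(src_low) != 2:
        raise ValueError("expected 2 source lanes")
    out = []
    for x in src_low:
        if math.isnan(x) or math.isinf(x):
            out.append(INDEFINITE_32)
            continue
        r = ROUNDERS[rc](x)
        out.append(r if -2**31 <= r <= 2**31 - 1 else INDEFINITE_32)
    return out
```

With the inputs 2.5 and -2.5, the four modes give (2, -2), (2, -3), (3, -2), and (2, -2) respectively.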

#### **Related Instructions**

CVTDQ2PS, CVTPI2PS, CVTPS2DQ, CVTSI2SS, CVTSS2SI, CVTTPS2DQ, CVTTPS2PI, CVTTSS2SI

#### **rFLAGS Affected**

None

#### **MXCSR Flags Affected**

| FZ | RC |    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |    |    |    |    |    |    |    |    |     | M  |    |    |    |    | M  |
| 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

**Note:** A flag that can be set to one or zero is M (modified). Unaffected flags are blank.

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|-----------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD | X | X | X | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1. |
|  | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
|  | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
|  | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
|  |  |  | X | A null data segment was used to reference memory. |
| Page fault, #PF |  | X | X | A page fault resulted from the execution of the instruction. |
| x87 floating-point exception pending, #MF | X | X | X | An exception was pending due to an x87 floating-point instruction. |
| Alignment check, #AC |  | X | X | An unaligned memory reference was performed while alignment checking was enabled. |
| SIMD Floating-Point Exception, #XF | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| **SIMD Floating-Point Exceptions** |  |  |  |  |
| Invalid-operation exception (IE) | X | X | X | A source operand was an SNaN value, a QNaN value, or ±infinity. |
|  | X | X | X | A source operand was too large to fit in the destination format. |
| Precision exception (PE) | X | X | X | A result could not be represented exactly in the destination format. |

# CVTSD2SI Convert Scalar Double-Precision Floating-Point to Signed Doubleword or Quadword Integer

Converts a scalar double-precision floating-point value in the low-order 64 bits of an XMM register or a 64-bit memory location to a 32-bit or 64-bit signed integer and writes the converted value in a general-purpose register.

| Mnemonic                  | Opcode             | Description                                                                                                                                                 |
|---------------------------|--------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------|
| CVTSD2SI reg32, xmm/mem64 | F2 0F 2D <i>/r</i> | Converts a scalar double-precision floating-point value in an XMM register or 64-bit memory location to a doubleword integer in a general-purpose register. |
| CVTSD2SI reg64, xmm/mem64 | F2 0F 2D <i>/r</i> | Converts a scalar double-precision floating-point value in an XMM register or 64-bit memory location to a quadword integer in a general-purpose register.   |



If the result of the conversion is an inexact value, the value is rounded as specified by the rounding control bits (RC) in the MXCSR register. If the floating-point value is a NaN or infinity, or if the result of the conversion falls outside the range of a signed doubleword ($-2^{31}$ to $+2^{31} - 1$) or quadword ($-2^{63}$ to $+2^{63} - 1$), the instruction returns the indefinite integer value (8000\_0000h for 32-bit integers, 8000\_0000\_0000\_0000h for 64-bit integers) when the invalid-operation exception (IE) is masked.
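The dependence of the result on the destination width can be sketched as follows. This is an illustrative model only: `cvtsd2si_model` is a hypothetical name, rounding is assumed to be round-to-nearest-even, and IE is assumed to be masked.

```python
import math


def cvtsd2si_model(x: float, bits: int = 32) -> int:
    """Convert a scalar double to a signed integer of the given width (32 or 64)."""
    if bits not in (32, 64):
        raise ValueError("destination width must be 32 or 64 bits")
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    indefinite = lowest  # 8000_0000h or 8000_0000_0000_0000h as a signed value
    if math.isnan(x) or math.isinf(x):
        return indefinite
    rounded = round(x)  # ties to even, matching the assumed rounding mode
    if not lowest <= rounded <= highest:
        return indefinite
    return rounded
```

A value such as 3.0e9 is out of range for the doubleword form and yields the indefinite value, while the quadword form returns 3000000000.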

#### **Related Instructions**

CVTDQ2PD, CVTPD2DQ, CVTPD2PI, CVTPI2PD, CVTSI2SD, CVTTPD2DQ, CVTTPD2PI, CVTTSD2SI

#### **rFLAGS Affected**

None

#### **MXCSR Flags Affected**

| FZ | RC |    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |    |    |    |    |    |    |    |    |     | M  |    |    |    |    | M  |
| 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

**Note:** A flag that can be set to one or zero is M (modified). Unaffected flags are blank.

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|-----------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD | X | X | X | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|  | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
|  | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
|  | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
|  |  |  | X | A null data segment was used to reference memory. |
| Page fault, #PF |  | X | X | A page fault resulted from the execution of the instruction. |
| Alignment check, #AC |  | X | X | An unaligned memory reference was performed while alignment checking was enabled. |
| SIMD Floating-Point Exception, #XF | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| **SIMD Floating-Point Exceptions** |  |  |  |  |
| Invalid-operation exception (IE) | X | X | X | A source operand was an SNaN value, a QNaN value, or ±infinity. |
|  | X | X | X | A source operand was too large to fit in the destination format. |
| Precision exception (PE) | X | X | X | A result could not be represented exactly in the destination format. |

# CVTSD2SS Convert Scalar Double-Precision Floating-Point to Scalar Single-Precision Floating-Point

Converts a scalar double-precision floating-point value in the low-order 64 bits of an XMM register or a 64-bit memory location to a single-precision floating-point value and writes the converted value in the low-order 32 bits of another XMM register. The three high-order doublewords in the destination XMM register are not modified. If the result of the conversion is an inexact value, the value is rounded as specified by the rounding control bits (RC) in the MXCSR register.
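The narrowing conversion can be sketched with a model that reports when rounding or overflow occurred. This is an illustration under stated assumptions, not the hardware algorithm: `cvtsd2ss_model` is a hypothetical name, round-to-nearest is assumed, all exceptions are assumed masked (so overflow produces a signed infinity), and only the overflow (OE) and precision (PE) conditions are modelled. Underflow, denormal, and invalid-operation reporting are left out.

```python
import math
import struct


def cvtsd2ss_model(dest, x):
    """Model CVTSD2SS for masked exceptions and round-to-nearest.

    dest: four single-precision lanes of the destination, lane 0 first.
    x:    the scalar double-precision source value.
    Returns (new_dest, flags) where flags is a subset of {"OE", "PE"}.
    """
    if len(dest) != 4:
        raise ValueError("expected 4 destination lanes")
    flags = set()
    try:
        # struct rounds the double to single precision when packing as "f".
        low = struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:
        # The rounded result exceeds the single-precision range.
        low = math.copysign(math.inf, x)
        flags.update({"OE", "PE"})
    else:
        if not math.isnan(x) and low != x:
            flags.add("PE")  # the result was rounded
    # The three high-order doublewords of the destination are unchanged.
    return [low] + list(dest[1:]), flags
```

For example, converting 0.1 sets only PE, because 0.1 as a double is not exactly representable in single precision, while 0.5 converts exactly and sets no flag.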



#### **Related Instructions**

CVTPD2PS, CVTPS2PD, CVTSS2SD

#### **rFLAGS Affected**

None

#### **MXCSR Flags Affected**

| FZ | RC |    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |    |    |    |    |    |    |    |    |     | M  | M  | M  |    | M  | M  |
| 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

**Note:** A flag that can be set to one or zero is M (modified). Unaffected flags are blank.

| Exception                           | Real | Virtual<br>8086 | Protected | Cause of Exception |
|-------------------------------------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD                 | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                                     | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1. |
|                                     | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
|                                     | X    | X               | X         | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM           | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS                          | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP             | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical. |
|                                     |      |                 | X         | A null data segment was used to reference memory. |
| Page fault, #PF                     |      | X               | X         | A page fault resulted from the execution of the instruction. |
| Alignment check, #AC                |      | X               | X         | An unaligned memory reference was performed while alignment checking was enabled. |
| SIMD Floating-Point Exception, #XF  | X    | X               | X         | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| **SIMD Floating-Point Exceptions**  |      |                 |           |                    |
| Invalid-operation exception (IE)    | X    | X               | X         | A source operand was an SNaN value. |
| Overflow exception (OE)             | X    | X               | X         | A rounded result was too large to fit into the format of the destination operand. |
| Underflow exception (UE)            | X    | X               | X         | A rounded result was too small to fit into the format of the destination operand. |
| Denormalized-operand exception (DE) | X    | X               | X         | A source operand was a denormal value. |
| Precision exception (PE)            | X    | X               | X         | A result could not be represented exactly in the destination format. |

# CVTSI2SD Convert Signed Doubleword or Quadword Integer to Scalar Double-Precision Floating-Point

Converts a 32-bit or 64-bit signed integer value in a general-purpose register or memory location to a double-precision floating-point value and writes the converted value in the low-order 64 bits of an XMM register. The high-order 64 bits in the destination XMM register are not modified.

| Mnemonic                | Opcode             | Description                                                                                                                                                              |
|-------------------------|--------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| CVTSI2SD xmm, reg/mem32 | F2 0F 2A <i>/r</i> | Converts a doubleword integer in a general-purpose register or 32-bit memory location to a double-precision floating-point value in the destination XMM register. |
| CVTSI2SD xmm, reg/mem64 | F2 0F 2A <i>/r</i> | Converts a quadword integer in a general-purpose register or 64-bit memory location to a double-precision floating-point value in the destination XMM register.          |



The quadword form (reg/mem64 source operand) requires a REX prefix with REX.W = 1 to select the 64-bit operand size.

If the result of the conversion is an inexact value, the value is rounded as specified by the rounding control bits (RC) in the MXCSR register.
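The behavior described above can be illustrated with a small behavioral model. The following Python sketch is not AMD's implementation: the function name `cvtsi2sd`, the representation of the XMM register as a 128-bit unsigned integer, and the returned precision flag are all assumptions made for illustration, and only the round-to-nearest setting (RC = 00b) is modeled.

```python
import struct

MASK64 = (1 << 64) - 1

def cvtsi2sd(xmm, src):
    """Model CVTSI2SD under round-to-nearest (assumed MXCSR.RC = 00b).

    xmm: 128-bit destination register as an unsigned integer (hypothetical model).
    src: signed 32-bit or 64-bit integer source operand.
    Returns (new register value, True if the result was inexact).
    """
    value = float(src)                # Python rounds int -> double to nearest even
    inexact = int(value) != src       # rounding changed the value: PE would be set
    low = struct.unpack("<Q", struct.pack("<d", value))[0]
    return (xmm & ~MASK64) | low, inexact   # bits 127-64 are not modified

# Every 32-bit integer is exactly representable as a double, so only
# quadword sources can be inexact.
new_xmm, pe = cvtsi2sd(0xDEADBEEF << 64, 2**53 + 1)
```

In this model `pe` is true for `2**53 + 1` because a double has only 53 significand bits, while any doubleword source converts exactly.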


### **Related Instructions**

CVTDQ2PD, CVTPD2DQ, CVTPD2PI, CVTPI2PD, CVTSD2SI, CVTTPD2DQ, CVTTPD2PI, CVTTSD2SI

### **rFLAGS Affected**

None

### **MXCSR Flags Affected**

| FZ    | R  | C  | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|-------|----|----|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|       |    |    |    |    |    |    |    |    |     | M  |    |    |    |    |    |
| 15    | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |
| Note: |    |    |    |    |    |    |    |    |     |    |    |    |    |    |    |

A flag that can be set to one or zero is M (modified). Unaffected flags are blank.
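To make the bit positions in the table concrete, the sketch below decodes a 16-bit MXCSR value into named fields. The function name `decode_mxcsr` and the dictionary layout are hypothetical; the bit positions are taken from the table, with RC treated as the two-bit field at bits 14-13. The sample value 1F80h (all exception masks set, no flags set) is used only as an example input.

```python
# Bit positions copied from the MXCSR table above.
FLAG_BITS = {"IE": 0, "DE": 1, "ZE": 2, "OE": 3, "UE": 4, "PE": 5}
MASK_BITS = {"IM": 7, "DM": 8, "ZM": 9, "OM": 10, "UM": 11, "PM": 12}

def decode_mxcsr(mxcsr):
    """Split an MXCSR value into its named fields (illustrative helper)."""
    fields = {}
    for name, bit in list(FLAG_BITS.items()) + list(MASK_BITS.items()):
        fields[name] = (mxcsr >> bit) & 1
    fields["DAZ"] = (mxcsr >> 6) & 1
    fields["RC"] = (mxcsr >> 13) & 0b11   # two-bit rounding-control field
    fields["FZ"] = (mxcsr >> 15) & 1
    return fields

fields = decode_mxcsr(0x1F80)
print(fields["PM"], fields["PE"], fields["RC"])   # prints: 1 0 0
```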

| Exception                          | Real | Virtual<br>8086 | Protected | Cause of Exception |
|------------------------------------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD                | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                                    | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1. |
|                                    | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
|                                    | X    | X               | X         | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM          | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS                         | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP            | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical. |
|                                    |      |                 | X         | A null data segment was used to reference memory. |
| Page fault, #PF                    |      | X               | X         | A page fault resulted from the execution of the instruction. |
| Alignment check, #AC               |      | X               | X         | An unaligned memory reference was performed while alignment checking was enabled. |
| SIMD Floating-Point Exception, #XF | X    | X               | X         | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| **SIMD Floating-Point Exceptions** |      |                 |           |                    |
| Precision exception (PE)           | X    | X               | X         | A result could not be represented exactly in the destination format. |
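The control-bit conditions in the exception table can be read as a decision procedure. The Python sketch below is one such reading, not a statement of hardware priority: the function name, its boolean parameters, and the order of the checks (configuration causes of #UD first, then #NM, then SIMD floating-point exceptions) are assumptions of this example, and the memory-related exceptions (#SS, #GP, #PF, #AC) are left out.

```python
def raised_exception(sse2_supported, cr0_em, cr4_osfxsr, cr0_ts,
                     unmasked_simd_exception, cr4_osxmmexcpt):
    """Return the exception implied by the table, or None (illustrative only)."""
    # #UD: CPUID function 1 bit 26 clear, CR0.EM = 1, or CR4.OSFXSR = 0.
    if not sse2_supported or cr0_em or not cr4_osfxsr:
        return "#UD"
    # #NM: CR0.TS = 1.
    if cr0_ts:
        return "#NM"
    # An unmasked SIMD floating-point exception is reported as #XF when
    # CR4.OSXMMEXCPT = 1 and as #UD when CR4.OSXMMEXCPT = 0.
    if unmasked_simd_exception:
        return "#XF" if cr4_osxmmexcpt else "#UD"
    return None

# Example: SSE2 present and enabled, TS clear, unmasked PE, OSXMMEXCPT set.
print(raised_exception(True, False, True, False, True, True))   # prints: #XF
```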

# CVTSI2SS Convert Signed Doubleword or Quadword Integer to Scalar Single-Precision Floating-Point

Converts a 32-bit or 64-bit signed integer value in a general-purpose register or memory location to a single-precision floating-point value and writes the converted value in the low-order 32 bits of an XMM register. The three high-order doublewords in the destination XMM register are not modified.

| Mnemonic                | Opcode             | Description                                                                                                                                                           |
|-------------------------|--------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| CVTSI2SS xmm, reg/mem32 | F3 0F 2A <i>/r</i> | Converts a doubleword integer in a general-purpose register or 32-bit memory location to a single-precision floating-point value in the destination XMM register.     |
| CVTSI2SS xmm, reg/mem64 | F3 0F 2A <i>/r</i> | Converts a quadword integer in a general-purpose register or 64-bit memory location to a single-precision floating-point value in the destination XMM register. |





If the result of the conversion is an inexact value, the value is rounded as specified by the rounding control bits (RC) in the MXCSR register.
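Because single precision has only 24 significand bits, even doubleword sources can be inexact. The Python sketch below models the doubleword form only, under round-to-nearest (RC = 00b); the name `cvtsi2ss_32` and the 128-bit integer model of the XMM register are assumptions for illustration. Quadword sources are excluded because this model converts through a double first, which could round twice.

```python
import struct

MASK32 = (1 << 32) - 1

def cvtsi2ss_32(xmm, src):
    """Model CVTSI2SS xmm, reg/mem32 under round-to-nearest (assumed RC = 00b).

    Returns (new 128-bit register value, True if the result was inexact).
    """
    if not -2**31 <= src < 2**31:
        raise ValueError("this sketch models only the doubleword source form")
    # int32 -> double is exact; packing as 'f' then rounds once to single.
    bits = struct.unpack("<I", struct.pack("<f", float(src)))[0]
    rounded = struct.unpack("<f", struct.pack("<I", bits))[0]
    inexact = int(rounded) != src          # PE would be set
    return (xmm & ~MASK32) | bits, inexact  # upper three doublewords unchanged

# 2**24 + 1 needs 25 significand bits, so it rounds to 2**24 (4B800000h).
new_xmm, pe = cvtsi2ss_32(0, 2**24 + 1)
print(hex(new_xmm), pe)   # prints: 0x4b800000 True
```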

### **Related Instructions**

CVTDQ2PS, CVTPI2PS, CVTPS2DQ, CVTPS2PI, CVTSS2SI, CVTTPS2DQ, CVTTPS2PI, CVTTSS2SI

### **rFLAGS Affected**

None

### **MXCSR Flags Affected**

| FZ    | R  | C  | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|-------|----|----|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|       |    |    |    |    |    |    |    |    |     | M  |    |    |    |    |    |
| 15    | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |
| Note: |    |    |    |    |    |    |    |    |     |    |    |    |    |    |    |

A flag that can be set to one or zero is M (modified). Unaffected flags are blank.

| Exception                          | Real | Virtual<br>8086 | Protected | Cause of Exception |
|------------------------------------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD                | X    | X               | X         | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1. |
|                                    | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1. |
|                                    | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
|                                    | X    | X               | X         | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM          | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS                         | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP            | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical. |
|                                    |      |                 | X         | A null data segment was used to reference memory. |
| Page fault, #PF                    |      | X               | X         | A page fault resulted from the execution of the instruction. |
| Alignment check, #AC               |      | X               | X         | An unaligned memory reference was performed while alignment checking was enabled. |
| SIMD Floating-Point Exception, #XF | X    | X               | X         | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| **SIMD Floating-Point Exceptions** |      |                 |           |                    |
| Precision exception (PE)           | X    | X               | X         | A result could not be represented exactly in the destination format. |

# CVTSS2SD Convert Scalar Single-Precision Floating-Point to Scalar Double-Precision Floating-Point

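Converts a single-precision floating-point value in the low-order 32 bits of an XMM register or a 32-bit memory location to a double-precision floating-point value and writes the converted value in the low-order 64 bits of another XMM register. The high-order 64 bits in the destination XMM register are not modified.

| Mnemonic                  | Opcode             | Description |
|---------------------------|--------------------|-------------|
| CVTSS2SD xmm1, xmm2/mem32 | F3 0F 5A <i>/r</i> | Converts a scalar single-precision floating-point value in an XMM register or 32-bit memory location to a scalar double-precision floating-point value in the destination XMM register. |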
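Widening from single to double precision is always exact, which is why only the source-operand conditions (denormal and SNaN) can set MXCSR flags for this instruction. The Python sketch below illustrates this; the function names, the 128-bit integer model of the destination register, and the choice to leave NaN result values out of the model are assumptions of the example.

```python
import struct

MASK64 = (1 << 64) - 1

def source_flags(src_bits):
    """Return the MXCSR flags implied by a single-precision source encoding."""
    exponent = (src_bits >> 23) & 0xFF
    fraction = src_bits & 0x7FFFFF
    flags = set()
    if exponent == 0 and fraction != 0:
        flags.add("DE")                      # denormal source operand
    if exponent == 0xFF and fraction != 0 and not (fraction >> 22):
        flags.add("IE")                      # SNaN: quiet bit (bit 22) is clear
    return flags

def cvtss2sd(dst, src_bits):
    """Widen a non-NaN single to double; bits 127-64 of dst are preserved."""
    if (src_bits >> 23) & 0xFF == 0xFF and src_bits & 0x7FFFFF:
        raise ValueError("NaN results are outside the scope of this sketch")
    value = struct.unpack("<f", struct.pack("<I", src_bits))[0]   # exact widening
    low = struct.unpack("<Q", struct.pack("<d", value))[0]
    return (dst & ~MASK64) | low

print(hex(cvtss2sd(0, 0x3FC00000)))   # 1.5f -> prints: 0x3ff8000000000000
```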



### **Related Instructions**

CVTPD2PS, CVTPS2PD, CVTSD2SS

### **rFLAGS Affected**

None

### **MXCSR Flags Affected**

| FZ    | R  | C  | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|-------|----|----|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|       |    |    |    |    |    |    |    |    |     |    |    |    |    | M  | M  |
| 15    | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |
| Note: |    |    |    |    |    |    |    |    |     |    |    |    |    |    |    |

A flag that can be set to one or zero is M (modified). Unaffected flags are blank.


| Exception                           | Real | Virtual<br>8086 | Protected | Cause of Exception |
|-------------------------------------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD                 | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                                     | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1. |
|                                     | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
|                                     | X    | X               | X         | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM           | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS                          | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP             | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical. |
|                                     |      |                 | X         | A null data segment was used to reference memory. |
| Page fault, #PF                     |      | X               | X         | A page fault resulted from the execution of the instruction. |
| Alignment check, #AC                |      | X               | X         | An unaligned memory reference was performed while alignment checking was enabled. |
| SIMD Floating-Point Exception, #XF  | X    | X               | X         | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| **SIMD Floating-Point Exceptions**  |      |                 |           |                    |
| Invalid-operation exception (IE)    | X    | X               | X         | A source operand was an SNaN value. |
| Denormalized-operand exception (DE) | X    | X               | X         | A source operand was a denormal value. |

# CVTSS2SI Convert Scalar Single-Precision Floating-Point to Signed Doubleword or Quadword Integer

The CVTSS2SI instruction converts a single-precision floating-point value in the low-order 32 bits of an XMM register or a 32-bit memory location to a 32-bit or 64-bit signed integer value and writes the converted value in a general-purpose register.

| Mnemonic                   | Opcode      | Description                                                                                                                                                |
|----------------------------|-------------|------------------------------------------------------------------------------------------------------------------------------------------------------------|
| CVTSS2SI reg32, xmm2/mem32 | F3 0F 2D /r | Converts a single-precision floating-point value in an XMM register or 32-bit memory location to a doubleword integer value in a general-purpose register. |
| CVTSS2SI reg64, xmm2/mem32 | F3 0F 2D /r | Converts a single-precision floating-point value in an XMM register or 32-bit memory location to a quadword integer value in a general-purpose register.   |





If the result of the conversion is an inexact value, the value is rounded as specified by the rounding control bits (RC) in the MXCSR register. If the floating-point value is a NaN or infinity, or if the result of the conversion falls outside the range of a signed doubleword ($-2^{31}$ to $+2^{31} - 1$) or signed quadword ($-2^{63}$ to $+2^{63} - 1$), the instruction returns the indefinite integer value (8000\_0000h for 32-bit integers, 8000\_0000\_0000\_0000h for 64-bit integers) when the invalid-operation exception (IE) is masked.
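The rounding and indefinite-value rules can be sketched as a small Python model. The function name `cvtss2si`, the `width` parameter, and the returned flag set are hypothetical; the RC encodings used (00b nearest-even, 01b toward negative infinity, 10b toward positive infinity, 11b toward zero) are the standard MXCSR encodings and are not spelled out in this section. The input is assumed to be a value already representable in single precision, and IE is assumed to be masked.

```python
import math

INDEFINITE = {32: 0x8000_0000, 64: 0x8000_0000_0000_0000}
ROUNDERS = {0: round, 1: math.floor, 2: math.ceil, 3: math.trunc}

def cvtss2si(value, rc=0, width=32):
    """Return (destination register bit pattern, set of MXCSR flags)."""
    if math.isnan(value) or math.isinf(value):
        return INDEFINITE[width], {"IE"}
    result = ROUNDERS[rc](value)            # round() is round-half-to-even
    limit = 1 << (width - 1)
    if not -limit <= result < limit:
        return INDEFINITE[width], {"IE"}    # out of range for the destination
    flags = {"PE"} if result != value else set()
    return result & ((1 << width) - 1), flags   # two's-complement bit pattern

print(cvtss2si(2.5))          # prints: (2, {'PE'})
print(cvtss2si(2.5, rc=2))    # prints: (3, {'PE'})
```

Note that a valid result of $-2^{31}$ has the same bit pattern as the 32-bit indefinite value; in this model the two cases differ only in whether IE is reported.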

### **Related Instructions**

CVTDQ2PS, CVTPI2PS, CVTPS2DQ, CVTPS2PI, CVTSI2SS, CVTTPS2DQ, CVTTPS2PI, CVTTSS2SI

### **rFLAGS Affected**

None

### **MXCSR Flags Affected**

| FZ    | R  | C  | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|-------|----|----|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|       |    |    |    |    |    |    |    |    |     | M  |    |    |    |    | M  |
| 15    | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |
| Note: |    |    |    |    |    |    |    |    |     |    |    |    |    |    |    |

A flag that can be set to one or zero is M (modified). Unaffected flags are blank.

| Exception                          | Real | Virtual<br>8086 | Protected | Cause of Exception |
|------------------------------------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD                | X    | X               | X         | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1. |
|                                    | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1. |
|                                    | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
|                                    | X    | X               | X         | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM          | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS                         | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP            | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical. |
|                                    |      |                 | X         | A null data segment was used to reference memory. |
| Page fault, #PF                    |      | X               | X         | A page fault resulted from the execution of the instruction. |
| Alignment check, #AC               |      | X               | X         | An unaligned memory reference was performed while alignment checking was enabled. |
| SIMD Floating-Point Exception, #XF | X    | X               | X         | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| **SIMD Floating-Point Exceptions** |      |                 |           |                    |
| Invalid-operation exception (IE)   | X    | X               | X         | A source operand was an SNaN value, a QNaN value, or ±infinity. |
|                                    | X    | X               | X         | A source operand was too large to fit in the destination format. |
| Precision exception (PE)           | X    | X               | X         | A result could not be represented exactly in the destination format. |

# CVTTPD2DQ Convert Packed Double-Precision Floating-Point to Packed Doubleword Integers, Truncated

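Converts two packed double-precision floating-point values in an XMM register or a 128-bit memory location to two packed 32-bit signed integer values and writes the converted values in the low-order 64 bits of another XMM register. The high-order 64 bits of the destination XMM register are cleared to all 0s.

| Mnemonic                    | Opcode             | Description |
|-----------------------------|--------------------|-------------|
| CVTTPD2DQ xmm1, xmm2/mem128 | 66 0F E6 <i>/r</i> | Converts packed double-precision floating-point values in an XMM register or 128-bit memory location to packed doubleword integers in the destination XMM register. Inexact results are truncated. |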



If the result of the conversion is an inexact value, the value is truncated (rounded toward zero). If the floating-point value is a NaN or infinity, or if the result of the conversion falls outside the range of a signed doubleword ($-2^{31}$ to $+2^{31} - 1$), the instruction returns the 32-bit indefinite integer value (8000\_0000h) when the invalid-operation exception (IE) is masked.
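The packed truncating conversion can be modeled element by element. In the Python sketch below, the names `truncate_to_dword` and `cvttpd2dq` are hypothetical, the source operand is passed as two Python floats (low element first), the 128-bit result is returned as an unsigned integer, and IE is assumed to be masked.

```python
import math

INDEFINITE32 = 0x8000_0000

def truncate_to_dword(value):
    """Truncate one double toward zero; return the 32-bit result pattern."""
    if math.isnan(value) or math.isinf(value):
        return INDEFINITE32
    result = math.trunc(value)
    if not -2**31 <= result < 2**31:
        return INDEFINITE32                 # out of range for a signed doubleword
    return result & 0xFFFF_FFFF             # two's-complement bit pattern

def cvttpd2dq(low, high):
    """Pack both results into bits 63-0; bits 127-64 of the result are zero."""
    return truncate_to_dword(low) | (truncate_to_dword(high) << 32)

print(hex(cvttpd2dq(1.9, -1.9)))   # prints: 0xffffffff00000001
```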

### **Related Instructions**

CVTDQ2PD, CVTPD2DQ, CVTPD2PI, CVTPI2PD, CVTSD2SI, CVTSI2SD, CVTTPD2PI, CVTTSD2SI

### **rFLAGS Affected**

None

### **MXCSR Flags Affected**

| FZ    | R  | C  | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|-------|----|----|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|       |    |    |    |    |    |    |    |    |     | M  |    |    |    |    | M  |
| 15    | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |
| Note: |    |    |    |    |    |    |    |    |     |    |    |    |    |    |    |

A flag that can be set to one or zero is M (modified). Unaffected flags are blank.

| Exception                             | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                                                                                  |
|---------------------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------------------------------------------------------------|
| Invalid opcode, #UD                   | Х    | Х               | X         | The SSE2 instructions are not supported, as indicated by bit<br>26 of CPUID standard function 1.                                                    |
|                                       | Х    | Х               | Х         | The emulate bit (EM) of CR0 was set to 1.                                                                                                           |
|                                       | Х    | Х               | Х         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.                                                                   |
|                                       | Х    | Х               | X         | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i> , below, for details.    |
| Device not available, #NM             | Х    | Х               | Х         | The task-switch bit (TS) of CR0 was set to 1.                                                                                                       |
| Stack, #SS                            | Х    | Х               | X         | A memory address exceeded the stack segment limit or was non-canonical.                                                                             |
| General protection, #GP               | Х    | X               | Х         | A memory address exceeded a data segment limit or was non-canonical.                                                                                |
|                                       |      |                 | х         | A null data segment was used to reference memory.                                                                                                   |
|                                       | Х    | Х               | х         | The memory operand was not aligned on a 16-byte boundary.                                                                                           |
| Page fault, #PF                       |      | Х               | Х         | A page fault resulted from the execution of the instruction.                                                                                        |
| SIMD Floating-Point<br>Exception, #XF | Х    | Х               | Х         | There was an unmasked SIMD floating-point exception<br>while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i> , below, for details. |


| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|-----------|------|-----------------|-----------|--------------------|
| <b>SIMD Floating-Point Exceptions</b> |  |  |  |  |
| Invalid-operation<br>exception (IE) | X | X | X | A source operand was an SNaN value, a QNaN value, or ±infinity. |
|  | X | X | X | A source operand was too large to fit in the destination format. |
| Precision exception (PE) | X | X | X | A result could not be represented exactly in the destination format. |

## CVTTPD2PI Convert Packed Double-Precision Floating-Point to Packed Doubleword Integers, Truncated

Converts two packed double-precision floating-point values in an XMM register or a 128-bit memory location to two packed 32-bit signed integer values and writes the converted values in an MMX register.



If the result of the conversion is an inexact value, the value is truncated (rounded toward zero). If the floating-point value is a NaN or infinity, or if the result of the conversion falls outside the signed doubleword range ($-2^{31}$ to $+2^{31}-1$), the instruction returns the 32-bit indefinite integer value (8000\_0000h) when the invalid-operation exception (IE) is masked.
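The per-element behavior described above can be sketched as a small software model. This is an illustration only, not AMD's definition of the instruction: the function names `cvtt_f64_to_i32` and `cvttpd2pi` are hypothetical, the model covers only the masked-IE result, and it assumes the low source quadword maps to the low doubleword of the MMX destination.

```python
import math

# Bit pattern 8000_0000h read as a signed doubleword.
INDEFINITE_32 = -0x8000_0000


def cvtt_f64_to_i32(value: float) -> int:
    """Model one element: truncate toward zero, or return the indefinite value."""
    if math.isnan(value) or math.isinf(value):
        return INDEFINITE_32
    truncated = math.trunc(value)  # rounds toward zero
    if not -2**31 <= truncated <= 2**31 - 1:
        return INDEFINITE_32
    return truncated


def cvttpd2pi(low: float, high: float) -> int:
    """Pack both converted elements into a 64-bit value (low element in bits 31-0).

    Hypothetical helper: the element ordering is an assumption of this sketch.
    """
    lanes = [cvtt_f64_to_i32(low), cvtt_f64_to_i32(high)]
    return (lanes[0] & 0xFFFF_FFFF) | ((lanes[1] & 0xFFFF_FFFF) << 32)
```

Note that a source value of exactly $-2^{31}$ converts legitimately to 8000\_0000h, so in this model the indefinite value alone does not distinguish a valid result from an invalid one; the IE flag in MXCSR does.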

### **Related Instructions**

CVTDQ2PD, CVTPD2DQ, CVTPD2PI, CVTPI2PD, CVTSD2SI, CVTSI2SD, CVTTPD2DQ, CVTTSD2SI

### **rFLAGS** Affected

None

### **MXCSR Flags Affected**

| FZ | RC | RC | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |    |    |    |    |    |    |    |    |     | M  |    |    |    |    | M  |
| 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

<b>Note:</b> A flag that can be set to one or zero is M (modified). Unaffected flags are blank.

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|-----------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD | X | X | X | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|  | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
|  | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
|  | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
|  |  |  | X | A null data segment was used to reference memory. |
|  | X | X | X | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF |  | X | X | A page fault resulted from the execution of the instruction. |
| x87 floating-point<br>exception pending, #MF | X | X | X | An exception was pending due to an x87 floating-point instruction. |
| SIMD Floating-Point<br>Exception, #XF | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|-----------|------|-----------------|-----------|--------------------|
| <b>SIMD Floating-Point Exceptions</b> |  |  |  |  |
| Invalid-operation<br>exception (IE) | X | X | X | A source operand was an SNaN value, a QNaN value, or ±infinity. |
|  | X | X | X | A source operand was too large to fit in the destination format. |
| Precision exception (PE) | X | X | X | A result could not be represented exactly in the destination format. |
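The system-state conditions listed in the exception tables above can be expressed as simple bit tests. The sketch below is a hedged illustration: the function names are hypothetical, and the bit positions used for CR0.EM (bit 2), CR0.TS (bit 3), CR4.OSFXSR (bit 9), and CR4.OSXMMEXCPT (bit 10), as well as the use of the EDX register for the CPUID feature bits, are assumptions taken from the general architecture definition rather than from the tables themselves. It does not model exception priority.

```python
# Assumed architectural bit positions (not stated in the tables above).
CR0_EM = 1 << 2
CR0_TS = 1 << 3
CR4_OSFXSR = 1 << 9
CR4_OSXMMEXCPT = 1 << 10
CPUID_FN1_EDX_SSE2 = 1 << 26  # bit 26 of CPUID standard function 1


def invalid_opcode_causes(cpuid_edx: int, cr0: int, cr4: int) -> list:
    """List the static #UD causes that apply to an SSE2 instruction."""
    causes = []
    if not cpuid_edx & CPUID_FN1_EDX_SSE2:
        causes.append("SSE2 not supported")
    if cr0 & CR0_EM:
        causes.append("CR0.EM = 1")
    if not cr4 & CR4_OSFXSR:
        causes.append("CR4.OSFXSR = 0")
    return causes


def device_not_available(cr0: int) -> bool:
    """#NM is reported when the task-switch bit is set."""
    return bool(cr0 & CR0_TS)


def misaligned_128(address: int) -> bool:
    """A 128-bit memory operand must sit on a 16-byte boundary, else #GP."""
    return address % 16 != 0


def unmasked_simd_fp_vector(cr4: int) -> str:
    """An unmasked SIMD floating-point exception is #XF or #UD, per OSXMMEXCPT."""
    return "#XF" if cr4 & CR4_OSXMMEXCPT else "#UD"
```

For the SSE instructions (as opposed to SSE2), the same check would test bit 25 instead of bit 26.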

## CVTTPS2DQ Convert Packed Single-Precision Floating-Point to Packed Doubleword Integers, Truncated

Converts four packed single-precision floating-point values in an XMM register or a 128-bit memory location to four packed 32-bit signed integers and writes the converted values in another XMM register.



If the result of the conversion is an inexact value, the value is truncated (rounded toward zero). If the floating-point value is a NaN or infinity, or if the result of the conversion falls outside the signed doubleword range ($-2^{31}$ to $+2^{31}-1$), the instruction returns the 32-bit indefinite integer value (8000\_0000h) when the invalid-operation exception (IE) is masked.
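To make the packed form concrete, the following sketch treats a 128-bit operand as 16 raw bytes and converts all four elements. It is a minimal model under stated assumptions, not the instruction's formal definition: the function name `cvttps2dq` is used here only as a label, a little-endian element layout is assumed, and only the masked-IE result is modeled.

```python
import math
import struct

# Bit pattern 8000_0000h read as a signed doubleword.
INDEFINITE_32 = -0x8000_0000


def cvttps2dq(source: bytes) -> bytes:
    """Convert four packed single-precision values to four packed int32 values.

    `source` is assumed to hold 16 bytes in little-endian element order.
    """
    if len(source) != 16:
        raise ValueError("a 128-bit operand is exactly 16 bytes")
    results = []
    for value in struct.unpack("<4f", source):
        if math.isnan(value) or math.isinf(value):
            results.append(INDEFINITE_32)
            continue
        truncated = math.trunc(value)  # rounds toward zero
        in_range = -2**31 <= truncated <= 2**31 - 1
        results.append(truncated if in_range else INDEFINITE_32)
    return struct.pack("<4i", *results)
```

A caller could build the operand with `struct.pack("<4f", ...)` and read the result back with `struct.unpack("<4i", ...)`.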

### **Related Instructions**

CVTDQ2PS, CVTPI2PS, CVTPS2DQ, CVTPS2PI, CVTSI2SS, CVTSS2SI, CVTTPS2PI, CVTTSS2SI

### **rFLAGS** Affected

None

### **MXCSR Flags Affected**

| FZ | RC | RC | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |    |    |    |    |    |    |    |    |     | M  |    |    |    |    | M  |
| 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

<b>Note:</b> A flag that can be set to one or zero is M (modified). Unaffected flags are blank.

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|-----------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD | X | X | X | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|  | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
|  | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
|  | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
|  |  |  | X | A null data segment was used to reference memory. |
|  | X | X | X | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF |  | X | X | A page fault resulted from the execution of the instruction. |
| SIMD Floating-Point<br>Exception, #XF | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |


| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|-----------|------|-----------------|-----------|--------------------|
| <b>SIMD Floating-Point Exceptions</b> |  |  |  |  |
| Invalid-operation<br>exception (IE) | X | X | X | A source operand was an SNaN value, a QNaN value, or ±infinity. |
|  | X | X | X | A source operand was too large to fit in the destination format. |
| Precision exception (PE) | X | X | X | A result could not be represented exactly in the destination format. |

## CVTTPS2PI Convert Packed Single-Precision Floating-Point to Packed Doubleword Integers, Truncated

Converts two packed single-precision floating-point values in the low-order 64 bits of an XMM register or a 64-bit memory location to two packed 32-bit signed integer values and writes the converted values in an MMX register.



If the result of the conversion is an inexact value, the value is truncated (rounded toward zero). If the floating-point value is a NaN or infinity, or if the result of the conversion falls outside the signed doubleword range ($-2^{31}$ to $+2^{31}-1$), the instruction returns the 32-bit indefinite integer value (8000\_0000h) when the invalid-operation exception (IE) is masked.

### **Related Instructions**

CVTDQ2PS, CVTPI2PS, CVTPS2DQ, CVTPS2PI, CVTSI2SS, CVTSS2SI, CVTTPS2DQ, CVTTSS2SI

### **rFLAGS** Affected

None

### **MXCSR Flags Affected**

| FZ | RC | RC | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |    |    |    |    |    |    |    |    |     | M  |    |    |    |    | M  |
| 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

<b>Note:</b> A flag that can be set to one or zero is M (modified). Unaffected flags are blank.

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|-----------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD | X | X | X | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1. |
|  | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
|  | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
|  | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
|  |  |  | X | A null data segment was used to reference memory. |
| Page fault, #PF |  | X | X | A page fault resulted from the execution of the instruction. |
| x87 floating-point<br>exception pending, #MF | X | X | X | An exception was pending due to an x87 floating-point instruction. |
| Alignment check, #AC |  | X | X | An unaligned memory reference was performed while alignment checking was enabled. |
| SIMD Floating-Point<br>Exception, #XF | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|-----------|------|-----------------|-----------|--------------------|
| <b>SIMD Floating-Point Exceptions</b> |  |  |  |  |
| Invalid-operation<br>exception (IE) | X | X | X | A source operand was an SNaN value, a QNaN value, or ±infinity. |
|  | X | X | X | A source operand was too large to fit in the destination format. |
| Precision exception (PE) | X | X | X | A result could not be represented exactly in the destination format. |
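The interaction between the two status flags marked M in the MXCSR table (IE, bit 0, and PE, bit 5) and their mask bits (IM, bit 7, and PM, bit 12) can be sketched as follows. This is an illustrative model only: the function names are hypothetical, a doubleword destination is assumed, and the model reports at most one flag per element.

```python
import math

# Bit positions as shown in the MXCSR table above.
MXCSR_IE = 1 << 0   # invalid-operation flag
MXCSR_PE = 1 << 5   # precision flag
MXCSR_IM = 1 << 7   # invalid-operation mask
MXCSR_PM = 1 << 12  # precision mask


def flags_raised(value: float) -> int:
    """Status flags that a truncating conversion of `value` to int32 would raise."""
    if math.isnan(value) or math.isinf(value):
        return MXCSR_IE
    truncated = math.trunc(value)
    if not -2**31 <= truncated <= 2**31 - 1:
        return MXCSR_IE          # source too large for the destination format
    if truncated != value:
        return MXCSR_PE          # result is inexact
    return 0


def unmasked(flags: int, mxcsr: int) -> int:
    """Return the raised flags whose mask bit in `mxcsr` is clear."""
    masks = (mxcsr >> 7) & 0x3F  # mask bits 12-7 line up with flag bits 5-0
    return flags & ~masks & 0x3F
```

A nonzero return from `unmasked` corresponds to the case in which the instruction reports #XF (or #UD when CR4.OSXMMEXCPT = 0) instead of writing the masked result.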

## CVTTSD2SI Convert Scalar Double-Precision Floating-Point to Signed Doubleword or Quadword Integer, Truncated

Converts a double-precision floating-point value in the low-order 64 bits of an XMM register or a 64-bit memory location to a 32-bit or 64-bit signed integer value and writes the converted value in a general-purpose register.

| Mnemonic                   | Opcode             | Description                                                                                                                                                                                           |
|----------------------------|--------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| CVTTSD2SI reg32, xmm/mem64 | F2 0F 2C <i>/r</i> | Converts scalar double-precision floating-point value in an XMM register or 64-bit memory location to a doubleword signed integer value in a general-purpose register. Inexact results are truncated. |
| CVTTSD2SI reg64, xmm/mem64 | F2 0F 2C <i>/r</i> | Converts scalar double-precision floating-point value in an XMM register or 64-bit memory location to a quadword signed integer value in a general-purpose register. Inexact results are truncated.   |



If the result of the conversion is an inexact value, the value is truncated (rounded toward zero). If the floating-point value is a NaN or infinity, or if the result of the conversion falls outside the signed doubleword range ($-2^{31}$ to $+2^{31}-1$) or the signed quadword range ($-2^{63}$ to $+2^{63}-1$), as determined by the destination size, the instruction returns the indefinite integer value (8000\_0000h for 32-bit integers, 8000\_0000\_0000\_0000h for 64-bit integers) when the invalid-operation exception (IE) is masked.
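The dependence of the valid range and the indefinite value on the destination size can be illustrated with a short model. It is a sketch under stated assumptions: `operand_bits` is a hypothetical parameter standing in for the width of the destination general-purpose register, `as_hex` is a hypothetical display helper, and only the masked-IE result is modeled.

```python
import math


def cvttsd2si(value: float, operand_bits: int = 32) -> int:
    """Truncate a double to a signed 32-bit or 64-bit integer (masked-IE model)."""
    if operand_bits not in (32, 64):
        raise ValueError("destination must be 32 or 64 bits wide")
    lowest = -(1 << (operand_bits - 1))
    highest = (1 << (operand_bits - 1)) - 1
    indefinite = lowest  # 8000_0000h or 8000_0000_0000_0000h as a signed value
    if math.isnan(value) or math.isinf(value):
        return indefinite
    truncated = math.trunc(value)  # rounds toward zero
    return truncated if lowest <= truncated <= highest else indefinite


def as_hex(result: int, operand_bits: int) -> str:
    """Show the two's-complement bit pattern of a result."""
    width = operand_bits // 4
    return format(result & ((1 << operand_bits) - 1), "0{}X".format(width))
```

The same source value can therefore be valid for one destination size and invalid for the other: 3.0e9 fits a quadword but not a doubleword.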

### **Related Instructions**

CVTDQ2PD, CVTPD2DQ, CVTPD2PI, CVTPI2PD, CVTSD2SI, CVTSI2SD, CVTTPD2DQ, CVTTPD2PI

### **rFLAGS** Affected

None

### **MXCSR Flags Affected**

| FZ | RC | RC | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |    |    |    |    |    |    |    |    |     | M  |    |    |    |    | M  |
| 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

<b>Note:</b> A flag that can be set to one or zero is M (modified). Unaffected flags are blank.

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|-----------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD | X | X | X | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|  | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
|  | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
|  | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
|  |  |  | X | A null data segment was used to reference memory. |
| Page fault, #PF |  | X | X | A page fault resulted from the execution of the instruction. |
| Alignment check, #AC |  | X | X | An unaligned memory reference was performed while alignment checking was enabled. |
| SIMD Floating-Point<br>Exception, #XF | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|-----------|------|-----------------|-----------|--------------------|
| <b>SIMD Floating-Point Exceptions</b> |  |  |  |  |
| Invalid-operation<br>exception (IE) | X | X | X | A source operand was an SNaN value, a QNaN value, or ±infinity. |
|  | X | X | X | A source operand was too large to fit in the destination format. |
| Precision exception (PE) | X | X | X | A result could not be represented exactly in the destination format. |

## CVTTSS2SI Convert Scalar Single-Precision Floating-Point to Signed Doubleword or Quadword Integer, Truncated

Converts a single-precision floating-point value in the low-order 32 bits of an XMM register or a 32-bit memory location to a 32-bit or 64-bit signed integer value and writes the converted value in a general-purpose register.

| Mnemonic                   | Opcode             | Description                                                                                                                                                                                           |
|----------------------------|--------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| CVTTSS2SI reg32, xmm/mem32 | F3 0F 2C <i>/r</i> | Converts scalar single-precision floating-point value in an XMM register or 32-bit memory location to a signed doubleword integer value in a general-purpose register. Inexact results are truncated. |
| CVTTSS2SI reg64, xmm/mem32 | F3 0F 2C <i>/r</i> | Converts scalar single-precision floating-point value in an XMM register or 32-bit memory location to a signed quadword integer value in a general-purpose register. Inexact results are truncated.   |



If the result of the conversion is an inexact value, the value is truncated (rounded toward zero). If the floating-point value is a NaN or infinity, or if the result of the conversion falls outside the signed doubleword range ($-2^{31}$ to $+2^{31}-1$) or the signed quadword range ($-2^{63}$ to $+2^{63}-1$), as determined by the destination size, the instruction returns the indefinite integer value (8000\_0000h for 32-bit integers, 8000\_0000\_0000\_0000h for 64-bit integers) when the invalid-operation exception (IE) is masked.

### **Related Instructions**

CVTDQ2PS, CVTPI2PS, CVTPS2DQ, CVTPS2PI, CVTSI2SS, CVTSS2SI, CVTTPS2DQ, CVTTPS2PI

### **rFLAGS** Affected

None

### **MXCSR Flags Affected**

| FZ | RC | RC | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |    |    |    |    |    |    |    |    |     | M  |    |    |    |    | M  |
| 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

<b>Note:</b> A flag that can be set to one or zero is M (modified). Unaffected flags are blank.

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|-----------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD | X | X | X | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1. |
|  | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
|  | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
|  | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
|  |  |  | X | A null data segment was used to reference memory. |
| Page fault, #PF |  | X | X | A page fault resulted from the execution of the instruction. |
| Alignment check, #AC |  | X | X | An unaligned memory reference was performed while alignment checking was enabled. |
| SIMD Floating-Point<br>Exception, #XF | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|-----------|------|-----------------|-----------|--------------------|
| <b>SIMD Floating-Point Exceptions</b> |  |  |  |  |
| Invalid-operation<br>exception (IE) | X | X | X | A source operand was an SNaN value, a QNaN value, or ±infinity. |
|  | X | X | X | A source operand was too large to fit in the destination format. |
| Precision exception (PE) | X | X | X | A result could not be represented exactly in the destination format. |

## DIVPD

# **Divide Packed Double-Precision Floating-Point**

Divides each of the two packed double-precision floating-point values in the first source operand by the corresponding packed double-precision floating-point value in the second source operand and writes the result of each division in the corresponding quadword of the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.
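The lane-wise behavior described above can be modeled in a few lines of Python. This is an illustrative sketch rather than a definition of the instruction: the names `ieee_div` and `divpd` are invented for the example, each XMM register is represented as a tuple `(low, high)` of two Python floats, all SIMD floating-point exceptions are assumed to be masked so that default results are returned, and NaN payload propagation is not modeled.

```python
import math

def ieee_div(a: float, b: float) -> float:
    """Divide two doubles, returning IEEE 754 default results in the
    cases where Python's `/` would raise ZeroDivisionError."""
    if math.isnan(a) or math.isnan(b):
        return math.nan                      # a NaN operand gives a NaN result
    if b == 0.0:
        if a == 0.0:
            return math.nan                  # 0 / 0 is an invalid operation
        # Non-zero / 0 gives an infinity whose sign combines both operand signs.
        sign = math.copysign(1.0, a) * math.copysign(1.0, b)
        return math.copysign(math.inf, sign)
    return a / b                             # also covers inf / inf -> NaN

def divpd(xmm1, xmm2):
    """Hypothetical model of DIVPD: divide each quadword lane of xmm1
    by the same lane of xmm2 and return the new value of xmm1."""
    return tuple(ieee_div(x, y) for x, y in zip(xmm1, xmm2))
```

For example, `divpd((6.0, 1.0), (3.0, 4.0))` returns `(2.0, 0.25)`, with each lane computed independently of the other.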




### **Related Instructions**

DIVPS, DIVSD, DIVSS

### **rFLAGS** Affected

None

### **MXCSR Flags Affected**

| FZ | RC    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|-------|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |       |    |    |    |    |    |    |     | M  | M  | M  | M  | M  | M  |
| 15 | 14–13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

<i>Note:</i> A flag that can be set to one or zero is M (modified). Unaffected flags are blank.
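As a reading aid for the bit layout in the table above, the following sketch splits an MXCSR value into its named fields. The function name `decode_mxcsr` and the dictionary layout are assumptions made for this example; only the bit positions come from the table.

```python
# Bit positions copied from the MXCSR table above.
STATUS_FLAGS = {"IE": 0, "DE": 1, "ZE": 2, "OE": 3, "UE": 4, "PE": 5}
MASK_BITS = {"IM": 7, "DM": 8, "ZM": 9, "OM": 10, "UM": 11, "PM": 12}

def decode_mxcsr(mxcsr: int) -> dict:
    """Split the low 16 bits of an MXCSR value into named fields."""
    return {
        "flags": [name for name, bit in STATUS_FLAGS.items() if (mxcsr >> bit) & 1],
        "masks": [name for name, bit in MASK_BITS.items() if (mxcsr >> bit) & 1],
        "DAZ": bool((mxcsr >> 6) & 1),
        "RC": (mxcsr >> 13) & 0b11,      # two-bit field in bits 14-13
        "FZ": bool((mxcsr >> 15) & 1),
    }
```

With the value `0x1F80` (all six mask bits set and nothing else), `decode_mxcsr` reports no status flags and all six masks. A division that set ZE and PE would then appear as `["ZE", "PE"]` in the `flags` entry.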

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|  | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
|  | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
|  | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
|  |  |  | X | A null data segment was used to reference memory. |
|  | X | X | X | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF |  | X | X | A page fault resulted from the execution of the instruction. |
| SIMD Floating-Point<br>Exception, #XF | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| <i>SIMD Floating-Point Exceptions</i> |  |  |  |  |
| Invalid-operation<br>exception (IE) | X | X | X | A source operand was an SNaN value. |
|  | X | X | X | Zero was divided by zero. |
|  | X | X | X | ±infinity was divided by ±infinity. |
| Overflow exception (OE) | X | X | X | A rounded result was too large to fit into the format of the destination operand. |
| Underflow exception (UE) | X | X | X | A rounded result was too small to fit into the format of the destination operand. |
| Denormalized-operand<br>exception (DE) | X | X | X | A source operand was a denormal value. |
| Zero-divide exception (ZE) | X | X | X | A non-zero number was divided by zero. |
| Precision exception (PE) | X | X | X | A result could not be represented exactly in the destination format. |

# DIVPS Divide Packed Single-Precision Floating-Point

Divides each of the four packed single-precision floating-point values in the first source operand by the corresponding packed single-precision floating-point value in the second source operand and writes the result of each division in the corresponding doubleword of the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.

| Mnemonic                | Opcode  | Description                                                                                                                                                                       |
|-------------------------|---------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| DIVPS xmm1, xmm2/mem128 | 0F 5E /r | Divides packed single-precision floating-point values in an XMM register by the packed single-precision floating-point values in another XMM register or 128-bit memory location. |

Figure: DIVPS operation. Each of the four doublewords of xmm1 (bits 127–96, 95–64, 63–32, and 31–0) is divided by the corresponding doubleword of xmm2/mem128.
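The doubleword layout can be made concrete by treating each operand as a 16-byte little-endian image. The sketch below is a hypothetical model (the name `divps` is invented here) and assumes finite operands, non-zero divisors, and quotients that fit in single precision, so that neither Python's `/` nor `struct.pack` raises an error. Exception flags and rounding modes other than round-to-nearest are not modeled.

```python
import struct

def divps(xmm1: bytes, xmm2: bytes) -> bytes:
    """Hypothetical model of DIVPS on two 16-byte register images.

    Lane 0 is bits 31-0 (bytes 0-3); lane 3 is bits 127-96 (bytes 12-15).
    """
    a = struct.unpack("<4f", xmm1)
    b = struct.unpack("<4f", xmm2)
    # Packing with "f" rounds each double-precision quotient to single precision.
    return struct.pack("<4f", *(x / y for x, y in zip(a, b)))
```

For example, dividing the lanes `(8.0, 9.0, 1.0, -2.0)` by `(2.0, 3.0, 4.0, 0.5)` gives `(4.0, 3.0, 0.25, -4.0)`. A quotient such as 1/3 is stored as the nearest single-precision value, whose bit pattern is `3EAAAAABh`.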

### **Related Instructions**

DIVPD, DIVSD, DIVSS

### rFLAGS Affected

None

### **MXCSR Flags Affected**

| FZ | RC    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|-------|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |       |    |    |    |    |    |    |     | M  | M  | M  | M  | M  | M  |
| 15 | 14–13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

<i>Note:</i> A flag that can be set to one or zero is M (modified). Unaffected flags are blank.

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1. |
|  | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
|  | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
|  | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
|  |  |  | X | A null data segment was used to reference memory. |
|  | X | X | X | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF |  | X | X | A page fault resulted from the execution of the instruction. |
| SIMD Floating-Point<br>Exception, #XF | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| <i>SIMD Floating-Point Exceptions</i> |  |  |  |  |
| Invalid-operation<br>exception (IE) | X | X | X | A source operand was an SNaN value. |
|  | X | X | X | Zero was divided by zero. |
|  | X | X | X | ±infinity was divided by ±infinity. |
| Overflow exception (OE) | X | X | X | A rounded result was too large to fit into the format of the destination operand. |
| Underflow exception (UE) | X | X | X | A rounded result was too small to fit into the format of the destination operand. |
| Denormalized-operand<br>exception (DE) | X | X | X | A source operand was a denormal value. |
| Zero-divide exception (ZE) | X | X | X | A non-zero number was divided by zero. |
| Precision exception (PE) | X | X | X | A result could not be represented exactly in the destination format. |

## DIVSD

# **Divide Scalar Double-Precision Floating-Point**

Divides the double-precision floating-point value in the low-order quadword of the first source operand by the double-precision floating-point value in the low-order quadword of the second source operand and writes the result in the low-order quadword of the destination (first source). The high-order quadword of the destination is not modified. The first source/destination operand is an XMM register. The second source operand is another XMM register or 64-bit memory location.

| Mnemonic               | Opcode             | Description                                                                                                                                                                                       |
|------------------------|--------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| DIVSD xmm1, xmm2/mem64 | F2 0F 5E <i>/r</i> | Divides low-order double-precision floating-point value in an XMM register by the low-order double-precision floating-point value in another XMM register or in a 64-bit memory location. |
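The scalar form differs from the packed form in that only the low-order quadword takes part in the division. A minimal sketch of that behavior follows, with the hypothetical function `divsd` operating on little-endian byte images. It assumes a non-zero divisor and does not model exception flags.

```python
import struct

def divsd(xmm1: bytes, src: bytes) -> bytes:
    """Hypothetical model of DIVSD.

    `xmm1` is a 16-byte register image. `src` is either a 16-byte register
    image (xmm2) or an 8-byte memory operand (mem64). Only the low
    quadword of each operand is read.
    """
    if len(xmm1) != 16 or len(src) not in (8, 16):
        raise ValueError("expected a 16-byte destination and an 8- or 16-byte source")
    (a,) = struct.unpack_from("<d", xmm1, 0)
    (b,) = struct.unpack_from("<d", src, 0)
    # The high-order quadword of the destination is copied through unchanged.
    return struct.pack("<d", a / b) + xmm1[8:]
```

With a destination holding `(10.0, 7.0)` and a source whose low quadword is `4.0`, the result holds `(2.5, 7.0)` whether the source is a register image or an 8-byte memory operand.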



### **Related Instructions**

DIVPD, DIVPS, DIVSS

### **rFLAGS** Affected

None

### **MXCSR Flags Affected**

| FZ | RC    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|-------|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |       |    |    |    |    |    |    |     | M  | M  | M  | M  | M  | M  |
| 15 | 14–13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

<i>Note:</i> A flag that can be set to one or zero is M (modified). Unaffected flags are blank.

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|  | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
|  | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
|  | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
|  |  |  | X | A null data segment was used to reference memory. |
| Page fault, #PF |  | X | X | A page fault resulted from the execution of the instruction. |
| Alignment check, #AC |  | X | X | An unaligned memory reference was performed while alignment checking was enabled. |
| SIMD Floating-Point<br>Exception, #XF | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| <i>SIMD Floating-Point Exceptions</i> |  |  |  |  |
| Invalid-operation<br>exception (IE) | X | X | X | A source operand was an SNaN value. |
|  | X | X | X | Zero was divided by zero. |
|  | X | X | X | ±infinity was divided by ±infinity. |
| Overflow exception (OE) | X | X | X | A rounded result was too large to fit into the format of the destination operand. |
| Underflow exception (UE) | X | X | X | A rounded result was too small to fit into the format of the destination operand. |
| Denormalized-operand<br>exception (DE) | X | X | X | A source operand was a denormal value. |
| Zero-divide exception (ZE) | X | X | X | A non-zero number was divided by zero. |
| Precision exception (PE) | X | X | X | A result could not be represented exactly in the destination format. |
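The first rows of the table above amount to a short decision procedure that runs before any division takes place. The sketch below models it for an SSE2 instruction such as DIVSD. Several details are assumptions not stated in this section: the function name `sse2_fault`, the use of the EDX result of CPUID function 1 for the feature bit, the positions of EM and TS in CR0 (bits 2 and 3) and of OSFXSR in CR4 (bit 9), and the choice to test the #UD conditions before #NM.

```python
CPUID_FN1_EDX_SSE2 = 1 << 26   # feature bit named in the table (EDX is assumed)
CR0_EM = 1 << 2                # assumed bit position of CR0.EM
CR0_TS = 1 << 3                # assumed bit position of CR0.TS
CR4_OSFXSR = 1 << 9            # assumed bit position of CR4.OSFXSR

def sse2_fault(cpuid_fn1_edx: int, cr0: int, cr4: int):
    """Return "#UD", "#NM", or None for an SSE2 instruction, using only
    the feature-bit and control-register causes listed in the table."""
    if not cpuid_fn1_edx & CPUID_FN1_EDX_SSE2:
        return "#UD"           # SSE2 instructions not supported
    if cr0 & CR0_EM:
        return "#UD"           # emulate bit set
    if not cr4 & CR4_OSFXSR:
        return "#UD"           # operating system has not enabled FXSAVE/FXRSTOR support
    if cr0 & CR0_TS:
        return "#NM"           # task-switch bit set
    return None
```

Memory-related causes (#SS, #GP, #PF, #AC) and the SIMD floating-point exceptions depend on the operand address and values, so they are outside the scope of this sketch.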

# DIVSS

# **Divide Scalar Single-Precision Floating-Point**

Divides the single-precision floating-point value in the low-order doubleword of the first source operand by the single-precision floating-point value in the low-order doubleword of the second source operand and writes the result in the low-order doubleword of the destination (first source). The three high-order doublewords of the destination are not modified. The first source/destination operand is an XMM register. The second source operand is another XMM register or 32-bit memory location.

| Mnemonic               | Opcode             | Description                                                                                                                                                                               |
|------------------------|--------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| DIVSS xmm1, xmm2/mem32 | F3 0F 5E <i>/r</i> | Divides low-order single-precision floating-point value in an XMM register by the low-order single-precision floating-point value in another XMM register or in a 32-bit memory location. |



### **Related Instructions**

DIVPD, DIVPS, DIVSD

### **rFLAGS** Affected

None

### **MXCSR Flags Affected**

| FZ | RC    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|-------|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |       |    |    |    |    |    |    |     | M  | M  | M  | M  | M  | M  |
| 15 | 14–13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

<i>Note:</i> A flag that can be set to one or zero is M (modified). Unaffected flags are blank.

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1. |
|  | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
|  | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
|  | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
|  |  |  | X | A null data segment was used to reference memory. |
| Page fault, #PF |  | X | X | A page fault resulted from the execution of the instruction. |
| Alignment check, #AC |  | X | X | An unaligned memory reference was performed while alignment checking was enabled. |
| SIMD Floating-Point<br>Exception, #XF | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| <i>SIMD Floating-Point Exceptions</i> |  |  |  |  |
| Invalid-operation<br>exception (IE) | X | X | X | A source operand was an SNaN value. |
|  | X | X | X | Zero was divided by zero. |
|  | X | X | X | ±infinity was divided by ±infinity. |
| Overflow exception (OE) | X | X | X | A rounded result was too large to fit into the format of the destination operand. |
| Underflow exception (UE) | X | X | X | A rounded result was too small to fit into the format of the destination operand. |
| Denormalized-operand<br>exception (DE) | X | X | X | A source operand was a denormal value. |
| Zero-divide exception (ZE) | X | X | X | A non-zero number was divided by zero. |
| Precision exception (PE) | X | X | X | A result could not be represented exactly in the destination format. |

# **FXRSTOR Restore XMM, MMX<sup>™</sup>, and x87 State**

Restores the XMM, MMX, and x87 state. The data loaded from memory is the state information previously saved using the FXSAVE instruction. Restoring data with FXRSTOR that had been previously saved with an FSAVE (rather than FXSAVE) instruction results in an incorrect restoration.

If FXRSTOR results in set exception flags in the loaded x87 status word register, and these exceptions are unmasked in the x87 control word register, a floating-point exception occurs when the next floating-point instruction is executed (except for the no-wait floating-point instructions).

If the restored MXCSR register contains a set bit in an exception status flag, and the corresponding exception mask bit is cleared (indicating an unmasked exception), loading the MXCSR register from memory does not cause a SIMD floating-point exception (#XF).
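Although FXRSTOR itself does not raise #XF in this situation, software may still want to know which exceptions a saved MXCSR image marks as both flagged and unmasked. The following sketch shows one way to compute that set. The function name `pending_unmasked` is hypothetical, and the bit positions are those used in the MXCSR tables of this volume: status flags in bits 5–0 and their mask bits, in the same order, in bits 12–7.

```python
def pending_unmasked(mxcsr: int) -> int:
    """Return a 6-bit value (PE, UE, OE, ZE, DE, IE in bits 5-0) whose set
    bits are exception flags that are set while their mask bit is clear."""
    flags = mxcsr & 0x3F            # IE, DE, ZE, OE, UE, PE status flags
    masks = (mxcsr >> 7) & 0x3F     # IM, DM, ZM, OM, UM, PM mask bits
    return flags & ~masks & 0x3F
```

For instance, the value `0x1F81` has IE set but masked, so the function returns 0. With IM cleared (`0x1F01`), the same flag is reported as bit 0 of the result.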

FXRSTOR does not restore the x87 error pointers (last instruction pointer, last data pointer, and last opcode), except in the relatively rare cases in which the exception-summary (ES) bit in the x87 status word is set to 1, indicating that an unmasked x87 exception has occurred.

The architecture supports two memory formats for FXRSTOR, a 512-byte 32-bit legacy format and a 512-byte 64-bit format. Selection of the 32-bit or 64-bit format is accomplished by using the corresponding effective operand size in the FXRSTOR instruction. If software running in 64-bit mode executes an FXRSTOR with a 32-bit operand size (no REX-prefix operand-size override), the 32-bit legacy format is used. If software running in 64-bit mode executes an FXRSTOR with a 64-bit operand size (requires REX-prefix operand-size override), the 64-bit format is used. For details about the memory image restored by FXRSTOR, see "Saving Media and x87 Processor State" in Volume 2.
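The format selection rule can be summarized in code. The sketch below is illustrative only. It assumes that the REX-prefix operand-size override mentioned above corresponds to a REX prefix byte of `48h` (REX.W = 1) and that the memory operand is addressed through rAX, which gives a ModRM byte of `08h` for the `/1` opcode extension. Both function names are invented for the example.

```python
def fxrstor_image_format(in_64bit_mode: bool, rex_w: bool) -> str:
    """Pick the memory image format from the effective operand size."""
    # A REX prefix is only available in 64-bit mode.
    if in_64bit_mode and rex_w:
        return "64-bit"
    return "32-bit legacy"

def encode_fxrstor_rax(rex_w: bool) -> bytes:
    """Encode FXRSTOR with a memory operand addressed by rAX (assumed form)."""
    modrm = (0b00 << 6) | (1 << 3) | 0b000   # mod=00, reg=/1, r/m=rAX
    prefix = b"\x48" if rex_w else b""       # 48h is a REX prefix with W=1
    return prefix + bytes([0x0F, 0xAE, modrm])
```

Under these assumptions, `encode_fxrstor_rax(False)` yields the bytes `0F AE 08` and selects the 32-bit legacy format even in 64-bit mode, while `encode_fxrstor_rax(True)` yields `48 0F AE 08` and selects the 64-bit format.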

If the operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0, the saved image of XMM0–XMM7 and MXCSR is not loaded into the processor. A general-protection exception occurs if there is an attempt to load a non-zero value to the bits in MXCSR that are defined as reserved (bits 31–16).

| Mnemonic          | Opcode   | Description                                                                   |
|-------------------|----------|-------------------------------------------------------------------------------|
| FXRSTOR mem512env | 0F AE /1 | Restores XMM, MMX<sup>™</sup>, and x87 state from 512-byte memory location.    |


### **Related Instructions**

FWAIT, FXSAVE

### rFLAGS Affected

None

### **MXCSR Flags Affected**

| FZ | RC    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|-------|----|----|----|----|----|----|-----|----|----|----|----|----|----|
| M  | M     | M  | M  | M  | M  | M  | M  | M   | M  | M  | M  | M  | M  | M  |
| 15 | 14–13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

<i>Note:</i> A flag that can be set to one or zero is M (modified). Unaffected flags are blank.

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception |
|---------------------------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD       | X    | X               | X         | The FXSAVE/FXRSTOR instructions are not supported, as indicated by bit 24 of CPUID standard function 1 or extended function 8000_0001. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1. |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical. |
|                           |      |                 | X         | A null data segment was used to reference memory. |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary. |
|                           | X    | X               | X         | Ones were written to the reserved bits in MXCSR. |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction. |

# **FXSAVE** Save XMM, MMX<sup>™</sup>, and x87 State

Saves the XMM, MMX, and x87 state. A memory location that is not aligned on a 16-byte boundary causes a general-protection exception.
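
One way to satisfy the alignment requirement in C is to give the save area a type whose alignment is fixed at 16 bytes. The following is a minimal sketch using the C11 `_Alignas` specifier. The type name, macro names, and helper function are hypothetical, and the code only prepares and checks the buffer; it does not execute FXSAVE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FX_AREA_SIZE  512u  /* size of the FXSAVE/FXRSTOR memory image */
#define FX_AREA_ALIGN 16u   /* required alignment of the memory operand */

/* Hypothetical wrapper type: the alignment specifier makes every object
 * of this type start on a 16-byte boundary. */
typedef struct {
    _Alignas(16) unsigned char bytes[FX_AREA_SIZE];
} fx_save_area;

/* Returns true if p satisfies the 16-byte alignment requirement. */
bool fx_is_aligned(const void *p)
{
    assert(p != NULL);
    return ((uintptr_t)p % FX_AREA_ALIGN) == 0;
}
```

Declaring a variable of type `fx_save_area` (on the stack or statically) yields a suitably aligned buffer. Memory obtained from a general-purpose allocator would still need to be checked or allocated with an aligned-allocation routine.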

Unlike FSAVE and FNSAVE, FXSAVE does not alter the x87 tag bits. The contents of the saved MMX/x87 data registers are retained, thus indicating that the registers may be valid (or whatever other value the x87 tag bits indicated prior to the save). To invalidate the contents of the MMX/x87 data registers after FXSAVE, software must execute an FINIT instruction. Also, FXSAVE (like FNSAVE) does not check for pending unmasked x87 floating-point exceptions. An FWAIT instruction can be used for this purpose.

FXSAVE does not save the x87 pointer registers (last instruction pointer, last data pointer, and last opcode), except in the relatively rare cases in which the exception-summary (ES) bit in the x87 status word is set to 1, indicating that an unmasked x87 exception has occurred.

The architecture supports two memory formats for FXSAVE, a 512-byte 32-bit legacy format and a 512-byte 64-bit format. Selection of the 32-bit or 64-bit format is accomplished by using the corresponding effective operand size in the FXSAVE instruction. If software running in 64-bit mode executes an FXSAVE with a 32-bit operand size (no REX-prefix operand-size override), the 32-bit legacy format is used. If software running in 64-bit mode executes an FXSAVE with a 64-bit operand size (requires REX-prefix operand-size override), the 64-bit format is used. For details about the memory image saved by FXSAVE, see "Saving Media and x87 Processor State" in Volume 2.

If the operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0, FXSAVE does not save the image of XMM0–XMM7 or MXCSR. For details about the CR4.OSFXSR bit, see "FXSAVE/FXRSTOR Support (OSFXSR) Bit" in Volume 2.

| Mnemonic         | Opcode   | Description                                                 |
|------------------|----------|-------------------------------------------------------------|
| FXSAVE mem512env | 0F AE /0 | Saves XMM, MMX™, and x87 state to 512-byte memory location. |

### **Related Instructions**

FINIT, FNSAVE, FRSTOR, FSAVE, FXRSTOR, LDMXCSR, STMXCSR


### rFLAGS Affected

None

### **MXCSR Flags Affected**

None

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception |
|---------------------------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD       | X    | X               | X         | The FXSAVE/FXRSTOR instructions are not supported, as indicated by bit 24 of CPUID standard function 1 or extended function 8000_0001. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1. |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical. |
|                           |      |                 | X         | A null data segment was used to reference memory. |
|                           |      |                 | X         | The destination operand was in a non-writable segment. |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction. |

# LDMXCSR Load MXCSR Control/Status Register

Loads the MXCSR register with a 32-bit value from memory. The least-significant bit of the memory location is loaded in bit 0 of MXCSR. Bits 31–16 of the MXCSR are reserved and must be zero. A general-protection exception occurs if the LDMXCSR instruction attempts to load non-zero values into MXCSR bits 31–16.

The MXCSR register is described in "Registers" in Volume 1.
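
Software that builds an MXCSR value before loading it can avoid the general-protection exception by checking the reserved bits first. The following is a minimal sketch of such a check, assuming only what is stated above (bits 31–16 are reserved and must be zero). The macro and function names are hypothetical, and the code does not execute LDMXCSR itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MXCSR_RESERVED_MASK 0xFFFF0000u  /* bits 31-16 must be zero */

/* Returns true if loading `value` would not write ones to reserved bits. */
bool mxcsr_is_loadable(uint32_t value)
{
    return (value & MXCSR_RESERVED_MASK) == 0;
}

/* Returns `value` with the reserved bits cleared, leaving bits 15-0 intact. */
uint32_t mxcsr_sanitize(uint32_t value)
{
    uint32_t result = value & ~MXCSR_RESERVED_MASK;
    assert(mxcsr_is_loadable(result));
    return result;
}
```

Whether to reject a bad value or silently clear the reserved bits is a policy decision for the calling software; both helpers are shown so that either choice is possible.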

| Mnemonic      | Opcode  | Description                                       |
|---------------|---------|---------------------------------------------------|
| LDMXCSR mem32 | 0F AE /2 | Loads MXCSR register with 32-bit value in memory. |

### **Related Instructions**

STMXCSR

### **rFLAGS** Affected

None

### **MXCSR Flags Affected**

| FZ | RC    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|-------|----|----|----|----|----|----|-----|----|----|----|----|----|----|
| M  | M     | M  | M  | M  | M  | M  | M  | M   | M  | M  | M  | M  | M  | M  |
| 15 | 14–13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

<i>Note:</i> A flag that can be set to one or zero is M (modified). Unaffected flags are blank.


| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception |
|---------------------------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1. |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical. |
|                           |      |                 | X         | A null data segment was used to reference memory. |
|                           | X    | X               | X         | Ones were written to the reserved bits in MXCSR. |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction. |
| Alignment check, #AC      |      | X               | X         | An unaligned memory reference was performed while alignment checking was enabled. |

# MASKMOVDQU Masked Move Double Quadword Unaligned

Stores bytes from the first source operand as selected by the sign bits in the second source operand (sign-bit is 0 = no write and sign-bit is 1 = write) to a memory location specified in the DS:rDI registers. The first source operand is an XMM register, and the second source operand is another XMM register. The store address may be unaligned.

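| Mnemonic              | Opcode             | Description                                                                                  |
|-----------------------|--------------------|----------------------------------------------------------------------------------------------|
| MASKMOVDQU xmm1, xmm2 | 66 0F F7 <i>/r</i> | Store bytes from an XMM register selected by a mask value in another XMM register to DS:rDI. |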

A mask value of all 0s results in the following behavior:

- No data is written to memory.
- Code and data breakpoints are not guaranteed to be signaled in all implementations.
- Exceptions associated with memory addressing and page faults are not guaranteed to be signaled in all implementations.
- The protection features of memory regions mapped as UC or WP are not guaranteed to be enforced in all implementations.

MASKMOVDQU implicitly uses weakly-ordered, write-combining buffering for the data, as described in "Buffering and Combining Memory Writes" in Volume 2. For data that is shared by multiple processors, this instruction should be used together with a fence instruction in order to ensure data coherency (refer to "Cache and TLB Management" in Volume 2).
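
The byte-selection rule described at the start of this section can be illustrated with a short software model. This is a sketch of the data movement only: it does not model the write-combining or weak-ordering behavior, nor the implementation-dependent handling of an all-zero mask. The function and macro names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XMM_BYTES 16  /* width of an XMM register in bytes */

/* Software model of the MASKMOVDQU byte selection (not the instruction).
 * dst  : stands in for the memory location addressed by DS:rDI
 * src  : first source operand (data bytes)
 * mask : second source operand (sign bit of each byte selects the write) */
void maskmovdqu_model(uint8_t *dst,
                      const uint8_t src[XMM_BYTES],
                      const uint8_t mask[XMM_BYTES])
{
    assert(dst != NULL && src != NULL && mask != NULL);

    for (size_t i = 0; i < XMM_BYTES; i++) {
        if (mask[i] & 0x80u) {
            dst[i] = src[i];  /* sign bit is 1: write this byte */
        }
        /* sign bit is 0: the destination byte is left unchanged */
    }
}
```

Only bit 7 of each mask byte matters in this model, so a mask byte of 7Fh suppresses the write just as 00h does.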

#### **Related Instructions**

MASKMOVQ

#### **rFLAGS** Affected

None

#### **MXCSR Flags Affected**

None

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception |
|---------------------------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1. |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical. |
|                           |      |                 | X         | A null data segment was used to reference memory. |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction. |

# MAXPD Maximum Packed Double-Precision Floating-Point

Compares each of the two packed double-precision floating-point values in the first source operand with the corresponding packed double-precision floating-point value in the second source operand and writes the numerically greater of the two values for each comparison in the corresponding quadword of the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.

| Mnemonic                | Opcode             | Description |
|-------------------------|--------------------|-------------|
| MAXPD xmm1, xmm2/mem128 | 66 0F 5F <i>/r</i> | Compares two pairs of packed double-precision values in an XMM register and another XMM register or 128-bit memory location and writes the greater value of each comparison in the destination XMM register. |



If both source operands are equal to zero, the value in the second source operand is returned. If either operand is a NaN (SNaN or QNaN), and invalid-operation exceptions are masked, the second source operand is written to the destination.
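
The selection rules above (zeros and NaNs return the second source operand) can be made concrete with a small software model. This is a sketch of the result values only, assuming invalid-operation exceptions are masked. It does not model the MXCSR flag updates or denormal handling, and the function names are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Result for one double-precision element, following the rules above. */
static double max_element(double src1, double src2)
{
    if (isnan(src1) || isnan(src2)) {
        return src2;  /* either operand is a NaN: second source */
    }
    if (src1 == 0.0 && src2 == 0.0) {
        return src2;  /* both zero (any sign): second source */
    }
    return (src1 > src2) ? src1 : src2;
}

/* Model of MAXPD: dst_src1 is both the first source and the destination. */
void maxpd_model(double dst_src1[2], const double src2[2])
{
    assert(dst_src1 != NULL && src2 != NULL);

    for (size_t i = 0; i < 2; i++) {
        dst_src1[i] = max_element(dst_src1[i], src2[i]);
    }
}
```

One consequence visible in the model is that the operation is not commutative: swapping the operands can change the result when a NaN or a pair of differently signed zeros is involved.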

#### **Related Instructions**

MAXPS, MAXSD, MAXSS, MINPD, MINPS, MINSD, MINSS

#### **rFLAGS** Affected

None

#### **MXCSR Flags Affected**

| FZ | RC    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|-------|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |       |    |    |    |    |    |    |     |    |    |    |    | M  | M  |
| 15 | 14–13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

<i>Note:</i> A flag that can be set to one or zero is M (modified). Unaffected flags are blank.

| Exception                           | Real | Virtual<br>8086 | Protected | Cause of Exception |
|-------------------------------------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD                 | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                                     | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1. |
|                                     | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
|                                     | X    | X               | X         | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM           | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS                          | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP             | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical. |
|                                     |      |                 | X         | A null data segment was used to reference memory. |
|                                     | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF                     |      | X               | X         | A page fault resulted from the execution of the instruction. |
| SIMD Floating-Point Exception, #XF  | X    | X               | X         | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| **SIMD Floating-Point Exceptions**  |      |                 |           |                    |
| Invalid-operation exception (IE)    | X    | X               | X         | A source operand was an SNaN or QNaN value. |
| Denormalized-operand exception (DE) | X    | X               | X         | A source operand was a denormal value. |

# MAXPS Maximum Packed Single-Precision Floating-Point

Compares each of the four packed single-precision floating-point values in the first source operand with the corresponding packed single-precision floating-point value in the second source operand and writes the numerically greater of the two values for each comparison in the corresponding doubleword of the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.

| Mnemonic                | Opcode          | Description |
|-------------------------|-----------------|-------------|
| MAXPS xmm1, xmm2/mem128 | 0F 5F <i>/r</i> | Compares four pairs of packed single-precision values in an XMM register and another XMM register or 128-bit memory location and writes the greater value of each comparison in the destination XMM register. |



If both source operands are equal to zero, the value in the second source operand is returned. If either operand is a NaN (SNaN or QNaN), and invalid-operation exceptions are masked, the second source operand is written to the destination.


#### **Related Instructions**

MAXPD, MAXSD, MAXSS, MINPD, MINPS, MINSD, MINSS


#### **rFLAGS** Affected

None

#### **MXCSR Flags Affected**

| FZ | RC    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|-------|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |       |    |    |    |    |    |    |     |    |    |    |    | M  | M  |
| 15 | 14–13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

<i>Note:</i> A flag that can be set to one or zero is M (modified). Unaffected flags are blank.

| Exception                           | Real | Virtual<br>8086 | Protected | Cause of Exception |
|-------------------------------------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD                 | X    | X               | X         | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1. |
|                                     | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1. |
|                                     | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
|                                     | X    | X               | X         | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM           | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS                          | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP             | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical. |
|                                     |      |                 | X         | A null data segment was used to reference memory. |
|                                     | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF                     |      | X               | X         | A page fault resulted from the execution of the instruction. |
| SIMD Floating-Point Exception, #XF  | X    | X               | X         | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| **SIMD Floating-Point Exceptions**  |      |                 |           |                    |
| Invalid-operation exception (IE)    | X    | X               | X         | A source operand was an SNaN or QNaN value. |
| Denormalized-operand exception (DE) | X    | X               | X         | A source operand was a denormal value. |

# MAXSD Maximum Scalar Double-Precision Floating-Point

Compares the double-precision floating-point value in the low-order 64 bits of the first source operand with the double-precision floating-point value in the low-order 64 bits of the second source operand and writes the numerically greater of the two values in the low-order quadword of the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or a 64-bit memory location. The high-order quadword of the destination XMM register is not modified.

| Mnemonic               | Opcode             | Description                                                                                                                                                                                   |
|------------------------|--------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| MAXSD xmm1, xmm2/mem64 | F2 0F 5F <i>/r</i> | Compares scalar double-precision values in an XMM register and<br>another XMM register or 64-bit memory location and writes the<br>greater of the two values in the destination XMM register. |

If both source operands are equal to zero, the value in the second source operand is returned. If either operand is a NaN (SNaN or QNaN), and invalid-operation exceptions are masked, the second source operand is written to the destination.
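
The scalar form differs from the packed form mainly in what it leaves untouched. The following sketch models the destination update, assuming invalid-operation exceptions are masked. The `xmm_pd` type and the function name are hypothetical, with `q[0]` standing for bits 63–0 and `q[1]` for bits 127–64 of the register. MXCSR flag updates are not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of an XMM register as two doubles:
 * q[0] = low-order quadword, q[1] = high-order quadword. */
typedef struct {
    double q[2];
} xmm_pd;

/* Model of MAXSD: dst_src1 is the first source and the destination,
 * src2_low is the low-order quadword of the second source operand. */
void maxsd_model(xmm_pd *dst_src1, double src2_low)
{
    assert(dst_src1 != NULL);

    double src1_low = dst_src1->q[0];

    /* The comparison is false when either operand is a NaN and when both
     * operands are zero, so the second source is selected in those cases. */
    dst_src1->q[0] = (src1_low > src2_low) ? src1_low : src2_low;

    /* q[1] is deliberately not written: the high-order quadword of the
     * destination is not modified. */
}
```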

#### **Related Instructions**

MAXPD, MAXPS, MAXSS, MINPD, MINPS, MINSD, MINSS

#### **rFLAGS** Affected

None

#### **MXCSR Flags Affected**

| FZ | RC    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|-------|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |       |    |    |    |    |    |    |     |    |    |    |    | M  | M  |
| 15 | 14–13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

<i>Note:</i> A flag that can be set to one or zero is M (modified). Unaffected flags are blank.

| Exception                              | Real | Virtual<br>8086 | Protected    | Cause of Exception                                                                                                                                  |
|----------------------------------------|------|-----------------|--------------|-----------------------------------------------------------------------------------------------------------------------------------------------------|
| Invalid opcode, #UD                    | Х    | Х               | X            | The SSE2 instructions are not supported, as indicated by bit<br>26 of CPUID standard function 1.                                                    |
|                                        | Х    | Х               | Х            | The emulate bit (EM) of CR0 was set to 1.                                                                                                           |
|                                        | Х    | Х               | Х            | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.                                                                   |
|                                        | X    | Х               | x            | There was an unmasked SIMD floating-point exception<br>while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i> , below, for details. |
| Device not available, #NM              | Х    | Х               | Х            | The task-switch bit (TS) of CR0 was set to 1.                                                                                                       |
| Stack, #SS                             | Х    | Х               | X            | A memory address exceeded the stack segment limit or was non-canonical.                                                                             |
| General protection, #GP                | Х    | Х               | X            | A memory address exceeded a data segment limit or was non-canonical.                                                                                |
|                                        |      |                 | Х            | A null data segment was used to reference memory.                                                                                                   |
| Page fault, #PF                        |      | Х               | Х            | A page fault resulted from the execution of the instruction.                                                                                        |
| Alignment check, #AC                   |      | Х               | X            | An unaligned memory reference was performed while alignment checking was enabled.                                                                   |
| SIMD Floating-Point<br>Exception, #XF  | X    | Х               | X            | There was an unmasked SIMD floating-point exception<br>while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i> , below, for details. |
| SIMD Floating-Point Exceptions         |      |                 |              |                                                                                                                                                     |
| Invalid-operation<br>exception (IE)    | X    | Х               | X            | A source operand was an SNaN or QNaN value.                                                                                                         |
| Denormalized-operand<br>exception (DE) | X    | Х               | X            | A source operand was a denormal value.                                                                                                              |

## MAXSS Maximum Scalar Single-Precision Floating-Point

Compares the single-precision floating-point value in the low-order 32 bits of the first source operand with the single-precision floating-point value in the low-order 32 bits of the second source operand and writes the numerically greater of the two values in the low-order 32 bits of the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or a 32-bit memory location. The three high-order doublewords of the destination XMM register are not modified.



If both source operands are equal to zero, the value in the second source operand is returned. If either operand is a NaN (SNaN or QNaN), and invalid-operation exceptions are masked, the second source operand is written to the destination.

#### **Related Instructions**

MAXPD, MAXPS, MAXSD, MINPD, MINPS, MINSD, MINSS, PFMAX

#### **rFLAGS Affected**

None

#### **MXCSR Flags Affected**

| FZ | RC |    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |    |    |    |    |    |    |    |    |     |    |    |    |    | M  | M  |
| 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

<i>Note:</i> A flag that can be set to one or zero is M (modified). Unaffected flags are blank.
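The bit positions in the table above can be turned into a short decoding sketch. The constant and function names are hypothetical; the sketch relies only on the layout shown in the table, in which each mask bit (IM through PM, bits 7 to 12) sits seven positions above its status flag (IE through PE, bits 0 to 5).

```c
#include <stdint.h>

/* Bit positions copied from the MXCSR table; names are illustrative. */
enum {
    MXCSR_IE = 1u << 0,   /* invalid-operation flag       */
    MXCSR_DE = 1u << 1,   /* denormalized-operand flag    */
    MXCSR_IM = 1u << 7,   /* invalid-operation mask       */
    MXCSR_DM = 1u << 8    /* denormalized-operand mask    */
};

/* Given the exception conditions an instruction detected (as flag bits
   0..5) and the current MXCSR value, return the subset whose mask bit
   is clear, that is, the conditions that are unmasked. */
uint32_t unmasked_conditions(uint32_t detected_flags, uint32_t mxcsr)
{
    uint32_t masks = (mxcsr >> 7) & 0x3Fu;   /* IM..PM moved down to bits 0..5 */
    return detected_flags & 0x3Fu & ~masks;
}
```

With all six mask bits set (for example the value 0x1F80), the function returns zero for any input, which corresponds to every SIMD floating-point exception being masked.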

| Exception                              | Real | Virtual<br>8086 | Protected    | Cause of Exception                                                                                                                                  |
|----------------------------------------|------|-----------------|--------------|-----------------------------------------------------------------------------------------------------------------------------------------------------|
| Invalid opcode, #UD                    | Х    | Х               | X            | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1.                                                        |
|                                        | Х    | Х               | Х            | The emulate bit (EM) of CR0 was set to 1.                                                                                                           |
|                                        | X    | X               | X            | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.                                                                   |
|                                        | X    | X               | X            | There was an unmasked SIMD floating-point exception<br>while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM              | Х    | Х               | Х            | The task-switch bit (TS) of CR0 was set to 1.                                                                                                       |
| Stack, #SS                             | Х    | Х               | X            | A memory address exceeded the stack segment limit or was non-canonical.                                                                             |
| General protection, #GP                | X    | Х               | Х            | A memory address exceeded a data segment limit or was non-canonical.                                                                                |
|                                        |      |                 | Х            | A null data segment was used to reference memory.                                                                                                   |
| Page fault, #PF                        |      | Х               | Х            | A page fault resulted from the execution of the instruction.                                                                                        |
| Alignment check, #AC                   |      | Х               | X            | An unaligned memory reference was performed while alignment checking was enabled.                                                                   |
| SIMD Floating-Point<br>Exception, #XF  | X    | Х               | X            | There was an unmasked SIMD floating-point exception<br>while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i> , below, for details. |
| SIMD Floating-Point Exceptions         |      |                 |              |                                                                                                                                                     |
| Invalid-operation<br>exception (IE)    | Х    | Х               | X            | A source operand was an SNaN or QNaN value.                                                                                                         |
| Denormalized-operand<br>exception (DE) | Х    | Х               | X            | A source operand was a denormal value.                                                                                                              |
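The non-memory rows of the exception table can be read as a decision procedure. The sketch below is one possible reading, not a statement of hardware priority: the `sse_state` structure and all names are hypothetical, the order of the checks simply follows the row order of the table, and the memory-related exceptions (#SS, #GP, #PF, #AC) are left out.

```c
#include <stdbool.h>

typedef enum { EXC_NONE, EXC_UD, EXC_NM, EXC_XF } sse_exc;

/* Hypothetical snapshot of the state the table refers to. */
typedef struct {
    bool cpuid_sse;         /* bit 25 of CPUID standard function 1 */
    bool cr0_em;            /* CR0.EM */
    bool cr0_ts;            /* CR0.TS */
    bool cr4_osfxsr;        /* CR4.OSFXSR */
    bool cr4_osxmmexcpt;    /* CR4.OSXMMEXCPT */
    bool unmasked_simd_fp;  /* an unmasked IE or DE condition occurred */
} sse_state;

/* Returns the exception the table lists for MAXSS in the given state.
   Assumption: checks are made in table order. */
sse_exc maxss_exception(const sse_state *s)
{
    if (!s->cpuid_sse || s->cr0_em || !s->cr4_osfxsr)
        return EXC_UD;
    if (s->cr0_ts)
        return EXC_NM;
    if (s->unmasked_simd_fp)
        return s->cr4_osxmmexcpt ? EXC_XF : EXC_UD;
    return EXC_NONE;
}
```

The last branch captures the one non-obvious pairing in the table: the same unmasked floating-point condition is reported as #XF or as #UD depending on CR4.OSXMMEXCPT.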

## MINPD Minimum Packed Double-Precision Floating-Point

Compares each of the two packed double-precision floating-point values in the first source operand with the corresponding packed double-precision floating-point value in the second source operand and writes the numerically lesser of the two values for each comparison in the corresponding quadword of the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or a 128-bit memory location.



If both source operands are equal to zero, the value in the second source operand is returned. If either operand is a NaN (SNaN or QNaN), and invalid-operation exceptions are masked, the second source operand is written to the destination.

#### **Related Instructions**

MAXPD, MAXPS, MAXSD, MAXSS, MINPS, MINSD, MINSS

#### **rFLAGS Affected**

None

#### **MXCSR Flags Affected**

| FZ | RC |    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |    |    |    |    |    |    |    |    |     |    |    |    |    | M  | M  |
| 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

<i>Note:</i> A flag that can be set to one or zero is M (modified). Unaffected flags are blank.

| Exception                              | Real | Virtual<br>8086 | Protected    | Cause of Exception                                                                                                                                  |
|----------------------------------------|------|-----------------|--------------|-----------------------------------------------------------------------------------------------------------------------------------------------------|
| Invalid opcode, #UD                    | Х    | Х               | Х            | The SSE2 instructions are not supported, as indicated by bit<br>26 of CPUID standard function 1.                                                    |
|                                        | Х    | Х               | Х            | The emulate bit (EM) of CR0 was set to 1.                                                                                                           |
|                                        | Х    | Х               | Х            | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.                                                                   |
|                                        | X    | X               | X            | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details.    |
| Device not available, #NM              | Х    | Х               | Х            | The task-switch bit (TS) of CR0 was set to 1.                                                                                                       |
| Stack, #SS                             | Х    | Х               | X            | A memory address exceeded the stack segment limit or was non-canonical.                                                                             |
| General protection, #GP                | Х    | Х               | Х            | A memory address exceeded a data segment limit or was non-canonical.                                                                                |
|                                        |      |                 | Х            | A null data segment was used to reference memory.                                                                                                   |
|                                        | Х    | Х               | Х            | The memory operand was not aligned on a 16-byte boundary.                                                                                           |
| Page fault, #PF                        |      | Х               | Х            | A page fault resulted from the execution of the instruction.                                                                                        |
| SIMD Floating-Point<br>Exception, #XF  | Х    | Х               | Х            | There was an unmasked SIMD floating-point exception<br>while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i> , below, for details. |
| SIMD Floating-Point Exceptions         |      |                 |              |                                                                                                                                                     |
| Invalid-operation<br>exception (IE)    | X    | Х               | X            | A source operand was an SNaN or QNaN value.                                                                                                         |
| Denormalized-operand<br>exception (DE) | Х    | Х               | X            | A source operand was a denormal value.                                                                                                              |

## MINPS Minimum Packed Single-Precision Floating-Point

The MINPS instruction compares each of the four packed single-precision floating-point values in the first source operand with the corresponding packed single-precision floating-point value in the second source operand and writes the numerically lesser of the two values for each comparison in the corresponding doubleword of the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or a 128-bit memory location.

| Mnemonic                  | Opcode          | Description                                                                                                                                                                                                              |
|---------------------------|-----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| MINPS xmm1, xmm2/mem128   | 0F 5D <i>/r</i> | Compares four pairs of packed single-precision values in an XMM register and another XMM register or 128-bit memory location and writes the numerically lesser value of each comparison in the destination XMM register. |



If both source operands are equal to zero, the value in the second source operand is returned. If either operand is a NaN (SNaN or QNaN), and invalid-operation exceptions are masked, the second source operand is written to the destination.

#### **Related Instructions**

MAXPD, MAXPS, MAXSD, MAXSS, MINPD, MINSD, MINSS, PFMIN

#### **rFLAGS Affected**

None

#### **MXCSR Flags Affected**

| FZ | RC |    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |    |    |    |    |    |    |    |    |     |    |    |    |    | M  | M  |
| 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

<i>Note:</i> A flag that can be set to one or zero is M (modified). Unaffected flags are blank.

| Exception                             | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                                                                       |
|---------------------------------------|------|-----------------|-----------|------------------------------------------------------------------------------------------------------------------------------------------|
| Invalid opcode, #UD                   | Х    | Х       | Х         | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1.                                                     |
|                                       | Х    | Х       | Х         | The emulate bit (EM) of CR0 was set to 1.                                                                                                        |
|                                       | Х    | Х       | Х         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.                                                                |
|                                       | Х    | Х       | Х         | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i> , below, for details. |
| Device not available, #NM             | Х    | Х       | Х         | The task-switch bit (TS) of CR0 was set to 1.                                                                                                    |
| Stack, #SS                            | Х    | Х       | X         | A memory address exceeded the stack segment limit or was non-canonical.                                                                          |
| General protection, #GP               | Х    | Х       | Х         | A memory address exceeded a data segment limit or was non-canonical.                                                                             |
|                                       |      |         | Х         | A null data segment was used to reference memory.                                                                                                |
|                                       | Х    | Х       | Х         | The memory operand was not aligned on a 16-byte boundary.                                                                                        |
| Page fault, #PF                       |      | Х       | Х         | A page fault resulted from the execution of the instruction.                                                                                     |
| SIMD Floating-Point<br>Exception, #XF | X    | X       | X         | There was an unmasked SIMD floating-point exception<br>while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| SIMD Floating-Point Exceptions        |      |         |           |                                                                                                                                                  |
| Invalid-operation<br>exception (IE)   | X    | X       | X         | A source operand was an SNaN or QNaN value.                                                                                                      |
| Denormalized-operand<br>exception (DE) | X   | X       | X         | A source operand was a denormal value.                                                                                                           |

## MINSD Minimum Scalar Double-Precision Floating-Point

Compares the double-precision floating-point value in the low-order 64 bits of the first source operand with the double-precision floating-point value in the low-order 64 bits of the second source operand and writes the numerically lesser of the two values in the low-order 64 bits of the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or a 64-bit memory location. The high-order quadword of the destination XMM register is not modified.



If both source operands are equal to zero, the value in the second source operand is returned. If either operand is a NaN (SNaN or QNaN), and invalid-operation exceptions are masked, the second source operand is written to the destination.

#### **Related Instructions**

MAXPD, MAXPS, MAXSD, MAXSS, MINPD, MINPS, MINSS

#### **rFLAGS Affected**

None

#### **MXCSR Flags Affected**

| FZ | RC |    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |    |    |    |    |    |    |    |    |     |    |    |    |    | M  | M  |
| 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

<i>Note:</i> A flag that can be set to one or zero is M (modified). Unaffected flags are blank.

| Exception                              | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                                                                                 |
|----------------------------------------|------|-----------------|-----------|----------------------------------------------------------------------------------------------------------------------------------------------------|
| Invalid opcode, #UD                    | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1.                                                      |
|                                        | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                                                                          |
|                                        | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.                                                                  |
|                                        | X    | X               | X         | There was an unmasked SIMD floating-point exception<br>while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM              | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                                                                      |
| Stack, #SS                             | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                                                                            |
| General protection, #GP                | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                                                                               |
|                                        |      |                 | X         | A null data segment was used to reference memory.                                                                                                  |
| Page fault, #PF                        |      | X               | X         | A page fault resulted from the execution of the instruction.                                                                                       |
| Alignment check, #AC                   |      | X               | X         | An unaligned memory reference was performed while alignment checking was enabled.                                                                  |
| SIMD Floating-Point<br>Exception, #XF  | X    | X               | X         | There was an unmasked SIMD floating-point exception<br>while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| SIMD Floating-Point Exceptions         |      |                 |           |                                                                                                                                                    |
| Invalid-operation<br>exception (IE)    | X    | X               | X         | A source operand was an SNaN or QNaN value.                                                                                                        |
| Denormalized-operand<br>exception (DE) | X    | X               | X         | A source operand was a denormal value.                                                                                                             |

## MINSS Minimum Scalar Single-Precision Floating-Point

Compares the single-precision floating-point value in the low-order 32 bits of the first source operand with the single-precision floating-point value in the low-order 32 bits of the second source operand and writes the numerically lesser of the two values in the low-order 32 bits of the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or a 32-bit memory location. The three high-order doublewords of the destination XMM register are not modified.



If both source operands are equal to zero, the value in the second source operand is returned. If either operand is a NaN (SNaN or QNaN), and invalid-operation exceptions are masked, the second source operand is written to the destination.

#### **Related Instructions**

MAXPD, MAXPS, MAXSD, MAXSS, MINPD, MINPS, MINSD

#### **rFLAGS Affected**

None

#### **MXCSR Flags Affected**

| FZ | RC |    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |    |    |    |    |    |    |    |    |     |    |    |    |    | M  | M  |
| 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

<i>Note:</i> A flag that can be set to one or zero is M (modified). Unaffected flags are blank.

| Exception                              | Real | Virtual<br>8086 | Protected    | Cause of Exception                                                                                                                                  |
|----------------------------------------|------|-----------------|--------------|-----------------------------------------------------------------------------------------------------------------------------------------------------|
| Invalid opcode, #UD                    | Х    | Х               | X            | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1.                                                        |
|                                        | Х    | Х               | Х            | The emulate bit (EM) of CR0 was set to 1.                                                                                                           |
|                                        | Х    | Х               | Х            | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.                                                                   |
|                                        | X    | X               | X            | There was an unmasked SIMD floating-point exception<br>while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM              | Х    | Х               | Х            | The task-switch bit (TS) of CR0 was set to 1.                                                                                                       |
| Stack, #SS                             | Х    | Х               | X            | A memory address exceeded the stack segment limit or was non-canonical.                                                                             |
| General protection, #GP                | X    | Х               | X            | A memory address exceeded a data segment limit or was non-canonical.                                                                                |
|                                        |      |                 | Х            | A null data segment was used to reference memory.                                                                                                   |
| Page fault, #PF                        |      | Х               | Х            | A page fault resulted from the execution of the instruction.                                                                                        |
| Alignment check, #AC                   |      | Х               | X            | An unaligned memory reference was performed while alignment checking was enabled.                                                                   |
| SIMD Floating-Point<br>Exception, #XF  | X    | Х               | X            | There was an unmasked SIMD floating-point exception<br>while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i> , below, for details. |
| SIMD Floating-Point Exceptions         |      |                 |              |                                                                                                                                                     |
| Invalid-operation<br>exception (IE)    | X    | Х               | X            | A source operand was an SNaN or QNaN value.                                                                                                         |
| Denormalized-operand<br>exception (DE) | X    | Х               | X            | A source operand was a denormal value.                                                                                                              |
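The two SIMD floating-point conditions in the table depend only on the encoding of the source operands. As an illustration, the sketch below classifies a 32-bit single-precision bit pattern using the standard IEEE 754 field layout (8-bit exponent, 23-bit fraction, most significant fraction bit set for a QNaN). All names are hypothetical, and the sketch assumes the denormals-are-zeros (DAZ) control is not in effect.

```c
#include <stdint.h>

typedef enum { SP_OTHER, SP_DENORMAL, SP_QNAN, SP_SNAN } sp_class;

/* Classify a raw single-precision bit pattern. */
sp_class classify_single_bits(uint32_t bits)
{
    uint32_t exponent = (bits >> 23) & 0xFFu;
    uint32_t fraction = bits & 0x7FFFFFu;

    if (exponent == 0 && fraction != 0)
        return SP_DENORMAL;
    if (exponent == 0xFFu && fraction != 0)
        return (fraction & 0x400000u) ? SP_QNAN : SP_SNAN;
    return SP_OTHER;  /* zero, normal, or infinity */
}

/* Per the table: IE for an SNaN or QNaN source, DE for a denormal source. */
int source_sets_ie(uint32_t bits)
{
    sp_class c = classify_single_bits(bits);
    return c == SP_QNAN || c == SP_SNAN;
}

int source_sets_de(uint32_t bits)
{
    return classify_single_bits(bits) == SP_DENORMAL;
}
```

The functions take raw bit patterns rather than `float` arguments so that passing a signaling NaN through a floating-point register cannot alter it before classification.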

## MOVAPD Move Aligned Packed Double-Precision Floating-Point

Moves two packed double-precision floating-point values:

- from an XMM register or 128-bit memory location to another XMM register, or
- from an XMM register to another XMM register or 128-bit memory location.

| Mnemonic                 | Opcode             | Description                                                                                                            |
|--------------------------|--------------------|------------------------------------------------------------------------------------------------------------------------|
| MOVAPD xmm1, xmm2/mem128 | 66 0F 28 <i>/r</i> | Moves two packed double-precision floating-point values from an XMM register or 128-bit memory location to an XMM register. |
| MOVAPD xmm1/mem128, xmm2 | 66 0F 29 <i>/r</i> | Moves two packed double-precision floating-point values from an XMM register to an XMM register or 128-bit memory location. |





A memory operand that is not aligned on a 16-byte boundary causes a general-protection exception.
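The alignment requirement can be checked in software before an aligned move is attempted. The following is a minimal C sketch, not part of the instruction definition: `is_aligned16` and `copy_pd_aligned` are hypothetical helper names, and the SSE2 intrinsics `_mm_load_pd` and `_mm_store_pd` are assumed to be available through `<emmintrin.h>` on an x86 compiler. Compilers commonly lower these intrinsics to an aligned 128-bit move such as MOVAPD, but the exact instruction chosen is up to the compiler.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics; x86 targets only (assumption) */

/* Returns 1 when p lies on a 16-byte boundary, the condition an
   aligned 128-bit move requires of its memory operand. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Copies two doubles between buffers that must both be 16-byte aligned. */
static void copy_pd_aligned(double *dst, const double *src)
{
    assert(is_aligned16(dst) && is_aligned16(src));
    __m128d v = _mm_load_pd(src);  /* aligned 128-bit load  */
    _mm_store_pd(dst, v);          /* aligned 128-bit store */
}
```

A caller would typically declare its buffers with `_Alignas(16)` (C11) so that the precondition holds by construction.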


## **Related Instructions**

MOVHPD, MOVLPD, MOVMSKPD, MOVSD, MOVUPD

### rFLAGS Affected

None

### **MXCSR Flags Affected**

None

### **Exceptions**

| Exception                    | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                            |
|------------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD          | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                              | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                              | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.             |
| Device not available,<br>#NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |
| Stack, #SS                   | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                       |
| General protection, #GP      | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                          |
|                              |      |                 | X         | A null data segment was used to reference memory.                                             |
|                              |      |                 | X         | The destination operand was in a non-writable segment.                                        |
|                              | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                     |
| Page fault, #PF              |      | X               | X         | A page fault resulted from the execution of the instruction.                                  |

# MOVAPS Move Aligned Packed Single-Precision Floating-Point

Moves four packed single-precision floating-point values:

- from an XMM register or 128-bit memory location to another XMM register, or
- from an XMM register to another XMM register or 128-bit memory location.

| Mnemonic                 | Opcode          | Description                                                                                                                                       |
|--------------------------|-----------------|---------------------------------------------------------------------------------------------------------------------------------------------------|
| MOVAPS xmm1, xmm2/mem128 | 0F 28 <i>/r</i> | Moves four aligned packed single-precision floating-point values from an XMM register or 128-bit memory location to the destination XMM register. |
| MOVAPS xmm1/mem128, xmm2 | 0F 29 <i>/r</i> | Moves four aligned packed single-precision floating-point values from an XMM register to the destination XMM register or 128-bit memory location. |




A memory operand that is not aligned on a 16-byte boundary causes a general-protection exception.
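One way to satisfy the 16-byte requirement for heap data is to request aligned storage explicitly. The following is a minimal sketch, assuming a C11 library that provides `aligned_alloc`; `alloc_ps_vectors` is a hypothetical helper name, and a 4-byte `float` is assumed so that each group of four floats occupies exactly 16 bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocates storage for `count` groups of four floats, with every group
   starting on a 16-byte boundary. Returns NULL on failure; release the
   storage with free(). Overflow of the size computation is not checked. */
static float *alloc_ps_vectors(size_t count)
{
    /* C11 aligned_alloc expects the size to be a multiple of the alignment;
       count * 16 bytes satisfies that when sizeof(float) == 4 (assumed). */
    float *p = aligned_alloc(16, count * 4 * sizeof(float));
    assert(p == NULL || ((uintptr_t)p & 15u) == 0);
    return p;
}
```

Because each group is 16 bytes long, the address of every fourth element is also 16-byte aligned and is a valid memory operand for an aligned move.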

#### **Related Instructions**

MOVHLPS, MOVHPS, MOVLHPS, MOVLPS, MOVMSKPS, MOVSS, MOVUPS

#### **rFLAGS** Affected

None

### **MXCSR Flags Affected**

None

### **Exceptions**

| Exception                    | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                           |
|------------------------------|------|-----------------|-----------|----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD          | X    | X               | X         | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1. |
|                              | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                    |
|                              | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.            |
| Device not available,<br>#NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                |
| Stack, #SS                   | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                      |
| General protection, #GP      | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                         |
|                              |      |                 | X         | A null data segment was used to reference memory.                                            |
|                              |      |                 | X         | The destination operand was in a non-writable segment.                                       |
|                              | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                    |
| Page fault, #PF              |      | X               | X         | A page fault resulted from the execution of the instruction.                                 |

# MOVD Move Doubleword or Quadword

Moves a 32-bit or 64-bit value in one of the following ways:

- from a 32-bit or 64-bit general-purpose register or memory location to the low-order 32 or 64 bits of an XMM register, with zero-extension to 128 bits
- from the low-order 32 or 64 bits of an XMM register to a 32-bit or 64-bit general-purpose register or memory location
- from a 32-bit or 64-bit general-purpose register or memory location to the low-order 32 bits (with zero-extension to 64 bits) or the full 64 bits of an MMX register
- from the low-order 32 or the full 64 bits of an MMX register to a 32-bit or 64-bit general-purpose register or memory location

| Mnemonic                    | Opcode          | Description                                                                                         |
|-----------------------------|-----------------|-----------------------------------------------------------------------------------------------------|
| MOVD <i>xmm</i> , reg/mem32 | 66 0F 6E/r      | Move 32-bit value from a general-purpose register or 32-bit memory location to an XMM register.     |
| MOVD <i>xmm</i> , reg/mem64 | 66 0F 6E/r      | Move 64-bit value from a general-purpose register or 64-bit memory location to an XMM register.     |
| MOVD reg/mem32, xmm         | 66 0F 7E/r      | Move 32-bit value from an XMM register to a 32-bit general-<br>purpose register or memory location. |
| MOVD reg/mem64, xmm         | 66 0F 7E/r      | Move 64-bit value from an XMM register to a 64-bit general-<br>purpose register or memory location. |
| MOVD <i>mmx</i> , reg/mem32 | 0F 6E <i>/r</i> | Move 32-bit value from a general-purpose register or 32-bit memory location to an MMX register.     |
| MOVD <i>mmx</i> , reg/mem64 | 0F 6E <i>/r</i> | Move 64-bit value from a general-purpose register or 64-bit memory location to an MMX register.     |
| MOVD reg/mem32, mmx         | 0F 7E/r         | Move 32-bit value from an MMX register to a 32-bit general-<br>purpose register or memory location. |
| MOVD reg/mem64, mmx         | 0F 7E/r         | Move 64-bit value from an MMX register to a 64-bit general-<br>purpose register or memory location. |

The following diagrams illustrate the operation of the MOVD instruction.
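As a complement to the diagrams, the XMM forms can also be expressed as a small software model. This is a sketch for illustration only: `xmm_model` and the `movd_*` function names are hypothetical, and the model covers only the data movement and zero-extension described above, not operand decoding or exceptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an XMM register as two 64-bit halves:
   lo = bits 63:0, hi = bits 127:64. */
typedef struct { uint64_t lo, hi; } xmm_model;

/* MOVD xmm, reg/mem32: source lands in bits 31:0, bits 127:32 become 0. */
static void movd_xmm_from32(xmm_model *dst, uint32_t src)
{
    assert(dst != NULL);
    dst->lo = src;  /* conversion to uint64_t zero-extends */
    dst->hi = 0;
}

/* MOVD xmm, reg/mem64: source lands in bits 63:0, bits 127:64 become 0. */
static void movd_xmm_from64(xmm_model *dst, uint64_t src)
{
    assert(dst != NULL);
    dst->lo = src;
    dst->hi = 0;
}

/* MOVD reg/mem32, xmm: returns bits 31:0 of the register. */
static uint32_t movd_xmm_to32(const xmm_model *src)
{
    assert(src != NULL);
    return (uint32_t)src->lo;
}

/* MOVD reg/mem64, xmm: returns bits 63:0 of the register. */
static uint64_t movd_xmm_to64(const xmm_model *src)
{
    assert(src != NULL);
    return src->lo;
}
```

Note that a move into the register always clears the upper bits, so any previous contents of the destination are lost.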


## **Related Instructions**

MOVDQA, MOVDQU, MOVDQ2Q, MOVQ, MOVQ2DQ

### rFLAGS Affected

None

### **MXCSR Flags Affected**

None

### **Exceptions (All Modes)**

| Exception                                    | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                            |
|----------------------------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD                          | X    | X               | X         | The MMX instructions are not supported, as indicated by bit 23 of CPUID standard function 1.  |
|                                              | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                                              | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                                              | X    | X               | X         | The instruction used XMM registers while CR4.OSFXSR=0.                                        |
| Device not available,<br>#NM                 | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |
| Stack, #SS                                   | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                       |
| General protection, #GP                      | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                          |
| Page fault, #PF                              |      | X               | X         | A page fault resulted from the execution of the instruction.                                  |
| x87 floating-point<br>exception pending, #MF | X    | X               | X         | An x87 floating-point exception was pending and the instruction referenced an MMX register.   |
| Alignment check, #AC                         |      | X               | X         | An unaligned memory reference was performed while alignment checking was enabled.             |
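The feature bits cited in the #UD rows can be tested with simple masks once the result of CPUID standard function 1 is in hand. The sketch below is illustrative only: the macro and function names are hypothetical, and the tables name only the bit numbers, so the statement that these bits are reported in the EDX result is an assumption drawn from general x86 knowledge rather than from this entry. Bit 25 (SSE) is cited by other entries in this volume. Executing CPUID itself is compiler-specific and is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Feature bits of CPUID standard function 1 (assumed to be in EDX). */
#define CPUID1_EDX_MMX  (UINT32_C(1) << 23)
#define CPUID1_EDX_SSE  (UINT32_C(1) << 25)
#define CPUID1_EDX_SSE2 (UINT32_C(1) << 26)

/* Returns 1 when every bit in feature_mask is set in the EDX value. */
static int cpuid1_has(uint32_t edx, uint32_t feature_mask)
{
    assert(feature_mask != 0);
    return (edx & feature_mask) == feature_mask;
}
```

A caller would pass the EDX value obtained from its compiler's CPUID facility and check the relevant mask before using the corresponding instructions.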

# MOVDQ2Q Move Quadword to Quadword

Moves the low-order 64-bit value in an XMM register to a 64-bit MMX register.

| Mnemonic         | Opcode      | Description                                                                                     |
|------------------|-------------|-------------------------------------------------------------------------------------------------|
| MOVDQ2Q mmx, xmm | F2 0F D6 /r | Moves low-order 64-bit value from an XMM register to the destination MMX™ register.             |

#### **Related Instructions**

MOVD, MOVDQA, MOVDQU, MOVQ, MOVQ2DQ

### **rFLAGS** Affected

None

### **MXCSR Flags Affected**

None

### **Exceptions**

| Exception                                    | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                            |
|----------------------------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD                          | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                                              | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                                              | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.             |
| Device not available,<br>#NM                 | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |
| General protection, #GP                      |      |                 | X         | The destination operand was in a non-writable segment.                                        |
| x87 floating-point<br>exception pending, #MF | X    | X               | X         | An exception was pending due to an x87 floating-point instruction.                            |

# MOVDQA Move Aligned Double Quadword

Moves an aligned 128-bit (double quadword) value:

- from an XMM register or 128-bit memory location to another XMM register, or
- from an XMM register to a 128-bit memory location or another XMM register.

| Mnemonic                 | Opcode             | Description                                                                                          |
|--------------------------|--------------------|------------------------------------------------------------------------------------------------------|
| MOVDQA xmm1, xmm2/mem128 | 66 0F 6F <i>/r</i> | Moves 128-bit value from an XMM register or 128-bit memory location to the destination XMM register. |
| MOVDQA xmm1/mem128, xmm2 | 66 0F 7F/r         | Moves 128-bit value from an XMM register to the destination XMM register or 128-bit memory location. |





A memory operand that is not aligned on a 16-byte boundary causes a general-protection exception.

#### **Related Instructions**

MOVD, MOVDQU, MOVDQ2Q, MOVQ, MOVQ2DQ

### **rFLAGS** Affected

None

## **MXCSR Flags Affected**

None

### **Exceptions**

| Exception                    | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                            |
|------------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD          | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                              | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                              | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.             |
| Device not available,<br>#NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |
| Stack, #SS                   | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                       |
| General protection, #GP      | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                          |
|                              |      |                 | X         | A null data segment was used to reference memory.                                             |
|                              |      |                 | X         | The destination operand was in a non-writable segment.                                        |
|                              | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                     |
| Page fault, #PF              |      | X               | X         | A page fault resulted from the execution of the instruction.                                  |

# MOVDQU Move Unaligned Double Quadword

Moves an unaligned 128-bit (double quadword) value:

- from an XMM register or 128-bit memory location to another XMM register, or
- from an XMM register to another XMM register or 128-bit memory location.

| Mnemonic                 | Opcode             | Description                                                                                                    |
|--------------------------|--------------------|----------------------------------------------------------------------------------------------------------------|
| MOVDQU xmm1, xmm2/mem128 | F3 0F 6F <i>/r</i> | Moves 128-bit value from an XMM register or unaligned 128-bit memory location to the destination XMM register. |
| MOVDQU xmm1/mem128, xmm2 | F3 0F 7F <i>/r</i> | Moves 128-bit value from an XMM register to the destination XMM register or unaligned 128-bit memory location. |



Memory operands that are not aligned on a 16-byte boundary do not cause a general-protection exception.
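The practical consequence is that a 128-bit value can be moved to or from any byte address. The following minimal C sketch illustrates this with the SSE2 intrinsics `_mm_loadu_si128` and `_mm_storeu_si128`, assumed to be available through `<emmintrin.h>` on an x86 compiler; `copy16_unaligned` is a hypothetical helper name. Compilers commonly lower these intrinsics to an unaligned move such as MOVDQU, though the instruction actually emitted is the compiler's choice.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h>  /* SSE2 intrinsics; x86 targets only (assumption) */

/* Copies 16 bytes from src to dst. Neither pointer needs to be
   16-byte aligned. */
static void copy16_unaligned(void *dst, const void *src)
{
    assert(dst != NULL && src != NULL);
    __m128i v = _mm_loadu_si128((const __m128i *)src);  /* unaligned load  */
    _mm_storeu_si128((__m128i *)dst, v);                /* unaligned store */
}
```

The same call works on aligned addresses as well, which makes the unaligned form a reasonable default when the alignment of a buffer is not known.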

#### **Related Instructions**

MOVD, MOVDQA, MOVDQ2Q, MOVQ, MOVQ2DQ

#### **rFLAGS** Affected

None

## **MXCSR Flags Affected**

None

### **Exceptions**

| Exception                    | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                            |
|------------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD          | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                              | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                              | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.             |
| Device not available,<br>#NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |
| Stack, #SS                   | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                       |
| General protection, #GP      | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                          |
|                              |      |                 | X         | A null data segment was used to reference memory.                                             |
|                              |      |                 | X         | The destination operand was in a non-writable segment.                                        |
| Page fault, #PF              |      | X               | X         | A page fault resulted from the execution of the instruction.                                  |
| Alignment check, #AC         |      | X               | X         | An unaligned memory reference was performed while alignment checking was enabled.             |

# MOVHLPS Move Packed Single-Precision Floating-Point High to Low

Moves two packed single-precision floating-point values from the high-order 64 bits of an XMM register to the low-order 64 bits of another XMM register. The high-order 64 bits of the destination XMM register are not modified.
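The data movement described above can be written as a small software model. This is an illustrative sketch only: `xmm_ps` and `movhlps_model` are hypothetical names, and the model captures just which elements move and which are preserved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an XMM register holding four single-precision
   values; f[0] is bits 31:0 and f[3] is bits 127:96. */
typedef struct { float f[4]; } xmm_ps;

/* Models MOVHLPS dst, src: dst[63:0] = src[127:64]; dst[127:64] unchanged. */
static void movhlps_model(xmm_ps *dst, const xmm_ps *src)
{
    assert(dst != NULL && src != NULL);
    float hi0 = src->f[2];
    float hi1 = src->f[3];  /* read both first so dst == src also works */
    dst->f[0] = hi0;
    dst->f[1] = hi1;
}
```

With a destination of {1, 2, 3, 4} and a source of {5, 6, 7, 8} (lowest element first), the destination becomes {7, 8, 3, 4}.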





#### **Related Instructions**

MOVAPS, MOVHPS, MOVLHPS, MOVLPS, MOVMSKPS, MOVSS, MOVUPS

#### rFLAGS Affected

None

#### **MXCSR Flags Affected**

None

#### **Exceptions**

| Exception                    | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                           |
|------------------------------|------|-----------------|-----------|----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD          | X    | X               | X         | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1. |
|                              | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                    |
|                              | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.            |
| Device not available,<br>#NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                |

# MOVHPD Move High Packed Double-Precision Floating-Point

Moves a double-precision floating-point value:

- from a 64-bit memory location to the high-order 64 bits of an XMM register, or
- from the high-order 64 bits of an XMM register to a 64-bit memory location.

The low-order 64 bits of the destination XMM register are not modified.

| Mnemonic                  | Opcode             | Description                                                                                   |
|---------------------------|--------------------|-----------------------------------------------------------------------------------------------|
| MOVHPD xmm, mem64         | 66 0F 16 <i>/r</i> | Moves double-precision floating-point value from a 64-bit memory location to an XMM register. |
| MOVHPD <i>mem64</i> , xmm | 66 0F 17 <i>/r</i> | Moves double-precision floating-point value from an XMM register to a 64-bit memory location. |
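From C, both directions of this move are commonly reached through the SSE2 intrinsics `_mm_loadh_pd` and `_mm_storeh_pd`. The sketch below is a hedged illustration, assuming an x86 compiler that provides `<emmintrin.h>`; `load_high_pd` and `store_high_pd` are hypothetical helper names, the two-element array stands in for an XMM register, and the compiler is free to pick a different instruction than MOVHPD to implement the intrinsic.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h>  /* SSE2 intrinsics; x86 targets only (assumption) */

/* Replaces the high double (bits 127:64) of reg with *mem64 and keeps
   the low double (bits 63:0) unchanged. */
static void load_high_pd(double reg[2], const double *mem64)
{
    assert(reg != NULL && mem64 != NULL);
    __m128d v = _mm_loadu_pd(reg);  /* current register contents      */
    v = _mm_loadh_pd(v, mem64);     /* new high half, low half kept   */
    _mm_storeu_pd(reg, v);
}

/* Writes the high double (bits 127:64) of reg to *mem64. */
static void store_high_pd(double *mem64, const double reg[2])
{
    assert(mem64 != NULL && reg != NULL);
    _mm_storeh_pd(mem64, _mm_loadu_pd(reg));
}
```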





#### **Related Instructions**

MOVAPD, MOVLPD, MOVMSKPD, MOVSD, MOVUPD

#### **rFLAGS** Affected

None

#### **MXCSR Flags Affected**

None

#### **Exceptions**

| Exception                    | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                            |
|------------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD          | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                              | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                              | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.             |
| Device not available,<br>#NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |
| Stack, #SS                   | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                       |
| General protection, #GP      | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                          |
|                              |      |                 | X         | A null data segment was used to reference memory.                                             |
|                              |      |                 | X         | The destination operand was in a non-writable segment.                                        |
| Page fault, #PF              |      | X               | X         | A page fault resulted from the execution of the instruction.                                  |
| Alignment check, #AC         |      | X               | X         | An unaligned memory reference was performed while alignment checking was enabled.             |

# MOVHPS Move High Packed Single-Precision Floating-Point

Moves two packed single-precision floating-point values:

- from a 64-bit memory location to the high-order 64 bits of an XMM register, or
- from the high-order 64 bits of an XMM register to a 64-bit memory location.

The low-order 64 bits of the destination XMM register are not modified.

| Mnemonic          | Opcode          | Description                                                                                               |
|-------------------|-----------------|-----------------------------------------------------------------------------------------------------------|
| MOVHPS xmm, mem64 | 0F 16 <i>/r</i> | Moves two packed single-precision floating-point values from a 64-bit memory location to an XMM register. |
| MOVHPS mem64, xmm | 0F 17 <i>/r</i> | Moves two packed single-precision floating-point values from an XMM register to a 64-bit memory location. |





#### **Related Instructions**

MOVAPS, MOVHLPS, MOVLHPS, MOVLPS, MOVMSKPS, MOVSS, MOVUPS


### rFLAGS Affected

None

### **MXCSR Flags Affected**

None

### **Exceptions**

| Exception                    | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                           |
|------------------------------|------|-----------------|-----------|----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD          | X    | X               | X         | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1. |
|                              | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                    |
|                              | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.            |
| Device not available,<br>#NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                |
| Stack, #SS                   | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                      |
| General protection, #GP      | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                         |
|                              |      |                 | X         | A null data segment was used to reference memory.                                            |
|                              |      |                 | X         | The destination operand was in a non-writable segment.                                       |
| Page fault, #PF              |      | X               | X         | A page fault resulted from the execution of the instruction.                                 |
| Alignment check, #AC         |      | X               | X         | An unaligned memory reference was performed while alignment checking was enabled.            |

# MOVLHPS Move Packed Single-Precision Floating-Point Low to High

Moves two packed single-precision floating-point values from the low-order 64 bits of an XMM register to the high-order 64 bits of another XMM register. The low-order 64 bits of the destination XMM register are not modified.
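A common use of this move is to combine the low halves of two registers into one. The following minimal C sketch shows the effect through the SSE intrinsic `_mm_movelh_ps`, assumed to be available through `<xmmintrin.h>` on an x86 compiler; `movelh_demo` is a hypothetical helper name, and the arrays stand in for XMM registers with element 0 as the lowest 32 bits. Whether the compiler emits MOVLHPS or an equivalent instruction is its own choice.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics; x86 targets only (assumption) */

/* Computes out = { dst[0], dst[1], src[0], src[1] }: the low half of
   dst is preserved and the low half of src becomes the high half. */
static void movelh_demo(float out[4], const float dst[4], const float src[4])
{
    assert(out != NULL && dst != NULL && src != NULL);
    __m128 d = _mm_loadu_ps(dst);
    __m128 s = _mm_loadu_ps(src);
    _mm_storeu_ps(out, _mm_movelh_ps(d, s));
}
```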





#### **Related Instructions**

MOVAPS, MOVHLPS, MOVHPS, MOVLPS, MOVMSKPS, MOVSS, MOVUPS

#### rFLAGS Affected

None

#### **MXCSR Flags Affected**

None

#### **Exceptions**

| Exception                    | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                           |
|------------------------------|------|-----------------|-----------|----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD          | X    | X               | X         | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1. |
|                              | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                    |
|                              | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.            |
| Device not available,<br>#NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                |

# MOVLPD Move Low Packed Double-Precision Floating-Point

Moves a double-precision floating-point value:

- from a 64-bit memory location to the low-order 64 bits of an XMM register, or
- from the low-order 64 bits of an XMM register to a 64-bit memory location.

The high-order 64 bits of the destination XMM register are not modified.

| Mnemonic                  | Opcode             | Description                                                                                   |
|---------------------------|--------------------|-----------------------------------------------------------------------------------------------|
| MOVLPD xmm, mem64         | 66 0F 12 <i>/r</i> | Moves double-precision floating-point value from a 64-bit memory location to an XMM register. |
| MOVLPD <i>mem64</i>, xmm  | 66 0F 13 <i>/r</i> | Moves double-precision floating-point value from an XMM register to a 64-bit memory location. |



#### **Related Instructions**

MOVAPD, MOVHPD, MOVMSKPD, MOVSD, MOVUPD


### rFLAGS Affected

None

### **MXCSR Flags Affected**

None

### **Exceptions**

| Exception                    | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                            |
|------------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD          | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                              | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                              | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.             |
| Device not available,<br>#NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |
| Stack, #SS                   | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                       |
| General protection, #GP      | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                          |
|                              |      |                 | X         | A null data segment was used to reference memory.                                             |
|                              |      |                 | X         | The destination operand was in a non-writable segment.                                        |
| Page fault, #PF              |      | X               | X         | A page fault resulted from the execution of the instruction.                                  |
| Alignment check, #AC         |      | X               | X         | An unaligned memory reference was performed while alignment checking was enabled.             |

# MOVLPS Move Low Packed Single-Precision Floating-Point

Moves two packed single-precision floating-point values:

- from a 64-bit memory location to the low-order 64 bits of an XMM register, or
- from the low-order 64 bits of an XMM register to a 64-bit memory location

The high-order 64 bits of the destination XMM register are not modified.

| Mnemonic          | Opcode          | Description                                                                                               |
|-------------------|-----------------|-----------------------------------------------------------------------------------------------------------|
| MOVLPS xmm, mem64 | 0F 12 <i>/r</i> | Moves two packed single-precision floating-point values from a 64-bit memory location to an XMM register. |
| MOVLPS mem64, xmm | 0F 13 <i>/r</i> | Moves two packed single-precision floating-point values from an XMM register to a 64-bit memory location. |
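
As an illustration only, the following C sketch exercises both forms through the SSE intrinsics `_mm_loadl_pi` and `_mm_storel_pi`, which compilers commonly (but are not required to) translate to MOVLPS. The helper names `load_low_pair` and `store_low_pair` are hypothetical and are not defined by this manual.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Memory-to-register form: replace the two low floats of v with src[0..1].
   The two high floats of v are returned unchanged. */
static __m128 load_low_pair(__m128 v, const float src[2]) {
    return _mm_loadl_pi(v, (const __m64 *)src);
}

/* Register-to-memory form: write the two low floats of v to dst[0..1]. */
static void store_low_pair(float dst[2], __m128 v) {
    _mm_storel_pi((__m64 *)dst, v);
}
```

Loading `{1.0f, 2.0f}` into a register that holds `{10, 20, 30, 40}` (element 0 first) would leave `{1, 2, 30, 40}`, which matches the rule that the high-order 64 bits are not modified.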





#### **Related Instructions**

MOVAPS, MOVHLPS, MOVHPS, MOVLHPS, MOVMSKPS, MOVSS, MOVUPS


### rFLAGS Affected

None

### **MXCSR Flags Affected**

None

| Exception                    | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                                       |
|------------------------------|------|-----------------|-----------|----------------------------------------------------------------------------------------------------------|
| Invalid opcode, #UD          | X    | X               | X         | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1.             |
|                              | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                                |
|                              | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of the control register (CR4) was cleared to 0. |
| Device not available,<br>#NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                            |
| Stack, #SS                   | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                                  |
| General protection, #GP      | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                                     |
|                              |      |                 | X         | A null data segment was used to reference memory.                                                        |
|                              |      |                 | X         | The destination operand was in a non-writable segment.                                                   |
| Page fault, #PF              |      | X               | X         | A page fault resulted from the execution of the instruction.                                             |
| Alignment check, #AC         |      | X               | X         | An unaligned memory reference was performed while alignment checking was enabled.                        |

# MOVMSKPD Extract Packed Double-Precision Floating-Point Sign Mask

Moves the sign bits of two packed double-precision floating-point values in an XMM register to the two low-order bits of a 32-bit general-purpose register, with zero-extension.
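
A minimal sketch of the operation in C, assuming the SSE2 intrinsic `_mm_movemask_pd`, which compilers typically implement with MOVMSKPD. The wrapper name `sign_mask_pd` is hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Returns a mask whose bit 0 is the sign of e0 and whose bit 1 is the
   sign of e1. All higher bits of the result are zero. */
static int sign_mask_pd(double e0, double e1) {
    /* _mm_set_pd takes the high element first, then the low element. */
    __m128d v = _mm_set_pd(e1, e0);
    return _mm_movemask_pd(v);
}
```

Because only the sign bit is examined, a negative zero sets its mask bit just as any other negative value does.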





#### **Related Instructions**

MOVMSKPS, PMOVMSKB

### **rFLAGS** Affected

None

### **MXCSR Flags Affected**

None

| Exception (vector)           | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                            |
|------------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD          | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                              | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                              | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.             |
| Device not available,<br>#NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |

# MOVMSKPS Extract Packed Single-Precision Floating-Point Sign Mask

Moves the sign bits of four packed single-precision floating-point values in an XMM register to the four low-order bits of a 32-bit general-purpose register, with zero-extension.
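
One common use of the sign mask is to test several values at once. The sketch below assumes the SSE intrinsic `_mm_movemask_ps` (typically compiled to MOVMSKPS); the function name `sign_mask_ps` is hypothetical.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Returns a 4-bit mask: bit i is the sign bit of v[i], for i = 0..3.
   The array does not need to be 16-byte aligned. */
static int sign_mask_ps(const float v[4]) {
    return _mm_movemask_ps(_mm_loadu_ps(v));
}
```

A nonzero result indicates that at least one of the four elements has its sign bit set.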





#### **Related Instructions**

MOVMSKPD, PMOVMSKB

#### **rFLAGS** Affected

None

#### **MXCSR Flags Affected**

None


| Exception                    | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                           |
|------------------------------|------|-----------------|-----------|----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD          | X    | X               | X         | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1. |
|                              | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                    |
|                              | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.            |
| Device not available,<br>#NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                |

# MOVNTDQ Move Non-Temporal Double Quadword

Stores a 128-bit (double quadword) XMM register value into a 128-bit memory location. This instruction indicates to the processor that the data is non-temporal, and is unlikely to be used again soon. The processor treats the store as a write-combining (WC) memory write, which minimizes cache pollution. The exact method by which cache pollution is minimized depends on the hardware implementation of the instruction. For further information, see "Memory Optimization" in Volume 1.

MOVNTDQ is weakly-ordered with respect to other instructions that operate on memory. Software should use an SFENCE instruction to force strong memory ordering of MOVNTDQ with respect to other stores.
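
The store-then-fence pattern described above can be sketched in C as follows. This assumes the SSE2 intrinsic `_mm_stream_si128` (typically compiled to MOVNTDQ) and `_mm_sfence` for SFENCE. The function `stream_fill` and its parameters are hypothetical, chosen only for illustration.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also provides _mm_sfence */
#include <stddef.h>
#include <stdint.h>

/* Fills `count` 16-byte blocks starting at dst with a repeated 32-bit
   pattern, using non-temporal stores. dst must be 16-byte aligned;
   an unaligned address would raise a general-protection exception. */
static void stream_fill(void *dst, size_t count, int32_t pattern) {
    __m128i v = _mm_set1_epi32(pattern);
    __m128i *p = (__m128i *)dst;
    for (size_t i = 0; i < count; ++i) {
        _mm_stream_si128(p + i, v);
    }
    /* Order the weakly-ordered streaming stores before any later stores. */
    _mm_sfence();
}
```

Issuing a single fence after the loop, rather than one per store, leaves the processor free to combine the writes.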



#### **Related Instructions**

MOVNTI, MOVNTPD, MOVNTPS, MOVNTQ

#### **rFLAGS** Affected

None

#### **MXCSR Flags Affected**

None

| Exception                    | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                            |
|------------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD          | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                              | X    | X               | X         | The emulate bit (CR0.EM) was set to 1.                                                        |
|                              | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (CR4.OSFXSR) was cleared to 0.                |
| Device not available,<br>#NM | X    | X               | X         | The task-switch bit (CR0.TS) was set to 1.                                                    |
| Stack, #SS                   | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                       |
| General protection, #GP      | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                          |
|                              |      |                 | X         | A null data segment was used to reference memory.                                             |
|                              |      |                 | X         | The destination operand was in a non-writable segment.                                        |
|                              | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                     |
| Page fault, #PF              |      | X               | X         | A page fault resulted from executing the instruction.                                         |

# MOVNTPD Move Non-Temporal Packed Double-Precision Floating-Point

Stores two double-precision floating-point XMM register values into a 128-bit memory location. This instruction indicates to the processor that the data is non-temporal, and is unlikely to be used again soon. The processor treats the store as a write-combining (WC) memory write, which minimizes cache pollution. The exact method by which cache pollution is minimized depends on the hardware implementation of the instruction. For further information, see "Memory Optimization" in Volume 1.



MOVNTPD is weakly-ordered with respect to other instructions that operate on memory. Software should use an SFENCE instruction to force strong memory ordering of MOVNTPD with respect to other stores.
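
As a hedged illustration, a copy loop that writes its destination with non-temporal stores might look like the sketch below. It assumes the SSE2 intrinsic `_mm_stream_pd` (typically compiled to MOVNTPD); the function `stream_copy_pd` is hypothetical, and for brevity it ignores a trailing odd element.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also provides _mm_sfence */
#include <stddef.h>

/* Copies n doubles from src to dst two at a time. dst must be 16-byte
   aligned; src may have any alignment. If n is odd, the last element
   is not copied by this sketch. */
static void stream_copy_pd(double *dst, const double *src, size_t n) {
    for (size_t i = 0; i + 1 < n; i += 2) {
        _mm_stream_pd(dst + i, _mm_loadu_pd(src + i));
    }
    /* Make the streaming stores ordered with respect to later stores. */
    _mm_sfence();
}
```

This pattern is usually worthwhile only when the destination will not be read again soon, which is the non-temporal case the instruction is intended for.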

#### **Related Instructions**

MOVNTDQ, MOVNTI, MOVNTPS, MOVNTQ

#### **rFLAGS** Affected

None

#### **MXCSR Flags Affected**

None

| Exception                    | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                            |
|------------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD          | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                              | X    | X               | X         | The emulate bit (CR0.EM) was set to 1.                                                        |
|                              | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (CR4.OSFXSR) was cleared to 0.                |
| Device not available,<br>#NM | X    | X               | X         | The task-switch bit (CR0.TS) was set to 1.                                                    |
| Stack, #SS                   | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                       |
| General protection, #GP      | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                          |
|                              |      |                 | X         | A null data segment was used to reference memory.                                             |
|                              |      |                 | X         | The destination operand was in a non-writable segment.                                        |
|                              | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                     |
| Page fault, #PF              |      | X               | X         | A page fault resulted from executing the instruction.                                         |

# MOVNTPS Move Non-Temporal Packed Single-Precision Floating-Point

Stores four single-precision floating-point XMM register values into a 128-bit memory location. This instruction indicates to the processor that the data is non-temporal, and is unlikely to be used again soon. The processor treats the store as a write-combining (WC) memory write, which minimizes cache pollution. The exact method by which cache pollution is minimized depends on the hardware implementation of the instruction. For further information, see "Memory Optimization" in Volume 1.





MOVNTPS is weakly-ordered with respect to other instructions that operate on memory. Software should use an SFENCE instruction to force strong memory ordering of MOVNTPS with respect to other stores.

#### **Related Instructions**

MOVNTDQ, MOVNTI, MOVNTPD, MOVNTQ

#### rFLAGS Affected

None

#### **MXCSR Flags Affected**

None

| Exception                    | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                           |
|------------------------------|------|-----------------|-----------|----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD          | X    | X               | X         | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1. |
|                              | X    | X               | X         | The emulate bit (CR0.EM) was set to 1.                                                       |
|                              | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (CR4.OSFXSR) was cleared to 0.               |
| Device not available,<br>#NM | X    | X               | X         | The task-switch bit (CR0.TS) was set to 1.                                                   |
| Stack, #SS                   | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                      |
| General protection, #GP      | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                         |
|                              |      |                 | X         | A null data segment was used to reference memory.                                            |
|                              |      |                 | X         | The destination operand was in a non-writable segment.                                       |
|                              | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                    |
| Page fault, #PF              |      | X               | X         | A page fault resulted from executing the instruction.                                        |

# MOVQ Move Quadword

Moves a 64-bit value in one of the following ways:

- from the low-order 64 bits of an XMM register or a 64-bit memory location to the low-order 64 bits of another XMM register, with zero-extension to 128 bits, or
- from the low-order 64 bits of an XMM register to a 64-bit memory location, or to the low-order 64 bits of another XMM register with zero-extension to 128 bits

| Mnemonic              | Opcode      | Description                                                                    |
|-----------------------|-------------|--------------------------------------------------------------------------------|
| MOVQ xmm1, xmm2/mem64 | F3 0F 7E /r | Moves 64-bit value from an XMM register or memory location to an XMM register. |
| MOVQ xmm1/mem64, xmm2 | 66 0F D6 /r | Moves 64-bit value from an XMM register to an XMM register or memory location. |
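
The three data paths listed above can be sketched with SSE2 intrinsics that compilers typically map to MOVQ: `_mm_loadl_epi64`, `_mm_move_epi64`, and `_mm_storel_epi64`. The wrapper names below are hypothetical and serve only to label each path.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Memory source: low quadword loaded from *src, high quadword cleared. */
static __m128i load_low_qword(const uint64_t *src) {
    return _mm_loadl_epi64((const __m128i *)src);
}

/* Register source: low quadword of v kept, high quadword cleared. */
static __m128i keep_low_qword(__m128i v) {
    return _mm_move_epi64(v);
}

/* Memory destination: low quadword of v written to *dst. */
static void store_low_qword(uint64_t *dst, __m128i v) {
    _mm_storel_epi64((__m128i *)dst, v);
}
```

Note that both forms with a register destination clear bits 127:64, so MOVQ cannot be used to merge a new low quadword into an existing register value.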





#### **Related Instructions**

MOVD, MOVDQA, MOVDQU, MOVDQ2Q, MOVQ2DQ

#### rFLAGS Affected

None

#### **MXCSR Flags Affected**

None

| Exception                    | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                            |
|------------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD          | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                              | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                              | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.             |
| Device not available,<br>#NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |
| Stack, #SS                   | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                       |
| General protection, #GP      | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                          |
|                              |      |                 | X         | A null data segment was used to reference memory.                                             |
|                              |      |                 | X         | The destination operand was in a non-writable segment.                                        |
| Page fault, #PF              |      | X               | X         | A page fault resulted from the execution of the instruction.                                  |
| Alignment check, #AC         |      | X               | X         | An unaligned memory reference was performed while alignment checking was enabled.             |

# MOVQ2DQ Move Quadword to Quadword

Moves a 64-bit value from an MMX register to the low-order 64 bits of an XMM register, with zero-extension to 128 bits.



#### **Related Instructions**

MOVD, MOVDQA, MOVDQU, MOVDQ2Q, MOVQ

#### rFLAGS Affected

None

#### **MXCSR Flags Affected**

None

| Exception                                    | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                            |
|----------------------------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD                          | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                                              | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                                              | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.             |
| Device not available,<br>#NM                 | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |
| x87 floating-point<br>exception pending, #MF | X    | X               | X         | An exception was pending due to an x87 floating-point instruction.                            |

# MOVSD Move Scalar Double-Precision Floating-Point

Moves a scalar double-precision floating-point value:

- from the low-order 64 bits of an XMM register or a 64-bit memory location to the low-order 64 bits of another XMM register, or
- from the low-order 64 bits of an XMM register to the low-order 64 bits of another XMM register or a 64-bit memory location.

If the source operand is an XMM register, the high-order 64 bits of the destination XMM register are not modified. If the source operand is a memory location, the high-order 64 bits of the destination XMM register are cleared to all 0s.

| Mnemonic               | Opcode             | Description                                                                                                    |
|------------------------|--------------------|----------------------------------------------------------------------------------------------------------------|
| MOVSD xmm1, xmm2/mem64 | F2 0F 10 <i>/r</i> | Moves double-precision floating-point value from an XMM register or 64-bit memory location to an XMM register. |
| MOVSD xmm1/mem64, xmm2 | F2 0F 11 <i>/r</i> | Moves double-precision floating-point value from an XMM register to an XMM register or 64-bit memory location. |
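
The difference between the register-source and memory-source forms can be sketched with the SSE2 intrinsics `_mm_move_sd` and `_mm_load_sd`. These have the same result semantics as the two MOVSD load forms, although a compiler is free to choose a different encoding. The wrapper names are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Register source: the low double of src replaces the low double of
   dst; the high double of dst is not modified. */
static __m128d merge_low_sd(__m128d dst, __m128d src) {
    return _mm_move_sd(dst, src);
}

/* Memory source: the low double is loaded from *p and the high double
   is cleared to zero. */
static __m128d load_sd(const double *p) {
    return _mm_load_sd(p);
}
```

With `dst = {7.0, 8.0}` and `src = {1.0, 2.0}` (low element first), the merge yields `{1.0, 8.0}`, whereas loading `3.5` from memory yields `{3.5, 0.0}`.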



This MOVSD instruction should not be confused with the same-mnemonic MOVSD (move string doubleword) instruction in the general-purpose instruction set. Assemblers can distinguish the instructions by the number and type of operands.

#### **Related Instructions**

MOVAPD, MOVHPD, MOVLPD, MOVMSKPD, MOVUPD

#### **rFLAGS** Affected

None.

#### **MXCSR Flags Affected**

None.

| Exception                    | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                            |
|------------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD          | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                              | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                              | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.             |
| Device not available,<br>#NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |
| Stack, #SS                   | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                       |
| General protection, #GP      | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                          |
|                              |      |                 | X         | A null data segment was used to reference memory.                                             |
|                              |      |                 | X         | The destination operand was in a non-writable segment.                                        |
| Page fault, #PF              |      | X               | X         | A page fault resulted from the execution of the instruction.                                  |
| Alignment check, #AC         |      | X               | X         | An unaligned memory reference was performed while alignment checking was enabled.             |

# MOVSS Move Scalar Single-Precision Floating-Point

Moves a scalar single-precision floating-point value:

- from the low-order 32 bits of an XMM register or a 32-bit memory location to the low-order 32 bits of another XMM register, or
- from the low-order 32 bits of an XMM register to the low-order 32 bits of another XMM register or a 32-bit memory location.

If the source operand is an XMM register, the high-order 96 bits of the destination XMM register are not modified. If the source operand is a memory location, the high-order 96 bits of the destination XMM register are cleared to all 0s.

| Mnemonic               | Opcode             | Description                                                                                                    |
|------------------------|--------------------|----------------------------------------------------------------------------------------------------------------|
| MOVSS xmm1, xmm2/mem32 | F3 0F 10 <i>/r</i> | Moves single-precision floating-point value from an XMM register or 32-bit memory location to an XMM register. |
| MOVSS xmm1/mem32, xmm2 | F3 0F 11 <i>/r</i> | Moves single-precision floating-point value from an XMM register to an XMM register or 32-bit memory location. |
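
A brief C sketch of the load and store forms, assuming the SSE intrinsics `_mm_load_ss` and `_mm_store_ss`, which compilers usually implement with MOVSS. The names `load_ss` and `store_ss` are hypothetical wrappers.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Memory source: element 0 is loaded from *p; elements 1..3 are
   cleared to zero. */
static __m128 load_ss(const float *p) {
    return _mm_load_ss(p);
}

/* Memory destination: only element 0 of v is written to *p. */
static void store_ss(float *p, __m128 v) {
    _mm_store_ss(p, v);
}
```

Because the store form writes just 32 bits, it is a convenient way to extract a scalar result from the low element after packed computation.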



#### **Related Instructions**

MOVAPS, MOVHLPS, MOVHPS, MOVLHPS, MOVLPS, MOVMSKPS, MOVUPS

#### **rFLAGS** Affected

None

#### **MXCSR Flags Affected**

None

| Exception                    | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                              |
|------------------------------|------|-----------------|-----------|-------------------------------------------------------------------------------------------------|
| Invalid opcode, #UD          | X    | X               | X         | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1.    |
|                              | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                       |
|                              | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.               |
| Device not available,<br>#NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                   |
| Stack, #SS                   | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                         |
| General protection, #GP      | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                            |
|                              |      |                 | X         | A null data segment was used to reference memory.                                               |
|                              |      |                 | X         | The destination operand was in a non-writable segment.                                          |
| Page fault, #PF              |      | X               | X         | A page fault resulted from the execution of the instruction.                                    |
| Alignment check, #AC         |      | X               | X         | An unaligned memory reference was performed while alignment checking was enabled.               |

# MOVUPD Move Unaligned Packed Double-Precision Floating-Point

Moves two packed double-precision floating-point values:

- from an XMM register or 128-bit memory location to another XMM register, or
- from an XMM register to another XMM register or 128-bit memory location.

| Mnemonic                 | Opcode             | Description                                                                                                                               |
|--------------------------|--------------------|-------------------------------------------------------------------------------------------------------------------------------------------|
| MOVUPD xmm1, xmm2/mem128 | 66 0F 10 <i>/r</i> | Moves two packed double-precision floating-point values from an XMM register or unaligned 128-bit memory location to an XMM register.     |
| MOVUPD xmm1/mem128, xmm2 | 66 0F 11 <i>/r</i> | Moves two packed double-precision floating-point values from an XMM register to an XMM register or unaligned 128-bit memory location.     |





Memory operands that are not aligned on a 16-byte boundary do not cause a general-protection exception.
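
For illustration, the sketch below updates two adjacent doubles at an address that need not be 16-byte aligned. It assumes the SSE2 intrinsics `_mm_loadu_pd` and `_mm_storeu_pd`, which compilers typically implement with MOVUPD; the function name `add_one_unaligned` is hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Adds 1.0 to p[0] and p[1]. p may have any alignment that is valid
   for a double; no 16-byte alignment is assumed. */
static void add_one_unaligned(double *p) {
    __m128d v = _mm_loadu_pd(p);                        /* unaligned load  */
    _mm_storeu_pd(p, _mm_add_pd(v, _mm_set1_pd(1.0)));  /* unaligned store */
}
```

This makes it possible to process a sub-range of an array that starts at an odd element index, where an aligned move such as MOVAPD would fault.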


### **Related Instructions**

MOVAPD, MOVHPD, MOVLPD, MOVMSKPD, MOVSD

#### rFLAGS Affected

None

### **MXCSR Flags Affected**

None

| Exception                    | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                            |
|------------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD          | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                              | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                              | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.             |
| Device not available,<br>#NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |
| Stack, #SS                   | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                       |
| General protection, #GP      | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                          |
|                              |      |                 | X         | A null data segment was used to reference memory.                                             |
|                              |      |                 | X         | The destination operand was in a non-writable segment.                                        |
| Page fault, #PF              |      | X               | X         | A page fault resulted from the execution of the instruction.                                  |
| Alignment check, #AC         |      | X               | X         | An unaligned memory reference was performed while alignment checking was enabled.             |

# MOVUPS Move Unaligned Packed Single-Precision Floating-Point

Moves four packed single-precision floating-point values:

- from an XMM register or 128-bit memory location to another XMM register, or
- from an XMM register to another XMM register or 128-bit memory location.

| Mnemonic                 | Opcode          | Description                                                                                                                            |
|--------------------------|-----------------|----------------------------------------------------------------------------------------------------------------------------------------|
| MOVUPS xmm1, xmm2/mem128 | 0F 10 <i>/r</i> | Moves four packed single-precision floating-point values from an XMM register or unaligned 128-bit memory location to an XMM register. |
| MOVUPS xmm1/mem128, xmm2 | 0F 11 <i>/r</i> | Moves four packed single-precision floating-point values from an XMM register to an XMM register or unaligned 128-bit memory location. |





Memory operands that are not aligned on a 16-byte boundary do not cause a general-protection exception.

#### **Related Instructions**

MOVAPS, MOVHLPS, MOVHPS, MOVLHPS, MOVLPS, MOVMSKPS, MOVSS

#### **rFLAGS** Affected

None

#### **MXCSR Flags Affected**

None

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|-----------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD | X | X | X | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1. |
| | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| | | | X | The destination operand was in a non-writable segment. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |
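
The first #UD cause in the table can be tested for before any 128-bit media instruction is executed. The sketch below assumes GCC or Clang on x86, whose `<cpuid.h>` header provides `__get_cpuid`; the helper names are hypothetical. It also assumes the feature bits are reported in the EDX register of CPUID standard function 1, which the table does not state explicitly. The CR0 and CR4 conditions are privileged state and cannot be checked this way from application code.

```c
#include <cpuid.h>    /* GCC/Clang helper header (an assumption of this sketch) */
#include <stdbool.h>

/* Returns true if the given EDX bit of CPUID standard function 1 is set. */
static bool cpuid_fn1_edx_bit(unsigned bit)
{
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;                 /* function 1 not available */
    return ((edx >> bit) & 1u) != 0;
}

static bool cpu_has_sse(void)  { return cpuid_fn1_edx_bit(25); } /* SSE  */
static bool cpu_has_sse2(void) { return cpuid_fn1_edx_bit(26); } /* SSE2 */
```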

## MULPD Multiply Packed Double-Precision Floating-Point

Multiplies each of the two packed double-precision floating-point values in the first source operand by the corresponding packed double-precision floating-point value in the second source operand and writes the result of each multiplication operation in the corresponding quadword of the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.
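
As an illustration of the operation described above, the following sketch uses the SSE2 intrinsic `_mm_mul_pd`, which compilers usually implement with MULPD. It assumes an x86-64 compiler that provides `<emmintrin.h>`; the function name `mul_packed_double` is hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: __m128d, _mm_mul_pd */

/* out[i] = a[i] * b[i] for i = 0 (low quadword) and i = 1 (high quadword). */
static void mul_packed_double(const double a[2], const double b[2], double out[2])
{
    __m128d va = _mm_loadu_pd(a);            /* first source / destination */
    __m128d vb = _mm_loadu_pd(b);            /* second source              */
    _mm_storeu_pd(out, _mm_mul_pd(va, vb));  /* two independent products   */
}
```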



#### **Related Instructions**

MULPS, MULSD, MULSS, PFMUL

#### **rFLAGS Affected**

None

#### **MXCSR Flags Affected**

| FZ | RC    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|-------|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |       |    |    |    |    |    |    |     | M  | M  | M  |    | M  | M  |
| 15 | 14-13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

<i>Note:</i> A flag that can be set to one or zero is M (modified). Unaffected flags are blank.
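
The status flags in bits 5 to 0 can be observed from software. The following is a minimal sketch, assuming an x86-64 compiler with `<emmintrin.h>` and assuming all SIMD floating-point exceptions are masked (the usual default), so that a flag is recorded instead of an exception being raised. The names `mulpd_flags` and `MXCSR_*` are hypothetical. The `volatile` qualifiers are only there to discourage the compiler from evaluating the multiplication at compile time or moving it across the MXCSR accesses.

```c
#include <emmintrin.h>  /* _mm_mul_pd and friends            */
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr            */

#define MXCSR_IE    0x01u  /* bit 0: invalid operation       */
#define MXCSR_DE    0x02u  /* bit 1: denormalized operand    */
#define MXCSR_OE    0x08u  /* bit 3: overflow                */
#define MXCSR_UE    0x10u  /* bit 4: underflow               */
#define MXCSR_PE    0x20u  /* bit 5: precision (inexact)     */
#define MXCSR_FLAGS 0x3Fu  /* bits 5-0: all six status flags */

/* Multiplies {a0, a1} by {b0, b1} and returns the status flags it raised. */
static unsigned mulpd_flags(double a0, double a1, double b0, double b1)
{
    volatile double in[4] = { a0, a1, b0, b1 };
    volatile double out[2];
    unsigned saved = _mm_getcsr();

    _mm_setcsr(saved & ~MXCSR_FLAGS);            /* clear the sticky flags */
    __m128d r = _mm_mul_pd(_mm_set_pd(in[1], in[0]),
                           _mm_set_pd(in[3], in[2]));
    out[0] = _mm_cvtsd_f64(r);                   /* low quadword  */
    out[1] = _mm_cvtsd_f64(_mm_unpackhi_pd(r, r)); /* high quadword */
    unsigned flags = _mm_getcsr() & MXCSR_FLAGS;
    _mm_setcsr(saved);                           /* restore the caller's MXCSR */
    (void)out;
    return flags;
}
```

With these assumptions, an exact product such as 2.0 × 4.0 leaves all flags clear, 0.1 × 0.1 sets only PE, and 1e308 × 10.0 sets both OE and PE.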

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|-----------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD | X | X | X | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
| | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
| | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| | X | X | X | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| SIMD Floating-Point<br>Exception, #XF | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| **SIMD Floating-Point Exceptions** | | | | |
| Invalid-operation<br>exception (IE) | X | X | X | A source operand was an SNaN value. |
| | X | X | X | Zero was multiplied by ±infinity. |
| Overflow exception (OE) | X | X | X | A rounded result was too large to fit into the format of the destination operand. |
| Underflow exception (UE) | X | X | X | A rounded result was too small to fit into the format of the destination operand. |
| Denormalized-operand<br>exception (DE) | X | X | X | A source operand was a denormal value. |
| Precision exception (PE) | X | X | X | A result could not be represented exactly in the destination format. |
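
The #GP row for a misaligned memory operand matters when a compiler folds an aligned load into the multiply instruction itself. The sketch below, with the hypothetical helpers `load2_checked` and `mul2_from_memory`, tests the address against the 16-byte boundary and falls back to an unaligned load when the test fails. It assumes an x86-64 compiler with `<emmintrin.h>`. In practice many programs simply use the unaligned load everywhere; the explicit check is shown only to make the alignment rule concrete.

```c
#include <emmintrin.h>  /* _mm_load_pd, _mm_loadu_pd, _mm_mul_pd */
#include <stdint.h>     /* uintptr_t */

/* Loads two doubles, using the aligned form only when it cannot fault. */
static __m128d load2_checked(const double *p)
{
    if (((uintptr_t)p & 15u) == 0)
        return _mm_load_pd(p);   /* address is on a 16-byte boundary */
    return _mm_loadu_pd(p);      /* unaligned load, no alignment #GP */
}

/* Multiplies the register value a by two doubles read from memory at p. */
static __m128d mul2_from_memory(__m128d a, const double *p)
{
    return _mm_mul_pd(a, load2_checked(p));
}
```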


## MULPS Multiply Packed Single-Precision Floating-Point

Multiplies each of the four packed single-precision floating-point values in the first source operand by the corresponding packed single-precision floating-point value in the second source operand and writes the result of each multiplication operation in the corresponding doubleword of the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.



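| Mnemonic                | Opcode          | Description |
|-------------------------|-----------------|-------------|
| MULPS xmm1, xmm2/mem128 | 0F 59 <i>/r</i> | Multiplies packed single-precision floating-point values in an XMM register and another XMM register or 128-bit memory location and writes the results in the destination XMM register. |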
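
A common use of the packed single-precision multiply is scaling an array four elements at a time. The following sketch assumes an x86-64 compiler with `<xmmintrin.h>`; the function name `scale_floats` is hypothetical, and `_mm_mul_ps` is the intrinsic that compilers usually implement with MULPS. Unaligned loads and stores are used so that the array itself need not be 16-byte aligned.

```c
#include <stddef.h>     /* size_t */
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_mul_ps */

/* Multiplies every element of data[0..n-1] by k, in place. */
static void scale_floats(float *data, size_t n, float k)
{
    __m128 vk = _mm_set1_ps(k);           /* k replicated in all four doublewords */
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {          /* four products per iteration */
        __m128 v = _mm_loadu_ps(data + i);
        _mm_storeu_ps(data + i, _mm_mul_ps(v, vk));
    }
    for (; i < n; ++i)                    /* scalar tail for the last n % 4 elements */
        data[i] *= k;
}
```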



### **Related Instructions**

MULPD, MULSD, MULSS, PFMUL

### **rFLAGS Affected**

None

### **MXCSR Flags Affected**

| FZ | RC    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|-------|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |       |    |    |    |    |    |    |     | M  | M  | M  |    | M  | M  |
| 15 | 14-13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

<i>Note:</i> A flag that can be set to one or zero is M (modified). Unaffected flags are blank.

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|-----------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD | X | X | X | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1. |
| | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
| | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| | X | X | X | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| SIMD Floating-Point<br>Exception, #XF | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| **SIMD Floating-Point Exceptions** | | | | |
| Invalid-operation<br>exception (IE) | X | X | X | A source operand was an SNaN value. |
| | X | X | X | Zero was multiplied by ±infinity. |
| Overflow exception (OE) | X | X | X | A rounded result was too large to fit into the format of the destination operand. |
| Underflow exception (UE) | X | X | X | A rounded result was too small to fit into the format of the destination operand. |
| Denormalized-operand<br>exception (DE) | X | X | X | A source operand was a denormal value. |
| Precision exception (PE) | X | X | X | A result could not be represented exactly in the destination format. |
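
The invalid-operation case "zero was multiplied by ±infinity" can be reproduced as follows. This is a sketch under stated assumptions: an x86-64 compiler with `<xmmintrin.h>`, and the invalid-operation exception masked (the usual default), in which case the IE flag is set and the result is a NaN rather than an exception being delivered. The function name `zero_times_inf_sets_ie` is hypothetical.

```c
#include <math.h>       /* INFINITY */
#include <xmmintrin.h>  /* _mm_mul_ps, _mm_getcsr, _mm_setcsr */

/* Multiplies 0.0 by +infinity in all four elements. Stores the low result in
   *result and returns nonzero if the IE flag (MXCSR bit 0) was set. */
static int zero_times_inf_sets_ie(float *result)
{
    volatile float in[2] = { 0.0f, INFINITY };  /* volatile: no compile-time folding */
    volatile float out;
    unsigned saved = _mm_getcsr();

    _mm_setcsr(saved & ~0x3Fu);                 /* clear status flags, bits 5-0 */
    __m128 r = _mm_mul_ps(_mm_set1_ps(in[0]), _mm_set1_ps(in[1]));
    out = _mm_cvtss_f32(r);                     /* low doubleword of the result */
    unsigned flags = _mm_getcsr();
    _mm_setcsr(saved);                          /* restore the caller's MXCSR */

    *result = out;
    return (flags & 0x01u) != 0;                /* bit 0 = IE */
}
```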

# MULSD Multiply Scalar Double-Precision Floating-Point

Multiplies the double-precision floating-point value in the low-order quadword of the first source operand by the double-precision floating-point value in the low-order quadword of the second source operand and writes the result in the low-order quadword of the destination (first source). The high-order quadword of the destination is not modified. The first source/destination operand is an XMM register. The second source operand is another XMM register or 64-bit memory location.
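
The difference from the packed form is that only the low-order quadword is multiplied. The sketch below uses the SSE2 intrinsic `_mm_mul_sd`, which compilers usually implement with MULSD, and assumes an x86-64 compiler with `<emmintrin.h>`. The function name `mul_scalar_double` is hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_mul_sd */

/* out[0] = a[0] * b[0]; out[1] = a[1] (high quadword of the first source
   passes through unchanged, and b[1] is ignored). */
static void mul_scalar_double(const double a[2], const double b[2], double out[2])
{
    __m128d va = _mm_loadu_pd(a);
    __m128d vb = _mm_loadu_pd(b);
    _mm_storeu_pd(out, _mm_mul_sd(va, vb));
}
```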





#### **Related Instructions**

MULPD, MULPS, MULSS, PFMUL

#### **rFLAGS Affected**

None

#### **MXCSR Flags Affected**

| FZ | RC    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|-------|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |       |    |    |    |    |    |    |     | M  | M  | M  |    | M  | M  |
| 15 | 14-13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

<i>Note:</i> A flag that can be set to one or zero is M (modified). Unaffected flags are blank.

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|-----------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD | X | X | X | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
| | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
| | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |
| SIMD Floating-Point<br>Exception, #XF | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| **SIMD Floating-Point Exceptions** | | | | |
| Invalid-operation<br>exception (IE) | X | X | X | A source operand was an SNaN value. |
| | X | X | X | Zero was multiplied by ±infinity. |
| Overflow exception (OE) | X | X | X | A rounded result was too large to fit into the format of the destination operand. |
| Underflow exception (UE) | X | X | X | A rounded result was too small to fit into the format of the destination operand. |
| Denormalized-operand<br>exception (DE) | X | X | X | A source operand was a denormal value. |
| Precision exception (PE) | X | X | X | A result could not be represented exactly in the destination format. |

## MULSS Multiply Scalar Single-Precision Floating-Point

Multiplies the single-precision floating-point value in the low-order doubleword of the first source operand by the single-precision floating-point value in the low-order doubleword of the second source operand and writes the result in the low-order doubleword of the destination (first source). The three high-order doublewords of the destination are not modified. The first source/destination operand is an XMM register. The second source operand is another XMM register or 32-bit memory location.

| Mnemonic               | Opcode             | Description                                                                                                                                                                                                          |
|------------------------|--------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| MULSS xmm1, xmm2/mem32 | F3 0F 59 <i>/r</i> | Multiplies low-order single-precision floating-point values in an XMM register and another XMM register or 32-bit memory location and writes the result in the low-order doubleword of the destination XMM register. |
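
The pass-through of the three high-order doublewords can be seen in a short sketch. It assumes an x86-64 compiler with `<xmmintrin.h>`; the function name `mul_scalar_single` is hypothetical, and `_mm_mul_ss` is the intrinsic that compilers usually implement with MULSS.

```c
#include <xmmintrin.h>  /* SSE intrinsics: _mm_mul_ss */

/* out[0] = a[0] * b[0]; out[1..3] = a[1..3] (copied from the first source;
   b[1..3] do not take part in the operation). */
static void mul_scalar_single(const float a[4], const float b[4], float out[4])
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_mul_ss(va, vb));
}
```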



### **Related Instructions**

MULPD, MULPS, MULSD, PFMUL

### **rFLAGS Affected**

None

### **MXCSR Flags Affected**

| FZ | RC    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|-------|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |       |    |    |    |    |    |    |     | M  | M  | M  |    | M  | M  |
| 15 | 14-13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

<i>Note:</i> A flag that can be set to one or zero is M (modified). Unaffected flags are blank.

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|-----------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD | X | X | X | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1. |
| | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
| | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |
| SIMD Floating-Point<br>Exception, #XF | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| **SIMD Floating-Point Exceptions** | | | | |
| Invalid-operation<br>exception (IE) | X | X | X | A source operand was an SNaN value. |
| | X | X | X | Zero was multiplied by ±infinity. |
| Overflow exception (OE) | X | X | X | A rounded result was too large to fit into the format of the destination operand. |
| Underflow exception (UE) | X | X | X | A rounded result was too small to fit into the format of the destination operand. |
| Denormalized-operand<br>exception (DE) | X | X | X | A source operand was a denormal value. |
| Precision exception (PE) | X | X | X | A result could not be represented exactly in the destination format. |
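
The denormalized-operand and underflow rows can be illustrated with the scalar multiply. The sketch below assumes an x86-64 compiler with `<xmmintrin.h>` and assumes all SIMD floating-point exceptions are masked, so that flags are recorded instead of #XF being raised. It also clears the DAZ (bit 6) and FZ (bit 15) control bits for the duration of the test so that denormal values are processed as such. The function name `mulss_flags` is hypothetical.

```c
#include <float.h>      /* FLT_MIN: smallest normalized single-precision value */
#include <xmmintrin.h>  /* _mm_mul_ss, _mm_getcsr, _mm_setcsr */

/* Multiplies a by b with the scalar single-precision multiply and returns
   the MXCSR status flags (bits 5-0) raised by that one operation. */
static unsigned mulss_flags(float a, float b)
{
    volatile float in[2] = { a, b };   /* volatile: no compile-time folding */
    volatile float out;
    unsigned saved = _mm_getcsr();

    /* Clear status flags (bits 5-0), DAZ (bit 6), and FZ (bit 15). */
    _mm_setcsr(saved & ~(0x3Fu | (1u << 6) | (1u << 15)));
    out = _mm_cvtss_f32(_mm_mul_ss(_mm_set_ss(in[0]), _mm_set_ss(in[1])));
    unsigned flags = _mm_getcsr() & 0x3Fu;
    _mm_setcsr(saved);                 /* restore the caller's MXCSR */
    (void)out;
    return flags;
}
```

Under these assumptions, `mulss_flags(FLT_MIN / 2, 1.0f)` reports DE (bit 1) because the first operand is a denormal, and `mulss_flags(FLT_MIN, FLT_MIN)` reports UE (bit 4) together with PE (bit 5) because the product is too small for the single-precision format.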

# ORPD Logical Bitwise OR Packed Double-Precision Floating-Point

Performs a bitwise logical OR of the two packed double-precision floating-point values in the first source operand and the corresponding two packed double-precision floating-point values in the second source operand and writes the result in the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.

| Mnemonic               | Opcode             | Description                                                                                                                                                                                                                  |
|------------------------|--------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| ORPD xmm1, xmm2/mem128 | 66 0F 56 <i>/r</i> | Performs bitwise logical OR of two packed double-precision floating-point values in an XMM register and in another XMM register or 128-bit memory location and writes the result in the destination XMM register. |
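
Because the operation is purely bitwise, one typical use is forcing the sign bit of each value. The sketch below uses the SSE2 intrinsic `_mm_or_pd`, which compilers usually implement with ORPD, and assumes an x86-64 compiler with `<emmintrin.h>`. The function name `negative_abs2` is hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_or_pd */

/* out[i] = -|in[i]| for i = 0, 1, computed by ORing in the sign bit. */
static void negative_abs2(const double in[2], double out[2])
{
    __m128d sign = _mm_set1_pd(-0.0);  /* only bit 63 set in each quadword */
    _mm_storeu_pd(out, _mm_or_pd(_mm_loadu_pd(in), sign));
}
```

No floating-point arithmetic takes place, which is consistent with the instruction leaving the MXCSR status flags unaffected.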



#### **Related Instructions**

ANDNPD, ANDNPS, ANDPD, ANDPS, ORPS, XORPD, XORPS

#### **rFLAGS** Affected

None

### **MXCSR Flags Affected**

None

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|-----------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD | X | X | X | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
| | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| | X | X | X | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |

# ORPS Logical Bitwise OR Packed Single-Precision Floating-Point

Performs a bitwise logical OR of the four packed single-precision floating-point values in the first source operand and the corresponding four packed single-precision floating-point values in the second source operand and writes the result in the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.

| Mnemonic               | Opcode   | Description |
|------------------------|----------|-------------|
| ORPS xmm1, xmm2/mem128 | 0F 56 /r | Performs bitwise logical OR of four packed single-precision floating-point values in an XMM register and in another XMM register or 128-bit memory location and writes the result in the destination XMM register. |
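
Because the OR is applied to raw bit patterns, ORPS can set selected bits of each single-precision encoding without interpreting the values as numbers. The Python sketch below models this on four 32-bit lanes and uses it to force every sign bit to 1. The helper names (`f32_bits`, `bits_f32`, `orps`) and the list-of-lanes representation are assumptions made for illustration, not part of the architecture.

```python
import struct

def f32_bits(x: float) -> int:
    """Return the IEEE 754 single-precision encoding of x as a 32-bit integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(b: int) -> float:
    """Decode a 32-bit integer as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]

def orps(dst, src):
    """Model of ORPS: bitwise OR of four 32-bit lanes (lane 0 first)."""
    assert len(dst) == len(src) == 4
    return [(a | b) & 0xFFFFFFFF for a, b in zip(dst, src)]

values = [f32_bits(v) for v in (1.0, -2.5, 0.0, 3.0)]
sign_mask = [0x80000000] * 4          # only the sign bit set in each lane
negated = [bits_f32(b) for b in orps(values, sign_mask)]
# negated == [-1.0, -2.5, -0.0, -3.0]
```

Lanes that are already negative are unchanged, which is why this pattern is often described as computing the negated absolute value of each lane.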



### Related Instructions

ANDNPD, ANDNPS, ANDPD, ANDPS, ORPD, XORPD, XORPS

### rFLAGS Affected

None

### MXCSR Flags Affected

None

### Exceptions

| Exception                 | Real | Virtual 8086 | Protected | Cause of Exception                                                                            |
|---------------------------|------|--------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X            | X         | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1.  |
|                           | X    | X            | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                           | X    | X            | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.             |
| Device not available, #NM | X    | X            | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |
| Stack, #SS                | X    | X            | X         | A memory address exceeded the stack segment limit or was non-canonical.                       |
| General protection, #GP   | X    | X            | X         | A memory address exceeded a data segment limit or was non-canonical.                          |
|                           |      |              | X         | A null data segment was used to reference memory.                                             |
|                           | X    | X            | X         | The memory operand was not aligned on a 16-byte boundary.                                     |
| Page fault, #PF           |      | X            | X         | A page fault resulted from the execution of the instruction.                                  |

# PACKSSDW Pack with Saturation Signed Doubleword to Word

Converts each 32-bit signed integer in the first and second source operands to a 16-bit signed integer and packs the converted values into words in the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

Converted values from the first source operand are packed into the low-order words of the destination, and the converted values from the second source operand are packed into the high-order words of the destination.



For each packed value in the destination, if the value is larger than the largest signed 16-bit integer, it is saturated to 7FFFh, and if the value is smaller than the smallest signed 16-bit integer, it is saturated to 8000h.
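
The packing order and saturation rule described above can be modeled in a few lines of Python. This is a sketch under stated assumptions: each operand is a list of four signed 32-bit values with element 0 as the least-significant lane, and `sat_s16` and `packssdw` are hypothetical helper names.

```python
def sat_s16(v: int) -> int:
    """Clamp a signed integer to the signed 16-bit range (8000h..7FFFh)."""
    return max(-0x8000, min(0x7FFF, v))

def packssdw(dst, src):
    """Model of PACKSSDW on lists of four signed 32-bit lanes (lane 0 first)."""
    assert len(dst) == len(src) == 4
    # Words 0-3 come from the first source, words 4-7 from the second source.
    return [sat_s16(v) for v in dst] + [sat_s16(v) for v in src]

result = packssdw([1, -1, 70000, -70000], [32767, 32768, -32768, -32769])
# result == [1, -1, 32767, -32768, 32767, 32767, -32768, -32768]
```

Here 32767 and -32768 are the signed values whose 16-bit encodings are 7FFFh and 8000h.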

### Related Instructions

PACKSSWB, PACKUSWB

### rFLAGS Affected

None

### MXCSR Flags Affected

None

### Exceptions

| Exception                 | Real | Virtual 8086 | Protected | Cause of Exception                                                                            |
|---------------------------|------|--------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X            | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X            | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                           | X    | X            | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.             |
| Device not available, #NM | X    | X            | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |
| Stack, #SS                | X    | X            | X         | A memory address exceeded the stack segment limit or was non-canonical.                       |
| General protection, #GP   | X    | X            | X         | A memory address exceeded a data segment limit or was non-canonical.                          |
|                           |      |              | X         | A null data segment was used to reference memory.                                             |
|                           | X    | X            | X         | The memory operand was not aligned on a 16-byte boundary.                                     |
| Page fault, #PF           |      | X            | X         | A page fault resulted from the execution of the instruction.                                  |

# PACKSSWB Pack with Saturation Signed Word to Byte

Converts each 16-bit signed integer in the first and second source operands to an 8-bit signed integer and packs the converted values into bytes in the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

Converted values from the first source operand are packed into the low-order bytes of the destination, and the converted values from the second source operand are packed into the high-order bytes of the destination.



For each packed value in the destination, if the value is larger than the largest signed 8-bit integer, it is saturated to 7Fh, and if the value is smaller than the smallest signed 8-bit integer, it is saturated to 80h.
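
To make the byte encodings 7Fh and 80h visible, the following Python sketch models the operation and returns the packed destination as 16 raw bytes, lowest byte first. The input representation (lists of eight signed 16-bit values) and the names `sat_s8` and `packsswb` are assumptions chosen for illustration.

```python
def sat_s8(v: int) -> int:
    """Clamp a signed integer to the signed 8-bit range (-128..127)."""
    return max(-128, min(127, v))

def packsswb(dst, src):
    """Model of PACKSSWB on lists of eight signed 16-bit lanes (lane 0 first)."""
    assert len(dst) == len(src) == 8
    lanes = [sat_s8(v) for v in dst] + [sat_s8(v) for v in src]
    # Encode each signed byte in two's complement, lowest lane first.
    return bytes(v & 0xFF for v in lanes)

packed = packsswb([0, 1, -1, 127, 128, -128, -129, 300], [5] * 8)
# packed.hex() == "0001ff7f7f80807f" + "05" * 8
```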

### Related Instructions

PACKSSDW, PACKUSWB

### rFLAGS Affected

None

### MXCSR Flags Affected

None

### Exceptions

| Exception                 | Real | Virtual 8086 | Protected | Cause of Exception                                                                            |
|---------------------------|------|--------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X            | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X            | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                           | X    | X            | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.             |
| Device not available, #NM | X    | X            | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |
| Stack, #SS                | X    | X            | X         | A memory address exceeded the stack segment limit or was non-canonical.                       |
| General protection, #GP   | X    | X            | X         | A memory address exceeded a data segment limit or was non-canonical.                          |
|                           |      |              | X         | A null data segment was used to reference memory.                                             |
|                           | X    | X            | X         | The memory operand was not aligned on a 16-byte boundary.                                     |
| Page fault, #PF           |      | X            | X         | A page fault resulted from the execution of the instruction.                                  |

# PACKUSWB Pack with Saturation Signed Word to Unsigned Byte

Converts each 16-bit signed integer in the first and second source operands to an 8-bit unsigned integer and packs the converted values into bytes in the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

Converted values from the first source operand are packed into the low-order bytes of the destination, and the converted values from the second source operand are packed into the high-order bytes of the destination.



For each packed value in the destination, if the value is larger than the largest unsigned 8-bit integer, it is saturated to FFh, and if the value is smaller than the smallest unsigned 8-bit integer, it is saturated to 00h.
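
Note that the inputs are signed words while the outputs are unsigned bytes, so negative inputs clamp to 00h and inputs above 255 clamp to FFh. The Python sketch below models this; the pixel-style example data and the names `sat_u8` and `packuswb` are illustrative assumptions.

```python
def sat_u8(v: int) -> int:
    """Clamp a signed integer to the unsigned 8-bit range (0..255)."""
    return max(0, min(255, v))

def packuswb(dst, src):
    """Model of PACKUSWB on lists of eight signed 16-bit lanes (lane 0 first)."""
    assert len(dst) == len(src) == 8
    # Bytes 0-7 come from the first source, bytes 8-15 from the second source.
    return [sat_u8(v) for v in dst] + [sat_u8(v) for v in src]

low = [-5, 0, 100, 255, 256, 300, 32767, -32768]
high = [10, 20, 30, 40, 50, 60, 70, 80]
pixels = packuswb(low, high)
# pixels == [0, 0, 100, 255, 255, 255, 255, 0, 10, 20, 30, 40, 50, 60, 70, 80]
```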

### Related Instructions

PACKSSDW, PACKSSWB

### rFLAGS Affected

None

### MXCSR Flags Affected

None

### Exceptions

| Exception                 | Real | Virtual 8086 | Protected | Cause of Exception                                                                            |
|---------------------------|------|--------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X            | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X            | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                           | X    | X            | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.             |
| Device not available, #NM | X    | X            | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |
| Stack, #SS                | X    | X            | X         | A memory address exceeded the stack segment limit or was non-canonical.                       |
| General protection, #GP   | X    | X            | X         | A memory address exceeded a data segment limit or was non-canonical.                          |
|                           |      |              | X         | A null data segment was used to reference memory.                                             |
|                           | X    | X            | X         | The memory operand was not aligned on a 16-byte boundary.                                     |
| Page fault, #PF           |      | X            | X         | A page fault resulted from the execution of the instruction.                                  |

# PADDB Packed Add Bytes

Adds each packed 8-bit integer value in the first source operand to the corresponding packed 8-bit integer in the second source operand and writes the integer result of each addition in the corresponding byte of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.



This instruction operates on both signed and unsigned integers. If the result overflows, the carry is ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 8 bits of each result are written in the destination.
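
The wraparound behavior can be sketched in Python as follows. The model treats each operand as a list of sixteen byte values in the range 0 to 255; `paddb` and `as_signed8` are hypothetical helpers, and the signed reading of a result byte is shown only to illustrate that the same bits serve both interpretations.

```python
def paddb(dst, src):
    """Model of PADDB on lists of sixteen byte values (0..255), lane 0 first."""
    assert len(dst) == len(src) == 16
    # Keep only the low-order 8 bits of each sum; the carry is discarded.
    return [(a + b) & 0xFF for a, b in zip(dst, src)]

def as_signed8(b: int) -> int:
    """Reinterpret a byte value (0..255) as a two's-complement signed integer."""
    return b - 0x100 if b & 0x80 else b

a = [0xFF, 0x7F, 0x01, 0x80] + [0] * 12
b = [0x01, 0x01, 0xFF, 0x80] + [0] * 12
total = paddb(a, b)
# total[:4] == [0x00, 0x80, 0x00, 0x00]
# as_signed8(total[1]) == -128   (7Fh + 01h wraps to the most negative byte)
```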

### Related Instructions

PADDD, PADDQ, PADDSB, PADDSW, PADDUSB, PADDUSW, PADDW

### rFLAGS Affected

None

### MXCSR Flags Affected

None

### Exceptions

| Exception                 | Real | Virtual 8086 | Protected | Cause of Exception                                                                            |
|---------------------------|------|--------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X            | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X            | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                           | X    | X            | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.             |
| Device not available, #NM | X    | X            | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |
| Stack, #SS                | X    | X            | X         | A memory address exceeded the stack segment limit or was non-canonical.                       |
| General protection, #GP   | X    | X            | X         | A memory address exceeded a data segment limit or was non-canonical.                          |
|                           |      |              | X         | A null data segment was used to reference memory.                                             |
|                           | X    | X            | X         | The memory operand was not aligned on a 16-byte boundary.                                     |
| Page fault, #PF           |      | X            | X         | A page fault resulted from the execution of the instruction.                                  |

# PADDD Packed Add Doublewords

Adds each packed 32-bit integer value in the first source operand to the corresponding packed 32-bit integer in the second source operand and writes the integer result of each addition in the corresponding doubleword of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.



This instruction operates on both signed and unsigned integers. If the result overflows, the carry is ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 32 bits of each result are written in the destination.
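
One consequence of discarding the carry is that an overflow in one doubleword never spills into its neighbor. The Python sketch below contrasts a lane-wise model with an ordinary 128-bit addition; representing an XMM value as a single Python integer and the name `paddd` are assumptions made for this illustration.

```python
MASK32 = 0xFFFFFFFF

def paddd(dst: int, src: int) -> int:
    """Model of PADDD on 128-bit integers that each hold four 32-bit lanes."""
    result = 0
    for lane in range(4):
        shift = 32 * lane
        a = (dst >> shift) & MASK32
        b = (src >> shift) & MASK32
        # The carry out of bit 31 of each lane is dropped.
        result |= ((a + b) & MASK32) << shift
    return result

x = 0x00000000_00000001_00000000_FFFFFFFF
y = 0x00000000_00000002_00000000_00000001
packed_sum = paddd(x, y)
plain_sum = (x + y) & ((1 << 128) - 1)
# packed_sum == 0x00000000_00000003_00000000_00000000
# plain_sum  == 0x00000000_00000003_00000001_00000000  (carry crossed into lane 1)
```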

### Related Instructions

PADDB, PADDQ, PADDSB, PADDSW, PADDUSB, PADDUSW, PADDW

### rFLAGS Affected

None

### MXCSR Flags Affected

None

### Exceptions

| Exception                 | Real | Virtual 8086 | Protected | Cause of Exception                                                                            |
|---------------------------|------|--------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X            | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X            | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                           | X    | X            | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.             |
| Device not available, #NM | X    | X            | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |
| Stack, #SS                | X    | X            | X         | A memory address exceeded the stack segment limit or was non-canonical.                       |
| General protection, #GP   | X    | X            | X         | A memory address exceeded a data segment limit or was non-canonical.                          |
|                           |      |              | X         | A null data segment was used to reference memory.                                             |
|                           | X    | X            | X         | The memory operand was not aligned on a 16-byte boundary.                                     |
| Page fault, #PF           |      | X            | X         | A page fault resulted from the execution of the instruction.                                  |

# PADDQ Packed Add Quadwords

Adds each packed 64-bit integer value in the first source operand to the corresponding packed 64-bit integer in the second source operand and writes the integer result of each addition in the corresponding quadword of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.



This instruction operates on both signed and unsigned integers. If the result overflows, the carry is ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 64 bits of each result are written in the destination.
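
Because only the low-order 64 bits are kept, adding the two's-complement encoding of a negative number behaves like a subtraction. The following Python sketch shows this on a two-lane model; `paddq` and `to_lane` are hypothetical names, and the list-of-lanes representation is an assumption.

```python
MASK64 = (1 << 64) - 1

def to_lane(v: int) -> int:
    """Encode a signed or unsigned Python integer as a 64-bit lane value."""
    return v & MASK64

def paddq(dst, src):
    """Model of PADDQ on lists of two 64-bit lane values (lane 0 first)."""
    assert len(dst) == len(src) == 2
    # Keep only the low-order 64 bits of each sum.
    return [(a + b) & MASK64 for a, b in zip(dst, src)]

result = paddq([to_lane(5), MASK64], [to_lane(-1), 1])
# result == [4, 0]   (5 + (-1) == 4; FFFF_FFFF_FFFF_FFFFh + 1 wraps to 0)
```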

### Related Instructions

PADDB, PADDD, PADDSB, PADDSW, PADDUSB, PADDUSW, PADDW

### rFLAGS Affected

None

### MXCSR Flags Affected

None

### Exceptions

| Exception                 | Real | Virtual 8086 | Protected | Cause of Exception                                                                            |
|---------------------------|------|--------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X            | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X            | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                           | X    | X            | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.             |
| Device not available, #NM | X    | X            | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |
| Stack, #SS                | X    | X            | X         | A memory address exceeded the stack segment limit or was non-canonical.                       |
| General protection, #GP   | X    | X            | X         | A memory address exceeded a data segment limit or was non-canonical.                          |
|                           |      |              | X         | A null data segment was used to reference memory.                                             |
|                           | X    | X            | X         | The memory operand was not aligned on a 16-byte boundary.                                     |
| Page fault, #PF           |      | X            | X         | A page fault resulted from the execution of the instruction.                                  |

# PADDSB Packed Add Signed with Saturation Bytes

Adds each packed 8-bit signed integer value in the first source operand to the corresponding packed 8-bit signed integer in the second source operand and writes the signed integer result of each addition in the corresponding byte of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.



For each packed value in the destination, if the value is larger than the largest representable signed 8-bit integer, it is saturated to 7Fh, and if the value is smaller than the smallest signed 8-bit integer, it is saturated to 80h.
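
A minimal Python sketch of the signed saturation rule follows. It assumes each operand is a list of sixteen signed byte values in the range -128 to 127; the name `paddsb` is used only for this illustration.

```python
def paddsb(dst, src):
    """Model of PADDSB on lists of sixteen signed byte values (-128..127)."""
    assert len(dst) == len(src) == 16
    # Clamp each exact sum to the signed 8-bit range instead of wrapping.
    return [max(-128, min(127, a + b)) for a, b in zip(dst, src)]

a = [100, -100, 127, -128, 50] + [0] * 11
b = [100, -100, 1, -1, -60] + [0] * 11
result = paddsb(a, b)
# result[:5] == [127, -128, 127, -128, -10]
```

The values 127 and -128 correspond to the byte encodings 7Fh and 80h.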

### Related Instructions

PADDB, PADDD, PADDQ, PADDSW, PADDUSB, PADDUSW, PADDW

### rFLAGS Affected

None

### MXCSR Flags Affected

None

### Exceptions

| Exception                 | Real | Virtual 8086 | Protected | Cause of Exception                                                                            |
|---------------------------|------|--------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X            | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X            | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                           | X    | X            | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.             |
| Device not available, #NM | X    | X            | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |
| Stack, #SS                | X    | X            | X         | A memory address exceeded the stack segment limit or was non-canonical.                       |
| General protection, #GP   | X    | X            | X         | A memory address exceeded a data segment limit or was non-canonical.                          |
|                           |      |              | X         | A null data segment was used to reference memory.                                             |
|                           | X    | X            | X         | The memory operand was not aligned on a 16-byte boundary.                                     |
| Page fault, #PF           |      | X            | X         | A page fault resulted from the execution of the instruction.                                  |

# PADDSW Packed Add Signed with Saturation Words

Adds each packed 16-bit signed integer value in the first source operand to the corresponding packed 16-bit signed integer in the second source operand and writes the signed integer result of each addition in the corresponding word of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.



For each packed value in the destination, if the value is larger than the largest representable signed 16-bit integer, it is saturated to 7FFFh, and if the value is smaller than the smallest signed 16-bit integer, it is saturated to 8000h.
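
The practical difference between saturating and wrapping addition shows up when sums exceed the 16-bit range, for example when mixing two streams of signed samples. The Python sketch below compares the two; the sample data, the list-of-lanes representation, and the helpers `paddsw` and `wrap16` (which models the wraparound that a non-saturating 16-bit add would produce) are assumptions for illustration.

```python
def paddsw(dst, src):
    """Model of PADDSW on lists of eight signed 16-bit values (lane 0 first)."""
    assert len(dst) == len(src) == 8
    return [max(-0x8000, min(0x7FFF, a + b)) for a, b in zip(dst, src)]

def wrap16(v: int) -> int:
    """Reduce v to 16 bits and reinterpret the result as a signed value."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v

track_a = [30000, -30000, 1000, 0, 0, 0, 0, 0]
track_b = [10000, -10000, 2000, 0, 0, 0, 0, 0]
mixed = paddsw(track_a, track_b)
wrapped = [wrap16(a + b) for a, b in zip(track_a, track_b)]
# mixed[:3]   == [32767, -32768, 3000]
# wrapped[:3] == [-25536, 25536, 3000]   (sign flips without saturation)
```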

### Related Instructions

PADDB, PADDD, PADDQ, PADDSB, PADDUSB, PADDUSW, PADDW

### rFLAGS Affected

None

### MXCSR Flags Affected

None

### Exceptions

| Exception                 | Real | Virtual 8086 | Protected | Cause of Exception                                                                            |
|---------------------------|------|--------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X            | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X            | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                           | X    | X            | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.             |
| Device not available, #NM | X    | X            | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |
| Stack, #SS                | X    | X            | X         | A memory address exceeded the stack segment limit or was non-canonical.                       |
| General protection, #GP   | X    | X            | X         | A memory address exceeded a data segment limit or was non-canonical.                          |
|                           |      |              | X         | A null data segment was used to reference memory.                                             |
|                           | X    | X            | X         | The memory operand was not aligned on a 16-byte boundary.                                     |
| Page fault, #PF           |      | X            | X         | A page fault resulted from the execution of the instruction.                                  |

# PADDUSB Packed Add Unsigned with Saturation Bytes

Adds each packed 8-bit unsigned integer value in the first source operand to the corresponding packed 8-bit unsigned integer in the second source operand and writes the unsigned integer result of each addition in the corresponding byte of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.



For each packed value in the destination, if the value is larger than the largest unsigned 8-bit integer, it is saturated to FFh, and if the value is smaller than the smallest unsigned 8-bit integer, it is saturated to 00h.
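
As a sketch of how unsigned saturation behaves, the Python model below brightens a row of 8-bit pixel values by a constant. The pixel data and the name `paddusb` are illustrative assumptions. Since the sum of two unsigned values cannot be negative, only the upper clamp can take effect in this model.

```python
def paddusb(dst, src):
    """Model of PADDUSB on lists of sixteen unsigned byte values (0..255)."""
    assert len(dst) == len(src) == 16
    # Sums above FFh are clamped to FFh rather than wrapping around.
    return [min(0xFF, a + b) for a, b in zip(dst, src)]

pixels = [0, 10, 200, 250, 255] + [128] * 11
brighter = paddusb(pixels, [16] * 16)
# brighter[:5] == [16, 26, 216, 255, 255]
```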

### Related Instructions

PADDB, PADDD, PADDQ, PADDSB, PADDSW, PADDUSW, PADDW

### rFLAGS Affected

None

### MXCSR Flags Affected

None

### Exceptions

| Exception                 | Real | Virtual 8086 | Protected | Cause of Exception                                                                            |
|---------------------------|------|--------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X            | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X            | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                           | X    | X            | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.             |
| Device not available, #NM | X    | X            | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |
| Stack, #SS                | X    | X            | X         | A memory address exceeded the stack segment limit or was non-canonical.                       |
| General protection, #GP   | X    | X            | X         | A memory address exceeded a data segment limit or was non-canonical.                          |
|                           |      |              | X         | A null data segment was used to reference memory.                                             |
|                           | X    | X            | X         | The memory operand was not aligned on a 16-byte boundary.                                     |
| Page fault, #PF           |      | X            | X         | A page fault resulted from the execution of the instruction.                                  |

# PADDUSW Packed Add Unsigned with Saturation Words

Adds each packed 16-bit unsigned integer value in the first source operand to the corresponding packed 16-bit unsigned integer in the second source operand and writes the unsigned integer result of each addition in the corresponding word of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.



For each packed value in the destination, if the value is larger than the largest unsigned 16-bit integer, it is saturated to FFFFh, and if the value is smaller than the smallest unsigned 16-bit integer, it is saturated to 0000h.
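
One way to picture this behavior is a set of eight 16-bit counters that stick at FFFFh instead of rolling over. The Python sketch below models that case; the counter scenario and the name `paddusw` are assumptions introduced for illustration.

```python
def paddusw(dst, src):
    """Model of PADDUSW on lists of eight unsigned 16-bit values (lane 0 first)."""
    assert len(dst) == len(src) == 8
    # Sums above FFFFh are clamped to FFFFh rather than wrapping around.
    return [min(0xFFFF, a + b) for a, b in zip(dst, src)]

counters = [0xFFF0, 0x0001, 0x8000, 0xFFFF, 0, 0, 0, 0]
step = [0x0020, 0x0001, 0x8000, 0x0001, 0, 0, 0, 0]
counters = paddusw(counters, step)
# counters[:4] == [0xFFFF, 0x0002, 0xFFFF, 0xFFFF]
```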

### Related Instructions

PADDB, PADDD, PADDQ, PADDSB, PADDSW, PADDUSB, PADDW

### rFLAGS Affected

None

### MXCSR Flags Affected

None

### Exceptions

| Exception                 | Real | Virtual 8086 | Protected | Cause of Exception                                                                            |
|---------------------------|------|--------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X            | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X            | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                           | X    | X            | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.             |
| Device not available, #NM | X    | X            | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |
| Stack, #SS                | X    | X            | X         | A memory address exceeded the stack segment limit or was non-canonical.                       |
| General protection, #GP   | X    | X            | X         | A memory address exceeded a data segment limit or was non-canonical.                          |
|                           |      |              | X         | A null data segment was used to reference memory.                                             |
|                           | X    | X            | X         | The memory operand was not aligned on a 16-byte boundary.                                     |
| Page fault, #PF           |      | X            | X         | A page fault resulted from the execution of the instruction.                                  |

# PADDW Packed Add Words

Adds each packed 16-bit integer value in the first source operand to the corresponding packed 16-bit integer in the second source operand and writes the integer result of each addition in the corresponding word of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.



This instruction operates on both signed and unsigned integers. If the result overflows, the carry is ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 16 bits of each result are written in the destination.
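
The following Python sketch models the word-wise wraparound. It assumes each operand is a list of eight word values in the range 0 to FFFFh with element 0 as the lowest lane; the name `paddw` is used only for this illustration.

```python
def paddw(dst, src):
    """Model of PADDW on lists of eight 16-bit values (0..0xFFFF), lane 0 first."""
    assert len(dst) == len(src) == 8
    # Keep only the low-order 16 bits of each sum; the carry is discarded.
    return [(a + b) & 0xFFFF for a, b in zip(dst, src)]

a = [0xFFFF, 0x7FFF, 0x0005, 0x8000, 1, 2, 3, 4]
b = [0x0001, 0x0001, 0xFFFF, 0x8000, 1, 2, 3, 4]
total = paddw(a, b)
# total == [0x0000, 0x8000, 0x0004, 0x0000, 2, 4, 6, 8]
```

In the third lane, adding FFFFh (the encoding of -1) to 5 yields 4, which is why the same instruction serves both signed and unsigned data.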

### Related Instructions

PADDB, PADDD, PADDQ, PADDSB, PADDSW, PADDUSB, PADDUSW

### rFLAGS Affected

None

### MXCSR Flags Affected

None

### Exceptions

| Exception                 | Real | Virtual 8086 | Protected | Cause of Exception                                                                            |
|---------------------------|------|--------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X            | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X            | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                           | X    | X            | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.             |
| Device not available, #NM | X    | X            | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |
| Stack, #SS                | X    | X            | X         | A memory address exceeded the stack segment limit or was non-canonical.                       |
| General protection, #GP   | X    | X            | X         | A memory address exceeded a data segment limit or was non-canonical.                          |
|                           |      |              | X         | A null data segment was used to reference memory.                                             |
|                           | X    | X            | X         | The memory operand was not aligned on a 16-byte boundary.                                     |
| Page fault, #PF           |      | X            | X         | A page fault resulted from the execution of the instruction.                                  |

# PAND Packed Logical Bitwise AND

Performs a bitwise logical AND of the values in the first and second source operands and writes the result in the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.
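
A typical use of a 128-bit AND is masking off unwanted bits in every element at once. The Python sketch below clears the high byte of each 16-bit word; representing an XMM value as a single Python integer built from little-endian bytes, and the name `pand`, are assumptions made for illustration.

```python
MASK128 = (1 << 128) - 1

def pand(dst: int, src: int) -> int:
    """Model of PAND on 128-bit values held in Python integers."""
    return dst & src & MASK128

# Sixteen bytes 10h..1Fh, with byte 0 as the least-significant byte.
data = int.from_bytes(bytes(range(0x10, 0x20)), "little")
# Mask that keeps the low byte of every 16-bit word.
low_byte_mask = int.from_bytes(b"\xff\x00" * 8, "little")
masked = pand(data, low_byte_mask)
# masked.to_bytes(16, "little").hex() == "100012001400160018001a001c001e00"
```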



### **Related Instructions**

PANDN, POR, PXOR

### rFLAGS Affected

None

### **MXCSR Flags Affected**

None

### **Exceptions**

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception |
|---------------------------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1. |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0. |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical. |
|                           |      |                 | X         | A null data segment was used to reference memory. |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction. |

# PANDN Packed Logical Bitwise AND NOT

Performs a bitwise logical AND of the value in the second source operand and the one's complement of the value in the first source operand and writes the result in the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.
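
The operand order is easy to get wrong: it is the first source (the destination), not the second source, that is complemented. A minimal C sketch of that behavior follows; `xmm128` and `pandn_model` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit XMM value: two 64-bit halves. */
typedef struct {
    uint64_t lo; /* bits 63-0   */
    uint64_t hi; /* bits 127-64 */
} xmm128;

/* Scalar model of PANDN: dest = (NOT dest) AND src across all 128 bits. */
void pandn_model(xmm128 *dest, const xmm128 *src)
{
    assert(dest != NULL && src != NULL);
    dest->lo = ~dest->lo & src->lo;
    dest->hi = ~dest->hi & src->hi;
}
```

In this model, a destination that holds a mask ends up holding the bits of the second source at every position where the mask was 0.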



### **Related Instructions**

PAND, POR, PXOR

### **rFLAGS** Affected

None

### **MXCSR Flags Affected**

None

### **Exceptions**

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception |
|---------------------------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1. |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0. |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical. |
|                           |      |                 | X         | A null data segment was used to reference memory. |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction. |

# PAVGB Packed Average Unsigned Bytes

Computes the rounded average of each packed unsigned 8-bit integer value in the first source operand and the corresponding packed unsigned 8-bit integer value in the second source operand and writes each average in the corresponding byte of the destination (first source). The average is computed by adding each pair of operands, adding 1 to the 9-bit temporary sum, and then right-shifting the temporary sum by one bit position. The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.
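
The add, add-1, shift sequence described above can be sketched per byte in portable C. The function name `pavgb_model` and the array representation of the operands are assumptions made for this illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAVGB_LANES 16 /* sixteen bytes in a 128-bit operand */

/* Scalar model of PAVGB: dest[i] = (dest[i] + src[i] + 1) >> 1. */
void pavgb_model(uint8_t dest[PAVGB_LANES], const uint8_t src[PAVGB_LANES])
{
    assert(dest != NULL && src != NULL);
    for (size_t i = 0; i < PAVGB_LANES; i++) {
        /* Two bytes plus 1 sum to at most 511, which fits in 9 bits. */
        unsigned int sum = (unsigned int)dest[i] + (unsigned int)src[i] + 1u;
        dest[i] = (uint8_t)(sum >> 1);
    }
}
```

Adding 1 before the shift rounds a fractional average up: averaging 1 and 2 gives 2, and averaging 255 and 255 gives 255 with no overflow.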




### **Related Instructions**

PAVGW

### **rFLAGS** Affected

None

### **MXCSR Flags Affected**

None

### **Exceptions**

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception |
|---------------------------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE instructions are not supported, as indicated by bit 25 in CPUID standard function 1; and the AMD extensions to MMX are not supported, as indicated by bit 22 of CPUID extended function 8000_0001. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1. |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0. |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical. |
|                           |      |                 | X         | A null data segment was used to reference memory. |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction. |

# PAVGW Packed Average Unsigned Words

Computes the rounded average of each packed unsigned 16-bit integer value in the first source operand and the corresponding packed 16-bit unsigned integer in the second source operand and writes each average in the corresponding word of the destination (first source). The average is computed by adding each pair of operands, adding 1 to the 17-bit temporary sum, and then right-shifting the temporary sum by one bit position. The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.
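
A per-word sketch of the same computation in portable C follows. The name `pavgw_model` and the array representation are hypothetical, chosen only to illustrate the 17-bit temporary sum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAVGW_LANES 8 /* eight 16-bit words in a 128-bit operand */

/* Scalar model of PAVGW: dest[i] = (dest[i] + src[i] + 1) >> 1. */
void pavgw_model(uint16_t dest[PAVGW_LANES], const uint16_t src[PAVGW_LANES])
{
    assert(dest != NULL && src != NULL);
    for (size_t i = 0; i < PAVGW_LANES; i++) {
        /* Two words plus 1 sum to at most 131071, which fits in 17 bits. */
        uint32_t sum = (uint32_t)dest[i] + (uint32_t)src[i] + 1u;
        dest[i] = (uint16_t)(sum >> 1);
    }
}
```

The wider temporary is what keeps the average of two maximum values (0xFFFF and 0xFFFF) at 0xFFFF instead of wrapping.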





### **Related Instructions**

PAVGB

### **rFLAGS** Affected

None

### **MXCSR Flags Affected**

None

### **Exceptions**

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception |
|---------------------------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE instructions are not supported, as indicated by bit 25 in CPUID standard function 1; and the AMD extensions to MMX are not supported, as indicated by bit 22 of CPUID extended function 8000_0001. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1. |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0. |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical. |
|                           |      |                 | X         | A null data segment was used to reference memory. |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction. |

# PCMPEQB Packed Compare Equal Bytes

Compares corresponding packed bytes in the first and second source operands and writes the result of each comparison in the corresponding byte of the destination (first source). For each pair of bytes, if the values are equal, the result is all 1s. If the values are not equal, the result is all 0s. The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

| Mnemonic                  | Opcode      | Description |
|---------------------------|-------------|-------------|
| PCMPEQB xmm1, xmm2/mem128 | 66 0F 74 /r | Compares packed bytes in an XMM register and an XMM register or 128-bit memory location. |
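
The all-1s or all-0s result per byte can be sketched in portable C as follows. `pcmpeqb_model` and the byte-array operands are hypothetical names for this illustration, not an actual API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCMPEQB_LANES 16 /* sixteen bytes in a 128-bit operand */

/* Scalar model of PCMPEQB: each result byte is 0xFF if equal, else 0x00. */
void pcmpeqb_model(uint8_t dest[PCMPEQB_LANES], const uint8_t src[PCMPEQB_LANES])
{
    assert(dest != NULL && src != NULL);
    for (size_t i = 0; i < PCMPEQB_LANES; i++) {
        dest[i] = (dest[i] == src[i]) ? 0xFFu : 0x00u;
    }
}
```

The result is a per-byte mask, which is why it combines naturally with the logical instructions such as PAND and PANDN.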



### **Related Instructions**

PCMPEQD, PCMPEQW, PCMPGTB, PCMPGTD, PCMPGTW

### **rFLAGS** Affected

None

### **MXCSR Flags Affected**

None

### **Exceptions**

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception |
|---------------------------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1. |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0. |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical. |
|                           |      |                 | X         | A null data segment was used to reference memory. |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction. |

# PCMPEQD Packed Compare Equal Doublewords

Compares corresponding packed 32-bit values in the first and second source operands and writes the result of each comparison in the corresponding 32 bits of the destination (first source). For each pair of doublewords, if the values are equal, the result is all 1s. If the values are not equal, the result is all 0s. The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.
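
A doubleword-granular sketch in portable C is given here for illustration; `pcmpeqd_model` and the array operands are hypothetical. Note that a single differing bit anywhere in a doubleword clears the entire 32-bit result field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCMPEQD_LANES 4 /* four doublewords in a 128-bit operand */

/* Scalar model of PCMPEQD: each result is 0xFFFFFFFF if equal, else 0. */
void pcmpeqd_model(uint32_t dest[PCMPEQD_LANES], const uint32_t src[PCMPEQD_LANES])
{
    assert(dest != NULL && src != NULL);
    for (size_t i = 0; i < PCMPEQD_LANES; i++) {
        dest[i] = (dest[i] == src[i]) ? 0xFFFFFFFFu : 0u;
    }
}
```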



### **Related Instructions**

PCMPEQB, PCMPEQW, PCMPGTB, PCMPGTD, PCMPGTW

### **rFLAGS** Affected

None

### **MXCSR Flags Affected**

None

### **Exceptions**

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception |
|---------------------------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1. |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0. |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical. |
|                           |      |                 | X         | A null data segment was used to reference memory. |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction. |

# PCMPEQW Packed Compare Equal Words

Compares corresponding packed 16-bit values in the first and second source operands and writes the result of each comparison in the corresponding 16 bits of the destination (first source). For each pair of words, if the values are equal, the result is all 1s. If the values are not equal, the result is all 0s. The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.
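
For the word-sized form, a minimal portable C sketch might look like this. The names `pcmpeqw_model` and `PCMPEQW_LANES` are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCMPEQW_LANES 8 /* eight 16-bit words in a 128-bit operand */

/* Scalar model of PCMPEQW: each result word is 0xFFFF if equal, else 0. */
void pcmpeqw_model(uint16_t dest[PCMPEQW_LANES], const uint16_t src[PCMPEQW_LANES])
{
    assert(dest != NULL && src != NULL);
    for (size_t i = 0; i < PCMPEQW_LANES; i++) {
        dest[i] = (dest[i] == src[i]) ? 0xFFFFu : 0u;
    }
}
```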



### **Related Instructions**

PCMPEQB, PCMPEQD, PCMPGTB, PCMPGTD, PCMPGTW

### **rFLAGS** Affected

None

### **MXCSR Flags Affected**

None

### **Exceptions**

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception |
|---------------------------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1. |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0. |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical. |
|                           |      |                 | X         | A null data segment was used to reference memory. |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction. |

# PCMPGTB Packed Compare Greater Than Signed Bytes

Compares corresponding packed signed bytes in the first and second source operands and writes the result of each comparison in the corresponding byte of the destination (first source). For each pair of bytes, if the value in the first source operand is greater than the value in the second source operand, the result is all 1s. If the value in the first source operand is less than or equal to the value in the second source operand, the result is all 0s. The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.
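
Because the comparison is signed, a byte such as FFh is treated as -1 and is therefore not greater than 01h. The following portable C sketch illustrates this; `pcmpgtb_model` and the `int8_t` array operands are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCMPGTB_LANES 16 /* sixteen bytes in a 128-bit operand */

/* Scalar model of PCMPGTB: result byte is all 1s (-1) if dest > src, else 0. */
void pcmpgtb_model(int8_t dest[PCMPGTB_LANES], const int8_t src[PCMPGTB_LANES])
{
    assert(dest != NULL && src != NULL);
    for (size_t i = 0; i < PCMPGTB_LANES; i++) {
        /* int8_t is two's complement, so -1 has every bit set. */
        dest[i] = (dest[i] > src[i]) ? (int8_t)-1 : (int8_t)0;
    }
}
```

Equal values produce all 0s, since the test is strictly greater than.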



### **Related Instructions**

PCMPEQB, PCMPEQD, PCMPEQW, PCMPGTD, PCMPGTW

### **rFLAGS** Affected

None

### **MXCSR Flags Affected**

None

### **Exceptions**

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception |
|---------------------------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1. |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0. |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical. |
|                           |      |                 | X         | A null data segment was used to reference memory. |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction. |

# PCMPGTD Packed Compare Greater Than Signed Doublewords

Compares corresponding packed signed 32-bit values in the first and second source operands and writes the result of each comparison in the corresponding 32 bits of the destination (first source). For each pair of doublewords, if the value in the first source operand is greater than the value in the second source operand, the result is all 1s. If the value in the first source operand is less than or equal to the value in the second source operand, the result is all 0s. The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

| Mnemonic                  | Opcode      | Description |
|---------------------------|-------------|-------------|
| PCMPGTD xmm1, xmm2/mem128 | 66 0F 66 /r | Compares packed signed 32-bit values in an XMM register and an XMM register or 128-bit memory location. |
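
The signed doubleword comparison can be sketched in portable C as follows. `pcmpgtd_model` and the `int32_t` array operands are hypothetical and serve only to illustrate the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCMPGTD_LANES 4 /* four doublewords in a 128-bit operand */

/* Scalar model of PCMPGTD: result is all 1s (-1) if dest > src, else 0. */
void pcmpgtd_model(int32_t dest[PCMPGTD_LANES], const int32_t src[PCMPGTD_LANES])
{
    assert(dest != NULL && src != NULL);
    for (size_t i = 0; i < PCMPGTD_LANES; i++) {
        dest[i] = (dest[i] > src[i]) ? (int32_t)-1 : (int32_t)0;
    }
}
```

In this model the most negative value, 80000000h, compares as less than every other doubleword, which an unsigned comparison would not do.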

### **Related Instructions**

PCMPEQB, PCMPEQD, PCMPEQW, PCMPGTB, PCMPGTW

### **rFLAGS** Affected

None

### **MXCSR Flags Affected**

None

### **Exceptions**

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception |
|---------------------------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1. |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0. |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical. |
|                           |      |                 | X         | A null data segment was used to reference memory. |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction. |

# PCMPGTW Packed Compare Greater Than Signed Words

Compares corresponding packed signed 16-bit values in the first and second source operands and writes the result of each comparison in the corresponding 16 bits of the destination (first source). For each pair of words, if the value in the first source operand is greater than the value in the second source operand, the result is all 1s. If the value in the first source operand is less than or equal to the value in the second source operand, the result is all 0s. The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.
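
A word-sized sketch of the same signed comparison in portable C is shown here. The names `pcmpgtw_model` and `PCMPGTW_LANES` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCMPGTW_LANES 8 /* eight 16-bit words in a 128-bit operand */

/* Scalar model of PCMPGTW: result word is all 1s (-1) if dest > src, else 0. */
void pcmpgtw_model(int16_t dest[PCMPGTW_LANES], const int16_t src[PCMPGTW_LANES])
{
    assert(dest != NULL && src != NULL);
    for (size_t i = 0; i < PCMPGTW_LANES; i++) {
        dest[i] = (dest[i] > src[i]) ? (int16_t)-1 : (int16_t)0;
    }
}
```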



### **Related Instructions**

PCMPEQB, PCMPEQD, PCMPEQW, PCMPGTB, PCMPGTD

### **rFLAGS** Affected

None

### **MXCSR Flags Affected**

None

### **Exceptions**

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception |
|---------------------------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1. |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0. |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical. |
|                           |      |                 | X         | A null data segment was used to reference memory. |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction. |

# PEXTRW Extract Packed Word

Extracts a 16-bit value from an XMM register, as selected by the immediate byte operand (as shown in Table 1-2), and writes it to the low-order word of a 32-bit general-purpose register, with zero-extension to 32 bits.

| Mnemonic                | Opcode         | Description |
|-------------------------|----------------|-------------|
| PEXTRW reg32, xmm, imm8 | 66 0F C5 /r ib | Extracts a 16-bit value from an XMM register and writes it to the low-order 16 bits of a general-purpose register. |



| Immediate-Byte<br>Bit Field | Value of Bit Field | Source Bits Extracted |
|-----------------------------|--------------------|-----------------------|
| 2–0                         | 0                  | 15–0                  |
| 2–0                         | 1                  | 31–16                 |
| 2–0                         | 2                  | 47–32                 |
| 2–0                         | 3                  | 63–48                 |
| 2–0                         | 4                  | 79–64                 |
| 2–0                         | 5                  | 95–80                 |
| 2–0                         | 6                  | 111–96                |
| 2–0                         | 7                  | 127–112               |

### Table 1-2. Immediate-Byte Operand Encoding for 128-Bit PEXTRW
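
The selection in Table 1-2 can be sketched in portable C. In this illustration, `pextrw_model` is a hypothetical function, the XMM register is represented as an array of eight words with element 0 holding bits 15–0, and masking the immediate to bits 2–0 is an assumption drawn from the table's bit-field column.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Scalar model of 128-bit PEXTRW.
 * xmm_words[0] holds bits 15-0 and xmm_words[7] holds bits 127-112.
 * Only bits 2-0 of imm8 select the word (assumption based on Table 1-2).
 */
uint32_t pextrw_model(const uint16_t xmm_words[8], uint8_t imm8)
{
    assert(xmm_words != NULL);
    /* Widening a uint16_t to uint32_t zero-extends it to 32 bits. */
    return (uint32_t)xmm_words[imm8 & 7u];
}
```

For example, an immediate of 3 selects bits 63–48 of the source, and the upper 16 bits of the returned value are always 0.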

### **Related Instructions**

PINSRW

### **rFLAGS** Affected

None

### **MXCSR Flags Affected**

None

### **Exceptions**

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception |
|---------------------------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE instructions are not supported, as indicated by bit 25 in CPUID standard function 1; and the AMD extensions to MMX are not supported, as indicated by bit 22 of CPUID extended function 8000_0001. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1. |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0. |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1. |

# PINSRW Packed Insert Word

Inserts a 16-bit value from the low-order word of a 32-bit general-purpose register or a 16-bit memory location into an XMM register. The location in the destination register is selected by the immediate byte operand, as shown in Table 1-3. The other words in the destination register operand are not modified.




| Immediate-Byte<br>Bit Field | Value of Bit Field | Destination Bits Filled |
|-----------------------------|--------------------|-------------------------|
| 2–0                         | 0                  | 15–0                    |
| 2–0                         | 1                  | 31–16                   |
| 2–0                         | 2                  | 47–32                   |
| 2–0                         | 3                  | 63–48                   |
| 2–0                         | 4                  | 79–64                   |
| 2–0                         | 5                  | 95–80                   |
| 2–0                         | 6                  | 111–96                  |
| 2–0                         | 7                  | 127–112                 |

### Table 1-3. Immediate-Byte Operand Encoding for 128-Bit PINSRW
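
The word selection in Table 1-3 can be sketched as a portable scalar model in C. This is an illustrative sketch, not vendor code: the `xmm_words` type and the `pinsrw_model` function are hypothetical names, word 0 is assumed to hold bits 15–0, and immediate bits above bit 2 are assumed to be ignored.

```c
#include <stdint.h>

/* Hypothetical model of an XMM register as eight 16-bit words;
   w[0] holds bits 15-0 and w[7] holds bits 127-112. */
typedef struct { uint16_t w[8]; } xmm_words;

/* Sketch of PINSRW: bits 2-0 of the immediate select the destination word. */
xmm_words pinsrw_model(xmm_words dst, uint32_t src, uint8_t imm8)
{
    dst.w[imm8 & 7] = (uint16_t)src; /* low-order word of the 32-bit source */
    return dst;                      /* the other seven words are unchanged */
}
```

With an immediate value of 3, for example, the model overwrites only bits 63–48 of the destination.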

### **Related Instructions**

PEXTRW

### rFLAGS Affected

None

### **MXCSR Flags Affected**

None

| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The SSE instructions are not supported, as indicated by bit 25 in CPUID standard function 1; and the AMD extensions to MMX are not supported, as indicated by bit 22 of CPUID extended function 8000_0001. |
|  | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
|  | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
|  |  |  | X | A null data segment was used to reference memory. |
| Page fault, #PF |  | X | X | A page fault resulted from the execution of the instruction. |
| Alignment check, #AC |  | X | X | An unaligned memory reference was performed while alignment checking was enabled. |

# **PMADDWD**

# **Packed Multiply Words and Add Doublewords**

Multiplies each packed 16-bit signed value in the first source operand by the corresponding packed 16-bit signed value in the second source operand, adds each adjacent pair of intermediate 32-bit products (for example, the products for bit fields 63–48 and 47–32, and the products for bit fields 31–16 and 15–0), and writes each 32-bit sum in the corresponding doubleword of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.





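There is only one case in which the result of the multiplication and addition will not fit in a signed 32-bit destination. If all four of the 16-bit source operands used to produce a 32-bit multiply-add result have the value 8000h, each product is 4000_0000h and their sum is 8000_0000h (+2<sup>31</sup>), which exceeds the largest positive signed 32-bit value. The 32-bit result is written as 8000_0000h, which represents −2<sup>31</sup> and is therefore incorrect.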
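
The multiply-add and its single overflow case can be sketched as a scalar model in C. This is a hedged illustration only: `pmaddwd_model` is a hypothetical name, element 0 is assumed to be the low-order element, and the results are returned as raw 32-bit patterns so that the wraparound is visible without relying on implementation-defined signed conversions.

```c
#include <stdint.h>

/* Hypothetical scalar model of PMADDWD.
   a, b: eight signed words each; out: four raw 32-bit results. */
void pmaddwd_model(const int16_t a[8], const int16_t b[8], uint32_t out[4])
{
    for (int i = 0; i < 4; i++) {
        /* Each product fits in 32 bits; the sum is formed in 64 bits
           so the overflow case can be observed before truncation. */
        int64_t sum = (int64_t)a[2 * i] * b[2 * i]
                    + (int64_t)a[2 * i + 1] * b[2 * i + 1];
        out[i] = (uint32_t)sum; /* keep the low 32 bits, no saturation */
    }
}
```

When both words of a pair are 8000h in both sources, the model produces 8000_0000h, matching the case described above.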

### **Related Instructions**

PMULHUW, PMULHW, PMULLW, PMULUDQ

### rFLAGS Affected

None

### **MXCSR Flags Affected**

None

| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|  | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
|  | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
|  |  |  | X | A null data segment was used to reference memory. |
|  | X | X | X | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF |  | X | X | A page fault resulted from the execution of the instruction. |

# PMAXSW

# **Packed Maximum Signed Words**

Compares each of the packed 16-bit signed integer values in the first source operand with the corresponding packed 16-bit signed integer value in the second source operand and writes the numerically greater of the two values for each comparison in the corresponding word of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

| Mnemonic | Opcode | Description |
|---|---|---|
| PMAXSW xmm1, xmm2/mem128 | 66 0F EE /*r* | Compares packed signed 16-bit integer values in an XMM register and another XMM register or 128-bit memory location and writes the greater value of each comparison in the destination XMM register. |
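
The per-word comparison can be sketched as a scalar loop in C. This is a minimal sketch under stated assumptions: `pmaxsw_model` is a hypothetical name, and each operand is modeled as an array of eight signed words with element 0 as the low-order word.

```c
#include <stdint.h>

/* Hypothetical scalar model of PMAXSW; dst is also the first source. */
void pmaxsw_model(int16_t dst[8], const int16_t src[8])
{
    for (int i = 0; i < 8; i++) {
        if (src[i] > dst[i])  /* signed comparison */
            dst[i] = src[i];
    }
}
```

Because the comparison is signed, a word holding 8000h (−32768) is treated as the smallest possible value, not the largest.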



pmaxsw-128.eps

### **Related Instructions**

PMAXUB, PMINSW, PMINUB

### rFLAGS Affected

None

### **MXCSR Flags Affected**

None

| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The SSE instructions are not supported, as indicated by bit 25 in CPUID standard function 1; and the AMD extensions to MMX are not supported, as indicated by bit 22 of CPUID extended function 8000_0001. |
|  | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
|  | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
|  |  |  | X | A null data segment was used to reference memory. |
|  | X | X | X | A memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF |  | X | X | A page fault resulted from the execution of the instruction. |

# PMAXUB

# Packed Maximum Unsigned Bytes

Compares each of the packed 8-bit unsigned integer values in the first source operand with the corresponding packed 8-bit unsigned integer value in the second source operand and writes the numerically greater of the two values for each comparison in the corresponding byte of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.
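
A scalar sketch in C illustrates the unsigned byte comparison. The function name `pmaxub_model` is hypothetical, and each operand is modeled as sixteen unsigned bytes with element 0 as the low-order byte.

```c
#include <stdint.h>

/* Hypothetical scalar model of PMAXUB; dst is also the first source. */
void pmaxub_model(uint8_t dst[16], const uint8_t src[16])
{
    for (int i = 0; i < 16; i++) {
        /* Unsigned comparison: 80h (128) is greater than 7Fh (127). */
        dst[i] = (src[i] > dst[i]) ? src[i] : dst[i];
    }
}
```

The contrast with a signed comparison shows up for bytes with the high bit set: here 80h wins over 7Fh.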





### **Related Instructions**

PMAXSW, PMINSW, PMINUB

### **rFLAGS** Affected

None

### **MXCSR Flags Affected**

None

| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The SSE instructions are not supported, as indicated by bit 25 in CPUID standard function 1; and the AMD extensions to MMX are not supported, as indicated by bit 22 of CPUID extended function 8000_0001. |
|  | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
|  | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
|  |  |  | X | A null data segment was used to reference memory. |
|  | X | X | X | A memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF |  | X | X | A page fault resulted from the execution of the instruction. |

# **PMINSW**

# **Packed Minimum Signed Words**

Compares each of the packed 16-bit signed integer values in the first source operand with the corresponding packed 16-bit signed integer value in the second source operand and writes the numerically lesser of the two values for each comparison in the corresponding word of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.
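
The signed minimum can be sketched as a scalar model in C. The name `pminsw_model` is hypothetical, and element 0 of each array is assumed to be the low-order word.

```c
#include <stdint.h>

/* Hypothetical scalar model of PMINSW; dst is also the first source. */
void pminsw_model(int16_t dst[8], const int16_t src[8])
{
    for (int i = 0; i < 8; i++) {
        if (src[i] < dst[i])  /* signed comparison */
            dst[i] = src[i];
    }
}
```

One possible use, shown here only as an illustration, is to cap every word at an upper bound by passing a second operand that repeats the bound in all eight words.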





pminsw-128.eps

#### **Related Instructions**

PMAXSW, PMAXUB, PMINUB

### **rFLAGS** Affected

None

### **MXCSR Flags Affected**

None

| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The SSE instructions are not supported, as indicated by bit 25 in CPUID standard function 1; and the AMD extensions to MMX are not supported, as indicated by bit 22 of CPUID extended function 8000_0001. |
|  | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
|  | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
|  |  |  | X | A null data segment was used to reference memory. |
|  | X | X | X | A memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF |  | X | X | A page fault resulted from the execution of the instruction. |

# **PMINUB**

# **Packed Minimum Unsigned Bytes**

Compares each of the packed 8-bit unsigned integer values in the first source operand with the corresponding packed 8-bit unsigned integer value in the second source operand and writes the numerically lesser of the two values for each comparison in the corresponding byte of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.
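
The unsigned byte minimum can be sketched in the same scalar style. The name `pminub_model` is hypothetical, and element 0 of each array is assumed to be the low-order byte.

```c
#include <stdint.h>

/* Hypothetical scalar model of PMINUB; dst is also the first source. */
void pminub_model(uint8_t dst[16], const uint8_t src[16])
{
    for (int i = 0; i < 16; i++) {
        /* Unsigned comparison: 7Fh (127) is less than 80h (128). */
        dst[i] = (src[i] < dst[i]) ? src[i] : dst[i];
    }
}
```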



pminub-128.eps

#### **Related Instructions**

PMAXSW, PMAXUB, PMINSW

### **rFLAGS** Affected

None

### **MXCSR Flags Affected**

None

| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The SSE instructions are not supported, as indicated by bit 25 in CPUID standard function 1; and the AMD extensions to MMX are not supported, as indicated by bit 22 of CPUID extended function 8000_0001. |
|  | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
|  | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
|  |  |  | X | A null data segment was used to reference memory. |
|  | X | X | X | A memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF |  | X | X | A page fault resulted from the execution of the instruction. |

# PMOVMSKB Packed Move Mask Byte

Moves the most-significant bit of each byte in the source operand to the destination, with zero-extension to 32 bits. The destination operand is a 32-bit general-purpose register and the source operand is an XMM register. The 16 mask bits are written to the low-order word of the general-purpose register, with bit 0 taken from the low-order byte of the source.
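
The mask construction can be sketched as a scalar loop in C. This is an illustrative model rather than a definitive implementation: `pmovmskb_model` is a hypothetical name, and `src[0]` is assumed to be the low-order byte (bits 7–0) of the XMM register.

```c
#include <stdint.h>

/* Hypothetical scalar model of PMOVMSKB. */
uint32_t pmovmskb_model(const uint8_t src[16])
{
    uint32_t mask = 0;
    for (int i = 0; i < 16; i++)
        mask |= (uint32_t)(src[i] >> 7) << i; /* bit i = MSB of byte i */
    return mask;                              /* bits 31-16 remain zero */
}
```

A source whose bytes 0, 2, and 15 have their high bit set, for instance, yields the mask 8005h under this model.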



pmovmskb-128.eps

#### **Related Instructions**

MOVMSKPD, MOVMSKPS

#### **rFLAGS** Affected

None

#### **MXCSR Flags Affected**

None

| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|  | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
|  | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |

# **PMULHUW**

# **Packed Multiply High Unsigned Word**

Multiplies each packed unsigned 16-bit value in the first source operand by the corresponding packed unsigned word in the second source operand and writes the high-order 16 bits of each intermediate 32-bit result in the corresponding word of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.
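
The high-half selection can be sketched as a scalar model in C. The name `pmulhuw_model` is hypothetical, and element 0 of each array is assumed to be the low-order word.

```c
#include <stdint.h>

/* Hypothetical scalar model of PMULHUW; dst is also the first source. */
void pmulhuw_model(uint16_t dst[8], const uint16_t src[8])
{
    for (int i = 0; i < 8; i++) {
        uint32_t product = (uint32_t)dst[i] * src[i]; /* full 32-bit product */
        dst[i] = (uint16_t)(product >> 16);           /* high-order 16 bits  */
    }
}
```

For FFFFh multiplied by FFFFh, the full product is FFFE_0001h, so the model stores FFFEh.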



pmulhuw-128.eps

#### **Related Instructions**

PMADDWD, PMULHW, PMULLW, PMULUDQ

#### **rFLAGS** Affected

None

#### **MXCSR Flags Affected**

None

| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The SSE instructions are not supported, as indicated by bit 25 in CPUID standard function 1; and the AMD extensions to MMX are not supported, as indicated by bit 22 of CPUID extended function 8000_0001. |
|  | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
|  | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
|  |  |  | X | A null data segment was used to reference memory. |
|  | X | X | X | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF |  | X | X | A page fault resulted from the execution of the instruction. |

# PMULHW

# **Packed Multiply High Signed Word**

Multiplies each packed 16-bit signed integer value in the first source operand by the corresponding packed 16-bit signed integer in the second source operand and writes the high-order 16 bits of the intermediate 32-bit result of each multiplication in the corresponding word of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.
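
The signed high-half multiply can be sketched as a scalar model in C. This is a sketch under stated assumptions: `pmulhw_model` and the helper `sext16` are hypothetical names, words are held as raw 16-bit patterns, and sign extension is done arithmetically to avoid implementation-defined conversions.

```c
#include <stdint.h>

/* Hypothetical helper: sign-extend a raw 16-bit word to 32 bits. */
int32_t sext16(uint16_t x)
{
    return (int32_t)(x ^ 0x8000u) - 0x8000;
}

/* Hypothetical scalar model of PMULHW on raw word patterns;
   dst is also the first source. */
void pmulhw_model(uint16_t dst[8], const uint16_t src[8])
{
    for (int i = 0; i < 8; i++) {
        int32_t product = sext16(dst[i]) * sext16(src[i]); /* signed product */
        dst[i] = (uint16_t)((uint32_t)product >> 16);      /* high 16 bits   */
    }
}
```

The signed interpretation matters for the high half: FFFFh (−1) multiplied by 1 gives a high word of FFFFh here, whereas an unsigned multiply of the same patterns would give 0000h.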



pmulhw-128.eps

#### **Related Instructions**

PMADDWD, PMULHUW, PMULLW, PMULUDQ

#### **rFLAGS** Affected

None

#### **MXCSR Flags Affected**

None

| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|  | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
|  | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
|  |  |  | X | A null data segment was used to reference memory. |
|  | X | X | X | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF |  | X | X | A page fault resulted from the execution of the instruction. |

# PMULLW

# **Packed Multiply Low Signed Word**

Multiplies each packed 16-bit signed integer value in the first source operand by the corresponding packed 16-bit signed integer in the second source operand and writes the low-order 16 bits of the intermediate 32-bit result of each multiplication in the corresponding word of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.
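
The low-half selection can be sketched as a scalar model in C. The name `pmullw_model` is hypothetical and words are held as raw 16-bit patterns. The sketch uses an unsigned multiply, relying on the fact that the low-order 16 bits of a 16×16-bit product are the same whether the inputs are read as signed or unsigned.

```c
#include <stdint.h>

/* Hypothetical scalar model of PMULLW on raw word patterns;
   dst is also the first source. */
void pmullw_model(uint16_t dst[8], const uint16_t src[8])
{
    for (int i = 0; i < 8; i++) {
        uint32_t product = (uint32_t)dst[i] * src[i]; /* 32-bit product     */
        dst[i] = (uint16_t)product;                   /* low-order 16 bits  */
    }
}
```

For example, FFFEh (−2) multiplied by 3 yields FFFAh (−6) in the low word.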



pmullw-128.eps

#### **Related Instructions**

PMADDWD, PMULHUW, PMULHW, PMULUDQ

#### **rFLAGS** Affected

None

#### **MXCSR Flags Affected**

None

| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|  | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
|  | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
|  |  |  | X | A null data segment was used to reference memory. |
|  | X | X | X | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF |  | X | X | A page fault resulted from the execution of the instruction. |

# PMULUDQ Packed Multiply Unsigned Doubleword and Store Quadword

Multiplies two pairs of 32-bit unsigned integer values in the first and second source operands and writes the two 64-bit results in the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location. The values multiplied are the first (low-order) and third doublewords of each source operand. The product of the first doublewords is stored in the first (low-order) quadword of the destination XMM register, and the product of the third doublewords is stored in the second quadword.
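
The doubleword selection can be sketched as a scalar model in C. The name `pmuludq_model` is hypothetical, and element 0 of each array is assumed to be the low-order element.

```c
#include <stdint.h>

/* Hypothetical scalar model of PMULUDQ.
   a, b: four doublewords each; out: two quadwords. */
void pmuludq_model(const uint32_t a[4], const uint32_t b[4], uint64_t out[2])
{
    out[0] = (uint64_t)a[0] * b[0]; /* first doublewords -> low quadword  */
    out[1] = (uint64_t)a[2] * b[2]; /* third doublewords -> high quadword */
    /* The second and fourth doublewords (a[1], a[3], b[1], b[3]) are unused. */
}
```

Because each 32×32-bit product is kept in full 64-bit form, no result bits are lost in this operation.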



pmuludq-128.eps

#### **Related Instructions**

PMADDWD, PMULHUW, PMULHW, PMULLW

#### **rFLAGS** Affected

None

#### **MXCSR Flags Affected**

None

| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|  | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
|  | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
|  |  |  | X | A null data segment was used to reference memory. |
|  | X | X | X | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF |  | X | X | A page fault resulted from the execution of the instruction. |

# POR Packed Logical Bitwise OR

Performs a bitwise logical OR of the values in the first and second source operands and writes the result in the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.



#### **Related Instructions**

PAND, PANDN, PXOR

#### rFLAGS Affected

None

#### **MXCSR Flags Affected**

None

#### Exceptions

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                            |
|---------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.              |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                       |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                          |
|                           |      |                 | X         | A null data segment was used to reference memory.                                             |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                     |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction.                                  |

# PSADBW Packed Sum of Absolute Differences of Bytes Into a Word

Computes the absolute differences of eight corresponding packed 8-bit unsigned integers in the first and second source operands and writes the unsigned 16-bit integer result of the sum of the eight differences in a word in the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

| Mnemonic                 | Opcode             | Description                                                                                                                                                                                                                                     |
|--------------------------|--------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| PSADBW xmm1, xmm2/mem128 | 66 0F F6 <i>/r</i> | Computes the sum of the absolute differences of two sets of packed 8-bit unsigned integer values in an XMM register and another XMM register or 128-bit memory location and writes the 16-bit unsigned integer result in the destination XMM register. |

The sum of the differences of the eight bytes in the high-order quadwords of the source operands is written in the least-significant word of the high-order quadword in the destination XMM register, with the remaining bytes cleared to all 0s. The sum of the differences of the eight bytes in the low-order quadwords of the source operands is written in the least-significant word of the low-order quadword in the destination XMM register, with the remaining bytes cleared to all 0s.
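The per-quadword summation can be sketched as a scalar model in portable C. This is an illustration under stated assumptions rather than AMD reference code: `xmm_bytes_t`, `xmm_words_t`, and `psadbw_model` are hypothetical names, and index 0 is assumed to be the low-order element.

```c
#include <stdint.h>

/* Hypothetical model types: b[0] and w[0] are the low-order elements. */
typedef struct { uint8_t  b[16]; } xmm_bytes_t;
typedef struct { uint16_t w[8];  } xmm_words_t;

/* Scalar model of PSADBW xmm1, xmm2/mem128 (illustrative only). */
xmm_words_t psadbw_model(xmm_bytes_t src1, xmm_bytes_t src2)
{
    xmm_words_t dst = {{0}};            /* all other words stay cleared to 0 */
    for (int half = 0; half < 2; half++) {
        uint16_t sum = 0;               /* at most 8 * 255 = 2040, fits in 16 bits */
        for (int i = 0; i < 8; i++) {
            uint8_t x = src1.b[half * 8 + i];
            uint8_t y = src2.b[half * 8 + i];
            sum = (uint16_t)(sum + (x > y ? x - y : y - x));
        }
        dst.w[half * 4] = sum;          /* least-significant word of each quadword */
    }
    return dst;
}
```

Because each byte difference is at most 255, the sum of eight differences never exceeds 2040, so the 16-bit result cannot overflow.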

### rFLAGS Affected

None

### **MXCSR Flags Affected**

None

#### Exceptions

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                                                                                                                                         |
|---------------------------|------|-----------------|-----------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE instructions are not supported, as indicated by bit 25 in CPUID standard function 1; and the AMD extensions to MMX are not supported, as indicated by bit 22 of CPUID extended function 8000_0001. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                                                                                                                                  |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.                                                                                                                           |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                                                                                                                              |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                                                                                                                                    |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                                                                                                                                       |
|                           |      |                 | X         | A null data segment was used to reference memory.                                                                                                                                                          |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                                                                                                                                  |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction.                                                                                                                                               |

# PSHUFD Packed Shuffle Doublewords

Moves any one of the four packed doublewords in an XMM register or 128-bit memory location to each doubleword in another XMM register. In each case, the value of the destination doubleword is determined by a two-bit field in the immediate-byte operand, with bits 0 and 1 selecting the contents of the low-order doubleword, bits 2 and 3 selecting the second doubleword, bits 4 and 5 selecting the third doubleword, and bits 6 and 7 selecting the high-order doubleword. Refer to Table 1-4 on page 277. A doubleword in the source operand may be copied to more than one doubleword in the destination.




| Destination Bits Filled | Immediate-Byte Bit Field | Value of Bit Field | Source Bits Moved |
|-------------------------|--------------------------|--------------------|-------------------|
| 31–0                    | 1–0                      | 0                  | 31–0              |
|                         |                          | 1                  | 63–32             |
|                         |                          | 2                  | 95–64             |
|                         |                          | 3                  | 127–96            |
| 63–32                   | 3–2                      | 0                  | 31–0              |
|                         |                          | 1                  | 63–32             |
|                         |                          | 2                  | 95–64             |
|                         |                          | 3                  | 127–96            |
| 95–64                   | 5–4                      | 0                  | 31–0              |
|                         |                          | 1                  | 63–32             |
|                         |                          | 2                  | 95–64             |
|                         |                          | 3                  | 127–96            |
| 127–96                  | 7–6                      | 0                  | 31–0              |
|                         |                          | 1                  | 63–32             |
|                         |                          | 2                  | 95–64             |
|                         |                          | 3                  | 127–96            |

### Table 1-4. Immediate-Byte Operand Encoding for PSHUFD
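The encoding in Table 1-4 amounts to reading the immediate byte as four 2-bit indexes. A minimal sketch in portable C follows; it is not AMD reference code, the names `xmm_dw_t` and `pshufd_model` are hypothetical, and index 0 is assumed to be the low-order doubleword.

```c
#include <stdint.h>

/* Hypothetical model type: dw[0] is the low-order doubleword (bits 31-0). */
typedef struct { uint32_t dw[4]; } xmm_dw_t;

/* Scalar model of PSHUFD xmm1, xmm2/mem128, imm8 (illustrative only). */
xmm_dw_t pshufd_model(xmm_dw_t src, uint8_t imm8)
{
    xmm_dw_t dst;
    for (int i = 0; i < 4; i++) {
        /* Bits 2i+1..2i of imm8 select the source doubleword for lane i. */
        unsigned sel = (imm8 >> (2 * i)) & 0x3u;
        dst.dw[i] = src.dw[sel];
    }
    return dst;
}
```

In this model an immediate of 1Bh reverses the order of the four doublewords, E4h leaves them in place, and 00h copies the low-order doubleword to every destination doubleword.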

#### **Related Instructions**

PSHUFHW, PSHUFLW, PSHUFW

#### **rFLAGS** Affected

None

#### **MXCSR Flags Affected**

None

#### Exceptions

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                            |
|---------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.              |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                       |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                          |
|                           |      |                 | X         | A null data segment was used to reference memory.                                             |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                     |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction.                                  |

# PSHUFHW Packed Shuffle High Words

Moves any one of the four packed words in the high-order quadword of an XMM register or 128-bit memory location to each word in the high-order quadword of another XMM register. In each case, the value of the destination word is determined by a two-bit field in the immediate-byte operand, with bits 0 and 1 selecting the contents of the low-order word, bits 2 and 3 selecting the second word, bits 4 and 5 selecting the third word, and bits 6 and 7 selecting the high-order word. Refer to Table 1-5 on page 280. A word in the source operand may be copied to more than one word in the destination. The low-order quadword of the source operand is copied to the low-order quadword of the destination register.

| Mnemonic                        | Opcode                | Description                                                                                                                                                                     |
|---------------------------------|-----------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| PSHUFHW xmm1, xmm2/mem128, imm8 | F3 0F 70 <i>/r ib</i> | Shuffles packed 16-bit values in the high-order quadword of an XMM register or 128-bit memory location and puts the result in the high-order quadword of another XMM register. |

| Destination Bits Filled | Immediate-Byte Bit Field | Value of Bit Field | Source Bits Moved |
|-------------------------|--------------------------|--------------------|-------------------|
| 79–64                   | 1–0                      | 0                  | 79–64             |
|                         |                          | 1                  | 95–80             |
|                         |                          | 2                  | 111–96            |
|                         |                          | 3                  | 127–112           |
| 95–80                   | 3–2                      | 0                  | 79–64             |
|                         |                          | 1                  | 95–80             |
|                         |                          | 2                  | 111–96            |
|                         |                          | 3                  | 127–112           |
| 111–96                  | 5–4                      | 0                  | 79–64             |
|                         |                          | 1                  | 95–80             |
|                         |                          | 2                  | 111–96            |
|                         |                          | 3                  | 127–112           |
| 127–112                 | 7–6                      | 0                  | 79–64             |
|                         |                          | 1                  | 95–80             |
|                         |                          | 2                  | 111–96            |
|                         |                          | 3                  | 127–112           |

### Table 1-5. Immediate-Byte Operand Encoding for PSHUFHW
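Table 1-5 can be read as a four-entry index applied to words 4 through 7 of the source, with words 0 through 3 passed through unchanged. The following portable C sketch models that behavior; it is not AMD reference code, and `xmm_words_t` and `pshufhw_model` are hypothetical names with index 0 assumed to be the low-order word.

```c
#include <stdint.h>

/* Hypothetical model type: w[0] is the low-order word (bits 15-0). */
typedef struct { uint16_t w[8]; } xmm_words_t;

/* Scalar model of PSHUFHW xmm1, xmm2/mem128, imm8 (illustrative only). */
xmm_words_t pshufhw_model(xmm_words_t src, uint8_t imm8)
{
    xmm_words_t dst = src;   /* low-order quadword (words 0-3) is copied as is */
    for (int i = 0; i < 4; i++) {
        /* Bits 2i+1..2i of imm8 select one of the four high-order words. */
        unsigned sel = (imm8 >> (2 * i)) & 0x3u;
        dst.w[4 + i] = src.w[4 + sel];
    }
    return dst;
}
```

The same structure would model the low-word form of the shuffle by using a word offset of 0 instead of 4 and copying the high-order quadword unchanged.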

#### **Related Instructions**

PSHUFD, PSHUFLW, PSHUFW

#### **rFLAGS** Affected

None

#### **MXCSR Flags Affected**

None

#### Exceptions

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                            |
|---------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.              |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                       |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                          |
|                           |      |                 | X         | A null data segment was used to reference memory.                                             |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                     |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction.                                  |

# PSHUFLW Packed Shuffle Low Words

Moves any one of the four packed words in the low-order quadword of an XMM register or 128-bit memory location to each word in the low-order quadword of another XMM register. In each case, the value of the destination word is determined by a two-bit field in the immediate-byte operand, with bits 0 and 1 selecting the contents of the low-order word, bits 2 and 3 selecting the second word, bits 4 and 5 selecting the third word, and bits 6 and 7 selecting the high-order word. Refer to Table 1-6 on page 283. A word in the source operand may be copied to more than one word in the destination. The high-order quadword of the source operand is copied to the high-order quadword of the destination register.

| Mnemonic                        | Opcode                | Description                                                                                                                                                                   |
|---------------------------------|-----------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| PSHUFLW xmm1, xmm2/mem128, imm8 | F2 0F 70 <i>/r ib</i> | Shuffles packed 16-bit values in the low-order quadword of an XMM register or 128-bit memory location and puts the result in the low-order quadword of another XMM register. |

| Destination Bits Filled | Immediate-Byte Bit Field | Value of Bit Field | Source Bits Moved |
|-------------------------|--------------------------|--------------------|-------------------|
| 15–0                    | 1–0                      | 0                  | 15–0              |
|                         |                          | 1                  | 31–16             |
|                         |                          | 2                  | 47–32             |
|                         |                          | 3                  | 63–48             |
| 31–16                   | 3–2                      | 0                  | 15–0              |
|                         |                          | 1                  | 31–16             |
|                         |                          | 2                  | 47–32             |
|                         |                          | 3                  | 63–48             |
| 47–32                   | 5–4                      | 0                  | 15–0              |
|                         |                          | 1                  | 31–16             |
|                         |                          | 2                  | 47–32             |
|                         |                          | 3                  | 63–48             |
| 63–48                   | 7–6                      | 0                  | 15–0              |
|                         |                          | 1                  | 31–16             |
|                         |                          | 2                  | 47–32             |
|                         |                          | 3                  | 63–48             |

### Table 1-6. Immediate-Byte Operand Encoding for PSHUFLW

#### **Related Instructions**

PSHUFD, PSHUFHW, PSHUFW

#### **rFLAGS** Affected

None

#### **MXCSR Flags Affected**

None

#### Exceptions

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                            |
|---------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.              |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                       |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                          |
|                           |      |                 | X         | A null data segment was used to reference memory.                                             |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                     |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction.                                  |

# PSLLD Packed Shift Left Logical Doublewords

Left-shifts each of the packed 32-bit values in the first source operand by the number of bits specified in the second source operand and writes each shifted value in the corresponding doubleword of the destination (first source). The first source/destination and second source operands are:

- an XMM register and another XMM register or 128-bit memory location, or
- an XMM register and an immediate byte value.

The low-order bits that are emptied by the shift operation are cleared to 0. If the shift value is greater than 31, the destination is cleared to all 0s.
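The out-of-range rule matters when modeling this instruction in C, because a C shift by a count equal to or larger than the operand width is undefined. The sketch below handles that case explicitly. It is an illustrative model under stated assumptions, not AMD reference code: `xmm_dw_t` and `pslld_model` are hypothetical names, and `count` stands for either the immediate byte or the low 64 bits of the second source operand.

```c
#include <stdint.h>

/* Hypothetical model type: dw[0] is the low-order doubleword. */
typedef struct { uint32_t dw[4]; } xmm_dw_t;

/* Scalar model of PSLLD (illustrative only). */
xmm_dw_t pslld_model(xmm_dw_t src, uint64_t count)
{
    xmm_dw_t dst = {{0, 0, 0, 0}};
    if (count > 31)
        return dst;                      /* destination cleared to all 0s */
    for (int i = 0; i < 4; i++) {
        /* Each doubleword shifts independently; bits shifted out are lost
           and the emptied low-order bits are 0. */
        dst.dw[i] = (uint32_t)(src.dw[i] << count);
    }
    return dst;
}
```

Taking the count as a 64-bit value keeps the comparison against 31 correct even when the register form supplies a very large shift amount.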

| Mnemonic                | Opcode                | Description                                                                                                                                 |
|-------------------------|-----------------------|---------------------------------------------------------------------------------------------------------------------------------------------|
| PSLLD xmm1, xmm2/mem128 | 66 0F F2 <i>/r</i>    | Left-shifts packed doublewords in an XMM register by the amount specified in the low 64 bits of an XMM register or 128-bit memory location. |
| PSLLD xmm, imm8         | 66 0F 72 /6 <i>ib</i> | Left-shifts packed doublewords in an XMM register by the amount specified in an immediate byte value.                                       |




## **Related Instructions**

PSLLDQ, PSLLQ, PSLLW, PSRAD, PSRAW, PSRLD, PSRLDQ, PSRLQ, PSRLW

#### **rFLAGS** Affected

None

#### **MXCSR Flags Affected**

None

#### Exceptions

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                            |
|---------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.              |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                       |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                          |
|                           |      |                 | X         | A null data segment was used to reference memory.                                             |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                     |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction.                                  |

# PSLLDQ Packed Shift Left Logical Double Quadword

Left-shifts the 128-bit (double quadword) value in an XMM register by the number of bytes specified in an immediate byte value. The low-order bytes that are emptied by the shift operation are cleared to 0. If the shift value is greater than 15, the destination XMM register is cleared to all 0s.
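Unlike the other packed shifts, the count here is in bytes, so the operation can be modeled as moving bytes toward the high-order end. The following is a minimal sketch in portable C, not AMD reference code; `xmm_bytes_t` and `pslldq_model` are hypothetical names, and index 0 is assumed to be the low-order byte.

```c
#include <stdint.h>

/* Hypothetical model type: b[0] is the low-order byte. */
typedef struct { uint8_t b[16]; } xmm_bytes_t;

/* Scalar model of PSLLDQ xmm, imm8 (illustrative only). */
xmm_bytes_t pslldq_model(xmm_bytes_t src, uint8_t imm8)
{
    xmm_bytes_t dst = {{0}};             /* emptied low-order bytes stay 0 */
    if (imm8 > 15)
        return dst;                      /* whole register cleared to all 0s */
    for (int i = imm8; i < 16; i++)
        dst.b[i] = src.b[i - imm8];      /* byte moves up by imm8 positions */
    return dst;
}
```

With a count of 3, for example, source byte 0 lands in destination byte 3 and the three high-order source bytes are discarded.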



#### **Related Instructions**

PSLLD, PSLLQ, PSLLW, PSRAD, PSRAW, PSRLD, PSRLDQ, PSRLQ, PSRLW

#### **rFLAGS** Affected

None

#### **MXCSR Flags Affected**

None

#### Exceptions

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                            |
|---------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.              |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |

# PSLLQ Packed Shift Left Logical Quadwords

Left-shifts each 64-bit value in the first source operand by the number of bits specified in the second source operand and writes each shifted value in the corresponding quadword of the destination (first source). The first source/destination and second source operands are:

- an XMM register and another XMM register or 128-bit memory location, or
- an XMM register and an immediate byte value.

The low-order bits that are emptied by the shift operation are cleared to 0. If the shift value is greater than 63, the destination is cleared to all 0s.

| Mnemonic                | Opcode                | Description                                                                                                                            |
|-------------------------|-----------------------|----------------------------------------------------------------------------------------------------------------------------------------|
| PSLLQ xmm1, xmm2/mem128 | 66 0F F3 <i>/r</i>    | Left-shifts packed quadwords in an XMM register by the amount specified in the low 64 bits of an XMM register or 128-bit memory location. |
| PSLLQ xmm, imm8         | 66 0F 73 /6 <i>ib</i> | Left-shifts packed quadwords in an XMM register by the amount specified in an immediate byte value.                                    |




## **Related Instructions**

PSLLD, PSLLDQ, PSLLW, PSRAD, PSRAW, PSRLD, PSRLDQ, PSRLQ, PSRLW

### **rFLAGS** Affected

None

#### **MXCSR Flags Affected**

None

#### Exceptions

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                            |
|---------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.              |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                       |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                          |
|                           |      |                 | X         | A null data segment was used to reference memory.                                             |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                     |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction.                                  |

# PSLLW Packed Shift Left Logical Words

Left-shifts each of the packed 16-bit values in the first source operand by the number of bits specified in the second source operand and writes each shifted value in the corresponding word of the destination (first source). The first source/destination and second source operands are:

- an XMM register and another XMM register or 128-bit memory location, or
- an XMM register and an immediate byte value.

The low-order bits that are emptied by the shift operation are cleared to 0. If the shift value is greater than 15, the destination is cleared to all 0s.

| Mnemonic                | Opcode                | Description                                                                                                                           |
|-------------------------|-----------------------|---------------------------------------------------------------------------------------------------------------------------------------|
| PSLLW xmm1, xmm2/mem128 | 66 0F F1 /r           | Left-shifts packed words in an XMM register by the amount specified in the low 64 bits of an XMM register or 128-bit memory location. |
| PSLLW xmm, imm8         | 66 0F 71 /6 <i>ib</i> | Left-shifts packed words in an XMM register by the amount specified in an immediate byte value.                                       |

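The lane-wise behavior described above can be illustrated with a small scalar model. The following Python sketch is not AMD reference code; the function name `psllw` and the representation of an XMM register as a list of eight 16-bit integers (lane 0 is the low word) are assumptions made for illustration.

```python
def psllw(words, count):
    """Model PSLLW on eight 16-bit lanes (lane 0 is the low word)."""
    assert len(words) == 8
    if count > 15:
        # Every bit is shifted out, so each word becomes 0.
        return [0] * 8
    # Keep only the low 16 bits of each shifted lane.
    return [(w << count) & 0xFFFF for w in words]


src = [0x0001, 0x8000, 0x00FF, 0x1234, 0xFFFF, 0x0000, 0x4000, 0x0F0F]
shifted = psllw(src, 4)   # bits shifted past bit 15 of a lane are lost
cleared = psllw(src, 16)  # count greater than 15: all lanes are 0
```

Note that bits shifted out of one word are discarded and never enter the neighboring word.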



#### **Related Instructions**

PSLLD, PSLLDQ, PSLLQ, PSRAD, PSRAW, PSRLD, PSRLDQ, PSRLQ, PSRLW

#### **rFLAGS Affected**

None

#### **MXCSR Flags Affected**

None

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                            |
|---------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.              |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                       |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                          |
|                           |      |                 | X         | A null data segment was used to reference memory.                                             |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                     |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction.                                  |

# PSRAD Packed Shift Right Arithmetic Doublewords

Right-shifts each of the packed 32-bit values in the first source operand by the number of bits specified in the second source operand and writes each shifted value in the corresponding doubleword of the destination (first source). The first source/destination and second source operands are:

- an XMM register and another XMM register or 128-bit memory location, or
- an XMM register and an immediate byte value.

The high-order bits that are emptied by the shift operation are filled with the sign bit of the doubleword's initial value. If the shift value is greater than 31, each doubleword in the destination is filled with the sign bit of the doubleword's initial value.

| Mnemonic                | Opcode                | Description                                                                                                                                  |
|-------------------------|-----------------------|----------------------------------------------------------------------------------------------------------------------------------------------|
| PSRAD xmm1, xmm2/mem128 | 66 0F E2 <i>/r</i>    | Right-shifts packed doublewords in an XMM register by the amount specified in the low 64 bits of an XMM register or 128-bit memory location. |
| PSRAD xmm, imm8         | 66 0F 72 /4 <i>ib</i> | Right-shifts packed doublewords in an XMM register by the amount specified in an immediate byte value.                                       |

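As a hedged illustration of the sign-fill rule, the sketch below models PSRAD in Python. The function name `psrad` and the use of a four-element list of unsigned 32-bit bit patterns (lane 0 is the low doubleword) are assumptions for this example, not part of the instruction definition.

```python
def psrad(dwords, count):
    """Model PSRAD on four 32-bit lanes given as unsigned bit patterns."""
    assert len(dwords) == 4
    # A count above 31 behaves like a shift by 31: only copies of the sign bit remain.
    count = min(count, 31)
    result = []
    for d in dwords:
        # Reinterpret the 32-bit pattern as a signed integer.
        signed = d - (1 << 32) if d & 0x80000000 else d
        # Python's >> on a negative int is an arithmetic (sign-propagating) shift.
        result.append((signed >> count) & 0xFFFFFFFF)
    return result


src = [0x80000000, 0x7FFFFFFF, 0xFFFFFFF0, 0x00000010]
by_four = psrad(src, 4)
saturated = psrad(src, 40)  # each lane becomes 0x00000000 or 0xFFFFFFFF
```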



#### **Related Instructions**

PSLLD, PSLLDQ, PSLLQ, PSLLW, PSRAW, PSRLD, PSRLDQ, PSRLQ, PSRLW

#### **rFLAGS Affected**

None

#### **MXCSR Flags Affected**

None

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                            |
|---------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.              |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                       |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                          |
|                           |      |                 | X         | A null data segment was used to reference memory.                                             |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                     |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction.                                  |

# PSRAW Packed Shift Right Arithmetic Words

Right-shifts each of the packed 16-bit values in the first source operand by the number of bits specified in the second source operand and writes each shifted value in the corresponding word of the destination (first source). The first source/destination and second source operands are:

- an XMM register and another XMM register or 128-bit memory location, or
- an XMM register and an immediate byte value.

The high-order bits that are emptied by the shift operation are filled with the sign bit of the word's initial value. If the shift value is greater than 15, each word in the destination is filled with the sign bit of the word's initial value.

| Mnemonic                | Opcode                | Description                                                                                                                            |
|-------------------------|-----------------------|----------------------------------------------------------------------------------------------------------------------------------------|
| PSRAW xmm1, xmm2/mem128 | 66 0F E1 /r           | Right-shifts packed words in an XMM register by the amount specified in the low 64 bits of an XMM register or 128-bit memory location. |
| PSRAW xmm, imm8         | 66 0F 71 /4 <i>ib</i> | Right-shifts packed words in an XMM register by the amount specified in an immediate byte value.                                       |

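Because a shift by 15 leaves every word equal to 0000h or FFFFh, the arithmetic shift is often used to build a per-word sign mask. The Python sketch below models that idea under stated assumptions: `psraw` and `abs_words` are hypothetical names, registers are lists of eight 16-bit patterns, and the XOR and subtract steps are plain Python arithmetic standing in for separate packed instructions.

```python
def psraw(words, count):
    """Model PSRAW on eight 16-bit lanes given as unsigned bit patterns."""
    assert len(words) == 8
    count = min(count, 15)  # larger counts leave only copies of the sign bit
    out = []
    for w in words:
        signed = w - 0x10000 if w & 0x8000 else w
        out.append((signed >> count) & 0xFFFF)
    return out


def abs_words(words):
    """Branch-free absolute value: (x XOR mask) - mask, where mask = x >> 15."""
    masks = psraw(words, 15)
    return [((w ^ m) - m) & 0xFFFF for w, m in zip(words, masks)]


halved = psraw([0x8000, 0x7FFF, 0xFFFE, 0x0002, 0, 0, 0, 0], 1)
magnitudes = abs_words([0xFFFB, 0x0005, 0x8000, 0x0000, 0xFFFF, 0x7FFF, 0x0001, 0x8001])
```

In this model the word 8000h (-32768) has no positive counterpart in 16 bits, so `abs_words` returns it unchanged.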


#### **Related Instructions**

PSLLD, PSLLDQ, PSLLQ, PSLLW, PSRAD, PSRLD, PSRLDQ, PSRLQ, PSRLW

#### **rFLAGS Affected**

None

#### **MXCSR Flags Affected**

None

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                            |
|---------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.              |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                       |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                          |
|                           |      |                 | X         | A null data segment was used to reference memory.                                             |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                     |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction.                                  |

# PSRLD Packed Shift Right Logical Doublewords

Right-shifts each of the packed 32-bit values in the first source operand by the number of bits specified in the second source operand and writes each shifted value in the corresponding doubleword of the destination (first source). The first source/destination and second source operands are:

- an XMM register and another XMM register or 128-bit memory location, or
- an XMM register and an immediate byte value.

The high-order bits that are emptied by the shift operation are cleared to 0. If the shift value is greater than 31, the destination is cleared to 0.

| Mnemonic                | Opcode                | Description                                                                                                                                  |
|-------------------------|-----------------------|----------------------------------------------------------------------------------------------------------------------------------------------|
| PSRLD xmm1, xmm2/mem128 | 66 0F D2 /r           | Right-shifts packed doublewords in an XMM register by the amount specified in the low 64 bits of an XMM register or 128-bit memory location. |
| PSRLD xmm, imm8         | 66 0F 72 /2 <i>ib</i> | Right-shifts packed doublewords in an XMM register by the amount specified in an immediate byte value.                                       |

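The register or memory form takes its shift count from the low 64 bits of the second operand. The sketch below models that detail in Python; the name `psrld`, the list-of-lanes representation of the destination, and the use of a single Python integer for the 128-bit count operand are assumptions made for illustration.

```python
def psrld(dwords, count_operand):
    """Model PSRLD xmm1, xmm2/mem128.

    dwords: four 32-bit lanes of the destination (lane 0 is the low doubleword).
    count_operand: the 128-bit second operand as one Python int.
    """
    assert len(dwords) == 4
    # Only the low 64 bits of the second operand supply the shift count.
    count = count_operand & 0xFFFFFFFFFFFFFFFF
    if count > 31:
        return [0, 0, 0, 0]
    # Logical shift: vacated high-order bits are 0.
    return [d >> count for d in dwords]


src = [0x80000000, 0x000000FF, 0xFFFFFFFF, 0x00000001]
by_eight = psrld(src, 8)
upper_ignored = psrld(src, (1 << 64) | 8)  # bits 127:64 of the count operand are not used
too_large = psrld(src, 1 << 32)            # the full 64-bit count exceeds 31
```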



#### **Related Instructions**

PSLLD, PSLLDQ, PSLLQ, PSLLW, PSRAD, PSRAW, PSRLDQ, PSRLQ, PSRLW

#### **rFLAGS Affected**

None

#### **MXCSR Flags Affected**

None

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                            |
|---------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.              |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                       |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                          |
|                           |      |                 | X         | A null data segment was used to reference memory.                                             |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                     |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction.                                  |

# PSRLDQ Packed Shift Right Logical Double Quadword

Right-shifts the 128-bit (double quadword) value in an XMM register by the number of bytes specified in an immediate byte value. The high-order bytes that are emptied by the shift operation are cleared to 0. If the shift value is greater than 15, the destination XMM register is cleared to all 0s.

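Unlike the packed shifts, PSRLDQ treats the register as a single 128-bit value and counts in bytes. A minimal Python sketch of that behavior follows; the function name `psrldq` and the use of one Python integer for the 128-bit register are assumptions for this example.

```python
def psrldq(value, byte_count):
    """Model PSRLDQ: shift a 128-bit value right by whole bytes."""
    assert 0 <= value < (1 << 128)
    if byte_count > 15:
        # All 16 bytes are shifted out.
        return 0
    # One byte is 8 bits; vacated high-order bytes become 0.
    return value >> (8 * byte_count)


value = 0x0F0E0D0C0B0A09080706050403020100  # byte i holds the value i
by_four_bytes = psrldq(value, 4)
cleared = psrldq(value, 16)
```

With each byte holding its own index, the result of the 4-byte shift makes it easy to see that byte 4 has moved into byte position 0.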


#### **Related Instructions**

PSLLD, PSLLDQ, PSLLQ, PSLLW, PSRAD, PSRAW, PSRLD, PSRLQ, PSRLW

#### **rFLAGS Affected**

None

#### **MXCSR Flags Affected**

None

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                            |
|---------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.              |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |

# PSRLQ Packed Shift Right Logical Quadwords

Right-shifts each 64-bit value in the first source operand by the number of bits specified in the second source operand and writes each shifted value in the corresponding quadword of the destination (first source). The first source/destination and second source operands are:

- an XMM register and another XMM register or 128-bit memory location, or
- an XMM register and an immediate byte value.

The high-order bits that are emptied by the shift operation are cleared to 0. If the shift value is greater than 63, the destination is cleared to 0.

| Mnemonic                               | Opcode                | Description                                                                                                                                |
|----------------------------------------|-----------------------|--------------------------------------------------------------------------------------------------------------------------------------------|
| PSRLQ <i>xmm1</i> , <i>xmm2/mem128</i> | 66 0F D3 /r           | Right-shifts packed quadwords in an XMM register by the amount specified in the low 64 bits of an XMM register or 128-bit memory location. |
| PSRLQ xmm, imm8                        | 66 0F 73 /2 <i>ib</i> | Right-shifts packed quadwords in an XMM register by the amount specified in an immediate byte value.                                       |

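The two quadwords are shifted independently, so a bit shifted out of the high quadword does not enter the low quadword. The Python sketch below illustrates this under assumed names: `psrlq` is hypothetical, and the 128-bit register is represented as one Python integer split into two 64-bit lanes.

```python
MASK64 = (1 << 64) - 1


def psrlq(value, count):
    """Model PSRLQ on a 128-bit value holding two 64-bit lanes."""
    assert 0 <= value < (1 << 128)
    if count > 63:
        return 0
    low = (value & MASK64) >> count
    high = (value >> 64) >> count
    # Bits shifted out of the high quadword are discarded, not moved into the low one.
    return (high << 64) | low


value = (0x0000000000000001 << 64) | 0x8000000000000000
lanes_shifted = psrlq(value, 1)
whole_shifted = value >> 1  # a plain 128-bit shift, shown only for contrast
```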



#### **Related Instructions**

PSLLD, PSLLDQ, PSLLQ, PSLLW, PSRAD, PSRAW, PSRLD, PSRLDQ, PSRLW

#### **rFLAGS Affected**

None

#### **MXCSR Flags Affected**

None

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                            |
|---------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.              |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                       |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                          |
|                           |      |                 | X         | A null data segment was used to reference memory.                                             |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                     |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction.                                  |

# PSRLW Packed Shift Right Logical Words

Right-shifts each of the packed 16-bit values in the first source operand by the number of bits specified in the second source operand and writes each shifted value in the corresponding word of the destination (first source). The first source/destination and second source operands are:

- an XMM register and another XMM register or 128-bit memory location, or
- an XMM register and an immediate byte value.

The high-order bits that are emptied by the shift operation are cleared to 0. If the shift value is greater than 15, the destination is cleared to 0.

| Mnemonic                | Opcode                | Description                                                                                                                            |
|-------------------------|-----------------------|----------------------------------------------------------------------------------------------------------------------------------------|
| PSRLW xmm1, xmm2/mem128 | 66 0F D1 /r           | Right-shifts packed words in an XMM register by the amount specified in the low 64 bits of an XMM register or 128-bit memory location. |
| PSRLW xmm, imm8         | 66 0F 71 /2 <i>ib</i> | Right-shifts packed words in an XMM register by the amount specified in an immediate byte value.                                       |

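Because the vacated bits are cleared, a logical right shift of an unsigned word by n bits gives the same result as floor division by 2^n. The following Python sketch checks that equivalence on sample data; the function name `psrlw` and the list-of-eight-words register model are assumptions for illustration.

```python
def psrlw(words, count):
    """Model PSRLW on eight unsigned 16-bit lanes."""
    assert len(words) == 8
    if count > 15:
        return [0] * 8
    # Logical shift: zeros enter from the high-order end of each word.
    return [w >> count for w in words]


src = [0xFFFF, 0x8000, 0x0007, 0x0100, 0x0000, 0x0001, 0x1234, 0x00FF]
by_three = psrlw(src, 3)
as_division = [w // 8 for w in src]  # unsigned floor division by 2**3
```

The equivalence holds only for the unsigned interpretation of each word; for signed values the arithmetic shift PSRAW is the one that preserves the sign.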



#### **Related Instructions**

PSLLD, PSLLDQ, PSLLQ, PSLLW, PSRAD, PSRAW, PSRLD, PSRLDQ, PSRLQ

#### **rFLAGS Affected**

None

#### **MXCSR Flags Affected**

None

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                            |
|---------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.              |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                       |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                          |
|                           |      |                 | X         | A null data segment was used to reference memory.                                             |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                     |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction.                                  |

# PSUBB Packed Subtract Bytes

Subtracts each packed 8-bit integer value in the second source operand from the corresponding packed 8-bit integer in the first source operand and writes the integer result of each subtraction in the corresponding byte of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.




This instruction operates on both signed and unsigned integers. If the result overflows, the carry is ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 8 bits of each result are written in the destination.

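The wraparound rule can be sketched as subtraction modulo 256 on each byte. The Python model below is illustrative only; the name `psubb` and the representation of each operand as a list of sixteen 8-bit patterns are assumptions.

```python
def psubb(a, b):
    """Model PSUBB on sixteen 8-bit lanes with modulo-256 wraparound."""
    assert len(a) == len(b) == 16
    # Masking with 0xFF keeps only the low-order 8 bits of each difference.
    return [(x - y) & 0xFF for x, y in zip(a, b)]


first = [0x00, 0x80, 0x7F, 0x10] + [0x00] * 12
second = [0x01, 0x01, 0xFF, 0x10] + [0x00] * 12
diff = psubb(first, second)
# Lane 0: 00h - 01h wraps to FFh (255 unsigned, -1 signed).
# Lane 1: 80h - 01h gives 7Fh (signed -128 - 1 wraps to +127).
```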
#### **Related Instructions**

PSUBD, PSUBQ, PSUBSB, PSUBSW, PSUBUSB, PSUBUSW, PSUBW

#### **rFLAGS Affected**

None

#### **MXCSR Flags Affected**

None

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                            |
|---------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.              |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                       |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                          |
|                           |      |                 | X         | A null data segment was used to reference memory.                                             |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                     |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction.                                  |

# PSUBD Packed Subtract Doublewords

Subtracts each packed 32-bit integer value in the second source operand from the corresponding packed 32-bit integer in the first source operand and writes the integer result of each subtraction in the corresponding doubleword of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.




This instruction operates on both signed and unsigned integers. If the result overflows, the carry is ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 32 bits of each result are written in the destination.

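The statement that the instruction serves both signed and unsigned integers follows from two's-complement arithmetic: the same 32-bit result pattern is correct under either reading, modulo 2^32. The Python sketch below illustrates this; `psubd` and `as_signed32` are hypothetical helper names and the four-element list is an assumed register model.

```python
MASK32 = 0xFFFFFFFF


def psubd(a, b):
    """Model PSUBD on four 32-bit lanes with modulo-2**32 wraparound."""
    assert len(a) == len(b) == 4
    return [(x - y) & MASK32 for x, y in zip(a, b)]


def as_signed32(pattern):
    """Reinterpret an unsigned 32-bit pattern as a two's-complement value."""
    return pattern - (1 << 32) if pattern & 0x80000000 else pattern


diff = psubd([5, 0, 0x80000000, 100], [7, 1, 1, 100])
signed_view = [as_signed32(d) for d in diff]
```

Lane 2 shows the wraparound case: the most negative signed value minus 1 yields 7FFFFFFFh, and no flag records the overflow.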
#### **Related Instructions**

PSUBB, PSUBQ, PSUBSB, PSUBSW, PSUBUSB, PSUBUSW, PSUBW

#### **rFLAGS Affected**

None

#### **MXCSR Flags Affected**

None

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                            |
|---------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.              |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                       |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                          |
|                           |      |                 | X         | A null data segment was used to reference memory.                                             |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                     |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction.                                  |

# PSUBQ Packed Subtract Quadword

Subtracts each packed 64-bit integer value in the second source operand from the corresponding packed 64-bit integer in the first source operand and writes the integer result of each subtraction in the corresponding quadword of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.





This instruction operates on both signed and unsigned integers. If the result overflows, the carry is ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 64 bits of each result are written in the destination.

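Since the carry is ignored, a borrow out of the low quadword is not taken from the high quadword; the two subtractions are independent. The Python sketch below contrasts this with a full 128-bit subtraction. The name `psubq` and the use of one Python integer per 128-bit operand are assumptions for illustration.

```python
MASK64 = (1 << 64) - 1


def psubq(a, b):
    """Model PSUBQ on 128-bit values that each hold two 64-bit lanes."""
    low = ((a & MASK64) - (b & MASK64)) & MASK64
    high = ((a >> 64) - (b >> 64)) & MASK64
    # A borrow out of the low quadword is dropped rather than taken from the high one.
    return (high << 64) | low


a = (5 << 64) | 0
b = (2 << 64) | 1
packed = psubq(a, b)             # high lane: 5 - 2, low lane: 0 - 1 wraps
full_width = (a - b) % (1 << 128)  # a single 128-bit subtraction, for contrast only
```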
#### **Related Instructions**

PSUBB, PSUBD, PSUBSB, PSUBSW, PSUBUSB, PSUBUSW, PSUBW

#### **rFLAGS Affected**

None

#### **MXCSR Flags Affected**

None

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                            |
|---------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.              |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                       |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                          |
|                           |      |                 | X         | A null data segment was used to reference memory.                                             |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                     |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction.                                  |

# PSUBSB Packed Subtract Signed With Saturation Bytes

Subtracts each packed 8-bit signed integer value in the second source operand from the corresponding packed 8-bit signed integer in the first source operand and writes the signed integer result of each subtraction in the corresponding byte of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.



For each packed value in the destination, if the value is larger than the largest signed 8-bit integer, it is saturated to 7Fh, and if the value is smaller than the smallest signed 8-bit integer, it is saturated to 80h.

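Saturation can be sketched as clamping the exact difference to the signed 8-bit range. In the Python model below, the names `psubsb` and `to_byte` are hypothetical, and lanes are passed as ordinary Python integers in the range -128 to 127 rather than as raw bit patterns.

```python
def psubsb(a, b):
    """Model PSUBSB on sixteen signed 8-bit lanes (values -128..127)."""
    assert len(a) == len(b) == 16
    # Compute the exact difference, then clamp it to [-128, 127].
    return [max(-128, min(127, x - y)) for x, y in zip(a, b)]


def to_byte(value):
    """Return the two's-complement 8-bit pattern of a signed value."""
    return value & 0xFF


first = [100, -100, 50, -128] + [0] * 12
second = [-100, 100, 20, 1] + [0] * 12
result = psubsb(first, second)
patterns = [to_byte(v) for v in result]  # 7Fh and 80h mark the saturated lanes
```

Here 100 - (-100) = 200 saturates to 7Fh and -100 - 100 = -200 saturates to 80h, whereas the wraparound form PSUBB would keep only the low 8 bits of each difference.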
#### **Related Instructions**

PSUBB, PSUBD, PSUBQ, PSUBSW, PSUBUSB, PSUBUSW, PSUBW

#### **rFLAGS Affected**

None

#### **MXCSR Flags Affected**

None

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                            |
|---------------------------|------|-----------------|-----------|-----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                     |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.              |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                 |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                       |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                          |
|                           |      |                 | X         | A null data segment was used to reference memory.                                             |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                     |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction.                                  |

### **PSUBSW**

# Packed Subtract Signed With Saturation Words

Subtracts each packed 16-bit signed integer value in the second source operand from the corresponding packed 16-bit signed integer in the first source operand and writes the signed integer result of each subtraction in the corresponding word of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.



For each packed value in the destination, if the value is larger than the largest signed 16-bit integer, it is saturated to 7FFFh, and if the value is smaller than the smallest signed 16-bit integer, it is saturated to 8000h.
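
As a hedged illustration of the word-sized case, the sketch below models the operation directly on 128-bit values. It is not taken from the manual: the names `to_signed` and `psubsw` are hypothetical, and operands are modeled as Python integers with word 0 in bits 15-0.

```python
LANES, WIDTH = 8, 16          # eight 16-bit words in a 128-bit operand
MASK = (1 << WIDTH) - 1


def to_signed(word):
    """Reinterpret a 16-bit pattern as a two's-complement signed integer."""
    return word - (1 << WIDTH) if word & 0x8000 else word


def psubsw(dst, src):
    """Model PSUBSW on 128-bit integers; word 0 occupies bits 15-0."""
    out = 0
    for i in range(LANES):
        a = to_signed((dst >> (i * WIDTH)) & MASK)
        b = to_signed((src >> (i * WIDTH)) & MASK)
        diff = max(-0x8000, min(0x7FFF, a - b))   # saturate to 8000h..7FFFh
        out |= (diff & MASK) << (i * WIDTH)
    return out


x = 0x7FFF_8000_0005_0000   # words 3..0 = 7FFFh, 8000h, 0005h, 0000h
y = 0xFFFF_0001_0003_0000   # words 3..0 =    -1,     1,     3,     0
print(f"{psubsw(x, y):032X}")   # 00000000000000007FFF800000020000
```

Word 3 saturates at 7FFFh (32767 minus -1) and word 2 saturates at 8000h (-32768 minus 1), while word 1 holds the ordinary difference 0002h.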

#### **Related Instructions**

PSUBB, PSUBD, PSUBQ, PSUBSB, PSUBUSB, PSUBUSW, PSUBW

#### rFLAGS Affected

None

#### **MXCSR Flags Affected**

None

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                               |
|---------------------------|------|-----------------|-----------|--------------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1.    |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                        |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.                 |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                    |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                          |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                             |
|                           |      |                 | X         | A null data segment was used to reference memory.                                                |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                        |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction.                                     |

# **PSUBUSB**

# **Packed Subtract Unsigned and Saturate Bytes**

Subtracts each packed 8-bit unsigned integer value in the second source operand from the corresponding packed 8-bit unsigned integer in the first source operand and writes the unsigned integer result of each subtraction in the corresponding byte of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.



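For each packed value in the destination, if the value is larger than the largest unsigned 8-bit integer, it is saturated to FFh, and if the value is smaller than the smallest unsigned 8-bit integer, it is saturated to 00h. Because both operands are unsigned, a difference can never exceed FFh, so in practice only the lower bound takes effect: any negative difference is written as 00h.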
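
A short scalar model makes the clamping at 00h concrete. This Python sketch is illustrative only: `psubusb` and `abs_diff_bytes` are hypothetical names, operands are modeled as lists of 16 unsigned bytes, and the absolute-difference idiom is a commonly cited use of unsigned saturating subtraction rather than something this manual prescribes.

```python
def psubusb(dst, src):
    """Model PSUBUSB: lane-wise dst - src on 16 unsigned bytes, clamped at 00h."""
    assert len(dst) == len(src) == 16
    # With both inputs in 0..255 the difference cannot exceed FFh,
    # so only the lower clamp is ever exercised.
    return [max(0, a - b) for a, b in zip(dst, src)]


def abs_diff_bytes(a, b):
    """|a - b| per lane: one of the two saturated differences is always 0."""
    return [x | y for x, y in zip(psubusb(a, b), psubusb(b, a))]


a = [200, 10, 255, 0] + [7] * 12
b = [50, 30, 0, 255] + [7] * 12
print(psubusb(a, b)[:4])         # [150, 0, 255, 0]
print(abs_diff_bytes(a, b)[:4])  # [150, 20, 255, 255]
```

Combining the two saturated differences with a bitwise OR works because, in every lane, at most one of them is nonzero.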

#### **Related Instructions**

PSUBB, PSUBD, PSUBQ, PSUBSB, PSUBSW, PSUBUSW, PSUBW

#### rFLAGS Affected

None

#### **MXCSR Flags Affected**

None

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                               |
|---------------------------|------|-----------------|-----------|--------------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1.    |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                        |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.                 |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                    |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                          |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                             |
|                           |      |                 | X         | A null data segment was used to reference memory.                                                |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                        |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction.                                     |

### **PSUBUSW**

# Packed Subtract Unsigned and Saturate Words

Subtracts each packed 16-bit unsigned integer value in the second source operand from the corresponding packed 16-bit unsigned integer in the first source operand and writes the unsigned integer result of each subtraction in the corresponding word of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.



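For each packed value in the destination, if the value is larger than the largest unsigned 16-bit integer, it is saturated to FFFFh, and if the value is smaller than the smallest unsigned 16-bit integer, it is saturated to 0000h. Because both operands are unsigned, a difference can never exceed FFFFh, so in practice only the lower bound takes effect: any negative difference is written as 0000h.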
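
The word-sized behavior can be sketched the same way. The Python below is a hedged model, not the manual's definition: `psubusw` and `max_unsigned_words` are hypothetical names, operands are lists of 8 unsigned words, and the unsigned-maximum idiom is included as an assumed typical use of the instruction.

```python
def psubusw(dst, src):
    """Model PSUBUSW: lane-wise dst - src on 8 unsigned words, clamped at 0000h."""
    assert len(dst) == len(src) == 8
    return [max(0, a - b) for a, b in zip(dst, src)]


def max_unsigned_words(a, b):
    """max(a, b) per lane, formed as (a -sat b) + b; the sum never exceeds FFFFh."""
    return [(d + y) & 0xFFFF for d, y in zip(psubusw(a, b), b)]


a = [0xFFFF, 0x0001, 0x1234, 0x0000, 9, 9, 9, 9]
b = [0x0001, 0xFFFF, 0x1234, 0x8000, 1, 1, 1, 1]
print([f"{v:04X}h" for v in psubusw(a, b)[:4]])
# ['FFFEh', '0000h', '0000h', '0000h']
print([f"{v:04X}h" for v in max_unsigned_words(a, b)[:4]])
# ['FFFFh', 'FFFFh', '1234h', '8000h']
```

When `a` is the larger value the saturated difference plus `b` restores `a`; otherwise the difference is 0 and the sum is simply `b`.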

#### **Related Instructions**

PSUBB, PSUBD, PSUBQ, PSUBSB, PSUBSW, PSUBUSB, PSUBW

#### rFLAGS Affected

None

#### **MXCSR Flags Affected**

None

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                               |
|---------------------------|------|-----------------|-----------|--------------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1.    |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                        |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.                 |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                    |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                          |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                             |
|                           |      |                 | X         | A null data segment was used to reference memory.                                                |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                        |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction.                                     |

### **PSUBW**

# **Packed Subtract Words**

Subtracts each packed 16-bit integer value in the second source operand from the corresponding packed 16-bit integer in the first source operand and writes the integer result of each subtraction in the corresponding word of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.



This instruction operates on both signed and unsigned integers. If the result overflows, the carry is ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 16 bits of the result are written in the destination.
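
The wraparound behavior described above can be modeled in a few lines. This Python sketch is illustrative and uses hypothetical names (`psubw`, `as_signed`); each operand is assumed to be a list of 8 word lanes holding raw 16-bit patterns.

```python
def psubw(dst, src):
    """Model PSUBW: lane-wise dst - src on 8 words, keeping only the low 16 bits."""
    assert len(dst) == len(src) == 8
    return [(a - b) & 0xFFFF for a, b in zip(dst, src)]


def as_signed(word):
    """Read a 16-bit pattern as a two's-complement signed value."""
    return word - 0x10000 if word & 0x8000 else word


a = [0x0000, 0x8000, 0x0005, 0xFFFF] + [0] * 4
b = [0x0001, 0x0001, 0x0003, 0xFFFF] + [0] * 4
r = psubw(a, b)
print([f"{v:04X}h" for v in r[:4]])   # ['FFFFh', '7FFFh', '0002h', '0000h']
print([as_signed(v) for v in r[:4]])  # [-1, 32767, 2, 0]
```

Lane 1 shows the difference from the saturating forms: 8000h minus 1 wraps to 7FFFh, where PSUBSW would hold the result at 8000h. The same bit patterns are produced whether the inputs are read as signed or unsigned.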

#### **Related Instructions**

PSUBB, PSUBD, PSUBQ, PSUBSB, PSUBSW, PSUBUSB, PSUBUSW

#### rFLAGS Affected

None

#### **MXCSR Flags Affected**

None

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                               |
|---------------------------|------|-----------------|-----------|--------------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1.    |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                        |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.                 |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                    |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                          |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                             |
|                           |      |                 | X         | A null data segment was used to reference memory.                                                |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                        |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction.                                     |

### **PUNPCKHBW**

# **Unpack and Interleave High Bytes**

Unpacks the high-order bytes from the first and second source operands and packs them into interleaved-byte words in the destination (first source). The low-order bytes of the source operands are ignored. The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.



If the second source operand is all 0s, the destination contains the bytes from the first source operand zero-extended to 16 bits. This operation is useful for expanding unsigned 8-bit values to unsigned 16-bit operands for subsequent processing that requires higher precision.
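
The interleaving order and the zero-extension case can be checked with a scalar model. The sketch below is an illustration under stated assumptions: `punpckhbw` and `as_words` are hypothetical names, and each operand is a list of 16 bytes with lane 0 as the least-significant byte.

```python
def punpckhbw(dst, src):
    """Model PUNPCKHBW; lane 0 is the least-significant byte of each operand."""
    assert len(dst) == len(src) == 16
    out = []
    for d, s in zip(dst[8:], src[8:]):   # bytes 15-8 of each operand
        out += [d, s]                    # first-source byte takes the low byte of each word
    return out


def as_words(byte_lanes):
    """Regroup 16 byte lanes into 8 little-endian word lanes."""
    return [lo | (hi << 8) for lo, hi in zip(byte_lanes[0::2], byte_lanes[1::2])]


a = list(range(0x10, 0x20))   # bytes 10h..1Fh, lane 0 first
b = list(range(0xA0, 0xB0))   # bytes A0h..AFh
print([f"{v:02X}" for v in punpckhbw(a, b)[:4]])   # ['18', 'A8', '19', 'A9']
print([f"{w:04X}" for w in as_words(punpckhbw(a, [0] * 16))[:2]])   # ['0018', '0019']
```

With an all-zero second operand, every result word carries a first-source byte in its low half and 00h in its high half, which is the zero-extension described above.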

#### **Related Instructions**

PUNPCKHDQ, PUNPCKHQDQ, PUNPCKHWD, PUNPCKLBW, PUNPCKLDQ, PUNPCKLQDQ, PUNPCKLWD

#### rFLAGS Affected

None

#### **MXCSR Flags Affected**

None

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                               |
|---------------------------|------|-----------------|-----------|--------------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1.    |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                        |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.                 |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                    |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                          |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                             |
|                           |      |                 | X         | A null data segment was used to reference memory.                                                |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                        |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction.                                     |

# PUNPCKHDQ

# **Unpack and Interleave High Doublewords**

Unpacks the high-order doublewords from the first and second source operands and packs them into interleaved-doubleword quadwords in the destination (first source). The low-order doublewords of the source operands are ignored. The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.



If the second source operand is all 0s, the destination contains the doubleword(s) from the first source operand zero-extended to 64 bits. This operation is useful for expanding unsigned 32-bit values to unsigned 64-bit operands for subsequent processing that requires higher precision.
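
For the doubleword form, a minimal Python model is sketched below. The names `punpckhdq` and `as_quadwords` are hypothetical, and each operand is assumed to be a list of 4 doubleword lanes with lane 0 least significant.

```python
def punpckhdq(dst, src):
    """Model PUNPCKHDQ; each operand is 4 doubleword lanes, lane 0 least significant."""
    assert len(dst) == len(src) == 4
    return [dst[2], src[2], dst[3], src[3]]


def as_quadwords(dword_lanes):
    """Regroup 4 doubleword lanes into 2 quadword lanes."""
    return [dword_lanes[0] | (dword_lanes[1] << 32),
            dword_lanes[2] | (dword_lanes[3] << 32)]


a = [0x00000000, 0x11111111, 0x22222222, 0x33333333]
b = [0xAAAAAAAA, 0xBBBBBBBB, 0xCCCCCCCC, 0xDDDDDDDD]
print([f"{v:08X}" for v in punpckhdq(a, b)])
# ['22222222', 'CCCCCCCC', '33333333', 'DDDDDDDD']
print([f"{q:016X}" for q in as_quadwords(punpckhdq(a, [0] * 4))])
# ['0000000022222222', '0000000033333333']
```

Lanes 0 and 1 of both inputs never appear in the result, matching the statement that the low-order doublewords are ignored.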

#### **Related Instructions**

PUNPCKHBW, PUNPCKHQDQ, PUNPCKHWD, PUNPCKLBW, PUNPCKLDQ, PUNPCKLQDQ, PUNPCKLWD

#### rFLAGS Affected

None

### **MXCSR Flags Affected**

None

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                               |
|---------------------------|------|-----------------|-----------|--------------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1.    |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                        |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.                 |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                    |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                          |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                             |
|                           |      |                 | X         | A null data segment was used to reference memory.                                                |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                        |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction.                                     |

### PUNPCKHQDQ

# **Unpack and Interleave High Quadwords**

Unpacks the high-order quadwords from the first and second source operands and packs them into interleaved quadwords in the destination (first source). The first source/destination operand is an XMM register, and the second source operand is another XMM register or 128-bit memory location. The low-order quadwords of the source operands are ignored.

If the second source operand is all 0s, the destination contains the quadword from the first source operand zero-extended to 128 bits. This operation is useful for expanding unsigned 64-bit values to unsigned 128-bit operands for subsequent processing that requires higher precision.
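
Because only two quadwords are involved, the operation reduces to two shifts. The following Python sketch is a hedged model in which operands are 128-bit integers; the name `punpckhqdq` and the constant `MASK64` are illustrative choices, not identifiers from the manual.

```python
MASK64 = (1 << 64) - 1


def punpckhqdq(dst, src):
    """Model PUNPCKHQDQ on 128-bit integers."""
    low = (dst >> 64) & MASK64    # bits 127-64 of the first source go to bits 63-0
    high = (src >> 64) & MASK64   # bits 127-64 of the second source go to bits 127-64
    return (high << 64) | low


x = 0x1111111111111111_0000000000000000
y = 0x3333333333333333_2222222222222222
print(f"{punpckhqdq(x, y):032X}")   # 33333333333333331111111111111111
print(f"{punpckhqdq(x, 0):032X}")   # 00000000000000001111111111111111
```

The second call shows the zero-extension case: the high quadword of the first source ends up in the low half of an otherwise zero 128-bit result.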

| Mnemonic                     | Opcode             | Description |
|------------------------------|--------------------|-------------|
| PUNPCKHQDQ xmm1, xmm2/mem128 | 66 0F 6D <i>/r</i> | Unpacks high-order quadwords in an XMM register and another XMM register or 128-bit memory location and packs them into interleaved quadwords in the destination XMM register. |

[Figure punpckhqdq.eps: xmm1 and xmm2/mem128, each shown as bits 127-64 and 63-0, with two copy paths into the destination.]

#### **Related Instructions**

PUNPCKHBW, PUNPCKHDQ, PUNPCKHWD, PUNPCKLBW, PUNPCKLDQ, PUNPCKLQDQ, PUNPCKLWD

#### rFLAGS Affected

None

### **MXCSR Flags Affected**

None

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                               |
|---------------------------|------|-----------------|-----------|--------------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1.    |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                        |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.                 |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                    |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                          |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                             |
|                           |      |                 | X         | A null data segment was used to reference memory.                                                |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                        |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction.                                     |

### PUNPCKHWD

# **Unpack and Interleave High Words**

Unpacks the high-order words from the first and second source operands and packs them into interleaved-word doublewords in the destination (first source). The low-order words of the source operands are ignored. The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.



If the second source operand is all 0s, the destination contains the words from the first source operand zero-extended to 32 bits. This operation is useful for expanding unsigned 16-bit values to unsigned 32-bit operands for subsequent processing that requires higher precision.
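
The widening use mentioned above can be sketched as follows. This Python model is illustrative only: `punpckhwd` and `as_dwords` are hypothetical names, operands are lists of 8 word lanes with lane 0 least significant, and the running sum is an assumed example of "processing that requires higher precision".

```python
def punpckhwd(dst, src):
    """Model PUNPCKHWD; each operand is 8 word lanes, lane 0 least significant."""
    assert len(dst) == len(src) == 8
    out = []
    for d, s in zip(dst[4:], src[4:]):   # words 7-4 of each operand
        out += [d, s]
    return out


def as_dwords(word_lanes):
    """Regroup 8 word lanes into 4 little-endian doubleword lanes."""
    return [lo | (hi << 16) for lo, hi in zip(word_lanes[0::2], word_lanes[1::2])]


samples = [1, 2, 3, 4, 0xFFFF, 0x8000, 0x1234, 0x0001]
widened = as_dwords(punpckhwd(samples, [0] * 8))
print([f"{v:08X}" for v in widened])
# ['0000FFFF', '00008000', '00001234', '00000001']
print(sum(widened))   # sum of the four high words without 16-bit wraparound
```

Once the words are zero-extended to 32 bits, their sum no longer wraps at FFFFh as it would if accumulated in 16-bit lanes.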

#### **Related Instructions**

PUNPCKHBW, PUNPCKHDQ, PUNPCKHQDQ, PUNPCKLBW, PUNPCKLDQ, PUNPCKLQDQ, PUNPCKLWD

#### rFLAGS Affected

None

### **MXCSR Flags Affected**

None

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                               |
|---------------------------|------|-----------------|-----------|--------------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1.    |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                        |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.                 |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                    |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                          |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                             |
|                           |      |                 | X         | A null data segment was used to reference memory.                                                |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                        |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction.                                     |

### PUNPCKLBW

# **Unpack and Interleave Low Bytes**

Unpacks the low-order bytes from the first and second source operands and packs them into interleaved-byte words in the destination (first source). The high-order bytes of the source operands are ignored. The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.



If the second source operand is all 0s, the destination contains the bytes from the first source operand zero-extended to 16 bits. This operation is useful for expanding unsigned 8-bit values to unsigned 16-bit operands for subsequent processing that requires higher precision.
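
The effect of operand order is easy to see in a scalar model. The Python below is a sketch under stated assumptions: `punpcklbw` is a hypothetical helper name, operands are lists of 16 bytes with lane 0 least significant, and the `pixels` data is invented for illustration.

```python
def punpcklbw(dst, src):
    """Model PUNPCKLBW; lane 0 is the least-significant byte of each operand."""
    assert len(dst) == len(src) == 16
    out = []
    for d, s in zip(dst[:8], src[:8]):   # bytes 7-0 of each operand
        out += [d, s]
    return out


pixels = [0xFF, 0x80, 0x01, 0x00] + list(range(12))
zero = [0] * 16

unpacked = punpcklbw(pixels, zero)
words = [lo | (hi << 8) for lo, hi in zip(unpacked[0::2], unpacked[1::2])]
print([f"{w:04X}" for w in words[:4]])    # ['00FF', '0080', '0001', '0000']

# Swapping the operands places each byte in the upper half of its word instead,
# which is the byte value multiplied by 256.
swapped = punpcklbw(zero, pixels)
print([f"{b:02X}" for b in swapped[:4]])  # ['00', 'FF', '00', '80']
```

Zero-extension therefore depends on the zero operand being the second source; with the operands reversed, the data bytes occupy the high byte of each word.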

#### **Related Instructions**

PUNPCKHBW, PUNPCKHDQ, PUNPCKHQDQ, PUNPCKHWD, PUNPCKLDQ, PUNPCKLQDQ, PUNPCKLWD

#### rFLAGS Affected

None

### **MXCSR Flags Affected**

None

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                               |
|---------------------------|------|-----------------|-----------|--------------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1.    |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                        |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.                 |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                    |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                          |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                             |
|                           |      |                 | X         | A null data segment was used to reference memory.                                                |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                        |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction.                                     |

# PUNPCKLDQ

# **Unpack and Interleave Low Doublewords**

Unpacks the low-order doublewords from the first and second source operands and packs them into interleaved-doubleword quadwords in the destination (first source). The high-order doublewords of the source operands are ignored. The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.



If the second source operand is all 0s, the destination contains the doubleword(s) from the first source operand zero-extended to 64 bits. This operation is useful for expanding unsigned 32-bit values to unsigned 64-bit operands for subsequent processing that requires higher precision.
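
One way to picture the interleave is as pairing up elements from two separate arrays. The sketch below is a hedged Python model: `punpckldq` is a hypothetical helper name, operands are lists of 4 doubleword lanes with lane 0 least significant, and the coordinate data is invented for illustration.

```python
def punpckldq(dst, src):
    """Model PUNPCKLDQ; each operand is 4 doubleword lanes, lane 0 least significant."""
    assert len(dst) == len(src) == 4
    return [dst[0], src[0], dst[1], src[1]]


xs = [10, 11, 12, 13]   # x0..x3
ys = [20, 21, 22, 23]   # y0..y3
print(punpckldq(xs, ys))        # [10, 20, 11, 21] -> (x0, y0), (x1, y1)

# With an all-zero second operand, each result quadword is a zero-extended doubleword.
lanes = punpckldq([0xDEADBEEF, 0x00000001, 7, 7], [0] * 4)
quads = [lanes[0] | (lanes[1] << 32), lanes[2] | (lanes[3] << 32)]
print([f"{q:016X}" for q in quads])   # ['00000000DEADBEEF', '0000000000000001']
```

Lanes 2 and 3 of both inputs are discarded, so a companion operation on the high doublewords is needed to cover all four elements.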

#### **Related Instructions**

PUNPCKHBW, PUNPCKHDQ, PUNPCKHQDQ, PUNPCKHWD, PUNPCKLBW, PUNPCKLQDQ, PUNPCKLWD

#### rFLAGS Affected

None

### **MXCSR Flags Affected**

None

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                               |
|---------------------------|------|-----------------|-----------|--------------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1.    |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                        |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.                 |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                    |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                          |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                             |
|                           |      |                 | X         | A null data segment was used to reference memory.                                                |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                        |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction.                                     |

### PUNPCKLQDQ

# **Unpack and Interleave Low Quadwords**

Unpacks the low-order quadwords from the first and second source operands and packs them into interleaved quadwords in the destination (first source). The first source/destination operand is an XMM register, and the second source operand is another XMM register or 128-bit memory location. The high-order quadwords of the source operands are ignored.

If the second source operand is all 0s, the destination contains the quadword from the first source operand zero-extended to 128 bits. This operation is useful for expanding unsigned 64-bit values to unsigned 128-bit operands for subsequent processing that requires higher precision.
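
The unpack instructions are often combined. As a hedged example, the Python sketch below models four of them on lists of 4 doubleword lanes (lane 0 least significant) and uses them to transpose a 4x4 matrix. The helper names and the `transpose4x4` routine are hypothetical, and the transpose is offered as an assumed typical use, not a sequence given in this manual.

```python
def punpckldq(d, s):
    return [d[0], s[0], d[1], s[1]]


def punpckhdq(d, s):
    return [d[2], s[2], d[3], s[3]]


def punpcklqdq(d, s):
    """Low quadword of each operand = doubleword lanes 0 and 1."""
    return d[:2] + s[:2]


def punpckhqdq(d, s):
    """High quadword of each operand = doubleword lanes 2 and 3."""
    return d[2:] + s[2:]


def transpose4x4(r0, r1, r2, r3):
    """Transpose a 4x4 doubleword matrix held as four 4-lane rows."""
    t0, t1 = punpckldq(r0, r1), punpckldq(r2, r3)
    t2, t3 = punpckhdq(r0, r1), punpckhdq(r2, r3)
    return [punpcklqdq(t0, t1), punpckhqdq(t0, t1),
            punpcklqdq(t2, t3), punpckhqdq(t2, t3)]


rows = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
for column in transpose4x4(*rows):
    print(column)
```

The doubleword unpacks first pair up elements from adjacent rows, and the quadword unpacks then join those pairs into complete columns.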

| Mnemonic                     | Opcode             | Description |
|------------------------------|--------------------|-------------|
| PUNPCKLQDQ xmm1, xmm2/mem128 | 66 0F 6C <i>/r</i> | Unpacks low-order quadwords in an XMM register and another XMM register or 128-bit memory location and packs them into interleaved quadwords in the destination XMM register. |

[Figure punpcklqdq.eps: xmm1 and xmm2/mem128, each shown as bits 127-64 and 63-0, with two copy paths into the destination.]

#### **Related Instructions**

PUNPCKHBW, PUNPCKHDQ, PUNPCKHQDQ, PUNPCKHWD, PUNPCKLBW, PUNPCKLDQ, PUNPCKLWD


#### rFLAGS Affected

None

### **MXCSR Flags Affected**

None

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|  | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
|  | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
|  |  |  | X | A null data segment was used to reference memory. |
|  | X | X | X | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF |  | X | X | A page fault resulted from the execution of the instruction. |

# PUNPCKLWD Unpack and Interleave Low Words

Unpacks the low-order words from the first and second source operands and packs them into interleaved-word doublewords in the destination (first source). The high-order words of the source operands are ignored. The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.



If the second source operand is all 0s, the destination contains the words from the first source operand zero-extended to 32 bits. This operation is useful for expanding unsigned 16-bit values to unsigned 32-bit operands for subsequent processing that requires higher precision.
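The interleaving and the zero-extension case can be sketched in portable C. This is a minimal model, not an implementation of the hardware: `xmm_words_t`, `punpcklwd_model`, and `dword_of` are hypothetical names, and `w[0]` is assumed to hold bits 15–0.

```c
#include <stdint.h>

/* Software model of a 128-bit XMM value as eight words, w[0] = bits 15-0.
   All names here are illustrative only. */
typedef struct { uint16_t w[8]; } xmm_words_t;

/* Models PUNPCKLWD dst, src: the four low-order words of each operand are
   interleaved, with the word from dst in the lower half of each result
   doubleword. The four high-order words of both inputs are ignored. */
static xmm_words_t punpcklwd_model(xmm_words_t dst, xmm_words_t src)
{
    xmm_words_t r;
    for (int i = 0; i < 4; i++) {
        r.w[2 * i]     = dst.w[i];
        r.w[2 * i + 1] = src.w[i];
    }
    return r;
}

/* Reads doubleword i (0..3) of a modeled value. */
static uint32_t dword_of(xmm_words_t v, int i)
{
    return (uint32_t)v.w[2 * i] | ((uint32_t)v.w[2 * i + 1] << 16);
}
```

With an all-zero second operand, each result doubleword equals the corresponding source word, which is the unsigned 16-bit to 32-bit expansion described above.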

#### **Related Instructions**

PUNPCKHBW, PUNPCKHDQ, PUNPCKHQDQ, PUNPCKHWD, PUNPCKLBW, PUNPCKLDQ, PUNPCKLQDQ


#### rFLAGS Affected

None

### **MXCSR Flags Affected**

None

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|  | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
|  | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
|  |  |  | X | A null data segment was used to reference memory. |
|  | X | X | X | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF |  | X | X | A page fault resulted from the execution of the instruction. |


# PXOR Packed Logical Bitwise Exclusive OR

Performs a bitwise exclusive OR of the values in the first and second source operands and writes the result in the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.
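As a rough software sketch of the operation, assuming the 128-bit operands are modeled as two 64-bit halves (`xmm_bits_t` and `pxor_model` are hypothetical names, not part of any AMD interface):

```c
#include <stdint.h>

/* Software model of a 128-bit XMM value as two 64-bit halves.
   The names are illustrative only. */
typedef struct { uint64_t q[2]; } xmm_bits_t;

/* Models PXOR dst, src: every result bit is the exclusive OR of the
   corresponding bits of the two source operands. */
static xmm_bits_t pxor_model(xmm_bits_t dst, xmm_bits_t src)
{
    xmm_bits_t r;
    r.q[0] = dst.q[0] ^ src.q[0];
    r.q[1] = dst.q[1] ^ src.q[1];
    return r;
}
```

Because any value XORed with itself is zero, calling the model with the same operand twice returns all zeros.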



#### **Related Instructions**

PAND, PANDN, POR

#### **rFLAGS** Affected

None

#### **MXCSR Flags Affected**

None

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|  | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
|  | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
|  |  |  | X | A null data segment was used to reference memory. |
|  | X | X | X | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF |  | X | X | A page fault resulted from the execution of the instruction. |

# RCPPS Reciprocal Packed Single-Precision Floating-Point

Computes the approximate reciprocal of each of the four packed single-precision floating-point values in an XMM register or 128-bit memory location and writes the result in the corresponding doubleword of another XMM register. The rounding control bits (RC) in the MXCSR register have no effect on the result.

The maximum relative error is less than or equal to  $1.5 \times 2^{-12}$ . A source value of 0.0 returns an infinity of the source value's sign. Denormal source operands are treated as signed 0.0. Results that underflow are changed to signed 0.0. For both SNaN and QNaN source operands, a QNaN is returned.

| Mnemonic                | Opcode   | Description                                                                                                                                                            |
|-------------------------|----------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| RCPPS xmm1, xmm2/mem128 | 0F 53 /r | Computes reciprocals of packed single-precision floating-point values in an XMM register or 128-bit memory location and writes result in the destination XMM register. |
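The special-case rules above can be illustrated with a small software model. It is a sketch under several assumptions: exact division stands in for the hardware approximation, the returned QNaN is formed by setting the quiet bit of the source NaN (the text only states that a QNaN is returned), and all type and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its 32-bit pattern and back (illustrative helpers). */
static uint32_t f32_to_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_to_f32(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* One element of a hypothetical RCPPS model. Exact division replaces the
   hardware approximation, so only the special-case rules are modeled. */
static float rcp_element_model(float x)
{
    uint32_t u    = f32_to_bits(x);
    uint32_t sign = u & 0x80000000u;
    uint32_t exp  = (u >> 23) & 0xFFu;
    uint32_t frac = u & 0x007FFFFFu;

    if (exp == 0xFFu && frac != 0u)
        return bits_to_f32(u | 0x00400000u);    /* SNaN or QNaN in: a QNaN out (assumed form) */
    if (exp == 0u)
        return bits_to_f32(sign | 0x7F800000u); /* zero, or denormal treated as signed zero:
                                                   infinity with the source sign */
    float r = 1.0f / x;
    if (((f32_to_bits(r) >> 23) & 0xFFu) == 0u)
        return bits_to_f32(sign);               /* result underflowed: signed zero */
    return r;
}

/* Four packed single-precision values, f[0] = bits 31-0. */
typedef struct { float f[4]; } xmm_ps_model_t;

/* Applies the element model to each of the four packed values. */
static xmm_ps_model_t rcpps_model(xmm_ps_model_t src)
{
    xmm_ps_model_t r;
    for (int i = 0; i < 4; i++)
        r.f[i] = rcp_element_model(src.f[i]);
    return r;
}
```

Real hardware returns an approximation within the stated error bound, so results for ordinary inputs will generally differ from this model in the low-order bits.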



#### **Related Instructions**

RCPSS, RSQRTPS, RSQRTSS

#### **rFLAGS** Affected

None

#### **MXCSR Flags Affected**

None

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1. |
|  | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
|  | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
|  |  |  | X | A null data segment was used to reference memory. |
|  | X | X | X | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF |  | X | X | A page fault resulted from the execution of the instruction. |


# RCPSS Reciprocal Scalar Single-Precision Floating-Point

Computes the approximate reciprocal of the low-order single-precision floating-point value in an XMM register or in a 32-bit memory location and writes the result in the low-order doubleword of another XMM register. The three high-order doublewords in the destination XMM register are not modified. The rounding control bits (RC) in the MXCSR register have no effect on the result.

The maximum relative error is less than or equal to  $1.5 \times 2^{-12}$ . A source value of 0.0 returns an infinity of the source value's sign. Denormal source operands are treated as signed 0.0. Results that underflow are changed to signed 0.0. For both SNaN and QNaN source operands, a QNaN is returned.
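Two points from the description, the preserved high-order doublewords and the relative-error bound, can be sketched in C. This is an illustration only: exact division stands in for the hardware approximation, the bound check is meant for ordinary (normal, nonzero, finite) inputs, and `xmm_ps_t`, `rcpss_model`, and `rcp_within_bound` are hypothetical names.

```c
#include <stdbool.h>

/* Four packed single-precision values, f[0] = bits 31-0 (illustrative type). */
typedef struct { float f[4]; } xmm_ps_t;

/* Models RCPSS dst, src: only the low-order element of dst is replaced.
   Elements 1..3 of dst are returned unmodified. Exact division is used
   here in place of the hardware approximation. */
static xmm_ps_t rcpss_model(xmm_ps_t dst, xmm_ps_t src)
{
    dst.f[0] = 1.0f / src.f[0];
    return dst;
}

/* Returns true if approx is within the documented relative error of 1/x.
   The relative error (approx - 1/x) / (1/x) simplifies to approx * x - 1. */
static bool rcp_within_bound(float x, float approx)
{
    double rel = (double)approx * (double)x - 1.0;
    if (rel < 0.0)
        rel = -rel;
    return rel <= 1.5 / 4096.0;   /* 1.5 * 2^-12 */
}
```

A check like `rcp_within_bound` could be used in a test harness to compare a measured hardware result against the bound stated above.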



#### **Related Instructions**

RCPPS, RSQRTPS, RSQRTSS

#### **rFLAGS** Affected

None

#### **MXCSR Flags Affected**

None

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1. |
|  | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
|  | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
|  |  |  | X | A null data segment was used to reference memory. |
| Page fault, #PF |  | X | X | A page fault resulted from the execution of the instruction. |
| Alignment check, #AC |  | X | X | An unaligned memory reference was performed while alignment checking was enabled. |

# RSQRTPS Reciprocal Square Root Packed Single-Precision Floating-Point

Computes the approximate reciprocal of the square root of each of the four packed single-precision floating-point values in an XMM register or 128-bit memory location and writes the result in the corresponding doubleword of another XMM register. The rounding control bits (RC) in the MXCSR register have no effect on the result.

The maximum relative error for the approximate reciprocal square root is less than or equal to  $1.5 \times 2^{-12}$ . A source value of 0.0 returns an infinity of the source value's sign. Denormal source operands are treated as signed 0.0. For negative source values other than 0.0, the QNaN floating-point indefinite value ("Indefinite Values" in Volume 1) is returned. For both SNaN and QNaN source operands, a QNaN is returned.

| Mnemonic                  | Opcode          | Description                                                                                                                                                                                |
|---------------------------|-----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| RSQRTPS xmm1, xmm2/mem128 | 0F 52 <i>/r</i> | Computes reciprocals of square roots of packed single-precision floating-point values in an XMM register or 128-bit memory location and writes the result in the destination XMM register. |
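The special-case rules for the reciprocal square root can be summarized as a bit-level classifier. This is a sketch under assumptions: the returned QNaN for a NaN source is formed by setting the quiet bit (the text only states that a QNaN is returned), the single-precision indefinite value is taken to be `0xFFC00000`, and the names `rsqrt_kind_t` and `rsqrt_special_case` are hypothetical. Ordinary positive inputs are left to the approximation itself and are not computed here.

```c
#include <stdint.h>

/* Outcome of classifying one source element (illustrative names). */
typedef enum { RSQRT_ORDINARY, RSQRT_SPECIAL } rsqrt_kind_t;

/* Examines the 32-bit pattern of one single-precision source element.
   For a special case, stores the result pattern in *result_bits and
   returns RSQRT_SPECIAL. Otherwise returns RSQRT_ORDINARY and leaves
   *result_bits untouched. */
static rsqrt_kind_t rsqrt_special_case(uint32_t src_bits, uint32_t *result_bits)
{
    uint32_t sign = src_bits & 0x80000000u;
    uint32_t exp  = (src_bits >> 23) & 0xFFu;
    uint32_t frac = src_bits & 0x007FFFFFu;

    if (exp == 0xFFu && frac != 0u) {        /* SNaN or QNaN: a QNaN is returned */
        *result_bits = src_bits | 0x00400000u;
        return RSQRT_SPECIAL;
    }
    if (exp == 0u) {                         /* zero, or denormal treated as signed zero */
        *result_bits = sign | 0x7F800000u;   /* infinity with the source sign */
        return RSQRT_SPECIAL;
    }
    if (sign != 0u) {                        /* negative value other than 0.0 */
        *result_bits = 0xFFC00000u;          /* QNaN floating-point indefinite value */
        return RSQRT_SPECIAL;
    }
    return RSQRT_ORDINARY;                   /* positive normal value or +infinity */
}
```

The order of the checks matters: a negative NaN is handled by the NaN rule, and a negative denormal is handled by the denormal rule, before the negative-value rule is considered.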



#### **Related Instructions**

RSQRTSS, SQRTPD, SQRTPS, SQRTSD, SQRTSS


#### rFLAGS Affected

None

### **MXCSR Flags Affected**

None

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1. |
|  | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
|  | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
|  |  |  | X | A null data segment was used to reference memory. |
|  | X | X | X | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF |  | X | X | A page fault resulted from the execution of the instruction. |

# RSQRTSS Reciprocal Square Root Scalar Single-Precision Floating-Point

Computes the approximate reciprocal of the square root of the low-order single-precision floating-point value in an XMM register or in a 32-bit memory location and writes the result in the low-order doubleword of another XMM register. The three high-order doublewords in the destination XMM register are not modified. The rounding control bits (RC) in the MXCSR register have no effect on the result.

The maximum relative error for the approximate reciprocal square root is less than or equal to  $1.5 \times 2^{-12}$ . A source value of 0.0 returns an infinity of the source value's sign. Denormal source operands are treated as signed 0.0. For negative source values other than 0.0, the QNaN floating-point indefinite value ("Indefinite Values" in Volume 1) is returned. For both SNaN and QNaN source operands, a QNaN is returned.

| Mnemonic                 | Opcode             | Description                                                                                                                                                                            |
|--------------------------|--------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| RSQRTSS xmm1, xmm2/mem32 | F3 0F 52 <i>/r</i> | Computes reciprocal of square root of single-precision floating-point value in an XMM register or 32-bit memory location and writes the result in the destination XMM register. |



#### **Related Instructions**

RSQRTPS, SQRTPD, SQRTPS, SQRTSD, SQRTSS

#### rFLAGS Affected

None

#### **MXCSR Flags Affected**

None

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1. |
|  | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
|  | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
|  |  |  | X | A null data segment was used to reference memory. |
| Page fault, #PF |  | X | X | A page fault resulted from the execution of the instruction. |
| Alignment check, #AC |  | X | X | An unaligned memory reference was performed while alignment checking was enabled. |

# SHUFPD Shuffle Packed Double-Precision Floating-Point

Moves either of the two packed double-precision floating-point values in the first source operand to the low-order quadword of the destination (first source) and moves either of the two packed double-precision floating-point values in the second source operand to the high-order quadword of the destination. In each case, the value of the destination quadword is determined by the least-significant two bits in the immediate-byte operand, as shown in Table 1-7. The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.



#### Table 1-7. Immediate-Byte Operand Encoding for SHUFPD

| Destination Bits Filled | Immediate-Byte<br>Bit Field | Value of Bit<br>Field | Source 1 Bits Moved | Source 2 Bits Moved |
|-------------------------|-----------------------------|-----------------------|---------------------|---------------------|
| 63–0                    | 0                           | 0                     | 63–0                | –                   |
|                         |                             | 1                     | 127–64              | –                   |
| 127–64                  | 1                           | 0                     | –                   | 63–0                |
|                         |                             | 1                     | –                   | 127–64              |
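The encoding in Table 1-7 can be expressed as a short software model. This is a sketch only: `xmm_pd_t` and `shufpd_model` are hypothetical names, and `d[0]` is assumed to hold bits 63–0.

```c
#include <stdint.h>

/* Two packed double-precision values, d[0] = bits 63-0 (illustrative type). */
typedef struct { double d[2]; } xmm_pd_t;

/* Models SHUFPD dst, src, imm8 following Table 1-7:
   bit 0 of imm8 selects which quadword of dst fills bits 63-0, and
   bit 1 of imm8 selects which quadword of src fills bits 127-64.
   Bits 7-2 of imm8 are not used by this model. */
static xmm_pd_t shufpd_model(xmm_pd_t dst, xmm_pd_t src, uint8_t imm8)
{
    xmm_pd_t r;
    r.d[0] = dst.d[imm8 & 1u];
    r.d[1] = src.d[(imm8 >> 1) & 1u];
    return r;
}
```

For example, an immediate value of 1 with the same register as both operands swaps nothing in the high half but copies the high quadword into the low half.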

#### **Related Instructions**

SHUFPS

#### rFLAGS Affected

None

### **MXCSR Flags Affected**

None

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|  | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
|  | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
|  |  |  | X | A null data segment was used to reference memory. |
|  | X | X | X | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF |  | X | X | A page fault resulted from the execution of the instruction. |

# SHUFPS Shuffle Packed Single-Precision Floating-Point

Moves two of the four packed single-precision floating-point values in the first source operand to the low-order quadword of the destination (first source) and moves two of the four packed single-precision floating-point values in the second source operand to the high-order quadword of the destination. In each case, the value of the destination doubleword is determined by a two-bit field in the immediate-byte operand, as shown in Table 1-8 on page 354. The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.
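The selection by two-bit fields can be sketched as a software model. It is an illustration under stated assumptions: `xmm_ps4_t` and `shufps_model` are hypothetical names, and `f[0]` is assumed to hold bits 31–0.

```c
#include <stdint.h>

/* Four packed single-precision values, f[0] = bits 31-0 (illustrative type). */
typedef struct { float f[4]; } xmm_ps4_t;

/* Models SHUFPS dst, src, imm8: each two-bit field of imm8 selects one of
   four source doublewords. Fields 1-0 and 3-2 select from dst and fill
   result bits 31-0 and 63-32. Fields 5-4 and 7-6 select from src and
   fill result bits 95-64 and 127-96. */
static xmm_ps4_t shufps_model(xmm_ps4_t dst, xmm_ps4_t src, uint8_t imm8)
{
    xmm_ps4_t r;
    r.f[0] = dst.f[imm8 & 3u];
    r.f[1] = dst.f[(imm8 >> 2) & 3u];
    r.f[2] = src.f[(imm8 >> 4) & 3u];
    r.f[3] = src.f[(imm8 >> 6) & 3u];
    return r;
}
```

With the same register used for both operands, an immediate of `0x1B` reverses the order of the four elements, and `0x00` broadcasts element 0 to all four positions.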






| Destination Bits Filled | Immediate-Byte<br>Bit Field | Value of Bit<br>Field | Source 1 Bits Moved | Source 2 Bits Moved |  |
|-------------------------|-----------------------------|-----------------------|---------------------|---------------------|--|
| 31–0                    | 1–0                         | 0                     | 31–0                | –                   |  |
|                         |                             | 1                     | 63–32               | –                   |  |
|                         |                             | 2                     | 95–64               | –                   |  |
|                         |                             | 3                     | 127–96              | –                   |  |
| 63–32                   | 3–2                         | 0                     | 31–0                | –                   |  |
|                         |                             | 1                     | 63–32               | –                   |  |
|                         |                             | 2                     | 95–64               | –                   |  |
|                         |                             | 3                     | 127–96              | –                   |  |
| 95–64                   | 5–4                         | 0                     | –                   | 31–0                |  |
|                         |                             | 1                     | –                   | 63–32               |  |
|                         |                             | 2                     | –                   | 95–64               |  |
|                         |                             | 3                     | –                   | 127–96              |  |
| 127–96                  | 7–6                         | 0                     | –                   | 31–0                |  |
|                         |                             | 1                     | –                   | 63–32               |  |
|                         |                             | 2                     | –                   | 95–64               |  |
|                         |                             | 3                     | _                   | 127–96                                                                                                                                                                                                                                                                                                                                                                                                                                   |  |

### Table 1-8. Immediate-Byte Operand Encoding for SHUFPS
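The encoding in Table 1-8 amounts to four 2-bit selectors packed into the immediate byte: the two low fields pick doublewords from the first source (the destination register), and the two high fields pick doublewords from the second source. The following Python sketch models that reading. The function name `shufps` and the list representation (index 0 holds bits 31–0) are illustrative assumptions, not part of the instruction definition.

```python
def shufps(dest, src, imm8):
    """Model SHUFPS on two 4-element lists (index 0 holds bits 31-0)."""
    if len(dest) != 4 or len(src) != 4:
        raise ValueError("each operand must hold four doublewords")
    if not 0 <= imm8 <= 0xFF:
        raise ValueError("imm8 must fit in one byte")
    # Extract the four 2-bit selector fields: bits 1-0, 3-2, 5-4, 7-6.
    sel = [(imm8 >> shift) & 0b11 for shift in (0, 2, 4, 6)]
    # The two low result doublewords come from the first source (dest),
    # the two high result doublewords come from the second source.
    return [dest[sel[0]], dest[sel[1]], src[sel[2]], src[sel[3]]]


a = ["a0", "a1", "a2", "a3"]
b = ["b0", "b1", "b2", "b3"]
print(shufps(a, b, 0xE4))  # selectors 0, 1, 2, 3
print(shufps(a, b, 0x1B))  # selectors 3, 2, 1, 0
```

With these inputs, immediate `0xE4` keeps each position's own index and `0x1B` reverses the selection order within each source.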

#### **Related Instructions**

SHUFPD

#### **rFLAGS** Affected

None

#### **MXCSR Flags Affected**

None

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                           |
|---------------------------|------|-----------------|-----------|----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                    |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.            |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                      |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                         |
|                           |      |                 | X         | A null data segment was used to reference memory.                                            |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                    |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction.                                 |
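The first four rows of the exception table describe a sequence of availability checks that every 128-bit media instruction goes through. The sketch below expresses them as a Python function. The register holding the feature bits (EDX) and the bit positions of EM, TS, and OSFXSR come from general x86 architecture knowledge rather than from the table itself. The function name and the assumption that the checks are evaluated in table order are illustrative.

```python
CPUID_EDX_SSE = 1 << 25    # CPUID standard function 1, EDX bit 25
CPUID_EDX_SSE2 = 1 << 26   # CPUID standard function 1, EDX bit 26
CR0_EM = 1 << 2            # emulate bit
CR0_TS = 1 << 3            # task-switch bit
CR4_OSFXSR = 1 << 9        # operating-system FXSAVE/FXRSTOR support bit


def availability_fault(cpuid_edx, cr0, cr4, needs_sse2=False):
    """Return '#UD', '#NM', or None for an SSE or SSE2 instruction.

    Assumption: the checks are applied in the order the table lists them.
    """
    feature = CPUID_EDX_SSE2 if needs_sse2 else CPUID_EDX_SSE
    if not cpuid_edx & feature:
        return "#UD"   # instruction set not supported
    if cr0 & CR0_EM:
        return "#UD"   # emulation requested
    if not cr4 & CR4_OSFXSR:
        return "#UD"   # operating system has not enabled FXSAVE/FXRSTOR
    if cr0 & CR0_TS:
        return "#NM"   # task switch pending, state may need to be restored
    return None


# An SSE-capable processor with OSFXSR set and a pending task switch.
print(availability_fault(CPUID_EDX_SSE, CR0_TS, CR4_OSFXSR))
```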


# SQRTPD Square Root Packed Double-Precision Floating-Point

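Computes the square root of each of the two packed double-precision floating-point values in an XMM register or 128-bit memory location and writes the result in the corresponding quadword of another XMM register. Taking the square root of +infinity returns +infinity.

| Mnemonic                 | Opcode      | Description                                                                                                                                                              |
|--------------------------|-------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| SQRTPD xmm1, xmm2/mem128 | 66 0F 51 /r | Computes square roots of packed double-precision floating-point values in an XMM register or 128-bit memory location and writes the result in the destination XMM register. |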



#### **Related Instructions**

RSQRTPS, RSQRTSS, SQRTPS, SQRTSD, SQRTSS

#### **rFLAGS** Affected

None

#### **MXCSR Flags Affected**

| FZ | RC | RC | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |    |    |    |    |    |    |    |    |     | M  |    |    |    | M  | M  |
| 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

**Note:** A flag that can be set to one or zero is M (modified). Unaffected flags are blank.
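The bit positions in the table above are enough to unpack a raw MXCSR value into named fields, with the two-bit rounding-control field (RC) occupying bits 14–13. The following is a minimal Python sketch of that decoding. The function name `decode_mxcsr` and the dictionary layout are illustrative assumptions, and the sample value `0x1F80` (all exceptions masked, no flags set) is used only as a convenient input.

```python
FLAG_NAMES = ("IE", "DE", "ZE", "OE", "UE", "PE")   # status flags, bits 0-5
MASK_NAMES = ("IM", "DM", "ZM", "OM", "UM", "PM")   # exception masks, bits 7-12


def decode_mxcsr(value):
    """Split a 16-bit MXCSR value into its named fields."""
    fields = {name: (value >> bit) & 1 for bit, name in enumerate(FLAG_NAMES)}
    fields["DAZ"] = (value >> 6) & 1
    for offset, name in enumerate(MASK_NAMES):
        fields[name] = (value >> (7 + offset)) & 1
    fields["RC"] = (value >> 13) & 0b11   # two-bit rounding-control field
    fields["FZ"] = (value >> 15) & 1
    return fields


state = decode_mxcsr(0x1F80)
print({name: bit for name, bit in state.items() if bit})
```

For a square-root instruction, only the `PE`, `DE`, and `IE` entries of such a dictionary can change.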

| Exception                           | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                                                                               |
|-------------------------------------|------|-----------------|-----------|--------------------------------------------------------------------------------------------------------------------------------------------------|
| Invalid opcode, #UD                 | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1.                                                    |
|                                     | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                                                                        |
|                                     | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.                                                                |
|                                     | X    | X               | X         | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM           | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                                                                    |
| Stack, #SS                          | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                                                                          |
| General protection, #GP             | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                                                                             |
|                                     |      |                 | X         | A null data segment was used to reference memory.                                                                                                |
|                                     | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                                                                        |
| Page fault, #PF                     |      | X               | X         | A page fault resulted from the execution of the instruction.                                                                                     |
| SIMD Floating-Point Exception, #XF  | X    | X               | X         | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| **SIMD Floating-Point Exceptions**  |      |                 |           |                                                                                                                                                  |
| Invalid-operation exception (IE)    | X    | X               | X         | A source operand was an SNaN value.                                                                                                              |
|                                     | X    | X               | X         | A source operand was negative (not including –0).                                                                                                |
| Denormalized-operand exception (DE) | X    | X               | X         | A source operand was a denormal value.                                                                                                           |
| Precision exception (PE)            | X    | X               | X         | A result could not be represented exactly in the destination format.                                                                             |
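The three SIMD floating-point exception causes listed above can be checked directly against the bit pattern of a double-precision operand. The Python sketch below does this for a single element, assuming DAZ = 0. The function name `sqrt_flags` is hypothetical, and the model reports IE alone for any invalid operand, which is a simplification rather than a statement about how hardware prioritizes simultaneous conditions.

```python
import math
import struct
from fractions import Fraction

SIGN_BIT = 0x8000_0000_0000_0000
EXP_MASK = 0x7FF0_0000_0000_0000
FRAC_MASK = 0x000F_FFFF_FFFF_FFFF
QUIET_BIT = 0x0008_0000_0000_0000   # most significant fraction bit


def sqrt_flags(bits):
    """Return the MXCSR flags a double-precision square root of the 64-bit
    pattern `bits` would set, assuming DAZ = 0 (illustrative model)."""
    exp = bits & EXP_MASK
    frac = bits & FRAC_MASK
    negative = bool(bits & SIGN_BIT)
    if exp == EXP_MASK and frac:            # NaN: only a signaling NaN is invalid
        return set() if frac & QUIET_BIT else {"IE"}
    if exp == 0 and frac == 0:              # +0 or -0: no flags
        return set()
    if negative:                            # negative, not including -0
        return {"IE"}
    if exp == EXP_MASK:                     # +infinity returns +infinity
        return set()
    flags = {"DE"} if exp == 0 else set()   # denormal source operand
    x = struct.unpack("<d", struct.pack("<Q", bits))[0]
    root = math.sqrt(x)
    # The result is inexact when the rounded root, squared exactly,
    # does not reproduce the operand.
    if Fraction(root) ** 2 != Fraction(x):
        flags.add("PE")
    return flags


print(sqrt_flags(0x4010_0000_0000_0000))  # operand 4.0
print(sqrt_flags(0xBFF0_0000_0000_0000))  # operand -1.0
```

Whether a flagged condition is then delivered as #XF or #UD depends on the corresponding mask bit and on CR4.OSXMMEXCPT, as the table describes.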

# SQRTPS Square Root Packed Single-Precision Floating-Point

Computes the square root of each of the four packed single-precision floating-point values in an XMM register or 128-bit memory location and writes the result in the corresponding doubleword of another XMM register. Taking the square root of +infinity returns +infinity.

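| Mnemonic                 | Opcode   | Description                                                                                                                                                              |
|--------------------------|----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| SQRTPS xmm1, xmm2/mem128 | 0F 51 /r | Computes square roots of packed single-precision floating-point values in an XMM register or 128-bit memory location and writes the result in the destination XMM register. |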
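As a rough illustration of the packed operation, the Python sketch below applies a square root to each of four single-precision elements. The names `to_f32` and `sqrtps` are hypothetical. Python floats are doubles, so the model rounds through a 32-bit encoding; taking the root in double precision and rounding once to single is generally understood to give the correctly rounded single-precision square root, but treat that as an assumption of this sketch. Negative operands raise a Python `ValueError` here, whereas the instruction reports an invalid-operation exception.

```python
import math
import struct


def to_f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def sqrtps(src):
    """Model SQRTPS: element-wise square root of four single-precision values."""
    if len(src) != 4:
        raise ValueError("source must hold four single-precision values")
    return [to_f32(math.sqrt(to_f32(v))) for v in src]


print(sqrtps([4.0, 9.0, 0.25, float("inf")]))  # [2.0, 3.0, 0.5, inf]
```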



#### **Related Instructions**

RSQRTPS, RSQRTSS, SQRTPD, SQRTSD, SQRTSS

#### **rFLAGS** Affected

None

#### **MXCSR Flags Affected**

| FZ | RC | RC | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |    |    |    |    |    |    |    |    |     | M  |    |    |    | M  | M  |
| 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

**Note:** A flag that can be set to one or zero is M (modified). Unaffected flags are blank.

| Exception                           | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                                                                               |
|-------------------------------------|------|-----------------|-----------|--------------------------------------------------------------------------------------------------------------------------------------------------|
| Invalid opcode, #UD                 | X    | X               | X         | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1.                                                     |
|                                     | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                                                                        |
|                                     | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.                                                                |
|                                     | X    | X               | X         | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM           | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                                                                    |
| Stack, #SS                          | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                                                                          |
| General protection, #GP             | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                                                                             |
|                                     |      |                 | X         | A null data segment was used to reference memory.                                                                                                |
|                                     | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary.                                                                                        |
| Page fault, #PF                     |      | X               | X         | A page fault resulted from the execution of the instruction.                                                                                     |
| SIMD Floating-Point Exception, #XF  | X    | X               | X         | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| **SIMD Floating-Point Exceptions**  |      |                 |           |                                                                                                                                                  |
| Invalid-operation exception (IE)    | X    | X               | X         | A source operand was an SNaN value.                                                                                                              |
|                                     | X    | X               | X         | A source operand was negative (not including –0).                                                                                                |
| Denormalized-operand exception (DE) | X    | X               | X         | A source operand was a denormal value.                                                                                                           |
| Precision exception (PE)            | X    | X               | X         | A result could not be represented exactly in the destination format.                                                                             |

# SQRTSD Square Root Scalar Double-Precision Floating-Point

Computes the square root of the low-order double-precision floating-point value in an XMM register or in a 64-bit memory location and writes the result in the low-order quadword of another XMM register. The high-order quadword of the destination XMM register is not modified. Taking the square root of +infinity returns +infinity.



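| Mnemonic               | Opcode      | Description                                                                                                                                                       |
|------------------------|-------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| SQRTSD xmm1, xmm2/mem64 | F2 0F 51 /r | Computes square root of double-precision floating-point value in an XMM register or 64-bit memory location and writes the result in the destination XMM register. |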
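The difference between the scalar and packed double-precision forms is which quadwords of the destination are written. The Python sketch below contrasts the two under the assumption that a register is modelled as a `(low, high)` tuple of doubles. Both function names are hypothetical and exception behavior is not modelled.

```python
import math


def sqrtsd(dest, src_low):
    """Model SQRTSD: `dest` is a (low, high) pair of doubles and `src_low`
    is the low-order double of the source register or the 64-bit memory
    operand. Only the low-order quadword of the result is new."""
    _, high = dest
    return (math.sqrt(src_low), high)   # high-order quadword is not modified


def sqrtpd(src):
    """Model SQRTPD for comparison: both quadwords are replaced."""
    low, high = src
    return (math.sqrt(low), math.sqrt(high))


print(sqrtsd((1.0, 7.0), 9.0))  # (3.0, 7.0)
print(sqrtpd((4.0, 16.0)))      # (2.0, 4.0)
```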



#### **Related Instructions**

RSQRTPS, RSQRTSS, SQRTPD, SQRTPS, SQRTSS

#### **rFLAGS** Affected

None

#### **MXCSR Flags Affected**

| FZ | RC | RC | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |    |    |    |    |    |    |    |    |     | M  |    |    |    | M  | M  |
| 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

**Note:** A flag that can be set to one or zero is M (modified). Unaffected flags are blank.

| Exception                           | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                                                                               |
|-------------------------------------|------|-----------------|-----------|--------------------------------------------------------------------------------------------------------------------------------------------------|
| Invalid opcode, #UD                 | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1.                                                    |
|                                     | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                                                                        |
|                                     | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.                                                                |
|                                     | X    | X               | X         | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM           | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                                                                    |
| Stack, #SS                          | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                                                                          |
| General protection, #GP             | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                                                                             |
|                                     |      |                 | X         | A null data segment was used to reference memory.                                                                                                |
| Page fault, #PF                     |      | X               | X         | A page fault resulted from the execution of the instruction.                                                                                     |
| Alignment check, #AC                |      | X               | X         | An unaligned memory reference was performed while alignment checking was enabled.                                                                |
| SIMD Floating-Point Exception, #XF  | X    | X               | X         | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| **SIMD Floating-Point Exceptions**  |      |                 |           |                                                                                                                                                  |
| Invalid-operation exception (IE)    | X    | X               | X         | A source operand was an SNaN value.                                                                                                              |
|                                     | X    | X               | X         | A source operand was negative (not including –0).                                                                                                |
| Denormalized-operand exception (DE) | X    | X               | X         | A source operand was a denormal value.                                                                                                           |
| Precision exception (PE)            | X    | X               | X         | A result could not be represented exactly in the destination format.                                                                             |

# SQRTSS Square Root Scalar Single-Precision Floating-Point

Computes the square root of the low-order single-precision floating-point value in an XMM register or 32-bit memory location and writes the result in the low-order doubleword of another XMM register. The three high-order doublewords of the destination XMM register are not modified. Taking the square root of +infinity returns +infinity.

| Mnemonic                | Opcode      | Description                                                                                                                                                             |
|-------------------------|-------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| SQRTSS xmm1, xmm2/mem32 | F3 0F 51 /r | Computes square root of single-precision floating-point value in<br>an XMM register or 32-bit memory location and writes the result<br>in the destination XMM register. |



#### **Related Instructions**

RSQRTPS, RSQRTSS, SQRTPD, SQRTPS, SQRTSD

#### **rFLAGS** Affected

None

#### **MXCSR Flags Affected**

| FZ | RC | RC | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|----|----|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |    |    |    |    |    |    |    |    |     | M  |    |    |    | M  | M  |
| 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

**Note:** A flag that can be set to one or zero is M (modified). Unaffected flags are blank.

| Exception                           | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                                                                               |
|-------------------------------------|------|-----------------|-----------|--------------------------------------------------------------------------------------------------------------------------------------------------|
| Invalid opcode, #UD                 | X    | X               | X         | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1.                                                     |
|                                     | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                                                                        |
|                                     | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.                                                                |
|                                     | X    | X               | X         | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM           | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                                                                    |
| Stack, #SS                          | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                                                                          |
| General protection, #GP             | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                                                                             |
|                                     |      |                 | X         | A null data segment was used to reference memory.                                                                                                |
| Page fault, #PF                     |      | X               | X         | A page fault resulted from the execution of the instruction.                                                                                     |
| Alignment check, #AC                |      | X               | X         | An unaligned memory reference was performed while alignment checking was enabled.                                                                |
| SIMD Floating-Point Exception, #XF  | X    | X               | X         | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| **SIMD Floating-Point Exceptions**  |      |                 |           |                                                                                                                                                  |
| Invalid-operation exception (IE)    | X    | X               | X         | A source operand was an SNaN value.                                                                                                              |
|                                     | X    | X               | X         | A source operand was negative (not including –0).                                                                                                |
| Denormalized-operand exception (DE) | X    | X               | X         | A source operand was a denormal value.                                                                                                           |
| Precision exception (PE)            | X    | X               | X         | A result could not be represented exactly in the destination format.                                                                             |
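The exception tables for the packed and scalar square-root forms differ in how they treat a misaligned memory operand: the packed forms list a general-protection exception for any operand not on a 16-byte boundary, while the scalar forms list only an alignment-check exception, and only when alignment checking is enabled. The Python sketch below captures that distinction. The function name, the single `alignment_check_enabled` flag standing in for the full set of enabling conditions, and the use of the operand size as the required scalar alignment are all assumptions of this illustration.

```python
def alignment_fault(address, operand_bytes, packed, alignment_check_enabled):
    """Return '#GP', '#AC', or None for a memory source operand.

    packed=True models the 128-bit forms (for example SQRTPS);
    packed=False models the scalar forms (for example SQRTSS).
    """
    if packed:
        # 128-bit operands must sit on a 16-byte boundary in every mode.
        return "#GP" if address % 16 else None
    # Scalar operands fault only when alignment checking is enabled.
    if alignment_check_enabled and address % operand_bytes:
        return "#AC"
    return None


print(alignment_fault(0x1008, 16, packed=True, alignment_check_enabled=False))
print(alignment_fault(0x1002, 4, packed=False, alignment_check_enabled=True))
print(alignment_fault(0x1002, 4, packed=False, alignment_check_enabled=False))
```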

# STMXCSR Store MXCSR Control/Status Register

Saves the contents of the MXCSR register in a 32-bit location in memory. The MXCSR register is described in "Registers" in Volume 1.
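A minimal Python sketch of the store, assuming memory is modelled as a `bytearray` and that the 32-bit value is written in the little-endian byte order used by x86 processors. The helper names `stmxcsr` and `ldmxcsr` mirror the instruction mnemonics but are otherwise hypothetical, and none of the exception conditions are modelled.

```python
import struct


def stmxcsr(memory, address, mxcsr):
    """Store the 32-bit MXCSR value at `address` in a bytearray."""
    if address < 0 or address + 4 > len(memory):
        raise IndexError("32-bit store falls outside the modelled memory")
    struct.pack_into("<I", memory, address, mxcsr & 0xFFFF_FFFF)


def ldmxcsr(memory, address):
    """Read back a 32-bit value stored by stmxcsr."""
    if address < 0 or address + 4 > len(memory):
        raise IndexError("32-bit load falls outside the modelled memory")
    return struct.unpack_from("<I", memory, address)[0]


mem = bytearray(8)
stmxcsr(mem, 4, 0x1F80)
print(mem.hex())              # 00000000801f0000
print(hex(ldmxcsr(mem, 4)))   # 0x1f80
```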

| Mnemonic      | Opcode   | Description                                         |
|---------------|----------|-----------------------------------------------------|
| STMXCSR mem32 | 0F AE /3 | Stores contents of MXCSR in 32-bit memory location. |

#### **Related Instructions**

LDMXCSR

#### **rFLAGS** Affected

None

#### **MXCSR Flags Affected**

None

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception                                                                           |
|---------------------------|------|-----------------|-----------|----------------------------------------------------------------------------------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1.                                                    |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.            |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1.                                                |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical.                      |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical.                         |
|                           |      |                 | X         | A null data segment was used to reference memory.                                            |
|                           |      |                 | X         | The destination operand was in a non-writable segment.                                       |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction.                                 |
| Alignment check, #AC      |      | X               | X         | An unaligned memory reference was performed while alignment checking was enabled.            |

# SUBPD Subtract Packed Double-Precision Floating-Point

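Subtracts each packed double-precision floating-point value in the second source operand from the corresponding packed double-precision floating-point value in the first source operand and writes the result of each subtraction in the corresponding quadword of the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.

| Mnemonic                | Opcode      | Description                                                                                                                                                                                                                                                  |
|-------------------------|-------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| SUBPD xmm1, xmm2/mem128 | 66 0F 5C /r | Subtracts packed double-precision floating-point values in an XMM register or 128-bit memory location from packed double-precision floating-point values in another XMM register and writes the result in the destination XMM register. |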
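The operand order is easy to get backwards, so a small Python sketch may help: the second source is subtracted from the first, element by element, and the result replaces the first source. The function name `subpd` and the `(low, high)` tuple representation are illustrative assumptions. Python float subtraction uses double precision with round-to-nearest, so other rounding-control settings and exception behavior are not modelled.

```python
def subpd(dest, src):
    """Model SUBPD: `dest` and `src` are (low, high) pairs of doubles.
    Returns the new destination value, dest[i] - src[i] for each quadword."""
    if len(dest) != 2 or len(src) != 2:
        raise ValueError("each operand must hold two double-precision values")
    return tuple(d - s for d, s in zip(dest, src))


print(subpd((5.0, 1.5), (2.0, 0.5)))  # (3.0, 1.0)
```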



#### **Related Instructions**

SUBPS, SUBSD, SUBSS

#### **rFLAGS Affected**

None

#### **MXCSR Flags Affected**

| FZ | RC    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|-------|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |       |    |    |    |    |    |    |     | M  | M  | M  |    | M  | M  |
| 15 | 14-13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

Note:

*A flag that can be set to one or zero is M (modified). Unaffected flags are blank.*

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|-----------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD | X | X | X | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
| | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
| | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| | X | X | X | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| SIMD Floating-Point Exception, #XF | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| **SIMD Floating-Point Exceptions** | | | | |
| Invalid-operation exception (IE) | X | X | X | A source operand was an SNaN value. |
| | X | X | X | +infinity was subtracted from +infinity. |
| | X | X | X | -infinity was subtracted from -infinity. |
| Denormalized-operand exception (DE) | X | X | X | A source operand was a denormal value. |
| Overflow exception (OE) | X | X | X | A rounded result was too large to fit into the format of the destination operand. |
| Underflow exception (UE) | X | X | X | A rounded result was too small to fit into the format of the destination operand. |
| Precision exception (PE) | X | X | X | A result could not be represented exactly in the destination format. |
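The table above distinguishes two outcomes for an unmasked SIMD floating-point exception, depending on CR4.OSXMMEXCPT. The following Python sketch models only that decision. The function name `simd_fp_fault` and its arguments are hypothetical; the bit positions are those listed in the MXCSR flag table, and the sketch assumes the usual MXCSR convention that a mask bit of 1 suppresses the corresponding fault. Exception priority and the setting of the status flags themselves are not modeled.

```python
# MXCSR mask-bit positions (IM, DM, ZM, OM, UM, PM), keyed by the
# exception flag each one masks.
MASK_BITS = {"IE": 7, "DE": 8, "ZE": 9, "OE": 10, "UE": 11, "PE": 12}

ALL_MASKED = 0x1F80  # MXCSR reset value: every exception masked


def simd_fp_fault(raised, mxcsr, osxmmexcpt):
    """Return "#XF", "#UD", or None for a set of raised SIMD FP conditions.

    raised      -- iterable of names from MASK_BITS, e.g. {"IE", "PE"}
    mxcsr       -- integer MXCSR value; a mask bit of 1 suppresses the fault
    osxmmexcpt  -- value of CR4.OSXMMEXCPT (0 or 1)
    """
    unmasked = [n for n in raised if ((mxcsr >> MASK_BITS[n]) & 1) == 0]
    if not unmasked:
        return None                      # all raised conditions are masked
    return "#XF" if osxmmexcpt else "#UD"


ie_unmasked = ALL_MASKED & ~(1 << MASK_BITS["IE"])
print(simd_fp_fault({"IE"}, ALL_MASKED, 1))    # None
print(simd_fp_fault({"IE"}, ie_unmasked, 1))   # #XF
print(simd_fp_fault({"IE"}, ie_unmasked, 0))   # #UD
```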

## SUBPS Subtract Packed Single-Precision Floating-Point

Subtracts each packed single-precision floating-point value in the second source operand from the corresponding packed single-precision floating-point value in the first source operand and writes the result of each subtraction in the corresponding doubleword of the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.

| Mnemonic                | Opcode          | Description |
|-------------------------|-----------------|-------------|
| SUBPS xmm1, xmm2/mem128 | 0F 5C <i>/r</i> | Subtracts packed single-precision floating-point values in an XMM register or 128-bit memory location from packed single-precision floating-point values in another XMM register and writes the result in the destination XMM register. |
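The four-lane single-precision behavior can be sketched in Python as follows. The names `to_f32` and `subps` are hypothetical, and a register is modeled as a four-element list with index 0 as the low-order doubleword. Python has no native single-precision type, so the sketch rounds through the `struct` module; for a single subtraction, computing in double precision and rounding once to single precision yields the same value as a native single-precision operation under round-to-nearest. Results that overflow the single-precision range are not modeled (`struct.pack` raises `OverflowError` for them), and MXCSR state is ignored.

```python
import struct


def to_f32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def subps(dst, src):
    """Model SUBPS: dst - src in each of the four single-precision lanes.

    dst and src are 4-element sequences; index 0 is the low-order doubleword.
    """
    if len(dst) != 4 or len(src) != 4:
        raise ValueError("an XMM register holds exactly four singles")
    return [to_f32(to_f32(d) - to_f32(s)) for d, s in zip(dst, src)]


xmm1 = [1.0, 2.5, 3.0, 16777216.0]   # hypothetical register contents
xmm2 = [0.5, 0.5, 1.0, 0.25]
print(subps(xmm1, xmm2))             # [0.5, 2.0, 2.0, 16777216.0]
```

The last lane shows an inexact result: 16777215.75 is not representable in single precision and rounds back to 16777216.0, which is the situation the precision exception (PE) reports.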



#### **Related Instructions**

SUBPD, SUBSD, SUBSS

#### **rFLAGS Affected**

None

#### **MXCSR Flags Affected**

| FZ | RC    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|-------|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |       |    |    |    |    |    |    |     | M  | M  | M  |    | M  | M  |
| 15 | 14-13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

Note:

*A flag that can be set to one or zero is M (modified). Unaffected flags are blank.*

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|-----------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD | X | X | X | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1. |
| | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
| | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| | X | X | X | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| SIMD Floating-Point Exception, #XF | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| **SIMD Floating-Point Exceptions** | | | | |
| Invalid-operation exception (IE) | X | X | X | A source operand was an SNaN value. |
| | X | X | X | +infinity was subtracted from +infinity. |
| | X | X | X | -infinity was subtracted from -infinity. |
| Denormalized-operand exception (DE) | X | X | X | A source operand was a denormal value. |
| Overflow exception (OE) | X | X | X | A rounded result was too large to fit into the format of the destination operand. |
| Underflow exception (UE) | X | X | X | A rounded result was too small to fit into the format of the destination operand. |
| Precision exception (PE) | X | X | X | A result could not be represented exactly in the destination format. |

## SUBSD Subtract Scalar Double-Precision Floating-Point

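Subtracts the double-precision floating-point value in the low-order quadword of the second source operand from the double-precision floating-point value in the low-order quadword of the first source operand and writes the result in the low-order quadword of the destination (first source). The high-order quadword of the destination is not modified. The first source/destination operand is an XMM register. The second source operand is another XMM register or 64-bit memory location.

| Mnemonic               | Opcode             | Description |
|------------------------|--------------------|-------------|
| SUBSD xmm1, xmm2/mem64 | F2 0F 5C <i>/r</i> | Subtracts low-order double-precision floating-point value in an XMM register or in a 64-bit memory location from low-order double-precision floating-point value in another XMM register and writes the result in the destination XMM register. |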
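The scalar form differs from the packed form only in which lanes are written. A minimal Python sketch, under the same assumptions as any software model (the name `subsd` and the list representation of a register, with index 0 as the low-order quadword, are hypothetical; MXCSR state and exceptions are not modeled):

```python
def subsd(dst, src_low):
    """Model SUBSD: subtract in the low-order quadword only.

    dst      -- 2-element sequence, index 0 is the low-order quadword
    src_low  -- low-order double of the second source (register or mem64)
    The high-order quadword of dst is copied through unchanged.
    """
    if len(dst) != 2:
        raise ValueError("an XMM register holds exactly two doubles")
    return [dst[0] - src_low, dst[1]]


xmm1 = [5.0, 99.0]        # hypothetical register contents
xmm1 = subsd(xmm1, 2.0)
print(xmm1)               # [3.0, 99.0]
```

Only the low-order value of the second source participates, which is why a 64-bit memory operand is sufficient.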





#### **Related Instructions**

SUBPD, SUBPS, SUBSS

#### **rFLAGS Affected**

None

#### **MXCSR Flags Affected**

| FZ | RC    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|-------|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |       |    |    |    |    |    |    |     | M  | M  | M  |    | M  | M  |
| 15 | 14-13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

Note:

*A flag that can be set to one or zero is M (modified). Unaffected flags are blank.*

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|-----------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD | X | X | X | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
| | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
| | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |
| SIMD Floating-Point Exception, #XF | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| **SIMD Floating-Point Exceptions** | | | | |
| Invalid-operation exception (IE) | X | X | X | A source operand was an SNaN value. |
| | X | X | X | +infinity was subtracted from +infinity. |
| | X | X | X | -infinity was subtracted from -infinity. |
| Denormalized-operand exception (DE) | X | X | X | A source operand was a denormal value. |
| Overflow exception (OE) | X | X | X | A rounded result was too large to fit into the format of the destination operand. |
| Underflow exception (UE) | X | X | X | A rounded result was too small to fit into the format of the destination operand. |
| Precision exception (PE) | X | X | X | A result could not be represented exactly in the destination format. |

## SUBSS Subtract Scalar Single-Precision Floating-Point

Subtracts the single-precision floating-point value in the low-order doubleword of the second source operand from the single-precision floating-point value in the low-order doubleword of the first source operand and writes the result in the low-order doubleword of the destination (first source). The three high-order doublewords of the destination are not modified. The first source/destination operand is an XMM register. The second source operand is another XMM register or 32-bit memory location.

| Mnemonic               | Opcode             | Description                                                                                                                                                                                                                                     |
|------------------------|--------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| SUBSS xmm1, xmm2/mem32 | F3 0F 5C <i>/r</i> | Subtracts low-order single-precision floating-point value in an XMM register or in a 32-bit memory location from low-order single-precision floating-point value in another XMM register and writes the result in the destination XMM register. |



#### **Related Instructions**

SUBPD, SUBPS, SUBSD

#### **rFLAGS Affected**

None

#### **MXCSR Flags Affected**

| FZ | RC    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|-------|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |       |    |    |    |    |    |    |     | M  | M  | M  |    | M  | M  |
| 15 | 14-13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

Note:

*A flag that can be set to one or zero is M (modified). Unaffected flags are blank.*

| Exception | Real | Virtual<br>8086 | Protected | Cause of Exception |
|-----------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD | X | X | X | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1. |
| | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
| | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |
| SIMD Floating-Point Exception, #XF | X | X | X | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1.<br>See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| **SIMD Floating-Point Exceptions** | | | | |
| Invalid-operation exception (IE) | X | X | X | A source operand was an SNaN value. |
| | X | X | X | +infinity was subtracted from +infinity. |
| | X | X | X | -infinity was subtracted from -infinity. |
| Denormalized-operand exception (DE) | X | X | X | A source operand was a denormal value. |
| Overflow exception (OE) | X | X | X | A rounded result was too large to fit into the format of the destination operand. |
| Underflow exception (UE) | X | X | X | A rounded result was too small to fit into the format of the destination operand. |
| Precision exception (PE) | X | X | X | A result could not be represented exactly in the destination format. |
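Two of the causes listed above, an SNaN source operand (IE) and a denormal source operand (DE), depend only on the bit pattern of the operand. The following Python sketch classifies a 32-bit single-precision encoding so that those rows can be checked against concrete values. The function names `classify_f32` and `f32_bits` are hypothetical, and the sketch assumes the standard IEEE 754 single-precision layout (1 sign bit, 8 exponent bits, 23 fraction bits) with the most significant fraction bit distinguishing a QNaN from an SNaN. The effect of MXCSR.DAZ on denormal operands is not modeled.

```python
import struct


def f32_bits(x):
    """Return the 32-bit single-precision encoding of a Python float."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def classify_f32(bits):
    """Classify a 32-bit single-precision encoding by exponent and fraction."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0xFF:
        if fraction == 0:
            return "infinity"
        # Most significant fraction bit set: quiet NaN; clear: signaling NaN.
        return "QNaN" if fraction & 0x400000 else "SNaN"
    if exponent == 0:
        return "zero" if fraction == 0 else "denormal"
    return "normal"


print(classify_f32(0x7FA00000))       # SNaN      (IE row)
print(classify_f32(0x7FC00000))       # QNaN
print(classify_f32(0x00000001))       # denormal  (DE row)
print(classify_f32(0xFF800000))       # infinity
print(classify_f32(f32_bits(1.0)))    # normal
```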

## UCOMISD Unordered Compare Scalar Double-Precision Floating-Point

Performs an unordered compare of the double-precision floating-point value in the low-order 64 bits of an XMM register with the double-precision floating-point value in the low-order 64 bits of another XMM register or a 64-bit memory location and sets the ZF, PF, and CF bits in the rFLAGS register to reflect the result of the compare. The result is unordered if one or both of the operand values is a NaN. The OF, AF, and SF bits in rFLAGS are set to zero.

If the instruction causes an unmasked SIMD floating-point exception (#XF), the rFLAGS bits are not updated.

| Mnemonic                 | Opcode             | Description |
|--------------------------|--------------------|-------------|
| UCOMISD xmm1, xmm2/mem64 | 66 0F 2E <i>/r</i> | Compares scalar double-precision floating-point values in an XMM register and an XMM register or 64-bit memory location. Sets rFLAGS. |



| Result of Compare | ZF | PF | CF |
|-------------------|----|----|----|
| Unordered         | 1  | 1  | 1  |
| Greater Than      | 0  | 0  | 0  |
| Less Than         | 0  | 0  | 1  |
| Equal             | 1  | 0  | 0  |
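The result table above maps directly onto a small decision function. The Python sketch below is illustrative only: the name `ucomisd_flags` is hypothetical, and it takes "Greater Than" and "Less Than" to describe the first operand relative to the second. It returns only the ZF, PF, and CF values and does not model the clearing of OF, AF, and SF, the MXCSR flags, or the case where an unmasked exception leaves rFLAGS unchanged.

```python
import math


def ucomisd_flags(a, b):
    """Return (ZF, PF, CF) for an unordered compare of a (first operand) with b."""
    if math.isnan(a) or math.isnan(b):
        return (1, 1, 1)   # unordered: at least one operand is a NaN
    if a > b:
        return (0, 0, 0)   # greater than
    if a < b:
        return (0, 0, 1)   # less than
    return (1, 0, 0)       # equal (+0.0 and -0.0 compare equal)


print(ucomisd_flags(2.0, 1.0))            # (0, 0, 0)
print(ucomisd_flags(1.0, 2.0))            # (0, 0, 1)
print(ucomisd_flags(1.0, 1.0))            # (1, 0, 0)
print(ucomisd_flags(float("nan"), 1.0))   # (1, 1, 1)
```

Because the unordered case sets all three flags, code that tests only CF or ZF after the compare must check PF first if NaN operands are possible.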

#### **Related Instructions**

CMPPD, CMPPS, CMPSD, CMPSS, COMISD, COMISS, UCOMISS

#### **rFLAGS Affected**

| ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF |
|----|-----|-----|----|----|----|----|-------|----|----|----|----|----|----|----|----|----|
|    |     |     |    |    |    |    |       | 0  |    |    |    | 0  | M  | 0  | M  | M  |
| 21 | 20  | 19  | 18 | 17 | 16 | 14 | 13-12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0  |

Note:

*Bits 31–22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. If the instruction causes an unmasked SIMD floating-point exception (#XF), the rFLAGS bits are not updated.* 

#### **MXCSR Flags Affected**

| FZ | RC    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|-------|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |       |    |    |    |    |    |    |     |    |    |    |    | M  | M  |
| 15 | 14-13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

Note:

*A flag that can be set to one or zero is M (modified). Unaffected flags are blank.*

#### **Exceptions**

| Exception                           | Real | Virtual<br>8086 | Protected | Cause of Exception |
|-------------------------------------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD                 | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                                     | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1. |
|                                     | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
|                                     | X    | X               | X         | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM           | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS                          | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP             | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical. |
|                                     |      |                 | X         | A null data segment was used to reference memory. |
| Page fault, #PF                     |      | X               | X         | A page fault resulted from the execution of the instruction. |
| Alignment check, #AC                |      | X               | X         | An unaligned memory reference was performed while alignment checking was enabled. |
| SIMD Floating-Point Exception, #XF  | X    | X               | X         | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| **SIMD Floating-Point Exceptions**  |      |                 |           |  |
| Invalid-operation exception (IE)    | X    | X               | X         | A source operand was an SNaN value. |
| Denormalized-operand exception (DE) | X    | X               | X         | A source operand was a denormal value. |

## UCOMISS Unordered Compare Scalar Single-Precision Floating-Point

Performs an unordered compare of the single-precision floating-point value in the low-order 32 bits of an XMM register with the single-precision floating-point value in the low-order 32 bits of another XMM register or a 32-bit memory location and sets the ZF, PF, and CF bits in the rFLAGS register to reflect the result. The result is unordered if one or both of the operand values is a NaN. The OF, AF, and SF bits in rFLAGS are set to zero.

If the instruction causes an unmasked SIMD floating-point exception (#XF), the rFLAGS bits are not updated.
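
As a rough illustration, the flag encoding given in the Result of Compare table for this instruction can be modeled in a few lines of Python. The function name `ucomiss_flags` is hypothetical. Python floats are double-precision, so the sketch matches the instruction only for operands that are exactly representable in single precision, and the #XF case is not modeled.

```python
import math

def ucomiss_flags(a, b):
    """Model the rFLAGS bits produced by comparing a (first operand)
    with b (second operand). Illustrative only, not a hardware model."""
    if math.isnan(a) or math.isnan(b):
        zf, pf, cf = 1, 1, 1   # unordered: at least one operand is a NaN
    elif a > b:
        zf, pf, cf = 0, 0, 0   # greater than
    elif a < b:
        zf, pf, cf = 0, 0, 1   # less than
    else:
        zf, pf, cf = 1, 0, 0   # equal
    # OF, AF, and SF are always cleared to 0 by the instruction.
    return {"ZF": zf, "PF": pf, "CF": cf, "OF": 0, "AF": 0, "SF": 0}
```

Calling `ucomiss_flags(1.0, 2.0)` gives CF = 1 with ZF and PF clear, the "Less Than" row of the table. Passing a NaN for either argument sets all three flags.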

| Mnemonic                 | Opcode          | Description                                                                                                                           |
|--------------------------|-----------------|---------------------------------------------------------------------------------------------------------------------------------------|
| UCOMISS xmm1, xmm2/mem32 | 0F 2E <i>/r</i> | Compares scalar single-precision floating-point values in an XMM register and an XMM register or 32-bit memory location. Sets rFLAGS. |



| Result of Compare | ZF | PF | CF |
|-------------------|----|----|----|
| Unordered         | 1  | 1  | 1  |
| Greater Than      | 0  | 0  | 0  |
| Less Than         | 0  | 0  | 1  |
| Equal             | 1  | 0  | 0  |

#### **Related Instructions**

CMPPD, CMPPS, CMPSD, CMPSS, COMISD, COMISS, UCOMISD

#### **rFLAGS Affected**

| ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF |
|----|-----|-----|----|----|----|----|-------|----|----|----|----|----|----|----|----|----|
|    |     |     |    |    |    |    |       | 0  |    |    |    | 0  | M  | 0  | M  | M  |
| 21 | 20  | 19  | 18 | 17 | 16 | 14 | 13-12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0  |

Note:

*Bits 31–22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. If the instruction causes an unmasked SIMD floating-point exception (#XF), the rFLAGS bits are not updated.* 
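
The note above says rFLAGS is left unchanged when an unmasked SIMD floating-point exception occurs. Whether a detected condition is unmasked, and which fault it then produces, can be sketched as follows. The function `simd_fp_fault` and its parameter names are hypothetical. The bit positions follow the MXCSR layout shown for this instruction (IM is bit 7, DM is bit 8), and only the IE and DE conditions listed for UCOMISS are modeled.

```python
IM_BIT = 7  # invalid-operation mask (IM) in MXCSR
DM_BIT = 8  # denormalized-operand mask (DM) in MXCSR

def simd_fp_fault(ie, de, mxcsr, osxmmexcpt):
    """Return '#XF', '#UD', or None for a detected IE/DE condition.

    ie, de      -- True if the corresponding condition was detected
    mxcsr       -- MXCSR value holding the mask bits
    osxmmexcpt  -- value of CR4.OSXMMEXCPT (0 or 1)
    """
    ie_unmasked = ie and ((mxcsr >> IM_BIT) & 1) == 0
    de_unmasked = de and ((mxcsr >> DM_BIT) & 1) == 0
    if not (ie_unmasked or de_unmasked):
        return None  # condition is masked or absent: no fault is taken
    # Unmasked condition: #XF if the OS enabled it, otherwise #UD.
    return "#XF" if osxmmexcpt == 1 else "#UD"
```

With an MXCSR value of 0x1F80, in which all mask bits are set, the function returns `None` for any input. Clearing IM (0x1F00) and passing `ie=True` yields `'#XF'` or `'#UD'` depending on `osxmmexcpt`.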

#### **MXCSR Flags Affected**

| FZ | RC    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE |
|----|-------|----|----|----|----|----|----|-----|----|----|----|----|----|----|
|    |       |    |    |    |    |    |    |     |    |    |    |    | M  | M  |
| 15 | 14-13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0  |

Note:

*A flag that can be set to one or zero is M (modified). Unaffected flags are blank.*

#### **Exceptions**

| Exception                           | Real | Virtual<br>8086 | Protected | Cause of Exception |
|-------------------------------------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD                 | X    | X               | X         | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1. |
|                                     | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1. |
|                                     | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
|                                     | X    | X               | X         | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| Device not available, #NM           | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS                          | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP             | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical. |
|                                     |      |                 | X         | A null data segment was used to reference memory. |
| Page fault, #PF                     |      | X               | X         | A page fault resulted from the execution of the instruction. |
| Alignment check, #AC                |      | X               | X         | An unaligned memory reference was performed while alignment checking was enabled. |
| SIMD Floating-Point Exception, #XF  | X    | X               | X         | There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See <i>SIMD Floating-Point Exceptions</i>, below, for details. |
| **SIMD Floating-Point Exceptions**  |      |                 |           |  |
| Invalid-operation exception (IE)    | X    | X               | X         | A source operand was an SNaN value. |
| Denormalized-operand exception (DE) | X    | X               | X         | A source operand was a denormal value. |

## UNPCKHPD Unpack High Double-Precision Floating-Point

Unpacks the high-order double-precision floating-point values in the first and second source operands and packs them into quadwords in the destination (first source). The value from the first source operand is packed into the low-order quadword of the destination, and the value from the second source operand is packed into the high-order quadword of the destination. The low-order quadwords of the source operands are ignored. The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.
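
The data movement described above can be sketched in Python. This is an illustrative model, not the hardware definition: `unpckhpd` is a hypothetical helper, and each 128-bit operand is represented as a `(low, high)` pair of doubles.

```python
def unpckhpd(dst, src):
    """Model of the unpack-high operation on (low, high) pairs of doubles.

    dst -- first source/destination operand
    src -- second source operand
    Returns the new destination value.
    """
    # High quadword of the first source goes to the low quadword of the
    # destination; high quadword of the second source goes to the high one.
    return (dst[1], src[1])
```

For example, `unpckhpd((1.0, 2.0), (3.0, 4.0))` returns `(2.0, 4.0)`. The low-order values 1.0 and 3.0 do not appear in the result.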



#### **Related Instructions**

UNPCKHPS, UNPCKLPD, UNPCKLPS

#### **rFLAGS Affected**

None

#### **MXCSR Flags Affected**

None

#### **Exceptions**

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception |
|---------------------------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1. |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical. |
|                           |      |                 | X         | A null data segment was used to reference memory. |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction. |

## UNPCKHPS Unpack High Single-Precision Floating-Point

Unpacks the high-order single-precision floating-point values in the first and second source operands and packs them into interleaved doublewords in the destination (first source). The low-order quadwords of the source operands are ignored. The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.
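
A small Python model may make the interleaving concrete. The helper `unpckhps` is hypothetical, and operands are written as 4-tuples ordered from element 0 (bits 31-0) to element 3 (bits 127-96). The exact placement of each element is stated here as the commonly documented behavior of this instruction; the paragraph above only says the doublewords are interleaved.

```python
def unpckhps(dst, src):
    """Model of unpack-high on 4-tuples of single-precision values.

    Elements 2 and 3 of each operand are interleaved, with the element
    from the first source in the lower position of each pair.
    """
    return (dst[2], src[2], dst[3], src[3])
```

For example, `unpckhps((0.0, 1.0, 2.0, 3.0), (4.0, 5.0, 6.0, 7.0))` returns `(2.0, 6.0, 3.0, 7.0)`.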



#### **Related Instructions**

UNPCKHPD, UNPCKLPD, UNPCKLPS

#### **rFLAGS Affected**

None

#### **MXCSR Flags Affected**

None

#### **Exceptions**

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception |
|---------------------------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1. |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical. |
|                           |      |                 | X         | A null data segment was used to reference memory. |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction. |

## UNPCKLPD Unpack Low Double-Precision Floating-Point

Unpacks the low-order double-precision floating-point values in the first and second source operands and packs them into the destination (first source). The value from the first source operand is packed into the low-order quadword of the destination, and the value from the second source operand is packed into the high-order quadword of the destination. The high-order quadwords of the source operands are ignored. The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.
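
As a sketch of the operation described above, the following Python model represents each operand as a `(low, high)` pair of doubles. The helper name `unpcklpd` is hypothetical.

```python
def unpcklpd(dst, src):
    """Model of the unpack-low operation on (low, high) pairs of doubles.

    The low quadword of the first source stays in the low quadword of the
    destination; the low quadword of the second source becomes the high one.
    """
    return (dst[0], src[0])
```

Using the same value for both operands copies the low element into both halves: `unpcklpd((1.5, 9.0), (1.5, 9.0))` returns `(1.5, 1.5)`.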



#### **Related Instructions**

UNPCKHPD, UNPCKHPS, UNPCKLPS

#### **rFLAGS Affected**

None

#### **MXCSR Flags Affected**

None

#### **Exceptions**

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception |
|---------------------------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1. |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical. |
|                           |      |                 | X         | A null data segment was used to reference memory. |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction. |

## UNPCKLPS Unpack Low Single-Precision Floating-Point

Unpacks the low-order single-precision floating-point values in the first and second source operands and packs them into interleaved doublewords in the destination (first source). The high-order quadwords of the source operands are ignored. The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.
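
One way to picture the interleaving is the Python model below. The helper `unpcklps` is hypothetical, operands are 4-tuples ordered from element 0 (bits 31-0) to element 3 (bits 127-96), and the element placement is given as the commonly documented behavior of the instruction rather than something stated in the paragraph above.

```python
def unpcklps(dst, src):
    """Model of unpack-low on 4-tuples of single-precision values.

    Elements 0 and 1 of each operand are interleaved, with the element
    from the first source in the lower position of each pair.
    """
    return (dst[0], src[0], dst[1], src[1])
```

A typical use is pairing up two separate arrays. With real parts `(1.0, 2.0, 0.0, 0.0)` as the first operand and imaginary parts `(10.0, 20.0, 0.0, 0.0)` as the second, the result is `(1.0, 10.0, 2.0, 20.0)`.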





#### **Related Instructions**

UNPCKHPD, UNPCKHPS, UNPCKLPD

#### **rFLAGS Affected**

None

#### **MXCSR Flags Affected**

None

#### **Exceptions**

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception |
|---------------------------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1. |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical. |
|                           |      |                 | X         | A null data segment was used to reference memory. |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction. |

## XORPD Logical Bitwise Exclusive OR Packed Double-Precision Floating-Point

Performs a bitwise logical Exclusive OR of the two packed double-precision floating-point values in the first source operand and the corresponding two packed double-precision floating-point values in the second source operand and writes the result in the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.
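
Because the operation acts on bit patterns rather than numeric values, it helps to model it at the bit level. The Python sketch below uses the standard `struct` module to reinterpret doubles as 64-bit integers. The helper `xorpd` is a hypothetical name, and operands are `(low, high)` pairs of doubles.

```python
import struct

def xorpd(dst, src):
    """Bitwise XOR of two (low, high) pairs of doubles, bit pattern by bit pattern."""
    # Reinterpret each pair of doubles as two unsigned 64-bit integers.
    a = struct.unpack("<2Q", struct.pack("<2d", *dst))
    b = struct.unpack("<2Q", struct.pack("<2d", *src))
    # XOR the integers, then reinterpret the result as doubles again.
    return struct.unpack("<2d", struct.pack("<2Q", a[0] ^ b[0], a[1] ^ b[1]))
```

A common application is flipping signs. The bit pattern of `-0.0` has only the sign bit set, so `xorpd((1.5, -2.0), (-0.0, -0.0))` returns `(-1.5, 2.0)`.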

| Mnemonic                | Opcode             | Description                                                                                                                                                                                                                 |
|-------------------------|--------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| XORPD xmm1, xmm2/mem128 | 66 0F 57 <i>/r</i> | Performs bitwise logical XOR of two packed double-precision floating-point values in an XMM register and in another XMM register or 128-bit memory location and writes the result in the destination XMM register. |



#### **Related Instructions**

ANDNPD, ANDNPS, ANDPD, ANDPS, ORPD, ORPS, XORPS

#### **rFLAGS Affected**

None

#### **MXCSR Flags Affected**

None

#### **Exceptions**

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception |
|---------------------------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE2 instructions are not supported, as indicated by bit 26 of CPUID standard function 1. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1. |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical. |
|                           |      |                 | X         | A null data segment was used to reference memory. |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction. |

## XORPS Logical Bitwise Exclusive OR Packed Single-Precision Floating-Point

Performs a bitwise Exclusive OR of the four packed single-precision floating-point values in the first source operand and the corresponding four packed single-precision floating-point values in the second source operand and writes the result in the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.
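
The same bit-level view applies to the single-precision form. In this Python sketch the hypothetical helper `xorps` reinterprets four single-precision values as 32-bit integers with the standard `struct` module. Inputs are assumed to be exactly representable in single precision, since packing with the `f` format rounds other values.

```python
import struct

def xorps(dst, src):
    """Bitwise XOR of two 4-tuples of single-precision values."""
    # Reinterpret each operand as four unsigned 32-bit integers.
    a = struct.unpack("<4I", struct.pack("<4f", *dst))
    b = struct.unpack("<4I", struct.pack("<4f", *src))
    # XOR element by element and convert back to floats.
    mixed = [x ^ y for x, y in zip(a, b)]
    return struct.unpack("<4f", struct.pack("<4I", *mixed))
```

XORing an operand with itself clears every bit, so `xorps(v, v)` returns `(0.0, 0.0, 0.0, 0.0)` for any representable `v`. This is why the instruction is often used to zero a register.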

| Mnemonic                | Opcode          | Description                                                                                                                                                                                                                    |
|-------------------------|-----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| XORPS xmm1, xmm2/mem128 | 0F 57 <i>/r</i> | Performs bitwise logical XOR of four packed single-precision floating-point values in an XMM register and in another XMM register or 128-bit memory location and writes the result in the destination XMM register. |



#### **Related Instructions**

ANDNPD, ANDNPS, ANDPD, ANDPS, ORPD, ORPS, XORPD

#### **rFLAGS Affected**

None

#### **MXCSR Flags Affected**

None

#### **Exceptions**

| Exception                 | Real | Virtual<br>8086 | Protected | Cause of Exception |
|---------------------------|------|-----------------|-----------|--------------------|
| Invalid opcode, #UD       | X    | X               | X         | The SSE instructions are not supported, as indicated by bit 25 of CPUID standard function 1. |
|                           | X    | X               | X         | The emulate bit (EM) of CR0 was set to 1. |
|                           | X    | X               | X         | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0. |
| Device not available, #NM | X    | X               | X         | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS                | X    | X               | X         | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP   | X    | X               | X         | A memory address exceeded a data segment limit or was non-canonical. |
|                           |      |                 | X         | A null data segment was used to reference memory. |
|                           | X    | X               | X         | The memory operand was not aligned on a 16-byte boundary. |
| Page fault, #PF           |      | X               | X         | A page fault resulted from the execution of the instruction. |

# Index

| Numerics                      |
|-------------------------------|
| 16-bit mode xi                |
| 32-bit mode xii               |
| 64-bit mode xii               |
| Α                             |
| ADDPD                         |
|                               |
|                               |
| addressing, RIP-relative xvii |
| ADDSD 1                       |
| ADDSS 1                       |
| ANDNPD 1                      |
| ANDNPS 1                      |
| ANDPD 1                       |
| ANDPS 2                       |
| В                             |
| biased exponent xii           |
| C                             |
| CMPPD                         |
| CMPPS                         |
| CMPSD                         |
| CMPSS                         |
| COMISD                        |
| COMISS 3                      |
| commit xii                    |
| compatibility mode xii        |
| CVTDQ2PD 4                    |
| CVTDQ2PS 4                    |
| CVTPD2DQ 4                    |
| CVTPD2PI 4                    |
| CVTPD2PS 5                    |
| CVTPI2PD                      |
| CVTPI2PS                      |
| CVTPS2DQ                      |
| CVTPS2PD                      |
| CVTPS2PI                      |
| CVTSD2SI                      |

 CVTSD2SS
 70

 CVTSI2SD
 73

 CVTSI2SS
 76

 CVTSS2SD
 79

 CVTSS2SI
 81

 CVTTPD2DQ
 84

 CVTTPS2DQ
 90

 CVTTPS2PI
 93

 CVTTSD2SI
 96

| CVTTSS2SI               | 99    |
|-------------------------|-------|
| D                       |       |
| direct referencing      | xiv   |
| displacements           | xiv   |
| DIVPD                   | 102   |
| DIVPS                   | 105   |
| DIVSD                   | 108   |
| DIVSS                   | 110   |
| double quadword         | xiv   |
| doubleword              | xiv   |
| E                       |       |
| eAX-eSP register        | xx    |
| effective address size  | xiv   |
| effective operand size  | xv    |
| eFLAGS register         |       |
| eIP register            | xxi   |
| element                 | xv    |
| endian order            | xxiii |
| exception               | xv    |
| exponent                | xiii  |
| F                       |       |
| flush                   | xv    |
| FXRSTOR                 | 113   |
| FXSAVE                  | 115   |
|                         |       |
| IGN                     |       |
| indirect                |       |
| instructions            | XV    |
| 128-bit media           | 1     |
| SSE                     |       |
| SSE-2                   |       |
|                         | 1     |
|                         |       |
| LDMXCSR                 |       |
| legacy mode             |       |
| legacy x86<br>long mode |       |
|                         | xvi   |
| LSB<br>lsb              | xvi   |
|                         | xvi   |
| M                       |       |
| mask                    | xvi   |
| MASKMOVDQU              | 119   |
| MAXPD                   | 121   |
| MAXPS                   | 123   |
| MAXSD                   | 126   |
| MAXSS                   | 128   |

| MBZ           | xvii  | overflo        |
|---------------|-------|----------------|
| MINPD         | 130   | P              |
| MINPS         | 132   | packe          |
| MINSD         | 135   | PACK           |
| MINSS         | 137   |                |
| modes         |       | PACKS          |
| 16-bit        | . xii | PACK           |
| 32-bit        | xiii  | PADD           |
| 64-bit        | xiii  | PADD           |
| compatibility |       | PADD           |
| legacy        |       | PADD           |
| long          |       | PADD           |
| protected     |       | PADD           |
| real          |       | PADD           |
| virtual-8086  |       | PADD           |
| moffset       |       | PAND           |
| MOVAPD        |       | PAND           |
| MOVAPS        |       | PAVGI          |
|               |       | PAVG           |
| MOVD          |       | PCMP           |
| MOVDQ2Q       |       | PCMP           |
| MOVDQA        |       | PCMP           |
| MOVDQU        |       | PCMP           |
| MOVHLPS       |       | PCMP           |
| MOVHPD        |       | PCMP           |
| MOVHPS        |       | PEXT           |
| MOVLHPS       |       | PINSR          |
| MOVLPD        |       | PMAD           |
| MOVLPS        |       | PMAX           |
| MOVMSKPD      |       | PMAX           |
| MOVMSKPS      |       | PMINS          |
| MOVNTDQ       |       | PMIN           |
| MOVNTPD       | 171   | PMOV           |
| MOVNTPS       | 173   | PMUL           |
| MOVQ          | 175   | PMUL           |
| MOVQ2DQ       | 177   | PMUL           |
| MOVSD         |       | PMUL           |
| MOVSS         |       | -              |
| MOVUPD        |       | POR            |
| MOVUPS        | 187   | protec<br>PSAD |
| MSB           |       |                |
| msb           |       | PSHU           |
| MSR           |       | PSHU           |
| MULPD         |       | PSHU           |
| MULPS         |       | PSLLI          |
| MULSD         |       | PSLLI          |
| MULSS         |       | PSLL           |
|               | 100   | PSLLV          |
| 0             |       | PSRA           |
| octword       |       | PSRAV          |
| offset        |       | PSRLI          |
| ORPD          | 201   | PSRLI          |
| ORPS          | 203   | PSRL           |

| overflow       | xvii  |
|----------------|-------|
| Р              |       |
| packed         | xvii  |
| PACKSSDW       | 205   |
| PACKSSWB       | 207   |
| PACKUSWB       | 209   |
| PADDB          | 211   |
| PADDD          | 213   |
| PADDQ          | 215   |
| PADDSB         | 217   |
| PADDSW         | 219   |
| PADDUSB        | 221   |
| PADDUSW        | 223   |
| PADDW          | 225   |
| PAND.          | 223   |
| PANDN          | 229   |
| PAVGB          | 231   |
| PAVGB          | 231   |
|                | 235   |
| PCMPEQB        | 235   |
| PCMPEQD        | 237   |
| PCMPEQW        |       |
| PCMPGTB        | 241   |
| PCMPGTD        | 243   |
| PCMPGTW        | 245   |
| PEXTRW         | 247   |
| PINSRW         | 249   |
| PMADDWD        | 252   |
| PMAXSW         | 254   |
| PMAXUB         | 256   |
| PMINSW         | 258   |
| PMINUB         | 260   |
| PMOVMSKB       | 262   |
| PMULHUW        | 264   |
| PMULHW         | 266   |
| PMULLW         | 268   |
| PMULUDQ        | 270   |
| POR            | 272   |
| protected mode | xviii |
| PSADBW         | 274   |
| PSHUFD         | 276   |
| PSHUFHW        | 279   |
| PSHUFLW        | 282   |
| PSLLD          | 285   |
| PSLLDQ         | 287   |
| PSLLQ          | 289   |
| PSLLW          | 291   |
| PSRAD          | 293   |
| PSRAW          | 296   |
| PSRLD          | 299   |
| PSRLDQ         | 301   |
| PSRLQ          | 303   |

| PSRLW                            | 305   |
|----------------------------------|-------|
| PSUBB                            | 308   |
| PSUBD                            | 310   |
| PSUBQ                            | 312   |
| PSUBSB                           | 314   |
| PSUBSW                           | 316   |
| PSUBUSB                          | 318   |
| PSUBUSW                          | 320   |
| PSUBW                            | 322   |
| PUNPCKHBW                        | 324   |
| PUNPCKHDQ                        | 326   |
| PUNPCKHQDQ                       | 328   |
| PUNPCKHWD                        | 330   |
| PUNPCKLBW                        | 332   |
| PUNPCKLDQ                        | 334   |
| PUNPCKLQDQ                       | 336   |
| PUNPCKLWD                        | 338   |
| PXOR                             | 340   |
| Q                                |       |
| quadword                         | xviii |
| R                                |       |
|                                  |       |
| r8-r15                           |       |
| rAX-rSP                          |       |
| RAZ                              |       |
| RCPPS                            |       |
| RCPSS                            | 344   |
| real address mode. See real mode |       |
| real mode                        | xviii |
| registers                        |       |
| eAX-eSP                          |       |
| eFLAGS                           |       |
| eIP                              |       |
| r8–r15                           |       |
| rAX–rSP                          | xxi   |
| rFLAGS                           | xxii  |
| rIP                              | xxii  |
| relative                         | xviii |
| rFLAGS register                  | xxii  |
| rIP register                     |       |
| RIP-relative addressing          | xviii |
| RSQRTPS                          | 346   |
| RSQRTSS                          | 348   |
| S                                |       |
| -                                |       |
|                                  | xviii |
| SHUFPD                           | 350   |
| SHUFPS                           | 353   |
| SQRTPD                           | 356   |
| SQRTPS                           | 359   |

| SSE               | xix |
|-------------------|-----|
| SSE-2             | xix |
| sticky bit        | xix |
| STMXCSR           | 367 |
| SUBPD             | 369 |
| SUBPS             | 372 |
| SUBSD             | 375 |
| SUBSS             | 378 |
| T                 |     |
| TSS               | xix |
| U                 |     |
| UCOMISD           | 381 |
| UCOMISS           | 384 |
| underflow         | xix |
| UNPCKHPD          | 387 |
| UNPCKHPS          | 389 |
| UNPCKLPD          | 391 |
| UNPCKLPS          | 393 |
| V                 |     |
| vector            | xix |
| virtual-8086 mode | xx  |
| X                 |     |
| XORPD             | 395 |
| XORPS             | 397 |
