
P6 Integer & Floating Point Conditional Moves

  0F,40,/r      cmovo    reg,reg/mem
  0F,41,/r      cmovno   reg,reg/mem
  0F,42,/r      cmovb    reg,reg/mem
  0F,43,/r      cmovae   reg,reg/mem
  0F,44,/r      cmove    reg,reg/mem
  0F,45,/r      cmovne   reg,reg/mem
  0F,46,/r      cmovbe   reg,reg/mem
  0F,47,/r      cmova    reg,reg/mem
  0F,48,/r      cmovs    reg,reg/mem
  0F,49,/r      cmovns   reg,reg/mem
  0F,4A,/r      cmovp    reg,reg/mem
  0F,4B,/r      cmovnp   reg,reg/mem
  0F,4C,/r      cmovl    reg,reg/mem
  0F,4D,/r      cmovge   reg,reg/mem
  0F,4E,/r      cmovle   reg,reg/mem
  0F,4F,/r      cmovg    reg,reg/mem

  DA,C0+i       fcmovb   st(0),st(i)
  DA,C8+i       fcmove   st(0),st(i)
  DA,D0+i       fcmovbe  st(0),st(i)
  DA,D8+i       fcmovu   st(0),st(i)
  DB,C0+i       fcmovnb  st(0),st(i)
  DB,C8+i       fcmovne  st(0),st(i)
  DB,D0+i       fcmovnbe st(0),st(i)
  DB,D8+i       fcmovnu  st(0),st(i)

  0F,34         sysenter
  0F,35         sysexit

P6/KNI/SSE and "Hinting NOPs" * See Below

  0F,18,/0      prefetchnta mem8
  0F,18,/1      prefetcht0  mem8
  0F,18,/2      prefetcht1  mem8
  0F,18,/3      prefetcht2  mem8
  0F,18,/4      nop
  0F,18,/5      nop
  0F,18,/6      nop
  0F,18,/7      nop
  0F,19,/r      nop
  0F,1A,/r      nop
  0F,1B,/r      nop
  0F,1C,/r      nop
  0F,1D,/r      nop
  0F,1E,/r      nop
  0F,1F,/r      nop

KNI/SSE Integer MMX extensions

  0F,70,/r,ib   pshufw   mmreg1,mmreg2/mem64,imm8
  0F,AE,FF      sfence
  0F,C4,/r,ib   pinsrw   mmreg,reg32/mem16,imm8
  0F,C5,/r,ib   pextrw   reg32,mmreg,imm8
  0F,D7,/r      pmovmskb reg32,mmreg
  0F,DA,/r      pminub   mmreg1,mmreg2/mem64
  0F,DE,/r      pmaxub   mmreg1,mmreg2/mem64
  0F,E0,/r      pavgb    mmreg1,mmreg2/mem64
  0F,E3,/r      pavgw    mmreg1,mmreg2/mem64
  0F,E4,/r      pmulhuw  mmreg1,mmreg2/mem64
  0F,E7,/r      movntq   mem64,mmreg
  0F,EA,/r      pminsw   mmreg1,mmreg2/mem64
  0F,EE,/r      pmaxsw   mmreg1,mmreg2/mem64
  0F,F6,/r      psadbw   mmreg1,mmreg2/mem64
  0F,F7,/r      maskmovq mmreg1,mmreg2

Post K6-3 3DNow! extensions, later steppings?

  0F,0F,/r,0C   pi2fw    mmreg1,mmreg2/mem64
  0F,0F,/r,1C   pf2iw    mmreg1,mmreg2/mem64
  0F,0F,/r,8A   pfnacc   mmreg1,mmreg2/mem64
  0F,0F,/r,8E   pfpnacc  mmreg1,mmreg2/mem64
  0F,0F,/r,BB   pswapd   mmreg1,mmreg2/mem64

* I refer to these as complex NOPs. P6 processors ignore these fully decoded instructions (reg,reg/mem) if they are not supported in the current implementation. For example, prefetcht0 [esi] executes as a NOP on a Pentium Pro/II. Intel has a patent on this concept. Interestingly, the 6502 had something very similar, as I recall.
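To make the /0 through /7 notation in the 0F,18 rows concrete, here is a minimal sketch in C of how the digit is recovered from the ModRM byte that follows the opcode. The function name classify_0f18 is hypothetical and only illustrates the table; it ignores the mod field and any prefixes, so it is not a real decoder.

```c
#include <stdint.h>

/* Hypothetical helper: classify the ModRM byte that follows the two
   opcode bytes 0F 18, following the table above.
   The "/digit" is the 3-bit reg field, bits 5..3 of the ModRM byte. */
static const char *classify_0f18(uint8_t modrm)
{
    static const char *const names[8] = {
        "prefetchnta", "prefetcht0", "prefetcht1", "prefetcht2",
        "nop", "nop", "nop", "nop"   /* /4../7: hinting NOPs */
    };
    return names[(modrm >> 3) & 7];
}
```

For instance, prefetcht0 [esi] assembles to 0F 18 0E: the ModRM byte 0E has mod=00, reg=001 and r/m=110 (esi), so classify_0f18(0x0E) selects the /1 entry.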
The K7 doesn't appear to have any of the SSE floating-point instructions, nor the FXSAVE and FXRSTOR instructions from Intel's Deschutes PII. Christian seems to think having half the SSE instruction set is a problem, but I disagree. The way you detect 3DNow! already differs from the way you check for SSE, and frankly, having most of the integer instructions in common will reduce the amount of code people have to write and increase the tools available for writing it. Personally, I feel that integer and floating-point uses are likely to be separate, or at least fairly distinct. AMD really doesn't need to do floating-point SSE; 3DNow! seems to be at least as powerful as SSE, although somewhat more limited in register resources. Intel breaks its 128-bit registers into a pair of 64-bit halves, which is no more efficient than AMD's approach.
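The difference in detection can be sketched as follows. SSE is reported in the standard CPUID feature flags (EAX = 1), while 3DNow! is reported in AMD's extended flags (EAX = 0x80000001). The struct and function names are hypothetical, and the function takes the two EDX values as plain arguments so that the sketch stays portable. Treating either the SSE bit or AMD's MMX-extensions bit as "integer SSE subset present" is my assumption about how the two vendors report the shared instructions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard feature flags: EDX returned by CPUID with EAX = 1. */
#define CPUID_STD_CMOV      (1u << 15)
#define CPUID_STD_FXSR      (1u << 24)
#define CPUID_STD_SSE       (1u << 25)

/* AMD extended flags: EDX returned by CPUID with EAX = 0x80000001. */
#define CPUID_EXT_MMXEXT    (1u << 22)
#define CPUID_EXT_3DNOWEXT  (1u << 30)
#define CPUID_EXT_3DNOW     (1u << 31)

/* Hypothetical summary of the features discussed in the text. */
struct cpu_features {
    bool cmov;       /* cmovcc (and fcmovcc when an FPU is present) */
    bool fxsr;       /* FXSAVE / FXRSTOR */
    bool sse;        /* full SSE, including the floating-point part */
    bool mmx_ext;    /* integer SSE subset (pshufw, pminub, ...) */
    bool now3d;      /* 3DNow! */
    bool now3d_ext;  /* extended 3DNow! (pswapd, pfnacc, ...) */
};

static struct cpu_features decode_features(uint32_t std_edx, uint32_t ext_edx)
{
    struct cpu_features f;
    f.cmov      = (std_edx & CPUID_STD_CMOV) != 0;
    f.fxsr      = (std_edx & CPUID_STD_FXSR) != 0;
    f.sse       = (std_edx & CPUID_STD_SSE) != 0;
    /* Assumption: full SSE implies the integer subset as well. */
    f.mmx_ext   = f.sse || (ext_edx & CPUID_EXT_MMXEXT) != 0;
    f.now3d     = (ext_edx & CPUID_EXT_3DNOW) != 0;
    f.now3d_ext = (ext_edx & CPUID_EXT_3DNOWEXT) != 0;
    return f;
}
```

The caller is expected to execute CPUID itself and pass in the two EDX values. With GCC or Clang, one way to do that is __get_cpuid from <cpuid.h>. The extended leaf should only be queried after confirming that the processor supports it.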
Why the conditional moves are important: they are critical to removing branches from code. Registers can now be loaded based on the condition flags instead of branching in and out of the program flow, which eliminates misprediction on those paths. This allows a clean flow through the execution pipeline without "bubbles" and "stalls". The generated code is also more space efficient, so more of it stays in the L1 cache. Compilers can optimize for the P6 architecture, with no need to optimize for two different instruction sets or pick the lowest common denominator. Now AMD can benefit from Intel's VTune compilers, as well as Microsoft's and Linux's offerings.
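As a rough illustration of the kind of code that benefits, here is a small C sketch. The function names are hypothetical. Whether a compiler actually emits cmovcc for the select forms depends on the compiler, the target flags and the optimization level, so the assembly in the comment is only one possible outcome, not guaranteed output.

```c
#include <stdint.h>

/* Branching form: typically compiles to a compare plus a conditional
   jump, which the processor has to predict. */
static int32_t max_branch(int32_t a, int32_t b)
{
    if (a > b)
        return a;
    return b;
}

/* Select form: a compiler targeting a P6-class CPU may turn this into
   a compare plus a conditional move, for example (a in ecx, b in eax):
       cmp   ecx, eax
       cmovg eax, ecx      ; eax = a if a > b (signed)
   This sequence is illustrative only. */
static int32_t max_select(int32_t a, int32_t b)
{
    return (a > b) ? a : b;
}

/* Two selects in a row: clamp x into [lo, hi] without needing a jump. */
static int32_t clamp_select(int32_t x, int32_t lo, int32_t hi)
{
    x = (x < lo) ? lo : x;
    x = (x > hi) ? hi : x;
    return x;
}
```

Both max functions return the same values; the point is only that the select form gives the compiler a straight-line pattern it can map onto the cmovcc opcodes listed in the table.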
Basically, the additions bring the functionality of the PPro/PII to the K7 (moving AMD from a P5 to a P6 architecture), along with about half of the new instructions from the Katmai PIII, some of which duplicate existing 3DNow! functions. AMD has also added five 3DNow! instructions, three of which appear to be present in some steppings of the K6 and in the 3DNow! emulator included in the AMD3DSDK.