Clive's K7 Page

What's "NEW" in the K7


Last updated 1:50am 23-Jul-99

Update 12:12am 7-Jul-99 - I believe the following to be complete and accurate.
P6 Integer & Floating Point Conditional Moves

0F,40,/r        cmovo    reg,reg/mem
0F,41,/r        cmovno   reg,reg/mem
0F,42,/r        cmovb    reg,reg/mem
0F,43,/r        cmovae   reg,reg/mem
0F,44,/r        cmove    reg,reg/mem
0F,45,/r        cmovne   reg,reg/mem
0F,46,/r        cmovbe   reg,reg/mem
0F,47,/r        cmova    reg,reg/mem
0F,48,/r        cmovs    reg,reg/mem
0F,49,/r        cmovns   reg,reg/mem
0F,4A,/r        cmovp    reg,reg/mem
0F,4B,/r        cmovnp   reg,reg/mem
0F,4C,/r        cmovl    reg,reg/mem
0F,4D,/r        cmovge   reg,reg/mem
0F,4E,/r        cmovle   reg,reg/mem
0F,4F,/r        cmovg    reg,reg/mem
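The opcode column shows a pattern. The low nibble of the second byte selects the condition, using the same numbering as the Jcc (0F,80+cc) and SETcc (0F,90+cc) families, which is how 0F,4F decodes as cmovg. Below is a minimal C sketch of decoding that nibble and testing it against the flags. It assumes only the standard Intel/AMD condition definitions. The function names are hypothetical and do not come from any SDK.

```c
#include <stdint.h>
#include <stddef.h>

/* Condition suffixes indexed by the low nibble of the second opcode
   byte (0F,40+cc). Aliases such as cmovc/cmovnae share an encoding
   with the names chosen here. */
static const char *const cc_names[16] = {
    "o", "no", "b", "ae", "e", "ne", "be", "a",
    "s", "ns", "p", "np", "l", "ge", "le", "g"
};

/* Returns the condition suffix for a CMOVcc opcode pair, or NULL if
   the two bytes are not 0F,40..4F. */
const char *cmov_condition(uint8_t op0, uint8_t op1)
{
    if (op0 != 0x0F || (op1 & 0xF0) != 0x40)
        return NULL;
    return cc_names[op1 & 0x0F];
}

/* Evaluates condition cc against flag values (each 0 or 1). Every odd
   cc is the negation of the even cc just below it. */
int condition_holds(unsigned cc, int cf, int zf, int sf, int of, int pf)
{
    int r;
    switch ((cc >> 1) & 7) {
    case 0:  r = of; break;                 /* o  / no */
    case 1:  r = cf; break;                 /* b  / ae */
    case 2:  r = zf; break;                 /* e  / ne */
    case 3:  r = cf || zf; break;           /* be / a  */
    case 4:  r = sf; break;                 /* s  / ns */
    case 5:  r = pf; break;                 /* p  / np */
    case 6:  r = sf != of; break;           /* l  / ge */
    default: r = zf || (sf != of); break;   /* le / g  */
    }
    return (cc & 1) ? !r : r;
}
```

The signed conditions (l, ge, le, g) compare SF with OF. The unsigned conditions (b, ae, be, a) use CF, which is why both families exist.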

DA,C0+i         fcmovb   st(0),st(i)
DA,C8+i         fcmove   st(0),st(i)
DA,D0+i         fcmovbe  st(0),st(i)
DA,D8+i         fcmovu   st(0),st(i)
DB,C0+i         fcmovnb  st(0),st(i)
DB,C8+i         fcmovne  st(0),st(i)
DB,D0+i         fcmovnbe st(0),st(i)
DB,D8+i         fcmovnu  st(0),st(i)
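In the "C0+i" notation, the second byte carries both the flag test and the source register. Below is a small C sketch of that split. It assumes the flags in EFLAGS were set beforehand, for example by fcomi or an integer compare. The names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* FCMOVcc uses DA,C0..DF and DB,C0..DF. Bits 4:3 of the second byte
   pick the flag test, bits 2:0 pick st(i), and DB negates the DA test. */
static const char *const fcmov_names[2][4] = {
    { "fcmovb",  "fcmove",  "fcmovbe",  "fcmovu"  },
    { "fcmovnb", "fcmovne", "fcmovnbe", "fcmovnu" }
};

/* Decodes an FCMOVcc pair and stores the source index in *st_i.
   Returns NULL for bytes outside DA/DB,C0..DF. */
const char *fcmov_decode(uint8_t op0, uint8_t op1, int *st_i)
{
    if ((op0 != 0xDA && op0 != 0xDB) || op1 < 0xC0 || op1 > 0xDF)
        return NULL;
    *st_i = op1 & 7;
    return fcmov_names[op0 & 1][(op1 >> 3) & 3];
}

/* Nonzero if the instruction would copy st(i) into st(0);
   cf, zf and pf are 0 or 1. */
int fcmov_taken(uint8_t op0, uint8_t op1, int cf, int zf, int pf)
{
    int r;
    switch ((op1 >> 3) & 3) {
    case 0:  r = cf; break;        /* b  */
    case 1:  r = zf; break;        /* e  */
    case 2:  r = cf || zf; break;  /* be */
    default: r = pf; break;        /* u: unordered */
    }
    return (op0 == 0xDB) ? !r : r;
}
```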

0F,34           sysenter
0F,35           sysexit

P6/KNI/SSE and "Hinting NOPs" * See Below

0F,18,/0        prefetchnta mem8
0F,18,/1        prefetcht0  mem8
0F,18,/2        prefetcht1  mem8
0F,18,/3        prefetcht2  mem8
0F,18,/4        nop
0F,18,/5        nop
0F,18,/6        nop
0F,18,/7        nop
0F,19,/r        nop
0F,1A,/r        nop
0F,1B,/r        nop
0F,1C,/r        nop
0F,1D,/r        nop
0F,1E,/r        nop
0F,1F,/r        nop

KNI/SSE Integer MMX extensions

0F,70,/r,ib     pshufw   mmreg1,mmreg2/mem64,imm8
0F,AE,FF        sfence
0F,C4,/r,ib     pinsrw   mmreg,reg32/mem16,imm8
0F,C5,/r,ib     pextrw   reg32,mmreg,imm8
0F,D7,/r        pmovmskb reg32,mmreg
0F,DA,/r        pminub   mmreg1,mmreg2/mem64
0F,DE,/r        pmaxub   mmreg1,mmreg2/mem64
0F,E0,/r        pavgb    mmreg1,mmreg2/mem64
0F,E3,/r        pavgw    mmreg1,mmreg2/mem64
0F,E4,/r        pmulhuw  mmreg1,mmreg2/mem64
0F,E7,/r        movntq   mem64,mmreg
0F,EA,/r        pminsw   mmreg1,mmreg2/mem64
0F,EE,/r        pmaxsw   mmreg1,mmreg2/mem64
0F,F6,/r        psadbw   mmreg1,mmreg2/mem64
0F,F7,/r        maskmovq mmreg1,mmreg2
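The table gives encodings only. To illustrate what two of the less obvious entries compute, here is a scalar C reference model written from the published definitions of pavgb and psadbw. It models the arithmetic only and says nothing about how the K7 implements it. The function names are hypothetical.

```c
#include <stdint.h>

/* pavgb: each unsigned byte becomes (a + b + 1) >> 1, i.e. the
   average rounded up, computed in wider arithmetic so it cannot wrap. */
uint64_t pavgb_model(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int k = 0; k < 8; k++) {
        unsigned x = (unsigned)(a >> (8 * k)) & 0xFF;
        unsigned y = (unsigned)(b >> (8 * k)) & 0xFF;
        r |= (uint64_t)((x + y + 1) >> 1) << (8 * k);
    }
    return r;
}

/* psadbw: sum of absolute differences of the eight unsigned bytes,
   stored in the low 16 bits; the upper 48 bits are zero. */
uint64_t psadbw_model(uint64_t a, uint64_t b)
{
    unsigned sum = 0;
    for (int k = 0; k < 8; k++) {
        int x = (int)((a >> (8 * k)) & 0xFF);
        int y = (int)((b >> (8 * k)) & 0xFF);
        sum += (unsigned)(x > y ? x - y : y - x);
    }
    return (uint64_t)sum;  /* at most 8 * 255 = 2040 */
}
```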

Post K6-3 3DNow! extensions, later steppings?

0F,0F,/r,0C     pi2fw    mmreg1,mmreg2/mem64
0F,0F,/r,1C     pf2iw    mmreg1,mmreg2/mem64
0F,0F,/r,8A     pfnacc   mmreg1,mmreg2/mem64
0F,0F,/r,8E     pfpnacc  mmreg1,mmreg2/mem64
0F,0F,/r,BB     pswapd   mmreg1,mmreg2/mem64
* I refer to these as complex NOPs. P6 processors fully decode these instructions (reg,reg/mem) and then ignore them if they are not supported in the current implementation. For example, prefetcht0 [esi] executes as a NOP on a Pentium Pro/II. Intel has a patent on this concept. Interestingly, the 6502 had something very similar, as I recall.
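In the /digit notation, the reg field (bits 5:3) of the ModRM byte selects the hint for 0F,18, so prefetcht0 [esi] assembles to 0F 18 0E (mod=00, reg=001, rm=110). Below is a minimal decoder sketch. It assumes 32-bit addressing and treats every non-prefetch form in the 0F,18..1F block as a NOP, exactly as the table lists them. It ignores the mod field even though the prefetch hints are defined only with a memory operand. The function name is hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

static const char *const hint_names[5] = {
    "prefetchnta", "prefetcht0", "prefetcht1", "prefetcht2", "nop"
};

/* op1 is the byte after 0F; modrm is the byte after that. Returns
   NULL if op1 is outside the 0F,18..1F hinting block. */
const char *hint_nop_decode(uint8_t op1, uint8_t modrm)
{
    unsigned reg = (modrm >> 3) & 7;   /* the /digit field */

    if (op1 < 0x18 || op1 > 0x1F)
        return NULL;
    if (op1 == 0x18 && reg < 4)
        return hint_names[reg];        /* /0../3: prefetch hints */
    return hint_names[4];              /* decoded in full, then ignored */
}
```

Even a "nop" here still needs its ModRM (plus any SIB and displacement) decoded to find the instruction length, which is what separates these from the one-byte 90 NOP.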
Update 1:50am 23-Jul-99 - The information was buried in the AMD3DSDK. Its K6 disassembler can be set to decode K6, K6-2, K6-3, K6-ST50 (Step 5.0? JC suggests this is the 180nm version of the K6-3, which will be produced on AMD's CS50 process) and K7, to be exact. I couldn't find any stepping or errata data for the K6-3 and beyond on AMD's website, which made it a tad difficult to cross-reference some of the information. Notwithstanding, I feel quite confident in my assertions.

The K7 doesn't appear to have any of the SSE floating point instructions, or the FXSAVE & FXRSTOR instructions from Intel's Deschutes PII. Christian seems to think having half the SSE instruction set is a problem, but I disagree. Detecting 3DNow! already differs from the way you check for SSE. Frankly, having most of the integer instructions in common reduces the amount of code people have to write and increases the tools available for writing it. Personally, I feel that integer and floating point uses are likely to be separate, or at least fairly distinct. AMD really doesn't need to do floating point SSE: 3DNow! seems to be at least as powerful as SSE, although somewhat more limited in register resources. Intel internally breaks its 128-bit registers into pairs of 64-bit halves, which is no more efficient than AMD's approach.
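The detection point can be made concrete. SSE is reported in standard CPUID leaf 1. 3DNow! and AMD's MMX extensions (the integer subset listed above) are reported in extended leaf 0x80000001. Code that wants only the shared integer instructions therefore has to accept either flag. Below is a minimal sketch that decodes EDX values already read from CPUID. The bit positions follow Intel's and AMD's CPUID documentation. The struct, macro and function names are hypothetical.

```c
#include <stdint.h>

/* Leaf 1 EDX (standard). Leaf 0x80000001 EDX (AMD extended) should be
   read only after CPUID 0x80000000 reports EAX >= 0x80000001. */
#define STD1_EDX_CMOV      (1u << 15)  /* cmovcc; fcmovcc too if FPU bit 0 */
#define STD1_EDX_SSE       (1u << 25)  /* Intel SSE: integer + float */
#define EXT1_EDX_MMXEXT    (1u << 22)  /* AMD: integer MMX extensions */
#define EXT1_EDX_3DNOWEXT  (1u << 30)  /* AMD: extended 3DNow! */
#define EXT1_EDX_3DNOW     (1u << 31)

struct simd_caps {
    int cmov, sse_fp, sse_int, amd3dnow, amd3dnow_ext;
};

/* Reading the registers (inline asm or a compiler intrinsic) is left
   out so that the decoding logic stays portable. */
struct simd_caps decode_caps(uint32_t std1_edx, uint32_t ext1_edx)
{
    struct simd_caps c;
    c.cmov         = (std1_edx & STD1_EDX_CMOV) != 0;
    c.sse_fp       = (std1_edx & STD1_EDX_SSE) != 0;
    /* The shared integer instructions are present under either flag. */
    c.sse_int      = c.sse_fp || (ext1_edx & EXT1_EDX_MMXEXT) != 0;
    c.amd3dnow     = (ext1_edx & EXT1_EDX_3DNOW) != 0;
    c.amd3dnow_ext = (ext1_edx & EXT1_EDX_3DNOWEXT) != 0;
    return c;
}
```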

Why the conditional moves are important... They are critical to removing branches from code. Registers can now be loaded based on the condition flags instead of branching in and out of the program flow, which eliminates branch misprediction. This allows a clean flow through the execution pipeline without "bubbles" and "stalls". The generated code is also more compact, keeping more of it in the L1 cache. Compilers can optimize for the P6 architecture, so there is no need to optimize for two different instruction sets or to pick the lowest common denominator. Now AMD can benefit from Intel's VTune compilers, as well as Microsoft's and Linux's offerings.
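A small C illustration of the branch-removal point follows. The ternary forms are what a compiler may turn into cmp + cmovcc when targeting P6 (for example GCC with -march=i686). This is typical, not guaranteed. The mask version shows the kind of branch-free sequence needed on CPUs without cmov. The function names are hypothetical.

```c
#include <stdint.h>

/* On a P5-class target this is a compare plus a conditional jump.
   A P6-targeted compiler will usually emit cmp + cmovl instead. */
int32_t max_ternary(int32_t a, int32_t b)
{
    return (a < b) ? b : a;
}

/* The same selection with no control flow: build an all-ones or
   all-zeros mask from the comparison and merge the two inputs. */
int32_t max_mask(int32_t a, int32_t b)
{
    uint32_t mask = 0u - (uint32_t)(a < b);   /* all ones if a < b */
    return (int32_t)(((uint32_t)b & mask) | ((uint32_t)a & ~mask));
}

/* Clamps each sample to [lo, hi]. If the compiler uses cmov here, the
   only branch left is the loop counter, which predicts well. */
void clamp_all(int32_t *v, int n, int32_t lo, int32_t hi)
{
    for (int k = 0; k < n; k++) {
        int32_t x = v[k];
        x = (x < lo) ? lo : x;
        x = (x > hi) ? hi : x;
        v[k] = x;
    }
}
```

With random input data, the clamp comparisons are exactly the unpredictable branches that cause mispredictions, and cmov removes them.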

Basically, the additions bring the functionality of the PPro/PII to the K7 (moving AMD from a P5 to a P6 architecture), along with about half the new instructions from the Katmai PIII, some of which duplicate existing 3DNow! functions. AMD has also added five 3DNow! instructions, three of which appear in some steppings of the K6 and in the 3DNow! emulator included in the AMD3DSDK.


Written by Clive Turvey clive@tbcnet.com

Trademarks are the property of their respective owners. No warranty expressed or implied. No deposit. No return. Copyright (C) C Turvey 1999.