Cover Story / December 1997

Beyond MMX

AMD, Cyrix, and Centaur are adding new extensions
for 3-D graphics to the x86 architecture — without Intel's blessing.
Will they split the Wintel PC standard?

Tom R. Halfhill

Java isn't the only platform showing hairline fractures. While Intel and Microsoft were chipping away at Java, three of Intel's competitors announced proprietary extensions to Intel's x86 microprocessor architecture. Together with Intel's own push for proprietary CPU slots, the announcements make the taken-for-granted Wintel standard seem a little less solid.

Bulking up x86

The startling news that Advanced Micro Devices (AMD), Cyrix, and Centaur Technology are independently extending the x86 architecture officially broke at the Microprocessor Forum in mid-October, though rumors have been flying for months. The companies also announced other enhancements to their CPUs, including larger caches, backside buses, and superscalar MMX units. (See the sidebars "AMD's K6 Road Map" and "Centaur's WinChip Road Map".)

Still, the new 3-D graphics extensions command the most attention because they dare to redefine Intel's 19-year-old x86 standard. Each of these independent x86 vendors plans to add from 12 to 30 instructions to the new versions of their CPUs in 1998. These instructions will dramatically improve the ability of the AMD K6, the Cyrix 6x86MX, and the Centaur IDT WinChip to crunch single-precision floating-point (FP) numbers, which are crucial to faster 3-D graphics.

Although all three companies' instructions are similar, the companies are not acting in concert. The exact numbers of instructions and the instruction formats themselves are different. They are also handling registers in different ways. In effect, the independent vendors are extending the x86 in similar, but divergent, directions.

Leapfrogging Intel

Moreover, they're beating Intel to the punch. Intel has been planning to add its own new instructions as part of an extension called MMX2. Analysts think MMX2 may appear in a Pentium II processor (code-named Katmai) in 1999. That will create a grand total of four different subsets of 3-D instructions.

To guarantee full x86 compatibility, Intel's competitors will almost certainly have to support MMX2 when it appears. So it's likely that future chips from AMD, Cyrix, and Centaur will end up with redundant instructions that do exactly the same thing, only with slightly different opcodes and mnemonics. In the meantime, they'll have the jump on Intel.

"Think of it as an extension of MMX for floating-point and 3-D graphics," says Doug Beard, Cyrix project manager.

"It's everything MMX2 will be, except it's a lot earlier," says Dana Krelle, marketing director for AMD's CPU group.

"Why should we restrict our customers' 3-D graphics performance and make them wait for Intel when we can do it now?" asks Glenn Henry, president of Centaur.

Pity the poor x86 programmer. One of the world's most arcane CPU architectures is about to get even stranger.

Microsoft's 3-D Glue

When a prized piece of china cracks, you fix it with superglue. In this case, the glue is Microsoft's Direct3D libraries for Windows 95 and Windows NT. The good news is that Microsoft is modifying Direct3D to support the extensions from AMD, Cyrix, and Centaur. That should prevent the Wintel PC standard from shattering — for now.

Direct3D is an OS-level API that sits between applications and the hardware. The DLLs and device drivers in Direct3D allow programmers to call high-level 3-D graphics routines without worrying about the low-level 3-D hardware in a system.

A program can query Direct3D to find out if the system has a processor with 3-D acceleration, and then find out which 3-D functions the processor can execute. Direct3D's hardware abstraction layer (HAL) translates the program's API calls into the parameters that the processor requires. If the system can't execute a particular function in hardware, Direct3D's hardware emulation layer (HEL) uses regular x86 instructions to execute the call in software. (See the figure "Direct3D Architecture".)
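The query-then-fall-back arrangement described above can be pictured with a small model. This is only a sketch of the idea, not Direct3D's actual interface: the class, the function names, and the capability strings are all hypothetical.

```python
# Hypothetical model of capability-based dispatch. None of these names
# are real Direct3D identifiers.

def run_in_hardware(function_name, args):
    """Stand-in for a call translated by the HAL for the accelerated hardware."""
    return ("HAL", function_name, args)


def run_in_software(function_name, args):
    """Stand-in for the HEL emulating the call with regular x86 instructions."""
    return ("HEL", function_name, args)


class Device:
    def __init__(self, hardware_caps):
        # Names of the 3-D functions this system can execute in hardware.
        self.hardware_caps = frozenset(hardware_caps)

    def supports(self, function_name):
        """Answer a program's capability query."""
        return function_name in self.hardware_caps

    def call(self, function_name, args):
        """Use the hardware path when available; otherwise emulate."""
        if self.supports(function_name):
            return run_in_hardware(function_name, args)
        return run_in_software(function_name, args)


device = Device({"texture_map"})
print(device.call("texture_map", (1, 2, 3))[0])  # hardware path
print(device.call("transform", (1, 2, 3))[0])    # emulated path
```

The point of the design is that the application makes the same call either way; only the layer underneath knows which path ran.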

Thanks to hooks built into Direct3D, it won't take much for Microsoft to support the new extensions from AMD, Cyrix, and Centaur. In fact, the x86 vendors are writing most of the code themselves.

Nevertheless, a repaired piece of china is never as good as an unbroken one. Some programmers who crave maximum performance (especially game programmers) might just bypass Direct3D and write their own graphics routines. They will have to use assembly language because compilers do not support the new instructions. Furthermore, to ensure full compatibility with any PC, these programs will have to query the system at run time to see if it has one of the new CPUs, then call a different subroutine for each chip.

There is also the problem of OSes that don't have Direct3D — namely, all the rest, including Windows 3.1, OS/2, Linux, and other x86 versions of Unix. Applications for those OSes will have to probe the system at run time and call different code if they want to take advantage of the extensions while remaining fully compatible. The same will be true for programs that use OpenGL or any other 3-D libraries instead of Direct3D — unless those libraries support the extensions, too. AMD, Cyrix, and Centaur say Direct3D is their first priority for the mainstream market.
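For programs that bypass Direct3D, the run-time probing described above amounts to choosing one routine per chip and keeping a plain x86 version as the fallback. The sketch below assumes a hypothetical probe that reports an extension name; the extension names come from the vendors' announcements, but the probe and the routine names are invented for illustration.

```python
# Illustrative only: the probe and the routines below are invented for
# this sketch. A real program would query the hardware and call
# hand-written assembly language.

def transform_plain_x86(vertices):
    """Baseline path that runs on any x86 CPU."""
    return ("plain x86", list(vertices))


def transform_amd_3d(vertices):
    return ("AMD-3D", list(vertices))


def transform_mmx_fp(vertices):
    return ("MMX-FP", list(vertices))


def transform_centaur(vertices):
    return ("Centaur", list(vertices))


ROUTINES = {
    "AMD-3D": transform_amd_3d,
    "MMX-FP": transform_mmx_fp,
    "Centaur": transform_centaur,
}


def detect_extension(cpu_info):
    """Placeholder for a real hardware probe; reads a caller-supplied dict."""
    return cpu_info.get("extension")


def select_transform(cpu_info):
    """Pick the chip-specific routine once, falling back to plain x86."""
    return ROUTINES.get(detect_extension(cpu_info), transform_plain_x86)


transform = select_transform({"extension": "AMD-3D"})
print(transform([(0.0, 0.0, 1.0)])[0])
```

Every additional extension means another entry in the table and another routine to write and test, which is exactly the burden a common API is meant to remove.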

AMD-3D

AMD's extensions — AMD-3D — should appear in a new K6 processor scheduled for production in the first quarter of 1998. There are 24 new instructions, mostly for single-precision FP. They are quite different from MMX instructions, which manipulate integer values. MMX is useful for general multimedia tasks but, contrary to popular belief, it does nothing to accelerate the most basic function of 3-D graphics: geometry transformations (see the sidebar "Geometry Lessons").

To restore honor to the x86, AMD is introducing a multiply-add (MADD) or multiply-accumulate (MAC) instruction similar to those in digital signal processors. It will multiply a pair of 32-bit FP values and add the result to another FP value in a single operation. Today's x86 chips would need two separate operations. Last year, Silicon Graphics added similar instructions to the Mips R5000, resulting in dramatically faster 3-D graphics on Indy workstations.
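To see why a multiply-add matters for geometry, consider transforming one vertex by a 4-by-4 matrix, which is a chain of multiplies and adds. The sketch below simply counts operations both ways. It is a model, not AMD's instruction: the `madd` helper is hypothetical, Python floats are double precision rather than 32-bit, and operation counts are not clock cycles.

```python
def transform_separate(matrix, vertex):
    """Matrix times vector using a separate multiply and add for each term."""
    ops = 0
    result = []
    for row in matrix:
        acc = 0.0
        for m, v in zip(row, vertex):
            product = m * v      # one multiply operation
            acc = acc + product  # one add operation
            ops += 2
        result.append(acc)
    return result, ops


def madd(a, b, c):
    """Model of a multiply-add: a * b + c, counted as a single operation."""
    return a * b + c


def transform_madd(matrix, vertex):
    """Same transform, with each multiply and add fused into one operation."""
    ops = 0
    result = []
    for row in matrix:
        acc = 0.0
        for m, v in zip(row, vertex):
            acc = madd(m, v, acc)
            ops += 1
        result.append(acc)
    return result, ops


# A translation by (2, 3, 4) applied to the point (1, 1, 1).
translate = [
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 1.0, 0.0, 3.0],
    [0.0, 0.0, 1.0, 4.0],
    [0.0, 0.0, 0.0, 1.0],
]
point = [1.0, 1.0, 1.0, 1.0]

print(transform_separate(translate, point))
print(transform_madd(translate, point))
```

Both versions produce the same vertex, but the fused version issues half as many operations (16 instead of 32, counting the initial add to zero in each row).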

The K6 will execute the AMD-3D instructions in a wholly new functional unit, separate from the regular FPU. AMD says the unit is pipelined to achieve a peak rate of more than one result per clock.

In a clever twist, the AMD-3D instructions shuffle their operands through the eight MMX registers, which are aliases of the eight-entry FP stack. The AMD-3D registers are therefore aliases of aliases. Future processors could substitute real physical registers for these logical registers without breaking compatibility. Since the AMD-3D instructions don't create any new state in the processor, they don't require modifications to the OS (except for Direct3D, as described earlier).

Centaur's Kick

Although Centaur has barely begun shipping its first x86 chip, the young company has persuaded Microsoft to absorb its new extensions into Direct3D. That puts Centaur's extensions on a nearly equal footing with AMD's. Like AMD, Centaur has about two dozen new instructions, including a MADD/MAC that achieves one-clock throughput. But Centaur is offering something AMD and Cyrix aren't: new registers.

Centaur's chip will have 30 new registers in all, and they will be addressable registers available to programmers. Twenty-two are actual physical registers on the chip. The other eight are aliases of the FP stack, like MMX registers. All of them are 80 bits wide — big enough to handle extended-precision FP values. Although single-precision FP operands are only 32 bits long, some instructions can generate extended-precision intermediate results that stretch out to 80 bits.

Ideally, application programmers won't have to mess with the new registers; they'll call Direct3D routines, and Direct3D will handle the details. Only if programmers bypass Direct3D will they have to directly manipulate the new instructions and registers. "We're not going to push that because we're not trying to evangelize application developers to write Centaur-specific code," says Centaur's Henry. "We're realistic. We realize we're the smallest potato."

Normally, a new set of logical registers would require modifications to the operating system because the OS has to save the registers' state during a context switch. However, the code in Direct3D that uses the extensions is a nonreentrant "critical section," like a low-level device driver. It's locked against interrupts. The operating system does not have to save the registers' states or know anything about the extensions. Clearly, Direct3D is all-important to the success of these independent extensions.

Cyrix Cayenne Turns Up the Heat

Cyrix, like AMD and Centaur, says Microsoft will support its new instructions in Direct3D, too. The Cyrix extensions won't make their debut until the second half of 1998 — months later than AMD's and Centaur's, but probably still ahead of Intel's MMX2. A Cyrix M2-series processor (code-named Cayenne) will offer 12 to 14 new instructions as part of a subset called MMX-FP. Cayenne will also have faster double- and extended-precision math.

The most interesting difference among MMX-FP, AMD-3D, and Centaur's extensions is that Cyrix doesn't have a MADD/MAC instruction, and Cyrix claims that it doesn't need one. Instead, the CPU will issue two multiply instructions per clock, and each instruction will pack two 32-bit floating-point operands into the 64-bit mantissa portion of an 80-bit FP register. With some clever instruction scheduling, the CPU can weave these multiplies together with matching add instructions, so the sustained throughput is the same as using single-clock MADD/MAC instructions.
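The packing idea is easy to model: two 32-bit single-precision values fit side by side in 64 bits, so one instruction can operate on both. The sketch below uses Python's standard `struct` module to do the packing. The lane order and byte layout are assumptions made for illustration, and the helper names are hypothetical; Cyrix has not published the format here.

```python
import struct


def pack_pair(a, b):
    """Pack two single-precision values into one 64-bit integer."""
    raw = struct.pack("<2f", a, b)  # two 32-bit IEEE 754 floats, 8 bytes
    return int.from_bytes(raw, "little")


def unpack_pair(word):
    """Recover the two single-precision values from a 64-bit integer."""
    raw = word.to_bytes(8, "little")
    return struct.unpack("<2f", raw)


def packed_multiply(word_x, word_y):
    """Multiply both lanes; models one instruction handling two operands."""
    x0, x1 = unpack_pair(word_x)
    y0, y1 = unpack_pair(word_y)
    return pack_pair(x0 * y0, x1 * y1)


x = pack_pair(1.5, 2.0)
y = pack_pair(4.0, -0.5)
print(unpack_pair(packed_multiply(x, y)))
```

The values chosen here are exactly representable in single precision, so the round trip through 32 bits loses nothing; arbitrary values would be rounded to the nearest 32-bit float.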

Cyrix is also adding new scatter/gather instructions that optimize parallelism when transforming 3-D triangles. There are special instructions for calculating reciprocals and reciprocal square roots, and there's even a motion-estimation instruction that compares blocks of pixel data when compressing and decompressing MPEG video.
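Reciprocal square roots earn their own instruction because 3-D code commonly uses them to scale vectors to unit length: one reciprocal square root followed by three multiplies replaces a square root and three divides. A minimal sketch, with an ordinary library call standing in for the hardware instruction (the function names are hypothetical):

```python
import math


def reciprocal_sqrt(x):
    """Stand-in for a hardware reciprocal-square-root instruction."""
    return 1.0 / math.sqrt(x)


def normalize(vector):
    """Scale a 3-D vector to unit length using one reciprocal square root."""
    x, y, z = vector
    scale = reciprocal_sqrt(x * x + y * y + z * z)
    return (x * scale, y * scale, z * scale)


print(normalize((3.0, 0.0, 4.0)))
```

A plain reciprocal plays a similar role wherever several values are divided by the same number: compute the reciprocal once, then multiply.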

3-D: Chip or Card?

Some graphics cards already have 3-D accelerator chips, but the new extensions from AMD, Cyrix, and Centaur will not make them obsolete. Only the most expensive, high-end cards have fast geometry engines. Mainstream cards accelerate the later stages of 3-D processing: triangle setup (converting 3-D polygon coordinates into 2-D screen coordinates), texture mapping (applying solid patterns to the wire frame), and rasterizing (painting the textured object on the screen).

Indeed, some 3-D accelerators will get a boost from the extensions because the accelerators can render polygons faster than existing CPUs can keep up. Nvidia's RIVA 128 chip, for example — used by Dell, Diamond, Gateway, Micron, and others — can render 1.5 million typical polygons per second. But even a 300-MHz Pentium II can't supply enough coordinates for more than a million polygons per second, says Dave Reed, technical marketing director at Nvidia. "This will be great for us," he says. "We've got the headroom to handle it."
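Reed's figures imply a simple ratio. Taking one million polygons per second as the most the CPU can supply, the accelerator runs at about two-thirds of its rated capacity. The helper below is only a back-of-the-envelope calculation with a hypothetical name.

```python
def accelerator_utilization(cpu_polygons_per_sec, accelerator_polygons_per_sec):
    """Fraction of the accelerator's peak rate that the CPU can keep fed."""
    return min(cpu_polygons_per_sec / accelerator_polygons_per_sec, 1.0)


# Figures quoted above: the CPU supplies at most about 1 million polygons
# per second, and the RIVA 128 can render 1.5 million.
riva_utilization = accelerator_utilization(1_000_000, 1_500_000)
print(f"{riva_utilization:.0%}")
```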

Dave Wilt, marketing manager for Chromatic Research's Mpact chip, agrees. "If anything, this is going to whet people's appetite for 3-D graphics, and they're still going to need a 3-D accelerator."

Defying Intel

Will Intel's rivals get away with it? Only once before has an x86 vendor tried to extend the architecture in this way. In 1995, NexGen revealed some new multimedia instructions in its prototype Nx686 processor. But the Nx686 never came out. AMD acquired NexGen and redesigned the chip to make the K6. In the meantime, Intel released MMX. AMD dropped NexGen's extensions (which were MMX-like integer instructions) in favor of MMX compatibility.

Now the K6 is coming full circle. AMD says it will license AMD-3D to anyone who wants it, but Intel, Cyrix, and Centaur are already moving in their own directions.

It's not clear whether Intel has legal grounds to block its rivals. More likely, Intel will try to persuade developers to ignore rogue extensions to the x86 and wait until MMX2 comes out. Standardizing on MMX2 would make life easier for developers — and for Intel.

But a software developer who doesn't want to wait for MMX2 to arrive could use the fast instructions to write a smash-hit game that gets a one-year jump on the Intel loyalists. It might even influence some people to buy a CPU from some company besides Intel.

Realistically, AMD, Cyrix, and Centaur probably do not have a big enough advantage to seriously threaten Intel. However, these companies might gain some market share. The more important question for users and developers is whether these unsanctioned extensions (and possibly others in the future) will fracture the Wintel standard.

For now, it appears Direct3D can hold things together. But as Intel drives its own proprietary wedges into the standard (see the Feature article "Socket to Me," November BYTE), competitors will be hard-pressed to find work-arounds. Don't be surprised if some of those work-arounds spin off in different directions.

Sidebars:

Where to Find

Advanced Micro Devices
Sunnyvale, CA
Phone: 408-732-2400
Internet: http://www.amd.com/

Centaur Technology
Santa Clara, CA
Phone: 408-727-6116
Internet: http://www.winchip.com/

Chromatic Research
Sunnyvale, CA
Phone: 408-752-9100
Internet: http://www.chromatic.com/

Cyrix
Richardson, TX
Phone: 972-968-8388
Internet: http://www.cyrix.com/

Intel
Santa Clara, CA
Phone: 408-765-8080
Internet: http://www.intel.com/

Nvidia
Sunnyvale, CA
Phone: 408-720-6100
Internet: http://www.nvidia.com/

x86 in 1998

Feature | Intel Pentium II (Deschutes) | Intel Pentium II (Katmai) | AMD-K6 3D | AMD-K6+ 3D | Cyrix Cayenne | Centaur IDT WinChip C6 (C3A) | Centaur IDT WinChip C6
Introduction (estimated) | Mid-1998 | Late 1998 – early 1999 | Q1 1998 | Q3 1998 | 2nd half 1998 | March – April 1998 | 2nd half 1998
Dual-pipelined MMX | Yes | Yes | Yes | Yes | Yes | Yes | Yes
3-D extensions | No | Yes (MMX2) | Yes (AMD-3D) | Yes (AMD-3D) | Yes (MMX-FP) | Yes (Centaur) | Yes (Centaur)
Direct3D support | N/A | Yes | Yes | Yes | Yes | Yes | Yes
New FP registers† | No | Unknown | No | No | No | Yes | Yes
Enhanced FPU‡ | N/A | N/A | No | Unknown | Yes | Yes | Yes
Backside bus | Yes | Yes | No | Yes | No | No | Yes
Integrated L2 cache | >256 KB (in cartridge) | >256 KB (in cartridge) | No | 256 KB (on chip) | No | No | 256 KB (on chip)
Fabrication process | 0.25 micron | 0.25 micron | 0.25 – 0.35 micron | 0.25 micron | 0.25 micron | 0.35 micron | 0.25 – 0.35 micron
CPU interface | Slot 1, Slot 2 | Slot 1, Slot 2 | Socket 7 | Socket 7 | Socket 7 | Socket 7 | Socket 7

† New physical registers.
‡ Intel-competitive performance.



The Math Behind 3-D Graphics

[Figure: The math behind 3-D graphics.]

Direct3D Architecture

[Figure: Direct3D architecture.]

Tom R. Halfhill is a BYTE senior editor based in San Mateo, California. You can reach him at thalfhill@byte.com.

Copyright 1994-1998 BYTE
