Cover Story / November 1995

Chip Fashion

Multimedia chips will dominate the technical talk
at this year's Microprocessor Forum

Tom R. Halfhill and John Montgomery

In 1996, skirts will be longer, suits will be double-breasted, and chips will go multimedia. That last trend is the big news at this year's Microprocessor Forum, the high-tech equivalent of the annual Paris fashion shows. At last year's forum, all the excitement surrounded next-generation CPUs. But at this year's forum in San Jose, California, the focus is moving to a wave of specialized microprocessors optimized for multimedia.

Some will replace general-purpose CPUs in vertical applications, such as TV set-top boxes. Others will work as coprocessors alongside conventional CPUs in mainstream PCs. They all share a departure from the generalized computational architecture that typifies today's microprocessors. To deal more efficiently with real-time digitized streams of analog audio and video, these innovative microprocessors are adopting some techniques used by digital signal processors (DSPs). Indeed, they often resemble hybrid CPU/DSPs that blend CISC, RISC, and DSP architectures in fascinating ways.

The four multimedia chips expected to heat up this year's forum are the MicroUnity Mediaprocessor, the Philips Trimedia, the Chromatic Mpact Media Engine, and the Nvidia NV1. They're all scheduled to appear in products in 1996.

General-purpose CPUs are also adapting to the demands of multimedia, and the forum will highlight those chips as well. Sun Microsystems is announcing the UltraSPARC-II, an enhanced version of the CPU with special multimedia instructions that it unveiled at last year's forum. Cyrix is expected to release some details about its Multimedia 586, which is apparently an M1 variant. Intel is known to be working on a multimedia version of the Pentium (code-named P55C), though it probably won't be presented this year.

Clearly, a new trend in chip design is under way. As PCs move beyond simple numbers and words to richer, more complex data types, the old approach to number crunching is no longer adequate. Over the next decade, we'll see a new generation of PCs that are as adept at handling audio and video as today's computers are at running word processors and spreadsheets.

MicroUnity Mediaprocessor

MicroUnity has been working in great secrecy on a new chip set that is supposed to revolutionize multimedia and broadband communications. What hath MicroUnity wrought? Only time will tell if it's a breakthrough, but its technical underpinnings are fascinating. MicroUnity has created a three-part chip set that streamlines the links among broadband interactive networks (e.g., coaxial cable, fiber optics, and wireless) and client devices.

The client device could be a next-generation set-top box, where MicroUnity's chip set might replace a general-purpose CPU and half a dozen ASICs. Or it could be a communications device, such as a cable modem or an asynchronous transfer mode (ATM) switch, where the chip set might replace an embedded processor, a DSP, and various support chips. The MicroUnity chip set could also be integrated onto the motherboard of a multimedia PC, if it can weather the economics of today's low-margin clone market.

Central to the MicroUnity chip set is the Mediaprocessor, which is a programmable microprocessor that combines CISC, RISC, and DSP techniques with a high-bandwidth architecture. The actual interface to the broadband network is provided by the MediaCodec, an A/D converter (ADC) that boasts an impressive set of communications functions. Rounding out the trio is the MediaBridge, an external cache that can interface to a Peripheral Component Interconnect (PCI) bus and main-memory DRAM. The MediaCodec and MediaBridge are optional.

By far the most interesting chip is the Mediaprocessor, which is capable of prodigious throughput. MicroUnity is fabricating a slow 300-MHz version of the chip on a 0.6-micron CMOS process. The fast version, built on 0.5-micron BiCMOS, runs at 1 GHz (i.e., 1000 MHz). Thanks to a proprietary process technology invented by MicroUnity (which has built a chip foundry), the 1-GHz version squeezes about 10 million transistors onto a die only 10 millimeters square. It achieves these tiny dimensions by using four interconnected layers of ultraplanar metal, which is also part of the secret behind the Mediaprocessor's blazing speed. The critical paths are extremely short.

The other reason for the Mediaprocessor's high performance is a unique architecture. It's a five-cylinder multithreaded microprocessor. It can handle five tasks at once, devoting 200 MHz of processing bandwidth to each task. To do this, it maintains five files of 64-bit registers, with 64 registers per file. It also has 64 KB of on-board cache memory, equally divided between instructions and data. A dedicated I/O bus that's only 8 bits wide but can transfer 1 GB of data per second connects the Mediaprocessor to the MediaCodec and MediaBridge.
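The figures in this paragraph can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope calculation: it assumes the 1-GHz part, an equal share of the clock for each of the five threads, and decimal units (1 GB = 10^9 bytes). The variable names are ours, not MicroUnity's.

```python
# Back-of-the-envelope check of the Mediaprocessor figures quoted above.
# Assumptions (ours): the 1-GHz version, equal sharing among threads,
# and decimal units (1 GB = 10**9 bytes).

CLOCK_HZ = 1_000_000_000      # fast BiCMOS version
THREADS = 5                   # "five-cylinder" multithreading
REGS_PER_FILE = 64
REG_BITS = 64
IO_BUS_BITS = 8
IO_BYTES_PER_SEC = 1_000_000_000

per_thread_hz = CLOCK_HZ // THREADS
register_bytes = THREADS * REGS_PER_FILE * REG_BITS // 8
# An 8-bit bus carries one byte per transfer.
io_transfers_per_sec = IO_BYTES_PER_SEC // (IO_BUS_BITS // 8)

print(per_thread_hz // 1_000_000, "MHz per thread")
print(register_bytes, "bytes of register storage")
print(io_transfers_per_sec // 1_000_000, "million bus transfers per second")
```

Under these assumptions, the byte-wide I/O bus has to complete a transfer every nanosecond to reach its rated 1 GB per second.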

The Mediaprocessor has a dense instruction set that includes special operations for signal processing and extended math. For example, one of the instructions is called group fixed-point multiply and add. In a single cycle, this lone instruction can multiply four 32-bit operands by four other 32-bit operands, add the results to four more 32-bit operands, and return four 32-bit final results. That's 512 bits of bandwidth per clock cycle: 384 bits of input and 128 bits of output.
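To make the operand counting concrete, here is a minimal Python sketch of what such a grouped multiply-add computes. It is an illustration, not MicroUnity's definition: the function name is ours, and we assume results simply wrap to 32 bits, because the article does not describe how the real instruction treats overflow, rounding, or signedness.

```python
# Sketch of a "group fixed-point multiply and add" on four 32-bit lanes.
# Assumption (ours): results wrap modulo 2**32; the article does not say
# how the real instruction handles overflow, rounding or signedness.

MASK32 = 0xFFFFFFFF
LANES = 4

def group_multiply_add(a, b, c):
    """Return [(a[i] * b[i] + c[i]) mod 2**32 for each of the four lanes]."""
    if not (len(a) == len(b) == len(c) == LANES):
        raise ValueError("each operand group must hold four 32-bit values")
    return [((x * y) + z) & MASK32 for x, y, z in zip(a, b, c)]

input_bits = 3 * LANES * 32    # three groups of four 32-bit operands
output_bits = LANES * 32       # one group of four 32-bit results

result = group_multiply_add([1, 2, 3, 4], [10, 20, 30, 40], [5, 5, 5, 5])
print(result)                                             # [15, 45, 95, 165]
print(input_bits, output_bits, input_bits + output_bits)  # 384 128 512
```

The Python version walks the lanes one at a time; the point of the hardware instruction is that all four lanes complete in a single cycle.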

The Mediaprocessor is fully programmable and can run its own real-time OS or other low-level real-time OSes. It can even run a higher-level OS such as Unix. Thus, it doesn't need a regular CPU for vertical solutions (e.g., set-top boxes). MicroUnity expected to announce, in October, products that will use its chip set.

Philips Trimedia

If you think of small talking heads in postage-stamp-size windows when you think of PC video, Philips may have a surprise for you. The Trimedia microprocessor is capable of taking over for your video and sound cards and generating multiple, independently scaled live video windows that overlap in any way you want.

The Trimedia is a general-purpose microprocessor that has been enhanced to boost multimedia performance. It's a cross between a CPU and a DSP. When the Trimedia is placed in a typical PC, it will act as a multimedia coprocessor. Its role as a general-purpose microprocessor will be realized only when it's used in such devices as set-top boxes.

At the center of the Trimedia is a 400-MBps bus that connects autonomous modules, including video in, video out, audio in, audio out, an MPEG variable-length decoder (VLD), an image coprocessor, a communications block, and a very long instruction word (VLIW) processor.

The VLIW processor includes a rich instruction set, with many extensions for handling multimedia. It is capable of sustaining five RISC operations per clock cycle at 100 MHz. Philips gets this performance by incorporating 27 functional units in the VLIW engine and feeding them with five instruction-issue registers. The functional units include multiple integer ALUs, a multiple floating-point ALU, several integer and floating-point multipliers, and DSP attachment units. The functional units are pipelined, ranging from one deep (the integer ALU) to three deep (the floating-point ALU). Pipelines are also redundant (e.g., there is more than one add unit). With this design, the only functional unit that can't be fed on every clock cycle is the floating-point square root/divide.

The VLIW engine also includes 48 KB of cache memory: 32 KB for instructions and 16 KB for data. To save bandwidth and storage space, the VLIW instructions are compressed until the VLIW core needs them — even the instruction cache stores them compressed.

The other modules are processors that are each responsible for handling data-specific functions. These are generally functions that prepare data for the VLIW core.

The Trimedia is tuned to enable multiple modules to prepare their data simultaneously. The key to this is DMA. The VLIW core can issue a single instruction that opens a DMA path between main memory (located off the chip) and any of the modules. With two instructions, video and audio could start streaming into the Trimedia. The video in and audio in modules would then asynchronously prepare the data for whatever processing comes next.

The Trimedia uses glueless interfaces to the PCI bus, a digital camera, a video encoder, a stereo audio ADC/DAC, and a V.34 modem analog front end or ISDN terminal interface. The glueless interfaces decrease the cost of installing the Trimedia in a system (by up to one-third) by eliminating the need for separate adapters. The chip will cost less than $50 in quantity.

With its low price and high performance (Philips estimates it will be able to perform 2 to 4 billion operations per second [BOPS] at 100 MHz), the Trimedia has the potential to become a standard part of PCs and consumer electronics devices. Philips, however, was reluctant to disclose its partners at press time.

Mpact Media Engine

The Mpact Media Engine, which is from Chromatic Research, is architecturally similar to a general-purpose DSP. However, it's actually a highly specialized microprocessor that's designed to bring a wealth of multimedia features to mass-market x86-compatible PCs. When integrated on a PC motherboard, the Mpact can do the jobs of a Windows graphics accelerator, a 3-D graphics coprocessor, an MPEG video encoder/decoder, a sound card, a fax modem, and a telephony card. It should cost system vendors less than $150 and should be in PCs by mid-1996.

Internally, the Mpact is a 1.4-million-transistor specific-purpose processor, not just a gate array or a standard-cell amalgam of other chips. It has five ALUs, including one dedicated to motion estimation — a critical function for video encoding and videoconferencing. The parallel ALUs tie into an enormous 792-bit-wide crossbar bus that can move up to 8 billion integers per second between the ALUs and the chip's primary cache. The cache has eight ports and 4 KB of static RAM (SRAM), and it's connected over a 500-MBps bus to an off-chip secondary cache that has 2 to 4 MB of Rambus DRAM (RDRAM).

According to Chromatic, this architecture (which is reminiscent of the Texas Instruments MVP DSP) can sustain 1 to 2 BOPS for most multimedia functions. Its peak rate when performing motion-estimation functions is an incredible 20 BOPS.

Much of this speed is made possible by the Mpact's dense instruction stream and DSP-like optimizations. The Mpact processor has its own machine language that packs two op codes into every word by using a long instruction word (LIW) format. The op codes are single-instruction multiple-data operations that can perform vector functions on arrays of operands without time-consuming program loops.
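A toy model may help show how two ideas combine here: a long instruction word that carries two op codes, and op codes that each work on a whole array of operands. Everything in the sketch below is hypothetical. The op-code names, the word layout, and the register model are ours for illustration only; the article does not document Chromatic's actual instruction format.

```python
# Toy model of a long instruction word (LIW) holding two SIMD operations.
# Everything here is hypothetical: the opcode names, the word layout and
# the register model are ours, not Chromatic's.
import operator

SIMD_OPS = {
    "vadd": operator.add,
    "vsub": operator.sub,
    "vmul": operator.mul,
}

def execute_liw(word, regs):
    """Execute one LIW: two (opcode, dest, src1, src2) slots.

    Both slots read the register state as it was before the word issued,
    then both results are written back, mimicking parallel issue.
    """
    results = []
    for opcode, dest, src1, src2 in word:
        fn = SIMD_OPS[opcode]
        # One opcode processes every element of its operand arrays.
        results.append((dest, [fn(x, y) for x, y in zip(regs[src1], regs[src2])]))
    for dest, value in results:
        regs[dest] = value
    return regs

regs = {"v0": [1, 2, 3, 4], "v1": [10, 20, 30, 40], "v2": None, "v3": None}
word = (("vadd", "v2", "v0", "v1"), ("vmul", "v3", "v0", "v1"))
execute_liw(word, regs)
print(regs["v2"])   # [11, 22, 33, 44]
print(regs["v3"])   # [10, 40, 90, 160]
```

The Python interpreter still loops over the elements, of course. The saving on real SIMD hardware is that the program itself contains no loop: a single word names both operations and the arrays they apply to.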

This programmability is crucial to the Mpact's versatility. Multimedia algorithms written in the Mpact's machine language will run atop the chip's own real-time OS. Device drivers running under the system OS — such as virtual device drivers (VxDs) in Windows 95 — map high-level API calls to these low-level programs, which are called mediaware modules. Thus, users can adapt their PCs to new industry standards and improved algorithms merely by upgrading the Mpact's device drivers and mediaware modules.

Nvidia NV1

Nvidia wants your games. The NV1 is a highly integrated multimedia accelerator that includes a 3-D video accelerator (which can do 2-D graphics, too), a 350-MIPS audio engine (with Sound Blaster emulation), and an I/O processor (for joysticks and the like). This chip is designed to improve the game experience significantly. If early demonstrations are any indicator, it will.

The 3-D video accelerator is the most interesting part of the chip. What's intriguing is a graphics primitive called QTM (Quadratic Texture Map), a derivative of the curve-modeling technique known as nonuniform rational B-splines (NURBS).

Here's why. Computers are good at generating straight lines, but curves present a problem. Most 3-D accelerators represent curves by interlocking polygons: shapes that consist of three or more vertices connected by straight lines. To make a curve smooth, a 3-D accelerator must use lots of small polygons, and that's compute-intensive.

The NV1 is more clever than that. Put simply, it curves the sides of its polygons, so it doesn't need as many to make curves smooth. Fewer polygons mean fewer calculations and faster 3-D acceleration. The NV1 picks up extra speed by decreasing the number of control points required by the host CPU to move through 3-D space. Its resolution is something more like plus or minus 1 pixel. In short, it's great for games with fast 3-D motion.
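The trade-off between straight and curved edges can be illustrated with a generic quadratic Bezier curve, which is defined by just three control points. This is not the NV1's QTM arithmetic, which the article does not describe; the control points and the 1-pixel tolerance below are made-up values chosen only to show how quickly the count of straight segments grows.

```python
# Generic quadratic Bezier sketch; NOT the NV1's actual QTM arithmetic,
# which the article does not describe. Control points are made-up values.
import math

def quadratic_point(p0, p1, p2, t):
    """Evaluate a quadratic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    return (u * u * p0[0] + 2 * u * t * p1[0] + t * t * p2[0],
            u * u * p0[1] + 2 * u * t * p1[1] + t * t * p2[1])

def segments_needed(p0, p1, p2, tolerance):
    """Straight segments (uniform in t) needed to stay within tolerance.

    With n equal parameter steps, the polyline and the curve differ by at
    most |p0 - 2*p1 + p2| / (4 * n**2) at matching parameter values.
    """
    bend = math.hypot(p0[0] - 2 * p1[0] + p2[0], p0[1] - 2 * p1[1] + p2[1])
    return max(1, math.ceil(math.sqrt(bend / (4 * tolerance))))

p0, p1, p2 = (0, 0), (100, 200), (200, 0)
n = segments_needed(p0, p1, p2, tolerance=1.0)
print(quadratic_point(p0, p1, p2, 0.5))   # (100.0, 100.0)
print(n, "segments, i.e.", n + 1, "vertices versus 3 control points")
```

For this arch, a straight-line approximation needs 10 segments (11 vertices) to stay within a pixel, while the curved primitive needs only its three control points.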

The audio subsystem is similarly game-oriented. It can generate 32 concurrent audio channels of CD-quality (i.e., 16-bit) audio with phase shifting (for 3-D sound). Perfect for monsters breathing, guns firing, and a great sound track. The secret of the audio subsystem is its DMA engine. The NV1 can push and pull data from main memory incredibly quickly over its preferred PCI or VL-Bus interface. Thus, it doesn't need on-board memory to hold wave sounds; it can store them in main memory. The one potential problem with the NV1's DSP is that its algorithms are hard-wired into its silicon, so upgrades could be difficult. It's a gamble that reduces the cost and size of the NV1 overall.
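A rough calculation suggests why keeping wave sounds in main memory is practical. The sketch assumes the standard CD sample rate of 44.1 kHz and one mono stream per channel; the article itself specifies only 32 channels at 16 bits, so treat the result as an order-of-magnitude estimate.

```python
# Rough main-memory traffic for the NV1's audio channels.
# Assumptions (ours): the CD sample rate of 44.1 kHz and one (mono)
# stream per channel; the article specifies only 32 channels at 16 bits.

CHANNELS = 32
BYTES_PER_SAMPLE = 2          # 16-bit samples
SAMPLE_RATE_HZ = 44_100       # CD audio sample rate

bytes_per_second = CHANNELS * BYTES_PER_SAMPLE * SAMPLE_RATE_HZ
print(bytes_per_second)                            # 2822400
print(round(bytes_per_second / 1e6, 2), "MB/s")    # 2.82 MB/s
```

Under those assumptions, all 32 channels together need under 3 MBps, roughly 2 percent of the 133-MBps theoretical peak of a 32-bit, 33-MHz PCI bus.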

Everything's connected with a 32-bit unidirectional ring bus. The bus's controller accepts transaction requests and sends instructions to every component.

The main problem we see with the NV1 is its lack of MPEG-1 decoding. MPEG-1 is popular with games. A secondary problem has to do with the new game APIs in Windows 95: They aren't optimized to the NV1's 3-D texture map, so games written to these APIs won't exploit the NV1. With a volume price of $70, we'll probably see many multimedia boards built on this chip, so maybe the API problem will work itself out. Already, Sega has worked out a deal to port many of its games to the NV1, and Diamond has developed a board that uses the NV1.

Lights, Camera, Action

These aren't the only multimedia chips around; they're not even the only ones at the Microprocessor Forum this year. Hewlett-Packard's PA7300LC has multimedia instructions, for example, and Motorola and IBM are announcing new versions of their venerable DSPs. But these chips do show that microprocessor developers are reconsidering what a microprocessor is and how it should function. More important, they show a trend toward consumerization: These multimedia processors are inexpensive and small, and they consume little power. You'll be seeing them not only in your PC, but in your set-top box, your TV, and probably devices we haven't even thought of yet.

Sidebar: "CPU Scorecards"

Sidebar: "Alternate Views of the 615"

WHERE TO FIND

MicroUnity Mediaprocessor
Optimized for multimedia and broadband communications
Clock speeds from 300 MHz to 1 GHz
32-bit instruction set with signal-processing and
extended-math operations
Optional MediaBridge cache and MediaCodec I/O chips with
1-GBps I/O interfaces
Price: not announced
MicroUnity Systems Engineering, Inc.
Sunnyvale, CA
(408) 734-8100
fax: (408) 734-8141

Trimedia
DMA-mastering video and audio I/O units
MPEG-2 VLD that also supports MPEG-1
Interfaces to PCI bus, digital cameras, and stereo audio
Low cost (under $50)
Price: under $50
Philips Semiconductors
Trimedia Product Group
Sunnyvale, CA
(800) 234-7381
(408) 991-2000
fax: (408) 991-2311

Mpact Media Engine
MPEG-1 real-time video and audio encoding/decoding
MPEG-2 video and audio decoding
Wave-table and wave-guide sound synthesis
H.320 (ISDN) and H.324 (analog phone line) videoconferencing
Price: under $150
Chromatic Research
Mountain View, CA
(415) 254-1600
fax: (415) 254-5849
info@mpact.com

Nvidia NV1
Wave-table sound synthesis
MIDI
3-D graphics acceleration
Up to 32 16-bit audio channels
Price: $70
Nvidia
Sunnyvale, CA
(408) 720-6100
fax: (408) 720-6111


Tom R. Halfhill, a senior editor, and John Montgomery, the features editor, are the guardians of BYTE's San Mateo office. They can be reached on the Internet or BIX at thalfhill@bix.com and jmontgomery@bix.com.

Copyright 1994-1998 BYTE
