Cover Story (sidebar) / April 1995

Smarter, More Powerful Servers

You'll be able to tie four P6s together to make a pretty reasonable multiprocessing machine

Andy Reinhardt

Among the most significant trends of the last two years have been the increasing use of x86-based systems as applications servers and Intel's growing role as a supplier of nonprocessor technologies, such as buses, networking, video compression, flash memory, and system management tools.

With the P6, Intel is continuing the trend of bringing high-end features to the mass market. All the P6's internal registers are parity-checked, and the 64-bit path between the CPU core and the level 2 cache uses ECC (error checking and correcting). Built-in diagnostic features, most of them new to the P6, make it easier for vendors to design reliable systems: More than 100 events and variables inside the chip, such as cache misses, register contents, and occurrences of self-modifying code, can be counted and reported out via pins or software. Operating-system or utility software can read these values to gauge processor status and performance. The P6 also improves support for checkpointing (i.e., rolling back the machine to a known state in the event of an error), but again, the operating system has to be written to take advantage of the chip's machine-check interrupts.
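The article doesn't say which error-correcting code the P6 uses on its cache path. As a rough illustration of why ECC goes further than parity, here is a minimal Python sketch assuming a textbook Hamming SEC-DED layout for a 64-bit word (7 check bits plus an overall parity bit, 72 bits in all). The function names are hypothetical, and this is not Intel's circuit. Parity alone can only flag that an odd number of bits flipped; the Hamming syndrome also points to which bit, so a single-bit error can be repaired and a double-bit error at least detected.

```python
N_DATA = 64
N_CHECK = 7                  # smallest r with 2**r >= 64 + r + 1
N_POS = N_DATA + N_CHECK     # Hamming positions 1..71; index 0 holds overall parity


def parity(word):
    """Even parity of an integer: detects an odd number of flipped bits but can't locate them."""
    return bin(word).count("1") % 2


def _is_check_pos(pos):
    return pos & (pos - 1) == 0   # check bits sit at power-of-two positions


def encode(data):
    """Return a 72-element list of bits (index 0 = overall parity bit)."""
    code = [0] * (N_POS + 1)
    bit = 0
    for pos in range(1, N_POS + 1):
        if not _is_check_pos(pos):
            code[pos] = (data >> bit) & 1
            bit += 1
    for i in range(N_CHECK):
        p = 1 << i
        # Check bit p makes the bits at all positions with bit p set XOR to zero.
        code[p] = sum(code[pos] for pos in range(1, N_POS + 1) if pos & p) % 2
    code[0] = sum(code) % 2      # overall parity across the 71 Hamming positions
    return code


def decode(code):
    """Return (data, status); status is "ok", "corrected", or "uncorrectable"."""
    code = list(code)
    syndrome = 0
    for pos in range(1, N_POS + 1):
        if code[pos]:
            syndrome ^= pos      # after a single flip, this equals the flipped position
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1 and syndrome <= N_POS:
        code[syndrome] ^= 1      # single-bit error (syndrome 0 means the parity bit itself)
        status = "corrected"
    else:
        return None, "uncorrectable"   # e.g., two flipped bits: detected, not fixable
    data, bit = 0, 0
    for pos in range(1, N_POS + 1):
        if not _is_check_pos(pos):
            data |= code[pos] << bit
            bit += 1
    return data, status
```

Flipping any one of the 72 bits of an encoded word and passing it to `decode` gives back the original word with status "corrected"; flipping two gives "uncorrectable".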

The P6 also supports the same FRC (functional redundancy check) capability offered in the Pentium, in which two chips are lashed together to constantly verify each other's results and to signal an error if a conflict is found. Unfortunately, the P6 doesn't solve FRC's main weakness: a mismatch shows that an error occurred but leaves its nature undetermined.
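A software analogy can make FRC's limitation concrete. In this minimal Python sketch (the names `frc_step` and `FRCMismatch` are hypothetical), a "master" and a "checker" compute the same result and the two outputs are compared; on silicon, the checker chip makes this comparison continuously in hardware.

```python
class FRCMismatch(Exception):
    """Raised when the master and checker disagree (hypothetical name)."""


def frc_step(master, checker, args):
    """Run one operation on two redundant units and compare their results.

    A software analogy of functional redundancy checking, not Intel's circuit.
    """
    out_master = master(*args)
    out_checker = checker(*args)
    if out_master != out_checker:
        # Two units give only two votes: the mismatch proves an error occurred,
        # but nothing here says which unit is wrong or what kind of fault it was.
        raise FRCMismatch(f"master={out_master!r}, checker={out_checker!r}")
    return out_master
```

With only two units, a disagreement can't be settled by a vote; the exception reports that something went wrong, not where or why.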

Intel's P54C implementations of the Pentium introduced a simple and inexpensive approach to dual processing: a closely tied pairing in which host and slave processors shared a cache and divided up program threads transparently to applications. (For more information, see "Pentium Chip's Dual Personality," December 1994 BYTE.) Only operating systems with support for multithreading were able to take advantage of this feature, so its market penetration has been low.

Better Multiprocessing

The P6 takes commodity multiprocessing to the next level with support for the Intel-defined MPS (Multi-Processor Specification) 1.1. One of the most difficult aspects of SMP (symmetric multiprocessing) is maintaining coherency among dedicated per-processor caches. Because the P6 handles level 2 cache coherency internally, its frontside (external) bus is inherently cache-coherent and presents, in effect, a kind of SMP bus to the outside world.
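The article doesn't spell out the coherency mechanism. The P6 family is documented as using the MESI (Modified, Exclusive, Shared, Invalid) snooping protocol, and the minimal Python sketch below models that idea for a single cache line shared by up to four per-processor caches. It ignores bus timing, partial-line writes, and cache-to-cache transfers; `SnoopyBus` and its methods are hypothetical names.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class SnoopyBus:
    """Toy model of one cache line held by per-processor caches on a snooping bus."""

    def __init__(self, n_cpus, memory_value=0):
        # Mirrors the four-CPU arbitration limit mentioned in the article; illustrative only.
        if not 1 <= n_cpus <= 4:
            raise ValueError("this sketch allows 1 to 4 CPUs")
        self.memory = memory_value
        self.states = [State.INVALID] * n_cpus
        self.values = [None] * n_cpus

    def _flush_modified(self, requester):
        # Every other cache snoops the bus; a Modified copy is written back to memory.
        for cpu, st in enumerate(self.states):
            if cpu != requester and st is State.MODIFIED:
                self.memory = self.values[cpu]
                self.states[cpu] = State.SHARED

    def read(self, cpu):
        if self.states[cpu] is State.INVALID:  # a read miss goes out on the bus
            self._flush_modified(cpu)
            holders = [c for c, st in enumerate(self.states)
                       if c != cpu and st is not State.INVALID]
            for c in holders:
                self.states[c] = State.SHARED  # an Exclusive copy drops to Shared
            self.states[cpu] = State.SHARED if holders else State.EXCLUSIVE
            self.values[cpu] = self.memory
        return self.values[cpu]  # hits in M, E, or S need no bus traffic

    def write(self, cpu, value):
        if self.states[cpu] in (State.SHARED, State.INVALID):
            # Gain ownership: flush any Modified copy, then invalidate all others.
            self._flush_modified(cpu)
            for c in range(len(self.states)):
                if c != cpu:
                    self.states[c] = State.INVALID
                    self.values[c] = None
        # The line is modeled as one value, so a write replaces it entirely;
        # from Exclusive or Modified this happens without a bus transaction.
        self.states[cpu] = State.MODIFIED
        self.values[cpu] = value
```

For example, after CPU 2 writes a new value, a read by CPU 0 forces CPU 2's Modified copy back to memory, and both caches end up holding the line as Shared.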

In the past, systems designers implementing SMP had to create their own buses to communicate among processors or license a solution such as Corollary's C-bus II. Intel claimed earlier that the 486 and the Pentium would be good for SMP systems, but, says Corollary's president George White, "this time Intel is right." The difference is the external bus. "You'll be able to tie four P6s together and make a pretty reasonable MP machine using Intel's cookbook," he says. But for now, that is the limit: the P6's arbitration logic supports only four CPUs.

Another dilemma facing vendors is that dedicated-cache multiprocessors typically benefit from using more than 256 KB of cache per processor. For now, Intel is limited to 256 KB in the P6 package. Thus, makers of high-performance servers who want to go beyond commodity SMP will have to use external cache controllers and SRAM (static RAM) to support more than four CPUs or to implement larger caches. (The subtle trade-off, for which an answer isn't yet clear, is at what point a larger but external cache would surpass the performance of the P6's smaller, in-package full-speed cache.)
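The trade-off in the parenthetical above can be framed with the standard average-memory-access-time formula: AMAT = hit time + (miss rate × miss penalty). The sketch below compares two hypothetical level 2 designs and solves for the miss rate a larger, slower external cache would have to beat. Every number is an illustrative assumption, not a measured P6 figure, and the model ignores bus contention among processors; the function names are hypothetical.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time (in cycles) for a single cache level."""
    return hit_time + miss_rate * miss_penalty


def breakeven_miss_rate(fast_hit, fast_miss_rate, slow_hit, miss_penalty):
    """Miss rate below which the slower-but-larger cache gives a lower AMAT.

    Solves slow_hit + m * miss_penalty == amat(fast_hit, fast_miss_rate, miss_penalty)
    for m. A negative result means the slower cache can never win, because its
    hit time alone already exceeds the faster cache's AMAT.
    """
    target = amat(fast_hit, fast_miss_rate, miss_penalty)
    return (target - slow_hit) / miss_penalty


# Illustrative assumptions only, not P6 data:
IN_PACKAGE_HIT = 4         # cycles, smaller full-speed cache
IN_PACKAGE_MISS = 0.05     # fraction of accesses that miss
EXTERNAL_HIT = 6           # cycles, larger external cache
MEMORY_PENALTY = 100       # cycles to main memory on a miss

if __name__ == "__main__":
    fast = amat(IN_PACKAGE_HIT, IN_PACKAGE_MISS, MEMORY_PENALTY)
    m = breakeven_miss_rate(IN_PACKAGE_HIT, IN_PACKAGE_MISS, EXTERNAL_HIT, MEMORY_PENALTY)
    print(f"in-package AMAT: {fast:.2f} cycles")
    print(f"external cache must miss less than {m:.1%} of the time to win")
```

With these made-up numbers the in-package cache averages 9 cycles per access, so the external cache comes out ahead only if its miss rate falls below 3 percent; push its hit time past 9 cycles and no miss rate will save it.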

Intel can partly solve this problem with a bigger level 2 cache, which could be achieved by increasing the die size or moving to a smaller process technology. But today its answer for vendors who want more than four CPUs is to closely couple, or cluster, systems across a high-speed memory-to-memory serial interconnect, such as SCI (Scalable Coherent Interface). Implementations of SCI for the PCI bus could ship as early as this year.

A Shot in the Arm

The combination of the P6 and MPS 1.1 will permit the creation of a new class of "clone" servers that comply with a standard architecture and can run shrink-wrapped MPS-compatible operating systems. This could be a major shot in the arm for SMP. White cautions, however, that customers "may not be ready yet to buy servers on price alone." MPS doesn't go quite far enough, he says, in defining the kinds of features demanding users require, such as ECC, system management, and hot-swapping. These need to be implemented via hardware abstraction layers, or HALs (also known as processor-specific modules), the software interfaces added to operating systems to isolate them from hardware dependencies.

The irony for vendors is that if they have already written a HAL, they won't benefit much from MPS. On the other hand, customers will enjoy an attractive new range of options: From small vendors who lack the resources to develop HALs will come simple, low-cost SMPs, while high-end solutions customized to the machine will still be available at a premium price.

Copyright 1994-1998 BYTE
