Here's an index of Tom's articles in Microprocessor Report. All articles are online in HTML and PDF formats for Microprocessor Report subscribers with a password. (Some articles, such as editorials, have free links.) Articles are also available in printed issues of Microprocessor Report. For more information, visit the MPR website.

Freescale's Designer SoCs
Chipmaker Offers New Design Services—With a Catch

Freescale Semiconductor is exploring a new line of business that has interesting implications for other chipmakers. Starting now, Freescale is offering design services to customers that want a custom SoC. Freescale will offer intellectual property (IP) for the chip, will design the chip, and will manufacture the chip. The customer provides a design specification and money. At first glance, it looks as if Freescale is merely launching a design-services business, just one more design house among many. But there's a catch. Unlike most design houses, Freescale has little interest in making SoCs entirely new from the ground up. Instead, the SoC must be based on an existing Freescale standard part or use a substantial amount of Freescale's existing IP. To customize the SoC for the target application, Freescale is willing to add or remove blocks and integrate some customer-provided IP or third-party IP. [November 17, 2008]

  • Figure 1: Block diagram of Freescale's PowerQUICC II Pro MPC8360E.


Godson-3 Emulates x86
New MIPS-Compatible Chinese Processor Has Extensions for x86 Translation

The hottest presentation at the recent Hot Chips Symposium at Stanford University was the world's first look at the Godson-3, the latest generation of China's most powerful microprocessor family. It was the first time a Chinese CPU architect visited the U.S. to lift the bamboo curtain on a home-grown Chinese processor at a major technical conference. Among the revelations was a startling feature: more than 200 new instructions and other modifications that accelerate x86-to-MIPS dynamic binary translation. In other words, the Godson-3 applies hardware optimization to x86 emulation, much as Transmeta did with its Crusoe and Efficeon microprocessors. (The Godson-3 is also known as the Loongson-3 or Dragon-3.) [November 3, 2008]

  • Figure 1: Block diagram of the Godson-3's GS464 processor core.

  • Figure 2: Cluster of four GS464 processor cores sharing four coherent L2 caches.

  • Figure 3: Two GS464 clusters linked together, forming an eight-core microprocessor.

  • Figure 4: A massively parallel implementation of the Godson-3 could populate the on-chip mesh network with 16 or more quad-core clusters.

  • Figure 5: GStera coprocessor block diagram.

  • Figure 6: Quad-core Godson-3 die layout.

  • Figure 7: Two examples of x86 virtual machines running atop a MIPS version of Linux on the Godson-3.

  • Photo: The Godson-3 was presented at the Hot Chips Symposium by Zhiwei Xu, a professor at the Institute of Computing Technology, Chinese Academy of Sciences.


Editorial: Paperless Voting Loses Ground

The old dream of a paperless office remains alluring, but the U.S. finally appears to be awakening from its nightmare of paperless voting. Gradually, election reformers are convincing public officials that paperless electronic voting machines are too flawed to win public confidence in the most important exercise of a democracy. Although much work remains to be done, we have seen positive change since we first editorialized on this subject in 2006. (See "Undo Electronic Voting".)

Politics is beyond the purview of Microprocessor Report, but we are alert to flagrant abuses of computer technology. A bread toaster that connects to the Internet and requires periodic firmware updates may offend our engineering sensibilities, but it's also funny, in a perverse way. A black-box voting machine that determines elections by running secret source code on untested hardware behind a poor user interface -- and without a paper trail -- is simply perverse. [October 27, 2008]

Free link to this editorial: Paperless Voting Loses Ground


Microprocessor Hits and Misses
Panel at Hot Chips Symposium Reviews 20 Years of Successes and Failures

(Edited by Tom R. Halfhill)

This year marked the 20th anniversary of the Hot Chips Symposium at Stanford University in Palo Alto, California, sponsored by the IEEE Technical Committee on Microprocessors and Microcomputers. To celebrate, the organizers invited six industry experts to join a discussion panel: "Ready, Fire, Aim -- 20 Years of Hits and Misses at Hot Chips." They reviewed new microprocessors and architectures presented at the symposium since 1989 and attempted to sort out the successes and failures. Microprocessor Report has lightly edited the transcript of their discussion for clarity and has added comments and article references to help put some remarks into context. [October 20, 2008]

  • Photo 1: The discussion panel included Howard Sachs, Telairity; David Ditzel, Intel; Michael Slater, Webvanta; Nathan Brookwood, Insight64; John R. Mashey, Techvisor; and David Patterson, University of California at Berkeley.

  • Photo 2: Moderator Nick Tredennick.

  • Photo 3: Microprocessor Report founder Michael Slater.


Intel's Larrabee Redefines GPUs
Fully Programmable Manycore Processor Reaches Beyond Graphics

Intel is spreading the x86 everywhere. No longer satisfied with existing strongholds in PCs and servers, this year Intel has revived the x86 as a standalone embedded processor and has introduced the first highly integrated x86-based SoCs. And as early as next year, Intel will debut the first x86-based 3D-graphics processors. So far, graphics is the oddest addition to the x86's growing list of target applications. A 30-year-old CISC architecture designed for general-purpose processing would seem to be seriously handicapped against special-purpose GPUs, which are highly optimized for tasks like pixel shading and texture mapping. But Intel is undeterred. At Siggraph 2008 -- a graphics show, not a microprocessor conference -- Intel unveiled the first technical details about its future x86-based GPU, code-named Larrabee. [September 29, 2008]

  • Figure 1: Larrabee has a fully programmable graphics pipeline, augmented with only a little specialized logic.

  • Figure 2: Block diagram of Larrabee's scalar and vector instruction paths.

  • Figure 3: Block diagram of Larrabee's vector processing unit (VPU).

  • Figure 4: Graphics performance of three action games on simulated Larrabee processors with eight to 48 cores.

  • Figure 5: Preliminary benchmark testing on simulated Larrabee processors with eight to 64 cores.

  • Figure 6: Larrabee's software stack.

  • Figure 7: Larrabee's multithreading model.

  • Figure 8: Block diagram of Larrabee's on-chip network.


Intel's New SoCs
Pre-Atom Integrated Chips Face Tough Competition

The embedded-processor market resembles a wild costume party, with variety galore -- from Little Bo Peep (8-bit MCUs) to the Incredible Hulk (massively parallel DSPs). Into this colorful riot wanders Intel, casually dressed by The Gap for a come-as-you-are party. Intel's first x86-based SoCs, announced July 23, are attired less appropriately than Intel would like. For now, they combine a PC processor core, a PC north-bridge chip, a PC south-bridge chip, and (optionally) a cryptography-acceleration chip. Consequently, they are relatively large and power hungry when compared with competing SoCs. But they are also fast, highly integrated, and definitely better than a system cobbled together with three or four separate Intel chips. [August 18, 2008]

  • Figure 1: Intel EP80579 block diagram.

  • Figure 2: Low-level cryptography acceleration on the Intel EP80579 with QuickAssist.

  • Figure 3: IPsec acceleration on the Intel EP80579 with QuickAssist.

  • Table 1: Summary of distinguishing features among the eight parts in the Intel EP80579 family.

  • Table 2: Comparison of similar networking and communications processors from Broadcom, Cavium, Freescale, and Intel.


EEMBC's MultiBench Arrives
CPU Benchmarks: Not Just For 'Benchmarketing' Any More

Imagine a world without measurements or statistical comparisons. Baseball fans would still notice that a .300 hitter is better than a .100 hitter. But would they welcome a trade that sends the .300 hitter to Cleveland for three .100 hitters? System designers and software developers face similar quandaries when making trade-offs with multicore processors. Even if a dual-core processor appears to be better than a single-core processor, how much better is it? Twice as good? Would a quad-core processor be four times better? The Embedded Microprocessor Benchmark Consortium (EEMBC) wants to help answer those questions. EEMBC's MultiBench 1.0 is a new benchmark suite for measuring the throughput of multiprocessor systems, including those built with multicore processors. [July 28, 2008]

  • Figure 1: MultiBench 1.0 introduces the concept of work items and workloads instead of kernels.

  • Figure 2: Screen shot of EEMBC's Workload Creator.

  • Figure 3: Preliminary MultiBench results on an anonymous quad-core processor.

  • Figure 4: Preliminary MultiBench results comparing two anonymous dual-core processors.

  • Table 1: EEMBC MultiBench 1.0 workloads and the existing EEMBC benchmark suites (if any) from which they were adapted.

  • Table 2: MultiBench composite scores and multicore scale factors for two different dual-core processors.
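The "how much better?" question above comes down to simple arithmetic. The sketch below is only an illustration: it assumes a scale factor means throughput with all cores enabled divided by throughput on one core, which may not match EEMBC's exact definition. The function names and throughput numbers are hypothetical, not MultiBench results.

```python
def scale_factor(multi_core_throughput, single_core_throughput):
    """Throughput with all cores enabled divided by throughput on one core.

    Assumed definition for illustration; not taken from the EEMBC specification.
    """
    if single_core_throughput <= 0:
        raise ValueError("single-core throughput must be positive")
    return multi_core_throughput / single_core_throughput


def scaling_efficiency(factor, cores):
    """Fraction of ideal linear scaling achieved (1.0 means N cores run N times faster)."""
    if cores <= 0:
        raise ValueError("core count must be positive")
    return factor / cores


# Hypothetical throughput figures in work items per second.
one_core = 100.0
two_cores = 170.0

factor = scale_factor(two_cores, one_core)
print(f"scale factor: {factor:.2f}x, efficiency: {scaling_efficiency(factor, 2):.0%}")
# prints "scale factor: 1.70x, efficiency: 85%"
```

With these made-up numbers, the dual-core part is 1.7 times as fast as the single-core part, not twice as fast, which is the kind of gap a multicore benchmark is meant to expose.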


Editorial: Tools for Multicore Processors

We keep hearing more complaints that it's hard to write software for multicore processors because there aren't enough development tools. Not enough tools? That's like complaining it's hard to buy Chinese products because there aren't enough Wal-Marts. The real problem with multicore processors is too many development tools--and the tools are often difficult to learn and use. [July 28, 2008]

Free link to this editorial: Tools for Multicore Processors


Freescale's Multicore Makeover
New QorIQ Processors Will Eventually Supersede PowerQUICC Chips

They will be powerful and quick, but they won't be PowerQUICC. Instead of using the brand name that has been a household word since 1995--in the households of network engineers, that is--Freescale Semiconductor has unveiled a new name for its future communications processors. The new brand is QorIQ (pronounced "Core IQ"). Although the name doesn't seem like an upgrade, the chips look good. Among the first six QorIQ devices announced is the P4080, the first eight-processor multicore chip from Freescale. Some future QorIQ chips will have at least 16 cores. The PowerQUICC brand and product line aren't going away soon, but the vast majority of Freescale's new networking and communications processors will be QorIQ devices. [July 7, 2008]

  • Figure 1: QorIQ P4080 block diagram.

  • Figure 2: QorIQ P1-family block diagram.

  • Figure 3: QorIQ P2-family block diagram.

  • Table 1: Feature comparison of the six Freescale QorIQ chips announced to date: the P1010, P1011, P1020, P2010, P2020, and P4080.

  • Sidebar: The New, Improved Power e500mc Processor Core

    • Figure: The Power Architecture embedded hypervisor.


ReadyIP Boosts FPGAs
Synplicity Tools Offer Packaged Soft-IP for FPGA Development

For several years now, Microprocessor Report has covered the trend toward implementing and deploying SoC designs in the programmable logic of FPGAs instead of in the fixed logic of ASICs. In the past, programmable-logic devices were commonly viewed as prototype platforms, not as final products. FPGA developers received a big boost recently when Synplicity unveiled its ReadyIP initiative. ReadyIP allows soft-IP vendors to package their cores in a standardized format, so FPGA developers can easily integrate the IP using system-level design tools. Optionally, soft-IP vendors can protect their ReadyIP cores with encryption that still lets developers evaluate a design before purchasing a full license. And ReadyIP isn't specific to any particular brand of FPGAs. [June 16, 2008]

  • Figure 1: ReadyIP design flow.

  • Table 1: Feature comparison of 32-bit synthesizable processors approved by their vendors for deployment in FPGAs: the Altera Nios II/f v8.0, ARM Cortex-M1, Freescale ColdFire-V1, Gaisler Research LEON3, Tensilica Diamond Standard 106Micro, and Xilinx MicroBlaze v7.

  • Sidebar: Freescale Offers ColdFire-V1 for FPGAs


Editorial: A Tale of Two Companies

Silicon Valley is buzzing over the final fates of two fabless-semiconductor companies: Montalvo Systems and P.A. Semi. One went bust, and the other was mysteriously acquired by Apple. The only industry gossip that wagged more tongues this spring was Yahoo's frigid response to Microsoft's takeover bid. [May 27, 2008]

Free link to this editorial: A Tale of Two Companies


Fault Tolerance for Cortex-M3
ARM Modifies MCU Core for Critical Embedded Systems

ARM is enhancing its Cortex-M3 processor core with faster clock speeds, configurable debug logic, new power-saving features, and compatibility with third-party fault-tolerance technology. All the enhancements make the Cortex-M3 even more suitable for microcontrollers, but fault tolerance is especially important for automotive, medical, and military applications. Cortex-M3 Release 2.0 is compatible with a third-party fault supervisor from Yogitech, a company based in Pisa, Italy (home of the world's most fault-tolerant tower). [May 12, 2008]

  • Figure 1: ARM's enhanced Cortex-M3 processor has an optional observation port called the faultRobust Diagnostic Interface that couples to Yogitech's fRCPU fault-supervisor module.

  • Figure 2: Diagnostics recommended for ICs rated HFT=0 by the IEC 61508 standard.

  • Figure 3: A common way to design HFT=0 systems is to use redundant processor cores running in lockstep. A smaller diagnostic module, tightly coupled to the CPU, can provide enough diversity and safety while saving silicon and power.

  • Figure 4: Yogitech faultRobust-CPU (fRCPU) block diagram.

  • Figure 5: When Yogitech's fRCPU supervisor detects a CPU fault, it generates an error message.

  • Figure 6: For higher degrees of fault tolerance (HFT>0), two processor cores and two of Yogitech's fRCPU supervisors can form a dual-channel system.

  • Figure 7: Cortex-M3 Release 2.0 block diagram.

  • Table 1: The IEC has defined these standards for fault tolerance in embedded subsystems. The higher the Safety Integrity Level (SIL), the higher the subsystem's availability.


Multicore Multithreading With MIPS
New MIPS32 1004K Coherent Processing System Has Four-Way SMP

Four-bangers are the low-end motors of the automobile world, but quad-core microprocessors are currently the hot rods of computing. On April 1, MIPS Technologies made it easier for chip designers to create quad-core SoCs by introducing the industry's first licensable processor core supporting four-way symmetric multiprocessing (SMP) and chip multithreading. A full implementation of the new MIPS 1004K Coherent Processing System with four dual-threaded cores offers the virtual equivalent of eight-way SMP. [April 28, 2008]

  • Figure 1: Block diagram showing four-way SMP with the MIPS 1004K Coherent Processing System.

  • Figure 2: Coherence-manager block diagram.

  • Figure 3: Optional I/O coherence unit (IOCU).

  • Figure 4: JPEG decompression on four different configurations of the MIPS 1004K CPS.

  • Table 1: Comparison of dual- and quad-core MIPS 1004Kc CPS configurations with the single-core MIPS 34Kc processor.

  • Table 2: Feature comparison of the MIPS 1004K CPS, ARM11 MPCore, and ARM Cortex-A9 MPCore.


Intel's Tiny Atom
New Low-Power Microarchitecture Rejuvenates the Embedded x86

In-depth 10,000-word report on Intel's new Atom family of low-power x86 microprocessors, formerly known as Silverthorne and Diamondville. Although Atom still uses too much power for most traditional embedded systems, by x86 standards it's a power-performance landmark. At launch, Atom's clock frequency will range from 800MHz to 1.86GHz, yet thermal design power (TDP) is a mere 0.65W-2.4W over that range. TDP is a worst-case metric, so typical workloads will draw much less wattage. Intel estimates the "average" power at 160-220mW and idle power at 80-100mW. Even the new Isaiah microarchitecture from VIA Technologies--formerly the low-power x86 leader--can't match Atom's TDPs. Atom completely redefines the low-power x86 landscape. [April 7, 2008]

  • Figure 1: Atom pipeline diagram.

  • Figure 2: Atom block diagram.

  • Figure 3: Intel's page-rendering benchmarks for seven popular websites.

  • Figure 4: Shmoo plot of Atom's voltage-frequency curve in two low-power states.

  • Figure 5: Atom's power states.

  • Figure 6: Atom die plot.

  • Table 1: Intel's Atom lineup at launch.

  • Sidebar: Atom's System Controller Slashes Power, Too

    • Figure: Poulsbo block diagram.

  • Sidebar: Decoding the Code Names


Editorial: Think Parallel

Multicore processors are causing much consternation in the software-development community. Traditional single-threaded programs essentially gain nothing by running on microprocessors with multiple cores. Indeed, the program might even run slower. Multicore processors are in vogue because the power-dissipation penalties of higher clock speeds force CPU architects to find alternatives. The newly popular alternative is to integrate multiple processor cores in a single chip, clock the cores at a lower frequency, and tell programmers to rewrite their software. The solution that seems to be emerging is explicitly coded data-level parallelism. [March 31, 2008]

Free link to this editorial: Think Parallel
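For readers unfamiliar with the term, explicitly coded data-level parallelism means the programmer partitions the data and applies the same operation to each partition independently. The following is a minimal sketch of that structure, not code from the editorial. The function names are hypothetical, and it uses a thread pool from Python's standard library purely to show the shape of the approach. In standard CPython builds, threads will not speed up CPU-bound work like this, so a real program would use processes or native code.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(data, n_chunks):
    """Split data into n_chunks contiguous slices of nearly equal size."""
    size, extra = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """The per-chunk operation: identical code applied to different data."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=4):
    """Apply sum_of_squares to each chunk concurrently, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum_of_squares, chunked(data, workers)))
    return sum(partials)


data = list(range(1000))
# The parallel version must agree with the serial one.
assert parallel_sum_of_squares(data) == sum_of_squares(data)
```

The partition step is the "explicit" part: nothing in the serial version tells the hardware that the chunks are independent, so the programmer has to say so.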


VIA's Speedy Isaiah
New x86 Design Strikes a Different Balance of Power and Performance

In-depth 6,000-word report: VIA's new Isaiah microarchitecture is a clean-slate x86-compatible design with superscalar pipelining, out-of-order instruction processing, speculative execution, multilevel dynamic branch prediction, larger on-chip caches, and one of the fastest FPUs in the industry. In addition, Isaiah is VIA's first 64-bit x86 processor. VIA previewed Isaiah (then known as Centaur CN) in 2004, but it's only now sampling in silicon and is scheduled to debut later this year. [March 10, 2008]

  • Figure 1: Isaiah die plot.

  • Figure 2: Isaiah block diagram.

  • Figure 3: Isaiah's primary branch predictor.

  • Figure 4: VIA's PowerSaver technology with TwinTurbo PLLs.

  • Table 1: Isaiah's floating-point and media-processing performance.


Buy SoC IP Like MP3s
IPextreme's Core Store Sells Soft IP Online at Fixed Prices

In tech lingo, "IP" is an overloaded acronym that can mean "intellectual property" or "Internet Protocol." Now there may be a third definition: "impulse purchase." Intellectual-property vendor IPextreme has opened a retail website called the Core Store that makes buying IP for system-on-chip (SoC) development almost as easy as buying digital music. The Core Store sells synthesizable processor- and peripheral-IP cores at fixed, published prices. With a few mouse clicks, chip developers can review online documentation, buy the IP (Visa, MasterCard, and PayPal accepted), download the files, and begin working immediately. [February 11, 2008]

  • Figure 1: Soft IP for sale on the Core Store's home page.

  • Figure 2: Block diagram of a generic SoC that could be designed using National Semiconductor's fixed-price IP available at IPextreme's Core Store.


Parallel Processing With CUDA
Nvidia's High-Performance Computing Platform Uses Massive Multithreading

Nvidia's Compute Unified Device Architecture (CUDA) is a software platform for massively parallel high-performance computing on the company's powerful GPUs. Formally introduced in 2006, after a year-long gestation in beta, CUDA is steadily winning customers in scientific and engineering fields. At the same time, Nvidia is redesigning and repositioning its GPUs as versatile devices suitable for much more than electronic games and 3D graphics. For Nvidia, high-performance computing is both an opportunity to sell more chips and insurance against an uncertain future for discrete GPUs. [January 28, 2008]

  • Figure 1: Nvidia's CUDA platform for parallel processing on Nvidia GPUs.

  • Figure 2: Nvidia GeForce 8 graphics-processor architecture.

  • Figure 3: Three different models for high-performance computing.

  • Figure 4: CUDA programming example in C.

  • Figure 5: CUDA's compilation process.

Free link to this article on Nvidia's website: Parallel Processing With CUDA


Editorial: The Future of Multicore Processors

With the multicore era undeniably upon us, more talk is turning to the future implications of multicore processors. Of course, software development remains a big challenge, even provoking a recent article in The New York Times, of all places. But the discussion is equally spirited on the hardware side. One debate is about symmetric versus asymmetric multiprocessing. Should all the cores on a multicore chip be identical, or should some be specialized for different tasks? Another debate questions the value of core-level multithreading. How many threads make sense? In many ways, these debates echo the classic RISC versus CISC arguments of the 1990s--simplicity versus complexity, efficiency versus expediency. [December 31, 2007]

Free link to this editorial: The Future of Multicore Processors


Transmeta's Second Life
$250 Million Patent Windfall From Intel Creates Opportunities

Once given up for dead, Transmeta is getting a second chance. Thanks to a $250 million settlement from Intel in a patent-infringement lawsuit, Transmeta is looking forward to a new future as an intellectual-property (IP) provider. But the company says it has no plans to resume making microprocessors. This article analyzes Transmeta's current situation, discusses Transmeta's future plans, and reviews the 11 patents that Transmeta asserted against Intel. [December 26, 2007]

  • Figure 1: This figure, from Transmeta's 5,493,687 patent, illustrates a technique commonly used in microprocessors with hardware-level multithreading--multiple register banks, switchable for each thread context.

  • Figure 2: This figure, from Transmeta's 6,226,733 patent, illustrates a method for speculatively calculating memory addresses in a microprocessor that has both memory paging and memory segmentation.

  • Figure 3: This figure, from Transmeta's 7,100,061 patent, is a flow chart describing one way of dynamically adjusting the clock frequency and voltage of a microprocessor to save power.

  • Sidebar: A chronological list of 24 Transmeta-related articles published in MPR since 1998. Many additional MPR articles have discussed Transmeta in relation to other microprocessor companies.


Altera Aims For ASICs
Altera and Synopsys Offer Nios II Processor for Standard-Cell Designs

Altera's Nios II embedded-processor core is now a triple-threat contender. Thanks to a partnership with Synopsys, developers can license the 32-bit synthesizable processor for standard-cell implementations in ASICs as well as for FPGAs and structured ASICs. Previously, Nios II was restricted to Altera's FPGAs and HardCopy II structured ASICs, although Altera occasionally made special arrangements with favored customers. Now, anyone can license Nios II for a standard-cell design flow using industry-standard design tools, including the popular electronic-design automation (EDA) tools from Synopsys. [December 17, 2007]

  • Figure 1: Altera's estimated performance of the Nios II/f processor when implemented as a standard-cell ASIC, as a structured ASIC, and in programmable logic.

  • Table 1: Feature comparison of synthesizable 32-bit embedded-processor cores marketed for deployment in programmable-logic devices: Altera's Nios II/f (v7.2), ARM's Cortex-M1, Gaisler Research's LEON3, and the Xilinx MicroBlaze v7.0.


Parallel Processing For the x86
RapidMind Ports Its Multicore Development Platform to x86 CPUs

The RapidMind Multicore Development Platform requires programmers to rewrite the data-intensive portions of their code, and it also requires the target system to run a hardware-abstraction layer between the application program and the microprocessor. In return, RapidMind claims big benefits. Some tasks run five to ten times faster, and, in some cases, performance can scale faster than the rising number of processors. In addition, the parallel code is highly portable--programmers needn't rewrite it for each new multicore processor or multiprocessor system. Previously, RapidMind's platform worked only with IBM's Cell Broadband Engine (Cell BE) and the graphics processors from AMD/ATI and Nvidia. On November 5, RapidMind announced Multicore Development Platform v3.0, which targets the popular multicore x86 processors from AMD and Intel. [November 26, 2007]

  • Figure 1: RapidMind's Multicore Development Platform v3.0.

  • Figure 2: Parallel processing with RapidMind's platform.

  • Figure 3: Comparison of C++ code before and after rewriting for RapidMind's Multicore Development Platform.

  • Figure 4: Performance comparison of serial C++ code vs. RapidMind C++ code.

  • Figure 5: Performance comparison of serial C++ code vs. RapidMind C++ code on x86 processors and an Nvidia GeForce 8800 GTX graphics card.

  • Sidebar: RapidMind Wins HPCwire Awards at SC07 Conference


MicroBlaze v7 Gets an MMU
Memory Manager Brings Full-Fledged Linux to Xilinx Processor Core

Xilinx is upgrading its MicroBlaze embedded-processor core again, this time adding an optional memory-management unit (MMU) that allows the 32-bit processor to run sophisticated operating systems supporting virtual memory. Developers can also substitute a simpler memory-protection unit (MPU) or omit supervised memory management altogether. MicroBlaze v7 has other improvements as well. New instructions provide faster floating-point performance and better I/O with coprocessors and custom logic. Xilinx has upgraded the CoreConnect interface to the latest CoreConnect Processor Local Bus (PLB) v4.6 specification, which provides faster links to on-chip peripherals. [November 13, 2007]

  • Figure 1: Example SoC block diagram.

  • Table 1: Sizes of three MicroBlaze v7 memory-management options.

  • Table 2: MicroBlaze v6 versus MicroBlaze v7 performance.

  • Table 3: Feature comparison of the Xilinx MicroBlaze v7, MicroBlaze v6, Altera Nios II, and ARM Cortex-M1 processor cores.


Atmel's Customizable MCUs
Metal-Programmable Gates Add Flexibility to ARM-Based Microcontrollers

Customizable Atmel Processors (CAPs) invert the structured-ASIC formula to preserve the good aspects (design flexibility, rapid turnaround) while avoiding the bad aspects (complex design and verification, insufficient advantages over FPGAs and standard-cell ASICs). Instead of offering a blank slate of programmable metal encompassing nearly the whole chip, Atmel's CAPs are fundamentally ARM7- or ARM9-based microcontrollers with the usual integrated peripherals and I/O interfaces. Only about 10% to 20% of the chip is reserved for a metal-programmable block. By using this block to integrate additional peripherals, application-specific logic, or even multiple processor cores, customers can transform these off-the-shelf parts into the near equivalent of a custom ASIC. [October 29, 2007]

  • Figure 1: Customizable Atmel Processor (CAP) die photo.

  • Figure 2: Size comparison of Atmel's metal-programmable gates with standard-cell gates.

  • Figure 3: CAP7 block diagram.

  • Figure 4: Block diagram of Amulet Technologies' Graphical OS in Silicon technology for Atmel's CAP.

  • Table 1: Soft macros available for CAPs.


ARC Encodes Digital Video
New Video Subsystems Exploit VRaptor Media Architecture

ARC International has introduced five new members of the ARC Video Subsystem, a family of digital-video encoders and decoders. ARC licenses these subsystems to customers as soft intellectual property (IP) for integration in SoCs. The ARC Video Subsystem builds on the ARC VRaptor Media Architecture introduced in 2006. VRaptor, in turn, is based on an ARC 700 32-bit embedded-processor core augmented with instruction-set extensions, SIMD media processors, communication channels, special acceleration logic, and optimized software codecs for popular audio/video standards. Until now, ARC's preconfigured subsystems could handle video decoding but not the more challenging task of encoding. [October 15, 2007]

  • Figure 1: ARC Video 417V block diagram.

  • Table 1: Feature comparison of ARC's AV 402V, AV 404V, AV 406V, AV 407V, and AV 417V video subsystems.

  • Table 2: A sampling of ARC's VRaptor Media Architecture.

  • Table 3: ARC Video Subsystem performance.

  • Sidebar: Future Directions: Mobile HD Video

  • Sidebar: Hard-Wired Video Accelerators for ASICs


Cortex-R4X: Extreme Makeover
Intrinsity's Fast14 Technology Accelerates ARM's Processor Core

In July, Microprocessor Report described a new Power Architecture processor core that Intrinsity designed for AMCC using Fast14 dynamic logic. In that collaboration, Intrinsity played the role of a design house as well as an intellectual-property (IP) provider by designing a new Power-compatible microarchitecture to AMCC's specifications. Now, Intrinsity is playing a different role for ARM. Starting with an existing microarchitecture -- ARM's Cortex-R4 embedded-processor core -- Intrinsity is using Fast14 to transform the synthesizable model into a hard macrocell. The result is the ARM Cortex-R4X, the extreme-makeover edition of the Cortex-R4. [September 24, 2007]

  • Figure 1: Power-performance envelopes of the ARM Cortex-R4 vs. the Cortex-R4X in a low-leakage 65nm CMOS fabrication process.

  • Figure 2: ARM's sales estimates for various types of embedded systems in 2008.

  • Figure 3: Comparison of a halt-propagate-generate cell implemented with Intrinsity's Fast14 1-of-N domino logic (NDL) and conventional static logic.

  • Table 1: Feature comparison of the ARM Cortex-R4X, ARC 750D, MIPS 24KEc, MIPS 74K, and Tensilica Diamond 570T processors.

  • Sidebar: ARM's Cortex-M1 for Low-Power Actel FPGAs


Editorial: Intrinsity Turns a Corner

This month's issue of Microprocessor Report has an article about ARM's Cortex-R4X processor, a new hard-macro version of the previously released Cortex-R4 synthesizable core. What's special about this particular hard core is that it uses Intrinsity's Fast14 technology -- a type of dynamic domino logic that has been demonstrated to significantly improve microprocessor performance. In our July issue, we reported on another interesting collaboration between Intrinsity and AMCC. With these projects, Intrinsity appears to be successfully redefining itself as an IP provider and design shop specializing in speed-optimized embedded-processor cores. [September 24, 2007]

Free link to this editorial: Intrinsity Turns a Corner


Freescale's Multicore Strategy
Key Components: Optimized CPU Core, Accelerators, and Interconnects

If anyone still thinks multicore chips are merely the latest technology fad, banish such impure thoughts immediately. It has become clear that chip-level multiprocessing is the only visible path toward significantly higher performance, and every leading-edge processor company has a multicore strategy. The latest company to revamp its strategy is Freescale Semiconductor. Freescale is a good case study, because the company has been selling multicore chips--of a sort--since the mid-1990s. [August 27, 2007]

  • Figure 1: Freescale's multicore platform block diagram.


XMOS Redefines Silicon
Software-Defined Chips Attack ASICs, ASSPs, FPGAs

XMOS Semiconductor is pushing a technology it calls "software-defined silicon." In this concept, a multicore array of general-purpose embedded-processor cores uses hardware multithreading to run the control software and application software under hard real-time constraints. At the same time, separate threads drive the chip's pins to emulate the required I/O interfaces—Ethernet, USB, UARTs, I2C, and so forth. This combination of multicore integration, deterministic multithreading, and software-defined I/O allows a general-purpose microprocessor to perform the functions of an SoC, but without custom acceleration hardware or dedicated I/O controllers. [August 6, 2007]

  • Figure 1: Example XMOS C (XC) code for a UART transmit function.

  • Figure 2: Developers can use standard C and C++ to write most software.

  • Figure 3: Block diagram of a dual XCore design with XLink on-chip interconnect.

  • Sidebar: Key People at XMOS Semiconductor
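
To make the software-defined I/O idea above concrete, here is a minimal Python sketch of the line levels a thread would have to drive on a pin to emulate a UART transmitter. It is not XMOS XC code and is not taken from the article. The function names and the 8N1 framing default (one start bit, eight data bits sent least-significant bit first, one stop bit) are illustrative assumptions.

```python
def uart_frame_bits(byte, data_bits=8):
    """Return the sequence of line levels for one UART frame (no parity).

    Hypothetical helper for illustration; real software-defined I/O would
    also have to hold each level for exactly one bit period.
    """
    if not 0 <= byte < (1 << data_bits):
        raise ValueError("byte out of range")
    bits = [0]                                            # start bit: line driven low
    bits += [(byte >> i) & 1 for i in range(data_bits)]   # data bits, LSB first
    bits.append(1)                                        # stop bit: line returns to idle high
    return bits


def bit_period_ns(baud_rate):
    """Nanoseconds that each level must be held at the given baud rate."""
    return 1_000_000_000 / baud_rate


if __name__ == "__main__":
    print(uart_frame_bits(ord("A")))
    print(bit_period_ns(115200))
```

The timing function hints at why deterministic multithreading matters: at 115,200 baud, each level lasts under nine microseconds, so the thread driving the pin cannot tolerate unpredictable delays.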


Editorial: The New PC From Hell

Last month I helped my brother purchase and set up his first new home computer in eight years. What should have been an easy job became a two-day ordeal that would be comical if it weren't such a sad commentary on today's PC industry. The villains include hardware manufacturers, software publishers, documentation writers, mass-market retailers, and corporate downsizers. All are clueless about serving their primary customers -- ordinary users. And it seems to be getting worse, not better. [July 30, 2007]

Free link to this editorial: The New PC From Hell


AMCC's Titan Core
New Power Architecture Core Uses Only 2.5W at 2.0GHz

AMCC and Intrinsity have joined forces to create an entirely new Power Architecture processor core. Code-named Titan, the 32-bit semicustom core relies heavily on Intrinsity's Fast14 logic to reach high clock speeds (up to 2.0GHz in 90nm bulk CMOS) while consuming remarkably little power (2.5W). In addition, Titan is part of a dual-core "processor complex" that supports coherent multiprocessing. If Titan succeeds, it will admit AMCC and Intrinsity to an exclusive club formerly limited to Freescale, IBM, and P.A. Semi--the only other companies creating original Power Architecture designs. [July 23, 2007]

  • Figure 1: Comparing Fast14 logic with conventional static logic.

  • Figure 2: AMCC Titan pipeline diagram.

  • Figure 3: AMCC Titan block diagram.


Cavium Stalks Storage
Coming Soon: The First Octeon Storage Processors

Cavium Networks is entering the mainstream storage-processor market with two families of Octeon chips based on the company's successful networking and communications processors. When the new storage processors debut late this year, they will bring the same high integration and programmability to networked storage systems that Cavium's existing processors have brought to routers, broadband-access devices, and many other networking products. The new Octeon Storage Services Processors will have two to twelve MIPS-compatible processor cores per chip, as much as 2MB of L2 cache per core, configurable I/O interfaces, and hardware acceleration for critical tasks. [July 16, 2007]

  • Figure 1: Octeon SSP CN57xx block diagram.

  • Figure 2: Octeon SSP sequential-read performance with iSCSI.

  • Table 1: Feature comparison of Cavium's Octeon CN55xx and CN57xx Storage Services Processors.


Editorial: Commodity Products Make Commodity Markets

Most companies fear commodity markets--those markets that subsist on razor-thin profit margins, providing sustenance only to the bottom-feeders. Typically, a new market opens with highly innovative products that command high profit margins. As more companies enter the fray, competition drives prices down. Eventually, the products become so plentiful and similar to each other that they become a nearly profitless commodity. That's Business 101. But I think much of the damage of commodity markets is self-inflicted. Lately I've been wondering if the spread of embedded-processor technology is partly to blame. [June 26, 2007]

Free link to this editorial: Commodity Products Make Commodity Markets


Freescale's First Flexis MCUs
New 8- and 32-Bit Microcontrollers Offer Pin Compatibility

Years ago, some crazy hot-rod mechanics crammed V8 engines into their classic Volkswagen Beetles. This hardware hack wasn't easy. The huge V8 transformed a cute Bug into a kludgy monstrosity. Freescale Semiconductor wants to bring a similar upgrade to embedded systems, only without the kludge quotient. So this week, Freescale is unveiling the first microcontroller family with pin-compatible 8- and 32-bit devices. Freescale's new Flexis-family MCUs for consumer and industrial applications will allow developers to pull an 8-bit chip out of a socket, replace it with a 32-bit part, update the firmware, reboot, and continue running the system as before--except with much more horsepower. [June 26, 2007]

  • Figure 1: Flexis microcontroller block diagram.

  • Table 1: Feature summary of Flexis 8- and 32-bit microcontrollers.

  • Table 2: Flexis low-power modes.


MIPS 74K Performance Update
MIPS Releases Power/Performance Estimates for New Processor Core

At the recent Microprocessor Forum in San Jose, MIPS Technologies released power-consumption estimates and performance benchmarks for the new MIPS32 74K embedded-processor core. These preliminary numbers show the 74K running neck and neck with ARM's Cortex-A8. Microprocessor Report covered the MIPS 74K in detail shortly after its May 21 debut, but we overlooked some power-consumption estimates. In her Microprocessor Forum presentation, MIPS Engineering Director Vidya Rajagopalan showed the latest data for a 74Kc processor core synthesized for TSMC's 65nm GP process, using TSMC's standard-cell library and low-Vt transistors. [June 4, 2007]

  • Figure 1: MIPS 74K vs. MIPS 24K performance.

  • Table 1: Estimated MIPS 74Kc power consumption and performance.


Editorial: Unchained Melodies

Amazon.com grabbed headlines this month by announcing that it will sell music downloads unfettered by digital-rights management (DRM). Customers will be allowed to download and listen to the songs anywhere--on personal computers, portable music players, home sound systems, car stereos--and even burn copies on CDs. Amazon's announcement is trumpeted as a breakthrough for the music industry. That's funny. I remember enjoying the same freedom to make copies of music for personal use back in the analog vinyl-and-tape days. Even in the 1980s, when audio CDs introduced the world to digitized music, it was common to make cassette copies for the car and mix-tapes for parties. Amazon's "breakthrough" is more like a restoration of lost rights. [May 29, 2007]

Free link to this editorial: Unchained Melodies


MIPS 74K Goes Superscalar
New 32-Bit Processor Core Has Dual-Issue Out-of-Order Pipelining

It's so old, it's new again. In the 1990s, MIPS Technologies was at the forefront of RISC microprocessor design, introducing speedy workstation/server processors like the R10000 with deep superscalar pipelines and out-of-order execution. Now those features are reappearing in synthesizable embedded-processor cores. At last week's Microprocessor Forum in San Jose, California, MIPS showed that architectural acrobatics are making a comeback. MIPS introduced the MIPS32 74K, a new family of 32-bit synthesizable processor cores for demanding embedded applications. Among other tricks, the 74K uses two-way superscalar superpipelining and out-of-order execution--techniques once dismissed as too complex for lowly embedded processors. [May 29, 2007]

  • Figure 1: MIPS32 74K pipeline diagram.

  • Figure 2: MIPS32 74K processor core block diagram.

  • Figure 3: Preliminary results of DSP benchmark tests on the MIPS 74K, 24KE, and 24K processors.

  • Table 1: New instructions in the MIPS DSP Application-Specific Extension (ASE) Revision 2.

  • Table 2: MIPS 74K performance characteristics after speed-optimized synthesis.

  • Table 3: Feature comparison of the MIPS 74Kc, MIPS 74Kf, MIPS 34K, ARC 750D, ARM Cortex-A8, IBM Power 460S, and Tensilica Xtensa LX2 processor cores.


Making Chips From Thin Air
IBM's New 'Air-Gap' Technology Uses Vacuums for Low-k Dielectrics

Vacuum tubes vanished from computers decades ago, but now vacuums are making a surprising comeback. IBM is introducing a new semiconductor-fabrication technique that creates "air gaps"--actually, tiny vacuum cavities--to replace the conventional insulation around copper wiring in integrated circuits. The preliminary results are even better than with the latest low-k solid dielectrics. Lower-k dielectrics reduce the capacitive coupling between adjacent wires, thereby improving current flow. Circuit designers can leverage lower capacitance to increase the chip's clock frequency, reduce the chip's power consumption, or choose some combination of those improvements. [May 21, 2007]

  • Figure 1: This photograph, taken with an electron-beam microscope, shows the lattice-like atomic structure that emerges after IBM deposits its polymer material on a copper-metal layer.

  • Figure 2: The drawing at the bottom of this figure illustrates the nanoscale holes that allow acids to create gaps in the solid dielectric material. Above are actual photographs of the tiny cavities.

  • Figure 3: This electron-beam micrograph shows an oblique view of the metal layers in a chip fabricated with IBM's new air-gap technology.

  • Figure 4: Additional electron-beam micrographs provide startling closeups of the tiny vacuum cavities.
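
The link between a lower dielectric constant (k) and lower wire capacitance can be sketched with the textbook parallel-plate model, C = k·ε0·A/d. The Python below is a rough illustration only, not IBM's method. It ignores fringing fields, and the comparison value of 3.9 (roughly that of silicon dioxide) and the wire dimensions are assumptions chosen for the example. A real air-gap structure replaces only part of the insulation, so its effective k would be higher than the ideal vacuum value of 1.0.

```python
EPSILON_0 = 8.854e-12  # permittivity of free space, in farads per meter


def wire_capacitance(k, area_m2, spacing_m):
    """Parallel-plate estimate of capacitance between two adjacent wires."""
    return k * EPSILON_0 * area_m2 / spacing_m


def capacitance_reduction(k_old, k_new):
    """Fractional drop in capacitance when the dielectric constant changes.

    Geometry cancels out in the ratio, so only the two k values matter.
    """
    return 1.0 - k_new / k_old


if __name__ == "__main__":
    # Assumed example: silicon-dioxide-like insulation (k ~ 3.9) vs. ideal vacuum (k = 1.0)
    print(f"{capacitance_reduction(3.9, 1.0):.1%} lower capacitance")
```

Because both switching power and wire delay scale with capacitance in this simple model, a designer can spend the reduction on higher clock frequency, lower power, or some of each.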


Preview: Microprocessor Forum 2007
Intel Headlines Conference on Multicore, Video, Graphics, and Low Power

With three keynote addresses, 20 technical presentations, and a full-day seminar on power efficiency--plus our traditional Tuesday-evening expo and party--Microprocessor Forum will celebrate its 19th anniversary this year. Dozens of companies are participating as presenters or sponsors. This event will be the only Microprocessor Forum in the U.S. in 2007, moving from its usual time in the fall to May 21-23 in San Jose, California. The only other scheduled forum is Microprocessor Forum Japan, on June 19-20 in Tokyo. [May 14, 2007]

  • Photo 1: At last year's Microprocessor Forum, Intel's Dileep Bhandarkar delivered a well-received presentation on future power-management technology. Microprocessor Forum is the longest-running independent technical conference on all aspects of microprocessors.

  • Photo 2: A Wi-Fi network allows conference attendees to download the latest versions of technical presentations and other materials. In-Stat will also make the materials available on USB flash drives.

  • Photo 3: The traditional Tuesday-night Expo and Demo Showcase gives attendees a chance to huddle with industry celebrities while enjoying food and drink.

Free link to this article: Preview: Microprocessor Forum 2007


Editorial: The Dread of Threads

Multicore processors are leading the computer industry into uncharted territory. There might be entire minefields of hidden software bugs we haven't considered before. Two papers I've read on this subject are disturbing, especially because they warn that we have few alternatives. One paper is "The Problem With Threads," by Dr. Edward A. Lee, chairman of electrical engineering at the University of California at Berkeley. The other paper, also authored at that university, is "The Landscape of Parallel Computing Research: A View From Berkeley," by 11 experts on microprocessor architecture. It asserts that the only path toward significantly faster CPUs is chip multiprocessing, regardless of any consequential problems with threads. [April 30, 2007]

Free link to this editorial: The Dread of Threads


Embedded Systems Conference Highlights
News From the ESC Exhibition Floor and Meeting Rooms in San Jose

MIPS Technologies has negotiated a landmark licensing deal with STMicroelectronics that appears to resolve a long-running dispute with China over derivatives of the MIPS architecture...The Power.org consortium has formed technical subcommittees to resolve differences among Power Architecture microprocessors and processor cores...ARC International announced a surprising acquisition of Teja Technologies, and we suspect there's more to this deal than ARC disclosed in its press release...NXP Semiconductors (formerly Philips Semiconductors) showed some fascinating preliminary results of tests with the power-consumption benchmarks that EEMBC introduced last year...Innovasic Semiconductor, which specializes in satisfying demand for chips discontinued by other companies, wants to clone the Intel 386 processor, which Intel recently dropped from its product catalog. [April 23, 2007]

  • Figure 1: ARC VRaptor block diagram.

  • Figure 2: NXP Semiconductors measured power consumption of its LPC3180 microcontroller using EEMBC's automotive benchmark suite and EnergyBench.

  • Figure 3: LPC3180 energy consumption per floating-point loop iteration, in microjoules.


Freescale Licenses Power Cores
Power Architecture e200 Processor Cores Available for IP Licensing

For the first time, Freescale Semiconductor is making some of its Power Architecture embedded-processor cores generally available as licensable intellectual property (IP). Until now, only IBM has broadly licensed Power cores to chip developers. Freescale's move strengthens the Power Architecture as an alternative to widely licensed embedded-processor cores from ARM and others. The first Freescale cores released for licensing are four members of the 32-bit Power e200 family. All are fully synthesizable and portable to virtually any digital IC process. [April 2, 2007]

  • Figure 1: Power e200z6 block diagram.

  • Table 1: Feature comparison of Freescale's licensable Power e200z0, e200z1, e200z3, and e200z6 embedded-processor cores.

  • Table 2: Feature comparison of six 32-bit Power Architecture embedded-processor cores available for licensing: the Freescale Power e200z6, IBM Power 460S, IBM Power 464-H90, IBM Power 464FP-H90, IBM Power 440, and IBM Power 405.

  • Sidebar: Freescale Outsources Licensing to IPextreme


ARM Blesses FPGAs
New Cortex-M1 Processor Core Is Optimized for FPGA Integration

In a radical departure from past policy, ARM will allow licensees to synthesize some of its embedded-processor cores in FPGAs and is optimizing these cores for programmable-logic fabrics. Until now, with one exception, ARM has permitted licensees to synthesize ARM processors in FPGAs for development purposes only, not for product deployment. At the same time, ARM is announcing its first synthesizable processor core specially designed for FPGAs: the Cortex-M1. ARM says additional FPGA-optimized cores will follow. [March 19, 2007]

  • Table 1: ARM Cortex-M1 instruction set.

  • Table 2: ARM Cortex-M1 configuration options.

  • Table 3: Feature comparison of seven 32-bit embedded-processor cores licensable for FPGA deployment: the ARM Cortex-M1, Altera Nios II/f, Altera Nios II/s, Altera Nios II/e, Gaisler Research LEON3, Xilinx MicroBlaze v5.00, and Xilinx MicroBlaze v4.00.


Editorial: MPR Analysts' Choice Awards

We are pleased to announce all the winners of our annual Microprocessor Report Analysts' Choice Awards for 2006. We have recognized seven companies: ARM, Ambric, Eutecus, Freescale Semiconductor, Handshake Solutions, Intel, and Planet82. One award was shared by two companies, ARM and Handshake Solutions. ARM and Intel each won two awards. Also in this month's editorial: a follow-up to our December 2006 editorial against paperless electronic voting. [February 26, 2007]

Free link to this editorial: MPR Analysts' Choice Awards


MPR Innovation Award: Eutecus
Superfast Sensor-Processors Break New Ground in Digital Imaging

Microprocessor Report is presenting an MPR Analysts' Choice Award in the Innovation category to Eutecus, Inc., for designing a digital-imaging sensor-processor architecture that can capture and analyze up to 100,000 frames per second. The company's Cellular Visual Technology combines a massively parallel processor architecture with optimized image-processing software. Some implementations use an innovative semiconductor fabrication process to bond the image sensor directly onto the parallel-processor array, creating a multilayer chip. [February 26, 2007]

  • Figure 1: An innovative semiconductor-fabrication process distributes thousands of indium bumps over the surfaces of the image-sensor and processor dies, bonding them together.


MPR Analysts' Choice Awards
Five Companies Make Our First Group of Winners for 2006

This week we're announcing the first group of our annual Microprocessor Report Analysts' Choice Awards. Next week we'll announce the final group of winners. For each award, we are publishing a brief article about the winning product or technology and the reasons for our choice. Five companies are in the winner's circle this week: Ambric, ARM, Freescale Semiconductor, Handshake Solutions, and Intel. We're actually handing out four awards, because two of those companies (ARM and Handshake Solutions) share an award. [February 20, 2007]

  • Figure 1: All winners of MPR Analysts' Choice Awards will receive a wall plaque that displays a reproduction of the MPR article announcing the award.


MPR Innovation Award: Ambric
Ambric Fits New CPU Architecture to Parallel Programming Model

Microprocessor Report is presenting an MPR Analysts' Choice Award in the Innovation category to Ambric, an Oregon-based fabless semiconductor company founded in 2003. Bucking the usual trend, Ambric designed a new microprocessor architecture by first creating an innovative programming model, then fashioning an architecture capable of efficiently executing it. Ambric's Am2045 massively parallel processor crams 360 proprietary 32-bit RISC processors and 585KB of SRAM onto a single compact die. Maximum theoretical performance exceeds one trillion operations per second at 333MHz. [February 20, 2007]

  • Figure 1: Partial Am2045 block diagram.


MPR Innovation Award: ARM996HS
ARM and Handshake Solutions Debut Clockless Processor Core

Microprocessor Report is presenting an MPR Analysts' Choice Award in the Innovation category to ARM and Handshake Solutions for the ARM996HS, the first commercially available 32-bit microprocessor core implemented in asynchronous (clockless) logic. ARM introduced the ground-breaking processor in early 2006. ARM's development partner was Netherlands-based Handshake Solutions, which worked closely with ARM to bring the unconventional technology to market. [February 20, 2007]

  • Figure 1: Chart showing the much lower peak currents of the ARM996HS processor, which reduce electromagnetic emissions.


Faster Than a Blink
Parallel Processors and Bonded Sensors Enable Ultrafast Imaging

If a picture is worth a thousand words, what are 100,000 pictures per second worth? Plenty, to anyone who can design a digital-imaging system capable of achieving such spectacular frame rates. Applications include robotic vision, intelligent video surveillance, scientific analysis of momentary events, monitoring industrial processes, interactive games, and guidance systems for unmanned vehicles and missiles. With grants from the U.S. Missile Defense Agency and the Office of Naval Research, scientists from Hungary, Spain, and the U.S. founded Eutecus Inc. and developed Cellular Visual Technology. CVT combines a massively parallel processor architecture with optimized image-processing software. Some implementations use an innovative semiconductor fabrication process to bond the image sensor directly onto the parallel-processor array, creating a stacked multilayer chip. [February 12, 2007]

  • Figure 1: Photo of Eutecus C-TON chip.

  • Figure 2: Using an innovative manufacturing technique called 3D bump-bonding, Eutecus grafts an image-sensor die onto another die containing the massively parallel processor array.

  • Figure 3: C-TON block diagram.

  • Figure 4: At the abstract level, a Eutecus sensor-processor chip resembles a multilayer cake, with arrays of different components in each layer.

  • Figure 5: The array processors can adjust the intensity of individual pixels, improving the photographic quality of the image.

  • Figure 6: C-TON layout photo.

  • Figure 7: Illustration of the saccadic jumps of human vision.

  • Figure 8: Eutecus CVT technology allows developers to mimic the saccadic jumps of human vision by applying the processor array to different parts of a high-resolution image.

  • Table 1: Performance measurements of basic image-processing tasks on a 100MHz C-TON processor.
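
A quick back-of-the-envelope calculation shows how tight the time budget is at 100,000 frames per second. The Python sketch below pairs that frame rate with the 100MHz clock mentioned in the Table 1 caption. That pairing is an assumption made for illustration, and the function names are hypothetical. The abstract does not say that a 100MHz C-TON sustains the peak frame rate.

```python
def frame_period_us(frames_per_second):
    """Microseconds available to capture and process one frame."""
    return 1_000_000 / frames_per_second


def cycles_per_frame(clock_hz, frames_per_second):
    """Clock cycles each processor in the array gets per frame."""
    return clock_hz / frames_per_second


if __name__ == "__main__":
    fps = 100_000          # peak frame rate quoted in the abstract
    clock_hz = 100e6       # assumed: the 100MHz clock from the Table 1 caption
    print(frame_period_us(fps), "microseconds per frame")
    print(cycles_per_frame(clock_hz, fps), "cycles per frame")
```

With so few cycles per frame under these assumptions, the work has to be spread across a massively parallel array sitting next to the sensor rather than funneled through a single sequential processor.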


Editorial: Undo Electronic Voting

Electronic voting machines are a classic example of botching a high-tech solution to a low-tech problem, thereby creating a new high-tech problem. It might be amusing if anything less than our democracy were at stake. U.S. election authorities are rushing into electronic voting without due diligence, without carefully considering the consequences, and without sufficient input from technical experts. Indeed, the situation is so appalling that I suspect almost any reader of Microprocessor Report could design better hardware and software than we have now. We don't really need electronic voting machines, but if we're forced to use them, let's at least do it right. [December 26, 2006]

Free link to this editorial: Undo Electronic Voting


The Intel 4004's 35th Anniversary
Engineers Celebrate the World's First Commercial Microprocessor

On November 15, 1971, Intel introduced the world's first standard-part microprocessor, the 4004. It was a four-bit CPU with 2,250 transistors, and it ran at a clock speed of 740kHz. Intel manufactured the chip in a 10-micron PMOS process on two-inch silicon wafers and furnished the device in a 16-pin ceramic dual-in-line package. To celebrate this historic chip, Microprocessor Report covered the anniversary event at the Computer History Museum in Silicon Valley, which reunited codesigners Ted Hoff and Federico Faggin. Our coverage includes transcripts of their presentations, their responses during an audience question-and-answer session, our own technical analysis of the 4004, a block diagram of the processor, our newly reconstructed instruction-set table, and an analysis of how the 4004 transformed the computer industry. [December 18, 2006]

  • Figure 1: Photo of Ted Hoff, coinventor of the Intel 4004, speaking at the 35th anniversary event at the Computer History Museum in Silicon Valley.

  • Figure 2: Intel's first advertisement for the 4004 in late 1971.

  • Figure 3: Photo of an original 4004 on a SIM-402 single-board development system that Intel sold to customers in the 1970s.

  • Figure 4: Photo of Federico Faggin describing the 4004's layout during the 35th anniversary event at the Computer History Museum.

  • Figure 5: Die photo of the 4004 with Federico Faggin's etched initials visible in the lower-right corner.

  • Figure 6: Photo of the desktop calculator that Busicom built with the 4004 and other 4000-family chips.

  • Figure 7: Intel's most widely reproduced photograph of the 4004 makes it appear the package has a cap made of wood, but it's actually a ceramic-and-gold package.

  • Sidebar: Analyzing the Intel 4004

    • Figure: Microprocessor Report redrew this 4004 block diagram from vintage documentation, making minor modifications.

    • Table: Microprocessor Report reconstructed this 4004 instruction-set table by studying vintage documentation.

  • Sidebar: How Microprocessors Upset the Computer Industry (by Don Alpert, Microprocessor Report Editorial Board)


Tensilica Upgrades Xtensa Cores
New Xtensa 7 and Xtensa LX2 Processors Get ECC and More

Fending off ARM's latest punches, Tensilica is introducing two new versions of its 32-bit configurable-processor cores. The biggest improvements are error-correction codes (ECC) to protect caches and local memories, an optional memory-management unit (MMU) for both processors, and several new configuration options that can boost performance, save gates, and reduce power. The enhanced processors are the Xtensa 7 and Xtensa LX2. [December 4, 2006]

  • Figure 1: Xtensa LX2 block diagram. Xtensa 7 has a similar microarchitecture.

  • Figure 2: The Xtensa LX2 offers more I/O options than Xtensa 7 does.

  • Table 1: Feature comparison of the new Tensilica Xtensa 7, existing Xtensa 6, new Xtensa LX2, and existing Xtensa LX processor cores.


Power.org's United Roadmap
Power Architecture Consortium Hints at Future Processors and Cores

Until now, forecasting the future of the Power Architecture (formerly PowerPC) required assembling a mosaic of individual roadmaps from different companies--some of which didn't even disclose roadmaps. That situation changed a few weeks ago, when the Power.org consortium released its first microprocessor roadmap consolidating the future plans of member companies. [November 27, 2006]

  • Figure 1: These three roadmaps plot the future of 64-bit Power Architecture microprocessors, processor cores, and hybrid architectures.

  • Figure 2: Power.org's roadmap for 32-bit chips.

  • Figure 3: The Power.org roadmap for 32-bit processor cores has few surprises but does indicate that IBM intends to broaden its new Power 46x line.

  • Figure 4: The Power.org roadmap for 32-bit hybrid/accelerated architectures.

  • Sidebar: IBM's New Licensable Power Cores

    • Table 1: Feature comparison of the IBM Power 460S, Power 464-H90, Power 464FP-H90, Power 440, and Power 405 licensable processor cores.


Xilinx Revs Up MicroBlaze
Licensable Soft-Processor Core for FPGAs Gets Faster and Smaller

Small improvements add up. At last month's Fall Microprocessor Forum, Xilinx unveiled an enhanced version of its licensable 32-bit processor core for FPGAs. Optimized for synthesis in next-generation Virtex-5 programmable-logic devices, the new MicroBlaze v5.00 processor uses deeper pipelining and higher clock speeds to boost integer performance by as much as 25% and floating-point performance by as much as 50% over the existing MicroBlaze v4.00 core. [November 13, 2006]

  • Figure 1: Comparison of the MicroBlaze v5.00 five-stage pipeline with the MicroBlaze v4.00 (and earlier) three-stage pipeline.

  • Figure 2: An example screen shot of the Xilinx Embedded Development Kit processor-configuration tool.

  • Table 1: Comparison of synthesizing a typical MicroBlaze v5.00 configuration in Virtex-5 and Virtex-4 FPGAs.

  • Table 2: Feature comparison of the Xilinx MicroBlaze v5.00, MicroBlaze v4.00, Altera Nios II/f, Nios II/s, and Nios II/e.


ARM Thumbs a Ride
New Cortex-R4F Processor Adds FPU and ECC for Automotive Market

On average, there are 1.3 ARM processor cores per cellphone. And these days, it seems as if half the motorists on the road are yapping on their cellphones while driving. So, in a way, ARM already has a strong presence in the automotive market--though not exactly in the way the company desires. ARM wants to see more of its processors built into automobiles, not merely used in automobiles. Today, ARM's automotive design wins are based on older cores, such as the ARM7TDMI and ARM966. Newer designs need more processing power. So ARM has announced the Cortex-R4F specifically for the automotive market. [October 30, 2006]

  • Figure 1: Six-year forecast of semiconductors in automotive systems.

  • Figure 2: For efficiency, the Cortex-R4F integrates the FPU pipeline with the existing eight-stage integer pipeline.

  • Figure 3: Partial superscalar pipelining allows the Cortex-R4F to dual-issue some pairs of integer and floating-point instructions.

  • Figure 4: At synthesis time, developers can choose the granularity of error detection and correction in the Cortex-R4F.

  • Table 1: This table shows the number of clock cycles required to perform some common single-precision floating-point operations on the new ARM Cortex-R4F, ARM's older VFP11 FPU, Freescale's Power e200 processor core, and Infineon's TriCore 1.3 processor core.

  • Table 2: Small differences in configurations and synthesis scripts can have a great effect on the size and speed of the ARM Cortex-R4F, even in the same fabrication process.

  • Table 3: Feature comparison of the ARM Cortex-R4F, ARM Cortex-R4, ARC 625D, ARC 750D, MIPS Technologies MIPS32 24Kf, and Tensilica Xtensa 6.


Editorial: Microprocessor Confusion

Relatively few people in the world know much about microprocessors—what they are, what they do, how they work. This ignorance may seem harmless. Merely learning how to use an electronic device is challenging enough. Why should ordinary folks get bogged down in low-level technical details that couldn't possibly matter to them? Unfortunately, as microprocessors become ubiquitous, knowing something about them is becoming not only desirable but necessary. Those who are familiar with microprocessors--including everyone who writes for this newsletter and everyone who reads it--should help educate the general public about an important technology that can seem as mysterious as string theory. [October 30, 2006]

Free link to this editorial: Microprocessor Confusion


Intel Goes Quad
Quad-Core Processors and 65nm Volume Shipments Beat AMD

Intel isn't out of the dark yet, but there's light at the end of the tunnel. And no, that glow isn't the laser beam of Intel's recent experiments with silicon photonics, which is a long-term beacon. Intel needs immediate results. Wisely, the company is returning to its traditional strengths: x86 processors manufactured with the world's best high-volume fabrication technology. It's a combination that competitors have found unbeatable. On September 26, Intel announced that quad-core server and desktop processors will begin shipping in November. Both product lines are months ahead of previously disclosed schedules. [October 16, 2006]

  • Figure 1: Photo showing how Intel's first quad-core x86 processors package two dual-core dies in a multichip module (MCM), which Intel calls a multichip package (MCP).


Ambric's New Parallel Processor
Globally Asynchronous Architecture Eases Parallel Programming

At Fall Microprocessor Forum in San Jose, California, Ambric introduced the Am2045 massively parallel processor and architecture. This 117-million-transistor chip, fabricated in a modest 0.13-micron CMOS process, crams 360 proprietary 32-bit RISC processors and 585KB of SRAM onto a single compact die. Maximum theoretical performance exceeds one trillion operations per second (TOPS) at 333MHz. The Am2045 is designed to replace high-end embedded processors, DSPs, and FPGAs in applications that require fast general-purpose integer and digital-signal processing. [October 10, 2006]

Free link to this article in Adobe PDF format: Ambric's New Parallel Processor

  • Figure 1: This conceptual diagram shows how software objects (essentially, subroutines or groups of related subroutines) run on multiple processor cores in Ambric's massively parallel array.

  • Figure 2: Local channels that connect neighboring processor cores also synchronize the cores in Ambric's massively parallel architecture.

  • Figure 3: Ambric's proprietary software-development tools are based on the Eclipse integrated development environment (IDE).

  • Figure 4: A block of aStruct code, Ambric's textual source code for creating parallel structures of objects.

  • Figure 5: This Java source code defines a class named PrimeGen. Objects instantiated from this class can test candidate integers to determine which are true prime numbers.

  • Figure 6: Streaming RISC (SR) processor block diagram.

  • Figure 7: Streaming RISC with DSP (SRD) processor block diagram.

  • Figure 8: This block diagram shows a basic cluster of four processor cores, four local memories, and their associated interconnects and control structures.

  • Figure 9: This block diagram shows one complete bric in the center, surrounded by parts of eight adjacent brics.

  • Table 1: Benchmark results comparing the Am2045 with a high-end Texas Instruments DSP and a Xilinx Virtex-4 FPGA.
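
The headline figures for the Am2045 (360 processors, 333MHz, more than one trillion operations per second) imply that each processor must be credited with several operations per clock cycle. The Python sketch below works out that implied figure. It is simple arithmetic on the numbers quoted above and does not describe how Ambric actually counts operations, which the abstract does not state. The function name is hypothetical.

```python
def implied_ops_per_cycle(total_ops_per_second, processors, clock_hz):
    """Operations each processor must complete per cycle to hit the total."""
    processor_cycles_per_second = processors * clock_hz
    return total_ops_per_second / processor_cycles_per_second


if __name__ == "__main__":
    ops = implied_ops_per_cycle(
        total_ops_per_second=1e12,   # "one trillion operations per second"
        processors=360,              # processors on the Am2045
        clock_hz=333e6,              # 333MHz clock
    )
    print(f"about {ops:.2f} operations per processor per cycle")
```

The result is a little over eight, which suggests the peak figure counts fine-grained operations rather than one instruction per processor per cycle.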


Number Crunching With GPUs
PeakStream's Math API Exploits Parallelism in Graphics Processors

There are dozens, if not hundreds, of microprocessor architectures in the world. And Microprocessor Report covers new ones every year. With such abundance, it might seem daffy to use highly specialized 3D-graphics coprocessors for general-purpose number crunching. But the computational allure of GPUs is proving irresistible to the scientific community, chemical engineers, defense contractors, Wall Street financiers, and other heavy-duty math junkies. PeakStream, a Silicon Valley startup, has introduced new software and development tools that make GPUs relatively easy to program for data-intensive applications. [October 2, 2006]

  • Figure 1: The PeakStream Platform includes a special virtual machine and a just-in-time (JIT) compiler. Programmers write application code in C++ and compile to a standard x86 binary, which embeds the function calls to PeakStream's math libraries.

  • Figure 2: Two examples of C++ source code. Example A is typical sequential code without using any special function libraries. Example B uses PeakStream's math library.

  • Table 1: Complete list of function calls in PeakStream's application programming interface (API).


Editorial: Intel's Comeuppance

To be fair, nobody should gloat over Intel's recent troubles. At one time or another, we've all been there, right? But let's be realistic. Many folks throughout the industry are not-so-secretly enjoying Intel's upheavals. They aren't trying very hard to hide their smirks and water-cooler jokes. It's the season of Intel's comeuppance, and it's been coming for a long, long time. But watch out -- Intel has largely corrected its course and is now introducing some impressive new microprocessors. If anything, I expect Intel will be an even tougher competitor in the years to come. [September 25, 2006]

Free link to this editorial: Intel's Comeuppance


Preview: Fall Microprocessor Forum
Advances in Power Efficiency Is Theme of 18th Annual Fall Conference

Our theme at In-Stat's Fall Microprocessor Forum is "Advances in Power Efficiency -- Addressing the Global Challenge." All developers face the same problems, whether their design uses a tiny automotive microcontroller or a mighty supercomputer processor. Surprisingly, the solutions are largely the same, too, across the design spectrum. MPF will be held on October 9-11 at the Doubletree Hotel in San Jose, California. It will be our 18th annual fall conference, and it also marks In-Stat's 25th anniversary as a leading industry-analyst firm. To celebrate, we are reviving the famous MPF chip portfolio (every paid conference attendee gets a notebook with real microprocessor chips embedded in the cover), and we have arranged a stellar lineup of presenters. [August 28, 2006]

  • Photo 1: A view of Spring Processor Forum last May.

  • Photo 2: In-Stat will provide power outlets for notebook computers and a wireless network with access to forum presentations. Presentations will also be available on USB flash drives.

  • Photo 3: The traditional Tuesday night expo and party is an opportunity to mingle with exhibitors and fellow attendees. The food and drinks are pretty good, too.

  • Sidebar: Microprocessor Forums in Japan and Europe


The New Power Architecture
Freescale and IBM Work Together and Begin Revamping PowerPC

After years of following different paths, the two key founders of the PowerPC architecture have renewed their historic collaboration. Working closely together again--now within the Power.org industry consortium--Freescale (the former semiconductor division of Motorola) and IBM are uniting their visions for the 15-year-old microprocessor architecture. Power.org has announced a new architectural definition that brings together features from both Freescale and IBM and lays the groundwork for future convergence. For the first time, all the documentation will be consolidated in a common format. And hereafter, the common architecture will be called the Power Architecture. "PowerPC" is relegated to existing products and historical references. [August 21, 2006]

  • Figure 1: This new logo symbolizes the unified Power Architecture.

  • Figure 2: This timeline shows the evolution of the PowerPC/Power Architecture since 1991.

  • Figure 3: Power ISA 2.03 merges features from the Freescale and IBM modifications to the original PowerPC architecture, now known as PowerPC Classic.

  • Figure 4: Power ISA 2.03 defines three privilege levels for software execution, but one level is optional.

  • Figure 5: Power ISA 2.03 register files.

  • Table 1: The PowerPC architecture is defined in a collection of books dating back to the original definition in 1991. Each book describes a different aspect of the architecture and has been revised over the past 15 years.

  • Table 2: The PowerPC extension packages introduced by Motorola and Freescale are called auxiliary processing units (APUs). Power ISA 2.03 makes the APUs official by renaming them "categories" and merging them into the new definition of the Power Architecture.

  • Table 3: Comparison of memory-management features in PowerPC Classic 1.10 and Power ISA 2.03.


Editorial: Intel's Embedded Future

Only two weeks after AMD announced the sale of its Alchemy business unit to Raza Microelectronics (RMI), Intel announced that it's selling most of its XScale business unit to Marvell Technology Group. Both PC-processor giants are divesting embedded-processor businesses in the same month. What's going on? The obvious explanation is that AMD and Intel are refocusing on their core business--x86 processors for PCs. Certainly, both companies need to pay more attention to their foundations. But what makes sense for AMD doesn't necessarily make the same sense for Intel. [July 31, 2006]

Free link to this editorial: Intel's Embedded Future


MathStar Challenges FPGAs
New Reconfigurable-Logic Chips Have Massively Parallel Arrays

MathStar calls its device architecture a field-programmable object array (FPOA). It consists of SRAM-based programmable logic, much like a conventional FPGA, but it's programmable at a higher level of abstraction. Instead of tinkering with gate arrays, designers work with a massively parallel array of preconfigured function units. Most of these units are identical ALUs or multiply-accumulate (MAC) units that can run autonomously. Others are register files shared by the ALUs and MACs. The first FPOA device has 400 of these 16-bit units woven together in a tightly coupled interconnect fabric. [July 24, 2006]

  • Figure 1: Initially, MathStar has created three types of Silicon Objects: 16-bit ALUs, 16-bit multiply-accumulate (MAC) units, and 64-entry register files.

  • Figure 2: MathStar MOA1400D block diagram.

  • Figure 3: Diagrams of MathStar's on-chip interconnect fabric.

  • Figure 4: MathStar's development flow for FPOAs differs in important respects from development for an FPGA.

  • Figure 5: Block diagram of a multistream MPEG-2 decoder mapped onto the array of MathStar's 1.0GHz MOA1400D processor.


China's Microprocessor Dilemma
China Needs Affordable Computers, but Which CPU Architecture?

During a recent visit to China, Microprocessor Report learned that the country's leaders face a difficult technology decision: Which microprocessor architecture should they support in a coming wave of low-cost personal computers designed for the Chinese domestic market? The most obvious answer--the x86 architecture, already the world standard and the only platform running Microsoft Windows--isn't necessarily the best answer for China. This decision could significantly affect the direction of China's future economic growth. It's tied to seemingly unrelated issues, such as China's ambition for technology independence, a widening gap between rich and poor that threatens social stability, and mounting problems with urban sprawl and environmental pollution. [June 26, 2006]

  • Photo 1: Worsening pollution in cities like Shanghai is making the Chinese question whether an economy based on heavy industry can support the kind of progress the country needs to make.

  • Photo 2: In the former rural district of Pudong, across the river from central Shanghai, China has constructed a clone of Silicon Valley--complete with office parks, tree-lined boulevards, freeways, and exhibition halls.

  • Photo 3: Weiwu Hu in his lab at the Institute for Computing Technology, part of the Chinese Academy of Sciences in Beijing.

  • Photo 4: This is one prototype design of the $150 computer designed by the nonprofit One Laptop Per Child organization.

  • Photo 5: This photo shows the Municator computer directly in front of a large video monitor, which displays the simplified GUI for launching application programs.


Editorial: Alchemy's Third Chance

AMD is selling its Alchemy business unit to Raza Microelectronics (RMI), and we think it makes good sense for both companies--but only if the transfer includes a significant number of the original engineers. Without those alchemists, RMI will struggle to turn lead into gold. [June 26, 2006]

  • Covering China in Microprocessor Report

  • IBM's Blog for Game Developers

Free link to this editorial: Alchemy's Third Chance


LSI Logic Wants Your SoC
Zevio SoC-Design Platform Has New IP for Consumer Electronics

LSI Logic has introduced a new SoC-design platform called Zevio. It consists of hardware IP, software IP, and professional design services for consumer-electronics application processors. Zevio also has emulators and prototyping systems that allow customers to write software in parallel with hardware development. Zevio is compatible with several 32-bit processor cores from ARM and MIPS Technologies, as well as the ZSP family of 16-bit DSP cores. Customers can take the finished chip design to any independent foundry or use one of LSI Logic's affiliated foundries. [June 12, 2006]

  • Figure 1: Chart showing that SoC-development costs are rising fast as fabrication technology moves to geometries below 0.18 micron.

  • Figure 2: LSI Logic created a new SDRAM controller for the Zevio platform that runs twice as fast as the AMBA 2.0 AHB and fetches data in shorter memory bursts, reading only as much data as the AHB master needs.

  • Figure 3: The Zevio SDRAM controller can write data to nonconsecutive memory addresses without rearbitrating the AHB.

  • Figure 4: Block diagram of the geometry engine in the AHB-compatible graphics core for Zevio.

  • Figure 5: Block diagram of LSI Logic's 3D rendering engine for Zevio.

  • Figure 6: Block diagram of LSI Logic's 64-channel audio engine for Zevio.


More Patents for Tensilica
Portfolio Now Includes Ten Patents Related to Configurable Processors

The U.S. Patent and Trademark Office recently issued three new patents to Tensilica for its configurable-processor technology. They follow seven related patents issued from 2002 to 2005. In addition, the patent office has reaffirmed a key Tensilica patent issued in 2002 that was anonymously challenged a year later. As a result, Tensilica now holds an impressive portfolio of at least ten patents on configurable-processor technology. [May 30, 2006]

  • Table 1: List of Tensilica's ten patents explicitly related to configurable-processor technology, including the patent numbers, titles, file dates, and issue dates.


Editorial: Spring Processor Forum...and Help Wanted

If you attended our recent Spring Processor Forum in San Jose, thank you! I hope you're one of the attendees who won our drawing for an Apple iPod after submitting your feedback form. (You did submit a feedback form, right?) If you didn't attend SPF, we hope you'll tell us why and consider attending our Fall Microprocessor Forum in October. [May 30, 2006]

  • Microprocessor Report is looking for a new editor in chief.

Free link to this editorial: Spring Processor Forum...and Help Wanted


ARM Reveals Cortex-R4
Deeply Embedded Processor Core Inches Toward Configurability

At Spring Processor Forum in San Jose, ARM revealed the first member of its Cortex-R family -- the Cortex-R4, a synthesizable 32-bit processor core for deeply embedded applications. With this debut, ARM has now introduced initial members of all three of the new Cortex families announced in 2004. The Cortex-R4 duplicates the relatively high performance and relatively low power consumption of the existing ARM9, ARM10, and ARM11 families while incorporating the latest features of the ARMv7 architecture. [May 16, 2006]

  • Figure 1: Growth projections for the Cortex-R4's target markets: automotive systems, hard-disk controllers, printers, home network gateways, and wireless modems.

  • Figure 2: Cortex-R4 block diagram.

  • Figure 3: Cortex-R4 pipeline diagram. At eight stages, the pipeline is one stage shorter than the ARM1156T2-S pipeline.

  • Figure 4: Cortex-R4 synthesized layout. Excluding memories, the core size ranges from 180,000 to 220,000 gates.

  • Figure 5: Looser timing for data transfers allows the Cortex-R4 to use slower, lower-power SRAM arrays for caches and tightly coupled memories.

  • Table 1: The Cortex-R4 takes a small step toward user configurability by offering these options in prewritten scripts for the synthesis compiler.

  • Table 2: ARM suggests configuring the Cortex-R4 in these ways for hard-disk controllers, chassis-level automotive systems, wireless modems, and imaging systems.

  • Table 3: Using the configuration options in Table 1, ARM synthesized three different versions of the Cortex-R4 with Artisan cell libraries, targeting TSMC's generic 130nm and 90nm fabrication processes.

  • Table 4: This comparison of two memory libraries from Artisan demonstrates why looser timing on the Cortex-R4's cache and TCM interfaces is a boon for developers.

  • Table 5: Feature comparison of the ARM Cortex-R4, ARM1156T2-S, ARM946E-S, ARC International ARC 625D, ARC 750D, MIPS Technologies MIPS32 4KE, Tensilica Diamond 212GP, and Tensilica Diamond 570T.


IBM Offers Chip-Level Security
SecureBlue Technology Aims to Make Security Ubiquitous in SoCs

In the digital age, embarrassing security breaches are becoming commonplace. A laptop computer with information about nearly 200,000 current and former Hewlett-Packard employees was stolen from Fidelity Investments. Flash-memory drives containing secret military intelligence were pilfered from a U.S. Army base in Afghanistan and openly sold in street bazaars. And, worst of all, Paris Hilton's cellphone address book was leaked on the Internet. IBM's Technology Collaboration Solutions Unit has an answer: SecureBlue, a new security technology for system-on-chip (SoC) devices. [May 8, 2006]

  • Figure 1: IBM's new SecureBlue technology adapts the security features of this IBM 4758 PCI card into licensable IP for SoCs.

  • Figure 2: SecureBlue block diagram.


Editorial: Microprocessor Forum China

On March 23, In-Stat and Microprocessor Report hosted our first-ever Microprocessor Forum in mainland China. It was a condensed one-day version of the three- or four-day events we've been hosting in Silicon Valley for more than 15 years. To help with logistics, we partnered with IDG China, an offspring of International Data Group, one of the first U.S. companies to establish a publishing business on the mainland. IDG's people worked closely with our Chinese analysts at In-Stat China, based in Beijing. As part of our China experiment, I traveled to Shanghai and Beijing to participate in our forum and meet with Chinese engineers and executives. [April 24, 2006]

Free link to this editorial: Microprocessor Forum China


Power Efficiency at SPF 2006
Preview: Spring Processor Forum's Theme Is Power-Efficient Design

Power consumption is the immovable object that is coercing irresistible forces like Intel, Apple, and IBM to find strategic detours. Searing wattage compelled Intel to abandon its pursuit of high clock frequencies and instead design PC processors with power-efficient cores. The same power-performance trends exerted so much gravity on Steve Jobs's reality-distortion field that Apple has abandoned PowerPC in favor of Intel's newly improved processors. And the very same immovable object persuaded IBM, Sony, and Toshiba to design the Cell Broadband Engine with a relatively simple PowerPC core surrounded by an array of power-efficient coprocessors. If the industry's heavyweights can't displace the immovable object of power consumption, but can only steer around it, what hope is there for the average line engineer designing an SoC? That's why our theme for Spring Processor Forum 2006 is power-efficient design. [April 24, 2006]

[Note: Access to this article doesn't require a subscriber password.]


Teja's FPGA Play
New Tools Build Packet Processors Using ANSI C and FPGAs

If off-the-shelf network processors don't fit the bill, but designing a custom part is too costly or intimidating, Teja Technologies has a fresh alternative: Teja FP (FPGA Platform). It's a package of development tools, software, and hardware intellectual property (IP) that allows software engineers to build a packet processor in an FPGA without using a hardware description language (HDL) or fabricating custom silicon. With Teja FP, programmers can start with existing data-plane code written in ANSI C or write new code in that language. After profiling and analyzing the code, the next step is to partition the application. The most compute-intensive parts can execute in the FPGA's programmable-logic fabric, while other parts can run on soft processor cores synthesized in the fabric. [April 3, 2006]

  • Table 1: Feature comparison of the Xilinx Virtex-4 FX programmable-logic devices with which Teja FP is compatible.

  • Figure 1: A low-cost packet processor designed with Teja FP might use only two or three Xilinx MicroBlaze soft processor cores. Each MicroBlaze core is an engine in the packet pipeline.

  • Figure 2: A high-performance packet processor designed with Teja FP requires several MicroBlaze processor cores. This example has nine cores.

  • Figure 3: Hardware and software development with Teja FP is highly interactive. Using feedback from code profilers and actual execution in the target FPGA, developers can rapidly refine their design.


Tensilica's Preconfigured Cores
Six Embedded-Processor Cores Challenge ARM, ARC, MIPS, and DSPs

Tensilica has introduced six preconfigured versions of its 32-bit processor cores to suit an unusually broad range of embedded applications. Whereas the smallest configuration is suitable for deeply embedded microcontrollers in real-time systems, the largest configuration sets a new record for DSP benchmarks. [March 29, 2006]

  • Table 1: Feature comparison of Tensilica's Diamond Series cores: the 108Mini, 212GP, 232L, 570T, 330HiFi, and 545CK.

  • Table 2: EEMBC benchmark scores for the ARM1026EJ-S, ARM1136JF-S, and Tensilica Diamond 570T.

  • Figure 1: Berkeley Design Technology DSP benchmark scores (BDTIsimMark2000) for the Tensilica Diamond 545CK, Ceva-X 1620, StarCore SC1400, and ARM1136.


Freescale Strengthens Power.org
Reunited Alliance With IBM Plans the Future of the Power Architecture

Freescale Semiconductor's long-awaited decision to join Power.org strengthens the industry alliance and will help chart the course of the Power Architecture -- just in time. Recent moves by ARM, Intel, MIPS Technologies, and Sun Microsystems are strengthening the competition, too. Power.org is an open industry consortium with more than 40 corporate members whose mission is to coordinate the future evolution of the Power Architecture, more commonly known as PowerPC. In 2004, when IBM formed Power.org, the most conspicuous absentee among the 15 founding members was Motorola spinoff Freescale. [March 6, 2006]


MIPS Threads the Needle
MIPS32 34K: The First Licensable Multithreaded Processor Core

Microprocessor architects have explored many paths to high performance, including high clock frequencies, superscalar pipelines, application-specific extensions, very long instruction words (VLIW), and multicore processors. All those techniques and more are available in embedded-processor cores licensed as synthesizable intellectual property. Now MIPS Technologies is adding another option: the first licensable processor cores with hardware-enabled simultaneous multithreading. The new MIPS32 34K family consists of four 32-bit processor cores, all related to the MIPS32 24KE family. The key difference is pipelined multithreading. Instructions from as many as five different tasks can pass through the nine-stage pipeline of a 34K processor at the same time. [February 27, 2006]

  • Figure 1: A graphical representation of simultaneous multithreading (SMT).

  • Figure 2: By duplicating register files and other resources, a MIPS32 34K processor can run two operating systems and up to five thread contexts at the same time.

  • Figure 3: MIPS32 34K pipeline diagram.

  • Figure 4: Benchmarks indicate that the MIPS32 34K processor is 60% faster than a MIPS32 24KE processor when running packet-processing tests.

  • Figure 5: This chart shows the number of gates required for two similarly configured 34K and 24KE processor cores -- the same configurations MIPS used to obtain the benchmark results in Figure 4.

  • Table 1: Feature comparison of the MIPS32 34Kc, 34Kf, 34Kc Pro, and 34Kf Pro.

  • Table 2: MIPS32 34K processors add eight new instructions to the MIPS32 Release 2 instruction-set architecture.

  • Table 3: Feature comparison of the MIPS32 34K, MIPS32 24KE, ARC 700, ARM Cortex-A8, Tensilica Xtensa 6, and Tensilica Xtensa LX processor cores.


Can ARM Beat the Clock?
ARM Ships the First Licensable, Clockless 32-Bit Microprocessor Core

ARM has finally delivered the ARM996HS, the first commercially available 32-bit microprocessor core implemented in asynchronous (clockless) logic. ARM's development partner is Netherlands-based Handshake Solutions, which helped bring the unconventional technology to fruition. If the ARM996HS succeeds, it could spark a revolution in power-efficient processing that researchers envisioned even before microprocessors were invented. But the project still has risks. Several previous attempts to introduce a clockless 32-bit microprocessor have failed, and the ARM996HS remains unproved in silicon. [February 21, 2006]

  • Figure 1: Power-consumption comparison of the ARM996HS and ARM968E-S.

  • Figure 2: ARM996HS block diagram.

  • Figure 3: The ARM996HS has a memory-protection unit that can segregate different regions of memory.

  • Figure 4: Peak-current comparison of the ARM996HS and ARM968E-S.

  • Table 1: Feature comparison of the ARM996HS, ARM968E-S, ARM966E-S, ARM946E-S, ARM926EJ-S, and ARM922T.


Cavium Expands Octeon Family
Single- and Dual-Core Chips Supplement High-End Network Processors

Cavium Networks is expanding its family of Octeon network/communications processors with chips that have one or two MIPS64 processor cores, instead of as many as 16 cores found in higher-end members of the family. But the new parts aren't simply chopped-down layouts. Their features, performance, power consumption, and prices vary according to their target applications, and they introduce some entirely new Octeon features, such as USB 2.0 and voice-over-IP (VoIP) interfaces. In all, Cavium has announced 10 new Octeon chips scheduled for sampling in 1Q06 and 2Q06. [February 6, 2006]

  • Table 1: Differences among the Octeon CP, SCP, EXP, and NSP series.

  • Table 2: Feature comparison of the Cavium Octeon CN3005 CP, CN3005 SCP, CN3010 CP, CN3010 SCP, CN3110 CP, CN3110 SCP, CN3110 NSP, CN3120 CP, CN3120 SCP, and CN3120 NSP.

  • Figure 1: System block diagram of an Octeon-based 802.11n broadband wireless gateway with support for VoIP phones.


Cell Processor Isn't Just for Games
Innovative Chip Is Best High-Performance Embedded Processor of 2005

Deciding on our MPR Analysts' Choice Award for Best High-Performance Embedded Processor of 2005 wasn't easy. We evaluated several strong candidates before picking our winner: the Cell Broadband Engine, jointly designed by the STI alliance of Sony, Toshiba, and IBM Microelectronics. The Cell BE is destined for Sony's next-generation home videogame console, the PlayStation 3, scheduled for release later this year. [January 30, 2006]


Cortex-A8 Balances Power, Performance
ARM's Fastest Processor Wins Award for Best Processor-IP Core of 2005

Years from now, the industry may remember 2005 as the pivotal year when ARM began extending its reach from low power to high performance. In any event, we believe ARM's fastest processor to date -- the Cortex-A8 -- deserves our MPR Analysts' Choice Award for Best Processor-IP Core of 2005. The Cortex-A8 is ARM's first superscalar processor core, and it's the first ARM processor capable of attaining clock frequencies in the gigahertz range. It's the biggest departure in processor design for ARM since the company was founded in 1990. [January 30, 2006]


Embedded Processors Thrive in 2005
Radical Multicore Chips and Innovative Startup Companies Proliferate

This article contains our analysis of embedded-processor events in 2005 and speculation about what's to come in 2006 and beyond. We identify five broad trends in embedded processors. None of these trends actually started last year, but they gained momentum in 2005 and will be major forces in the future. For a concise summary of last year's developments, with links to related MPR articles, see the sidebar, "Embedded-Processor Highlights of 2005." [January 30, 2006]

  • Figure 1: Block diagram of the Chinese Godson-2 microprocessor.

  • Figure 2: Mercury Computer Systems introduced its Cell Technology Evaluation System in January 2006.

  • Figure 3: Block diagram of Actel's Fusion FPGAs.

  • Figure 4: Die photo of the triple-core processor that IBM designed for Microsoft's Xbox 360 videogame console.

  • Figure 5: Block diagram of ARM's Cortex-A8 superscalar processor core.

  • Figure 6: Die photo of the first eight-core XLR network processor from Raza Microelectronics.

  • Sidebar: Embedded-Processor Highlights of 2005


Massively Parallel Digital Video
Fabless-Semi Startup Connex Reveals New Processor Architecture

Three things in life seem certain: death, taxes, and new microprocessor architectures. Unlike the first two things, new architectures aren't necessarily bad, but they are becoming even more expensive. The latest new microprocessor architecture to emerge is unconventional, massively parallel, and optimized for the narrow domain of high-definition digital video. Although Connex Technology's architecture is applicable to other purposes -- such as pattern-matching filters in security processing -- digital video is the largest potential market offering an opportunity for a profitable return on investment. [January 9, 2006]

  • Figure 1: The Connex integral parallel architecture is based on a massively parallel array of processor cores known as processor elements (PEs). The first commercial chip has 1,024 PEs.

  • Figure 2: PEs are arranged in a two-dimensional array, much like other massively parallel architectures, but Connex simplified the on-chip interconnect fabric by severely limiting the connections among the PEs.

  • Figure 3: The Connex Machine can operate on 1,024 words of data simultaneously. These 16-bit words are arranged in a single-dimensional array or vector.

  • Figure 4: Connex created a proprietary version of ANSI C, known as Connex Programming Language (CPL), that adds new vector datatypes and commands.

  • Sidebar: The Key to Massive Parallelism: Think Small


The Oblique Perspective: Merry Virtual Christmas
Digital Music Is Great, But I Miss Album-Cover Art!

Digital music distribution allows performing artists to circumvent the obstacles of expensive recording studios, greedy record companies, and corporate chain stores. Anyone can make their music available directly to the public. And it's understandable why listeners want their music in a pure digital format that liberates bits from atoms. Eliminating the physical media and packaging strips the music down to its essence: music. However, record-album covers were more than mere packaging. Are we sacrificing something worthwhile by distributing music as digital-audio files without visual artwork? [December 27, 2005]

  • Photo 1: The Association's Birthday album, 1968.

  • Photo 2: The Beatles' Sgt. Pepper's Lonely Hearts Club Band, 1967.


Actel Releases First Fusion Chip
Highly Integrated Mixed-Signal FPGA With Flash Is an Instant SoC

Just because ASIC and system-on-chip (SoC) projects are becoming prohibitively expensive for many developers doesn't mean there's less demand for custom chips. Product differentiation and integration still matter. Hence, the rush toward alternatives to full-custom silicon, such as FPGAs, structured ASICs, and reconfigurable processors. Actel's latest alternative is the Fusion Programmable System Chip (PSC)--a new breed of FPGA that can replace SoCs with a single off-the-shelf do-it-all chip. Fusion FPGAs combine reprogrammable logic with analog and digital peripherals, analog and digital I/O, SRAM, flash memory, and optional soft processor cores (a license-free ARM7TDMI-S or 8051). [December 19, 2005]

  • Figure 1: Die photo of the Actel Fusion AFS600.

  • Figure 2: Screen photo of Actel's CoreConsole, a new soft-IP integration tool for Fusion FPGAs.

  • Figure 3: Screen photo of Actel's SmartGen, a new peripheral-configuration tool for Fusion FPGAs.

  • Table 1: Feature comparison of the first four Fusion chips: the AFS090, AFS250, AFS600, and AFS1500.


Philips TriMedia Goes Mobile
New TM3270 Is the First Low-Power TriMedia Processor Core

In early November, Philips announced the TM3270, the first low-power TriMedia core for mobile applications. Other TriMedia cores deliver high performance but consume too much power for the new wave of portable consumer-electronics products. The TM3270 uses multiple techniques to cut power consumption and has new instructions and other features targeting digital video. It supports all the latest audio/video software codecs and can fully decode D1-resolution H.264 video streams while typically consuming less than 100mW. [December 5, 2005]

  • Table 1: Feature comparison of the Philips TriMedia TM3270, TM3260, and TM5250 media-processor cores.

  • Table 2: TM3270 power consumption while running an MP3 audio decoder on a 384Kb/s bitstream (44.1kHz stereo).

  • Figure 1: A wide range of performance benchmarks comparing three different configurations of the TM3270 and the TM3260.

  • Figure 2: Among the approximately 40 new instructions in the TM3270 is a collapsed-load operation that loads five bytes from memory and performs a linear interpolation before saving the four-byte result in a 32-bit register.

  • Figure 3: Die photo and floor plan of the TM3270 core with a 64KB instruction cache and 128KB data cache.


Tensilica Previews Video Engine
Synthesizable Dual-Core Decoder Is Optimized for Digital Video

At the recent Fall Processor Forum in San Jose, Tensilica previewed a high-performance video-decoder engine based on two Xtensa LX configurable-processor cores. Tensilica is preconfiguring the cores by customizing them with application-specific extensions, adding local memory and other intellectual property (IP), and licensing the whole synthesizable design as a drop-in module for SoCs needing video acceleration. [November 28, 2005]

  • Figure 1: Block diagram of Tensilica's dual-core video-decoder engine. One Xtensa LX processor is configured as a stream processor and the other as a pixel processor.

  • Table 1: Tensilica measured video performance on two simulated video-decoder engines: one using a pair of base-configuration Xtensa LX cores for the stream processor and pixel processor, and another using the same processors optimized with the new video extensions.

  • Sidebar: Tensilica Introduces Xtensa 6 Processor Core

    • Feature and performance comparison of Tensilica's Xtensa V, Xtensa 6, and Xtensa LX configurable-processor cores.


ARC Shows SIMD Extensions
New Instructions With Macros and DMA Extend ARC 700 Processor

Video is the next MP3, and any embedded processor competing for sockets in tomorrow's consumer gadgets must be able to handle digital video and audio processing. At Fall Processor Forum 2005, ARC International's chief architect, Nigel Topham, presented ARC's new SIMD extensions for digital video. These extensions are for the ARC 710D, 725D, and 750D -- three preconfigured cores in the ARC 700 embedded-processor family. ARC will license the SIMD extensions as parts of larger extension packages released later this year. [November 21, 2005]

  • Figure 1: ARC's new SIMD instructions can execute in a closely coupled mode or a decoupled mode, depending on the program's requirements.

  • Figure 2: These two examples show how programmers can write the same code for closely coupled SIMD execution or decoupled SIMD execution.

  • Figure 3: Pipeline diagram of the ARC 750D processor with SIMD extensions.

  • Figure 4: Synthesized floorplan of an ARC 750D processor core with the ARCmedia Subsystem, which includes the new SIMD extensions.

  • Table 1: List of 104 new instructions that ARC's SIMD extensions add to the ARCompact ISA.

  • Table 2: ARC measured the digital-video performance of its new SIMD extensions by synthesizing an ARC 750D processor core in a Virtex-4 FPGA.


Videantis Chases Digital Video
Synthesizable Video Coprocessors Pursue Emerging Applications

Gil Scott-Heron famously said the revolution will not be televised, but now it's looking like television is the revolution. TV is appearing everywhere, it's affecting everyone's lives, everyone is watching it, and it's watching everyone. In other revolutions, heads roll; in this one, heads talk. Into this maelstrom jumps Videantis, a startup based in Hannover, Germany. At Fall Processor Forum 2005, Videantis unveiled two synthesizable video-coprocessor modules based on the same proprietary processor core. Videantis wants to license the modules and optimized software to designers building programmable video chips for high-definition television (HDTV) and mobile consumer electronics. [November 7, 2005]

  • Figure 1: Block diagram of the v-MP2 core.

  • Figure 2: Block diagram of the single-core v-MP2000M video coprocessor module.

  • Figure 3: Block diagram of the triple-core v-MP2000HD video coprocessor module.

  • Table 1: Estimated performance of the v-MP2000M and v-MP2000HD when running popular video codecs.


Z-RAM Shrinks Embedded Memory
Innovative Silicon's Tiny DRAM Cells Alter the Memory Equation

Earlier this year, a Swiss startup, Innovative Silicon, announced a new embedded-memory technology called Z-RAM, so named because each one-transistor bit-cell requires zero capacitors. Z-RAM exploits an inherent electrical effect of silicon-on-insulator technology to temporarily store the bit-cell's binary state. In a technical presentation at Fall Processor Forum 2005, Innovative Silicon explained how Z-RAM works and made a strong argument that it's the logical alternative for embedding memory in future microprocessors. [October 25, 2005]

  • Figure 1: Embedded memory already accounts for more than half the die area of typical microprocessors and SoCs, and it will soon overwhelm the silicon devoted to logic.

  • Figure 2: Conventional embedded DRAM (eDRAM) requires a deep-trench capacitor structure in addition to the transistor for each bit-cell.

  • Figure 3: How Z-RAM works. Innovative Silicon refers to positive charging as "impact ionization" and to negative charging as "hole removal."

  • Figure 4: Detecting the difference between a stored 0 and a stored 1 in a Z-RAM bit-cell is similar to sensing the value of a conventional DRAM bit-cell. At the top is a schematic of the cell.

  • Figure 5: Z-RAM requires only one transistor, like conventional DRAM, but it doesn't need the deep-trench capacitor shown in Figure 2.

  • Figure 6: This micrograph shows a one-transistor Z-RAM FinFET bit-cell fabricated for test purposes.


Gordon Moore and Carver Mead
Two Pioneers Discuss Moore's Law and the Birth of an Industry

To celebrate the 40th anniversary of Moore's law, the Computer History Museum in Silicon Valley invited Dr. Gordon Moore and Dr. Carver Mead to talk about the law, reminisce about Moore's distinguished career in the semiconductor industry, and discuss other topics. On the evening of September 29, the museum's auditorium filled to capacity with an eager crowd of museum members and guests. Microprocessor Report recorded and transcribed this special event. [October 17, 2005]

  • Photo: Dr. Gordon Moore

  • Photo: Dr. Carver Mead

  • Photo: Moore and Mead


Philips Challenges 8-Bit MCUs
New 32-Bit ARM7 Microcontrollers With Flash Memory Start at $1.47

In the latest attempt to lure embedded-systems designers away from 8- and 16-bit MCUs, Philips Semiconductor has introduced three new 32-bit MCUs with the ubiquitous ARM7TDMI processor core. The lowest-priced part -- the LPC2101 -- has 8KB of on-chip flash memory and starts at only $1.47 in large volumes. That appears to be a new low price for flash-integrated ARM7 MCUs in this relatively high-performance class (70MHz, 63 Dhrystone mips). The other two parts -- the LPC2102 and LPC2103 -- have 16KB or 32KB of on-chip flash and cost $1.85 or $2.20, respectively. All three parts are stuffed with peripherals, timers, and other accoutrements of general-purpose MCUs. In addition, Philips has included features to address the shortcomings of previous 32-bit MCUs and to duplicate some advantages of 8- and 16-bit devices. [October 10, 2005]

  • Figure 1: LPC210x block diagram. The only differences among the LPC2101, LPC2102, and LPC2103 are their amounts of internal flash memory (8KB, 16KB, or 32KB) and SRAM (2KB, 4KB, or 8KB).

  • Table 1: Power-saving modes in the new LPC2101, LPC2102, and LPC2103.

  • Table 2: Feature comparison of the new Philips LPC2101, LPC2102, and LPC2103; the Atmel AT91SAM7S32; and the Oki Semiconductor ML67Q406x and ML67Q500x.


Preview: Fall Processor Forum 2005
Multicore Processing Dominates 18th Annual Conference

Not since the days when RISC and VLIW challenged the CISC orthodoxy has there been such an upheaval in microprocessor design. Every major company in every major market -- PCs, servers, and embedded systems -- is converging on multicore processing. Microprocessor Report has provided front-line coverage of the multicore revolution since its beginnings in the 1990s. Now it's time to pull everything together for an event that covers all dimensions of multicore processing. The theme of Fall Processor Forum 2005, our 18th annual fall conference, will be "The Road to Multicore." FPF will offer technical presentations on new multicore processors, licensable intellectual property (IP) for multicore designs, on-chip interconnect technology for multicore chips, system software for multicore architectures, and software-development tools for parallel processing. [September 26, 2005]


Cavium: Security Optional
New Octeon EXP Processors Omit Internal Cryptography Engine

Cavium Networks is as closely connected with network security as Linus in Peanuts is associated with his security blanket. Cavium gained fame with its award-winning Nitrox security coprocessors in 2002 and soon will begin shipping its Octeon NSP multicore network processors with integrated security engines. Now, Cavium is tossing aside part of its security blanket -- for some chips, at least. Cavium's new Octeon EXP family is virtually identical to the Octeon NSP family, except that it discards the integrated cryptography engine and related features. Octeon EXP is for customers that don't need network security at this time or prefer using a separate security coprocessor. In addition, Cavium can freely export Octeon EXP chips to countries subject to U.S. government trade controls. [September 6, 2005]

  • Table 1: Feature comparison of Cavium's Octeon EXP CN3630, EXP CN3830, EXP CN3840, EXP CN3850, EXP CN3860, and the Octeon NSP CN3xxx family.

  • Table 2: Feature comparison of Cavium's Octeon EXP family with Broadcom's SiByte BCM12xx and BCM14xx processors, Freescale's PowerQUICC III MPC8641/D processors, PMC-Sierra's RM11200 processor, and Raza Microelectronics' XLR family.


ARC Patent Looks Formidable
U.S. Patent Covers Automated Tools for Customizing Processor Cores

(With Rich Belgard)

A showdown may be looming between ARC International and archrival Tensilica over who invented the software tools and methods for customizing synthesizable microprocessor cores. Both companies have won important U.S. patents for the technology, and ARC's latest patent appears both broad and strong. Whether or not ARC and Tensilica come to legal blows, their growing patent portfolios should worry other companies working in the expanding field of configurable processors. In general, Microprocessor Report agrees with ARC that U.S. patent 6,862,563 lays claim to key technology for automating the configuration of synthesizable processors and other soft intellectual property (IP). However, the complex language and convoluted history of the patent defy easy analysis and interpretation. [August 29, 2005]

  • Figure 1: ARC's processor-configuration tool, now called the ARChitect Processor Configurator, runs on a PC or a Sun workstation. It has an easy graphical user interface that allows chip designers to rapidly customize an ARC processor core by choosing predefined options.

  • Figure 2: In this excerpt from Figure 1, ARChitect indicates how the user has configured an ARC 700 processor core.

  • Figure 3: Tensilica's processor-configuration tool underwent major changes last year. The predefined configuration options for the Xtensa LX processor now appear in Tensilica's integrated development environment (IDE), called Xplorer, which runs on the customer's desktop workstation.

  • Figure 4: ARC's '563 patent has dozens of flowcharts like this one, showing how a processor-configuration program accepts user input to customize the synthesizable core.

  • Figure 5: To eliminate any possible doubt about what constitutes a computer, ARC's patent describes the required components and illustrates them with this figure.

  • Figure 6: This example of creating a custom instruction in Tensilica Instruction Extension (TIE) language is from Tensilica's website. It explicitly promotes TIE as a higher-level alternative to standard design languages like Verilog and VHDL.


Actel Mixes Signals on FPGAs
Programmable Chips Will Integrate Analog, Digital, and Memory

As design costs soar like housing prices, ASIC alternatives are multiplying like Realtors. Actel -- a second-tier FPGA vendor, behind market leaders Xilinx and Altera -- is proving to be unusually creative at exploiting this opportunity. Actel has announced a technology called Fusion that, for the first time, can integrate mixed-signal logic with an embedded-processor core, flash memory, and SRAM in the same programmable-logic device. With Fusion, a single FPGA could perform some or all of the analog- and digital-processing functions in an embedded system. [August 15, 2005]

  • Figure 1: Fusion block diagram. Fusion will integrate hard-wired analog peripherals with hard and soft intellectual-property (IP) cores in an FPGA. A soft embedded-processor core is optional. Tying everything together is the Fusion backbone, which consists of a multidrop bus and microsequencer implemented in programmable logic.


China's Emerging Microprocessors
'MIPS-Like' Godson Chips Echo the Past, Foreshadow the Future

Beyond the land of the rising sun is the rising Godson, a growing family of microprocessors designed and manufactured in China by Chinese engineers for the Chinese domestic market. Intended for low-cost desktop computers, servers, and embedded systems, these 32- and 64-bit chips are rapidly becoming as sophisticated as any designs in the world, falling short in performance only because Chinese fabrication technology lags behind the rest of the industry by two process generations. Microprocessor Report recently interviewed Godson's chief architect, Weiwu Hu, a professor at the Institute of Computing Technology in Beijing. Weiwu described the Godson-1 and Godson-2 in unprecedented detail and revealed some of his ambitions for the Godson-3. After analyzing this information, MPR believes the Chinese already are capable of designing world-class microprocessors, if they can gain access to world-class fabrication technology. [July 25, 2005]

  • Figure 1: Godson-2 block diagram. China's most powerful microprocessor is patterned after the MIPS IV ISA and is similar to the MIPS Technologies R10000 processor introduced in 1995.

  • Table 1: The Godson-2's instruction latencies are similar to those of other high-performance RISC processors.

  • Figure 2: Godson-2 pipeline diagram. ICT lengthened the Godson-2 pipeline by two stages, compared with the Godson-1.

  • Table 2: Feature comparison of the Godson-1, Godson-2, and three important MIPS processors: the R3000, R4000, and R10000.

  • Sidebar: A Conversation With Godson's Father -- Excerpts from our exclusive interview with Weiwu Hu, chief architect of the Godson-1 and Godson-2 and a professor at the Institute of Computing Technology, Chinese Academy of Sciences, Beijing.

    • Photo of Weiwu Hu

  • Sidebar: China Likes the x86, Too -- Even while the Chinese develop their own Godson microprocessors patterned after the MIPS architecture, they are also seeking ways to make x86-compatible processors for China's domestic market and possibly for export.


ARM Strengthens Java Compilers
New 16-Bit Thumb-2EE Instructions Conserve System Memory

In the 10 years since Sun Microsystems introduced Java, the drago