Microprocessor Report Archive

Here's an index of Tom's past articles in Microprocessor Report. Some articles are online in HTML and PDF formats for paid subscribers. For more information, visit the TechInsights website.

Wi-Fi 7 AP Chips Coming Next Year

Broadcom, MediaTek, and Qualcomm Compete for Future Wireless Routers

Three leading chipmakers are sampling their first Wi-Fi 7 (802.11be) infrastructure products, so the consumer debut is about a year away. OEMs can now start designing the routers, gateways, and access points for homes and enterprises that potentially quadruple peak throughput versus today's Wi-Fi 6E (802.11ax). Broadcom, MediaTek, and Qualcomm are leading the charge with new chipsets that enable 320MHz channel widths, denser modulation, better interference mitigation, and more spatial streams. New routers can operate in all three Wi-Fi bands (2.4-, 5-, and 6GHz) and in some cases can split the higher ones for quad- or penta-band operation. In theory, Wi-Fi 7 can deliver up to 45Gbps as measured at the RF PHY interface, but real-world speeds will be much lower. Home and business gateways will provide up to 10Gbps peak. That's about twice as fast as a typical Wi-Fi 6E router, and it leaves headroom for future designs. [June 6, 2022]

Figure 1: Example system diagram of a Wi-Fi 7 access point.
Figure 2: MediaTek's chip integration.
Figure 3: MediaTek penta-band router configuration.
Table 1: Host processors for Wi-Fi 7 home and enterprise access points.
Table 2: Broadcom baseband chips for Wi-Fi 7 access points.
Table 3: MediaTek chipset configurations for Wi-Fi 7.
Table 4: Qualcomm Networking Pro Series platforms.
Table 5: Comparison of Wi-Fi 7 platforms.
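The theoretical peak figures above come from a simple product: spatial streams times data subcarriers times bits per subcarrier times coding rate, divided by the OFDM symbol time. The sketch below assumes standard 802.11ax/802.11be parameters that the summary doesn't list (13.6us symbols including a 0.8us guard interval, 1,960 data subcarriers in 160MHz, 3,920 in 320MHz, rate-5/6 coding, 8 versus 16 streams). Under those assumptions it yields roughly 46Gbps for Wi-Fi 7, in the same range as the ~45Gbps cited, and about 4.8x the Wi-Fi 6E peak. These are raw PHY rates, not real-world throughput.

```python
# Peak PHY rate = streams * data subcarriers * bits/subcarrier * coding rate / symbol time.
# Assumed parameters (standard 802.11ax/be values, not taken from the article):
#   OFDM symbol: 12.8 us + 0.8 us guard interval = 13.6 us
#   Wi-Fi 6E: 160MHz (1,960 data subcarriers), 1024-QAM (10 bits), rate 5/6, 8 streams
#   Wi-Fi 7:  320MHz (3,920 data subcarriers), 4096-QAM (12 bits), rate 5/6, 16 streams

SYMBOL_TIME_S = 12.8e-6 + 0.8e-6


def phy_rate_bps(streams, data_subcarriers, bits_per_subcarrier, coding_rate):
    """Return the raw PHY bit rate in bits per second."""
    bits_per_symbol = data_subcarriers * bits_per_subcarrier * coding_rate
    return streams * bits_per_symbol / SYMBOL_TIME_S


wifi6e = phy_rate_bps(8, 1960, 10, 5 / 6)
wifi7 = phy_rate_bps(16, 3920, 12, 5 / 6)

print(f"Wi-Fi 6E peak: {wifi6e / 1e9:.1f} Gbps")
print(f"Wi-Fi 7 peak:  {wifi7 / 1e9:.1f} Gbps")
print(f"Ratio:         {wifi7 / wifi6e:.1f}x")
```

Doubling the channel width and the stream count accounts for 4x; the denser 4096-QAM supplies the remaining 20%.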

TetraMem Touts In-Memory DLA

Memristors Reduce Power for Analog Network-Edge Processing

HP Labs startled the engineering world in 2008 by announcing the first memristor, a fundamental circuit element originally theorized in 1971. Now, a Silicon Valley startup is touting a different type of memristor that uses physical laws to perform analog math for power-efficient neural-network processing. Instead of trying to develop discrete memristor-based memory, TetraMem has created a materially different "computing memristor." Together with other elements, it provides nonvolatile neural-network storage and performs in-memory inference processing. The company's third test chip since 2018 is in progress for tapeout late this year. Although it's also mainly a proof of concept, the company hopes to sample it to customers next year while developing a commercial deep-learning accelerator (DLA) for small-volume production in 2024. [April 25, 2022]

Figure 1: TetraMem 65nm test chip.
Figure 2: Two types of memristors.
Figure 3: TetraMem's memristors and arrays.
Table 1: TetraMem test chip and roadmap.
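The "physical laws" behind analog in-memory inference are usually Ohm's law and Kirchhoff's current law: each memristor's conductance stores a weight, row voltages carry the inputs, and the currents summing on each column wire form a matrix-vector product in one step. The sketch below models an idealized crossbar under that textbook assumption; it is not TetraMem's circuit, and it ignores wire resistance, device nonlinearity, noise, and the ADCs a real chip needs.

```python
def crossbar_column_currents(conductances, voltages):
    """Idealized memristor crossbar.

    conductances[i][j] is the weight stored at row i, column j (siemens);
    voltages[i] is the input applied to row i (volts).
    Each device passes I = G * V (Ohm's law), and currents on the same
    column wire add (Kirchhoff's current law), so column j carries
    sum_i G[i][j] * V[i]: one matrix-vector product.
    """
    rows = len(conductances)
    cols = len(conductances[0])
    if len(voltages) != rows:
        raise ValueError("one input voltage per row is required")
    currents = [0.0] * cols
    for i in range(rows):
        for j in range(cols):
            currents[j] += conductances[i][j] * voltages[i]
    return currents


# Hypothetical 3-input x 2-output array: conductances in siemens, inputs in volts.
G = [[10e-6, 20e-6],
     [30e-6, 0.0],
     [5e-6, 40e-6]]
V = [0.2, 0.1, 0.3]
print(crossbar_column_currents(G, V))  # column currents in amperes
```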

Picocom Thinks Big for Small Cells

PC802 Baseband SoC Can Handle Thousands of Users

Better late than early! That's the lesson learned by survivors of PicoChip, a long-gone UK company that pioneered 3G chips for small-cell base stations — only to be swamped by later competitors. Some of the survivors are now attempting a comeback with a new UK startup that's sampling a 4G/5G baseband chip for small cells. Instead of leading the charge, this SoC comes about three years after cellular carriers began deploying 5G networks. By now, those carriers and other customers should be ready to deploy small cells to augment their coverage. At least, that's the new business strategy — and competitors are doing likewise. The new startup is Picocom, founded in 2018. Its first product is the PC802, a Layer 1 (PHY) SoC for indoor and outdoor 4G/5G base stations that can serve thousands of users on an open radio access network (O-RAN). Picocom began sampling last November and expects to begin production this August. [February 28, 2022]

Figure 1: PC802 block diagram.
Table 1: Comparison of two baseband SoCs for small cells: the Picocom PC802 and EdgeQ-2540.

DXOmark Measures Phone Cameras

Testing Is Thorough, Scoring Is Mysterious, But Results Are Credible

Smartphone vendors know that a picture can be worth a thousand dollars as well as a thousand words. In the $1,000+ premium tier, phones are rapidly improving their cameras with multiple lenses, larger image sensors, and better processing. Evaluating these improvements is a challenge. Qualitative reviewers typically photograph various subjects and compare the pictures with those made by other phones. Quantitative reviewers photograph test targets to measure parameters such as dynamic range and resolution. DXOmark Image Labs combines both approaches to create its eponymous benchmark, which has become a respected source for smartphone-camera comparisons. The camera tests evaluate both still-photo and video performance and are impressively thorough. The company says its testers take more than 3,000 photos and 2.5 hours of video for each review. Despite criticism that the scoring method is proprietary, DXOmark ratings have become a de facto standard. [August 2, 2021]

Figure 1: Highest-scoring phone cameras.
Figure 2: Still-photo-test attributes for the selfie camera.
Table 1: DXOmark Camera v4 test suites.
Table 2: DXOmark Camera scores for four premium smartphones.

Upmem Nails RowHammer

French Company Offers Defense Against DRAM Vulnerability

Since 2014, an attack called RowHammer has been playing havoc with DRAM chips. It flips bits in neighboring rows by rapidly and repeatedly accessing specific DRAM rows. Although it can't control the flips precisely enough to write malicious code into memory, it can overwhelm ECC protection, crashing the affected program or forcing a reboot. In more-sophisticated assaults, it can trigger a fault that gives the attacker elevated system privileges or access to another user's virtual partition on a shared server. Earlier countermeasures have been either only partly effective or costly in performance and power. Now, a French startup, Grenoble-based Upmem, claims to have a defense that's effective and economical. It has patented the technique, called Silver Bullet, and is offering it to DRAM manufacturers as licensable intellectual property. [June 21, 2021]

Figure 1: How RowHammer corrupts DRAM.
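For context on what a mitigation must do, the sketch below shows a generic activation-counting approach: track how often each row is opened within a refresh window and, past a threshold, refresh its physical neighbors before their charge leaks far enough to flip bits. This is NOT Upmem's Silver Bullet, whose mechanism the summary doesn't describe; the class name and the 1,000-activation threshold are hypothetical.

```python
from collections import defaultdict


class ActivationCounter:
    """Toy per-row activation counter (hypothetical parameters, not a real DRAM design)."""

    def __init__(self, num_rows, threshold):
        self.num_rows = num_rows
        self.threshold = threshold   # activations allowed before neighbors are refreshed
        self.counts = defaultdict(int)
        self.refreshed = []          # log of victim rows refreshed by the mitigation

    def activate(self, row):
        self.counts[row] += 1
        if self.counts[row] >= self.threshold:
            # Refresh the adjacent "victim" rows to restore their charge.
            for victim in (row - 1, row + 1):
                if 0 <= victim < self.num_rows:
                    self.refreshed.append(victim)
            self.counts[row] = 0

    def end_refresh_window(self):
        # Regular refresh (typically every 64 ms) restores all rows, so counts restart.
        self.counts.clear()


mc = ActivationCounter(num_rows=8, threshold=1000)
for _ in range(2500):  # an attacker hammering row 4
    mc.activate(4)
print(mc.refreshed)  # [3, 5, 3, 5]
```

The trade-off the summary alludes to shows up here: lower thresholds catch attacks sooner but spend more time and energy on extra refreshes, and per-row counters cost silicon area.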

ST Raises IoT Power Efficiency

STM32U5 MCUs Pair a Fast Cortex-M33 With Crypto Acceleration

STMicroelectronics' new STM32U5 microcontrollers are power-efficient chips that feature an Arm Cortex-M33 CPU operating at 160MHz with TrustZone security. At their maximum clock speed, they outrun other Cortex-M33 MCUs. Yet they can draw as little as 3.5mA at that clock rate, undercutting slower competitors. Their combination of high performance, low power, strong security, large memories, and a plethora of peripherals is unmatched. One drawback: starting at $3.60 in 10,000-unit volumes, they're more expensive than competing products. [March 29, 2021]

Figure 1: STM32U5 block diagram.
Table 1: Feature comparison of two MCUs: ST's STM32U5 and NXP's LPC55S16.
Table 2: Power/performance comparison of the STM32U5 and LPC55S16.
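MCU datasheets often express active-mode power as supply current per MHz, which makes chips running at different clock rates easier to compare. The snippet below simply normalizes the 3.5mA-at-160MHz figure from the summary; it assumes current scales linearly with clock rate, which is only approximate because leakage doesn't scale with frequency.

```python
def microamps_per_mhz(current_ma, clock_mhz):
    """Active current normalized to clock rate, in microamps per MHz."""
    return current_ma * 1000.0 / clock_mhz


print(f"STM32U5: {microamps_per_mhz(3.5, 160):.1f} uA/MHz")
```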

Lakefield Shrinks x86 SoCs

Intel Launches Its First Die-Stacked Hybrid-CPU Processors

Intel's new Core Hybrid processors (code-named Lakefield) are the first to employ the company's Foveros die-stacking technology. They bond a 10nm CPU die to a new 22nm south bridge, then use conventional package-on-package (PoP) technology to add 4GB or 8GB of DRAM. Everything squeezes into an impressively compact 12mm x 12mm x 1mm package. They're also the first Intel processors with a "hybrid" cluster of large and small CPUs. In all, they have five CPU cores (one Sunny Cove and four Tremont), an integrated GPU, an image processor, a video accelerator, an audio DSP, a PCI Express controller, and other I/O interfaces. Smaller than a dime, these SoCs are moving the x86 architecture into slimmer notebooks and tablets than ever before. The first two models are the Core i3-L13G4 and Core i5-L16G7. As is common for Intel, they employ the same silicon, but the i3 model operates at lower clock frequencies and disables part of the GPU. [June 29, 2020]

Figure 1: Lakefield block diagram.
Figure 2: Tremont CPU versus Sunny Cove CPU power efficiency.
Table 1: Intel Core Hybrid processors.
Table 2: Comparison of three SoCs for thin notebooks and tablets: Intel's Core i5-L16G7 and Core i7-8500Y versus Qualcomm's Snapdragon 8cx.

Mali-G78 Raises Premium Performance

Arm's Newest GPU IP Boosts Throughput and Power Efficiency

One year after introducing the Valhall GPU architecture and high-end Mali-G77, Arm is launching the Mali-G78 — a second-generation design that boosts graphics and machine-learning performance by about 15% at the same power level. When matching the G77's performance, it's about 10% more power efficient, the company says. Because the G78 targets the latest 5nm FinFET technology, throughput could rise even further above a 7nm G77 design. The maximum number of shader cores increases to 24 versus 16, so performance could theoretically soar by another 50%. Arm is also adding a new mid-premium product (which the company calls "sub-premium"). The Mali-G68 is identical to the G78 but is restricted to smaller configurations of 1 to 6 cores. Although it's merely a branding strategy, the G68 allows licensees to deliver premium GPU features for the mid-premium tier. Arm now offers three Valhall-based tiers above entry-level GPUs like the Mali-G31, which still employs the previous Bifrost architecture. [June 1, 2020]

Figure 1: Arm's expanded GPU lineup.
Figure 2: Asynchronous top-level power management.
Figure 3: Asynchronous domains enable higher performance.
Table 1: Arm Mali-G78 and Mali-G68 GPUs versus Imagination Technologies' PowerVR AXT-Series GPUs.
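The two gains in the summary compound: about 15% more performance per configuration at the same power, and up to 24 shader cores instead of 16. The sketch below multiplies them for a theoretical ceiling of roughly 1.7x a maximal G77; it assumes performance scales linearly with core count, which real GPUs may not achieve once memory bandwidth or power limits intervene, and it excludes any gain from the 5nm process.

```python
def scaled_performance(per_core_gain, cores_new, cores_old):
    """Upper-bound speedup, assuming linear scaling with shader-core count."""
    return (1 + per_core_gain) * cores_new / cores_old


print(f"Theoretical G78 (24 cores) vs. G77 (16 cores): {scaled_performance(0.15, 24, 16):.3f}x")
```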

Helio P95 Boosts AI and Graphics

MediaTek Upgrades Helio P90 for Mid-Premium 4G Smartphones

Just in time for recession shoppers wary of splurging on luxury goods, MediaTek is shipping an upgraded processor for mid-premium 4G (LTE) smartphones. The Helio P95 supersedes last year's P90 by boosting AI throughput and graphics performance by 10% each — just enough to stay competitive with similar processors from Huawei and Qualcomm. We believe the P95 is the same silicon as the P90 but reflects better yields in the fastest speed bins. Even if it's operating at the same voltage, it will burn slightly more power in return for slightly higher performance — a disadvantage against the competition. It's a modest upgrade that keeps MediaTek in the 4G derby while the focus moves to a new 5G product, the Dimensity 800. [May 11, 2020]

Figure 1: Helio P95 block diagram.
Table 1: Comparison of three mid-premium 4G smartphone processors: MediaTek Helio P95, Huawei Kirin 810, and Qualcomm Snapdragon 675.
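The reasoning that a faster bin at the same voltage burns more power follows from the standard CMOS dynamic-power estimate, P = activity x capacitance x V^2 x f. If the P95's extra performance comes from a higher clock (an assumption; the summary doesn't say), dynamic power rises in proportion. The values below are hypothetical and chosen only to show the ratio; they are not P90 or P95 specifications.

```python
def dynamic_power(activity, capacitance_f, voltage_v, freq_hz):
    """Classic CMOS switching-power estimate: P = activity * C * V^2 * f (watts)."""
    return activity * capacitance_f * voltage_v ** 2 * freq_hz


# Hypothetical operating points: same voltage, 10% higher clock.
base = dynamic_power(0.2, 1e-9, 0.8, 2.0e9)
faster = dynamic_power(0.2, 1e-9, 0.8, 2.2e9)
print(f"power ratio: {faster / base:.2f}")
```

Because voltage enters as a square, a chip that needed a higher voltage to reach the faster clock would pay an even larger power penalty.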

RISC-V Vectors Know No Limits

Proposed Extensions Target Microcontrollers to Supercomputers

At least five years in the making, RISC-V vector extensions are finally nearing approval. They impose almost no limits on the size or number of data elements, and they allow mixed-width elements to execute in the same instruction stream with fixed-width elements and general-purpose instructions. They can operate on all data types, and compiled binaries will run on any implementation. CPU designs now under way range from microcontroller-class cores with 32-bit vectors to supercomputer-class cores with 16,384-bit vectors. The 32-bit RISC-V Vector (RVV) specification will soon advance from v0.8 to v0.9. The V-extension task group expects to propose the v1.0 specification by the end of June, kicking off a 45-day review for final comments. Assuming no serious objections, the board could adopt the spec in August. RISC-V vendors can then begin shipping production RTL for CPUs with the extensions. SiFive already has three cores in development. We expect the first processors implementing the new extension to enter production in 2022. [April 27, 2020]

Figure 1: A typical vector-register configuration.
Figure 2: Vector-processing code example.
Figure 3: Loading small data elements from memory into vector registers.
Figure 4: Vector versus scalar performance.
Table 1: SiFive VI-series CPU cores.
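The claim that compiled binaries run on any implementation rests on vector-length-agnostic "strip-mining": each loop iteration asks the hardware how many elements it can handle, processes that many, and advances. The Python sketch below emulates the idea; `vsetvl` here is a simplified, hypothetical stand-in for RVV's vsetvl/vsetvli instructions that ignores register grouping (LMUL) and the spec's finer rules for choosing the vector length.

```python
def vsetvl(remaining, vlen_bits, sew_bits):
    """Simplified: grant as many elements as fit in one register, up to what's left."""
    vlmax = vlen_bits // sew_bits
    return min(remaining, vlmax)


def vector_add(a, b, vlen_bits, sew_bits=32):
    """Strip-mined element-wise add; the same loop works for any vector length."""
    n = len(a)
    out = [0] * n
    i = 0
    while i < n:
        vl = vsetvl(n - i, vlen_bits, sew_bits)
        # One "vector instruction" processes vl elements.
        out[i:i + vl] = [x + y for x, y in zip(a[i:i + vl], b[i:i + vl])]
        i += vl
    return out


a = list(range(10))
b = [100] * 10
for vlen in (32, 128, 16384):  # MCU-class to supercomputer-class vector lengths
    print(vlen, vector_add(a, b, vlen))
```

Every vector length produces the same result; only the number of loop iterations changes, which is why one binary can span 32-bit and 16,384-bit implementations.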

ThunderX3's Cloudburst of Threads

Marvell Previews 96-Core 384-Thread Arm Server Processor

Marvell is on the verge of sampling its third-generation ThunderX server processor — an Arm-compatible chip that crams up to 96 CPUs and 384 threads per socket. The new ThunderX3 targets high-performance computing, cloud-based Android gaming, and scale-out applications in cloud data centers. In per-core and per-thread power efficiency, it surpasses all other Arm server processors as well as the best x86 chips. It's scheduled to sample this quarter, and we estimate volume production will start in 1Q21. Like the previous ThunderX2 design, the base clock frequency remains 2.2GHz, but the maximum turbo speed rises 20% to 3.0GHz. Because the biggest ThunderX2 has only 32 CPUs, tripling the core count in one generation raises the bar to a whole new level. [April 13, 2020]

Table 1: Comparison of three high-end server processors: Marvell ThunderX3 CN110xx, Ampere Altra, and AMD Epyc 7742.

Versal Premium Targets Core Networks

New Xilinx FPGAs Harden 600Gbps Ethernet and Interlaken

Xilinx recently announced its first new Versal FPGAs since unveiling the family in 2018. The new Versal Premium Series defines the high end, above the existing AI Core Series and Prime Series. It brings more configurable logic, more DSP blocks, and more high-speed Ethernet. New hard logic includes 600Gbps Ethernet and Interlaken controllers plus multiple 400Gbps cryptography engines and 32Gbps PCI Express Gen5 controllers. DSP integer performance surges to 99 TOPS, and 32-bit floating-point performance reaches 23 Tflop/s. Xilinx designed the Premium Series mainly for core-network acceleration in broadband and 5G-cellular networks and for high-speed interconnects between hyperscale data centers. Its prodigious Ethernet bandwidth will enable 400–600Gbps optical links. The first devices are scheduled to sample in 1H21, and we estimate volume production will start in 1H22. [March 30, 2020]

Figure 1: Block diagram of Xilinx Versal Premium VP1802.
Figure 2: Versal Premium in a data-center-interconnect (DCI) system.
Table 1: Xilinx Versal Premium chips.
Table 2: Versal Premium VP1802 versus Achronix Speedster7t 6000 and Intel Agilex AGI-027.

Marvell Upgrades to Octeon TX2

Octeon Fusion and TX2 Target Networking and 5G Base Stations

Marvell is shipping next-generation Octeon Fusion processors for 5G macrocell base stations and is sampling Octeon TX2 processors for base stations and other network infrastructure. All debut a new Arm-compatible 64-bit CPU core, 25Gbps serdes, PCI Express Gen4, and Octeon's first L3 caches. The largest chip has 36 cores, runs at 2.4GHz, and can forward up to 220 million packets per second (Mpps). It's the first major Octeon update since Marvell gained these product lines by acquiring Cavium in 2018. [March 16, 2020]

Figure 1: Block diagram of Marvell's Octeon TX2 CN98xx.
Figure 2: An Octeon Fusion 5G macrocell base-station design.
Table 1: Marvell's Octeon TX2 processors.
Table 2: Marvell Octeon TX2 versus Intel Cascade Lake.
Table 3: Octeon Fusion base-station processors.
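Packet-forwarding rates like 220Mpps translate into line rate only once a frame size is fixed. The sketch below assumes worst-case minimum-size (64-byte) Ethernet frames plus 20 bytes of on-wire overhead (preamble, start delimiter, and inter-frame gap); under that assumption 220Mpps corresponds to roughly 148Gbps. Larger frames carry the same bandwidth with fewer packets per second.

```python
# Minimum Ethernet frame (64 bytes) plus on-wire overhead:
# 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
WIRE_BYTES_PER_MIN_FRAME = 64 + 20


def line_rate_gbps(mpps, wire_bytes=WIRE_BYTES_PER_MIN_FRAME):
    """Line rate (Gbps) consumed by a given packet rate (millions of packets/s)."""
    return mpps * 1e6 * wire_bytes * 8 / 1e9


print(f"{line_rate_gbps(220):.1f} Gbps at 220 Mpps of 64-byte frames")
```

The same arithmetic gives the familiar 14.88Mpps figure for full-rate 10Gbps Ethernet with minimum-size frames.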

Wi-Fi 6E Covets 6GHz Band

Higher RF Spectrum Already Licensed but Ripe for Reuse

Wi-Fi is on the verge of gaining much more radio-frequency spectrum and better real-world performance. It looks certain that the US will allow future Wi-Fi devices to operate as unlicensed radios in the mostly licensed 6GHz RF band, and other countries will likely follow. By reaching above the 2.4GHz and 5GHz unlicensed bands that Wi-Fi now occupies, the proposed Wi-Fi 6E specification would gain more RF spectrum and suffer less interference from other unlicensed radios. It would also maintain compatibility with existing standards. But Wi-Fi devices will need additional RF circuitry to operate at 6GHz — and they face opposition from some licensed users now occupying those frequencies. [March 2, 2020]

Figure 1: Wi-Fi 6E bands and channel allocations.
Figure 2: US frequency allocations in part of the 6GHz band.
Table 1: Comparison of Wi-Fi standards.
Table 2: Broadcom's Wi-Fi 6E access-point chips.
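How much the 6GHz band adds depends on how many channels of each width fit in it. The sketch below assumes the US proposal spans 5,925–7,125MHz and that 20MHz channel centers start at 5,955MHz and step by 20MHz, as in the 802.11ax 6GHz channel plan; wider channels bond adjacent 20MHz channels. Under those assumptions it yields 59, 29, 14, and 7 channels at 20, 40, 80, and 160MHz.

```python
# Assumptions: US band 5,925-7,125MHz; 20MHz channel centers at 5,955 + 20k MHz.
BAND_TOP_MHZ = 7125
FIRST_CENTER_MHZ = 5955


def count_20mhz_channels():
    count, center = 0, FIRST_CENTER_MHZ
    while center + 10 <= BAND_TOP_MHZ:  # the whole channel must fit below the band edge
        count += 1
        center += 20
    return count


n20 = count_20mhz_channels()
for width in (20, 40, 80, 160):
    # Wider channels bond adjacent 20MHz channels; leftovers can't form a full channel.
    print(f"{width}MHz: {n20 // (width // 20)} channels")
```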

ST Debuts Wireless MCU for LoRa

New STM32WLE5 Integrates Customized Semtech Sub-GHz Radio

Anyone who recharges a smartphone daily would envy a wireless communicator that could run for a decade on a coin-size battery. It's possible with LoRa (Long Range), a low-power wide-area-network (LPWAN) technology. The catch is that LoRa is designed for brief bursts of low-bandwidth data instead of high-bandwidth voice and data. But its nine-mile range and extremely low power make it ideal for remote sensors and meters in IoT applications. To drive down the cost of LoRa and similar LPWANs, STMicroelectronics is shipping the industry's first microcontroller with a fully integrated LoRa radio. The new STM32WLE5 is a 32-bit Arm-based MCU containing ST's custom implementation of a Semtech SX126x LoRa-compliant radio that operates in the unlicensed sub-GHz radio-frequency bands. Its RF coverage of 150–960MHz ensures compatibility in North America and Australia (915MHz), Europe (433MHz and 868MHz), and Asia (923MHz). It can also track mobile objects (such as shipping containers) using radio triangulation instead of power-hungry satellite geolocation. [February 17, 2020]

Table 1: STMicroelectronics STM32WLE5 wireless microcontrollers.
Table 2: Comparison of the STM32WLE5 with Microchip's SAM R35.
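LoRa's low data rate follows from its chirp modulation: each symbol carries SF bits (the spreading factor), and symbols last 2^SF / BW seconds. The sketch below uses the nominal bit-rate formula Semtech publishes for LoRa modulation, with a 125kHz bandwidth and 4/5 coding rate chosen as common example settings (not STM32WLE5 specifications). Raising the spreading factor trades data rate for range.

```python
def lora_bit_rate_bps(sf, bandwidth_hz, cr=1):
    """Nominal LoRa bit rate.

    sf: spreading factor (bits per chirp symbol); bandwidth_hz: channel bandwidth.
    cr: 1..4 selects coding rate 4/5..4/8.
    """
    symbol_rate = bandwidth_hz / 2 ** sf  # symbols per second
    return sf * symbol_rate * 4 / (4 + cr)


for sf in (7, 12):
    print(f"SF{sf}: {lora_bit_rate_bps(sf, 125_000):.0f} bps")
```

Rates of a few hundred to a few thousand bits per second explain why LoRa suits brief sensor readings rather than voice.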

Renesas and SiLabs Polish Bluetooth

RX23W and Wireless Gecko Series-2 Microcontrollers Target IoT

Renesas and Silicon Labs are touting new microcontrollers that integrate Bluetooth radios for IoT systems and other wireless devices. In addition to implementing Bluetooth Low Energy (BLE), Bluetooth Long Range, and Bluetooth Mesh, the Renesas RX23W adds USB, a controller-area-network (CAN) interface, infrared communications, and capacitive-touch inputs. It's an MCU that's virtually an SoC. By contrast, Silicon Labs' Wireless Gecko Series-2 EFR32BG22 is less integrated because it targets lower-cost, lower-power applications — but it supports Bluetooth 5.2 Direction Finding, an industry first. [February 17, 2020]

Table 1: Two new Bluetooth microcontrollers: the Renesas RX23W and Silicon Labs Wireless Gecko Series-2 EFR32BG22.
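Bluetooth Direction Finding works by sampling the phase of a received tone across multiple antennas; in the angle-of-arrival case, the phase difference between two antennas spaced d apart gives sin(theta) = delta_phi x wavelength / (2 x pi x d). The sketch below applies that idealized two-antenna relation at an assumed 2.44GHz carrier; it is not Silicon Labs' algorithm and ignores multipath, calibration, and array geometry beyond two elements.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def angle_of_arrival_deg(phase_diff_rad, antenna_spacing_m, freq_hz=2.44e9):
    """Angle from broadside (degrees) implied by the phase difference between two antennas."""
    wavelength = SPEED_OF_LIGHT / freq_hz
    s = phase_diff_rad * wavelength / (2 * math.pi * antenna_spacing_m)
    if not -1.0 <= s <= 1.0:
        raise ValueError("phase difference is inconsistent with this antenna spacing")
    return math.degrees(math.asin(s))


wavelength = SPEED_OF_LIGHT / 2.44e9
# Half-wavelength spacing with a 90-degree phase lag: about 30 degrees off broadside.
print(angle_of_arrival_deg(math.pi / 2, wavelength / 2))
```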

Sharper Vision, Brains In i.MX8M Plus

NXP Adds ISPs, DSP, and DLA to New Multimedia Processor

Tracking the latest trends, NXP is adding vision processing, signal processing, and neural-network acceleration to its latest i.MX multimedia processor. Among the other new features are additional I/O interfaces for industrial and commercial applications, including PCI Express 3.0 and Gigabit Ethernet with Time-Sensitive Networking. The i.MX8M Plus expands the i.MX8M family's capabilities while employing 14nm FinFET technology to hold typical power consumption to 3.0W or less. In battery-powered systems, it can operate on less than 1W. Yet it still has up to four Arm Cortex-A53 CPUs and a Cortex-M7 microcontroller core. The most important features prepare the i.MX8M Plus for edge applications that combine vision, speech recognition, and machine learning with graphics, audio, and networking. [February 3, 2020]

Figure 1: Machine-learning performance reaches 2.3 trillion operations per second (TOPS).
Table 1: Comparison of NXP's i.MX8M multimedia processors.

Year in Review: Arm Flexes IP to Battle RISC-V

Leading IP Vendor Allows Custom Instructions, Try Before You Buy

RISC-V is generating enough heat to be felt as far away as Cambridge, England — or so it seemed in 2019 as Arm, the industry's leading intellectual-property (IP) vendor, took steps to counter its open-source competitor. In addition to shipping its most powerful CPU cores yet, Arm revealed its first multithreaded design, rolled out new vector extensions, opened one of its architectures to user-defined instructions, announced its first neural-network accelerator, and expanded a "try before you buy" program that lets customers evaluate its products while deferring licensing fees. These measures reinforced the company's leadership position while giving customers more flexibility. Nevertheless, the open-source alternative continued to evolve and spread. In 2019, vendors such as Andes, Cortus, and SiFive began licensing new RISC-V CPUs that boost performance, strengthen security, and implement new extensions. The battle between Arm and its upstart challenger dominated the IP-related events of 2019 and will heat up in 2020. [January 20, 2020]

Figure 1: The most active processor-IP vendors in 2019.
Figure 2: Arm's Flexible Access IP.
Figure 3: Tiny 32-bit CPU cores.
Figure 4: Bfloat16 format.
Sidebar: CPU-IP Events of 2019
Sidebar: Other IP Events of 2019
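Figure 4 of this article depicts the bfloat16 format, which keeps float32's sign bit and 8-bit exponent but only the top 7 mantissa bits, preserving dynamic range at the cost of precision. The sketch below converts between the two formats using round-to-nearest-even, one common conversion choice (simple truncation is another); NaN handling is omitted for brevity.

```python
import struct


def float32_to_bfloat16_bits(x):
    """Return the 16-bit bfloat16 pattern for x, rounding to nearest even."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)  # ties go to an even result
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bfloat16_bits_to_float32(b):
    """Widen a bfloat16 pattern back to float32 by appending 16 zero bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


for value in (1.0, 3.14159265, 65504.0, 1e38):
    b = float32_to_bfloat16_bits(value)
    print(f"{value!r:>12} -> 0x{b:04X} -> {bfloat16_bits_to_float32(b)!r}")
```

Note that 1e38 survives the conversion with reduced precision, whereas it would overflow IEEE half precision; that range is the format's main appeal for neural-network training.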

Year in Review: AI Is Livin' On the Edge

Edge Intelligence Accelerates; Driverless Cars Face Uphill Climb

In 2019, neural-network acceleration became so popular in embedded processors that it's nearly a standard feature rather than a futuristic vision. Even some microcontrollers have added AI engines for rudimentary speech and image recognition. But while AI accelerated, autonomous vehicles hit the brakes. A few years ago, fanboys saw driverless passenger cars coming round the corner in 2020. Now it's apparent that fully autonomous technology isn't quite ready — and that the auto industry's design cycles are longer than the tech industry's hype cycles. Also in 2019, chip-level security gained momentum. To protect against a growing onslaught of hackers and malware, almost every new embedded processor and microcontroller is becoming a fortress of digital defenses. Yet even as they become more powerful and secure, MCUs are hitting a memory wall. Flash memory is difficult to economically implement in geometries smaller than 40nm, so MCUs are resorting to multiple cores and faster clock speeds to boost performance without a process shrink. Meanwhile, Intel is moving chiplets and 3D stacking into FPGAs and high-performance embedded processors. [January 6, 2020]

Figure 1: Intel's Agilex FPGAs will use chiplet technology to add features.
Figure 2: Block diagram of Achronix machine-learning processors.
Figure 3: The Linley Group's ADAS forecast, 2020–2023.
Figure 4: STMicroelectronics' phase-change-memory bit cell.
Sidebar: AI Embedded-Processor Events of 2019
Sidebar: Automotive-Processor Events of 2019
Sidebar: Other Embedded-Processor Events of 2019

PowerVR Readies Ray Tracing

Imagination's New GPU Cores Debut IMG A-Series Architecture

Imagination Technologies has revamped its PowerVR GPU architecture to deliver much higher performance and power efficiency today and ray tracing in the near future. The company says its next-generation IMG A-Series GPU cores are up to 2.5x faster than the PowerVR Series9XM and consume up to 60% less power in the same die area and process technology. Also, the new massively threaded architecture can run graphics and machine-learning workloads concurrently while maintaining secure task isolation. Production RTL is shipping now, so the GPUs should begin appearing in phones in 2021. To serve the full spectrum of mobile graphics applications, the A-Series comprises three product tiers. At the high end, the AXT-Series includes four of the largest and fastest new licensable GPU cores for Chromebooks, tablets, automotive displays, and smartphones (premium and mid-premium). The AXM-Series initially includes only one GPU, mainly for midrange phones. The low-end AXE-Series includes two GPUs for entry-level phones, IoT devices, and entry-level digital TVs and set-top boxes. [December 16, 2019]

Figure 1: Furian versus A-Series ALU pipeline.
Figure 2: Thread scheduling for the ALU pipelines.
Figure 3: PowerVR IMG AXT-64-2048 block diagram.
Table 1: New PowerVR A-Series GPU cores versus Series9.

Ryzen Threadripper 3 Thrashes Xeon

New AMD Workstation Processors Excel in Price/Performance

AMD continues its comeback surge with third-generation Ryzen Threadripper processors for high-end desktop PCs and workstations. They surpass Threadripper 2 by upgrading to the newest Zen 2 CPU cores, boosting their clock speeds, enlarging their caches, increasing their DRAM bandwidth, adopting PCI Express Gen4, and introducing a new internal design that gives all CPUs equal access to I/O interfaces and external memory. They also outrun and underprice Intel's best Xeon X- and W-series processors targeting the same applications. The first Threadripper 3 products are the 24-core 48-thread 3960X and 32-core 64-thread 3970X, which shipped on November 25. On deck for next year is the 64-core 128-thread 3990X. Base clock speeds for the first models are 3.8GHz (24 cores) and 3.7GHz (32 cores); their peak turbo frequency is 4.5GHz. We consider them workstation processors because their prices elevate them beyond most PC users: $1,399 for the 3960X and $1,999 for the 3970X. (AMD will announce clock speeds and prices for the 3990X when it ships.) But some similar Intel processors cost more than 3x as much. [December 2, 2019]

Figure 1: AMD and Intel high-end desktop/workstation processors.
Figure 2: Threadripper 3 internal topology.
Table 1: AMD Ryzen Threadripper workstation processors.
Table 2: Comparison of AMD and Intel workstation processors: Threadripper 3960X versus Xeon W-3265, and Threadripper 3970X versus Xeon W-3175X.

Renesas RA MCUs Strengthen Security

New Microcontrollers Will Include Company's First With TrustZone

Known mainly for its huge RX family of proprietary 32-bit microcontrollers, Renesas is expanding its Arm catalog by shipping the new Renesas Advanced (RA) family. Although the previous Arm-based Synergy MCUs remain available, the new models strengthen security, and future models will add Arm's TrustZone privileged-execution technology. Some future chips will also have dual cores and IoT radios. The first RA products shipped in October featuring Cortex-M4 and Cortex-M23 CPUs. Next year, Renesas plans to add Cortex-M33 models, including its first MCUs with TrustZone. The four-year-old Synergy line continues to employ Cortex-M4, Cortex-M23, and Cortex-M0+. Although the RA and Synergy families are similar, the company says the new MCUs ease software development by working with more third-party tools in the Arm ecosystem. A new Flexible Software Package helps existing customers port code to the RA family, and it works with Amazon's FreeRTOS. [November 18, 2019]

Table 1: Initial Renesas RA-family microcontrollers.
Table 2: Comparison of new microcontrollers with security features: the Renesas RA2A1, NXP LPC55S6x, and STMicroelectronics STM32L5.

Intel's Tremont: A Bigger Little Core

New CPU Pairs With Sunny Cove in Lakefield SoC's 3D Package

Baseball season just ended, but Intel is still racing for a home run. Rounding first base, it replaces the 14nm Goldmont+ core with the new 10nm Tremont. Passing second base, the company debuts Tremont in a die that integrates four of these low-power cores with a high-performance Sunny Cove CPU. Rounding third base, Intel stacks this die on a south-bridge I/O die — the first product to employ Foveros 3D-chip technology. And running for home, it uses conventional package-on-package (PoP) technology to add DRAM. Result: a high-performance/low-power CPU combo similar to Arm's big.LITTLE clusters, except with x86 compatibility, stacked die, and copackaged memory. This unique heterogeneous processor, code-named Lakefield, is a major departure for Intel. In concept, it's like smartphone processors that use PoP technology to stack DRAM on an Arm Cortex big.LITTLE die, except that those processors fully integrate their I/O interfaces. PCs need more I/O than phones, necessitating a south-bridge chip. Intel's Foveros 3D-stacking technology, revealed last year, allows a 10nm multicore processor to piggyback on a 22nm south bridge. Even with the PoP DRAM, the Foveros stack squeezes into a 12mm x 12mm package only 1mm thick. [November 4, 2019]

Figure 1: Lakefield SoC exploded view.
Figure 2: Block diagram of Tremont's microarchitecture.
Figure 3: Tremont instructions-per-cycle gains over Goldmont+.
Figure 4: Tremont power efficiency versus the Sunny Cove CPU core.
Table 1: Tremont versus Goldmont+.
Table 2: Comparison of Intel's Tremont and Arm's Cortex-A77.

i.MX RT1170: Fastest Flashless MCU

NXP Pushes Cortex-M7 to 1.0GHz, Adds 400MHz Cortex-M4

NXP's new i.MX RT1170 is flashless, but it runs like the Flash. Thanks to an Arm Cortex-M7 operating at up to 1.0GHz and a 400MHz Cortex-M4 coprocessor, it will likely surpass 6,400 on the CPU-intensive EEMBC CoreMark test, which would set a record. Although the chip lacks flash memory, its 2MB of SRAM enables most software to run internally after startup. In addition to the mixed-signal peripherals and I/O interfaces expected of a 32-bit MCU, it has three Ethernet ports — including two with Audio Video Bridging (AVB) and one with Time-Sensitive Networking (TSN). To present attractive user interfaces, the RT1170 has a 2D GPU, graphics acceleration, LCD outputs, and interfaces for cameras, keypads, microphones, and audio. The speedy CPU can simultaneously perform voice and object recognition, and the coprocessor can act as a real-time controller or listen for wake words while the main CPU sleeps. In-line encryption protects sensitive code and data in external memory. The RT1170 accelerates popular crypto algorithms, and it can detect and respond to physical tampering. The chip also has numerous features for industrial and automotive control, and NXP will offer models rated for extreme temperatures. [October 21, 2019]

Figure 1: NXP i.MX RT1170 block diagram.
Figure 2: NXP's eIQ software-development environment.
Table 1: Comparison of three Arm-based MCUs and MCU-like SoCs: NXP's i.MX RT1170, STMicroelectronics' STM32H7, and Texas Instruments' Sitara AM3358.

Exynos 980 Integrates 5G Modem

Samsung First With 5G in Mid-Premium Smartphones

Samsung's first mobile processor to integrate a 5G modem potentially reduces the system cost and power consumption of mid-premium smartphones for emerging cellular networks. The Exynos 980 5G can deliver downlinks of 2,550Mbps and uplinks of 1,280Mbps by operating in 5G radio bands below 6.0GHz ("sub-6"). It's the first processor to employ Arm's new Cortex-A77, and it adds a fast neural engine and an 802.11ax baseband for compatibility with the latest Wi-Fi 6 networks. The Exynos 980 surpasses the Exynos 7885 mid-premium LTE processor and introduces new three-digit part numbers to the family. It targets $300–$500 phones in regions that are rolling out their 5G networks in the sub-6 bands, including Korea, China, and Europe. Sprint is the only compatible U.S. operator. The other U.S. majors — AT&T, T-Mobile, and Verizon — are deploying higher-frequency millimeter-wave (mmWave) technology. [October 7, 2019]

Figure 1: Exynos 980 5G block diagram.
Table 1: Comparison of three mid-premium smartphone processors: Samsung's Exynos 980 5G, MediaTek's Helio G90T, and Qualcomm's Snapdragon 730.

Power9 AIO Boosts DRAM Bandwidth

New IBM Server Processor Outruns x86 Memory Interfaces

IBM's newest Power9 processor introduces a high-bandwidth DRAM interface and improved OpenCAPI 4.0 protocol while retaining the family's other stellar features. The Power9 AIO (Advanced I/O) offers the industry's highest memory bandwidth, outdoing Intel's latest Xeon Platinum chips and AMD's new Epyc 7002 chips. It's scheduled to begin production late next year. For the most part, it resembles previous 24-core Power9 chips that employ the quad-thread SMT4 CPU core. What's new is a different I/O ring surrounding the cores and caches. Earlier Power9 processors come in two flavors: "scale-out" models with integrated DRAM controllers for industry-standard DIMMs, and "scale-up" models that replace the DRAM controllers with proprietary high-speed links to external controllers that work with standard DIMMs. The Power9 AIO introduces a third option: a new Open Memory Interface (OMI) for special DIMMs that integrate a new memory controller alongside standard DRAMs. [September 23, 2019]

Figure 1: Power9 AIO's Open Memory Interface (OMI).
Figure 2: OMI DIMM versus standard DIMM.
Figure 3: Power9 AIO die photo.
Table 1: Comparison of high-end server processors: IBM's Power9 AIO, AMD's Epyc 7742, and Intel's Xeon Platinum 8280.

Upmem Embeds Processors in DRAM

Custom DIMMs Bring In-Memory Computing to Standard Servers

Organic brains don't partition thinking and memory in separate hemispheres, and some computer scientists think electronic brains shouldn't either. In-memory computing is a frontier technology that relieves the CPU-DRAM bottleneck by integrating both functions on the same chip. The main obstacles are physical: DRAM is difficult to integrate in a conventional logic process, and logic performs poorly in a conventional DRAM process. French startup Upmem tackles this dilemma by using massive parallelism to overcome the physical limitations of embedding logic in a DRAM chip. Its unique data processing units (DPUs) run at only 500MHz, but each can execute 24 hardware threads. Each chip has eight DPUs and 4Gbits of DRAM. Sixteen chips populate a memory module that's compatible with industry-standard DIMM slots. Thus, each processor-in-memory (PIM) module has 128 DPUs and 8GB of DRAM, and it can work alongside the usual DIMMs or replace them altogether. A fully populated Intel Xeon system with six DRAM channels (12 PIM modules) could amass 1,536 DPUs, 36,864 threads, and 96GB of main memory. [August 26, 2019]

Figure 1: Upmem processor-in-memory (PIM) module.
Figure 2: Block diagram of an Upmem DRAM chip.
Figure 3: Adapting 64-bit operands to an 8-bit data bus.
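
The totals in the Upmem abstract follow from straightforward multiplication. The Python sketch below reproduces them. The module count of 12 is an assumption derived from the stated totals (1,536 DPUs at 128 per module, or 96GB at 8GB per module), because the abstract does not say how the modules are distributed across the six channels.

```python
# Sketch: reproduce the Upmem capacity arithmetic from the abstract.
# Per-chip and per-module figures come from the text; the system-level
# module count (12) is derived from the stated totals, not stated directly.

DPUS_PER_CHIP = 8
THREADS_PER_DPU = 24
DRAM_GBITS_PER_CHIP = 4
CHIPS_PER_MODULE = 16

def module_totals():
    """Return (DPUs, DRAM in GB) for one PIM module."""
    dpus = DPUS_PER_CHIP * CHIPS_PER_MODULE                   # 8 * 16 = 128
    dram_gb = DRAM_GBITS_PER_CHIP * CHIPS_PER_MODULE // 8     # 64Gbit = 8GB
    return dpus, dram_gb

def system_totals(modules):
    """Return (DPUs, hardware threads, DRAM in GB) for a given number of modules."""
    dpus, dram_gb = module_totals()
    total_dpus = dpus * modules
    return total_dpus, total_dpus * THREADS_PER_DPU, dram_gb * modules

if __name__ == "__main__":
    print(module_totals())      # (128, 8)
    print(system_totals(12))    # (1536, 36864, 96)
```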

Ice Lake Debuts 10nm, New Cores

But Lower Clock Speeds Cool Traditional Performance Gains

Intel's historic forward march in processor performance has finally hit a wall: a formidable barrier, despite measuring only 10 nanometers thick. The first real products manufactured in the company's long-delayed 10nm FinFET technology are 11 "Ice Lake" notebook processors that generally run at lower clock speeds and consume at least as much power as their 14nm predecessors. Even the new "Sunny Cove" CPU cores in these 10th Generation Core processors can't always overcome the clock-speed slippage. Fortunately, the new chips offer faster graphics, better hardware accelerators, greater memory bandwidth, and other features that probably matter more to typical users. Also, some models have more CPU cores. The 10nm transistors are slower than the transistors on existing 14nm chips but are smaller, enabling higher integration on the processor die. One beneficiary is the much larger Gen11 GPU, which Intel says is about 2x faster than its previous integrated graphics. Another is the industry's first integrated Thunderbolt 3 controller. A copackaged 14nm south-bridge chiplet adds more new features: an 802.11ax (Wi-Fi 6) controller and two voltage regulators for the processor and the chiplet itself. On balance, users can praise Ice Lake for the new goodies its 10nm technology allows even while puzzling over its retrograde CPU speed and power efficiency. [August 9, 2019]

Figure 1: Ice Lake block diagram.
Figure 2: Ice Lake versus 8th Generation notebook processors.
Table 1: Comparison of Intel's Core i7-1068G7 versus AMD's Ryzen 7 Mobile 3750H.

ST Accelerates Motor Control

New STM32G4 MCUs Optimize Trig and Filter Functions

STMicroelectronics is shipping new Arm-based MCUs with unique math accelerators for controlling electric motors, industrial machines, and digital power supplies. The STM32G4-series is also unusually well equipped with mixed-signal peripherals, such as analog-to-digital converters (ADCs), digital-to-analog converters (DACs), pulse-width modulators (PWMs), comparators, and op amps. Integrated flash memory, SRAM, security, and I/O interfaces round out these devices. The CPU is a 32-bit Cortex-M4F that can operate at 170MHz, which is typical for an MCU in this class. Some models have up to 512KB of flash memory, up to 160KB of SRAM, and an extended temperature range. They're power efficient in active and standby modes, and they come in a variety of packages from 5mm to 14mm. Our survey of distributors found unit prices that will likely fall to the $2–$5 range for 1,000-piece volumes. [July 29, 2019]

Figure 1: Block diagram of the STM32G4 microcontroller.
Table 1: Comparison of three Arm-based MCUs for mixed-signal applications: the STM32G4, NXP LPC54S018, and Renesas Synergy S5D3.
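
The "trig" acceleration in the STM32G4 subtitle refers to the chip's CORDIC coprocessor, which evaluates functions such as sine and cosine with iterative shift-and-add steps instead of multiplications. The following is a floating-point Python sketch of the textbook CORDIC rotation mode. It is offered only to show how the technique works: ST's hardware operates in fixed point, and its precision, iteration count, and register interface are not modeled here.

```python
import math

def cordic_sin_cos(theta, iterations=24):
    """Return (sin, cos) of theta via CORDIC rotation mode.

    Valid for |theta| <= pi/2; the elementary angles sum to about 1.743 rad,
    which bounds the convergence range.
    """
    # Elementary rotation angles atan(2^-i); hardware would keep these in a small table.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Each micro-rotation stretches the vector slightly; precompute the combined correction K.
    k = 1.0
    for i in range(iterations):
        k /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    # Start from (K, 0) so the gain cancels, then drive the residual angle z toward zero.
    x, y, z = k, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        # Multiplying by 2^-i stands in for a right shift in fixed-point hardware.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y, x

if __name__ == "__main__":
    s, c = cordic_sin_cos(math.pi / 6)
    print(f"sin = {s:.6f}, cos = {c:.6f}")
```

Each iteration adds roughly one bit of precision, so a hardware implementation trades iteration count against latency.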

Qualcomm Upgrades Low-End Phones

Qualcomm 215 Gains 64-Bit CPUs, NFC, 802.11ac, and Dual Cameras

Premium smartphones costing $1,000 or more attract the lust and glory, but low-cost phones for emerging markets are selling faster. Most smartphones sold this year will retail for less than $200. To deliver more features at even lower prices — $75 to $125 — Qualcomm is upgrading its 2xx-series chipset. For the first time, the company's entry-level processors will have 64-bit CPUs, 802.11ac Wi-Fi, and dual image signal processors (ISPs) — plus support for an HD+ display, full-HD video capture, a 13-megapixel camera (or dual 8MP cameras), and dual SIM cards. The new Qualcomm 215 platform supersedes the four-year-old Snapdragon 212. (The company now drops the "Snapdragon" brand from its entry-level tier.) We expect the 215 to begin volume production this quarter, followed by the first phones within three months. [July 22, 2019]

Figure 1: Block diagram of the Qualcomm 215 mobile platform.
Table 1: Comparison of three low-cost smartphone processors: the Qualcomm 215, MediaTek Helio A22, and Unisoc SC9832.

ST's Speed-Demon MCUs

New Dual-Core STM32H7 Microcontrollers Surge to 480MHz

STMicroelectronics is shipping the fastest Arm-based microcontrollers yet seen: dual-core speed demons whose Cortex-M7 main CPU can reach 480MHz. The new STM32H7 chips also have a Cortex-M4F coprocessor that can run at 240MHz. Additional features include up to 2MB of flash memory, up to 1MB of SRAM, numerous peripherals and I/Os (including Ethernet, USB, LCD, camera, and audio interfaces), secure boot, cryptography acceleration, and 2D graphics. These chips are among the most fully stuffed 32-bit MCUs on the market, yet they sprint like Usain Bolt. The new dual-core models extend the single-core STM32H7 line that began shipping in 2017. [July 15, 2019]

Figure 1: Dual-core STM32H7 block diagram.
Table 1: Comparison of three Arm-based MCUs and SoCs: the STMicroelectronics dual-core STM32H7, the NXP i.MX RT1064, and the Texas Instruments Sitara AM3358.

Intel Refreshes Xeon E Family

New E-2200 Processors Boost Core Counts, Clock Speeds

Intel's new Xeon E-2200 processors replace the E-2100 line that was new only last fall. Using the same "Coffee Lake" design, they raise the maximum core count from six to eight, add a corresponding amount of L3 cache, and nudge CPU clock speeds higher. One new embedded model halves power consumption. Otherwise, their power, prices, and other features remain the same, so the new products are better values. Also, they fill a market niche for which AMD offers less competition. Intel promotes the Xeon E family mainly for desktop and mobile workstations and small-office servers. They're basically PC processors with error correction (ECC) on the DRAM bus, so they cost a little more than PC chips that use the same 14nm++ die. The new E-2200s and previous E-2100s have the same memory controllers, I/O interfaces, packages, and pinouts. In all, Intel announced 21 new chips — mostly server/workstation models, plus a few mobile and embedded variants. Five models have eight cores, whereas the previous series peaked at six. The low-end models have only four cores and disable Hyper-Threading. [July 1, 2019]

Table 1: Intel Xeon E-2200 versus E-2100 series.
Table 2: Xeon E-2200 mobile versus E-2100 mobile.
Table 3: Comparison of Intel and AMD processors with ECC-protected memory: Xeon E-2288G versus Ryzen 7 Pro 2700X, and Xeon E-2278GEL versus Epyc Embedded 3201.

Silicon Labs Upgrades Wireless MCUs

New EFR32 Series 2 Chips Improve Performance and IoT Security

Mindful of the embarrassing security breaches that plague first-generation IoT devices, Silicon Labs is girding its newest wireless microcontrollers with hardier security hardware. Secure boot, cryptography acceleration, side-channel defenses, a secure debug port, and a true-random-number generator (TRNG) are among the improvements. The new EFR32xG21 chips in the Wireless Gecko Series 2 family are also the first Silicon Labs products to adopt Arm's Cortex-M33, and they have enhanced 2.4GHz radios for popular IoT protocols. Shipping in volume since April, these 32-bit wireless MCUs target line-powered IoT. The company is also developing battery-friendly models that will inherit the power-saving features of its Wireless Gecko Series 1 family. Already, the new chips rank among the lowest-power wireless MCUs on the market. In active mode (albeit with their radios silent), they draw only 51 microamps per megahertz. And they're tiny, cramming all their features into a 4mm surface-mount QFN package with only 32 pins. [June 17, 2019]

Figure 1: Wireless Gecko Series 2 block diagram.
Table 1: EFR32 Series 2 wireless microcontrollers.
Table 2: Comparison of four wireless MCUs for IoT: Silicon Labs EFR32MG21, NXP Kinetis K32W0x, STMicroelectronics STM32WB, and Texas Instruments SimpleLink CC1352R.
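
Quoting active current in microamps per megahertz lets designers scale the figure to their own clock rate. A rough Python sketch, assuming current scales linearly with frequency (in practice the relationship is only approximately linear, and the 51µA/MHz rating excludes the radio). The clock rates are illustrative inputs, not EFR32xG21 specifications.

```python
# Hypothetical estimate: active CPU current from a uA/MHz rating.
# 51 uA/MHz is the figure quoted in the abstract (radio silent); the clock
# rates in the loop are illustrative, not datasheet values.

UA_PER_MHZ = 51

def active_current_ma(clock_mhz, ua_per_mhz=UA_PER_MHZ):
    """Estimate active current in milliamps, assuming linear scaling with clock."""
    return ua_per_mhz * clock_mhz / 1000

for mhz in (10, 40, 80):
    print(f"{mhz} MHz -> {active_current_ma(mhz):.2f} mA")
```

At a hypothetical 40MHz, for example, the estimate is about 2mA.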

Achronix Debuts FPGAs for AI

Speedster7t Family Accelerates Machine Learning and Bandwidth

As promised last year, Achronix is using its embedded-FPGA technology to build a new FPGA family optimized for machine learning and data throughput. To compete with Intel and Xilinx FPGAs in data centers, the new Speedster7t family employs faster DSP blocks, optional GDDR6 memory, 400 Gigabit Ethernet, PCI Express Gen5, 7nm FinFETs, and a custom on-chip network. Relative to competing products, its DSPs are more optimized for machine learning (ML) and its I/O interfaces are more geared to high bandwidth. One tradeoff, however, is that it lacks the Arm CPUs that other advanced FPGAs integrate for general embedded applications. Differentiation is vital if Achronix is to avoid the chronic crashes that have thwarted other attempts to challenge the FPGA duopoly of Xilinx and Intel (Altera). Speedster7t should enter production in 2020, when both major vendors are rolling out their own next-generation products manufactured in the latest technologies. [June 3, 2019]

Figure 1: Speedster7t block diagram.
Figure 2: Speedster7t's machine-learning processors (MLPs).
Table 1: Achronix Speedster7t FPGAs.
Table 2: Comparison of three FPGAs for machine learning: Achronix Speedster7t 6000, Intel Agilex AGI-027, and Xilinx Versal VC1902.

Ambarella Expands Vision to ADAS

CV2-Series Vision Processors Include Deep-Learning Accelerator

Driverless vehicles seem irresistible to chip vendors hungry for emerging markets that established players don't yet rule. One new contender is Ambarella, a Silicon Valley-based company with long experience in vision processing. Its SoCs have appeared in security cameras, GoPro action cameras, DJI drones, dash cams, electronic mirrors, and other vision products. In 2015, it acquired an Italian company with additional machine-vision expertise plus autonomous-driving technology. Given this background, it's no wonder Ambarella is sampling its first vision processors for advanced driver-assistance systems (ADAS). The CV2AQ is the flagship of the new CV2-series. It integrates four Arm Cortex-A53 CPUs for application software, a deep-learning accelerator for vision processing (CVflow), a stereo-vision coprocessor, an image signal processor (ISP), video I/O, audio I/O, and various other interfaces and peripherals. It's designed for semiautonomous driving at ADAS Level 2 or 3. (Levels 4 and 5 denote high and full automation, respectively.) [May 20, 2019]

Figure 1: Ambarella CV2AQ block diagram.
Figure 2: Ambarella's real-time object-detection performance.
Table 1: Ambarella CV2AQ vision processor for ADAS.
Table 2: Comparison of three vision processors for ADAS: Ambarella's CV2AQ, Intel's Mobileye EyeQ4, and the Renesas R-Car V3H.

AMD Accelerates Ryzen Embedded

New R1000 Processors Boost Dual-Core Clock Speeds

Squeezing a bit more performance from its 14nm Zen CPUs, AMD is sampling two new Ryzen Embedded processors with integrated graphics and networking. These R1000-series chips — or accelerated processing units (APUs), as the company calls them — target casino games, digital kiosks, industrial machines, thin clients, and other embedded systems that need relatively fast GPUs and Ethernet connectivity. Both dual-core models are scheduled for production this quarter. The new R1505G and R1606G differ only in clock speed and price. The former operates its dual CPUs at 2.4GHz (base clock) to 3.3GHz (turbo); the latter nudges those frequencies to 2.6GHz and 3.5GHz, respectively. Whereas these differences are only about 6–8%, the R1606G's GPU is 20% faster: 1.2GHz versus 1.0GHz. AMD rates both processors at 15W thermal design power (TDP). [May 6, 2019]

Table 1: AMD Ryzen Embedded R1000-series.
Table 2: Comparison of three x86 embedded processors with graphics: AMD's Ryzen Embedded R1606G, Intel's Core i5-7300U, and Intel's Atom x7-E3950.
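
The percentage gaps quoted in the R1000 abstract can be checked directly from the listed clock speeds. A short Python sketch:

```python
# Sketch: check the R1505G vs. R1606G clock-speed deltas quoted in the abstract.

clocks_ghz = {
    # feature: (R1505G, R1606G)
    "CPU base": (2.4, 2.6),
    "CPU turbo": (3.3, 3.5),
    "GPU": (1.0, 1.2),
}

def pct_gain(old, new):
    """Percentage increase from old to new."""
    return (new - old) / old * 100

for name, (r1505g, r1606g) in clocks_ghz.items():
    print(f"{name}: +{pct_gain(r1505g, r1606g):.1f}%")
```

It yields about 8.3% (base), 6.1% (turbo), and 20.0% (GPU), consistent with the 6–8% and 20% figures in the abstract.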

Horizon Robotics Eyes ADAS

Chinese Startup Offers Vision Chips for Vehicles and Smart Cameras

Argus was a giant of Greek myth with a hundred all-seeing eyes, but he's eclipsed by the stare of today's ubiquitous security cameras. Problem is, there aren't enough people to monitor what those millions of cameras can see. Solution: smart cameras that can recognize faces, objects, and unusual activity before summoning an operator. The same technology allows vehicles to drive themselves with little or no human control. One rising contender in this field is Horizon Robotics, a four-year-old Chinese startup that has raised $700 million, the largest sum of any current semiconductor startup and among the most ever raised by a chip company. So far, Horizon has produced two smart-vision processors, a smart camera, and a single-board camera platform for smart vehicles. The Journey 1.0 and Sunrise 1.0 processors employ a proprietary machine-learning subsystem to find faces and other objects in video streams. Horizon says these chips can identify up to 200 objects per video frame with less than 30 milliseconds' latency per frame. [April 22, 2019]

Figure 1: Block diagram of the Journey 1.0 processor for smart vehicles.
Table 1: Horizon Robotics smart-vision processors.
Table 2: Comparison of three smart-vision processors: Horizon Robotics' Sunrise 1.0, Bitmain's Sophon BM1880, and Kendryte's K210.

Intel 10nm FPGAs Add Chiplets

Next-Generation Agilex Devices Can Copackage Custom Logic

Intel's next-generation Agilex FPGAs diverge from Xilinx's new Versal devices in important ways. Although both companies want to expand the FPGA market by augmenting their programmable logic with faster processing and I/O, Intel favors multichip packages tailored for particular applications or customized for special customers. Xilinx, which disclosed its Versal family last year, favors more chip-level integration and preconfigured product tiers. Both vendors plan to sample their new devices later this year and begin volume shipments in 2020. Agilex (code-named Falcon Mesa) is Intel's first new FPGA family since acquiring Altera in 2015. Intel will use its new 10nm FinFET process and copackaging technology to build FPGAs that combine a programmable-logic chip with one or more chiplets. Some Stratix FPGAs and x86 processors already employ this packaging technology, but Agilex expands it and allows customers to develop their own chiplets. [April 8, 2019]

Figure 1: Agilex conceptual block diagram.
Table 1: Intel Agilex F-, I-, and M-series FPGAs.

i.MX8 Nano Cuts Multimedia Power

New NXP Processors Offer Lower Power Than i.MX8 Mini

Performance, power, and cost are tradeoffs — pick two out of three to optimize. With its new i.MX8M Nano application processors, NXP reduces power consumption and cost at the expense of performance — but not much. For less than 1.5W and $10, customers get an SoC with up to four Arm Cortex-A53 application CPUs, a Cortex-M7 real-time controller, a 3D GPU, and more than 1MB of internal memory. Scheduled for volume production this year, these chips will extend the i.MX8M family's reach into low-cost embedded systems for the consumer, commercial, medical, and industrial markets. [March 25, 2019]

Figure 1: Block diagram of NXP's i.MX8M Nano Quad processor.
Table 1: NXP's i.MX8M Nano series.
Table 2: Comparison of two low-power embedded processors for audio: NXP's i.MX8M Nano Quad Lite and Synaptics' AudioSmart AS-390.

ST Debuts Its First Application SoCs

STM32MP1 Processors Go Beyond Microcontrollers

Relatively few embedded processors integrate an application CPU with a 3D GPU and a real-time microcontroller core. Even fewer can operate in the subwatt range when running full tilt. STMicroelectronics is joining this exclusive club with its new STM32MP1 family, which extends the existing STM32 microcontrollers into the realm of full-fledged SoCs. The superset design is the STM32MP157, which features two Arm Cortex-A7 CPUs for application software, a Cortex-M4F coprocessor for real-time control, a VeriSilicon 3D GPU, and numerous on-chip memories, peripherals, and I/O interfaces. The STM32MP153 drops the GPU and its display interface, and the STM32MP151 drops those features, one Cortex-A7, and two I/O ports. All began production in February. Even the top-end model typically consumes only 500mW. ST is targeting general-purpose embedded systems that would otherwise employ a separate application processor for the high-level software and an MCU for real-time control. [March 11, 2019]

Figure 1: STM32MP157 block diagram.
Table 1: Comparison of three embedded SoCs for applications and real-time control: ST's STM32MP157, NXP's i.MX7 Dual-10SC, and Texas Instruments' Sitara AM4378.

Synaptics Pairs Ears With Brains

AS-3xx SoCs Apply Neural-Net Acceleration to Speech Recognition

Smart speakers are really smart microphones, but the industry prefers the less spooky description. They're actually a little less Orwellian when processing voice commands locally (at the network edge) instead of remotely (in the cloud). Partly to protect privacy, and partly to reduce network latency, the newest embedded processors for smart speakers and other voice-activated devices are adding AI acceleration to offload speech recognition from the servers at distant data centers. After listening closely to this growing customer base, Synaptics is sampling its new AudioSmart AS-3xx SoCs for voice assistants. The first to arrive is the AS-371, which began sampling in January. It has four Arm Cortex-A53 CPUs for application software, a proprietary security processor, and a proprietary neural-network engine for speech recognition. Three derivatives add a low-power Cortex-M33 core for always-on wake-word awareness but replace the neural engine with a Cadence Tensilica HiFi 4 DSP. The low-end model substitutes a single Cortex-M33 for the Cortex-A53s in the other chips; the M33 is a low-power controller that runs a small RTOS instead of Linux or Android. [March 4, 2019]

Figure 1: Synaptics AudioSmart AS-371 block diagram.
Table 1: Synaptics AudioSmart AS-3xx series.
Table 2: Comparison of four AI-capable SoCs for IoT devices: the Synaptics AudioSmart AS-371 versus Intel's Movidius Myriad X, and the Synaptics AudioSmart AS-320 versus NXP's i.MX RT600.

Intel Zaps Some Coffee Lake GPUs

New "F" Models for Desktop PCs Disable Integrated Graphics

Sometimes less is more, so Intel is shipping new PC processors without integrated graphics, following a similar move by AMD last year. Designated with an "F" suffix in the model name, these desktop processors in the 9th-Gen Core S-series employ the same die as existing Core S-series products (Coffee Lake) but disable the on-chip Intel GPU. In addition to competing with AMD's similarly configured Ryzen 2000-series processors, the new chips should run cooler, increase the effective capacity of Intel's overtaxed fabs, and attract gamers and content creators who prefer the higher performance of external graphics cards. Intel rolled out five of these processors in the Core i3, Core i5, Core i7, and Core i9 tiers. They have four to eight CPU cores, base clock frequencies of 2.9GHz to 4.0GHz, and thermal design power (TDP) ranging from 65W to 95W. Their microarchitecture is unchanged from other Coffee Lake designs. The flagship Core i9-9900KF, like its integrated twin, offers the highest boost frequency (5.0GHz) of any PC processor and is the only F-model with Hyper-Threading. [February 11, 2019]

Table 1: New F-model Intel Core S-series processors for desktop PCs.
Table 2: Comparison of Intel and AMD processors without GPUs: Intel's Core i9-9900KF versus AMD's Ryzen 7 2700X, and Intel's Core i5-9400F versus AMD's Ryzen 5 2600.

AMD Brings 7nm GPUs to PCs

[Brief Item]

AMD's new Radeon VII GPU for PCs handily outperforms its predecessor, the Radeon RX Vega 64. The company says Fortnite fans should see 25% more performance, and other popular games show 35% to 42% improvements. Video editors should see gains of up to 27%, and the consumer GPU scores 62% higher on OpenCL LuxMark. As the first 7nm graphics processor for PCs, the Radeon VII heralds a coming wave of 7nm GPUs that will quicken the competition with Nvidia and Intel. Specifications reveal that the consumer-market Radeon VII uses the same chip as in the Radeon Instinct MI50 and MI60 cards announced last November for data-center servers and high-performance computing. The crucial difference is that the Radeon VII avoids cannibalizing sales of the higher-price data-center products by disabling some acceleration logic for 64-bit floating-point math. Whereas the Radeon Instinct MI50 can execute 6.7 trillion floating-point operations per second (Tflop/s) on FP64 data, the Radeon VII manages only 3.52Tflop/s — a huge 47% reduction. (Originally it was 0.88Tflop/s, but AMD relented to customer demand and restored some of the performance.) [February 11, 2019]
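
Both FP64 comparisons in the Radeon VII item reduce to simple ratios. A Python sketch using only the figures given there:

```python
# Sketch: reproduce the FP64 throughput comparison quoted in the item.

MI50_FP64_TFLOPS = 6.7              # Radeon Instinct MI50
RADEON_VII_FP64_TFLOPS = 3.52       # Radeon VII as shipped
RADEON_VII_ORIGINAL_TFLOPS = 0.88   # Radeon VII's originally announced FP64 rate

# Percentage reduction relative to the data-center MI50.
reduction = (MI50_FP64_TFLOPS - RADEON_VII_FP64_TFLOPS) / MI50_FP64_TFLOPS * 100
# How much AMD raised the consumer card's FP64 rate after customer pushback.
restored = RADEON_VII_FP64_TFLOPS / RADEON_VII_ORIGINAL_TFLOPS

print(f"Reduction vs. MI50: {reduction:.0f}%")
print(f"Restored FP64 rate: {restored:.0f}x the original")
```

The shipping Radeon VII therefore delivers roughly 53% of the MI50's FP64 rate, and AMD's change quadrupled the originally announced figure.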

Helio P90 Raises Mid-Premium Bar

MediaTek Surges in Smartphone CPU, GPU, and AI Performance

Although overall smartphone sales are stagnant, vendors are still investing in the growing mid-premium segment. MediaTek's new Helio P90 does its part by boosting performance across the board — particularly for AI processing, which enables vendors to add more smarts to their phones. By targeting handsets that sell for $300 to $500, the Helio P90 brings some premium features to this lower price tier. It cranks up the CPUs, GPU, AI engine, camera interface, video accelerators, LTE baseband, and Wi-Fi connectivity. Only the display output and other I/O interfaces remain the same. Despite the extensive redesign, MediaTek is manufacturing the P90 in the same low-cost 12nm technology as its two predecessors. Slated to ship in phones in 1Q19 or 2Q19, the P90 sets a new standard in mid-premium features and performance that competitors will be hard pressed to beat this year. [January 28, 2019]

Figure 1: Helio P90 block diagram.
Table 1: MediaTek processors for mid-premium and midrange smartphones.
Table 2: Comparison of processors for mid-premium smartphones: MediaTek's Helio P90, Qualcomm's Snapdragon 710, and Samsung's Exynos 9610.

Ryzen Mobile Rises to 12nm

AMD's U-Series and H-Series Laptop Processors Boost Clock Speeds

Amid flat laptop shipments, PC vendors hope gaming notebooks will attract more users willing to pay higher prices for higher performance. AMD's new second-generation Ryzen Mobile processors attack Intel's dominance in this specialized segment while broadening the choices for mainstream users. Although they introduce no new features relative to the first generation, they nudge clock speeds upward and provide higher power/performance options for notebooks that can dissipate the additional heat. AMD is exploiting Intel's long-delayed move to 10nm technology by manufacturing most of its new Ryzen Mobile products in a 12nm FinFET process. To gain even more clock-frequency headroom, the new H-series raises the TDP to 35W — more than twice that of the U-series. And in a bid for bargain seekers, the new A-series drops the TDP to only 6W and uses trailing-edge 28nm CMOS technology to offer low-power, low-cost chips for Chromebooks. [January 14, 2019]

Table 1: AMD's second-generation Ryzen Mobile processors.
Table 2: AMD versus Intel U-series processors for mainstream laptops.

Year in Review: Embedded Processors Embrace AI

Driverless Cars and AI Are the New Hype Heroes; IoT Resurges

Artificial intelligence was the trending technology in 2018, but only some of it was real. The rest was ... artificial. Perhaps a good buzzword for 2019 would be AAI to call out the artificial artificial intelligence. Despite the marketing hype, embedded processors and FPGAs made real progress toward integrating neural networks and machine learning. Established vendors joined the fray with lavishly funded startups from all over the globe.

IoT, initially overhyped, is again surging. In 2018, vendors focused on integrating radios with microcontrollers and adopting low-bandwidth wireless standards. Problem is, there are too many standards. IoT processors are driving other trends called "Industry 4.0" and "Industrial IoT," which introduce or update factory networks.

Costly and embarrassing data breaches continue to plague the industry, spurring more effort to strengthen security as the trouble attracts uncomfortable attention from politicians and regulatory bodies. The industry consolidation that ran unchecked in recent years is also encountering regulatory resistance.

Autonomous-vehicle technology, a major trend in 2017, continued accelerating in 2018 despite some tragic setbacks. Uber achieved the dubious milestone of inflicting the industry's first pedestrian fatality, and Tesla didn't help by overselling its "Autopilot" lane guidance — with equally fatal results.

We also saw some FPGA action in 2018. Xilinx announced its next-generation Versal chips, which are FPGA/SoC hybrids, and startup Efinix began sampling new Trion devices that employ innovative logic cells. Intel acquired eASIC to revive structured ASICs as complementary products to its FPGAs.

Generally, 2018 was a good year for established vendors and startups that rode these trends. We expect some winnowing of AI and autonomous-vehicle projects in 2019 as the established players suffer some market disruption from the startups, and as some startups encounter the harsh realities of the market. [December 31, 2018]

Figure 1: Embedded-processor industry-consolidation update, 2018.
Figure 2: Mythic flash-based neural-network accelerator.
Figure 3: Spiking neural network.
Sidebar: AI-Related Embedded-Processor Events of 2018
Sidebar: IoT-Processor Events of 2018
Sidebar: Other Embedded-Processor Events of 2018

New MCUs Embrace Cortex-M33

NXP and ST Bring TrustZone to Their 32-Bit Microcontrollers

NXP and STMicroelectronics are sampling the first microcontrollers to use Arm's Cortex-M33, which adds TrustZone security to the 32-bit Armv8-M architecture. Designed for embedded systems that need higher security than conventional MCUs can promise, these chips separate bootstrap loaders, operating systems, and application programs into trusted or untrusted partitions, privileged or not. Additional security features include secure boot, cryptography acceleration, true-random-number generators, tamper detection, and encrypted memory. NXP's new LPC5500 family includes some dual-core models, whereas ST's new STM32L5 family is single-core only. But the dual-core NXP parts have only one core that implements TrustZone and other Cortex-M33 optional features. The second core is a coprocessor that offloads untrusted and unprivileged tasks from the main CPU. Both companies target consumer, commercial, and industrial systems as well as IoT devices, although these chips lack integrated radios. [December 17, 2018]

Figure 1: TrustZone task isolation in Armv8-M.
Table 1: STM32L5 power modes.
Table 2: Comparison of NXP's LPC55S6x and ST's STM32L5 microcontrollers.
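
The TrustZone partitioning described in the Cortex-M33 abstract can be pictured as two independent axes: security state (secure or non-secure) and privilege level. The Python sketch below is a toy model under simplified, assumed rules. Non-secure code may not read or write secure memory but may call designated secure entry points (what Armv8-M calls Non-Secure Callable regions), and unprivileged code is kept out of privileged-only regions. The region names and the `nsc_entry` flag are hypothetical. On real Cortex-M33 parts these checks are enforced in hardware by the SAU/IDAU security-attribution logic and the MPU, not by software like this.

```python
from enum import Enum
from typing import NamedTuple

class Security(Enum):
    SECURE = "secure"
    NON_SECURE = "non-secure"

class Privilege(Enum):
    PRIVILEGED = "privileged"
    UNPRIVILEGED = "unprivileged"

class Region(NamedTuple):
    name: str
    security: Security
    privileged_only: bool
    nsc_entry: bool = False  # hypothetical flag marking a secure entry point

def may_access(sec, priv, region, is_call=False):
    """Toy check: may code running in state (sec, priv) touch this region?"""
    # Non-secure code never reads or writes secure memory; it may only
    # call into a secure region that is marked as an entry point.
    if sec is Security.NON_SECURE and region.security is Security.SECURE:
        if not (is_call and region.nsc_entry):
            return False
    # Independently, unprivileged code is kept out of privileged-only regions.
    if priv is Privilege.UNPRIVILEGED and region.privileged_only:
        return False
    return True

# Hypothetical memory map mirroring the partitions named in the text.
BOOTLOADER = Region("secure boot loader", Security.SECURE, privileged_only=True)
CRYPTO_API = Region("secure crypto entry", Security.SECURE, privileged_only=False, nsc_entry=True)
RTOS_KERNEL = Region("non-secure RTOS kernel", Security.NON_SECURE, privileged_only=True)
APP = Region("non-secure application", Security.NON_SECURE, privileged_only=False)

if __name__ == "__main__":
    ns_app = (Security.NON_SECURE, Privilege.UNPRIVILEGED)
    for region in (BOOTLOADER, CRYPTO_API, RTOS_KERNEL, APP):
        print(region.name,
              "read:", may_access(*ns_app, region),
              "call:", may_access(*ns_app, region, is_call=True))
```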

TI Samples Its First 64-Bit Arm

Sitara AM65x-Series Targets Industry 4.0 With Ethernet TSN

Seven years after the Armv8 debut, Texas Instruments is sampling its first products using the 64-bit architecture — and they're worth the wait. The new Sitara AM65x embedded processors are the most full-featured chips yet seen for next-generation "Industry 4.0" applications. They supersede the 32-bit Sitara AM57x-series and surpass in most ways the industrial-oriented processors from rival vendors. The flagship Sitara AM6548 has four 64-bit Cortex-A53 CPUs plus two 32-bit Cortex-R5F microcontroller cores with functional-safety features. It also integrates an Imagination Technologies GPU, video in/out interfaces, a dozen proprietary RISC cores in three real-time-control subsystems, and six Gigabit Ethernet (GbE) ports that implement the IEEE Time-Sensitive Networking standard. Cryptography acceleration, secure boot, PCI Express, USB, and scratchpad memories round out the mix. [December 3, 2018]

Figure 1: TI Sitara AM6548 block diagram.
Table 1: TI Sitara AM65x-series.
Table 2: Comparison of four embedded processors for industrial applications: TI's Sitara AM6548, NXP's Layerscape LS1028A, Intel's Atom C3338, and Intel's Atom x5-E3930.

Arteris Upgrades NoC for AI

[Brief Item]

Whereas human intelligence lives in neurons and synapses, artificial intelligence resides in transistors and wires. Arteris IP helps engineers design the wires. At the recent Linley Fall Processor Conference, the company announced several upgrades for its licensable network-on-a-chip (NoC) technology, including some AI features. The FlexNoC 4 Interconnect IP and AI Package are available now. Although the upgrades are useful for designing the interconnects on any chip, some features are particularly useful for processing deep neural networks (DNNs). For example, a new intelligent multicast feature enables a source (such as a CPU) to simultaneously broadcast the same data to multiple destinations (such as memory-mapped coprocessors). In a multicore AI chip, it can distribute DNN-training weights and image-map updates to multiple cores at different memory addresses. [November 19, 2018]
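
Conceptually, the multicast feature in the Arteris item means one source transaction is replicated so that each destination receives the data, possibly at a different address. The Python sketch below is a toy model of that idea only. The class and method names are hypothetical and do not describe FlexNoC's configuration interface or internal implementation; in hardware, the replication would happen inside the interconnect rather than in a software loop.

```python
from dataclasses import dataclass, field

@dataclass
class Target:
    """A memory-mapped destination, such as one core's local memory."""
    name: str
    memory: dict = field(default_factory=dict)  # address -> data

class ToyNoC:
    """Toy interconnect in which one write can fan out to a multicast group."""

    def __init__(self):
        self.groups = {}  # group id -> list of (target, base address)

    def define_group(self, group_id, members):
        self.groups[group_id] = list(members)

    def multicast_write(self, group_id, offset, data):
        """Deliver one source write to every group member at its own base + offset."""
        members = self.groups[group_id]
        for target, base in members:
            target.memory[base + offset] = data
        return len(members)  # number of deliveries

if __name__ == "__main__":
    cores = [Target(f"core{i}") for i in range(4)]
    noc = ToyNoC()
    # Each core's weight buffer sits at a different (hypothetical) address.
    noc.define_group("weights", [(core, 0x1000 * (i + 1)) for i, core in enumerate(cores)])
    delivered = noc.multicast_write("weights", offset=0x40, data=(0.5, -0.25))
    print(delivered, [sorted(c.memory) for c in cores])
```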

Imagination Cuts GNSS Power

[Brief Item]

It's midnight curfew. Your teenagers are safely home, but do you know where your cows are? Tracking farm animals is only one application for a low-power location device that employs a global navigation satellite system (GNSS). Others include cargo logistics, smart-city infrastructure, remote patient monitors, and any mobile IoT client. But conventional GNSS positioning may use too much energy for a small battery-powered tracking device. So Imagination Technologies is offering a new synthesizable hardware/software package that can integrate a low-power GNSS receiver in almost any chip. Imagination announced the Ensigma Series 4 GNSS package at the recent Linley Fall Processor Conference. It's licensable intellectual property (IP) for chip designers, based on previous Ensigma designs for Wi-Fi and Bluetooth. [November 19, 2018]

LS1028A Targets Cars and Factories

NXP Processor Favors Automotive and Industrial Over Networking

Targeting automotive and industrial applications, NXP plans to sample the Layerscape LS1028A embedded processor in December and begin volume production by mid-2019. At the recent Linley Fall Processor Conference, the company announced three new variants that omit some features to reduce cost and power. Two models are the first Arm-based processors in this family to integrate GPUs, and all are NXP's first to implement the IEEE Time-Sensitive Networking (TSN) standard for Ethernet. The LS1028A follows a new company strategy. Although NXP has always offered industrial processors and microcontrollers, networking was the main target for the high-performance QorIQ line. More recently, the convergence of wired and wireless communications with industrial automation is opening new opportunities while the company's sales decline in the network-infrastructure market. The new trend is called "Industrial IoT" or "Industry 4.0." NXP is riding this wave and is branding its new processors as Layerscape products instead of using the 10-year-old QorIQ brand, which is strongly associated with networking. [November 5, 2018]

Figure 1: NXP Layerscape LS1028A block diagram.
Table 1: Comparison of three embedded processors for industrial applications: NXP Layerscape LS1028A, Intel Atom C3338, and Intel Atom x5-E3930.

IBM Power9 Scales Up in Servers

Big-Memory Server Processors Reach the Merchant Market

IBM says the 12-core Power9 processor for scale-up servers is now available on the merchant market to system vendors and manufacturers. This model accesses DRAM through external buffer chips, which provide industry-leading memory bandwidth and capacity for enterprise servers that handle large workloads. It also offers industry-leading per-core integer throughput, I/O bandwidth, and glueless symmetric multiprocessing (SMP). By contrast, the scale-out Power9 processors that have been shipping for about a year integrate standard DDR4 DRAM controllers. They provide less bandwidth but are better suited to lower-cost systems (such as web servers) that handle threads with modest memory requirements. By offering Power9 products with both types of memory subsystems, IBM is targeting a wide range of servers with the same basic chip design. Power9 offers the best per-core performance of any server processor. On the SPEC CPU2017 benchmarks, a dual-socket system with 12-core processors easily beats AMD and Intel dual-socket systems that have equal or greater core counts. [October 22, 2018]

Figure 1: Power9 memory subsystems.
Figure 2: SPECint 2017 benchmark comparison.
Table 1: IBM Power9 processors.
Table 2: Comparison of high-end server processors: IBM Power9 scale-up, AMD Epyc 7601, and Intel Xeon Platinum 8180M.

Xilinx Versal Surpasses UltraScale+

Next-Generation Programmable Chips to Sample in Mid-2019

Xilinx plans to tape out its next-generation chips this quarter and begin sampling to major customers in mid-2019 using 7nm TSMC FinFET technology. At the recent Xilinx Developer Forum, the company also replaced the Everest code-name with the official brand: Versal, a portmanteau of "versatile" and "universal." Xilinx rejects the FPGA label in favor of a new generic moniker: adaptive compute acceleration platform (ACAP). Although we doubt other vendors will adopt that name, we agree these devices are full-fledged SoCs containing programmable logic, not just FPGAs with some SoC elements. Every Versal chip will have dual Arm Cortex-A72 application CPUs and dual Cortex-R5 real-time CPUs with floating-point units. Today, only the Zynq family integrates Arm cores, and they're less powerful Cortex-A53s and Cortex-R5s without FPUs. The Versal chips also have a new network-on-a-chip (NoC) that tightly binds all the processing elements, local memories, and I/O interfaces. Some Versal products will include the new software-programmable accelerators described at Hot Chips in August. Now, the company is calling them AI Engines — a marketing reference to their features for artificial-intelligence processing. [October 8, 2018]

Figure 1: Xilinx Versal SoC block diagram.
Table 1: Versal AI performance.
Table 2: Xilinx Versal AI Core Series and Prime Series.
Table 3: Comparison of Xilinx Versal and Zynq UltraScale+.

Fujitsu Raises Arm Over SPARC

Future Supercomputer Will Debut Custom 52-Core Arm Processors

SPARC isn't quite extinguished, but it's dimming. Although Fujitsu's product roadmap still shows a 7nm SPARC design in progress, the company's next-generation supercomputer switches to a custom 64-bit Arm processor built in the same technology. This powerful 52-core chip will appear in the mammoth Post-K supercomputer scheduled to debut in 2021. "Post-K" is a placeholder name for the machine that will supersede Fujitsu's famous K supercomputer, which in 2011 became the world's first to exceed 10 petaflop/s. The company has initial silicon samples of the chip and demonstrated a single-rack Post-K prototype at a supercomputer conference earlier this year. At last month's Hot Chips conference, Fujitsu said the new Arm-based A64FX processor delivers at least 2.7 double-precision teraflop/s of peak performance per chip, so it's about 21x faster than the Sparc64 VIIIfx chips in the K supercomputer. The company estimates Post-K will be about 100x faster than its predecessor, which contains 88,128 processors. Thus, Post-K could become the first exascale computer — a system capable of delivering at least one exaflop/s (one billion billion floating-point operations per second). [September 24, 2018]
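The per-chip and system-level ratios above can be checked with back-of-the-envelope arithmetic. This sketch assumes the Sparc64 VIIIfx peaks at 128 double-precision gigaflop/s (eight cores at 2.0GHz, eight flops per cycle), a figure not stated in this summary. It also applies Fujitsu's 100x estimate to K's theoretical peak, although the estimate may refer to application performance instead.

```python
# Back-of-the-envelope check of the A64FX and Post-K ratios.
# Assumption: Sparc64 VIIIfx peak = 8 cores x 2.0GHz x 8 DP flops/cycle.
A64FX_GFLOPS = 2_700                  # "at least 2.7 teraflop/s" per chip
SPARC64_VIIIFX_GFLOPS = 8 * 2.0 * 8   # assumed: 128 gigaflop/s per chip
K_PROCESSORS = 88_128                 # processors in the K supercomputer

# Per-chip speedup of A64FX over the K computer's processor.
per_chip_speedup = A64FX_GFLOPS / SPARC64_VIIIFX_GFLOPS

# Theoretical peak of K, converted from gigaflop/s to petaflop/s.
k_peak_pflops = K_PROCESSORS * SPARC64_VIIIFX_GFLOPS / 1e6

# Fujitsu's ~100x estimate applied to K's peak, in exaflop/s.
post_k_eflops = 100 * k_peak_pflops / 1e3

print(f"A64FX vs. Sparc64 VIIIfx: {per_chip_speedup:.1f}x")
print(f"K theoretical peak: {k_peak_pflops:.2f} petaflop/s")
print(f"100x K peak: {post_k_eflops:.2f} exaflop/s")
```

Under these assumptions, the per-chip ratio comes to about 21x. A 100x multiple of K's roughly 11.3-petaflop/s peak lands just above one exaflop/s, which is consistent with the exascale framing.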

Figure 1: Fujitsu A64FX block diagram.
Figure 2: A64FX pipeline diagram.
Table 1: Comparison of Fujitsu supercomputer processors: A64FX, Sparc64 XIfx, Sparc64 IXfx, and Sparc64 VIIIfx.

Xilinx Everest Outclimbs FPGAs

New Hardware/Software Programmable Engines Add Compute Power

Xilinx is redefining FPGAs with its next-generation Everest family. Instead of marketing them as FPGAs with embedded CPU cores, the company is pitching them as full-fledged SoCs augmented with programmable logic. They upend the traditional orientation of FPGAs by surrounding the programmable gates with more of everything: processing cores, hard logic, fast interconnects, and I/O interfaces. The new heterogeneous chip architecture — which Xilinx calls an adaptive compute acceleration platform (ACAP) — can boot and run as an SoC even without configuring the gates. At the recent Hot Chips conference, Xilinx focused on one new aspect: hardened compute cores tentatively called hardware/software programmable engines (PEs). Scheduled to tape out this year, Everest chips will be built in 7nm FinFET technology at TSMC. We estimate they'll begin volume production in 2H19 or 1H20, when they'll supersede today's 16nm UltraScale+ FPGAs as the new midrange and high-end models. [September 10, 2018]

Figure 1: Xilinx Everest conceptual block diagram.
Figure 2: Programmable-engine tiles.
Figure 3: Configuring the mesh as a data-flow graph.
Figure 4: The PE-to-FPGA interface.

Threadripper 2 Grows to 32 Cores

AMD's New Ryzen Threadripper Is Most Powerful PC Processor

Few people really need a 32-core 64-thread PC processor that can hit 4.2GHz, but so what? The folks at AMD will sell you one anyway, just because ... they can. And because Intel can't. So there. It's reason enough for AMD to introduce the world's most powerful PC processors — the second-generation Ryzen Threadripper family. The company positions two of the four family members for content creators who can pay $1,799 to get the 32-core 64-thread 2990WX or $1,299 to get the 24-core 48-thread 2970WX. Video editing, software development, and other compute-intensive tasks are the justification for these monster chips. The other two models are the 16-core 32-thread 2950X and the 12-core 24-thread 2920X, which list for $899 and $649, respectively. They target the high-end desktop PCs prized by avid gamers and by content creators who can't afford the costlier models. All these chips use the second-generation 12nm Zen+ CPU core that enhances the original 14nm Zen core. Although they are much pricier than mainstream PC processors, they're a bargain for their features and performance. [August 27, 2018]
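Normalizing the list prices by core count shows how the family is priced. This minimal sketch uses only the prices and core/thread counts quoted above. The dictionary layout and the `price_per_core` helper are illustrative, not anything AMD publishes.

```python
# Core counts, thread counts, and list prices (USD) quoted in the article.
MODELS = {
    "2990WX": (32, 64, 1_799),
    "2970WX": (24, 48, 1_299),
    "2950X": (16, 32, 899),
    "2920X": (12, 24, 649),
}


def price_per_core(price_usd: int, cores: int) -> float:
    """Return list price divided by core count."""
    return price_usd / cores


for name, (cores, threads, price) in MODELS.items():
    # Every model in the table runs two threads per core.
    assert threads == 2 * cores
    print(f"{name}: {cores}C/{threads}T, ${price_per_core(price, cores):.2f} per core")
```

All four models land between roughly $54 and $57 per core. As a result, the 32-core flagship costs about the same per core as the 12-core entry model.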

Figure 1: Ryzen Threadripper memory access.
Table 1: AMD's second-generation Ryzen Threadripper family.
Table 2: Comparison of high-end desktop processors: AMD's Ryzen Threadripper 2990WX versus Intel's Core i9-7980XE, and AMD's Ryzen Threadripper 2950X versus Intel's Core i9-7960X.

Microchip Debuts Dual-Core DSC

New dsPIC33CH Challenges 32-Bit Digital Signal Controllers

Microchip's new dsPIC33CH is an unusually capable 16-bit digital signal controller (DSC) that combines the functions of a microcontroller and DSP. Actually a family of more than 50 chips, the 33CH integrates two PIC CPUs in a master-slave configuration that enables the slave to continue operating even if the master reboots to recover from a fault. The new family targets embedded systems that need real-time control and signal processing. Examples include electric-motor controllers, server power supplies, automotive sensors, and small drones. Some models are certified for automotive temperatures (−40°C to +125°C), so they're suited to under-the-hood applications, such as fan controllers and pumps. Although the chips lack duplicate lockstep cores, the slave's ability to operate independently of the master provides greater functional safety than a single-core chip. [August 13, 2018]

Figure 1: Microchip dsPIC33CH block diagram.
Table 1: Comparison of three DSC MCUs: Microchip's dsPIC33CH, NXP's Kinetis K10, and STMicroelectronics' STM32L4.

Microchip L10/L11 Bolster Security

New SAM-Series Microcontrollers Debut Arm's Cortex-M23

Once the simplest of all processors, microcontrollers are now sprouting security features that even some big processors lack. In a way, it makes sense. Built into all manner of smart devices, MCUs are more vulnerable to physical tampering as well as remote attacks. And they typically operate in embedded systems outside data centers protected by IT managers. Microchip is the latest vendor to meet these challenges by strengthening the security of its products. The new 32-bit SAM L10 and SAM L11 families are the first shipping products to use the Arm Cortex-M23 CPU. They have similar internal memories, integrated peripherals, and I/O interfaces. They employ several defenses to foil attackers, yet they still boast low power consumption, low prices, and small packages. Both families have secure memory, unique chip identifiers, true-random-number generators (TRNGs), the ability to detect tampering and erase secure data, and side-channel-attack resistance on their data flash memory and secure SRAM. The L11 family has additional defenses, including secure boot, cryptography acceleration, a second memory-protection unit to support Arm's TrustZone, data scrambling for some internal memories, and more tamper resistance. [July 30, 2018]

Figure 1: SAM L10/L11 security defenses.
Table 1: Key parameters for Microchip's new microcontrollers.

NXP S32S Drives Autos in Lockstep

New Fault-Tolerant Automotive Processor Debuts Cortex-R52

Failure is not an option when a motor vehicle must stop. To ensure that a human driver or autonomous system always maintains control, the vehicle's processors must respond under any conditions, including faults that would cripple a conventional chip. So NXP has announced a fault-tolerant automotive processor that runs four pairs of CPUs in lockstep mode. It designed the S32S247 for "any system that starts, stops, or steers the vehicle." As the first member of the S32S family, the new chip is also the first announced product to use Cortex-R52, a synthesizable CPU that Arm designed specifically for critical control. This 32-bit core supersedes the eight-year-old Cortex-R5 and is the first implementation of the Armv8-R instruction-set architecture (ISA) announced in 2013. To host hypervisors, the R52 adds another privilege level and a second memory-protection unit (MPU). To isolate critical tasks, it can simultaneously run multiple real-time operating systems in virtual sandboxes, and it speeds up context switching and interrupt handling. [July 16, 2018]

Figure 1: NXP S32S247 automotive processor.
Figure 2: NXP FS6600 power-supply controller.
Table 1: Key parameters for NXP S32S247.

Ceva Upgrades IoT Platform

Dragonfly NB2 Implements Cat-NB2 Specification in 3GPP Release 14

New ways of enabling machines to wirelessly talk with each other are propagating as fast as blockchain startups. Ceva is staying abreast by introducing Dragonfly NB2, its second-generation platform for Narrowband IoT (NB-IoT) communications on cellular networks. Whereas the first-generation Dragonfly NB1 supported the Cat-NB1 specification in 3GPP Release 13, the new product supports the enhanced Cat-NB2 spec in Release 14. In fact, it appears to be the world's first implementation of this protocol. Dragonfly NB2 is an alternative to other low-power wireless technologies for IoT devices on wide-area networks (WANs), such as Cat-M1, Cat-M2, LoRa, Sigfox, and proprietary solutions. All these technologies make various tradeoffs in speed, cost, and power. Ceva is pitching Dragonfly NB2 for low-bandwidth, low-cost IoT devices that need the higher reliability and lower latency of a cellular network versus the uncertain connectivity in unlicensed RF spectrum. Examples include industrial machines, smart meters, asset trackers, environmental monitors, urban infrastructure, and smart farms. [July 9, 2018]

Figure 1: Dragonfly NB2 block diagram.
Table 1: Comparison of low-power wide-area (LPWA) wireless technologies.

Qualcomm Upgrades FSM for 5G-NR

[Brief Item]

Looking forward to the commercial debut of 5G-NR cellular networks in 2020, Qualcomm is readying its first small-cell base-station processors for the next-generation standard. Scheduled to sample next year, the FSM100xx-series is the first major upgrade of the FSM family since 2014. The new chips are designed for a range of wireless base stations, both indoors (e.g., homes and offices) and outdoors (e.g., dense urban areas and fixed-wireless installations). They also target remote radio heads (RRHs). Qualcomm is withholding most specifications for now but has disclosed some details. Like its existing base-station processors, the new products will be merchant-market ASSPs that handle both Layer 1 and Layer 2 functions by integrating DSP and CPU cores. Unlike some competing products, the company's FSM chips also integrate a digital front end (DFE) to reduce system cost. [June 25, 2018]

Qualcomm Targets High-End IoT

QCS605 and QCS603 Upgrade Image Processing for Computer Vision

Fish-eye lenses are great for fish, which must look in all directions for bigger fish that are hungry. But the distorted view is less than ideal for smart security cameras that must recognize human faces. That's one reason why Qualcomm has upgraded the image signal processors (ISPs) in its latest IoT chips to accelerate de-warping algorithms and other computer-vision tricks. Although the new QCS605 and QCS603 probably use the same die as the Snapdragon 710 smartphone processor, their improved ISPs and other features suit them to high-end IoT devices that need Wi-Fi and Bluetooth connectivity plus geolocation services. Like its Snapdragon forebears, the QCS605 is a fast low-power processor that bolsters its octa-core CPUs with an Adreno GPU, Hexagon DSP, and dual ISPs. Also like the Snapdragons, it uses Arm's Cortex-A75 and Cortex-A55. (Under its semicustom license, Qualcomm refers to these CPUs as the Kryo 300.) The QCS605 uses these 64-bit CPUs in the familiar big.LITTLE configuration, sporting two of the former and six of the latter. The lower-priced QCS603 has two slower A75 cores and only two of the little A55 cores. [June 18, 2018]

Figure 1: Qualcomm QCS605 block diagram.
Table 1: Qualcomm's new IoT processors.
Table 2: Comparison of three high-performance IoT processors: Qualcomm QCS605, MediaTek MT3620, and NXP i.MX 6ULL.

Tachyum Targets Data Centers

Startup Promises Do-It-All Chip for Data Centers, HPC, and AI

Extraordinary claims require extraordinary proof, but we'll have to wait for the proof. Tachyum — a startup based in Silicon Valley and Slovakia — has announced a "universal processor" called Prodigy. Scheduled to sample in early 2020, Prodigy is designed to run data-center software, machine-learning algorithms, and high-performance-computing (HPC) workloads. Tachyum claims it will deliver 5–10x more performance per watt than today's Intel Xeon processors. The startup is withholding many details until the Hot Chips symposium in August but shared some information with Microprocessor Report. To achieve its lofty performance goals, Tachyum designed a new 64-bit architecture that combines elements of RISC, CISC, and VLIW. It says this design will not only beat today's Xeons but also compete strongly with GPUs on machine learning. In sum, Tachyum is making many promises that require extraordinary effort to fulfill. [June 11, 2018]

Figure 1: Tachyum Prodigy block diagram.

MT3620 Runs Azure Sphere for IoT

MediaTek Adopts Microsoft's Security Engine and Embedded OS

Operating systems commonly target a specific processor architecture, but rarely do they dictate a specific chip design. Microsoft has more clout than most vendors, however, and its new Azure Sphere OS for IoT has special requirements that demand custom hardware. MediaTek's MT3620, currently sampling, is the first to meet these requirements, although we expect other processor vendors will follow. The MT3620 is an unusually powerful SoC for high-end IoT devices that employ Wi-Fi. It dedicates four 32-bit Arm CPUs and a 32-bit Andes CPU to various tasks, including a dual-band RF subsystem. But its secret sauce is a Microsoft security engine, code-named Pluton, licensed to MediaTek as embedded intellectual property (IP). Pluton has cryptography accelerators and a hardware-based root of trust — crucial elements for secure boot and authenticated firmware updates. The chip also has three one-time-programmable (OTP) memories for storing crypto keys and other security essentials. [May 28, 2018]

Figure 1: Block diagram of MediaTek's MT3620 IoT processor.
Table 1: Key parameters of MediaTek's MT3620.
Table 2: Comparison of the MediaTek MT3620 and NXP i.MX 6ULL.

New Wireless MCUs Bolster IoT

NXP, ST, and TI Add Multiprotocol Radios and Stronger Security

Bluetooth Low Energy sounds like a weary medical condition that requires a dentist appointment, but IoT devices that use the short-range wireless standard are rapidly gaining popularity. In response, microcontroller vendors are introducing more chips with BLE radios, eliminating a separate transceiver chip. They're also adding radios for the Thread and Zigbee standards, which use the same unlicensed 2.4GHz band as BLE. In addition, the latest products have faster CPUs, more on-chip memory, and stronger security. More like SoCs than MCUs, they'll soon replace less efficient multichip designs. The most recent debuts are from STMicroelectronics and Texas Instruments, which both rolled out wireless MCUs based on Arm's Cortex-M4F. These products will compete with similar chips from NXP, another major MCU vendor. All of them will help IoT designers reduce their bill of materials (BOM) while making smaller devices that can run longer on battery power. [May 7, 2018]

Figure 1: Block diagram of ST's STM32WB IoT microcontroller.
Table 1: ST's STM32WB microcontroller series.
Table 2: TI's new SimpleLink MCUs.
Table 3: Comparison of new wireless MCUs: NXP's Kinetis K32W0x-series, ST's STM32WB-series, and TI's SimpleLink CC1352R.

Efinix Samples Its First FPGAs

New Trion Family Delivers Greater Density and Lower Power

The first FPGAs from Efinix appear to deliver on the startup's promises of greater gate density and lower power consumption than competing devices. Although only the smallest members of the new Trion family are sampling now, they target a high-volume market segment. Larger chips based on the same novel technology are scheduled to sample later this year. Efinix has designed and patented new cells that combine programmable logic with routing channels and hubs. Conventional FPGAs have programmable logic blocks connected to a switched routing fabric. Efinix's Exchangeable Logic and Routing (XLR) cell can either perform the usual logic operations or work as a switch for the underlying fabric. The company claims that combining both functions in one cell reduces the fabric's physical area by 2-4x and cuts power in half. Also, the improved routing requires fewer metal layers, which reduces manufacturing costs. The first Trion FPGAs are the T4 and T8, based on the same die. According to Efinix, the first chips offer about twice the physical gate density and about the same standby power as similar FPGAs. Thus, they should undercut the cost of competing devices. [April 23, 2018]

Table 1: Efinix Trion FPGAs.
Table 2: Comparison of two low-power FPGAs: the Efinix Trion T4 and Lattice iCE40 Ultra 5LP4K.

Clockless Cortex-M3 Cuts Power

Eta Compute Offers Asynchronous CPU IP for IoT and AI

For decades, processor architects have been trying to beat the clock — the steady crystal heartbeat that regulates the synchronous logic in conventional circuits. Lower power is the reward of clockless asynchronous logic, but design complexities are the perennial obstacles. The latest company to tackle the challenge is Eta Compute, a Southern California startup. Its first product is EtaCore: an asynchronous implementation of Arm's Cortex-M3 that other companies can license for chip designs. In addition, the startup will soon announce its own EtaCore-based microcontrollers for IoT edge applications, plus some unusual neural-network software for them. It's an ambitious business plan — intellectual-property (IP) vendor, MCU supplier, AI-software provider — but the young company is exploring multiple strategies that intersect. Also, the trendy IoT and AI angles help attract funding and attention. [April 9, 2018]

Figure 1: EtaCore power consumption.
Figure 2: Clocked versus clockless logic.
Figure 3: Speech-recognition test.
Table 1: Eta Compute's low-power analog IP.

NXP Pushes i.MX8M Mini to 14nm

Lower-Cost Models Will Replace Some i.MX6 Processors

NXP is expanding its i.MX media-processor family with lower-cost i.MX8M Mini processors. Some of these newcomers will supersede existing i.MX6 products that are popular in consumer electronics and other embedded systems. These are also the company's first chips manufactured in Samsung's 14nm FinFET technology. Sporting one to four ARM Cortex-A53 CPUs plus a Cortex-M4F microcontroller core, these processors are scheduled to sample in 2Q18. They also integrate GPU cores for 2D and 3D graphics, and some have video acceleration. With their GPUs and Cortex-M4F coprocessor — which can serve as a sensor hub — they can enable a human/machine interface (HMI) on nearly any commercial, industrial, or consumer device. The Mini models with a video engine can encode and decode video streams for videoconferencing, surveillance cameras, and machine-vision inspection systems. Using the Cortex-A53s or Cortex-M4F for audio processing, they're suited to many digital-audio applications. [March 26, 2018]

Figure 1: NXP's quad-core i.MX8M Mini processor.
Table 1: NXP's i.MX8M Mini media processors.
Table 2: Comparison of two quad-core media processors: NXP's i.MX8M Mini and Synaptics' VideoSmart BG5CT.

NXP Refits i.MX and Kinetis for IoT

[Brief Item]

To expand its reach into the growing IoT market, NXP will stack radio modules on some i.MX application processors and is now sampling more-powerful Kinetis microcontrollers with fully integrated radios and new security features. These different approaches to wireless integration will give customers more choices while easing NXP's design challenges. For high-end IoT, some i.MX processors will use package-on-package (PoP) integration. They will expose ball contacts on top as well as on the bottom, so the radio module can ride piggyback. This "IoT-on-a-chip" combo is slightly thicker than a monolithic or copackaged product but simplifies and economizes the design and manufacturing. For smaller IoT devices, NXP unveiled the K32W0x MCUs — the next-generation wireless chips in the Kinetis family. They integrate CPUs, memory, and radios on a monolithic die but are less powerful than i.MX processors and support slower wireless technologies. [March 26, 2018]

AMD Embeds Graphics in Ryzen

New Zen-Based Embedded Processors Integrate Vega GPUs

AMD's first Zen-based embedded processors with integrated graphics are a major improvement over previous models and pose a serious challenge to Intel's integrated chips. Code-named Great Horned Owl, the Ryzen Embedded V1000 family is based on the Ryzen mobile-PC family ("Raven Ridge") introduced last year. The embedded chips have Ethernet ports, additional I/O interfaces, and the 10-year availability that many embedded customers require. Some have faster Zen CPUs than the mobile versions, and all are available now. Each one integrates a GPU core based on AMD's newest Vega architecture. The high-end V1807B has a Vega GPU with 11 compute units (CUs) — a far cry from the Radeon RX Vega graphics card, which has 64 CUs. Nevertheless, it easily beats Intel's integrated GPUs. The other V1000 models have fewer CUs and economize in other ways as well. [March 19, 2018]

Table 1: AMD's Ryzen Embedded V1000 family.
Table 2: Comparison of two embedded x86 processors with graphics: AMD's Ryzen Embedded V1807B and Intel's Xeon E3-1505Mv6.

Xeon D Soars With Skylake

D-2100 Family Offers Up to 18 CPUs and Optional Crypto Acceleration

Intel's Xeon D family is leaping to Skylake. The new Xeon D-2100 products now have Skylake-SP CPUs and raise the maximum core count from 16 to 18. Compared with most of the older Broadwell-based Xeon D processors, they also double the DRAM bandwidth, quadruple the DRAM capacity, double the number of 10-Gigabit Ethernet ports, and offer more PCI Express lanes and Serial ATA interfaces. Some models have QuickAssist cryptography acceleration, and all are shipping now for servers and embedded systems. In most respects, the D-2100 family is a major advance over the D-1500 Broadwell processors that shipped in 2015-2016. Essentially, they copackage a Xeon Scalable die with a C600-series south-bridge die (code-named Lewisburg). To avoid cannibalizing Xeon Scalable sales, they omit some features. Even so, they're powerful processors that serve a broader application spectrum than previous Xeon D products. [March 12, 2018]

Figure 1: Intel Xeon D-2191 block diagram.
Table 1: Intel Xeon D-2100 family.
Table 2: Comparison of three octa-core SoCs: Intel's Xeon D-2146NT, AMD's Epyc Embedded 3251, and Broadcom's BCM58808H.

AMD Embeds Epyc 3000

Epyc Embedded 3000 Family Adds Ethernet, South Bridge

After successfully introducing its Zen-based processors for desktops, laptops, and servers, AMD is now pushing Zen into the embedded market. The new Epyc Embedded 3000 family, code-named Snowy Owl, adds 10-Gigabit Ethernet (10GbE) and south-bridge interfaces to the PC chips that rejuvenated the company's fortunes last year. AMD also guarantees 10-year availability for the embedded models, which are pin compatible across the eight-member family. Epyc Embedded processors derive from the eight-core Zeppelin die in Ryzen Threadripper but add south-bridge interfaces to the die. Four models combine two of these dies in the same package, and the other four have one die. To maximize production yields, several models disable some of the Zen cores, the two-way multithreading, or both. Thus, the family members have 4, 8, 12, or 16 CPUs and 4–32 threads. Ranking among the most powerful SoCs on the market, they're suited to networking and communications infrastructure, data-center storage, PC-like embedded systems, and other tasks that need strong performance. They lack GPU cores, however. For graphical applications, AMD simultaneously announced the Ryzen Embedded V1000 family with integrated Vega GPUs. [February 26, 2018]

Figure 1: Epyc Embedded 3000 conceptual diagram.
Figure 2: AMD's Secure Encrypted Virtualization (SEV).
Table 1: Comparison of Epyc Embedded 3000 processors.
Table 2: Comparison of AMD's Epyc Embedded 3451, Intel's Xeon D-2187NT, and Cavium's ThunderX CN8890.

Year in Review: Embedded Chips Soar to 100GbE

Processors Get More CPUs, Higher Clock Speeds, Faster I/O

In 2017, embedded processors attained new heights: 100 Gigabit Ethernet (100GbE) is becoming the new networking standard, one ARMv8-compatible chip has 24 CPU cores, and another ARM-based processor pushed clock speeds to 3.0GHz. To keep up, PCI Express Gen4 and DDR4 DRAM interfaces are appearing in some new chips. In media processors, 4K UltraHD video, high dynamic range (HDR), and new content-protection schemes are becoming standard. The ARM architecture continues to steamroll through the industry, dominating almost all new designs. The usual exception is Intel, whose new Skylake-SP embedded processors raise the performance bar.

FPGAs made headlines in 2017, too. They're winning more sockets in emerging markets, such as autonomous vehicles and machine learning. Intel (which acquired Altera in 2015) began shipping the first FPGAs to integrate High Bandwidth Memory (HBM2), and Xilinx began sampling a new class of mixed-signal devices. In December, yet another FPGA startup rose to challenge the established players.

Perhaps the biggest surprise of 2017 was more industry consolidation. We had expected the pace of acquisitions to slow after the merger mania of the previous two years, if only because fewer companies were left to acquire. Instead, the surging costs of designing and manufacturing high-performance processors are forcing vendors to expand their resources faster than organic growth permits. [January 1, 2018]

Figure 1: Recent mergers of embedded-processor vendors.
Figure 2: New embedded processors for networking, 2017–2019.
Sidebar: Embedded-Processor Events of 2017

Efinix Improves FPGAs

Startup's Cell Design Promises Greater Density, Lower Power

You'd think a market dominated for decades by two entrenched leaders would discourage newcomers, but programmable logic continues to intrigue inventors — and investors. The latest attempt to breach the FPGA market ruled by Xilinx and Intel is coming from Efinix (pronounced F-N-X), a Silicon Valley startup that recently announced its first samples. The company has patented a new field-programmable logic cell that's up to 4x denser than those in conventional FPGAs, in turn reducing power consumption and cost. Founded in 2012, Efinix has raised $16 million from several investors, including Xilinx, a potential acquirer. Other investors include Samsung and several Chinese investment funds. In early October, the startup received the first samples from Chinese foundry SMIC, which fabricated the chips in its 40nm low-leakage process. Efinix says the initial chips are functional and will begin general sampling next quarter. Volume production could start as early as 3Q18 if a customer places a large order. [December 18, 2017]

Figure 1: XLR cells versus conventional FPGA logic cells.
Figure 2: Dual-function XLR cells.
Figure 3: XLR direct-drive routing.

Synaptics SoC Targets OTT STBs

New BG5CT Media Processor Recently Acquired From Marvell

Synaptics has unveiled a new SoC for the rapidly evolving set-top-box (STB) market. With its VideoSmart BG5CT, the company is repositioning itself as a chip vendor for STBs that support open platforms, such as Android TV and the open-source Reference Design Kit (RDK). These platforms cater to viewers of streaming-video services as well as traditional cable customers. Specifically, the BG5CT aims at service-operator STBs that integrate over-the-top (OTT) video with traditional pay-TV service. Because it supports transport-stream processing, the chip also targets hybrid boxes that combine OTT video with conventional broadcast TV. Lacking a cable modem, the BG5CT isn't ideal for traditional cable-TV boxes that require DOCSIS compatibility. But a growing number of "cable cutters" are canceling their traditional cable service in favor of OTT alternatives such as Netflix, Amazon Prime Video, and Hulu. Almost 50% of these viewers employ smart TVs to access these services. The rest use STBs; the leading products are Roku, Google Chromecast, Amazon Fire TV, and Apple TV. Cable and satellite operators are supporting these services in their new STBs as well. [December 4, 2017]

Figure 1: Synaptics VideoSmart BG5CT block diagram.
Table 1: Comparison of three SoCs for streaming-video set-top boxes: Synaptics VideoSmart BG5CT, NXP i.MX8M Quad, and Broadcom BCM2837.

Centriq Aces Scale-Out Performance

Qualcomm's New Server Processors Challenge the x86 Establishment

Qualcomm has its head in the clouds, but in a good way. Early benchmarking indicates its new Centriq server processors deliver excellent scale-out performance for cloud applications and data centers. Although the company's ARMv8-compatible CPUs can't match the per-core throughput of the best x86 CPUs, they rank high in throughput per thread, per watt, per dollar, and per square millimeter of silicon. These metrics translate into bargain prices for competitive performance and power consumption — and a strong debut for a newcomer to the nearly impregnable server-processor market. The Centriq 2400 family (code-named Amberwing) initially comprises three models based on the same die: the 48-core 2460, the 46-core 2452, and the 40-core 2434. Clock frequencies hover in a narrow range around 2.2GHz (base), and the parts are all about 120W TDP. List pricing, however, varies from $1,995 to a surprisingly low $888 — posing a credible challenge to Intel and AMD in view of Centriq's competitive SPEC scores. [November 20, 2017]

Figure 1: SPEC CPU2006 per gigahertz, Centriq versus x86.
Figure 2: Normalized comparison of midrange server processors.
Table 1: Qualcomm Centriq 2400 family.
Table 2: Comparison of high-end server processors.
Table 3: Comparison of midrange server processors.
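
The per-dollar argument for Centriq can be made concrete with a little arithmetic. This is a minimal sketch under stated assumptions. The summary gives only the $888 to $1,995 price range, so mapping $1,995 to the 48-core 2460 and $888 to the 40-core 2434 is an assumption. All three models share one die and similar base clocks of about 2.2GHz, so list price per core serves here as a rough proxy for price per unit of throughput.

```python
# Assumed mapping of list prices to models; the article states only the range.
models = {
    "Centriq 2460": {"cores": 48, "list_price_usd": 1995},
    "Centriq 2434": {"cores": 40, "list_price_usd": 888},
}

def price_per_core(cores: int, price: float) -> float:
    """List price divided by core count, in dollars per core."""
    return price / cores

for name, m in models.items():
    print(f"{name}: ${price_per_core(m['cores'], m['list_price_usd']):.2f} per core")
```

Under these assumptions, the cheapest model costs a little more than half as much per core as the flagship, which is why the low end of the range looks aggressive next to x86 parts.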

LX2160A Is NXP's Biggest Multicore

New 16-CPU Processor Has 100GbE and Integrated Ethernet Switch

NXP is chasing high-end networking with its newest QorIQ processor, the LX2160A. Sporting 16 ARM Cortex-A72 cores, 100 Gigabit Ethernet, a 16-port Layer 2 switch, and faster acceleration for cryptography and data compression, it will be the company's largest and fastest multicore embedded processor when it begins production — in mid-2019, by our estimate. Announced at the recent Linley Processor Conference, the LX2160A is the most ambitious QorIQ design since the 12-core T4240, which began production five years ago. Although the LX2160A surpasses the T4240's core count and performance, the older product still offers more threads thanks to its dual-threaded Power e6500 CPUs. Nevertheless, the LX2160A has twice as many ARM CPUs as any existing QorIQ and ranks among the largest ARM-based embedded chips announced to date. It's also NXP's first chip built with FinFETs. [October 16, 2017]

Figure 1: Block diagram of NXP QorIQ LX2160A.
Figure 2: Two common examples of LX2160A switching.
Table 1: NXP's QorIQ LX2160A and derivatives.
Table 2: Comparison of three embedded processors for networking: NXP's QorIQ LX2160A, Cavium's Octeon TX CN8360, and Intel's Xeon D-1548.

Centriq Is the King of Cache

ARMv8 Server Chip Has 12MB L2, 60MB L3, PCIe, SATA, Ethernet

Qualcomm disclosed more details about its Centriq 2400 server processor at the recent Linley Processor Conference, confirming it's one of the most powerful ARMv8 designs yet. The 48-core chip, which has been sampling for nearly a year, resembles Cavium's future 54-core ThunderX2 in many respects but falls short of the best x86 server chips. Even so, it's an impressive initial effort by a new server-processor vendor to target cloud-service providers. The Centriq 2400 has 48 of the new 64-bit Falkor CPUs arranged in pairs that share an L2 cache. At the conference, Qualcomm revealed that each L2 cache is 512KB, for a total of 12MB. In addition, the L3 cache comprises twelve 5MB partitions distributed around the internal ring network that connects all the CPUs, caches, memory controllers, I/O interfaces, and other elements. Effectively, the L3 is 60MB, so the total L2/L3 is 72MB — 26% more than Intel's top-end Xeon Scalable server chip. [October 9, 2017]

Figure 1: Block diagram of Qualcomm Centriq 2400.
Table 1: Comparison of server processors: Centriq 2400, AMD Epyc, Cavium ThunderX2, and Intel Xeon Scalable.
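
The Centriq cache totals in the summary follow directly from the stated organization: 48 CPUs in pairs, each pair sharing a 512KB L2, plus twelve 5MB L3 partitions. The short check below reproduces that arithmetic. It uses binary units (1MB = 1,024KB), which is an assumption about how the totals were counted.

```python
KB = 1024
MB = 1024 * KB

cores = 48
cores_per_l2 = 2           # Falkor CPUs are paired, and each pair shares one L2
l2_bytes_each = 512 * KB
l3_partitions = 12         # L3 slices distributed around the ring network
l3_bytes_each = 5 * MB

l2_total = (cores // cores_per_l2) * l2_bytes_each   # 24 L2 caches
l3_total = l3_partitions * l3_bytes_each
print(l2_total // MB, l3_total // MB, (l2_total + l3_total) // MB)
```

The script prints the L2, L3, and combined totals in megabytes: 12, 60, and 72.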

Silexica's Hardware/Software Co-Design

System- and Software-Analysis Tools Exploit Multilevel Parallelism

German startup Silexica is pursuing two difficult goals: optimizing sequential code for parallel execution and finding the optimal hardware to run the software. Either pursuit alone would be challenging enough for most companies, but Silexica views them as inextricably linked. Parallelism has limited value if either the hardware or the software can't fully exploit it. Consequently, the company's SLX technology enables high-level systemwide analysis of both domains. SLX tools are most effective at the dawn of a design project, when both the hardware and software are malleable. Tweaking the hardware design for better parallelism can yield big gains in software performance, and vice versa. When the hardware design is already frozen or even deployed in the field, the software must adapt to it, but significant gains are still possible. [October 2, 2017]

Figure 1: System analysis using Silexica SLX tools.
Figure 2: SLX call graph.
Figure 3: SLX code highlighting.
Figure 4: SLX software-design options.

Kalray Rethinks Parallel Processing

New Coolidge Processor Has Fewer Cores but Higher Performance

Even in the semiconductor industry, sometimes less is more. While other processor vendors keep striving for higher core counts, Kalray is trying to increase efficiency by moving in the opposite direction with its newest embedded designs. But then, the French company's first product was a massively parallel 256-core chip, so there's room to cut back. Kalray's third-generation processor, the MPPA3 Coolidge, will debut as two models that have "only" 80 or 160 cores. Their proprietary 64-bit CPUs will run at higher clock speeds than those in the company's existing processors, however. The 80-core chip is targeting 1.2GHz; to better manage power, the 160-core chip slows to 900MHz. Even that speed is faster than the fastest Kalray processors today, which operate at 600MHz. Thanks to these and other improvements, Kalray says Coolidge will deliver up to 4.6x more floating-point throughput and 9.2x more fixed-point throughput. The new products are scheduled to sample in 3Q18, and we expect production will start in mid-2019. [September 25, 2017]

Figure 1: Coolidge block diagram.
Figure 2: Coolidge's estimated performance on the GoogleNet CNN.
Table 1: Key parameters for Kalray MPPA processors.
Table 2: Comparison of three embedded processors with 100GbE interfaces: Kalray Coolidge-80, Broadcom BCM58808H, and Mellanox BlueField.
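
A crude way to see why Coolidge's gains must come from more than clock speed is to multiply core count by clock frequency ("core-GHz"). This is a rough sketch under stated assumptions. It assumes the existing 256-core part runs at the 600MHz the summary cites as Kalray's fastest current speed, and it deliberately ignores per-core execution width, which is exactly the factor this metric cannot capture.

```python
def core_ghz(cores: int, clock_ghz: float) -> float:
    """Aggregate core-gigahertz; ignores how much work each core does per cycle."""
    return cores * clock_ghz

coolidge_80 = core_ghz(80, 1.2)
coolidge_160 = core_ghz(160, 0.9)
legacy_256 = core_ghz(256, 0.6)   # assumption: the 256-core chip at 600MHz

for name, value in (("Coolidge-80", coolidge_80),
                    ("Coolidge-160", coolidge_160),
                    ("256-core at 600MHz", legacy_256)):
    print(f"{name}: {value:.1f} core-GHz")
```

Neither Coolidge model exceeds the older chip on this metric (96.0 and 144.0 versus 153.6 core-GHz). So a 4.6x floating-point gain cannot come from core count and clock speed alone; it implies substantially more work per core per cycle.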

Qualcomm's Falkor Targets Servers

ARMv8-Compatible CPU Boldly Discards 32-Bit Compatibility

Stretching for the semiconductor industry's highest-hanging fruit, Qualcomm's new ARMv8 Centriq processor is targeting Intel's 99% dominance of the server market. Arriving later this year, Centriq will shake Intel's tree in the hope that some of the high-margin fruit will fall into its waiting ARMs. The new Falkor CPU is a core part of this strategy. Falkor resembles the ARM-compatible CPUs that Qualcomm formerly designed for its Snapdragon smartphone processors but adds some higher-performance features. In one major departure, it ditches 32-bit compatibility altogether in favor of software written only for the AArch64 instruction set. Centriq is designed mainly for cloud-service providers (CSPs) that need bushels of power-efficient parallelism to run numerous virtual machines for their remote clients. Sampling for almost a year, the 48-core chips are scheduled to begin production in 4Q17. [September 18, 2017]

Figure 1: Qualcomm Falkor block diagram.
Figure 2: Falkor pipeline diagram.
Figure 3: Centriq 2400 block diagram.
Figure 4: Memory-bandwidth compression.
Figure 5: Falkor's quality-of-service optimization.

Mellanox Accelerates BlueField SoC

New ARMv8 Processor Targets 200Gbps Networking and NVMe-oF

High-speed network adapters and distributed flash-storage arrays are about to get a boost. Mellanox is testing the first silicon of its new 16-core BlueField processor and plans to begin general sampling in October. Barring any last-minute problems, volume production should start in 1H18. The company has doubled the chip's original packet-throughput target to 200Gbps. BlueField combines intellectual property from three recently merged companies: Mellanox, EZchip, and Tilera. Mellanox's main contribution is the ConnectX-5 Ethernet adapter, which becomes a fully integrated subsystem in the new SoC. From EZchip, the processor inherits vital packet acceleration. And from Tilera, it gains cryptography acceleration, a previously unreleased ARMv8 design, and experience building manycore processors using meshed tiles of programmable CPUs. [August 21, 2017]

Figure 1: Mellanox BlueField block diagram.
Figure 2: BlueField flash-array controller.
Table 1: Key parameters for Mellanox BlueField processors.
Table 2: Comparison of 100GbE networking processors: Mellanox BlueField and Broadcom NetXtreme BCM58808H.
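
To put BlueField's 200Gbps packet-throughput target in perspective, it helps to convert a link's bit rate into its worst-case frame rate. This is a minimal sketch under stated assumptions. It assumes standard Ethernet framing, where each frame also occupies an 8-byte preamble/start-of-frame delimiter and a 12-byte minimum inter-frame gap on the wire, and it uses minimum-size 64-byte frames. The summary does not say which frame size the 200Gbps figure assumes.

```python
ETHERNET_OVERHEAD_BYTES = 20  # 8-byte preamble/SFD + 12-byte minimum inter-frame gap

def line_rate_pps(link_bps: float, frame_bytes: int) -> float:
    """Maximum frames per second on an Ethernet link for a given frame size."""
    return link_bps / ((frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8)

for gbps in (100, 200):
    mpps = line_rate_pps(gbps * 1e9, 64) / 1e6
    print(f"{gbps}Gbps, 64-byte frames: {mpps:.1f} Mpps")
```

Doubling the target from 100Gbps to 200Gbps doubles the worst-case load from about 148.8 million to about 297.6 million packets per second. At that rate, the per-packet time budget falls to a few nanoseconds, which is why hardware packet acceleration matters.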

Embedded Skylake Speeds Networking

Intel Unveils New Xeon Processors and South-Bridge Accelerators

Bronze, Silver, Gold, Platinum: four color-coded product tiers familiar to anyone who has purchased an Obamacare health-insurance plan. They're also Intel's new tiers for Xeon Scalable processors. These chips supersede the Xeon E5v4 embedded processors that use the Broadwell-EP core. Like the insurance policies, the lower tiers (Bronze and Silver) cost less but offer fewer benefits than the higher tiers (Gold and Platinum). The 16 new Xeon embedded processors derive from the new Skylake-SP server processors but have extended availability. Had Intel kept its usual branding, they would be Xeon E5v5 products. These chips have an improved CPU microarchitecture that executes about 5% more instructions per clock cycle than Broadwell. In addition, they exceed their Xeon E5v4 predecessors in core count, clock frequency, memory bandwidth, PCI Express lanes, multisocket connectivity, power consumption, and list price. [July 24, 2017]

Figure 1: Block diagram of Intel's Purley platform.
Figure 2: Intel's new key-protection technology.
Figure 3: Intel's new product nomenclature for Xeon Scalable processors.
Table 1: Intel's new C62x-series south bridge (Lewisburg).
Table 2: Intel Xeon Scalable embedded processors.
Table 3: Comparison of high-end embedded processors from Intel and Cavium.

AMS Hides Smartphone Sensors

New Proximity Detectors Need No Holes in Screen Bezels

Smartphone makers are scrounging for new ways to differentiate their products and to design phones that resemble a solid slab of smooth glass. One obsession is removing all blemishes from the front surface — including the tiny holes, or "apertures," in the screen's top bezel for the speaker, front camera, and sensors. The biggest aperture by far is the elongated speaker slot — but it's necessary until the last few holdouts finally stop using their phones to make phone calls. The next-largest aperture is for the selfie camera's lens, but it's required until narcissism becomes unfashionable. So, by process of elimination, the apertures for the front-facing sensors are the best candidates for elimination. That's why AMS, the industry's largest supplier of light sensors, has invented new modules that can hide behind an inked screen bezel of any color without sacrificing performance. [May 22, 2017]

Figure 1: The evolution of apertures in smartphone bezels.
Figure 2: Proximity-sensor design tradeoffs.
Figure 3: A two-chip no-aperture proximity sensor.
Figure 4: An AMS 3-in-1 color-sensor module.
Figure 5: 3D vision using structured light.

EEMBC Benchmarks IoT Power

IoTMark-BLE Debuts; More Wireless Protocols to Follow

Debates over whose microcontroller and Bluetooth radio module are more power efficient for IoT applications will become easier to settle with EEMBC's newest benchmarks. The industry consortium has introduced its first IoT-Communications suite, which measures the power consumption of a typical IoT client that transfers data using Bluetooth Low Energy (BLE). By far the most complex suite EEMBC has developed in its 20-year history, IoTMark-BLE is available for order now by members and nonmembers. It's scheduled to ship in June. Instead of measuring power, IoTMark-BLE actually measures energy consumption: power integrated over time. The distinction is important because battery life depends on the total energy a microcontroller consumes to perform a particular task, regardless of its throughput performance. Thus, IoTMark-BLE measures total energy consumption during several sleep-wake-sleep cycles for a typical task. [May 15, 2017]

Figure 1: Conceptual diagrams of an IoT client and energy profile.
Figure 2: EEMBC's IoTMark-BLE framework.
Figure 3: IoTMark-BLE test timeline.
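
The difference between power and energy can be illustrated with two hypothetical microcontrollers running the same task in a one-second sleep-wake-sleep cycle. This is a simplified sketch, not EEMBC's actual measurement or scoring method. All power levels and durations are invented for illustration, and each phase is modeled as a constant power draw.

```python
from typing import List, Tuple

# Each phase is (duration in seconds, average power in watts).
Phase = Tuple[float, float]

def cycle_energy_joules(phases: List[Phase]) -> float:
    """Energy for one cycle: the sum of power x time over all phases."""
    return sum(duration * power for duration, power in phases)

def peak_power_watts(phases: List[Phase]) -> float:
    """Highest power drawn in any phase."""
    return max(power for _, power in phases)

# Hypothetical MCU A: draws more power while awake but finishes sooner.
mcu_a = [(0.49, 10e-6), (0.02, 30e-3), (0.49, 10e-6)]
# Hypothetical MCU B: half the active power, but stays awake three times longer.
mcu_b = [(0.47, 10e-6), (0.06, 15e-3), (0.47, 10e-6)]

for name, profile in (("A", mcu_a), ("B", mcu_b)):
    print(f"MCU {name}: peak {peak_power_watts(profile) * 1e3:.0f} mW, "
          f"energy {cycle_energy_joules(profile) * 1e6:.1f} uJ")
```

MCU A peaks at twice the power of MCU B, yet it uses about a third less energy per cycle (609.8uJ versus 909.4uJ). A peak-power comparison would therefore rank the two chips in the wrong order for battery life.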

NXP Airs Software-Defined Radio

New QorIQ LA1575 Processor Has Programmable Baseband Engines

NXP's first QorIQ LA-series chip has programmable baseband engines that can perform Layer 1 and Layer 2 network processing in software, so it's adaptable to multiple communications standards. Along with its enhanced packet acceleration, the new LA1575 has enough horsepower to serve in multiple roles, including next-generation Wi-Fi routers, 5G cellular radios, and mixed wired/wireless applications, such as fixed-wireless nodes in neighborhood fiber-optic networks. "LA" stands for Layerscape Access, a nod to the Layerscape chip architecture that is the foundation of all ARM-based QorIQ processors. Designed primarily for residential and small-business Internet gateways, enterprise access points, and fixed-wireless applications, the LA1575 will be available in dual- and quad-core variants. Both models have ARM Cortex-A53 CPUs, and their target clock frequency is 1.4–1.6GHz. Although NXP is withholding many specifications for now, the most important new features are a programmable vector engine for Layer 1 processing, enhanced accelerators for Layer 2 processing, and an integrated RFIC interface with analog-to-digital and digital-to-analog converters (ADCs/DACs). [April 3, 2017]

Figure 1: Block diagram of NXP's QorIQ LA1575.
Figure 2: NXP's LA1575 Wi-Fi software stack.
Table 1: Comparison of NXP's QorIQ LA1575, Broadcom's StrataGX BCM58713, and Cavium's Octeon TX CN8030.

Smaller Offspring for TI's Sitara

[Brief Item]

Bargain hunters will appreciate two additions to Texas Instruments' ARM-based Sitara processor family: the AM5706 and AM5708. Sampling now, they extend the AM57x-series into embedded applications that require lower power, lower cost, and less board space. Like other AM57x processors, they integrate an ARM Cortex-A15 CPU with two Cortex-M4 controller cores, a C66x DSP, and TI's own programmable controller cores. Several economy measures suit the AM5706 and AM5708 to low-end applications, such as drones, remote sensors, and motors. Compared with the smallest previous AM57x processors, they reduce the maximum clock speed of the Cortex-A15 CPU by 33% to 1.0GHz. The C66x DSP core runs at 750MHz as usual, but some models have a 500MHz CPU and DSP. No competing products can match their features for signal processing, floating-point throughput, and real-time control. [March 13, 2017]

i.MX8 Masters Multimedia

Audio, Video, and Graphics Enliven NXP's 64-Bit ARM Processors

NXP is offering new 64-bit media processors that can bring the latest digital video and audio to TVs and other embedded systems. Four chips in the new i.MX8M-series have ARMv8 CPUs and enough additional processing power for most digital-media applications. The superset design is the i.MX8M Quad, which has four 64-bit Cortex-A53 CPUs, one 32-bit Cortex-M4F coprocessor, a VeriSilicon GPU, a high-performance video engine, a multichannel audio engine, and dual camera interfaces. The i.MX8M QuadLite, Dual, and Solo models are subsets of this design. All four chips are scheduled to sample in 1Q17 and reach volume production in 4Q17. The new i.MX8M processors comprise the media-centric branch of the i.MX8 family, which has now more than doubled in size. The first three chips in this family — the i.MX8 QuadMax, QuadPlus, and Quad — are even more powerful. The i.MX8 superset design is the QuadMax, which more than doubles the processing power by adding two Cortex-A72 CPUs to the i.MX8M configuration. NXP expects to sample the i.MX8-series by 2Q17, with production sometime in 2H18. [January 23, 2017]

Figure 1: Block diagram of i.MX8M Quad.
Table 1: NXP's i.MX8/8M processor family.
Table 2: Comparison of NXP's i.MX8M Quad, Marvell's Armada 1500U, NXP's i.MX8 QuadMax, and MediaTek's MT8693.

Year in Review: More Embedded Mergers in 2016

Consolidation Creates New Giants, but Some Products Suffer

Addendum to Moore's Law: semiconductor-industry mergers are doubling in frequency every 24 months. At least that's how it seemed in 2016. Companies continued to devour one another, further consolidating an embedded-processor market dominated by a dwindling number of major players. Although the acquisitions are creating larger companies with more resources, some products and roadmaps are falling victim to cost cutting. Looking forward, we expect 2017 to be a transitional year as the companies involved in the biggest mergers digest their large bites and the fabless vendors begin their move to next-generation process technology. ARM will gain momentum without slowing Intel's. Qualcomm will be the rising star after absorbing NXP, and Broadcom and Cavium will battle for third place. [December 26, 2016]

Figure 1: Worldwide revenue market share of the leading embedded-processor vendors.
Figure 2: Cavium's new Octeon TX family.
Figure 3: Forecast of mobile subscriptions by radio technology.

Splashdown for Intel's Apollo Lake

Goldmont CPU Debuts in Atom, Celeron, and Pentium Processors

Intel is introducing several new processors that offer the enhanced Goldmont CPU core and stronger security features. Code-named Apollo Lake, these 14nm chips include six Celeron and Pentium products for entry-level PCs, three Atom E3900 embedded processors, and additional A3900 embedded models for automotive. Their integrated GPUs support 4K-resolution graphics and up to three displays. The embedded models target IoT gateways, industrial automation, vehicle infotainment systems, automotive instrument panels, driver-assistance systems, retail kiosks, and other high-end applications. Apollo Lake supersedes the three-year-old Bay Trail. The PC versions are shipping now, and the embedded models are scheduled for volume production next quarter. [November 14, 2016]

Figure 1: Apollo Lake block diagram.
Table 1: Apollo Lake PC processors.
Table 2: Intel's Atom E3900 embedded processors.
Table 3: Comparison of Intel's Atom x7-E3950, AMD's GX-412HC, and Texas Instruments' Sitara AM5728.

New CoreLink IP Targets Servers

[Brief Item]

ARM announced higher-performance versions of its CoreLink on-chip interconnect and DRAM controller at the recent Linley Processor Conference. The new CoreLink CMN-600 can join up to 128 CPUs in a memory-coherent mesh network while boosting throughput by up to 5x over ARM's existing interconnects — making a play for server processors. And the new CoreLink DMC-620 is an enterprise-class memory controller that slashes the DRAM latency by up to 50% compared with the company's previous controller. Preliminary RTL for both products is available now as licensable intellectual property (IP), and we expect production RTL to arrive later this year. [October 10, 2016]

Cortex-R52: Safer Real-Time Control

ARM Unveils Its First ARMv8-R Core for Vital Control Systems

ARM is introducing its most advanced CPU core for safety-critical controllers in automotive, industrial, and medical systems. The new Cortex-R52 is a 32-bit ARMv8-R design that supports hypervisors by adding another privilege level and a second memory-protection unit. It can simultaneously run multiple real-time operating systems in virtual sandboxes, isolating critical tasks from others. It also boosts performance relative to the existing Cortex-R5, offering superior throughput, optional Neon SIMD extensions, faster context switching, and faster interrupt handling. Long awaited, Cortex-R52 is the first implementation of the ARMv8-R instruction-set architecture (ISA) announced three years ago. The new core omits the 64-bit features of ARMv8 and implements only a subset of the cryptographic instructions. But it's backward compatible with the 32-bit ARMv7-R architecture, including the compressed Thumb instructions. [October 3, 2016]

Table 1: Comparison of ARM's new Cortex-R52 and six-year-old Cortex-R5.

Phytium Samples 64-Core ARMv8

Chinese Company Already in Production With Smaller Processors

Chinese chip vendor Phytium has demonstrated working samples of the world's largest ARMv8-compatible server processor and expects to start production this year. The 64-core FT-2000/64, previously known as Mars, targets a maximum CPU frequency of 2.0GHz and will initially sell to domestic customers. The company has also disclosed the first details of two smaller processors: the 16-core FT-1500A/16 and 4-core FT-1500A/4. These 1.5GHz chips employ a slightly less muscular ARMv8-compatible CPU core that's more power efficient but is otherwise similar to the bigger processor's core. Phytium says both chips have been in production since last year. The 16-core model is designed for web servers, cloud computing, transaction processing, and network switching. The quad-core model is designed for small servers, desktops, laptops, and embedded systems. Phytium showed the FT-2000/64 in a 2U server at the recent Hot Chips conference in Silicon Valley. It was running cloud-computing software from Tianjin Kylin — an incorporated spinoff from China's National University of Defense Technology, which built the world's fastest supercomputers, Tianhe-1 and Tianhe-2, in 2010 and 2013. [September 26, 2016]

Table 1: Phytium's ARMv8-compatible processors.

Synopsys Debuts Secure ARC Cores

[Brief Item]

Responding to pleas for stronger security in IoT devices and embedded systems, Synopsys is introducing two new products in its DesignWare ARC EM family of licensable CPU cores. Compared with earlier ARC products, the new ARC SEM110 and SEM120D add several security features, including an improved trusted execution environment with secure privilege levels, a special interface for true-random-number generators, a secure debug interface, and countermeasures against side-channel attacks. The company is pitching these 32-bit synthesizable cores for low-power processors that must protect monetary transactions or other important data. Example applications include mobile devices that enable NFC payments, embedded SIM cards, smart meters, and IoT clients that store sensitive information. [September 26, 2016]

Oracle Sparc Accelerates Big Data

On-Chip Hardware Optimized for Databases and Analytics

At the recent Hot Chips conference, Oracle revealed new details about the database-acceleration capabilities in its Sparc M7 and S7 processors, which it unveiled at the last two annual conferences. Now shipping in Oracle systems and servers, the 32-core Sparc M7 is the flagship product, and the 8-core Sparc S7 (code-named Sonoma) is the economy model. Both processors integrate the same acceleration. Oracle says they can speed up some database operations by 10–23x and retrieve results up to 30% faster than the same software running without the new hardware. Data decompression is 8–11x faster, and a new compression algorithm can shrink memory-resident data by 2–5x. In a footrace with a previous-generation Sparc T5 system, a Sparc M7 server handled 9x more database queries per hour, delivered 11x more performance per watt, and reduced CPU utilization by 3x. Another feature is stronger security. Oracle says its enhanced memory protection effectively repels attacks that read or write beyond a buffer's bounds, such as the Heartbleed buffer over-read that afflicted the OpenSSL cryptography library in 2014. [September 19, 2016]

Figure 1: Oracle's data-analytics accelerator (DAX).
Figure 2: DAX function calls.
Figure 3: Sparc CPU utilization with and without DAX acceleration.
Table 1: Oracle DB compression levels.

Power9 Scales Up and Out

Four Different Versions Target Scale-Up and Scale-Out Servers

Much as software engineers fork an existing code base to derive a new program, IBM is forking its Power8 processor to derive a quartet of new Power9 designs. These future chips will have 12 or 24 quad- or octa-threaded CPU cores and different memory subsystems for either scale-up or scale-out servers. The goal is to offer a few processors optimized for the most important server-market segments without duplicating Intel's vast Xeon catalog. Power9 will be IBM's first product manufactured in 14nm FinFET technology — and the first outsourced to a foundry. The company did not disclose a schedule for its new processor; we believe the initial devices are already in silicon, and the first systems will begin shipping in 2017. IBM says the new CPU cores deliver about 1.5–2.5x more throughput than Power8 cores running at the same clock frequency. In addition, the four new chip designs address various shortcomings that are limiting Power8's adoption by third-party system vendors. [September 5, 2016]

Figure 1: Four initial IBM Power9 designs.
Figure 2: Power9 SMT8 and SMT4 cores.
Figure 3: Power8 versus Power9 pipelines.
Figure 4: Power9 versus Power8 performance.
Figure 5: Power9 main-memory subsystems.
Table 1: SMT4 execution resources.
Table 2: Parameters of IBM's four initial Power9 designs.
Table 3: IBM Power9 versus two Intel Xeon server processors.

EEMBC Upgrades Auto Benchmarks

[Brief Item]

The Embedded Microprocessor Benchmark Consortium (EEMBC) has upgraded its automotive suite to work with the multithreaded and multicore processors that are becoming more common in vehicle-control and entertainment systems. Additional improvements enable testers to combine multiple components of the suite and to use larger data sets when benchmarking processors that have big caches. The new AutoBench 2.0 suite is available now to EEMBC members and nonmember licensees. [August 22, 2016]

Broadwell Accelerates the DPDK

Intel Silicon and Software Improvements Challenge RISC SoCs

Intel is quickening its march into networking with new acceleration features in the latest Broadwell Xeon chips. These features speed up common tasks such as cryptography, packet I/O, forwarding, and virtualization. As usual, though, the company prefers to execute most networking tasks in software running on its powerful x86 CPUs instead of using specialized hardware engines. Other processor vendors prefer the latter approach. Fortunately for OEMs and their customers, these differences are partly mitigated by using the Data Plane Development Kit — originally an Intel invention that the industry has adopted as a BSD-licensed open-source standard through the DPDK.org community. All the major vendors of networking-oriented RISC SoCs have embraced the DPDK as well. The latest release enables four quality-of-service techniques that Intel collectively calls Resource Director Technology (RDT). Although several Haswell Xeon chips implemented some RDT features, new Broadwell Xeons implement all of them. [July 4, 2016]

Figure 1: Improvements in basic Layer 3 forwarding performance.
Figure 2: Effects of Intel's cache-allocation technology.
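
Cache-allocation technology (CAT), one of the RDT features, partitions the last-level cache by assigning each class of service a capacity bitmask of cache ways. The sketch below shows roughly how such masks are built and checked. It is illustrative only and is not drawn from the article or from the DPDK itself. The 20-way L3 and the 8/12 way split are hypothetical. The contiguity check reflects Intel's requirement that a mask's set bits be adjacent. The output follows the Linux resctrl `schemata` line format (`L3:<cache id>=<hex mask>`).

```python
def contiguous_cbm(num_ways: int, first_way: int = 0) -> int:
    """Capacity bitmask covering num_ways adjacent cache ways starting at first_way."""
    if num_ways <= 0 or first_way < 0:
        raise ValueError("need at least one way and a non-negative start")
    return ((1 << num_ways) - 1) << first_way

def is_valid_cbm(mask: int, total_ways: int) -> bool:
    """True if mask is nonzero, fits in total_ways bits, and its set bits are contiguous."""
    if mask <= 0 or mask >> total_ways:
        return False
    lowest = mask & -mask
    # Adding the lowest set bit to a contiguous run of ones clears every bit of the run.
    return (mask + lowest) & mask == 0

def schemata_line(cache_id: int, mask: int) -> str:
    """Format a Linux resctrl schemata entry for one L3 cache domain."""
    return f"L3:{cache_id}={mask:x}"

TOTAL_WAYS = 20                                  # hypothetical 20-way L3
packet_mask = contiguous_cbm(8, first_way=12)    # ways 12-19 for packet-processing cores
other_mask = contiguous_cbm(12, first_way=0)     # ways 0-11 for everything else

assert is_valid_cbm(packet_mask, TOTAL_WAYS) and is_valid_cbm(other_mask, TOTAL_WAYS)
assert packet_mask & other_mask == 0  # disjoint ways: neither class can fill the other's ways
print(schemata_line(0, packet_mask))
print(schemata_line(0, other_mask))
```

Giving the data-plane cores a private set of ways keeps other workloads from evicting their hot lines. This is the kind of effect Figure 2 illustrates.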

Mellanox Marries ConnectX to Tile-Mx

[Brief Item]

Mellanox has announced a new SoC family that integrates its ConnectX Ethernet adapter designs with the Tilera processor technology acquired with EZchip. Code-named BlueField, the new SoCs will have up to 16 ARM Cortex-A72 cores and are scheduled to sample in 1Q17. BlueField is the first fruit of Mellanox's February acquisition of EZchip, which in turn had acquired Tilera in 2014. One apparent casualty of the consolidation, however, is the 100-core Tile-Mx processor that EZchip announced before the Mellanox deal. Instead, Mellanox is greatly reducing the core count and upgrading the CPUs. By our estimate, a 16-core BlueField chip will have about the same CPU horsepower as a Cortex-A53-based 40-core chip. Thanks to the ConnectX acceleration, however, BlueField still targets 100-gigabit networking, as the Tile-Mx100 did. [June 13, 2016]

NXP's Cortex-A72 Quartet

Four New QorIQ LS2 Chips Use ARM's High-End CPU

NXP's embedded processors keep multiplying like rabbits. The latest litter includes four members of the QorIQ LS2 family that use ARM's most powerful CPU, the 64-bit Cortex-A72. All four are quad- or octa-core designs boasting maximum clock speeds of 2.0GHz, and they have the company's second-generation packet-processing hardware. Two of them also integrate 10G Ethernet switches. The new products are the octa-core LS2088A and its quad-core near twin, the LS2048A, plus the octa-core LS2084A and its quad-core near twin, the LS2044A. All are scheduled to sample this quarter and qualify for volume production in the fourth quarter. All are designed primarily for networking and communications. The new Cortex-A72 chips expand the QorIQ family's reach into higher-end systems — particularly those that will implement network functions virtualization (NFV) and software-defined networking (SDN). [May 16, 2016]

Figure 1: Block diagram of NXP's QorIQ LS2088A embedded processor.
Table 1: NXP's four new QorIQ LS2 processors.
Table 2: Comparison of NXP's QorIQ LS2088A with Cavium's Octeon TX CN8240 and CN8360.

NXP Economizes With LS2080A

Derivative Cortex-A57 QorIQ Processors Trim Power and Cost

Shortly after announcing three new LS1-series chips in March and April, NXP plans to sample two lower-cost embedded processors in the QorIQ LS2 family this quarter. The new products are the octa-core LS2080A and quad-core LS2040A, which use the 64-bit ARM Cortex-A57. They are similar to the existing LS2085A and LS2045A but trim a few features to reduce power consumption and enable lower prices. Although networking is the main target, LS2 chips are widely used in industrial and other embedded applications. The LS2080A and LS2040A are designed for enterprise routers, line-card controllers, security appliances, virtual customer premises equipment (vCPE), and service-provider gateways. They are so closely related to the existing LS2085A and LS2045A that we believe they are based on the same die. [May 9, 2016]

Figure 1: Block diagram of NXP's QorIQ LS2080A processor.
Table 1: Comparison of NXP's QorIQ LS204xA and LS208xA processors.

NXP Debuts QorIQ Cortex-A72

New Dual- and Quad-Core LS1 Chips Boost CPU Performance

NXP is on the verge of sampling the industry's first embedded processors that use ARM's Cortex-A72. The quad-core LS1046A and dual-core LS1026A will bring greater CPU performance to the QorIQ family's LS1 series, which currently comprises 32- and 64-bit chips based on Cortex-A7, Cortex-A9, and Cortex-A53. The new A72 processors are scheduled to sample this quarter and begin production later this year. As usual, networking is the main target, but the chips are also useful for industrial and general embedded applications. With their dual 10G Ethernet (10GbE) controllers, four GbE controllers, and one 2.5GbE controller, the new processors are well equipped for enterprise routers, switches, line-card controllers, security appliances, virtual customer premises equipment (vCPE), service-provider gateways, and network-attached storage (NAS). [April 25, 2016]

Figure 1: Block diagram of NXP's QorIQ LS1046A processor.
Table 1: Comparison of NXP's QorIQ LS1046A, AMD's Opteron A1120, and AppliedMicro's Helix 2 APM887104-H2 processors.

New Intel Modems Target IoT

Three Modem Chips Span 3GPP Generations From 2G to NB-IoT

Intel has revealed more details about three cellular modem chips it announced at the recent Mobile World Congress. All three are designed for use with embedded processors that need an external modem for wireless connectivity, and they reinforce the company's push into the cellular IoT market. The XMM 7120M is an LTE Category 1 modem that stacks a baseband processor, flash memory, DRAM, and power-management unit (PMU) in one package. It's intended primarily for machine-to-machine (M2M) applications, such as factory equipment, smart meters, medical devices, security cameras, and point-of-sale (PoS) terminals. Another modem, the XMM 7115, is intended for cellular IoT clients that will use the 3GPP Release 13 protocol for Narrowband IoT (NB-IoT — unofficially known as LTE Category M2, or LTE-M2). For IoT and M2M systems that don't need LTE connectivity, the XMM 6255M is a dual-band 2G/3G modem. Like the XMM 7120M, it stacks flash memory, DRAM, and a PMU on the baseband die, but its package is about 30% smaller. [April 4, 2016]

Table 1: Intel's new cellular modem chips for embedded applications.
Table 2: 3GPP protocols for low-cost systems.

New Intel SoCs Target Autos and IoT

Two Integrated-Baseband Processors Embrace Emerging Standards

Intel is making a stronger play for low-power embedded systems that need wireless communications by introducing two new processors with integrated LTE modems. Although it announced these products at the recent Mobile World Congress, the technical details are only now trickling out, and we don't expect volume production to begin until later this year or next. One new processor is a quad-core Atom, and the other uses a Quark CPU. The former is the x3-M7272, which has four Airmont CPUs operating at a maximum clock frequency of 1.2GHz. The latter is the XMM 7315, which uses a Lakemont core as the application CPU. Intel's new SoCs are vital to its strategy of pushing the x86 architecture deeper into embedded markets. [March 28, 2016]

Table 1: Intel's new processors for automotive telematics and IoT.
Table 2: LTE protocols for low-power systems.

NXP's Lowest-Power 64-Bit ARM

[Brief Item]

NXP has announced its first QorIQ product after absorbing Freescale — a low-power 64-bit ARM processor for IoT gateways, low-end routers, networked storage, printers, and factory automation. The new LS1012A slashes typical power consumption to about 1W by operating a single Cortex-A53 CPU at an 800MHz maximum clock speed. NXP bills it as the world's smallest and lowest-power 64-bit processor. In fact, it beats even the existing 32-bit LS1 chips. For networking, it has two Gigabit Ethernet (GbE) controllers that can also handle 2.5GbE links. For cryptography, it has the company's SEC 5.5 security engine, which boosts IPSec throughput to 1.1Gbps. In addition, the chip includes enough packet-acceleration hardware for line-rate networking, although it doesn't implement the second-generation Data Path Acceleration Architecture (DPAA2) that's coming in higher-end LS1 and LS2 chips. [March 14, 2016]

Broadcom Samples First ARMv8 Chip

Wi-Fi Router Processor Uses Customized Cortex-A53 CPUs

Broadcom is sampling its first announced 64-bit ARM chip, but it's not the much-anticipated Vulcan server processor. It's the BCM4908, an embedded processor for home Wi-Fi routers and enterprise access points. It integrates four Cortex-"B53" cores — licensed Cortex-A53 CPUs that Broadcom has customized. The BCM4908 is designed for 802.11ac Wave 2 routers that will implement such advanced features as 4x4 MIMO, multiuser MIMO (MU-MIMO), and high-bandwidth 160MHz channels. [February 15, 2016]

Figure 1: Broadcom's BCM4908 in an 802.11ac Wave 2 router.
Figure 2: Broadcom BCM4908 block diagram.
Table 1: Broadcom's BCM4908 versus NXP's QorIQ LS1043A and Qualcomm's IPQ8064.

Broadcom Ships First 28nm StrataGX

[Brief Item]

Broadcom's first StrataGX embedded processors built in 28nm technology bring new security features to the product line while slashing power consumption. The ARM-based BCM583xx chips, which are shipping now, typically consume about 1.5W. The previous StrataGX BCM585xx and 586xx built in 40nm CMOS typically consume 2.5-3.0W. The process shrink alone wasn't enough to achieve those power savings, however, so Broadcom also pared back some features and performance. Nevertheless, some of the new chips have additional security hardware and are particularly useful for point-of-sale (PoS) terminals, credit-card kiosks, and other secure systems. [January 18, 2016]

First Quark MCUs Target IoT

New Intel Microcontrollers Lag Competitors in Cost and Power

Intel's first Quark MCUs have finally arrived, and they are struggling to compete with similarly priced ARM-based MCUs in CPU performance, memory capacity, and standby power. For the most part, they stand out in only one respect: they are x86 compatible. If that difference matters, Quark has an unmatched advantage. The company began shipping the initial Quark D1000 ("Silver Butte") in November. An additional model, the Quark D2000 ("Mint Valley"), is beginning production now, and a third model, the Quark SE ("Atlas Peak"), is coming in 2Q16. All three chips are similar and offer typical 32-bit MCU features. The SE differs from its siblings by integrating a DSP sensor hub and a pattern-matching "neural" engine that Intel says is capable of rudimentary machine learning. All these MCUs clock rather slowly at 32MHz and use the Quark CPU core, which mates a 32-bit Pentium instruction set from 1993 with a 486-class microarchitecture from 1989. [January 11, 2016]

Figure 1: Block diagram of Intel's Quark D1000 microcontroller.
Table 1: Intel's first Quark microcontrollers.
Table 2: Intel's Quark D1000 versus selected ARM Cortex-M4 MCUs.

ARM Dons Thicker Armor

New ARMv8-M, TrustZone, and Amba 5 Protect Small Systems

Heeding the call for better security in embedded systems and the Internet of (Insecure) Things, ARM has introduced a new subset of its ARMv8 architecture and a new Amba bus for future Cortex-M cores. The new ARMv8-M architecture brings the company's TrustZone security technology to even the smallest microcontrollers and deeply embedded systems. It's optimized for 32-bit chips in devices as tiny as sensors, smartwatches, and IoT end points. The improved TrustZone is a crucial part of ARMv8-M. New hardware will enforce greater separation between secure and nonsecure code and data while easing software development in some respects. And the new Amba 5 AHB5 on-chip bus can extend TrustZone beyond the CPU to protect other SoC components, including integrated peripherals, SRAM, and flash memory. All together, they constitute ARM's most extensive security upgrade since the original TrustZone made its debut 12 years ago. [November 16, 2015]

Figure 1: ARMv8-M memory protection.
Figure 2: TrustZone interrupt-mask banking.
Figure 3: TrustZone interrupt isolation.
Figure 4: TrustZone secure function calls.
Figure 5: ARMv8-M memory maps.
Figure 6: Amba 5 Advanced High-Performance Bus (AHB).
Table 1: ARMv8-M Baseline enhancements.

AMD Embeds Carrizo

Embedded R-Series SoCs Integrate South Bridge and Excavator CPUs

AMD's new Embedded R-Series processors are the company's most highly integrated SoCs to date. They include the latest Excavator x86 CPUs, the south-bridge logic, dual DDR4 controllers, and ARM security coprocessors. Three models also have Radeon GPUs and 4K video decoders. Code-named Merlin Falcon, the new R-Series comprises five distinct models, not counting the extended-temperature versions. They improve on the "Bald Eagle" Embedded R-Series chips introduced last year, mainly by replacing the Steamroller CPUs with Excavator and by integrating the south bridge. In fact, the new chips are almost identical to the Carrizo processors introduced last February for low-cost desktop PCs, notebooks, and tablets. One difference: they don't support adaptive voltage and frequency scaling (AVFS). Instead, they have "configurable TDPs," meaning they can stay within a desired thermal design power by operating at a clock frequency and voltage in their nominal range. [November 2, 2015]

Figure 1: AMD Embedded R-Series versus Intel ULV Core processors (CoreMark and 3DMark11).
Table 1: AMD Embedded R-Series processors.

TI Sitara Chips Sprout DSPs

New AM57x Processors Integrate C66x DSPs With Cortex-A15

Sooner or later, it seems, Texas Instruments always reverts to its roots by integrating DSP cores in its embedded processors. Now, the Sitara family is getting its first DSPs, following a trail blazed by TI's OMAP, DaVinci, Integra, and C6000 chip families. The new Sitara AM57x series is currently sampling and is scheduled for volume production early next year. The new AM5716, AM5718, AM5726, and AM5728 have one or two ARM Cortex-A15 CPUs operating at 500MHz or 1.5GHz, plus one or two TI C66x DSP cores operating at 500MHz or 750MHz. The AM5718 and AM5728 add one or two PowerVR SGX544 GPU cores from Imagination Technologies and a GC320 graphics core from Vivante. Thanks to these and other features, the Sitara AM57x line supersedes TI's older DaVinci media processors in almost every respect. [October 19, 2015]

Figure 1: Texas Instruments Sitara AM5728 block diagram.
Table 1: Texas Instruments Sitara AM57x series.
Table 2: Comparison of TI's Sitara AM5728, DaVinci DM8168, and KeyStone II 66AK2E02.

Hexagon 680 Adds Vector Extensions

HVX Image-Processing Instructions Debut in Snapdragon 820

At the recent Hot Chips conference in Silicon Valley, Qualcomm introduced 1,024-bit SIMD extensions that turn its new Hexagon 680 DSP into a power-efficient image-processing engine. Although these Hexagon Vector Extensions (HVX) won't replace the phone's dedicated image signal processor (ISP), they can offload some tasks from the ISP, the GPU, and the application-processor CPUs, which are ARM-compatible cores with Neon SIMD extensions. The company says the new 1,024-bit vectors can perform eight times as many operations per clock cycle as the 128-bit Neon vectors while using only 6-25% as much energy per operation. The first chip to include the Hexagon 680 DSP core with HVX is the forthcoming Snapdragon 820, which we expect to appear in phones in 1Q16. [September 14, 2015]

Figure 1: Snapdragon 820 block diagram.
Figure 2: HVX vector processing.
Figure 3: Hexagon 680 block diagram.
Figure 4: HVX programmer's view.
Figure 5: Preprocessing digital images with HVX.
Figure 6: HVX versus Krait plus Neon.
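The "eight times as many operations per clock cycle" figure follows from the ratio of register widths (1,024 bits versus 128 bits), assuming each unit issues the same number of vector instructions per cycle and every lane does useful work. The short Python sketch below is only an illustration of that lane arithmetic under those assumptions; it does not model HVX or Neon instruction timing, and the helper name `lanes` is hypothetical.

```python
NEON_BITS = 128    # Neon vector register width
HVX_BITS = 1024    # HVX vector register width

def lanes(vector_bits: int, element_bits: int) -> int:
    """Number of elements of a given width that fit in one vector register."""
    return vector_bits // element_bits

# Compare lane counts for common image-processing element sizes.
ratios = {}
for elem in (8, 16, 32):
    neon = lanes(NEON_BITS, elem)
    hvx = lanes(HVX_BITS, elem)
    ratios[elem] = hvx // neon
    print(f"{elem}-bit elements: Neon {neon} lanes, HVX {hvx} lanes, ratio {ratios[elem]}x")
```

The ratio is the same for every element size because both widths scale by the same factor; real throughput also depends on issue width, memory bandwidth, and how well a kernel vectorizes.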

Oracle Shrinks Sparc M7

Octa-Core Sonoma Processor Aims for Scale-Out Servers

After cramming a record-setting 10 billion transistors into the 32-core Sparc M7 server processor last year, Oracle is introducing a smaller version with one-fourth as many CPUs. Code-named Sonoma, the new octa-core chip is actually better integrated in some ways — it's the first server processor to include an InfiniBand host channel adapter for clustering and remote direct memory access (RDMA). Whereas the powerful Sparc M7 is designed for scale-up computing, Sonoma is designed for scale-out applications. Yet it retains the bigger processor's unique features, such as hardware accelerators for the company's database software and application-data integrity checking. Other features for reliability, availability, and serviceability (RAS) made the cut, too. At the recent Hot Chips conference, Oracle presented Sonoma as a junior version of the Sparc M7 that costs less money, consumes less power, and requires less board space. [September 7, 2015]

Figure 1: Die plot of Oracle's Sparc Sonoma processor.
Figure 2: Sonoma's InfiniBand virtualization.
Figure 3: Sonoma versus Sparc T5-2.
Table 1: Sparc Sonoma versus Intel's Xeon D1540 and Xeon E5-2630Lv3.

QuickLogic's Smarter Sensor Hub

EOS-S3 Integrates ARM, DSP, FPGA, and Voice Triggering

Paying attention to boring chitchat can be draining, but today's smartphones and other voice-enabled devices must constantly listen to our conversations to detect keyword commands and passphrases. QuickLogic's new EOS-S3 sensor hub makes that tedium more power efficient than ever. It includes an always-on sound detector that can listen and respond to predefined voice triggers while drawing a mere 350 microamps. This highly integrated SoC also has an ARM Cortex-M4, a micro-DSP core, and a programmable-logic fabric. Capable of monitoring up to 20 sensors, the EOS-S3 is designed for smartphones, tablets, Internet of Things (IoT) devices, and wearables. The EOS-S3 implements the Low Power Sound Detector and TrulyHandsFree technology from Sensory, which claims 95% command-recognition accuracy even in noisy environments. [August 17, 2015]

Figure 1: QuickLogic EOS-S3 block diagram.
Figure 2: QuickLogic's EOS-S3 in a typical IoT or wearable design.
Table 1: EOS-S3 power consumption.

Freescale Overhauls the Data Plane

DPAA2 Streamlines Packet Processing for Future QorIQ Chips

Freescale is extensively overhauling its Data Path Acceleration Architecture (DPAA), a blanket term for the specialized packet-processing logic in QorIQ chips. DPAA2 is a major revision of the data plane that is more powerful, more flexible, and more programmable than the company's previous designs. It was inspired by Nokia's Open Event Machine, a model for nonblocking data-plane processing that supersedes conventional thread-based models in multicore processors. The industrywide OpenDataPlane initiative is loosely based on Open Event Machine and is promoted by Linaro, a consortium that develops open-source Linux software for the ARM architecture. OpenDataPlane also supports the Power, MIPS, and x86 architectures, and Freescale is implementing DPAA2 in all of its QorIQ processors, not just the ARM chips. DPAA2 will debut in the QorIQ LS2085A and LS2045A, a pair of ARM-based communications processors that began sampling in 1Q15. [July 20, 2015]

Figure 1: DPAA2 architecture.
Figure 2: Advanced I/O Processor block diagram.
Figure 3: DPAA2's queue/buffer manager.
Figure 4: Layer 2 Ethernet switch.

QorIQ Chips Add More ARMs

Freescale's New LS1 Processors Have Up to Eight 64-Bit CPUs

Freescale is expanding its QorIQ family by adding the most powerful members of the LS1 series announced to date. The new LS1048A and LS1088A have four or eight ARM Cortex-A53 cores operating at 1.5GHz — plus the company's much improved packet-acceleration hardware. Although they aren't the first LS1-series chips to use Cortex-A53, the LS1088A is the first to have more than four cores. These new chips are designed mainly for intelligent network interface cards (NICs) and edge routers, and they are also useful for industrial and aerospace applications. They have dual 10 Gigabit Ethernet (10GbE) ports, eight GbE ports, cryptography engines, and Freescale's second-generation packet-acceleration hardware (Data Path Acceleration Architecture, or DPAA2). Freescale also made two important roadmap announcements: future LS2-series processors will use the more muscular Cortex-A72, and the Power Architecture branch of the QorIQ family will advance to 16nm FinFET technology. Some of those 16nm PowerPC chips will be shrinks of existing 28nm T-series designs; others will be fresh designs. [July 13, 2015]

Figure 1: Freescale QorIQ LS1088A block diagram.
Table 1: Feature comparison of the QorIQ LS1088A and LS1048A.

Cavium Completes Octeon III Line

CN72xx- and CN73xx-Series Processors Integrate 4 to 16 CPU Cores

If Goldilocks thought Cavium's initial Octeon III processors were too small and the later ones too big, she would find the newest chips to be just right. The CN72xx and CN73xx midrange products are scheduled to begin sampling in July and start volume production in 4Q15, the company says. The fast ramp from sampling to production is possible because Cavium has already delivered four other series of lower- and higher-end Octeon III chips using the same GlobalFoundries 28nm process. The CN7230, CN7240, CN7340, CN7350, and CN7360 fill the midrange of the Octeon III family by integrating 4 to 16 of Cavium's MIPS64-compatible CPU cores. Below them are the CN70xx and CN71xx series, which have one to four CPUs and which began production in 4Q14. Above them are the CN77xx and CN78xx series; these chips have 16 to 48 CPUs and began production in 2Q15. [June 29, 2015]

Figure 1: Cavium's Octeon III family.
Table 1: Cavium's Octeon III CN72xx and CN73xx series.

Sonics Offers Power-Management IP

Ice-Grain: Industry's First Licensable Power Manager for SoCs

After pioneering licensable on-chip interconnects since the 1990s, Sonics is branching out with the industry's first licensable on-chip power manager. Implemented as synthesizable intellectual property (IP), the new Ice-Grain subsystem will work with any interconnect and can bring sophisticated energy-saving technology to any SoC design. Similar technology is proprietary and appears only in some advanced SoCs designed by top-tier chip vendors. Ice-Grain (not to be confused with "in-circuit emulation") is a hierarchical control subsystem that manages power, clock, and voltage domains. It enables chip architects to divide their designs into many more individually controllable domains than are practical using conventional techniques. By having more domains, the chip can power only those circuits it needs at any given moment, thereby reducing both active power and static leakage. [June 15, 2015]

Figure 1: Ice-Grain power-state switching.
Figure 2: Ice-Grain's central controller.
Figure 3: Ice-Grain integration with SonicsGN.
Table 1: Power-saving techniques ranked by transition latency.

Arteris FlexNoC Gets Physical

Licensable Network-on-a-Chip Eases Timing Closure

Hoping to reduce the number of chip designers furloughed to funny farms, Arteris has introduced a new version of its licensable network-on-a-chip (NoC) that tackles one of the industry's most maddening problems: timing closure. By adding some physical awareness and layout automation to the early phases of the design process, FlexNoC Physical ensures that signals can traverse the chip's interconnects within the design's timing parameters. As a leading vendor of NoC intellectual property (IP) with more than 60 licensees, Arteris has industrywide visibility into the problem. FlexNoC Physical is the company's response to customer demand for a timing solution that precedes logic synthesis and physical layout. [May 22, 2015]

Figure 1: Arteris FlexNoC block diagram.
Figure 2: Critical-path pipelining.
Figure 3: NoC pipeline placement.
Figure 4: FlexNoC versus a conventional crossbar.

TI Samples New KeyStone DSP

[Brief Item]

Texas Instruments is sampling a new KeyStone II embedded processor for high-speed signal processing in avionics, defense, medical, and test-and-measurement applications. The 66AK2L06 has two ARM Cortex-A15 cores and four TMS320C66x DSPs, all running at 1.0GHz or 1.2GHz, depending on the model. The chip is sampling now in 28nm technology and scheduled for volume production in 3Q15. TI derived the 66AK2L06 from the KeyStone II TCI6630AK2L wireless-base-station processor. Omitting some cellular-specific features reduces the chip's cost and power consumption. We suspect the 66AK2L06 is actually the same die, which would enable TI to salvage some base-station chips whose wireless hardware fails to pass muster. [May 11, 2015]

Freescale Upgrades Automotive Vision

New S32V234 Vision Processor Enables Computer-Assisted Driving

Freescale is marking another milepost on the long road to the driverless horseless carriage. In June, the company plans to sample a new processor family designed for advanced driver-assistance systems (ADAS). The first chip is the S32V234, which combines real-time computer vision with intelligent image analysis, enabling such functions as autonomous emergency braking, lane-departure correction, road-sign recognition, and adaptive cruise control. It's also capable of sensor fusion — for example, integrating the 360-degree view of multiple cameras and sensors. The S32V234 has four 64-bit ARM Cortex-A53 CPUs running at 1.0GHz. A 32-bit Cortex-M4 CPU offloads I/O control. Two Cognivue Apex-642 cores (each clocking at 500MHz) handle the computer-vision processing, aided by an image signal processor. For 3D graphics and video, Freescale licensed Vivante's GC3000 GPU and an H.264 video encoder/decoder. A Freescale cryptography engine enables secure communications with other system components and the outside world. [April 27, 2015]

Figure 1: Automotive-vision market growth.
Figure 2: Freescale S32V234 block diagram.
Table 1: Cognivue Apex-642 block diagram.

Ceva Sharpens Computer Vision

New Ceva-XM4 DSP Core Adds FPUs and 32/64-Bit Vectors

As more machines gain the gift of sight, engineers are rediscovering a principle long known to biologists: vision is equally a sensory perception and a cerebral function. The eyes see, but the brain interprets and reacts. Thus, processing power is as vital to computer vision as image capture. To augment those back-end functions, Ceva has introduced a new licensable DSP core optimized for vision processing. The Ceva-XM4 is a fourth-generation design that has numerous improvements over the previous Ceva-MM3101. It quadruples the number of multiply-accumulate (MAC) units, quadruples the width of VLIW operations, adds 32-bit floating-point units and vector operations, and doubles the number of scalar units. It also boosts the I/O bandwidth by 100% and memory bandwidth by 33%. [April 27, 2015]

Figure 1: Ceva-XM4 block diagram.
Figure 2: Ceva-XM4 performance versus Ceva-MM3101.
Figure 3: Scatter-gather memory operations.
Table 1: Ceva-XM4 features versus Ceva-MM3101.

EZchip's Tile-Mx Grows 100 ARMs

Designers Ditch Proprietary CPUs in 100-Core ARMv8 Processor

Boasting more ARMs than a Hindu goddess, EZchip's new Tile-Mx100 is by far the largest 64-bit ARM processor yet announced. It weaves 100 Cortex-A53 cores together in a cache-coherent mesh network that also includes packet accelerators, cryptography engines, memory controllers, and high-speed I/O interfaces. Intended for 100Gbps data-plane networking and network-function virtualization, the Tile-Mx100 significantly raises the bar for manycore ARM designs. It's also the first fruit reaped from EZchip's $130 million acquisition of Tilera last year. The Tile-Mx100 follows the Tile-Gx family, whose largest member is the 72-core Tile-Gx8072. [March 2, 2015]

Figure 1: EZchip Tile-Mx100 block diagram.
Figure 2: A SkyMesh quad-core tile.
Table 1: Comparison of manycore processors for networking: EZchip's Tile-Mx100 and Tile-Gx72, Broadcom's XLP980, and Cavium's Octeon III CN7890.

AMD Unearths Excavator CPU

Carrizo and Carrizo-L Processors Target Low-Cost Notebook PCs

Like black holes, AMD's Carrizo processors are packing more stuff into the same space while radiating less heat. And AMD hopes Carrizo's gravitational attraction will be so irresistible that customers will never achieve escape velocity for a return trip to Planet Intel. Carrizo succeeds the Kaveri processors that appeared last year. A related family, Carrizo-L, cuts costs and power further by omitting several features; it succeeds the Beema processors also introduced last year. Both new processors will appear mainly in low-cost notebook PCs, small desktops, and convertible tablet notebooks. Remarkably, Carrizo crams 29% more transistors onto a 250mm² die that's barely larger than Kaveri's — without a process shrink. AMD is using the larger transistor budget to introduce its new Excavator CPU core, enhance the integrated GPU and video accelerator, debut its first implementation of adaptive voltage/frequency scaling, fully support the Heterogeneous System Architecture, and make Carrizo a true SoC by integrating the south-bridge system controller. [February 23, 2015]

Figure 1: AMD Carrizo die photo.
Figure 2: AMD's adaptive voltage/frequency scaling.
Table 1: Feature comparison of AMD's Carrizo, Carrizo-L, and A10-7300 (Kaveri) processors.

Analysts' Choice Winners for 2014

Recognizing the Best Chips and Technology of the Past Year

By The Linley Group

To recognize the top semiconductor offerings of the year, The Linley Group presents its 2014 Analysts' Choice Awards. These awards span several categories: embedded processors, mobile processors, PC and server processors, processor-IP (intellectual property) cores, and related technology. We have presented them in Microprocessor Report for many years. This year, we are adding two new categories to recognize chips that are not processors: mobile chips and networking chips. The new categories reflect our expanded coverage of these areas in our sister publications Mobile Chip Report and Networking Report.

To choose each winner, The Linley Group's team of technology analysts gathered to discuss the merits of the leading products that entered production (or, in the case of IP, production RTL) in 2014. This guideline eliminates "paper" products and allows us to evaluate delivered capabilities, not promises. We also considered only merchant offerings (i.e., chips that sell to system vendors) and not ASIC or in-house designs. Our analyst team is deeply familiar with all the leading products, having written about them over the course of the past year. We selected the winners on the basis of their performance, power, features, and cost for their target applications. [January 19, 2015]

Best PC or Server Processor: Intel Xeon E7v2 family
Best Embedded Processor: Broadcom XLP980
Best Mobile Processor: Nvidia Tegra K1-64
Best Processor IP: ARM Cortex-M7
Best Mobile Chip: STMicroelectronics STM32F411 sensor hub
Best Networking Chip: Marvell Prestera DX4251 Carrier Ethernet switch
Best Technology: Samsung 3D-NAND flash memory

Power8 Hits the Merchant Market

Memory Bandwidth Helps IBM Server Processor Ace Big Benchmarks

IBM is making good on its plan to sell Power8 processors to third parties, with Tyan already offering rack-mount development systems. Newly disclosed scores show Power8 beating Intel's most powerful server processor, the 18-core Xeon E5-2699v3 (Haswell-EP), on important benchmark tests. Both processors deliver outstanding performance on the SPEC CPU benchmarks, but IBM's huge advantages in multithreading and memory bandwidth favor Power8 when running larger test suites that more closely reflect real-world enterprise applications. Overall, the results show that IBM offers a viable high-end alternative to Intel's market-leading products. Equally important to Big Blue, Power8's performance is energizing the OpenPower Foundation, an IBM-led alliance that rallies other companies to create a larger hardware and software ecosystem around the processor. IBM is offering Power8 chips to system builders in the merchant semiconductor market and is even licensing the architecture to other processor vendors. [December 29, 2014]

Table 1: IBM Power8 processors for the merchant market.
Table 2: Power8 versus Haswell-EP.

Opportunity NoCs, NetSpeed Answers

Startup's Network-on-a-Chip Technology Promises to Ease SoC Design

NetSpeed Systems, a three-year-old network-on-a-chip (NoC) vendor, received a vote of confidence in November by raising a second round of funding from Intel Capital and Walden-Riverwood Ventures. Although the dollar amount was undisclosed, it will strengthen the startup's position versus established rivals like Sonics and Arteris. NetSpeed also faces growing competition from ARM, whose licensable cache-coherent interconnects are becoming more sophisticated and are encroaching on some territory the NoC vendors have staked out. The growing complexity of SoC designs is creating more opportunities for licensable NoCs. These configurable fabrics are one more piece of intellectual property (IP) that's often better obtained ready-made than designed from scratch — just like the CPUs, GPUs, and peripheral cores that NoCs weave together on a chip. [December 1, 2014]

Figure 1: NetSpeed's Gemini network-on-a-chip.
Figure 2: NetSpeed's NocStudio configuration tool.
Figure 3: Orion versus Amba AXI.
Figure 4: Optimizing Orion.
Figure 5: Two Gemini NoC implementations.

AppliedMicro ARMs for Embedded

New Helix Family Inherits DNA From X-Gene Server Processors

AppliedMicro's X-Gene server processors are spawning a new family of ARM-compatible embedded processors intended mainly for communications. The first two members of the new Helix family use existing X-Gene die built in 40nm CMOS technology, but future products include new designs built in a 28nm high-k metal-gate (HKMG) process. All are compatible with the 64-bit ARMv8 architecture. The Helix family will eventually supersede AppliedMicro's PacketPro APM86xxx embedded processors, which have 32-bit PowerPC 460 CPUs and are manufactured in 40nm technology. Those single- and dual-core chips are highly optimized for packet processing and communications. By contrast, Helix chips have two, four, or eight 64-bit CPUs, and we believe they have much of the same packet acceleration as the PacketPro Mamba and Diamondback processors. [October 27, 2014]

Figure 1: Block diagram of AppliedMicro's Helix 2 embedded processor.
Table 1: Feature comparison of AppliedMicro's Helix embedded processors.
Table 2: AppliedMicro's Helix 2 versus Cavium's Octeon III CN72xx and Freescale's QorIQ LS2085A.

Freescale's LS1 Gets 64 Bits

New QorIQ LS1043A and LS1023A Processors Use Cortex-A53

Once an exclusive feature of servers, workstations, and supercomputers, 64-bit CPUs are now spreading even to some low-end embedded processors. Freescale's new QorIQ LS1043A and LS1023A are the first 64-bit chips in the entry-level LS1 series. They integrate up to four ARM Cortex-A53 CPUs with a cryptography engine, packet acceleration, 10-Gigabit Ethernet, and DDR4 memory control. Despite their maximum target clock frequency of 1.5GHz, they consume only 8W or less — cool enough for fanless systems. Applications include integrated-services branch routers, security appliances, industrial controllers, and edge devices that implement software-defined networking (SDN) and network-function virtualization (NFV). [October 27, 2014]

Figure 1: Block diagram of Freescale's QorIQ LS1043A communications processor.
Table 1: Feature comparison of Freescale's QorIQ LS1043A and LS1023A processors.
Table 2: Freescale's QorIQ LS1043A compared with Freescale's QorIQ T1042, Broadcom's XLP II XLP208, and Cavium's Octeon III CN7130.

ARC HS38 Can Run High-Level OS

New Synopsys CPU Offers an MMU, SMP, and Optional L2 Cache

Synopsys is licensing a new DesignWare ARC CPU core that aims for higher-end embedded applications. It is the most powerful implementation of the ARCv2 instruction-set architecture. Targets include Wi-Fi routers, Internet gateways, digital TVs, smart appliances, and advanced driver-assistance systems. To muscle up, the new ARC HS38 core adds a memory-management unit (MMU), a translation lookaside buffer (TLB), an optional L2 cache, and extended memory addressing. Consequently, it can run a virtual-memory operating system, such as full versions of Linux. The 32-bit synthesizable CPU also supports dual- and quad-core clusters with cache-coherent symmetric multiprocessing (SMP). Yet it retains the user configurability, low power consumption, and small size of its ARC predecessors. Simulations indicate the HS38 will deliver a maximum worst-case clock frequency of 1.6GHz in a 28nm high-performance-mobile CMOS process, such as TSMC's 28nm HPM. The typical clock frequency in that process is 2.2GHz, offering plenty of performance headroom. [October 20, 2014]

Figure 1: Block diagram of the Synopsys DesignWare ARC HS38 core.
Figure 2: Block diagram of an ARC HS38x2 dual-core cluster.
Table 1: Feature comparison of the ARC HS38, HS36, and HS34.
Table 2: ARC HS38 versus ARM's Cortex-A7 and Imagination Technologies' MIPS32 interAptiv CPU cores.

KeyStone Targets Industrial Apps

[Brief Item]

Texas Instruments is sampling four KeyStone II processors it originally announced two years ago and is extending their temperature range for industrial and military-aerospace applications. These chips combine up to four Cortex-A15 CPUs with an integrated Ethernet switch. Although they are intended mainly for networked industrial applications, their switched Ethernet ports and optional DSP also suit them to enterprise gateways. TI announced the 66AK2Exx and AM5K2Exx along with the 66AK2Hxx back in 2012; the 66AK2H products sampled in December 2012, but engineering samples of the other two families didn't appear until February 2014. General sampling began in September, and production is scheduled to start by the end of this year. [October 13, 2014]

Sparc64 XIfx Uses Memory Cubes

Hybrid Memory Cubes Accelerate Fujitsu's Supercomputer Processor

In the never-ending race to build the world's fastest supercomputer, using conventional technology is like running a 100-meter sprint in rubber boots. So Fujitsu's newest Sparc64 processor laces up some wing-footed running shoes — such as Micron's Hybrid Memory Cubes, new 256-bit vector instructions, and a pair of "assistant cores" that offload system software from the other 32 CPU cores. In fact, this is the first processor we've seen that uses Micron's stacked-memory cubes. Unveiled at the recent Hot Chips symposium, the Sparc64 XIfx is the latest in Fujitsu's line of SPARC-compatible processors for high-performance computing (HPC). These devices have particularly strong FPUs and single-instruction, multiple-data (SIMD) extensions, which Fujitsu continues to improve. And the core counts are doubling with each generation. The new Sparc64 XIfx has 34 CPUs, including the two assistant cores. The previous Sparc64 IXfx (introduced in 2012) had 16 cores, and the Sparc64 VIIIfx (2011) had 8. [September 22, 2014]

Figure 1: Sparc64 XIfx die photo with overlay.
Figure 2: Block diagram of Sparc64 XIfx core groups.
Table 1: Key parameters for Fujitsu's Sparc64 XIfx, IXfx, and VIIIfx supercomputer processors.

Sparc M7 Tops 10 Billion Transistors

Oracle's Newest 32-Core Server Processor Powers Bigger Iron

It's a good thing Oracle doesn't sell millions of Sparc server processors, or the world might run out of sand. The next-generation Sparc M7 weighs in with more than 10 billion transistors on a die we estimate at about 700mm². Each of its 32 CPU cores can simultaneously execute eight threads, and the chip has more than 70MB of cache. The biggest Sparc M7 system can encompass 64 sockets, which would total 2,048 CPUs, 4.4GB of cache, 16,384 threads, and up to 128 terabytes (TB) of physical memory. What to do with this monster? Run Oracle's database software, of course. Since Oracle acquired Sun Microsystems in 2010 and took over SPARC development, it has executed a surprisingly aggressive roadmap that has new processors coming out every year. Because Oracle is virtually the only customer for these processors, the architects can tune them for the company's famous enterprise software. Consequently, Sparc processors are gradually evolving into Oracle ASICs — without sacrificing general-purpose programmability. [September 8, 2014]

Figure 1: Oracle Sparc M7 die photo.
Figure 2: Sparc M7 performance versus Sparc M6.
Table 1: Key parameters for Oracle's Sparc M7, M6, and M5 server processors.
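The system-level totals for the 64-socket configuration follow directly from the per-chip figures. The short Python sketch below reproduces them. It assumes exactly 70MB of cache per chip (the text says "more than 70MB") and binary units for the cache total, which is how 64 × 70MB comes out near the quoted 4.4GB. The constant names are illustrative.

```python
# Per-chip Sparc M7 figures from the text (cache is a lower bound: "more than 70MB").
SOCKETS = 64
CORES_PER_CHIP = 32
THREADS_PER_CORE = 8
CACHE_MB_PER_CHIP = 70
MEMORY_TB_TOTAL = 128

cpus = SOCKETS * CORES_PER_CHIP                    # total CPU cores
threads = cpus * THREADS_PER_CORE                  # total hardware threads
cache_gb = SOCKETS * CACHE_MB_PER_CHIP / 1024      # binary units (1GB = 1,024MB)
memory_per_socket_tb = MEMORY_TB_TOTAL / SOCKETS   # physical memory per processor

print(cpus, threads, round(cache_gb, 1), memory_per_socket_tb)
```

The same arithmetic shows that 128TB spread across 64 sockets works out to 2TB of physical memory per processor.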

Broadcom Boosts Small Cells

New BCM617x5 Processors Add CFR, Better Carrier Aggregation

Broadcom is sampling three new small-cell base-station processors that improve on its previous chips by adding crest-factor reduction to their digital front ends and by enabling better LTE carrier aggregation. Other new features of the BCM617x5 series include dual-sector 2x2 MIMO in the high-end product, more-powerful CPU and DSP cores, dual-band Wi-Fi hosting, and support for China Mobile's Zuc stream cipher. The new processors are the BCM61765, BCM61755, and BCM61735. They are pin compatible with the BCM61760, BCM61750, and BCM61730 that Broadcom announced and shipped last year. All BCM617xx processors support LTE (FDD or TDD) and LTE-Advanced (LTE-A), plus multiple 3G standards (WCDMA and TD-SCDMA). All are capable of simultaneous dual-mode (3G/4G) operation, and they also support network sniffing for various 2G standards in self-organizing networks (SONs). The new chips began sampling in 1Q14 and are scheduled for production in 4Q14. [August 11, 2014]

Figure 1: Block diagram of Broadcom's BCM61765.
Table 1: Key parameters for Broadcom's BCM617x5 base-station processors.
Table 2: Broadcom's BCM61765 versus Qualcomm's FSM9900 and Texas Instruments' KeyStone II TCI6620K2L.
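The article does not describe how Broadcom implements crest-factor reduction (CFR). The textbook starting point is to clip the signal envelope at a threshold so that rare peaks no longer dictate the power amplifier's headroom, which lowers the peak-to-average power ratio (PAPR). The Python sketch below shows only that idea on a synthetic multitone signal. Production CFR also filters the clipping noise to limit spectral regrowth, which is omitted here, and the threshold of 1.5 times the RMS level is an arbitrary choice.

```python
import cmath
import math

def papr_db(samples):
    """Peak-to-average power ratio of a complex baseband signal, in dB."""
    powers = [abs(s) ** 2 for s in samples]
    return 10 * math.log10(max(powers) / (sum(powers) / len(powers)))

def hard_clip(samples, threshold):
    """Limit each sample's magnitude to `threshold` while keeping its phase."""
    out = []
    for s in samples:
        mag = abs(s)
        out.append(s if mag <= threshold else s * (threshold / mag))
    return out

# Hypothetical test signal: eight equal-amplitude tones that all peak together,
# a crude stand-in for a multicarrier (OFDM-like) waveform with large peaks.
N = 1024
tones = range(1, 9)
signal = [sum(cmath.exp(2j * math.pi * k * n / N) for k in tones) for n in range(N)]

rms = math.sqrt(sum(abs(s) ** 2 for s in signal) / N)
clipped = hard_clip(signal, threshold=1.5 * rms)  # clip peaks above 1.5x RMS

print(f"PAPR before: {papr_db(signal):.1f} dB, after: {papr_db(clipped):.1f} dB")
```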

TI Slashes Sitara Power

AM437x Processors Cut Some Features to Reach One Watt

Texas Instruments is sampling four new Sitara embedded processors that use much less power than their predecessors while upgrading the CPU core. Whereas the previous Sitara AM38xx chips use Cortex-A8 and consume about 5W, the new Sitara AM437x chips use Cortex-A9 and consume only about 1W (maximum). Because TI manufactures both series in the same 45nm CMOS process, some compromises were inevitable: a lower CPU clock frequency, slower integrated graphics, fewer high-speed I/O interfaces, less memory bandwidth, and no pin compatibility with previous Sitara chips. Like their forebears, the new processors target a broad range of embedded applications, but they focus on real-time industrial communications, test-and-measurement instruments, barcode scanners, portable data terminals, medical devices, and GPS navigation. They are particularly well suited for digital signage and other designs that take advantage of the PowerVR SGX530 GPU in two of the models. [July 21, 2014]

Figure 1: Block diagram of Texas Instruments Sitara AM4379.
Table 1: TI's new Sitara AM437x series: the AM4379, AM4378, AM4377, and AM4376.

AMD Upgrades Embedded G-Series

Better Power Efficiency, Faster Graphics, and TrustZone Security

The Eagles have landed: AMD's Steppe Eagle and Crowned Eagle, that is. Those are the code-names for six new Embedded G-Series SoCs that will go talon to talon with Intel's Atom and other high-performance embedded processors. Although AMD is optimistically pitching the new dual- and quad-core chips for data-center switches and network-security appliances, their main markets are PC-like embedded systems: kiosks, point-of-sale terminals, thin clients, gambling machines, and medical equipment. Such systems commonly employ x86 embedded processors derived from PC processors and usually run Windows or Linux. Unlike previous Embedded G-Series (Kabini) processors, the new chips use the improved Puma CPU core (sometimes called Puma+ or Jaguar+). The new chips are in production now, and their clock speeds of 1.0–2.4GHz and TDPs of 6–25W will serve a wide range of embedded applications. [June 30, 2014]

Figure 1: Block diagram of AMD's Embedded G-Series GX-424CC.
Table 1: AMD's new Embedded G-Series SoCs: the GX-424CC, GX-412HC, GX-212HC, GX-210JC, GX-420MC, and GX-412TC.

Marvell's Armada Reaches 28nm

Armada 385 and 380 Begin Maiden Voyage in HKMG Technology

Marvell has quietly launched its first Armada-family embedded processors fabricated in 28nm technology. Leading the flotilla is the Armada 385, a dual-core chip powered by ARM's Cortex-A9. Its escort is the Armada 380, a single-core model based on the same die. Both processors target small-business, enterprise, and carrier-class communications equipment, such as 802.11ac Wi-Fi access points and network-attached storage (NAS). The most visible result of moving to 28nm high-k metal-gate (HKMG) technology is a big jump in clock speed — at its maximum target clock frequency of 1.6GHz, the Armada 385 is 60% faster than the Armada 375. And the new chip has four times more L2 cache than its predecessor, plus upgraded interfaces for DRAM, Ethernet, PCI Express (PCIe), and Serial ATA (SATA). [June 2, 2014]

Figure 1: Marvell Armada 385 block diagram.
Figure 2: Example Wi-Fi access point using the Armada 385 and other Marvell chips.
Table 1: Comparison of Marvell's Armada 385, 380, 375, and 370.
Table 2: Comparison of Marvell's Armada 385, Broadcom's StrataGX BCM58623, Cavium's Octeon III CN7020, and Freescale's QorIQ T1020.

Freescale Buys Mindspeed's Comcerto

[Brief Item]

Freescale is strengthening its home-networking lineup by acquiring the last fragment of Mindspeed — the Comcerto communications processors for broadband gateways and network-attached storage (NAS). If the deal closes this quarter as expected, Freescale will merge the ARM-based Comcerto line with its QorIQ family of Power Architecture and ARM processors and will continue their development. Mindspeed, which originated as a Conexant spinoff in 2004, has now been chopped into three pieces. In December, Macom paid $272 million for most of the company, and Intel paid $12 million for the Transcede base-station processors and related intellectual property. Macom is retaining Mindspeed's extensive analog portfolio and Comcerto voice-over-IP (VoIP) processors but is selling the Comcerto gateway processors to Freescale for an undisclosed sum. [May 5, 2014]

Freescale Expands QorIQ T-Series

New T4080 and T1-Series Processors Extend 28nm Power Lineup

After a yearlong drought of QorIQ T-series announcements, Freescale unveiled five new processors in that Power Architecture family at its U.S. technology forum this month. The T4080 is a quad-core eight-thread processor optimized for midrange communications infrastructure, and the others are single- and dual-core processors optimized for low-end communications and general embedded applications. These eagerly awaited 28nm chips fill several gaps in the QorIQ T-series, finally superseding some popular but aging P-series processors manufactured in 45nm technology. [April 21, 2014]

Figure 1: Summary of Freescale's QorIQ T-series processors.
Figure 2: QorIQ T4080 block diagram.
Figure 3: QorIQ T1024 block diagram.
Table 1: Comparison of the new QorIQ T4080 with the existing T2080 and T1040.
Table 2: Freescale's QorIQ T4080 versus Broadcom's XLP516 (XLP II) and Cavium's Octeon III CN7130.
Table 3: Comparison of the new QorIQ T1013, T1023, T1014, and T1024.
Table 4: Freescale's new QorIQ T1023 versus the ARM-based QorIQ LS1020A, Broadcom's StrataGX BCM58525, and Cavium's Octeon III CN7020.

Broadcom's XLP500 Raises Midrange

Newest XLP II Processors Offer 4, 6, or 8 CPUs and 40G Ethernet

With low- and high-end XLP II chips approaching volume production, Broadcom is now sampling the midrange members of this MIPS-compatible embedded-processor family. The new XLP500 series comprises three basic models with four, six, or eight CPU cores and two package options, for a total of six distinct products. They will consume about the same power as the previous-generation XLP300 chips but deliver more than twice the performance — mainly by quadrupling the speed of the network interfaces, doubling the number of CPUs, and boosting the clock frequency. The faster network interfaces launch these midrange products into the same stratosphere as previous-generation high-end processors. The XLP500 line supports two 40 Gigabit Ethernet ports, or up to eight 10GbE or nine GbE ports. Previously, only high-end communications processors supported 40GbE interfaces. Doubling the number of CPU cores and threads will help these muscular devices handle the faster packet flows. Assisting the CPUs are hardware accelerators for bulk cryptography, RSA cryptography, RAID storage, data compression, and deep packet inspection. [March 31, 2014]

Figure 1: Block diagram of Broadcom XLP532.
Figure 2: An XLP516 wireless base station.
Table 1: Broadcom XLP516, XLP524, and XLP532 processors.
Table 2: Broadcom's XLP532 versus Freescale's QorIQ T4160.

Freescale Integrates Digital Front End

Small-Cell Base-Station Processor Adds DFE and JESD Interfaces

Freescale's newest QorIQ Qonverge wireless base-station processor is the company's most integrated small-cell chip. By adding a digital front end (DFE), the B3421 eliminates the external FPGA or ASIC usually required to handle digital up/down-conversion and related functions. New JESD204 and JESD207 antenna interfaces enable glueless connections to the base station's analog section. Yet the processor retains the usual CPRI interfaces, giving customers the flexibility to use their own DFE and a remote radio head. Another new addition is a Serial ATA (SATA) interface. SATA enables local content caching on disk drives inside the base station — a feature particularly suited to the metrocells and microcells for which this processor is designed. [March 10, 2014]

Figure 1: Freescale QorIQ Qonverge B3421 block diagram.
Figure 2: Small-cell LTE base station.
Figure 3: Freescale's QorIQ Qonverge family.
Table 1: Key parameters for Qonverge B3421, B4420, and B4860 processors.
Table 2: Freescale's Qonverge B3421 versus two competitors: Qualcomm's FSM9900 and Texas Instruments' KeyStone II TCI6630K2L.

MIPS MCUs Outrun ARM

Microchip's PIC32MZ Microcontrollers Set CoreMark Record

Microchip's newest MIPS-based 32-bit microcontrollers not only match the features of their Cortex-M4 competitors but also achieve higher EEMBC CoreMark scores. The new PIC32MZ-EC family is powered by a MIPS microAptiv CPU core running at 200MHz — a speed demon by MCU standards. These MCUs have more memory than comparable chips (up to 2MB of flash and 512KB of SRAM) plus Ethernet, Hi-Speed USB 2.0, an LCD interface, and a cryptography accelerator. An early sample scored 654 CoreMarks — the highest EEMBC-certified score for any 32-bit MCU executing from internal flash memory. Microchip designed the PIC32MZ family for high-end controller applications, such as vehicle dashboard systems, building environmental controls, and consumer-appliance control modules. [February 17, 2014]

Figure 1: Microchip PIC32MZ-EC block diagram.
Table 1: Microchip's PIC32MZ-EC family.
Table 2: Microchip's PIC32MZ versus three competitors: Freescale's Kinetis K70, NXP's LPC43x, and Texas Instruments' TM4C129x.
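Because CoreMark scores scale with clock speed, comparing MCUs that run at different frequencies usually means normalizing to CoreMark/MHz. A minimal Python sketch using the figures quoted above (654 CoreMarks at 200MHz) follows; competitors' scores would have to come from EEMBC-certified results, and none are assumed here.

```python
def coremark_per_mhz(score, clock_mhz):
    """Normalize a CoreMark score by clock frequency (CoreMark/MHz)."""
    return score / clock_mhz

# Figures from the text: an early PIC32MZ-EC sample at 200MHz scored 654 CoreMarks.
print(f"{coremark_per_mhz(654, 200):.2f} CoreMark/MHz")  # prints "3.27 CoreMark/MHz"
```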

Analysts' Choice Winners for 2013

Recognizing the Best Processors of the Past Year

By The Linley Group

To recognize the top processor offerings of the year, The Linley Group presents its 2013 Analysts' Choice Awards. To choose each winner, The Linley Group's team of technology analysts gathered to discuss the merits of the leading products that entered production (or, in the case of intellectual property, production RTL) in 2013. This guideline eliminates "paper" products and allows us to evaluate delivered capabilities, not promises. Our analyst team is deeply familiar with all the leading processor products, having written about them for Microprocessor Report over the course of the past year. We selected the winners on the basis of their performance, power, features, and cost for their target applications. [January 20, 2014]

Best Mobile Processor: MediaTek MT6572
Best Embedded Processor: Texas Instruments KeyStone II TCI6636
Best Processor IP: ARM Cortex-A53
Best PC or Server Processor: Intel C2000 family
Best Processor Technology: Heterogeneous System Architecture (HSA)

Year in Review:
Embedded Processors Thrive at 28nm

Core Counts, Threads, and Chip Integration Reach New Heights

Embedded processors began moving to 28nm production en masse in 2013, enabling greater performance and integration while reducing cost and power consumption — perennial benefits that the industry always takes for granted but that will likely decelerate in the near future. For now, though, embedded processors are athletes in the prime of life, achieving record-breaking performance. Broadcom began sampling its largest XLP II processor, which can run 80 threads on 20 cores — more threads than any competitor. Cavium's 48-core Octeon III, another 28nm giant, has yet to sample. In a more surprising feat, Tilera began shipping its largest Tile-Gx processor — a 72-core mammoth that has more CPUs than any competitor and is built in older 40nm technology. Two additional trends apparent in 2013 were further movement toward the ARM architecture and 64-bit processing. [January 13, 2014]

Figure 1: Tilera's Tile-Gx8072 posted the highest single-socket CoreMark score.
Figure 2: Industry revenue of embedded processors by application, 2012–2017.

Intel Saves Mindspeed's Wireless Chips

[Brief Item]

Intel's $12 million bid for Mindspeed's wireless-infrastructure processors will keep the existing products alive and ensure delivery of the future products announced thus far. Although the processor giant is saying little about the deal, it has acquired Mindspeed's inventories and intends to fulfill the product roadmap, including the new Transcede T3400 and T4400 base-station processors announced last year. Before Intel's December 16 offer, Mindspeed's wireless business looked precarious. On November 5, Macom announced a $272 million offer for the rest of the company. Although Mindspeed's analog technology was a desirable acquisition target, Macom had no interest in the base-station processors. Had Mindspeed not found a buyer for those processors, Macom would have discontinued them. SEC filings suggest that Mindspeed was already negotiating their sale to Intel before Macom tendered its bid for the rest of the company. [January 13, 2014]

Qualcomm Unveils Comms Processors

First Qualcomm Atheros IPQ Chips Aim for Home Networking

Qualcomm Atheros is sampling a new family of communications processors for home networking, Internet gateways, and household automation. Future members of the IPQ family will target other network-edge devices, such as small-business access points and small-cell base stations. Although these markets are already crowded with major competitors, Qualcomm hopes to succeed by applying its experience in related fields, such as smartphones, cellular base stations, Wi-Fi access points, and Ethernet switching. The first IPQ chips are the IPQ8064 and IPQ8062. (IPQ stands for Internet Processor Qualcomm.) The IPQ8064 is the primary product; the other chip is a slightly slower and lower-cost subset. [December 9, 2013]

Figure 1: Block diagram of Qualcomm Atheros IPQ8064.
Table 1: Comparison of the Qualcomm Atheros IPQ8064, Freescale QorIQ LS1020A, Broadcom StrataGX BCM58525, and Cavium Octeon III CN7020.

XMOS Adds an ARM

Cortex-M3 Brings ARM Compatibility to Proprietary MCUs

XMOS is sampling its first xCore microcontroller that supplements the proprietary CPU with an ARM Cortex-M3. The initial product is actually a multichip package that combines an xCore MCU with a Cortex-M3 MCU from Silicon Labs. Scheduled for production in 1Q14, this first xCore-XA device will be followed by additional models having slightly different features. By adding ARM compatibility, XMOS is broadening the market for its specialized 32-bit MCUs, which soar to clock frequencies as high as 500MHz. The company envisions customers using the xCore CPU to handle real-time control tasks while the ARM CPU runs higher-level software, such as an application program and a user interface. Although other xCore chips can handle both tasks, ARM compatibility lets developers use their existing code for the higher-level software while running optimized control code on the proprietary core. [November 18, 2013]

Figure 1: XMOS xCore-XA block diagram.

Synopsys Accelerates ARC CPUs

ARC HS Cores Improve Clock Speeds, Power Efficiency, Code Density

Synopsys has introduced the fastest ARC CPU cores yet. Whereas previous CPUs implementing the ARCv2 architecture have short three-stage instruction pipelines, the new ARC HS (High Speed) family deepens the pipeline to 10 stages. Simulations indicate it will reach clock frequencies of up to 1.6GHz for chips fabricated in 28nm high-k metal-gate (HKMG) processes, such as TSMC's 28nm HPM. But clock speed isn't the only selling point: ARC HS cores also improve power efficiency, silicon area, code density, integer throughput, floating-point performance, and real-time trace features. The ARC HS34 core is designed primarily for real-time control applications, such as solid-state drives (SSDs), network-attached storage (NAS), home gateways, home networking, and mobile products. The other new CPU, the ARC HS36, is intended primarily for higher-end mobile consumer products, such as digital cameras and tablets, as well as for digital TVs, set-top boxes, automobile infotainment systems, and the "Internet of Things" (noncomputer devices). [November 11, 2013]

Figure 1: ARC HS block diagram.
Figure 2: ARC HS instruction pipeline.
Table 1: Comparison of ARC HS and ARC EM architectural features.
Table 2: Comparison of the ARC HS34/HS36 cores with Imagination's MIPS interAptiv, Cadence's Tensilica Diamond 570T, and ARM's Cortex-R5.

P5600 Extends MIPS Performance

New Warrior CPU Implements MIPS Release 5 and SIMD Extensions

Imagination Technologies is cementing its commitment to the MIPS architecture by introducing its first new CPU core since acquiring MIPS Technologies earlier this year. The P5600 is the first member of the Series5 Warrior family and is designed for consumer electronics, smartphones, tablets, and other high-performance embedded systems. It's a licensable and synthesizable 32-bit CPU core with a 16-stage instruction pipeline, four-issue superscalar execution, instruction reordering, improved branch prediction, extended memory addressing, and hardware virtualization. Its 128-bit dual-issue SIMD units can handle single- and double-precision floating-point operations as well as integer data types. By pairing some operations, the P5600 can effectively issue up to eight instructions per clock cycle. A new coherence manager supports SMP clusters with up to six CPUs and a shared L2 cache. The P5600 also widens its internal data paths and external-I/O interfaces. [October 28, 2013]

Figure 1: MIPS P5600 block diagram.
Figure 2: MIPS P5600 load pipeline.
Figure 3: MIPS P5600 six-core coherent cluster.
Table 1: Comparison of high-performance 32-bit CPU cores.

New Sensors for Smartphones

Aptina's CMOS Image Sensors Hunt for a Better Bayer

Smartphones have become the world's most popular cameras, and they are crushing the sales of compact digital cameras. IDC estimates that compact-digicam sales will plunge to 80 million units this year from 132 million in 2010. In comparison, The Linley Group forecasts 987 million smartphones will sell this year. One result is that phone makers are scrambling to boost pixel resolution and image quality — two opposing goals, as smaller pixel sites gather less light and produce more noise. Now, a major sensor supplier is promising higher quality and greater light sensitivity with smaller pixels. That supplier is Aptina Imaging, a privately held Micron spinoff in Silicon Valley. Its new technology is Clarity+, which substitutes a unique color-filter array for the nearly universal Bayer array and then applies sophisticated processing to extrapolate colors and suppress noise. Aptina says a Clarity+ sensor with 1.1-micron pixels can deliver the same image quality now available using 1.4-micron pixels while doubling the sensor sensitivity. [August 12, 2013]

Figure 1: Aptina's Clarity+ color-filter array versus the classic Bayer array.
Figure 2: Clarity+ color conversion and noise reduction.
Figure 3: Comparison using Macbeth color chart.
Figure 4: Low-light comparison of Bayer versus Clarity+.
Table 1: The classic Bayer color-filter array versus four alternatives.
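The trade-off between pixel size and light gathering can be made concrete: the light a pixel collects scales roughly with its area, so shrinking the pitch from 1.4 to 1.1 microns leaves each pixel with only about 62% of the area. The Python sketch below computes that ratio and prints the classic Bayer tile (RGGB phase), in which half the sites are green. It does not attempt to reproduce the Clarity+ pattern, whose layout the text does not specify.

```python
def relative_area(pitch_new_um, pitch_old_um):
    """Ratio of square-pixel areas; light collected scales roughly with area."""
    return (pitch_new_um / pitch_old_um) ** 2

def bayer_mosaic(rows, cols):
    """Classic Bayer color-filter array, RGGB phase: half the sites are green."""
    tile = [["R", "G"], ["G", "B"]]
    return [[tile[r % 2][c % 2] for c in range(cols)] for r in range(rows)]

ratio = relative_area(1.1, 1.4)
print(f"A 1.1-micron pixel has about {ratio:.0%} of the area of a 1.4-micron pixel")

mosaic = bayer_mosaic(4, 4)
for row in mosaic:
    print(" ".join(row))
```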

Intel Tests Code Compression

[Brief Item]

Intel Labs has demonstrated a compile-time code-compression technology in which a processor unpacks the compressed program at run time. The goal is to cut silicon costs by reducing the amount of on-chip memory. Although Intel calls the technology "direct compressed execution," the compressed program is actually decompressed on the fly before execution. It differs from other code-compression techniques (such as ARM's Thumb) that replace 32-bit instructions with 16-bit opcodes. Instead of using a modified instruction set, programmers write, compile, and link their code as usual, then employ a special utility to compress the binary file. [July 22, 2013]
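The brief does not say which compression algorithm Intel uses. The workflow it describes (compile and link normally, compress the finished binary with a separate utility, then expand it at run time before execution) can be sketched with Python's standard zlib module as a stand-in. The program bytes are synthetic, and a real implementation would decompress in hardware or firmware and might work in blocks rather than on the whole image at once.

```python
import zlib

def compress_image(binary: bytes) -> bytes:
    """Build step: compress an already-linked program image."""
    return zlib.compress(binary, 9)

def load_for_execution(stored: bytes) -> bytes:
    """Run-time step: expand the stored image back into ordinary machine code."""
    return zlib.decompress(stored)

# Hypothetical program image: repeated bytes stand in for real machine code,
# which often compresses well because opcodes and register fields recur.
program = bytes([0x55, 0x48, 0x89, 0xE5, 0x90, 0x90, 0xC3, 0x00] * 512)

stored = compress_image(program)
restored = load_for_execution(stored)

assert restored == program  # the instruction set is unchanged; code is bit-identical
print(f"on-chip storage: {len(program)} -> {len(stored)} bytes")
```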

XMOS Samples Mixed-Signal MCUs

Multichip Microcontrollers Add Power Management and ADCs

XMOS is sampling its xCore-Analog microcontrollers, which combine a 32-bit digital MCU and an analog controller chip in one package. Slated for production this fall, these multichip MCUs are intended mainly for high-end audio gear, automobile infotainment systems, factory robots, and other industrial applications. Unlike conventional MCUs, most XMOS devices have multiple CPU cores (which the company calls "tiles"), and those CPUs are unusually fast, reaching clock frequencies of 400–500MHz. Moreover, the CPUs support hardware multithreading with up to eight threads per core. These threads (which XMOS calls "logical cores") share equal time by executing in a deterministic round-robin fashion, switching contexts on every clock cycle. Another unusual feature is that xCore MCUs replace hard-wired I/O interfaces with software-configurable I/O pins. [July 15, 2013]

Figure 1: XMOS xCore-Analog block diagram.
Table 1: Three families of XMOS 32-bit microcontrollers.
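Deterministic round-robin threading means each logical core's share of the pipeline is fixed by the thread count rather than by what other threads are doing. The Python sketch below models that in the simplest possible way. It assumes every thread is always ready and one instruction issues per clock, and it ignores stalls and any pipeline constraints of real xCore hardware. Under those assumptions, a 500MHz core running eight threads gives each logical core 62.5 million instruction slots per second.

```python
def round_robin_schedule(num_threads, cycles):
    """Simplified model: each clock cycle issues one instruction from the next thread."""
    return [cycle % num_threads for cycle in range(cycles)]

def per_thread_mips(clock_mhz, active_threads):
    """Guaranteed instruction rate per logical core under strict round-robin."""
    return clock_mhz / active_threads

schedule = round_robin_schedule(num_threads=8, cycles=16)
print(schedule)  # threads 0-7 in order, then the same order again
print(per_thread_mips(500, 8), "MIPS per logical core at 500MHz with 8 threads")
```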

XLP980 Boasts 20 Cores

Broadcom's Biggest Beast Has 80 Threads, 160Gbps Networking

Broadcom is sampling its biggest XLP II processor, the XLP980, delivering unprecedented performance for an embedded processor. Sporting 20 CPU cores running at 2.0GHz, as well as 80 threads, four 40 Gigabit Ethernet ports, and impressive networking acceleration, the XLP980 can sustain throughputs up to 160Gbps. Moreover, a special interchip interface can unite a maximum of eight processors in a cache-coherent cluster whose total throughput is 1.28 terabits per second (Tbps). The XLP980 will compete with top-rank processors from Cavium and Freescale for high-end control- and data-plane applications — mainly in core routers, universal-services blades, security appliances, LTE gateways, and radio network controllers. With its cryptographic engines, regular-expression (reg-ex) acceleration, RAID acceleration, packet accelerators, and compression engines, it can manage routing, switching, security processing, load balancing, and cloud storage. New virtualization features open doors into data-center processing. [June 14, 2013]

Figure 1: Broadcom XLP980 block diagram.
Figure 2: MIPS64 Release 5 virtualization.
Figure 3: Broadcom's Interchip Coherency Interface.
Table 1: Comparison of high-end communications processors: Broadcom XLP980, Cavium Octeon II CN6880, and Freescale QorIQ T4240.
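The headline XLP980 figures are related by simple multiplication, as the Python sketch below shows. The cluster figure assumes throughput scales linearly across eight chips; it does not model any overhead from keeping the cluster's caches coherent.

```python
CORES = 20
THREADS_PER_CORE = 4          # 80 threads / 20 cores, as quoted
PORTS_40GBE = 4
CHIP_THROUGHPUT_GBPS = 160    # quoted sustained throughput per chip
MAX_CHIPS_PER_CLUSTER = 8

threads = CORES * THREADS_PER_CORE                   # hardware threads per chip
port_capacity_gbps = PORTS_40GBE * 40                # raw 40GbE port capacity
cluster_tbps = MAX_CHIPS_PER_CLUSTER * CHIP_THROUGHPUT_GBPS / 1000  # ideal scaling

print(threads, port_capacity_gbps, cluster_tbps)
```

The four 40GbE ports alone add up to the chip's 160Gbps throughput figure, and eight such chips give the quoted 1.28Tbps.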

Broadcom Broadens StrataGX

New ARM-Based Processors Improve Packet Processing and Security

Broadcom's latest StrataGX processors boost clock speeds, add Serial ATA, strengthen their security engines, and introduce a programmable packet accelerator. They also retain the integrated Gigabit Ethernet switches that set this family apart from other embedded processors. Their primary markets are enterprise access points and small-business Wi-Fi routers. Both new StrataGX processors have dual 1.2GHz Cortex-A9 cores with Neon SIMD extensions and error-protected L2 caches. The new high-end chip in the family is the BCM58525, intended mainly for enterprise Wi-Fi access points. When paired with Broadcom's Ethernet switch chips, it can also handle control-plane processing and cryptography acceleration. [June 3, 2013]

Figure 1: Broadcom's StrataGX family.
Figure 2: StrataGX BCM58525 block diagram.
Table 1: Key parameters for Broadcom's StrataGX processors.

Armada 375 Has Dual Cortex-A9s

[Brief Item]

Marvell has announced the first Armada embedded processor to use CPU cores licensed from ARM instead of ARM-compatible cores designed in house. The Armada 375 has two Cortex-A9 CPUs clocked at 800MHz to 1.0GHz, and they include ARM's 64-bit vector floating-point (VFP) units and 128-bit Neon SIMD extensions. Designed for low-cost storage control, media servers, and light-duty networking, the 375 surpasses other Armada 300-series chips and measures up to processors in the higher-end Armada XP family. [May 27, 2013]

Embedded Atoms Target Storage

[Brief Item]

Intel has long positioned Atom processors as lower-power, less expensive chips for netbooks, tablets, and smartphones. With the S1200 family (code-named Centerton), the company has extended Atom into servers as well. Now, three new S1200 storage-control processors (code-named Briarwood) are targeting low-end RAID systems for network-attached storage (NAS) and storage-area networks (SANs). The Atom S1269, S1279, and S1289 closely resemble Centerton server processors. Each chip has two 64-bit Atom CPUs with dual threads, a single-channel DDR3-1333 DRAM interface with ECC, and Intel's VT-x virtualization extensions. The S1269 and S1279 processors run at 1.6GHz, and the S1289 reaches 2.0GHz. All are manufactured in 32nm planar technology, not Intel's newer 22nm FinFET process. They began production in 1Q13. [May 20, 2013]
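For a sense of what a single DDR3-1333 channel provides, peak bandwidth is the transfer rate times the width of the data bus. The Python sketch below assumes the standard 64-bit data path; the extra ECC bits carry check codes rather than data, so they add no usable bandwidth. The result, about 10.7GB/s, is why DDR3-1333 modules are sold as PC3-10600.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s, data_bits=64, channels=1):
    """Theoretical peak DRAM bandwidth in decimal GB/s."""
    bytes_per_transfer = data_bits // 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9

bw = peak_bandwidth_gb_s(1333)  # single-channel DDR3-1333, as in the Atom S1200 chips
print(f"{bw:.1f} GB/s")
```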

Foundries Stretch for FinFETs

GlobalFoundries, TSMC Struggle to Catch Intel, Obey Moore's Law

If Moore's Law were a real law, the semiconductor industry would be clamoring for its repeal. Although the informal "law" has brought the industry vast fame and riches, complying with its rigid demands and relentless schedule is becoming enormously expensive. It is also forcing researchers to experiment with increasingly extreme technologies, in turn forcing chip designers to face a more confusing array of choices. The growing costs and complexity of chip manufacturing were evident at two recent foundry events in Silicon Valley. In February, GlobalFoundries, IBM, and Samsung outlined their future plans at the Common Platform Technology Forum. In April, TSMC did likewise at its 19th annual North American Technology Symposium. The roadmaps unveiled at these events are vital information, because almost all chip companies have committed to a fabless or "fab-lite" business model that outsources manufacturing. [April 29, 2013]

Figure 1: Process-technology roadmaps.
Figure 2: Relative costs of new process technologies.
Figure 3: Rising manufacturing complexity.
Figure 4: Comparison of four GlobalFoundries process technologies.

Oracle Says SPARC Is Tops

Sparc T5 Server Processor Sets Enterprise Benchmarks Ablaze

Oracle has unveiled a new family of midrange and high-end SPARC servers, claiming their Sparc T5 chips are the world's fastest microprocessors. To support these claims, the company released 17 world-record benchmark results. The new processor surpasses the competition on SPEC CPU2006, including both integer and floating-point performance. It also achieves top scores on several database, middleware, and system-level benchmarks appropriate for Oracle's enterprise-server business. [April 15, 2013]

Figure 1: Sparc T5 die photo with overlay.

Small Cells Surge at MWC

New Processors, Software, Alliances Boost Micro Base Stations

Small-cell base-station designers found several new products to celebrate at last month's Mobile World Congress in Barcelona. Foremost were new base-station processors from Broadcom and Mindspeed. Broadcom's new base-station processors include the dual-mode (3G/4G) BCM617xx series and the single-mode (3G) BCM61630. All three chips in the BCM617xx series support LTE and LTE-Advanced (LTE-A), plus multiple 3G standards (WCDMA and TD-SCDMA). Mindspeed announced two second-generation dual-mode designs, the Transcede T3400 and T4400. Both support LTE and LTE-A (FDD or TDD) plus WCDMA or TD-SCDMA for 3G networks. [March 25, 2013]

Table 1: New base-station processors from Broadcom and Mindspeed.

Tilera Unwraps 72-Core Whopper

[Brief Item]

If more is better, Tilera is tops. Its new Tile-Gx72 network/media processor has 72 CPU cores — surpassing the already impressive Tile-Gx36 and Tile-Gx64. This monolithic design is a unique achievement of manycore integration in 40nm CMOS technology. Tilera is pitching the Tile-Gx72 for packet processing, network security, audio/video coding, and other highly parallel applications. Each CPU has a maximum clock frequency of 1.2GHz, local primary and secondary caches (23MB total on chip), and a connection to an internal mesh network. The mesh arranges the CPUs in an eight-by-nine tiled matrix that provides more than 100 terabits per second of aggregate bandwidth. [February 25, 2013]
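A tiled mesh keeps wiring local, but traffic between distant tiles must cross several routers. The Python sketch below counts hops on an eight-by-nine grid, assuming dimension-ordered routing (travel along X first, then Y), a common scheme for such meshes. The tile numbering and grid orientation are our assumptions, not Tilera documentation. The worst case is corner to corner: 7 + 8 = 15 hops.

```python
COLS, ROWS = 8, 9  # assumed orientation: 8 tiles wide, 9 tiles tall

def tile_coords(tile_id):
    """Map a tile number 0..71 to (x, y) on the mesh, numbering row by row."""
    return tile_id % COLS, tile_id // COLS

def xy_hops(src, dst):
    """Router hops under X-then-Y routing, which equals the Manhattan distance."""
    (sx, sy), (dx, dy) = tile_coords(src), tile_coords(dst)
    return abs(sx - dx) + abs(sy - dy)

num_tiles = COLS * ROWS
pairs = [(a, b) for a in range(num_tiles) for b in range(num_tiles)]
worst = max(xy_hops(a, b) for a, b in pairs)
average = sum(xy_hops(a, b) for a, b in pairs) / len(pairs)

print(num_tiles, worst, round(average, 2))
```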

Freescale Challenges Nitrox

New C29x Crypto Coprocessors Underprice Cavium

Nibbling at the fast-growing data-center market, Freescale is entering one of that market's smallest corners: cryptography coprocessing. The company has announced three new chips that will challenge the dominant market leader, Cavium's Nitrox family. Freescale's new C29x chips join another recent entrant: Intel's DH8900-family "Cave Creek" chips, which were introduced last quarter. Why is Freescale entering this market now? For one thing, the company expects demand for public-key cryptography performance to rise as the industry switches from the widely used 1,024-bit keys to the safer 2,048-bit keys recommended by the U.S. government. Also, the dearth of competition has convinced Freescale that there's room for a lower-priced product. Consequently, the new coprocessors make their debut with three models starting at $99 — hundreds of dollars less than Cavium's latest Nitrox III chips. [February 18, 2013]

Figure 1: Example system design based on Freescale's C29x.
Table 1: High-performance crypto coprocessors from Freescale, Cavium, and Intel.
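The extra load that longer keys impose can be estimated with a textbook approximation. An RSA private-key operation is a modular exponentiation with a k-bit exponent, which takes on the order of k modular multiplications, and each multiplication of k-bit numbers costs roughly k² word operations with schoolbook arithmetic. Real implementations (Chinese-remainder techniques, faster multiplication algorithms) change the constants, so treat the resulting factor of about 8 as a rough estimate rather than a measured result.

```latex
\[
T(k) \;\propto\; \underbrace{k}_{\text{modular multiplications}} \times \underbrace{k^{2}}_{\text{cost per multiplication}} \;=\; k^{3},
\qquad
\frac{T(2048)}{T(1024)} \;\approx\; \left(\frac{2048}{1024}\right)^{3} = 8
\]
```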

Best Processor Technology of 2012

Cyclos Resonant Mesh Is More Efficient Than Clock Trees

The best new technology doesn't grow on trees, but it can be found in a mesh. For its achievements in replacing conventional clock-signal trees with a resonant clock mesh and easing the design of high-performance chips, Cyclos nets The Linley Group's Analysts' Choice Award for the best new microprocessor-related technology of 2012. High-speed processors impose many design challenges, not the least being the rising power consumption of complex clock trees. Essential for uniform timing, these circuits have numerous branches that carry clock signals to every nook and cranny of a chip. But as processors get faster and more complex, clock trees struggle to keep up and can account for one-third of the chip's power consumption. The leading alternative, especially for processors exceeding 2.0GHz, is a clock mesh, which consumes about as much power as a clock tree but enables higher performance. Cyclos reduces power consumption in clock meshes by connecting them with integrated inductors to form resonant LC oscillators. [January 21, 2013]
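The physics behind the resonant approach can be summarized with two textbook relations; these are general LC-circuit results, not Cyclos's published design equations. A conventional clock network charges and discharges its total capacitance C_clk every cycle, dissipating roughly C_clk·V²·f. Adding inductance L turns the network into an LC tank that resonates at f₀, so energy swings back and forth between the inductors and the clock capacitance instead of being dumped each cycle, and the supply mainly replaces resistive losses. Fixing the target clock frequency and knowing C_clk then determines the inductance required.

```latex
\[
P_{\text{tree}} \approx C_{\text{clk}}\,V_{DD}^{2}\,f,
\qquad
f_{0} = \frac{1}{2\pi\sqrt{L\,C_{\text{clk}}}}
\;\Longrightarrow\;
L = \frac{1}{(2\pi f_{0})^{2}\,C_{\text{clk}}}
\]
```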

ARM's 64-Bit Makeover

New Architecture Strengthens ARM for Struggles Ahead

Politicians dream of rewriting constitutions. Lawyers dream of rewriting landmark court rulings. Engineers dream of redesigning microprocessor architectures. But such opportunities are so rare that few practitioners ever get a chance to make such fundamental changes. Starting in 2007, some lucky engineers at ARM got that chance. Their slate wasn't completely blank — the new 64-bit ARMv8 architecture had to maintain compatibility with the existing 32-bit architecture. Nevertheless, they got a once-in-a-lifetime opportunity to overhaul an architecture that remains quite serviceable but has some crufty features that impair performance. Since we first covered ARMv8 last year, the company has released much more documentation, allowing a more thorough analysis. Although ARM's tardiness in moving to 64 bits has undoubtedly cost the company some opportunities, ARMv8 is an extensive revision, and the delay has allowed ARM's architects to learn from other companies' experience and mistakes. [December 24, 2012]

Figure 1: ARMv8 register mapping for VFP, Neon SIMD, and crypto instructions.
Figure 2: ARMv8's privileged-execution model in AArch64.
Figure 3: ARMv8's 64-bit memory map.
Table 1: ARMv8's optional cryptography instructions.

32 Bits for 39 Cents

[Brief Item]

NXP has reached a new low with its first family of ARM Cortex-M0+ microcontrollers. These tiny chips squash the list price of 32-bit MCUs to as little as 39 cents each in 10,000-unit volumes. And one device is packaged in a classic eight-pin DIP — half as many pins as Intel's four-bit 4004, the world's first commercial microprocessor, introduced in 1971. It seems impossible that a 32-bit MCU could cost so little and function with only eight pins. But internal flash memory and SRAM make external-memory I/O unnecessary, and multiplexed interfaces allow the on-chip peripherals to share the same I/O pins. Using a configurable switch matrix, developers can assign any I/O pin to any timer, UART, SPI, or I2C interface. [December 3, 2012]

The New Look of DSPs

TI's KeyStone Chips Integrate DSP Cores With ARM CPUs

Not often is a new processor advertised for applications as diverse as cloud servers, routers, switches, industrial controllers, sensor networks, video recorders, videoconferencing systems, voice gateways, supercomputers, gaming, avionics, and radar. But then, no other processors can match the credentials of Texas Instruments' new KeyStone chips: up to four ARM Cortex-A15 CPUs, up to eight of TI's C66x DSP cores, up to 18MB of internal memory, on-chip Ethernet switching, Layer 2-4 packet acceleration, cryptography acceleration, and enough I/O to keep a few hundred pins busy. Just about the only thing TI left out is cellular-baseband acceleration, distinguishing these chips from the company's KeyStone II base-station processors. [November 19, 2012]

Figure 1: Texas Instruments' KeyStone 66AK2H12 block diagram.
Table 1: Key parameters of TI's KeyStone 66AK2Hxx, 66AK2Exx, and AM5K2Exx embedded processors.

Broadcom Samples 28nm XLP II

First Post-NetLogic Processors Expand Family

Lead customers are receiving samples of Broadcom's first chips manufactured in 28nm high-k metal-gate technology — the next-generation XLP II family of multicore networking processors acquired with NetLogic. The first XLP II product to sample is the previously announced dual-core XLP208, an eight-thread processor with several packet acceleration engines and updated I/O interfaces. Broadcom is also sampling six newly announced processors: the XLP101, XLP104, XLP108, XLP201, XLP202, and XLP204. (Note that some of these chips have the same names as previous XLP chips fabricated in 40nm technology.) All the new XLP II processors are single- or dual-core chips with one to eight threads and a target clock frequency of 2.0GHz. [October 22, 2012]

Figure 1: Broadcom XLP208 block diagram.
Table 1: Broadcom's first XLP II processors.
Table 2: Comparison of Broadcom's XLP208, Cavium's Octeon II CN6230, and Freescale's QorIQ P3041 and P5020 processors.

Atmel's Sleepwalking MCU

[Brief Item]

Atmel today announced a new 32-bit microcontroller that it claims is the world's most power-efficient Cortex-M4 MCU. At its top speed of 48MHz, the SAM4L's typical power consumption is as low as 17.3mW. But at 12MHz, it uses only 2.16mW while fully active and remains surprisingly functional even while sleeping. The key to prolonging battery life is to sleep as much as possible, then quickly wake up and do a brief burst of work before slumbering again. In deep sleep, only the real-time clock (RTC) remains awake. Because the peripherals have some autonomy, the chip can supervise simple tasks without disturbing the CPU. [September 24, 2012]
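
Why sleeping "as much as possible" matters can be seen from a simple time-weighted average of active and sleep power. The Python sketch uses the article's 2.16mW active figure at 12MHz; the 5µW deep-sleep figure and the function name are hypothetical, and wake-up transition energy is ignored.

```python
ACTIVE_MW = 2.16   # SAM4L fully active at 12MHz (figure from the article)
SLEEP_MW = 0.005   # hypothetical deep-sleep power, assumed for illustration only

def average_power_mw(active_mw: float, sleep_mw: float, duty_cycle: float) -> float:
    """Time-weighted average of active and sleep power.

    duty_cycle is the fraction of time spent awake (0.0 to 1.0).
    Energy spent on wake-up transitions is not modeled.
    """
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return duty_cycle * active_mw + (1.0 - duty_cycle) * sleep_mw

for duty in (1.0, 0.1, 0.01):
    print(f"awake {duty:.0%}: {average_power_mw(ACTIVE_MW, SLEEP_MW, duty):.4f} mW")
```

Under these assumptions, cutting the awake time to 1% lowers average power by nearly two orders of magnitude, which is why autonomous peripherals that avoid waking the CPU pay off.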

Fujitsu and Oracle Ignite SPARCs

New Sparc64 X and Sparc T5 Processors Keep the Flame Alive

SPARCs flew at the Flint Center in Cupertino, the site of this year's Hot Chips Symposium. SPARC makers Fujitsu and Oracle both lit up the stage with new server processors intended to defend the seminal RISC architecture against more encroachments by x86 and POWER chips. Whereas Fujitsu's tenth-generation Sparc64 X design emphasizes per-thread throughput, Oracle's Sparc T5 excels at multichip scaling. This article focuses on Fujitsu's processor, for which more information is available. Oracle was more reticent but did reveal some new features. These companies are the only remaining developers of high-performance SPARC processors and are their own biggest customers. [September 17, 2012]

Figure 1: Fujitsu Sparc64 X die photo.
Figure 2: Sparc64 X instruction pipeline.
Figure 3: Sparc64 X multichip interconnects.
Figure 4: Oracle's Sparc T5 multichip configuration.

Qualcomm Buys DesignArt

Deal Includes 4G Technology and Wireless Backhaul for Base Stations

By acquiring DesignArt Networks, Qualcomm is vaulting into the burgeoning market for small-cell wireless base stations that support the Long Term Evolution (LTE) and LTE-Advanced (LTE-A) standards. DesignArt's 4G base-station processors surpass Qualcomm's existing 3G femtocell products by offering much higher data rates and user capacities. In addition, Qualcomm gains DesignArt's important wireless-backhaul technology and a respected engineering team. DesignArt is an Israeli fabless semiconductor company founded in 2006. [September 10, 2012]

Table 1: Key parameters for DesignArt's DAN3400 wireless base-station processor.

ARM Joins Hands With QorIQ

Freescale's Layerscape Is Core Agnostic, Data-Plane Programmable

ARM's Law: the number of ARM-based processors doubles every 24 months. Well, not quite, but it seems that way. The latest chips to come within ARM's reach are Freescale's market-leading QorIQ communications processors, which are also the best-selling flag bearers of the Power Architecture. Some future QorIQ chips will use ARM CPUs instead of Freescale's PowerPC cores, the company announced at its recent technology forum. The first two ARM-based QorIQ processors are the LS1 (dual Cortex-A7) and LS2 (dual Cortex-A15). They will also be based on Freescale's new Layerscape chip-level architecture — the cornerstone of the company's post-2013 QorIQ strategy. [July 9, 2012]

Figure 1: Freescale's Layerscape chip-level architecture.
Figure 2: Various Layerscape implementations.
Figure 3: Freescale QorIQ LS2 block diagram.

Freescale Targets Smaller Cells

[Brief Item]

Integrated processors for wireless base stations continue to proliferate. The newest entry is Freescale's QorIQ Qonverge B4420, which is designed for 3G/4G microcells and metrocells that serve as many as 256 active users. The B4420 supports LTE and LTE-Advanced in addition to 3G protocols. The new Qonverge B4420 is an economy model of the B4860, Freescale's most powerful base-station processor (see MPR 3/19/12, "Freescale's Qonverge Goes Macro"). These chips are pin compatible and the first Freescale products to sample in 28nm HPM technology. Samples are due in 3Q12, with production scheduled for 2H13. [June 25, 2012]

Editorial: China Should Buy MIPS

Chinese-Backed ISA Would Counter ARM and x86

Two recent news items caught my attention. First, China wants to standardize on a single instruction-set architecture for all future government-sponsored projects. Second, Bloomberg reported that MIPS Technologies is seriously looking for a buyer. These events lead me to suggest that the Chinese government (or one of its semiprivate entities) should acquire MIPS. Would a Chinese acquisition of MIPS Technologies and a Chinese-standard CPU architecture be good for American interests? Perhaps not. But when judged purely as a business and technology proposition, the deal makes more sense than many other scenarios. [May 28, 2012]

Freescale Pumps Up P5 Series

New QorIQ P5040 and P5021 Processors Raise Performance

Even while Freescale rolls out new QorIQ T-series processors, the previous-generation product line continues to grow. The latest additions are the quad-core P5040, which has twice as many CPU cores as previous P5-series chips, and the dual-core P5021, which improves on the existing P5020. Both new processors also add more Ethernet controllers and raise the maximum clock speed of their Power e5500 CPU cores to 2.4GHz. Having twice as many CPUs as the existing P5020, the P5040 bolsters Freescale's offerings for control processing and will improve the company's competitive stance against Intel's new Ivy Bridge embedded processors. [May 28, 2012]

Figure 1: Freescale QorIQ P5040 block diagram.

Parallelism for the Masses

Intel's "River Trail" Adds Easy Parallel Processing to JavaScript

Unlocking the parallelism of multicore processors has vexed the industry since the first such chips burst onto the scene in 2005. With a project code-named River Trail, Intel Labs is taking a refreshingly different approach. Not only does River Trail add data parallelism to JavaScript — the web's most popular scripting language — but it also makes parallel programming easy enough for almost anyone. Intel's proposed JavaScript extension is so abstract that programmers need know nothing about the system's hardware. At run time, the web browser's modified JavaScript engine automatically discovers and adapts to any parallel-processing resources available. It can use the CPU's vector-arithmetic instructions, multiple CPU cores per chip, multiple threads per core, and multiple processors per system. The latest experimental version can even assign tasks to some integrated GPUs. [May 14, 2012]

Figure 1: River Trail demo screen.
Figure 2: Parallelism with River Trail.

Redpine's Dual-Band 802.11ac Chip

[Brief Item]

Even a double-talking politician would be challenged to converse over two different radios at once, but Redpine Signals has announced a chip that will do so with ease. The company's Quali-Fi SoC is a next-generation Wi-Fi transceiver that can operate in the 2.4GHz and 5.0GHz radio bands simultaneously. One purpose for dual-band operation is to support the future 802.11ac Wi-Fi standard, which is moving to the roomier 5.0GHz band, while simultaneously handling traffic using previous Wi-Fi standards in the 2.4GHz band. Another dual-band scenario is a single client transmitting 2.4GHz signals to control a device that streams video in the 5.0GHz band using 802.11ac or today's 802.11n. [May 14, 2012]

Freescale Unveils First Vybrid Chips

[Brief Item]

Freescale has announced the first processors in its new Vybrid family, fulfilling a promise made last year when the company disclosed the basic chip architecture. The first five processors have ARM Cortex-A5 CPU cores with Neon extensions, and two of them add a Cortex-M4 controller core. All have on-chip SRAM, LCD controllers, analog/digital converters, Ethernet, and numerous other I/O interfaces. Until now, Freescale referred to these processors generically as asymmetric embedded MPUs (AeMPUs). The multicore chips with dual ARM cores can independently run high-level application code and low-level control code. One multicore Vybrid chip can replace at least two separate chips while keeping software development on a common 32-bit ARM platform. [April 16, 2012]

TI Boosts Base-Station Processors

New KeyStone II Integrates Cortex-A15 CPUs With C66x DSPs

Texas Instruments' first KeyStone II processor builds on the foundation of the company's original KeyStone family, which was designed for macrocell and small-cell base stations. Whereas existing KeyStone processors are built in 40nm technology and have ARM Cortex-A8 CPU cores (or no CPUs at all), KeyStone II processors will be built in 28nm technology and are the first telecommunications chips from any vendor to use the more powerful Cortex-A15. The initial KeyStone II product is the TMS320TCI6636, which will have four Cortex-A15 cores running at 1.2GHz to 1.45GHz plus eight of TI's C66x DSP cores running at 1.2GHz. Like KeyStone I processors, the TCI6636 is designed for macrocell and small-cell base stations. But the new chip aims significantly higher, and system designers can link multiple chips together to build even larger base stations. [April 2, 2012]

Figure 1: TI KeyStone II TMS320TCI6636 block diagram.
Table 1: TI's KeyStone integrated base-station processors.
Table 2: Chips eliminated from a conventional base station design.
Table 3: Key parameters of integrated base-station processors: TI's KeyStone II TCI6636, Cavium's Octeon Fusion CNF7280, and Freescale's QorIQ Qonverge B4860.

Revolutionary DIY CPU EDA

New Design-Automation Tool Makes Anyone a CPU Architect

EDA vendor Synapsys has announced a revolutionary design-automation tool that converts PowerPoint block diagrams of microprocessors into synthesizable Verilog code. By skipping the tedious steps of translating marketing concepts into gate-level logic, the new PowerSynth tool makes design engineers obsolete and allows anyone to be a CPU architect. PowerSynth uses patented artificial-intelligence algorithms to generate production-ready logic from common PowerPoint drawing objects. [April 1, 2012]

Figure 1: Typical SoC blockhead diagram.

Freescale's Qonverge Goes Macro

New B4860 Base-Station Chip Has Four CPUs, Six DSPs

It's been only a year since Freescale unveiled its QorIQ Qonverge family of CPU-DSP base-station processors, but already, the company has announced one of the most powerful such processors to date. The new B4860 combines four multithreaded CPU cores with six of the industry's fastest DSPs to offer a single-chip solution for WCDMA, LTE, and LTE-Advanced (LTE-A) macrocells. It's also compatible with 2G and 3G standards like TD-SCDMA and GSM. The B4860 will bring CPU-DSP integration to large cellular base stations and is among the company's first chips manufactured in 28nm technology. [March 19, 2012]

Figure 1: Two contrasting designs for LTE macrocell base stations.
Figure 2: Freescale QorIQ Qonverge B4860 block diagram.
Table 1: Key features of high-performance DSP cores: Freescale's StarCore SC3900 and SC3850 and TI's C66x.
Table 2: Freescale's QorIQ Qonverge family: the B4860, PSC9132, PSC9131, and PSC9130.

Cortex-M0+ Simplifies 32-Bit MCUs

[Brief Item]

No caches. No FPU. A two-stage pipeline. An instruction set composed almost entirely of 16-bit instructions. Are we in the disco days of the 1970s? Nope, it's 2012, and ARM is introducing a new 32-bit CPU core that plumbs the depths of simplicity and low power consumption. Even by RISC standards, a 32-bit design can't get much simpler than this. It is code-named Flycatcher and officially called the Cortex-M0+. Although it uses less power than a Cortex-M0, it wears a plus sign instead of a minus because it adds features. ARM designed the Cortex-M0+ for 32-bit microcontrollers used in sensor networks, real-time systems, and other deeply embedded applications — particularly those that must run for extended periods on batteries. According to ARM, the Cortex-M0+ is the world's most energy-efficient 32-bit CPU core. [March 19, 2012]

Freescale's QorIQ Adds Threads

Dual Threading Propels New T-Series AMP Processors

Freescale's new QorIQ T-series AMP processors are a significant advance for the company whose communications processors outsell all others. At the recent Linley Tech Data Center Conference in San Jose, Freescale disclosed new details about the T4240 processor and revealed its little brother, the T4160. These chips are the first products in the QorIQ T-series AMP family. The T4240 is a highly integrated design with 12 CPU cores arranged in three quad-core clusters. By contrast, the biggest existing QorIQ design is the eight-core P4080. Moreover, the T4240 debuts Freescale's new 64-bit Power e6500 dual-threaded CPU core, whereas the P4080 uses the 32-bit Power e500mc single-threaded core. The new CPU alone, apart from the chip's other features, ensures that the T4240 will be a much more powerful processor. [February 27, 2012]

Figure 1: Freescale QorIQ AMP T4240 block diagram.
Figure 2: QorIQ T4240 multithreading performance.
Figure 3: Power e6500 CPU cluster.
Figure 4: Freescale's Data Path Acceleration Architecture (DPAA).
Figure 5: QorIQ power management.
Table 1: Key parameters for the Freescale QorIQ T4240, T4160, and P4080, Cavium Octeon II CN6880, and NetLogic XLP832.

Wireless Wants to Wallop Wires

Emerging 802.11ac Wi-Fi Standard Challenges Gigabit Ethernet

Wi-Fi's next generation is 802.11ac. The draft specification is close enough to final that companies like Broadcom, Quantenna, and Redpine Signals are already introducing the first 802.11ac chipsets and other building-block products. The big news about 802.11ac is that it's the first Wi-Fi candidate with enough sustainable bandwidth to seriously challenge Gigabit Ethernet, which is why it's often called "Gigabit Wi-Fi." It has the potential to obsolete wired LANs in homes and small offices, though probably not at larger sites. We've heard such claims before, but 802.11ac has enough improvements to be credible. [February 20, 2012]

Figure 1: Comparison of antenna technologies: SISO, SIMO, MISO, and MIMO.
Figure 2: Quantenna's 802.11ac router.
Table 1: Evolution of 802.11 Wi-Fi standards, 1997-2012.
Table 2: Data rates for 802.11ac antenna configurations, 1x1 SISO to 8x8 MIMO.
Table 3: Broadcom's BCM43xx transceiver chips for 802.11ac.
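
Peak data rates like those in Table 2 come from standard OFDM arithmetic: data subcarriers × bits per subcarrier × coding rate × spatial streams, divided by the symbol time (4.0µs, or 3.6µs with the short guard interval). The Python sketch encodes that formula with the 802.11ac subcarrier counts; the helper and its parameter names are hypothetical, and the results are raw PHY rates, well above real-world throughput.

```python
from fractions import Fraction

# Data subcarriers per 802.11ac (VHT) channel width, keyed by width in MHz.
DATA_SUBCARRIERS = {20: 52, 40: 108, 80: 234, 160: 468}

# Coded bits carried per subcarrier by each modulation.
BITS_PER_SUBCARRIER = {"BPSK": 1, "QPSK": 2, "16-QAM": 4, "64-QAM": 6, "256-QAM": 8}

def phy_rate_mbps(width_mhz: int, streams: int, modulation: str,
                  coding_rate: str, short_gi: bool = True) -> float:
    """Peak 802.11ac PHY rate in Mbps; coding_rate is a string such as "5/6".

    Does not reject the few MCS/stream combinations the standard disallows.
    """
    # OFDM symbol duration in microseconds, including the guard interval.
    symbol_us = Fraction(36, 10) if short_gi else Fraction(4)
    data_bits_per_symbol = (DATA_SUBCARRIERS[width_mhz]
                            * BITS_PER_SUBCARRIER[modulation]
                            * Fraction(coding_rate)
                            * streams)
    return float(data_bits_per_symbol / symbol_us)  # bits per microsecond == Mbps

print(f"{phy_rate_mbps(80, 1, '256-QAM', '5/6'):.1f}")   # 1x1, 80MHz: 433.3
print(f"{phy_rate_mbps(160, 8, '256-QAM', '5/6'):.1f}")  # 8x8, 160MHz: 6933.3
```

Exact fractions keep the arithmetic free of rounding until the final conversion, so the familiar 433.3Mbps single-stream and 6.93Gbps eight-stream peaks fall out directly.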

Marvell Wins Google TV for ARM

[Brief Item]

Marvell's new Armada 1500 media processor supersedes an Intel x86 chip in the second-generation Google TV reference platform — a switch that would be bigger news if Google TV were popular and if Intel hadn't already retreated from this market. Nevertheless, the design win helps establish Armada as an up-and-coming product line for smart TVs, Blu-ray players, and advanced set-top boxes. Although Intel's Atom CE4100 ("Sodaville") processor delivers sufficient performance and has all the required I/O interfaces, it's a first-generation 45nm Atom SoC that burns more power and almost certainly costs more. Last October, Intel closed its smart-TV Digital Home Group to refocus on set-top boxes and Internet gateways, leaving a small vacuum for an ARM ally to fill. [January 30, 2012]

Best Processor Technology of 2011

Micron's Hybrid Memory Cube Stacks Multiple DRAMs in One Package

With a bumper crop of innovations to choose from, deciding which new microprocessor-related technology best deserves our Analysts' Choice Award wasn't easy. Our pick for 2011: the Hybrid Memory Cube, which stacks multiple DRAM chips inside a single package and connects the die using through-silicon vias (TSVs). Although engineers have been working for years on the concept of three-dimensional ICs, new developments in 2011 virtually guarantee that stacked-memory devices are finally on their way to commercial production in the near future. Other technologies we considered for this award are noteworthy, too. We nominated four candidates from Intel: tri-gate (FinFET) transistors, near-threshold voltage (NTV) transistors, the Thunderbolt I/O interface, and AVX2 extensions. In addition, we considered SuVolta's PowerShrink technology and the merged CPU-DRAM architecture of Venray's Tomi Borealis processor. [January 23, 2012]

Figure 1: Hybrid Memory Cube. The first commercial devices will stack four DRAM die connected using through-silicon vias.

Freescale's Asymmetric SoCs

One Chip Has ARMs of Different Sizes

Freescale has its own twist on ARM's recently announced "Big.Little" strategy. Early next year, Freescale will introduce a new family of 32-bit processors that have ARM Cortex-A5 cores and a Cortex-M4 core. This asymmetric or heterogeneous multicore design combines the high performance of application processors with the real-time response of microcontrollers. The new products have no brand name yet, so Freescale refers to them as asymmetric embedded MPUs (AeMPUs). This family will fit between Freescale's existing Kinetis 32-bit MCUs and i.MX 32-bit SoCs, all of which also use ARM CPUs. The new chips can be considered application-class SoCs with real-time credentials or 32-bit MCUs with application aspirations. By contrast, ARM's Big.Little strategy integrates a Cortex-A15 ("big") core with a Cortex-A7 ("little") core and is designed to save power in mobile application processors. [December 5, 2011]

Figure 1: Freescale AeMPU simplified block diagram.

TI's Affordable Cortex-A8 SoCs

[Brief Item]

Embedded SoCs are growing so powerful that they resemble last year's smartphone application processors. In fact, they probably are derived from last year's smartphone processors. Texas Instruments' new Sitara AM335x chips integrate an ARM Cortex-A8 core with Neon SIMD extensions, 3D graphics, a display controller, a touchscreen controller, Gigabit Ethernet, USB, cryptography acceleration, and numerous on-chip peripherals — much like TI's own OMAP3. TI has announced six basic AM335x chips at speed grades of 275-720MHz. Volume pricing ranges from $4.99 to $14.99, and target applications include industrial automation, consumer medical devices, printers, networked vending machines, portable navigators, and consumer electronics. [November 14, 2011]

Altera's Answer to Zynq

New SoC FPGAs Have Dual ARM Cortex-A9 Cores

If Altera buys a toothpaste factory, Xilinx will invest in mouthwash. And if Xilinx prepares to send a man to Mars, Altera will start building rockets. The two leading FPGA vendors are that competitive. So it's no surprise that Altera has announced its answer to Xilinx's Zynq chips, which are customizable SoCs that integrate ARM Cortex-A9 cores and hard peripherals with programmable logic. Altera's new Cyclone V and Arria V "SoC FPGAs" also integrate ARM Cortex-A9 cores and hard peripherals with programmable logic. But Altera isn't a copycat, because both companies are hearing the same pleas from customers and are building on experience with similar products introduced more than a decade ago. [October 31, 2011]

Figure 1: Altera SoC FPGA block diagram.
Figure 2: Replacing discrete chips with an Altera SoC FPGA.
Table 1: Altera's Cyclone V and Arria V SoC FPGAs.
Table 2: Comparing Altera's SoC FPGAs with Xilinx's Zynq processors.

NetLogic's XLP II Grows More CPUs

One Chip With 20 CPUs and 80 Threads Delivers 100Gbps Networking

NetLogic is forging ahead with its next-generation XLP II family of networking processors, even while Broadcom's pending acquisition of the company moves toward resolution. To keep the heat on rivals like Cavium, Freescale, and Intel, NetLogic must keep its transition to a new product line and 28nm technology on schedule. At the Linley Tech Processor Conference in San Jose last week, NetLogic disclosed new information about the XLP II family. The first member is the eight-core XLP332E, which is scheduled to sample in 1Q12. The XLP332E will introduce the EG4400 CPU core, an enhanced version of the MIPS64-compatible EC4400 core found in today's XLP-family processors. The XLP332E will also introduce PCI Express 3.0 and USB 3.0 to NetLogic's product line. [October 10, 2011]

Figure 1: NetLogic XLP332E block diagram.
Figure 2: NetLogic's Interchip Coherency Interface supports four- and eight-chip memory-coherent clusters.
Table 1: Comparison of selected XLP and XLP II processors: the XLP332E, XLP316L, XLP964, XLP980, and XLP832.

Cortex-M4F MCUs Hit 168MHz

[Brief Item]

At the recent Embedded Systems Conference in Boston, STMicroelectronics unveiled the world's fastest MCUs based on ARM's Cortex-M4F. Although there are faster 32-bit MCUs, the 168MHz STM32 F4 chips outrun all others that use ARM's popular digital-signal controller CPU. The STM32 F4 series includes four basic designs with various integrated peripherals. All the chips have a Cortex-M4F, which pairs ARM's digital-signal extensions with the core's optional single-precision (32-bit) FPU. ARM introduced this core last year as its first digital-signal controller (DSC), but it's actually a Cortex-family upgrade from the popular ARM9 and ARM11. [October 10, 2011]

Intel's NTV Technology Saves Energy

[Brief Item]

Intel Labs is experimenting with microprocessors that save energy by running transistors at very low voltages near their threshold between on and off states. Prototype chips are promising enough that commercial products may be only a few years away. If Intel can overcome the reliability and manufacturing challenges, microprocessors using this technology will come close to achieving their maximum theoretical power-performance efficiency. Intel disclosed the near-threshold voltage (NTV) project at the recent Intel Developer Forum in San Francisco. The concept isn't new: academic research into near-threshold and subthreshold semiconductors stretches back 30 years. In 1972, researchers theorized that the lowest possible operating voltage for a CMOS circuit is 36mV, which some experiments have approached. At levels near the threshold voltage between on and off states, the circuit consumes only about 10% of the energy it uses when operating at its nominal voltage. [October 3, 2011]
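
The roughly 10% figure is consistent with how switching energy scales with supply voltage: each transition dissipates energy proportional to CV², so the capacitance cancels when comparing two voltages on the same circuit. The Python sketch assumes a hypothetical 1.2V nominal supply and 0.4V near-threshold supply. It models dynamic energy only; leakage, which accumulates as the slower circuit takes longer per operation, is ignored, so the sketch overstates savings as the voltage approaches threshold.

```python
def dynamic_energy_ratio(v_low: float, v_nominal: float) -> float:
    """Ratio of switching energy at v_low to that at v_nominal.

    Dynamic energy per transition is proportional to C * V**2, so for the
    same circuit (same C) the capacitance cancels out of the ratio.
    Leakage energy is deliberately not modeled.
    """
    if v_low <= 0 or v_nominal <= 0:
        raise ValueError("voltages must be positive")
    return (v_low / v_nominal) ** 2

# Hypothetical supply voltages, not figures from Intel's NTV prototypes.
ratio = dynamic_energy_ratio(0.4, 1.2)
print(f"near-threshold switching energy: {ratio:.0%} of nominal")
```

Dropping to one-third of the nominal voltage cuts switching energy to about one-ninth, in line with the article's estimate.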

Godson-T Weaves Threads

Chinese Research Processor Explores Thread-Level Parallelism

Frustrated by diminishing returns from processors with more than eight CPU cores, a Chinese research team has designed a 64-core processor to explore various paths toward higher performance. Although the experimental design exploits both instruction-level and data-level parallelism, the key to good performance scaling appears to be fine-grained thread-level parallelism. This design requires programmers to explicitly create threads, but a dynamic thread manager supervises their execution, allowing a program to spawn more threads than the processor can execute at once. To reduce the overhead of managing so many threads, the processor needs a hardware-accelerated synchronizer that eliminates deadlocks. Those are some early results of the Godson-T research project, a government-funded endeavor at the Institute of Computing Technology (Chinese Academy of Sciences) in Beijing. [September 19, 2011]

Figure 1: Godson-T CPU-level block diagram.
Figure 2: Godson-T chip-level block diagram.
Figure 3: Godson-T simulation benchmark results.

AVX2 Refreshes x86 Architecture

"Haswell New Instructions" Include 256-Bit Integer Operations

The next major extension of the world's most complex instruction-set architecture is coming in 2013 with a new Intel processor code-named Haswell. This processor will add hundreds of new instructions, including a set called Advanced Vector Extensions 2. AVX2 follows this year's debut of AVX in "Sandy Bridge" PC processors and AMD processors with the new Bulldozer CPU. Also coming in Haswell are 96 fused multiply-add (FMA) instructions with a new three-operand format (FMA3), plus 16 new general-purpose instructions. All together, these "Haswell new instructions," as Intel calls them, herald the biggest x86-architecture expansion in years. And even before Haswell, Intel will introduce seven new instructions with a processor code-named Ivy Bridge in 2012. [August 29, 2011]

Table 1: Summary of Intel's recent and future x86 extensions.
Table 2: Ivy Bridge new instructions.
Table 3: New or improved instructions in Advanced Vector Extensions 2 (AVX2).
Table 4: Haswell's new FMA instructions.
Table 5: Haswell's new general-purpose instructions.

Intel Shows MIC Progress

[Brief Item]

Intel has demonstrated early hardware and software developed for its evolving manycore processors, which aim to expand the x86 architecture's dominance in supercomputers and high-performance computing. At the recent International Supercomputing Conference in Germany, Intel demonstrated software developed by partners using a Many Integrated Core (MIC, pronounced "mike") processor salvaged from the ill-fated Larrabee GPU project. Those partners include CERN (Switzerland), the Korea Institute of Science and Technology Information, and the Leibniz Supercomputing Centre (Germany). Colfax, Dell, Hewlett-Packard, IBM, SGI, and Supermicro showed prototype MIC servers and workstations. The development processor, code-named Aubrey Isle, has 32 CPU cores. When the first commercial MIC processor enters production, it will have at least 50 CPUs. Most customers will buy it as a math coprocessor on a PCI Express board, which is code-named Knights Corner. [July 18, 2011]

Figure: Aubrey Isle die photo with overlay.

Freescale Amplifies QorIQ Family

New AMP T-Series Networking Processors Surpass P-Series Chips

Freescale has announced a whole new series of QorIQ-family processors that will deliver about four times the performance and twice the power efficiency of today's best P-series chips. Scheduled to begin sampling early next year, the AMP (Advanced Multiprocessing) series will debut with Freescale's first multithreaded CPU core, up to a dozen CPUs per chip, higher clock frequencies, faster offload engines, resurrected AltiVec extensions, and other goodies. The first T-series AMP processor will be the T4240, which will have 12 dual-threaded CPUs running at clock speeds of up to 2.0GHz — enough to sustain packet forwarding at 48Gbps in data-plane applications. For control-plane processing, future T5-series chips with six CPUs will aim for clock speeds as high as 2.5GHz. [June 27, 2011]

Figure 1: Freescale QorIQ AMP T4240 block diagram.
Table 1: Freescale's new QorIQ AMP family.

SuVolta Shrinks Transistor Power

[Brief Item]

Emerging after five years in stealth mode, Silicon Valley startup SuVolta is promising to slash the dynamic power consumption of CMOS transistors by 50% and leakage current by 50% to 80% without sacrificing performance. The company says its PowerShrink technology requires only minor modifications to existing bulk-CMOS processes and adds little cost, beyond licensing. Although these claims naturally arouse skepticism, SuVolta has successfully produced SRAM test chips in 65nm and 28nm technology, and PowerShrink has been adopted by a major customer: Fujitsu. The Japanese chipmaker and foundry will use PowerShrink to manufacture ASICs, ASSPs, and SoCs in 65nm CMOS. [June 13, 2011]

Via's First Quad-Core x86 Processor

[Brief Item]

Via Technologies has announced its first quad-core x86 processor, and it's also the world's lowest-power x86 processor sporting four CPUs. Although intended primarily for notebook PCs and entry-level desktops, the company's new Nano QuadCore processor may also find its way into high-performance embedded systems and power-efficient servers. Following a trend set by AMD and Intel, Via's first quad-core device combines two dual-core die in a single package. Each die is an 11mm × 6mm Nano X2, which Via announced in January. The 400-pin multichip package (NanoBGA2) is 21mm square, and it maintains pin compatibility with Nano X2 and several other Via processors: the Nano E Series, Eden, Eden X2, and C7. TSMC will manufacture Nano QuadCore in 40nm CMOS, with volume production scheduled for 3Q11. [May 30, 2011]

Intel Sprouts Fins at 22nm

Tri-Gate FinFET Transistors Renew Intel's Technology Lead

Cadillac introduced tailfins to evoke high-tech style in the 1950s, but Intel's new finned transistors are far from cosmetic. Purely functional, highly efficient, yet equally brash, these fin-shaped field-effect transistors (FinFETs) are sure to be copied as widely as Cadillac's useless appendages were — and they will play a similar role in defining an era. Intel refers to its FinFETs as tri-gate transistors and touts them as the first true three-dimensional devices built on planar integrated circuits. Don't confuse these "3D" transistors with 3D transistor stacking, an entirely different technology that builds transistors in multiple layers. Instead, a FinFET rises above the flat silicon substrate, creating a 3D gate structure that has much more volume than a planar gate while squeezing into approximately the same horizontal space. [May 23, 2011]

Figure 1: Micrographs of planar transistors and Intel's FinFETs.
Figure 2: Two illustrations of an Intel FinFET.
Figure 3: Threshold-voltage (V_t) characteristics of FinFET and planar transistors.
Figure 4: Gate-delay characteristics of FinFET and planar transistors.
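A rough way to see how a fin fits more gate into the same horizontal space: the gate wraps both sidewalls and the top of each fin, so each fin's effective channel width is approximately twice the fin height plus the fin width. The sketch below compares that width with the planar width of the same footprint. The dimensions are hypothetical round numbers chosen for illustration, not Intel's 22nm geometry.

```python
# Effective channel width of a tri-gate (FinFET) transistor versus the planar
# width of the same footprint. The gate covers both sidewalls and the top of
# each fin, so per fin: W_eff = 2 * H_fin + W_fin.
# All dimensions are hypothetical, in nanometers.


def finfet_effective_width(fin_height_nm: float, fin_width_nm: float, n_fins: int = 1) -> float:
    """Approximate gate-controlled channel width of a multi-fin tri-gate device."""
    return n_fins * (2 * fin_height_nm + fin_width_nm)


def width_gain(fin_height_nm: float, fin_width_nm: float, fin_pitch_nm: float, n_fins: int) -> float:
    """Effective fin width divided by the planar width of the same footprint.

    The footprint is n_fins * fin_pitch_nm: the horizontal space a planar
    transistor would occupy in place of the fins.
    """
    footprint_nm = n_fins * fin_pitch_nm
    return finfet_effective_width(fin_height_nm, fin_width_nm, n_fins) / footprint_nm


if __name__ == "__main__":
    # Hypothetical geometry: two 30nm-tall, 10nm-wide fins on a 60nm pitch.
    print(finfet_effective_width(30, 10, n_fins=2))    # 140
    print(round(width_gain(30, 10, 60, n_fins=2), 2))  # 1.17
```

In this width-only arithmetic, the gain grows with taller fins or a tighter fin pitch, so fin height matters as much as fin width.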

Altera Debuts MIPS CPU for FPGAs

[Brief Item]

As Altera promised last year, a new MIPS-compatible CPU core optimized for synthesis in programmable logic is now available. The MP32 core targets Altera's FPGAs, including the high-end Stratix IV, midrange Arria II, and low-end Cyclone III. Although it's larger, more expensive, less configurable, and no faster than Altera's proprietary Nios II core, the MP32 supports the more popular MIPS32 architecture and runs Wind River's VxWorks real-time operating system (RTOS). System Level Solutions (SLS), an Altera partner based in Gujarat, India, will sell and support the MP32. SLS collaborated with Altera and MIPS Technologies on the core's development. [May 16, 2011]

Embedded Memory Shrinks to 40nm

[Brief Item]

At the recent Linley Tech Mobile Conference in San Jose, Kilopass announced the first nonvolatile embedded memory that is reprogrammable up to 1,024 times and is compatible with digital-IC processes in 40nm bulk CMOS. The company has also produced test chips in 28nm CMOS. Branded Itera, the new memory is licensed as intellectual property (IP) to chip designers. It's available now for 40nm processes at GlobalFoundries, TSMC, and UMC. It will be available for 55nm and 65nm processes in 3Q11 and for 28nm processes in 4Q11. Block sizes range from 32 bits to 1Mb, and write endurance ranges from 104 to 1,024 cycles. [May 9, 2011]

NetLogic Doubles Up XLP

New XLP864 Has 16 CPU Cores, 80Gbps Throughput

Not satisfied to have merely the most powerful CPU core in a network processor, NetLogic has doubled the number of CPUs in its highest-end XLP product. The new XLP864 has 16 of NetLogic's MIPS64-compatible EC4400 cores — twice as many CPUs as the company's previous top-shelf chip, the XLP832. Packet-throughput performance doubles to 80Gbps, easily outrunning other multicore embedded processors. The 64-bit XLP864 is designed primarily for data-plane processing in large routers, security appliances, storage subsystems, next-generation cellular networks, and other communications equipment. At its fastest target clock frequency of 2.0GHz, it can process 120 million packets per second. [April 25, 2011]

Table 1: Key parameters for high-end network processors from Cavium, Freescale, and NetLogic.
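The 80Gbps and 120-million-packets-per-second figures fit together if the traffic is minimum-size Ethernet. Assuming 64-byte frames plus 20 bytes of per-frame wire overhead (preamble, start-of-frame delimiter, and interframe gap), each packet occupies 672 bits on the wire, so 80Gbps works out to about 119 million packets per second. The sketch below does that arithmetic and also estimates the cycle budget per packet, assuming all 16 cores run at 2.0GHz and share the load evenly; it ignores the chip's offload engines and multithreading.

```python
# Line-rate packet arithmetic for a network processor.
# Assumption: minimum-size Ethernet frames (64 bytes) plus 20 bytes of
# preamble, start-of-frame delimiter, and interframe gap on the wire.
MIN_FRAME_BYTES = 64
WIRE_OVERHEAD_BYTES = 20


def packets_per_second(line_rate_bps: float, frame_bytes: int = MIN_FRAME_BYTES) -> float:
    """Maximum packet rate at a given line rate for fixed-size frames."""
    bits_per_packet = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return line_rate_bps / bits_per_packet


def cycles_per_packet(n_cores: int, clock_hz: float, pps: float) -> float:
    """Core cycles available to each packet, assuming an even load across cores."""
    return n_cores * clock_hz / pps


if __name__ == "__main__":
    pps = packets_per_second(80e9)   # XLP864's quoted throughput
    print(f"{pps / 1e6:.1f} Mpps")
    print(f"{cycles_per_packet(16, 2.0e9, 120e6):.0f} cycles per packet")
```

At the quoted 120 million packets per second, each packet gets roughly 267 cycles of one core.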

The 28nm Turning Point

Foundries Bring High-k Metal Gates to the Masses

Metal tools delivered humanity from the Stone Age, and now, metal is enabling another technological breakthrough. For the first time, metal-gate transistors are broadly available to chip designers, allowing them to create higher-performance microprocessors that can still occupy less silicon and consume less power. This nanoscale application of metallurgy has been touted as the biggest advance in electronics since the invention of planar integrated circuits. As usual, Intel got there first. In 2007, Intel introduced the first microprocessors built in its new 45nm high-k metal-gate (HKMG) process. The rest of the semiconductor industry has been waiting four years for the same technology. Now, at the 32/28nm node, the leading independent foundries are introducing their own HKMG processes, and their first 28nm HKMG chips are entering production this year. [April 18, 2011]

Figure 1: Chip shrinkage, 90nm to 28nm.
Figure 2: Transistor evolution, 250nm to 28nm.
Figure 3: Intel's tri-gate transistor.
Table 1: GlobalFoundries 32/28nm-process variations.
Table 2: TSMC 28nm-process variations.

Xilinx ReARMs FPGAs

New Zynq Processors Integrate Cortex-A9 CPUs, Programmable Logic

Some ideas never die, no matter what misfortunes they suffer in the marketplace. One such idea is embedding a hardened CPU core in a programmable logic device. Developers dream of a flexible off-the-shelf alternative to costly ASIC projects and relatively inflexible ASSPs, but a successful formula has thus far eluded FPGA market leaders Xilinx and Altera, as well as several short-lived startups. Now, Xilinx is trying again. On March 1, the company announced the first products in its Zynq Extensible Processing Platform, foreshadowed last year in a joint announcement with ARM. Initial Zynq processors integrate dual 800MHz ARM Cortex-A9 CPUs with 28,000 to 235,000 programmable logic cells, up to 1.86MB of block RAM, hundreds of DSP multipliers, 256KB of tightly coupled memory, on-chip peripherals, and up to 12 high-speed serial transceivers. [March 7, 2011]

Figure 1: Zynq block diagram.
Table 1: Comparison of the Zynq Z-7010, Z-7020, Z-7030, and Z-7040.

TI Accelerates Video Processors

[Brief Item]

Texas Instruments (TI) has announced six DaVinci digital-media processors with faster video accelerators and higher integration, allowing a single chip to replace eight or more chips in some cases. All the new processors unite at least one HD-video accelerator with an ARM Cortex-A8 CPU core and a TI C674x-series DSP, aiming for higher-end video applications such as surveillance systems, multiscreen videoconferencing, professional broadcasting, digital signage, and medical imaging. Lower-power versions of the chips are also suitable for some consumer electronics. [March 7, 2011]

ARM Expands Cortex-R Family

Multicore-Ready Cortex-R5 and Cortex-R7 Raise Performance Bar

ARM is expanding its Cortex-R family of real-time embedded-processor cores with two new CPUs: Cortex-R5, an enhancement of Cortex-R4F, and Cortex-R7, an offspring of the powerful Cortex-A9. Both are designed for single- or dual-core implementations. Cortex-R7 is a radical departure from the norm: never before has an intellectual-property vendor offered such an advanced CPU design for real-time embedded applications. It's a dual-issue superscalar machine with an 11-stage integer pipeline, instruction reordering, speculative execution, and optional symmetric multiprocessing. For less demanding applications, ARM's Cortex-R5 improves on the four-year-old Cortex-R4F. [February 21, 2011]

Figure 1: ARM Cortex-R7 block diagram.
Figure 2: Dual ARM Cortex-R7 MPCore CPUs in an LTE baseband chain.
Figure 3: Cortex-R7/Cortex-R5 pipeline diagram.
Figure 4: Multicore coherence with Cortex-R5 and Cortex-R7.
Table 1: Comparison of ARM's new Cortex-R5 and Cortex-R7 against competing CPUs.

NetLogic's Upgrade for Base Stations

Other XLP316 Variations Aim for Storage, Security

Targeting networked storage systems, 3G/4G base stations, and security applications, NetLogic will sample three versions of its newest quad-core processor this quarter. The XLP316 has Serial ATA (SATA) interfaces, the XLP316L has serial RapidIO (sRIO), and the XLP316S has hardware acceleration for deep packet inspection. All are significant upgrades over previous chips in the XLR and XLS families. Although NetLogic disclosed basic information about the XLP316 during a large rollout of XLP processors last summer, the company didn't announce these variations at that time. (See MPR 7/26/10-01, "NetLogic Broadens XLP Family.") [February 14, 2011]

CPU-IP Cores Fight for Dominance

Licensable Processors and Architectures Battle the x86 in 2011

There was a time when licensable embedded-processor cores led a quiet existence in the shadows of desktop and server processors. No more. As smartphones, tablets, e-readers, and other mobile devices supersede PCs in the minds and pocketbooks of consumers, SoCs with licensable CPU architectures are emerging as the dominant species of microprocessor. This evolution is transforming the industry, and 2010–2011 may be the turning point. This year-end review article summarizes events in 2010 related to licensable embedded-processor cores and considers likely developments in 2011. The rise of tablet computers is a fresh opportunity for many companies, and Microsoft's plans to port Windows 8 to ARM will alter the CPU landscape. [January 17, 2011]

DRAM+CPU Hybrid Breaks Barriers

Radical Chip Design Slashes Power Consumption, Boosts Memory Bandwidth

Today's high-performance microprocessors are mostly memory, not logic. Of the 774 million transistors in an Intel Core i7-860 processor, for example, about 69% are SRAM transistors in the 8MB L3 cache. Now, a Texas-based startup, Venray Technology, is bucking the trend toward bigger caches — and the march toward bigger CPUs, too. Instead of building expensive six-transistor (6T) or eight-transistor (8T) SRAM cells in a logic process to accommodate the processor, Venray is moving the processor to commodity-DRAM processes, whose 1T memory cells are cheaper to manufacture and less leaky. Merging the CPU with DRAM dramatically boosts memory bandwidth, reduces memory latency, and slashes power consumption by eliminating caches and shortening the CPU-memory interface. [December 27, 2010]

Figure 1: Venray's Aurora test chip.
Figure 2: Thread-Oriented MIcroprocessor (TOMI) block diagram.
Figure 3: Block diagram of Venray's "Shirtbook" tablet.
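A quick plausibility check of the 69% figure: the sketch below counts only the bit cells of an 8MB cache (taken as 8 × 2^20 bytes) and ignores tags, ECC, and peripheral circuitry. Under those assumptions, 8T cells come to about 69% of the 774 million transistors, and 6T cells to about 52%. Which cell design the i7-860 actually uses isn't stated here, so treat this as a rough check rather than a breakdown of the real die. The 1T row shows the transistor count for the same capacity in DRAM, excluding each cell's capacitor.

```python
# Share of a processor's transistors consumed by cache bit cells.
# Assumptions: 8MB = 8 * 2**20 bytes; only data-array cells are counted
# (no tags, ECC, sense amplifiers, or decoders).
TOTAL_TRANSISTORS = 774_000_000   # Intel Core i7-860, as quoted above
CACHE_BITS = 8 * 2**20 * 8        # 8MB L3 cache, in bits


def cell_transistors(transistors_per_cell: int, bits: int = CACHE_BITS) -> int:
    """Transistors needed for the bit cells alone."""
    return transistors_per_cell * bits


if __name__ == "__main__":
    for name, t in (("6T SRAM", 6), ("8T SRAM", 8), ("1T DRAM", 1)):
        n = cell_transistors(t)
        print(f"{name}: {n / 1e6:.0f}M transistors, {n / TOTAL_TRANSISTORS:.1%} of total")
```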

Intel Debuts ASIC Alternative

[Brief Item]

Intel has introduced an interesting alternative to custom chips: an Atom CPU packaged with an Altera FPGA in a multichip module. The new Atom E600C series (previously known as Stellarton) is suitable for some low-volume embedded applications that need to wrap an x86 processor in application-specific logic. One die is an Atom E600-series single-core processor ("Tunnel Creek"), which has a 512KB L2 cache, an Intel GMA600 graphics engine, an Intel HD Audio engine, an LVDS display interface, a 32-bit DDR2-800 memory controller, four lanes of PCI Express, and miscellaneous I/O. The FPGA die is an Altera Arria II GX, which has 60,214 programmable logic elements, 312 DSP blocks, 5.2Mb of embedded memory, one PCIe hardware block, and eight 3.125Gbps transceivers. [December 13, 2010]

AMD's Fusion Finally Arrives

Integrated CPU/GPU Chips Strengthen AMD's Low-Power Play

Tablet computers are the latest craze, making netbooks so...2009. So why are AMD's first integrated CPU/GPU Fusion chips intended mainly for netbooks? For one, Fusion processors for desktop PCs aren't ready yet. Wait until next year. Second, the new processors aren't only for netbooks. If OEM customers want to use these low-power processors to build large-screen notebooks or even desktop PCs, AMD is happy to sell them the chips, no strings attached. And third, despite the hype over smartphones and tablets, netbooks remain a profitable market segment in which AMD has no presence whatsoever. If the struggling company can capture its usual 10% to 20% of the market, its share will be infinitely better than it is now. [December 6, 2010]

Figure 1: AMD Fusion block diagram.
Figure 2: AMD Ontario / Zacate die photo.
Figure 3: Brazos system-architecture diagram.
Figure 4: Power consumption for AMD's Brazos platform versus Intel's Pine Trail platforms.
Table 1: AMD's first Fusion processors.
Sidebar 1: AMD's Blurry Vision
- Figure: AMD's "Vision" labels and matching PC applications.
- Figure: "Vision" labels aligned with AMD's 2011 PC platforms.
Sidebar 2: AMD & Intel Code-Name Glossary

Freescale Extends QorIQ P1 Series

[Brief Item]

Freescale's newest QorIQ communications processors expand the low-end P1 series with four chips that supersede older PowerQuicc models, upgrading the CPU from the Power e300 to the Power e500v2 core. Clock speeds of the new P1010/P1010E and P1014/P1014E will range from 533MHz to 800MHz while holding maximum power consumption to 2.75W. Target applications include small-business routers, network-attached storage (NAS) controllers, digital-video surveillance systems, and industrial controller-area networks (CAN). [November 29, 2010]

Cavium Completes Octeon II Line

Four New Series Fill Out Family of Networking Processors

Chips are breeding faster than rabbits at Cavium, the rapidly growing supplier of networking and security processors. Today, Cavium announced four new series in the 64-bit Octeon II family, populating a product line that now spans an unprecedented range from 1 to 32 CPU cores per chip. The new Octeon II series are the CN60xx, CN61xx, CN62xx, and CN66xx. They join the CN63xx, CN67xx, and CN68xx series announced earlier this year. The new brood fills the low-end to midrange Octeon II line, leaving no significant gaps. Family members differ in their number of CPUs, clock speeds, L2 caches, memory controllers, networking accelerators, packet interfaces, and other I/O. [November 15, 2010]

Figure 1: Cavium's Octeon II family.
Figure 2: Power savings with Octeon II's PowerOptimizer.
Figure 3: Cavium Octeon II CN66xx block diagram.
Table 1: Key parameters for Cavium's Octeon II family.
Table 2: Comparison of midrange networking processors.
Table 3: Comparison of low-end networking processors.

Altera Adds CPUs for FPGAs

New Options: Cortex-A9 and MIPS32, Plus Intel's Stellarton

Embedding a CPU core and peripherals in an FPGA is the fastest way to rush an SoC to market, but the disadvantages have kept most developers loyal to conventional fixed logic. Now, Altera is again using embedded CPUs as bait to lure the industry toward reconfigurable logic. This time, the world's second-largest FPGA company is pitching three different CPU architectures — ARM, MIPS, and Nios II — plus a fourth option of the x86 in a multichip module from Intel. These choices span a broader range of implementation options than ever before. Developers will be able to choose among a hard core (the foundry builds the CPU in fixed logic on the same die as the programmable fabric), soft cores (developers compile a synthesizable CPU for the fabric at design time), or the Intel multichip module (which pairs an Atom processor with an Altera FPGA). Although Altera and other FPGA vendors have offered both hard and soft CPUs for years, Altera's new "Embedded Initiative" is the most comprehensive CPU-FPGA strategy to date. [October 25, 2010]

Figure 1: Altera's options for implementing SoCs in FPGAs.

MIPS Boosts Multiprocessing

New MIPS32 1074K Challenges ARM Cortex-A9 and Cortex-A15

MIPS Technologies is fighting to defend its strong market positions in home consumer electronics and networking while trying to win new ground in mobile electronics. To accomplish these objectives, the company needs increasingly powerful processors that reduce power consumption — the same conflicting design goals that bedevil almost all of today's CPU vendors. At the recent Linley Tech Processor Conference in San Jose, MIPS introduced its most powerful embedded-processor core to date: the MIPS32 1074K. Designed primarily for multicore SoCs with two to four CPUs, the 32-bit 1074K combines the strong single-thread performance of the MIPS32 74K with the cache-coherent multiprocessing of the MIPS32 1004K. The licensable 1074K is a fully synthesizable core, is portable to any foundry, and is available now. [October 11, 2010]

Figure 1: High-performance MIPS32 licensable CPU cores.
Figure 2: MIPS32 1074K pipeline diagram.
Figure 3: MIPS Coherent Processing System (CPS).
Table 1: MIPS32 1074Kf specifications in TSMC's 40nm-G process.
Table 2: Comparison of the MIPS32 1074K/1074Kf with the MIPS32 1004K/1004Kf, IBM PowerPC 476FP, and ARM Cortex-A9 MPCore. All are high-performance licensable 32-bit CPUs with cache-coherent SMP.

Godson-3 Adds Vector Extensions

Chinese Hope New Processors Will Challenge Top 500 Supercomputers

Aiming to build the world's fastest supercomputer using domestic technology, the Chinese Academy of Sciences is boosting the performance of its home-grown Godson microprocessors with powerful vector-processing units, new SIMD instructions, additional CPU cores, and a leap to 28nm technology. Within a few years, the Chinese hope, an all-native machine will rule the Top 500 list of the world's biggest iron. Three new Godson chips are in development. Godson-3C will be the fastest new member of the family, as well as the most sophisticated Chinese microprocessor yet disclosed. Scheduled for production in 2012, it will be manufactured in a 28nm process and will have 16 CPU cores — twice as many as Godson-3B, another new processor, which achieved first silicon in a 65nm process this month. A third new chip, Godson-2H, is a smaller single-core design with integrated GPU, memory controller, and peripheral controllers. Intended for low-cost PCs, netbooks, and embedded systems, Godson-2H will also be manufactured in 65nm and is slated for production in 2H11. [September 27, 2010]

Figure 1: Milestones of Godson evolution.
Figure 2: Godson GS464V block diagram.
Figure 3: Godson's memory-access coprocessor.
Figure 4: Godson-3B layout.
Figure 5: Godson-2H block diagram.

Freescale Upgrades QorIQ

Faster P1- and P2-Series Processors, Plus New PowerQuicc Chips

Freescale is adding packet-acceleration hardware to the QorIQ P1 and P2 series, matching a feature previously available only in the higher-priced P3, P4, and P5 series. At the same time, Freescale announced a new series of PowerQuicc II Pro chips, reassuring customers that the older PowerQuicc family lives on. The new QorIQ chips are the single-core P1017, dual-core P1023, and quad-core P2040. All have Freescale's Data-Path Acceleration Architecture (DPAA) — a fancy name for the packet-acceleration hardware that first appeared in the eight-core P4080, Freescale's largest networking chip. Extending DPAA to the lower-priced chips allows software developers to use the same code, tools, drivers, frameworks, and application programming interfaces (APIs) across the whole QorIQ family. [September 6, 2010]

Figure 1: Freescale QorIQ P2040 block diagram.
Table 1: New processors in Freescale's QorIQ family.
Table 2: Comparing Freescale's P2040 with Cavium and NetLogic competitors.
Table 3: Key parameters for the new PowerQuicc MPC830x processors.

AMD's Bobcat Snarls at Atom

Low-Power x86 Core Gives AMD Teeth in Mobile-PC Fight

For two years, Intel's Atom processors have utterly dominated the low-power x86 market, winning designs in the vast majority of netbooks while gaining market share in other segments as well. Atom has almost totally eclipsed Via Technologies' Centaur processors, which pioneered the concept of a smaller and simpler x86. Meanwhile, AMD has been virtually AWOL. Athlon Neo runs much hotter than Atom, and AMD's other x86 processors are optimized for high performance in servers, desktops, and mainstream notebooks. Now, AMD is clawing back. Its newest CPU core, code-named Bobcat, should beat Atom in single-thread performance at similar subwatt power levels. AMD estimates that Bobcat will deliver 90% of the performance of today's mobile-PC processors in half the die area. [August 30, 2010]

Figure 1: AMD Bobcat block diagram.
Figure 2: Bobcat pipeline diagram.
Table 1: Key parameters for AMD's Bobcat, Bulldozer, and Athlon Neo.
Table 2: Comparing AMD's Bobcat with Intel's Atom and Via Technologies' Nano.

AMD 4000 Is Cool for Clouds

New AMD Server Processors Reduce Power in Data Centers

(With Linley Gwennap)

AMD has its head in the clouds. Its new Opteron 4100 server processors are intended for cloud-computing data centers that buy servers by the truckload. With prices starting at $99 per chip and typical power as low as 32W, Opteron 4100 processors are challenging Intel's lowest-power Xeons in servers having one or two sockets. Although they can't match Xeon's most power-efficient models, they offer a less expensive alternative while still going easy on the electricity. These prices and power levels may seem low for server processors, but at AMD, $99 buys a quad-core chip running at 2.2GHz, and 32W represents 100% utilization for a server processor with six CPUs. In all, AMD has introduced nine new Opteron 4100 processors. [August 9, 2010]

Figure 1: AMD Opteron 4100 and Opteron 6000 block diagrams.
Table 1: Key parameters for AMD's new 4100-series server processors.
Table 2: AMD's energy-efficient Opteron versus Intel's low-power Xeon.
Table 3: Comparison of low-priced versions of Opteron and Xeon.
Sidebar: Power Plays: ACP vs. TDP

NetLogic Broadens XLP Family

Multithreading and Four-Way Issue with One to Eight CPU Cores

NetLogic is unleashing its first barrage of networking and communications processors since acquiring RMI last year. Nine new chips are scheduled to sample this fall, each with the four-way multithreading and four-issue superscalar features of the previously announced eight-core XLP832. The new chips have one, two, four, or eight CPUs. The single-core XLP104, XLP204, and XLP304 processors are designed for small-business networking equipment supporting packet-throughput rates of 100Mbps to 4Gbps. For enterprise equipment requiring packet rates of 2Gbps to 40Gbps, NetLogic announced the dual-core XLP208, XLP308, and XLP408, plus the quad-core XLP316 and XLP416. At the high end of the family, the previously announced eight-core XLP832 will be joined by another eight-core chip, the XLP432. These two chips, which are designed for network infrastructure, scale from 10Gbps to 160Gbps. [July 26, 2010]

Figure 1: Multithreading and superscalar execution in NetLogic's EC4400.
Figure 2: NetLogic EC4400 CPU block diagram.
Figure 3: NetLogic XLP832 block diagram.
Figure 4: NetLogic's interchip interface (ICI).
Table 1: Key parameters for NetLogic's XLP family.
Table 2: Comparing NetLogic's XLP308 with processors from Cavium, Freescale, and Intel.
Table 3: Comparing NetLogic's XLP208 with processors from Cavium and Freescale.

Tears for Tier Logic

[Brief Item]

FPGA startup Tier Logic looks doomed after failing to raise enough money to move its first chips into production. The company, founded in Silicon Valley in March 2002, has spent about $20 million from its first-round investors and needs another $20 million to $30 million to bring its chips to market and reach breakeven. Tier Logic has operational samples of its first programmable-logic chips and has already taken orders from early customers. The company had planned to begin production by the end of this quarter. [July 19, 2010]

Freescale's P5 Raises QorIQ's I.Q.

New Networking Chips Will Exceed 2.0GHz, Debut 64-Bit CPU

Freescale Semiconductor is making the leap to 2.2GHz and 64 bits. Although Intel and most MIPS-based competitors are already shipping 64-bit network processors, Freescale has stuck with the 32-bit Power Architecture CPUs that have been the cornerstone of its PowerQuicc line since the 1990s. Freescale will continue making 32-bit processors for years to come, but its new P5-series chips in the QorIQ family will introduce a 64-bit Power Architecture core that is capable of multigigahertz clock speeds. [July 5, 2010]

Figure 1: Freescale QorIQ P5020 block diagram.
Table 1: Freescale Semiconductor's QorIQ family.
Table 2: Comparison of Freescale's new QorIQ P5 chips with existing Freescale chips.
Table 3: Comparison of Freescale's QorIQ P5 series with competitors.
Table 4: Comparison of Freescale's QorIQ P3041 with two quad-core competitors.

Kilopass Brings Gusto to Memory

Improved Antifuse Nonvolatile Memory Gives SoC Designers More Options

Kilopass, already an established player in nonvolatile memory (NVM), has introduced an improved version of its antifuse one-time-programmable (OTP) memory. Called Gusto, it's licensed as process-portable intellectual property (IP). It is the industry's first 4Mb OTP, quadrupling the capacity of existing OTP memories. It's large enough to store boot code and system firmware, rather than just code patches, configuration code, and trim settings for analog components. In addition, Kilopass claims Gusto reads memory two to four times faster, cuts active power consumption by an order of magnitude, and slashes current leakage in standby mode by a factor of 40. [June 14, 2010]

Figure 1: Two-transistor antifuse bit cell.
Figure 2: Antifuse bit-cell circuit diagrams.
Figure 3: Electron micrograph of a Kilopass 2T antifuse bit cell at 40nm.
Figure 4: XPM versus Gusto area comparison.
Figure 5: Integration comparison of two hypothetical smartphones.
Table 1: Gusto versus XPM.
Table 2: Comparison of nonvolatile-memory (NVM) technologies.

Intel Adapts Larrabee for HPC

[Brief Item]

Intel's troubled manycore-processor project is steering away from discrete 3D graphics in favor of high-performance computing (HPC), mainly for scientific and engineering applications. It's a wise maneuver that will salvage Intel's investment in the Larrabee project, and the new direction is better suited to Intel's experience and expertise. But it won't avoid a collision with Nvidia, which is surging into the same market. Intel revealed new details about its HPC strategy at the recent International Supercomputing Conference (ISC) in Hamburg, Germany. The x86-based family of GPUs code-named Larrabee will spawn a new family of manycore processors code-named Knights. Both Larrabee and Knights can integrate dozens of x86 processor cores on a single chip. Intel now refers to this technology as the Many Integrated Core (MIC) architecture. [June 15, 2010]

Intel Cuts Atom's Power

Atom-Based Moorestown Chip Sets Aim for Smartphones and Tablets

(With Linley Gwennap)

Moorestown is Intel's code name for a platform that includes a highly integrated Atom processor, a lower-power system-logic chip, a mixed-signal chip, low-level software, and improved system-level power management. The platform is intended for high-end smartphones, tablet computers, and the handheld computing devices that Intel formerly called mobile Internet devices, or MIDs. It will compete with processors designed for trendy products like the Apple iPhone, Nexus One, and Nokia N900, but probably not with more-integrated processors designed for mainstream smartphones, like the BlackBerry Bold and BlackBerry Curve. [May 31, 2010]

Figure 1: Menlow versus Moorestown integration.
Figure 2: Lincroft block diagram.
Figure 3: Die-photo comparison of Lincroft versus Silverthorne.
Figure 4: Briertown block diagram.
Figure 5: Lincroft's burst mode.
Figure 6: Thermal images of Lincroft in two different power states.
Figure 7: Forecast of smartphone processor shipments from 2005 to 2014.
Table 1: Power states for systems built with Intel's Moorestown chip set.
Table 2: Estimated battery life of a Moorestown smartphone.
Table 3: Comparison of Intel's Moorestown with leading smartphone processors from Texas Instruments and Qualcomm.
Sidebar: Atomic Smartphones
Sidebar: Moorestown Goes Embedded
Sidebar: Decoding Intel's Code Names

Editorial: Smartphone Spectrum Disorder

Broadcast television in America, once described as a vast wasteland, now looks more like prime real estate. Or rather, the radio-frequency spectrum that broadcast TV occupies is the suddenly valuable property. So valuable that some people in the telecommunications industry want to seize all that RF spectrum for wireless telephony and banish terrestrial TV broadcasting to the dustbin of history. However, the real issue isn't the alleged obsolescence of broadcast TV. It's the shortage of high-quality RF spectrum for wireless data services — in particular, wireless services for smartphones and tablets. The wireless telcos and handset vendors are painting a marvelous vision of the future in which everyone carries a wireless device that delivers a dazzling array of features and services. Unfortunately, there isn't enough spectrum available to make the vision come true. [April 30, 2010]

Why Apple Wants Intrinsity

Low-Power ARM-Compatible Cores Are Ideal for iPhones and iPads

Apple's stealthy acquisition of Intrinsity is the latest strategic move toward becoming a fully integrated consumer-electronics company. To differentiate its products and justify their higher prices, Apple must do more than wrap trend-setting industrial design and slick system software around other suppliers' standard parts. By developing custom SoCs and embedded-processor cores, Apple is assuming more risk, but the potential payoffs are great: less dependence on third-party suppliers, greater differentiation, higher retail prices, and richer profit margins. Now, Apple is absorbing Intrinsity, a small Austin-based company that sells embedded-processor cores, circuit-design tools, design services, and innovative intellectual property. Microprocessor Report has been covering Intrinsity for ten years — or even longer, counting the company's earlier incarnations as EVSX and Exponential Technologies. [April 26, 2010]

Photo: The main applications processor in the iPad is an Apple-designed SoC, called the A4. It's based on a 1.0GHz ARM-compatible processor core — either a conventional ARM Cortex-A8 or Intrinsity's souped-up Hummingbird core, which is fully compatible with the Cortex-A8.

ARM's Digital Signal Controller

New Cortex-M4 Brings DSP Extensions to Cortex-M Family

ARM is pitching its new Cortex-M4 processor as a digital signal controller (DSC) — the first time ARM has so described one of its processor cores. Essentially, a DSC crosses a digital signal processor (DSP) with a microcontroller (MCU) for double duty in controller applications that need a little signal processing. But, in fact, the Cortex-M4 is not a departure for ARM. It's more like a bridge between the ARM9, ARM11, and Cortex-M families. It adopts the DSP extensions introduced with the ARM9E processor core in 1999 and the SIMD extensions that arrived with the ARM11 family in 2002. In essence, the Cortex-M4 provides a Cortex upgrade path for existing ARM9 and ARM11 designs. It can also upgrade designs based on fellow members of the Cortex-M family — the Cortex-M0 and Cortex-M3. [April 12, 2010]

Figure 1: ARM Cortex-M4 block diagram.
Figure 2: The ARMv7-M ISA as implemented in the Cortex-M family.
Table 1: Single-cycle MAC instructions for the ARM Cortex-M4.
Table 2: ARM Cortex-M4F floating-point instruction set.
Table 3: Feature summary of the ARM Cortex-M4, Ceva TeakLite-II, Tensilica ConnX D2, Tensilica Diamond Standard 212GP, and Virage Logic ARC 610D.
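To illustrate the kind of single-cycle MAC operation listed for the Cortex-M4, the sketch below models the arithmetic of a dual 16-bit signed multiply-accumulate in the style of ARM's SMLAD instruction. Each 32-bit operand holds two signed 16-bit halves, the two products are summed, and the sum is added to a 32-bit accumulator. This is a plain Python model of the semantics for clarity, not ARM's intrinsic or a cycle-accurate description. It wraps at 32 bits and does not model the overflow (Q) flag that the hardware sets. The helper names are hypothetical.

```python
# Model of a dual 16-bit signed multiply-accumulate (SMLAD-style) on 32-bit words.
MASK32 = 0xFFFF_FFFF


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def pack16x2(hi: int, lo: int) -> int:
    """Pack two signed 16-bit values into one 32-bit word (hi:lo)."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)


def smlad(rn: int, rm: int, ra: int) -> int:
    """Return ra + rn.lo*rm.lo + rn.hi*rm.hi as a signed 32-bit result.

    The sum wraps modulo 2**32; the overflow (Q) flag is not modeled.
    """
    lo = to_signed(rn, 16) * to_signed(rm, 16)
    hi = to_signed(rn >> 16, 16) * to_signed(rm >> 16, 16)
    return to_signed((ra + lo + hi) & MASK32, 32)


if __name__ == "__main__":
    # Dot product of two 4-element vectors, two elements per call.
    x = [1000, -2000, 3000, 4]
    y = [7, 8, -9, 10]
    acc = 0
    for i in range(0, len(x), 2):
        acc = smlad(pack16x2(x[i + 1], x[i]), pack16x2(y[i + 1], y[i]), acc)
    print(acc)  # -35960
```

Processing two 16-bit samples per operation is what lets a single-cycle dual MAC roughly double the throughput of a filter or dot-product loop compared with one multiply-accumulate per cycle.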

Tabula's Time Machine

Rapidly Reconfigurable Chips Will Challenge Conventional FPGAs

Tabula, a Silicon Valley startup, has announced new programmable-logic devices that emulate three-dimensional stacked chips by rapidly reconfiguring their two-dimensional fabrics. With these devices, the third spatial dimension exists for only a split-second slice of time. Tabula's devices can completely reconfigure their fabrics up to 1.6 billion times per second. That's about one million times faster than conventional FPGAs. Rapid reconfiguration makes the physical fabric seem much larger than it really is. Tabula's first-generation chips can reuse the same physical gates for as many as eight different functions. In this way, a Tabula chip can match the capacity of an FPGA that's larger and more expensive. [March 29, 2010]

Figure 1: Like a conventional FPGA/PLD, a Tabula 3PLD has only one physical fabric. By rapidly reconfiguring the fabric, each physical gate can perform up to eight different functions.
Figure 2: Tabula uses time to emulate the third spatial dimension, making one fabric seem like eight fabrics stacked together. Each configuration is called a "fold" because it folds time into space.
Figure 3: Tabula uses transparent latches as "time vias" to pass signals forward in time from one fold (fabric configuration) to another.
Figure 4: In a Tabula chip, time isn't linear. It's an endless loop, because the last fold wraps around to the first fold. The virtual stack is really a torus.
Figure 5: Tabula's first-generation devices run at 1.6GHz, but the perceived user clock speed depends on the number of folds. With eight folds — the maximum in these first devices — the "user clock rate" is 200MHz.
Figure 6: As chip-fabrication technology improves, Tabula's 3PLDs may derive greater benefits from Moore's law. Faster clock speeds allow Tabula to add more folds to its virtual 3D fabric. This has implications for interconnects, as well as for gate density.
Figure 7: Memory access in a Tabula Abax 3PLD. Although Tabula's chips use single-ported SRAM instead of dual-ported SRAM for user memory, different function blocks can independently access the same memory during each fold.
Figure 8: Physical layout of logic tiles in a Tabula 3PLD.
Figure 9: Floor plan of Tabula's Abax A1EC06, the highest-end 3PLD that Tabula has announced so far.
Table 1: Feature comparison of Tabula's first four Abax 3PLDs.
Sidebar: Another Three-Dimensional FPGA Debuts [Tier Logic]
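The 200MHz figure in Figure 5 follows from simple arithmetic. The minimal C sketch below assumes that each fold occupies exactly one fabric clock cycle. This is a simplifying assumption on our part, not a description of Tabula's internal timing. Under that assumption, the user clock rate is the fabric clock divided by the number of folds, and the active fold on any tick is the tick count modulo the fold count. The modulo captures the wraparound shown in Figure 4. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: every fold lasts exactly one fabric clock cycle. */

/* Perceived "user clock" in MHz: each user cycle steps through every
   fold once, so the fabric clock is divided by the fold count. */
static uint32_t user_clock_mhz(uint32_t fabric_mhz, uint32_t folds)
{
    assert(folds > 0);
    return fabric_mhz / folds;
}

/* Fold (fabric configuration) active on a given fabric clock tick.
   The modulo makes the schedule an endless loop: after the last fold,
   the fabric wraps around to fold 0. */
static uint32_t fold_at_tick(uint64_t tick, uint32_t folds)
{
    assert(folds > 0);
    return (uint32_t)(tick % folds);
}
```

With the first-generation figures of a 1.6GHz fabric clock and eight folds, user_clock_mhz(1600, 8) returns 200, matching the 200MHz user clock rate in Figure 5.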

In Memoriam: Ellen Clements

It is with great sadness that we report the passing of our longtime colleague, Ellen Clements. Few readers of Microprocessor Report are familiar with Ellen, because her name didn't appear in the newsletter. Yet, for 17 years, Ellen was one of the people who worked behind the scenes to ensure its quality.

As far as anyone can remember, Ellen was the only copy editor in the 23-year history of Microprocessor Report. She began editing the newsletter in 1993, six years after it was founded by Michael Slater in 1987. As a freelance contractor, Ellen edited every article for grammar, spelling, and style. She also maintained our in-house style guide.

Ellen had a long career in Silicon Valley as an editor, ghost writer, editorial consultant, and industry analyst. She entered the field in 1977 as an analyst for Dataquest, covering minicomputers and printers. In addition to copy editing for Microprocessor Report, she worked with many other clients.

Her academic background was eclectic. Ellen graduated from the Bronx High School of Science in 1950, then earned a B.A. in English from Hunter College (City University of New York) in 1961. She did graduate work in sociology, anthropology, and linguistics at New York University, followed by additional study in German at the University of Vienna. Later, after moving to Silicon Valley, she attended De Anza College and Foothill Community College.

Ellen was an extrovert, an excellent conversationalist, a dreamer, and a romantic in the European sense. She loved opera and Shakespeare.

Ellen's passing was sudden and unexpected. She was working until her last day. In-Stat and the staff of Microprocessor Report offer our condolences to her family and especially to her surviving son and daughter, Duncan and Amanda. Ellen, we will miss you. [March 29, 2010]

Editorial: Sun Fades Into Oracle's Orbit

It's not sunset yet. Now that Oracle's $7.4 billion acquisition of Sun Microsystems has closed, Oracle has made a public commitment to keep Sun's most important products and technologies shining. Those technologies include the SPARC microprocessor architecture and Java software platform. In late January, hundreds of customers, industry analysts, and reporters gathered at Oracle's headquarters in Redwood Shores, California, to hear Oracle and former Sun executives describe their plans for the merged company. To be sure, the presentations were glossy and often lacked detail. However, the following messages were clear: Sun will not be drastically downsized in a quest for quick profits; Oracle is reinvesting in Sun's key product lines, including SPARC, Solaris, and Java; and Sun's hardware completes Oracle's evolution into a vertically integrated enterprise-technology company, much like IBM in the 1960s. Oracle didn't acquire Sun solely for the software, as some observers speculated. [February 22, 2010]

Photo: IBM's advertising for new POWER7-based servers unmistakably shows a sun in eclipse, but Oracle is fighting the FUD.

The Rise of Licensable SMP

New PowerPC 476FP Processor Core Challenges ARM and MIPS

As embedded applications demand more performance, we're seeing more interest in licensable microprocessor cores specifically designed for symmetric multiprocessing (SMP). Of course, chip designers can use any processor cores for this purpose, but only a few cores have the built-in features, coherency control, and coherent debugging that make SMP easier to implement. ARM introduced the ARM11 MPCore in 2004, followed by the Cortex-A9 MPCore in 2008 and Cortex-A5 MPCore last year. MIPS Technologies introduced the MIPS32 1004K Coherent Processing System in 2008. All these cores are licensable 32-bit embedded processors supporting two-, three-, or four-way SMP with coherent memory systems. Now IBM is joining the race with the new PowerPC 476FP. Its top speed exceeds 2.0GHz; under worst-case conditions, it reaches 1.6GHz. It has an FPU, and it supports coherent SMP systems with up to eight cores, twice as many as the ARM and MIPS cores support. [February 16, 2010]

Figure 1: IBM PowerPC 476FP block diagram. This is one of the most complex 32-bit embedded-processor cores yet seen.
Table 1: Feature comparison of the IBM PowerPC 476FP, ARM Cortex-A9 MPCore, MIPS 1004Kf, and ARM Cortex-A5 MPCore. All these 32-bit licensable embedded-processor cores are designed for coherent symmetric multiprocessing.

ARC 601 Gets Small

Virage Logic Introduces Tiny 32-Bit Microcontroller Core

It's a race to the bottom — in a good way. The trend of replacing 8- and 16-bit microcontrollers with faster 32-bit devices has processor vendors rushing to shrink their cores to tinier dimensions. The smaller the core, the smaller the compromise in power consumption and cost when developers leave their 8- and 16-bit chips behind. The latest entry in this race is the ARC 601. It's the first new processor core Virage Logic has introduced since acquiring ARC International in September 2009. Although the ARC 601 is a relatively minor variation of the five-year-old ARC 605, it affirms that Virage Logic is committed to the ARC product line and isn't retreating before market leader ARM. [January 19, 2010]

Figure 1: Two closely related microarchitectures form the basis of the ARC microprocessor product line.
Table 1: ARC 601 metrics in three TSMC fabrication processes: 130nm-G, 90nm-G, and 65nm-GP.
Table 2: Feature comparison of the Virage Logic ARC 601, ARM Cortex-M0, Cambridge Consultants XAP5a, Cortus APS3, MIPS Technologies MIPS32 M14K, and Tensilica Xtensa 8 processors.

Editorial: Augmented Reality — and Larrabee

Virtual reality is so...1990s. Sure, artificial environments are beguiling, whether they are created for videogames (World of Warcraft), virtual worlds (Second Life), Hollywood blockbusters (Avatar), or professional training (flight simulators). But now, virtual reality is looking like a stepping stone toward a grander concept: augmented reality. Augmented reality combines some features of virtual reality with actual reality. It can overlay a live view of the real world with computer-generated graphics or textual information, building an enhanced version of reality that's easier to interpret or navigate. Sometimes, augmented reality fabricates astonishing illusions that are entertaining as well as informative. Eventually, actual reality may come to seem drab, confusing, even dangerous. [December 28, 2009]

Photo 1: Ludwig Fuchs of RTT AG and Nvidia CEO Jen-Hsun Huang demonstrate augmented reality at the GPU Technology Conference.
Photo 2: Google Goggles can identify books and paintings photographed with an Android smartphone camera.
Photo 3: Google Goggles can identify places of interest in live video images captured with an Android smartphone.

CPU Marketing: The Next Frontier

Will 'Intel Inside' Matter for Smartphones? History May Tell.

Since the 1990s, AMD and Intel have been marketing their microprocessors directly to consumers, using strategies that resemble the mass marketing of automobiles, fast food, laundry detergent, and other consumer products. But it wasn't always that way. In the 1980s, the idea seemed as silly as marketing capacitors to the general public. The transition of microprocessors from anonymous electronic components to consumer products is a fascinating study that was the subject of a recent discussion panel at the Computer History Museum in Silicon Valley. But it's not just a history lesson. The coming collision between ARM and Intel in smartphones could be the force that brings PC-style microprocessor marketing to this new frontier. [December 14, 2009]

Photo 1: A discussion about microprocessor marketing at the Computer History Museum brought together five panelists: moderator David Laws, formerly of AMD; Jack Browne, formerly of Motorola; and Melissa Rey, Claude Leglise, and Dave House, formerly of Intel.
Photo 2: Claude Leglise was Intel's marketing manager for all x86 processors from the 8086 to the 486.
Photo 3: Melissa Rey was a marketing communications manager at Intel who promoted all the x86 processors from the 8086 to the 386.
Photo 4: Dave House, a former Intel senior vice president, helped run Intel's microprocessor business from 1978 to 1991.
Photo 5: Jack Browne was the marketing manager for Motorola's high-end microprocessors from 1981 to 1992.

Tensilica Tweaks Xtensa

Xtensa LX3 and Xtensa 8 Cores Boost Performance, Cut Power

Tensilica is introducing two new versions of its configurable embedded-processor cores: the Xtensa LX3 and Xtensa 8. In addition to having new features, they are generally smaller and faster than their predecessors and use less power when fabricated in the same CMOS process. The Xtensa 8 core is the smallest base configuration of Tensilica's Xtensa architecture and is intended primarily for 32-bit microcontrollers. The Xtensa LX3 is much more configurable than the Xtensa 8 and is intended primarily for data-plane processing and signal processing. New features include ConnX 16-bit DSP extensions, a smaller version of Tensilica's Vectra LX DSP engine, a double-precision floating-point math accelerator, more system-bus options, better SystemC modeling, and code enhancements for C and C++ programmers. [November 30, 2009]

Figure 1: Tensilica's DSP product line. All these DSP cores were based on the Xtensa LX2 and are moving to Xtensa LX3.
Figure 2: New asynchronous bus bridge for the Xtensa LX3.
Figure 3: Tensilica Xtensa LX3 block diagram.
Table 1: Software emulation vs. hardware acceleration for double-precision floating-point operations, measured in clock cycles.
Table 2: Performance comparison, Xtensa LX3 vs. Xtensa LX2.
Table 3: Tensilica's Xtensa LX3 performance estimates, assuming two different fabrication processes, core configurations, and design flows.
Table 4: Feature comparison of the Tensilica Xtensa LX3, ARM Cortex-A5, MIPS Technologies MIPS32 74K, MIPS32 24KE, and Virage Logic ARC 750D.
Sidebar: Tensilica Debuts Xtensa 8 Core

MicroMIPS Crams Code

New Processor Cores Introduce Denser 16/32-Bit Instruction Set

Smaller is usually better for embedded processors, so MIPS Technologies is slimming down its 1980s-vintage instruction-set architecture. A new set of 16- and 32-bit instructions — dubbed microMIPS — uses less memory than existing 32-bit MIPS instructions and the 16-bit extensions added in the 1990s. MicroMIPS will debut early next year in two new embedded-processor cores, the MIPS32 M14K and MIPS32 M14Kc. The M14K is an improvement on the MIPS32 M4K processor, introduced in 2002. Its bigger brother, the M14Kc, is an improvement on the MIPS32 4KEc processor, introduced in 2003. [November 16, 2009]

Figure 1: The new MIPS32 M14K and MIPS32 M14Kc processor cores introduce the microMIPS 16/32-bit instruction set and anchor the lower end of the MIPS product line.
Figure 2: MIPS32 M14K processor block diagram.
Figure 3: M14K processor flash-memory accelerator.
Figure 4: Interrupt chaining in the MIPS32 M14K and M14Kc processors.
Figure 5: MIPS32 M14Kc processor block diagram.
Table 1: MicroMIPS instruction set.
Table 2: MIPS32 M14K and M14Kc processor power/performance comparison.
Table 3: Feature comparison of the MIPS32 M14K, M14Kc, M4K, 4KEc, ARM Cortex-A5, and Cortex-M3 cores.

ARM's Midsize Multiprocessor

New Cortex-A5 Supports Four-Way Coherent Multiprocessing

Multicore processors are becoming so commonplace that even basic cellphones, MP3 players, and other mobile embedded systems are embracing them. That's why ARM has announced its smallest Cortex-A series multiprocessor core, the Cortex-A5. In a single-core configuration, it's small enough for workhorse microcontrollers, but a four-horse team of them can haul much bigger loads. Code-named Sparrow, the Cortex-A5 is the third member of the Cortex-A family. Although it's smaller and slower than the Cortex-A8 or Cortex-A9 MPCore, it supports coherent multiprocessing with up to four cores, as well as uniprocessor configurations. ARM is positioning the 32-bit Cortex-A5 as a superior substitute for the five-year-old ARM1176JZ(F)-S and a major upgrade from the eight-year-old ARM926EJ-S. [October 26, 2009]

Figure 1: Even in a uniprocessor configuration, ARM's new Cortex-A5 is faster than similar members of the ARM9 and ARM11 families and is much more energy efficient.
Figure 2: ARM Cortex-A5 block diagram.
Figure 3: ARM Cortex-A5 pipelines.
Figure 4: ARM Cortex-A5 trial layout.
Figure 5: ARM Cortex-A5 quad-core block diagram.
Table 1: Feature comparison of the Cortex-A5, Cortex-A8, ARM1176JZ(F)-S, ARM926EJ-S, and Cortex-M3.
Table 2: Feature comparison of the ARM Cortex-A5, Cortex-A9 MPCore, ARM11 MPCore, and MIPS32 1004K processors.

Looking Beyond Graphics

Nvidia's New GPU Architecture Energizes High-Performance Computing

Nvidia's next-generation GPU architecture, code-named Fermi, adds powerful new features for general-purpose computing. Fermi processors will continue to shoulder the graphics workloads in PCs, but they are taking the largest step yet toward becoming equal-partner coprocessors with CPUs. Fermi is the first GPU architecture to have ECC-protected memory and to be fully programmable in C++. Double-precision floating-point performance is eight times higher than in Nvidia's previous generation. With numerous additional improvements, Fermi significantly advances the state of the art in this field. [October 5, 2009]

Figure 1: Nvidia estimates that the total available market for GPU computing is at least half as large as the desktop-PC market for GPUs.
Figure 2: CUDA-core block diagram.
Figure 3: Streaming-multiprocessor block diagram.
Figure 4: CUDA multithreading in the Fermi architecture.
Figure 5: Fermi architecture block diagram.
Figure 6: Fermi's memory hierarchy.
Figure 7: Memory allocation with CUDA.
Figure 8: CUDA kernels are modified versions of conventional functions written in C.
Table 1: Comparison of Nvidia's three CUDA-capable GPU architectures: G80, GT200, and Fermi.

Editorial: Picoprojectors Hit the Mainstream

Another future has arrived. Last year, my colleague Max Baron analyzed competing technologies for picoprojectors — tiny video projectors occupying less than a cubic inch of space. Although picoprojector modules began appearing in small presentation projectors and other specialized devices, the technology hadn't quite hit the consumer mainstream. Then, in August, Nikon revealed the world's first digital camera with a built-in projector. The Coolpix S1000pj displays still photos and video clips at VGA resolution. Eventually, picoprojectors will replace bulky video projectors and will liberate portable video from the confining dimensions of tiny LCDs. More important, picoprojectors will allow inventors to create new products we haven't dreamed of yet. [September 28, 2009]

Photo: Nikon Coolpix S1000pj.

Summer Shopping Spree

Intel Buys Cilk Arts and RapidMind; Virage Logic Wants ARC

Three recent business deals are of special interest to programmers and chip developers. First, Intel has acquired Cilk Arts and RapidMind, two small but brainy companies specializing in development tools for parallel programming. Second, Virage Logic is buying ARC International, which will alter the competitive landscape for licensable embedded-processor cores. All these moves are further evidence that forward-thinking companies are taking advantage of recessionary prices to strengthen their positions for recovery. Intel's late-summer purchases of Cilk Arts and RapidMind (at undisclosed prices) follow its early-summer $884 million acquisition of Wind River Systems. Virage Logic's bid for ARC will add synthesizable microprocessor cores and configurable-processor technology to its growing portfolio of licensable intellectual property (IP). [September 14, 2009]

Intel Defends x86 Strategy

Desktop PCs Are Still Important, but Mobile Computing Is Crucial

As personal computing migrates from desktops to pockets, Intel knows it must push the x86 architecture into ever-smaller, lower-power, lower-cost systems. But investors and financial analysts are watching the lower prices of Intel processors and worry that Atom will cannibalize Intel's most lucrative line of business. Their worries aren't entirely unfounded. Never has the price difference between Intel's low-end and high-end PC processors been so wide and the performance difference so narrow. But the new markets offer tremendous opportunities for growth, so Intel must pursue them, even at the risk of price erosion. [August 24, 2009]

Tensilica Plays Baseband

New ConnX DSP Core Aims for Low-Power Wireless Communications

Tensilica's ConnX Baseband Engine — a CPU/DSP core optimized for wireless baseband processing — signals a new direction for the 12-year-old company. Although Tensilica says most of the 350 million processor cores it has shipped are performing DSP tasks already, Tensilica has always styled itself as a vendor of configurable RISC CPUs. Now, with ConnX BBE, the company is making a major play for DSPs. ConnX BBE has provisions for multicore designs and is suitable for infrastructure equipment as well as for next-generation cellphones. [August 10, 2009]

Figure 1: In-Stat's forecast of 4G/LTE cellular handset sales.
Figure 2: ConnX Baseband Engine block diagram.
Figure 3: Tensilica's compiler converts ANSI C into vectorized machine code.
Figure 4: System I/O with a single-core ConnX BBE design.
Figure 5: Hardware message passing in a dual-core ConnX BBE design.
Figure 6: Distributed shared memory in a quad-core ConnX BBE design.
Figure 7: Input chain of an LTE radio baseband.
Table 1: ConnX BBE performance for various FFT and FIR functions.
Table 2: Feature comparison of the Tensilica ConnX Baseband Engine, Ceva-XC DSP, and NXP CoolFlux BSP.

Hot-Rodding the Cortex-A8

Intrinsity Accelerates ARM's Processor With Fast14 Dynamic Logic

ARM's fastest microprocessor core keeps getting faster. Only five months ago, Texas Instruments announced a 1.0GHz ARM Cortex-A8 in future OMAP3 cellphone chips. Now Intrinsity is unveiling a 1.0GHz Cortex-A8 accelerated with dynamic logic. Intrinsity's new core, code-named Hummingbird, is functionally identical to a Cortex-A8 implemented in standard-cell static logic. Intrinsity says Hummingbird can reach 1.0GHz under worst-case conditions at 1.2V when fabricated in a 45nm-LP low-leakage process. It could exceed that clock frequency in a faster but leakier 45nm-GP process. [July 27, 2009]

Figure 1: Intrinsity Fast14 design flow.
Table 1: Estimated Cortex-A8 performance in different implementations and fabrication processes.

China Gets Right With MIPS

New Architectural Licenses Bless Godson/Loongson Processors

For the first time, the world's most populous nation has licensed the MIPS microprocessor architecture directly from MIPS Technologies. The landmark deal ends all questions about the legitimacy of China's Godson and Loongson processors — MIPS-compatible chips developed independently by Chinese engineers. The official licensee is the Institute of Computing Technology (ICT) at the Chinese Academy of Sciences in Beijing. Although ICT is a government-owned academic and research institution, the MIPS licenses are full-fledged commercial contracts, not the limited academic licenses usually granted to universities. [July 13, 2009]

Editorial: Tough Times Bring Change

Recessions and depressions are national or global in scope, like epidemics and pandemics. But among individuals, the experience varies. Most people don't lose their jobs in an economic downturn or get sick when a disease breaks out. In tough times, wealth and health accrue even more value. For people who didn't lose their money in the 1930s, the Great Depression was the Great Opportunity. The same is true of today's Great Recession. Here's an analysis of recent changes in the semiconductor industry that were accelerated, if not wholly caused, by the recession. [June 29, 2009]

Figure 1: Wind River's first-quarter revenue bookings, by market category.
Photo: The SiCortex SC5832 had 5,832 processors and 8TB of memory, delivering 8.2 teraflops.

EEMBC's Dhrystone Killer

Free CoreMark Benchmark Aims to Retire Dhrystone Forever

EEMBC's new CoreMark is a quick-and-dirty benchmarking program intended primarily for embedded processors. It isn't a substitute for the EEMBC suites, which remain a more sophisticated and comprehensive way of measuring performance. But CoreMark is free, portable, easy to use, and yields a single score that's easy to grasp. Can it finally retire the ancient Dhrystone benchmark? After some quick-and-dirty testing, Microprocessor Report found CoreMark to be a major improvement. [June 8, 2009]

Figure 1: CoreMark instruction profile (Power Architecture).
Figure 2: CoreMark instruction profile (x86).
Figure 3: Typical CoreMark results.
Table 1: CoreMark and Dhrystone scores for three Intel x86 processors, as benchmarked by MPR.

Why Apple Feels Chipper

Hiring More Custom-Chip Designers Makes Sense for Apple

Why would Apple design custom chips? A recent Wall Street Journal article alarmed critics, but Apple has good reasons for hiring more chip designers. Apple is a consumer-electronics company, not just a computer company, and custom SoCs are crucial to Apple's strategy. [May 26, 2009]

Photo: On June 29, 2007, early adopters besieged Apple Stores to buy the first iPhone. To create this kind of frenzy, Apple must differentiate its products from those of competitors.
Figure 1: Apple revenues, Q1 2009. All those little iPods add up. At $3.37 billion for the quarter, they account for the biggest share (33.2%) of Apple's revenues. Even the iPhone generates more revenue ($1.24 billion) than Mac desktops.
Figure 2: Apple's iPod unit sales, 2002-2009. Today, the iPod so dominates the audio market that it's easy to forget Apple wasn't the first to ship an MP3 player. Despite a late and relatively slow start, the iPod soared to popularity. This chart also reveals the seasonal pattern of iPod sales — a Christmas holiday cycle that's typical of consumer electronics.

Itty-Bitty 32-Bitters

Tiny 32-Bit Processor Cores Race to Replace 8- and 16-Bit Chips

Not everyone thinks Moore's law is a quota. Some CPU architects strive to design smaller and smaller microprocessor cores, bucking the trend toward larger processors. In the Lilliputian world of microcontrollers and deeply embedded systems, smaller is definitely better. This report compares the ARM Cortex-M0, Cambridge Consultants XAP5a, Cortus APS3, and Tensilica Diamond Standard 106Micro. [May 11, 2009]

Figure 1: Cortus APS3 block diagram.
Figure 2: Cortus APS3 coprocessor interface.
Figure 3: Peripheral-IP blocks for the Cortus APS3 processor core.
Figure 4: Cambridge Consultants XAP5a block diagram.
Figure 5: Tensilica Diamond Standard 106Micro block diagram.
Table 1: Feature comparison of the ARM Cortex-M0, Cambridge Consultants XAP5a, Cortus APS3, and Tensilica Diamond Standard 106Micro.
Sidebar: Gate Count? Depends Who's Counting
- Sidebar Table: Estimated gate counts when implementing the Cortus APS3 processor core in different fabrication processes.

Going Parallel With Prism

New Analysis Tool Helps Programmers Refactor Serial Code

It'll be a long time — maybe forever — before someone invents a magic compiler that transforms existing serial code into optimized parallel code. Meanwhile, hard-pressed programmers are tackling the job by hand. Now their task will be a little easier. CriticalBlue has introduced Prism, a code-analysis tool that helps developers extract thread-level and system-level parallelism from legacy programs written in sequential code. After running the target program in a software simulator that captures dynamic trace data, developers can use Prism to analyze the results in numerous ways. Most important, Prism helps developers explore various what-if scenarios so they can make intelligent decisions before rewriting any code. [April 27, 2009]

Figure 1: Prism assumes most programmers will implement multithreading with the Posix Thread API.
Figure 2: Prism analyzes dynamic trace information to identify three classic types of data dependencies.
Figure 3: Finding data dependencies with Prism.
Figure 4: Analyzing source code with Prism.
Figure 5: A multithreaded data-flow graph in Prism.
Figure 6: After the programmer forced some data-dependent operations to execute serially, Prism's data-flow graph shows all data flowing forward in time.
Figure 7: Software developers can quickly add or subtract processor cores to test a program with different multicore configurations.
Figure 8: This composite of two cropped screen photos shows the results of forcing a function to run in parallel threads on a quad-core processor.
Figure 9: Prism's hot-spot finder.
Figure 10: Prism's hot-spot finder (detail).
Figure 11: Prism's function-call graph.
Figure 12: Prism's function-call graph (detail).

Editorial: Memory — The Elephant in the Room

Memory prices have fallen so dramatically that flash-memory cards are now cheaper than film, even if you use them only once. And that comparison doesn't even add the cost of processing the film. But therein lies a paradox. Eventually, this trend will drive memory cards into obsolescence before film. Bear with me — my analysis may influence your future system designs. [March 30, 2009]

Photo: Can you spot the soon-to-be-obsolete image-recording medium in this picture? Don't be too quick with your answer.

Intel Will Customize Atom

New TSMC Collaboration Will Produce Customer-Specific x86 SoCs

Intel and TSMC have announced a new collaboration in which Intel will design customer-specific SoCs based on the Atom microprocessor core. TSMC will offer peripheral blocks for the SoC designs and manufacture the chips in its fabs. For Intel, it's the first step toward x86 licensing since the 1980s, when the company sold second-source licenses to AMD and other suppliers. Make no mistake: this deal is aimed squarely at ARM. Intel wants to push the x86 architecture into smartphones and other low-power embedded systems, which ARM dominates. Although Intel isn't close to adopting a licensing model as open as ARM's, this is still a big step for a company that guards the x86 like a family heirloom. [March 30, 2009]

Table 1: The new Intel/TSMC custom-SoC program for Atom differs markedly from processor-IP licensing models, as exemplified by ARM.

ARM's Smallest Thumb

New Cortex-M0 Is ARM's Tiniest Processor Core for MCUs

The name says it all: Cortex-M0. As in "M-Zero." Unless ARM starts naming its processors with negative numbers, the new Cortex-M0 will always be the smallest member of the growing Cortex-M family. Announced February 23, it's a 32-bit synthesizable processor core for microcontrollers and deeply embedded applications. The Cortex-M0's minimum usable configuration is a mere 12,000 gates. To put that in perspective: it's about one-third the size of the ARM7TDMI hard macro — itself a small processor core, designed in the mid-1990s when everything was smaller. Even the base configurations of customizable processor cores from ARC International, MIPS Technologies, and Tensilica aren't this tiny. Anything tinier is probably an 8- or 16-bit processor. [March 2, 2009]

Figure 1: ARM Cortex-M0 instruction set.
Figure 2: ARM Cortex-M0 block diagram.
Figure 3: Low-power state retention in the Cortex-M0.
Figure 4: Cortex-M0 power profile.
Table 1: Four 32-bit ARM processor cores suitable for microcontrollers and deeply embedded applications: the Cortex-M0, Cortex-M1, Cortex-M3, and ARM7TDMI-S.
Table 2: ARM Cortex-M0 configuration options.
Table 3: Feature comparison of the ARM Cortex-M0, ARC 605, MIPS32 M4K, and Tensilica 106Micro processor cores.

How Intel Got Big

Sole Sourcing the 386 Was Crucial, Says Harvard Business Professor

What do Intel microprocessors and Microsoft operating systems have in common with Fred Astaire and Ginger Rogers? All became more famous than the products of which they are parts, says a Harvard Business School professor. And, he says, Intel won fame by deciding in 1986 to stop licensing x86 designs to second-source manufacturers like AMD — a move that probably saved Intel from bankruptcy and radically changed the computer industry. These lessons and others are part of a case study that Professor Richard S. Tedlow teaches at Harvard Business School. However, Microprocessor Report has a few quibbles with his analysis, mainly because it doesn't go far enough. And it isn't a matter of mere historical interest. We think the changes wrought by the 386 are more relevant now than ever. As AMD struggles for survival, and after all known startups working on x86-compatible processors have crashed, Intel is nearer to capturing a worldwide monopoly of PC processors today than it was 23 years ago. [February 17, 2009]

Photo 1: Professor Tedlow presented his Intel 386 case study as a lecture at the Computer History Museum on January 26.
Photo 2: IBM's advertising campaign for the original IBM PC featured a Charlie Chaplin lookalike in "Little Tramp" costume.
Photo 3: IBM's decision to adopt the x86 for the IBM PC was a watershed for the computer industry, but it took a while for Intel to realize it.
Photo 4: In the 1980s, eight-bit CPU architectures like the 6502 were big sellers, especially in home computers from Apple, Atari, and Commodore.
Table 1: AMD's evolving role as an alternative source for x86 processors since 1981.
Sidebar: Designing the Intel 386
Sidebar: How Intel's Manufacturing Got Big (by John Novitsky and Dave House)
- Sidebar photo: Dave House was general manager of Intel's Microcomputer Group in 1985. At right is John Novitsky, who worked on the 32-bit ISA for the 386 processor.

Editorial: More Computers, Less Security

As personal computers proliferate, malicious hackers have more targets. But we're so bombarded with warnings about anonymous attacks coming from the Internet that it's easy to overlook the potential threats closer to home. Consider the physical security of a personal computer — whether it's a desktop PC, laptop PC, mobile phone, or other device. Can everyone who handles it be trusted? My recent experience with a repair shop shows that we cannot. [January 26, 2009]

Figure: This download log reveals that my computer was misused by a repair technician.

Editorial: Surviving the Busted Bubble Economy

At Microprocessor Report, we are primarily technology analysts, not market analysts, so we don't make economic forecasts. Our fellow In-Stat analysts are not so lucky. Since the economy sharply worsened in September, they have struggled to update their market forecasts for 2009 and beyond — a daunting task. Even the world's top economists have been rocked by the Wall Street meltdowns, government bailouts, and financial upheavals of recent months. It's difficult to anticipate what will happen a few weeks from now, much less months or years in the future.

Unfortunately, many people believe our now-collapsed bubble economy was a normal economy. As a result, they define recovery as the restoration of bubble-level business activity. This misconception is understandable for younger folks who have known nothing but bubbles. However, even some older people have forgotten what a normal-growth economy looks like. It's time for an attitude adjustment. [December 29, 2008]

AMD's Stream Becomes a River

Parallel-Processing Platform for ATI GPUs Reaches More Systems

In December, AMD started bundling the runtime package for its ATI Stream parallel-processing platform with the latest display driver for ATI graphics processors. As users download this driver, the installed base of Stream-capable systems could swell to more than two million PCs. Before, users had to download and install the free ATI Stream runtime separately. Over the past two years, Microprocessor Report has published in-depth articles on Nvidia's CUDA, the RapidMind Multicore Development Platform, and the PeakStream Platform. All are software-development platforms for parallel processing. (PeakStream's products went off the market after Google acquired the company in 2007.) This article analyzes ATI Stream. [December 22, 2008]

Figure 1: AMD's nomenclature for an ATI GPU varies according to the target application.
Figure 2: Each SIMD engine in an ATI GPU/stream processor contains multiple thread processors, and each thread processor contains multiple stream cores.
Figure 3: ATI Stream programming model.
Figure 4: Source code written in ATI Stream's flavor of C splits into two forks: code destined for the x86 CPU and code destined for the ATI GPU.
Figure 5: The GPU ShaderAnalyzer utility for ATI Stream.
Figure 6: This example code defines a kernel function in Brook+ C.
Figure 7: An example main() function in Brook+ C.
Sidebar: OpenCL Tries to Standardize Parallel Programming

Freescale's Designer SoCs

Chipmaker Offers New Design Services, With a Catch

Freescale Semiconductor is exploring a new line of business that has interesting implications for other chipmakers. Starting now, Freescale is offering design services to customers that want a custom SoC. Freescale will supply the intellectual property (IP), design the chip, and manufacture it; the customer provides a design specification and money. At first glance, Freescale appears to be merely launching a design-services business, becoming one more design house among many. But there's a catch. Unlike most design houses, Freescale has little interest in designing entirely new SoCs from the ground up. Instead, the SoC must be based on an existing Freescale standard part or use a substantial amount of Freescale's existing IP. To customize the SoC for the target application, Freescale is willing to add or remove blocks and integrate some customer-provided or third-party IP. [November 17, 2008]

Figure 1: Block diagram of Freescale's PowerQUICC II Pro MPC8360E.

Godson-3 Emulates x86

New MIPS-Compatible Chinese Processor Has Extensions for x86 Translation

The hottest presentation at the recent Hot Chips Symposium at Stanford University was the world's first look at the Godson-3, the latest generation of China's most powerful microprocessor family. It was the first time a Chinese CPU architect visited the U.S. to lift the bamboo curtain on a home-grown Chinese processor at a major technical conference. Among the revelations was a startling feature: more than 200 new instructions and other modifications that accelerate x86-to-MIPS dynamic binary translation. In other words, the Godson-3 applies hardware optimization to x86 emulation, much as Transmeta did with its Crusoe and Efficeon microprocessors. (The Godson-3 is also known as the Loongson-3 or Dragon-3.) [November 3, 2008]

Figure 1: Block diagram of the Godson-3's GS464 processor core.
Figure 2: Cluster of four GS464 processor cores sharing four coherent L2 caches.
Figure 3: Two GS464 clusters linked together, forming an eight-core microprocessor.
Figure 4: A massively parallel implementation of the Godson-3 could populate the on-chip mesh network with 16 or more quad-core clusters.
Figure 5: GStera coprocessor block diagram.
Figure 6: Quad-core Godson-3 die layout.
Figure 7: Two examples of x86 virtual machines running atop a MIPS version of Linux on the Godson-3.
Photo: The Godson-3 was presented at the Hot Chips Symposium by Zhiwei Xu, a professor at the Institute of Computing Technology, Chinese Academy of Sciences.

Editorial: Paperless Voting Loses Ground

The old dream of a paperless office remains alluring, but the U.S. finally appears to be awakening from its nightmare of paperless voting. Gradually, election reformers are convincing public officials that paperless electronic voting machines are too flawed to win public confidence in the most important exercise of a democracy. Although much work remains to be done, we have seen positive change since we first editorialized on this subject in 2006. (See "Undo Electronic Voting".)

Politics is beyond the purview of Microprocessor Report, but we are alert to flagrant abuses of computer technology. A bread toaster that connects to the Internet and requires periodic firmware updates may offend our engineering sensibilities, but it's also funny, in a perverse way. A black-box voting machine that determines elections by running secret source code on untested hardware behind a poor user interface — and without a paper trail — is simply perverse. [October 27, 2008]

Microprocessor Hits and Misses

Panel at Hot Chips Symposium Reviews 20 Years of Successes and Failures

(Edited by Tom R. Halfhill)

This year marked the 20th anniversary of the Hot Chips Symposium at Stanford University in Palo Alto, California, sponsored by the IEEE Technical Committee on Microprocessors and Microcomputers. To celebrate, the organizers invited six industry experts to join a discussion panel: "Ready, Fire, Aim — 20 Years of Hits and Misses at Hot Chips." They reviewed new microprocessors and architectures presented at the symposium since 1989 and attempted to sort out the successes and failures. Microprocessor Report has lightly edited the transcript of their discussion for clarity and has added comments and article references to help put some remarks into context. [October 20, 2008]

Photo 1: The discussion panel included Howard Sachs, Telairity; David Ditzel, Intel; Michael Slater, Webvanta; Nathan Brookwood, Insight64; John R. Mashey, Techvisor; and David Patterson, University of California at Berkeley.
Photo 2: Moderator Nick Tredennick.
Photo 3: Microprocessor Report founder Michael Slater.

Intel's Larrabee Redefines GPUs

Fully Programmable Manycore Processor Reaches Beyond Graphics

Intel is spreading the x86 everywhere. No longer satisfied with existing strongholds in PCs and servers, this year Intel has revived the x86 as a standalone embedded processor and has introduced the first highly integrated x86-based SoCs. And as early as next year, Intel will debut the first x86-based 3D-graphics processors. So far, graphics is the oddest addition to the x86's growing list of target applications. A 30-year-old CISC architecture designed for general-purpose processing would seem to be seriously handicapped against special-purpose GPUs, which are highly optimized for tasks like pixel shading and texture mapping. But Intel is undeterred. At Siggraph 2008 — a graphics show, not a microprocessor conference — Intel unveiled the first technical details about its future x86-based GPU, code-named Larrabee. [September 29, 2008]

Figure 1: Larrabee has a fully programmable graphics pipeline, augmented with only a little specialized logic.
Figure 2: Block diagram of Larrabee's scalar and vector instruction paths.
Figure 3: Block diagram of Larrabee's vector processing unit (VPU).
Figure 4: Graphics performance of three action games on simulated Larrabee processors with eight to 48 cores.
Figure 5: Preliminary benchmark testing on simulated Larrabee processors with eight to 64 cores.
Figure 6: Larrabee's software stack.
Figure 7: Larrabee's multithreading model.
Figure 8: Block diagram of Larrabee's on-chip network.

Intel's New SoCs

Pre-Atom Integrated Chips Face Tough Competition

The embedded-processor market resembles a wild costume party, with variety galore — from Little Bo Peep (8-bit MCUs) to the Incredible Hulk (massively parallel DSPs). Into this colorful riot wanders Intel, casually dressed by The Gap for a come-as-you-are party. Intel's first x86-based SoCs, announced July 23, are attired less appropriately than Intel would like. For now, they combine a PC processor core, a PC north-bridge chip, a PC south-bridge chip, and (optionally) a cryptography-acceleration chip. Consequently, they are relatively large and power hungry when compared with competing SoCs. But they are also fast, highly integrated, and definitely better than a system cobbled together with three or four separate Intel chips. [August 18, 2008]

Figure 1: Intel EP80579 block diagram.
Figure 2: Low-level cryptography acceleration on the Intel EP80579 with QuickAssist.
Figure 3: IPsec acceleration on the Intel EP80579 with QuickAssist.
Table 1: Summary of distinguishing features among the eight parts in the Intel EP80579 family.
Table 2: Comparison of similar networking and communications processors from Broadcom, Cavium, Freescale, and Intel.

EEMBC's MultiBench Arrives

CPU Benchmarks: Not Just For 'Benchmarketing' Any More

Imagine a world without measurements or statistical comparisons. Baseball fans wouldn't fail to notice that a .300 hitter is better than a .100 hitter. But would they welcome a trade that sends the .300 hitter to Cleveland for three .100 hitters? System designers and software developers face similar quandaries when making trade-offs with multicore processors. Even if a dual-core processor appears to be better than a single-core processor, how much better is it? Twice as good? Would a quad-core processor be four times better? The Embedded Microprocessor Benchmark Consortium (EEMBC) wants to help answer those questions. EEMBC's MultiBench 1.0 is a new benchmark suite for measuring the throughput of multiprocessor systems, including those built with multicore processors. [July 28, 2008]

Figure 1: MultiBench 1.0 introduces the concept of work items and workloads instead of kernels.
Figure 2: Screen shot of EEMBC's Workload Creator.
Figure 3: Preliminary MultiBench results on an anonymous quad-core processor.
Figure 4: Preliminary MultiBench results comparing two anonymous dual-core processors.
Table 1: EEMBC MultiBench 1.0 workloads and the existing EEMBC benchmark suites (if any) from which they were adapted.
Table 2: MultiBench composite scores and multicore scale factors for two different dual-core processors.
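The scaling questions posed above (is a dual-core chip twice as good as a single-core chip?) reduce to simple ratios. The sketch below assumes a scale factor defined as multicore throughput divided by single-core throughput, with parallel efficiency defined as that ratio divided by the number of cores. EEMBC's published MultiBench formulas may differ; the function names and example scores are hypothetical and are not MultiBench results.

```python
def scale_factor(multicore_score: float, single_core_score: float) -> float:
    """Throughput speedup of an N-core run over a one-core run (assumed definition)."""
    if single_core_score <= 0:
        raise ValueError("single-core score must be positive")
    return multicore_score / single_core_score


def parallel_efficiency(multicore_score: float, single_core_score: float, cores: int) -> float:
    """Fraction of ideal linear scaling achieved; 1.0 means perfectly linear."""
    if cores < 1:
        raise ValueError("core count must be at least 1")
    return scale_factor(multicore_score, single_core_score) / cores


if __name__ == "__main__":
    # Hypothetical scores for an imaginary dual-core chip, not real benchmark data.
    single, dual = 100.0, 180.0
    print(scale_factor(dual, single))            # 1.8
    print(parallel_efficiency(dual, single, 2))  # 0.9
```

Reporting efficiency alongside the raw ratio makes results from chips with different core counts comparable: a hypothetical 1.8x dual-core result (0.90 efficiency) scales better per core than a 3.0x quad-core result (0.75 efficiency).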

Editorial: Tools for Multicore Processors

We keep hearing more complaints that it's hard to write software for multicore processors because there aren't enough development tools. Not enough tools? That's like complaining it's hard to buy Chinese products because there aren't enough Wal-Marts. The real problem with multicore processors is too many development tools — and the tools are often difficult to learn and use. [July 28, 2008]

Freescale's Multicore Makeover

New QorIQ Processors Will Eventually Supersede PowerQUICC Chips

They will be powerful and quick, but they won't be PowerQUICC. Instead of using the brand name that has been a household word since 1995 — in the households of network engineers, that is — Freescale Semiconductor has unveiled a new name for its future communications processors. The new brand is QorIQ (pronounced "Core IQ"). Although the name doesn't seem like an upgrade, the chips look good. Among the first six QorIQ devices announced is the P4080, the first eight-processor multicore chip from Freescale. Some future QorIQ chips will have at least 16 cores. The PowerQUICC brand and product line aren't going away soon, but the vast majority of Freescale's new networking and communications processors will be QorIQ devices. [July 7, 2008]

Figure 1: QorIQ P4080 block diagram.
Figure 2: QorIQ P1-family block diagram.
Figure 3: QorIQ P2-family block diagram.
Table 1: Feature comparison of the six Freescale QorIQ chips announced to date: the P1010, P1011, P1020, P2010, P2020, and P4080.
Sidebar: The New, Improved Power e500mc Processor Core
- Figure: The Power Architecture embedded hypervisor.

ReadyIP Boosts FPGAs

Synplicity Tools Offer Packaged Soft-IP for FPGA Development

For several years now, Microprocessor Report has covered the trend toward implementing and deploying SoC designs in the programmable logic of FPGAs instead of in the fixed logic of ASICs. In the past, programmable-logic devices were commonly viewed as prototype platforms, not as final products. FPGA developers received a big boost recently when Synplicity unveiled its ReadyIP initiative. ReadyIP allows soft-IP vendors to package their cores in a standardized format, so FPGA developers can easily integrate the IP using system-level design tools. Optionally, soft-IP vendors can protect their ReadyIP cores with encryption that still lets developers evaluate a design before purchasing a full license. And ReadyIP isn't specific to any particular brand of FPGAs. [June 16, 2008]

Figure 1: ReadyIP design flow.
Table 1: Feature comparison of 32-bit synthesizable processors approved by their vendors for deployment in FPGAs: the Altera Nios II/f v8.0, ARM Cortex-M1, Freescale ColdFire-V1, Gaisler Research LEON3, Tensilica Diamond Standard 106Micro, and Xilinx MicroBlaze v7.
Sidebar: Freescale Offers ColdFire-V1 for FPGAs

Editorial: A Tale of Two Companies

Silicon Valley is buzzing over the final fates of two fabless-semiconductor companies: Montalvo Systems and P.A. Semi. One went bust, and the other was mysteriously acquired by Apple. The only industry gossip that wagged more tongues this spring was Yahoo's frigid response to Microsoft's takeover bid. [May 27, 2008]

Fault Tolerance for Cortex-M3

ARM Modifies MCU Core for Critical Embedded Systems

ARM is enhancing its Cortex-M3 processor core with faster clock speeds, configurable debug logic, new power-saving features, and compatibility with third-party fault-tolerance technology. All the enhancements make the Cortex-M3 even more suitable for microcontrollers, but fault tolerance is especially important for automotive, medical, and military applications. Cortex-M3 Release 2.0 is compatible with a third-party fault supervisor from Yogitech, a company based in Pisa, Italy (home of the world's most fault-tolerant tower). [May 12, 2008]

Figure 1: ARM's enhanced Cortex-M3 processor has an optional observation port called the faultRobust Diagnostic Interface that couples to Yogitech's fRCPU fault-supervisor module.
Figure 2: Diagnostics recommended by the IEC 61508 standard for ICs rated HFT=0.
Figure 3: A common way to design HFT=0 systems is to use redundant processor cores running in lockstep. A smaller diagnostic module, tightly coupled to the CPU, can provide enough diversity and safety while saving silicon and power.
Figure 4: Yogitech faultRobust-CPU (fRCPU) block diagram.
Figure 5: When Yogitech's fRCPU supervisor detects a CPU fault, it generates an error message.
Figure 6: For higher degrees of fault tolerance (HFT>0), two processor cores and two of Yogitech's fRCPU supervisors can form a dual-channel system.
Figure 7: Cortex-M3 Release 2.0 block diagram.
Table 1: The IEC has defined these standards for fault tolerance in embedded subsystems. The higher the Safety Integrity Level (SIL), the higher the subsystem's availability.

Multicore Multithreading With MIPS

New MIPS32 1004K Coherent Processing System Has Four-Way SMP

Four-bangers are the low-end motors of the automobile world, but quad-core microprocessors are currently the hot rods of computing. On April 1, MIPS Technologies made it easier for chip designers to create quad-core SoCs by introducing the industry's first licensable processor core supporting four-way symmetric multiprocessing (SMP) and chip multithreading. A full implementation of the new MIPS 1004K Coherent Processing System with four dual-threaded cores offers the virtual equivalent of eight-way SMP. [April 28, 2008]

Figure 1: Block diagram showing four-way SMP with the MIPS 1004K Coherent Processing System.
Figure 2: Coherence-manager block diagram.
Figure 3: Optional I/O coherence unit (IOCU).
Figure 4: JPEG decompression on four different configurations of the MIPS 1004K CPS.
Table 1: Comparison of dual- and quad-core MIPS 1004Kc CPS configurations with the single-core MIPS 34Kc processor.
Table 2: Feature comparison of the MIPS 1004K CPS, ARM11 MPCore, and ARM Cortex-A9 MPCore.

Intel's Tiny Atom

New Low-Power Microarchitecture Rejuvenates the Embedded x86

In-depth 10,000-word report on Intel's new Atom family of low-power x86 microprocessors, formerly known as Silverthorne and Diamondville. Although Atom still uses too much power for most traditional embedded systems, by x86 standards it's a power-performance landmark. At launch, Atom's clock frequency will range from 800MHz to 1.86GHz, yet thermal design power (TDP) is a mere 0.65W-2.4W over that range. TDP is a worst-case metric, so typical workloads will draw much less wattage. Intel estimates the "average" power at 160-220mW and idle power at 80-100mW. Even the new Isaiah microarchitecture from VIA Technologies — formerly the low-power x86 leader — can't match Atom's TDPs. Atom completely redefines the low-power x86 landscape. [April 7, 2008]

Figure 1: Atom pipeline diagram.
Figure 2: Atom block diagram.
Figure 3: Intel's page-rendering benchmarks for seven popular websites.
Figure 4: Shmoo plot of Atom's voltage-frequency curve in two low-power states.
Figure 5: Atom's power states.
Figure 6: Atom die plot.
Table 1: Intel's Atom lineup at launch.
Sidebar: Atom's System Controller Slashes Power, Too
- Figure: Poulsbo block diagram.
Sidebar: Decoding the Code Names

Editorial: Think Parallel

Multicore processors are causing much consternation in the software-development community. Traditional single-threaded programs gain essentially nothing by running on microprocessors with multiple cores; indeed, they might even run slower. Multicore processors are in vogue because the power-dissipation penalties of higher clock speeds force CPU architects to find alternatives. The newly popular alternative is to integrate multiple processor cores in a single chip, clock the cores at a lower frequency, and tell programmers to rewrite their software. The solution that seems to be emerging is explicitly coded data-level parallelism. [March 31, 2008]

VIA's Speedy Isaiah

New x86 Design Strikes a Different Balance of Power and Performance

In-depth 6,000-word report: VIA's new Isaiah microarchitecture is a clean-slate x86-compatible design with superscalar pipelining, out-of-order instruction processing, speculative execution, multilevel dynamic branch prediction, larger on-chip caches, and one of the fastest FPUs in the industry. In addition, Isaiah is VIA's first 64-bit x86 processor. VIA previewed Isaiah (then known as Centaur CN) in 2004, but it's only now sampling in silicon and is scheduled to debut later this year. [March 10, 2008]

Figure 1: Isaiah die plot.
Figure 2: Isaiah block diagram.
Figure 3: Isaiah's primary branch predictor.
Figure 4: VIA's PowerSaver technology with TwinTurbo PLLs.
Table 1: Isaiah's floating-point and media-processing performance.

Buy SoC IP Like MP3s

IPextreme's Core Store Sells Soft IP Online at Fixed Prices

In tech lingo, "IP" is an overloaded acronym that can mean "intellectual property" or "Internet Protocol." Now there may be a third definition: "impulse purchase." Intellectual-property vendor IPextreme has opened a retail website called the Core Store that makes buying IP for system-on-chip (SoC) development almost as easy as buying digital music. The Core Store sells synthesizable processor- and peripheral-IP cores at fixed, published prices. With a few mouse clicks, chip developers can review online documentation, buy the IP (Visa, MasterCard, and PayPal accepted), download the files, and begin working immediately. [February 11, 2008]

Figure 1: Soft IP for sale on the Core Store's home page.
Figure 2: Block diagram of a generic SoC that could be designed using National Semiconductor's fixed-price IP available at IPextreme's Core Store.

Parallel Processing With CUDA

Nvidia's High-Performance Computing Platform Uses Massive Multithreading

Nvidia's Compute Unified Device Architecture (CUDA) is a software platform for massively parallel high-performance computing on the company's powerful GPUs. Formally introduced in 2006, after a year-long gestation in beta, CUDA is steadily winning customers in scientific and engineering fields. At the same time, Nvidia is redesigning and repositioning its GPUs as versatile devices suitable for much more than electronic games and 3D graphics. For Nvidia, high-performance computing is both an opportunity to sell more chips and insurance against an uncertain future for discrete GPUs. [January 28, 2008]

Figure 1: Nvidia's CUDA platform for parallel processing on Nvidia GPUs.
Figure 2: Nvidia GeForce 8 graphics-processor architecture.
Figure 3: Three different models for high-performance computing.
Figure 4: CUDA programming example in C.
Figure 5: CUDA's compilation process.

Editorial: The Future of Multicore Processors

With the multicore era undeniably upon us, more talk is turning to the future implications of multicore processors. Of course, software development remains a big challenge, even provoking a recent article in The New York Times, of all places. But the discussion is equally spirited on the hardware side. One debate is about symmetric versus asymmetric multiprocessing. Should all the cores on a multicore chip be identical, or should some be specialized for different tasks? Another debate questions the value of core-level multithreading. How many threads make sense? In many ways, these debates echo the classic RISC versus CISC arguments of the 1990s — simplicity versus complexity, efficiency versus expediency. [December 31, 2007]

Transmeta's Second Life

$250 Million Patent Windfall From Intel Creates Opportunities

Once given up for dead, Transmeta is getting a second chance. Thanks to a $250 million settlement from Intel in a patent-infringement lawsuit, Transmeta is looking forward to a new future as an intellectual-property (IP) provider. But the company says it has no plans to resume making microprocessors. This article analyzes Transmeta's current situation, discusses Transmeta's future plans, and reviews the 11 patents that Transmeta asserted against Intel. [December 26, 2007]

Figure 1: This figure, from Transmeta's 5,493,687 patent, illustrates a technique commonly used in microprocessors with hardware-level multithreading — multiple register banks, switchable for each thread context.
Figure 2: This figure, from Transmeta's 6,226,733 patent, illustrates a method for speculatively calculating memory addresses in a microprocessor that has both memory paging and memory segmentation.
Figure 3: This figure, from Transmeta's 7,100,061 patent, is a flow chart describing one way of dynamically adjusting the clock frequency and voltage of a microprocessor to save power.
Sidebar: A chronological list of 24 Transmeta-related articles published in MPR since 1998. Many additional MPR articles have discussed Transmeta in relation to other microprocessor companies.

Altera Aims For ASICs

Altera and Synopsys Offer Nios II Processor for Standard-Cell Designs

Altera's Nios II embedded-processor core is now a triple-threat contender. Thanks to a partnership with Synopsys, developers can license the 32-bit synthesizable processor for standard-cell implementations in ASICs as well as for FPGAs and structured ASICs. Previously, Nios II was restricted to Altera's FPGAs and HardCopy II structured ASICs, although Altera occasionally made special arrangements with favored customers. Now, anyone can license Nios II for a standard-cell design flow using industry-standard design tools, including the popular electronic-design automation (EDA) tools from Synopsys. [December 17, 2007]

Figure 1: Altera's estimated performance of the Nios II/f processor when implemented as a standard-cell ASIC, as a structured ASIC, and in programmable logic.
Table 1: Feature comparison of synthesizable 32-bit embedded-processor cores marketed for deployment in programmable-logic devices: Altera's Nios II/f (v7.2), ARM's Cortex-M1, Gaisler Research's LEON3, and the Xilinx MicroBlaze v7.0.

Parallel Processing For the x86

RapidMind Ports Its Multicore Development Platform to x86 CPUs

The RapidMind Multicore Development Platform requires programmers to rewrite the data-intensive portions of their code, and it also requires the target system to run a hardware-abstraction layer between the application program and the microprocessor. In return, RapidMind claims big benefits. Some tasks run five to ten times faster, and, in some cases, performance can scale faster than the rising number of processors. In addition, the parallel code is highly portable — programmers needn't rewrite it for each new multicore processor or multiprocessor system. Previously, RapidMind's platform worked only with IBM's Cell Broadband Engine (Cell BE) and the graphics processors from AMD/ATI and Nvidia. On November 5, RapidMind announced Multicore Development Platform v3.0, which targets the popular multicore x86 processors from AMD and Intel. [November 26, 2007]

Figure 1: RapidMind's Multicore Development Platform v3.0.
Figure 2: Parallel processing with RapidMind's platform.
Figure 3: Comparison of C++ code before and after rewriting for RapidMind's Multicore Development Platform.
Figure 4: Performance comparison of serial C++ code vs. RapidMind C++ code.
Figure 5: Performance comparison of serial C++ code vs. RapidMind C++ code on x86 processors and an Nvidia GeForce 8800 GTX graphics card.
Sidebar: RapidMind Wins HPCwire Awards at SC07 Conference

MicroBlaze v7 Gets an MMU

Memory Manager Brings Full-Fledged Linux to Xilinx Processor Core

Xilinx is upgrading its MicroBlaze embedded-processor core again, this time adding an optional memory-management unit (MMU) that allows the 32-bit processor to run sophisticated operating systems supporting virtual memory. Developers can also substitute a simpler memory-protection unit (MPU) or omit supervised memory management altogether. MicroBlaze v7 has other improvements as well. New instructions provide faster floating-point performance and better I/O with coprocessors and custom logic. Xilinx has upgraded the CoreConnect interface to the latest CoreConnect Processor Local Bus (PLB) v4.6 specification, which provides faster links to on-chip peripherals. [November 13, 2007]

Figure 1: Example SoC block diagram.
Table 1: Sizes of three MicroBlaze v7 memory-management options.
Table 2: MicroBlaze v6 versus MicroBlaze v7 performance.
Table 3: Feature comparison of the Xilinx MicroBlaze v7, MicroBlaze v6, Altera Nios II, and ARM Cortex-M1 processor cores.

Atmel's Customizable MCUs

Metal-Programmable Gates Add Flexibility to ARM-Based Microcontrollers

Customizable Atmel Processors (CAPs) invert the structured-ASIC formula to preserve the good aspects (design flexibility, rapid turnaround) while avoiding the bad aspects (complex design and verification, insufficient advantages over FPGAs and standard-cell ASICs). Instead of offering a blank slate of programmable metal encompassing nearly the whole chip, Atmel's CAPs are fundamentally ARM7- or ARM9-based microcontrollers with the usual integrated peripherals and I/O interfaces. Only about 10% to 20% of the chip is reserved for a metal-programmable block. By using this block to integrate additional peripherals, application-specific logic, or even multiple processor cores, customers can transform these off-the-shelf parts into the near equivalent of a custom ASIC. [October 29, 2007]

Figure 1: Customizable Atmel Processor (CAP) die photo.
Figure 2: Size comparison of Atmel's metal-programmable gates with standard-cell gates.
Figure 3: CAP7 block diagram.
Figure 4: Block diagram of Amulet Technologies' Graphical OS in Silicon technology for Atmel's CAP.
Table 1: Soft macros available for CAPs.

ARC Encodes Digital Video

New Video Subsystems Exploit VRaptor Media Architecture

ARC International has introduced five new members of the ARC Video Subsystem, a family of digital-video encoders and decoders. ARC licenses these subsystems to customers as soft intellectual property (IP) for integration in SoCs. The ARC Video Subsystem builds on the ARC VRaptor Media Architecture introduced in 2006. VRaptor, in turn, is based on an ARC 700 32-bit embedded-processor core augmented with instruction-set extensions, SIMD media processors, communication channels, special acceleration logic, and optimized software codecs for popular audio/video standards. Until now, ARC's preconfigured subsystems could handle video decoding but not the more challenging task of encoding. [October 15, 2007]

Figure 1: ARC Video 417V block diagram.
Table 1: Feature comparison of ARC's AV 402V, AV 404V, AV 406V, AV 407V, and AV 417V video subsystems.
Table 2: A sampling of ARC's VRaptor Media Architecture.
Table 3: ARC Video Subsystem performance.
Sidebar: Future Directions: Mobile HD Video
Sidebar: Hard-Wired Video Accelerators for ASICs

Cortex-R4X: Extreme Makeover

Intrinsity's Fast14 Technology Accelerates ARM's Processor Core

In July, Microprocessor Report described a new Power Architecture processor core that Intrinsity designed for AMCC using Fast14 dynamic logic. In that collaboration, Intrinsity played the role of a design house as well as an intellectual-property (IP) provider by designing a new Power-compatible microarchitecture to AMCC's specifications. Now, Intrinsity is playing a different role for ARM. Starting with an existing microarchitecture — ARM's Cortex-R4 embedded-processor core — Intrinsity is using Fast14 to transform the synthesizable model into a hard macrocell. The result is the ARM Cortex-R4X, the extreme-makeover edition of the Cortex-R4. [September 24, 2007]

Figure 1: Power-performance envelopes of the ARM Cortex-R4 vs. the Cortex-R4X in a low-leakage 65nm CMOS fabrication process.
Figure 2: ARM's sales estimates for various types of embedded systems in 2008.
Figure 3: Comparison of a halt-propagate-generate cell implemented with Intrinsity's Fast14 1-of-N domino logic (NDL) and conventional static logic.
Table 1: Feature comparison of the ARM Cortex-R4X, ARC 750D, MIPS 24KEc, MIPS 74K, and Tensilica Diamond 570T processors.
Sidebar: ARM's Cortex-M1 for Low-Power Actel FPGAs

Editorial: Intrinsity Turns a Corner

This month's issue of Microprocessor Report has an article about ARM's Cortex-R4X processor, a new hard-macro version of the previously released Cortex-R4 synthesizable core. What's special about this particular hard core is that it uses Intrinsity's Fast14 technology — a type of dynamic domino logic that has been demonstrated to significantly improve microprocessor performance. In our July issue, we reported on another interesting collaboration between Intrinsity and AMCC. With these projects, Intrinsity appears to be successfully redefining itself as an IP provider and design shop specializing in speed-optimized embedded-processor cores. [September 24, 2007]

Freescale's Multicore Strategy

Key Components: Optimized CPU Core, Accelerators, and Interconnects

If anyone still thinks multicore chips are merely the latest technology fad, banish such impure thoughts immediately. It has become clear that chip-level multiprocessing is the only visible path toward significantly higher performance, and every leading-edge processor company has a multicore strategy. The latest company to revamp its strategy is Freescale Semiconductor. Freescale is a good case study, because the company has been selling multicore chips — of a sort — since the mid-1990s. [August 27, 2007]

Figure 1: Freescale's multicore platform block diagram.

XMOS Redefines Silicon

Software-Defined Chips Attack ASICs, ASSPs, FPGAs

XMOS Semiconductor is pushing a technology it calls "software-defined silicon." In this concept, a multicore array of general-purpose embedded-processor cores uses hardware multithreading to run the control software and application software under hard real-time constraints. At the same time, separate threads drive the chip's pins to emulate the required I/O interfaces: Ethernet, USB, UARTs, I2C, and so forth. This combination of multicore integration, deterministic multithreading, and software-defined I/O allows a general-purpose microprocessor to perform the functions of an SoC, but without custom acceleration hardware or dedicated I/O controllers. [August 6, 2007]

Figure 1: Example XMOS C (XC) code for a UART transmit function.
Figure 2: Developers can use standard C and C++ to write most software.
Figure 3: Block diagram of a dual XCore design with XLink on-chip interconnect.
Sidebar: Key People at XMOS Semiconductor
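Software-defined I/O means a thread produces the pin waveform directly instead of handing bytes to a UART controller. As a rough illustration, the Python sketch below computes the line levels for the common 8N1 framing (start bit, eight data bits sent least-significant bit first, one stop bit) and schedules one pin update per bit period. It is not XC and uses no XMOS API. The function names and the 100MHz timer in the usage note are assumptions for illustration only.

```python
def uart_frame(byte: int, data_bits: int = 8, stop_bits: int = 1) -> list:
    """Line levels for one frame on an idle-high line (8N1 by default)."""
    if not 0 <= byte < (1 << data_bits):
        raise ValueError("value does not fit in data_bits")
    levels = [0]                                           # start bit pulls the line low
    levels += [(byte >> i) & 1 for i in range(data_bits)]  # data, least-significant bit first
    levels += [1] * stop_bits                              # stop bit(s) return the line high
    return levels


def transmit(data: bytes, baud: int, clock_hz: int) -> list:
    """Return (tick, level) pairs: when a software thread would drive the pin, and to what."""
    ticks_per_bit = clock_hz // baud  # hypothetical free-running timer at clock_hz
    schedule, tick = [], 0
    for byte in data:
        for level in uart_frame(byte):
            schedule.append((tick, level))
            tick += ticks_per_bit     # wait one bit period before the next edge
    return schedule
```

For example, `transmit(b"A", baud=115200, clock_hz=100_000_000)` spaces the pin updates 868 timer ticks apart. The integer division drops a fraction of a tick per bit, an error a real implementation would need to keep within the receiver's tolerance.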

Editorial: The New PC From Hell

Last month I helped my brother purchase and set up his first new home computer in eight years. What should have been an easy job became a two-day ordeal that would be comical if it weren't such a sad commentary on today's PC industry. The villains include hardware manufacturers, software publishers, documentation writers, mass-market retailers, and corporate downsizers. All are clueless about serving their primary customers — ordinary users. And it seems to be getting worse, not better. [July 30, 2007]

AMCC's Titan Core

New Power Architecture Core Uses Only 2.5W at 2.0GHz

AMCC and Intrinsity have joined forces to create an entirely new Power Architecture processor core. Code-named Titan, the 32-bit semicustom core relies heavily on Intrinsity's Fast14 logic to reach high clock speeds (up to 2.0GHz in 90nm bulk CMOS) while consuming remarkably little power (2.5W). In addition, Titan is part of a dual-core "processor complex" that supports coherent multiprocessing. If Titan succeeds, it will admit AMCC and Intrinsity to an exclusive club formerly limited to Freescale, IBM, and P.A. Semi — the only other companies creating original Power Architecture designs. [July 23, 2007]

Figure 1: Comparing Fast14 logic with conventional static logic.
Figure 2: AMCC Titan pipeline diagram.
Figure 3: AMCC Titan block diagram.

Cavium Stalks Storage

Coming Soon: The First Octeon Storage Processors

Cavium Networks is entering the mainstream storage-processor market with two families of Octeon chips based on the company's successful networking and communications processors. When the new storage processors debut late this year, they will bring the same high integration and programmability to networked storage systems that Cavium's existing processors have brought to routers, broadband-access devices, and many other networking products. The new Octeon Storage Services Processors will have two to twelve MIPS-compatible processor cores per chip, as much as 2MB of L2 cache per core, configurable I/O interfaces, and hardware acceleration for critical tasks. [July 16, 2007]

Figure 1: Octeon SSP CN57xx block diagram.
Figure 2: Octeon SSP sequential-read performance with iSCSI.
Table 1: Feature comparison of Cavium's Octeon CN55xx and CN57xx Storage Services Processors.

Editorial: Commodity Products Make Commodity Markets

Most companies fear commodity markets — those markets that subsist on razor-thin profit margins, providing sustenance only to the bottom-feeders. Typically, a new market opens with highly innovative products that command high profit margins. As more companies enter the fray, competition drives prices down. Eventually, the products become so plentiful and similar to each other that they become a nearly profitless commodity. That's Business 101. But I think much of the damage of commodity markets is self-inflicted. Lately I've been wondering if the spread of embedded-processor technology is partly to blame. [June 26, 2007]

Freescale's First Flexis MCUs

New 8- and 32-Bit Microcontrollers Offer Pin Compatibility

Years ago, some crazy hot-rod mechanics crammed V8 engines into their classic Volkswagen Beetles. This hardware hack wasn't easy. The huge V8 transformed a cute Bug into a kludgy monstrosity. Freescale Semiconductor wants to bring a similar upgrade to embedded systems, only without the kludge quotient. So this week, Freescale is unveiling the first microcontroller family with pin-compatible 8- and 32-bit devices. Freescale's new Flexis-family MCUs for consumer and industrial applications will allow developers to pull an 8-bit chip out of a socket, replace it with a 32-bit part, update the firmware, reboot, and continue running the system as before — except with much more horsepower. [June 26, 2007]

Figure 1: Flexis microcontroller block diagram.
Table 1: Feature summary of Flexis 8- and 32-bit microcontrollers.
Table 2: Flexis low-power modes.

MIPS 74K Performance Update

MIPS Releases Power/Performance Estimates for New Processor Core

At the recent Microprocessor Forum in San Jose, MIPS Technologies released power-consumption estimates and performance benchmarks for the new MIPS32 74K embedded-processor core. These preliminary numbers show the 74K running neck and neck with ARM's Cortex-A8. Microprocessor Report covered the MIPS 74K in detail shortly after its May 21 debut, but we overlooked some power-consumption estimates. In her Microprocessor Forum presentation, MIPS Engineering Director Vidya Rajagopalan showed the latest data for a 74Kc processor core synthesized for TSMC's 65nm GP process, using TSMC's standard-cell library and low-Vt transistors. [June 4, 2007]

Figure 1: MIPS 74K vs. MIPS 24K performance.
Table 1: Estimated MIPS 74Kc power consumption and performance.

Editorial: Unchained Melodies

Amazon.com grabbed headlines this month by announcing that it will sell music downloads unfettered by digital-rights management (DRM). Customers will be allowed to download and listen to the songs anywhere — on personal computers, portable music players, home sound systems, car stereos — and even burn copies on CDs. Amazon's announcement is trumpeted as a breakthrough for the music industry. That's funny. I remember enjoying the same freedom to make copies of music for personal use back in the analog vinyl-and-tape days. Even in the 1980s, when audio CDs introduced the world to digitized music, it was common to make cassette copies for the car and mix-tapes for parties. Amazon's "breakthrough" is more like a restoration of lost rights. [May 29, 2007]

MIPS 74K Goes Superscalar

New 32-Bit Processor Core Has Dual-Issue Out-of-Order Pipelining

It's so old, it's new again. In the 1990s, MIPS Technologies was at the forefront of RISC microprocessor design, introducing speedy workstation/server processors like the R10000 with deep superscalar pipelines and out-of-order execution. Now those features are reappearing in synthesizable embedded-processor cores. At last week's Microprocessor Forum in San Jose, California, MIPS showed that architectural acrobatics are making a comeback. MIPS introduced the MIPS32 74K, a new family of 32-bit synthesizable processor cores for demanding embedded applications. Among other tricks, the 74K uses two-way superscalar superpipelining and out-of-order execution — techniques once dismissed as too complex for lowly embedded processors. [May 29, 2007]

Figure 1: MIPS32 74K pipeline diagram.
Figure 2: MIPS32 74K processor core block diagram.
Figure 3: Preliminary results of DSP benchmark tests on the MIPS 74K, 24KE, and 24K processors.
Table 1: New instructions in the MIPS DSP Application-Specific Extension (ASE) Revision 2.
Table 2: MIPS 74K performance characteristics after speed-optimized synthesis.
Table 3: Feature comparison of the MIPS 74Kc, MIPS 74Kf, MIPS 34K, ARC 750D, ARM Cortex-A8, IBM Power 460S, and Tensilica Xtensa LX2 processor cores.

Making Chips From Thin Air

IBM's New 'Air-Gap' Technology Uses Vacuums for Low-k Dielectrics

Vacuum tubes vanished from computers decades ago, but now vacuums are making a surprising comeback. IBM is introducing a new semiconductor-fabrication technique that creates "air gaps" (actually, tiny vacuum cavities) to replace the conventional insulation around copper wiring in integrated circuits. The preliminary results are even better than those of the latest low-k solid dielectrics. Lower-k dielectrics reduce the capacitive coupling between adjacent wires, thereby reducing signal delays. Circuit designers can leverage lower capacitance to increase the chip's clock frequency, reduce the chip's power consumption, or choose some combination of those improvements. [May 21, 2007]

Figure 1: This photograph, taken with an electron-beam microscope, shows the lattice-like atomic structure that emerges after IBM deposits its polymer material on a copper-metal layer.
Figure 2: The drawing at the bottom of this figure illustrates the nanoscale holes that allow acids to create gaps in the solid dielectric material. Above are actual photographs of the tiny cavities.
Figure 3: This electron-beam micrograph shows an oblique view of the metal layers in a chip fabricated with IBM's new air-gap technology.
Figure 4: Additional electron-beam micrographs provide startling closeups of the tiny vacuum cavities.
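The payoff of a lower dielectric constant can be estimated with two first-order relations. Wire capacitance scales roughly with the effective k of the insulator, and switching power follows P = αCV²f. The Python sketch below applies those approximations. The k values in the usage line are hypothetical, not IBM's measurements, and the results apply only to the wiring's share of total capacitance.

```python
def dynamic_power(alpha: float, cap_farads: float, vdd: float, freq_hz: float) -> float:
    """First-order switching power, P = alpha * C * V^2 * f, in watts."""
    return alpha * cap_farads * vdd ** 2 * freq_hz


def air_gap_tradeoff(k_old: float, k_new: float) -> dict:
    """Two ways to spend a lower effective k, assuming wire C scales linearly with k."""
    c_ratio = k_new / k_old
    return {
        "capacitance_ratio": c_ratio,
        # Same clock and voltage: wire switching power falls in proportion to C.
        "wire_power_ratio_same_clock": c_ratio,
        # If wire RC delay alone set the cycle time, the clock could rise by 1/c_ratio.
        "max_clock_ratio_if_rc_limited": 1.0 / c_ratio,
    }


# Hypothetical effective k values, chosen only to show the arithmetic.
tradeoff = air_gap_tradeoff(k_old=3.0, k_new=2.0)
```

With these made-up inputs, wire capacitance drops to about two-thirds, so a designer could either cut wire switching power by a third at the same clock or, in the idealized RC-limited case, raise the clock by about 1.5 times. Real chips land somewhere in between, because transistors and other capacitance don't scale with the dielectric.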

Preview: Microprocessor Forum 2007

Intel Headlines Conference on Multicore, Video, Graphics, and Low Power

With three keynote addresses, 20 technical presentations, and a full-day seminar on power efficiency — plus our traditional Tuesday-evening expo and party — Microprocessor Forum will celebrate its 19th anniversary this year. Dozens of companies are participating as presenters or sponsors. This event will be the only Microprocessor Forum in the U.S. in 2007, moving from its usual time in the fall to May 21-23 in San Jose, California. The only other scheduled forum is Microprocessor Forum Japan, on June 19-20 in Tokyo. [May 14, 2007]

Photo 1: At last year's Microprocessor Forum, Intel's Dileep Bhandarkar delivered a well-received presentation on future power-management technology. Microprocessor Forum is the longest-running independent technical conference on all aspects of microprocessors.
Photo 2: A Wi-Fi network allows conference attendees to download the latest versions of technical presentations and other materials. In-Stat will also make the materials available on USB flash drives.
Photo 3: The traditional Tuesday-night Expo and Demo Showcase gives attendees a chance to huddle with industry celebrities while enjoying food and drink.

Editorial: The Dread of Threads

Multicore processors are leading the computer industry into uncharted territory. There might be entire minefields of hidden software bugs we haven't considered before. Two papers I've read on this subject are disturbing, especially because they warn that we have few alternatives. One paper is "The Problem With Threads," by Dr. Edward A. Lee, chairman of electrical engineering at the University of California at Berkeley. The other paper, also authored at that university, is "The Landscape of Parallel Computing Research: A View From Berkeley," by 11 experts on microprocessor architecture. It asserts that the only path toward significantly faster CPUs is chip multiprocessing, regardless of any consequential problems with threads. [April 30, 2007]

Embedded Systems Conference Highlights

News From the ESC Exhibition Floor and Meeting Rooms in San Jose

MIPS Technologies has negotiated a landmark licensing deal with STMicroelectronics that appears to resolve a long-running dispute with China over derivatives of the MIPS architecture...The Power.org consortium has formed technical subcommittees to resolve differences among Power Architecture microprocessors and processor cores...ARC International announced a surprising acquisition of Teja Technologies, and we suspect there's more to this deal than ARC disclosed in its press release...NXP Semiconductors (formerly Philips Semiconductors) showed some fascinating preliminary results of tests with the power-consumption benchmarks that EEMBC introduced last year...Innovasic Semiconductor, which specializes in satisfying demand for chips discontinued by other companies, wants to clone the Intel 386 processor, which Intel recently dropped from its product catalog. [April 23, 2007]

Figure 1: ARC VRaptor block diagram.
Figure 2: NXP Semiconductors measured power consumption of its LPC3180 microcontroller using EEMBC's automotive benchmark suite and EnergyBench.
Figure 3: LPC3180 energy consumption per floating-point loop iteration, in microjoules.
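Figure 3's metric, energy per loop iteration, is average power multiplied by run time and divided by the iteration count. The Python sketch below shows that arithmetic with hypothetical sample values. It assumes power is computed from voltage and current samples taken at a fixed rate, and it does not reproduce EnergyBench's actual measurement procedure.

```python
def average_power_w(volts: list, amps: list) -> float:
    """Mean of instantaneous V*I products, assuming samples taken at a fixed rate."""
    if not volts or len(volts) != len(amps):
        raise ValueError("need equal-length, non-empty sample lists")
    return sum(v * i for v, i in zip(volts, amps)) / len(volts)


def energy_per_iteration_uj(avg_power_w: float, runtime_s: float, iterations: int) -> float:
    """Energy per loop iteration in microjoules: (P_avg * t) / N, scaled from J to uJ."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    return avg_power_w * runtime_s / iterations * 1e6


# Hypothetical readings: 1.2V at a steady 50mA, 2 seconds, 1,000,000 iterations.
p = average_power_w([1.2] * 4, [0.05] * 4)
e = energy_per_iteration_uj(p, runtime_s=2.0, iterations=1_000_000)
```

Here the average power is 60mW, so each iteration costs about 0.12 microjoule. Normalizing by iterations, rather than reporting raw power, lets a slower but lower-power configuration be compared fairly with a faster one.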

Freescale Licenses Power Cores

Power Architecture e200 Processor Cores Available for IP Licensing

For the first time, Freescale Semiconductor is making some of its Power Architecture embedded-processor cores generally available as licensable intellectual property (IP). Until now, only IBM has broadly licensed Power cores to chip developers. Freescale's move strengthens the Power Architecture as an alternative to widely licensed embedded-processor cores from ARM and others. The first Freescale cores released for licensing are four members of the 32-bit Power e200 family. All are fully synthesizable and portable to virtually any digital IC process. [April 2, 2007]

Figure 1: Power e200z6 block diagram.
Table 1: Feature comparison of Freescale's licensable Power e200z0, e200z1, e200z3, and e200z6 embedded-processor cores.
Table 2: Feature comparison of six 32-bit Power Architecture embedded-processor cores available for licensing: the Freescale Power e200z6, IBM Power 460S, IBM Power 464-H90, IBM Power 464FP-H90, IBM Power 440, and IBM Power 405.
Sidebar: Freescale Outsources Licensing to IPextreme

ARM Blesses FPGAs

New Cortex-M1 Processor Core Is Optimized for FPGA Integration

In a radical departure from past policy, ARM will allow licensees to synthesize some of its embedded-processor cores in FPGAs and is optimizing these cores for programmable-logic fabrics. Until now, with one exception, ARM has permitted licensees to synthesize ARM processors in FPGAs for development purposes only, not for product deployment. At the same time, ARM is announcing its first synthesizable processor core specially designed for FPGAs: the Cortex-M1. ARM says additional FPGA-optimized cores will follow. [March 19, 2007]

Table 1: ARM Cortex-M1 instruction set.
Table 2: ARM Cortex-M1 configuration options.
Table 3: Feature comparison of seven 32-bit embedded-processor cores licensable for FPGA deployment: the ARM Cortex-M1, Altera Nios II/f, Altera Nios II/s, Altera Nios II/e, Gaisler Research LEON3, Xilinx MicroBlaze v5.0, and Xilinx MicroBlaze v4.0.

Editorial: MPR Analysts' Choice Awards

We are pleased to announce all the winners of our annual Microprocessor Report Analysts' Choice Awards for 2006. We have recognized seven companies: ARM, Ambric, Eutecus, Freescale Semiconductor, Handshake Solutions, Intel, and Planet82. One award was shared by two companies, ARM and Handshake Solutions. ARM and Intel each won two awards. Also in this month's editorial: a follow-up to our December 2006 editorial against paperless electronic voting. [February 26, 2007]

MPR Innovation Award: Eutecus

Superfast Sensor-Processors Break New Ground in Digital Imaging

Microprocessor Report is presenting an MPR Analysts' Choice Award in the Innovation category to Eutecus, Inc., for designing a digital-imaging sensor-processor architecture that can capture and analyze up to 100,000 frames per second. The company's Cellular Visual Technology combines a massively parallel processor architecture with optimized image-processing software. Some implementations use an innovative semiconductor fabrication process to bond the image sensor directly onto the parallel-processor array, creating a multilayer chip. [February 26, 2007]

Figure 1: An innovative semiconductor-fabrication process distributes thousands of indium bumps over the surfaces of the image-sensor and processor dies, bonding them together.

MPR Analysts' Choice Awards

Five Companies Make Our First Group of Winners for 2006

This week we're announcing the first group of our annual Microprocessor Report Analysts' Choice Awards. Next week we'll announce the final group of winners. For each award, we are publishing a brief article about the winning product or technology and the reasons for our choice. Five companies are in the winner's circle this week: Ambric, ARM, Freescale Semiconductor, Handshake Solutions, and Intel. We're actually handing out four awards, because two of those companies (ARM and Handshake Solutions) share an award. [February 20, 2007]

Figure 1: All winners of MPR Analysts' Choice Awards will receive a wall plaque that displays a reproduction of the MPR article announcing the award.

MPR Innovation Award: Ambric

Ambric Fits New CPU Architecture to Parallel Programming Model

Microprocessor Report is presenting an MPR Analysts' Choice Award in the Innovation category to Ambric, an Oregon-based fabless semiconductor company founded in 2003. Bucking the usual trend, Ambric designed a new microprocessor architecture by first creating an innovative programming model, then fashioning an architecture capable of efficiently executing it. Ambric's Am2045 massively parallel processor crams 360 proprietary 32-bit RISC processors and 585KB of SRAM onto a single compact die. Maximum theoretical performance exceeds one trillion operations per second at 333MHz. [February 20, 2007]

Figure 1: Partial Am2045 block diagram.
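A quick sanity check on the peak-performance claim: dividing the aggregate rate by the number of processors and the clock speed gives the average work each processor must complete per cycle. The Python sketch below uses only the figures quoted above. The abstract does not say how Ambric counts operations, so the result (a little more than eight per processor per cycle) suggests only that the peak figure counts several operations per processor per clock, not which ones.

```python
def ops_per_processor_per_cycle(total_ops_per_s: float, processors: int, clock_hz: float) -> float:
    """Average operations each processor must complete per clock to reach the aggregate rate."""
    return total_ops_per_s / (processors * clock_hz)


# Figures quoted for the Am2045: 360 processors at 333MHz, more than 1 TOPS peak.
needed = ops_per_processor_per_cycle(1e12, 360, 333e6)
one_op_each = 360 * 333e6  # aggregate rate at one operation per processor per cycle
```

At one operation per processor per cycle, the chip would reach only about 120 billion operations per second, so the trillion-operation figure depends on each processor doing considerably more than that every clock.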

MPR Innovation Award: ARM996HS

ARM and Handshake Solutions Debut Clockless Processor Core

Microprocessor Report is presenting an MPR Analysts' Choice Award in the Innovation category to ARM and Handshake Solutions for the ARM996HS, the first commercially available 32-bit microprocessor core implemented in asynchronous (clockless) logic. ARM introduced the ground-breaking processor in early 2006. ARM's development partner was Netherlands-based Handshake Solutions, which worked closely with ARM to bring the unconventional technology to market. [February 20, 2007]

Figure 1: Chart showing the much lower peak currents of the ARM996HS processor, which reduce electromagnetic emissions.

Faster Than a Blink

Parallel Processors and Bonded Sensors Enable Ultrafast Imaging

If a picture is worth a thousand words, what are 100,000 pictures per second worth? Plenty, to anyone who can design a digital-imaging system capable of achieving such spectacular frame rates. Applications include robotic vision, intelligent video surveillance, scientific analysis of momentary events, monitoring industrial processes, interactive games, and guidance systems for unmanned vehicles and missiles. With grants from the U.S. Missile Defense Agency and the Office of Naval Research, scientists from Hungary, Spain, and the U.S. founded Eutecus Inc. and developed Cellular Visual Technology. CVT combines a massively parallel processor architecture with optimized image-processing software. Some implementations use an innovative semiconductor fabrication process to bond the image sensor directly onto the parallel-processor array, creating a stacked multilayer chip. [February 12, 2007]

Figure 1: Photo of Eutecus C-TON chip.
Figure 2: Using an innovative manufacturing technique called 3D bump-bonding, Eutecus grafts an image-sensor die onto another die containing the massively parallel processor array.
Figure 3: C-TON block diagram.
Figure 4: At the abstract level, a Eutecus sensor-processor chip resembles a multilayer cake, with arrays of different components in each layer.
Figure 5: The array processors can adjust the intensity of individual pixels, improving the photographic quality of the image.
Figure 6: C-TON layout photo.
Figure 7: Illustration of the saccadic jumps of human vision.
Figure 8: Eutecus CVT technology allows developers to mimic the saccadic jumps of human vision by applying the processor array to different parts of a high-resolution image.
Table 1: Performance measurements of basic image-processing tasks on a 100MHz C-TON processor.
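A frame rate of 100,000 per second leaves 10 microseconds per frame. The Python sketch below converts that into a cycle budget at the 100MHz clock rate cited in Table 1 and shows why a single sequential processor could not keep up. The 128-by-128 image size is a hypothetical example; the archive gives neither C-TON's array dimensions nor a claim that the 100MHz C-TON itself reaches 100,000 frames per second.

```python
def cycles_per_frame(clock_hz: float, frames_per_s: float) -> float:
    """Clock cycles available to process one frame."""
    return clock_hz / frames_per_s


def serial_cycles_per_pixel(clock_hz: float, frames_per_s: float, width: int, height: int) -> float:
    """Budget per pixel if one sequential processor had to touch every pixel."""
    return cycles_per_frame(clock_hz, frames_per_s) / (width * height)


budget = cycles_per_frame(100e6, 100_000)                      # 10 microseconds per frame
per_pixel = serial_cycles_per_pixel(100e6, 100_000, 128, 128)  # hypothetical 128-by-128 image
```

The budget comes to 1,000 cycles per frame. Spread over 16,384 pixels, a single sequential processor would get about 0.06 cycle per pixel, far too little for any real operation, whereas an array with one processing element per pixel would keep the full budget for each pixel.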

Editorial: Undo Electronic Voting

Electronic voting machines are a classic example of botching a high-tech solution to a low-tech problem, thereby creating a new high-tech problem. It might be amusing if anything less than our democracy were at stake. U.S. election authorities are rushing into electronic voting without due diligence, without carefully considering the consequences, and without sufficient input from technical experts. Indeed, the situation is so appalling that I suspect almost any reader of Microprocessor Report could design better hardware and software than we have now. We don't really need electronic voting machines, but if we're forced to use them, let's at least do it right. [December 26, 2006]

The Intel 4004's 35th Anniversary

Engineers Celebrate the World's First Commercial Microprocessor

On November 15, 1971, Intel introduced the world's first standard-part microprocessor, the 4004. It was a four-bit CPU with 2,250 transistors, and it ran at a clock speed of 740kHz. Intel manufactured the chip in a 10-micron PMOS process on two-inch silicon wafers and furnished the device in a 16-pin ceramic dual-in-line package. To celebrate this historic chip, Microprocessor Report covered the anniversary event at the Computer History Museum in Silicon Valley, which reunited codesigners Ted Hoff and Federico Faggin. Our coverage includes transcripts of their presentations, their responses during an audience question-and-answer session, our own technical analysis of the 4004, a block diagram of the processor, our newly reconstructed instruction-set table, and an analysis of how the 4004 transformed the computer industry. [December 18, 2006]

Figure 1: Photo of Ted Hoff, coinventor of the Intel 4004, speaking at the 35th anniversary event at the Computer History Museum in Silicon Valley.
Figure 2: Intel's first advertisement for the 4004 in late 1971.
Figure 3: Photo of an original 4004 on a SIM-402 single-board development system that Intel sold to customers in the 1970s.
Figure 4: Photo of Federico Faggin describing the 4004's layout during the 35th anniversary event at the Computer History Museum.
Figure 5: Die photo of the 4004 with Federico Faggin's etched initials visible in the lower-right corner.
Figure 6: Photo of the desktop calculator that Busicom built with the 4004 and other 4000-family chips.
Figure 7: Intel's most widely reproduced photograph of the 4004 makes it appear the package has a cap made of wood, but it's actually a ceramic-and-gold package.
Sidebar: Analyzing the Intel 4004
- Figure: Microprocessor Report redrew this 4004 block diagram from vintage documentation, making minor modifications.
- Table: Microprocessor Report reconstructed this 4004 instruction-set table by studying vintage documentation.
Sidebar: How Microprocessors Upset the Computer Industry (by Don Alpert, Microprocessor Report Editorial Board)
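Two details of the 4004 can be made concrete with a little arithmetic. Each 4004 instruction cycle spans eight clock periods, so the 740kHz clock allows at most about 92,500 single-cycle instructions per second (two-word instructions take two cycles). And a four-bit CPU suits a calculator like Busicom's because one nibble holds one decimal digit: after a binary add, sums above 9 are corrected by adding 6, the kind of adjustment the 4004's decimal-adjust (DAA) instruction supports. The Python sketch below models both ideas. It is a behavioral illustration, not 4004 code, and the function names are hypothetical.

```python
CLOCKS_PER_INSTRUCTION_CYCLE = 8  # a 4004 instruction cycle spans eight clock periods


def max_instruction_cycles_per_s(clock_hz: float) -> float:
    """Upper bound on single-cycle instructions per second."""
    return clock_hz / CLOCKS_PER_INSTRUCTION_CYCLE


def to_digits(n: int, width: int) -> list:
    """Decimal digits, least significant first, one per 4-bit nibble."""
    return [(n // 10 ** i) % 10 for i in range(width)]


def bcd_add(a_digits: list, b_digits: list) -> list:
    """Add equal-length BCD numbers nibble by nibble, as a 4-bit CPU would."""
    result, carry = [], 0
    for a, b in zip(a_digits, b_digits):
        s = a + b + carry       # plain binary sum of two digits plus carry: 0..19
        if s > 9:               # decimal adjust: skip the six unused codes 10..15
            s += 6
        carry = s >> 4          # carry into the next decimal digit
        result.append(s & 0xF)  # low nibble is the corrected digit
    if carry:
        result.append(carry)
    return result
```

For example, adding 1 to 1999 digit by digit ripples three decimal carries and yields the digits of 2000, one nibble per loop pass, which is why multi-digit calculator arithmetic on a four-bit CPU takes many instruction cycles.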

Tensilica Upgrades Xtensa Cores

New Xtensa 7 and Xtensa LX2 Processors Get ECC and More

Fending off ARM's latest punches, Tensilica is introducing two new versions of its 32-bit configurable-processor cores. The biggest improvements are error-correction codes (ECC) to protect caches and local memories, an optional memory-management unit (MMU) for both processors, and several new configuration options that can boost performance, save gates, and reduce power. The enhanced processors are the Xtensa 7 and Xtensa LX2. [December 4, 2006]

Figure 1: Xtensa LX2 block diagram. Xtensa 7 has a similar microarchitecture.
Figure 2: The Xtensa LX2 offers more I/O options than Xtensa 7 does.
Table 1: Feature comparison of the new Tensilica Xtensa 7, existing Xtensa 6, new Xtensa LX2, and existing Xtensa LX processor cores.
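Error-correction codes work by storing extra check bits computed from the data, so that a single flipped bit produces a nonzero "syndrome" that identifies its position. The Python sketch below implements the textbook Hamming(7,4) code, which protects four data bits with three check bits. It only illustrates the principle. The archive doesn't say which code Tensilica uses, and cache and memory ECC commonly uses wider SEC-DED codes that also detect double-bit errors.

```python
def hamming74_encode(nibble: int) -> list:
    """Return codeword bits for positions 1..7 (list index 0 = position 1)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # data bits d0..d3
    code = [0] * 8                             # indices 1..7 used
    code[3], code[5], code[6], code[7] = d[0], d[1], d[2], d[3]
    for p in (1, 2, 4):
        # Check bit p covers every position whose index has bit p set.
        code[p] = sum(code[i] for i in range(1, 8) if i & p and i != p) % 2
    return code[1:]


def hamming74_decode(bits: list) -> tuple:
    """Correct up to one flipped bit; return (nibble, error position or 0)."""
    code = [0] + list(bits)
    syndrome = 0
    for p in (1, 2, 4):
        if sum(code[i] for i in range(1, 8) if i & p) % 2:
            syndrome |= p                      # this check failed
    if syndrome:
        code[syndrome] ^= 1                    # flip the bad bit back
    nibble = code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3
    return nibble, syndrome
```

Flipping any one of the seven stored bits and decoding recovers the original nibble, with the syndrome naming the bit that was corrected. The cost is the extra storage and the check logic on every access, which is why ECC is typically a configuration option rather than a default.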

Power.org's United Roadmap

Power Architecture Consortium Hints at Future Processors and Cores

Until now, forecasting the future of the Power Architecture (formerly PowerPC) required assembling a mosaic of individual roadmaps from different companies — some of which didn't even disclose roadmaps. That situation changed a few weeks ago, when the Power.org consortium released its first microprocessor roadmap consolidating the future plans of member companies. [November 27, 2006]

Figure 1: These three roadmaps plot the future of 64-bit Power Architecture microprocessors, processor cores, and hybrid architectures.
Figure 2: Power.org's roadmap for 32-bit chips.
Figure 3: The Power.org roadmap for 32-bit processor cores has few surprises but does indicate that IBM intends to broaden its new Power 46x line.
Figure 4: The Power.org roadmap for 32-bit hybrid/accelerated architectures.
Sidebar: IBM's New Licensable Power Cores
- Table 1: Feature comparison of the IBM Power 460S, Power 464-H90, Power 464FP-H90, Power 440, and Power 405 licensable processor cores.

Xilinx Revs Up MicroBlaze

Licensable Soft-Processor Core for FPGAs Gets Faster and Smaller

Small improvements add up. At last month's Fall Microprocessor Forum, Xilinx unveiled an enhanced version of its licensable 32-bit processor core for FPGAs. Optimized for synthesis in next-generation Virtex-5 programmable-logic devices, the new MicroBlaze v5.00 processor uses deeper pipelining and higher clock speeds to boost integer performance by as much as 25% and floating-point performance by as much as 50% over the existing MicroBlaze v4.00 core. [November 13, 2006]

Figure 1: Comparison of the MicroBlaze v5.00 five-stage pipeline with the MicroBlaze v4.00 (and earlier) three-stage pipeline.
Figure 2: An example screen shot of the Xilinx Embedded Development Kit processor-configuration tool.
Table 1: Comparison of synthesizing a typical MicroBlaze v5.00 configuration in Virtex-5 and Virtex-4 FPGAs.
Table 2: Feature comparison of the Xilinx MicroBlaze v5.00, MicroBlaze v4.00, Altera Nios II/f, Nios II/s, and Nios II/e.

ARM Thumbs a Ride

New Cortex-R4F Processor Adds FPU and ECC for Automotive Market

On average, there are 1.3 ARM processor cores per cellphone. And these days, it seems as if half the motorists on the road are yapping on their cellphones while driving. So, in a way, ARM already has a strong presence in the automotive market — though not exactly in the way the company desires. ARM wants to see more of its processors built into automobiles, not merely used in automobiles. Today, ARM's automotive design wins are based on older cores, such as the ARM7TDMI and ARM966. Newer designs need more processing power. So ARM has announced the Cortex-R4F specifically for the automotive market. [October 30, 2006]

Figure 1: Six-year forecast of semiconductors in automotive systems.
Figure 2: For efficiency, the Cortex-R4F integrates the FPU pipeline with the existing eight-stage integer pipeline.
Figure 3: Partial superscalar pipelining allows the Cortex-R4F to dual-issue some pairs of integer and floating-point instructions.
Figure 4: At synthesis time, developers can choose the granularity of error detection and correction in the Cortex-R4F.
Table 1: This table shows the number of clock cycles required to perform some common single-precision floating-point operations on the new ARM Cortex-R4F, ARM's older VFP11 FPU, Freescale's Power e200 processor core, and Infineon's TriCore 1.3 processor core.
Table 2: Small differences in configurations and synthesis scripts can have a great effect on the size and speed of the ARM Cortex-R4F, even in the same fabrication process.
Table 3: Feature comparison of the ARM Cortex-R4F, ARM Cortex-R4, ARC 625D, ARC 750D, MIPS Technologies MIPS32 24Kf, and Tensilica Xtensa 6.

Editorial: Microprocessor Confusion

Relatively few people in the world know much about microprocessors: what they are, what they do, how they work. This ignorance may seem harmless. Merely learning how to use an electronic device is challenging enough. Why should ordinary folks get bogged down in low-level technical details that couldn't possibly matter to them? Unfortunately, as microprocessors become ubiquitous, knowing something about them is becoming not only desirable but necessary. Those who are familiar with microprocessors, including everyone who writes for this newsletter and everyone who reads it, should help educate the general public about an important technology that can seem as mysterious as string theory. [October 30, 2006]

Intel Goes Quad

Quad-Core Processors and 65nm Volume Shipments Beat AMD

Intel isn't out of the dark yet, but there's light at the end of the tunnel. And no, that glow isn't the laser beam of Intel's recent experiments with silicon photonics, which is a long-term beacon. Intel needs immediate results. Wisely, the company is returning to its traditional strengths: x86 processors manufactured with the world's best high-volume fabrication technology. It's a combination that competitors have found unbeatable. On September 26, Intel announced that quad-core server and desktop processors will begin shipping in November. Both product lines are months ahead of previously disclosed schedules. [October 16, 2006]

Figure 1: Photo showing how Intel's first quad-core x86 processors package two dual-core dies in a multichip module (MCM), which Intel calls a multichip package (MCP).

Ambric's New Parallel Processor

Globally Asynchronous Architecture Eases Parallel Programming

At Fall Microprocessor Forum in San Jose, California, Ambric introduced the Am2045 massively parallel processor and architecture. This 117-million-transistor chip, fabricated in a modest 0.13-micron CMOS process, crams 360 proprietary 32-bit RISC processors and 585KB of SRAM onto a single compact die. Maximum theoretical performance exceeds one trillion operations per second (TOPS) at 333MHz. The Am2045 is designed to replace high-end embedded processors, DSPs, and FPGAs in applications that require fast general-purpose integer and digital-signal processing. [October 10, 2006]

Figure 1: This conceptual diagram shows how software objects (essentially, subroutines or groups of related subroutines) run on multiple processor cores in Ambric's massively parallel array.
Figure 2: Local channels that connect neighboring processor cores also synchronize the cores in Ambric's massively parallel architecture.
Figure 3: Ambric's proprietary software-development tools are based on the Eclipse integrated development environment (IDE).
Figure 4: A block of aStruct code, Ambric's textual source code for creating parallel structures of objects.
Figure 5: This Java source code defines a class named PrimeGen. Objects instantiated from this class can test candidate integers to determine which are true prime numbers.
Figure 6: Streaming RISC (SR) processor block diagram.
Figure 7: Streaming RISC with DSP (SRD) processor block diagram.
Figure 8: This block diagram shows a basic cluster of four processor cores, four local memories, and their associated interconnects and control structures.
Figure 9: This block diagram shows one complete bric in the center, surrounded by parts of eight adjacent brics.
Table 1: Benchmark results comparing the Am2045 with a high-end Texas Instruments DSP and a Xilinx Virtex-4 FPGA.
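Ambric's model of objects connected by self-synchronizing channels resembles classic communicating-process pipelines. The Python sketch below imitates that structure with threads and one-slot queues: a sender blocks until its neighbor has taken the previous word, so the channel paces both ends. The prime-number sieve echoes the PrimeGen example in Figure 5 only in spirit. All names are hypothetical, the sketch uses no Ambric tools or aStruct syntax, and unlike Ambric's statically configured structure it grows its chain of filter objects at run time for brevity.

```python
import queue
import threading

DONE = object()  # end-of-stream marker passed down every channel


def channel() -> queue.Queue:
    # maxsize=1: a sender blocks until the receiver has taken the previous word,
    # so the channel both carries data and paces the two objects it connects.
    return queue.Queue(maxsize=1)


def generate(limit: int, out: queue.Queue) -> None:
    """Source object: stream the candidates 2 .. limit-1."""
    for candidate in range(2, limit):
        out.put(candidate)
    out.put(DONE)


def sieve_stage(inp: queue.Queue, results: queue.Queue) -> None:
    """Filter object: its first input is prime; forward only values it does not divide."""
    prime = inp.get()
    if prime is DONE:
        results.put(DONE)  # end of the chain: signal the collector
        return
    results.put(prime)
    out = channel()
    threading.Thread(target=sieve_stage, args=(out, results)).start()  # next object
    while (value := inp.get()) is not DONE:
        if value % prime:
            out.put(value)
    out.put(DONE)


def primes_below(limit: int) -> list:
    first, results = channel(), queue.Queue()
    threading.Thread(target=generate, args=(limit, first)).start()
    threading.Thread(target=sieve_stage, args=(first, results)).start()
    found = []
    while (p := results.get()) is not DONE:
        found.append(p)
    return found
```

Calling `primes_below(30)` collects the ten primes below 30 in increasing order. Python threads here demonstrate the synchronization pattern, not a parallel speedup; no stage needs a global clock or lock, because each one waits only on its own input and output channels.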

Number Crunching With GPUs

PeakStream's Math API Exploits Parallelism in Graphics Processors

There are dozens, if not hundreds, of microprocessor architectures in the world. And Microprocessor Report covers new ones every year. With such abundance, it might seem daffy to use highly specialized 3D-graphics coprocessors for general-purpose number crunching. But the computational allure of GPUs is proving irresistible to the scientific community, chemical engineers, defense contractors, Wall Street financiers, and other heavy-duty math junkies. PeakStream, a Silicon Valley startup, has introduced new software and development tools that make GPUs relatively easy to program for data-intensive applications. [October 2, 2006]

Figure 1: The PeakStream Platform includes a special virtual machine and a just-in-time (JIT) compiler. Programmers write application code in C++ and compile to a standard x86 binary, which embeds the function calls to PeakStream's math libraries.
Figure 2: Two examples of C++ source code. Example A is typical sequential code without using any special function libraries. Example B uses PeakStream's math library.
Table 1: Complete list of function calls in PeakStream's application programming interface (API).

Editorial: Intel's Comeuppance

To be fair, nobody should gloat over Intel's recent troubles. At one time or another, we've all been there, right? But let's be realistic. Many folks throughout the industry are not-so-secretly enjoying Intel's upheavals. They aren't trying very hard to hide their smirks and water-cooler jokes. It's the season of Intel's comeuppance, and it's been coming for a long, long time. But watch out — Intel has largely corrected its course and is now introducing some impressive new microprocessors. If anything, I expect Intel will be an even tougher competitor in the years to come. [September 25, 2006]

Preview: Fall Microprocessor Forum

"Advances in Power Efficiency" Is Theme of 18th Annual Fall Conference

Our theme at In-Stat's Fall Microprocessor Forum is "Advances in Power Efficiency — Addressing the Global Challenge." All developers face the same problems, whether their design uses a tiny automotive microcontroller or a mighty supercomputer processor. Surprisingly, the solutions are largely the same, too, across the design spectrum. MPF will be held on October 9-11 at the Doubletree Hotel in San Jose, California. It will be our 18th annual fall conference, and it also marks In-Stat's 25th anniversary as a leading industry-analyst firm. To celebrate, we are reviving the famous MPF chip portfolio (every paid conference attendee gets a notebook with real microprocessor chips embedded in the cover), and we have arranged a stellar lineup of presenters. [August 28, 2006]

Photo 1: A view of Spring Processor Forum last May.
Photo 2: In-Stat will provide power outlets for notebook computers and a wireless network with access to forum presentations. Presentations will also be available on USB flash drives.
Photo 3: The traditional Tuesday night expo and party is an opportunity to mingle with exhibitors and fellow attendees. The food and drinks are pretty good, too.
Sidebar: Microprocessor Forums in Japan and Europe

The New Power Architecture

Freescale and IBM Work Together and Begin Revamping PowerPC

After years of following different paths, the two key founders of the PowerPC architecture have renewed their historic collaboration. Working closely together again — now within the Power.org industry consortium — Freescale (the former semiconductor division of Motorola) and IBM are uniting their visions for the 15-year-old microprocessor architecture. Power.org has announced a new architectural definition that brings together features from both Freescale and IBM and lays the groundwork for future convergence. For the first time, all the documentation will be consolidated in a common format. And hereafter, the common architecture will be called the Power Architecture. "PowerPC" is relegated to existing products and historical references. [August 21, 2006]

Figure 1: This new logo symbolizes the unified Power Architecture.
Figure 2: This timeline shows the evolution of the PowerPC/Power Architecture since 1991.
Figure 3: Power ISA 2.03 merges features from the Freescale and IBM modifications to the original PowerPC architecture, now known as PowerPC Classic.
Figure 4: Power ISA 2.03 defines three privilege levels for software execution, but one level is optional.
Figure 5: Power ISA 2.03 register files.
Table 1: The PowerPC architecture is defined in a collection of books dating back to the original definition in 1991. Each book describes a different aspect of the architecture and has been revised over the past 15 years.
Table 2: The PowerPC extension packages introduced by Motorola and Freescale are called auxiliary processing units (APU). Power ISA 2.03 makes the APUs official by renaming them "categories" and merging them into the new definition of the Power Architecture.
Table 3: Comparison of memory-management features in PowerPC Classic 1.10 and Power ISA 2.03.

Editorial: Intel's Embedded Future

Only two weeks after AMD announced the sale of its Alchemy business unit to Raza Microelectronics (RMI), Intel announced that it's selling most of its XScale business unit to Marvell Technology Group. Both PC-processor giants are divesting embedded-processor businesses in the same month. What's going on? The obvious explanation is that AMD and Intel are refocusing on their core business — x86 processors for PCs. Certainly, both companies need to pay more attention to their foundations. But what makes sense for AMD doesn't necessarily make the same sense for Intel. [July 31, 2006]

MathStar Challenges FPGAs

New Reconfigurable-Logic Chips Have Massively Parallel Arrays

MathStar calls its device architecture a field-programmable object array (FPOA). It consists of SRAM-based programmable logic, much like a conventional FPGA, but it's programmable at a higher level of abstraction. Instead of tinkering with gate arrays, designers work with a massively parallel array of preconfigured function units. Most of these units are identical ALUs or multiply-accumulate (MAC) units that can run autonomously. Others are register files shared by the ALUs and MACs. The first FPOA device has 400 of these 16-bit units woven together in a tightly coupled interconnect fabric. [July 24, 2006]

Figure 1: Initially, MathStar has created three types of Silicon Objects: 16-bit ALUs, 16-bit multiply-accumulate (MAC) units, and 64-entry register files.
Figure 2: MathStar MOA1400D block diagram.
Figure 3: Diagrams of MathStar's on-chip interconnect fabric.
Figure 4: MathStar's development flow for FPOAs differs in important respects from development for an FPGA.
Figure 5: Block diagram of a multistream MPEG-2 decoder mapped onto the array of MathStar's 1.0GHz MOA1400D processor.

China's Microprocessor Dilemma

China Needs Affordable Computers, but Which CPU Architecture?

During a recent visit to China, Microprocessor Report learned that the country's leaders face a difficult technology decision: Which microprocessor architecture should they support in a coming wave of low-cost personal computers designed for the Chinese domestic market? The most obvious answer (the x86 architecture, already the world standard and the only platform running Microsoft Windows) isn't necessarily the best answer for China. This decision could significantly affect the direction of China's future economic growth. It's tied to seemingly unrelated issues, such as China's ambition for technology independence, a widening gap between rich and poor that threatens social stability, and mounting problems with urban sprawl and environmental pollution. [June 26, 2006]

Photo 1: Worsening pollution in cities like Shanghai is making the Chinese question whether an economy based on heavy industry can support the kind of progress the country needs to make.
Photo 2: In the former rural district of Pudong, across the river from central Shanghai, China has constructed a clone of Silicon Valley — complete with office parks, tree-lined boulevards, freeways, and exhibition halls.
Photo 3: Weiwu Hu in his lab at the Institute for Computing Technology, part of the Chinese Academy of Sciences in Beijing.
Photo 4: This is one prototype design of the $150 computer designed by the nonprofit One Laptop Per Child organization.
Photo 5: This photo shows the Municator computer directly in front of a large video monitor, which displays the simplified GUI for launching application programs.

Editorial: Alchemy's Third Chance

AMD is selling its Alchemy business unit to Raza Microelectronics (RMI), and we think it makes good sense for both companies — but only if the transfer includes a significant number of the original engineers. Without those alchemists, RMI will struggle to turn lead into gold. [June 26, 2006]

Covering China in Microprocessor Report
IBM's Blog for Game Developers

LSI Logic Wants Your SoC

Zevio SoC-Design Platform Has New IP for Consumer Electronics

LSI Logic has introduced a new SoC-design platform called Zevio. It consists of hardware IP, software IP, and professional design services for consumer-electronics application processors. Zevio also has emulators and prototyping systems that allow customers to write software in parallel with hardware development. Zevio is compatible with several 32-bit processor cores from ARM and MIPS Technologies, as well as the ZSP family of 16-bit DSP cores. Customers can take the finished chip design to any independent foundry or use one of LSI Logic's affiliated foundries. [June 12, 2006]

Figure 1: Chart showing that SoC-development costs are rising fast as fabrication technology moves to geometries smaller than 0.18 micron.
Figure 2: LSI Logic created a new SDRAM controller for the Zevio platform that runs twice as fast as the AMBA 2.0 AHB and fetches data in shorter memory bursts, reading only as much data as the AHB master needs.
Figure 3: The Zevio SDRAM controller can write data to nonconsecutive memory addresses without rearbitrating the AHB.
Figure 4: Block diagram of the geometry engine in the AHB-compatible graphics core for Zevio.
Figure 5: Block diagram of LSI Logic's 3D rendering engine for Zevio.
Figure 6: Block diagram of LSI Logic's 64-channel audio engine for Zevio.

More Patents for Tensilica

Portfolio Now Includes Ten Patents Related to Configurable Processors

The U.S. Patent and Trademark Office recently issued three new patents to Tensilica for its configurable-processor technology. They follow seven related patents issued from 2002 to 2005. In addition, the patent office has reaffirmed a key Tensilica patent issued in 2002 that was anonymously challenged a year later. As a result, Tensilica now holds an impressive portfolio of at least ten patents on configurable-processor technology. [May 30, 2006]

Table 1: List of Tensilica's ten patents explicitly related to configurable-processor technology, including the patent numbers, titles, file dates, and issue dates.

Editorial: Spring Processor Forum...and Help Wanted

If you attended our recent Spring Processor Forum in San Jose, thank you! I hope you're one of the attendees who won our drawing for an Apple iPod after submitting your feedback form. (You did submit a feedback form, right?) If you didn't attend SPF, we hope you'll tell us why and consider attending our Fall Microprocessor Forum in October. [May 30, 2006]

Microprocessor Report is looking for a new editor in chief.

ARM Reveals Cortex-R4

Deeply Embedded Processor Core Inches Toward Configurability

At Spring Processor Forum in San Jose, ARM revealed the first member of its Cortex-R family — the Cortex-R4, a synthesizable 32-bit processor core for deeply embedded applications. With this debut, ARM has now introduced initial members of all three of the new Cortex families announced in 2004. The Cortex-R4 duplicates the relatively high performance and relatively low power consumption of the existing ARM9, ARM10, and ARM11 families while incorporating the latest features of the ARMv7 architecture. [May 16, 2006]

Figure 1: Growth projections for the Cortex-R4's target markets: automotive systems, hard-disk controllers, printers, home network gateways, and wireless modems.
Figure 2: Cortex-R4 block diagram.
Figure 3: Cortex-R4 pipeline diagram. At eight stages, the pipeline is one stage shorter than the ARM1156T2-S pipeline.
Figure 4: Cortex-R4 synthesized layout. Excluding memories, the core size ranges from 180,000 to 220,000 gates.
Figure 5: Looser timing for data transfers allows the Cortex-R4 to use slower, lower-power SRAM arrays for caches and tightly coupled memories.
Table 1: The Cortex-R4 takes a small step toward user configurability by offering these options in prewritten scripts for the synthesis compiler.
Table 2: ARM suggests configuring the Cortex-R4 in these ways for hard-disk controllers, chassis-level automotive systems, wireless modems, and imaging systems.
Table 3: Using the configuration options in Table 1, ARM synthesized three different versions of the Cortex-R4 with Artisan cell libraries, targeting TSMC's generic 130nm and 90nm fabrication processes.
Table 4: This comparison of two memory libraries from Artisan demonstrates why looser timing on the Cortex-R4's cache and TCM interfaces is a boon for developers.
Table 5: Feature comparison of the ARM Cortex-R4, ARM1156T2-S, ARM946E-S, ARC International ARC 625D, ARC 750D, MIPS Technologies MIPS32 4KE, Tensilica Diamond 212GP, and Tensilica Diamond 570T.

IBM Offers Chip-Level Security

SecureBlue Technology Aims to Make Security Ubiquitous in SoCs

In the digital age, embarrassing security breaches are becoming commonplace. A laptop computer with information about nearly 200,000 current and former Hewlett-Packard employees was stolen from Fidelity Investments. Flash-memory drives containing secret military intelligence were pilfered from a U.S. Army base in Afghanistan and openly sold in street bazaars. And, worst of all, Paris Hilton's cellphone address book was leaked on the Internet. IBM's Technology Collaboration Solutions Unit has an answer: SecureBlue, a new security technology for system-on-chip (SoC) devices. [May 8, 2006]

Figure 1: IBM's new SecureBlue technology adapts the security features of this IBM 4758 PCI card into licensable IP for SoCs.
Figure 2: SecureBlue block diagram.

Editorial: Microprocessor Forum China

On March 23, In-Stat and Microprocessor Report hosted our first-ever Microprocessor Forum in mainland China. It was a condensed one-day version of the three- or four-day events we've been hosting in Silicon Valley for more than 15 years. To help with logistics, we partnered with IDG China, an offspring of International Data Group, one of the first U.S. companies to establish a publishing business on the mainland. IDG's people worked closely with our Chinese analysts at In-Stat China, based in Beijing. As part of our China experiment, I traveled to Shanghai and Beijing to participate in our forum and meet with Chinese engineers and executives. [April 24, 2006]

Power Efficiency at SPF 2006

Preview: Spring Processor Forum's Theme Is Power-Efficient Design

Power consumption is the immovable object that is coercing irresistible forces like Intel, Apple, and IBM to find strategic detours. Searing wattage compelled Intel to abandon its pursuit of high clock frequencies and instead design PC processors with power-efficient cores. The same power-performance trends exerted so much gravity on Steve Jobs's reality-distortion field that Apple has abandoned PowerPC in favor of Intel's newly improved processors. And the very same immovable object persuaded IBM, Sony, and Toshiba to design the Cell Broadband Engine with a relatively simple PowerPC core surrounded by an array of power-efficient coprocessors. If the industry's heavyweights can't displace the immovable object of power consumption, but can only steer around it, what hope is there for the average line engineer designing an SoC? That's why our theme for Spring Processor Forum 2006 is power-efficient design. [April 24, 2006]

Teja's FPGA Play

New Tools Build Packet Processors Using ANSI C and FPGAs

If off-the-shelf network processors don't fit the bill, but designing a custom part is too costly or intimidating, Teja Technologies has a fresh alternative: Teja FP (FPGA Platform). It's a package of development tools, software, and hardware intellectual property (IP) that allows software engineers to build a packet processor in an FPGA without using a hardware description language (HDL) or fabricating custom silicon. With Teja FP, programmers can start with existing data-plane code written in ANSI C or write new code in that language. After profiling and analyzing the code, the next step is to partition the application. The most compute-intensive parts can execute in the FPGA's programmable-logic fabric, while other parts can run on soft processor cores synthesized in the fabric. [April 3, 2006]

Table 1: Feature comparison of the Xilinx Virtex-4 FX programmable-logic devices with which Teja FP is compatible.
Figure 1: A low-cost packet processor designed with Teja FP might use only two or three Xilinx MicroBlaze soft processor cores. Each MicroBlaze core is an engine in the packet pipeline.
Figure 2: A high-performance packet processor designed with Teja FP requires several MicroBlaze processor cores. This example has nine cores.
Figure 3: Hardware and software development with Teja FP is highly interactive. Using feedback from code profilers and actual execution in the target FPGA, developers can rapidly refine their design.

Tensilica's Preconfigured Cores

Six Embedded-Processor Cores Challenge ARM, ARC, MIPS, and DSPs

Tensilica has introduced six preconfigured versions of its 32-bit processor cores to suit an unusually broad range of embedded applications. Whereas the smallest configuration is suitable for deeply embedded microcontrollers in real-time systems, the largest configuration sets a new record for DSP benchmarks. [March 20, 2006]

Table 1: Feature comparison of Tensilica's Diamond Series cores: the 108Mini, 212GP, 232L, 570T, 330HiFi, and 545CK.
Table 2: EEMBC benchmark scores for the ARM1026EJ-S, ARM1136JF-S, and Tensilica Diamond 570T.
Figure 1: Berkeley Design Technology DSP benchmark scores (BDTIsimMark2000) for the Tensilica Diamond 545CK, Ceva-X 1620, StarCore SC1400, and ARM1136.

Freescale Strengthens Power.org

Reunited Alliance With IBM Plans the Future of the Power Architecture

Freescale Semiconductor's long-awaited decision to join Power.org strengthens the industry alliance and will help chart the course of the Power Architecture — just in time. Recent moves by ARM, Intel, MIPS Technologies, and Sun Microsystems are strengthening the competition, too. Power.org is an open industry consortium with more than 40 corporate members whose mission is to coordinate the future evolution of the Power Architecture, more commonly known as PowerPC. In 2004, when IBM formed Power.org, the most conspicuous absentee among the 15 founding members was Motorola spinoff Freescale. [March 6, 2006]

MIPS Threads the Needle

MIPS32 34K: The First Licensable Multithreaded Processor Core

Microprocessor architects have explored many paths to high performance, including high clock frequencies, superscalar pipelines, application-specific extensions, very long instruction words (VLIW), and multicore processors. All those techniques and more are available in embedded-processor cores licensed as synthesizable intellectual property. Now MIPS Technologies is adding another option: the first licensable processor cores with hardware-enabled simultaneous multithreading. The new MIPS32 34K family consists of four 32-bit processor cores, all related to the MIPS32 24KE family. The key difference is pipelined multithreading. Instructions from as many as five different tasks can pass through the nine-stage pipeline of a 34K processor at the same time. [February 27, 2006]

Figure 1: A graphical representation of simultaneous multithreading (SMT).
Figure 2: By duplicating register files and other resources, a MIPS32 34K processor can run two operating systems and up to five thread contexts at the same time.
Figure 3: MIPS32 34K pipeline diagram.
Figure 4: Benchmarks indicate that the MIPS32 34K processor is 60% faster than a MIPS32 24KE processor when running packet-processing tests.
Figure 5: This chart shows the number of gates required for two similarly configured 34K and 24KE processor cores — the same configurations MIPS used to obtain the benchmark results in Figure 4.
Table 1: Feature comparison of the MIPS32 34Kc, 34Kf, 34Kc Pro, and 34Kf Pro.
Table 2: MIPS32 34K processors add eight new instructions to the MIPS32 Release 2 instruction-set architecture.
Table 3: Feature comparison of the MIPS32 34K, MIPS32 24KE, ARC 700, ARM Cortex-A8, Tensilica Xtensa 6, and Tensilica Xtensa LX processor cores.
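The sketch below shows how instructions from five tasks can occupy a nine-stage pipeline at the same time. It is a toy model, not MIPS's design. It assumes that one instruction enters the pipeline per cycle, that thread contexts take turns in strict round-robin order, and that nothing stalls. The function name and parameters are hypothetical.

```python
from collections import deque
from typing import List, Optional, Tuple

# (thread ID, per-thread instruction number) occupying one pipeline stage
Slot = Optional[Tuple[int, int]]


def simulate_interleaved_pipeline(num_threads: int = 5, depth: int = 9,
                                  cycles: int = 9) -> List[Slot]:
    """Toy model of a multithreaded in-order pipeline (hypothetical).

    Assumes one instruction issues per cycle, thread contexts take turns in
    strict round-robin order, and no thread ever stalls.
    Returns the stage contents after `cycles` clocks, stage 1 first.
    """
    # Fixed-length pipeline; None marks an empty stage.
    pipeline = deque([None] * depth, maxlen=depth)
    next_instr = [0] * num_threads  # next instruction number for each thread
    for cycle in range(cycles):
        tid = cycle % num_threads  # round-robin thread selection (assumed)
        # The new instruction enters stage 1. Because maxlen is set, the oldest
        # entry falls off the far end, i.e. it retires from the last stage.
        pipeline.appendleft((tid, next_instr[tid]))
        next_instr[tid] += 1
    return list(pipeline)


if __name__ == "__main__":
    for stage, slot in enumerate(simulate_interleaved_pipeline(), start=1):
        print(f"stage {stage}: thread {slot[0]}, instruction {slot[1]}")
```

After nine cycles, all nine stages are full and instructions from all five thread contexts are in flight. Thread 0's first instruction sits in the last stage. A real scheduler would skip a thread that is stalled, for example on a cache miss. That is how multithreading reclaims cycles a single-threaded pipeline would lose.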

Can ARM Beat the Clock?

ARM Ships the First Licensable, Clockless 32-Bit Microprocessor Core

ARM has finally delivered the ARM996HS, the first commercially available 32-bit microprocessor core implemented in asynchronous (clockless) logic. ARM's development partner is Netherlands-based Handshake Solutions, which helped bring the unconventional technology to fruition. If the ARM996HS succeeds, it could spark a revolution in power-efficient processing that researchers envisioned even before microprocessors were invented. But the project still has risks. Several previous attempts to introduce a clockless 32-bit microprocessor have failed, and the ARM996HS remains unproved in silicon. [February 21, 2006]

Figure 1: Power-consumption comparison of the ARM996HS and ARM968E-S.
Figure 2: ARM996HS block diagram.
Figure 3: The ARM996HS has a memory-protection unit that can segregate different regions of memory.
Figure 4: Peak-current comparison of the ARM996HS and ARM968E-S.
Table 1: Feature comparison of the ARM996HS, ARM968E-S, ARM966E-S, ARM946E-S, ARM926EJ-S, and ARM922T.

Cavium Expands Octeon Family

Single- and Dual-Core Chips Supplement High-End Network Processors

Cavium Networks is expanding its family of Octeon network/communications processors with chips that have one or two MIPS64 processor cores, instead of the as many as 16 cores found in higher-end members of the family. But the new parts aren't simply chopped-down layouts. Their features, performance, power consumption, and prices vary according to their target applications, and they introduce some entirely new Octeon features, such as USB 2.0 and voice-over-IP (VoIP) interfaces. In all, Cavium has announced 10 new Octeon chips scheduled for sampling in 1Q06 and 2Q06. [February 6, 2006]

Table 1: Differences among the Octeon CP, SCP, EXP, and NSP series.
Table 2: Feature comparison of the Cavium Octeon CN3005 CP, CN3005 SCP, CN3010 CP, CN3010 SCP, CN3110 CP, CN3110 SCP, CN3110 NSP, CN3120 CP, CN3120 SCP, and CN3120 NSP.
Figure 1: System block diagram of an Octeon-based 802.11n broadband wireless gateway with support for VoIP phones.

Cell Processor Isn't Just for Games

Innovative Chip Is Best High-Performance Embedded Processor of 2005

Deciding on our MPR Analysts' Choice Award for Best High-Performance Embedded Processor of 2005 wasn't easy. We evaluated several strong candidates before picking our winner: the Cell Broadband Engine, jointly designed by the STI alliance: Sony, Toshiba, and IBM Microelectronics. The Cell BE is destined for Sony's next-generation home videogame console, the PlayStation 3, scheduled for release later this year. [January 30, 2006]

Cortex-A8 Balances Power, Performance

ARM's Fastest Processor Wins Award for Best Processor-IP Core of 2005

Years from now, the industry may remember 2005 as the pivotal year when ARM began extending its reach from low power to high performance. In any event, we believe ARM's fastest processor to date — the Cortex-A8 — deserves our MPR Analysts' Choice Award for Best Processor-IP Core of 2005. The Cortex-A8 is ARM's first superscalar processor core, and it's the first ARM processor capable of attaining clock frequencies in the gigahertz range. It's the biggest departure in processor design for ARM since the company was founded in 1990. [January 30, 2006]

Embedded Processors Thrive in 2005

Radical Multicore Chips and Innovative Startup Companies Proliferate

This article contains our analysis of embedded-processor events in 2005 and speculation about what's to come in 2006 and beyond. We identify five broad trends in embedded processors. None of these trends actually started last year, but they gained momentum in 2005 and will be major forces in the future. For a concise summary of last year's developments, with links to related MPR articles, see the sidebar, "Embedded-Processor Highlights of 2005." [January 30, 2006]

Figure 1: Block diagram of the Chinese Godson-2 microprocessor.
Figure 2: Mercury Computer Systems introduced its Cell Technology Evaluation System in January 2006.
Figure 3: Block diagram of Actel's Fusion FPGAs.
Figure 4: Die photo of the triple-core processor that IBM designed for Microsoft's Xbox 360 videogame console.
Figure 5: Block diagram of ARM's Cortex-A8 superscalar processor core.
Figure 6: Die photo of the first eight-core XLR network processor from Raza Microelectronics.
Sidebar: Embedded-Processor Highlights of 2005

Massively Parallel Digital Video

Fabless-Semi Startup Connex Reveals New Processor Architecture

Three things in life seem certain: death, taxes, and new microprocessor architectures. Unlike the first two things, new architectures aren't necessarily bad, but they are becoming increasingly expensive. The latest new microprocessor architecture to emerge is unconventional, massively parallel, and optimized for the narrow domain of high-definition digital video. Although Connex Technology's architecture is applicable to other purposes, such as pattern-matching filters in security processing, digital video is the largest potential market offering an opportunity for a profitable return on investment. [January 9, 2006]

Figure 1: The Connex integral parallel architecture is based on a massively parallel array of processor cores known as processor elements (PEs). The first commercial chip has 1,024 PEs.
Figure 2: PEs are arranged in a two-dimensional array, much like other massively parallel architectures, but Connex simplified the on-chip interconnect fabric by severely limiting the connections among the PEs.
Figure 3: The Connex Machine can operate on 1,024 words of data simultaneously. These 16-bit words are arranged in a single-dimensional array or vector.
Figure 4: Connex created a proprietary version of ANSI C, known as Connex Programming Language (CPL), that adds new vector datatypes and commands.
Sidebar: The Key to Massive Parallelism: Think Small

The Oblique Perspective: Merry Virtual Christmas

Digital Music Is Great, But I Miss Album-Cover Art!

Digital music distribution allows performing artists to circumvent the obstacles of expensive recording studios, greedy record companies, and corporate chain stores. Anyone can make their music available directly to the public. And it's understandable why listeners want their music in a pure digital format that liberates bits from atoms. Eliminating the physical media and packaging strips the music down to its essence: music. However, record-album covers were more than mere packaging. Are we sacrificing something worthwhile by distributing music as digital-audio files without visual artwork? [December 27, 2005]

Photo 1: The Association's Birthday album, 1968.
Photo 2: The Beatles' Sgt. Pepper's Lonely Hearts Club Band, 1967.

Actel Releases First Fusion Chip

Highly Integrated Mixed-Signal FPGA With Flash Is an Instant SoC

Just because ASIC and system-on-chip (SoC) projects are becoming prohibitively expensive for many developers doesn't mean there's less demand for custom chips. Product differentiation and integration still matter. Hence, the rush toward alternatives to full-custom silicon, such as FPGAs, structured ASICs, and reconfigurable processors. Actel's latest alternative is the Fusion Programmable System Chip (PSC) — a new breed of FPGA that can replace SoCs with a single off-the-shelf do-it-all chip. Fusion FPGAs combine reprogrammable logic with analog and digital peripherals, analog and digital I/O, SRAM, flash memory, and optional soft processor cores (a license-free ARM7TDMI-S or 8051). [December 19, 2005]

Figure 1: Die photo of the Actel Fusion AFS600.
Figure 2: Screen photo of Actel's CoreConsole, a new soft-IP integration tool for Fusion FPGAs.
Figure 3: Screen photo of Actel's SmartGen, a new peripheral-configuration tool for Fusion FPGAs.
Table 1: Feature comparison of the first four Fusion chips: the AFS090, AFS250, AFS600, and AFS1500.

Philips TriMedia Goes Mobile

New TM3270 Is the First Low-Power TriMedia Processor Core

In early November, Philips announced the TM3270, the first low-power TriMedia core for mobile applications. Other TriMedia cores deliver high performance but consume too much power for the new wave of portable consumer-electronics products. The TM3270 uses multiple techniques to cut power consumption and has new instructions and other features targeting digital video. It supports all the latest audio/video software codecs and can fully decode D1-resolution H.264 video streams while typically consuming less than 100mW. [December 5, 2005]

Table 1: Feature comparison of the Philips TriMedia TM3270, TM3260, and TM5250 media-processor cores.
Table 2: TM3270 power consumption while running an MP3 audio decoder on a 384Kb/s bitstream (44.1kHz stereo).
Figure 1: A wide range of performance benchmarks comparing three different configurations of the TM3270 and the TM3260.
Figure 2: Among the approximately 40 new instructions in the TM3270 is a collapsed-load operation that loads five bytes from memory and performs a linear interpolation before saving the four-byte result in a 32-bit register.
Figure 3: Die photo and floor plan of the TM3270 core with a 64KB instruction cache and 128KB data cache.
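The collapsed-load operation described in Figure 2's caption can be modeled in a few lines. This is a hypothetical sketch. The caption doesn't say how the interpolation fraction is supplied, what its precision is, how results are rounded, or what byte order the packed result uses. The 4-bit fraction, round-to-nearest, and little-endian packing below are all assumptions, and the function name is invented for illustration.

```python
def collapsed_load_interp(mem: bytes, addr: int, frac: int,
                          frac_bits: int = 4) -> int:
    """Hypothetical model of a collapsed load with linear interpolation.

    Loads five bytes starting at `addr`, interpolates each adjacent pair
    using weight frac / 2**frac_bits (assumed encoding), and packs the four
    8-bit results into one 32-bit value (little-endian order assumed).
    """
    scale = 1 << frac_bits
    if not 0 <= frac < scale:
        raise ValueError("frac must be in [0, 2**frac_bits)")
    b = mem[addr:addr + 5]
    if len(b) != 5:
        raise IndexError("need five bytes starting at addr")
    result = 0
    for i in range(4):
        # Weighted average of neighbors b[i] and b[i+1], rounded to nearest;
        # the result always fits in 8 bits.
        v = (b[i] * (scale - frac) + b[i + 1] * frac + scale // 2) >> frac_bits
        result |= v << (8 * i)  # byte i -> bits 8i..8i+7
    return result
```

With `frac=0`, the function returns the first four bytes unchanged. With `frac=8` (one half, assuming 4-bit fractions), each output byte is the rounded average of two neighboring bytes. Half-pixel interpolation of this kind is a staple of video motion compensation.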

Tensilica Previews Video Engine

Synthesizable Dual-Core Decoder Is Optimized for Digital Video

At the recent Fall Processor Forum in San Jose, Tensilica previewed a high-performance video-decoder engine based on two Xtensa LX configurable-processor cores. Tensilica is preconfiguring the cores by customizing them with application-specific extensions, adding local memory and other intellectual property (IP), and licensing the whole synthesizable design as a drop-in module for SoCs needing video acceleration. [November 28, 2005]

Figure 1: Block diagram of Tensilica's dual-core video-decoder engine. One Xtensa LX processor is configured as a stream processor and the other as a pixel processor.
Table 1: Tensilica measured video performance on two simulated video-decoder engines: one using a pair of base-configuration Xtensa LX cores for the stream processor and pixel processor, and another using the same processors optimized with the new video extensions.
Sidebar: Tensilica Introduces Xtensa 6 Processor Core
- Feature and performance comparison of Tensilica's Xtensa V, Xtensa 6, and Xtensa LX configurable-processor cores.

ARC Shows SIMD Extensions

New Instructions With Macros and DMA Extend ARC 700 Processor

Video is the next MP3, and any embedded processor competing for sockets in tomorrow's consumer gadgets must be able to handle digital video and audio processing. At Fall Processor Forum 2005, ARC International's chief architect, Nigel Topham, presented ARC's new SIMD extensions for digital video. These extensions are for the ARC 710D, 725D, and 750D — three preconfigured cores in the ARC 700 embedded-processor family. ARC will license the SIMD extensions as parts of larger extension packages released later this year. [November 21, 2005]

Figure 1: ARC's new SIMD instructions can execute in a closely coupled mode or a decoupled mode, depending on the program's requirements.
Figure 2: These two examples show how programmers can write the same code for closely coupled SIMD execution or decoupled SIMD execution.
Figure 3: Pipeline diagram of the ARC 750D processor with SIMD extensions.
Figure 4: Synthesized floorplan of an ARC 750D processor core with the ARCmedia Subsystem, which includes the new SIMD extensions.
Table 1: List of 104 new instructions that ARC's SIMD extensions add to the ARCompact ISA.
Table 2: ARC measured the digital-video performance of its new SIMD extensions by synthesizing an ARC 750D processor core in a Virtex-4 FPGA.

Videantis Chases Digital Video

Synthesizable Video Coprocessors Pursue Emerging Applications

Gil Scott-Heron famously said the revolution will not be televised, but now it's looking like television is the revolution. TV is appearing everywhere, it's affecting everyone's lives, everyone is watching it, and it's watching everyone. In other revolutions, heads roll; in this one, heads talk. Into this maelstrom jumps Videantis, a startup based in Hannover, Germany. At Fall Processor Forum 2005, Videantis unveiled two synthesizable video-coprocessor modules based on the same proprietary processor core. Videantis wants to license the modules and optimized software to designers building programmable video chips for high-definition television (HDTV) and mobile consumer electronics. [November 7, 2005]

Figure 1: Block diagram of the v-MP2 core.
Figure 2: Block diagram of the single-core v-MP2000M video coprocessor module.
Figure 3: Block diagram of the triple-core v-MP2000HD video coprocessor module.
Table 1: Estimated performance of the v-MP2000M and v-MP2000HD when running popular video codecs.

Z-RAM Shrinks Embedded Memory

Innovative Silicon's Tiny DRAM Cells Alter the Memory Equation

Earlier this year, a Swiss startup, Innovative Silicon, announced a new embedded-memory technology called Z-RAM (zero-capacitor RAM) because each one-transistor bit-cell requires no capacitor. Z-RAM exploits an inherent electrical effect of silicon-on-insulator technology to temporarily store the bit-cell's binary state. In a technical presentation at Fall Processor Forum 2005, Innovative Silicon explained how Z-RAM works and made a strong argument that it's the logical alternative for embedding memory in future microprocessors. [October 25, 2005]

Figure 1: Embedded memory already accounts for more than half the die area of typical microprocessors and SoCs, and it will soon overwhelm the silicon devoted to logic.
Figure 2: Conventional embedded DRAM (eDRAM) requires a deep-trench capacitor structure in addition to the transistor for each bit-cell.
Figure 3: How Z-RAM works. Innovative Silicon refers to positive charging as "impact ionization" and to negative charging as "hole removal."
Figure 4: Detecting the difference between a stored 0 or 1 in a Z-RAM bit-cell is similar to sensing the value of a conventional DRAM bit-cell. At the top is a schematic of the cell.
Figure 5: Z-RAM requires only one transistor, like conventional DRAM, but it doesn't need the deep-trench capacitor shown in Figure 2.
Figure 6: This micrograph shows a one-transistor Z-RAM FinFET bit-cell fabricated for test purposes.

Gordon Moore and Carver Mead

Two Pioneers Discuss Moore's Law and the Birth of an Industry

To celebrate the 40th anniversary of Moore's law, the Computer History Museum in Silicon Valley invited Dr. Gordon Moore and Dr. Carver Mead to talk about the law, reminisce about Moore's distinguished career in the semiconductor industry, and discuss other topics. On the evening of September 29, the museum's auditorium filled to capacity with an eager crowd of museum members and guests. Microprocessor Report recorded and transcribed this special event. [October 17, 2005]

Photo: Dr. Gordon Moore
Photo: Dr. Carver Mead
Photo: Moore and Mead

Philips Challenges 8-Bit MCUs

New 32-Bit ARM7 Microcontrollers With Flash Memory Start at $1.47

In the latest attempt to lure embedded-systems designers away from 8- and 16-bit MCUs, Philips Semiconductor has introduced three new 32-bit MCUs with the ubiquitous ARM7TDMI processor core. The lowest-priced part, the LPC2101, has 8KB of on-chip flash memory and starts at only $1.47 in large volumes. That appears to be a new low price for flash-integrated ARM7 MCUs in this relatively high-performance class (70MHz, 63 Dhrystone mips). The other two parts, the LPC2102 and LPC2103, have 16KB or 32KB of on-chip flash and cost $1.85 or $2.20, respectively. All three parts are stuffed with peripherals, timers, and other accoutrements of general-purpose MCUs. In addition, Philips has included features to address the shortcomings of previous 32-bit MCUs and to duplicate some advantages of 8- and 16-bit devices. [October 10, 2005]

Figure 1: LPC210x block diagram. The only differences among the LPC2101, LPC2102, and LPC2103 are their amounts of internal flash memory (8KB, 16KB, or 32KB) and SRAM (2KB, 4KB, or 8KB).
Table 1: Power-saving modes in the new LPC2101, LPC2102, and LPC2103.
Table 2: Feature comparison of the new Philips LPC2101, LPC2102, and LPC2103; the Atmel AT91SAM7S32; and the Oki Semiconductor ML67Q406x and ML67Q500x.

Preview: Fall Processor Forum 2005

Multicore Processing Dominates 18th Annual Conference

Not since the days when RISC and VLIW challenged the CISC orthodoxy has there been such an upheaval in microprocessor design. Every major company in every major market — PCs, servers, and embedded systems — is converging on multicore processing. Microprocessor Report has provided front-line coverage of the multicore revolution since its beginnings in the 1990s. Now it's time to pull everything together for an event that covers all dimensions of multicore processing. The theme of Fall Processor Forum 2005, our 18th annual fall conference, will be "The Road to Multicore." FPF will offer technical presentations on new multicore processors, licensable intellectual property (IP) for multicore designs, on-chip interconnect technology for multicore chips, system software for multicore architectures, and software-development tools for parallel processing. [September 26, 2005]

Cavium: Security Optional

New Octeon EXP Processors Omit Internal Cryptography Engine

Cavium Networks is as closely connected with network security as Linus in Peanuts is associated with his security blanket. Cavium gained fame with its award-winning Nitrox security coprocessors in 2002 and soon will begin shipping its Octeon NSP multicore network processors with integrated security engines. Now, Cavium is tossing aside part of its security blanket — for some chips, at least. Cavium's new Octeon EXP family is virtually identical to the Octeon NSP family, except that it discards the integrated cryptography engine and related features. Octeon EXP is for customers that don't need network security at this time or prefer using a separate security coprocessor. In addition, Cavium can freely export Octeon EXP chips to countries subject to U.S. government trade controls. [September 6, 2005]

Table 1: Feature comparison of Cavium's Octeon EXP CN3630, EXP CN3830, EXP CN3840, EXP CN3850, EXP CN3860, and the Octeon NSP CN3xxx family.
Table 2: Feature comparison of Cavium's Octeon EXP family with Broadcom's SiByte BCM12xx and BCM14xx processors, Freescale's PowerQUICC III MPC8641/D processors, PMC-Sierra's RM11200 processor, and Raza Microelectronics' XLR family.

ARC Patent Looks Formidable

U.S. Patent Covers Automated Tools for Customizing Processor Cores

(With Rich Belgard)

A showdown may be looming between ARC International and archrival Tensilica over who invented the software tools and methods for customizing synthesizable microprocessor cores. Both companies have won important U.S. patents for the technology, and ARC's latest patent appears both broad and strong. Whether or not ARC and Tensilica come to legal blows, their growing patent portfolios should worry other companies working in the expanding field of configurable processors. In general, Microprocessor Report agrees with ARC that U.S. patent 6,862,563 lays claim to key technology for automating the configuration of synthesizable processors and other soft intellectual property (IP). However, the complex language and convoluted history of the patent defy easy analysis and interpretation. [August 29, 2005]

Figure 1: ARC's processor-configuration tool, now called the ARChitect Processor Configurator, runs on a PC or a Sun workstation. It has an easy graphical user interface that allows chip designers to rapidly customize an ARC processor core by choosing predefined options.
Figure 2: In this excerpt from Figure 1, ARChitect indicates how the user has configured an ARC 700 processor core.
Figure 3: Tensilica's processor-configuration tool underwent major changes last year. The predefined configuration options for the Xtensa LX processor now appear in Tensilica's integrated development environment (IDE), called Xplorer, which runs on the customer's desktop workstation.
Figure 4: ARC's '563 patent has dozens of flowcharts like this one, showing how a processor-configuration program accepts user input to customize the synthesizable core.
Figure 5: To eliminate any possible doubt about what constitutes a computer, ARC's patent describes the required components and illustrates them with this figure.
Figure 6: This example of creating a custom instruction in Tensilica Instruction Extension (TIE) language is from Tensilica's website. It explicitly promotes TIE as a higher-level alternative to standard design languages like Verilog and VHDL.

Actel Mixes Signals on FPGAs

Programmable Chips Will Integrate Analog, Digital, and Memory

As design costs soar like housing prices, ASIC alternatives are multiplying like Realtors. Actel — a second-tier FPGA vendor, behind market leaders Xilinx and Altera — is proving to be unusually creative at exploiting this opportunity. Actel has announced a technology called Fusion that, for the first time, can integrate mixed-signal logic with an embedded-processor core, flash memory, and SRAM in the same programmable-logic device. With Fusion, a single FPGA could perform some or all of the analog- and digital-processing functions in an embedded system. [August 15, 2005]

Figure 1: Fusion block diagram. Fusion will integrate hard-wired analog peripherals with hard and soft intellectual-property (IP) cores in an FPGA. A soft embedded-processor core is optional. Tying everything together is the Fusion backbone, which consists of a multidrop bus and microsequencer implemented in programmable logic.

China's Emerging Microprocessors

'MIPS-Like' Godson Chips Echo the Past, Foreshadow the Future

Beyond the land of the rising sun is the rising Godson, a growing family of microprocessors designed and manufactured in China by Chinese engineers for the Chinese domestic market. Intended for low-cost desktop computers, servers, and embedded systems, these 32- and 64-bit chips are rapidly becoming as sophisticated as any designs in the world, falling short in performance only because Chinese fabrication technology lags behind the rest of the industry by two process generations. Microprocessor Report recently interviewed Godson's chief architect, Weiwu Hu, a professor at the Institute of Computing Technology in Beijing. Weiwu described the Godson-1 and Godson-2 in unprecedented detail and revealed some of his ambitions for the Godson-3. After analyzing this information, MPR believes the Chinese already are capable of designing world-class microprocessors, if they can gain access to world-class fabrication technology. [July 25, 2005]

Figure 1: Godson-2 block diagram. China's most powerful microprocessor is patterned after the MIPS IV ISA and is similar to the MIPS Technologies R10000 processor introduced in 1995.
Table 1: The Godson-2's instruction latencies are similar to those of other high-performance RISC processors.
Figure 2: Godson-2 pipeline diagram. ICT lengthened the Godson-2 pipeline by two stages, compared with the Godson-1.
Table 2: Feature comparison of the Godson-1, Godson-2, and three important MIPS processors: the R3000, R4000, and R10000.
Sidebar: A Conversation With Godson's Father — Excerpts from our exclusive interview with Weiwu Hu, chief architect of the Godson-1 and Godson-2 and a professor at the Institute of Computing Technology, Chinese Academy of Sciences, Beijing.
- Photo of Weiwu Hu
Sidebar: China Likes the x86, Too — Even while the Chinese develop their own Godson microprocessors patterned after the MIPS architecture, they are also seeking ways to make x86-compatible processors for China's domestic market and possibly for export.

ARM Strengthens Java Compilers

New 16-Bit Thumb-2EE Instructions Conserve System Memory

In the 10 years since Sun Microsystems introduced Java, the dragon of slow run-time performance has pretty much been slain by better virtual machines, faster microprocessors, and extensions like ARM's Jazelle. Today, Java is successfully running on millions of cellphones and other embedded systems. Now, ARM is taking up another challenge: reducing code bloat when compiling Java bytecode with just-in-time (JIT) compilers or static native compilers. ARM's solution is a new variation of Jazelle called Jazelle RCT (Run-time Compilation Target) that enhances the 16/32-bit Thumb-2 instruction set to assist JIT compilers and static native compilers. [July 11, 2005]

Table 1: Jazelle RCT extends Thumb-2 with these new Thumb-2EE instructions in the ARMv7 architecture.
Figure 1: At best, a static native compiler enhanced with Jazelle RCT generated executable code only 7% larger than the original Java bytecode, according to these ARM internal benchmark tests. At worst, the same compiler generated code 44% larger — still impressive, considering that most static compilers inflate Java bytecode to several times its original size.
Figure 2: These code snippets from a Java program demonstrate how Jazelle RCT conserves memory.
Table 2: ARM's internal benchmarking indicates that an ahead-of-time (AOT) Java compiler can limit code bloat without significantly hampering performance.

PowerPC Ain't Dead Yet

Freescale's New 90nm MPC7448 Scores Highest EEMBC Benchmarks

Until now, the fastest single-core embedded processor was arguably Freescale Semiconductor's PowerPC MPC7447A, which boasted the highest EEMBC benchmark scores of any microprocessor chip. (Some specially customized versions of configurable processor cores have achieved higher EEMBC scores in simulation.) To defend its high ground, Freescale has unveiled an even faster microprocessor, the MPC7448 — previously announced but without vital details. As newly certified EEMBC scores show, the 1.7GHz MPC7448 easily beats the 1.42GHz MPC7447A. [July 5, 2005]

Table 1: Freescale's new 1.7GHz MPC7448 is the world's fastest embedded microprocessor chip, according to the latest EEMBC benchmarks.
Table 2: Feature comparison of the Freescale MPC7448, Freescale MPC7447A, Freescale MPC8641/D, AMCC PowerPC 440GX, Broadcom SiByte BCM1250, IBM PowerPC 750GX, and PMC-Sierra RM9000x2GL.

Elixent Improves D-Fabrix

D-Fabrix v2.0 Tweaks Parallel Architecture for Better Performance

Elixent took the stage at Spring Processor Forum 2005 to prove that listening to customers isn't a lost art. Using feedback from early adopters of its massively parallel configurable-processor core, Elixent has introduced D-Fabrix v2.0, which significantly boosts performance without increasing the overall gate count. [June 27, 2005]

Figure 1: D-Fabrix v1.0 added dynamic multiplexers to the switchboxes in the on-chip network fabric. The original Chess architecture that inspired D-Fabrix lacked these muxes.
Figure 2: Benchmark tests comparing the simulated performance of a D-Fabrix v2.0 processor with the baseline performance of a D-Fabrix v1.0 processor of equal size.
Sidebar: Elixent Wins Japanese Design Award

Busy Bees at Silicon Hive

New Processor Cores Target Pixel Processing and Communications

Silicon Hive's mission is to replace DSPs and hard-wired application-specific logic with programmable processors based on its unbelievably long instruction word (ULIW) architecture. OK, we're exaggerating — ULIW actually stands for ultralong instruction word. But with instruction words stretching as long as 918 bits, this architecture does seem almost unbelievable. Last month, at Spring Processor Forum 2005, Silicon Hive introduced two new ULIW processor cores, the Avispa-IM1 and Avispa-CH1. This time, the company is targeting pixel processing as well as wireless communications. [June 20, 2005]

Figure 1: Silicon Hive's Avispa-IM1 processor core includes eight of the new Generic DSP processing and storage elements (PSE).
Figure 2: The Avispa-CH1 has a second new type of function block, known as a Complex DSP PSE.
Table 1: Comparison of Silicon Hive's four ULIW processor cores: the Avispa-IM1, Avispa-CH1, original Avispa, and Avispa+.

XAP3 Takes the Stage

Synthesizable 32-Bit Processor Targets Deeply Embedded Applications

From ARM's hometown of Cambridge, England, comes a new licensable embedded-processor core — except it's not from ARM. It's from Cambridge Consultants, a 250-person engineering firm that's been around since 1960. For decades, this company has been designing electronic gadgets for customers all over the world, but it's a newcomer to selling 32-bit microprocessors. At last month's Spring Processor Forum, Cambridge Consultants presented the new 32-bit XAP3a. [June 13, 2005]

Figure 1: Layout of XAP3 register files.
Figure 2: XAP3a block diagram.
Table 1: Comparison of the Cambridge Consultants XAP3a with ARC International's ARC 600, ARM's ARM7TDMI-S, the MIPS Technologies MIPS32 4KE, and Tensilica's Xtensa LX.

White Paper:
The MIPS32 24KE Core Family

High-Performance RISC Cores With DSP Enhancements

Editor's note: This is an edited version of the white paper that MIPS Technologies submitted with its presentation at Spring Processor Forum. Microprocessor Report has added a sidebar analyzing the new MIPS32 24KE processor family. The 24KE adds DSP extensions to the high-performance 24K family of synthesizable embedded-processor cores. [May 31, 2005]

Figure 1: MIPS32 24K instruction pipeline.
Figure 2: The execution-stage datapath in the 24KEc.
Figure 3: Forwarding results of DSP instructions from general-purpose registers requires a second clock cycle.
Figure 4: Result forwarding to the next instruction without a delay is possible for useful DSP sequences.
Figure 5: The dot-product-accumulate instruction performs two multiplications and two additions, storing the results in the specified accumulator.
Figure 6: The multiplier datapath supports dual-multiply operations and a repeat rate of one MAC per cycle.
Figure 7: Minimal additional logic was added to support DSP multiply instructions.
Figure 8: Comparing performance of the 24KEc with the 24Kc on eight signal-processing tasks.
Table 1: Comparison of the 24KEc with the 24Kc to show the effects of DSP extensions on core clock frequency, gate count, and die area.
Sidebar: MIPS 24KE: Better Late Than Never (by Tom R. Halfhill)
- Table: Feature comparison of the MIPS 24KEc, 24KEf, 24KEc Pro, 24KEf Pro, 24Kc, 24Kf, 24Kc Pro, and 24Kf Pro embedded-processor cores.
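Figure 5 above describes the dot-product-accumulate instruction as two multiplications and two additions into an accumulator. The following is a minimal, hypothetical Python model of that dataflow, assuming each 32-bit source register holds two signed 16-bit halves. It deliberately omits any fractional (Q15) scaling, saturation, or accumulator-width wraparound the real MIPS DSP instructions may apply; the helper names are illustrative only and are not MIPS mnemonics.

```python
MASK16 = 0xFFFF


def to_signed16(x: int) -> int:
    """Interpret the low 16 bits of x as a two's-complement integer."""
    x &= MASK16
    return x - 0x10000 if x & 0x8000 else x


def pack(hi: int, lo: int) -> int:
    """Hypothetical helper: pack two 16-bit values into one 32-bit register image."""
    return ((hi & MASK16) << 16) | (lo & MASK16)


def split_halves(word: int) -> tuple[int, int]:
    """Split a 32-bit register image into signed (high, low) 16-bit halves."""
    return to_signed16(word >> 16), to_signed16(word)


def dot_product_accumulate(acc: int, rs: int, rt: int) -> int:
    """Simplified model: acc + rs.hi * rt.hi + rs.lo * rt.lo.

    Assumption: plain integer products with no saturation or scaling;
    Python ints are unbounded, so accumulator overflow is not modeled.
    """
    rs_hi, rs_lo = split_halves(rs)
    rt_hi, rt_lo = split_halves(rt)
    # Two multiplications (a dual-multiply datapath could do both at once) ...
    p_hi = rs_hi * rt_hi
    p_lo = rs_lo * rt_lo
    # ... and two additions into the accumulator.
    return acc + p_hi + p_lo


if __name__ == "__main__":
    # 10 + (3 * 4) + (-2 * 5)
    print(dot_product_accumulate(10, pack(3, -2), pack(4, 5)))
```

Looping this over packed sample and coefficient arrays gives two MACs per call, which is why such an instruction can help FIR filters and similar DSP kernels.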

Float Without Bloat

ARC Adds Economical Floating Point to Customizable Processor Cores

For years, ARC International has considered adding an optional floating-point unit (FPU) to its 32-bit customizable processor cores, but it has always been deterred by the cost of the additional logic gates and power. A fully equipped FPU with its own pipeline and register file could double or triple the silicon area of a small embedded RISC processor. At last week's Spring Processor Forum, ARC unveiled FPX — Floating-Point eXtensions — which significantly improve on the performance of a software-emulation library while requiring fewer gates than a complete FPU. [May 23, 2005]

Table 1: List of new FPX single- and double-precision floating-point instructions, with descriptions and execution latencies.
Figure 1: Results of ARC's benchmark tests with a customer's GPS application using single-precision FPX instructions.
Table 2: Comparison of floating-point options from ARC, ARM, MIPS Technologies, and Tensilica.

MicroBlaze Can Float

Xilinx Adds Floating-Point Logic to FPGA-Optimized Processor Core

Only AMD and Intel share a greater rivalry than Altera and Xilinx. These companies are the Hatfields and McCoys of the semiconductor industry. While the world's leading PC-processor vendors are shotgunning each other with double-barreled cores and 64-bit extensions, the world's leading FPGA vendors are battling over who has the biggest programmable-logic chips, the coolest design-automation tools, and the best synthesizable processor cores. Altera fired the last shot by introducing Nios II at Embedded Processor Forum 2004. This week at Spring Processor Forum, Xilinx is blazing back with MicroBlaze v4.00, the newest version of its 32-bit RISC processor core for FPGAs. [May 17, 2005]

Table 1: The tightly coupled FPU, a new option, adds only 10 instructions to the MicroBlaze v4.00 architecture, which is backward compatible with v3.00.
Figure 1: MicroBlaze v4.00 block diagram.
Table 2: Feature comparison of MicroBlaze v4.00, MicroBlaze v3.00, Altera's Nios II/f, Nios II/s, Nios II/e, ARC International's ARC 600, and Tensilica's Xtensa LX.

Storage Processor Leverages LEON

Network RAID Controller Based on Free SPARC V8-Compatible Core

How to beat the high cost of living: design a new chip around bits and pieces of LEON, a freely licensable SPARC-compatible processor core from the European Space Agency. The finished RAID controller chip is now solving space problems of a different sort by bringing affordable network-attached storage (NAS) to home and small-business users. The new LEON-derived chip is the IT3107 network storage processor from Infrant Technologies, a four-year-old privately held company in Fremont, California. This is the second RAID controller Infrant has designed using parts of LEON1, a 32-bit processor core adhering to the SPARC V8 instruction-set architecture. The European Space Agency freely distributes a synthesizable VHDL model of LEON1 under a GNU license. [May 2, 2005]

Figure 1: Infrant IT3107 block diagram.
Figure 2: Infrant's ExpandaNAS system board for NAS subsystems includes the IT3107 dual-core processor.

Preview: Spring Processor Forum

Highlights Are IBM's Cell, DSPs, IP Cores, and New Track Sessions

It's not a conference exclusively for embedded processors anymore, but embedded processors and cores will nevertheless make the biggest news at this year's Spring Processor Forum (formerly Embedded Processor Forum). Innovation is running wild in the embedded industry, and SPF 2005 will be a showcase for radical multicore designs, aggressive new DSPs, new embedded-processor architectures, and much more. The forum, sponsored by In-Stat (publisher of Microprocessor Report), will be held May 16–19 at the Doubletree Hotel in San Jose, California. [April 25, 2005]

Philips Debuts Media Processor

Nexperia PNX1700 Has Award-Winning TriMedia TM5250 Core

Philips Semiconductor is only weeks away from receiving silicon samples of the first Nexperia media processor based on the TriMedia TM5250 processor core. The new high-performance chip — dubbed the Nexperia PNX1700 Connected Media Processor — is designed for streaming digital-media applications, such as video decoding for high-definition television (HDTV). The TM5250, code-named Spitfire, is a 32-bit synthesizable processor core based on the TriMedia VLIW architecture, which first appeared in 1994. [April 18, 2005]

Figure 1: Philips Nexperia PNX1700 block diagram.
Table 1: The new TM5250 processor core in the PNX1700 introduces nine new instructions to the TriMedia TMA3 instruction-set architecture.
Table 2: Descriptions of audio and video codecs to be offered by Philips and third-party IP vendors when the PNX1700 media processor ships later this year.

ARM-Based MCUs Flex Muscles

Actel, Oki, and Philips Launch Innovative 32-Bit Microcontrollers

ARM-based chips continue to gain strength in the fast-growing microcontroller market. At the recent Embedded Systems Conference, Actel announced FPGAs with specially optimized ARM7 processors encrypted in programmable logic; Oki Semiconductor unveiled the world's smallest ARM-based MCUs; and Royal Philips Electronics introduced the first ARM9-based MCUs fabricated in a 90nm CMOS process. All these milestones provide more reasons to upgrade from 8- or 16-bit MCUs to 32-bit ARM-based devices. [April 4, 2005]

Figure 1: Actel's security technology allows developers to encrypt and embed their IP in a ProASIC3 FPGA alongside the embedded ARM7TDMI-S processor core.
Figure 2: Oki Semiconductor uses a proprietary manufacturing and packaging process to redistribute pins over the entire mounting surface of its wafer-level chip-size package (WCSP).
Figure 3: Philips LPC3000 block diagram. Is it an MCU or an SoC?

Freescale Quickens PowerQUICC

New PowerQUICC II Pro Chips Have Two Auxiliary Processors

It's been 10 years since Motorola created the PowerQUICC family of communications processors by substituting PowerPC cores for the 68360 CPUs in the original QUICC family. Now, Motorola spinoff Freescale is again improving the family with additional auxiliary processors, on-chip peripherals, I/O interfaces, and networking accelerators. The new chips belong to the PowerQUICC II Pro series and are called the MPC8360E and MPC8358E. Freescale will also offer them without integrated security engines as the MPC8360 and MPC8358. [March 21, 2005]

Figure 1: Block diagram of Freescale's PowerQUICC II Pro MPC8360E.
Table 1: Comparison of the new QUICC Engines in the PowerQUICC II Pro MPC8360E and MPC8358E with the communications processor module (CPM) in existing PowerQUICC chips.
Table 2: Feature comparison of the Freescale PowerQUICC II Pro MPC8360E, Freescale PowerQUICC II Pro MPC8358E, Freescale PowerQUICC II Pro MPC8349E, Freescale PowerQUICC III MPC8541E, AMD Alchemy Au1550, and Intel IXP465.

ARC's Preconfigured Cores

Six "New" Processors Are Derivatives of ARC 600 and ARC 700

ARC International has introduced six embedded-processor cores based on its configurable ARC 600 and ARC 700 families. The "new" 32-bit synthesizable processors are actually preconfigured cores, optimized for embedded applications in the low-power and high-performance realms. Although chip designers can further customize the cores for specific applications, the ready-made configurations are intended to accelerate design projects and allow easier comparisons with competing processors. The six preconfigured cores are the ARC 605, 610D, 625D, 710D, 725D, and 750D. All but one (the 605) have DSP extensions. [March 14, 2005]

Figure 1: Comparison of performance when executing a 256-point FFT on an ARC 6xx or 7xx processor core with ARC's standard DSP extensions, Advanced XY DSP extensions, and Advanced XY extensions with DSPlib function library.
Table 1: Feature comparison of the ARC 605, ARC 610D, ARC 625D, ARC 710D, ARC 725D, ARC 750D, ARM946E-S, MIPS32 4KE, and Tensilica Xtensa LX.
Sidebar: ARC Wins Key U.S. Patent

EEMBC Expands Benchmarks

New Digital Entertainment Suite Tests Audio, Video, Cryptography

After two years of labor, the Embedded Microprocessor Benchmark Consortium (EEMBC) has delivered its largest new test suite since introducing its original benchmarks in 2000. Indeed, the new Digital Entertainment suite (DENbench) has more tests than all five suites of the EEMBC 1.0 benchmarks put together. In all, there are 69 new tests, though most are alternative datasets for a smaller number of basic tests. They are grouped into four smaller suites that are useful for a broad range of applications, from consumer electronics to secure communications and digital rights management (DRM). [February 22, 2005]

Table 1: List of the minisuites and tests in the DENbench suite.
Table 2: Peak signal-to-noise ratio measurements for MPEG encoding and decoding on four benchmarked processors: the AMD Geode NX1500, Analog Devices Blackfin ADSP-BF533, Freescale PowerPC MPC7447A, and IBM PowerPC 750GX.
Figure 1: One MPEG encoding test uses radar images of this mysterious "face" on Mars, as seen by a Viking orbiter.
Sidebar: Few Surprises In First DENbench Scores
- Table: Aggregate scores in all four DENbench minisuites, plus the floating-point MPEG-2 encoding tests, for the AMD Geode NX1500, Analog Devices Blackfin ADSP-BF533, Freescale PowerPC MPC7447A, and IBM PowerPC 750GX.
- Figure: Performance-per-megahertz comparison of the AMD Geode NX1500, Analog Devices Blackfin ADSP-BF533, Freescale PowerPC MPC7447A, and IBM PowerPC 750GX.
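Table 2 above reports peak signal-to-noise ratio (PSNR) for MPEG encoding and decoding. As a minimal sketch, the textbook PSNR definition is 10·log10(MAX² / MSE), where MSE is the mean squared error between reference and decoded samples. The function below assumes flat sequences of 8-bit samples (MAX = 255). Whether DENbench computes PSNR per frame, per color component, or averaged over a sequence is not stated here, so treat this as the generic formula rather than EEMBC's exact procedure.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[int], decoded: Sequence[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length sample sequences."""
    if len(reference) != len(decoded) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        # Identical signals: no noise, so PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    # Each sample off by 1, so MSE = 1 and PSNR = 10 * log10(255**2), about 48.13 dB.
    print(round(psnr([10, 20, 30, 40], [11, 19, 31, 39]), 2))
```

Higher PSNR means the decoded output is closer to the reference, which is why a benchmark reporting speed can also report PSNR to show that a fast result did not come at the cost of image quality.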

Cavium Expands Security

New Communications Processors Have Nitrox Crypto Engines

Cavium Networks made a name for itself with security processors — timely products for an insecure world. More recently, the company has been introducing communications processors with security engines, a subtle but strategic shift. By integrating both communications and security, a single chip can do the job of two. Before long, security acceleration will be as common as caches in all types of microprocessors. The latest Cavium chips are the Nitrox Soho CN220 and CN225 secure communications processors, which incorporate the GigaCipher security engine found in Cavium's discrete Nitrox security chips. [February 7, 2005]

Figure 1: Block diagram of Cavium's Nitrox Soho CN225 secure communications processor.
Table 1: Feature comparison of Cavium's Nitrox Soho CN220 and CN225, AMD's Alchemy Au1550, Freescale's PowerQUICC II Pro MPC8343E, Intel's XScale IXP465, and PMC-Sierra's MSP2020 Multiservice Processor.

Multicore Chips Rule in 2004

High-Performance Embedded Processors Lead CPU Evolution

PC processors boast the highest clock speeds. Server processors have the fattest caches. But unsung embedded processors are at the forefront of microprocessor evolution. While the PC market is agog at dual-core 64-bit processors, the embedded market already takes such chips for granted and will deliver processors with four, eight, and sixteen 64-bit cores this year. Only the need for power efficiency restrains embedded chips from matching the high clock frequencies of PC processors and the bloated transistor budgets of the biggest server processors. Our year-end review of high-performance embedded processors finds that innovation in this market continued to accelerate in 2004. We also name the nominees and winner of our Microprocessor Report Analysts' Choice Award for Best High-Performance Embedded Processor of 2004. [January 31, 2005]

Sidebar: Best High-Performance Embedded Processor: Broadcom BCM1480

Best Processor Cores of 2004

Processor-IP Cores and Tools Grow More Versatile, Sophisticated

Greater performance, architectural enhancements, and improved design tools marked the progress of intellectual-property (IP) embedded-processor cores in 2004. Although the number of companies competing in this tight market remained static, one company booted its CEO, hired a new executive management team, and changed its strategy (again). Another company announced a mind-numbing barrage of new products and spent nearly a billion dollars on a binge of acquisitions. And the biggest competitor announced plans to greatly expand its licensing strategy. Here is an alphabetical review of the leading processor-IP companies — ARC International, ARM, IBM Microelectronics, MIPS Technologies, and Tensilica — and the most important news they made in 2004. We also name the nominees and winner of our Microprocessor Report Analysts' Choice Award for Best IP-Core Processor of 2004. [January 24, 2005]

Sidebar: Best IP-Core Processor: Tensilica's Xtensa LX

New Patent Reveals Cell Secrets

IBM, Sony, Toshiba Develop Secure Parallel-Processing Architecture

No microprocessor since Intel's Merced has stirred as much curiosity as the Cell processor under development by IBM Microelectronics, Sony, and Toshiba. Partly it's because Cell is destined for Sony's much anticipated PlayStation 3 videogame console, due in 2006. But Cell isn't meant just for fun and games. It's also intended for professional graphics workstations and other computing devices, which makes people wonder what kind of magic will be bottled in the chips. Tantalizing details will trickle out in February, when IBM presents several papers about Cell at the International Solid-State Circuits Conference (ISSCC) in San Francisco. Until then, nothing beats a weighty 57-page patent issued to IBM, Sony, and Toshiba by the U.S. Patent and Trademark Office on October 29, 2004. Patent 6,809,734 describes the Cell architecture in detail, with 42 pages of illustrations. [January 3, 2005]

Figure 1: This drawing from the '734 patent shows the way packages of program code and data, called software cells, can migrate among different Cell-based systems for distributed processing.
Figure 2: A processor element (PE) is the basic building block of a Cell processor, as shown in this figure from the '734 patent.
Figure 3: Inside look at an APU, the functional equivalent of a processor core, as shown by another figure from the '734 patent.
Figure 4: The shared DRAM attached to each PE of a Cell processor is divided into protected regions of memory called sandboxes, as shown in this figure from the '734 patent.
Figure 5: The Cell architecture's "software cells" have a complex format, as this figure from the '734 patent illustrates.
Figure 6: This figure from the '734 patent illustrates the way a slower Cell processor (top) and a faster Cell processor (bottom) can allocate "time budgets" in their APUs to execute time-critical tasks in the same amount of real time.

Viewpoint: The Mythology of Moore's Law

Why Such a Widely Misunderstood 'Law' Is So Captivating to So Many

Moore's law gets more attention all the time. Google finds 223,000 hits for the term on the Internet, remarkable for something as arcane as semiconductor chip manufacturing. People who can't tell a silicon wafer from a compact disc don't hesitate to name-drop Moore's law at business lunches and parties, usually in the context of whether Intel stock is a good buy. Not since a falling apple led Sir Isaac Newton to discover universal gravitation have so many people been so captivated by a scientific law. Yet Moore's law isn't really a law in the formal sense, and it isn't scientific. As we approach the 40th anniversary of Moore's law in 2005, it's time to set the record straight. [December 13, 2004]

Figure 1: Intel's graph of Moore's law tracks the actual progress of Intel microprocessors, not the predicted path of the law.
Table 1: Comparing three different versions of Moore's law to the actual transistor counts of Intel's latest microprocessors.
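Most popular versions of Moore's law differ mainly in the doubling period they assume, which is why they diverge so quickly from one another and from real chips. The sketch below shows how a projection grows under a fixed doubling period. It assumes the commonly quoted periods of 12, 18, and 24 months; the article's Table 1 may use different parameters. The starting point, the 2,300-transistor Intel 4004 of 1971, is chosen only for illustration.

```python
def projected_transistors(n0: float, years: float, doubling_months: float) -> float:
    """Project a transistor count forward, assuming one fixed doubling period."""
    return n0 * 2 ** (years * 12 / doubling_months)


# Illustrative starting point (an assumption, not the article's baseline):
# the Intel 4004 of 1971 had about 2,300 transistors.
START_COUNT = 2_300
START_YEAR = 1971

for months in (12, 18, 24):
    n = projected_transistors(START_COUNT, 2004 - START_YEAR, months)
    print(f"Doubling every {months} months: {n:,.0f} transistors by 2004")
```

A one-year doubling period compounded over three decades produces an absurdly large count, while a two-year period lands much closer to real chips. A small change in the assumed period makes a huge difference in the projection.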

ARM's Asynchronous Handshake

Handshake Solutions Designs Asynchronous ARM9 Processor Core

From its roots in the 1950s, asynchronous logic has captivated circuit designers who yearn to break the bonds of clock-timed logic and create free-running processors that work at their own pace. It's been done many times, in many different ways, but conventional synchronous technology is too entrenched. Now, ARM and Handshake Solutions (a line of business within Royal Philips Electronics in the Netherlands) think conditions are changing in favor of asynchronous logic. Handshake Solutions has been working closely with ARM to design a fully asynchronous ARM9 processor core that ARM will license commercially in 1Q05. It will be the first commercial asynchronous 32-bit microprocessor. [November 29, 2004]

Figure 1: Infrared thermal photographs of actively powered hot spots in Handshake Solutions' 8-bit 80C51-compatible microcontroller and a conventional 80C51.
Figure 2: Diagram of Handshake Solutions' four-phase single-rail asynchronous signaling.
Figure 3: Example of Handshake Solutions' proprietary Haste language for designing asynchronous logic at the behavioral level.
Sidebar: For More Information About Asynchronous Logic
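Four-phase (return-to-zero) single-rail signaling is the textbook handshake for bundled-data asynchronous logic. The sender raises a request once its data is stable, and the receiver raises an acknowledge after latching it. Both wires then return to zero before the next transfer. The model below steps through those four transitions in software. It is a generic illustration of the protocol only. The `Channel` and `transfer` names are hypothetical, and the model does not represent Handshake Solutions' circuits or its Haste language.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Channel:
    """Illustrative model of one four-phase, single-rail (bundled-data) channel."""
    req: bool = False                 # request wire, driven by the sender
    ack: bool = False                 # acknowledge wire, driven by the receiver
    data: Optional[int] = None        # bundled data wires
    trace: List[str] = field(default_factory=list)


def transfer(ch: Channel, value: int) -> int:
    """Move one value across the channel, recording the four signal transitions."""
    assert not ch.req and not ch.ack, "channel must start idle"
    # Phase 1: sender drives stable data, then raises req.
    ch.data = value
    ch.req = True
    ch.trace.append("req+")
    # Phase 2: receiver latches the data and raises ack.
    received = ch.data
    ch.ack = True
    ch.trace.append("ack+")
    # Phase 3: sender sees ack high and lowers req; data may now change.
    ch.req = False
    ch.trace.append("req-")
    # Phase 4: receiver sees req low and lowers ack, returning to idle.
    ch.ack = False
    ch.trace.append("ack-")
    return received


ch = Channel()
print(transfer(ch, 0x5A), ch.trace)
```

No clock appears anywhere in the model. Each transition waits only on the previous one, which is the property that lets asynchronous circuits run at their own pace.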

Embedded CPUs Zoom At FPF

Speedy Multicore Chips Dominate Embedded-Processors Session

CPU cores in embedded processors are multiplying like rabbits and sprinting even faster. Four of the six presentations in the high-performance embedded session at Fall Processor Forum (FPF) 2004 described impressive new multicore designs. One new product family integrates up to 16 cores on a single chip. Clock speeds are soaring to 1.8GHz and beyond. What's going on? Networking and telecommunications. Although some other embedded applications require high-performance processors, the growing demands of packet routing, control-plane processing, and wireless infrastructures are forcing chip vendors to push their designs farther than ever before. [October 25, 2004]

Table 1: Feature comparison of 13 new high-performance embedded processors from AMCC, Broadcom, Cavium Networks, Faraday, Freescale, and PMC-Sierra.
Figure 1: Block diagram of AMCC's PowerPC 440SPe storage I/O processor.
Table 2: Feature comparison of Broadcom's new SiByte processors: the BCM1255, BCM1280, BCM1455, and BCM1480.
Figure 2: Block diagram of Broadcom's SiByte BCM12xx/14xx with two or four MIPS64 cores.
Table 3: The trade-offs of using Faraday's NetComposer structured ASICs.
Figure 3: Block diagram of Faraday's NetComposer-II (NC-II).
Figure 4: Block diagram of Freescale's MPC8641D with dual PowerPC e600 cores.
Figure 5: Block diagram of PMC-Sierra's RM11200 with dual MIPS64 cores.

IBM Makes Designer Genes

BlueGene/L Supercomputer Processor Inspired by Embedded SoCs

Designing the world's fastest supercomputer by drawing inspiration from embedded processors seems like imitating a Vespa when building a Formula 1 racer. As we've seen in the last few years, however, embedded processors are blazing the trail for advanced design strategies. So perhaps it's no surprise that IBM Microelectronics would pattern a new supercomputer processor after an embedded system-on-chip (SoC), even to the point of recycling a five-year-old processor core previously found only in embedded parts. The new dual-core supercomputer processor springing forth from the embedded gene pool is called BlueGene/L. It's destined for an awesome supercomputer of the same name, which will harness the power of 65,536 processor chips containing 131,072 PowerPC processor cores. [October 11, 2004]

Figure 1: Components of the BlueGene/L supercomputer for Lawrence Livermore National Laboratory.
Figure 2: Block diagram of the BlueGene/L microprocessor, which is based on the PowerPC 440 processor core.
Figure 3: Each BlueGene/L processor is a node in five independent networks.
Table 1: Sizes and power consumption for various cell blocks in the BlueGene/L microprocessor.

Cavium Branches Out

New Networking Processors Integrate 2–16 MIPS Cores per Chip

Already a well-regarded vendor of security processors, Cavium Networks is moving in a bold new direction. The company's new Octeon family of networking processors integrates three important functions in a single chip: packet processing, content filtering, and security. To provide enough muscle for all that heavy lifting, at line rates up to 10Gb/s, Octeon chips will have as many as 16 MIPS-compatible 64-bit processor cores, augmented by numerous coprocessors. [October 5, 2004]

Figure 1: Octeon block diagram.
Table 1: Feature comparison of Cavium's Octeon family: the CN3420, CN3430, CN3840, and CN3860.

Preview: Fall Processor Forum

Deluge of Multicore Processors for PCs, Servers, Embedded Systems

If two heads are better than one, microprocessors are about to become twice as smart. Dual-core x86 processors for PCs and servers are coming soon from AMD and Intel, along with their transition to the x86-64 architecture. It's the biggest step in x86 evolution since the migration from 16 bits to 32 bits during the 1980s. Meanwhile, high-performance embedded processors and digital signal processors (DSPs) are evolving at an even faster rate. Networking chips with as many as 16 processor cores are making their debut, along with massively parallel processors that squeeze hundreds of cores onto a single chip. All that and more is happening at Fall Processor Forum (FPF), formerly known as Microprocessor Forum. FPF will be held October 4-6 at the Fairmont Hotel in San Jose, California. [September 20, 2004]

Sidebar: Introducing Processor Forum Taiwan

ARM Extends Its Reach

Artisan Acquisition Vastly Expands ARM's Semiconductor IP Portfolio

Business analysts and investors are still debating whether ARM's whopping $913 million acquisition of Artisan Components makes financial sense, but from a technology standpoint, it launches ARM into a whole new realm. Among other things, it erases all doubt that ARM is becoming a start-to-finish provider of semiconductor intellectual property, not just a vendor of embedded microprocessor cores. [September 7, 2004]

Another Tale of Two Instructions

More Mystery-Shrouded History of the x86 Architecture Uncovered

Digging into the past of the x86 architecture is like archaeology: You can never be sure what you'll find, but it's often surprising. So it goes with the LAHF and SAHF instructions, which AMD originally dropped from the 64-bit AMD64 architecture, then restored after discovering some software still needs them. (See MPR 7/19/04, "A Tale of Two Instructions".) We reported that Intel first introduced LAHF and SAHF in the 16-bit 286 processor of 1982, mainly to speed up context switching for operating systems. So imagine our surprise when a sharp-eyed reader from Germany took issue with our version of the historical record. [September 7, 2004]

Benchmarking the Benchmarks

Ever Controversial, Embedded-CPU Benchmarks Make Fitful Progress

We live in a season of divisive partisan politics: endless bickering, blame games, finger pointing, strident propaganda, arguments over strategy, and embarrassing scandals. And that's just the politics of microprocessor benchmarking. This 10,000-word in-depth article analyzes the state of benchmarking for embedded processors, paying particular attention to the Embedded Microprocessor Benchmark Consortium (EEMBC) and a controversial newcomer, Synchromesh Computing's Embedded Processor Rating System (EPRS), also known as the AMD Performance-Power Ratings. [August 30, 2004]

Table 1: The original EEMBC 1.0 benchmark suites, as introduced in 2000.
Table 2: New or improved benchmark suites EEMBC has introduced since April 2000 or will introduce shortly, including the networking 2.0 suite, the Java 2 Micro Edition (J2ME) suite, the 8/16-bit microcontroller suites, and the digital entertainment suite.
Table 3: Synchromesh Computing's EPRS benchmark suites.
Figure 1: Synchromesh Computing published these benchmark results in its white paper written for AMD. VIA's 533MHz Samuel-2 processor is compared with AMD's Geode GX 466 and Geode GX 533 processors.
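This summary doesn't give the scoring formulas used by EEMBC or EPRS. Many benchmark suites, with SPEC as the familiar example, condense several tests into one composite number by taking the geometric mean of per-test ratios against a reference processor. The sketch below shows that general technique. It is an assumption, not either organization's published method, and the test names and scores are hypothetical.

```python
import math
from typing import Dict


def composite_score(scores: Dict[str, float], reference: Dict[str, float]) -> float:
    """Geometric mean of per-test speed ratios versus a reference processor."""
    ratios = [scores[test] / reference[test] for test in reference]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical per-test scores (iterations per second); not real EEMBC or EPRS data.
reference = {"fft": 100.0, "jpeg": 50.0, "viterbi": 200.0}
cpu_a = {"fft": 300.0, "jpeg": 60.0, "viterbi": 400.0}
print(f"CPU A composite: {composite_score(cpu_a, reference):.2f}")
```

The geometric mean of ratios equals the ratio of geometric means. As a result, the relative standing of two processors doesn't depend on which reference machine is chosen, and one outsized test result can't dominate the composite the way it would in an arithmetic mean.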

Toshiba's New MIPS64 Family

TX99-Series Chips Aim for High-Performance Embedded Systems

It's been a relatively quiet year for MIPS-compatible processors, but Toshiba is making waves with a new family of high-performance embedded processors based on an enhanced MIPS64 core. The first member of the family is the TX9956CXBG, which has Toshiba's new TX99/H4 64-bit core, jointly developed with MIPS Technologies. It's a step up from Toshiba's existing 64-bit MIPS processors, because the TX99/H4 is Toshiba's first MIPS64-compatible core with superscalar pipelines and clock speeds beyond 500MHz. It's also the first chip in its class, from any vendor, to be manufactured in a 90nm IC process. [July 26, 2004]

Table 1: Feature comparison of Toshiba's new TX9956CXBG, Toshiba's existing TMPR4956CXBG, AMCC's PowerPC 440GX (recently acquired from IBM Microelectronics), IBM's PowerPC 750GX, Intrinsity's FastMath, Freescale's MPC7447A, and PMC-Sierra's RM7900.

A Tale of Two Instructions

Why AMD64 Lost, Then Regained, Two Minor x86 Instructions

Everyone has experienced the woe of cleaning out a closet and discarding something we needed later. Maybe it was something trivial, like a Pet Rock. Maybe it was something important, like a Pete Rose rookie card. Or maybe it was something both trivial and important, like the SAHF and LAHF instructions in the x86 microprocessor architecture. The strange story of the death and resurrection of these instructions is a classic example of the reason the x86 architecture has grown so complex over the past 26 years. [July 19, 2004]

For More Information: Web links to AMD's five-volume set of AMD64 programmer's manuals, Intel's two-volume set of EM64T programmer's manuals, AMD's application note for the CPUID instruction, and Intel's application note for CPUID.
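LAHF copies the low byte of FLAGS into AH, and SAHF writes SF, ZF, AF, PF, and CF back from AH. Because early AMD64 processors lacked the pair in 64-bit mode, software is expected to check CPUID function 8000_0001h, which reports that support in ECX bit 0. The functions below are a minimal Python model of the documented semantics, useful for reasoning about the bit layout. They don't access real hardware; reading ECX from the actual CPUID instruction is left to the caller. The function names are hypothetical.

```python
# Bit positions in the low byte of the x86 FLAGS register.
CF, PF, AF, ZF, SF = 0, 2, 4, 6, 7
# Flags that SAHF writes and LAHF reads: SF, ZF, AF, PF, CF (0xD5).
STATUS_MASK = (1 << SF) | (1 << ZF) | (1 << AF) | (1 << PF) | (1 << CF)


def lahf(flags: int) -> int:
    """Value LAHF loads into AH: SF:ZF:0:AF:0:PF:1:CF from FLAGS."""
    # Bit 1 of FLAGS always reads as 1; bits 3 and 5 read as 0.
    return (flags & STATUS_MASK) | 0x02


def sahf(flags: int, ah: int) -> int:
    """FLAGS after SAHF: SF, ZF, AF, PF, CF copied from AH; all other bits kept."""
    return (flags & ~STATUS_MASK) | (ah & STATUS_MASK)


def lahf_sahf_in_long_mode(ecx_80000001: int) -> bool:
    """CPUID function 8000_0001h sets ECX bit 0 when LAHF/SAHF work in 64-bit mode."""
    return bool(ecx_80000001 & 0x1)


# Round trip: flags saved with LAHF and restored with SAHF keep their status bits.
saved = lahf(0x0246)            # ZF and PF set (plus IF and reserved bit 1)
print(hex(sahf(0x0202, saved)))
```

The round trip shows why the pair is handy for cheaply saving and restoring arithmetic status flags without touching the rest of the FLAGS register.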

Tensilica's Automaton Arrives

New Design Tool Creates CPU Extensions From C/C++ Programs

What's even faster and cheaper than outsourcing a design project to India? Answer: outsourcing it to a robot. Or, actually, to a new processor design tool that automatically generates application-specific custom instructions by analyzing software written in plain-Jane C/C++. Last week, Tensilica announced that its long-anticipated XPRES (Xtensa PRocessor Extension Synthesis) tool will ship in 3Q04. Microprocessor Report first covered XPRES after Tensilica disclosed the then-unnamed technology at Embedded Processor Forum 2003. [July 12, 2004]

Figure 1: Screen shot of a dataflow graph generated by Tensilica's C/C++ compiler and profiler to identify hot spots that XPRES can optimize.
Figure 2: Screen shot of an XPRES control panel.
Figure 3: A graph generated by XPRES to show 1,830,796 possible Xtensa LX configurations for accelerating an XviD MPEG4 video encoder.
Figure 4: Screen shot from Tensilica's Xtensa Xplorer development tool.
Figure 5: Tensilica's XPRES design flow uses feedback-directed optimization to enhance both the configurable Xtensa LX processor and the software-development tools that drive the front end of the process.

Altera's New CPU for FPGAs

Soft Processor Core Offers Alternative to Custom Silicon

FPGA vendor Altera took the stage at the recent Embedded Processor Forum to introduce Nios II, a second-generation family of 32-bit synthesizable RISC processors. All Nios II cores are intended primarily for integration in system-on-programmable-chip (SoPC) devices — essentially, SoCs in FPGAs. However, they are also suitable for structured ASICs and regular SoCs, especially as a migration path from FPGAs if production volumes climb. [June 28, 2004]

Table 1: Feature comparison of the Nios II/f, Nios II/s, and Nios II/e.
Table 2: The complete Nios II instruction set.
Table 3: Feature comparison of the Nios II/f, Nios II/s, Nios II/e, ARC 600, Tensilica Xtensa LX, and Xilinx MicroBlaze processor cores.
Figure 1: Block diagram of a user-defined custom instruction.

ARC 700 Secrets Revealed

It's the First ARC Processor for High-End Embedded Operating Systems

Challenging ARM in the embedded-processor market is as daunting as challenging Intel in the PC market: you're cruisin' for a bruisin'. No wonder ARC International wanted to delay revealing everything about its new ARC 700 processor core until a marketing plan was in place. As ARC recently disclosed at Embedded Processor Forum 2004, the ARC 700 has additional features that directly challenge ARM's most popular processor cores: it's the first ARC processor with a memory-management unit (MMU), translation lookaside buffer (TLB), precise exception model, and multiple privilege levels. That makes the ARC 700 the company's first processor capable of running Linux and other sophisticated embedded operating systems. [June 21, 2004]

Table 1: Feature comparison of the ARC 700, ARC 600, ARM926EJ-S, ARM1156T2F-S, ARM1176JZF-S, ARM1136JF-S, MIPS 4KEc Pro, MIPS 24K Pro, Tensilica Xtensa V, and Tensilica Xtensa LX.
Table 2: Nine new instructions in the ARC 700.
Figure 1: Die-photo comparison of an ARC 700 synthesized for silicon area vs. an ARC 700 synthesized for clock frequency.

Tensilica Tackles Bottlenecks

New Xtensa LX Configurable Processor Shatters Industry Benchmarks

At Embedded Processor Forum 2004, Tensilica announced new versions of its configurable microprocessor core and optional DSP engine, which are licensed as soft intellectual property (IP). The new Xtensa LX is a major upgrade of Tensilica's existing configurable processor core, the Xtensa V. It tackles three challenges vexing today's CPU architects: the architectural limitations on compute efficiency, the bottlenecks on I/O bandwidth, and rising power consumption. For SoC developers, Xtensa LX preserves the advantages of a customizable CPU architecture while laying the groundwork for future development tools that will further automate the task of creating an optimized SoC design. [May 31, 2004]

Figure 1: Block diagram of Xtensa LX bus architecture, showing the optional second load/store unit.
Figure 2: Block diagram of configurable on-chip I/O ports created in Tensilica Extension Instruction (TIE) language.
Figure 3: Diagram comparing the default five-stage pipeline with the optional seven-stage pipeline.
Sidebar: How Tensilica Busted the Benchmarks

StarCore DSPs Boost VoIP

Freescale Designs Its Latest DSPs for Packet-Telephony Applications

Two decades of deregulation have slashed the cost of long-distance phone calls to pennies a minute, but even pennies aren't free. Business and residential customers eager for lower-cost alternatives are eyeing voice-over-Internet-Protocol (VoIP) telephony, which piggybacks digitized voice packets onto existing Internet services. Although Freescale's new MSC711x-series DSPs are useful for any 16-bit fixed-point signal processing, they are especially suited for packet telephony. Two of the chips have Ethernet media-access controllers, and all have time-division multiplexers (TDM), DDR memory controllers, 32-channel DMA, and generous amounts of on-chip SRAM. [May 18, 2004]

Figure 1: Block diagram of Freescale's VoIP reference design, which uses four MSC711x-series StarCore chips on a "DSP farm card" to assist a PowerQuicc II MPC8260 communications processor.
Figure 2: Block diagram of Freescale's StarCore MSC7116 DSP; other members of the family are minor variations of this design.
Table 1: Feature comparison of Freescale's MSC711x family: the MSC7110, MSC7112, MSC7113, MSC7115, and MSC7116. All are based on the same StarCore SC1400 synthesizable DSP core. Also included for comparison: the previously announced MSC8122, a more powerful DSP with four SC140 cores.

Freescale Secures PowerQuicc

New PowerQuicc II Pro and PowerQuicc III Add Security Engines

Freescale Semiconductor — the newborn spinoff from Motorola — has introduced a new PowerQuicc II Pro family of communications processors and two new members of the PowerQuicc III family. In all, there are eight new PowerQuicc chips. The most significant improvements over existing PowerQuicc processors are higher-performance CPU cores, faster memory systems, enhanced network interfaces, and integrated security engines for encrypting and decrypting data packets. [May 10, 2004]

Figure 1: PowerQuicc II Pro MPC8349E block diagram.
Table 1: Feature comparison of the Freescale PowerQuicc II Pro MPC8349E, MPC8347E, and MPC8343E; Freescale PowerQuicc III MPC8541E; Freescale PowerQuicc II MPC8272; AMD Alchemy Au1550; and Intel XScale IXP425.
Sidebar: New PowerPC Cores Promise Higher Performance

IBM Loosens Up CPU Licensing

"Power Everywhere" Initiative Aims to Spread PowerPC Architecture

IBM Microelectronics has announced some important steps toward making the PowerPC architecture more widely available as licensable intellectual property (IP) for custom chip designs. However, the much publicized "Power Everywhere" initiative still falls short of matching the flexible licensing models and customizing options from competing IP vendors. Among the announcements: IBM will consider licensing any Power or PowerPC core or chip implementation, although limitations apply; IBM will allow customers to freely download a synthesizable model of the PowerPC 440 for evaluation; and IBM plans to form an open committee to help steer the future evolution of Power/PowerPC, although IBM will retain control of the architecture. [April 26, 2004]

Sidebar: AMCC Strikes a Big Deal for PowerPC

Preview: Embedded Processor Forum 2004

New CPUs From ARM, Motorola, PMC-Sierra, TI, Tensilica, and More

During the two-day conference portion of Embedded Processor Forum 2004 — May 18–19, at the Fairmont Hotel in San Jose, California — new embedded processors, architectures, and synthesizable cores will be unveiled by Altera, AMD, ARM, Cradle, Emblaze, MobilEye, Motorola, PMC-Sierra, StarCore, Texas Instruments, Tensilica, Ultra Data, and VIA/Centaur. Almost all these presentations will be the first technical disclosures of their products. The new processors run the gamut from traditional RISC and CISC architectures to bold new designs optimized for communications, mobile multimedia, machine vision, and signal processing. [April 19, 2004]

Alchemy Adds Security Engine

AMD Network Processor Adopts SafeNet Encryption Technology

AMD's Alchemy family of MIPS32-based embedded processors has a new member that integrates a security engine for encrypted communications. The new Au1550 supports Internet Protocol security (IPsec) and the Secure Sockets Layer (SSL) protocol for virtual private networks (VPN). The Au1550 is the fourth, and most advanced, member of the embedded-processor family that AMD gained by acquiring Alchemy Semiconductor in 2002. [April 5, 2004]

Figure 1: Alchemy Au1550 block diagram.
Table 1: Feature comparison of the Alchemy Au1550, Alchemy Au1500, Alchemy Au1100, Alchemy Au1000, Intel XScale IXP425, and Motorola PowerQuicc II MPC8272.

AMD and Intel Harmonize on 64

Intel's 64-Bit x86 Extensions Are Largely Compatible With AMD64

An independent analysis by MPR indicates that the 64-bit x86 architectures from AMD and Intel are almost identical. We compared all the new instructions, modified instructions, deleted instructions, and modifications to the register files. We also compared the memory-addressing schemes and many other architectural features, such as data-addressing modes, context-switching behavior, interrupt handling, and support for existing 16- and 32-bit x86 execution modes. In every case, we found that Intel has patterned its 64-bit x86 architecture after AMD64 in almost every detail. However, we also found a few differences that could make some software written for one 64-bit architecture incompatible with the other architecture. [March 29, 2004]

Figure 1: The new x86-64 execution modes and their characteristics.
Figure 2: Comparison of the 32- and 64-bit x86 register files.
Table 1: New instructions in the 64-bit architectures from AMD and Intel.
Table 2: Deleted or reassigned instructions in the 64-bit architectures from AMD and Intel.
Table 3: Summary of similarities and differences between AMD64 and Intel's Extended-Memory 64 Technology (EM64T).
Sidebar: Intel and AMD Manuals Sing Similar Tunes

Xilinx Reconfigures Triscend

Acquisition Surprise: Xilinx Snatches Triscend From the Arms of ARM

Only weeks after ARM announced the acquisition of Triscend, Xilinx loosened ARM's grip with a higher bid and wrestled the small chip vendor away from ARM. The Xilinx deal is final, averting a further bidding war or the intercession of other suitors. As ARM had planned, Xilinx will absorb almost all 41 Triscend employees and phase out Triscend's corporate identity. [March 15, 2004]

ARC 700 Aims Higher

Redesigned Configurable CPU Shoots for Higher Performance

Only a few months after introducing the ARC 600 configurable processor, ARC International has announced another new core: the ARC 700. But it's not an egregious exercise in instant obsolescence. The new(er) ARC processor is fully compatible with its still-available predecessor and is intended for customers willing to tolerate a larger core in return for higher performance. ARC claims the ARC 700 is the smallest 400MHz 32-bit RISC core available — one-third the size of an ARM11 when fabricated in a 0.13-micron IC process — with lower power consumption to boot. [March 8, 2004]

Figure 1: ARC 700 pipeline/block diagram.
Table 1: New instructions in the ARC 700.
Sidebar: New CEO Brings Varied Background to ARC

Cirrus Logic Grows Ten ARMs

ARM9-Based Processors Extend Consumer/Industrial Maverick Family

Encouraged by the reception of its first ARM9-based processor in 2001, Cirrus Logic is rolling out 10 more chips with an ARM920T core. All are highly integrated system-on-chip (SoC) devices with impressive features and on-chip peripherals, but the feature creep isn't coming at a price — even the new high-end chip costs 37% less than Cirrus Logic's first ARM9 processor from three years ago. [March 1, 2004]

Figure 1: Block diagram of Cirrus Logic's Maverick EP9315 processor.
Table 1: Feature comparison of Cirrus Logic's EP9301, EP9303, EP9304, EP9305, EP9306, EP9307, EP9309, EP9310, EP9311, EP9312, and EP9315.

ARM Grabs Triscend

Purchase of Microcontroller Company Makes ARM a Chip Vendor

No more is ARM the "chipless chip company." ARM's surprise acquisition of Triscend, a microcontroller vendor in Silicon Valley, will make ARM a fabless semiconductor company for the first time — at least in a small way. However, the slight departure from ARM's traditional line of business is actually a strategic move intended to strengthen that business. ARM's goal is to seed the market for ARM-based 32-bit microcontrollers as the industry makes a transition from less powerful 8- and 16-bit chips. [February 17, 2004]

Table 1: Feature comparison of the Triscend A7VL05, Triscend A7VE05, Triscend A7VC05, Triscend A7VT05, Atmel AT91FR40xx, Hynix HMS39C70x, Oki ML67Q500x, and Philips LPC2106.

Extreme CPUs Defy Conventions

Radical Designs Attempt Quantum Leaps in Performance

This detailed year-in-review article examines the market for "extreme" processors and describes five such processors nominated for our 2003 Microprocessor Report Analysts' Choice Awards: the ClearSpeed CS301, Cradle ECE3400/MPE3400, Intrinsity FastMath, Elixent ET1, and Xelerated Xelerator X10q. [February 9, 2004]

Figure 1: Block diagram of the ClearSpeed CS301.
Figure 2: Block diagram of the Cradle ECE3400/MPE3400.
Figure 3: Circuit diagram of an OR gate implemented with Intrinsity's Fast14 logic.
Figure 4: Block diagram of the Elixent/Toshiba ET1.
Figure 5: Block diagram of the Xelerated Xelerator X10q.

Best Extreme Processor: Xelerated X10q

Massively Pipelined NPU Is the First 40Gb/s Packet Processor

The Xelerated Xelerator X10q has been chosen for the Microprocessor Report Analysts' Choice Award as the Best Extreme Processor of 2003. The Xelerator X10q deserves the award for both its extreme design, even by the standards of extreme processors, and its focused design, which doesn't allow complexity to obscure its utility. Although massively parallel processors are becoming almost commonplace, the X10q steps forward with a massively pipelined architecture. This unusual approach is justified for a high-performance packet processor that performs repetitive tasks in serial fashion. [February 9, 2004]

Figure 1: Diagram of the X10q's unique pipeline, which is more than 1,000 stages long.

Media Processors Poised to Pounce

Digital Engines Power Next-Generation Consumer Electronics

This comprehensive year-in-review article examines the growing market for media processors and describes five such processors nominated for our 2003 Microprocessor Report Analysts' Choice Awards: Equator Technologies' BSP-15; Intel's MXP5800; Motorola's MRC6011; Philips Semiconductor's TriMedia TM5250; and Silicon Hive's Avispa+. [February 9, 2004]

Figure 1: Block diagram of the Intel MXP5800.
Figure 2: Block diagram of the Motorola MRC6011.
Figure 3: Pipeline diagram of the Philips TriMedia TM5250.
Figure 4: Conceptual diagram of Silicon Hive's feedback-driven automated processor-design tools.
Sidebar: Media-Related Processors in 2003 (including numerous links to previous articles in Microprocessor Report)

Best Media Processor: TriMedia TM5250

Philips Updates a Classic, Achieves High Benchmark Scores

The Philips TriMedia TM5250 has been chosen for the Microprocessor Report Analysts' Choice Award as the Best Media Processor of 2003. The TM5250 deserves the award for proving that smart design work can keep a 10-year-old media-processor architecture competitive against newer, more-extreme architectures without sacrificing software compatibility. [February 9, 2004]

Table 1: EEMBC consumer-suite benchmark scores and Philips MediaStone benchmark scores for the Philips TM5250, Philips TM1300, Philips PNX1300, Tensilica Xtensa V, Tensilica Xtensa III, Motorola PowerPC MPC7447, ARC ARCtangent-A4, and Hitachi/SuperH SH-4.

ClearSpeed Hits Design Targets

Early Samples of Floating-Point Coprocessors Are Fast, Power-Efficient

ClearSpeed Technology has successfully tested early production samples of its CS301 floating-point coprocessor and is delivering small quantities to prospective customers. The massively parallel chip is hitting all its design targets for clock frequency (200MHz), power consumption (less than 2W), and peak floating-point performance (25.6 GFLOPS). [January 12, 2004]

Figure 1: Screen shot of a math-intensive drug-simulation program from the University of Bristol, England, running on a PC with six ClearSpeed CS301 coprocessors.

ARM Expands ARM11 Family

Significant New Features to Debut in ARM1156 and ARM1176 Cores

ARM's latest synthesizable processor cores will introduce several eagerly anticipated features when they ship to licensees in the second quarter of the new year. Enhancements cover the gamut from security and power management to code compression and on-chip I/O. It's a significant growth spurt for the youthful ARM11 family, which will celebrate only its second birthday in 2004. [January 5, 2004]

Table 1: Feature comparison of the ARM1156T2F-S, ARM1156T2-S, ARM1176JZF-S, ARM1176JZ-S, ARM1136JF-S, and ARM1136J-S.

Sonics Gains Acceptance

On-Chip Interconnect Wins Customers, Promotes Standards

Since its founding in 1997, Sonics has been gradually establishing its on-chip interconnect technology among important customers like Broadcom, Flextronics, Fujitsu, Hitachi, Hughes, Intel, NASA, NEC, Nokia, Samsung, Texas Instruments, and Toshiba. Last fall, TI licensed additional Sonics technology for its OMAP wireless-communication processors, and an industry-standards body adopted the core-interface protocol backed by Sonics. Sonics' latest product is SiliconBackplane III, a new version of its interconnect fabric. [December 22, 2003]

Figure 1: Block diagram of the Sonics SiliconBackplane III, Synapse 3220, and MemMax on-chip interconnects.

ARC Alters Trajectory

Faster CPU Core Has New Tools, Audio Extensions, Licensing Options

ARC International was the first to license a customizable microprocessor core, but financial success has been elusive, and new competitors keep emerging. In a bid to regain the initiative, ARC has extensively revamped its product line. The most significant announcement is the ARC 600 processor core, a successor to the ARCtangent-A5, the company's two-year-old flagship product. ARC has also decided to offer preconfigured CPU cores for vertical applications, and the first example is an ARC 600 with new hardware extensions and software codecs designed for portable digital-audio products. [December 15, 2003]

Figure 1: ARC 600 pipeline diagram.
Figure 2: Screen shot of ARChitect 2, the improved version of ARC's graphical processor-configuration tool.
Table 1: Instruction-set extensions for the ARC 600 Digital Audio Platform.
Sidebar: New Leadership Tries to Revitalize ARC

Silicon Hive Breaks Out

Philips Startup Unveils Configurable Parallel-Processing Architecture

Parallel lines never meet, but great minds think alike. Maybe that explains the convergence of parallel processors at this year's Microprocessor Forum and Embedded Processor Forum. The latest example is a configurable parallel-processing architecture from Silicon Hive, a Netherlands-based startup funded by Philips Electronics. Silicon Hive has created what it calls an ultralong instruction-word (ULIW) architecture — an apt description. With instruction words that stretch up to 768 bits long, each containing scores of operations, Silicon Hive's ULIW architecture surpasses every known VLIW machine. [December 1, 2003]

Table 1: Comparison of Silicon Hive's Avispa and Avispa+ processor cores.
Table 2: Benchmark data showing the efficiency of some common signal-processing algorithms running on simulations of the Avispa and Avispa+ cores.
Figure 1: Block diagram of the configurable ULIW architecture.
Figure 2: Screen shot of a profiling tool showing the efficiency of an FFT running on an Avispa processor.
Figure 3: Diagram of Silicon Hive's feedback-driven configurable design system.

Floating Point Buoys ClearSpeed

Massively Parallel Processor Delivers 25.6 Peak GFLOPS at 200MHz

ClearSpeed, a U.K.-based startup, revealed a massively parallel CPU architecture at Microprocessor Forum 2003 that's intended to revive the market for floating-point coprocessors. ClearSpeed's strategy is to offer much higher floating-point performance at much lower power levels than general-purpose CPUs do, enabling designers to build faster embedded systems and accelerator cards for PCs, workstations, and servers. Instead of bottom-trawling for the mass market, though, ClearSpeed is fishing for customers willing to spend $975 per chip for 25.6 billion floating-point operations per second (GFLOPS). [November 17, 2003]

Figure 1: Labeled die photo of ClearSpeed's CS301 chip.
Figure 2: Block diagram of a processing element, one of 64 in the CS301.
Figure 3: Block diagram of the CS301.
Table 1: Informal FFT benchmarks comparing the CS301 to a Motorola PowerPC MPC7410 processor.
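The headline figures can be cross-checked with some simple arithmetic. The summary gives 64 processing elements, a 200MHz clock, 25.6 peak GFLOPS, and a $975 price. The rate of 2 floating-point operations per PE per cycle is inferred from those numbers, not taken from a ClearSpeed specification, so treat the sketch below as a consistency check only.

```python
# Peak-throughput arithmetic for the ClearSpeed CS301, using the figures quoted above.
# ASSUMPTION: FLOPS_PER_PE_CYCLE = 2 is inferred from 25.6 GFLOPS / (64 PEs x 200MHz);
# it is not a published ClearSpeed specification.

NUM_PES = 64            # processing elements in the CS301 (per the Figure 2 caption)
CLOCK_HZ = 200e6        # 200MHz clock
FLOPS_PER_PE_CYCLE = 2  # inferred per-PE rate (see ASSUMPTION above)
PRICE_USD = 975         # quoted price per chip


def peak_gflops(pes: int, clock_hz: float, flops_per_cycle: int) -> float:
    """Peak rate = PEs x clock x FLOPs per PE per cycle, expressed in GFLOPS."""
    return pes * clock_hz * flops_per_cycle / 1e9


gflops = peak_gflops(NUM_PES, CLOCK_HZ, FLOPS_PER_PE_CYCLE)
dollars_per_gflops = PRICE_USD / gflops

print(f"peak: {gflops:.1f} GFLOPS")                   # peak: 25.6 GFLOPS
print(f"cost: ${dollars_per_gflops:.2f} per GFLOPS")  # cost: $38.09 per GFLOPS
```

At roughly $38 per peak GFLOPS, the price reflects the niche (accelerator cards and high-end embedded systems) that the summary says ClearSpeed is targeting.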

Philips Powers Up for Video

Superpipelined TriMedia Processor Core Gives New Legs to MPEG

With an eye on the growing market for consumer electronics, Philips Semiconductors announced a new TriMedia 32-bit processor core at Microprocessor Forum 2003. The swifter core will debut next year in Philips media processors destined for personal video recorders, wireless networks, high-definition TVs, and other audio/video products. Unlike some previous TriMedia CPU cores, the new TM5250 won't be offered as licensable intellectual property (IP). Instead, the TM5250 will spawn a new generation of standard-part Nexperia media processors designed and manufactured by Philips. [November 3, 2003]

Table 1: Comparing the new TM5250 to the Nexperia PNX1300 (formerly TriMedia TM1300).
Table 2: EEMBC consumer-suite benchmark scores and Philips MediaStone benchmark scores for the Philips TM5250, Philips TM1300, Philips PNX1300, Tensilica Xtensa V, Tensilica Xtensa III, Motorola PowerPC MPC7447, ARC ARCtangent-A4, and Hitachi/SuperH SH-4.
Figure 1: Pipeline diagrams for various TM5250 function units.
Figure 2: Estimated MediaStone scores for the TM5250 at clock frequencies ranging from 300MHz to 900MHz.
Photo: The TM5250 design team in Sunnyvale.

Motorola Enhances StarCore DSP

SC140e Core Offers New Instructions, Caches, and Task Protection

Some of Motorola's latest 3G mobile phones will use an enhanced StarCore DSP that the company revealed at Microprocessor Forum 2003. The new SC140e core has several advanced features not found in other StarCore DSPs, including a new memory subsystem and a user-level privilege mode. Motorola says the enhancements will eventually appear in a future architecture from StarCore LLC, a spinoff formed last year by Motorola, Infineon, and Agere (formerly Lucent). [October 20, 2003]

Figure 1: StarCore SC140e block diagram.
Figure 2: How the SC140e's new cache-locking scheme works.
Photo: The SC140e main design team in Israel.
Sidebar: StarCore LLC Offers Soft DSPs
- Table: Comparing features of StarCore LLC's SP1201, SP1202, SP1203, SP1401, SP1402, and SP1403 synthesizable DSP cores.

PicoChip Makes a Big MAC

Massively Parallel Chip Has 260 Multiply-Accumulate Processors

If you want a Big Mac, go to McDonald's. If you want a big MAC, see picoChip Design. The U.K.-based company is introducing the PC102, a massively parallel communications chip that contains 344 processors, including 260 with multiply-accumulate (MAC) units. picoChip COO and chief architect Peter Claydon announced the PC102 at Microprocessor Forum 2003. [October 14, 2003]

Table 1: Comparing features of PicoChip's PC101 and PC102.
Figure 1: Programmers can specify signal paths by using a structural subset of VHDL.

Tensilica Makes Music

New Extensions and Software Codecs Accelerate Digital Audio

Tensilica has introduced a new package of audio extensions and software codecs for its Xtensa V configurable microprocessor core. Known as the HiFi Audio Engine, the optional extensions include 54 new instructions that accelerate common algorithms for digital-audio encoding and decoding. Tensilica's target applications are portable MP3 players, automotive sound systems, TV set-top boxes, smart phones, and home entertainment systems. [September 29, 2003]

Table 1: HiFi Audio Engine instruction mnemonics and descriptions.
Table 2: HiFi Audio Engine registers.

ARC Accelerates Cryptography

New Processor Extensions and Software Boost AES and DES

ARC International is offering a new package of microprocessor extensions and security software that can improve the performance of common cryptographic algorithms by an order of magnitude or more. The package, called ARCprotect, includes new instructions, registers, and middleware as licensable intellectual property (IP) for the current ARCtangent-A5 and future ARC microprocessor cores. The extension instructions focus on the popular AES, DES, and 3DES (triple DES) encryption/decryption algorithms that are the foundation of many network-security protocols. [September 22, 2003]

Table 1: The new ARCprotect instructions and registers.

Triscend Revs Up for Motors

New Field-Configurable MCUs Aim at Industrial Applications

Trying to ascend above the fray of 32-bit microcontrollers, Triscend is revising its line of ARM7-based field-configurable MCUs. New on-chip peripherals and features will make them more suitable for industrial applications — particularly for motor controllers. The new A7V05 family initially has four members, all with ARM7TDMI processor cores running as fast as 70MHz. They will supersede the company's current line of ARM7-based MCUs next year. [September 15, 2003]

Table 1: Feature comparison of the new Triscend A7VL05, A7VE05, A7VC05, and A7VT05 microcontrollers.
Figure 1: Block diagram of the Triscend A7VT05.

ARM Dons Armor

TrustZone Security Extensions Strengthen ARMv6 Architecture

"Trusted computing" is such a hot topic that a dictionary editor recently asked MPR if she should include the term in the next edition she's compiling. Nothing validates a trend like the migration of technobabble to everyday language. To make designing secure embedded systems easier, ARM is adding new security extensions to the ARMv6 architecture. The new TrustZone extensions are relatively simple, consisting primarily of one new instruction, a new configuration bit, and an additional permission level that supplements the existing user and privileged modes. [August 25, 2003]

Figure 1: Block diagram of an ARM-based system-on-chip processor with various blocks secured by TrustZone.

PicoChip Preaches Parallelism

Massively Parallel PC101 Chip Has 430 16-Bit Processors

Among the most unusual microprocessors unveiled at Embedded Processor Forum 2003 was picoChip Design's new PC101, a massively parallel device that integrates 430 16-bit processors on a single die. Indeed, the PC101's resources are so abundant that, to some degree, they are expendable — the chip's internal bus fabric can bypass a few processors ruined by manufacturing defects. Designed for cellular-telephony and wireless-network base stations, the PC101 is the first implementation of picoChip's picoArray architecture. [July 28, 2003]

Table 1: The PC101's long-instruction-word (LIW) format.
Figure 1: Block diagram of a picoChip PC101 array processor.
Figure 2: Block diagram of the picoArray, with example signal flows.
Figure 3: Diagram of multiple picoArrays chained together.
Photo: Peter Claydon, picoChip's CEO and chief architect of the picoArray architecture, describes the PC101 at Embedded Processor Forum.

News Items

July 2003

Equator Revs Media Processor: Equator Technologies unveiled a new member of its media-processor family at Embedded Processor Forum 2003, claiming the chip will deliver more signal-processing performance than any other VLIW architecture. The new processor, the BSP-16, is scheduled for production in 3Q04. [July 28, 2003]

Elixent Expands SoCs

Licensable Array Processor Has Massively Parallel Architecture

Seeking a soft spot between a rock and a hard place, U.K.-based Elixent is introducing a massively parallel processor core that strives to combine the programmability of a general-purpose processor with the performance of a hard-wired ASIC. The goal: a more flexible system-on-chip (SoC) processor that consumes less power and adapts quickly to different tasks, amortizing the development costs of an SoC over multiple projects. [July 21, 2003]

Photo: Elixent's Alan Marshall describes the D-Fabrix architecture at Embedded Processor Forum 2003.
Figure 1: A basic "repeating block" in the D-Fabrix architecture.
Figure 2: Higher-level block diagram of Toshiba's ET1 implementation.
Table 1: Comparing the performance of a hypothetical D-Fabrix processor with a DSP when interpolating RGB pixels.

Motorola Attacks ASICs

Programmable Communications Processor Offers Design Flexibility

More frightening than any Halloween mask is the over-$1 million price tag on a deep-submicron mask set. No wonder everyone is looking for ways to exorcise the demon. Motorola's latest weapon is the MRC6011, a new chip that has a programmable RISC controller, internal peripherals, and six DSP cores, each with 16 function units. Designed primarily for wireless infrastructures, the MRC6011 is an off-the-shelf alternative to a costly ASIC project or a conventional DSP. [July 14, 2003]

Photo: Motorola's Roman Robles describes the MRC6011 at Embedded Processor Forum.
Figure 1: High-level MRC6011 block diagram.
Figure 2: DSP core block diagram.
Sidebar: Defining Reconfigurable Processing (by Nick Tredennick).

Tensilica's Software Makes Hardware

New Tool Customizes Processor by Analyzing C/C++ Code

At Embedded Processor Forum 2003 last week, Tensilica unveiled an impressive addition to the tool chain for its Xtensa configurable-processor architecture. A new code-analysis and hardware-generation tool — so new it doesn't have a catchy name — automatically creates processor extensions that accelerate critical functions in C/C++ source code. Custom extensions can include new instructions, registers, and function units. In minutes, the tool can evaluate thousands of possible extensions and sort them by performance (clock cycles) and efficiency (gate count). When a developer selects the optimal design for the target application, the tool automatically generates the extension in Tensilica's proprietary hardware design language and integrates it with the Xtensa processor core, ready for logic synthesis. [June 23, 2003]

Photo: Tensilica's Dror Maydan, director of software development, presenting at EPF2003.
Figure 1: Screen shot of Tensilica's graphical tool for selecting critical functions of C/C++ programs to optimize.
Figure 2: Screen shot of the tool's performance/efficiency analysis chart.
Table 1: Eleven variations of automatically generated custom extensions with their estimated gate counts, software speedups, and acceleration techniques.
Table 2: Example algorithms accelerated with custom extensions, their speedup factors, the number of possible configurations evaluated, elapsed time for evaluations, and before-and-after code-size data.
Sidebar: Xtensa Xplorer Unites Tool Chain
- Figure: Screen shot of Xtensa Xplorer's Pipeline Viewer.
- Figure: Screen shot of the Cache Xplorer.
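The summary says the tool sorts thousands of candidate extensions by performance (clock cycles) and efficiency (gate count) so that a developer can pick the best tradeoff. One generic way to narrow such a list is to keep only the Pareto-optimal candidates: those that no other candidate beats on both axes. The sketch below is hypothetical. The `Candidate` type, the candidate names, and the numbers are invented for illustration, and this is not a description of how Tensilica's tool actually works.

```python
# Hypothetical sketch: keep only Pareto-optimal processor extensions, where a
# candidate is discarded if another is at least as good on both cycles and gates.
# Candidate names and numbers are invented; this is not Tensilica's algorithm.
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    cycles: int  # estimated clock cycles for the critical function (lower is better)
    gates: int   # estimated added gate count (lower is better)


def pareto_front(cands: list[Candidate]) -> list[Candidate]:
    """Return non-dominated candidates, ordered from fastest to slowest."""
    ordered = sorted(cands, key=lambda c: (c.cycles, c.gates))
    front: list[Candidate] = []
    best_gates = float("inf")
    for c in ordered:
        # Every earlier candidate is at least as fast, so a later one survives
        # only if it needs strictly fewer gates than all of them.
        if c.gates < best_gates:
            front.append(c)
            best_gates = c.gates
    return front


candidates = [
    Candidate("baseline", cycles=10_000, gates=0),
    Candidate("mac16", cycles=4_000, gates=8_000),
    Candidate("simd4", cycles=2_500, gates=30_000),
    Candidate("mac16+fused", cycles=4_000, gates=12_000),  # dominated by mac16
    Candidate("wide-simd", cycles=2_600, gates=45_000),    # dominated by simd4
]

for c in pareto_front(candidates):
    print(c.name, c.cycles, c.gates)
# simd4 2500 30000
# mac16 4000 8000
# baseline 10000 0
```

Sorting by cycles first means a single pass with a running minimum of gate counts is enough. The developer would then choose among the survivors, trading added silicon for speed.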

Intel Maps Wireless Future

Goals Include Reconfigurable Radio Chips, Digital Radio on CMOS

Intel is mapping an ambitious strategy to virtually eliminate the hardware cost of wireless integration by making digital radios inexpensive enough to build into almost any chip. Two thrusts of the so-called Radio Free Intel initiative are a new microprocessor architecture and better radio integration with mainstream fabrication technology. The first goal is to create multiband communications processors that can automatically reconfigure themselves on the fly for different wireless standards. The other goal is to integrate a wireless-baseband processor and analog front end on a single CMOS chip — without the extra costs of external components, exotic semiconductors, or additional processing steps during fabrication. [June 9, 2003]

Figure 1: Block diagram of Intel's prototype reconfigurable communications architecture.

News Items

June 2003

Josh Fisher Wins Eckert-Mauchly Award: Joseph "Josh" Fisher, a Hewlett-Packard senior fellow, will receive the prestigious Eckert-Mauchly Award this week at the International Symposium on Computer Architecture in San Diego, California. Fisher is winning the award for his pioneering work on VLIW. [June 9, 2003]
Tensilica Patent Challenged: MPR has learned that an unknown party is asking the U.S. Patent and Trademark Office to reexamine one of the patents on configurable development tools issued last year to Tensilica. The challenge doesn't attempt to overturn the entire patent, but it does try to narrow the scope of about half the patent's broadest claims. The challenged patent is number 6,477,683, one of two issued to Tensilica on November 5, 2002. [June 2, 2003]

Philips Shows Flashy MCUs

ARM7-based Microcontrollers Have Embedded Flash Memory

Eight-bit chips still account for 56% of microprocessor sales by volume and 40% of revenues, according to World Semiconductor Trade Statistics. Embedded-systems developers keep using these puny chips because they are unbeatably cheap, sip minuscule amounts of power, and are small enough to add a dab of silicon intelligence to almost anything large enough to see. Hoping to displace 8- and 16-bit chips with more-powerful (and more-profitable) devices, Philips Semiconductors is introducing a new line of ARM7-based 32-bit MCUs. To sweeten the bait, Philips is fabricating the new MCUs in a special 0.18-micron CMOS process that offers the option of embedded flash memory. [May 19, 2003]

Table 1: Comparison of ARM7TDMI-based general-purpose microcontrollers with embedded flash, including the Philips LPC2104, Philips LPC2105, Philips LPC2106, Atmel AT91FR40xx series, Atmel AT91Fxxxxx series, Hynix HMS39C70x series, Oki ML67Q400x series, and Oki ML67Q500x series.

Intel's Pentium M Gets Embedded

Mobile Processors and Chip Sets Aim for Communications Market

Intel's newest microprocessor for mobile PCs is hardly out the door, but already the Intel Communications Group is promoting it as a high-performance embedded processor for networking. Intel is also sketching a roadmap for future chip sets that will improve the differentiation between its embedded and PC/server platforms. The "new" embedded processor is the Pentium M, formerly known as Banias, which Intel introduced in March as part of the Centrino mobile PC platform. Intel's intention is to sell the embedded Pentium M into high-performance communications-infrastructure applications, such as core routers, server blades, and network controllers. [May 12, 2003]

Table 1: Comparison of the Intel Pentium M, LV Pentium M, LV Pentium III, VIA C3 E-Series, Transmeta Crusoe SE, IBM PowerPC 750FX, Motorola PowerPC MPC7457, PMC-Sierra RM9000x2, and Broadcom BCM1250.
Table 2: Comparison of three Intel core-logic chipsets for the Pentium M: E7501, 855GM, and 855PM.

Ubicom's New NPU Stays Small

IP3023 Packet Processor Has Efficient I/O Architecture

Small is good if you're a jockey, a designer dog, or a microprocessor chip. Ubicom (meaning "ubiquitous communications") is a company that definitely thinks small when it designs packet processors for wired and wireless systems. Its latest NPU is the IP3023, which requires only about 50% as much silicon and 10% as much memory as some competing chips. Ubicom designed the IP3023 for wireless access points, wireless LAN (WLAN) bridges, broadband modems, home routers, and other consumer or enterprise products that operate near the edge of a network. The company's goal is to slash the bill of materials (BOM) for those systems by offering an efficient packet processor that reduces or eliminates the need for off-chip memory and protocol-specific I/O chips. [April 21, 2003]

Table 1: Comparison of Ubicom's IP3023, IP2022, and IP2012 chips.
Table 2: IP3023 instruction set.
Figure 1: IP3023 block diagram.
Figure 2: IP3023 pipeline diagram.
Figure 3: Scaled die photo of Broadcom's BCM4710 and die plot of Ubicom's IP3023 with major function blocks labeled.

IBM Opens Up PowerPC Licensing

Customers Can Take Hard or Soft Cores to Any Foundry

IBM's long-awaited decision to openly license PowerPC cores will offer formidable new competition for ARM, MIPS Technologies, and other vendors of 32-bit microprocessor cores. It's not just that PowerPC is a popular, scalable architecture with a wealth of development tools and software. IBM's marketing muscle will give the company an instant presence in the intellectual property (IP) marketplace, and its rocklike stability makes it a safe haven for nervous customers in tough times. [March 31, 2003]

Octera Throws a Javalon

Java-like Synthesizable Processor Targets Deeply Embedded Systems

San Diego-based Octera is introducing Javalon-1, a synthesizable microprocessor core that natively executes Java bytecode instructions. Javalon-1 is the first member of a small family of cores that will have minor variations on the same basic design. Chip designers can use Javalon-1 as the basis for a self-sufficient microcontroller or as a slave to another microprocessor core on an SoC or ASIC. [March 17, 2003]

Table 1: The Java bytecode instruction set.
Figure 1: Javalon-1 block diagram.
Photo: Octera's Javalon-1 design team.

MIPS Embraces Configurable Technology

Pro Series Processors With CorExtend Compete With ARC and Tensilica

MIPS Technologies is the latest company to endorse the concept of configurable processors. At Embedded Processor Forum 2002, MIPS introduced the M4K Pro synthesizable processor core, which allows customers to add application-specific instructions. More recently, MIPS announced that all soft processors in the Pro Series are user extendable, thanks to a technology MIPS refers to as CorExtend. Initial Pro Series cores, in addition to the M4K, are the 4KSd, 4KEp, 4KEm, and 4KEc. This article analyzes CorExtend and compares it with the configurable-processor technology from ARC International and Tensilica. [March 3, 2003]

Table 1: Feature comparison of the MIPS M4K Pro, 4KEp Pro, 4KEm Pro, 4KEc Pro, 4KSd Pro, ARCtangent-A5, and Xtensa V processor cores.
Figure 1: MIPS CorExtend instruction format.
Figure 2: How a custom instruction can replace a whole loop of assembly code.
Figure 3: The MIPS graphical configuration tool.
Figure 4: Tensilica's Processor Generator configuration tool.
Figure 5: ARC's ARChitect configuration tool.

Embedded News

March 2003

Chartered Seeks Lucrative Customers: Reeling from what it describes as the sharpest market decline in the history of the industry, Chartered Semiconductor plans to close its oldest fab and refocus on customers that need advanced fabrication processes. The Singapore-based company's goal is to increase its fab capacity for 0.18-micron and smaller processes from 15% in 2002 to 50% by the end of 2004, without expanding total capacity. [March 3, 2003]

Soft Cores Gain Ground

Key Trends Are Higher Speeds, Better Architectures, Configurability

Our year-end review covers five vendors whose 32-bit processor cores were nominated for a Microprocessor Report Analysts' Choice Award in the IP Core Processor category: ARC International (formerly ARC Cores), ARM Holdings, Improv Systems, MIPS Technologies, and Tensilica. The nominated cores are: ARC's ARCtangent-A5 with the ARCompact instruction-set architecture; ARM's ARM1026EJ-S and ARM1136JF-S; Improv Systems' Jazz DSP with Crescendo solution kit; MIPS' M4K Pro Series; and Tensilica's Xtensa V. The winner: ARM's ARM1136JF-S, the first ARM11 processor core. [February 18, 2003]

Table 1: Feature comparison of the six nominated processor cores.
Figure 1: EEMBC TeleMark out-of-the-box benchmark scores for the ARCtangent-A4, ARM1020E, Improv Jazz XT, Improv Jazz 2020, TriCore TC1M, Intrinsity FastMath, LSI ZSP500, MIPS-20Kc, Xtensa III, and Xtensa V processor cores.
Figure 2: EEMBC TeleMark optimized benchmark scores for the ARCtangent-A4, Jazz XT, FastMath, LSI ZSP 500, Xtensa III, and Xtensa V processor cores.
Figure 3: EEMBC ConsumerMark optimized benchmark scores for the ARCtangent-A4, TriMedia TM1300, Xtensa III, and Xtensa V processor cores.
Sidebar: Important Embedded-IP Events of 2002
Sidebar: ARCompact: An Elegant 16/32-Bit ISA

Intel Gets Extreme in 2009

Extreme Ultraviolet Lithography Scheduled for Mass Production

Intel claims it will be the first company to mass-produce microprocessors using extreme ultraviolet (EUV) lithography, a revolutionary new photomask technology. Pilot production is scheduled to begin with the 45nm fabrication process in 2007-2008, using tools and techniques now being refined. Mass production is scheduled to debut with the 32nm fabrication process in 2009. EUV lithography will enable a significant leap forward in the circuit density of chips, because the shorter-wavelength light allows stepper tools to draw features at least 10 times smaller than is possible with today's technology. [February 10, 2003]

Figure 1: The progress of circuit feature sizes (Moore's law), lithography wavelengths, and lithography processes, 1989-2011.
Figure 2: The progress of lithographic photomask technology, 1992-2005.

AMD and IBM to Develop Fab Technology

Engineers Will Collaborate On 65nm and 45nm Fabrication Processes

AMD's new alliance with IBM to jointly develop 65nm and 45nm fabrication technology should relieve some pressure on AMD in the race to keep up with Intel. It also raises the possibility that AMD will take the larger step of using IBM as a foundry for chip manufacturing — either instead of or in addition to outfitting its own 65nm and 45nm fabs. [January 27, 2003]

Figure 1: AMD's technology roadmap.

Transmeta Charges the Embedded Market

Searching for New Customers in the Shadow of Banias

With the debut of Intel's new Pentium M mobile-PC processor (code-named Banias) only months away, Transmeta is trying to expand the potential market for its competing x86-compatible chips. The company has announced a new line of Crusoe SE (Special Embedded) processors aimed at embedded systems that need to run x86 software with high performance and relatively low power consumption. [January 13, 2003]

Table 1: Comparison of the basic specifications of Crusoe SE chips with Intel's low-power Pentium III embedded processors.

News Items

January 2003

Transmeta Shows New TM8000 Astro: Transmeta is showing customers first samples of its next-generation mobile PC processor, promising to deliver production quantities by 3Q03. Although Transmeta is withholding most architectural details about the new TM8000 Astro until it's closer to release, the company says the processor will offer higher performance and additional power-management features. [January 6, 2003]

IBM Adds Strained Silicon to SOI

Advanced Fabrication Technology Accelerates Future Transistors

IBM Microelectronics has successfully produced the first short-channel nMOS transistors using silicon germanium (SiGe) and strained silicon with a silicon-on-insulator (SOI) substrate. The test chips, which have thousands of operational transistors, pave the way for IBM to introduce a combination SOI/strained-silicon fabrication process with 65-nanometer (nm) lithography in 2005. The payoff will be higher clock frequencies or lower power consumption, depending on the chip designer's priorities. [December 30, 2002]

Figure 1: At the base of IBM's experimental transistor are thin layers of strained silicon and silicon germanium on the wafer's silicon oxide substrate. The surrounding cobalt disilicide on the raised source/drain layer reduces source-to-drain resistance.

Tensilica Patents Raise Eyebrows

Legal Protection of Configurable-CPU Technology Could Frustrate Competitors

(With Rich Belgard)

Tensilica has been granted two U.S. patents for its system of automatically generating a custom microprocessor core and compatible software-development tools, and the company has 16 more patent applications pending. If archcompetitor ARC International is granted U.S. patents for dozens of similar applications now pending — in addition to the international patents it already holds — the result could be a legal minefield for any other companies that try to offer configurable-processor technology. [December 9, 2002]

Earlier Configurable Processors: Close, But No Cigar: Neither ARC nor Tensilica invented the concepts of customizable microprocessors or customizable processors with compatible software-development tools. However, MPR has been unable to find previous examples that duplicate Tensilica's system.

Embedded News

December 2002

VLIW Pioneer Bob Rau Dies: Dr. Bob Rau, Hewlett-Packard Fellow, pioneer of VLIW architectures, and recipient of numerous awards, died of cancer at his home in Los Altos, California, on December 10. He was 51. Before joining HP in 1989, Rau cofounded Cydrome in 1984 and was chief architect of the Cydra-5 computer, one of the first VLIW systems. [December 30, 2002]
IBM and Chartered Join Forces With Fabs: Fabrication technology and fabs are getting so expensive that even the biggest companies are forming alliances to share the burden. The latest linkup is between IBM, a leading innovator in chip technology, and Chartered Semiconductor Manufacturing, the world's third-largest independent chip foundry. Their open-ended multiyear agreement includes joint technology development and shared fab capacity. [December 16, 2002]
China Unveils MIPS-like CPU: A research group sponsored by the Chinese government has developed a MIPS-like microprocessor and has licensed the design to a Chinese startup company. The Beijing-based startup, BLX IC Design Corp., is currently sampling the chip and plans to begin production in 1Q03. It will be China's first commercial 32-bit microprocessor. [December 2, 2002]

Intel Spills the Beans About Banias

New Mobile CPU and Chip Set Have Numerous Power-Saving Features

Intel has disclosed intriguing details about its future Banias mobile processor at recent industry conferences, including Microprocessor Forum 2002. In addition to describing some of the chip's power-saving techniques, Intel emphasizes that Banias is a comprehensive "mobile platform," not just a lower-power CPU. At launch in 1H03, the platform will include mobile processors at various speed grades, two core-logic system chip sets, and dual-band 802.11a/b wireless networking. [November 25, 2002]

Photo: Mooly Eden, general manager of Intel's Israel Design Center, unveils new details about Banias's power-saving techniques at MPF 2002.
Table 1: Intel modified some NAND gates in the cache logic to reduce static current leakage.
Figure 1: The three-level branch predictor in Banias improves on the two-level dynamic branch prediction in Pentium III processors, which is believed to be the starting point for the Banias design.

VIA Keeps It Simple

C5XL Processor Finally Gets SSE, Faster FPU on Smaller Die

Glenn Henry, the outspoken president of VIA Technologies' Centaur microprocessor division, described VIA's new C5XL processor and updated his product roadmap at the recent Microprocessor Forum 2002 in San Jose. The C5XL appears to achieve the impossible: it adds a deeper pipeline, Intel-compatible SSE extensions, a faster FPU, support for two-way multiprocessing, a more-efficient L2 cache, and other improvements while actually shrinking its size in the same fabrication process. [November 11, 2002]

Photo: Glenn Henry presents the C5XL at MPF 2002.
Table 1: Comparison of the new C5XL and existing C5N.
Figure 1: VIA's processor roadmap 2002-2004.
Figure 2: Benchmark results for the VIA C5XL, VIA C5N, Intel Pentium III Celeron, and Intel Pentium 4 Celeron with Business Winstone 2001, Quake 3 Demo 2, 3D WinMark 2000, and CPUmark99 on a low-end system with a $40 graphics card.
Figure 3: Benchmark results for the VIA C5XL, VIA C5N, Intel Pentium III Celeron, and Intel Pentium 4 Celeron with Business Winstone 2001, Quake 3 Demo 2, 3D WinMark 2000, and CPUmark99 on a lower-end system with integrated graphics.

IBM PowerPC 405EP Expands Family

SoC Targets Wireless LANs, Edge Routers, Broadband Modems

IBM is sampling the PowerPC 405EP, the newest member of its popular 405 family of SoCs. Among other features, the 405EP adds a second 10/100Mb/s Ethernet controller for greater flexibility in small routers and wireless LAN access points. The 405EP is also intended for DSL/cable modems and other network-edge products. The new chip is based on the 405D4 embedded processor core, an evolution of the 405B3 core introduced with the first chip in this series, the 405GP. [November 11, 2002]

Table 1: Comparison of the new PowerPC 405EP with the existing 405GP and 405GPr.

IBM Trims Power4, Adds AltiVec

64-Bit PowerPC 970 Targets Entry-Level Servers and Desktops

Rarely does a downsized product raise expectations for high performance. But by trimming down the awesome 64-bit Power4 server processor and adding AltiVec media extensions, IBM has created an impressive and affordable PowerPC chip for smaller servers, graphics workstations, and desktop computers. Nobody at IBM would confirm rumors that a leading customer for the PowerPC 970 is Apple — and Apple is even more tight-lipped. Nevertheless, the 970 is such an obvious improvement over today's Motorola G4-family PowerPC chips that it's hard to imagine Apple using anything else in its top-of-the-line desktop Macs and servers. [October 28, 2002]

Table 1: Feature comparison of the IBM PowerPC 970, IBM Power4, Motorola G4+, Motorola G4, and IBM/Motorola G3.
Figure 1: PowerPC 970 block diagram.
Figure 2: PowerPC 970 pipeline diagram.
Photo: IBM's Peter Sandon describes the PowerPC 970 at the recent Microprocessor Forum.

Faster Desktop CPUs From AMD, Intel

Athlon XP Debuts 333MHz Front-Side Bus; Celeron Hits 2GHz

AMD and Intel have introduced faster versions of their Athlon XP and Celeron desktop processors, including the first Athlon with a 333MHz front-side bus and the first Celeron to reach 2GHz. The new Athlon XP 2800+ and 2700+ processors run at core clock frequencies of 2.25GHz and 2.17GHz, respectively, and are priced at $397 and $349. Samples are available now, but production parts won't ship until November — and then only in limited numbers. [October 16, 2002]

Figure 1: AMD's core and front-side bus frequencies, 1999-2002.
Figure 2: Intel's core and front-side bus frequencies, 1999-2002.
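The figures chart core and front-side bus frequencies together because the core clock is derived from the bus clock through a multiplier. The sketch below shows that relationship for the two new Athlon XP chips. It assumes, beyond what the summary states, that the "333MHz" Athlon XP bus is double-pumped: it transfers data on both clock edges of a 166.67MHz base clock. The multipliers it prints are computed from the quoted frequencies, not taken from an AMD data sheet.

```python
# Core clock = bus base clock x multiplier.
# ASSUMPTION: the "333MHz" Athlon XP front-side bus is double-data-rate, so its
# base clock is 333.33MHz / 2 = 166.67MHz. Multipliers are derived, not quoted.
from fractions import Fraction

FSB_EFFECTIVE_MHZ = Fraction(1000, 3)    # the "333MHz" marketing figure
BASE_CLOCK_MHZ = FSB_EFFECTIVE_MHZ / 2   # 166.67MHz base clock


def multiplier(core_mhz: int) -> Fraction:
    """Core-to-base-clock ratio, rounded to the nearest half step."""
    ratio = Fraction(core_mhz) / BASE_CLOCK_MHZ
    return Fraction(round(ratio * 2), 2)


for name, core_mhz in [("Athlon XP 2800+", 2250), ("Athlon XP 2700+", 2170)]:
    print(name, float(multiplier(core_mhz)))
# Athlon XP 2800+ 13.5
# Athlon XP 2700+ 13.0
```

Using `Fraction` keeps the 1000/3 bus figure exact, so the 2.25GHz part works out to exactly 13.5 times the base clock, while the rounded "2.17GHz" figure needs the half-step rounding to recover 13.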

Embedded News

October 2002

Tensilica Adopts CoreConnect Bus: Tensilica has licensed IBM's CoreConnect on-chip bus and is introducing a bus bridge for its Xtensa V customizable processor. The bridge allows system-on-chip (SoC) developers to integrate CoreConnect-compatible intellectual property (IP) with Xtensa processor cores. [October 7, 2002]

Intel Ships Fastest Mobile CPUs

Top-Speed Pentium 4-M Reaches 2.2GHz

Intel has introduced 11 new speed grades of its mobile processors, including a 2.2GHz Pentium 4-M that's 10% faster than the previous top-of-the-line 2.0GHz Pentium 4-M. Soon after Intel's announcement, seven PC vendors unveiled notebooks that use the new 2.2GHz speed champ. AMD followed a week later by announcing two faster speed grades of the mobile Athlon XP: the 2000+ and 1900+, which run at 1.67GHz and 1.6GHz, respectively. They are shipping in notebook computers from Compaq and Fujitsu. [September 30, 2002]

Table 1: Specifications for Intel's 11 new mobile PC processors.
Figure 1: Intel's product line of mobile PC processors is closely tiered by clock frequency and price, but it has some sharp peaks and valleys because of microarchitectural differences and marketing strategies. Note the anomalies, such as two 1.33GHz processors priced at $134 and $508.

TI Links ARM in OMAP5910

ARM925 and 'C55x DSP Cores Entwined in Standard-Product Chip

(With Max Baron)

Once available to only a few favored customers, Texas Instruments' dual-processor OMAP family is now represented by a standard product. (These days, TI rarely spells out OMAP, which stands for Open Multimedia Applications Platform.) The new OMAP5910 chip unites a slightly modified ARM9TDMI microprocessor core with a TMS320C55x DSP core plus a generous amount of on-chip memory and a host of useful peripherals. TI is offering the OMAP5910 for embedded applications that need real-time control processing and data-intensive signal processing. [September 23, 2002]

Figure 1: OMAP5910 block diagram

Xtensa V Hits 350MHz

Customizable CPU Achieves Highest EEMBC ConsumerMark Score

Tensilica is shipping the fifth version of its customizable soft microprocessor core since its debut in 1999, adding new features and enhancing some existing ones. According to Tensilica's simulations, the new Xtensa V core should run at 350MHz (worst case) when fabricated in a 0.13-micron CMOS process. Tensilica says some Xtensa V configurations will run even faster and that improvements to the company's proprietary hardware-design language and C/C++ compiler can boost actual software performance by 50% over the Xtensa IV. [September 16, 2002]

Figure 1: Xtensa V now supports external DMA requests through the processor interface (PIF) and variable-latency devices on the local-memory interface (XLMI).
Wiggle Room In EEMBC's Simulated Benchmarks: Has the temperature in Hades plunged below 32°F? Perhaps, judging by MPR's amazing discovery that a CPU vendor deflated its own EEMBC benchmark scores by 25% to offer a more realistic estimate of actual performance.
- Figure: Certified EEMBC benchmark scores for the Xtensa V (at 260MHz, 285MHz, and 300MHz), ARCtangent-A4 (150MHz and 200MHz), and Motorola PowerPC MPC7455 (1GHz).

News Items

September 2002

SiS Chip Set Supports PC1066 RDRAM: Although the RDRAM bandwagon has few followers these days, Silicon Integrated Systems (SiS) hopes its new SiSR658 core-logic system chip set will attract customers looking for high-end memory performance. It's the first chip set that officially supports PC1066 Dual RDRAM, and it's the only RDRAM-capable Pentium 4 chip set from a company other than Intel. [September 9, 2002]

Intel Adopts Strained Silicon

New Fab Technology Will Boost Clock Speeds

Intel's next-generation 90nm fabrication process will use strained silicon, a new technique that boosts circuit performance and is also under development at other companies. The technique should significantly increase the clock frequency and only slightly increase the manufacturing cost of Intel's Prescott Pentium 4 when it debuts in 2H03. Experiments at other companies — including Hitachi, IBM, and startup AmberWave Systems — have shown improvements ranging from 30% to 120% in electron mobility and transistor current flow (drain current). Intel is claiming a 10-20% increase in drive current. [September 3, 2002]

Figure 1: Strained silicon improves electron mobility by stretching the atoms in the silicon layer farther apart.
Figure 2: Intel's new transistors have some of the smallest features in the industry.
Table 1: In-Stat/MDR's estimates of gross die per wafer for the latest desktop processors from Intel and AMD, excluding defects.
Table 2: Intel and AMD both have aggressive roadmaps for process shrinks, but they have different priorities for new technologies, such as strained silicon and SOI.
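Gross die-per-wafer figures like those in Table 1 are commonly estimated with a first-order geometric approximation: wafer area divided by die area, minus a correction for the partial die lost around the wafer's edge. The Python sketch below assumes that textbook formula. It is not necessarily the method In-Stat/MDR used, and the die size and wafer diameters in the example are hypothetical.

```python
import math


def gross_die_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Estimate whole die per wafer, ignoring defects and edge exclusion.

    First-order approximation (an assumption, not MDR's stated method):
        DPW = pi * (d/2)^2 / S  -  pi * d / sqrt(2 * S)
    The first term is wafer area over die area; the second subtracts
    the partial die that straddle the wafer's circumference.
    """
    if wafer_diameter_mm <= 0 or die_area_mm2 <= 0:
        raise ValueError("diameter and die area must be positive")
    d, s = wafer_diameter_mm, die_area_mm2
    estimate = math.pi * (d / 2) ** 2 / s - math.pi * d / math.sqrt(2 * s)
    return max(0, math.floor(estimate))


# Hypothetical inputs, not figures from the article's table:
# a 100 mm^2 die on 200mm and 300mm wafers.
print(gross_die_per_wafer(200, 100))
print(gross_die_per_wafer(300, 100))
```

The edge-loss term grows with the wafer's circumference, and the first term grows with its area. As a result, moving the same die from a 200mm wafer to a 300mm wafer more than doubles the gross count.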

News Items

August 2002

Athlon XP Passes 2GHz: AMD has announced two faster speed grades of the Athlon XP processor: the 2600+ and 2400+. The model numbers are benchmarketing comparisons with Pentium 4 clock speeds that deliver similar application performance. Actual core frequencies of the new Athlon XP processors are 2.13GHz and 2.0GHz, marking the first time AMD has reached 2.0GHz. [August 26, 2002]
Pentium 4 Reaches 2.8GHz: Intel is shipping four new Pentium 4 processors, including three faster models. The new top-of-the-line 2.8GHz Pentium 4 is 11% faster than the previous champion, the 2.53GHz Pentium 4. The other three new processors run at 2.66-, 2.6-, and 2.5GHz. All four processors have the same Northwood core as other recent Pentium 4 chips. [August 26, 2002]

**NOTE: There's a two-year gap in articles between August 21, 2000 and August 26, 2002. Tom worked at ARC Cores from 2000 to 2002 before returning to Microprocessor Report.**

LinkUp Systems Brushes Bluetooth

ARM-Based L7205 Is Enhanced for Short-Range Wireless Technology

LinkUp Systems is sampling an embedded processor that's "Bluetooth ready" — a pitch that sounds suspiciously like advertising stereo speakers as "digital ready." And indeed, LinkUp's new L7205 stops far short of integrating everything necessary to implement a Bluetooth radio transceiver without using additional components. But LinkUp says the USB interface and souped-up UARTs on the L7205 can nibble a few dollars off the cost of a typical Bluetooth implementation. [August 21, 2000]

Figure 1: Block diagram of the LinkUp L7205.
Table 1: Comparison of the LinkUp L7205, LinkUp L7200, Cirrus Logic EP7209, Cirrus Logic EP7211, and Cirrus Logic EP7212.

Imsys Hedges Bets On Java

Rewritable-Microcode Chip Has Instruction Sets for Java, Forth, C/C++

Depending on your point of view — and there seems to be no middle ground here — microprocessors that natively execute Java bytecodes are as palatable as latte or as loathsome as stained teeth. But in Sweden, where the spirit of neutrality still flourishes, a company called Imsys hasn't stopped trying to accommodate both sides by offering an embedded processor with rewritable microcode that natively runs Java or doesn't, as you please. Now Imsys is introducing an enhanced version of its GP1000 processor known as the Cjip. The most interesting new feature is that Imsys has developed an entirely new instruction set to supplement the Java instruction set, which has also been improved. The new instruction set, available as a microcode library, supports Forth or C and C++, using a Java-like stack architecture. [August 14, 2000]

Figure 1: Cjip block diagram.
Figure 2: Configuration of multiple register banks and on-chip stack memory in the Cjip.
Figure 3: The Java platform (compatible with Java 2 Micro Edition) as implemented on the Cjip.

Embedded Java Chips Get Real

Bytecode-Native aJ-100 Handles Real-Time Processing

Java and real-time processing usually go together like coffee and ketchup. But a Silicon Valley startup, aJile Systems, has a new Java chip that handles interrupts in real time and doesn't need a third-party RTOS. It also allows embedded-system developers to write all their software in Java — even device drivers and other low-level code that normally would be written in C or assembly language. AJile's aJ-100 microprocessor is based on the 32-bit JEM2 Java chip developed by Rockwell-Collins. [August 7, 2000]

Figure 1: aJ-100 processor-core block diagram.
Figure 2: The Java runtime environment as implemented by the aJ-100.
Figure 3: aJ-100 chip block diagram.
Figure 4: Embedded CaffeineMark 3.0 results for the aJile aJ-100, Intel StrongARM SA-110, MIPS R4600, and Intel Pentium.

PowerPC 440GP: Great Communicator

IBM's First Book-E PowerPC Combines Speed and Network Integration

Merge a Corvette and a Cadillac and you'll get a Detroit disaster. Yet IBM has successfully created a similar hybrid by crossing a fast PowerPC 440 core with the luxury features of a highly integrated communications chip. The result is the PowerPC 440GP, which IBM disclosed last month at Embedded Processor Forum. The 440GP is the first chip to use the PowerPC 440 embedded-processor core and is the first implementation of Book E, the embedded PowerPC architecture defined by IBM and Motorola. It's also the first processor to have a 128-bit version of IBM's on-chip CoreConnect bus. [July 31, 2000]

Figure 1: PowerPC 440GP block diagram.
Figure 2: The results of IBM's simulations with 64-, 128-, and 256-bit CoreConnect buses on the PowerPC 440GP.
Table 1: Comparison of the IBM 440GP, IBM 405GP, Hitachi SH7615, Infineon TriCore Harrier-XT, and the Motorola PowerQUICC MPC8260.
Photo: Donald Senzig, senior PowerPC system engineer at IBM Microelectronics, describes the 440GP at Embedded Processor Forum.

Embedded News

July 2000

New Motorola PowerQUICC II Costs Less: Motorola has announced a PowerQUICC II processor that sacrifices a few features in return for a 25% lower price, offering cost-conscious customers yet another choice in the growing line of integrated networking chips. The new PowerQUICC II MPC8255 is designed for midsize routers, switches, access concentrators, wireless base stations, and other communications equipment. [July 24, 2000]
Two New MIPS Cores From LSI Logic: LSI Logic is introducing a pair of new MIPS-compatible microprocessor cores for ASICs: the MiniRISC EZ4021 and the TinyRISC EZ4103. Both cores are available now in LSI's EasyMacro format — a physical implementation of the synthesizable models that includes a cache controller, MMU, bus-interface unit, EJTAG debugging, and other features. [July 24, 2000]
- Figure 1: Die photo with floorplan overlay of the MiniRISC EZ4021 EasyMacro in LSI's 0.18-micron G12P process.

Lexra's NetVortex Does Networking

MIPS-Like CPU Architecture Is Designed For Packet Routing

If you're not satisfied with any of the network processors (NPUs) that everyone from C-Port and IBM to Intel and Sitera has announced in recent months, Lexra has an alternative: license NetVortex and build your own. NetVortex is the first licensable microprocessor architecture designed for packet processing. Because it allows designers to integrate from 1 to 16 cores on a die, NetVortex is suitable for a wide range of applications — everything from home-network gateways at the low end to OC-192 core routers at the high end. [July 17, 2000]

Figure 1: How NetVortex uses multiple register files to switch contexts among threads with zero-cycle delays.
Figure 2: Block diagram of a hypothetical NetVortex-based NPU that integrates 12 cores for packet processing at OC-192 wire speed.
Table 1: Description of 18 new instructions in the NetVortex architecture.
Table 2: The number of clock cycles required for typical packet-processing tasks and the percentage of capacity those tasks would use in a hypothetical NetVortex NPU.
Photo: L. Patrick Hays, CTO of Lexra, describes NetVortex at Embedded Processor Forum.

Top PC Vendors Adopt Crusoe

Transmeta Reveals Roadmap; New TM5600 Has 512K L2 Cache

Four top-tier vendors at PC Expo announced their intention to make notebook computers based on Transmeta's Crusoe processors. Some of these systems will use a new version of Crusoe that has twice as much on-chip L2 cache. Transmeta has also revealed a two-year roadmap of processors with higher clock speeds, greater integration, lower power consumption, and new VLIW cores. [July 10, 2000]

Figure 1: Transmeta's roadmap of future processors.
Figure 2: A real-time trace that shows the power consumption of a Crusoe TM5400 processor while playing a DVD movie and launching another Windows application.
Figure 3: This real-time power-consumption trace shows an Intel 500/600MHz mobile Pentium III processor running the ZD Media BatteryMark 3.0 program.
Figure 4: Another real-time trace that shows a 500/600MHz mobile Pentium III processor playing a DVD movie in battery-optimized (500MHz) mode.
Table 1: Comparison of Transmeta's present and future microprocessors.
Table 2: Comparison of Transmeta's TM5400 and TM5600 Crusoe processors to Intel's 500/600MHz and 600/750MHz mobile Pentium III processors.
Transmeta Explains LongRun: For the first time, Transmeta reveals the technical details of power management in the Crusoe TM5400 processor.
- Figure: How Transmeta's LongRun power management changes the processor's voltage and frequency to match software demands.
- Photo: Transmeta's Mark Fleischmann, director of low-power programs, explains LongRun at Embedded Processor Forum.
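LongRun pays off because dynamic CMOS power scales roughly with switched capacitance times the square of the supply voltage times the clock frequency. Lowering the voltage along with the frequency therefore saves more power than slowing the clock alone. The Python sketch below models that relationship and adds a simple policy that picks the slowest operating point that still meets demand. The operating-point table and the selection rule are hypothetical illustrations. They are not Transmeta's published voltage/frequency steps or its actual algorithm.

```python
# Hypothetical (frequency in MHz, core voltage in V) pairs, sorted by frequency.
# These are illustrative values, not Transmeta's published LongRun table.
OPERATING_POINTS = [(300, 1.2), (400, 1.3), (500, 1.4), (600, 1.5), (700, 1.6)]


def relative_dynamic_power(v: float, f: float, v_ref: float, f_ref: float) -> float:
    """Dynamic power relative to a reference point, using P ~ C * V^2 * f.

    The switched capacitance C belongs to the same chip in both cases, so it cancels.
    """
    return (v / v_ref) ** 2 * (f / f_ref)


def pick_operating_point(demand_mhz: float) -> tuple[int, float]:
    """Return the slowest operating point whose frequency meets the demand.

    If demand exceeds the top speed, run flat out at the fastest point.
    """
    for freq, volt in OPERATING_POINTS:
        if freq >= demand_mhz:
            return freq, volt
    return OPERATING_POINTS[-1]


if __name__ == "__main__":
    top_f, top_v = OPERATING_POINTS[-1]
    f, v = pick_operating_point(350)  # a hypothetical light workload
    print(f, v, round(relative_dynamic_power(v, f, top_v, top_f), 3))
```

In this sketch, the 400MHz point runs at 57% of top frequency. Because the voltage also drops, dynamic power falls to about 38% of the 700MHz figure.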

SiByte Reveals 64-Bit Core for NPUs

Independent MIPS64 Design Combines Low Power, High Performance

On June 12, unfazed by the burgeoning number of network processors (NPUs), SiByte disclosed the first details of its new SB-1 microprocessor core at Embedded Processor Forum. If the Silicon Valley startup can deliver what it promises — a 1GHz core that surpasses 2,000 Dhrystone mips while consuming only 2.5W — the SB-1 will push MIPS-based NPUs to new heights of power efficiency and performance. [June 26, 2000]

Figure 1: Block diagram of SiByte's SB-1 four-issue superscalar core.
Figure 2: The SB-1's deep ALU, FPU, and load/store pipelines.
Figure 3: An example of an integrated NPU with multiple SB-1 cores and on-chip peripherals linked over the ZBbus.
Table 1: Comparison of SiByte's SB-1 with the MIPS 20Kc, MIPS 5Kc, Lexra NetVortex, and IBM PowerPC 440 processor cores.

Embedded News

June 2000

IDT's RC32334 Integrates PCI: IDT's new Internetworking Products Division has announced its first chip, a 32-bit MIPS-compatible embedded processor with integrated PCI and SDRAM controllers. The 32334 is intended for low-cost network equipment, such as small-office routers, LAN switches, and home DSL gateways. [June 26, 2000]
- Table 1: Comparison of IDT's RC32334 with QED's RM5720, IBM's 405GP, Hitachi's SH7751, and Motorola's MPC8240.

ARC Cores Encourages 'Plug-Ins'

Third-Party Extensions Enhance Configurable CPU Cores

High-level synthesis tools and configurable CPU cores already bring some of the malleability of software to microprocessors. Now ARC Cores is taking the next step: CPU "plug-ins." The technical concept and business model for ARC's plug-ins will be familiar to users of PC software. In a similar fashion, ARC is encouraging intellectual-property providers and even its own customers to develop and sell extensions to ARC's configurable embedded-processor cores. [June 19, 2000]

Figure 1: Generic bus structure of the ARC 3 synthesizable microprocessor core.
Figure 2: Screen shot of ARChitect, ARC's graphical CPU-design tool.

Intel Embeds Coppermine

Embedded Pentium III and Celeron Identical to Mobile Chips

Intel has introduced five embedded processors based on the same 0.18-micron Coppermine die found in most Pentium III and Celeron processors for the desktop and mobile markets. Actually, the embedded versions of the chips are identical to the desktop/mobile processors, but Intel guarantees longer availability (at least five years) and is signing up more third-party companies to support the parts with system software and development tools. Three of the new embedded processors are Pentium III designs, and two are Celeron designs. [June 12, 2000]

Table 1: Feature comparison of the embedded Pentium III-733, Pentium III-700, Pentium III-500, Celeron-566, Celeron-400, and QED RM7000A.

Sitera Samples Its First NPU

IQ2000 Programmable Network Processor Targets Edge Routers

Programmable network processors (NPUs) are the newest rage, and one of the latest examples is from Sitera, a four-year-old startup based in Longmont, Colo. Sitera recently began sampling a multiprocessor chip called the Prism IQ2000 and plans to start production in 4Q00. As with similar NPUs from C-Port, IBM, and Intel, the IQ2000 is intended to replace some of the dedicated ASICs found in routers, switches, and network-gateway devices. [May 29, 2000]

Figure 1: System diagram of the IQ2000 in a router application.
Figure 2: Block diagram of the IQ2000 network processor.
Table 1: Feature comparison of three different versions of the IQ2000.

Motorola Thaws ColdFire V4

Integrated CF5407 Chip Beats Award-Winning CF5307

It's coming a year later than Motorola had hoped, but the CF5407 — the first standard chip based on the ColdFire V4 core — is a significant improvement over the two-year-old CF5307. It delivers three times the raw performance, twice as many mips per megahertz, and nearly four times as many mips per watt. And the 5407 is almost pin compatible with the 5307, requiring only a lower Vcc supply (1.8V) and different clock inputs for its core, so developers can make boards that work with either chip. [May 15, 2000]

Figure 1: CF5407 block diagram.
Figure 2: CF5407 color die photo.
Table 1: New and improved instructions in the ColdFire V4 instruction set.
Table 2: Comparison of the CF5407 and CF5307.
Table 3: Comparison of the Motorola CF5407, IDT 32364, National Semiconductor 486SXL, AMD 486DX5, and IBM PowerPC 403GCX.

Massana Teams With Lexra and Xemics

DSP Cores Integrate With 8- and 32-Bit CPUs

Silicon Valley startup Massana has formed partnerships with two providers of embedded-processor cores — Lexra and Xemics — to offer integrated cores that combine CPUs with Massana's DSPs. The deal with Lexra teams the FILU-200 soft DSP with Lexra's LX4180, a 32-bit soft CPU core that's largely compatible with the MIPS instruction set. The deal with Xemics, a Swiss company, combines a lower-end version of Massana's DSP (the FILU-50) with a proprietary 8-bit RISC microcontroller core (CoolRISC). [May 8, 2000]

Figure 1: Block diagram of Massana's FILU-200 DSP mated to Lexra's LX4180 CPU.

Lexra Introduces LX4189 Core

Lexra has introduced its fifth MIPS-like embedded-processor core, the LX4189. It's very similar to the 32-bit LX4180 core rolled out a year ago, except that it has an additional pipeline stage to reach higher clock frequencies. Lexra says the LX4189 is better suited for next-generation 0.15-micron fabrication processes. [May 8, 2000]

Table 1: Comparison of Lexra's LX4189, LX4180, LX4080, LX4280, LX5280, MIPS Technologies' MIPS32 4K, and MIPS64 5K cores.

EEMBC Releases First Benchmarks

Five Benchmark Suites Put Embedded CPUs to the Test

The EDN Embedded Microprocessor Benchmark Consortium (EEMBC) has released its long-awaited first benchmark results. MIPS-compatible processors dominated this round of benchmarking, with three MIPS licensees (IDT, NEC, and Toshiba) subjecting five different chips to EEMBC's rigorous tests. The x86 architecture was represented by two processors — AMD's K6-2 and National Semiconductor's Geode GX1. Other early birds were Infineon (TriCore TC10GP), Mitsubishi (M16C/62A), and STMicroelectronics (ST20C2). NEC also benchmarked its V832 (a 32-bit CPU based on a proprietary architecture), and Toshiba tested its TMP95FY64F (a proprietary 16-bit microcontroller). To put the results in perspective, MDR has derived its own unofficial "EEMBCmark" composite scores. [May 1, 2000]

Table 1: The EEMBC 1.0 benchmarks consist of 46 tests divided into five application suites.
Table 2: The raw numbers that EEMBC reports for the automotive/industrial benchmark suite.
Figure 1: Bar chart of MDR's unofficial "EEMBCmark" scores based on the normalized geometric means of EEMBC's raw benchmark results in the automotive/industrial suite.
Figure 2: An X-Y scatter-plot chart that shows how each processor's performance relates to clock frequency. One dotted line is the average performance trendline and the other dotted line represents a linear increase in performance with clock speed.
Figure 3: An X-Y scatter-plot chart that uses only the FFT test results from EEMBC's automotive/industrial suite, with trend and frequency/performance lines.
Figure 4: Bar chart of MDR's unofficial EEMBCmark scores from EEMBC's consumer-software benchmark results.
Figure 5: X-Y scatter-plot chart of the five processors tested in EEMBC's consumer-software suite, with trend and frequency/performance lines.
Figure 6: Bar chart of MDR's unofficial EEMBCmark scores from EEMBC's networking suite.
Figure 7: X-Y scatter-plot chart of seven processors tested in EEMBC's networking suite, with trend and frequency/performance lines.
Figure 8: Bar chart of MDR's unofficial EEMBCmark scores from EEMBC's office-automation suite.
Figure 9: X-Y scatter-plot chart of five processors tested in EEMBC's office-automation suite, with trend and frequency/performance lines.
Figure 10: Bar chart of MDR's unofficial EEMBCmark scores from EEMBC's telecommunications suite.
Figure 11: X-Y scatter-plot chart of six processors tested in EEMBC's telecommunications suite, with trend and frequency/performance lines.
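The captions describe MDR's unofficial EEMBCmark as a normalized geometric mean of EEMBC's raw results. Below is a minimal Python sketch of that kind of composite. It assumes higher-is-better raw scores, such as iterations per second, and normalizes them against a chosen reference processor. MDR's exact reference and test selection aren't given here. The test names and scores in the example are invented.

```python
import math


def composite_score(results: dict[str, float], reference: dict[str, float]) -> float:
    """Geometric mean of per-test ratios against a reference processor.

    Both dicts map test name -> higher-is-better raw score. Only tests that
    appear in both are included, so a missing result doesn't skew the mean.
    """
    common = sorted(set(results) & set(reference))
    if not common:
        raise ValueError("no tests in common with the reference")
    # Averaging logarithms equals taking the nth root of the product of ratios,
    # but it avoids overflow when many large ratios are multiplied together.
    log_sum = sum(math.log(results[t] / reference[t]) for t in common)
    return math.exp(log_sum / len(common))


# Invented scores for two hypothetical tests, purely for illustration.
reference_cpu = {"fft": 100.0, "table_lookup": 200.0}
candidate_cpu = {"fft": 200.0, "table_lookup": 800.0}
print(composite_score(candidate_cpu, reference_cpu))
```

A geometric mean gives each test's ratio equal weight, so a single test with large raw numbers can't dominate the composite. The ratio between two processors' composites also doesn't depend on which reference processor is chosen.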

ARM Wrestles picoTurbo in Court

ARM-Compatible Cores From Startup Draw Patent Lawsuit

PicoTurbo, a two-year-old startup based in Milpitas, Calif., has a new twist on ARM: a family of embedded-processor cores that's compatible with the ARM architecture. Indeed, the cores are apparently too compatible for ARM, which has filed a patent-infringement lawsuit against picoTurbo in U.S. District Court in San Jose. PicoTurbo maintains that its cores do not infringe on ARM's patents because they are based on an independently designed "clean room" microarchitecture. [April 17, 2000]

Figure 1: Block diagram of the picoTurbo pT-110.
Figure 2: Comparing the ARM9 and ARM10 pipelines with the picoTurbo pT-100, pT-110, and pT-120 pipelines.
Table 1: Comparing features, performance, and power consumption of the ARM9, ARM10, pT-100, pT-110, and pT-120 cores at 0.25 and 0.18 microns.
Table 2: List of ARMv5T and ARMv5TE instructions not supported by picoTurbo's ARMv4T-compatible cores.

QED's RM7000A Gets Faster, Cooler

Process Shrink to 0.18 Microns Will Boost Frequency to 450MHz

Squeezing more life out of a four-year-old core, Quantum Effect Devices (QED) is producing a new version of its 64-bit MIPS-compatible RM7000 processor in a 0.18-micron process from TSMC (Taiwan Semiconductor Manufacturing Co.). The new RM7000A will run up to 50% faster while consuming 66% less power than its predecessor. QED will soon follow with the RM7000B, which uses 0.15-micron transistors on the same-size die, boosting clock frequencies to 500MHz. [April 3, 2000]

Table 1: Feature comparison of the QED RM7000, RM7000A, RM7000B, RM5261, IDT RC64575, IDT RC5000, and NEC VR5432.

Embedded News

April 2000

ARC Cores Builds IP Library: ARC Cores has acquired two companies — VAutomation and Precise Software Technologies — that for the first time allow it to supply intellectual property in the form of peripheral hardware and software to licensees of its configurable CPU cores. [April 10, 2000]
LSI Logic Adopts AMBA: LSI Logic has adopted the Advanced Microcontroller Bus Architecture (AMBA) as the standard interconnect for its CoreWare system-on-a-chip design services. The decision throws more weight behind AMBA's bid to become the de facto standard for on-chip buses. [April 10, 2000]
MIPS Joins Forces With TSMC: MIPS Technologies has formed a partnership with TSMC (Taiwan Semiconductor Manufacturing Co.) to make prehardened versions of its soft embedded-processor cores. The deal gives customers the option of paying a design-use fee instead of the higher cost of a MIPS architectural license while saving the time and trouble of porting a soft core to an IC process themselves. [April 3, 2000]

JSTAR Coprocessor Accelerates Java

Licensable Bytecode Translator Works With Almost Any Core

While bytecode-native Java chips continue struggling to find a market, a team led by former Sun engineers has invented a novel alternative: a coprocessor that attaches to any CPU core and translates Java bytecodes into native instructions on the fly. The coprocessor is available now as a licensable Verilog model from JEDI Technologies, a Santa Clara-based startup founded in 1998. [March 27, 2000]

Figure 1: Block diagram of the JSTAR instruction-path coprocessor.
Figure 2: Embedded CaffeineMark 3.0 results for JSTAR.
Sidebar: Java Chips Fight an Uphill Battle

TI Cores Accelerate DSP Arms Race

New 'C64x and 'C55x DSPs Battle Analog Devices, StarCore, Intel

Everything is bigger in Texas, including the DSPs. The Texas Instruments TMS320C62x-series DSP core, already the T. Rex of digital-signal processing, is about to be surpassed by an even more powerful beast. TI says its new TMS320C64x core offers about 10 times the performance of the existing core — plus greater code density and full compatibility with 'C62x software. TI isn't ignoring the opposite end of the market either. A second new core, the 'C55x, supplements the popular 'C54x and brings higher performance, lower power consumption, and greater code density to low-power DSPs. [March 6, 2000]

Figure 1: Block diagram of the 'C64x DSP core.
Figure 2: Internal resources of the eight function units in the 'C64x.
Figure 3: The new 'C64x instruction format dramatically reduces NOPs.
Figure 4: Block diagram of the 'C55x DSP core.
Table 1: Several new instructions added to the 'C64x.
Table 2: Comparison of TI's 'C64x, 'C62x, 'C67x, StarCore's SC140, and Analog Devices' ADSP-TS001 TigerSHARC DSP cores.
Table 3: Several new instructions added to the 'C55x.
Table 4: Comparison of TI's 'C55x, 'C54x, Lucent's DSP16000, Motorola's DSP56600, StarCore's SC140, and Analog Devices' ADSP-219x DSP cores.

Embedded News

March 2000

Another New DSP Core From TI: Texas Instruments has announced the TMS320C28x, its third new DSP core in less than a month. The 'C28x is designed for DSPs in the $10 range and extends TI's 'C2000 series of 16-bit fixed-point DSPs. [March 20, 2000]
Motorola Buys C-Port: Smart Move: By offering $430 million in stock for C-Port, a fabless network-processor startup, Motorola is acquiring a powerful NPU to complement its existing lines of communications chips. At the same time, the deal counters some recent incursions onto Motorola's turf by major rivals such as IBM and Intel. [March 6, 2000]

Transmeta Breaks x86 Low-Power Barrier

VLIW Chips Use Hardware-Assisted x86 Emulation

A detailed 7,000-word analysis of Transmeta's Crusoe processors, which achieve x86 compatibility and low power consumption by running "code-morphing software" on efficient VLIW chips. This article examines Transmeta's unusual design approach, the LongRun voltage/frequency-scaling technology, the tradeoffs of emulation, the hardware support for emulation in the chips, and Transmeta's business strategy. [February 14, 2000]

Figure 1: Block diagram of the Crusoe TM3120 processor.
Figure 2: Block diagram of the Crusoe TM5400 processor.
Figure 3: Crusoe's VLIW instruction formats.
Figure 4: Crusoe's pipelines for ALU, floating-point, load/store, and branch instructions.
Figure 5: Color die photo with floorplan overlay of the TM3120.
Figure 6: Color die photo with floorplan overlay of the TM5400.
Figure 7: How Transmeta's "code-morphing software" translates x86 instructions into native VLIW instructions.

Hitachi SH7615 Adds Ethernet

SH-DSP Processor Targets Internet Phones, Network Devices, Modems

To exploit the latest hot-product category — Internet gizmos — Hitachi has added an Ethernet interface and DSP instructions to one of its best-selling SuperH processors. The result is the new SH7615, which samples in March and is scheduled for volume production in June. Although in most ways the SH7615 is a relatively minor variation of Hitachi's existing SH7604 and SH7612 chips, it brings together the critical features of network connectivity and digital-signal processing for the first time in a SuperH processor. [January 24, 2000]

Figure 1: Block diagram of Hitachi's SH7615.
Table 1: Comparison of the SH7615 to Hitachi's SH7612, Infineon's Harrier-XT, IBM's PowerPC 405GP, Lexra's LX5280, ARM's ARM9E, and ARC Cores' ARC 3.

Embedded Market Breaks New Ground

Network Processors, Configurable Cores, New Architectures Are Key Trends

This detailed article reviews the embedded-processor market in 1999, focusing on major new trends, the most important new processors, and the most active semiconductor vendors. It also looks forward to the likely trends in 2000, predicting which companies, processors, and product categories will generate the most news in the coming year. In addition, we list the top seven embedded processors announced during the year and reveal our choice for the Best Embedded Processor of 1999. [January 17, 2000]

Figure 1: Pie chart of 32/64-bit embedded-processor unit sales in 1999, broken down by CPU architecture.
Figure 2: Price/performance comparison of four top embedded processors for mobile applications: Intel StrongARM SA-110, NEC VR4121, Hitachi SH7708, and Motorola DragonBall EZ.
Sidebar: Key Embedded Events of 1999
Sidebar: Best Embedded Processor 1999

Embedded News

January 2000

HP and ST Collaborate on VLIW: Hewlett-Packard Labs and STMicroelectronics have jointly developed a new customizable VLIW embedded-processor technology that will debut later this year. The technology allows developers to rapidly create application-specific VLIW processors with compatible development tools, simulators, and RTOS kernels. [January 24, 2000]
Centillium Licenses MIPS32 4Kp Core: Centillium Communications has licensed the MIPS 4Kp core from MIPS Technologies for a new family of hybrid CPU/DSP processors scheduled for introduction in 2H00. The Fremont-based company plans to integrate the 4Kp with its own DSP core to create a system-on-a-chip (SOC) device for wired-communications products. [January 31, 2000]
Motorola's Symphony Plays Digital Music: Motorola's latest DSP — the DSP56366 Symphony — has enough performance and on-chip memory to handle a wide variety of digital-audio standards, and it's aimed at the lucrative market for next-generation consumer-audio products. Sampling this quarter and scheduled for volume production in 2Q00, Symphony is a 24-bit fixed-point DSP that's compatible with Motorola's DSP56300 family. [January 31, 2000]

NEC VR4122 Wrestles StrongArm

Mobile Processor for Windows CE Beats SA-1 But Can't Reach SA-2

It took four years and two leaps in IC process technology, but NEC Electronics has finally announced a mobile embedded processor that appears to surpass the StrongArm's famed combination of performance and power consumption. NEC's new VR4122, scheduled for production in 2Q00, barely edges out StrongArm's MIPS/watt ratio, though it falls just short of StrongArm's stated performance. [December 27, 1999]

Table 1: Comparison of the NEC VR4122 with the NEC VR4121, NEC VR4111, Intel SA-1110, Intel SA-2, and Hitachi SH7751.

Embedded News

December 27, 1999

SiByte Licenses MIPS for Network Processor: More refugees from Digital Semiconductor have surfaced, and they're working on a MIPS-based network processor. Their Santa Clara-based startup, SiByte, recently licensed the MIPS64 instruction-set architecture from Mips Technologies.
Motorola Releases Specs for On-Chip Bus: As it promised six months ago, Motorola's semiconductor products sector has released the specifications for its new core-independent on-chip peripheral bus, formerly known as IP Bus.
Embedded Benchmarks Ready for Prime Time: EEMBC (EDN Embedded Microprocessor Benchmark Consortium) has released version 1.0 of two benchmarking suites: the automotive/industrial suite and the telecommunications suite.

Mips vs. Lexra: Definitely Not Aligned

Patent Lawsuit Hinges on Unusual Instructions in MIPS Architecture

Here's a detailed analysis of the patent-infringement lawsuit that Mips Technologies filed against Lexra in October, with emphasis on the technical foundations of Mips' allegations and Lexra's possible defense. The lawsuit focuses on two unusual features of the MIPS architecture: unaligned load/store instructions and SIMD instructions with extended-precision math. [December 6, 1999]

Figure 1: How the MIPS unaligned load/store instructions work.
Figure 2: The assembly code that Lexra uses to emulate unaligned load/store instructions.
Figure 3: Block diagram of the dual MAC units in Lexra's LX5280 microprocessor core.
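MIPS handles an unaligned 32-bit load with a pair of instructions, LWL and LWR. Each instruction merges part of a word into the destination register. Without those instructions, software must build the value from aligned loads, shifts, and an OR. The Python model below illustrates that general technique for big-endian memory. It is a sketch for explanation, not a transcription of Lexra's assembly code in Figure 2, and the helper names are hypothetical.

```python
def aligned_word(mem: bytes, addr: int) -> int:
    """Read a big-endian 32-bit word; only word-aligned addresses are allowed."""
    if addr % 4 != 0:
        raise ValueError("aligned accesses only")
    if addr < 0 or addr + 4 > len(mem):
        raise IndexError("word lies outside memory")
    return int.from_bytes(mem[addr:addr + 4], "big")


def unaligned_load_word(mem: bytes, addr: int) -> int:
    """Emulate an unaligned big-endian word load using only aligned loads."""
    offset = addr % 4
    base = addr - offset
    first = aligned_word(mem, base)
    if offset == 0:
        return first  # already aligned: one load suffices
    second = aligned_word(mem, base + 4)
    # Shift the wanted bytes of the first word to the top of the result,
    # pull the remaining bytes down from the next word, and merge them.
    merged = (first << (8 * offset)) | (second >> (8 * (4 - offset)))
    return merged & 0xFFFFFFFF  # discard bytes shifted past bit 31


memory = bytes(range(16))  # byte values 0x00, 0x01, ..., 0x0F
print(hex(unaligned_load_word(memory, 1)))
```

Emulating a load this way takes two loads plus several shift and logic operations. MIPS needs only two instructions, which is why the unaligned load/store instructions matter to the lawsuit.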

Motorola Cellular DSP Does It All

DSP56690 Integrates M-Core MPU, Supports Multiple Wireless Standards

Jet-setters who want to stay in touch won't have to keep packing more cell phones than shoes much longer. Motorola's new DSP56690 is a highly integrated embedded processor that supports all of the most common wireless standards likely to be encountered on a globe-hopping journey. [December 6, 1999]

Figure 1: Block diagram of the DSP56690.

Embedded News

December 6, 1999

ADI's First TigerSharc DSP Has Sharp Teeth: In a bid to seize the lead in DSP performance, Analog Devices (ADI) has announced the ADSP-TS001, the first implementation of its much-delayed TigerSharc architecture.
ADI, StarCore Offer Vitamin C for DSPs: This week, two DSP vendors released preliminary benchmarks of alpha- or beta-version compilers for preproduction DSPs. Analog Devices (ADI) and StarCore claim their C compilers can deliver a significant fraction of the performance normally associated with hand-coded assembly language. ADI's compiler is for the TigerSharc ADSP-TS001, and StarCore's compiler is for the SC140.
Embedded Processor Forum Moved to June: Embedded Processor Forum, sponsored by Cahners MicroDesign Resources, will be held June 12-16 instead of in May as previously announced.

Cirrus Logic Makes Music With ARM

EP7212 Maverick Processor Has Digital-Audio Interface for MP3 Players

Cirrus Logic's new EP7212 Maverick chip is an application-specific standard product (ASSP) for mobile information appliances that need digital-audio capabilities. Cirrus is aiming Maverick at next-generation products that can download and play audio files from the Internet, in addition to performing the more common tasks expected of handheld computers. [November 15, 1999]

Figure 1: Projected sales of portable Internet-audio players.
Figure 2: EP7212 Maverick block diagram.
Table 1: Comparison of Cirrus Logic's EP7212, EP7211, and EP7209 chips.

Embedded News

November 15, 1999

Intel Bids $1.6 Billion for DSP Communications: Intel is buying DSP Communications, a supplier of chip sets and software to cell-phone manufacturers, as the latest in a series of acquisitions in the communications and networking industries.
Motorola's DragonBall Rolls Faster: The new DragonBall 68VZ328 processor is an improvement over existing DragonBall chips — both the 68328 in the original PalmPilot and the 68EZ328 in new models from 3Com and startup Handspring. The DragonBall VZ doubles the clock frequency to 33 MHz, adds color capability to the integrated LCD controller, and has other enhancements.
- Table: Comparison of the DragonBall 68328, 68EZ328, and 68VZ328.
Arm Extends Reach of ARM10 Pipeline: Unable to attain its frequency goals with the original five-stage pipeline, Arm has extended the ARM10's pipeline to six stages. Arm recently taped out the new core and plans to make it available to licensees on schedule in 2Q00.
- Figure: Pipeline diagram comparing the ARM9 and old ARM10 pipelines with the new ARM10 pipeline.
Zoran's Soft DSP Core Optimized for Audio: Zoran has announced Muzichord, a new synthesizable DSP core for audio applications. The 32-bit fixed-point DSP has enough speed and precision to handle next-generation audio standards such as DVD audio.
Mips Technologies Sues Lexra Over Patents: Although Mips Technologies and Lexra settled a lawsuit last year over trademark issues and product claims, it seems their legal battles aren't over. Mips has filed another lawsuit against Lexra, this time alleging patent infringement in the design of Lexra's synthesizable processor cores, which are mostly compatible with the MIPS architecture.

IBM PowerPC 440 Hits 1,000 MIPS

High-Performance Embedded Core Implements Book E Architecture

IBM's new PowerPC 440 core is the first officially announced embedded-processor core that's projected to hit 1,000 Dhrystone MIPS. It achieves some other firsts as well. It's the first core to implement Book E, the new embedded PowerPC architecture defined by IBM and Motorola. And it's the first core to use a 128-bit version of IBM's on-chip CoreConnect bus. [October 25, 1999]

Figure 1: PowerPC 440 block diagram.
Table 1: A comparison of recently announced high-performance embedded cores, including the PowerPC 440, Mips 5Kc, IDT RISCore 64600, Intel StrongArm-2, and Hitachi/STMicroelectronics SH-8000/ST50.

Mips Plays Hardball With Soft Cores

MIPS64 5Kc Is First 64-Bit Synthesizable Processor Core

Mips Technologies is getting softer all the time, but that's not good news for competitors. Mips has announced the first implementation of its MIPS64 instruction-set architecture — and the first 64-bit soft core from any microprocessor vendor. It joins a growing line of synthesizable embedded cores from Mips, including the MIPS32 4Kc, 4Kp, and 4Km. [October 25, 1999]

Figure 1: MIPS64 5Kc pipeline diagram.
Figure 2: MIPS64 5Kc block diagram.

Embedded News

October 25, 1999

Massana's DSP Coprocessor Bolts Onto CPUs: While many embedded-processor vendors are adding DSP extensions, Massana's FILU-200 is quite different: it's a synthesizable DSP coprocessor that attaches to the CPU's memory bus and is programmable with C function libraries.
ADI Adopts AMBA for New DSPs: Analog Devices has unveiled a new DSP core, the ADSP-219x, and has announced that it will use Arm's Advanced High-Performance Bus (AHB) interface, which is part of the open-standard Advanced Microcontroller Bus Architecture (AMBA).

Hitachi, ST Extend SuperH to 64 Bits

New SH-5 Architecture Aims for Multimedia Systems on a Chip

Hitachi and STMicroelectronics have collaborated on a new 64-bit embedded processor architecture called SH-5 that preserves backward compatibility with existing SuperH software. It's a clever combination of power and efficiency that keeps Hitachi and ST in the accelerating race against other high-performance embedded processors, such as those based on PowerPC, MIPS, and StrongArm cores. [October 6, 1999]

Table 1: The complete SH-5 instruction set.
Figure 1: Block diagram of the first SH-5 core.
Figure 2: A comparison of the new SH-5 and existing SuperH instruction formats.
Figure 3: Existing SuperH registers are mapped onto the new, larger SH-5 register file.
Figure 4: Block diagram of the first SH-5 system on a chip, which Hitachi calls the SH8000 and ST calls the ST50.

First StarCore DSP Targets Networking

Motorola's MSC8101 Combines SC140 Core and PowerQuicc II Coprocessor

The new MSC8101 is the first digital-signal processor (DSP) to emerge from the StarCore alliance between Motorola and Lucent. It's so network oriented that Motorola has considered inventing a new buzzword for it: NetDSP. [October 6, 1999]

Figure 1: Block diagram of the Motorola MSC8101.

Embedded News

October 6, 1999

IBM, C-Port Network Processors Challenge Intel: At about the same time as Intel's IXP1200 announcement, C-Port and IBM Microelectronics separately announced two network processors aimed at the same market — high-speed routers and related communications equipment.
Triscend Ships First Reconfigurable 8051: The first member of Triscend's E5 family — the 8051-based TE520 — is now in full production, with additional members to follow next year. It combines an 8051 processor core with reprogrammable logic.
MIPS32 4Km Core Has Fast MAC: Mips Technologies' new 4Km is the third synthesizable core to adopt the MIPS32 instruction-set architecture introduced earlier this year for embedded applications. The 4Km combines features of the 4Kc and 4Kp cores, which were announced in May.
- Table 1: Feature comparison of the MIPS32 4Kc, 4Kp, and 4Km cores.

Intel Network Processor Targets Routers

IXP1200 Integrates Seven Cores for Multithreading Packet Routing

Only Intel could have this kind of luck: it gets sued by Digital Semiconductor for patent infringement, ends up acquiring its foe after an out-of-court settlement, gains a billion-dollar fab and a StrongArm license in the deal, and then discovers that it has also inherited a groundbreaking network processor that was secretly under development. Perhaps Intel should encourage competitors to file lawsuits more often. [September 13, 1999]

Sidebar: An Architecture for Networking
Figure 1: IXP1200 block diagram.
Figure 2: The IXP1200's die is 126 mm² in a 0.28-micron process.
Figure 3: Each microengine has 256 registers in separate files and banks.
Table 1: The complete IXP1200 microengine instruction set.

Embedded News

September 13, 1999

IDT Unveils New 64600 Core: IDT is augmenting its line of MIPS-compatible processors with a third 64-bit core designed for high-performance embedded applications: the RISCore 64600.
AMD Teaches Old Core New Tricks: AMD has announced the Elan SC520, the first in a new series of its x86-based processors for embedded applications.

Sun Reveals Secrets of "Magic"

New MAJC Architecture Has VLIW, Chip Multiprocessing Up Its Sleeve

If there were any doubts that VLIW has succeeded RISC as the most important influence on new microprocessor architectures, they vanished this month when Sun pulled the latest example out of its hat: MAJC (pronounced "magic"), the Microprocessor Architecture for Java Computing. It's a Java-friendly (though not Java-specific) architecture that's particularly amenable to multithreading and chip multiprocessing (CMP) — the integration of multiple CPU cores on a single die. [August 23, 1999]

Figure 1: The format of MAJC's VLIW instruction packets.
Figure 2: Vertical multithreading switches contexts among program threads during memory accesses.
Figure 3: VLIW packet processing in a four-way MAJC processor with orthogonal function units.
Figure 4: MAJC's unified register file.

IDT Expands Embedded MIPS Family

New RC64574 and '575 Bridge IDT's 64-Bit RISC Cores

Broadening its range of 64-bit embedded processors, IDT is sampling two new MIPS-compatible chips based on the high-performance RC5000 core. The new RC64574 and '575 extend that core in many of the same ways that IDT's RC64474 and '475 extended the 64-bit RC4700 core last year. [August 23, 1999]

Table 1: Comparing IDT's new processors with existing IDT processors and other MIPS-compatible chips from QED and NEC.

Embedded News

August 23, 1999

Mips Adds a New Dimension to MIPS64: Mips Technologies has introduced MIPS-3D, a set of 13 instructions for MIPS64 embedded processors.
Arm Announces Two Soft Cores With DSP: Arm has unveiled two synthesizable cores based on the ARM9E announced at Embedded Processor Forum last spring. Both include the ARM9E's digital-signal processing (DSP) extensions.

Alliance Detours Into Routers

SRAM-Rich Network Processor Is Departure for Memory Vendor

Alliance Semiconductor's new IPRP-V4 (Internet Protocol Routing Processor) architecture certainly defies classification. Is it embedded-memory logic or embedded-logic memory? Either way, it's designed to solve a growing problem: managing and searching IP forwarding tables to keep up with high-end routers and fiber-optic backbones. [August 2, 1999]

Figure 1: IPRP-V4 block diagram.
Table 1: IPRP-V4 instruction set.

National Unveils "Appliance On a Chip"

Highly Integrated Device Covers Retreat From PC Processor Market

Battered but not beaten by its brief foray onto Intel's turf, National Semiconductor is launching a long-anticipated flank attack — an "information appliance on a chip" designed for non-PC devices in homes and offices. The highly integrated chip, scheduled for delivery next January, is the key to National's post-PC strategy. And it's probably the last chance for National to salvage any value from its costly rental of Cyrix. [August 2, 1999]

Figure 1: Geode SC1400 block diagram.

Fujitsu FR-V Architecture Bets On VLIW

Customizable Instruction Set Optimized for Embedded Applications

Fujitsu Microelectronics has announced a new embedded-processor architecture that's definitely buzzword compliant. It has very long instruction words (VLIW), multimedia instructions, digital-signal-processing (DSP) features, a customer-extensible instruction set, and configurable cores. And the cores are designed to be combined with macro libraries to build system-on-a-chip (SOC) parts for consumer electronics, automotive-navigation computers, and communications devices. [August 2, 1999]

Figure 1: Fujitsu's FR-V architecture consists of five instruction subsets that customers may combine and customize in different ways.

Embedded News

August 2, 1999

LX4280 Fills Lexra's Midrange: Lexra has announced a MIPS-compatible processor core that it claims will be the fastest such 32-bit core on the market. The new LX4280 is expected to deliver 275 Dhrystone MIPS at a worst-case clock frequency of 200 MHz. At its maximum estimated frequency of 266 MHz, the LX4280 should deliver at least 350 MIPS.
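The two Lexra figures imply a fixed per-clock rate. As a quick consistency check, assume that Dhrystone MIPS scale linearly with clock frequency (a simplification that ignores memory-system effects):

$$
\frac{275\ \text{DMIPS}}{200\ \text{MHz}} = 1.375\ \text{DMIPS/MHz},
\qquad
1.375\ \tfrac{\text{DMIPS}}{\text{MHz}} \times 266\ \text{MHz} \approx 366\ \text{DMIPS}
$$

Under that assumption, the 266-MHz estimate lands comfortably above the "at least 350 MIPS" that Lexra projects.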

PowerPC 405GP Has CoreConnect Bus

IBM Offers Free Licenses to Make On-Chip Bus a New Standard

IBM's latest PowerPC processor, the 405GP, is a highly integrated system on a chip (SOC) with PCI, Ethernet, an SDRAM controller, and the first implementation of CodePack code compression. What's potentially more important is that IBM is using the 405GP to kick off the CoreConnect bus — an on-chip bus architecture for SOCs that IBM is offering free to all comers. [July 12, 1999]

Figure 1: The CoreConnect architecture consists of three buses: the high-speed processor local bus, the lower-speed on-chip peripheral bus, and the device-control-register bus.
Figure 2: Block diagram of IBM's PowerPC 405GP.

SandCraft Adds Multimedia Extensions

New SR1-GX Core Aims at Next-Generation Set-Top Boxes

Microprocessors without multimedia extensions are becoming as rare as unemployed engineers in Silicon Valley. Equally rare are embedded-processor companies that don't have a system on a chip and "post-PC" strategy. One of the latest companies to swell the tide is SandCraft, which is introducing a new MIPS-compatible embedded CPU core with digital-signal-processing (DSP) and single-instruction, multiple-data (SIMD) extensions. [July 12, 1999]

Figure 1: Block diagram of the SandCraft SR1-GX.
Figure 2: SandCraft's conception of a system on a chip for a next-generation set-top box (block diagram).

Embedded News

July 12, 1999

Mips and Chartered Form Unique Partnership: Mips Technologies and Chartered Semiconductor have formed a new partnership that allows customers to use Mips's latest CPU cores without negotiating a special MIPS license or porting the soft cores to an IC process.
Motorola Plugs PowerQuicc Gap: The new PowerQuicc MPC855T fills a price and features gap in the PowerQuicc line between the six-member MPC850 family and the eight-member MPC860 family.

Embedded Benchmarks Grow Up

EEMBC Offers a Better Alternative to Dhrystone MIPS

After years of searching for an alternative to Dhrystone and other marginally useful benchmarks, the industry finally has a way to compare the performance of microcontrollers, microprocessors, compilers, and other system components. It's a series of benchmarking suites from EEMBC (pronounced "embassy"), the EDN Embedded Microprocessor Benchmark Consortium. EEMBC has been working on its benchmarking methods for almost three years. [June 21, 1999]

The Usual Suspects: A list of all 29 EEMBC board members and how much it costs to join EEMBC.
Figure 1: Geometric means are one way to derive composite scores from the EEMBC data, but the results may be misleading.
Table 1: Preliminary raw scores from the EEMBC automotive/industrial suite.
Table 2: The complete list of benchmark tests in the five categories of the EEMBC suites (version 0.9).
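Figure 1 notes that geometric means are one way to turn the EEMBC results into a composite score, and that such scores can mislead. The following is a minimal Python sketch of one common approach, assuming higher raw scores are better (for example, iterations per second). It normalizes each test to a reference processor and then takes the geometric mean of the ratios. The scores below are invented for illustration and are not EEMBC data. The function names are also hypothetical.

```python
import math

def geometric_mean(values):
    """n-th root of the product, computed via logs to avoid overflow."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))

def composite(scores, reference):
    """Normalize each test score to a reference chip, then combine."""
    ratios = [s / r for s, r in zip(scores, reference)]
    return geometric_mean(ratios)

# Hypothetical iterations/second on three tests (not EEMBC results).
reference = [100.0, 100.0, 100.0]
chip_a = [200.0, 200.0, 200.0]   # uniformly 2x faster
chip_b = [800.0, 100.0, 100.0]   # 8x on one test, no gain elsewhere

print(composite(chip_a, reference), composite(chip_b, reference))
```

Both hypothetical chips score 2.0 (to floating-point precision), even though one is faster across the board and the other wins on a single test. A single composite number can hide exactly the per-application differences that an embedded designer cares about.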

Hitachi SH7751 Gains a PCI Interface

Sega Dreamcast Chip Redesigned for Less Entertaining Embedded Applications

Hitachi's new SuperH 7751 joins the exclusive club of embedded processors that have an integrated PCI interface. It also runs Microsoft's Windows CE and consumes less than half a watt of power, opening up new possibilities for mobile CE-based devices that could make use of PCI connectivity. [June 21, 1999]

Figure 1: Block diagram of the SH7751.

Embedded News

June 21, 1999

Intel Expands Embedded x86 Lineup: Intel has added two Pentium II chips in surface-mount BGA packages and a pair of Pentium II embedded modules.
Motorola Enhances PowerPC Line: The PowerPC 745 and PowerPC 755 improve on the existing PowerPC 740 and PowerPC 750.

Jade Enriches MIPS Embedded Family

First Synthesizable Cores From MIPS Implement New 32-Bit Architecture

Mips Technologies has unveiled two new architectures that will carry the Rx000 family toward the future of high-performance embedded cores and system-on-a-chip devices. The new architectures, known as MIPS32 and MIPS64, are 32- and 64-bit derivatives of existing MIPS architectures. The company also announced the first two cores based on MIPS32: the 4Kc and the 4Kp, popularly known as Jade and Jade Lite. [May 31, 1999]

Table 1: New instructions in the MIPS32 architecture.
Figure 1: Block diagram of the Jade/4K soft core.
Figure 2: The original R3000 execution pipeline.
Figure 3: The Jade execution pipeline.

StarCore Reveals Its First DSP

Six-Issue VLIW Core Can Execute 1.2 Billion MACs/s, 3,000 MIPS

Like Texas Instruments and Analog Devices, the Motorola-Lucent alliance known as StarCore is betting on the Great Wide Hope: VLIW. StarCore's new SC140 is the third recent DSP architecture to apply long instruction words and a wide-issue core to the challenge of delivering more instruction-level parallelism. [May 10, 1999]

Table 1: Comparison of the StarCore SC140, TI 'C6202, and Analog Devices TigerSharc DSPs.
Table 2: Key operations in the SC140 instruction set.
Figure 1: Block diagram of the SC140 core.

Intel Flexes StrongArm With New Chips

Highly Integrated SA-1110 and SA-1111 Support Synchronous Memory

Reaffirming its commitment to the StrongArm architecture cast off by Digital, Intel is introducing a new integrated microprocessor with a companion chip. The new SA-1110 and SA-1111 will strengthen StrongArm's position in the market for highly integrated power-miserly processors. Intel plans to deliver the new chips in 3Q99. [April 19, 1999]

Sun's Jini: Science, Not Magic

The Goal Is Smarter Networks, But Microsoft Has a Plan, Too

As if "write once, run anywhere" weren't an ambitious enough target, Sun is now aiming for "write once, run everywhere." Sun's new Java-based Jini technology tries to make it easier for IT administrators and befuddled users to add hardware devices and software services to networks. "Plug and work, not plug and play" is the new mantra. [March 29, 1999]

Figure 1: Jini services use remote method invocations to interact over networks.
Figure 2: Jini services can be provided by hardware devices or by programs written in Java or native code.
Figure 3: Microsoft's Universal Plug and Play relies on standardized service protocols (such as LPR) instead of Java interfaces.

Jawa Improves Garbage Collection

Latest Episode Is a New Hope for Dark Side of Virtual Machine

The recent Microprocessor Forum included a highly anticipated demonstration of Jawa 3.0 and a new interface for electronic musical instruments called Jimi. Jawa now offers numerous enhancements over earlier versions, including more efficient garbage collection, scrap collection, tighter security bolts, and vital bug fixes for the dynamic compiler. [April Fool's Day Wrapper: March 29, 1999]

GP1000 Has Rewritable Microcode

Imsys Processor Executes Java Bytecodes and Concurrent Microcode Processes

Never mind RISC and CISC. "NISC" and "WISC" are some of the fanciful terms suggested for a unique embedded processor from Sweden that has rewritable microcode, microcode-level concurrency, native Java bytecode execution, multiple register banks, and other unusual features. [December 28, 1998]

Table 1: Comparison of the GP1000 and Sun's MicroJava 701.
Figure 1: Block diagram of the GP1000.
Figure 2: GP1000 die photo with floorplan overlay.
Figure 3: Eight register banks allow the GP1000 to switch among concurrent processes without saving or restoring registers.