Processor performance in the age of multi-core: RISC vs. CISC, part 1.
Apple’s announcement of a plan to transition its next generation of Mac computers from Intel-manufactured x64 processors to custom ARM chips prompted me to write a blog entry discussing Apple’s strategy in greater depth and, hopefully, with more insight than the published coverage of the move provided. One of the computer industry experts who analyzed the Apple announcement suggested that it might re-ignite an old debate among CPU hardware engineers about the relative virtues of the CISC vs. RISC approaches to processor design. This seems very unlikely to me, and I will attempt to explain why in this post. Basically, RISC has won the engineering battle, but Intel has good reasons to continue to resist any breaking changes in its hardware platform that would cause existing x86 and x64 software to fail. The most interesting aspect of the Apple announcement is actually its consolidation around a business model where the same company makes both the hardware and the software that runs on its platform. That model was used very profitably until the Wintel collaboration began to dominate the PC desktop, portable and, eventually, server markets and seemingly broke the mold.
As I discussed in the previous post, Apple has a number of
compelling reasons to shift from Intel to Apple silicon to power future Macs,
none of which have much to do with the relative merits of the CISC vs. RISC
approaches to processor architecture. It is true that Intel microprocessors support
a quite complex instruction set (i.e., CISC), a legacy of the original x86
design which proved so successful for the US company that the market pushed
back when it tried changing it. When Intel did attempt to introduce a radically
new architecture, branded as the Itanium, incorporating all the latest ideas
about processor hardware performance, it was a colossal flop. Since that
debacle, which created a window of opportunity for AMD to extend the x86
architecture to 64-bit addressing in a less radical way, current Intel and AMD
x64 processors maintain strict backward compatibility with older x86 software binaries,
ensuring that new hardware advances will not break existing software.
Meanwhile, the ARM processors used in Apple cell phones and tablets, as well as in the smartphones and tablets from Apple’s main competitors, grew out of the reduced instruction set (RISC) approach. The ARM acronym originally stood for Acorn RISC Machine and later for Advanced RISC Machines. The RISC design approach decreased the complexity of the instruction set in order to streamline the instruction execution pipeline, which reduced the amount of logic necessary on board the microprocessor chip to control instruction execution. Processors designed on RISC principles executed fixed-length instructions, for example, and not only reduced the number of instructions available but also limited them to simple arithmetic, logical and control operations. By cutting back on the number of instructions they supported, the first RISC machines introduced in the 1980s were able to cram the instruction execution engine and ample cache memory onto a single-chip microprocessor. Recognizing the role that high-speed, on-chip cache memory plays in accelerating pipelined instruction execution, the RISC designers also focused on creating an instruction format that made caching easier and improved cache effectiveness.
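To make the fixed-length point concrete, here is a minimal Python sketch. It assumes a hypothetical toy encoding (not ARM's or Intel's) in which the first byte of every variable-length instruction holds that instruction's total length in bytes, and the function names are made up for illustration. With fixed-length instructions every instruction boundary can be computed directly, while with variable-length instructions each boundary is unknown until the previous instruction has been at least partly decoded.

```python
def fixed_length_offsets(code: bytes, width: int = 4) -> list[int]:
    """Start offset of every instruction when all are `width` bytes long."""
    return list(range(0, len(code), width))


def variable_length_offsets(code: bytes) -> list[int]:
    """Start offsets under the toy encoding: first byte = instruction length."""
    offsets = []
    pos = 0
    while pos < len(code):
        length = code[pos]
        if length < 1:
            raise ValueError(f"invalid length byte at offset {pos}")
        offsets.append(pos)
        pos += length  # the next boundary depends on decoding this instruction
    return offsets


# Three 4-byte instructions versus three instructions of 1, 3 and 2 bytes.
fixed = fixed_length_offsets(bytes(12))
variable = variable_length_offsets(bytes([1, 3, 0, 0, 2, 0]))
print(fixed)
print(variable)
```

The first function needs no knowledge of the instruction contents at all, which is roughly why a fixed-length format simplifies instruction fetch and decode logic.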
David Patterson of UC Berkeley and David Ditzel of AT&T Bell Labs made "The Case for the Reduced Instruction Set Computer" in a 1980 paper that argues persuasively for a slimmed-down instruction set for a high-performance, single-chip computer. At that time CPU manufacturers like IBM and DEC would add complex instructions to a machine's instruction set to replace commonly used sequences of instructions, both for the convenience of assembler language programmers and to reduce the footprint of binaries, which was once a big deal because proprietary RAM on those machines was so expensive. Note that these hardware vendors also made the OS software that ran on their platforms, along with the software development tools, such as compilers, that were used to build the application software that ran there. Whenever new instructions were added to their hardware, these vendors could also coordinate the software support for these enhancements. The result was an orchestrated release of new hardware and related software that illuminated a clear upgrade path for customers. This is exactly what Apple did this summer, announcing the new hardware strategy for the Mac and simultaneously releasing the developer tools that support a path for migrating existing software to the new hardware.
Back in the day, for example, IBM handed System/360 and System/370 developers a STore Multiple (STM) instruction for storing register values in the call stack prior to branching to a subroutine and a corresponding Load Multiple (LM) instruction for restoring register values upon return. Here one complex STM instruction replaced 14 individual ST instructions which needed to be executed every time you issued a BALR 14,15 to branch to a subroutine (or method). The complex STM instruction did not execute any faster than 14 separate ST Register instructions, but it was a lot more convenient for assembly language programmers. Ironically, the 32-bit ARM instruction set, which is based on RISC design principles, includes its own STM and LDM (Load Multiple) instructions. Go figure. I guess developer convenience is still a thing.
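As a rough illustration of what a store-multiple instruction buys the programmer, here is a toy Python model. It is only a sketch: the register file, the dictionary standing in for memory, and the function names are all assumptions made for illustration, and real hardware details such as base-plus-displacement addressing are left out. The one behavior it does model is that a single store-multiple covers a range of registers, wrapping from register 15 back to register 0, and writes them to consecutive words.

```python
NUM_REGS = 16
WORD = 4  # bytes per 32-bit register


def store(memory: dict, regs: list, reg: int, addr: int) -> None:
    """Toy single-register store: one instruction, one word written."""
    memory[addr] = regs[reg]


def store_multiple(memory: dict, regs: list, first: int, last: int, addr: int) -> int:
    """Toy store-multiple: registers first..last (wrapping 15 -> 0) go to
    consecutive words. Returns how many single stores it stands in for."""
    count = 0
    reg = first
    while True:
        store(memory, regs, reg, addr + count * WORD)
        count += 1
        if reg == last:
            return count
        reg = (reg + 1) % NUM_REGS


regs = [100 + i for i in range(NUM_REGS)]  # register n holds 100 + n
memory = {}
replaced = store_multiple(memory, regs, first=14, last=1, addr=12)
print(replaced, memory)
```

In this model the multiple store does exactly the same work as the individual stores it replaces, which matches the point above: the benefit is shorter, more convenient code rather than faster execution.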
Requiring less instruction management and control logic on a RISC processor chip also led to efficiencies in power consumption. More efficient use of electric power was initially a secondary benefit of RISC designs, but with the breakdown of what is known as Dennard scaling around 2005, it emerged as one of the key elements that allowed low-power ARM processors to capture the market for battery-powered smartphones and other mobile devices. Dennard scaling was an observation made in 1974 that increased chip density in successive generations of semiconductor fabrication technology was accompanied by reduced power consumption per transistor, allowing manufacturers to increase the clock speed of microprocessors at the same time that they added more circuitry to the chip. With the breakdown of Dennard scaling, however, processor manufacturers have been unable to increase CPU clock speeds without generating excessive heat that must be dissipated somehow, creating a whole new set of engineering challenges that are chronicled in Hennessy and Patterson’s Turing award lecture and the paper they published in 2019 to accompany the lecture.
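The arithmetic behind Dennard scaling can be sketched in a few lines of Python. This is a first-order textbook model, not a description of any real process: it assumes dynamic power per transistor is proportional to capacitance times voltage squared times frequency, ignores leakage current entirely, and uses a made-up function name and scaling factor `kappa` for illustration.

```python
def relative_power_density(kappa: float, voltage_scales: bool) -> float:
    """Power per unit of chip area after shrinking linear dimensions by
    1/kappa, relative to the previous generation (1.0 means unchanged)."""
    capacitance = 1 / kappa                        # smaller transistors, less capacitance
    voltage = 1 / kappa if voltage_scales else 1.0  # assumption: all-or-nothing voltage scaling
    frequency = kappa                              # shorter distances allow faster clocks
    power_per_transistor = capacitance * voltage**2 * frequency
    transistors_per_area = kappa**2                # density grows with the square of the shrink
    return power_per_transistor * transistors_per_area


classic = relative_power_density(2.0, voltage_scales=True)
post_dennard = relative_power_density(2.0, voltage_scales=False)
print(classic, post_dennard)
```

Under these assumptions, as long as supply voltage shrinks along with the transistors, power density stays flat even though the clock runs faster. Once voltage can no longer be lowered, the same shrink at the same clock increase multiplies the heat generated per unit of area, which is the wall described above.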
Programming RISC machines with simpler instruction sets placed greater emphasis on compilers to generate efficient code from high level language statements, something which was aligned with broader trends in software engineering that diminished the role of assembly language programming in general. As Patterson and Ditzel observe in their 1980 "Case for RISC" paper:
"One of the interesting results of rising software costs is the increasing reliance on high-level languages. One consequence is that the compiler writer is replacing the assembly-language programmer in deciding which instructions the machine will execute. (emphasis added) Compilers are often unable to utilize complex instructions, nor do they use the insidious tricks in which assembly language programmers delight."
Back when IBM dominated the market for enterprise computing in the
1970s and 1980s, the hardware manufacturer would extend its already large set
of machine language instructions on its mainframe computers, to assist developers
using its assembler language. Sprinkling in a few new machine language instructions
to each new generation of proprietary IBM mainframe processor hardware also
helped IBM maintain its market dominance, the profitability of which was mainly
threatened by plug-compatible manufacturers (PCMs) who built machines that
executed identical instructions cheaper or faster or, ideally, both. The new instructions presented a moving target to the PCMs, forcing IBM's competitors to play a never-ending game of catch-up in order to maintain strict compatibility with IBM's latest and greatest.
When its dominance in the PC market was similarly threatened
by AMD’s x86 plug compatible processors, Intel adopted a markedly similar
strategy, adding new instructions to each subsequent x86 processor model. New x86
instructions targeted at speeding up graphic processing on desktop PCs, for example,
clearly benefitted customers, but also had the advantage of making the instruction
set a moving target, forcing a plug-compatible manufacturer like AMD to be in a
constant state of flux, always one step behind the latest Intel hardware. Intel’s
continuous expansion of the x86 instruction set to stay one step ahead of its PCM
rival was less successful than IBM’s, however, because the software produced to
run on Intel hardware is generated primarily using compilers produced by third
parties like Microsoft (and, at one time, Borland) that are not always aligned
with Intel’s business objectives[1].
The compiler developers at Microsoft, for example, have been known to ignore many
of Intel’s hardware innovations, focusing less on optimizing code generation for
Intel’s latest and greatest and more on software developer productivity. In my previous
post, I called attention to this aspect of Apple’s business model: Apple builds
the hardware and also maintains the compiler software used to build the
application software that runs on its hardware platform. This is similar to
the old IBM mainframe business model, which was driven by the profitability of
its underlying hardware products.
Intel responded to competition from RISC-based CPUs by building
a “microarchitecture” that breaks complex instructions into RISC-like μ-instructions (micro-operations) that are
then fed to the instruction execution pipeline. The Intel microarchitecture
improves instruction execution performance in a RISC-like fashion without compromising
backward compatibility with the machine’s complex set of instructions.
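The idea of cracking a complex instruction into simpler internal operations can be sketched as follows. This is a toy model, not Intel's actual decoder: the tuple format, the temporary register names and the `decode` function are all hypothetical, and real micro-operation encodings are not modeled here.

```python
def is_memory(operand: str) -> bool:
    """Toy convention: operands written in brackets refer to memory."""
    return operand.startswith("[") and operand.endswith("]")


def decode(op: str, dst: str, src: str) -> list[tuple]:
    """Split one instruction into register-only operations plus
    explicit loads and stores, in the style of a load/store machine."""
    micro_ops = []
    if is_memory(src):
        micro_ops.append(("load", "tmp_src", src))
        src = "tmp_src"
    if is_memory(dst):
        micro_ops.append(("load", "tmp_dst", dst))
        micro_ops.append((op, "tmp_dst", src))
        micro_ops.append(("store", dst, "tmp_dst"))
    else:
        micro_ops.append((op, dst, src))
    return micro_ops


simple = decode("add", "eax", "ebx")
complex_ = decode("add", "[counter]", "eax")
print(simple)
print(complex_)
```

A register-to-register add passes through unchanged, while an add whose destination is in memory becomes a load, an add and a store. Existing binaries keep using the complex form, and the execution pipeline only ever sees the simple pieces.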
More generally, steady improvements in semiconductor fabrication
technology, as embodied in the observation that came to be known as Moore’s
Law, allow the hardware manufacturers to pack more and more logic on each
chip. It has gotten to the point where increases in chip density have outpaced
hardware engineers’ ideas about how to make good use of all that additional
logic function. Multi-core processors are the result: the processor hardware
manufacturers don’t currently have any better ideas for what can be done with all
that additional circuitry available on the chip with each new semiconductor
fabrication advance. TSMC, which is Apple’s semiconductor manufacturing partner
for Apple Silicon chips, is currently using a 7 nm process and beginning the
transition to a 5 nm process. Note that the end of Dennard scaling occurred when MOSFET semiconductor
fabrication reached 65 nm around 2005, which marked the rise of the multi-core
approach.
One consequence of the expensive failure of the Itanium architecture
is that engineers are wary of introducing radical new hardware functions. In the
current mode where Intel hardware innovation requires adoption from its software
partners like Microsoft, it is not wise for the hardware to get too far ahead
of the software. This is one aspect of Apple’s switch from Intel hardware to
Apple Silicon that bears watching – it is a return to a business model where
the hardware manufacturer also controls the software that runs on its platform,
which allows for tighter hardware and software integration, a key ingredient in
Apple’s enormously successful iPhone business.
On the other hand, Intel’s experience of adding instructions
narrowly targeted at multimedia applications has proved successful. Similarly, extensions
to the ARM processor instruction set that mirror earlier Intel multimedia
instructions have also worked. Extending the instruction set of both RISC and
CISC processors in narrow, targeted ways has proved successful as long as the extensions
do not introduce any breaking changes to existing software. Where the new
instructions can demonstrate superior performance for very specific functions, manufacturers
can then drive adoption by a select group of professional assembly language developers
at work on the operating system, on device drivers, or specializing in code
generation in the compiler teams.
In the next blog entry in this series, I will drill into
these issues more, beginning with a discussion about instruction sets and
assembly language programming, followed by a deeper dive into the RISC vs. CISC
battle for the hearts and minds of developers.
[1] Intel
does build a C++ compiler focused on optimizing code generated for its x86 and
x64 platforms that is capable of taking advantage of all the latest Intel
hardware extensions. However, adoption of Intel’s own proprietary compiler software
is very limited compared to the widespread use of Microsoft software developer
tools, which are known for their ease of use.