Skip to main content

Is there an ARM server in your future?

I cannot resist adding to some of the industry buzz about the recent HP Project Moonshot announcement and what this potentially means for folks that run Windows. As part of Project Moonshot, HP is planning to release something called the Redstone Server Development Platform in 1H12, an ARM-based, massively parallel supercomputer that uses 90% less energy than comparable Intel microprocessors.
Coupled with the big.Little hardware announcement from the ARM Consortium about two weeks ago, I think this is big news that could shake up the foundations of Windows computing. ARM-based Windows Server machines could well be in all our futures.
Let’s start with ARM and the big.Little architecture announcement, which is pretty interesting in itself. Details are from this white paper. big.Little is an explicit multi-core architecture where a simple, low power version of the processor is packaged together with a significantly more powerful (~ 2x) version of the processor that also uses about 3 times the energy consumption. The Little CPU core in the architecture is “an in-order, non-symmetric dual-issue processor with a pipeline length of between 8-stages and 10-stages.” The big guy is “an out-of-order sustained triple-issue processor with a pipeline length of between 15-stages and 24-stages,” significantly more powerful and more complex.
The idea of pairing them is that a task that needs access to more CPU power could migrate from the Little guy to the big guy in about 20K cycles, about 20 m-seconds of delay. The idea is that when you want to playback something with HD graphics on your phone, the architecture can power up the big guy and migrate the CPU-bound graphics-rendering thread so fast that the user experience is like a turbocharger kicking in. Whoosh. Meanwhile, the device – a phone or a tablet, probably – can stay powered up and ready for action for an extended period of time with just the low power CPU. Which, at 1 GHz, is still a potent microprocessor.
Apple, Android, Blackberry and Windows smartphones today all run on ARM. ARM is a reduced instruction set computer (RISC) that has significantly lower energy costs, which is why it is so popular on smartphones. The Apple iPad tablet uses ARM, and so do most iPad competitors today from folks like Samsung and HTC. Some of these tablets are dual core ARM machines today. The tablet computer running an early build of Windows 8 that Microsoft gave away at the recent PDC also uses an ARM chip. When Windows 8 does ship next year sometime, we can expect to see a flurry of announcements for Windows 8 tablets based on ARM chips. Portable devices rely on ARM chips because they consume relatively little power, and power is the most significant performance constraint in portable computing of all types.
To be fair, we will probably see some Intel Atom-based tablets, too, running Win8 being released at the same time. (The Atom is a simplified, in-order processor that supports the full Intel x64 instruction set.) But I think ARM will continue to dominate the market for smartphones and tablets in the coming years. And an ARM-compatible version of Windows is quite likely to add to that momentum – most industry observers are guessing that plenty of Windows 8 tablets are going to be sold.
Another piece of the puzzle: the 64-bit ARM extensions were just formally announced this week.
So, ARM is happening in a big way, and it is time to take notice. Plus, Windows is jumping into ARM-based computing in its next release.
BTW, I am not counting Intel out. Intel has the semiconductor industry’s foremost fabrication facilities; it is still the industry leader in manufacturing. It is the low cost, high volume manufacturer. One of the keys to its manufacturing prowess has been high volume, which is necessary to recoup the enormous capital costs associated with constructing a new, leading edge fab plant.
However, the ARM architecture is the first viable, high volume challenger to Intel’s x86/x64 products to emerge in years. In the most recent quarter, about 400 million smartphones were shipped, easily 10 times the number of Intel-compatible PCs that were sold in the same time frame. That kind of high volume success breeds more success in semiconductor manufacturing because it drives down the manufacturing cost per unit, a key component of Moore’s law.
The HP Moonshot Project announcement suggests none of this is lost on the data center where energy consumption is a major cost factor. The goal is to fill that 1-x rack space with an array of ARM computers installed on a compact motherboard with just 10% of the energy consumption of the comparable Intel computing footprint. The projected cost savings over three years are something like 60%. So, at the higher end of HPC, this is starting to happen. The Windows 8 port to ARM then leaves just one small step to get Windows Server running on these same ARM-based servers.
It will be very interesting to see how this all plays out.


Popular posts from this blog

High Resolution Clocks and Timers for Performance Measurement in Windows.

Within the discipline of software performance engineering (SPE), application response time monitoring refers to the capability of instrumenting application requests, transactions and other vital interaction scenarios in order to measure their response times. There is no single, more important performance measurement than application response time, especially in the degree which the consistency and length of application response time events reflect the user experience and relate to customer satisfaction. All the esoteric measurements of hardware utilization that Perfmon revels in pale by comparison. Of course, performance engineers usually still want to be able to break down application response time into its component parts, one of which is CPU usage. Other than the Concurrency Visualizer that is packaged with the Visual Studio Profiler that was discussed in the previous post, there are few professional-grade, application response time monitoring and profiling tools that exploit the …

Virtual memory management in VMware: memory ballooning

This is a continuation of a series of blog posts on VMware memory management. The previous post in the series is here.

Ballooning is a complicated topic, so bear with me if this post is much longer than the previous ones in this series.

As described earlier, VMware installs a balloon driver inside the guest OS and signals the driver to begin to “inflate” when it begins to encounter contention for machine memory, defined as the amount of free machine memory available for new guest machine allocation requests dropping below 6%. In the benchmark example I am discussing here, the Memory Usage counter rose to 98% allocation levels and remained there for duration of the test while all four virtual guest machines were active.

Figure 7, which shows the guest machine Memory Granted counter for each guest, with an overlay showing the value of the Memory State counter reported at the end of each one-minute measurement interval, should help to clarify the state of VMware memory-managemen…

How Windows performance counters are affected by running under VMware ESX

This post is a prequel to a recent one on correcting the Process(*)\% Processor Time counters on a Windows guest machine.

To assess the overall impact of the VMware virtualization environment on the accuracy of the performance measurements available for Windows guest machines, it is necessary to first understand how VMware affects the clocks and timers that are available on the guest machine. Basically, VMware virtualizes all calls made from the guest OS to hardware-based clock and timer services on the VMware Host. A VMware white paper entitled “Timekeeping in VMware Virtual Machines” contains an extended discussion of the clock and timer distortions that occur in Windows guest machines when there are virtual machine scheduling delays. These clock and timer services distortions, in turn, cause distortion among a considerably large set of Windows performance counters, depending on the specific type of performance counter. (The different types of performance counters are described here