Skip to main content

Is there an ARM-based PC in your future?

In the previous blog post in this series on Windows 8, I explained that Windows RT is a new application run-time layer in Windows that was built when the Windows OS was ported to the ARM architecture. ARM is the dominant processor architecture used in current smartphones and tablets, including the Apple iPhone and iPad. So, the short answer to the question posed by the title is, “You already do run an ARM-based computer, and it is the smartphone in your pocket.” The problem for Microsoft is that this ARM computer is probably not running an OS based on Windows.

 Microsoft’s new Surface tablet, designed to showcase the capabilities of Windows 8, uses an ARM processor. On a Surface, you can only run applications known as Windows Store apps that are specifically built to run on top of Windows RT. You can also install and run Windows 8 on any Intel-compatible 32 or 64-bit compatible processor. The Intel version of Windows 8 is called Windows 8 Pro. Windows 8 Pro includes the new Windows RT application run-time, so Windows 8 App Store apps will run on Windows 8 Pro machines. Windows 8 Pro also includes all the older parts of Windows 7, so it is also capable of running “legacy” Windows desktop applications.

If you are a software developer trying to build one of the new Windows Store apps, first you have to install the latest version of Visual Studio and then create a project that targets Windows 8 App Store apps. To maintain compatibility across hardware platforms, a Windows Store app can only access functions in the new Windows Runtime, with the exception of some essential Win32 functions that were converted to run on an ARM processor, but were not included in the new Runtime. For security reasons, Windows Store Apps are run in a silo that limits their ability to interact with the underlying operating system or access any other running process. Microsoft has a certification process that guarantees that the app conforms to these requirements before it is made available on the Windows Store web site. (Apple has a similar policy with its App store.)

The new Runtime layer is quite extensive: see to get a sense of its scope. Applications written in either C++ or Javascript can call into the Runtime directly. The .NET Framework version 4.5 contains some glue to allow Windows Store apps to be written in C#, Visual Basic.NET or any of the other .NET-compatible languages.

Among the essential Win32 functions that are available under ARM are those associated with COM, a key technology used for years and years in Windows to package code into run-time components. In Windows programming, COM interfaces are frequently used to communicate between threads and processes. Many, many Win32 functions rely on COM interfaces. Win32 functions that were migrated to the new Runtime required a wrapper to hide the COM interface from the Windows 8 App Store app, but COM is still there under the covers. The complete COM infrastructure was ported to ARM for Windows 8, but the interfaces themselves were not re-written. If you access the “Win32 and COM for Windows Store apps” Help topic for Windows 8 developers at, you can see that COM is included in the Win32 subset that was ported to ARM. Drilling a little deeper, you can see that, for example, your Windows Store app can still call CoInitializeEx() to initialize a COM component, just like in the days of old.

So, while Windows 8 apps can call directly into the full set of Win32-based COM APIs, there are some very interesting omissions in the RT API surface. Performance monitoring is one of those omissions. Because the Win32-based performance monitoring interfaces were not ported to ARM, a Windows 8 app cannot access the performance counters associated with CPU accounting, for example, and determine how much CPU time it is consuming.

(Note: there is a workaround available. Any app on RT can still make a call directly into kernel32.dll and pull CPU consumption at the process level from it. You can use this hack while you are developing the app, but you must remove that PInvoke from the finished app before you submit it for certification, according to this Q&A article posted at Microsoft won’t permit a retail version of a Windows Store app to call into kernel32.dll directly.)

Instead of relying on performance counters, however, your Windows Store app can utilize ETW. The Win32 APIs that are used to generate ETW trace events or Listen to them from inside your app are part of the Win32 subset that Windows Store apps can call into on ARM. (See The fact that ETW is fully supported on ARM while performance counters are not, by the way, is, further evidence that the counter technology is on the wane and tracing is ascendant in Windows.

It is not entirely clear why performance monitoring was omitted from Windows RT. One possibility is that the application model for Windows Store apps is very different. When you run one of these Apps, it takes over the entire display. When you switch to a different app, the first app is suspended, and it is supposed to dispose of any objects it is currently holding.
Still, if you are a game designer trying to support this new class of Windows devices, leaving out the performance monitoring capabilities is worrisome. Physical memory on the Surface is limited to 2 GB, so RAM is decidedly a constraint. The Surface uses a 4-way ARM multiprocessor, running at 1.3 GHz, so RT does support multi-threading. In fact, the RT support for multi-threading is modeled on the task.await() pattern of asynchronous programming introduced recently in the .NET Framework.

To reiterate, the key deliverable in the new OS is the port to the ARM platform. I will drill into ARM and its implications for the OS in the next post in this series.


Popular posts from this blog

“There’s a lot more to running a starship than answering a lot of fool questions.”

Continuing a series of blog posts on “expert” computer Performance rules, I am reminded of something Captain James T. Kirk, commander of the starship Enterprise, once said in an old Star Trek episode: “There’s a lot more to running a starship than answering a lot of fool questions.” Star Trek, The Original Series. Episode: The Deadly Years. Season 2, Episode 12. See For some reason, the idea that the rote application of some set of rules derived by a domain “expert” can suffice in computer performance analysis has great sway. At the risk of beating a dead horse, I want to highlight another example of a performance Rule you are likely to face, and, in the process, discuss why there is a whole lot more to applying it than might be obvious at first glance. There happens to be a lot more to computer performance analysis than the rote evaluation of some set of well-formed performance rules. It ought to be apparent by now that I …

How Windows performance counters are affected by running under VMware ESX

This post is a prequel to a recent one on correcting the Process(*)\% Processor Time counters on a Windows guest machine.

To assess the overall impact of the VMware virtualization environment on the accuracy of the performance measurements available for Windows guest machines, it is necessary to first understand how VMware affects the clocks and timers that are available on the guest machine. Basically, VMware virtualizes all calls made from the guest OS to hardware-based clock and timer services on the VMware Host. A VMware white paper entitled “Timekeeping in VMware Virtual Machines” contains an extended discussion of the clock and timer distortions that occur in Windows guest machines when there are virtual machine scheduling delays. These clock and timer services distortions, in turn, cause distortion among a considerably large set of Windows performance counters, depending on the specific type of performance counter. (The different types of performance counters are described here

Virtual memory management in VMware: memory ballooning

This is a continuation of a series of blog posts on VMware memory management. The previous post in the series is here.

Ballooning is a complicated topic, so bear with me if this post is much longer than the previous ones in this series.

As described earlier, VMware installs a balloon driver inside the guest OS and signals the driver to begin to “inflate” when it begins to encounter contention for machine memory, defined as the amount of free machine memory available for new guest machine allocation requests dropping below 6%. In the benchmark example I am discussing here, the Memory Usage counter rose to 98% allocation levels and remained there for duration of the test while all four virtual guest machines were active.

Figure 7, which shows the guest machine Memory Granted counter for each guest, with an overlay showing the value of the Memory State counter reported at the end of each one-minute measurement interval, should help to clarify the state of VMware memory-managemen…