


April 17

Making parts of Windows CE Device Driver Code Non-Pageable


Posted by Wes Barcalow

Following on to Sue’s previous posts describing the paging pool and memory management, I wanted to talk about how drivers can be made pageable for additional virtual memory savings.

Windows CE has features that allow a device to use more code and data than fits in the available RAM. It does this by ‘paging’ resources into RAM from fixed or read-only storage (ROM/Flash), and discarding pages if the overall amount of RAM available in the system becomes too low. In systems where code cannot execute directly from ROM, this paging is the only available way to use storage to offset RAM usage. This is the case for NAND Flash, which is more performant and lower in cost than NOR Flash (which does allow XIP, or eXecute In Place).

Some code and data in the system is read (‘paged’) into RAM and ‘locked’ there – it is marked as non-pageable after it is loaded. This code and data must remain available even when the storage it was retrieved from is no longer available. For example, to achieve the best power saving on entry to a low-power or deep-idle mode, it is preferable to turn off the power to a NAND Flash chip.

Applications are typically pageable, since the operating system completely stops the threads of applications before entering a low power mode. At this point, since the application's code will not be executed and its data cannot be accessed, such code and data can be ‘paged out’ and is not needed. For device drivers things are slightly different. Most drivers written for Windows CE / Windows Mobile are by default loaded non-pageable by device manager. This means that no matter how big the driver is, it takes up all the RAM it wants once it is loaded – none of it can be paged out. In the case of user mode drivers, udevice.exe loads the driver instead of device manager, but it too uses the same criteria for choosing between pageable and non-pageable modes.

With an increase of functionality or flexibility in a driver comes an increase in size. A camera driver that supports many formats or many features may be very large. However, if the camera is not used for a long time, then the RAM resources taken up by it are not being put to efficient use. It makes sense to make this type of driver pageable by default instead.

To make a driver pageable, these steps have to be taken.

1)      Tell device manager you want the driver to be pageable.

2)      Tell the kernel that pageable mode is allowed.

3)      Identify and flag code that needs to be non-pageable.

The last step is slightly more complicated than the first two steps.  Even though you may have a large driver, you may still need portions of it to be non-pageable.  The most important parts of a Windows CE / Windows Mobile driver that cannot be paged out are functions that execute when the file system is not in operation.  If you do not have such functions in your driver then you do not have to worry about making them non-pageable.

Marking a driver as pageable happens in two steps. The first is a registry setting for that driver, which may or may not already have a “Flags” registry entry. To enable the driver to be paged, ensure there is a registry value named “Flags” of type DWORD and that the DEVFLAGS_LOADLIBRARY bit (0x02) is set in its value. If other flag bits are already set, simply bitwise ‘or’ this bit with what is already there.

Here is an example of what a GPIO driver registry setting might look like in platform.reg:

[HKEY_LOCAL_MACHINE\Drivers\BuiltIn\GPIO]

   "Dll"="gpio.dll"

   "Flags"=dword:10002   ;Trusted caller only & pageable

   ...
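The ‘or’ operation described above can be sketched in C. This is only an illustration: the two constants mirror the values given in the text and in the registry example (0x02, and the 0x10000 implied by 0x10002), and the helper names are hypothetical rather than part of any Windows CE API.

```c
#include <assert.h>
#include <stdint.h>

/* Values mirrored from the article's text and registry example. */
#define DEVFLAGS_LOADLIBRARY        0x00000002u  /* request a pageable load */
#define DEVFLAGS_TRUSTEDCALLERONLY  0x00010000u  /* implied by the 0x10002 example */

/* Hypothetical helper: report whether the pageable bit is present. */
int flags_request_pageable(uint32_t flags)
{
    return (flags & DEVFLAGS_LOADLIBRARY) != 0;
}

/* Hypothetical helper: add the pageable bit and leave all other bits alone. */
uint32_t flags_with_pageable(uint32_t existing_flags)
{
    uint32_t result = existing_flags | DEVFLAGS_LOADLIBRARY;
    assert(flags_request_pageable(result));  /* sanity check on the result */
    return result;
}
```

Starting from a driver that only had the trusted-caller bit set, flags_with_pageable(DEVFLAGS_TRUSTEDCALLERONLY) yields the 0x10002 value used in the GPIO example.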

The second step for marking a driver pageable is ensuring the ‘M’ flag of the binary image builder file (BIB file) is not set. The purpose of the ‘M’ flag is to inform the kernel not to demand page the driver, thus forcing the driver to be completely loaded into RAM.

Here is an example of what a GPIO bib file entry might look like that allows the driver to be loaded in a pageable mode by the kernel:

msm7x00_gpio.dll $(_FLATRELEASEDIR)\gpio.dll  NK SH

 

Notice the flags at the end of the statement: there is no ‘M’ flag. A user wishing to force the driver into a non-pageable mode would use “SHM” instead of “SH”, or alternatively would clear the DEVFLAGS_LOADLIBRARY bit in the registry. Either approach is valid.

It is also worth pointing out that a trusted user can potentially change the registry at run time, thus changing a driver from non-pageable to pageable and back again. The bib file flag, however, is built into the image and cannot be overridden. Both are viewed as equally secure as only a trusted caller can change the registry, though the bib file flag provides a predictable pageable status when loading the driver.
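Putting the two settings together, a driver ends up pageable only if the registry requests it and the BIB entry does not carry the ‘M’ flag. The following is a minimal sketch of that rule, not the loader's actual logic; the function name is hypothetical and the BIB flags are modelled as a plain string such as "SH" or "SHM".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEVFLAGS_LOADLIBRARY 0x00000002u  /* pageable bit named in the article */

/* Hypothetical helper: returns 1 when both settings allow paging, else 0. */
int driver_is_pageable(uint32_t registry_flags, const char *bib_flags)
{
    assert(bib_flags != NULL);
    int registry_allows = (registry_flags & DEVFLAGS_LOADLIBRARY) != 0;
    int bib_allows = strchr(bib_flags, 'M') == NULL;  /* 'M' forces a full load */
    return registry_allows && bib_allows;
}
```

With the GPIO example values, driver_is_pageable(0x10002, "SH") is true, while either "SHM" or a cleared registry bit makes the result false.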

The final, more complicated step from above is to identify and isolate code that can’t be pageable. As mentioned above, this is code that runs in single threaded mode where the file system cannot page in or out code and data.  The most well-known examples of this are:

-          XXX_PowerUp

-          XXX_PowerDown

-          Interrupt Service Threads and Interrupt Service Routines (ISTs and/or ISRs) that may execute while the file system is inactive.

-          Read-Only constants that are accessed by these functions.

-          Any supporting code called by these functions.

-          All code associated with the file system path, as it is responsible for bringing in new pages.

Once the code is identified, it should be wrapped in compiler #pragma statements to inform the linker about the properties of the code. Below is an example of making XXX_PowerUp and XXX_PowerDown non-pageable.

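// Tell the linker that the .no_page section is Executable, Readable and not Pageable (!P)
#pragma comment(linker, "/section:.no_page,ER!P")

// Everything between push and pop is placed in the .no_page section
#pragma code_seg(push, ".no_page")

void XXX_PowerDown(DWORD hDeviceContext)
{
      // Perform single-threaded power off logic
}

void XXX_PowerUp(DWORD hDeviceContext)
{
      // Perform single-threaded power on logic
}

void UtilityFuncOne(void)
{
      // Non-paged utility function that can be called by
      // both paged and non-paged code
}

#pragma code_seg(pop)

void UtilityFuncTwo(void)
{
      // Paged utility function that can only be called by other
      // paged code.
}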

 

This sample code shows the XXX_PowerDown and XXX_PowerUp code being marked as non-pageable. This allows the processor to access this code in RAM while the file system is not in operation (during suspend and resume operations). UtilityFuncOne is also in the non-paged section of code, which makes it safe to call from within XXX_PowerUp/Down. However, the UtilityFuncTwo code is outside of the non-paged area, and is therefore pageable and at risk of not being available if the processor were to try to access it while performing suspend / resume operations.

To test for drivers that are marked as pageable but are critical to the suspend, resume, and shutdown code paths, the registry key PageOutAllModules can be used to instruct the kernel to page out all code. This can be used to find drivers that use pageable code when the XXX_PowerUp and XXX_PowerDown APIs are called while the file system is inactive. By generating page faults, problematic drivers can be identified more easily. Below is what the registry key looks like:

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Power]

"PageOutAllModules"=dword:1

 

Set this registry key and force the system to suspend, resume, or shut down. The OS will then page out all code marked as pageable and proceed with the suspend / resume / shutdown operation. If a critical driver is improperly marked as pageable, this process will generate a page fault and the device will die. This technique helps ensure that all drivers are properly marked as pageable or non-pageable when preparing to release the device to market.

By making your driver pageable you can decrease the load on the system for resources while a component or feature is not being used.  It is important to take care as outlined above to make sure some important parts of your driver can still function even though in general the bulk of it is ‘paged’.

 

How does Windows Embedded CE 6.0 Start?


Posted by Kurt Kennett, Senior Development Lead, Windows CE OS Core

Operating system code, as one of my colleague developers recently realized, is “just code”. It’s not voodoo and it does not exist on a higher plane of knowledge. In fact, an operating system kernel is usually remarkably well structured and well designed in comparison to other pieces of software. When you think about it, it has to be. More than one person needs to understand and maintain a core set of code that must work and must support debugging of all other software that runs upon it. People move on, and change jobs to look for new challenges to keep learning. If only one person understood the way an operating system worked, there would be a huge amount of risk.

One of the most interesting facets to Operating Systems that I’ve followed in my career is how they start.  Initialization is the last step in design, but at the same time it uncovers the most fundamental bedrock of the principles used.  You start with literally nothing but a CPU which can execute instructions (sometimes not even with memory to use), and must take a platform from that point to a fully functioning system - one that not only utilizes available hardware, but abstracts it to a common understanding.

What I’m going to do in this article is discuss the details of how the Windows CE 6.0 kernel starts, and the association of the ‘Microsoft’ kernel code with the code that comes from an Original Equipment Manufacturer (OEM).  It is hoped that by relating this understanding more people will have a better idea of the ‘hows’ and ‘whys’ of the Microsoft design.

To start, let’s quickly review how the operating system software is built. The Microsoft tool chain emits .EXE and/or .DLL program files. Files with either extension are in “Portable Executable” format, or “PE” format. They are practically identical in every aspect:

  • They are extended Common Object File Format (COFF) files
  • They have import tables and export tables (EXE export tables are usually blank)
  • They have an entry point defined in their headers for where execution should start

There is nothing extraordinary about the operating system kernel program – it is compiled using the standard compiler and with a minimal set of definitions for that compiler.  An EXE file is produced (called NK.EXE).  It does not link to any external library or DLL – it can’t.  When this code starts there is nothing in the system, or even a system for that matter.  Since the EXE is in a known format (PE COFF), you can determine the entry point from looking at the EXE header.  This means we know where to set the CPU’s instruction pointer to so that the program can start.

One additional property is that a PE file can be arranged so that it may “execute in place”.  This means that if the file data is placed at a particular virtual address, no changes need to be made to the program code in the file in order for it to address other code and data at the correct addresses.  For example, I can tell the Microsoft linker program to place the kernel program file at the virtual address 0x80000000.  Then references to code (function entry points) will be placed in the EXE file such that other code can jump to them by address.  If function foo() is at address 0x80001000 and inside its body it calls a function bar() which sits at address 0x80005000, there will be an instruction stored directly in the program code for ‘foo()’ that calls to 0x80005000.  The dotted lines are just the delineation of function code start or end.

If the EXE program file for the kernel could not sit at 0x80000000 and had to be moved, the ‘bar()’ function would move with it and the call instruction in ‘foo()’ would have to be changed to have the correct, new address.  Otherwise it would call to the wrong place:

 

You can see in the example above that if the kernel EXE file that is designed to be placed at 0x80000000 is loaded at 0x80050000 instead, the instructions in the program will be incorrect.

The process of changing an EXE or DLL program file after it has been loaded to reflect the actual load address is called “fixing up”.  Records are placed in a standard EXE file which allow the program file to be fixed up.  However, until the fixup process is done the addresses of functions in the EXE will be incorrect.  To get around this, Windows CE kernel EXE files are fixed up beforehand to be loaded at a specific address.  A program called ROMIMAGE actually pre-processes the kernel EXE file and some DLLs that are used and fixes them up when it builds the operating system image file (NK.BIN).
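To make the idea of a fix-up concrete, here is a minimal sketch in C. It assumes a simplified model in which the fix-up records have already been resolved into an array of 32-bit slots that each hold an absolute address; real PE relocation records are more involved, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fix-up pass: every slot holds an absolute address that was
   computed for preferred_base, so shift each one by the load delta. */
void apply_fixups(uint32_t *address_slots, size_t count,
                  uint32_t preferred_base, uint32_t actual_base)
{
    assert(address_slots != NULL || count == 0);
    /* Unsigned arithmetic wraps, so this also works for a lower actual base. */
    uint32_t delta = actual_base - preferred_base;
    for (size_t i = 0; i < count; ++i) {
        address_slots[i] += delta;
    }
}
```

Using the numbers from the example, an image built for 0x80000000 but loaded at 0x80050000 would have the call target for bar() adjusted from 0x80005000 to 0x80055000. ROMIMAGE does the equivalent work ahead of time, so no such pass is needed when the kernel starts.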

To recap, we get a fixed-up EXE file which is called NK.EXE which contains portions of the operating system kernel.  This EXE has an entry point defined in it, the same as every other COFF EXE or DLL.  For execution to start, the bootloader for the system is supposed to put the image file at the right address, find this EXE entry point and jump to it. The bootloader is a separate discussion, and its startup and execution is very platform-specific.  For the context of this article, we will simply assume that a bootloader places the OS image file into memory at a specific address.  We will see below how the bootloader can find the NK.EXE file within the image and then find its entry point.

The NK.EXE is only part of the Windows Embedded CE 6.0 kernel – it comprises the OEM Adaptation Layer (OAL) and boilerplate code to start the system.  The main portion of the operating system kernel that does all the process, thread and memory functionality lives in a Microsoft-supplied DLL called ‘kernel.dll’.  This is a DLL which is also ‘fixed up’ by the ROMIMAGE program to live at a specific virtual address in memory.  So this means there are at least two executable modules that we need to know the location and the entry point of.  The entry point address is stored inside the EXE or DLL file, but what about the location of the EXE and DLL files inside the image?

Windows CE images have an important structure set up by ROMIMAGE that is placed into the image file, called the “Table Of Contents”, or TOC. This TOC holds pointers and metadata for the operating system image file. Somewhere near the beginning of the image file a marker is placed – the bytes “CECE” (0x43454345). An offset to the TOC is placed right after this marker. This allows a bootloader or other program looking at the file to find information about the image. In addition to this offset value that is prefixed by a marker, the OAL must define a public symbol called ‘pTOC’ (exported using ‘C’ naming conventions), which ROMIMAGE can find and fill in with the virtual address of the TOC when it prepares the image file. When compiled, the pTOC variable in the NK.EXE must have the value 0xFFFFFFFF. When it prepares the NK.BIN OS system image, ROMIMAGE does the following (in addition to other tasks):

  1. Load NK.EXE and fix it up.
  2. Make the TOC and find a place for it in the image file (it will live in virtual memory when the OS image is loaded).
  3. Find the ‘pTOC’ variable in the NK.EXE file and make sure it has the current value 0xFFFFFFFF.
  4. Set the pTOC variable value to the virtual address of the TOC that was created in step (2).

This way, when the NK.EXE starts it can reference this variable to know where the TOC is.  Using the TOC, the program can find all the other pieces of the operating system image.
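Steps (3) and (4) above amount to a guarded write. The sketch below models them on a plain 32-bit variable; it is not ROMIMAGE code, and the function and macro names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTOC_UNSET 0xFFFFFFFFu  /* compiled-in placeholder value from the article */

/* Hypothetical model of steps 3 and 4: patch the slot only if it still holds
   the placeholder. Returns 1 when patched, 0 when the slot was left alone. */
int patch_ptoc(uint32_t *ptoc_slot, uint32_t toc_virtual_address)
{
    assert(ptoc_slot != NULL);
    if (*ptoc_slot != PTOC_UNSET) {
        return 0;  /* unexpected value: this is not an untouched pTOC */
    }
    *ptoc_slot = toc_virtual_address;
    return 1;
}
```

Checking for the placeholder first is what lets the tool detect an OAL that did not define pTOC the way it expects.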

ROMIMAGE uses the configuration .BIB files to know where the image is supposed to go and where RAM is.  There are two important parts of the CONFIG.BIB file – the RAMIMAGE and the RAM lines.  Here is an example from the Device Emulator’s CONFIG.BIB:

    NK      0x80070000   0x02000000    RAMIMAGE
    RAM     0x82070000   0x01E7F000    RAM

These entries tell ROMIMAGE what to do.  It knows to place the OS image file at 0x80070000, and that it can start using read/write memory at 0x82070000.  With this information it can place modules such as NK.EXE and KERNEL.DLL into virtual memory, and then build a TOC and put that into the image as well.  To help the kernel start, the TOC also contains information on where RAM is.  A more detailed look at what is in memory when the image file has been placed is shown below:

In order for the actual operating system to start, the bootloader needs to:

  1. Put the image file at the right place in memory.
  2. Find the “CECE” marker.
  3. Use the TOC pointer that comes right after it to find the TOC.
  4. Search the TOC for the “NK.EXE” file entry.
  5. Scan the EXE file to find its entry point (it is a standard PE format file).
  6. Jump to the address that corresponds to the entry point.
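Steps (2) and (3) of the bootloader list can be sketched as a simple scan. This is an illustration under stated assumptions: the marker is passed in as a 32-bit value in host byte order, it is assumed to sit on a 4-byte boundary, and the function name is hypothetical. A real bootloader would typically know the marker's fixed offset instead of searching.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scan: look for the marker on 4-byte boundaries within the first
   search_len bytes and hand back the 32-bit value stored right after it.
   Returns 1 on success, 0 if the marker is not found. */
int find_toc_pointer(const uint8_t *image, size_t search_len,
                     uint32_t marker, uint32_t *toc_pointer_out)
{
    assert(image != NULL && toc_pointer_out != NULL);
    for (size_t off = 0; off + 8 <= search_len; off += 4) {
        uint32_t word;
        memcpy(&word, image + off, sizeof word);  /* safe for any alignment */
        if (word == marker) {
            memcpy(toc_pointer_out, image + off + 4, sizeof *toc_pointer_out);
            return 1;
        }
    }
    return 0;
}
```

The value returned through toc_pointer_out is then used to locate the TOC, which in turn is searched for the NK.EXE entry.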

The really interesting stuff happens once the NK.EXE program is started.  In broad strokes, it has its own tasks to perform:

  1. Set up virtual memory and turn it on.
  2. Gather important information that the KERNEL.DLL will need to use to run the system.
  3. Use the pTOC to scan the TOC for the KERNEL.DLL file inside the operating system image.
  4. Find the entry point of KERNEL.DLL (it is a standard PE format file).
  5. Pass critical information gathered in (2) to KERNEL.DLL in a call to its entry point.

We will walk through these activities in detail to better understand them.  Some parts of the startup process are CPU-type-specific.  For instance, the ARM CPU and the X86 CPU have different virtual memory management hardware and mapping structures.  However, to keep things consistent a general process is maintained.  Whenever possible I will attempt to call out any operations specific to an architecture.

When the NK.EXE starts, there are a few prerequisites of the system:

  1. All caches are disabled
  2. The entire RAMIMAGE and RAM regions specified in the CONFIG.BIB file are physically addressable and readable. 
  3. Virtual Memory is in a predefined state (CPU typically executes in physical address mode).

An additional prerequisite can be satisfied before NK.EXE starts, or can be done in the very beginning of NK.EXE execution:

    4. RAM should be writeable without any supplemental configuration (for example, of a memory controller).

These assumptions allow the NK.EXE startup code to do what is necessary to bring any particular system up, and not have to worry about some things being done and others not being done.  Point (3) above may be counterintuitive, but since the kernel must be entirely self-contained, it does not make sense for it to rely on the bootloader to configure virtual memory properly before it starts.  This ‘decouples’ the OS from whatever bootloader is used to start it.

When it starts executing instructions in physical address mode, the first action taken by NK.EXE is to calculate the physical address of the OEMAddressTable symbol.  This is a table that is built into the kernel that defines the static (unchanging) default regions of virtual memory. NK.EXE knows:

  1. Its own location in virtual memory (where it will be executing instructions)
  2. Its own location in physical memory (where it currently is executing instructions)
  3. The virtual address of the OEMAddressTable (it was determined when the NK.EXE was built and subsequently fixed up by ROMIMAGE).

Using this information, a simple calculation tells it the physical address of OEMAddressTable:

NK::PhysicalBase + (NK::Virtual OEMAddressTable – NK::Virtual Base) → NK::Physical OEMAddressTable
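The same calculation written out in C, as a small sketch. The function and parameter names are hypothetical, and the physical base used in the usage note is an invented value for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert the virtual address of a symbol inside NK.EXE
   to its physical address, given where the image sits in both address spaces. */
uint32_t image_virt_to_phys(uint32_t nk_phys_base, uint32_t nk_virt_base,
                            uint32_t symbol_virt)
{
    assert(symbol_virt >= nk_virt_base);  /* the symbol must lie inside the image */
    return nk_phys_base + (symbol_virt - nk_virt_base);
}
```

For example, if the image is linked at virtual 0x80070000, currently sits at physical 0x00070000, and OEMAddressTable is at virtual 0x80071234, the table's physical address works out to 0x00071234.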

The OEMAddressTable has triads of DWORDS making up a line in a table, with the following format: 

<region virtual start>          <region physical start>       <region size in MB>

<region virtual start>          <region physical start>       <region size in MB>

...

From the information found in this table, the NK.EXE program can set up the virtual memory mapping tables for the Memory Management Unit (MMU) to function.  Where the MMU-formatted mapping tables are kept and what they look like is platform-specific – the OEMAddressTable is a simplistic format that works for any architecture.  Virtual memory is set up using the data in the OEMAddressTable and enabled, and then the NK.EXE transitions to the virtual address where it can execute code.
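The table format above can be modelled with a three-field structure. The sketch below walks such a table to translate a virtual address to a physical one, which is the information the MMU setup code needs. The struct, the function name, and the explicit entry count are assumptions made for the example; how the real table is terminated is not described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One line of the table: three DWORDs, as described in the text. */
typedef struct {
    uint32_t virt_start;  /* region virtual start  */
    uint32_t phys_start;  /* region physical start */
    uint32_t size_mb;     /* region size in MB     */
} AddressTableEntry;

/* Hypothetical lookup: returns 1 and writes the physical address when virt
   falls inside one of the regions, otherwise returns 0. */
int table_virt_to_phys(const AddressTableEntry *table, size_t count,
                       uint32_t virt, uint32_t *phys_out)
{
    assert(table != NULL && phys_out != NULL);
    for (size_t i = 0; i < count; ++i) {
        uint64_t size_bytes = (uint64_t)table[i].size_mb << 20;  /* MB to bytes */
        if (virt >= table[i].virt_start &&
            (uint64_t)(virt - table[i].virt_start) < size_bytes) {
            *phys_out = table[i].phys_start + (virt - table[i].virt_start);
            return 1;
        }
    }
    return 0;
}
```

Because every region is a contiguous block described by a start and a size, the same entries can be turned into page table mappings on any architecture.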

One thing to note at this point is that anything that is supposed to be in RAM that needs to be pre-initialized (set to zero or some other known value) is not yet available.  RAM is still a clean slate and can have any contents whatsoever.  The initialization values in the image file (the .data sections of NK.EXE and other modules) for read/write data must be copied from the image to actual RAM addresses before they can be properly used.  How does the NK.EXE know what to copy or where to place things in virtual RAM for these modules?  The TOC.

The TOC not only lists the start addresses of all modules in the image, but it also describes RAM and where the read/write portions of each module are to be located so that the kernel can work with them.  Pieces of the OS image that need to be copied to RAM are called “copy entries”.  Before the NK.EXE can access its own read/write variables, it needs to copy the copy entries to RAM.  This begs the question – the pTOC is a variable, isn’t it?  How could the NK.EXE know where the pTOC is if it hasn’t been set up?  The answer is that the pTOC is a read-only variable – only ROMIMAGE writes to it when the image file is created.  The storage for pTOC is not located in RAM, and does not need to be copied before its value can be used.  The function inside NK.EXE that copies all the copy entries described by the pTOC to RAM is typically called “KernelRelocate()”.  It is a simple process of going through a simple table of structures and copying ranges of virtual memory from one place to another.  Once it is finished all NK.EXE variables can be read from or written to just like any other program.
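A copy loop of this kind might look like the sketch below. The structure layout and names are hypothetical, and zero-filling the part of the destination beyond the copied bytes is an assumption added for the example (it is the usual way uninitialized data gets a known value); the article itself only states that ranges are copied.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical copy entry: where the initial values live in the image and
   where the read/write copy must be placed in RAM. */
typedef struct {
    const void *source;    /* initial values stored in the image        */
    void       *dest;      /* destination in RAM                        */
    size_t      copy_len;  /* bytes to copy from source                 */
    size_t      dest_len;  /* bytes reserved at dest (>= copy_len)      */
} CopyEntry;

/* Sketch of a KernelRelocate-style pass over a table of copy entries. */
void relocate_copy_entries(const CopyEntry *entries, size_t count)
{
    assert(entries != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        const CopyEntry *e = &entries[i];
        assert(e->copy_len <= e->dest_len);
        memcpy(e->dest, e->source, e->copy_len);
        /* Assumed behaviour: clear whatever is left of the destination. */
        memset((char *)e->dest + e->copy_len, 0, e->dest_len - e->copy_len);
    }
}
```

Note that the loop itself must not rely on any read/write global variable, since none of them are valid until it has finished.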

 

At this point we have a working program, just like any other program for decades past.  It executes instructions, can call functions, and can read and write memory locations.  There are no threads, no processes, and no operating system constructs, but everything is placed in a known location and can be accessed to let us do the rest of the startup of the higher-level systems.

Virtual Memory allows a tremendous amount of flexibility. Windows CE reserves a few regions of the virtual address range for its own private use inside the OS kernel. There are several ranges of 4k ‘pages’ of virtual memory that are set aside in the highest address ranges, from about 0xFFFE0000 upwards. The kernel maps some physical memory into this range to store its ‘global’ dynamic data. Some of this memory can be used for memory mapping tables for an architecture-specific MMU. Some is reserved for the kernel-mode and interrupt stacks. Most importantly, at least one of the 4k pages is reserved specifically as a ‘Kernel Data Page’. This page contains a plethora of data fields that are specific to a version of the kernel. The NK.EXE sets up the location and initial contents of this page directly.

Three important values are stored in the structure by NK.EXE:

  1. A copy of pTOC
  2. The address of OEMAddressTable. 
  3. The address of the function OEMInitGlobals()

The first two pieces of information are placed in the Kernel Data Page so that any code that knows the address of the Page can find what is in the OS image and the basic layout of virtual memory. The last piece of information is specifically used so that the NK.EXE contents can be used once control has been passed to KERNEL.DLL. In general, the contents of the reserved portion of virtual memory look like:

 

Now that the Kernel data page has been initialized and virtual memory is active, we can jump into the Microsoft KERNEL.DLL executable’s entry point.  Remember, we can find the KERNEL.DLL file in the image by using the TOC, and then we can scan for the entry point of the module.  Even though NK.EXE knows where it is going to put the kernel data page in virtual memory beforehand, the KERNEL.DLL cannot assume its location.  Therefore, we pass the virtual address of the kernel data page to the entry point of KERNEL.DLL.  Although the Microsoft code can call back into the NK.EXE function addresses, control is never fully restored to the NK.EXE program.

After the jump, we are now executing Microsoft kernel code. The code at the entry point is given the address of the Kernel Data Page and, through its fields, can reach the TOC to learn anything it needs to know about the OS image. The kernel does some basic setup of its own and sets some critical data fields for its own use in the Kernel Data Page.

The KERNEL.DLL has a static table of functions and data, called “NKGlobals”, which is built into its DLL simply as a static data structure. Since the KERNEL.DLL is fixed up by ROMIMAGE to run from a particular virtual address, the function pointers in the NKGlobals will be correct when the KERNEL.DLL code starts to run. Some of the functions pointed to by this structure are ones like SetLastError() and NKwvsprintfW(). These are routines that the NK.EXE is allowed to call directly. However, it is important to note that at this point the NK.EXE does not know where these functions are in KERNEL.DLL – it still needs to be told where this table of functions and data is inside KERNEL.DLL.

The KERNEL.DLL passes the address of “NKGlobals” back to NK.EXE in a function call to OEMInitGlobals(), the address of which was left in the Kernel Data Page.  So, in essence the function call graph looks like this:

 

As shown above, the OEMInitGlobals() function stores a pointer to the NKGlobals structure that resides in KERNEL.DLL.  After it stores this pointer, NK.EXE can use it to find the addresses of the KERNEL.DLL functions it is allowed to call.

OEMInitGlobals also passes back (via function return value) a pointer to its own structure, called “OEMGlobals”.  This structure is critical to the kernel to get access to all the functionality that is platform-specific that is inside NK.EXE.  The KERNEL.DLL module is constructed so that it will run on any processor belonging to a certain architecture (X86, ARM, etc).  The NK.EXE is the abstraction of a specific species of the architecture (such as XSCALE or OMAP processor) and the platform that supports that architecture.  The OEMGlobals structure is comprised of function pointers and data just like NKGlobals.  Some of its members include:

  • PFN_InitDebugSerial(), PFN_WriteDebugByte(), PFN_ReadDebugByte()
  • PFN_SetRealTime(), PFN_GetRealTime(), PFN_SetAlarmTime()
  • PFN_Ioctl()

These function pointers point to the legacy OEM functions like OEMInitDebugSerial and OEMIoctl that live inside NK.EXE.  Many other functions are listed so that KERNEL.DLL can do what is necessary for a particular platform.  The functions are fairly self-explanatory in name and are well documented on MSDN.

Once the call to OEMInitGlobals() completes, the KERNEL.DLL has everything it needs to do architecture-generic and platform-specific processing.  It knows where memory is and how it is laid out virtually, as well as the location of every module in the image.  The NK.EXE also has a pointer to a table of functions it can call.  In essence, the two code modules have executed a manual ‘handshake’ by executing a simplistic method of manual dynamic linking.
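The handshake can be modelled in a few lines of C. Everything here is a heavily reduced, hypothetical stand-in: the real NKGlobals, OEMGlobals and Kernel Data Page structures contain many more members, and the names with a Sketch suffix as well as the lower-case functions are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-ins for the two function tables and the Kernel Data Page. */
typedef struct { void (*set_last_error)(unsigned long code); } NKGlobalsSketch;
typedef struct { void (*init_debug_serial)(void); } OEMGlobalsSketch;
typedef OEMGlobalsSketch *(*InitGlobalsFn)(NKGlobalsSketch *nk);
typedef struct { InitGlobalsFn oem_init_globals; } KDataPageSketch;

/* ---- "NK.EXE" side ---- */
NKGlobalsSketch *g_nk_globals;          /* learned during the handshake */
int g_debug_serial_ready;
void oem_init_debug_serial(void) { g_debug_serial_ready = 1; }
OEMGlobalsSketch g_oem_globals = { oem_init_debug_serial };

OEMGlobalsSketch *oem_init_globals(NKGlobalsSketch *nk)
{
    g_nk_globals = nk;       /* remember the kernel's table */
    return &g_oem_globals;   /* hand back the OEM table */
}

/* NK.EXE leaves the address of its handshake function in the data page. */
KDataPageSketch make_kdata_page(void)
{
    KDataPageSketch page = { oem_init_globals };
    return page;
}

/* ---- "KERNEL.DLL" side ---- */
unsigned long g_last_error;
void nk_set_last_error(unsigned long code) { g_last_error = code; }
NKGlobalsSketch g_kernel_table = { nk_set_last_error };

/* Entry point: only the data page address is known on arrival. */
OEMGlobalsSketch *kernel_entry(KDataPageSketch *kdata)
{
    assert(kdata != NULL && kdata->oem_init_globals != NULL);
    return kdata->oem_init_globals(&g_kernel_table);
}
```

After kernel_entry() returns, each side holds a pointer to the other's table, so neither module needs an import table or a loader to call the other.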

Everything up to this point that NK.EXE and KERNEL.DLL have done has been done without any processes or threads, and without any kernel services running. To bring the rest of the system up, the KERNEL.DLL has to do three things:

  1. Architecture-specific setup
  2. Architecture-neutral setup
  3. Platform-specific setup (specific CPU and BSP initialization)

The architecture-specific setup is done first by a call to a KERNEL.DLL function called <architecture>Setup.  On an ARM platform this would be called ARMSetup().  On an X86 platform this would be called X86Setup().  The actions taken by the architecture-specific code are numerous, but they all execute in a single-threaded context with no processes running.  The actions taken here include but are not limited to:

  • Set up hard required page tables and reserve VM for kernel page tables
  • Update cache information in Page Tables
  • Flush the Translation Lookaside Buffer (TLB)
  • Set up architecture-specific buses and components (companion chips, coprocessors, etc).

The one other thing this architecture-specific code does is set up the Interlocked API code so that NK.EXE knows where it is and can call it.  This is a bit of an aside, but I will explain in detail because it is a critically important piece of the OS.

Even at the most basic level, Windows CE needs to coordinate actions among different threads of execution – even some that run inside the kernel, outside the scope of any specific process.  The mechanism used to do this with the highest amount of efficiency is the Interlocked API.  The API consists of a handful of functions, the most important of which is InterlockedCompareExchange().  The purpose of this function is to:

  1. Read a memory location (M) into register (R)
  2. Compare the value read (R) with a match value in another register (R2)
  3. If (R) and (R2) are not equal, exit
  4. Write the value of another register (R3) back to memory location (M)

These four steps are meant to execute atomically, and they form the basis of coordination between different threads.  That is, there should be no interruption between each of (1), (2), (3) and (4).  The only way to guarantee this on some of today’s processors where the operation is not available directly in hardware is to ensure interrupts are disabled.  Herein lies a problem, since user-mode processes do not have sufficient privilege to disable interrupts, and it would be very inefficient to have to do a system call to the kernel and disable interrupts every time two threads wanted to coordinate with each other.
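Written as ordinary C, the four steps look like the sketch below. The function is deliberately not atomic: it only shows the logic that has to be protected. The name is hypothetical, and returning the value originally read follows the convention of the Win32 InterlockedCompareExchange() function.

```c
#include <assert.h>
#include <stddef.h>

/* Non-atomic illustration of the compare-exchange steps. */
long compare_exchange_sketch(volatile long *target, long exchange, long comparand)
{
    assert(target != NULL);
    long observed = *target;      /* (1) read memory location M into R    */
    if (observed == comparand) {  /* (2), (3) compare R with R2, else exit */
        *target = exchange;       /* (4) write R3 back to M               */
    }
    return observed;              /* the caller can tell whether it won   */
}
```

A caller knows the exchange took place when the returned value equals the comparand it passed in. If another thread could run between the read and the write, two callers could both believe they had won, which is why the sequence must appear atomic.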

To be efficient, there is one single place in the entire system where the InterlockedCompareExchange() happens.  The code for the four steps above is placed in the Kernel Data Page, at a particular location that is well known.   Then the NK.EXE and KERNEL.DLL (and any process which has the Kernel Data Page mapped) can call the code, and the instructions all occur in the same place.  This is done so that the API is restartable.  What does this mean?  Why do we do this?

Thread switches in an operating system can happen for three reasons:

  • It has been specifically requested by the executing thread
  • The thread’s time-slice has expired (noted by a timer interrupt event) and it is another thread’s turn to run. 
  • Another type of interrupt occurs, which causes a situation where a thread of higher priority should execute.

The last two cases are really the same – an interrupt occurs that ultimately causes a thread switch. Since an interrupt can occur between any of the steps (1) to (4) and potentially switch out the thread, the operation we needed to be atomic might not be – some other thread might run in between (2) and (3), for example.

To ensure that the instructions (1) to (4) occur atomically, every time there is an interrupt a simple bounds check is made to see if the CPU was currently executing somewhere in (1) to (4).  If the interrupt occurred when the CPU was executing after (1) and before (4), then the instruction pointer for the current thread is reset to point to instruction (1), so that the operation may be retried.  In order for the interrupt code to be able to check if the CPU was executing in between (1) and (4), the code for it must be in a single known location.  That location is inside the Kernel Data Page.
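The bounds check can be sketched as follows. This is a simplified model, not the kernel's interrupt handler: the function name is hypothetical, addresses are plain 32-bit values, and seq_end is assumed to be the first address past the final store, so any program counter inside the half-open range means the store has not completed yet.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical restart check: given the program counter at which a thread was
   interrupted, return the program counter it should resume from. */
uint32_t resume_pc_after_interrupt(uint32_t interrupted_pc,
                                   uint32_t seq_start, uint32_t seq_end)
{
    assert(seq_start < seq_end);
    if (interrupted_pc >= seq_start && interrupted_pc < seq_end) {
        return seq_start;   /* mid-sequence: retry from instruction (1) */
    }
    return interrupted_pc;  /* outside the sequence: resume normally */
}
```

Restarting is safe because nothing is written to memory until the very last instruction of the sequence, so a retried operation simply reads the location again.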

 Once the Interlocked API code has been copied to the Kernel Data Page, the NK.EXE knows where it is and can coordinate actions with KERNEL.DLL when multiple threads become active – ultimately by using the Interlocked API.

Back to our main discussion, the next step in the KERNEL.DLL startup is the architecture-neutral setup.  One of the first architecture-neutral tasks is to check whether the OS image includes a KITL.DLL to allow communication with and debugging of the OS kernel.

 KITL stands for “Kernel Independent Transport Layer”.  This is basically a mechanism by which data ‘packets’ specific to the Windows CE system can be passed between the kernel of the device and Platform Builder running on the desktop.  Usually, the portions of KITL which are implemented in NK.EXE purely revolve around the encoding for transport and the transport of the data packets.  A Board Support Package (BSP) does not have to know anything about the data being sent and received between the device and the desktop – it just has to facilitate the correct transmission and reception.  Mechanisms for transport of the KITL packets include but are not limited to RS232 Serial, Ethernet, and USB.  A full description of KITL is beyond the scope of this blog article.

 Other actions that happen during the architecture-neutral setup include:

  1. Initialize Kernel Debug Output (by calling OEMInitDebugSerial() through the function pointer in the OEMGlobals structure)
  2. Write a masthead debug string (“Windows CE Kernel Version xxxx”) to the debug output.
  3. Select the kernel processor type from the available options

When the architecture-neutral portions have been completed, we can do the platform-specific setup.  This code lives in NK.EXE since it is OEM and board specific.  To initialize this part, the kernel calls into OEMInit() through the function pointer that is in the OEMGlobals structure.  OEMInit does board-specific initialization, and can do one other important thing – start KITL.
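The idea of calling OEM code through a table of function pointers can be sketched as follows.  This is a simplified illustration only: OemGlobalsSketch, its member names, kernel_early_init() and the Demo* functions are hypothetical stand-ins, not the real OEMGlobals layout, and I am assuming the masthead is written through an OEM-provided output function.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for the OEMGlobals structure. */
typedef struct {
    void (*pfnInitDebugSerial)(void);              /* role of OEMInitDebugSerial() */
    void (*pfnWriteDebugString)(const char *text); /* debug output (assumed)       */
    void (*pfnInit)(void);                         /* role of OEMInit()            */
} OemGlobalsSketch;

/* Records the order in which the demo OEM functions are reached. */
static char   g_trace[8];
static size_t g_traceLen;

static void trace(char c)
{
    if (g_traceLen < sizeof g_trace - 1) {
        g_trace[g_traceLen++] = c;
    }
}

/* Demo "OEM" implementations; a real BSP would touch hardware here. */
static void DemoInitDebugSerial(void)              { trace('S'); }
static void DemoWriteDebugString(const char *text) { (void)text; trace('W'); }
static void DemoInit(void)                         { trace('I'); }

/* Kernel side: never calls the OEM functions by name, only through
   the table.  Returns 0 on success, -1 if the table is incomplete. */
int kernel_early_init(const OemGlobalsSketch *oem)
{
    if (oem == NULL || oem->pfnInitDebugSerial == NULL ||
        oem->pfnWriteDebugString == NULL || oem->pfnInit == NULL) {
        return -1;
    }
    oem->pfnInitDebugSerial();                                  /* debug output first  */
    oem->pfnWriteDebugString("Windows CE Kernel Version xxxx"); /* masthead            */
    oem->pfnInit();                                             /* board-specific init */
    return 0;
}
```

Filling an OemGlobalsSketch with the three Demo* functions and passing it to kernel_early_init() leaves the characters S, W, I in g_trace, in that order, which mirrors the sequence described above.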

If KITL is built into the NK.EXE, then its functions are directly accessible from NK.EXE.  If KITL is in a DLL, then that DLL will have been loaded by the kernel at the beginning of the architecture-neutral setup, as shown above.  In either event, the OEMInit() function can call a Kernel IO control saying that KITL should be started.  Based on whether the KITL.DLL was found or not, the kernel knows what to do.

Upon return from OEMInit(), the kernel is ready to start processes and threads to run.  It synchronizes its cache, and then enters the processor architecture’s service mode if it is not already running in it.  Then it does any one-time inits that do not require a current thread. These actions include:

  1. Enumerate available Memory  (optional call to OEMEnumExtensionDRAM() )
  2. Initialize critical sections in the kernel (critical section code uses the Interlocked API, the setup of which was discussed above).
  3. Initialize heap structures
  4. Initialize process and thread tracking structures
  5. Any other actions done before multi-threading is enabled.

After all single-threaded initialization is done, the kernel is ready to schedule the first thread.  This first thread is called “SystemStartupFunc()”, and lives in KERNEL.DLL.  To start the thread, the kernel specifies that there is no current thread to switch from, sets the first thread as the only one available to run, then calls into the thread scheduler code.   The scheduler code takes a look at all available threads and chooses the next one to run.  At this point in startup we only have one thread that has been manually set up to run, so that one is the one that is switched to.

 The SystemStartupFunc() function begins execution by flushing the system cache, then does things that require a ‘current’ thread to be running in order to happen.  These actions include:

  1. Initialize the system loader
  2. Initialize the paging pool
  3. Initialize system logging
  4. Initialize system debugger

The SystemStartupFunc() will call one more OEM function before it completes initialization – it will call the OEMIoctl() function through the function pointer in the OEMGlobals, with an argument ‘OEM_HAL_POSTINIT’.  This tells the NK.EXE that all system startup has completed and we are about to schedule threads and processes.

Upon exit from this first call to OEMIoctl(), the SystemStartupFunc() initializes the system message queue, any watchdogs, and then creates and starts the threads for the power manager and file system.  Thus, the rest of the higher-level parts of the operating system begin to execute here.  The last operation taken by the SystemStartupFunc() is to create another thread which executes the function “RunAppsAtStartup()”.  This function creates the first user processes.

We are now at the point where the kernel, power manager, and file system are all executing, and the applications that the system registry lists to run at startup can begin to execute.

This concludes the blog entry on how Windows Embedded CE 6.0 starts.  The internals of Windows CE are quite interesting and very well structured, and the startup process described above gives insight into the most critical system components.  In the future I hope to publish other articles on the internals of the system registry, the file system, and the device and power managers.

Adding and removing KITL drivers in x86 BSPs

 

Overview

Today I want to chat about what it takes to support a new Ethernet chip for download and KITL debugging on an x86 PC-based platform.  We'll start by talking about how Ethernet drivers are represented in the x86 KITL structure, then we'll walk through (in a detailed, step-by-step fashion) adding a new driver to your bootloader and OS image.  This article is valid for both CE5.0 and CE6.0.

 

This article assumes you have some knowledge of what KITL (Kernel Independent Transport Layer) is, as well as a basic understanding of what a bootloader does.

 

Architecture

First, let's take a look at the architecture of x86 KITL.  At the lowest level for our discussion, there is the driver code that supports our NIC (network interface card).  This code implements the functions in the OAL_KITL_ETH_DRIVER structure, as described in platform\common\src\inc\oal_kitl.h.  Things like GetFrame, SendFrame, Init, and InitDMABuffer will all be implemented here.  There are working samples of this code in platform\common\src\common\ethdrv.

 

Because x86 supports a PCI bus, there may be multiple supported chips; the CEPC platform, for example, supports the RTL8139 chip as well as NE2000 compatible chips, among others.  Each NIC we support will have a separate driver library that implements these functions.  The OAL_KITL_ETH_DRIVER structure that represents our NIC driver plugs into a larger structure to allow us to support multiple drivers.  This larger structure lists all of the drivers the platform supports.  It is found in platform\common\src\x86\common\kitl\kitldrv_x86.c (1), and its type is SUPPORTED_NIC.

 

If we look more closely at the SUPPORTED_NIC structure we'll see several fields that distinguish one NIC driver from another.  We can see the code in platform\common\src\x86\inc\x86kitl.h:

//
// Ethernet debug controller vendor and PCI information.
//
typedef struct _SUPPORTED_NIC // NIC vendor ID
{
    USHORT wVenId;             // PCI Vendor ID
    USHORT wDevId;             // PCI Device ID
    DWORD  dwUpperMAC;         // 1st 3 bytes of mac address
    UCHAR  Type;               // adapter type
    UCHAR  szAbbrev[3];        // Vendor name abbreviation
    const OAL_KITL_ETH_DRIVER *pDriver; // corresponding driver
} SUPPORTED_NIC, *PSUPPORTED_NIC;

 

The comments are pretty self-explanatory.  The only one that's a little ambiguous is the UCHAR Type, which is just a CE-specific type that you can pick from the list in public\common\oak\inc\halether.h - EDBG_ADAPTER_NE2000, for example.  The wVenId and Type fields in the SUPPORTED_NIC structure will identify the NIC to the bootloader and KITL as supported.

 

How it Fits Together

The SUPPORTED_NIC structure is compiled into oal_kitl_x86.lib, which our bootloader and KITL will link with.  The bootloader and KITL will also link with static libraries for each NIC driver, such as rtl8139dbg.lib, ne2kdbg.lib.  You can see examples in the SOURCES files in platform\CEPC\src\bootloader\eboot and platform\CEPC\src\kitl (2).

 

With all of this in place, we'll run the bootloader at device boot time and it will enumerate the devices on the PCI bus.  It will read the PCI Config space and find network-class devices that have a PCI Vendor ID that matches one in the SUPPORTED_NIC list.  If it can't find such a device, it will check the Adapter Type in the SUPPORTED_NIC list and match it against the default adapter type that's compiled into the bootloader.  Once it finds a match, it will use the driver to download the OS image and then begin execution at the OS level.
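The two-pass matching described above might look roughly like this.  It is a sketch under assumptions, not the bootloader's source: NicEntry is a cut-down stand-in for SUPPORTED_NIC, find_supported_nic() is a made-up name, and the list of vendor IDs is assumed to have been collected from the network-class devices on the PCI bus already.

```c
#include <stddef.h>
#include <stdint.h>

/* Cut-down, hypothetical stand-in for SUPPORTED_NIC. */
typedef struct {
    uint16_t vendor_id;     /* plays the role of wVenId */
    uint16_t device_id;     /* plays the role of wDevId */
    uint8_t  adapter_type;  /* plays the role of Type   */
} NicEntry;

/* Returns the index of the list entry to use, or -1 if nothing matches.
   found_vendor_ids holds the PCI Vendor IDs of the network-class
   devices discovered on the bus. */
int find_supported_nic(const NicEntry *list, size_t count,
                       const uint16_t *found_vendor_ids, size_t found_count,
                       uint8_t default_type)
{
    size_t i, j;

    /* Pass 1: a device on the bus has a Vendor ID that is in the list. */
    for (j = 0; j < found_count; j++) {
        for (i = 0; i < count; i++) {
            if (list[i].vendor_id == found_vendor_ids[j]) {
                return (int)i;
            }
        }
    }

    /* Pass 2: fall back to the default adapter type compiled into
       the bootloader. */
    for (i = 0; i < count; i++) {
        if (list[i].adapter_type == default_type) {
            return (int)i;
        }
    }
    return -1;
}
```

For example, with a list holding one entry for vendor 0x10EC (Realtek) and a bus that reports 0x8086 and 0x10EC, the first pass returns the Realtek entry; the adapter type only matters when no Vendor ID matches.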

 

Once downloaded, the OS will perform some basic initialization.  Then it will go through the same matching process to discover a supported network card for KITL.  If it finds a match, it will use that driver and NIC for the KITL connection.

 

Walkthrough

Now that we understand how it all works, let's consider what is needed to add our own driver to this structure.  We'll use CE6.0 as a basis since the architecture is the same in 5.0 and 6.0; see the footnotes for filenames / paths that have changed slightly from CE5.0.

 

Our overall approach will be to first create a driver library that supports the new NIC.   Then we will create an x86 KITL library that includes this driver in the list.

 

We can use one of the existing drivers from platform\common\src\common\ethdrv as a baseline for creating our driver.  We’ll copy this driver into our BSP and then modify it there.

 

Step 1) Copy the platform\common\src\common\ethdrv\rtl8139\... to a directory in your own platform (for example platform\MyBSP\src\kitl\ethdrv\MyNIC).

 

Add your new directory to the DIRS file in the parent directory so it gets compiled when we build.  In this new directory, change the filename of rtl8139.c to MyNIC.c.  Open the SOURCES file and change the TARGETNAME to bsp_ethdrv_mynic, and change the SOURCES target to MyNIC.c.

 

Step 2) Replace the copied driver functions with functions that support your NIC.

 

This is the hardware-specific code that initializes your NIC, allows it to send and receive frames, etc.  Note that the function names will change but the signatures should match the OAL_KITL_ETH_DRIVER structure.  Once we have the code written, we’ll create a header with some prototypes so the functions can be referenced by the bootloader and KITL.

 

Step 3) In your platform\MyBSP\src\inc directory, make a header file, MyNIC.h, that prototypes the MyNIC.c functions.

 

You can copy the RTL8139 prototypes from platform\common\src\inc\oal_ethdrv.h, and then modify them to match your MyNIC.c function names.

 

The next hurdle we need to get over is the fact that our driver list is compiled and linked in an "off-limits" directory.  We can't modify the platform\common implementation for two reasons.  First, if there is ever a QFE in this code, our modifications would conflict with it.  Second, if we add our driver to the common list, the driver will be expected by all x86 BSPs that use the common library.  So, we need to take the common list and make it our own.  The easiest way to do this is to simply clone it.

 

Step 4) Copy platform\common\src\x86\kitl\... to a directory in your own platform (for example platform\MyBSP\src\kitl\x86kitllib).

 

Add your new directory to the DIRS file in the parent directory so it gets compiled when we build.  Now that we have a copy of the supported NIC code, we need to add our driver to the list.  We can also remove drivers that we don't want to support, thus reducing the size of our bootloader and OS.  Before we can add our driver, though, we need to define an OAL_KITL_ETH_DRIVER structure that describes it.

 

Step 5) In your new kitldrv_x86.c, define an OAL_KITL_ETH_DRIVER structure for your NIC using the prototypes from your MyNIC.h file.
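As a rough illustration of Step 5, the sketch below fills a driver table with the MyNIC functions.  Note the assumptions: KitlEthDriverSketch is a simplified stand-in that only covers the four functions named earlier (Init, InitDMABuffer, SendFrame, GetFrame), and its parameter lists are invented for the example.  The real OAL_KITL_ETH_DRIVER in oal_kitl.h has more members and its own signatures, so copy those from the header rather than from here.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-in for OAL_KITL_ETH_DRIVER. */
typedef struct {
    int            (*pfnInit)(unsigned char *baseAddress, unsigned short mac[3]);
    int            (*pfnInitDmaBuffer)(unsigned long address, unsigned long size);
    unsigned short (*pfnSendFrame)(unsigned char *data, unsigned long length);
    unsigned short (*pfnGetFrame)(unsigned char *data, unsigned short *length);
} KitlEthDriverSketch;

/* Placeholder bodies.  In a real BSP these live in MyNIC.c, are
   prototyped in MyNIC.h, and program the NIC hardware. */
static int MyNICInit(unsigned char *baseAddress, unsigned short mac[3])
{
    (void)baseAddress; (void)mac;
    return 1;                       /* placeholder: report success */
}

static int MyNICInitDMABuffer(unsigned long address, unsigned long size)
{
    (void)address; (void)size;
    return 1;                       /* placeholder: report success */
}

static unsigned short MyNICSendFrame(unsigned char *data, unsigned long length)
{
    (void)data; (void)length;
    return 0;                       /* placeholder: nothing was sent */
}

static unsigned short MyNICGetFrame(unsigned char *data, unsigned short *length)
{
    (void)data;
    if (length != NULL) {
        *length = 0;                /* placeholder: no frame available */
    }
    return 0;
}

/* The driver description that the supported-NIC list will point to. */
const KitlEthDriverSketch g_kitlEthMyNIC = {
    MyNICInit,
    MyNICInitDMABuffer,
    MyNICSendFrame,
    MyNICGetFrame
};
```

The point is simply that the structure is a table of function pointers, so the order of the initializers has to follow the order of the members in the header.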

 

Step 6) Edit the SUPPORTED_NIC structure in kitldrv_x86.c, adding our driver to the list and removing any unwanted drivers.

 

If you don't know the PCI Vendor ID or Device ID of your NIC, you can boot your BSP without the driver in the list and examine the serial debug messages from the bootloader.
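Here is what an edited list could look like once it keeps one stock driver and adds ours.  Treat it as a sketch: the typedefs stand in for the Windows types so the fragment compiles on its own, OAL_KITL_ETH_DRIVER is reduced to a dummy, and g_MyNicList, the adapter type values, the MAC prefixes, the "MN" abbreviation and the 0x1234/0x5678 IDs are all placeholders.  Only the Realtek IDs (0x10EC/0x8139) are real.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the Windows typedefs and the real driver structure. */
typedef uint16_t USHORT;
typedef uint32_t DWORD;
typedef uint8_t  UCHAR;
typedef struct { int placeholder; } OAL_KITL_ETH_DRIVER;

typedef struct _SUPPORTED_NIC
{
    USHORT wVenId;             /* PCI Vendor ID */
    USHORT wDevId;             /* PCI Device ID */
    DWORD  dwUpperMAC;         /* 1st 3 bytes of mac address */
    UCHAR  Type;               /* adapter type */
    UCHAR  szAbbrev[3];        /* Vendor name abbreviation */
    const OAL_KITL_ETH_DRIVER *pDriver; /* corresponding driver */
} SUPPORTED_NIC;

/* Placeholder adapter types; real code picks values from halether.h. */
#define ADAPTER_TYPE_RTL8139 4
#define ADAPTER_TYPE_MYNIC   100

static const OAL_KITL_ETH_DRIVER g_kitlEthRTL8139 = { 0 };
static const OAL_KITL_ETH_DRIVER g_kitlEthMyNIC   = { 0 };

/* The cloned list: unwanted drivers removed, our NIC appended. */
const SUPPORTED_NIC g_MyNicList[] = {
    { 0x10EC, 0x8139, 0x000000, ADAPTER_TYPE_RTL8139, "RT", &g_kitlEthRTL8139 },
    { 0x1234, 0x5678, 0x001122, ADAPTER_TYPE_MYNIC,   "MN", &g_kitlEthMyNIC   }
};

const size_t g_MyNicCount = sizeof(g_MyNicList) / sizeof(g_MyNicList[0]);
```

Every driver you drop from the list is one less library to link, which is where the bootloader and OS size savings come from.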

 

Now our NIC is in the supported list.  We still need to spruce up our bootloader and KITL SOURCES files so that we're linking with our custom library as opposed to the original in platform\common.

 

Step 7) Edit the SOURCES file in .\base and .\baseboot (3) subdirectories of our cloned directory.  Change the TARGETNAMEs from oal_kitl_x86.lib[s] to different names, such as mybsp_kitl_x86.lib and mybsp_kitl_x86_boot.lib.

 

Step 8) Edit the SOURCES files in your bootloader directory and kitl (4), and add the mybsp_kitl_x86_boot and mybsp_kitl_x86 libraries, respectively, to the TARGETLIBS for each SOURCES file.  Note that the location of these libraries is going to be in $(_TARGETPLATROOT)\lib instead of $(_PLATCOMMONLIB).

 

Now we’ve pointed our bootloader and KITL implementations to link with our new driver list.  The last step is to add the driver library that supports our NIC.

 

Step 9) Edit the SOURCES files in your bootloader directory and kitl (4), and add bsp_ethdrv_mynic.lib to the TARGETLIBS for each SOURCES file.

 

Now you can rebuild your BSP and the resultant bootloader and OS image will support your new NIC!

 

I think you might agree that this architecture isn't the most flexible; that's something Microsoft will look at for future versions of CE.

 

(1) [In CE5.0, the filename is just kitldrv.c.]

(2) [In CE5.0, the driver libraries are linked with the OAL and the kernel in platform\CEPC\src\kernel\kernkitl]

(3) [In CE5.0, there are no subdirectories, so you only need to change a single SOURCES file.]

(4) [In CE5.0, this would be the kernkitl directory]

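April 9

Brilliant. What a spirit of internationalism this is, heh

[Author: 新长城, posted 2008-04-03 08:11:33]

  After following the recent news, I have completely given up on the Europeans. Their attitude is: whether the reports about the Tibet incident are true or false, who cares? Deutsche Welle was even more blatant: even if the Chinese did nothing wrong, the Tibetans did even less wrong, and even if it was an armed uprising, it was justified.

  Seeing that, I understood. Supporting Tibetan independence? It is all fake; nobody who truly supported it would go about it this way. They simply dislike you and want to needle you. Going over to explain is useless: if they do not believe you, they do not believe you, and no amount of evidence will help.

  I have now seen the light, and lately I have been earnestly studying the history of East Prussia. When I meet Germans, I ask them what they plan to do about it: you have been colonized by the Russians, and we Chinese are worried on your behalf. When I meet French people, I ask when Nice and Corsica will be returned to Italy; when I meet Italians, I ask when they will get those two places back, and then how they view the question of Sicilian independence. As for the British, enough said: when I meet someone from England I condemn them for oppressing Scotland, Ireland and Wales, and when I meet someone from the other three I sympathize and ask why they do not dare to resist. If the foreign devil says he is doing fine and does not feel oppressed, I sympathize with him all the more, overflowing with compassion.

  For the past few days I have been calling on Germans to take back East Prussia, and I feel great. I finally know how that Richard Gere fellow feels: it harms others without benefiting me, yet I feel idealistic and noble, looking down in spirit on the humiliated Germans and despising the Russians, Lithuanians and others who seized their land.

  The only pity is that YouTube, that lousy place, keeps deleting posts. My rabble-rousing posts often last only a few minutes. I have to hand it to them, damn it; how many people must they have moderating online?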
  
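  Here is a list of the ones I know

  1: Corsica, with its wiki link (link source)
  Description: Italian territory until the 18th century, after which it passed to France.

  2: Nice. Italian territory until the 16th century, then repeatedly fought over, and definitively French from the 19th century on.

  3: East Prussia
  This one is complicated. It was a traditionally German-populated area; for its history and politics see the wiki (link source). Up to the end of the 19th century the region was still predominantly German; now there are essentially no Germans left, and it is no longer under German rule.

  4: Catalonia. Every Barça fan knows it: a region of Spain with a long-standing and strong independence streak.

  5: Sicily. Its history never settled on a single owner; everyone took turns running it, and it became part of Italy in the 19th century. That is why pro-independence feeling among Sicilians is still intense today.

  6: Gibraltar. A British colony; Spain has always wanted it back and has never managed to get it.

  7: Everyone knows the British place names. Of the rest, Alsace-Lorraine is worth studying, and additions of other good ethnically disputed regions are welcome; these are mainly the ones I know.

  8: Trieste, contributed by 夜月空山:

  From 1382 to 1918 it was part of Austria, and Princess Sissi's palace is there. It was once very prosperous and declined after being incorporated into Italy.
  I once spent some time there for work, and two impressions stand out. First, the scenery is extremely beautiful: a lovely old city, half hills and half water. Second, many elderly people and people in the suburbs can still speak German, and many in the suburbs and countryside speak Slovene. The landlady of my hotel spoke not a word of English, and apart from "Bon jorno" I know no Italian at all, but I soon found that I could talk with her fluently in German...

  Between 1382 and 1919, Trieste was part of Austria. In 1920, Trieste and the whole of Friuli-Venezia Giulia were handed over to Italy. But the annexation cut the city off from its traditional hinterland and greatly reduced its importance. On October 30, 1922, Benito Mussolini's Fascists came to power, and the ethnic Slovenes (25% of the population) came under repression. The flashpoint had come earlier, on July 13, 1920, when Italian nationalist groups burned down the Narodni dom (National House), the cultural center of Trieste's Slovenes.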
  
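  9: Three more places, contributed by 夜月空山
  Belgium: Flemings vs Walloons [ 夜月空山 ] posted 2008-04-03 15:58:41
  Romania: Transylvania. Over this region Romania and Hungary became mortal enemies, to the point that in World War II, although both were allies of Germany, they fought each other with more zeal than they fought the Soviet Union.
  Moldova: the Transnistrian republic on the Dniester.

  Belgium: Flemish vs Walloon
  Romania: Transylvania (the place the vampires come from), with Hungary
  Moldova: Transnistria (Pridnestrovian Moldavian Republic)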
  
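  Tailor the menu to the guest [ 四方城 ] posted 2008-04-03 15:22:53
  United Kingdom:
  +Northern Ireland
  +Scotland
  +Wales

  France:
  +Brittany (Bretagne/Breizh)
  +Corsica
  +Nice

  Germany:
  +East Prussia*, capital Königsberg. Now divided among:
  ++Poland, main cities: Szczecin (original German name Stettin), Gdansk (original German name Danzig); these places can also be used to dig a pit for Poland
  ++Russia, main city: Kaliningrad (original German name Konigsberg)
  ++Lithuania, main city: Klaipeda (original German name Memel)

  Italy:
  +Sicily (Sicilia), capital Palermo

  Spain:
  +Basque Country (Basque/Vasco/Euskadi), capital Bilbao
  +Catalonia, capital Barcelona
  +Gibraltar*

  * occupied territory

  Poland's history is one long story of being partitioned, so they are easy to deal with. Does the Czech Republic have any territorial issues? They have been making a lot of noise lately too.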
  
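  Author's afterthoughts:
  Reasoning does not work; you cannot get through to them [新长城, posted 2008-04-03 08:29:25]
  What they want is a sense of moral superiority. Well, we can have that too.

  Be sure to show them plenty of concern. For example, every time I bring up Corsica with Italians, they always dodge, or look pained, or put on an air of indifference. Haha, great fun.

  I am fond of Corsica because Corsica has some history with our neck of the woods [新长城, posted 2008-04-03 08:35:18]
  Bringing these things up when you have nothing else to do is great: rub a little salt in their wounds, then state your position at the end: I support the hardworking, kind, beautiful and generous Italians, and the hypocritical French are the ones I dislike most. The Italian devils were so grateful their eyes welled up.

  I have needled Spaniards with Gibraltar plenty of times [新长城, posted 2008-04-03 08:43:01]
  I once shared a place with some Spaniards. Those guys were tiresome, partying all day, but they were fairly friendly. Whenever they let slip a remark about our lack of human rights and freedom, I would ask, all innocence: so what is the story with Gibraltar?

  The Spaniards were sharp enough: the British grabbed it. Isn't your Hong Kong the same?
  I kept going: Hong Kong has come back. When is Gibraltar coming back?
  The Spaniards would fall silent, and then just grin sheepishly.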
  
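  Haha, just yesterday I was chatting with a handsome Spanish guy who kept saying he was Catalan, not Spanish, full of grief and indignation.

  East Prussia is a good one

  Play dumb and ask them, "I have long wanted to visit Königsberg, the spiritual capital of Germany, but I just cannot find it on the map...". But I only say this to the German idiots who are purely out to pick a fight.

  With ordinary classmates and the like, I still explain things patiently

  Long ago I sparred with a French beauty over Corsica; oh, and I have used Cyprus to spar with Greeks. The harshest is Istanbul: they always correct me that the place is called Constantinople. Now that is the eternal sore spot in Europeans' hearts.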
  
  
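  Three things for Turkey: Constantinople, the Kurds, the Armenian massacre

  Spain: Gibraltar, the Basque Country

  France: Corsica

  Italy: Trieste

  Greece: Cyprus, the east coast of Asia Minor

  Germany: East Prussia, the Sudetenland

  United Kingdom: Northern Ireland

  Canada: Quebec

  What messes does Northern Europe have?

  Oh, pan-Scandinavianism

  In hoi2, Northern Europe is just one country, Greater Scandinavia...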