Tuesday, February 23, 2010

Embedded Linux - Process Isolation and Control

The Linux kernel, at its most basic level, offers these services as a way of providing a common API for accessing system resources:
• Manage tasks, isolating them from the kernel and each other
• Provide a uniform interface for the system’s hardware resources
• Serve as an arbiter to resources when contention exists

These features result in a more stable environment than one where access to the hardware and resources isn’t closely managed. For example, in the absence of an operating system, every running program has equal access to all available RAM. This means an overrun bug in one program can write into memory used by another program, which will then fail for what appear to be mysterious, unexplainable reasons until all the code on the system is examined. The notion of resource contention is more complex than just making sure two processes don’t attempt to write data to the serial port simultaneously. The scarcest resource is time, and the operating system decides which tasks run when in order to maximize the amount of work performed. The following sections look at each item in more detail.

Manage and Isolate Tasks
Linux is a multitasking operating system. In Linux, the word process describes a task that the kernel tracks for execution. Multitasking means the kernel must keep some data about what is running, the current state of each task, and the resources it’s using, such as open files and memory. For each process, Linux creates an entry in a process table and assigns the process a separate memory space, file descriptors, register values, stack space, and other process-specific information. After it’s created, a process can’t access the memory space of another process unless both have negotiated a shared memory pool; but even access to that memory pool doesn’t give access to an arbitrary address in another process.
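As a rough illustration of that separation, the following sketch forks a child process that overwrites a variable and then checks what the parent still sees. The function name parent_value_after_child_write() is made up for this example; fork(), waitpid(), and _exit() are the standard POSIX calls.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the parent's view of `value` after a child process has
 * overwritten its own copy, or -1 if fork() or waitpid() fails.
 * (Hypothetical helper written for this illustration.) */
int parent_value_after_child_write(void)
{
    volatile int value = 1;          /* lives in the parent's address space */

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        value = 2;                   /* changes only the child's private copy */
        _exit(EXIT_SUCCESS);
    }

    if (waitpid(pid, NULL, 0) < 0)   /* wait until the child has finished */
        return -1;

    return value;                    /* the parent's copy is untouched */
}
```

Called from main(), the function should return 1: the child’s write lands in its own memory space, so the parent never sees it.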

Processes in Linux can contain multiple execution threads. A thread shares the memory space and resources of the process that started it, but it has its own instruction pointer, register values, and stack. Threads within the same process, unlike separate processes, can access each other’s memory. For some applications, this sharing of resources is both desired and convenient; however, managing several threads’ contention for resources is a study unto itself. The important thing is that with Linux, you have the design freedom to use these process-control constructs.
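By contrast with separate processes, a thread can write directly into memory owned by the thread that created it. The sketch below assumes POSIX threads are available (on some toolchains it must be built with -pthread); the helper names set_to_two() and value_after_thread_write() are invented for the example.

```c
#include <pthread.h>
#include <stddef.h>

/* Thread entry point: writes through a pointer into the creator's stack. */
static void *set_to_two(void *arg)
{
    int *shared = arg;
    *shared = 2;
    return NULL;
}

/* Returns the creator's view of `value` after another thread wrote to it,
 * or -1 if the thread could not be created or joined.
 * (Hypothetical helper written for this illustration.) */
int value_after_thread_write(void)
{
    int value = 1;
    pthread_t thread;

    if (pthread_create(&thread, NULL, set_to_two, &value) != 0)
        return -1;
    if (pthread_join(thread, NULL) != 0)  /* join orders the write before the read */
        return -1;

    return value;
}
```

Here the function should return 2, because both threads run in the same memory space. With more than one writer, the program would need a mutex or similar mechanism to keep the accesses orderly.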

Processes are isolated not only from each other but from the kernel as well. A process also can’t access arbitrary memory from the kernel. Access to kernel functionality happens under controlled circumstances, such as syscalls or file handles. A syscall, short for system call, is a generic concept in operating system design that allows a program to perform a call into the kernel to execute code. In the case of Linux, the function used to execute a system call is conveniently named syscall().
When you’re working with a syscall, as explained later in this chapter, the operation works much like a regular function call for an API. Using a file handle, you can open what appears to be a file to read and write data. The implementation of a file still reduces to a series of syscalls; but the file semantics make them easier to work with under certain circumstances.
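To make the idea concrete, here is a minimal sketch that asks the kernel for the process ID twice: once through the generic syscall() function and once through the ordinary C library wrapper getpid(). It assumes a glibc-based Linux system; the two helper names are invented for the example.

```c
#define _GNU_SOURCE          /* exposes syscall() in <unistd.h> on glibc */
#include <sys/syscall.h>     /* SYS_getpid and the other syscall numbers */
#include <sys/types.h>
#include <unistd.h>

/* Asks the kernel for the process ID through the generic syscall() entry.
 * (Hypothetical helper written for this illustration.) */
pid_t pid_via_syscall(void)
{
    return (pid_t)syscall(SYS_getpid);
}

/* Asks for the same value through the C library wrapper. */
pid_t pid_via_wrapper(void)
{
    return getpid();
}
```

Both functions should return the same value. In everyday code the wrapper is the usual choice; syscall() is mostly useful for kernel services that have no wrapper in the C library.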

The separation of processes from each other and from the kernel means you rarely have to debug problems caused by one process stepping on another’s memory, and the kernel arbitrates access to shared resources such as a serial port or network device. In addition, the operating system’s internal data structures are off limits to user programs, so an errant program is far less likely to halt execution of the entire system. This degree of survivability alone is why some engineers choose Linux over other lighter-weight solutions.

Memory Management and Linux
Linux uses a virtual memory-management system. The concept of virtual memory has been around since the early 1960s and is simple: the process sees its memory as a vector of bytes; and when the program reads or writes to memory, the processor, in conjunction with the operating system, translates the address into a physical address.

The part of the processor that performs this translation is the memory management unit (MMU). When a process accesses memory, the MMU looks up the address in a table populated by the kernel to translate the requested address into a physical address. If the MMU can’t translate the address, it raises an exception (a page fault) and passes control to the operating system to resolve the address.

The level of indirection supplied by the memory management means that if a process requests memory outside its bounds, the operating system gets a notification that it can handle or pass along to the offending process. In an environment without proper memory management, a process can read and write any physical address; this means memory-access errors may go unnoticed until some other part of the program fails because its memory has been corrupted by another process.

Programs running in Linux do so in a virtual memory space. That is, when a program runs, it has a certain address space that is a subset of the total system’s memory. That subset appears to start at 0. In reality, the operating system allocates a portion of memory and configures the processor so that the running program thinks address 0 is the start of memory, but the address is actually some arbitrary point in RAM. For embedded systems that use swap, this fiction continues: the kernel moves some of the contents of RAM out to disk when not in use, a feature often loosely referred to as virtual memory. Many embedded systems don’t use swap because no disk exists on the system; but for those that do, this feature sets Linux apart from other embedded operating systems.

Uniform Interface to Resources
This sounds ambiguous because there are so many different forms of resources. Consider the most common resource: the system’s memory. In all Linux systems, from an application perspective, memory from the heap is allocated using the malloc() function. For example, this bit of code allocates 100 bytes, storing the address of the first byte in from_the_heap:

#include <stdlib.h>

char *from_the_heap;
from_the_heap = malloc(100);  /* NULL if the allocation fails */

No matter what sort of underlying processor is running the code or how the processor accesses the memory, this code works (or fails in a predictable manner) on all Linux systems. If paged virtual memory is enabled (that is, some memory is stored on a physical device, like a hard drive) the operating system ensures that the requested addresses are in physical RAM when the process requests them. Memory management requires interplay between the operating system and the processor to work properly. Linux has been designed so that you can access memory in the same way on all supported processors.
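The phrase “fails in a predictable manner” deserves a short illustration: malloc() reports failure by returning NULL, so portable code checks for it before using the memory. The helper alloc_filled() below is a made-up name for this sketch, not part of any library.

```c
#include <stdlib.h>
#include <string.h>

/* Allocates `size` bytes from the heap and fills them with `fill`.
 * Returns NULL if the allocation fails; the caller must free() the result.
 * (Hypothetical helper written for this illustration.) */
char *alloc_filled(size_t size, char fill)
{
    char *from_the_heap = malloc(size);
    if (from_the_heap == NULL)
        return NULL;                 /* the predictable failure path */
    memset(from_the_heap, fill, size);
    return from_the_heap;
}
```

A caller would write something like char *buf = alloc_filled(100, 'a'); and later free(buf); and the same source compiles unchanged for any processor Linux supports.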

The same is true for accessing files: all you need to do is open a file handle and begin reading or writing. The kernel handles fetching or writing the bytes, and that operation is the same no matter what physical device is handling the bits:

FILE *file_handle;
file_handle = fopen("/proc/cpuinfo", "r");
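A slightly fuller sketch of the same idea opens a file, reads it to the end, and closes it again. The function name count_lines() is invented for this example; it uses only standard C stdio calls.

```c
#include <stdio.h>

/* Counts newline characters in the file at `path`.
 * Returns -1 if the file cannot be opened.
 * (Hypothetical helper written for this illustration.) */
long count_lines(const char *path)
{
    FILE *file_handle = fopen(path, "r");
    if (file_handle == NULL)
        return -1;

    long lines = 0;
    int c;
    while ((c = fgetc(file_handle)) != EOF) {
        if (c == '\n')
            lines++;
    }

    fclose(file_handle);
    return lines;
}
```

Calling count_lines("/proc/cpuinfo") reads text the kernel generates on demand, while count_lines() on a path in flash or on disk reads stored bytes. The code is the same in both cases.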

Because Linux is based on the Unix operating system philosophy that “everything is a file,” the most common interface to system resources is through a file handle. The interface to that file handle is identical no matter how the underlying hardware implements this functionality. Even TCP connections can be represented with file semantics.
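The sketch below shows file semantics applied to a socket. To stay self-contained it uses a local socket pair created with socketpair() rather than a real TCP connection, which is an assumption made for brevity; the descriptors behave the same way under write() and read(). The name echo_through_socket() is invented for the example.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends `msg` through a connected socket pair using plain write() and
 * read(). Copies what was received into `out` (capacity `out_size`) and
 * returns the number of bytes read, or -1 on error.
 * (Hypothetical helper written for this illustration.) */
ssize_t echo_through_socket(const char *msg, char *out, size_t out_size)
{
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
        return -1;

    ssize_t received = -1;
    size_t len = strlen(msg);
    if (write(fds[0], msg, len) == (ssize_t)len)   /* same call as for a file */
        received = read(fds[1], out, out_size);    /* same call as for a file */

    close(fds[0]);
    close(fds[1]);
    return received;
}
```

Nothing in the write(), read(), and close() calls reveals that the descriptors refer to sockets rather than files, which is the point of the uniform interface.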

The uniformity of access to resources lets you simulate a target environment on your development system, a process that once required special (and sometimes costly) software. For example, if the target device uses the USB subsystem, it has the same interface on the target as it does on the development machine. If you’re working on a device that shuffles data across the USB bus, that code can be developed, debugged, and tested on the development host, a process that’s much easier and faster than debugging code on a remote target.

System Calls
In addition to file semantics, the kernel also uses the idea of syscalls to expose functionality. Syscalls are a simple concept: when you’re working on the kernel and want to expose some functionality, you create an entry in a vector that points to the entry point for the routine. The data from the application’s memory space is copied into the kernel’s memory space. All system calls for all processes are funneled through the same interface.

When the kernel is finished with the syscall, it copies the result back into the application’s memory space and returns control to the caller. Using this interface, a program in user space has no direct access to data structures in the kernel. The kernel can also keep strict control over its data, which guards against data corruption caused by an errant caller.

Source of Information : Pro Linux Embedded Systems (December 2009)
