Sunday, December 19, 2010

Qualifiers Demystified

Type qualifiers were introduced by ANSI C. They are used to control the optimizations the compiler performs on a data object.
ANSI C defines two type qualifiers:

1. const
2. volatile

The const and volatile type qualifiers can be applied to any type of data object. When a type qualifier is used with an array identifier, the compiler qualifies each element of the array, not the array type itself. A qualifier can be applied to a structure/union object, and an individual structure/union member can also be qualified. When a structure object is qualified, every member of the structure is qualified with the same qualifier; when a qualifier is applied to one particular structure member, only that member is qualified.

1. The 'const' qualifier:

This qualifier instructs the compiler to mark a data object as read-only, i.e. the program may not change the value of the object during execution. An object declared as const therefore cannot serve as the operand of any operation that changes its value, such as assignment, increment, or decrement.

Some const qualified objects and their behavior are given below:

Constant integer:
const int Var = 10;
The value of Var can’t be modified.

Pointer to a constant integer:
const int *Ptr;
The value at the location pointed to by Ptr can’t be modified through Ptr, i.e. *Ptr is non-modifiable but Ptr is modifiable.

Constant pointer:
int * const Cptr;
A pointer that always points to the same location. Cptr is non-modifiable but *Cptr is modifiable.

A constant pointer to a constant integer:
const int *const Ccp;
Neither the pointer nor the integer can be modified: both Ccp and *Ccp are non-modifiable.

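Redundant use of const:
const const int y;
Illegal in C89. C99 and later accept a repeated qualifier and treat it as if it appeared only once, although compilers may warn about it.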

When a data object is declared as a global constant, it is typically stored in the read-only data section of the program (.rodata on ELF systems), and an attempt to alter it usually faults at run time. When a constant data object is declared within a function or block scope, it is typically stored on the process stack, so a write through a pointer to a non-const type may appear to succeed. In both cases the C standard leaves the behavior undefined; the code snippet below shows what commonly happens with gcc.

#include <stdio.h>

const int glob = 10;          /* typically placed in the read-only data section */

int main(void)
{
    const int localc = 10;    /* typically on the stack; marked read-only by the compiler */
    int *localp1 = &localc;   /* WARNING: initialization discards the const qualifier */
    int *localp2 = &glob;     /* WARNING: initialization discards the const qualifier */

    *localp1 = 0;             /* compiles, but undefined behavior; often appears to work */
    *localp2 = 0;             /* undefined behavior; typically a segmentation fault,
                                 because glob lives in read-only memory */
    /* ... */
    return 0;
}


The use of the const qualifier with aggregate types is shown in the code snippet below:

#include <stdio.h>

struct student {
    int roll_no;
    int age;
};

struct employee {
    const int emp_id;
    int salary;
};

int main(void)
{
    const struct student John = {35, 18};
    struct employee Eric = {22, 20000};

    John.roll_no = 36;        /* Illegal: John is read only */
    John.age = 26;            /* Illegal: John is read only */

    Eric.emp_id = 23;         /* Illegal: emp_id is read only */
    Eric.salary = 200000;     /* OK */
    /* ... */
    return 0;
}


Attempting to modify a const object through a pointer to a non-const-qualified type causes undefined behavior. gcc still compiles such code; it only warns when a conversion discards the const qualifier.

2. The 'volatile' qualifier:
The volatile qualifier tells the compiler not to optimize accesses to the storage that the variable refers to. volatile means the storage may change at any time through something outside the control of the user program. Every reference to the variable in the source must therefore read or write the underlying location (e.g. a memory-mapped input FIFO) rather than reuse a value cached in a register.
This qualifier forces the compiler to allocate memory for the volatile object, and to always access the object from memory.
The use of volatile disables optimizations with respect to referencing the object. If an object is volatile qualified, it may be changed between the time it is initialized and any subsequent assignments. Therefore, it cannot be optimized.

Some volatile qualified objects and their behavior are given below:

Volatile data object:
To declare a volatile data object, include the keyword volatile before or after the data type in the variable definition. For instance both of these declarations will declare vint to be a volatile integer:

volatile int vint;
int volatile vint;

Pointer to a volatile data object:
Both of these declarations declare ptr to be a pointer to a volatile integer:

volatile int * ptr;
int volatile * ptr;

Volatile pointer to a non-volatile data object:
The following declaration declares vptr to be a volatile pointer to a non-volatile integer:

int * volatile vptr;

Volatile pointer to a volatile data object:
The following declaration declares vptrv to be a volatile pointer to a volatile integer:

int volatile * volatile vptrv;

If you apply volatile to a struct or union, the entire contents of the struct/union are volatile. If you don't want this behavior, you can apply the volatile qualifier to the individual members of the struct/union.

Use of volatile variables:
A volatile variable is used where the value of the variable can change unexpectedly.
Typical uses of volatile variables are:
• Memory-mapped peripheral registers.
• Global variables modified by an interrupt service routine.
• Global variables within a multi-threaded application (volatile alone does not make accesses atomic or ordered, so locks or atomic operations are still needed).
I’ll take the example of memory-mapped peripheral registers.

#include <stdio.h>

void TestFun(void)
{
     unsigned int *ptr = (unsigned int *) 0x65917430;
     /* Wait for register to become non-zero. */
     while (0 == *ptr);
     printf("Register updated\n");
}

In this case the compiler is free to optimize the access to *ptr. It may generate assembly code something like this:

      mov ptr, #0x65917430    /* ptr <- 0x65917430 */
      mov a, @ptr             /* move data at address 0x65917430 to register a */
loop: bz loop                 /* jump to loop if a is equal to zero */

Problem: The data at address 0x65917430 is read only once; if the value read is zero, the program enters an endless loop.
To handle this problem, we disable any optimization of accesses to *ptr by qualifying it as volatile, and the code will look like:

#include <stdio.h>

void TestFun(void)
{
     unsigned int volatile *ptr = (unsigned int volatile *) 0x65917430;
     /* Wait for register to become non-zero. */
     while (0 == *ptr);
     printf("Register updated\n");
}
Now the compiler will generate assembly code like this:

      mov ptr, #0x65917430    /* ptr <- 0x65917430 */
loop: mov a, @ptr             /* move data at address 0x65917430 to register a */
      bz loop                 /* jump to loop if a is equal to zero */

Now, on every iteration of the loop, address 0x65917430 is checked for a nonzero value.


______
For any suggestion or query mail me to jais.ebox@gmail.com

Wednesday, October 6, 2010

Inside C Program

C is a programming language originally created for developing the Unix operating system. It is a low-level, powerful language commonly used in system programming.

Compilation of a C program:
To create an executable program, we compile a source file containing the main function. An example of compiling a program named hello.c is given below:

$gcc hello.c

The compiler displays status, warning, and error messages to standard error output (stderr). If no errors occur, the compiler creates an executable file named a.out in the current working directory. We can run a.out as follows:

$./a.out
Hello World!


A program may have more than one source file. For example, a program may be divided into two files: main.c, containing the main program, and func.c, containing the functions that are used in main.c. The command for compiling the two source files together is:

$gcc main.c func.c

gcc displays errors and warnings corresponding to each file with location information.

Inside Compiler:
A compiler performs a set of steps in order to convert source code into an executable. Compilation and linking are the two most important steps in the process. For each source file, GCC calls the language compiler to create an object file and then calls the linker, which builds an a.out file from the object files.



                                    Figure (1) Compiler overview

Object File:
An object file is basically a file containing machine language instructions and data in a form that the linker can use to create an executable program. Each routine or data item defined in an object file has a corresponding symbol name by which it is referenced. A symbol generated for a routine or data definition can be either a local definition or global definition. Any reference to a symbol outside the object file is known as an external reference.

To keep track of where all the symbols and external references occur, an object file has a symbol table. The linker uses the symbol tables of all input object files to match up external references to global definitions.

Local Definitions:
A local definition is a definition of a routine or data that is accessible only within the object file or block in which it is defined. Such a definition cannot be directly accessed from another object file.

Global Definitions:
A global definition is a definition of a procedure, function, or data item that can be accessed by code in another object file. For example, the C compiler generates global definitions for all variable and function definitions that are not static.

External References:
An external reference is an attempt by code in one object file to access a global definition in another object file. A compiler cannot resolve external references because it works on only one source file at a time. Therefore the compiler simply places external references in an object file's symbol table; the matching of external references to global definitions is left to the linker or loader.

There are three main types of object files.
1) A relocatable file holds code and data suitable for linking with other object files to create an executable or a shared object file.
2) An executable file holds a program suitable for execution; the file specifies how exec creates a program’s process image.
3) A shared object file holds code and data suitable for linking in two contexts. First, the linker may process it with other relocatable and shared object files to create another object file. Second, the dynamic linker combines it with an executable file and other shared objects to create a process image.

Symbol Table:
A symbol table holds information needed to locate and relocate a  program’s symbolic definitions and references. An object file will contain a symbol table of the identifiers it contains that are externally visible. During the linking of different object files, a linker will use these symbol tables to resolve any unresolved references.
More specifically, a symbol table stores:

• For each type name, its type definition.
• For each variable name, its type. If the variable is an array, it also stores dimension information. It may also store the storage class, the offset in the activation record, etc.
• For each constant name, its type and value.
• For each function and procedure, its formal parameter list and its output type. Each formal parameter must have a name, a type, a passing mode (by-reference or by-value), etc.



Loading:
The loader is the part of the operating system that is responsible for loading programs, one of the essential stages in the process of starting a program. Loading a program involves reading the contents of the executable file, the file containing the program text, into memory, and then carrying out the other preparatory tasks required to make the executable ready to run. Once loading is complete, the operating system starts the program by passing control to the loaded program code.

As the system creates or augments a process image, it logically copies a file’s segment to a virtual memory segment.
A loader performs the following tasks in order to execute a program:

1. Validation (permissions, memory requirements etc.);
2. Copying the program image from the disk into main memory;
3. Copying the command-line arguments on the stack;
4. Initializing registers (e.g., the stack pointer);
5. Jumping to the program entry point (_start).

Program Segments:
 
A C program is composed of the following segments:

Text segment, the machine instructions that the CPU executes. Usually, the text segment is sharable so that only a single copy needs to be in memory for frequently executed programs, such as text editors, the C compiler, the shells, and so on. Also, the text segment is often read-only, to prevent a program from accidentally modifying its instructions.

Initialized data segment, usually called simply the data segment, containing global variables that are specifically initialized in the program.

Uninitialized data segment, often called the bss segment, named after an ancient assembler operator that stood for "block started by symbol." Data in this segment is initialized by the kernel to arithmetic 0 or null pointers before the program starts executing.

Stack, where automatic variables are stored, along with information that is saved each time a function is called. Each time a function is called, the address of where to return to and certain information about the caller's environment, such as some of the machine registers, are saved on the stack. The newly called function then allocates room on the stack for its automatic and temporary variables. This is how recursive functions in C can work. Each time a recursive function calls itself, a new stack frame is used, so one set of variables doesn't interfere with the variables from another instance of the function. See Figure (3).

Heap, where dynamic memory allocation usually takes place. Historically, the heap has been located between the uninitialized data and the stack.

Figure(3) Stack segment



The stack is often accessed via a register called the stack pointer, which also serves to indicate the current top of the stack. Alternatively, memory within the frame may be accessed via a separate register, often termed the frame pointer, which typically points to some fixed point in the frame structure, such as the location for the return address.

Stack frames are not all the same size. Different subroutines have differing numbers of parameters, so that part of the stack frame will be different for different subroutines, although usually fixed across all activations of a particular subroutine. Similarly, the amount of space needed for local variables will be different for different subroutines.