Wednesday, October 6, 2010

Inside C Program

C is a programming language originally developed for implementing the Unix operating system. It is a low-level, powerful language commonly used in systems programming.

Compilation of a C program:
To create an executable program, we compile a source file containing the main function. For example, a program named hello.c is compiled as follows:

$gcc hello.c

The compiler displays status, warning, and error messages on the standard error output (stderr). If no errors occur, the compiler creates an executable file named a.out in the current working directory. We can run a.out as follows:

$./a.out
Hello World!


A program may consist of more than one source file. For example, a program might be divided into two files: main.c, containing the main program, and func.c, containing the functions used in main.c. The command for compiling the two source files together is:

$gcc main.c func.c

gcc reports errors and warnings for each file, together with their location (file name and line number).
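As an illustration, func.c and the declaration that main.c relies on might look like the sketch below. The names func.h and square are hypothetical, not taken from the original program, and the header and the source file are shown together in one listing so that it compiles on its own.

```c
/* func.h (hypothetical): the declaration main.c includes so that the
 * compiler knows square()'s signature while compiling main.c. */
int square(int x);

/* func.c (hypothetical): the definition the linker matches to main.c's call. */
int square(int x)
{
    return x * x;
}
```

main.c would then begin with #include "func.h" and call, say, square(7) from main; gcc main.c func.c compiles both files and links them into a single a.out.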

Inside the Compiler:
A compiler performs a series of steps to convert source code into an executable. Compilation and linking are the two most important steps in the process. For each source file, GCC invokes the compiler (followed by the assembler) to create an object file, and then it invokes the linker, which builds the a.out file from the object files.



Figure (1) Compiler overview

Object File:
An object file is a file containing machine language instructions and data in a form that the linker can use to create an executable program. Each routine or data item defined in an object file has a corresponding symbol name by which it is referenced. A symbol generated for a routine or data definition can be either a local definition or a global definition. Any reference to a symbol defined outside the object file is known as an external reference.

To keep track of where all the symbols and external references occur, an object file has a symbol table. The linker uses the symbol tables of all input object files to match up external references to global definitions.
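The matching step can be modelled in a few lines. The sketch below is a toy that assumes each object file is reduced to two lists of names, its global definitions and its external references; the struct toy_object type and the count_unresolved and example_unresolved functions are hypothetical, and a real linker also performs relocation, applies symbol binding rules and searches libraries.

```c
#include <stddef.h>
#include <string.h>

/* A toy object file: only the names it defines globally and the names it
 * references externally. Real object files (e.g. ELF) carry much more. */
struct toy_object {
    const char *const *defs;  /* global definitions */
    size_t ndefs;
    const char *const *refs;  /* external references */
    size_t nrefs;
};

/* Returns 1 if any object in objs[] has a global definition of `name`. */
static int is_defined(const struct toy_object *objs, size_t nobjs, const char *name)
{
    for (size_t i = 0; i < nobjs; i++)
        for (size_t j = 0; j < objs[i].ndefs; j++)
            if (strcmp(objs[i].defs[j], name) == 0)
                return 1;
    return 0;
}

/* Matches every external reference against the global definitions of all
 * input objects and returns how many references stayed unresolved. */
size_t count_unresolved(const struct toy_object *objs, size_t nobjs)
{
    size_t unresolved = 0;
    for (size_t i = 0; i < nobjs; i++)
        for (size_t j = 0; j < objs[i].nrefs; j++)
            if (!is_defined(objs, nobjs, objs[i].refs[j]))
                unresolved++;
    return unresolved;
}

/* Example: main.o calls square() and printf(); func.o defines square(). */
size_t example_unresolved(void)
{
    static const char *const main_defs[] = {"main"};
    static const char *const main_refs[] = {"square", "printf"};
    static const char *const func_defs[] = {"square"};
    const struct toy_object objs[] = {
        {main_defs, 1, main_refs, 2},
        {func_defs, 1, NULL, 0},
    };
    return count_unresolved(objs, 2);  /* only "printf" is left over */
}
```

In the example only printf remains unresolved; in a real build, gcc also passes the C library to the linker, which is where printf would be found.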

Local Definitions:
A local definition is a definition of a routine or data that is accessible only within the object file or block in which it is defined. Such a definition cannot be directly accessed from another object file.

Global Definitions:
A global definition is a definition of a procedure, function, or data item that can be accessed by code in another object file. For example, the C compiler generates global definitions for all file-scope variable and function definitions that are not declared static.

External References:
An external reference is an attempt by code in one object file to access a global definition in another object file. A compiler cannot resolve external references because it works on only one source file at a time. Therefore the compiler simply places external references in an object file's symbol table; the matching of external references to global definitions is left to the linker or loader.
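In C source, these three kinds of symbols follow from whether static or extern is used at file scope. The sketch below uses hypothetical names (counter, step, next_value); normally the extern declaration would live in a different source file, and it is placed in the same listing here only so that the example compiles on its own.

```c
/* Global definition: file scope and not static, so the compiler emits a
 * global symbol that other object files may reference. */
int counter = 10;

/* Local definition: static limits `step` to this object file. */
static int step = 1;

/* Global definition of a function (functions are global unless static). */
int next_value(void)
{
    counter += step;
    return counter;
}

/* External reference: in another source file, this declaration alone would
 * make `counter` an external reference, left for the linker to resolve.
 * Repeating it here is legal and lets the listing compile on its own. */
extern int counter;
```

Compiling such a file with gcc -c produces an object file whose symbol table can be listed with nm, which shows global symbols with uppercase type letters and local ones with lowercase letters. An optimizing build may remove a local symbol such as step entirely.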

There are three main types of object files.
1) A relocatable file holds code and data suitable for linking with other object files to create an executable or a shared object file.
2) An executable file holds a program suitable for execution; the file specifies how exec creates a program’s process image.
3) A shared object file holds code and data suitable for linking in two contexts. First, the linker may process it with other relocatable and shared object files to create another object file. Second, the dynamic linker combines it with an executable file and other shared objects to create a process image.
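The type of an object file is recorded in its header. On Linux, the ELF header's e_type field distinguishes the three kinds, and glibc's <elf.h> provides the constants ET_REL, ET_EXEC and ET_DYN. The sketch below assumes a 64-bit ELF file already read into memory in the host's byte order; object_kind and classify_elf64 are hypothetical helper names.

```c
#include <elf.h>     /* ELF types and constants (glibc, Linux) */
#include <stddef.h>
#include <string.h>

/* Maps an ELF e_type value to one of the three object-file kinds. */
const char *object_kind(unsigned e_type)
{
    switch (e_type) {
    case ET_REL:  return "relocatable";
    case ET_EXEC: return "executable";
    case ET_DYN:  return "shared object";
    default:      return "other";
    }
}

/* Classifies a 64-bit ELF file whose first bytes are in buf[0..len).
 * Returns NULL if buf does not start with a 64-bit ELF header.
 * Assumes the file uses the host's byte order. */
const char *classify_elf64(const unsigned char *buf, size_t len)
{
    Elf64_Ehdr hdr;

    if (len < sizeof hdr)
        return NULL;
    memcpy(&hdr, buf, sizeof hdr);   /* copy to avoid misaligned access */
    if (memcmp(hdr.e_ident, ELFMAG, SELFMAG) != 0 ||
        hdr.e_ident[EI_CLASS] != ELFCLASS64)
        return NULL;
    return object_kind(hdr.e_type);
}
```

Position-independent executables, which many current Linux distributions build by default, are also marked ET_DYN, so the header alone does not always separate an executable from a shared library.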

Symbol Table:
A symbol table holds information needed to locate and relocate a program’s symbolic definitions and references. An object file contains a symbol table listing the symbols it defines and references; during linking, the linker uses these tables to resolve references between object files.
While translating a source file, the compiler also maintains its own, more detailed symbol table. Such a table typically stores:

For each type name, its type definition.
For each variable name, its type. If the variable is an array, the table also stores its dimensions. It may also record the storage class, the offset in the activation record, and so on.
For each constant name, its type and value.
For each function and procedure, its formal parameter list and its return type. Each formal parameter has a name, a type, and a passing mode (by reference or by value).
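A compiler's symbol table is often implemented as a hash table keyed by name. The following is a minimal sketch under that assumption; the struct layout, the 64-bucket table and the djb2 hash are illustrative choices rather than how any particular compiler (gcc included) does it, and only a few of the attributes listed above are modelled.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_NAME 32
#define NBUCKETS 64

/* Kinds of names tracked, following the list above. */
enum sym_kind { SYM_TYPE, SYM_VARIABLE, SYM_CONSTANT, SYM_FUNCTION };

/* One symbol-table entry (hypothetical layout). */
struct symbol {
    char name[MAX_NAME];
    enum sym_kind kind;
    const char *type;       /* e.g. "int"; for functions, the return type */
    int array_dim;          /* number of elements, 0 if not an array */
    long value;             /* value of a constant, unused otherwise */
    struct symbol *next;    /* next entry in the same hash bucket */
};

struct symtab {
    struct symbol *buckets[NBUCKETS];
};

/* djb2 string hash, reduced to a bucket index. */
static unsigned hash_name(const char *s)
{
    unsigned h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Returns the entry for name, or NULL if it has not been entered. */
struct symbol *sym_lookup(const struct symtab *t, const char *name)
{
    for (struct symbol *s = t->buckets[hash_name(name)]; s != NULL; s = s->next)
        if (strcmp(s->name, name) == 0)
            return s;
    return NULL;
}

/* Enters a new name; returns NULL if the name is too long, already
 * present (a redeclaration), or memory runs out. */
struct symbol *sym_insert(struct symtab *t, const char *name,
                          enum sym_kind kind, const char *type)
{
    struct symbol *s;
    unsigned b;

    if (strlen(name) >= MAX_NAME || sym_lookup(t, name) != NULL)
        return NULL;
    s = calloc(1, sizeof *s);   /* zero-fills array_dim and value */
    if (s == NULL)
        return NULL;
    strcpy(s->name, name);
    s->kind = kind;
    s->type = type;
    b = hash_name(name);
    s->next = t->buckets[b];    /* push onto the front of the bucket */
    t->buckets[b] = s;
    return s;
}
```

A real compiler also has to handle nested scopes, for example by keeping one table per block or by tagging entries with a scope level, so that an inner declaration can hide an outer one.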



Loading:
The loader is the part of the operating system responsible for loading programs, one of the essential stages in the process of starting a program. Loading a program involves reading the contents of the executable file (the file containing the program text) into memory and then carrying out the other preparatory tasks needed to make the executable ready to run. Once loading is complete, the operating system starts the program by passing control to the loaded program code.

As the system creates or augments a process image, it logically copies a file’s segment to a virtual memory segment.
A loader performs the following tasks in order to execute a program:

1. Validation (permissions, memory requirements etc.);
2. Copying the program image from the disk into main memory;
3. Copying the command-line arguments onto the stack;
4. Initializing registers (e.g., the stack pointer);
5. Jumping to the program entry point (_start).
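On Linux with glibc, a program can observe the result of step 5 itself: the kernel passes the entry-point address in the auxiliary vector, and getauxval() with AT_ENTRY returns it. This is a platform-specific sketch that assumes Linux and glibc 2.16 or later; loader_entry_point is a hypothetical helper name.

```c
#include <elf.h>        /* AT_ENTRY */
#include <sys/auxv.h>   /* getauxval(): glibc 2.16+, Linux only */

/* Returns the entry-point address that the kernel placed in this process's
 * auxiliary vector, i.e. where execution started (normally _start). */
unsigned long loader_entry_point(void)
{
    return getauxval(AT_ENTRY);
}
```

For a normally linked program this should be the address of _start, the startup code that prepares the C runtime and then calls main. For a position-independent executable, address-space layout randomization makes the value differ between runs.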

Program Segments:
 
A C program is composed of the following segments:

Text segment, the machine instructions that the CPU executes. Usually, the text segment is sharable so that only a single copy needs to be in memory for frequently executed programs, such as text editors, the C compiler, the shells, and so on. Also, the text segment is often read-only, to prevent a program from accidentally modifying its instructions.

Initialized data segment, usually called simply the data segment, containing global variables that are specifically initialized in the program.

Uninitialized data segment, often called the bss segment, named after an ancient assembler operator that stood for "block started by symbol." Data in this segment is initialized by the kernel to arithmetic 0 or null pointers before the program starts executing.

Stack, where automatic variables are stored, along with information that is saved each time a function is called. Each time a function is called, the address to return to and certain information about the caller's environment, such as some of the machine registers, are saved on the stack. The newly called function then allocates room on the stack for its automatic and temporary variables. This is what makes recursive functions work in C: each time a recursive function calls itself, a new stack frame is used, so one set of variables doesn't interfere with the variables of another instance of the function (see Figure (3)).

Heap, where dynamic memory allocation usually takes place. Historically, the heap has been located between the uninitialized data and the stack.

Figure(3) Stack segment
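To see where each kind of object ends up, one can print the addresses of a function, an initialized global, an uninitialized global, a heap block and an automatic variable. This sketch assumes a POSIX-like system where printing a function pointer through void * is accepted; print_segment_addresses and the variable names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

int initialized_global = 42;  /* initialized data segment */
int uninitialized_global;     /* bss segment: zero before main() starts */

/* This function's machine instructions are in the text segment. */
void print_segment_addresses(void)
{
    int automatic = 0;                        /* stack: in this call's frame */
    int *dynamic = malloc(sizeof *dynamic);   /* heap: until free() */

    /* Converting a function pointer to void * is not ISO C, but GCC and
     * POSIX systems support it; it is used here only for printing. */
    printf("text  %p\n", (void *)print_segment_addresses);
    printf("data  %p\n", (void *)&initialized_global);
    printf("bss   %p\n", (void *)&uninitialized_global);
    printf("heap  %p\n", (void *)dynamic);
    printf("stack %p\n", (void *)&automatic);
    free(dynamic);
}
```

On a typical Linux system the text, data, bss and heap addresses tend to increase in that order while the stack sits much higher, but address-space layout randomization changes the exact values on every run and other platforms may arrange the segments differently. The size command (for example, size a.out) reports the sizes of an executable's text, data and bss segments.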



The stack is often accessed via a register called the stack pointer, which also serves to indicate the current top of the stack. Alternatively, memory within the frame may be accessed via a separate register, often termed the frame pointer, which typically points to some fixed point in the frame structure, such as the location for the return address.

Stack frames are not all the same size. Different subroutines have differing numbers of parameters, so that part of the stack frame will be different for different subroutines, although usually fixed across all activations of a particular subroutine. Similarly, the amount of space needed for local variables will be different for different subroutines.
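Recursion makes the separate frames visible. In the sketch below (count_separate_frames is a hypothetical name), every activation receives a pointer to its caller's local variable and checks that its own local lives at a different address and that the caller's value is still intact.

```c
#include <stddef.h>

/* Recurses `depth` more times. Each activation has its own stack frame, so
 * its `local` sits at a different address from the caller's and does not
 * overwrite it. Returns how many activations confirmed both facts. */
int count_separate_frames(int depth, const int *caller_local)
{
    int local = depth;   /* automatic variable in this activation's frame */
    int separate = caller_local != NULL
                   && caller_local != &local
                   && *caller_local == depth + 1;   /* caller's value intact */

    if (depth == 0)
        return separate;
    return separate + count_separate_frames(depth - 1, &local);
}
```

Calling count_separate_frames(3, NULL) returns 3: the outermost call has no caller to compare against, and each of the three nested calls finds its caller's variable untouched at a distinct address.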
