Sunday, January 23, 2011

Scope, storage class, and memory


1. Scope:


Scope specifies where in a program a data object is accessible, in other words the part of a program in which a data object can be accessed by its name. We’ll see the scope of different kinds of variables in the sections below.

2. Storage Class:


The storage class of a data object in C determines the part of program memory in which its storage is allocated. It also determines the lifetime of the object, i.e. how long that storage exists. ANSI C defines four storage classes:

  •  auto
  •  register
  •  static
  •  extern (global)

2.1 Automatic variable (auto storage class):


Keyword:  auto (optional)

Automatic variables are usually declared at the start of a program block (code enclosed within curly braces, i.e. { }). C89 requires declarations to come before the statements of a block; C99 and later also allow a declaration anywhere inside a block, although this is sometimes discouraged because it can affect the readability of a program. They can be declared using the auto keyword, but it is optional: any variable declared in a block is treated as automatic unless another storage class is specified. Function parameters are also automatic variables.

Life:

Storage for an automatic variable is allocated at the time of its declaration and released on exit from the block in which it is declared. Every time the block is entered, each automatic variable that has an initializer is set again to the value given at its declaration. An automatic variable that is not initialized holds an indeterminate (junk) value. It is always recommended to initialize every automatic variable at its declaration.

Scope:

The scope of an automatic variable is local to the block in which it is declared, including all blocks nested inside that block. This is why automatic variables are also called local variables. An automatic variable can’t be accessed directly (by the name given at its declaration) from outside its declaration block. During its lifetime it can be accessed indirectly from other blocks, for example through a pointer passed as a function parameter or stored in a global variable. If a variable with the same name exists in an outer block or at global level, every use of the name refers to the variable declared in the innermost enclosing block; the inner declaration hides (shadows) the outer one.

Storage:

As discussed above, storage for an automatic variable is allocated at its declaration and freed on exit from its declaration block. Automatic variables are allocated in the stack segment, within the stack frame of the function containing the block.


Example:

#include <stdio.h>
void myRoutine(int* param)
{ /* Block 0 starts here */
  /* param is an auto (local) variable of this block */
  printf("Block0 Value of param:%d\n",*param);
  *param=123; /* modifies the caller's variable through the pointer */
} /* Block 0 ends here */

int main(void)
{ /* Block 1 starts here */
  int u_var=25;         /* u_var is an auto variable and its life starts here; the declaration "auto int u_var;" is also OK */
  int in_var=10;        /* in_var is also an auto variable; it cannot be accessed before this point */
  { /* Block 2 starts here */
    int in_var=5;       /* in_var is local to block 2; it hides (shadows) in_var of block 1 */
    int bl2_var=20;
    /* u_var is accessible in this block too, but in_var of block 1 is not, because another variable with the same name is declared in this block */
    printf("Block2: in_var=%d, u_var=%d\n",in_var,u_var);
  } /* Block 2 ends here */
  /* bl2_var and in_var of block 2 are not accessible here */
  myRoutine(&in_var); /* the address of in_var is passed as a parameter, so block 0 can access it through param */
  printf("Block1: in_var=%d, u_var=%d\n",in_var,u_var);
  return 0;
} /* Block 1 ends here, therefore the life of u_var and in_var ends here */

Output:
Block2: in_var=5, u_var=25
Block0 Value of param:10
Block1: in_var=123, u_var=25

2.2  Register Variables (register storage class):


Keyword:  register

A computer has several levels of storage, such as CPU registers, cache and RAM, that hold instructions and data at run time. Of these, the CPU accesses its registers fastest and RAM slowest. By default automatic variables live in RAM, and the compiler decides which variables are held in CPU registers at any given time. C provides register variables so that a programmer can hint to the compiler that a variable should be kept in a CPU register. Typically, variables that are used repeatedly or whose access time is critical are declared as register variables. Register variables thus give limited control over the efficiency of program execution.

Life:

Like automatic variables, register variables always belong to a program block: a register variable is allocated (to a CPU register, if the compiler honors the request) at its declaration and deallocated on exit from the block. Register variables behave like automatic variables in every respect except their storage location and the fact that the address-of operator (&) cannot be applied to them.

Scope:

Scope rules are similar to automatic variables.

Storage:

A register variable is likely to be stored in a CPU register, but the compiler is not obliged to do this. If there are not enough free registers, the compiler can ignore the request and treat the variable as an ordinary automatic variable. Since registers are a scarce resource, register variables should be used sparingly; declaring too many of them gives no benefit and may even hurt performance.


Example:

#include<stdio.h>
int main()
{
 register int reg_var=25;          /* reg_var is a register variable */
 int aut_var=10;                     /* aut_var is an auto variable */

  /*
  Program code here:
  */
  return 0;
}

2.3  External Variable (global variables):


Keyword: extern

Register and automatic variables have limited scope (the block in which they are declared) and limited life. External variables are accessible from any block and exist for the entire execution of the program. External variables are also called global variables. A global variable is usually defined outside of any block, near the beginning of a source file, and the definition doesn’t require the extern keyword. If the program is split over several source files and a variable is defined in, say, file1.c and used in file2.c and file3.c, then it must be declared with the extern keyword in file2.c and file3.c.
If a global variable is to be accessed from multiple files, the usual practice is to collect the extern declarations of variables and functions in a separate header file (.h file) and include it with the #include directive. External variables may be initialized in their definitions just like automatic variables; however, the initializers must be constant expressions. The initialization is done only once, before the program starts running, i.e. when memory is allocated for the variables. In general, it is recommended to avoid external variables, as they undermine the concept of a function as an independent module.
There may be occasions when the use of an external variable significantly simplifies the implementation of an algorithm. Suffice it to say that external variables should be used rarely and with caution. A C program cannot contain two definitions of an external (non-static) global variable with the same name.


Life:
Memory for such variables is allocated when the program begins execution, and remains allocated until the program terminates. 

Scope:
The scope of an external variable is global, i.e. the rest of the source file following its declaration. All functions following the declaration may access the external variable by its name. However, if a local variable with the same name is declared within a function, references to the name inside that function access the local variable. The variable is also accessible from any other source file that declares it with extern.

Storage:
Memory for an uninitialized global variable is allocated in the bss section. Since the bss section is initialized to zero, every uninitialized global variable starts out as zero. An initialized global variable is allocated in the data section.

Example:
File1.c
#include <stdio.h>
int GlobalVar1=20;
int GlobalVar2=10;
int main()
{
  printf("GlobalVar1=%d\n",GlobalVar1);
  return 0;
}

File2.c
#include <stdio.h>
extern int GlobalVar1;  /* GlobalVar1 of File1.c is accessible anywhere in File2.c */
void foo(void)
{
  extern int GlobalVar2; /* GlobalVar2 of File1.c is accessible only inside function foo */
  printf("GlobalVar1=%d, GlobalVar2=%d\n",GlobalVar1,GlobalVar2);
}

 

2.4 Static Variable (static storage class):


Keyword: static

Like global variables, variables of the static storage class live for the entire run of the program; however, static also provides a way to limit the scope of such variables. These variables are declared with the static keyword. A static variable may be declared outside of any block as well as inside a block.
As with global variables, static variables left uninitialized by the programmer are initialized to zero. Static local variables continue to exist even after the block in which they are defined terminates. Thus the value of a static variable in a function is retained between calls to that function, i.e. the initialization of a static variable inside a block is performed only once and is not repeated on later executions of the block. A static variable defined outside of any block can be accessed only in the file where it is defined.


Life:
As with global variables, memory for static variables is allocated when the program begins execution and remains allocated until the program terminates.

Scope:
If a static variable is defined inside a block, its scope is identical to that of an automatic variable, i.e. it is local to the block in which it is defined; however, its storage is permanent for the duration of the program. If it is defined outside of any block, its scope is similar to that of a global variable; the only limitation is that it cannot be accessed from any other file.

Storage:
Memory for an uninitialized static variable is allocated in the bss section, and an initialized static variable is allocated in the data section. The initializer must be a constant expression, and initialization is done only once, before the program starts running, when memory is allocated for the static variable.


Example:

#include <stdio.h>
static int StatVar1=20;  /* StatVar1 is accessible from anywhere in this file */

void foo(void)
{
  static int StatVar2=10;  /* StatVar2 is accessible only inside function foo */
  printf("StatVar2=%d\n",StatVar2);
  StatVar2 +=10;
}
int main()
{
  printf("StatVar1=%d\n",StatVar1);
  foo();
  foo();
  return 0;
}

Output:

StatVar1=20
StatVar2=10
StatVar2=20

3. Memory Allocation:

In C, memory for variables is allocated in the following ways:

3.1 Static Allocation:


This type of allocation is used for global and static variables. The memory is reserved by the compiler at compile time, allocated when the program is loaded into memory, and freed only when the program exits. The compiler reserves this memory in the form of the bss and data sections.

 

3.2 Automatic Allocation:


This type of allocation is used for automatic variables, such as function arguments and local variables. The space for an automatic variable is allocated when the compound statement containing the declaration is entered, and is freed when that compound statement is exited. Automatic allocation takes place in the stack segment only.

 

3.3 Limitation of static and automatic allocation:


For the above two types of memory allocation, the size of the block to be allocated is fixed at compile time and depends on the type of the variable. This is a particular limitation for arrays, whose size must be defined statically. In many situations, however, it is not clear in advance how much memory the program will actually need. If too much memory is allocated and then not used, memory is wasted. If not enough memory is allocated, the program will not work for larger inputs.

 

3.4 Dynamic Memory Allocation:


A program is more flexible if, during execution, it can allocate memory when needed and free it when it is no longer needed. Allocation of memory during execution is called dynamic memory allocation. C provides library functions to allocate and free memory dynamically during program execution. Dynamic memory is allocated on the heap by the system.
Dynamic memory allocation also has limits. If memory is repeatedly allocated and not freed, the system will eventually run out of memory.

 

3.5 Standard C Library functions for dynamic memory allocation:

 

malloc()


Description:
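Allocates the specified number of bytes from the heap and returns the address of the first byte of the allocated memory. The returned pointer has type void *, so it doesn’t point to any specific data type; in C it converts implicitly to the type of the destination pointer, so an explicit cast is optional (it is required in C++). The contents of the allocated memory are uninitialized. It returns a NULL pointer if allocation fails.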

Prototype:
void * malloc(size_t  nByte);
nByte: Number of bytes to be allocated.

Return Value:
Address of first byte of allocated block or NULL if allocation fails.

Usage:
char   *Ptr=NULL;
Ptr = (char *) malloc(10*sizeof(char));  /* allocates memory for 10 characters */

calloc()


Description:
calloc is very similar to malloc, except that it takes two parameters and initializes all bytes of the allocated memory to zero. It allocates the requested number of elements of a fixed size; the element size is the same for every element in an allocation and is given by the second parameter.

Prototype:
void* calloc(size_t nElem, size_t elemSize) ;
nElem: Number of elements to allocate
elemSize: Size of each element

Return Value:
Address of first byte of allocated block or NULL if allocation fails.

Usage:
char   *Ptr=NULL;
Ptr = (char *) calloc(10, sizeof(char));  /* allocates memory for 10 characters */

realloc()


Description:
Sometimes we need to change the size of a previously allocated block; realloc() serves this purpose. The realloc function changes the size of a given block. If the requested size is larger than the existing block and the space after the existing block is already in use, realloc() allocates a new block of the requested size, copies the data of the old block to the new location, and frees the old block.

Prototype:
void * realloc(void *CurrBlock,  size_t NewSize);
CurrBlock: Address of existing memory block
NewSize: New size requested for the block.

Return Value:
Address of first byte of reallocated block or NULL if reallocation fails.

Usage:
char *pCh1=NULL, *pCh2=NULL;
pCh1=(char*)malloc(10);
pCh2=(char*)realloc(pCh1, 20);

Notes:
realloc() changes the size of the memory block pointed to by pCh1 to 20 bytes. The contents are unchanged up to the minimum of the old and new sizes; newly allocated memory is uninitialized. If the pointer passed to realloc() is NULL, the call is equivalent to malloc(20). If the requested size is zero, the behavior depends on the implementation (some free the block, as free(pCh1) would), so portable code should call free() directly instead. Unless pCh1 is NULL, it must have been returned by an earlier call to malloc(), calloc() or realloc(). If the block was moved, the old block is freed by realloc() and the old address must no longer be used. If realloc() fails, it returns NULL and the original block is left unchanged. You cannot pass the address of an array for resizing.

free()

Description:
Releases a memory block dynamically allocated by malloc(), calloc(), or realloc() and returns it to the free list of the heap.

Prototype:
void  free(void *MemBlock);
MemBlock: Address of memory block to be released

Return Value:
None.

Usage:
char *pCh1=NULL;
pCh1=(char*)malloc(10*sizeof(char));  /* allocate memory for 10 characters */
/* code to use block pointed by pCh1 */
free(pCh1);
pCh1=NULL;
/* no more usage of block pointed by pCh1 */


Sunday, December 19, 2010

Qualifiers Demystified

Type qualifiers were introduced by ANSI C; they are used to control the optimization done by the compiler on a data object.
ANSI C defines two type qualifiers:

1. const
2. volatile

const and volatile type qualifiers can be applied to any type of data objects. When using a type qualifier with an array identifier, each element of the array is qualified by the compiler, not the array type itself. A qualifier can be applied on structure/union objects and an individual structure/union member can also be qualified. When a structure object is qualified, each member of the structure is qualified using same qualifier and when a qualifier is applied on any particular structure member then only that member is qualified.

1. The 'const' qualifier:

This qualifier instructs the compiler to treat a data object as read-only, i.e. the program may not change the value of the data object through that name during execution. An object declared as const cannot serve as the operand of any operation that changes its value, such as assignment or increment.

Some const qualified objects and their behavior are given below:

Constant integer:
const int Var = 10;
The value of Var can’t be modified.

Pointer to a constant integer:
const int *Ptr;
The value in the location pointed to by Ptr can’t be modified, i.e. *Ptr is non-modifiable but Ptr is modifiable.

Constant pointer:
int * const Cptr;   
A pointer which will always point to the same location. Cptr is non-modifiable but *Cptr is modifiable.

A constant pointer to a constant integer:
const int *const Ccp;
Neither the pointer nor the integer can be modified. Ccp and *Ccp, both are not modifiable.

Redundant use of const:
const const int y;
Illegal in C89; C99 and later accept the repeated qualifier and treat it as a single const.

When a data object is declared as a global constant, it is typically stored in a read-only (.ro) segment of the program, and any attempt to alter its value usually results in a run-time fault. When a constant data object is declared within a function or block, it is typically stored on the process stack, and its value can in practice be altered through a pointer to a non-const type. In both cases the C standard leaves the behavior undefined, so this must not be relied on. The scenario is described below using a code snippet.

#include<stdio.h>
const int glob=10;     /* reside inside read only data segment */
int main()
{
   const int localc=10;      /* goes onto the stack; localc is marked as read only */
   int *localp1=&localc;  /* WARNING: local const accessed through a
                                            non-const qualified pointer */
   int *localp2=&glob;    /* WARNING: global const accessed through a
                                            non-const qualified pointer */

   *localp1=0;                  /* Compiles and usually appears to work, but the
                                          behavior is undefined */
   *localp2=0;                 /* Illegal: localp2 points into the .ro
                                          section, and any write to the .ro section usually
                                          results in a segmentation fault */
.
.
.
return 0;
}


Use of const qualifier with aggregate types is also explained in code snippet given below:

#include<stdio.h>

struct student{
int roll_no;
int age;
};

struct employee{
const int emp_id;
int salary;
};

int main()
{
const struct student John = {35,18};
struct employee Eric = {22,20000};
John.roll_no = 36;                     /* Illegal: John is read only */
John.age=26;                           /* Illegal: John is read only */

Eric.emp_id=23;                        /* Illegal: emp_id is read only */
Eric.salary=200000;                    /* OK */
.
.
.
.
.
return 0;
}


Attempting to modify a const object through a pointer to a non-const qualified type causes undefined behavior. gcc lets such code compile: it only warns when a conversion discards the const qualifier.

2. The 'volatile' qualifier:
The volatile qualifier changes the default treatment of a variable: the compiler does not attempt to optimize accesses to the storage it refers to. volatile means the storage may change at any time through something outside the control of the user program. This means that whenever the program references the variable, it must actually read the physical address (e.g. a memory-mapped input FIFO) instead of reusing a previously read value held in a register.
This qualifier forces the compiler to allocate memory for the volatile object and to always access the object from memory.
The use of volatile disables optimizations on references to the object. A volatile-qualified object may change between the time it is initialized and any subsequent access, so accesses to it cannot be optimized away.

Some volatile qualified objects and their behavior are given below:

Volatile data object:
To declare a volatile data object, include the keyword volatile before or after the data type in the variable definition. For instance both of these declarations will declare vint to be a volatile integer:

volatile int vint;
int volatile vint;

Pointer to a volatile data object:
Both of these declarations declare ptr to be a pointer to a volatile integer:

volatile int * ptr;
int volatile * ptr;

Volatile pointer to a non-volatile data object:
The following declaration declares vptr to be a volatile pointer to a non-volatile integer:

int * volatile vptr;

Volatile pointer to a volatile data object:
The following declaration declares vptrv to be a volatile pointer to a volatile integer:

int volatile * volatile vptrv;

If you apply volatile to a struct or union, the entire contents of the struct/union are volatile. If you don't want this behavior, you can apply the volatile qualifier to the individual members of the struct/union.

Use of volatile variable:
A volatile variable is used where the value of the variable can change unexpectedly.
Typical uses of volatile variables are:
• Memory-mapped peripheral registers.
• Global variables modified by an interrupt service routine.
• Global variables within a multi-threaded application (note that volatile alone provides neither atomicity nor synchronization between threads).
I’ll take the example of memory-mapped peripheral registers.

void TestFun()
{
     unsigned int  * ptr = (unsigned int *) 0x65917430;
     /* Wait for register to become non-zero. */
     while (0 == *ptr);
     printf("Register updated\n");
}

In this case the compiler may optimize the access to *ptr and generate assembly code something like this:

mov ptr, #0x65917430    /* ptr <- 0x65917430 */
mov a, @ptr                     /* move data at address 0x65917430 to register a */                     
loop bz loop                     /* jump to loop if a is equal to zero */

Problem: the data at address 0x65917430 is read only once; if the value read is zero, the program goes into an endless loop.
To handle this problem, we disable optimization of accesses to *ptr by qualifying the pointed-to object as volatile, and the code looks like this:

void TestFun()
{
     unsigned int  volatile * ptr = (unsigned int volatile *) 0x65917430;
     /* Wait for register to become non-zero. */
     while (0 == *ptr);
     printf("Register updated\n");
}
Now the compiler will generate assembly code like this:
mov ptr, #0x65917430    /* ptr <- 0x65917430 */
loop mov a, @ptr            /* move data at address 0x65917430 to register a */                    
bz loop                           /* jump to loop if a is equal to zero */

Now, on every iteration of the loop, address 0x65917430 is checked for a non-zero value.


______
For any suggestion or query mail me to jais.ebox@gmail.com

Wednesday, October 6, 2010

Inside C Program

C is a programming language originally developed for implementing the Unix operating system. It is a powerful language that gives low-level access to the machine and is commonly used in system programming.

Compilation of a C program:
To create an executable program, we compile a source file containing the main function. An example that compiles a program named hello.c is given below:

$gcc hello.c

The compiler displays status, warning, and error messages to standard error output (stderr). If no errors occur, the compiler creates an executable file named a.out in the current working directory. We can run a.out as follows:

$./a.out
Hello World!


A program may have more than one source file. For example, suppose a program is divided into two files: main.c, containing the main program, and func.c, containing the functions used in main.c. The command for compiling the two source files together is:

$gcc main.c func.c

gcc displays errors and warnings corresponding to each file with location information.

Inside Compiler:
A compiler performs a series of steps to convert source code into an executable. Compilation and linking are the two most important steps in the process. For each source file, GCC calls the language compiler to create an object file, and then calls the linker, which builds an a.out file from the object files.



                                    Figure (1) Compiler overview

Object File:
An object file is basically a file containing machine language instructions and data in a form that the linker can use to create an executable program. Each routine or data item defined in an object file has a corresponding symbol name by which it is referenced. A symbol generated for a routine or data definition can be either a local definition or global definition. Any reference to a symbol outside the object file is known as an external reference.

To keep track of where all the symbols and external references occur, an object file has a symbol table. The linker uses the symbol tables of all input object files to match up external references to global definitions.

Local Definitions:
A local definition is a definition of a routine or data that is accessible only within the object file or block in which it is defined. Such a definition cannot be directly accessed from another object file.

Global Definitions:
A global definition is a definition of a procedure, function, or data item that can be accessed by code in another object file. For example, the C compiler generates global definitions for all variable and function definitions that are not static.

External References:
An external reference is an attempt by code in one object file to access a global definition in another object file. A compiler cannot resolve external references because it works on only one source file at a time. Therefore the compiler simply places external references in an object file's symbol table; the matching of external references to global definitions is left to the linker or loader.

There are three main types of object files.
1) A relocatable file holds code and data suitable for linking with other object files to create an executable or a shared object file.
2) An executable file holds a program suitable for execution; the file specifies how exec creates a program’s process image.
3) A shared object file holds code and data suitable for linking in two contexts. First, the linker may process it with other relocatable and shared object files to create another object file. Second, the dynamic linker combines it with an executable file and other shared objects to create a process image.

Symbol Table:
A symbol table holds information needed to locate and relocate a program’s symbolic definitions and references. An object file will contain a symbol table of the identifiers it contains that are externally visible. During the linking of different object files, a linker will use these symbol tables to resolve any unresolved references.
More specifically, a symbol table stores:

• For each type name, its type definition.
• For each variable name, its type. If the variable is an array, it also stores dimension information. It may also store storage class, offset in activation record etc.
• For each constant name, its type and value.
• For each function and procedure, its formal parameter list and its output type. Each formal parameter must have name, type, type of passing (by-reference or by-value), etc.



Loading:
The loader is the part of the operating system that is responsible for loading programs, one of the essential stages in the process of starting a program. Loading a program involves reading the contents of the executable file (the file containing the program text) into memory and then carrying out the other preparatory tasks required to make the executable ready to run. Once loading is complete, the operating system starts the program by passing control to the loaded program code.

As the system creates or augments a process image, it logically copies a file’s segment to a virtual memory segment.
A loader performs the following tasks in order to execute a program:

1. Validation (permissions, memory requirements etc.);
2. Copying the program image from the disk into main memory;
3. Copying the command-line arguments onto the stack;
4. Initializing registers (e.g., the stack pointer);
5. Jumping to the program entry point (_start).

Program Segments:
 
A C program is composed of the following segments:

Text segment, the machine instructions that the CPU executes. Usually, the text segment is sharable so that only a single copy needs to be in memory for frequently executed programs, such as text editors, the C compiler, the shells, and so on. Also, the text segment is often read-only, to prevent a program from accidentally modifying its instructions.

Initialized data segment, usually called simply the data segment, containing global variables that are specifically initialized in the program.

Uninitialized data segment, often called the bss segment, named after an ancient assembler operator that stood for "block started by symbol." Data in this segment is initialized by the kernel to arithmetic 0 or null pointers before the program starts executing.

Stack, where automatic variables are stored, along with information that is saved each time a function is called. Each time a function is called, the address of where to return to and certain information about the caller's environment, such as some of the machine registers, are saved on the stack. The newly called function then allocates room on the stack for its automatic and temporary variables. This is how recursive functions in C can work. Each time a recursive function calls itself, a new stack frame is used, so one set of variables doesn't interfere with the variables from another instance of the function. Figure(3).

Heap, where dynamic memory allocation usually takes place. Historically, the heap has been located between the uninitialized data and the stack.

Figure(3) Stack segment



The stack is often accessed via a register called the stack pointer, which also serves to indicate the current top of the stack. Alternatively, memory within the frame may be accessed via a separate register, often termed the frame pointer, which typically points to some fixed point in the frame structure, such as the location for the return address.

Stack frames are not all the same size. Different subroutines have differing numbers of parameters, so that part of the stack frame will be different for different subroutines, although usually fixed across all activations of a particular subroutine. Similarly, the amount of space needed for local variables will be different for different subroutines.