Pre-processing- This is the first step in the compilation process. It expands macros, inserts the contents of header files into the source code, and removes the parts of the source that are excluded by #if-#else-#endif directives.
Demo to show macro and pre-processor directive expansion
$ cat main.c
#define PRINT_HELLO printf("Hello World");
printf ("Debug Version : %s:%d\n", __FILE__, __LINE__);
//Pre-processing with __DEBUG__ defined
$ gcc -E main.c -D__DEBUG__
# 844 "/usr/include/stdio.h" 3 4
# 2 "main.c" 2
printf ("Debug Version : %s:%d\n", "main.c", 8);
//Pre-processing without __DEBUG__ defined
# 844 "/usr/include/stdio.h" 3 4
# 2 "main.c" 2
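The demo above shows only fragments of main.c, so here is a separate, minimal sketch of the same two ideas: macro expansion and conditional compilation. The function names build_mode() and say_hello() are made up for illustration, and the macro is written without a trailing semicolon (unlike the one in the demo) so that the call site supplies it.

```c
#include <stdio.h>

/* Object-like macro: the pre-processor pastes this text wherever
   PRINT_HELLO appears. No trailing semicolon here, so the call site
   reads like a normal statement. */
#define PRINT_HELLO printf("Hello World\n")

/* Hypothetical helper: reports which branch the pre-processor kept.
   Only one of the two return statements ever reaches the compiler. */
const char *build_mode(void)
{
#ifdef __DEBUG__
    /* __FILE__ and __LINE__ are replaced during pre-processing. */
    printf("Debug Version : %s:%d\n", __FILE__, __LINE__);
    return "debug";
#else
    return "release";
#endif
}

/* Hypothetical helper: uses the macro like a statement. */
void say_hello(void)
{
    PRINT_HELLO;   /* becomes: printf("Hello World\n"); */
}
```

Running gcc -E on this file with and without -D__DEBUG__ should show the body of build_mode() changing, while the compiler itself never sees the discarded branch.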
Compilation- Compilation is the process of converting the pre-processed source code to object code. Object code contains machine instructions, but it is not yet a runnable program: references to external symbols stay unresolved until linking. Many compilers, GCC among them, work in two steps: the compiler proper first translates the source to assembly code, and the assembler then translates this assembly code to object code.
Optimization- C is a high-level language, so the code we write can usually be improved by the optimizer. The optimizer is a part of the compiler and works in conjunction with it. It can work in two ways: optimize for speed or optimize for size. Speed optimization uses techniques such as keeping one or more variables in CPU registers inside loops or when passing function arguments. Size optimization reduces the generated code, for example by emitting no code for unused variables and unreachable code, and by simplifying loops and conditions.
One thing to mention here is that accesses to any variable may be optimized once we use an optimization flag (-O) during compilation. Sometimes we need to exempt some variables from optimization while still optimizing the rest of the code. This is done with the keyword “volatile”. When we put volatile before a variable, the compiler/optimizer performs every read and write of it exactly as written; it never keeps the value cached in a register or removes an access.
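A common place where volatile matters is a flag that is changed outside the normal flow of the program, for example by a signal handler. The sketch below is an illustration only; spin_until_stopped() and on_sigint() are hypothetical names, and the function raises SIGINT on itself so that it is self-contained.

```c
#include <signal.h>

/* Written by the handler, read by the loop. volatile forces a real
   load on every iteration; sig_atomic_t makes the access safe inside
   a signal handler. */
static volatile sig_atomic_t stop_requested;

static void on_sigint(int signo)
{
    (void)signo;
    stop_requested = 1;
}

/* Hypothetical helper: loops until the flag is set or the limit is
   reached, and returns the number of passes executed. It raises SIGINT
   itself on the third pass to simulate an outside event. */
int spin_until_stopped(int limit)
{
    void (*previous)(int) = signal(SIGINT, on_sigint);
    int passes = 0;

    stop_requested = 0;
    while (!stop_requested && passes < limit) {
        passes++;
        if (passes == 3)
            raise(SIGINT);          /* handler runs before raise() returns */
    }

    if (previous != SIG_ERR)
        signal(SIGINT, previous);   /* restore the earlier handler */
    return passes;
}
```

In this sketch the call to raise() already makes the compiler reload the flag. With a truly asynchronous signal and no call in the loop body, an optimizing compiler could otherwise read a non-volatile flag once and loop on the stale value.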
Linking Format ELF- Executable and Linkable Format (ELF) and the older a.out are common binary formats on Unix-like systems. The format plays a big role in how a binary is linked and loaded.
A binary normally contains an ELF header, a program header table, and a body divided into several sections, which are described by a section header table. Similar sections appear in most binary formats.
They are generally code, data, rodata, import, export, debug etc. objdump is a shell utility that can be used to display these details.
Please note that some section names may not match those of a particular format. We are discussing the concepts here.
Code section (.text) – contains the executable code. We write main() and other functions, and all of their executable instructions go into this section.
Read Only Data section (.rodata) – The read-only data section is the segment of the file where we store the values of initialized const global or static variables and string literals.
Data section (.data) – The data section is the segment of the file where we store the values of initialized global or static variables.
Import section- The import section or table is a tabular list of function name and virtual address pairs. The virtual address fields are empty until load time. The loader loads all imported shared objects or DLLs and fills in these addresses from their export tables.
Export section – This section is valid for shared objects or dynamic link libraries. A DLL shares its functions by means of its export table. It is similar to the import section and also contains function name and virtual address pairs. The loader populates this table and uses the information to fill the import tables of applications.
Debug section (.debug) – This section holds debug-related information such as line-by-line and symbol details. It is mainly for the debugger. When we set breakpoints, step through the code, and print variables, all of these mappings come from the information in this section.
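To connect the code and data sections described above to actual C declarations, here is a small sketch. The placement comments assume a typical GCC/ELF toolchain; other toolchains may use different section names. All identifiers are invented for the example.

```c
/* Initialized const global: its bytes are stored in .rodata. */
const char banner[] = "Hello World";

/* Initialized global: its value 100 is stored in .data. */
int request_limit = 100;

/* No initializer: typically takes no space in the file and is placed
   in the zero-filled BSS area at load time. */
static int call_count;

/* The machine instructions of every function go to .text. */
int next_request_id(void)
{
    /* Initialized static local: also stored in .data, so it keeps its
       value between calls. */
    static int last_id = 1000;

    call_count++;
    return ++last_id;
}

int calls_so_far(void)
{
    return call_count;
}
```

After compiling this to an object file, objdump can be used to check where each symbol ended up.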
Runtime segments: When we run an executable, the loader loads these sections into memory. It also creates some extra sections/segments. The loader creates -
1) Code segment – where the executable code is loaded
2) Data – holds initialized static and global data
3) Rodata – holds read-only strings and constants
4) BSS – holds uninitialized static and global variables. The loader or the startup code of the C runtime clears this memory area to all zeros, so these variables are zero by default.
5) Heap – a memory region used for dynamic allocation
6) Stack – holds local variables, function arguments, and the call frames of the active function calls
7) Import – the list of functions imported from shared modules or dynamic libraries
8) Export – applicable to shared modules only. This section tells the loader which function names can be shared with other applications or other shared modules.
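The difference between the stack, the heap, and the static areas in the list above shows up in how long an object lives. The following sketch is only an illustration; duplicate_on_heap() and copies_so_far() are hypothetical helpers.

```c
#include <stdlib.h>
#include <string.h>

static int copies_made;   /* uninitialized static: BSS, starts at 0 */

/* Hypothetical helper: returns a heap copy of text, or NULL if the
   allocation fails. The caller releases the copy with free(). */
char *duplicate_on_heap(const char *text)
{
    size_t size = strlen(text) + 1;   /* local variable: on the stack */
    char *copy = malloc(size);        /* block taken from the heap */

    if (copy == NULL)
        return NULL;
    memcpy(copy, text, size);         /* text may point into rodata */
    copies_made++;
    return copy;                      /* heap block outlives this call frame */
}

int copies_so_far(void)
{
    return copies_made;
}
```

The local variable size disappears when the function returns, the heap block stays valid until free() is called, and copies_made exists for the whole run of the program.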
The loader creates the code and data segments and copies the corresponding sections from the binary into memory. BSS is handled differently. The binary holds no data for this section; it only has header information that records where BSS starts and ends. The loader creates the BSS area and clears it to zero, as if with memset().
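The effect of this zeroing can be observed from C, and the clearing step itself is easy to model. In the sketch below, clear_region() is a hypothetical stand-in for what the loader or startup code does with the recorded start and end addresses; it is not the actual loader code.

```c
#include <stddef.h>
#include <string.h>

/* No initializer: this array needs no bytes in the binary, only an
   entry saying how large the zero-filled area must be. */
static unsigned char scratch[4096];

/* Hypothetical model of the loader step: zero the range [start, end). */
void clear_region(unsigned char *start, unsigned char *end)
{
    memset(start, 0, (size_t)(end - start));
}

/* Returns 1 if every byte in buf is zero, otherwise 0. */
int is_all_zero(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0)
            return 0;
    }
    return 1;
}

/* Checks the static array above, which nothing has written to. */
int scratch_starts_zeroed(void)
{
    return is_all_zero(scratch, sizeof scratch);
}
```

Because the array is never written, scratch_starts_zeroed() reports what the program was handed at startup.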
Before the task can run, it needs a stack, so the system provides some pages for it. The task still cannot run until all dynamic symbols are linked. The loader goes through the symbols in the import section, loads the dynamic libraries, and links all function entries. Dynamic allocation also needs some heap space, so the task is given some memory pages for this; this is the heap area. Now the task is ready to run.
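The linking of dynamic symbols can be pictured as filling one table from another. The following is a toy model under that assumption only: the structs and resolve_imports() are invented for illustration and do not match the real ELF or DLL data structures.

```c
#include <stddef.h>
#include <string.h>

typedef int (*func_ptr)(int);

/* Toy table rows: a function name paired with an address. */
struct export_entry { const char *name; func_ptr address; };
struct import_entry { const char *name; func_ptr address; };

/* Hypothetical resolver: fills each import from the export with the
   same name and returns how many imports stayed unresolved. */
size_t resolve_imports(struct import_entry *imports, size_t n_imports,
                       const struct export_entry *exports, size_t n_exports)
{
    size_t unresolved = 0;

    for (size_t i = 0; i < n_imports; i++) {
        imports[i].address = NULL;            /* empty until resolved */
        for (size_t j = 0; j < n_exports; j++) {
            if (strcmp(imports[i].name, exports[j].name) == 0) {
                imports[i].address = exports[j].address;
                break;
            }
        }
        if (imports[i].address == NULL)
            unresolved++;
    }
    return unresolved;
}

/* Functions that a hypothetical shared library exports. */
int twice(int x)  { return 2 * x; }
int negate(int x) { return -x; }
```

An application that imports "negate" ends up with a usable function pointer, while a name missing from the export table stays NULL; a real loader would refuse to start the program in that case.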
Execution generally starts at a symbol named _start, which is supplied by the linker/compiler and hidden from us. After all segments are initialized, the startup code sets up three open files with the descriptors stdin, stdout and stderr. These are used for taking input and printing output and errors; library calls such as scanf()/printf() use them internally. The next step is argument parsing, which creates the argument count (argc) and the argument vector (argv). The last step is calling main() wrapped in exit, i.e. exit(main(argc, argv));. Execution is now inside our source code. We generally call exit() to quit the task, or return a status from main, which is then fed to the exit call.
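The final step of the startup sequence can be sketched in plain C. This is a simplified model, not the real runtime code: app_main() stands in for the program's main(), and startup_sketch() returns the status instead of passing it to exit() so that it can be inspected. Both names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the program's main(). */
int app_main(int argc, char *argv[])
{
    /* argv[0] is the program name; real arguments start at index 1. */
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--fail") == 0)
            return 2;               /* non-zero status signals failure */
    }
    return 0;
}

/* Hypothetical model of the last startup step. The real runtime does
   the equivalent of exit(main(argc, argv)); here the status is
   returned to the caller instead. */
int startup_sketch(int argc, char *argv[])
{
    int status = app_main(argc, argv);

    /* stderr is already open at this point, like stdin and stdout. */
    fprintf(stderr, "entry point returned %d\n", status);
    return status;
}
```

Whatever app_main() returns is the value that would reach exit(), which is why returning from main and calling exit() with the same status have the same effect.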