Why study programming? Ethical gray hat hackers should study programming and learn as much about the subject as possible in order to find vulnerabilities in programs and get them fixed before unethical hackers take advantage of them. It is very much a foot race: if the vulnerability exists, who will find it first? The purpose of this chapter is to give you the survival skills necessary to understand upcoming chapters and later find the holes in software before the black hats do.
In this chapter, we cover the following topics:
• C programming language
• Computer memory
• Intel processors
• Assembly language basics
• Debugging with gdb
• Python survival skills
The C programming language was developed in 1972 by Dennis Ritchie at AT&T Bell Labs. The language was used heavily in Unix and is therefore ubiquitous. In fact, many of the staple networking programs and operating systems are written in C.
Although each C program is unique, there are common structures that can be found in most programs. We’ll discuss these in the next few sections.
All C programs contain a main() structure (lowercase) that follows this format:
<optional return value type> main(<optional argument>) {
<optional procedure statements or function calls>;
}
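where both the return value type and arguments are optional (older compilers assume int when the return type is omitted; C99 and later expect it to be written explicitly). If you use command-line arguments for main(), use the format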
<optional return value type> main(int argc, char * argv[]){
where the argc integer holds the number of arguments (counting the program name itself) and the argv array holds the input arguments (strings). The parentheses and curly brackets are mandatory, but white space between these elements does not matter. The curly brackets denote the beginning and end of a block of code. Although procedure and function calls are optional, the program would do nothing without them. Procedure statements are simply a series of commands that perform operations on data or variables and normally end with a semicolon.
Functions are self-contained bundles of algorithms that can be called for execution by main() or other functions. Technically, the main() structure of each C program is also a function; however, most programs contain other functions. The format is as follows:
<optional return value type> function name (<optional function argument>){
}
The first line of a function is called the signature. By looking at it, you can tell if the function returns a value after executing or requires arguments that will be used in processing the procedures of the function.
The call to the function looks like this:
<optional variable to store the returned value =>function name (arguments if called for by the function signature);
Again, notice the required semicolon at the end of the function call. In general, the semicolon is used on all stand-alone command lines (not bounded by brackets or parentheses).
Functions are used to modify the flow of a program. When a call to a function is made, the execution of the program temporarily jumps to the function. After execution of the called function has completed, the program continues executing on the line following the call. This will make more sense during our discussion of stack operation in Chapter 11.
Variables are used in programs to store pieces of information that may change and may be used to dynamically influence the program. Table 10-1 shows some common types of variables.
When the program is compiled, most variables are preallocated memory of a fixed size according to system-specific definitions of size. Sizes in the table are considered typical; there is no guarantee that you will get those exact sizes. The compiler and the target hardware define the actual sizes. However, the sizeof operator can be used in C to obtain the size of a type or variable on the current system, so that code does not have to assume one.
Variables are typically defined near the top of a block of code. As the compiler chews up the code and builds a symbol table, it must be aware of a variable before it is used in the code later. This formal declaration of variables is done in the following manner:
<variable type> <variable name> <optional initialization starting with "=">;
For example:
int a = 0;
where an integer (normally 4 bytes) is declared in memory with a name of a and an initial value of 0.
Once declared, the assignment construct is used to change the value of a variable. For example, the statement
x=x+1;
is an assignment statement containing a variable x modified by the + operator. The new value is stored into x. It is common to use the format
destination = source <with optional operators>
where destination is the location in which the final outcome is stored.
The C language comes with many useful constructs for free (bundled in the libc library). One of the most commonly used constructs is the printf command, generally used to print output to the screen. There are two forms of the printf command:
printf(<string>);
printf(<format string>, <list of variables/values>);
The first format is straightforward and is used to display a simple string to the screen. The second format allows for more flexibility through the use of a format string that can be composed of normal characters and special symbols that act as placeholders for the list of variables following the comma. Commonly used format symbols are listed and described in Table 10-2.
These format symbols may be combined in any order to produce the desired output. Except for the \n symbol, the number of variables/values needs to match the number of symbols in the format string; otherwise, problems will arise, as described in Chapter 12.
The scanf command complements the printf command and is generally used to get input from the user. The format is as follows:
scanf(<format string>, <list of variables/values>);
where the format string can contain format symbols such as those shown for printf in Table 10-2. For example, the following code will read an integer from the user and store it into the variable called number:
scanf("%d", &number);
Strictly speaking, the & symbol yields the memory address of number, so scanf stores the value at that location; that will make more sense when we talk about pointers later in the chapter in the “Pointers” section. For now, realize that you must use the & symbol before an ordinary variable name with scanf (an array name, such as a character buffer read with %s, is already treated as an address). The command does not change types on the fly: if you were to enter a character at the previous prompt, the %d conversion would fail, number would be left unchanged, and scanf would return 0, so it is worth checking the return value. In addition, bounds checking is not done in regard to string size, which may lead to problems (as discussed later in Chapter 11).
The strcpy command is probably the most dangerous command used in C. The format of the command is
strcpy(<destination>, <source>);
The purpose of the command is to copy each character in the source string (a series of characters ending with a null character: