Security

Security-specific Programming Errors (Part 1)

Thomas Biege

Table of Contents

  1. Introduction/Motivation
  2. C

Introduction/Motivation

"What does security have to do with me as a programmer?"
"All I want is to finish my program quickly and I want it to have as many functions as possible."

Statements like these are fairly typical of a lot of (but not all!) programmers. Such a pragmatic and shortsighted approach isn't entirely unfounded. The pressure on and expectations of development teams are high, in commercial as well as non-profit software (for example, KDE vs. GNOME, kernel development etc.). Mammoth software products have to be shipped in record time, simply because the manager who negotiated the contract with the customer lacked the experience to realistically assess the time frame. Or a fight with the competition has to be won. Owing to this situation, security problems are a recurring feature of large and small programs. These security problems generate costs not only within the company doing the development, but also further down the line, with the customer.

These costs are incurred by the following activities:

  • Programmers need to be pulled out of current projects to eliminate the security flaw
  • Customers have to be notified
  • The new product has to be made available to the customer
  • The customer has to replace the old product. This can lead to errors which, in turn, mean more work for in-house support staff
  • The image of the company suffers, resulting in long-term damage (consider the attacks on Microsoft's own network using Microsoft products)
  • Further, products for banks and insurance companies generally need to be checked by an external (and very expensive) consultant

A large portion of these costs can be avoided by making security an issue from the very outset (as part of the QA process). To achieve this, the developers need to know all about the vulnerabilities and characteristics of the programming language they are using as well as how to avoid them and work around them.

In special cases - financial and insurance products, privileged server programs etc. - an additional review of the source code by two experienced people, usually programmers, should be carried out. These 'reviewers' should not be involved in the project and they should review the code in turn (four eyes are better than two) to make sure that no errors have found their way into the program.

Armed with the knowledge presented in this article and the source code review, a lot of the errors should disappear from the software product, thus keeping any follow-on costs to a minimum.

C

C is one of the most commonly used programming languages. It is used to program the kernels of operating systems as well as small privileged system tools, server daemons and graphical interfaces (C/C++).

This is why most security flaws are found in programs written in C. This does not mean that C is an "insecure" language per se, but rather that C is very widespread, powerful and flexible.

A lot of the security risks described in this section also apply to other programming languages as they are dependent on the operating environment.

Buffer Overflow

Buffer overflows are one of the main reasons for security vulnerabilities and program crashes. A buffer overflow occurs whenever data from an untrusted source such as a keyboard, network or user file is stored in a fixed-size chunk of memory without array bounds checking.

The consequences of buffer overflows depend largely on the storage type. However, let us first briefly consider the memory management process of programs on Intel-based processors in order to better understand the processes in the following attacks.

Local variables are stored on the stack, global and static variables in the program's data segment, and dynamically allocated memory (malloc(3)) on the heap. The stack and heap share the same address space, with the heap growing from the bottom up, and the stack growing downwards.

Every function has its own stack frame where it saves its local variables. The same applies to code blocks enclosed in the braces { and } in C/C++. In addition to local data, the contents of certain CPU registers need to be saved on the stack before jumping to the function's machine code. This is to enable a return to the calling function after the function is done.

First, the function parameters are pushed to the stack before the routine is called.
The assembler instruction CALL then saves the current position in the machine code, which is represented by CS:IP, on the stack. The called function in turn saves the BP register, which the calling function requires for its own stack frame.
Finally, the function sets up its stack frame for the local variables.

When the function is done, it restores the saved BP register and returns to its caller by using RET, which restores the saved CS:IP from the stack. This means that execution of the machine code resumes at the point where the saved CS:IP points, i.e., after the CALL instruction.

Let us now consider the consequences.

Stack. Buffer overflows on the stack can be exploited in three ways.

  • The contents of variables above the variable in question on the stack can be overwritten with any kind of data by the attacker. A classic example of a security vulnerability that can be exploited is password-based authentication. The password is first retrieved from a local database and stored in a variable. Later on, the user is prompted to enter the password and the program compares the two strings.

    For example:

    [...]
    /* that's our secret phrase */
    char origPassword[12] = "Secret\0";
    char userPassword[12];
    [...]
    
    gets(userPassword);   /* read user input */
    [...]
    
    if(strncmp(origPassword, userPassword, 12) != 0)
    {
            printf("Password doesn't match!\n");
            exit(-1);
    }
    [...]
    /* give user access to everything */
    [...]
    
    

    If the user now enters more than 12 characters (32-bit alignment), he will overwrite the contents of origPassword[], assuming the compiler has placed origPassword[] directly above userPassword[] on the stack. Thus, if he enters 'opensesame!!opensesame!!', userPassword[] and origPassword[] both begin with the same 12 characters (opensesame!!) and the comparison succeeds.

  • Of course, not only the contents of variables, but also the saved registers on the stack can be overwritten. Thus, by entering even more characters and overwriting the saved instruction pointer (IP), the attacker can make the RET at the end of the function continue execution at any point in the program. Generally, however, the program's own code is not used; instead, the CPU is fed with the attacker's own machine code. To do this, the machine code is written into the variable, i.e., onto the stack, and, in addition, the saved IP address is set to the start address of the attacker's code. If the variable is too small to hold the machine code, then it can still be stored in the program environment, on the heap or elsewhere in the accessible address space.
    When the function finishes, RET fills the IP register of the CPU with the IP value from our stack, which was set by the attacker, and the computer now faithfully executes the attacker's code sequences.

  • Moreover, function pointers can be overwritten in order to execute third-party code when the pointer is used. The principle is therefore the same. The attacker places his machine code in a global or local variable or in the program environment - no overflow is required to do this, a place where the code can be stored is all that is needed - and has the function pointer point to his program code.

    When the function pointer is used to call the function, it is not the function code that is executed but the attacker's code instead.

    For example:

    [...]
    long (* funcptr) (const char *) = atol;
    [...]
    /*
    ** the attacker writes his code somewhere in the
    ** addressable memory
    */
    [...]
    
    /*
    ** thanks to an overflow in the program code, the attacker
    ** overwrites the value which (*funcptr) contains
    ** with the start address of his own code
    */
    [...]
    
    /*
    ** the function is called by the pointer and the third-
    ** party code is thus executed
    */
    (*funcptr)(string);
    [...]
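
The password check from the first example above could be made robust against over-long input roughly as follows. This is only a minimal sketch: the helper names read_line_bounded() and password_matches() are hypothetical and chosen for illustration, and the sketch assumes the password is read from a FILE stream such as stdin. It replaces gets(3) with fgets(3) and rejects any line that does not fit into the buffer instead of silently truncating it.

```c
#include <stdio.h>
#include <string.h>

/*
** Hypothetical helper: reads one line into buf, which is size bytes large.
** Returns 0 if a complete line was stored (without the newline),
** -1 on end of input, read error, or if the line did not fit.
*/
int read_line_bounded(FILE *in, char *buf, size_t size)
{
    size_t len;
    int c;

    if (size == 0 || fgets(buf, (int) size, in) == NULL)
        return -1;

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';        /* complete line: strip the newline */
        return 0;
    }

    /* No newline stored: look at the next character of the input. */
    c = getc(in);
    if (c == EOF || c == '\n')
        return 0;                   /* the line ended exactly here */

    /* The line is too long: discard the rest of it and reject it. */
    while ((c = getc(in)) != '\n' && c != EOF)
        ;
    return -1;
}

/* Returns 1 if the line read from in equals origPassword, else 0. */
int password_matches(FILE *in, const char *origPassword)
{
    char userPassword[12];

    if (read_line_bounded(in, userPassword, sizeof(userPassword)) != 0)
        return 0;
    return strcmp(origPassword, userPassword) == 0;
}
```

With this approach the input 'opensesame!!opensesame!!' is rejected as too long and never reaches the memory next to userPassword[].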
    

Heap. Just as on the stack, overflows on the heap can be used to modify data and function pointers, thus altering the manner in which the program behaves to the attacker's advantage. The heap also offers the chance to overwrite the jmp_buf variable used by setjmp(3). Among other things, this jump buffer contains the address of the position in the program code at which setjmp(3) was called. If this value is overwritten with the start address of the attacker's machine code and longjmp(3) is then called, the IP register is set to the beginning of the third-party code, which is then executed.
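
To make clear what the jmp_buf variable is normally used for, here is a minimal sketch of legitimate setjmp(3)/longjmp(3) usage. All names (recovery_point, recovered_code, abandon, run_step) are hypothetical and exist only in this illustration. The point to note is that longjmp(3) resumes execution wherever the jump buffer says, which is why a jump buffer that an attacker can overwrite is so dangerous.

```c
#include <setjmp.h>

/* Hypothetical names, chosen for this sketch only. */
static jmp_buf recovery_point;   /* holds the saved execution context */
static int recovered_code;       /* value handed over by abandon()    */

/* Jumps back to the position saved in recovery_point. */
static void abandon(int code)
{
    recovered_code = code;
    longjmp(recovery_point, 1);
}

/* Returns 0 on the normal path, or the code passed to abandon(). */
int run_step(int code)
{
    if (setjmp(recovery_point) != 0) {
        /* We arrive here a second time, via longjmp(). */
        return recovered_code;
    }

    if (code != 0)
        abandon(code);              /* does not return */

    return 0;
}
```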

Range. Errors that are difficult to find are caused by exceeding value ranges with numeric variables. The code snippets below should illustrate the danger.

For example:

1.)

[...]
unsigned int uintAnzahl = GetZahlFromUser();
unsigned int uintGroesse = uintAnzahl * sizeof(struct
                                            myStructure);

/*
** When the user enters the maximum of the
** range of unsigned int
** (UINT_MAX, defined in limits.h) as a number, then
** multiplying it yields a value greater than UINT_MAX.
** The variable now overflows, and the consequence is
** that uintGroesse is assigned a smaller value.
*/

myStructureArray[i] = malloc(uintGroesse);

/*
** with the malloc(3) call, a smaller portion
** of memory is allocated than
** UINT_MAX * sizeof(struct myStructure);
** The smaller buffer thus inevitably leads to a 
** buffer overflow.
*/
[...]

2.)

[...]
unsigned int uintAnzahl = GetZahlFromUser();
myArray[i] = malloc(uintAnzahl + strlen("oops!"));

/*
** The same happens with addition,
** the buffer allocated by malloc(3) is too small
*/

3.)

[...]
char Buffer[1024];
[...]

int intAnzahl = GetZahlFromUser();
[...]

if(intAnzahl > (int) sizeof(Buffer))
{
        fprintf(stderr, "Buffer too small!\n");
        exit(-1);
}

/*
** The cast makes this a signed comparison. (Without it,
** intAnzahl would first be converted to an unsigned type.)
** If we specify -1 for intAnzahl, the expression in
** the if condition is FALSE, but if we call
** malloc(3), memcpy(3) or similar
** later, then -1 is handled as a positive
** value by the functions. On a 32-bit system, the
** value -1 corresponds to approx. 4 GB.
*/
[...]
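
The three range problems above can be guarded against with explicit checks before the value is used as a size. The following is a minimal sketch, not a definitive implementation: the helper names checked_mul(), checked_add(), length_in_range() and alloc_array() are hypothetical, and the sketch assumes a C99 environment that provides SIZE_MAX in stdint.h.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stores a * b in *result and returns 0, or returns -1 on overflow. */
int checked_mul(size_t a, size_t b, size_t *result)
{
    if (b != 0 && a > SIZE_MAX / b)
        return -1;
    *result = a * b;
    return 0;
}

/* Stores a + b in *result and returns 0, or returns -1 on overflow. */
int checked_add(size_t a, size_t b, size_t *result)
{
    if (a > SIZE_MAX - b)
        return -1;
    *result = a + b;
    return 0;
}

/* Returns 1 if a signed length is non-negative and at most limit. */
int length_in_range(int length, size_t limit)
{
    return length >= 0 && (size_t) length <= limit;
}

/* Allocates count elements of elem_size bytes; NULL on overflow. */
void *alloc_array(size_t count, size_t elem_size)
{
    size_t total;

    if (checked_mul(count, elem_size, &total) != 0)
        return NULL;
    return malloc(total);
}
```

Checking the negative case first and only then casting to an unsigned type avoids the problem shown in example 3, where -1 passes a signed comparison and later turns into a huge positive size.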

A number of system/library calls and program constructs that are among the most common causes of buffer overflows are listed below.

  • gets(3)

    Data is read from stdin into a buffer supplied by the caller, with no way to limit its length. The most famous bug of this kind was exploited by the Morris Internet Worm in fingerd in order to execute commands on a computer across the network.

    Wrong:

    [...]
    char HopeItFits[12];
    [...]
    
    while(gets(HopeItFits) != NULL)
    {
        puts(HopeItFits);
        memset(HopeItFits, 0, sizeof(HopeItFits));
    
    }
    [...]
    
    

    Right:

    With fgets(3) data can be read securely by restricting size. By specifying the amount of data to read with sizeof(HopeItFits), i.e., 12 bytes, fgets(3) then reads at most 12-1 bytes and also adds the terminating null character ('\0') at the end of the string. This way, no problems occur when the string is further processed with the str* functions in string.h.

    [...]
    char HopeItFits[12];
    [...]
    
    while(fgets(HopeItFits, sizeof(HopeItFits), stdin) != NULL)
    {
         puts(HopeItFits);
         memset(HopeItFits, 0, sizeof(HopeItFits));
    }
    [...]
    
    
  • scanf(3)

    The scanf functions also generally read data without array bounds checking.

    Wrong:

    [...]
    char HopeItFits[12];
    [...]
    
    while(scanf("%s", HopeItFits) == 1)
    {
         puts(HopeItFits);
         memset(HopeItFits, 0, sizeof(HopeItFits));
    }
    [...]
    
    

    Right:

    With *scanf(3) a size limit (field width) can be set in the format string for the format tags. For strings, this is done with %<size>s, where <size> does not include the terminating null character.

    [...]
    char HopeItFits[12];
    [...]
    
    while(scanf("%11s", HopeItFits) == 1)
    {
    
         HopeItFits[11] = '\0';
         puts(HopeItFits);
         memset(HopeItFits, 0, sizeof(HopeItFits));
    }
    [...]
    
    
  • *sprintf(3)

    Basically, the problem here is the same as with the scanf functions. The array bounds can be enforced either in the format tags (for strings, with the precision %.<size>s) or via the snprintf(3) or vsnprintf(3) function, which takes the size of the destination buffer as its second parameter.

    Wrong:

    [...]
    char HopeItFits[12];
    char BigBadBuffer[120];
    [...]
    
    while(scanf("%119s", BigBadBuffer) == 1)
    {
    
         BigBadBuffer[sizeof(BigBadBuffer)-1] = '\0';
         sprintf(HopeItFits, "%s", BigBadBuffer);
         [...]
         memset(HopeItFits, 0, sizeof(HopeItFits));
         memset(BigBadBuffer, 0, sizeof(BigBadBuffer));
    }
    [...]
    
    

    Right:

    • Format tags:

      
      [...]
      char HopeItFits[12];
      char BigBadBuffer[120];
      [...]
      
      while(scanf("%119s", BigBadBuffer) == 1)
      {
      
           BigBadBuffer[sizeof(BigBadBuffer)-1] = '\0';
           sprintf(HopeItFits, "%.11s", BigBadBuffer);
           [...]
      
           memset(HopeItFits, 0, sizeof(HopeItFits));
           memset(BigBadBuffer, 0, sizeof(BigBadBuffer));
      }
      [...]
      
      
    • snprintf(3):

      
      [...]
      char HopeItFits[12];
      char BigBadBuffer[120];
      [...]
      
      while(scanf("%119s", BigBadBuffer) == 1)
      {
           BigBadBuffer[sizeof(BigBadBuffer)-1] = '\0';
           snprintf(HopeItFits, sizeof(HopeItFits), "%s",
                                         BigBadBuffer);
           [...]
      
           memset(HopeItFits, 0, sizeof(HopeItFits));
           memset(BigBadBuffer, 0, sizeof(BigBadBuffer));
      }
      [...]
      
  • strcpy(3)/strcat(3)

    With strcpy(3) and strcat(3), too, attention needs to be paid to the size of the destination buffer.

    Wrong:

    [...]
    char HopeItFits[12];
    char BigBadBuffer[120];
    [...]
    
    while(scanf("%119s", BigBadBuffer) == 1)
    {
         BigBadBuffer[sizeof(BigBadBuffer)-1] = '\0';
         strcpy(HopeItFits, BigBadBuffer);
         [...]
    
         memset(HopeItFits, 0, sizeof(HopeItFits));
         memset(BigBadBuffer, 0, sizeof(BigBadBuffer));
    }
    [...]
    
    

    Right:

    The number of bytes to copy can be limited with strncpy(3) and strncat(3). However, be careful: strncpy(3) copies at most the number of bytes specified as the third argument and does not null-terminate the destination when the source is at least that long. strncat(3), on the other hand, always appends a terminating null character after the bytes it copies, so its third argument must leave room for that extra byte. These particular features should be taken into account. strncpy(3)/strncat(3) therefore do not work like fgets(3)!

     
    [...]
    char HopeItFits[12];
    char BigBadBuffer[120];
    [...]
    
    while(scanf("%119s", BigBadBuffer) == 1)
    {
         BigBadBuffer[sizeof(BigBadBuffer)-1] = '\0';
         strncpy(HopeItFits, BigBadBuffer,
                                      sizeof(HopeItFits)-1);
         HopeItFits[sizeof(HopeItFits)-1] = '\0';
         [...]
    
         memset(HopeItFits, 0, sizeof(HopeItFits));
         memset(BigBadBuffer, 0, sizeof(BigBadBuffer));
    }
    [...]
    
    
  • strncpy(3)/strncat(3), when used incorrectly

    A lot of programmers use strncat(3) or strncpy(3) and think that they are on the safe side. However, they often forget the particular characteristics of these functions (see above). With strncpy(3), the destination is left without a terminating null character; with strncat(3), the result is a 1-byte buffer overflow. Either typically leads to a segmentation fault, without necessarily posing a security threat.

    Wrong:

    [...]
    char HopeItFits[12];
    char BigBadBuffer[120];
    [...]
    
    while(scanf("%119s", BigBadBuffer) == 1)
    {
         BigBadBuffer[sizeof(BigBadBuffer)-1] = '\0';
         strncpy(HopeItFits, BigBadBuffer,
                                       sizeof(HopeItFits));
         [...]
    
         memset(HopeItFits, 0, sizeof(HopeItFits));
         memset(BigBadBuffer, 0, sizeof(BigBadBuffer));
    }
    [...]
    
    

    Right:

    [...]
    char HopeItFits[12];
    char BigBadBuffer[120];
    [...]
    
    while(scanf("%119s", BigBadBuffer) == 1)
    {
         BigBadBuffer[sizeof(BigBadBuffer)-1] = '\0';
         strncpy(HopeItFits, BigBadBuffer,
                                       sizeof(HopeItFits)-1);
         HopeItFits[sizeof(HopeItFits)-1] = '\0';
         [...]
    
         memset(HopeItFits, 0, sizeof(HopeItFits));
         memset(BigBadBuffer, 0, sizeof(BigBadBuffer));
    }
    [...]
    
    
  • Reading in a loop while ignoring buffer lengths

    Loops that read user input until a specific character (such as the newline character '\n') is found in the input stream are common.

    Wrong:

    [...]
    int Byte, i;
    char HopeItFits[12];
    [...]
    
    i = 0;
    while((Byte = getc(stdin)) != '\n')
    {
         HopeItFits[i] = Byte;
         [...]
    
         i++;
    }
    [...]
    
    

    To force a buffer overflow, all the attacker needs to do is enter more than 12 bytes without a newline character.

    Right:

    [...]
    int Byte, i;
    char HopeItFits[12];
    [...]
    
    i = 0;
    while((Byte = getc(stdin)) != '\n' && Byte != EOF)
    {
         HopeItFits[i] = Byte;
         [...]
    
         if(++i >= sizeof(HopeItFits))
         {
              fprintf(stderr, "Too much data read!\n");
              return(-1);
         }
    }
    [...]
    
    

    Of course, this can also be solved with strncat(3).

  • getwd(3)

    The library function getwd(3) writes the name of the current directory to the char array that it received as an argument. If the array is too small for the name, a buffer overflow occurs. More recent versions of the getwd(3) implementation write a maximum of PATH_MAX characters to the array. One is thus safe when the array is PATH_MAX+1 bytes large.

    By using getcwd(3) or get_current_dir_name(3), one can be sure that one's program does not contain a buffer overflow due to implementation discrepancies. With getcwd(3), however, caution is called for: on old SunOS systems it simply calls popen("pwd"), which brings with it its own set of problems (see section "Program Environment").

  • And many more

    There are a lot more functions that do not perform array bounds checking. They depend on the operating system, the existing libraries and the implementations.
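
Since the termination rules of strncpy(3) and strncat(3) discussed in the list above are easy to get wrong, one option is to wrap them in small helpers that always receive the full size of the destination buffer. The following is a minimal sketch under that assumption; copy_bounded() and append_bounded() are hypothetical names and not part of any standard library. Both return -1 when the source had to be truncated, so the caller can decide how to react.

```c
#include <string.h>

/*
** Copies src into dst (dstsize bytes large) and always terminates dst.
** Returns 0 if src fit completely, -1 if it was truncated.
*/
int copy_bounded(char *dst, size_t dstsize, const char *src)
{
    if (dstsize == 0)
        return -1;

    strncpy(dst, src, dstsize - 1);
    dst[dstsize - 1] = '\0';        /* strncpy(3) may not terminate */

    return strlen(src) < dstsize ? 0 : -1;
}

/*
** Appends src to the string already stored in dst (dstsize bytes large).
** Returns 0 if src fit completely, -1 if it was truncated or if dst
** holds no terminated string.
*/
int append_bounded(char *dst, size_t dstsize, const char *src)
{
    const char *end = memchr(dst, '\0', dstsize);
    size_t room;

    if (end == NULL)
        return -1;                  /* dst is not terminated */

    /* Free bytes, minus one for the null character strncat(3) adds. */
    room = dstsize - (size_t) (end - dst) - 1;
    strncat(dst, src, room);

    return strlen(src) <= room ? 0 : -1;
}
```

The design choice here is to report truncation instead of hiding it, because a silently shortened path or user name can itself become a security problem.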
