Teaching‎ > ‎

CS210 Fall 2009 TF

This is the teaching fellow's page. I will post some supplemental material here from time to time, allowing both students and the professor to keep track of what happened in discussion sessions.

General Information

You're welcome to make appointment by e-mail; please check my schedule first.

Course Outline


posted Dec 2, 2009, 8:22 PM by Likai Liu   [ updated Dec 8, 2009, 6:18 PM ]

  • Discussed yet another memory allocator implementation (see mm.c attached).
    • Object is prefixed with a size_t header regardless it's free or not. The first word of free object is also used to store the "next object" pointer. Allocated objects do not have a next pointer. The free objects form a singly linked list.
    • Does not sort objects to address order.
    • Splits large objects into smaller ones.
    • No coalescing.
    • LIFO (as opposed to first fit).
  • Demonstrated that the allocator works with a simple program "tac" which reads each line from stdin into memory and prints them out last line first.
    • Compile tac.c against standard C malloc: gcc -o tac tac.c
    • Compile tac.c against mm.c implementation: gcc -DUSE_MM -o tac tac.c
  • Possible improvements:
    • Calling mm_malloc() with request n >= ZONE_SIZE causes infinite loop. However, using a ZONE_SIZE that is too big will penalize your utilization score. This should be an easy fix.
    • Currently all objects are sizeof(size_t) aligned. Will need to modify pointer arithmetic to make it align to 8 bytes on any architecture. The malloclab requires all objects to be 8-byte aligned.
    • Deferred coalescing: wait until the free list runs out; before asking for more memory from sbrk(), perform a merge sort, then coalesce all coalesceable free blocks in one pass.
    • The malloc() routine spends most of its time searching through the linear free list, which is O(n). Use a log(n) time lookup data structure such as binary search tree.
    • Approximate best fit strategy using segregated size class bins. See the design of Doug Lea malloc for some ideas. The "hash table" like data structure could provide close to O(1) malloc(), and best fit is known to minimize fragmentation.
    • To improve small object allocation, consider a special "headerless" class that is simply an array of same-sized small objects.
  • Note: if you want to use this mm.c with malloclab, remember to reinitialize mm_first_free to NULL in mm_init(). Otherwise mdriver will clear the heap and cause objects already in the free list to be freed again, creating a cyclic list that causes infinite loop in mm_malloc().


posted Nov 23, 2009, 12:02 PM by Likai Liu   [ updated Nov 23, 2009, 12:57 PM ]

  • Why isn't stack allocation enough for programming?
    • Stack-allocated objects become invalid as soon as the function returns, so callee cannot return objects on stack back to the caller.
    • Caller can create an object on stack and pass its pointer to the callee to store results there, but the caller must be able to predict how much space will be used by the callee (e.g. gettimeofday(), strftime(), fgets(), snprintf(), ...) Typically, the callee return value has a special way to indicate whether the space provided by the caller was sufficient.
    • Not a good paradigm for data structure, even those as simple as a list. The functions to enumerate directory entries for example---opendir(), readdir(), closedir()---are awkward to use and error prone if used incorrectly.
  • What is the heap?
    • Traditional Unix process views the heap as an extension to the data section, in order to store additional "global" objects created in run time. The data section typically stores objects created as a global variable.
    • Heap is grown in opposite direction to the stack (see figure on the right).
    • The brk() system call sets the end of the data section; sbrk() is a wrapper that remembers and returns the previous end of the data section, and adjusts the new end by the given number of bytes. In other words, use of sbrk() feels like malloc().
    • There is no corresponding free() for the data section. The end of the data section can be shrunk, but one must take the trouble to ensure that the shrinkage does not cause objects that are still in use to be invalidated.
    • brk() typically operates on one page at a time: if we want 8 bytes, it will still give us a page (4096 bytes). We typically use malloc(), which obtains memory from sbrk(), to manage the heap in a fine grained way.
    • Modern operating systems use paged virtual memory management to allocate pages anywhere in the address space.
  • Simple strategies to manage the heap:
    • Use it like a stack. The decoupling of call stack and object stack means objects can survive the scope of a function call. Objects must be freed in the exact opposite order they're allocated. See K&R section 5.4 pp. 101.
    • Use a singly linked list to chain free memory blocks. Objects may be freed in any order. May cause fragmentation problem. See K&R section 8.7 pp. 185.
  • A word about using union (see K&R pp. 147) for alignment: K&R pp. 186 presents this object header definition:
    typedef long Align;  /* for alignment to long boundary */
    union header {       /* block header: */
        struct {
            union header *ptr; /* next block if on free list */
            unsigned size;     /* size of this block */
        } s;
        Align x;         /* for alignment of blocks */
    typedef union header Header;

    The use of union in this case ensures that header is at least as big as the Align type, which is typedef as a long int. The malloc() code on pp. 187 uses sizeof(Header) to round up the number of bytes to allocate.

    • On a 16-bit machine, long int is 4 bytes, pointer 2 bytes, and unsigned int 2 bytes. The union does nothing in this case because sizeof(x) == sizeof(s) (x and s are the members in the union).
    • On a 32-bit machine, long int is 4 bytes, pointer 4 bytes, and unsigned int 4 bytes. All of them are aligned to their size. The union does nothing in this case because sizeof(x) < sizeof(s).
    • On a 64-bit machine, long int is 8 bytes, pointer 8 bytes, and unsigned int 4 bytes. The struct is 12 bytes, and the union still does nothing because sizeof(x) < sizeof(s).

    In all cases, sizeof(Header) == sizeof(s), so the union does nothing. Furthermore, on 64-bit machine, the compiler may leave sizeof(Header) as 12 bytes, which is not a very good number to round up to.


posted Nov 6, 2009, 2:58 PM by Likai Liu   [ updated Nov 9, 2009, 12:28 PM ]

Continue from last week the implementation of binary search tree traversal, this time using pointer to pointer (see bst-new.c attached). This approach cannot be efficiently emulated by other languages like Java.
  • Review: writing free_bst() recursively. The malloc()/free() pair of functions is how you obtain/relinquish memory. Remember not to free(t) first, otherwise t->left and t->right would become garbage.
  • find_pos() takes pointer to pointer to a binary search tree node for argument, and returns pointer to pointer to node where the node with given key would be found. If there is no node with the given key, returns pointer to the NULL pointer. It could either be the original bst_t ** to the root node, or either the memory location of the left or right member of some node that contains the NULL pointer.
  • In other words, find_pos() returns the position that can be used to modify the tree, either for insert or remove.
  • find() can be written in terms of find_pos().
  • insert() can be written in terms of find_pos().
  • As an exercise, see if you can write remove() in terms of find_pos() as well. It may involve writing a find_min_pos() function.
  • The indirect pointer can be seen reflected in the assembly code of bst-new.c. It doesn't increase number of instructions significantly (compared to bst.s, attached, generated from bst.c from last week).
  • The insert() function is now tail recursive, and it generates much better code compared to before.
  • With -O3, the compiler inlines the definition of find_pos() into find() and insert(), essentially synthesizing new functions based on the one we wrote.

Summary of GDB Hints

posted Nov 3, 2009, 10:11 AM by Likai Liu

Some GDB usage tips collected from messages posted on the mailing list.
  • Prevent bomb explosion:
    (gdb) b explode_bomb

    then when the program is stopped by the debugger, check if the function has been triggered.

  • Tracing execution at assembly level. At the beginning of each gdb session, it would help to enter this command first:
    (gdb) display /i $pc

    then gdb will show the upcoming instruction each time your program steps.

    (gdb) nexti    # next instruction, skips over function calls.
    (gdb) ni       # shorthand for nexti
    (gdb) stepi    # step instruction, steps into function calls.
    (gdb) si       # shorthand for stepi
    To see the content of registers,
    (gdb) info registers
    (gdb) i r      # shorthand for info registers. Note the space.
  • To disassemble a function without running it,
    (gdb) disassemble addr
    (gdb) disas addr    # shorthand for disassemble

    The address "addr" can be a symbol name (e.g. phase_1) or address (e.g. 0x08048ea6) or a register (e.g. $pc).

  • To examine memory content,
    (gdb) x /fmt addr
    where /fmt specifies the format at the memory location "addr". Some examples:
    (gdb) x /s 0x8049890   # shows the string at address 0x8049890
    (gdb) x /16bc $esi     # shows 16 bytes of characters at $esi
    (gdb) x /4wx &node1    # shows 4 words of hex at symbol name node1
    (gdb) x /6wx $ebp - 0x20  # shows 6 words of hex at address $ebp - 0x20.

    Notice that &node1 is the address of that symbol. If you omit the &, it would try to read a word value at that memory location, and then use the value as the address to show for the x command.

  • If you see a constant that seems to refer to a memory location, and if you want to see if there is a symbol associated with that address, you can lookup the symbol name like this:
    (gdb) info symbol 0x08048cfb
    phase_2 in section .text
    (gdb) info symbol 0x804a5fc
    node1 in section .data

    Sometimes the symbol name reveals intent of the program.


posted Nov 3, 2009, 10:01 AM by Likai Liu

Binary search tree in C. See bst.c attached below. Several points to note:
  • We're writing recursive functions. Base case (terminal case) is when the tree reaches a NULL pointer. The recursion makes progress towards the base case because every time we make a recursive call, we're one node closer to a NULL pointer. We can assume this is the case because tree pointers are acyclic. If we have cyclic pointers (it wouldn't be a tree anymore), our recursive function may not terminate.
  • find() takes a node and recursively calls find on either the left or right child depending on the key. If key is smaller than what's stored at node, we look into the left child. If key is greater, we look into the right child. Otherwise the key is equal to the one stored at the node, and we've found the node we're looking for. Initially, find() is called on the root of the tree. Notice that find() is tail recursive.
  • insert() is a bit different. Ideally it should just return void and modify the tree transparently, but the pointer we want to modify resides in the parent of the current node. For now, we make insert() return the new updated subtree to the caller, which knows the parent and will perform the update. Notice that insert() is not tail recursive. We'll see a different way to write insert() next time.


posted Oct 21, 2009, 10:26 AM by Likai Liu   [ updated Oct 21, 2009, 2:07 PM ]

Assembly bomb lab hints

  • Rather than working online with gdb, I suggest you use offline tools like objdump to study the bomb offline, unless you "like to live dangerously" (there are way too many Austin Powers jokes in this lab).
    • To produce disassembly of the bomb, including hidden functions phase_1, phase_2, ..., that are not included in bomb.c,
      objdump -d ./bomb
      You could use shell redirection to save the output to a file:
      objdump -d ./bomb > bomb-asm.txt

      Note the output is not quite a valid .s file that can be assembled again.

    • You can dump the section content like this,
      objdump -s ./bomb
      and you will see output in six columns. The first column is the address, next four the content of this strip of 16-bytes of memory, and the last column the ascii equivalent, which allows you to see the strings visually. Note: each group of 4 bytes from columns 2 through 5 should be read as an array of bytes as they appear in memory, not as an integer with little or big endian byte order.

      For example, the following two 16-byte strips:

       8049790 476f6f64 20776f72 6b212020 4f6e2074  Good work!  On t
       80497a0 6f207468 65206e65 78742e2e 2e000000  o the next......

      is to be loaded at memory address 0x08049790, and is the string "Good work!  On to the next..." And this 16-byte strip:

       80498e0 598e0408 628e0408 698e0408 728e0408  Y...b...i...r...
      is at address 0x080498e0, with four little endian integers 0x08048e59, 0x08048e62, 0x08048e69, and 0x08048e72. This appears to be a jump table.
  • If you're an enterprising hacker, you might have thought about patching the explode_bomb() function in gdb to just return without doing anything. The "ret" instruction has the opcode 0xc3, so you can do something like this:
    $ gdb ./bomb
    (gdb) b main
    Breakpoint 1 at 0x80489bb: file bomb.c, line 44.
    (gdb) run
    Starting program: /home/grad2/liulk/Teaching/CS210/bomb2/bomb 
    Breakpoint 1, main (argc=1, argv=0xbff6a974) at bomb.c:44
    44          if (argc == 1) {  
    (gdb) disas explode_bomb
    Dump of assembler code for function explode_bomb:
    0x08049233 <explode_bomb+0>:    push   %ebp
    0x08049234 <explode_bomb+1>:    mov    %esp,%ebp
    End of assembler dump.
    (gdb) set {char}explode_bomb = 0xc3
    (gdb) disas explode_bomb 
    Dump of assembler code for function explode_bomb:
    0x08049233 <explode_bomb+0>:    ret    
    0x08049234 <explode_bomb+1>:    mov    %esp,%ebp
    End of assembler dump.
    (gdb) continue
    The line "set {char}explode_bomb = 0xc3" patches the function to just return without exploding. Using this technique, you can fool the program into thinking that the bomb has been defused! But the catch is, the bomb sends us e-mail about the input string you entered, and the solution is verified independently by the auto-grader. Patching the code, while clever, will unfortunately earn you an "invalid" status.

Function call convention

  • C pushes arguments on stack in reverse order, so 8(%ebp) is the first argument, 12(%ebp) is the second, etc. This allows the callee to potentially have an indefinite number of arguments, like printf(). See figure on the right.
  • The stdarg.h header provides a way to iterate over an indefinite number of arguments. See print_ints.c attached. The function va_start() initializes the argument pointer ap variable (its type is va_list) to the ... part, right after the given argument; va_arg() is like *ap++, and va_end() to do clean-up. See man stdarg for more information.
  • On Linux 32-bit x86, when compiled with -O3, the compiler uses %esi for ap, and "leal 12(%ebp), %esi" to initialize ap. The increment by va_arg() is done by addl $4, %esi. See attached print_ints.s.
  • Notice how in print_ints(), the compiler passes argument to printf() by reusing existing reserved stack space.
    subl $16, %esp
    movl $.LC0, (%esp)  ; "%d "
    movl %eax, 4(%esp)  ; i
    call printf
    addl $16, %esp

Ackermann's function and recursion

  • Recursive functions we've seen so far are primitive recursion, and can be written using while loops, e.g. factorial, fibonacci. Recursion, however, is much more powerful than that.
  • Introducing Ackermann's function, ack(m, n), which can grow faster than exponential rate in terms of n, for m >= 4. See ack.c attached below. For this function, we cannot cheat by implementing a while-loop version!
  • I'm using gcc 4.4.1 (csa2 has gcc 4.1.2 by default) which demonstrate tail call optimization. Comparing the assembly generated by gcc 4.4.1 -O0 (see ack-O0.s below) and -O3 (see ack-O3.s below), notice that for -O3, there is only one call ack in the body of ack(). That's due to tail call optimization. Basically, the statement "return ack(x, y);" is transformed to setting the arguments to x and y, followed by a goto. This saves the effort to create and destroy a stack frame that is not needed.


posted Oct 17, 2009, 10:19 AM by Likai Liu   [ updated Oct 21, 2009, 10:26 AM ]

Transforming high-level structured control flow into either conditional or unconditional jump in C, in a form that is closer to how assembly language does control flow.
  • In C, we have goto for unconditional jump. The goto targets are labels, which is an identifier at the beginning of a line followed by a colon.
    The goto statement can jump below,  or jump above,
      /* do something... */
      goto label_below;
      /* this code is skipped... */
      /* continue... */
      /* do something... */
      goto label_above;
  • Conditional jump in C is simulated by an if statement with only a goto in the then-clause.
      if (cond)
        goto label;
  • Structured if's can be translated to conditional and unconditional jump like this.
    Original  Translated
      if (cond) {
        /* then-statement */
      } else {
        /* else-statement */
      /* ... */
      if (!cond) goto else_begin;
      /* then-statement */
      goto else_end;
      /* else-statement */
      /* ... */
Translating loops:
  • While loop
    Original  translates into
    while (cond) {
      /* while-body */
      if (!cond)
        goto loop_end;
      /* while-body */
      goto loop_begin;
  • For loop like this:
    for (initialize; cond; update) {
      /* for-body */
    is first translated into a while loop
    while (cond) {
      /* for-body */
    Then it can be translated into goto like above.
Examples and exercises:
  • Fibonacci, using while loop (see fib.c below) and using goto (fib-goto.c below).
  • Factorial, using while loop (see factorial.c below). Exercise.
  • Prime factorization, using for loop (see prime-factor.c below). Exercise.
(Note that their original location in my home directory has been removed.)


posted Oct 7, 2009, 7:06 PM by Likai Liu   [ updated Oct 12, 2009, 7:54 AM ]

Overview of program's perception of memory

  • Each program is loaded into an isolated, private address space. It cannot see the memory of another program. The hardware and the OS works together to enforce the address space protection.
  • The address space on a 32-bit machine ranges from 0 through 0xffffffff (232 - 1); on a 64-bit machine, it ranges from 0 through 0xffffffffffffffff (264 - 1). A number that represents memory location in the address is called a pointer.
  • At address 0, there is a reserved NULL page intentionally made inaccessible in order to catch NULL pointer reference. This is because functions can only return one value, and NULL pointer (which is numeric 0) conventionally indicates an error condition. If a programmer forgets to check for error, the program can dereference NULL pointer and crash. This is the best case scenario so you can debug. It is better than if the program keeps running, reading random garbage from memory. NULL pointer dereference causes segmentation fault (see null.c below).
    • The asterisk serves two purposes. In null.c, char *p declares variable p to be a character pointer. In an expression, *p causes the character value at pointer p to be read out, or dereferenced.
  • Use the command: objdump -h a.out and you'll find the sections in your executable. The rodata, data, and bss sections store global variables: rodata (read-only) stores string literals (what you write in the program inside double quotes) and other things; data stores initialized read-write variables; bss stores uninitialized global variables (or those initialized to 0). You can observe the change of bss and data section sizes (see section.c below).
  • Heap grows upward (not going to be observed for now), but stack grows downward (see stack.c). Local variables are placed on stack, and function call forces the stack to grow, so the char c in main() occupies a higher address than the char c in foo(). &c (unary & operator) gives you the address of the variable c.

Array, pointer arithmetic, and string

  • Summation of an integer array (see sum.c below).
    • Use size_t instead of int for the size of the array. size_t is basically an unsigned integer that has the same width as a pointer. On a 64-bit machine, using int to represent size of a memory object could overflow; size_t makes your program portable. size_t can be 0 or positive, but there cannot be negative sizes.
    • In C, pointer and array are synonymous. int *p is the same as int p[]. We can dereference pointer p either using *p or p[0]. Furthermore, p[i] is the same as *(p + i), and &p[i] is the same is (p + i).
    • *p++ is parsed as *(p++), to increment the pointer. On the other hand, (*p)++ means increment the memory location at p.
    • Prefix ++x and --x operators causes the variable x to be incremented or decremented before the statement is executed. Postfix x++ and x-- causes the same effect after the statement.
  • NUL '\0' terminator of a string (see nulterm.c below).
    • NUL character is a byte of value 0, denoted as '\0', not the character for number zero, denoted as '0'.
    • String is a character array that is NUL terminated. This is so we can pass a string without also needing to pass the size of the string.
    • If we pass a char buffer without NUL terminator, puts() will print garbage. The int x increases the likelihood that buf[] will not be immediately followed by a zero byte.
  • Computing the length of a string (see string.c below).
    • const char *s means we do not intend to modify the char pointed to by s, so the content of s is supposed to be constant.

Programming Assignment 1 Errata

posted Oct 7, 2009, 4:56 PM by Likai Liu   [ updated Oct 7, 2009, 5:36 PM ]

Some notes about programming assignment 1.
  • dlc in the datalab-handout.tar is distributed as a Linux x86 binary executable. There is no source. If you work on Mac OS X or on Windows (requires Cygwin), download the dlc.macosx or dlc.zip (contains dlc.exe) from the attachments below.
  • If you do your datalab on a 64-bit machine, you might have problem with tmax() test that says the result should be -1 (0xffffffff) which is incorrect. You have two options.
    • Use this make command: make CFLAGS=-m32
      This tells gcc to compile in 32-bit mode instead of the default 64-bit mode. If you get compilation error involving gnu/stub-32.h, you will have to install additional development packages for the 32-bit environment, e.g. on Ubuntu it's called libc6-dev-i386.
    • Change LONG_MAX to INT_MAX in tests.c. The grader will use his own copy of tests.c, but he will compile the code on csa2, which is 32-bit Linux.
    The reason is that LONG_MAX on 64-bit machine is 0x7fffffffffffffff and not 0x7fffffff, and LONG_MAX gets truncated when returned as int.
  • If you try to run dlc on bits.c, it might choke on some header files included by limits.h on your system. There should be no need to use limits.h, so you may remove the line that says #include <limits.h>.


posted Sep 23, 2009, 3:39 PM by Likai Liu

  • Review
    • Bitwise operators & | ~ treat an integer as a vector of bits.
    • Logical (boolean) operators && || ! treat non-zero integer as true, and zero integer as false.
  • Practice problem 2.13, use gdb commands as a simple command line calculator.
    • set var $x = 0x66    /* sets gdb pseudo variable $x to value 0x66 */
    • set var $y = 0x93    /* sets gdb pseudo variable $y to value 0x93 */
    • p /x $x & $y    /* prints the result of 0x66 & 0x93 in hexadecimal */
    • ....
  • Practice problem 2.14, write x == y without using == operator (see eq.c attached).
    • If we see x and y as bit sets, x & ~y is the set difference x - y, and y & ~x is the set difference y - x. If x ⊆ y, then x - y = ∅. Likewise, if y ⊆ x, then y - x = ∅. If x ⊆ y and y ⊆ x, then x = y. So the expression !(x & ~y) && !(y & ~x) works.
    • Using XOR, x ^ x = 0 and x ^ y != 0 if x != y. So the expression !(x ^ y) also works.
  • Practice problem 2.23, pass gcc flags -Wall -Wextra to tell gcc to report many common mistakes. In particular, the warning for this particular problem is -Wsign-compare.
  • Big endian vs. little endian machines (see endian.c attached).
    • int i = 0x44332211, on little endian the bytes are stored in memory as 11 22 33 44. On big endian, the bytes are stored as 44 33 22 11.
    • On both 32-bit and 64-bit machines, sizeof(int) is 4 bytes (32-bits). However, sizeof(long int) is 4 bytes (32-bits) on 32-bit machine and 8 bytes (64-bits) on 64-bit machine.
    • On little endian 64-bit machine, long int i = 0x44332211 is stored in memory as 11 22 33 44 00 00 00 00.

1-10 of 13