Binary Leetness Walkthrough (DefCon CTF 2007 Qualifiers)

This binary does a handful of mmap calls, copies itself into the mmapped locations, jumps to them in order, and uses the ud2 opcode to generate an undefined opcode exception. Running it under gdb, we can't help but notice that all four bytes of %eip are in the ascii range. The "bt" command in gdb doesn't give us a valid backtrace, but we can do it manually by inspecting %ebp...

(gdb) run
Starting program: ./reversing200-f742f0774c1ff2a3ec5a0f4b2c9d9019 
5...4...3...2...1...
The key is ready!

Program received signal SIGILL, Illegal instruction.
0x54686520 in ?? ()
(gdb) x/2x $ebp
0xbff7fc80:     0xbff7fca0      0x6b657920
(gdb) set var $ebp = 0xbff7fca0
(gdb) x/2x $ebp
0xbff7fca0:     0xbff7fcc0      0x69733a22
(gdb) set var $ebp = 0xbff7fcc0
(gdb) x/2x $ebp
0xbff7fcc0:     0xbff7fce0      0x73617665
(gdb) set var $ebp = 0xbff7fce0
(gdb) x/2x $ebp
0xbff7fce0:     0xbff7fd00      0x64206569
(gdb) set var $ebp = 0xbff7fd00
(gdb) x/2x $ebp
0xbff7fd00:     0xbff7fd20      0x702c2066
(gdb) set var $ebp = 0xbff7fd20
(gdb) x/2x $ebp
0xbff7fd20:     0xbff7fd28      0x74772122
(gdb)

By taking the %eip from each stack frame that we've walked back and printing them out as an ascii string, we end up with the key.

$ perl -e 'print unpack("Z*", pack("N", hex($_))) for @ARGV' 0x54686520 0x6b657920 0x69733a22 0x73617665 0x64206569 0x702c2066 0x74772122
The key is:"saved eip, ftw!"

We can also using gdb's internal scripting language to manually walk up the stack and print out %eip as ascii:

define print_ascii_eip_frames
  set $a = (long)$eip
  set $b = $ebp
  set $i = 0
  while ($i < ($arg0))
    printf "%c%c%c%c", $a>>24, ($a>>16) & 0xff, ($a>>8) & 0xff, $a & 0xff
    set $a = *((long *)($b+4))
    set $b = *((long *)$b)
    set $i++
  end
  printf "\n"
end

Which give us...

(gdb) run
Starting program: /home/underminers/ctf07/reversing200-f742f0774c1ff2a3ec5a0f4b2c9d9019 
5...4...3...2...1...
The key is ready!

Program received signal SIGILL, Illegal instruction.
0x54686520 in ?? ()
(gdb) print_ascii_eip_frames 7
The key is:"saved eip, ftw!"
(gdb)

The provided binary was reported by file as:

reversing500: ELF 32-bit LSB executable, Intel 80386, version 1 (FreeBSD), dynamically linked (uses shared libs), stripped

Intersting strings: from "strings -a"

   Congratulations. Your decrypted key is: %s
   The Netwide Assembler 0.98.39

The first string hints that the binary will print the key for us. The second string hints that some nasm generated code has been linked in

Firing up IDA on the binary, we locate "main" based on the behavior of "start". main (0x804849c) will be the first function called after the call to __init in this case.

main quickly copies pointers to 6 functions into a stack allocated array (from the table at 0x8048b20). We dubbed these function func_0, func_1, ... func_5 and refered to them collectively as the stage functions.

The program assumes that argv[1] exists, and computes its length which must be 24, it then does is own version of strdup to create a copy of argv[1]. As a simplifying assumption, we assumed that argv[1] was made up of ascii printable characters. We also assume that the output was ascii printable based on the format string above.

main loops three times calling two of the stage functions per pass. The functions are called in pairs, 0/1, 2/3, and 4/5. Each of the functions takes a length and an array of bytes. The length is always 24 and the array of bytes is always the copy that was made of argv[1]. The functions all return a boolean and if any return false, the program terminates.

For understanding, we reversed the entire program back to C which paid off later when we elected to build a brute forcer to solve the challenge. Our source is available in re500.c and func4.asm. Both contain many variations from the original kenshoto code but function essentially the same.

The basic transformation of data through the program is as follows:

     argv[1] -> func_0 -> func_2 -> func_4 -> build_key -> key
                       ^         ^         ^       ^
                       |         |         |       |
FILTERS:             func_1    func_3    func_5  bad_hash

None of the filter functions modify data, they simply verify that certain conditions hold at the completion of various phases.

We studied all of the functions to understand how variations in the input data could affect the output data. It turns out that a given output byte n is dependent on input bytes n, n + 1, and n + 2, and to a lesser extent all other input bytes from 0 through n-1. Dependency on n + 1 comes from func_0, n + 2 from func_2, and 0..n-1 from func_4. The relationship between the input bytes is fairly easy to follow through func_2 where for a byte at index i f2(i) = (f0[i] ^ f0[(i % 24) + 1]) + f2_val[i] and f0[i] = in[i] + in[(i % 24) + 1] + f0_val[i]. All of this makes values out of func_2 easy to compute. The first filter in func_3 is useful here as it requires that f2(i) must be prime when i is prime. We therefore have an opportunity to discard many potential input candidates once we can start computing f2(2), f2(3), f2(5) etc. This part was not difficult, but did not significantly cut the number of valid candidates that made it through func_3.

The func_4 presented the greatest challenge as it contained two stages of self modifying code, and used the outputs from func_2 to index into and select transformation functions from a table of 32 possibilities. These functions performed a wide variety of operations including float to int conversions, trigonometric functions, and byte swapping operations. Many of the functions took additional input in the form of the current loop count, and many of these functions carried results forward from previous iterations. This is why outputs from func_4 become dependent on all of the previous outputs. As a side twist, two of the func_4 subfunctions resulted in the execution of an illegal instruction which would cause the program to terminate. When developing the brute force attack we had to insert logic to prevent these functions from being called.

In order to brute force the key, we needed to be able to run values through func_4 an get the same results as Kenshoto did. To achieve this we elected to strip all func_4 and its related code out of the binary and link it in to our brute forcer. Before we could extract func_4, it needed to be deobfuscated which we did with the IDA script below:

//IDC Script to decode the func_4 stuff
//this implements the decoder loop at loc_8049D01
auto i, addr, val;
addr = 0x08049D2E;
for (i = 0; i < 377; i++) {
   val = Byte(addr);
   val = val ^ 0x4B;
   PatchByte(addr, val);
   addr++;
}

The extracted func_4 then needed to be modified to preserve its exact behavior, even after multiple calls. The modified version can be found in func4.asm.

Once we could push values through func_4, we could utilize the checks in func_5 to further filter down input candidates. As before, the number of candidates that were ascii on the input side and yielded ascii on the output side were huge.

As a result we started working backwards from the location that the key is displayed to the user. We learned that the output from stage 4 gets hashed and must hash to a specific value. If the hash matches a predetermined value, the buffer was transformed one additional time to yield the actual key which was displayed to the user.

We could not used the hash function as a filter while growing input candidates because the hash function required 24 bytes of input. At this point we chose to attack the hash function which was a simple shift and xor type hash that yielded a 4 byte value. By unrolling the hash loop we determined that the final hash was actually dependent on only a few bytes out ouf func_4. This is because the shifting portion of the hash causes all of the other bytes to "fall off" the right side. The final dependencies are shown below:

input                   11
bytes               15  14
from            19  18  17
func_4      23  22  21  20
            
final hash: C6  68  35  9B

As can be seen, the most significant byte of the hash is determined solely by byte f4(23), thus f4(23) was known to yield C6. Furthermore C6 subtracted from key_bytes[23] must yeild the output byte at position 23 ('!'). Knowing f4(23) immediately yields f4(2), f4(5), f4(8), f4(11), f4(14), f4(17), and f4(20) which in turn yielded output bytes 2, 5, 8, 11, 14, 17, and 20. At this point, we knew the output to look something like this:

char output[] = "..t..n..e..y..i.. ..r..!";

where dots represent unknown bytes. This was enough additional information to cut down the number of candidate input values significantly, enough to conduct a practical brute force on the input space. Our brute force grew to a maximum of 580 million input candidates by the time we had grown the input to 10 characters. Fortunately growing beyond 10 started to bring more of the filters from func_1, and func_3 in to play and the number of candidates tailed off significantly as we continued to grow the input. As the number of candidates started to shrink we finally reached a point where there were few enough for us to start looking for plain language candidates either on the input side or output side. By the time the inputs had grown to the mid teens, it was apparent that the output was going to be something along the lines of "rotten decryptin....." which was obtained with input in the form D0ct3rK3NpwNzALLM1W so we figured we were on the right track.

In the end out final solution was ugly and used 22 programs, each responsible for growing the input by a single byte. The 22 programs could be chained together with pipes to create a brute forcer that would solve to problem in about 5-6 hours and looked something like this on the command line:

b0 | b1 | b2 | b3 | b4 | b4 | b6 | ..... | b20 | b21

Stage b0 required brute forcing 3 bytes of input to yeild a single byte of output. After that each additional stage appends a single input byte to generate an additional output byte. At stages b2, b5, b8, ... we have the benefit of know what the output byte in those positions must be which cuts down the number on input candidates significantly at each of those stages and keeps the brute force space to a reasonable size.

By far the biggest challenge in the process was the extraction of func_4 which was self modifying, contained position dependent transformations, and which did not honor the gcc requirements to preserve ebx, edi, and esi across function calls (hence the heavily modified version of func_4 that we incorporated into our brute_forcer). The second challenge of incorporating func_4 into a brute forcer was that func_4 contained a subfunction that was essentially a flip-flop toggled by a specific input value. We had to incorporate code that ensured the flip-flop function was reset prior to each brute force attempt that we made.

props to sk3wl member r00tbat for all of his contributions to our solution.

--sk3wlm4st3r