During the Quals, 1@stPlace didn't finished BinLeet 400, much less BinLeet 500. It
is with great thanks that we present the BinLeet 500 walk-through from sk3wl0fr00t's own sk3wlm4st3r:
The provided binary was reported by file as:
reversing500: ELF 32-bit LSB executable, Intel 80386, version 1 (FreeBSD), dynamically linked (uses shared libs), stripped
Intersting strings: from "strings -a"
Congratulations. Your decrypted key is: %s
The Netwide Assembler 0.98.39
The first string hints that the binary will print the key for us.
The second string hints that some nasm generated code has been linked in
Firing up IDA on the binary, we locate "main" based on the behavior of "start".
main (0x804849c) will be the first function called after the call to __init in this case.
main quickly copies pointers to 6 functions into a stack allocated array (from the table at 0x8048b20). We
dubbed these function func_0, func_1, ... func_5 and refered to them
collectively as the stage functions.
The program assumes that argv[1] exists, and computes its length which must
be 24, it then does is own version of strdup to create a copy of argv[1]. As
a simplifying assumption, we assumed that argv[1] was made up of ascii
printable characters. We also assume that the output was ascii printable
based on the format string above.
main loops three times calling two of the stage functions per pass. The
functions are called in pairs, 0/1, 2/3, and 4/5. Each of the functions
takes a length and an array of bytes. The length is always 24 and the
array of bytes is always the copy that was made of argv[1]. The functions
all return a boolean and if any return false, the program terminates.
For understanding, we reversed the entire program back to C which paid off
later when we elected to build a brute forcer to solve the challenge. Our
source is available in
re500.c and
func4.asm. Both contain many variations
from the original kenshoto code but function essentially the same.
The basic transformation of data through the program is as follows:
argv[1] -> func_0 -> func_2 -> func_4 -> build_key -> key
^ ^ ^ ^
| | | |
FILTERS: func_1 func_3 func_5 bad_hash
None of the filter functions modify data, they simply verify that
certain conditions hold at the completion of various phases.
We studied all of the functions to understand how variations in the input
data could affect the output data. It turns out that a given output byte n
is dependent on input bytes n, n + 1, and n + 2, and to a lesser extent all
other input bytes from 0 through n-1. Dependency on n + 1 comes from func_0,
n + 2 from func_2, and 0..n-1 from func_4. The relationship between the
input bytes is fairly easy to follow through func_2 where for a byte at index
i f2(i) = (f0[i] ^ f0[(i % 24) + 1]) + f2_val[i] and
f0[i] = in[i] + in[(i % 24) + 1] + f0_val[i]. All of this makes values out of
func_2 easy to compute. The first filter in func_3 is useful here as it
requires that f2(i) must be prime when i is prime. We therefore have an
opportunity to discard many potential input candidates once we can start
computing f2(2), f2(3), f2(5) etc. This part was not difficult, but did not
significantly cut the number of valid candidates that made it through func_3.
The func_4 presented the greatest challenge as it contained two stages of
self modifying code, and used the outputs from func_2 to index into and
select transformation functions from a table of 32 possibilities. These
functions performed a wide variety of operations including float to int
conversions, trigonometric functions, and byte swapping operations. Many
of the functions took additional input in the form of the current loop
count, and many of these functions carried results forward from previous
iterations. This is why outputs from func_4 become dependent on all of
the previous outputs. As a side twist, two of the func_4 subfunctions
resulted in the execution of an illegal instruction which would cause the
program to terminate. When developing the brute force attack we had to
insert logic to prevent these functions from being called.
In order to brute force the key, we needed to be able to run values through
func_4 an get the same results as Kenshoto did. To achieve this we elected
to strip all func_4 and its related code out of the binary and link it in
to our brute forcer. Before we could extract func_4, it needed to be
deobfuscated which we did with the IDA script below:
//IDC Script to decode the func_4 stuff
//this implements the decoder loop at loc_8049D01
auto i, addr, val;
addr = 0x08049D2E;
for (i = 0; i < 377; i++) {
val = Byte(addr);
val = val ^ 0x4B;
PatchByte(addr, val);
addr++;
}
The extracted func_4 then needed to be modified to preserve its exact
behavior, even after multiple calls. The modified version can be found in
func4.asm.
Once we could push values through func_4, we could utilize the checks in
func_5 to further filter down input candidates. As before, the number
of candidates that were ascii on the input side and yielded ascii on
the output side were huge.
As a result we started working backwards from the location that the key is
displayed to the user. We learned that the output from stage 4 gets hashed
and must hash to a specific value. If the hash matches a predetermined
value, the buffer was transformed one additional time to yield the actual
key which was displayed to the user.
We could not used the hash function as a filter while growing input
candidates because the hash function required 24 bytes of input. At this
point we chose to attack the hash function which was a simple shift and
xor type hash that yielded a 4 byte value. By unrolling the hash loop
we determined that the final hash was actually dependent on only a few
bytes out ouf func_4. This is because the shifting portion of the hash
causes all of the other bytes to "fall off" the right side. The final
dependencies are shown below:
input 11
bytes 15 14
from 19 18 17
func_4 23 22 21 20
final hash: C6 68 35 9B
As can be seen, the most significant byte of the hash is determined solely
by byte f4(23), thus f4(23) was known to yield C6. Furthermore C6 subtracted
from key_bytes[23] must yeild the output byte at position 23 ('!').
Knowing f4(23) immediately yields f4(2), f4(5), f4(8), f4(11), f4(14),
f4(17), and f4(20) which in turn yielded output bytes 2, 5, 8, 11, 14, 17,
and 20. At this point, we knew the output to look something like this:
char output[] = "..t..n..e..y..i.. ..r..!";
where dots represent unknown bytes. This was enough additional information
to cut down the number of candidate input values significantly, enough to
conduct a practical brute force on the input space. Our brute force
grew to a maximum of 580 million input candidates by the time we had
grown the input to 10 characters. Fortunately growing beyond 10 started
to bring more of the filters from func_1, and func_3 in to play and the
number of candidates tailed off significantly as we continued to grow the
input. As the number of candidates started to shrink we finally reached
a point where there were few enough for us to start looking for plain
language candidates either on the input side or output side. By the time
the inputs had grown to the mid teens, it was apparent that the output
was going to be something along the lines of "rotten decryptin....." which
was obtained with input in the form D0ct3rK3NpwNzALLM1W so we figured
we were on the right track.
In the end out final solution was ugly and used 22 programs, each responsible
for growing the input by a single byte. The 22 programs could be chained
together with pipes to create a brute forcer that would solve to problem in
about 5-6 hours and looked something like this on the command line:
b0 | b1 | b2 | b3 | b4 | b4 | b6 | ..... | b20 | b21
Stage b0 required brute forcing 3 bytes of input to yeild a single byte of
output. After that each additional stage appends a single input byte to
generate an additional output byte. At stages b2, b5, b8, ... we have the
benefit of know what the output byte in those positions must be which cuts
down the number on input candidates significantly at each of those stages
and keeps the brute force space to a reasonable size.
By far the biggest challenge in the process was the extraction of func_4
which was self modifying, contained position dependent transformations, and
which did not honor the gcc requirements to preserve ebx, edi, and esi across
function calls (hence the heavily modified version of func_4 that we
incorporated into our brute_forcer). The second challenge of incorporating
func_4 into a brute forcer was that func_4 contained a subfunction that was
essentially a flip-flop toggled by a specific input value. We had to
incorporate code that ensured the flip-flop function was reset prior to each
brute force attempt that we made.
props to sk3wl member r00tbat for all of his contributions to our solution.
--sk3wlm4st3r