Damn. I just thought up another piece of software (that I’m capable of writing) that I can’t find. This is bad; it means it’s going to haunt me until I code it.
So, frequently, I’m faced with streams of bytes of unknown origin or purpose (for example, the .TiVo file format, RTMP streams, and most recently, Outlook “NK2” address autocompletion cache files). I’ve had some experience finding patterns in them, but it’s always so time-consuming. Usually I’m compiling some little C program over and over, slowly tweaking some guessed-at structure. That’s basically the advice I got from Andrew Tridgell when I asked how he went about reverse engineering protocols, though his methods deal more with sending and receiving, so they’re much more interactive. Most of what I’ve mucked with are just unknown file formats.
What I want is a nice GUI tool that will let me describe a data file’s contents in a small specification language. I can see lots of meta-specifications like “repeat this structure until EOF” and “if byte 5 is 1, read X bytes; otherwise, read X+50 bytes”. Most data formats have pretty simple layouts once you figure them out. As you build up the structure for the data to fit into, you’d see the data from your example file displayed live. That way you can quickly tweak lengths, offsets, encoding types, endianness, etc., without needing to recompile your test harness every time.
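To make the idea a bit more concrete, here is a rough sketch of what a declarative description might look like in plain Python, with no GUI at all. Everything in it is hypothetical: the `RECORD` layout, the field names, and the `parse_record` / `parse_until_eof` helpers are made up for illustration and do not describe any real file format. A field is either a `struct` format code or a function that computes a byte count from the fields already read, which covers the “read X bytes, otherwise X+50” case.

```python
import struct

# Hypothetical record layout, invented for illustration only.
# Each entry is (name, spec): spec is either a struct format code,
# or a callable that returns a byte length based on earlier fields.
RECORD = [
    ("kind", "B"),      # 1-byte unsigned type tag
    ("length", "H"),    # 2-byte unsigned length
    # "if kind is 1, read length bytes; otherwise read length + 50 bytes"
    ("payload", lambda v: v["length"] if v["kind"] == 1 else v["length"] + 50),
]


def parse_record(data, offset, fields, endian="<"):
    """Decode one record starting at offset; return (values, new_offset)."""
    values = {}
    for name, spec in fields:
        if callable(spec):
            # Variable-length raw bytes; the length depends on earlier fields.
            length = spec(values)
            chunk = data[offset:offset + length]
            if len(chunk) != length:
                raise ValueError(f"truncated field {name!r} at offset {offset}")
            values[name] = chunk
            offset += length
        else:
            # Fixed-size field decoded with the chosen byte order.
            code = endian + spec
            (values[name],) = struct.unpack_from(code, data, offset)
            offset += struct.calcsize(code)
    return values, offset


def parse_until_eof(data, fields, endian="<"):
    """'Repeat this structure until EOF': decode records until data runs out."""
    records = []
    offset = 0
    while offset < len(data):
        record, offset = parse_record(data, offset, fields, endian)
        records.append(record)
    return records


if __name__ == "__main__":
    # Made-up sample bytes: one kind-1 record and one kind-2 record.
    sample = (bytes([1]) + struct.pack("<H", 3) + b"abc"
              + bytes([2]) + struct.pack("<H", 1) + b"x" * 51)
    for rec in parse_until_eof(sample, RECORD):
        print(rec)
```

Because the layout is just data, changing a guess (say, trying `endian=">"` or widening `length` from `"H"` to `"I"`) is a one-line edit followed by a re-run, rather than a recompile. A GUI would presumably wrap the same loop and redraw on every change.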
Hell, it could even spit out the C code to process it, too. :)
I’m thinking about using Gtk and Python. We’ll see how rapid that path is for developing a nice GUI. I’ve heard good things. :)
© 2005, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 License.