OverTheWire: Behemoth Writeup
I continue my journeys through OverTheWire wargames with the next challenge: Behemoth. This series has similar, simple memory corruption vulnerabilities to exploit, but this time we aren’t given source code. I also touch on some of the tools I commonly use to approach reverse engineering and exploit development for CTF-like challenges.
Tools
I thought I’d start this blog post off with a quick overview of some of the tools I find useful for solving these types of challenges. You’ll likely see these tools referenced in my solutions but I wanted to call them out specifically, mention some potential alternatives, and explain my reasons for choosing them for those of you that may be less familiar with all the options.
Built-In Linux Tools
Most linux distros come with a lot of built in tools that are pretty useful. These tools aren’t likely to solve any challenges for you on their own, but they’re useful for learning about the binary your looking at and debugging your exploits when they don’t work as expected.
- strings - this command line utility searches a given file for printable
text and prints them to stdout. There aren’t really any alternatives to a
good
strings
search and it can be surprisingly useful. As such, it’s one of the first things I run on a new binary/file/challenge. - strace - prints dynamic system calls and their arguments while a program is running. Useful for getting a high level perspective on what a program is doing and for debugging hand written shellcode.
- ltrace - like
strace
, but for library (function) calls. This gives you an even higher level view of a programs execution and seeing what arguments are being passed to a particular function can be incredibly useful.
Disassembler
A disassembler (and in some cases a decompiler) is usually a reverse engineer’s most useful tool and where they’ll spend most of their time examining a binary. At a minimum, a disassembler will turn the binary machine code back into readable assembly so you can start to make sense of what the program does. Higher quality disassemblers provide a GUI, break the binary into basic blocks, allow you to annotate and save your work, support directly editing a binary, and potentially much more.
I personally use BinaryNinja - a relatively new reverse engineering platform that provides a solid GUI, editing tools, a scripting api, and much more. It’s a commercial product, at $100 for a personal license, but there is a demo version you can try.
If you’re not interested in paying for a license, objdump
, a command line
utility for inspecting and disassembling binaries, can be just as effective in
some cases - I used objdump
and gdb
for disassembly until relatively
recently. On the expensive end, another alternative to BinaryNinja is IDA,
widely considered the industry standard. At $2350 a license, you get access to
their Hex Rays decompiler which will convert disassembled functions into
notional C code.
Debugger
Developing exploits for these challenges is a whole lot easier in an
environment you control, like a debugger. On linux, nothing beats the good old
GNU debugger, or gdb
. However, there are some extensions to gdb
that make
exploit development much easier. I’m personally a fan of pwndbg - at every
breakpoint it prints register values, dereferencing if necessary (eg strings),
disassembly near the current instruction, a small window around the current
stack pointer, and the current backtrace. It also has a bunch of neat exploit
assistance features like heap inspection, ROP gadget search, memory space
search, and even IDA integration.
PEDA, and Voltron are similar alternatives, but neither is as fully featured or as mature as pwndbg.
Other Tools
- pwntools - a library to assist exploit development in python. I would actually suggest newbies stay away from pwntools - it handles a lot of things for you that are worth struggling with if you haven’t done them before.
Solutions
Warning: below are my solutions to the Behemoth wargame. If you’re interested in solving it yourself DO NOT READ THIS. Passwords have been redacted with XXXXXXXXXX.
Challenges
Level0
SSH to behemoth0@behemoth.labs.overthewire.org
with password behemoth0
to
get started.
Level0 → Level1
Opening the first challenge up in BinaryNinja, we can quickly get the idea
that this is a simple password challenge - there’s a prompt for “Password:”,
a call to scanf
and a strcmp
to a fixed value. If the strings are the same,
we get a shell, otherwise “Access denied…” A quick check of man memfrob
indicates that memfrob
is used to encrypt a memory region - so our password
string is probably encrypted and modified at runtime. Rather than try and
decrypt it by hand, let’s pop it open in pwndbg and break on the call to
strcmp
to see what args are being passed.
If we take a look at the stack (this is where function arguments are stored per
the calling convention for x86), we can see the plaintext password as it is
passed to strcmp
. Now we can grab the password for the next level.
Level1 → Level2
This level looks similar to the previous one on the surface, but this time
there’s no password check. Instead, there’s an unbounded write into a
fixed-size buffer that we can overflow. Let’s figure out how many bytes we need
to get control of eip
. After some trial and error:
Great, now let’s generate some shellcode and drop it in an environment variable
with a nice big nop
sled in front of it (this is pretty similar to a Narnia
challenge I solved
previously).
Now we need to find where our shellcode is in memory, so we’ll open it up in
gdb
and look for our EGG
environment variable.
There it is, so picking an address right in the middle of our nop
sled, we
can write an exploit script.
We have to remember to keep stdin
open since the binary will quit as soon as
it is closed and kill our shell. Adding a simple cat
with no arguments will
do the trick.
Level2 → Level3
This program gets the current process id with getpid
and then calls touch
on that pid, via system
. However, the path to the binary is not specified
absolutely, so we can manipulate the PATH environment variable to make touch
do whatever we want. Unfortunately, we can’t just write a bash script for this
because bash drops effective privileges; we’ll have to write it in C.
Level3 → Level4
Similar to the last ‘password’ one, we’re asked for some input value, but no
check is performed and the program simply exits after printing our input
string. Fortunately for us, the program passes our input directly to printf
,
so it’s likely a formatstring vulnerability.
Looks like by entering special formatting characters, we can print values off
the stack (for more on format string vulnerabilities check out OWASPs wiki
page). This gives us the
ability to write to an arbitrary location, but what should we write? If we look
back at the disassembly, we can see that the program calls puts
after the
vulnerable printf
- this is perfect candidate for us to overwrite in the
global offset table (GOT) with the address of some shellcode. Let’s start by
overwriting the GOT entry with a dummy value.
After some trial and error, we know that we can find the beginning of our
format string after 6 reads. Let’s check BinaryNinja again for the address of
puts
in the GOT.
Great, so now we can start to write our exploit script. I used the same process
as the last challenge to generate a shellcode environment variable EGG
and
find its address, so I won’t repeat it here. The general outline of the exploit
is as follows.
We’re using short writes here (that’s the %hn
), so we need to write each half
of the address in the GOT in a sepirate write. We start with the address of the
first write, followed by a word of junk, and then the address of the second
write. Using the offset we found earlier and two %hn
s we can start trying
some values for the sizes of reads in each %x
preceding the %hn
and check
in a debugger if we’ve written the address of our shellcode EGG
(0xffffd696
in my case) to the GOT, adjusting accordingly. After some trial and error, I
came up with the following.
Level4 → Level5
This program gets the current process id, then checks for the existence of a
file named /tmp/{pid}
.
If it exists, it reads it out a single character at a time and prints it to
stdout. My first thought was to create a symbolic link to
/etc/behemoth_pass/behemoth5
at /tmp/{pid}
for it to read, but I’d have to
know the pid of the behemoth4
process before it tried to open the file. It
would probably be possible to predict which pid would be allocated to the newly
spawned behemoth4
process, but I opted for a more precise solution - suspend
the process on spawn, giving me plenty of time to create the symbolic link,
then resume it. Here’s a simple bash script to start a processes suspended.
This script prints the current pid ($$
in bash), sends a SIGSTOP
to itself,
and then exec
s the program provided as an argument when resumed.
Level5 → Level6
At first glance, this program looks like it’s going to be a bit more difficult
to reverse engineer. However, just looking at the flow of api calls:
fopen("/etc/behemoth_pass/behemoth6")
, fgets(...)
,
gethostbyname("localhost")
, socket(...)
, atoi("1337")
, sendto(...)
we
can guess that this probably just stupidly sends the password for the next
level to localhost:1337
.
Level6 → Level7
This program calls another program /behemoth/behemoth6_reader
, reading the
output string and comparing that string to a fixed value: HelloKitty
. If the
value is correct, it starts a shell. Opening up behemoth6_reader
in
BinaryNinja, we can see reads the contents of a file into memory:
And then simply executes it as code.
I chose to use pwntools to generate shellcode that prints the string
HelloKitty
and then exits.
Level7 → Level8
Initially opening this program in BinaryNinja, it looks like it zeroes environment variables at the beginning of main. This means we won’t be able to store our shellcode in an environment variable as we did previously.
Main then checks the contents of the provided argument and, if it contains any non-printable characters, it quits. So we won’t be able to store our naive shellcode in the buffer itself.
Finally, there’s a vulnerable strcpy
if we pass the previous check.
Luckily, there’s a bug in the printable check - it only checks the first 511
(0x1ff) bytes of the provided buffer, so we can put our shellcode anywhere in
the buffer after the first 511 bytes, in the space that we overflow. After some
experimentation, we get control of eip
after 536 bytes.
Now that we have control, we can begin to write our exploit script.
We can use gdb
to find the address of our buffer on the stack by breaking on
the strcpy
.
We can see that the program did in fact zero out the environment variables,
which would have appeared right above the program image name on the stack. We
can also see our nop
sled and shellcode at the bottom of the argument buffer;
let’s pick an address somewhere in the middle and write our final exploit
script.
And that’s the final challenge for Behemoth.