What the Fuzz? American Fuzzy Lop


Disclaimer: Bunny pictures used in this post are purely for cuteness, and might not be related to the actual American Fuzzy Lop after which the fuzzing application AFL is named.

Fuzzing, or fuzz testing, is an automated software testing technique that feeds invalid, unexpected, or random data to a computer program as input. That program could be a command-line tool, a web application, or a thick client with a full graphical user interface. Fuzzing is one of the most proven strategies for identifying security issues, and a large number of public CVEs and in-the-wild exploits were found with it.

However, fuzzing can be slow, tedious, and ineffective. Blind, random mutations are unlikely to get a fuzzer past a program's early input checks, so the deeper decision paths would have to be explored manually, one at a time. That introduces human error, and areas can be missed. There have been many attempts at a more efficient fuzzing method, and none of the ones I have seen come close to the American Fuzzy Lop (AFL) approach.

AFL uses a modified form of edge coverage to effortlessly pick up subtle, local-scale changes to program control flow. If you search the Internet for AFL fuzzing you will find a plethora of links and resources, so there is no need to repeat what others have already explained much better than I could. What you might not find as much of, though, is material on writing harnesses for fuzzing applications. This is what I have struggled with the most, and I hope this post helps someone else as well.

As mentioned above there are plenty of blogs on how to install AFL on a Linux operating system, but I will include some brief instructions here so that anyone currently reading this blog post will not have to leave to get those install instructions.


Before going further, I should say that the code and instructions that helped me the most are in the GitHub repository AFL-Training, which contains the materials for the "Fuzzing with AFL" workshop by Michael Macnair (@michael_macnair).  All credit goes to him.


Setting up your own machine manually

Install dependencies:

$ sudo apt-get install clang-6.0 build-essential llvm-6.0-dev gnuplot-nox

Work around some Ubuntu annoyances:

$ sudo update-alternatives --install /usr/bin/clang clang `which clang-6.0` 1
$ sudo update-alternatives --install /usr/bin/clang++ clang++ `which clang++-6.0` 1
$ sudo update-alternatives --install /usr/bin/llvm-config llvm-config `which llvm-config-6.0` 1
$ sudo update-alternatives --install /usr/bin/llvm-symbolizer llvm-symbolizer `which llvm-symbolizer-6.0` 1

Make the operating system not interfere with crash detection:

$ echo core | sudo tee /proc/sys/kernel/core_pattern

Get, build, and install AFL:

$ wget http://lcamtuf.coredump.cx/afl/releases/afl-latest.tgz
$ tar xvf afl-latest.tgz
$ cd afl-2.52b   # replace with whatever the current version is
$ make && make -C llvm_mode CXX=g++
$ sudo make install

If you have access to the project's source code, AFL makes it very easy to compile it with instrumentation, and the instructions for that are included below.  If you don't have the source, as is the case with many production binaries, keep reading to see how we can interface with those.

The vulnerable program

In this instance we have the source code, so we can let AFL add its instrumentation at compile time.  To build the binary with the instrumenting compiler:

$ cd quickstart
$ CC=afl-clang-fast AFL_HARDEN=1 make

Test it:

$ ./vulnerable
# Press enter to get usage instructions.
# Test it on one of the provided inputs:
$ ./vulnerable < inputs/u


Now let's get to fuzzing.  To fuzz the vulnerable program you will need an input directory and an output directory.  The input directory holds the corpus, which is essentially a set of inputs for a fuzz target.  In most contexts, it refers to a set of minimal test inputs that generate maximal code coverage.  The output directory holds the data AFL generates during the fuzzing process.  The actual fuzzing command is as follows:

$ afl-fuzz -i inputs -o out ./vulnerable

Your session should soon resemble this:

[Screenshot: afl-fuzz status screen for the fuzzing session]

Now that we have validated that our AFL fuzzer is set up and working correctly, let's jump into building our own test harnesses.  A harness in this context is a wrapper, a piece of code used to interface with the target application.  The harness itself is not what we are looking for vulnerabilities in; it helps AFL talk to the target by passing the fuzzer's inputs through to the code under test.  The small exercises in the AFL-Training repository are about writing a harness around a bit of code so that it can be fuzzed with AFL.  In future blog posts, we will discuss how to use crash data and potentially exploit those crashes.

Test harness basics

The library declared in library.h takes some input data and produces output.  Perhaps it fronts a black-box binary, or perhaps we simply want to fuzz it in our own way.  Either way, the question is: how can we fuzz it?

  1. The code needs to be executable - it needs to be compiled into a program.
  2. To allow AFL to work effectively, the code needs to be instrumented - so we have to compile it using one of afl-clang-fast, afl-clang, or afl-gcc.
  3. For the data generated by AFL to actually test the library, we have to write a harness that will take external input and feed it to the library. This can either be from a file specified on the command line, or directly from stdin.

A minimal stdin test harness

To meet point 1 we need a main() function that calls the library. Here's an example we'll call harness.c:

// library.h declares the functions we are attempting to fuzz
#include "library.h"
// standard headers for I/O and string handling
#include <string.h>
#include <stdio.h>

// the entry point for a C program is nearly always main
int main(void) {
	// get some input data
	char *data = "Some input data\n";
	// lib_echo is declared in "library.h"
	lib_echo(data, strlen(data));
	// lib_mul is declared in "library.h"
	printf("%d\n", lib_mul(1, 2));
	return 0;
}

We can compile this minimal program like so:

AFL_HARDEN=1 afl-clang-fast harness.c library.c -o harness

Here is where I ran into some issues with understanding this concept.  The example in the AFL Training material has both a library.c and a library.h file.  The header file declares the library's functions (their prototypes), and the C file contains their definitions, the actual code.  Our harness includes library.h so the compiler knows how to call those functions, and the calls end up in the code compiled from library.c, which is why both harness.c and library.c appear on the compile command line.  We could also declare and define the functions within our own harness if we wanted.

At this point the harness will call the library code (run ./harness to test it out), but there's no hook yet to allow the inputs generated by AFL to make it to the target function.  Try running this program under afl-fuzz with afl-fuzz -i in -o out ./harness and you will see AFL warn that nothing is happening, with the message: "(odd, check syntax!)".

So let's make our harness take input from stdin and feed it to the target function. See man 3 stdin for an overview if the concept of standard input and output is new to you.  This is extremely important because, by default, afl-fuzz delivers every test case it generates to the target over standard input; the files in the input directory are only the starting point.  That starting set is commonly known as the corpus, a set of inputs for a fuzz target.

#include <unistd.h>
#include <string.h>
#include <stdio.h>

#include "library.h"

// fixed size buffer based on assumptions about the maximum size that 
// is likely necessary to exercise all aspects of the target function
#define SIZE 50

int main(void) {
	// make sure buffer is initialized to eliminate
	// variable behaviour that isn't dependent on the input.
	char input[SIZE] = {0};

	// read at most SIZE bytes of fuzzer-supplied data from stdin
	ssize_t length;
	length = read(STDIN_FILENO, input, SIZE);

	// hand the data to the function under test
	lib_echo(input, length);
	return 0;
}

After compiling this with the instrumenting compiler, running it under afl-fuzz should give (marginally) better results - now the inputs it is sending to the program are actually having an impact on the execution flow, and it can discover inputs that lead to different paths. This is an incredibly simple program though, and the underlying printf call is not instrumented, so AFL won't find many paths. One of the paths leads to a crash, so if your harness is working you should soon see AFL report a crash. We won't discuss the crash further here, as we're focused on harness writing.

As this post is getting somewhat lengthy, I will be breaking it up and continuing in a second part.  Let's just hope that it doesn't take me another 4 months to write the second half.  Until next time, keep on fuzzing.

## Sources and Inspiration
* https://lcamtuf.coredump.cx/afl/
* https://github.com/mykter/afl-training/tree/master/harness

Ryan Villarreal
