Date: 29 September 2010
What is Return-Oriented Programming?
Return-Oriented Programming enables an attacker to use non-malicious code maliciously by combining short snippets of benign code already present in the system.
I first heard about how Return-Oriented Programming works back on the 28th of August 2009, in relation to a new attack on electronic voting machines. After that I did a bit of reading, because I found the idea of using an existing program's own code against itself an interesting one. In light of the recent Adobe Reader vulnerability, where malware was found using Return-Oriented Programming, I thought it would be good to provide an overview of what Return-Oriented Programming actually is. If you are looking for something more in-depth, there is a good presentation from BlackHat and also two good PDF papers about Return-Oriented Programming.
Getting Misquoted - A Small Example
The idea behind Return-Oriented Programming is to use an existing program's code, but to alter the flow of execution to perform a different task. Typically an attacker would attempt to execute malicious actions by using certain parts of a benign program's code.
The best example of this that I can give is where a journalist deliberately reports information out of context and changes the meaning of a statement. Let's say that Tom has just found a cure for cancer and distributed the following media release to the press:
Cure for Cancer Media Release
After 10 years of research, my team has developed a cure for most forms of cancer. We have decided to donate this research to the world by making it freely available. Because of this research and the funding we were provided the whole world will benefit. We hope within the next few months many others will sign up for a trial, as I have.
I would love to be able to say that I alone am responsible for the cure for cancer. The truth, however, is that the team of people I have working with me are the ones that have done all of the work. They are the people that took us in the right direction with almost no guidance from myself. They are the people who worked week after week in the lab. They are the ones who should receive the public's thanks.
The next day the following misquoted article appears in the newspaper (or should that be eReader?):
Cure for Cancer News Article
If you are wondering whether Tom had any help in finding the cure, this is what he had to say: "I alone am responsible for the cure for cancer. The whole world will benefit as I have done all of the work myself." - Tom
If you look closely, every one of these words and phrases does appear in the media release. This is one of the ideas behind Return-Oriented Programming: using small parts of a program in a different order to achieve a different result.
A Little Bit About Programming
Now we have to take a small step back to look at what makes up a simple computer program. At its most basic level a program is made up of lots of small, simple instructions. Instructions can be things like adding two numbers, reading or writing a piece of memory, or moving some data from one location to another. These small instructions can then be built up into longer sets of instructions that accomplish a more complex task, such as sorting a set of numbers or finding their average. Programmers often want to repeat these more complex tasks again and again. To do this they could either write the same code again and again and again, or they could use a subroutine. A subroutine is a section of code that is packaged together to perform a specific complex task. A programmer can then call that subroutine; the subroutine executes and, when finished, returns control to the point it was called from. The diagram below illustrates the difference between code without and with a subroutine.
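To make the call-and-return idea concrete, here is a toy simulation. It is only a sketch: the instruction set (PRINT, CALL, RET, HALT) is made up for illustration and is not real machine code. CALL remembers where to come back to by pushing an address onto a stack, and RET pops that address to resume the caller, so the subroutine is written once but used twice.

```python
def run(program, start=0):
    """Run a toy program; return the values produced by PRINT instructions."""
    stack = []       # return addresses pushed by CALL
    output = []      # everything PRINT produced, in order
    pc = start       # program counter: index of the next instruction
    while True:
        op, *args = program[pc]
        if op == "PRINT":
            output.append(args[0])
            pc += 1
        elif op == "CALL":
            stack.append(pc + 1)   # remember the instruction after the CALL
            pc = args[0]           # jump to the subroutine
        elif op == "RET":
            pc = stack.pop()       # go back to where the subroutine was called from
        elif op == "HALT":
            return output
        else:
            raise ValueError(f"unknown instruction {op!r}")


program = [
    # main program
    ("PRINT", "start"),   # 0
    ("CALL", 5),          # 1: call the subroutine at index 5
    ("CALL", 5),          # 2: call it again, no copy-and-paste needed
    ("PRINT", "end"),     # 3
    ("HALT",),            # 4
    # subroutine, written once
    ("PRINT", "sub:1"),   # 5
    ("PRINT", "sub:2"),   # 6
    ("RET",),             # 7
]

print(run(program))  # ['start', 'sub:1', 'sub:2', 'sub:1', 'sub:2', 'end']
```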
Making a Mashup
Subroutines don't strictly have to start from the beginning. That may seem counterintuitive, but a subroutine is just a list of instructions with a return statement at the end; as such, you could start executing the instructions from anywhere in that list. No matter where you start in the subroutine, the code will then continue until it reaches the return statement.
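A minimal sketch of jumping into the middle of a subroutine, assuming a made-up subroutine whose instructions are just names in a list: wherever execution enters, it runs forward until it reaches the return.

```python
# Hypothetical subroutine: named instructions that end in a return.
subroutine = ["load x", "add 1", "store x", "pop r1", "RET"]


def execute_from(code, entry):
    """Execute `code` starting at index `entry`, stopping after the first RET."""
    executed = []
    pc = entry
    while True:
        executed.append(code[pc])
        if code[pc] == "RET":
            return executed
        pc += 1


print(execute_from(subroutine, 0))  # ['load x', 'add 1', 'store x', 'pop r1', 'RET']
print(execute_from(subroutine, 3))  # ['pop r1', 'RET']
```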
This means that the last few instructions before the return statement of each subroutine can be cobbled together to create a mashup program. Each of these short instruction sequences ending in a return is called a "gadget". Given a large enough body of existing code to pick gadgets from (a standard system library, for example), the mashup can often perform essentially anything a normal program can do, including malicious actions.
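Finding gadgets amounts to looking backwards from every return. The toy gadget finder below works on a list of made-up instruction names; real tools search the raw machine-code bytes of a program instead, so treat this purely as an illustration of the idea. It stops stepping backwards when it hits an earlier return, because execution would return there instead of reaching the later one.

```python
def find_gadgets(code, max_len=2):
    """Map start index -> gadget, for every run of 1..max_len instructions
    that ends in a RET without passing through an earlier RET."""
    gadgets = {}
    for ret_index, instr in enumerate(code):
        if instr != "RET":
            continue
        # Step backwards from the RET, taking one more instruction each time.
        for k in range(1, max_len + 1):
            start = ret_index - k
            if start < 0 or code[start] == "RET":
                break  # ran off the start, or hit the end of the previous gadget
            gadgets[start] = code[start:ret_index + 1]
    return gadgets


code = ["load x", "add 1", "pop r1", "RET", "mul r1, 2", "RET"]
for start, gadget in sorted(find_gadgets(code).items()):
    print(start, gadget)
# 1 ['add 1', 'pop r1', 'RET']
# 2 ['pop r1', 'RET']
# 4 ['mul r1, 2', 'RET']
```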
This form of programming allows an attacker to supply a set of existing code locations to execute, rather than injecting new (malicious) code. This is important because many systems try to prevent externally supplied data from being executed as code (e.g. the No eXecute (NX) bit and Data Execution Prevention (DEP)). Return-Oriented Programming gets around these protections because the data provided by the attacker is never actually executed. The attacker simply provides a list of entry points, or addresses, into subroutines; it is the code already in those subroutines that gets executed.
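Here is a rough sketch of why NX/DEP does not stop this, using a hypothetical memory map where each address holds an instruction name and an "executable" flag. The memory layout, addresses and instruction names are all invented for illustration. Jumping into the attacker's own bytes faults, while an attacker-supplied address that points into the program's existing code runs normally.

```python
class NXFault(Exception):
    """Raised when the toy CPU is asked to execute a non-executable address."""


# Hypothetical memory map: address -> (instruction, executable?)
memory = {
    0x100: ("pop r0", True),   # part of the program's own code
    0x101: ("RET", True),
    0x800: ("evil", False),    # attacker-supplied bytes sitting in stack data
}


def run_until_ret(addr):
    """Execute from `addr` up to and including the next RET, enforcing NX."""
    executed = []
    while True:
        instr, executable = memory[addr]
        if not executable:
            raise NXFault(f"refusing to execute data at {addr:#x}")
        executed.append(instr)
        if instr == "RET":
            return executed
        addr += 1


# Code injection: jump into the attacker's own bytes. The NX check stops it.
try:
    run_until_ret(0x800)
except NXFault as err:
    print(err)  # refusing to execute data at 0x800

# Return-oriented: the attacker's data is only an address, read as data.
attacker_stack = [0x100]
print(run_until_ret(attacker_stack[0]))  # ['pop r0', 'RET']
```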
The Return in Return-Oriented Programming
The return is important because it lets the attacker keep control of the flow of this new mashup program. The attacker is only really able to supply a list of addresses to execute, or jump to. If a code snippet (gadget) did not end in a return statement, the flow of the program would not move on to the next bit of code the attacker wants to run; instead it would continue running the normal program code.
The diagram below shows how all these gadgets can get cobbled together. Note that the return statements (arrows) at the end of each subroutine allow the next gadget in the attacker's list to execute.
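The following toy simulation sketches a chain of gadgets driven purely by a list of words on the stack. The CPU, its instructions and the gadget addresses are all hypothetical. Each gadget ends in RET, which takes the next gadget's address off the stack; "POP" gadgets also consume a data word, which is how an attacker feeds in constants. The chain below computes (2 + 3) * 4 without supplying a single new instruction.

```python
def run_chain(code, stack):
    """Run a return-oriented chain on a toy CPU and return its registers.

    `code` maps address -> instruction. `stack` holds the attacker's words,
    with stack[0] on top. Execution begins as if the vulnerable subroutine
    had just executed RET, so the first word is the first gadget's address.
    """
    regs = {"r0": 0, "r1": 0}
    sp = 0

    def pop():
        nonlocal sp
        word = stack[sp]
        sp += 1
        return word

    pc = pop()                      # the hijacked RET lands on the first gadget
    while True:
        op, *args = code[pc]
        if op == "POP":             # copy the next stack word into a register
            regs[args[0]] = pop()
            pc += 1
        elif op == "ADD":
            regs[args[0]] += regs[args[1]]
            pc += 1
        elif op == "MUL":
            regs[args[0]] *= regs[args[1]]
            pc += 1
        elif op == "RET":           # next gadget address comes off the stack
            pc = pop()
        elif op == "HALT":          # toy stop condition, not a real gadget
            return regs
        else:
            raise ValueError(f"unknown instruction {op!r}")


# Gadgets: the tails of four unrelated (hypothetical) subroutines, plus a stop.
code = {
    0x10: ("POP", "r0"), 0x11: ("RET",),           # pop r0; ret
    0x20: ("POP", "r1"), 0x21: ("RET",),           # pop r1; ret
    0x30: ("ADD", "r0", "r1"), 0x31: ("RET",),     # add r0, r1; ret
    0x40: ("MUL", "r0", "r1"), 0x41: ("RET",),     # mul r0, r1; ret
    0x50: ("HALT",),
}

# The attacker's payload: gadget addresses interleaved with data for POP.
payload = [0x10, 2, 0x20, 3, 0x30, 0x20, 4, 0x40, 0x50]

print(run_chain(code, payload))  # {'r0': 20, 'r1': 4}
```

Note that the same "pop r1; ret" gadget is reused twice just by listing its address twice, much like a real attacker would reuse a handy gadget.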
Jumping back for a minute to our journalism example, you will see that the quote is stitched together from separate parts of the release. Each part comes from the end of a sentence, where the full stop represents a return statement. Each sentence can be thought of as a subroutine. The misquoting journalist can jump into each sentence at any point, but once there must continue until the full stop. Upon reaching the full stop, another sentence may, once again, be chosen to jump into. Look at the quote again:
Cure for Cancer News Article
If you are wondering whether Tom had any help in finding the cure, this is what he had to say: "I alone am responsible for the cure for cancer. the whole world will benefit. as I have. done all of the work. myself." - Tom
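The journalist's trick can be sketched in a few lines, treating each word of the release as a memory location and each full stop as a return. The helper names are invented for this illustration. The journalist's "attack" is nothing more than a list of starting positions (addresses); everything that ends up in the quote comes from Tom's own text.

```python
release = (
    "After 10 years of research, my team has developed a cure for most forms of "
    "cancer. We have decided to donate this research to the world by making it "
    "freely available. Because of this research and the funding we were provided "
    "the whole world will benefit. We hope within the next few months many others "
    "will sign up for a trial, as I have. I would love to be able to say that I "
    "alone am responsible for the cure for cancer. The truth, however, is that the "
    "team of people I have working with me are the ones that have done all of the "
    "work. They are the people that took us in the right direction with almost no "
    "guidance from myself. They are the people who worked week after week in the "
    "lab. They are the ones who should receive the public's thanks."
)
memory = release.split()  # each word gets an "address" (its index)


def address_of(phrase):
    """Index of the first word where `phrase` appears: the journalist's 'gadget address'."""
    words = phrase.split()
    for i in range(len(memory) - len(words) + 1):
        if memory[i:i + len(words)] == words:
            return i
    raise ValueError(f"phrase not found: {phrase!r}")


def run_from(address):
    """'Execute' words from `address` until a full stop, which acts as the return."""
    out = []
    while True:
        word = memory[address]
        out.append(word)
        if word.endswith("."):
            return out
        address += 1


chain = [address_of(p) for p in ["I alone", "the whole world", "as I", "done all", "myself."]]
quote = " ".join(word for address in chain for word in run_from(address))
print(quote)
# I alone am responsible for the cure for cancer. the whole world will benefit. as I have. done all of the work. myself.
```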
Getting Into the Flow of Things
Now we come to one final tricky question: how does the attacker get control of what I have called the "program flow"? There is no simple answer to that question, but it normally starts with some sort of program error. One of the most common is called a buffer overflow, where a program writes more data into an area of memory than was set aside for it. This gives the attacker the ability to overwrite parts of memory that control the flow of the program, normally the stack.
The stack is an area of memory used to store small pieces of information that can be used or referenced later. One of the main types of information stored on the stack is the address of the instruction a program should return to after a subroutine finishes. As you can probably guess, this makes the ability to control the contents of the stack very useful for Return-Oriented Programming.
When a subroutine that has been called finishes with a return instruction, the computer automatically removes (pops) the item at the top of the stack and uses it as the address of the next instruction to execute, which in this case is the starting point of the next gadget. This means that if you can write the addresses of the gadgets you want to execute, one after the other, onto the stack (using a buffer overflow, for example), you can make existing code do more or less anything you want.
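Below is a much-simplified sketch of how an overflow plants a chain. It assumes a made-up stack frame holding only a 4-word local buffer followed by the saved return address, listed from low to high addresses (real frames also hold things like a saved frame pointer and possibly a stack canary). Because the copy has no bounds check, input longer than the buffer overwrites the saved return address and keeps going, so the words the CPU will later consume as return addresses are exactly the ones the attacker chose.

```python
BUF_WORDS = 4  # hypothetical: the vulnerable subroutine's local buffer holds 4 words


def copy_into_frame(user_input, saved_return):
    """Simulate an unchecked copy of `user_input` into a toy stack frame.

    The frame is listed from low to high addresses:
    [buffer word 0 .. buffer word 3][saved return address][rest of the stack...]
    """
    frame = [0] * BUF_WORDS + [saved_return]
    for i, word in enumerate(user_input):  # no bounds check: this is the bug
        if i < len(frame):
            frame[i] = word
        else:
            frame.append(word)             # keeps writing further up the stack
    return frame


def return_chain(frame):
    """Words the CPU will pop on RET, starting at the saved-return-address slot."""
    return frame[BUF_WORDS:]


normal = copy_into_frame([1, 2, 3], saved_return=0x400)
print([hex(w) for w in return_chain(normal)])  # ['0x400']: back to the caller

payload = [0x41] * BUF_WORDS + [0x10, 2, 0x20, 3, 0x30]  # padding, then a chain
evil = copy_into_frame(payload, saved_return=0x400)
print([hex(w) for w in return_chain(evil)])  # ['0x10', '0x2', '0x20', '0x3', '0x30']
```

The padding only has to fill the buffer; everything after it lands where the CPU expects return addresses, which is why the first word after the padding becomes the first gadget.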
Well, I hope someone found that useful. I know I learnt quite a bit reading through some of those papers.