Reverse Engineering Techniques Essay

IEEE ICET 2006, 2nd International Conference on Emerging Technologies, Peshawar, Pakistan, 13-14 November 2006. 1-4244-0502-5/06/$20.00 ©2006 IEEE

Preventing Reverse Engineering Threat in Java Using Byte Code Obfuscation Techniques

Jan M. Memon, Shams-ul-Arfeen, Asghar Mughal, Faisal Memon
Department of Computer Science, Isra University, Hyderabad, Pakistan
{janmohd, shams, asghar, faisal}@isra.edu.pk

Abstract: Java programs are compiled into a platform independent byte code format. Much of the information contained in the source code is retained in the byte code. Consequently, reverse engineering becomes much easier. Several software protection techniques have been developed, of which code obfuscation seems to be a promising one. In this paper, two new byte code obfuscation techniques are developed. These techniques involve applying obfuscating transformations to the Java byte code. They prevent automatic software analysis tools, de-compilers, from producing correct source code by introducing syntax and semantic errors in the generated source code. The proposed techniques are applied to sample Java class files to examine their effectiveness in impeding reverse engineering. The results reveal the erroneous code generated by the tested de-compilers.

Keywords: reverse engineering, obfuscation, byte code, de-compiler

1. INTRODUCTION

Java provides platform independence to software programs. The software is compiled into an intermediate code format, the class file format. A class file contains a massive amount of information, enough for easy reverse engineering [11]. When an organization sells software developed in Java to another organization, it delivers the software in the intermediate code format. The purchasing organization may break all laws and obligations simply by hiring a software developer to reverse engineer the software, often with the help of automated tools (de-compilers), and make changes according to its needs, thus causing financial loss to the organization that made the effort to develop it and violating its ethical expectations [17].

More precisely, the developer places some sort of protection on the program that may be removed by malicious users. Most of this work has concentrated on embedding a license file or identification keys in the software, especially in commercial applications. The program contains the code that checks for the presence of the license file. In addition, the license file identifies the user to whom the software was issued. This information can be used in legal action against a user who distributes copies of the software illegally. However, the attack on this type of protection is to reverse engineer the code and remove the code that implements the protection, or bypass the key checking, while leaving intact the code that provides the core functionality.

This paper proposes two byte code obfuscation techniques, namely, un-letting the completion of statement and removing variable and method names. The objective is to prevent de-compilers from generating correct source code by applying obfuscation techniques to the Java byte code.
Successful program obfuscation will confer a number of benefits, including protection of secrets in a program, license management for software, and software-based tamper resistance, which will make reverse engineering uneconomical.

In the next section, we briefly discuss code obfuscation. In Section 3, two different approaches to achieving code obfuscation in Java are discussed, along with the obfuscation techniques already suggested. In Section 4, we present the implementation of the two proposed byte code obfuscation techniques, namely, un-letting completion of statement and removing variable and method names, using sample Java programs. Section 5 reveals the effects on the tested de-compilers after the implementation of the two proposed techniques. In Section 6, we discuss the proposed byte code obfuscation techniques. Finally, conclusions are drawn in Section 7.

2. CODE OBFUSCATION

Code obfuscation transforms a program into another program that has equivalent behavior but is harder to perceive and understand [1, 2, 3, 15, 16, 18]. Code obfuscation has been proposed as the solution to problems such as protection of transient secrets in programs, protection of algorithms, license management for software, protection of digital watermarks in programs, software-based tamper resistance, and protection of mobile agents [4].

3. APPROACHES TO OBFUSCATION IN JAVA

Obfuscation in Java can be performed in two major ways: (i) source code obfuscation and (ii) byte code obfuscation [19].

3.1 Source Code Obfuscation

Source code obfuscation is a technique that transforms source code into other source code that has the same behavior but is difficult to comprehend. Several attempts have been made in this regard. A very simple technique is to rename identifiers to contain more ambiguous names.
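As a minimal sketch of identifier renaming, the two methods below compute the same value; only the names differ. The class, the method names, and the discount calculation are hypothetical and chosen purely for illustration; they do not come from the paper.

```java
// Illustrative only: all names and the calculation itself are hypothetical.
final class RenamingSketch {

    // Readable original: the identifiers reveal what the code does.
    static int computeDiscountedPrice(int basePrice, int discountPercent) {
        int discount = basePrice * discountPercent / 100;
        return basePrice - discount;
    }

    // The same logic after identifier renaming: behavior is unchanged,
    // but the names no longer carry any meaning.
    static int a(int b, int c) {
        int d = b * c / 100;
        return b - d;
    }

    public static void main(String[] args) {
        System.out.println(computeDiscountedPrice(200, 15));
        System.out.println(a(200, 15));
    }
}
```

Both calls in main print the same number, which is the point of the transformation: the program's behavior is preserved while a reader of the renamed version loses the hints the names provided.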
More viable source code obfuscation methods are based on composite functions: Array Index Transformation, Method Argument Transformation, and Hiding Constant [5]. Obfuscation techniques based on composite functions make the computation complex, and extensive use of them makes the software respond slowly. Some source code obfuscation methods are directed at object oriented concepts: Class Coalescing, Class Splitting, and Type Hiding [6]. Other source code obfuscation techniques include false refactoring, restructuring arrays, inlining and outlining methods, cloning methods, splitting variables, converting static to procedural data, and merging scalar variables [7]. The techniques that work on object oriented concepts, and others such as restructuring arrays, splitting variables, and merging scalar variables, may distort the logic of the software, so they must be used carefully. Techniques such as outlining methods, cloning methods, and converting static to procedural data increase the size of a class file without providing any significant advantage.


Inlining a method results in an unresolved method call when some other class calls the inlined method.

3.2 Byte Code Obfuscation

Byte code obfuscation is a technique that transforms a Java class file into another Java class file that has the same functionality but makes reverse engineering practically difficult. It is an advanced form of obfuscation that makes byte code difficult to decompile or recompile. While every program written in Java can be compiled to byte code by a Java compiler, it is possible to create class files which no Java compiler can produce, and yet which pass the Java Verifier with flying colors.

Such class files are said to be deviant [8]. The goal of byte code obfuscation is to produce deviant class files, so that when the class files are decompiled, it becomes difficult to recompile them. The byte code file therefore gets stronger protection against reverse engineering. Approaches that make the decompiled program uncompilable include introducing illegal identifiers, nested type names, and static methods versus instance methods [9]. A simple way to deobfuscate these techniques is to rename identifiers and class names. Other byte code obfuscation techniques include removal of debugging information, encrypting string literals, and replacing instructions with other ones [10]. Removal of debugging information does not prevent de-compilers from doing their job; in fact, debugging information can be removed by the Java compiler itself on request. Encrypting string literals and replacing instructions with other ones may impose a performance penalty.

4. IMPLEMENTATION OF THE PROPOSED BYTE CODE OBFUSCATION TECHNIQUES

Since the advent of the Java programming language, several de-compilers have been developed, and most of them are freely available on the Internet. Two byte code obfuscation techniques are proposed with the intention that de-compilers will not be able to generate correct source code; the generated code will contain syntax and semantic errors.

4.1 Un-letting Completion of Statement

Every language has its grammar; the Java programming language likewise follows its grammar [12]. A Java program contains various statements performing some intended functions. The fundamental purpose of the proposed technique is to break a statement into two parts, the left and right hand sides. The right hand side is computed, but its result is not assigned to the left hand side immediately; instead, some dead values are incorporated after the computation of the right hand side and left in place until the left hand side is used in some other statement.
Once it is found that the left hand side of that particular statement is used in some other statement, all the dead values incorporated in the byte code are consumed, and the result computed from the right hand side is assigned to the left hand side. In this way, the technique succeeds in breaking the grammatical rule that a statement cannot be broken into parts, which leads to syntax and sometimes semantic errors in the reverse engineered code. Java byte code is laid out in a standard format.

The methods implemented in the source code are converted into virtual machine instructions and placed at a particular location according to the class file format. The byte code file is traced for the methods' code, and the technique is implemented in the body of those methods. A sample Java program that adds two numbers and prints the result is used to show the implementation of the proposed technique. Two different de-compilers, DJ [13] and JReversePro [14], are employed to demonstrate that the technique works. The byte code is then read and presented in XML format to make it easier to understand. The XML file is traced to reach the "add" method as shown below.

There are six byte code instructions for the "add" method. The first two instructions, iload_0 and iload_1, load the parameters passed to the method and push their values on the Java Virtual Machine (JVM) stack. The third instruction, iadd, pops two values from the stack, adds them, and pushes the result back on the stack. The fourth instruction, istore_0, pops the value from the stack and assigns it to the variable "a". The fifth instruction, iload_0, loads the value of the variable "a" and pushes it on the stack. The sixth instruction, ireturn, pops a value from the stack and returns it to the main method.

Above the method's byte code instructions, there is an attribute length equal to 34, giving the number of bytes that follow it for the method's code. Then there is a two-byte entry for the maximum stack size containing the value 2, meaning that the method's code is allocated two stack entries. Then there is a two-byte entry for the number of local variables containing the value 2, showing that the method contains two local variables. After that there is a four-byte code length entry containing the value 6, showing that the method has six bytes of instructions. Next, the class file is opened using the AXE [20] hex editor to show the contents of the class file in hexadecimal format.
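The header fields and instructions described above can be read back with a few lines of Java. This is a minimal sketch, not the tooling used in the paper: the class name and the byte array are hypothetical, the array holds only the attribute length, the three header fields, and the six instruction bytes (the exception table and nested attributes that make up the rest of the 34 bytes are omitted), and the decoder knows only the single-byte opcodes that appear in "add".

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: decodes the start of a Code attribute for the "add" method.
final class CodeAttributeSketch {

    // attribute_length, max_stack, max_locals, code_length, then the code bytes.
    static final byte[] ADD_CODE = {
        0x00, 0x00, 0x00, 0x22,                    // attribute length = 34 (22h)
        0x00, 0x02,                                // maximum stack size = 2
        0x00, 0x02,                                // number of local variables = 2
        0x00, 0x00, 0x00, 0x06,                    // code length = 6
        0x1A, 0x1B, 0x60, 0x3B, 0x1A, (byte) 0xAC  // the six instructions
    };

    // Maps the opcodes used by "add" to their JVM mnemonics.
    static String mnemonic(int opcode) {
        switch (opcode) {
            case 0x1A: return "iload_0";
            case 0x1B: return "iload_1";
            case 0x3B: return "istore_0";
            case 0x60: return "iadd";
            case 0xAC: return "ireturn";
            default:   return String.format("0x%02X", opcode);
        }
    }

    // Returns the header line followed by one line per instruction.
    // Stepping one byte at a time is valid here only because every
    // opcode in this method has no operand bytes.
    static List<String> disassemble(byte[] attribute) {
        ByteBuffer buf = ByteBuffer.wrap(attribute); // big-endian by default
        int attributeLength = buf.getInt();
        int maxStack = buf.getShort() & 0xFFFF;
        int maxLocals = buf.getShort() & 0xFFFF;
        int codeLength = buf.getInt();
        List<String> lines = new ArrayList<>();
        lines.add(attributeLength + " " + maxStack + " " + maxLocals + " " + codeLength);
        for (int pc = 0; pc < codeLength; pc++) {
            lines.add(String.format("%03d: %s", pc, mnemonic(buf.get() & 0xFF)));
        }
        return lines;
    }

    public static void main(String[] args) {
        disassemble(ADD_CODE).forEach(System.out::println);
    }
}
```

The byte positions also show why the code length sits "six bytes later" than the stack size: the low byte of the stack size and the low byte of the code length are separated by the two-byte local variable count and the upper three bytes of the four-byte code length.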
The class file is modified by locating the method and making the appropriate changes with the AXE hexadecimal editor. Seven bytes of instructions are added to the method's body, which must be reflected in the attribute length and the code length of the method. The stack size also needs to be updated to allocate more space on the stack. The attribute length value 22h is updated to 29h; the two-byte stack size value 02h is updated to 04h; and, six bytes later, the code length value 06h is updated to 0Dh.
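The effect of the seven added bytes can be checked with a toy stack interpreter. The sketch below is an illustration under stated assumptions, not the paper's procedure: the class name is hypothetical, the seven bytes are the ones the paper goes on to list (08h, 06h, A3h, 00h, 03h, 1Bh, 3Bh), and placing them directly after iadd is an assumption about where they were spliced in. The interpreter supports only the opcodes that occur in this method.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy interpreter for the handful of opcodes used by "add".
final class DeadCodeSketch {

    // Original body: iload_0, iload_1, iadd, istore_0, iload_0, ireturn.
    static final int[] ORIGINAL = {0x1A, 0x1B, 0x60, 0x3B, 0x1A, 0xAC};

    // Assumed layout: the seven added bytes sit right after iadd.
    static final int[] OBFUSCATED = {
        0x1A, 0x1B, 0x60,        // a + b is computed and left on the stack
        0x08, 0x06,              // iconst_5, iconst_3 (dead values)
        0xA3, 0x00, 0x03,        // if_icmpgt with branch offset 3
        0x1B, 0x3B,              // iload_1, istore_0 (a = b, soon overwritten)
        0x3B, 0x1A, 0xAC         // original istore_0, iload_0, ireturn
    };

    static int run(int[] code, int a, int b) {
        int[] locals = {a, b};
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;
        while (true) {
            int op = code[pc];
            switch (op) {
                case 0x06: stack.push(3); pc++; break;                         // iconst_3
                case 0x08: stack.push(5); pc++; break;                         // iconst_5
                case 0x1A: stack.push(locals[0]); pc++; break;                 // iload_0
                case 0x1B: stack.push(locals[1]); pc++; break;                 // iload_1
                case 0x3B: locals[0] = stack.pop(); pc++; break;               // istore_0
                case 0x60: stack.push(stack.pop() + stack.pop()); pc++; break; // iadd
                case 0xA3: {                                                   // if_icmpgt
                    int right = stack.pop();
                    int left = stack.pop();
                    // The offset is relative to the address of the opcode itself.
                    int offset = (short) ((code[pc + 1] << 8) | code[pc + 2]);
                    pc += (left > right) ? offset : 3;
                    break;
                }
                case 0xAC: return stack.pop();                                 // ireturn
                default: throw new IllegalStateException("unsupported opcode " + op);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(run(ORIGINAL, 5, 8));
        System.out.println(run(OBFUSCATED, 5, 8));
        // Header fields grow by the number of added bytes.
        System.out.printf("%02Xh %02Xh%n", 0x22 + 7, 0x06 + 7);
    }
}
```

Because the branch offset of 3 equals the size of the if_icmpgt instruction, execution continues at the next instruction whether or not the comparison holds, so the dead values are simply consumed and the pending sum is still stored in "a" afterwards. The last line of main reproduces the header arithmetic: 22h + 7 = 29h and 06h + 7 = 0Dh.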

Sample program:

    public class Method1 {
        public static void main(String arg[]) {
            int c = add(5, 8);
            System.out.println(c);
        }

        public static int add(int a, int b) {
            a = a + b;
            return a;
        }
    }

Byte code of the "add" method (attribute length 34, maximum stack size 2, local variables 2, code length 6):

    000: iload_0
    001: iload_1
    002: iadd
    003: istore_0
    004: iload_0
    005: ireturn

Some dead values and instructions are added to the method's code section to break the statement "a = a + b" into two parts. To do that, the 08h instruction is added to push the value 5 on the stack, the 06h instruction is added to push the value 3 on the stack, the A3h instruction is added to insert an "if greater than" condition, the value 0003h is added as the offset, counted from the "if greater than" condition, at which execution continues, the 1Bh instruction is added to load the value of variable "b" on the stack, and the 3Bh instruction is added to pop a value and assign it to the variable "a". After the class file was modified and saved, it was executed to check the output; the output was still the same.

4.2 Removing Variable and Method Names

Java class files follow a strict standard format. A Java class mostly contains a number of global variables and a number of methods. According to the Java class file format, all the method and global variable names used in the class are listed in the constant pool area of the class file.

From the method's body, global variables and methods are referred to by constant pool indexes; they are not called by their names. In the constant pool area, each method and global variable name is preceded by its length, that is, its number of characters. De-compilers get the names of global variables and methods from the constant pool while generating the source code from the class file. If the names of global variables and methods are removed from the constant pool, with the restriction that they are not of the same type, de-compilers won't be able to get sufficient information and will not display those names in the generated source code. If the access attribute of a method or global variable is private, its name can be removed from the constant pool easily, because it is restricted to a single class. However, if the access attribute is public or protected, or no access attribute is defined, the other class files where the method or variable is used must be searched and the name removed from those classes as well. However, today's de-compilers are intelligent enough to detect identical variable and method names in the class file and replace them with automatically generated names in the generated source code.
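To make the length-prefixed layout concrete, the following sketch encodes a name the way a constant pool name entry stores it (a tag byte of 01h, a two-byte length, then the characters) and replaces one such entry with a zero-length one. It is a simplified illustration: the class and method names are hypothetical, it works on a bare byte fragment, not a full class file, and it finds the entry by a plain byte search where a real tool would walk the constant pool entry by entry.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper operating on a fragment of constant pool bytes.
final class Utf8NameSketch {

    // Tag 0x01, two-byte big-endian length, then the name bytes.
    // For plain ASCII names, standard UTF-8 matches the class file encoding.
    static byte[] utf8Entry(String name) {
        byte[] text = name.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(0x01);
        out.write((text.length >> 8) & 0xFF);
        out.write(text.length & 0xFF);
        out.write(text, 0, text.length);
        return out.toByteArray();
    }

    // Returns a copy of data in which the first entry for the given name
    // has its length set to zero and its characters removed.
    static byte[] blankName(byte[] data, String name) {
        byte[] entry = utf8Entry(name);
        int at = indexOf(data, entry);
        if (at < 0) {
            return data.clone(); // name not present: nothing to change
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(data, 0, at);
        out.write(utf8Entry(""), 0, 3);            // 01 00 00
        int rest = at + entry.length;
        out.write(data, rest, data.length - rest);
        return out.toByteArray();
    }

    // Naive search for needle inside haystack; returns -1 if absent.
    static int indexOf(byte[] haystack, byte[] needle) {
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            int j = 0;
            while (j < needle.length && haystack[i + j] == needle[j]) {
                j++;
            }
            if (j == needle.length) {
                return i;
            }
        }
        return -1;
    }
}
```

Calling utf8Entry("bool") yields a seven-byte entry whose length byte is 04h, and blankName shrinks the fragment by exactly the number of characters in the removed name, which is where the reduction in file size comes from.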

Removing names from the class file is therefore restricted to one global variable name and one method name, chosen based on the frequency of their use in the class file. As discussed above, global variable and method names are preceded by their length in the constant pool. To implement the technique, the length is updated to contain the value zero and the name is then removed from the constant pool.

To show the implementation of the technique, a sample Java program is taken. The program contains a global Boolean variable, "bool", and two methods, "main" and "isLicenseValid". The same two de-compilers, DJ and JReversePro, are used.

    public class Method2 {
        public static void main(String args[]) {
            bool = isLicenseValid();
            if (!bool)
                System.out.println("License not valid");
            else
                System.out.println("License Validated");
        }

        public static boolean isLicenseValid() {
            String licenseKey = "Isra University";
            return licenseKey.equals("Isra");
        }

        static boolean bool = false;
    }

Next, the class file is opened using the AXE hexadecimal editor. The Boolean variable name, "bool", is stored preceded by its length, 04h. The method name "isLicenseValid" is stored preceded by its length, 0Eh. These names can be removed by making their length zero. Before the word "bool", the value 04h is replaced with 00h, and the word "bool" is then removed. Likewise, the value 0Eh is replaced with 00h, and the word "isLicenseValid" is then removed using the AXE hexadecimal editor. The output after modification remained the same, showing that applying the proposed technique has no effect on functionality.

5. RESULTS

After successful implementation of both techniques, it remains to show how the tested de-compilers, DJ and JReversePro, behaved and what syntax and semantic errors were encountered in the generated source code.

After the first proposed technique, un-letting completion of statement, was applied to the sample byte code, no difference was found in the output, but all of the tested de-compilers failed to generate correct source code. The DJ de-compiler was run on the obfuscated byte code to generate the source code; the "add" method it produced is shown below. Compiling this code resulted in compilation errors. After that, the JReversePro de-compiler was used to generate source code; the "add" method it produced is shown below. When the generated source code was compiled, it produced an error that was both a semantic and a syntax error.

After the second proposed technique, removing variable and method names, was applied to the second sample class file, the de-compilers generated erroneous source code. The DJ de-compiler generated code with the variable and method names missing; the affected lines are shown below. The compiler reported errors for the generated source code, including some that were not genuine errors, and missed a few syntax errors. Finally, the JReversePro de-compiler was tested; it also failed to produce correct source code, and the erroneous statements are shown below. The compiler rejected the code produced by JReversePro and printed various errors.

6. DISCUSSION

The first proposed technique seems to be better than the second, because it produced not only syntax errors but also semantic errors, whereas the second proposed technique produced only syntax errors. The second proposed technique seems to be efficient in terms of file size, because it reduces the file size by removing the names of variables and methods.
The first proposed technique increases the file size slightly by incorporating some dead values in the byte code, which can be compensated for by applying the second proposed technique alongside it.

7. CONCLUSIONS

To thwart reverse engineering attacks, two new byte code obfuscation techniques have been proposed. These techniques succeeded in defeating automatic software analysis tools by introducing syntax and semantic errors, and have made reverse engineering costly.

REFERENCES

[1] G. Nolan, Decompiling Java. Apress, Berkeley, 2004.
[2] M. R. Stytz and J. A. Whittaker, Software protection: security's last stand. IEEE Security and Privacy, 2003, Vol. 1, No. 1, pp. 95-98.
[3] D. E. Bakken, R. Parameswaran, D. M. Blough, A. A. Franz, and T. J. Palmer, Data obfuscation: anonymity and desensitization of usable data sets. IEEE Security and Privacy, 2004, Vol. 2, No. 6, pp. 34-41.
[4] L. D'Anna, B. Matt, A. Reisse, T. V. Vleck, S. Schwab, and P. Leblanc, Self-protecting mobile agents obfuscation report. Technical Report No. 03-015, Network Associates Laboratories, 2003.
[5] L. Ertaul and S. Venkatesh, Novel obfuscation algorithms for software security. Proceedings of

Listings of the decompiled "add" method:

    public static int add(int i, int j) {
        i + j;
        if(5>3);
        i = j;
        i;
        return i;
    }

    public static int add(int i, int j) { i = j; i = (5
