We’ll start by introducing what basic static analysis means, and later we’ll take a closer look at advanced static analysis.
Static analysis is the process of analysing malicious code without executing it. In this chapter, we will look at different techniques for the initial phase of analysis, where we simply gather general information about what the malware could do and make an educated guess about its behaviour.
Since there are a lot of ready-made applications that will scan our sample (antiviruses, for example), we can start our journey from there.
As simple as it sounds, we’ll let a bunch of antiviruses scan our sample and wait for the final report. Since every antivirus is different and could potentially produce different results, the best solution is to use multiple antiviruses to scan the sample. The question is, why the hell do I have to install 5 or 6 antiviruses on my computer for a stupid sample?
You’re right, we don’t. VirusTotal will take your sample and run it through dozens of antiviruses, leaving a detailed report at the end of the scan. It’s an incredible tool that can tell us a lot about the sample: its maliciousness, possible behaviours, linked libraries and imported functions.
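If you’d rather not upload a sample, VirusTotal can also be queried by file hash. The snippet below is a minimal sketch of that lookup. The endpoint path and the `x-apikey` header are assumptions based on the public v3 API and should be checked against the current VirusTotal documentation; `build_report_request` and `fetch_report` are hypothetical helper names.

```python
import json
import urllib.request

# Assumed VirusTotal API v3 endpoint for file reports; verify against the docs.
VT_FILE_URL = "https://www.virustotal.com/api/v3/files/{}"


def build_report_request(file_hash, api_key):
    """Build (but do not send) a request for the report of a known file hash."""
    return urllib.request.Request(
        VT_FILE_URL.format(file_hash),
        headers={"x-apikey": api_key},  # assumed header name for the API key
    )


def fetch_report(file_hash, api_key):
    """Send the request and return the decoded JSON report (needs network access)."""
    with urllib.request.urlopen(build_report_request(file_hash, api_key)) as resp:
        return json.load(resp)
```

Splitting the request construction from the network call makes it easy to inspect what would be sent before anything leaves your analysis machine.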
To be effective, every antivirus has to scan the sample and determine whether it is malicious or not. To do that, it either checks the sample against an internal database of file signatures to see whether it is known malware, or it performs a heuristic analysis.
Heuristic analysis checks the sample for code patterns that are common in known malicious samples, inferring its behaviour from past research and analysis. This type of analysis is really common and can lead to the discovery of new malware when it reuses the same patterns. On the other hand, heuristic analysis can fail completely if the malware is written in a different style or relies on vulnerabilities or patterns that the antivirus doesn’t know.
A signature is a unique identifier for a sample, and the simplest way to get one is to compute the MD5 or SHA-1 hash of the file. It is so common that researchers and tools alike use these hashes to identify malware and share it with each other.
To be as efficient as possible, the antivirus compares the sample’s file signature with its internal database to find a match. Since signatures are widely shared between vendors and new ones appear all the time, the antivirus updates its database before every scan and then starts comparing signatures.
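To make the idea of a file signature concrete, here is a small sketch that hashes a file and checks it against a set of known hashes. It also computes SHA-256, which is an addition of mine. `known_hashes` stands in for a hypothetical local signature database, and `file_hashes` / `is_known` are illustrative names.

```python
import hashlib


def file_hashes(path, chunk_size=65536):
    """Return MD5, SHA-1 and SHA-256 hex digests of a file, read in chunks."""
    digests = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for digest in digests.values():
                digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}


def is_known(path, known_hashes):
    """Check the file's SHA-256 against a set of hashes (hypothetical database)."""
    return file_hashes(path)["sha256"] in known_hashes
```

Keep in mind that changing a single byte of the file changes every hash, so a match is strong evidence but a miss proves nothing.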
This is a really crude method, but it is sometimes effective when analysing malware that isn’t obfuscated or packed. By running the strings command on the sample, we can extract all the printable strings embedded in the program.
Most of the output will be gibberish, mixed with some interesting strings like domains, IPs, commands, debug output or API calls.
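If you want to script this step, the sketch below mimics the basic behaviour of the strings command (printable ASCII runs of at least 4 bytes) and then picks out URLs and IPv4-looking values. The regular expressions are deliberately loose and are my own assumptions, so expect false positives.

```python
import re

# Loose patterns for illustration only; they will match some junk.
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def extract_strings(data, min_len=4):
    """Return runs of printable ASCII bytes that are at least min_len long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, data)]


def interesting(strings):
    """Group URL-like and IPv4-like substrings found in the extracted strings."""
    found = {"urls": [], "ips": []}
    for text in strings:
        found["urls"].extend(URL_RE.findall(text))
        found["ips"].extend(IP_RE.findall(text))
    return found
```

A typical use would be `interesting(extract_strings(open(path, "rb").read()))`, run against a copy of the sample inside your analysis VM.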
Packed and obfuscated malware
Obfuscating and packing malware is really common since it makes the sample way harder to reverse and analyse. Just think about it: instead of reading clean code with comments and meaningful variable names, you see random strings, random variable names, random function names and code with no apparent logic, all there just to piss off the analyst.
These techniques have one simple purpose: the harder the sample is to analyse, the greater the malware’s chances of success, since you can’t understand its behaviour. Obviously, this is not always the case, since there are a lot of automated tools that will detonate the sample and analyse its execution, guessing what it might be and how severe it is, but you can’t always rely on automated tools simply because they might not be 100% accurate.
A packed sample has 2 layers: the malware itself and a wrapper that unpacks the malware upon execution. If you try to statically analyse the sample, you can only see the outer layer, which is the wrapper program in this case, so static analysis alone tells you very little about the real payload. Upon execution, the wrapper has to unpack and run the malware, leaving the malware exposed to dynamic analysis.
One way to detect packed malware is to run the strings command on the file and look at its output; if the output is mostly random-looking strings, the sample is most likely packed or obfuscated.
You can use PEiD to identify the packer and get additional information such as the entry point.
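Judging strings output by eye is subjective. A related heuristic that is often used alongside it (it is not the strings check described above) is Shannon entropy: packed or encrypted data tends to look close to random. The sketch below computes it per byte; the 7.0 bits/byte threshold is an assumption of mine, not a hard rule.

```python
import math
from collections import Counter


def shannon_entropy(data):
    """Return the Shannon entropy of a bytes object, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_packed(data, threshold=7.0):
    """Flag data whose entropy exceeds the (assumed) threshold."""
    return shannon_entropy(data) > threshold
```

Treat a high value as a hint to dig further, not as proof: compressed resources such as images can push the entropy of a perfectly normal file up as well.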
Portable Executable File Format (PE File)
The PE format is used by Windows executables and DLLs. A PE file contains the information the operating system needs to manage the wrapped executable code. This includes dynamic library references for linking, API export and import tables, resource management data and thread-local storage data.
A PE file consists of a number of headers and sections that tell the dynamic linker how to map the file into memory. An executable image consists of several different regions, each of which requires different memory protection, so the start of each section must be aligned to a page boundary.
For instance, the .text section (which holds program code) is typically mapped as execute/read-only, and the .data section (holding global variables) is mapped as no-execute/read-write. However, to avoid wasting space, the different sections are not page-aligned on disk. Part of the job of the dynamic linker is to map each section to memory individually and assign the correct permissions to the resulting regions, according to the instructions found in the headers.
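To see where those instructions live, here is a minimal sketch that walks the headers with nothing but the standard library and prints each section’s name and memory permissions. It follows the published PE/COFF layout but does no validation beyond the two magic values, and `parse_pe_sections` is a hypothetical helper, so treat it as a reading aid rather than a robust parser.

```python
import struct

# Section characteristic flags from the PE/COFF specification.
MEM_EXECUTE = 0x20000000
MEM_READ = 0x40000000
MEM_WRITE = 0x80000000


def parse_pe_sections(data):
    """Return name, addresses and rwx permissions for each section of a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew at offset 0x3C points to the "PE\0\0" signature.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # The 20-byte COFF file header follows the signature.
    _machine, num_sections, _ts, _symptr, _nsyms, opt_size, _flags = \
        struct.unpack_from("<HHIIIHH", data, e_lfanew + 4)
    table = e_lfanew + 4 + 20 + opt_size  # section table follows the optional header
    sections = []
    for index in range(num_sections):
        (name, _vsize, vaddr, _rawsize, rawptr,
         _rel, _lin, _nrel, _nlin, flags) = struct.unpack_from(
            "<8sIIIIIIHHI", data, table + index * 40)
        perms = "".join(
            char if flags & bit else "-"
            for char, bit in (("r", MEM_READ), ("w", MEM_WRITE), ("x", MEM_EXECUTE))
        )
        sections.append({
            "name": name.rstrip(b"\0").decode("ascii", "replace"),
            "virtual_address": vaddr,
            "raw_offset": rawptr,
            "perms": perms,
        })
    return sections
```

On an ordinary executable you would expect something like `r-x` for .text and `rw-` for .data; a section that is both writable and executable deserves a closer look.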
Linked Libraries and Functions
One of the most useful pieces of information that we can gather about an executable is the list of functions that it imports (recorded in the Import Address Table, or IAT). Libraries are imported so that the same functions don’t have to be implemented all over again: a single function stored in one place can be used by many different programs without re-implementing it.
A programmer has three main ways to link imported functions: static linking, runtime linking and dynamic linking.
Static linking is the result of the linker copying all library routines used in the program into the executable image. This may require more disk space and memory than dynamic linking, but is both faster and more portable, since it does not require the presence of the library on the system where it is run.
Runtime linking is commonly used by malware, especially when it’s packed or obfuscated. Executables that use runtime linking connect to a library only when one of its functions is needed.
Dynamic linking is accomplished by placing the name of a sharable library in the executable image. Actual linking with the library routines does not occur until the image is run, when both the executable and the library are placed in memory. An advantage of dynamic linking is that multiple programs can share a single copy of the library.
You can analyse the imported functions with Dependency Walker.
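Once you have the list of imported function names (from Dependency Walker or any other tool), a quick check can hint at runtime linking. On Windows, libraries are typically loaded at run time through the LoadLibrary family and GetProcAddress, so a short import list containing those names is suspicious. The sketch below assumes you already have the names as a Python list; the `max_imports` cut-off of 10 is an arbitrary assumption of mine.

```python
# Windows API functions commonly used to load libraries and resolve
# functions at run time.
RUNTIME_LINKING_APIS = {
    "LoadLibraryA", "LoadLibraryW",
    "LoadLibraryExA", "LoadLibraryExW",
    "GetProcAddress",
}


def runtime_linking_hints(imports):
    """Return the imported names that suggest run-time library loading."""
    return sorted(set(imports) & RUNTIME_LINKING_APIS)


def likely_runtime_linked(imports, max_imports=10):
    """Heuristic: loader functions present AND very few imports overall."""
    return bool(runtime_linking_hints(imports)) and len(set(imports)) <= max_imports
```

Plenty of legitimate programs import these functions too, which is why the sketch also looks at how small the import list is before raising a flag.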
Common libraries and their usage:
|DLL|Description|
|---|---|
|Kernel32.dll|This is a very common DLL that contains core functionality, such as access and manipulation of memory, files, and hardware.|
|Advapi32.dll|This DLL provides access to advanced core Windows components such as the Service Manager and Registry.|
|User32.dll|This DLL contains all the user-interface components, such as buttons, scroll bars, and components for controlling and responding to user actions.|
|Gdi32.dll|This DLL contains functions for displaying and manipulating graphics.|
|Ntdll.dll|This DLL is the interface to the Windows kernel. Executables generally do not import this file directly, although it is always imported indirectly by Kernel32.dll. If an executable imports this file, it means that the author intended to use functionality not normally available to Windows programs. Some tasks, such as hiding functionality or manipulating processes, will use this interface.|
|WSock32.dll & Ws2_32.dll|Networking DLLs. A program that accesses either of these most likely connects to a network or performs network-related tasks.|
|Wininet.dll|This DLL contains higher-level networking functions that implement protocols such as FTP and HTTP.|
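The table above translates naturally into a lookup that gives you a first impression of a sample from its imported DLL names. This is a minimal sketch: the descriptions are shortened from the table, and `describe_imports` is a hypothetical helper name.

```python
# Short roles for the DLLs listed in the table (keys are lower-case).
DLL_ROLES = {
    "kernel32.dll": "core functionality: memory, files, hardware",
    "advapi32.dll": "advanced components: Service Manager, Registry",
    "user32.dll": "user-interface components",
    "gdi32.dll": "graphics display and manipulation",
    "ntdll.dll": "direct interface to the Windows kernel",
    "wsock32.dll": "networking",
    "ws2_32.dll": "networking",
    "wininet.dll": "higher-level networking (FTP, HTTP)",
}


def describe_imports(dll_names):
    """Map each imported DLL name to its role, ignoring case; 'unknown' otherwise."""
    return {name: DLL_ROLES.get(name.lower(), "unknown") for name in dll_names}
```

The lookup is case-insensitive because DLL names show up in import tables with inconsistent casing, for example `KERNEL32.dll`.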