How to start RE/malware analysis?

Many people approach me asking more or less the same questions: how to start RE, how to become a malware analyst, how I started, what materials I can recommend, etc. So, in this section I will collect some hints and useful links for beginners.

//NOTE: this article is periodically updated with new materials

The topic of reverse engineering (RE) is very broad. You can reverse engineer all sorts of software for all sorts of platforms. You can even reverse engineer hardware. But in this article I will focus mostly on the subset of skills that you need for analyzing malware on Windows.

Tools & environment

In order to not infect yourself, you need to prepare an isolated virtual environment with all the tools installed, where you can deploy the malware sample and analyze it. More details:

Learning tools

Among the tools that you will use daily are debuggers and disassemblers, such as IDA, Ghidra, Binary Ninja, OllyDbg (or one of its derivatives, such as ImmunityDbg), and x64dbg. WinDbg is very useful and advanced, but not as user-friendly; it is worth learning, but I don’t recommend it to beginners. Below you will find some courses that will help you familiarize yourself with those tools:

Reversing with Lena151 – learn OllyDbg (old, but still very useful)
TiGa’s course on IDA Pro
Introduction to WinDbg by Anand George

How to get malware samples, intelligence etc.?

If you are a beginner and not a member of any community yet, you can find fresh, nicely cataloged samples for free here:

You can also download them from some of the free online sandboxes and open repositories, such as:

For threat intelligence, information about outbreaks, hashes of fresh samples, etc., I recommend that you join Twitter and follow some of the researchers that you know.
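Since samples are usually identified and shared by their hashes, it helps to be able to compute one yourself. Below is a minimal Python sketch, assuming you just want the SHA-256 of a file on disk; the function name `sample_sha256` is my own choice for illustration.

```python
import hashlib


def sample_sha256(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so that large samples do not
        # have to fit into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The resulting hex string is what you would typically paste into a sample repository or a search engine to look up existing reports.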

Also check some malware trackers, where you can find live links to the latest malware, and some more information about campaigns:

Common malware families

You can find a catalogue of articles on particular malware families here:

https://malpedia.caad.fkie.fraunhofer.de/

Keep in mind that the source code of several popular malware families has already leaked (e.g. ZeuS, Tinba, Gozi, Pony, Alina, Carberp). A bigger collection is available at VXUnderground, here. Strains that are currently in circulation may be based on them, or have some fragments of code copied. Hours of reading the leaked code may save you days of analysis! And even when you are dealing with malware that was written from scratch, the experience gained by reading the leaked code can help you recognize common approaches.

Similarly, malware authors don’t hesitate to adapt toolkits that were written for a legitimate purpose, e.g. for Red Teams. An example of such a tool, which was leaked and is often used in malware, is Cobalt Strike (Avast on Cobalt Strike, Mandiant on Cobalt Strike).

Exercises

Reversing is an art that you can learn only by doing, so I recommend that you start practicing right away. First try to practice by following step-by-step writeups.

Beginner Malware Reversing Challenges (by Malware Tech)
Malwarebytes CrackMe #1 + tutorial
Malwarebytes CrackMe #2 + list of write-ups
Malwarebytes Crackme #3 + list of write-ups
https://crackmes.one/ – various crackmes to help you exercise reversing
“Nightmare” – a reverse engineering course created around CTF tasks

Also check the writeups from the annual Flare-On Challenge (including my writeups that are on this blog). It contains a variety of reverse engineering tasks with a growing difficulty level.

Inside the compiled application

Reversing a native application requires you to understand some low-level concepts. If you want to focus on Windows malware (as I do), you will most of the time be dealing with PE files. When you watch an application under a debugger, you see it in a disassembled form, transformed to assembly language (assembler). So, the more you know about assembler, the PE structure, and the operating system, the easier it will be for you to follow.

Here and here you will find gentle introductions to x86 assembly. To get a deeper understanding and a grasp of other platforms too, check this free course. An invaluable, comprehensive resource about assembly is the official Intel manual.

For learning the PE format, I recommend reading [this], plus the articles by Matt Pietrek (e.g. [1] [2] [3]) and Ange Albertini’s posters (PE101, PE102). You can start with my slides about PE. Also check PE-bear: try viewing various executables and compare them with what you read about the format.
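To make the layout more tangible, here is a minimal Python sketch that walks the first few PE structures by hand: the DOS header, the COFF file header, and the section table. It is only an illustration under simple assumptions (a well-formed file fully loaded into memory, no handling of malformed headers); the function name `parse_pe_headers` and the returned dictionary keys are hypothetical, and a dedicated PE parser does far more than this.

```python
import struct


def parse_pe_headers(data: bytes) -> dict:
    """Parse the DOS header, COFF file header and section table of a PE image."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew (at offset 0x3C of the DOS header) points to the PE signature.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")

    # The 20-byte COFF file header follows the 4-byte signature.
    coff = e_lfanew + 4
    (machine, num_sections, _timestamp, _symtab, _num_symbols,
     opt_size, _characteristics) = struct.unpack_from("<HHIIIHH", data, coff)

    # The optional header starts right after; its first field is the magic:
    # 0x10B for PE32, 0x20B for PE32+ (64-bit).
    opt = coff + 20
    magic = struct.unpack_from("<H", data, opt)[0] if opt_size >= 2 else None

    # The section table (40 bytes per entry) follows the optional header.
    table = opt + opt_size
    sections = []
    for i in range(num_sections):
        name, vsize, vaddr, raw_size, raw_ptr = struct.unpack_from(
            "<8sIIII", data, table + i * 40)
        sections.append({
            "name": name.rstrip(b"\x00").decode("ascii", "replace"),
            "virtual_size": vsize,
            "virtual_address": vaddr,
            "raw_size": raw_size,
            "raw_pointer": raw_ptr,
        })
    return {"machine": machine, "is_64bit": magic == 0x20B, "sections": sections}
```

Running it on a few executables and comparing the output with what PE-bear shows is a good way to check that you read the format description correctly.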

Programming for RE/malware analysis

Not all malware analysts are proficient programmers, but you need to have some basic skills, and at least be able to understand the code. The more fluent a programmer you are, the better: you will be able to experiment with the techniques and create tools that help you in analysis.

The languages that I use daily are C/C++, Python, and assembler, and I mostly agree with [this] article by MalwareTech.

Some people ask me where I learned particular languages, so here are some of the sources:

x86 Assembler:
- Iczelion’s tutorial, Win32 Assembler for Crackers by Goppit (original: chm, converted: pdf)
C/C++:
- “The C Programming Language” – by Kernighan & Ritchie
- “The C++ Programming Language”
- “Linux Programming by Example” – by Kurt Wall

“Windows System Programming” is a very solid book covering the Windows API and related topics.

Malware unpacking

Malware usually comes packed, and in order to analyze the core you will have to unpack it from the outer, protective layer. Malware distributors may use legitimate, well-known packers and protectors, as well as custom ones, prepared with a special focus on AV evasion. This article explains the concept.
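One common first hint that a file (or a section of it) is packed is high byte entropy, because compressed or encrypted data looks close to random. The Python sketch below computes Shannon entropy in bits per byte; the helper `looks_packed` and its threshold of 7.0 are assumptions chosen for illustration only, not a rule taken from any particular tool.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def looks_packed(data: bytes, threshold: float = 7.0) -> bool:
    """Heuristic only: the threshold is an assumed value, tune it yourself."""
    return shannon_entropy(data) >= threshold
```

High entropy alone does not prove that a sample is packed (embedded archives or images have it too), so treat the result as a hint on where to look, not as a verdict.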

To get familiar with manual unpacking, check the series of tutorials “Unpacking With Anthracene” [mirror: Unpacking With Anthracene.zip, pass: tuts4you], and other tutorials from Tuts4You.

My video tutorials about unpacking malware are available here.

Protectors using virtualization

A separate class of executable protectors, also applied in malware, are the ones using virtualization. In contrast to the classic ones, they not only wrap the original code into another layer, but also modify the existing code, rewriting some fragments in such a way that they can run only on a built-in Virtual Machine. Protectors of this type are especially difficult to analyze, as they cannot be unpacked in a typical way. Sometimes reconstruction of the original executable is impossible, or too time-consuming. I usually deal with them with the help of tracing. However, for less complex variants, it is possible to analyze the full VM logic and reconstruct the code.

Examples of (legitimate) protectors of this type are VMProtect and Themida.
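To illustrate the idea, here is a toy Python sketch of a stack-based virtual machine with a dispatcher loop and an optional trace. The opcode values and the instruction set are entirely made up for this example and do not correspond to VMProtect, Themida, or any real protector; real VMs are far more complex and heavily obfuscated.

```python
# Hypothetical opcodes of a toy VM (invented for this example).
OP_PUSH = 0x10   # push the next byte onto the stack
OP_ADD = 0x20    # pop two values, push their sum (mod 256)
OP_XOR = 0x30    # pop two values, push their XOR
OP_HALT = 0xFF   # stop and return the top of the stack


def run_vm(bytecode: bytes, trace=None) -> int:
    """Interpret the toy bytecode; optionally record (pc, opcode) pairs."""
    stack = []
    pc = 0
    while True:
        op = bytecode[pc]
        if trace is not None:
            trace.append((pc, op))  # what a tracer would log per handler
        if op == OP_PUSH:
            stack.append(bytecode[pc + 1])
            pc += 2
        elif op == OP_ADD:
            b, a = stack.pop(), stack.pop()
            stack.append((a + b) & 0xFF)
            pc += 1
        elif op == OP_XOR:
            b, a = stack.pop(), stack.pop()
            stack.append(a ^ b)
            pc += 1
        elif op == OP_HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op:#x} at pc={pc}")


# (5 + 7) ^ 0x0F, expressed in the toy bytecode
program = bytes([OP_PUSH, 5, OP_PUSH, 7, OP_ADD, OP_PUSH, 0x0F, OP_XOR, OP_HALT])
```

The original logic, `(5 + 7) ^ 0x0F`, is no longer visible as native instructions; all a disassembler shows is the dispatcher. Recording which handlers run, and in what order, is the basic idea behind the tracing approach mentioned above.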

Below you can find some links on analysis of this type of protection method:

Workshop: Analysis of Virtualization-based Obfuscation
https://www.youtube.com/watch?v=PAG3M7mWT2c&t=13229s – a talk on reversing VMProtect
VMProtect 2 – Detailed Analysis of the Virtual Machine Architecture
VMProtect 2 – Part Two, Complete Static Analysis
A solution to VMProtect challenge from UIUCTF 2021 – SpeakEasy: https://medium.com/@acheron2302/speakeasy-writeup-3af3375ab63
“Tickling VMProtect with LLVM”: [1][2][3] (more on LLVM here)

You can practice de-virtualization on some dedicated crackmes, packed by simple, custom VM-protectors. Examples:

https://www.malwaretech.com/challenges/windows-reversing/vm1
Flare-On 8 Task 10 “Wizardcult”

Malware injection/impersonation methods

Most malware injects code into other processes. The common purposes of injection are impersonating other applications and hooking. The implant can be a shellcode, as well as a full PE. The methods used vary. Among PE impersonation techniques, the most popular are Process Hollowing (aka RunPE) and Reflective DLL injection.

A walk-through of various techniques (by Endgame)
Ready-made demos of various code injection techniques (source code)
Review of various injection techniques (BlackHat 2019) [Video] [PDF]
https://github.com/odzhan/injection – an extensive set of examples
List of my PE injection demos (source code)

Code implants can be detected e.g. with the help of PE-sieve/HollowsHunter. PE implants are detected by default, and shellcode implants optionally.

Hooking

Hooking is a technique that allows intercepting API calls. Malware uses this technique for various purposes, such as going unnoticed by monitoring applications, intercepting the data being sent, etc. On the other hand, the same technique is also used by sandboxes to monitor malware.

How hooking works:

“Inline Hooking for programmers” (by MalwareTech) – part #1 and part #2
Windows API Hooking (at Red Teaming Experiments)
My slides about various types of hooking

How a simple, userland rootkit utilizes hooking:

https://blog.malwarebytes.com/threat-analysis/2016/12/simple-userland-rootkit-a-case-study/

Hooking can be detected e.g. with the help of PE-sieve/HollowsHunter. PE-sieve detects inline hooking by default, and IAT hooking optionally. In HollowsHunter, the detection of both is optional.
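The general idea behind spotting an inline hook can be sketched in a few lines of Python: compare the first bytes of a function as they appear in the file on disk with the same bytes in memory, and if they differ and start with a `JMP rel32` (opcode 0xE9), decode where the jump leads. This is only a simplified illustration and not how PE-sieve is implemented; the function name `find_inline_hook` and the returned dictionary are hypothetical, and real hooks can use many other patch patterns.

```python
import struct


def find_inline_hook(original: bytes, in_memory: bytes, func_va: int):
    """Compare a function prologue on disk vs. in memory.

    Returns None if the bytes match, otherwise a dict describing the patch.
    Only the classic 5-byte "JMP rel32" patch is decoded.
    """
    if original == in_memory:
        return None
    result = {"patched": True, "jmp_target": None}
    if len(in_memory) >= 5 and in_memory[0] == 0xE9:
        # rel32 is a signed offset relative to the end of the JMP instruction.
        (rel32,) = struct.unpack_from("<i", in_memory, 1)
        result["jmp_target"] = func_va + 5 + rel32
    return result
```

For example, if a function at `0x401000` has its prologue replaced with `E9 00 10 00 00`, the decoded target is `0x402005`, and the next step would be to check which module that address belongs to.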

Kernel Mode malware

Most of the malware you will encounter works in userland. But from time to time you can come across some kernel mode malware modules. Reversing them is more difficult, and it will require a different environment setup.

Setting up the environment for analyzing malware in kernel mode will follow the same steps as I described for Windows Kernel Exploitation practice, here:

https://hshrzd.wordpress.com/2017/05/28/starting-with-windows-kernel-exploitation-part-1-setting-up-the-lab/

Kernel mode modules are structured completely differently from the ones that you encounter daily in userland. This is why, before analyzing them, I strongly recommend that you get some general knowledge of drivers. In my opinion, the best introduction to the topic is the book “Windows Kernel Programming” by Pavel Yosifovich, and the set of accompanying examples that is available for free here. A briefer intro to the methodology of analyzing a driver (by Matt Hand) can be found here. Other notes on the topic (by VoidSec) are available here (the author focuses mostly on driver RE in the context of exploitation, but most of the tips are also valid for malware researchers).

Below, you can find a very nice tutorial about reversing a kernel mode rootkit:

http://www.sekoia.fr/blog/wp-content/uploads/2016/10/Rootkit-analysis-Use-case-on-HIDEDRV-v1.6.pdf

You can find more about techniques used by kernel mode rootkits e.g. here:

Keep in mind that some of the techniques of classic kernel mode rootkits no longer work on modern Windows.

Even lower level…

It is possible to install malware implants at an even lower level than the operating system’s kernel space. There is malware out there that infects the MBR, such as Petya and other bootlockers. A separate category is bootkits, which can hijack the whole boot sequence of the system, starting from the earliest stages. Of course, preparing such an implant is much more difficult than writing conventional malware, so it is done very rarely. Here you can find very interesting research on malware that infects UEFI firmware: “MoonBounce”.
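To get a feeling for what an MBR infector overwrites, it helps to know the classic layout of the first disk sector: 446 bytes of bootstrap code, four 16-byte partition entries, and the 0x55 0xAA boot signature at the end. The Python sketch below parses a 512-byte sector dump that you already have as bytes (for example, extracted from a VM disk image); the function name `parse_mbr` and the dictionary keys are my own, chosen for illustration.

```python
import struct


def parse_mbr(sector: bytes) -> dict:
    """Parse the partition table and boot signature of a classic MBR sector."""
    if len(sector) < 512:
        raise ValueError("an MBR sector must be at least 512 bytes")
    partitions = []
    for i in range(4):
        # Entry layout: status, CHS of first sector, type,
        # CHS of last sector, LBA of first sector, number of sectors.
        status, _chs_first, ptype, _chs_last, lba_start, num_sectors = \
            struct.unpack_from("<B3sB3sII", sector, 446 + i * 16)
        if ptype != 0:  # type 0 marks an unused entry
            partitions.append({
                "bootable": status == 0x80,
                "type": ptype,
                "lba_start": lba_start,
                "num_sectors": num_sectors,
            })
    return {
        "has_boot_signature": sector[510:512] == b"\x55\xAA",
        "bootstrap_code": sector[:446],
        "partitions": partitions,
    }
```

Comparing the `bootstrap_code` of a clean snapshot with the one taken after running a sample is a simple way to notice that the boot code was replaced.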

Courses

https://www.begin.re/ – Reverse Engineering for beginners
Reverse Engineering Malware 101 and 102 – by MalwareUnicorn
https://github.com/mytechnotalent/Reverse-Engineering
http://legend.octopuslabs.io/sample-page.html
http://opensecuritytraining.info/Training.html
https://samsclass.info/126/126_S17.shtml – Practical Malware Analysis
Malware Analysis course (University of Cincinnati)
Red/purple teaming: a malware development course by 0xPat
Building C2 implants in C++
My training: malware_training_vol1 (work-in-progress)

YouTube channels

Books

Practical Malware Analysis: A Hands-On Guide to Dissecting Malicious Software
The Art of Computer Virus Research and Defense – Peter Szor (old but very good book)
“The “Ultimate” Anti-Debugging Reference” – by Peter Ferrie (old but still relevant compendium of various anti-debug techniques used by malware)
Malware Analyst’s Cookbook and DVD: Tools and Techniques for Fighting Malicious Code
Hacker Disassembling Uncovered – by Kris Kaspersky
The Rootkit Arsenal: Escape and Evasion in the Dark Corners of the System
Rootkits and Bootkits – by Alex Matrosov, Eugene Rodionov, and Sergey Bratus
Windows System Programming (4th edition) – by Johnson M. Hart
Gray Hat Python

Tips & ideas

How to get a job as malware analyst?

From my experience, the best way is to contribute to the community. Be active, start researching on your own, show your passion, share what you learned. There is a big and very friendly community of researchers on Twitter, and it helped me a lot in finding a job in this field. So, if you are not there yet, I strongly recommend that you join.

40 Responses to How to start RE/malware analysis?

withrich says:

July 26, 2018 at 7:17 am

Thanks!

anthonynowlan says:

July 26, 2018 at 11:59 am

More about techniques used by kernel more rootkits you will find, i.e. here: s/more/mode

- hasherezade says:
  
  July 26, 2018 at 11:23 pm
  
  thx, fixed!
  
Nanna says:

September 17, 2018 at 10:07 am

What are some CS classes hat help improve Reversing Skills?

- lazer991 says:
  
  October 23, 2019 at 11:21 pm
  
  Hey, check out reversinghero video tutorial: https://www.reversinghero.com
  
Eilon says:

October 1, 2018 at 12:50 pm

Awesome as always.

Gio says:

December 29, 2018 at 10:12 am

Many thanks for sharing this! Would you tell some more about your environment for malware analysis? Which setup are you using, whether you use hardened VM and how hardened, what is your typical workflow, if you use any sandboxes too, if and how you store samples, etc

- hasherezade says:
  
  December 31, 2018 at 3:36 am
  
  Hi! My setup for malware analysis is very simple. As a base system I use Linux (Debian) with Wireshark (to sniff the traffic from the guest if needed). Then I use Windows on VirtualBox. On Windows I have all my tools installed (PE-bear, debuggers, PIN tools, SysInternals Tools, Fiddler, etc). I don’t usually use hardened VMs, just a basic setup.
  I start by viewing a sample in PE-bear, then I unpack it (with PE-sieve, or manually if needed). Once I have the sample unpacked, I view it again in PE-bear, to get a general overview. If it is not obfuscated, I just open it in IDA and start analyzing statically. If the sample is complex or obfuscated, I start by tracing it with a PIN tracer. I usually use TinyTracer first (https://github.com/hasherezade/tiny_tracer), then eventually some more complex tracers. They give me tags that I load into IDA to better understand the obfuscated parts.
  Depending on the sample, I can switch from static to dynamic analysis multiple times. Sometimes I may start with a behavioral analysis, observing API calls with ProcMon, and observing any traffic with Fiddler or Wireshark.
  I do several iterations, renaming functions in IDA, adding comments.
  When the sample is defending itself against analysis, I find those branches with PIN tracers, and patch them to make the malware “blind”. Sometimes I import functions from the malware to experiment with them (with libPeConv).
  I hope it answers your question 🙂
  
  - Gio says:
    
    January 8, 2019 at 10:32 pm
    
    Many thanks, that really helps! 🙂
Sanyuj says:

January 18, 2019 at 10:11 am

Thankyou for sharing all of this @hasherezade !!!
I came to know about you from a paper i read about PrincessLocker unpacking.
Love your content and wish to contribute to the community soon!

ahmedES says:

February 1, 2019 at 11:34 pm

any recommended resources to learn shellcoding ?

- hasherezade says:
  
  June 25, 2019 at 4:29 pm
  
  https://idafchev.github.io/exploit/2017/09/26/writing_windows_shellcode.html
  
- hasherezade says:
  
  February 11, 2021 at 6:11 pm
  
  Also, some time ago I released a paper about shellcoding, maybe you will find it useful: https://vxug.fakedoma.in/papers/VXUG/Exclusive/FromaCprojectthroughassemblytoshellcodeHasherezade.pdf
  
Adrian Dostoevksy says:

September 14, 2019 at 8:46 am

Thank you for this amazing post.

regular_user says:

July 11, 2020 at 6:01 pm

which is your recommendation? “Windows 10 System Programming” or “Windows System Programming”

Jesus Conejo says:

February 7, 2021 at 11:25 am

Hi Alexandra. First of all thanks so much for this Git plenty of information. I have put a message on one of your Youtube videos, sorry for repeat it here. I’m studying the Phobos ransomware. I have seen your analysis and is amazing. I also have decompile it with IDA to assembler and C but it’s a nightmare to study it with all these generic “sub_xxx”….is possible for you to send me the C code (with the variables and routines named) please?. Thanks!!

- hasherezade says:
  
  February 8, 2021 at 7:28 pm
  
  Hi, I don’t have the code of Phobos, all I had for the analysis was the compiled executable. I don’t think I saved my full IDB – but I have some notes from the analysis of some important fragments (RSA algorithm in Phobos): https://gist.github.com/malwarezone/83c6c73a7657f4b6abded5af2b1a6fe8. You can apply the CSV on your IDB using IFL plugin (https://github.com/hasherezade/ida_ifl)
  
Jesus Conejo says:

February 9, 2021 at 9:44 am

Hi. Thanks for answering. Which CSV file?..Do you have a CSV with Phobos data?….My idea is trying to understand how Phobos creates the AES key. My company contracted somebody to decrypt some files and he got it so it’s possible. He only asked for the IV value (in the file name) so all the information for decrypt must be into the encrypted files . We have too the tool from the Phobos hacker for decrypt but it do it in two steps, first you must look for the file for it to see the RSA public (?) key, this key must be sent to the hacker and then he/she must send to you the AES Key that you must write in the tool for decrypt the files. Getting RSA private key from public key is not possible in a short time (calculating for some hundreds years i think) but knowing how Phobos create/mount the AES key could be a solution. What do you think?. Thanks!

- hasherezade says:
  
  February 10, 2021 at 5:13 pm
  
  If you read carefully my analysis at Malwarebytes blog (https://blog.malwarebytes.com/threat-analysis/2019/07/a-deep-dive-into-phobos-ransomware/) you will see that I already found how the AES key is generated. They use CryptGenRandom (a strong random generator). So, as my analysis concluded, the ransomware is NOT decryptable: “[…] the used encryption algorithm is secure. It is AES, with a random key and initialization vector, both created by a secure random generator. The used implementation is also valid: the authors decided to use the Windows Crypto API.”
  
  There is still some (small) window of chance to decrypt it without paying the ransom, but only if you manage to dump the generated key from the memory of the running ransomware. I demonstrated it on the video ( https://youtu.be/tbcrV1rNgMo ) – yet, this scenario is not applicable in most of the cases.
  
  - Jesus Conejo says:
    
    February 11, 2021 at 11:55 am
    
    Hi Alexandra. Thanks so much. I meant really to found the random number inside the AES key, because I have a encrypted and decrypted file (the problem would be if I only had a encrypted file). In a brute force way I could implement a simple app to look for that rand number. As I already comment we contracted a little company for decrypt some files and they did it without knowing anything, just having the files…(perhaps contacting with the hacker and sharing the money, who knows) so there must be a way. I’m going to study your video. Thanks!!
  - hasherezade says:
    
    February 11, 2021 at 6:03 pm
    
    The key consists of 32 values, each from the range 0-255 (demonstrated here: https://youtu.be/tbcrV1rNgMo?t=546). This gives 256^32 possibilities that we would need to check in order to brute-force the key.
    This means it is impossible to brute-force it in our lifetime.
    Maybe you can find my presentation on this topic helpful: https://speakerdeck.com/hshrzd/virus-bulletin-2016-challenges-and-approaches-of-cracking-ransomware
    
    The company that managed to decrypt the data probably got it decrypted by the cybercriminals themselves.
  - Jesus Conejo says:
    
    February 14, 2021 at 11:10 pm
    
    Hi again. Amazing video!!!!. I understand, but you are watching when Phobos is going to encrypt…what happend when Phobos has finished and it has encrypted all files?, IV changes when it starts another encrypting “session” (you can see it that in the same hard disk you can have several IV’s in the name of the encrypted files corresponding, I suppose , to different AES Keys)….How you could find an AES key from a old “encrypting session”?…..Thanks!!
  - hasherezade says:
    
    February 15, 2021 at 7:36 pm
    
    this is why I said earlier: it is impossible to crack it. once the AES key is destroyed, there is no way to recover it. and as I also mentioned earlier, brute-forcing it is impossible because there are too many values in the range, and checking all of them would exceed our lifetime.
  - Jesus Conejo says:
    
    February 16, 2021 at 11:46 am
    
    Yes, I understand what you say. My hope was that AES were built with some constant values plus random (IV + volume_serial + random value for example) and not only the random value. If not of course is impossible. I’m watching (re-reading your deep analysis) that at generating this 32 byte value it looks for a context, perhaps looking for a key container?…It’s not clear for me the way virus reads/hide the strings and file extensions….It hides those strings in memory? if so..which AES key uses (if there are not any files to encrypt) ?, …what do you think?. Thanks!!!
  - hasherezade says:
    
    February 16, 2021 at 9:43 pm
    
    “My hope was that AES were built with some constant values plus random (IV + volume_serial + random value for example) and not only the random value” – but this is not the case. All the 32 values are completely random, generated by a cryptographically strong generator. This is why cracking it is really impossible, sorry.
Jesus Conejo says:

February 17, 2021 at 12:51 pm

Hi Alexandra. Yes, ok. Just for continue learning…what about the encrypted strings questions?…what do you think?. Thanks so much!!

aaa says:

May 17, 2021 at 10:52 am

Windbg Preview – use it for kernel debugging.

moshe says:

November 5, 2021 at 12:34 am

Your links to tuts4you aren’t working, for example the unpacking section.
can you please fix them?

- hasherezade says:
  
  November 14, 2021 at 8:50 pm
  
  ok, I added a mirror to the unpacking tutorials from tuts4you. unfortunately their main site is down.
  
Christoff Sogon, a dangerous toy maker und trainer (@ChristoffSogon) says:

September 18, 2022 at 7:56 pm

Holy smokes! I just noticed this resource and this is absolutely outstanding!

qazerr says:

May 4, 2023 at 3:58 pm

Hi. The link ““So You Want To Be A Malware Analyst” – by Adam Kujawa ” is broken (URL is doubled). Thanks.

- hasherezade says:
  
  May 6, 2023 at 11:52 pm
  
  thanks for the heads-up, fixed!
  
Burak says:

June 8, 2023 at 2:52 pm

Hİ GoaT
Should a non-native English speaker first learn English to start reverse engineering?
Also, is learning C# very useful in reverse engineering?
Also my last question will be a bit illegal xd. I want to start reverse engineering to crack programs. can every program be cracked?

- hasherezade says:
  
  July 9, 2023 at 5:40 pm
  
  Hello Burak :)! Yes, I think learning English is important, for various reasons. Even code in many programming languages contains English words, so at least you need some basic grasp. Also, not all tools are localized. But more importantly, most documentation, and the latest publications, are written in English. Yes, you can use automatic translators, but the quality of the produced output varies, and it may be misleading sometimes, so it’s better not to be dependent on them. Also, English is helpful for interacting with the global community. It is useful if you can write your own publications in English, as you will reach a much broader audience. So, although in theory you can manage without English, not knowing it may slow down your progress significantly. Should you learn it first? I would say, build up a good basis, and you will learn more along the way, even by reading technical articles.
  Learning C# is useful, but at some point you will probably want to reverse native applications, so knowing C# is not enough. I would say it is a plus, but give more focus to C/C++ etc. It will be helpful not only in writing your own tools, but also to get a better understanding of how the native code is structured, memory management, etc. Also, most decompilers of native applications produce a C-like output.
  Lastly: can every program be cracked – in theory yes. Just, some are harder to crack than others. Yet, sometimes you can encounter elements that cannot be cracked, because they are protected with strong cryptographic algorithms. Even in such cases though, it may sometimes be possible to find some workaround, to achieve your goal without cracking the algorithm. Many things seem uncrackable, until someone comes and cracks it ;). So, you never know.
  I hope it answers your questions, cheers!
  
  - Burak says:
    
    August 11, 2023 at 9:27 am
    
    thx Goat ❤
0x539 says:

November 20, 2023 at 1:59 am

Did you install a Windows XP virtual machine for following Lena151’s playlist or what kind of environmental setup you had ?

- Sabko says:
  
  May 11, 2024 at 10:04 am
  
  I installed a Windows 7 VM and x64dbg. It is easy to follow
  