Before diving into the malware world, we need a solid understanding of the core of the machines we are analyzing malware on. For reverse engineering purposes, it makes sense to focus largely on the architecture and the operating system it supports. Of course, there are multiple devices and modules that comprise a system, but it is mainly these two that define the set of tools and approaches used during the analysis. The physical representation of any architecture is a processor. A processor is like the heart of any smart device or computer in that it keeps it alive.

In this chapter, we will cover the basics of the most widely used architectures, from the well-known x86 and x64 Instruction Set Architectures (ISAs) to solutions powering multiple mobile and Internet of Things (IoT) devices that are often misused by malware families, such as Mirai and many others. It will set the tone for your journey into malware analysis, as static analysis is impossible without understanding assembly instructions. Although modern decompilers keep getting better, they don't exist for all platforms that are targeted by malware. Additionally, they will probably never be able to handle obfuscated code. Don't be daunted by the complexity of assembly; it just takes time to get used to it, and after a while, it becomes possible to read it like any other programming language. While this chapter provides a starting point, it always makes sense to deepen your knowledge by practicing and exploring further.

This chapter is divided into the following sections to facilitate the learning process:

Basic concepts
Assembly languages
Becoming familiar with x86 (IA-32 and x64)
Exploring ARM assembly
Basics of MIPS
Covering the SuperH assembly
Working with SPARC
Moving from assembly to high-level programming languages

Basic concepts

Most people don't really understand that the processor is pretty much a smart calculator. If you look at most of its instructions (whatever the assembly language is), you will find many of them dealing with numbers and doing some calculations. However, there are multiple features that actually differentiate processors from usual calculators, for example:

Processors have access to a bigger memory space compared to traditional calculators. This memory space gives them the ability to store billions of values, which allows them to perform more complex operations. Additionally, they have multiple fast and small memory storage units embedded inside the processors' chip called registers.
Processors support many instruction types other than arithmetic instructions, such as changing the execution flow based on certain conditions.
Processors are able to communicate with other devices (such as speakers, mics, hard disks, graphics card, and so on).

Armed with such features in conjunction with great flexibility, processors became the go-to smart machines for technologies such as AI, machine learning, and others. In the following sections, we will explore these features and later will dive deeper into different assembly languages and how these features are manifested in these languages' instruction sets.

Registers

As most of the processors have access to a huge memory space storing billions of values, it takes longer for the processor to access the data (and it gets complex, as we will see later). So, to speed up the processor operations, they contain small and fast internal memory storage units called registers.

Registers are built into the processor chip and are able to store the immediate values that are needed while performing calculations and data transfer from one place to another.

Registers may have different names, sizes, and functions, depending on the architecture. Here are some of the types that are widely used:

General data registers: General data registers are registers that are used to save values or results from different arithmetic and logical operations.
Stack and frame pointers: These are registers that are used to point to the top of the stack and to the base of the current function's stack frame, respectively.
Instruction pointer/program counter: The instruction pointer is used to point to the start of the next instruction to be executed by the processor.

Memory

Memory plays an important role in the development of all smart devices that we see nowadays. The ability to manage lots of values, text, images, and videos on a fast and volatile memory allows processors to process more information and eventually perform more complicated operations, such as displaying graphical interfaces in 3D and virtual reality.

Virtual memory

In modern operating systems, whether they are 32-bit or 64-bit based, the operating system allocates an isolated virtual memory (whose pages are mapped to physical memory pages) for each application to protect the operating system's and the other applications' data.

Regular applications are supposed to be able to access only their own virtual memory. They can read, write, or execute instructions in their virtual memory pages. Each virtual memory page has a set of permissions assigned to it that represent the types of operations that the application is allowed to perform on this page. These permissions are read, write, and execute. Additionally, multiple permissions can be assigned to each memory page.
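The permission model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Perm` and `VirtualMemory` names, the 4 KiB page size, and the example addresses are hypothetical and do not correspond to any real operating system API.

```python
from enum import Flag


class Perm(Flag):
    """Permission bits that can be combined on a single page."""
    NONE = 0
    READ = 1
    WRITE = 2
    EXECUTE = 4


PAGE_SIZE = 0x1000  # 4 KiB pages (an assumption made for this sketch)


class VirtualMemory:
    """Toy per-process table mapping page numbers to permissions."""

    def __init__(self):
        self.pages = {}

    def map_page(self, address, perms):
        self.pages[address // PAGE_SIZE] = perms

    def is_allowed(self, address, wanted):
        # Unmapped pages grant no permissions at all.
        perms = self.pages.get(address // PAGE_SIZE, Perm.NONE)
        return (perms & wanted) == wanted


vm = VirtualMemory()
vm.map_page(0x401000, Perm.READ | Perm.EXECUTE)  # a code page
vm.map_page(0x402000, Perm.READ | Perm.WRITE)    # a data page
```

With this setup, an attempt to write to the code page or to touch an unmapped address is rejected, while reading and writing the data page is allowed.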

For an application to attempt to access any value stored in memory, it needs a virtual address, which is basically the address of where this value is stored in memory.

Knowing the virtual address is not enough, however, because the address itself has to be stored somewhere. A virtual address takes 4 bytes on 32-bit systems and 8 bytes on 64-bit systems, so we may need to allocate another space in memory to hold it. To access this new space directly by its address, we would in turn need to store that address in yet another memory space, which leads to an infinite loop, as shown in the following figure:

Figure 1: Virtual memory addresses

To solve this problem, multiple solutions are used nowadays, and in the next section, we will cover one of them, which is the stack.

Stack

Stack literally means a pile of objects. In computer science, a stack is basically a data structure that helps to save different values in memory with the same size in a pile structure using the principle of Last In First Out (LIFO).

The top of the stack (where the next element will be placed) is pointed to by a dedicated stack pointer, which will be discussed in greater detail below.

A stack is common to many assembly languages and it has several functions. For example, it may help in solving mathematical equations, such as X = 5*6 + 6*2 + 7*(4 + 6), by calculating each term and pushing it onto the stack, and later popping (or pulling) the values back to calculate their sum and save it in variable X.
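The calculation above can be traced with a minimal Python sketch, where a list plays the role of the stack. The `push` and `pop` helpers are illustrative names only.

```python
stack = []


def push(value):
    stack.append(value)      # place a value on top of the stack


def pop():
    return stack.pop()       # remove and return the top value


# Calculate each term and push it onto the stack.
push(5 * 6)
push(6 * 2)
push(7 * (4 + 6))

# Pop the values back (last in, first out) and sum them up.
x = 0
pop_order = []
while stack:
    value = pop()
    pop_order.append(value)
    x += value
```

Note that the values come back in reverse order of insertion, which is the LIFO principle in action.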

It is also commonly used to pass arguments (especially if there are a lot of them) and store local variables.

A stack is also used to save the return address just before calling a function or a subroutine. After this routine finishes, the return address is popped back from the top of the stack and execution returns to where the routine was called from.

While the stack pointer is generally pointing to the current top of the stack, the frame pointer is keeping the address of the top of the stack before the subroutine call, so it can be easily restored after it is returned.

Branches, loops, and conditions

The second feature that processors have is the ability to change the execution flow of a program based on a given condition. In every assembly language, there are multiple comparison instructions and flow control instructions. The flow control instructions can be divided into the following categories:

Unconditional jump: This is a type of instruction that forcefully changes the flow of the execution to another address (without any given condition).
Conditional jump: This is like a logical gate that switches to another branch based on the given condition (such as equal to zero, greater than, or lower than), as shown in the following figure:

Figure 2: An example of a conditional jump

Call: This changes the execution to another function and saves the return address to be restored later if necessary.
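To make the three categories concrete, here is a toy interpreter sketched in Python. The instruction names (`set`, `add`, `cmp`, `jmp`, `jnz`, `call`, `ret`, `halt`) and the single `acc` register are assumptions made for this example; they only loosely mirror real assembly.

```python
def run(program):
    """Execute a list of (opcode, operand) tuples on a toy machine."""
    pc = 0          # instruction pointer: index of the next instruction
    acc = 0         # a single data register
    zero = False    # set by 'cmp' when acc equals the operand
    stack = []      # holds return addresses for 'call'/'ret'
    while pc < len(program):
        op, arg = program[pc]
        pc += 1                  # by default, continue with the next instruction
        if op == "set":
            acc = arg
        elif op == "add":
            acc += arg
        elif op == "cmp":
            zero = (acc == arg)
        elif op == "jmp":        # unconditional jump
            pc = arg
        elif op == "jnz":        # conditional jump: taken if the values differed
            if not zero:
                pc = arg
        elif op == "call":       # save the return address, then jump
            stack.append(pc)
            pc = arg
        elif op == "ret":        # resume right after the matching call
            pc = stack.pop()
        elif op == "halt":
            break
    return acc


program = [
    ("set", 0),      # 0
    ("call", 5),     # 1: call the subroutine at index 5
    ("cmp", 9),      # 2
    ("jnz", 1),      # 3: loop until acc == 9
    ("halt", None),  # 4
    ("add", 3),      # 5: subroutine body
    ("ret", None),   # 6
]
result = run(program)
```

The subroutine is called three times before the comparison succeeds and the conditional jump falls through to `halt`.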

Exceptions, interrupts, and communicating with other devices

In assembly language, communication with different hardware devices is done through what's called interrupts.

An interrupt is a signal to the processor sent by the hardware or software indicating that there's something happening or there is a message to be delivered. The processor suspends its current running process, saving its state, and executes a function called an interrupt handler to deal with this interrupt. Interrupts have their own notation and are widely used to communicate with hardware for sending requests and dealing with their responses.

There are two types of interrupts. Hardware interrupts are generally used to handle external events when communicating with hardware. Software interrupts are caused by software, usually by calling a particular instruction. The difference between an interrupt and an exception is that exceptions take place within the processor rather than externally. An example of an operation generating an exception can be a division by zero.
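The save, handle, and resume sequence can be sketched as follows. This is a simplified Python model, not how any real processor is programmed: the `ToyCPU` class, the vector numbers, and the handler names are all hypothetical.

```python
DIVIDE_ERROR = 0   # hypothetical vector numbers, chosen for this sketch
TIMER = 1


class ToyCPU:
    def __init__(self):
        self.registers = {"pc": 0x1000, "acc": 7}
        self.vector_table = {}     # interrupt number -> handler function
        self.log = []

    def register_handler(self, number, handler):
        self.vector_table[number] = handler

    def interrupt(self, number):
        saved = dict(self.registers)       # suspend: save the current state
        self.vector_table[number](self)    # run the interrupt handler
        self.registers = saved             # resume the interrupted code

    def divide(self, a, b):
        if b == 0:
            self.interrupt(DIVIDE_ERROR)   # exception raised inside the CPU
            return None
        return a // b


def on_divide_error(cpu):
    cpu.registers["acc"] = 0               # handlers may use registers freely
    cpu.log.append("divide error")


def on_timer(cpu):
    cpu.log.append("timer tick")


cpu = ToyCPU()
cpu.register_handler(DIVIDE_ERROR, on_divide_error)
cpu.register_handler(TIMER, on_timer)

cpu.interrupt(TIMER)           # an external, hardware-style event
quotient = cpu.divide(10, 2)
failed = cpu.divide(1, 0)      # triggers the exception path
```

In this model, the timer event comes from outside the computation, while the division by zero is detected by the processor itself, which matches the distinction between interrupts and exceptions drawn above.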

Assembly languages

There are two big groups of architectures defining assembly languages that we will cover in this section, and they are Complex Instruction Set Computer (CISC) and Reduced Instruction Set Computer (RISC).

CISC versus RISC

Without going into too many details, the main difference between CISC assemblies, such as Intel IA-32 and x64, and RISC assembly languages associated with architectures such as ARM, is the complexity of their instructions.

CISC assembly languages have more complex instructions. They focus on completing tasks using as few lines of assembly instructions as possible. To do that, CISC assembly languages include instructions that can perform multiple operations, such as mul in Intel assembly, which performs data access, multiplication, and data store operations.

In the RISC assembly language, assembly instructions are simple and generally perform only one operation each. This may lead to more lines of code to complete a specific task. However, it may also be more efficient, as this omits the execution of any unnecessary operations.
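As a rough illustration of this difference, the following Python sketch models adding a register to a value in memory. The function names and the `r0`/`r1` register names are assumptions made for this example; one function stands for a single CISC-style instruction, and three functions stand for the equivalent RISC-style sequence.

```python
memory = {0x1000: 40}
regs = {"r0": 2, "r1": 0}


def cisc_add_mem_reg(addr, reg):
    """One CISC-style instruction: read memory, add, and write back."""
    memory[addr] = memory[addr] + regs[reg]


def risc_load(dst, addr):
    """RISC-style load: memory -> register."""
    regs[dst] = memory[addr]


def risc_add(dst, a, b):
    """RISC-style ALU operation: registers only."""
    regs[dst] = regs[a] + regs[b]


def risc_store(src, addr):
    """RISC-style store: register -> memory."""
    memory[addr] = regs[src]


# CISC: a single instruction does all the work.
cisc_add_mem_reg(0x1000, "r0")
cisc_result = memory[0x1000]

# RISC: the same task takes three simple instructions.
memory[0x1000] = 40
risc_load("r1", 0x1000)
risc_add("r1", "r1", "r0")
risc_store("r1", 0x1000)
risc_result = memory[0x1000]
```

Both variants produce the same value in memory; only the number and the complexity of the instructions differ.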

Types of instructions

In the following sections, we will cover the main structure of each assembly language, the three basic types of assembly instructions, and how they are implemented in each of them:

Data manipulation:
- Arithmetic manipulation
- Logic and bit manipulation
- Shifts and rotations
Data transfers:
- Transfers between memory and registers
- Transfers between registers
Execution of flow control:
- Jumps or calls
- Branches based on a condition

Becoming familiar with x86 (IA-32 and x64)

Intel x86 (IA-32 and x64) is the most common architecture used in PCs and is powering many servers, so it is no surprise that most of the malware samples we have at the moment target it. IA-32 is also commonly referred to as i386 (succeeded by i686) or even simply x86, while x64 is also known as x86-64 or AMD64. x86 is a CISC architecture, and it includes multiple complex instructions in addition to simple ones. In this section, we will introduce the most common of them, along with how compilers take advantage of them in their calling conventions.

Registers

Here is a table showing the relationship between registers in IA-32 and x64 architectures:

Figure 3: Registers used in the x86 architecture

r8 to r15 are available only in x64 and not in IA-32, and spl, bpl, sil, and dil can be accessed only in x64.

The first four registers (rax, rbx, rcx, and rdx) are General-Purpose Registers (GPRs), but some of them have the following special use cases for certain instructions:

rax/eax: This is used to store information and it's an implicit operand for some calculations, such as mul and div
rcx/ecx: This is used as a counter register in loop instructions
rdx/edx: This is used in division to return the modulus

In x64, the registers from r8 to r15 are also GPRs that were added to the available GPRs.

The rsp/esp register is used as a stack pointer that points to the top of the stack. Its value decreases when there's a value getting pushed to the stack, and increases, when there's a value getting pulled out from the stack. The rbp/ebp register is used as a frame pointer and it's helpful for accessing the function's local variables and arguments, as we will see later in this section. In addition to this, rbp/ebp is sometimes used as a GPR for storing any kind of data.

rsi/esi and rdi/edi are used mostly to define the addresses when copying a group of bytes in memory. The rsi/esi register always plays the role of the source and the rdi/edi register plays the role of the destination. Both registers are non-volatile in the 32-bit calling conventions and on 64-bit Windows, and are also GPRs.

Special registers

There are two special registers in Intel assembly and they are as follows:

rip/eip: This is an instruction pointer that points to the next instruction to be executed. It cannot be accessed directly but there are special instructions to access it.
rflags/eflags/flags: This register contains the current state of the processor. Its flags are affected by the arithmetic and logical instructions including comparison instructions such as cmp and test, and it's used with conditional jumps and other instructions as well. Here are the most common flags:
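- Carry flag (CF): This flag is set when an unsigned arithmetic operation produces a carry or a borrow out of the most significant bit, that is, when the result goes out of bounds; look at the following operation:

mov al, 0FFh ;al = 0xFF (mov leaves CF unchanged)
 add al, 1 ;al = 0 & CF = 1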

- Zero flag (ZF): This flag is set when the arithmetic or a logical operation's result is zero. This could also be set with compare instructions.
- Sign flag (SF): This flag indicates that the result of the operation is negative.
- Overflow flag (OF): This flag indicates that an overflow occurred in an operation, leading to a change of the sign (only on signed numbers), as follows:

mov cl, 7Fh   ;cl = 0x7F (127) & OF = 0
 inc cl      ;cl = 0x80 (-128) & OF = 1
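The way these four flags are derived can be sketched in Python for an 8-bit addition. The `add8` function is a hypothetical helper that models the `add` instruction only (note that `inc`, used in the last example, leaves CF untouched on real hardware).

```python
def add8(a, b):
    """Add two 8-bit values and return (result, flags) like an 8-bit add."""
    a &= 0xFF
    b &= 0xFF
    full = a + b
    result = full & 0xFF
    flags = {
        "CF": full > 0xFF,              # unsigned carry out of bit 7
        "ZF": result == 0,              # the result is zero
        "SF": bool(result & 0x80),      # the top bit is the sign bit
        # Signed overflow: both inputs share a sign that the result lacks.
        "OF": bool(~(a ^ b) & (a ^ result) & 0x80),
    }
    return result, flags


carry_result, carry_flags = add8(0xFF, 1)        # mirrors the CF example
overflow_result, overflow_flags = add8(0x7F, 1)  # mirrors the OF example
```

The first call wraps around to zero and sets CF and ZF, while the second one flips the sign of a positive number and sets OF and SF.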

There are other registers as well, such as MMX and FPU registers (and instructions to work with them) but we won't cover them in this chapter.

The instruction structure

For Intel x86 assembly (IA-32 or x64), the common structure of its instructions is opcode, dest, src.

Let's get deeper into them.

opcode

opcode is the name of the instruction. Some instructions have only opcode without any dest or src such as the following:

nop, pushad, popad, movsb

pushad and popad are not available in x64.

dest

dest represents the destination or where the result of the calculations will be saved, as well as becoming part of the calculations themselves like this:

add eax, ecx ;eax = (eax + ecx)
 sub rdx, rcx ;rdx = (rdx - rcx)

Also, it could play a role of a source and a destination with some opcode instructions that take only dest without a source:

inc eax
 dec ecx

Or, it could be only the source or the destination, such as in the case of these instructions that save the value on the stack and then pop it back:

push rdx
 pop rcx

dest could look like the following:

REG: A register such as eax and edx.
r/m: A place in memory such as the following:

DWORD PTR [00401000h]
 BYTE PTR [EAX + 00401000h]
 WORD PTR [EDX*4 + EAX + 30]

A value in the stack (used to represent local variables), such as the following:

DWORD PTR [ESP+4]
 DWORD PTR [EBP-8]

src

src represents the source or another value in the calculations, but it doesn't save the results afterward. It may look like this:

REG: For instance, add rcx, r8
r/m: For instance, add ecx, dword ptr [00401000h]
imm: An immediate value such as mov eax, 00100000h

The instruction set

Here, we will cover the different types of instructions that we listed in the previous section.

Data manipulation instructions

Some of the arithmetic instructions are as follows:

Instruction	Structure	Description
`add`/`sub`	`add`/`sub dest, src`	`dest = dest + src`/`dest = dest - src`
`inc`/`dec`	`inc`/`dec dest`	`dest = dest + 1`/`dest = dest - 1`
`mul`	`mul src`	(Unsigned multiply) `rdx:rax = rax * src`
`div`	`div src`	`rdx:rax`/`src` (returns the result in `rax` and the remainder/modulus in `rdx`)
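The implicit use of the `rdx:rax` pair by `mul` and `div` can be sketched in Python as follows. The `mul64` and `div64` helpers are hypothetical names; they model the unsigned 64-bit forms only.

```python
MASK64 = (1 << 64) - 1


def mul64(rax, src):
    """Unsigned mul: return (rdx, rax) holding the 128-bit product."""
    product = (rax & MASK64) * (src & MASK64)
    return product >> 64, product & MASK64


def div64(rdx, rax, src):
    """Unsigned div of rdx:rax by src: return (quotient, remainder)."""
    if src == 0:
        raise ZeroDivisionError("the processor raises a divide error here")
    dividend = ((rdx & MASK64) << 64) | (rax & MASK64)
    quotient, remainder = divmod(dividend, src)
    if quotient > MASK64:
        # The quotient must fit in rax; otherwise a divide error is raised.
        raise OverflowError("the quotient does not fit in rax")
    return quotient, remainder


high, low = mul64(1 << 63, 4)          # the product needs more than 64 bits
quotient, remainder = div64(0, 17, 5)  # rax = 3, rdx = 2 after the division
```

This is why compilers usually clear `rdx` (or `edx`) before an unsigned division: its previous content is treated as the upper half of the dividend.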

Additionally, for logic and bits manipulation, they are like this:

Instruction	Structure	Description
`or`/`and`/`xor`	`or`/`and`/`xor dest, src`	`dest = dest | src`/`dest = dest & src`/`dest = dest ^ src`
`not`	`not dest`	`dest = ~dest` (the bits are flipped)

And, lastly, for shifts and rotations they are like this:

Instruction	Structure	Description
`shl`/`shr`	`shl`/`shr dest, src` (`src` is bounded by the `dest` register's maximum number of bits, such as 32 or 64)	`dest = dest << src`/`dest = dest >> src` (shifts the `dest` register's bits to the left or the right, which is the same effect as multiplying or dividing by two `src` times)
`rol`/`ror`	`rol`/`ror dest, src` (same as `shl` and `shr`)	Rotates the `dest` register's bits left or right
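The difference between shifting and rotating can be sketched in Python for 32-bit operands. The function names are illustrative; the sketch assumes the count is masked to 5 bits, as x86 does for 32-bit operands.

```python
MASK32 = 0xFFFFFFFF


def shl32(value, count):
    """Shift left: bits pushed out on the left are lost."""
    return (value << (count & 31)) & MASK32


def shr32(value, count):
    """Logical shift right: bits pushed out on the right are lost."""
    return (value & MASK32) >> (count & 31)


def rol32(value, count):
    """Rotate left: bits pushed out on the left re-enter on the right."""
    count &= 31
    value &= MASK32
    return ((value << count) | (value >> (32 - count))) & MASK32


def ror32(value, count):
    """Rotate right: bits pushed out on the right re-enter on the left."""
    count &= 31
    value &= MASK32
    return ((value >> count) | (value << (32 - count))) & MASK32
```

For example, `shl32(1, 4)` multiplies by two four times, while `rol32(0x80000001, 1)` moves the top bit around to the bottom instead of discarding it.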

Data transfer instructions

There's a mov instruction, which copies a value from src to dest. This instruction has multiple forms, as we can see in this table:

Instruction	Structure	Description
`mov`	`mov dest`, `src`	`dest = src`
`movsx`/`movzx`	`movsx`/`movzx dest, src`	`src` is smaller than `dest` (for example, `src` is 16 bits and `dest` is 32 bits). `movzx`: sets the remaining bits in `dest` to zero; `movsx`: fills the remaining bits with the sign bit of `src`, preserving its sign
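The two extension modes can be compared with a short Python sketch. The helper names reuse the instruction mnemonics for readability, and the default sizes (16-bit source, 32-bit destination) are just the example used in the table.

```python
def movzx(value, src_bits=16):
    """Zero-extend: keep the src bits; every upper bit of dest is zero."""
    return value & ((1 << src_bits) - 1)


def movsx(value, src_bits=16, dest_bits=32):
    """Sign-extend: the upper bits of dest copy the sign bit of src."""
    src_mask = (1 << src_bits) - 1
    dest_mask = (1 << dest_bits) - 1
    value &= src_mask
    if value & (1 << (src_bits - 1)):      # is the sign bit of src set?
        value |= dest_mask & ~src_mask     # fill the upper bits with ones
    return value


zero_extended = movzx(0xFFFE)   # treated as the unsigned value 65534
sign_extended = movsx(0xFFFE)   # treated as the signed value -2
```

The same 16-bit pattern therefore ends up as two different 32-bit values, depending on whether the program treats it as signed or unsigned.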

Other instructions related to stack are like this:

Instruction	Structure	Description
`push`/`pop`	`push`/`pop dest`	Pushes the value to the top of the stack (`esp = esp - 4`)/pulls the value out of the stack (`esp = esp + 4`); in x64, `rsp` changes by 8 instead
`pushad`/`popad`	`pushad`/`popad`	Saves all registers to the stack/pulls out all registers from the stack (in IA-32 only)

For string manipulation, they are like this:

Instruction	Structure	Description
`lodsb`/`lodsw`/`lodsd`/`lodsq`	`lodsb`/`lodsw`/`lodsd`/`lodsq`	Loads a byte, 2 bytes, 4 bytes, or 8 bytes from the address `rsi`/`esi` into `al`/`ax`/`eax`/`rax`
`stosb`/`stosw`/`stosd`/`stosq`	`stosb`/`stosw`/`stosd`/`stosq`	Stores a byte, 2 bytes, 4 bytes, or 8 bytes at the address `rdi`/`edi` from `al`/`ax`/`eax`/`rax`
`movsb`/`movsw`/`movsd`/`movsq`	`movsb`/`movsw`/`movsd`/`movsq`	Copy a byte, 2 bytes, 4 bytes, or 8 bytes from the address `rsi`/`esi` to the address `rdi`/`edi`

Flow control instructions

Some of the unconditional redirections are as follows:

Instruction	Structure	Description
`jmp`	`jmp <relative address>` `jmp DWORD/QWORD ptr [Absolute Address]`	The relative address is calculated from the start of the next instruction after `jmp` to the destination
`call`	`call <relative address>` `call DWORD/QWORD ptr [Absolute Address]`	Same as `jmp` but it saves the return address in the stack
`ret`/`retn`	`ret imm`	Pulls the return address from the stack, for some calling conventions cleans the stack from the pushed arguments, and jumps to that address
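The relative address calculation can be sketched in Python for the common 5-byte form of `jmp` (opcode `0xE9` followed by a signed 32-bit displacement). The `jmp_rel32_target` helper and the example addresses are hypothetical, and the result is truncated to 32 bits on the assumption of IA-32 code.

```python
import struct


def jmp_rel32_target(instruction_address, instruction_bytes):
    """Resolve the destination of a 5-byte `jmp rel32` (opcode 0xE9)."""
    if len(instruction_bytes) != 5 or instruction_bytes[0] != 0xE9:
        raise ValueError("expected a 5-byte E9 rel32 jump")
    # The displacement is a signed, little-endian 32-bit integer.
    (rel32,) = struct.unpack("<i", instruction_bytes[1:5])
    # It is counted from the start of the next instruction.
    next_instruction = instruction_address + 5
    return (next_instruction + rel32) & 0xFFFFFFFF


forward = jmp_rel32_target(0x401000, bytes.fromhex("E9FB0F0000"))
backward = jmp_rel32_target(0x401000, bytes.fromhex("E9FBFFFFFF"))
```

A displacement of -5 makes the instruction jump to itself, which is a handy sanity check when patching jumps by hand.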

Some of the conditional redirections are as follows:

Instruction	Structure	Description
`jnz`/`jz`/`jb`/`ja`	`jz`/`jnz` `<relative address>`	Similar to `jmp`, but jumps based on a condition
`loop`	`loop` `<relative address>`	Similar to `jmp`, but it decrements `rcx`/`ecx` and jumps if it didn't reach zero (uses `rcx`/`ecx` as a loop counter)
`rep`	`rep opcode dest`, `src` (if needed)	`rep` is a prefix that is used with string instructions; it decrements `rcx`/`ecx`, and repeats the instruction until `rcx`/`ecx` reaches zero
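The interaction between the `rep` prefix and a string instruction can be sketched in Python. The `rep_movsb` function below is a simplified model that assumes the direction flag is cleared (forward copy) and uses a `bytearray` as a stand-in for memory.

```python
def rep_movsb(memory, regs):
    """Model `rep movsb` with the direction flag cleared (forward copy)."""
    while regs["ecx"] != 0:
        memory[regs["edi"]] = memory[regs["esi"]]  # movsb copies one byte
        regs["esi"] += 1                           # both pointers advance
        regs["edi"] += 1
        regs["ecx"] -= 1                           # rep counts ecx down
    return memory


memory = bytearray(b"MALWARE" + bytes(7))
regs = {"esi": 0, "edi": 7, "ecx": 7}   # source, destination, byte count
rep_movsb(memory, regs)
```

After the loop, `ecx` is zero and both pointers have moved past the copied data, which is a pattern worth recognizing when reading disassembled copy routines.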

Arguments, local variables, and calling conventions (in x86 and x64)

There are multiple ways in which compilers represent functions, calls, local variables, and more. We will cover only some of them, starting with the standard call (stdcall), which is only used in x86, and then looking at how the other calling conventions differ from it.

stdcall

The stack, rsp/esp, and rbp/ebp registers do most of the work when it comes to arguments and local variables. The call instruction saves the return address at the top of the stack before transferring the execution to the new function, and the ret instruction at the end of the function returns the execution back to the caller function using the return address saved in the stack.

Arguments

For stdcall, the arguments are also pushed in the stack from the last argument to the first like this:

push Arg02
 push Arg01
 call Func01

In the Func01 function, the arguments can be accessed through rsp/esp, keeping in mind how many values have been pushed to the stack since the function was entered, with something like this:

mov eax, [esp + 4] ;Arg01
 push eax
 mov ecx, [esp + 8] ; Arg01 keeping in mind the previous push

In this case, the value located at the address specified by the value inside the square brackets is transferred. Fortunately, modern static analysis tools, such as IDA Pro, can detect which argument is being accessed in each instruction, as in this case.

The most common way to access arguments, as well as local variables, is by using rbp/ebp. First, the called function needs to save the current rsp/esp in rbp/ebp register and then access them this way:

push ebp
 mov ebp, esp
 ...
 mov ecx, [ebp + 8] ;Arg01
 push eax
 mov ecx, [ebp + 8] ;still Arg01 (no changes)

And, at the end of the called function, it returns back the original value of rbp/ebp and the rsp/esp like this:

mov esp,ebp
 pop ebp
 ret

As it's a common function epilogue, Intel created a special instruction for it, which is leave, so it became this:

leave
 ret

Local variables

For local variables, the called function allocates space for them by decreasing the value of the rsp/esp register. To allocate space for two variables of four bytes each, the code will be this:

push ebp
 mov ebp,esp
 sub esp, 8

Additionally, the end of the function will be this:

mov esp,ebp
 pop ebp
 ret

Figure 4: An example of a stack change at the beginning and at the end of the function

Additionally, if there are arguments, the ret instruction cleans the stack given the number of bytes to pull out from the top of the stack like this:

ret 8 ;2 Arguments, 4 bytes each
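The whole stdcall sequence, from pushing the arguments to `ret 8`, can be traced with a small Python model of a 32-bit stack. The `Stack32` class, the starting address, the return address, and the argument values are all hypothetical and exist only for this walkthrough.

```python
class Stack32:
    """Toy 32-bit stack that grows toward lower addresses."""

    def __init__(self, top=0x0012FF00):
        self.esp = top
        self.ebp = 0
        self.mem = {}

    def push(self, value):
        self.esp -= 4
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4
        return value

    def read(self, address):
        return self.mem[address]


s = Stack32()
start_esp = s.esp

# Caller: push Arg02 / push Arg01 / call Func01
s.push(0x22222222)          # Arg02
s.push(0x11111111)          # Arg01
s.push(0x00401005)          # return address saved by `call`

# Callee prologue: push ebp / mov ebp, esp / sub esp, 8
s.push(s.ebp)
s.ebp = s.esp
s.esp -= 8                  # room for two 4-byte local variables

arg01 = s.read(s.ebp + 8)   # [ebp + 8]  is the first argument
arg02 = s.read(s.ebp + 12)  # [ebp + 12] is the second argument

# Callee epilogue: mov esp, ebp / pop ebp / ret 8
s.esp = s.ebp
s.ebp = s.pop()
return_address = s.pop()    # `ret` pops the return address...
s.esp += 8                  # ...and `ret 8` also discards both arguments
```

At the end, `esp` is back where it was before the caller pushed anything, which is exactly what the callee cleanup of stdcall guarantees.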

cdecl

cdecl (which stands for C declaration) is another calling convention that was used by many C compilers in x86. It's very similar to stdcall, with the only difference being that the caller cleans the stack after the callee function (the called function) returns, like this:

Caller:
    push Arg02
    push Arg01
    call Callee
    add esp, 8 ;cleans the stack

fastcall

The __fastcall calling convention is also widely used by different compilers, including Microsoft C++ compiler and GCC. This calling convention passes the first two arguments in ecx and edx, and pushes the remaining arguments to the stack. It's only used in x86 as there's only one calling convention for x64 on Windows.

thiscall

For object-oriented programming and for the non-static member functions (such as the classes' functions), the C++ compiler needs to pass the address of the object whose attributes will be accessed or manipulated, using it as an argument.

In the GCC compiler, thiscall is almost identical to the cdecl calling convention and it passes the object address as the first argument. But in the Microsoft C++ compiler, it's similar to stdcall and it passes the object address in ecx. It's common to see such patterns in some object-oriented malware families.

The x64 calling convention

In x64, the calling convention is more dependent on the registers. For Windows, the caller function passes the first four arguments in registers in this order: rcx, rdx, r8, r9, and the rest are pushed to the stack. For the other operating systems, the first six arguments are usually passed in registers in this order: rdi, rsi, rdx, rcx, r8, r9, and the remaining ones go to the stack.

In both cases, the caller is responsible for cleaning up the stack after the call, and the called function returns with a plain ret rather than ret imm.
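The register orders listed above can be turned into a small Python helper that shows where each argument ends up. This sketch covers integer and pointer arguments only (floating-point arguments, which use separate registers, are ignored), and the `place_arguments` name is an assumption of this example.

```python
WINDOWS_X64 = ["rcx", "rdx", "r8", "r9"]
OTHER_X64 = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]


def place_arguments(args, register_order):
    """Split integer/pointer arguments between registers and the stack."""
    in_registers = dict(zip(register_order, args))
    on_stack = list(args[len(register_order):])
    return in_registers, on_stack


args = [1, 2, 3, 4, 5, 6, 7]
win_regs, win_stack = place_arguments(args, WINDOWS_X64)
other_regs, other_stack = place_arguments(args, OTHER_X64)
```

For a function with seven arguments, three of them go to the stack on Windows, but only one on the other operating systems.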

Exploring ARM assembly

Most readers are probably more familiar with the x86 architecture, which implements the CISC design, and may wonder—why do we actually need something else? The main advantage of RISC architectures is that processors that implement them generally require fewer transistors, which eventually makes them more energy and heat efficient and reduces the associated manufacturing costs, making them a better choice for portable devices. We start our introduction to RISC architectures with ARM for a good reason—at the moment, this is the most widely used architecture in the world.

The explanation is simple—processors implementing it can be found on multiple mobile devices and appliances such as phones, video game consoles, or digital cameras, heavily outnumbering PCs. For this reason, multiple IoT malware families and mobile malware targeting Android and iOS platforms have payloads for ARM architecture; an example can be seen in the following screenshot:

Figure 5: Disassembled IoT malware targeting ARM-based devices

Thus, in order to be able to analyze them, it is necessary to understand how ARM works first.

ARM originally stood for Acorn RISC Machine, and later for Advanced RISC Machine. Acorn was a British company considered by many as the British Apple, producing some of the most powerful PCs of that time. It was later split into several independent entities with Arm Holdings (currently owned by SoftBank Group) supporting and extending the current standard.

There are multiple operating systems supporting it, including Windows, Android, iOS, various Unix/Linux distributions, and many other lesser known embedded OSes. The support for a 64-bit address space was added in 2011 with the release of the ARMv8 standard.

Overall, the following ARM architecture profiles are available:

Application profiles (suffix A, for example, the Cortex-A family): This implements a traditional ARM architecture and supports a virtual memory system architecture based on a Memory Management Unit (MMU). These profiles support both ARM and Thumb instruction sets (as discussed later).
Real-time profiles (suffix R, for example, the Cortex-R family): This implements a traditional ARM architecture and supports a protected memory system architecture based on a Memory Protection Unit (MPU).
Microcontroller profiles (suffix M, for example, the Cortex-M family): This implements a programmers' model and is designed for integration into Field Programmable Gate Arrays (FPGAs).

Each family has its own corresponding set of associated architectures (for example, the Cortex-A 32-bit family incorporates ARMv7-A and ARMv8-A architectures), which in turn incorporate several cores (for example, ARMv7-R architecture incorporates Cortex-R4, Cortex-R5, and so on).

Basics

Here, we will cover both the original 32-bit and the newer 64-bit architectures. There were multiple versions released over time, starting from the ARMv1. In this book, we will focus on the recent versions of them.

ARM is a load-store architecture; it divides all instructions into the following two categories:

Memory access: Moves data between memory and registers
Arithmetic Logic Unit (ALU) operations: Does computations involving registers

ARM supports arithmetic operations for adding, subtracting, and multiplying, and some new versions, starting from ARMv7, also support division operations. It supports big-endian order, and uses the little-endian format by default.
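The difference between the two byte orders can be seen with Python's standard `struct` module. The value `0x11223344` is an arbitrary example.

```python
import struct

value = 0x11223344

little = struct.pack("<I", value)   # least significant byte first
big = struct.pack(">I", value)      # most significant byte first

# Reading big-endian bytes as if they were little-endian scrambles the value.
misread = struct.unpack("<I", big)[0]
```

This matters in practice when parsing data extracted from a sample: interpreting it with the wrong byte order silently produces a different number.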

There are 16 registers visible at any time on the 32-bit ARM: R0-R15. This number is convenient as it takes only 4 bits to define which register is going to be used. Out of them, 13 (sometimes referred to as 14, including R14, or 15, also including R13) are general-purpose registers: R13 and R15 each have a special function, while R14 can take one occasionally. Let's have a look at them in greater detail:

R0-R7: Low registers are the same in all CPU modes.
R8-R12: High registers are the same in all CPU modes except the Fast Interrupt Request (FIQ) mode, and are generally not accessible by 16-bit instructions.
R13 (also known as SP): Stack pointer—points to the top of the stack, and each CPU mode has its own version of it. It is discouraged to use it as a GPR.
R14 (also known as LR): Link register—in user mode it contains the return address for the current function, mainly when BL (Branch with Link) or BLX (Branch with Link and eXchange) instructions are executed. It can also be used as a GPR if the return address is stored on the stack. Each CPU mode has its own version of it.
R15 (also known as PC): Program counter, points to the currently executed command. It's not a GPR.

Altogether, there are 30 general-purpose 32-bit registers on most of the ARM architectures overall, including the same name instances in different CPU modes.

Apart from these, there are several other important registers, as follows:

Current Program Status Register (CPSR): This contains bits describing a current processor mode, a processor state, and some other values.
Saved Program Status Registers (SPSR): This stores the value of CPSR when the exception is taken, so it can be restored later. Each CPU mode has its own version of it, except the user and system modes, as they are not exception-handling modes.
Application Program Status Register (APSR): This stores copies of the ALU status flags, also known as condition code flags, and on later architectures, it also holds the Q (saturation) and the greater than or equal to (GE) flags.

The number of Floating-Point Registers (FPRs) for a 32-bit architecture may vary, depending on the core, up to 32.

ARMv8 (64-bit) has 31 general-purpose registers, X0-X30 (the R0-R30 notation can also be found), and 32 FPRs accessible at all times. The lower 32 bits of each register can be accessed using the W prefix, as W0-W30.

There are several registers that have a particular purpose, as follows:

Name	Size	Description
XZR/WZR	64/32 bits, respectively	Zero register
PC	64 bits	Program counter
SP/WSP	64/32 bits, respectively	Current stack pointer
ELR	64 bits	Exception link register
SPSR	32 bits	Saved processor state register

ARMv8 defines four exception levels (EL0-EL3), and each of the last three registers (SP, ELR, and SPSR) has a separate copy for each level, except that ELR and SPSR don't have a copy for EL0.

There is no register called X31 or W31; the number 31 in many instructions represents the zero register, ZR (WZR/XZR). X29 can be used as a frame pointer (which stores the original stack position), and X30 as a link register (which stores the return address of a function).
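The relationship between the X and W views, together with the zero register, can be modeled in Python. The `A64Registers` class is a hypothetical sketch; it assumes that register number 31 always means the zero register, which holds only for the instructions where 31 is not used to encode SP.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


class A64Registers:
    """Toy model of the 64-bit ARM general-purpose register file."""

    def __init__(self):
        self.x = [0] * 31                      # X0-X30

    def read_x(self, n):
        return 0 if n == 31 else self.x[n]     # 31 reads as zero (XZR)

    def read_w(self, n):
        return self.read_x(n) & MASK32         # W view: the low 32 bits

    def write_x(self, n, value):
        if n != 31:                            # writes to XZR are discarded
            self.x[n] = value & MASK64

    def write_w(self, n, value):
        # Writing a W register zero-extends into the full X register.
        self.write_x(n, value & MASK32)


r = A64Registers()
r.write_x(0, 0x1122334455667788)
low_half = r.read_w(0)          # only the lower 32 bits are visible as W0

r.write_w(0, 0xDEADBEEF)
after_w_write = r.read_x(0)     # the upper 32 bits of X0 are now zero

r.write_x(31, 5)                # has no effect
zero = r.read_x(31)
```

Keeping in mind that a write to a W register clears the upper half of the X register helps to avoid misreading 64-bit ARM disassembly.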

Regarding the calling convention, R0-R3 on the 32-bit ARM and X0-X7 on the 64-bit ARM are used to store argument values passed to functions, with the remaining arguments passed through the stack if necessary. R0-R1 and X0-X7 hold return results (X8, also known as XR, is used for this indirectly). If the type of the returned value is too big to fit them, then space needs to be allocated and returned as a pointer. Apart from this, R12 (32-bit) and X16-X17 (64-bit) can be used as intra-procedure-call scratch registers (by so-called veneers and procedure linkage table code). R9 (32-bit) and X18 (64-bit) can be used as platform registers (for OS-specific purposes) if needed; otherwise, they are used the same way as other temporaries.

As previously mentioned, there are several CPU modes implemented according to the official documentation, as follows:

Operating mode name	Abbreviation	Description
User	`usr`	Usual program execution state, used by most of the programs
Fast interrupt	`fiq`	Supports data transfer or channel process
Interrupt	`irq`	Used for general-purpose interrupt handling
Supervisor	`svc`	Protected mode for the OS
Abort	`abt`	Is entered after a data abort or prefetch abort exception
System	`sys`	Privileged user mode for the OS. Can be entered only from another privileged mode by modifying the mode bit of the CPSR
Undefined	`und`	Is entered when an undefined instruction is executed

Instruction sets

Two main instruction sets are available for ARM processors: ARM and Thumb. A processor that is executing ARM instructions is said to be operating in the ARM state, and one that is executing Thumb instructions is said to be operating in the Thumb state. ARM processors always start in the ARM state, and then a program can switch to the Thumb state by using a BX instruction. Thumb Execution Environment (ThumbEE) was introduced in ARMv7 and is based on Thumb, with some changes and additions to facilitate dynamically generated code.

ARM instructions are 32 bits long (for both AArch32 and AArch64), while Thumb and ThumbEE instructions are either 16 or 32 bits long (originally, almost all Thumb instructions were 16-bit, while Thumb-2 introduced a mix of 16- and 32-bit instructions).

All instructions can be split into the following categories according to the official documentation:

Instruction Group	Description	Examples
Branch and control	These instructions are used to: follow subroutines; go forward and backward for conditional structures and loops; make instructions conditional; switch between ARM and Thumb states	`B`: Branch `BX`: Branch and exchange instruction set `CBZ`: Compare against zero and branch `IT`: If-then, makes up to four following instructions conditional (32-bit Thumb)
Data processing	Operate with GPRs, support data movement between registers and arithmetic operations	`ADD`: Add `MOV`: Move data `MUL`: Multiply
Register load and store	Move data between registers and memory	`LDR`: Load register (word) `STRB`: Store register (1 byte) `SWP`: Swap register and memory content
Multiple register load and store	Load or store multiple GPRs from or to memory	`STM`/`LDM`: Store and load multiple registers to and from memory `PUSH`/`POP`: Push and pop registers to and from the stack
Status register access	Move the content of a status register (CPSR or SPSR) to or from a GPR	`MRS`: Move the contents of the CPSR or SPSR to a GPR `MSR`: Load specified fields of the CPSR or SPSR with an immediate value or another register's value
Coprocessor	Extend the ARM architecture; enable control of the system control coprocessor registers (CP15)	`CDP`/`CDP2`: Coprocessor data operations

In order to interact with the OS, syscalls can be accessed using the Software Interrupt (SWI) instruction, which was later renamed the Supervisor Call (SVC) instruction.

See the official ARM documentation to get the exact syntax for any instruction. Here is an example of how it may look:

SVC{cond} #imm

The {cond} code in this case will be a condition code. There are several condition codes supported by ARM, as follows:

EQ: Equal to
NE: Not equal to
CS/HS: Carry set or unsigned higher or same
CC/LO: Carry clear or unsigned lower
MI: Negative
PL: Positive or zero
VS: Overflow
VC: No overflow
HI: Unsigned higher
LS: Unsigned lower or same
GE: Signed greater than or equal to
LT: Signed less than
GT: Signed greater than
LE: Signed less than or equal to
AL: Always (normally omitted)

An imm value stands for the immediate value.
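To make the condition codes more tangible, here is a minimal Python sketch of how each code can be evaluated against the N, Z, C, and V flags after a comparison. The helper names (`cmp_flags`, `condition_holds`) are hypothetical and exist only for this illustration; the flag formulas follow the usual ARM definitions, where the carry flag is set when a subtraction does not borrow.

```python
# Each condition code is a predicate over the four flags N, Z, C, V.
CONDITIONS = {
    "EQ": lambda n, z, c, v: z,
    "NE": lambda n, z, c, v: not z,
    "CS": lambda n, z, c, v: c,                   # also written HS
    "CC": lambda n, z, c, v: not c,               # also written LO
    "MI": lambda n, z, c, v: n,
    "PL": lambda n, z, c, v: not n,
    "VS": lambda n, z, c, v: v,
    "VC": lambda n, z, c, v: not v,
    "HI": lambda n, z, c, v: c and not z,
    "LS": lambda n, z, c, v: (not c) or z,
    "GE": lambda n, z, c, v: n == v,
    "LT": lambda n, z, c, v: n != v,
    "GT": lambda n, z, c, v: (not z) and n == v,
    "LE": lambda n, z, c, v: z or n != v,
    "AL": lambda n, z, c, v: True,
}


def cmp_flags(a, b, bits=32):
    """Model the flags that comparing a with b (computing a - b) would set."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask
    n = bool(result & sign)                   # negative
    z = result == 0                           # zero
    c = a >= b                                # carry set means "no borrow"
    v = bool((a ^ b) & (a ^ result) & sign)   # signed overflow
    return n, z, c, v


def condition_holds(cond, a, b):
    """Return True if the condition code holds after comparing a with b."""
    return bool(CONDITIONS[cond](*cmp_flags(a, b)))
```

For example, comparing 0xFFFFFFFF with 1 satisfies HI (unsigned higher) but not GT, because the same bit pattern is -1 when interpreted as a signed value.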

Basics of MIPS

Microprocessor without Interlocked Pipelined Stages (MIPS) was developed by MIPS Technologies (formerly MIPS Computer Systems). Similar to ARM, at first, it was a 32-bit architecture with 64-bit functionality added later. Taking advantage of the RISC ISA, MIPS processors are characterized by low power consumption and low heat dissipation. They can often be found in embedded systems such as routers and gateways, and several video game consoles, such as Sony PlayStation, also incorporated them. Unfortunately, due to the popularity of this architecture, the systems implementing it became a target of multiple IoT malware families. An example can be seen in the following screenshot:

Figure 6: IoT malware targeting MIPS-based systems

As the architecture evolved, there were several versions of it, starting from MIPS I and going up to V, and then several releases of the more recent MIPS32/MIPS64. MIPS64 remains backward-compatible with MIPS32. These base architectures can be further supplemented with optional architectural extensions called Application-Specific Extensions (ASEs) and modules that improve performance for certain tasks; malicious code rarely uses them. MicroMIPS32/64 are supersets of the MIPS32 and MIPS64 architectures respectively, with almost the same 32-bit instruction set and additional 16-bit instructions to reduce the code size. They are used where code compression is required, and are designed for microcontrollers and other small embedded devices.

Basics

MIPS supports bi-endianness. The following registers are available:

32 GPRs r0-r31, 32-bit size on MIPS32 and 64-bit size on MIPS64.
A special-purpose PC register that can be affected only indirectly by some instructions.
Two special-purpose registers to hold the results of integer multiplication and division (HI and LO). These registers and related instructions were removed from the base instruction set in Release 6 and now exist in the Digital Signal Processor (DSP) module.

The reason behind 32 GPRs is simple—MIPS uses 5 bits to specify the register, so this way, we can have a maximum of 2^5 = 32 different values. Two of the GPRs have a particular purpose, as follows:

Register r0 (sometimes referred to as $0 or $zero) is a constant register and always stores zero, and provides read-only access. It can be used as a /dev/null analog to discard the output of some operation, or as a fast source of a zero value.
r31 (also known as $ra) stores the return address during the procedure call branch/jump and link instructions.

Other registers are generally used for particular purposes, as follows:

r1 (also known as $at): Assembler temporary—used when resolving pseudo-instructions
r2-r3 (also known as $v0 and $v1): Values—hold return function values
r4-r7 (also known as $a0-$a3): Arguments—used to deliver function arguments

r8-r15 (also known as $t0-$t7, or as $a4-$a7 and $t4-$t7): Temporaries (the first four can also be used to provide function arguments in the N32 and N64 calling conventions, while the O32 calling convention uses only the r4-r7 registers, with subsequent arguments passed on the stack)
r16-r23 (also known as $s0-$s7): Saved temporaries—preserved across function calls
r24-r25 (also known as $t8-$t9): Temporaries
r26-r27 (also known as $k0-$k1): Generally reserved for the OS kernel
r28 (also known as $gp): Global pointer—points to the global area (data segment)
r29 (also known as $sp): Stack pointer
r30 (also known as $s8 or $fp): Saved value/frame pointer—stores the original stack pointer (before the function was called).

MIPS also has the following co-processors available:

CP0: System control
CP1: FPU
CP2: Implementation-specific
CP3: FPU (has dedicated COP1X opcode type instructions)

The instruction set

The majority of the main instructions were introduced in MIPS I and II. MIPS III introduced 64-bit integers and addresses, and MIPS IV and V improved floating-point operations and added a new set to boost the overall efficiency. Every instruction has the same length, 32 bits (4 bytes), and starts with an opcode that takes 6 bits. The three major instruction formats supported are R, I, and J:

Instruction category	Syntax	Description
R-type	Specifies three registers, an optional shift amount field (for shift and rotate instructions), and an optional function field (for control codes to differentiate between instructions sharing the same opcode).	These instructions are used when all the data values used are located in registers.
I-type	Specifies two registers and an immediate value.	This group is used when the instruction operates with a register and an immediate value, for example, the ones that involve memory operations to store the offset value.
J-type	Has a jump target address after the opcode that takes the remaining bits.	They are used to affect the control flow.

For the FPU-related operations, the analogous FR and FI types exist.

Apart from this, several other less common formats exist, mainly coprocessors and extension-related formats.

In the documentation, registers usually have the following suffixes:

Source (s)
Target (t)
Destination (d)
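The three formats and the s/t/d register suffixes can be illustrated with a small decoder. The following Python sketch is a simplification made for this example: it treats opcode 0 as R-type, opcodes 2 and 3 (`J` and `JAL`) as J-type, and everything else as I-type, ignoring the coprocessor and extension formats. The function name `decode_mips` is hypothetical. The immediate is shown sign-extended, although some instructions (such as `ORI`) zero-extend it.

```python
def decode_mips(word):
    """Split a 32-bit MIPS instruction word into its format-specific fields."""
    opcode = (word >> 26) & 0x3F            # the top 6 bits are always the opcode
    if opcode == 0:                         # R-type: operation chosen by funct
        return {
            "format": "R",
            "opcode": opcode,
            "rs": (word >> 21) & 0x1F,      # source register
            "rt": (word >> 16) & 0x1F,      # target register
            "rd": (word >> 11) & 0x1F,      # destination register
            "shamt": (word >> 6) & 0x1F,    # shift amount
            "funct": word & 0x3F,           # function field
        }
    if opcode in (2, 3):                    # J-type: J and JAL
        return {"format": "J", "opcode": opcode, "target": word & 0x03FFFFFF}
    imm = word & 0xFFFF                     # I-type: 16-bit immediate
    if imm & 0x8000:                        # sign-extend (assumption, see above)
        imm -= 0x10000
    return {
        "format": "I",
        "opcode": opcode,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "imm": imm,
    }
```

For instance, the word 0x8FA8FFFC decodes as an I-type instruction with opcode 0x23 (`LW`), `rs` = 29 ($sp), `rt` = 8 ($t0), and an immediate of -4, that is, a load from 4 bytes below the stack pointer.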

All instructions can be split into the following several groups depending on the functionality type:

Control flow—mainly consists of conditional and unconditional jumps and branches:
- JR: Jump register (R format)
- BLTZ: Branch on less than zero (I format)
Memory access—load and store operations:
- LB: Load byte (I format)
- SW: Store word (I format)
ALU—covers various arithmetic operations:
- ADDU: Add unsigned (R format)
- XOR: Exclusive or (R format)
- SLL: Shift left logical (R format)
OS interaction via exceptions—interacts with the OS kernel:
- SYSCALL: System call (custom format)
- BREAK: Breakpoint (custom format)

Floating-point instructions will have similar names for the same types of operations in most cases, for example, ADD.S. Some instructions are more unique such as Check for Equal (C.EQ.D).

As we can see here and later, the same basic groups can be applied to virtually any architecture, and the only difference will be in the implementation. Some common operations may get their own instructions to benefit from optimizations and, in this way, reduce the size of the code and improve the performance.

As the MIPS instruction set is pretty minimalistic, the assembler macros called pseudo-instructions also exist. Here are some of the most commonly used:

ABS: Absolute value—translates to a combination of ADDU, BGEZ, and SUB
BLT: Branch on less than—translates to a combination of SLT and BNE
BGT/BGE/BLE: Similar to BLT
LI/LA: Load immediate/address—translates to a combination of LUI and ORI or ADDIU for a 16-bit LI
MOVE: Moves the content of one register into another—translates to ADD/ADDIU with a zero value
NOP: No operation—translates to SLL with zero values
NOT: Logical NOT—translates to NOR
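As an illustration of how a pseudo-instruction is expanded, here is a minimal Python sketch of one possible strategy for `LI`. Since every instruction is 32 bits long and carries at most a 16-bit immediate, a full 32-bit constant has to be split into two halves. The function name `expand_li` and the exact selection rules are assumptions made for this example; a real assembler may choose a different but equivalent sequence.

```python
def expand_li(reg, value):
    """Return the instruction sequence that loads a 32-bit constant into reg."""
    value &= 0xFFFFFFFF
    signed = value - (1 << 32) if value & 0x80000000 else value
    upper, lower = value >> 16, value & 0xFFFF
    if upper == 0:
        # Fits in 16 unsigned bits: ORI zero-extends its immediate.
        return [f"ori {reg}, $zero, 0x{lower:04x}"]
    if -0x8000 <= signed < 0:
        # Small negative value: ADDIU sign-extends its immediate.
        return [f"addiu {reg}, $zero, {signed}"]
    # General case: load the upper half, then OR in the lower half if needed.
    sequence = [f"lui {reg}, 0x{upper:04x}"]
    if lower:
        sequence.append(f"ori {reg}, {reg}, 0x{lower:04x}")
    return sequence
```

When reading disassembled MIPS code, a `LUI` followed by an `ORI` (or `ADDIU`) on the same register is therefore usually worth reading as a single constant or address load.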

Diving deep into PowerPC

PowerPC stands for Performance Optimization With Enhanced RISC – Performance Computing and is sometimes abbreviated as PPC. It was created in the early 1990s by the alliance of Apple, IBM, and Motorola (commonly abbreviated as AIM). It was originally intended to be used in PCs and was powering Apple products including PowerBooks and iMacs up until 2006. The CPUs implementing it can also be found in game consoles such as Sony PlayStation 3, Xbox 360, and Wii, and in IBM servers and multiple embedded devices, such as car and plane controllers and even in the famous ASIMO robot. Later, the administrative responsibilities were transferred to an open standards body, Power.org, whose members included some of the original creators, such as IBM and Freescale (which separated from Motorola and was later acquired by NXP Semiconductors), as well as many new entities. The OpenPOWER Foundation is a newer initiative by IBM, Google, NVIDIA, Mellanox, and Tyan, which is aiming to facilitate collaboration in the development of this technology.

PowerPC was mainly based on IBM POWER ISA and, later, a unified Power ISA was released, which combined POWER and PowerPC into a single ISA that is now used in multiple products under a Power Architecture umbrella term.

There are plenty of IoT malware families that have payloads for this architecture.

Basics

The Power ISA is divided into several categories; each category can be found in a certain part of the specification or book. CPUs implement a set of these categories depending on their class; only the base category is an obligatory one.
Here is a list of the main categories and the books that define them in the latest release of version 2 of the standard:

Base: Covered in Book I (Power ISA User Instruction Set Architecture) and Book II (Power ISA Virtual Environment Architecture)
Server: Covered in Book III-S (Power ISA Operating Environment Architecture – Server Environment)
Embedded: Book III-E (Power ISA Operating Environment Architecture – Embedded Environment)

There are many more granular categories covering aspects such as floating-point operations and caching for certain instructions.

Another book, Book VLE (Power ISA Operating Environment Architecture – Variable Length Encoding (VLE) Instructions Architecture), defines alternative instructions and definitions intended to increase the density of the code by using 16-bit instructions as opposed to the more common 32-bit ones.

Power ISA version 3 consists of three books with the same names as Books I to III of the previous standard, without distinctions between environments.

The processor starts in the big-endian mode but can switch by changing a bit in the MSR (Machine State Register), so that bi-endianness is supported.

There are many sets of registers documented in Power ISA, mainly grouped around either an associated facility or a category. Here is a basic summary of the most commonly used ones:

32 GPRs for integer operations, generally used by their number only (64-bit)
64 Vector Scalar Registers (VSRs) for vector operations and floating-point operations:
- 32 Vector Registers (VRs) as part of the VSRs for vector operations (128-bit)
- 32 FPRs as part of the VSRs for floating-point operations (64-bit)
Special purpose fixed-point facility registers, such as the following:
- Fixed-point exception register (XER)—contains multiple status bits (64-bit)
Branch facility registers:
- Condition Register (CR)—consists of 8 4-bit fields, CR0-CR7, involving things like control flow and comparison (32-bit)
- Link Register (LR)—provides the branch target address (64-bit)
- Count Register (CTR)—holds a loop count (64-bit)
- Target Access Register (TAR)—specifies branch target address (64-bit)
Timer facility registers:
- Time Base (TB)—is incremented periodically with the defined frequency (64-bit)
Other special purpose registers from a particular category, including the following:
- Accumulator (ACC) (64-bit)—the Signal Processing Engine (SPE) category

Generally, functions can pass all arguments in registers for non-recursive calls; additional arguments are passed on the stack.

The instruction set

Most instructions are 32 bits long; only the Variable-Length Encoding (VLE) group is smaller, in order to provide a higher code density for embedded applications. All instructions are split into the following three categories:

Defined: All of the instructions are defined in the Power ISA books.
Illegal: Available for future extensions of the Power ISA. An attempt to execute them will invoke the illegal instruction error handler.
Reserved: Allocated to specific purposes that are outside the scope of the Power ISA. An attempt to execute them will either perform an implemented action or invoke the illegal instruction error handler if the implementation is not available.

Bits 0 to 5 always specify the opcode, and many instructions also have an extended opcode. A large number of instruction formats are supported; here are some examples:

I-FORM [OPCD+LI+AA+LK]
B-FORM [OPCD+BO+BI+BD+AA+LK]

Each instruction field has its own abbreviation and meaning; it makes sense to consult the official Power ISA document to get a full list of them and their corresponding formats. In the case of the previously mentioned I-FORM, they are as follows:

OPCD: Opcode
LI: Immediate field used to specify a 24-bit signed two's complement integer
AA: Absolute address bit
LK: Link bit affecting the link register
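The I-FORM layout can be sketched in Python as follows. This is a minimal illustration, assuming the field widths listed above (a 6-bit OPCD, a 24-bit LI, and the single-bit AA and LK flags) and the Power ISA convention that bit 0 is the most significant bit; the function names are hypothetical. The branch displacement is LI with two zero bits appended, which is either added to the address of the branch itself or, when AA is set, used as an absolute address.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def decode_i_form(word):
    """Split a 32-bit I-FORM instruction word into OPCD, LI, AA, and LK."""
    li = (word >> 2) & 0xFFFFFF         # 24-bit field following the opcode
    if li & 0x800000:                   # interpret as signed two's complement
        li -= 0x1000000
    return {
        "OPCD": (word >> 26) & 0x3F,    # bits 0 to 5 (most significant)
        "LI": li,
        "AA": (word >> 1) & 1,          # absolute address bit
        "LK": word & 1,                 # link bit
    }


def branch_target(word, current_address):
    """Compute where an I-FORM branch located at current_address leads."""
    fields = decode_i_form(word)
    displacement = fields["LI"] << 2    # LI with two zero bits appended
    if fields["AA"]:
        return displacement & MASK64
    return (current_address + displacement) & MASK64
```

For example, the word 0x4BFFFFFD decodes to OPCD 18 with LI = -1, AA = 0, and LK = 1, that is, a `bl` to the instruction located 4 bytes before the branch itself.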

Instructions are also split into groups according to the associated facility and category, making them very similar to registers:

Branch instructions:
- b/ba/bl/bla: Branch
- bc/bca/bcl/bcla: Branch conditional
- sc: System call
Fixed-point instructions:
- lbz: Load byte and zero
- stb: Store byte
- addi: Add immediate
- ori: Or immediate
Floating-point instructions:
- fmr: Floating move register
- lfs: Load floating-point single
- stfd: Store floating-point double
SPE instructions:
- brinc: Bit-reversed increment

Covering the SuperH assembly

SuperH, often abbreviated as SH, is a RISC ISA developed by Hitachi. SuperH went through several iterations, starting from SH-1 and moving up to SH-4. The more recent SH-5 has two modes of operation, one of which is identical to the user-mode instructions of SH-4, while another, SHmedia, is quite different. Each family takes its own market niche:

SH-1: Home appliances
SH-2: Car controllers and video game consoles such as Sega Saturn
SH-3: Mobile applications such as car navigators
SH-4: Car multimedia terminals and video game consoles such as Sega Dreamcast
SH-5: High-end multimedia applications

Microcontrollers and CPUs implementing it are currently produced by Renesas Electronics, a joint venture of the Hitachi and Mitsubishi Semiconductor groups. As IoT malware mainly targets SH-4-based systems, we will focus on this SuperH family.

Basics

In terms of registers, SH-4 offers the following:

16 general registers R0-R15 (32-bit)
7 control registers (32-bit):
- Global Base Register (GBR)
- Status Register (SR)
- Saved Status Register (SSR)
- Saved Program Counter (SPC)
- Vector Base Register (VBR)
- Saved General Register 15 (SGR)
- Debug Base Register (DBR) (only from the privileged mode)
4 system registers (32-bit):
- MACH/MACL: Multiply-and-accumulate registers
- PR: Procedure register
- PC: Program counter
- FPSCR: Floating-point status/control register
32 FPU registers FR0-FR15 (also known as DR0/2/4/... or FV0/4/...) and XF0-XF15 (also known as XD0/2/4/... or XMTRX); two banks of either 16 single-precision (32-bit) or eight double-precision (64-bit) FPRs and FPUL (floating-point communication register) (32-bit)

Usually, R4-R7 are used to pass arguments to a function with the result returned in R0. R8-R13 are saved across multiple function calls. R14 serves as the frame pointer and R15 as a stack pointer.

Regarding the data formats, in SH-4, a word takes 16 bits, a long word takes 32 bits, and a quad word takes 64 bits.

Two processor modes are supported: user mode and privileged mode. SH-4 generally operates in the user mode and switches to the privileged mode in case of an exception or an interrupt.

The instruction set

The SH-4 features an instruction set that is upward-compatible with the SH-1, SH-2, and SH-3 families. It uses 16-bit fixed-length instructions in order to reduce the program code size. Except for BF and BT, all branch instructions and the RTE (return from exception) instruction implement so-called delayed branches, where the instruction following the branch is executed before the branch destination instruction.
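Delayed branches are easy to misread in a disassembly, so here is a minimal Python sketch of the execution order. It is a toy model made up for this illustration rather than an SH-4 emulator: the program is a list of tuples, only `mov` (load an immediate into a register) and `bra` (a delayed unconditional branch to a list index) are supported, and the function name `run` is hypothetical.

```python
def run(program):
    """Execute a toy program and return the registers and the execution trace."""
    registers = {}
    trace = []                            # indexes of executed instructions
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        trace.append(pc)
        if op == "mov":                   # mov #imm, Rn
            imm, reg = args
            registers[reg] = imm
            pc += 1
        elif op == "bra":                 # delayed branch
            (target,) = args
            slot_op, *slot_args = program[pc + 1]
            if slot_op != "mov":
                raise ValueError("this sketch only supports mov in a delay slot")
            trace.append(pc + 1)          # the delay slot runs before the jump
            registers[slot_args[1]] = slot_args[0]
            pc = target
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return registers, trace


program = [
    ("mov", 1, "r0"),
    ("bra", 4),
    ("mov", 2, "r1"),   # delay slot: executed even though it follows the branch
    ("mov", 3, "r2"),   # skipped
    ("mov", 4, "r3"),   # branch destination
]
```

Running this program sets r0, r1, and r3 but never r2: the instruction right after `bra` still takes effect, which is exactly what has to be kept in mind when following the control flow of SH-4 code.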

All instructions are split into the following categories (with some examples):

Fixed-point transfer instructions:
- MOV: Move data (or particular data types specified)
- SWAP: Swap register halves
Arithmetic operation instructions:
- SUB: Subtract binary numbers
- CMP/EQ: Compare conditionally (in this case on equal to)
Logic operation instructions:
- AND: AND logical
- XOR: Exclusive OR logical
Shift instructions:
- ROTL: Rotate left
- SHLL: Shift logical left
Branch instructions:
- BF: Branch if false
- JMP: Jump (unconditional branch)
System control instructions:
- LDC: Load to control register
- STS: Store system register
Floating-point single-precision instructions:
- FMOV: Floating-point move
Floating-point double-precision instructions:
- FABS: Floating-point absolute value
Floating-point control instructions:
- LDS: Load to FPU system register
Floating-point graphics acceleration instructions:
- FIPR: Floating-point inner product

Working with SPARC

Scalable Processor Architecture (SPARC) is a RISC ISA that was originally developed by Sun Microsystems (now part of the Oracle Corporation). The first implementation was used in Sun's own workstation and server systems. Later, it was licensed to multiple other manufacturers, one of them being Fujitsu. As Oracle terminated SPARC design in 2017, all future development continued with Fujitsu as the main provider of SPARC servers.

Several fully open source implementations of SPARC architecture exist. Multiple operating systems are currently supporting it, including Oracle Solaris, Linux, and BSD systems, and multiple IoT malware families have dedicated modules for it as well.

Basics

According to the Oracle SPARC Architecture documentation, the particular implementation may contain between 72 and 640 general-purpose 64-bit R registers. However, only 31/32 GPRs are immediately visible at any one time; 8 are global registers, R[0] to R[7] (also known as g0-g7), with the first register, g0, hardwired to 0; and 24 are associated with the following register windows:

Eight in registers in[0]-in[7] (R[24]-R[31]): For passing arguments and returning results
Eight local registers local[0]-local[7] (R[16]-R[23]): For retaining local variables
Eight out registers out[0]-out[7] (R[8]-R[15]): For passing arguments and returning results

The CALL instruction writes its own address into the out[7] (R[15]) register.

In order to pass arguments to the function, they must be placed in the out registers and, when the function gets control, it will access them in its in registers. Additional arguments can be provided through the stack. The result is placed in the first in register, which becomes the caller's first out register again when the function returns. The SAVE and RESTORE instructions are used in this switch to allocate a new register window and later restore the previous one, respectively.
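The overlap between the caller's out registers and the callee's in registers can be modeled with a short Python sketch. The class below is a simplified illustration rather than a description of a real implementation: the names are hypothetical, the global registers are left out, the direction in which the current window pointer moves is an arbitrary choice, and window overflow and underflow handling is ignored.

```python
class RegisterWindows:
    """Toy model of overlapping register windows (globals omitted)."""

    def __init__(self, nwindows=8):
        self.nwindows = nwindows
        # Each window adds 16 registers of its own (8 in + 8 local);
        # its out registers are the in registers of the next window.
        self.file = [0] * (nwindows * 16)
        self.cwp = 0                       # current window pointer

    def _slot(self, kind, index):
        offset = {"in": 0, "local": 8, "out": 16}[kind]
        return (self.cwp * 16 + offset + index) % len(self.file)

    def read(self, kind, index):
        return self.file[self._slot(kind, index)]

    def write(self, kind, index, value):
        self.file[self._slot(kind, index)] = value

    def save(self):
        """Allocate a new window: the old out registers become the new in registers."""
        self.cwp = (self.cwp + 1) % self.nwindows

    def restore(self):
        """Return to the previous window."""
        self.cwp = (self.cwp - 1) % self.nwindows


def call_add(windows, a, b):
    """Simulate calling a two-argument function that returns a + b."""
    windows.write("out", 0, a)             # the caller places the arguments
    windows.write("out", 1, b)
    windows.save()                         # the callee gets its own window
    total = windows.read("in", 0) + windows.read("in", 1)
    windows.write("in", 0, total)          # the result goes to the first in register
    windows.restore()
    return windows.read("out", 0)          # the caller reads it from its out register
```

Note that no data is copied when the window changes; the same physical registers are simply seen under different names by the caller and the callee.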

SPARC also has 32 single-precision FPRs (32-bit), 32 double-precision FPRs (64-bit), and 16 quad-precision FPRs (128-bit), some of which overlap.

Apart from that, there are many other registers that serve specific purposes, including the following:

Floating-Point State Register (FSR): Contains the FPU mode and status information
Ancillary state registers (ASR 0, ASR 2-6, ASR 19-22, and ASR 24-28 are not reserved): Serve multiple purposes, including the following:
- ASR 2: Condition Codes Register (CCR)
- ASR 5: PC
- ASR 6: FPRS
- ASR 19: General Status Register (GSR)
Register-Window PR state registers (PR 9-14): Determine the state of the register windows including the following:
- PR 9: Current Window Pointer (CWP)
- PR 14: Window State (WSTATE)
Non-register-Window PR state registers (PR 0-3, PR 5-8 and PR 16): Visible only to software running in the privileged mode

32-bit SPARC is big-endian, while 64-bit SPARC uses big-endian instructions but can access data in either byte order. SPARC also uses a notion of traps that implement a transfer of control to privileged software using a dedicated table that may contain the first 8 instructions (32 for some frequently used traps) of each trap handler. The base address of the table is set by software in a Trap Base Address (TBA) register.

The instruction set

The instruction from the memory location, which is specified by the PC, is fetched and executed, and then new values are assigned to the PC and the Next Program Counter (NPC), which is a pseudo-register.

Detailed instruction formats can be found in the individual instruction descriptions.

Here are the basic categories of instructions supported, with examples:

Memory access:
- LDUB: Load unsigned byte
- ST: Store
Arithmetic/logical/shift integers:
- ADD: Add
- SLL: Shift left logical
Control transfer:
- BE: Branch on equal
- JMPL: Jump and link
- CALL: Call and link
- RETURN: Return from the function
State register access:
- WRCCR: Write CCR
Floating-point operations:
- FOR: Logical OR for F registers
Conditional move:
- MOVcc: Move if the condition is True for the selected condition code (cc)
Register window management:
- SAVE: Save caller's window
- FLUSHW: Flush register windows
Single Instruction Multiple Data (SIMD) instructions:
- FPSUB: Partitioned integer subtraction for F registers

Moving from assembly to high-level programming languages

Developers mostly don't write in assembly. Instead, they write in higher-level languages, such as C or C++, and the compiler converts this high-level code into a low-level representation in assembly language. In this section, we will look at different code blocks represented in the assembly.

Arithmetic statements

Now we will look at different C statements and how they are represented in the assembly. We will take Intel IA-32 as an example and the same concept applies to other assembly languages as well:

X = 50 (assuming 0x00010000 is the address of the X variable in memory):

mov eax, 50
mov dword ptr [00010000h],eax

X = Y+50 (assuming 0x00010000 represents X and 0x00020000 represents Y):

mov eax, dword ptr [00020000h]
add eax, 50
mov dword ptr [00010000h],eax

X = Y + (50 * 2):

mov eax, dword ptr [00020000h]
push eax ;save Y for now
mov eax, 50 ;do the multiplication first
mov ebx,2
imul ebx ;the result is in edx:eax
mov ecx, eax
pop eax ;gets back Y value
add eax,ecx
mov dword ptr [00010000h],eax

X = Y + (50 / 2):

mov eax, dword ptr [00020000h]
push eax ;save Y for now
mov eax, 50
mov ebx,2
xor edx,edx ;clear edx, as div divides edx:eax by ebx
div ebx ;the result is in eax, and the remainder is in edx
mov ecx, eax
pop eax ;gets back Y value
add eax,ecx
mov dword ptr [00010000h],eax

X = Y + (50 % 2) (% represents the remainder):

mov eax, dword ptr [00020000h]
push eax ;save Y for now
mov eax, 50
mov ebx,2
xor edx,edx ;clear edx, as div divides edx:eax by ebx
div ebx ;the remainder is in edx
mov ecx, edx
pop eax ;gets back Y value
add eax,ecx
mov dword ptr [00010000h],eax

Hopefully, this explains how the compiler converts these arithmetic statements to assembly language.
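The edx:eax register pair mentioned in the comments deserves a closer look. The following Python sketch models the one-operand forms of `imul` and `div` on 32-bit values; the function names are hypothetical and each function returns the new (edx, eax) pair. Because `div` treats edx as the upper half of the dividend, a leftover value in edx changes the quotient.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(value):
    """Interpret a 32-bit pattern as a signed two's complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def imul32(eax, operand):
    """Signed multiply: eax * operand gives a 64-bit product in edx:eax."""
    product = (to_signed32(eax) * to_signed32(operand)) & MASK64
    return product >> 32, product & MASK32          # (edx, eax)


def div32(edx, eax, operand):
    """Unsigned divide: edx:eax / operand gives eax = quotient, edx = remainder."""
    if operand == 0:
        raise ZeroDivisionError("divide error")
    dividend = ((edx & MASK32) << 32) | (eax & MASK32)
    quotient, remainder = divmod(dividend, operand & MASK32)
    if quotient > MASK32:
        raise OverflowError("the quotient does not fit in eax")
    return remainder, quotient                      # (edx, eax)
```

With edx cleared, `div32(0, 50, 2)` yields a quotient of 25 and a remainder of 0, while `div32(1, 50, 2)` divides 0x100000032 instead and yields a quotient of 0x80000019.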

If conditions

Basic If statements may look like this:

If (X == 50) (assuming 0x00010000 represents the X variable):

mov eax, 50
cmp dword ptr [00010000h],eax

If (X & 00001000b) (& represents the bitwise AND, which is what the test instruction computes):

mov eax, 00001000b
test dword ptr [00010000h],eax

In order to understand the branching and flow redirection, let's take a look at the following diagram to see how it's manifested in pseudocode:

Figure 7: Conditional flow redirection

To apply this branching sequence in assembly, the compiler uses a mix of conditional and unconditional jumps, as follows:

IF.. THEN.. ENDIF:

cmp dword ptr [00010000h],50
jnz 3rd_Block ; if not true
...
Some code
...
3rd_Block:
Some code

IF.. THEN.. ELSE.. ENDIF:

cmp dword ptr [00010000h],50
jnz Else_Block ; if not true
...
Some code
...
jmp 4th_Block ; jump after Else
Else_Block:
...
Some code
...
4th_Block:
...
Some code

While loop conditions

The while loop conditions are quite similar to if conditions in terms of how they are represented in assembly:

While (X == 50){ … }:

1st_Block:
cmp dword ptr [00010000h],50
jnz 2nd_Block ; if not true
...
jmp 1st_Block
2nd_Block:
...

Do{ … }While(X == 50):

1st_Block:
...
cmp dword ptr [00010000h],50
jz 1st_Block ; if true

Summary

In this chapter, we covered the essentials of computer programming and described universal elements shared between multiple CISC and RISC architectures. Then, we went through multiple assembly languages including the ones behind Intel x86, ARM, MIPS, and others, and understood their application areas that eventually shaped the design and structure. We also covered the fundamental basics of each of them, learned the most important notions (such as the registers used and CPU modes supported), got an idea of how the instruction sets look, discovered what opcode formats are supported there, and explored what calling conventions are used.

Finally, we went from the low-level assembly languages to their high-level representation in C or other similar languages, and became familiar with a set of examples for universal blocks, such as if conditions and loops.

After reading this chapter, you should have the ability to read the disassembled code of different assembly languages and be able to understand what high-level code it could possibly represent. While not aiming to be completely comprehensive, the main goal of this chapter is to provide a strong foundation, as well as a direction that you can follow in order to deepen your knowledge before starting analysis on actual malicious code. It should be your starting point for learning how to perform static code analysis on different platforms and devices.

In Chapter 2, Basic Static and Dynamic Analysis for x86/x64, we will start analyzing the actual malware for particular platforms, and the instruction sets we have become familiar with will be used as languages describing its functionality.
