CMU 15-213 Intro to Computer Systems Lecture 5

Course homepage: http://www.cs.cmu.edu/afs/cs/academic/class/15213-f15/www/schedule.html

Course materials: https://github.com/EugeneLiu/translationCSAPP

Course videos: https://www.bilibili.com/video/av31289365/

This lecture introduces the machine-level representation of programs.

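### C, Assembly, Machine Code

#### Definitions

- **Architecture** (also ISA: instruction set architecture): the parts of a processor design that one needs to understand in order to write assembly/machine code.
  - Examples: instruction set specification, registers.
- **Microarchitecture**: the implementation of the architecture.
  - Examples: cache sizes and core frequency.
- **Code forms**:
  - Machine code: the byte-level programs that a processor executes.
  - Assembly code: a text representation of machine code.
- **Example ISAs**:
  - Intel: x86, IA32, Itanium, x86-64
  - ARM: used in almost all mobile phones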

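#### Assembly/Machine Code View

Programmer-visible state:

- **PC**: program counter
  - Address of the next instruction
  - Called "RIP" (x86-64)
- **Memory**
  - Byte-addressable array
  - Code and user data
  - Stack to support procedures
- **Registers**
  - Heavily used program data
- **Condition codes**
  - Store status information about the most recent arithmetic or logical operation
  - Used for conditional branching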

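#### Turning C into Object Code

- Code in files p1.c, p2.c
- Compile with the command: `gcc -Og p1.c p2.c -o p`
  - Use basic optimizations (`-Og`)
  - Put the resulting binary in file p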

#### Compiling Into Assembly

C code (sum.c):

```c
long plus(long x, long y);

void sumstore(long x, long y, long *dest)
{
    long t = plus(x, y);
    *dest = t;
}
```

Running

```shell
gcc -Og -S sum.c
```

produces the file sum.s, which contains x86-64 assembly code:

```assembly
sumstore:
    pushq %rbx
    movq %rdx, %rbx
    call plus
    movq %rax, (%rbx)
    popq %rbx
    ret
```
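The lecture only declares `plus`, so something has to define it before `sumstore` can be compiled and run. A minimal sketch, assuming a hypothetical `plus` that simply adds its arguments. If both functions sit in one file the compiler may inline `plus`, so the generated assembly can differ from the listing above.

```c
/* Hypothetical definition of plus(); the lecture only declares it. */
long plus(long x, long y)
{
    return x + y;
}

/* Same as sum.c. In the listing, dest arrives in %rdx and is copied into
 * %rbx: %rdx is caller-saved, so plus() may overwrite it, while %rbx is
 * callee-saved (hence the pushq/popq around the body). */
void sumstore(long x, long y, long *dest)
{
    long t = plus(x, y);
    *dest = t;
}
```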

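#### Assembly Characteristics: Data Types

- "Integer" data of 1, 2, 4, or 8 bytes
  - Data values
  - Addresses (untyped pointers)
- Floating-point data of 4, 8, or 10 bytes
- Code: byte sequences encoding a series of instructions
- No aggregate types such as arrays or structures
  - Just contiguously allocated bytes in memory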
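To see how aggregates dissolve into plain bytes, here is a small sketch (the struct `rec` and function `load_c1` are hypothetical). At the machine level, reading `r->c[1]` needs only the base address plus a constant byte offset, followed by a 4-byte load. The offsets and padding shown assume the x86-64 System V ABI.

```c
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>   /* int32_t, int64_t */

/* Hypothetical record type, for illustration only. */
struct rec {
    int64_t a;      /* offset 0,  8 bytes     */
    int32_t b;      /* offset 8,  4 bytes     */
    int32_t c[2];   /* offset 12, 2 x 4 bytes */
};                  /* 4 bytes of tail padding, so sizeof is 24 on x86-64 */

/* Reads r->c[1] the way machine code does: no "field" or "array",
 * just base address + constant byte offset, then a 4-byte load. */
int32_t load_c1(const struct rec *r)
{
    const char *base = (const char *)r;
    size_t off = offsetof(struct rec, c) + 1 * sizeof(int32_t);   /* 12 + 4 = 16 */
    return *(const int32_t *)(base + off);
}
```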

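#### Object Code

Code for sumstore: 14 bytes starting at address 0x400595, `53 48 89 d3 e8 f2 ff ff ff 48 89 03 5b c3`; each instruction is 1, 3, or 5 bytes long.

- Assembler
  - Translates .s into .o
  - Binary encoding of each instruction
  - Nearly complete image of executable code
  - Missing linkages between code in different files
- Linker
  - Resolves references between files
  - Combines with static run-time libraries
    - E.g., code for malloc, printf
  - Some libraries are dynamically linked
    - Linking occurs when the program begins execution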

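#### Machine Instruction Example

- C code
  ```c
  *dest = t;
  ```
  - Store value t where designated by dest
- Assembly
  ```assembly
  movq %rax, (%rbx)
  ```
  - Move an 8-byte value to memory
  - Operands:
    - t: register %rax
    - dest: register %rbx
    - *dest: memory M[%rbx]
- Object code
  ```
  0x40059e: 48 89 03
  ```
  - 3-byte instruction
  - Stored at address 0x40059e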
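As a sketch of how those 3 bytes encode the instruction: `0x48` is a REX prefix with the W bit set (64-bit operand size), `0x89` is the MOV r/m64, r64 opcode, and `0x03` is the ModRM byte `00 000 011`, i.e. mod 00 (memory, no displacement), reg 000 = `%rax` (source), rm 011 = `%rbx` (base). The decoder below is hypothetical and deliberately tiny: it recognizes only this one pattern and ignores the REX bits that select `%r8` through `%r15`.

```c
#include <stdint.h>

/* Register names indexed by the 3-bit register number used in ModRM. */
const char *const reg64[8] = {
    "%rax", "%rcx", "%rdx", "%rbx", "%rsp", "%rbp", "%rsi", "%rdi"
};

/* Recognize "movq %src, (%base)": REX.W (0x48), opcode 0x89, then a ModRM
 * byte with mod == 00. Returns 1 and the register numbers on a match, else 0. */
int decode_movq_store(const uint8_t insn[3], int *src, int *base)
{
    if (insn[0] != 0x48 || insn[1] != 0x89)
        return 0;
    int mod = insn[2] >> 6;          /* bits 7..6: addressing mode */
    int reg = (insn[2] >> 3) & 7;    /* bits 5..3: source register */
    int rm  = insn[2] & 7;           /* bits 2..0: base register   */
    if (mod != 0 || rm == 4 || rm == 5)
        return 0;                    /* rm 4: a SIB byte follows; rm 5: RIP-relative */
    *src = reg;
    *base = rm;
    return 1;
}
```

The same `48 89` prefix and opcode with ModRM `0xd3` (mod 11, reg 010 = `%rdx`, rm 011 = `%rbx`) is the register-to-register `mov %rdx,%rbx` at 0x400596, which this sketch rejects.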



#### Disassembling Object Code

Object code is hard to read, so a disassembler can be used to turn it into a readable listing:

```assembly
0000000000400595 <sumstore>:
  400595: 53                push   %rbx
  400596: 48 89 d3          mov    %rdx,%rbx
  400599: e8 f2 ff ff ff    callq  400590 <plus>
  40059e: 48 89 03          mov    %rax,(%rbx)
  4005a1: 5b                pop    %rbx
  4005a2: c3                retq
```
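The `callq` line shows the disassembler turning a relative displacement into an absolute address. A sketch of that computation, assuming a 5-byte near call of the form `e8 rel32` (the function name `call_target` is hypothetical):

```c
#include <stdint.h>

/* Target of an x86-64 near call "e8 rel32". The 32-bit displacement is stored
 * little-endian and is relative to the address of the NEXT instruction. */
uint64_t call_target(uint64_t insn_addr, const uint8_t insn[5])
{
    uint32_t raw = (uint32_t)insn[1]
                 | (uint32_t)insn[2] << 8
                 | (uint32_t)insn[3] << 16
                 | (uint32_t)insn[4] << 24;
    int64_t rel = (int32_t)raw;      /* sign-extend (two's complement, as gcc/clang define it): 0xfffffff2 -> -14 */
    uint64_t next = insn_addr + 5;   /* the call instruction is 5 bytes long */
    return next + (uint64_t)rel;     /* wraps modulo 2^64, like the hardware */
}
```

With the bytes from the listing, the next instruction is at 0x40059e, and 0x40059e + (-14) = 0x400590, the address of `plus`.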

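Disassembler:

```shell
objdump -d sum
```

- Useful tool for examining object code
- Analyzes the bit pattern of a series of instructions
- Produces an approximate rendition of the assembly code
- Can be run on either a.out (a complete executable) or a .o file

What can be disassembled?

- Anything that can be interpreted as executable code
- The disassembler examines bytes and reconstructs assembly source
- E.g., disassembling Microsoft Word:

```shell
objdump -d WINWORD.EXE
```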




### Assembly Basics: Registers, Operands, Move

#### x86-64 Integer Registers

![](https://github.com/Doraemonzzz/md-photo/blob/master/CMU%2015-213%20Intro%20to%20Computer%20Systems/Lecture5/2020051803.jpg?raw=true)



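#### Moving Data

- Moving data: `movq Source, Dest`
- Operand types
  - Immediate: constant integer data
    - Examples: `$0x400`, `$-533`
    - Like a C constant, but prefixed with `$`
    - Encoded with 1, 2, or 4 bytes
  - Register: one of the 16 integer registers
    - Examples: `%rax`, `%r13`
    - But `%rsp` is reserved for special use
  - Memory: 8 consecutive bytes of memory at the address given by a register
    - Simplest example: `(%rax)`
    - Various other "addressing modes"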

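The possible operand combinations for `movq` are shown below; note that a single instruction cannot move data directly from memory to memory: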

![](https://github.com/Doraemonzzz/md-photo/blob/master/CMU%2015-213%20Intro%20to%20Computer%20Systems/Lecture5/2020051804.jpg?raw=true)
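A toy model can make the operand combinations concrete. In the sketch below, all names are hypothetical, registers and memory are plain C arrays, and immediates are assumed to fit `movq`'s sign-extended 32-bit field. Each legal `movq` form becomes one C function, and a memory-to-memory copy needs two moves through a register.

```c
#include <stdint.h>
#include <string.h>

/* Toy machine (hypothetical): 16 registers and 64 bytes of memory.
 * Addresses are plain indices into mem[]; bounds checks are omitted. */
enum { RAX, RCX, RDX, RBX, NREGS = 16 };
static uint64_t reg[NREGS];
static uint8_t mem[64];

/* 8-byte memory accesses, like the (%reg) operand of movq. */
static uint64_t load8(uint64_t addr)              { uint64_t v; memcpy(&v, &mem[addr], 8); return v; }
static void     store8(uint64_t addr, uint64_t v) { memcpy(&mem[addr], &v, 8); }

/* One function per legal operand combination of movq. */
void movq_imm_reg(int64_t imm, int d) { reg[d] = (uint64_t)imm; }        /* movq $imm, %d   */
void movq_imm_mem(int64_t imm, int b) { store8(reg[b], (uint64_t)imm); } /* movq $imm, (%b) */
void movq_reg_reg(int s, int d)       { reg[d] = reg[s]; }               /* movq %s, %d     */
void movq_reg_mem(int s, int b)       { store8(reg[b], reg[s]); }        /* movq %s, (%b)   */
void movq_mem_reg(int b, int d)       { reg[d] = load8(reg[b]); }        /* movq (%b), %d   */

/* There is no movq_mem_mem: copying one memory word to another goes through a register. */
void copy_mem(int src_b, int dst_b, int tmp)
{
    movq_mem_reg(src_b, tmp);   /* movq (%src), %tmp */
    movq_reg_mem(tmp, dst_b);   /* movq %tmp, (%dst) */
}
```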



#### Addressing Mode Example

```c
void swap(long *xp, long *yp)
{
    long t0 = *xp;
    long t1 = *yp;
    *xp = t1;
    *yp = t0;
}
```

The assembly code is:

```assembly
# xp in %rdi, yp in %rsi
swap:
    movq (%rdi), %rax    # t0 = *xp
    movq (%rsi), %rdx    # t1 = *yp
    movq %rdx, (%rdi)    # *xp = t1
    movq %rax, (%rsi)    # *yp = t0
    ret
```

(Figures: the initial register and memory state, followed by the effect of each instruction in the code.)
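To trace `swap` concretely, here is the same function annotated with the register and instruction that implements each line, assuming the x86-64 listing above; the values 123 and 456 used to exercise it are arbitrary.

```c
/* swap() annotated with the x86-64 code for each statement. */
void swap(long *xp, long *yp)   /* xp in %rdi, yp in %rsi */
{
    long t0 = *xp;   /* movq (%rdi), %rax : load through xp */
    long t1 = *yp;   /* movq (%rsi), %rdx : load through yp */
    *xp = t1;        /* movq %rdx, (%rdi) : store through xp */
    *yp = t0;        /* movq %rax, (%rsi) : store through yp */
}
```

Both loads happen before either store, so the function also behaves correctly when `xp == yp`.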

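#### Complete Memory Addressing Modes

- Most general form: `D(Rb,Ri,S)` means `Mem[Reg[Rb]+S*Reg[Ri]+D]`
  - D: constant "displacement" of 1, 2, or 4 bytes
  - Rb: base register, any of the 16 integer registers
  - Ri: index register, any register except %rsp
  - S: scale factor, 1, 2, 4, or 8
- Special cases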
$\begin{aligned} &(\mathrm{Rb}, \mathrm{Ri}) &&\text{Mem[Reg[Rb]+Reg[Ri]]} \\ &\mathrm{D}(\mathrm{Rb}, \mathrm{Ri}) && \text{Mem[Reg[Rb]+Reg[Ri]+D]} \\ &(\mathrm{Rb}, \mathrm{Ri}, \mathrm{S}) && \text{Mem[Reg[Rb]+S*Reg[Ri]]} \end{aligned}$

(Figure: the full table of addressing-mode forms.)

Example register values:

| Register | Value  |
|----------|--------|
| %rdx     | 0xf000 |
| %rcx     | 0x0100 |
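The general form `D(Rb,Ri,S)` is just integer arithmetic, which a small C function can reproduce. This is a sketch with a hypothetical name; a missing base or index register is passed as 0.

```c
#include <stdint.h>

/* Effective address of D(Rb,Ri,S) = Reg[Rb] + S*Reg[Ri] + D.
 * Pass 0 for a missing base or index register; S must be 1, 2, 4, or 8.
 * Unsigned arithmetic wraps modulo 2^64, as the hardware does. */
uint64_t effective_addr(int64_t d, uint64_t rb, uint64_t ri, unsigned s)
{
    return rb + (uint64_t)s * ri + (uint64_t)d;
}
```

With %rdx = 0xf000 and %rcx = 0x0100, for instance: `0x8(%rdx)` gives 0xf008, `(%rdx,%rcx)` gives 0xf100, `(%rdx,%rcx,4)` gives 0xf400, and `0x80(,%rdx,2)` gives 2 * 0xf000 + 0x80 = 0x1e080.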

### Arithmetic and Logical Operations

The arithmetic operations are introduced group by group below.

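#### Address Computation Instruction

- `leaq Src, Dst`
  - Src is an address-mode expression
  - Sets Dst to the address denoted by the expression
- Uses
  - Computing addresses without a memory reference
    - E.g., translation of $p = \&x[i]$
  - Computing arithmetic expressions of the form $x + k * y$
    - $k = 1, 2, 4$, or $8$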
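Both uses can be sketched in C (function names hypothetical). The comments describe the `leaq` a compiler can emit for each, assuming the System V argument registers (%rdi, %rsi); actual output depends on the compiler and flags.

```c
/* p = &x[i] for an array of 8-byte longs: the address is x + 8*i.
 * A compiler can produce this as leaq (%rdi,%rsi,8), %rax
 * (x in %rdi, i in %rsi) without reading memory. */
long *elem_addr(long *x, long i)
{
    return &x[i];
}

/* An expression of the form x + k*y with k = 4; typically compiled
 * to a single leaq (%rdi,%rsi,4), %rax. */
long x_plus_4y(long x, long y)
{
    return x + 4 * y;
}
```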

Example:

```c
long m12(long x)
{
    return x*12;
}
```

Compiled to assembly:

```assembly
leaq (%rdi,%rdi,2), %rax    # t <- x+x*2
salq $2, %rax               # return t<<2
```
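The two instructions compute 3x and then multiply by 4 with a shift. A sketch that replays them in C (the name `m12_asm_style` is hypothetical); unsigned arithmetic is used so that overflow and shifting negative values wrap like the hardware instead of being undefined in C.

```c
/* x*12 computed the way the listing does it. */
long m12_asm_style(long x)
{
    unsigned long t = (unsigned long)x + 2 * (unsigned long)x;  /* leaq (%rdi,%rdi,2), %rax : t = 3x */
    return (long)(t << 2);                                      /* salq $2, %rax : 4t = 12x */
}
```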

#### Some Arithmetic Operations

Two-operand instructions:

| Format | Computation | Note |
|---|---|---|
| `addq Src,Dest` | Dest = Dest + Src | |
| `subq Src,Dest` | Dest = Dest - Src | |
| `imulq Src,Dest` | Dest = Dest * Src | |
| `salq Src,Dest` | Dest = Dest << Src | also called `shlq` |
| `sarq Src,Dest` | Dest = Dest >> Src | arithmetic shift (fills with the sign bit) |
| `shrq Src,Dest` | Dest = Dest >> Src | logical shift (fills with zeros) |
| `xorq Src,Dest` | Dest = Dest ^ Src | |
| `andq Src,Dest` | Dest = Dest & Src | |
| `orq Src,Dest` | Dest = Dest \| Src | |

One-operand instructions:

| Format | Computation |
|---|---|
| `incq Dest` | Dest = Dest + 1 |
| `decq Dest` | Dest = Dest - 1 |
| `negq Dest` | Dest = -Dest |
| `notq Dest` | Dest = ~Dest |
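`sarq` and `shrq` both shift right and differ only in what fills the vacated high bits. A sketch with hypothetical names, assuming gcc or clang on x86-64, where `>>` on a negative signed value is implemented as an arithmetic shift (the C standard leaves this implementation-defined). Shift counts are assumed to be less than 64.

```c
#include <stdint.h>

/* sarq: arithmetic right shift, fills with copies of the sign bit. */
int64_t sar64(int64_t x, unsigned k)   { return x >> k; }

/* shrq: logical right shift, fills with zeros (>> on an unsigned type). */
uint64_t shr64(uint64_t x, unsigned k) { return x >> k; }

/* salq (shlq): left shift; done on unsigned to avoid undefined behavior in C. */
uint64_t sal64(uint64_t x, unsigned k) { return x << k; }
```

For example, -16 is `0xfffffffffffffff0`; shifting it right by 2 gives -4 with `sarq` but `0x3ffffffffffffffc` with `shrq`.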

#### Arithmetic Expression Example

```c
long arith(long x, long y, long z)
{
    long t1 = x+y;
    long t2 = z+t1;
    long t3 = x+4;
    long t4 = y * 48;
    long t5 = t3 + t4;
    long rval = t2 * t5;
    return rval;
}
```

The assembly code is:

```assembly
arith:
    leaq (%rdi,%rsi), %rax      # t1
    addq %rdx, %rax             # t2
    leaq (%rsi,%rsi,2), %rdx    # 3*y
    salq $4, %rdx               # t4
    leaq 4(%rdi,%rdx), %rcx     # t5
    imulq %rcx, %rax            # rval
    ret
```

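The registers correspond to the variables as follows:

| Register | Use(s) |
|---|---|
| %rdi | Argument x |
| %rsi | Argument y |
| %rdx | Argument z, then 3*y and t4 |
| %rax | t1, t2, rval |
| %rcx | t5 |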
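Replaying the compiler's instruction sequence in C shows that it computes the same value as the source, even though `t3` never appears on its own: `x + 4` is folded into the displacement of the last `leaq`, and `y * 48` becomes `(3*y) << 4`. A sketch, where `arith_asm_style` and its register-named locals are hypothetical; unsigned arithmetic mirrors the hardware's wrap-around.

```c
/* Original arith() from above. */
long arith(long x, long y, long z)
{
    long t1 = x + y;
    long t2 = z + t1;
    long t3 = x + 4;
    long t4 = y * 48;
    long t5 = t3 + t4;
    long rval = t2 * t5;
    return rval;
}

/* The same computation following the listing instruction by instruction. */
long arith_asm_style(long x, long y, long z)
{
    unsigned long rdi = x, rsi = y, rdx = z, rax, rcx;
    rax = rdi + rsi;        /* leaq (%rdi,%rsi), %rax    t1 = x + y        */
    rax += rdx;             /* addq %rdx, %rax           t2 = t1 + z       */
    rdx = rsi + 2 * rsi;    /* leaq (%rsi,%rsi,2), %rdx  3*y               */
    rdx <<= 4;              /* salq $4, %rdx             t4 = 48*y         */
    rcx = 4 + rdi + rdx;    /* leaq 4(%rdi,%rdx), %rcx   t5 = x + 4 + t4   */
    rax *= rcx;             /* imulq %rcx, %rax          rval = t2 * t5    */
    return (long)rax;
}
```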

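### Summary

- History of Intel processors and architectures
- C, assembly, machine code
  - New forms of visible state: program counter, registers, and condition codes
  - The compiler must transform statements, expressions, and procedures into low-level instruction sequences
- Assembly basics: registers, operands, move
  - The x86-64 move instructions cover a wide range of data-movement forms
- Arithmetic
  - The C compiler will figure out different instruction combinations to carry out a computation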