Operating Systems: Thread Concepts and Control

news 2025/11/5 22:19:27

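- Introduction
- 1. Linux Thread Concepts
  - 1.1 Basic Concept of a Thread
  - 1.2 Paged Memory Management
    - 1.2.1 Where Virtual Addresses and Page Tables Come From (background)
    - 1.2.2 Physical Memory Management (background)
    - 1.2.3 Page Tables (background)
    - 1.2.4 Page Directory Structure (background)
    - 1.2.5 Address Translation with Two-Level Page Tables (key point)
    - 1.2.6 Page Faults
  - 1.3 Advantages of Threads
  - 1.4 Disadvantages of Threads
  - 1.5 Thread Faults
  - 1.6 Uses of Threads
- 2. Threads vs. Processes
  - 2.1 What the Threads of a Process Share
- 3. Linux Thread Control
  - 3.1 The POSIX Thread Library
  - 3.2 Creating a Thread
  - 3.3 Terminating a Thread
  - 3.4 Waiting for a Thread
  - 3.5 Detaching a Thread
- 4. Thread IDs and the Process Address Space Layout
- 5. A Thread Wrapper
- 6. Appendix
  - 6.1 Thread Stacks
  - 6.2 Thread-Related Source Code (to aid understanding)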

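Introduction

This article covers the basic concept of a thread, how Linux implements threads, how threads differ from processes, and how to create, terminate, wait for, and detach threads. It also weighs the advantages and drawbacks of threads, explains what happens when a thread faults, and walks through glibc source code to show how threads are implemented underneath.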

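1. Linux Thread Concepts

1.1 Basic Concept of a Thread

A thread is a single path of execution within a program. More precisely, a thread is "an execution branch inside a process".
Every process has at least one thread of execution.
A thread runs inside its process, which in practice means it runs within the process's address space.
On Linux, the PCB that the CPU sees for a thread is lighter than that of a traditional process, so threads are called lightweight processes.
Most of a process's resources are visible through its virtual address space. Dividing those resources sensibly among several execution flows is what produces threads.
From the point of view of the kernel and of resources:

Process: the basic entity to which system resources are allocated.

Thread: the basic unit of CPU scheduling.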

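1.2 Paged Memory Management

1.2.1 Where Virtual Addresses and Page Tables Come From (background)

Without virtual memory and paging, each user program must occupy one contiguous region of physical memory, and over time physical memory fills up with external fragmentation. The fix is to let the operating system present a contiguous address space to the program while the physical memory behind it need not be contiguous. That is what virtual memory and paged memory management provide.

Physical memory is divided into fixed-size page frames, sometimes called physical pages. Each page frame holds one page, and a page is the same size as a page frame. Most 32-bit architectures use 4KB pages. Some 64-bit architectures, such as Alpha, use 8KB pages, while x86-64 uses 4KB base pages. The distinction between a page and a page frame matters:

A page frame is a storage area;

a page is a block of data that can live in any page frame or on disk.

With this mechanism the CPU no longer accesses physical addresses directly; it reaches physical memory through the virtual address space. The virtual address space is the range of logical addresses the operating system gives each running process. On a 32-bit machine it spans 0 to 4G-1.

The operating system keeps the mapping between the virtual address space and physical memory in a page table. The table records every page-to-frame pair and lets the CPU reach physical memory indirectly.

To summarize, the idea is **to split the logical address space into pages and physical memory into page frames, then use the page table to map contiguous virtual memory onto non-contiguous physical page frames.** This removes the fragmentation caused by requiring contiguous physical memory.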

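1.2.2 Physical Memory Management (background)

**Suppose 4GB of physical memory is available. With 4KB page frames, that is 4GB / 4KB = 1048576 page frames.** The operating system has to manage all of them: it must know which frames are in use, which are free, and so on.

The kernel represents every physical page in the system with a struct page. To save memory, struct page makes heavy use of unions.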

/* include/linux/mm_types.h */
struct page {
    /* Atomic flags, some of which may be updated asynchronously */
    unsigned long flags;
    union {
        struct {
            /* Pageout list, e.g. active_list protected by zone->lru_lock */
            struct list_head lru;
            /*
             * If the low bit is 0, points to the inode address_space, or is NULL.
             * If the page is mapped as anonymous memory, the low bit is set
             * and the pointer refers to an anon_vma object.
             */
            struct address_space *mapping;
            /* Offset within the mapping */
            pgoff_t index;
            /*
             * Mapping-private opaque data.
             * Usually used for buffer_heads if PagePrivate is set,
             * for swp_entry_t if PageSwapCache is set,
             * and for the buddy-system order if PG_buddy is set.
             */
            unsigned long private;
        };
        struct { /* slab, slob and slub */
            union {
                struct list_head slab_list; /* uses lru */
                struct { /* Partial pages */
                    struct page *next;
#ifdef CONFIG_64BIT
                    int pages;    /* Number of pages left */
                    int pobjects; /* Approximate number of objects */
#else
                    short int pages;
                    short int pobjects;
#endif
                };
            };
            struct kmem_cache *slab_cache; /* not slob */
            /* Double-word boundary */
            void *freelist; /* First free object */
            union {
                void *s_mem;            /* slab: first object */
                unsigned long counters; /* SLUB */
                struct {                /* SLUB */
                    unsigned inuse : 16; /* SLUB allocator: objects in use */
                    unsigned objects : 15;
                    unsigned frozen : 1;
                };
            };
        };
    };
    union {
        /*
         * Count of page table entries that map this page. Shows whether the
         * page is mapped and limits the reverse-mapping search.
         */
        atomic_t _mapcount;
        unsigned int page_type;
        unsigned int active; /* SLAB */
        int units;           /* SLOB */
    };
#if defined(WANT_PAGE_VIRTUAL)
    /* Kernel virtual address (NULL if not mapped, i.e. high memory) */
    void *virtual;
#endif /* WANT_PAGE_VIRTUAL */
    ...
};

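The most important fields are:

flags: stores the state of the page, such as whether it is dirty or locked in memory. Each bit of flags represents one state, so the field can represent at least 32 different states at once. The flags are defined in <linux/page-flags.h>. Some bits are especially important: PG_locked says whether the page is locked, and PG_uptodate says the page's data was read from the block device without error.
_mapcount: the number of page table entries that point to this page, that is, how many times the page is mapped. When the count is -1, no page table entry maps the page, so it can be used for a new allocation.
virtual: the page's virtual address. Normally this is simply the page's address in virtual memory. Some memory (so-called high memory) is not permanently mapped into the kernel address space. In that case the field is NULL and the page must be mapped dynamically when needed.

Note that struct page describes physical pages, not virtual pages, and the kernel allocates one such structure for every physical page in the system. How much memory does that cost in total?

Assume struct page takes 40 bytes, the physical page size is 4KB, and the system has 4GB of physical memory. The system then has 1048576 page frames, and the struct page structures describing them consume 1048576 × 40 bytes = 40MB. Against 4GB of memory this is only about 1%, so the cost of managing every physical page frame is modest.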

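Page size matters for both memory utilization and system overhead. If pages are too large, more space is wasted inside each page (internal fragmentation). If pages are too small, internal fragmentation shrinks, but there are more pages, the page table grows long and occupies a lot of memory, and the system spends more time on page translation. Page size should therefore be moderate, typically 512B to 8KB. Windows uses 4KB page frames.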

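1.2.3 Page Tables (background)

On a 32-bit system, each page table entry points to the start address of a physical page. The maximum virtual address space is 4GB, and every user program has its own. For all 4GB to be usable, the page table must cover the full 4GB, which takes 4GB ÷ 4KB = 1048576 entries, as shown in the figure below:

[Figure: one page table entry per 4KB virtual page, each pointing to a physical page]

The dashed lines appear to split virtual memory into units, but virtual memory is still contiguous. Each unit only indicates which page table entry it maps through, and each entry ultimately maps to one physical page of the same size.

The physical addresses stored in the page table point wherever a free physical page happens to be. **The physical memory actually used is scattered, but the linear addresses of the corresponding virtual memory are contiguous.** The processor uses linear addresses to access data and fetch instructions. As long as the linear addresses are contiguous, the page table locates the actual physical addresses.

On a 32-bit system an address is 4 bytes long, so each page table entry takes 4 bytes. The whole table is 1048576 × 4 bytes = 4MB, which means the page table itself occupies 4MB ÷ 4KB = 1024 physical pages. This raises two problems:

First, page tables were introduced so that a process could be split into pages and stored non-contiguously in physical memory, yet the page table itself now needs 1024 contiguous page frames, which defeats the purpose;
Second, by the principle of locality, a process touches only a few pages during any given period, so there is no need to keep the entire page table resident in memory.

The best way to handle a large page table is to allocate it non-contiguously like any other data: page the page table itself, which yields a multi-level page table.

To do this, split the single page table into 1024 smaller tables of 1024 entries each. 1024 × 1024 entries still cover 4GB of virtual memory.

Each of these smaller tables is now a real page table, so there are 1024 page tables of 4KB each, still 4MB in total. The total is unchanged, but most applications never use the full 4GB and may need only a few dozen page tables. For example, suppose a user program's code, data, and stack segments need 10MB in total. One page table covers 4MB of physical memory, so after rounding up to 12MB only 3 page tables are needed.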

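The calculation:

Each page table entry points to one 4KB physical page, so the 1024 entries of one page table cover 4MB of physical memory;

a 10MB program, rounded up to a multiple of 4MB (12MB), therefore needs just 3 page tables.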

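1.2.4 Page Directory Structure (background)

So far, every page frame is pointed to by one entry in one page table. The 1024 page tables themselves also need to be managed. The table that manages the page tables is called the page directory, and together they form a two-level page table, as shown in the figure below:

[Figure: page directory entries pointing to page tables, which in turn point to physical pages]

The physical address of every page table is held in a page directory entry, and the physical address of the page directory is held in the CR3 register, which stores the page directory address of the task currently running.

So when the operating system loads a user program, it must allocate physical memory not only for the program's contents but also for the page directory and page tables that describe them.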

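1.2.5 Address Translation with Two-Level Page Tables (key point)

The following walks through the translation of the logical address (0000000000 0000000001 111111111111) into a physical address:

On a 32-bit processor with 4KB pages, the low 12 bits of a virtual address are the page offset. The remaining high 20 bits index the page tables and are split into two levels of 10 bits each (10 + 10).

First, read the page directory base address from CR3 and use the first-level index (the top 10 bits) to look up the page directory, which gives the location of the second-level page table in physical memory.

Next, use the second-level index (the middle 10 bits) to look up that page table and find the page frame to be accessed.

Finally, combine the frame address with the page offset to obtain the physical address.

Note that the address of a physical page is always 4KB-aligned (its low 12 bits are all 0), so an entry only needs to record the high 20 bits of the physical page address.

This procedure is the job of the MMU (Memory Management Unit). The MMU is a fast hardware circuit responsible for memory management, and address translation is only one of its functions.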

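There is a cost, though. The MMU has to make two page table lookups to determine the physical address, check permissions, and only then put the address on the bus so that memory can read and return the data. With N levels of page tables, one access becomes N lookups plus one read or write. The more levels there are, the more lookup steps, the longer the CPU waits, and the lower the efficiency.

To summarize: a single-level page table demands a lot of contiguous memory, which motivates multi-level page tables. Multi-level page tables relax the contiguity requirement and save space, but at the price of slower lookups.

Can the lost efficiency be recovered? In computer science, almost every problem can be solved by adding another layer of indirection. The MMU therefore includes a TLB (Translation Lookaside Buffer), which is essentially a cache of recent address translations.

When the CPU hands the MMU a new virtual address, the MMU first checks whether the TLB already holds its mapping. If so, it takes the physical address directly and sends it to memory, which is much faster. Because the TLB is small, TLB misses do happen. On a miss the MMU falls back to walking the page tables. Once it finds the mapping, it sends the address to memory and also records the mapping in the TLB to speed up the next access (exploiting locality).

Resource partitioning: in essence, partitioning the address space.

Resource sharing: in essence, sharing virtual addresses.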

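1.2.6 Page Faults

What happens if the virtual address the CPU gives the MMU has no corresponding physical page in either the TLB or the page tables? This is a page fault: an exception raised by the hardware (the MMU) that software, namely the kernel, can usually correct.

If the target page has no physical page behind it, or has one but the access violates its permissions, the CPU cannot obtain the data and reports a page fault.

The CPU cannot complete the instruction without the data, so the faulting process switches from user mode to kernel mode and the fault is handed to the kernel's page fault handler.

The page fault handler treats the fault differently depending on its type:

Hard page fault (major fault): no physical page holds the data. The kernel must read the data from disk into physical memory and then establish the virtual-to-physical mapping in the page table.
Soft page fault (minor fault): the data is already in a physical page, perhaps brought in by another process, but the faulting process has no mapping for it. The kernel only needs to establish the mapping, with no disk I/O. This typically happens with memory regions shared between processes.
Invalid page fault: for example, the process accesses an address out of bounds or dereferences a null pointer. The kernel reports a segmentation fault (SIGSEGV) and the process is terminated.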

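1.3 Advantages of Threads

Creating a new thread costs far less than creating a new process. When a thread is created, the process already exists and its address space is already laid out, so only a new PCB is needed for scheduling.
Switching between threads requires much less work from the operating system than switching between processes.
- **The main difference is that a thread switch keeps the same virtual address space, while a process switch changes it.** Both kinds of context switch are carried out by the kernel. The most visible cost is saving and restoring the register contents.
- A less visible cost is that a context switch disturbs the processor's caches. Changing the virtual address space invalidates cached address translations: the TLB is flushed, entirely or in part depending on the hardware, and memory accesses are slow for a while afterwards. A switch between threads of the same process avoids this, and the hardware caches stay warm.
Threads consume far fewer resources than processes;
Threads can make full use of the parallelism of a multiprocessor system;
While waiting for slow I/O to finish, the program can carry on with other computation;
Compute-intensive applications split their computation across several threads so that it can run on multiple processors;
I/O-intensive applications overlap I/O operations to improve performance, with threads waiting for different I/O operations at the same time.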

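1.4 Disadvantages of Threads

Performance loss

A compute-intensive thread that is rarely blocked by external events often cannot share a processor with other threads. If there are more compute-intensive threads than available processors, performance can drop noticeably: synchronization and scheduling overhead increases while the available resources stay the same.
Reduced robustness

Writing a multithreaded program takes more thorough and careful thought. Subtle timing differences, or a variable shared when it should not have been, can easily cause harm. In other words, threads are not protected from one another.
Lack of access control

The process is the basic granularity of access control. Calling certain operating system functions from one thread affects the whole process.
Harder programming

Writing and debugging a multithreaded program is much harder than a single-threaded one.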

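1.5 Thread Faults

If a single thread crashes because of a divide-by-zero or a wild pointer, the whole process crashes with it.
A thread is an execution branch of its process, so a fault in a thread is a fault in the process. The fault triggers the signal mechanism, the process is terminated, and every thread in the process exits with it.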

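1.6 Uses of Threads

Used well, multithreading makes CPU-intensive programs run faster.
It also improves the user experience of I/O-intensive programs (writing code while a development tool downloads in the background is an everyday example of multithreaded execution).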

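2. Threads vs. Processes

The process is the basic unit of resource allocation.
The thread is the basic unit of scheduling.
Threads share the process's data but also own some data of their own:
- Thread ID
- A set of registers
- Stack
- errno
- Signal mask
- Scheduling priority

2.1 What the Threads of a Process Share

Because threads share one address space, the text segment and the data segment are shared. A function defined in the program can be called from any thread, and a global variable can be accessed from any thread.

In addition, threads share the following process resources and settings:

The file descriptor table, the disposition of each signal (SIG_IGN, SIG_DFL, or a user-defined handler), the current working directory, and the user ID and group ID.

The relationship between a process and its threads is shown in the figure below:

The single process studied earlier is simply a process with one thread of execution.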

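3. Linux Thread Control

3.1 The POSIX Thread Library

The thread-related functions form a complete family, and the names of almost all of them begin with "pthread_".
To use them, include the header <pthread.h> and pass the "-lpthread" option to the compiler when linking.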

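3.2 Creating a Thread

Purpose: create a new thread
Prototype:
int pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                   void *(*start_routine)(void *), void *arg);
Parameters:
thread: receives the ID of the new thread
attr: thread attributes; NULL means use the default attributes
start_routine: address of the function the thread runs once it starts
arg: argument passed to the start function
Return value: 0 on success; an error number on failure

Error checking:
- Many traditional functions return 0 on success and -1 on failure, and set the global variable errno to indicate the error.
- pthreads functions do not set the global variable errno on failure (most other POSIX functions do). They return the error number as the return value instead.
- pthreads still gives every thread its own errno to support other code that uses errno. For errors from pthreads functions, check the return value, which is cheaper than reading the per-thread errno.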

Example:

#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <pthread.h>

// Start routine of the new thread
void *rout(void *arg) {
    (void)arg;
    for ( ; ; ) {
        printf("I am thread 1\n");
        sleep(1);
    }
    return NULL;
}

int main(void) {
    pthread_t tid;
    int ret;

    // Create the thread
    if ((ret = pthread_create(&tid, NULL, rout, NULL)) != 0) {
        fprintf(stderr, "pthread_create : %s\n", strerror(ret));
        exit(EXIT_FAILURE);
    }

    // Main thread loop
    for ( ; ; ) {
        printf("I am main thread\n");
        sleep(1);
    }
    return 0;
}

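#include <pthread.h>
// Get the calling thread's ID
pthread_t pthread_self(void);

- pthread_self returns a value of type pthread_t that identifies the calling thread. It is the same value pthread_create stores through its first argument. This "ID" is defined and maintained by the pthread library and is unique within the process.
- Because every process has its own address space, this "ID" is meaningful only within the process, not system-wide (the kernel does not know about it). The pthread library creates threads through system calls provided by the kernel (such as clone), and the kernel assigns each thread a separate, system-wide unique ID of its own.
As later sections show, the pthread_t value is actually the starting address, in the process's virtual address space, of the structure in which the pthread library keeps the thread's information.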

The ps command can display thread information.

$ ps -aL | head -1 && ps -aL | grep mythread
PID 	LWP 	TTY 	TIME 	CMD
2711838 2711838 pts/235 00:00:00 mythread
2711838 2711839 pts/235 00:00:00 mythread
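-L option: print thread information

LWP is the thread ID as the kernel sees it. The thread ID obtained earlier from pthread_self is instead an address in the virtual address space. Through that address the library finds the thread's basic information, such as its thread ID, stack, and registers. In the ps -aL output, the main thread's LWP equals the process ID. The main thread's stack lives in the stack region of the address space, while the stacks of the other threads live in the shared region (between the heap and the stack). The pthread library itself resides in the shared region and allocates those stacks there, so every thread except the main thread has its stack in the shared region.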

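3.3 Terminating a Thread

To terminate one thread without terminating the whole process, there are three options:
1. Return from the thread function. This does not apply to the main thread: returning from main is equivalent to calling exit;
2. The thread calls pthread_exit to terminate itself;
3. A thread calls pthread_cancel to terminate another thread in the same process.

The pthread_exit function

Purpose: terminate the calling thread
Prototype:
void pthread_exit(void *value_ptr);
Parameters:
value_ptr: the thread's exit value; it must not point to a local variable.
Return value:
None. Like a process that exits, a thread that terminates never returns to its caller.

Note that the memory pointed to by the pointer passed to pthread_exit, or returned with return, must be global or allocated with malloc. It must not be allocated on the thread function's stack, because by the time another thread receives the pointer the thread function has already exited.

The pthread_cancel function

Purpose: cancel a running thread
Prototype:
int pthread_cancel(pthread_t thread);
Parameters:
thread: thread ID
Return value: 0 on success; an error number on failure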

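3.4 Waiting for a Thread

Why wait for a thread?
- The resources of a thread that has exited are not released; they remain in the process's address space.
- A newly created thread does not reuse the address space of the thread that just exited.

Purpose: wait for a thread to finish
Prototype:
int pthread_join(pthread_t thread, void **value_ptr);
Parameters:
thread: thread ID
value_ptr: points to a pointer that receives the thread's return value
Return value: 0 on success; an error number on failure

The calling thread blocks until the thread with ID thread terminates. The termination status obtained through pthread_join depends on how that thread terminated:
1. If the thread returned with return, the location value_ptr points to holds the return value of the thread function;
2. If the thread was canceled by another thread calling pthread_cancel, the location value_ptr points to holds the constant PTHREAD_CANCELED;
3. If the thread terminated itself by calling pthread_exit, the location value_ptr points to holds the argument passed to pthread_exit;
4. If the termination status is of no interest, pass NULL as value_ptr.

Example: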

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <pthread.h>

void *thread1(void *arg) {
    printf("thread 1 returning ... \n");
    int *p = (int *)malloc(sizeof(int));
    *p = 1;
    return (void *)p;
}

void *thread2(void *arg) {
    printf("thread 2 exiting ...\n");
    int *p = (int *)malloc(sizeof(int));
    *p = 2;
    pthread_exit((void *)p);
}

void *thread3(void *arg) {
    while (1) { // Loop forever to simulate a running thread
        printf("thread 3 is running ...\n");
        sleep(1);
    }
    return NULL;
}

int main(void) {
    pthread_t tid;
    void *ret;

    // thread 1 return
    // pthread_t is wider than %X expects, so the ID is cast and only its low 32 bits are shown
    pthread_create(&tid, NULL, thread1, NULL);
    pthread_join(tid, &ret);
    printf("thread return, thread id %X, return code:%d\n", (unsigned int)tid, *(int *)ret);
    free(ret);

    // thread 2 exit
    pthread_create(&tid, NULL, thread2, NULL);
    pthread_join(tid, &ret);
    printf("thread return, thread id %X, return code:%d\n", (unsigned int)tid, *(int *)ret);
    free(ret);

    // thread 3 cancel by other
    pthread_create(&tid, NULL, thread3, NULL);
    sleep(3);
    pthread_cancel(tid);
    pthread_join(tid, &ret);
    if (ret == PTHREAD_CANCELED)
        printf("thread return, thread id %X, return code:PTHREAD_CANCELED\n", (unsigned int)tid);
    else
        printf("thread return, thread id %X, return code:NULL\n", (unsigned int)tid);
    return 0;
}

Output:

[root@localhost linux]# ./a.out
thread 1 returning ...
thread return, thread id 5AA79700, return code:1
thread 2 exiting ...
thread return, thread id 5AA79700, return code:2
thread 3 is running ...
thread 3 is running ...
thread 3 is running ...
thread return, thread id 5AA79700, return code:PTHREAD_CANCELED


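3.5 Detaching a Thread

By default a newly created thread is joinable. After it exits it must be joined with pthread_join; otherwise its resources are never released, which is a resource leak.
If the thread's return value does not matter, joining is a burden. In that case the system can be told to release the thread's resources automatically when the thread exits.

int pthread_detach(pthread_t thread);

Another thread in the same thread group can detach the target thread, or a thread can detach itself:

pthread_detach(pthread_self());

Joinable and detached are mutually exclusive: a thread cannot be both.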

Example:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <pthread.h>

void *thread_run(void *arg)
{
    pthread_detach(pthread_self());           // Detach this thread
    printf("%s\n", (char *)arg);              // Print the thread argument
    return NULL;
}

int main(void)
{
    pthread_t tid;
    if (pthread_create(&tid, NULL, thread_run, "thread1 run...") != 0) {
        printf("create thread error\n");      // Thread creation failed
        return 1;
    }

    int ret = 0;
    sleep(1);                                 // Important: let the thread detach first, then try to join
    if (pthread_join(tid, NULL) == 0) {
        printf("pthread wait success\n");     // Join succeeded
        ret = 0;
    } else {
        printf("pthread wait failed\n");      // Join failed
        ret = 1;
    }
    return ret;
}

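4. Thread IDs and the Process Address Space Layout

**pthread_create produces a thread ID and stores it at the address its first argument points to.** This thread ID is not the same thing as the LWP shown by ps earlier.

The LWP belongs to the scheduler's domain. A thread is a lightweight process, the smallest unit the operating system scheduler handles, so the kernel needs a number that uniquely identifies it.

The value pthread_create stores through its first argument is instead the address of a region in the process's virtual memory, and that address is the new thread's ID as far as the NPTL thread library is concerned. All later thread library operations use this ID to operate on the thread.
NPTL provides the pthread_self function to obtain the calling thread's own ID.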
```
pthread_t pthread_self(void);
```

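The pthread_t type is implementation-defined. In the NPTL implementation Linux currently uses, a pthread_t thread ID is essentially an address in the process address space.

Conclusion: a thread is managed in two parts. The kernel manages the thread's PCB and its execution, and the dynamic library manages the thread's TCB information and data.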

5. A Thread Wrapper

Thread.hpp

#pragma once
#include <cstdint>
#include <iostream>
#include <string>
#include <functional>
#include <pthread.h>

namespace ThreadModule
{
    // Counter used to build thread names (a plain integer, not atomic; see SetName)
    std::uint32_t cnt = 0;

    // The external function a thread runs. Arguments are not considered here;
    // std::bind can be used later to couple it with other classes
    using threadfunc_t = std::function<void()>;

    // Thread state
    enum class TSTATUS
    {
        THREAD_NEW,
        THREAD_RUNNING,
        THREAD_STOP
    };

    // Thread
    class Thread
    {
    private:
        static void *run(void *obj)
        {
            Thread *self = static_cast<Thread *>(obj);
            pthread_setname_np(pthread_self(), self->_name.c_str()); // Set the thread name
            self->_status = TSTATUS::THREAD_RUNNING;
            if (!self->_joined)
            {
                pthread_detach(pthread_self());
            }
            self->_func();
            return nullptr;
        }

        void SetName()
        {
            // To be protected by a lock later
            _name = "Thread-" + std::to_string(cnt++);
        }

    public:
        Thread(threadfunc_t func)
            : _status(TSTATUS::THREAD_NEW),
              _joined(true),
              _func(func)
        {
            SetName();
        }

        void EnableDetach()
        {
            if (_status == TSTATUS::THREAD_NEW)
                _joined = false;
        }

        void EnableJoined()
        {
            if (_status == TSTATUS::THREAD_NEW)
                _joined = true;
        }

        bool Start()
        {
            if (_status == TSTATUS::THREAD_RUNNING)
                return true;
            int n = ::pthread_create(&_id, nullptr, run, this);
            if (n != 0)
                return false;
            return true;
        }

        bool Join()
        {
            if (_joined)
            {
                int n = pthread_join(_id, nullptr);
                if (n != 0)
                    return false;
                return true;
            }
            return false;
        }

        ~Thread() {}

    private:
        std::string _name;
        pthread_t _id;
        TSTATUS _status;
        bool _joined;
        threadfunc_t _func;
    };
}

main.cc

#include <iostream>
#include <functional>
#include <unistd.h>
#include "Thread.hpp"

void hello1()
{
    char buffer[64];
    pthread_getname_np(pthread_self(), buffer, sizeof(buffer) - 1);
    while (true)
    {
        std::cout << "hello world, " << buffer << std::endl;
        sleep(1);
    }
}

void hello2()
{
    char buffer[64];
    pthread_getname_np(pthread_self(), buffer, sizeof(buffer) - 1);
    while (true)
    {
        std::cout << "hello world, " << buffer << std::endl;
        sleep(1);
    }
}

int main()
{
    pthread_setname_np(pthread_self(), "main");

    ThreadModule::Thread t1(hello1);
    t1.Start();

    ThreadModule::Thread t2(std::bind(&hello2));
    t2.Start();

    t1.Join();
    t2.Join();
    return 0;
}

Checking the result

$ ps -aL
PID LWP TTY TIME CMD
195828 195828 pts/1 00:00:00 main
195828 195829 pts/1 00:00:00 Thread-0
195828 195830 pts/1 00:00:00 Thread-1

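6. Appendix

6.1 Thread Stacks

Linux represents threads and processes uniformly with task_struct, but it treats their stacks differently.

For a Linux process, that is, for the main thread, the stack is simply the stack of the main function. On fork, the child effectively copies the parent's stack address range, which is then handled copy-on-write (COW) and grows dynamically. If growth exceeds the limit, the stack overflows and a segmentation fault is reported (the segmentation fault signal is sent to the process). The process stack is the only region where touching an unmapped page does not necessarily cause a segmentation fault. The fault is reported only once the growth limit is exceeded.
For a thread created by the main thread, the stack no longer grows on demand: its size is fixed in advance. Thread stacks are normally created by the pthread_create interface of the pthread library in glibc, uclibc, and the like. They live in the file mapping region (also called the shared region) and are obtained with the mmap system call, as can be seen in the allocate_stack function in glibc's nptl/allocatestack.c.
mem = mmap (NULL, size, prot, MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
How the size argument of this call is obtained is rather involved. The stack size can be passed in explicitly, or the default can be used, which is usually 8MB. None of that matters much. What matters is that **this kind of stack cannot grow dynamically. Once it is used up there is no more, which is different from the stack of a process created by fork.** After glibc has obtained the stack through mmap, it invokes the sys_clone system call underneath:
int sys_clone(struct pt_regs *regs)
{
    unsigned long clone_flags;
    unsigned long newsp;
    int __user *parent_tidptr, *child_tidptr;

    clone_flags = regs->bx;
    // Stack pointer of the thread, obtained from mmap
    newsp = regs->cx;
    parent_tidptr = (int __user *)regs->dx;
    child_tidptr = (int __user *)regs->di;
    if (!newsp)
        newsp = regs->sp;
    return do_fork(clone_flags, newsp, regs, 0, parent_tidptr, child_tidptr);
}
So a child thread's stack is really a memory region mapped into the process's address space. In principle it is private to the thread. However, when the threads of a process are created they take a shallow copy of many fields of their creator's task_struct, so other threads can still access that stack if they want to. This calls for care.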

6.2 Thread-Related Source Code (to aid understanding)

The following is taken from the pthread sources in glibc-2.4:

Path: nptl/pthread_create.c

int __pthread_create_2_1(newthread, attr, start_routine, arg)
    pthread_t *newthread;
    const pthread_attr_t *attr;
    void *(*start_routine)(void *);
    void *arg;
{
    STACK_VARIABLES;

    // Key point 1: thread attributes. We do not set them, but they are still worth knowing
    const struct pthread_attr *iattr = (struct pthread_attr *)attr;
    if (iattr == NULL)
        /* Is this the best idea? On NUMA machines this could mean
           accessing far-away memory. */
        iattr = &default_attr;

    // Key point 2: the TCB the native thread library uses to describe a thread
    struct pthread *pd = NULL;

    // Key point 3: ALLOCATE_STACK first allocates the struct pthread object. It actually
    // allocates one large block with struct pthread at its start; traced further below
    int err = ALLOCATE_STACK(iattr, &pd);
    if (__builtin_expect(err != 0, 0))
        /* Something went wrong. Maybe a parameter of the attributes is
           invalid or we could not allocate memory. */
        return err;

    /* Initialize the TCB. All initializations with zero should be
       performed in 'get_cached_stack'. This way we avoid doing this if
       the stack freshly allocated with 'mmap'. */
#ifdef TLS_TCB_AT_TP
    /* Reference to the TCB itself. */
    pd->header.self = pd;
    /* Self-reference for TLS. */
    pd->header.tcb = pd;
#endif

    /* Store the address of the start routine and the parameter. Since
       we do not start the function directly the stillborn thread will
       get the information from its thread descriptor. */
    // Key point 4: store in the TCB the address and argument of the function to run later
    pd->start_routine = start_routine;
    pd->arg = arg;

    /* Copy the thread attribute flags. */
    struct pthread *self = THREAD_SELF;
    pd->flags = ((iattr->flags & ~(ATTR_FLAG_SCHED_SET | ATTR_FLAG_POLICY_SET))
                 | (self->flags & (ATTR_FLAG_SCHED_SET | ATTR_FLAG_POLICY_SET)));

    /* Initialize the field for the ID of the thread which is waiting
       for us. This is a self-reference in case the thread is created
       detached. */
    pd->joinid = iattr->flags & ATTR_FLAG_DETACHSTATE ? pd : NULL;

    /* The debug events are inherited from the parent. */
    pd->eventbuf = self->eventbuf;

    /* Copy the parent's scheduling parameters. The flags will say what
       is valid and what is not. */
    pd->schedpolicy = self->schedpolicy;
    pd->schedparam = self->schedparam;

    /* Copy the stack guard canary. */
#ifdef THREAD_COPY_STACK_GUARD
    THREAD_COPY_STACK_GUARD(pd);
#endif

    /* Copy the pointer guard value. */
#ifdef THREAD_COPY_POINTER_GUARD
    THREAD_COPY_POINTER_GUARD(pd);
#endif

    // A batch of parameter settings that we do not care about
    /* Determine scheduling parameters for the thread. */
    if (attr != NULL
        && __builtin_expect((iattr->flags & ATTR_FLAG_NOTINHERITSCHED) != 0, 0)
        && (iattr->flags & (ATTR_FLAG_SCHED_SET | ATTR_FLAG_POLICY_SET)) != 0)
    {
        INTERNAL_SYSCALL_DECL(scerr);

        /* Use the scheduling parameters the user provided. */
        if (iattr->flags & ATTR_FLAG_POLICY_SET)
            pd->schedpolicy = iattr->schedpolicy;
        else if ((pd->flags & ATTR_FLAG_POLICY_SET) == 0)
        {
            pd->schedpolicy = INTERNAL_SYSCALL(sched_getscheduler, scerr, 1, 0);
            pd->flags |= ATTR_FLAG_POLICY_SET;
        }

        if (iattr->flags & ATTR_FLAG_SCHED_SET)
            memcpy(&pd->schedparam, &iattr->schedparam,
                   sizeof(struct sched_param));
        else if ((pd->flags & ATTR_FLAG_SCHED_SET) == 0)
        {
            INTERNAL_SYSCALL(sched_getparam, scerr, 2, 0, &pd->schedparam);
            pd->flags |= ATTR_FLAG_SCHED_SET;
        }

        /* Check for valid priorities. */
        int minprio = INTERNAL_SYSCALL(sched_get_priority_min, scerr, 1,
                                       iattr->schedpolicy);
        int maxprio = INTERNAL_SYSCALL(sched_get_priority_max, scerr, 1,
                                       iattr->schedpolicy);
        if (pd->schedparam.sched_priority < minprio
            || pd->schedparam.sched_priority > maxprio)
        {
            err = EINVAL;
            goto errout;
        }
    }

    /* Pass the descriptor to the caller. */
    // Key point 5: pd (the address of the thread control block) is handed out as the ID,
    // so what the caller receives is a virtual address
    *newthread = (pthread_t)pd;

    /* Remember whether the thread is detached or not. In case of
       an error we have to free the stacks of non-detached stillborn
       threads. */
    // Key point 6: check whether the thread is detached, which is easy to follow
    bool is_detached = IS_DETACHED(pd);

    /* Start the thread. */
    err = create_thread(pd, iattr, STACK_VARIABLES_ARGS); // the key function
    if (err != 0)
    {
        /* Something went wrong. Free the resources. */
        if (!is_detached)
        {
        errout:
            __deallocate_stack(pd);
        }
        return err;
    }
    return 0;
}
// Symbol versioning: if the library in use is GLIBC_2_1, pthread_create is __pthread_create_2_1
versioned_symbol(libpthread, __pthread_create_2_1, pthread_create, GLIBC_2_1);

Thread attributes

struct pthread_attr
{
    /* Scheduler parameters and priority. */
    struct sched_param schedparam;
    int schedpolicy;
    /* Various flags like detachstate, scope, etc. */
    int flags;
    /* Size of guard area. */
    size_t guardsize;
    /* Stack handling. */
    void *stackaddr;
    size_t stacksize;
    /* Affinity map. */
    cpu_set_t *cpuset;
    size_t cpusetsize;
};

Thread TCB

/* Thread descriptor data structure. */
struct pthread
{
    union
    {
#if !TLS_DTV_AT_TP
        /* This overlaps the TCB as used for TLS without threads (see tls.h). */
        tcbhead_t header;
#else
        struct
        {
            int multiple_threads;
        } header;
#endif
        /* This extra padding has no special purpose, and this structure layout
           is private and subject to change without affecting the official ABI.
           We just have it here in case it might be convenient for some
           implementation-specific instrumentation hack or suchlike. */
        void *__padding[16];
    };

    /* This descriptor's link on the 'stack_used' or '__stack_user' list. */
    list_t list;

    /* Thread ID - which is also a 'is this thread descriptor (and
       therefore stack) used' flag. */
    pid_t tid;

    /* Process ID - thread group ID in kernel speak. */
    pid_t pid;

    /* List of robust mutexes the thread is holding. */
#ifdef __PTHREAD_MUTEX_HAVE_PREV
    __pthread_list_t robust_list;

# define ENQUEUE_MUTEX(mutex) \
    do { \
        __pthread_list_t *next = THREAD_GETMEM (THREAD_SELF, robust_list.__next); \
        next->__prev = &mutex->__data.__list; \
        mutex->__data.__list.__next = next; \
        mutex->__data.__list.__prev = &THREAD_SELF->robust_list; \
        THREAD_SETMEM (THREAD_SELF, robust_list.__next, &mutex->__data.__list); \
    } while (0)
# define DEQUEUE_MUTEX(mutex) \
    do { \
        mutex->__data.__list.__next->__prev = mutex->__data.__list.__prev; \
        mutex->__data.__list.__prev->__next = mutex->__data.__list.__next; \
        mutex->__data.__list.__prev = NULL; \
        mutex->__data.__list.__next = NULL; \
    } while (0)
#else
    __pthread_slist_t robust_list;

# define ENQUEUE_MUTEX(mutex) \
    do { \
        mutex->__data.__list.__next \
            = THREAD_GETMEM (THREAD_SELF, robust_list.__next); \
        THREAD_SETMEM (THREAD_SELF, robust_list.__next, &mutex->__data.__list); \
    } while (0)
# define DEQUEUE_MUTEX(mutex) \
    do { \
        __pthread_slist_t *rump = THREAD_GETMEM (THREAD_SELF, robust_list.__next); \
        if (rump == &mutex->__data.__list) \
            THREAD_SETMEM (THREAD_SELF, robust_list.__next, rump->__next); \
        else \
        { \
            while (rump->__next != &mutex->__data.__list) \
                rump = rump->__next; \
        } \
        rump->__next = rump->__next->__next; \
        mutex->__data.__list.__next = NULL; \
    } while (0)
#endif

    /* List of cleanup buffers. */
    struct _pthread_cleanup_buffer *cleanup;

    /* Unwind information. */
    struct pthread_unwind_buf *cleanup_jmp_buf;
#define HAVE_CLEANUP_JMP_BUF

    /* Flags determining processing of cancellation. */
    int cancelhandling;
    /* Bit set if cancellation is disabled. */
#define CANCELSTATE_BIT 0
#define CANCELSTATE_BITMASK 0x01
    /* Bit set if asynchronous cancellation mode is selected. */
#define CANCELTYPE_BIT 1
#define CANCELTYPE_BITMASK 0x02
    /* Bit set if canceling has been initiated. */
#define CANCELING_BIT 2
#define CANCELING_BITMASK 0x04
    /* Bit set if canceled. */
#define CANCELED_BIT 3
#define CANCELED_BITMASK 0x08
    /* Bit set if thread is exiting. */
#define EXITING_BIT 4
#define EXITING_BITMASK 0x10
    /* Bit set if thread terminated and TCB is freed. */
#define TERMINATED_BIT 5
#define TERMINATED_BITMASK 0x20
    /* Bit set if thread is supposed to change XID. */
#define SETXID_BIT 6
#define SETXID_BITMASK 0x40
    /* Mask for the rest. Helps the compiler to optimize. */
#define CANCEL_RESTMASK 0xffffff80

#define CANCEL_ENABLED_AND_CANCELED(value) \
    (((value) & (CANCELSTATE_BITMASK | CANCELED_BITMASK | EXITING_BITMASK \
                 | CANCEL_RESTMASK | TERMINATED_BITMASK)) == CANCELED_BITMASK)
#define CANCEL_ENABLED_AND_CANCELED_AND_ASYNCHRONOUS(value) \
    (((value) & (CANCELSTATE_BITMASK | CANCELTYPE_BITMASK | CANCELED_BITMASK \
                 | EXITING_BITMASK | CANCEL_RESTMASK | TERMINATED_BITMASK)) \
     == (CANCELTYPE_BITMASK | CANCELED_BITMASK))

    /* We allocate one block of references here. This should be enough
       to avoid allocating any memory dynamically for most applications. */
    struct pthread_key_data
    {
        /* Sequence number. We use uintptr_t to not require padding on
           32- and 64-bit machines. On 64-bit machines it helps to avoid
           wrapping, too. */
        uintptr_t seq;

        /* Data pointer. */
        void *data;
    } specific_1stblock[PTHREAD_KEY_2NDLEVEL_SIZE];

    /* Two-level array for the thread-specific data. */
    struct pthread_key_data *specific[PTHREAD_KEY_1STLEVEL_SIZE];

    /* Flag which is set when specific data is set. */
    bool specific_used;

    /* True if events must be reported. */
    bool report_events;

    /* True if the user provided the stack. */
    bool user_stack;

    /* True if thread must stop at startup time. */
    bool stopped_start;

    /* Lock to synchronize access to the descriptor. */
    lll_lock_t lock;

    /* Lock for synchronizing setxid calls. */
    lll_lock_t setxid_futex;

#if HP_TIMING_AVAIL
    /* Offset of the CPU clock at start thread start time. */
    hp_timing_t cpuclock_offset;
#endif

    /* If the thread waits to join another one the ID of the latter is
       stored here.

       In case a thread is detached this field contains a pointer of the
       TCB if the thread itself. This is something which cannot happen
       in normal operation. */
    struct pthread *joinid;
    /* Check whether a thread is detached. */
#define IS_DETACHED(pd) ((pd)->joinid == (pd))

    /* Flags. Including those copied from the thread attribute. */
    int flags;

    /* The result of the thread function. */
    // When the thread finishes, its return value is a void* and is stored in this TCB field.
    // pthread_join obtains the thread's exit information by reading this structure.
    // This also explains how a thread's execution flow can end while its TCB is kept for a while.
    void *result;

    /* Scheduling parameters for the new thread. */
    struct sched_param schedparam;
    int schedpolicy;

    /* Start position of the code to be executed and the argument passed
       to the function. */
    // The function and argument specified by the user
    void *(*start_routine) (void *);
    void *arg;

    /* Debug state. */
    td_eventbuf_t eventbuf;
    /* Next descriptor with a pending event. */
    struct pthread *nextevent;

#ifdef HAVE_FORCED_UNWIND
    /* Machine-specific unwind info. */
    struct _Unwind_Exception exc;
#endif

    /* If nonzero pointer to area allocated for the stack and its
       size. */
    void *stackblock;
    size_t stackblock_size;
    size_t guardsize;
    size_t reported_guardsize;

    /* Resolver state. */
    struct __res_state res;

    /* This member must be last. */
    char end_padding[];

#define PTHREAD_STRUCT_END_PADDING \
    (sizeof (struct pthread) - offsetof (struct pthread, end_padding))
} __attribute ((aligned (TCB_ALIGNMENT)));

create_thread

static int
create_thread(struct pthread *pd, const struct pthread_attr *attr,
              STACK_VARIABLES_PARMS)
{
#ifdef TLS_TCB_AT_TP
    assert(pd->header.tcb != NULL);
#endif

    /* We rely heavily on various flags the CLONE function understands:

       CLONE_VM, CLONE_FS, CLONE_FILES
         These flags select semantics with shared address space and
         file descriptors according to what POSIX requires.

       CLONE_SIGNAL
         This flag selects the POSIX signal semantics.

       CLONE_SETTLS
         The sixth parameter to CLONE determines the TLS area for the
         new thread.

       CLONE_PARENT_SETTID
         The kernels writes the thread ID of the newly created thread
         into the location pointed to by the fifth parameters to CLONE.

         Note that it would be semantically equivalent to use
         CLONE_CHILD_SETTID but it is be more expensive in the kernel.

       CLONE_CHILD_CLEARTID
         The kernels clears the thread ID of a thread that has called
         to CLONE.

       CLONE_DETACHED
         No signal is generated if the thread exists and it is
         automatically reaped.

       The termination signal is chosen to be zero which means no signal
       is sent. */
    int clone_flags = (CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGNAL
                       | CLONE_SETTLS | CLONE_PARENT_SETTID
                       | CLONE_CHILD_CLEARTID | CLONE_SYSVSEM
#if __ASSUME_NO_CLONE_DETACHED == 0
                       | CLONE_DETACHED
#endif
                       | 0);

    if (__builtin_expect(THREAD_GETMEM(THREAD_SELF, report_events), 0))
    {
        /* The parent thread is supposed to report events. Check whether
           the TD_CREATE event is needed, too. */
        const int idx = __td_eventword(TD_CREATE);
        const uint32_t __mask = __td_eventmask(TD_CREATE);

        if ((__mask & (_nptl_threads_events.event_bits[idx]
                       | pd->eventbuf.eventmask.event_bits[idx])) != 0)
        {
            /* We always must have the thread start stopped. */
            pd->stopped_start = true;

            /* Create the thread. We always create the thread stopped
               so that it does not get far before we tell the debugger. */
            int res = do_clone(pd, attr, clone_flags, start_thread,
                               STACK_VARIABLES_ARGS, 1);
            if (res == 0)
            {
                /* Now fill in the information about the new thread in
                   the newly created thread's data structure. We cannot let
                   the new thread do this since we don't know whether it was
                   already scheduled when we send the event. */
                pd->eventbuf.eventnum = TD_CREATE;
                pd->eventbuf.eventdata = pd;

                /* Enqueue the descriptor. */
                do
                    pd->nextevent = __nptl_last_event;
                while (atomic_compare_and_exchange_bool_acq(&__nptl_last_event,
                                                            pd, pd->nextevent) != 0);

                /* Now call the function which signals the event. */
                __nptl_create_event();

                /* And finally restart the new thread. */
                lll_unlock(pd->lock);
            }

            return res;
        }
    }

#ifdef NEED_DL_SYSINFO
    assert(THREAD_SELF_SYSINFO == THREAD_SYSINFO(pd));
#endif

    /* Determine whether the newly created threads has to be started
       stopped since we have to set the scheduling parameters or set the
       affinity. */
    bool stopped = false;
    if (attr != NULL && (attr->cpuset != NULL
                         || (attr->flags & ATTR_FLAG_NOTINHERITSCHED) != 0))
        stopped = true;
    pd->stopped_start = stopped;

    /* Actually create the thread. */
    int res = do_clone(pd, attr, clone_flags, start_thread,
                       STACK_VARIABLES_ARGS, stopped);

    if (res == 0 && stopped)
        /* And finally restart the new thread. */
        lll_unlock(pd->lock);

    return res;
}

do_clone

static int
do_clone(struct pthread *pd, const struct pthread_attr *attr,
         int clone_flags, int (*fct)(void *), STACK_VARIABLES_PARMS,
         int stopped)
{
#ifdef PREPARE_CREATE
    PREPARE_CREATE;
#endif

    if (stopped)
        /* We Make sure the thread does not run far by forcing it to get a
           lock. We lock it here too so that the new thread cannot continue
           until we tell it to. */
        lll_lock(pd->lock);

    /* One more thread. We cannot have the thread do this itself, since it
       might exist but not have been scheduled yet by the time we've returned
       and need to check the value to behave correctly. We must do it before
       creating the thread, in case it does get scheduled first and then
       might mistakenly think it was the only thread. In the failure case,
       we momentarily store a false value; this doesn't matter because there
       is no kosher thing a signal handler interrupting us right here can do
       that cares whether the thread count is correct. */
    atomic_increment(&__nptl_nthreads);

    // Call the clone function for the specific architecture
    if (ARCH_CLONE(fct, STACK_VARIABLES_ARGS, clone_flags,       ///
                   pd, &pd->tid, TLS_VALUE, &pd->tid) == -1)     ///
    {
        atomic_decrement(&__nptl_nthreads); /* Oops, we lied for a second. */

        /* Failed. If the thread is detached, remove the TCB here since
           the caller cannot do this. The caller remembered the thread
           as detached and cannot reverify that it is not since it must
           not access the thread descriptor again. */
        if (IS_DETACHED(pd))
            __deallocate_stack(pd);

        return errno;
    }

    /* Now we have the possibility to set scheduling parameters etc. */
    // The code below issues the relevant system calls to set the lightweight
    // process's scheduling parameters and to handle failures; it is not our focus
    if (__builtin_expect(stopped != 0, 0))
    {
        INTERNAL_SYSCALL_DECL(err);
        int res = 0;

        /* Set the affinity mask if necessary. */
        if (attr->cpuset != NULL)
        {
            res = INTERNAL_SYSCALL(sched_setaffinity, err, 3, pd->tid,
                                   sizeof(cpu_set_t), attr->cpuset);

            if (__builtin_expect(INTERNAL_SYSCALL_ERROR_P(res, err), 0))
            {
                /* The operation failed. We have to kill the thread. First
                   send it the cancellation signal. */
                INTERNAL_SYSCALL_DECL(err2);
            err_out:
#if __ASSUME_TGKILL
                (void)INTERNAL_SYSCALL(tgkill, err2, 3,
                                       THREAD_GETMEM(THREAD_SELF, pid),
                                       pd->tid, SIGCANCEL);
#else
                (void)INTERNAL_SYSCALL(tkill, err2, 2, pd->tid, SIGCANCEL);
#endif
                return (INTERNAL_SYSCALL_ERROR_P(res, err)
                            ? INTERNAL_SYSCALL_ERRNO(res, err)
                            : 0);
            }
        }

        /* Set the scheduling parameters. */
        if ((attr->flags & ATTR_FLAG_NOTINHERITSCHED) != 0)
        {
            res = INTERNAL_SYSCALL(sched_setscheduler, err, 3, pd->tid,
                                   pd->schedpolicy, &pd->schedparam);

            if (__builtin_expect(INTERNAL_SYSCALL_ERROR_P(res, err), 0))
                goto err_out;
        }
    }

    /* We now have for sure more than one thread. The main thread might
       not yet have the flag set. No need to set the global variable
       again if this is what we use. */
    THREAD_SETMEM(THREAD_SELF, header.multiple_threads, 1);

    return 0;
}
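The "started stopped" path in do_clone is taken when the attribute object carries a CPU set or explicit scheduling settings. From user code, one way to reach the CPU-set case is pthread_attr_setaffinity_np. The following is only a minimal sketch, assuming a Linux system with glibc; the helper names report_affinity and run_pinned_thread are made up for this illustration, and newer glibc releases may organize the internal steps differently from the listing above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Thread body (hypothetical helper): if this thread is bound to exactly
   one CPU, store that CPU number in *arg; otherwise store -1. */
static void *report_affinity(void *arg)
{
    int *result = arg;
    cpu_set_t mine;

    *result = -1;
    CPU_ZERO(&mine);
    /* pid 0 means "the calling thread" */
    if (sched_getaffinity(0, sizeof(mine), &mine) != 0)
        return NULL;
    if (CPU_COUNT(&mine) != 1)
        return NULL;
    for (int i = 0; i < CPU_SETSIZE; i++) {
        if (CPU_ISSET(i, &mine)) {
            *result = i;
            break;
        }
    }
    return NULL;
}

/* Hypothetical helper: create a thread pinned to the first CPU the caller
   is allowed to run on. Returns that CPU number and stores the CPU seen
   by the new thread in *seen; returns -1 on any failure. */
static int run_pinned_thread(int *seen)
{
    assert(seen != NULL);

    cpu_set_t allowed, target;
    if (pthread_getaffinity_np(pthread_self(), sizeof(allowed), &allowed) != 0)
        return -1;

    int cpu = -1;
    for (int i = 0; i < CPU_SETSIZE; i++) {
        if (CPU_ISSET(i, &allowed)) {
            cpu = i;
            break;
        }
    }
    if (cpu < 0)
        return -1;

    CPU_ZERO(&target);
    CPU_SET(cpu, &target);

    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0)
        return -1;
    /* Storing a CPU set in the attribute is what makes attr->cpuset
       non-NULL in the listing above. */
    if (pthread_attr_setaffinity_np(&attr, sizeof(target), &target) != 0) {
        pthread_attr_destroy(&attr);
        return -1;
    }

    pthread_t tid;
    int rc = pthread_create(&tid, &attr, report_affinity, seen);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return -1;

    pthread_join(tid, NULL);
    return cpu;
}
```

Calling run_pinned_thread(&seen) from main and comparing its return value with seen shows whether the affinity was already in place when the thread body started running. Compile with -pthread.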

ARCH_CLONE

#define ARCH_CLONE __clone

__clone is a function that glibc writes in assembly to wrap the clone system call, so its implementation is assembly code. Here is a copy (sysdeps/unix/sysv/linux/x86_64):

ENTRY (BP_SYM (__clone))
    /* Sanity check arguments. */
    movq    $-EINVAL,%rax
    testq   %rdi,%rdi        /* no NULL function pointers */
    jz      SYSCALL_ERROR_LABEL
    testq   %rsi,%rsi        /* no NULL stack pointers */
    jz      SYSCALL_ERROR_LABEL

    /* Insert the argument onto the new stack. */
    subq    $16,%rsi
    movq    %rcx,8(%rsi)

    /* Save the function pointer. It will be popped off in the
       child in the ebx frobbing below. */
    movq    %rdi,0(%rsi)

    /* Do the system call. */
    movq    %rdx, %rdi
    movq    %r8, %rdx
    movq    %r9, %r8
    movq    8(%rsp), %r10
    movl    $SYS_ify(clone),%eax    // load the system call number

    /* End FDE now, because in the child the unwind info will be
       wrong. */
    cfi_endproc;
    // Trap into the kernel (32-bit x86 uses int $0x80 instead) and ask it
    // to create a lightweight process
    syscall    ///

    testq   %rax,%rax
    jl      SYSCALL_ERROR_LABEL
    jz      L(thread_start)
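The same system call can be reached from C through glibc's public clone() wrapper. The sketch below is an illustration under stated assumptions, not how pthread_create itself calls it: it passes only CLONE_VM (plus SIGCHLD as the exit signal), so the child is a separate process that shares the parent's address space, whereas NPTL passes many more flags such as CLONE_THREAD and CLONE_SETTLS. The names child_fn, run_clone_demo and DEMO_STACK_SIZE are invented for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>

#define DEMO_STACK_SIZE (64 * 1024)

/* Runs in the child. Because of CLONE_VM the write below lands in the
   parent's memory as well. */
static int child_fn(void *arg)
{
    assert(arg != NULL);
    *(int *)arg = 42;
    return 0;
}

/* Hypothetical helper: returns the value written by the child,
   or -1 if any step fails. */
static int run_clone_demo(void)
{
    int shared = 0;

    /* The caller must supply the child's stack, just as allocate_stack
       does for a new thread. */
    void *stack = mmap(NULL, DEMO_STACK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        return -1;

    /* The stack grows downwards on x86-64, so clone() receives the
       highest address of the block. */
    char *stack_top = (char *)stack + DEMO_STACK_SIZE;

    pid_t pid = clone(child_fn, stack_top, CLONE_VM | SIGCHLD, &shared);
    if (pid == -1) {
        munmap(stack, DEMO_STACK_SIZE);
        return -1;
    }

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    munmap(stack, DEMO_STACK_SIZE);
    if (waited == -1)
        return -1;

    return shared;
}
```

Note how the two sanity checks at the top of the assembly listing map onto the first two arguments here: a NULL function pointer or a NULL stack pointer makes the wrapper fail with EINVAL before the system call is issued.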

Stack allocation: int err = ALLOCATE_STACK(iattr, &pd);

Source path: nptl/allocatestack.c

// The allocation "function" is really just a macro
#define ALLOCATE_STACK(attr, pd) \
    allocate_stack(attr, pd, &stackaddr, &stacksize)

static int
allocate_stack(const struct pthread_attr *attr, struct pthread **pdp,
               ALLOCATE_STACK_PARMS)
{
    struct pthread *pd;
    size_t size;
    size_t pagesize_m1 = __getpagesize() - 1;
    void *stacktop;

    assert(attr != NULL);
    assert(powerof2(pagesize_m1 + 1));
    assert(TCB_ALIGNMENT >= STACK_ALIGN);

    // Get the stack size; use the default if the user did not set one
    size = attr->stacksize ?: __default_stacksize;

    // If the user already supplied stack memory in the thread attributes, use it directly
    if (__builtin_expect(attr->flags & ATTR_FLAG_STACKADDR, 0)) {
        uintptr_t adj;

        // The stack is too small: report an error
        if (attr->stacksize != 0 &&
            attr->stacksize < (__static_tls_size + MINIMAL_REST_STACK))
            return EINVAL;

        // TLS alignment
#if TLS_TCB_AT_TP
        adj = ((uintptr_t)attr->stackaddr - TLS_TCB_SIZE) & __static_tls_align_m1;
        assert(size > adj + TLS_TCB_SIZE);
#elif TLS_DTV_AT_TP
        adj = ((uintptr_t)attr->stackaddr - __static_tls_size) & __static_tls_align_m1;
        assert(size > adj);
#endif

        // Use the memory provided by the user; no guard page is set up
#if TLS_TCB_AT_TP
        pd = (struct pthread *)((uintptr_t)attr->stackaddr - TLS_TCB_SIZE - adj);
#elif TLS_DTV_AT_TP
        pd = (struct pthread *)((uintptr_t)attr->stackaddr - __static_tls_size -
                                adj - TLS_PRE_TCB_SIZE);
#endif

        memset(pd, '\0', sizeof(struct pthread));

        pd->specific[0] = pd->specific_1stblock;
        pd->stackblock = (char *)attr->stackaddr - size;
        pd->stackblock_size = size;
        pd->user_stack = true;
        pd->header.multiple_threads = 1;

#ifndef TLS_MULTIPLE_THREADS_IN_TCB
        __pthread_multiple_threads = *__libc_multiple_threads_ptr = 1;
#endif

#ifdef NEED_DL_SYSINFO
        THREAD_SYSINFO(pd) = THREAD_SELF_SYSINFO;
#endif

        pd->pid = THREAD_GETMEM(THREAD_SELF, pid);

#ifdef __PTHREAD_MUTEX_HAVE_PREV
        pd->robust_list.__prev = &pd->robust_list;
#endif
        pd->robust_list.__next = &pd->robust_list;

        if (_dl_allocate_tls(TLS_TPADJ(pd)) == NULL) {
            assert(errno == ENOMEM);
            return EAGAIN;
        }

        lll_lock(stack_cache_lock);
        list_add(&pd->list, &__stack_user);
        lll_unlock(stack_cache_lock);
    }
    else {
        // The library allocates the memory itself
        size_t guardsize, reqsize;
        void *mem;
        const int prot = PROT_READ | PROT_WRITE |
                         ((GL(dl_stack_flags) & PF_X) ? PROT_EXEC : 0);

#if COLORING_INCREMENT != 0
        if (size <= 16 * pagesize_m1)
            size += pagesize_m1 + 1;
#endif

        size &= ~__static_tls_align_m1;
        assert(size != 0);

        guardsize = (attr->guardsize + pagesize_m1) & ~pagesize_m1;
        if (__builtin_expect(size < ((guardsize + __static_tls_size +
                                      MINIMAL_REST_STACK + pagesize_m1) &
                                     ~pagesize_m1),
                             0))
            return EINVAL;

        // First try to take a stack from the pthread stack cache
        reqsize = size;
        pd = get_cached_stack(&size, &mem);
        if (pd == NULL) {
            if ((size % MULTI_PAGE_ALIASING) == 0)
                size += pagesize_m1 + 1;

            // Nothing usable in the cache: map private anonymous memory with
            // mmap, which plays the role of malloc here. mmap can also be used
            // to implement shared memory, but that is unrelated to this code.
            mem = mmap(NULL, size, prot,
                       MAP_PRIVATE | MAP_ANONYMOUS | ARCH_MAP_FLAGS, -1, 0);

            if (__builtin_expect(mem == MAP_FAILED, 0)) {
#ifdef ARCH_RETRY_MMAP
                mem = ARCH_RETRY_MMAP(size);
                if (__builtin_expect(mem == MAP_FAILED, 0))
#endif
                    return errno;
            }

            assert(mem != NULL);

#if COLORING_INCREMENT != 0
            unsigned int ncreated = atomic_increment_val(&nptl_ncreated);
            size_t coloring = (ncreated * COLORING_INCREMENT) & pagesize_m1;
            if (__builtin_expect((coloring & __static_tls_align_m1) != 0, 0))
                coloring = ((coloring + __static_tls_align_m1) &
                            ~__static_tls_align_m1) & ~pagesize_m1;
#else
#define coloring 0
#endif

            // The code below corresponds to the diagram shown earlier: it locates
            // the struct pthread (the TCB) inside the block that was just mapped
#if TLS_TCB_AT_TP
            pd = (struct pthread *)((char *)mem + size - coloring) - 1;
#elif TLS_DTV_AT_TP
            pd = (struct pthread *)((((uintptr_t)mem + size - coloring -
                                      __static_tls_size) &
                                     ~__static_tls_align_m1) -
                                    TLS_PRE_TCB_SIZE);
#endif

            // Record the address and size of the whole block
            pd->stackblock = mem;
            pd->stackblock_size = size;
            pd->specific[0] = pd->specific_1stblock;
            pd->header.multiple_threads = 1;

#ifndef TLS_MULTIPLE_THREADS_IN_TCB
            __pthread_multiple_threads = *__libc_multiple_threads_ptr = 1;
#endif

#ifdef NEED_DL_SYSINFO
            THREAD_SYSINFO(pd) = THREAD_SELF_SYSINFO;
#endif

            // Record the pid of the process this thread belongs to
            pd->pid = THREAD_GETMEM(THREAD_SELF, pid);

#ifdef __PTHREAD_MUTEX_HAVE_PREV
            pd->robust_list.__prev = &pd->robust_list;
#endif
            pd->robust_list.__next = &pd->robust_list;

            if (_dl_allocate_tls(TLS_TPADJ(pd)) == NULL) {
                assert(errno == ENOMEM);
                (void)munmap(mem, size);
                return EAGAIN;
            }

            lll_lock(stack_cache_lock);
            list_add(&pd->list, &stack_used);
            lll_unlock(stack_cache_lock);

            if (__builtin_expect((GL(dl_stack_flags) & PF_X) != 0 &&
                                 (prot & PROT_EXEC) == 0, 0)) {
                int err = change_stack_perm(pd
#ifdef NEED_SEPARATE_REGISTER_STACK
                                            , ~pagesize_m1
#endif
                                            );
                if (err != 0) {
                    (void)munmap(mem, size);
                    return err;
                }
            }
        }

        if (__builtin_expect(guardsize > pd->guardsize, 0)) {
#ifdef NEED_SEPARATE_REGISTER_STACK
            char *guard = mem + (((size - guardsize) / 2) & ~pagesize_m1);
#else
            char *guard = mem;
#endif
            if (mprotect(guard, guardsize, PROT_NONE) != 0) {
                int err;
            mprot_error:
                err = errno;

                lll_lock(stack_cache_lock);
                list_del(&pd->list);
                lll_unlock(stack_cache_lock);

                _dl_deallocate_tls(TLS_TPADJ(pd), false);
                (void)munmap(mem, size);

                return err;
            }

            pd->guardsize = guardsize;
        }
        else if (__builtin_expect(pd->guardsize - guardsize > size - reqsize, 0)) {
#ifdef NEED_SEPARATE_REGISTER_STACK
            char *guard = mem + (((size - guardsize) / 2) & ~pagesize_m1);
            char *oldguard = mem + (((size - pd->guardsize) / 2) & ~pagesize_m1);

            if (oldguard < guard &&
                mprotect(oldguard, guard - oldguard, prot) != 0)
                goto mprot_error;

            if (mprotect(guard + guardsize,
                         oldguard + pd->guardsize - guard - guardsize,
                         prot) != 0)
                goto mprot_error;
#else
            if (mprotect((char *)mem + guardsize, pd->guardsize - guardsize,
                         prot) != 0)
                goto mprot_error;
#endif
            pd->guardsize = guardsize;
        }

        pd->reported_guardsize = guardsize;
    }

    pd->lock = LLL_LOCK_INITIALIZER;

    // Output parameter (pointer to pointer): returns the address of the
    // struct pthread, which is simply an address inside the allocated block;
    // compare with the earlier diagram
    *pdp = pd;

#if TLS_TCB_AT_TP
    stacktop = ((char *)(pd + 1) - __static_tls_size);
#elif TLS_DTV_AT_TP
    stacktop = (char *)(pd - 1);
#endif

#ifdef NEED_SEPARATE_REGISTER_STACK
    *stack = pd->stackblock;
    *stacksize = stacktop - *stack;
#else
    *stack = stacktop;
#endif

    return 0;
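}

// So creating a thread comes down to two steps. First, inside the pthread library, a structure
// describing the thread is created and its attributes are filled in.
// Second, clone is called so that the kernel creates a lightweight process and runs the
// callback function with the argument that was passed in.
// What the library provides is essentially the API for operating on threads afterwards, such as
// setting a thread's priority through its attributes; the actual scheduling is still done by the kernel.
// However, if we design a mechanism at the upper layer that lets threads pause and give up the CPU,
// and keep our own queue in which the threads' TCBs wait their turn, then we can also implement
// thread scheduling in user space on top of the kernel. Many higher-level languages may do exactly this.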
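The stack block and guard area that allocate_stack prepares can be observed from an ordinary program. The following is a minimal sketch, assuming Linux with glibc (pthread_getattr_np is a GNU extension); struct stack_report, inspect_self and inspect_thread_stack are hypothetical names introduced only for this illustration. It requests a stack size and a guard size through the attribute object, then lets the new thread query its own attributes and check that one of its local variables lies inside the reported block.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

struct stack_report {
    size_t stack_size;   /* size reported by pthread_attr_getstack */
    size_t guard_size;   /* size reported by pthread_attr_getguardsize */
    int local_in_stack;  /* 1 if a local variable lies inside the block */
};

/* Thread body: query the attributes of the running thread. */
static void *inspect_self(void *arg)
{
    struct stack_report *rep = arg;
    pthread_attr_t attr;
    void *base = NULL;
    int local = 0;

    rep->stack_size = 0;
    rep->guard_size = 0;
    rep->local_in_stack = 0;

    if (pthread_getattr_np(pthread_self(), &attr) != 0)
        return NULL;

    if (pthread_attr_getstack(&attr, &base, &rep->stack_size) == 0 &&
        pthread_attr_getguardsize(&attr, &rep->guard_size) == 0) {
        /* base is the lowest address of the reported stack area */
        uintptr_t lo = (uintptr_t)base;
        uintptr_t p = (uintptr_t)&local;
        rep->local_in_stack = (p >= lo && p < lo + rep->stack_size);
    }

    pthread_attr_destroy(&attr);
    return NULL;
}

/* Create a thread with the requested stack size and guard size (given in
   pages) and fill *rep from inside that thread. Returns 0 on success. */
static int inspect_thread_stack(size_t stack_size, size_t guard_pages,
                                struct stack_report *rep)
{
    assert(rep != NULL);

    pthread_attr_t attr;
    pthread_t tid;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    if (pthread_attr_init(&attr) != 0)
        return -1;
    if (pthread_attr_setstacksize(&attr, stack_size) != 0 ||
        pthread_attr_setguardsize(&attr, guard_pages * page) != 0) {
        pthread_attr_destroy(&attr);
        return -1;
    }

    int rc = pthread_create(&tid, &attr, inspect_self, rep);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return -1;

    return pthread_join(tid, NULL) == 0 ? 0 : -1;
}
```

For example, inspect_thread_stack(1 << 20, 2, &rep) asks for a 1 MiB stack with a two-page guard area. The exact stack_size reported can vary between glibc versions, so the sketch only relies on the local variable falling inside the reported range. Compile with -pthread.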