Chapter 27 Interlude: The Threading API

An introduction to the thread API and how to use it.

1. Thread creation

#include <pthread.h>

int pthread_create(
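    pthread_t *thread,              // handle used to interact with the thread; passed in to be initialized
    const pthread_attr_t *attr,     // thread attributes (stack size, scheduling priority, ...); may be NULL
    void *(*start_routine)(void *), // function the thread starts running in
    void *arg                       // argument passed to that function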
);

2. Thread completion

You might notice that the use of pthread_create() to create a thread, followed by an immediate call to pthread_join(), is a pretty strange way to create a thread. In fact, there is an easier way to accomplish this exact task; it’s called a procedure call. Clearly, we’ll usually be creating more than just one thread and waiting for it to complete, otherwise there is not much purpose to using threads at all.

#include <stdio.h>
#include <pthread.h>
#include <assert.h>
#include <stdlib.h>
#include "common.h"
#include "common_threads.h"

typedef struct myarg_t
{
  int a;
  int b;
} myarg_t;

typedef struct myret_t
{
  int x;
  int y;
} myret_t;

void *mythread(void *arg)
{
  myarg_t *m = (myarg_t *)arg;
  printf("%d %d\n", m->a, m->b);
  myret_t *r = Malloc(sizeof(myret_t));
  r->x = 1;
  r->y = 2;
  return (void *)r;
}

int main(int argc, char *argv[])
{
  pthread_t p;
  myret_t *m;
  myarg_t args;
  args.a = 10;
  args.b = 20;
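  Pthread_create(&p, NULL, mythread, (void *)&args);
  Pthread_join(p, (void **)&m); // m now points to the heap-allocated myret_t
  printf("%d %d\n", m->x, m->y);
  free(m); // the joining thread owns the result the worker returned
  return 0;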
}

We should note that not all code that is multi-threaded uses the join routine. For example, a multi-threaded web server might create a number of worker threads, and then use the main thread to accept requests and pass them to the workers, indefinitely. Such long-lived programs thus may not need to join.

However, a parallel program that creates threads to execute a particular task (in parallel) will likely use join to make sure all such work completes before exiting or moving onto the next stage of computation.

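3. Locks

Probably the most useful set of functions the POSIX threads library provides: locks.

These functions provide mutual exclusion to a critical section.

The most basic pair: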

int pthread_mutex_lock(pthread_mutex_t *mutex);
int pthread_mutex_unlock(pthread_mutex_t *mutex);

3.1. When a region of code is a critical section

pthread_mutex_t lock;
pthread_mutex_lock(&lock);

x = x + 1; // or whatever your critical section is

pthread_mutex_unlock(&lock);

The intent of the code is as follows:

If no other thread holds the lock when pthread_mutex_lock() is called, the thread will acquire the lock and enter the critical section.
If another thread does indeed hold the lock, the thread trying to grab the lock will not return from the call until it has acquired the lock (implying that the thread holding the lock has released it via the unlock call).
Of course, many threads may be stuck waiting inside the lock acquisition function at a given time; only the thread with the lock acquired, however, should call unlock.

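Unfortunately, this code is broken in two important ways:

Lack of proper initialization
1. Initialize statically with PTHREAD_MUTEX_INITIALIZER
2. Or call pthread_mutex_init(&lock, NULL); at run time
The lock and unlock calls do not check error codes
1. Wrap them in helper functions...
2. #define Pthread_mutex_lock(m) assert(pthread_mutex_lock(m) == 0);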

There are other lock-related routines as well...

3.2. Other ways to acquire a lock

int pthread_mutex_trylock(pthread_mutex_t *mutex);
int pthread_mutex_timedlock(pthread_mutex_t *mutex, const struct timespec *abs_timeout);

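If the lock is already held:

trylock fails and returns immediately
timedlock returns after the timeout expires or after it acquires the lock
- whichever happens first

Both are useful for avoiding getting stuck in a lock acquisition, which matters when we discuss deadlock later...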

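4. Condition variables

Condition variables are useful when some kind of signaling must take place between threads, that is, when one thread is waiting for another to do something before it can continue.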

int pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex);
int pthread_cond_signal(pthread_cond_t *cond);

4.1. Usage

A thread should hold the lock associated with the condition variable when it calls either of these routines...

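pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

// Waiting thread: sleep until ready becomes nonzero
Pthread_mutex_lock(&lock);

while (ready == 0)
    Pthread_cond_wait(&cond, &lock); // releases lock while asleep, reacquires it before returning

Pthread_mutex_unlock(&lock);

// Signaling thread: update the state, then wake the waiter
Pthread_mutex_lock(&lock);

ready = 1;

Pthread_cond_signal(&cond);
Pthread_mutex_unlock(&lock);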

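Don't be tempted to simplify it to this:

// Waiting thread
while (ready == 0)
; // spin

// Signaling thread
ready = 1;

Spinning burns CPU cycles while waiting, so it performs worse...

It is also error-prone: ready is now read and written without any synchronization, which is a race condition...

Also, don't replace the while loop with if (ready == 0). A waiter can wake up without the condition actually holding (spurious wakeups are allowed), so it must re-check ready after every wakeup; skipping the re-check leads to unpredictable behavior...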

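5. Compiling and running

To use the thread API:

Include the header pthread.h in your code.
Link with the pthread library by adding the -lpthread flag.

Whether the program actually works is an entirely different matter...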

5.1. ASIDE: THREAD API GUIDELINES

There are a number of small but important things to remember when you use the POSIX thread library (or really, any thread library) to build a multi-threaded program. They are:

Keep it simple. Above all else, any code to lock or signal between threads should be as simple as possible. Tricky thread interactions lead to bugs.
Minimize thread interactions. Try to keep the number of ways in which threads interact to a minimum. Each interaction should be carefully thought out and constructed with tried and true approaches (many of which we will learn about in the coming chapters).
Initialize locks and condition variables. Failure to do so will lead to code that sometimes works and sometimes fails in very strange ways.
Check your return codes. Of course, in any C and UNIX programming you do, you should be checking each and every return code, and it’s true here as well. Failure to do so will lead to bizarre and hard to understand behavior, making you likely to (a) scream, (b) pull some of your hair out, or (c) both.
Be careful with how you pass arguments to, and return values from, threads. In particular, any time you are passing a reference to a variable allocated on the stack, you are probably doing something wrong.
Each thread has its own stack. As related to the point above, please remember that each thread has its own stack. Thus, if you have a locally-allocated variable inside of some function a thread is executing, it is essentially private to that thread; no other thread can (easily) access it. To share data between threads, the values must be in the heap or otherwise some locale that is globally accessible.
Always use condition variables to signal between threads. While it is often tempting to use a simple flag, don’t do it.
Use the manual pages. On Linux, in particular, the pthread man pages are highly informative and discuss many of the nuances presented here, often in even more detail. Read them carefully!

6. Homework (Code)

Once again, the Chinese edition cuts this part...

In this section, we’ll write some simple multi-threaded programs and use a specific tool, called helgrind, to find problems in these programs. Read the README in the homework download for details on how to build the programs and run helgrind.

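6.1. The C programs to look at:

main-race.c: a simple race condition
main-deadlock.c: a simple deadlock
main-deadlock-global.c: a solution to the deadlock problem
main-signal.c: a simple child/parent signaling example
main-signal-cv.c: more efficient signaling via condition variables
common_threads.h: header file with wrappers that make the code check errors and be more readable

6.2. Q1 & Q2

The code: a shared variable accessed by two threads.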

root@LAPTOP-GT06V0GS:/mnt/d/CSLab/osTEP/chapter27/hwk_code# valgrind --tool=helgrind ./main-race
==2065== Helgrind, a thread error detector
==2065== Copyright (C) 2007-2017, and GNU GPL'd, by OpenWorks LLP et al.
==2065== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==2065== Command: ./main-race
==2065==
==2065== ---Thread-Announcement------------------------------------------
==2065==
==2065== Thread #1 is the program's root thread
==2065==
==2065== ---Thread-Announcement------------------------------------------
==2065==
==2065== Thread #2 was created
==2065==    at 0x49A1A23: clone (clone.S:76)
==2065==    by 0x49A1BA2: __clone_internal_fallback (clone-internal.c:64)
==2065==    by 0x49A1BA2: __clone_internal (clone-internal.c:109)
==2065==    by 0x491454F: create_thread (pthread_create.c:297)
==2065==    by 0x49151A4: pthread_create@@GLIBC_2.34 (pthread_create.c:836)
==2065==    by 0x4854975: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==2065==    by 0x10926B: main (in /mnt/d/CSLab/osTEP/chapter27/hwk_code/main-race)
==2065==
==2065== ----------------------------------------------------------------
==2065==
==2065==  Lock at 0x10C060 was first observed
==2065==    at 0x48512DC: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==2065==    by 0x109207: worker (in /mnt/d/CSLab/osTEP/chapter27/hwk_code/main-race)
==2065==    by 0x4854B7A: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==2065==    by 0x4914AA3: start_thread (pthread_create.c:447)
==2065==    by 0x49A1A33: clone (clone.S:100)
==2065==  Address 0x10c060 is 0 bytes inside data symbol "mutex"
==2065==
==2065== Possible data race during read of size 4 at 0x10C040 by thread #1
==2065== Locks held: none
==2065==    at 0x109298: main (in /mnt/d/CSLab/osTEP/chapter27/hwk_code/main-race)
==2065==
==2065== This conflicts with a previous write of size 4 by thread #2
==2065== Locks held: 1, at address 0x10C060
==2065==    at 0x109211: worker (in /mnt/d/CSLab/osTEP/chapter27/hwk_code/main-race)
==2065==    by 0x4854B7A: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==2065==    by 0x4914AA3: start_thread (pthread_create.c:447)
==2065==    by 0x49A1A33: clone (clone.S:100)
==2065==  Address 0x10c040 is 0 bytes inside data symbol "balance"
==2065==
==2065== ----------------------------------------------------------------
==2065==
==2065==  Lock at 0x10C060 was first observed
==2065==    at 0x48512DC: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==2065==    by 0x109207: worker (in /mnt/d/CSLab/osTEP/chapter27/hwk_code/main-race)
==2065==    by 0x4854B7A: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==2065==    by 0x4914AA3: start_thread (pthread_create.c:447)
==2065==    by 0x49A1A33: clone (clone.S:100)
==2065==  Address 0x10c060 is 0 bytes inside data symbol "mutex"
==2065==
==2065== Possible data race during write of size 4 at 0x10C040 by thread #1
==2065== Locks held: none
==2065==    at 0x1092A1: main (in /mnt/d/CSLab/osTEP/chapter27/hwk_code/main-race)
==2065==
==2065== This conflicts with a previous write of size 4 by thread #2
==2065== Locks held: 1, at address 0x10C060
==2065==    at 0x109211: worker (in /mnt/d/CSLab/osTEP/chapter27/hwk_code/main-race)
==2065==    by 0x4854B7A: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==2065==    by 0x4914AA3: start_thread (pthread_create.c:447)
==2065==    by 0x49A1A33: clone (clone.S:100)
==2065==  Address 0x10c040 is 0 bytes inside data symbol "balance"
==2065==
==2065==
==2065== Use --history-level=approx or =none to gain increased speed, at
==2065== the cost of reduced accuracy of conflicting-access information
==2065== For lists of detected and suppressed errors, rerun with: -s
==2065== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)

In this output, the lines

==2065== Possible data race during read of size 4 at 0x10C040 by thread #1
==2065== Locks held: none
==2065==    at 0x109298: main (in /mnt/d/CSLab/osTEP/chapter27/hwk_code/main-race)

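indicate a data race: the main thread reads balance without holding any lock...

After removing the unprotected access to the shared variable, or after protecting every access with the lock: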

int balance = 0;
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

pthread_mutex_lock(&mutex);
balance++; // protected access
pthread_mutex_unlock(&mutex);

helgrind no longer reports any errors:

root@LAPTOP-GT06V0GS:/mnt/d/CSLab/osTEP/chapter27/hwk_code# valgrind --tool=helgrind ./main-race
==2058== Helgrind, a thread error detector
==2058== Copyright (C) 2007-2017, and GNU GPL'd, by OpenWorks LLP et al.
==2058== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==2058== Command: ./main-race
==2058==
==2058==
==2058== Use --history-level=approx or =none to gain increased speed, at
==2058== the cost of reduced accuracy of conflicting-access information
==2058== For lists of detected and suppressed errors, rerun with: -s
==2058== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 7 from 7)

6.3. Q3 & Q4 & Q5

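Helgrind identifies potential deadlocks by tracking the order in which locks are acquired. It records the acquisition order for each lock and reports any acquisition that violates a previously observed order.

A typical deadlock scenario:

Thread A and thread B acquire two locks in opposite orders, which leads to a circular wait:
- Thread A holds lock m1 and waits for lock m2.
- Thread B holds lock m2 and waits for lock m1.

The global lock...

It does prevent the deadlock, but helgrind still reports an error!

The global lock g guarantees that a thread acquires g before it acquires any other lock, but it does not force threads to acquire the other locks in a consistent order. Under g, the two threads still take m1 and m2 in opposite orders, and helgrind checks only the acquisition order, so it keeps reporting the violation even though holding g means the two threads can never be in the middle of acquiring m1 and m2 at the same time.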

6.4. Q6 & Q7 & Q8

The helgrind report:

valgrind --tool=helgrind ./main-signal

...
Possible data race during read of size 4 at 0x10C014 by thread #1
...

With a condition variable:

root@LAPTOP-GT06V0GS:/mnt/d/CSLab/osTEP/chapter27/hwk_code# valgrind --tool=helgrind ./main-signal-cv
==2155== Helgrind, a thread error detector
==2155== Copyright (C) 2007-2017, and GNU GPL'd, by OpenWorks LLP et al.
==2155== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==2155== Command: ./main-signal-cv
==2155==
this should print first
this should print last
==2155==
==2155== Use --history-level=approx or =none to gain increased speed, at
==2155== the cost of reduced accuracy of conflicting-access information
==2155== For lists of detected and suppressed errors, rerun with: -s
==2155== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 7 from 7)
