You might notice that the use of pthread_create() to create a thread, followed by an immediate call to pthread_join(), is a pretty strange way to create a thread. In fact, there is an easier way to accomplish this exact task; it's called a procedure call. Clearly, we'll usually be creating more than just one thread and waiting for it to complete; otherwise, there is not much purpose to using threads at all.
We should note that not all code that is multi-threaded uses the join routine. For example, a multi-threaded web server might create a number of worker threads, and then use the main thread to accept requests and pass them to the workers, indefinitely. Such long-lived programs thus may not need to join.
However, a parallel program that creates threads to execute a particular task (in parallel) will likely use join to make sure all such work completes before exiting or moving on to the next stage of computation.
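3. Locks
The most useful set of functions the POSIX threads library provides: locks,
the functions that provide mutual exclusion for entry into a critical section.
The most basic pair: pthread_mutex_lock() and pthread_mutex_unlock().
3.1. When a piece of code is a critical section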
The intent of the code is as follows:
if no other thread holds the lock when pthread_mutex_lock() is called, the thread will acquire the lock and enter the critical section.
If another thread does indeed hold the lock, the thread trying to grab the lock will not return from the call until it has acquired the lock (implying that the thread holding the lock has released it via the unlock call).
Of course, many threads may be stuck waiting inside the lock acquisition function at a given time; only the thread with the lock acquired, however, should call unlock.
There are a number of small but important things to remember when you use the POSIX thread library (or really, any thread library) to build a multi-threaded program. They are:
Keep it simple. Above all else, any code to lock or signal between threads should be as simple as possible. Tricky thread interactions lead to bugs.
Minimize thread interactions. Try to keep the number of ways in which threads interact to a minimum. Each interaction should be carefully thought out and constructed with tried and true approaches (many of which we will learn about in the coming chapters).
Initialize locks and condition variables. Failure to do so will lead to code that sometimes works and sometimes fails in very strange ways.
Check your return codes. Of course, in any C and UNIX programming you do, you should be checking each and every return code, and it's true here as well. Failure to do so will lead to bizarre and hard-to-understand behavior, making you likely to (a) scream, (b) pull some of your hair out, or (c) both.
Be careful with how you pass arguments to, and return values from, threads. In particular, any time you are passing a reference to a variable allocated on the stack, you are probably doing something wrong.
Each thread has its own stack. As related to the point above, please remember that each thread has its own stack. Thus, if you have a locally-allocated variable inside of some function a thread is executing, it is essentially private to that thread; no other thread can (easily) access it. To share data between threads, the values must be in the heap or otherwise some locale that is globally accessible.
Always use condition variables to signal between threads. While it is often tempting to use a simple flag, don’t do it.
Use the manual pages. On Linux, in particular, the pthread man pages are highly informative and discuss many of the nuances presented here, often in even more detail. Read them carefully!
6. Homework (Code)
Once again cut from the Chinese edition...
In this section, we’ll write some simple multi-threaded programs and use a specific tool, called helgrind, to find problems in these programs. Read the README in the homework download for details on how to build the programs and run helgrind.
root@LAPTOP-GT06V0GS:/mnt/d/CSLab/osTEP/chapter27/hwk_code# valgrind --tool=helgrind ./main-race
==2065== Helgrind, a thread error detector
==2065== Copyright (C) 2007-2017, and GNU GPL'd, by OpenWorks LLP et al.
==2065== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==2065== Command: ./main-race
==2065==
==2065== ---Thread-Announcement------------------------------------------
==2065==
==2065== Thread #1 is the program's root thread
==2065==
==2065== ---Thread-Announcement------------------------------------------
==2065==
==2065== Thread #2 was created
==2065== at 0x49A1A23: clone (clone.S:76)
==2065== by 0x49A1BA2: __clone_internal_fallback (clone-internal.c:64)
==2065== by 0x49A1BA2: __clone_internal (clone-internal.c:109)
==2065== by 0x491454F: create_thread (pthread_create.c:297)
==2065== by 0x49151A4: pthread_create@@GLIBC_2.34 (pthread_create.c:836)
==2065== by 0x4854975: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==2065== by 0x10926B: main (in /mnt/d/CSLab/osTEP/chapter27/hwk_code/main-race)
==2065==
==2065== ----------------------------------------------------------------
==2065==
==2065== Lock at 0x10C060 was first observed
==2065== at 0x48512DC: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==2065== by 0x109207: worker (in /mnt/d/CSLab/osTEP/chapter27/hwk_code/main-race)
==2065== by 0x4854B7A: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==2065== by 0x4914AA3: start_thread (pthread_create.c:447)
==2065== by 0x49A1A33: clone (clone.S:100)
==2065== Address 0x10c060 is 0 bytes inside data symbol "mutex"
==2065==
==2065== Possible data race during read of size 4 at 0x10C040 by thread #1
==2065== Locks held: none
==2065== at 0x109298: main (in /mnt/d/CSLab/osTEP/chapter27/hwk_code/main-race)
==2065==
==2065== This conflicts with a previous write of size 4 by thread #2
==2065== Locks held: 1, at address 0x10C060
==2065== at 0x109211: worker (in /mnt/d/CSLab/osTEP/chapter27/hwk_code/main-race)
==2065== by 0x4854B7A: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==2065== by 0x4914AA3: start_thread (pthread_create.c:447)
==2065== by 0x49A1A33: clone (clone.S:100)
==2065== Address 0x10c040 is 0 bytes inside data symbol "balance"
==2065==
==2065== ----------------------------------------------------------------
==2065==
==2065== Lock at 0x10C060 was first observed
==2065== at 0x48512DC: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==2065== by 0x109207: worker (in /mnt/d/CSLab/osTEP/chapter27/hwk_code/main-race)
==2065== by 0x4854B7A: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==2065== by 0x4914AA3: start_thread (pthread_create.c:447)
==2065== by 0x49A1A33: clone (clone.S:100)
==2065== Address 0x10c060 is 0 bytes inside data symbol "mutex"
==2065==
==2065== Possible data race during write of size 4 at 0x10C040 by thread #1
==2065== Locks held: none
==2065== at 0x1092A1: main (in /mnt/d/CSLab/osTEP/chapter27/hwk_code/main-race)
==2065==
==2065== This conflicts with a previous write of size 4 by thread #2
==2065== Locks held: 1, at address 0x10C060
==2065== at 0x109211: worker (in /mnt/d/CSLab/osTEP/chapter27/hwk_code/main-race)
==2065== by 0x4854B7A: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==2065== by 0x4914AA3: start_thread (pthread_create.c:447)
==2065== by 0x49A1A33: clone (clone.S:100)
==2065== Address 0x10c040 is 0 bytes inside data symbol "balance"
==2065==
==2065==
==2065== Use --history-level=approx or =none to gain increased speed, at
==2065== the cost of reduced accuracy of conflicting-access information
==2065== For lists of detected and suppressed errors, rerun with: -s
==2065== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
root@LAPTOP-GT06V0GS:/mnt/d/CSLab/osTEP/chapter27/hwk_code# valgrind --tool=helgrind ./main-race
==2058== Helgrind, a thread error detector
==2058== Copyright (C) 2007-2017, and GNU GPL'd, by OpenWorks LLP et al.
==2058== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==2058== Command: ./main-race
==2058==
==2058==
==2058== Use --history-level=approx or =none to gain increased speed, at
==2058== the cost of reduced accuracy of conflicting-access information
==2058== For lists of detected and suppressed errors, rerun with: -s
==2058== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 7 from 7)
valgrind --tool=helgrind ./main-signal
...
Possible data race during read of size 4 at 0x10C014 by thread #1
...
root@LAPTOP-GT06V0GS:/mnt/d/CSLab/osTEP/chapter27/hwk_code# valgrind --tool=helgrind ./main-signal-cv
==2155== Helgrind, a thread error detector
==2155== Copyright (C) 2007-2017, and GNU GPL'd, by OpenWorks LLP et al.
==2155== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==2155== Command: ./main-signal-cv
==2155==
this should print first
this should print last
==2155==
==2155== Use --history-level=approx or =none to gain increased speed, at
==2155== the cost of reduced accuracy of conflicting-access information
==2155== For lists of detected and suppressed errors, rerun with: -s
==2155== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 7 from 7)