Chapter 19 Paging: Fast Address Translation (TLB)

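Paging can carry a high performance overhead...

As the paging logic grows more complex, it becomes unacceptably slow... so we call in an old friend of the operating system: hardware.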

Translation-lookaside buffer (TLB)

A hardware cache of frequently used virtual-to-physical address translations.

Also called an address-translation cache.

The TLB brings a huge performance improvement.

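1. TLB Basic Algorithm

Assume a simple linear page table (the page table is just an array...) and a hardware-managed TLB, where the hardware takes on much of the responsibility for page table accesses...

1.1. TLB Control Flow Algorithm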

The TLB, like all caches, is built on the premise that in the common case, translations are found in the cache (i.e., are hits). If so, little overhead is added, as the TLB is found near the processing core and is designed to be quite fast. When a miss occurs, the high cost of paging is incurred; the page table must be accessed to find the translation, and an extra memory reference (or more, with more complex page tables) results. If this happens often, the program will likely run noticeably more slowly; memory accesses, relative to most CPU instructions, are quite costly, and TLB misses lead to more memory accesses. Thus, it is our hope to avoid TLB misses as much as we can.
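The hit and miss paths described above can be sketched roughly as follows. This is a toy Python model, not real hardware: the 16-byte page size, the dict used as the TLB, the contents of `page_table`, and the function name `translate` are all assumptions made for illustration, and protection checks are left out.

```python
def translate(vaddr, tlb, page_table, page_size=16):
    """Translate vaddr; return (physical address, True on a TLB hit)."""
    vpn, offset = divmod(vaddr, page_size)
    if vpn in tlb:                          # TLB hit: no page-table access needed
        return tlb[vpn] * page_size + offset, True
    # TLB miss: one extra memory reference to read the page-table entry.
    pfn = page_table[vpn] if vpn < len(page_table) else None
    if pfn is None:                         # invalid PTE: trap to the OS
        raise MemoryError(f"invalid access at virtual address {vaddr}")
    tlb[vpn] = pfn                          # install the translation in the TLB
    # Real hardware would now retry the instruction, which would hit.
    return pfn * page_size + offset, False


page_table = [None, 7, 3, None, 5]          # hypothetical: index = VPN, value = PFN
tlb = {}                                    # VPN -> PFN, starts empty
print(translate(20, tlb, page_table))       # first touch of VPN 1: miss
print(translate(24, tlb, page_table))       # same page again: hit
```

The second call finds VPN 1 already cached, so it never looks at `page_table`; that skipped lookup is the memory reference the TLB saves.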

In practice it is still something of a gamble: whether an access hits comes down to luck...

2. Example: Accessing an Array

2.1. Array Layout

We loop over the array and look at what happens...

Let us summarize TLB activity during our ten accesses to the array: miss, hit, hit, miss, hit, hit, hit, miss, hit, hit. Thus, our TLB hit rate, which is the number of hits divided by the total number of accesses, is 70%. Although this is not too high (indeed, we desire hit rates that approach 100%), it is non-zero, which may be a surprise. Even though this is the first time the program accesses the array, the TLB improves performance due to spatial locality. The elements of the array are packed tightly into pages (i.e., they are close to one another in space), and thus only the first access to an element on a page yields a TLB miss.

...and updates the TLB accordingly...

The TLB is updated promptly...
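To see where the miss/hit pattern comes from, here is a small simulation. The layout is an assumption (these notes skip the details): ten 4-byte integers starting at virtual address 100, with 16-byte pages, which happens to reproduce the quoted sequence. The function name `tlb_trace` is made up for this sketch, and the TLB is modeled as a set that never evicts.

```python
def tlb_trace(base, count, elem_size, page_size):
    """Return 'hit'/'miss' for each sequential array access, starting with an empty TLB."""
    cached_vpns = set()                  # VPNs whose translations are in the TLB
    trace = []
    for i in range(count):
        vpn = (base + i * elem_size) // page_size
        if vpn in cached_vpns:
            trace.append("hit")
        else:
            trace.append("miss")
            cached_vpns.add(vpn)         # the miss path updates the TLB
    return trace


# Assumed layout: ten 4-byte integers at virtual address 100, 16-byte pages.
trace = tlb_trace(base=100, count=10, elem_size=4, page_size=16)
print(trace)
print(f"hit rate: {trace.count('hit') / len(trace):.0%}")
```

Under these assumptions the array spans three pages, so only three of the ten accesses miss.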

2.2. Page Size Matters Too

If the page size were larger, the TLB hit rate in this example would go up!
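A quick way to check this claim is to count how many distinct pages one pass over the array touches, since each distinct page costs exactly one miss. As before, the array layout (ten 4-byte integers at virtual address 100) and the page sizes tried here are assumptions chosen for illustration, and `hit_rate` is a hypothetical helper.

```python
def hit_rate(base, count, elem_size, page_size):
    """TLB hit rate for one sequential pass over an array, starting with an empty TLB."""
    vpns = {(base + i * elem_size) // page_size for i in range(count)}
    misses = len(vpns)                   # one miss per distinct page touched
    return (count - misses) / count


for page_size in (16, 32, 256):
    rate = hit_rate(base=100, count=10, elem_size=4, page_size=page_size)
    print(f"{page_size:>3}-byte pages: hit rate {rate:.0%}")
```

With this layout the array covers three 16-byte pages, two 32-byte pages, or a single 256-byte page, so the number of misses drops as pages grow.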

TIP: USE CACHING WHEN POSSIBLE

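Use caching whenever possible!

The speed of light and other physical constraints come into play: a large cache is bound to be slow! We are stuck with small, fast caches, so the remaining question is how to use them well to improve performance...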

2.3. One Last Point

If the program accesses the array again shortly after this loop finishes, we would see an even better result...

In this case, the TLB hit rate would be high because of temporal locality, i.e., the quick re-referencing of memory items in time. Like any cache, TLBs rely upon both spatial and temporal locality for success, which are program properties. If the program of interest exhibits such locality (and many programs do), the TLB hit rate will likely be high.
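A rough sketch of temporal locality: run the same pass twice against one TLB that keeps its contents between passes. The array layout and page size are the same assumed values as before, `run_pass` is a made-up name, and the TLB is assumed large enough that nothing is evicted between the two passes.

```python
def run_pass(addresses, tlb, page_size=16):
    """Access each address once; return the number of TLB hits. `tlb` persists across calls."""
    hits = 0
    for addr in addresses:
        vpn = addr // page_size
        if vpn in tlb:
            hits += 1
        else:
            tlb.add(vpn)                 # miss: cache the translation for later
    return hits


addresses = [100 + 4 * i for i in range(10)]   # assumed array layout
tlb = set()                                    # assumed big enough to avoid evictions
first = run_pass(addresses, tlb)
second = run_pass(addresses, tlb)
print(f"first pass:  {first}/10 hits")
print(f"second pass: {second}/10 hits")
```

The second pass only hits because the translations from the first pass are still cached; a small TLB under pressure from other accesses in between would lose some of them.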

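3. Who Handles a TLB Miss?

Either hardware or software.

3.1. Hardware

Older architectures had complex instruction sets (sometimes called CISC, for complex-instruction set computers).

The people who built the hardware did not much trust the OS people...

So the hardware handles TLB misses entirely on its own, locating the page table through a page-table base register.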

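So on a miss the hardware walks the page table itself, updates the TLB, and retries the instruction; raising an exception to the OS is what the software-managed approach does...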

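3.2. Software-Managed

The main advantage is flexibility: the OS can use any data structure it likes for the page table. The hardware only has to raise an exception on a miss, and the OS takes care of the rest...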
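The division of labor can be sketched like this. It is only a model: a Python exception stands in for the hardware trap, and every name here (`TLBMiss`, `hw_lookup`, `os_miss_handler`, `access`) is hypothetical. The page table is deliberately a hash map rather than an array, to illustrate that the "hardware" side never needs to know its layout.

```python
class TLBMiss(Exception):
    """Stands in for the trap the hardware raises on a TLB miss."""
    def __init__(self, vpn):
        super().__init__(f"TLB miss on VPN {vpn}")
        self.vpn = vpn


def hw_lookup(vpn, tlb):
    """'Hardware': check the TLB and raise on a miss; it never touches the page table."""
    if vpn not in tlb:
        raise TLBMiss(vpn)
    return tlb[vpn]


def os_miss_handler(vpn, tlb, page_table):
    """'OS': consult its own page-table structure and install the translation."""
    tlb[vpn] = page_table[vpn]           # a KeyError here would mean an invalid access


def access(vpn, tlb, page_table):
    """Run one access, retrying after the OS handles a miss."""
    try:
        return hw_lookup(vpn, tlb)
    except TLBMiss as trap:
        os_miss_handler(trap.vpn, tlb, page_table)
        return hw_lookup(vpn, tlb)       # retry the faulting access


page_table = {6: 42, 7: 13}              # hypothetical: the OS chose a hash map, not an array
tlb = {}
print(access(6, tlb, page_table))
```

Swapping the dict for any other structure only changes `os_miss_handler`, which is the flexibility argument in miniature.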

RISC and CISC

In the early days, RISC chips made a huge impact, as they were noticeably faster [BC91]; many papers were written; a few companies were formed (e.g., MIPS and Sun). However, as time progressed, CISC manufacturers such as Intel incorporated many RISC techniques into the core of their processors, for example by adding early pipeline stages that transformed complex instructions into micro-instructions which could then be processed in a RISC-like manner. These innovations, plus a growing number of transistors on each chip, allowed CISC to remain competitive. The end result is that the debate died down, and today both types of processors can be made to run fast.

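4. TLB Contents

Fully associative: a given translation can sit in any slot of the TLB, and the hardware searches all entries in parallel to find it.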

A TLB entry might look like this: VPN | PFN | other bits
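As a rough illustration, an entry and a fully associative lookup could be modeled like this. The field set is an assumption: only a valid bit is shown among the "other bits", and real entries carry more (protection bits and so on). The names `TLBEntry` and `lookup` are made up, and the loop is sequential where real hardware compares all entries at once.

```python
from dataclasses import dataclass


@dataclass
class TLBEntry:
    vpn: int = 0
    pfn: int = 0
    valid: bool = False      # one of the "other bits"; the rest are omitted here


def lookup(tlb, vpn):
    """Fully associative lookup: any slot may hold any VPN, so check them all."""
    for entry in tlb:
        if entry.valid and entry.vpn == vpn:
            return entry.pfn
    return None              # TLB miss


tlb = [TLBEntry() for _ in range(4)]             # 4 slots, all invalid to start
tlb[2] = TLBEntry(vpn=10, pfn=100, valid=True)   # hypothetical translation
print(lookup(tlb, 10))
print(lookup(tlb, 0))
```

The lookup for VPN 0 misses even though the empty slots hold `vpn=0`, because their valid bit is clear.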

4.1. TLB Valid Bit != Page Table Valid Bit

A common mistake is to confuse the valid bits found in a TLB with those found in a page table. In a page table, when a page-table entry (PTE) is marked invalid, it means that the page has not been allocated by the process, and should not be accessed by a correctly-working program. The usual response when an invalid page is accessed is to trap to the OS, which will respond by killing the process.

A TLB valid bit, in contrast, simply refers to whether a TLB entry has a valid translation within it. When a system boots, for example, a common initial state for each TLB entry is to be set to invalid, because no address translations are yet cached there. Once virtual memory is enabled, and once programs start running and accessing their virtual address spaces, the TLB is slowly populated, and thus valid entries soon fill the TLB.

The TLB valid bit is quite useful when performing a context switch too, as we’ll discuss further below. By setting all TLB entries to invalid, the system can ensure that the about-to-be-run process does not accidentally use a virtual-to-physical translation from a previous process.

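5. Handling the TLB on a Context Switch

The hardware often cannot tell which entry belongs to which process.

5.1. Flush the TLB Before Running

On every switch to a new process, flush the TLB. The flush sets every valid bit to 0, which effectively empties the TLB.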
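A minimal sketch of flush-on-switch, assuming a TLB modeled as a list of dicts with `vpn`, `pfn`, and `valid` keys (the representation, the values, and the names `flush` and `lookup` are all illustrative):

```python
def flush(tlb):
    """Invalidate every entry; stale VPN/PFN values may remain but are ignored."""
    for entry in tlb:
        entry["valid"] = False


def lookup(tlb, vpn):
    """Return the PFN for vpn, or None on a miss. Invalid entries never match."""
    for entry in tlb:
        if entry["valid"] and entry["vpn"] == vpn:
            return entry["pfn"]
    return None


# Process 1 left this translation behind (hypothetical values).
tlb = [{"vpn": 10, "pfn": 100, "valid": True}]
print(lookup(tlb, 10))           # before the switch: process 1 hits
flush(tlb)                       # context switch to process 2
print(lookup(tlb, 10))           # after the flush: a miss, not process 1's mapping
```

The cost of this approach is that every process starts with a cold TLB after each switch and has to take misses again on the pages it touches.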

6. A Real TLB Entry

RAM isn’t always RAM

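7. Homework (Measurement)

This one is fairly involved and rather low-level, so I am skipping it here... interested readers can implement it themselves...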
