

Disk caching has come a long way: from a complete novelty, to a necessary performance-improving technique, and eventually to something we take for granted and never really think about unless it breaks. Along the way it has evolved, hit a few dead ends, and finally converged on the solutions we know today. If you are as interested as I am in the journey disk caching has made on IBM PC compatible computers, read on to learn more about it.

In the early days, disk caching was an unknown concept. Early computers had very little memory available, so operating systems and applications strove to read data from tapes and disks directly into the target buffer and to purge it from memory as soon as it was no longer useful. Keeping a copy of data in memory just in case was considered a waste of resources.
As a result, applications were written to be extremely careful about how often they used external storage. For the most part, all read and write operations had to be initiated by the user. Usually, a program would be loaded into memory as a whole in response to the user's command and would never touch the tape or disk again unless told to retrieve or store a data file. This made external storage accesses very predictable, and the performance hit caused by slow tape or disk access was negligible.
Nevertheless, there was a form of caching in use back in the day. Some applications were so large that they could not fit in memory all at once. To solve the problem, only the core of such a program would be loaded on startup, and the more extensive but rarely used functions were then loaded on demand from storage, as overlays. In the worst case, an overlay would need to be loaded from disk every time the user accessed the function. However, if the same function was invoked repeatedly, the overlay manager could avoid the overhead of loading the same code all over again and reuse what had been loaded previously. While not exactly caching, this technique prevented redundant accesses to storage media.
While the first IBM PCs were not known for their memory capacity, falling prices of memory chips and storage units quickly made a PC with 640 KiB of RAM and a hard drive a realistic proposition. On the one hand, this accelerated the development of larger and more sophisticated applications. On the other, it let users operate on larger data sets, sometimes significantly larger than their computer's random access memory. With the advent of dBase, users were able to accumulate data in variable-length records, perform extensive analysis of that data, and reuse it in their own applications.
With the now much larger memory capacity, however, caching data in memory became a viable option. It was even embraced by MS-DOS in the form of disk buffers, set up in CONFIG.SYS with the BUFFERS statement. To be fair, the buffers were primarily responsible for making sure that all data was aligned on a paragraph boundary and always accessed with sector-size granularity. However, the operating system kept track of the sectors held in the buffers, and if an application requested the same sector twice, the data would be served directly from the buffers, with no disk access needed. This provided a slight but welcome performance boost.
The problem with the buffers was that there were always too few of them. It was common to configure a hard-drive-equipped system with 10 to 20 buffers, which in turn let the system cache 10 to 20 sectors. What was worse, increasing the number of buffers didn't bring a sizeable performance boost, as many reads were performed directly into the application's private memory and never touched the buffers at all. Thus, very few users ever bothered to bump the number of buffers to 30 or more, as it meant giving up over 15 KiB of conventional memory for no real benefit.
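For illustration, the whole mechanism boiled down to a single line in CONFIG.SYS; the value below is just a typical choice of the era rather than a recommendation from any particular manual:
```
BUFFERS=20
```
Each buffer held one 512-byte sector plus a small bookkeeping header, which is how 30 of them ended up costing over 15 KiB of conventional memory.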
One solution to both the scarcity and the memory consumption of in-memory disk buffers was the caching controller. Dedicated on-board memory, managed by the controller itself, could store quite a few sectors: even 64 KiB was enough to cache 128 sectors, significantly improving performance without using a single byte of the computer's own memory.

A good controller could also replace the simple policy of caching the most recently used sectors with a mixed policy that set aside some memory for the most frequently used ones. This could massively improve the performance of commonly used commands and applications, as well as the overall performance of the system, since the filesystem metadata area stayed cached most of the time.
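To make the idea concrete, here is a minimal sketch in C of a replacement policy that blends recency with a per-sector hit count, so that sectors requested over and over, such as filesystem metadata, survive longer in the cache. It is purely illustrative; real controller firmware ran on its own processor and certainly looked nothing like this.
```c
/* Illustrative sector cache with a mixed recency/frequency replacement
 * policy. Not a reconstruction of any real controller firmware. */
#include <stdint.h>

#define SECTOR_SIZE 512
#define CACHE_SLOTS 128          /* 128 x 512 B = 64 KiB of cache RAM */

struct slot {
    uint32_t lba;                /* which sector is cached here */
    uint32_t last_used;          /* logical clock of the most recent hit */
    uint32_t hits;               /* how often the sector has been requested */
    int      valid;
    uint8_t  data[SECTOR_SIZE];
};

static struct slot cache[CACHE_SLOTS];
static uint32_t clock_now;

/* Lower score = better eviction candidate. Recency dominates, but every
 * hit "buys" the sector extra time in the cache; the weight of 16 ticks
 * per hit is arbitrary and only serves the illustration. */
static uint32_t score(const struct slot *s)
{
    return s->last_used + 16u * s->hits;
}

/* Return the slot holding 'lba', or claim the worst-scoring slot for it.
 * On a miss (*hit == 0) the caller must read the sector into slot->data. */
struct slot *cache_lookup(uint32_t lba, int *hit)
{
    struct slot *victim = &cache[0];
    clock_now++;
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].valid && cache[i].lba == lba) {
            cache[i].last_used = clock_now;
            cache[i].hits++;
            *hit = 1;
            return &cache[i];
        }
        if (!cache[i].valid) {            /* prefer empty slots as victims */
            victim = &cache[i];
        } else if (victim->valid && score(&cache[i]) < score(victim)) {
            victim = &cache[i];
        }
    }
    victim->lba = lba;
    victim->last_used = clock_now;
    victim->hits = 0;
    victim->valid = 1;
    *hit = 0;
    return victim;
}
```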
Caching controllers could also improve performance by exploiting data locality. When a read was issued, the controller could read the whole track, on the assumption that the operating system would most likely ask for the subsequent sectors too. Incidentally, this technique solved another problem as well: interleave. The computers of the time were often too slow to read and process sectors in their natural sequence, so disks were formatted in such a way that the computer would read one sector and then skip one or two subsequent sectors while it processed the data. This meant that reading a whole track required at least two or three rotations of the spindle. A good caching controller could use its own speed to prefetch the whole track into the cache, effectively eliminating the interleave. This could lead to a massive performance improvement even if no redundant reads were ever made.
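A quick back-of-the-envelope calculation shows what was at stake. The figures below assume a 3600 RPM drive with 17 sectors per track, typical of MFM drives of the period; with an interleave factor of N, logically consecutive sectors sit N physical positions apart, so reading the whole track takes roughly N revolutions.
```c
/* Rough cost of interleave: reading one full track takes about
 * 'interleave' revolutions of the platter. Assumes 3600 RPM. */
#include <stdio.h>

int main(void)
{
    const double rpm = 3600.0;
    const double ms_per_rev = 60000.0 / rpm;   /* about 16.7 ms per revolution */

    for (int interleave = 1; interleave <= 3; interleave++)
        printf("interleave %d:1 -> ~%.1f ms to read one 17-sector track\n",
               interleave, interleave * ms_per_rev);
    return 0;
}
```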
Up to this point, the only technique we have discussed is caching the data read from the media, in the hope that a sector will be requested more than once and the repeat read can be satisfied from the cache. All writes to the media are still performed synchronously, with no performance benefit. Such a scheme is called write-through caching, because the data is written straight to the disk, with a copy placed in the cache only in case someone tries to read the same sector right after writing it. However, caching can improve write performance as well. Instead of writing data directly to the disk, it can be stored in the cache, with the operating system told that the write has succeeded. The actual write may be delayed and carried out entirely by the controller. Of course, there must be a limit on how much data can accumulate in the cache, and with very large writes the software ends up stalled, waiting for the “dirty sectors” to be flushed to the disk. For the most part, though, writes are small enough to fit in the cache, and this scheme, known as write-back caching, brings enormous performance benefits.
The lazy writes of a write-back cache solve one more problem as well. Some applications and operating systems write small chunks of data, smaller than the sector size. In such cases, updating multiple chunks results in the same sector being read and written over and over, which dramatically reduces performance. With write-back caching enabled, all these reads and writes can be satisfied from the cache, and the modified sector is flushed to the media only once.
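The following sketch, again purely illustrative, shows both effects in one place: a write is acknowledged as soon as it lands in the cache and is marked dirty, repeated small updates to the same sector coalesce in memory, and a later flush writes each dirty sector out exactly once. The disk_write_sector() routine is a made-up stand-in for whatever actually talks to the hardware.
```c
/* Minimal write-back cache sketch with dirty flags; illustrative only. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SECTOR_SIZE 512
#define CACHE_SLOTS 64

struct cached_sector {
    uint32_t lba;
    int      valid;
    int      dirty;              /* modified in cache, not yet on disk */
    uint8_t  data[SECTOR_SIZE];
};

static struct cached_sector cache[CACHE_SLOTS];

/* Stand-in for the real hardware write path. */
static void disk_write_sector(uint32_t lba, const uint8_t *data)
{
    (void)data;
    printf("physical write of sector %u\n", (unsigned)lba);
}

static struct cached_sector *find_or_claim(uint32_t lba)
{
    struct cached_sector *slot = &cache[lba % CACHE_SLOTS];  /* direct-mapped */
    if (slot->valid && slot->lba != lba && slot->dirty)
        disk_write_sector(slot->lba, slot->data);            /* write back on eviction */
    if (!slot->valid || slot->lba != lba) {
        memset(slot->data, 0, SECTOR_SIZE);  /* real code would read the sector first */
        slot->lba = lba;
        slot->valid = 1;
        slot->dirty = 0;
    }
    return slot;
}

/* Update a few bytes inside a sector (offset + len must fit in the sector).
 * Returns immediately; no disk access takes place here. */
void cache_write(uint32_t lba, size_t offset, const void *buf, size_t len)
{
    struct cached_sector *slot = find_or_claim(lba);
    memcpy(slot->data + offset, buf, len);
    slot->dirty = 1;             /* remembered for a later, single flush */
}

/* Flush every dirty sector exactly once, e.g. on a timer or at shutdown. */
void cache_flush(void)
{
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (cache[i].valid && cache[i].dirty) {
            disk_write_sector(cache[i].lba, cache[i].data);
            cache[i].dirty = 0;
        }
}

int main(void)
{
    cache_write(100, 0,   "record A", 8);  /* three small updates ...      */
    cache_write(100, 64,  "record B", 8);  /* ... to the same sector ...   */
    cache_write(100, 128, "record C", 8);
    cache_flush();                         /* ... become one physical write */
    return 0;
}
```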
Write-back caching, as good as it is, has one major flaw. If the computer hangs or loses power before all the written data has been flushed to the media, that data can get lost or corrupted. What is worse, the user can be completely oblivious to the fact until they try to use the data again, because from their perspective the write completed successfully. One way to reduce the risk was to limit how long data could stay in the write cache before being flushed to the media. With a limit of one or two seconds, the chance of the user turning off the computer right after saving their files is fairly minimal. As for power loss, some caching controllers were equipped with static memory chips and a battery, so that they could complete the pending writes once the computer was powered on again.
For a short time, hardware caching disk controllers were considered must-have equipment for high-end computers, sometimes even making it into mid-range ones. However, they faded into obscurity before they ever went mainstream, for two reasons. First, the IDE standard removed the need for a separate disk controller board and moved the controller onto the disk itself. Second, CPUs and memory got faster with every iteration of the PC platform, their bandwidth reaching tens of megabytes per second, while the ISA bus stayed capped below 10 MiB/s. In extreme cases, replacing a software disk cache with a hardware caching controller could… reduce disk throughput.
On faster 80286- and 80386-based PCs with more than 640 KiB of memory, users could get the most out of their hard drive subsystem by installing a software disk caching solution such as SmartDrive. It could use extended memory, normally inaccessible to MS-DOS applications, so instead of taking up a single byte of precious conventional memory it put the otherwise idle memory above 1 MiB to work. A cache of 128 KiB or 256 KiB produced a significant improvement in disk performance, on par with low-end hardware caching controllers.
Such software caches worked by intercepting BIOS interrupt 13h, which provides storage services and handles floppy and hard drives. While this offered enough abstraction to cache all reads from hard disks, it also meant that SmartDrive and its kind were unable to cache reads from other storage devices, such as removable Bernoulli Box drives or CD-ROMs. But there was another, much more problematic issue with this approach. As hard drive capacities grew, BIOSes started providing geometry translation to reconcile the addressing limits of IDE, the BIOS and MS-DOS. As a result, sector-based software caches could be fooled by the translation layer and either corrupt data or issue suboptimal reads that reduced performance.
In the early MS-DOS days, using SmartDrive and its kind was relatively easy. All one had to do was include an additional DEVICE statement in CONFIG.SYS, define the size of the cache, and decide which kind of memory the cache should reside in. With time, however, more advanced MS-DOS applications appeared that made use of extended memory. Among them, Microsoft Windows was the primary issue: it was perfectly capable of running its applications in extended memory, and it was always short on memory. Even on machines with 2 MiB of RAM, users found themselves shuffling memory between Windows and SmartDrive, trading disk read/write performance against the amount of application swapping.
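From memory, a typical setup of that era looked roughly like the fragment below: HIMEM.SYS had to be loaded first to make extended memory available, and the two numbers gave the cache size and the minimum size it could shrink to when Windows claimed the memory for itself. The exact parameters varied between SmartDrive versions, so treat this as a sketch rather than documented syntax.
```
DEVICE=C:\DOS\HIMEM.SYS
DEVICE=C:\DOS\SMARTDRV.SYS 1024 512
```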
SmartDrive kept getting better and more mature. Starting with version 4.0, it switched to being block-based, which resolved most of its shortcomings. First, it became aware of devices other than hard drives, which coincided nicely with the rising popularity of CD-ROM drives. Second, by working at the level of MS-DOS's interrupt 21h, it no longer had to worry about geometry translation issues.
But the nicest thing about the new SmartDrive was its ability to do write-back caching. Saving data became a nearly instantaneous operation, with the hard drive indicator lighting up a few seconds later while the user got straight back to working on their document. This improvement also became one of the most talked-about features of SmartDrive, though in a negative way. Being implemented in software, it was extremely vulnerable to the computer hanging, rebooting or being switched off before the data had been written to disk. On standard filesystems this could lead to data being lost or corrupted, but on volumes compressed with tools such as Stacker or DriveSpace it could damage the compressed volume image and lock the user out of their data entirely.
This problem applied to every software caching solution with write-back capability. However, since SmartDrive was a standard component of MS-DOS and Windows, it was Microsoft alone that got blamed for exposing customers to potential data loss. In response, the company made significant improvements to SmartDrive. Even the initial version applied a time-out to “dirty” data, ensuring that every write would be flushed to disk fairly quickly. Subsequent versions improved on that by intercepting the Ctrl+Alt+Del shortcut and flushing all buffers before the computer rebooted. To further reduce the risk of data loss, the buffers were also flushed every time before the DOS prompt appeared on screen, so that the data was safe if the user turned the computer off at the prompt or ran a program that immediately hung the system. None of this eliminated the risk entirely, so for the truly cautious Microsoft provided an easy switch that disabled SmartDrive's write-back capability altogether.
Overall, the improved software disk caching solutions could make a real difference to perceived performance. By combining file-aware, block-based read caching, prefetching and write-back caching, SmartDrive could turn even a slow hard drive into an average performer, provided enough memory was installed in the system.
Software caches gave users a nice performance boost while keeping the impact on conventional memory minimal. However, they could cause problems on machines equipped with DMA-based disk controllers. As soon as paging was enabled, whether by Microsoft Windows or by EMM386 or a similar advanced memory manager, addresses pointing beyond 1 MiB were no longer directly mapped to physical memory. When exchanging data between the cache and the drive, SmartDrive would pass a virtual address to the controller's BIOS. The BIOS, however, knew nothing about the virtual memory translation set up by the memory manager and would program the DMA controller with that virtual address rather than the physical one. In most cases this meant that reading data from disk corrupted memory, and the machine would lock up soon after SmartDrive was loaded.
The problem rarely occurred without SmartDrive, as MS-DOS itself never placed application data in extended memory, and even with a memory manager running, the first megabyte of RAM was always mapped directly to physical memory.
SmartDrive was able to cope with the problem, though. Optional double-buffering could be enabled, which made sure that all reads and writes went through a buffer in conventional memory. There was a performance penalty, caused by the need to copy the data between two memory blocks on every read or write operation, which is why the feature was disabled by default.
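Conceptually, double-buffering amounts to nothing more than a bounce buffer and an extra copy. The sketch below simulates the idea in C; dma_read_sector() is a made-up stand-in for the BIOS routine that programs the DMA controller and can only safely be handed an identity-mapped address.
```c
/* Illustrative bounce-buffer ("double-buffering") read path. */
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512

/* Stand-in for a buffer in conventional memory, where the virtual address
 * equals the physical address the DMA controller sees. */
static uint8_t bounce_buffer[SECTOR_SIZE];

/* Stand-in for the BIOS service that programs the DMA controller. In the
 * real failure mode, handing it a paged (virtual) address makes the
 * transfer land in the wrong physical page. */
static int dma_read_sector(uint32_t lba, uint8_t *identity_mapped_buffer)
{
    memset(identity_mapped_buffer, (int)(lba & 0xff), SECTOR_SIZE); /* fake data */
    return 0;
}

/* Double-buffered read: let the DMA transfer land in the safe bounce buffer,
 * then copy the sector to the caller's buffer, which may live in paged
 * extended memory that the DMA controller cannot address correctly. The
 * extra memcpy() is exactly the performance penalty mentioned above. */
int buffered_read_sector(uint32_t lba, void *dest)
{
    if (dma_read_sector(lba, bounce_buffer) != 0)
        return -1;
    memcpy(dest, bounce_buffer, SECTOR_SIZE);
    return 0;
}
```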
Modern 32- and 64-bit operating systems are all built around paging. In many cases there may not even be a concept of directly reading or writing data to disk. Most operations can be translated into memory-mapped files, a construct that represents a file as a contiguous area of virtual memory, assigns a page fault handler that fetches a page from the file whenever it is not present in physical memory, and makes sure all dirty pages (the ones that have been modified) eventually make it to the disk. This is how executable files are run, and in some operating systems it is also how ordinary reads and writes are implemented.
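Here is a minimal POSIX illustration of the idea, assuming a file named example.dat exists; on Windows the equivalent goes through CreateFileMapping and MapViewOfFile.
```c
/* Map a file into memory and let the page cache do the actual I/O. */
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("example.dat", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) { close(fd); return 1; }

    /* Map the whole file; nothing is read yet - pages are faulted in
     * lazily the first time they are touched. */
    char *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    /* Touching the memory triggers a page fault, which the kernel satisfies
     * from the page cache, reading from disk only if the page is not there. */
    printf("first byte: 0x%02x, size: %lld bytes\n",
           (unsigned char)p[0], (long long)st.st_size);

    munmap(p, (size_t)st.st_size);
    close(fd);
    return 0;
}
```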
An operating system built entirely on paging and memory mapping may contain no file cache whatsoever. All disk caching can be implemented in the form of a page cache that balances anonymous pages, file-backed pages and free pages. An interesting side effect of switching from a sector- or block-based file cache to a page cache is that implementing a dynamic cache, one that adapts its size as memory pressure changes, becomes trivial.
Of course, a page cache cannot exist on its own; there still has to be a filesystem underneath it, communicating directly with the physical storage devices. This split makes it possible to implement specialised metadata caching policies in the filesystem itself, while the data cache stays abstracted away and shared across all types of filesystems. You could not implement a feature-packed filesystem such as NTFS without being able to perform atomic, write-through updates to its metadata.
The cooperation between filesystem drivers and the page cache also makes it possible to offer applications advanced caching policies. When opening a file, an application can provide hints about the caching policy it expects, such as whether it will read the data sequentially or at random, whether writes should go straight through to the disk, or whether caching should be bypassed altogether.
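On POSIX systems, one concrete form such hints take is posix_fadvise(); the advice values below are standard, but how far a given kernel acts on them is implementation-dependent.
```c
/* Examples of caching-policy hints on a POSIX system. */
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>

void advise_example(int fd, off_t file_size)
{
    /* "I will read this file from start to end" - the kernel may respond
     * with more aggressive read-ahead. */
    posix_fadvise(fd, 0, file_size, POSIX_FADV_SEQUENTIAL);

    /* "Access will be random" - read-ahead would mostly be wasted:
     *     posix_fadvise(fd, 0, file_size, POSIX_FADV_RANDOM);
     * "I am done with this data" - cached pages may be dropped early:
     *     posix_fadvise(fd, 0, file_size, POSIX_FADV_DONTNEED);
     */
}
```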
The hints mechanism makes it possible to create highly reliable applications that never fail to persist data, such as databases or audit logs. The application no longer needs to bypass the operating system to make sure the data has really reached the disk. Instead, it passes a set of flags in the API call, and it becomes the operating system's responsibility to write the data synchronously, or even forcibly flush the drive's internal cache, so that once the call returns, the application can be sure the data is safe.
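As a sketch of that path on a POSIX system: O_SYNC makes each write() synchronous, and fsync() asks the kernel to push both its page cache and, on cooperating filesystems and hardware, the drive's own write cache before returning. Windows exposes the same idea through FILE_FLAG_WRITE_THROUGH and FlushFileBuffers().
```c
/* Append a record and make sure it has actually reached stable storage. */
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int append_audit_record(const char *path, const char *record)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND | O_SYNC, 0600);
    if (fd < 0)
        return -1;

    /* O_SYNC makes this write synchronous: it returns only once the data
     * has been handed to stable storage, not merely to the page cache. */
    ssize_t n = write(fd, record, strlen(record));
    int rc = (n == (ssize_t)strlen(record)) ? 0 : -1;

    if (fsync(fd) != 0)      /* belt and braces: flush anything still pending */
        rc = -1;
    close(fd);
    return rc;
}
```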
Even though caching controllers are no longer a thing, there is still some hardware caching involved in disk technology. Especially with traditional rotational drives, effective caching is key to high performance. The drive's internal controller can prefetch whole tracks, even ahead of the current position, so that when the operating system issues a read request it can be fulfilled without waiting for the head to move. Likewise, when writing data, the drive can accept all of it immediately and complete the write in the background.
A large on-board cache is also crucial for command queuing. With command queuing, the drive's controller can carry out operations in a different order than they were requested, optimising the access pattern so that it requires fewer head movements.
The write-back capability of most modern hard drives also means that there is still a window of opportunity for data loss. That is why some server-grade operating systems disable write caching on all drives by default. However, a drive can be told to flush its caches at any time, and as long as applications give the operating system proper hints about their access patterns, the writes that matter can be made fully synchronous.
With SSDs, there is much less need to cache data. Nonetheless, many controllers keep a cache around, if only to speed up the remapping of flash memory blocks and to buffer SATA transfers. Some recent SSD controller designs, however, manage to work with no on-board RAM at all and still deliver stellar performance.
SSDs still benefit from software caches, though. There remains an enormous gap between the throughput of today's CPUs and memory subsystems and that of the SATA or PCIe buses. The difference is that while the cost of a cache miss on a rotational hard drive is huge, on an SSD it is fairly small.
Disk caching has evolved enormously: from no caching at all, to a simple pool of buffers holding a handful of sectors, to sophisticated sector- and block-based solutions with write-back caching and prefetching, and eventually to page caches tightly integrated with the virtual memory manager.
Through all that time, however, caching solutions have kept their promise of delivering as much performance as possible from data storage. Even though storage devices have gone from providing hundreds of kilobytes to thousands of megabytes per second, disk caching still helps reduce the time needed to read and write data and, most importantly, frees application developers from having to tailor their reads and writes to the internal organisation of a specific drive.