
Disk caching: all you need to know


Disk caching has come a long way: from a complete novelty, to a necessary performance-improving technique, and eventually to something we take for granted and never really think about unless it breaks. Along the way it has evolved, reached some dead ends, but finally converged on the solutions we know today. If you are as interested as I am in the journey disk caching has made on IBM PC compatible computers, read on to learn more about it.


Early days

In the early days, disk caching was an unknown concept. Early computers had very little memory available, and operating systems and applications strove to read data from tapes and disks directly into the target buffer and to purge it from memory as soon as it was no longer useful. Keeping a copy of data in memory just in case was considered a waste of resources.

As a result, applications were designed to be extremely careful about how often they used external storage. For the most part, all read and write operations had to be initiated by the user. Usually, a program would be loaded into memory as a whole in response to the user's command and would never touch the tape or disk unless told to retrieve or store a data file. This made external storage accesses very predictable, and the performance hit caused by the slow access to tape or disk was close to zero.

Nevertheless, there was a form of caching in use back in the day. Some applications were so large that they were unable to fit in memory all at once. To solve the problem, only the core of such a program would be loaded on startup, and the more extensive but rarely used functions were then loaded on demand from storage, as overlays. In the worst case, an overlay would need to be loaded from disk every time the user accessed the function. However, if the same function was invoked repeatedly, the overlay manager could avoid the overhead of loading the same code all over again and reuse what had been loaded previously. While not exactly caching, this technique prevented redundant accesses to storage media.

MS-DOS and its simple buffers

While the first IBM PCs were not known for their memory capacity, falling prices of memory chips and storage units quickly made a PC with 640 KiB of RAM and a hard drive a possibility. On the one hand, this accelerated the development of larger and more sophisticated applications. On the other, it let users operate on larger data sets, sometimes significantly larger than their computer's random access memory. With the advent of dBase, users were able to accumulate data in variable-length records, perform extensive analysis of it, and reuse it in their own applications.

However, with the now much larger memory capacity, caching data in memory became a viable solution. It was even embraced by MS-DOS in the form of disk buffers, set up in CONFIG.SYS with the help of the BUFFERS statement. To be fair, the buffers were primarily responsible for making sure that all data was aligned on a paragraph boundary and always accessed with the granularity of the sector size. However, the operating system kept track of the sectors stored in the buffers, and if an application requested the same sector twice, the data would be served directly from the buffers, with no disk access needed. This provided a slight but welcome performance boost.

The problem with the buffers was that there were always too few of them. It was common to configure a hard-drive-equipped system with 10 to 20 buffers, which in turn let the system cache up to 10 or 20 sectors. What was worse, increasing the number of buffers didn't result in a sizeable performance boost, as many reads went directly into the application's private memory and never touched the buffers. Thus, very few users ever bothered to bump the number of buffers to 30 or more, as it meant blocking over 15 KiB of conventional memory for no real benefit.
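
For illustration, a typical hard-drive system of the era would have had a line like this in CONFIG.SYS (the count shown is just an example; each buffer holds one 512-byte sector plus a small header, which is why 30 of them already cost over 15 KiB of conventional memory):

    BUFFERS=20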

Hardware caching controllers

One solution to both the inadequacy and the memory consumption of in-memory disk buffers was the caching controller. Dedicated on-board memory, managed by the controller itself, could store quite a few sectors. Even 64 KiB of memory could cache 128 sectors, significantly improving performance while not using a single byte of the computer's memory.

Promise SuperIDE Plus ISA IDE caching host adapter (source: vogons.org)

A good controller could also replace the simple policy of caching the most recently used sectors with a mixed policy which would allocate some memory to the most commonly used sectors. This could massively improve the performance of frequently used commands and applications, as well as the overall performance of the system, with the filesystem metadata area cached most of the time.

Caching controllers could also improve performance based on data locality. When a read was issued, the controller could read the whole track, the idea being that there was a high probability that the operating system would read the subsequent sectors too. Incidentally, this technique resolved another problem as well: interleave. Due to the low performance of the computers of the time, it was often impossible to read and process sectors in their natural sequence. Instead, the disk was formatted so that the computer would read one sector and skip one or two subsequent sectors while processing the data. This meant that reading a whole track required at least two or three rotations of the spindle. A good caching controller could make use of its own speed and prefetch the whole track into its cache, eliminating the interleave. This technique could lead to a massive performance improvement even if no redundant reads were made.

Up to this point, the only caching technique we have discussed is caching the data being read from the media, in the hope that a sector will be read more than once and the read can be satisfied from the cache. All writes to the media are still performed synchronously, with no performance benefit. Such a scheme is called write-through caching, because the data is written directly to the disk, with a copy placed in the cache only in case someone tries to read the same sector right after writing it. However, there is also a way to improve write performance using caching. Instead of writing data directly to the disk, it can be stored in the cache, with the operating system told that the write has succeeded. The actual write may be delayed and performed entirely by the controller in the background. Of course, there must be a limit on how much data can be written to the cache, and in the case of very large writes the software gets stalled waiting for the "dirty" sectors to be flushed to the disk. For the most part, though, writes are small enough to fit in the cache, and this scheme, known as write-back caching, brings enormous performance benefits.

Lazy writes, as implemented in the write-back caching scheme, solve one more problem. Some applications or operating systems write small chunks of data, smaller than the sector size. In such a case, updating multiple chunks results in the same sector being read and written over and over, which dramatically reduces performance. With write-back caching enabled, all these reads and writes can be satisfied by the cache, and the modified sector is flushed to the media only once.
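
To make the write-back idea concrete, here is a minimal, purely illustrative sketch in C of such a cache: a write only updates an in-memory copy and marks it dirty, and the media is touched later, when the dirty sectors are flushed. The names (device_write_sector, the slot layout and so on) are hypothetical, and eviction policy, error handling and concurrency are ignored.

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    #define SECTOR_SIZE 512
    #define CACHE_SLOTS 128                  /* e.g. 64 KiB worth of cache */

    struct cache_slot {
        uint32_t lba;                        /* which sector this slot holds */
        bool     valid;
        bool     dirty;                      /* modified, not yet on the media */
        uint8_t  data[SECTOR_SIZE];
    };

    static struct cache_slot cache[CACHE_SLOTS];

    /* Hypothetical low-level routine provided by the controller firmware. */
    void device_write_sector(uint32_t lba, const uint8_t *buf);

    /* Write-back: the caller gets control back immediately; the media is
     * only touched when the slot is evicted or the cache is flushed.     */
    void cached_write(uint32_t lba, const uint8_t *buf)
    {
        struct cache_slot *s = &cache[lba % CACHE_SLOTS];   /* direct-mapped */
        if (s->valid && s->lba != lba && s->dirty)
            device_write_sector(s->lba, s->data);           /* evict old dirty data */
        s->lba   = lba;
        s->valid = true;
        memcpy(s->data, buf, SECTOR_SIZE);
        s->dirty = true;                     /* the copy on the media is now stale */
    }

    /* Flush all dirty sectors, e.g. on a timer or before a reboot. */
    void cache_flush(void)
    {
        for (int i = 0; i < CACHE_SLOTS; i++)
            if (cache[i].valid && cache[i].dirty) {
                device_write_sector(cache[i].lba, cache[i].data);
                cache[i].dirty = false;
            }
    }

Note how repeated writes to the same sector are absorbed by the in-memory copy and reach the media only once, at flush time.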

Write-back caching, as good as it is, has one major flaw. If the computer hangs or powers down before all the written data has been flushed to the media, the data can get lost or corrupted. What is worse, the user can be completely oblivious to the fact until they try to use the data again, because from their perspective the write completed successfully. One way to limit the damage was to cap how long data could stay in the write cache before being flushed to the media. With a limit of one or two seconds, the risk of the user turning off the computer right after saving their data files was fairly minimal. As for power loss, some caching controllers were equipped with static memory chips and a battery, so that they could complete the writes once the computer was powered on again.

For a short time, hardware caching disk controllers were considered must-have equipment for high-end computers, sometimes even making it into mid-range ones. However, they faded into obscurity almost as quickly as they had arrived. There were two reasons for that. First, the IDE standard removed the need for a separate disk controller board and moved the controller onto the disk itself. Second, CPUs and memory got faster with every iteration of the PC platform, with their bandwidth reaching tens of megabytes per second, while the ISA bus stayed capped below 10 MiB/s. In extreme cases, replacing a software disk cache with a hardware caching controller could… reduce disk throughput.

Early software disk caches

On faster 80286- and 80386-based PCs with more than 640 KiB of memory, the user could get the most out of their hard drive subsystem by installing a software disk caching solution such as SmartDrive. It could use extended memory, normally inaccessible to MS-DOS applications, thus not only sparing every byte of precious conventional memory, but also putting the otherwise wasted upper ranges of memory to use. A cache of 128 KiB or 256 KiB produced a significant improvement in disk performance, on par with low-end hardware caching controllers.

Such software caches worked by intercepting BIOS interrupt 13h, which provides storage services and handles floppy and hard drives. While this provided enough abstraction to cache all reads from hard disks, it also meant that SmartDrive and its kin were unable to cache reads from other storage devices, such as removable Bernoulli Box drives or CD-ROMs. But there was another, much more problematic issue with this approach. As hard drive capacities grew, BIOSes started providing geometry translation to reconcile the addressing limits of IDE, the BIOS and MS-DOS. As a result, sector-based software caches could be fooled by the translation layer and either corrupt data or issue suboptimal reads, reducing performance.

In the early MS-DOS days, using SmartDrive or a similar tool was relatively easy. All one had to do was include an additional DEVICE statement in CONFIG.SYS, define the size of the cache, and decide which kind of memory the cache should reside in. However, with time more advanced MS-DOS applications began to appear, making use of extended memory. Among them, Microsoft Windows was the primary issue, as it was perfectly capable of running its applications in extended memory, and it was always short on memory. Even on machines with 2 MiB of memory, users found themselves shuffling memory between Windows and SmartDrive, balancing the trade-off between high disk read/write performance and less application swapping.
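
To give an idea of how simple that setup was, a CONFIG.SYS entry for the early, device-driver-based SmartDrive looked roughly like this (the path is just an example; if memory serves, the first number is the cache size in KiB and the second the minimum size Windows was allowed to shrink it to):

    DEVICE=C:\DOS\SMARTDRV.SYS 1024 512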

Advanced software disk caches

SmartDrive kept getting better and more mature. Starting from version 4.0, it switched to being block-based, which resolved most of its shortcomings. First, it became aware of devices other than hard drives, which coincided nicely with the rising popularity of CD-ROM drives. Second, by working at the level of MS-DOS's interrupt 21h, it didn't have to worry about geometry translation issues.

But the nicest thing about the new SmartDrive was its ability to do write-back caching. Saving data became a nearly instantaneous operation, with the hard drive indicator lighting up a few seconds later and the user able to resume working on their document immediately. This improvement also became one of SmartDrive's most talked-about features, though in a negative way. As it was implemented in software, it was extremely vulnerable to the computer hanging, rebooting or being turned off before the data had been written to disk. On standard filesystems this could lead to data being lost or corrupted, but on volumes compressed with tools such as Stacker or DriveSpace it could damage the compressed volume image and prevent the user from accessing any of their data.

This problem applied to all software caching solutions with write-back capability. However, because SmartDrive was a standard component of MS-DOS and Windows, only Microsoft got blamed for exposing its customers to potential data loss. In reaction, the company made significant improvements to SmartDrive. Even the initial version applied a time-out to "dirty" data, which ensured that every write would be flushed to disk fairly quickly. Subsequent versions improved on that by intercepting the Ctrl+Alt+Del shortcut and flushing all buffers before the computer rebooted. Also, to further minimise the risk of data loss, the buffers were flushed every time before the DOS prompt appeared on screen, which kept the data safe if the user turned off the computer or ran a program that immediately hung the system. None of this eliminated the risk of data loss entirely, and for the truly paranoid Microsoft provided an easy switch that disabled SmartDrive's write-back capability altogether.
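
By that time SmartDrive was loaded as an executable from AUTOEXEC.BAT rather than as a device driver, and the switch in question was a single command-line option; as far as I recall, it looked along these lines, with drive letters followed by a plus or minus also allowing write caching to be toggled per drive:

    SMARTDRV /X          (disable write-behind caching on all drives)
    SMARTDRV C+ D        (write-back caching on C:, read-only caching on D:)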

Overall, the improved software disk caching solutions could make a real difference to perceived performance. By combining file-aware, block-based read caching with prefetch and write-back caching, SmartDrive could turn even a slow hard drive into an average performer, provided enough memory was installed in the system.

Double buffering

Software caches gave their users a nice performance boost while keeping the impact on conventional memory minimal. However, they could cause problems on machines equipped with DMA-based disk controllers. As soon as paging was enabled, whether by Microsoft Windows or even by EMM386 or a similar advanced memory manager, addresses pointing beyond 1 MiB were no longer directly mapped to physical memory. When attempting to exchange data between the cache and the drive, SmartDrive would pass a virtual address to the controller's BIOS. The BIOS, however, knew nothing about the virtual memory translation set up by the memory manager and would program the DMA controller with the virtual address rather than the physical one. In most cases this meant that reading data from disk corrupted memory, and the machine would lock up soon after SmartDrive was loaded.

The problem rarely occurred without SmartDrive, as MS-DOS never let applications load data into extended memory, and even with a memory manager running, the first megabyte of RAM was always mapped directly to physical memory.

SmartDrive was able to cope with the problem, though. Optional double-buffering could be enabled, which made sure that all reads and writes were performed to and from conventional memory. There was a performance penalty to that, caused by the need to copy all the data between two memory blocks on each read/write operation; this is why this function was disabled by default.
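
Enabling it was a matter of one more CONFIG.SYS line, along these lines (the path is just an example):

    DEVICE=C:\DOS\SMARTDRV.EXE /DOUBLE_BUFFER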

Disk caching in modern operating systems

Modern 32- and 64-bit operating systems are all built around paging. In many cases, there may even be no such concept as directly reading or writing data to disk. Most operations can be translated into memory-mapped files, a construct that represents a file as a contiguous area of virtual memory and assigns a page fault handler that fetches a page from the file whenever it is not present in physical memory, and makes sure all dirty pages (the ones that have been modified) eventually make it to the disk. This is how executable files are run, and in some operating systems this is also the way ordinary reads and writes are implemented.
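
As a small illustration of the concept, this is roughly what memory-mapping a file for reading looks like through the standard POSIX mmap interface (the file name is just an example): no data is read up front, and each page is faulted in from the page cache, and from disk if needed, the first time it is touched.

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("data.bin", O_RDONLY);         /* example file name */
        if (fd < 0) return 1;

        struct stat st;
        if (fstat(fd, &st) < 0 || st.st_size == 0) return 1;

        /* The file now appears as a contiguous range of virtual memory.
         * Nothing is read yet; pages are faulted in on first access.   */
        const unsigned char *p = mmap(NULL, st.st_size, PROT_READ,
                                      MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) return 1;

        /* Touching the memory triggers page faults that the kernel
         * satisfies from the page cache, reading from disk if needed. */
        unsigned long sum = 0;
        for (off_t i = 0; i < st.st_size; i++)
            sum += p[i];
        printf("checksum: %lu\n", sum);

        munmap((void *)p, st.st_size);
        close(fd);
        return 0;
    }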

An operating system that is built entirely on paging and memory mapping needs no separate file cache at all. All disk caching can be implemented in the form of a page cache that keeps the balance between anonymous pages, file-backed pages and free pages. An interesting side effect of switching from a sector- or block-based file cache to a page cache is that implementing a dynamic cache, one that adapts its size as memory pressure changes, becomes trivial.

Of course, it is impossible to create a page cache without a file system running in the background and communicating directly with physical storage devices. This split makes it possible to implement specialised metadata caching policies in the filesystem itself, and to keep the data cache abstracted away and shared across all types of filesystems. You could not implement a feature-packed filesystem such as NTFS without being able to perform atomic, write-through updates to metadata.

The cooperation between filesystem drivers and the page cache also makes it possible to provide applications with advanced caching policies. When opening a file, an application can provide hints on what caching policy it expects: for example, that every write must go straight through to the media, that the system cache should be bypassed entirely, or that the file will be read sequentially or accessed at random.
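
As a sketch of what such hints can look like on a POSIX system (the flags shown are standard POSIX ones, while the function name is made up for the example): opening a file with O_SYNC requests write-through behaviour, posix_fadvise declares the expected access pattern, and fsync forces anything still held in the cache out to the media.

    #include <fcntl.h>
    #include <unistd.h>

    int write_record_durably(const char *path, const void *buf, size_t len)
    {
        /* O_SYNC: every write() returns only once the data, and the
         * metadata needed to retrieve it, has reached stable storage. */
        int fd = open(path, O_WRONLY | O_CREAT | O_APPEND | O_SYNC, 0644);
        if (fd < 0) return -1;

        /* Hint that the file will be written sequentially, so caching
         * previously written data brings little benefit.              */
        posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

        ssize_t written = write(fd, buf, len);

        /* Belt and braces: flush anything still cached, including the
         * metadata needed to find the record again after a crash.     */
        if (written == (ssize_t)len && fsync(fd) == 0) {
            close(fd);
            return 0;
        }
        close(fd);
        return -1;
    }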

The hints mechanism makes it possible to create highly reliable applications that never fail to persist data, such as databases or audit logs. The application no longer needs to bypass the operating system to make sure the data has really made it to the disk. Instead, it provides a set of flags in the API call, and it is then the operating system's responsibility to write the data synchronously, or even forcibly flush the drive's internal cache, so that once the call returns, the application can be sure the data is safe.

Return of hardware caches

Even though caching controllers are no longer a thing, there is still some hardware caching involved in disk technology. Especially with traditional rotational drives, effective caching is key to providing high performance. The drive's internal controller can prefetch whole tracks, even ahead of the current head position, so that when the operating system issues a read request, it can be fulfilled without waiting for the head to move. Also, when writing data, the drive can accept all the data immediately and complete the write in the background.

A large on-board cache is also crucial to providing command queuing. With command queuing, the drive's controller can carry out operations in a different order than originally requested, optimising the access pattern so that it requires fewer head movements.

The write-back capability that most modern-day hard drives support also means that there's still a window of opportunity for data loss. That is why some server-grade operating systems disable write caching on all drives by default. However, a drive can be forced to flush its caches at any time, and as long as applications provide the operating system with proper hints on their access patterns, the relevant writes may become completely synchronous.
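
On Linux, for instance, the drive's own write cache can be queried or switched off with hdparm (the device name is just an example):

    hdparm -W /dev/sda       # show the drive's write-caching setting
    hdparm -W 0 /dev/sda     # disable the drive's write cache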

SSDs need no caching

With SSDs, there is not much need to cache data. Nonetheless, many controllers keep a cache around, if only to speed up remapping flash memory blocks and facilitate SATA transfers. However, some recent SSD controller designs can work with no on-board RAM and still provide stellar performance.

SSDs still benefit from software caches, though. There remains an enormous performance gap between the throughput of today's CPUs and memory subsystems and that of SATA or PCIe buses. However, while the cost of a cache miss on a rotational hard drive is huge, on an SSD it is fairly small.

Summary

Disk caching has evolved enormously: from no caching at all, to a simple set of buffers holding a few sectors, to sophisticated sector- and block-based solutions with write-back caching and prefetch, and eventually to page caches tightly integrated with the virtual memory manager.

Through all that time, however, caching solutions have kept the promise of squeezing as much performance as possible out of data storage. Even though storage devices have gone from delivering hundreds of kilobytes per second to thousands of megabytes per second, disk caching still helps reduce the time needed to read and write data and, most importantly, reduces the need for application developers to tune their reads and writes to the internal organisation of a specific drive.