Abstract Attribute-Based Prediction of File Properties

更新时间：2023-08-10 07:43:01 阅读量：工程科技文档下载

说明：文章内容仅供预览，部分内容可能不全。下载后的文档，内容与下面显示的完全一致。下载之前请确认下面内容是否您想要的，是否完整无缺。

abstract推荐度：
相关推荐

We present evidence that attributes that are known to the file system when a file is created, such as its name, permission mode, and owner, are often strongly related to future properties of the file such as its ultimate size, lifespan, and access pattern.

Attribute-Based Prediction of File Properties Daniel Ellard,Michael Mesnier,Eno Thereska,Gregory R.Ganger,Margo Seltzer Abstract

We present evidence that attributes that are known to

the?le system when a?le is created,such as its name,

permission mode,and owner,are often strongly related

to future properties of the?le such as its ultimate size,

lifespan,and access pattern.More importantly,we show

that we can exploit these relationships to automatically

generate predictive models for these properties,and that

these predictions are suf?ciently accurate to enable opti-

mizations.

1Introduction

In“Hints for Computer System Design,”Lampson

tells us to“Use hints to speed up normal execution.”[14]

The?le system community has rediscovered this prin-ciple a number of times,suggesting that hints about a ?le’s access pattern,size,and lifespan can aid in a va-riety of ways including improving the?le’s layout on disk and increasing the effectiveness of prefetching and caching.Unfortunately,earlier hint-based schemes have required the application designer or programmer to sup-ply explicit hints using a process that is both tedious and error-prone,or to use a special compiler that can recog-nize speci?c I/O patterns and automatically insert hints. Neither of these schemes have been widely adopted.

In this paper,we show that applications already give useful hints to the?le system,in the form of?le names and other attributes,and that the?le system can success-fully predict many?le properties from these hints.

We begin by presenting statistical evidence from three contemporary NFS traces that many?le attributes,such as the?le name,user,group,and mode,are strongly re-lated to?le properties including?le size,lifespan,and access patterns.We then present a method for automati-cally constructing tree-based predictors for the properties of a?le based on these attributes and show that these

predictions are accurate.Finally,we discuss uses for such predictions,including an implementation of a sys-tem that uses them to improve?le layout by anticipating which blocks will be the most frequently accessed and grouping these blocks in a small area on the disk,thereby improving reference locality.

The rest of this paper is organized as follows:Sec-tion2discusses related work.Section3describes the collection of NFS traces we analyze in this study.Sec-tion4makes the case for attribute-based predictions by presenting a statistical analysis of the relationship be-tween attributes of?les and their properties.Section5 presents ABLE,a classi?cation-tree-based predictor for several?le properties based on their attributes.Section6 discusses how such models might be used,and demon-strates an example application which increases the local-ity of reference for on-disk block layout.Section7con-cludes.

2Related Work

As the gap between I/O and CPU performance has increased many efforts have attempted to address it.An entire industry and research community has emerged to 1

attack I/O performance;?le systems have been modi?ed, rewritten and rethought in attempts to reduce the number of synchronous disk requests.Signi?cant effort has also been expended to make caches more effective so that the number of disk requests can be reduced.Many powerful heuristics have been discovered,often from the analyses of real workloads,and incorporated into production?le systems.All of these endeavors have been productive, but I/O performance is still losing ground to CPU,mem-ory,and network performance,and we have not resolved the I/O crisis to which Patterson refers in the original RAID paper,written more than?fteen years ago[24].

There is extensive ongoing research in the?le system and database communities regarding the optimization of various aspects of performance,reliability,and availabil-ity of data access.Many heuristics have been developed and incorporated into popular?le systems like the Fast File System(FFS)[17].Many of these heuristics depend on assumptions about workloads and?le properties.

One example of a contemporary?le system is the Fast File System(FFS)[17],whose basic design is nearly twenty years old and yet continues to be tuned[7].For example,FFS is optimized to handle small?les in a dif-ferent manner than large?les;it attempts to organize small?les on disk so that they are near their metadata and other?les in the directory,under the assumption that ?les in the same directory are often accessed together. Some?le systems go to more extreme lengths,such as storing the contents of short?les in the same disk block as their inode[22]or storing the directory and inode in-formation in the same block[11].

In addition to size,other properties of?les,such as whether they are write-mostly or read-mostly,have been found useful to drive various?le system policies.For example,the assumption underlying the design of the log-structured?le system(LFS)is that write-latency is the bottleneck for?le system performance[26].Hybrid schemes that use LFS to store write-mostly?les have also found this approach useful[23].In contrast,if a ?le is known to be read-mostly,it may bene?t from ag-gressive replication for increased performance and avail-ability[27].

Unfortunately,every widespread heuristic approach suffers from at least one of the following problems:First, if the heuristics are wrong,they may cause performance to degrade,and second,if the heuristics are dynamic, they may take considerable time,computation,and stor-age space to adapt to the current workload(and if the workload varies over time,the adaptation might never converge).

One partial solution to the problem of inappropriate

or incomplete?le system heuristics is for applications to supply hints to the?le system about the?les’antici-pated access patterns.In some contexts these hints can be extremely successful,especially when used to guide the policies for prefetching and selective caching[5,25].

The drawback of this approach is that it requires that applications be modi?ed to provide hints.There has been work in having the compiler automatically generate hints,but success in this area has been largely con?ned to scienti?c workloads with highly regular access patterns

[21],and no?le system that uses these ideas has been

widely deployed.

In previous work,we noted that for some workloads, applications(and the users of the applications)already provide hints about the future of the?les that they create via the names they choose for those?les[8].In this paper we generalize this?nding and show that?le names,as well as other attributes such as uid and mode,are,in fact, hints that may be useful to the?le system.

In addition to static analyses of workloads,there has been research aimed at understanding the dynamic be-haviors of?les.Previous work has shown that proper-ties of?les depend on the applications and users access-ing them[2,9]and because users and applications may change,workloads change as well.

Considerable work has been done in developing and exploiting predictive models for the access patterns of ?les(or their data blocks)[1,30,31].Most work in this area focuses on rather complex and computationally ex-pensive predictive models.Furthermore,such models are often needed on a?le-by-?le basis and do not attempt to?nd relationships or classes among?les to generalize

[15].We extend this work by providing a framework for

automatically classifying?les with similar behaviors.

There also exist systems that use a variety of layout policies that provide non-uniform access characteristics.

In the most extreme case,a system like AutoRAID[31] employs several different methods to store blocks with different characteristics.On a more mundane level,the performance of nearly all modern disk drives is highly in?uenced by the multi-zone effect,which can cause the effective transfer rate for the outer tracks of a disk to be considerably higher than that of the inner[19].There is ample evidence that adaptive block layout can im-prove performance;we will demonstrate that we can pre-emptively determine the layout heuristics to achieve this bene?t without having to reorganize?les after their ini-tial placement.

Advances in arti?cial intelligence and machine learn-ing have resulted in ef?cient algorithms for building ac-curate predictive models that can be used in today’s?le 2

systems.We leverage this work and utilize a form of classi?cation tree to capture the relationships between ?le attributes and their behaviors,as further described in Section5.

The work we present here does not focus on new heuristics or policies for optimizing the?le system.In-stead it enables a?le system to choose the proper policies to apply by predicting whether or not the assumptions on which these policies rely will hold for a particular?le. 3The Traces

To demonstrate that our?ndings are not con?ned to a single workload,system,or set of users,we analyze traces taken from three servers:

DEAS03traces a Network Appliance Filer that serves the home directories for professors,graduate stu-dents,and staff of the Harvard University Divi-sion of Engineering and Applied Sciences.This trace captures a mix of research and development, administrative,and email traf?c.The DEAS03 trace begins at midnight on2/17/2003and ends on 3/2/2003.

EECS03traces a Network Appliance Filer that serves the home directories for some of the professors, graduate students,and staff of the Electrical Engi-neering and Computer Science department of the Harvard University Division of Engineering and Applied Sciences.This trace captures the canonical engineering workstation workload.The EECS03 trace begins at midnight on2/17/2003and ends on 3/2/2003.

CAMPUS traces one of14?le systems that hold home directories for the Harvard College and Harvard Graduate School of Arts and Sciences(GSAS)stu-dents and staff.The CAMPUS workload is almost entirely email.The CAMPUS trace begins at mid-night10/15/2001and ends on10/28/2003.

Ideally our analyses would include NFS traces from a variety of workloads including commercial datacenter servers,but despite our diligent efforts we have not been able to acquire any such traces.

The DEAS03and EECS03traces are taken from the same systems as the DEAS and EECS traces described in earlier work[9],but are more recent and contain infor-mation not available in the earlier traces.The CAMPUS trace is the same trace described in detail in an earlier

study[8],although we draw our samples from a longer subset of the trace.All three traces were collected with nfsdump[10].

Table1gives a summary of the average hourly oper-ation counts and mixes for the workloads captured in the traces.These show that there are differences between these workloads,at least in terms of the operation mix.

CAMPUS is dominated by reads and more than85%of the operations are either reads or writes.DEAS03has proportionally fewer reads and writes and more meta-data requests(getattr,lookup,and access)than CAMPUS,but reads are still the most common opera-tion.On EECS03,meta-data operations comprise the majority of the workload.

Earlier trace studies have shown that hourly opera-tion counts are correlated with the time of day and day of week,and much of the variance in hourly operation count is eliminated by using only the working hours[8].Table 1shows that this trend appears in our data as well.Since the“work-week”hours(9am-6pm,Monday through Fri-day)are both the busiest and most stable subset of the data,we focus on these hours for many of our analyses.

One aspect of these traces that has an impact on our research is that they have been anonymized,using the method described in earlier work[8].During the anonymization UIDs,GIDs,and host IP numbers are simply remapped to new values,so no information is lost about the relationship between these identi?ers and other variables in the data.The anonymization method also preserves some types of information about?le and direc-tory names–for example,if two names share the same suf?x,then the anonymized forms of these names will also share the same suf?x.Unfortunately,some informa-tion about?le names is lost.A survey of the?le names in our own directories leads us to believe that capital-ization,use of whitespace,and some forms of punctua-tion in?le names may be useful attributes of?le names, but none of this information survives anonymization.As we will show in the remaining sections of this paper,the anonymized names provide enough information to build good models,but we believe that it may be possible to build even more accurate models from unanonymized data.

4The Case for Attribute-Based Predic-tions

To explore the associations between the create-time attributes of a?le and its longer-term properties,we be-gin by scanning our traces to extract both the initial at-3

All Hours

DEAS0315.7%(55.3%)29.2%(49.3%)

EECS0312.3%(123.8%) 3.2%(263.2%)

CAMPUS21.3%(58.9%) 2.3%(60.7%)

DEAS0316.8%(28.9%)26.6%(29.4%)

EECS0312.3%(86.7%) 3.0%(129.9%)

CAMPUS22.3%(16.7%) 2.4%(32.6%)

Table1:The average percentage of read,write,lookup,getattr,and access operations for a fourteen day trace from each server.The averages are shown for both all hours during the trace and for the peak hours(9:00am–6:00pm on weekdays).The coef?cient of variation for each hourly average is given in parentheses.

DEAS033/24/2003

relative strength of association

first

middle last EECS033/24/2003

relative strength of association

first

middle last uid gid

CAMPUS 10/22/2001

relative strength of association

first

middle last siz e=0

Figure 2:The relative strength of the correlation between the properties “lifetime (lftmd)is one second or shorter”and “size is zero”and several ?le attributes (as indicated by the chi-squared values)for one day of each trace.The chi-squared values are normalized relative to the attribute with the strongest association.The last,middle,and ?rst attributes refer to components of the ?le name,as de-scribed in Section 5.

culty with such a test is distinguishing natural variation from a statistically signi?cant difference;the chi-squared test is used to detect and quantify such differences.The sum of squared differences between the expected and observed number of ?les is our chi-squared statis-tic,and we calculate this statistic for each combination of attribute and property.In statistical terms,we are try-ing to disprove the null hypothesis that ?le attributes are not associated with ?le properties.A chi-squared statis-tic of zero indicates that there is no association (i.e.,the expected and observed values are the same),while the magnitude of a non-zero statistic indicates the degree of association.This value is then used to calculate a p-value which estimates the probability that the difference be-tween the expected and observed values is coincidental.For all of our tests,we have a high chi-square statis-tic,and the p-values are very close to zero.Therefore we may,with very high con?dence,reject the null hy-pothesis and claim that attributes are associated with ?le properties.

The chi-squared test can also be used to rank the at-tributes by the degree of association.Figure 2shows how the chi-squared values differ for the size and lifespan properties.There are two important points to take from this ?gure.First,the attribute association differs across properties for a given trace –for example,in CAM-PUS the uid shows a relatively strong association with the lifespan,yet a weak association with the size.The second point is that the relative rankings differ across traces.For example,on CAMPUS the middle compo-nent of a ?le name has strong association with lifespan and size,but the association is much weaker on DEAS03and EECS03.

Although we show only two properties in these graphs,similarly diverse associations exist for other properties (e.g.,directory entry lifespan and read/write ratio).In Section 5we show how these associations can be dynamically discovered and used to make predictions.

The chi-squared test described in this section is a one-way test for association.This test provides statistical ev-idence that individual attributes are associated with ?le properties.It does not,however capture associations be-tween subsets of the attributes and ?le properties.It also does not provide an easy way to understand exactly what those associations are.One can extend this methodology to use -way chi-square tests,but the next section dis-cusses a more ef?cient way for both capturing multi-way associations and extracting those associations ef?ciently.

5The ABLE Predictor

The results of the previous section establish that each of a?le’s attributes(?le name,uid,gid,mode)are,to some extent,associated with its long term properties (size,lifespan,and access pattern).This fact suggests that these associations can be used to make predictions on the properties of a?le at creation time.The chi-squared results also give us hope that higher order as-sociations(i.e.,an association between more than one at-tribute and a property)may exist,which could result in more accurate predictions.

To investigate the possibility of creating a predictive model from our data,we constructed an Attribute-Based Learning Environment(ABLE).ABLE is a learning en-vironment for evaluating the predictive power of?le at-tributes.The input to ABLE is a table of information about?les whose attributes and properties we have al-ready observed and a list of properties for which we wish to predict.The output is a statistical analysis of the sam-ple,a chi-squared ranking of each?le attribute relative to each property,and a collection of predictive models that can be used to make predictions about new?les.

In this paper,we focus on three properties:the?le size,the?le access pattern(read-only or write-only),and the?le lifespan.On UNIX?le systems,there are two as-pects of?le lifespan that are interesting:the?rst is how long the underlying?le container(usually implemented as an inode)will live,and the other is how long a par-ticular name of a?le will live(because each?le may be linked from more than one name).We treat these cases separately and make predictions for each.

To simplify our evaluation,each property we wish to predict is represented by a Boolean predicate.For exam-ple:

size

size16KB

inode lifespan1sec

?le name lifespan1sec

read-only

write-only

We believe these properties are representative of properties that a?le or storage system designer might use to optimize for different classes of?les.For exam-ple,if we know that a?le will be read-only,then we might choose to replicate it for performance and avail-ability,but this optimization would be inappropriate for ?les that are written frequently but rarely read.Write-only?les might be stored in a partition optimized for writes(e.g.,a log-structured?le system),and short-lived ?les could live their brief lives in NVRAM.In Section

6,for example,we show that by identifying small,short-lived?les and hot directories,we can use predictions to optimize directory updates in a real?le system.

ABLE consists of three steps:

Step1:Obtaining Training Data.Obtain a sample of ?les and for each?le record its attributes(name,

uid,gid,mode)and properties(size,lifespan,and

access pattern).

Step2:Constructing a Predictive Classi?er.For each ?le property,we train a learning algorithm to clas-

sify each?le in the training data according to that

property.The result of this step is a set of predic-

tive models that classi?es each?le in the training

data and can be used to make predictions on newly

created?les.

Step3:Validating the e the model to pre-dict the properties of new?les,and then check

whether the predictions are accurate.

Each of these steps contains a number of interesting issues.For the?rst step,we must decide how to obtain representative samples.For the second,we must choose

a learning algorithm.For the third,we must choose how

to evaluate the success of the predictions.We may con-sider different types of errors to have different degrees of importance–for example,if the?le system treats short-lived?les in a special manner,then incorrectly predicting that a?le will be short-lived may be worse than incor-rectly predicting that a?le will be long-lived.

5.1Obtaining Training Data

There are two basic ways to obtain a sample of?les: from a running system or from traces.ABLE currently uses the latter approach,using the NFS traces described in Section3.

Determining some of the attributes of a?le(gid,uid, mode)is a simple matter of scanning the traces and cap-turing any command(e.g.,lookup or getattr)that get or set attributes.To capture?le names,ABLE simu-lates each of the directory operations(such as create, symlink,link,and rename)in order to infer the?le names and connect them to the underlying?les.

Table2shows samples take from DEAS03.The spe-ci?c property for this table is the write-only property;

each?le is classi?ed as write-only or not.For this prop-erty,ABLE classi?es each?le by tracking it through the trace and observing whether it is ever read.

Attributes

last gid mode uid wronly

18aa018b72

18b1118b7e

18aac18b28

18b5818c6a

18abe18b7f

18aad18b2f

18aab18ab4

18abe18b7f

18abe18b7c

18a9018aa1

18abe18b7c

5.2Constructing Prediction Models

There are a variety of learning algorithms to build classi?ers for data.In general,these algorithms attempt to cluster the set of observations into a group of classes. For example,if we were trying to predict the write-only property,our learning algorithm would attempt to?nd all write-only?les with similar attributes,and place those into a single class(e.g.,all?les ending in.log).If the learning algorithm can successfully determine the classes of?les,then it is able to classify the?les.On new ?les that do not yet have established properties,the al-gorithm simply examines the attributes to determine the ?le’s class,and predicts that the properties of the new?le will be the properties of its class.Therefore,the?rst step in making predictions is to select an appropriate learning algorithm to classify the training data.ABLE uses the ID3algorithm to construct a decision tree[4].

A decision tree is ideal for our purposes for several reasons:First,our attributes and properties are categori-cal(i.e.,?le names,uid,gid,and mode are symbols,with no inherent ordering,as are the binary classi?cations for each property).Second,the computational scalability of a decision tree model makes it well-suited for use in an on-line system.Third,decision trees are humanly read-able and allow us to gain some intuition about how the ?les are being classi?ed.In short,decision trees are easy to train,produce predictions quickly,and the resulting models are easy to interpret.

Given a sample of?les,a decision tree learning algo-rithm attempts to recursively split the samples into clus-ters.The goal is to create clusters whose?les have sim-ilar attributes and similar classi?cations.Figure3illus-trates the ID3algorithm used by ABLE to induce a tree

2. split attribute A

3. if leave nodes are "pure" done

4. else, if attributes remaining, goto 1

Figure3:Constructing a simple decision tree from the training data in Table2.

EECS03

MODE ABLE MODE

size=098.97%66.58%98.57%

0size16KB95.42%57.69%98.83%

lftmd1s(?le)88.16%58.80%72.95%

lftmd1s(direntry)96.96%52.80%77.66%

wronly91.17%46.98%81.83%

rdonly75.55%49.63%81.24%

Table3:A comparison of the accuracy of the ABLE and MODE predictors for several properties for the three traces. MODE always predicts the value that occurred most frequently in the training sample,without considering any at-tributes of the new?le.

against the ABLE(unconstrained)decision trees. MABLE:trees induced with only the inode attributes (mode,uid,gid).

NABLE:trees induced with only?le names.

Figure4compares the predication accuracies for ABLE,MABLE and NABLE.For the purpose of clarity, this?gure only shows the accuracy for three of our bi-nary properties(size,write-only,and?le name lifespan); the results for our other properties are similar.The?g-ure shows that ABLE usually outperforms both MABLE and NABLE.This tells that some multi-way associations exist between the?le name attributes and other attributes that allow us to make more accurate predictions when all are considered.An example of a multi-way association would be that the lifespan of a?le depends on both the ?le name and the user who created the?le.

However,the CAMPUS and EECS03results tell us that,in some situations,ABLE does worse than MABLE or NABLE.In these traces,some multi-way associations existed on Monday that did not generalize to new?les on Tuesday.This is a common problem of over-?tting the data with too many attributes,although the differences are not severe in our evaluation.

There are two important points to take away from our analysis of MABLE and NABLE.First,more attributes are not always better.We can fall into a trap known as the curse of dimensionality in which each attribute adds a new dimension to the sample space[6].Unless we see a suf?cient number of?les,our decision trees may get clouded by transient multi-way associations that do not apply in the long run.Second,NABLE and MABLE offer predictions roughly equivalent to ABLE.This is somewhat surprising,particularly in the case of MABLE, because it means that we can make accurate predictions even if we do not consider?le names at all.

Given enough training data,ABLE always outper-forms MABLE and NABLE.For the results presented in the paper,ABLE required an extra week of training to detect the false attribute associations,due in part to the small number of attributes.We anticipate that more training will be required for systems with larger attribute spaces,such as object-based storage with extended at-tributes[18]and non-UNIX?le systems such as CIFS or NTFS[29].Furthermore,irrelevant attributes may need to be pre-?ltered before induction of the decision tree[6] to prevent over-?tting.The automation of ABLE’s train-ing policies,including attribute?ltering,is an area for future work.

DEAS0303/24/2003

(

)

EECS0303/24/2003

(

)

CAMPUS

10/22/2001

(

)

Figure4:Comparing the prediction accuracy of ABLE, NABLE,and MABLE for the properties size=0,write-only,and lifetime1second.Prediction accuracy is measured as percentage correct.

?NABLE predicts “write-mostly" if

first=cache & last=gif [5742/94.0%]?MABLE predicts “size=0” if

mode=777 [4535/99.8%]?ABLE predicts “deleted within 1 sec” if

first = 01eb & last = 0004 & mode = 777 &

uid = 18abe [1148/99.7%] Figure5:Example rules for DEAS03discovered by NABLE,MABLE,and ABLE.The number of?les that match the attributes and the observed probability that these?les have the given property are shown on the right. For example,NABLE predicts that names whose name begins with cache and end in.gif will be“write-mostly”.This prediction is based on observations of 5742?les,94.0%of which have the“write-only”prop-erty.

6.1Benchmarking Attribute-Based Systems

One of the dif?culties of measuring the utility of attribute-based hints in the context of real?le systems is ?nding a suitable benchmark.Synthetic workload gener-ators typically create?les in a predictable and unrealistic manner–they make little or no attempt to use realis-tic?le names or mimic the diverse behaviors of differ-ent users.If we train our models on data gathered when these benchmarks are running then our predictions will probably be unrealistically accurate,but if we train on a workload that does not include the benchmarks,then our predictions for the?les created by the benchmark will be uncharacteristically bad.

Our solution to this problem is to construct a benchmark directly from traces of the target work-load,thereby ensuring that the associations between?le names,modes,and uids during the trace will resemble those present in the actual workload.This leads imme-diately to a new problem–in order to replay the traces, we need a real?le system on which to play them.The usual solution to this problem is to recreate the traced ?le system from a snapshot of its metadata taken at a known time,and then begin replaying from that time [28].This method works well when snapshots are avail-able,and when a suitable device is available on which to reconstruct.Unfortunately we have neither–there are no publicly-available snapshots of the systems from which the traces were taken,and even if there were,reconstruct-ing them would require at least500GB of disk space and many hours of set-up time per test.

To solve this problem,we have developed a new method of performing a snapshot-less trace replay that uses the trace itself to reconstruct the subset of the?le system necessary to replay a given section of the trace. We call these sub-snapshots.In essence,our method is to replay the trace several times,inferring knowledge about the underlying?le system by observing how it is used.

The?rst pass reconstructs as much as it can of the ?le system hierarchy,primarily by observing the param-eters and responses from lookup,getattr,create, mkdir,rename,remove,and link calls.The idea of discovering the?le system hierarchy by snooping NFS calls is not new and has been in widespread use since the technique was described by Blaze[3].Unfortunately,as other researchers have noted,this method is imperfect–some of the information may be absent from the trace because of missed packets or because it is cached on the client during the trace period and thus never visible in the trace.To compensate for this missing data,we keep track of each?le or directory that is accessed during the trace, but whose metadata we cannot infer.When the?rst pass

is?nished,we may either?ll in the missing values with reasonable defaults or discard the incomplete items.

Because we are using attribute-based models,we can-not simply invent?le attributes and hope that they will work.However,there is a danger that if we discard all the objects for which we have incomplete information, we may lose a signi?cant portion of workload.For the experiment described in this section,we use only name attributes.After examining the traces we cannot?nd names for fewer than than5%of the?les mentioned in the workload(and typically much less).Therefore we believe that discarding these“anonymous?les”does not alter the workload to an important degree.

Files or directories for which we cannot infer the par-ent are attached to the root directory,because from our own experiments we have found that this is the direc-tory most likely to be cached on the client.For example, we rarely see lookups for/home/username,because home directories are frequently accessed and rarely in-validated.

The output of the?rst pass is a table of pathnames of each?le and directory observed in the trace along with a unique identi?er for each object,and the size,mode,and other relevant information necessary to reconstruct the object.The purpose of the new identi?er is to provide

a convenient substitute for the?le handle that is inde-

pendent of the actual implementation of the?le system.

(File handles usually encode the mount point and inode numbers,and we cannot ensure that we will get the same values when we reconstruct the?le system.)

The second pass through the trace replaces all of the ?le handles in the trace with the unique identi?ers cre-ated in the?rst pass,and removes references to?les for which no information could be inferred.

Based on the table created after the?rst pass,we then create a?le system that matches the rewritten trace,and replay the new trace on that?le system.The result is both realistic and repeatable.

Using this method,we constructed several sub-snapshots for each workload.A typical hour of ac-tivity on these systems accesses?les containing only ?ve to ten GB of data(although there are hours when many directories are scanned,resulting in enormous and unwieldy sub-snapshots).One of the challenges with DEAS03and EECS03is that there are apparently some jobs that periodically scan large parts of the directory hierarchy,checking the modi?cation time of each?le.

Since most of these?les are never actually read or writ-ten,we could modify our sub-snapshot builder to recog-nize this and treat these?les differently(only creating a short or empty?le,instead of a?le the same size as the 11

original).This would permit us to create sub-snapshots for a much larger fraction of the underlying?le system.

6.2Increasing Locality of Reference

As an example application,we explore the use of attribute-based hints to control the locality of block ref-erence by anticipating which blocks are likely to be hot and grouping them in the same cylinder.

We use two methods to identify hot data blocks. The?rst method,which we call HotName,automatically classi?es as hot any?le that we predict will be short-lived and/or zero-length.For this type of?le,the overhead of creating and maintaining the inode and name the?le (i.e.,the directory entry for the?le)can be a large frac-tion of the cost incurred by the?le,and therefore there may be bene?t to reducing this overhead.The second method,which we call HotDir,predicts which directo-ries are most likely to contain?les that have the Hot-Name property.Since these directories are where the names for the HotName?les will be entered,there may be bene?t from identifying them as well.

The model that we use for HotDir is constructed via a method similar to ABLE,but unfortunately in our pro-totype requires some external logic because ABLE is fo-cused on?les and does not currently gather as much in-formation about directories.In general,the HotDir rules are that directories identi?ed as home directories,mail spool directories,and directories named Cache are clas-si?ed as hot directories.(ABLE is capable of identifying the mail and Cache directories as interesting,but does not currently have a“is-home-directory”attribute.) To test the effect of HotDir and HotName,we have modi?ed the FreeBSD implementation of FFS so that it uses a simpli?ed predictor(similar in nature to the ABLE predictor,but employing only name attributes, and re-coded to live in the kernel environment)to predict whether each new directory has the HotDir property and whether each new?le has the HotName property.If so, it attempts to allocate blocks for that?le or directory in a designated area of the disk.Our goal is to measure the increase in the number of accesses to this area of the disk when we use policies guided by HotDir and HotName.

We use two systems as our testbed.Both have a 1GHz Pentium III processor,1GB of RAM,and run FreeBSD4.8p3.Our experiments use the FreeBSD im-plementation of FFS with16KB blocks and soft-updates enabled[12].We have instrumented the device driver for the disk so that it keeps a count of how many reads and writes are done on each16KB logical disk block.

Heuristic Ops Reads Writes

Perfect0.85%

HotDir0.22%

HotFile0.00%

HotDir+HotFile0.22%

EECS03

23.89%41.61%

3.09%

4.61%

2.82% 5.00%

5.95%9.65%

Perfect0.76%

HotDir0.58%

HotFile0.00%

HotDir+HotFile0.57%

6.3Results

To test our heuristics,we ran a series of one-hour trace replays for the hours noon-5pm for several days on each of our traces.The models are trained on a Monday (3/24/03for DEAS03and EECS03,10/22/01for CAM-PUS),and the replays are constructed from the following Tuesday through Thursday.Each hour-long replay be-gins with15minutes to warm the cache.Then the block counters are reset,and the test begins in earnest and runs for45minutes of replay time.

We designate a4MB region as the target area for hot objects.Our evaluation examines the distribution of ac-tual accesses to the disk and compares the percentage that go to the target area to the theoretically maximum number of accesses that would go to the hottest4MB region given perfect knowledge(i.e.,if the hottest256 16KB blocks on the disk were allocated in the target re-gion).

As shown in Table4,both heuristics improve local-ity compared to the default layout policy,and using both heuristics is an improvement over using either one alone.

Write locality is increased more than read locality;this is not surprising because directory contents are ing both HotDir and HotName,we manage to increase the number of accesses to two-thirds of that of 12

the hottest possible region on CAMPUS,and on EECS03 nearly6%of all the disk accesses during the trace are now con?ned to the target area.These percentages may seem small,but keep in mind that we are focusing only on small?les and directories,and normal?le traf?c is the dominant cause of disk accesses in these workloads. 7Conclusions

We have shown that the attributes of a?le are strong hints of how that?le will be used.Furthermore,we have exploited these hints to make accurate predictions about the longer-term properties of?les,including the size, read/write ratio,and lifespan.Overall,?le names pro-vide the strongest hints,but using additional attributes can improve prediction accuracy.In some cases,accu-rate predictions are possible without considering names at ing traces from three NFS environments,we have demonstrated how classi?cation trees can predict ?le and directory properties,and that these predictions can be used within an existing?le system.

Our results are encouraging.Contemporary?le sys-tems use hard-coded policies and heuristics based on general assumptions about their workloads.Even the most advanced?le systems do no more than adapt to vio-lations of these assumptions.We have demonstrated how to construct a learning environment that can discover pat-terns in the workload and predict the properties of new ?les.These predictions enable optimization through dy-namic policy selection–instead of reacting to the prop-erties of new?les,the?le system can anticipate these properties.Although we only provide one example?le system optimization(clustering of hot directory data), this proof-of-concept demonstrates the potential for the system-wide deployment of predictive models.

ABLE is a?rst step towards a self-tuning?le system or storage device.Future work involves automation of the entire ABLE process,including sample collection, attribute selection,and model building.Furthermore, since changes in the workload will cause the accuracy of our models to degrade over time,we plan to auto-mate the process of detecting when models are failing (or are simply suboptimal)and retraining.When cata-clysmic changes in the workload occur(e.g.,tax season in an accounting?rm,or September on a college cam-pus),we must learn to detect that such an event has oc-curred and switch to a new(or cached)set of models. We also plan to explore mechanisms to include the cost of different types of mispredictions in our training in or-der to minimize the anticipated total cost of errors,rather than simply trying to minimize the number of errors.

In addition to caching and on-disk layout optimiza-tion,we envision a much larger class of applications that will bene?t from dynamic policy selection.Attribute-based classi?cation of system failures and break-ins(or anomaly detection)is a natural adjunct to this work

(e.g.,“has this?le been compromised?”).Moreover,

through the same clustering techniques implemented by our decision trees,we feel that semantic clustering can be useful for locating information(e.g.,“are these?les related?”).Both of these are areas of future work.

Acknowledgments

Daniel Ellard and Margo Seltzer were sponsored in part by IBM.The CMU researchers thank the mem-bers and companies of the PDL Consortium(including EMC,Hewlett-Packard,Hitachi,IBM,Intel,Microsoft, Network Appliance,Oracle,Panasas,Seagate,Sun,and Veritas)for their interest,insights,feedback,and sup-port.Their work is partially funded by the National Sci-ence Foundation,via grants#CCR-0326453and#CCR-0113660.

References

[1]Sedat Akyurek and Kenneth Salem.Adap-

tive Block puter Systems,

13(2):89–121,1995.

[2]J.Michael Bennett,Michael A.Bauer,and David

Kinchlea.Characteristics of Files in NFS Environ-

ments.In Proceedings of ACM SIGSMALL Sympo-

sium on Small Systems/PC.,pages18–25,Toronto,

Ontario,Canada,1991.

[3]Matthew A.Blaze.NFS Tracing by Passive Net-

work Monitoring.In Proceedings of the USENIX

Winter1992Technical Conference,pages333–343,

San Fransisco,CA,January1992.

[4]Leo Breiman,Jerome H.Friedman,Richard A.Ol-

shen,and Charles J.Stone.Classi?cation and Re-

gression Trees.Chapman and Hall,1984.

[5]Pei Cao,Edward W.Felten,Anna R.Karlin,

and Kai Li.Implementation and Performance

of Integrated Application-Controlled File Caching,

Prefetching,and Disk Scheduling.ACM Transac-

tions on Computer Systems,14(4):311–343,1996.

[6]Rich Caruana and Dayne Freitag.Greedy Attribute

Selection.In International Conference on Machine

Learning,pages28–36,1994.

[7]Ian Dowse and David Malone.Recent Filesystem

Optimisations on FreeBSD.In Proceedings of the 13

USENIX Annual Technical Conference(FREENIX Track),pages245–258,June2002.

[8]Daniel Ellard,Jonathan Ledlie,Pia Malkani,and

Margo Seltzer.Passive NFS Tracing of Email and Research Workloads.In Proceedings of the Second USENIX Conference on File and Storage Technologies(F AST’03),pages203–216,San Fran-cisco,CA,March2003.

[9]Daniel Ellard,Jonathan Ledlie,and Margo Seltzer.

The Utility of File Names.Technical Report TR-05-03,Harvard University Division of Engineering and Applied Sciences,2003.

[10]Daniel Ellard and Margo Seltzer.New NFS Tracing

Tools and Techniques for System Analysis.In Pro-ceedings of the Seventeenth Annual Large Installa-tion System Administration Conference(LISA’03), pages73–85,San Diego,CA,October2003. [11]Gregory R.Ganger and M.Frans Kaashoek.Em-

bedded Inodes and Explicit Grouping:Exploiting Disk Bandwidth for Small Files.In USENIX An-nual Technical Conference,pages1–17,1997. [12]Gregory R.Ganger,Marshall Kirk McKusick,

Craig A.N.Soules,and Yale N.Patt.Soft Updates:

a Solution to the Metadata Update Problem in File

Systems.ACM Transactions on Computer Systems, 18(2):127–153,2000.

[13]Trevor Hastie,Robert Tibshirani,and Jerome

Friedman.The Elements of Statistical Learning.

Spring-Verlag,2001.

[14]Butler mpson.Hints for Computer System

Design.In ACM Operating Systems Review,vol-ume15(5),pages33–48,October1983.

[15]Tara M.Madhyastha and Daniel A.Reed.In-

put/Output Access Pattern Classi?cation Using Hidden markov Models.In Proceedings of IOPAF, pages57–67,San Jose,CA,December1997. [16]James T.McClave,Frank H.Dietrich II,and Terry

Sincich.Statistics.Prentice Hall,1997.

[17]Marshall K.McKusick,William N.Joy,Samuel J.

Lef?er,and Robert S.Fabry.A Fast File System for puter Systems,2(3):181–197,1984. [18]Michael P.Mesnier,Gregory R.Ganger,and Erik

Riedel.Object-Based Storage.ACM Communica-tions Magazine,41(8):84–90,2003.

[19]Rodney Van Meter.Observing the Effects of Multi-

Zone Disks.In Proceedings of the Usenix Technical Conference,January1997.

[20]Tom M.Mitchell.Machine Learning.McGraw-

Hill,1997.

[21]Todd C.Mowry,Angela K.Demke,and Or-

ran Krieger.Automatic Compiler-Inserted I/O Prefetching for Out-Of-Core Applications.In Pro-

ceedings of the1996Symposium on Operating

Systems Design and Implementation,pages3–17.

USENIX Association,1996.

[22]Sape Mullender and Andrew Tanenbaum.Immedi-

ate Files.In Software–Practice and Experience,

number4in14,April1984.

[23]Keith Muller and Joseph Pasquale.A High Per-

formance Multi-Structured File System Design.In

Proceedings of the13th ACM Symposium on Oper-

ating Systems Principles(SOSP-91),pages56–67,

Asilomar,Paci?c Grove,CA,October1991.

[24]David A.Patterson,Garth Gibson,and Randy H.

Katz.Case for Redundant Arrays of Inexpensive

Disks(RAID).In In Proceedings of the ACM

Conference on Management of Data(SIGMOD),

Chicago,IL,June1988.

[25]R.Hugo Patterson,Garth A.Gibson,Eka Gint-

ing,Daniel Stodolsky,and Jim rmed

Prefetching and Caching.In ACM SOSP Proceed-

ings,1995.

[26]Mendel Rosenblum and John K.Ousterhout.The

Design and Implementation of a Log-Structured

File System.ACM Transactions on Computer Sys-

tems,10(1):26–52,1992.

[27]Yasushi Saito,Christos Karamanolis,Magnus

Karlsson,and Mallik Mahalingam.Taming Ag-

gressive Replication in the Pangaea Wide-Area File

System.In Proceedings of the5th Symposium

on Operating Systems Design and Implementation

(OSDI).,Boston,MA,December2002.

[28]Keith A.Smith and Margo I.Seltzer.File Sys-

tem Aging-Increasing the Relevance of File Sys-

tem Benchmarks.In Proceedings of SIGMETRICS

1997:Measurement and Modeling of Computer

Systems,pages203–213,Seattle,W A,June1997.

[29]David A.Solomon and Mark E.Russinovich.In-

side Microsoft Windows2000,Third Edition.Mi-

crosoft Press,2000.

[30]Carl Hudson Staelin.High Performance File Sys-

tem Design.Technical Report TR-347-91,Prince-

ton University,1991.

[31]John Wilkes,Richard Golding,Carl Staelin,and

Tim Sullivan.The HP AutoRAID Hierarchical

Storage System.In High Performance Mass Stor-

age and Parallel I/O:Technologies and Applica-

tions,pages90–106.IEEE Computer Society Press

and Wiley,2001.

[32]Zhihui Zhang and Kanad Ghose.yFS:A Journaling

File System Design for Handling Large Data Sets

with Reduced Seeking.In Proceedings of the Sec-

ond USENIX Conference on File and Storage Tech-

nologies(F AST’03),pages59–72,San Francisco,

CA,March2003.

本文来源：https://www.bwwdw.com/article/6plj.html

相关文章：

南开17春秋学期《无线网络原理与应用》在线作业03-01

4M变更管理项目一览表08-14

中国科学引文数据库来源期刊列表(2011-2012年)04-12

2019年有关宿舍文化节策划书07-28

危急值管理与处理流程03-14

1.JR 3810使用说明书 - 图文11-23

读后感303-28

审计业务约定书-范本05-04

第十二章国际市场产品策略08-15

上一篇：人称指示语的非常规用法下一篇：山东电网销售电价表2011.12.01