
University of Wisconsin-Madison Computer Sciences Department Technical Report #1342, June, 1997.

The SimpleScalar Tool Set, Version 2.0

Doug Burger*

Computer Sciences Department
University of Wisconsin-Madison
1210 West Dayton Street
Madison, Wisconsin 53706 USA

Todd M. Austin

MicroComputer Research Labs, JF3-359
Intel Corporation, 2111 NE 25th Avenue
Hillsboro, OR 97124 USA

*Contact: dburger@cs.wisc.edu

http://www.cs.wisc.edu/~mscalar/simplescalar.html

This report describes release 2.0 of the SimpleScalar tool set, a suite of free, publicly available simulation tools that offer both detailed and high-performance simulation of modern microprocessors. The new release offers more tools and capabilities, precompiled binaries, cleaner interfaces, better documentation, easier installation, improved portability, and higher performance. This report contains a complete description of the tool set, including retrieval and installation instructions, a description of how to use the tools, a description of the target SimpleScalar architecture, and many details about the internals of the tools and how to customize them. With this guide, the tool set can be brought up and generating results in under an hour (on supported platforms).

1 Overview

Modern processors are incredibly complex marvels of engineering that are becoming increasingly hard to evaluate. This report describes the SimpleScalar tool set (release 2.0), which performs fast, flexible, and accurate simulation of modern processors that implement the SimpleScalar architecture (a close derivative of the MIPS architecture [4]). The tool set takes binaries compiled for the SimpleScalar architecture and simulates their execution on one of several provided processor simulators. We provide sets of precompiled binaries (including SPEC95), plus a modified version of GNU GCC (with associated utilities) that allows you to compile your own SimpleScalar test binaries from FORTRAN or C code.

The advantages of the SimpleScalar tools are high flexibility, portability, extensibility, and performance. We include five execution-driven processor simulators in the release. They range from an extremely fast functional simulator to a detailed, out-of-order issue, superscalar processor simulator that supports non-blocking caches and speculative execution.

The tool set is portable, requiring only that the GNU tools can be installed on the host system. The tool set has been tested extensively on many platforms (listed in Section 2). The tool set is easily extensible. We designed the instruction set to support easy annotation of instructions, without requiring a retargeted compiler for incremental changes. The instruction definition method, along with the ported GNU tools, makes new simulators easy to write, and the old ones even simpler to extend. Finally, the simulators have been aggressively tuned for performance, and can run codes approaching “real” sizes in tractable amounts of time. On a 200-MHz Pentium Pro, the fastest, least detailed simulator simulates about four million machine cycles per second, whereas the most detailed processor simulator simulates about 150,000 per second.

The current release (version 2.0) of the tools is a major improvement over the previous release. Compared to version 1.0 [2], this release includes better documentation, enhanced performance, compatibility with more platforms, precompiled SPEC95 SimpleScalar binaries, cleaner interfaces, two new processor simulators, option and statistic management packages, a source-level debugger (DLite!) and a tool to trace the out-of-order pipeline.

The rest of this document contains information about obtaining, installing, running, using, and modifying the tool set. In Section 2 we provide a detailed procedure for downloading the release, installing it, and getting it up and running. In Section 3, we describe the SimpleScalar architecture and details about the target (simulated) system. In Section 4, we describe the SimpleScalar processor simulators and discuss their internal workings. In Section 5, we describe two tools that enhance the utility of the tool set: a pipeline tracer and a source-level debugger (for stepping through the program being simulated). In Section 6, we provide the history of the tools’ development, describe current and planned efforts to extend the tool set, and conclude. Appendix A and Appendix B contain detailed definitions of the SimpleScalar instructions and system calls, respectively.

This work was initially supported by NSF Grants CCR-9303030, CCR-9509589, and MIP-9505853, ONR Grant N00014-93-1-0465, a donation from Intel Corp., and by U.S. Army Intelligence Center and Fort Huachuca under Contract DABT63-95-C-0127 and ARPA order no. D346. The current support for this work comes from a variety of sources, to all of which we are indebted.

2 Installation and Use

The only restrictions on using and distributing the tool set are that (1) the copyright notice must accompany all re-releases of the tool set, and (2) third parties (i.e., you) are forbidden to place any additional distribution restrictions on extensions to the tool set that you release. The copyright notice can be found in the distribution directory as well as at the head of all simulator source files. We have included the copyright here as well:

This tool set is distributed “as is” in the hope that it will be useful. The tool set comes with no warranty, and no author or distributor accepts any responsibility for the consequences of its use.

Everyone is granted permission to copy, modify and redistribute this tool set under the following conditions:

- This tool set is distributed for non-commercial use only. Please contact the maintainer for restrictions applying to commercial use of these tools.
- Permission is granted to anyone to make or distribute copies of this tool set, either as received or modified, in any medium, provided that all copyright notices, permission and nonwarranty notices are preserved, and that the distributor grants the recipient permission for further redistribution as permitted by this document.
- Permission is granted to distribute these tools in compiled or executable form under the same conditions that apply for source code, provided that either: (1) it is accompanied by the corresponding machine-readable source code, or (2) it is accompanied by a written offer, with no time limit, to give anyone a machine-readable copy of the corresponding source code in return for reimbursement of the cost of distribution. This written offer must permit verbatim duplication by anyone, or (3) it is distributed by someone who received only the executable form, and is accompanied by a copy of the written offer of source code that they received concurrently.

In other words, you are welcome to use, share and improve these tools. You are forbidden to forbid anyone else to use, share and improve what you give them.

2.1 Obtaining the tools

The tools can be obtained either through the World Wide Web or by conventional ftp. For example, to get the file simplesim.tar.gz via the WWW, enter the URL:

ftp://ftp.cs.wisc.edu/sohi/Code/simplescalar/simplesim.tar

and to obtain the same file with traditional ftp:

ftp ftp.cs.wisc.edu
user: anonymous
password: enter your e-mail address here
cd sohi/Code/simplescalar
get simplesim.tar

Note the “tar.gz” suffix: by requesting the file without the “.gz” suffix, the ftp server uncompresses it automatically. To get the compressed version, simply request the file with the “.gz” suffix. The five distribution files in the directory (which are symbolic links to the files containing the latest version of the tools) are:

- simplesim.tar.gz - contains the simulator sources, the instruction set definition macros, and test program source and binaries. The directory is 1 MB compressed and 4 MB uncompressed. When the simulators are built, the directory (including object files) will require 11 MB. This file is required for installation of the tool set.
- simpleutils.tar.gz - contains the GNU binutils source (version 2.5.2), retargeted to the SimpleScalar architecture. These utilities are not required to run the simulators themselves, but are required to compile your own SimpleScalar benchmark binaries (e.g., test programs other than the ones we provide). The compressed file is 3 MB, the uncompressed file is 14 MB, and the build requires 52 MB.
- simpletools.tar.gz - contains the retargeted GNU compiler and library sources needed to build SimpleScalar benchmark binaries (GCC 2.6.3, glibc 1.0.9, and f2c), as well as pre-built big- and little-endian versions of libc. This file is needed only to build benchmarks, not to compile or run the simulators. The tools are 11 MB compressed, 47 MB uncompressed, and the full installation requires 70 MB.
- simplebench.big.tar.gz - contains a set of the SPEC95 benchmark binaries, compiled to the SimpleScalar architecture running on a big-endian host. The binaries take under 5 MB compressed, and are 29 MB when uncompressed.
- simplebench.little.tar.gz - same as above, except that the binaries were compiled to the SimpleScalar architecture running on a little-endian host.

Once you have selected the appropriate files, place the downloaded files into the desired target directory. If you obtained the files with the “.gz” suffix, run the GNU decompress utility (gunzip). The files should now have a “.tar” suffix. To extract the directories from the archive:

tar xf filename.tar

If you download and unpack all the release files, you should have the following subdirectories with the following contents:

- simplesim-2.0 - the sources of the SimpleScalar processor simulators, supporting scripts, and small test benchmarks. It also holds precompiled binaries of the test benchmarks.
- binutils-2.5.2 - the GNU binary utilities code, ported to the SimpleScalar architecture.
- ssbig-na-sstrix - the root directory for the tree in which the big-endian SimpleScalar binary utilities and compiler tools will be installed. The unpacked directories contain header files, a precompiled copy of libc, and a necessary object file.
- sslittle-na-sstrix - same as above, except that this directory holds the little-endian versions of the SimpleScalar utilities.
- gcc-2.6.3 - the GNU C compiler code, targeted toward the SimpleScalar architecture.
- glibc-1.09 - the GNU libraries code, ported to the SimpleScalar architecture.
- f2c-1994.09.27 - the 1994 release of AT&T Bell Labs’ FORTRAN to C translator code.
- spec95-big - precompiled SimpleScalar SPEC95 benchmark binaries (big-endian version).
- spec95-little - precompiled SimpleScalar SPEC95 benchmark binaries (little-endian version).

2.2 Installing and running SimpleScalar

We depict a graphical overview of the tool set in Figure 1. Benchmarks written in FORTRAN are converted to C using Bell Labs’ f2c converter. Both benchmarks written in C and those converted from FORTRAN are compiled using the SimpleScalar version of GCC, which generates SimpleScalar assembly. The SimpleScalar assembler and loader, along with the necessary ported libraries, produce SimpleScalar executables that can then be fed directly into one of the provided simulators. (The simulators themselves are compiled with the host platform’s native compiler; any ANSI C compiler will do.)

Figure 1. SimpleScalar tool set overview

If you use the precompiled SPEC95 binaries or the precompiled test programs, all you have to install is the simulator source itself. If you wish to compile your own benchmarks, you will have to install and build the GCC tree and optionally (recommended) the GNU binutils. If you wish to modify the support libraries, you will have to install, modify, and build the glibc source as well.

The SimpleScalar architecture, like the MIPS architecture [4], supports both big-endian and little-endian executables. The tool set supports compilation for either of these targets; the names for the big-endian and little-endian architectures are ssbig-na-sstrix and sslittle-na-sstrix, respectively. You should use the target endianness that matches your host platform; the simulators may not work correctly if you force the compiler to provide cross-endian support. To determine which endianness your host uses, run the endian program located in the simplesim-2.0/ directory. For simplicity, the following instructions will assume a big-endian installation. In the following instructions, we will refer to the directory in which you are installing SimpleScalar as $IDIR/.
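The choice of target name from the host byte order can be sketched in a few lines of Python. This is not the endian program shipped in simplesim-2.0/; it is a minimal illustration that assumes Python's sys.byteorder reports the same endianness that program would, and the function name simplescalar_target is made up for the example.

```python
import sys

# Target names as given in the text for the two supported byte orders.
TARGETS = {"big": "ssbig-na-sstrix", "little": "sslittle-na-sstrix"}


def simplescalar_target(byteorder: str = sys.byteorder) -> str:
    """Return the SimpleScalar target name that matches a host byte order."""
    if byteorder not in TARGETS:
        raise ValueError(f"unknown byte order: {byteorder!r}")
    return TARGETS[byteorder]


if __name__ == "__main__":
    # Report the host byte order and the matching --target value.
    print(f"host is {sys.byteorder}-endian; use --target={simplescalar_target()}")
```

The returned string is the value that would be passed to the --target option of the configure commands in the installation steps.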

The simulators come equipped with their own loader, and thus you do not need to build the GNU binary utilities to run simulations. However, many of these utilities are useful, and we recommend that you install them. If desired, build the GNU binary utilities (see footnote 1):

cd $IDIR/binutils-2.5.2
configure --host=$HOST --target=ssbig-na-sstrix --with-gnu-as --with-gnu-ld --prefix=$IDIR
make
make install

$HOST here is a “canonical configuration” string that represents your host architecture and system (CPU-COMPANY-SYSTEM). The string for a Sparcstation running SunOS would be sparc-sun-sunos4.1.3, running Solaris: sparc-sun-solaris2, a 386 running Solaris: i386-sun-solaris2.4, etc. A complete list of supported $HOST strings resides in $IDIR/gcc-2.6.3/INSTALL.

This installation will create the needed directories in $IDIR (these include bin/, lib/, include/, and man/). Once the binutils have been built, build the simulators themselves. This is necessary to do before building GCC, since one of the binaries is needed for the cross-compiler build. You should edit $IDIR/simplesim-2.0/Makefile to use the desired compile flags (e.g., the correct optimization level). To use the GNU BFD loader instead of the custom loader in the simulators, uncomment -DBFD_LOADER in the Makefile. To build the simulators:

cd $IDIR/simplesim-2.0
make

If desired, build the compiler:

cd $IDIR/gcc-2.6.3
configure --host=$HOST --target=ssbig-na-sstrix --with-gnu-as --with-gnu-ld --prefix=$IDIR
make LANGUAGES=c
../simplesim-2.0/sim-safe ./enquire -f >! float.h-cross
make install

1. You must have GNU Make to do the majority of installations described in this document. To check if you have the GNU version, execute “make -v” or “gmake -v”. The GNU version understands this switch and displays version information.

We provide pre-built copies of the necessary libraries in ssbig-na-sstrix/lib/, so you do not need to build the code in glibc-1.09, unless you change the library code. Building these libraries is tricky, and we do not recommend it unless you have a specific need to do so. In that event, to build the libraries:

cd $IDIR/glibc-1.09
configure --prefix=$IDIR/ssbig-na-sstrix ssbig-na-sstrix
setenv CC $IDIR/bin/ssbig-na-sstrix-gcc
unsetenv TZ
unsetenv MACHINE
make
make install

Note that you must have already built the SimpleScalar simulators to build this library, since the glibc build requires a compiled simulator to test target machine-specific parameters such as endianness.

If you have FORTRAN benchmarks, you will need to build f2c:

cd $IDIR/f2c-1994.09.27
make
make install

The entire tool set should now be ready for use. We provide precompiled test binaries (big- and little-endian) and their sources in $IDIR/simplesim-2.0/tests. To run a test:

cd $IDIR/simplesim-2.0
sim-safe tests/bin.big/test-math

The test should generate about a page of output, and will run very quickly. The release has been ported to, and should run on, the following systems:

- gcc/AIX 413/RS6000
- xlc/AIX 413/RS6000
- gcc/HPUX/PA-RISC
- gcc/SunOS 4.1.3/SPARC
- gcc/Linux 1.3/x86
- gcc/Solaris 2/SPARC
- gcc/Solaris 2/x86
- gcc/DEC Unix 3.2/Alpha
- c89/DEC Unix 3.2/Alpha
- gcc/FreeBSD 2.2/x86
- gcc/WindowsNT/x86

3 The SimpleScalar architecture

The SimpleScalar architecture is derived from the MIPS-IV ISA [4]. The tool suite defines both little-endian and big-endian versions of the architecture to improve portability (the version used on a given host machine is the one that matches the endianness of the host). The semantics of the SimpleScalar ISA are a superset of MIPS with the following notable differences and additions:

- There are no architected delay slots: loads, stores, and control transfers do not execute the succeeding instruction.
- Loads and stores support two addressing modes, for all data types, in addition to those found in the MIPS architecture. These are: indexed (register+register), and auto-increment/decrement.
- A square-root instruction, which implements both single- and double-precision floating point square roots.
- An extended 64-bit instruction encoding.

We list all SimpleScalar instructions in Figure 2. We provide a complete list of the instruction semantics (as implemented in the simulator) in Appendix A. In Table 1, we list the architected registers in the SimpleScalar architecture, their hardware and software names (which are recognized by the assembler), and a description of each. Both the number and the semantics of the registers are identical to those in the MIPS-IV ISA.

In Figure 3, we depict the three instruction encodings of SimpleScalar instructions: register, immediate, and jump formats. All instructions are 64 bits in length.

The register format is used for computational instructions. The immediate format supports the inclusion of a 16-bit constant. The jump format supports specification of 24-bit jump targets. The register fields are all 8 bits, to support extension of the architected registers to 256 integer and floating point registers. Each instruction format has a fixed-location, 16-bit opcode field that facilitates fast instruction decoding.

The annote field is a 16-bit field that can be modified post-compile, with annotations to instructions in the assembly files. The annotation interface is useful for synthesizing new instructions without having to change and recompile the assembler. Annotations are attached to the opcode, and come in two flavors: bit and field annotations. A bit annotation is written as follows:

lw/a $r6,4($r7)

The annotation in this example is /a. It specifies that the first bit of the annotation field should be set. Bit annotations /a through /p set bits 0 through 15, respectively. Field annotations are written in the form:

lw/6:4(7) $r6,4($r7)

This annotation sets the specified 3-bit field (from bit 4 to bit 6 within the 16-bit annotation field) to the value 7.
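As a rough illustration of the two annotation forms, the following Python sketch computes the 16-bit annote value that a bit annotation (such as /a) or a field annotation (such as /6:4(7)) would produce. It assumes that bit 0 is the least significant bit of the field; the function names are invented for this example and do not come from the assembler.

```python
import re

ANNOTE_BITS = 16  # width of the annote field, as stated in the text


def bit_annotation(letter: str) -> int:
    """Mask for a bit annotation: 'a' sets bit 0, ..., 'p' sets bit 15."""
    if len(letter) != 1 or not "a" <= letter <= "p":
        raise ValueError(f"bit annotation must be one letter a..p: {letter!r}")
    return 1 << (ord(letter) - ord("a"))


def field_annotation(high: int, low: int, value: int) -> int:
    """Place value into bits low..high (inclusive) of the annote field."""
    if not 0 <= low <= high < ANNOTE_BITS:
        raise ValueError("field bounds must satisfy 0 <= low <= high <= 15")
    width = high - low + 1
    if not 0 <= value < (1 << width):
        raise ValueError(f"value {value} does not fit in {width} bits")
    return value << low


def parse_annotation(suffix: str) -> int:
    """Parse the text after the slash, e.g. 'a' or '6:4(7)'."""
    match = re.fullmatch(r"(\d+):(\d+)\((\d+)\)", suffix)
    if match:
        high, low, value = (int(group) for group in match.groups())
        return field_annotation(high, low, value)
    return bit_annotation(suffix)
```

With these definitions, parse_annotation("6:4(7)") places the value 7 in bits 4 through 6, matching the field example in the text.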

System calls in SimpleScalar are managed by a proxy handler (located in syscall.c) that intercepts system calls made by the simulated binary, decodes the system call, copies the system call arguments, makes the corresponding call to the host’s operating system, and then copies the results of the call into the simulated program’s memory. If you are porting SimpleScalar to a new platform, you will have to code the system call translation from SimpleScalar to your host machine in syscall.c. A list of all SimpleScalar system calls is provided in Appendix B.
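The proxy pattern described here (decode, copy arguments out, call the host, copy results back) can be sketched as follows. Everything in this block is hypothetical and for illustration only: the syscall number SYS_READ_DEMO, the register names, and the SimulatedMachine class are assumptions, not the contents of syscall.c, and the real system call numbers are those listed in Appendix B.

```python
import io

# Hypothetical syscall number, chosen for this illustration only.
SYS_READ_DEMO = 3


class SimulatedMachine:
    """Toy stand-in for simulated state: a flat memory and a few registers."""

    def __init__(self, mem_size: int = 256) -> None:
        self.memory = bytearray(mem_size)
        # Assumed convention: v0 holds the call number and the result,
        # a0..a2 hold the arguments.
        self.regs = {"v0": 0, "a0": 0, "a1": 0, "a2": 0}


def proxy_syscall(machine: SimulatedMachine, host_files: dict) -> None:
    """Decode a simulated read call, run it on the host, copy results back."""
    number = machine.regs["v0"]
    if number != SYS_READ_DEMO:
        raise NotImplementedError(f"no translation for syscall {number}")
    # Copy the arguments out of the simulated register state.
    fd, addr, length = (machine.regs[name] for name in ("a0", "a1", "a2"))
    if addr < 0 or length < 0 or addr + length > len(machine.memory):
        raise ValueError("buffer lies outside simulated memory")
    # Make the corresponding call on the host side.
    data = host_files[fd].read(length)
    # Copy the result into the simulated program's memory and registers.
    machine.memory[addr:addr + len(data)] = data
    machine.regs["v0"] = len(data)
```

Porting to a new host would amount to replacing the host-side call in the middle of the function while keeping the copy-in and copy-out steps around it.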

SimpleScalar uses a 31-bit address space, and its virtual memory is laid out as follows:

0x00000000    Unused
0x00400000    Start of text segment
0x10000000    Start of data segment
0x7fffc000    Stack base (grows down)

The top of the data segment (which includes init and bss) is held in mem_brk_point. The areas below the text segment and above the stack base are unused.
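A small Python sketch can make the layout concrete by classifying an address against the boundaries above. It is a simplification under stated assumptions: the text region is taken to run from the text base up to the data base (the sketch does not track where the text actually ends), the stack base address itself is counted as unused, and the brk_point and stack_pointer parameters are hypothetical stand-ins for simulator state.

```python
TEXT_BASE = 0x00400000      # start of text segment
DATA_BASE = 0x10000000      # start of data segment
STACK_BASE = 0x7FFFC000     # stack base; the stack grows down from here
ADDRESS_LIMIT = 1 << 31     # 31-bit address space


def classify(addr: int, brk_point: int, stack_pointer: int) -> str:
    """Name the region of the simulated address space that holds addr."""
    if not 0 <= addr < ADDRESS_LIMIT:
        raise ValueError(f"address {addr:#x} is outside the 31-bit space")
    if addr < TEXT_BASE:
        return "unused"
    if addr < DATA_BASE:
        return "text"
    if addr < brk_point:
        return "data"
    if addr >= STACK_BASE:
        return "unused"
    if addr >= stack_pointer:
        return "stack"
    return "unmapped"  # gap between the top of data and the stack


if __name__ == "__main__":
    # Example values for the data top and the current stack pointer.
    print(classify(0x10000010, brk_point=0x10001000, stack_pointer=0x7FFF0000))
```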

4 Simulator internals

In this section, we describe the functionality of the processor simulators that accompany the tool set. We describe each of the simulators, their functionality, command-line arguments, and internal structures.

The compiler outputs binaries that are compatible with the MIPS ECOFF object format. Library calls are handled with the ported version of GNU GLIBC and POSIX-compliant Unix system calls. The simulators currently execute only user-level code. All SimpleScalar-related extensions to GCC are contained in the config/ss subdirectory of the GCC source tree that comes with the distribution.

Control
j - jump
jal - jump and link
jr - jump register
jalr - jump and link register
beq - branch == 0
bne - branch != 0
blez - branch <= 0
bgtz - branch > 0
bltz - branch < 0
bgez - branch >= 0
bct - branch FCC TRUE
bcf - branch FCC FALSE

Load/Store
lb - load byte
lbu - load byte unsigned
lh - load half (short)
lhu - load half (short) unsigned
lw - load word
dlw - load double word
l.s - load single-precision FP
l.d - load double-precision FP
sb - store byte
sbu - store byte unsigned
sh - store half (short)
shu - store half (short) unsigned
sw - store word
dsw - store double word
s.s - store single-precision FP
s.d - store double-precision FP
addressing modes:
(C)
(reg+C) (with pre/post inc/dec)
(reg+reg) (with pre/post inc/dec)

Integer Arithmetic
add - integer add
addu - integer add unsigned
sub - integer subtract
subu - integer subtract unsigned
mult - integer multiply
multu - integer multiply unsigned
div - integer divide
divu - integer divide unsigned
and - logical AND
or - logical OR
xor - logical XOR
nor - logical NOR
sll - shift left logical
srl - shift right logical
sra - shift right arithmetic
slt - set less than
sltu - set less than unsigned

Floating Point Arithmetic
add.s - single-precision (SP) add
add.d - double-precision (DP) add
sub.s - SP subtract
sub.d - DP subtract
mult.s - SP multiply
mult.d - DP multiply
div.s - SP divide
div.d - DP divide
abs.s - SP absolute value
abs.d - DP absolute value
neg.s - SP negation
neg.d - DP negation
sqrt.s - SP square root
sqrt.d - DP square root
cvt - int., single, double conversion
c.s - SP compare
c.d - DP compare

Miscellaneous
nop - no operation
syscall - system call
break - declare program error

Figure 2. Summary of SimpleScalar instructions

Register format: 16-annote | 16-opcode | 8-rs | 8-rt | 8-rd | 8-ru/shamt
Immediate format: 16-annote | 16-opcode | 8-rs | 8-rt | 16-imm
Jump format: 16-annote | 16-opcode | 6-unused | 26-target

Figure 3. SimpleScalar architecture instruction formats
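To make the 64-bit register and immediate formats of Figure 3 concrete, here is a minimal Python sketch that packs the fields into one integer. It assumes the fields are laid out most-significant first in the order annote, opcode, then the register or immediate fields; that ordering and the opcode values used in any call are illustrative assumptions, not the encoding defined in ss.def.

```python
def _pack(fields) -> int:
    """Concatenate (value, width) pairs, first pair in the highest bits."""
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def encode_register(annote: int, opcode: int, rs: int, rt: int,
                    rd: int, shamt: int = 0) -> int:
    """Register format: 16-bit annote and opcode, four 8-bit fields."""
    return _pack(((annote, 16), (opcode, 16),
                  (rs, 8), (rt, 8), (rd, 8), (shamt, 8)))


def encode_immediate(annote: int, opcode: int, rs: int, rt: int,
                     imm: int) -> int:
    """Immediate format: the 16-bit constant is stored in two's complement."""
    return _pack(((annote, 16), (opcode, 16),
                  (rs, 8), (rt, 8), (imm & 0xFFFF, 16)))


def decode_opcode(word: int) -> int:
    """Extract the fixed-location 16-bit opcode field."""
    return (word >> 32) & 0xFFFF
```

Because the opcode sits at the same position in every format, decode_opcode needs no knowledge of which format a word uses, which is the property the text credits for fast decoding.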

The architecture is defined in ss.def, which contains a macro definition for each instruction in the instruction set. Each macro defines the opcode, name, flags, operand sources and destinations, and actions to be taken for a particular instruction. The instruction actions (which appear as macros) that are common to all simulators are defined in ss.h. Those actions that require different implementations in different simulators are defined in each simulator code file.

When running a simulator, main() (defined in main.c) does all the initialization and loads the target binary into memory. The routine then calls sim_main(), which is simulator-specific, defined in each simulator code file. sim_main() predecodes the entire text segment for faster simulation, and then begins simulation from the target program entry point.

The following command-line arguments are available in all simulators included with the release:

-h                   prints the simulator help message.
-d                   turn on the debug message.
-i                   start execution in the DLite! debugger (see Section 5.2). This option is not supported in the sim-fast simulator.
-q                   terminate immediately (for use with -dumpconfig).
-dumpconfig <file>   generate a configuration file saving the command-line parameters. Comments are permitted in the config files, and begin with a #.
-config <file>       read in and use a configuration file. These files may reference other config files.

4.1 Functional simulation

The fastest, least detailed simulator (sim-fast) resides in sim-fast.c. sim-fast does no time accounting, only functional simulation: it executes each instruction serially, simulating no instructions in parallel. sim-fast is optimized for raw speed; it assumes no cache, performs no instruction checking, and has no support for DLite!.

A separate version of sim-fast, called sim-safe, also performs functional simulation, but checks for correct alignment and access permissions for each memory reference. Although similar, sim-fast and sim-safe are split (i.e., protection is not toggled with a command-line argument in a merged simulator) to maximize performance. Neither of the simulators accepts any additional command-line arguments. Both versions are very simple, at less than 300 lines of code, so they make good starting points for understanding the internal workings of the simulators. In addition to the simulator file, both sim-fast and sim-safe use the following code files (not including header files): main.c, syscall.c, memory.c, regs.c, loader.c, ss.c, endian.c, and misc.c. sim-safe also uses dlite.c.

4.2 Cache simulation

The SimpleScalar distribution comes with two functional cache simulators: sim-cache and sim-cheetah. Both use the file cache.c, and they use sim-cache.c and sim-cheetah.c, respectively. These simulators are ideal for fast simulation of caches if the effect of cache performance on execution time is not needed.

sim-cache accepts the following arguments, in addition to the universal arguments described in Section 4:

-cache:dl1 <config>   configures a level-one data cache.
-cache:dl2 <config>   configures a level-two data cache.
-cache:il1 <config>   configures a level-one instr. cache.
-cache:il2 <config>   configures a level-two instr. cache.
-tlb:dtlb <config>    configures the data TLB.
-tlb:itlb <config>    configures the instruction TLB.
-flush <boolean>      flush all caches on a system call.
-icompress            remap SimpleScalar’s 64-bit instructions to a 32-bit equivalent in the simulation (i.e., model a machine with 4-word instructions).
-pcstat <stat>        generate a text-based profile, as described in Section 4.3.

The cache configuration (<config>) is formatted as follows:

<name>:<nsets>:<bsize>:<assoc>:<repl>

Each of these fields has the following meaning:

<name>    cache name, must be unique.
<nsets>   number of sets in the cache.
<bsize>   block size (for TLBs, use the page size).
<assoc>   associativity of the cache (power of two).
<repl>    replacement policy (l | f | r), where l = LRU, f = FIFO, r = random replacement.

The cache size is therefore the product of <nsets>, <bsize>, and <assoc>. To have a unified level in the hierarchy, “point” the instruction cache to the name of the data cache in the corresponding level, as in the following example:

-cache:il1 il1:128:64:1:l
-cache:il2 dl2
-cache:dl1 dl1:256:32:1:l
-cache:dl2 ul2:1024:64:2:l

The defaults used in sim-cache are as follows:

L1 instruction cache:   il1:256:32:1:l     (8 KB)
L1 data cache:          dl1:256:32:1:l     (8 KB)
L2 unified cache:       ul2:1024:64:4:l    (256 KB)
instruction TLB:        itlb:16:4096:4:l   (64 entries)
data TLB:               dtlb:32:4096:4:l   (128 entries)
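The size arithmetic for sim-cache configuration strings can be checked with a short Python sketch. It assumes the colon-separated field order name, nsets, bsize, assoc, repl seen in strings such as il1:256:32:1:l; the CacheConfig class and parse_cache_config function are names invented for this example, not part of cache.c.

```python
from typing import NamedTuple

# Replacement policy letters, as listed for the <repl> field.
REPLACEMENT = {"l": "LRU", "f": "FIFO", "r": "random"}


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def size_bytes(self) -> int:
        """Cache size: the product of sets, block size, and associativity."""
        return self.nsets * self.bsize * self.assoc

    @property
    def entries(self) -> int:
        """Number of blocks (for a TLB, the number of entries)."""
        return self.nsets * self.assoc


def parse_cache_config(text: str) -> CacheConfig:
    """Parse a '<name>:<nsets>:<bsize>:<assoc>:<repl>' string."""
    parts = text.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 colon-separated fields: {text!r}")
    name, nsets, bsize, assoc, repl = parts
    if repl not in REPLACEMENT:
        raise ValueError(f"unknown replacement policy: {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


if __name__ == "__main__":
    # Print the size of each default cache configuration.
    for spec in ("il1:256:32:1:l", "dl1:256:32:1:l", "ul2:1024:64:4:l"):
        print(spec, parse_cache_config(spec).size_bytes // 1024, "KB")
```

Applied to the default strings, the products reproduce the sizes quoted for the defaults: 8 KB for each level-one cache and 256 KB for the unified level-two cache.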

sim-cheetah is based on work performed by Rabin Sugumar and Santosh Abraham while they were at the University of Michigan. It uses their Cheetah cache simulation engine [6] to generate simulation results for multiple cache configurations with a single simulation. The Cheetah engine simulates fully associative caches efficiently, as well as simulating a sometimes-optimal replacement policy. This policy was called MIN by Belady [1], although the simulator refers to it as opt. Opt uses future knowledge to select a replacement; it chooses the block that will be referenced the furthest in the future (if at all). This policy is optimal for read-only instruction streams. It is not optimal for write-back caches because it may be more expensive to replace a block referenced further in the future if the block must be written back, as opposed to a clean block referenced slightly less far in the future. Horwitz et al. [3] formally described an optimal algorithm that includes writes; however, only MIN is implemented in the simulator.
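The difference between LRU and the MIN (opt) policy can be seen in a naive Python sketch that counts misses for a small fully associative cache. This is not the Cheetah engine's algorithm, which processes many configurations in a single pass; it is a quadratic-time illustration, and the function names are made up for the example.

```python
def next_use(refs, position, block):
    """Index of the next reference to block after position, or infinity."""
    for j in range(position + 1, len(refs)):
        if refs[j] == block:
            return j
    return float("inf")


def count_misses(refs, capacity, policy="opt"):
    """Count misses in a fully associative cache holding `capacity` blocks."""
    cache = []  # under LRU, ordered from least to most recently used
    misses = 0
    for i, block in enumerate(refs):
        if block in cache:
            if policy == "lru":
                cache.remove(block)   # move the hit block to the MRU end
                cache.append(block)
            continue
        misses += 1
        if len(cache) >= capacity:
            if policy == "lru":
                cache.pop(0)          # evict the least recently used block
            else:
                # MIN: evict the block referenced furthest in the future.
                victim = max(cache, key=lambda b: next_use(refs, i, b))
                cache.remove(victim)
        cache.append(block)
    return misses


if __name__ == "__main__":
    trace = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
    print("lru:", count_misses(trace, 3, "lru"))
    print("opt:", count_misses(trace, 3, "opt"))
```

On this read-only trace with three blocks of capacity, the opt policy takes fewer misses than LRU, which is the sense in which MIN uses future knowledge.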

We have included the Cheetah engine as a stand-alone library,which is built and resides in thelibcheetah/ directory.sim-cheetah accepts the following command-line arguments, in addi-tion to those listed at the beginning of Section4:-refs [inst | data | uni ed]

specify which reference stream to analyze.

-C [fa | sa | dm]

fully associative, set associative, or direct-mapped cache.replacement policy.

log base 2 minimum bound on number ofsets to simulate simultaneously.

log base 2 maximum bound on set number.cache line size (in bytes).

maximum associativity to analyze (in logbase 2).

cache size interval to report when simulatingfully associative caches.

maximum cache size of interest.

cache size for direct-mapped analyses.

-pcstat <stat>

where <stat> is the integer counter that youwish to pro le by text address.

To generate the statistics for the pro le, follow the followingexample:

sim-profile -pcstat sim_num_insn test-math >&!

test-math.out

objdump -dl test-math >! test-math.distextprof.pl test-math.dis test-math.out

sim_num_insn_by_pc

-R [lru | opt]-a <sets>-b <sets>-l <line>-n <assoc>-in <interval>-M <size>-C <size>

We show a segment of the text pro le output in Figure4. Makesure that “objdump” is the version created when compiling thebinutils. Also, the rst line oftextprof.pl must be changedto re ect your system’s path to Perl (which must be installed onyour system for you to use this script). As an aside, note that “-taddrprof” is equivalent to “-pcstat sim_num_insn”.

4.4 Out-of-order processor timing simulation

The most complicated and detailed simulator in the distribu-tion, by far, issim-outorder (the main code le for which issim-outorder.c—about 3500 lines long). This simulatorsupports out-of-order issue and execution, based on the RegisterUpdate Unit [5]. The RUU scheme uses a reorder buffer to auto-matically rename registers and hold the results of pendinginstructions. Each cycle the reorder buffer retires completedinstructions in program order to the architected register le.

The processor’s memory system employs a load/store queue.Store values are placed in the queue if the store is speculative.Loads are dispatched to the memory system when the addressesof all previous stores are known. Loads may be satis ed either bythe memory system or by an earlier store value residing in thequeue, if their addresses match. Speculative loads may generatecache misses, but speculative TLB misses stall the pipeline untilthe branch condition is known.

We depict the simulated pipeline ofsim-outorder inFigure5. The main loop of the simulator, located insim_main(), is structured as follows:

ruu_init();for (;;) {

ruu_commit();ruu_writeback();lsq_refresh();ruu_issue();ruu_dispatch();ruu_fetch();}

Both of these simulators are ideal for performing high-levelcache studies that do not take access time of the caches intoaccount (e.g., studies that are concerned only with miss rates). Tomeasure the effect of cache organization upon the execution timeof real programs, however, the timing simulator described inSection4.4 must be used.

4.3 Profiling

The distribution comes with a functional simulator that produces voluminous and varied profile information. sim-profile can generate detailed profiles on instruction classes and addresses, text symbols, memory accesses, branches, and data segment symbols.

sim-profile takes the following command-line arguments, which toggle the various profiling features:

-iclass       instruction class profiling (e.g., ALU, branch).
-iprof        instruction profiling (e.g., bnez, addi).
-brprof       branch class profiling (e.g., direct, calls, conditional).
-amprof       address mode profiling (e.g., displaced, R+R).
-segprof      load/store segment profiling (e.g., data, heap).
-tsymprof     execution profile by text symbol (functions).
-dsymprof     reference profile by data segment symbol.
-taddrprof    execution profile by text address.
-all          turn on all profiling listed above.

Three of the simulators (sim-profile, sim-cache, and sim-outorder) support text segment profiles for statistical integer counters. The supported counters include any added by users, so long as they are correctly “registered” with the SimpleScalar stats package included with the simulator code (see Section 4.5). To use the counter profiles, simply add the command-line flag -pcstat <stat>, which records statistic <stat> by text address.

This loop is executed once for each target (simulated) machine cycle. By walking the pipeline in reverse, inter-stage latch synchronization can be handled correctly with only one pass through each stage. When the target program terminates with an exit() system call, the simulator performs a longjmp() to main() to generate the statistics.
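To illustrate why the reverse walk needs only one pass, the following is a minimal sketch of a toy three-stage pipeline. It is not code from sim-outorder.c: the structure, the single-entry latches, and every toy_* name are assumptions made for illustration. Because each stage drains its input latch before the upstream stage refills it, an instruction advances exactly one stage per simulated cycle.

```c
#include <assert.h>

/* Hypothetical toy model, not taken from sim-outorder.c. Each latch
 * holds one instruction id, or EMPTY when the slot is free. */
#define EMPTY (-1)

struct toy_pipe {
    int fetch_latch;   /* written by fetch, read by execute */
    int exec_latch;    /* written by execute, read by commit */
    int next_id;       /* id of the next instruction to fetch */
    int committed;     /* number of instructions retired so far */
};

static void toy_commit(struct toy_pipe *p)
{
    if (p->exec_latch != EMPTY) {
        p->committed++;
        p->exec_latch = EMPTY;
    }
}

static void toy_execute(struct toy_pipe *p)
{
    if (p->fetch_latch != EMPTY && p->exec_latch == EMPTY) {
        p->exec_latch = p->fetch_latch;
        p->fetch_latch = EMPTY;
    }
}

static void toy_fetch(struct toy_pipe *p)
{
    if (p->fetch_latch == EMPTY)
        p->fetch_latch = p->next_id++;
}

/* One simulated cycle: stages run last-to-first, so every stage
 * empties its input latch before the stage upstream refills it. */
static void toy_cycle(struct toy_pipe *p)
{
    toy_commit(p);
    toy_execute(p);
    toy_fetch(p);
}

/* Run n cycles from an empty pipeline; returns instructions retired. */
int toy_run(int n)
{
    struct toy_pipe p = { EMPTY, EMPTY, 0, 0 };
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        toy_cycle(&p);
    return p.committed;
}
```

In this toy model the first instruction retires in the third cycle, and one instruction retires per cycle thereafter. Calling the stages in forward order instead would let a single instruction pass through every latch within one cycle.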

The fetch stage of the pipeline is implemented in ruu_fetch(). The fetch unit models the machine instruction bandwidth, and takes the following inputs: the program counter, the predictor state, and misprediction detection from the branch execution unit(s). Each cycle, it fetches instructions from only one I-cache line (and it blocks on an I-cache miss until the miss completes). After fetching the instructions, it places them in the dispatch queue, and probes the line predictor to obtain the correct cache line to access in the next cycle.

    00401a10: ( 13, 0.01): <strtod+220> addiu $a1[5],$zero[0],1
    strtod.c:79
    00401a18: ( 13, 0.01): <strtod+228> bc1f 00401a30 <strtod+240>
    strtod.c:87
    00401a20: : <strtod+230> addiu $s1[17],$s1[17],1
    00401a28: : <strtod+238> j 00401a58 <strtod+268>
    strtod.c:89
    00401a30: ( 13, 0.01): <strtod+240> mul.d $f2,$f20,$f4
    00401a38: ( 13, 0.01): <strtod+248> addiu $v0[2],$v1[3],-48
    00401a40: ( 13, 0.01): <strtod+250> mtc1 $v0[2],$f0

Figure 4. Sample output from text segment statistical profile

Figure 5. Pipeline for sim-outorder

The code for the dispatch stage of the pipeline resides in ruu_dispatch(). This routine is where instruction decoding and register renaming are performed. The function uses the instructions in the input queue filled by the fetch stage, a pointer to the active RUU, and the rename table. Once per cycle, the dispatcher takes as many instructions as possible (up to the dispatch width of the target machine) from the fetch queue and places them in the scheduler queue. This routine is the one in which branch mispredictions are noted. (When a misprediction occurs, the simulator uses speculative state buffers, which are managed with a copy-on-write policy.) The dispatch routine enters and links instructions into the RUU and the load/store queue (LSQ), and splits memory operations into two separate instructions (the addition to compute the effective address and the memory operation itself).

The issue stage of the pipeline is contained in ruu_issue() and lsq_refresh(). These routines model instruction wakeup and issue to the functional units, tracking register and memory dependences. Each cycle, the scheduling routines locate the instructions for which the register inputs are all ready. The issue of ready loads is stalled if there is an earlier store with an unresolved effective address in the load/store queue. If the address of the earlier store matches that of the waiting load, the store value is forwarded to the load. Otherwise, the load is sent to the memory system.
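The load-handling rule above can be sketched as a scan over the queue entries older than the load. This is an illustrative model only, not the code of lsq_refresh(): the entry layout and all toy_* names are hypothetical, and the sketch assumes that every access has the same size, so that an address match alone is enough to forward.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical LSQ entry; field names are illustrative only. */
struct toy_lsq_entry {
    int is_store;      /* nonzero for stores, zero for loads */
    int addr_known;    /* nonzero once the effective address is resolved */
    unsigned addr;     /* effective address (valid if addr_known) */
    int value;         /* store data (valid for stores) */
};

enum load_action { LOAD_STALL, LOAD_FORWARD, LOAD_TO_MEMORY };

/* Decide what the load at index load_idx may do this cycle. Entries
 * at lower indices are older. On LOAD_FORWARD, *fwd_value receives the
 * data of the youngest older store to the same address. */
enum load_action toy_load_check(const struct toy_lsq_entry *q,
                                size_t load_idx, int *fwd_value)
{
    size_t i;

    assert(q != NULL && fwd_value != NULL);

    /* Any older store with an unresolved address blocks the load. */
    for (i = 0; i < load_idx; i++)
        if (q[i].is_store && !q[i].addr_known)
            return LOAD_STALL;

    /* Scan from youngest to oldest for a store to the same address. */
    for (i = load_idx; i-- > 0; )
        if (q[i].is_store && q[i].addr == q[load_idx].addr) {
            *fwd_value = q[i].value;
            return LOAD_FORWARD;
        }

    /* No conflict: the load goes to the memory system. */
    return LOAD_TO_MEMORY;
}
```

Scanning youngest-first matters when several older stores write the same address, since the load must observe the most recent one.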

The execute stage is also handled in ruu_issue(). Each cycle, the routine gets as many ready instructions as possible from the scheduler queue (up to the issue width). The functional units’ availability is also checked, and if they have available access ports, the instructions are issued. Finally, the routine schedules writeback events using the latency of the functional units (memory operations probe the data cache to obtain the correct latency of the operation). Data TLB misses stall the issue of the memory operation, are serviced in the commit stage of the pipeline, and currently assume a fixed latency. The functional units’ latencies are hardcoded in the definition of fu_config[] in sim-outorder.c.

The writeback stage resides in ruu_writeback(). Each cycle it scans the event queue for instruction completions. When it finds a completed instruction, it walks the dependence chain of instruction outputs to mark instructions that are dependent on the completed instruction. If a dependent instruction is waiting only for that completion, the routine marks it as ready to be issued. The writeback stage also detects branch mispredictions; when it determines that a branch misprediction has occurred, it rolls the state back to the checkpoint, discarding the erroneously issued instructions.
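The wakeup step described above can be sketched as follows. This is a simplified illustration rather than the data structures of ruu_writeback(): it assumes each instruction keeps a count of outstanding inputs and a small fixed array of consumers, and all toy_* names are hypothetical.

```c
#include <assert.h>

#define TOY_MAX_DEPS 4   /* assumed bound on consumers per result */

/* Hypothetical in-flight instruction record. */
struct toy_inst {
    int pending_inputs;                   /* operands still outstanding */
    int ready;                            /* set when eligible to issue */
    int ndeps;                            /* number of consumers */
    struct toy_inst *deps[TOY_MAX_DEPS];  /* consumers of this result */
};

/* Called when `done` completes: walk its consumers, and mark as ready
 * any consumer that was waiting only on this result. Returns the
 * number of consumers woken. */
int toy_writeback(struct toy_inst *done)
{
    int woken = 0;

    assert(done != 0 && done->ndeps >= 0 && done->ndeps <= TOY_MAX_DEPS);

    for (int i = 0; i < done->ndeps; i++) {
        struct toy_inst *c = done->deps[i];
        if (c->pending_inputs > 0 && --c->pending_inputs == 0) {
            c->ready = 1;
            woken++;
        }
    }
    return woken;
}
```

A consumer with two outstanding inputs is decremented but not woken by the first completion; only the second completion marks it ready.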

ruu_commit() handles the instructions from the writeback stage that are ready to commit. This routine does in-order committing of instructions, updating of the data caches (or memory) with store values, and data TLB miss handling. The routine keeps retiring instructions at the head of the RUU that are ready to commit until the head instruction is one that is not ready. When an instruction is committed, its result is placed into the architected register file, and the RUU/LSQ resources devoted to that instruction are reclaimed.
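The in-order retirement loop can be sketched with a circular buffer, as below. This is a minimal model under stated assumptions, not the body of ruu_commit(): it tracks only a completion flag per entry, omits register file updates, store handling, and any per-cycle commit limit, and every toy_* name is hypothetical.

```c
#include <assert.h>

#define TOY_RUU_SIZE 16   /* matches the documented -ruu:size default */

/* Hypothetical circular RUU holding only completion flags. */
struct toy_ruu {
    int completed[TOY_RUU_SIZE];  /* nonzero once the entry has written back */
    int head;                     /* index of the oldest entry */
    int count;                    /* number of occupied entries */
};

/* Retire entries from the head while the head entry is complete.
 * Stops at the first incomplete entry, so younger completed
 * instructions wait behind it. Returns the number retired. */
int toy_commit_cycle(struct toy_ruu *r)
{
    int retired = 0;

    assert(r != 0 && r->count >= 0 && r->count <= TOY_RUU_SIZE);

    while (r->count > 0 && r->completed[r->head]) {
        r->completed[r->head] = 0;               /* reclaim the entry */
        r->head = (r->head + 1) % TOY_RUU_SIZE;  /* advance past it */
        r->count--;
        retired++;
    }
    return retired;
}
```

In the usage below, entry 3 has completed but cannot retire until entry 2 does, which is what keeps the architected state in program order.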

sim-outorder runs about an order of magnitude slower than sim-fast. In addition to the arguments listed at the beginning of Section 4, sim-outorder uses the following command-line arguments:

Specifying the processor core

-fetch:ifqsize <size>
    set the fetch width to be <size> instructions. Must be a power of two. The default is 4.
-fetch:speed <ratio>
    set the ratio of the front end speed relative to the execution core (allowing <ratio> times as many instructions to be fetched as decoded per cycle).
-fetch:mplat <cycles>
    set the branch misprediction latency. The default is 3 cycles.
-decode:width <insts>
    set the decode width to be <insts>, which must be a power of two. The default is 4.
-issue:width <insts>
    set the maximum issue width in a given cycle. Must be a power of two. The default is 4.
-issue:inorder
    force the simulator to use in-order issue. The default is false.
-issue:wrongpath
    allow instructions to issue after a misspeculation. The default is true.
-ruu:size <insts>
    capacity of the RUU (in instructions). The default is 16.
-lsq:size <insts>
    capacity of the load/store queue (in instructions). The default is 8.
-res:ialu <num>
    specify number of integer ALUs. The default is 4.
-res:imult <num>
    specify number of integer multipliers/dividers. The default is 1.
-res:memports <num>
    specify number of L1 cache ports. The default is 2.
-res:fpalu <num>
    specify number of floating point ALUs. The default is 4.
-res:fpmult <num>
    specify number of floating point multipliers/dividers. The default is 1.

Specifying the memory hierarchy

All of the cache arguments and formats used in sim-cache (listed at the beginning of Section 4.2) are also used in sim-outorder, with the following additions:

-cache:dl1lat <cycles>
    specify the hit latency of the L1 data cache. The default is 1 cycle.
-cache:dl2lat <cycles>
    specify the hit latency of the L2 data cache. The default is 6 cycles.
-cache:il1lat <cycles>
    specify the hit latency of the L1 instruction cache. The default is 1 cycle.
-cache:il2lat <cycles>
    specify the hit latency of the L2 instruction cache. The default is 6 cycles.
-mem:lat <1st> <next>
    specify main memory access latency (first, rest). The defaults are 18 cycles and 2 cycles.
-mem:width <bytes>
    specify width of memory bus in bytes. The default is 8 bytes.
-tlb:lat <cycles>
    specify latency (in cycles) to service a TLB miss. The default is 30 cycles.

Specifying the branch predictor

Branch prediction is specified by choosing the following flag with one of the six subsequent arguments. The default is a bimodal predictor with 2048 entries.

-bpred <type>
    nottaken    always predict not taken.
    taken       always predict taken.
    perfect     perfect predictor.
    bimod       bimodal predictor, using a branch target buffer (BTB) with 2-bit counters.
    2lev        2-level adaptive predictor.
    comb        combined predictor (bimodal and 2-level adaptive).

The predictor-specific arguments are listed below:

-bpred:bimod <size>
    set the bimodal predictor table size to be <size> entries.
-bpred:2lev <l1size> <l2size> <hist_size> <xor>
    specify the 2-level adaptive predictor. <l1size> specifies the number of entries in the first-level table, <l2size> specifies the number of entries in the second-level table, <hist_size> specifies the history width, and <xor> allows you to xor the history and the address in the second level of the predictor. This organization is depicted in Figure 6. In Table 2 we show how these parameters correspond to modern prediction schemes. The default settings for the four parameters are 1, 1024, 8, and 0, respectively.
-bpred:comb <size>
    set the meta-table size of the combined predictor to be <size> entries. The default is 1024.


Figure 6. 2-level adaptive predictor structure
(Figure not reproduced; its labels are: pattern history, 2-bit predictors, branch, branch prediction.)
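As a rough illustration of the 2-level organization, the sketch below uses the documented default parameters (1 first-level entry, 1024 second-level entries, 8 history bits). It is a simplified model and not the indexing used in bpred.c: the way the address is folded into the index, the shift by 3 (chosen because instructions are 8 bytes apart), and all toy_* names are assumptions.

```c
#include <assert.h>

#define TOY_L1SIZE 1      /* history registers (documented default) */
#define TOY_L2SIZE 1024   /* 2-bit counters (documented default) */
#define TOY_HIST   8      /* history bits (documented default) */
#define TOY_HMASK  ((1u << TOY_HIST) - 1)

/* Hypothetical predictor state. */
struct toy_2lev {
    unsigned hist[TOY_L1SIZE];      /* first level: branch history */
    unsigned char ctr[TOY_L2SIZE];  /* second level: counters 0..3 */
    int use_xor;                    /* xor history with the address */
};

/* Select the second-level counter for the branch at pc. */
static unsigned toy_l2_index(const struct toy_2lev *p, unsigned pc)
{
    unsigned h = p->hist[(pc >> 3) % TOY_L1SIZE] & TOY_HMASK;
    if (p->use_xor)
        h ^= (pc >> 3);
    return h % TOY_L2SIZE;
}

/* Predict taken when the 2-bit counter is in its upper half. */
int toy_predict(const struct toy_2lev *p, unsigned pc)
{
    return p->ctr[toy_l2_index(p, pc)] >= 2;
}

/* Train the saturating counter, then shift the outcome into history. */
void toy_update(struct toy_2lev *p, unsigned pc, int taken)
{
    unsigned i = toy_l2_index(p, pc);
    unsigned *h = &p->hist[(pc >> 3) % TOY_L1SIZE];

    assert(taken == 0 || taken == 1);
    if (taken && p->ctr[i] < 3)
        p->ctr[i]++;
    if (!taken && p->ctr[i] > 0)
        p->ctr[i]--;
    *h = ((*h << 1) | (unsigned)taken) & TOY_HMASK;
}
```

With a zero-initialized predictor, a branch that is always taken is first predicted not taken; once the history register saturates at all ones, the counter it selects is trained upward and the prediction flips to taken.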

-bpred:ras <size>
    set the return stack size to <size> (0 entries means no return stack). The default is 8 entries.
-bpred:btb <sets> <assoc>
    configure the BTB to have <sets> sets and an associativity of <assoc>. The defaults are 512 sets and an associativity of 4.
-bpred:spec_update <stage>
    allow speculative updates of the branch predictor in the decode or writeback stages (<stage> = [ID|WB]). The default is non-speculative updates in the commit stage.

Visualization

-pcstat <stat>
    record statistic <stat> by text address; described in Section 4.3.
-ptrace <file> <range>
    pipeline tracing, described in Section 5.

4.5 Simulator code file descriptions

The following list describes the functionality of the C code files in the simplesim-2.0/ directory, which are used by all of the simulators.

bitmap.h: Contains support macros for performing bitmap manipulation.

bpred.[c,h]: Handles the creation, functionality, and updates of the branch predictors. bpred_create(), bpred_lookup(), and bpred_update() are the key interface functions.

cache.[c,h]: Contains general functions to support multiple cache types (e.g., TLBs, instruction and data caches). Uses a linked list for tag comparisons in caches of low associativity (less than or equal to four), and a hash table for tag comparisons in higher-associativity caches. The important interfaces are cache_create(), cache_access(), cache_probe(), cache_flush(), and cache_flush_addr().

dlite.[c,h]: Contains the code for DLite!, the source-level target program debugger.

endian.[c,h]: Defines a few simple functions to determine byte- and word-order on the host and target platforms.

eval.[c,h]: Contains code to evaluate expressions, used in DLite!.

eventq.[c,h]: Defines functions and macros to handle ordered event queues (used for ordering writebacks). The important interface functions are eventq_queue() and eventq_service_events().

loader.[c,h]: Loads the target program into memory, sets up the segment sizes and addresses, sets up the initial call stack, and obtains the target program entry point. The interface is ld_load_prog().

main.c: Performs all initialization and launches the main simulator function. The key functions are sim_options(), sim_config(), sim_main(), and sim_stats().

memory.[c,h]: Contains functions for reading from, writing to, initializing, and dumping the contents of the target main memory. Memory is implemented as a large flat space, each portion of which is allocated on demand. mem_access() is the important interface function.

misc.[c,h]: Contains numerous useful support functions, such as fatal(), panic(), warn(), info(), debug(), getcore(), and elapsed_time().

options.[c,h]: Contains the SimpleScalar options package code, used to process command-line arguments and/or option specifications from config files. Options are registered with an option database (see the functions called opt_reg_*()). opt_print_help() generates a help listing, and opt_print_options() prints the current options’ state.

ptrace.[c,h]: Contains code to collect and produce pipeline traces from sim-outorder.

range.[c,h]: Holds code that interprets program range commands used in DLite!.

regs.[c,h]: Contains functions to initialize the register files and dump their contents.

resource.[c,h]: Contains code to manage functional unit resources, divided up into classes. The three defined functions create the resource pools and busy tables (res_create_pool()), return a resource from the specified pool if available (res_get()), and dump the contents of a pool (res_dump()).

sim.h: Contains a few extern variable declarations and function prototypes.

stats.[c,h]: Contains routines to handle statistics measuring target program behavior. As with the options package, counters are “registered” by type with an internal database. The stat_reg_*() routines register counters of various types, and stat_reg_formula() allows you to register expressions constructed of other statistics. stat_print_stats() prints all registered statistics. The statistics package also has facilities to measure distributions; stat_reg_dist() creates an array distribution, stat_reg_sdist() creates a sparse array distribution, and stat_add_sample() updates a distribution.

ss.[c,h]: Defines macros to expedite the processing of instructions, numerous constants needed across simulators, and a function to print out individual instructions in a readable format.

ss.def: Holds a list of macro calls (the macros are defined in the simulators and ss.h and ss.c), each of which defines an instruction. The macro calls accept as arguments the opcode, name of the instruction, sources, destinations, actions to execute, and other information. This file serves as the definition of the instruction set.

symbol.[c,h]: Holds routines to handle program symbol and line information (used in DLite!).

syscall.[c,h]: Contains code that acts as the interface between the SimpleScalar system calls (which are POSIX-compliant) and the system calls on the host machine.

sysprobe.c: Determines byte and word order on the host platform, and generates appropriate compiler flags.

version.h: Defines the version number and release date of the distribution.

The traces may be viewed with the pipeview.pl Perl script, which is provided in the simplesim-2.0 directory. (You will have to update the first line of pipeview.pl to have the correct path to your local Perl binary, and you must have Perl installed on your system.)

    pipeview.pl <ptrace_file>

We depict sample output from the pipetracer in Figure 7.

5.2 The DLite! debugger

Release 2.0 of SimpleScalar includes a lightweight symbolic debugger called DLite!, which runs with all simulators except for sim-fast. DLite! allows you to step through the benchmark target code, not the simulator code. The debugger can be incorporated into a simulator by adding only four function calls (which have already been added to all simulators in the distribution). The four needed function prototypes are in dlite.h.

To use the debugger in a simulation, add the “-i” option (which stands for interactive) to the simulator command line. Below we list the set of commands that DLite! accepts.

Getting help and getting out:

help [string]        print command reference.
version              print DLite! version information.
quit                 exit simulator.
terminate            generate statistics and exit simulator.

Running and setting breakpoints:

step                 execute next instruction and break.
cont [addr]          continue execution (optionally continuing starting at <addr>).
break <addr>         set breakpoint at <addr>, returns <id> of breakpoint.
dbreak <addr> [r,w,x]
                     set data breakpoint at <addr> for (r)ead, (w)rite, and/or e(x)ecute, returns <id> of breakpoint.
rbreak <range> [r,w,x]
                     set breakpoint at <range> for (r)ead, (w)rite, and/or e(x)ecute, returns <id> of breakpoint.
breaks               list active code and data breakpoints.
delete <id>          delete breakpoint <id>.
clear                clear all breakpoints (code and data).

Printing information:

print [modifiers] <expr>
                     print the value of <expr> using optional modifiers.
display [modifiers] <expr>
                     display the value of <expr> using optional modifiers.
option <string>      print the value of option <string>.
options              print the values of all options.
stat <string>        print the value of a statistical variable.
stats                print the values of all statistical variables.
whatis <expr>        print the type of <expr>.
regs                 print all register contents.
iregs                print all instruction register contents.

5 Utilities

In this section we describe the utilities that accompany the SimpleScalar tool set: pipeline tracing and a source-level debugger.

5.1 Out-of-order pipeline tracing

The tool set provides the ability to extract and view traces of the out-of-order pipeline. Using the “-ptrace” option, a detailed history of all instructions executed in a range may be saved to a file. The information saved includes instruction fetch, retirement, and stage transitions. The syntax of this command is as follows:

-ptrace <file> <start>:<end>
    <file> is the file to which the trace will be saved. <start> and <end> are the instruction numbers at which the trace will be started and stopped. If they are left blank, the trace will start at the beginning and/or stop at the end of the program, respectively.

For example:

-ptrace FOO.trc 100:500
    trace from instructions 100 to 500, store the trace in file FOO.trc.
-ptrace FOO.trc :10000
    trace from program beginning to instruction 10000.
-ptrace FOO.trc :
    trace the entire program execution.


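Figure 7. Example of sim-outorder pipetrace
(Figure not reproduced. Its annotations identify: the new cycle indicator, "@ 610"; the new instruction definitions, gf = `0x0040d098: addiu r2, r4, -1' and gg = `0x0040d0a0: beq r3, r5, 0x30'; the current pipeline state, listed by stage, where [IF] holds instructions fetched or in the fetch queue, [DA] holds instructions decoded or awaiting issue, [EX] holds instructions executing, a further stage holds instructions with results in the RUU or awaiting retire, and [CT] holds instructions writing results to the register file; and pipeline event markers, such as a detected misprediction.)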

fpregs               print all floating point register contents.
mstate [string]      print machine-specific state.
dump <addr> [count]  dump memory at <addr> (optionally for <count> words).
dis <addr> [count]   disassemble instructions at <addr> (optionally for <count> instructions).
symbols              print the value of all program symbols.
tsymbols             print the value of all program text symbols.
dsymbols             print the value of all program data symbols.
symbol <string>      print the value of symbol <string>.

Legal arguments:

Arguments <addr>, <cnt>, <expr>, and <id> are any legal expression:

<expr>      ← <factor> +|- <expr>
<factor>    ← <term> *|/ <factor>
<term>      ← ( <expr> ) | - <term> | <const> | <symbol> | <file:loc>
<symbol>    ← <literal> | <function name> | <register>
<literal>   ← [0-9]+ | 0x[0-9,a-f]+ | 0[0-7]+
<register>  ← $r[0-31] | $f[0-31] | $pc | $fcc | $hi | $lo

Legal ranges:

<range>        ← <address> | <instruction> | <cycle>
<address>      ← @<function name>:{+<literal>}
<instruction>  ← {<literal>}:{<literal>}
<cycle>        ← #{<literal>}:{<literal>}

Omitting optional arguments to the left of the colon will default to the smallest value permitted in that range. Omitting an optional argument at the right of the colon will default to the largest value permitted in that range.

Legal command modifiers:

b    print a byte
h    print a half (short)
w    print a word (default)
t    print in decimal format (default)
o    print in octal format
x    print in hex format
1    print in binary format
f    print float
d    print double
c    print character
s    print string

Examples of legal commands:

break main+8
break 0x400148
dbreak stdin w
dbreak sys_count wr
rbreak @main:+279
rbreak 2000:3500
rbreak #:100         cycle 0 to cycle 100
rbreak :             entire execution
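The defaulting rule for instruction ranges of the form {<literal>}:{<literal>} can be sketched as follows. This is not the parser in range.c: it handles only the plain instruction-number form (not the @ address or # cycle forms), it assumes that the smallest and largest permitted values are 0 and ULONG_MAX, and the toy_* names are hypothetical. It relies on the standard strtoul() with base 0, which accepts the same three literal spellings as the <literal> rule (decimal, 0x hexadecimal, and leading-0 octal).

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed range. */
struct toy_range {
    unsigned long start;
    unsigned long end;
};

/* Parse "<start>:<end>", where either side may be empty.
 * Returns 0 on success and -1 on malformed input. */
int toy_parse_range(const char *s, struct toy_range *out)
{
    const char *colon;
    char *stop;

    assert(s != NULL && out != NULL);

    colon = strchr(s, ':');
    if (colon == NULL)
        return -1;               /* the colon is mandatory */

    out->start = 0;              /* default: smallest value (assumed 0) */
    out->end = ULONG_MAX;        /* default: largest value (assumed) */

    if (colon != s) {            /* text to the left of the colon */
        out->start = strtoul(s, &stop, 0);
        if (stop != colon)
            return -1;           /* junk before the colon */
    }
    if (colon[1] != '\0') {      /* text to the right of the colon */
        out->end = strtoul(colon + 1, &stop, 0);
        if (*stop != '\0')
            return -1;           /* junk after the number */
    }
    return 0;
}
```

Under these assumptions, "2000:3500" yields the range 2000 to 3500, ":10000" starts from 0, and ":" covers the whole execution.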

6 Summary

The SimpleScalar tool set was written by Todd Austin over about one and a half years, between 1994 and 1996. He continues to add improvements and updates. The ancestors of the tool set date back to the mid to late 1980s, to tools written by Manoj Franklin. At the time the tools were developed, both individuals were research assistants at the University of Wisconsin-Madison Computer Sciences Department, supervised by Professor Guri Sohi. Scott Breach provided valuable assistance with the implementation of the proxy system calls. The first release was assembled, debugged, and documented by Doug Burger, also a research assistant at Wisconsin, who is the maintainer of the second release as well. Kevin Skadron, currently at Princeton, implemented many of the more recent branch prediction mechanisms.

Many exciting extensions to SimpleScalar are both underway and planned. Efforts have begun to extend the processor simulators to simulate multithreaded processors and multiprocessors. A Linux port to SimpleScalar (enabling simulation of the OS on a kernel with publicly available sources) is planned, using device-level emulation and a user-level file system. Other plans include extending the tool set to simulate ISAs other than SimpleScalar and MIPS (Alpha and SPARC ISA support will be the first additions).

As they stand now, these tools provide researchers with a simulation infrastructure that is fast, flexible, and efficient. Changes in both the target hardware and software may be made with minimal effort. We hope that you find these tools useful, and encourage you to contact us with ways that we can improve the release, documentation, and the tools themselves.

References

[1] L. A. Belady. A Study of Replacement Algorithms for a Virtual-Storage Computer. IBM Systems Journal, 5(2):78–101, 1966.

[2] Doug Burger, Todd M. Austin, and Steven Bennett. Evaluating Future Microprocessors: the SimpleScalar Tool Set. Technical Report 1308, Computer Sciences Department, University of Wisconsin, Madison, WI, July 1996.

[3] L. P. Horwitz, R. M. Karp, R. E. Miller, and A. Winograd. Index Register Allocation. Journal of the ACM, 13(1):43–61, January 1966.

[4] Charles Price. MIPS IV Instruction Set, revision 3.1. MIPS Technologies, Inc., Mountain View, CA, January 1995.

[5] Gurindar S. Sohi. Instruction Issue Logic for High-Performance, Interruptible, Multiple Functional Unit, Pipelined Computers. IEEE Transactions on Computers, 39(3):349–359, March 1990.

[6] Rabin A. Sugumar and Santosh G. Abraham. Efficient Simulation of Caches under Optimal Replacement with Applications to Miss Characterization. In Proceedings of the 1993 ACM Sigmetrics Conference on Measurements and Modeling of Computer Systems, pages 24–35, May 1993.

A Instruction set definition

This appendix lists all SimpleScalar instructions with their opcode, assembler format, and semantics. The semantics are expressed as a C-style expression that uses the extended operators and operands described in Table 3. Operands that are not listed in Table 3 refer to actual instruction fields described in Figure 3. For each instruction, the next PC value (NPC) defaults to the current PC value plus eight (CPC+8) unless otherwise specified.
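The next-PC rules used throughout this appendix can be sketched in plain C. The functions below mirror the default CPC+8 rule and the SET_NPC expressions given for the jump and branch instructions in Section A.1; they are illustrative only. The toy_* names are hypothetical, the jump target is assumed to be an already-extracted TARGET field, and the branch offset is assumed to be an already sign-extended OFFSET field.

```c
#include <assert.h>
#include <stdint.h>

/* Default rule: instructions are 8 bytes apart, so NPC = CPC + 8. */
uint32_t toy_npc_default(uint32_t cpc)
{
    return cpc + 8;
}

/* Absolute jump: keep the top four bits of the current PC and
 * replace the rest with the target field shifted left by two. */
uint32_t toy_npc_jump(uint32_t cpc, uint32_t target)
{
    return (cpc & 0xf0000000u) | (target << 2);
}

/* Conditional branch: a taken branch adds the shifted offset to the
 * fall-through address; an untaken branch simply falls through. The
 * offset is converted to unsigned before shifting so that negative
 * offsets wrap modulo 2^32 instead of invoking undefined behavior. */
uint32_t toy_npc_branch(uint32_t cpc, int32_t offset, int taken)
{
    assert(taken == 0 || taken == 1);
    if (taken)
        return cpc + 8 + ((uint32_t)offset << 2);
    return cpc + 8;
}
```

For example, under these rules a taken branch at 0x00401a18 with an offset field of 4 lands on 0x00401a30, and a backward offset of -2 returns to the branch's own address.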

A.1 Control instructions

J:          Jump to absolute address.
Opcode:     0x01
Format:     J target
Semantics:  SET_NPC((CPC & 0xf0000000) | (TARGET<<2))

JAL:        Jump to absolute address and link.
Opcode:     0x02
Format:     JAL target
Semantics:  SET_NPC((CPC & 0xf0000000) | (TARGET<<2))
            SET_GPR(31, CPC + 8)

JR:         Jump to register address.
Opcode:     0x03
Format:     JR rs
Semantics:  TALIGN(GPR(RS))
            SET_NPC(GPR(RS))

JALR:       Jump to register address and link.
Opcode:     0x04
Format:     JALR rs
Semantics:  TALIGN(GPR(RS))
            SET_GPR(RD, CPC + 8)
            SET_NPC(GPR(RS))

BEQ:        Branch if equal.
Opcode:     0x05
Format:     BEQ rs,rt,offset
Semantics:  if (GPR(RS) == GPR(RT))
                SET_NPC(CPC + 8 + (OFFSET << 2))
            else
                SET_NPC(CPC + 8)

BNE:        Branch if not equal.
Opcode:     0x06
Format:     BNE rs,rt,offset
Semantics:  if (GPR(RS) != GPR(RT))
                SET_NPC(CPC + 8 + (OFFSET << 2))
            else
                SET_NPC(CPC + 8)

BLEZ:       Branch if less than or equal to zero.
Opcode:     0x07
Format:     BLEZ rs,offset
Semantics:  if (GPR(RS) <= 0)
                SET_NPC(CPC + 8 + (OFFSET << 2))
            else
                SET_NPC(CPC + 8)

BGTZ:       Branch if greater than zero.
Opcode:     0x08
Format:     BGTZ rs,offset
Semantics:  if (GPR(RS) > 0)
                SET_NPC(CPC + 8 + (OFFSET << 2))
            else
                SET_NPC(CPC + 8)

BLTZ:       Branch if less than zero.
Opcode:     0x09
Format:     BLTZ rs,offset
Semantics:  if (GPR(RS) < 0)
                SET_NPC(CPC + 8 + (OFFSET << 2))
            else
                SET_NPC(CPC + 8)

BGEZ:       Branch if greater than or equal to zero.
Opcode:     0x0a
Format:     BGEZ rs,offset
Semantics:  if (GPR(RS) >= 0)
                SET_NPC(CPC + 8 + (OFFSET << 2))
            else
                SET_NPC(CPC + 8)

BC1F:       Branch on floating point compare false.
Opcode:     0x0b
Format:     BC1F offset
Semantics:  if (!FCC)
                SET_NPC(CPC + 8 + (OFFSET << 2))
            else
                SET_NPC(CPC + 8)

BC1T:       Branch on floating point compare true.
Opcode:     0x0c
Format:     BC1T offset
Semantics:  if (FCC)
                SET_NPC(CPC + 8 + (OFFSET << 2))
            else
                SET_NPC(CPC + 8)

A.2 Load/store instructions

LB:         Load byte signed, displaced addressing.
Opcode:     0x20
Format:     LB rt,offset(rs) inc_dec
Semantics:  SET_GPR(RT, READ_SIGNED_BYTE(GPR(RS)+OFFSET))

LB:         Load byte signed, indexed addressing.
Opcode:     0xc0
Format:     LB rt,(rs+rd) inc_dec
Semantics:  SET_GPR(RT, READ_SIGNED_BYTE(GPR(RS)+GPR(RD)))

LBU:        Load byte unsigned, displaced addressing.
Opcode:     0x22
Format:     LBU rt,offset(rs) inc_dec
Semantics:  SET_GPR(RT, READ_UNSIGNED_BYTE(GPR(RS)+OFFSET))

LBU:        Load byte unsigned, indexed addressing.
Opcode:     0xc1
Format:     LBU rt,(rs+rd) inc_dec
Semantics:  SET_GPR(RT, READ_UNSIGNED_BYTE(GPR(RS)+GPR(RD)))

LH:         Load half signed, displaced addressing.
Opcode:     0x24
Format:     LH rt,offset(rs) inc_dec
Semantics:  SET_GPR(RT, READ_SIGNED_HALF(GPR(RS)+OFFSET))

LH:         Load half signed, indexed addressing.
Opcode:     0xc2
Format:     LH rt,(rs+rd) inc_dec
Semantics:  SET_GPR(RT, READ_SIGNED_HALF(GPR(RS)+GPR(RD)))

LHU:        Load half unsigned, displaced addressing.
Opcode:     0x26
Format:     LHU rt,offset(rs) inc_dec
Semantics:  SET_GPR(RT, READ_UNSIGNED_HALF(GPR(RS)+OFFSET))

LHU:        Load half unsigned, indexed addressing.
Opcode:     0xc3
Format:     LHU rt,(rs+rd) inc_dec
Semantics:  SET_GPR(RT, READ_UNSIGNED_HALF(GPR(RS)+GPR(RD)))

LW:         Load word, displaced addressing.
Opcode:     0x28
Format:     LW rt,offset(rs) inc_dec
Semantics:  SET_GPR(RT, READ_WORD(GPR(RS)+OFFSET))

LW:         Load word, indexed addressing.
Opcode:     0xc4
Format:     LW rt,(rs+rd) inc_dec
Semantics:  SET_GPR(RT, READ_WORD(GPR(RS)+GPR(RD)))

DLW:        Double load word, displaced addressing.
Opcode:     0x29
Format:     DLW rt,offset(rs) inc_dec
Semantics:  SET_GPR(RT, READ_WORD(GPR(RS)+OFFSET))
            SET_GPR(RT+1, READ_WORD(GPR(RS)+OFFSET+4))

DLW:        Double load word, indexed addressing.
Opcode:     0xce
Format:     DLW rt,(rs+rd) inc_dec
Semantics:  SET_GPR(RT, READ_WORD(GPR(RS)+GPR(RD)))
            SET_GPR(RT+1, READ_WORD(GPR(RS)+GPR(RD)+4))

L.S:        Load word into floating point register file, displaced addressing.
Opcode:     0x2a
Format:     L.S ft,offset(rs) inc_dec
Semantics:  SET_FPR_L(FT, READ_WORD(GPR(RS)+OFFSET))

L.S:        Load word into floating point register file, indexed addressing.
Opcode:     0xc5
Format:     L.S ft,(rs+rd) inc_dec
Semantics:  SET_FPR_L(RT, READ_WORD(GPR(RS)+GPR(RD)))

L.D:        Load double word into floating point register file, displaced addressing.
Opcode:     0x2b
Format:     L.D ft,offset(rs) inc_dec
Semantics:  SET_FPR_L(FT, READ_WORD(GPR(RS)+OFFSET))
            SET_FPR_L(FT+1, READ_WORD(GPR(RS)+OFFSET+4))

L.D:        Load double word into floating point register file, indexed addressing.
Opcode:     0xcf
Format:     L.D ft,(rs+rd) inc_dec
Semantics:  SET_FPR_L(RT, READ_WORD(GPR(RS)+GPR(RD)))
            SET_FPR_L(RT+1, READ_WORD(GPR(RS)+GPR(RD)+4))

LWL:        Load word left, displaced addressing.
Opcode:     0x2c
Format:     LWL offset(rs)
Semantics:  See ss.def for a detailed description of this instruction’s semantics. NOTE: LWL does not support pre-/post- inc/dec.

LWR:        Load word right, displaced addressing.
Opcode:     0x2d
Format:     LWR offset(rs)
Semantics:  See ss.def for a detailed description of this instruction’s semantics. NOTE: LWR does not support pre-/post- inc/dec.

SB:         Store byte, displaced addressing.
Opcode:     0x30
Format:     SB rt,offset(rs) inc_dec
Semantics:  WRITE_BYTE(GPR(RT), GPR(RS)+OFFSET)

SB:         Store byte, indexed addressing.
Opcode:     0xc6
Format:     SB rt,(rs+rd) inc_dec
Semantics:  WRITE_BYTE(GPR(RT), GPR(RS)+GPR(RD))

SH:         Store half, displaced addressing.
Opcode:     0x32
Format:     SH rt,offset(rs) inc_dec
Semantics:  WRITE_HALF(GPR(RT), GPR(RS)+OFFSET)

SH:         Store half, indexed addressing.
Opcode:     0xc7
Format:     SH rt,(rs+rd) inc_dec
Semantics:  WRITE_HALF(GPR(RT), GPR(RS)+GPR(RD))

SW:         Store word, displaced addressing.
Opcode:     0x34
Format:     SW rt,offset(rs) inc_dec
Semantics:  WRITE_WORD(GPR(RT), GPR(RS)+OFFSET)

SW:         Store word, indexed addressing.
Opcode:     0xc8
Format:     SW rt,(rs+rd) inc_dec
Semantics:  WRITE_WORD(GPR(RT), GPR(RS)+GPR(RD))

DSW:        Double store word, displaced addressing.
Opcode:     0x35
Format:     DSW rt,offset(rs) inc_dec
Semantics:  WRITE_WORD(GPR(RT), GPR(RS)+OFFSET)
            WRITE_WORD(GPR(RT+1), GPR(RS)+OFFSET+4)

DSW:        Double store word, indexed addressing.
Opcode:     0xd0
Format:     DSW rt,(rs+rd) inc_dec
Semantics:  WRITE_WORD(GPR(RT), GPR(RS)+GPR(RD))
            WRITE_WORD(GPR(RT+1), GPR(RS)+GPR(RD)+4)

DSZ:        Double store zero, displaced addressing.
Opcode:     0x38
Format:     DSZ rt,offset(rs) inc_dec
Semantics:  WRITE_WORD(0, GPR(RS)+OFFSET)
            WRITE_WORD(0, GPR(RS)+OFFSET+4)

DSZ:        Double store zero, indexed addressing.
Opcode:     0xd1
Format:     DSZ rt,(rs+rd) inc_dec
Semantics:  WRITE_WORD(0, GPR(RS)+GPR(RD))
            WRITE_WORD(0, GPR(RS)+GPR(RD)+4)

S.S:        Store word from floating point register file, displaced addressing.
Opcode:     0x36
Format:     S.S ft,offset(rs) inc_dec
Semantics:  WRITE_WORD(FPR_L(FT), GPR(RS)+OFFSET)

S.S:        Store word from floating point register file, indexed addressing.
Opcode:     0xc9
Format:     S.S ft,(rs+rd) inc_dec
Semantics:  WRITE_WORD(FPR_L(FT), GPR(RS)+GPR(RD))

S.D:        Store double word from floating point register file, displaced addressing.
Opcode:     0x37
Format:     S.D ft,offset(rs) inc_dec
Semantics:  WRITE_WORD(FPR_L(FT), GPR(RS)+OFFSET)
            WRITE_WORD(FPR_L(FT+1), GPR(RS)+OFFSET+4)

S.D:        Store double word from floating point register file, indexed addressing.
Opcode:     0xd2
Format:     S.D ft,(rs+rd) inc_dec
Semantics:  WRITE_WORD(FPR_L(FT), GPR(RS)+GPR(RD))
            WRITE_WORD(FPR_L(FT+1), GPR(RS)+GPR(RD)+4)

SWL:        Store word left, displaced addressing.
Opcode:     0x39
Format:     SWL rt,offset(rs)
Semantics:  See ss.def for a detailed description of this instruction’s semantics. NOTE: SWL does not support pre-/post- inc/dec.

SWR:        Store word right, displaced addressing.
Opcode:     0x3a
Format:     SWR rt,offset(rs)
Semantics:  See ss.def for a detailed description of this instruction’s semantics. NOTE: SWR does not support pre-/post- inc/dec.

A.3 Integer instructions

ADD:        Add signed (with overflow check).
Opcode:     0x40
Format:     ADD rd,rs,rt
Semantics:  OVER(GPR(RS),GPR(RT))
            SET_GPR(RD, GPR(RS) + GPR(RT))

ADDI:       Add immediate signed (with overflow check).
Opcode:     0x41
Format:     ADDI rt,rs,imm
Semantics:  OVER(GPR(RS),IMM)
            SET_GPR(RT, GPR(RS) + IMM)

ADDU:       Add unsigned (no overflow check).
Opcode:     0x42
Format:     ADDU rd,rs,rt
Semantics:  SET_GPR(RD, GPR(RS) + GPR(RT))

ADDIU:      Add immediate unsigned (no overflow check).
Opcode:     0x43
Format:     ADDIU rt,rs,imm
Semantics:  SET_GPR(RT, GPR(RS) + IMM)

SUB:        Subtract signed (with underflow check).
Opcode:     0x44
Format:     SUB rd,rs,rt
Semantics:  UNDER(GPR(RS),GPR(RT))
            SET_GPR(RD, GPR(RS) - GPR(RT))

SUBU:       Subtract unsigned (without underflow check).
Opcode:     0x45
Format:     SUBU rd,rs,rt
Semantics:  SET_GPR(RD, GPR(RS) - GPR(RT))

MULT:       Multiply signed.
Opcode:     0x46
Format:     MULT rs,rt
Semantics:  SET_HI((RS * RT) / (1<<32))
            SET_LO((RS * RT) % (1<<32))

MULTU:      Multiply unsigned.
Opcode:     0x47
Format:     MULTU rs,rt
Semantics:  SET_HI(((unsigned)RS * (unsigned)RT) / (1<<32))
            SET_LO(((unsigned)RS * (unsigned)RT) % (1<<32))

Divide signed.0x48DIV rs,rt

DIV0(GPR(RT))

SET_LO(GPR(RS) / GPR(RT))SET_HI(GPR(RS) % GPR(RT))


DIVU: Divide unsigned.
Opcode: 0x49
Format: DIVU rs,rt
Semantics: DIV0(GPR(RT))
           SET_LO((unsigned)GPR(RS) / (unsigned)GPR(RT))
           SET_HI((unsigned)GPR(RS) % (unsigned)GPR(RT))

MFHI: Move from HI register.
Opcode: 0x4a
Format: MFHI rd
Semantics: SET_GPR(RD, HI)

MTHI: Move to HI register.
Opcode: 0x4b
Format: MTHI rs
Semantics: SET_HI(GPR(RS))

MFLO: Move from LO register.
Opcode: 0x4c
Format: MFLO rd
Semantics: SET_GPR(RD, LO)

MTLO: Move to LO register.
Opcode: 0x4d
Format: MTLO rs
Semantics: SET_LO(GPR(RS))

AND: Logical AND.
Opcode: 0x4e
Format: AND rd,rs,rt
Semantics: SET_GPR(RD, GPR(RS) & GPR(RT))

ANDI: Logical AND immediate.
Opcode: 0x4f
Format: ANDI rt,rs,uimm
Semantics: SET_GPR(RT, GPR(RS) & UIMM)

OR: Logical OR.
Opcode: 0x50
Format: OR rd,rs,rt
Semantics: SET_GPR(RD, GPR(RS) | GPR(RT))

ORI: Logical OR immediate.
Opcode: 0x51
Format: ORI rt,rs,uimm
Semantics: SET_GPR(RT, GPR(RS) | UIMM)

XOR: Logical XOR.
Opcode: 0x52
Format: XOR rd,rs,rt
Semantics: SET_GPR(RD, GPR(RS) ^ GPR(RT))

XORI: Logical XOR immediate.
Opcode: 0x53
Format: XORI rt,rs,uimm
Semantics: SET_GPR(RT, GPR(RS) ^ UIMM)

NOR: Logical NOR.
Opcode: 0x54
Format: NOR rd,rs,rt
Semantics: SET_GPR(RD, ~(GPR(RS) | GPR(RT)))

SLL: Shift left logical.
Opcode: 0x55
Format: SLL rd,rt,shamt
Semantics: SET_GPR(RD, GPR(RT) << SHAMT)

SLLV: Shift left logical variable.
Opcode: 0x56
Format: SLLV rd,rt,rs
Semantics: SET_GPR(RD, GPR(RT) << (GPR(RS) & 0x1f))

SRL: Shift right logical.
Opcode: 0x57
Format: SRL rd,rt,shamt
Semantics: SET_GPR(RD, GPR(RT) >> SHAMT)

SRLV: Shift right logical variable.
Opcode: 0x58
Format: SRLV rd,rt,rs
Semantics: SET_GPR(RD, GPR(RT) >> (GPR(RS) & 0x1f))

SRA: Shift right arithmetic.
Opcode: 0x59
Format: SRA rd,rt,shamt
Semantics: SET_GPR(RD, SEX(GPR(RT) >> SHAMT, 31 - SHAMT))

SRAV: Shift right arithmetic variable.
Opcode: 0x5a
Format: SRAV rd,rt,rs
Semantics: SET_GPR(RD, SEX(GPR(RT) >> (GPR(RS) & 0x1f), 31 - (GPR(RS) & 0x1f)))

SLT: Set register if less than.
Opcode: 0x5b
Format: SLT rd,rs,rt
Semantics: SET_GPR(RD, (GPR(RS) < GPR(RT)) ? 1 : 0)

SLTI: Set register if less than immediate.
Opcode: 0x5c
Format: SLTI rd,rs,imm
Semantics: SET_GPR(RD, (GPR(RS) < IMM) ? 1 : 0)

SLTU: Set register if less than unsigned.
Opcode: 0x5d
Format: SLTU rd,rs,rt
Semantics: SET_GPR(RD, ((unsigned)GPR(RS) < (unsigned)GPR(RT)) ? 1 : 0)

SLTIU: Set register if less than unsigned immediate.
Opcode: 0x5e
Format: SLTIU rd,rs,imm
Semantics: SET_GPR(RD, ((unsigned)GPR(RS) < (unsigned)IMM) ? 1 : 0)
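The integer semantics in this section are written in the macro notation of ss.def. As a rough illustration, the following C sketch models a few of them on host integers: the overflow test behind ADD, the HI/LO split used by MULT, MULTU, and DIV, the sign extension behind SRA, and the signed versus unsigned comparison of SLT and SLTU. The type hilo_t and every function name here are hypothetical and are not taken from the simulator sources. The sketch widens to 64 bits rather than evaluating 1<<32 literally, since that shift is not representable in a 32-bit C int.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical holder for the HI and LO result registers. */
typedef struct {
    uint32_t hi;
    uint32_t lo;
} hilo_t;

/* OVER-style test: nonzero if the signed 32-bit sum a + b overflows. */
static int add_overflows(int32_t a, int32_t b)
{
    int64_t wide = (int64_t)a + (int64_t)b;
    return wide > INT32_MAX || wide < INT32_MIN;
}

/* MULT: signed 64-bit product, upper word to HI, lower word to LO. */
static hilo_t mult_signed(int32_t rs, int32_t rt)
{
    uint64_t product = (uint64_t)((int64_t)rs * (int64_t)rt);
    hilo_t r = { (uint32_t)(product >> 32), (uint32_t)product };
    return r;
}

/* MULTU: same split, but the operands are treated as unsigned. */
static hilo_t mult_unsigned(uint32_t rs, uint32_t rt)
{
    uint64_t product = (uint64_t)rs * (uint64_t)rt;
    hilo_t r = { (uint32_t)(product >> 32), (uint32_t)product };
    return r;
}

/* DIV: remainder to HI, quotient to LO. The asserts stand in for DIV0
 * and also exclude INT32_MIN / -1, which is undefined in C. */
static hilo_t div_signed(int32_t rs, int32_t rt)
{
    assert(rt != 0);
    assert(!(rs == INT32_MIN && rt == -1));
    hilo_t r = { (uint32_t)(rs % rt), (uint32_t)(rs / rt) };
    return r;
}

/* SEX-style helper: sign-extend value, treating bit sign_bit as the sign. */
static uint32_t sign_extend(uint32_t value, unsigned sign_bit)
{
    assert(sign_bit < 32);
    uint32_t mask = (uint32_t)1 << sign_bit;
    uint32_t low = value & (mask | (mask - 1u)); /* keep bits 0..sign_bit */
    return (low ^ mask) - mask;                  /* copy the sign upward */
}

/* SRA: logical shift, then sign-extend from the new sign position. */
static uint32_t sra(uint32_t rt, unsigned shamt)
{
    assert(shamt < 32);
    return sign_extend(rt >> shamt, 31u - shamt);
}

/* SRLV: only the low five bits of the shift register are used. */
static uint32_t srlv(uint32_t rt, uint32_t rs)
{
    return rt >> (rs & 0x1fu);
}

/* SLT compares the operands as signed values. */
static uint32_t slt(uint32_t rs, uint32_t rt)
{
    return ((int32_t)rs < (int32_t)rt) ? 1u : 0u;
}

/* SLTU compares the same bit patterns as unsigned values. */
static uint32_t sltu(uint32_t rs, uint32_t rt)
{
    return (rs < rt) ? 1u : 0u;
}
```

For instance, sra(0x80000000, 4) yields 0xf8000000, whereas srlv on the same operands yields 0x08000000; likewise slt(0xffffffff, 1) is 1 while sltu(0xffffffff, 1) is 0, because the first operand is -1 when read as signed.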

A.4 Floating-point instructions

ADD.S: Add floating point, single precision.
Opcode: 0x70
Format: ADD.S fd,fs,ft
Semantics: FPALIGN(FD)
           FPALIGN(FS)
           FPALIGN(FT)
           SET_FPR_F(FD, FPR_F(FS) + FPR_F(FT))

ADD.D: Add floating point, double precision.
Opcode: 0x71
Format: ADD.D fd,fs,ft
Semantics: FPALIGN(FD)
           FPALIGN(FS)
           FPALIGN(FT)
           SET_FPR_D(FD, FPR_D(FS) + FPR_D(FT))

SUB.S: Subtract floating point, single precision.
Opcode: 0x72
Format: SUB.S fd,fs,ft
Semantics: FPALIGN(FD)
           FPALIGN(FS)
           FPALIGN(FT)
           SET_FPR_F(FD, FPR_F(FS) - FPR_F(FT))

SUB.D: Subtract floating point, double precision.
Opcode: 0x73
Format: SUB.D fd,fs,ft
Semantics: FPALIGN(FD)
           FPALIGN(FS)
           FPALIGN(FT)
           SET_FPR_D(FD, FPR_D(FS) - FPR_D(FT))

MUL.S: Multiply floating point, single precision.
Opcode: 0x74
Format: MUL.S fd,fs,ft
Semantics: FPALIGN(FD)
           FPALIGN(FS)
           FPALIGN(FT)
           SET_FPR_F(FD, FPR_F(FS) * FPR_F(FT))

MUL.D: Multiply floating point, double precision.
Opcode: 0x75
Format: MUL.D fd,fs,ft
Semantics: FPALIGN(FD)
           FPALIGN(FS)
           FPALIGN(FT)
           SET_FPR_D(FD, FPR_D(FS) * FPR_D(FT))

DIV.S: Divide floating point, single precision.
Opcode: 0x76
Format: DIV.S fd,fs,ft
Semantics: FPALIGN(FD)
           FPALIGN(FS)
           FPALIGN(FT)
           DIV0(FPR_F(FT))
           SET_FPR_F(FD, FPR_F(FS) / FPR_F(FT))

DIV.D: Divide floating point, double precision.
Opcode: 0x77
Format: DIV.D fd,fs,ft
Semantics: FPALIGN(FD)
           FPALIGN(FS)
           FPALIGN(FT)
           DIV0(FPR_D(FT))
           SET_FPR_D(FD, FPR_D(FS) / FPR_D(FT))

ABS.S: Absolute value, single precision.
Opcode: 0x78
Format: ABS.S fd,fs
Semantics: FPALIGN(FD)
           FPALIGN(FS)
           SET_FPR_F(FD, fabs((double)FPR_F(FS)))

ABS.D: Absolute value, double precision.
Opcode: 0x79
Format: ABS.D fd,fs
Semantics: FPALIGN(FD)
           FPALIGN(FS)
           SET_FPR_D(FD, fabs(FPR_D(FS)))

MOV.S: Move floating point value, single precision.
Opcode: 0x7a
Format: MOV.S fd,fs
Semantics: FPALIGN(FD)
           FPALIGN(FS)
           SET_FPR_F(FD, FPR_F(FS))

MOV.D: Move floating point value, double precision.
Opcode: 0x7b
Format: MOV.D fd,fs
Semantics: FPALIGN(FD)
           FPALIGN(FS)
           SET_FPR_D(FD, FPR_D(FS))

NEG.S: Negate floating point value, single precision.
Opcode: 0x7c
Format: NEG.S fd,fs
Semantics: FPALIGN(FD)
           FPALIGN(FS)
           SET_FPR_F(FD, -FPR_F(FS))

NEG.D: Negate floating point value, double precision.
Opcode: 0x7d
Format: NEG.D fd,fs
Semantics: FPALIGN(FD)
           FPALIGN(FS)
           SET_FPR_D(FD, -FPR_D(FS))

CVT.S.D: Convert double precision to single precision.
Opcode: 0x80
Format: CVT.S.D fd,fs
Semantics: FPALIGN(FD)
           FPALIGN(FS)
           SET_FPR_F(FD, (float)FPR_D(FS))

CVT.S.W: Convert integer to single precision.
Opcode: 0x81
Format: CVT.S.W fd,fs
Semantics: FPALIGN(FD)
           FPALIGN(FS)
           SET_FPR_F(FD, (float)FPR_L(FS))

CVT.D.S: Convert single precision to double precision.
Opcode: 0x82
Format: CVT.D.S fd,fs
Semantics: FPALIGN(FD)
           FPALIGN(FS)
           SET_FPR_D(FD, (double)FPR_F(FS))

CVT.D.W: Convert integer to double precision.
Opcode: 0x83
Format: CVT.D.W fd,fs
Semantics: FPALIGN(FS)
           SET_FPR_D(FD, (double)FPR_L(FS))

CVT.W.S: Convert single precision to integer.
Opcode: 0x84
Format: CVT.W.S fd,fs
Semantics: FPALIGN(FD)
           FPALIGN(FS)
           SET_FPR_L(FD, (long)FPR_F(FS))

CVT.W.D: Convert double precision to integer.
Opcode: 0x85
Format: CVT.W.D fd,fs
Semantics: FPALIGN(FD)
           FPALIGN(FS)
           SET_FPR_L(FD, (long)FPR_D(FS))

C.EQ.S: Test if equal, single precision.
Opcode: 0x90
Format: C.EQ.S fs,ft
Semantics: FPALIGN(FS)
           FPALIGN(FT)
           SET_FCC(FPR_F(FS) == FPR_F(FT))

C.EQ.D: Test if equal, double precision.
Opcode: 0x91
Format: C.EQ.D fs,ft
Semantics: FPALIGN(FS)
           FPALIGN(FT)
           SET_FCC(FPR_D(FS) == FPR_D(FT))

C.LT.S: Test if less than, single precision.
Opcode: 0x92
Format: C.LT.S fs,ft
Semantics: FPALIGN(FS)
           FPALIGN(FT)
           SET_FCC(FPR_F(FS) < FPR_F(FT))

C.LT.D: Test if less than, double precision.
Opcode: 0x93
Format: C.LT.D fs,ft
Semantics: FPALIGN(FS)
           FPALIGN(FT)
           SET_FCC(FPR_D(FS) < FPR_D(FT))

C.LE.S: Test if less than or equal, single precision.
Opcode: 0x94
Format: C.LE.S fs,ft
Semantics: FPALIGN(FS)
           FPALIGN(FT)
           SET_FCC(FPR_F(FS) <= FPR_F(FT))

C.LE.D: Test if less than or equal, double precision.
Opcode: 0x95
Format: C.LE.D fs,ft
Semantics: FPALIGN(FS)
           FPALIGN(FT)
           SET_FCC(FPR_D(FS) <= FPR_D(FT))

SQRT.S: Square root, single precision.
Opcode: 0x96
Format: SQRT.S fd,fs
Semantics: FPALIGN(FD)
           SET_FPR_F(FD, sqrt((double)FPR_F(FS)))

SQRT.D: Square root, double precision.
Opcode: 0x97
Format: SQRT.D fd,fs
Semantics: FPALIGN(FD)
           FPALIGN(FS)
           SET_FPR_D(FD, sqrt(FPR_D(FS)))
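The FPR_L, FPR_F, and FPR_D accessors and the FPALIGN check suggest a register file that can be viewed as 32 integer words, 32 single-precision values, or 16 double-precision pairs. The C sketch below shows one way such a file could be modeled. The union layout, the reading of FPALIGN as an even-register test, and all names are assumptions made for illustration; they are not a description of the simulator's actual data structures.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_FP_REGS 32

/* Assumed layout: one storage area with three overlapping views. */
typedef union {
    int32_t l[NUM_FP_REGS];      /* FPR_L view: 32-bit integers  */
    float   f[NUM_FP_REGS];      /* FPR_F view: single precision */
    double  d[NUM_FP_REGS / 2];  /* FPR_D view: even/odd pairs   */
} fp_regfile_t;

/* FPALIGN-style test (assumed): register number is valid and even. */
static int fp_aligned(int reg)
{
    return reg >= 0 && reg < NUM_FP_REGS && (reg & 1) == 0;
}

/* ADD.D: SET_FPR_D(FD, FPR_D(FS) + FPR_D(FT)) */
static void add_d(fp_regfile_t *regs, int fd, int fs, int ft)
{
    assert(fp_aligned(fd) && fp_aligned(fs) && fp_aligned(ft));
    regs->d[fd >> 1] = regs->d[fs >> 1] + regs->d[ft >> 1];
}

/* CVT.D.W: SET_FPR_D(FD, (double)FPR_L(FS)) */
static void cvt_d_w(fp_regfile_t *regs, int fd, int fs)
{
    assert(fp_aligned(fd) && fp_aligned(fs));
    int32_t word = regs->l[fs];      /* read before the views overlap */
    regs->d[fd >> 1] = (double)word;
}

/* CVT.W.D: SET_FPR_L(FD, (long)FPR_D(FS)); the C cast truncates
 * toward zero and is only defined for values that fit in 32 bits. */
static void cvt_w_d(fp_regfile_t *regs, int fd, int fs)
{
    assert(fp_aligned(fd) && fp_aligned(fs));
    double value = regs->d[fs >> 1];
    regs->l[fd] = (int32_t)value;
}

/* C.LT.D: SET_FCC(FPR_D(FS) < FPR_D(FT)); returns the new FCC value. */
static int c_lt_d(const fp_regfile_t *regs, int fs, int ft)
{
    assert(fp_aligned(fs) && fp_aligned(ft));
    return regs->d[fs >> 1] < regs->d[ft >> 1];
}
```

Under these assumptions a double held in registers 2 and 3 is addressed as d[1], which is why each accessor shifts the register number right by one.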

A.5 Miscellaneous instructions

NOP: No operation.
Opcode: 0x00
Format: NOP
Semantics: None

SYSCALL: System call.
Opcode: 0xa0
Format: SYSCALL
Semantics: See Appendix B for details.

BREAK: Declare a program error.
Opcode: 0xa1
Format: BREAK uimm
Semantics: Actions are simulator-dependent. Typically, an error message is printed and abort() is called.

LUI: Load upper immediate.
Opcode: 0xa2
Format: LUI rt,uimm
Semantics: SET_GPR(RT, UIMM << 16)

MFC1: Move from floating point to integer register file.
Opcode: 0xa3
Format: MFC1 rt,fs
Semantics: SET_GPR(RT, FPR_L(FS))

MTC1: Move from integer to floating point register file.
Opcode: 0xa5
Format: MTC1 rt,fs
Semantics: SET_FPR_L(FS, GPR(RT))
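LUI places a 16-bit immediate in the upper half of a register, so a full 32-bit constant is conventionally assembled by following it with ORI from Section A.3. The C sketch below mirrors those two semantics lines, assuming UIMM denotes a zero-extended 16-bit immediate. The function names are hypothetical, and the two-instruction sequence is a common MIPS-style idiom rather than something this report prescribes.

```c
#include <assert.h>
#include <stdint.h>

/* LUI: SET_GPR(RT, UIMM << 16) */
static uint32_t lui(uint32_t uimm)
{
    assert(uimm <= 0xffffu);  /* immediate field assumed to be 16 bits */
    return uimm << 16;
}

/* ORI: SET_GPR(RT, GPR(RS) | UIMM) */
static uint32_t ori(uint32_t rs, uint32_t uimm)
{
    assert(uimm <= 0xffffu);  /* immediate field assumed to be 16 bits */
    return rs | uimm;
}

/* Hypothetical helper: build a 32-bit constant with LUI followed by ORI. */
static uint32_t load_const32(uint32_t value)
{
    uint32_t upper = lui(value >> 16);   /* upper half into bits 31..16 */
    return ori(upper, value & 0xffffu);  /* lower half into bits 15..0  */
}
```

Because ORI does not sign-extend its immediate under this assumption, the lower half can have bit 15 set without disturbing the upper half written by LUI.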

B System call definitions

This appendix lists all system calls supported by the simulators with their system call code (syscode), interface specification, and appropriate POSIX Unix reference. System calls are initiated with the SYSCALL instruction. Prior to execution of a SYSCALL instruction, register $v0 should be loaded with the system call code. The arguments of the system call interface prototype should be loaded into registers $a0 - $a3 in the order specified by the system call interface prototype, e.g., for:

read(int fd, char *buf, int nbyte),

0x03 is loaded into $v0, fd is loaded into $a0, buf into $a1, and nbyte into $a2.
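Seen from the simulator's side, this calling convention amounts to a dispatcher that reads the syscode from $v0 and the arguments from $a0 - $a2, then carries out the request on the host. The C sketch below covers only READ (0x03), WRITE (0x04), and GETPID (0x14), using the syscodes given in this appendix. The register structure, the flat sim_mem array, the treatment of buffer arguments as offsets into that array, and the choice to return the result in v0 are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

#define SIM_MEM_SIZE 4096u

/* Hypothetical register subset used by the system call interface. */
typedef struct {
    uint32_t v0;             /* syscode on entry, result on exit (assumed) */
    uint32_t a0, a1, a2, a3; /* arguments in prototype order */
} sys_regs_t;

/* Hypothetical simulated memory; buffer arguments are offsets into it. */
static unsigned char sim_mem[SIM_MEM_SIZE];

/* Check that [addr, addr + len) lies inside simulated memory. */
static int range_ok(uint32_t addr, uint32_t len)
{
    return addr <= SIM_MEM_SIZE && len <= SIM_MEM_SIZE - addr;
}

/* Decode one SYSCALL from the register state and run it on the host. */
static void do_syscall(sys_regs_t *regs)
{
    switch (regs->v0) {
    case 0x03: /* int read(int fd, char *buf, int nbyte) */
        assert(range_ok(regs->a1, regs->a2));
        regs->v0 = (uint32_t)read((int)regs->a0,
                                  sim_mem + regs->a1, regs->a2);
        break;
    case 0x04: /* int write(int fd, char *buf, int nbyte) */
        assert(range_ok(regs->a1, regs->a2));
        regs->v0 = (uint32_t)write((int)regs->a0,
                                   sim_mem + regs->a1, regs->a2);
        break;
    case 0x14: /* int getpid(void) */
        regs->v0 = (uint32_t)getpid();
        break;
    default:   /* syscode not handled by this sketch */
        regs->v0 = (uint32_t)-1;
        break;
    }
}
```

A caller would fill a sys_regs_t with v0 = 0x03, a0 = fd, a1 = the buffer offset, and a2 = nbyte before invoking do_syscall, matching the read example in the text.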

EXIT: Exit process.
Syscode: 0x01
Interface: void exit(int status);
Semantics: See exit(2).

READ: Read from file to buffer.
Syscode: 0x03
Interface: int read(int fd, char *buf, int nbyte);
Semantics: See read(2).

WRITE: Write from a buffer to a file.
Syscode: 0x04
Interface: int write(int fd, char *buf, int nbyte);
Semantics: See write(2).

OPEN: Open a file.
Syscode: 0x05
Interface: int open(char *fname, int flags, int mode);
Semantics: See open(2).

CLOSE: Close a file.
Syscode: 0x06
Interface: int close(int fd);
Semantics: See close(2).

CREAT: Create a file.
Syscode: 0x08
Interface: int creat(char *fname, int mode);
Semantics: See creat(2).

UNLINK: Delete a file.
Syscode: 0x0a
Interface: int unlink(char *fname);
Semantics: See unlink(2).

CHDIR: Change process directory.
Syscode: 0x0c
Interface: int chdir(char *path);
Semantics: See chdir(2).

CHMOD: Change file permissions.
Syscode: 0x0f
Interface: int chmod(char *fname, int mode);
Semantics: See chmod(2).

CHOWN: Change file owner and group.
Syscode: 0x10
Interface: int chown(char *fname, int owner, int group);
Semantics: See chown(2).

BRK: Change process break address.
Syscode: 0x11
Interface: int brk(long addr);
Semantics: See brk(2).

LSEEK: Move file pointer.
Syscode: 0x13
Interface: long lseek(int fd, long offset, int whence);
Semantics: See lseek(2).

GETPID: Get process identifier.
Syscode: 0x14
Interface: int getpid(void);
Semantics: See getpid(2).

GETUID: Get user identifier.
Syscode: 0x18
Interface: int getuid(void);
Semantics: See getuid(2).

ACCESS: Determine accessibility of a file.
Syscode: 0x21
Interface: int access(char *fname, int mode);
Semantics: See access(2).

STAT: Get file status.
Syscode: 0x26
Interface: struct stat {
               short st_dev;
               long st_ino;
               unsigned short st_mode;
               short st_nlink;
               short st_uid;
               short st_gid;
               short st_rdev;
               int st_size;
               int st_atime;
               int st_spare1;
               int st_mtime;
               int st_spare2;
               int st_ctime;
               int st_spare3;
               long st_blksize;
               long st_blocks;
               long st_gennum;
               long st_spare4;
           };
           int stat(char *fname, struct stat *buf);
Semantics: See stat(2).

LSTAT: Get file status (and don’t dereference links).
Syscode: 0x28
Interface: int lstat(char *fname, struct stat *buf);
Semantics: See lstat(2).

DUP: Duplicate a file descriptor.
Syscode: 0x29
Interface: int dup(int fd);
Semantics: See dup(2).

PIPE: Create an interprocess communication channel.
Syscode: 0x2a
Interface: int pipe(int fd[2]);
Semantics: See pipe(2).

GETGID: Get group identifier.
Syscode: 0x2f
Interface: int getgid(void);
Semantics: See getgid(2).

IOCTL: Device control interface.
Syscode: 0x36
Interface: int ioctl(int fd, int request, char *arg);
Semantics: See ioctl(2).


FSTAT: Get file descriptor status.
Syscode: 0x3e
Interface: int fstat(int fd, struct stat *buf);
Semantics: See fstat(2).

GETPAGESIZE: Get page size.
Syscode: 0x40
Interface: int getpagesize(void);
Semantics: See getpagesize(2).

GETDTABLESIZE: Get file descriptor table size.
Syscode: 0x59
Interface: int getdtablesize(void);
Semantics: See getdtablesize(2).

DUP2: Duplicate a file descriptor.
Syscode: 0x5a
Interface: int dup2(int fd1, int fd2);
Semantics: See dup2(2).

FCNTL: File control.
Syscode: 0x5c
Interface: int fcntl(int fd, int cmd, int arg);
Semantics: See fcntl(2).

SELECT: Synchronous I/O multiplexing.
Syscode: 0x5d
Interface: int select(int width, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);

Semantics: