University of Wisconsin-Madison Computer Sciences Department Technical Report #1342, June, 1997.
The SimpleScalar Tool Set, Version 2.0
Doug Burger*
Computer Sciences Department
University of Wisconsin-Madison
1210 West Dayton Street
Madison, Wisconsin 53706 USA
Todd M. Austin
MicroComputer Research Labs, JF3-359
Intel Corporation, 2111 NE 25th Avenue
Hillsboro, OR 97124 USA
*Contact: dburger@cs.wisc.edu
http://www.cs.wisc.edu/~mscalar/simplescalar.html
This report describes release 2.0 of the SimpleScalar tool set, a suite of free, publicly available simulation tools that offer both detailed and high-performance simulation of modern microprocessors. The new release offers more tools and capabilities, precompiled binaries, cleaner interfaces, better documentation, easier installation, improved portability, and higher performance. This report contains a complete description of the tool set, including retrieval and installation instructions, a description of how to use the tools, a description of the target SimpleScalar architecture, and many details about the internals of the tools and how to customize them. With this guide, the tool set can be brought up and generating results in under an hour (on supported platforms).
1 Overview
Modern processors are incredibly complex marvels of engineering that are becoming increasingly hard to evaluate. This report describes the SimpleScalar tool set (release 2.0), which performs fast, flexible, and accurate simulation of modern processors that implement the SimpleScalar architecture (a close derivative of the MIPS architecture [4]). The tool set takes binaries compiled for the SimpleScalar architecture and simulates their execution on one of several provided processor simulators. We provide sets of precompiled binaries (including SPEC95), plus a modified version of GNU GCC (with associated utilities) that allows you to compile your own SimpleScalar test binaries from FORTRAN or C code.
The advantages of the SimpleScalar tools are high flexibility, portability, extensibility, and performance. We include five execution-driven processor simulators in the release. They range from an extremely fast functional simulator to a detailed, out-of-order issue, superscalar processor simulator that supports non-blocking caches and speculative execution.
The tool set is portable, requiring only that the GNU tools can be installed on the host system. The tool set has been tested extensively on many platforms (listed in Section 2). The tool set is easily extensible. We designed the instruction set to support easy annotation of instructions, without requiring a retargeted compiler for incremental changes. The instruction definition method, along with the ported GNU tools, makes new simulators easy to write, and the old ones even simpler to extend. Finally, the simulators have been aggressively tuned for performance, and can run codes approaching "real" sizes in tractable amounts of time. On a 200-MHz Pentium Pro, the fastest, least detailed simulator simulates about four million machine cycles per second, whereas the most detailed processor simulator simulates about 150,000 per second.
The current release (version 2.0) of the tools is a major improvement over the previous release. Compared to version 1.0 [2], this release includes better documentation, enhanced performance, compatibility with more platforms, precompiled SPEC95 SimpleScalar binaries, cleaner interfaces, two new processor simulators, option and statistic management packages, a source-level debugger (DLite!), and a tool to trace the out-of-order pipeline.
The rest of this document contains information about obtaining, installing, running, using, and modifying the tool set. In Section 2 we provide a detailed procedure for downloading the release, installing it, and getting it up and running. In Section 3, we describe the SimpleScalar architecture and details about the target (simulated) system. In Section 4, we describe the SimpleScalar processor simulators and discuss their internal workings. In Section 5, we describe two tools that enhance the utility of the tool set: a pipeline tracer and a source-level debugger (for stepping through the program being simulated). In Section 6, we provide the history of the tools' development, describe current and planned efforts to extend the tool set, and conclude. Appendix A and Appendix B contain detailed definitions of the SimpleScalar instructions and system calls, respectively.
This work was initially supported by NSF Grants CCR-9303030, CCR-9509589, and MIP-9505853, ONR Grant N00014-93-1-0465, a donation from Intel Corp., and by U.S. Army Intelligence Center and Fort Huachuca under Contract DABT63-95-C-0127 and ARPA order no. D346. The current support for this work comes from a variety of sources, to all of which we are indebted.
2 Installation and Use
The only restrictions on using and distributing the tool set are that (1) the copyright notice must accompany all re-releases of the tool set, and (2) third parties (i.e., you) are forbidden to place any additional distribution restrictions on extensions to the tool set that you release. The copyright notice can be found in the distribution directory as well as at the head of all simulator source files. We have included the copyright here as well:
Copyright (C) 1994, 1995, 1996, 1997 by Todd M. Austin
This tool set is distributed "as is" in the hope that it will be useful. The tool set comes with no warranty, and no author or distributor accepts any responsibility for the consequences of its use.
Everyone is granted permission to copy, modify and redistribute this tool set under the following conditions:
This tool set is distributed for non-commercial use only. Please contact the maintainer for restrictions applying to commercial use of these tools.
Permission is granted to anyone to make or distribute copies of this tool set, either as received or modified, in any medium, provided that all copyright notices, permission and nonwarranty notices are preserved, and that the distributor grants the recipient permission for further redistribution as permitted by this document.
Permission is granted to distribute these tools in compiled or executable form under the same conditions that apply for source code, provided that either: (1) it is accompanied by the corresponding machine-readable source code, or (2) it is accompanied by a written offer, with no time limit, to give anyone a machine-readable copy of the corresponding source code in return for reimbursement of the cost of distribution. This written offer must permit verbatim duplication by anyone, or (3) it is distributed by someone who received only the executable form, and is accompanied by a copy of the written offer of source code that they received concurrently.
In other words, you are welcome to use, share and improve these tools. You are forbidden to forbid anyone else to use, share and improve what you give them.
2.1 Obtaining the tools
The tools can be obtained either through the World Wide Web or by conventional ftp. For example, to get the file simplesim.tar.gz via the WWW, enter the URL:
ftp://ftp.cs.wisc.edu/sohi/Code/simplescalar/simplesim.tar
and to obtain the same file with traditional ftp:
ftp ftp.cs.wisc.edu
user: anonymous
password: enter your e-mail address here
cd sohi/Code/simplescalar
get simplesim.tar
Note the missing ".gz" suffix: by requesting the file without the ".gz" suffix, the ftp server uncompresses it automatically. To get the compressed version, simply request the file with the ".gz" suffix.
The five distribution files in the directory (which are symbolic links to the files containing the latest version of the tools) are:
- simplesim.tar.gz - contains the simulator sources, the instruction set definition macros, and test program source and binaries. The directory is 1 MB compressed and 4 MB uncompressed. When the simulators are built, the directory (including object files) will require 11 MB. This file is required for installation of the tool set.
- simpleutils.tar.gz - contains the GNU binutils source (version 2.5.2), retargeted to the SimpleScalar architecture. These utilities are not required to run the simulators themselves, but are required to compile your own SimpleScalar benchmark binaries (e.g., test programs other than the ones we provide). The compressed file is 3 MB, the uncompressed file is 14 MB, and the build requires 52 MB.
- simpletools.tar.gz - contains the retargeted GNU compiler and library sources needed to build SimpleScalar benchmark binaries (GCC 2.6.3, glibc 1.09, and f2c), as well as pre-built big- and little-endian versions of libc. This file is needed only to build benchmarks, not to compile or run the simulators. The tools are 11 MB compressed, 47 MB uncompressed, and the full installation requires 70 MB.
- simplebench.big.tar.gz - contains a set of the SPEC95 benchmark binaries, compiled to the SimpleScalar architecture running on a big-endian host. The binaries take under 5 MB compressed, and are 29 MB when uncompressed.
- simplebench.little.tar.gz - same as above, except that the binaries were compiled to the SimpleScalar architecture running on a little-endian host.
Once you have selected the appropriate files, place the downloaded files into the desired target directory. If you obtained the files with the ".gz" suffix, run the GNU decompression utility (gunzip). The files should now have a ".tar" suffix. To unpack the directories from the archive:
tar xf filename.tar
If you download and unpack all files in the release, you should have the following subdirectories:
- simplesim-2.0 - the sources of the SimpleScalar processor simulators, supporting scripts, and small test benchmarks. It also holds precompiled binaries of the test benchmarks.
- binutils-2.5.2 - the GNU binary utilities code, ported to the SimpleScalar architecture.
- ssbig-na-sstrix - the root directory for the tree in which the big-endian SimpleScalar binary utilities and compiler tools will be installed. The unpacked directories contain header files, a precompiled copy of libc, and a necessary object file.
- sslittle-na-sstrix - same as above, except that this directory holds the little-endian versions of the SimpleScalar utilities.
- gcc-2.6.3 - the GNU C compiler code, targeted toward the SimpleScalar architecture.
- glibc-1.09 - the GNU libraries code, ported to the SimpleScalar architecture.
- f2c-1994.09.27 - the 1994 release of AT&T Bell Labs' FORTRAN to C translator code.
- spec95-big - precompiled SimpleScalar SPEC95 benchmark binaries (big-endian version).
- spec95-little - precompiled SimpleScalar SPEC95 benchmark binaries (little-endian version).
2.2 Installing and running SimpleScalar
We depict a graphical overview of the tool set in Figure 1. Benchmarks written in FORTRAN are converted to C using Bell Labs' f2c converter. Both benchmarks written in C and those converted from FORTRAN are compiled using the SimpleScalar version of GCC, which generates SimpleScalar assembly. The SimpleScalar assembler and loader, along with the necessary ported libraries, produce SimpleScalar executables that can then be fed directly into one of the provided simulators. (The simulators themselves are compiled with the host platform's native compiler; any ANSI C compiler will do.)
Figure 1. SimpleScalar tool set overview (diagram not reproduced; its labels include FORTRAN and C benchmark sources, simulator source, executables, precompiled SS binaries (test, SPEC95), and results)
If you use the precompiled SPEC95 binaries or the precompiled test programs, all you have to install is the simulator source itself. If you wish to compile your own benchmarks, you will have to install and build the GCC tree and optionally (recommended) the GNU binutils. If you wish to modify the support libraries, you will have to install, modify, and build the glibc source as well.
The SimpleScalar architecture, like the MIPS architecture [4], supports both big-endian and little-endian executables. The tool set supports compilation for either of these targets; the names for the big-endian and little-endian architectures are ssbig-na-sstrix and sslittle-na-sstrix, respectively. You should use the target endianness that matches your host platform; the simulators may not work correctly if you force the compiler to provide cross-endian support. To determine which byte order your host uses, run the endian program located in the simplesim-2.0/ directory. For simplicity, the following instructions assume a big-endian installation. In the following instructions, we refer to the directory in which you are installing SimpleScalar as $IDIR/.
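The endian program's job is to report the host's byte order. The C sketch below shows one common way such a check can be written; it is an illustration under our own assumptions, not the source of the distributed endian program, and both function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the host stores multi-byte integers
   little-endian (least significant byte at the lowest address), else 0. */
int host_is_little_endian(void)
{
    uint32_t probe = 0x01020304u;
    unsigned char lowest;

    memcpy(&lowest, &probe, 1);   /* copy the byte at the lowest address */
    return lowest == 0x04;
}

/* Hypothetical helper: picks the SimpleScalar target whose byte order
   matches the host, as the installation instructions recommend. */
const char *ss_target_for_host(void)
{
    return host_is_little_endian() ? "sslittle-na-sstrix" : "ssbig-na-sstrix";
}
```

On an x86 host, for example, ss_target_for_host() returns "sslittle-na-sstrix".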
The simulators come equipped with their own loader, and thus you do not need to build the GNU binary utilities to run simulations. However, many of these utilities are useful, and we recommend that you install them. If desired, build the GNU binary utilities (see footnote 1):
cd $IDIR/binutils-2.5.2
configure --host=$HOST --target=ssbig-na-sstrix --with-gnu-as --with-gnu-ld --prefix=$IDIR
make
make install
$HOST here is a "canonical configuration" string that represents your host architecture and system (CPU-COMPANY-SYSTEM). The string for a SPARCstation running SunOS would be sparc-sun-sunos4.1.3; running Solaris, sparc-sun-solaris2; a 386 running Solaris, i386-sun-solaris2.4; and so on. A complete list of supported $HOST strings resides in $IDIR/gcc-2.6.3/INSTALL. This installation will create the needed directories in $IDIR (these include bin/, lib/, include/, and man/).
Once the binutils have been built, build the simulators themselves. This must be done before building GCC, since one of the simulator binaries is needed for the cross-compiler build. You should edit $IDIR/simplesim-2.0/Makefile to use the desired compile flags (e.g., the correct optimization level). To use the GNU BFD loader instead of the custom loader in the simulators, uncomment -DBFD_LOADER in the Makefile. To build the simulators:
cd $IDIR/simplesim-2.0
make
If desired, build the compiler:
cd $IDIR/gcc-2.6.3
configure --host=$HOST --target=ssbig-na-sstrix --with-gnu-as --with-gnu-ld --prefix=$IDIR
make LANGUAGES=c
../simplesim-2.0/sim-safe ./enquire -f >! float.h-cross
make install
1. You must have GNU Make to do the majority of installations described in this document. To check if you have the GNU version, execute "make -v" or "gmake -v". The GNU version understands this switch and displays version information.
We provide pre-built copies of the necessary libraries in ssbig-na-sstrix/lib/, so you do not need to build the code in glibc-1.09 unless you change the library code. Building these libraries is tricky, and we do not recommend it unless you have a specific need to do so. In that event, to build the libraries:
cd $IDIR/glibc-1.09
configure --prefix=$IDIR/ssbig-na-sstrix ssbig-na-sstrix
setenv CC $IDIR/bin/ssbig-na-sstrix-gcc
unsetenv TZ
unsetenv MACHINE
make
make install
Note that you must have already built the SimpleScalar simulators to build this library, since the glibc build requires a compiled simulator to test target machine-specific parameters such as endianness.
If you have FORTRAN benchmarks, you will need to build f2c:
cd $IDIR/f2c-1994.09.27
make
make install
The entire tool set should now be ready for use. We provide precompiled test binaries (big- and little-endian) and their sources in $IDIR/simplesim-2.0/tests/. To run a test:
cd $IDIR/simplesim-2.0
sim-safe tests/bin.big/test-math
The test should generate about a page of output, and will run very quickly. The release has been ported to, and should run on, the following systems:
- gcc/AIX 413/RS6000
- xlc/AIX 413/RS6000
- gcc/HPUX/PA-RISC
- gcc/SunOS 4.1.3/SPARC
- gcc/Linux 1.3/x86
- gcc/Solaris 2/SPARC
- gcc/Solaris 2/x86
- gcc/DEC Unix 3.2/Alpha
- c89/DEC Unix 3.2/Alpha
- gcc/FreeBSD 2.2/x86
- gcc/WindowsNT/x86
3 The SimpleScalar architecture
The SimpleScalar architecture is derived from the MIPS-IV ISA [4]. The tool suite defines both little-endian and big-endian versions of the architecture to improve portability (the version used on a given host machine is the one that matches the endianness of the host). The semantics of the SimpleScalar ISA are a superset of MIPS with the following notable differences and additions:
- There are no architected delay slots: loads, stores, and control transfers do not execute the succeeding instruction.
- Loads and stores support two addressing modes, for all data types, in addition to those found in the MIPS architecture. These are: indexed (register+register), and auto-increment/decrement.
- A square-root instruction, which implements both single- and double-precision floating point square roots.
- An extended 64-bit instruction encoding.
We list all SimpleScalar instructions in Figure 2. We provide a complete list of the instruction semantics (as implemented in the simulator) in Appendix A. In Table 1, we list the architected registers in the SimpleScalar architecture, their hardware and software names (which are recognized by the assembler), and a description of each. Both the number and the semantics of the registers are identical to those in the MIPS-IV ISA.
In Figure 3, we depict the three instruction encodings of SimpleScalar instructions: register, immediate, and jump formats. All instructions are 64 bits in length.
The register format is used for computational instructions. The immediate format supports the inclusion of a 16-bit constant. The jump format supports specification of 26-bit jump targets. The register fields are all 8 bits, to support extension of the architected registers to 256 integer and floating point registers. Each instruction format has a fixed-location, 16-bit opcode field that facilitates fast instruction decoding.
The annote field is a 16-bit field that can be modified post-compile, with annotations to instructions in the assembly files. The annotation interface is useful for synthesizing new instructions without having to change and recompile the assembler. Annotations are attached to the opcode, and come in two flavors: bit and field annotations. A bit annotation is written as follows:
lw/a $r6,4($r7)
The annotation in this example is /a. It specifies that the first bit of the annotation field should be set. Bit annotations /a through /p set bits 0 through 15, respectively. Field annotations are written in the form:
lw/6:4(7) $r6,4($r7)
This annotation sets the specified 3-bit field (from bit 4 to bit 6 within the 16-bit annotation field) to the value 7.
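To make the encoding and the annotation rules concrete, the following C sketch extracts the Figure 3 fields from a 64-bit instruction and applies bit and field annotations. It assumes a layout of our own choosing: the instruction is held as two 32-bit words, with the annote field in the upper half of the first word and the opcode in the lower half, and the operand fields of the second word ordered as drawn in Figure 3. The struct and function names are hypothetical and are not taken from ss.h.

```c
#include <stdint.h>

/* A 64-bit instruction as two 32-bit words (assumed layout):
   a = annote[31:16] | opcode[15:0]
   b = rs[31:24] | rt[23:16] | rd[15:8] | ru/shamt[7:0]   register format
       rs[31:24] | rt[23:16] | imm[15:0]                  immediate format
       unused[31:26] | target[25:0]                       jump format     */
struct ss_inst { uint32_t a, b; };

unsigned op_of(struct ss_inst i)     { return i.a & 0xffffu; }
unsigned annote_of(struct ss_inst i) { return i.a >> 16; }
unsigned rs_of(struct ss_inst i)     { return (i.b >> 24) & 0xffu; }
unsigned rt_of(struct ss_inst i)     { return (i.b >> 16) & 0xffu; }
unsigned rd_of(struct ss_inst i)     { return (i.b >> 8) & 0xffu; }
unsigned shamt_of(struct ss_inst i)  { return i.b & 0xffu; }
/* The 16-bit immediate is sign-extended. */
int32_t  imm_of(struct ss_inst i)    { return (int16_t)(i.b & 0xffffu); }
uint32_t target_of(struct ss_inst i) { return i.b & 0x3ffffffu; }

/* Bit annotation /a ... /p: set bit 0 ... 15 of the annote field. */
struct ss_inst annotate_bit(struct ss_inst i, char letter)
{
    if (letter >= 'a' && letter <= 'p')
        i.a |= (uint32_t)1 << (16 + (letter - 'a'));
    return i;
}

/* Field annotation /hi:lo(val): store val in bits lo..hi of the annote
   field, e.g. /6:4(7) writes 7 into bits 4 to 6. */
struct ss_inst annotate_field(struct ss_inst i, unsigned hi,
                              unsigned lo, unsigned val)
{
    if (hi < 16 && lo <= hi) {
        uint32_t mask = ((1u << (hi - lo + 1)) - 1u) << lo;
        uint32_t annote = (annote_of(i) & ~mask) | ((val << lo) & mask);
        i.a = (annote << 16) | op_of(i);
    }
    return i;
}
```

For instance, applying annotate_bit(x, 'a') and then annotate_field(x, 6, 4, 7) to an unannotated instruction leaves 0x0071 in its annote field while the opcode bits stay unchanged.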
System calls in SimpleScalar are managed by a proxy handler (located in syscall.c) that intercepts system calls made by the simulated binary, decodes the system call, copies the system call arguments, makes the corresponding call to the host's operating system, and then copies the results of the call into the simulated program's memory. If you are porting SimpleScalar to a new platform, you will have to code the system call translation from SimpleScalar to your host machine in syscall.c. A list of all SimpleScalar system calls is provided in Appendix B.
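The proxy steps listed above (decode, copy arguments, call the host, copy results back) can be sketched in C for a single write-style call. Everything here is a hypothetical simplification: the flat memory array, the MIPS-style register convention (call number in register 2, arguments in registers 4 through 6, result back in register 2), and the call number are our assumptions, not the actual contents of syscall.c or the Appendix B numbering.

```c
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define SIM_MEM_SIZE  4096u
#define SIM_SYS_WRITE 4u      /* hypothetical call number */

/* Toy simulated machine: flat memory plus an integer register file. */
struct sim_state {
    uint8_t  mem[SIM_MEM_SIZE];
    uint32_t regs[32];
};

/* Proxy one system call; returns 0 if handled, -1 otherwise. */
int proxy_syscall(struct sim_state *s)
{
    uint32_t num = s->regs[2];                  /* decode the call */

    if (num == SIM_SYS_WRITE) {
        uint32_t fd = s->regs[4], addr = s->regs[5], len = s->regs[6];
        uint8_t host_buf[SIM_MEM_SIZE];
        ssize_t result;

        /* Reject buffers that fall outside simulated memory. */
        if (addr > SIM_MEM_SIZE || len > SIM_MEM_SIZE - addr)
            return -1;
        /* Copy the argument buffer out of simulated memory. */
        memcpy(host_buf, &s->mem[addr], len);
        /* Make the corresponding host system call. */
        result = write((int)fd, host_buf, len);
        /* Copy the result back into the simulated register file. */
        s->regs[2] = (uint32_t)result;
        return 0;
    }
    return -1;                                  /* call not handled here */
}
```

A fuller proxy would also need calls that return data (a read-style call, for example), which add a copy in the other direction, from a host buffer into simulated memory.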
SimpleScalar uses a 31-bit address space, and its virtual memory is laid out as follows:
0x00000000  Unused
0x00400000  Start of text segment
0x10000000  Start of data segment
0x7fffc000  Stack base (grows down)
The top of the data segment (which includes init and bss) is held in mem_brk_point. The areas below the text segment and above the stack base are unused.
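As an illustration of this layout, the C sketch below classifies a simulated address by region. The constant names, the text_end parameter (the end of the loaded text segment), and the treatment of the gap between the text and data segments as unused are our assumptions; only the four base addresses and the role of mem_brk_point come from the layout above.

```c
#include <stdint.h>

#define TEXT_BASE  0x00400000u   /* start of text segment */
#define DATA_BASE  0x10000000u   /* start of data segment */
#define STACK_BASE 0x7fffc000u   /* stack base; the stack grows down */

enum region { REGION_UNUSED, REGION_TEXT, REGION_DATA, REGION_STACK };

/* text_end: first address past the loaded text (hypothetical parameter);
   brk_point: top of the data segment, as held in mem_brk_point;
   sp: current stack pointer. */
enum region classify_addr(uint32_t addr, uint32_t text_end,
                          uint32_t brk_point, uint32_t sp)
{
    if (addr >= TEXT_BASE && addr < text_end)
        return REGION_TEXT;
    if (addr >= DATA_BASE && addr < brk_point)
        return REGION_DATA;
    if (addr >= sp && addr < STACK_BASE)
        return REGION_STACK;
    return REGION_UNUSED;   /* below text, between segments, or above stack base */
}
```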
The compiler outputs binaries that are compatible with the MIPS ECOFF object format. Library calls are handled with the ported version of GNU GLIBC and POSIX-compliant Unix system calls. The simulators currently execute only user-level code. All SimpleScalar-related extensions to GCC are contained in the config/ss subdirectory of the GCC source tree that comes with the distribution.
Control: j (jump), jal (jump and link), jr (jump register), jalr (jump and link register), beq (branch == 0), bne (branch != 0), blez (branch <= 0), bgtz (branch > 0), bltz (branch < 0), bgez (branch >= 0), bct (branch FCC TRUE), bcf (branch FCC FALSE)
Load/Store: lb (load byte), lbu (load byte unsigned), lh (load half (short)), lhu (load half (short) unsigned), lw (load word), dlw (load double word), l.s (load single-precision FP), l.d (load double-precision FP), sb (store byte), sbu (store byte unsigned), sh (store half (short)), shu (store half (short) unsigned), sw (store word), dsw (store double word), s.s (store single-precision FP), s.d (store double-precision FP); addressing modes: (C), (reg+C) (with pre/post inc/dec), (reg+reg) (with pre/post inc/dec)
Integer Arithmetic: add (integer add), addu (integer add unsigned), sub (integer subtract), subu (integer subtract unsigned), mult (integer multiply), multu (integer multiply unsigned), div (integer divide), divu (integer divide unsigned), and (logical AND), or (logical OR), xor (logical XOR), nor (logical NOR), sll (shift left logical), srl (shift right logical), sra (shift right arithmetic), slt (set less than), sltu (set less than unsigned)
Floating Point Arithmetic: add.s (single-precision (SP) add), add.d (double-precision (DP) add), sub.s (SP subtract), sub.d (DP subtract), mult.s (SP multiply), mult.d (DP multiply), div.s (SP divide), div.d (DP divide), abs.s (SP absolute value), abs.d (DP absolute value), neg.s (SP negation), neg.d (DP negation), sqrt.s (SP square root), sqrt.d (DP square root), cvt (int., single, double conversion), c.s (SP compare), c.d (DP compare)
Miscellaneous: nop (no operation), syscall (system call), break (declare program error)
Figure 2. Summary of SimpleScalar instructions
Register format:  16-annote | 16-opcode | 8-rs | 8-rt | 8-rd | 8-ru/shamt
Immediate format: 16-annote | 16-opcode | 8-rs | 8-rt | 16-imm
Jump format:      16-annote | 16-opcode | 6-unused | 26-target
Figure 3. SimpleScalar architecture instruction formats (field widths in bits)
4 Simulator internals
In this section, we describe the functionality of the processor simulators that accompany the tool set. We describe each of the simulators, their functionality, command-line arguments, and internal structures.
The architecture is defined in ss.def, which contains a macro definition for each instruction in the instruction set. Each macro defines the opcode, name, flags, operand sources and destinations, and actions to be taken for a particular instruction. The instruction actions (which appear as macros) that are common to all simulators are defined in ss.h. Those actions that require different implementations in different simulators are defined in each simulator code file.
When running a simulator, main() (defined in main.c) does all the initialization and loads the target binary into memory. The routine then calls sim_main(), which is simulator-specific and defined in each simulator code file. sim_main() predecodes the entire text segment for faster simulation, and then begins simulation from the target program entry point.
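The ss.def approach, one macro invocation per instruction that each simulator expands in its own way, is an instance of the C "X-macro" technique. The sketch below shows the pattern on a made-up three-instruction table; the DEFINST parameter list here (enumerator, opcode value, mnemonic, action) is a deliberate simplification that does not reproduce the real ss.def fields, and the opcode values are invented.

```c
#include <stdint.h>

/* Hypothetical instruction table: one DEFINST entry per instruction. */
#define INST_TABLE(DEFINST)                                   \
    DEFINST(OP_ADD, 0x01, "add", R[rd] = R[rs] + R[rt])       \
    DEFINST(OP_SUB, 0x02, "sub", R[rd] = R[rs] - R[rt])       \
    DEFINST(OP_AND, 0x03, "and", R[rd] = R[rs] & R[rt])

/* Expansion 1: an enumeration of opcodes. */
enum opcode {
#define DEFINST(ENUM, OPCODE, NAME, EXPR) ENUM = OPCODE,
    INST_TABLE(DEFINST)
#undef DEFINST
};

/* Expansion 2: opcode to mnemonic, as a disassembler might need. */
const char *op_name(enum opcode op)
{
    switch (op) {
#define DEFINST(ENUM, OPCODE, NAME, EXPR) case ENUM: return NAME;
    INST_TABLE(DEFINST)
#undef DEFINST
    }
    return "unknown";
}

/* Expansion 3: execute a register-format instruction on register file R.
   Returns 1 if the opcode was recognized, 0 otherwise. */
int execute(enum opcode op, uint32_t *R, unsigned rd, unsigned rs, unsigned rt)
{
    switch (op) {
#define DEFINST(ENUM, OPCODE, NAME, EXPR) case ENUM: EXPR; return 1;
    INST_TABLE(DEFINST)
#undef DEFINST
    }
    return 0;
}
```

Adding an instruction means adding one table line; every expansion (enumeration, names, actions) picks it up automatically, which is the property that makes new simulators easy to write against a single definition file.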
The following command-line arguments are available in all simulators included with the release:
-h  prints the simulator help message.
-d  turns on debug messages.
-i  starts execution in the DLite! debugger (see Section 5.2). This option is not supported in the sim-fast simulator.
-q  terminates immediately (for use with -dumpconfig).
-dumpconfig <file>  generates a configuration file saving the command-line parameters. Comments are permitted in the config files, and begin with a #.
-config <file>  reads in and uses a configuration file. These files may reference other config files.
4.1 Functional simulation
The fastest, least detailed simulator (sim-fast) resides in sim-fast.c. sim-fast does no time accounting, only functional simulation: it executes each instruction serially, simulating no instructions in parallel. sim-fast is optimized for raw speed; it models no caches, performs no instruction checking, and has no support for DLite!.
A separate version of sim-fast, called sim-safe, also performs functional simulation, but checks for correct alignment and access permissions for each memory reference. Although similar, sim-fast and sim-safe are split (i.e., protection is not toggled with a command-line argument in a merged simulator) to maximize performance. Neither simulator accepts any additional command-line arguments. Both versions are very simple, with fewer than 300 lines of code each, so they make good starting points for understanding the internal workings of the simulators. In addition to the simulator file, both sim-fast and sim-safe use the following code files (not including header files): main.c, syscall.c, memory.c, regs.c, loader.c, ss.c, endian.c, and misc.c. sim-safe also uses dlite.c.
4.2 Cache simulation
The SimpleScalar distribution comes with two functional cache simulators: sim-cache and sim-cheetah. Both use the file cache.c, and they use sim-cache.c and sim-cheetah.c, respectively. These simulators are ideal for fast simulation of caches if the effect of cache performance on execution time is not needed.
sim-cache accepts the following arguments, in addition to the universal arguments described in Section 4:
-cache:dl1 <config>  configures a level-one data cache.
-cache:dl2 <config>  configures a level-two data cache.
-cache:il1 <config>  configures a level-one instruction cache.
-cache:il2 <config>  configures a level-two instruction cache.
-tlb:dtlb <config>  configures the data TLB.
-tlb:itlb <config>  configures the instruction TLB.
-flush <boolean>  flushes all caches on a system call (<boolean> = 0 | 1 | true | TRUE | false | FALSE).
-icompress  remaps SimpleScalar's 64-bit instructions to a 32-bit equivalent in the simulation (i.e., models a machine with 4-byte instructions).
-pcstat <stat>  generates a text-based profile, as described in Section 4.3.
The cache configuration (<config>) is formatted as follows:
<name>:<nsets>:<bsize>:<assoc>:<repl>
Each of these fields has the following meaning:
<name>  cache name, must be unique.
<nsets>  number of sets in the cache.
<bsize>  block size (for TLBs, use the page size).
<assoc>  associativity of the cache (power of two).
<repl>  replacement policy (l | f | r), where l = LRU, f = FIFO, r = random replacement.
The cache size is therefore the product of <nsets>, <bsize>, and <assoc>. To have a unified level in the hierarchy, "point" the instruction cache to the name of the data cache in the corresponding level, as in the following example:
-cache:il1 il1:128:64:1:l -cache:il2 dl2
-cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
The defaults used in sim-cache are as follows:
L1 instruction cache: il1:256:32:1:l (8 KB)
L1 data cache: dl1:256:32:1:l (8 KB)
L2 unified cache: ul2:1024:64:4:l (256 KB)
instruction TLB: itlb:16:4096:4:l (64 entries)
data TLB: dtlb:32:4096:4:l (128 entries)
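The configuration format and the size rule (size is the product of <nsets>, <bsize>, and <assoc>) can be checked with a small C sketch. The parser below is our own illustration, not the parsing code in sim-cache.c; its struct and function names are hypothetical, and it deliberately rejects the "-cache:il2 dl2" pointer form, which names another cache instead of giving a configuration.

```c
#include <stdio.h>
#include <string.h>

/* Fields of a configuration string <name>:<nsets>:<bsize>:<assoc>:<repl>. */
struct cache_cfg {
    char name[32];
    unsigned nsets, bsize, assoc;
    char repl;                    /* 'l' = LRU, 'f' = FIFO, 'r' = random */
};

static int is_pow2(unsigned x) { return x != 0 && (x & (x - 1)) == 0; }

/* Returns 0 on success, -1 if the string is malformed. */
int parse_cache_cfg(const char *s, struct cache_cfg *c)
{
    char repl[8];

    if (sscanf(s, "%31[^:]:%u:%u:%u:%7s", c->name, &c->nsets,
               &c->bsize, &c->assoc, repl) != 5)
        return -1;
    if (strlen(repl) != 1 || strchr("lfr", repl[0]) == NULL)
        return -1;                /* unknown replacement policy */
    if (!is_pow2(c->assoc))
        return -1;                /* associativity must be a power of two */
    c->repl = repl[0];
    return 0;
}

/* Capacity in bytes: nsets * bsize * assoc. For a TLB, nsets * assoc
   is the number of entries. */
unsigned long cache_size(const struct cache_cfg *c)
{
    return (unsigned long)c->nsets * c->bsize * c->assoc;
}
```

For the defaults listed above, cache_size() gives 8192 bytes for dl1:256:32:1:l and 262144 bytes for ul2:1024:64:4:l, matching the 8 KB and 256 KB figures; for itlb:16:4096:4:l, nsets times assoc gives the 64 entries.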
sim-cheetah is based on work performed by Rabin Sugumar and Santosh Abraham while they were at the University of Michigan. It uses their Cheetah cache simulation engine [6] to generate simulation results for multiple cache configurations with a single simulation. The Cheetah engine simulates fully associative caches efficiently, as well as simulating a sometimes-optimal replacement policy. This policy was called MIN by Belady [1], although the simulator refers to it as opt. Opt uses future knowledge to select a replacement; it chooses the block that will be referenced the furthest in the future (if at all). This policy is optimal for read-only instruction streams. It is not optimal for write-back caches because it may be more expensive to replace a block referenced further in the future if the block must be written back, as opposed to a clean block referenced slightly less far in the future.
Horwitz et al. [3] formally described an optimal algorithm that includes writes; however, only MIN is implemented in the simulator.
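The opt (MIN) policy can be illustrated with a brute-force C sketch that counts misses for a single fully associative cache on a read-only reference stream. This is not how Cheetah works (Cheetah evaluates many configurations in one pass); the function name, the block-number trace representation, and the MAX_WAYS limit are our assumptions.

```c
#include <stddef.h>

#define MAX_WAYS 64   /* largest cache this sketch supports */

/* Count misses for a fully associative cache of `ways` blocks under
   Belady's MIN: on a miss with a full cache, evict the resident block
   whose next reference lies furthest in the future (or never comes).
   refs[] holds block numbers. Returns (size_t)-1 for unsupported sizes. */
size_t min_misses(const int *refs, size_t n, size_t ways)
{
    int cache[MAX_WAYS];
    size_t used = 0, misses = 0;

    if (ways == 0 || ways > MAX_WAYS)
        return (size_t)-1;

    for (size_t i = 0; i < n; i++) {
        size_t j;
        for (j = 0; j < used && cache[j] != refs[i]; j++)
            ;                                   /* search for a hit */
        if (j < used)
            continue;                           /* hit */
        misses++;
        if (used < ways) {                      /* miss with a free slot */
            cache[used++] = refs[i];
            continue;
        }
        size_t victim = 0, furthest = 0;
        for (j = 0; j < used; j++) {            /* find furthest next use */
            size_t next = i + 1;
            while (next < n && refs[next] != cache[j])
                next++;                         /* next == n means never */
            if (next > furthest) {
                furthest = next;
                victim = j;
            }
        }
        cache[victim] = refs[i];
    }
    return misses;
}
```

On the trace 1 2 3 4 1 2 5 1 2 3 4 5, min_misses returns 7 with three blocks and 6 with four.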
We have included the Cheetah engine as a stand-alone library,which is built and resides in thelibcheetah/ directory.sim-cheetah accepts the following command-line arguments, in addi-tion to those listed at the beginning of Section4:-refs [inst | data | uni ed]
specify which reference stream to analyze.
-C [fa | sa | dm]
fully associative, set associative, or direct-mapped cache.replacement policy.
log base 2 minimum bound on number ofsets to simulate simultaneously.
log base 2 maximum bound on set number.cache line size (in bytes).
maximum associativity to analyze (in logbase 2).
cache size interval to report when simulatingfully associative caches.
maximum cache size of interest.
cache size for direct-mapped analyses.
-pcstat <stat>
where <stat> is the integer counter that youwish to pro le by text address.
To generate the statistics for the pro le, follow the followingexample:
sim-profile -pcstat sim_num_insn test-math >&!
test-math.out
objdump -dl test-math >! test-math.distextprof.pl test-math.dis test-math.out
sim_num_insn_by_pc
-R [lru | opt]-a <sets>-b <sets>-l <line>-n <assoc>-in <interval>-M <size>-C <size>
We show a segment of the text pro le output in Figure4. Makesure that “objdump” is the version created when compiling thebinutils. Also, the rst line oftextprof.pl must be changedto re ect your system’s path to Perl (which must be installed onyour system for you to use this script). As an aside, note that “-taddrprof” is equivalent to “-pcstat sim_num_insn”.
4.4 Out-of-order processor timing simulation
The most complicated and detailed simulator in the distribu-tion, by far, issim-outorder (the main code le for which issim-outorder.c—about 3500 lines long). This simulatorsupports out-of-order issue and execution, based on the RegisterUpdate Unit [5]. The RUU scheme uses a reorder buffer to auto-matically rename registers and hold the results of pendinginstructions. Each cycle the reorder buffer retires completedinstructions in program order to the architected register le.
The processor’s memory system employs a load/store queue.Store values are placed in the queue if the store is speculative.Loads are dispatched to the memory system when the addressesof all previous stores are known. Loads may be satis ed either bythe memory system or by an earlier store value residing in thequeue, if their addresses match. Speculative loads may generatecache misses, but speculative TLB misses stall the pipeline untilthe branch condition is known.
We depict the simulated pipeline ofsim-outorder inFigure5. The main loop of the simulator, located insim_main(), is structured as follows:
ruu_init();for (;;) {
ruu_commit();ruu_writeback();lsq_refresh();ruu_issue();ruu_dispatch();ruu_fetch();}
Both of these simulators are ideal for performing high-levelcache studies that do not take access time of the caches intoaccount (e.g., studies that are concerned only with miss rates). Tomeasure the effect of cache organization upon the execution timeof real programs, however, the timing simulator described inSection4.4 must be used.
4.3 Pro ling
The distribution comes with a functional simulator that pro-duces voluminous and varied pro le information.sim-pro lecan generate detailed pro les on instruction classes andaddresses, text symbols, memory accesses, branches, and datasegment symbols.
sim-pro le takes the following command-line arguments,which toggle the various pro ling features:-iclassinstruction class pro ling (e.g. ALU,
branch).
-iprofinstruction pro ling (e.g., bnez, addi).-brprofbranch class pro ling (e.g., direct, calls, con-ditional).-amprofaddr. mode pro ling (e.g., displaced, R+R).-segprofload/store segment pro ling (e.g., data,
heap).
-tsymprofexecution pro le by text symbol (functions).-dsymprofreference pro le by data segment symbol.-taddrprofexecution pro le by text address.-allturn on all pro ling listed above.
Three of the simulators (sim-profile, sim-cache, and sim-outorder) support text segment profiles for statistical integer counters. The supported counters include any added by users, so long as they are correctly “registered” with the SimpleScalar stats package included with the simulator code (see Section 4.5). To use the counter profiles, simply add the command-line flag:
-pcstat <stat>
record statistic <stat> by text address.
This loop is executed once for each target (simulated) machine cycle. By walking the pipeline in reverse, inter-stage latch synchronization can be handled correctly with only one pass through each stage. When the target program terminates with an exit() system call, the simulator performs a longjmp() to main() to generate the statistics.
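The reverse walk can be illustrated with a toy four-stage pipeline. This is a sketch under stated assumptions, not sim-outorder's code: each latch holds a single instruction number, and `struct toy_pipe` and the stage functions are hypothetical. Walking back to front, every stage consumes what the previous cycle left in its input latch, so one pass per stage moves each instruction exactly one stage; walking front to back lets a new instruction fall through the whole pipeline in a single cycle.

```c
#include <assert.h>

#define EMPTY (-1)

/* Hypothetical toy pipeline: fetch -> decode -> execute -> commit.
   Each field is the latch feeding the next stage; it holds an
   instruction number or EMPTY. */
struct toy_pipe {
  int fetch_out;    /* latch between fetch and decode   */
  int decode_out;   /* latch between decode and execute */
  int exec_out;     /* latch between execute and commit */
  int next_inst;    /* next instruction number to fetch */
  int committed;    /* instructions retired so far      */
};

static void toy_init(struct toy_pipe *p)
{
  p->fetch_out = p->decode_out = p->exec_out = EMPTY;
  p->next_inst = 0;
  p->committed = 0;
}

/* Each stage drains its input latch into its output latch, but only if
   the output latch is free (a full latch means downstream stalled). */
static void commit_stage(struct toy_pipe *p)
{
  if (p->exec_out != EMPTY) { p->committed++; p->exec_out = EMPTY; }
}

static void execute_stage(struct toy_pipe *p)
{
  if (p->exec_out == EMPTY) { p->exec_out = p->decode_out; p->decode_out = EMPTY; }
}

static void decode_stage(struct toy_pipe *p)
{
  if (p->decode_out == EMPTY) { p->decode_out = p->fetch_out; p->fetch_out = EMPTY; }
}

static void fetch_stage(struct toy_pipe *p)
{
  if (p->fetch_out == EMPTY) p->fetch_out = p->next_inst++;
}

/* One simulated cycle walked in reverse, like the main loop above: every
   stage reads latch values produced in the previous cycle. */
static void cycle_reverse(struct toy_pipe *p)
{
  commit_stage(p); execute_stage(p); decode_stage(p); fetch_stage(p);
}

/* The same stages walked front to back: wrong, because a freshly fetched
   instruction races through every stage in one cycle. */
static void cycle_forward(struct toy_pipe *p)
{
  fetch_stage(p); decode_stage(p); execute_stage(p); commit_stage(p);
}
```

With the reverse walk the first instruction commits on the fourth cycle, as four stages require; with the forward walk it commits on the first.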
The fetch stage of the pipeline is implemented in ruu_fetch(). The fetch unit models the machine instruction bandwidth, and takes the following inputs: the program counter, the predictor state, and misprediction detection from the branch execution unit(s). Each cycle, it fetches instructions from only one I-cache line (and it blocks on an I-cache miss until the miss
00401a10: ( 13, 0.01): <strtod+220> addiu $a1[5],$zero[0],1    strtod.c:79
00401a18: ( 13, 0.01): <strtod+228> bc1f 00401a30 <strtod+240>    strtod.c:87
00401a20:            : <strtod+230> addiu $s1[17],$s1[17],1
00401a28:            : <strtod+238> j 00401a58 <strtod+268>    strtod.c:89
00401a30: ( 13, 0.01): <strtod+240> mul.d $f2,$f20,$f4
00401a38: ( 13, 0.01): <strtod+248> addiu $v0[2],$v1[3],-48
00401a40: ( 13, 0.01): <strtod+250> mtc1 $v0[2],$f0
Figure 4. Sample output from text segment statistical profile
Figure 5. Pipeline for sim-outorder
completes). After fetching the instructions, it places them in the dispatch queue, and probes the line predictor to obtain the correct cache line to access in the next cycle.
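Fetching from only one I-cache line per cycle limits bandwidth whenever the PC is not at the start of a line. A minimal sketch of that arithmetic follows. It assumes 8-byte instructions (consistent with the CPC+8 default next PC in the appendix) and a power-of-two line size. `fetch_count()` is a hypothetical helper, not part of ruu_fetch().

```c
#include <assert.h>

/* Assumption: 8-byte instructions, matching the CPC+8 default next PC. */
#define INST_BYTES 8u

/* Instructions the fetch stage can take this cycle when it may read from
   only one I-cache line: stop at the fetch width or at the end of the
   line containing pc, whichever comes first. */
static unsigned fetch_count(unsigned pc, unsigned line_bytes, unsigned fetch_width)
{
  /* Line size must be a power of two and a multiple of the instruction size. */
  assert(line_bytes % INST_BYTES == 0 && (line_bytes & (line_bytes - 1)) == 0);
  unsigned line_end = (pc & ~(line_bytes - 1)) + line_bytes;
  unsigned left_in_line = (line_end - pc) / INST_BYTES;
  return left_in_line < fetch_width ? left_in_line : fetch_width;
}
```

For example, with 32-byte lines and a fetch width of 4, a PC 16 bytes into a line yields only two instructions that cycle.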
The code for the dispatch stage of the pipeline resides in ruu_dispatch(). This routine is where instruction decoding and register renaming are performed. The function uses the instructions in the input queue filled by the fetch stage, a pointer to the active RUU, and the rename table. Once per cycle, the dispatcher takes as many instructions as possible (up to the dispatch width of the target machine) from the fetch queue and places them in the scheduler queue. This routine is the one in which branch mispredictions are noted. (When a misprediction occurs, the simulator uses speculative state buffers, which are managed with a copy-on-write policy.) The dispatch routine enters and links instructions into the RUU and the load/store queue (LSQ), as well as splitting memory operations into two separate instructions (the addition to compute the effective address and the memory operation itself).
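Register renaming at dispatch can be sketched as a rename table that maps each architected register to the RUU slot of its most recent in-flight producer. This illustration rests on several simplifying assumptions: there is no wraparound or slot reclamation, each instruction has two sources, and register 0 is never renamed. `struct renamer` and `rename_dispatch()` are hypothetical names, not sim-outorder's data structures. Sources are looked up before the destination is claimed, so an instruction such as r1 = r1 + r1 depends on the previous writer of r1 rather than on itself.

```c
#include <assert.h>

#define NUM_REGS 32
#define RUU_SIZE 16
#define NO_PRODUCER (-1)   /* value already in the register file */

/* Hypothetical RUU entry: which earlier slots produce its two sources. */
struct ruu_entry {
  int src_producer[2];
  int dest_reg;            /* architected destination, or -1 for none */
};

struct renamer {
  int rename_table[NUM_REGS];     /* latest in-flight producer per register */
  struct ruu_entry ruu[RUU_SIZE];
  int tail;                       /* next free slot (no wraparound here)    */
};

static void renamer_init(struct renamer *r)
{
  for (int i = 0; i < NUM_REGS; i++)
    r->rename_table[i] = NO_PRODUCER;
  r->tail = 0;
}

/* Dispatch one instruction; sources/dest of -1 mean "none". */
static int rename_dispatch(struct renamer *r, int src1, int src2, int dest)
{
  assert(r->tail < RUU_SIZE);
  assert(src1 < NUM_REGS && src2 < NUM_REGS && dest < NUM_REGS);
  int slot = r->tail++;
  struct ruu_entry *e = &r->ruu[slot];
  int srcs[2] = { src1, src2 };

  /* Read producers first, so a source that equals dest sees the old writer. */
  for (int i = 0; i < 2; i++)
    e->src_producer[i] = (srcs[i] > 0) ? r->rename_table[srcs[i]] : NO_PRODUCER;

  /* Then claim the destination (register 0 assumed hardwired to zero). */
  e->dest_reg = dest;
  if (dest > 0)
    r->rename_table[dest] = slot;
  return slot;
}
```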
The issue stage of the pipeline is contained in ruu_issue() and lsq_refresh(). These routines model instruction wakeup and issue to the functional units, tracking register and memory dependences. Each cycle, the scheduling routines locate the instructions for which the register inputs are all ready. The issue of ready loads is stalled if there is an earlier store with an unresolved effective address in the load/store queue. If the address of the earlier store matches that of the waiting load, the store value is forwarded to the load. Otherwise, the load is sent to the memory system.
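The load issue rule above can be sketched as a backward scan over older LSQ entries, where the nearest earlier store decides the outcome. This sketch makes three assumptions: entries are stored oldest first, addresses match only when exactly equal (access sizes and partial overlap are ignored), and `struct lsq_entry` and `lsq_check_load()` are hypothetical rather than taken from lsq_refresh().

```c
#include <assert.h>
#include <stdbool.h>

enum lsq_kind { LSQ_LOAD, LSQ_STORE };
enum load_action { LOAD_STALL, LOAD_FORWARD, LOAD_TO_MEMORY };

/* Hypothetical LSQ entry; the queue is kept in program order, oldest first. */
struct lsq_entry {
  enum lsq_kind kind;
  bool addr_known;   /* effective address computed yet? */
  unsigned addr;
  int value;         /* store data (stores only) */
};

/* Decide what the load at index ld may do this cycle. Older entries are
   scanned from youngest to oldest, so the nearest earlier store wins. */
static enum load_action lsq_check_load(const struct lsq_entry *lsq, int ld, int *fwd_value)
{
  assert(lsq[ld].kind == LSQ_LOAD && lsq[ld].addr_known);
  for (int i = ld - 1; i >= 0; i--) {
    if (lsq[i].kind != LSQ_STORE)
      continue;
    if (!lsq[i].addr_known)
      return LOAD_STALL;                 /* unresolved earlier store */
    if (lsq[i].addr == lsq[ld].addr) {
      *fwd_value = lsq[i].value;         /* store-to-load forwarding */
      return LOAD_FORWARD;
    }
  }
  return LOAD_TO_MEMORY;                 /* no conflict: go to the cache */
}
```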
The execute stage is also handled in ruu_issue(). Each cycle, the routine gets as many ready instructions as possible from the scheduler queue (up to the issue width). The functional units’ availability is also checked, and if they have available access ports, the instructions are issued. Finally, the routine schedules writeback events using the latency of the functional units (memory operations probe the data cache to obtain the correct latency of the operation). Data TLB misses stall the issue of the memory operation, are serviced in the commit stage of the pipeline, and currently assume a fixed latency. The functional units’ latencies are hardcoded in the definition of fu_config[] in sim-outorder.c.
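Scheduling writeback events by functional-unit latency amounts to keeping completions ordered by cycle. The sketch below uses a sorted singly linked list. The names `evq_schedule()` and `evq_pop_ready()` are hypothetical: they only stand in for the role this report ascribes to eventq.[c,h], and the latencies in the usage are made-up values.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical writeback event: an RUU slot that completes at `when`. */
struct wb_event {
  unsigned long when;
  int ruu_slot;
  struct wb_event *next;
};

/* Insert in completion-time order (after any events with the same time),
   so the earliest event is always at the head. `when` would be the issue
   cycle plus the functional unit's (or data cache's) latency. */
static void evq_schedule(struct wb_event **head, unsigned long when, int ruu_slot)
{
  struct wb_event *ev = malloc(sizeof *ev);
  assert(ev != NULL);
  ev->when = when;
  ev->ruu_slot = ruu_slot;
  while (*head && (*head)->when <= when)
    head = &(*head)->next;
  ev->next = *head;
  *head = ev;
}

/* Remove and return the RUU slot of the next event due by cycle `now`,
   or -1 if nothing has completed yet. */
static int evq_pop_ready(struct wb_event **head, unsigned long now)
{
  struct wb_event *ev = *head;
  if (!ev || ev->when > now)
    return -1;
  int slot = ev->ruu_slot;
  *head = ev->next;
  free(ev);
  return slot;
}
```

Each cycle, the writeback stage would call `evq_pop_ready()` until it returns -1.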
The writeback stage resides in ruu_writeback(). Each cycle it scans the event queue for instruction completions. When it finds a completed instruction, it walks the dependence chain of instruction outputs to mark instructions that are dependent on the completed instruction. If a dependent instruction is waiting only for that completion, the routine marks it as ready to be issued. The writeback stage also detects branch mispredictions; when it determines that a branch misprediction has occurred, it rolls the state back to the checkpoint, discarding the erroneously issued instructions.
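Wakeup through output dependence chains can be sketched as follows. Each producer keeps a list of consumer slots and each consumer keeps a count of the operands it is still waiting on. Completing a producer decrements its consumers' counts and marks any that reach zero as ready. The fixed-size consumer array, `struct wk_entry`, `add_dependence()`, and `writeback()` are hypothetical simplifications, and misprediction rollback is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CONSUMERS 4
#define RUU_SLOTS 16

/* Hypothetical RUU entry for wakeup; a fixed-size list of dependent
   slots stands in for a linked output dependence chain. */
struct wk_entry {
  int consumers[MAX_CONSUMERS];
  int n_consumers;
  int pending_inputs;   /* source operands still being produced */
  bool completed;
  bool ready;           /* all inputs available: may be issued  */
};

/* Record that `consumer` needs the result of `producer`. */
static void add_dependence(struct wk_entry *ruu, int producer, int consumer)
{
  assert(ruu[producer].n_consumers < MAX_CONSUMERS);
  ruu[producer].consumers[ruu[producer].n_consumers++] = consumer;
  ruu[consumer].pending_inputs++;
  ruu[consumer].ready = false;
}

/* Writeback of `slot`: walk its consumers; any consumer that was waiting
   only on this result becomes ready. Returns how many were woken. */
static int writeback(struct wk_entry *ruu, int slot)
{
  int woken = 0;
  ruu[slot].completed = true;
  for (int i = 0; i < ruu[slot].n_consumers; i++) {
    struct wk_entry *c = &ruu[ruu[slot].consumers[i]];
    assert(c->pending_inputs > 0);
    if (--c->pending_inputs == 0) {
      c->ready = true;
      woken++;
    }
  }
  return woken;
}
```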
ruu_commit() handles the instructions from the writeback stage that are ready to commit. This routine does in-order committing of instructions, updating of the data caches (or memory) with store values, and data TLB miss handling. The routine keeps retiring instructions at the head of the RUU that are ready to commit until the head instruction is one that is not ready. When an instruction is committed, its result is placed into the architected register file, and the RUU/LSQ resources devoted to that instruction are reclaimed.
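In-order retirement from the RUU head can be sketched with a circular buffer. The sketch assumes a hypothetical `struct commit_q` with an 8-entry capacity and no commit-width limit, matching the description above of retiring until the head is not ready. Register-file updates, store writes, and TLB handling are omitted.

```c
#include <assert.h>
#include <stdbool.h>

#define CQ_SIZE 8

/* Hypothetical circular RUU: slots from head onward, count of them,
   are in program order; done[] marks written-back instructions. */
struct commit_q {
  bool done[CQ_SIZE];
  int head;    /* oldest in-flight instruction */
  int count;   /* occupied slots */
};

/* Dispatch side: claim the next slot at the tail. */
static int cq_alloc(struct commit_q *q)
{
  assert(q->count < CQ_SIZE);
  int slot = (q->head + q->count) % CQ_SIZE;
  q->done[slot] = false;
  q->count++;
  return slot;
}

/* Retire completed instructions from the head, in program order, stopping
   at the first one that is not ready. Returns the number retired. */
static int cq_commit(struct commit_q *q)
{
  int retired = 0;
  while (q->count > 0 && q->done[q->head]) {
    q->done[q->head] = false;          /* reclaim the slot */
    q->head = (q->head + 1) % CQ_SIZE;
    q->count--;
    retired++;
  }
  return retired;
}
```

Younger instructions that finish early simply wait: nothing retires until the oldest one is done.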
sim-outorder runs about an order of magnitude slower than sim-fast. In addition to the arguments listed at the beginning of Section 4, sim-outorder uses the following command-line arguments:
Specifying the processor core
-fetch:ifqsize <size>
set the fetch width to be <size> instructions. Must be a power of two. The default is 4.
-fetch:speed <ratio>
set the ratio of the front end speed relative to the execution core (allowing <ratio> times as many instructions to be fetched as decoded per cycle).
-fetch:mplat <cycles>
set the branch misprediction latency. The default is 3 cycles.
-decode:width <insts>
set the decode width to be <insts>, which must be a power of two. The default is 4.
-issue:width <insts>
set the maximum issue width in a given cycle. Must be a power of two. The default is 4.
-issue:inorder
force the simulator to use in-order issue. The default is false.
-issue:wrongpath
allow instructions to issue after a misspeculation. The default is true.
-ruu:size <insts>
capacity of the RUU (in instructions). The default is 16.
-lsq:size <insts>
capacity of the load/store queue (in instructions). The default is 8.
-res:ialu <num>
specify number of integer ALUs. The default is 4.
-res:imult <num>
specify number of integer multipliers/dividers. The default is 1.
-res:memports <num>
specify number of L1 cache ports. The default is 2.
-res:fpalu <num>
specify number of floating point ALUs. The default is 4.
-res:fpmult <num>
specify number of floating point multipliers/dividers. The default is 1.
Specifying the memory hierarchy
All of the cache arguments and formats used in sim-cache (listed at the beginning of Section 4.2) are also used in sim-outorder, with the following additions:
-cache:dl1lat <cycles>
specify the hit latency of the L1 data cache. The default is 1 cycle.
-cache:dl2lat <cycles>
specify the hit latency of the L2 data cache. The default is 6 cycles.
-cache:il1lat <cycles>
specify the hit latency of the L1 instruction cache. The default is 1 cycle.
-cache:il2lat <cycles>
specify the hit latency of the L2 instruction cache. The default is 6 cycles.
-mem:lat <1st> <next>
specify main memory access latency (first, rest). The defaults are 18 cycles and 2 cycles.
-mem:width <bytes>
specify width of memory bus in bytes. The default is 8 bytes.
-tlb:lat <cycles>
specify latency (in cycles) to service a TLB miss. The default is 30 cycles.
Specifying the branch predictor
Branch prediction is specified by choosing the following flag with one of the six subsequent arguments. The default is a bimodal predictor with 2048 entries.
-bpred <type>
nottaken
always predict not taken.
taken
always predict taken.
perfect
perfect predictor.
bimod
bimodal predictor, using a branch target buffer (BTB) with 2-bit counters.
2lev
2-level adaptive predictor.
comb
combined predictor (bimodal and 2-level adaptive).
The predictor-specific arguments are listed below:
-bpred:bimod <size>
set the bimodal predictor table size to be <size> entries.
-bpred:2lev <l1size> <l2size> <hist_size> <xor>
specify the 2-level adaptive predictor. <l1size> specifies the number of entries in the first-level table, <l2size> specifies the number of entries in the second-level table, <hist_size> specifies the history width, and <xor> allows you to xor the history and the address in the second level of the predictor. This organization is depicted in Figure 6. In Table 2 we show how these parameters correspond to modern prediction schemes. The default settings for the four parameters are 1, 1024, 8, and 0, respectively.
-bpred:comb <size>
set the meta-table size of the combined predictor to be <size> entries. The default is 1024.
Figure 6. 2-level adaptive predictor structure (figure labels: pattern history, 2-bit predictors, branch, branch prediction)
-bpred:ras <size>
set the return stack size to <size> (0 entries means no return stack). The default is 8 entries.
-bpred:btb <sets> <assoc>
configure the BTB to have <sets> sets and an associativity of <assoc>. The defaults are 512 sets and an associativity of 4.
-bpred:spec_update <stage>
allow speculative updates of the branch predictor in the decode or writeback stages (<stage> = [ID|WB]). The default is non-speculative updates in the commit stage.
Visualization
-pcstat <stat>
record statistic <stat> by text address; described in Section 4.3.
-ptrace <file> <range>
pipeline tracing, described in Section 5.
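The -bpred:2lev parameters can be illustrated with a small predictor model. A first-level table of <l1size> history registers, each <hist_size> bits wide, indexes a second-level table of <l2size> 2-bit saturating counters, and <xor> decides whether the history is XORed with or concatenated to the branch address. This is a sketch, not bpred.c. The 3-bit address shift (8-byte instructions), the weakly not-taken initial counter value, the exact bit layout, and all names are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical 2-level adaptive predictor driven by the four parameters. */
struct twolev {
  unsigned l1size, l2size, hist_size;
  bool use_xor;
  unsigned *hist;          /* l1size branch-history shift registers */
  unsigned char *ctr;      /* l2size 2-bit saturating counters      */
};

static bool is_pow2(unsigned x) { return x != 0 && (x & (x - 1)) == 0; }

static void twolev_init(struct twolev *p, unsigned l1size, unsigned l2size,
                        unsigned hist_size, bool use_xor)
{
  assert(is_pow2(l1size) && is_pow2(l2size) && hist_size < 32);
  p->l1size = l1size;
  p->l2size = l2size;
  p->hist_size = hist_size;
  p->use_xor = use_xor;
  p->hist = calloc(l1size, sizeof *p->hist);
  p->ctr = malloc(l2size);
  assert(p->hist != NULL && p->ctr != NULL);
  for (unsigned i = 0; i < l2size; i++)
    p->ctr[i] = 1;                      /* start weakly not-taken */
}

static void twolev_free(struct twolev *p)
{
  free(p->hist);
  free(p->ctr);
}

/* First level: the branch address picks a history register. Second level:
   that history is XORed with the address (xor = 1) or concatenated with
   address bits above it (xor = 0), then masked to l2size entries. */
static unsigned twolev_index(const struct twolev *p, unsigned pc)
{
  unsigned a = pc >> 3;
  unsigned h = p->hist[a & (p->l1size - 1)];
  unsigned idx = p->use_xor ? (h ^ a) : (h | (a << p->hist_size));
  return idx & (p->l2size - 1);
}

/* Predict taken when the selected 2-bit counter is 2 or 3. */
static bool twolev_predict(const struct twolev *p, unsigned pc)
{
  return p->ctr[twolev_index(p, pc)] >= 2;
}

/* Train the counter the prediction used, then shift the outcome into the
   branch's history register, keeping only hist_size bits. */
static void twolev_update(struct twolev *p, unsigned pc, bool taken)
{
  unsigned char *c = &p->ctr[twolev_index(p, pc)];
  if (taken && *c < 3) (*c)++;
  if (!taken && *c > 0) (*c)--;
  unsigned *h = &p->hist[(pc >> 3) & (p->l1size - 1)];
  *h = ((*h << 1) | (taken ? 1u : 0u)) & ((1u << p->hist_size) - 1u);
}
```

With the default settings (1, 1024, 8, 0), an always-taken branch trains a different counter for each of its first eight histories. In this model it is therefore first predicted taken after nine updates.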
4.5 Simulator code file descriptions
The following list describes the functionality of the C code files in the simplesim-2.0/ directory, which are used by all of the simulators.
bitmap.h: Contains support macros for performing bitmap manipulation.
bpred.[c,h]: Handles the creation, functionality, and updates of the branch predictors. bpred_create(), bpred_lookup(), and bpred_update() are the key interface functions.
cache.[c,h]: Contains general functions to support multiple cache types (e.g., TLBs, instruction and data caches). Uses a linked-list for tag comparisons in caches of low associativity (less than or equal to four), and a hash table for tag comparisons in higher-associativity caches. The important interfaces are cache_create(), cache_access(), cache_probe(), cache_flush(), and cache_flush_addr().
dlite.[c,h]: Contains the code for DLite!, the source-level target program debugger.
endian.[c,h]: Defines a few simple functions to determine byte- and word-order on the host and target platforms.
eval.[c,h]: Contains code to evaluate expressions, used in DLite!.
eventq.[c,h]: Defines functions and macros to handle ordered event queues (used for ordering writebacks). The important interface functions are eventq_queue() and eventq_service_events().
loader.[c,h]: Loads the target program into memory, sets up the segment sizes and addresses, sets up the initial call stack, and obtains the target program entry point. The interface is ld_load_prog().
main.c: Performs all initialization and launches the main simulator function. The key functions are sim_options(), sim_config(), sim_main(), and sim_stats().
memory.[c,h]: Contains functions for reading from, writing to, initializing, and dumping the contents of the target main memory. Memory is implemented as a large flat space, each portion of which is allocated on demand. mem_access() is the important interface function.
misc.[c,h]: Contains numerous useful support functions, such as fatal(), panic(), warn(), info(), debug(), getcore(), and elapsed_time().
options.[c,h]: Contains the SimpleScalar options package code, used to process command-line arguments and/or option specifications from config files. Options are registered with an option database (see the functions called opt_reg_*()). opt_print_help() generates a help listing, and opt_print_options() prints the current options’ state.
ptrace.[c,h]: Contains code to collect and produce pipeline traces from sim-outorder.
range.[c,h]: Holds code that interprets program range commands used in DLite!.
regs.[c,h]: Contains functions to initialize the register files and dump their contents.
resource.[c,h]: Contains code to manage functional unit resources, divided up into classes. The three defined functions create the resource pools and busy tables (res_create_pool()), return a resource from the specified pool if available (res_get()), and dump the contents of a pool (res_dump()).
sim.h: Contains a few extern variable declarations and function prototypes.
stats.[c,h]: Contains routines to handle statistics measuring target program behavior. As with the options package, counters are “registered” by type with an internal database. The stat_reg_*() routines register counters of various types, and stat_reg_formula() allows you to register expressions constructed of other statistics. stat_print_stats() prints all registered statistics. The statistics package also has facilities to measure distributions; stat_reg_dist() creates an array distribution, stat_reg_sdist() creates a sparse array distribution, and stat_add_sample() updates a distribution.
ss.[c,h]: Defines macros to expedite the processing of instructions, numerous constants needed across simulators, and a function to print out individual instructions in a readable format.
ss.def: Holds a list of macro calls (the macros are defined in the simulators and ss.h and ss.c), each of which defines an instruction. The macro calls accept as arguments the opcode, name of the instruction, sources, destinations, actions to execute, and other information. This file serves as the definition of the instruction set.
symbol.[c,h]: Holds routines to handle program symbol and line information (used in DLite!).
syscall.[c,h]: Contains code that acts as the interface between the SimpleScalar system calls (which are POSIX-compliant) and the system calls on the host machine.
sysprobe.c: Determines byte and word order on the host platform, and generates appropriate compiler flags.
version.h: Defines the version number and release date of the distribution.
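The cache.[c,h] entry mentions a linked list for tag comparisons in low-associativity caches. A minimal sketch of that case follows, assuming LRU replacement and a per-set array kept in most-recently-used order as a stand-in for the linked list. The geometry constants and `cache_lookup()` are hypothetical and are not the cache_access() interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical cache geometry. */
#define NSETS 64
#define ASSOC 4
#define BLOCK_BYTES 32

/* Each set keeps its tags most-recently-used first, so the last way is
   always the LRU victim. */
struct lru_cache {
  unsigned tag[NSETS][ASSOC];
  bool valid[NSETS][ASSOC];
  unsigned long hits, misses;
};

/* Access addr; returns true on a hit. On a miss the LRU block is evicted. */
static bool cache_lookup(struct lru_cache *c, unsigned addr)
{
  unsigned block = addr / BLOCK_BYTES;
  unsigned set = block % NSETS;
  unsigned tag = block / NSETS;
  int way;

  /* Linear tag search, cheap at low associativity. */
  for (way = 0; way < ASSOC; way++)
    if (c->valid[set][way] && c->tag[set][way] == tag)
      break;

  bool hit = way < ASSOC;
  if (hit) {
    c->hits++;
  } else {
    c->misses++;
    way = ASSOC - 1;              /* victim: least recently used */
  }

  /* Move the accessed (or victim) way to the front: shift ways 0..way-1
     down one position, then install the tag as most recently used. */
  memmove(&c->tag[set][1], &c->tag[set][0], (size_t)way * sizeof c->tag[set][0]);
  memmove(&c->valid[set][1], &c->valid[set][0], (size_t)way * sizeof c->valid[set][0]);
  c->tag[set][0] = tag;
  c->valid[set][0] = true;
  return hit;
}
```

Higher associativities make this linear search expensive, which is why a hash table is used for them.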
The traces may be viewed with the pipeview.pl Perl script, which is provided in the simplesim-2.0 directory. (You will have to update the first line of pipeview.pl to have the correct path to your local Perl binary, and you must have Perl installed on your system.)
pipeview.pl <ptrace_file>
We depict sample output from the pipetracer in Figure 7.
5.2 The DLite! debugger
Release 2.0 of SimpleScalar includes a lightweight symbolic debugger called DLite!, which runs with all simulators except for sim-fast. DLite! allows you to step through the benchmark target code, not the simulator code. The debugger can be incorporated into a simulator by adding only four function calls (which have already been added to all simulators in the distribution). The needed four function prototypes are in dlite.h.
To use the debugger in a simulation, add the “-i” option (which stands for interactive) to the simulator command line. Below we list the set of commands that DLite! accepts.
Getting help and getting out:
help [string]
print command reference.
version
print DLite! version information.
quit
exit simulator.
terminate
generate statistics and exit simulator.
Running and setting breakpoints:
step
execute next instruction and break.
cont [addr]
continue execution (optionally continuing starting at <addr>).
break <addr>
set breakpoint at <addr>, returns <id> of breakpoint.
dbreak <addr> [r,w,x]
set data breakpoint at <addr> for (r)ead, (w)rite, and/or e(x)ecute, returns <id> of breakpoint.
rbreak <range> [r,w,x]
set breakpoint at <range> for (r)ead, (w)rite, and/or e(x)ecute, returns <id> of breakpoint.
breaks
list active code and data breakpoints.
delete <id>
delete breakpoint <id>.
clear
clear all breakpoints (code and data).
Printing information:
print [modifiers] <expr>
print the value of <expr> using optional modifiers.
display [modifiers] <expr>
display the value of <expr> using optional modifiers.
option <string>
print the value of option <string>.
options
print the values of all options.
stat <string>
print the value of a statistical variable.
stats
print the values of all statistical variables.
whatis <expr>
print the type of <expr>.
regs
print all register contents.
iregs
print all instruction register contents.
5 Utilities
In this section we describe the utilities that accompany the SimpleScalar tool set: pipeline tracing and a source-level debugger.
5.1 Out-of-order pipeline tracing
The tool set provides the ability to extract and view traces of the out-of-order pipeline. Using the “-ptrace” option, a detailed history of all instructions executed in a range may be saved to a file. The information saved includes instruction fetch, retirement, and stage transitions. The syntax of this command is as follows:
-ptrace <file> <start>:<end>
<file> is the file to which the trace will be saved. <start> and <end> are the instruction numbers at which the trace will be started and stopped. If they are left blank, the trace will start at the beginning and/or stop at the end of the program, respectively.
For example:
-ptrace FOO.trc 100:500
trace from instructions 100 to 500, store the trace in file FOO.trc.
-ptrace FOO.trc :10000
trace from program beginning to instruction 10000.
-ptrace FOO.trc :
trace the entire program execution.
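The default rules for a blank <start> or <end> can be captured in a few lines of C. This is a hypothetical parser, not the one in range.[c,h] or ptrace.[c,h]. It represents "end of program" as ULONG_MAX and rejects ranges whose start exceeds their end, both of which are assumptions.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Parse "<start>:<end>" instruction ranges such as "100:500", ":10000",
   or ":". A blank start means the program's beginning (0); a blank end
   means its end (ULONG_MAX here). Returns 0 on success, -1 if malformed. */
static int parse_range(const char *s, unsigned long *start, unsigned long *end)
{
  assert(s != NULL && start != NULL && end != NULL);
  const char *colon = strchr(s, ':');
  char *stop;
  if (colon == NULL)
    return -1;

  if (colon == s) {
    *start = 0;                          /* blank start */
  } else {
    *start = strtoul(s, &stop, 10);
    if (stop != colon)
      return -1;                         /* junk before the colon */
  }

  if (colon[1] == '\0') {
    *end = ULONG_MAX;                    /* blank end */
  } else {
    *end = strtoul(colon + 1, &stop, 10);
    if (*stop != '\0')
      return -1;                         /* junk after the end value */
  }
  return *start <= *end ? 0 : -1;
}
```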
Figure 7. Example of sim-outorder pipetrace. The example shows a new cycle indicator (@ 610), new instruction definitions (gf = ‘0x0040d098: addiu r2, r4, -1’ and gg = ‘0x0040d0a0: beq r3, r5, 0x30’), and the current pipeline state by stage: [IF] fetched, or in fetch queue; [DA] decoded, or awaiting issue; [EX] executing; [WB] results into RUU, or awaiting retire; [CT] results to register file. Pipeline events, such as a detected misprediction, are flagged next to the affected instruction.
fpregs
print all floating point register contents.
mstate [string]
print machine-specific state.
dump <addr> [count]
dump memory at <addr> (optionally for <count> words).
dis <addr> [count]
disassemble instructions at <addr> (optionally for <count> instructions).
symbols
print the value of all program symbols.
tsymbols
print the value of all program text symbols.
dsymbols
print the value of all program data symbols.
symbol <string>
print the value of symbol <string>.
Legal arguments:
Arguments <addr>, <cnt>, <expr>, and <id> are any legal expression:
<expr> ← <factor> +|- <expr>
<factor> ← <term> *|/ <factor>
<term> ← ( <expr> ) | - <term> | <const> | <symbol> | <file:loc>
<symbol> ← <literal> | <function name> | <register>
<literal> ← [0-9]+ | 0x[0-9,a-f]+ | 0[0-7]+
<register> ← $r[0-31] | $f[0-31] | $pc | $fcc | $hi | $lo
Legal ranges:
<range> ← <address> | <instruction> | <cycle>
<address> ← @<function name>:{+<literal>}
<instruction> ← {<literal>}:{<literal>}
<cycle> ← #{<literal>}:{<literal>}
Omitting optional arguments to the left of the colon will default to the smallest value permitted in that range. Omitting an optional argument at the right of the colon will default to the largest value permitted in that range.
Legal command modifiers:
b  print a byte
h  print a half (short)
w  print a word (default)
t  print in decimal format (default)
o  print in octal format
x  print in hex format
1  print in binary format
f  print float
d  print double
c  print character
s  print string
Examples of legal commands:
break main+8
break 0x400148
dbreak stdin w
dbreak sys_count wr
rbreak @main:+279
rbreak 2000:3500
rbreak #:100    (cycle 0 to cycle 100)
rbreak :    (entire execution)
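The arithmetic part of the expression grammar above can be handled by a recursive-descent evaluator. In this sketch, symbols, registers, and <file:loc> terms are not handled, `expr_eval()` and its helpers are hypothetical names, and operators of equal precedence are applied left to right. That last point is an assumption: the grammar as written is right-recursive, which taken literally would group 10 - 3 - 2 as 10 - (3 - 2). Literals use C's base-0 conversion, which accepts the decimal, 0x hex, and leading-0 octal forms of <literal>.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

struct parser { const char *p; int error; };

static long parse_expr(struct parser *ps);

static void skip_ws(struct parser *ps)
{
  while (isspace((unsigned char)*ps->p)) ps->p++;
}

/* <term>: ( <expr> ) | - <term> | <literal> */
static long parse_term(struct parser *ps)
{
  skip_ws(ps);
  if (*ps->p == '(') {
    ps->p++;
    long v = parse_expr(ps);
    skip_ws(ps);
    if (*ps->p == ')') ps->p++; else ps->error = 1;
    return v;
  }
  if (*ps->p == '-') { ps->p++; return -parse_term(ps); }
  if (isdigit((unsigned char)*ps->p)) {
    char *end;
    long v = strtol(ps->p, &end, 0);   /* base 0: 0x.. hex, 0.. octal */
    ps->p = end;
    return v;
  }
  ps->error = 1;
  return 0;
}

/* <factor>: terms joined by * or /, applied left to right */
static long parse_factor(struct parser *ps)
{
  long v = parse_term(ps);
  for (;;) {
    skip_ws(ps);
    char op = *ps->p;
    if (op != '*' && op != '/') return v;
    ps->p++;
    long rhs = parse_term(ps);
    if (op == '/' && rhs == 0) { ps->error = 1; return 0; }
    v = (op == '*') ? v * rhs : v / rhs;
  }
}

/* <expr>: factors joined by + or -, applied left to right */
static long parse_expr(struct parser *ps)
{
  long v = parse_factor(ps);
  for (;;) {
    skip_ws(ps);
    char op = *ps->p;
    if (op != '+' && op != '-') return v;
    ps->p++;
    long rhs = parse_factor(ps);
    v = (op == '+') ? v + rhs : v - rhs;
  }
}

/* Evaluate a whole string; returns 0 and stores the value on success. */
static int expr_eval(const char *s, long *out)
{
  assert(s != NULL && out != NULL);
  struct parser ps = { s, 0 };
  long v = parse_expr(&ps);
  skip_ws(&ps);
  if (ps.error || *ps.p != '\0') return -1;
  *out = v;
  return 0;
}
```

For example, "0x400148 + 8" evaluates to 0x400150, the kind of address arithmetic used in the break examples above.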
6 Summary
The SimpleScalar tool set was written by Todd Austin over about one and a half years, between 1994 and 1996. He continues to add improvements and updates. The ancestors of the tool set date back to the mid to late 1980s, to tools written by Manoj Franklin. At the time the tools were developed, both individuals were research assistants at the University of Wisconsin-Madison Computer Sciences Department, supervised by Professor Guri Sohi. Scott Breach provided valuable assistance with the implementation of the proxy system calls. The first release was assembled, debugged, and documented by Doug Burger, also a research assistant at Wisconsin, who is the maintainer of the second release as well. Kevin Skadron, currently at Princeton, implemented many of the more recent branch prediction mechanisms.
Many exciting extensions to SimpleScalar are both underway and planned. Efforts have begun to extend the processor simulators to simulate multithreaded processors and multiprocessors. A Linux port to SimpleScalar (enabling simulation of the OS on a kernel with publicly available sources) is planned, using device-level emulation and a user-level file system. Other plans include extending the tool set to simulate ISAs other than SimpleScalar and MIPS (Alpha and SPARC ISA support will be the first additions).
As they stand now, these tools provide researchers with a simulation infrastructure that is fast, flexible, and efficient. Changes in both the target hardware and software may be made with minimal effort. We hope that you find these tools useful, and encourage you to contact us with ways that we can improve the release, documentation, and the tools themselves.
References
[1] L.A. Belady. A Study of Replacement Algorithms for a Virtual-Storage Computer. IBM Systems Journal, 5(2):78–101, 1966.
[2] Doug Burger, Todd M. Austin, and Steven Bennett. Evaluating Future Microprocessors: the SimpleScalar Tool Set. Technical Report 1308, Computer Sciences Department, University of Wisconsin, Madison, WI, July 1996.
[3] L.P. Horwitz, R.M. Karp, R.E. Miller, and A. Winograd. Index Register Allocation. Journal of the ACM, 13(1):43–61, January 1966.
[4] Charles Price. MIPS IV Instruction Set, revision 3.1. MIPS Technologies, Inc., Mountain View, CA, January 1995.
[5] Gurindar S. Sohi. Instruction Issue Logic for High-Performance, Interruptible, Multiple Functional Unit, Pipelined Computers. IEEE Transactions on Computers, 39(3):349–359, March 1990.
[6] Rabin A. Sugumar and Santosh G. Abraham. Efficient Simulation of Caches under Optimal Replacement with Applications to Miss Characterization. In Proceedings of the 1993 ACM Sigmetrics Conference on Measurements and Modeling of Computer Systems, pages 24–35, May 1993.
A Instruction set definition
This appendix lists all SimpleScalar instructions with their opcode, assembler format, and semantics. The semantics are expressed as a C-style expression that uses the extended operators and operands described in Table 3. Operands that are not listed in Table 3 refer to actual instruction fields described in Figure 3. For each instruction, the next PC value (NPC) defaults to the current PC value plus eight (CPC+8) unless otherwise specified.
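A few of the semantics below can be rendered directly in C to show how the next PC and the HI/LO registers are computed. This is a hypothetical sketch. Field values are passed in already decoded, the function names are invented for illustration, and OFFSET is taken to be a signed 16-bit field (an assumption). MULT is modeled as the upper and lower halves of the 64-bit two's-complement product.

```c
#include <assert.h>
#include <stdint.h>

/* J: keep the top 4 bits of the current PC and splice in TARGET << 2. */
static uint32_t j_npc(uint32_t cpc, uint32_t target)
{
  return (cpc & 0xf0000000u) | (target << 2);
}

/* BEQ: a taken branch goes to CPC + 8 + (OFFSET << 2); otherwise the
   default next PC, CPC + 8, applies. The shift is written as a multiply
   so that negative offsets are well defined in C. */
static uint32_t beq_npc(uint32_t cpc, int32_t rs_val, int32_t rt_val, int16_t offset)
{
  if (rs_val == rt_val)
    return cpc + 8u + (uint32_t)((int32_t)offset * 4);
  return cpc + 8u;
}

/* MULT: HI receives the upper 32 bits and LO the lower 32 bits of the
   64-bit product of the two signed register values. */
static void mult_hilo(int32_t rs_val, int32_t rt_val, int32_t *hi, int32_t *lo)
{
  uint64_t prod = (uint64_t)((int64_t)rs_val * (int64_t)rt_val);
  *hi = (int32_t)(uint32_t)(prod >> 32);
  *lo = (int32_t)(uint32_t)prod;
}
```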
A.1 Control instructions
J:
Jump to absolute address.
Opcode: 0x01
Format: J target
Semantics: SET_NPC((CPC & 0xf0000000) | (TARGET<<2))
JAL:
Jump to absolute address and link.
Opcode: 0x02
Format: JAL target
Semantics: SET_NPC((CPC & 0xf0000000) | (TARGET<<2))
SET_GPR(31, CPC + 8)
JR:
Jump to register address.
Opcode: 0x03
Format: JR rs
Semantics: TALIGN(GPR(RS))
SET_NPC(GPR(RS))
JALR:
Jump to register address and link.
Opcode: 0x04
Format: JALR rs
Semantics: TALIGN(GPR(RS))
SET_GPR(RD, CPC + 8)
SET_NPC(GPR(RS))
BEQ:
Branch if equal.
Opcode: 0x05
Format: BEQ rs,rt,offset
Semantics: if (GPR(RS) == GPR(RT))
  SET_NPC(CPC + 8 + (OFFSET << 2))
else
  SET_NPC(CPC + 8)
BNE:
Branch if not equal.
Opcode: 0x06
Format: BNE rs,rt,offset
Semantics: if (GPR(RS) != GPR(RT))
  SET_NPC(CPC + 8 + (OFFSET << 2))
else
  SET_NPC(CPC + 8)
BLEZ:
Branch if less than or equal to zero.
Opcode: 0x07
Format: BLEZ rs,offset
Semantics: if (GPR(RS) <= 0)
  SET_NPC(CPC + 8 + (OFFSET << 2))
else
  SET_NPC(CPC + 8)
BGTZ:
Branch if greater than zero.
Opcode: 0x08
Format: BGTZ rs,offset
Semantics: if (GPR(RS) > 0)
  SET_NPC(CPC + 8 + (OFFSET << 2))
else
  SET_NPC(CPC + 8)
BLTZ:
Branch if less than zero.
Opcode: 0x09
Format: BLTZ rs,offset
Semantics: if (GPR(RS) < 0)
  SET_NPC(CPC + 8 + (OFFSET << 2))
else
  SET_NPC(CPC + 8)
BGEZ:
Branch if greater than or equal to zero.
Opcode: 0x0a
Format: BGEZ rs,offset
Semantics: if (GPR(RS) >= 0)
  SET_NPC(CPC + 8 + (OFFSET << 2))
else
  SET_NPC(CPC + 8)
BC1F:
Branch on floating point compare false.
Opcode: 0x0b
Format: BC1F offset
Semantics: if (!FCC)
  SET_NPC(CPC + 8 + (OFFSET << 2))
else
  SET_NPC(CPC + 8)
BC1T:
Branch on floating point compare true.
Opcode: 0x0c
Format: BC1T offset
Semantics: if (FCC)
  SET_NPC(CPC + 8 + (OFFSET << 2))
else
  SET_NPC(CPC + 8)
A.2 Load/store instructions
LB:
Load byte signed, displaced addressing.
Opcode: 0x20
Format: LB rt,offset(rs) inc_dec
Semantics: SET_GPR(RT, READ_SIGNED_BYTE(GPR(RS)+OFFSET))
LB:
Load byte signed, indexed addressing.
Opcode: 0xc0
Format: LB rt,(rs+rd) inc_dec
Semantics: SET_GPR(RT, READ_SIGNED_BYTE(GPR(RS)+GPR(RD)))
LBU:
Load byte unsigned, displaced addressing.
Opcode: 0x22
Format: LBU rt,offset(rs) inc_dec
Semantics: SET_GPR(RT, READ_UNSIGNED_BYTE(GPR(RS)+OFFSET))
LBU:
Load byte unsigned, indexed addressing.
Opcode: 0xc1
Format: LBU rt,(rs+rd) inc_dec
Semantics: SET_GPR(RT, READ_UNSIGNED_BYTE(GPR(RS)+GPR(RD)))
LH:
Load half signed, displaced addressing.
Opcode: 0x24
Format: LH rt,offset(rs) inc_dec
Semantics: SET_GPR(RT, READ_SIGNED_HALF(GPR(RS)+OFFSET))
LH:
Load half signed, indexed addressing.
Opcode: 0xc2
Format: LH rt,(rs+rd) inc_dec
Semantics: SET_GPR(RT, READ_SIGNED_HALF(GPR(RS)+GPR(RD)))
LHU:
Load half unsigned, displaced addressing.
Opcode: 0x26
Format: LHU rt,offset(rs) inc_dec
Semantics: SET_GPR(RT, READ_UNSIGNED_HALF(GPR(RS)+OFFSET))
LHU:
Load half unsigned, indexed addressing.
Opcode: 0xc3
Format: LHU rt,(rs+rd) inc_dec
Semantics: SET_GPR(RT, READ_UNSIGNED_HALF(GPR(RS)+GPR(RD)))
LW:
Load word, displaced addressing.
Opcode: 0x28
Format: LW rt,offset(rs) inc_dec
Semantics: SET_GPR(RT, READ_WORD(GPR(RS)+OFFSET))
LW:
Load word, indexed addressing.
Opcode: 0xc4
Format: LW rt,(rs+rd) inc_dec
Semantics: SET_GPR(RT, READ_WORD(GPR(RS)+GPR(RD)))
DLW:
Double load word, displaced addressing.
Opcode: 0x29
Format: DLW rt,offset(rs) inc_dec
Semantics: SET_GPR(RT, READ_WORD(GPR(RS)+OFFSET))
SET_GPR(RT+1, READ_WORD(GPR(RS)+OFFSET+4))
DLW:
Double load word, indexed addressing.
Opcode: 0xce
Format: DLW rt,(rs+rd) inc_dec
Semantics: SET_GPR(RT, READ_WORD(GPR(RS)+GPR(RD)))
SET_GPR(RT+1, READ_WORD(GPR(RS)+GPR(RD)+4))
L.S:
Load word into floating point register file, displaced addressing.
Opcode: 0x2a
Format: L.S ft,offset(rs) inc_dec
Semantics: SET_FPR_L(FT, READ_WORD(GPR(RS)+OFFSET))
L.S:
Load word into floating point register file, indexed addressing.
Opcode: 0xc5
Format: L.S ft,(rs+rd) inc_dec
Semantics: SET_FPR_L(FT, READ_WORD(GPR(RS)+GPR(RD)))
L.D:
Load double word into floating point register file, displaced addressing.
Opcode: 0x2b
Format: L.D ft,offset(rs) inc_dec
Semantics: SET_FPR_L(FT, READ_WORD(GPR(RS)+OFFSET))
SET_FPR_L(FT+1, READ_WORD(GPR(RS)+OFFSET+4))
L.D:
Load double word into floating point register file, indexed addressing.
Opcode: 0xcf
Format: L.D ft,(rs+rd) inc_dec
Semantics: SET_FPR_L(FT, READ_WORD(GPR(RS)+GPR(RD)))
SET_FPR_L(FT+1, READ_WORD(GPR(RS)+GPR(RD)+4))
LWL:
Load word left, displaced addressing.
Opcode: 0x2c
Format: LWL offset(rs)
Semantics: See ss.def for a detailed description of this instruction’s semantics. NOTE: LWL does not support pre-/post- inc/dec.
LWR:
Load word right, displaced addressing.
Opcode: 0x2d
Format: LWR offset(rs)
Semantics: See ss.def for a detailed description of this instruction’s semantics. NOTE: LWR does not support pre-/post- inc/dec.
SB:
Store byte, displaced addressing.
Opcode: 0x30
Format: SB rt,offset(rs) inc_dec
Semantics: WRITE_BYTE(GPR(RT), GPR(RS)+OFFSET)
SB:
Store byte, indexed addressing.
Opcode: 0xc6
Format: SB rt,(rs+rd) inc_dec
Semantics: WRITE_BYTE(GPR(RT), GPR(RS)+GPR(RD))
SH:
Store half, displaced addressing.
Opcode: 0x32
Format: SH rt,offset(rs) inc_dec
Semantics: WRITE_HALF(GPR(RT), GPR(RS)+OFFSET)
SH:
Store half, indexed addressing.
Opcode: 0xc7
Format: SH rt,(rs+rd) inc_dec
Semantics: WRITE_HALF(GPR(RT), GPR(RS)+GPR(RD))
SW:
Store word, displaced addressing.
Opcode: 0x34
Format: SW rt,offset(rs) inc_dec
Semantics: WRITE_WORD(GPR(RT), GPR(RS)+OFFSET)
SW:
Store word, indexed addressing.
Opcode: 0xc8
Format: SW rt,(rs+rd) inc_dec
Semantics: WRITE_WORD(GPR(RT), GPR(RS)+GPR(RD))
DSW:
Double store word, displaced addressing.
Opcode: 0x35
Format: DSW rt,offset(rs) inc_dec
Semantics: WRITE_WORD(GPR(RT), GPR(RS)+OFFSET)
WRITE_WORD(GPR(RT+1), GPR(RS)+OFFSET+4)
DSW:
Double store word, indexed addressing.
Opcode: 0xd0
Format: DSW rt,(rs+rd) inc_dec
Semantics: WRITE_WORD(GPR(RT), GPR(RS)+GPR(RD))
WRITE_WORD(GPR(RT+1), GPR(RS)+GPR(RD)+4)
DSZ:
Double store zero, displaced addressing.
Opcode: 0x38
Format: DSZ rt,offset(rs) inc_dec
Semantics: WRITE_WORD(0, GPR(RS)+OFFSET)
WRITE_WORD(0, GPR(RS)+OFFSET+4)
DSZ:
Double store zero, indexed addressing.
Opcode: 0xd1
Format: DSZ rt,(rs+rd) inc_dec
Semantics: WRITE_WORD(0, GPR(RS)+GPR(RD))
WRITE_WORD(0, GPR(RS)+GPR(RD)+4)
S.S:
Store word from floating point register file, displaced addressing.
Opcode: 0x36
Format: S.S ft,offset(rs) inc_dec
Semantics: WRITE_WORD(FPR_L(FT), GPR(RS)+OFFSET)
S.S:
Store word from floating point register file, indexed addressing.
Opcode: 0xc9
Format: S.S ft,(rs+rd) inc_dec
Semantics: WRITE_WORD(FPR_L(FT), GPR(RS)+GPR(RD))
S.D:
Store double word from floating point register file, displaced addressing.
Opcode: 0x37
Format: S.D ft,offset(rs) inc_dec
Semantics: WRITE_WORD(FPR_L(FT), GPR(RS)+OFFSET)
WRITE_WORD(FPR_L(FT+1), GPR(RS)+OFFSET+4)
S.D:
Store double word from floating point register file, indexed addressing.
Opcode: 0xd2
Format: S.D ft,(rs+rd) inc_dec
Semantics: WRITE_WORD(FPR_L(FT), GPR(RS)+GPR(RD))
WRITE_WORD(FPR_L(FT+1), GPR(RS)+GPR(RD)+4)
SWL:
Store word left, displaced addressing.
Opcode: 0x39
Format: SWL rt,offset(rs)
Semantics: See ss.def for a detailed description of this instruction’s semantics. NOTE: SWL does not support pre-/post- inc/dec.
SWR:
Store word right, displaced addressing.
Opcode: 0x3a
Format: SWR rt,offset(rs)
Semantics: See ss.def for a detailed description of this instruction’s semantics. NOTE: SWR does not support pre-/post- inc/dec.
A.3 Integer instructions
ADD:
Add signed (with overflow check).
Opcode: 0x40
Format: ADD rd,rs,rt
Semantics: OVER(GPR(RS),GPR(RT))
SET_GPR(RD, GPR(RS) + GPR(RT))
ADDI:
Add immediate signed (with overflow check).
Opcode: 0x41
Format: ADDI rt,rs,imm
Semantics: OVER(GPR(RS),IMM)
SET_GPR(RT, GPR(RS) + IMM)
ADDU:
Add unsigned (no overflow check).
Opcode: 0x42
Format: ADDU rd,rs,rt
Semantics: SET_GPR(RD, GPR(RS) + GPR(RT))
ADDIU:
Add immediate unsigned (no overflow check).
Opcode: 0x43
Format: ADDIU rt,rs,imm
Semantics: SET_GPR(RT, GPR(RS) + IMM)
SUB:
Subtract signed (with underflow check).
Opcode: 0x44
Format: SUB rd,rs,rt
Semantics: UNDER(GPR(RS),GPR(RT))
SET_GPR(RD, GPR(RS) - GPR(RT))
SUBU:
Subtract unsigned (without underflow check).
Opcode: 0x45
Format: SUBU rd,rs,rt
Semantics: SET_GPR(RD, GPR(RS) - GPR(RT))
MULT:
Multiply signed.
Opcode: 0x46
Format: MULT rs,rt
Semantics: SET_HI((GPR(RS) * GPR(RT)) / (1<<32))
SET_LO((GPR(RS) * GPR(RT)) % (1<<32))
MULTU:
Multiply unsigned.
Opcode: 0x47
Format: MULTU rs,rt
Semantics: SET_HI(((unsigned)GPR(RS) * (unsigned)GPR(RT)) / (1<<32))
SET_LO(((unsigned)GPR(RS) * (unsigned)GPR(RT)) % (1<<32))
DIV:
Divide signed.
Opcode: 0x48
Format: DIV rs,rt
Semantics: DIV0(GPR(RT))
SET_LO(GPR(RS) / GPR(RT))
SET_HI(GPR(RS) % GPR(RT))
DIVU: Divide unsigned.
Opcode: 0x49
Format: DIVU rs,rt
Semantics: DIV0(GPR(RT))
  SET_LO((unsigned)GPR(RS) / (unsigned)GPR(RT))
  SET_HI((unsigned)GPR(RS) % (unsigned)GPR(RT))

MFHI: Move from HI register.
Opcode: 0x4a
Format: MFHI rd
Semantics: SET_GPR(RD, HI)

MTHI: Move to HI register.
Opcode: 0x4b
Format: MTHI rs
Semantics: SET_HI(GPR(RS))

MFLO: Move from LO register.
Opcode: 0x4c
Format: MFLO rd
Semantics: SET_GPR(RD, LO)

MTLO: Move to LO register.
Opcode: 0x4d
Format: MTLO rs
Semantics: SET_LO(GPR(RS))

AND: Logical AND.
Opcode: 0x4e
Format: AND rd,rs,rt
Semantics: SET_GPR(RD, GPR(RS) & GPR(RT))

ANDI: Logical AND immediate.
Opcode: 0x4f
Format: ANDI rt,rs,uimm
Semantics: SET_GPR(RT, GPR(RS) & UIMM)

OR: Logical OR.
Opcode: 0x50
Format: OR rd,rs,rt
Semantics: SET_GPR(RD, GPR(RS) | GPR(RT))

ORI: Logical OR immediate.
Opcode: 0x51
Format: ORI rt,rs,uimm
Semantics: SET_GPR(RT, GPR(RS) | UIMM)

XOR: Logical XOR.
Opcode: 0x52
Format: XOR rd,rs,rt
Semantics: SET_GPR(RD, GPR(RS) ^ GPR(RT))

XORI: Logical XOR immediate.
Opcode: 0x53
Format: XORI rt,rs,uimm
Semantics: SET_GPR(RT, GPR(RS) ^ UIMM)

NOR: Logical NOR.
Opcode: 0x54
Format: NOR rd,rs,rt
Semantics: SET_GPR(RD, ~(GPR(RS) | GPR(RT)))
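The immediate forms use two spellings of the 16-bit immediate: IMM in ADDI and ADDIU, UIMM in ANDI, ORI and XORI. Assuming, as in MIPS, that IMM is sign-extended and UIMM zero-extended (this appendix does not spell it out), the difference shows up as soon as bit 15 of the field is set. A C sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* IMM (assumed): the 16-bit field sign-extended to 32 bits. */
static int32_t imm(uint16_t field)
{
    return (field & 0x8000u) ? (int32_t)field - 0x10000 : (int32_t)field;
}

/* UIMM (assumed): the same field zero-extended to 32 bits. */
static uint32_t uimm(uint16_t field) { return (uint32_t)field; }

/* ADDIU: SET_GPR(RT, GPR(RS) + IMM), wrapping modulo 2^32. */
static uint32_t addiu(uint32_t rs, uint16_t field) { return rs + (uint32_t)imm(field); }

/* ANDI: SET_GPR(RT, GPR(RS) & UIMM) */
static uint32_t andi(uint32_t rs, uint16_t field) { return rs & uimm(field); }
```

With the field 0xffff, ADDIU subtracts one, while ANDI keeps only the low 16 bits of the register.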
SLL: Shift left logical.
Opcode: 0x55
Format: SLL rd,rt,shamt
Semantics: SET_GPR(RD, GPR(RT) << SHAMT)

SLLV: Shift left logical variable.
Opcode: 0x56
Format: SLLV rd,rt,rs
Semantics: SET_GPR(RD, GPR(RT) << (GPR(RS) & 0x1f))

SRL: Shift right logical.
Opcode: 0x57
Format: SRL rd,rt,shamt
Semantics: SET_GPR(RD, GPR(RT) >> SHAMT)

SRLV: Shift right logical variable.
Opcode: 0x58
Format: SRLV rd,rt,rs
Semantics: SET_GPR(RD, GPR(RT) >> (GPR(RS) & 0x1f))

SRA: Shift right arithmetic.
Opcode: 0x59
Format: SRA rd,rt,shamt
Semantics: SET_GPR(RD, SEX(GPR(RT) >> SHAMT, 31 - SHAMT))

SRAV: Shift right arithmetic variable.
Opcode: 0x5a
Format: SRAV rd,rt,rs
Semantics: SET_GPR(RD, SEX(GPR(RT) >> (GPR(RS) & 0x1f), 31 - (GPR(RS) & 0x1f)))
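SRA and SRAV are written as a logical right shift followed by SEX, which this appendix does not define. Assuming SEX(X, B) sign-extends X from bit B, the original sign bit lands at bit 31 - SHAMT after the shift, so extending from that bit reproduces an arithmetic shift. A sketch in C using only unsigned arithmetic; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed SEX(x, b): copy bit b of x into bits b+1..31.
   Requires the bits of x above b to be zero, which a logical shift provides. */
static uint32_t sex(uint32_t x, unsigned b)
{
    uint32_t sign = (uint32_t)1 << b;   /* b must be in 0..31 */
    return (x ^ sign) - sign;           /* xor-subtract sign extension */
}

/* SRA: logical shift, then sign-extend from where the old sign bit landed. */
static uint32_t sra(uint32_t rt, unsigned shamt)
{
    shamt &= 0x1f;
    return sex(rt >> shamt, 31 - shamt);
}

/* SRAV: the shift amount is the low five bits of GPR(RS). */
static uint32_t srav(uint32_t rt, uint32_t rs) { return sra(rt, rs & 0x1f); }
```

The xor-subtract form leaves a clear bit B unchanged and, when bit B is set, borrows through every higher bit, filling them with ones.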
SLT: Set register if less than.
Opcode: 0x5b
Format: SLT rd,rs,rt
Semantics: SET_GPR(RD, (GPR(RS) < GPR(RT)) ? 1 : 0)

SLTI: Set register if less than immediate.
Opcode: 0x5c
Format: SLTI rt,rs,imm
Semantics: SET_GPR(RT, (GPR(RS) < IMM) ? 1 : 0)

SLTU: Set register if less than unsigned.
Opcode: 0x5d
Format: SLTU rd,rs,rt
Semantics: SET_GPR(RD, ((unsigned)GPR(RS) < (unsigned)GPR(RT)) ? 1 : 0)

SLTIU: Set register if less than unsigned immediate.
Opcode: 0x5e
Format: SLTIU rt,rs,imm
Semantics: SET_GPR(RT, ((unsigned)GPR(RS) < (unsigned)IMM) ? 1 : 0)
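SLT and SLTU compare the same register bits under two interpretations. A short C sketch with hypothetical names; flipping the sign bit before an unsigned comparison orders the values as signed 32-bit integers without relying on implementation-defined conversions.

```c
#include <assert.h>
#include <stdint.h>

/* SLT: compare register bit patterns as signed 32-bit values. */
static uint32_t slt(uint32_t rs, uint32_t rt)
{
    /* flipping bit 31 maps signed order onto unsigned order */
    return ((rs ^ 0x80000000u) < (rt ^ 0x80000000u)) ? 1u : 0u;
}

/* SLTU: compare the same bit patterns as unsigned values. */
static uint32_t sltu(uint32_t rs, uint32_t rt) { return (rs < rt) ? 1u : 0u; }
```

The pattern 0xffffffff is -1 to SLT (so less than 1) but the largest value to SLTU.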
A.4 Floating-point instructions
ADD.S: Add floating point, single precision.
Opcode: 0x70
Format: ADD.S fd,fs,ft
Semantics: FPALIGN(FD)
  FPALIGN(FS) FPALIGN(FT)
  SET_FPR_F(FD, FPR_F(FS) + FPR_F(FT))

ADD.D: Add floating point, double precision.
Opcode: 0x71
Format: ADD.D fd,fs,ft
Semantics: FPALIGN(FD) FPALIGN(FS) FPALIGN(FT)
  SET_FPR_D(FD, FPR_D(FS) + FPR_D(FT))

SUB.S: Subtract floating point, single precision.
Opcode: 0x72
Format: SUB.S fd,fs,ft
Semantics: FPALIGN(FD) FPALIGN(FS) FPALIGN(FT)
  SET_FPR_F(FD, FPR_F(FS) - FPR_F(FT))

SUB.D: Subtract floating point, double precision.
Opcode: 0x73
Format: SUB.D fd,fs,ft
Semantics: FPALIGN(FD) FPALIGN(FS) FPALIGN(FT)
  SET_FPR_D(FD, FPR_D(FS) - FPR_D(FT))

MUL.S: Multiply floating point, single precision.
Opcode: 0x74
Format: MUL.S fd,fs,ft
Semantics: FPALIGN(FD) FPALIGN(FS) FPALIGN(FT)
  SET_FPR_F(FD, FPR_F(FS) * FPR_F(FT))

MUL.D: Multiply floating point, double precision.
Opcode: 0x75
Format: MUL.D fd,fs,ft
Semantics: FPALIGN(FD) FPALIGN(FS) FPALIGN(FT)
  SET_FPR_D(FD, FPR_D(FS) * FPR_D(FT))

DIV.S: Divide floating point, single precision.
Opcode: 0x76
Format: DIV.S fd,fs,ft
Semantics: FPALIGN(FD) FPALIGN(FS) FPALIGN(FT)
  DIV0(FPR_F(FT))
  SET_FPR_F(FD, FPR_F(FS) / FPR_F(FT))

DIV.D: Divide floating point, double precision.
Opcode: 0x77
Format: DIV.D fd,fs,ft
Semantics: FPALIGN(FD) FPALIGN(FS) FPALIGN(FT)
  DIV0(FPR_D(FT))
  SET_FPR_D(FD, FPR_D(FS) / FPR_D(FT))
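Every floating-point operation above begins with FPALIGN on its register operands. The sketch below illustrates why, assuming (as in MIPS, and not spelled out in this appendix) that a double occupies an even/odd pair of 32-bit registers and that FPALIGN rejects odd register numbers. The union layout and the names fp_regs, fp_aligned, fpr_d, set_fpr_d and add_d are hypothetical; ss.def holds the simulator's real definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_FPR 32

/* The three views below only overlap correctly with 4-byte floats and 8-byte doubles. */
_Static_assert(sizeof(float) == 4 && sizeof(double) == 8, "unexpected FP sizes");

/* Hypothetical FP register file: d[n/2] overlays l[n] and l[n+1]. */
union fp_regs {
    uint32_t l[NUM_FPR];      /* raw words, as read by FPR_L */
    float    f[NUM_FPR];      /* singles, as read by FPR_F */
    double   d[NUM_FPR / 2];  /* doubles, as read by FPR_D */
};

/* FPALIGN analogue: accept only even, in-range register numbers. */
static bool fp_aligned(int n) { return n >= 0 && n < NUM_FPR && (n & 1) == 0; }

static double fpr_d(const union fp_regs *r, int n)
{
    assert(fp_aligned(n));
    return r->d[n >> 1];
}

static void set_fpr_d(union fp_regs *r, int n, double v)
{
    assert(fp_aligned(n));
    r->d[n >> 1] = v;
}

/* ADD.D fd,fs,ft: SET_FPR_D(FD, FPR_D(FS) + FPR_D(FT)) */
static void add_d(union fp_regs *r, int fd, int fs, int ft)
{
    set_fpr_d(r, fd, fpr_d(r, fs) + fpr_d(r, ft));
}
```

Writing a double to register 2 changes the raw words of registers 2 and 3, which is why an odd register number cannot name a double.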
ABS.S: Absolute value, single precision.
Opcode: 0x78
Format: ABS.S fd,fs
Semantics: FPALIGN(FD) FPALIGN(FS)
  SET_FPR_F(FD, fabs((double)FPR_F(FS)))

ABS.D: Absolute value, double precision.
Opcode: 0x79
Format: ABS.D fd,fs
Semantics: FPALIGN(FD) FPALIGN(FS)
  SET_FPR_D(FD, fabs(FPR_D(FS)))

MOV.S: Move floating point value, single precision.
Opcode: 0x7a
Format: MOV.S fd,fs
Semantics: FPALIGN(FD) FPALIGN(FS)
  SET_FPR_F(FD, FPR_F(FS))

MOV.D: Move floating point value, double precision.
Opcode: 0x7b
Format: MOV.D fd,fs
Semantics: FPALIGN(FD) FPALIGN(FS)
  SET_FPR_D(FD, FPR_D(FS))

NEG.S: Negate floating point value, single precision.
Opcode: 0x7c
Format: NEG.S fd,fs
Semantics: FPALIGN(FD) FPALIGN(FS)
  SET_FPR_F(FD, -FPR_F(FS))

NEG.D: Negate floating point value, double precision.
Opcode: 0x7d
Format: NEG.D fd,fs
Semantics: FPALIGN(FD) FPALIGN(FS)
  SET_FPR_D(FD, -FPR_D(FS))

CVT.S.D: Convert double precision to single precision.
Opcode: 0x80
Format: CVT.S.D fd,fs
Semantics: FPALIGN(FD) FPALIGN(FS)
  SET_FPR_F(FD, (float)FPR_D(FS))

CVT.S.W: Convert integer to single precision.
Opcode: 0x81
Format: CVT.S.W fd,fs
Semantics: FPALIGN(FD) FPALIGN(FS)
  SET_FPR_F(FD, (float)FPR_L(FS))

CVT.D.S: Convert single precision to double precision.
Opcode: 0x82
Format: CVT.D.S fd,fs
Semantics: FPALIGN(FD) FPALIGN(FS)
  SET_FPR_D(FD, (double)FPR_F(FS))

CVT.D.W: Convert integer to double precision.
Opcode: 0x83
Format: CVT.D.W fd,fs
Semantics: FPALIGN(FD) FPALIGN(FS)
  SET_FPR_D(FD, (double)FPR_L(FS))

CVT.W.S: Convert single precision to integer.
Opcode: 0x84
Format: CVT.W.S fd,fs
Semantics: FPALIGN(FD) FPALIGN(FS)
  SET_FPR_L(FD, (long)FPR_F(FS))

CVT.W.D: Convert double precision to integer.
Opcode: 0x85
Format: CVT.W.D fd,fs
Semantics: FPALIGN(FD) FPALIGN(FS)
  SET_FPR_L(FD, (long)FPR_D(FS))

C.EQ.S: Test if equal, single precision.
Opcode: 0x90
Format: C.EQ.S fs,ft
Semantics: FPALIGN(FS) FPALIGN(FT)
  SET_FCC(FPR_F(FS) == FPR_F(FT))

C.EQ.D: Test if equal, double precision.
Opcode: 0x91
Format: C.EQ.D fs,ft
Semantics: FPALIGN(FS) FPALIGN(FT)
  SET_FCC(FPR_D(FS) == FPR_D(FT))

C.LT.S: Test if less than, single precision.
Opcode: 0x92
Format: C.LT.S fs,ft
Semantics: FPALIGN(FS) FPALIGN(FT)
  SET_FCC(FPR_F(FS) < FPR_F(FT))

C.LT.D: Test if less than, double precision.
Opcode: 0x93
Format: C.LT.D fs,ft
Semantics: FPALIGN(FS) FPALIGN(FT)
  SET_FCC(FPR_D(FS) < FPR_D(FT))

C.LE.S: Test if less than or equal, single precision.
Opcode: 0x94
Format: C.LE.S fs,ft
Semantics: FPALIGN(FS) FPALIGN(FT)
  SET_FCC(FPR_F(FS) <= FPR_F(FT))

C.LE.D: Test if less than or equal, double precision.
Opcode: 0x95
Format: C.LE.D fs,ft
Semantics: FPALIGN(FS) FPALIGN(FT)
  SET_FCC(FPR_D(FS) <= FPR_D(FT))

SQRT.S: Square root, single precision.
Opcode: 0x96
Format: SQRT.S fd,fs
Semantics: FPALIGN(FD) FPALIGN(FS)
  SET_FPR_F(FD, sqrt((double)FPR_F(FS)))

SQRT.D: Square root, double precision.
Opcode: 0x97
Format: SQRT.D fd,fs
Semantics: FPALIGN(FD) FPALIGN(FS)
  SET_FPR_D(FD, sqrt(FPR_D(FS)))
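Because the conversion and comparison semantics are written as C expressions, C's rules settle their edge cases: a cast from floating point to an integer truncates toward zero (the result is undefined if it does not fit), and every ordered comparison with a NaN operand is false, so C.EQ, C.LT and C.LE all clear the condition flag for a NaN. The sketch assumes the simulator evaluates these expressions with IEEE 754 host arithmetic; the function names are hypothetical and int32_t stands in for the (long) cast.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* CVT.W.D analogue: the C cast truncates toward zero (x must fit in 32 bits). */
static int32_t cvt_w_d(double x) { return (int32_t)x; }

/* C.EQ.D, C.LT.D, C.LE.D analogues: the value SET_FCC would record. */
static bool c_eq_d(double a, double b) { return a == b; }
static bool c_lt_d(double a, double b) { return a < b; }
static bool c_le_d(double a, double b) { return a <= b; }
```

So CVT.W.D maps -2.7 to -2, not -3, and a NaN compares neither equal to, less than, nor less than or equal to anything.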
A.5 Miscellaneous instructions
NOP: No operation.
Opcode: 0x00
Format: NOP
Semantics: None

SYSCALL: System call.
Opcode: 0xa0
Format: SYSCALL
Semantics: See Appendix B for details.

BREAK: Declare a program error.
Opcode: 0xa1
Format: BREAK uimm
Semantics: Actions are simulator-dependent. Typically, an error message is printed and abort() is called.

LUI: Load upper immediate.
Opcode: 0xa2
Format: LUI rt,uimm
Semantics: SET_GPR(RT, UIMM << 16)

MFC1: Move from floating point to integer register file.
Opcode: 0xa3
Format: MFC1 rt,fs
Semantics: SET_GPR(RT, FPR_L(FS))

MTC1: Move from integer to floating point register file.
Opcode: 0xa5
Format: MTC1 rt,fs
Semantics: SET_FPR_L(FS, GPR(RT))
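LUI fills only the upper half of a register, so a full 32-bit constant can be built by following it with ORI, whose zero-extended immediate leaves the upper half untouched. Whether the SimpleScalar compiler emits exactly this pair is not stated here; the sketch only checks that the two semantics compose. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* LUI: SET_GPR(RT, UIMM << 16) */
static uint32_t lui(uint16_t uimm) { return (uint32_t)uimm << 16; }

/* ORI: SET_GPR(RT, GPR(RS) | UIMM) */
static uint32_t ori(uint32_t rs, uint16_t uimm) { return rs | uimm; }

/* Materialize a 32-bit constant with a LUI/ORI pair. */
static uint32_t load_const(uint32_t value)
{
    uint32_t upper = lui((uint16_t)(value >> 16));
    return ori(upper, (uint16_t)(value & 0xffffu));
}
```

Using ADDIU instead of ORI for the low half would require adjusting the upper half whenever bit 15 is set, because IMM is sign-extended.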
B System call definitions
This appendix lists all system calls supported by the simulators with their system call code (syscode), interface specification, and appropriate POSIX Unix reference. System calls are initiated with the SYSCALL instruction. Prior to execution of a SYSCALL instruction, register $v0 should be loaded with the system call code. The arguments of the system call interface prototype should be loaded into registers $a0-$a3 in the order specified by the system call interface prototype, e.g., for:
read(int fd, char *buf, int nbyte),
0x03 is loaded into $v0, fd is loaded into $a0, buf into $a1, and nbyte into $a2.

EXIT: Exit process.
Syscode: 0x01
Interface: void exit(int status);
Semantics: See exit(2).

READ: Read from file to buffer.
Syscode: 0x03
Interface: int read(int fd, char *buf, int nbyte);
Semantics: See read(2).

WRITE: Write from a buffer to a file.
Syscode: 0x04
Interface: int write(int fd, char *buf, int nbyte);
Semantics: See write(2).

OPEN: Open a file.
Syscode: 0x05
Interface: int open(char *fname, int flags, int mode);
Semantics: See open(2).

CLOSE: Close a file.
Syscode: 0x06
Interface: int close(int fd);
Semantics: See close(2).

CREAT: Create a file.
Syscode: 0x08
Interface: int creat(char *fname, int mode);
Semantics: See creat(2).

UNLINK: Delete a file.
Syscode: 0x0a
Interface: int unlink(char *fname);
Semantics: See unlink(2).

CHDIR: Change process directory.
Syscode: 0x0c
Interface: int chdir(char *path);
Semantics: See chdir(2).

CHMOD: Change file permissions.
Syscode: 0x0f
Interface: int chmod(char *fname, int mode);
Semantics: See chmod(2).

CHOWN: Change file owner and group.
Syscode: 0x10
Interface: int chown(char *fname, int owner, int group);
Semantics: See chown(2).

BRK: Change process break address.
Syscode: 0x11
Interface: int brk(long addr);
Semantics: See brk(2).

LSEEK: Move file pointer.
Syscode: 0x13
Interface: long lseek(int fd, long offset, int whence);
Semantics: See lseek(2).

GETPID: Get process identifier.
Syscode: 0x14
Interface: int getpid(void);
Semantics: See getpid(2).

GETUID: Get user identifier.
Syscode: 0x18
Interface: int getuid(void);
Semantics: See getuid(2).

ACCESS: Determine accessibility of a file.
Syscode: 0x21
Interface: int access(char *fname, int mode);
Semantics: See access(2).

STAT: Get file status.
Syscode: 0x26
Interface:
struct stat {
  short st_dev;
  long st_ino;
  unsigned short st_mode;
  short st_nlink;
  short st_uid;
  short st_gid;
  short st_rdev;
  int st_size;
  int st_atime;
  int st_spare1;
  int st_mtime;
  int st_spare2;
  int st_ctime;
  int st_spare3;
  long st_blksize;
  long st_blocks;
  long st_gennum;
  long st_spare4;
};
int stat(char *fname, struct stat *buf);
Semantics: See stat(2).

LSTAT: Get file status (and don't dereference links).
Syscode: 0x28
Interface: int lstat(char *fname, struct stat *buf);
Semantics: See lstat(2).

DUP: Duplicate a file descriptor.
Syscode: 0x29
Interface: int dup(int fd);
Semantics: See dup(2).

PIPE: Create an interprocess communication channel.
Syscode: 0x2a
Interface: int pipe(int fd[2]);
Semantics: See pipe(2).

GETGID: Get group identifier.
Syscode: 0x2f
Interface: int getgid(void);
Semantics: See getgid(2).

IOCTL: Device control interface.
Syscode: 0x36
Interface: int ioctl(int fd, int request, char *arg);
Semantics: See ioctl(2).
FSTAT: Get file descriptor status.
Syscode: 0x3e
Interface: int fstat(int fd, struct stat *buf);
Semantics: See fstat(2).

GETPAGESIZE: Get page size.
Syscode: 0x40
Interface: int getpagesize(void);
Semantics: See getpagesize(2).

GETDTABLESIZE: Get file descriptor table size.
Syscode: 0x59
Interface: int getdtablesize(void);
Semantics: See getdtablesize(2).

DUP2: Duplicate a file descriptor.
Syscode: 0x5a
Interface: int dup2(int fd1, int fd2);
Semantics: See dup2(2).

FCNTL: File control.
Syscode: 0x5c
Interface: int fcntl(int fd, int cmd, int arg);
Semantics: See fcntl(2).

SELECT: Synchronous I/O multiplexing.
Syscode: 0x5d
Interface: int select(int width, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);
Semantics: See select(2).

GETTIMEOFDAY: Get the date and time.
Syscode: 0x74
Interface:
struct timeval {
  long tv_sec;
  long tv_usec;
};
struct timezone {
  int tz_minuteswest;
  int tz_dsttime;
};
int gettimeofday(struct timeval *tp, struct timezone *tzp);
Semantics: See gettimeofday(2).

WRITEV: Write output, vectored.
Syscode: 0x79
Interface: int writev(int fd, struct iovec *iov, int cnt);
Semantics: See writev(2).

UTIMES: Set file times.
Syscode: 0x8a
Interface: int utimes(char *file, struct timeval *tvp);
Semantics: See utimes(2).

GETRLIMIT: Get maximum resource consumption.
Syscode: 0x90
Interface: int getrlimit(int res, struct rlimit *rlp);
Semantics: See getrlimit(2).

SETRLIMIT: Set maximum resource consumption.
Syscode: 0x91
Interface: int setrlimit(int res, struct rlimit *rlp);
Semantics: See setrlimit(2).
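The register convention at the start of this appendix can be illustrated from the simulator's side. A minimal sketch, assuming MIPS register numbering ($v0 = r2, $a0-$a2 = r4-r6), a tiny flat simulated memory, and that the result is returned in $v0 (this appendix does not state the return convention). struct sim, do_syscall and the SYS_* constants are hypothetical names, not SimpleScalar's own; only the syscode values come from the table above.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Register numbers of $v0 and $a0-$a2 under the MIPS naming convention. */
enum { REG_V0 = 2, REG_A0 = 4, REG_A1 = 5, REG_A2 = 6 };

/* A few syscodes from the table in this appendix. */
enum { SYS_WRITE = 0x04, SYS_GETPID = 0x14, SYS_GETPAGESIZE = 0x40 };

/* Hypothetical simulated state: integer registers plus a small flat memory. */
struct sim {
    uint32_t gpr[32];
    uint8_t mem[256];
};

/* Hypothetical SYSCALL handler: syscode in $v0, arguments in $a0..$a2. */
static void do_syscall(struct sim *s)
{
    int32_t result;
    switch (s->gpr[REG_V0]) {
    case SYS_WRITE: {
        /* write(int fd, char *buf, int nbyte): buf is a simulated address */
        uint32_t fd = s->gpr[REG_A0], buf = s->gpr[REG_A1], n = s->gpr[REG_A2];
        assert(buf <= sizeof s->mem && n <= sizeof s->mem - buf);
        result = (int32_t)write((int)fd, &s->mem[buf], n);
        break;
    }
    case SYS_GETPID:
        result = (int32_t)getpid();
        break;
    case SYS_GETPAGESIZE:
        result = (int32_t)sysconf(_SC_PAGESIZE);  /* host page size */
        break;
    default:
        result = -1;  /* syscode not handled by this sketch */
        break;
    }
    s->gpr[REG_V0] = (uint32_t)result;  /* assumed return register */
}
```

The key design point is that pointer arguments such as buf arrive as simulated addresses and must be translated into host memory before the host call is made.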