There are a number of programs that you can use to create your source files. Emacs is probably the most popular, but it can bury you in its rich set of features. vi is also available, but its command syntax is difficult to learn and use. Finally, if you subscribe to the pine program, you can use the pico editor, which combines many features of Emacs into a simple menu-driven facility.
It's okay to use Windows-based text editors, like Notepad and WordPad. However, when looking at these same files in Emacs, you may notice a ^M at the end of each line. This is the carriage-return character that these applications put at the end of every line. If you're bothered by these extra characters in Emacs, you can use an FTP client program to remove them during file transfer, or the Unix command "dos2unix" can help; see its manual page via the command "man dos2unix".
Regardless, you will need to learn to use your editor. Because we use Emacs, we can tell you how to access the built-in tutorial facility. Start Emacs with "emacs sourcefile.s", which creates the file called sourcefile.s. Once there, you can type in your program. To learn how to use the control-keys for movement, deletion, insertion, search and replace, etc. you can use the tutorial, accessed by typing "Ctrl-H Ctrl-H t". For other editors, you are on your own.
The assembler is called "isem-as", and is the GNU Assembler (GAS), configured to cross-assemble to a SPARC object format. It is used to take your source code, and produce object code that may be linked and run on the ISEM emulator. The syntax for invoking the assembler is:
isem-as [-a[ls]] sourcefile.s [-o objectfile.o] [>! listing.lst]
The input is read from sourcefile.s, and the object code is written to objectfile.o. The option "-a" tells the assembler to produce a listing; the sub-option "l" includes the assembly source in the listing, and "s" includes a symbol table. The listing is written to standard output, so the ">! listing.lst" redirection saves it in listing.lst. You can view the listing file by issuing the command "more listing.lst".
The listing file will contain several important pieces of information. First, it will identify all the syntactic errors in your program, and it will warn you if it identifies "suspicious" behavior in your source file (see Assemble Time Errors for more detail). The following is an example of a listing file:
SPARC GAS faq.s                 page 1
   1                            .data
   2 0000 0000000A      a:      .word 10
   3 0004 FFFFFFFF      b:      .word -1
   4                            .text
   5                    start:
   6 0000 11000000              set a,%r8
   6      90122000
   7
   8 0008 13000000              set b,%r9
   8      92126000
   9
  10 0010 D2024000              ld [%r9],%r9
  11                    mkcall:
  12 0014 40000003              call subrtn
  13 0018 01000000              nop
  14                    cleanup:
  15 001c 91D02000              ta 0
  16
  17                    subrtn:
  18 0020 D2220000              st %r9,[%r8]
  19                    return:
  20 0024 81C3E008              retl
  21 0028 01000000              nop
  21      01000000
SPARC GAS faq.s                 page 2
DEFINED SYMBOLS
    faq.s:2     .data:00000000 a
    faq.s:3     .data:00000004 b
    faq.s:5     .text:00000000 start
    faq.s:11    .text:00000014 mkcall
    faq.s:17    .text:00000020 subrtn
    faq.s:14    .text:0000001c cleanup
    faq.s:19    .text:00000024 return
NO UNDEFINED SYMBOLS
The listing is divided into columns. The leftmost column identifies a line number in your source file. The next column is the offset at which this instruction or data resides within its section. Later, when the program is actually loaded into memory, these offsets will be added to the real byte address of the start of .text (usually 0x2020) and .data (usually 0x4000). Hence, the symbol "b" above would be at address 0x4004; similarly, "subrtn" would be at address 0x2040. The next column is the image of what is put in memory, either the machine instructions or the representation of the data. The final column is the source code that produced the line.
At the bottom of the file you will find the symbol table. Again, the symbols are represented as offsets that are relocated when the program is loaded into memory. (The assembler has a special convention that, unless stated otherwise with a .global directive, all and only those symbols beginning with a capital 'L' are considered "Local symbols". Local symbols are not listed in the symbol table and are not made available to the linker, to the debugger, or to the emulator. Consequently, you should avoid using capital 'L' as the first character of any of your labels. Alternatively, you could list in a .global directive each symbol that begins with a capital 'L'. The best practice may be to use a separate .global directive for each of those symbols, on the line immediately preceding the symbol's definition.)
Any symbols that your program uses but does not define will be listed under the heading "UNDEFINED SYMBOLS".
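The relocation arithmetic described above can be checked with a few lines of Python. This is only a sketch: the offsets are copied from the sample listing, and the two base addresses are assumptions (0x2020 for .text, matching the initial PC that the emulator prints after loading, and 0x4000 for .data). The names TEXT_BASE, DATA_BASE, and relocated_address are made up for this illustration.

```python
# Assumed load addresses; verify them against the emulator's load message.
TEXT_BASE = 0x2020
DATA_BASE = 0x4000

# (section, offset) pairs copied from the symbol table of the sample listing.
SYMBOLS = {
    "a": ("data", 0x00),
    "b": ("data", 0x04),
    "start": ("text", 0x00),
    "mkcall": ("text", 0x14),
    "cleanup": ("text", 0x1C),
    "subrtn": ("text", 0x20),
    "return": ("text", 0x24),
}


def relocated_address(name):
    """Return the run-time address of a symbol: section base plus listing offset."""
    section, offset = SYMBOLS[name]
    base = TEXT_BASE if section == "text" else DATA_BASE
    return base + offset


# Print every symbol with its relocated address in hexadecimal.
for name in SYMBOLS:
    print(f"{name:8s} 0x{relocated_address(name):04x}")
```

Under these assumptions, "b" lands at 0x4004 and "subrtn" at 0x2040, the same addresses quoted in the text.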
You can get more information from the manual pages available via command "man as".
Examples:
% isem-as -als foo.s -o foo.o
    Assemble foo.s into foo.o, send listing to the screen.
% isem-as -o foo.o foo.s
    Assemble foo.s into foo.o, no listing.
% isem-as -als foo.s -o foo.o >! foo.lst
    Assemble foo.s into foo.o, redirect listing to foo.lst.
Linking turns one or more raw object files into an executable program. From the manual page, "ld combines a number of object and archive files, relocates their data and ties up symbol references. Often the last step in building a new compiled program to run is a call to ld." In essence, if you have several object files from several assembly source files, you can combine them into one executable using ld, and the separate files can reference symbols from one another. The output of the linker is an executable program. See "man ld" for more information. The syntax for the linker is as follows:
isem-ld objectfile.o [-o execfile]
Examples
% isem-ld foo.o -o foo
    Links foo.o into the executable foo.
% isem-ld foo.o
    Links foo.o into the executable a.out.
The heart of the ISEM package is the emulation of the SPARC hardware. Once you have assembled and linked your program, you can execute and test it in the emulation environment. The program "isem" is used to do this, and the majority of its features are covered in your lab manual. For the purposes of this FAQ, you invoke isem as follows:
isem [execfile]
Examples
% isem foo
    Invokes the emulator, loads the program foo
% isem
    Invokes the emulator, no program is loaded
Make sure that you have loaded a proper executable file. Look for a message like the following:
Loading File: lab0
2000 bytes loaded into Text region at address 8:2000
2000 bytes loaded into Data region at address a:4000
PC: 08:00002020 nPC: 00002024 PSR: 0000003e N:0 Z:0 V:0 C:0
start : sethi 0x10, %g2
ISEM>
On the other hand, if you see the message "Not a page aligned executable file", then you have attempted to load a wrong file. Perhaps it was the object file produced by the assembler and not an executable file produced by the linker. Another possible error in loading may cause either or both of the following messages to appear: "Unable to open file a.out for reading" and/or "Unable to Interpret Command". Your executable file's name should begin with a letter (not a digit character). You will need to fix either of these problems before proceeding.
Once you are in the emulator, you can run your program by typing "run" at the prompt. For advice on interactive debugging, see the section on General Debugging Strategies.
Subject: Something that may make life easier
Date: Thu, 30 May 2002 22:05:12 -0400
From: "Tim" <mcconnet@cis.ohio-stat.edu>
Organization: The Ohio State University Dept. of Computer and Info. Science
Newsgroups: cis.course.cis360
Modified: June 3, 2002, by Wayne Heym
This is a useful shell script that will simplify assembling and linking. Make a new file called 'isem-mk' (for 'isem make'), say with
% emacs isem-mk
and fill it with the following:
#! /bin/csh -f
set var = $argv[1]
isem-as -als ${var}.s -o ${var}.o >! ${var}.lst
isem-ld ${var}.o -o $var
---END FILE---
Make it executable by typing
% chmod u+x isem-mk
on the command line. Then (if you save your labs as labN.s etc.) you can type:
% isem-mk lab3
and it will assemble and link the lab completely and dump a labN.lst file with any errors, in addition to printing them on the screen. You can extend this script with
isem $var
if you're really confident. (The file described here is available at ~heym/bin/isem-mk.)
isem-as -als foo.s -o foo >! foo.lst
Above, the foo.lst file will contain the listing. You will find your error messages in this file, which you can inspect using either your editor, or "more", or "less".
The most common and hardest to debug errors are run time errors. One to be aware of is using a label that does not exist, or forgetting to put a '%' in front of a register reference like %sp (without the '%', the assembler treats the name as an ordinary symbol). This sort of error cannot be an assemble time error because the symbol might be defined in a separate object module that is later linked with this one. However, you can use the listing file to help you uncover such errors before run time. If any symbols are listed under the "UNDEFINED SYMBOLS" heading, change your source program until that heading becomes "NO UNDEFINED SYMBOLS". See "Other ways for all of the above to happen".
Some line that is interpreted as an executable statement (i.e., a non-comment, non-blank line) has an unrecognized mnemonic opcode. This error will be printed in the listing next to the offending statement. There are a couple of possible causes for this error:
A typographical mistake (e.g., misspelling a legitimate opcode)
A missing colon after a label
The parser for the assembly source code has been thrown off track due to a formatting error in your code. In this case, it is likely that you have forgotten commas between operands, have missing operands, or have incorrect operands for the opcode specified. Check the line on which this error is printed in the listing file.
An undefined symbol is one your program refers to (e.g., a label for a branch target, a symbolic constant, etc.) but does not define. You can use the listing file to help you uncover such errors before run time. If any symbols are listed under the "UNDEFINED SYMBOLS" heading, then those symbols are used in the program but are not defined in it. It is likely that you have mistyped a label, or neglected to label a line that is referenced as a branch target.
A related issue is the use of opcodes as labels. For instance, it is tempting to label a line inc that increments a counter in a loop. This is often a branch target, after all. However, inc is also an opcode, and this can confuse the assembler, and result in incorrect code generation.
This is merely a warning that an error condition may exist. It happens when you label the same line with two different labels. It is likely that you do not need one of the two labels. Check the symbol table in the listing file (or using symb in the interactive environment) to see which symbols have the same address, and delete one of them.
The problem is that if you have declared a label, say something like:
X: .word 20
Then in your code, suppose you say something like:
mov X, %r3
What you are trying to do is copy the value in the word labeled X into %r3. However, mov uses the address X itself as its operand, not the word stored there. The right way is to first get the address of X into a register (for example, with set X, %r1) and then load through that register (ld [%r1], %r3).
The reason the assembler accepts this without complaint is that the (unrelocated) address of X is typically small enough to fit in the 13 bits available for an immediate operand in the mov instruction, so it assembles using that address. Then the linker relocates the program, and the relocated address no longer fits in the 13 bits, so it "truncates the relocation to fit".
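To see why the address fits at assemble time but not after relocation, here is a small Python check. It is a sketch under assumptions: the 13-bit field is treated as a signed immediate (range -4096 to 4095), X is taken to be the first item in .data (offset 0), and .data is assumed to be relocated to 0x4000. The helper name fits_simm13 is invented for this example.

```python
def fits_simm13(value):
    """True if value fits in a 13-bit signed immediate field (-4096..4095)."""
    return -4096 <= value <= 4095


unrelocated = 0x0                 # hypothetical offset of X within .data
relocated = 0x4000 + unrelocated  # assumed .data base of 0x4000 after linking

print(fits_simm13(unrelocated))  # the assembler sees a value that fits
print(fits_simm13(relocated))    # the relocated value is too large
```

The first check succeeds and the second fails, which is the situation that the relocation message reports.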
There are several likely causes. One is using an undefined symbol as the target in a branch instruction. (See the section "Undefined symbols" in this, the "Assemble Time Errors", section of this document.) Another is using a symbolic constant defined using a .set assembler directive as the target in a branch instruction. Still another is having entirely left the target symbol out of a branch instruction. It would seem that the assembler should have complained about this last error because the branch instruction, then, would not conform to the assembler syntax.
The linker was expecting to relocate the target of the branch instruction, but the assembler did not provide the necessary and appropriate relocation information in the object file. The object file had allocated y pieces of relocation information, but the linker was only able to appropriately use x of those.
Make sure every branch instruction has a target symbol that is, in fact, a label in a text section of the source file.
Loading or storing data from or to a location referred to directly by a label, say "m", can cause this error:
m: .word 6
.text
...
ld [m], %r2 !%r2 = [m]
Before you send e-mail to your instructor asking for help finding a bug, make a determined effort to track it down using these facilities. For example, if you think you have an infinite loop, set a breakpoint at the top of the loop, and trace the execution of each instruction, making sure that the loop makes progress towards its goal during any iteration (i.e., it increments or decrements a counter, and compares and branches based on this counter).
Breakpoints allow you to stop execution of the program when certain addresses are reached. The breakpoint facility functions during trace, step, and execution modes. Breakpoints can be used to stop execution near where you expect to find a bug, and allow you to interact with the emulation environment, display memory contents, registers, etc. You can either break at a hard address or at any labeled line (because a label is an address). (Exception: if the line's label begins with a capital 'L', the emulator will have no information about that label. The assembler did not provide this information because the assembler considers all such labels to be "Local symbols", not to be advertised externally.) The syntax is as follows:
break address
or
b address
Examples:
ISEM> break
    Lists the currently set breakpoints
ISEM> b
    Lists the currently set breakpoints
ISEM> b 8:0x202c
    Sets breakpoint at address space 8 (user text) and address 0x202c.
ISEM> b -d 8:0x202c
    Removes the breakpoint
ISEM> b _bcopy
    Sets a breakpoint at the routine '_bcopy'
The watch command allows you to watch specific registers or memory locations. Whenever the value of a watched register or memory location changes, the watchpoint is printed out.
The watch command without any arguments lists all current watchpoints. The command followed by a memory or register reference sets a watchpoint. The command with the '-d' option followed by a register or memory reference will clear the existing watchpoint. The watch command will also accept -b and -h options which allow setting watchpoints for bytes and halfwords. The syntax is as follows:
watch [-d] [address (or) register]
or
w [-d] [address (or) register]
Examples:
ISEM> watch
    List current watchpoints
ISEM> w
    List current watchpoints
ISEM> w %g1
    Set a watchpoint for register %g1
ISEM> w -d %g1
    Clear the watchpoint for register %g1
ISEM> w 8:0x4004
    Set a watchpoint for memory location 8:0x4004
The trace command will allow you to execute a single instruction (or a specified number of instructions), starting at the current program counter, and displaying the contents of registers, program counter, condition code register, and next instruction to be executed after the current instruction is executed. The optional argument allows you to specify the number of steps to execute before halting again. If a breakpoint is encountered in the middle of a trace, the system will halt for the breakpoint. Similar to the trace command is the step command. Step will allow you to execute one or more instructions, but only the result of the last instruction will be shown. The syntax is as follows:
trace [# of steps]
or
t [# of steps]
step [# of steps]
or
st [# of steps]
Examples:
ISEM> trace
    Executes the instruction at the current PC, shows the register contents and next instruction
ISEM> t
    Executes the instruction at the current PC, shows the register contents and next instruction
ISEM> t 5
    Traces the next 5 instructions, showing register contents and next instruction for each
ISEM> st 5
    Executes the next 5 instructions and shows the status bits after the last instruction finishes.
The reg command can be used to display and modify registers. Without arguments, the reg command will display all 32 integer unit registers, the PC, the nPC, PSR, and integer condition code flags.
To display a specific register, use the reg command followed by the specific register designation. You can refer to the registers by "r" numbers (%r0 thru %r31) or by group name and number (%g0-%g7, %o0-%o7...). In addition to the integer unit data registers, you can also access the following control registers: wim, psr, pc, npc, sp, tbr, y. The contents of a register may be set by adding the desired value after the register name. The syntax is as follows:
reg [register] [new value]
Examples:
ISEM> reg
    display all registers
ISEM> re
    display all registers
ISEM> re %o2
    display output register 2
ISEM> re %r10
    display register 10 (same as %o2)
ISEM> re %pc 0x202c
    set the PC to the value 0x202c
ISEM> re %pc _main
    set the PC to the address of _main
ISEM> re %i4 200
    set register %i4 to decimal value 200
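The correspondence between the two register naming schemes (for instance, %o2 being the same register as %r10) follows the standard SPARC layout of globals, outs, locals, and ins within the 32 visible integer registers. The Python sketch below spells that mapping out; the function name r_number is invented for this illustration and is not an ISEM command.

```python
# Position of each SPARC register group within the 32 visible integer registers.
GROUP_BASE = {"g": 0, "o": 8, "l": 16, "i": 24}


def r_number(name):
    """Translate a group name such as '%o2' into its '%rN' equivalent."""
    group, index = name[1], int(name[2:])
    if group not in GROUP_BASE or not 0 <= index <= 7:
        raise ValueError(f"not a group register name: {name}")
    return f"%r{GROUP_BASE[group] + index}"


for reg in ("%g1", "%o2", "%o7", "%l0", "%i4"):
    print(reg, "is", r_number(reg))
```

Here r_number("%o2") returns "%r10", matching the example above, and r_number("%o7") returns "%r15".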
The dump command allows the user to inspect memory. Without arguments, the dump command will display 128 bytes of memory starting from the last location dumped or from the base of the loaded data space if it is the first dump command executed.
Optionally, an address can be specified to the dump command which will result in the dump starting from the specified address. An address range (addr1,addr2) can be specified. This will cause dump to display all bytes starting from addr1 and ending at addr2. The syntax is as follows:
dump [addr1] [, addr2]
or
du [addr1] [, addr2]
Examples:
ISEM> dump
    Display memory starting at default location
ISEM> du
    Display memory starting at default location
ISEM> du 0x408F
    Display memory from current address space and address 0x408F
ISEM> du 8:0x4023
    Display memory from address space 8 and address 0x4023
ISEM> du _main
    Display memory starting at address of '_main'
ISEM> du 0x4000,0x4400
    Display memory starting at address 0x4000 and ending at address 0x4400
The help command provides information on specific commands and features. If used without any arguments, help will display a short list of topics and commands available in the simulator.
You can get additional help on specific commands by typing help followed by the command or keyword on which you want more information. The syntax is as follows:
help [keyword]
or
h [keyword]
If the emulation environment traps an exception (due to any of the reasons given below), it will always report the Program Counter address that caused the exception. You can use this number to locate, in your listing file, which instruction caused the error, and place a breakpoint in that vicinity to aid your debugging. The starting address of the Program Counter is (probably) 0x2020; you should verify this fact immediately after loading your program. So, if you take the PC that caused the error and subtract 0x2020 from it, you will find the offset of the instruction that caused the exception.
For example, suppose you got the error message "TRAP (memory address not aligned) occurred at PC: 21A4" at run time. Compute 0x21A4 - 0x2020 = 0x184 and locate the instruction at offset 0184 in the listing produced by the assembler to find the troublesome load or store. Remember that this computation is in hexadecimal, so if you are not comfortable with base 16, convert the numbers to base 10, do the subtraction, and convert the result back to base 16 to find the line in the listing file.
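If hexadecimal subtraction by hand is error-prone, any tool that understands base 16 can do it. The Python sketch below repeats the computation from the example; the start address 0x2020 is the assumed value mentioned above and should be replaced by whatever your load message shows, and the function name listing_offset is made up for this illustration.

```python
START_PC = 0x2020  # assumed start address; confirm it after loading your program


def listing_offset(trap_pc, start_pc=START_PC):
    """Return the .text offset, as printed in the listing, for a run-time PC."""
    return trap_pc - start_pc


offset = listing_offset(0x21A4)
# Format as four hex digits, the way offsets appear in the listing's second column.
print(f"{offset:04x}")
```

For the PC value 0x21A4 from the example, the result is 0x184.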
This is probably the most difficult error to debug, because it may mean that your program has run horribly amok. It happens for various and strange reasons, of which I've been able to identify only a few from my personal experience. You may have done one or more of the following:
Accessed memory (even with a load!) at an address greater than 0x3FFFF, or 2^18 - 1. This is ISEM's apparent limit on the memory address space. Unfortunately, ISEM doesn't recognize such a violation with a nice error message saying so; it immediately begins issuing the repeated message "1000 instructions executed in supervisor mode" ad infinitum. One reason your program may be using such a large memory address is that it went into a long-running or infinite loop.
Branched into a region of memory that contains the trap handlers.
Branched into a region of memory that contains junk.
Fallen off the end of the program because of a missing nop or ta 0.
Entered into an infinite recursion, and caused the stack to grow and overwrite the program code with data.
These bugs are particularly difficult to track, because once you've seen the error message, you are at least 1000 instructions beyond where the error occurs! As a result, you can't locate an exact line that caused the problem, and your best chance to debug this is to set breakpoints at strategic locations in your code. If you expect to encounter a breakpoint, and don't, then you know roughly where the program went astray, and you can begin to trace the execution. If the program counter changes to some bizarre address, then you can back up to see why it did this.
Second in difficulty only to the 1000 instructions error above is this condition. The only sign you get from ISEM is that you can't halt the execution of the program with Ctrl-C. As above, you can't know where the program has gone wrong because (without the aid of breakpoints) you won't be able to access the interactive environment to display the registers, PC, memory, etc.
The first order of business is to somehow kill the ISEM program so that you can begin to debug it. This is accomplished by:
Control-Z
% jobs
This will print out a list of background jobs, usually:
[1] + Suspended isem
% kill %1
% ps -a | grep isem
28007 pts/94 0:09 isem
% kill 28007
Now, to debug this program, you need to set breakpoints to determine where in your code you have an infinite loop or recursion. Then, trace the code inside the loop to figure out why it is doing what it is. In general, loops will be infinite because you either have an incorrect branch condition, or you aren't making progress toward exiting the loop (i.e., not incrementing or decrementing a counter). Infinite recursions occur because you either don't have a base case in which you return a known value, or you aren't making progress toward that base case. Either way, you'll probably end up seeing that 1000 instructions error above.
Be aware that isem can hang up in an infinite loop when it executes an sdiv instruction, when the dividend (the numerator) is not the direct result of an smul instruction. In the program shown below, the first sdiv instruction executes properly with correct results, but the second one hangs up in an infinite loop. Notice that, prior to the second sdiv instruction, the dividend (%r2) was changed away from the value given it by an smul instruction, by a set (synthetic) instruction. Please refer to chapter 4 of the Lab Manual, "Multiplication and Division", for further discussion of the relationship between the smul and sdiv instructions (via hidden state saved in the %y register).
.text
start:
set 0x60000000, %r2
smul %r2, 8, %r2
sdiv %r2, 8, %r2
smul %r2, 8, %r2
set 10, %r2
sdiv %r2, 8, %r2
ta 0
Briefly put, the error message indicates that the word located at the next PC value is not a legal instruction, and, therefore, was not an instruction generated by the assembler. Look at the value of the PC, and try to determine which line in the listing file caused the trap; likely, it won't exist! In other words, you are executing junk from memory. This could be caused by any number of reasons:
Issuing the trace or run command in isem without first having loaded a program.
A missing nop after a branch, retl, or call instruction at the end of your file.
Missing a ta 0 to halt execution of your main function.
You changed %r15 incorrectly (and hence the return address) before a retl.
Because you have already passed the instruction (i.e., the branch to a bad memory location) that led you to execute this illegal one, you will need breakpoints and tracing to help you figure out how you reached this PC value.
This happens when you try to load or store data to an address that is not properly aligned. In other words, you try to store an integer (a word) to an address that is not word aligned. This error can only be generated by a load or a store instruction (and their variants).
There are three ways that this can occur:
You are generating a bad address at run time
You have word "variables" that aren't word aligned in your data section
You are using a load or store word instruction (ld or st) when you intend a load or store byte instruction (ldub, ldsb, or stb) (or half word: lduh, ldsh, or sth)
In the first case, determine which line of your listing caused the error, and set a breakpoint there. Since the address to which you are loading or storing is kept in a register, check the contents of that register. The address will not be aligned properly (in the case of register indexed or register displacement addressing, the sum of the register(s) or constant will not be aligned). If you are generating the address dynamically, for accessing arrays, etc., then you need to check your address calculation routines.
In the second case, determine which variable is unaligned, and insert the .align 4 directive above its declaration in the data section.
In the third case, determine the size of the datum involved, and use the appropriate instruction for that size.
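The alignment rule behind all three cases is that an address must be a multiple of the size of the datum being accessed. The Python sketch below illustrates the rule and the rounding that a directive such as .align 4 performs; the function names and the sample addresses are hypothetical and chosen only for illustration.

```python
WORD, HALF, BYTE = 4, 2, 1  # access sizes in bytes


def is_aligned(address, size):
    """True if address is a multiple of the access size in bytes."""
    return address % size == 0


def align_up(address, boundary):
    """Round address up to the next multiple of boundary (a power of two)."""
    return (address + boundary - 1) & ~(boundary - 1)


print(is_aligned(0x4004, WORD))  # a word access here is fine
print(is_aligned(0x4006, WORD))  # a word access here would trap
print(is_aligned(0x4006, HALF))  # a halfword access here is fine
print(hex(align_up(0x4005, 4)))  # where an aligned word could start next
```

When checking a register indexed or register displacement access at a breakpoint, apply the test to the sum of the register contents (or register plus constant), since that sum is the address actually used.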
The second operand to sdiv or udiv was zero, resulting in a divide by zero error. You need to either ensure that the second operand is never zero, or write a special case (in an if-then-else clause) to avoid the error.
ld [fp-12], %r1
becomes
ld [0-12], %r1 ! which is the same as "ld [-12], %r1",
! and is sure to cause you grief!
Another common example:
set typo, %r1
...
add %r2, 4, %r2
ld [%r1+%r2], %r3
but %r1 has 0 stored in it.
You can accidentally get two different data labels to share the same memory location by leaving the operand out when using the .word, .half or .byte assembler directive. These assembler directives really mean to reserve a memory location of the appropriate size for each operand in the comma separated list of operands following the directive. Therefore, these directives mean to reserve zero or more memory locations. Consider the following example:
two: .word 0, 23
zero: .word
one: .word 45
Suppose the address two is 0x4000. The
word at address 0x4000 (initially) contains the integer value 0, and the word at
0x4004 (initially) contains the value 23. There are two words stored at
location two. By virtue of the directive associated with label one, there
is a word whose value (initially) is 45 stored at address 0x4008. The
address one is 0x4008. There is one word stored at location one. The
directive in the middle has caused the address zero to be 0x4008 also.
That is to say, addresses zero and one are equal, both 0x4008, and the value 45
is stored there (initially). Because there are zero words stored at
location zero, location zero is the same address as location one. Such an
alias can cause significant, mysterious trouble in your program.
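The aliasing described above can be reproduced with a small model of how addresses are assigned to labels. This Python sketch assumes, as in the example, that the data starts at 0x4000 and that each .word operand occupies 4 bytes; the function name layout and the list-of-tuples representation are inventions for this illustration, not anything the assembler exposes.

```python
def layout(directives, base=0x4000):
    """Assign an address to each label; a directive is (label, word_operands)."""
    addresses = {}
    location = base
    for label, operands in directives:
        addresses[label] = location
        location += 4 * len(operands)  # one 4-byte word per operand, possibly none
    return addresses


# The three directives from the example; "zero" has an empty operand list.
program = [("two", [0, 23]), ("zero", []), ("one", [45])]
addrs = layout(program)
for label, address in addrs.items():
    print(f"{label:5s} 0x{address:04x}")
```

Because the directive for "zero" reserves no words, the location counter does not advance, and "zero" and "one" both receive address 0x4008.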
Unfortunately, the assembler translates certain out-of-range immediate values incorrectly without flagging a syntax error. Let imm be an immediate value. If -4096 ≤ imm ≤ 4095 (-0x1000 ≤ imm ≤ 0xFFF), then the assembler correctly translates it. However, if either -8192 ≤ imm ≤ -4097 (-0x2000 ≤ imm ≤ -0x1001) or 4096 ≤ imm ≤ 8191 (0x1000 ≤ imm ≤ 0x1FFF), then there exists no correct translation, but the assembler goes ahead and translates without flagging an error. Otherwise (either imm < -8192 (imm < -0x2000) or 8191 < imm (0x1FFF < imm)), the assembler does, thankfully, complain of an error: "relocation overflow". For example:
add %r1, 4095, %r1
is translated correctly by the assembler; however,
add %r1, 4096, %r1
has no correct translation, and the assembler incorrectly translates it as
add %r1, -4096, %r1
Furthermore,
add %r1, 8191, %r1
is incorrectly translated as
add %r1, -1, %r1
Thankfully, for
add %r1, 8192, %r1
the assembler issues an error message, "relocation overflow". While the error message in the listing to standard output does not show a line number, the error message sent to the standard error device does.
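The two mistranslations shown above are what one gets by keeping only the low 13 bits of the immediate and reading them as a signed number. The Python sketch below models that interpretation. It is an assumption that explains the documented examples, not a description of the assembler's actual code, and the function name as_simm13 is invented here.

```python
def as_simm13(imm):
    """Keep the low 13 bits of imm and reinterpret them as a signed value."""
    low = imm & 0x1FFF
    # Bit 12 is the sign bit of a 13-bit field.
    return low - 0x2000 if low & 0x1000 else low


for imm in (4095, 4096, 8191):
    print(imm, "->", as_simm13(imm))
```

Under this model 4095 is unchanged, 4096 becomes -4096, and 8191 becomes -1, which agrees with the three add examples above.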
Fortunately, there exists an automatic method for finding immediate values that are out of range in a program. The old assembler (the one used here before 2004), while it had many problems that are fixed in the new assembler, had the virtue of finding and reporting all immediate values that are out of range. One can apply this automatic method by, for example, giving the following command in a Unix terminal:
old-isem-as -als lab1.s -o /dev/null >! lab1.lst
This command is careful to throw away the object code output that might be produced by the old assembler for a syntactically correct program: important because the new linker is incompatible with the old assembler. (The object code output is thrown away by sending it to /dev/null, a Unix device that receives output without changing anything in the system; /dev/null is "the great bit shredder".) If there are any immediate values out of range in the source file, the error message "constant value must be between -4096 and 4095" is sent to the terminal screen (via the standard error device). Also, this error message appears in the listing file (lab1.lst).
One can get the same behavior as above with the following command (actually, it would be enough to type "~heym/bin/isem-f<Tab>lab1", where "<Tab>" means tapping the <Tab> key.):
~heym/bin/isem-find-value-out-of-range lab1
Last modified: Mon Feb 20 16:26:04 EST 2006