Playstation 2 is a registered trademark of Sony.
Part 1: Introduction
1.1. What does this FAQ cover and where can I find the latest version of it?
The goal of this FAQ is to provide basic information on the Vector Units (VUs) and minimal code showing how to operate them. With this FAQ, you should be able to get up and running with the VUs in very little time. The code makes use of some of the libraries provided by the folks over at Sony.
The latest version of this FAQ is always available at
http://www.pakl.net/ps2vufaq.html
and other locations (such as http://playstation2-linux.com/projects/ps2neural and http://playstation2-linux.com/projects/p2lsd) are updated as time allows.
Lastly, you can always request a copy directly from me at patryk@pakl.net.
1.2. What are the VUs?
The VUs are the vector units on the Playstation 2 Emotion Engine chip
capable of rapidly processing 4-dimensional float vector data as well as integer
data. There are two of them, VU0 and VU1. Each one is somewhat
specialized in its abilities and its connections to the rest of the hardware in the PS2.
Feature | VU0 | VU1 |
Operating modes | Can operate in the interactive Macro Mode as a co-processor to the EECore (processing instructions directly from the EECore). Can also operate in Micro Mode like VU1. | Operates only in the non-interactive Micro Mode, in which an entire program is sent to its Micro Mem and the EECore then signals the VU to start running the program. |
Memory Paths | VU1's registers are mapped to VU0's VU Memory. | Has a high-priority memory path to the Graphics Synthesizer chip (GS). |
Calculating Ability | Rapid 4-dimensional floating point vector addition and multiplication, integer operations. | Same as VU0, but also has an Elementary Function Unit capable of functions such as sines, cosines, and exponentials. |
Memory | 4 KB Micro Memory (for code), 4 KB VU Memory (for data) | 16 KB Micro Memory (for code), 16 KB VU Memory (for data) |
1.3. What are the VUs used (or useful) for?
Most graphics processing is made of matrix and vector operations.
The VUs speed this up.
The VUs also make the PS2 an ideal piece of hardware for
running neural networks, whose core computation is a series of multiply-adds (MADDs), that is, inner products.
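To make the link between neural networks and MADDs concrete, here is a minimal sketch in portable C (not VU code) of a neuron's weighted sum broken into 4-wide multiply-adds, which is the shape of work a VU handles. The names vec4, madd4 and neuron_sum are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

/* Four floats: one 4-dimensional float vector (x, y, z, w). */
typedef struct { float f[4]; } vec4;

/* acc += a * b, lane by lane: a software stand-in for one vector multiply-add. */
vec4 madd4(vec4 acc, vec4 a, vec4 b)
{
    for (int i = 0; i < 4; i++)
        acc.f[i] += a.f[i] * b.f[i];
    return acc;
}

/* Inner product of n_vec4 * 4 weights and inputs:
   accumulate per lane, then add the four lanes together. */
float neuron_sum(const vec4 *weights, const vec4 *inputs, size_t n_vec4)
{
    vec4 acc = { { 0.0f, 0.0f, 0.0f, 0.0f } };
    for (size_t k = 0; k < n_vec4; k++)
        acc = madd4(acc, weights[k], inputs[k]);
    return acc.f[0] + acc.f[1] + acc.f[2] + acc.f[3];
}
```

On the PS2 the inner loop body would be a single vector instruction rather than a C loop; this version is only meant for checking results on any machine.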
1.4. What requirements are there for using the VUs?
To run the examples provided in this FAQ, you must have:
Check out http://www.playstation2-linux.com and the manuals that come with the Linux kit.
1.6. Quickstart to VU Programming: vu1nteractive, the interactive VU interpreter
To make the transition to VU programming smoother (and perhaps a little more fun), I have bundled code from this FAQ into an interactive interpreter program. The goal is for it to be the next best thing to a true interactive debugger; it is currently beta-quality, but it seems to work quite well. When you run vu1nteractive, you type in a line of VU code, and vu1nteractive executes it on VU1 and lets you view some of the registers. Updates should follow soon to make it easier to select which registers are viewed and to view memory locations in VU1 data memory.
vu1nteractive can be obtained via http://pakl.net/ps2/ or through the project page at http://playstation2-linux.com/vu1nteractive/
If you would like to contribute code to vu1nteractive, please e-mail me at patryk@pakl.net... it could use some console-mode GUI help. =)
Part 2: Vector Units: macro mode and micro mode
2.1. Can you show me a small useful VU macro mode assembly program?
Here's an add_vector macro mode function extracted from libvu0.c. It loads arguments v1 and v2 into VU0 registers vf4 and vf5, adds them on VU0, and stores the result (vf6) back to memory at v0, where the EECore can read it.
void add_vector(ps2_vu0_fvector v0, ps2_vu0_fvector v1, ps2_vu0_fvector v2)
{
    asm __volatile__(
        "lqc2      vf4,0x0(%1)\n"    /* vf4 = v1 */
        "lqc2      vf5,0x0(%2)\n"    /* vf5 = v2 */
        "vadd.xyzw vf6,vf4,vf5\n"    /* vf6 = vf4 + vf5, all four fields */
        "sqc2      vf6,0x0(%0)\n"    /* v0 = vf6 */
        : : "r" (v0) , "r" (v1), "r" (v2));
}
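For checking results away from the PS2, the same computation can be sketched in plain C. This is not part of libvu0; the names fvector4 and add_vector_ref are hypothetical, and the only assumption is that a vector is four floats, as in the examples in this FAQ.

```c
/* Four floats, like the v1[0]..v1[3] vectors used in this FAQ. */
typedef float fvector4[4];

/* v0 = v1 + v2, one field at a time.
   On VU0, vadd.xyzw performs all four additions in one instruction. */
void add_vector_ref(fvector4 v0, const fvector4 v1, const fvector4 v2)
{
    for (int i = 0; i < 4; i++)
        v0[i] = v1[i] + v2[i];
}
```

Comparing the output of this function with the output of the macro mode version is a quick way to confirm that the VU0 code path is working.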
2.2. How do you control VU0 in macro mode?
First of all, get your hands on libvu0. It's a collection of C functions containing VU0 macro mode assembly language. In the example below, we have extracted the add_vector function from libvu0 and included it. (Ordinarily, you would just compile libvu0.c along with your source.)
Here's complete code to add two vectors in VU0 macro mode.
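#include <stdio.h>
#include <unistd.h>
#include <ps2vpufile.h>
#include <ps2vpu.h>

/* Define what a vector is: four floats, 16-byte aligned
   (assumed to match the definition in libvu0.h). */
typedef float ps2_vu0_fvector[4] __attribute__((aligned (16)));

/* Create 3 vectors for v0 = v1 + v2 */
ps2_vu0_fvector v0, v1, v2;

/* Code from libvu0.c */
void add_vector(ps2_vu0_fvector v0, ps2_vu0_fvector v1, ps2_vu0_fvector v2)
{
    asm __volatile__(
        "lqc2      vf4,0x0(%1)\n"
        "lqc2      vf5,0x0(%2)\n"
        "vadd.xyzw vf6,vf4,vf5\n"
        "sqc2      vf6,0x0(%0)\n"
        : : "r" (v0) , "r" (v1), "r" (v2));
}

int main (int argc, char * argv[]) {
    ps2_vpu *vdev0;

    v1[0] = 1.0; v1[1] = 2.0;
    v1[2] = 3.0; v1[3] = 4.0;
    /* Values for v2 are illustrative. */
    v2[0] = 1.0; v2[1] = 1.0;
    v2[2] = 1.0; v2[3] = 1.0;

    /* Open VU0 (assumed call: ps2_vpu_open(0)). Without this, an
       "Illegal Instruction" error message occurs. */
    vdev0 = ps2_vpu_open(0);

    add_vector(v0, v1, v2);
    ps2_vpu_close(vdev0);

    /* Display Results */
    printf("%f %f %f %f\n", v0[0], v0[1], v0[2], v0[3]);
    return(0);
}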
2.3. Can you show me a small useful VU micro mode assembly program?
Certainly:
; -----------------------------------------------
; adder.vsm -- micromode assembly
; Loads two vectors from the beginning of VU Memory,
; adds them, and stores the result back at the beginning of
; VU Memory (location 0).
; -----------------------------------------------
.include "vumacros.h"
.global My_dma_start
.data
2.4. How and why do you compile VU micro mode programs?
To send code to the VUs, it must first be assembled into machine code that the VUs can execute. You'll need vpu.cmd and vumacros.h (which you can find among the example code). For the adder.vsm code above, the following works:
ee-dvp-as adder.vsm -o adder.vo
ld -o adder.elf adder.vo -T vpu.cmd
2.5. How do you control VU0 or VU1 in micro mode?
The following program writes two vectors to the beginning of VU0's VU Memory, sends the adder.elf program built in 2.4., and runs it, all using the functions in libps2dev. If you would like to do everything yourself (rather than rely on libps2dev), you can examine the code and perform the memory mapping yourself. See Part 3 for more details on controlling the VUs in micro mode.
To compile: gcc main.c vpudev.c -lps2dev
Note: The program counts in i while it waits for VU0 to finish. To get i to go beyond 0, add a lot of ops to the VU adder.vsm code, such as divides and adds. Also, for code showing that the two VUs can run in parallel using these functions, check out vu_parallel_test.tar.gz at http://playstation2-linux.com/projects/ps2neural/.
#include <linux/ps2/ee.h>
#include <ps2dma.h>
#include <ps2vpu.h>
#include <stdio.h>
#include <unistd.h>
#include <libvu0.h>
#include <ps2vpufile.h>
#include <sys/mman.h>
#include <errno.h>

IMPORT_VPU_SYMBOL(__u128 *, My_dma_start, 0)

int main()
{
    ps2_vpu *vdev0;
    float *q;
    int i = 0;    /* counts loop iterations while VU0 is busy */

    /* Prepare access to VU0 */
    /* Retrieve mmaped shared mem of VU Mem0 */
    /* Display some memory locations, set 2 vectors. */
    /* Prepare our vector addition code to be sent to VU0 */
    /* Actually send the code to VU0 */
    printf("\n\n\n\n\n");
    /* Fire up VU0 */
    while ( ps2_vpu_busy(vdev0))
    { i++; }
    printf("BTW, this C program was able to count to %d before VU0 finished.\n", i);
}
Part 3: Controlling the Vector Units in Micro Mode
3.1. How can you start the vector units?
In order to run code that has been passed to the VUs, they must be started. Starting and stopping the vector units is well-documented in the VU User's Manual. The sample program
in 2.5. illustrates the use of the function call ps2_vpu_start() provided
in vpudev.c.
3.2. How can you stop the vector units?
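One way is to set the E (end) bit on an instruction in your assembly code, for example:
NOP[E] NOP
The VU executes the instruction that follows the one carrying the E bit and then stops. See the VU User's Manual for details on
the different modes of the VU.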
yonder (yonderboy@users.playstation2-linux.com) adds: "Also,
many opcodes can set the E bit, not just NOP."
Part 4: Sending data to the Vector Units
There are 3 main ways to get your data in and out of the VUs.
4.1. How do you write data directly to VU Data Memory?
See the code in Question 2.5., where two vectors (<1,1,1,1> and
<2,3,4,5>) are written directly into VU Memory through the memory mapping set up
by mmap(). Another way to
write data to the VUs is to send it by DMA through the VIF interface.
This has an optional speed advantage: the data can be sent "compressed",
so fewer bytes need to be transferred (see next section).
4.2. What is this about sending "compressed data", and hardware decompression in the VIF?
The compression is actually bit-packing. For example, by sending
1s and 0s as bytes, you would be using 8 times as much transmitted data
as needed. The set of 8 ones and zeros could be packed into a single
byte:
00000001 00000000 00000000 00000001 00000001 00000001 00000000 00000001 -> 10011101
The VIF handles the unpacking upon receipt of the packet. See
the VU User's Manual for details.
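As a rough illustration of the bit-packing idea, here is a minimal sketch in portable C that packs the eight values 1,0,0,1,1,1,0,1 (each stored in its own byte) into the single byte 10011101, and unpacks it again. The function names pack8 and unpack8 are hypothetical. This shows the general principle only; the packed formats the VIF actually accepts are described in the VU User's Manual.

```c
#include <stdint.h>

/* Pack eight bytes, each holding 0 or 1, into one byte.
   flags[0] becomes the most significant bit. */
uint8_t pack8(const uint8_t flags[8])
{
    uint8_t packed = 0;
    for (int i = 0; i < 8; i++)
        packed = (uint8_t)((packed << 1) | (flags[i] & 1u));
    return packed;
}

/* Inverse of pack8: expand one byte back into eight 0/1 bytes. */
void unpack8(uint8_t packed, uint8_t flags[8])
{
    for (int i = 0; i < 8; i++)
        flags[i] = (uint8_t)((packed >> (7 - i)) & 1u);
}
```

The sender would do the equivalent of pack8 before transmission; the unpacking step is the part the VIF performs in hardware on receipt.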
4.3. How do you send data through the VIF?
I haven't done this yet. Anyone want to write this one up? [hikey has volunteered. :-) ]
Part 5: Miscellaneous
5.1. What is vpu.cmd?
Anyone want to write this up?
5.2. What if I don't want to use .elf files? How can I link my code in directly?
There is apparently a way to get around .elf files for those who are bothered by them. [I am still not clear on how any of this works, so if someone wants to explain this in more detail, please feel free to e-mail me. Ed.] According to yonder (yonderboy@users.playstation2-linux.com):
"There is another way to load and run vpu code. In your makefile,
include the lines:

%.vh: %.dsm
	ee-dvp-as -g -o $*.vo $< > $*.lst
	nm -g -n -P $*.vo | sed -e 's/^/#define /' -e 's/[^ ] 00\+/0x0/' > $*.vh
	objcopy -Obinary $*.vo $*.vbin

that will create a vh file that contains all microcode labels and the address they are at. If you include the file, you can just use vcallms."
There is something similar to this in Aibopet's makefile:

.vu.o:
	ee-dvp-as -o $*.vo_ $*.vu
	objcopy -Obinary $*.vo_ $*.bin_
	./bin2as $* $*.bin_ > $*.a_
	as -mcpu=r5900 -KPIC -mips2 -o $*.o $*.a_
	rm $*.vo_ $*.bin_ $*.a_

which seems to use bin2as.cpp, an "elf code segment ripper".
[Editor's note: If someone could provide me with a minimalist code example like the adder.vsm one above using this linking technique, it would help to make this document a better resource.]
5.3. How about a simple example of this linking?
Yonder provides an example:
assume microcode.dsm looks like this:

.global vu0_microentry
.global vu0_label_name
vu0_microentry:
.vu
vu0_label_name:
mul[e] vf4, vf4, vf5    nop
nop                     mr32 vf4,vf4

step 1: assemble the source and build a header from extracted symbols. Assume the asm file is microcode.dsm and the created header is microcode.vh

ee-dvp-as -g -o $*.vo $< > $*.lst
nm -g -n -P $*.vo | sed -e 's/^/#define /' -e 's/[^ ] 00\+/0x0/' > $*.vh
objcopy -Obinary $*.vo $*.vbin

step 2: in your C source:

#include "microcode.vh"
int do_something(vector x, vector y)
{
    asm __volatile__(
        "lqc2 vf4,0x0(%0)\n"
        "lqc2 vf5,0x0(%1)\n"
        "vcallms %2\n"
        "sqc2 vf4, 0x0(%0)\n"
        ::"r"(x), "r"(y), "r"(vu0_label_name));
}
5.4. How is this related to Colin's and Aibopet's methods?
Aibopet writes (see his mandelbrot code released at http://aibohack.com/ps2/):

"source.vu" is the source in raw VU format (no DMA tags, no VCL syntax).
The file is assembled, and the text portion is extracted (call this the VUCODE).

In the Sony samples, the VUCODE is linked into a separate .ELF file - which is overkill IMHO.
Colin's method places the VUCODE in a separate VBIN file (without the ELF file overhead).
My method uses the trivial "bin2as" utility, which turns it into assembler data constants that are assembled and linked into the final program's .text segment, just like any other code [of course it can't run on the main CPU].

The "MYVPU::LoadJob" function copies the VU program from the normal .text segment to VU code RAM with a trivial VUCODE memory manager.
The proper routine is then started with "VCALLMS".
----
No DMA is required (although it could be used if you so wish).
Right now there is one VU routine per source file [not a terrible limitation for my case].
The actual VUCODE can run from any location (ie. there are no VCALLMS fixed addresses).

The nm/sed creation of the .vh file creates absolute symbols at build time for the VUCODE entries. You must load the VBIN file at the very start of the VUCODE area.
My technique allocates VUCODE at run time (ie. there are no absolute addresses, so you have to use VCALLMSR instead of VCALLMS).
In future I want to implement a more dynamic process of VU code loading/optimization (which is why I chose the dynamic technique).
--AiboPet

---- from makefile ----
.vu.o:
	ee-dvp-as -o $*.vo_ $*.vu
	objcopy -Obinary $*.vo_ $*.bin_
	./bin2as $* $*.bin_ > $*.a_
	as -mcpu=r5900 -KPIC -mips2 -o $*.o $*.a_
	rm $*.vo_ $*.bin_ $*.a_
------
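To give a feel for the "assembler data constants" step, here is a minimal sketch in C of what a bin2as-style tool might emit. It is not Aibopet's bin2as.cpp: the function name emit_as_constants, the symbol name in the usage note and the exact output layout are all assumptions made for illustration. It writes the raw VU code bytes as .byte directives under a global label.

```c
#include <stdio.h>
#include <stddef.h>

/* Write `len` bytes of VU code as assembler data under a global label.
   The output layout is an assumption, not the real bin2as format.
   Returns 0 on success, -1 on a write error. */
int emit_as_constants(FILE *out, const char *symbol,
                      const unsigned char *code, size_t len)
{
    if (fprintf(out, "\t.text\n\t.globl %s\n%s:\n", symbol, symbol) < 0)
        return -1;
    for (size_t i = 0; i < len; i++) {
        /* Start a new .byte directive every 8 bytes. */
        const char *prefix = (i % 8 == 0) ? "\t.byte " : ", ";
        const char *suffix = (i % 8 == 7 || i + 1 == len) ? "\n" : "";
        if (fprintf(out, "%s0x%02x%s", prefix, code[i], suffix) < 0)
            return -1;
    }
    return 0;
}
```

Calling emit_as_constants(stdout, "vu_prog", buffer, size) on the contents of a .bin_ file would produce text suitable for the "as" step in the makefile above. A real tool would also need to handle alignment, which this sketch leaves out.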