(Unofficial) Playstation2 VU Developer's FAQ
Maintained by Patryk Laurent (patryk@pakl.net, patryk@users.playstation2-linux.com)
Version 1.2
Last Update:  October 16, 2002.
 

Playstation 2 is a registered trademark of Sony.
 

Part 1: Introduction

Part 2: Vector Units:  macro mode and micro mode

Part 3: Controlling the Vector Units in Micro Mode

Part 4: Sending data to the Vector Units


Part 5: Miscellaneous



Part 1: Introduction
 

1.1. What does this FAQ cover and where can I find the latest version of it?

The goal of this FAQ is to provide basic information on the Vector Units and minimalist code showing how to operate them.  With this FAQ, you should be able to get up and running with the VUs in very little time. The code makes use of some of the libraries provided by the folks over at Sony.

The latest version of this FAQ is always available at
http://www.pakl.net/ps2vufaq.html
and other locations (such as http://playstation2-linux.com/projects/ps2neural and http://playstation2-linux.com/projects/p2lsd) are updated as time allows.

Lastly, you can always request a copy directly from me at patryk@pakl.net.
 

1.2. What are the VUs?

The VUs are the vector units on the Playstation 2 Emotion Engine chip capable of rapidly processing 4-dimensional float vector data as well as integer data.  There are two of them, VU0 and VU1.  Each one is somewhat specialized in its abilities and its connections to the rest of the hardware in the PS2.
 
Operating modes
  VU0: Can operate in the interactive Macro Mode as a co-processor to the EECore (processing instructions directly from the EECore). Can also operate in Micro Mode like VU1.
  VU1: Operates only in the non-interactive Micro Mode, in which an entire program is sent to its Micro Mem, and then the EECore transmits a signal to the VU to start running the program.

Memory Paths
  VU0: VU1's registers are mapped to VU0's VU Memory.
  VU1: Has a high-priority memory path to the Graphics Synthesizer chip (GS).

Calculating Ability
  VU0: Rapid 4-dimensional floating point vector addition and multiplication, integer operations.
  VU1: Same as VU0, but also has an Elementary Function Unit capable of functions such as sines, cosines, exponents.

Memory
  VU0: 4k Micro Memory (for code), 4k VU Memory (for data)
  VU1: 16k Micro Memory (for code), 16k VU Memory (for data)

1.3. What are the VUs used (or useful) for?

Most graphics processing is made of matrix and vector operations.  The VUs speed this up.
The VUs also make the PS2 an ideal piece of hardware for running neural networks (which are instantiated mathematically by MADDs, or inner products).
 

1.4. What requirements are there for using the VUs?

To run the examples provided in this FAQ, you must have:

1.5. Where can I find more information about the VUs?

Check out http://www.playstation2-linux.com and the manuals that come with the Linux kit.

1.6. Quickstart to VU Programming: vu1nteractive, the interactive VU interpreter

To make the transition to VU programming smoother (and perhaps a little more fun), I have bundled code from this FAQ into an interactive interpreter program. The goal of this interpreter is for it to be the next best thing to a true interactive debugger; currently, it is in a very beta state, but it seems to work quite well. When you run vu1nteractive, you type in a line of VU code, and vu1nteractive executes it on VU1, and lets you view some of the registers. There should be updates forthcoming soon making it easier to select which registers are being viewed and to view memory locations in VU1 Data.

vu1nteractive can be obtained via http://pakl.net/ps2/ or through the project page at http://playstation2-linux.com/vu1nteractive/

If you would like to contribute code to vu1nteractive, please e-mail me at patryk@pakl.net... it could use some console-mode GUI help. =)


Part 2: Vector Units:  macro mode and micro mode
 

2.1. Can you show me a small useful VU macro mode assembly program?

Here's an add_vector macro mode function extracted from libvu0.c.  It loads arguments v1 and v2 into VU0 registers, adds them together on VU0, and stores the result back to the EECore side, into v0.
 

void add_vector(ps2_vu0_fvector v0, ps2_vu0_fvector v1, ps2_vu0_fvector v2)
{
       asm __volatile__("
        lqc2    vf4,0x0(%1)
        lqc2    vf5,0x0(%2)
        vadd.xyzw       vf6,vf4,vf5
        sqc2    vf6,0x0(%0)
        ": : "r" (v0) , "r" (v1), "r" (v2));
}

 

2.2. How do you control VU0 in macro mode?

First of all, get your hands on libvu0.  It's a collection of C functions containing VU0 macro mode assembly language.  In the example below, we have extracted the add_vector function from libvu0 and included it. (Ordinarily, you would just compile libvu0.c along with your source.)

Here's complete code to add two vectors in VU0 macro mode.
 

#include <stdio.h>
#include <unistd.h>
#include <ps2vpufile.h>
#include <ps2vpu.h>

/* **************************************************
vu0macro.c -- Code to demo VU0 in macro mode 
  To compile:  gcc vu0macro.c -lps2dev
  To run:  ./a.out 

 ************************************************** */

/* Define what a vector is. */
typedef float   ps2_vu0_fvector[4] __attribute__((aligned (16)));

/* Create 3 vectors for v0 = v1 + v2 */
ps2_vu0_fvector v0 __attribute__ ((aligned (16)));
ps2_vu0_fvector v1 __attribute__ ((aligned (16)));
ps2_vu0_fvector v2 __attribute__ ((aligned (16)));

void add_vector(ps2_vu0_fvector v0, ps2_vu0_fvector v1, ps2_vu0_fvector v2)
{

        /* Code from libvu0.c
           The lqc2 commands load the data onto vu0 (cop2),
           vadd is self-explanatory, and the sqc2 stores
           the result back to the EECore.
        */

        asm __volatile__("
        lqc2    vf4,0x0(%1)
        lqc2    vf5,0x0(%2)
        vadd.xyzw       vf6,vf4,vf5
        sqc2    vf6,0x0(%0)
        ": : "r" (v0) , "r" (v1), "r" (v2));
}

int main (int argc, char * argv[]) {
        ps2_vpu *vdev0;
        int i;  /* loop counter for printing the result */

        v1[0] = 1.0; v1[1] = 2.0; v1[2] = 3.0; v1[3] = 4.0;
        v2[0] = 3.0; v2[1] = 4.0; v2[2] = 5.0; v2[3] = 6.0;

        /* Without this, an "Illegal Instruction" error message occurs. */
        vdev0 = ps2_vpu_open(0); 
        ps2_vpu_reset(vdev0);

        add_vector(v0, v1, v2); 

        ps2_vpu_close(vdev0);

        /* Display Results */ 
        for (i = 0; i < 4; i++) { printf ("%f ", v0[i]);   }
        printf("\n");

        return(0);
}

 

2.3. Can you show me a small useful VU micro mode assembly program?

Certainly:
 

; -----------------------------------------------
; adder.vsm -- micromode assembly
; Loads two vectors from the beginning of VU Memory,
; adds them, and stores the result back at the beginning of
; VU Memory (location 0).
; -----------------------------------------------
.include "vumacros.h"
.global My_dma_start 

.data
.align 4
My_dma_start:
DMAcnt *
MPG 0,*
NOP             IAND VI15, VI00, VI00           ; Set mem indx to 0.
NOP             LQI.xyzw VF10xyzw, (VI15++)     ; Load first vector into vf10.
NOP             LQ.xyzw VF11xyzw, 0(VI15)       ; Load second vector into vf11.
ADD.xyzw VF12xyzw, VF10xyzw, VF11xyzw   NOP             ; Add'em.
NOP             SQ.xyzw VF12xyzw, 0(VI00)       ; Store to first mem pos.
NOP[E]          NOP
NOP             NOP
.EndMPG
.EndDmaData
 

2.4. How and why do you compile VU micro mode programs?

To send code to the VUs, it must first be assembled and linked into machine code which the VUs can process.  You'll need vpu.cmd (passed to ld with -T below) and vumacros.h (which you can find among the example code).   For the adder.vsm code above, the following works:

ee-dvp-as adder.vsm -o adder.vo
ld -o adder.elf adder.vo -T vpu.cmd

2.5. How do you control VU0 or VU1 in micro mode?

The following program sets two vectors at the beginning of VU0's VU Memory, sends the adder.elf program compiled in 2.4., and runs it, all using the functions in libps2dev. If you would like to do everything yourself (rather than rely on libps2dev) you can examine the code and perform the memory mapping yourself.   See Part 3 for more details on controlling the VUs in micro mode.

To compile: gcc main.c vpudev.c -lps2dev
 

Note: The counter i in the program below counts how many times the C program polls VU0 before it finishes. To get i to go beyond 0, add a lot of ops to the VU adder.vsm code, such as divides and adds. Also, for code showing that the two VUs can be executed in parallel using these functions, check out vu_parallel_test.tar.gz at http://playstation2-linux.com/projects/ps2neural/.
 


#include <linux/ps2/ee.h>
#include <ps2dma.h>
#include <ps2vpu.h>
#include <stdio.h>
#include <unistd.h>
#include <libvu0.h>
#include <ps2vpufile.h>
#include <sys/mman.h>
#include <errno.h>
#include <stdlib.h>   /* for exit() */

IMPORT_VPU_SYMBOL(__u128 *, My_dma_start, 0)

int main() {

        ps2_vpu *vdev0;
        VPUFILE *vfd1;

        float *q;
        int i=0;
        int result;

        /* Prepare access to VU0 */
        vdev0 = ps2_vpu_open(0);
        ps2_vpu_reset(vdev0);

        /* Retrieve mmaped shared mem of VU Mem0 */
        q = ps2_vpu_data(vdev0); 

        /* Display some memory locations, set 2 vectors. */
        printf("At startup: %f %f %f %f \n", q[0], q[1], q[2], q[3]); 
        q[0] = 1.0; q[1] = 1.0; q[2] = 1.0; q[3] = 1.0; 
        q[4] = 2.0; q[5] = 3.0; q[6] = 4.0; q[7] = 5.0;

        /* Prepare our vector addition code to be sent to VU0 */
        vfd1 = vpuobj_open("adder.elf", O_DATA_PS2MEM);
        if (vfd1 == NULL) {   /* vfd1 is a pointer, so test against NULL */
                printf("couldn't write adder.elf to vpu0\n");
                perror("vpu_open");
                exit(1);
        }

        /* Actually send the code to VU0 */
        ps2_dma_send(ps2_vpu_fd(vdev0), vfd1, (ps2_dmatag *)My_dma_start);

        printf("\n\n\n\n\n");
        printf("           : %f %f %f %f \n", q[0], q[1], q[2], q[3]); 
        printf("        +  : %f %f %f %f \n", q[4], q[5], q[6], q[7]); 
        printf("--------------------------------------------------------\n");
 

        /* Fire up VU0 */
         ps2_vpu_start(vdev0,0);

        while ( ps2_vpu_busy(vdev0)) { i++; } 
        printf("        =  : %f %f %f %f \n", q[0], q[1], q[2], q[3]); 

        printf("BTW, this C program was able to count to %d before VU0 finished.\n", i);
        ps2_vpu_close(vdev0);

        return 0;
}
 


Part 3: Controlling the Vector Units in Micro Mode
 

3.1. How can you start the vector units?

In order to run code that has been passed to the VUs, they must be started. Starting and stopping the vector units is well-documented in the VU User's Manual.  The sample program in 2.5. illustrates the use of the function call ps2_vpu_start() provided in vpudev.c.
 

3.2. How can you stop the vector units?

One way is by writing:
    NOP [E]   NOP
in your assembly code.  See the VU User's Manual for details on the different modes of the VU.

yonder (yonderboy@users.playstation2-linux.com) adds:  "Also, many opcodes can set the E bit, not just NOP."
 


Part 4: Sending data to the Vector Units
  There are 3 main ways to get your data in and out of the VUs.

  1. You can request access to the mirrored memory of VU0 and VU1 in main RAM by using mmap yourself (or have Sony's libps2dev do it for you; this approach is the main focus of this document). You copy your data via memcpy to the mmap()ed memory location.
  2. You can send your data using the PS2's DMA controller. This can be done inefficiently by memcpy()ing your data into a newly allocated area of main RAM that you designate as the DMA packet; the more clever solution is to insert the necessary prefix and suffix codes directly before and after your data in memory.
  3. Make use of a kernel module to directly access DMA registers (which are listed near the beginning of the EE User's Manual Version 5.0).
Technique 1 is very simple to implement, but is slow if you need to transfer large amounts of data repeatedly. Technique 3 is the fastest possible and is more like programming on a real PS2 development box (T10k).

4.1. How do you write data directly to VU Data Memory?

See the code in Question 2.5., where two vectors ( <1,1,1,1> and <2,3,4,5> ) are written directly into VU Memory through the memory mapping done by mmap().  Another way to write data to the VUs is to send it by DMA through the VIF interface.  This optionally offers an extra speed advantage: the data can be sent "compressed", so that less has to be transmitted (see next section).
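
The direct-write approach boils down to a bounds-checked memcpy() into the mapped region. Here is a minimal sketch of that idea. It is not part of libps2dev: the helper name write_vectors and all of its parameters are hypothetical, vu_mem stands for whatever pointer your mapping gave you (for example the value returned by ps2_vpu_data() in Question 2.5.), and the caller supplies the size of the mapping (4k for VU0 or 16k for VU1, going by the table in 1.2.).

```c
#include <stddef.h>
#include <string.h>

/* One VU quadword: four 32-bit floats (x, y, z, w). */
#define FLOATS_PER_VECTOR 4

/*
 * Hypothetical helper (NOT a libps2dev function): copy `count` vectors
 * into mapped VU Memory, starting at vector slot `first_slot`.
 * `vu_mem` is the mapped base pointer and `vu_mem_bytes` the size of
 * the mapping. Returns 0 on success, -1 if the copy would not fit.
 */
int write_vectors(float *vu_mem, size_t vu_mem_bytes, size_t first_slot,
                  const float *vectors, size_t count)
{
    const size_t vec_bytes = FLOATS_PER_VECTOR * sizeof(float);
    const size_t total_slots = vu_mem_bytes / vec_bytes;

    if (vu_mem == NULL || vectors == NULL)
        return -1;

    /* Reject anything that would run past the end of the mapping. */
    if (first_slot > total_slots || count > total_slots - first_slot)
        return -1;

    memcpy(vu_mem + first_slot * FLOATS_PER_VECTOR, vectors,
           count * vec_bytes);
    return 0;
}
```

Under these assumptions, the eight assignments to q[0] through q[7] in Question 2.5. could be replaced by one call such as write_vectors(q, 4096, 0, v, 2), where v is a float array holding both vectors back to back.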
 

4.2. What is this about sending "compressed data", and hardware decompression in the VIF?

The compression is actually bit-packing.  For example, if you send each 1 or 0 as its own byte, you transmit 8 times as much data as needed.  A set of 8 ones and zeros can instead be packed into a single byte:
 


00000001
00000000
00000000
00000001
00000001                  ->          10011101
00000001
00000000
00000001

The VIF handles the unpacking upon receipt of the packet.  See the VU User's Manual for details.
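
To make the bit-packing idea concrete, here is a small software-only sketch of the packing shown in the diagram above. It is an illustration only: the function names pack_flags and unpack_flags are made up for this FAQ, and the code does not claim to reproduce any actual VIF packet format (the VIF does its unpacking in hardware, using the formats described in the VU User's Manual).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Pack eight flag bytes (each 0 or 1) into a single byte.
 * flags[0] becomes the most significant bit, matching the diagram above.
 */
uint8_t pack_flags(const uint8_t flags[8])
{
    uint8_t packed = 0;
    size_t i;

    for (i = 0; i < 8; i++)
        packed = (uint8_t)((packed << 1) | (flags[i] & 1u));
    return packed;
}

/* Reverse of pack_flags: expand one byte back into eight flag bytes. */
void unpack_flags(uint8_t packed, uint8_t flags[8])
{
    size_t i;

    for (i = 0; i < 8; i++)
        flags[i] = (uint8_t)((packed >> (7 - i)) & 1u);
}
```

Calling pack_flags() on the eight bytes from the diagram (1, 0, 0, 1, 1, 1, 0, 1) gives 0x9D, which is 10011101 in binary.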
 

4.3. How do you send data through the VIF?

I haven't done this yet.  Anyone want to write this one up? [hikey has volunteered. :-) ]


Part 5: Miscellaneous

5.1. What is vpu.cmd?

Anyone want to write this up?

5.2. What if I don't want to use .elf files?  How can I link my code in directly?

There is apparently a way to get around .elf files for those who are bothered by them.  [I am still not clear on how any of this works, so if someone wants to explain this in more detail, please feel free to e-mail me.   Ed.]    According to yonder (yonderboy@users.playstation2-linux.com):

"There is another way to load and run vpu code.  In your makefile, include the lines:
 

 %.vh: %.dsm 
              ee-dvp-as -g -o $*.vo $< > $*.lst 
              nm -g -n -P $*.vo | sed -e 's/^/#define /' -e 's/[^ ] 00\+/0x0/' > $*.vh 
              objcopy -Obinary $*.vo $*.vbin 

      that will create a vh file that contains all microcode labels and the address they are at.  If you include the file, you can
      just use vcallms. "

There is something similar to this in Aibopet's makefile:
 

.vu.o:
        ee-dvp-as -o $*.vo_ $*.vu
        objcopy -Obinary $*.vo_ $*.bin_
        ./bin2as $* $*.bin_ > $*.a_
        as -mcpu=r5900 -KPIC -mips2 -o $*.o $*.a_
        rm $*.vo_ $*.bin_ $*.a_
 

which seems to use bin2as.cpp, an "elf code segment ripper".

[Editor's note:  If someone could provide me with a minimalist code example like the adder.vsm one above using this linking technique, it would help to make this document a better resource.]

5.3. How about a simple example of this linking?

Yonder provides an example:

assume microcode.dsm looks like this: 

.global vu0_microentry 
.global vu0_label_name 
vu0_microentry: 
.vu 
vu0_label_name: 
    mul[e] vf4, vf4, vf5         nop 
    nop                                  mr32 vf4,vf4 


step 1: assemble the source and build a header from extracted symbols. Assume the asm file is microcode.dsm and the created header is microcode.vh 
         ee-dvp-as -g -o $*.vo $< > $*.lst 
        nm -g -n -P $*.vo | sed -e 's/^/#define /' -e 's/[^ ] 00\+/0x0/' > $*.vh 
        objcopy -Obinary $*.vo $*.vbin 


step 2: in your C source: 


#include "microcode.vh" 


int do_something(vector x, vector y) 
{ 
   asm __volatile__(" 
   lqc2 vf4,0x0(%0) 
   lqc2 vf5,0x0(%1) 
   vcallms %2 
   sqc2 vf4, 0x0(%0) 
  "::"r"(x), "r"(y), "r"(vu0_label_name)); 
} 

5.4. How is this related to Colin's and Aibopet's methods?

Aibopet writes (see his mandelbrot code released at http://aibohack.com/ps2/):
"source.vu" is the source in raw VU format (no DMA tags, no VCL syntax)
The file is assembled, and the text portion is extracted (call this the VUCODE)
In the Sony samples, the VUCODE is linked into a separate .ELF file - which
is overkill IMHO.

Colin's method places the VUCODE in a separate VBIN file (without the ELF file
overhead).

My method uses the trivial "bin2as" utility, which turns it into assembler data
constants that are assembled and linked into the final program's .text segment, just like
any other code [of course it can't run on the main CPU]

The "MYVPU::LoadJob" function copies the VU program from the normal .text segment
to VU code RAM with a trivial VUCODE memory manager.
The proper routine is then started with "VCALLMS"

----

No DMA is required (although it could be used if you so wish)

Right now there is one VU routine per source file [not a terrible limitation
for my case]. The actual VUCODE can run from any location (ie. there are no
VCALLMS fixed addresses)



The nm/sed creation of the .vh file creates absolute symbols at build time
for the VUCODE entries.
You must load the VBIN file at the very start of the VUCODE area.
My technique allocates VUCODE at run time (ie. there are no absolute addresses,
    so you have to use VCALLMSR instead of VCALLMS)
In future I want to implement a more dynamic process of VU code
loading/optimization (which is why I chose the dynamic technique) --AiboPet

---- from makefile ----

.vu.o:
	ee-dvp-as -o $*.vo_ $*.vu
	objcopy -Obinary $*.vo_ $*.bin_	
	./bin2as $* $*.bin_ > $*.a_
	as -mcpu=r5900 -KPIC -mips2 -o $*.o $*.a_	
	rm $*.vo_ $*.bin_ $*.a_
------
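
For readers wondering what a bin2as-style step might look like, here is a rough sketch. To be clear, this is NOT AiboPet's bin2as source and its output format is a guess: the function name emit_as_constants, the symbol naming, and the choice of .byte directives are all assumptions made for illustration. It only shows the general idea from the description above, namely turning raw VU code bytes into assembler data constants under a global label so that "as" can assemble them into the program.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a bin2as-style tool (NOT AiboPet's code):
 * write `len` bytes of raw VU code as assembler ".byte" constants
 * under a global label named `symbol`.
 * Returns 0 on success, -1 on bad arguments or a write error.
 */
int emit_as_constants(FILE *out, const char *symbol,
                      const unsigned char *data, size_t len)
{
    size_t i;

    if (out == NULL || symbol == NULL || (data == NULL && len > 0))
        return -1;

    /* Section, alignment and a global label for the linker. */
    fprintf(out, "\t.text\n\t.align 4\n\t.globl %s\n%s:\n", symbol, symbol);

    /* Eight bytes per line: one 64-bit VU instruction (upper + lower). */
    for (i = 0; i < len; i++) {
        if (i % 8 == 0)
            fputs("\t.byte ", out);
        fprintf(out, "0x%02x", data[i]);
        fputs((i % 8 == 7 || i + 1 == len) ? "\n" : ", ", out);
    }

    /* End label, so that C code can work out the program size. */
    fprintf(out, "%s_end:\n", symbol);
    return ferror(out) ? -1 : 0;
}
```

A small main() that reads the .bin_ file into a buffer and calls this function with stdout would then slot into the "./bin2as $* $*.bin_ > $*.a_" step of the makefile rule above, again assuming the real tool behaves roughly like this.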


the end.