SPS2 - Direct PS2 Access Environment - Bugs



[ Bug #108 ] sps2_Allocate/sps2_Free crashes after many repetitions

Date:
2003-Nov-26 11:21
Submitted By:
gevmage
Assigned To:
sauce
Category:
None
Priority:
3
Bug Group:
None
Resolution:
None
Summary:
sps2_Allocate/sps2_Free crashes after many repetitions

Original Submission:
[I don't have the code in front of me, so I'm guessing at the function names]

The functions that do the allocation/de-allocation of the special non-swappable memory seem to have some kind of error in them. After several hundred repetitions of
sps2Allocate()
sps2Free()
on the same pointer (without doing anything else), the machine becomes unstable or crashes. I've had it happen while the program was running, and I've had it happen after I'd done a bunch of tries and was then doing something else. Perhaps it's doing something not quite right with the page tables?

While sps2 is doing its stuff, I get lots of messages on the console describing what it's doing. When it has failed in the situations above, I've gotten the kernel panic screen. I've also seen "kmem_alloc: Bad slab magic (corrupt) (name=size-256)" and "mapping physical 10000000 to virtual 00010000". I tried upgrading both the module and the dev header files to version 0.3.0a, but I was able to crash it then too.

On Monday (December 1), I will be back in my office and able to try different things to reproduce the bug. Since playstation2-linux.com was down yesterday afternoon as I was about to leave, I'm uploading this from my parents-in-law's cabin, which is why I don't have the code available.


Followups

Comment Date By
Very well, thanks for your submission. I'll try to take a look at it soon and see what I can find.

Sauce
2003-Dec-17 09:46 sauce
I have built a library that does arbitrary sized matrix-matrix multiplies using VU1 as a computational core. I use sps2 to allocate the memory for the matrices so that the virtual addresses are contiguous and the memory is pinned in physical space.

I recently completed a function wrapper that allows BLAS-type calls to my matrix multiplier. A graduate student here at the University of Illinois is using the multiplier in a test harness that characterizes the speed of the core vs. the speed of the VU by exploring the parameter space of matrix sizes; in other words, allocating a great many matrices of different sizes, testing them, un-allocating them, and then moving on to the next permutation.

That is the application that started causing our machines to fail in mysterious ways. I didn't post the actual code that first caused the problem because it would require a couple of layers of libraries and tons of code. I spent some time tracking the problem to make sure that it wasn't something in my code, and what I posted was the result. That is, the problem still existed (consistently) even when I stripped out everything that I wrote and just did the Allocate/Free in a loop.

The point that I can write a work-around is well taken. However, the existence of this bug means that anyone using sps2 on a regular basis will have periodic, weird crashes that will be impossible to track down. After running sps2 a bunch, I've had intermittent failures in other applications, like gcc or emacs. When gcc seg-faulted on me in that circumstance, I ran the same compile again, and the second time it worked without a problem.

2003-Dec-11 09:28 gevmage
Hi Craig,

Indeed, I have been aware of this bug for a very long time now. I have given it extremely low priority because what you are doing, while good for reproducing the bug, is not good for much else.

Do you have a specific need or application that actually triggers this bug and thus escalates its priority?

Remember, one of the points of sps2 is to try to reduce the number of user->kernel space transitions. If you're really interested in allocating/freeing so much then you should consider allocating a chunk in advance and writing an extremely simple memory manager in your application.
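
A minimal sketch of that kind of memory manager, offered as an illustration rather than anything from the sps2 library: a bump allocator that hands out aligned pieces of one pre-allocated chunk and is reset between test permutations. The names pool_t, pool_init, pool_alloc and pool_reset are hypothetical. The chunk is assumed to come from a single sps2Allocate() call made once at start-up; how to obtain its address from the returned sps2Memory_t is not shown in this thread, so the sketch takes a plain pointer and size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pool over one pre-allocated chunk (not part of sps2). */
typedef struct {
    unsigned char *base;  /* start of the chunk, assumed to be obtained once */
    size_t size;          /* total bytes in the chunk */
    size_t used;          /* bytes handed out so far, including padding */
} pool_t;

/* Attach the pool to an existing chunk; no kernel call is made here. */
static void pool_init(pool_t *p, void *chunk, size_t size)
{
    assert(p != NULL && chunk != NULL);
    p->base = (unsigned char *)chunk;
    p->size = size;
    p->used = 0;
}

/* Return nbytes aligned to 'align' (a power of two), or NULL if full. */
static void *pool_alloc(pool_t *p, size_t nbytes, size_t align)
{
    uintptr_t start, aligned;
    size_t offset;

    assert(align != 0 && (align & (align - 1)) == 0);
    start = (uintptr_t)p->base + p->used;
    aligned = (start + (align - 1)) & ~(uintptr_t)(align - 1);
    offset = (size_t)(aligned - (uintptr_t)p->base);
    if (offset > p->size || nbytes > p->size - offset)
        return NULL;              /* request does not fit in the chunk */
    p->used = offset + nbytes;
    return p->base + offset;
}

/* Release everything at once, e.g. before the next matrix-size permutation. */
static void pool_reset(pool_t *p)
{
    p->used = 0;
}
```

With something like this, a test harness would call pool_alloc() for each matrix of a permutation and pool_reset() before the next one, so the allocate/free pair in the driver runs once per program instead of thousands of times. Individual blocks cannot be freed on their own, which is the price of keeping it this simple.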

Sauce

2003-Dec-11 08:53 sauce


I have verified that the crash still occurs with all of my library calls stripped out of the code. Below is my latest test program, which only calls sps2 directly. It can be compiled and run on a PlayStation 2 with the Linux kit and sps2 installed; the only option required is a -I option pointing to where sps2lib.h is.

I just ran this code again, and it failed on iteration 109.

Craig Steffen

#include <stdio.h>
#include <stdlib.h>   /* exit() */
#include <unistd.h>   /* sleep() */
#include <values.h>
#include <time.h>
#include <sps2lib.h>

int main(int nargs, char *args[]){

    int N_ops_per_iteration=25;
    int N_iterations=200;
    int wait_per_iteration=1;
    int i,j;
    int SPS2_device_handle;
    sps2Memory_t *mem_pointer;

    if((SPS2_device_handle=sps2Init())<0){
        fprintf(stderr,"Failed to initialize SPS device!\n");
        return 1;
    }

    for(i=0;i<N_iterations;i++){
        fprintf(stderr,"Iteration %4d: ",i);
        fflush(NULL);
        for(j=0;j<N_ops_per_iteration;j++){
            /* one dot per allocate/free pair */
            fprintf(stderr,".");
            fflush(NULL);

            /* now we do the stuff that needs testing */
            if((mem_pointer=sps2Allocate(10000,SPS2_MAP_BLOCK_4K |
                                         SPS2_MAP_CACHED,
                                         SPS2_device_handle))
               ==NULL){
                fprintf(stderr,"sps2Allocate() returned NULL!\n");
                exit(1);
            }
            sps2Free(mem_pointer);
        } // for(j...)
        fprintf(stderr,"\n");
        fflush(NULL);
        sleep(wait_per_iteration);
    } // for(i..)

    return 0;
}

2003-Dec-03 08:29 gevmage
A further comment:

The failure seems to happen after about the same number of allocates/de-allocates each time and, at least at a low level, does not depend on the size of the allocations, just the number of them.

In the code in the previous comment, I increased N_iterations to 500. The code then crashed right around iteration 117. With N_iterations decreased to 90, it ran OK, but then died about halfway through the next run. The first test seems to indicate the failure is after about 2000 allocates/deallocates.

Craig Steffen
2003-Dec-02 14:06 gevmage
Now that I'm back at my actual office, here's the code that, when run multiple times, eventually crashes the machine:

[lib_train_initialize() can be replaced by the sps2Init function, and then the SPS2 device handle used as usual]

int main(int nargs, char *args[]){

    int N_ops_per_iteration=25;
    int N_iterations=10;
    int wait_per_iteration=1;
    int i,j;
    int SPS2_device_handle;
    sps2Memory_t *mem_pointer;

    /* general initialize function */
    if(lib_train_initialize()){
        fprintf(stderr,"Could not initialize lib_train_multiply!\n");
        exit(1);
    }

    if((SPS2_device_handle=get_SPS2_device_handle()) == -1){
        fprintf(stderr,"Error from get_SPS2_device_handle()!\n");
        exit(1);
    }

    for(i=0;i<N_iterations;i++){
        fprintf(stderr,"Iteration %4d: ",i);
        fflush(NULL);
        for(j=0;j<N_ops_per_iteration;j++){
            fprintf(stderr,".");
            fflush(NULL);

            /* now we do the stuff that needs testing */
            if((mem_pointer=sps2Allocate(400,SPS2_MAP_BLOCK_4K |
                                         SPS2_MAP_CACHED,
                                         SPS2_device_handle))
               ==NULL){
                fprintf(stderr,"sps2Allocate() returned NULL!\n");
                exit(1);
            }
            sps2Free(mem_pointer);
        } // for(j...)
        fprintf(stderr,"\n");
        fflush(NULL);
        sleep(wait_per_iteration);
    } // for(i..)

    return 0;
}
2003-Dec-01 15:45 gevmage

Bug Change History

Field Old Value Date By
priority 5 2003-Dec-11 10:15 sauce
assigned_to nobody 2003-Dec-11 10:15 sauce