header image
glTexImage2D doesn’t load sequential RGB data correctly
January 7th, 2011 under OpenGL. [ Comments: none ]

I wanted to render the image I had in memory with OpenGL. The image data are stored as

1
unsigned byte
unsigned byte
data without padding. The colors are represented by RGB (in this order). When I loaded the texture data with

1
2
3
4
5
6
7
8
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB,
             width, height, 0, 
             GL_RGB,  GL_UNSIGNED_BYTE, 
             imageData);
glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP);
glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP);
glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB,
             width, height, 0, 
             GL_RGB,  GL_UNSIGNED_BYTE, 
             imageData);
glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP);
glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP);
glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);

the rendered image showed only a black and with image which was line-wise distorted. The reason for the distortion was that by image data was not correctly aligned. The

1
GL_UNPACK_ALIGNMENT
GL_UNPACK_ALIGNMENT
defines the alignment when uploading data. So, setting
1
glPixelStorei(GL_UNPACK_ALIGNMENT, 1)
glPixelStorei(GL_UNPACK_ALIGNMENT, 1)
solved my problem.


AntTweakBar
September 24th, 2009 under OpenGL. [ Comments: none ]

Yesterday I discovered a great libraray for OpenGL/DirectX which allows you to easily add an interface to change varibales. It is called AntTweakBar.

What needs to be done is just to tell the library what your variables are and of what type they are. The rest is done by the library. So if you are looking for an easy-to-use but still visually appealing way to edit certain variables of your OpenGL/DirectX application this is certainly a library to look into.


CUDA 2D/2D Memcpy Issue and Pitched Pointers
August 18th, 2009 under OpenGL. [ Comments: none ]

In my application I wanted to copy 3D array data from linear memory (allocated with

1
cudaMalloc
cudaMalloc
) to a
1
cudaArray
cudaArray
with
1
cudaMemcpy3D
cudaMemcpy3D
. Now, I had the problem that
1
cudaMemcpy3D 
cudaMemcpy3D 
reported
1
cudaErrorInvalidValue
cudaErrorInvalidValue
. If you encounter a problem like that it might be because of the fact that
1
cudaMalloc 
cudaMalloc 
was used instead of
1
cudaMalloc3D
cudaMalloc3D
for allocatimg the linear memory. If
1
cudaMalloc 
cudaMalloc 
was used then you are forced to create you own
1
cudaPitchedPtr 
cudaPitchedPtr 
with
1
make_cudaPitchedPtr
make_cudaPitchedPtr
in order to call
1
cudaMemcpy3D
cudaMemcpy3D
. According to the manual (this information is not extremely emphasized in it)
1
make_cudaPitchedPtr
make_cudaPitchedPtr
is not guaranteed to construct a pitched pointer which is valid for memory copy procedures. One should always use
1
cudaMalloc3D 
cudaMalloc3D 
or
1
cudaMallocPitched 
cudaMallocPitched 
whenever we want to memcpy because these methods guarantee valid pitched pointers.

“For allocations of 2D and 3D objects, it is highly recommended that programmers perform allocations using cudaMalloc3D() or cudaMallocPitch(). Due to alignment restrictions in the hardware, this is especially true if the application will be performing memory copies involving 2D or 3D objects (whether linear memory or CUDA arrays).” [from the Reference Manual]

This, in fact, solved my problem of the

1
cudaErrorInvalidValue
cudaErrorInvalidValue
when I tried to
1
cudaMemcpy3D
cudaMemcpy3D
. However, the fact that we use pitched pointer complicates the array access in the kernels. First of all we have to keep track of the data type stored in a pitched pointer (ptr is a void pointer). Furthermore, we have to be careful when dealing with pointer arithmectics because the pitch is, obviously, a byte offset (width of the array).  Here is some sample code that shows how to access pitched pointer data. We assume we want to store
1
float4 
float4 
data in our 3D array which is of size (width, height, depth). In the kernel we want to access the element (x,y,z).

Host Code:

1
2
3
4
5
...<br />
cudaPitchedPtr data;<br />
extent = make_cudaExtent(width* sizeof(float4), height, depth);<br />
cudaMalloc3D(&(d_data), extent);<br />
...
...<br />
cudaPitchedPtr data;<br />
extent = make_cudaExtent(width* sizeof(float4), height, depth);<br />
cudaMalloc3D(&(d_data), extent);<br />
...

Kernel Code:

1
__global__ kernel(char* data, size_t pitch, ...) {
__global__ kernel(char* data, size_t pitch, ...) {

1
2
3
...<br />
float4 element = *((float4*) (data + (x*sizeof(float4) + y*pitch + z*pitch*height)));<br />
...
...<br />
float4 element = *((float4*) (data + (x*sizeof(float4) + y*pitch + z*pitch*height)));<br />
...

The one thing I’d like to point out is the fact that we have to use

1
x*sizeof(float4)
x*sizeof(float4)
for accessing the data. This makes sense because is a char/byte pointer at the time of the pointer arithmetic. Thus, we have to move the pointer by
1
x*sizeof(float4)
x*sizeof(float4)
. This musn’t be done in the y and z direction because the
1
sizeof(float4)
sizeof(float4)
is already considered in the pitch (the pitch is the width of the array in byte, probably padded for faster memory access).


CUDA 3D Array Pitch
August 17th, 2009 under OpenGL. [ Comments: none ]

The CUDA manual isn’t really precise in telling how to use the pitch with 3D Array (memory obtained by using

1
cudaMallocPitch, cudaMalloc3D
cudaMallocPitch, cudaMalloc3D
, not 3D cudaArrays). So here is a brief summary of how (I think) everything works.

What nVidia means by a pitch is basically the width in byte of an array (2D or 3D). We can specify the width of the array in bytes like this

1
volumeSize.width *sizeof(YOUR_TYPE)
volumeSize.width *sizeof(YOUR_TYPE)
. With this information we can access any element in a 3D array. We can compute the
1
slicePitch
slicePitch
(for a 3D array) which is
1
volumeSize.height*pitch
volumeSize.height*pitch
. The
1
slicePitch
slicePitch
is the offset in bytes from one slice in a 3D array to the next slice.

Now let’s say we want to access the element (x,y,z) in a 3D array

(which is the base pointer). This is how we can do it:

1
(myType) (((char*)data)[x*sizeof(myType) + y*pitch + z*slicePitch])
(myType) (((char*)data)[x*sizeof(myType) + y*pitch + z*slicePitch])

So what we get from the methods

1
cudaMallocPitch, cudaMalloc3D
cudaMallocPitch, cudaMalloc3D
is linear memory which need to be accessed in the way described above.
We need to know how pitches work when we want to copy arrays (when the copy method expects a pitched pointer which we can create with
1
make_cudaPitchedPtr(baseAddress, pitch, width, height)
make_cudaPitchedPtr(baseAddress, pitch, width, height)
). This is also sufficient for the copy process. Let’s assume we use
1
cudaMemcpy3D
cudaMemcpy3D
. In this case we must specify the extent, in other words the size of our array. All necessary information to execute the copy are now present (it can be ensured that no invalid memory will be accessed) . Each element can be accessed by
1
cudaMemcpy3D
cudaMemcpy3D
as described above. So, this means that we do not need to use the depth (in case of a 3D array) when we
1
make_cudaPitchedPtr
make_cudaPitchedPtr
.


CUDA and Critical Sections with Locks
July 21st, 2009 under OpenGL. [ Comments: none ]

I had the following problem:

Each CUDA Thread might possible write to any cell in a 3d array in global memory. Thus, some synchronization is required.

The easiest solution is using the atomic operations which are provided by CUDA. Unfortunately, the performance isn’t great. Therefore, I tried improving the performance by implementing a locking mechanism of my own. Simply, using a lock.

The result was that this approach is absolutely useless. I experienced a performance loss of nearly 1000x


« Previous entries