
Segmentation error when using thrust::sort in CUDA



By : user2953387
Date : November 20 2020, 01:01 AM
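cutInfoptr is a pointer to TetraCutInfo holding the address of device memory allocated with cudaMalloc. Thrust cannot tell from a raw pointer that it refers to device memory, so it dispatches the algorithm to the host, and dereferencing device memory from host code causes the segmentation fault. Wrap the raw pointer in a thrust::device_ptr before passing it to a Thrust algorithm: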
code :
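#include <thrust/device_ptr.h>
#include <thrust/fill.h>
#include <cuda_runtime.h>

int main()
{
  size_t N = 10;

  // raw pointer to device memory
  int * raw_ptr;
  cudaMalloc((void **) &raw_ptr, N * sizeof(int));

  // wrap raw pointer with a device_ptr so Thrust runs on the device
  thrust::device_ptr<int> dev_ptr(raw_ptr);

  // use device_ptr in thrust algorithms
  thrust::fill(dev_ptr, dev_ptr + N, (int) 0);

  cudaFree(raw_ptr);
  return 0;
}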
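A minimal sketch of sorting memory obtained from cudaMalloc, assuming a hypothetical CutInfo struct standing in for TetraCutInfo (whose fields the question does not show) and a hypothetical helper named sort_raw_device_buffer. It shows two ways of telling Thrust that a raw pointer refers to device memory: wrapping it in thrust::device_ptr, or passing the thrust::device execution policy. Error checking on the CUDA calls is omitted for brevity.

```cuda
#include <thrust/device_ptr.h>
#include <thrust/sort.h>
#include <thrust/execution_policy.h>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical stand-in for TetraCutInfo: the real struct is not shown.
struct CutInfo { int id; float key; };

// Comparator ordering records by their (hypothetical) key field.
struct by_key {
  __host__ __device__ bool operator()(const CutInfo &a, const CutInfo &b) const {
    return a.key < b.key;
  }
};

// Hypothetical helper: copies records into cudaMalloc'd memory, sorts them on
// the device, and returns the ids in sorted order.
std::vector<int> sort_raw_device_buffer(const std::vector<CutInfo> &in, bool use_policy)
{
  const size_t n = in.size();
  CutInfo *cutInfoptr;                              // raw device pointer, as in the question
  cudaMalloc((void **)&cutInfoptr, n * sizeof(CutInfo));
  cudaMemcpy(cutInfoptr, in.data(), n * sizeof(CutInfo), cudaMemcpyHostToDevice);

  if (use_policy) {
    // Keep the raw pointer, but state explicitly that it refers to device memory.
    thrust::sort(thrust::device, cutInfoptr, cutInfoptr + n, by_key());
  } else {
    // Wrap the raw pointer so Thrust dispatches to the device backend.
    thrust::device_ptr<CutInfo> p(cutInfoptr);
    thrust::sort(p, p + n, by_key());
  }

  std::vector<CutInfo> out(n);
  cudaMemcpy(out.data(), cutInfoptr, n * sizeof(CutInfo), cudaMemcpyDeviceToHost);
  cudaFree(cutInfoptr);

  std::vector<int> ids;
  for (const CutInfo &c : out) ids.push_back(c.id);
  return ids;
}
```

For example, records with ids 0, 1, 2, 3 and keys 3, 1, 2, 0 come back with their ids in the order 3, 1, 2, 0, whichever of the two variants is used.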


Crash with thrust::min_element on thrust::device_vector (CUDA Thrust)



By : Shitiz Bhutani
Date : March 29 2020, 07:55 AM
I can reproduce the error in debug mode when targeting Compute Capability 2.0 (i.e. nvcc -G0 -arch=sm_20). The bug does not reproduce in release mode or when targeting Compute Capability 1.x devices, which generally suggests a code-generation problem rather than a bug in the library. Wherever the fault lies, I'd encourage you to submit a bug report so this issue gets the attention it deserves. In the meantime, I'd suggest compiling in release mode, which is more rigorously tested.
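For reference, a minimal sketch of a thrust::min_element call on a thrust::device_vector that recovers both the position and the value of the minimum. The helper name min_position is hypothetical, and it assumes a non-empty input. Per the suggestion above, build it in release mode, i.e. without -G.

```cuda
#include <thrust/device_vector.h>
#include <thrust/extrema.h>
#include <vector>

// Hypothetical helper: returns the position of the smallest element and
// stores the element itself in `value`. Assumes `h` is non-empty.
int min_position(const std::vector<float> &h, float &value)
{
  thrust::device_vector<float> d(h.begin(), h.end());
  thrust::device_vector<float>::iterator it = thrust::min_element(d.begin(), d.end());
  value = *it;                                   // copies one element back to the host
  return static_cast<int>(it - d.begin());
}
```

For the input 4, 2.5, 7, -1, 3 it returns position 3 and sets value to -1.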
Is it possible to use thrust::device_vector and thrust::fill for 2D arrays using thrust library in CUDA



By : 0ver1oad
Date : March 29 2020, 07:55 AM
In STL and thrust, a vector is a container of data elements adhering to a strict linear sequence, so it is essentially 1-D in nature. In thrust, these data elements can be ordinary types, and even structs and objects, but they cannot be other vectors (unlike STL).
You can create an array of vectors, but thrust operations on them will generally need to be done one-by-one on each vector within the array.
code :
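thrust::device_vector D[5][10];       // invalid: the element type is missing
thrust::device_vector<int> D[5][10];  // valid: a 5x10 array of separate, empty vectors

A single flat vector indexed as row*W + col avoids the array of vectors entirely:
code :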
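#include <cstdio>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/copy.h>

#define H 5
#define W 10

// element (row, col) of the flattened H x W array lives at index row*W + col
__global__ void kernel(int *data, int row, int col) {

  printf("Element (%d, %d) = %d\n", row, col, data[(row*W)+col]);

}

int main(void)
{
    int h[H][W];
    for (int i = 0; i < H; i++)
        for (int j = 0; j < W; j++)
            h[i][j] = i*W + j;
    thrust::device_vector<int> d(H*W);

    // a 2D C array is contiguous in row-major order, so it can be copied as one
    // range; the end pointer must be one past the last element
    thrust::copy(&(h[0][0]), &(h[H-1][W-1]) + 1, d.begin());
    // alternatively, generate the same values 0, 1, 2, ... directly on the device
    thrust::sequence(d.begin(), d.end());
    kernel<<<1,1>>>(thrust::raw_pointer_cast(d.data()), 2, 3);
    cudaDeviceSynchronize();

    return 0;
}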
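With the flattened layout, thrust::fill can still address 2D regions: a full row is one contiguous range, while a rectangular block needs one fill call per row segment. A minimal sketch, assuming row-major storage in a single device_vector<int>; fill_row, fill_block and to_host are hypothetical helper names.

```cuda
#include <thrust/device_vector.h>
#include <thrust/fill.h>
#include <thrust/copy.h>
#include <vector>

#define H 5
#define W 10

// Hypothetical helper: row `row` of the flattened H x W array is the contiguous
// range [row*W, (row+1)*W), so a single fill call covers it.
void fill_row(thrust::device_vector<int> &d, int row, int value)
{
  thrust::fill(d.begin() + row * W, d.begin() + (row + 1) * W, value);
}

// Hypothetical helper: rows [r0, r1) x columns [c0, c1) are not contiguous,
// so fill one row segment per call.
void fill_block(thrust::device_vector<int> &d, int r0, int r1, int c0, int c1, int value)
{
  for (int r = r0; r < r1; r++)
    thrust::fill(d.begin() + r * W + c0, d.begin() + r * W + c1, value);
}

// Copies the flattened array back to the host for inspection.
std::vector<int> to_host(const thrust::device_vector<int> &d)
{
  std::vector<int> h(d.size());
  thrust::copy(d.begin(), d.end(), h.begin());
  return h;
}
```

For example, starting from thrust::device_vector<int> d(H * W, 0), calling fill_row(d, 2, 7) sets all of row 2 to 7, and fill_block(d, 0, 2, 3, 5, 1) sets elements (0,3), (0,4), (1,3) and (1,4) to 1.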
Sort 2D array in Cuda with Thrust



By : user3205134
Date : March 29 2020, 07:55 AM
It's possible to do this in thrust. One possible approach would be to create a custom sort functor that traverses the rows that are given to it (let's say the rows are identified via indices passed to the functor), and then decides the ordering of those rows.
To implement this, we can create an index array, one index per row, that we will sort. We will sort this index array based on the given data array (using the custom sort functor that orders rows).
code :
$ cat t631.cu
#include <iostream>
#include <iterator>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/sort.h>
#include <thrust/sequence.h>
#include <thrust/copy.h>

#define DWIDTH 10

typedef int mytype;

struct my_sort_functor
{
  int my_width;
  mytype *my_data;
  my_sort_functor(int _my_width, mytype * _my_data): my_width(_my_width), my_data(_my_data) {};

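  // returns true if row idx1 should come before row idx2: the rows are compared
  // column by column, and the first differing column decides (ascending
  // lexicographic order)
  __host__ __device__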
  bool operator()(const int idx1, const int idx2) const
    {
      bool flip = false;
      for (int col_idx = 0; col_idx < my_width; col_idx++){
        mytype d1 = my_data[(idx1*my_width)+col_idx];
        mytype d2 = my_data[(idx2*my_width)+col_idx];
        if (d1 > d2) break;
        if (d1 < d2) {flip = true; break;}
        }
      return flip;
    }
};

int main(){

  mytype data[] = {
    3,     2,     2,     3,     2,     2,     3,     3,     3,     3,
    3,     3,     2,     2,     2,     2,     3,     3,     2,     2,
    3,     2,     2,     3,     2,     2,     3,     3,     3,     2,
    2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
    3,     2,     2,     2,     2,     2,     3,     2,     2,     2,
    2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
    3,     3,     2,     3,     2,     2,     3,     3,     2,     3,
    3,     3,     2,     2,     2,     2,     3,     3,     3,     3,
    3,     2,     2,     3,     2,     2,     3,     3,     2,     3,
    3,     3,     2,     3,     2,     2,     3,     3,     3,     3 };

  int cols  = DWIDTH;
  int dsize = sizeof(data)/sizeof(mytype);
  int rows  = dsize/cols;
  thrust::host_vector<mytype>   h_data(data, data+dsize);
  thrust::device_vector<mytype> d_data = h_data;
  thrust::device_vector<int> idxs(rows);
  thrust::sequence(idxs.begin(), idxs.end());
  thrust::sort(idxs.begin(), idxs.end(), my_sort_functor(cols, thrust::raw_pointer_cast(d_data.data())));
  thrust::host_vector<int> h_idxs = idxs;

  for (int i = 0; i<rows; i++){
    thrust::copy(h_data.begin()+h_idxs[i]*cols, h_data.begin()+(h_idxs[i]+1)*cols, std::ostream_iterator<mytype>(std::cout, ", "));
    std::cout << std::endl;}
  return 0;
}

$ nvcc -o t631 t631.cu
$ ./t631
2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
3, 2, 2, 2, 2, 2, 3, 2, 2, 2,
3, 2, 2, 3, 2, 2, 3, 3, 2, 3,
3, 2, 2, 3, 2, 2, 3, 3, 3, 2,
3, 2, 2, 3, 2, 2, 3, 3, 3, 3,
3, 3, 2, 2, 2, 2, 3, 3, 2, 2,
3, 3, 2, 2, 2, 2, 3, 3, 3, 3,
3, 3, 2, 3, 2, 2, 3, 3, 2, 3,
3, 3, 2, 3, 2, 2, 3, 3, 3, 3,
$
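The program above sorts only the index vector and rearranges the rows on the host while printing. If the row-sorted matrix itself is needed on the device, one option is a thrust::gather driven by a transformed counting_iterator. A minimal sketch, assuming int elements (as with mytype above) in row-major order; row_gather_map and gather_sorted_rows are hypothetical names.

```cuda
#include <thrust/device_vector.h>
#include <thrust/gather.h>
#include <thrust/copy.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <vector>

// Hypothetical functor: output element k of the row-sorted matrix comes from
// row idxs[k / width], column k % width, of the original row-major matrix.
struct row_gather_map
{
  int width;
  const int *idxs;   // sorted row indices, in device memory
  row_gather_map(int w, const int *p) : width(w), idxs(p) {}

  __host__ __device__
  int operator()(int k) const { return idxs[k / width] * width + k % width; }
};

// Builds the row-sorted matrix on the device with thrust::gather, then copies
// it to the host so the caller can print or check it.
std::vector<int> gather_sorted_rows(const thrust::device_vector<int> &data,
                                    const thrust::device_vector<int> &idxs,
                                    int width)
{
  const int n = (int)data.size();
  row_gather_map f(width, thrust::raw_pointer_cast(idxs.data()));
  thrust::counting_iterator<int> first(0), last(n);

  thrust::device_vector<int> sorted(n);
  thrust::gather(thrust::make_transform_iterator(first, f),
                 thrust::make_transform_iterator(last, f),
                 data.begin(), sorted.begin());

  std::vector<int> out(n);
  thrust::copy(sorted.begin(), sorted.end(), out.begin());
  return out;
}
```

In t631.cu this would be called as gather_sorted_rows(d_data, idxs, cols) after the thrust::sort call; the device_vector named sorted inside the function already holds the rearranged matrix before the final copy.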
CUDA Thrust sort or CUB::DeviceRadixSort



By : user3419038
Date : March 29 2020, 07:55 AM
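In either cub or thrust, we could sort on the .w "keys" only, doing a key-value sort where the values are just a linearly incrementing index:
code :
0, 1, 2, 3, ...
The sorted values then give the order in which to rearrange the original float4 records. The following program times this approach against a direct thrust::sort of the float4 array using a comparator on .w:
code :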
$ cat t686.cu
#include <iostream>
#include <vector_types.h>
#include <stdlib.h>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/permutation_iterator.h>
#include <thrust/sequence.h>
#include <thrust/copy.h>
#include <thrust/equal.h>

#include <time.h>
#include <sys/time.h>
#define USECPSEC 1000000ULL

unsigned long long dtime_usec(unsigned long long start){

  timeval tv;
  gettimeofday(&tv, 0);
  return ((tv.tv_sec*USECPSEC)+tv.tv_usec)-start;
}

#define DSIZE (32*1048576)

struct sort_f4_w
{
  __host__ __device__
  bool operator()(const float4 &a, const float4 &b) const {
    return (a.w < b.w);}
};
// functor to extract the .w element from a float4
struct f4_to_fw : public thrust::unary_function<float4, float>
{
  __host__ __device__
  float operator()(const float4 &a) const {
    return a.w;}
};
// functor to extract the .x element from a float4
struct f4_to_fx : public thrust::unary_function<float4, float>
{
  __host__ __device__
  float operator()(const float4 &a) const {
    return a.x;}
};


bool validate(thrust::device_vector<float4> &d1, thrust::device_vector<float4> &d2){
  return thrust::equal(thrust::make_transform_iterator(d1.begin(), f4_to_fx()), thrust::make_transform_iterator(d1.end(), f4_to_fx()), thrust::make_transform_iterator(d2.begin(), f4_to_fx()));
}


int main(){
  unsigned long long t1_time, t2_time;
  float4 *mydata = new float4[DSIZE];
  for (int i = 0; i < DSIZE; i++){
    mydata[i].x = i;
    mydata[i].y = i;
    mydata[i].z = i;
    mydata[i].w = rand()/(float)RAND_MAX;}

  thrust::host_vector<float4>   h_data(mydata, mydata+DSIZE);
  // do once as a warm-up run, then report timings on second run
  for (int i = 0; i < 2; i++){
    thrust::device_vector<float4> d_data1 = h_data;
    thrust::device_vector<float4> d_data2 = h_data;

  // first time sort using typical thrust approach
    t1_time = dtime_usec(0);
    thrust::sort(d_data1.begin(), d_data1.end(), sort_f4_w());
    cudaDeviceSynchronize();
    t1_time = dtime_usec(t1_time);
  // now extract keys and create index values, sort, then rearrange
    t2_time = dtime_usec(0);
    thrust::device_vector<float> keys(DSIZE);
    thrust::device_vector<int> vals(DSIZE);
    thrust::copy(thrust::make_transform_iterator(d_data2.begin(), f4_to_fw()), thrust::make_transform_iterator(d_data2.end(), f4_to_fw()), keys.begin());
    thrust::sequence(vals.begin(), vals.end());
    thrust::sort_by_key(keys.begin(), keys.end(), vals.begin());
    thrust::device_vector<float4> result(DSIZE);
    thrust::copy(thrust::make_permutation_iterator(d_data2.begin(), vals.begin()), thrust::make_permutation_iterator(d_data2.begin(), vals.end()), result.begin());
    cudaDeviceSynchronize();
    t2_time = dtime_usec(t2_time);
    if (!validate(d_data1, result)){
      std::cout << "Validation failure " << std::endl;
      }
    }
  std::cout << "thrust t1 time: " << t1_time/(float)USECPSEC << "s, t2 time: " << t2_time/(float)USECPSEC << std::endl;
}


$ nvcc -o t686 t686.cu
$ ./t686
thrust t1 time: 0.731456s, t2 time: 0.149959
$
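The program above shows only the thrust version of the key-value approach. A minimal sketch of the same idea with cub::DeviceRadixSort::SortPairs, assuming CUDA 11 or later (where CUB ships with the toolkit); f4_w and sort_by_w_cub are hypothetical names, error checking is omitted, and no timing claims are made for it.

```cuda
#include <cub/cub.cuh>
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/sequence.h>
#include <thrust/copy.h>
#include <thrust/iterator/permutation_iterator.h>
#include <vector_types.h>
#include <vector>

// Hypothetical functor: extracts the .w member (the sort key) of a float4.
struct f4_w
{
  __host__ __device__
  float operator()(const float4 &a) const { return a.w; }
};

// Sorts float4 records by .w with cub::DeviceRadixSort::SortPairs
// (keys = .w, values = 0, 1, 2, ...), then gathers the records in that order.
std::vector<float4> sort_by_w_cub(const std::vector<float4> &h)
{
  const int n = (int)h.size();
  thrust::device_vector<float4> data(h.begin(), h.end());
  thrust::device_vector<float> keys_in(n), keys_out(n);
  thrust::device_vector<int>   vals_in(n), vals_out(n);

  thrust::transform(data.begin(), data.end(), keys_in.begin(), f4_w());
  thrust::sequence(vals_in.begin(), vals_in.end());

  float *k_in  = thrust::raw_pointer_cast(keys_in.data());
  float *k_out = thrust::raw_pointer_cast(keys_out.data());
  int   *v_in  = thrust::raw_pointer_cast(vals_in.data());
  int   *v_out = thrust::raw_pointer_cast(vals_out.data());

  // CUB convention: the first call (null temp storage) only reports the
  // scratch size needed; the second call performs the sort.
  void  *d_temp = nullptr;
  size_t temp_bytes = 0;
  cub::DeviceRadixSort::SortPairs(d_temp, temp_bytes, k_in, k_out, v_in, v_out, n);
  cudaMalloc(&d_temp, temp_bytes);
  cub::DeviceRadixSort::SortPairs(d_temp, temp_bytes, k_in, k_out, v_in, v_out, n);
  cudaFree(d_temp);

  // Rearrange: result[i] = data[vals_out[i]]
  thrust::device_vector<float4> result(n);
  thrust::copy(thrust::make_permutation_iterator(data.begin(), vals_out.begin()),
               thrust::make_permutation_iterator(data.begin(), vals_out.end()),
               result.begin());

  std::vector<float4> out(n);
  thrust::copy(result.begin(), result.end(), out.begin());
  return out;
}
```

The rearrangement step is the same permutation_iterator copy used in t686.cu; only the key-value sort itself is swapped for the CUB call.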
Thrust (CUDA Library) Compile error like "'vectorize_from_shared_kernel__entry' : is not a member of 'thrust::detai



By : user3852428
Date : October 15 2020, 01:18 AM
Is this in a .cu file (compiled with nvcc)? Thrust code should be put in .cu files.
Incidentally, I personally avoid the CUDA VS Wizard (partly because it is not provided or supported by NVIDIA) and use the strategy described in this other post.