As far as I know, you can share the host's RAM with the GPU as long as it is page-locked (pinned) memory. In that case you don't need to issue explicit transfers: the device reads and writes the host buffer directly, and you just need to make sure that you synchronize your work (with cudaDeviceSynchronize, for instance, if using CUDA) before touching the results on the host.
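Not every device supports mapping host memory, so it is worth checking first. A minimal sketch of that check (host-side only, using the standard CUDA runtime API; the device index 0 is just an assumption):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);   // query device 0

    if (!prop.canMapHostMemory) {
        std::printf("Device cannot map host memory\n");
        return 1;
    }
    // On older setups this flag must be set before any CUDA context work
    // if you intend to allocate with cudaHostAllocMapped.
    cudaSetDeviceFlags(cudaDeviceMapHost);
    std::printf("Mapped pinned memory supported on %s\n", prop.name);
    return 0;
}
```

On systems with unified addressing this flag is largely a formality, but the capability check costs nothing.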
Now, for this question:
I am just wondering, if the total memory (all the variables and all the arrays) in my kernel hits 6 GB of the GPU RAM, can I somehow use the CPU’s one?
I don't know of a way to "extend" the GPU memory. I don't think the GPU can use a pinned buffer that is bigger than its own memory, but I am not certain. What I think you could do in this case is work in batches: if your workload can be split so that you only touch 6 GB at a time, you can process one batch, save the result, and move on to the next.
For example, you could implement a simple batching scheme with mapped pinned memory like this:

    #include <cuda_runtime.h>

    __global__ void kernel(float *data);    // your kernel
    void populate_data(float *dst);         // read from another array in RAM
    void save_data(const float *src);       // write to another array in RAM

    int main() {
        float *hst_ptr = nullptr;
        float *dev_ptr = nullptr;
        size_t ns = 128;                    // 128 elements in this example
        size_t data_size = ns * sizeof(*hst_ptr);

        // may be required before allocating mapped pinned memory
        cudaSetDeviceFlags(cudaDeviceMapHost);
        cudaHostAlloc((void**)&hst_ptr, data_size, cudaHostAllocMapped);
        cudaHostGetDevicePointer(&dev_ptr, hst_ptr, 0);

        // say that we want to work on 4 batches of 128 elements
        for (size_t cnt = 0; cnt < 4; ++cnt) {
            populate_data(hst_ptr);         // fill the next batch
            kernel<<<1, ns>>>(dev_ptr);
            cudaDeviceSynchronize();        // wait before reading results back
            save_data(hst_ptr);             // store this batch's results
        }

        cudaFreeHost(hst_ptr);
        return 0;
    }
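If the total data really is larger than device memory, the same batching idea also works with plain device allocations and explicit copies: allocate one chunk-sized device buffer and reuse it for every batch. A sketch, assuming a hypothetical `process` kernel that just doubles each element:

```cuda
#include <algorithm>
#include <vector>
#include <cuda_runtime.h>

__global__ void process(float *data, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;             // placeholder work
}

int main() {
    const size_t total = 1 << 24;           // all data lives in host RAM
    const size_t chunk = 1 << 20;           // elements processed per batch
    std::vector<float> host(total, 1.0f);

    float *dev = nullptr;
    cudaMalloc(&dev, chunk * sizeof(float)); // device holds one chunk only

    for (size_t off = 0; off < total; off += chunk) {
        size_t n = std::min(chunk, total - off);
        cudaMemcpy(dev, host.data() + off, n * sizeof(float),
                   cudaMemcpyHostToDevice);
        process<<<(n + 255) / 256, 256>>>(dev, n);
        cudaMemcpy(host.data() + off, dev, n * sizeof(float),
                   cudaMemcpyDeviceToHost);
    }
    cudaFree(dev);
    return 0;
}
```

The trade-off versus the mapped-memory version is that you pay for explicit copies, but each kernel then works out of fast device memory instead of reaching across the PCIe bus on every access.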