DMA Cuda Implementation API

group aml_dma_cuda

dma between devices and host.

Cuda dma is an implementation of aml dma that transfers data between devices, and between host and devices.

#include <aml/dma/cuda.h>


Parameters:

dma – A pointer to be set to a newly allocated dma.
kind – The kind of transfer performed: host to device, device to host, device to device, or host to host.

Returns:

-AML_EINVAL if dma can’t be set.
-AML_FAILURE if any cuda backend call failed.
-AML_ENOMEM if allocation failed.
AML_SUCCESS on success.

int aml_dma_cuda_destroy(struct aml_dma **dma)

Destroy a created dma and set it to NULL.
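The creation and destruction contract documented above (an output pointer-to-pointer, negative error codes, and a destroy call that resets the caller's pointer to NULL) can be modeled in plain host C. This is a minimal sketch, not AML's implementation: `struct toy_dma`, `toy_dma_create()`, `toy_dma_destroy()`, the `TOY_*` codes and their numeric values are all hypothetical, the NULL handling in destroy is an assumption, and the CUDA stream setup that could produce a backend failure is only indicated by a comment.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical error codes; the values are illustrative, not AML's. */
enum { TOY_SUCCESS = 0, TOY_FAILURE = 1, TOY_ENOMEM = 2, TOY_EINVAL = 3 };

/* Hypothetical transfer kinds, mirroring the four host/device
 * combinations accepted by the kind parameter. */
enum toy_copy_kind {
	TOY_HOST_TO_HOST,
	TOY_HOST_TO_DEVICE,
	TOY_DEVICE_TO_HOST,
	TOY_DEVICE_TO_DEVICE
};

struct toy_dma {
	enum toy_copy_kind kind;
	/* A real CUDA dma would also own a stream here. */
};

int toy_dma_create(struct toy_dma **dma, enum toy_copy_kind kind)
{
	struct toy_dma *d;

	if (dma == NULL || (int)kind < 0 || (int)kind > TOY_DEVICE_TO_DEVICE)
		return -TOY_EINVAL; /* the output pointer cannot be set */
	d = malloc(sizeof(*d));
	if (d == NULL)
		return -TOY_ENOMEM; /* allocation failed */
	d->kind = kind;
	/* A real implementation would create its CUDA stream here and,
	 * if that call failed, free d and return -TOY_FAILURE. */
	*dma = d;
	return TOY_SUCCESS;
}

int toy_dma_destroy(struct toy_dma **dma)
{
	if (dma == NULL)
		return -TOY_EINVAL; /* assumed behavior for a NULL argument */
	free(*dma);  /* free(NULL) is a harmless no-op */
	*dma = NULL; /* leave no dangling pointer in the caller */
	return TOY_SUCCESS;
}
```

Passing `struct toy_dma **` rather than returning a pointer lets the function report a specific error code and still write the result into the caller's variable.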

int aml_dma_cuda_copy_1D(struct aml_layout *dst, const struct aml_layout *src, void *arg)

Cuda DMA operator implementation: Use only with aml_dma_cuda_request_create() or the higher-level aml_dma_async_copy_custom(). This copy operator is compatible only with:

This dma cuda implementation,
Dense source and destination layouts of one dimension.

Makes a flat copy of contiguous bytes between the raw pointers of the two layouts. The size of the byte stream is computed as the product of the dimensions and the element size.

See also

aml_layout_dense

Parameters:

dst – [in] The destination layout of the copy.
src – [in] The source layout of the copy.
arg – [in] A pair of device ids obtained with AML_DMA_CUDA_DEVICE_PAIR. op_arg is used only if the dma used with this operator is of the cudaMemcpyDeviceToDevice kind.

Returns:

an AML error code.
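The size rule of aml_dma_cuda_copy_1D() (product of the dimensions times the element size, followed by a flat copy between raw pointers) can be sketched on the host as follows. `struct toy_dense_layout`, `toy_layout_bytes()` and `toy_copy_1d()` are hypothetical stand-ins rather than AML's layout API, memcpy() replaces the asynchronous CUDA copy, and the size-mismatch check is an added assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a dense layout: raw pointer, element size
 * and dimensions. This is NOT struct aml_layout. */
struct toy_dense_layout {
	void *ptr;
	size_t element_size;
	size_t ndims;
	const size_t *dims;
};

/* Bytes covered by a dense layout: product of dimensions times
 * element size. */
size_t toy_layout_bytes(const struct toy_dense_layout *l)
{
	size_t n = l->element_size;

	for (size_t i = 0; i < l->ndims; i++)
		n *= l->dims[i];
	return n;
}

/* Flat copy between the raw pointers of two one-dimensional layouts.
 * memcpy() stands in for the CUDA operator's asynchronous copy. */
int toy_copy_1d(struct toy_dense_layout *dst,
		const struct toy_dense_layout *src)
{
	size_t n = toy_layout_bytes(src);

	/* Assumed checks: both layouts 1-D and of equal byte size. */
	if (dst->ndims != 1 || src->ndims != 1 || toy_layout_bytes(dst) != n)
		return -1; /* hypothetical error code */
	memcpy(dst->ptr, src->ptr, n);
	return 0;
}
```

For four `double` elements, the byte stream is 4 × sizeof(double) bytes, copied in one flat transfer.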

int aml_dma_cuda_memcpy_op(struct aml_layout *dst, const struct aml_layout *src, void *arg)

Cuda DMA operator implementation: Use only with aml_dma_cuda_request_create() or the higher-level aml_dma_async_copy_custom(). This copy operator is compatible only with:

This dma cuda implementation (device to device is not supported),
Flat source and destination pointers.

Makes a flat asynchronous copy of contiguous bytes between two raw pointers. This dma operator casts the input layout pointers to void* and assumes they point to contiguous sets of bytes, which are copied from src to dst in the manner of memcpy() using cudaMemcpyAsync().

Parameters:

dst – [out] The destination (void*) of the copy, cast to a struct aml_layout *.
src – [in] The source (void*) of the copy, cast to a struct aml_layout *.
arg – [in] The size (size_t) of the copy, cast to a void*.

Returns:

AML_SUCCESS
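The calling convention of aml_dma_cuda_memcpy_op() passes raw pointers disguised as layout pointers and a byte count disguised as a pointer. The sketch below shows both sides of that convention with a host-only stand-in: `toy_memcpy_op()` and `toy_flat_copy()` are hypothetical, and memcpy() replaces cudaMemcpyAsync(), so this copy is synchronous.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Only pointers to this type are needed, so an incomplete declaration
 * suffices: the operator never treats it as a real layout. */
struct aml_layout;

/* Host-only stand-in with the same signature as the documented
 * operator; memcpy() replaces cudaMemcpyAsync(). */
int toy_memcpy_op(struct aml_layout *dst, const struct aml_layout *src,
		  void *arg)
{
	size_t size = (size_t)(uintptr_t)arg; /* size carried in a void* */

	memcpy((void *)dst, (const void *)src, size);
	return 0; /* stands in for AML_SUCCESS */
}

/* Caller side: disguise the raw pointers and the size as the operator
 * expects them. */
int toy_flat_copy(void *dst, const void *src, size_t size)
{
	return toy_memcpy_op((struct aml_layout *)dst,
			     (const struct aml_layout *)src,
			     (void *)(uintptr_t)size);
}
```

Going through `uintptr_t` keeps the size-to-pointer round trip well defined for any size that fits in a pointer.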

Variables

struct aml_dma aml_dma_cuda

Dma on stream 0 with the cudaMemcpyDefault copy kind. Requires that the system supports unified virtual memory.

struct aml_dma_cuda_request

#include <cuda.h>

Cuda DMA request. Only a status flag is needed.

struct aml_dma_cuda_data

#include <cuda.h>

aml_dma data structure. AML dma cuda contains a single execution stream. When waiting on a request, the whole request stream is synchronized, so all pending requests are waited on.
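Because the dma owns a single stream, waiting on one request completes every request pending on that stream at the time of the wait, including requests submitted after the one being waited on. The toy model below illustrates that consequence only; `struct toy_stream`, `toy_submit()` and `toy_wait()` are hypothetical and perform no CUDA work.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PENDING 16

/* Hypothetical model of a dma that owns a single in-order stream. */
struct toy_stream {
	int pending[TOY_MAX_PENDING]; /* request ids, in submission order */
	size_t npending;
	size_t ncompleted;            /* total requests completed so far */
};

/* Enqueue a request; returns its id, or -1 if the model is full. */
int toy_submit(struct toy_stream *s, int id)
{
	if (s->npending == TOY_MAX_PENDING)
		return -1;
	s->pending[s->npending++] = id;
	return id;
}

/* Waiting on any request synchronizes the whole stream, similar in
 * spirit to cudaStreamSynchronize(): every pending request completes,
 * not only the one passed in. */
void toy_wait(struct toy_stream *s, int id)
{
	(void)id; /* irrelevant here: the whole stream is drained */
	s->ncompleted += s->npending;
	s->npending = 0;
}
```

The trade-off of a single stream is simplicity at the cost of granularity: a wait cannot complete one request while leaving later ones in flight.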

struct aml_dma_cuda_op_arg

#include <cuda.h>

Structure passed as the arg argument of an aml_dma_operator by the request created in aml_dma_cuda_request_create(). All aml_dma_operator implementations can expect to receive a pointer to this structure as their arg argument. The pointer is valid only for the duration of the aml_dma_operator call.