DMA Cuda Implementation API

group aml_dma_cuda

Dma between devices and host.

Cuda dma is an implementation of aml dma to transfer data between devices and between host and devices.

#include <aml/dma/cuda.h>

See also

aml_dma

Defines

AML_DMA_CUDA_REQUEST_STATUS_NONE
AML_DMA_CUDA_REQUEST_STATUS_PENDING
AML_DMA_CUDA_REQUEST_STATUS_DONE
AML_DMA_CUDA_DEVICE_PAIR(src, dst)

Embed a pair of device ids in a void* to use as the dma copy operator argument when copying from device to device.

AML_DMA_CUDA_DEVICE_FROM_PAIR(pair, src, dst)

Translate a pair of device ids stored in pair (void*) back into two device id integers.

Functions

int aml_dma_cuda_request_create(struct aml_dma_data *data, struct aml_dma_request **req, struct aml_layout *dest, struct aml_layout *src, aml_dma_operator op, void *op_arg)

AML dma cuda request creation operator.

Returns:
  • -AML_EINVAL if data, req, *req, dest or src is NULL.

  • -AML_ENOMEM if allocation failed.

  • AML_SUCCESS on success.

int aml_dma_cuda_request_wait(struct aml_dma_data *dma, struct aml_dma_request **req)

AML dma cuda request wait operator.

Returns:
  • -AML_EINVAL if dma, req or *req is NULL, or if data does not come from the dma used in request creation.

  • AML_SUCCESS on success.

int aml_dma_cuda_barrier(struct aml_dma_data *data)

AML dma cuda barrier operator.

Returns:

AML_SUCCESS on success.

int aml_dma_cuda_request_destroy(struct aml_dma_data *dma, struct aml_dma_request **req)

AML dma cuda request deletion operator.

int aml_dma_cuda_create(struct aml_dma **dma, const enum cudaMemcpyKind kind)

Creation of a dma engine for cuda backend.

See also

struct aml_dma_cuda_data.

Parameters:
  • dma – A pointer to set to a newly allocated dma.

  • kind – The kind of transfer performed: host to device, device to host, device to device, or host to host.

Returns:
  • -AML_EINVAL if dma can’t be set.

  • -AML_FAILURE if any cuda backend call failed.

  • -AML_ENOMEM if allocation failed.

  • AML_SUCCESS on success.

int aml_dma_cuda_destroy(struct aml_dma **dma)

Destroy a created dma and set it to NULL.

int aml_dma_cuda_copy_1D(struct aml_layout *dst, const struct aml_layout *src, void *arg)

Cuda DMA operator implementation: Use only with aml_dma_cuda_request_create() or higher level aml_dma_async_copy_custom(). This copy operator is compatible only with:

  • This dma cuda implementation,

  • Dense source and destination layouts of one dimension. Makes a flat copy of contiguous bytes between the raw pointers of the two layouts. The size of the byte stream is computed as the product of the dimensions and the element size.

    See also

    aml_layout_dense
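The size rule stated above (product of dimensions and element size) can be sketched as follows. This is a minimal illustration, not AML code: dense_layout_bytes() is a hypothetical helper, and it takes the dimensions as a plain array rather than reading them from a struct aml_layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of AML: number of bytes covered by a
 * dense layout, computed as
 * element_size * dims[0] * ... * dims[ndims - 1]. */
size_t dense_layout_bytes(const size_t *dims, size_t ndims,
                          size_t element_size)
{
    assert(dims != NULL || ndims == 0);
    size_t bytes = element_size;
    for (size_t i = 0; i < ndims; i++)
        bytes *= dims[i];
    return bytes;
}
```

For the one-dimensional layouts this operator accepts, the result reduces to the number of elements times the element size, for example 1024 * sizeof(double) for a vector of 1024 doubles.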

Parameters:
  • dst[in] The destination layout of the copy.

  • src[in] The source layout of the copy.

  • arg[in] A pair of device ids obtained with AML_DMA_CUDA_DEVICE_PAIR. This argument is used only if the dma used with this operator is of the cudaMemcpyDeviceToDevice kind.

Returns:

an AML error code.

int aml_dma_cuda_memcpy_op(struct aml_layout *dst, const struct aml_layout *src, void *arg)

Cuda DMA operator implementation: Use only with aml_dma_cuda_request_create() or higher level aml_dma_async_copy_custom(). This copy operator is compatible only with:

  • This dma cuda implementation (device to device is not supported),

  • Flat source and destination pointers. Makes a flat asynchronous copy of contiguous bytes between two raw pointers. This dma operator casts the input layout pointers to void* and assumes that they point to contiguous sets of bytes to copy from src to dst, in the fashion of memcpy(), using cudaMemcpyAsync().

Parameters:
  • dst[out] The destination (void*) of the copy, cast to a struct aml_layout *.

  • src[in] The source (void*) of the copy, cast to a struct aml_layout *.

  • arg[in] The size (size_t) of the copy, cast to a void*.

Returns:

AML_SUCCESS
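The calling convention described for this operator (raw pointers passed in place of layouts, and a byte count passed in place of a pointer) can be modelled on the host as below. This is a sketch under stated assumptions: flat_copy_op() and flat_copy() are hypothetical names, struct aml_layout is only forward-declared as an opaque type, and memcpy() stands in for cudaMemcpyAsync() so that the example runs without a GPU.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Opaque stand-in for the AML layout type; never dereferenced here. */
struct aml_layout;

/* Hypothetical host-only model of the operator: dst and src are really
 * raw buffers and arg is really a size_t byte count. memcpy() replaces
 * cudaMemcpyAsync() in this sketch. */
int flat_copy_op(struct aml_layout *dst, const struct aml_layout *src,
                 void *arg)
{
    size_t size = (size_t)(uintptr_t)arg;
    memcpy((void *)dst, (const void *)src, size);
    return 0;
}

/* Caller side: cast the raw pointers and the size so that they match
 * the operator signature. */
int flat_copy(void *dst, const void *src, size_t size)
{
    assert(dst != NULL && src != NULL);
    return flat_copy_op((struct aml_layout *)dst,
                        (const struct aml_layout *)src,
                        (void *)(uintptr_t)size);
}
```

Because the size travels inside the pointer value, nothing needs to be allocated for the argument, which matters when the copy completes after the caller has returned.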

Variables

struct aml_dma_ops aml_dma_cuda_ops

Default dma ops used at dma creation.

struct aml_dma aml_dma_cuda

Dma on stream 0 with cudaMemcpyDefault copy kind. Requires that the system supports unified virtual addressing.

struct aml_dma_cuda_request
#include <cuda.h>

Cuda DMA request. Only a status flag is needed.

struct aml_dma_cuda_data
#include <cuda.h>

aml_dma data structure. AML dma cuda contains a single execution stream. When waiting on a request, the whole stream is synchronized, so all pending requests are waited on.
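The effect of having a single stream can be illustrated with a small host-only model. This is not the AML implementation: the stream_model structure, the req_status values and the model_submit() and model_wait() functions are hypothetical names introduced for this sketch, which only mimics the behaviour described above (waiting on one request completes every pending request).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status values for this model only. */
enum req_status { REQ_NONE, REQ_PENDING, REQ_DONE };

#define MAX_REQS 8

/* Hypothetical model of a dma with one stream holding its requests. */
struct stream_model {
    enum req_status reqs[MAX_REQS];
    size_t count;
};

/* Enqueue a request; returns its index, or -1 if the model is full. */
int model_submit(struct stream_model *s)
{
    assert(s != NULL);
    if (s->count == MAX_REQS)
        return -1;
    s->reqs[s->count] = REQ_PENDING;
    return (int)s->count++;
}

/* Wait on one request: the whole stream is synchronized, so every
 * pending request becomes done, not only the one passed in. */
void model_wait(struct stream_model *s, int req)
{
    assert(s != NULL && req >= 0 && (size_t)req < s->count);
    for (size_t i = 0; i < s->count; i++)
        if (s->reqs[i] == REQ_PENDING)
            s->reqs[i] = REQ_DONE;
}
```

In this model, submitting two requests and waiting on the second leaves both marked as done, which is the practical consequence of synchronizing the whole stream: a wait on any request can block until earlier, unrelated copies have finished.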

struct aml_dma_cuda_op_arg
#include <cuda.h>

Structure passed as the arg argument of an aml_dma_operator by the request created in aml_dma_cuda_request_create(). All aml_dma_operator implementations can expect to obtain a pointer to this structure as their arg argument. The pointer is valid only for the lifetime of the aml_dma_operator call.