Memory

The memory part of IR APIs.

tvm.aipu.script.ir.memory.ptr(dtype, scope='private')

Annotates a function parameter as a pointer.

Parameters

dtype: Union[str, DataType]

The data type of the data that the pointer points to.

Scalar dtype: int8, uint8, int16, uint16, int32, uint32, float16, float32, void
Vector dtype: int8x32, uint8x32, int16x16, uint16x16, int32x8, uint32x8, float16x16, float32x8

scope: Optional[str]

The memory space of the data that the pointer points to. The valid choices are listed below.

global: Represents the global DDR space of Address Space Extension region ID (ASID) 0.
global.1: Represents the global DDR space of ASID 1.
global.2: Represents the global DDR space of ASID 2.
global.3: Represents the global DDR space of ASID 3.
private: Represents the stack space of each TEC.
lsram: Represents the local SRAM space of each TEC.
shared: Represents the SRAM space shared between all TECs in the same core.
constant: Represents the global constant DDR space.

Returns

ret: Pointer: The pointer instance.

Examples

@S.prim_func
def func(a: S.ptr("i8", "global"), b: S.ptr("i8", "global"), n: S.i32):
    for i in range(n):
        b[i] = a[i]

tvm.aipu.script.ir.memory.match_buffer(pointer, shape)

Parameters

pointer: Pointer: The data pointer to be matched.
shape: Union[List[PrimExpr], Tuple[PrimExpr], PrimExpr, Integral]: The data shape to match.

Returns

ret: Buffer: The matched buffer.

Examples

# 2D transpose demo b[j, i] = a[i, j]
@S.prim_func
def func(A: S.ptr("int8", "global"), B: S.ptr("int8", "global"), h: S.int32, w: S.int32):
    a = S.match_buffer(A, shape=(h, w))
    b = S.match_buffer(B, shape=(w, h))

    for i,j in S.grid(h, w):
        b[j, i] = a[i, j]
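
To make the index arithmetic behind the matched buffers concrete, here is a minimal host-side sketch in plain Python. It assumes that a matched buffer addresses the underlying pointer in row-major order, which the text above does not state explicitly. The helper `flat_offset` and the lists `a_flat` and `b_flat` are hypothetical stand-ins and are not part of the DSL.

```python
def flat_offset(indices, shape):
    """Return the row-major flat offset of a multi-dimensional index."""
    offset = 0
    for idx, dim in zip(indices, shape):
        if not 0 <= idx < dim:
            raise IndexError(f"index {idx} is out of range for dimension {dim}")
        offset = offset * dim + idx
    return offset


h, w = 2, 3
a_flat = [10, 11, 12, 13, 14, 15]  # stands in for the data behind pointer A
b_flat = [0] * (h * w)             # stands in for the data behind pointer B

# Same loop nest as the transpose demo: b[j, i] = a[i, j].
for i in range(h):
    for j in range(w):
        b_flat[flat_offset((j, i), (w, h))] = a_flat[flat_offset((i, j), (h, w))]

print(b_flat)  # [10, 13, 11, 14, 12, 15]
```

Under this assumption, a[1, 2] and b[2, 1] both sit at flat offset 5 of their respective pointers.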

tvm.aipu.script.ir.memory.alloc(shape, dtype, scope='private')

Allocates a buffer with the given shape, dtype, and scope, and returns a pointer to it.

Parameters

shape: Union[List[int], Tuple[int], int]

The shape of the buffer.

dtype: Union[str, DataType]

The data type of the buffer elements. It can be a scalar dtype or a vector dtype.

scope: Optional[str]

The memory space in which the data is allocated. The valid choices are listed below.

private: Represents the stack space of each TEC.
lsram: Represents the local SRAM space of each TEC.
shared: Represents the SRAM space shared between all TECs in the same core.

Returns

ret: Pointer: The allocated buffer pointer.

Examples

lsram_a = S.alloc(1024, "int8", scope="lsram")
S.dma_copy(lsram_a, ddr_a, 1024)

shared_a = S.alloc(256, "float16x16", scope="shared")
lsram_a = S.alloc((256,), "int8", scope="lsram")
lsram_a = S.alloc([1024], "int8", scope="lsram")

tvm.aipu.script.ir.memory.alloc_buffer(shape, dtype, scope='private')

Parameters

shape: Union[List[int], Tuple[int], int]

The shape of the buffer.

dtype: Union[str, DataType]

The data type of the buffer elements. It can be a scalar dtype or a vector dtype.

scope: Optional[str]

The memory space in which the data is allocated. The valid choices are listed below.

private: Represents the stack space of each TEC.
lsram: Represents the local SRAM space of each TEC.
shared: Represents the SRAM space shared between all TECs in the same core.

Returns

ret: Buffer: The allocated buffer.

Examples

lsram_a = S.alloc_buffer([32, 32], dtype, scope="lsram")
S.dma_copy(lsram_a, ddr_a, 32 * 32)
b = lsram_a[10, 5] + 2

tvm.aipu.script.ir.memory.alloc_const(shape, dtype, data)

Parameters

shape: Union[List[PrimExpr], Tuple[PrimExpr]]: The shape of the const buffer.
dtype: Union[str, DataType]: The data type of the const buffer elements.
data: np.array: The data of the const buffer.

Returns

ret: Buffer: The allocated buffer.

Examples

import numpy as np

# "lut_data" can be created in a pure Python environment at compile time.
lut_data = np.array(list(range(512)), dtype="float16")

@S.prim_func
def func(inp: S.ptr("fp16", "global"), out: S.ptr("fp16", "global")):
    lut = S.alloc_const((512,), "float16", lut_data)
    ...

tvm.aipu.script.ir.memory.vload(ptr, mask=None, lanes=None, stride=1)

Loads a vector from contiguous or strided memory addresses.

Inactive elements of the result vector are set to zero.
The feature Flexible Width Vector is supported.
The feature Multiple Width Vector is supported.

Parameters

ptr: Pointer: The pointer that stores the base memory address.
mask: Optional[Union[Tuple[bool], List[bool], numpy.ndarray[bool], str, PrimExpr]]: The predication mask indicating which elements of the vector are active for the operation. None means all elements are active.
lanes: Optional[int]: The number of lanes of the result vector dtype. If omitted, it is determined automatically from the type of the input address.
stride: Optional[Union[PrimExpr, int]]: The stride between elements. One element is loaded every stride elements.

Returns

ret: PrimExpr: The result expression.

Supported DType

“int8/16/32”, “uint8/16/32”, “float16/32”.

Examples

va = S.vload(ptr_a)
va = S.vload(ptr_a, mask="3T5F")
va = S.vload(ptr_a, mask=S.tail_mask(n, 8))
va = S.vload(ptr_a, mask="1T31F", lanes=32)
va = S.vload(ptr_a, mask="16T16F", lanes=32, stride=4)
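
The masked and strided behaviour described above can be emulated on the host with plain Python. This is only a reference sketch under stated assumptions: mask strings such as "3T5F" are taken to be run-length encoded, with an omitted count meaning 1 (as "T7F" suggests), the stride is counted in elements, and the default of 8 lanes is chosen only for this sketch. `parse_mask` and `vload_model` are hypothetical names, not DSL APIs.

```python
import re


def parse_mask(mask, lanes):
    """Expand a run-length mask string such as "3T5F" into a list of bools."""
    if not re.fullmatch(r"(\d*[TF])+", mask):
        raise ValueError(f"malformed mask string: {mask!r}")
    bits = []
    for count, flag in re.findall(r"(\d*)([TF])", mask):
        bits.extend([flag == "T"] * (int(count) if count else 1))
    if len(bits) != lanes:
        raise ValueError(f"mask covers {len(bits)} lanes, expected {lanes}")
    return bits


def vload_model(memory, base, mask=None, lanes=8, stride=1):
    """Emulate a masked, strided vector load from a flat list."""
    active = [True] * lanes if mask is None else parse_mask(mask, lanes)
    # Lane i reads element base + i * stride; inactive lanes become zero
    # and do not touch memory at all.
    return [memory[base + i * stride] if active[i] else 0 for i in range(lanes)]


memory = list(range(100, 164))
print(vload_model(memory, 0, mask="3T5F"))                     # [100, 101, 102, 0, 0, 0, 0, 0]
print(vload_model(memory, 0, mask="2T2F", lanes=4, stride=4))  # [100, 104, 0, 0]
```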

tvm.aipu.script.ir.memory.vstore(value, ptr, mask=None, stride=1)

Parameters

value: PrimExpr: The vector that needs to be stored.
ptr: Pointer: The pointer that stores the base memory address.
mask: Optional[Union[Tuple[bool], List[bool], numpy.ndarray[bool], str, PrimExpr]]: The predication mask indicating which elements of the vector are active for the operation. None means all elements are active.
stride: Optional[Union[PrimExpr, int]]: The stride between elements. One element is stored every stride elements.

Supported DType

“int8/16/32”, “uint8/16/32”, “float16/32”.

Examples

S.vstore(va, ptr_a)
S.vstore(va, ptr_a, mask="3T5F")
S.vstore(va, ptr_a, mask=S.tail_mask(n, 8))
S.vstore(va, ptr_a, mask="T7F", stride=4)
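
As a rough host-side illustration of a masked, strided store, the following plain-Python sketch writes lane i to element base + i * stride. It assumes that inactive lanes leave the destination untouched, which the text above implies but does not state, and it takes the mask as a list of bools (one of the accepted mask forms). `vstore_model` is a hypothetical name, not a DSL API.

```python
def vstore_model(memory, value, base, mask=None, stride=1):
    """Emulate a masked, strided vector store into a flat list."""
    active = [True] * len(value) if mask is None else list(mask)
    if len(active) != len(value):
        raise ValueError("mask length must match the number of lanes")
    for lane, (elem, on) in enumerate(zip(value, active)):
        if on:
            # Assumption: only active lanes are written.
            memory[base + lane * stride] = elem


memory = [0] * 16
vstore_model(memory, [1, 2, 3, 4], base=0,
             mask=[True, True, True, False], stride=4)
print(memory)  # [1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0]
```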

tvm.aipu.script.ir.memory.vload_gather(ptr, indices, mask=None)

Parameters

ptr: Pointer: The pointer that stores the base memory address.
indices: PrimExpr: The indices used to calculate the discrete memory addresses. It must be a 16-bit integer vector, and its length determines the length of the result vector.
mask: Optional[Union[Tuple[bool], List[bool], numpy.ndarray[bool], str, PrimExpr]]: The predication mask indicating which elements of the vector are active for the operation. None means all elements are active.

Returns

ret: PrimExpr: The result expression.

Supported DType

“int8/16/32”, “uint8/16/32”, “float16/32”.

Examples

va = S.vload_gather(ptr_a, vb)
va = S.vload_gather(ptr_a, vb, mask="T7F")
va = S.vload_gather(ptr_a, vb, mask=S.tail_mask(n, 8))
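
A gather can be pictured as one independent scalar load per lane. The plain-Python sketch below assumes that each index is an element offset relative to the base address and that inactive lanes yield zero, mirroring what is documented for vload; neither point is confirmed by the text above. `vload_gather_model` is a hypothetical name, not a DSL API.

```python
def vload_gather_model(memory, base, indices, mask=None):
    """Emulate a masked gather load from a flat list."""
    active = [True] * len(indices) if mask is None else list(mask)
    if len(active) != len(indices):
        raise ValueError("mask length must match the number of indices")
    # The result has one lane per index; inactive lanes are assumed to be zero.
    return [memory[base + idx] if on else 0 for idx, on in zip(indices, active)]


memory = [10, 20, 30, 40, 50, 60]
print(vload_gather_model(memory, 1, [4, 0, 2, 2]))  # [60, 20, 40, 40]
print(vload_gather_model(memory, 1, [4, 0, 2, 2],
                         mask=[True, False, True, False]))  # [60, 0, 40, 0]
```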

tvm.aipu.script.ir.memory.vstore_scatter(value, ptr, indices, mask=None)

Parameters

value: PrimExpr: The vector that needs to be stored.
ptr: Pointer: The pointer that stores the base memory address.
indices: PrimExpr: The indices used to calculate the discrete memory addresses. It must be a 16-bit integer vector, and its length must equal that of value.
mask: Optional[Union[Tuple[bool], List[bool], numpy.ndarray[bool], str, PrimExpr]]: The predication mask indicating which elements of the vector are active for the operation. None means all elements are active.

Supported DType

“int8/16/32”, “uint8/16/32”, “float16/32”.

Examples

S.vstore_scatter(va, ptr_a, vb)
S.vstore_scatter(va, ptr_a, vb, mask="3T5F")
S.vstore_scatter(va, ptr_a, vb, mask=S.tail_mask(n, 8))

tvm.aipu.script.ir.memory.dma_copy(dst, src, width, src_stride=None, times=1, dst_stride=None)

Parameters

dst: Pointer: The pointer that stores the destination memory address.
src: Pointer: The pointer that stores the source memory address.
width: Union[PrimExpr, int]: The number of elements to transfer in each stride jump.
src_stride: Optional[Union[PrimExpr, int]]: The number of source elements to advance for each stride jump. None means it equals width, i.e., the source memory is read contiguously.
times: Optional[Union[PrimExpr, int]]: The total number of stride jumps.
dst_stride: Optional[Union[PrimExpr, int]]: The number of destination elements to advance for each stride jump. None means it equals width, i.e., the destination memory is written contiguously.

Notes

The pointer types of src and dst must be the same.
All scope combinations of src and dst are supported except the following:
- The scope of src is lsram and the scope of dst is lsram or shared.
- The scope of src is shared and the scope of dst is lsram or shared.
Examples

# The 1D scenario.
S.dma_copy(ptr_a, ptr_b, 16)

# The 2D scenario. Transfer all "@" in "@@@@@xxx@@@@@xxx@@@@@xxx@@@@@xxx" and store them
# continuously in destination.
S.dma_copy(ptr_a, ptr_b, width=5, src_stride=8, times=4)
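
The 2D scenario above can be checked against a small host-side model in plain Python. This is a sketch, not the hardware behaviour: it assumes strides are counted in elements and that times defaults to 1, which is what the 1D call suggests. `dma_copy_model` is a hypothetical name, not a DSL API.

```python
def dma_copy_model(dst, src, width, src_stride=None, times=1, dst_stride=None):
    """Emulate a 1D/2D strided copy between two flat lists."""
    # As documented, a stride of None means "equal to width" (contiguous).
    src_stride = width if src_stride is None else src_stride
    dst_stride = width if dst_stride is None else dst_stride
    for t in range(times):
        for k in range(width):
            dst[t * dst_stride + k] = src[t * src_stride + k]


# Source laid out like "@@@@@xxx" repeated 4 times: 5 wanted, 3 skipped.
src = list(range(32))
dst = [-1] * 20
dma_copy_model(dst, src, width=5, src_stride=8, times=4)
print(dst)
# [0, 1, 2, 3, 4, 8, 9, 10, 11, 12, 16, 17, 18, 19, 20, 24, 25, 26, 27, 28]
```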

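tvm.aipu.script.ir.memory.async_dma_copy(dst, src, width, src_stride=None, times=1, dst_stride=None, event=None)

Parameters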

dst: Pointer: The pointer that stores the destination memory address.
src: Pointer: The pointer that stores the source memory address.
width: Union[PrimExpr, int]: The number of elements to transfer in each stride jump.
src_stride: Optional[Union[PrimExpr, int]]: The number of source elements to advance for each stride jump. None means it equals width, i.e., the source memory is read contiguously.
times: Optional[Union[PrimExpr, int]]: The total number of stride jumps.
dst_stride: Optional[Union[PrimExpr, int]]: The number of destination elements to advance for each stride jump. None means it equals width, i.e., the destination memory is written contiguously.
event: PrimExpr: The event to be triggered when the entire data transmission is completed. Note that if the event is in use elsewhere, the DMA hardware is blocked until the event is triggered there, and only then does the data transmission start. Use S.wait_events to wait for the DMA operation to finish.

Notes

The pointer types of src and dst must be the same.
All scope combinations of src and dst are supported except the following:
- The scope of src is lsram and the scope of dst is lsram or shared.
- The scope of src is shared and the scope of dst is lsram or shared.

Examples

ev0 = S.alloc_events(1)

# The 1D scenario.
S.async_dma_copy(ptr_a, ptr_b, 16, event=ev0)
vc = va + vb
S.wait_events(ev0)

# The 2D scenario. Transfer all "@" in "@@@@@xxx@@@@@xxx@@@@@xxx@@@@@xxx" and store them
# continuously in destination.
S.async_dma_copy(ptr_a, ptr_b, width=5, src_stride=8, times=4, event=ev0)
vc = va + vb
S.wait_events(ev0)

tvm.aipu.script.ir.memory.dma_transpose2d(dst, src, row, col, dst_stride=None, src_stride=None)

Parameters

dst: Pointer: The pointer that stores the destination memory address.
src: Pointer: The pointer that stores the source memory address.
row: Union[PrimExpr, int]: The row count of the input 2D data [row, col] to be transposed.
col: Union[PrimExpr, int]: The column count of the input 2D data [row, col] to be transposed.
dst_stride: Optional[Union[PrimExpr, int]]: The width stride of the output 2D data. Defaults to row.
src_stride: Optional[Union[PrimExpr, int]]: The width stride of the input 2D data. Defaults to col.

Supported DType

“int8/16/32”, “uint8/16/32”, “float16/32”.

Examples

S.dma_transpose2d(dst, src, 8, 64)
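
To illustrate how the two strides interact with the transposition, here is a plain-Python reference sketch. It assumes that each width stride is the pitch, in elements, between consecutive rows of the respective 2D data; the defaults follow the parameter descriptions above. `dma_transpose2d_model` is a hypothetical name, not a DSL API.

```python
def dma_transpose2d_model(dst, src, row, col, dst_stride=None, src_stride=None):
    """Emulate a 2D transpose between two flat lists with row pitches."""
    dst_stride = row if dst_stride is None else dst_stride
    src_stride = col if src_stride is None else src_stride
    for r in range(row):
        for c in range(col):
            # Input element [r, c] becomes output element [c, r].
            dst[c * dst_stride + r] = src[r * src_stride + c]


src = [0, 1, 2, 3, 4, 5]  # a [2, 3] input
dst = [-1] * 6
dma_transpose2d_model(dst, src, row=2, col=3)
print(dst)  # [0, 3, 1, 4, 2, 5]

padded = [-1] * 12        # output rows padded to a pitch of 4 elements
dma_transpose2d_model(padded, src, row=2, col=3, dst_stride=4)
print(padded)  # [0, 3, -1, -1, 1, 4, -1, -1, 2, 5, -1, -1]
```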

tvm.aipu.script.ir.memory.dma_upsample(dst, src, h_scale, w_scale, c, w, src_c_stride=None, dst_c_stride=None, dst_w_stride=None)

Upsamples data from source src to destination dst through DMA. Two directions between src and dst are supported:
1. global -> [lsram, shared].
2. [lsram, shared] -> global.

Note:
1. Each call of dma_upsample physically operates on a 2D input in WC layout, not on a 3D input.
2. To upsample a 3D input in HWC layout by h_scale and w_scale along the H and W dimensions respectively, call dma_upsample H times, where H is the size of the H dimension of the input.

Parameters

dst: Pointer: The pointer that stores the destination memory address.
src: Pointer: The pointer that stores the source memory address.
h_scale: Union[PrimExpr, int]: The scale factor in the h direction.
w_scale: Union[PrimExpr, int]: The scale factor in the w direction.
c: Union[PrimExpr, int]: The c size of each move on the source.
w: Union[PrimExpr, int]: The w size of each move on the source.
src_c_stride: Optional[Union[PrimExpr, int]]: The c stride of each move on the source. Defaults to c.
dst_c_stride: Optional[Union[PrimExpr, int]]: The c stride of each move on the destination. Defaults to c.
dst_w_stride: Optional[Union[PrimExpr, int]]: The w stride of each move on the destination. Defaults to w_scale * w.

Supported DType

“int8/16/32”, “uint8/16/32”, “float16/32”.

Examples

# Case0: an easy use case
S.dma_upsample(dst=ddr_ptr, src=sram_ptr, h_scale=2, w_scale=3, c=3, w=2)

# Source 2D data in WC layout(w=2, c=3)
# [
#   [0 1 2],
#   [3 4 5],
# ]
# Upsampled destination 3D data in HWC layout
# [
#   [
#     [0 1 2],
#     [0 1 2],
#     [0 1 2],
#
#     [3 4 5],
#     [3 4 5],
#     [3 4 5],
#   ],
#   [
#     [0 1 2],
#     [0 1 2],
#     [0 1 2],
#
#     [3 4 5],
#     [3 4 5],
#     [3 4 5],
#   ],
# ]

# Case1: a comprehensive use case
S.dma_upsample(dst=ddr_ptr, src=sram_ptr, h_scale=2, w_scale=3,
               c=3, w=2, src_c_stride=4, dst_c_stride=5, dst_w_stride=7)

# Source 2D data in WC layout(c=3, w=2, src_c_stride=4)
# [
#   [0 1 2 ?],
#   [3 4 5 ?],
# ]
# Upsampled destination 3D data in HWC layout(dst_w_stride=7, dst_c_stride=5)
# [
#   [
#     [0 1 2 ? ?],
#     [0 1 2 ? ?],
#     [0 1 2 ? ?],
#
#     [3 4 5 ? ?],
#     [3 4 5 ? ?],
#     [3 4 5 ? ?],
#
#     [? ? ? ? ?],
#   ],
#   [
#     [0 1 2 ? ?],
#     [0 1 2 ? ?],
#     [0 1 2 ? ?],
#
#     [3 4 5 ? ?],
#     [3 4 5 ? ?],
#     [3 4 5 ? ?],
#
#     [? ? ? ? ?],
#   ],
# ]
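
The two diagrams above can be reproduced with a small host-side model in plain Python. The address arithmetic is inferred from those diagrams (nearest-neighbour replication, with dst_c_stride as the row pitch and dst_w_stride rows per H plane) and should be read as an assumption rather than a specification. `dma_upsample_model` is a hypothetical name, not a DSL API.

```python
def dma_upsample_model(dst, src, h_scale, w_scale, c, w,
                       src_c_stride=None, dst_c_stride=None, dst_w_stride=None):
    """Emulate upsampling one WC surface into an HWC block of flat lists."""
    src_c_stride = c if src_c_stride is None else src_c_stride
    dst_c_stride = c if dst_c_stride is None else dst_c_stride
    dst_w_stride = w_scale * w if dst_w_stride is None else dst_w_stride
    for hs in range(h_scale):            # the whole surface is repeated h_scale times
        for wi in range(w):
            for ws in range(w_scale):    # each source row is repeated w_scale times
                out_w = wi * w_scale + ws
                for ci in range(c):
                    dst_off = (hs * dst_w_stride + out_w) * dst_c_stride + ci
                    dst[dst_off] = src[wi * src_c_stride + ci]


# Case0: w=2, c=3, h_scale=2, w_scale=3.
dst0 = [None] * 36
dma_upsample_model(dst0, [0, 1, 2, 3, 4, 5], h_scale=2, w_scale=3, c=3, w=2)
print(dst0[:18])  # [0, 1, 2, 0, 1, 2, 0, 1, 2, 3, 4, 5, 3, 4, 5, 3, 4, 5]

# Case1: padded source and destination; "?" marks elements that are never written.
dst1 = ["?"] * 70
dma_upsample_model(dst1, [0, 1, 2, "?", 3, 4, 5, "?"], h_scale=2, w_scale=3,
                   c=3, w=2, src_c_stride=4, dst_c_stride=5, dst_w_stride=7)
print(dst1[:5], dst1[15:20], dst1[30:35])
```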

tvm.aipu.script.ir.memory.dma_memset(ptr, value, num)

Fills the first num elements starting at ptr with the specified value.

Parameters

ptr: Pointer: The pointer that stores the base memory address.
value: Union[PrimExpr, int]: The value to be set. Its dtype must be the same as the dtype of ptr.
num: Union[PrimExpr, int]: The number of scalar elements that need to be set.

Supported DType

“int8/16/32”, “uint8/16/32”, “float16/32”.

Examples

S.dma_memset(ptr_a, 0, 128)

tvm.aipu.script.ir.memory.flush_cache(invalidate=True)

Flushes the whole level 1 data cache by writing back to DDR.

Parameters

invalidate: bool: Whether to invalidate the data or not.

Examples

S.flush_cache()
S.flush_cache(invalidate=False)
