How to Use a Pointer
This section describes how to use a pointer in Compass DSL.
Here is the outline:
Define a pointer
Use S.ptr(dtype) as the function argument
Match ptr with the specific shape
Convert Scalar Ptr to Vector ptr
Use ptr with offset
Use ptr and buffer in vload, dma_copy
Define a Pointer
In Compass DSL, we define a pointer with dtype and scope.
a = S.ptr(dtype,scope="global")
dtype:
Scalar dtype:
"int8","uint8","int16","uint6","int32","uint32","float16","float32","void"Vector dtype:
"int8x32","uint8x32","int16x16","uint16x16","int32x8","uint32x8","float16x16","float32x8"
You can also see dtype information in Data Type.
Note: If you want to use the void pointer, set
dtype="void".scope:
global: Represents the global DDR space of Address Space Extension region ID (ASID) 0.
global.1: Represents the global DDR space of ASID 1.
global.2: Represents the global DDR space of ASID 2.
global.3: Represents the global DDR space of ASID 3.
private: Represents the stack space of each TEC.
lsram: Represents the local SRAM space of each TEC.
shared: Represents the shared SRAM space between all TECs in the same core.
constant: Represents the global constant DDR space.
For more information about memory hierarchy, see Zhouyi NPU Architecture.
Use Pointer in Function Argument
You can use S.ptr(dtype), with specific dtype, but without specific shape at the function argument. For 1-Dimension data, you can directly access the data element with index:
@S.prim_func
def func(a: S.ptr("int8", "global"), b: S.ptr("int8", "global"), n: S.int32):
for i in range(n):
b[i] = a[i]
The generated code:
__kernel void func(__global char* a, __global char* b, int n) {
for (int i = 0; i < n; ++i) {
b[i] = a[i];
}
}
Match Pointer with Specific Shape
For multi-dimension data, you can use S.match_buffer to match the ptr with the specific shape.
@S.prim_func
def func(A: S.ptr("int8", "global"), B: S.ptr("int8", "global"), h: S.int32, w: S.int32):
a = S.match_buffer(A, shape=(h, w))
b = S.match_buffer(B, shape=(w, h))
for i, j in S.grid(h,w):
b[j, i]=a[i, j]
In the above example, A is a tir.Pointer, and a is a tir.Buffer, which supports multi-dimension index access a[i, j].
The generated code:
__kernel void func(__global char* a, __global char* b, int h, int w) {
for (int i = 0; i < h; ++i) {
for (int j = 0; j < w; ++j) {
b[((j * h) + i)] = a[((i * w) + j)];
}
}
}
Convert Scalar Pointer to Vector Pointer
Here is the first example with scalar dtype
@S.prim_func
def func(a: S.ptr("int8", "global"), b: S.ptr("int8", "global"), n: S.int32):
for i in range(n):
b[i] = a[i]
The generated code:
__kernel void func(__global char* a, __global char* b, int n) {
for (int i = 0; i < n; ++i) {
b[i] = a[i];
}
}
Here is the example with vector dtype. Note that the tail case is with scalar dtype.
@S.prim_func
def func(a: S.ptr("int8", "global"), b: S.ptr("int8", "global"), n: S.int32):
va = a.as_ptr("int8x32")
vb = b.as_ptr("int8x32")
for i in range(n // 32):
vb[i] = va[i]
tail_offset = n // 32 * 32
for i in range(n % 32):
b[tail_offset + i] = a[tail_offset + i]
Note: If you want to use the vector dtype, you should deal with the non-divisible cases manually.
Use Pointer with Offset
A simple example shows how to create a ptr with offset.
@S.prim_func
def func(a: S.ptr("int8", "global"), b: S.ptr("int8", "global")):
a1 = a + 8 # create a new ptr with offset
for i in range(8):
b[i] = a1[i]
The generated code:
__kernel void func(__global char* a, __global char* b) {
__global char* a1 = (a + 8);
for (int i = 0; i < 8; ++i) {
b[i] = a1[i];
}
}
This function is the sample implementation of b[0:8] = a[8:16].
Use Pointer and Buffer in vload, dma_copy
For S.vload, S.vstore, S.dma_copy, they support both types tir.Pointer/tir.Buffer for addr arguments.
If offset=0, directly use it:
@S.prim_func def func(a: S.ptr(dtype, "global")): lsram = S.alloc_buffer([512], dtype, scope="lsram") va = S.vload(a) # a: ptr vx = S.vload(lsram) # lsram: buffer S.dma_copy(lsram,a,8)
If offset is not zero, use
buffer.addr_of(offset)orptr + offset:@S.prim_func def func(a: S.ptr(dtype, "global")): # buffer lsram = S.alloc_buffer([32], dtype, scope="lsram") # ptr lsram_ptr = lsram.addr_of(0) offset = 8 # lsram.addr_of(offset) S.dma_copy(lsram.addr_of(offset), b + offset, 8) va = S.vload(lsram.addr_of(offset)) # lsram_ptr + offset S.dma_copy(lsram_ptr + offset, b + offset, 8) va = S.vload(lsram_ptr + offset)