Miscellaneous

The miscellaneous part of IR APIs.

tvm.aipu.script.ir.miscellaneous.const_mask(x)

Creates a constant mask through boolean array or a special formatted string.

  • The feature Flexible Width Vector is supported.

  • The feature Multiple Width Vector is supported.

Parameters

xUnion[Tuple[bool], List[bool], numpy.ndarray[bool], str]

The concrete boolean values of the constant mask. String can be used to represent a boolean array. The mask string only can contain uppercase character T, F and decimal digit numbers, T means True, and F means False. To represent multiple repeated boolean patterns, just add numbers before the characters, e.g., 4T means TTTT, and 4TF means TFTFTFTF.

Returns

retPrimExpr

The const mask result.

Examples:

mask = S.const_mask([False] * 4 + [True] * 4)
mask = S.const_mask("FFFFTTTT")
mask = S.const_mask("4F4T")
mask = S.const_mask("FFFF4T")

mask = S.const_mask([False, True, False, True, True, False, True, False])
mask = S.const_mask("2FT2TF")

c = S.vadd(a, b, mask=mask)

See Also

tvm.aipu.script.ir.miscellaneous.tail_mask(n, lanes)

Creates a mask that the lowest n items are True, and others are False.

  • The feature Flexible Width Vector is supported.

  • The feature Multiple Width Vector is supported.

out = S.tail_mask(2, 8)
out: T  T  F  F  F  F  F  F

Parameters

nUnion[PrimExpr, int]

The lowest item count will be set to True.

lanesint

The total item count of the mask.

Returns

retPrimExpr

The masked result.

Examples:

if tail != 0:
    mask = S.tail_mask(tail, 16)

See Also

tvm.aipu.script.ir.miscellaneous.get_local_size()

Returns the number of local work-items specified in the dimension identified by dimension index. For the Zhouyi NPU, returns the TEC number.

Returns

retint

The TEC number.

Examples

tec_num = S.get_local_size()

See Also

  • Zhouyi Compass OpenCL Programming Guide: get_local_size

tvm.aipu.script.ir.miscellaneous.get_local_id()

Returns the unique local work-item ID value for the dimension identified by dimension index. For the Zhouyi NPU, returns the TEC ID from 0 to TEC_NUM-1.

Returns

retint

The TEC ID.

Examples

tid = S.get_local_id()

See Also

  • Zhouyi Compass OpenCL Programming Guide: get_local_id

tvm.aipu.script.ir.miscellaneous.tec_range(start, stop=None)

The explicit TEC parallel For statement.

Parameters

startUnion[PrimExpr, int]

The start of For_range.

stopUnion[PrimExpr, int]

The stop of For_range.

Returns

retframe.ForFrame

The thread-binding For statement.

Note

If you pass only 1 argument to S.tec_range, it will automatically set start = 0, stop = args[0].

Examples

tec_num = S.get_local_size()
for tid in S.tec_range(tec_num):
    xxx  # tid = 0,1,2,3

for tid in S.tec_range(1,3):
    xxx  # tid = 1,2

See Also

  • Zhouyi Compass OpenCL Programming Guide: get_local_id, get_local_size

tvm.aipu.script.ir.miscellaneous.perf_tick_begin(cid)

Used for the profiler, begins recording the current tick count.

Parameters

cid: int

The custom ID, unique identifier of code fragment for performance monitoring.

Examples

S.perf_tick_begin(0)
for i in range(10):
    c[i : i + 8] = a[i : i + 8] + b[i : i + 8]
S.perf_tick_end(0)

S.perf_tick_begin(1)
for i in range(10):
    xxx
    xxx
S.perf_tick_end(1)

See Also

tvm.aipu.script.ir.miscellaneous.perf_tick_end(cid)

Used for the profiler, ends recording the current tick count.

Parameters

cid: int

The custom ID, unique identifier of code fragment for performance monitoring.

Examples

S.perf_tick_begin(0)
for i in range(10):
    c[i : i + 8] = a[i : i + 8] + b[i : i + 8]
S.perf_tick_end(0)

S.perf_tick_begin(1)
for i in range(10):
    xxx
    xxx
S.perf_tick_end(1)

See Also

tvm.aipu.script.ir.miscellaneous.aiff(ctrl_desc, param_desc, act_desc, mode=0)

Do AIFF computation synchronously.

Parameters

ctrl_descPointer

The pointer that store the control descriptor address.

param_descPointer

The pointer that store the parameter descriptor address.

act_descPointer

The pointer that store the activation descriptor address.

modeOptional[int]

The op mode.

  • 0: Full AIFF descriptor.

  • 4: Broadcast mode. This value is only supported by the V3 architecture.

Examples

S.aiff(ctrl_desc, param_desc, act_desc)
tvm.aipu.script.ir.miscellaneous.async_aiff(ctrl_desc, param_desc, act_desc, event, mode=0)

Do AIFF computation asynchronously.

Parameters

ctrl_descPointer

The pointer that store the control descriptor address.

param_descPointer

The pointer that store the parameter descriptor address.

act_descPointer

The pointer that store the activation descriptor address.

eventPrimExpr

The event need to be triggered when the AIFF computation is completed. Note if the event is using by others, the AIFF hardware will be blocked until the event is triggered by others, then the AIFF computation will start. The API S.wait_events can be used to wait the AIFF computation operation to finish.

modeOptional[int]

The op mode.

  • 0: Full AIFF descriptor.

  • 4: Broadcast mode. This value is only supported by the V3 architecture.

Examples

ev = S.alloc_event(1)
S.async_aiff(ctrl_desc, param_desc, act_desc, ev)
vc = va + vb
S.wait_event(ev)
tvm.aipu.script.ir.miscellaneous.printf(fmt, *args)

The printf built-in function, same as C printf.

Parameters

fmtstr

The format string, such as x=%d.

*argsOptional[Union[Tuple[PrimExpr, int, float]]]

The items to be printed.

Note

In extremely rare cases, the results of DSL programs differ between using and not using printf, one possible reason is that using printf may make the optimization of the underlying OpenCL compiler conservative.

Examples

# Scalar
S.printf("tec_num = %d\n", S.get_local_size())
S.printf("tec_num = %d, tec_id = %d\n", tec_num, S.get_local_id())

# Integer vector
S.printf("%v32hhx, %v32hhx\n", va_i8x32, vb_u8x32)
S.printf("%v16hx, %v16hx\n", S.i16x16(i16_max), S.u16x16(u16_max))
S.printf("%v8hlx, %v8hlx\n", va_i32x8, vb_u32x8)

# Floating vector
S.printf("%v8hlf, %v8hlf\n", S.fp32x8(1.25), va_fp32x8)
S.printf("%v16hf, %v16hf\n", S.fp16x16(1.2345678), va_fp16x16)

See Also

  • Zhouyi Compass OpenCL Programming Guide: printf

tvm.aipu.script.ir.miscellaneous.asm(template, outputs=None, inputs=None, clobbers=None, qualifiers=None)

Insert assembly instructions in Compass DSL source code.

Parameters

templatestr

The literal string that consist of assembly code.

outputsOptional[Dict[str, Tuple[str, PrimExpr]]]

The output dictionary which key is the symbolic name in template and value is a two element tuple. The first element of the tuple is the constraint string, and the second one is the variable in Compass DSL code that need replace the corresponding symbolic name in template.

inputsOptional[dict]

The input dictionary which key is the symbolic name in template and value is a two element tuple. The first element of the tuple is the constraint string, and the second one is the variable in Compass DSL code that need replace the corresponding symbolic name in template.

clobbersOptional[Union[Tuple[str], List[str]]

The registers or other values that are changed by the assembler template, beyond those listed in the output dictionary.

qualifiersOptional[str]

The qualifier for Compass OpenCL compiler, valid choices: (None, "inline", "volatile").

Examples

S.asm(
    "{add t0.b, %[inp].b, 1;
}
"
    "{add %[out].b, %[inp].b, t0.b;}",
    outputs={"out": ("=&t", y)},
    inputs={"inp": ("t", x)},
    clobbers=["t0"],
    qualifiers="volatile",
)

See Also

  • Zhouyi Compass OpenCL Programming Guide: Inline assembly, __asm__