Miscellaneous
The miscellaneous part of IR APIs.
- tvm.aipu.script.ir.miscellaneous.const_mask(x)
Creates a constant mask through boolean array or a special formatted string.
The feature Flexible Width Vector is supported.
The feature Multiple Width Vector is supported.
Parameters
- xUnion[Tuple[bool], List[bool], numpy.ndarray[bool], str]
The concrete boolean values of the constant mask. String can be used to represent a boolean array. The mask string only can contain uppercase character
T
,F
and decimal digit numbers,T
means True, andF
means False. To represent multiple repeated boolean patterns, just add numbers before the characters, e.g.,4T
meansTTTT
, and4TF
meansTFTFTFTF
.
Returns
- retPrimExpr
The const mask result.
Examples:
mask = S.const_mask([False] * 4 + [True] * 4) mask = S.const_mask("FFFFTTTT") mask = S.const_mask("4F4T") mask = S.const_mask("FFFF4T") mask = S.const_mask([False, True, False, True, True, False, True, False]) mask = S.const_mask("2FT2TF") c = S.vadd(a, b, mask=mask)
See Also
- tvm.aipu.script.ir.miscellaneous.tail_mask(n, lanes)
Creates a mask that the lowest
n
items are True, and others are False.The feature Flexible Width Vector is supported.
The feature Multiple Width Vector is supported.
out = S.tail_mask(2, 8) out: T T F F F F F F
Parameters
- nUnion[PrimExpr, int]
The lowest item count will be set to True.
- lanesint
The total item count of the mask.
Returns
- retPrimExpr
The masked result.
Examples:
if tail != 0: mask = S.tail_mask(tail, 16)
See Also
- tvm.aipu.script.ir.miscellaneous.get_local_size()
Returns the number of local work-items specified in the dimension identified by dimension index. For the Zhouyi NPU, returns the TEC number.
Returns
- retint
The TEC number.
Examples
tec_num = S.get_local_size()
See Also
Zhouyi Compass OpenCL Programming Guide: get_local_size
- tvm.aipu.script.ir.miscellaneous.get_local_id()
Returns the unique local work-item ID value for the dimension identified by dimension index. For the Zhouyi NPU, returns the TEC ID from 0 to TEC_NUM-1.
Returns
- retint
The TEC ID.
Examples
tid = S.get_local_id()
See Also
Zhouyi Compass OpenCL Programming Guide: get_local_id
- tvm.aipu.script.ir.miscellaneous.tec_range(start, stop=None)
The explicit TEC parallel For statement.
Parameters
- startUnion[PrimExpr, int]
The start of For_range.
- stopUnion[PrimExpr, int]
The stop of For_range.
Returns
- retframe.ForFrame
The thread-binding For statement.
Note
If you pass only 1 argument to
S.tec_range
, it will automatically setstart = 0, stop = args[0]
.Examples
tec_num = S.get_local_size() for tid in S.tec_range(tec_num): xxx # tid = 0,1,2,3 for tid in S.tec_range(1,3): xxx # tid = 1,2
See Also
Zhouyi Compass OpenCL Programming Guide: get_local_id, get_local_size
- tvm.aipu.script.ir.miscellaneous.perf_tick_begin(cid)
Used for the profiler, begins recording the current tick count.
Parameters
- cid: int
The custom ID, unique identifier of code fragment for performance monitoring.
Examples
S.perf_tick_begin(0) for i in range(10): c[i : i + 8] = a[i : i + 8] + b[i : i + 8] S.perf_tick_end(0) S.perf_tick_begin(1) for i in range(10): xxx xxx S.perf_tick_end(1)
See Also
- tvm.aipu.script.ir.miscellaneous.perf_tick_end(cid)
Used for the profiler, ends recording the current tick count.
Parameters
- cid: int
The custom ID, unique identifier of code fragment for performance monitoring.
Examples
S.perf_tick_begin(0) for i in range(10): c[i : i + 8] = a[i : i + 8] + b[i : i + 8] S.perf_tick_end(0) S.perf_tick_begin(1) for i in range(10): xxx xxx S.perf_tick_end(1)
See Also
- tvm.aipu.script.ir.miscellaneous.aiff(ctrl_desc, param_desc, act_desc, mode=0)
Do AIFF computation synchronously.
Parameters
- ctrl_descPointer
The pointer that store the control descriptor address.
- param_descPointer
The pointer that store the parameter descriptor address.
- act_descPointer
The pointer that store the activation descriptor address.
- modeOptional[int]
The op mode.
0: Full AIFF descriptor.
4: Broadcast mode. This value is only supported by the V3 architecture.
Examples
S.aiff(ctrl_desc, param_desc, act_desc)
- tvm.aipu.script.ir.miscellaneous.async_aiff(ctrl_desc, param_desc, act_desc, event, mode=0)
Do AIFF computation asynchronously.
Parameters
- ctrl_descPointer
The pointer that store the control descriptor address.
- param_descPointer
The pointer that store the parameter descriptor address.
- act_descPointer
The pointer that store the activation descriptor address.
- eventPrimExpr
The event need to be triggered when the AIFF computation is completed. Note if the event is using by others, the AIFF hardware will be blocked until the event is triggered by others, then the AIFF computation will start. The API
S.wait_events
can be used to wait the AIFF computation operation to finish.- modeOptional[int]
The op mode.
0: Full AIFF descriptor.
4: Broadcast mode. This value is only supported by the V3 architecture.
Examples
ev = S.alloc_event(1) S.async_aiff(ctrl_desc, param_desc, act_desc, ev) vc = va + vb S.wait_event(ev)
- tvm.aipu.script.ir.miscellaneous.printf(fmt, *args)
The printf built-in function, same as C printf.
Parameters
- fmtstr
The format string, such as
x=%d
.- *argsOptional[Union[Tuple[PrimExpr, int, float]]]
The items to be printed.
Note
In extremely rare cases, the results of DSL programs differ between using and not using
printf
, one possible reason is that usingprintf
may make the optimization of the underlying OpenCL compiler conservative.Examples
# Scalar S.printf("tec_num = %d\n", S.get_local_size()) S.printf("tec_num = %d, tec_id = %d\n", tec_num, S.get_local_id()) # Integer vector S.printf("%v32hhx, %v32hhx\n", va_i8x32, vb_u8x32) S.printf("%v16hx, %v16hx\n", S.i16x16(i16_max), S.u16x16(u16_max)) S.printf("%v8hlx, %v8hlx\n", va_i32x8, vb_u32x8) # Floating vector S.printf("%v8hlf, %v8hlf\n", S.fp32x8(1.25), va_fp32x8) S.printf("%v16hf, %v16hf\n", S.fp16x16(1.2345678), va_fp16x16)
See Also
Zhouyi Compass OpenCL Programming Guide: printf
- tvm.aipu.script.ir.miscellaneous.asm(template, outputs=None, inputs=None, clobbers=None, qualifiers=None)
Insert assembly instructions in Compass DSL source code.
Parameters
- templatestr
The literal string that consist of assembly code.
- outputsOptional[Dict[str, Tuple[str, PrimExpr]]]
The output dictionary which key is the symbolic name in template and value is a two element tuple. The first element of the tuple is the constraint string, and the second one is the variable in Compass DSL code that need replace the corresponding symbolic name in template.
- inputsOptional[dict]
The input dictionary which key is the symbolic name in template and value is a two element tuple. The first element of the tuple is the constraint string, and the second one is the variable in Compass DSL code that need replace the corresponding symbolic name in template.
- clobbersOptional[Union[Tuple[str], List[str]]
The registers or other values that are changed by the assembler template, beyond those listed in the output dictionary.
- qualifiersOptional[str]
The qualifier for Compass OpenCL compiler, valid choices: (
None
,"inline"
,"volatile"
).
Examples
S.asm( "{add t0.b, %[inp].b, 1; } " "{add %[out].b, %[inp].b, t0.b;}", outputs={"out": ("=&t", y)}, inputs={"inp": ("t", x)}, clobbers=["t0"], qualifiers="volatile", )
See Also
Zhouyi Compass OpenCL Programming Guide: Inline assembly, __asm__