Script API
Zhouyi Compass script APIs.
- tvm.aipu.script.prim_func(func=None, is_entry=False, is_inline=None)
Decorator for prim_func definitions.
Parameters
- funcCallable
The function to be parsed as prim func.
- is_entry: Optional[bool]
Whether the function is entry kernel. If this option is not set, it will automatically perform function dependency analysis to infer which function is the entry kernel function.
- is_inlineOptional[bool]
Whether the function is treated as an inline function in codegen.
None: codegen will not add any attribute for the current function.
True: codegen will add “_CLC_INLINE” attr for the current function.
False: codegen will add “_CLC_NOINLINE” attr for the current function.
Examples
@S.prim_func def func(xxx): xxx @S.prim_func(is_inline=False) def utils_func(xxx): xxx @S.prim_func(is_entry=True) def kernel_func(xxx): utils_func(xxx) func(xxx)
- tvm.aipu.script.macro(*args, hygienic=True)
Decorator for macro definitions.
Parameters
- hygienic: Optional[bool]
Specifies whether the macro is hygienic or not.
A macro is hygienic if all symbols used in the macro’s body are resolved to values from the location of the macro definition. A non-hygienic macro will have its symbols resolved to values at the time of macro use.
Examples
from tvm.aipu import script as S x_value = 128 @S.macro(hygienic=True) def static_capture(A, B): B[x_value] = A[x_value] ## x_value binds to 128 @S.macro(hygienic=False) def dynamic_capture(A, B): B[x_value] = A[x_value] ## x_value will bind at the time of use @S.prim_func def use1(A: S.ptr("fp32", "global"), B: S.ptr("fp32", "global")): for x_value in range(10): static_capture(A, B) ## Produces B[128] = A[128] @S.prim_func def use2(A: S.ptr("fp32", "global"), B: S.ptr("fp32", "global")): for x_value in range(10): dynamic_capture(A, B) ## Produces B[x_value] = A[x_value]
See Also
- class tvm.tir.Pointer
Represents the concept that corresponds to the C/C++ pointer.
Just like the pointer of C/C++, below operations are supported.
Move forward and backward through adding or subtracting an integer value.
Read and write data as a 1-dimension array.
Check whether it is a null pointer or not.
Compare with the other pointer instance.
Cast to another type pointer.
- property is_nullptr
Check whether the current pointer instance is a null pointer or not.
- as_ptr(dtype)
Cast to another pointer whose data type is the given one.
Parameters
- dtypeUnion[str, DataType]
The target data type.
Returns
- retPointer
The new temporary pointer instance.
- __getitem__(indices)
Read data as a 1-dimension array.
Parameters
- indicesUnion[sgentype, slice]
The index used to access the concrete data. Multiple data will be read if it is a slice.
Returns
- retsingle or multiple times of the data type of the pointer
The result data that is read out.
- __add__(other)
Move the pointer to the higher address space.
Parameters
- othersgentype
The step that the pointer will be moved, in units of data type of the pointer.
Returns
- retPointer
The new temporary pointer instance.
- __radd__(other)
Move the pointer to the higher address space.
Parameters
- othersgentype
The step that the pointer will be moved, in units of data type of the pointer.
Returns
- retPointer
The new temporary pointer instance.
- __sub__(other)
Move the pointer to the lower address space.
Parameters
- othersgentype
The step that the pointer will be moved, in units of data type of the pointer.
Returns
- retPointer
The new temporary pointer instance.
- __lt__(other)
Check whether the address that the pointer represents is < that of “other”.
Parameters
- otherPointer
The other pointer instance that will be compared with.
Returns
- retbool
The compare result.
- __le__(other)
Check whether the address that the pointer represents is <= that of “other”.
Parameters
- otherPointer
The other pointer instance that will be compared with.
Returns
- retbool
The compare result.
- __gt__(other)
Check whether the address that the pointer represents is > that of “other”.
Parameters
- otherPointer
The other pointer instance that will be compared with.
Returns
- retbool
The compare result.
- __ge__(other)
Check whether the address that the pointer represents is >= that of “other”.
Parameters
- otherPointer
The other pointer instance that will be compared with.
Returns
- retbool
The compare result.
- class tvm.aipu.tir.BuildManager
The user interface of DSL program compilation.
Examples
@S.prim_func def add_func(xxx): xxx bm = aipu.tir.BuildManager(target="X2_1204") mod = bm.lower(add_func) ex = bm.build(add_func)
- __init__(target='X2_1204', output_dir=None, cc_options='', disabled_pass=None)
Constructor.
Parameters
- targetUnion[str, tvm.target.Target]
The target that it is built for. It can be a literal target string or a tvm.target.Target object.
- output_dirOptional[str]
The directory to store all files generated during DSL program compilation and execution. If not set, a temporary directory inside the current working directory will be used.
- cc_optionsOptional[str]
The extra compilation options that need to be passed to the Compass OpenCL compiler.
- disabled_passOptional[Union[List[str], Tuple[str]]]
The passes need to be disabled during DSL program compilation.
- lower(inp, args=None, name=None, binds=None)
The lower interface of DSL program compilation.
Parameters
- inpUnion[tvm.te.Schedule, tvm.tir.PrimFunc, tvm.IRModule]
The TE schedule or TensorIR PrimFunc/IRModule to be lowered.
- argsOptional[List[Union[tvm.tir.Buffer, tvm.te.Tensor, tvm.tir.Var]]]
The argument lists to the function for TE schedule. It should be None if you want to lower TensorIR.
- nameOptional[str]
The name of the result entry TensorIR PrimFunc. It is required for TE schedule, optional for TensorIR PrimFunc, and unused for TensorIR IRModule. For TensorIR PrimFunc, if set, it will override the current value of attribute “global_symbol” of the entry TensorIR PrimFunc.
- bindsOptional[Mapping[tvm.te.Tensor, tvm.tir.Buffer]]
Dictionary that maps the Tensor to Buffer which specified the data layout requirement of the function. By default, a new compact buffer is created for each tensor in the argument.
Returns
- rettvm.IRModule
The result IRModule.
- gen_op_lib(inp, args=None, name=None, binds=None, output_path=None, verbose=False)
Generate Compass OpenCL code, compile it, and save it to the specified path.
Parameters
- inpUnion[tvm.te.Schedule, tvm.tir.PrimFunc, tvm.IRModule]
The TE schedule or TensorIR PrimFunc/IRModule to be lowered.
- argsOptional[List[Union[tvm.tir.Buffer, tvm.te.Tensor, tvm.tir.Var]]]
The argument lists to the function for TE schedule. It should be None if you want to lower TensorIR.
- nameOptional[str]
The name of the result entry TensorIR PrimFunc. It is required for TE schedule, optional for TensorIR PrimFunc, and unused for TensorIR IRModule. For TensorIR PrimFunc, if set, it will override the current value of attribute “global_symbol” of the entry TensorIR PrimFunc.
- bindsOptional[Mapping[tvm.te.Tensor, tvm.tir.Buffer]]
Dictionary that maps the Tensor to Buffer which specified the data layout requirement of the function. By default, a new compact buffer is created for each tensor in the argument.
- output_pathOptional[str]
The path of the output object file that is compiled from the generated Compass OpenCL code file. If not set, it will be constructed using argument “output_dir” of the constructor and the name of entry PrimFunc.
- verboseOptional[bool]
Print output path if verbose is True. By default, it is False.
- build(inp, args=None, name=None, binds=None)
The build interface of DSL program compilation.
Parameters
- inpUnion[tvm.te.Schedule, tvm.tir.PrimFunc, tvm.IRModule]
The TE schedule or TensorIR PrimFunc/IRModule to be lowered.
- argsOptional[List[Union[tvm.tir.Buffer, tvm.te.Tensor, tvm.tir.Var]]]
The argument lists to the function for TE schedule. It should be None if you want to lower TensorIR.
- nameOptional[str]
The name of the result entry TensorIR PrimFunc. It is required for TE schedule, optional for TensorIR PrimFunc, and unused for TensorIR IRModule. For TensorIR PrimFunc, if set, it will override the current value of attribute “global_symbol” of the entry TensorIR PrimFunc.
- bindsOptional[Mapping[tvm.te.Tensor, tvm.tir.Buffer]]
Dictionary that maps the Tensor to Buffer which specified the data layout requirement of the function. By default, a new compact buffer is created for each tensor in the argument.
Returns
- rettvm.aipu.tir.executor.Executor
The object that is responsible for the subsequent execution job.
- class tvm.aipu.tir.executor.Executor
The class responsible for executing the DSL program.
Examples
@S.prim_func def add_func(a: S.ptr("i8", "global"), b: S.ptr("i8", "global"), n: S.i32): xxx bm = aipu.tir.BuildManager() ex = bm.build(add_func)
- property rpc_sess
Get or set the RPC session that is used to run a DSL program on a remote device. For setting, the value must be an instance of class “tvm.rpc.RPCSession” and already connected to the RPC server.
Examples
from tvm.aipu.utils import get_rpc_session ex.rpc_sess = get_rpc_session() ex(a, aipu_out, 100) # Run on remote device through RPC.
- run(*args)
The run end-to-end interface of DSL program execution.
Parameters
- argsList
The execution arguments, which should be aligned with the parameters of the DSL program.
Examples
from tvm.aipu.utils import rand a = rand(100, "int8") aipu_out = np.empty(100, dtype="int8") ex.run(a, aipu_out, 100) # Can also be "ex(a, aipu_out, 100)".
- benchmark(*args, repeat=2, number=3, min_repeat_ms=0, limit_zero_time_iterations=100, cooldown_interval_ms=0, repeats_to_cooldown=1)
Calculate runtime of a function by repeatedly calling it.
Use this function to get an accurate measurement of the runtime of a function. The function is run multiple times in order to account for variability in measurements, processor speed or other external factors. Mean, median, standard deviation, min and max runtime are all reported. On the AIPU specifically, synchonization and data transfer operations are not counted towards the runtime. This allows for fair comparison of runtimes across different functions and models.
The benchmarking loop looks approximately like so:
for r in range(repeat): time_start = now() for n in range(number): func_name() time_end = now() total_times.append((time_end - time_start)/number)
Parameters
- args: Sequence[Object]
Arguments to the function. These are cached before running timing code, so that data transfer costs are not counted in the runtime.
- repeat: Optional[int]
Number of times to run the outer loop of the timing code (see above). The output will contain repeat number of datapoints.
- number: Optional[int]
Number of times to run the inner loop of the timing code. This inner loop is run in between the timer starting and stopping. In order to amortize any timing overhead, number should be increased when the runtime of the function is small (less than a 1/10 of a millisecond).
- min_repeat_ms: Optional[int]
If set, the inner loop will be run until it takes longer than min_repeat_ms milliseconds. This can be used to ensure that the function is run enough to get an accurate measurement.
- limit_zero_time_iterations: Optional[int]
The maximum number of repeats when measured time is equal to 0. It helps to avoid hanging during measurements.
- cooldown_interval_ms: Optional[int]
The cooldown interval in milliseconds between the number of repeats defined by repeats_to_cooldown.
- repeats_to_cooldown: Optional[int]
The number of repeats before the cooldown is activated.
Note
The function will be invoked (1 + number x repeat) times, with the first call discarded in case there is lazy initialization.
Returns
- ret: BenchmarkResult
Runtimes of the function. Use .mean to access the mean runtime, use .results to access the individual runtimes (in seconds).
Examples
print(ex.benchmark(a, aipu_out, n))
See Also
- profile(*args)
Collect accurate performance information on remote device and return total cycles.
Parameters
- argsList
The execution arguments, which should be aligned with the parameters of the DSL program.
Returns
- total_cyclesint
The total hardware cycles it took to execute the DSL program.
Examples
ex.profile(a, aipu_out, 100)
See Also
Common AIPU utilities.
- tvm.aipu.utils.get_rpc_session(session_timeout=600, rpc_key=None, tracker_host=None, tracker_port=None, priority=1)
Connect to the RPC tracker and get an RPC session with the RPC key.
Parameters
- session_timeoutOptional[float]
The duration of the session, which allows the server to kill the connection when duration is longer than this value. When duration is zero, it means that the request must always be kept alive.
- rpc_keyOptional[str]
The type key of the device. If rpc_key = “None”, get it from env “AIPU_TVM_RPC_KEY”.
- tracker_hostOptional[str]
The hostname or IP address of the RPC tracker. If tracker_host = “None”, get it from env “AIPU_TVM_RPC_TRACKER_IP”.
- tracker_port: Optional[int, str]
The port of the RPC tracker. If tracker_port = “None”, get it from env “AIPU_TVM_RPC_TRACKER_PORT”.
- priorityOptional[int]
The priority of the request. If priority = “None”, get it from env “AIPU_TVM_RPC_PRIORITY”.
Returns
- sesstvm.rpc.RPCSession
The RPC session that is already connected to the RPC server.
- tvm.aipu.utils.hw_native_vdtype(dtype)
Get the corresponding hardware native vector data type.
Parameters
- dtypeUnion[str, DataType]
The given data type, can be any scalar or vector data type except boolean ones.
Returns
- ret: DataType
The corresponding hardware native vector data type.
Examples
# Generate from string objects. i8x32 = hw_native_vdtype("int8") fp32x8 = hw_native_vdtype("float32") # Generate from DataType objects. u16x16 = hw_native_vdtype(DataType("uint16")) i32x8 = hw_native_vdtype(DataType("int32"))
See Also
- tvm.aipu.utils.rand(shape, dtype, low=None, high=None, enable_corner_values=True, return_python_type=False)
Random values in a given shape, dtype and [low, high) range (including low, excluding high).
Parameters
- shapeUnion[int, Tuple[int], List[int]]
The element number on which rand is performed.
- dtypestr
The data type.
- lowOptional[int, float]
The minimum threshold for the rand range.
- highOptional[int, float]
The maximum threshold for the rand range.
- enable_corner_valuesOptional[bool]
Whether the corner values are forced to be included. Note: 1. The corner values contain: low or dtype minimum value, high or dtype maximum value, and zero value when zero is in the random range. 2. When the value is True and the number of elements is less than the number of corner values, it is uncertain whether corner values are forced to be included: the existence of corner values depends on randomness.
- return_python_typeOptional[bool]
Whether return the result as Python native type or not, if it is False, the result are returned as NumPy type.
Returns
- out: Union[float, int, List[float], List[int], numpy.ndarray]
Rand values, scalar when shape is 1 or numpy.ndarray when shape is a tuple of int.
Examples
# Generate NumPy objects. ndarray_i8_a = rand(100, "int8") ndarray_fp16_b = rand((4, 16), "float16", low=-100, high=100) ndarray_int16_c = rand((1,), "int16") numpy_fp32_c = rand(1, low=0, "float32") # Generate Python native type objects. float_list_d = rand((2, 30), "float32", high=5.5, return_python_type=True) int_value_e = rand(1, "int32", enable_corner_values=False, return_python_type=True) int_list_f = rand((1,), "int8", return_python_type=True)
The official part of IR APIs.
- tvm.aipu.script.ir.ir.vectorized(start: PrimExpr, stop: PrimExpr = None, *, annotations: Dict[str, Any] = None) ForFrame
The vectorized For statement.
Parameters
- startPrimExpr
The minimum value of iteration.
- stopPrimExpr
The maximum value of iteration.
- annotationsDict[str, Any]
The optional annotations of the For statement.
Returns
- resframe.ForFrame
The ForFrame.
- tvm.aipu.script.ir.ir.block(name: str = '', no_realize: bool = False) BlockFrame
The block declaration statement.
Parameters
- namestr
The name of the block.
- no_realizebool
The flag whether to construct BlockRealize or Block.
Returns
- resframe.BlockFrame
The BlockFrame.
- tvm.aipu.script.ir.ir.grid(*extents: PrimExpr) ForFrame
The grid For statement.
Parameters
- extentsPrimExpr
The extents of the iteration.
Returns
- resframe.ForFrame
The ForFrame.
The axis part of IR APIs.
- tvm.aipu.script.ir.axis.spatial(dom: Union[Range, List[PrimExpr], Tuple[PrimExpr]], binding: PrimExpr, dtype: str = 'int32') Var
The spatial block axis defining function.
Parameters
- domUnion[Range, List[PrimExpr], Tuple[PrimExpr]]
The domain of the iteration variable.
- bindingPrimExpr
The binding value of the iteration variable.
- dtypestr
The data type of the iteration variable.
Returns
- resVar
The iteration variable.
- tvm.aipu.script.ir.axis.reduce(dom: Union[Range, List[PrimExpr], Tuple[PrimExpr]], binding: PrimExpr, dtype: str = 'int32') Var
The reduced block axis defining function.
Parameters
- domUnion[Range, List[PrimExpr], Tuple[PrimExpr]]
The domain of the iteration variable.
- bindingPrimExpr
The binding value of the iteration variable.
- dtypestr
The data type of the iteration variable.
Returns
- resVar
The iteration variable.
- tvm.aipu.script.ir.axis.remap(kinds: str, bindings: List[PrimExpr], dtype: str = 'int32') Union[List[Var], Var]
The block axis remapping function.
Parameters
- kindsstr
The types of the iteration variables.
- bindingsList[PrimExpr]
The binding values of the iteration variables.
- dtypestr
The data types of the iteration variables.
Returns
- resVar
The iteration variables.