Script API

Zhouyi Compass script APIs.

tvm.aipu.script.prim_func(func=None, is_entry=False, is_inline=None)

Decorator for prim_func definitions.

Parameters

funcCallable

The function to be parsed as prim func.

is_entry: Optional[bool]

Whether the function is entry kernel. If this option is not set, it will automatically perform function dependency analysis to infer which function is the entry kernel function.

is_inlineOptional[bool]

Whether the function is treated as an inline function in codegen.

  • None: codegen will not add any attribute for the current function.

  • True: codegen will add “_CLC_INLINE” attr for the current function.

  • False: codegen will add “_CLC_NOINLINE” attr for the current function.

Examples

@S.prim_func
def func(xxx):
    xxx

@S.prim_func(is_inline=False)
def utils_func(xxx):
    xxx

@S.prim_func(is_entry=True)
def kernel_func(xxx):
    utils_func(xxx)
    func(xxx)
tvm.aipu.script.macro(*args, hygienic=True)

Decorator for macro definitions.

Parameters

hygienic: Optional[bool]

Specifies whether the macro is hygienic or not.

A macro is hygienic if all symbols used in the macro’s body are resolved to values from the location of the macro definition. A non-hygienic macro will have its symbols resolved to values at the time of macro use.

Examples

from tvm.aipu import script as S

x_value = 128

@S.macro(hygienic=True)
def static_capture(A, B):
    B[x_value] = A[x_value]     ## x_value binds to 128

@S.macro(hygienic=False)
def dynamic_capture(A, B):
    B[x_value] = A[x_value]     ## x_value will bind at the time of use


@S.prim_func
def use1(A: S.ptr("fp32", "global"), B: S.ptr("fp32", "global")):
    for x_value in range(10):
        static_capture(A, B)    ## Produces B[128] = A[128]

@S.prim_func
def use2(A: S.ptr("fp32", "global"), B: S.ptr("fp32", "global")):
    for x_value in range(10):
        dynamic_capture(A, B)   ## Produces B[x_value] = A[x_value]

See Also

class tvm.tir.Pointer

Represents the concept that corresponds to the C/C++ pointer.

Just like the pointer of C/C++, below operations are supported.

  • Move forward and backward through adding or subtracting an integer value.

  • Read and write data as a 1-dimension array.

  • Check whether it is a null pointer or not.

  • Compare with the other pointer instance.

  • Cast to another type pointer.

property is_nullptr

Check whether the current pointer instance is a null pointer or not.

as_ptr(dtype)

Cast to another pointer whose data type is the given one.

Parameters

dtypeUnion[str, DataType]

The target data type.

Returns

retPointer

The new temporary pointer instance.

__getitem__(indices)

Read data as a 1-dimension array.

Parameters

indicesUnion[sgentype, slice]

The index used to access the concrete data. Multiple data will be read if it is a slice.

Returns

retsingle or multiple times of the data type of the pointer

The result data that is read out.

__add__(other)

Move the pointer to the higher address space.

Parameters

othersgentype

The step that the pointer will be moved, in units of data type of the pointer.

Returns

retPointer

The new temporary pointer instance.

__radd__(other)

Move the pointer to the higher address space.

Parameters

othersgentype

The step that the pointer will be moved, in units of data type of the pointer.

Returns

retPointer

The new temporary pointer instance.

__sub__(other)

Move the pointer to the lower address space.

Parameters

othersgentype

The step that the pointer will be moved, in units of data type of the pointer.

Returns

retPointer

The new temporary pointer instance.

__lt__(other)

Check whether the address that the pointer represents is < that of “other”.

Parameters

otherPointer

The other pointer instance that will be compared with.

Returns

retbool

The compare result.

__le__(other)

Check whether the address that the pointer represents is <= that of “other”.

Parameters

otherPointer

The other pointer instance that will be compared with.

Returns

retbool

The compare result.

__gt__(other)

Check whether the address that the pointer represents is > that of “other”.

Parameters

otherPointer

The other pointer instance that will be compared with.

Returns

retbool

The compare result.

__ge__(other)

Check whether the address that the pointer represents is >= that of “other”.

Parameters

otherPointer

The other pointer instance that will be compared with.

Returns

retbool

The compare result.

__eq__(other)

Check whether the address that the pointer represents is == that of “other”.

Parameters

otherPointer

The other pointer instance that will be compared with.

Returns

retbool

The compare result.

__ne__(other)

Check whether the address that the pointer represents is != that of “other”.

Parameters

otherPointer

The other pointer instance that will be compared with.

Returns

retbool

The compare result.

class tvm.aipu.tir.BuildManager

The user interface of DSL program compilation.

Examples

@S.prim_func
def add_func(xxx):
    xxx

bm = aipu.tir.BuildManager(target="X2_1204")
mod = bm.lower(add_func)
ex = bm.build(add_func)
__init__(target='X2_1204', output_dir=None, cc_options='', disabled_pass=None)

Constructor.

Parameters

targetUnion[str, tvm.target.Target]

The target that it is built for. It can be a literal target string or a tvm.target.Target object.

output_dirOptional[str]

The directory to store all files generated during DSL program compilation and execution. If not set, a temporary directory inside the current working directory will be used.

cc_optionsOptional[str]

The extra compilation options that need to be passed to the Compass OpenCL compiler.

disabled_passOptional[Union[List[str], Tuple[str]]]

The passes need to be disabled during DSL program compilation.

lower(inp, args=None, name=None, binds=None)

The lower interface of DSL program compilation.

Parameters

inpUnion[tvm.te.Schedule, tvm.tir.PrimFunc, tvm.IRModule]

The TE schedule or TensorIR PrimFunc/IRModule to be lowered.

argsOptional[List[Union[tvm.tir.Buffer, tvm.te.Tensor, tvm.tir.Var]]]

The argument lists to the function for TE schedule. It should be None if you want to lower TensorIR.

nameOptional[str]

The name of the result entry TensorIR PrimFunc. It is required for TE schedule, optional for TensorIR PrimFunc, and unused for TensorIR IRModule. For TensorIR PrimFunc, if set, it will override the current value of attribute “global_symbol” of the entry TensorIR PrimFunc.

bindsOptional[Mapping[tvm.te.Tensor, tvm.tir.Buffer]]

Dictionary that maps the Tensor to Buffer which specified the data layout requirement of the function. By default, a new compact buffer is created for each tensor in the argument.

Returns

rettvm.IRModule

The result IRModule.

gen_op_lib(inp, args=None, name=None, binds=None, output_path=None, verbose=False)

Generate Compass OpenCL code, compile it, and save it to the specified path.

Parameters

inpUnion[tvm.te.Schedule, tvm.tir.PrimFunc, tvm.IRModule]

The TE schedule or TensorIR PrimFunc/IRModule to be lowered.

argsOptional[List[Union[tvm.tir.Buffer, tvm.te.Tensor, tvm.tir.Var]]]

The argument lists to the function for TE schedule. It should be None if you want to lower TensorIR.

nameOptional[str]

The name of the result entry TensorIR PrimFunc. It is required for TE schedule, optional for TensorIR PrimFunc, and unused for TensorIR IRModule. For TensorIR PrimFunc, if set, it will override the current value of attribute “global_symbol” of the entry TensorIR PrimFunc.

bindsOptional[Mapping[tvm.te.Tensor, tvm.tir.Buffer]]

Dictionary that maps the Tensor to Buffer which specified the data layout requirement of the function. By default, a new compact buffer is created for each tensor in the argument.

output_pathOptional[str]

The path of the output object file that is compiled from the generated Compass OpenCL code file. If not set, it will be constructed using argument “output_dir” of the constructor and the name of entry PrimFunc.

verboseOptional[bool]

Print output path if verbose is True. By default, it is False.

build(inp, args=None, name=None, binds=None)

The build interface of DSL program compilation.

Parameters

inpUnion[tvm.te.Schedule, tvm.tir.PrimFunc, tvm.IRModule]

The TE schedule or TensorIR PrimFunc/IRModule to be lowered.

argsOptional[List[Union[tvm.tir.Buffer, tvm.te.Tensor, tvm.tir.Var]]]

The argument lists to the function for TE schedule. It should be None if you want to lower TensorIR.

nameOptional[str]

The name of the result entry TensorIR PrimFunc. It is required for TE schedule, optional for TensorIR PrimFunc, and unused for TensorIR IRModule. For TensorIR PrimFunc, if set, it will override the current value of attribute “global_symbol” of the entry TensorIR PrimFunc.

bindsOptional[Mapping[tvm.te.Tensor, tvm.tir.Buffer]]

Dictionary that maps the Tensor to Buffer which specified the data layout requirement of the function. By default, a new compact buffer is created for each tensor in the argument.

Returns

rettvm.aipu.tir.executor.Executor

The object that is responsible for the subsequent execution job.

class tvm.aipu.tir.executor.Executor

The class responsible for executing the DSL program.

Examples

@S.prim_func
def add_func(a: S.ptr("i8", "global"), b: S.ptr("i8", "global"), n: S.i32):
    xxx

bm = aipu.tir.BuildManager()
ex = bm.build(add_func)
property c_code

Get the Compass OpenCL code of the DSL program.

Examples

print(ex.c_code)
property rpc_sess

Get or set the RPC session that is used to run a DSL program on a remote device. For setting, the value must be an instance of class “tvm.rpc.RPCSession” and already connected to the RPC server.

Examples

from tvm.aipu.utils import get_rpc_session

ex.rpc_sess = get_rpc_session()
ex(a, aipu_out, 100)  # Run on remote device through RPC.
run(*args)

The run end-to-end interface of DSL program execution.

Parameters

argsList

The execution arguments, which should be aligned with the parameters of the DSL program.

Examples

from tvm.aipu.utils import rand

a = rand(100, "int8")
aipu_out = np.empty(100, dtype="int8")
ex.run(a, aipu_out, 100)  # Can also be "ex(a, aipu_out, 100)".
benchmark(*args, repeat=2, number=3, min_repeat_ms=0, limit_zero_time_iterations=100, cooldown_interval_ms=0, repeats_to_cooldown=1)

Calculate runtime of a function by repeatedly calling it.

Use this function to get an accurate measurement of the runtime of a function. The function is run multiple times in order to account for variability in measurements, processor speed or other external factors. Mean, median, standard deviation, min and max runtime are all reported. On the AIPU specifically, synchonization and data transfer operations are not counted towards the runtime. This allows for fair comparison of runtimes across different functions and models.

The benchmarking loop looks approximately like so:

for r in range(repeat):
    time_start = now()
    for n in range(number):
        func_name()
    time_end = now()
    total_times.append((time_end - time_start)/number)

Parameters

args: Sequence[Object]

Arguments to the function. These are cached before running timing code, so that data transfer costs are not counted in the runtime.

repeat: Optional[int]

Number of times to run the outer loop of the timing code (see above). The output will contain repeat number of datapoints.

number: Optional[int]

Number of times to run the inner loop of the timing code. This inner loop is run in between the timer starting and stopping. In order to amortize any timing overhead, number should be increased when the runtime of the function is small (less than a 1/10 of a millisecond).

min_repeat_ms: Optional[int]

If set, the inner loop will be run until it takes longer than min_repeat_ms milliseconds. This can be used to ensure that the function is run enough to get an accurate measurement.

limit_zero_time_iterations: Optional[int]

The maximum number of repeats when measured time is equal to 0. It helps to avoid hanging during measurements.

cooldown_interval_ms: Optional[int]

The cooldown interval in milliseconds between the number of repeats defined by repeats_to_cooldown.

repeats_to_cooldown: Optional[int]

The number of repeats before the cooldown is activated.

Note

The function will be invoked (1 + number x repeat) times, with the first call discarded in case there is lazy initialization.

Returns

ret: BenchmarkResult

Runtimes of the function. Use .mean to access the mean runtime, use .results to access the individual runtimes (in seconds).

Examples

print(ex.benchmark(a, aipu_out, n))

See Also

profile(*args)

Collect accurate performance information on remote device and return total cycles.

Parameters

argsList

The execution arguments, which should be aligned with the parameters of the DSL program.

Returns

total_cyclesint

The total hardware cycles it took to execute the DSL program.

Examples

ex.profile(a, aipu_out, 100)

See Also

Common AIPU utilities.

tvm.aipu.utils.get_rpc_session(session_timeout=600, rpc_key=None, tracker_host=None, tracker_port=None, priority=1)

Connect to the RPC tracker and get an RPC session with the RPC key.

Parameters

session_timeoutOptional[float]

The duration of the session, which allows the server to kill the connection when duration is longer than this value. When duration is zero, it means that the request must always be kept alive.

rpc_keyOptional[str]

The type key of the device. If rpc_key = “None”, get it from env “AIPU_TVM_RPC_KEY”.

tracker_hostOptional[str]

The hostname or IP address of the RPC tracker. If tracker_host = “None”, get it from env “AIPU_TVM_RPC_TRACKER_IP”.

tracker_port: Optional[int, str]

The port of the RPC tracker. If tracker_port = “None”, get it from env “AIPU_TVM_RPC_TRACKER_PORT”.

priorityOptional[int]

The priority of the request. If priority = “None”, get it from env “AIPU_TVM_RPC_PRIORITY”.

Returns

sesstvm.rpc.RPCSession

The RPC session that is already connected to the RPC server.

tvm.aipu.utils.hw_native_vdtype(dtype)

Get the corresponding hardware native vector data type.

Parameters

dtypeUnion[str, DataType]

The given data type, can be any scalar or vector data type except boolean ones.

Returns

ret: DataType

The corresponding hardware native vector data type.

Examples

# Generate from string objects.
i8x32 = hw_native_vdtype("int8")
fp32x8 = hw_native_vdtype("float32")

# Generate from DataType objects.
u16x16 = hw_native_vdtype(DataType("uint16"))
i32x8 = hw_native_vdtype(DataType("int32"))

See Also

tvm.aipu.utils.rand(shape, dtype, low=None, high=None, enable_corner_values=True, return_python_type=False)

Random values in a given shape, dtype and [low, high) range (including low, excluding high).

Parameters

shapeUnion[int, Tuple[int], List[int]]

The element number on which rand is performed.

dtypestr

The data type.

lowOptional[int, float]

The minimum threshold for the rand range.

highOptional[int, float]

The maximum threshold for the rand range.

enable_corner_valuesOptional[bool]

Whether the corner values are forced to be included. Note: 1. The corner values contain: low or dtype minimum value, high or dtype maximum value, and zero value when zero is in the random range. 2. When the value is True and the number of elements is less than the number of corner values, it is uncertain whether corner values are forced to be included: the existence of corner values depends on randomness.

return_python_typeOptional[bool]

Whether return the result as Python native type or not, if it is False, the result are returned as NumPy type.

Returns

out: Union[float, int, List[float], List[int], numpy.ndarray]

Rand values, scalar when shape is 1 or numpy.ndarray when shape is a tuple of int.

Examples

# Generate NumPy objects.
ndarray_i8_a = rand(100, "int8")
ndarray_fp16_b = rand((4, 16), "float16", low=-100, high=100)
ndarray_int16_c = rand((1,), "int16")
numpy_fp32_c = rand(1, low=0, "float32")

# Generate Python native type objects.
float_list_d = rand((2, 30), "float32", high=5.5, return_python_type=True)
int_value_e = rand(1, "int32", enable_corner_values=False, return_python_type=True)
int_list_f = rand((1,), "int8", return_python_type=True)

The official part of IR APIs.

tvm.aipu.script.ir.ir.vectorized(start: PrimExpr, stop: PrimExpr = None, *, annotations: Dict[str, Any] = None) ForFrame

The vectorized For statement.

Parameters

startPrimExpr

The minimum value of iteration.

stopPrimExpr

The maximum value of iteration.

annotationsDict[str, Any]

The optional annotations of the For statement.

Returns

resframe.ForFrame

The ForFrame.

tvm.aipu.script.ir.ir.block(name: str = '', no_realize: bool = False) BlockFrame

The block declaration statement.

Parameters

namestr

The name of the block.

no_realizebool

The flag whether to construct BlockRealize or Block.

Returns

resframe.BlockFrame

The BlockFrame.

tvm.aipu.script.ir.ir.grid(*extents: PrimExpr) ForFrame

The grid For statement.

Parameters

extentsPrimExpr

The extents of the iteration.

Returns

resframe.ForFrame

The ForFrame.

The axis part of IR APIs.

tvm.aipu.script.ir.axis.spatial(dom: Union[Range, List[PrimExpr], Tuple[PrimExpr]], binding: PrimExpr, dtype: str = 'int32') Var

The spatial block axis defining function.

Parameters

domUnion[Range, List[PrimExpr], Tuple[PrimExpr]]

The domain of the iteration variable.

bindingPrimExpr

The binding value of the iteration variable.

dtypestr

The data type of the iteration variable.

Returns

resVar

The iteration variable.

tvm.aipu.script.ir.axis.reduce(dom: Union[Range, List[PrimExpr], Tuple[PrimExpr]], binding: PrimExpr, dtype: str = 'int32') Var

The reduced block axis defining function.

Parameters

domUnion[Range, List[PrimExpr], Tuple[PrimExpr]]

The domain of the iteration variable.

bindingPrimExpr

The binding value of the iteration variable.

dtypestr

The data type of the iteration variable.

Returns

resVar

The iteration variable.

tvm.aipu.script.ir.axis.remap(kinds: str, bindings: List[PrimExpr], dtype: str = 'int32') Union[List[Var], Var]

The block axis remapping function.

Parameters

kindsstr

The types of the iteration variables.

bindingsList[PrimExpr]

The binding values of the iteration variables.

dtypestr

The data types of the iteration variables.

Returns

resVar

The iteration variables.