Tutorial 0: Quick Start

This tutorial walks through the basic workflow of Compass DSL. You will learn how to:

  • Write Basic Compass DSL Function

  • Build the Function

  • Inspect OpenCL C Code

  • Run in Python (PySim)

  • Run on NPU Simulator

  • Run on Remote Hardware Device through RPC

  • Profile

1. Write Basic Compass DSL Function

You can write a simple function with the S.prim_func decorator:

  • The input and output data must be annotated as S.ptr(dtype, "global").

  • The function body contains the computation.

from tvm.aipu import script as S
from tvm import aipu


dtype = "float32"


@S.prim_func
def func_add(a: S.ptr(dtype, "global"), b: S.ptr(dtype, "global"), c: S.ptr(dtype, "global"), n: S.i32):
    for i in range(n):
        c[i] = a[i] + b[i]

2. Build the Function

BuildManager is the user interface for compiling DSL programs. A BuildManager instance is created for a specific target, which defaults to X2_1204. Once you have an instance (bm below), call its build method to build the DSL function and get an executor.

bm = aipu.tir.BuildManager(target="X2_1204")
ex = bm.build(func_add)

For more information, see the build workflow of Compass DSL.

3. Inspect OpenCL C Code

After building the function, you can get the generated OpenCL C code.

print(ex.c_code)

The generated OpenCL C code looks similar to the following.

__kernel void func_add(__global float* a, __global float* b, __global float* c, int n) {
  for (int i = 0; i < n; ++i) {
    c[i] = (a[i] + b[i]);
  }
}
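To keep the generated code for later inspection or diffing, it can be written to a file. The following is a minimal sketch, assuming ex.c_code is an ordinary Python string (the tutorial only shows it being printed). The helper save_kernel_source and the file name func_add.cl are hypothetical, and the generated variable is a stand-in copied from the output above rather than a real build result.

```python
import tempfile
from pathlib import Path


def save_kernel_source(code, path):
    """Write source text to `path` and return the number of lines written."""
    Path(path).write_text(code, encoding="utf-8")
    return len(code.splitlines())


# Stand-in for ex.c_code (assumed to be a str); copied from the tutorial output.
generated = """__kernel void func_add(__global float* a, __global float* b, __global float* c, int n) {
  for (int i = 0; i < n; ++i) {
    c[i] = (a[i] + b[i]);
  }
}
"""

# A temporary directory is used here so the sketch does not touch the working tree.
out_file = Path(tempfile.mkdtemp()) / "func_add.cl"
line_count = save_kernel_source(generated, out_file)
print(f"wrote {line_count} lines to {out_file}")
```

In a real session, generated would be replaced by ex.c_code and out_file by a path of your choice.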

4. Run in Python (PySim)

You can run the DSL function directly in Python by calling func_add with the appropriate arguments.

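import numpy as np

# rand is a Compass DSL helper for random test data; import it as in the complete sample.
n = 1024
a, b = rand(n, dtype), rand(n, dtype)
py_out = np.zeros(n, dtype=dtype)

func_add(a, b, py_out, n)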

5. Run on NPU Simulator

After the build step, ex is an executable object. Calling it with the appropriate arguments runs the function on the NPU simulator.

aipu_out = np.zeros(n, dtype=dtype)
ex(a, b, aipu_out, n)

You can directly print the output data.

print(aipu_out)
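Beyond printing, a common sanity check is to compare the simulator output with the PySim output element by element, within a tolerance. The sketch below illustrates the idea with the standard library only. The names func_add_ref and outputs_match are hypothetical, func_add_ref is a plain-Python stand-in rather than the DSL function, and the tolerances are illustrative assumptions.

```python
import math
import random


def func_add_ref(a, b, c, n):
    """Plain-Python stand-in with the same signature as func_add."""
    for i in range(n):
        c[i] = a[i] + b[i]


def outputs_match(actual, expected, rel_tol=1e-6, abs_tol=1e-6):
    """Return True if both sequences have equal length and close values."""
    if len(actual) != len(expected):
        return False
    return all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        for x, y in zip(actual, expected)
    )


random.seed(0)
n = 1024
a = [random.random() for _ in range(n)]
b = [random.random() for _ in range(n)]

# Stand-ins for py_out and aipu_out from the tutorial.
first_out = [0.0] * n
second_out = [0.0] * n
func_add_ref(a, b, first_out, n)
func_add_ref(a, b, second_out, n)

results_match = outputs_match(first_out, second_out)
print("outputs match:", results_match)
```

With the NumPy arrays used in this tutorial, np.testing.assert_allclose(aipu_out, py_out) performs the same kind of check and raises an error on mismatch.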

6. Run on Remote Hardware Device through RPC

Compass DSL also supports running on a hardware device through RPC.

First, set the RPC-related environment variables:

export AIPU_TVM_RPC_TRACKER_IP="xxx"
export AIPU_TVM_RPC_TRACKER_PORT="xxx"
export AIPU_TVM_RPC_KEY="xxx"
export AIPU_TVM_DEVICE_COMPILER="/xxx/aarch64-linux-gnu-g++"
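Because a missing variable typically surfaces only later as a connection or compilation failure, it can help to check the environment up front. The following is a minimal sketch: the variable names come from the export commands above, while the helper missing_rpc_vars and the example values are hypothetical.

```python
import os

# Variable names taken from the export commands above.
REQUIRED_RPC_VARS = (
    "AIPU_TVM_RPC_TRACKER_IP",
    "AIPU_TVM_RPC_TRACKER_PORT",
    "AIPU_TVM_RPC_KEY",
    "AIPU_TVM_DEVICE_COMPILER",
)


def missing_rpc_vars(environ=None):
    """Return the required RPC variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_RPC_VARS if not environ.get(name)]


# Example with an explicit mapping instead of the real environment;
# the values are made up for illustration.
example_env = {
    "AIPU_TVM_RPC_TRACKER_IP": "10.0.0.1",
    "AIPU_TVM_RPC_TRACKER_PORT": "9190",
}
missing = missing_rpc_vars(example_env)
print("missing variables:", missing)
```

Calling missing_rpc_vars() with no argument inspects the real process environment; an empty list means all four variables are set.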

Establish an RPC session with the remote hardware device and run the function.

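# Switch to execute on the hardware device through RPC.
# rpc_key=None means the key is read from the environment variable "AIPU_TVM_RPC_KEY".
# get_rpc_session must be imported first; see the complete sample for the import.
ex.rpc_sess = get_rpc_session(session_timeout=60, rpc_key=None)
aipu_out = np.zeros(n, dtype=dtype)
ex(a, b, aipu_out, n)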

7. Profile

You can use the benchmark API to measure the function's execution time on the remote hardware device.

ex.benchmark(a, b, aipu_out, n)

# Execution time summary:
# mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)
#   0.6287       0.6287       0.6613       0.5962       0.0326

If the amount of data is small, a single run is too short to time reliably. In that case, run the function multiple times through the repeat and number arguments to reduce measurement error and obtain more reliable performance data.

ex.benchmark(a, b, aipu_out, n, repeat=10, number=10)
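The idea behind repeated measurement can be sketched with Python's standard timeit module. This is an illustration only, not the benchmark implementation: it assumes repeat and number behave like the parameters of the same name in timeit.repeat (number calls per measurement, repeat measurements), which the tutorial does not state. The helpers time_callable_ms and summarize_ms are hypothetical, and the use of the population standard deviation is likewise an assumption.

```python
import statistics
import timeit


def time_callable_ms(fn, repeat=10, number=10):
    """Return `repeat` per-call timings in milliseconds.

    Each timing averages `number` consecutive calls, which smooths out
    timer resolution and one-off jitter for very short functions.
    """
    totals_s = timeit.repeat(fn, repeat=repeat, number=number)
    return [total / number * 1e3 for total in totals_s]


def summarize_ms(samples_ms):
    """Compute the same kind of columns as the summary table above."""
    return {
        "mean": statistics.mean(samples_ms),
        "median": statistics.median(samples_ms),
        "max": max(samples_ms),
        "min": min(samples_ms),
        "std": statistics.pstdev(samples_ms),
    }


def workload():
    """Small stand-in workload; a real run would call the built function."""
    return sum(range(1000))


samples = time_callable_ms(workload, repeat=10, number=10)
summary = summarize_ms(samples)
print({key: round(value, 4) for key, value in summary.items()})
```

Averaging over number calls reduces the influence of timer granularity, while collecting repeat separate measurements is what makes the median and standard deviation meaningful.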

You can also use the NPU profiler to collect accurate and detailed performance information.

ex.profile(a, b, aipu_out, n)

# Total cycles from profiler: 2773
# For more details about the profiler report, please see "compass_dsl_xxx/runtime/profile_output.html"

Complete Code

You can find the sample code in PYTHON_PACKAGE_PATH/tvm/aipu/samples/dsl/tutorial_0_quick_start.py. The placeholder PYTHON_PACKAGE_PATH represents the location where the Compass DSL Python package is installed. In general, it is something like ~/.local/lib/python3.8/site-packages.