Arithmetic
The arithmetic part of IR APIs.
- tvm.aipu.script.ir.arithmetic.vadd(x, y, mask=None, saturate=False, out_sign=None, r=None)
Computes the addition on active elements of x with the corresponding elements of y.
The inactive elements of the result vector are determined by r.
The feature Flexible Width Vector is supported.
The feature Multiple Width Vector is supported.
    x:    1 2 3 4 5 6  7  8
    y:    1 2 3 4 5 6  7  8
    mask: T T T T F F  T  T
    z:    9 8 7 6 4 3  2  1
    out = S.vadd(x, y, mask, r=z)
    out:  2 4 6 8 4 3 14 16
Parameters
- x, y: Union[PrimExpr, int, float]
The operands. If either one is a scalar, it will be automatically broadcast.
- mask: Optional[Union[Tuple[bool], List[bool], numpy.ndarray[bool], str, PrimExpr]]
The predication mask to indicate which elements of the vector are active for the operation. None means all elements are active.
- saturate: Optional[bool]
Whether the result needs to be saturated or not.
- out_sign: Optional[str]
Specify whether the output sign is signed or unsigned. It is only needed for integer operation. None means same as operands, so the sign of operands must be the same, u means unsigned, s means signed.
- r: Optional[Union[PrimExpr, int, float]]
Provide the value of the inactive elements in result vector. If it is a scalar, it will be automatically broadcast. None means the inactive elements of result vector are undefined.
Returns
- ret: PrimExpr
The result expression.
Supported DType
“int8/16/32”, “uint8/16/32”, “float16/32”.
Examples
    vc = S.vadd(va, vb)
    vc = S.vadd(va, 3)
    vc = S.vadd(va, vb, saturate=True)
    vc = S.vadd(va, vb, out_sign="u")
    vc = S.vadd(va, vb, mask=S.tail_mask(n, 8))
    vc = S.vadd(va, vb, mask="3T5F", r=vb)
See Also
Zhouyi Compass OpenCL Programming Guide: __vadd, __vadds
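The masking rules of vadd can be modeled in plain Python. This is only a reference sketch, not the DSL implementation: the helper name masked_add_ref is hypothetical, vectors are plain lists, only y and r are broadcast, saturation is not modeled, and None stands in for an undefined inactive element.

```python
def masked_add_ref(x, y, mask=None, r=None):
    """Hypothetical list-based model of the documented vadd masking rules."""
    n = len(x)
    # Broadcast scalar operands, as the parameter description states.
    y = y if isinstance(y, list) else [y] * n
    if r is not None and not isinstance(r, list):
        r = [r] * n
    mask = [True] * n if mask is None else mask
    out = []
    for i in range(n):
        if mask[i]:
            out.append(x[i] + y[i])
        else:
            # Inactive lanes take r when given; None models "undefined".
            out.append(None if r is None else r[i])
    return out


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1, 2, 3, 4, 5, 6, 7, 8]
mask = [True, True, True, True, False, False, True, True]
z = [9, 8, 7, 6, 4, 3, 2, 1]
print(masked_add_ref(x, y, mask, r=z))  # [2, 4, 6, 8, 4, 3, 14, 16]
```

The printed list matches the out row of the vadd example above.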
- tvm.aipu.script.ir.arithmetic.vaddh(x, y)
Adds every two adjacent elements within vector x and within vector y, then concatenates the results: the sums from x are placed in the lower half of the result and the sums from y in the higher half.
The feature Multiple Width Vector is supported.
    x:    9  1  8  2  7  3  6  4
    y:    9  4  8  3  7  2  6  1
    out = S.vaddh(x, y)
    out: 10 10 10 10 13 11  9  7
Parameters
- x, y: Union[PrimExpr, int]
The operands. If either one is a scalar, it will be automatically broadcast.
Returns
- ret: PrimExpr
The result expression.
Supported DType
“int8/16/32”, “uint8/16/32”.
Examples
    vc = S.vaddh(va, vb)
    vc = S.vaddh(va, 3)
See Also
Zhouyi Compass OpenCL Programming Guide: __vaddh
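The pairwise layout of vaddh is easy to misread, so here is a small plain-Python model of it. The helper name pairwise_add_ref is hypothetical, vectors are plain lists, and integer overflow is not modeled.

```python
def pairwise_add_ref(x, y):
    """Hypothetical model: sum adjacent pairs of x, then of y, and concatenate."""
    low = [x[i] + x[i + 1] for i in range(0, len(x), 2)]   # lower half, from x
    high = [y[i] + y[i + 1] for i in range(0, len(y), 2)]  # higher half, from y
    return low + high


x = [9, 1, 8, 2, 7, 3, 6, 4]
y = [9, 4, 8, 3, 7, 2, 6, 1]
print(pairwise_add_ref(x, y))  # [10, 10, 10, 10, 13, 11, 9, 7]
```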
- tvm.aipu.script.ir.arithmetic.vabs(x, mask=None, saturate=False)
Computes the absolute value of every active element of vector x.
The inactive elements of result vector are undefined.
The feature Flexible Width Vector is supported.
The feature Multiple Width Vector is supported.
    x:    1 -2 -3  4 -5  6 -127 -128
    mask: T  T  T  T  F  F    T    T
    out = S.vabs(x, mask)
    out:  1  2  3  4  ?  ?  127 -128
    out = S.vabs(x, mask, saturate=True)
    out:  1  2  3  4  ?  ?  127  127
Parameters
- x: PrimExpr
The operand vector x.
- mask: Optional[Union[Tuple[bool], List[bool], numpy.ndarray[bool], str, PrimExpr]]
The predication mask to indicate which elements of the vector are active for the operation. None means all elements are active.
- saturate: Optional[bool]
Whether the result needs to be saturated or not.
Returns
- ret: PrimExpr
The result expression.
Supported DType
“int8/16/32”, “uint8/16/32”, “float16/32”.
Examples
    vc = S.vabs(va)
    vc = S.vabs(va, saturate=True)
    vc = S.vabs(va, mask="3T5F")
    vc = S.vabs(va, mask=S.tail_mask(n, 8))
See Also
Zhouyi Compass OpenCL Programming Guide: __vabs, __vabss
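The effect of saturate in the vabs example can be reproduced with a short plain-Python model. It assumes int8 elements and two's-complement wraparound when saturate is off, which is inferred from the example (abs of -128 staying -128) and not stated elsewhere in this section. The helper name abs_int8_ref is hypothetical, and None stands in for an undefined element.

```python
def abs_int8_ref(x, mask=None, saturate=False):
    """Hypothetical int8 model of the documented vabs example."""
    mask = [True] * len(x) if mask is None else mask
    out = []
    for value, active in zip(x, mask):
        if not active:
            out.append(None)  # inactive lanes are undefined
            continue
        result = abs(value)
        if result > 127:
            # abs(-128) = 128 does not fit in int8: clamp or wrap (assumed).
            result = 127 if saturate else result - 256
        out.append(result)
    return out


x = [1, -2, -3, 4, -5, 6, -127, -128]
mask = [True, True, True, True, False, False, True, True]
print(abs_int8_ref(x, mask))                 # [1, 2, 3, 4, None, None, 127, -128]
print(abs_int8_ref(x, mask, saturate=True))  # [1, 2, 3, 4, None, None, 127, 127]
```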
- tvm.aipu.script.ir.arithmetic.vsub(x, y, mask=None, saturate=False, out_sign=None, r=None)
Computes the subtraction on active elements of x with the corresponding elements of y.
The inactive elements of the result vector are determined by r.
The feature Flexible Width Vector is supported.
The feature Multiple Width Vector is supported.
    x:    1 2 3 4 5 6 7 8
    y:    1 1 1 1 2 2 2 2
    mask: T T T T F F T T
    z:    9 8 7 6 4 3 2 1
    out = S.vsub(x, y, mask, r=z)
    out:  0 1 2 3 4 3 5 6
Parameters
- x, y: Union[PrimExpr, int, float]
The operands. If either one is a scalar, it will be automatically broadcast.
- mask: Optional[Union[Tuple[bool], List[bool], numpy.ndarray[bool], str, PrimExpr]]
The predication mask to indicate which elements of the vector are active for the operation. None means all elements are active.
- saturate: Optional[bool]
Whether the result needs to be saturated or not.
- out_sign: Optional[str]
Specify whether the output sign is signed or unsigned. It is only needed for integer operation. None means same as operands, so the sign of operands must be the same, u means unsigned, s means signed.
- r: Optional[Union[PrimExpr, int, float]]
Provide the value of the inactive elements in result vector. If it is a scalar, it will be automatically broadcast. None means the inactive elements of result vector are undefined.
Returns
- ret: PrimExpr
The result expression.
Supported DType
“int8/16/32”, “uint8/16/32”, “float16/32”.
Examples
    vc = S.vsub(va, vb)
    vc = S.vsub(va, 3)
    vc = S.vsub(va, vb, saturate=True)
    vc = S.vsub(va, vb, out_sign="u")
    vc = S.vsub(va, vb, mask=S.tail_mask(n, 8))
    vc = S.vsub(va, vb, mask="3T5F", r=vb)
See Also
Zhouyi Compass OpenCL Programming Guide: __vsub, __vsubs
- tvm.aipu.script.ir.arithmetic.vsubh(x, y)
Subtracts within every pair of adjacent elements of vector x and of vector y, then concatenates the results: the differences from x are placed in the lower half of the result and the differences from y in the higher half.
The feature Multiple Width Vector is supported.
    x:   9 1 8 2 7 3 6 4
    y:   9 4 8 3 7 2 6 1
    out = S.vsubh(x, y)
    out: 8 6 4 2 5 5 5 5
Parameters
- x, y: Union[PrimExpr, int]
The operands. If either one is a scalar, it will be automatically broadcast.
Returns
- ret: PrimExpr
The result expression.
Supported DType
“int8/16/32”, “uint8/16/32”.
Examples
    vc = S.vsubh(va, vb)
    vc = S.vsubh(va, 3)
See Also
Zhouyi Compass OpenCL Programming Guide: __vsubh
- tvm.aipu.script.ir.arithmetic.vmul(x, y, mask=None, out_sign=None, r=None)
Computes the multiplication on active elements of x with the corresponding elements of y.
The inactive elements of the result vector are determined by r.
The feature Flexible Width Vector is supported.
The feature Multiple Width Vector is supported.
    x:    1 2 3  4 5 6  7  8
    y:    1 2 3  4 5 6  7  8
    mask: T T T  T F F  T  T
    z:    9 8 7  6 4 3  2  1
    out = S.vmul(x, y, mask, r=z)
    out:  1 4 9 16 4 3 49 64
Parameters
- x, y: Union[PrimExpr, int, float]
The operands. If either one is a scalar, it will be automatically broadcast.
- mask: Optional[Union[Tuple[bool], List[bool], numpy.ndarray[bool], str, PrimExpr]]
The predication mask to indicate which elements of the vector are active for the operation. None means all elements are active.
- out_sign: Optional[str]
Specify whether the output sign is signed or unsigned. It is only needed for integer operation. None means same as operands, so the sign of operands must be the same, u means unsigned, s means signed.
- r: Optional[Union[PrimExpr, int, float]]
Provide the value of the inactive elements in result vector. If it is a scalar, it will be automatically broadcast. None means the inactive elements of result vector are undefined.
Returns
- ret: PrimExpr
The result expression.
Supported DType
“int16/32”, “uint16/32”, “float16/32”.
Examples
    vc = S.vmul(va, vb)
    vc = S.vmul(va, 3)
    vc = S.vmul(va, vb, out_sign="u")
    vc = S.vmul(va, vb, mask="3T5F")
    vc = S.vmul(va, vb, mask=S.tail_mask(n, 8))
    vc = S.vmul(va, vb, mask="T7F", r=vb)
See Also
Zhouyi Compass OpenCL Programming Guide: __vmul, __vmulh
- tvm.aipu.script.ir.arithmetic.vmull(x, y, mask=None, out_sign=None, r=None)
Computes the multiplication on active elements of the low half part of x with the corresponding elements of y. Expands the element bit width: 8-bit -> 16-bit or 16-bit -> 32-bit.
The inactive elements of the result vector are determined by r.
    x(i16x16):  1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6
    y(i16x16):  1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6
    mask:       T T T T F F T T T T T T F F F F
    z(i32x8):   9 8 7 6 5 4 3 2
    out = S.vmull(x, y, mask, r=z)
    out(i32x8): 1 4 9 16 5 4 49 64
Parameters
- x, y: Union[PrimExpr, int]
The operands. If either one is a scalar, it will be automatically broadcast.
- mask: Optional[Union[Tuple[bool], List[bool], numpy.ndarray[bool], str, PrimExpr]]
The predication mask to indicate which elements of the vector are active for the operation. None means all elements are active.
- out_sign: Optional[str]
Specify whether the output sign is signed or unsigned. It is only needed for integer operation. None means same as operands, so the sign of operands must be the same, u means unsigned, s means signed.
- r: Optional[Union[PrimExpr, int, float]]
Provide the value of the inactive elements in result vector. If it is a scalar, it will be automatically broadcast. None means the inactive elements of result vector are undefined.
Returns
- ret: PrimExpr
The result expression.
Supported DType
“int8/16/32”, “uint8/16/32”.
Examples
    vc = S.vmull(va, vb)
    vc = S.vmull(va, 3)
    vc = S.vmull(va, vb, out_sign="u")
    vc = S.vmull(va, vb, mask="3T5F")
    vc = S.vmull(va, vb, mask=S.tail_mask(n, 8))
    vc = S.vmull(va, vb, mask="T7F", r=vb)
See Also
Zhouyi Compass OpenCL Programming Guide: __vmul
- tvm.aipu.script.ir.arithmetic.vmulh(x, y, mask=None, out_sign=None, r=None)
Computes the multiplication on active elements of the high half part of x with the corresponding elements of y. Expands the element bit width: 8-bit -> 16-bit or 16-bit -> 32-bit.
The inactive elements of the result vector are determined by r.
    x(i16x16):  1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6
    y(i16x16):  1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6
    mask:       T T T T F F T T T T T T F F F F
    z(i32x8):   9 8 7 6 5 4 3 2
    out = S.vmulh(x, y, mask, r=z)
    out(i32x8): 81 0 1 4 5 4 3 2
Parameters
- x, y: Union[PrimExpr, int]
The operands. If either one is a scalar, it will be automatically broadcast.
- mask: Optional[Union[Tuple[bool], List[bool], numpy.ndarray[bool], str, PrimExpr]]
The predication mask to indicate which elements of the vector are active for the operation. None means all elements are active.
- out_sign: Optional[str]
Specify whether the output sign is signed or unsigned. It is only needed for integer operation. None means same as operands, so the sign of operands must be the same, u means unsigned, s means signed.
- r: Optional[Union[PrimExpr, int, float]]
Provide the value of the inactive elements in result vector. If it is a scalar, it will be automatically broadcast. None means the inactive elements of result vector are undefined.
Returns
- ret: PrimExpr
The result expression.
Supported DType
“int8/16/32”, “uint8/16/32”.
Examples
    vc = S.vmulh(va, vb)
    vc = S.vmulh(va, 3)
    vc = S.vmulh(va, vb, out_sign="u")
    vc = S.vmulh(va, vb, mask="3T5F")
    vc = S.vmulh(va, vb, mask=S.tail_mask(n, 8))
    vc = S.vmulh(va, vb, mask="T7F", r=vb)
See Also
Zhouyi Compass OpenCL Programming Guide: __vmulh
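The half-selection performed by vmull and vmulh can be sketched in plain Python. The sketch assumes, based only on the two examples in this section, that the mask is indexed by input lane while r is indexed by output lane. The helper name widening_mul_ref is hypothetical, and the widening of the element type is not modeled because Python integers are unbounded.

```python
def widening_mul_ref(x, y, mask, r, half):
    """Hypothetical model of vmull (half="low") and vmulh (half="high")."""
    n = len(x) // 2
    start = 0 if half == "low" else n
    out = []
    for i in range(n):
        lane = start + i  # input lane inside the selected half
        out.append(x[lane] * y[lane] if mask[lane] else r[i])
    return out


x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6]
y = list(x)
mask = [c == "T" for c in "TTTTFFTTTTTTFFFF"]
z = [9, 8, 7, 6, 5, 4, 3, 2]
print(widening_mul_ref(x, y, mask, z, "low"))   # [1, 4, 9, 16, 5, 4, 49, 64]
print(widening_mul_ref(x, y, mask, z, "high"))  # [81, 0, 1, 4, 5, 4, 3, 2]
```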
- tvm.aipu.script.ir.arithmetic.vdiv(x, y, mask=None)
Computes the division on active elements of x with the corresponding elements of y.
The inactive elements of the result vector are undefined.
The feature Flexible Width Vector is supported.
The feature Multiple Width Vector is supported.
    x:    1 4 9 16 25 36 49 64
    y:    1 2 3  4  5  6  7  8
    mask: T T F  F  T  T  T  T
    out = S.vdiv(x, y, mask)
    out:  1 2 ?  ?  5  6  7  8
Parameters
- x, y: Union[PrimExpr, int, float]
The operands. If either one is a scalar, it will be automatically broadcast.
- mask: Optional[Union[Tuple[bool], List[bool], numpy.ndarray[bool], str, PrimExpr]]
The predication mask to indicate which elements of the vector are active for the operation. None means all elements are active.
Returns
- ret: PrimExpr
The result expression.
Supported DType
“int8/16/32”, “uint8/16/32”, “float32”.
Examples
    vc = S.vdiv(va, vb)
    vc = S.vdiv(va, 3)
    vc = S.vdiv(va, vb, mask="3T5F")
    vc = S.vdiv(va, vb, mask=S.tail_mask(n, 8))
See Also
Zhouyi Compass OpenCL Programming Guide: __vdiv
- tvm.aipu.script.ir.arithmetic.vmod(x, y, mask=None)
Computes the remainder on active elements of x with the corresponding elements of y.
The inactive elements of the result vector are undefined.
The feature Flexible Width Vector is supported.
The feature Multiple Width Vector is supported.
    x:     5  6 7 8 -5 -6 -7 -8
    y:    -2 -5 4 3  3  4 -3 -4
    mask:  T  T T F  F  T  T  T
    out = S.vmod(x, y, mask)
    out:   1  1 3 ?  ? -2 -1  0
Parameters
- x, y: Union[PrimExpr, int]
The operands. If either one is a scalar, it will be automatically broadcast.
- mask: Optional[Union[Tuple[bool], List[bool], numpy.ndarray[bool], str, PrimExpr]]
The predication mask to indicate which elements of the vector are active for the operation. None means all elements are active.
Returns
- ret: PrimExpr
The result expression.
Supported DType
“int8/16/32”, “uint8/16/32”.
Examples
    vc = S.vmod(va, vb)
    vc = S.vmod(va, 3)
    vc = S.vmod(va, vb, mask="3T5F")
    vc = S.vmod(va, vb, mask=S.tail_mask(n, 8))
See Also
Zhouyi Compass OpenCL Programming Guide: __vmod
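The output values in the vmod example are consistent with a truncated (C-style) remainder, where the result takes the sign of x, and not with Python's floored % operator. This is inferred from the example only. The sketch below, with the hypothetical helper name trunc_mod_ref, reproduces the active lanes of that example.

```python
def trunc_mod_ref(a, b):
    """Remainder whose sign follows the dividend (truncated division)."""
    magnitude = abs(a) % abs(b)
    return -magnitude if a < 0 else magnitude


# Active lanes of the vmod example: (x, y) pairs.
pairs = [(5, -2), (6, -5), (7, 4), (-6, 4), (-7, -3), (-8, -4)]
print([trunc_mod_ref(a, b) for a, b in pairs])  # [1, 1, 3, -2, -1, 0]
print(5 % -2)  # -1, Python's floored remainder differs from the documented 1
```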
- tvm.aipu.script.ir.arithmetic.vdot(x, y, mask=None)
Computes the dot product on every two adjacent elements of x with the corresponding elements of y. The elements of the result vector will be saturated.
The inactive elements of the result vector are undefined.
    x(i16x16):     x0 x1 x2 x3 x4 x5 x6 x7 ... x14 x15
    y(i16x16):     y0 y1 y2 y3 y4 y5 y6 y7 ... y14 y15
    mask(boolx16): F  F  T  F  F  T  T  T  ... F   T
    out = S.vdot(x, y, mask)
    out(i32x8):    ?  x2*y2  x5*y5  x6*y6+x7*y7  ...  x15*y15
Parameters
- x, y: Union[PrimExpr, int]
The operands. If either one is a scalar, it will be automatically broadcast.
- mask: Optional[Union[Tuple[bool], List[bool], numpy.ndarray[bool], str, PrimExpr]]
The predication mask to indicate which elements of the vector are active for the operation. None means all elements are active.
Returns
- ret: PrimExpr
The result expression.
Supported DType
    # Only integer cases are supported:
    case    result dtype    x.dtype     y.dtype
    1       "int16"         "int8"      "int8"
    2       "int16"         "int8"      "uint8"
    3       "int16"         "uint8"     "int8"
    4       "uint16"        "uint8"     "uint8"
    5       "int32"         "int16"     "int16"
    6       "int32"         "int16"     "uint16"
    7       "int32"         "uint16"    "int16"
    8       "uint32"        "uint16"    "uint16"
Examples
    out0 = S.vdot(x, y)
    out1 = S.vdot(x, 3, mask)
See Also
Zhouyi Compass OpenCL Programming Guide: __vdot
- tvm.aipu.script.ir.arithmetic.vqdot(x, y, mask=None)
Computes the dot product on every four adjacent elements of x with the corresponding elements of y. The elements of the result vector will be saturated.
The inactive elements of the result vector are undefined.
    x(i8x32):      x0 x1 x2 x3 x4 x5 x6 x7 ... x28 x29 x30 x31
    y(i8x32):      y0 y1 y2 y3 y4 y5 y6 y7 ... y28 y29 y30 y31
    mask(boolx32): T  F  F  T  T  T  T  T  ... F   F   F   T
    out = S.vqdot(x, y, mask)
    out(i32x8):    x0*y0+x3*y3  x4*y4+x5*y5+x6*y6+x7*y7  ...  x31*y31

    x(fp16x16):    x0 x1 x2 x3 x4 x5 x6 x7 ... x14 x15
    y(fp16x16):    y0 y1 y2 y3 y4 y5 y6 y7 ... y14 y15
    mask(boolx16): F  F  T  F  T  T  T  T  ... F   T
    out = S.vqdot(x, y, mask)
    out(fp32x8):   x2*y2  ?  x4*y4+x5*y5+x6*y6+x7*y7  ?  ...  ?
    # For float, the results are stored at even indices; the values at odd indices are undefined.
Parameters
- x, y: Union[PrimExpr, int, float]
The operands. If either one is a scalar, it will be automatically broadcast.
- mask: Optional[Union[Tuple[bool], List[bool], numpy.ndarray[bool], str, PrimExpr]]
The predication mask to indicate which elements of the vector are active for the operation. None means all elements are active.
Returns
- ret: PrimExpr
The result expression.
Supported DType
    Supported integer cases:
    case    result dtype    x.dtype     y.dtype
    1       "int32"         "int8"      "int8"
    2       "int32"         "int8"      "uint8"
    3       "int32"         "uint8"     "int8"
    4       "uint32"        "uint8"     "uint8"

    Supported floating cases:
    case    result dtype    x.dtype     y.dtype
    1       "float32"       "float16"   "float16"
Examples
    out0 = S.vqdot(x, y)
    out1 = S.vqdot(x, 3, mask)
See Also
Zhouyi Compass OpenCL Programming Guide: __vqdot
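The grouping used by vdot (groups of two) and by the integer case of vqdot (groups of four) can be modeled with one plain-Python helper. This is a sketch under assumptions: the name grouped_dot_ref is hypothetical, saturation is modeled as clamping to a signed result range, a group with no active lane is reported as None (shown as ? in the examples), and the even-index layout of the float case is not modeled.

```python
def grouped_dot_ref(x, y, mask, group, out_bits=32):
    """Hypothetical model of a masked dot product over adjacent groups."""
    lo, hi = -(1 << (out_bits - 1)), (1 << (out_bits - 1)) - 1
    out = []
    for start in range(0, len(x), group):
        lanes = [i for i in range(start, start + group) if mask[i]]
        if not lanes:
            out.append(None)  # no active lane in this group: undefined
            continue
        total = sum(x[i] * y[i] for i in lanes)
        out.append(max(lo, min(hi, total)))  # assumed signed saturation
    return out


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1, 1, 1, 1, 1, 1, 1, 1]
mask = [False, False, True, False, False, True, True, True]
print(grouped_dot_ref(x, y, mask, group=2))  # [None, 3, 6, 15]

# Two int16 minimum values per lane overflow int32 and are clamped.
big = [-32768, -32768]
print(grouped_dot_ref(big, big, [True, True], group=2))  # [2147483647]
```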
- tvm.aipu.script.ir.arithmetic.vdpa(acc, x, y, mask=None)
Performs a multiply-accumulate operation on every two adjacent elements of the inputs: the products of the active element pairs of x and y are added to the corresponding element of acc.
    acc(i32x8):    a0 a1 a2 ... a7
    x(i16x16):     x0 x1 x2 x3 x4 x5 ... x14 x15
    y(i16x16):     y0 y1 y2 y3 y4 y5 ... y14 y15
    mask(boolx16): F  F  T  T  T  F  ... F   T
    out = S.vdpa(acc, x, y, mask)
    out(i32x8):    a0  a1+x2*y2+x3*y3  a2+x4*y4  ...  a7+x15*y15
Parameters
- acc: PrimExpr
The accumulate register, should be initialized.
- x, y: Union[PrimExpr, int, float]
The operands. If either one is a scalar, it will be automatically broadcast.
- mask: Optional[Union[Tuple[bool], List[bool], numpy.ndarray[bool], str, PrimExpr]]
The predication mask to indicate which elements of the vector are active for the operation. None means all elements are active.
Returns
- ret: PrimExpr
The result expression.
Supported DType
    # Only integer cases are supported:
    case    acc.dtype    x.dtype     y.dtype
    1       "int16"      "int8"      "int8"
    2       "int16"      "int8"      "uint8"
    3       "int16"      "uint8"     "int8"
    4       "uint16"     "uint8"     "uint8"
    5       "int32"      "int16"     "int16"
    6       "int32"      "int16"     "uint16"
    7       "int32"      "uint16"    "int16"
    8       "uint32"     "uint16"    "uint16"
Examples
    acc = S.int32x8(0)
    out = S.vdpa(acc, x, y)

    acc = S.int32x8(0)
    out = S.vdpa(acc, x, y, mask)
See Also
Zhouyi Compass OpenCL Programming Guide: __vdpa
- tvm.aipu.script.ir.arithmetic.vqdpa(acc, x, y, mask=None)
Performs a multiply-accumulate operation on every four adjacent elements of the inputs: the products of the active elements of x and y in each group are added to the corresponding element of acc.
    acc(i32x8):    a0 a1 a2 ... a7
    x(i8x32):      x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 ... x28 x29 x30 x31
    y(i8x32):      y0 y1 y2 y3 y4 y5 y6 y7 y8 y9 y10 y11 ... y28 y29 y30 y31
    mask(boolx32): T  F  F  F  T  F  T  F  F  T  F   F   ... T   F   F   T
    out = S.vqdpa(acc, x, y, mask)
    out(i32x8):    a0+x0*y0  a1+x4*y4+x6*y6  a2+x9*y9  ...  a7+x28*y28+x31*y31

    acc(fp32x8):   a0 a1 a2 a3 ... a7
    x(fp16x16):    x0 x1 x2 x3 x4 x5 x6 x7 ... x14 x15
    y(fp16x16):    y0 y1 y2 y3 y4 y5 y6 y7 ... y14 y15
    mask(boolx16): T  F  F  T  T  T  T  T  ... F   T
    out = S.vqdpa(acc, x, y, mask)
    out(fp32x8):   a0+x0*y0+x3*y3  a1  a2+x4*y4+x5*y5+x6*y6+x7*y7  a3  ...  a7
    # For float, the results are stored at even indices; odd indices keep the value of "acc".
Parameters
- acc: PrimExpr
The accumulate register, should be initialized.
- x, y: Union[PrimExpr, int, float]
The operands. If either one is a scalar, it will be automatically broadcast.
- mask: Optional[Union[Tuple[bool], List[bool], numpy.ndarray[bool], str, PrimExpr]]
The predication mask to indicate which elements of the vector are active for the operation. None means all elements are active.
Returns
- ret: PrimExpr
The result expression.
Supported DType
    Supported integer cases:
    case    acc.dtype    x.dtype     y.dtype
    1       "int32"      "int8"      "int8"
    2       "int32"      "int8"      "uint8"
    3       "int32"      "uint8"     "int8"
    4       "uint32"     "uint8"     "uint8"

    Supported floating cases:
    case    acc.dtype    x.dtype     y.dtype
    1       "float32"    "float16"   "float16"
Examples
    acc = S.int32x8(0)
    out = S.vqdpa(acc, x, y)

    acc = S.int32x8(0)
    out = S.vqdpa(acc, x, y, mask)
See Also
Zhouyi Compass OpenCL Programming Guide: __vqdpa
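In the accumulating forms, vdpa (groups of two) and the integer case of vqdpa (groups of four), the examples show that a group with no active lane simply keeps its acc value. A plain-Python sketch of that rule follows. The helper name grouped_dpa_ref is hypothetical, overflow is not modeled, and the float layout of vqdpa (results at even indices) is left out.

```python
def grouped_dpa_ref(acc, x, y, mask, group):
    """Hypothetical model: add active products of each group to acc."""
    out = list(acc)
    for k in range(len(acc)):
        for i in range(k * group, (k + 1) * group):
            if mask[i]:
                out[k] += x[i] * y[i]
    return out


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2] * 8

# Groups of two, as in vdpa: lanes (0,1), (2,3), (4,5), (6,7).
mask2 = [False, False, True, True, True, False, False, True]
print(grouped_dpa_ref([0, 0, 0, 0], x, y, mask2, group=2))  # [0, 14, 10, 16]

# Groups of four, as in the integer case of vqdpa.
mask4 = [True, False, False, False, True, False, True, False]
print(grouped_dpa_ref([10, 20], x, y, mask4, group=4))  # [12, 44]
```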
- tvm.aipu.script.ir.arithmetic.vrpadd(x, mask=None)
Computes the reduction addition of all active elements of x, and places the result in the lowest element of the result vector.
The remaining upper elements of the result vector are undefined.
    x:    0  1 2 3 4 5 6 7
    mask: T  T T T T T F T
    out = S.vrpadd(x, mask)
    out:  22 ? ? ? ? ? ? ?
Parameters
- x: PrimExpr
The operand.
- mask: Optional[Union[Tuple[bool], List[bool], numpy.ndarray[bool], str, PrimExpr]]
The predication mask to indicate which elements of the vector are active for the operation. None means all elements are active.
Returns
- ret: PrimExpr
The result expression.
Supported DType
“int8/16/32”, “uint8/16/32”, “float16/32”.
Examples
    out = S.vrpadd(x)
    out = S.vrpadd(x, mask)
See Also
Zhouyi Compass OpenCL Programming Guide: __vrpadd
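A short plain-Python model of the vrpadd reduction is shown below. The helper name reduce_add_ref is hypothetical, and None stands in for the undefined upper elements.

```python
def reduce_add_ref(x, mask=None):
    """Hypothetical model: sum the active lanes into element 0."""
    mask = [True] * len(x) if mask is None else mask
    total = sum(value for value, active in zip(x, mask) if active)
    return [total] + [None] * (len(x) - 1)


x = [0, 1, 2, 3, 4, 5, 6, 7]
mask = [True, True, True, True, True, True, False, True]
print(reduce_add_ref(x, mask)[0])  # 22
```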
- tvm.aipu.script.ir.arithmetic.vmml(ptr, x, y)
Performs a mixed-precision 4x4 matrix multiply and addition for float16x16 vector x (row-major) and y (column-major). The result pointer ptr is the address of a float32x16 in row-major order. This behavior is the same as ptr[:?] = matrix_multiply(x, y).
                            y(fp16x16): y0   y4   y8   y12
                                        y1   y5   y9   y13
                                        y2   y6   y10  y14
                                        y3   y7   y11  y15

    x(fp16x16): x0   x1   x2   x3       a0   a1   a2   a3    :ptr(fp32x16)
                x4   x5   x6   x7       a4   a5   a6   a7
                x8   x9   x10  x11      a8   a9   a10  a11
                x12  x13  x14  x15      a12  a13  a14  a15

    S.vmml(ptr, x, y)

    # Detailed computation for each result element:
    a0  = x0*y0 + x1*y1 + x2*y2 + x3*y3
    a1  = x0*y4 + x1*y5 + x2*y6 + x3*y7
    ...
    a9  = x8*y4 + x9*y5 + x10*y6 + x11*y7
    ...
    a15 = x12*y12 + x13*y13 + x14*y14 + x15*y15
Parameters
- ptr: Pointer
The pointer to the memory where the result will be stored. It can be a scalar or vector float32 pointer; the memory it points to must be large enough to hold a 4x4 float32 matrix in row-major order.
- x: PrimExpr
The operand x with vector type float16x16, representing 4x4 fp16 elements in row-major order.
- y: PrimExpr
The operand y with vector type float16x16, representing 4x4 fp16 elements in column-major order.
Supported DType
“float16”.
Examples
    # The "vc_fp32_ptr" can be a scalar or vector float32 pointer, as long as the memory space
    # that it points to is large enough to store 4x4 float32 data.
    S.vmml(vc_fp32_ptr, va_fp16x16, vb_fp16x16)
See Also
Zhouyi Compass OpenCL Programming Guide: __vmml
- tvm.aipu.script.ir.arithmetic.vmma(acc_ptr, x, y)
Performs a mixed-precision 4x4 matrix multiply and addition for float16x16 vector x (row-major) and y (column-major). The result pointer acc_ptr is the address of a float32x16 in row-major order. This behavior is the same as acc_ptr[:?] += matrix_multiply(x, y).
                            y(fp16x16): y0   y4   y8   y12
                                        y1   y5   y9   y13
                                        y2   y6   y10  y14
                                        y3   y7   y11  y15

    x(fp16x16): x0   x1   x2   x3       a0   a1   a2   a3    :acc_ptr(fp32x16)
                x4   x5   x6   x7       a4   a5   a6   a7
                x8   x9   x10  x11      a8   a9   a10  a11
                x12  x13  x14  x15      a12  a13  a14  a15

    S.vmma(acc_ptr, x, y)

    # Detailed computation for each result element:
    a0  += x0*y0 + x1*y1 + x2*y2 + x3*y3
    a1  += x0*y4 + x1*y5 + x2*y6 + x3*y7
    ...
    a9  += x8*y4 + x9*y5 + x10*y6 + x11*y7
    ...
    a15 += x12*y12 + x13*y13 + x14*y14 + x15*y15
Parameters
- acc_ptr: Pointer
The pointer to the memory where the result will be accumulated and stored. It can be a scalar or vector float32 pointer; the memory it points to must be large enough to hold a 4x4 float32 matrix in row-major order.
- x: PrimExpr
The operand x with vector type float16x16, representing 4x4 fp16 elements in row-major order.
- y: PrimExpr
The operand y with vector type float16x16, representing 4x4 fp16 elements in column-major order.
Supported DType
“float16”.
Examples
    # The "vc_fp32_ptr" can be a scalar or vector float32 pointer, as long as the memory space
    # that it points to is large enough to store 4x4 float32 data.
    S.vmma(vc_fp32_ptr, va_fp16x16, vb_fp16x16)
See Also
Zhouyi Compass OpenCL Programming Guide: __vmma
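The index arithmetic behind vmml and vmma follows from the per-element formulas above: with x stored row-major and y stored column-major, result element a[4*i + j] sums x[4*i + k] * y[4*j + k] over k. A plain-Python sketch of that indexing is shown below. The helper name mma_4x4_ref is hypothetical, flat lists stand in for the vectors and the pointed-to memory, and the fp16/fp32 precision behavior is not modeled.

```python
def mma_4x4_ref(x, y, acc=None):
    """Hypothetical model: acc=None mimics vmml, a given acc mimics vmma."""
    out = [0.0] * 16 if acc is None else list(acc)
    for i in range(4):          # row of x and of the result
        for j in range(4):      # column of y and of the result
            out[4 * i + j] += sum(x[4 * i + k] * y[4 * j + k] for k in range(4))
    return out


# x is the identity matrix (row-major); y holds 0..15 in column-major order.
x = [1.0 if r == c else 0.0 for r in range(4) for c in range(4)]
y = [float(v) for v in range(16)]
result = mma_4x4_ref(x, y)
print(result[1], result[9])  # 4.0 6.0, i.e. a1 = y4 and a9 = y6
```

Multiplying by the identity returns y read in row-major order, which makes the column-major storage of y visible in the output.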
- tvm.aipu.script.ir.arithmetic.fma(acc, x, y, mask=None)
Performs a float multiply-add operation on every active element of the inputs.
The scalar situation where all of acc, x and y are scalar is also supported.
The feature Flexible Width Vector is supported.
The feature Multiple Width Vector is supported.
    acc:  a0 a1 a2 ... a7
    x:    x0 x1 x2 ... x7
    y:    y0 y1 y2 ... y7
    mask: F  T  T  ... T
    out = S.fma(acc, x, y, mask)
    out:  a0  a1+x1*y1  a2+x2*y2  ...  a7+x7*y7
Parameters
- acc: PrimExpr
The accumulate register, should be initialized.
- x, y: Union[PrimExpr, float]
The operands. In the vector situation, a scalar operand will be automatically broadcast.
- mask: Optional[Union[Tuple[bool], List[bool], numpy.ndarray[bool], str, PrimExpr]]
The predication mask to indicate which elements of the vector are active for the operation. None means all elements are active.
Returns
- ret: PrimExpr
The result expression.
Supported DType
“float32”.
Examples
    acc = S.float32x8(10)
    out = S.fma(acc, x, y, mask)
    scalar_out = S.fma(scalar_acc, scalar_x, scalar_y)
See Also
Zhouyi Compass OpenCL Programming Guide: __vfma, __fma
- tvm.aipu.script.ir.arithmetic.vfmae(acc, x, y, mask=None)
Performs a float multiply-add operation on the even-indexed active elements of the inputs.
    acc(fp32x8):  a0 a1 a2 ... a7
    x(fp16x16):   x0 x1 x2 x3 x4 x5 ... x14 x15
    y(fp16x16):   y0 y1 y2 y3 y4 y5 ... y14 y15
    mask(boolx8): F  T  T  ... T
    out = S.vfmae(acc, x, y, mask)
    out(fp32x8):  a0  a1+x2*y2  a2+x4*y4  ...  a7+x14*y14
Parameters
- acc: PrimExpr
The accumulate register, should be initialized.
- x, y: Union[PrimExpr, float]
The operands. If either one is a scalar, it will be automatically broadcast.
- mask: Optional[Union[Tuple[bool], List[bool], numpy.ndarray[bool], str, PrimExpr]]
The predication mask to indicate which elements of the vector are active for the operation. None means all elements are active.
Returns
- ret: PrimExpr
The result expression.
Supported DType
    # Only floating cases are supported:
    case    acc.dtype    x.dtype     y.dtype
    1       "float32"    "float16"   "float16"
Examples
    acc = S.float32x8(10)
    out = S.vfmae(acc, x, y)

    acc = S.float32x8(10)
    out = S.vfmae(acc, x, y, mask)
See Also
Zhouyi Compass OpenCL Programming Guide: __vfmae
- tvm.aipu.script.ir.arithmetic.vfmao(acc, x, y, mask=None)
Performs a float multiply-add operation on the odd-indexed active elements of the inputs.
    acc(fp32x8):  a0 a1 a2 ... a7
    x(fp16x16):   x0 x1 x2 x3 x4 x5 ... x14 x15
    y(fp16x16):   y0 y1 y2 y3 y4 y5 ... y14 y15
    mask(boolx8): F  T  T  ... T
    out = S.vfmao(acc, x, y, mask)
    out(fp32x8):  a0  a1+x3*y3  a2+x5*y5  ...  a7+x15*y15
Parameters
- acc: PrimExpr
The accumulate register, should be initialized.
- x, y: Union[PrimExpr, float]
The operands. If either one is a scalar, it will be automatically broadcast.
- mask: Optional[Union[Tuple[bool], List[bool], numpy.ndarray[bool], str, PrimExpr]]
The predication mask to indicate which elements of the vector are active for the operation. None means all elements are active.
Returns
- ret: PrimExpr
The result expression.
Supported DType
    # Only floating cases are supported:
    case    acc.dtype    x.dtype     y.dtype
    1       "float32"    "float16"   "float16"
Examples
    acc = S.float32x8(10)
    out = S.vfmao(acc, x, y)

    acc = S.float32x8(10)
    out = S.vfmao(acc, x, y, mask)
See Also
Zhouyi Compass OpenCL Programming Guide: __vfmao
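The lane selection of vfmae and vfmao can be sketched in plain Python. Following the two examples above, the mask has one entry per accumulator element, and element i of the result reads input lane 2*i (even) or 2*i + 1 (odd). The helper name fma_even_odd_ref is hypothetical, and floating-point precision is not modeled.

```python
def fma_even_odd_ref(acc, x, y, mask, odd=False):
    """Hypothetical model of vfmae (odd=False) and vfmao (odd=True)."""
    offset = 1 if odd else 0
    out = list(acc)
    for i in range(len(acc)):
        if mask[i]:
            lane = 2 * i + offset
            out[i] = acc[i] + x[lane] * y[lane]
    return out  # inactive elements keep the value of acc


acc = [10.0, 10.0, 10.0, 10.0]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.0] * 8
mask = [False, True, True, True]
print(fma_even_odd_ref(acc, x, y, mask))            # [10.0, 16.0, 20.0, 24.0]
print(fma_even_odd_ref(acc, x, y, mask, odd=True))  # [10.0, 18.0, 22.0, 26.0]
```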
- tvm.aipu.script.ir.arithmetic.vrint(x, mask=None)
Computes the rounding on active elements of x.
The inactive elements of the result vector are undefined.
The feature Flexible Width Vector is supported.
The feature Multiple Width Vector is supported.
    x:    -0.4 0.2 1.4 1.5 1.6 1.8 1.9 2.01
    mask:  T   T   T   T   T   F   T   T
    out = S.vrint(x, mask)
    out:  -0.0 0.0 1.0 2.0 2.0 ?   2.0 2.0
Parameters
- x: PrimExpr
The operand vector x.
- mask: Optional[Union[Tuple[bool], List[bool], numpy.ndarray[bool], str, PrimExpr]]
The predication mask to indicate which elements of the vector are active for the operation. None means all elements are active.
Returns
- ret: PrimExpr
The result expression.
Supported DType
“float16/32”.
Examples
    vc = S.vrint(va)
    vc = S.vrint(va, mask="3T5F")
    vc = S.vrint(va, mask=S.tail_mask(n, 8))
See Also
Zhouyi Compass OpenCL Programming Guide: __vrint
- tvm.aipu.script.ir.arithmetic.clip(x, min_val, max_val, mask=None)
Clips active elements of x with the corresponding elements of min_val and max_val.
The scalar situation where all of x, min_val, max_val are scalar is also supported.
The inactive elements of the result vector are set to the corresponding elements of x.
The feature Flexible Width Vector is supported.
The feature Multiple Width Vector is supported.
    x:       1 3 4 9 4 4 8 8
    min_val: 3 3 3 3 5 5 5 5
    max_val: 8 8 8 8 7 7 7 7
    mask:    T T T T F T F T
    out = S.clip(x, min_val, max_val, mask)
    out:     3 3 4 8 4 5 8 7
Parameters
- x, min_val, max_val: Union[PrimExpr, int, float]
The operands. If any of them is a scalar, it will be automatically broadcast. Note that min_val must be less than max_val.
- mask: Optional[Union[Tuple[bool], List[bool], numpy.ndarray[bool], str, PrimExpr]]
The predication mask to indicate which elements of the vector are active for the operation. None means all elements are active.
Returns
- ret: PrimExpr
The result expression.
Supported DType
“int8/16/32”, “uint8/16/32”, “float16/32”.
Examples
    b = S.clip(a, -10, 10)
    vc = S.clip(va, vb, vc)
    vc = S.clip(va, 3, 30)
    vc = S.clip(va, vb, vc, mask="3T5F")
    vc = S.clip(va, vb, vc, mask=S.tail_mask(n, 8))
See Also
Zhouyi Compass OpenCL Programming Guide: __vmax, __vmin, __vsel, __vclt, __vcgt
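The clip rules, including the pass-through of inactive elements, can be modeled in plain Python. The helper name clip_ref is hypothetical, vectors are plain lists, and only min_val and max_val are broadcast in this sketch.

```python
def clip_ref(x, min_val, max_val, mask=None):
    """Hypothetical model: clamp active lanes, pass inactive lanes through."""
    n = len(x)
    lo = min_val if isinstance(min_val, list) else [min_val] * n
    hi = max_val if isinstance(max_val, list) else [max_val] * n
    mask = [True] * n if mask is None else mask
    return [
        min(max(x[i], lo[i]), hi[i]) if mask[i] else x[i]
        for i in range(n)
    ]


x = [1, 3, 4, 9, 4, 4, 8, 8]
min_val = [3, 3, 3, 3, 5, 5, 5, 5]
max_val = [8, 8, 8, 8, 7, 7, 7, 7]
mask = [True, True, True, True, False, True, False, True]
print(clip_ref(x, min_val, max_val, mask))  # [3, 3, 4, 8, 4, 5, 8, 7]
```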