RoPE (Rotary Position Embedding)

Category: Positional Encoding | Complexity: O(S*D) elementwise | Memory: 2 passes (read+write, plus cos/sin tables)

Algorithm

RoPE (Su et al. 2021) encodes position by rotating pairs of dimensions at frequency-dependent rates:

For each pair (x[2i], x[2i+1]), i = 0 .. d/2 - 1, where d is the head dimension and pos is the token position:
  theta = pos / 10000^(2i/d)
  x'[2i]   = x[2i]*cos(theta) - x[2i+1]*sin(theta)
  x'[2i+1] = x[2i]*sin(theta) + x[2i+1]*cos(theta)

Used in many modern LLMs (LLaMA, Mistral, GPT-NeoX, Qwen, etc.) to encode token position in the Q/K vectors. RoPE is bandwidth-bound for short sequences and compute-bound (on the cos/sin evaluation) for long sequences.
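As a point of reference, the pair rotation above can be sketched on the CPU in plain Rust. This is a minimal scalar sketch, not the ascend-rs kernel: the function name `rope_in_place` and the explicit `base` parameter are assumptions made for illustration.

```rust
/// Rotate each (even, odd) pair of `x` in place for a token at position `pos`.
/// `x.len()` is the head dimension d and must be even.
/// `base` is the frequency base (10000 in the formula above).
pub fn rope_in_place(x: &mut [f32], pos: usize, base: f32) {
    let d = x.len();
    assert!(d % 2 == 0, "head dimension must be even");
    for i in 0..d / 2 {
        // theta = pos / base^(2i/d), written as pos * base^(-2i/d)
        let inv_freq = base.powf(-((2 * i) as f32) / d as f32);
        let theta = pos as f32 * inv_freq;
        let (sin, cos) = theta.sin_cos();
        // Read both inputs before writing either output.
        let (a, b) = (x[2 * i], x[2 * i + 1]);
        x[2 * i] = a * cos - b * sin;
        x[2 * i + 1] = a * sin + b * cos;
    }
}
```

Because each pair undergoes a pure 2D rotation, position 0 leaves the vector unchanged and every position preserves the vector's norm, which makes both properties convenient sanity checks for a kernel implementation.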

ascend-rs Kernel Source

RoPE using the tile API — safe entry form:

use ascend_std::tile::{GmView, GmViewMut, safe, tile_load_view_f32, tile_store_view_f32};

#[ascend_std::aiv_kernel]
pub fn tile_rope(
    input:  GmView<'_, 1, 128, f32>,
    output: GmViewMut<'_, 1, 128, f32>,
) {
    let x = tile_load_view_f32(&input);
    let y = safe::tile_rope_f32(x, 0);  // base position = 0
    tile_store_view_f32(&output, y);
}

The kernel body is pure safe Rust — shape (rows, cols, dtype) is committed at the type level via const generics, so any host-side mismatch becomes a compile-time error. The #[aiv_kernel] attribute rewrites the emitted signature back to raw *const f32 / *mut f32 so the launcher toolchain sees the same C ABI; #[repr(transparent)] on GmView/GmViewMut makes this rewrite free at the LLVM IR level.
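The idea of committing a shape at the type level can be illustrated with a small stand-alone sketch. The `View` type below is hypothetical and is not the ascend_std `GmView`; it only shows, under that assumption, how const generics pin the shape and how `#[repr(transparent)]` keeps the wrapper the same size as a raw pointer.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for a typed global-memory view (NOT the ascend_std type).
/// R and C are part of the type, so a 1x64 view and a 1x128 view are distinct types.
#[repr(transparent)]
pub struct View<'a, const R: usize, const C: usize, T> {
    ptr: *const T,
    _life: PhantomData<&'a T>, // zero-sized; ties the view to the borrowed data
}

impl<'a, const R: usize, const C: usize, T> View<'a, R, C, T> {
    /// Wrap a slice; returns None unless it holds exactly R * C elements.
    pub fn from_slice(data: &'a [T]) -> Option<Self> {
        (data.len() == R * C).then(|| View {
            ptr: data.as_ptr(),
            _life: PhantomData,
        })
    }

    /// The shape is recovered from the type, not from runtime state.
    pub const fn shape(&self) -> (usize, usize) {
        (R, C)
    }

    pub fn as_ptr(&self) -> *const T {
        self.ptr
    }
}

/// Accepts only 1x128 f32 views. Passing a `View<'_, 1, 64, f32>` here
/// is rejected by the compiler rather than failing at run time.
pub fn elements_in_row_128(v: &View<'_, 1, 128, f32>) -> usize {
    let (r, c) = v.shape();
    r * c
}
```

In this sketch the only runtime check is the slice length in `from_slice`; once a view exists, every function that consumes it gets its shape from the type signature.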

Backend status (lowered by rustc_codegen_mlir): Cambricon BANG, Intel Gaudi, Apple Metal, Vulkan SPIR-V (4/9). Ascend AIV / CUDA / AWS NKI / AMD AIE / Google TPU lowerings are TODO — on those backends RoPE is currently expressed as a buffer-API composition of element-wise cos/sin/mul/add rather than a single fused tile op.
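To make the unfused fallback concrete, here is a rough CPU sketch of RoPE written as a chain of element-wise cos/sin/mul/add passes over de-interleaved buffers. The helper names (`ew_mul`, `ew_add`, `rope_composed`) are hypothetical and do not correspond to the ascend-rs buffer API; the sketch only shows the shape of the composition.

```rust
/// Element-wise product of two equal-length buffers.
fn ew_mul(a: &[f32], b: &[f32]) -> Vec<f32> {
    a.iter().zip(b).map(|(x, y)| x * y).collect()
}

/// Element-wise sum of two equal-length buffers.
fn ew_add(a: &[f32], b: &[f32]) -> Vec<f32> {
    a.iter().zip(b).map(|(x, y)| x + y).collect()
}

/// RoPE for one token at position `pos`, built only from cos, sin, mul and add.
pub fn rope_composed(x: &[f32], pos: usize, base: f32) -> Vec<f32> {
    let d = x.len();
    assert!(d % 2 == 0, "head dimension must be even");
    let half = d / 2;

    // Split the interleaved pairs into an even buffer and an odd buffer.
    let even: Vec<f32> = x.iter().step_by(2).copied().collect();
    let odd: Vec<f32> = x.iter().skip(1).step_by(2).copied().collect();

    // Per-pair angles, then the cos/sin tables.
    let theta: Vec<f32> = (0..half)
        .map(|i| pos as f32 * base.powf(-((2 * i) as f32) / d as f32))
        .collect();
    let cos: Vec<f32> = theta.iter().map(|t| t.cos()).collect();
    let sin: Vec<f32> = theta.iter().map(|t| t.sin()).collect();
    let neg_sin = ew_mul(&sin, &vec![-1.0; half]);

    // x'[2i] = even*cos + odd*(-sin);  x'[2i+1] = even*sin + odd*cos
    let out_even = ew_add(&ew_mul(&even, &cos), &ew_mul(&odd, &neg_sin));
    let out_odd = ew_add(&ew_mul(&even, &sin), &ew_mul(&odd, &cos));

    // Re-interleave into the original layout.
    out_even
        .iter()
        .zip(&out_odd)
        .flat_map(|(e, o)| [*e, *o])
        .collect()
}
```

Each helper call reads and writes a full buffer, so the composition makes several passes over memory where a fused tile op would make one read and one write. That extra traffic is the cost of not having a single fused op.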

Benchmark configurations

| Shape (B, S, D) | Elements | Bytes (f32) | Notes |
|---|---|---|---|
| (1, 64, 128) | 8K | 32 KB | Single query, short context |
| (32, 64, 128) | 262K | 1 MB | Batched queries |
| (1, 128, 128) | 16K | 64 KB | Longer head dim |
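The element and byte counts in the table follow directly from the shape: elements = B * S * D and bytes = 4 * elements for f32. A small sketch of that arithmetic, with hypothetical helper names, assuming K means 1024 elements and KB/MB are binary units:

```rust
/// Number of elements in a (B, S, D) tensor.
pub fn elements(b: usize, s: usize, d: usize) -> usize {
    b * s * d
}

/// Size in bytes of a (B, S, D) tensor of f32 values (4 bytes each).
pub fn bytes_f32(b: usize, s: usize, d: usize) -> usize {
    elements(b, s, d) * std::mem::size_of::<f32>()
}
```

For example, `bytes_f32(32, 64, 128)` evaluates to 1,048,576 bytes, which is the 1 MB listed for the batched configuration.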

Results

See Leaderboard filtered to RoPE for the full filterable view.