zzh/vllm-npu-plugin
Mirror of https://github.com/handsomezhuzhu/vllm-npu-plugin.git (synced 2026-02-20 19:50:15 +00:00)
Path: vllm-npu-plugin/vllm_npu/attention
Commit: 810a2ef7572a6cbb0d24557aa5316923469c7e6c

Latest commit (handsomezhuzhu, 810a2ef757, 2026-02-10 20:06:52 +08:00):
refactor: align attention with Huawei vllm-ascend - reshape_and_cache with kv_cache[0]/[1], _get_fia_params, npu_fused_infer_attention_score for chunked prefill, add actual_seq_lengths_q
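The commit message names two concrete changes. The first is the kv-cache write path: the fused cache tensor is indexed as kv_cache[0]/kv_cache[1] (key and value planes) before the Ascend reshape_and_cache kernel scatters the new tokens into it. Below is a minimal sketch of that pattern, assuming vLLM-style tensor shapes and a slot_mapping tensor; the function name and shapes are illustrative assumptions, not code from this repository:

import torch
import torch_npu  # Ascend NPU extension; needs an NPU runtime to execute


def write_kv_cache(
    key: torch.Tensor,           # [num_tokens, num_kv_heads, head_size]
    value: torch.Tensor,         # [num_tokens, num_kv_heads, head_size]
    kv_cache: torch.Tensor,      # [2, num_blocks, block_size, num_kv_heads, head_size]
    slot_mapping: torch.Tensor,  # [num_tokens], flat cache slot per token
) -> None:
    # Split the fused cache into key/value planes, as the commit describes.
    key_cache, value_cache = kv_cache[0], kv_cache[1]
    # Ascend analogue of vLLM's reshape_and_cache kernel: scatters this
    # step's K/V rows into the paged cache at the given slot indices.
    torch_npu._npu_reshape_and_cache(
        key=key,
        value=value,
        key_cache=key_cache,
        value_cache=value_cache,
        slot_indices=slot_mapping,
    )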
Files:

__init__.py
  feat: initial vllm-npu-plugin for Ascend NPU adaptation (2026-02-10 11:06:01 +08:00)
attention_v1.py
  refactor: align attention with Huawei vllm-ascend (full commit message above; see the chunked-prefill sketch after this listing) (2026-02-10 20:06:52 +08:00)
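The second change is the chunked-prefill path: a _get_fia_params helper assembles the arguments for torch_npu.npu_fused_infer_attention_score, including the cumulative per-sequence query lengths (the actual_seq_lengths_q the commit adds). The sketch below is modeled on how vllm-ascend drives this operator; the metadata fields on meta and the exact keyword set are assumptions, not this plugin's actual code:

import torch_npu  # Ascend NPU extension; needs an NPU runtime to execute


def _get_fia_params(meta, num_heads: int, num_kv_heads: int, scale: float) -> dict:
    # `meta` stands in for the plugin's attention metadata object; the
    # field names used here are assumptions modeled on vllm-ascend.
    return dict(
        num_heads=num_heads,
        num_key_value_heads=num_kv_heads,
        input_layout="TND",          # packed tokens with per-sequence lengths
        scale=scale,
        atten_mask=meta.attn_mask,   # causal mask covering the prefill chunk
        sparse_mode=3,               # right-aligned causal masking
        # Cumulative query lengths per sequence (the added actual_seq_lengths_q)
        # plus the total kv lengths already resident in the paged cache.
        actual_seq_lengths=meta.actual_seq_lengths_q,
        actual_seq_lengths_kv=meta.seq_lens,
        block_table=meta.block_tables,
        block_size=meta.block_size,
    )


def chunked_prefill_attention(query, key_cache, value_cache, meta,
                              num_heads, num_kv_heads, scale):
    # npu_fused_infer_attention_score returns (attention_out, softmax_lse);
    # only the attention output is used here.
    attn_out, _ = torch_npu.npu_fused_infer_attention_score(
        query, key_cache, value_cache,
        **_get_fia_params(meta, num_heads, num_kv_heads, scale),
    )
    return attn_out

Reading the paged key/value caches directly (via block_table/block_size) is what lets one fused kernel cover chunked prefill, where a request's earlier tokens already sit in cache while the current chunk's queries attend to both old and new context.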