vllm-npu-plugin/vllm_npu at 1baa36026c34693fd4478efa8024594d455841b6

zzh/vllm-npu-plugin

mirror of https://github.com/handsomezhuzhu/vllm-npu-plugin.git synced 2026-04-18 22:32:53 +00:00

handsomezhuzhu e22617f72e feat: Add Ascend NPU attention backend for vLLM using FlashAttention operators.

2026-02-10 22:15:26 +08:00

feat: Add Ascend NPU attention backend for vLLM using FlashAttention operators.

2026-02-10 22:15:26 +08:00

feat: initial vllm-npu-plugin for Ascend NPU adaptation

2026-02-10 11:06:01 +08:00

feat: Add Ascend NPU attention backend with NPU-specific FlashAttention, LayerNorm, and Rotary Embedding implementations.

2026-02-10 21:56:45 +08:00

fix: add initialize_cache method to NPU worker

2026-02-10 19:42:32 +08:00

__init__.py

feat: Implement the NPU platform plugin for vLLM, including platform registration, device management, custom operations, and configuration adaptation.

2026-02-10 22:05:06 +08:00

cuda_compat.py

feat: add CUDA-to-NPU monkey patches for GPUModelRunner compatibility

2026-02-10 19:09:14 +08:00

platform.py

feat: Implement the NPU platform plugin for vLLM, including platform registration, device management, custom operations, and configuration adaptation.

2026-02-10 22:05:06 +08:00