
FlashMLA/README.md at main · deepseek-ai/FlashMLA · GitHub
Sep 29, 2025 · FlashMLA: Efficient Multi-head Latent Attention Kernels - FlashMLA/README.md at main · deepseek-ai/FlashMLA
dual gemm · Issue #169 · deepseek-ai/FlashMLA - GitHub
Hi, I'm studying the source code of FlashMLA. I noticed that in the sparse decode head64 implementation, it uses "dual gemm" to compute P = QK^T. I have a few questions about this design. if constexpr (MO...
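The issue above does not explain what "dual gemm" means, so here is a hedged NumPy sketch of one plausible reading: in MLA, the query/key head dimension is split into a "nope" (no positional encoding) part and a "rope" part, and P = QK^T can be accumulated from two GEMMs over the two halves. All dimension names and sizes below are illustrative assumptions, not FlashMLA's actual parameters.

```python
import numpy as np

# Hypothetical dimensions: d_nope/d_rope split of the head dim is an
# assumption about what "dual gemm" refers to, not the kernel's real config.
d_nope, d_rope, n_heads_q, seq_k = 64, 32, 8, 128

rng = np.random.default_rng(0)
q_nope = rng.standard_normal((n_heads_q, d_nope))
q_rope = rng.standard_normal((n_heads_q, d_rope))
k_nope = rng.standard_normal((seq_k, d_nope))
k_rope = rng.standard_normal((seq_k, d_rope))

# One GEMM over the concatenated head dimension...
p_single = (np.concatenate([q_nope, q_rope], axis=1)
            @ np.concatenate([k_nope, k_rope], axis=1).T)

# ...is mathematically identical to two GEMMs accumulated into the same
# result ("dual gemm"): P = Q_nope K_nope^T + Q_rope K_rope^T.
p_dual = q_nope @ k_nope.T + q_rope @ k_rope.T

assert np.allclose(p_single, p_dual)
```

Splitting the product this way lets a kernel keep the two operand halves in different layouts or memory locations while still producing the same P tile.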
Use `pyproject.toml` · Issue #172 · deepseek-ai/FlashMLA - GitHub
Mar 27, 2026 · Can you please move to pyproject.toml instead of setup.py, and declare torch as a build dependency?
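For context on what the issue is asking, here is a minimal hypothetical `pyproject.toml` for a torch CUDA-extension project; the package name, versions, and metadata are illustrative assumptions, not FlashMLA's actual packaging.

```toml
# Hypothetical sketch - not FlashMLA's real metadata.
[build-system]
# Declaring torch here (PEP 518) makes it importable at build time,
# so imports of torch.utils.cpp_extension during the build succeed.
requires = ["setuptools>=64", "torch"]
build-backend = "setuptools.build_meta"

[project]
name = "flash-mla"
version = "0.0.0"
requires-python = ">=3.8"
dependencies = ["torch"]
```

One known pitfall with this approach: build isolation may install a different torch wheel than the one in the runtime environment, which is why such extensions are often built with `pip install --no-build-isolation`.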
support for rtx 6000 sm120 · Issue #124 · deepseek-ai/FlashMLA - GitHub
Dec 3, 2025 · Can I compile this for the RTX 6000 (sm120)? I saw someone made a fork for Windows: https://github.com/IISuperluminaLII/FlashMLA_Windows_Linux_sm120
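The snippet does not say whether the kernels support sm120, but as a hedged sketch, the generic way to request a specific compute capability when building a PyTorch C++/CUDA extension is the `TORCH_CUDA_ARCH_LIST` environment variable; setting it does not by itself guarantee the kernels compile or run on that architecture.

```shell
# Generic PyTorch-extension build fragment (assumption: the project builds
# via torch.utils.cpp_extension). Targets compute capability 12.0 (sm120).
export TORCH_CUDA_ARCH_LIST="12.0"
pip install --no-build-isolation -v .
```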
Releases · deepseek-ai/FlashMLA - GitHub
FlashMLA: Efficient Multi-head Latent Attention Kernels - deepseek-ai/FlashMLA
GitHub - deepseek-ai/FlashMLA: FlashMLA: Efficient Multi-head Latent ...
Sep 29, 2025 · FlashMLA is DeepSeek's library of optimized attention kernels, powering the DeepSeek-V3 and DeepSeek-V3.2-Exp models. This repository contains the following implementations: Sparse …