About 50 results
  1. FlashMLA/README.md at main · deepseek-ai/FlashMLA · GitHub

    Sep 29, 2025 · FlashMLA: Efficient Multi-head Latent Attention Kernels - FlashMLA/README.md at main · deepseek-ai/FlashMLA

  2. Projects · FlashMLA · GitHub

    GitHub is where people build software. More than 150 million people use GitHub to discover, fork, and contribute to over 420 million projects.

  3. Community Standards · GitHub

    FlashMLA: Efficient Multi-head Latent Attention Kernels - Community Standards · deepseek-ai/FlashMLA

  4. dual gemm · Issue #169 · deepseek-ai/FlashMLA - GitHub

    Hi, I'm studying the source code of FlashMLA. I noticed that in the sparse decode head64 implementation, it uses a "dual gemm" to compute P=QK^T. I have a few questions about this design. if constexpr (MO...

  5. GitHub · Where software is built

    FlashMLA: Efficient Multi-head Latent Attention Kernels - deepseek-ai/FlashMLA

  6. Use `pyproject.toml` · Issue #172 · deepseek-ai/FlashMLA - GitHub

    Mar 27, 2026 · Can you please move to pyproject.toml instead of setup.py? Then declare torch as a build dependency

  7. support for rtx 6000 sm120 · Issue #124 · deepseek-ai/FlashMLA - GitHub

    Dec 3, 2025 · Can I compile this for RTX 6000 sm120? I saw someone made a fork for Windows: https://github.com/IISuperluminaLII/FlashMLA_Windows_Linux_sm120

  8. Security Overview · deepseek-ai/FlashMLA · GitHub


  9. Releases · deepseek-ai/FlashMLA - GitHub

    FlashMLA: Efficient Multi-head Latent Attention Kernels - deepseek-ai/FlashMLA

  10. GitHub - deepseek-ai/FlashMLA: FlashMLA: Efficient Multi-head Latent ...

    Sep 29, 2025 · FlashMLA is DeepSeek's library of optimized attention kernels, powering the DeepSeek-V3 and DeepSeek-V3.2-Exp models. This repository contains the following implementations: Sparse …
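Result 6 asks the project to move from setup.py to pyproject.toml and declare torch as a build dependency. Under PEP 518, that request amounts to a `[build-system]` table like the minimal sketch below; the exact backend and version pins are assumptions, not FlashMLA's actual configuration.

```toml
[build-system]
# torch is listed here so the build frontend installs it before
# running the build backend (needed when setup.py imports torch
# to compile CUDA extensions).
requires = ["setuptools", "torch"]
build-backend = "setuptools.build_meta"
```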
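Result 4 above asks about the "dual gemm" used to compute P=QK^T in the sparse decode head64 path. Without seeing the kernel itself, one common interpretation of a dual GEMM is splitting the head dimension in half and accumulating two smaller matrix multiplies; the NumPy sketch below illustrates that decomposition only, and is not FlashMLA's actual implementation.

```python
import numpy as np

def dual_gemm_scores(Q: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Illustrative "dual GEMM": compute P = Q @ K.T as the sum of two
    half-width GEMMs over the head dimension. Mathematically,
    Q K^T = Q[:, :d/2] K[:, :d/2]^T + Q[:, d/2:] K[:, d/2:]^T.
    """
    d = Q.shape[1]
    half = d // 2
    # First GEMM over the low half of the head dimension...
    P = Q[:, :half] @ K[:, :half].T
    # ...second GEMM over the high half, accumulated into P.
    P += Q[:, half:] @ K[:, half:].T
    return P

# Sanity check against the single full-width GEMM.
rng = np.random.default_rng(0)
Q = rng.standard_normal((8, 64)).astype(np.float32)
K = rng.standard_normal((16, 64)).astype(np.float32)
assert np.allclose(dual_gemm_scores(Q, K), Q @ K.T, atol=1e-4)
```

The real kernel's motivation (e.g. overlapping the two GEMMs with other work, or fitting tensor-core tile shapes) is what the linked issue is asking about; the split-and-accumulate identity above is just the algebra that makes such a decomposition valid.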