Linear Transformers are Secretly Fast Weight Programmers (ICML 2021); Going Beyond Linear Transformers with Recurrent Fast Weight Programmers... While we only used the CUDA implementation for all our ...