(a) On MMLU-Pro (4k context length), Kimi Linear achieves 51.0 performance with similar speed as full attention. On RULER (128k context length), it shows Pareto-optimal (84.3), performance and a 3.98x ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results