Abstract: As the hardware in modern supercomputers becomes increasingly heterogeneous, there is a growing need for parallel code that is both portable and performant across different hardware architectures.