Abstract: In remote sensing (RS), convolutional neural networks (CNNs) are well recognized for their spatial–spectral feature extraction capabilities, whereas vision transformers (ViTs), which ...