This paper addresses the transferability of zero-shot captioning to out-of-domain images. As shown in this image, we demonstrate the susceptibility of pre-trained vision-language models and large ...