Abstract: Visual grounding, i.e., localizing objects in images according to natural language queries, is an important topic in visual language understanding. The most effective approaches for this ...
Abstract: To perform household tasks, assistive robots receive commands in the form of user language instructions for tool manipulation. The initial stage involves selecting the intended tool (i.e., ...