Abstract: This paper introduces a groundbreaking enhancement to image captioning through a unique approach that harnesses the combined power of the Vision Encoder-Decoder model. By leveraging the Swin ...
Abstract: Transformers are widely used in natural language processing and computer vision, and Bidirectional Encoder Representations from Transformers (BERT) is one of the most popular pre-trained ...
Abstract: Considering the impact of operation and maintenance costs and technology, there is generally a lack of sufficient meteorological observation devices within the distributed photovoltaic (PV) ...
Abstract: Aspect Sentiment Triplet Extraction (ASTE) is an essential task in fine-grained opinion mining and sentiment analysis that involves extracting triplets consisting of aspect terms, opinion ...
Abstract: Recent DEtection TRansformer-based (DETR) models have obtained remarkable performance. Their success cannot be achieved without the re-introduction of multi-scale feature fusion in the encoder ...
Abstract: Traditional proportional-integral-derivative (PID) control falls short for precise control of DC motor speed under changing conditions. This paper presents a novel FPGA-based IP (intellectual ...
Abstract: The ionosphere is vital for satellite navigation and radio communication, but observational limitations necessitate ionospheric forecasting. The least squares collocation (LSC) method is ...
Abstract: Change detection is a critical task in earth observation applications. Recently, deep-learning-based methods have shown promising performance and have been quickly adopted in change detection.
Abstract: Open-vocabulary semantic segmentation aims to partition an image into distinct semantic regions based on an open set of categories. Existing approaches primarily rely on image-level ...
Abstract: The self-attention (SA) network revisits the essence of data and has achieved remarkable results in text processing and image analysis. SA is conceptualized as a set operator that is ...
Abstract: This article presents a new deep-learning architecture based on an encoder-decoder framework that retains contrast while performing background subtraction (BS) on thermal videos. The ...
Abstract: We introduce Wav2Seq, the first self-supervised approach to pre-train both parts of encoder-decoder models for speech data. We induce a pseudo language as a compact discrete representation, ...