Please login to be able to save your searches and receive alerts for new content matching your search criteria.
Text-to-image generation aims to generate images from text descriptions. Its main challenge lies in two aspects: (1) Semantic consistency, i.e., the generated images should be semantically consistent with the input text; (2) Visual reality, i.e., the ...
In this paper, we have defined a novel task of affective feedback synthesis that deals with generating feedback for input text and corresponding image in a similar way as humans respond towards the multimodal data. A feedback synthesis system has been ...
Image-based 3D model retrieval aims at organizing unlabeled 3D models according to the relevance to the labeled 2D images. With easy accessibility of 2D images and wide applications of 3D models, image-based 3D model retrieval attracts more and more ...
Camouflaged object detection (COD) is important as it has various potential applications. Unlike salient object detection (SOD), which tries to identify visually salient objects, COD tries to detect objects that are visually very similar to the ...
Video captioning aims to automatically describe a video clip with informative sentences. At present, deep learning-based models have become the mainstream for this task and achieved competitive results on public datasets. Usually, these methods leverage ...
Person re-identification (ReID) plays an important role in applications such as public security and video surveillance. Recently, learning from synthetic data [9], which benefits from the popularity of synthetic data engine, has attracted great attention ...
Image-text retrieval aims to take the text (image) query to retrieve the semantically relevant images (texts), which is fundamental and critical in the search system, online shopping, and social network. Existing works have shown the effectiveness of ...
The 2D virtual try-on task aims to transfer a target clothing image to the corresponding region of a person image. Although an extensive amount of research has been conducted due to its immense applications, this task still remains a great challenge to ...
A bottom-up and top-down attention mechanism has led to the revolutionizing of image captioning techniques, which enables object-level attention for multi-step reasoning over all the detected objects. However, when humans describe an image, they often ...
Moving object detection is still a challenging task in complex scenes. The existing methods based on deep learning mainly use U-Nets and have achieved amazing results. However, they ignore the local continuity between pixels. In order to solve this ...
We study the high dynamic range (HDR) imaging problem in dual-lens systems. Existing methods usually treat the HDR imaging problem as an image fusion problem and the HDR result is estimated by fusing the aligned short exposure image and long exposure ...
The demand and usage of 360° video services are expected to increase. However, despite these services being highly bandwidth intensive, not much is known about the potential value that basic bandwidth saving techniques such as server or edge-network on-...
Deep Neural Networks (DNNs) are widely used for computer vision tasks. However, it has been shown that deep models are vulnerable to adversarial attacks, i.e., their performances drop when imperceptible perturbations are made to the original inputs, which ...
As an important and challenging problem, image generation with limited data aims at generating realistic images through training a GAN model given few samples. A typical solution is to transfer a well-trained GAN model from a data-rich source domain to ...
As a crucial part of natural language processing, event-centered commonsense inference task has attracted increasing attention. With a given observed event, the intention and reaction of the people involved in the event are required to be inferred with ...
Single-label facial expression recognition (FER), which aims to classify single expression for facial images, usually suffers from the label noisy and incomplete problem, where manual annotations for partial training images exist wrong or incomplete ...
Most state-of-the-art deep networks proposed for biomedical image segmentation are developed based on U-Net. While remarkable success has been achieved, its inherent limitations hinder it from yielding more precise segmentation. First, its receptive field ...
Few-shot segmentation aims to segment objects belonging to a specific class under the guidance of a few annotated examples. Most existing approaches follow the prototype learning paradigm and generate category prototypes by squeezing masked feature maps ...
Humans can easily infer the motivations behind human actions from only visual data by comprehensively analyzing the complex context information and utilizing abundant life experiences. Inspired by humans’ reasoning ability, existing motivation prediction ...
The hippocampus plays a vital role in the diagnosis and treatment of many neurological disorders. Recent years, deep learning technology has made great progress in the field of medical image segmentation, and the performance of related tasks has been ...