The demand and usage of 360° video services are expected to increase. However, despite these services being highly bandwidth intensive, not much is known about the potential value that basic bandwidth saving techniques such as server or edge-network on-...
Generative Adversarial Networks (GANs) have the ability to generate images that are visually indistinguishable from real images. However, recent studies have revealed that generated and real images share significant differences in the frequency domain. In ...
The just noticeable difference (JND) reveals the key characteristic of visual perception, which has been widely used in many perception-based image and video applications. Nevertheless, the modulatory mechanism of the human visual system (HVS) has not ...
Detecting out-of-distribution (OOD) inputs for deep learning models is a critical task when models are deployed in real-world environments. Recently, a large number of works have been dedicated to tackling the OOD detection problem. One of the most ...
The 2D virtual try-on task aims to transfer a target clothing image to the corresponding region of a person image. Although an extensive amount of research has been conducted due to its immense applications, this task still remains a great challenge to ...
Text-to-video temporal grounding aims to locate a target video moment that semantically corresponds to the given sentence query in an untrimmed video. In this task, fully supervised works require text descriptions for each event along with its temporal ...
We study the high dynamic range (HDR) imaging problem in dual-lens systems. Existing methods usually treat the HDR imaging problem as an image fusion problem and the HDR result is estimated by fusing the aligned short exposure image and long exposure ...
Moving object detection is still a challenging task in complex scenes. The existing methods based on deep learning mainly use U-Nets and have achieved amazing results. However, they ignore the local continuity between pixels. In order to solve this ...
A bottom-up and top-down attention mechanism has led to the revolutionizing of image captioning techniques, which enables object-level attention for multi-step reasoning over all the detected objects. However, when humans describe an image, they often ...
Image-text retrieval aims to take the text (image) query to retrieve the semantically relevant images (texts), which is fundamental and critical in the search system, online shopping, and social network. Existing works have shown the effectiveness of ...
Video stabilization aims to eliminate camera jitter and improve the visual experience of shaky videos. Video stabilization methods often ignore the active movement of the foreground objects and the camera, and may result in distortion and over-smoothing ...