Temporal sentence grounding in videos (TSGV), which aims at localizing one target segment from an untrimmed video with respect to a given sentence query, has drawn increasing attentions in the research community over the past few years. Different from the ...
Most matting research resorts to advanced semantics to achieve high-quality alpha mattes, and a direct low-level features combination is usually explored to complement alpha details. However, we argue that appearance-agnostic integration can only provide ...
To reduce the distortion resulting from the large number of crossing quantization cells and resist steganalysis, a reversible data hiding scheme for 2D engineering graphics is put forward based on reversible dual-direction quantization index modulation (...
Multimodal sequence analysis aims to draw inferences from visual, language, and acoustic sequences. A majority of existing works focus on the aligned fusion of three modalities to explore inter-modal interactions, which is impractical in real-world ...
This article targets the task of language-based video moment localization. The language-based setting of this task allows for an open set of target activities, resulting in a large variation of the temporal lengths of video moments. Most existing methods ...
Person re-identification (Re-ID) has achieved great success in single-domain. However, it remains a challenging task to adapt a Re-ID model trained on one dataset to another one. Unsupervised domain adaption (UDA) was proposed to migrate a model from a ...
Dual-connectivity streaming is a key enabler of next-generation six Degrees Of Freedom (6DOF) Virtual Reality (VR) scene immersion. Indeed, using conventional sub-6 GHz WiFi only allows to reliably stream a low-quality baseline representation of the VR ...
A small group is a fundamental interaction unit for achieving a shared goal. Group performance can be automatically predicted using computational methods to analyze members’ verbal behavior in task-oriented interactions, as has been proven in several ...
The advent of convolutional neural networks (CNNs) has brought substantial progress in image super-resolution (SR) reconstruction. However, most SR methods pursue deep architectures to boost performance, and the resulting large model sizes are impractical ...
Camouflaged object detection (COD) is important as it has various potential applications. Unlike salient object detection (SOD), which tries to identify visually salient objects, COD tries to detect objects that are visually very similar to the ...
Knowledge graphs often suffer from incompleteness, and knowledge graph completion (KGC) aims at inferring the missing triplets through knowledge graph embedding from known factual triplets. However, most existing knowledge graph embedding methods only use ...
Video captioning requires that the model has the abilities of video understanding, video-text alignment, and text generation. Due to the semantic gap between vision and language, conducting video-text alignment is a crucial step to reduce the semantic gap,...
No-reference image quality assessment is a basic and challenging problem in the field of image processing. Among them, contrast distortion has a great impact on the perception of image quality. However, there are relatively few studies on no-reference ...
Referring expression comprehension aims to localize a specific object in an image according to a given language description. It is still challenging to comprehend and mitigate the gap between various types of information in the visual and textual domains. ...
Video frame interpolation (VFI) is of great importance for many video applications, yet it is still challenging even in the era of deep learning. Some existing VFI models directly exploit existing lightweight network frameworks, thus making synthesized in-...
Photo-response non-uniformity (PRNU), as a class of device fingerprint, plays a key role in the forgery detection/localization for visual media. The state-of-the-art PRNU-based forensics methods generally rely on the multi-scale trace analysis and result ...
Video captioning, which bridges vision and language, is a fundamental yet challenging task in computer vision. To generate accurate and comprehensive sentences, both visual and semantic information is quite important. However, most existing methods simply ...
The capability of image semantic segmentation may be deteriorated due to the noisy input image, where image denoising prior to segmentation may help. Both image denoising and semantic segmentation have been developed significantly with the advance of deep ...
Image-based 3D model retrieval aims at organizing unlabeled 3D models according to the relevance to the labeled 2D images. With easy accessibility of 2D images and wide applications of 3D models, image-based 3D model retrieval attracts more and more ...
Nowadays, three-dimensional (3D) meshes are widely used in various applications in different areas (e.g., industry, education, entertainment and safety). The 3D models are captured with multiple RGB-D sensors, and the sampled geometric manifolds are ...
Compared to ordinary images, omnidirectional image (OI) usually has a broader view and a higher resolution, and image quality assessment (IQA) can help people to understand and improve their visual experience. However, the current IQA works cannot achieve ...
With the inevitable dominance of video traffic on the Internet, Internet service providers (ISP) are striving to deliver video streaming with high quality. Video resolution, as a direct reflection of video quality, is a key factor of the video quality of ...
360-degree video provides omnidirectional views by a bounding sphere, thus also called omnidirectional video. For omnidirectional video, people can only see specific content in the viewport through head movement, i.e., only a small portion of the 360-...
Image multi-label classification task is mainly to correctly predict multiple object categories in the images. To capture the correlation between labels, graph convolution network based methods have to manually count the label co-occurrence probability ...