survey
Open Access
A Survey on Temporal Sentence Grounding in Videos
Article No.: 51, pp 1–33, https://doi.org/10.1145/3532626

Temporal sentence grounding in videos (TSGV), which aims at localizing one target segment from an untrimmed video with respect to a given sentence query, has drawn increasing attention in the research community over the past few years. Different from the ...

research-article
Hierarchical and Progressive Image Matting
Article No.: 52, pp 1–23, https://doi.org/10.1145/3540201

Most matting research resorts to advanced semantics to achieve high-quality alpha mattes, and a direct low-level features combination is usually explored to complement alpha details. However, we argue that appearance-agnostic integration can only provide ...

research-article
A Low Distortion and Steganalysis-resistant Reversible Data Hiding for 2D Engineering Graphics
Article No.: 53, pp 1–20, https://doi.org/10.1145/3539661

To reduce the distortion resulting from the large number of crossing quantization cells and resist steganalysis, a reversible data hiding scheme for 2D engineering graphics is put forward based on reversible dual-direction quantization index modulation (...

research-article
Multimodal Graph for Unaligned Multimodal Sequence Analysis via Graph Convolution and Graph Pooling
Article No.: 54, pp 1–24, https://doi.org/10.1145/3542927

Multimodal sequence analysis aims to draw inferences from visual, language, and acoustic sequences. A majority of existing works focus on the aligned fusion of three modalities to explore inter-modal interactions, which is impractical in real-world ...

research-article
Progressive Localization Networks for Language-Based Moment Localization
Article No.: 55, pp 1–21, https://doi.org/10.1145/3543857

This article targets the task of language-based video moment localization. The language-based setting of this task allows for an open set of target activities, resulting in a large variation of the temporal lengths of video moments. Most existing methods ...

research-article
Local Correlation Ensemble with GCN Based on Attention Features for Cross-domain Person Re-ID
Article No.: 56, pp 1–22, https://doi.org/10.1145/3542820

Person re-identification (Re-ID) has achieved great success in single-domain. However, it remains a challenging task to adapt a Re-ID model trained on one dataset to another one. Unsupervised domain adaption (UDA) was proposed to migrate a model from a ...

research-article
Millimeter Wave and Free-space-optics for Future Dual-connectivity 6DOF Mobile Multi-user VR Streaming
Article No.: 57, pp 1–25, https://doi.org/10.1145/3544494

Dual-connectivity streaming is a key enabler of next-generation six Degrees Of Freedom (6DOF) Virtual Reality (VR) scene immersion. Indeed, using conventional sub-6 GHz WiFi only allows to reliably stream a low-quality baseline representation of the VR ...

research-article
An Interaction-process-guided Framework for Small-group Performance Prediction
Article No.: 58, pp 1–25, https://doi.org/10.1145/3558768

A small group is a fundamental interaction unit for achieving a shared goal. Group performance can be automatically predicted using computational methods to analyze members’ verbal behavior in task-oriented interactions, as has been proven in several ...

research-article
Egocentric Early Action Prediction via Adversarial Knowledge Distillation
Article No.: 59, pp 1–21, https://doi.org/10.1145/3544493

Egocentric early action prediction aims to recognize actions from the first-person view by only observing a partial video segment, which is challenging due to the limited context information of the partial video. In this article, to tackle the egocentric ...

research-article
Image Super-Resolution via Lightweight Attention-Directed Feature Aggregation Network
Article No.: 60, pp 1–23, https://doi.org/10.1145/3546076

The advent of convolutional neural networks (CNNs) has brought substantial progress in image super-resolution (SR) reconstruction. However, most SR methods pursue deep architectures to boost performance, and the resulting large model sizes are impractical ...

research-article
Frequency-aware Camouflaged Object Detection
Article No.: 61, pp 1–16, https://doi.org/10.1145/3545609

Camouflaged object detection (COD) is important as it has various potential applications. Unlike salient object detection (SOD), which tries to identify visually salient objects, COD tries to detect objects that are visually very similar to the ...

research-article
Hyper-node Relational Graph Attention Network for Multi-modal Knowledge Graph Completion
Article No.: 62, pp 1–21, https://doi.org/10.1145/3545573

Knowledge graphs often suffer from incompleteness, and knowledge graph completion (KGC) aims at inferring the missing triplets through knowledge graph embedding from known factual triplets. However, most existing knowledge graph embedding methods only use ...

research-article
Learning Video-Text Aligned Representations for Video Captioning
Article No.: 63, pp 1–21, https://doi.org/10.1145/3546828

Video captioning requires that the model has the abilities of video understanding, video-text alignment, and text generation. Due to the semantic gap between vision and language, conducting video-text alignment is a crucial step to reduce the semantic gap,...

research-article
No-reference Quality Assessment for Contrast-distorted Images Based on Gray and Color-gray-difference Space
Article No.: 64, pp 1–20, https://doi.org/10.1145/3555355

No-reference image quality assessment is a basic and challenging problem in the field of image processing. Among them, contrast distortion has a great impact on the perception of image quality. However, there are relatively few studies on no-reference ...

research-article
Referring Expression Comprehension Via Enhanced Cross-modal Graph Attention Networks
Article No.: 65, pp 1–21, https://doi.org/10.1145/3548688

Referring expression comprehension aims to localize a specific object in an image according to a given language description. It is still challenging to comprehend and mitigate the gap between various types of information in the visual and textual domains. ...

research-article
L2BEC2: Local Lightweight Bidirectional Encoding and Channel Attention Cascade for Video Frame Interpolation
Article No.: 66, pp 1–19, https://doi.org/10.1145/3547660

Video frame interpolation (VFI) is of great importance for many video applications, yet it is still challenging even in the era of deep learning. Some existing VFI models directly exploit existing lightweight network frameworks, thus making synthesized in-...

research-article
PRNU-based Image Forgery Localization with Deep Multi-scale Fusion
Article No.: 67, pp 1–20, https://doi.org/10.1145/3548689

Photo-response non-uniformity (PRNU), as a class of device fingerprint, plays a key role in the forgery detection/localization for visual media. The state-of-the-art PRNU-based forensics methods generally rely on the multi-scale trace analysis and result ...

research-article
Semantic Embedding Guided Attention with Explicit Visual Feature Fusion for Video Captioning
Article No.: 68, pp 1–18, https://doi.org/10.1145/3550276

Video captioning, which bridges vision and language, is a fundamental yet challenging task in computer vision. To generate accurate and comprehensive sentences, both visual and semantic information is quite important. However, most existing methods simply ...

research-article
Synergy between Semantic Segmentation and Image Denoising via Alternate Boosting
Article No.: 69, pp 1–23, https://doi.org/10.1145/3548459

The capability of image semantic segmentation may be deteriorated due to the noisy input image, where image denoising prior to segmentation may help. Both image denoising and semantic segmentation have been developed significantly with the advance of deep ...

research-article
Self-supervised Image-based 3D Model Retrieval
Article No.: 70, pp 1–18, https://doi.org/10.1145/3548690

Image-based 3D model retrieval aims at organizing unlabeled 3D models according to the relevance to the labeled 2D images. With easy accessibility of 2D images and wide applications of 3D models, image-based 3D model retrieval attracts more and more ...

research-article
Open Access
Deep Saliency Mapping for 3D Meshes and Applications
Article No.: 71, pp 1–22, https://doi.org/10.1145/3550073

Nowadays, three-dimensional (3D) meshes are widely used in various applications in different areas (e.g., industry, education, entertainment and safety). The 3D models are captured with multiple RGB-D sensors, and the sampled geometric manifolds are ...

research-article
Toward A No-reference Omnidirectional Image Quality Evaluation by Using Multi-perceptual Features
Article No.: 72, pp 1–19, https://doi.org/10.1145/3549544

Compared to ordinary images, omnidirectional image (OI) usually has a broader view and a higher resolution, and image quality assessment (IQA) can help people to understand and improve their visual experience. However, the current IQA works cannot achieve ...

research-article
Resolution Identification of Encrypted Video Streaming Based on HTTP/2 Features
Article No.: 73, pp 1–23, https://doi.org/10.1145/3551891

With the inevitable dominance of video traffic on the Internet, Internet service providers (ISP) are striving to deliver video streaming with high quality. Video resolution, as a direct reflection of video quality, is a key factor of the video quality of ...

research-article
Quality Enhancement of Compressed 360-Degree Videos Using Viewport-based Deep Neural Networks
Article No.: 74, pp 1–19, https://doi.org/10.1145/3551641

360-degree video provides omnidirectional views by a bounding sphere, thus also called omnidirectional video. For omnidirectional video, people can only see specific content in the viewport through head movement, i.e., only a small portion of the 360-...

research-article
Aligning Image Semantics and Label Concepts for Image Multi-Label Classification
Article No.: 75, pp 1–23, https://doi.org/10.1145/3550278

Image multi-label classification task is mainly to correctly predict multiple object categories in the images. To capture the correlation between labels, graph convolution network based methods have to manually count the label co-occurrence probability ...
