survey
A Review on Methods and Applications in Multimodal Deep Learning
Article No.: 76, pp 1–41 https://doi.org/10.1145/3545572

Deep Learning has implemented a wide range of applications and has become increasingly popular in recent years. The goal of multimodal deep learning (MMDL) is to create models that can process and link information using various modalities. Despite the ...

survey
Improved Random Grid-based Cheating Prevention Visual Cryptography Using Latin Square
Article No.: 77, pp 1–21 https://doi.org/10.1145/3550275

Visual cryptography scheme is a method of encrypting secret image into n noiselike shares. The secret image can be reconstructed by stacking adequate shares. In the past two decades, many schemes have been proposed to realize the cheating prevention ...

research-article
A Decoupled Kernel Prediction Network Guided by Soft Mask for Single Image HDR Reconstruction
Article No.: 79, pp 1–23 https://doi.org/10.1145/3550277

Recent works on single image high dynamic range (HDR) reconstruction fail to hallucinate plausible textures, resulting in information missing and artifacts in large-scale under/over-exposed regions. In this article, a decoupled kernel prediction network ...

research-article
Point Cloud Quality Assessment: Dataset Construction and Learning-based No-reference Metric
Article No.: 80, pp 1–26 https://doi.org/10.1145/3550274

Full-reference (FR) point cloud quality assessment (PCQA) has achieved impressive progress in recent years. However, in many cases, obtaining the reference point clouds is difficult, so no-reference (NR) metrics have become a research hotspot. Few ...

research-article
Pose- and Attribute-consistent Person Image Synthesis
Article No.: 81, pp 1–21 https://doi.org/10.1145/3554739

Person Image Synthesis aims at transferring the appearance of the source person image into a target pose. Existing methods cannot handle large pose variations and therefore suffer from two critical problems: (1) synthesis distortion due to the ...

research-article
Scalable Color Quantization for Task-centric Image Compression
Article No.: 82, pp 1–18 https://doi.org/10.1145/3551389

Conventional image compression techniques targeted for the perceptual quality are not generally optimized for classification tasks using deep neural networks (DNNs). To compress images for DNN inference tasks, recent studies have proposed task-centric ...

research-article
From False-Free to Privacy-Oriented Communitarian Microblogging Social Networks
Article No.: 83, pp 1–23 https://doi.org/10.1145/3555354

Online Social Networks (OSNs) have gained enormous popularity in recent years. They provide a dynamic platform for sharing content (text messages or multimedia) and for facilitating communication between friends and acquaintances. Microblogging services ...

research-article
Query-Guided Prototype Learning with Decoder Alignment and Dynamic Fusion in Few-Shot Segmentation
Article No.: 84, pp 1–20 https://doi.org/10.1145/3555314

Few-shot segmentation aims to segment objects belonging to a specific class under the guidance of a few annotated examples. Most existing approaches follow the prototype learning paradigm and generate category prototypes by squeezing masked feature maps ...

research-article
ML-CookGAN: Multi-Label Generative Adversarial Network for Food Image Generation
Article No.: 85, pp 1–21 https://doi.org/10.1145/3554738

Generating food images from recipe and ingredient information can be applied to many tasks such as food recommendation, recipe development, and health management. For the characteristics of food images, this paper proposes ML-CookGAN, a novel CGAN. This ...

research-article
GHOSM: Graph-based Hybrid Outline and Skeleton Modelling for Shape Recognition
Article No.: 86, pp 1–23 https://doi.org/10.1145/3554922

An efficient and accurate shape detection model plays a major role in many research areas. With the emergence of more complex shapes in real-life applications, shape recognition models need to capture the structure with more effective features to achieve ...

research-article
Distill-DBDGAN: Knowledge Distillation and Adversarial Learning Framework for Defocus Blur Detection
Article No.: 87, pp 1–26 https://doi.org/10.1145/3557897

Defocus blur detection (DBD) aims to segment the blurred regions from a given image affected by defocus blur. It is a crucial pre-processing step for various computer vision tasks. With the increasing popularity of small mobile devices, there is a need ...

research-article
Boosting Relationship Detection in Images with Multi-Granular Self-Supervised Learning
Article No.: 88, pp 1–18 https://doi.org/10.1145/3556978

Visual and spatial relationship detection in images has been a fast-developing research topic in the multimedia field, which learns to recognize the semantic/spatial interactions between objects in an image, aiming to compose a structured semantic ...

research-article
Robust Long-Term Tracking via Localizing Occluders
Article No.: 89, pp 1–15 https://doi.org/10.1145/3557896

Occlusion is known as one of the most challenging factors in long-term tracking because of its unpredictable shape. Existing works devoted into the design of loss functions, training strategies or model architectures, which are considered to have not ...

research-article
Context Prior Guided Semantic Modeling for Biomedical Image Segmentation
Article No.: 90, pp 1–19 https://doi.org/10.1145/3558520

Most state-of-the-art deep networks proposed for biomedical image segmentation are developed based on U-Net. While remarkable success has been achieved, its inherent limitations hinder it from yielding more precise segmentation. First, its receptive field ...

research-article
Open Access
A Optimized BERT for Multimodal Sentiment Analysis
Article No.: 91, pp 1–12 https://doi.org/10.1145/3566126

Sentiment analysis of one modality (e.g., text or image) has been broadly studied. However, not much attention has been paid to the sentiment analysis of multi-modal data. As the research on and applications of multi-modal data analysis are becoming more ...

research-article
Progressive Transformer Machine for Natural Character Reenactment
Article No.: 92, pp 1–22 https://doi.org/10.1145/3559107

Character reenactment aims to control a target person’s full-head movement by a driving monocular sequence that is made up of the driving character video. Current algorithms utilize convolution neural networks in generative adversarial networks, which ...

research-article
Is it Violin or Viola? Classifying the Instruments’ Music Pieces using Descriptive Statistics
Article No.: 93, pp 1–22 https://doi.org/10.1145/3563218

Classifying music pieces based on their instrument sounds is pivotal for analysis and application purposes. Given its importance, techniques using machine learning have been proposed to classify violin and viola music pieces. The violin and viola are two ...

research-article
EiMOL: A Secure Medical Image Encryption Algorithm based on Optimization and the Lorenz System
Article No.: 94, pp 1–19 https://doi.org/10.1145/3561513

Nowadays, the demand for digital images from different intelligent devices and sensors has dramatically increased in smart healthcare. Due to advanced low-cost and easily available tools and software, manipulation of these images is an easy task. Thus, ...

research-article
UEFPN: Unified and Enhanced Feature Pyramid Networks for Small Object Detection
Article No.: 95, pp 1–21 https://doi.org/10.1145/3561824

Object detection models based on feature pyramid networks have made significant progress in general object detection. However, small object detection is still a challenge for the existing models. In this paper, we think that two factors in the existing ...

research-article
Open Access
Deep Learning-Based Intra Mode Derivation for Versatile Video Coding
Article No.: 96, pp 1–20 https://doi.org/10.1145/3563699

In intra coding, Rate Distortion Optimization (RDO) is performed to achieve the optimal intra mode from a pre-defined candidate list. The optimal intra mode is also required to be encoded and transmitted to the decoder side besides the residual signal, ...

research-article
Learning Explicit and Implicit Dual Common Subspaces for Audio-visual Cross-modal Retrieval
Article No.: 97, pp 1–23 https://doi.org/10.1145/3564608

Audio-visual tracks in video contain rich semantic information with potential in many applications and research. Since the audio-visual data have inconsistent distributions and because of the heterogeneous nature of representations, the heterogeneous gap ...

research-article
Real-time Image Enhancement with Attention Aggregation
Article No.: 98, pp 1–19 https://doi.org/10.1145/3564607

Image enhancement has stimulated significant research works over the past years for its great application potential in video conferencing scenarios. Nevertheless, most existing image enhancement approaches are still struggling to find a good tradeoff that ...

research-article
Toward Visual Behavior and Attention Understanding for Augmented 360 Degree Videos
Article No.: 99, pp 1–24 https://doi.org/10.1145/3565024

Augmented reality (AR) overlays digital content onto reality. In an AR system, correct and precise estimations of user visual fixations and head movements can enhance the quality of experience by allocating more computational resources for analyzing, ...

research-article
Mirror Segmentation via Semantic-aware Contextual Contrasted Feature Learning
Article No.: 100, pp 1–22 https://doi.org/10.1145/3566127

Mirrors are everywhere in our daily lives. Existing computer vision systems do not consider mirrors, and hence may get confused by the reflected content inside a mirror, resulting in a severe performance degradation. However, separating the real content ...
