Deep Learning has implemented a wide range of applications and has become increasingly popular in recent years. The goal of multimodal deep learning (MMDL) is to create models that can process and link information using various modalities. Despite the ...
Recent works on single image high dynamic range (HDR) reconstruction fail to hallucinate plausible textures, resulting in information missing and artifacts in large-scale under/over-exposed regions. In this article, a decoupled kernel prediction network ...
Full-reference (FR) point cloud quality assessment (PCQA) has achieved impressive progress in recent years. However, in many cases, obtaining the reference point clouds is difficult, so no-reference (NR) metrics have become a research hotspot. Few ...
Person Image Synthesis aims at transferring the appearance of the source person image into a target pose. Existing methods cannot handle large pose variations and therefore suffer from two critical problems: (1) synthesis distortion due to the ...
Conventional image compression techniques targeted for the perceptual quality are not generally optimized for classification tasks using deep neural networks (DNNs). To compress images for DNN inference tasks, recent studies have proposed task-centric ...
Online Social Networks (OSNs) have gained enormous popularity in recent years. They provide a dynamic platform for sharing content (text messages or multimedia) and for facilitating communication between friends and acquaintances. Microblogging services ...
Few-shot segmentation aims to segment objects belonging to a specific class under the guidance of a few annotated examples. Most existing approaches follow the prototype learning paradigm and generate category prototypes by squeezing masked feature maps ...
Generating food images from recipe and ingredient information can be applied to many tasks such as food recommendation, recipe development, and health management. For the characteristics of food images, this paper proposes ML-CookGAN, a novel CGAN. This ...
An efficient and accurate shape detection model plays a major role in many research areas. With the emergence of more complex shapes in real-life applications, shape recognition models need to capture the structure with more effective features to achieve ...
Defocus blur detection (DBD) aims to segment the blurred regions from a given image affected by defocus blur. It is a crucial pre-processing step for various computer vision tasks. With the increasing popularity of small mobile devices, there is a need ...
Visual and spatial relationship detection in images has been a fast-developing research topic in the multimedia field, which learns to recognize the semantic/spatial interactions between objects in an image, aiming to compose a structured semantic ...
Occlusion is known as one of the most challenging factors in long-term tracking because of its unpredictable shape. Existing works devoted into the design of loss functions, training strategies or model architectures, which are considered to have not ...
Most state-of-the-art deep networks proposed for biomedical image segmentation are developed based on U-Net. While remarkable success has been achieved, its inherent limitations hinder it from yielding more precise segmentation. First, its receptive field ...
Sentiment analysis of one modality (e.g., text or image) has been broadly studied. However, not much attention has been paid to the sentiment analysis of multi-modal data. As the research on and applications of multi-modal data analysis are becoming more ...
Character reenactment aims to control a target person’s full-head movement by a driving monocular sequence that is made up of the driving character video. Current algorithms utilize convolution neural networks in generative adversarial networks, which ...
Classifying music pieces based on their instrument sounds is pivotal for analysis and application purposes. Given its importance, techniques using machine learning have been proposed to classify violin and viola music pieces. The violin and viola are two ...
Nowadays, the demand for digital images from different intelligent devices and sensors has dramatically increased in smart healthcare. Due to advanced low-cost and easily available tools and software, manipulation of these images is an easy task. Thus, ...
Object detection models based on feature pyramid networks have made significant progress in general object detection. However, small object detection is still a challenge for the existing models. In this paper, we think that two factors in the existing ...
In intra coding, Rate Distortion Optimization (RDO) is performed to achieve the optimal intra mode from a pre-defined candidate list. The optimal intra mode is also required to be encoded and transmitted to the decoder side besides the residual signal, ...
Audio-visual tracks in video contain rich semantic information with potential in many applications and research. Since the audio-visual data have inconsistent distributions and because of the heterogeneous nature of representations, the heterogeneous gap ...
Image enhancement has stimulated significant research works over the past years for its great application potential in video conferencing scenarios. Nevertheless, most existing image enhancement approaches are still struggling to find a good tradeoff that ...
Augmented reality (AR) overlays digital content onto reality. In an AR system, correct and precise estimations of user visual fixations and head movements can enhance the quality of experience by allocating more computational resources for analyzing, ...
Mirrors are everywhere in our daily lives. Existing computer vision systems do not consider mirrors, and hence may get confused by the reflected content inside a mirror, resulting in a severe performance degradation. However, separating the real content ...