Content cold-start is a core problem in the recommendation field; by addressing it, service providers can mine the potential profit from content that has not yet been discovered by most users and provide more accurate personalized service to their users. In video ...
The ACM Multimedia 2019 Video Relation Understanding Challenge is the first grand challenge aiming to push video content analysis to the relational and structural level. This year, the challenge asks the participants to explore and develop innovative ...
In this paper, we present our solutions to the grand challenge task "Relation Understanding in Videos" in ACM Multimedia 2019. The challenge task aims to detect instances of target visual relations in a video, where a visual relation instance is ...
Video visual relation detection is a meaningful research problem, which aims to build a bridge between dynamic vision and language. In this paper, we propose a novel video visual relation detection method with multi-model feature fusion. First, we ...
This paper introduces our solution for the iQIYI Celebrity Video Identification Challenge. After analyzing the iQIYI-VID-2019 dataset, we find that the distribution of the dataset is very unbalanced and that there are many unlabeled samples in the validation set and ...
Inspired by recent advances in computer vision and deep learning, we propose new enhancements to tackle problems appearing in endoscopic image analysis, especially abnormality finding and anatomical landmark detection. In detail, a combination of ...
In this paper, we present a method to automatically identify diseases from videos of gastrointestinal (GI) tract examinations using a Deep Convolutional Neural Network (DCNN) that processes images from digital endoscopes. Our goal is to aid domain ...
Beauty and personal care product retrieval has attracted growing research attention for its real-world value. However, because of data variation and complex backgrounds, this task remains very challenging. In this paper, we propose a novel ...
Medical image classification and diagnosis is currently a hot topic in the field of deep learning. The ACM International Conference on Multimedia and Simula co-hosted the Multimedia Grand Challenge, which aims to use artificial intelligence aiding ...
Beauty product retrieval is a challenging task due to the severe image variation issue in real-world scenes. In this work, to mitigate the data variation problem, we contribute a background-agnostic feature extractor, which is trained by a self-...
With the rise of multi-modal applications, the need for better understanding of the relationship between language and vision becomes prominent. While modern applications often consider both text and image, human perception is often only of secondary ...
Sign Language is the primary means of communication for the majority of the Deaf and hard-of-hearing communities. Current computational approaches in this general research area have focused specifically on sign language recognition and the translation ...
In recent years, we have observed a rise of interest in the multimedia community towards research topics related to health. It can be observed that this goes into two interesting directions. One is personal health with a larger focus on well-being and ...
With the prevalence of accessible depth sensors, dynamic human body skeletons have attracted much attention as a robust modality for action recognition. Previous methods model skeletons based on RNNs or CNNs, which have limited expressive power for ...
This companion paper supports the replication of scene image recognition experiments using Adaptive Discriminative Region Discovery (Adi-Red), an approach presented at ACM Multimedia 2018. We provide a set of artifacts that allow the replication of the ...
In our recent papers, we proposed a new family of residual convolutional neural networks trained for semi-dense and sparse depth reconstruction without use of the RGB channel. The proposed models can be used in low-resolution depth sensors or SLAM methods ...
Mixed dish is a food category that contains different dishes mixed on one plate, and is popular in Eastern and Southeast Asia. Recognizing individual dishes in a mixed dish image is important for health-related applications, e.g. calculating the ...
Humans have a surprising capacity to induce general rules that describe the specific actions portrayed in a video sequence. The rules learned through this kind of process allow us to achieve similar goals to those shown in the video but in more general ...
Video text spotting is still an important research topic due to its various real-world applications. Previous approaches usually fall into a four-stage pipeline: text detection in individual images, frame-wise recognition of localized text regions, tracking ...
Zero-Shot Learning (ZSL) seeks to recognize a sample from either a seen or an unseen domain by projecting the image data and semantic labels into a joint embedding space. However, most existing methods directly adapt a well-trained projection from one domain ...