Visual saliency computing aims to imitate the human visual attention mechanism to identify the most prominent or unique areas or objects in a visual scene. It is one of the basic low-level image processing techniques and can be applied to many ...
Road intersections play a vital role in road network construction, autonomous driving, and intelligent transportation systems. Most methods detect road intersections using only geometrical features, without spatio-temporal features, leading to insufficient ...
Temporal Sentence Grounding in Videos (TSGV), i.e., grounding a natural language sentence that describes complex human activities in a long and untrimmed video sequence, has received unprecedented attention over the last few years. Although each newly ...
A very large amount of multimedia data is continually being shared through social networks. In these public spaces, administrators are legally responsible for moderating and controlling the content uploaded or posted to their platforms. However, the ...
Determining the author's intent in a social media post is a challenging multimodal task and requires identifying complex relationships between image and text in the post. For example, the post image can represent an object, person, product, or company, ...
Audio quality degradation can have many causes. For musical applications, this fragmentation may lead to highly unpleasant experiences. Restoration algorithms may be employed to reconstruct missing parts of the audio in a similar way as for image ...
Fake news can rapidly spread through internet users. Approaches proposed in the literature for content classification usually learn models considering textual and contextual features from real and fake news to minimize the spread of disinformation. One ...
Human pose estimation is an important field of Computer Vision that aims to predict poses of individuals from videos and images. It has been used in many different areas including human-computer interaction, motion analysis, surveillance, action ...
A crucial task in overall video understanding is the recognition and localisation in time of the different actions or events present in a scene. To address this problem, action segmentation must be achieved. Action segmentation consists of ...
The Internet's popularization has increased the amount of content produced and consumed on the Web. To take advantage of this new market, major content producers such as Netflix and Amazon Prime have emerged focusing on video streaming services. However, ...
Anaglyph Stereo Matching is a particular, harder case of the widely studied Stereo Matching algorithms. When applied to the Anaglyph Reversion problem, it helps to improve stereo content storage and transmission, stereoscopic visualization independence and ...
Due to the evolution of motion capture devices, natural user interfaces have been applied in several areas, such as neuromotor rehabilitation supported by virtual environments. This paper presents a smartphone application that allows the user to ...
Audio Description (AD), or Video Description, is a vital accessibility concept in the lives of blind and visually impaired people. Automating this task is not easy and involves many problems, such as describing the scenario, actions, emotions, and characters. ...
Online user reviews written in Portuguese are widely used by native speakers from several countries for decision-making, and a large number of user reviews are available on the Web. However, only a few studies on Portuguese sentiment analysis are ...
The massive use of online social media is a reality nowadays. This increasing usage also drives growth in malicious activities on social media, one of which is the use of automated users (bots) that disseminate false information and can insert bias ...
This technical report gives an overview of our approach to the "Watch and Buy: Multimodal Product Identification Challenge". Specifically, we tackle this problem with a three-stage framework, i.e., product detection, retrieval and classification. ...
Disentangling factors has proven to be crucial for building interpretable AI systems. Disentangled generative models would have explanatory input variables to increase the trustworthiness and robustness. Previous works apply a progressive ...
An adversarial example (AE) aims to fool a Convolutional Neural Network by introducing small perturbations in the input image. The proposed work uses the magnitude and phase of the Fourier spectrum and the entropy of the image to defend against AEs. We ...
The routing-by-agreement mechanism in capsule networks (CapsNets) is used to build visual hierarchical relationships with a characteristic of assigning parts to wholes. The connections between capsules of different layers become sparser with more ...
Systematic error, which is not determined by chance, often refers to the inaccuracy (involving either the observation or measurement process) inherent to a system. In this paper, we exhibit some long-neglected but frequently occurring adversarial examples ...