Document-level event argument extraction (EAE) is a critical event semantic understanding task that requires a model to identify an event's global arguments beyond the sentence level. Existing approaches to this problem are based on supervised ...
Neural machine translation uses a decoder to generate target words auto-regressively by predicting the next target word conditioned on a given source sentence and its previously predicted target words, i.e., its translation history, which suffers from two ...
Open Relation Extraction (OpenRE) aims at clustering relation instances to extract relation types. By learning relation patterns between named entities, it clusters semantically equivalent patterns into a unified relation cluster. Existing clustering-...
Emotion-cause pair extraction (ECPE) is an emerging task derived from emotion cause extraction (ECE) that aims to extract the emotion clause and the corresponding cause clause simultaneously. Previous methods decompose ECPE into multiple sub-tasks, ...
In this paper, we propose a comprehensive linguistic study aimed at assessing the implicit behavior of one of the most prominent Neural Language Models (NLMs) based on Transformer architectures, BERT (Devlin et al.), when dealing with a particular source of ...
Audio-visual signals can be used jointly for robotic perception as they complement each other. Such multi-modal sensory fusion has a clear advantage, especially under noisy acoustic conditions. Speaker localization, as an essential robotic function, was ...
Following the success of Natural Language Processing (NLP) transformers pretrained via self-supervised learning, similar models, such as Wav2Vec2, HuBERT and UniSpeech-SAT, have recently been proposed for speech processing. An interesting yet unexplored ...
Speech enhancement plays an essential role in a wide range of speech processing applications. Recent studies on speech enhancement tend to investigate how to effectively capture the long-term contextual dependencies of speech signals to boost performance. ...
Knowledge graph completion (KGC) aims to predict missing links based on observed triples. However, current KGC models are still limited in the following two aspects. (1) The entity semantics are implicitly learned by a neural network and merely depend on ...
Previous works on cross-lingual Named Entity Recognition (NER) have achieved great success. However, few of them consider the effect of language families between the source and target languages. In this study, we find that the cross-lingual NER ...
Automatic speech recognition (ASR) has improved significantly in recent years. However, most robust ASR systems are based on air-conducted (AC) speech, and their performance in low signal-to-noise ratio (SNR) conditions is not satisfactory. Bone-...
Unsupervised cross-lingual transfer has shown great potential for dependency parsing of low-resource languages when there is no annotated treebank available. Recently, the self-training method has received increasing interest because of its ...
In this paper, we propose a method that uses a combination of the Connectionist Temporal Classification (CTC) loss and the cross-entropy loss to train a note-level singing transcription model. By considering the task as predicting a note sequence of the ...
The real-time detection of speech steganography in Voice-over-Internet-Protocol (VoIP) scenarios remains an open problem, as it requires steganalysis methods to perform well on low-intensity embeddings and short-sample inputs, as well as to provide rapid ...
Adaptive filtering algorithms are pervasive throughout signal processing and have had a material impact on a wide variety of domains including audio processing, telecommunications, biomedical sensing, astrophysics and cosmology, seismology, and many more. ...
In the natural language processing community, open-domain conversational agents, also known as chatbots, are gaining popularity. One of the difficulties is getting them to communicate in an emotionally intelligent manner. To generate dialogues, current ...
Contextual knowledge is essential for reducing speech recognition errors on high-value long-tail words. This paper proposes a novel tree-constrained pointer generator (TCPGen) component that enables end-to-end ASR models to bias towards a list of long-...
Deep learning-based speech enhancement in the short-time Fourier transform (STFT) domain typically uses a large window length such as 32 ms. A larger window can lead to higher frequency resolution and potentially better enhancement. This, however, incurs an ...
In this paper, we present a new model for Direction of Arrival (DOA) estimation of sound sources based on an Icosahedral Convolutional Neural Network (CNN) applied over SRP-PHAT power maps computed from the signals received by a microphone array. This ...
Dialogue state tracking (DST) is often used to track the system's understanding of the user goal in task-oriented dialogue systems. Existing DST methods mainly fall into two categories according to their adopted model structure: non-hierarchical ...