Search by Subject: Artificial Intelligence, Machine Learning, Computer Vision, Natural Language Processing


88 Results

Searched The ACM Full-Text Collection (691,749 records). A broader search is possible in The ACM Guide to Computing Literature (3,482,418 records).

Showing 1-20 of 88 results


research-article
September 2021
Artifacts Available / v1.1
Accelerating recommendation system training by leveraging popular choices
Proceedings of the VLDB Endowment (PVLDB), Volume 15, Issue 1, September 2021, pp. 127–140, https://doi.org/10.14778/3485450.3485462

Recommender models are commonly used to suggest relevant items to a user for e-commerce and online advertisement-based applications. These models use massive embedding tables to store numerical representations of items' and users' categorical variables (...
Total Citations: 3 | Total Downloads: 122 (Last 12 Months: 99; Last 6 Weeks: 17)
research-article
September 2021
WindTunnel: towards differentiable ML pipelines beyond a single model
Proceedings of the VLDB Endowment (PVLDB), Volume 15, Issue 1, September 2021, pp. 11–20, https://doi.org/10.14778/3485450.3485452

While deep neural networks (DNNs) have been shown to be successful in several domains like computer vision, non-DNN models such as linear models and gradient boosting trees are still considered state-of-the-art over tabular data. When using these models, ...
Total Citations: 0 | Total Downloads: 101 (Last 12 Months: 73; Last 6 Weeks: 10)
research-article
September 2021
Towards scalable online machine learning collaborations with OpenML
- Joaquin Vanschoren
Proceedings of the VLDB Endowment (PVLDB), Volume 14, Issue 13, September 2021, p. 3418, https://doi.org/10.14778/3484224.3484239

Is massively collaborative machine learning possible? Can we share and organize our collective knowledge of machine learning to solve ever more challenging problems? In a way, yes: as a community, we are already very successful at developing high-...
Total Citations: 0
research-article
July 2021
Progressive compressed records: taking a byte out of deep learning data
Proceedings of the VLDB Endowment (PVLDB), Volume 14, Issue 11, July 2021, pp. 2627–2641, https://doi.org/10.14778/3476249.3476308

Deep learning accelerators efficiently train over vast and growing amounts of data, placing a newfound burden on commodity networks and storage devices. A common approach to conserve bandwidth involves resizing or compressing data prior to training. We ...
Total Citations: 1 | Total Downloads: 19 (Last 12 Months: 13; Last 6 Weeks: 1)
research-article
July 2021
LANCET: labeling complex data at scale
Proceedings of the VLDB Endowment (PVLDB), Volume 14, Issue 11, July 2021, pp. 2154–2166, https://doi.org/10.14778/3476249.3476269

Cutting-edge machine learning techniques often require millions of labeled data objects to train a robust model. Because relying on humans to supply such a huge number of labels is rarely practical, automated methods for label generation are needed. ...
Total Citations: 0 | Total Downloads: 69 (Last 12 Months: 34; Last 6 Weeks: 1)
research-article
June 2021
Dual-objective fine-tuning of BERT for entity matching
- Ralph Peeters,
- Christian Bizer
Proceedings of the VLDB Endowment (PVLDB), Volume 14, Issue 10, June 2021, pp. 1913–1921, https://doi.org/10.14778/3467861.3467878

An increasing number of data providers have adopted shared numbering schemes such as GTIN, ISBN, DUNS, or ORCID numbers for identifying entities in the respective domain. This means for data integration that shared identifiers are often available for a ...
Total Citations: 8 | Total Downloads: 102 (Last 12 Months: 82; Last 6 Weeks: 5)
research-article
March 2021
Adaptive data augmentation for supervised learning over missing data
Proceedings of the VLDB Endowment (PVLDB), Volume 14, Issue 7, March 2021, pp. 1202–1214, https://doi.org/10.14778/3450980.3450989

Real-world data is dirty, which causes serious problems in (supervised) machine learning (ML). The widely used practice in such a scenario is to first repair the labeled source (a.k.a. train) data using rule-, statistical- or ML-based methods and then use ...
Total Citations: 2 | Total Downloads: 240 (Last 12 Months: 89; Last 6 Weeks: 5)
research-article
March 2021
The case for NLP-enhanced database tuning: towards tuning tools that "read the manual"
- Immanuel Trummer
Proceedings of the VLDB Endowment (PVLDB), Volume 14, Issue 7, March 2021, pp. 1159–1165, https://doi.org/10.14778/3450980.3450984

A large body of knowledge on database tuning is available in the form of natural language text. We propose to leverage natural language processing (NLP) to make that knowledge accessible to automated tuning tools. We describe multiple avenues to exploit ...
Total Citations: 3 | Total Downloads: 150 (Last 12 Months: 71; Last 6 Weeks: 6)
research-article
January 2021
Improving information extraction from visually rich documents using visual span representations
- Ritesh Sarkhel,
- Arnab Nandi
Proceedings of the VLDB Endowment (PVLDB), Volume 14, Issue 5, January 2021, pp. 822–834, https://doi.org/10.14778/3446095.3446104

Along with textual content, visual features play an essential role in the semantics of visually rich documents. Information extraction (IE) tasks perform poorly on these documents if these visual cues are not taken into account. In this paper, we ...
Total Citations: 1 | Total Downloads: 120 (Last 12 Months: 31; Last 6 Weeks: 8)
research-article
January 2021
DBTagger: multi-task learning for keyword mapping in NLIDBs using Bi-directional recurrent neural networks
Proceedings of the VLDB Endowment (PVLDB), Volume 14, Issue 5, January 2021, pp. 813–821, https://doi.org/10.14778/3446095.3446103

Translating Natural Language Queries (NLQs) to Structured Query Language (SQL) in interfaces deployed in relational databases is a challenging task, which has been widely studied in the database community recently. Conventional rule-based systems utilize ...
Total Citations: 0 | Total Downloads: 79 (Last 12 Months: 32; Last 6 Weeks: 4)
research-article
November 2020
Nearest neighbor classifiers over incomplete information: from certain answers to certain predictions
Proceedings of the VLDB Endowment (PVLDB), Volume 14, Issue 3, November 2020, pp. 255–267, https://doi.org/10.14778/3430915.3430917

Machine learning (ML) applications have been thriving recently, largely attributed to the increasing availability of data. However, inconsistency and incomplete information are ubiquitous in real-world datasets, and their impact on ML applications ...
Total Citations: 1 | Total Downloads: 29 (Last 12 Months: 27; Last 6 Weeks: 2)
research-article
May 2020
Guided exploration of user groups
Proceedings of the VLDB Endowment (PVLDB), Volume 13, Issue 9, May 2020, pp. 1469–1482, https://doi.org/10.14778/3397230.3397242

Finding a set of users of interest serves several applications in behavioral analytics. Oftentimes, identifying users requires exploring the data and gradually choosing potential targets. This is a special case of Exploratory Data Analysis (EDA), an ...
Total Citations: 7 | Total Downloads: 56 (Last 12 Months: 12; Last 6 Weeks: 4)
research-article
May 2020
ADnEV: cross-domain schema matching using deep similarity matrix adjustment and evaluation
Proceedings of the VLDB Endowment (PVLDB), Volume 13, Issue 9, May 2020, pp. 1401–1415, https://doi.org/10.14778/3397230.3397237

Schema matching is a process that serves in integrating structured and semi-structured data. Being a handy tool in multiple contemporary business and commerce applications, it has been investigated in the fields of databases, AI, Semantic Web, and data ...
Total Citations: 6 | Total Downloads: 270 (Last 12 Months: 90; Last 6 Weeks: 12)
research-article
May 2020
ARDA: automatic relational data augmentation for machine learning
Proceedings of the VLDB Endowment (PVLDB), Volume 13, Issue 9, May 2020, pp. 1373–1387, https://doi.org/10.14778/3397230.3397235

Automatic machine learning (AML) is a family of techniques to automate the process of training predictive models, aiming to both improve performance and make machine learning more accessible. While many recent works have focused on aspects of the ...
Total Citations: 12 | Total Downloads: 110 (Last 12 Months: 30; Last 6 Weeks: 1)
research-article
January 2020
MDedup: duplicate detection with matching dependencies
Proceedings of the VLDB Endowment (PVLDB), Volume 13, Issue 5, January 2020, pp. 712–725, https://doi.org/10.14778/3377369.3377379

Duplicate detection is an integral part of data cleaning and serves to identify multiple representations of the same real-world entities in (relational) datasets. Existing duplicate detection approaches are effective, but they are also hard to parameterize ...
Total Citations: 5 | Total Downloads: 182 (Last 12 Months: 25; Last 6 Weeks: 1)
research-article
January 2020
MEGA: multi-view semi-supervised clustering of hypergraphs
Proceedings of the VLDB Endowment (PVLDB), Volume 13, Issue 5, January 2020, pp. 698–711, https://doi.org/10.14778/3377369.3377378

Complex relationships among entities can be modeled very effectively using hypergraphs. Hypergraphs model real-world data by allowing a hyperedge to include two or more entities. Clustering of hypergraphs enables us to group the similar entities ...
Total Citations: 4 | Total Downloads: 423 (Last 12 Months: 100; Last 6 Weeks: 8)
research-article
September 2019
Fast large-scale trajectory clustering
Proceedings of the VLDB Endowment (PVLDB), Volume 13, Issue 1, September 2019, pp. 29–42, https://doi.org/10.14778/3357377.3357380

In this paper, we study the problem of large-scale trajectory data clustering, k-paths, which aims to efficiently identify k "representative" paths in a road network. Unlike traditional clustering approaches that require multiple data-dependent ...
Total Citations: 18 | Total Downloads: 451 (Last 12 Months: 98; Last 6 Weeks: 7)
research-article
August 2019
Smile: a system to support machine learning on EEG data at scale
Proceedings of the VLDB Endowment (PVLDB), Volume 12, Issue 12, August 2019, pp. 2230–2241, https://doi.org/10.14778/3352063.3352138

In order to reduce the possibility of neural injury from seizures and sidestep the need for a neurologist to spend hours manually reviewing the EEG recording, it is critical to automatically detect and classify "interictal-ictal continuum" (IIC) ...
Total Citations: 5 | Total Downloads: 247 (Last 12 Months: 41; Last 6 Weeks: 12)
research-article
August 2019
TitAnt: online real-time transaction fraud detection in Ant Financial
Proceedings of the VLDB Endowment (PVLDB), Volume 12, Issue 12, August 2019, pp. 2082–2093, https://doi.org/10.14778/3352063.3352126

With the explosive growth of e-commerce and the booming of e-payment, detecting online transaction fraud in real time has become increasingly important to Fintech business. To tackle this problem, we introduce TitAnt, a transaction fraud detection ...
Total Citations: 6 | Total Downloads: 310 (Last 12 Months: 66; Last 6 Weeks: 5)
research-article
August 2019
Machine learning meets big spatial data
- Ibrahim Sabek,
- Mohamed F. Mokbel
Proceedings of the VLDB Endowment (PVLDB), Volume 12, Issue 12, August 2019, pp. 1982–1985, https://doi.org/10.14778/3352063.3352115

The proliferation in amounts of generated data has propelled the rise of scalable machine learning solutions to efficiently analyze and extract useful insights from such data. Meanwhile, spatial data has become ubiquitous, e.g., GPS data, with ...
Total Citations: 3 | Total Downloads: 279 (Last 12 Months: 50; Last 6 Weeks: 4)