COP: Planning Conflicts for Faster Parallel Transactional Machine Learning.


Road to Freedom in Big Data Analytics.


Messing up with BART: Error Generation for Evaluating Data Cleaning Algorithms. 

Veracity of Big Data. From Truth Discovery Computation Algorithms to Models of Misinformation Dynamics. 

Laure Berti-Equille, Javier Borge-Holthoefer

Tutorial of the 24th ACM International Conference on Information and Knowledge Management (CIKM 2015), Melbourne, Australia, October 2015

Veracity of Big Data. From Truth Discovery Computation Algorithms to Models of Misinformation Dynamics.

Data Veracity Estimation with Ensembling Truth Discovery Methods.

A Quality-Aware Spatial Data Warehouse for Querying Hydroecological Data.

A Masking Index for Quantifying Hidden Glitches.

Towards Principled Data Science Assessment - The Personal Data Science Process (PdsP). 

Unsupervised Quantification of Under- and Over-Segmentation for Object-Based Remote Sensing Image Analysis.

Learning to Identify Relevant Studies for Systematic Reviews

Similarity Group-by Operators for Multi-dimensional Relational Data

Lightning Fast and Space Efficient Inequality Joins

AQWA: Adaptive Query-Workload-Aware Partitioning of Big Spatial Data.

Divide & Conquer-based Inclusion Dependency Discovery

KATARA: Reliable Data Cleaning with Knowledge Bases and Crowdsourcing (A demo)

A Demonstration of AQWA: Adaptive Query-Workload-Aware Partitioning of Big Spatial Data (A demo)

BigDansing: A System for Big Data Cleansing

KATARA: A Data Cleaning System Powered by Knowledge Bases and Crowdsourcing

Updating Graph Indices with a One-Pass Algorithm

DataXFormer: An Interactive Data Transformation Tool (Best Demo Award)

Squall: Fine-Grained Live Reconfiguration for Partitioned Main Memory Databases

Deep Learning for the Web

Abstractive Meeting Summarization Using Dependency Graph Fusion

Using Subjectivity Analysis to Improve Thread Retrieval in Online Forums

Query-Time Record Linkage and Fusion over Web Databases

CliqueSquare: Flat Plans for Massively Parallel RDF Queries

Proof Positive and Negative in Data Cleaning

CliqueSquare in Action: Flat Plans for Massively Parallel RDF Queries (demo)

AllegatorTrack: Visualizing and Explaining Truth Discovery Results from Multisource Data (demo)

Cost Estimation of Spatial k-Nearest-Neighbor Operators

Efficient Processing of Hamming-Distance-Based Similarity-Search Queries Over MapReduce

Approving Updates in Collaborative Databases

DataXFormer: Leveraging the Web for Semantic Data Transformations


Towards Dependable Data Repairing with Fixing Rules

The Similarity-aware Relational Intersect Database Operator (Best Paper Award)

Big Data Cleaning. APWeb 2014 (invited as Distinguished Lecture Series)

A Masking Index for Quantifying Hidden Glitches (extended version)

Web Data Quality: Current State and New Challenges

Descriptive and Prescriptive Data Cleaning

NADEEF/ER: Generic and Interactive Entity Resolution

Interaction between Record Matching and Data Repairing

Conflict Resolution with Data Currency and Consistency

Towards Zero-Overhead Static and Adaptive Indexing

Discovering Denial Constraints

Scalable Discovery of Unique Column Combinations

iHUB – An Information and Collaborative Management Platform for Life Sciences

RuleMiner: Data Quality Rules Discovery

Detecting Unique Column Combinations on Dynamic Data

Mapping and Cleaning

IQ-Meter: An Evaluation Tool for Data-Transformation Systems

JISC: Adaptive Stream Processing Using Just-In-Time State Completion


A Masking Index for Quantifying Hidden Glitches

Data Quality Problems beyond Consistency and Deduplication

HandsOn DB: Managing Data Dependencies involving Human Actions

Future Locations Prediction with Uncertain Data

Extraction and Integration of Partially Overlapping Web Sources

The Llunatic Data Cleaning Framework

NADEEF: A Generalized Data Cleaning System

Author Disambiguation by Hierarchical Agglomerative Clustering with Adaptive Stopping Criterion

NADEEF: A Commodity Data Cleaning System

Don’t be SCAREd: Use SCalable Automatic REpairing with Maximal Likelihood and Bounded Changes

Cartilage: Adding Flexibility to the Hadoop Skeleton

Elephant, Do not Forget Everything! Efficient Processing of Growing Datasets

Introduction to the special issue on data quality

Holistic Data Cleaning: Put Violations Into Context

On the Relative Trust between Inconsistent Data and Inaccurate Constraints

Inferring Data Currency and Consistency for Conflict Resolution

Data Curation at Scale: The Data Tamer System

WWHow! Freeing Data Storage from Cages


Atlas: a tool to explore interconnected ionomic, genomic and environmental data

What is the IQ of your data transformation system?

Incremental Detection of Inconsistencies in Distributed Data

Incremental Detection of Inconsistencies in Distributed Data

Spatial Queries with Two kNN Predicates

M3: Stream Processing on Main-Memory MapReduce

High-resolution genome-wide scan of genes, gene-networks and cellular systems impacting the yeast ionome

Interactive web-based breastfeeding monitoring: feasibility, usability, and acceptability

Development and Assessment of an Interactive Web-Based Breastfeeding Monitoring System (LACTOR)


Semantic Web Services for Web Databases

Guided Data Repair

ACConv – An Access Control Model for Conversational Web Services