Research Interests

Many data systems exist for managing, querying, and processing data, ranging from relational DBMSs and graph databases to distributed computing systems such as Spark and Hadoop. We are currently building systems that complement these, in the following three directions.
  1. Data Preparation: discover, integrate, and clean datasets; that is, prepare data for analytics.
  2. Smart Data Systems: enable general users to (visually) analyze and understand data.
  3. Collaborative Data Science: help multiple users (data scientists or workers) collaborate.

Research Projects

Current Projects

  • Data Civilizer: a data preparation tool to find, ingest, clean, and integrate diverse data sets. [Github]
  • Cymphony: a generic and extensible system for collaborative data science.
  • DeepEye: a system for general users to easily find and visualize interesting data. [Demo]

Closed Projects

  • Castor: deductive optimization of relational data storage. [Github]
  • DeepER: using distributed representations of tuples for entity resolution. [Github]
  • JARVIS: immersive, interactive, and intelligent data analytics using AR/VR devices.
  • Synthesizer: using program synthesis to generate concise entity resolution rules.
  • TCM: a novel two-dimensional graph stream summarization.
  • FALCON: an interactive, deterministic, and declarative data cleaning system.
  • NADEEF: a commodity data cleaning system. [Github]
  • KATARA: a trusted data cleaning system powered by knowledge bases and crowdsourcing.

My Publications

DBLP Google Scholar
To see the full list of publications, please go here.