Dr. Nan Tang

Curriculum Vitae

Dr. Nan Tang is a senior scientist at the Qatar Center for Artificial Intelligence, QCRI, HBKU, Qatar. His research interests center on data preparation.

Prior to joining QCRI in Dec 2011, he was a Research Fellow at LFCS (Laboratory for Foundations of Computer Science) at the University of Edinburgh, Edinburgh, UK (2010--2011). Before that, he was a scientific staff member at CWI (Dutch National Research Center for Mathematics and Computer Science), Amsterdam, Netherlands (2008--2010). He received his Ph.D. degree from The Chinese University of Hong Kong, China (2007). He held a visiting position at the University of Waterloo, Canada (03/2007-08/2007).


No one can whistle a symphony. It takes a whole orchestra to play it. I am really grateful to have had the opportunity to work with some of the best talents in the world.

  • Michael Stonebraker, MIT (2015-): We initiated the Data Civilizer project in 2015, with the goal of building a data preparation tool to find, ingest, clean, and integrate diverse datasets. The unique lesson I learned from this 2014 Turing Award winner is how to make data systems work for real applications. This requires interacting with real users at (non-IT) organizations, such as Merck and Massachusetts General Hospital (MGH), to understand and solve their daily problems.
  • Sam Madden, MIT (2015-): Our collaboration also started with the Data Civilizer project.
  • Guoliang Li, Tsinghua University (2015-): We started a collaboration based on a common interest in data cleaning and integration [Falcon]. We also solved some cute problems such as cleaning dirty Google Scholar entries [paper].
  • Anhai Doan, University of Wisconsin-Madison (2019-): We are currently working on building systems for collaborative data preparation.
  • Ju Fan, Renmin University of China (2020-): We are working on deep learning for data preparation [RPT], as well as data preparation for deep learning [DAGAN].
  • Armando Solar-Lezama, MIT (2016-2020): Armando is an expert in program synthesis, a research area that lies at the intersection of programming systems and artificial intelligence. Our first project [Synthesizer], together with Sam Madden and an MIT Ph.D., Rohit Singh, used program synthesis to synthesize Generic Boolean Formulas as rules for entity resolution. Our second project [Castor], together with Sam Madden and another MIT Ph.D., Jack Feser, extended the relational algebra with layout operators that describe the particular data items to be stored and the layout of that data in memory, and provided a set of equivalence-preserving transformations that can transform both the query and the data layout.
  • Collaborators at QCRI: QCRI provides unique opportunities to work with great people, not only worldwide but also internally. There is a long list of researchers I have worked with at QCRI: Ahmed K. Elmagarmid, Divy Agrawal, Ihab Ilyas, Mohammed Zaki, Prasenjit Mitra, Mourad Ouzzani, Paolo Papotti, Jorge Arnulfo Quiane-Ruiz, and Saravanan Thirumuruganathan.