Self-automatic Data Visualization for Interpretable Data Science
My initial goal was to help users understand the tables discovered from a data lake beyond eyeballing, in particular for the Data Civilizer project. I started to explore (automatic) data visualization in 2018, in collaboration with Tsinghua University. Although there is an overwhelming choice of interactive data visualization tools (e.g., Tableau and D3), they mostly allow only experts to create good visualizations. Non-experts are poorly served: they need visualization recommendation systems that allow anyone to create good visualizations automatically, or as simply as running a Google search.
For a general introduction to data visualization for data preparation, please check out:
- Towards Democratizing Relational Data Visualization: a SIGMOD'19 tutorial, co-presented with Eugene Wu from Columbia University [part 2] and Guoliang Li from Tsinghua University [part 3]; I presented [part 1] and [part 4].
- Making Data Visualization More Efficient and Effective: A Survey, published at VLDB Journal'20.
Visualization Recommendation
DeepEye is among the first ML-based visualization recommendation systems. It tackles two problems:
- Visualization recognition: deciding whether a given visualization of a dataset is interesting, based on an understanding of human perception; and
- Visualization ranking: given two visualizations, deciding which one is “better”.
The initial DeepEye paper appeared at ICDE'18. You can read more in our SIGMOD'18 demo paper and try an online demo. The code for DeepEye-APIs is available. We have also applied DeepEye to COVID-19 data analysis (here), with an accompanying paper in IEEE Data Bulletin'20.
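The two problems above can be illustrated with a minimal Python sketch. It is not DeepEye's implementation: the `Candidate` fields, the hand-written `score` function, and the threshold are all hypothetical stand-ins for what a learned model would provide. It only shows how recognition (a yes/no decision per chart) and ranking (a pairwise comparison) fit together.

```python
from dataclasses import dataclass
from functools import cmp_to_key


@dataclass(frozen=True)
class Candidate:
    """A candidate visualization (hypothetical, simplified description)."""
    chart: str         # "bar", "line", "pie", or "scatter"
    x: str             # column on the x-axis
    y: str             # column on the y-axis
    x_distinct: int    # number of distinct x values after grouping
    x_temporal: bool   # whether the x column is a date/time column


def score(c: Candidate) -> float:
    """Toy hand-written stand-in for a learned model; higher is better."""
    s = 1.0
    if c.chart == "pie" and c.x_distinct > 8:
        s -= 0.8   # too many slices to read
    if c.chart == "bar" and c.x_distinct > 30:
        s -= 0.6   # too many bars to read
    if c.chart == "line" and c.x_temporal:
        s += 0.2   # line charts suit trends over time
    return s


def is_good(c: Candidate, threshold: float = 0.5) -> bool:
    """Recognition: a binary decision about a single visualization."""
    return score(c) >= threshold


def better(a: Candidate, b: Candidate) -> int:
    """Ranking: pairwise comparison, negative if a should come before b."""
    sa, sb = score(a), score(b)
    return -1 if sa > sb else (1 if sa < sb else 0)


def recommend(candidates, k=3):
    """Keep the recognized candidates, then order them pairwise."""
    good = [c for c in candidates if is_good(c)]
    return sorted(good, key=cmp_to_key(better))[:k]


candidates = [
    Candidate("pie", "carrier", "delay", 25, False),
    Candidate("bar", "carrier", "delay", 25, False),
    Candidate("line", "month", "delay", 12, True),
    Candidate("bar", "flight_id", "delay", 5000, False),
]
top = recommend(candidates)
for c in top:
    print(c.chart, c.x, c.y)
```

In a learned setting, `is_good` would be replaced by a trained classifier and `better` by a trained ranking model; the control flow stays the same.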
Natural Language to Visualization
A common concern about visualization recommendation systems is that they may recommend visualizations that are worse than nothing because they mislead users, simply because it is hard to guess a user’s query intent. At EDBT'18, we extended DeepEye to support Google-like keyword search, such as “show me the trend of flight delays”, using NLP semantic parsers.
State-of-the-art natural language techniques are based on deep learning, but a big obstacle to advancing the field of NL2VIS is the lack of benchmarks. We proposed the first NL2VIS benchmark, called nvBench, at SIGMOD'21. Based on this benchmark, we further proposed a Transformer-based sequence-to-sequence model that translates natural language queries into targeted visualizations (VIS'21).
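To make the input and output of keyword search concrete, here is a toy Python sketch that maps a query to a chart specification. It uses plain keyword lookup, not a semantic parser or a sequence-to-sequence model, and it is not the method from the papers above. The `CHART_HINTS` table, the `keyword_to_spec` function, the output format, and the column names are all assumptions made for illustration.

```python
# Hypothetical mapping from query keywords to chart types.
CHART_HINTS = {
    "trend": "line",
    "over time": "line",
    "proportion": "pie",
    "share": "pie",
    "correlation": "scatter",
}


def keyword_to_spec(query, columns):
    """Map a keyword query to a toy chart specification.

    The chart type comes from the first matching hint (default: bar).
    A column is selected if any underscore-separated part of its name
    occurs in the query text.
    """
    q = query.lower()
    chart = next((c for kw, c in CHART_HINTS.items() if kw in q), "bar")
    mentioned = [
        col for col in columns
        if any(part in q for part in col.lower().split("_"))
    ]
    return {"chart": chart, "columns": mentioned}


spec = keyword_to_spec(
    "show me the trend of flight delays",
    ["flight_date", "delay_minutes", "carrier"],  # hypothetical schema
)
print(spec)
```

Substring matching of this kind breaks down quickly on paraphrases and ambiguous column names, which is one reason learned approaches and benchmarks matter for this task.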
COVID-19 Dashboards
I have worked on a few COVID-19 dashboards.
- Qatar situation dashboard (link): I developed a situation dashboard for Qatar, which was used by MOI Qatar and showcased on AlJazeera and Turkish TV.
- COVID mobility analysis (link): Google released mobility data to help people combat the COVID-19 pandemic. I built a mobility dashboard based on this data, which was used by MOPH Qatar, Kuwait Health Ministry, and Nigerian National Bureau of Statistics.
- COVID data and mobility analysis in China: In early 2020, we built a COVID-19 dashboard that attracted millions of visits, and worked with China Mobile to visualize and analyze the trajectories of infected persons in Beijing (IEEE Data Bulletin'20).