An Overview of Smart Data Systems

The first step of any data analytics task is to prepare the data. Please find more about data preparation here.

Now, given a well-prepared dataset, a user typically needs data visualizations to help understand the data. Instead of asking the user to be familiar with visualization languages such as Vega-Lite, Vega, or D3, or with tools such as Tableau, we want to enable novices to easily generate visualizations. A natural, user-friendly query interface is Google-like keyword search, which we proposed in our vision paper and demonstrated in a SIGMOD demo. Moreover, we need to automatically recommend good visualizations to the user (see paper for more details). All of these are components of the DeepEye system.

When a user picks a visualization, oftentimes, the visualization might be wrong because the underlying data is dirty. Instead of cleaning the entire dataset, which is prohibitively expensive in practice, we propose to progressively clean the visualization through interactive data cleaning (see paper for more details).

DeepEye: Democratizing Data Visualization (online demo)

Why?

A picture is worth a thousand words. A good visualization is worth a terabyte of data. Nowadays, the ability to create good visualizations has shifted from a nice-to-have skill to a must-have skill for all data analysts, who help managers make business decisions driven by data of high volume and overwhelming velocity. Although experts have an overwhelming choice of interactive data visualization tools, non-experts have few effective visualization recommendation systems that let everyone easily create great visualizations.

What are the fundamental problems?

Visualization Recognition

Given one visualization, such as a bar chart or a scatter plot, is it good from the standpoint of human perception?

Visualization Ranking

Given two visualizations, can we quantify their quality and say which one is better?

Visualization Searching

It might be very hard, if not impossible, to guess what a user really wants. Can users instead issue a simple keyword-like search?

How?

Our basic idea is "visualization by examples": there are plenty of generic priors that showcase great visualizations, which can be used to learn human perception, e.g., a bar chart with more than 50 bars is clearly bad. Given thousands of good examples and some human-labeled ranking orders, DeepEye trains a binary classifier (a random forest or an SVM) for visualization recognition, and uses a supervised learning-to-rank model for visualization ranking.
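To make the recognize-then-rank idea concrete, here is a minimal sketch in plain Python. It is not DeepEye's implementation: a hand-written rule (the 50-bar prior mentioned above) stands in for the trained binary classifier, and a simple pairwise perceptron stands in for the learning-to-rank model. The chart descriptions, the two features (number of marks and a "trend strength" score), and all function names are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def passes_bar_prior(chart_type: str, n_marks: int) -> bool:
    """Stand-in for the recognition classifier: reject bar charts with > 50 bars."""
    return not (chart_type == "bar" and n_marks > 50)


def dot(w: Vector, x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise_ranker(
    pairs: List[Tuple[Vector, Vector]], epochs: int = 50, lr: float = 0.1
) -> List[float]:
    """Learn weights w so that dot(w, better) > dot(w, worse) for labeled pairs."""
    w = [0.0] * len(pairs[0][0])
    for _ in range(epochs):
        updated = False
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            if dot(w, diff) <= 0:  # pair is mis-ordered: nudge w toward diff
                w = [wi + lr * di for wi, di in zip(w, diff)]
                updated = True
        if not updated:  # every labeled pair is ordered correctly
            break
    return w


def rank(w: Vector, candidates: Dict[str, Vector]) -> List[str]:
    """Return candidate names sorted from best to worst score."""
    return sorted(candidates, key=lambda name: dot(w, candidates[name]), reverse=True)


# Hypothetical charts: (chart type, number of marks, trend strength in [0, 1]).
charts = {
    "bar_12": ("bar", 12, 0.8),
    "bar_80": ("bar", 80, 0.8),
    "scatter_weak": ("scatter", 30, 0.1),
}


def features(chart: Tuple[str, int, float]) -> List[float]:
    _, n_marks, trend = chart
    return [n_marks / 100, trend]


# Human-labeled ranking orders, written as (better, worse) pairs.
labeled_pairs = [
    (features(charts["bar_12"]), features(charts["bar_80"])),
    (features(charts["bar_12"]), features(charts["scatter_weak"])),
]
weights = train_pairwise_ranker(labeled_pairs)

# Recognition filters the candidates, then ranking orders the survivors.
recognized = {
    name: features(chart)
    for name, chart in charts.items()
    if passes_bar_prior(chart[0], chart[1])
}
ranking = rank(weights, recognized)
print(ranking)
```

In a real system the rule would be replaced by a classifier trained on labeled good and bad examples, and the perceptron by a stronger learning-to-rank model; the two-stage structure is the only part taken from the description above.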

Interactive Cleaning for Progressive Visualization

Why?

Data visualizations are not always exact and good, and the uncertainty of data visualization may misguide users by showing false discoveries. One common reason for such bad (uncertain) visualizations is that real-life data is dirty.

How?

Practically, it is too expensive to completely clean a dataset. Intuitively, compared with cleaning the entire dataset, cleaning only the data relevant to a visualization task (such as a bar chart or a pie chart) should be much cheaper. We study a new problem, interactive cleaning for progressive visualization, which progressively improves the quality of a visualization while minimizing the cost of interacting with the user to clean the visualization-aware data. Please find more details in the paper.
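As a rough illustration of visualization-aware cleaning, the sketch below repairs a bar chart under a fixed budget of user questions. It is a simple greedy heuristic written for this page, not the algorithm from the paper: each question asks whether two labels refer to the same entity, and the question whose merge would move the most bar height is asked first. The function names, the example rows, and the simulated user (`ask_user`) are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Row = Tuple[str, float]   # (bar label, value)
Pair = Tuple[str, str]    # (possibly dirty label, canonical label)


def bar_chart(rows: List[Row]) -> Dict[str, float]:
    """Aggregate rows into bar heights (sum of values per label)."""
    totals: Dict[str, float] = defaultdict(float)
    for label, value in rows:
        totals[label] += value
    return dict(totals)


def clean_progressively(
    rows: List[Row],
    candidate_pairs: List[Pair],
    ask_user: Callable[[str, str], bool],
    budget: int,
) -> List[Row]:
    """Ask at most `budget` questions, most chart-relevant question first."""
    rows = list(rows)
    pending = list(candidate_pairs)
    for _ in range(budget):
        if not pending:
            break
        totals = bar_chart(rows)
        # Benefit estimate: the bar height that would move if the pair merged.
        best = max(
            pending,
            key=lambda p: min(totals.get(p[0], 0.0), totals.get(p[1], 0.0)),
        )
        pending.remove(best)
        dirty, canonical = best
        if ask_user(dirty, canonical):  # user confirms both labels match
            rows = [(canonical if label == dirty else label, value)
                    for label, value in rows]
    return rows


# Hypothetical dirty data: the same venue appears under two spellings.
rows = [("SIGMOD", 10.0), ("sigmod", 7.0), ("VLDB", 8.0), ("vldb", 1.0), ("ICDE", 5.0)]
candidate_pairs = [("sigmod", "SIGMOD"), ("vldb", "VLDB")]


def simulated_user(dirty: str, canonical: str) -> bool:
    """Stand-in for a human answer: labels match if they differ only by case."""
    return dirty.upper() == canonical


cleaned = clean_progressively(rows, candidate_pairs, simulated_user, budget=1)
chart = bar_chart(cleaned)
print(chart)
```

With a budget of one question, the sketch spends it on the SIGMOD pair because that merge changes the chart the most, and it leaves the small "vldb" bar uncleaned. Raising the budget lets the chart improve step by step, which is the progressive behavior described above.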

Publications

Interactive Cleaning for Progressive Visualization through Composite Questions [pdf]
The 36th IEEE International Conference on Data Engineering (ICDE), Dallas, Texas, USA, 2020
Yuyu Luo, Chengliang Chai, Xuedi Qin, Nan Tang, and Guoliang Li

Making Data Visualization More Efficient and Effective: A Survey [pdf]
The VLDB Journal (VLDBJ), to appear
Xuedi Qin, Yuyu Luo, Nan Tang, and Guoliang Li

Towards Democratizing Relational Data Visualization [Tutorial] [pdf]
ACM SIGMOD Conference on Management of Data (SIGMOD Tutorial), Amsterdam, The Netherlands, 2019
Nan Tang, Eugene Wu and Guoliang Li
Presentations: Introduction, 30 mins [keynote]; Efficient data visualization, 1 hour [ppt]; Smart data visualization, 1 hour [keynote]; Uncertainty, collaborative and immersive data visualization, 30 mins [keynote]

DeepEye: Creating Good Data Visualizations by Keyword Search
ACM SIGMOD Conference on Management of Data (SIGMOD demo), Houston, USA, 2018
Yuyu Luo, Xuedi Qin, Nan Tang, Guoliang Li and Xinran Wang

DeepEye: Visualizing Your Data by Keyword Search
The 21st International Conference on Extending Database Technology (EDBT vision paper), Vienna, Austria, 2018
Xuedi Qin, Yuyu Luo, Nan Tang, and Guoliang Li

DeepEye: Towards Automatic Data Visualization
The 34th IEEE International Conference on Data Engineering (ICDE), Paris, France, 2018
Yuyu Luo, Xuedi Qin, Nan Tang, and Guoliang Li