TY - GEN
T1 - Disentangled Representation of Data Distributions in Scatterplots
AU - Jo, Jaemin
AU - Seo, Jinwook
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/10
Y1 - 2019/10
AB - We present a data-driven approach to obtain a disentangled and interpretable representation that can characterize bivariate data distributions of scatterplots. We first collect tabular datasets from the Web and build a training corpus consisting of over one million scatterplot images. Then, we train a state-of-the-art disentangling model, the β-variational autoencoder, to derive a disentangled representation of the scatterplot images. The main output of this work is a list of 32 representative features that can capture the underlying structures of bivariate data distributions. Through latent traversals, we seek high-level semantics of the features and compare them to previous human-derived concepts such as scagnostics measures. Finally, using the 32 features as input, we build a simple neural network to predict the perceptual distances between scatterplots that were previously scored by human annotators. We found that Pearson's correlation coefficient between the predicted and perceptual distances was above 0.75, which indicates the effectiveness of our representation in the quantitative characterization of scatterplots.
KW - concepts and paradigms
KW - Human-centered computing
KW - Visualization
KW - Visualization theory
UR - https://www.scopus.com/pages/publications/85078068072
U2 - 10.1109/VISUAL.2019.8933670
DO - 10.1109/VISUAL.2019.8933670
M3 - Conference contribution
AN - SCOPUS:85078068072
T3 - 2019 IEEE Visualization Conference, VIS 2019
SP - 136
EP - 140
BT - 2019 IEEE Visualization Conference, VIS 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 IEEE Visualization Conference, VIS 2019
Y2 - 20 October 2019 through 25 October 2019
ER -