TY - JOUR
T1 - PasswordTensor
T2 - Analyzing and explaining password strength using tensor decomposition
AU - Shin, Youjin
AU - Woo, Simon S.
N1 - Publisher Copyright: © 2022
PY - 2022/5
Y1 - 2022/5
AB - Textual passwords are widely used for user authentication in a variety of applications. Passwords that are easy to remember are also easy to guess, while complex and long passwords that provide strong security are difficult to remember. Moreover, there has been limited quantitative research on the factors that make passwords strong. In this research, we aim to expand our understanding of passwords through the lens of data-driven analysis by characterizing a large number of password datasets with four different hypotheses. In particular, we use tensor decomposition, a method that is effective in analyzing unlabeled high-dimensional data. We first obtain 362,805 passwords from four different leaked password datasets. Next, we generate syntactic and semantic features for each password, then classify each password into one of three strength groups using a statistical guessing attack model. Finally, we construct a third-order password tensor and decompose it using the PARAFAC2 algorithm to examine the main characteristics that make passwords strong. We also apply an orthogonal constraint to the component matrix to mitigate the uniqueness problem. To select the optimal rank and constraint, we compare three types of constraints in terms of computational time, reconstruction ratio, and CORCONDIA (core consistency diagnostic) score. Through various statistical and tensor decomposition analyses, we identify the dominant factors that influence password strength. In addition, we extend our tensor decomposition-based model to strength retrieval for evaluating new passwords. This strength retrieval model can quickly estimate the strength of a new password and provide recommendations for strengthening it. We hope that our data-driven model can help validate widely accepted password composition policies and suggestion methods, and provide insights for designing better password suggestion systems and password composition policies.
KW - Authentication
KW - Orthogonality
KW - PARAFAC2
KW - Password
KW - Tensor decomposition
UR - https://www.scopus.com/pages/publications/85124422797
U2 - 10.1016/j.cose.2022.102634
DO - 10.1016/j.cose.2022.102634
M3 - Article
AN - SCOPUS:85124422797
SN - 0167-4048
VL - 116
JO - Computers and Security
JF - Computers and Security
M1 - 102634
ER -