TY - GEN
T1 - Black-box and Target-specific Attack Against Interpretable Deep Learning Systems
AU - Abdukhamidov, Eldor
AU - Juraev, Firuz
AU - Abuhamad, Mohammed
AU - Abuhmed, Tamer
N1 - Publisher Copyright:
© 2022 Owner/Author.
PY - 2022/5/30
Y1 - 2022/5/30
N2 - Deep neural network models are susceptible to malicious manipulations even in black-box settings. Providing explanations for DNN model decisions introduces human oversight that can reveal whether a sample is benign or adversarial, even when an attack achieves a high success rate. However, interpretable deep learning systems (IDLSes) have been shown to be susceptible to adversarial manipulations in white-box settings. Attacking IDLSes in black-box settings is challenging and remains an open research problem. In this work, we propose a black-box version of the white-box AdvEdge attack against IDLSes that is query-efficient and gradient-free, requiring no knowledge of the target DNN model or its coupled interpreter. Our approach combines transfer-based and score-based techniques using the effective microbial genetic algorithm (MGA). It achieves a high attack success rate with a small number of queries while maintaining high similarity between the interpretations of adversarial and benign samples.
AB - Deep neural network models are susceptible to malicious manipulations even in black-box settings. Providing explanations for DNN model decisions introduces human oversight that can reveal whether a sample is benign or adversarial, even when an attack achieves a high success rate. However, interpretable deep learning systems (IDLSes) have been shown to be susceptible to adversarial manipulations in white-box settings. Attacking IDLSes in black-box settings is challenging and remains an open research problem. In this work, we propose a black-box version of the white-box AdvEdge attack against IDLSes that is query-efficient and gradient-free, requiring no knowledge of the target DNN model or its coupled interpreter. Our approach combines transfer-based and score-based techniques using the effective microbial genetic algorithm (MGA). It achieves a high attack success rate with a small number of queries while maintaining high similarity between the interpretations of adversarial and benign samples.
KW - adversarial machine learning
KW - genetic algorithm
KW - interpretable machine learning
KW - single-class attack
KW - target-specific attack
UR - https://www.scopus.com/pages/publications/85131651353
U2 - 10.1145/3488932.3527283
DO - 10.1145/3488932.3527283
M3 - Conference contribution
AN - SCOPUS:85131651353
T3 - ASIA CCS 2022 - Proceedings of the 2022 ACM Asia Conference on Computer and Communications Security
SP - 1216
EP - 1218
BT - ASIA CCS 2022 - Proceedings of the 2022 ACM Asia Conference on Computer and Communications Security
PB - Association for Computing Machinery, Inc
T2 - 17th ACM ASIA Conference on Computer and Communications Security 2022, ASIA CCS 2022
Y2 - 30 May 2022 through 3 June 2022
ER -