Black-box and Target-specific Attack Against Interpretable Deep Learning Systems

  • Eldor Abdukhamidov
  • Firuz Juraev
  • Mohammed Abuhamad
  • Tamer Abuhmed

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Deep neural network (DNN) models are susceptible to malicious manipulations even in black-box settings. Although previous studies achieved high attack success rates, providing explanations for DNN models offers a sense of security through human involvement, since interpretations can reveal whether a sample is benign or adversarial. However, interpretable deep learning systems (IDLSes) have been shown to be susceptible to adversarial manipulations in white-box settings, and attacking IDLSes in black-box settings is challenging and remains an open research problem. In this work, we propose a black-box version of the white-box AdvEdge attack against IDLSes that is query-efficient and gradient-free, requiring no knowledge of the target DNN model or its coupled interpreter. Our approach combines transfer-based and score-based techniques using the effective microbial genetic algorithm (MGA). We achieve a high attack success rate with a small number of queries and a high similarity in interpretations between adversarial and benign samples.
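The microbial genetic algorithm (MGA) named in the abstract is a steady-state evolutionary scheme: two individuals are drawn at random for a tournament, the fitter one (the winner) is left untouched, and the loser copies the winner's genes with some probability before mutating. The sketch below is a minimal, generic MGA over bitstrings with a toy fitness function; the fitness, encoding, and parameter values are illustrative assumptions and do not reproduce the paper's attack implementation, in which fitness would instead score target-class confidence and interpretation similarity under a query budget.

```python
import random

def microbial_ga(fitness, genome_len, pop_size=20, generations=2000,
                 crossover_rate=0.5, mutation_rate=0.02, seed=0):
    """Minimal microbial GA: tournament of two, winner untouched,
    loser infected with winner genes and then point-mutated."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        i, j = rng.sample(range(pop_size), 2)
        winner, loser = (i, j) if fitness(pop[i]) >= fitness(pop[j]) else (j, i)
        for g in range(genome_len):
            if rng.random() < crossover_rate:   # loser copies winner's gene
                pop[loser][g] = pop[winner][g]
            if rng.random() < mutation_rate:    # point mutation on the loser
                pop[loser][g] ^= 1
    return max(pop, key=fitness)

# Toy objective standing in for an attack score (hypothetical):
# count how many bits match a fixed target string.
target = [1] * 32
score = lambda ind: sum(a == b for a, b in zip(ind, target))

best = microbial_ga(score, genome_len=32)
```

Because only the tournament loser is modified, each generation costs at most two fitness evaluations, which is one reason MGA variants suit query-limited black-box settings.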

Original language: English
Title of host publication: ASIA CCS 2022 - Proceedings of the 2022 ACM Asia Conference on Computer and Communications Security
Publisher: Association for Computing Machinery, Inc
Pages: 1216-1218
Number of pages: 3
ISBN (Electronic): 9781450391405
DOIs
State: Published - 30 May 2022
Event: 17th ACM ASIA Conference on Computer and Communications Security 2022, ASIA CCS 2022 - Virtual, Online, Japan
Duration: 30 May 2022 – 3 Jun 2022

Publication series

Name: ASIA CCS 2022 - Proceedings of the 2022 ACM Asia Conference on Computer and Communications Security

Conference

Conference: 17th ACM ASIA Conference on Computer and Communications Security 2022, ASIA CCS 2022
Country/Territory: Japan
City: Virtual, Online
Period: 30/05/22 – 3/06/22

Keywords

  • adversarial machine learning
  • genetic algorithm
  • interpretable machine learning
  • single-class attack
  • target-specific attack
