TY - GEN
T1 - A Look Back on a Function Identification Problem
AU - Koo, Hyungjoon
AU - Park, Soyeon
AU - Kim, Taesoo
N1 - Publisher Copyright: © 2021 Association for Computing Machinery.
PY - 2021/12/6
Y1 - 2021/12/6
N2 - A function recognition problem serves as a basis for further binary analysis and many applications. Although common challenges for function detection are well known, prior works have repeatedly claimed a noticeable result with a high precision and recall. In this paper, we aim to fill the void of what has been overlooked or misinterpreted by closely looking into the previous datasets, metrics, and evaluations with varying case studies. Our major findings are that i) a common corpus like GNU utilities is insufficient to represent the effectiveness of function identification, ii) it is difficult to claim, at least in the current form, that an ML-oriented approach is scientifically superior to deterministic ones like IDA or Ghidra, iii) the current metrics may not be reasonable enough to measure varying function detection cases, and iv) the capability of recognizing functions depends on each tool's strategic or peculiar choice. We perform re-evaluation of existing approaches on our own dataset, demonstrating that not a single state-of-the-art tool dominates all the others. In conclusion, a function detection problem has not yet been fully addressed, and we need a better methodology and metric to make advances in the field of function identification.
KW - Binary
KW - Function identification
KW - Function recognition
KW - Look-back
KW - ML-oriented
UR - https://www.scopus.com/pages/publications/85121574836
U2 - 10.1145/3485832.3488018
DO - 10.1145/3485832.3488018
M3 - Conference contribution
AN - SCOPUS:85121574836
T3 - ACM International Conference Proceeding Series
SP - 158
EP - 168
BT - Proceedings - 37th Annual Computer Security Applications Conference, ACSAC 2021
PB - Association for Computing Machinery
T2 - 37th Annual Computer Security Applications Conference, ACSAC 2021
Y2 - 6 December 2021 through 10 December 2021
ER -