TY - GEN
T1 - Phishing detection with popular search engines
T2 - 4th Canada-France MITACS Workshop on Foundations and Practice of Security, FPS 2011
AU - Huh, Jun Ho
AU - Kim, Hyoungshick
PY - 2012
Y1 - 2012
N2 - We propose a new phishing detection heuristic based on the search results returned from popular web search engines such as Google, Bing and Yahoo. The full URL of a website a user intends to access is used as the search string, and the number of results returned and the ranking of the website are used for classification. Most of the time, legitimate websites get back a large number of results and are ranked first, whereas phishing websites get back no results and/or are not ranked at all. To demonstrate the effectiveness of our approach, we experimented with four well-known classification algorithms - Linear Discriminant Analysis, Naïve Bayesian, K-Nearest Neighbour, and Support Vector Machine - and observed their performance. The K-Nearest Neighbour algorithm performed best, achieving a true positive rate of 98% and false positive and false negative rates of 2%. We used new legitimate websites and phishing websites as our dataset to show that our approach works well even on newly launched websites/webpages - such websites are often misclassified in existing blacklisting and whitelisting approaches.
AB - We propose a new phishing detection heuristic based on the search results returned from popular web search engines such as Google, Bing and Yahoo. The full URL of a website a user intends to access is used as the search string, and the number of results returned and the ranking of the website are used for classification. Most of the time, legitimate websites get back a large number of results and are ranked first, whereas phishing websites get back no results and/or are not ranked at all. To demonstrate the effectiveness of our approach, we experimented with four well-known classification algorithms - Linear Discriminant Analysis, Naïve Bayesian, K-Nearest Neighbour, and Support Vector Machine - and observed their performance. The K-Nearest Neighbour algorithm performed best, achieving a true positive rate of 98% and false positive and false negative rates of 2%. We used new legitimate websites and phishing websites as our dataset to show that our approach works well even on newly launched websites/webpages - such websites are often misclassified in existing blacklisting and whitelisting approaches.
KW - Classification
KW - Phishing detection
KW - URL Reputation
UR - https://www.scopus.com/pages/publications/84862930433
U2 - 10.1007/978-3-642-27901-0_15
DO - 10.1007/978-3-642-27901-0_15
M3 - Conference contribution
AN - SCOPUS:84862930433
SN - 9783642279003
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 194
EP - 207
BT - Foundations and Practice of Security - 4th Canada-France MITACS Workshop, FPS 2011, Revised Selected Papers
Y2 - 12 May 2011 through 13 May 2011
ER -