TY - GEN
T1 - A partitioning method for high dimensional data
AU - Lee, Seunghoon
AU - Bang, Sung Woo
AU - Kim, Bo Keong
AU - Kim, Jaekwang
AU - Lee, Jee Hyong
PY - 2010
Y1 - 2010
N2 - Nearest neighbor search in high-dimensional space is an important operation in many applications, such as data mining and multimedia database. Evaluating similarities of a point to all other points in high-dimensional space need the high computational cost. For reducing the computational cost, index-structures are frequently used. Most of these index-structures are built by partitioning the data set based on a specific criterion. However, partitioning approaches potentially have a problem failing to find the nearest neighbor which is caused by disjoint partitions. In this paper, we propose an Error Minimizing Partitioning (E-MP) method with a novel tree structure, which minimizes the failure problem in finding the nearest neighbors. E-MP divides the data into subsets with considering the distribution of data set. For partitioning data set, the proposed method finds the first principal component of the data set using the principal component analysis (PCA). And then, the method finds the centroid of data set. Finally, it decides the partitioning hyper-plane that passes the centroid and is perpendicular to the principal component vector. We also make a comparative study of existing methods and the proposed method, to verify the usability of our method.
AB - Nearest neighbor search in high-dimensional space is an important operation in many applications, such as data mining and multimedia database. Evaluating similarities of a point to all other points in high-dimensional space need the high computational cost. For reducing the computational cost, index-structures are frequently used. Most of these index-structures are built by partitioning the data set based on a specific criterion. However, partitioning approaches potentially have a problem failing to find the nearest neighbor which is caused by disjoint partitions. In this paper, we propose an Error Minimizing Partitioning (E-MP) method with a novel tree structure, which minimizes the failure problem in finding the nearest neighbors. E-MP divides the data into subsets with considering the distribution of data set. For partitioning data set, the proposed method finds the first principal component of the data set using the principal component analysis (PCA). And then, the method finds the centroid of data set. Finally, it decides the partitioning hyper-plane that passes the centroid and is perpendicular to the principal component vector. We also make a comparative study of existing methods and the proposed method, to verify the usability of our method.
KW - high dimensionality
KW - index tree
KW - nearest neighbor search
KW - similarity search
UR - https://www.scopus.com/pages/publications/84863275718
U2 - 10.1145/2108616.2108657
DO - 10.1145/2108616.2108657
M3 - Conference contribution
AN - SCOPUS:84863275718
SN - 9781605588933
T3 - Proceedings of the 4th International Conference on Ubiquitous Information Management and Communication ICUIMC 10
SP - 234
EP - 237
BT - Proceedings of the 4th International Conference on Ubiquitous Information Management and Communication, ICUIMC 10
T2 - 4th International Conference on Ubiquitous Information Management and Communication, ICUIMC'10
Y2 - 14 January 2010 through 15 January 2010
ER -