A partitioning method for high dimensional data

Seunghoon Lee, Sung Woo Bang, Bo Keong Kim, Jaekwang Kim, Jee Hyong Lee

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Nearest neighbor search in high-dimensional space is an important operation in many applications, such as data mining and multimedia database. Evaluating similarities of a point to all other points in high-dimensional space need the high computational cost. For reducing the computational cost, index-structures are frequently used. Most of these index-structures are built by partitioning the data set based on a specific criterion. However, partitioning approaches potentially have a problem failing to find the nearest neighbor which is caused by disjoint partitions. In this paper, we propose an Error Minimizing Partitioning (E-MP) method with a novel tree structure, which minimizes the failure problem in finding the nearest neighbors. E-MP divides the data into subsets with considering the distribution of data set. For partitioning data set, the proposed method finds the first principal component of the data set using the principal component analysis (PCA). And then, the method finds the centroid of data set. Finally, it decides the partitioning hyper-plane that passes the centroid and is perpendicular to the principal component vector. We also make a comparative study of existing methods and the proposed method, to verify the usability of our method.

Original languageEnglish
Title of host publicationProceedings of the 4th International Conference on Ubiquitous Information Management and Communication, ICUIMC 10
Pages234-237
Number of pages4
DOIs
StatePublished - 2010
Externally publishedYes
Event4th International Conference on Ubiquitous Information Management and Communication, ICUIMC'10 - Suwon, Korea, Republic of
Duration: 14 Jan 201015 Jan 2010

Publication series

NameProceedings of the 4th International Conference on Ubiquitous Information Management and Communication ICUIMC 10

Conference

Conference4th International Conference on Ubiquitous Information Management and Communication, ICUIMC'10
Country/TerritoryKorea, Republic of
CitySuwon
Period14/01/1015/01/10

Keywords

  • high dimensionality
  • index tree
  • nearest neighbor search
  • similarity search

Fingerprint

Dive into the research topics of 'A partitioning method for high dimensional data'. Together they form a unique fingerprint.

Cite this