SlideShare a Scribd company logo
Fast Nearest Neighbor Search with Keywords 
ABSTRACT: 
Conventional spatial queries, such as range search and nearest neighbor retrieval, 
involve only conditions on objects’ geometric properties. Today, many modern 
applications call for novel forms of queries that aim to find objects satisfying both 
a spatial predicate, and a predicate on their associated texts. For example, instead 
of considering all the restaurants, a nearest neighbor query would instead ask for 
the restaurant that is the closest among those whose menus contain “steak, 
spaghetti, brandy” all at the same time. Currently, the best solution to such queries 
is based on the IR2 -tree, which, as shown in this paper, has a few deficiencies that 
seriously impact its efficiency. Motivated by this, we develop a new access method 
called the spatial inverted index that extends the conventional inverted index to 
cope with multidimensional data, and comes with algorithms that can answer 
nearest neighbor queries with keywords in real time. As verified by experiments, 
the proposed techniques outperform the IR2 -tree in query response time 
significantly, often by a factor of orders of magnitude.
EXISTING SYSTEM: 
 Spatial queries with keywords have not been extensively explored. In the 
past years, the community has sparked enthusiasm in studying keyword 
search in relational databases. 
 It is until recently that attention was diverted to multidimensional data. The 
best method to date for nearest neighbor search with keywords is due to 
Felipe et al.. They nicely integrate two well-known concepts: R-tree, a 
popular spatial index, and signature file, an effective method for keyword-based 
document retrieval. By doing so they develop a structure called the 
IR2 -tree, which has the strengths of both R-trees and signature files. 
 Like R-trees, the IR2 - tree preserves objects’ spatial proximity, which is the 
key to solving spatial queries efficiently. On the other hand, like signature 
files, the IR2 -tree is able to filter a considerable portion of the objects that 
do not contain all the query keywords, thus significantly reducing the 
number of objects to be examined. 
DISADVANTAGES OF EXISTING SYSTEM: 
 Fail to provide real time answers on difficult inputs.
 The real nearest neighbor lies quite far away from the query point, 
while all the closer neighbors are missing at least one of the query 
keywords. 
PROPOSED SYSTEM: 
 In this paper, we design a variant of inverted index that is optimized for 
multidimensional points, and is thus named the spatial inverted index (SI-index). 
This access method successfully incorporates point coordinates into 
a conventional inverted index with small extra space, owing to a delicate 
compact storage scheme. 
 Meanwhile, an SI-index preserves the spatial locality of data points, and 
comes with an R-tree built on every inverted list at little space overhead. As 
a result, it offers two competing ways for query processing. 
 We can (sequentially) merge multiple lists very much like merging 
traditional inverted lists by ids. Alternatively, we can also leverage the R-trees 
to browse the points of all relevant lists in ascending order of their 
distances to the query point. As demonstrated by experiments, the SI-index 
significantly outperforms the IR2 -tree in query efficiency, often by a factor 
of orders of magnitude.
ADVANTAGES OF PROPOSED SYSTEM: 
 Distance browsing is easy with R-trees. In fact, the best-first algorithm is 
exactly designed to output data points in ascending order of their distances 
 It is straight forward to extend our compression scheme to any dimensional 
space 
SYSTEM REQUIREMENTS: 
HARDWARE REQUIREMENTS: 
 System : Pentium IV 2.4 GHz. 
 Hard Disk : 40 GB. 
 Floppy Drive : 1.44 Mb. 
 Monitor : 15 VGA Colour. 
 Mouse : Logitech. 
 Ram : 512 Mb.
SOFTWARE REQUIREMENTS: 
 Operating system : Windows XP/7. 
 Coding Language : JAVA/J2EE 
 IDE : Netbeans 7.4 
 Database : MYSQL 
REFERENCE: 
Yufei Tao and Cheng Sheng “Fast Nearest Neighbor Search with Keywords” 
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 
VOL. 26, NO. 4, APRIL 2014.

More Related Content

JPJ1422 Fast Nearest Neighbour Search With Keywords

  • 1. Fast Nearest Neighbor Search with Keywords ABSTRACT: Conventional spatial queries, such as range search and nearest neighbor retrieval, involve only conditions on objects’ geometric properties. Today, many modern applications call for novel forms of queries that aim to find objects satisfying both a spatial predicate, and a predicate on their associated texts. For example, instead of considering all the restaurants, a nearest neighbor query would instead ask for the restaurant that is the closest among those whose menus contain “steak, spaghetti, brandy” all at the same time. Currently, the best solution to such queries is based on the IR2 -tree, which, as shown in this paper, has a few deficiencies that seriously impact its efficiency. Motivated by this, we develop a new access method called the spatial inverted index that extends the conventional inverted index to cope with multidimensional data, and comes with algorithms that can answer nearest neighbor queries with keywords in real time. As verified by experiments, the proposed techniques outperform the IR2 -tree in query response time significantly, often by a factor of orders of magnitude.
  • 2. EXISTING SYSTEM:  Spatial queries with keywords have not been extensively explored. In the past years, the community has sparked enthusiasm in studying keyword search in relational databases.  It is until recently that attention was diverted to multidimensional data. The best method to date for nearest neighbor search with keywords is due to Felipe et al.. They nicely integrate two well-known concepts: R-tree, a popular spatial index, and signature file, an effective method for keyword-based document retrieval. By doing so they develop a structure called the IR2 -tree, which has the strengths of both R-trees and signature files.  Like R-trees, the IR2 - tree preserves objects’ spatial proximity, which is the key to solving spatial queries efficiently. On the other hand, like signature files, the IR2 -tree is able to filter a considerable portion of the objects that do not contain all the query keywords, thus significantly reducing the number of objects to be examined. DISADVANTAGES OF EXISTING SYSTEM:  Fail to provide real time answers on difficult inputs.
  • 3.  The real nearest neighbor lies quite far away from the query point, while all the closer neighbors are missing at least one of the query keywords. PROPOSED SYSTEM:  In this paper, we design a variant of inverted index that is optimized for multidimensional points, and is thus named the spatial inverted index (SI-index). This access method successfully incorporates point coordinates into a conventional inverted index with small extra space, owing to a delicate compact storage scheme.  Meanwhile, an SI-index preserves the spatial locality of data points, and comes with an R-tree built on every inverted list at little space overhead. As a result, it offers two competing ways for query processing.  We can (sequentially) merge multiple lists very much like merging traditional inverted lists by ids. Alternatively, we can also leverage the R-trees to browse the points of all relevant lists in ascending order of their distances to the query point. As demonstrated by experiments, the SI-index significantly outperforms the IR2 -tree in query efficiency, often by a factor of orders of magnitude.
  • 4. ADVANTAGES OF PROPOSED SYSTEM:  Distance browsing is easy with R-trees. In fact, the best-first algorithm is exactly designed to output data points in ascending order of their distances  It is straight forward to extend our compression scheme to any dimensional space SYSTEM REQUIREMENTS: HARDWARE REQUIREMENTS:  System : Pentium IV 2.4 GHz.  Hard Disk : 40 GB.  Floppy Drive : 1.44 Mb.  Monitor : 15 VGA Colour.  Mouse : Logitech.  Ram : 512 Mb.
  • 5. SOFTWARE REQUIREMENTS:  Operating system : Windows XP/7.  Coding Language : JAVA/J2EE  IDE : Netbeans 7.4  Database : MYSQL REFERENCE: Yufei Tao and Cheng Sheng “Fast Nearest Neighbor Search with Keywords” IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 26, NO. 4, APRIL 2014.