Image Search: Then and Now
Integrated Knowledge Solutions
iksinc@yahoo.com
sikrishan@gmail.com
iksinc.wordpress.com
Outline
• Introduction
• Image = Content + Context
• Content Based Image Retrieval (CBIR)
• Bridging the Semantic Gap
• Using Social Interactions for Retrieval
• Where do we go from here
What is Image Search?
• Image search means retrieving images from an
image database that satisfy the user’s need.
• The user need may be expressed in the following
ways:
– Keywords or text describing the image content
– An exemplar image
• Other names for image search
– Image retrieval
– Image similarity search
– Content based image retrieval (CBIR)
Document Search Not a New Problem
Nalanda University was one of the first universities
in the world, founded in the 5th Century BC, and
reported to have been visited by the Buddha during
his lifetime. At its peak, in the 7th century AD,
Nalanda held some 10,000 students when it was
visited by the Chinese scholar Xuanzang.
The Royal Library of Alexandria, in Egypt, seems to have been
the largest and most significant great library of the ancient
world. It functioned as a major center of scholarship from its
construction in the third century B.C. until the Roman
conquest of Egypt in 48 B.C.
However, Earlier:
Few Document Producers, Many Document Consumers
But Nowadays:
No Distinction Between Document Producers and Consumers
Some Relevant Numbers
Flickr has over 6 billion pictures as of August 2011,
and 3.5 million images are uploaded daily.
Photobucket has more than 10 billion images, and
over 4 million images are uploaded every day.
Facebook has over 60 billion photos, and more than
350 million photos are uploaded every day.
Instagram has over 20 billion photos. About 60
million photos are uploaded every day.
An image nowadays is not just a
picture; it is a picture with a
thousand words.
Image = Content + Context
Tags: cherry blossom, Japantown, San Francisco, Peace Pagoda
(The photo is the content; its tags are the context.)
So, image retrieval should benefit
from the contextual component, if
present.
How?
But first, let us look at image
retrieval from the content
perspective only.
History of Image Retrieval
[Timeline, 1993-2002] QBIC/signal similarity → concept/semantic similarity → concept plus context
A Typical QBIC Type Image Retrieval
System
[Pipeline diagram] Media Collection → Feature Extraction → Features → Indexing & Matching → Retrieved Results. The query passes through its own Feature Extraction step, and Relevance Feedback from the retrieved results loops back into indexing and matching.
Such systems/approaches are often referred to as Content Based Image Retrieval (CBIR).
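As a rough illustration of this pipeline, here is a minimal sketch in Python. The grayscale "images" (short pixel lists) and the histogram feature are invented for the example; a real CBIR system would use richer features, but the feature-extract / index / match-by-distance structure is the same.

```python
# Minimal sketch of a QBIC-style CBIR loop: whole-image intensity histograms
# as features, L1 distance for matching. "Images" are toy pixel lists.

def histogram(pixels, bins=4):
    """Quantize 0-255 intensity values into a normalized histogram (the feature)."""
    h = [0] * bins
    for p in pixels:
        h[min(p * bins // 256, bins - 1)] += 1
    n = len(pixels)
    return [c / n for c in h]

def l1(a, b):
    """L1 (city-block) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def search(collection, query_pixels, top_k=2):
    """Index step: extract a feature per image; match step: rank by distance."""
    qf = histogram(query_pixels)
    ranked = sorted(collection.items(), key=lambda kv: l1(histogram(kv[1]), qf))
    return [name for name, _ in ranked[:top_k]]

collection = {
    "dark":  [10, 20, 30, 15],
    "mid":   [120, 130, 140, 125],
    "light": [240, 250, 230, 245],
}
print(search(collection, [12, 18, 25, 20]))  # the "dark" image ranks first
```

Relevance feedback would sit on top of this loop, re-weighting the query feature from the user's accepted/rejected results.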
Semantic Gap
Early systems produced results
wherein the retrieved
documents were visually similar
(similar at the signal level) but not
necessarily depicting
the same semantic concept.
Arnold Smeulders, Marcel Worring, Simone Santini, Amarnath Gupta, and
Ramesh Jain, "Content-Based Image Retrieval at the End of the Early Years,"
IEEE Transactions on Pattern Analysis and Machine Intelligence, December 2000
http://www.searchenginejournal.com/7-similarity-based-image-search-engines/8265/
Semantic Gap
Users also like to query using descriptive
words rather than query images or other
multimedia objects. This requires retrieval
systems to correlate low-level features
with high-level concepts.
Visually dissimilar
images representing
the same concept.
Semantic Gap Challenge
How to Bridge the Semantic Gap?
Manual annotation
Use machine learning to:
• Build image category classifiers to
perform semantic filtering of the
results
• Build specific detectors for objects to
associate concepts with images
• Build object models using low-level
features
Exploit context:
• Text surrounding images
• Associated sound track and
closed captions in videos
• Query history
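To make the first machine-learning idea concrete, here is a hedged sketch of semantic filtering: a toy nearest-centroid concept classifier re-checks retrieval results and keeps only images whose predicted concept matches the query concept. The concepts, 2-D features, and data are invented for illustration; real systems use far richer classifiers.

```python
# Sketch of semantic filtering with a toy nearest-centroid concept classifier.

def centroid(vectors):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def predict(models, feature):
    """models: concept -> centroid; return the nearest concept (squared distance)."""
    def d2(c):
        return sum((a - b) ** 2 for a, b in zip(models[c], feature))
    return min(models, key=d2)

# Toy "training": 2-D features for two concepts
models = {
    "beach":  centroid([[0.9, 0.1], [0.8, 0.2]]),
    "forest": centroid([[0.1, 0.9], [0.2, 0.8]]),
}

# CBIR returned three candidates; keep only those classified as "beach"
retrieved = {"img1": [0.85, 0.15], "img2": [0.15, 0.85], "img3": [0.9, 0.2]}
filtered = [name for name, f in retrieved.items()
            if predict(models, f) == "beach"]
print(filtered)  # img1 and img3 survive the semantic filter
```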
Crowdsourcing for Manual Annotation
Example of Image Search using Keywords
Search result in 2010
Example of Image Search using Keywords
Search result in 2014
The results are better organized in sub-categories
Example of Image Search using Keywords
Example of Image Search using Keywords
Search result in 2014
Again, the results are better organized in sub-categories
Exploiting Context: An Example
Kulesh, Petrushin and Sethi, “The PERSEUS Project: Creating Personalized Multimedia News Portal,”
Proceedings Second Int’l Workshop on Multimedia Data Mining, 2001
Machine Learning of Image Concepts
• Challenging problem
• Presence of multiple concepts/multiple instances
• Disproportionate number of negative examples
• Manpower needed to label training examples
Feature Extraction Issues
Whole image based features.
Easy to use but not very
effective
Region based features. Both
regular region structure and
segmented regions are popular
Salient objects based features.
Connected regions
corresponding to dominant
visual properties of objects in an
image
Scale Invariant Feature Transform
(SIFT) Descriptors
SIFT descriptors or their variants are
currently the most popular features
in use. Each image generates
thousands of features (key-point
descriptors), with each feature
typically consisting of 128 values.
http://www.vlfeat.org/
D. G. Lowe, “Distinctive image
features from scale-invariant
keypoints,” IJCV, 2004.
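Extracting SIFT descriptors requires a library implementation (e.g. VLFeat above), but the matching step Lowe pairs with these descriptors, nearest-neighbour matching with a ratio test, can be sketched in pure Python. The short toy vectors below stand in for 128-dimensional SIFT descriptors.

```python
# Lowe's ratio test for matching keypoint descriptors (toy 4-D vectors
# standing in for 128-D SIFT descriptors).

def dist(a, b):
    """Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def ratio_match(desc_a, desc_b, ratio=0.8):
    """For each descriptor in image A, accept its nearest neighbour in B only
    if it is clearly closer than the second-nearest (the ratio test)."""
    matches = []
    for i, d in enumerate(desc_a):
        ranked = sorted(range(len(desc_b)), key=lambda j: dist(d, desc_b[j]))
        best, second = ranked[0], ranked[1]
        if dist(d, desc_b[best]) < ratio * dist(d, desc_b[second]):
            matches.append((i, best))  # (index in A, index in B)
    return matches

a = [[1, 0, 0, 0], [0, 1, 0, 0]]
b = [[0.9, 0.1, 0, 0], [0, 0, 1, 0], [0, 0.95, 0.05, 0]]
print(ratio_match(a, b))  # each descriptor in a finds one unambiguous match
```

The ratio test is what makes SIFT matching robust: ambiguous descriptors, whose two nearest neighbours are nearly equidistant, are simply discarded.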
Learning Image Concepts
• Both supervised and unsupervised
learning methods (SVM, DT, AdaBoost,
VQ etc.) have been used
• Early work was limited to a few tens of
categories; however, some current
systems can work with thousands of
categories/concepts
VQ Based Learning Classifier
[Diagram] A Test Image is matched against per-concept codebooks (Water Codebook, Sky Codebook, Fire Codebook); the Best Codebook determines the Label.
Mustafa & Sethi (2004)
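A hedged sketch of how such a VQ classifier can work; this is one plausible reading of the diagram, not necessarily the authors' exact method, and the codebooks and features are toy values. Each concept has a codebook of codewords; a test image's feature vectors are quantized against every codebook, and the label whose codebook gives the lowest total quantization error wins.

```python
# Sketch of a VQ-based classifier: lowest total quantization error wins.

def quantization_error(codebook, features):
    """Sum over features of the squared distance to the nearest codeword."""
    def nearest(f):
        return min(sum((a - b) ** 2 for a, b in zip(f, c)) for c in codebook)
    return sum(nearest(f) for f in features)

def classify(codebooks, features):
    """codebooks: label -> list of codewords; return the best-fitting label."""
    return min(codebooks,
               key=lambda label: quantization_error(codebooks[label], features))

codebooks = {
    "water": [[0.0, 0.4, 0.8], [0.1, 0.5, 0.9]],
    "fire":  [[0.9, 0.3, 0.0], [0.8, 0.2, 0.1]],
}
test_features = [[0.05, 0.45, 0.85], [0.1, 0.4, 0.8]]
print(classify(codebooks, test_features))  # "water"
```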
http://vision.stanford.edu/teaching/cs223b/lecture/lecture14_intro_objrecog_bow_cs223b.pdf
Bag of Words Approach
Bag of Words Representation of
Images
Co-occurrence of Bag of Words
[Pipeline diagram] Image Collection → Edge Analysis → Collection of Binary Image Blocks → Clustering → Local Feature Descriptors (Codewords) → Codeword Representation of Images → Co-occurrence Matrices of Local Features → Compute Distances → Image Distance Matrix → Pathfinder Network
Mukhopadhyay, Ma, and Sethi, “Pathfinder Networks for Content Based Image
Retrieval Based on Automated Shape Feature Discovery,” ISMSE 2004
Co-occurrence of BoW
Original image
Representation by
feature indices
(cluster membership)
Co-occurrence matrix
Hausdorff metric: H(A, B) = max{ h(A, B), h(B, A) }, where
h(A, B) = max over a in A of min over b in B of d(a, b),
with the Manhattan distance as the underlying point metric d.
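These two formulas translate directly into code. A small sketch, using the Manhattan distance as the point metric and toy 2-D point sets:

```python
# Hausdorff distance between two point sets, with Manhattan distance as the
# underlying point metric.

def manhattan(a, b):
    """d(a, b): city-block distance between two points."""
    return sum(abs(x - y) for x, y in zip(a, b))

def h(A, B):
    """Directed Hausdorff: h(A,B) = max over a in A of min over b in B of d(a,b)."""
    return max(min(manhattan(a, b) for b in B) for a in A)

def hausdorff(A, B):
    """Symmetric Hausdorff: H(A,B) = max{ h(A,B), h(B,A) }."""
    return max(h(A, B), h(B, A))

A = [(0, 0), (1, 0)]
B = [(0, 1), (5, 0)]
print(hausdorff(A, B))  # 4: driven by the outlier point (5, 0) in B
```

Note the asymmetry of the directed distance h, which is why the metric takes the maximum of both directions.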
Notice how similar
images are placed
together in the graph
Object Detectors for Image Concepts
PASCAL Visual Object Classes Challenge
LabelMe Project
http://labelme.csail.mit.edu/
Web-based annotation tool to segment and label image
regions. Labeled objects in images are used as training images
to build object detectors.
IMARS provides a large number of built-in classifiers for visual categories that cover places, people, objects, settings,
activities and events, and it is easy to add new ones. IMARS can run on a PC or laptop (a trial version is available at IBM
alphaWorks) and can also operate at large scale, batch-processing millions of images and videos
per day. Several demos of IMARS are available (see IMARS demos).
Image Category Classifiers Examples
Semantic labeling. (a) An MPE semantic retrieval system groups images by semantic
concept and learns a probabilistic model for each concept. (b) The system represents
each image by a vector of posterior concept probabilities.
From Pixels to Semantic Spaces: Advances in Content-Based Image
Retrieval (Nuno Vasconcelos, IEEE Computer, July 2007)
Image Classification via Probabilistic
Modeling
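A minimal sketch of the posterior-probability representation described above, assuming uniform concept priors and toy per-concept likelihoods (the numbers are invented). Each image ends up represented by a vector of posterior concept probabilities, the "semantic space" coordinates.

```python
# Sketch of the semantic-space representation: given per-concept likelihoods
# P(image | concept) (toy numbers) and uniform priors, Bayes' rule yields the
# vector of posterior concept probabilities that represents the image.

def posteriors(likelihoods):
    """likelihoods: concept -> P(image | concept); uniform prior assumed,
    so posteriors are just normalized likelihoods."""
    total = sum(likelihoods.values())
    return {c: p / total for c, p in likelihoods.items()}

image_likelihoods = {"sky": 0.30, "water": 0.15, "fire": 0.05}
post = posteriors(image_likelihoods)
print(post)  # sky: 0.6, water: 0.3, fire: 0.1 -- the image's semantic vector
```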
Image = Content + Context
Tags: cherry blossom, Japantown, San Francisco, Peace Pagoda
(The photo is the content; its tags are the context.)
Tagging
All-time most popular tags at Flickr
About Tags
• User centered
• Imprecise and often overly personalized
• Tag distribution follows power law
• Most users use very few distinct tags, while a small group of users works
with an extremely large set of tags
• Also known as Folksonomy, social tagging, and social classification
Why Not Use Social Tags for Retrieval?
Problem: The most relevant tag is
often not at the top of the list;
fewer than 10% of images have
their most relevant tag at the top
of the list.
Solution: Improve tagging by
suggesting potential tags to a
user, tag ranking, tag
completion, etc.
Dong Liu, Xian-Sheng Hua, Linjun Yang, Meng Wang, and Hong-Jiang Zhang,
"Tag Ranking," WWW 2009, Madrid, Spain
Tag Recommendation using Tags
Co-occurrences
Given a target image and initial tags, use the co-occurrence of tags to
recommend tags for the target image. This approach does not take
visual feature co-occurrences into account.
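A minimal sketch of this idea; the tag vocabulary and tagged images are invented. Co-occurrence counts are accumulated over a tagged collection, then candidate tags are scored by how often they co-occur with the target image's initial tags.

```python
# Tag recommendation from tag co-occurrence alone (no visual features).

from collections import Counter
from itertools import combinations

tagged_images = [
    {"beach", "sea", "sand"},
    {"beach", "sea", "sunset"},
    {"sea", "boat"},
    {"city", "night"},
]

# Count how often each ordered pair of tags appears together.
cooc = Counter()
for tags in tagged_images:
    for a, b in combinations(sorted(tags), 2):
        cooc[(a, b)] += 1
        cooc[(b, a)] += 1

def recommend(initial_tags, k=2):
    """Score candidate tags by co-occurrence with the initial tags."""
    scores = Counter()
    for t in initial_tags:
        for (a, b), c in cooc.items():
            if a == t and b not in initial_tags:
                scores[b] += c
    return [tag for tag, _ in scores.most_common(k)]

print(recommend({"beach"}))  # "sea" ranks first: it co-occurs most with "beach"
```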
Tag Recommendation using Tags
Co-occurrences and Visual Similarity
Kucuktunc, Sevil, Tosun, Zitouni, Duygulu, and Can (SAMT 08)
Given a target image and initial tags, use the existing tagged images to
suggest tags for the target image.
Tag Ranking
Tag Ranking: Another Approach
Dong Liu, Xian-Sheng Hua, Linjun Yang, Meng Wang, and Hong-Jiang Zhang,
"Tag Ranking," WWW 2009, Madrid, Spain
How to Compute Tag Similarity
Tag Recommendation After Tag
Ranking
• Given an untagged image, find its k most visually similar images
• Pool the top two ranked tags from the k images and select the unique tags as
recommended tags
Tag Completion
The complete tag matrix is
generated by imposing
constraints based on visual
similarity, tag-to-tag similarity,
and similarity with the initial
tag matrix. The matrix
completion is done by an
optimization procedure.
Wu and Jain, IEEE PAMI, January 2011
What about Taggers & Commenters?
Question: How can we incorporate taggers'/commenters'
characteristics for improved tag recommendations?
Answer: Use three sets of features, derived from: the image to
be tagged, the user's tag history, and the user's social interactions
Tag History & Social Interaction
Features
Tag history features are based
on the tags the user has used
in the past
Social interaction features are
derived from tags/comments
posted by the user’s
friends/favorite posters
X. Chen & H. Shin, ICDM 2010
Current Status of Image Search
• Extensive interest as evident from conferences, journals, and
special issues
• Overall, solid progress is being made
• Efforts towards performance evaluation with benchmarked
collections are gaining more traction
• Integration of content and context through tags and
comments is receiving increasing attention to help improve
retrieval
• Killer applications are beginning to emerge as visual search
gains prominence
• Need for more applications outside entertainment
Performance Evaluation Efforts
ImageCLEF2013
- Annotation Task:
- 250,000 training images
- 95 (development) / 116 (test) concepts to be identified
- Considerable label noise in the training set, due to automatic label
extraction from websites
Performance Evaluation Efforts
TRECVID workshops, an offshoot of TREC, have been held yearly since
2003. The goal of the workshops is to encourage research in content-based
video retrieval and analysis by providing large test collections, realistic system
tasks, uniform scoring procedures, and a forum for organizations interested in
comparing their results.
Application Examples
Tattoo-ID: Automatic Tattoo Image Retrieval for Suspect & Victim Identification (Anil K. Jain,
Jung-Eun Lee, and Rong Jin)
CBIR for Whole Slide Imageries
• The availability of digital whole-slide data sets
represents an enormous opportunity to carry out
new forms of numerical and data-driven query,
in modes not based on textual, ontological or
lexical matching.
– Search image repositories with whole images or
image regions of interest
– Carry out search in real time via scalable
computational architectures
[Pipeline diagram] Extraction from image repositories based upon spatial information → analysis of the data in the digital domain → a resultant surface map or a gallery of matching images
Slide courtesy of Ulysses J. Balis, M.D.
Director, Division of Pathology Informatics
Department of Pathology
University of Michigan Health System
Medical Image Retrieval
Example queries:
• Text query: "Find all the cases in which a tumor decreased in size
for less than three months post treatment, then
resumed a growth pattern after that period"
• Text + medical image query: "Find images with large-sized frontal-lobe brain tumors for
patients approximately 35 years old"
[Diagram] A text query passes through concept extraction against a medical
ontology, yielding text-based query concepts (CUI1 … CUIn). A query medical
image passes through visual analysis against an image-based ontology,
yielding general and specialized image-based concepts (VB-Gen/VB-Spec
visual signatures mapped to CUIs).
Image Search Products
http://www.picalike.com/products/similarity-search.php
Image Search Products
http://www.pcsso.com/
Image Search Products
Image Search Products
http://viral.image.ntua.gr/
Take Home Message
• Image/video retrieval is moving into the
commercial domain. A lot more activity is expected
in the near future
• Multimodal/cross-modal retrieval is gaining
importance
• Approaches combining social search and visual
search techniques are expected to gain
prominence
• Crowdsourcing is a cheap and effective way of
tagging media
Acknowledgement
• This presentation is based on the work of
numerous researchers from the MIR/ML/CVPR
community. I have tried to give
credit/references wherever possible. Any
omission is unintentional and I apologize for
that.
• I also want to thank my present and past
students and collaborators.
Questions?
Email: iksinc@yahoo.com
Email: sikrishan@gmail.com

Editor's Notes

1. MPE: Minimum Probability of Error