Image Search: Then and Now
Integrated Knowledge Solutions
iksinc@yahoo.com
sikrishan@gmail.com
iksinc.wordpress.com
Outline
• Introduction
• Image = Content + Context
• Content Based Image Retrieval (CBIR)
• Bridging the Semantic Gap
• Using Social Interactions for Retrieval
• Where do we go from here
What is Image Search?
• Image search means retrieving images from an
image database that satisfy the user’s need.
• The user need may be expressed in the following
ways:
– Keywords or text describing the image content
– An exemplar image
• Other names for image search
– Image retrieval
– Image similarity search
– Content based image retrieval (CBIR)
Document Search Not a New Problem
Nalanda University was one of the first universities
in the world, founded in the 5th Century BC, and
reported to have been visited by the Buddha during
his lifetime. At its peak, in the 7th century AD,
Nalanda held some 10,000 students when it was
visited by the Chinese scholar Xuanzang.
The Royal Library of Alexandria, in Egypt, seems to have been
the largest and most significant great library of the ancient
world. It functioned as a major center of scholarship from its
construction in the third century B.C. until the Roman
conquest of Egypt in 48 B.C.
However, Earlier:
Few Document Producers, Many Document Consumers
But Nowadays:
No Distinction Between Document Producers and Consumers
Some Relevant Numbers
Flickr has over 6 billion pictures as of August 2011,
and 3.5 million images are uploaded daily.
Photobucket has more than 10 billion images, and
over 4 million images are uploaded every day.
Facebook has over 60 billion photos, and more than
350 million photos are uploaded every day.
Instagram has over 20 billion photos. About 60
million photos are uploaded every day.
An image nowadays is not just a
picture; it is a picture with a
thousand words.
Image = Content + Context
Tags: cherry blossom, Japantown, San Francisco, Peace Pagoda
(The photo is the content; its tags are the context.)
So, image retrieval should benefit
from the contextual component, if
present.
How?
But first, let us look at image
retrieval from the content
perspective only.
History of Image Retrieval
[Timeline, 1993-2002] QBIC/signal similarity → concept/semantic similarity → concept plus context
A Typical QBIC Type Image Retrieval
System
[Pipeline diagram] Media Collection → Feature Extraction → Features → Indexing & Matching → Retrieved Results. The query passes through its own Feature Extraction step, and Relevance Feedback from the retrieved results loops back into indexing and matching.
Such systems/approaches are often referred to as Content Based Image Retrieval (CBIR).
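As a rough illustration of this pipeline, here is a minimal sketch in Python. The grayscale "images" (short pixel lists) and the histogram feature are invented for the example; a real CBIR system would use richer features, but the feature-extract / index / match-by-distance structure is the same.

```python
# Minimal sketch of a QBIC-style CBIR loop: whole-image intensity histograms
# as features, L1 distance for matching. "Images" are toy pixel lists.

def histogram(pixels, bins=4):
    """Quantize 0-255 intensity values into a normalized histogram (the feature)."""
    h = [0] * bins
    for p in pixels:
        h[min(p * bins // 256, bins - 1)] += 1
    n = len(pixels)
    return [c / n for c in h]

def l1(a, b):
    """L1 (city-block) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def search(collection, query_pixels, top_k=2):
    """Index step: extract a feature per image; match step: rank by distance."""
    qf = histogram(query_pixels)
    ranked = sorted(collection.items(), key=lambda kv: l1(histogram(kv[1]), qf))
    return [name for name, _ in ranked[:top_k]]

collection = {
    "dark":  [10, 20, 30, 15],
    "mid":   [120, 130, 140, 125],
    "light": [240, 250, 230, 245],
}
print(search(collection, [12, 18, 25, 20]))  # the "dark" image ranks first
```

Relevance feedback would sit on top of this loop, re-weighting the query feature from the user's accepted/rejected results.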
Semantic Gap
Early systems produced results
wherein the retrieved
documents were visually similar
(similar at the signal level) but not
necessarily depicting
the same semantic concept.
Arnold Smeulders, Marcel Worring, Simone Santini, Amarnath Gupta, and
Ramesh Jain, "Content-Based Image Retrieval at the End of the Early Years,"
IEEE Transactions on Pattern Analysis and Machine Intelligence, December 2000
http://www.searchenginejournal.com/7-similarity-based-image-search-engines/8265/
Semantic Gap
Users also like to query using descriptive
words rather than query images or other
multimedia objects. This requires retrieval
systems to correlate low-level features
with high-level concepts.
Visually dissimilar
images representing
the same concept.
Semantic Gap Challenge
How to Bridge the Semantic Gap?
Manual annotation
Use machine learning to:
• Build image category classifiers to
perform semantic filtering of the
results
• Build specific detectors for objects to
associate concepts with images
• Build object models using low-level
features
Exploit context:
• Text surrounding images
• Associated sound track and
closed captions in videos
• Query history
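To make the first machine-learning idea concrete, here is a hedged sketch of semantic filtering: a toy nearest-centroid concept classifier re-checks retrieval results and keeps only images whose predicted concept matches the query concept. The concepts, 2-D features, and data are invented for illustration; real systems use far richer classifiers.

```python
# Sketch of semantic filtering with a toy nearest-centroid concept classifier.

def centroid(vectors):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def predict(models, feature):
    """models: concept -> centroid; return the nearest concept (squared distance)."""
    def d2(c):
        return sum((a - b) ** 2 for a, b in zip(models[c], feature))
    return min(models, key=d2)

# Toy "training": 2-D features for two concepts
models = {
    "beach":  centroid([[0.9, 0.1], [0.8, 0.2]]),
    "forest": centroid([[0.1, 0.9], [0.2, 0.8]]),
}

# CBIR returned three candidates; keep only those classified as "beach"
retrieved = {"img1": [0.85, 0.15], "img2": [0.15, 0.85], "img3": [0.9, 0.2]}
filtered = [name for name, f in retrieved.items()
            if predict(models, f) == "beach"]
print(filtered)  # img1 and img3 survive the semantic filter
```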
Crowdsourcing for Manual Annotation
Example of Image Search using Keywords
Search result in 2010
Example of Image Search using Keywords
Search result in 2014
The results are better organized in sub-categories
Example of Image Search using Keywords
Example of Image Search using Keywords
Search result in 2014
Again, the results are better organized in sub-categories
Exploiting Context: An Example
Kulesh, Petrushin and Sethi, “The PERSEUS Project: Creating Personalized Multimedia News Portal,”
Proceedings Second Int’l Workshop on Multimedia Data Mining, 2001
Machine Learning of Image Concepts
• Challenging problem
• Presence of multiple concepts/multiple instances
• Disproportionate number of negative examples
• Manpower needed to label training examples
Feature Extraction Issues
Whole image based features.
Easy to use but not very
effective
Region based features. Both
regular region structure and
segmented regions are popular
Salient objects based features.
Connected regions
corresponding to dominant
visual properties of objects in an
image
Scale Invariant Feature Transform
(SIFT) Descriptors
SIFT descriptors or their variants are
currently the most popular features
in use. Each image generates
thousands of features (key-point
descriptors), with each feature
typically consisting of 128 values.
http://www.vlfeat.org/
D. G. Lowe, “Distinctive image
features from scale-invariant
keypoints,” IJCV, 2004.
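Extracting SIFT descriptors requires a library implementation (e.g. VLFeat above), but the matching step Lowe pairs with these descriptors, nearest-neighbour matching with a ratio test, can be sketched in pure Python. The short toy vectors below stand in for 128-dimensional SIFT descriptors.

```python
# Lowe's ratio test for matching keypoint descriptors (toy 4-D vectors
# standing in for 128-D SIFT descriptors).

def dist(a, b):
    """Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def ratio_match(desc_a, desc_b, ratio=0.8):
    """For each descriptor in image A, accept its nearest neighbour in B only
    if it is clearly closer than the second-nearest (the ratio test)."""
    matches = []
    for i, d in enumerate(desc_a):
        ranked = sorted(range(len(desc_b)), key=lambda j: dist(d, desc_b[j]))
        best, second = ranked[0], ranked[1]
        if dist(d, desc_b[best]) < ratio * dist(d, desc_b[second]):
            matches.append((i, best))  # (index in A, index in B)
    return matches

a = [[1, 0, 0, 0], [0, 1, 0, 0]]
b = [[0.9, 0.1, 0, 0], [0, 0, 1, 0], [0, 0.95, 0.05, 0]]
print(ratio_match(a, b))  # each descriptor in a finds one unambiguous match
```

The ratio test is what makes SIFT matching robust: ambiguous descriptors, whose two nearest neighbours are nearly equidistant, are simply discarded.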
Learning Image Concepts
• Both supervised and unsupervised
learning methods (SVM, DT, AdaBoost,
VQ etc.) have been used
• Early work was limited to a few tens of
categories; however, some current
systems can work with thousands of
categories/concepts
VQ Based Learning Classifier
[Diagram] A Test Image is matched against per-concept codebooks (Water Codebook, Sky Codebook, Fire Codebook); the Best Codebook determines the Label.
Mustafa & Sethi (2004)
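A hedged sketch of how such a VQ classifier can work; this is one plausible reading of the diagram, not necessarily the authors' exact method, and the codebooks and features are toy values. Each concept has a codebook of codewords; a test image's feature vectors are quantized against every codebook, and the label whose codebook gives the lowest total quantization error wins.

```python
# Sketch of a VQ-based classifier: lowest total quantization error wins.

def quantization_error(codebook, features):
    """Sum over features of the squared distance to the nearest codeword."""
    def nearest(f):
        return min(sum((a - b) ** 2 for a, b in zip(f, c)) for c in codebook)
    return sum(nearest(f) for f in features)

def classify(codebooks, features):
    """codebooks: label -> list of codewords; return the best-fitting label."""
    return min(codebooks,
               key=lambda label: quantization_error(codebooks[label], features))

codebooks = {
    "water": [[0.0, 0.4, 0.8], [0.1, 0.5, 0.9]],
    "fire":  [[0.9, 0.3, 0.0], [0.8, 0.2, 0.1]],
}
test_features = [[0.05, 0.45, 0.85], [0.1, 0.4, 0.8]]
print(classify(codebooks, test_features))  # "water"
```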
http://vision.stanford.edu/teaching/cs223b/lecture/lecture14_intro_objrecog_bow_cs223b.pdf
Bag of Words Approach
Bag of Words Representation of
Images
Co-occurrence of Bag of Words
[Pipeline diagram] Image Collection → Edge Analysis → Collection of Binary Image Blocks → Clustering → Local Feature Descriptors (Codewords) → Codeword Representation of Images → Co-occurrence Matrices of Local Features → Compute Distances → Image Distance Matrix → Pathfinder Network
Mukhopadhyay, Ma, and Sethi, “Pathfinder Networks for Content Based Image
Retrieval Based on Automated Shape Feature Discovery,” ISMSE 2004
Co-occurrence of BoW
Original image
Representation by
feature indices
(cluster membership)
Co-occurrence matrix
Hausdorff metric: H(A, B) = max{ h(A, B), h(B, A) }, where
h(A, B) = max over a in A of min over b in B of d(a, b),
with the Manhattan distance as the underlying point metric d.
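These two formulas translate directly into code. A small sketch, using the Manhattan distance as the point metric and toy 2-D point sets:

```python
# Hausdorff distance between two point sets, with Manhattan distance as the
# underlying point metric.

def manhattan(a, b):
    """d(a, b): city-block distance between two points."""
    return sum(abs(x - y) for x, y in zip(a, b))

def h(A, B):
    """Directed Hausdorff: h(A,B) = max over a in A of min over b in B of d(a,b)."""
    return max(min(manhattan(a, b) for b in B) for a in A)

def hausdorff(A, B):
    """Symmetric Hausdorff: H(A,B) = max{ h(A,B), h(B,A) }."""
    return max(h(A, B), h(B, A))

A = [(0, 0), (1, 0)]
B = [(0, 1), (5, 0)]
print(hausdorff(A, B))  # 4: driven by the outlier point (5, 0) in B
```

Note the asymmetry of the directed distance h, which is why the metric takes the maximum of both directions.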
Notice how similar
images are placed
together in the graph
Object Detectors for Image Concepts
PASCAL Visual Object Classes Challenge
LabelMe Project
http://labelme.csail.mit.edu/
Web-based annotation tool to segment and label image
regions. Labeled objects in images are used as training images
to build object detectors.
IMARS provides a large number of built-in classifiers for visual categories that cover places, people, objects, settings,
activities and events, and it is easy to add new ones. IMARS can run on a PC or laptop (a trial version is available at IBM
alphaWorks) and can also operate at large scale, batch-processing millions of images and videos
per day. Several demos of IMARS are available (see IMARS demos).
Image Category Classifiers Examples
Semantic labeling. (a) An MPE semantic retrieval system groups images by semantic
concept and learns a probabilistic model for each concept. (b) The system represents
each image by a vector of posterior concept probabilities.
From Pixels to Semantic Spaces: Advances in Content-Based Image
Retrieval (Nuno Vasconcelos, IEEE Computer, July 2007)
Image Classification via Probabilistic
Modeling
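A minimal sketch of the posterior-probability representation described above, assuming uniform concept priors and toy per-concept likelihoods (the numbers are invented). Each image ends up represented by a vector of posterior concept probabilities, the "semantic space" coordinates.

```python
# Sketch of the semantic-space representation: given per-concept likelihoods
# P(image | concept) (toy numbers) and uniform priors, Bayes' rule yields the
# vector of posterior concept probabilities that represents the image.

def posteriors(likelihoods):
    """likelihoods: concept -> P(image | concept); uniform prior assumed,
    so posteriors are just normalized likelihoods."""
    total = sum(likelihoods.values())
    return {c: p / total for c, p in likelihoods.items()}

image_likelihoods = {"sky": 0.30, "water": 0.15, "fire": 0.05}
post = posteriors(image_likelihoods)
print(post)  # sky: 0.6, water: 0.3, fire: 0.1 -- the image's semantic vector
```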
Image = Content + Context
Tags: cherry blossom, Japantown, San Francisco, Peace Pagoda
(The photo is the content; its tags are the context.)
Tagging
All-time most popular tags at Flickr
About Tags
• User centered
• Imprecise and often overly personalized
• Tag distribution follows power law
• Most users use very few distinct tags, while a small group of users works
with an extremely large set of tags
• Also known as Folksonomy, social tagging, and social classification
Why Not Use Social Tags for Retrieval?
Problem: The most relevant tag is
often not at the top of the list;
fewer than 10% of images have
their most relevant tag at the top
of the list.
Solution: Improve tagging by
suggesting potential tags to a
user, tag ranking, tag
completion, etc.
Dong Liu, Xian-Sheng Hua, Linjun Yang, Meng Wang, and Hong-Jiang Zhang,
"Tag Ranking," WWW 2009, Madrid, Spain
Tag Recommendation using Tags
Co-occurrences
Given a target image and initial tags, use the co-occurrence of tags to
recommend tags for the target image. This approach does not take
visual feature co-occurrences into account.
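A minimal sketch of this idea; the tag vocabulary and tagged images are invented. Co-occurrence counts are accumulated over a tagged collection, then candidate tags are scored by how often they co-occur with the target image's initial tags.

```python
# Tag recommendation from tag co-occurrence alone (no visual features).

from collections import Counter
from itertools import combinations

tagged_images = [
    {"beach", "sea", "sand"},
    {"beach", "sea", "sunset"},
    {"sea", "boat"},
    {"city", "night"},
]

# Count how often each ordered pair of tags appears together.
cooc = Counter()
for tags in tagged_images:
    for a, b in combinations(sorted(tags), 2):
        cooc[(a, b)] += 1
        cooc[(b, a)] += 1

def recommend(initial_tags, k=2):
    """Score candidate tags by co-occurrence with the initial tags."""
    scores = Counter()
    for t in initial_tags:
        for (a, b), c in cooc.items():
            if a == t and b not in initial_tags:
                scores[b] += c
    return [tag for tag, _ in scores.most_common(k)]

print(recommend({"beach"}))  # "sea" ranks first: it co-occurs most with "beach"
```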
Tag Recommendation using Tags
Co-occurrences and Visual Similarity
Kucuktunc, Sevil, Tosun, Zitouni, Duygulu, and Can (SAMT 08)
Given a target image and initial tags, use the existing tagged images to
suggest tags for the target image.
Tag Ranking
Tag Ranking: Another Approach
Dong Liu, Xian-Sheng Hua, Linjun Yang, Meng Wang, and Hong-Jiang Zhang,
"Tag Ranking," WWW 2009, Madrid, Spain
How to Compute Tag Similarity
Tag Recommendation After Tag
Ranking
• Given an untagged image, find its k most visually similar images
• Pool the top two ranked tags from the k images and select the unique tags as
recommended tags
Tag Completion
The complete tag matrix is
generated by imposing
constraints based on visual
similarity, tag-to-tag similarity,
and similarity with the initial
tag matrix. The matrix
completion is done by an
optimization procedure.
Wu and Jain, IEEE PAMI, January 2011
What about Taggers & Commenters?
Question: How can we incorporate taggers'/commenters'
characteristics for improved tag recommendations?
Answer: Use three sets of features, derived from: the image to
be tagged, the user's tag history, and the user's social interactions
Tag History & Social Interaction
Features
Tag history features are based
on the tags the user has used
in the past
Social interaction features are
derived from tags/comments
posted by the user’s
friends/favorite posters
X. Chen & H. Shin, ICDM 2010
Current Status of Image Search
• Extensive interest as evident from conferences, journals, and
special issues
• Overall, solid progress is being made
• Efforts towards performance evaluation with benchmarked
collections are gaining more traction
• Integration of content and context through tags and
comments is receiving increasing attention to help improve
retrieval
• Killer applications are beginning to emerge as visual search
gains prominence
• Need for more applications outside entertainment
Performance Evaluation Efforts
ImageCLEF2013
- Annotation Task:
- 250,000 training images
- 95 (development) / 116 (test) concepts to be identified
- Considerable label noise in the training set, due to automatic label
extraction from websites
Performance Evaluation Efforts
TRECVID workshops, an offshoot of TREC, have been held yearly since
2003. The goal of the workshops is to encourage research in content-based
video retrieval and analysis by providing large test collections, realistic system
tasks, uniform scoring procedures, and a forum for organizations interested in
comparing their results.
Application Examples
Tattoo-ID: Automatic Tattoo Image Retrieval for Suspect & Victim Identification (Anil K. Jain,
Jung-Eun Lee, and Rong Jin)
CBIR for Whole Slide Imageries
• The availability of digital whole-slide data sets
represents an enormous opportunity to carry out
new forms of numerical and data-driven query,
in modes not based on textual, ontological or
lexical matching.
– Search image repositories with whole images or
image regions of interest
– Carry out search in real time via scalable
computational architectures
[Pipeline diagram] Extraction from image repositories based upon spatial information → analysis of the data in the digital domain → a resultant surface map or a gallery of matching images
Slide courtesy of Ulysses J. Balis, M.D.
Director, Division of Pathology Informatics
Department of Pathology
University of Michigan Health System
Medical Image Retrieval
Example queries:
• Text query: "Find all the cases in which a tumor decreased in size
for less than three months post treatment, then
resumed a growth pattern after that period"
• Text + medical image query: "Find images with large-sized frontal-lobe brain tumors for
patients approximately 35 years old"
[Diagram] A text query passes through concept extraction against a medical
ontology, yielding text-based query concepts (CUI1 … CUIn). A query medical
image passes through visual analysis against an image-based ontology,
yielding general and specialized image-based concepts (VB-Gen/VB-Spec
visual signatures mapped to CUIs).
Image Search Products
http://www.picalike.com/products/similarity-search.php
Image Search Products
http://www.pcsso.com/
Image Search Products
Image Search Products
http://viral.image.ntua.gr/
Take Home Message
• Image/video retrieval is moving into the
commercial domain. A lot more activity is expected
in the near future
• Multimodal/cross-modal retrieval is gaining
importance
• Approaches combining social search and visual
search techniques are expected to gain
prominence
• Crowdsourcing is a cheap and effective way of
tagging media
Acknowledgement
• This presentation is based on the work of
numerous researchers from the MIR/ML/CVPR
community. I have tried to give
credit/references wherever possible. Any
omission is unintentional and I apologize for
that.
• I also want to thank my present and past
students and collaborators.
Questions?
Email: iksinc@yahoo.com
Email: sikrishan@gmail.com

Editor's Notes

1. MPE: Minimum Probability of Error