[Open Source]
Search Evolution
Otis Gospodnetić @otisg
Today
The Early Days
Even Earlier Days
Foci
1974 1995 now()
SEARCH
Otis Who?
SEARCH
Then & Now
1990s: WebGlimpse, Swish, Harvest, Ht://Dig, freeWAIS
2014: elasticsearch.
Still New?
elasticsearch.
2000
2004
2010
Dominance
[Open Source]
Search Evolution
Big Cake
Big Data
Beyond Text
Memory Footprint
Distributed Model
Language Support
Indexing Speed, NRT
Relevance Algorithms
Language Support: Stemming
Language Support: Lemmatization
Language Support: Morphology
Language Support
Lucene 2004: ~ 20 languages
Lucene 2014: ~ 40 languages
most are stemmers
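The gap between stemming and lemmatization can be sketched in a few lines. The suffix list and the lemma table below are toy assumptions made up for this illustration; real analyzers, Lucene's included, are far more involved.

```python
# Toy illustration only: the suffix rules and the lemma table are invented
# for this sketch and are much simpler than any real analyzer.
SUFFIXES = ("ing", "ed", "es", "s")

# A lemmatizer needs vocabulary knowledge; here that is a tiny lookup table.
LEMMAS = {"mice": "mouse", "went": "go", "better": "good"}


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatize(word: str) -> str:
    """Return the dictionary form if known, otherwise fall back to stemming."""
    return LEMMAS.get(word, stem(word))
```

A stemmer only chops suffixes, so irregular forms such as "mice" pass through unchanged; the lemmatizer handles them only because the word is in its table.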
Relevance Models: VSM
TF-IDF
For term i in document j:
$$w_{i,j} = tf_{i,j} \times \log\left(\frac{N}{df_i}\right)$$
$tf_{i,j}$ = number of occurrences of i in j
$df_i$ = number of documents containing i
$N$ = total number of documents
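A minimal sketch of the weighting above, assuming documents are already tokenized into lists of terms. The function name `tf_idf` and the three sample documents are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return, per document, the weight tf * log(N / df) for each of its terms."""
    n = len(docs)
    # df: number of documents containing each term (count each term once per doc)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # number of occurrences of each term in this document
        weights.append(
            {term: count * math.log(n / df[term]) for term, count in tf.items()}
        )
    return weights


# Hypothetical sample corpus
docs = [
    ["open", "source", "search"],
    ["search", "search", "engine"],
    ["open", "data"],
]
weights = tf_idf(docs)
```

A term that occurs in every document gets weight 0, since log(N/N) = 0; rare terms score highest.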
Relevance Models: Pluggable
Lucene until 2011: 1 relevance model
Lucene 2014: 6 relevance models
got more?
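What "pluggable" means can be sketched as scoring functions that share one signature, so the caller can swap them. This is an illustration only, not Lucene's Similarity API: the `Scorer` type, the function names, and the argument order are assumptions, and the BM25 formula is a common textbook variant rather than any particular library's implementation.

```python
import math
from typing import Callable

# Assumed signature: (tf, df, n_docs, doc_len, avg_doc_len) -> term weight
Scorer = Callable[[int, int, int, int, float], float]


def tf_idf_score(tf, df, n_docs, doc_len, avg_doc_len) -> float:
    """Classic tf * log(N / df); ignores document length."""
    return tf * math.log(n_docs / df)


def bm25_score(tf, df, n_docs, doc_len, avg_doc_len, k1=1.2, b=0.75) -> float:
    """Textbook BM25: term frequency saturates and long documents are penalized."""
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    norm = tf + k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / norm


def score_term(scorer: Scorer, tf, df, n_docs, doc_len, avg_doc_len) -> float:
    """The caller picks the relevance model by passing a different scorer."""
    return scorer(tf, df, n_docs, doc_len, avg_doc_len)
```

Swapping `tf_idf_score` for `bm25_score` in the call to `score_term` changes the ranking behavior without touching the rest of the search code.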
Distributed Architecture
1 Master - N Slaves
good for scaling queries
not good for scaling data
Sharded index with replication
good for scaling queries
good for scaling data
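A toy in-memory sketch of the sharded model, assuming hash routing by document id and a scatter-gather query. The class `ShardedIndex` and its methods are hypothetical names; real engines add failure handling, replica selection, and ranking that are omitted here.

```python
import zlib


class ShardedIndex:
    """Toy sharded index: each shard holds a slice of the data, copied to replicas."""

    def __init__(self, num_shards: int, replicas: int = 1):
        # shards[s][r] is copy r of shard s; each copy maps doc_id -> text
        self.shards = [[{} for _ in range(replicas + 1)] for _ in range(num_shards)]

    def _shard_for(self, doc_id: str) -> int:
        # Stable hash so the same id always routes to the same shard
        return zlib.crc32(doc_id.encode("utf-8")) % len(self.shards)

    def index(self, doc_id: str, text: str) -> None:
        # Write to every copy of the one shard that owns this document
        for copy in self.shards[self._shard_for(doc_id)]:
            copy[doc_id] = text

    def search(self, term: str, replica: int = 0) -> list[str]:
        # Scatter the query to one copy of every shard, then gather the hits
        hits = []
        for copies in self.shards:
            copy = copies[replica % len(copies)]
            hits.extend(d for d, text in copy.items() if term in text.split())
        return sorted(hits)
```

More shards spread the data; more replicas spread the queries, because any copy of a shard can answer.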
Indexing Speed & NRT Search
Memory Footprint
Beyond Text
Geospatial Search
Classifier
Recommendation Engine
Key Value Store
NoSQL DB
Analytical DB
Geospatial Search
Classifier
Recommender
Content Similarity
Collaborative Filtering
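The content-similarity flavor can be sketched as cosine similarity over raw term counts. Collaborative filtering is not covered here. The function names and sample items are made up, and a real engine would typically weight terms rather than use raw counts.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the term-count vectors of two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def most_similar(doc_id: str, docs: dict[str, str]) -> str:
    """Recommend the other document whose text is closest to doc_id's text."""
    others = (d for d in docs if d != doc_id)
    return max(others, key=lambda d: cosine_similarity(docs[doc_id], docs[d]))


# Hypothetical catalog
items = {
    "a": "open source search engine",
    "b": "open source search server",
    "c": "chocolate cake recipe",
}
```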
Key Value Store
id123 ⇒ manu:Apple desc:foo bar price:$111
id234 ⇒ manu:Sony desc:baz bam price:$222
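The two records above can be sketched as a key-value store with an inverted index on top, which is what makes the store searchable. The helper names `get`, `search`, and `build_inverted_index` are assumptions for illustration.

```python
from collections import defaultdict

# The two records from the slide, keyed by id
store = {
    "id123": {"manu": "Apple", "desc": "foo bar", "price": "$111"},
    "id234": {"manu": "Sony", "desc": "baz bam", "price": "$222"},
}


def build_inverted_index(docs: dict) -> dict:
    """Map (field, token) -> set of ids whose field contains that token."""
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        for field, value in fields.items():
            for token in value.lower().split():
                index[(field, token)].add(doc_id)
    return index


index = build_inverted_index(store)


def get(doc_id: str):
    """Key-value access: fetch a whole record by id."""
    return store.get(doc_id)


def search(field: str, token: str) -> list[str]:
    """Search access: find ids by field content instead of by key."""
    return sorted(index.get((field, token.lower()), set()))
```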
NoSQL DB
Distributed
Replicated
Horizontally Scalable
Fast Retrieval
Searchable?
Slicing & Dicing
Analytical Queries
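Slicing and dicing boils down to bucketing documents by a field and computing statistics per bucket. A minimal sketch, assuming numeric prices; the function `facet_stats` and the third sample row are hypothetical.

```python
from collections import defaultdict

# Sample rows; the second Apple row is invented so one bucket has two entries
rows = [
    {"manu": "Apple", "price": 111},
    {"manu": "Sony", "price": 222},
    {"manu": "Apple", "price": 333},
]


def facet_stats(rows: list[dict], facet_field: str, value_field: str) -> dict:
    """Group rows by facet_field and return count and average of value_field."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[facet_field]].append(row[value_field])
    return {
        key: {"count": len(values), "avg": sum(values) / len(values)}
        for key, values in buckets.items()
    }


stats = facet_stats(rows, "manu", "price")
```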
Gobble Gobble
If software is eating the world,
then [open source] search is gobbling it.
And has been for years.
FIN. Questions?
otis@sematext.com
