Open Source Search Evolution
- 7. Then & Now
1990s: WebGlimpse, Swish, Harvest, Ht://Dig, freeWAIS
2014: Elasticsearch
- 10. Big Cake
Big Data
Beyond Text
Memory Footprint
Distributed Model
Language Support
Indexing Speed, NRT
Relevance Algorithms
- 15. Relevance Models: VSM
TF-IDF
For term i in document j:
$w_{i,j} = \mathrm{tf}_{i,j} \times \log(N / \mathrm{df}_i)$
$\mathrm{tf}_{i,j}$ = number of occurrences of i in j
$\mathrm{df}_i$ = number of documents containing i
$N$ = total number of documents
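As a rough illustration of the weighting above, here is a minimal Python sketch that computes tf x log(N/df) for every term in a tiny corpus. The function name `tf_idf`, the whitespace tokenizer, and the sample documents are assumptions made for this example; real engines typically add normalization and smoothing that the slide does not cover.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document, using w = tf * log(N / df)."""
    n_docs = len(docs)  # N: total number of documents
    # Assumption: lowercase + whitespace split stands in for a real analyzer.
    tokenized = [doc.lower().split() for doc in docs]

    # df[i]: number of documents containing term i (each doc counted once).
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # tf[i]: occurrences of term i in this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


if __name__ == "__main__":
    corpus = ["open source search", "search engine search", "distributed index"]
    for doc, w in zip(corpus, tf_idf(corpus)):
        print(doc, "->", w)
```

A term that appears in every document gets weight 0, since log(N/N) = 0, which is how the formula discounts terms that do not discriminate between documents.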
- 17. Distributed Architecture
1 Master - N Slaves: good for scaling queries, not good for scaling data
Sharded index with replication: good for scaling queries, good for scaling data
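The second model can be sketched as a toy in-memory index: documents are split across shards (so data scales), and each shard has several identical copies that take turns answering queries (so queries scale). Everything here is hypothetical and for illustration only: the `ShardedIndex` class, the CRC32-modulo routing, and the round-robin replica choice are assumptions, not the scheme of any particular engine.

```python
import zlib
from collections import defaultdict


class ShardedIndex:
    """Toy sharded inverted index; each shard has 1 primary + N replica copies."""

    def __init__(self, num_shards, num_replicas):
        self.num_shards = num_shards
        # copies[s] is the list of identical copies of shard s.
        self.copies = [
            [defaultdict(set) for _ in range(num_replicas + 1)]
            for _ in range(num_shards)
        ]
        self._query_count = 0  # drives round-robin replica selection

    def shard_for(self, doc_id):
        # Assumed routing rule: stable hash of the document id, modulo shard count.
        return zlib.crc32(doc_id.encode("utf-8")) % self.num_shards

    def index(self, doc_id, text):
        # A document lives on exactly one shard, written to every copy of it.
        for copy in self.copies[self.shard_for(doc_id)]:
            for term in text.lower().split():
                copy[term].add(doc_id)

    def search(self, term):
        # Scatter: ask one copy of every shard. Gather: union the partial hits.
        hits = set()
        for shard_copies in self.copies:
            copy = shard_copies[self._query_count % len(shard_copies)]
            hits |= copy.get(term.lower(), set())
        self._query_count += 1
        return hits


if __name__ == "__main__":
    idx = ShardedIndex(num_shards=3, num_replicas=1)
    idx.index("d1", "open source search")
    idx.index("d2", "search engine")
    print(sorted(idx.search("search")))
```

In the 1 Master - N Slaves model every node would hold the full index, which corresponds to `num_shards=1` here: adding copies helps query throughput, but the data on each node never gets smaller.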