5

I will be implementing a log viewing utility soon, but I am stuck on the DB choice. My requirements are as follows:

  • Store 5 GB of data daily
  • Total size of 5 TB of data
  • Search this log data in less than 10 seconds

I know that PostgreSQL will work if I fragment tables, but will I be able to get the performance described above? As I understand it, NoSQL is a better choice for storing logs, since logs are not very structured. I saw the example below, which uses hadoop-hbase-lucene and seems promising: http://blog.mgm-tp.com/2010/03/hadoop-log-management-part1/

But before deciding, I wanted to ask whether anybody has made a choice like this before and could give me an idea. Which DBMS will fit this task best?

2
  • If I were you, I would go with NoSQL. Commented Nov 19, 2012 at 8:41
  • 1
    "Fragment" - do you mean partition? Commented Nov 19, 2012 at 9:46

2 Answers

5

My logs are very structured :)

I would say you don't need a database; you need a search engine:

  • Solr: based on Lucene, it packages everything you need together
  • ElasticSearch: another Lucene-based search engine
  • Sphinx: the nice thing is that you can use multiple sources per search index, so you can enrich your raw logs with other events
  • Scribe: Facebook's way to search and collect logs

Update for @JustBob: Most of the mentioned solutions can work with flat files without affecting performance. All of them need an inverted index, which is the hardest part to build or maintain. You can update the index in batch mode or online. The index can be stored in an RDBMS, in NoSQL, or in a custom "flat file" storage format (custom meaning maintained by the search engine application).
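
To make the inverted index idea concrete, here is a minimal in-memory sketch in Python. It is only an illustration, not how Solr, ElasticSearch, or Sphinx implement their indexes: the `InvertedIndex` class, the `tokenize` helper, and the sample log lines are all hypothetical names and data made up for this example. It maps each term to the set of log lines containing it, and supports both online (one line at a time) and batch updates.

```python
import re
from collections import defaultdict

# Assumption: a term is any run of lowercase letters or digits.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Split a log line or query into lowercase terms."""
    return TOKEN_RE.findall(text.lower())


class InvertedIndex:
    """Hypothetical toy index: term -> set of ids of lines containing it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.lines = []  # stands in for the flat file of raw log lines

    def add(self, line):
        """Online update: index a single line as it arrives."""
        line_id = len(self.lines)
        self.lines.append(line)
        for term in tokenize(line):
            self.postings[term].add(line_id)
        return line_id

    def add_batch(self, lines):
        """Batch update: index many lines in one pass."""
        for line in lines:
            self.add(line)

    def search(self, query):
        """Return the lines that contain every term of the query."""
        terms = tokenize(query)
        if not terms:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return [self.lines[i] for i in sorted(ids)]


if __name__ == "__main__":
    index = InvertedIndex()
    index.add_batch([
        "2012-11-19 08:41 ERROR db timeout",
        "2012-11-19 08:42 INFO user login",
    ])
    index.add("2012-11-19 09:46 ERROR user login failed")
    for hit in index.search("error login"):
        print(hit)
```

A search only touches the posting sets of the query terms instead of scanning every line, which is why the raw logs can stay in flat files. A real engine would also persist and compress the postings, which is the hard part mentioned above.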

2
  • 1
    +1, this is a search problem much more than it is a data storage problem. Also, there's Splunk if you're willing to pay to get something off the shelf. Commented Nov 19, 2012 at 9:47
  • Depending on your view, search and database are one and the same. Can you be more specific about what you mean by not needing a database, but a search engine?
    – Kuberchaun
    Commented Nov 19, 2012 at 13:17
4

You can find a lot of information here:

http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis

See which fits your needs.

Anyway, for such a task NoSQL is the right choice.


You should also consider the learning curve: even though MongoDB and CouchDB don't perform as well as Cassandra or Hadoop, they are easier to learn.

MongoDB is used by Craigslist to store old archives: http://www.10gen.com/presentations/mongodb-craigslist-one-year-later
