ToroDB
Open-source, MongoDB-compatible database, built on top of PostgreSQL
Álvaro Hernández <aht@torodb.com>

ToroDB @NoSQLonSQL
About *8Kdata*
● Research & Development in databases
● Consulting, Training and Support in PostgreSQL
● Founders of PostgreSQL España, 3rd largest PUG (PostgreSQL User Group) in the world (>400 members as of today)
● About myself: CTO at 8Kdata:
@ahachete
http://linkd.in/1jhvzQ3
www.8kdata.com

ToroDB in one slide
● Document-oriented, JSON, NoSQL db
● Open source (AGPL)
● MongoDB compatibility (wire protocol level)
● Uses PostgreSQL as a storage backend

Why relational databases: technical perspective
● The document model is very appealing to many, but every document database started from scratch
● DRY: why not reuse relational databases? They are proven, durable, concurrent and flexible
● Why not build on a relational database, like PostgreSQL?

ToroDB tables structure

ToroDB storage
● Data is stored in tables. No blobs
● JSON documents are split by hierarchy levels into “subdocuments”, which contain no nested structures. Each subdocument level is stored separately
● Subdocuments are classified by “type”. Each “type” maps to a different table

ToroDB storage (II)
● A “structure” table keeps the subdocument “schema”
● Keys in JSON are mapped to attributes, which retain the original name
● Tables are created dynamically and transparently to match the exact types of the documents
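
The classification by “type” can be pictured with a minimal Python sketch. It assumes that a subdocument's type is the set of its (key, scalar type) pairs and that tables are named `t_<n>` in order of first appearance. `type_signature` and `TableRegistry` are hypothetical names for illustration, not ToroDB's actual code.

```python
from typing import Any, Dict, Tuple

# Assumption: a "type" is the sorted tuple of (key, value type name) pairs.
TypeSignature = Tuple[Tuple[str, str], ...]


def type_signature(subdocument: Dict[str, Any]) -> TypeSignature:
    """Describe the keys and scalar types of a flat subdocument in a hashable form."""
    return tuple(sorted((key, type(value).__name__) for key, value in subdocument.items()))


class TableRegistry:
    """Hypothetical in-memory map from subdocument type to table name."""

    def __init__(self) -> None:
        self._tables: Dict[TypeSignature, str] = {}

    def table_for(self, subdocument: Dict[str, Any]) -> str:
        signature = type_signature(subdocument)
        if signature not in self._tables:
            # First time this type is seen: a real system would create the table here.
            self._tables[signature] = f"t_{len(self._tables) + 1}"
        return self._tables[signature]


registry = TableRegistry()
first = registry.table_for({"a": 42, "b": "hello world!"})
second = registry.table_for({"a": 21, "b": "hello"})
third = registry.table_for({"j": 42})
print(first, second, third)
```

Under these assumptions the two subdocuments with keys "a" and "b" land in the same table, while the one with key "j" gets a table of its own.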

ToroDB storage internals
{
  "name": "ToroDB",
  "data": {
    "a": 42, "b": "hello world!"
  },
  "nested": {
    "j": 42,
    "deeper": {
      "a": 21, "b": "hello"
    }
  }
}

The document is split into the following subdocuments:
{ "name": "ToroDB", "data": {}, "nested": {} }
{ "a": 42, "b": "hello world!"}
{ "j": 42, "deeper": {}}
{ "a": 21, "b": "hello"}
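
The split by hierarchy level can be sketched in a few lines of Python. This is an illustration under simplifying assumptions (only nested objects are handled, arrays are ignored), and `split_document` is a hypothetical name, not a ToroDB function.

```python
from typing import Any, Dict, List


def split_document(document: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Split a nested document into flat subdocuments, parent first, then children."""
    flat: Dict[str, Any] = {}
    result: List[Dict[str, Any]] = [flat]
    for key, value in document.items():
        if isinstance(value, dict):
            # Keep an empty placeholder where the nested object used to be.
            flat[key] = {}
            result.extend(split_document(value))
        else:
            flat[key] = value
    return result


document = {
    "name": "ToroDB",
    "data": {"a": 42, "b": "hello world!"},
    "nested": {"j": 42, "deeper": {"a": 21, "b": "hello"}},
}
subdocuments = split_document(document)
for subdocument in subdocuments:
    print(subdocument)
```

Run on the example document, the sketch yields four flat subdocuments in the same order as the list above.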

select * from demo.t_3
┌─────┬───────┬────────────────────────────┬────────┐
│ did │ index │ _id                        │ name   │
├─────┼───────┼────────────────────────────┼────────┤
│ 0   │ ¤     │ x5451a07de7032d23a908576d  │ ToroDB │
└─────┴───────┴────────────────────────────┴────────┘
select * from demo.t_1
┌─────┬───────┬────┬──────────────┐
│ did │ index │ a  │ b            │
├─────┼───────┼────┼──────────────┤
│ 0   │ ¤     │ 42 │ hello world! │
│ 0   │ 1     │ 21 │ hello        │
└─────┴───────┴────┴──────────────┘
select * from demo.t_2
┌─────┬───────┬────┐
│ did │ index │ j  │
├─────┼───────┼────┤
│ 0   │ ¤     │ 42 │
└─────┴───────┴────┘
(¤ marks a NULL index)

select * from demo.structures
┌─────┬────────────────────────────────────────────────────────────────────────────┐
│ sid │ _structure                                                                 │
├─────┼────────────────────────────────────────────────────────────────────────────┤
│ 0   │ {"t": 3, "data": {"t": 1}, "nested": {"t": 2, "deeper": {"i": 1, "t": 1}}} │
└─────┴────────────────────────────────────────────────────────────────────────────┘
select * from demo.root;
┌─────┬─────┐
│ did │ sid │
├─────┼─────┤
│ 0   │ 0   │
└─────┴─────┘
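
To see how these tables fit together, here is a hedged Python sketch that reassembles the document from the rows shown in the result sets. It assumes that "t" in the structure is the table number and "i" is the row index within that table (absent meaning NULL); that reading is inferred from the output, not taken from ToroDB's code. The `TABLES`, `STRUCTURES` and `ROOT` dictionaries are in-memory stand-ins for the real tables.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]

# Rows copied from the result sets; None stands for the NULL index.
TABLES: Dict[int, List[Row]] = {
    1: [
        {"did": 0, "index": None, "a": 42, "b": "hello world!"},
        {"did": 0, "index": 1, "a": 21, "b": "hello"},
    ],
    2: [{"did": 0, "index": None, "j": 42}],
    3: [{"did": 0, "index": None, "_id": "x5451a07de7032d23a908576d", "name": "ToroDB"}],
}
STRUCTURES = {0: {"t": 3, "data": {"t": 1}, "nested": {"t": 2, "deeper": {"i": 1, "t": 1}}}}
ROOT = {0: 0}  # did -> sid


def find_row(table: int, did: int, index: Optional[int]) -> Row:
    """Look up the single row of a table for a document id and index."""
    for row in TABLES[table]:
        if row["did"] == did and row["index"] == index:
            return row
    raise KeyError((table, did, index))


def rebuild(structure: Dict[str, Any], did: int) -> Dict[str, Any]:
    """Rebuild one (sub)document by following the structure recursively."""
    row = find_row(structure["t"], did, structure.get("i"))
    document = {key: value for key, value in row.items() if key not in ("did", "index")}
    for key, child in structure.items():
        if isinstance(child, dict):  # nested subdocument described by its own structure
            document[key] = rebuild(child, did)
    return document


document = rebuild(STRUCTURES[ROOT[0]], did=0)
print(document)
```

The rebuilt document contains the original keys plus the `_id` column stored in the root subdocument's table.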

ToroDB storage and I/O savings
ToroDB requires 29% to 68% of the storage used by MongoDB 2.6

The software
ToroDB is written in Java, compatible with versions 6 and above.
It has been tested on Oracle's VM, but we will also test and verify it on Azul's VM.
It is currently a standalone JAR file but will also be offered as an EAR, to easily deploy to application servers.

Going beyond MongoDB

MongoDB brought the document model and several features that many love. But can we go further than that?
Can't the foundation of relational databases provide a basis for offering new features on a NoSQL, document-like, JSON database?

● Avoid schema repetition. Query-by-type
● Cheap single-node durability
● “Clean” reads
● Atomic bulk operations
● Highest concurrency

The schema-less fallacy
{
  "name": "Álvaro",
  "surname": "Hernández",
  "height": 200,
  "hobbies": [
    "PostgreSQL", "triathlon"
  ]
}

The key names are metadata → Isn't that... schema?

The schema-less fallacy: BSON
metadata → Isn't that... schema?
{
  "name": (string) "Álvaro",
  "surname": (string) "Hernández",
  "height": (number) 200,
  "hobbies": {
    "0": (string) "PostgreSQL",
    "1": (string) "triathlon"
  }
}

● It's not schema-less
● It is “attached-schema”
● It carries a non-zero overhead
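
The overhead can be put into numbers with a rough Python estimate. The sketch follows the BSON layout (a 4-byte length prefix, then per element a type byte, the key as a NUL-terminated string and the value, then a trailing NUL) and counts how many bytes go to type tags and key names. It assumes every integer is stored as an int32 and handles only strings, integers, arrays and objects; `bson_sizes` is a hypothetical helper, not part of any driver.

```python
from typing import Any, Tuple


def bson_sizes(value: Any) -> Tuple[int, int]:
    """Return (total bytes, bytes spent on type tags and key names) for a document."""
    if isinstance(value, list):
        # BSON stores arrays as documents keyed by "0", "1", ...
        value = {str(position): item for position, item in enumerate(value)}
    total = 4 + 1  # int32 length prefix + trailing NUL byte
    metadata = 0
    for key, item in value.items():
        name_bytes = 1 + len(key.encode("utf-8")) + 1  # type tag + key + NUL
        total += name_bytes
        metadata += name_bytes
        if isinstance(item, (dict, list)):
            nested_total, nested_metadata = bson_sizes(item)
            total += nested_total
            metadata += nested_metadata
        elif isinstance(item, str):
            total += 4 + len(item.encode("utf-8")) + 1  # length + bytes + NUL
        elif isinstance(item, int):
            total += 4  # assumption: stored as int32
        else:
            raise TypeError(f"unsupported value type: {type(item).__name__}")
    return total, metadata


person = {
    "name": "Álvaro",
    "surname": "Hernández",
    "height": 200,
    "hobbies": ["PostgreSQL", "triathlon"],
}
total, metadata = bson_sizes(person)
print(f"{metadata} of {total} bytes are key names and type tags")
```

That share is paid again for every document in the collection, even when all of them have the same keys.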

Schema-attached repetition
{ "a": 1, "b": 2 }
{ "a": 3 }
{ "a": 4, "c": 5 }
{ "a": 6, "b": 7 }
{ "b": 8 }
{ "a": 9, "b": 10 }
{ "a": 11, "b": 12, "j": 13 }
{ "a": 14, "c": 15 }
Counting “document types” in collections of millions: at most, 1000s of different types
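
Counting “document types” is easy to try on a sample. The Python sketch below assumes, for simplicity, that a type is just the set of key names of a document (value types are ignored), and counts the types in the eight documents listed on this slide.

```python
from collections import Counter

documents = [
    {"a": 1, "b": 2},
    {"a": 3},
    {"a": 4, "c": 5},
    {"a": 6, "b": 7},
    {"b": 8},
    {"a": 9, "b": 10},
    {"a": 11, "b": 12, "j": 13},
    {"a": 14, "c": 15},
]

# Assumption: a document "type" is its set of key names.
type_counts = Counter(frozenset(document) for document in documents)

for keys, count in type_counts.most_common():
    print(sorted(keys), count)
print(len(type_counts), "types for", len(documents), "documents")
```

The eight documents share only five types, and the gap between documents and types is what grows as a collection reaches millions of documents.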

Schema-attached repetition
How data is stored in schema-less databases

This is how we store it in ToroDB

ToroDB: query “by structure”
● ToroDB is effectively partitioning by type
● Structures (schemas, partitioning types) are cached in ToroDB memory
● Queries only scan a subset of the data.
● Negative queries are served directly from memory.
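
A toy Python sketch of the idea: with the structures cached in memory, a query on a key only needs to visit the tables whose type contains that key, and a query on a key that no type contains can be answered without touching storage. The cache contents and the function names are hypothetical, chosen to match the earlier example tables.

```python
from typing import Dict, FrozenSet, List

# Hypothetical cache: table name -> attribute names stored in that table.
STRUCTURE_CACHE: Dict[str, FrozenSet[str]] = {
    "t_1": frozenset({"a", "b"}),
    "t_2": frozenset({"j"}),
    "t_3": frozenset({"_id", "name"}),
}


def tables_to_scan(queried_key: str) -> List[str]:
    """Return only the tables whose type contains the queried key."""
    return sorted(table for table, keys in STRUCTURE_CACHE.items() if queried_key in keys)


def plan(queried_key: str) -> str:
    """Describe what a query on one key would have to do."""
    tables = tables_to_scan(queried_key)
    if not tables:
        # Negative query: no cached type has this key, so nothing is scanned.
        return "empty result, no table scanned"
    return "scan " + ", ".join(tables)


print(plan("a"))
print(plan("missing_key"))
```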

● Without journaling, MongoDB is neither durable nor crash-safe
● MongoDB requires “j: true” for true single-node durability. But who guarantees its consistent usage? Who uses it by default?
● “j: true” creates I/O storms equivalent to SQL CHECKPOINTs

“Clean” reads
Oh really?

“Clean” reads
http://docs.mongodb.org/manual/reference/write-concern/#read-isolation-behavior
“MongoDB will allow clients to read the results of a write operation before the write operation returns.”
“If the mongod terminates before the journal commits, even if a write returns successfully, queries may have read data that will not exist after the mongod restarts.”
“Other database systems refer to these isolation semantics as read uncommitted.”

“Clean” reads
Thus, MongoDB suffers from dirty reads. Or probably better called “tainted reads”.
What about $snapshot? Nope:
“The snapshot() does not guarantee that the data returned by the query will reflect a single moment in time nor does it provide isolation from insert or delete operations.”
http://docs.mongodb.org/manual/faq/developers/#faq-developers-isolate-cursors

ToroDB: going beyond MongoDB
● PostgreSQL is 100% durable. Always. And it's cheap (doesn't do I/O storms)
● “Clean” reads
Cursors in ToroDB run in repeatable read, read-only mode:
globalCursorDataSource.setTransactionIsolation("TRANSACTION_REPEATABLE_READ");
globalCursorDataSource.setReadOnly(true);

Atomic operations
● MongoDB has no support for atomic bulk insert/update/delete operations
● Not even with $isolated:
“Prevents a write operation that affects multiple documents from yielding to other reads or writes […] You can ensure that no client sees the changes until the operation completes or errors out. The $isolated isolation operator does not provide “all-or-nothing” atomicity for write operations.”
http://docs.mongodb.org/manual/reference/operator/update/isolated/

High concurrency
● MMAPv1 is still collection-locked
● WiredTiger is document-locked
● But these are still exclusive locks (MMAPv1). Most relational databases have MVCC, which means readers and writers can work at the same time almost conflict-free

ToroDB: going beyond MongoDB
● Atomic bulk operations
By default, bulk operations in ToroDB are atomic. Use flag ContinueOnError: 1 to perform non-atomic bulk operations
● Highest concurrency
PostgreSQL uses MVCC. Readers and writers do not block each other. Writers block writers only for the same record
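
The difference between an atomic and a non-atomic bulk operation can be demonstrated with any transactional SQL database. The Python sketch below uses the standard-library `sqlite3` module purely as a stand-in for PostgreSQL. It is not ToroDB code, and the `continue_on_error` parameter only mimics the ContinueOnError flag mentioned on the slide.

```python
import sqlite3
from typing import List, Tuple

INSERT = "INSERT INTO docs (id, body) VALUES (?, ?)"


def fresh_connection() -> sqlite3.Connection:
    """Create an in-memory database with one table and a primary key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
    return conn


def bulk_insert(conn: sqlite3.Connection, rows: List[Tuple[int, str]],
                continue_on_error: bool = False) -> int:
    """Insert rows and return how many rows the table holds afterwards."""
    if continue_on_error:
        # Non-atomic: each row gets its own transaction, failures are skipped.
        for row in rows:
            try:
                with conn:
                    conn.execute(INSERT, row)
            except sqlite3.IntegrityError:
                pass
    else:
        # Atomic: one transaction for the whole batch, rolled back on any error.
        try:
            with conn:
                conn.executemany(INSERT, rows)
        except sqlite3.IntegrityError:
            pass
    return conn.execute("SELECT count(*) FROM docs").fetchone()[0]


rows = [(1, "first"), (2, "second"), (1, "duplicate id"), (3, "third")]
atomic_count = bulk_insert(fresh_connection(), rows)
non_atomic_count = bulk_insert(fresh_connection(), rows, continue_on_error=True)
print(atomic_count, non_atomic_count)
```

With the duplicate id in the batch, the atomic variant leaves the table empty, while the non-atomic variant keeps the three rows that did not conflict.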

ToroDB: Developer Preview
● ToroDB launched in October 2014, as a Developer Preview. Support for CRUD and most of the SELECT API
● github.com/torodb
● RERO (release early, release often) policy. Comments, feedback, patches... greatly appreciated
● AGPLv3

ToroDB: Developer Preview
● Clone the repo, build with Maven
● Or download the JAR:
http://maven.torodb.com/jar/com/torodb/torodb/0.20/torodb.jar
● Usage:
java -jar torodb-0.20.jar --help
java -jar torodb-0.20.jar -d dbname -u dbuser -P 27017
Connect with the normal mongo console!

ToroDB: Community Response

ToroDB: Roadmap
● Current Developer Preview is single-node
● Version 1.0:
➔ Expected Q4 2015
➔ Production-ready
➔ MongoDB Replication support
➔ Very high compatibility with Mongo API

ToroDB: Development priorities
#1 Offer a MongoDB-like experience on top of existing IT infrastructure, like relational databases and app servers
#2 Go beyond current MongoDB features, such as in ACID and concurrency
#3 Great performance

ToroDB: Experimental research directions
● Use columnar storage (CitusDB)
● Use Postgres-XL as a backend. This requires us to distribute ToroDB's cache (ehcache, Hazelcast)
● Use pg_shard for sharding

Big Data speaking mongo: Vertical ToroDB
What if we use CitusData's cstore to store the JSON documents?

Big Data speaking mongo: Vertical ToroDB
Vertical ToroDB requires 1.17% to 20.26% of the storage used by MongoDB 2.6
