Sadayuki Furuhashi
Founder & Software Architect
Treasure Data, Inc.
Distributed SQL with Presto + MySQL
Dogenzaka LT Festival (道玄坂LT祭り)
A little about me...
> Sadayuki Furuhashi
> github/twitter: @frsyuki
> Treasure Data, Inc.
> Founder & Software Architect
> Open-source hacker
> MessagePack - Efficient object serializer
> Fluentd - A unified data collection tool
> ServerEngine - A Ruby framework to build multiprocess servers
> Prestogres - PostgreSQL protocol gateway for Presto
> LS4 - A distributed object storage with cross-region replication
> kumofs - A distributed, strongly consistent key-value data store
Check: www.treasuredata.com
Cloud service for the entire data pipeline,
including Presto. We’re hiring!
What’s Presto?
A distributed SQL query engine for interactive data analysis against GBs to PBs of data.
[Architecture diagram: Client, Coordinator with Connector Plugin, three Workers, Storage / Metadata, Discovery Service]

1. Discovery Service finds servers in a cluster
2. Client sends a query using HTTP
3. Coordinator builds a query plan; the connector plugin provides metadata (table schema, etc.)
4. Coordinator sends tasks to workers
5. Workers read data through the connector plugin
6. Workers run tasks in memory
7. Client gets the result from a worker
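
The seven steps above can be pictured with a small toy model. This is only a sketch of the flow, not Presto's implementation: `STORAGE`, `ToyConnector`, `worker_task`, and `coordinator_run` are hypothetical names, threads stand in for worker servers, and discovery and the HTTP exchange (steps 1 and 2) are left out.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory data standing in for "Storage / Metadata".
STORAGE = {
    "orders": [
        {"custkey": 1, "price": 10},
        {"custkey": 2, "price": 5},
        {"custkey": 1, "price": 7},
        {"custkey": 3, "price": 2},
    ]
}


class ToyConnector:
    """Stands in for a connector plugin: serves metadata and data splits."""

    def columns(self, table):
        # Metadata lookup used while planning (step 3).
        return sorted(STORAGE[table][0].keys())

    def splits(self, table, n):
        # Divide the table into n chunks that workers read (step 5).
        rows = STORAGE[table]
        return [rows[i::n] for i in range(n)]


def worker_task(split):
    # Steps 5-6: a worker reads its split and aggregates it in memory.
    return sum(row["price"] for row in split)


def coordinator_run(table, connector, workers=3):
    # Step 3: "plan" the query by checking the schema via the connector.
    if "price" not in connector.columns(table):
        raise ValueError("unknown column: price")
    # Step 4: hand one task per split to the workers (threads here).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(worker_task, connector.splits(table, workers)))
    # Step 7: combine partial results into the final answer.
    return sum(partials)


total = coordinator_run("orders", ToyConnector())
print(total)  # sum of all prices: 24
```

The point of the toy is the division of labor: the connector is the only piece that knows about the storage, so swapping the data source means swapping the connector, not the coordinator or workers.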
Hive connector
[Diagram: Client, Coordinator with Hive Connector, three Workers, Discovery Service (find servers in a cluster); storage is HDFS and Hive Metastore]

Cassandra connector
[Diagram: Client, Coordinator with JDBC Connector, three Workers, Discovery Service (find servers in a cluster); storage is Cassandra]

Other connectors ...
[Diagram: Client, Coordinator with other connectors, three Workers, Discovery Service (find servers in a cluster); storage is PostgreSQL]

Multiple connectors in a query
> Hive Connector: HDFS / Metastore
> JDBC Connector
> Other data sources...
MapReduce vs. Presto

MapReduce
[Diagram: map → disk → reduce → disk → map → reduce]
> Write data to disk
> Wait between stages

Presto
[Diagram: task → task → task, memory-to-memory data transfer]
> All stages are pipelined
✓ No wait time
✓ No fault tolerance
> Memory-to-memory data transfer
✓ No disk IO
✓ Data chunk must fit in memory
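
The difference between waiting between stages and pipelining can be illustrated with Python generators. This is a minimal sketch under simplifying assumptions, not how either system is implemented: `scan`, `double`, `run_staged`, and `run_pipelined` are hypothetical names, and a fully materialized list stands in for data written to disk between stages.

```python
def scan(rows, log):
    # First stage: emit rows one at a time.
    for row in rows:
        log.append(f"scan {row}")
        yield row


def double(rows, log):
    # Second stage: transform each row as it arrives.
    for row in rows:
        log.append(f"double {row}")
        yield row * 2


def run_staged(rows):
    # MapReduce-like: finish and materialize stage 1 before stage 2 starts.
    log = []
    stage1 = list(scan(rows, log))
    stage2 = list(double(stage1, log))
    return stage2, log


def run_pipelined(rows):
    # Presto-like: stage 2 consumes each row as soon as stage 1 yields it.
    log = []
    result = list(double(scan(rows, log), log))
    return result, log


staged_result, staged_log = run_staged([1, 2])
piped_result, piped_log = run_pipelined([1, 2])

print(staged_log)  # ['scan 1', 'scan 2', 'double 1', 'double 2']
print(piped_log)   # ['scan 1', 'double 1', 'scan 2', 'double 2']
```

Both runs produce the same result; only the interleaving differs. The pipelined run never holds a complete intermediate result, which is also why a failure mid-query leaves nothing to restart from.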
Presto meetup!
[Diagram: client queries Presto, which JOINs data from Hive and MySQL]

select orderkey, orderdate, custkey, email
from orders
join mysql.presto_test.users
  on orders.custkey = users.id
order by custkey, orderdate;
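
To make the semantics of this cross-source join concrete, here is a small in-memory sketch. It assumes a simple hash join and uses made-up sample rows; the slides do not say which join strategy Presto picks for this query, and `hash_join` is a hypothetical helper, not a Presto API.

```python
# Hypothetical sample rows; in the query above, orders would come from Hive
# and users from the MySQL table mysql.presto_test.users.
orders = [
    {"orderkey": 100, "orderdate": "2014-01-02", "custkey": 2},
    {"orderkey": 101, "orderdate": "2014-01-01", "custkey": 1},
    {"orderkey": 102, "orderdate": "2014-01-03", "custkey": 1},
    {"orderkey": 103, "orderdate": "2014-01-04", "custkey": 9},  # no matching user
]
users = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "b@example.com"},
]


def hash_join(orders, users):
    # Build side: index the (usually smaller) users table by its join key.
    users_by_id = {u["id"]: u for u in users}
    joined = []
    # Probe side: look up each order's custkey; an inner join drops misses.
    for o in orders:
        u = users_by_id.get(o["custkey"])
        if u is not None:
            joined.append((o["orderkey"], o["orderdate"], o["custkey"], u["email"]))
    # order by custkey, orderdate
    joined.sort(key=lambda row: (row[2], row[1]))
    return joined


result = hash_join(orders, users)
for row in result:
    print(row)
```

Order 103 is dropped because no user has id 9, matching the inner-join semantics of the SQL.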
[Diagram: client queries Presto, which JOINs data from Hive and MySQL and INSERTs INTO MySQL]

create table mysql.presto_test.recent_user_info
as
select users.id, users.email, count(1) as count
from orders
join mysql.presto_test.users
  on orders.custkey = users.id
group by 1, 2;
[Diagram: "$ psql" connects to Prestogres in front of Presto, which JOINs data from Hive and MySQL]

Prestogres: PostgreSQL protocol gateway for Presto
