Sadayuki Furuhashi
Founder & Software Architect
ODBC & JDBC connectivity for Presto
Treasure Data, Inc.
A little about me...
> Sadayuki Furuhashi
github/twitter: @frsyuki
> Treasure Data, Inc.
Founder & Software Architect
> Open source projects
MessagePack - efficient object serializer
Fluentd - data collection tool
ServerEngine - Ruby framework to build multiprocess servers
LS4 - distributed object storage system (suspended)
kumofs - distributed key-value data store (suspended)
Background + Intro:
Background
[Diagram labels: Pig; Tableau, Pentaho, Web apps; RDB, HTTP, etc.; “Plazma” columnar cloud storage]
This is us (Treasure Data)
Data collection
> “Fluentd” streaming data collection tool
> Plugin architecture
> github.com/fluent/fluentd
Hadoop as a service
> “BigData” processing
• Funnel analysis for web services
• Correlation analysis for ad-tech (DSP/SSP/DMP)
• Creating OLAP cubes
> Multi-tenant scheduling
• utilize idling resources purchased by other users
Presto as a service
> Interactive queries
> Multi-tenant scheduling (in progress)
Here is the problem…
ODBC/JDBC: Missing!
The problem to solve
• Providing open-source ODBC/JDBC connectivity for Presto quickly
[Diagram labels: Tableau, Pentaho, Web apps; ODBC/JDBC]
• ODBC/JDBC are VERY complicated APIs
> PostgreSQL ODBC driver: 60,000 lines
> PostgreSQL JDBC driver: 43,000 lines
A solution
• Using PostgreSQL ODBC/JDBC drivers
> feature-complete & matured for many years
• Creating a PostgreSQL protocol gateway for Presto
> some middleware already implemented
Architecture
[Diagram: Tableau, Pentaho, Web apps, … use the PostgreSQL ODBC/JDBC driver or other PostgreSQL clients and speak the PostgreSQL protocol to pgpool-II (patched)]
Internal Architecture
• Tableau… sends: select count(*) from x;
• pgpool-II (patched): wraps the SQL in a function call
> run_presto_as_temp_table(…, 'select count(*) from x');
• PostgreSQL: the function sends the original SQL to Presto
> select count(*) from x;
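The wrapping step above can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual pgpool-II patch: the slide elides the leading arguments of run_presto_as_temp_table, so the sketch takes them as opaque, caller-supplied SQL text (the `leading_args` parameter and both helper names are hypothetical), and the quoting assumes standard SQL string literals.

```python
def quote_literal(text: str) -> str:
    """Quote text as a standard SQL string literal by doubling single quotes."""
    return "'" + text.replace("'", "''") + "'"


def wrap_for_presto(sql: str, leading_args: list[str]) -> str:
    """Wrap a client query in a run_presto_as_temp_table(...) call.

    leading_args stands in for the arguments the slide elides with "…";
    each entry must already be valid SQL text supplied by the caller.
    """
    # Drop surrounding whitespace and the trailing semicolon, as on the slide.
    original = sql.strip().rstrip(";").strip()
    args = list(leading_args) + [quote_literal(original)]
    return "run_presto_as_temp_table(" + ", ".join(args) + ");"


if __name__ == "__main__":
    # "'tmp'" is a made-up placeholder for the elided argument(s).
    print(wrap_for_presto("select count(*) from x;", ["'tmp'"]))
```

Passing the original query as a quoted literal means the PostgreSQL side never has to parse Presto SQL itself; it only forwards the string.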
• Tableau… sends: SELECT from system catalogs, to get metadata of tables
• pgpool-II (patched): get table list
• PostgreSQL: run CREATE TABLE for each actual table
• run the original query
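As a rough sketch of the "run CREATE TABLE for each actual table" step, the snippet below turns a table list into PostgreSQL DDL so that a later SELECT from the system catalogs can see the tables. The input shape, the function names, and the Presto-to-PostgreSQL type mapping are all assumptions made for illustration; the slides do not show how the gateway actually does this.

```python
# Hypothetical, deliberately small type mapping; unknown types fall back to text.
PRESTO_TO_PG = {
    "varchar": "text",
    "bigint": "bigint",
    "double": "double precision",
    "boolean": "boolean",
}


def quote_ident(name: str) -> str:
    """Quote an SQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def create_table_sql(table: str, columns: list[tuple[str, str]]) -> str:
    """Build one CREATE TABLE statement from (column name, Presto type) pairs."""
    cols = ", ".join(
        quote_ident(col) + " " + PRESTO_TO_PG.get(presto_type, "text")
        for col, presto_type in columns
    )
    return "CREATE TABLE " + quote_ident(table) + " (" + cols + ");"


def mirror_tables(tables: dict[str, list[tuple[str, str]]]) -> list[str]:
    """Return one CREATE TABLE statement per table in the table list."""
    return [create_table_sql(name, cols) for name, cols in tables.items()]


if __name__ == "__main__":
    for stmt in mirror_tables({"x": [("id", "bigint"), ("name", "varchar")]}):
        print(stmt)
```

The created tables only need to carry names and column types, since the client's catalog query reads metadata rather than rows.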
Demo
Limitations
• Server-side prepare is not supported
• Cursor (DECLARE/FETCH) is not supported
• JDBC driver needs ?protocolVersion=2 option
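For the JDBC limitation, a connection URL with the required option might be assembled as below. The `jdbc:postgresql://host:port/database?key=value` form is the PostgreSQL JDBC driver's URL syntax; the helper name and the host, port, and database values are placeholders, not values from the talk.

```python
from urllib.parse import urlencode


def jdbc_url(host: str, port: int, database: str, **extra: str) -> str:
    """Build a PostgreSQL JDBC URL that always carries protocolVersion=2."""
    # Extra driver options are appended after the required one.
    params = {"protocolVersion": "2", **extra}
    return f"jdbc:postgresql://{host}:{port}/{database}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder connection details for illustration only.
    print(jdbc_url("localhost", 5432, "mydb"))
```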
We’re hiring!
www.treasuredata.com/careers

Prestogres, ODBC & JDBC connectivity for Presto
