Data Analytics Service Company
and Its Ruby Usage
RubyKaigi 2015 (Dec 12, 2015)
Satoshi Tagomori (@tagomoris)
Satoshi "Moris" Tagomori
(@tagomoris)
Fluentd, MessagePack-Ruby, Norikra, ...
Treasure Data, Inc.
Data Analytics Service Company and Its Ruby Usage
http://www.treasuredata.com/

http://www.fluentd.org/
Fluentd
Unified Logging Layer
For Stream Data
Written in CRuby
http://www.slideshare.net/treasure-data/the-basics-of-fluentd-35681111
Bulk Data Loader
High Throughput & Reliability
Embulk
Written in Java/JRuby
http://www.slideshare.net/frsyuki/embuk-making-data-integration-works-relaxed
http://www.embulk.org/
Data Analytics Platform
Data Analytics Service
Data Analytics Service Company and Its Ruby Usage

Data Analytics Service Company and Its Ruby Usage
Services: JVM, C++

Data Analytics Flow
[Diagram: Data source → Collect → Store → Process → Visualize (Reporting, Monitoring)]
Data Analytics Platform
• Data collection, storage
• Console & API endpoints
• Schema management
• Processing (batch, query, ...)
• Queuing & Scheduling
• Data connector/exporter
Treasure Data Internals

Data Analytics Platform
• Data collection, storage: Ruby(OSS), Java/JRuby(OSS)
• Console & API endpoints: Ruby(RoR)
• Schema management: Ruby/Java (MessagePack)
• Processing (batch, query, ...): Java(Hadoop,Presto)
• Queuing & Scheduling: Ruby(OSS)
• Data connector/exporter: Java, Java/JRuby(OSS)
Treasure Data Architecture: Overview
http://www.slideshare.net/tagomoris/data-analytics-service-company-and-its-ruby-usage

[Architecture diagram: USERS / TD SDKs / SERVERS and CUSTOMER's SYSTEMS send data in via Console, API, EventCollector, and DataConnector into PlazmaDB; Worker and Scheduler drive jobs on the Hadoop and Presto clusters. Annotated rates: 50k/day, 200k/day, and 12M/day (138/sec).]
Queue/Worker, Scheduler
• Treasure Data: multi-tenant data analytics service
• executes many jobs in shared clusters (queries, imports, ...)
• CORE: queues-workers & schedulers
• Clusters have their own queues/schedulers... but that's not enough:
• resource limits for each price plan
• priority queues per job type
• and many others
PerfectSched
https://github.com/treasure-data/perfectsched
PerfectSched
• Provides periodic/scheduled queries for customers
• it's like a reliable "cron"
• Highly available distributed scheduler using RDBMS
• Written in CRuby
• At-least-once semantics
• PerfectSched enqueues jobs into PerfectQueue
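
The pattern behind it is easy to sketch: schedules live in an RDBMS table, and competing scheduler processes claim due entries inside a transaction, so a crash before COMMIT simply releases the row to be claimed again (at-least-once). A minimal Ruby sketch, assuming a hypothetical schedules table and the mysql2 gem; this illustrates the idea, not PerfectSched's actual code:

require "mysql2"

# Claim one due schedule with SELECT ... FOR UPDATE so concurrent
# schedulers never double-fire, then enqueue the job and advance
# next_run_time in the same transaction.
# Hypothetical schema: schedules(id, job, next_run_time, interval_sec)
def run_due_schedule(db)
  db.query("BEGIN")
  row = db.query(
    "SELECT id, job, next_run_time, interval_sec FROM schedules " \
    "WHERE next_run_time <= #{Time.now.to_i} " \
    "ORDER BY next_run_time LIMIT 1 FOR UPDATE"
  ).first
  if row
    # Enqueue into the job queue (PerfectQueue in the real system).
    db.query("INSERT INTO queue (payload, created_at) " \
             "VALUES ('#{db.escape(row["job"])}', #{row["next_run_time"]})")
    db.query("UPDATE schedules " \
             "SET next_run_time = #{row["next_run_time"] + row["interval_sec"]} " \
             "WHERE id = #{row["id"]}")
  end
  db.query("COMMIT")  # a crash before this line means the schedule fires again
rescue => e
  db.query("ROLLBACK")
  raise e
end

db = Mysql2::Client.new(host: "localhost", username: "sched", database: "td")
run_due_schedule(db)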

Jobs in TD
(NG = must not happen / not acceptable, OK = acceptable)

                     LOST  Duplicated  Retried for Errors  Throughput  Execution time
DATA import/export   NG    NG          OK or NG            HIGH        SHORT (secs-mins)
QUERY                NG    OK          OK                  LOW         SHORT (secs) or LONG (mins-hours)
PerfectQueue
https://github.com/treasure-data/perfectqueue
PerfectQueue overview
[Diagram: MySQL with 1 table for 1 queue; workers for queues; each worker node runs a hypervisor process that manages worker processes]
Features
• Priorities for query types
• Resource limits per account
• Graceful restarts
• Queries may keep running for a long time (<= 1 day)
• new worker code must be loaded while jobs keep running on older code
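
Graceful restart with day-long jobs means the hypervisor has to start fresh worker processes from the newly deployed code while old workers finish their current jobs on the old code. A sketch of that process model (hypothetical, not PerfectQueue's actual code):

# On SIGUSR1 the hypervisor spawns workers from the re-deployed code
# on disk and asks old workers to stop after their current job, so
# jobs started under the old code finish on the old code.
class Hypervisor
  def initialize(worker_cmd, count)
    @worker_cmd = worker_cmd   # e.g. "bundle exec ruby worker.rb"
    @count = count
    @pids = []
  end

  def start
    @pids = Array.new(@count) { spawn(@worker_cmd) }
    trap(:USR1) { graceful_restart }
    Process.waitall
  end

  def graceful_restart
    old = @pids
    @pids = Array.new(@count) { spawn(@worker_cmd) }  # new code is loaded here
    # Convention: TERM = finish the running job, take no new one, then exit.
    old.each { |pid| Process.kill(:TERM, pid) }
  end
end

Hypervisor.new("bundle exec ruby worker.rb", 4).start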

PerfectQueue
• Highly available distributed queue using RDBMS
• Enqueue by INSERT INTO
• Dequeue/Commit by UPDATE
• Using transactions
• Flexible scheduling rather than scalability
• Workers do many things
• PlazmaDB operations (including importing data)
• Building job parameters
• Handling results of jobs + kicking other jobs
• Using Amazon RDS (MySQL) internally (+ Workers on EC2)
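
Those first bullets are the whole trick: a job is a row, enqueue is an INSERT, and dequeue is a single atomic UPDATE that takes a lease on the oldest unowned row, so at-least-once retry falls out of lease expiry. A toy version (hypothetical schema, mysql2 gem; not PerfectQueue's real code):

require "mysql2"
require "json"
require "securerandom"

# One table == one queue. A row is "owned" while owner_expire_at is in
# the future; dequeue claims the oldest unowned row with one atomic
# UPDATE, so competing workers never take the same job at the same time.
# Hypothetical schema: queue(id, payload, owner, owner_expire_at, created_at)
class TinyQueue
  LEASE = 300  # seconds; an uncommitted job is retried after the lease expires

  def initialize(db)
    @db = db
  end

  def enqueue(payload)
    @db.query("INSERT INTO queue (payload, owner, owner_expire_at, created_at) " \
              "VALUES ('#{@db.escape(payload.to_json)}', '', 0, #{Time.now.to_i})")
  end

  def dequeue
    token = SecureRandom.hex(8)
    @db.query("UPDATE queue SET owner = '#{token}', " \
              "owner_expire_at = #{Time.now.to_i + LEASE} " \
              "WHERE owner_expire_at < #{Time.now.to_i} " \
              "ORDER BY created_at LIMIT 1")
    return nil if @db.affected_rows.zero?
    row = @db.query("SELECT id, payload FROM queue WHERE owner = '#{token}'").first
    [row["id"], JSON.parse(row["payload"])]
  end

  # Commit == DELETE. If a worker dies before committing, the lease
  # expires and another worker retries the job: at-least-once semantics.
  def commit(id)
    @db.query("DELETE FROM queue WHERE id = #{id.to_i}")
  end
end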
Building Jobs/Parameters
• Parameters
• for job types, accounts, price plans and clusters
• to control performance/parallelism, permissions
and data types
• ex: Java properties
• Jobs
• to prepare for customers' queries
• to make queries safer/faster
• ex: Hive Queries (HiveQL, a variety of SQL)
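
Every flag in the command below is produced by the worker from layered configuration. A toy sketch of that flattening, with illustrative names only, to show the shape of the problem:

require "json"
require "shellwords"

# Toy sketch of parameter building: merge plan defaults, per-account
# overrides and per-job settings (later layers win), then flatten the
# result into -hiveconf flags like the real command below. In the real
# worker these layers come from YAML/JSON configs, internal APIs and
# RDBMSs; every name here is illustrative only.
def hiveconf_args(*layers)
  merged = layers.reduce({}) { |acc, layer| acc.merge(layer) }
  merged.map { |k, v| "-hiveconf #{k}=#{Shellwords.escape(v.to_s)}" }
end

plan_defaults    = { "mapreduce.job.priority"            => "NORMAL",
                     "yarn.app.mapreduce.am.resource.mb" => 2048 }
account_override = { "mapreduce.job.priority"  => "HIGH",
                     "mapreduce.job.queuename" => "root.q221.high" }
job_settings     = { "td.query.apikey"      => 12345,
                     "plazma.metadb.config" => JSON.generate({}) }

puts hiveconf_args(plan_defaults, account_override, job_settings)
# => -hiveconf mapreduce.job.priority=HIGH
#    -hiveconf mapreduce.job.queuename=root.q221.high ...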
Example: Hive job
env HADOOP_CLASSPATH=test.jar:td-hadoop-1.0.jar 
HADOOP_OPTS="-Xmx738m -Duser.name=221" 
hive --service jar td-hadoop-1.0.jar 
com.treasure_data.hadoop.hive.runner.QueryRunner 
-hiveconf td.jar.version= 
-hiveconf plazma.metadb.config={} 
-hiveconf plazma.storage.config={} 
-hiveconf td.worker.database.config={} 
-hiveconf mapreduce.job.priority=HIGH 
-hiveconf mapreduce.job.queuename=root.q221.high 
-hiveconf mapreduce.job.name=HiveJob379515 
-hiveconf td.query.mergeThreshold=1333382400 
-hiveconf td.query.apikey=12345 
-hiveconf td.scheduled.time=1342449253 
-hiveconf td.outdir=./jobs/379515 
-hiveconf hive.metastore.warehouse.dir=/user/hive/221/warehouse 
-hiveconf hive.auto.convert.join.noconditionaltask=false 
-hiveconf hive.mapjoin.localtask.max.memory.usage=0.7 
-hiveconf hive.mapjoin.smalltable.filesize=25000000 
-hiveconf hive.resultset.use.unique.column.names=false 
-hiveconf hive.auto.convert.join=false 
-hiveconf hive.optimize.sort.dynamic.partition=false 
-hiveconf mapreduce.job.reduces=-1 
-hiveconf hive.vectorized.execution.enabled=false 
-hiveconf mapreduce.job.ubertask.enable=true 
-hiveconf yarn.app.mapreduce.am.resource.mb=2048 
-hiveconf mapreduce.job.ubertask.maxmaps=1 
-hiveconf mapreduce.job.ubertask.maxreduces=1 
-hiveconf mapreduce.job.ubertask.maxbytes=536870912 
-hiveconf td.hive.insertInto.dynamic.partitioning=false 
-outdir ./jobs/379515

Example: Hive job (cont)
ADD JAR 'td-hadoop-1.0.jar';
CREATE DATABASE IF NOT EXISTS `db`;
USE `db`;
CREATE TABLE tagomoris (`v` MAP<STRING,STRING>, `time` INT)
STORED BY 'com.treasure_data.hadoop.hive.mapred.TDStorageHandler'
WITH SERDEPROPERTIES ('msgpack.columns.mapping'='*,time')
TBLPROPERTIES (
'td.storage.user'='221',
'td.storage.database'='dfc',
'td.storage.table'='users_20100604_080812_ce9203d0',
'td.storage.path'='221/dfc/users_20100604_080812_ce9203d0',
'td.table_id'='2',
'td.modifiable'='true',
'plazma.data_set.name'='221/dfc/users_20100604_080812_ce9203d0'
);
CREATE TABLE tbl1 (
`uid` INT,
`key` STRING,
`time` INT
)
STORED BY 'com.treasure_data.hadoop.hive.mapred.TDStorageHandler'
WITH SERDEPROPERTIES ('msgpack.columns.mapping'='uid,key,time')
TBLPROPERTIES (
'td.storage.user'='221',
'td.storage.database'='dfc',
'td.storage.table'='contests_20100606_120720_96abe81a',
'td.storage.path'='221/dfc/contests_20100606_120720_96abe81a',
'td.table_id'='4',
'td.modifiable'='true',
'plazma.data_set.name'='221/dfc/contests_20100606_120720_96abe81a'
);
USE `db`;
CREATE TEMPORARY FUNCTION MSGPACK_SERIALIZE AS
'com.treasure_data.hadoop.hive.udf.MessagePackSerialize';
CREATE TEMPORARY FUNCTION TD_TIME_RANGE AS
'com.treasure_data.hadoop.hive.udf.GenericUDFTimeRange';
CREATE TEMPORARY FUNCTION TD_TIME_ADD AS
'com.treasure_data.hadoop.hive.udf.UDFTimeAdd';
CREATE TEMPORARY FUNCTION TD_TIME_FORMAT AS
'com.treasure_data.hadoop.hive.udf.UDFTimeFormat';
CREATE TEMPORARY FUNCTION TD_TIME_PARSE AS
'com.treasure_data.hadoop.hive.udf.UDFTimeParse';
CREATE TEMPORARY FUNCTION TD_SCHEDULED_TIME AS
'com.treasure_data.hadoop.hive.udf.GenericUDFScheduledTime';
CREATE TEMPORARY FUNCTION TD_X_RANK AS
'com.treasure_data.hadoop.hive.udf.Rank';
CREATE TEMPORARY FUNCTION TD_FIRST AS
'com.treasure_data.hadoop.hive.udf.GenericUDAFFirst';
CREATE TEMPORARY FUNCTION TD_LAST AS
'com.treasure_data.hadoop.hive.udf.GenericUDAFLast';
CREATE TEMPORARY FUNCTION TD_SESSIONIZE AS
'com.treasure_data.hadoop.hive.udf.UDFSessionize';
CREATE TEMPORARY FUNCTION TD_PARSE_USER_AGENT AS
'com.treasure_data.hadoop.hive.udf.GenericUDFParseUserAgent';
CREATE TEMPORARY FUNCTION TD_HEX2NUM AS
'com.treasure_data.hadoop.hive.udf.UDFHex2num';
CREATE TEMPORARY FUNCTION TD_MD5 AS
'com.treasure_data.hadoop.hive.udf.UDFmd5';
CREATE TEMPORARY FUNCTION TD_RANK_SEQUENCE AS
'com.treasure_data.hadoop.hive.udf.UDFRankSequence';
CREATE TEMPORARY FUNCTION TD_STRING_EXPLODER AS
'com.treasure_data.hadoop.hive.udf.GenericUDTFStringExploder';
CREATE TEMPORARY FUNCTION TD_URL_DECODE AS
'com.treasure_data.hadoop.hive.udf.UDFUrlDecode';
CREATE TEMPORARY FUNCTION TD_DATE_TRUNC AS
'com.treasure_data.hadoop.hive.udf.UDFDateTrunc';
CREATE TEMPORARY FUNCTION TD_LAT_LONG_TO_COUNTRY AS
'com.treasure_data.hadoop.hive.udf.UDFLatLongToCountry';
CREATE TEMPORARY FUNCTION TD_SUBSTRING_INENCODING AS
'com.treasure_data.hadoop.hive.udf.GenericUDFSubstringInEncoding';
CREATE TEMPORARY FUNCTION TD_DIVIDE AS
'com.treasure_data.hadoop.hive.udf.GenericUDFDivide';
CREATE TEMPORARY FUNCTION TD_SUMIF AS
'com.treasure_data.hadoop.hive.udf.GenericUDAFSumIf';
CREATE TEMPORARY FUNCTION TD_AVGIF AS
'com.treasure_data.hadoop.hive.udf.GenericUDAFAvgIf';
CREATE TEMPORARY FUNCTION hivemall_version AS
'hivemall.HivemallVersionUDF';
CREATE TEMPORARY FUNCTION perceptron AS
'hivemall.classifier.PerceptronUDTF';
CREATE TEMPORARY FUNCTION train_perceptron AS
'hivemall.classifier.PerceptronUDTF';
CREATE TEMPORARY FUNCTION train_pa AS
'hivemall.classifier.PassiveAggressiveUDTF';
CREATE TEMPORARY FUNCTION train_pa1 AS
'hivemall.classifier.PassiveAggressiveUDTF';
CREATE TEMPORARY FUNCTION train_pa2 AS
'hivemall.classifier.PassiveAggressiveUDTF';
CREATE TEMPORARY FUNCTION train_cw AS
'hivemall.classifier.ConfidenceWeightedUDTF';
CREATE TEMPORARY FUNCTION train_arow AS
'hivemall.classifier.AROWClassifierUDTF';
CREATE TEMPORARY FUNCTION train_arowh AS
'hivemall.classifier.AROWClassifierUDTF';

CREATE TEMPORARY FUNCTION train_scw AS
'hivemall.classifier.SoftConfideceWeightedUDTF';
CREATE TEMPORARY FUNCTION train_scw2 AS
'hivemall.classifier.SoftConfideceWeightedUDTF';
CREATE TEMPORARY FUNCTION adagrad_rda AS
'hivemall.classifier.AdaGradRDAUDTF';
CREATE TEMPORARY FUNCTION train_adagrad_rda AS
'hivemall.classifier.AdaGradRDAUDTF';
CREATE TEMPORARY FUNCTION train_multiclass_perceptron AS
'hivemall.classifier.multiclass.MulticlassPerceptronUDTF';
CREATE TEMPORARY FUNCTION train_multiclass_pa AS
'hivemall.classifier.multiclass.MulticlassPassiveAggressiveUDTF';
CREATE TEMPORARY FUNCTION train_multiclass_pa1 AS
'hivemall.classifier.multiclass.MulticlassPassiveAggressiveUDTF';
CREATE TEMPORARY FUNCTION train_multiclass_pa2 AS
'hivemall.classifier.multiclass.MulticlassPassiveAggressiveUDTF';
CREATE TEMPORARY FUNCTION train_multiclass_cw AS
'hivemall.classifier.multiclass.MulticlassConfidenceWeightedUDTF';
CREATE TEMPORARY FUNCTION train_multiclass_arow AS
'hivemall.classifier.multiclass.MulticlassAROWClassifierUDTF';
CREATE TEMPORARY FUNCTION train_multiclass_scw AS
'hivemall.classifier.multiclass.MulticlassSoftConfidenceWeightedUDTF';
CREATE TEMPORARY FUNCTION train_multiclass_scw2 AS
'hivemall.classifier.multiclass.MulticlassSoftConfidenceWeightedUDTF';
CREATE TEMPORARY FUNCTION cosine_similarity AS
'hivemall.knn.similarity.CosineSimilarityUDF';
CREATE TEMPORARY FUNCTION cosine_sim AS
'hivemall.knn.similarity.CosineSimilarityUDF';
CREATE TEMPORARY FUNCTION jaccard AS
'hivemall.knn.similarity.JaccardIndexUDF';
CREATE TEMPORARY FUNCTION jaccard_similarity AS
'hivemall.knn.similarity.JaccardIndexUDF';
CREATE TEMPORARY FUNCTION angular_similarity AS
'hivemall.knn.similarity.AngularSimilarityUDF';
CREATE TEMPORARY FUNCTION euclid_similarity AS
'hivemall.knn.similarity.EuclidSimilarity';
CREATE TEMPORARY FUNCTION distance2similarity AS
'hivemall.knn.similarity.Distance2SimilarityUDF';
CREATE TEMPORARY FUNCTION hamming_distance AS
'hivemall.knn.distance.HammingDistanceUDF';
CREATE TEMPORARY FUNCTION popcnt AS 'hivemall.knn.distance.PopcountUDF';
CREATE TEMPORARY FUNCTION kld AS 'hivemall.knn.distance.KLDivergenceUDF';
CREATE TEMPORARY FUNCTION euclid_distance AS
'hivemall.knn.distance.EuclidDistanceUDF';
CREATE TEMPORARY FUNCTION cosine_distance AS
'hivemall.knn.distance.CosineDistanceUDF';
CREATE TEMPORARY FUNCTION angular_distance AS
'hivemall.knn.distance.AngularDistanceUDF';
CREATE TEMPORARY FUNCTION jaccard_distance AS
'hivemall.knn.distance.JaccardDistanceUDF';
CREATE TEMPORARY FUNCTION manhattan_distance AS
'hivemall.knn.distance.ManhattanDistanceUDF';
CREATE TEMPORARY FUNCTION minkowski_distance AS
'hivemall.knn.distance.MinkowskiDistanceUDF';
CREATE TEMPORARY FUNCTION minhashes AS 'hivemall.knn.lsh.MinHashesUDF';
CREATE TEMPORARY FUNCTION minhash AS 'hivemall.knn.lsh.MinHashUDTF';
CREATE TEMPORARY FUNCTION bbit_minhash AS
'hivemall.knn.lsh.bBitMinHashUDF';
CREATE TEMPORARY FUNCTION voted_avg AS
'hivemall.ensemble.bagging.VotedAvgUDAF';
CREATE TEMPORARY FUNCTION weight_voted_avg AS
'hivemall.ensemble.bagging.WeightVotedAvgUDAF';
CREATE TEMPORARY FUNCTION wvoted_avg AS
'hivemall.ensemble.bagging.WeightVotedAvgUDAF';
CREATE TEMPORARY FUNCTION max_label AS
'hivemall.ensemble.MaxValueLabelUDAF';
CREATE TEMPORARY FUNCTION maxrow AS 'hivemall.ensemble.MaxRowUDAF';
CREATE TEMPORARY FUNCTION argmin_kld AS
'hivemall.ensemble.ArgminKLDistanceUDAF';
CREATE TEMPORARY FUNCTION mhash AS
'hivemall.ftvec.hashing.MurmurHash3UDF';
CREATE TEMPORARY FUNCTION sha1 AS 'hivemall.ftvec.hashing.Sha1UDF';
CREATE TEMPORARY FUNCTION array_hash_values AS
'hivemall.ftvec.hashing.ArrayHashValuesUDF';
CREATE TEMPORARY FUNCTION prefixed_hash_values AS
'hivemall.ftvec.hashing.ArrayPrefixedHashValuesUDF';
CREATE TEMPORARY FUNCTION polynomial_features AS
'hivemall.ftvec.pairing.PolynomialFeaturesUDF';
CREATE TEMPORARY FUNCTION powered_features AS
'hivemall.ftvec.pairing.PoweredFeaturesUDF';
CREATE TEMPORARY FUNCTION rescale AS 'hivemall.ftvec.scaling.RescaleUDF';
CREATE TEMPORARY FUNCTION rescale_fv AS
'hivemall.ftvec.scaling.RescaleUDF';
CREATE TEMPORARY FUNCTION zscore AS 'hivemall.ftvec.scaling.ZScoreUDF';
CREATE TEMPORARY FUNCTION normalize AS
'hivemall.ftvec.scaling.L2NormalizationUDF';
CREATE TEMPORARY FUNCTION conv2dense AS
'hivemall.ftvec.conv.ConvertToDenseModelUDAF';
CREATE TEMPORARY FUNCTION to_dense_features AS
'hivemall.ftvec.conv.ToDenseFeaturesUDF';
CREATE TEMPORARY FUNCTION to_dense AS
'hivemall.ftvec.conv.ToDenseFeaturesUDF';
CREATE TEMPORARY FUNCTION to_sparse_features AS
'hivemall.ftvec.conv.ToSparseFeaturesUDF';
CREATE TEMPORARY FUNCTION to_sparse AS
'hivemall.ftvec.conv.ToSparseFeaturesUDF';
CREATE TEMPORARY FUNCTION quantify AS
'hivemall.ftvec.conv.QuantifyColumnsUDTF';
CREATE TEMPORARY FUNCTION vectorize_features AS
'hivemall.ftvec.trans.VectorizeFeaturesUDF';
CREATE TEMPORARY FUNCTION categorical_features AS
'hivemall.ftvec.trans.CategoricalFeaturesUDF';
CREATE TEMPORARY FUNCTION indexed_features AS
'hivemall.ftvec.trans.IndexedFeatures';
CREATE TEMPORARY FUNCTION quantified_features AS
'hivemall.ftvec.trans.QuantifiedFeaturesUDTF';
CREATE TEMPORARY FUNCTION quantitative_features AS
'hivemall.ftvec.trans.QuantitativeFeaturesUDF';
CREATE TEMPORARY FUNCTION amplify AS
'hivemall.ftvec.amplify.AmplifierUDTF';
CREATE TEMPORARY FUNCTION rand_amplify AS
'hivemall.ftvec.amplify.RandomAmplifierUDTF';
CREATE TEMPORARY FUNCTION addBias AS 'hivemall.ftvec.AddBiasUDF';
CREATE TEMPORARY FUNCTION add_bias AS 'hivemall.ftvec.AddBiasUDF';
CREATE TEMPORARY FUNCTION sortByFeature AS
'hivemall.ftvec.SortByFeatureUDF';
CREATE TEMPORARY FUNCTION sort_by_feature AS
'hivemall.ftvec.SortByFeatureUDF';
CREATE TEMPORARY FUNCTION extract_feature AS
'hivemall.ftvec.ExtractFeatureUDF';

CREATE TEMPORARY FUNCTION extract_weight AS
'hivemall.ftvec.ExtractWeightUDF';
CREATE TEMPORARY FUNCTION add_feature_index AS
'hivemall.ftvec.AddFeatureIndexUDF';
CREATE TEMPORARY FUNCTION feature AS 'hivemall.ftvec.FeatureUDF';
CREATE TEMPORARY FUNCTION feature_index AS
'hivemall.ftvec.FeatureIndexUDF';
CREATE TEMPORARY FUNCTION tf AS 'hivemall.ftvec.text.TermFrequencyUDAF';
CREATE TEMPORARY FUNCTION train_logregr AS
'hivemall.regression.LogressUDTF';
CREATE TEMPORARY FUNCTION train_pa1_regr AS
'hivemall.regression.PassiveAggressiveRegressionUDTF';
CREATE TEMPORARY FUNCTION train_pa1a_regr AS
'hivemall.regression.PassiveAggressiveRegressionUDTF';
CREATE TEMPORARY FUNCTION train_pa2_regr AS
'hivemall.regression.PassiveAggressiveRegressionUDTF';
CREATE TEMPORARY FUNCTION train_pa2a_regr AS
'hivemall.regression.PassiveAggressiveRegressionUDTF';
CREATE TEMPORARY FUNCTION train_arow_regr AS
'hivemall.regression.AROWRegressionUDTF';
CREATE TEMPORARY FUNCTION train_arowe_regr AS
'hivemall.regression.AROWRegressionUDTF';
CREATE TEMPORARY FUNCTION train_arowe2_regr AS
'hivemall.regression.AROWRegressionUDTF';
CREATE TEMPORARY FUNCTION train_adagrad_regr AS
'hivemall.regression.AdaGradUDTF';
CREATE TEMPORARY FUNCTION train_adadelta_regr AS
'hivemall.regression.AdaDeltaUDTF';
CREATE TEMPORARY FUNCTION train_adagrad AS
'hivemall.regression.AdaGradUDTF';
CREATE TEMPORARY FUNCTION train_adadelta AS
'hivemall.regression.AdaDeltaUDTF';
CREATE TEMPORARY FUNCTION logress AS 'hivemall.regression.LogressUDTF';
CREATE TEMPORARY FUNCTION pa1_regress AS
'hivemall.regression.PassiveAggressiveRegressionUDTF';
CREATE TEMPORARY FUNCTION pa1a_regress AS
'hivemall.regression.PassiveAggressiveRegressionUDTF';
CREATE TEMPORARY FUNCTION pa2_regress AS
'hivemall.regression.PassiveAggressiveRegressionUDTF';
CREATE TEMPORARY FUNCTION pa2a_regress AS
'hivemall.regression.PassiveAggressiveRegressionUDTF';
CREATE TEMPORARY FUNCTION arow_regress AS
'hivemall.regression.AROWRegressionUDTF';
CREATE TEMPORARY FUNCTION arowe_regress AS
'hivemall.regression.AROWRegressionUDTF';
CREATE TEMPORARY FUNCTION arowe2_regress AS
'hivemall.regression.AROWRegressionUDTF';
CREATE TEMPORARY FUNCTION adagrad AS 'hivemall.regression.AdaGradUDTF';
CREATE TEMPORARY FUNCTION adadelta AS 'hivemall.regression.AdaDeltaUDTF';
CREATE TEMPORARY FUNCTION float_array AS
'hivemall.tools.array.AllocFloatArrayUDF';
CREATE TEMPORARY FUNCTION array_remove AS
'hivemall.tools.array.ArrayRemoveUDF';
CREATE TEMPORARY FUNCTION sort_and_uniq_array AS
'hivemall.tools.array.SortAndUniqArrayUDF';
CREATE TEMPORARY FUNCTION subarray_endwith AS
'hivemall.tools.array.SubarrayEndWithUDF';
CREATE TEMPORARY FUNCTION subarray_startwith AS
'hivemall.tools.array.SubarrayStartWithUDF';
CREATE TEMPORARY FUNCTION collect_all AS
'hivemall.tools.array.CollectAllUDAF';
CREATE TEMPORARY FUNCTION concat_array AS
'hivemall.tools.array.ConcatArrayUDF';
CREATE TEMPORARY FUNCTION subarray AS 'hivemall.tools.array.SubarrayUDF';
CREATE TEMPORARY FUNCTION array_avg AS
'hivemall.tools.array.ArrayAvgGenericUDAF';
CREATE TEMPORARY FUNCTION array_sum AS
'hivemall.tools.array.ArraySumUDAF';
CREATE TEMPORARY FUNCTION to_string_array AS
'hivemall.tools.array.ToStringArrayUDF';
CREATE TEMPORARY FUNCTION map_get_sum AS
'hivemall.tools.map.MapGetSumUDF';
CREATE TEMPORARY FUNCTION map_tail_n AS 'hivemall.tools.map.MapTailNUDF';
CREATE TEMPORARY FUNCTION to_map AS 'hivemall.tools.map.UDAFToMap';
CREATE TEMPORARY FUNCTION to_ordered_map AS
'hivemall.tools.map.UDAFToOrderedMap';
CREATE TEMPORARY FUNCTION sigmoid AS
'hivemall.tools.math.SigmoidGenericUDF';
CREATE TEMPORARY FUNCTION taskid AS 'hivemall.tools.mapred.TaskIdUDF';
CREATE TEMPORARY FUNCTION jobid AS 'hivemall.tools.mapred.JobIdUDF';
CREATE TEMPORARY FUNCTION rowid AS 'hivemall.tools.mapred.RowIdUDF';
CREATE TEMPORARY FUNCTION generate_series AS
'hivemall.tools.GenerateSeriesUDTF';
CREATE TEMPORARY FUNCTION convert_label AS
'hivemall.tools.ConvertLabelUDF';
CREATE TEMPORARY FUNCTION x_rank AS 'hivemall.tools.RankSequenceUDF';
CREATE TEMPORARY FUNCTION each_top_k AS 'hivemall.tools.EachTopKUDTF';
CREATE TEMPORARY FUNCTION tokenize AS 'hivemall.tools.text.TokenizeUDF';
CREATE TEMPORARY FUNCTION is_stopword AS
'hivemall.tools.text.StopwordUDF';
CREATE TEMPORARY FUNCTION split_words AS
'hivemall.tools.text.SplitWordsUDF';
CREATE TEMPORARY FUNCTION normalize_unicode AS
'hivemall.tools.text.NormalizeUnicodeUDF';
CREATE TEMPORARY FUNCTION lr_datagen AS
'hivemall.dataset.LogisticRegressionDataGeneratorUDTF';
CREATE TEMPORARY FUNCTION f1score AS 'hivemall.evaluation.FMeasureUDAF';
CREATE TEMPORARY FUNCTION mae AS
'hivemall.evaluation.MeanAbsoluteErrorUDAF';
CREATE TEMPORARY FUNCTION mse AS
'hivemall.evaluation.MeanSquaredErrorUDAF';
CREATE TEMPORARY FUNCTION rmse AS
'hivemall.evaluation.RootMeanSquaredErrorUDAF';
CREATE TEMPORARY FUNCTION mf_predict AS 'hivemall.mf.MFPredictionUDF';
CREATE TEMPORARY FUNCTION train_mf_sgd AS
'hivemall.mf.MatrixFactorizationSGDUDTF';
CREATE TEMPORARY FUNCTION train_mf_adagrad AS
'hivemall.mf.MatrixFactorizationAdaGradUDTF';
CREATE TEMPORARY FUNCTION fm_predict AS
'hivemall.fm.FMPredictGenericUDAF';
CREATE TEMPORARY FUNCTION train_fm AS
'hivemall.fm.FactorizationMachineUDTF';
CREATE TEMPORARY FUNCTION train_randomforest_classifier AS
'hivemall.smile.classification.RandomForestClassifierUDTF';
CREATE TEMPORARY FUNCTION train_rf_classifier AS
'hivemall.smile.classification.RandomForestClassifierUDTF';
CREATE TEMPORARY FUNCTION train_randomforest_regr AS
'hivemall.smile.regression.RandomForestRegressionUDTF';
CREATE TEMPORARY FUNCTION train_rf_regr AS
'hivemall.smile.regression.RandomForestRegressionUDTF';
CREATE TEMPORARY FUNCTION tree_predict AS
'hivemall.smile.tools.TreePredictByStackMachineUDF';

CREATE TEMPORARY FUNCTION vm_tree_predict AS
'hivemall.smile.tools.TreePredictByStackMachineUDF';
CREATE TEMPORARY FUNCTION rf_ensemble AS
'hivemall.smile.tools.RandomForestEnsembleUDAF';
CREATE TEMPORARY FUNCTION train_gradient_boosting_classifier AS
'hivemall.smile.classification.GradientTreeBoostingClassifierUDTF';
CREATE TEMPORARY FUNCTION guess_attribute_types AS
'hivemall.smile.tools.GuessAttributesUDF';
CREATE TEMPORARY FUNCTION tokenize_ja AS
'hivemall.nlp.tokenizer.KuromojiUDF';
CREATE TEMPORARY MACRO max2(x DOUBLE, y DOUBLE) if(x>y,x,y);
CREATE TEMPORARY MACRO min2(x DOUBLE, y DOUBLE) if(x<y,x,y);
CREATE TEMPORARY MACRO rand_gid(k INT) floor(rand()*k);
CREATE TEMPORARY MACRO rand_gid2(k INT, seed INT) floor(rand(seed)*k);
CREATE TEMPORARY MACRO idf(df_t DOUBLE, n_docs DOUBLE) log(10, n_docs /
max2(1,df_t)) + 1.0;
CREATE TEMPORARY MACRO tfidf(tf FLOAT, df_t DOUBLE, n_docs DOUBLE) tf *
(log(10, n_docs / max2(1,df_t)) + 1.0);
SELECT time, COUNT(1) AS cnt FROM tbl1
WHERE TD_TIME_RANGE(time, '2015-12-11', '2015-12-12', 'JST');
Do you still love
Java / SQL ?
PQ written in Ruby
• Building jobs/parameters is so complex!
• using data from many configurations (YAML, JSON), internal APIs and RDBMSs
• with many extension syntaxes/rules to tune performance, override configurations for tests, ...
• Ruby empowers us to write fat/complex worker code
• Testing!
• Unit tests using RSpec
• System tests (executing real queries/jobs) using RSpec
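
A flavor of those unit tests, assuming a hypothetical HiveJob.build that returns the argv shown earlier; specs can pin down queue routing and safety flags without running a real Hadoop job:

require "rspec"

# Hypothetical: HiveJob.build(account:, query:) returns the argv the
# worker would exec for the Hive command shown above.
RSpec.describe "HiveJob.build" do
  let(:argv) { HiveJob.build(account: 221, query: "SELECT 1") }

  it "routes the job to the account's queue" do
    expect(argv).to include("-hiveconf mapreduce.job.queuename=root.q221.high")
  end

  it "disables unsafe join optimizations" do
    expect(argv).to include("-hiveconf hive.auto.convert.join=false")
  end
end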
For further improvements around workers
• More performance for more customers at lower cost
• More scalability for many other kinds of jobs
• Better and well-controlled tests (indented here documents!)
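
"Indented here documents" refers to the squiggly heredoc (<<~) coming in Ruby 2.3, which strips the common leading indentation, so SQL fixtures can be indented along with the surrounding spec code:

# Ruby 2.2 and earlier: heredoc bodies must hug the left margin,
# which makes SQL fixtures inside nested spec blocks ugly.
query = <<-SQL
SELECT time, COUNT(1) AS cnt FROM tbl1
WHERE TD_TIME_RANGE(time, '2015-12-11', '2015-12-12', 'JST')
SQL

# Ruby 2.3's <<~ strips the common leading indentation:
query = <<~SQL
  SELECT time, COUNT(1) AS cnt FROM tbl1
  WHERE TD_TIME_RANGE(time, '2015-12-11', '2015-12-12', 'JST')
SQL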

"Done is better than Perfect."
PerfectQueue
DoneQueue
?
We'll improve our code step by step,
along with improvements in Ruby and its developer community <3
Thanks!

せいまち〜聖地探訪に出会いを求めるのは間違っているだろうか〜せいまち〜聖地探訪に出会いを求めるのは間違っているだろうか〜
せいまち〜聖地探訪に出会いを求めるのは間違っているだろうか〜
Junichi Noda
 
Practical Testing of Ruby Core
Practical Testing of Ruby CorePractical Testing of Ruby Core
Practical Testing of Ruby Core
Hiroshi SHIBATA
 
Experiments in Sharing Java VM Technology with CRuby
Experiments in Sharing Java VM Technology with CRubyExperiments in Sharing Java VM Technology with CRuby
Experiments in Sharing Java VM Technology with CRuby
Matthew Gaudet
 
Redis勉強会資料(2015/06 update)
Redis勉強会資料(2015/06 update)Redis勉強会資料(2015/06 update)
Redis勉強会資料(2015/06 update)
Yuji Otani
 

Viewers also liked (20)

How to create/improve OSS product and its community (revised)
How to create/improve OSS product and its community (revised)How to create/improve OSS product and its community (revised)
How to create/improve OSS product and its community (revised)
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby Usage
 
RubyKaigi2015 making robots-with-mruby
RubyKaigi2015 making robots-with-mrubyRubyKaigi2015 making robots-with-mruby
RubyKaigi2015 making robots-with-mruby
 
The OMR GC talk - Ruby Kaigi 2015
The OMR GC talk - Ruby Kaigi 2015The OMR GC talk - Ruby Kaigi 2015
The OMR GC talk - Ruby Kaigi 2015
 
Ruby meets Go
Ruby meets GoRuby meets Go
Ruby meets Go
 
"fireap" - fast task runner on consul
"fireap" - fast task runner on consul"fireap" - fast task runner on consul
"fireap" - fast task runner on consul
 
grifork - fast propagative task runner -
grifork - fast propagative task runner -grifork - fast propagative task runner -
grifork - fast propagative task runner -
 
Introduction to poloxy - proxy for alerting
Introduction to poloxy - proxy for alertingIntroduction to poloxy - proxy for alerting
Introduction to poloxy - proxy for alerting
 
Running Ruby on Solaris (RubyKaigi 2015, 12/Dec/2015)
Running Ruby on Solaris (RubyKaigi 2015, 12/Dec/2015)Running Ruby on Solaris (RubyKaigi 2015, 12/Dec/2015)
Running Ruby on Solaris (RubyKaigi 2015, 12/Dec/2015)
 
Distributed Logging Architecture in Container Era
Distributed Logging Architecture in Container EraDistributed Logging Architecture in Container Era
Distributed Logging Architecture in Container Era
 
The worst Ruby codes I’ve seen in my life - RubyKaigi 2015
The worst Ruby codes I’ve seen in my life - RubyKaigi 2015The worst Ruby codes I’ve seen in my life - RubyKaigi 2015
The worst Ruby codes I’ve seen in my life - RubyKaigi 2015
 
Introduction to Meteor & React
Introduction to Meteor & ReactIntroduction to Meteor & React
Introduction to Meteor & React
 
つくること = 生きること : パターン・ランゲージによる創造の支援
つくること = 生きること : パターン・ランゲージによる創造の支援 つくること = 生きること : パターン・ランゲージによる創造の支援
つくること = 生きること : パターン・ランゲージによる創造の支援
 
FizzBuzzではじめるテスト
FizzBuzzではじめるテストFizzBuzzではじめるテスト
FizzBuzzではじめるテスト
 
Fluentd v0.14 Plugin API Details
Fluentd v0.14 Plugin API DetailsFluentd v0.14 Plugin API Details
Fluentd v0.14 Plugin API Details
 
Fighting API Compatibility On Fluentd Using "Black Magic"
Fighting API Compatibility On Fluentd Using "Black Magic"Fighting API Compatibility On Fluentd Using "Black Magic"
Fighting API Compatibility On Fluentd Using "Black Magic"
 
せいまち〜聖地探訪に出会いを求めるのは間違っているだろうか〜
せいまち〜聖地探訪に出会いを求めるのは間違っているだろうか〜せいまち〜聖地探訪に出会いを求めるのは間違っているだろうか〜
せいまち〜聖地探訪に出会いを求めるのは間違っているだろうか〜
 
Practical Testing of Ruby Core
Practical Testing of Ruby CorePractical Testing of Ruby Core
Practical Testing of Ruby Core
 
Experiments in Sharing Java VM Technology with CRuby
Experiments in Sharing Java VM Technology with CRubyExperiments in Sharing Java VM Technology with CRuby
Experiments in Sharing Java VM Technology with CRuby
 
Redis勉強会資料(2015/06 update)
Redis勉強会資料(2015/06 update)Redis勉強会資料(2015/06 update)
Redis勉強会資料(2015/06 update)
 

Similar to Data Analytics Service Company and Its Ruby Usage

ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetup
Rafal Kwasny
 
StackMate - CloudFormation for CloudStack
StackMate - CloudFormation for CloudStackStackMate - CloudFormation for CloudStack
StackMate - CloudFormation for CloudStack
Chiradeep Vittal
 
Buildingsocialanalyticstoolwithmongodb
BuildingsocialanalyticstoolwithmongodbBuildingsocialanalyticstoolwithmongodb
Buildingsocialanalyticstoolwithmongodb
MongoDB APAC
 
Hive dirty/beautiful hacks in TD
Hive dirty/beautiful hacks in TDHive dirty/beautiful hacks in TD
Hive dirty/beautiful hacks in TD
SATOSHI TAGOMORI
 
Introduction to real time big data with Apache Spark
Introduction to real time big data with Apache SparkIntroduction to real time big data with Apache Spark
Introduction to real time big data with Apache Spark
Taras Matyashovsky
 
Ingesting hdfs intosolrusingsparktrimmed
Ingesting hdfs intosolrusingsparktrimmedIngesting hdfs intosolrusingsparktrimmed
Ingesting hdfs intosolrusingsparktrimmed
whoschek
 
Running Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on HadoopRunning Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on Hadoop
clairvoyantllc
 
SF Big Analytics meetup : Hoodie From Uber
SF Big Analytics meetup : Hoodie  From UberSF Big Analytics meetup : Hoodie  From Uber
SF Big Analytics meetup : Hoodie From Uber
Chester Chen
 
20151010 my sq-landjavav2a
20151010 my sq-landjavav2a20151010 my sq-landjavav2a
20151010 my sq-landjavav2a
Ivan Ma
 
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
Chester Chen
 
SnappyData overview NikeTechTalk 11/19/15
SnappyData overview NikeTechTalk 11/19/15SnappyData overview NikeTechTalk 11/19/15
SnappyData overview NikeTechTalk 11/19/15
SnappyData
 
Nike tech talk.2
Nike tech talk.2Nike tech talk.2
Nike tech talk.2
Jags Ramnarayan
 
Berlin Buzz Words - Apache Drill by Ted Dunning & Michael Hausenblas
Berlin Buzz Words - Apache Drill by Ted Dunning & Michael HausenblasBerlin Buzz Words - Apache Drill by Ted Dunning & Michael Hausenblas
Berlin Buzz Words - Apache Drill by Ted Dunning & Michael Hausenblas
MapR Technologies
 
Creating PostgreSQL-as-a-Service at Scale
Creating PostgreSQL-as-a-Service at ScaleCreating PostgreSQL-as-a-Service at Scale
Creating PostgreSQL-as-a-Service at Scale
Sean Chittenden
 
20160307 apex connects_jira
20160307 apex connects_jira20160307 apex connects_jira
20160307 apex connects_jira
MT AG
 
Simplifying Apache Cascading
Simplifying Apache CascadingSimplifying Apache Cascading
Simplifying Apache Cascading
Ming Yuan
 
Rhebok, High Performance Rack Handler / Rubykaigi 2015
Rhebok, High Performance Rack Handler / Rubykaigi 2015Rhebok, High Performance Rack Handler / Rubykaigi 2015
Rhebok, High Performance Rack Handler / Rubykaigi 2015
Masahiro Nagano
 
비동기 회고 발표자료
비동기 회고 발표자료비동기 회고 발표자료
비동기 회고 발표자료
Benjamin Kim
 
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
 Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov... Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
Databricks
 
Building Scalable Websites with Perl
Building Scalable Websites with PerlBuilding Scalable Websites with Perl
Building Scalable Websites with Perl
Perrin Harkins
 

Similar to Data Analytics Service Company and Its Ruby Usage (20)

ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetup
 
StackMate - CloudFormation for CloudStack
StackMate - CloudFormation for CloudStackStackMate - CloudFormation for CloudStack
StackMate - CloudFormation for CloudStack
 
Buildingsocialanalyticstoolwithmongodb
BuildingsocialanalyticstoolwithmongodbBuildingsocialanalyticstoolwithmongodb
Buildingsocialanalyticstoolwithmongodb
 
Hive dirty/beautiful hacks in TD
Hive dirty/beautiful hacks in TDHive dirty/beautiful hacks in TD
Hive dirty/beautiful hacks in TD
 
Introduction to real time big data with Apache Spark
Introduction to real time big data with Apache SparkIntroduction to real time big data with Apache Spark
Introduction to real time big data with Apache Spark
 
Ingesting hdfs intosolrusingsparktrimmed
Ingesting hdfs intosolrusingsparktrimmedIngesting hdfs intosolrusingsparktrimmed
Ingesting hdfs intosolrusingsparktrimmed
 
Running Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on HadoopRunning Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on Hadoop
 
SF Big Analytics meetup : Hoodie From Uber
SF Big Analytics meetup : Hoodie  From UberSF Big Analytics meetup : Hoodie  From Uber
SF Big Analytics meetup : Hoodie From Uber
 
20151010 my sq-landjavav2a
20151010 my sq-landjavav2a20151010 my sq-landjavav2a
20151010 my sq-landjavav2a
 
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
 
SnappyData overview NikeTechTalk 11/19/15
SnappyData overview NikeTechTalk 11/19/15SnappyData overview NikeTechTalk 11/19/15
SnappyData overview NikeTechTalk 11/19/15
 
Nike tech talk.2
Nike tech talk.2Nike tech talk.2
Nike tech talk.2
 
Berlin Buzz Words - Apache Drill by Ted Dunning & Michael Hausenblas
Berlin Buzz Words - Apache Drill by Ted Dunning & Michael HausenblasBerlin Buzz Words - Apache Drill by Ted Dunning & Michael Hausenblas
Berlin Buzz Words - Apache Drill by Ted Dunning & Michael Hausenblas
 
Creating PostgreSQL-as-a-Service at Scale
Creating PostgreSQL-as-a-Service at ScaleCreating PostgreSQL-as-a-Service at Scale
Creating PostgreSQL-as-a-Service at Scale
 
20160307 apex connects_jira
20160307 apex connects_jira20160307 apex connects_jira
20160307 apex connects_jira
 
Simplifying Apache Cascading
Simplifying Apache CascadingSimplifying Apache Cascading
Simplifying Apache Cascading
 
Rhebok, High Performance Rack Handler / Rubykaigi 2015
Rhebok, High Performance Rack Handler / Rubykaigi 2015Rhebok, High Performance Rack Handler / Rubykaigi 2015
Rhebok, High Performance Rack Handler / Rubykaigi 2015
 
비동기 회고 발표자료
비동기 회고 발표자료비동기 회고 발표자료
비동기 회고 발표자료
 
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
 Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov... Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
 
Building Scalable Websites with Perl
Building Scalable Websites with PerlBuilding Scalable Websites with Perl
Building Scalable Websites with Perl
 

More from SATOSHI TAGOMORI

Ractor's speed is not light-speed
Ractor's speed is not light-speedRactor's speed is not light-speed
Ractor's speed is not light-speed
SATOSHI TAGOMORI
 
Good Things and Hard Things of SaaS Development/Operations
Good Things and Hard Things of SaaS Development/OperationsGood Things and Hard Things of SaaS Development/Operations
Good Things and Hard Things of SaaS Development/Operations
SATOSHI TAGOMORI
 
Maccro Strikes Back
Maccro Strikes BackMaccro Strikes Back
Maccro Strikes Back
SATOSHI TAGOMORI
 
Invitation to the dark side of Ruby
Invitation to the dark side of RubyInvitation to the dark side of Ruby
Invitation to the dark side of Ruby
SATOSHI TAGOMORI
 
Hijacking Ruby Syntax in Ruby (RubyConf 2018)
Hijacking Ruby Syntax in Ruby (RubyConf 2018)Hijacking Ruby Syntax in Ruby (RubyConf 2018)
Hijacking Ruby Syntax in Ruby (RubyConf 2018)
SATOSHI TAGOMORI
 
Make Your Ruby Script Confusing
Make Your Ruby Script ConfusingMake Your Ruby Script Confusing
Make Your Ruby Script Confusing
SATOSHI TAGOMORI
 
Hijacking Ruby Syntax in Ruby
Hijacking Ruby Syntax in RubyHijacking Ruby Syntax in Ruby
Hijacking Ruby Syntax in Ruby
SATOSHI TAGOMORI
 
Lock, Concurrency and Throughput of Exclusive Operations
Lock, Concurrency and Throughput of Exclusive OperationsLock, Concurrency and Throughput of Exclusive Operations
Lock, Concurrency and Throughput of Exclusive Operations
SATOSHI TAGOMORI
 
Data Processing and Ruby in the World
Data Processing and Ruby in the WorldData Processing and Ruby in the World
Data Processing and Ruby in the World
SATOSHI TAGOMORI
 
Planet-scale Data Ingestion Pipeline: Bigdam
Planet-scale Data Ingestion Pipeline: BigdamPlanet-scale Data Ingestion Pipeline: Bigdam
Planet-scale Data Ingestion Pipeline: Bigdam
SATOSHI TAGOMORI
 
Technologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise BusinessTechnologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise Business
SATOSHI TAGOMORI
 
Ruby and Distributed Storage Systems
Ruby and Distributed Storage SystemsRuby and Distributed Storage Systems
Ruby and Distributed Storage Systems
SATOSHI TAGOMORI
 
Perfect Norikra 2nd Season
Perfect Norikra 2nd SeasonPerfect Norikra 2nd Season
Perfect Norikra 2nd Season
SATOSHI TAGOMORI
 
Fluentd 101
Fluentd 101Fluentd 101
Fluentd 101
SATOSHI TAGOMORI
 
To Have Own Data Analytics Platform, Or NOT To
To Have Own Data Analytics Platform, Or NOT ToTo Have Own Data Analytics Platform, Or NOT To
To Have Own Data Analytics Platform, Or NOT To
SATOSHI TAGOMORI
 
The Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersThe Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and Containers
SATOSHI TAGOMORI
 
Modern Black Mages Fighting in the Real World
Modern Black Mages Fighting in the Real WorldModern Black Mages Fighting in the Real World
Modern Black Mages Fighting in the Real World
SATOSHI TAGOMORI
 
Tale of ISUCON and Its Bench Tools
Tale of ISUCON and Its Bench ToolsTale of ISUCON and Its Bench Tools
Tale of ISUCON and Its Bench Tools
SATOSHI TAGOMORI
 
Data-Driven Development Era and Its Technologies
Data-Driven Development Era and Its TechnologiesData-Driven Development Era and Its Technologies
Data-Driven Development Era and Its Technologies
SATOSHI TAGOMORI
 
Engineer as a Leading Role
Engineer as a Leading RoleEngineer as a Leading Role
Engineer as a Leading Role
SATOSHI TAGOMORI
 

More from SATOSHI TAGOMORI (20)

Ractor's speed is not light-speed
Ractor's speed is not light-speedRactor's speed is not light-speed
Ractor's speed is not light-speed
 
Good Things and Hard Things of SaaS Development/Operations
Good Things and Hard Things of SaaS Development/OperationsGood Things and Hard Things of SaaS Development/Operations
Good Things and Hard Things of SaaS Development/Operations
 
Maccro Strikes Back
Maccro Strikes BackMaccro Strikes Back
Maccro Strikes Back
 
Invitation to the dark side of Ruby
Invitation to the dark side of RubyInvitation to the dark side of Ruby
Invitation to the dark side of Ruby
 
Hijacking Ruby Syntax in Ruby (RubyConf 2018)
Hijacking Ruby Syntax in Ruby (RubyConf 2018)Hijacking Ruby Syntax in Ruby (RubyConf 2018)
Hijacking Ruby Syntax in Ruby (RubyConf 2018)
 
Make Your Ruby Script Confusing
Make Your Ruby Script ConfusingMake Your Ruby Script Confusing
Make Your Ruby Script Confusing
 
Hijacking Ruby Syntax in Ruby
Hijacking Ruby Syntax in RubyHijacking Ruby Syntax in Ruby
Hijacking Ruby Syntax in Ruby
 
Lock, Concurrency and Throughput of Exclusive Operations
Lock, Concurrency and Throughput of Exclusive OperationsLock, Concurrency and Throughput of Exclusive Operations
Lock, Concurrency and Throughput of Exclusive Operations
 
Data Processing and Ruby in the World
Data Processing and Ruby in the WorldData Processing and Ruby in the World
Data Processing and Ruby in the World
 
Planet-scale Data Ingestion Pipeline: Bigdam
Planet-scale Data Ingestion Pipeline: BigdamPlanet-scale Data Ingestion Pipeline: Bigdam
Planet-scale Data Ingestion Pipeline: Bigdam
 
Technologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise BusinessTechnologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise Business
 
Ruby and Distributed Storage Systems
Ruby and Distributed Storage SystemsRuby and Distributed Storage Systems
Ruby and Distributed Storage Systems
 
Perfect Norikra 2nd Season
Perfect Norikra 2nd SeasonPerfect Norikra 2nd Season
Perfect Norikra 2nd Season
 
Fluentd 101
Fluentd 101Fluentd 101
Fluentd 101
 
To Have Own Data Analytics Platform, Or NOT To
To Have Own Data Analytics Platform, Or NOT ToTo Have Own Data Analytics Platform, Or NOT To
To Have Own Data Analytics Platform, Or NOT To
 
The Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersThe Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and Containers
 
Modern Black Mages Fighting in the Real World
Modern Black Mages Fighting in the Real WorldModern Black Mages Fighting in the Real World
Modern Black Mages Fighting in the Real World
 
Tale of ISUCON and Its Bench Tools
Tale of ISUCON and Its Bench ToolsTale of ISUCON and Its Bench Tools
Tale of ISUCON and Its Bench Tools
 
Data-Driven Development Era and Its Technologies
Data-Driven Development Era and Its TechnologiesData-Driven Development Era and Its Technologies
Data-Driven Development Era and Its Technologies
 
Engineer as a Leading Role
Engineer as a Leading RoleEngineer as a Leading Role
Engineer as a Leading Role
 

Recently uploaded

Safe Work Permit Management Software for Hot Work Permits
Safe Work Permit Management Software for Hot Work PermitsSafe Work Permit Management Software for Hot Work Permits
Safe Work Permit Management Software for Hot Work Permits
sheqnetworkmarketing
 
NYC 26-Jun-2024 Combined Presentations.pdf
NYC 26-Jun-2024 Combined Presentations.pdfNYC 26-Jun-2024 Combined Presentations.pdf
NYC 26-Jun-2024 Combined Presentations.pdf
AUGNYC
 
ENISA Threat Landscape 2023 documentation
ENISA Threat Landscape 2023 documentationENISA Threat Landscape 2023 documentation
ENISA Threat Landscape 2023 documentation
sofiafernandezon
 
Migrate your Infrastructure to the AWS Cloud
Migrate your Infrastructure to the AWS CloudMigrate your Infrastructure to the AWS Cloud
Migrate your Infrastructure to the AWS Cloud
Ortus Solutions, Corp
 
How we built TryBoxLang in under 48 hours
How we built TryBoxLang in under 48 hoursHow we built TryBoxLang in under 48 hours
How we built TryBoxLang in under 48 hours
Ortus Solutions, Corp
 
Cultural Shifts: Embracing DevOps for Organizational Transformation
Cultural Shifts: Embracing DevOps for Organizational TransformationCultural Shifts: Embracing DevOps for Organizational Transformation
Cultural Shifts: Embracing DevOps for Organizational Transformation
Mindfire Solution
 
React vs Next js: Which is Better for Web Development? - Semiosis Software Pr...
React vs Next js: Which is Better for Web Development? - Semiosis Software Pr...React vs Next js: Which is Better for Web Development? - Semiosis Software Pr...
React vs Next js: Which is Better for Web Development? - Semiosis Software Pr...
Semiosis Software Private Limited
 
Independence Day Hasn’t Always Been a U.S. Holiday.pdf
Independence Day Hasn’t Always Been a U.S. Holiday.pdfIndependence Day Hasn’t Always Been a U.S. Holiday.pdf
Independence Day Hasn’t Always Been a U.S. Holiday.pdf
Livetecs LLC
 
Software development... for all? (keynote at ICSOFT'2024)
Software development... for all? (keynote at ICSOFT'2024)Software development... for all? (keynote at ICSOFT'2024)
Software development... for all? (keynote at ICSOFT'2024)
miso_uam
 
CViewSurvey Digitech Pvt Ltd that works on a proven C.A.A.G. model.
CViewSurvey Digitech Pvt Ltd that  works on a proven C.A.A.G. model.CViewSurvey Digitech Pvt Ltd that  works on a proven C.A.A.G. model.
CViewSurvey Digitech Pvt Ltd that works on a proven C.A.A.G. model.
bhatinidhi2001
 
Google ML-Kit - Understanding on-device machine learning
Google ML-Kit - Understanding on-device machine learningGoogle ML-Kit - Understanding on-device machine learning
Google ML-Kit - Understanding on-device machine learning
VishrutGoyani1
 
Discover the Power of ONEMONITAR: The Ultimate Mobile Spy App for Android Dev...
Discover the Power of ONEMONITAR: The Ultimate Mobile Spy App for Android Dev...Discover the Power of ONEMONITAR: The Ultimate Mobile Spy App for Android Dev...
Discover the Power of ONEMONITAR: The Ultimate Mobile Spy App for Android Dev...
onemonitarsoftware
 
WEBINAR SLIDES: CCX for Cloud Service Providers
WEBINAR SLIDES: CCX for Cloud Service ProvidersWEBINAR SLIDES: CCX for Cloud Service Providers
WEBINAR SLIDES: CCX for Cloud Service Providers
Severalnines
 
Prada Group Reports Strong Growth in First Quarter …
Prada Group Reports Strong Growth in First Quarter …Prada Group Reports Strong Growth in First Quarter …
Prada Group Reports Strong Growth in First Quarter …
908dutch
 
AWS Cloud Practitioner Essentials (Second Edition) (Arabic) Course Introducti...
AWS Cloud Practitioner Essentials (Second Edition) (Arabic) Course Introducti...AWS Cloud Practitioner Essentials (Second Edition) (Arabic) Course Introducti...
AWS Cloud Practitioner Essentials (Second Edition) (Arabic) Course Introducti...
karim wahed
 
AWS Cloud Practitioner Essentials (Second Edition) (Arabic) AWS Security .pdf
AWS Cloud Practitioner Essentials (Second Edition) (Arabic) AWS Security .pdfAWS Cloud Practitioner Essentials (Second Edition) (Arabic) AWS Security .pdf
AWS Cloud Practitioner Essentials (Second Edition) (Arabic) AWS Security .pdf
karim wahed
 
dachnug51 - HCL Sametime 12 as a Software Appliance.pdf
dachnug51 - HCL Sametime 12 as a Software Appliance.pdfdachnug51 - HCL Sametime 12 as a Software Appliance.pdf
dachnug51 - HCL Sametime 12 as a Software Appliance.pdf
DNUG e.V.
 
active-directory-auditing-solution (2).pptx
active-directory-auditing-solution (2).pptxactive-directory-auditing-solution (2).pptx
active-directory-auditing-solution (2).pptx
sudsdeep
 
Seamless PostgreSQL to Snowflake Data Transfer in 8 Simple Steps
Seamless PostgreSQL to Snowflake Data Transfer in 8 Simple StepsSeamless PostgreSQL to Snowflake Data Transfer in 8 Simple Steps
Seamless PostgreSQL to Snowflake Data Transfer in 8 Simple Steps
Estuary Flow
 
Addressing the Top 9 User Pain Points with Visual Design Elements.pptx
Addressing the Top 9 User Pain Points with Visual Design Elements.pptxAddressing the Top 9 User Pain Points with Visual Design Elements.pptx
Addressing the Top 9 User Pain Points with Visual Design Elements.pptx
Sparity1
 

Recently uploaded (20)

Safe Work Permit Management Software for Hot Work Permits
Safe Work Permit Management Software for Hot Work PermitsSafe Work Permit Management Software for Hot Work Permits
Safe Work Permit Management Software for Hot Work Permits
 
NYC 26-Jun-2024 Combined Presentations.pdf
NYC 26-Jun-2024 Combined Presentations.pdfNYC 26-Jun-2024 Combined Presentations.pdf
NYC 26-Jun-2024 Combined Presentations.pdf
 
ENISA Threat Landscape 2023 documentation
ENISA Threat Landscape 2023 documentationENISA Threat Landscape 2023 documentation
ENISA Threat Landscape 2023 documentation
 
Migrate your Infrastructure to the AWS Cloud
Migrate your Infrastructure to the AWS CloudMigrate your Infrastructure to the AWS Cloud
Migrate your Infrastructure to the AWS Cloud
 
How we built TryBoxLang in under 48 hours
How we built TryBoxLang in under 48 hoursHow we built TryBoxLang in under 48 hours
How we built TryBoxLang in under 48 hours
 
Cultural Shifts: Embracing DevOps for Organizational Transformation
Cultural Shifts: Embracing DevOps for Organizational TransformationCultural Shifts: Embracing DevOps for Organizational Transformation
Cultural Shifts: Embracing DevOps for Organizational Transformation
 
React vs Next js: Which is Better for Web Development? - Semiosis Software Pr...
React vs Next js: Which is Better for Web Development? - Semiosis Software Pr...React vs Next js: Which is Better for Web Development? - Semiosis Software Pr...
React vs Next js: Which is Better for Web Development? - Semiosis Software Pr...
 
Independence Day Hasn’t Always Been a U.S. Holiday.pdf
Independence Day Hasn’t Always Been a U.S. Holiday.pdfIndependence Day Hasn’t Always Been a U.S. Holiday.pdf
Independence Day Hasn’t Always Been a U.S. Holiday.pdf
 
Software development... for all? (keynote at ICSOFT'2024)
Software development... for all? (keynote at ICSOFT'2024)Software development... for all? (keynote at ICSOFT'2024)
Software development... for all? (keynote at ICSOFT'2024)
 
CViewSurvey Digitech Pvt Ltd that works on a proven C.A.A.G. model.
CViewSurvey Digitech Pvt Ltd that  works on a proven C.A.A.G. model.CViewSurvey Digitech Pvt Ltd that  works on a proven C.A.A.G. model.
CViewSurvey Digitech Pvt Ltd that works on a proven C.A.A.G. model.
 
Google ML-Kit - Understanding on-device machine learning
Google ML-Kit - Understanding on-device machine learningGoogle ML-Kit - Understanding on-device machine learning
Google ML-Kit - Understanding on-device machine learning
 
Discover the Power of ONEMONITAR: The Ultimate Mobile Spy App for Android Dev...
Discover the Power of ONEMONITAR: The Ultimate Mobile Spy App for Android Dev...Discover the Power of ONEMONITAR: The Ultimate Mobile Spy App for Android Dev...
Discover the Power of ONEMONITAR: The Ultimate Mobile Spy App for Android Dev...
 
WEBINAR SLIDES: CCX for Cloud Service Providers
WEBINAR SLIDES: CCX for Cloud Service ProvidersWEBINAR SLIDES: CCX for Cloud Service Providers
WEBINAR SLIDES: CCX for Cloud Service Providers
 
Prada Group Reports Strong Growth in First Quarter …
Prada Group Reports Strong Growth in First Quarter …Prada Group Reports Strong Growth in First Quarter …
Prada Group Reports Strong Growth in First Quarter …
 
AWS Cloud Practitioner Essentials (Second Edition) (Arabic) Course Introducti...
AWS Cloud Practitioner Essentials (Second Edition) (Arabic) Course Introducti...AWS Cloud Practitioner Essentials (Second Edition) (Arabic) Course Introducti...
AWS Cloud Practitioner Essentials (Second Edition) (Arabic) Course Introducti...
 
AWS Cloud Practitioner Essentials (Second Edition) (Arabic) AWS Security .pdf
AWS Cloud Practitioner Essentials (Second Edition) (Arabic) AWS Security .pdfAWS Cloud Practitioner Essentials (Second Edition) (Arabic) AWS Security .pdf
AWS Cloud Practitioner Essentials (Second Edition) (Arabic) AWS Security .pdf
 
dachnug51 - HCL Sametime 12 as a Software Appliance.pdf
dachnug51 - HCL Sametime 12 as a Software Appliance.pdfdachnug51 - HCL Sametime 12 as a Software Appliance.pdf
dachnug51 - HCL Sametime 12 as a Software Appliance.pdf
 
active-directory-auditing-solution (2).pptx
active-directory-auditing-solution (2).pptxactive-directory-auditing-solution (2).pptx
active-directory-auditing-solution (2).pptx
 
Seamless PostgreSQL to Snowflake Data Transfer in 8 Simple Steps
Seamless PostgreSQL to Snowflake Data Transfer in 8 Simple StepsSeamless PostgreSQL to Snowflake Data Transfer in 8 Simple Steps
Seamless PostgreSQL to Snowflake Data Transfer in 8 Simple Steps
 
Addressing the Top 9 User Pain Points with Visual Design Elements.pptx
Addressing the Top 9 User Pain Points with Visual Design Elements.pptxAddressing the Top 9 User Pain Points with Visual Design Elements.pptx
Addressing the Top 9 User Pain Points with Visual Design Elements.pptx
 

Data Analytics Service Company and Its Ruby Usage

  • 1. Data Analytics Service Company and Its Ruby Usage RubyKaigi 2015 (Dec 12, 2015) Satoshi Tagomori (@tagomoris)
  • 2. Satoshi "Moris" Tagomori (@tagomoris) Fluentd, MessagePack-Ruby, Norikra, ... Treasure Data, Inc.
  • 5. http://www.fluentd.org/ Fluentd Unified Logging Layer For Stream Data Written in CRuby http://www.slideshare.net/treasure-data/the-basics-of-fluentd-35681111
  • 6. Bulk Data Loader High Throughput&Reliability Embulk Written in Java/JRuby http://www.slideshare.net/frsyuki/embuk-making-data-integration-works-relaxed http://www.embulk.org/
  • 7. Data Analytics Platform Data Analytics Service
  • 13. Data Analytics Flow (diagram): Data source → Collect → Store → Process → Visualize → Reporting / Monitoring
  • 14. Data Analytics Flow (diagram): Data source → Collect → Store → Process → Visualize → Reporting / Monitoring
  • 15. Data Analytics Platform • Data collection, storage • Console & API endpoints • Schema management • Processing (batch, query, ...) • Queuing & Scheduling • Data connector/exporter
  • 17. Data Analytics Platform • Data collection, storage: Ruby(OSS), Java/JRuby(OSS) • Console & API endpoints: Ruby(RoR) • Schema management: Ruby/Java (MessagePack) • Processing (batch, query, ...): Java(Hadoop,Presto) • Queuing & Scheduling: Ruby(OSS) • Data connector/exporter: Java, Java/JRuby(OSS)
  • 18. Treasure Data Architecture: Overview (diagram) http://www.slideshare.net/tagomoris/data-analytics-service-company-and-its-ruby-usage Components: Console, API, EventCollector, PlazmaDB, Worker, Scheduler, DataConnector, Hadoop Cluster, Presto Cluster. Actors: USERS (TD SDKs), SERVERS, CUSTOMER's SYSTEMS.
  • 22. Queue/Worker, Scheduler • Treasure Data: multi-tenant data analytics service • executes many jobs in shared clusters (queries, imports, ...) • CORE: queues-workers & schedulers • Clusters have their own queues/schedulers... but it's not enough: • resource limitations for each price plan • priority queues for job types • and many others (see the sketch below)
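The point of "it's not enough" is that cluster-level queues know nothing about price plans or job types. A minimal, hypothetical sketch of the extra policy a multi-tenant service has to apply on top (plan names, limits and fields are illustrative only, not Treasure Data's real rules):

    # Per-plan concurrency limits and per-type priorities, applied when
    # picking the next job to run. All names here are made up.
    PLAN_LIMITS = { "free" => 1, "premium" => 8 }   # max concurrent jobs per account

    def eligible?(job, running_counts)
      running_counts.fetch(job[:account_id], 0) < PLAN_LIMITS.fetch(job[:plan], 1)
    end

    def next_job(pending, running_counts)
      pending.select { |job| eligible?(job, running_counts) }
             .min_by { |job| [-job[:priority], job[:enqueued_at]] }  # high priority first, then FIFO
    end

    pending = [
      { account_id: 221, plan: "premium", priority: 2, enqueued_at: 100 },
      { account_id: 9,   plan: "free",    priority: 9, enqueued_at: 50  },
    ]
    p next_job(pending, { 9 => 1 })  # account 9 is at its plan limit, so 221's job wins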
  • 24. PerfectSched • Provides periodic/scheduled queries for customers • it's like a reliable "cron" • Highly available distributed scheduler using RDBMS • Written in CRuby • At-least-once semantics • PerfectSched enqueues jobs into PerfectQueue
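A minimal sketch of how a "reliable cron" on an RDBMS can work (hypothetical schema, and a fixed hourly interval instead of real cron-expression parsing; this shows the general pattern, not PerfectSched's actual code). The guarded UPDATE ensures that, of several competing scheduler processes, exactly one claims and enqueues each firing, and the surrounding transaction gives at-least-once delivery:

    require "mysql2"

    def enqueue_job(db, schedule_id)
      # hand the firing off to the job queue (a PerfectQueue-style table)
      db.query("INSERT INTO jobs (schedule_id, created_at) VALUES (#{schedule_id.to_i}, NOW())")
    end

    def fire_due_schedules(db, now = Time.now.to_i)
      db.query("SELECT id FROM schedules WHERE next_run <= #{now}").each do |row|
        db.query("BEGIN")
        # guarded UPDATE: at most one competing scheduler matches this row
        db.query("UPDATE schedules SET next_run = next_run + 3600 " \
                 "WHERE id = #{row["id"]} AND next_run <= #{now}")
        if db.affected_rows == 1
          enqueue_job(db, row["id"])  # claim + enqueue commit (or roll back) together
          db.query("COMMIT")
        else
          db.query("ROLLBACK")        # another scheduler process claimed it first
        end
      end
    end

    db = Mysql2::Client.new(host: "localhost", username: "sched", database: "td") # illustrative
    fire_due_schedules(db)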
  • 25. Jobs in TD
                            | LOST | Duplicated | Retried for Errors | Throughput | Execution time
        DATA import/export  | NG   | NG         | OK or NG           | HIGH       | SHORT (secs-mins)
        QUERY               | NG   | OK         | OK                 | LOW        | SHORT (secs) or LONG (mins-hours)
  • 27. PerfectQueue overview (diagram): MySQL with 1 table for 1 queue; workers for queues; each worker node runs a hypervisor process that manages its worker processes
  • 28. Features • Priorities for query types • Resource limits per account • Graceful restarts • queries may run for a long time (up to 1 day) • new worker code must be loaded while jobs keep running on the older code (sketched below)
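A minimal sketch of one common way to get such graceful restarts (an assumption on my part; the slide does not say PerfectQueue works exactly like this): replacement workers are forked and exec the current code on disk, while old workers finish the job they are running before exiting:

    def spawn_worker
      fork { exec("ruby", "worker.rb") }  # exec reloads whatever code is now deployed
    end

    children = 4.times.map { spawn_worker }

    Signal.trap("USR2") do                          # deploy hook sends USR2
      old = children
      children = old.map { spawn_worker }           # start replacements on new code
      old.each { |pid| Process.kill("TERM", pid) }  # workers trap TERM, finish the
    end                                             # current job, then exit

    Process.waitall  # supervise until every child has exited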
  • 29. PerfectQueue • Highly available distributed queue using RDBMS • Enqueue by INSERT INTO • Dequeue/Commit by UPDATE • Using transactions • Flexible scheduling rather than scalability • Workers do many things • PlazmaDB operations (including importing data) • Building job parameters • Handling results of jobs + kicking other jobs • Using Amazon RDS (MySQL) internally (+ Workers on EC2)
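A minimal sketch of that INSERT/UPDATE protocol (hypothetical table layout; PerfectQueue's real schema and locking details differ). The guarded UPDATE inside a transaction is what lets many workers share one MySQL table without claiming the same job twice:

    require "mysql2"

    db = Mysql2::Client.new(host: "localhost", username: "worker", database: "td")

    def enqueue(db, payload)
      db.query("INSERT INTO queue (payload, owner, status) " \
               "VALUES ('#{db.escape(payload)}', NULL, 'waiting')")
    end

    def dequeue(db, worker_id)
      db.query("BEGIN")
      # guarded UPDATE: concurrent workers cannot claim the same row
      db.query("UPDATE queue SET owner = '#{db.escape(worker_id)}', status = 'running' " \
               "WHERE status = 'waiting' ORDER BY id LIMIT 1")
      job = db.query("SELECT * FROM queue WHERE owner = '#{db.escape(worker_id)}' " \
                     "AND status = 'running'").first
      db.query("COMMIT")
      job
    end

    def commit_job(db, job_id)
      # "commit" of a finished job is also just an UPDATE
      db.query("UPDATE queue SET status = 'done' WHERE id = #{job_id.to_i}")
    end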
  • 30. Building Jobs/Parameters • Parameters • for job types, accounts, price plans and clusters • to control performance/parallelism, permissions and data types • ex: Java properties • Jobs • to prepare for customers' queries • to make queries safer/faster • ex: Hive Queries (HiveQL, a dialect of SQL)
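To make "building parameters" concrete before the real example on the next slide: a hypothetical sketch of merging per-account and per-plan settings into -hiveconf flags (the keys and values below are copied from slide 31; the helper itself is invented for illustration):

    def hive_confs(account, plan, job)
      {
        "mapreduce.job.priority"  => plan[:priority],
        "mapreduce.job.queuename" => "root.q#{account[:id]}.#{plan[:queue]}",
        "mapreduce.job.name"      => "HiveJob#{job[:id]}",
        "td.query.apikey"         => account[:apikey],
        "td.outdir"               => "./jobs/#{job[:id]}",
      }.map { |key, value| "-hiveconf #{key}=#{value}" }.join(" ")
    end

    puts hive_confs({ id: 221, apikey: 12345 }, { priority: "HIGH", queue: "high" }, { id: 379515 })
    # => "-hiveconf mapreduce.job.priority=HIGH -hiveconf mapreduce.job.queuename=root.q221.high ..."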
  • 31. Example: Hive job env HADOOP_CLASSPATH=test.jar:td-hadoop-1.0.jar HADOOP_OPTS="-Xmx738m -Duser.name=221" hive --service jar td-hadoop-1.0.jar com.treasure_data.hadoop.hive.runner.QueryRunner -hiveconf td.jar.version= -hiveconf plazma.metadb.config={} -hiveconf plazma.storage.config={} -hiveconf td.worker.database.config={} -hiveconf mapreduce.job.priority=HIGH -hiveconf mapreduce.job.queuename=root.q221.high -hiveconf mapreduce.job.name=HiveJob379515 -hiveconf td.query.mergeThreshold=1333382400 -hiveconf td.query.apikey=12345 -hiveconf td.scheduled.time=1342449253 -hiveconf td.outdir=./jobs/379515 -hiveconf hive.metastore.warehouse.dir=/user/hive/221/warehouse -hiveconf hive.auto.convert.join.noconditionaltask=false -hiveconf hive.mapjoin.localtask.max.memory.usage=0.7 -hiveconf hive.mapjoin.smalltable.filesize=25000000 -hiveconf hive.resultset.use.unique.column.names=false -hiveconf hive.auto.convert.join=false -hiveconf hive.optimize.sort.dynamic.partition=false -hiveconf mapreduce.job.reduces=-1 -hiveconf hive.vectorized.execution.enabled=false -hiveconf mapreduce.job.ubertask.enable=true -hiveconf yarn.app.mapreduce.am.resource.mb=2048
  • 32. env HADOOP_CLASSPATH=test.jar:td-hadoop-1.0.jar HADOOP_OPTS="-Xmx738m -Duser.name=221" hive --service jar td-hadoop-1.0.jar com.treasure_data.hadoop.hive.runner.QueryRunner -hiveconf td.jar.version= -hiveconf plazma.metadb.config={} -hiveconf plazma.storage.config={} -hiveconf td.worker.database.config={} -hiveconf mapreduce.job.priority=HIGH -hiveconf mapreduce.job.queuename=root.q221.high -hiveconf mapreduce.job.name=HiveJob379515 -hiveconf td.query.mergeThreshold=1333382400 -hiveconf td.query.apikey=12345 -hiveconf td.scheduled.time=1342449253 -hiveconf td.outdir=./jobs/379515 -hiveconf hive.metastore.warehouse.dir=/user/hive/221/warehouse -hiveconf hive.auto.convert.join.noconditionaltask=false -hiveconf hive.mapjoin.localtask.max.memory.usage=0.7 -hiveconf hive.mapjoin.smalltable.filesize=25000000 -hiveconf hive.resultset.use.unique.column.names=false -hiveconf hive.auto.convert.join=false -hiveconf hive.optimize.sort.dynamic.partition=false -hiveconf mapreduce.job.reduces=-1 -hiveconf hive.vectorized.execution.enabled=false -hiveconf mapreduce.job.ubertask.enable=true -hiveconf yarn.app.mapreduce.am.resource.mb=2048 -hiveconf mapreduce.job.ubertask.maxmaps=1 -hiveconf mapreduce.job.ubertask.maxreduces=1 -hiveconf mapreduce.job.ubertask.maxbytes=536870912 -hiveconf td.hive.insertInto.dynamic.partitioning=false -outdir ./jobs/379515
  • 33. Example: Hive job (cont) ADD JAR 'td-hadoop-1.0.jar'; CREATE DATABASE IF NOT EXISTS `db`; USE `db`; CREATE TABLE tagomoris (`v` MAP<STRING,STRING>, `time` INT) STORED BY 'com.treasure_data.hadoop.hive.mapred.TDStorageHandler' WITH SERDEPROPERTIES ('msgpack.columns.mapping'='*,time') TBLPROPERTIES ( 'td.storage.user'='221', 'td.storage.database'='dfc', 'td.storage.table'='users_20100604_080812_ce9203d0', 'td.storage.path'='221/dfc/users_20100604_080812_ce9203d0', 'td.table_id'='2', 'td.modifiable'='true', 'plazma.data_set.name'='221/dfc/users_20100604_080812_ce9203d0' ); CREATE TABLE tbl1 ( `uid` INT, `key` STRING, `time` INT ) STORED BY 'com.treasure_data.hadoop.hive.mapred.TDStorageHandler' WITH SERDEPROPERTIES ('msgpack.columns.mapping'='uid,key,time') TBLPROPERTIES ( 'td.storage.user'='221', 'td.storage.database'='dfc',
  • 34. ADD JAR 'td-hadoop-1.0.jar'; CREATE DATABASE IF NOT EXISTS `db`; USE `db`; CREATE TABLE tagomoris (`v` MAP<STRING,STRING>, `time` INT) STORED BY 'com.treasure_data.hadoop.hive.mapred.TDStorageHandler' WITH SERDEPROPERTIES ('msgpack.columns.mapping'='*,time') TBLPROPERTIES ( 'td.storage.user'='221', 'td.storage.database'='dfc', 'td.storage.table'='users_20100604_080812_ce9203d0', 'td.storage.path'='221/dfc/users_20100604_080812_ce9203d0', 'td.table_id'='2', 'td.modifiable'='true', 'plazma.data_set.name'='221/dfc/users_20100604_080812_ce9203d0' ); CREATE TABLE tbl1 ( `uid` INT, `key` STRING, `time` INT ) STORED BY 'com.treasure_data.hadoop.hive.mapred.TDStorageHandler' WITH SERDEPROPERTIES ('msgpack.columns.mapping'='uid,key,time') TBLPROPERTIES ( 'td.storage.user'='221', 'td.storage.database'='dfc', 'td.storage.table'='contests_20100606_120720_96abe81a', 'td.storage.path'='221/dfc/contests_20100606_120720_96abe81a', 'td.table_id'='4', 'td.modifiable'='true', 'plazma.data_set.name'='221/dfc/contests_20100606_120720_96abe81a' ); USE `db`;
  • 35. USE `db`; CREATE TEMPORARY FUNCTION MSGPACK_SERIALIZE AS 'com.treasure_data.hadoop.hive.udf.MessagePackSerialize'; CREATE TEMPORARY FUNCTION TD_TIME_RANGE AS 'com.treasure_data.hadoop.hive.udf.GenericUDFTimeRange'; CREATE TEMPORARY FUNCTION TD_TIME_ADD AS 'com.treasure_data.hadoop.hive.udf.UDFTimeAdd'; CREATE TEMPORARY FUNCTION TD_TIME_FORMAT AS 'com.treasure_data.hadoop.hive.udf.UDFTimeFormat'; CREATE TEMPORARY FUNCTION TD_TIME_PARSE AS 'com.treasure_data.hadoop.hive.udf.UDFTimeParse'; CREATE TEMPORARY FUNCTION TD_SCHEDULED_TIME AS 'com.treasure_data.hadoop.hive.udf.GenericUDFScheduledTime'; CREATE TEMPORARY FUNCTION TD_X_RANK AS 'com.treasure_data.hadoop.hive.udf.Rank'; CREATE TEMPORARY FUNCTION TD_FIRST AS 'com.treasure_data.hadoop.hive.udf.GenericUDAFFirst'; CREATE TEMPORARY FUNCTION TD_LAST AS 'com.treasure_data.hadoop.hive.udf.GenericUDAFLast'; CREATE TEMPORARY FUNCTION TD_SESSIONIZE AS 'com.treasure_data.hadoop.hive.udf.UDFSessionize'; CREATE TEMPORARY FUNCTION TD_PARSE_USER_AGENT AS 'com.treasure_data.hadoop.hive.udf.GenericUDFParseUserAgent'; CREATE TEMPORARY FUNCTION TD_HEX2NUM AS 'com.treasure_data.hadoop.hive.udf.UDFHex2num'; CREATE TEMPORARY FUNCTION TD_MD5 AS 'com.treasure_data.hadoop.hive.udf.UDFmd5'; CREATE TEMPORARY FUNCTION TD_RANK_SEQUENCE AS 'com.treasure_data.hadoop.hive.udf.UDFRankSequence'; CREATE TEMPORARY FUNCTION TD_STRING_EXPLODER AS 'com.treasure_data.hadoop.hive.udf.GenericUDTFStringExploder'; CREATE TEMPORARY FUNCTION TD_URL_DECODE AS
  • 36. CREATE TEMPORARY FUNCTION TD_URL_DECODE AS 'com.treasure_data.hadoop.hive.udf.UDFUrlDecode'; CREATE TEMPORARY FUNCTION TD_DATE_TRUNC AS 'com.treasure_data.hadoop.hive.udf.UDFDateTrunc'; CREATE TEMPORARY FUNCTION TD_LAT_LONG_TO_COUNTRY AS 'com.treasure_data.hadoop.hive.udf.UDFLatLongToCountry'; CREATE TEMPORARY FUNCTION TD_SUBSTRING_INENCODING AS 'com.treasure_data.hadoop.hive.udf.GenericUDFSubstringInEncoding'; CREATE TEMPORARY FUNCTION TD_DIVIDE AS 'com.treasure_data.hadoop.hive.udf.GenericUDFDivide'; CREATE TEMPORARY FUNCTION TD_SUMIF AS 'com.treasure_data.hadoop.hive.udf.GenericUDAFSumIf'; CREATE TEMPORARY FUNCTION TD_AVGIF AS 'com.treasure_data.hadoop.hive.udf.GenericUDAFAvgIf'; CREATE TEMPORARY FUNCTION hivemall_version AS 'hivemall.HivemallVersionUDF'; CREATE TEMPORARY FUNCTION perceptron AS 'hivemall.classifier.PerceptronUDTF'; CREATE TEMPORARY FUNCTION train_perceptron AS 'hivemall.classifier.PerceptronUDTF'; CREATE TEMPORARY FUNCTION train_pa AS 'hivemall.classifier.PassiveAggressiveUDTF'; CREATE TEMPORARY FUNCTION train_pa1 AS 'hivemall.classifier.PassiveAggressiveUDTF'; CREATE TEMPORARY FUNCTION train_pa2 AS 'hivemall.classifier.PassiveAggressiveUDTF'; CREATE TEMPORARY FUNCTION train_cw AS 'hivemall.classifier.ConfidenceWeightedUDTF'; CREATE TEMPORARY FUNCTION train_arow AS 'hivemall.classifier.AROWClassifierUDTF'; CREATE TEMPORARY FUNCTION train_arowh AS 'hivemall.classifier.AROWClassifierUDTF';
  • 37. CREATE TEMPORARY FUNCTION train_arowh AS 'hivemall.classifier.AROWClassifierUDTF'; CREATE TEMPORARY FUNCTION train_scw AS 'hivemall.classifier.SoftConfideceWeightedUDTF'; CREATE TEMPORARY FUNCTION train_scw2 AS 'hivemall.classifier.SoftConfideceWeightedUDTF'; CREATE TEMPORARY FUNCTION adagrad_rda AS 'hivemall.classifier.AdaGradRDAUDTF'; CREATE TEMPORARY FUNCTION train_adagrad_rda AS 'hivemall.classifier.AdaGradRDAUDTF'; CREATE TEMPORARY FUNCTION train_multiclass_perceptron AS 'hivemall.classifier.multiclass.MulticlassPerceptronUDTF'; CREATE TEMPORARY FUNCTION train_multiclass_pa AS 'hivemall.classifier.multiclass.MulticlassPassiveAggressiveUDTF'; CREATE TEMPORARY FUNCTION train_multiclass_pa1 AS 'hivemall.classifier.multiclass.MulticlassPassiveAggressiveUDTF'; CREATE TEMPORARY FUNCTION train_multiclass_pa2 AS 'hivemall.classifier.multiclass.MulticlassPassiveAggressiveUDTF'; CREATE TEMPORARY FUNCTION train_multiclass_cw AS 'hivemall.classifier.multiclass.MulticlassConfidenceWeightedUDTF'; CREATE TEMPORARY FUNCTION train_multiclass_arow AS 'hivemall.classifier.multiclass.MulticlassAROWClassifierUDTF'; CREATE TEMPORARY FUNCTION train_multiclass_scw AS 'hivemall.classifier.multiclass.MulticlassSoftConfidenceWeightedUDTF'; CREATE TEMPORARY FUNCTION train_multiclass_scw2 AS 'hivemall.classifier.multiclass.MulticlassSoftConfidenceWeightedUDTF'; CREATE TEMPORARY FUNCTION cosine_similarity AS 'hivemall.knn.similarity.CosineSimilarityUDF'; CREATE TEMPORARY FUNCTION cosine_sim AS 'hivemall.knn.similarity.CosineSimilarityUDF'; CREATE TEMPORARY FUNCTION jaccard AS 'hivemall.knn.similarity.JaccardIndexUDF';
  • 38. CREATE TEMPORARY FUNCTION jaccard AS 'hivemall.knn.similarity.JaccardIndexUDF'; CREATE TEMPORARY FUNCTION jaccard_similarity AS 'hivemall.knn.similarity.JaccardIndexUDF'; CREATE TEMPORARY FUNCTION angular_similarity AS 'hivemall.knn.similarity.AngularSimilarityUDF'; CREATE TEMPORARY FUNCTION euclid_similarity AS 'hivemall.knn.similarity.EuclidSimilarity'; CREATE TEMPORARY FUNCTION distance2similarity AS 'hivemall.knn.similarity.Distance2SimilarityUDF'; CREATE TEMPORARY FUNCTION hamming_distance AS 'hivemall.knn.distance.HammingDistanceUDF'; CREATE TEMPORARY FUNCTION popcnt AS 'hivemall.knn.distance.PopcountUDF'; CREATE TEMPORARY FUNCTION kld AS 'hivemall.knn.distance.KLDivergenceUDF'; CREATE TEMPORARY FUNCTION euclid_distance AS 'hivemall.knn.distance.EuclidDistanceUDF'; CREATE TEMPORARY FUNCTION cosine_distance AS 'hivemall.knn.distance.CosineDistanceUDF'; CREATE TEMPORARY FUNCTION angular_distance AS 'hivemall.knn.distance.AngularDistanceUDF'; CREATE TEMPORARY FUNCTION jaccard_distance AS 'hivemall.knn.distance.JaccardDistanceUDF'; CREATE TEMPORARY FUNCTION manhattan_distance AS 'hivemall.knn.distance.ManhattanDistanceUDF'; CREATE TEMPORARY FUNCTION minkowski_distance AS 'hivemall.knn.distance.MinkowskiDistanceUDF'; CREATE TEMPORARY FUNCTION minhashes AS 'hivemall.knn.lsh.MinHashesUDF'; CREATE TEMPORARY FUNCTION minhash AS 'hivemall.knn.lsh.MinHashUDTF'; CREATE TEMPORARY FUNCTION bbit_minhash AS 'hivemall.knn.lsh.bBitMinHashUDF'; CREATE TEMPORARY FUNCTION voted_avg AS 'hivemall.ensemble.bagging.VotedAvgUDAF';
  • 39. CREATE TEMPORARY FUNCTION voted_avg AS 'hivemall.ensemble.bagging.VotedAvgUDAF'; CREATE TEMPORARY FUNCTION weight_voted_avg AS 'hivemall.ensemble.bagging.WeightVotedAvgUDAF'; CREATE TEMPORARY FUNCTION wvoted_avg AS 'hivemall.ensemble.bagging.WeightVotedAvgUDAF'; CREATE TEMPORARY FUNCTION max_label AS 'hivemall.ensemble.MaxValueLabelUDAF'; CREATE TEMPORARY FUNCTION maxrow AS 'hivemall.ensemble.MaxRowUDAF'; CREATE TEMPORARY FUNCTION argmin_kld AS 'hivemall.ensemble.ArgminKLDistanceUDAF'; CREATE TEMPORARY FUNCTION mhash AS 'hivemall.ftvec.hashing.MurmurHash3UDF'; CREATE TEMPORARY FUNCTION sha1 AS 'hivemall.ftvec.hashing.Sha1UDF'; CREATE TEMPORARY FUNCTION array_hash_values AS 'hivemall.ftvec.hashing.ArrayHashValuesUDF'; CREATE TEMPORARY FUNCTION prefixed_hash_values AS 'hivemall.ftvec.hashing.ArrayPrefixedHashValuesUDF'; CREATE TEMPORARY FUNCTION polynomial_features AS 'hivemall.ftvec.pairing.PolynomialFeaturesUDF'; CREATE TEMPORARY FUNCTION powered_features AS 'hivemall.ftvec.pairing.PoweredFeaturesUDF'; CREATE TEMPORARY FUNCTION rescale AS 'hivemall.ftvec.scaling.RescaleUDF'; CREATE TEMPORARY FUNCTION rescale_fv AS 'hivemall.ftvec.scaling.RescaleUDF'; CREATE TEMPORARY FUNCTION zscore AS 'hivemall.ftvec.scaling.ZScoreUDF'; CREATE TEMPORARY FUNCTION normalize AS 'hivemall.ftvec.scaling.L2NormalizationUDF'; CREATE TEMPORARY FUNCTION conv2dense AS 'hivemall.ftvec.conv.ConvertToDenseModelUDAF'; CREATE TEMPORARY FUNCTION to_dense_features AS 'hivemall.ftvec.conv.ToDenseFeaturesUDF';
  • 40. CREATE TEMPORARY FUNCTION to_dense_features AS 'hivemall.ftvec.conv.ToDenseFeaturesUDF'; CREATE TEMPORARY FUNCTION to_dense AS 'hivemall.ftvec.conv.ToDenseFeaturesUDF'; CREATE TEMPORARY FUNCTION to_sparse_features AS 'hivemall.ftvec.conv.ToSparseFeaturesUDF'; CREATE TEMPORARY FUNCTION to_sparse AS 'hivemall.ftvec.conv.ToSparseFeaturesUDF'; CREATE TEMPORARY FUNCTION quantify AS 'hivemall.ftvec.conv.QuantifyColumnsUDTF'; CREATE TEMPORARY FUNCTION vectorize_features AS 'hivemall.ftvec.trans.VectorizeFeaturesUDF'; CREATE TEMPORARY FUNCTION categorical_features AS 'hivemall.ftvec.trans.CategoricalFeaturesUDF'; CREATE TEMPORARY FUNCTION indexed_features AS 'hivemall.ftvec.trans.IndexedFeatures'; CREATE TEMPORARY FUNCTION quantified_features AS 'hivemall.ftvec.trans.QuantifiedFeaturesUDTF'; CREATE TEMPORARY FUNCTION quantitative_features AS 'hivemall.ftvec.trans.QuantitativeFeaturesUDF'; CREATE TEMPORARY FUNCTION amplify AS 'hivemall.ftvec.amplify.AmplifierUDTF'; CREATE TEMPORARY FUNCTION rand_amplify AS 'hivemall.ftvec.amplify.RandomAmplifierUDTF'; CREATE TEMPORARY FUNCTION addBias AS 'hivemall.ftvec.AddBiasUDF'; CREATE TEMPORARY FUNCTION add_bias AS 'hivemall.ftvec.AddBiasUDF'; CREATE TEMPORARY FUNCTION sortByFeature AS 'hivemall.ftvec.SortByFeatureUDF'; CREATE TEMPORARY FUNCTION sort_by_feature AS 'hivemall.ftvec.SortByFeatureUDF'; CREATE TEMPORARY FUNCTION extract_feature AS 'hivemall.ftvec.ExtractFeatureUDF';
  • 41. CREATE TEMPORARY FUNCTION extract_feature AS 'hivemall.ftvec.ExtractFeatureUDF'; CREATE TEMPORARY FUNCTION extract_weight AS 'hivemall.ftvec.ExtractWeightUDF'; CREATE TEMPORARY FUNCTION add_feature_index AS 'hivemall.ftvec.AddFeatureIndexUDF'; CREATE TEMPORARY FUNCTION feature AS 'hivemall.ftvec.FeatureUDF'; CREATE TEMPORARY FUNCTION feature_index AS 'hivemall.ftvec.FeatureIndexUDF'; CREATE TEMPORARY FUNCTION tf AS 'hivemall.ftvec.text.TermFrequencyUDAF'; CREATE TEMPORARY FUNCTION train_logregr AS 'hivemall.regression.LogressUDTF'; CREATE TEMPORARY FUNCTION train_pa1_regr AS 'hivemall.regression.PassiveAggressiveRegressionUDTF'; CREATE TEMPORARY FUNCTION train_pa1a_regr AS 'hivemall.regression.PassiveAggressiveRegressionUDTF'; CREATE TEMPORARY FUNCTION train_pa2_regr AS 'hivemall.regression.PassiveAggressiveRegressionUDTF'; CREATE TEMPORARY FUNCTION train_pa2a_regr AS 'hivemall.regression.PassiveAggressiveRegressionUDTF'; CREATE TEMPORARY FUNCTION train_arow_regr AS 'hivemall.regression.AROWRegressionUDTF'; CREATE TEMPORARY FUNCTION train_arowe_regr AS 'hivemall.regression.AROWRegressionUDTF'; CREATE TEMPORARY FUNCTION train_arowe2_regr AS 'hivemall.regression.AROWRegressionUDTF'; CREATE TEMPORARY FUNCTION train_adagrad_regr AS 'hivemall.regression.AdaGradUDTF'; CREATE TEMPORARY FUNCTION train_adadelta_regr AS 'hivemall.regression.AdaDeltaUDTF'; CREATE TEMPORARY FUNCTION train_adagrad AS 'hivemall.regression.AdaGradUDTF';
  • 42. CREATE TEMPORARY FUNCTION train_adagrad AS 'hivemall.regression.AdaGradUDTF'; CREATE TEMPORARY FUNCTION train_adadelta AS 'hivemall.regression.AdaDeltaUDTF'; CREATE TEMPORARY FUNCTION logress AS 'hivemall.regression.LogressUDTF'; CREATE TEMPORARY FUNCTION pa1_regress AS 'hivemall.regression.PassiveAggressiveRegressionUDTF'; CREATE TEMPORARY FUNCTION pa1a_regress AS 'hivemall.regression.PassiveAggressiveRegressionUDTF'; CREATE TEMPORARY FUNCTION pa2_regress AS 'hivemall.regression.PassiveAggressiveRegressionUDTF'; CREATE TEMPORARY FUNCTION pa2a_regress AS 'hivemall.regression.PassiveAggressiveRegressionUDTF'; CREATE TEMPORARY FUNCTION arow_regress AS 'hivemall.regression.AROWRegressionUDTF'; CREATE TEMPORARY FUNCTION arowe_regress AS 'hivemall.regression.AROWRegressionUDTF'; CREATE TEMPORARY FUNCTION arowe2_regress AS 'hivemall.regression.AROWRegressionUDTF'; CREATE TEMPORARY FUNCTION adagrad AS 'hivemall.regression.AdaGradUDTF'; CREATE TEMPORARY FUNCTION adadelta AS 'hivemall.regression.AdaDeltaUDTF'; CREATE TEMPORARY FUNCTION float_array AS 'hivemall.tools.array.AllocFloatArrayUDF'; CREATE TEMPORARY FUNCTION array_remove AS 'hivemall.tools.array.ArrayRemoveUDF'; CREATE TEMPORARY FUNCTION sort_and_uniq_array AS 'hivemall.tools.array.SortAndUniqArrayUDF'; CREATE TEMPORARY FUNCTION subarray_endwith AS 'hivemall.tools.array.SubarrayEndWithUDF'; CREATE TEMPORARY FUNCTION subarray_startwith AS 'hivemall.tools.array.SubarrayStartWithUDF'; CREATE TEMPORARY FUNCTION collect_all AS
  • 43. CREATE TEMPORARY FUNCTION collect_all AS 'hivemall.tools.array.CollectAllUDAF'; CREATE TEMPORARY FUNCTION concat_array AS 'hivemall.tools.array.ConcatArrayUDF'; CREATE TEMPORARY FUNCTION subarray AS 'hivemall.tools.array.SubarrayUDF'; CREATE TEMPORARY FUNCTION array_avg AS 'hivemall.tools.array.ArrayAvgGenericUDAF'; CREATE TEMPORARY FUNCTION array_sum AS 'hivemall.tools.array.ArraySumUDAF'; CREATE TEMPORARY FUNCTION to_string_array AS 'hivemall.tools.array.ToStringArrayUDF'; CREATE TEMPORARY FUNCTION map_get_sum AS 'hivemall.tools.map.MapGetSumUDF'; CREATE TEMPORARY FUNCTION map_tail_n AS 'hivemall.tools.map.MapTailNUDF'; CREATE TEMPORARY FUNCTION to_map AS 'hivemall.tools.map.UDAFToMap'; CREATE TEMPORARY FUNCTION to_ordered_map AS 'hivemall.tools.map.UDAFToOrderedMap'; CREATE TEMPORARY FUNCTION sigmoid AS 'hivemall.tools.math.SigmoidGenericUDF'; CREATE TEMPORARY FUNCTION taskid AS 'hivemall.tools.mapred.TaskIdUDF'; CREATE TEMPORARY FUNCTION jobid AS 'hivemall.tools.mapred.JobIdUDF'; CREATE TEMPORARY FUNCTION rowid AS 'hivemall.tools.mapred.RowIdUDF'; CREATE TEMPORARY FUNCTION generate_series AS 'hivemall.tools.GenerateSeriesUDTF'; CREATE TEMPORARY FUNCTION convert_label AS 'hivemall.tools.ConvertLabelUDF'; CREATE TEMPORARY FUNCTION x_rank AS 'hivemall.tools.RankSequenceUDF'; CREATE TEMPORARY FUNCTION each_top_k AS 'hivemall.tools.EachTopKUDTF'; CREATE TEMPORARY FUNCTION tokenize AS 'hivemall.tools.text.TokenizeUDF'; CREATE TEMPORARY FUNCTION is_stopword AS 'hivemall.tools.text.StopwordUDF'; CREATE TEMPORARY FUNCTION split_words AS
  • 44. CREATE TEMPORARY FUNCTION split_words AS 'hivemall.tools.text.SplitWordsUDF'; CREATE TEMPORARY FUNCTION normalize_unicode AS 'hivemall.tools.text.NormalizeUnicodeUDF'; CREATE TEMPORARY FUNCTION lr_datagen AS 'hivemall.dataset.LogisticRegressionDataGeneratorUDTF'; CREATE TEMPORARY FUNCTION f1score AS 'hivemall.evaluation.FMeasureUDAF'; CREATE TEMPORARY FUNCTION mae AS 'hivemall.evaluation.MeanAbsoluteErrorUDAF'; CREATE TEMPORARY FUNCTION mse AS 'hivemall.evaluation.MeanSquaredErrorUDAF'; CREATE TEMPORARY FUNCTION rmse AS 'hivemall.evaluation.RootMeanSquaredErrorUDAF'; CREATE TEMPORARY FUNCTION mf_predict AS 'hivemall.mf.MFPredictionUDF'; CREATE TEMPORARY FUNCTION train_mf_sgd AS 'hivemall.mf.MatrixFactorizationSGDUDTF'; CREATE TEMPORARY FUNCTION train_mf_adagrad AS 'hivemall.mf.MatrixFactorizationAdaGradUDTF'; CREATE TEMPORARY FUNCTION fm_predict AS 'hivemall.fm.FMPredictGenericUDAF'; CREATE TEMPORARY FUNCTION train_fm AS 'hivemall.fm.FactorizationMachineUDTF'; CREATE TEMPORARY FUNCTION train_randomforest_classifier AS 'hivemall.smile.classification.RandomForestClassifierUDTF'; CREATE TEMPORARY FUNCTION train_rf_classifier AS 'hivemall.smile.classification.RandomForestClassifierUDTF'; CREATE TEMPORARY FUNCTION train_randomforest_regr AS 'hivemall.smile.regression.RandomForestRegressionUDTF'; CREATE TEMPORARY FUNCTION train_rf_regr AS 'hivemall.smile.regression.RandomForestRegressionUDTF'; CREATE TEMPORARY FUNCTION tree_predict AS 'hivemall.smile.tools.TreePredictByStackMachineUDF';
  • 45. CREATE TEMPORARY FUNCTION tree_predict AS 'hivemall.smile.tools.TreePredictByStackMachineUDF'; CREATE TEMPORARY FUNCTION vm_tree_predict AS 'hivemall.smile.tools.TreePredictByStackMachineUDF'; CREATE TEMPORARY FUNCTION rf_ensemble AS 'hivemall.smile.tools.RandomForestEnsembleUDAF'; CREATE TEMPORARY FUNCTION train_gradient_boosting_classifier AS 'hivemall.smile.classification.GradientTreeBoostingClassifierUDTF'; CREATE TEMPORARY FUNCTION guess_attribute_types AS 'hivemall.smile.tools.GuessAttributesUDF'; CREATE TEMPORARY FUNCTION tokenize_ja AS 'hivemall.nlp.tokenizer.KuromojiUDF'; CREATE TEMPORARY MACRO max2(x DOUBLE, y DOUBLE) if(x>y,x,y); CREATE TEMPORARY MACRO min2(x DOUBLE, y DOUBLE) if(x<y,x,y); CREATE TEMPORARY MACRO rand_gid(k INT) floor(rand()*k); CREATE TEMPORARY MACRO rand_gid2(k INT, seed INT) floor(rand(seed)*k); CREATE TEMPORARY MACRO idf(df_t DOUBLE, n_docs DOUBLE) log(10, n_docs / max2(1,df_t)) + 1.0; CREATE TEMPORARY MACRO tfidf(tf FLOAT, df_t DOUBLE, n_docs DOUBLE) tf * (log(10, n_docs / max2(1,df_t)) + 1.0); SELECT time, COUNT(1) AS cnt FROM tbl1 WHERE TD_TIME_RANGE(time, '2015-12-11', '2015-12-12', 'JST');
  • 46. Do you still love Java / SQL?
  • 47. PQ written in Ruby • Building jobs/parameters is so complex! • using data from many configurations (YAML, JSON), internal APIs and RDBMSs • with many extension syntaxes/rules to tune performance, override configurations for tests, ... • Ruby empowers us to write fat/complex worker code • Testing! • Unit tests using RSpec • System tests (executing real queries/jobs) using RSpec
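A minimal sketch of the unit-test style the slide describes (a hypothetical spec against an invented queue_name helper, in the spirit of the parameter-building sketch above; run it with the rspec command, or directly thanks to rspec/autorun):

    require "rspec/autorun"

    # Hypothetical: a tiny piece of the parameter builder under test.
    def queue_name(account_id, queue)
      "root.q#{account_id}.#{queue}"
    end

    RSpec.describe "job parameter building" do
      it "routes each account to its own scheduler queue" do
        expect(queue_name(221, "high")).to eq("root.q221.high")
      end
    end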
  • 48. For further improvement of workers • More performance for more customers at lower cost • More scalability for many other kinds of jobs • Better and well-controlled tests (indented here-documents!)
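The "indented here-documents" aside likely refers to the squiggly heredoc (<<~) that shipped in Ruby 2.3 the same month as this talk: it strips leading indentation, so expected SQL in specs can be written inline without left-margin noise. For example, using the query from slide 45:

    query = <<~SQL
      SELECT time, COUNT(1) AS cnt
      FROM tbl1
      WHERE TD_TIME_RANGE(time, '2015-12-11', '2015-12-12', 'JST')
    SQL
    puts query  # prints the SQL left-aligned, heredoc indentation removed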
  • 49. "Done is better than Perfect."
  • 51. We'll improve our code step by step, along with improvements in Ruby and its developer community <3 Thanks!