LONDON USERGROUP
APRIL 30th, 2009
Nicola Cardace
Topics
• Auto-Scaling Using Amazon EC2 and Scalr
• Nginx and Memcached on EC2, a 400% boost!
• NASDAQ exchange re-play on AWS
• Persistent Django on Amazon EC2 and EBS
• Taking Massive Distributed Computing to the Common Man - Hadoop on Amazon EC2/S3
AWS (Hadoop) Meetup 30.04.09
Auto-Scaling Using Amazon EC2 and Scalr
Scalr, a redundant, self-curing, self-scaling hosting solution built on EC2
http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1603
http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1357&categ
tinyurl.com/4lkr3n
tinyurl.com/c7num9
Scalr source code:
http://scalr.googlecode.com/svn/trunk/
***

Scalr overview
• By using Scalr, you can create a server farm that uses prebuilt AMIs
for load balancing, web servers, and databases. You can also
customize a generic AMI to host your actual application.
• Scalr monitors the health of the entire server farm, ensuring that
instances stay running and that load averages stay below a
configurable threshold. If an instance crashes, another one of the
proper type will be launched and added to the load balancer.
Scalr (2)
• Scalr is an open source, fully redundant, self-curing, and
self-scaling hosting environment that uses Amazon EC2.
• Scalr allows network administrators to create virtual
server farms using prebuilt components. Scalr uses four
Amazon Machine Images (AMIs): for load balancing,
databases, application servers, and a generic base
image.
• Administrators can preconfigure one machine and, when
the load warrants, bring online additional machines with
the same image, to handle the increased requests.
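The self-curing and self-scaling behaviour described on these two slides can be read as a control loop that runs once per server role. The Python sketch below is a hypothetical illustration of that loop, not Scalr's code or API: the `Instance` type, the `launch` callback, and the default threshold values are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Instance:
    """Hypothetical view of one farm member (not a Scalr or EC2 type)."""
    instance_id: str
    role: str            # e.g. "www", "app", "mysql"
    running: bool
    load_average: float


def reconcile(farm: List[Instance],
              role: str,
              launch: Callable[[str], Instance],
              max_load: float = 2.0,
              min_instances: int = 1) -> List[Instance]:
    """One pass of a self-curing, self-scaling loop for a single role.

    Returns the instances that should make up the role after this pass.
    """
    members = [i for i in farm if i.role == role]
    survivors = [i for i in members if i.running]

    # Base the scaling decision on instances that were actually serving load.
    overloaded = bool(survivors) and (
        sum(i.load_average for i in survivors) / len(survivors) > max_load)

    result = list(survivors)
    # Self-curing: replace each crashed instance with a new one of the same role.
    target = max(min_instances, len(members))
    while len(result) < target:
        result.append(launch(role))

    # Self-scaling: bring one more machine with the same image online.
    if overloaded:
        result.append(launch(role))
    return result
```

A real deployment would run a pass like this periodically for every role and also register new instances with the load balancer; that step is left out here.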
Nginx and Memcached on EC2
400% boost!
(with a five-minute config tweak!)

Originally developed by Igor Sysoev for rambler.ru (the second-largest
Russian website), Nginx is a high-performance HTTP server and reverse
proxy known for its stability, performance, and ease of use. Its strong
track record, large set of modules, and active development
community have rightfully earned it a steady uptick of users.
memcached is a high-performance, distributed memory object
caching system, generic in nature, but intended for use in
speeding up dynamic web applications by alleviating database
load.
“Memcached, the darling of every web-developer, is
capable of turning almost any application into a speed-
demon. Benchmarking one of my own Rails applications
resulted in ~850 req/s on commodity, non-optimized
hardware - more than enough in the case of this
application. However, what if we took Mongrel out of the
equation? Nginx, by default, comes prepackaged with the
Memcached module, which allows us to bypass the
Mongrel (from rubyforge) servers and talk to Memcached
directly. Same hardware, and a quick test later: ~3,550
req/s, or almost a 400% improvement!”
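For reference, ~3,550 req/s against ~850 req/s is about 4.2 times the original throughput. The gain comes from letting Nginx answer straight from memcached, so the application server is only involved on a cache miss. The Python sketch below models that request path with a plain dictionary standing in for memcached and a hypothetical `render_page` function standing in for the Rails/Mongrel application; it illustrates the pattern only and is not Nginx's or memcached's code.

```python
from typing import Callable, Dict, Tuple

# Stand-in for memcached: maps a cache key (here the request URI) to a page.
Cache = Dict[str, str]


def handle_request(uri: str,
                   cache: Cache,
                   render_page: Callable[[str], str]) -> Tuple[str, str]:
    """Serve a request the way an Nginx + memcached front end would.

    Returns (body, source): source is "memcached" on a hit and "app" when
    the request had to fall through to the application server.
    """
    body = cache.get(uri)
    if body is not None:
        # Hit: the front end answers directly; the app server is never touched.
        return body, "memcached"
    # Miss: the application renders the page and stores it under the same
    # key (in the real setup the application performs this write), so the
    # next request for this URI is a hit.
    body = render_page(uri)
    cache[uri] = body
    return body, "app"
```

The design choice is that the cache key must be something the front end can compute on its own (the URI), otherwise it could not look the page up without asking the application first.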
http://tinyurl.com/3a7t9y
***

NASDAQ exchange
re-play on AWS
your homework :)
Persistent Django on Amazon EC2 and EBS

Credit:
Thomas Brox Røst,
Visiting researcher, Decision Systems Group, Harvard
Persistent Django on Amazon EC2 and EBS - The easy way
thomas.broxrost.com
tinyurl.com/6b48g9
Now that Amazon’s Elastic Block Store (EBS) is publicly available,
running a complete Django installation on Amazon Web Services
(AWS) is easier than ever.
---
EBS provides persistent storage, which means that the Django database
is kept safe even after the Django EC2 instances terminate.
To set up Django with a persistent PostgreSQL database on AWS:
Set up an AWS account
Download and install the Elasticfox Firefox extension
Add your AWS credentials to Firefox
Create a new EC2 security group
By default, EC2 instances are an introverted lot: they prefer keeping to themselves and don’t expose any
of their ports to the outside world. We will be running a web application on port 8000, so port
8000 has to be opened. (Normally we would open port 80, but since I will only be using the Django
development web server, port 8000 is preferable.) SSH access is also essential, so port 22 should be
opened as well. To make this happen we must create a new security group where these ports are opened.
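The original walkthrough does this step in Elasticfox. As an alternative, here is a minimal Python sketch of the same step written against a boto3-style EC2 client (`create_security_group` and `authorize_security_group_ingress` are boto3 EC2 client methods); the group name, description text, and the default open-to-the-world CIDR are assumptions for illustration.

```python
from typing import Any, Iterable


def open_ports(ec2_client: Any, group_name: str, ports: Iterable[int],
               cidr: str = "0.0.0.0/0") -> str:
    """Create a security group and open the given TCP ports to `cidr`.

    `ec2_client` is expected to behave like boto3.client("ec2").
    Returns the ID of the new security group.
    """
    ports = list(ports)  # may be iterated more than once below
    response = ec2_client.create_security_group(
        GroupName=group_name,
        Description=f"Open TCP ports {', '.join(map(str, ports))}",
    )
    group_id = response["GroupId"]

    # One ingress rule per port, e.g. 22 for SSH and 8000 for the Django dev server.
    permissions = [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": cidr}],
        }
        for port in ports
    ]
    ec2_client.authorize_security_group_ingress(
        GroupId=group_id, IpPermissions=permissions)
    return group_id
```

With real credentials this would be called as `open_ports(boto3.client("ec2"), "django", [22, 8000])`. Passing a narrower `cidr` (your own IP range) is safer than exposing SSH to every address.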

Set up a key pair
Launch an EC2 Instance
Connect to your new instance (SSH using PuTTY)
- Install subversion
- Install, initialize and launch PostgreSQL
- Modify PostgreSQL config to avoid username/password problems
- Restart PostgreSQL to enable new security policy
- Set up a database for Django
- Install Django (checkout from SVN)
- Install psycopg2 (for database access from Python)
Set up a Django project
Test the installation
Launch the dev server
Create a Django app
Create an EBS volume and attach it to the instance
Mount the filesystem
Move the database to persistent storage (with server stopped)
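The last step usually means stopping PostgreSQL, moving its data directory onto the mounted EBS volume, and leaving a symbolic link at the old location so PostgreSQL still finds its files. The Python sketch below performs the move and the link with the standard library; it assumes PostgreSQL is already stopped, and the paths in the usage note are hypothetical (the real data directory and mount point depend on the distribution and on where the volume was mounted).

```python
import os
import shutil
from pathlib import Path


def move_to_volume(data_dir: str, volume_dir: str) -> Path:
    """Move `data_dir` under `volume_dir` and replace it with a symlink.

    Assumes the database server is stopped and `volume_dir` is the mount
    point of the EBS volume. Returns the new location of the data.
    """
    src = Path(data_dir)
    if src.is_symlink():
        raise ValueError(f"{src} is already a symlink; nothing to move")
    dest = Path(volume_dir) / src.name
    if dest.exists():
        raise FileExistsError(f"{dest} already exists on the volume")

    # Across filesystems shutil.move copies and then deletes the original.
    # File ownership is not preserved by that copy, so chown the result to
    # the postgres user afterwards if needed.
    shutil.move(str(src), str(dest))
    # Leave a link at the old path that points at the data on the volume.
    os.symlink(dest, src, target_is_directory=True)
    return dest
```

For example, `move_to_volume("/var/lib/pgsql/data", "/vol")` (hypothetical paths) would leave the data in `/vol/data` and a link at the original location; restart PostgreSQL afterwards.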
***
Amazon Elastic MapReduce

Data and Computing Trends:
Source: Facebook
• Explosion of Data
– Web Logs, Ad-Server logs, Sensor Networks, Seismic Data, DNA
sequences (?)
– User generated content/Web 2.0
– Data as BI => Data as product (Search, Ads, Digg, Quantcast, …)
• Declining Revenue/GB
– Milk @ $3/gallon => $15M / GB
– Ads @ 20c / 10^6 impressions => $1/GB
– Google Analytics, Facebook Lexicon == Free!
• Hardware Trends
– Commodity Rocks: $4K 1U box = 8 cores + 16GB mem + 4x1TB
– CPU: SMP => NUMA, Storage: $ Shared-Nothing << $ Shared,
Networking: Ethernet
Hadoop

• Parallel Computing platform
– Distributed FileSystem (HDFS)
– Parallel Processing model (Map/Reduce)
– Express Computation in any language
– Job execution for Map/Reduce jobs
(scheduling+localization+retries/speculation)
• Open-Source
– Most popular Apache project!
– Highly Extensible Java Stack (@ expense of Efficiency)
– Develop/Test on EC2!
• Ride the commodity curve:
– Cheap (but reliable) shared nothing storage
– Data-local computing (no need for high-speed networks)
– Highly Scalable (@expense of Efficiency)
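To make the Map/Reduce processing model concrete, here is a word count written as a map function and a reduce function, with the shuffle/sort step that Hadoop performs between them simulated in-process. It is a single-machine illustration under that assumption; on a cluster, the same two functions could run as Hadoop Streaming scripts that read stdin and write tab-separated key/value lines, which is how Hadoop lets you express the computation in any language.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reducer(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce: sum all the counts emitted for one word."""
    return word, sum(counts)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Simulate map -> shuffle/sort -> reduce on one machine."""
    # Map phase: every record is processed independently
    # (on a cluster, by map tasks scheduled next to the data).
    pairs = [pair for line in lines for pair in mapper(line)]
    # Shuffle/sort: bring all values for the same key together.
    pairs.sort(key=itemgetter(0))
    # Reduce phase: one call per distinct key.
    return dict(reducer(word, (count for _, count in group))
                for word, group in groupby(pairs, key=itemgetter(0)))
```

Because each mapper call only sees one line and each reducer call only sees one key, both phases can be spread across as many machines as the cluster has.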
Hadoop: Map/Reduce Dataflow
Hadoop: Running MapReduce

In Pictures (Source: Facebook)
[Diagram: a cluster of nodes, each with its own disks, connected by network links labelled 1 Gigabit and 4-8 Gigabit; each Node = DataNode + Map-Reduce]
Why HIVE?
• Large installed base of SQL users :)
– i.e. map-reduce is for ultra-geeks
– much, much easier to write a SQL query
• Analytics SQL queries translate really well
to map-reduce
• Files are an insufficient data management
abstraction
– Tables, Schemas, Partitions, Indices
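As an illustration of why analytic SQL translates well to map-reduce, here is a query of the form SELECT country, SUM(revenue) FROM sales GROUP BY country evaluated as map tasks with map-side partial aggregation plus a reduce step. The table and column names are hypothetical, a single reducer is used for simplicity, and this shows the general technique rather than the plan Hive actually generates.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, float]  # (country, revenue): hypothetical table schema


def map_with_partial_agg(split: Iterable[Row]) -> Dict[str, float]:
    """Map task: scan one input split and pre-aggregate SUM(revenue) per key.

    This map-side partial aggregation (combiner) step shrinks the data that
    has to be shuffled to the reducer.
    """
    partial: Dict[str, float] = defaultdict(float)
    for country, revenue in split:
        partial[country] += revenue
    return dict(partial)


def reduce_sums(partials: List[Dict[str, float]]) -> Dict[str, float]:
    """Reduce (a single reducer here): merge partial sums key by key."""
    totals: Dict[str, float] = defaultdict(float)
    for partial in partials:
        for country, subtotal in partial.items():
            totals[country] += subtotal
    return dict(totals)


def group_by_sum(splits: List[List[Row]]) -> Dict[str, float]:
    """SELECT country, SUM(revenue) FROM sales GROUP BY country."""
    return reduce_sums([map_with_partial_agg(split) for split in splits])
```

The GROUP BY column becomes the map output key and the aggregate becomes the reduce function, which is why this class of query needs no hand-written map-reduce code.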
Hive Query Language
• Basic SQL
– From clause subquery
– ANSI JOIN (equi-join only)
– Multi-table Insert
– Multi group-by
– Sampling
– Objects traversal
• Extensibility
– Pluggable Map-reduce scripts using
TRANSFORM
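Hive's TRANSFORM clause streams each row to an external script as a tab-separated line on stdin and reads tab-separated lines back from stdout. The Python sketch below is such a script; the column layout (a user id and a URL in, the user id and the URL's host out) and the file name extract_host.py are assumptions. After ADD FILE extract_host.py, it could be invoked as SELECT TRANSFORM(userid, url) USING 'python extract_host.py' AS (userid, host) FROM page_views, where page_views is a hypothetical table.

```python
import sys
from typing import Iterable, Iterator
from urllib.parse import urlsplit


def transform(lines: Iterable[str]) -> Iterator[str]:
    """Turn "userid<TAB>url" rows into "userid<TAB>host" rows."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue  # skip malformed rows instead of failing the whole task
        userid, url = fields
        host = urlsplit(url).hostname or ""  # hostname is lower-cased
        yield f"{userid}\t{host}"


if __name__ == "__main__":
    # Hive writes the input rows to stdin and reads the script's stdout.
    for out in transform(sys.stdin):
        print(out)
```

Keeping the row logic in a plain function makes the script easy to test outside Hive; the `__main__` block is the only part that depends on the stdin/stdout protocol.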
Data Warehousing at Facebook
(Scribe is a server for aggregating log data streamed in real time from a large
number of servers. It is designed to be scalable, extensible without client-side
modification, and robust to failure of the network or any specific machine)
[Diagram: Web Servers, Scribe Servers, Filers, Hive on Hadoop Cluster, Oracle RAC, Federated MySQL]
Hadoop Usage @ Facebook
• Data warehouse running Hive
• 600 machines, 4800 cores
• 3200 jobs per day
• 50+ engineers have used Hadoop
• Data statistics:
– Total Data: ~2.5PB
– Net Data added/day: ~15TB
• 6TB of uncompressed source logs
• 4TB of uncompressed dimension data reloaded daily
– Compression Factor ~5x (gzip, more with bzip2)
• Usage statistics:
– 3200 jobs/day with 800K map-reduce tasks/day
– 55TB of compressed data scanned daily
– 15TB of compressed output data written to HDFS
– 80 MM compute minutes/day
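Dividing the daily usage figures by the job count gives a feel for the size of an average job. This is plain arithmetic on the numbers above, assuming decimal units (1 TB = 1000 GB); it says nothing about the real distribution, which mixes large production jobs with small ad-hoc queries.

```python
# Daily usage figures from the slide above.
JOBS_PER_DAY = 3200
TASKS_PER_DAY = 800_000
SCANNED_TB_PER_DAY = 55   # compressed data read
WRITTEN_TB_PER_DAY = 15   # compressed output written to HDFS


def per_job_averages() -> dict:
    """Average tasks, GB scanned and GB written per job (1 TB = 1000 GB)."""
    return {
        "tasks": TASKS_PER_DAY / JOBS_PER_DAY,
        "gb_scanned": SCANNED_TB_PER_DAY * 1000 / JOBS_PER_DAY,
        "gb_written": WRITTEN_TB_PER_DAY * 1000 / JOBS_PER_DAY,
    }


if __name__ == "__main__":
    for name, value in per_job_averages().items():
        print(f"{name}: {value:.1f}")
```

That works out to 250 tasks, about 17 GB read and about 4.7 GB written per job on average.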
Hadoop Job types @ Facebook
• Production jobs: load data, compute
statistics, detect spam, etc.
• Long experiments: machine learning, etc.
• Small ad-hoc queries: Hive jobs, sampling
• GOAL: Provide fast response times for
small jobs and guaranteed service levels
for production jobs
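One way to reconcile these two goals is a fair-share scheduler with a guaranteed minimum share per pool, which is the approach taken by Hadoop's Fair Scheduler. The toy allocator below is a simplified illustration of that idea, not the scheduler's actual code. Production pools receive their minimum share first, and the remaining slots are split evenly among pools that still have unmet demand, so small ad-hoc jobs are not starved. Pool names and numbers are hypothetical.

```python
"""Toy slot allocator: guaranteed minimums first, then equal fair shares.

A simplified sketch of the idea behind a fair-share scheduler with minimum
shares; it is NOT Hadoop's Fair Scheduler code. Pool names are hypothetical.
"""
from dataclasses import dataclass


@dataclass
class Pool:
    name: str
    demand: int         # slots the pool's runnable tasks could use
    min_share: int = 0  # slots guaranteed to the pool (e.g. production)


def allocate(pools, total_slots):
    """Return a {pool name: slots} allocation for one scheduling round."""
    # Step 1: give each pool its guaranteed minimum, capped by its demand.
    alloc = {p.name: min(p.min_share, p.demand) for p in pools}
    free = total_slots - sum(alloc.values())
    if free < 0:
        raise ValueError("minimum shares exceed cluster capacity")
    # Step 2: water-fill the remaining slots, one equal share per round,
    # to every pool that still has unmet demand.
    while free > 0:
        hungry = [p for p in pools if alloc[p.name] < p.demand]
        if not hungry:
            break  # every pool is satisfied; leave the rest idle
        share = max(free // len(hungry), 1)
        for p in hungry:
            give = min(share, p.demand - alloc[p.name], free)
            alloc[p.name] += give
            free -= give
            if free == 0:
                break
    return alloc
```

For example, with 60 slots, an `adhoc` pool (demand 10) and a `production` pool (minimum 20, demand 100) end up with 10 and 50 slots: the ad-hoc work runs immediately while production keeps its guarantee.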

Usage patterns in Yahoo
• ETL
– Put large data sources (e.g. log files) onto the Hadoop Distributed File System (HDFS)
– Perform aggregations, transformations, normalizations on the data
– Load into RDBMS / data mart
• Reporting and Analytics
– Run canned and ad-hoc queries over large data
– Run analytics and data mining operations on large data
– Produce reports for end-user consumption or loading into data mart
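The ETL and reporting patterns above map naturally onto Hadoop Streaming, where the mapper and reducer are ordinary programs that read lines on stdin and write `key\tvalue` lines on stdout, and the framework sorts mapper output by key before the reduce phase. Below is a minimal Python sketch that assumes a hypothetical four-field access-log format; `run_local` only imitates the sort and shuffle in memory.

```python
"""Hadoop Streaming-style ETL sketch: total bytes served per HTTP status.

Assumes a hypothetical space-separated access-log format:
    <timestamp> <status> <bytes> <url>
Under Hadoop Streaming, mapper() and reducer() would each run as a separate
script reading stdin; here run_local() imitates map -> sort -> reduce.
"""
from itertools import groupby


def mapper(lines):
    """Parse and normalize raw log lines into 'status<TAB>bytes' records."""
    for line in lines:
        fields = line.split()
        if len(fields) != 4 or not fields[2].isdigit():
            continue  # skip malformed records, a common ETL cleanup step
        _, status, nbytes, _ = fields
        yield f"{status}\t{nbytes}"


def reducer(sorted_records):
    """Sum bytes per key; relies on records arriving sorted by key."""
    pairs = (r.split("\t", 1) for r in sorted_records)
    for status, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{status}\t{sum(int(v) for _, v in group)}"


def run_local(lines):
    """Imitate the shuffle: Hadoop sorts mapper output by key before reducing."""
    return list(reducer(sorted(mapper(lines))))
```

The reducer output is already in a tab-separated form that could be bulk-loaded into an RDBMS or data mart table, which is the last step of the ETL pattern.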
Usage patterns in Yahoo
• Data Processing Pipelines
– Multi-step pipelines for data processing
– Coordination, scheduling, data collection and publishing of feeds
– Regularly scheduled jobs that carry SLAs
• Machine Learning & Graph Algorithms
– Traverse large graphs and data sets, building models and classifiers
– Implement machine learning algorithms over massive data sets
• General Back end processing
– Implement significant portions of back-end, batch-oriented processing on the grid
– General computation framework
– Simplify back-end architecture
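Graph traversal on Hadoop is usually expressed as repeated MapReduce passes, one hop per pass. The sketch below shows one such pass for breadth-first search, with an in-memory dictionary standing in for the shuffle. The record layout and the `bfs` driver loop are illustrative assumptions, not a specific Yahoo implementation.

```python
"""One MapReduce-style iteration of parallel breadth-first search.

Graph records are node -> (distance, neighbors); distance None = unvisited.
Each iteration expands the frontier by one hop, so the driver repeats it
until nothing changes. Illustrative only; node names are hypothetical.
"""
from collections import defaultdict


def bfs_map(node, state):
    dist, neighbors = state
    yield node, ("graph", neighbors)     # pass the adjacency list through
    yield node, ("dist", dist)           # keep the node's current distance
    if dist is not None:
        for n in neighbors:
            yield n, ("dist", dist + 1)  # tentative distance for neighbors


def bfs_reduce(node, values):
    neighbors, best = [], None
    for kind, v in values:
        if kind == "graph":
            neighbors = v
        elif v is not None and (best is None or v < best):
            best = v                     # keep the shortest known distance
    return node, (best, neighbors)


def bfs_iteration(graph):
    shuffled = defaultdict(list)         # stands in for Hadoop's shuffle
    for node, state in graph.items():
        for key, value in bfs_map(node, state):
            shuffled[key].append(value)
    return dict(bfs_reduce(k, vs) for k, vs in shuffled.items())


def bfs(graph, max_iters=100):
    """Driver: rerun the MapReduce pass until the graph stops changing."""
    for _ in range(max_iters):
        updated = bfs_iteration(graph)
        if updated == graph:
            return updated
        graph = updated
    return graph
```

Each pass rereads and rewrites the whole graph, which is why iterative graph algorithms were comparatively expensive on MapReduce; the number of passes grows with the graph's diameter.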
What is Hadoop Pig?
Pig is a platform for analyzing large data sets that consists of a
high-level language for expressing data analysis programs, coupled
with infrastructure for evaluating these programs.
http://www.cloudera.com/hadoop-training-pig-introduction
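To make "high-level language" concrete, here is a small Pig Latin script (hypothetical input path and schema, shown in the docstring) together with a Python function that reproduces its FILTER, GROUP and COUNT semantics on an in-memory list. It is a sketch of what the script computes, not of how Pig executes it.

```python
"""What a small Pig Latin script computes, mirrored step by step in Python.

Assumed (hypothetical) Pig Latin, with an illustrative input path 'logs':
    logs    = LOAD 'logs' AS (user:chararray, url:chararray);
    valid   = FILTER logs BY url IS NOT NULL;
    grouped = GROUP valid BY user;
    counts  = FOREACH grouped GENERATE group, COUNT(valid);
Pig would compile these statements into MapReduce jobs; this sketch only
reproduces the relational semantics on an in-memory list.
"""
from collections import defaultdict


def pig_like_counts(logs):
    """Count non-null URL records per user, as the script above would."""
    # FILTER logs BY url IS NOT NULL
    valid = [(user, url) for user, url in logs if url is not None]
    # GROUP valid BY user: each group holds a bag of the original tuples
    grouped = defaultdict(list)
    for user, url in valid:
        grouped[user].append((user, url))
    # FOREACH grouped GENERATE group, COUNT(valid)
    return {user: len(bag) for user, bag in grouped.items()}
```

The four Pig statements replace what would otherwise be a hand-written mapper, reducer and job driver, which is the productivity argument for Pig.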
Thanks to our sponsors for their kind sponsorship of the AWS LONDON USER GROUP
Thank you!
@n1c0la

Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Erasmo Purificato
 
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Bert Blevins
 
How Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdfHow Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdf
HackersList
 
Research Directions for Cross Reality Interfaces
Research Directions for Cross Reality InterfacesResearch Directions for Cross Reality Interfaces
Research Directions for Cross Reality Interfaces
Mark Billinghurst
 
20240702 QFM021 Machine Intelligence Reading List June 2024
20240702 QFM021 Machine Intelligence Reading List June 202420240702 QFM021 Machine Intelligence Reading List June 2024
20240702 QFM021 Machine Intelligence Reading List June 2024
Matthew Sinclair
 
UiPath Community Day Kraków: Devs4Devs Conference
UiPath Community Day Kraków: Devs4Devs ConferenceUiPath Community Day Kraków: Devs4Devs Conference
UiPath Community Day Kraków: Devs4Devs Conference
UiPathCommunity
 
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdfBT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
Neo4j
 
Best Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdfBest Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdf
Tatiana Al-Chueyr
 
find out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challengesfind out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challenges
huseindihon
 
How RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptxHow RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptx
SynapseIndia
 
Mitigating the Impact of State Management in Cloud Stream Processing Systems
Mitigating the Impact of State Management in Cloud Stream Processing SystemsMitigating the Impact of State Management in Cloud Stream Processing Systems
Mitigating the Impact of State Management in Cloud Stream Processing Systems
ScyllaDB
 
What's New in Copilot for Microsoft365 May 2024.pptx
What's New in Copilot for Microsoft365 May 2024.pptxWhat's New in Copilot for Microsoft365 May 2024.pptx
What's New in Copilot for Microsoft365 May 2024.pptx
Stephanie Beckett
 

Recently uploaded (20)

BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALLBLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
 
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyyActive Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
 
Details of description part II: Describing images in practice - Tech Forum 2024
Details of description part II: Describing images in practice - Tech Forum 2024Details of description part II: Describing images in practice - Tech Forum 2024
Details of description part II: Describing images in practice - Tech Forum 2024
 
WPRiders Company Presentation Slide Deck
WPRiders Company Presentation Slide DeckWPRiders Company Presentation Slide Deck
WPRiders Company Presentation Slide Deck
 
Comparison Table of DiskWarrior Alternatives.pdf
Comparison Table of DiskWarrior Alternatives.pdfComparison Table of DiskWarrior Alternatives.pdf
Comparison Table of DiskWarrior Alternatives.pdf
 
Advanced Techniques for Cyber Security Analysis and Anomaly Detection
Advanced Techniques for Cyber Security Analysis and Anomaly DetectionAdvanced Techniques for Cyber Security Analysis and Anomaly Detection
Advanced Techniques for Cyber Security Analysis and Anomaly Detection
 
INDIAN AIR FORCE FIGHTER PLANES LIST.pdf
INDIAN AIR FORCE FIGHTER PLANES LIST.pdfINDIAN AIR FORCE FIGHTER PLANES LIST.pdf
INDIAN AIR FORCE FIGHTER PLANES LIST.pdf
 
Observability For You and Me with OpenTelemetry
Observability For You and Me with OpenTelemetryObservability For You and Me with OpenTelemetry
Observability For You and Me with OpenTelemetry
 
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
 
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
 
How Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdfHow Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdf
 
Research Directions for Cross Reality Interfaces
Research Directions for Cross Reality InterfacesResearch Directions for Cross Reality Interfaces
Research Directions for Cross Reality Interfaces
 
20240702 QFM021 Machine Intelligence Reading List June 2024
20240702 QFM021 Machine Intelligence Reading List June 202420240702 QFM021 Machine Intelligence Reading List June 2024
20240702 QFM021 Machine Intelligence Reading List June 2024
 
UiPath Community Day Kraków: Devs4Devs Conference
UiPath Community Day Kraków: Devs4Devs ConferenceUiPath Community Day Kraków: Devs4Devs Conference
UiPath Community Day Kraków: Devs4Devs Conference
 
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdfBT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
 
Best Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdfBest Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdf
 
find out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challengesfind out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challenges
 
How RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptxHow RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptx
 
Mitigating the Impact of State Management in Cloud Stream Processing Systems
Mitigating the Impact of State Management in Cloud Stream Processing SystemsMitigating the Impact of State Management in Cloud Stream Processing Systems
Mitigating the Impact of State Management in Cloud Stream Processing Systems
 
What's New in Copilot for Microsoft365 May 2024.pptx
What's New in Copilot for Microsoft365 May 2024.pptxWhat's New in Copilot for Microsoft365 May 2024.pptx
What's New in Copilot for Microsoft365 May 2024.pptx
 

AWS (Hadoop) Meetup 30.04.09

  • 2. Topics • Auto-Scaling Using Amazon EC2 and Scalr • Nginx and Memcached on EC2, a 400% boost! • NASDAQ exchange re-play on AWS • Persistent Django on Amazon EC2 and EBS • Taking Massive Distributed Computing to the Common Man - Hadoop on Amazon EC2/S3
  • 5. Auto-Scaling Using Amazon EC2 and Scalr Scalr, a redundant, self-curing, self-scaling hosting solution built on EC2
  • 9. Scalr overview • By using Scalr, you can create a server farm that uses prebuilt AMIs for load balancing, web servers, and databases. You also can customize a generic AMI, which you can use to host your actual application. • Scalr monitors the health of the entire server farm, ensuring that instances stay running and that load averages stay below a configurable threshold. If an instance crashes, another one of the proper type will be launched and added to the load balancer.
  • 10. Scalr (2) • Scalr is an open source, fully redundant, self-curing, and self-scaling hosting environment that uses Amazon EC2. • Scalr allows network administrators to create virtual server farms using prebuilt components. Scalr uses four Amazon Machine Images (AMIs): one each for load balancing, the database, the application server, and a generic base image. • Administrators can preconfigure one machine and, when the load warrants, bring online additional machines with the same image to handle the increased requests.
  • 11. Nginx and Memcached on EC2 400% boost!
  • 12. Nginx and Memcached on EC2 400% boost! (with a five-minute config tweak!)
  • 13. Nginx was originally developed by Igor Sysoev for rambler.ru (the second-largest Russian website). It is a high-performance HTTP server and reverse proxy known for its stability, performance, and ease of use. Its strong track record, many useful modules, and active development community have rightfully earned it a steadily growing user base.
  • 14. memcached is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load. “Memcached, the darling of every web-developer, is capable of turning almost any application into a speed-demon. Benchmarking one of my own Rails applications resulted in ~850 req/s on commodity, non-optimized hardware - more than enough in the case of this application. However, what if we took Mongrel out of the equation? Nginx, by default, comes prepackaged with the Memcached module, which allows us to bypass the Mongrel (from rubyforge) servers and talk to Memcached directly. Same hardware, and a quick test later: ~3,550 req/s, or almost a 400% improvement!”
  • 16. Nginx and Memcached on EC2 400% boost! http://tinyurl.com/3a7t9y ***
  • 17. NASDAQ exchange re-play on AWS: your homework!
  • 20. Persistent Django on Amazon EC2 and EBS
  • 22. Credit: Thomas Brox Røst, visiting researcher, Decision Systems Group, Harvard. Persistent Django on Amazon EC2 and EBS - The easy way. thomas.broxrost.com, tinyurl.com/6b48g9
  • 23. Now that Amazon’s Elastic Block Store (EBS) is publicly available, running a complete Django installation on Amazon Web Services (AWS) is easier than ever. EBS provides persistent storage, which means that the Django database is kept safe even after the Django EC2 instances terminate.
  • 24. To set up Django with a persistent PostgreSQL database on AWS: set up an AWS account; download and install the Elasticfox Firefox extension; add your AWS credentials to Firefox; create a new EC2 security group. By default, EC2 instances are an introverted lot: they prefer keeping to themselves and don’t expose any of their ports to the outside world. We will be running a web application on port 8000, so port 8000 has to be opened. (Normally we would open port 80, but since I will only be using the Django development web server, port 8000 is preferable.) SSH access is also essential, so port 22 should be opened as well. To make this happen we must create a new security group with these ports opened.
  • 25. Set up a key pair; launch an EC2 instance; connect to your new instance (SSH, e.g. using PuTTY) and then: install Subversion; install, initialize and launch PostgreSQL; modify the PostgreSQL config to avoid username/password problems; restart PostgreSQL to enable the new security policy; set up a database for Django; install Django (check out from SVN); install psycopg2 (for database access from Python). Set up a Django project; test the installation; launch the dev server; create a Django app; create and attach an EBS volume; mount the filesystem; move the database to persistent storage (with the server stopped).
  • 26. ***
  • 39. Data and Computing Trends (Source: Facebook) • Explosion of Data – Web Logs, Ad-Server logs, Sensor Networks, Seismic Data, DNA sequences (?) – User generated content/Web 2.0 – Data as BI => Data as product (Search, Ads, Digg, Quantcast, …) • Declining Revenue/GB – Milk @ $3/gallon => $15M / GB – Ads @ 20c / 10^6 impressions => $1/GB – Google Analytics, Facebook Lexicon == Free! • Hardware Trends – Commodity Rocks: $4K 1U box = 8 cores + 16GB mem + 4x1TB – CPU: SMP => NUMA, Storage: $ Shared-Nothing << $ Shared, Networking: Ethernet
  • 41. Hadoop • Parallel Computing platform – Distributed FileSystem (HDFS) – Parallel Processing model (Map/Reduce) – Express Computation in any language – Job execution for Map/Reduce jobs (scheduling+localization+retries/speculation) • Open-Source – Most popular Apache project! – Highly Extensible Java Stack (@ expense of Efficiency) – Develop/Test on EC2! • Ride the commodity curve: – Cheap (but reliable) shared nothing storage – Data Local computing (don’t need high speed networks) – Highly Scalable (@expense of Efficiency)
  • 45. In Pictures (Source: Facebook)
  • 46. Looks like this: [Figure: a cluster of nodes, each with its own local disks, connected by 1 Gigabit and 4-8 Gigabit network links. Node = DataNode + Map-Reduce]
  • 47. Why HIVE? • Large installed base of SQL users – i.e., map-reduce is for ultra-geeks – much, much easier to write a SQL query • Analytics SQL queries translate really well to map-reduce • Files are an insufficient data management abstraction – Tables, Schemas, Partitions, Indices
  • 49. Hive Query Language • Basic SQL – FROM clause subquery – ANSI JOIN (equi-join only) – Multi-table Insert – Multi group-by – Sampling – Object traversal • Extensibility – Pluggable Map-reduce scripts using TRANSFORM
  • 50. Data Warehousing at Facebook (Scribe is a server for aggregating log data streamed in real time from a large number of servers. It is designed to be scalable, extensible without client-side modification, and robust to failure of the network or any specific machine.) [Diagram components: Web Servers, Scribe Servers, Filers, Hive on Hadoop Cluster, Oracle RAC, Federated MySQL]
  • 51. Hadoop Usage @ Facebook • Data warehouse running Hive • 600 machines, 4800 cores • 3200 jobs per day • 50+ engineers have used Hadoop • Data statistics: – Total Data: ~2.5PB – Net Data added/day: ~15TB • 6TB of uncompressed source logs • 4TB of uncompressed dimension data reloaded daily – Compression Factor ~5x (gzip, more with bzip) • Usage statistics: – 3200 jobs/day with 800K tasks(map-reduce tasks)/day – 55TB of compressed data scanned daily – 15TB of compressed output data written to hdfs – 80 MM compute minutes/day
  • 52. Hadoop Job types @ Facebook • Production jobs: load data, compute statistics, detect spam, etc • Long experiments: machine learning, etc • Small ad-hoc queries: Hive jobs, sampling • GOAL: Provide fast response times for small jobs and guaranteed service levels for production jobs
  • 53. Usage patterns in Yahoo • ETL – Put a large data source (e.g. log files) onto the Hadoop File System – Perform aggregations, transformations, normalizations on the data – Load into RDBMS / data mart • Reporting and Analytics – Run canned and ad-hoc queries over large data – Run analytics and data mining operations on large data – Produce reports for end-user consumption or loading into data mart
  • 54. Usage patterns in Yahoo • Data Processing Pipelines – Multi-step pipelines for data processing – Coordination, scheduling, data collection and publishing of feeds – SLA carrying, regularly scheduled jobs • Machine Learning & Graph Algorithms – Traverse large graphs and data sets, building models and classifiers – Implement machine learning algorithms over massive data sets • General Back end processing – Implement significant portions of back-end, batch oriented processing on the grid – General computation framework – Simplify back-end architecture
  • 55. What is Hadoop Pig Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. http://www.cloudera.com/hadoop-training-pig-introduction
  • 58. Thanks for the kind sponsorship of the AWS LONDON USER GROUP from
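Slides 9 and 10 describe Scalr replacing crashed instances and adding servers when load averages exceed a configurable threshold. The function below is a minimal sketch of that decision loop for a single farm role. It is not Scalr's code: the `Instance` record, the `plan_actions` function and all threshold parameters are hypothetical, and actually launching or terminating EC2 instances is left out.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    # Hypothetical record for one server in a farm role.
    instance_id: str
    role: str            # e.g. "www", "app", "mysql"
    running: bool
    load_average: float


def plan_actions(instances, role, min_instances, max_load, min_load):
    """Return a list of (action, detail) tuples for one role."""
    members = [i for i in instances if i.role == role]
    healthy = [i for i in members if i.running]
    actions = []

    # Self-curing: every crashed instance is replaced by one of the same role.
    for dead in (i for i in members if not i.running):
        actions.append(("replace", dead.instance_id))

    # Keep at least min_instances servers in the role.
    for _ in range(max(0, min_instances - len(members))):
        actions.append(("launch", role))

    if healthy:
        avg = sum(i.load_average for i in healthy) / len(healthy)
        if avg > max_load:
            actions.append(("launch", role))  # scale out
        elif avg < min_load and len(healthy) > min_instances:
            # Scale in by terminating the least loaded healthy instance.
            idle = min(healthy, key=lambda i: i.load_average)
            actions.append(("terminate", idle.instance_id))
    return actions
```

For a farm where `i-3` has crashed and the two survivors average a load of 3.2 against a threshold of 2.0, the plan is to replace `i-3` and launch one more application server.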
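Slide 14 describes nginx answering requests straight from memcached so that the Mongrel servers are bypassed; the quoted figures work out to 3,550 / 850 ≈ 4.2 times the original throughput. In nginx itself this is configured with the memcached module's `memcached_pass` directive and `$memcached_key` variable, typically with an `error_page` fallback to the application. The Python model below only illustrates that request flow: a dict stands in for memcached, and `make_app` is a hypothetical stand-in for the Rails/Mongrel backend, which is responsible for writing pages into the cache because nginx only reads from it.

```python
from typing import Callable, Dict, List, Tuple


def handle_request(uri: str,
                   cache: Dict[str, str],
                   app_server: Callable[[str], str]) -> Tuple[str, str]:
    """Model of the nginx front end: try the cache first, fall back on a miss."""
    body = cache.get(uri)            # the URI plays the role of $memcached_key
    if body is not None:
        return "cache", body         # hit: the app server is never touched
    return "app", app_server(uri)    # miss: fall back to the application


def make_app(cache: Dict[str, str], calls: List[str]) -> Callable[[str], str]:
    """Hypothetical app server that renders a page and stores it in the cache."""
    def render(uri: str) -> str:
        calls.append(uri)            # record how often the backend is hit
        page = f"<h1>{uri}</h1>"
        cache[uri] = page            # the application populates the cache
        return page
    return render
```

Requesting `/about` twice hits the backend once and serves the second request from the cache, which is where the throughput gain comes from.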
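Slide 24 opens ports 22 (SSH) and 8000 (Django development server) through Elasticfox. As a swapped-in alternative to that GUI step, the sketch below builds the same two rules in the `IpPermissions` shape accepted by boto3's EC2 `authorize_security_group_ingress` call. The helper name `ingress_rules` and the default `0.0.0.0/0` range are assumptions; restricting the range to your own address is safer in practice.

```python
def ingress_rules(ports, cidr="0.0.0.0/0"):
    """Build one TCP ingress rule per port, in boto3's IpPermissions format."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": cidr}],
        }
        for port in ports
    ]


# Port 22 for SSH, port 8000 for the Django development server.
RULES = ingress_rules([22, 8000])

# With boto3 installed and credentials configured, the rules could be applied as
# (the group id below is a placeholder):
#   import boto3
#   ec2 = boto3.client("ec2")
#   ec2.authorize_security_group_ingress(GroupId="sg-...", IpPermissions=RULES)
```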
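On slide 25 the PostgreSQL data is moved onto the mounted EBS volume, but Django itself only talks to the database server, so its settings do not mention the volume at all. The fragment below sketches that connection in the current `DATABASES` format (2009-era Django used the older `DATABASE_ENGINE = 'postgresql_psycopg2'` style instead). The database name and role are hypothetical, and the empty password assumes the local trust authentication set up in the "avoid username/password problems" step.

```python
# settings.py (fragment): Django connects to the local PostgreSQL server; the
# server's data directory lives on the EBS volume, which Django never sees.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",  # psycopg2-backed driver
        "NAME": "djangodb",     # hypothetical database name
        "USER": "django",       # hypothetical role
        "PASSWORD": "",         # assumes local trust authentication
        "HOST": "localhost",
        "PORT": "5432",         # PostgreSQL's default port
    }
}
```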
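The revenue-per-GB figures on slide 39 follow from the presenter's note of roughly 200 bytes per transaction: one gigabyte holds 10^9 / 200 = 5,000,000 records. The arithmetic below reproduces both numbers, applying the same 200-byte figure to ad impressions (an assumption; the note only states it for milk transactions). `Fraction` keeps the cents exact.

```python
from fractions import Fraction

BYTES_PER_GB = 10**9
BYTES_PER_RECORD = 200  # presenter's note: ~200 bytes per transaction (assumed for ads too)


def revenue_per_gb(revenue_per_record):
    """Revenue represented by one gigabyte of records."""
    records_per_gb = BYTES_PER_GB // BYTES_PER_RECORD  # 5,000,000 records
    return records_per_gb * revenue_per_record


milk = revenue_per_gb(Fraction(3))               # $3 per one-gallon sale
ads = revenue_per_gb(Fraction(20, 100) / 10**6)  # 20 cents per million impressions

print(f"Milk: ${float(milk):,.0f}/GB, Ads: ${float(ads):,.2f}/GB")
```

This prints `Milk: $15,000,000/GB, Ads: $1.00/GB`, matching the slide.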
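Slide 41 lists Map/Reduce as Hadoop's parallel processing model. The single-process word count below only illustrates the programming model (map, shuffle by key, reduce); on a cluster Hadoop runs the map and reduce functions as distributed tasks and performs the shuffle itself, and Hadoop Streaming is what lets these functions be written in languages other than Java.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Mapper: emit (word, 1) for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    # Group intermediate values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reducer: combine all values emitted for one key.
    return key, sum(values)


def run_job(lines):
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))
```

`run_job(["the cat", "The dog"])` returns `{"cat": 1, "dog": 1, "the": 2}`.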
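The notes to slide 46 point out that, because HDFS knows where each block lives, maps can run close to their data, which matters when many nodes share a few faster uplinks. The function below is a simplified illustration of that preference order (node-local, then rack-local, then anywhere); it is not Hadoop's scheduler code, and `pick_node` and its arguments are hypothetical.

```python
def pick_node(block_replicas, free_nodes, rack_of):
    """Choose a node for a map task reading one HDFS block.

    block_replicas: set of nodes holding a copy of the block
    free_nodes: nodes with a free map slot, in scheduler order
    rack_of: mapping node -> rack name
    """
    # 1. Best case: the data is on the node itself, no network transfer.
    for node in free_nodes:
        if node in block_replicas:
            return node, "node-local"
    # 2. Next best: same rack as a replica, so traffic stays on the rack switch.
    replica_racks = {rack_of[n] for n in block_replicas}
    for node in free_nodes:
        if rack_of[node] in replica_racks:
            return node, "rack-local"
    # 3. Otherwise any free node, crossing the shared uplinks.
    if free_nodes:
        return free_nodes[0], "off-rack"
    return None, "no free slot"
```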
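Slide 49 mentions pluggable map-reduce scripts using TRANSFORM. Hive streams the selected columns to such a script as tab-separated lines on standard input and reads tab-separated lines back from standard output. The script below is a minimal sketch of that contract; the script name, table and columns are hypothetical.

```python
import sys


def transform(lines):
    """Turn each tab-separated (user_id, url) row into (user_id, domain)."""
    for line in lines:
        user_id, url = line.rstrip("\n").split("\t")
        # Drop an optional "scheme://" prefix, then keep everything before the first "/".
        domain = url.split("//", 1)[-1].split("/", 1)[0]
        yield f"{user_id}\t{domain}"


if __name__ == "__main__":
    # Hive pipes rows in on stdin and reads the transformed rows from stdout.
    for row in transform(sys.stdin):
        print(row)
```

Saved as `extract_domain.py` and registered with `ADD FILE extract_domain.py;`, it could be used as `SELECT TRANSFORM(user_id, url) USING 'python extract_domain.py' AS (user_id, domain) FROM page_views;`.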
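Slide 50 describes Scribe as robust to failure of the network or of any specific machine. A common way to get that property is store-and-forward: buffer messages locally and only drop them once the downstream server has accepted them. The class below is a minimal sketch of that idea under that assumption; it is not Scribe's actual API, and `ForwardingStore` and its `send` callable are hypothetical.

```python
from collections import deque


class ForwardingStore:
    """Buffer (category, message) pairs and forward them downstream.

    If sending fails, the remaining messages stay buffered and are retried on
    the next flush, so an outage delays log data instead of losing it.
    """

    def __init__(self, send):
        self.send = send      # callable(category, message); raises OSError on failure
        self.buffer = deque()

    def log(self, category, message):
        self.buffer.append((category, message))

    def flush(self):
        """Forward buffered messages in order; return how many were sent."""
        sent = 0
        while self.buffer:
            category, message = self.buffer[0]
            try:
                self.send(category, message)
            except OSError:
                break         # downstream unavailable: keep the rest for later
            self.buffer.popleft()
            sent += 1
        return sent
```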
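A few ratios can be derived directly from the Facebook usage figures on slide 51. The 8 cores per machine agrees with the "$4K 1U box = 8 cores" on slide 39; the per-job scan size is an average over all job types and uses decimal units.

```python
machines, cores = 600, 4800
jobs_per_day, tasks_per_day = 3200, 800_000
scanned_tb_per_day = 55  # compressed data scanned daily

cores_per_machine = cores // machines                            # 8
tasks_per_job = tasks_per_day // jobs_per_day                    # 250 map-reduce tasks
scanned_gb_per_job = scanned_tb_per_day * 1000 / jobs_per_day    # average GB per job

print(f"{cores_per_machine} cores/machine, {tasks_per_job} tasks/job, "
      f"{scanned_gb_per_job:.1f} GB scanned/job")
```

This prints `8 cores/machine, 250 tasks/job, 17.2 GB scanned/job`.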
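The goal on slide 52 (fast response for small jobs, guaranteed service levels for production jobs) is what Hadoop's Fair Scheduler addresses with pools and minimum shares. The function below is a simplified illustration of that idea, not the scheduler's actual algorithm: each pool first receives its guaranteed minimum (capped by demand), then the remaining slots go one at a time to the pool with the fewest slots that still wants more. Pool names and numbers are hypothetical.

```python
def allocate_slots(total_slots, pools):
    """pools: dict name -> (min_share, demand). Returns dict name -> slots."""
    # Guaranteed minimums first, never more than a pool actually needs.
    alloc = {name: min(min_share, demand) for name, (min_share, demand) in pools.items()}
    remaining = total_slots - sum(alloc.values())
    if remaining < 0:
        raise ValueError("guaranteed minimums exceed cluster capacity")
    # Share out the rest, one slot at a time, to the least-served hungry pool.
    while remaining > 0:
        hungry = [n for n, (_, demand) in pools.items() if alloc[n] < demand]
        if not hungry:
            break
        neediest = min(hungry, key=lambda n: (alloc[n], n))  # name breaks ties
        alloc[neediest] += 1
        remaining -= 1
    return alloc
```

With 100 slots, a production pool guaranteed 60 (demand 80), a small ad-hoc pool (demand 10) and a large experiments pool (demand 200), the ad-hoc jobs get everything they ask for while production keeps its 60 slots: `{"production": 60, "adhoc": 10, "experiments": 30}`.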
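Slide 53 describes the ETL pattern: put raw data such as log files into the cluster, aggregate it, then load the result into an RDBMS or data mart. The sketch below shows the three steps on a handful of hypothetical log records, with an in-memory SQLite database standing in for the data mart; on Hadoop, the extract and transform steps would run as MapReduce jobs over HDFS.

```python
import sqlite3
from collections import Counter

LOG_LINES = [  # hypothetical web-log records: date, HTTP status, URL
    "2009-04-30 200 /index.html",
    "2009-04-30 404 /missing",
    "2009-04-30 200 /about",
    "2009-05-01 200 /index.html",
]


def extract(lines):
    # Parse raw text records into typed fields.
    for line in lines:
        date, status, url = line.split()
        yield date, int(status), url


def transform(records):
    # Aggregate: successful page views per day.
    return Counter(date for date, status, _ in records if status == 200)


def load(conn, daily_counts):
    # Load the aggregate into a relational table.
    conn.execute("CREATE TABLE IF NOT EXISTS daily_views (day TEXT PRIMARY KEY, views INTEGER)")
    conn.executemany("INSERT OR REPLACE INTO daily_views VALUES (?, ?)",
                     sorted(daily_counts.items()))
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(LOG_LINES)))
rows = conn.execute("SELECT day, views FROM daily_views ORDER BY day").fetchall()
```

`rows` ends up as `[("2009-04-30", 2), ("2009-05-01", 1)]`.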

Editor's Notes

  1. 200 bytes/transaction for milk, assuming each transaction is for 1 gallon. Who needs another programming language (PL/SQL)? Gotchas later on (about networking trends). Anyone can rent a computer! (UC Berkeley)
  2. UC Berkeley EC2 example
  3. UC Berkeley EC2 example
  4. Point out that now we know how HDFS works – we can run maps close to data
  5. Point out that now we know how HDFS works – we can run maps close to data
  6. Point out that now we know how HDFS works – we can run maps close to data
  7. Nomenclature: Core switch and Top of Rack
  8. Simple map-reduce is easy – but it can get complicated very quickly.
  9. Multi-table inserts and multi-group-bys allow us to reduce the number of scans required: a poor man’s alternative to MQO (multi-query optimization).
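The last note says multi-table inserts and multi-group-bys reduce the number of scans. In HiveQL this is a single statement of the form `FROM source INSERT OVERWRITE TABLE a SELECT ... GROUP BY ... INSERT OVERWRITE TABLE b SELECT ... GROUP BY ...`. The sketch below shows the underlying idea with hypothetical `(country, gender)` rows: both aggregates are filled in during one pass over the data instead of one pass per query.

```python
from collections import Counter


def single_scan_aggregates(rows):
    """Compute two GROUP BY counts in one pass over (country, gender) rows."""
    by_country, by_gender = Counter(), Counter()
    scanned = 0
    for country, gender in rows:
        scanned += 1              # each row is read exactly once
        by_country[country] += 1  # GROUP BY country
        by_gender[gender] += 1    # GROUP BY gender
    return by_country, by_gender, scanned
```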