This document discusses ideas and technologies for building scalable software systems and processing big data. It covers: (1) how the bi-modal distribution of developer skill shapes architecture and design, and the resulting need for both loosely and tightly coupled code; (2) how internet companies like Google and Facebook innovate at large scale using open source tools and REST architectures; and (3) how a REST architecture enables scalability, extensible development, and the integration of tools and ideas from the internet into non-internet applications.
Lyft is on a mission to improve people's lives with the world's best transportation. Since 2019, Lyft has been running both batch ETL and ML Spark workloads primarily on Kubernetes with the Apache Spark on k8s operator. However, as workloads grew in frequency and resource requirements, we started hitting numerous reliability issues related to IP allocation, container images, IAM role assignment, and the Kubernetes control plane. To continue supporting growing Spark usage at Lyft, the team came up with a hybrid architecture, based on Kubernetes and YARN, optimized for both containerized and non-containerized workloads. In this talk, we will also cover a dynamic runtime controller that enables per-environment config overrides and easy switchover between resource managers.
This document discusses Cloudera Search, which integrates Apache Solr with Cloudera's distribution of Apache Hadoop (CDH) to provide interactive search capabilities. It describes the architecture of Cloudera Search, including components like Solr, SolrCloud, and Morphlines for extraction and transformation. Methods for indexing data in real time using Flume or in batch using MapReduce are presented. The document also covers querying and security features like Kerberos authentication and collection-level authorization using Sentry, and concludes by describing how to obtain Cloudera Search.
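To make the querying side concrete, here is a minimal SolrJ sketch of the kind of search a Cloudera Search client might run; the endpoint URL, collection name, and field names are illustrative assumptions, not taken from the document.

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class SearchExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical Solr endpoint and collection name, for illustration only.
        try (HttpSolrClient client = new HttpSolrClient.Builder(
                "http://search-host:8983/solr").build()) {
            SolrQuery query = new SolrQuery("body:hadoop"); // field name is assumed
            query.setRows(10);
            QueryResponse response = client.query("logs_collection", query);
            for (SolrDocument doc : response.getResults()) {
                System.out.println(doc.getFieldValue("id"));
            }
        }
    }
}
```

In a secured deployment, the same client would additionally be configured for the Kerberos authentication described above.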
Presented by Michael Noll, Product Manager, Confluent. Why are there so many stream processing frameworks that each define their own terminology? Are the components of each comparable? Why do you need to know about spouts or DStreams just to process a simple sequence of records? Depending on your application’s requirements, you may not need a full framework at all. Processing and understanding your data to create business value is the ultimate goal of a stream data platform. In this talk we will survey the stream processing landscape, the dimensions along which to evaluate stream processing technologies, and how they integrate with Apache Kafka. In particular, we will look at how Kafka Streams, the built-in stream processing engine of Apache Kafka, compares to other stream processing systems that require a separate processing infrastructure.
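As a taste of the "no separate infrastructure" point, here is a minimal Kafka Streams sketch that filters one topic into another. The topic names and broker address are hypothetical.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class FilterExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "filter-example");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Read a stream of records and keep only the non-empty values.
        KStream<String, String> input = builder.stream("input-topic");
        input.filter((key, value) -> value != null && !value.isEmpty())
             .to("output-topic");

        new KafkaStreams(builder.build(), props).start();
    }
}
```

The application is a plain Java program: it scales by running more instances, with Kafka handling partition assignment, which is exactly the contrast with frameworks that require their own processing cluster.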
This document discusses a presentation titled "Reactive Fast Data & the Data Lake with Akka, Kafka, Spark" given by Todd Fritz at DevNexus in February 2017. The presentation agenda covers reactive systems and patterns, fast data, data lakes, the intersection of these topics, and architecture considerations for building systems that can scale to millions of users and billions of messages. Key technologies discussed include Akka, Kafka, and Spark.
In this presentation I describe the architectures of two of our Flink projects, both developed for customers in the telco industry.
This document discusses Apache Tez, a framework for accelerating Hadoop query processing. Tez is designed to express query computations as dataflow graphs and execute them efficiently on YARN. It addresses limitations of MapReduce by allowing for custom dataflows and optimizations. Tez provides APIs for defining DAGs of tasks and customizing inputs/outputs/processors. This allows applications to focus on business logic while Tez handles distributed execution, fault tolerance, and resource management for Hadoop clusters.
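As a rough sketch of the DAG API described above, here are two vertices connected by a shuffle-style edge. The processor class names are hypothetical placeholders; only the DAG-building calls follow the Tez API.

```java
import org.apache.tez.dag.api.DAG;
import org.apache.tez.dag.api.Edge;
import org.apache.tez.dag.api.EdgeProperty;
import org.apache.tez.dag.api.EdgeProperty.DataMovementType;
import org.apache.tez.dag.api.EdgeProperty.DataSourceType;
import org.apache.tez.dag.api.EdgeProperty.SchedulingType;
import org.apache.tez.dag.api.InputDescriptor;
import org.apache.tez.dag.api.OutputDescriptor;
import org.apache.tez.dag.api.ProcessorDescriptor;
import org.apache.tez.dag.api.Vertex;

public class DagSketch {
    public static DAG buildDag() {
        // "TokenizerProcessor" and "SummerProcessor" are hypothetical
        // application classes holding the business logic.
        Vertex tokenizer = Vertex.create("tokenizer",
                ProcessorDescriptor.create("com.example.TokenizerProcessor"), 4);
        Vertex summer = Vertex.create("summer",
                ProcessorDescriptor.create("com.example.SummerProcessor"), 2);

        // A scatter-gather edge: each summer task gathers one partition
        // of the tokenizer output, much like the MapReduce shuffle.
        Edge shuffle = Edge.create(tokenizer, summer, EdgeProperty.create(
                DataMovementType.SCATTER_GATHER,
                DataSourceType.PERSISTED,
                SchedulingType.SEQUENTIAL,
                OutputDescriptor.create(
                        "org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput"),
                InputDescriptor.create(
                        "org.apache.tez.runtime.library.input.OrderedGroupedKVInput")));

        return DAG.create("wordcount-style-dag")
                  .addVertex(tokenizer).addVertex(summer).addEdge(shuffle);
    }
}
```

The application only declares the graph; Tez schedules the tasks on YARN and handles fault tolerance.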
Providing truly interactive and scalable BI on Hadoop has proven to be one of the biggest challenges preventing legacy EDW OLAP systems from completing their transition to Hadoop. We have all seen benchmarks that run consecutive queries and claim success, but thousands of concurrent business users sending complicated generated queries from their dashboards over billions of records, at interactive speed, is yet to be seen. In this session we will discuss how an architecture that replaces the full-scan, brute-force approach with adaptive indexing and auto-generated cubes can dramatically reduce the resources and effort per query, resulting in interactive performance for high-concurrency workloads, and explain how this is achieved with minimal data engineering effort. We will also discuss how this architecture can be seamlessly integrated with Hive to provide a complete OLAP-on-Hadoop solution. The session will include a live demo of complex business dashboards connected to Hive and accessing billions of rows at interactive speed. Speaker: Boaz Raufman, CTO and Co-Founder, JethroData
This presentation gives an overview of the steps in the workshop labs for Oracle Management Cloud APM and Log Analytics. The labs themselves and all sources can be found on GitHub at https://github.com/lucasjellema/APM-Demo-App-WorldView
One key area of Oracle OpenWorld 2016 was data in its various shapes: big data, streaming data, and traditional transactional data. The power of SQL to access and unleash all data, even data in NoSQL databases. The advent of the citizen data scientist. Real-time streaming analysis of fast and vast data, and data discovery. And the new Oracle Database 12cR2 release. Forms, APEX, SQL and PL/SQL.
This document discusses predictive maintenance of robots in the automotive industry using big data analytics. It describes Cisco's Zero Downtime solution which analyzes telemetry data from robots to detect potential failures, saving customers over $40 million by preventing unplanned downtimes. The presentation outlines Cisco's cloud platform and a case study of how robot and plant data is collected and analyzed using streaming and batch processing to predict failures and schedule maintenance. It proposes a next generation predictive platform using machine learning to more accurately detect issues before downtime occurs.
This document provides an introduction to Cloudant, which is a fully managed NoSQL database as a service (DBaaS) that provides a scalable and flexible data layer for web and mobile applications. The presentation discusses NoSQL databases and why they are useful, describes Cloudant's features such as document storage, querying, indexing and its global data presence. It also provides examples of how companies like FitnessKeeper and Fidelity Investments use Cloudant to solve data scaling and management challenges. The document concludes by outlining next steps for signing up and exploring Cloudant.
Database as a Service (DBaaS) is a cloud database hosted and managed by a cloud service provider and accessed through the public cloud or a hybrid cloud. The cloud provider takes care of provisioning, configuration, setup, maintenance, backups, and patching of the database. Customers simply export their database to the service and start consuming it through a pay-as-you-go model. In his session at the 5th Big Data Expo, Janakiram MSV will analyze the current market landscape while exploring the available options and the strengths and weaknesses of current DBaaS players. He will highlight the key factors that enterprises should consider before adopting a cloud database platform.
The promise of the cloud is substantial, and Oracle's public cloud promise goes beyond the generic one. This presentation describes the promise of the Oracle Public Cloud specifically for developers. It describes the current state of the PaaS platform, the current and upcoming services, and what they could mean to a developer. From "same platform, different location" (DBaaS, JCS) to the cloud-native stack (ICS, MCS) and services for citizen developers, the presentation touches upon virtually all services relevant to developers. It concludes with, first, the steps enterprises can start taking to move to the cloud and, second, the steps individual developers could, and perhaps should, take to conquer the clouds.
Cloudant is a fully-managed NoSQL distributed data layer service based on a JSON document store that provides high availability, scalability, simplicity and performance. It uses a flexible schema and scales massively while always being available. Cloudant is an operational data store and NoSQL document database with a simple HTTP API that is fully integrated with mobile devices, big data, cloud and delivery. It provides replication, sync, real-time analytics using MapReduce, full-text search and geospatial capabilities.
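Since the API is plain HTTP plus JSON, a document write needs nothing beyond a standard HTTP client. Here is a minimal sketch using Java's built-in client; the account, database, document ID, credentials, and document shape are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class CloudantPut {
    public static void main(String[] args) throws Exception {
        // Hypothetical account, database ("runs"), doc ID, and credentials.
        String auth = Base64.getEncoder()
                .encodeToString("apikey:apipassword".getBytes());
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://myaccount.cloudant.com/runs/run-001"))
                .header("Authorization", "Basic " + auth)
                .header("Content-Type", "application/json")
                // Documents are schemaless JSON; any shape is accepted.
                .PUT(HttpRequest.BodyPublishers.ofString(
                        "{\"type\":\"run\",\"distanceKm\":5.2}"))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```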
Hear Ryan Millay, IBM Cloudant software development manager, discuss what you need to consider when moving from the world of relational databases to a NoSQL document store. You'll learn about the key differences between relational databases and JSON document stores like Cloudant, as well as how to dodge the pitfalls of migrating from a relational database to NoSQL.
The document discusses machine learning with SQL Server 2016 and R Services. It provides an overview of machine learning, the R programming language, and the challenges of using R with SQL databases prior to SQL Server 2016. SQL Server 2016 introduces R Services, which allows running R code directly in the database for high-performance, scalable machine learning. R Services integrates R with SQL Server through in-database deployment and parallel processing capabilities, eliminating data movement and scaling issues while leveraging existing R and SQL skills.
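The in-database execution described here goes through the sp_execute_external_script stored procedure. A minimal sketch of invoking it over JDBC, where the connection string, table, and column are hypothetical:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class RServicesExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection string and table; sp_execute_external_script
        // is the real R Services entry point in SQL Server 2016.
        String url = "jdbc:sqlserver://dbhost;databaseName=Sales;user=app;password=secret";
        String sql =
            "EXEC sp_execute_external_script "
            + "@language = N'R', "
            + "@script = N'OutputDataSet <- data.frame(mean_amount = mean(InputDataSet$amount))', "
            + "@input_data_1 = N'SELECT amount FROM dbo.Orders' "
            + "WITH RESULT SETS ((mean_amount FLOAT))";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                System.out.println("Mean order amount: " + rs.getDouble(1));
            }
        }
    }
}
```

The R script runs inside the database and only the one-row result crosses the wire, which is the data-movement saving the document refers to.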
In any modern web platform you end up with a need to store different views of your data in many different datastores. I will cover how we have coped with doing this in a reliable way at State.com across a range of different languages, tools and datastores.
This PowerPoint summarizes my 2017 Summer Undergraduate Research experience with Dr. Jeff Prevost.
Introducing an internal cloud brings new paradigms, tools, and ways of managing infrastructure. When placed alongside traditional HPC, the new opportunities are significant. But getting to this new world of microservices, autoscaling, and autohealing is a journey that cannot be achieved in a single step.
By looking at structured and unstructured data together, Data Lakes enable companies to understand correlations between existing data and new external data, such as social media, in ways traditional Business Intelligence tools cannot. For this you need to find the most efficient way to store and access structured or unstructured petabyte-sized data across your entire infrastructure. In this meetup we'll answer the following questions: 1. Why would someone use a Data Lake? 2. Is it hard to build a Data Lake? 3. What are the main features that a Data Lake should provide? 4. What's the role of microservices in the big data world?
FoundationDB is a next-generation database that aims to provide high performance transactions at massive scale through a distributed design. It addresses limitations of NoSQL databases by providing a transactional, fault-tolerant foundation using tools like the Flow programming language. FoundationDB has demonstrated high performance that exceeds other NoSQL databases, and provides ease of scaling, building abstractions, and operation through its transactional design and automated partitioning. The goal is to solve challenges of state management so developers can focus on building applications.
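A minimal sketch of the transactional key-value model, using FoundationDB's current Java binding; the key layout is a hypothetical example.

```java
import com.apple.foundationdb.Database;
import com.apple.foundationdb.FDB;
import com.apple.foundationdb.tuple.Tuple;

public class FdbSketch {
    public static void main(String[] args) {
        FDB fdb = FDB.selectAPIVersion(710);
        try (Database db = fdb.open()) {
            // The whole lambda executes as one ACID transaction; on
            // conflict the library retries it automatically.
            db.run(tr -> {
                tr.set(Tuple.from("balance", "alice").pack(),
                       Tuple.from(100).pack());
                return null;
            });
            Long balance = db.read(tr ->
                Tuple.fromBytes(tr.get(Tuple.from("balance", "alice").pack()).join())
                     .getLong(0));
            System.out.println("alice = " + balance);
        }
    }
}
```

Everything inside run() commits atomically and retries on conflict, which is what makes it safe to build higher-level abstractions on top of the key-value core.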
The requirements for running HPC/cognitive workflows in containers, managed by a container platform.
This document provides an overview of software architecture fundamentals and patterns, with a focus on architectures for scalable systems. It discusses key quality attributes for architecture like performance, reliability, and scalability. Common patterns for scalable systems are described, including load balancing, map-reduce, and caching. The document also provides a detailed look at architectures used at Facebook, including the architectures for Facebook's website, chat service, and handling of big data. Key aspects of each system are summarized, including the technologies and design principles used.
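Of the scalability patterns named above, caching is the easiest to make concrete. Here is a minimal in-process LRU cache in Java, a textbook sketch of the pattern rather than code from the document:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A minimal in-process LRU cache. LinkedHashMap's access ordering
// plus an eviction hook does all the work.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true); // true = order entries by access
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the least recently used entry
    }
}
```

Facebook famously runs the same idea at datacenter scale with memcached, where eviction happens across a fleet of cache servers rather than inside one map.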
This document discusses proposed changes to a Systems Programming course (CS252) to incorporate cloud computing concepts. The course currently focuses on C/C++, operating systems, and networking. The proposal is to have students write mobile and web applications using HTML5, JavaScript frameworks, and cloud services on Bluemix. Students would work in groups on semester-long projects developing games, social apps, or other programs that run in browsers and mobile devices while calling APIs hosted on Bluemix. This aims to teach new generation web development skills and how applications can leverage cloud computing technologies.
While cloud computing offers virtually unlimited capacity, harnessing that capacity in an efficient, cost-effective fashion can be cumbersome and difficult at the workload level. At the organizational level, it can quickly become chaos. You must make choices around cloud deployment, and these choices could have a long-lasting impact on your organization. It is important to understand your options and avoid incomplete, complicated, locked-in scenarios. Data management and placement challenges mean that the ability to automate workflows and processes across multiple clouds is a requirement. In this webinar, you will:
• Learn how to leverage cloud services as part of an overall computation approach
• Understand data management in a cloud-based world
• Hear what options you have to orchestrate HPC in the cloud
• Learn how cloud orchestration works to automate and align computing with specific goals and objectives
• See an example of an orchestrated HPC workload using on-premises data
From computational research to financial back testing, and research simulations to IoT processing frameworks, decisions made now will not only impact future manageability, but also your sanity.
Live Integrated Visualization Environment: An Experiment in Generalized Structured Frameworks for Visualization and Analysis
This document provides an overview of Google Cloud Platform (GCP) services. It begins by explaining how GCP is underpinned by Google's infrastructure and innovation. It then outlines GCP's compute, networking, storage, big data, and machine learning services, including Compute Engine, Container Engine, App Engine, load balancing, Cloud DNS, Cloud Storage, Cloud Datastore, Cloud Bigtable, Cloud SQL, BigQuery, Dataflow, Pub/Sub, Dataproc, and Cloud Datalab. Machine learning services such as the Translate API, Prediction API, Cloud Vision API, and Cloud Speech API are also introduced.
This chapter discusses software development security. It covers topics like programming concepts, compilers and interpreters, procedural vs object-oriented languages, application development methods like waterfall vs agile models, databases, object-oriented design, assessing software vulnerabilities, and artificial intelligence techniques. The key aspects are securing the entire software development lifecycle from initial planning through operation and disposal, using secure coding practices, testing for vulnerabilities, and continually improving processes.
The document summarizes lessons learned from building a real-time network traffic analyzer in C/C++. Key points include:
- Libpcap was used for traffic capturing as it is cross-platform, supports PF_RING, and has a relatively easy API.
- SQLite was used for data storage due to its small footprint, fast performance, embeddability, SQL support, and B-tree indexing.
- A producer-consumer model with a blocking queue was implemented to handle packet processing in multiple threads.
- Memory pooling helped address performance issues caused by excessive malloc calls during packet aggregation.
- Custom spin locks based on atomic operations improved performance over mutexes on FreeBSD/
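The talk's implementation is C/C++, but the producer-consumer pattern it describes is easy to sketch. Here is the same shape in Java, with the capture and aggregation steps stubbed out as hypothetical placeholders.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PacketPipeline {
    public static void main(String[] args) {
        // Bounded queue: the capture thread blocks when consumers
        // fall behind, applying natural back-pressure.
        BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(10_000);

        Thread producer = new Thread(() -> {
            try {
                while (true) {
                    byte[] packet = capturePacket(); // stand-in for a libpcap callback
                    queue.put(packet);               // blocks if the queue is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    byte[] packet = queue.take();    // blocks if the queue is empty
                    aggregate(packet);               // stand-in for flow aggregation
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
    }

    private static byte[] capturePacket() { return new byte[64]; }
    private static void aggregate(byte[] packet) { /* aggregate flow statistics */ }
}
```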
This document discusses how organizations will need to adapt their data infrastructure and software models as Moore's Law ends and data volumes continue growing exponentially. It outlines how traditional clustering, databases, and application servers will no longer scale to meet these new demands. New distributed, dynamically adaptive approaches like NoSQL data stores, functional programming, and eventual consistency models are needed. Hardware is also evolving to support exabyte storage, tens of thousands of CPU cores, and networked memory, requiring new software architectures.
This webinar discusses tools for making big data easy to work with. It covers MetaScale Expertise, which provides Hadoop expertise and case studies. Kognitio Analytics is discussed as a way to accelerate Hadoop for organizations. The webinar agenda includes an introduction, presentations on MetaScale and Kognitio, and a question and answer session. Rethinking data strategies with Hadoop and using in-memory analytics are presented as ways to gain insights from large, diverse datasets.
IBM Spectrum Conductor can manage H2O Driverless AI instances at scale across multiple nodes in an enterprise data center. Key benefits include the ability to run multiple Driverless AI instances on the same host using GPUs, failover capabilities if an instance fails, and role-based access control for users. The integration improves productivity by providing a shared file system, workload management, and allowing easy start/stop of Driverless AI instances.
The document discusses strategies for transitioning from monolithic architectures to microservice architectures. It outlines some of the challenges with maintaining large monolithic applications and reasons for modernizing, such as handling more data and needing faster changes. It then covers microservice design principles and best practices, including service decomposition, distributed systems strategies, and reactive design. Finally it introduces Lagom as a framework for building reactive microservices on the JVM and outlines its key components and development environment.
This is a small introduction to microservices. You can find the differences between microservices and monolithic applications, the pros and cons of microservices, and the business and technical challenges that you may face while implementing microservices.
IBM Connect 2017 Session on RESTful architectures and their uses in IBM Domino environments (Notes and XPages applications). February 22, 2017.
This document discusses Indix's evolution from its initial Data Platform 1.0 to a new Data Platform 2.0 based on the Lambda Architecture. The Lambda Architecture uses three layers - batch, serving, and speed layers - to process streaming and batch data. This provides robustness, fault tolerance, and the ability to query both real-time and batch processed views. The new system uses technologies like Spark, HBase, and Solr to implement the Lambda Architecture principles.
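The defining move of the Lambda Architecture is answering a query by merging the batch view with the speed view. A toy sketch of that merge step (all names hypothetical; in a system like Indix's the views would live in stores such as HBase and Solr rather than in-memory maps):

```java
import java.util.HashMap;
import java.util.Map;

public class LambdaQuery {
    // Precomputed from the immutable master dataset (e.g. by a batch job).
    private final Map<String, Long> batchView = new HashMap<>();
    // Incremental counts for events that arrived after the last batch run.
    private final Map<String, Long> speedView = new HashMap<>();

    // A query merges both views, so results are complete (batch layer)
    // and fresh (speed layer) at the same time.
    public long countFor(String key) {
        return batchView.getOrDefault(key, 0L)
             + speedView.getOrDefault(key, 0L);
    }
}
```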
Cloud computing is no longer a passing fad. It is for real and is perhaps the most talked-about subject. Various players in the cloud ecosystem have provided definitions closely aligned to their own sweet spot, be it infrastructure, platforms, or applications. This presentation will expose participants to a variety of cloud computing techniques, architectures, and technology options, and in general will familiarize them with cloud fundamentals in a holistic manner spanning dimensions such as cost, operations, and technology.
Everything that I found interesting about machines behaving intelligently during June 2024
Your comprehensive guide to RPA in healthcare for 2024. Explore the benefits, use cases, and emerging trends of robotic process automation. Understand the challenges and prepare for the future of healthcare automation.
Java Servlet programs
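A minimal example of the classic kind: a servlet that answers GET requests, using the pre-Jakarta javax.servlet API. The URL mapping and parameter name are arbitrary choices for illustration.

```java
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// The container maps GET /hello to doGet() via the annotation.
@WebServlet("/hello")
public class HelloServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws IOException {
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        String name = request.getParameter("name"); // optional query parameter
        out.println("<h1>Hello, " + (name != null ? name : "world") + "</h1>");
    }
}
```

Deployed in a servlet container such as Tomcat, a request to /hello?name=Ada returns an HTML greeting.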
These fighter aircraft have uses outside of traditional combat situations. They are essential in defending India's territorial integrity, averting dangers, and delivering aid to those in need during natural calamities. Additionally, the IAF improves its interoperability and fortifies international military alliances by working together and conducting joint exercises with other air forces.
Slides from the tutorial entitled "Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Emerging Trends", held at UMAP'24, the 32nd ACM Conference on User Modeling, Adaptation and Personalization (July 1, 2024, Cagliari, Italy).