CHEP 2012
- 2. A non-exhaustive middleware timeline...
1999: Purdue University Network Computing Hubs (PUNCH)
Kapadia, Fortes, Lundstrom, Figueiredo et al.
• Powered nanoHUB
• Virtual file system
• Shadow accounts
• Access to batch queues
• Interactive applications via VNC
- 3. A middleware timeline...
1999: PUNCH (powered nanoHUB; virtual file system; shadow accounts; access to batch queues; interactive applications via VNC)
2001: The Grid
- 5. “Anatomy of the Grid”
• “Why do we also consider application programming interfaces (APIs) and
software development kits (SDKs)? There is, of course, more to VOs than
interoperability, protocols, and services. Developers must be able to
develop sophisticated applications in complex and dynamic execution
environments. Users must be able to operate these applications. Application
robustness, correctness, development costs, and maintenance costs are all
important concerns. Standard abstractions, APIs, and SDKs can
accelerate code development, enable code sharing, and
enhance application portability. APIs and SDKs are an adjunct to,
not an alternative to, protocols.”
• “In summary, our approach to Grid architecture emphasizes
the identification and definition of protocols and services,
first, and APIs and SDKs, second.”
• “The Anatomy of the Grid”, Foster, Kesselman, Tuecke, published in 2001
- 6. A middleware timeline...
1999: PUNCH (powered nanoHUB; virtual file system; shadow accounts; access to batch queues; interactive applications via VNC)
2001: The Grid
2003: InVIGO (virtual file system; virtual machines; overlay networks; access to batch queues; interactive applications via VNC)
- 7. InVIGO
• Fortes and Figueiredo, circa 2004/2005
• Virtual machines, virtual file system, virtual networks
• In 2012, only ViNe and IPOP remain... maybe because InVIGO was created as a single system rather than a composition of services with multiple providers.
- 8. Virtualization
• Create an isolated and portable execution
environment which:
• Guarantees execution
• Isolates users
• Hides WAN complexities
• Delivers the data where it is needed
• Deploys applications on demand
- 9. A middleware timeline...
1999: PUNCH
2001: The Grid
2003: InVIGO
2004: Dynamic Virtual Environments (Kate Keahey, mother of Nimbus)
- 10. A middleware timeline...
1999: PUNCH
2001: The Grid
2003: InVIGO
2004: DVE
2008-2012: Eucalyptus (UCSB), Nimbus (UC), OpenNebula (Madrid), OpenStack, CloudStack
- 11. A middleware timeline...
1999: PUNCH
2001: The Grid
2003: InVIGO
2004: DVE
2008-2012: Eucalyptus (UCSB), Nimbus (UC), OpenNebula (Madrid), OpenStack, CloudStack
Virtual Organization Clusters: clusters of virtual machines provisioned on the grid to create a personal Condor cluster.
- 13. Google trends
[Google Trends chart: cloud computing, big data, virtualization, VOCs]
• Cloud computing is trending down while “Big Data” is booming. Virtualization remains “constant”.
- 14. Careful, headwinds ahead
• Cloud computing is heading down into the “Trough of Disillusionment” (Gartner hype cycle)
• “Big Data” is on the “Technology Trigger”
- 15. Clouds are in production
• Amazon Web Services (AWS) is reported to reach $1B in business in 2012.
• http://www.geekwire.com/2011/amazon-web-services-billiondollar-business/
• Zynga reportedly spent $100M per year on AWS. It has moved to its own cloud (zCloud): it used to deploy on EC2 and do reverse cloud bursting, and now it “owns the base and rents the peak”. Zynga can add as many as 1,000 new servers in a 24-hour period to accommodate a surge of users, and its servers can deliver a petabyte of data to users each day.
• http://www.wired.com/cloudline/2012/03/zynga-zcloud/
- 16. Proven scalability
• CycleCloud provisioned 50,000 cores on EC2 (April 2012).
• LXCLOUD@CERN demonstrated management of 16,000 virtual machines using OpenNebula (summer 2010).
• CloudStack (now an Apache incubator project) is planning to scale to 50,000 hypervisors by the end of 2012.
• Hadoop scales to 30 PB (at Facebook, ~March 2011).
• In Q1 2012, Amazon S3 held 905 billion objects, routinely accessed at 650,000 requests per second.
- 17. Yet, we do not seem to embrace it
• 50 hosts in LXCLOUD running VMs for batch processing
• FermiCloud, for internal use only, ~200 VMs
• Pales in comparison to the scale seen in industry
Thanks to Ulrich Schwickerath and Steve Timm for the figures
Disclaimer: the opinion is not theirs :)
- 18. A clue from industry
• KPMG survey, “Clarity in the Cloud: Business Adoption”
• “I don’t believe everyone yet fully realizes how much this stimulates innovation, how many opportunities will be presented, how many new challenges will need to be addressed, and how much change is coming”, Pat Howard, VP Global Services, IBM Partner
- 19. SaaS
• GO (Globus Online): 1.8 PB transferred in the last 6 months
• GO on ESnet: 643 TB in the last 6 months; 25 sites exceeded 3 Gbps.
• GO for XSEDE: 607 TB transferred; 4 sites exceeded 3 Gbps.
• Leveraged cloud APIs to provide a new service (see the sketch below)
Thanks to Raj Kettimuthu and Lee Liming for the data
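To make the “cloud APIs” point concrete, here is a minimal sketch of submitting a managed transfer programmatically. It is written against today's Globus Transfer REST API rather than the 2012 Globus Online interface; the base URL version, token handling, endpoint IDs, and paths are assumptions or placeholders, not values from this talk.

import requests

BASE = "https://transfer.api.globus.org/v0.10"         # assumed API base
HEADERS = {"Authorization": "Bearer <access-token>"}   # token from Globus Auth

# Every submission needs a one-time submission id (an idempotency token).
sub_id = requests.get(BASE + "/submission_id", headers=HEADERS).json()["value"]

task = {
    "DATA_TYPE": "transfer",
    "submission_id": sub_id,
    "source_endpoint": "<source-endpoint-id>",            # placeholder
    "destination_endpoint": "<destination-endpoint-id>",  # placeholder
    "DATA": [{
        "DATA_TYPE": "transfer_item",
        "source_path": "/data/run42/",        # illustrative paths
        "destination_path": "/archive/run42/",
        "recursive": True,
    }],
}

resp = requests.post(BASE + "/transfer", json=task, headers=HEADERS)
print(resp.json()["task_id"])   # poll /task/<task_id> to watch the transfer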
- 20. PaaS
• Azure
• Amazon Elastic Beanstalk
• Heroku, a PaaS for Facebook applications
• OpenShift, now open source (May 2012)
PaaS has not really seen much success in the scientific community, but it could be used to create new types of scalable applications.
- 21. A PaaS for personal Condor and GO
• Globus Provision by Borja Sotomayor
• http://www.globus.org/provision/
• This solves my problem from 1999, when I wanted a batch farm at hand... utility computing

[general]
deploy: ec2
domains: simple

[domain-simple]
users: gp-user
gridftp: yes
nis: yes
filesystem: nfs
condor: yes
condor-nodes: 4
go-endpoint: go-user#gp-test
go-auth: go

[ec2]
keypair: gp-key
keyfile: ~/.ec2/gp-key.pem
username: ubuntu
ami: latest-32bit
instance-type: t1.micro

[globusonline]
ssh-key: ~/.ssh/id_rsa
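From a file like this, Globus Provision instantiates the whole setup (NFS/NIS, GridFTP, a four-node Condor pool, and a Globus Online endpoint) on EC2 from the command line; if I recall its CLI correctly, something like gp-instance-create followed by gp-instance-start does the provisioning, but check the documentation at the URL above rather than trusting my memory of the command names.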
- 22. IaaS on OSG at Clemson
• Transform the Grid into a Cloud
• All sites deploy a hypervisor
• Provision VMs depending on jobs queued in batch (see the sketch after this list)
• Keep the normal grid workflow
• Move to Cloud by offering an “EC2” interface
VTDC08, PDP09, CCGRID09, ICAC10, JGC10, FGCS10
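A minimal sketch of the provisioning loop described above: watch the Condor queue and boot VMs through an EC2-style API. The AMI id, instance type, polling interval, and the missing cap/reaping logic are placeholders and simplifications; boto is used here because the same client code can target Eucalyptus or OpenNebula econe endpoints, which is exactly why offering an “EC2” interface matters.

import subprocess
import time
import boto

def idle_jobs():
    # Count idle jobs (JobStatus == 1) in the Condor queue.
    out = subprocess.check_output(
        ["condor_q", "-constraint", "JobStatus == 1",
         "-format", "%d\n", "ClusterId"])
    return len(out.splitlines())

conn = boto.connect_ec2()   # credentials come from the environment/boto config
booted = 0
while True:
    need = idle_jobs() - booted
    if need > 0:
        # One VM per idle job; a real system would cap this and reap idle VMs.
        conn.run_instances("ami-00000000", min_count=need, max_count=need,
                           instance_type="m1.small")
        booted += need
    time.sleep(60)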
- 23. First Gen IaaS@CU
• Campus firewall
• NATed cluster
• Fear of VMs: no bridged networking, not even NAT, only userland networking on the hypervisors
• This meant:
• We developed a pull-based task dispatcher (Kestrel, XMPP based, used by STAR)
• We created images in a DMZ (see Tony Cass’s talk on VM exchange, building trust in VM provenance, and the HEPiX working group on virtualization)
• We started VMs as regular batch jobs, with no interactive access
- 24. STAR with Kestrel
• http://wiki.github.com/legastero/Kestrel/
• Built to deal with Clemson’s “adverse” networking environment
• Started as a student project based on the idea that XMPP (Jabber) was a scalable, production-proven messaging protocol
• Run an IM client in each VM and send IM messages to manage jobs (a sketch of the pattern follows)
• All VM instances are buddies on a Jabber server
Lots of French nonsense, and then: “...To simulate the equivalent sample of 12.2 billion Monte Carlo events, with ~10 million accepted by event triggering after full event reconstruction, we would have taken 3 years at BNL on 50 machines. This Monte Carlo event generation would essentially not have been done. With the resources from cloud, we took 3-4 weeks.” –Jerome Lauret, BNL, STAR
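A minimal sketch of the pattern just described, not Kestrel's actual code: a worker inside a VM connects as an XMPP “buddy” and runs whatever command arrives as a chat message. It uses SleekXMPP, the library Kestrel was built on; the JID, password, and the bare shell-out are placeholders (a real dispatcher would authenticate the manager and sandbox the job).

import subprocess
import sleekxmpp

class Worker(sleekxmpp.ClientXMPP):
    def __init__(self, jid, password):
        super(Worker, self).__init__(jid, password)
        self.add_event_handler("session_start", self.start)
        self.add_event_handler("message", self.run_job)

    def start(self, event):
        self.send_presence()   # appear online: the manager sees a free worker
        self.get_roster()

    def run_job(self, msg):
        if msg["type"] in ("chat", "normal"):
            # Treat the message body as the job's command line (sketch only).
            out = subprocess.check_output(msg["body"], shell=True)
            msg.reply(out.decode()).send()   # report the result back over IM

if __name__ == "__main__":
    xmpp = Worker("worker1@example.org", "secret")   # placeholder credentials
    if xmpp.connect():
        xmpp.process(block=True)

Because every worker is just a roster entry on the Jabber server, presence doubles as resource discovery: an online buddy is an available execution slot, which is what made this model fit Clemson's NATed, firewalled environment.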
- 25. Second Gen IaaS@CU
• A sub-interface was created on all nodes (only one NIC per node)
• A VLAN was provisioned to isolate VM traffic
• Bridged networking enabled; VMs get addresses via DHCP
• Demonstrated scale of thousands of VMs
• OpenNebula provisioning + Cumulus S3 storage to upload images (a template sketch follows)
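For illustration, here is what an OpenNebula VM template for this setup might look like; the image and network names are assumptions, not values from the actual deployment. The NIC attaches the VM to the bridged virtual network on the VM VLAN, and the image would have been uploaded through the Cumulus S3 interface.

NAME     = "batch-worker"
CPU      = 1
MEMORY   = 1024                         # MB
DISK     = [ IMAGE = "cernvm-batch" ]   # hypothetical image name
NIC      = [ NETWORK = "vm-vlan" ]      # hypothetical bridged network on the VM VLAN
GRAPHICS = [ TYPE = "vnc" ]

Instantiating it (e.g., onevm create batch-worker.tpl) attaches the VM to the bridge, and it picks up its address from the VLAN's DHCP server.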
- 26. Onecloud
• An OpenNebula-based IaaS at Clemson developed through the NSF ExTENCI project (OSG+XSEDE)
• Used by STAR (see ACAT 2010, CHEP 2010)
• CERNVM can be used as a client or a batch image
• https://sites.google.com/site/cuonecloud/
- 27. Cloud back to networking: Openflow
• Onecloud integrates OpenFlow to provide dynamic network services that solve the NAT and firewall issues. We developed an implementation of Amazon Security Groups and Elastic IPs using OpenFlow (a controller sketch follows this list). This avoids complex and failure-prone network overlays: once the rules are set, the switch operates at line rate.
• Software Defined Networking aims to put the control plane of the network in the hands of developers. It opens the door to network-aware applications and network topologies that change dynamically with load. It is a low-level API for the network: we can now program the network.
• Google announced at the Open Networking Summit (http://opennetsummit.org/) that their network runs on OpenFlow:
• http://www.eetimes.com/electronics-news/4371179/Google-describes-its-OpenFlow-netwo
• There is work to support MPLS with OpenFlow, so that any OpenFlow switch could be used as an MPLS switch. This means that OSCARS could be used to provision circuits on an OpenFlow network.
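As a flavor of what “programming the network” looks like, here is a minimal POX controller sketch of an Elastic-IP-style rewrite: IPv4 traffic arriving for a public address has its destination rewritten to a VM's private address and is forwarded out the VM's port. POX is one common OpenFlow controller, not necessarily the one Onecloud used, and the addresses and port number are placeholders; a complete implementation would also install the reverse rewrite for return traffic and drop rules for security groups.

from pox.core import core
import pox.openflow.libopenflow_01 as of
from pox.lib.addresses import IPAddr

PUBLIC_IP  = IPAddr("198.51.100.10")   # the "elastic" address (placeholder)
PRIVATE_IP = IPAddr("10.0.0.42")       # the VM's real address (placeholder)
VM_PORT    = 3                         # switch port facing the VM (placeholder)

def _handle_ConnectionUp(event):
    # When a switch connects, push a flow: rewrite the destination of any
    # IPv4 packet addressed to the public IP and send it toward the VM.
    msg = of.ofp_flow_mod()
    msg.match.dl_type = 0x0800          # IPv4
    msg.match.nw_dst = PUBLIC_IP
    msg.actions.append(of.ofp_action_nw_addr.set_dst(PRIVATE_IP))
    msg.actions.append(of.ofp_action_output(port=VM_PORT))
    event.connection.send(msg)

def launch():
    core.openflow.addListenerByName("ConnectionUp", _handle_ConnectionUp)

Once this flow is installed, packets are rewritten in the switch's data plane at line rate, which is the advantage over overlay networks claimed in the first bullet.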
- 28. 3rd Gen IaaS@CU
• Move OneCloud into the “ESnet” Science DMZ
• Deploy a PaaS with OpenShift
• 100 Gbps link
• 10 Gbps to Amazon via the I2 commercial peering service
• Fully configurable via OpenFlow and maybe OSCARS...
• Provide on-demand resources and on-demand data paths
See: ARCHSTONE + VNOD, a DOE ASCR funded project, Dimitrios Katramatos (BNL)
- 29. Conclusions
• Virtualization has matured to the point of fruitful competition among Cloud IaaS solutions, both in academia and in industry.
• Clouds (and their APIs) give us great agility to create new services for the community, reduce the “time to use” of those services, and sustain scale.
• Clouds are probably fulfilling the true vision of Grids.
• Advanced VM provisioning and network services mean that on-demand, elastic data centers are possible today.
• This work was made possible through support from NSF OCI-0753335, OCI-1007115 and BMW.