Middleware Evolution:
 from Grids to Clouds,
a non-HEP Perspective.
      Dr. Sebastien Goasguen
       Clemson University
A non-exhaustive Middleware timeline...

      1999


Purdue University Network Computing Hubs (PUNCH)
    Kapadia, Fortes, Lundstrom, Figueiredo et al.
                   Powered nanoHUB
                    Virtual file system
                    Shadow Accounts
                 Access to batch queues
             Interactive applications via VNC
A Middleware timeline...

         The Grid
      1999 2001


Purdue University Network Computing Hubs (PUNCH)
                   Powered nanoHUB
                    Virtual file system
                    Shadow Accounts
                 Access to batch queues
             Interactive applications via VNC
The Grid
“Anatomy of the Grid”
•   “Why do we also consider application programming interfaces (APIs) and
    software development kits (SDKs)? There is, of course, more to VOs than
    interoperability, protocols, and services. Developers must be able to
    develop sophisticated applications in complex and dynamic execution
    environments. Users must be able to operate these applications. Application
    robustness, correctness, development costs, and maintenance costs are all
    important concerns. Standard abstractions, APIs, and SDKs can
    accelerate code development, enable code sharing, and
    enhance application portability. APIs and SDKs are an adjunct to,
    not an alternative to, protocols.”

•   “In summary, our approach to Grid architecture emphasizes
    the identification and definition of protocols and services,
    first, and APIs and SDKs, second.”

•   “The Anatomy of the Grid”, Foster, Kesselman, and Tuecke, published in 2001
A Middleware timeline...

              The Grid
           1999 2001

                               2003
PUNCH: powered nanoHUB; virtual file system; shadow accounts; access to batch queues; interactive applications via VNC
InVIGO: virtual file system; virtual machines; overlay networks; access to batch queues; interactive applications via VNC
InVIGO

•   Fortes and Figueiredo, circa 2004/2005

•   Virtual machines, virtual file system, virtual networks

•   In 2012: only ViNE and IPOP remain... maybe because InVIGO
    was created as a single system rather than a composition of
    services with multiple providers.
Virtualization
• Create an isolated and portable execution
  environment which:
 • Guarantees execution
 • Isolates users
 • Hides WAN complexities
 • Delivers the data where it is needed
 • Deploys applications on-demand
A Middleware timeline...

       The Grid
    1999 2001 2003 2004


PUNCH    InVIGO   Dynamic Virtual Environments
                    Kate Keahey, mother of Nimbus
A Middleware timeline...
                           Eucalyptus, UCSB
                             Nimbus, UC
                          OpenNebula, Madrid
                              OpenStack
       The Grid               CloudStack

    1999 2001 2003 2004
                          2008-2012
PUNCH    InVIGO   DVE
A Middleware timeline...
                              Eucalyptus, UCSB
                                Nimbus, UC
                             OpenNebula, Madrid
                                 OpenStack
       The Grid                  CloudStack

    1999 2001 2003 2004
                              2008-2012
PUNCH    InVIGO   DVE


                        Virtual Organization Clusters
                           Clusters of Virtual Machines provisioned on the grid
                           to create a personal Condor cluster.
Google trends

                    (Google Trends chart; “VOCs” marked on the chart)

•   Cloud computing trending down, while “Big Data” is booming.
    Virtualization remains “constant”.
Careful, Head Winds Ahead

•   Cloud Computing going down toward the “Trough of
    Disillusionment” (on the Gartner Hype Cycle)

•   “Big Data” on the Technology Trigger
Clouds are in Production


•   Amazon Web Services (AWS), reported to reach $1B business in 2012.

    •   http://www.geekwire.com/2011/amazon-web-services-billiondollar-business/



•   Zynga reportedly spent $100M per year on AWS. Moved to their own cloud (zCloud).
    Used to deploy on EC2 and do reverse cloud bursting. Now “owning the base and renting
    the peak”. Zynga can add as many as 1,000 new servers to accommodate a surge of users
    in a 24-hour period. The company’s servers can deliver a petabyte of data to users each
    day.

    •   http://www.wired.com/cloudline/2012/03/zynga-zcloud/
Proven scalability
•   CycleCloud provisioned a 50,000-core cluster on EC2 (April 2012).

•   LXCLOUD@CERN demonstrated management of 16,000
    virtual machines using OpenNebula (Summer 2010).

•   CloudStack (now an Apache Incubator project) is planning
    scalability to 50,000 hypervisors by the end of 2012.


•   Hadoop scales to 30 PB (at Facebook, ~March 2011)

•   In Q1 2012, Amazon S3 held 905 billion objects, routinely
    accessed at 650,000 requests per second.
Yet,
We do not seem to embrace it


• 50 hosts in LXCLOUD running VMs for batch processing
• FermiCloud, for internal use only, ~200 VMs
• Pales in comparison to the scale seen in industry

                  Thanks to Ulrich Schwickerath and Steve Timm for figures
                             Disclaimer: Opinion is not theirs :)
A clue from Industry
• KPMG Survey, “Clarity in the Cloud: Business
  Adoption”

                                “I don’t believe everyone yet fully
                               realizes how much this stimulates
                            innovation, how many opportunities will
                            be presented, how many new challenges
                              will need to be addressed, and how
                             much change is coming”, Pat Howard,
                                 VP Global Services, IBM Partner
SaaS
•   1.8 PB transferred in the last 6 months

•   GO (Globus Online) on ESnet: 643 TB in the last 6 months;
    25 sites exceeded 3 Gbps.

•   GO for XSEDE: 607 TB transferred; 4 sites exceeded 3 Gbps.

•   Leveraged Cloud APIs to provide a new service (see the
    illustrative sketch after this slide)
                                 Thanks to Raj Kettimuthu and Lee Liming for the data
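
A minimal illustration of driving this kind of API-based transfer service from a script, using the present-day globus-sdk for Python (which postdates this talk); the token and endpoint IDs below are placeholders, not real credentials or the service as it existed in 2012:

 # Illustrative only: submit a Globus file transfer entirely through an API.
 # TRANSFER_TOKEN and the endpoint UUIDs are placeholders.
 import globus_sdk

 TRANSFER_TOKEN = "placeholder-oauth2-token"
 SRC_ENDPOINT = "source-endpoint-uuid"
 DST_ENDPOINT = "destination-endpoint-uuid"

 tc = globus_sdk.TransferClient(
     authorizer=globus_sdk.AccessTokenAuthorizer(TRANSFER_TOKEN))
 tdata = globus_sdk.TransferData(tc, SRC_ENDPOINT, DST_ENDPOINT,
                                 label="demo transfer")
 tdata.add_item("/data/run01/", "/scratch/run01/", recursive=True)
 task = tc.submit_transfer(tdata)
 print("submitted transfer task", task["task_id"])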
PaaS
•   Azure

•   Amazon Elastic Beanstalk

•   Heroku, a PaaS popular for Facebook applications

•   OpenShift, now open source (May 2012)


    PaaS has not really seen much success in the
    scientific community, but could be used to
    create new types of scalable applications
A PaaS for personal Condor and GO

• Globus Provision by Borja Sotomayor
• http://www.globus.org/provision/

 [general]
 deploy: ec2
 domains: simple

 [domain-simple]
 users: gp-user
 gridftp: yes
 nis: yes
 filesystem: nfs
 condor: yes
 condor-nodes: 4
 go-endpoint: go-user#gp-test
 go-auth: go

 [ec2]
 keypair: gp-key
 keyfile: ~/.ec2/gp-key.pem
 username: ubuntu
 ami: latest-32bit
 instance-type: t1.micro

 [globusonline]
 ssh-key: ~/.ssh/id_rsa

This solves my problem from 1999 when I wanted a batch farm at hand... utility computing.
IaaS on OSG at Clemson

•   Transform the Grid into a Cloud

    •   All sites deploy a hypervisor

    •   Provision VMs depending on jobs queued in batch
        (see the sketch after this slide)

    •   Keep the normal grid workflow

•   Move to Cloud by offering an “EC2” interface

    (VTDC08, PDP09, CCGRID09, ICAC10, JGC10, FGCS10)
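
A minimal sketch of the “provision VMs for queued jobs” idea, assuming a boto-style EC2 client and condor_q on the submit host; the image ID, instance type, and worker cap are illustrative placeholders, not the actual OSG/Clemson configuration:

 # Sketch: boot worker VMs when idle jobs pile up in the batch queue.
 # Assumes boto 2.x with EC2 credentials in the environment; AMI_ID,
 # INSTANCE_TYPE and MAX_WORKERS are placeholders.
 import subprocess
 import boto

 AMI_ID = "ami-00000000"        # placeholder worker image
 INSTANCE_TYPE = "m1.small"
 MAX_WORKERS = 10

 def idle_jobs():
     # Count idle jobs (JobStatus == 1) reported by condor_q.
     out = subprocess.check_output(["condor_q", "-format", "%d\n", "JobStatus"])
     return sum(1 for line in out.splitlines() if line.strip() == b"1")

 def running_workers(conn):
     # Count instances across all reservations (sketch: ignores their state).
     return sum(len(r.instances) for r in conn.get_all_instances())

 if __name__ == "__main__":
     conn = boto.connect_ec2()   # works against EC2 or an EC2-compatible cloud
     needed = min(idle_jobs(), MAX_WORKERS) - running_workers(conn)
     if needed > 0:
         conn.run_instances(AMI_ID, min_count=needed, max_count=needed,
                            instance_type=INSTANCE_TYPE)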
First Gen IaaS@CU
•   Campus firewall

•   NATed cluster

•   Fear of VMs: no bridged network, no NAT even,
    only user-mode networking on the hypervisors

•   Meant:

    •   Developed a pull-based task dispatcher (Kestrel,
        XMPP-based, used by STAR)

    •   Created images in a DMZ (see Tony Cass’s talk
        for VM exchange, building trust in VM provenance,
        and the HEPiX working group on virtualization)

    •   Started VMs as regular batch jobs;
        no interactive access
STAR with Kestrel
•   http://wiki.github.com/legastero/Kestrel/

•   Built to deal with Clemson’s “adverse”
    networking environment

•   Started as a student project based on
    the idea that XMPP (Jabber) was a
    scalable, production-proven messaging
    protocol.

•   Run an IM client in each VM and send IM
    messages to manage jobs (see the sketch
    after this slide)

•   All VM instances are buddies on a
    Jabber server

Lots of French nonsense and then: “...To simulate the equivalent
sample of 12.2 Billion Monte-Carlo events with ~10 Million
accepted by event triggering after full event reconstruction, we
would have taken 3 years at BNL on 50 machines. This Monte-Carlo
event generation would essentially not have been done.
With the resources from cloud, we took 3-4 weeks.”
 –Jerome Lauret, BNL, STAR
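
A minimal sketch of the IM-worker idea, assuming SleekXMPP (the library Kestrel is built on); the JID, password, and the “run the message body as a command” protocol are illustrative only and are not Kestrel's actual job protocol:

 # Sketch: a worker VM logs into a Jabber server and runs commands it
 # receives as chat messages. JID, password and protocol are illustrative.
 import subprocess
 import sleekxmpp

 class WorkerBot(sleekxmpp.ClientXMPP):
     def __init__(self, jid, password):
         super(WorkerBot, self).__init__(jid, password)
         self.add_event_handler("session_start", self.start)
         self.add_event_handler("message", self.on_message)

     def start(self, event):
         self.send_presence()      # appear as a "buddy" to the manager
         self.get_roster()

     def on_message(self, msg):
         if msg["type"] in ("chat", "normal"):
             # Toy protocol: treat the message body as a shell command.
             status = subprocess.call(msg["body"], shell=True)
             msg.reply("exit status: %d" % status).send()

 if __name__ == "__main__":
     bot = WorkerBot("worker-vm-001@jabber.example.org", "secret")
     if bot.connect():
         bot.process(block=True)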
Second Gen IaaS@CU
• Sub-interface created on all nodes (only
  one NIC per node)
• VLAN provisioned to isolate VM traffic
• Bridge networking enabled; VMs get addresses
  via DHCP
• Demonstrated scale of thousands of VMs.
• OpenNebula provisioning + Cumulus S3
  storage to upload images.
Onecloud
•   OpenNebula-based IaaS at
    Clemson developed through
    the NSF EXTENCI project
    (OSG+XSEDE)

•   Used by STAR (see ACAT
    2010, CHEP 2010)

•   CERNVM can be used as a
    client or a batch image.

•   https://sites.google.com/site/cuonecloud/
Cloud back to Networking: OpenFlow

•   OneCloud integrates OpenFlow to provide dynamic network services that solve the NAT and
    firewall issues. Developed an implementation of Amazon Security Groups and Elastic IP
    using OpenFlow (see the sketch after this slide). This avoids complex and failure-prone
    network overlays; once rules are set, the switch operates at line rate.

•   Software Defined Networking aims at putting the control plane of the network in the
    hands of developers. It opens the door to network-aware applications and network
    topologies that change dynamically with load. A low-level API for the network: we can now
    program the network.

•   Google announced at the Open Networking Summit (http://opennetsummit.org/) that their
    network runs on OpenFlow:

    •   http://www.eetimes.com/electronics-news/4371179/Google-describes-its-OpenFlow-netwo

•   There is work to support MPLS with OpenFlow, so that any OpenFlow switch could be
    used as an MPLS switch. This means that OSCARS could be used to provision circuits on
    an OpenFlow network.
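
A minimal sketch of a security-group-style rule on OpenFlow, written against the POX controller purely as an assumed illustration (OneCloud's actual implementation is not shown here); the VM address, port, and priorities are placeholders:

 # Sketch for the POX OpenFlow controller: allow SSH to one VM address
 # and drop everything else sent to it. VM_ADDRESS and priorities are
 # placeholders; this is not OneCloud's actual implementation.
 from pox.core import core
 import pox.openflow.libopenflow_01 as of

 VM_ADDRESS = "10.1.0.5"    # placeholder "elastic IP" of a tenant VM

 def _handle_ConnectionUp(event):
     # Security-group-like allow rule: inbound TCP/22 to the VM.
     allow = of.ofp_flow_mod()
     allow.priority = 200
     allow.match = of.ofp_match(dl_type=0x0800, nw_proto=6,
                                nw_dst=VM_ADDRESS, tp_dst=22)
     allow.actions.append(of.ofp_action_output(port=of.OFPP_NORMAL))
     event.connection.send(allow)

     # Default deny: a flow entry with no actions drops traffic to the VM.
     deny = of.ofp_flow_mod()
     deny.priority = 100
     deny.match = of.ofp_match(dl_type=0x0800, nw_dst=VM_ADDRESS)
     event.connection.send(deny)

 def launch():
     # Install the rules on every switch that connects to the controller.
     core.openflow.addListenerByName("ConnectionUp", _handle_ConnectionUp)

Once such rules are installed, matching packets are handled in the switch at line rate, which is the point made above about avoiding overlay networks.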
3rd Gen IaaS@CU

•   Move OneCloud into the
    “ESnet” Science DMZ

•   Deploy PaaS with OpenShift

•   100 Gbps link

•   10 Gbps to Amazon via the
    Internet2 Commercial Peering Service

•   Fully configurable via
    OpenFlow and maybe
    OSCARS ....

•   Provide on-demand
    resources and on-demand
    data paths.

    See: ARCHSTONE + VNOD, a
    DOE ASCR funded project,
    Dimitrios Katramatos (BNL)
Conclusions
•   Virtualization has matured to the point of seeing fruitful
    competition in Cloud IaaS solutions (both academically and in
    industry)

•   Clouds (and their APIs) give us great agility to create new services to
    serve the community, reduce the “time to use” of these services,
    and sustain scale.

•   Clouds are probably fulfilling the true vision of Grids

•   Advanced VM provisioning and network services mean that on-
    demand, elastic data centers are possible today.

•   This work was possible through support from NSF
    OCI-0753335, OCI-1007115 and BMW
Questions ?


• sebgoa@clemson.edu
• http://sites.google.com/site/runseb
