Developing Cyberinfrastructure to Support Computational Chemistry Workflows
Marlon Pierce (IU), Suresh Marru (IU), Sudhakar Pamidighantam (NCSA), Sashikiran Challa (IU), Ye Fan (NCSA/IU), Patanachai Tangchaisin (IU)
Part 1: Reusable Middleware for OREChem
Services and workflows for OREChem
Microsoft Research’s ORECHEM Project
“A collaboration between chemistry scholars and information scientists to develop and deploy the infrastructure, services, and applications to enable new models for research and dissemination of scholarly materials in the chemistry community.”
http://research.microsoft.com/en-us/projects/orechem/
[Figure: A not particularly accurate summary of OREChem. Data types shown: bibliographic metadata, citations, figures, tables, chunks, reactions, molecular compounds, NMR spectra and structural data, experiment data. Sites shown: Southampton, PSU, Cambridge, Indiana. Other labels: workflows, TeraGrid services; triplestore on Azure Cloud.]
IU’s Objective
To build a pipeline to:
  • Fetch ORE ATOM feeds
  • Transform ATOM feeds into triples and store them in a triple store (using GRDDL/Saxon HE)
  • Extract crystallographically obtained 3D coordinate information
  • Submit compute-intensive electronic structure calculations and geometry optimization tasks to tools like Gaussian09 on TeraGrid supercomputing resources
  • Transform the Gaussian output into triples and store them in a triple store
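As a rough illustration of the last pipeline step, the sketch below pulls the final SCF energy out of a Gaussian log and writes it as N-Triples. The slides do not say which vocabulary or URIs the OREChem services use, so the `VOCAB` namespace, the predicate names, and the job URI are hypothetical placeholders; only the `SCF Done:` line layout is taken from standard Gaussian output.

```python
import re

# Hypothetical vocabulary; the slides do not name the real predicates.
VOCAB = "http://example.org/compchem#"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"

# Matches lines such as:
#  SCF Done:  E(RB3LYP) =  -76.4089533236     A.U. after    8 cycles
SCF_LINE = re.compile(r"SCF Done:\s+E\((\S+)\)\s+=\s+(-?\d+\.\d+)\s+A\.U\.")


def gaussian_log_to_ntriples(log_text, job_uri):
    """Return N-Triples lines for the last SCF energy found in a Gaussian log."""
    matches = SCF_LINE.findall(log_text)
    if not matches:
        return []
    # In a geometry optimization the last SCF entry belongs to the final step.
    method, energy = matches[-1]
    return [
        f'<{job_uri}> <{VOCAB}method> "{method}" .',
        f'<{job_uri}> <{VOCAB}scfEnergyHartree> "{energy}"^^<{XSD_DOUBLE}> .',
    ]


sample_log = """
 SCF Done:  E(RB3LYP) =  -76.4070000000     A.U. after   10 cycles
 SCF Done:  E(RB3LYP) =  -76.4089533236     A.U. after    8 cycles
"""
triples = gaussian_log_to_ntriples(sample_log, "http://example.org/job/1")
print("\n".join(triples))
```

The resulting lines could be loaded into any triple store that accepts N-Triples; a production converter would extract far more than a single energy value.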
OREChem-Computation Workflow
[Figure: workflow diagram. Boxes, in the order suggested by the pipeline objective: ATOM feeds from eCrystals or CrystalEye; extract moiety feeds in CML format (moiety files); convert CML to Gaussian input format; Gaussian on TeraGrid; Gaussian output to RDF triples (N3 files or RDF/XML); triplestore.]
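The CML-to-Gaussian conversion step can be sketched as follows. This is a minimal illustration, not the project's converter: it assumes each CML `atom` element carries `elementType`, `x3`, `y3`, and `z3` attributes, and the route line, title, charge, and multiplicity defaults are placeholder choices that the slides do not specify.

```python
import xml.etree.ElementTree as ET

CML_NS = "{http://www.xml-cml.org/schema}"


def cml_to_gaussian_input(cml_text, route="# opt b3lyp/6-31g(d)",
                          title="moiety", charge=0, multiplicity=1):
    """Build a Gaussian input deck from the 3D atoms in a CML molecule."""
    root = ET.fromstring(cml_text)
    # Route section, blank line, title, blank line, then charge and multiplicity.
    lines = [route, "", title, "", f"{charge} {multiplicity}"]
    for atom in root.iter(f"{CML_NS}atom"):
        # Assumes every atom has Cartesian coordinates in x3/y3/z3.
        x, y, z = (float(atom.get(key)) for key in ("x3", "y3", "z3"))
        lines.append(f"{atom.get('elementType'):<2} {x:12.6f} {y:12.6f} {z:12.6f}")
    # Gaussian expects a blank line after the molecule specification.
    return "\n".join(lines) + "\n\n"


water_cml = """
<molecule xmlns="http://www.xml-cml.org/schema" id="water">
  <atomArray>
    <atom id="a1" elementType="O" x3="0.000" y3="0.000" z3="0.117"/>
    <atom id="a2" elementType="H" x3="0.000" y3="0.757" z3="-0.469"/>
    <atom id="a3" elementType="H" x3="0.000" y3="-0.757" z3="-0.469"/>
  </atomArray>
</molecule>
"""
gaussian_input = cml_to_gaussian_input(water_cml)
print(gaussian_input)
```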
ORECHEM REST Services
ORECHEM REST Services
  • http://gf18.ucs.indiana.edu:8146/FeedsHarvester/cml3d/csv?harvester=moiety&numofentries=5
  • http://gf18.ucs.indiana.edu:8146/CML2GaussianSemCompChem/gauss/inputgenerator
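The first URL shows the harvester service taking its arguments as query parameters. The sketch below only assembles such a URL from the two parameters visible on the slide (`harvester` and `numofentries`); it does not contact the host, and the helper name `harvester_url` is made up for illustration. Whether the service accepts other parameter values is not stated in the slides.

```python
from urllib.parse import urlencode

# Endpoint as shown on the slide; the host may no longer be reachable.
BASE = "http://gf18.ucs.indiana.edu:8146/FeedsHarvester/cml3d/csv"


def harvester_url(harvester, num_entries, base=BASE):
    """Assemble a feed-harvester request URL from its two query parameters."""
    query = urlencode({"harvester": harvester, "numofentries": num_entries})
    return f"{base}?{query}"


url = harvester_url("moiety", 5)
print(url)
```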
OREChem Workflow in XBaya
Part 2: Computational Chemistry Middleware
Reusing software from the Open Gateway Computing Environments (OGCE) Project
What Is a Science Gateway?
  • User interface and supporting Web services to scientific applications, data sets, and resources running on cyberinfrastructure.
    • Science portals, Grid Computing Environments, …
    • Broaden and simplify usage
  • Cyberinfrastructure: distributed computing resources and overlaying middleware for scientific computing.
    • Prominent examples include TeraGrid, Open Science Grid
    • Middleware includes Globus, Condor, iRods/SRB, …
  • Some of these approaches are being pushed by scientific cloud computing
    • That is another topic
TeraGrid is one of the largest investments in shared CI from NSF’s Office of Cyberinfrastructure
  • Soon to become TeraGrid/XD
  • Dedicated high-speed, cross-country network
  • Staff & Advanced Support
  • 20 Petabytes Storage
  • 2 PetaFLOPS Computation
  • Visualization
Computational Chemistry Grid
  • Has a long history (S. Pamidighantam)
    • Started in 1998 as Quantum Chemistry Workbench
    • Evolved into ChemViz in NCSA Expedition Era
  • A pioneer of the TeraGrid Science Gateway and Community Account concepts
  • Manages software installations and licensing as well as middleware
  • Currently in two incarnations
    • GridChem: Science Gateway for Molecular Sciences (production gateway)
    • ParamChem: Automatic Parameterization of Molecular Mechanics (infrastructure research built on GridChem)
GridChem Science Gateway
  • Supported applications: Gaussian, CHARMM, NWChem, GAMESS, Molpro, QMCPack, MD Amber, ACES, NAMD, Wien2K, Gromacs, Castep
  • Usage statistics (December 2010)
    • 431 distinct users
    • Metadata for 37,500 computational jobs in DB
    • Over 2,000,000 Service Units consumed
    • Tracked over 50 peer-reviewed publications
  • Reportable metrics are an important issue
Simplified GridChem Architecture
[Figure: architecture diagram. Labels: Gaussian, GAMESS & other molecular editors & input generators; Gaussian, CHARMM, NWChem, GAMESS, NAMD, Amber …; configure inputs; job managers & data movement interfaces; GridChem Client; OGCE/GridChem Middleware; monitor resources; manage jobs; submit & monitor jobs; download output; output analysis & visualization.]
Sample GridChem Post Processing
Collaborations with Open Gateway Computing Environments
  • The OGCE has several general-purpose tools that are being phased into GridChem’s production middleware.
  • XBaya: graphical composition and execution of sequences of tasks.
  • Workflow Interpreter Service and GFAC
    • Support long-running executions and asynchronous invocations.
    • Stop, rewind, and replay executions.
    • Support parametric sweeps of workflows.
    • Integrate human interactions into workflow executions.
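A parametric sweep amounts to running the same workflow once per combination of input values. The sketch below shows only that expansion step in plain Python; it is not the XBaya or Workflow Interpreter API, and the parameter names and values (`basis_set`, `method`) are invented for illustration.

```python
from itertools import product


def expand_sweep(parameters):
    """Expand {name: [values]} into one dict of inputs per workflow run."""
    names = list(parameters)
    value_lists = [parameters[name] for name in names]
    # The Cartesian product yields every combination of the listed values.
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


sweep = expand_sweep({
    "basis_set": ["6-31G(d)", "cc-pVDZ"],  # hypothetical sweep axis
    "method": ["HF", "B3LYP"],             # hypothetical sweep axis
})
for run_inputs in sweep:
    print(run_inputs)
```

Each resulting dictionary would then be handed to one workflow invocation, which is why support for asynchronous, long-running executions matters for sweeps.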
OGCE-Generalized GridChem Infrastructure
[Figure: architecture diagram. Labels: Java CoG Abstraction; TeraGrid/XD; Globus; Molecular Editors & Input Generators; DRMAA & SSH Utilities; Campus Resources; European Grids; Amazon, Eucalyptus; EC2 Interface; Unicore, Open Nebula; Condor, SSH, (SLURM); OGCE Workflow & Job Management; GridChem Client; Cloud APIs; Other Grid Middleware; Output Analysis and Visualization (Requirements Driven).]


PNNL April 2011 ogce

ParamChem Overview
  • Collaboration between University of Maryland, NCSA, University of Kentucky, University of Florida
  • Goal: automate the process of parameterization for classical molecular mechanics and semi-empirical methods
    • These are realized as parameter sweeps of workflows.
    • Results disseminated through GridChem data management tools
    • Coupled execution of Quantum Chemistry and Molecular Mechanics.
  • OGCE partners with ParamChem through the NSF SDCI program to provide workflow and job management middleware.
    • Dynamics applications with optimization algorithms are being constructed as workflow chains.
    • Workflow chains are submitted as part of parametric sweeps
    • In progress
Empirical Force Fields Parameterization Need Process
  • Lack of accurate force fields produces erroneous property estimation
  • Vanommeslaeghe et al., J. Comp. Chem. 2010, 31, 671-690
Part 3: Developing Sustainable Science Gateway Software
The Open Gateway Computing Environments Project and Apache Software Foundation
OGCE Software
We try very hard to keep software scope under control. We don’t build data management systems, for example. We collaborate with groups who do.
OGCE Funds Software Lifecycle
Obvious, but new for NSF as it becomes more interested in sustaining its research investments.
Apache Incubators
  • Joining Apache is our software sustainability strategy
    • Open source licensing, meritocracy, visibility
  • Apache’s community development model is our experiment
    • More important than simply being open source.
    • Need to go beyond SourceForge
    • Distributed control, distributed credit.
  • Airavata: tools for science gateway services and workflows
    • XBaya, GFAC, Messenger, XRegistry
    • Collaboration with WSO2/LSF, IBM
    • Builds on Apache Axis2, Apache ODE, (Apache Hadoop)
  • Rave: OpenSocial gadget manager, general purpose gadgets
    • Collaboration with Hippo, Mitre, SURFnet
    • Builds on Apache Shindig
More Information
  • OGCE Web Site: http://www.collab-ogce.org
  • News Feed/Blog: http://collab-ogce.blogspot.com
  • Contact us: ogce-discuss@googlegroups.com, http://groups.google.com/group/ogce-discuss/
  • Software Downloads: Software is available via SVN from our SourceForge project, http://sourceforge.net/projects/ogce/. See http://www.collab-ogce.org/ogce/index.php/Portal_download