Integrating Chemistry Scholarship with Web Architectures, Grid Computing and Semantic Web
Sashi Kiran Challa, Marlon Pierce, Suresh Marru
Indiana University, Bloomington
Microsoft Research’s ORECHEM Project
“A collaboration between chemistry scholars and information scientists to develop and deploy the infrastructure, services, and applications to enable new models for research and dissemination of scholarly materials in the chemistry community.”
http://research.microsoft.com/en-us/projects/orechem/
OAI-ORE and ORE-Chem
Open Archives Initiative Object Reuse and Exchange (OAI-ORE) defines standards for the description and exchange of aggregations of Web resources.
It is based on the ORE model, which introduces the Resource Map (ReM): a ReM associates an identity with an aggregation of resources and makes assertions about its structure and semantics.
ReMs can be expressed in Atom/XML, RDF/XML, N3, or Turtle.
We want to use and extend this model to describe all aspects of crystallography experiments:
Publication links and metadata, data,
Bibliographic metadata
Citations
Figures
Tables
Chunks
Reactions
Molecular Compounds
NMR Spectra and Structural Data
Experiment data
Partners: Southampton, PSU, Cambridge, Indiana
Workflows, TeraGrid services
Triplestore on Azure Cloud
(Diagram from Carl Lagoze’s OreCHEM eScience presentation slides)
Our Objective
To build a pipeline to:
Fetch ATOM feeds
Transform ATOM feeds into triples and store them in a triple store (using GRDDL/Saxon HE)
Extract crystallographically obtained 3D coordinate information
Submit compute-intensive electronic structure calculations and geometry optimization tasks to tools such as Gaussian09 on TeraGrid
Transform the Gaussian output into triples and store them in a triple store
OREChem Computation Workflow
(Diagram) ATOM feeds from eCrystals or CrystalEye → extract moiety feeds in CML format → moiety files → convert CML to Gaussian input format → Gaussian on TeraGrid → Gaussian output to RDF triples → N3 or RDF/XML files → triple store
Legend: Implemented, Yet to Implement, From Partners
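The “convert CML to Gaussian input format” step can be sketched with the JDK's DOM parser alone. This is a minimal illustration, not the CML2GaussianSemCompChem service itself: it assumes each CML `<atom>` carries `elementType` and Cartesian `x3`/`y3`/`z3` coordinates in angstroms, and the route line, title, charge and multiplicity are caller-supplied assumptions because the slides do not say which method or basis set was used. The class name `Cml2GaussianSketch` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: turns the 3D atoms of a CML moiety into a Gaussian input deck. */
class Cml2GaussianSketch {

    static String toGaussianInput(String cmlXml, String route, String title,
                                  int charge, int multiplicity) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder()
                .parse(new ByteArrayInputStream(cmlXml.getBytes(StandardCharsets.UTF_8)));

        StringBuilder deck = new StringBuilder();
        deck.append(route).append("\n\n");   // route section, terminated by a blank line
        deck.append(title).append("\n\n");   // title section, terminated by a blank line
        deck.append(charge).append(' ').append(multiplicity).append('\n');

        // "*" matches <atom> in any namespace (CML normally uses http://www.xml-cml.org/schema)
        NodeList atoms = doc.getElementsByTagNameNS("*", "atom");
        for (int i = 0; i < atoms.getLength(); i++) {
            Element atom = (Element) atoms.item(i);
            // Assumed Cartesian coordinates in angstroms; a missing attribute fails parseDouble
            double x = Double.parseDouble(atom.getAttribute("x3"));
            double y = Double.parseDouble(atom.getAttribute("y3"));
            double z = Double.parseDouble(atom.getAttribute("z3"));
            deck.append(String.format(Locale.ROOT, "%-2s %12.6f %12.6f %12.6f\n",
                    atom.getAttribute("elementType"), x, y, z));
        }
        deck.append('\n');                   // molecule specification ends with a blank line
        return deck.toString();
    }
}
```

For example, passing `"# B3LYP/6-31G(d) Opt"` as the route with charge 0 and multiplicity 1 would request a geometry optimization of a neutral singlet. The blank lines after the route, the title and the atom block follow Gaussian's input layout.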
RESTful Web services
REST is the way the Web already works:
A URI for each resource.
HTTP GET/POST/PUT/DELETE as the uniform set of operations.
Very easy to build one using Java APIs (JAX-RS, with Jersey for both server and client).
Jersey Skeleton Methods
@Singleton
@Path("/cml3d")
public class MoietyHarvester {

    @GET
    @Path("/csv")
    @Produces("text/plain")
    public String harvestfeeds(@QueryParam("harvester") String harvester,
            @DefaultValue("10") @QueryParam("numofentries") String num_entries) {
        // ...
    }

    @GET
    @Path("/json")
    @Produces("application/json")
    public JSONArray harvestfeedsJSON(@QueryParam("harvester") String harvester,
            @DefaultValue("10") @QueryParam("numofentries") String num_entries) {
        // ...
    }
}
http://gf18.ucs.indiana.edu/FeedsHarvester/cml3d/csv?parameters
http://gf18.ucs.indiana.edu/FeedsHarvester/cml3d/json?parameters
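The same two representations can be served without Jersey. The sketch below uses only the JDK's `com.sun.net.httpserver` package to mimic the skeleton: the path suffix picks CSV or JSON (like the two `@Path`/`@Produces` methods), and a missing `numofentries` falls back to 10 (like `@DefaultValue("10")`). The class and method names are hypothetical, and the response bodies only echo the parsed parameters because the harvesting logic is elided on the slide.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Stdlib-only sketch of the /cml3d/csv and /cml3d/json endpoints (no JAX-RS). */
class HarvesterEndpointSketch {

    record Reply(int status, String contentType, String body) {}

    /** Decodes "a=1&b=2" into a map, roughly what @QueryParam does per parameter. */
    static Map<String, String> parseQuery(String rawQuery) {
        Map<String, String> params = new HashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) return params;
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    /** Chooses the representation from the path suffix, like the two annotated methods. */
    static Reply render(String path, String rawQuery) {
        String harvester;
        int n;
        try {
            Map<String, String> q = parseQuery(rawQuery);
            harvester = q.getOrDefault("harvester", "");
            n = Integer.parseInt(q.getOrDefault("numofentries", "10")); // mirrors @DefaultValue("10")
        } catch (IllegalArgumentException e) { // bad %-escape or non-integer numofentries
            return new Reply(400, "text/plain", "bad query: " + e.getMessage() + "\n");
        }
        if (!harvester.matches("[A-Za-z0-9_-]*")) { // keeps the hand-built JSON below valid
            return new Reply(400, "text/plain", "bad harvester name\n");
        }
        if (path.endsWith("/csv")) {
            return new Reply(200, "text/plain", harvester + "," + n);
        }
        if (path.endsWith("/json")) {
            return new Reply(200, "application/json",
                    "{\"harvester\":\"" + harvester + "\",\"numofentries\":" + n + "}");
        }
        return new Reply(404, "text/plain", "unknown representation\n");
    }

    /** Serves GET requests under /cml3d/ on 127.0.0.1:port (port 0 picks a free port). */
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/cml3d/", exchange -> {
            Reply reply = "GET".equals(exchange.getRequestMethod())
                    ? render(exchange.getRequestURI().getPath(), exchange.getRequestURI().getRawQuery())
                    : new Reply(405, "text/plain", "GET only\n");
            byte[] bytes = reply.body().getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", reply.contentType());
            exchange.sendResponseHeaders(reply.status(), bytes.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(bytes); // closing the body stream ends the exchange
            }
        });
        server.start();
        return server;
    }
}
```

Calling `HarvesterEndpointSketch.start(8146)` and then requesting `http://localhost:8146/cml3d/csv?harvester=moiety&numofentries=5` would return `moiety,5`. Keeping `render` separate from the server makes the parameter handling testable without opening a socket.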
ORECHEM REST Services
ORECHEM REST Services
http://gf18.ucs.indiana.edu:8146/FeedsHarvester/cml3d/csv?harvester=moiety&numofentries=5
http://gf18.ucs.indiana.edu:8146/CML2GaussianSemCompChem/gauss/inputgenerator
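Testing Services
Jersey Client API:
import com.sun.jersey.api.client.Client;
import com.sun.jersey.api.client.WebResource;
import javax.ws.rs.core.MediaType;

public class JerseyClient {
    public static void main(String[] args) {
        Client client = Client.create();
        // Resource URI of the CML-to-Gaussian input generator service
        WebResource cml2gauss = client.resource(
                "http://localhost:8080" + "/CML2GaussianSemCompChem/gauss/inputgenerator");
        // URL of the CML moiety file to convert
        String cmlfileURL = "http://gridfarm018.ucs.indiana.edu/"
                + "orechem/moieties/ic0620900sup1_comp9_" + "moiety_1.complete.cml.xml";
        // POST the CML file URL; the reply string is stored as gaussURL
        String gaussURL = cml2gauss
                .accept(MediaType.TEXT_PLAIN_TYPE, MediaType.APPLICATION_XML_TYPE)
                .post(String.class, cmlfileURL);
        System.out.println(gaussURL);
    }
}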
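For comparison, the same request can be built with the JDK's `java.net.http` client (Java 11+) instead of Jersey 1.x. This is a sketch: the `Content-Type: text/plain` header is an assumption (the Jersey call does not set one explicitly), and the class name is hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** JDK-only counterpart of the Jersey client call (hypothetical helper). */
class Cml2GaussClientSketch {

    /** POSTs the CML file URL as a plain-text body, accepting text/plain or application/xml. */
    static HttpRequest buildRequest(String serviceUrl, String cmlFileUrl) {
        return HttpRequest.newBuilder(URI.create(serviceUrl))
                .header("Accept", "text/plain, application/xml")
                .header("Content-Type", "text/plain") // assumption: not set in the Jersey code
                .POST(HttpRequest.BodyPublishers.ofString(cmlFileUrl))
                .build();
    }

    /** Sends the request and returns the response body; non-2xx statuses become exceptions. */
    static String send(HttpRequest request) throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() / 100 != 2) {
            throw new IOException("HTTP " + response.statusCode() + ": " + response.body());
        }
        return response.body();
    }
}
```

`Cml2GaussClientSketch.send(Cml2GaussClientSketch.buildRequest(serviceUrl, cmlfileURL))` would then return the same string the Jersey client prints, assuming the service is running at that address.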
TeraGrid
OREChem Workflow in XBaya
Triple Store
A triple store is a framework for storing and querying RDF data. It provides persistent storage of, and access to, RDF graphs.
Commercial: AllegroGraph, BigOWLIM, Virtuoso
Open source: Jena SDB, Sesame, Virtuoso, Intellidimension
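To make “storing and querying RDF data” concrete, here is a toy in-memory store that keeps a graph as a set of triples and answers single triple patterns, with `null` playing the role of a SPARQL variable. It is only an illustration: real triple stores such as Virtuoso add persistence, indexing, typed literals and full SPARQL, and the short `ex:`/`ore:` names in the usage example stand in for full IRIs.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** In-memory toy triple store: no persistence, no SPARQL, just triple-pattern matching. */
class ToyTripleStore {

    record Triple(String subject, String predicate, String object) {}

    // An RDF graph is a set of triples, so duplicates are ignored; insertion order is kept.
    private final Set<Triple> triples = new LinkedHashSet<>();

    void add(String s, String p, String o) {
        triples.add(new Triple(s, p, o));
    }

    /** Returns the triples matching the pattern; null in any position acts as a variable. */
    List<Triple> match(String s, String p, String o) {
        List<Triple> result = new ArrayList<>();
        for (Triple t : triples) {
            if ((s == null || s.equals(t.subject()))
                    && (p == null || p.equals(t.predicate()))
                    && (o == null || o.equals(t.object()))) {
                result.add(t);
            }
        }
        return result;
    }

    int size() {
        return triples.size();
    }
}
```

For example, after `store.add("ex:agg1", "ore:aggregates", "ex:moiety1")`, the call `store.match(null, "ore:aggregates", null)` returns that triple; adding the same triple again leaves the store unchanged, because an RDF graph is a set.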
Virtuoso Triple Store
An ORDBMS extended into a triple store.
Command-line loaders; isql utility (interactive SQL access to a database)
SPARQL support, with a web server for running SPARQL queries
Uploading of data over HTTP and through the WebDAV browser
http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VOSRDFWP
What’s in the Triple Store
RDF graph:
Experiments performed on a particular crystal
Journal articles containing this crystal (research groups working with the crystal)
Moieties in the crystal: their energies, geometries, vibrational frequencies, etc.
All of this information in the triple store can be queried using a single GRAPH IRI.
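Moiety energies reach the triple store by converting Gaussian output into triples. Below is a minimal sketch of that conversion for one property: it assumes the standard `SCF Done` lines of a Gaussian log and emits one N-Triples statement typed as `xsd:double`. The predicate IRI and the choice of the last `SCF Done` line (typically the energy at the final geometry of an optimization) are assumptions; the slides do not name the vocabulary the OREChem tool actually uses, and the class name is hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: turn the final SCF energy in a Gaussian log into one N-Triples statement. */
class GaussianEnergyTriple {

    // Gaussian prints lines shaped like: " SCF Done:  E(RB3LYP) =  <energy>     A.U. after  <n> cycles"
    private static final Pattern SCF_DONE =
            Pattern.compile("SCF Done:\\s+E\\([^)]+\\)\\s+=\\s+(-?\\d+\\.\\d+)");

    private static final String XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double";

    /** Uses the last "SCF Done" line in the log; empty if the log contains none. */
    static Optional<String> lastScfEnergy(String log, String moietyIri, String predicateIri) {
        Matcher m = SCF_DONE.matcher(log);
        String energy = null;
        while (m.find()) {
            energy = m.group(1); // energy in hartree (Gaussian's "A.U.")
        }
        if (energy == null) return Optional.empty();
        return Optional.of("<" + moietyIri + "> <" + predicateIri + "> \""
                + energy + "\"^^<" + XSD_DOUBLE + "> .");
    }
}
```

The resulting line can be appended to an N-Triples file and loaded like any other RDF; other properties such as vibrational frequencies would need their own patterns.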
Virtuoso Triple Store
GRAPH IRI: used to run SPARQL queries over the RDF triples.
* Unique for every uploaded file, e.g. http://local.virt/DAV/home/schalla/rdf_sink/oreatomfeed_102.rdf
* A common GRAPH IRI for all the data uploaded into rdf_sink (virt:rdf_graph, virt:rdf_sponger): http://localhost:8890/DAV/home/schalla/rdf_sink/
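Querying through a graph IRI can be sketched as building a SPARQL 1.1 Protocol GET request whose query uses `FROM <graph IRI>`. The endpoint `http://localhost:8890/sparql` is an assumption (Virtuoso's default port, matching the graph IRIs above), and the class name is hypothetical; the protocol's `default-graph-uri` parameter is an alternative to the `FROM` clause.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Sketch: a SPARQL 1.1 Protocol GET request URI restricted to one graph IRI. */
class GraphIriQuery {

    /** SELECT over a single graph; FROM makes that graph the query's default graph. */
    static String selectAll(String graphIri, int limit) {
        return "SELECT ?s ?p ?o FROM <" + graphIri + "> WHERE { ?s ?p ?o } LIMIT " + limit;
    }

    /** Encodes the query into the "query" parameter; spaces become %20 rather than '+'. */
    static URI requestUri(String endpoint, String sparql) {
        // URLEncoder escapes a literal '+' as %2B, so replacing '+' only affects spaces
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8).replace("+", "%20");
        return URI.create(endpoint + "?query=" + encoded);
    }
}
```

Using the common rdf_sink graph IRI queries everything uploaded there at once, while a per-file IRI such as the `oreatomfeed_102.rdf` one limits the query to a single feed.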
Future Work
Real future work (through Dec 2010):
Use the OGCE workflow interpreter engine to run the workflow as a service.
Integrate with simple visualization services (Jmol).
Store input and output URLs persistently in the triple store.
Anticipating higher-level services.
Better support for REST services in OGCE GFAC and XBaya.
Hopeful future work (next year):
Integrate with services from GridChem/ParamChem.
Handle larger-scale job submission.
Develop a full gateway for public browsing and retrieval.
Investigate push-style publish/subscribe solutions for notifications.
We have a great deal of JMS and Web Service experience with this, but very scalable REST messaging for RSS/Atom is coming, for example PubSubHubbub and Twitter live feeds.
The OGCE messaging system has been prototyped with REST interfaces for a small iPlant collaboration.
More Information
Come by the IU booth for more information on the OGCE tools used here.
Mini-symposium: 10-12 noon on Tuesday
Interactive presentations all week at the flat-screen kiosk.
NCSA walk-up demos: 1-2 PM on Wednesday
Source code for our ORE-Chem services is available from SourceForge.
Contact: mpierce@cs.indiana.edu
Thank You
Future Work
Google’s PubSubHubbub: as soon as a feed is published, the hub notifies the subscriber, which can then get the new entry and start the pipeline.
Publisher → Hub → Subscriber
http://code.google.com/p/pubsubhubbub/
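On the subscriber side, PubSubHubbub involves two interactions with the callback URL: the hub first verifies a subscription with a GET carrying `hub.mode`, `hub.topic` and `hub.challenge` (the subscriber confirms by echoing the challenge, or refuses with a 404), and later POSTs new feed content to the same callback. The sketch below keeps only that logic and omits the HTTP plumbing; `HubCallbackSketch` and the `pipelineEntry` hook that would start the OREChem pipeline are hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Consumer;

/** Subscriber-side callback logic for PubSubHubbub (HTTP plumbing omitted). */
class HubCallbackSketch {

    private final Set<String> pendingTopics;       // feed URLs we asked the hub to (un)subscribe
    private final Consumer<String> pipelineEntry;  // hypothetical hook that starts the pipeline

    HubCallbackSketch(Set<String> pendingTopics, Consumer<String> pipelineEntry) {
        this.pendingTopics = pendingTopics;
        this.pipelineEntry = pipelineEntry;
    }

    /**
     * Verification of intent. A present result means "reply 200 with this body";
     * an empty result means "reply 404" (we did not ask for this).
     */
    Optional<String> verify(Map<String, String> query) {
        String mode = query.get("hub.mode");
        String topic = query.get("hub.topic");
        String challenge = query.get("hub.challenge");
        boolean knownMode = "subscribe".equals(mode) || "unsubscribe".equals(mode);
        // topic is checked for null first: immutable sets reject contains(null)
        if (knownMode && challenge != null && topic != null && pendingTopics.contains(topic)) {
            return Optional.of(challenge);
        }
        return Optional.empty();
    }

    /** Content distribution: the hub POSTs the updated feed (new Atom entries) here. */
    void onNotification(String atomXml) {
        pipelineEntry.accept(atomXml);
    }
}
```

The pipeline then starts from the pushed content instead of polling the feed, which is the behavior the slide describes.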
Questions?
ATOM to RDF/XML
GRDDL Transformation (Jena GRDDL Reader)
GRDDL (Gleaning Resource Descriptions from Dialects of Languages) is a W3C mechanism for obtaining RDF from XML documents by applying transformations, usually written in XSLT.
atom-grddl.xsl: the XSLT stylesheet
GRDDLReader grddl = new GRDDLReader();
grddl.read(defaultmodel, atomfeedURL);
GRDDL W3C documentation: http://www.w3.org/TR/grddl/
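GRDDL also defines how a document can name its own transformation, for example with a `grddl:transformation` attribute on the root element. The sketch below shows only that discovery step with the JDK DOM parser; a full GRDDL processor such as the Jena reader also follows other routes (namespace documents, XHTML profiles) and then runs the transformation. The class name is hypothetical, and relative IRIs are returned unresolved. If a feed carries no such hint, a pipeline can still apply a known stylesheet such as atom-grddl.xsl directly.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

/** Sketch of one GRDDL discovery route: grddl:transformation on the root element. */
class GrddlDiscoverySketch {

    static final String GRDDL_NS = "http://www.w3.org/2003/g/data-view#";

    /** Returns the transformation IRIs listed on the root element (space-separated), if any. */
    static List<String> transformations(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // needed so getAttributeNS can see the GRDDL namespace
        Element root = dbf.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        String value = root.getAttributeNS(GRDDL_NS, "transformation"); // "" when absent
        if (value.isBlank()) return List.of();
        // Relative IRIs would still need resolving against the document's base IRI.
        return List.of(value.trim().split("\\s+"));
    }
}
```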
ORE Representation of an Aggregation of a Moiety in Turtle format
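The structure of such a Resource Map can be illustrated with a few lines of Turtle generated from Java: the ReM `ore:describes` the aggregation, and the aggregation `ore:aggregates` each resource (for instance a moiety's CML file). This is a minimal sketch using only core ORE terms; the class name is hypothetical, the IRIs in the usage example are placeholders, and the real OREChem maps carry much more metadata than shown here.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Sketch: serialize a minimal ORE Resource Map for one aggregation as Turtle. */
class OreResourceMapSketch {

    // Characters that are not allowed unescaped inside a Turtle <IRI>
    private static final Pattern NEEDS_ESCAPING = Pattern.compile("[\\s<>\"{}|^`\\\\]");

    private static String iri(String value) {
        if (value.isEmpty() || NEEDS_ESCAPING.matcher(value).find()) {
            throw new IllegalArgumentException("unsupported IRI: " + value);
        }
        return "<" + value + ">";
    }

    static String toTurtle(String remIri, String aggregationIri, List<String> aggregated) {
        StringBuilder ttl = new StringBuilder();
        ttl.append("@prefix ore: <http://www.openarchives.org/ore/terms/> .\n\n");
        // The Resource Map (ReM) is the document that describes the aggregation.
        ttl.append(iri(remIri)).append(" a ore:ResourceMap ;\n");
        ttl.append("    ore:describes ").append(iri(aggregationIri)).append(" .\n\n");
        // The Aggregation groups the Web resources it aggregates.
        ttl.append(iri(aggregationIri)).append(" a ore:Aggregation ;\n");
        ttl.append("    ore:isDescribedBy ").append(iri(remIri));
        for (String resource : aggregated) {
            ttl.append(" ;\n    ore:aggregates ").append(iri(resource));
        }
        ttl.append(" .\n");
        return ttl.toString();
    }
}
```

Calling `toTurtle("http://example.org/rem/1", "http://example.org/rem/1#aggregation", List.of("http://example.org/moiety_1.cml.xml"))` yields one `ore:ResourceMap` block and one `ore:Aggregation` block linked in both directions.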
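ATOM to RDF/XML
Saxon XSLT Transformation:
import java.io.ByteArrayOutputStream;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// xslstream: the XSLT stylesheet; atomstream: the ATOM feed (both InputStreams)
ByteArrayOutputStream transformOutputStream = new ByteArrayOutputStream();
// Picks Saxon HE if JAXP is configured to use it; otherwise the JDK's built-in processor
TransformerFactory factory = TransformerFactory.newInstance();
StreamSource xslSource = new StreamSource(xslstream);
StreamSource xmlSource = new StreamSource(atomstream);
StreamResult outResult = new StreamResult(transformOutputStream);
Transformer transformer = factory.newTransformer(xslSource);
transformer.transform(xmlSource, outResult); // RDF/XML is written to transformOutputStream
transformOutputStream.close();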
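A self-contained version of the same JAXP pattern, with a toy stylesheet inlined, shows what the transformation produces. The stylesheet is illustrative only and is not `atom-grddl.xsl`: it maps each `atom:entry` to an `rdf:Description` named by its `atom:id`, with a Dublin Core title. It needs only the JDK (Java 15+ for the text block); the class name is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Sketch: a toy Atom-to-RDF/XML stylesheet run through JAXP (not the real atom-grddl.xsl). */
class AtomToRdfSketch {

    // XSLT 1.0: each atom:entry becomes an rdf:Description named by its atom:id, with a dc:title.
    static final String XSL = """
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
            xmlns:atom="http://www.w3.org/2005/Atom"
            xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
            xmlns:dc="http://purl.org/dc/elements/1.1/"
            exclude-result-prefixes="atom">
          <xsl:template match="/atom:feed">
            <rdf:RDF>
              <xsl:for-each select="atom:entry">
                <rdf:Description rdf:about="{atom:id}">
                  <dc:title><xsl:value-of select="atom:title"/></dc:title>
                </rdf:Description>
              </xsl:for-each>
            </rdf:RDF>
          </xsl:template>
        </xsl:stylesheet>
        """;

    static String transform(String atomXml) throws Exception {
        // Uses whichever TransformerFactory JAXP resolves (Saxon HE if configured, else the JDK's).
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(atomXml)), new StreamResult(out));
        return out.toString();
    }
}
```

For a one-entry feed whose entry has `<id>urn:uuid:1</id>` and `<title>Moiety 1</title>`, the result contains an `rdf:Description` with `rdf:about="urn:uuid:1"` and a `dc:title` of `Moiety 1`, ready to be loaded into a triple store.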
OGCE Workflow Suite
Tools to wrap command-line applications as lightweight web services, compose workflows from those web services, and execute and monitor the workflows.
1) GFAC: allows users to wrap any command-line application as a web service.
2) XRegistry: the information repository of the workflow suite, enabling users to register, search and access application service and workflow deployment descriptions.
3) XBaya: a Java Web Start workflow composer, used to compose workflows from the web services created by GFAC and to run and monitor those workflows.
Open Grid Computing Environments Wiki: http://www.collab-ogce.org/ogce/index.php/Workflow
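The core idea behind GFAC, exposing a command-line tool as a network service, can be illustrated with the JDK alone. This toy is not GFAC and has none of its application descriptions, registry integration or job management; it simply runs a fixed command with arguments taken from the request body (one per line) and returns the tool's output. Names are hypothetical, and the usage example assumes a Unix-like system where `echo` is on the PATH.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Toy illustration of wrapping a command-line tool as an HTTP service (not GFAC's API). */
class CommandServiceSketch {

    /** Runs baseCommand + args and returns stdout (stderr merged); non-zero exit is an error. */
    static String run(List<String> baseCommand, List<String> args)
            throws IOException, InterruptedException {
        List<String> command = new ArrayList<>(baseCommand);
        command.addAll(args); // separate argv entries, never passed through a shell
        Process process = new ProcessBuilder(command).redirectErrorStream(true).start();
        String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("exit code " + exitCode + ": " + output);
        }
        return output;
    }

    /** POST /run with one argument per non-blank body line; replies with the tool's output. */
    static HttpServer expose(List<String> baseCommand, int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/run", exchange -> {
            int status = 200;
            String reply;
            if (!"POST".equals(exchange.getRequestMethod())) {
                status = 405;
                reply = "POST only\n";
            } else {
                try {
                    String body = new String(exchange.getRequestBody().readAllBytes(),
                            StandardCharsets.UTF_8);
                    List<String> args = body.lines().filter(line -> !line.isBlank()).toList();
                    reply = run(baseCommand, args);
                } catch (IOException e) {
                    status = 500;
                    reply = String.valueOf(e.getMessage());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    status = 500;
                    reply = "interrupted\n";
                }
            }
            byte[] bytes = reply.getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(status, bytes.length == 0 ? -1 : bytes.length);
            try (OutputStream os = exchange.getResponseBody()) {
                if (bytes.length > 0) os.write(bytes);
            }
        });
        server.start();
        return server;
    }
}
```

For instance, `CommandServiceSketch.expose(List.of("echo"), 8080)` would answer a POST of `hello` to `/run` with `hello`. Arguments go to `ProcessBuilder` as separate entries rather than through a shell, so request text cannot inject extra shell commands.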


OREChem Services and Workflows

  • Experiments, protocols? (Experimental data)
    Moieties, their energies, latent heats of fusion, vibrational frequencies? (Molecular properties, etc.)
    Who? Where? When? (Bibliographic data)
  • ORE representation of a Resource Map in Turtle format
  • Moiety and its 3D coordinates: every atom and its X, Y, Z coordinates
    Bond orders, SMILES and InChI representations
    Currently ~30,000 moieties in the CrystalEye repository
  • OGCE Workflow Suite: “OGCE Workflow Toolkit for Multi-Disciplinary Science Applications”, Suresh Marru’s presentation
  • Acknowledgements
    Dr. Marlon Pierce, Assistant Director, Community Grid Labs, Pervasive Technology Institute, Indiana University
    Dr. David J. Wild, Assistant Professor of Informatics & Computing; Director, Cheminformatics Program, School of Informatics and Computing, Indiana University
    OREChem Group: Dr. Carl Lagoze (Cornell University), Dr. Peter Murray-Rust, Nick Day, Jim Downing (University of Cambridge), Mark Borkum (University of Southampton), Na Li (Penn State), Alex, Lee Dirks (Microsoft Research)
    Suresh Marru, Research Scientist, Pervasive Technology Institute, Indiana University
    Jaliya Ekanayake, Scott Beason, and all the members of the Pervasive Technology Institute
  • Future Work
    Wrap the tool that generates triples from Gaussian output into a REST service.
    Install the Virtuoso triple store on the Azure cloud.
    Fetch and process the feeds from Southampton and Penn State.
  • Virtuoso Triple Store
    Windows and Linux versions are installed and tested; the Linux version is currently in use.
    Conductor: http://gf18.ucs.indiana.edu:8890/conductor
    SPARQL endpoint: http://gf18.ucs.indiana.edu:8890/sparql
    Implementing a SPARQL-compliant RDF triple store using a SQL ORDBMS: http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VOSRDFWP