Semantic web technologies applied to bioinformatics and laboratory data management Toni Hermoso Pulido [email_address]
Bioinformatics Core Facility
http://biocore.crg.cat
THE CLASSICAL WEB
> Syntax: markup languages (HTML, XHTML, etc.)
> Content: text inside the tags (or as attributes)
> Style: the HTML tags themselves, or CSS (in content or as external files)
Former WWW logo, by Robert Cailliau. Tim Berners-Lee, Robert Cailliau. CERN (1990)
THE CLASSICAL WEB
WEB 2.0
> Buzzword. Its coinage is associated with Tim O'Reilly.
> The term "Web 2.0" (2004–present) is commonly associated with web applications that facilitate interactive information sharing, interoperability, user-centered design, and collaboration on the World Wide Web.
> Examples of Web 2.0 include web-based communities, hosted services, web applications, social-networking sites, video-sharing sites, wikis, blogs, mashups, and folksonomies.
> AJAX, RSS, Web APIs…
wikis may allow anyone to edit
wikis are intended to be easy to use
wiki content is easy to link
wikis support tracking of all changes
wikis may allow uploading different media
Wiki – Wiki! WikiWikiWeb. Ward Cunningham, 1994
MediaWiki
> Most popular wiki software
> The software behind the Wikimedia Foundation projects
> Best-known deployment: Wikipedia (http://www.wikipedia.org). First version released in 2002; before that, Wikipedia ran on UseModWiki (a Perl wiki)
Gene Wiki: Gene annotation project in Wikipedia http://en.wikipedia.org/wiki/Portal:Gene_Wiki
> Brings relevant information on human genes closer to end users
> Manual collaborative annotation plus automated external references added by robot software
> Wikipedia portal within the Molecular and Cellular Biology Project
Published September 2009
Gene Wiki: Gene annotation project in Wikipedia
GENE WIKI >  Example of a wiki page Reelin
GENE WIKI >  Example of a wiki category page Human proteins
GENE WIKI >  Example of a wiki source page: Reelin
GENE WIKI >  Example of a wiki template page: Reelin
Web parsing / scraping >  To get information from an HTML source (wiki included) Download tools:  Lynx
Wget
Perl LWP
Perl WWW::Mechanize
Python Beautiful Soup…
Web parsing / scraping >  Processing content (example, EC: 3.4.21.-)
Regular expressions: s/<a href="httpwww.genome.jpdbget-binwww_bget?enzyme+(+)?"/g
XPath: id('bodyContent')/x:table[2]/x:tbody/x:tr[8]/x:td/x:span/x:a
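The regular-expression approach on this slide can be sketched in Python. This is only an illustration: the HTML fragment is hand-written, and the full link URL (http://www.genome.jp/dbget-bin/www_bget?enzyme+...) is an assumed reconstruction of the pattern shown on the slide, not something taken from a live page.

```python
import re

# Hypothetical HTML fragment; the link URL is an assumed reconstruction
# of the KEGG enzyme link pattern shown on the slide.
SAMPLE_HTML = (
    '<td><span><a href="http://www.genome.jp/dbget-bin/www_bget?enzyme+3.4.21.-">'
    "3.4.21.-</a></span></td>"
)

# Capture whatever follows "enzyme+" up to the closing quote of the href.
EC_LINK = re.compile(
    r'<a href="http://www\.genome\.jp/dbget-bin/www_bget\?enzyme\+([^"]+)"'
)


def extract_ec_numbers(html: str) -> list[str]:
    """Return every EC number found in enzyme links of the given HTML."""
    return EC_LINK.findall(html)


print(extract_ec_numbers(SAMPLE_HTML))
```

Regular expressions like this break as soon as the page layout changes, which is part of the motivation for querying the wiki through its API instead.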
MediaWiki API >  http://en.wikipedia.org/w/api.php
MediaWiki API >  Commonly scripted with Python or Perl (MediaWiki::Bot) >  You can read information from a wiki and store information in it.
MediaWiki API >  Easier to extract data: Retrieve wiki syntax, not direct HTML content
Useful when templates are used
Can retrieve all pages from a category
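A minimal sketch of how such API requests might be built in Python, using only the standard library. The endpoint is the one given on the slide; the helper names and the sample response are hypothetical, and no network request is made here.

```python
from urllib.parse import urlencode

API_URL = "http://en.wikipedia.org/w/api.php"  # endpoint from the slide


def category_members_url(category: str, limit: int = 50) -> str:
    """Build a query URL listing the pages in a category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmlimit": limit,
        "format": "json",
    }
    return f"{API_URL}?{urlencode(params)}"


def wikitext_url(title: str) -> str:
    """Build a query URL returning the wiki syntax (not HTML) of a page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "titles": title,
        "format": "json",
    }
    return f"{API_URL}?{urlencode(params)}"


def titles_from_response(response: dict) -> list[str]:
    """Pull page titles out of a decoded categorymembers JSON reply."""
    return [member["title"] for member in response["query"]["categorymembers"]]


# Hand-written reply shaped like a categorymembers response (illustration only).
sample = {"query": {"categorymembers": [{"pageid": 1, "ns": 0, "title": "Reelin"}]}}

print(category_members_url("Human proteins"))
print(wikitext_url("Reelin"))
print(titles_from_response(sample))
```

Fetching the resulting URLs (for example with urllib.request) returns JSON, which is easier to process than scraped HTML, especially when pages are built from templates.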
SEMANTIC WEB >  The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in collaboration. Sir Tim Berners-Lee
The Berners-Lee Semantic Web ‘Birthday Cake’ http://www.mkbergman.com/231/from-data-federation-pyramid-to-the-semantic-web-birthday-cake/
The evolution of the Web
SNPedia: a semantic wiki for human genetic studies
> http://www.snpedia.com (started in 2007)
> Built on Semantic MediaWiki (first releases in 2005)
> Database of SNPs (Single Nucleotide Polymorphisms)
> In September 2009, the website claimed 7,938 SNPs in its database.
> Predictive medicine reports with Promethease, an application that checks your genotyping data against SNPedia
SNPedia: a  semantic  wiki for human genetic studies
SNPedia >  Example of a wiki page Rs333
SNPedia >  Example of a wiki page Rs333
SNPedia >  Example of a wiki page properties Rs333
SNPedia >  Example of a page property ( disease ) value HIV
Semantic MediaWiki Data Types
* Type:Page: links to pages (the default)
* Type:String: text strings no longer than 250 characters
* Type:Number: integer and decimal numbers with optional exponent
* Type:Boolean: restricts the value of a property to true/false (also 1/0 and yes/no)
* Type:Date: specifies particular points in time
* Type:Text: like Type:String but can have unlimited length; the trade-off is that values of this type cannot be selection or sort criteria in queries.
* Type:Code: like Type:Text but with additional precautions to preserve special formatting as used for technical texts. The value displays as regular text everywhere else (query results, factbox, "Pages using the property", etc.).
* Type:Temperature: variation of Type:Number that supports units of temperature (cannot be user-defined since converting temperature units is more complicated than multiplying by a conversion factor).
* Type:Telephone number: validates and stores international telephone numbers based on the RFC 3966 standard
* Type:Record: type for compound property values that consist of a short list of values with fixed type and order
Semantic MediaWiki Data Types
For specifying URLs and emails, there are some special variations of the string data type:
* Type:URL: displays an external link to its URL object
* Type:Email: displays an e-mail address as a link (with mailto:)
* Type:Annotation URI: similar to Type:URL but with some technical differences in SMW's RDF export
Some extensions provide further types:
* Type:Geographic coordinate (provided by Semantic Maps): describes geographic locations. Different forms of geographic coordinates are supported.
http://semantic-mediawiki.org/wiki/Help:Properties_and_types
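Typed properties are attached to pages through in-text annotations of the form [[property::value]]. As a rough sketch, such annotations could be pulled out of retrieved wiki syntax with Python. The sample wikitext is made up, reusing the "disease" property and "HIV" value from the Rs333 example; the regular expression is a simplification and does not handle every annotation variant.

```python
import re

# Matches [[property::value]] and [[property::value|label]].
# Plain links ([[Reelin]]) and category links ([[Category:SNP]]) do not match,
# because they lack the double colon.
ANNOTATION = re.compile(r"\[\[([^\[\]:|]+)::([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def extract_annotations(wikitext: str) -> list[tuple[str, str]]:
    """Return (property, value) pairs for each semantic annotation found."""
    return [(prop.strip(), value.strip()) for prop, value in ANNOTATION.findall(wikitext)]


# Made-up wikitext for illustration.
sample = (
    "rs333 is associated with [[disease::HIV]] resistance. "
    "See also [[Reelin]] and [[Category:SNP]]."
)

print(extract_annotations(sample))
```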
SNPedia – RDF behind a wiki page
RDF (Resource Description Framework)
Triple: {subject, property/predicate, object}
Defines and describes data and the relations among data
Suitable for attaching metadata to resources
Understood by machines (not so much by humans…)
Normally in XML format
Alternative: RDFa (directly in XHTML pages)
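The triple model can be illustrated with plain Python tuples. This is a toy sketch, not an RDF library: the first triple reuses the Rs333 / disease / HIV example from the SNPedia slides, the other two are invented, and real RDF would use URIs rather than short names.

```python
# A triple is (subject, predicate, object).
Triple = tuple[str, str, str]

TRIPLES: list[Triple] = [
    ("Rs333", "disease", "HIV"),   # from the SNPedia example
    ("Rs333", "type", "SNP"),      # invented for illustration
    ("Reelin", "type", "Protein"), # invented for illustration
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t
        for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


# Everything known about Rs333:
print(match(TRIPLES, subject="Rs333"))
# Every resource that has a "type":
print([s for s, _, _ in match(TRIPLES, predicate="type")])
```

Pattern matching with wildcards is, in miniature, what a SPARQL query does over a much larger set of triples.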
RDF: Gene Ontology
OWL: Gene Ontology
SPARQL RDF query language; its name is a recursive acronym that stands for  SPARQL Protocol and RDF Query Language . Example query of Wikipedia:  http://dbpedia.org/sparql
Example query of biological resources:
http://www.semantic-systems-biology.org/biogateway/querying
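A hedged sketch of how a SPARQL query could be sent to the DBpedia endpoint listed above. The SPARQL protocol accepts the query text in a `query` URL parameter; the resource URI for Reelin is an assumption made for illustration, and the code only builds the request URL without contacting the server.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint from the slide

# List up to ten property/value pairs of one resource.
# The resource URI is assumed for illustration.
QUERY = """
SELECT ?property ?value WHERE {
  <http://dbpedia.org/resource/Reelin> ?property ?value .
}
LIMIT 10
"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a GET URL carrying the query in the standard 'query' parameter."""
    return f"{endpoint}?{urlencode({'query': query.strip()})}"


print(sparql_get_url(ENDPOINT, QUERY))
```

The `?property ?value` pattern is a triple with two unknowns, so the query reads much like a SQL SELECT over a three-column table.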
SPARQL
SPARQL
Semantic MediaWiki vs MediaWiki (I)
Semantic MediaWiki (like other semantic add-ons) is an extension of MediaWiki.
It can do at least as much as plain MediaWiki.
Better and more specific search capabilities: not only free-text search on pages.
Searching resembles querying a relational database (SPARQL =~ SQL).
Semantic MediaWiki vs MediaWiki (II)
Better browsing interface (browsing through properties, not only categories)
Import and export of the logical mesh
Easier exchange of information with 3rd-party applications (through RDF)
Protein-Wiki
Semantic wiki-based system for the management of a protein production service
Currently in testing phase
In collaboration with the CRG Protein Service
Customisation built after studying their present workflow and actual needs
Intended for internal use
Protein-Wiki. Advantages
> Cheaper than most comparable commercial solutions
> Open-source technology with a thriving community behind it
> Avoids vendor lock-in and abusive licensing
> Customisable to specific needs and transferable to other cases
Protein-Wiki. Example Workflow
[Flowchart. Roles: Researcher, Lab Member, Lab Manager, Finance Controller (order management system). Steps shown: access web interface, fill form, submit request, review scientific info, review financial info, review study, assign study to core members, create study, open study, retrieve SOP, perform all study steps, review study results, request review, prepare report, send and retrieve results/report, sign-off, receive invoice. Reviews end in Accept or Reject; several steps are marked as meetings. Annotations: quotation, order number, communication.]
Design: Guglielmo Roma
Protein-Wiki: User roles
Submit requests to the service using pre-defined templates, view the status of their requests at any time, and retrieve the study reports when experiments are complete
Can add and edit experimental data, but cannot create or delete experiments
Can create, edit and delete experiments associated with submitted requests, using pre-defined templates
Creation of new templates, user management and user training
Protein-Wiki: permissions & security
Login & role permissions: assigned automatically or via an administrator
Namespace-specific permissions: Experiment:: (only lab members/managers), Template:: (only administrators)
Page-specific permissions: by using user and parser functions extensions
Network?
Protein-Wiki  Homepage
Protein-Wiki . Request Form
Protein-Wiki . Request Form

Protein-Wiki. Request result page
Protein-Wiki. Enable experiment
Protein-Wiki. Experiment form: logical input restrictions, linked to the data type
Protein-Wiki. Semantic properties: allowed values, invalid value
Protein-Wiki. Conditional syntax: enable certain experiment sections if asked by the researcher or lab manager. Input value restriction at the form level. Example: only nucleotides allowed in primer sequences
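The primer restriction mentioned above amounts to a simple pattern check. A minimal Python sketch follows, assuming that "nucleotides" means only the letters A, C, G and T (no IUPAC ambiguity codes); the actual rule used in the wiki forms is not shown in the slides.

```python
import re

# Assumption: a valid primer contains only A, C, G or T, in either case.
NUCLEOTIDES = re.compile(r"[ACGT]+", re.IGNORECASE)


def is_valid_primer(sequence: str) -> bool:
    """Return True if the whole sequence consists of nucleotide letters."""
    return NUCLEOTIDES.fullmatch(sequence.strip()) is not None


for candidate in ("ATGCGTAC", "atgcgtac", "ATGXGTAC", ""):
    print(repr(candidate), is_valid_primer(candidate))
```

Validating at the form level means an invalid value is rejected before it is ever stored as a semantic property.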
Protein-Wiki. List of tasks: may be visible or not to researchers. Workload. Different fields depending on the user's role.
Protein-Wiki. List of tasks: any kind of customised list can be created from semantic properties.
Conclusions (I) Semantic MediaWiki (and other MediaWiki extensions) in lab workflow environments
Efficient collaboration between different users
Role-specific group permissions: researchers, lab members, lab managers, administrators
Well-known interface: everyone should have edited Wikipedia at least once!
Note-taking in the wiki for future consultation
Conclusions (II) Semantic MediaWiki (and other MediaWiki extensions) in lab workflow environments
Users can be both humans and robot scripts
Refined and specific queries
Logical connection with other semantically enabled software
Easy set-up of new environments (high-level programming): wiki templates, properties and forms instead of coding and database design
Conclusions (III) Semantic MediaWiki (and other MediaWiki extensions) in lab workflow environments
Tracking (page history and recent changes)
The workflow cannot be bypassed, except by the wiki administrator
The history cannot be forged, except by the system administrator
Permits 3rd-party quality-check auditing
Acknowledgments: Francesco Mancuso, Protein Service, Michela Bertero