Creating Knowledge out of Interlinked Data
          MultilingualWeb – 2012/06/11 Dublin – Page 1                           http://lod2.eu




        Linked Data in Linguistics
      for NLP and Web Annotation



                                                            http://nlp2rdf.org
                                                              http://lod2.eu
                                                         Sebastian Hellmann
                                                            AKSW, Universität Leipzig




 The Semantic Gap

          Turning Walled Gardens into Park Networks of
          Semantic Linguistic Data
How can we leverage the Data Web for natural language processing?
 1. Use the Data Web as background knowledge for NLP
    – 50 billion facts covering all kinds of domains are readily available
    – Leverage the wisdom of the crowds
 2. Use Data Web technologies for integrating NLP tools & approaches
    – RDF is all about semantic interoperability
 3. Make the output of NLP tools available on the Data Web
    – On the Web, by sharing and copying, the value of information increases

 1. Use the Data Web as
 background knowledge for NLP




                                               Linguistic Data currently filed
                                                   under “cross-domain”

    1. Use the Data Web as
    background knowledge for NLP


Three communities with three resources:
 • Working Group for Open Linguistics Data (OWLG)
    – > http://linguistics.okfn.org
 • DBpedia Internationalization Committee
    – > http://wiki.dbpedia.org/Internationalization
 • Wiktionary2RDF Wrappers
    – > http://dbpedia.org/Wiktionary
All communities are open, please join!




 The Linguistic Linked Open Data Cloud




 Main question




 Wiktionary2RDF – Mediator Wrapper
                                               http://dbpedia.org/Wiktionary




 Wiktionary2RDF – Mediator Wrapper
                                               http://dbpedia.org/Wiktionary


                                                                 Mediator
                                                                  Lemon

          2. Use Data Web Technologies for
          Integrating NLP Tools and Approaches


Golden Hammer Anti-pattern



The question is not whether to use RDF and Linked Data, but when to use...




  Image from http://pbmo.wordpress.com/2011/09/29/maslows-hammer/

   2. Use Data Web Technologies for
   Integrating NLP Tools and Approaches




• Ontologies provide (formal) documentation, comparable to UML or ERD
• The structure is easy to understand
• A wide range of RDF tools can be used, e.g. the LOD2 Stack
• Indexing and querying across the big picture become possible
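
A minimal sketch of the indexing and querying point: once the output of different tools is expressed as triples, it can be loaded into one graph and queried together. The `TinyGraph` class, the `http://example.org/` namespace, and the `posTag` and `linksTo` properties are hypothetical names chosen for illustration; a real setup would use an RDF store and an established vocabulary.

```python
class TinyGraph:
    """A toy in-memory triple store with a predicate index (illustration only)."""

    def __init__(self):
        self.triples = set()
        self.by_predicate = {}

    def add(self, s, p, o):
        triple = (s, p, o)
        self.triples.add(triple)
        self.by_predicate.setdefault(p, set()).add(triple)

    def match(self, s=None, p=None, o=None):
        """Return all triples matching the pattern; None acts as a wildcard."""
        candidates = self.triples if p is None else self.by_predicate.get(p, set())
        return sorted(
            t for t in candidates
            if (s is None or t[0] == s) and (o is None or t[2] == o)
        )


EX = "http://example.org/"           # hypothetical namespace
word = EX + "doc1#offset_0_6"        # hypothetical URI for one substring

graph = TinyGraph()
# Output of a hypothetical part-of-speech tagger
graph.add(word, EX + "posTag", "NNP")
# Output of a hypothetical entity linker, merged into the same graph
graph.add(word, EX + "linksTo", "http://dbpedia.org/resource/Dublin")

# One query now sees what both tools said about the same substring
about_word = graph.match(s=word)
pos_tags = graph.match(p=EX + "posTag")
```

Because both tools refer to the substring by the same URI, no tool-specific glue code is needed to combine their results.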

      2. Use Data Web Technologies for
      Integrating NLP Tools and Approaches

  The NLP Interchange Format (NIF) is an RDF/OWL-based
  format that aims to achieve interoperability between Natural
  Language Processing (NLP) tools, language resources and
  annotations.
• Road map
   • Bootstrapped by LOD2, but a community project
   • First release in September 2011
   • Strong response
      – Over 50 people joined the mailing list:
         http://lists.okfn.org/mailman/listinfo/open-linguistics
      – First third-party implementations and contributions
      – Several projects are discussing usage
   • Currently setting up an advisory board; next draft in July
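
To make the idea of an RDF-based annotation format concrete, here is a rough sketch that gives a substring of a document its own URI and describes it with a few triples in N-Triples syntax. The `#offset_begin_end` fragment pattern and the property names under `http://example.org/vocab#` are placeholders assumed for illustration, not the actual NIF vocabulary, which is documented at http://nlp2rdf.org.

```python
def offset_uri(doc_uri, text, begin, end):
    """Mint a URI for text[begin:end] using an offset-based fragment (assumed scheme)."""
    if not 0 <= begin < end <= len(text):
        raise ValueError("offsets out of range")
    return f"{doc_uri}#offset_{begin}_{end}"


def literal(value):
    """Serialise a Python string as an N-Triples literal with basic escaping."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def annotate(doc_uri, text, begin, end, vocab="http://example.org/vocab#"):
    """Return N-Triples lines describing one substring of the document.

    The property names are hypothetical placeholders, not NIF terms.
    """
    s = offset_uri(doc_uri, text, begin, end)
    return [
        f"<{s}> <{vocab}anchorOf> {literal(text[begin:end])} .",
        f'<{s}> <{vocab}beginIndex> "{begin}" .',
        f'<{s}> <{vocab}endIndex> "{end}" .',
        f"<{s}> <{vocab}referenceContext> <{doc_uri}> .",
    ]


text = "Dublin is the capital of Ireland."
lines = annotate("http://example.org/doc1", text, 0, 6)
print("\n".join(lines))
```

Any tool that follows the same URI scheme mints the same identifier for the same substring, which is what lets independent annotations be merged.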




S. Auer and S. Hellmann: The Web of Data: Decentralized, collaborative, interlinked and interoperable
LREC 2012, http://www.lrec-conf.org/proceedings/lrec2012/keynotes/LREC%202012.Keynote%20Speech%201.Soeren%20Auer.pdf

        3. Make the Output of NLP Tools
         available on the Web




Currently there is no standard mechanism to transparently
combine the WWW, GGG and NLP




GGG = Giant Global Graph (basically the Web of Data)
see: http://dig.csail.mit.edu/breadcrumbs/node/215

 3. Make the Output of NLP Tools
  available on the Web

           3. Make the Output of NLP Tools
            available on the Web




http://dbpedia.org/spotlight
P. Mendes et al.: DBpedia Spotlight: Shedding light on the web of documents. In: I-Semantics, 2011
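
As an illustration of putting entity annotations back onto the Web, the sketch below turns a list of annotated spans into HTML links. It is not DBpedia Spotlight's API: the `link_entities` function and the `(begin, end, uri)` tuple layout are assumptions made for this example, and the annotations are written by hand rather than produced by a tool.

```python
from html import escape


def link_entities(text, annotations):
    """Wrap annotated spans in <a> tags.

    annotations: iterable of (begin, end, uri) tuples with character offsets.
    Overlapping spans are rejected to keep the sketch simple.
    """
    parts, cursor = [], 0
    for begin, end, uri in sorted(annotations):
        if begin < cursor or end > len(text) or begin >= end:
            raise ValueError("overlapping or out-of-range annotation")
        parts.append(escape(text[cursor:begin]))
        parts.append(f'<a href="{escape(uri, quote=True)}">{escape(text[begin:end])}</a>')
        cursor = end
    parts.append(escape(text[cursor:]))
    return "".join(parts)


text = "Leipzig & Dublin"
annotations = [
    (10, 16, "http://dbpedia.org/resource/Dublin"),
    (0, 7, "http://dbpedia.org/resource/Leipzig"),
]
html_output = link_entities(text, annotations)
print(html_output)
```

Sorting the spans first means the annotations can arrive in any order, for example from several tools.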

 3. Make the Output of NLP Tools
  available on the Web




http://annotateit.org
http://sourceforge.net/projects/fragmentlinks/

         3. Make the Output of NLP Tools
          available on the Web

        NLP Interchange Format (NIF): join the mailing list at
                         http://nlp2rdf.org




Hellmann et al.: Towards an Ontology for Representing Strings. In: EKAW 2012
    http://svn.aksw.org/papers/2012/WWW_NIF/public/string_ontology.pdf




            Contact

            Address

            University of Leipzig
            Faculty of Mathematics and Computer
            Science
            Institute of Computer Science
            Department of Business Information
            Systems

            Postfach 100920
            04009 Leipzig
            Germany



     Project: http://lod2.eu
     Organisation: http://uni-leipzig.de, http://aksw.org
     Presenter: http://bis.informatik.uni-leipzig.de/SebastianHellmann
     NLP2RDF page: http://nlp2rdf.org

     CC-BY-SA unless otherwise stated
     Acknowledgement: some slides are taken from the keynote of Sören Auer at LREC 2012

     Thanks for your
