Remote Procedure Calls  and Web Services Zachary G. Ives University of Pennsylvania CIS 455 / 555 – Internet and Web Systems March 3, 2009
Today Reminder HW2 Milestone 1 due tonight Distributed programming, concluded:  RPC and Web Services
Some Common Modes of Building Distributed Applications Data-intensive: XQuery (fetch XML from multiple sites, produce new XML) Turing-complete functional programming language Good for Web Services; not much support for I/O, etc. MapReduce (built over DHT or distributed file system) Single filter (map), followed by single aggregation (reduce) Languages over it:  Sawzall, Pig Latin, Dryad, … Message passing / request-response: e.g., over a DHT, sockets, or message queue Communication via asynchronous messages Processing in message handler loop Function calls: Remote procedure call / remote method invocation
Fully Synchronous Request/Response:  Remote Procedure Calls Remote procedure calls  have been around forever, including: COM+ CORBA Java RMI The basic idea:  put a function elsewhere in the system, call in distributed fashion but using standard languages, methods An RPC API defines a format for: Initiating a call on a particular server, generally in a reliable way Sending parameters ( marshalling ) to the server Receiving a return value, which may require marshalling as well And an RPC call is  synchronous  (i.e., it generally blocks)
A Remote Procedure Call Visualized time working server is busy request function server waits for req. client blocks RPC Server RPC Client
How RPC Generally Works You write an application with a series of functions One of these functions,  F , will be distributed remotely You call a “stub generator” A  caller stub  emulates the function  F : Opens a connection to the server Requests  F , marshalling all parameters Receives  F ’s return status and parameters A  server stub  emulates the caller: Receives a request for  F  with parameters Unmarshals the parameters, invokes  F Takes  F ’s return status (e.g., protection fault), return value, and marshals it back to the client
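The division of labor between the two stubs can be sketched in Java. This is a minimal in-process illustration, not the output of any real stub generator: the class names, the function `add` standing in for F, the status codes, and the byte layout are all assumptions, and the "network" is simply a byte array handed from one stub to the other.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical function F that lives "on the server": integer addition.
class RealFunctions {
    static int add(int a, int b) { return a + b; }
}

// Server stub: unmarshals the request, invokes F, marshals status and result.
class ServerStub {
    static byte[] handle(byte[] request) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(request));
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        String function = in.readUTF();
        if (!function.equals("add")) {
            out.writeInt(1);                 // status 1 (assumed code): unknown function
            return buf.toByteArray();
        }
        int a = in.readInt();
        int b = in.readInt();
        out.writeInt(0);                     // status 0 (assumed code): success
        out.writeInt(RealFunctions.add(a, b));
        return buf.toByteArray();
    }
}

// Caller stub: has F's signature, but marshals the call instead of computing it.
class CallerStub {
    static int add(int a, int b) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeUTF("add");             // which function is requested
            out.writeInt(a);                 // marshalled parameters
            out.writeInt(b);
            // A real stub would send these bytes over a connection and block for the reply.
            byte[] reply = ServerStub.handle(buf.toByteArray());
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(reply));
            int status = in.readInt();
            if (status != 0) {
                throw new IllegalStateException("remote call failed, status " + status);
            }
            return in.readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class RpcStubDemo {
    public static void main(String[] args) {
        System.out.println(CallerStub.add(2, 3));
    }
}
```

The caller only ever sees `CallerStub.add`, which is the point of the stub: the call site looks like an ordinary local function call.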
Passing Value Parameters Steps involved in doing remote computation through RPC 2-8
RPC Components Generally, you need to write: Your function, in a compatible language An  interface definition , analogous to a C header file, so other people can program for  F  without having its source Generally, software will take the interface definition and generate the appropriate stubs (In the case of Java, RMIC knows enough about Java to run directly on the source file) The server stubs will generally run in some type of daemon process on the server Each function will need a globally unique name or GUID
Parameter Passing Can Be Tricky Because of References The situation when passing an object by reference or by value. 2-18
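A small Java sketch of why references are tricky: once a parameter is marshalled, the callee works on a copy, so mutations no longer reach the caller's object. Java serialization stands in for marshalling here, and the `Order` class and method names are assumptions made for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical parameter type.
class Order implements Serializable {
    private static final long serialVersionUID = 1L;
    int quantity;
    Order(int quantity) { this.quantity = quantity; }
}

class ParameterPassingDemo {
    // Simulates marshalling: serialize, then deserialize into a fresh object.
    static Order marshalledCopy(Order original) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
                return (Order) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // The procedure mutates whatever object it is handed.
    static void doubleQuantity(Order o) { o.quantity *= 2; }

    // Local call: the callee shares the caller's object (pass by reference).
    static int afterLocalCall() {
        Order o = new Order(10);
        doubleQuantity(o);
        return o.quantity;
    }

    // Remote-style call: the callee only sees a marshalled copy (pass by value).
    static int afterRemoteStyleCall() {
        Order o = new Order(10);
        doubleQuantity(marshalledCopy(o));
        return o.quantity;
    }

    public static void main(String[] args) {
        System.out.println(afterLocalCall());
        System.out.println(afterRemoteStyleCall());
    }
}
```

An RPC system that wants by-reference semantics has to do extra work, for example copying the value back after the call or handing out a remote reference instead of the object itself.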
What Are the Hard Problems with RPC?  Esp. Inter-Language RPC? Resolving different data formats between languages (e.g., Java vs. Fortran arrays) Reliability, security Finding remote procedures in the first place Extensibility/maintainability (Some of these might look familiar from when we talked about data exchange!)
Web Services Goal:  provide an infrastructure for connecting components, building applications in a way similar to hyperlinks between data It’s another distributed computing platform for the Web Goal:  Internet-scale, language-independent, upwards-compatible where possible This one is based on many familiar concepts Standard protocols:  HTTP Standard marshalling formats:  XML-based, XML Schemas All new data formats are XML-based
Three Parts to Web Services “Wire” / messaging protocols Data encodings, RPC calls or document passing, etc. Describing what goes on the wire Schemas for the data “Service discovery” Means of finding web services
The Protocol Stacks of Web Services Enhanced + expanded from a figure from IBM’s “Web Services Insider”, http://www-106.ibm.com/developerworks/webservices/library/ws-ref2/ Other extensions SOAP Attachments WS-Security WS-AtomicTransaction, WS-Coordination SOAP,  XML-RPC XML XML Schema Service Description  (WSDL) Service Capabilities (WS-Capability) Message Sequencing Orchestration  (WS-BPEL) Inspection Directory (UDDI) Wire Format Stack Discovery Stack Description Stack WS-Addressing High-level state transition + msging diagrams between modules
Messaging Protocol: SOAP Simple Object Access Protocol: XML-based format for passing parameters Has a SOAP header and body inside an envelope Has a defined HTTP binding (POST with content-type of application/soap+xml) A companion specification, SOAP Attachments, encapsulates other (MIME) data The header defines information about processing: encoding, signatures, etc. It’s extensible, and there’s a special attribute called mustUnderstand that is attached to elements that must be supported by the callee The body defines the actual application-defined data
A SOAP Envelope
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://www.w3.org/2001/12/soap-envelope"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <SOAP-ENV:Header>
    <t:Transaction xmlns:t="www.mytrans.com" SOAP-ENV:mustUnderstand="1" />
  </SOAP-ENV:Header>
  <SOAP-ENV:Body>
    <m:PlaceOrder xmlns:m="www.somewhere/there">
      <orderno xsi:type="xsd:string">12</orderno>
    </m:PlaceOrder>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
Making a SOAP Call To execute a call to service PlaceOrder:
POST /PlaceOrder HTTP/1.1
Host: my.server.com
Content-Type: application/soap+xml; charset="utf-8"
Content-Length: nnn

<SOAP-ENV:Envelope> … </SOAP-ENV:Envelope>
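As a rough illustration of where the `nnn` in Content-Length comes from, the request text can be assembled in Java. The class and method names are hypothetical and nothing is sent over the network; the sketch only shows that the length is the byte count of the encoded envelope, not its character count.

```java
import java.nio.charset.StandardCharsets;

class SoapRequestBuilder {
    // Builds the raw HTTP request text for a SOAP call (illustration only, no I/O).
    static String buildPost(String path, String host, String envelope) {
        // Content-Length counts bytes of the UTF-8 encoded body.
        int length = envelope.getBytes(StandardCharsets.UTF_8).length;
        return "POST " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Content-Type: application/soap+xml; charset=\"utf-8\"\r\n"
             + "Content-Length: " + length + "\r\n"
             + "\r\n"                        // blank line separates headers from body
             + envelope;
    }

    public static void main(String[] args) {
        // Placeholder body; a real call would pass a complete SOAP envelope.
        System.out.print(buildPost("/PlaceOrder", "my.server.com", "<placeholder/>"));
    }
}
```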
SOAP Return Values If the call succeeds, the SOAP response will generally be another SOAP message carrying the return data values, much like the request If it fails, the contents of the SOAP envelope will generally be a Fault message, along the lines of:
<SOAP-ENV:Body>
  <SOAP-ENV:Fault xmlns="mynamespace">
    <faultcode>SOAP-ENV:Client</faultcode>
    <faultstring>Could not parse message</faultstring>
    …
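A caller typically has to check the response body for a Fault before reading return values. The following Java sketch does that with the JDK's DOM parser; the class name, the sample messages, and the choice to return `null` when there is no fault are assumptions, and the envelope namespace is the one used on the envelope slide.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class SoapFaultCheck {
    // Envelope namespace as written on the envelope slide.
    static final String SOAP_NS = "http://www.w3.org/2001/12/soap-envelope";

    // Hypothetical sample responses, used only for illustration.
    static final String SAMPLE_FAULT =
        "<SOAP-ENV:Envelope xmlns:SOAP-ENV=\"" + SOAP_NS + "\"><SOAP-ENV:Body>"
      + "<SOAP-ENV:Fault xmlns=\"mynamespace\">"
      + "<faultcode>SOAP-ENV:Client</faultcode>"
      + "<faultstring>Could not parse message</faultstring>"
      + "</SOAP-ENV:Fault></SOAP-ENV:Body></SOAP-ENV:Envelope>";
    static final String SAMPLE_OK =
        "<SOAP-ENV:Envelope xmlns:SOAP-ENV=\"" + SOAP_NS + "\"><SOAP-ENV:Body>"
      + "<result>ok</result></SOAP-ENV:Body></SOAP-ENV:Envelope>";

    // Returns the faultstring text if the response contains a Fault, else null.
    static String faultString(String responseXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);   // required for namespace-based lookups
            Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(responseXml.getBytes(StandardCharsets.UTF_8)));
            NodeList faults = doc.getElementsByTagNameNS(SOAP_NS, "Fault");
            if (faults.getLength() == 0) {
                return null;
            }
            Element fault = (Element) faults.item(0);
            // "*" matches faultstring in whatever namespace the service used.
            NodeList strings = fault.getElementsByTagNameNS("*", "faultstring");
            return strings.getLength() == 0 ? "" : strings.item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("response is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(faultString(SAMPLE_FAULT));
        System.out.println(faultString(SAMPLE_OK));
    }
}
```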
How Do We Declare Functions? WSDL is the interface definition language for web services Defines notions of protocol bindings, ports, and services Generally describes data types using XML Schema In CORBA, this was called an IDL In Java, the interface uses the same language as the Java code
A WSDL Service Service Port Port Port PortType Operation Operation PortType Operation Operation PortType Operation Operation Binding Binding Binding
Web Service Terminology Service:  the entire Web Service Port:  maps a set of port types to a transport binding (a protocol, frequently SOAP, COM, CORBA, …) Port Type:  abstract grouping of operations, i.e. a class Operation:  the type of operation – request/response, one-way Input message and output message; maybe also fault message Types:  the XML Schema type definitions
Example WSDL
<service name="POService">
  <port binding="my:POBinding">
    <soap:address location="http://yyy:9000/POSvc"/>
  </port>
</service>
<binding xmlns:my="…" name="POBinding">
  <soap:binding style="rpc" transport="http://www.w3.org/2001/..." />
  <operation name="POrder">
    <soap:operation soapAction="POService/POBinding" style="rpc" />
    <input name="POrder">
      <soap:body use="literal" … namespace="POService" …/>
    </input>
    <output name="POrderResult">
      <soap:body use="literal" … namespace="POService" …/>
    </output>
  </operation>
</binding>
JAX-RPC: Java and Web Services To write a JAX-RPC web service “endpoint”, you need two parts: An endpoint interface – this is basically like the IDL statement An implementation class – your actual code
public interface BookQuote extends java.rmi.Remote {
    public float getBookPrice(String isbn) throws java.rmi.RemoteException;
}
public class BookQuote_Impl_1 implements BookQuote {
    // Note the f suffix: a bare 3.22 is a double and cannot be returned as a float.
    public float getBookPrice(String isbn) { return 3.22f; }
}
Different Options for Calling The conventional approach is to generate a stub, as in the RPC model described earlier You can also dynamically generate the call to the remote interface, e.g., by looking up an interesting function to call Finally, the “DII” (Dynamic Invocation Interface) method allows you to assemble the SOAP call on your own
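The idea of building the call at run time, rather than from a generated stub, can be illustrated with the JDK's own `java.lang.reflect.Proxy`. This is an analogy and not JAX-RPC code: the handler below only records what it would send and returns a fixed reply, and `BookQuoteLocal`, `DynamicCallDemo`, and the ISBN value are hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Arrays;

// Hypothetical local view of the service interface.
interface BookQuoteLocal {
    float getBookPrice(String isbn);
}

class DynamicCallDemo {
    // Records the last call the handler would have marshalled and sent.
    static String lastCall;

    static BookQuoteLocal newProxy() {
        InvocationHandler handler = (proxy, method, args) -> {
            // A real handler would marshal method name and arguments into a
            // request, send it, and unmarshal the reply.
            lastCall = method.getName() + Arrays.toString(args);
            return 3.22f;                    // fixed stand-in for the remote reply
        };
        return (BookQuoteLocal) Proxy.newProxyInstance(
            BookQuoteLocal.class.getClassLoader(),
            new Class<?>[] { BookQuoteLocal.class },
            handler);
    }

    public static void main(String[] args) {
        BookQuoteLocal quote = newProxy();
        System.out.println(quote.getBookPrice("0131"));
        System.out.println(lastCall);
    }
}
```

No stub class is generated ahead of time here; the object implementing the interface is created when `newProxy()` runs.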
Creating a Java Web Service A compiler called wscompile is used to generate your WSDL file and stubs You need to start with a configuration file that says something about the service you’re building and the interfaces that you’re converting into Web Services
Example Configuration File
<?xml version="1.0" encoding="UTF-8"?>
<configuration xmlns="http://java.sun.com/xml/ns/jax-rpc/ri/config">
  <service name="StockQuote"
      targetNamespace="http://example.com/stockquote.wsdl"
      typeNamespace="http://example.com/stockquote/types"
      packageName="stockqt">
    <interface name="stockqt.StockQuoteProvider"
        servantName="stockqt.StockQuoteServiceImpl"/>
  </service>
</configuration>
Starting a WAR The Web Service version of a Java JAR file is a Web Archive, WAR There’s a tool called wsdeploy that generates WAR files Generally this will automatically be called from a build tool such as Ant Finally, you may need to add the WAR file to the appropriate location in Apache Tomcat (or WebSphere, etc.) and enable it See  http://java.sun.com/developer/technicalArticles/WebServices/WSPack2/jaxrpc.html  for a detailed example
Finding a Web Service UDDI: Universal Description, Discovery, and Integration registry Think of it as DNS for web services It’s a replicated database, hosted by IBM, HP, SAP, MS UDDI takes SOAP requests to add and query web service interface data
What’s in UDDI White pages: Information about business names, contact info, Web site name, etc. Yellow pages: Types of businesses, locations, products Includes predefined taxonomies for location, industry, etc. Green pages – what we probably care the most about: How to interact with business services; business process definitions; etc Pointer to WSDL file(s) Unique ID for each service
Data Types in UDDI businessEntity :  top-level structure describing info about the business businessService :  name and description of a service bindingTemplate :  how to access the service tModel  (t = type/technical) :  unique identifier for each service-template specification publisherAssertion :  describes relationship between  businessEntities  (e.g., department, division)
Relationships between UDDI Structures publisherAssertion businessEntity businessService bindingTemplate tModel n 2 1 n 1 n m n
Example UDDI businessEntity
<businessEntity businessKey="0123…" xmlns="urn:uddi-org:api_v2">
  <discoveryURLs>
    <discoveryURL useType="businessEntity">
      http://uddi.ibm.com/registery/uddiget?businessKey=0123 ...
    </discoveryURL>
  </discoveryURLs>
  <name>My Books</name>
  <description>Technical Book Wholesaler</description>
  …
  <businessServices> … </businessServices>
  <identifierBag> <!-- keyedReferences to tModels --> </identifierBag>
  <categoryBag> … </categoryBag>
</businessEntity>
UDDI in Perspective Original idea was that it would just organize itself in a way that people could find anything they wanted Today UDDI is basically a very simple catalog of services, which can be queried with standard APIs It’s not clear that it really does what people really want:  they want to find services “like Y” or “that do Z”
The Problem: With UDDI and Plenty of Other Situations There’s no universal, unambiguous way of describing “what I mean” Relational database idea of “normalization” doesn’t convert concepts into some normal form – it just helps us cluster our concepts in meaningful ways “Knowledge representation” tries to encode definitions clearly – but even then, much is up to interpretation The best we can do: describe how things relate pollo = chicken = poulet = 雞 = 鸡 = jī = मुर्गी = murg Note that this mapping may be imprecise or situation-specific! Calling someone a chicken, vs. a chicken that’s a bird
This Brings Us Back to XQuery, Whose Main Role Is to Relate XML Suppose we define an  XML schema  for our target data and our source data A  view  is a stored query Function from a set of (XML) sources to an XML output In fact, in XQuery, a view is actually called a function Can directly translate between XML schemas or structures Describes a relationship between two items Transform 2 into 6 by “add 4” operation Convert from S1 to S2 by applying the query described by view V Often, we don’t need to transfer  all  data – instead, we want to use the data at one source to help answer a query over another source…
Lazy Evaluation: A  Virtual  View Source2.xml Source1.xml Virtual XML doc. XQuery Query Form Browser/App Server(s) Query Results XQuery Source2.xml Source1.xml Composed XQuery HTML XSLT
Let’s Look at Some Simple Mappings Beginning with examples of using XQuery to convert from one schema to another, e.g., to import data First:  let’s review what our XQuery mappings need to accomplish…
Challenges of Mapping Schemas In a perfect world, it would be easy to match up items from one schema with another Each  element  would have a simple correspondence to an element in the other schema Every  value  would clearly map to a value in the other schema Real world:  as with human languages, things don’t map clearly! Different decompositions into elements Different structures Tag name vs. value Values may not exactly correspond It may be unclear whether a value is the same It’s a tough job, but often things can be mapped
Example Schemas Bob’s Movie Database <movie>   <title>…</title>   <year>…</year>   <director>…</director>   <editor>…</editor>   <star>…</star>* </movie>* Mary’s Art List <workOfArt>   <id>…</id>   <type>…</type>   <artist>…</artist>   <subject>…</subject>   <title>…</title> </workOfArt>* Want to map data from one schema to the other
Mapping Bob’s Movies → Mary’s Art Start with the schema of the output as a template:
<workOfArt>
  <id>$i</id>
  <type>$y</type>
  <artist>$a</artist>
  <subject>$s</subject>
  <title>$t</title>
</workOfArt>
Then figure out where to find the values in the source, and create XPaths
The Final Schema Mapping Mary’s Art ← Bob’s Movies
for $m in doc("movie.xml")//movie,
    $a in $m/director/text(),
    $i in $m/title/text(),
    $t in $m/title/text()
return
  <workOfArt>
    <id>{$i}</id>
    <type>movie</type>
    <artist>{$a}</artist>
    <title>{$t}</title>
  </workOfArt>
Note the absence of subject … We had no reasonable source, so we are leaving it out. (The braces around each variable make XQuery evaluate it; without them the element would contain the literal text “$i”.)
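For comparison, roughly the same mapping can be written in Java with the JDK's XPath API. This is a sketch under simplifying assumptions, not a translation the slides provide: it assumes each movie has one title and one director, it builds the output as a string, and the class name and sample input are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class MovieToArtMapper {
    // Escapes the characters that would break element content.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Maps every <movie> in the input to a <workOfArt>, as in the mapping above.
    static String map(String moviesXml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList movies = (NodeList) xpath.evaluate(
                "//movie", new InputSource(new StringReader(moviesXml)),
                XPathConstants.NODESET);
            StringBuilder out = new StringBuilder();
            for (int k = 0; k < movies.getLength(); k++) {
                // Relative XPaths, evaluated against the current movie element.
                String title = xpath.evaluate("title", movies.item(k));
                String director = xpath.evaluate("director", movies.item(k));
                out.append("<workOfArt>")
                   .append("<id>").append(escape(title)).append("</id>")
                   .append("<type>movie</type>")
                   .append("<artist>").append(escape(director)).append("</artist>")
                   .append("<title>").append(escape(title)).append("</title>")
                   .append("</workOfArt>");
                // As in the mapping above, there is no source for <subject>.
            }
            return out.toString();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("could not evaluate XPath on input", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample input.
        System.out.println(map("<movies><movie><title>Example Title</title>"
            + "<year>2000</year><director>Example Director</director></movie></movies>"));
    }
}
```

One difference from the XQuery version: the `for` clause there produces no output for a movie that lacks a director or title, while this sketch would emit empty elements.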
