XML Information Set
• An abstract data set used to describe the information contained in a
  well-formed XML document
• Provides a consistent set of definitions for use in other specifications
  that need to refer to the information in a well-formed XML document

• Not exhaustive: includes only the information expected to be useful in
  future specifications
• Not a minimum set of information that an XML processor must return
• Analogous to a tree
• An XML document has an information set if it is well-formed
  and satisfies certain namespace constraints (it must be namespace-well-formed)
  • It is not required to be valid
  • An infoset may also be created by means other than parsing an XML
    document, for example through an API
• An XML document's infoset
  • Consists of a number of information items
  • Always contains at least the document information item and several others

• Information item
  • An abstract description of some part of an XML document
  • Has a set of associated named properties

• "Information set" and "information item" are analogous to the generic
  terms "tree" and "node"
• There are 11 types of information items:
  1. Document
  2. Element
  3. Attribute
  4. Processing Instruction
  5. Unexpanded Entity Reference
  6. Character
  7. Comment
  8. Document Type Declaration
  9. Unparsed Entity
  10. Notation
  11. Namespace

• Each information item has properties
  • A property named ‘xyz’ is written [xyz]
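To make the item/property vocabulary concrete, here is a minimal Python sketch. The `InfoItem` class and the `build_greeting` and `walk` helpers are hypothetical illustrations (not part of any library or of the specification); property lookups accept the spec's bracket notation, and the infoset for `<greeting>Hi</greeting>` is built by hand rather than by a parser.

```python
class InfoItem:
    """Hypothetical model: an information item is a kind plus named properties."""

    def __init__(self, kind, properties=None):
        self.kind = kind
        self.properties = dict(properties or {})

    def __getitem__(self, name):
        # Look up a property using the spec's notation, e.g. item["[children]"]
        if not (name.startswith("[") and name.endswith("]")):
            raise KeyError(f"expected bracket notation such as '[{name}]'")
        return self.properties[name[1:-1]]


def build_greeting():
    """Build the infoset of <greeting>Hi</greeting> by hand (no parser involved)."""
    doc = InfoItem("document")
    root = InfoItem("element", {"local name": "greeting", "parent": doc})
    root.properties["children"] = [
        InfoItem("character", {"character code": ord(ch), "parent": root})
        for ch in "Hi"
    ]
    doc.properties["children"] = [root]
    doc.properties["document element"] = root
    return doc


def walk(item):
    """Yield every item reachable from `item` by following [children], depth first."""
    yield item
    for child in item.properties.get("children", []):
        yield from walk(child)


doc = build_greeting()
print([item.kind for item in walk(doc)])
# ['document', 'element', 'character', 'character']
```

A fuller model would also follow [attributes], [namespace attributes] and [in-scope namespaces]; only [children] is walked here to keep the tree analogy visible.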
• There is exactly one document information item in the infoset of
  an XML document
• All other information items are accessible from the properties of
  the document information item, either directly or indirectly
  through the properties of other information items
• Has properties
  • [children]           • [unparsed entities]           • [standalone]
  • [document element]   • [base URI]                    • [version]
  • [notations]          • [character encoding scheme]   • [all declarations processed]
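As a rough illustration, Python's `xml.dom.minidom` exposes a DOM `Document` node that plays the role of the document information item. The mapping of [children] to `childNodes` and [document element] to `documentElement` is our own approximation, not something the infoset specification mandates; `document_item` is a hypothetical helper. Note that the XML declaration does not become a child node.

```python
from xml.dom import minidom, Node

SOURCE = """<?xml version="1.0"?>
<!-- prolog comment -->
<catalog>
  <item>first</item>
</catalog>
<?trailer done?>"""


def document_item(text):
    """Map a few document-information-item properties onto a minidom Document."""
    doc = minidom.parseString(text)
    return {
        # [children]: comments, PIs and the document element, in document order
        "[children]": [node.nodeType for node in doc.childNodes],
        # [document element]: the root of the element tree
        "[document element]": doc.documentElement.tagName,
    }


info = document_item(SOURCE)
print(info["[document element]"])  # catalog
```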
• There is an element information item for each element
  appearing in the XML document
  • One of the element information items is the value of the [document element]
    property of the document information item; it corresponds to the root of the
    element tree

  • All other element information items are accessible by recursively following
    its [children] property

• Has properties
  • [namespace name]       • [children]                      • [in-scope namespaces]
  • [local name]           • [attributes]                    • [base URI]
  • [prefix]               • [namespace attributes]          • [parent]
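The sketch below, assuming Python's `xml.dom.minidom` (which parses with namespace processing by default), reaches every element by recursively following `childNodes`, the DOM counterpart of [children], and reports four element properties. The helper names `element_items` and `describe` are hypothetical; DOM uses `None` where the infoset says a property "has no value".

```python
from xml.dom import minidom, Node

SOURCE = """<b:book xmlns:b="urn:example:books">
  <b:title>Infosets</b:title>
  <note>plain</note>
</b:book>"""


def element_items(node):
    """Recursively follow childNodes ([children]), yielding elements in document order."""
    for child in node.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            yield child
            yield from element_items(child)


def describe(text):
    doc = minidom.parseString(text)
    rows = []
    for el in element_items(doc):  # start at the document so the root is included
        rows.append({
            "[namespace name]": el.namespaceURI,
            "[local name]": el.localName,
            "[prefix]": el.prefix,
            "[parent]": el.parentNode.nodeName,  # "#document" for the root
        })
    return rows


for row in describe(SOURCE):
    print(row)
```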
• There is an attribute information item for each attribute
  (specified or defaulted) of each element in the document,
  including those that are namespace declarations

  • Namespace declarations, however, appear as members of the element's
    [namespace attributes] property rather than its [attributes] property

• Has properties
  • [namespace name]         • [normalized value]      • [references]
  • [local name]             • [specified]             • [owner element]
  • [prefix]                 • [attribute type]
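Two attribute rules can be demonstrated with Python's standard library, as a sketch under our own assumptions. First, `xml.dom.minidom` keeps `xmlns` declarations as DOM attributes in the reserved `XMLNS_NAMESPACE`, so they can be split off into [namespace attributes]. Second, the expat parser's `specified_attributes` flag controls whether attributes defaulted from an `ATTLIST` declaration are reported, which mirrors the [specified] property. The helpers `split_attributes` and `root_attribute_names` are hypothetical.

```python
from xml.dom import minidom, XMLNS_NAMESPACE
from xml.parsers import expat

NS_SOURCE = '<item xmlns:x="urn:example:x" id="i1" x:rank="2"/>'

DTD_SOURCE = """<!DOCTYPE item [
  <!ATTLIST item status CDATA "draft">
]>
<item id="i1"/>"""


def split_attributes(text):
    """Partition the root's attributes into [attributes] and [namespace attributes]."""
    root = minidom.parseString(text).documentElement
    attributes, namespace_attributes = [], []
    for i in range(root.attributes.length):
        attr = root.attributes.item(i)
        # Namespace declarations carry the reserved xmlns namespace name
        if attr.namespaceURI == XMLNS_NAMESPACE:
            namespace_attributes.append(attr.name)
        else:
            attributes.append(attr.name)
    return sorted(attributes), sorted(namespace_attributes)


def root_attribute_names(text, specified_only):
    """Root attribute names as expat reports them, with or without DTD defaults."""
    found = {}
    parser = expat.ParserCreate()
    # True: report only attributes written in the start-tag ([specified] is true)
    parser.specified_attributes = specified_only

    def start(name, attrs):
        found.setdefault("root", sorted(attrs))  # record the first element only

    parser.StartElementHandler = start
    parser.Parse(text, True)
    return found.get("root", [])


print(split_attributes(NS_SOURCE))
defaulted = set(root_attribute_names(DTD_SOURCE, False)) - set(root_attribute_names(DTD_SOURCE, True))
print(defaulted)  # {'status'}
```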
• There is a processing instruction information item for each
  processing instruction in the document

• The XML declaration and text declarations for external parsed
  entities are not considered processing instructions

• Has properties
  • [target]     • [notation]
  • [content]    • [parent]
  • [base URI]
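A small check of the rule that the XML declaration is not a processing instruction, assuming Python's `xml.dom.minidom`: DOM processing-instruction nodes expose `target` and `data`, which we map to [target] and [content]. The `pi_items` helper is hypothetical.

```python
from xml.dom import minidom, Node

SOURCE = """<?xml version="1.0"?>
<?xml-stylesheet href="style.css" type="text/css"?>
<doc><?render fast?></doc>"""


def pi_items(node):
    """Yield [target], [content] and [parent] for every PI below `node`."""
    for child in node.childNodes:
        if child.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
            yield {
                "[target]": child.target,
                "[content]": child.data,
                "[parent]": child.parentNode.nodeName,
            }
        yield from pi_items(child)  # PI and text nodes simply have no children


items = list(pi_items(minidom.parseString(SOURCE)))
print([item["[target]"] for item in items])  # the XML declaration is not listed
```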
• An unexpanded entity reference information item serves as a
  placeholder by which an XML processor can indicate that it has
  not expanded an external parsed entity
• A validating XML processor, or a non-validating processor that
  reads all external general entities, will never generate
  unexpanded entity reference information items for a valid
  document
• Has properties
  • [name]                • [declaration base URI]
  • [system identifier]   • [parent]
  • [public identifier]
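The sketch below plays the part of a non-validating processor that chooses not to read an external parsed entity. It uses Python's `xml.parsers.expat`: the `ExternalEntityRefHandler` is called for `&chap1;`, and instead of fetching the file it records a placeholder with infoset-style keys. Recovering [name] by matching the system identifier from `EntityDeclHandler` is our own simplification (it would be ambiguous if two entities shared a system identifier); `unexpanded_references` is a hypothetical helper.

```python
from xml.parsers import expat

SOURCE = """<!DOCTYPE doc [
  <!ENTITY chap1 SYSTEM "chap1.xml">
]>
<doc>&chap1;</doc>"""


def unexpanded_references(text):
    declared = {}      # system identifier -> entity name, taken from the DTD
    placeholders = []  # one dict per reference we chose not to expand
    parser = expat.ParserCreate()

    def on_entity_decl(name, is_param, value, base, system_id, public_id, notation):
        if not is_param and system_id is not None:
            declared[system_id] = name

    def on_external_ref(context, base, system_id, public_id):
        # Do not fetch or parse the entity; just record a placeholder item.
        placeholders.append({
            "[name]": declared.get(system_id),
            "[system identifier]": system_id,
            "[public identifier]": public_id,
            "[declaration base URI]": base,
        })
        return 1  # non-zero tells expat to carry on parsing

    parser.EntityDeclHandler = on_entity_decl
    parser.ExternalEntityRefHandler = on_external_ref
    parser.Parse(text, True)
    return placeholders


print(unexpanded_references(SOURCE))
```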
• There is a character information item for each data character
  that appears in the document, whether literally, as a character
  reference, or within a CDATA section

• Each character is a logically separate information item, but XML
  applications are free to chunk characters into larger groups as
  necessary or desirable

• Has properties
  • [character code]               • [parent]
  • [element content whitespace]
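Because literal characters, character references and CDATA sections all yield the same character items, a parser cannot tell them apart afterwards. A quick check with Python's `xml.etree.ElementTree`, which also illustrates the "chunking" freedom by returning one string per text run; `character_codes` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Three spellings of the same two characters: literal, character references, CDATA
VARIANTS = [
    "<p>AB</p>",
    "<p>&#65;&#x42;</p>",
    "<p><![CDATA[AB]]></p>",
]


def character_codes(text):
    """Return the [character code] of each character in the root's text content."""
    root = ET.fromstring(text)
    # ElementTree hands back one chunked string (.text), as the spec permits
    return [ord(ch) for ch in (root.text or "")]


for variant in VARIANTS:
    print(character_codes(variant))  # [65, 66] each time
```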
• There is a comment information item for each XML comment
  in the original document, except for those appearing in the DTD
  (which are not represented)

• Has properties
  • [content]          • [parent]
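Expat reports comments wherever they occur, including inside the DTD, so an infoset builder has to drop the DTD ones itself. A minimal sketch with Python's `xml.parsers.expat`, tracking whether we are between the doctype start and end callbacks and which parent is currently open; `comment_items` is a hypothetical helper.

```python
from xml.parsers import expat

SOURCE = """<!DOCTYPE doc [
  <!-- a comment in the DTD -->
  <!ELEMENT doc ANY>
]>
<!-- before the root -->
<doc><!-- inside --></doc>"""


def comment_items(text):
    """Collect comment items with [content] and [parent], skipping DTD comments."""
    items = []
    parents = ["#document"]  # stack of open parents; the document is outermost
    in_dtd = False
    parser = expat.ParserCreate()

    def start_doctype(name, system_id, public_id, has_internal_subset):
        nonlocal in_dtd
        in_dtd = True

    def end_doctype():
        nonlocal in_dtd
        in_dtd = False

    def start(name, attrs):
        parents.append(name)

    def end(name):
        parents.pop()

    def comment(data):
        if not in_dtd:  # comments inside the DTD are not represented
            items.append({"[content]": data, "[parent]": parents[-1]})

    parser.StartDoctypeDeclHandler = start_doctype
    parser.EndDoctypeDeclHandler = end_doctype
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CommentHandler = comment
    parser.Parse(text, True)
    return items


print(comment_items(SOURCE))
```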
• If the XML document has a document type declaration, then the
  information set contains a single document type declaration
  information item
• Entities and notations are provided as properties of the
  document information item, not of the document type
  declaration information item
• Has properties
  • [system identifier]   • [children]
  • [public identifier]   • [parent]
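A sketch of building the document type declaration item with Python's `xml.parsers.expat`, assuming that processing instructions inside the internal subset become its [children]. The doctype name is passed to the callback but deliberately dropped, since it is not an infoset property. Expat does not fetch the external `report.dtd` here; `doctype_item` is a hypothetical helper.

```python
from xml.parsers import expat

SOURCE = """<!DOCTYPE report SYSTEM "report.dtd" [
  <?dtd-tool check?>
  <!ELEMENT report (#PCDATA)>
]>
<report>ok</report>"""


def doctype_item(text):
    item = None
    in_dtd = False
    parser = expat.ParserCreate()

    def start_doctype(name, system_id, public_id, has_internal_subset):
        nonlocal item, in_dtd
        in_dtd = True
        # `name` is ignored on purpose: the document type name is not in the infoset
        item = {
            "[system identifier]": system_id,
            "[public identifier]": public_id,
            "[children]": [],
        }

    def end_doctype():
        nonlocal in_dtd
        in_dtd = False

    def pi(target, data):
        if in_dtd:  # PIs in the DTD are children of the doctype item
            item["[children]"].append((target, data))

    parser.StartDoctypeDeclHandler = start_doctype
    parser.EndDoctypeDeclHandler = end_doctype
    parser.ProcessingInstructionHandler = pi
    parser.Parse(text, True)
    return item


print(doctype_item(SOURCE))
```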
• There is an unparsed entity information item for each
  unparsed general entity declared in the DTD

• Has properties
  • [name]                • [declaration base URI]
  • [system identifier]   • [notation name]
  • [public identifier]   • [notation]
• There is a notation information item for each notation
  declared in the DTD

• Has properties
  • [name]                • [public identifier]
  • [system identifier]   • [declaration base URI]
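Unparsed entities and notations are linked: an unparsed entity's [notation] property refers to the notation item named by its [notation name], or has no value if that notation is not declared. A sketch using Python's `xml.parsers.expat` declaration callbacks; the link is resolved after parsing because a notation may be declared after the entity that uses it. `dtd_items` is a hypothetical helper.

```python
from xml.parsers import expat

SOURCE = """<!DOCTYPE doc [
  <!NOTATION png SYSTEM "image/png">
  <!ENTITY logo SYSTEM "logo.png" NDATA png>
  <!ENTITY intro "plain text">
]>
<doc/>"""


def dtd_items(text):
    """Build unparsed-entity and notation items from the DTD declarations."""
    notations = {}
    unparsed = []
    parser = expat.ParserCreate()

    def on_notation(name, base, system_id, public_id):
        notations[name] = {
            "[name]": name,
            "[system identifier]": system_id,
            "[public identifier]": public_id,
            "[declaration base URI]": base,
        }

    def on_entity(name, is_param, value, base, system_id, public_id, notation_name):
        if notation_name is None:
            return  # internal or parsed entity: not an unparsed entity item
        unparsed.append({
            "[name]": name,
            "[system identifier]": system_id,
            "[public identifier]": public_id,
            "[declaration base URI]": base,
            "[notation name]": notation_name,
        })

    parser.NotationDeclHandler = on_notation
    parser.EntityDeclHandler = on_entity
    parser.Parse(text, True)

    # Resolve [notation] after parsing; None stands for "no value"
    for entity in unparsed:
        entity["[notation]"] = notations.get(entity["[notation name]"])
    return unparsed, notations
```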
• Each element in the document has a namespace information
  item for each namespace that is in scope for that element

• Has properties
  • [prefix]        • [namespace name]
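The in-scope namespaces of an element are its parent's plus any declared on its own start-tag, and the `xml` prefix is always in scope. A sketch with Python's `xml.parsers.expat` in namespace mode: `StartNamespaceDeclHandler` fires before the start-tag it belongs to, so pending declarations are merged into a copy of the parent scope. The default namespace is keyed by `None`; `in_scope_namespaces` is a hypothetical helper.

```python
from xml.parsers import expat

XML_NS = "http://www.w3.org/XML/1998/namespace"

SOURCE = """<a:root xmlns:a="urn:a" xmlns="urn:default">
  <child xmlns:b="urn:b"/>
  <other/>
</a:root>"""


def in_scope_namespaces(text):
    """Return (local name, {prefix: namespace name}) for each element in document order."""
    results = []
    scopes = [{"xml": XML_NS}]  # the xml prefix is always in scope
    pending = {}  # declarations seen just before the next start-tag
    parser = expat.ParserCreate(namespace_separator=" ")

    def start_ns(prefix, uri):
        pending[prefix] = uri  # prefix is None for a default namespace declaration

    def start(name, attrs):
        scope = {**scopes[-1], **pending}  # inherit from the parent, then override
        pending.clear()
        scopes.append(scope)
        results.append((name.split(" ")[-1], scope))  # name is "uri local"

    def end(name):
        scopes.pop()

    parser.StartNamespaceDeclHandler = start_ns
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(text, True)
    return results


for local, scope in in_scope_namespaces(SOURCE):
    print(local, scope)
```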
• Information sets are extensible

• New specifications can extend information items by associating
  additional properties with them

• For example, XML Schema adds properties to the infoset to
  record the results of validation
  • Post-Schema-Validation Infoset (PSVI)

• Proprietary software can add its own properties too
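A toy illustration of extensibility, in the spirit of the PSVI's [validity] property (whose values are valid, invalid and notKnown). The "validation" here is a hypothetical integer check, not XML Schema validation, and `add_validity` is a hypothetical helper. The extra properties live in a side table keyed by `xml.etree.ElementTree` elements, so the base tree is left untouched.

```python
import xml.etree.ElementTree as ET


def add_validity(root, int_elements):
    """Hypothetical extension: attach a PSVI-style [validity] to every element item."""
    extra = {}
    for el in root.iter():  # includes the root itself
        if el.tag in int_elements:
            text = (el.text or "").strip()
            ok = text.lstrip("-").isdigit()  # toy rule: an optionally signed integer
            extra[el] = {"[validity]": "valid" if ok else "invalid"}
        else:
            extra[el] = {"[validity]": "notKnown"}  # no rule for this element
    return extra


root = ET.fromstring("<order><qty>3</qty><qty>three</qty><note>hi</note></order>")
extra = add_validity(root, {"qty"})
print([extra[el]["[validity]"] for el in root])
```

Keeping added properties outside the parsed tree is one design choice; a real PSVI implementation may store them however it likes.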
The following information is not represented in the infoset:
1. The content models of elements, from ELEMENT declarations in the DTD.
2. The grouping and ordering of attribute declarations in ATTLIST declarations.
3. The order of attributes within a start-tag.
4. The document type name.
5. White space outside the document element.
6. White space immediately following the target name of a PI.
7. Whether characters are represented by character references.
8. White space within start-tags (other than significant white space in attribute
   values) and end-tags.
9. The difference between the two forms of an empty element: <foo/> and
   <foo></foo>.
10. The difference between CR, CR-LF, and LF line termination.
11. The order of declarations within the DTD.
12. The boundaries of conditional sections in the DTD.
13. The boundaries of parameter entities in the DTD.
14. The boundaries of general parsed entities.
15. The boundaries of CDATA marked sections.
16. Comments in the DTD.
17. The location of declarations (whether in internal or external subset or
    parameter entities).
18. Any ignored declarations, including those within an IGNORE conditional
    section, as well as entity and attribute declarations ignored because
    previous declarations override them.
19. The kind of quotation marks (single or double) used to quote attribute values.
20. The default value of attributes declared in the DTD.
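Several items on this list can be checked with Python's `xml.etree.ElementTree`: attribute order, quote style, the two empty-element forms, character references, white space inside tags, and CR-LF versus LF line endings all disappear after parsing. The `view` helper is a hypothetical, rough approximation of an infoset-level comparison (ElementTree drops comments and PIs by default, so it is not a full infoset).

```python
import xml.etree.ElementTree as ET


def view(elem):
    """A rough infoset-level view of an element: name, attributes, content, tail."""
    return (
        elem.tag,
        sorted(elem.attrib.items()),  # attribute order is not in the infoset
        elem.text,
        [view(child) for child in elem],
        elem.tail,
    )


A = """<doc b="2" a='1'><empty/><p>A&#66;</p></doc>"""
B = """<doc  a="1"   b='2' ><empty></empty><p>AB</p ></doc>"""
C = "<doc>line one\r\nline two</doc>"
D = "<doc>line one\nline two</doc>"

print(view(ET.fromstring(A)) == view(ET.fromstring(B)))  # True
print(view(ET.fromstring(C)) == view(ET.fromstring(D)))  # True
```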
Source: XML Information Set (Second Edition), W3C Recommendation
