[om] A Proposal for extending OpenMath with structure sharing

Michael Kohlhase kohlhase+ at cs.cmu.edu
Fri Mar 29 17:47:34 CET 2002


Dear all,

In Cannes I proposed to add structure-sharing to the XML encoding of the
OpenMath standard. The main benefits would be that OM objects can be
encoded as directed acyclic graphs (DAG), as opposed to trees, resulting in
more compact representations.

In the following I will sketch an extension of the OpenMath XML encoding and
discuss the design decisions and necessary changes to the standard. 

Michael

-------------------------------------------------------------------------
   Dr. Michael Kohlhase,               Office: Newell Simon Hall 4623
   Adjunct Associate Professor            5000 Forbes Avenue, 
   School of Computer Science             Pittsburgh, Pa 15213-3891, USA.
   Carnegie Mellon University          tel/fax: +1 412 268 5749/6298           
   http://www.cs.cmu.edu/~kohlhase     e-mail: <kohlhase+ at cs.cmu.edu>          
--------------------------------------------------------------------------

========================= MOTIVATION ===============================

It is will known that in the worst case, tree representations of objects can be
exponentially larger than corresponding DAG representations and almost all
(loss-less) compression algorithms are based on some clever way of sub-structure
sharing (i.e. going to a DAG representation). Moreover, many mathematical software
systems already have DAGs as their internal data structures; these have to explode
the structures into OpenMath tree representations on output and re-compute DAG
representations on input.  All of this can be avoided, if the XML encoding
directly supports cross-referencing.

The core idea of structure sharing (DAG) representations is that the following two
representations of terms are equivalent


                  f                                f
                 / \                              / \
                /   \                             \ /
               /     \                             f
              f       f                           / \
             / \     / \                          \ /
            f   f   f   f                          f
           / \ / \ / \ / \                        / \
           | | | | | | | |                        \ /
           a a a a a a a a                         a

Obviously, the left hand tree (generalized to depth n) has 2^n + m occurrences of
the symbols, whereas the right one has n symbol occurrences. The left term can be
directly encoded as the XML-encoded OpenMath object
<OMOBJ>
 <OMA>
  <OMA>
   <OMV name="f"/>
   <OMA>
    <OMV name="f"/>
    <OMV name="a"/>
    <OMV name="a"/>
   </OMA>
   <OMA>
    <OMV name="f"/>
    <OMV name="a"/>
    <OMV name="a"/>
   </OMA>
  <OMA>
   <OMV name="f"/>
   <OMA>
    <OMV name="f"/>
    <OMV name="a"/>
    <OMV name="a"/>
   </OMA>
   <OMA>
    <OMV name="f"/>
    <OMV name="a"/>
    <OMV name="a"/>
   </OMA>
  </OMA>
  </OMA>
 </OMA>
</OMOBJ>

while the right hand one cannot.

========================= THE PROPOSAL ===============================

To remedy this, I propose to add an optional attribute 'id' to the elements OMA,
OMBIND, and OMATTR since they make up the term structure, and a new element 'OMR'
with the xlink [1] attributes for simple links. Thus the object above would have
the representation.

<OMOBJ>
 <OMA>
  <OMA name="t1">
   <OMV name="f"/>
   <OMA id="t11">
    <OMV name="f"/>
    <OMV name="a"/>
    <OMV name="a"/>
   </OMA>
   <OMR xlink:href="t11"/>
  </OMA>
 <OMR xlink:href="t1"/>
</OMOBJ>

Note that we do not need to introduce id attributes for OMV and OMS, since they
are empty and essentially identified by a name mechanism themselves; so replacing
them by an 'OMR' element would not help.

If we add 'id' attributes to enable structure sharing for non-empty elements, we
can enable structure sharing by the same mechanism for OMI, OMB, OMSTR, OMF,
(which carry content that can get arbitrarily big). 

I propose to leave the element OMATP and OME unchanged, since OMATP alone is not a
well-formed OM object, and I do not see the necessity of sharing OME (this might
however very well be my lack of vision).

An alternative to adding a dedicated 'OMR' element (this is actually the proposal
I presented in Cannes, and that is currently used in OMDoc [2]) is to add
'xlink:href' attributes to the elements OMA, OMBIND, OMATTR, OMI, OMB, OMSTR, OMF,
with the understanding that the presence of an xlink:href attribute in an empty
element would represent cross-reference, but this would make the DTD much weaker,
and moreover interfere with cross-linking (parallel markup) in MathML (I will
discuss this below).

I think that we should use the xlink [1] attribute xlink:href, since this is the
relevant W3C standard. Furthermore, I propose that we should use fixed (as
specified in the DTD and the XML Schema) xlink attributes 
- xlink:type="simple" (we are using simple links) and 
- xlink:show="embed" (the OMR element is to be replaced!). 

This way, xlink-aware processors like browsers would directly do the right thing,
if they do not have a special behavior for the OMR element. Since these elements
are fixed in the DTD and XML Schema, they do not need to be specified in the
objects themselves.

========================= NECESSARY CHANGES ===============================

Note that since I only propose to extend the XML encoding, we only need to
consider changes to the DTD and section 4.1 (pp. 14-20) of the OpenMath standard
[3]. All other parts of OpenMath (type systems,...) are untouched by the current
proposal, since the semantics does not change.

I will now sketch out the necessary changes to the standard, after a discussion
period, I will submit a patch to the standard for review.

4.1.1 needs no change
4.1.2 we need to change the Grammar and the DTD. 
  p. 16. we need to add  the declaration for the OMR element

    <!ELEMENT OMR EMPTY>
    <!ATTLIST OMR xlink:href CDATA #REQUIRED
                  xlink:type CDATA #FIXED 'simple'
		  xlink:show CDATA #FIXED 'embed'>

  furthermore, we need to add attribute list declarations for the elements
  OMA, OMBIND, OMATTR, OMI, OMB, OMSTR, OMF, like 
    
    <!ATTLIST OMA id ID #IMPLIED>

  p. 17 we need to augment the grammar by the new element and the attributes,
  adding the following lines at the appropriate places

   hrefatt  -->  xlink:href ?S = ?S (" uriref " | ' uriref ')
   typeatt  -->  xlink:type ?S = ?S ("simple" | 'simple')
   showatt  -->  xlink:show ?S = ?S ("simple" | 'embed')
   idatt    -->  id ?S = ?S (" symbname " | ' symbname ')

   reference --> <OMR S hrefatt S ?typeatt S ?showatt/>
              |  <OMR S hrefatt S ?typeatt S ?showatt/>
              |  <OMR S ?typeatt S hrefatt S ?showatt/>
              |  <OMR S ?typeatt S ?showatt S hrefatt/>
              |  <OMR S ?showatt S hrefatt S ?typeatt/>
              |  <OMR S ?showatt S ?typeatt S hrefatt/>

   the rule for object must be extended by a 

 object -->   symbol
              variable  
              reference
	      <OMI S? ?idatt S?> S? integer S? </OMI S?>
	      <OMF S? ?idatt S fpdecatt S? /?
	      <OMF S? ?idatt S fphexatt S? /?
	      <OMSTR S? ?idatt S?> char </OMSTR S?>
	      <OMB S? ?idatt S?> base64 </OMB S?>
	      <OMA S? ?idatt S?> SC? object SC? objects SC? </OMA S?>
	      <OMBIND S? ?idatt S?> SC? object SC?
  	         <OMBVAR S?> SC? variables SC? </OMBVAR S?>
        	 SC? object SC? </OMBIND S?>
	      <OME S?> SC? symbol SC? objects SC? </OME S?>
	      <OMATTR S? ?idatt S?> SC? <OMATP S?> SC? attrs SC? </OMATP S?>
  	         SC? object SC? </OMATTR S?> 

 p.19 here an explanation of the 'OMR' element has to be given, basically using
 the motivation and examples from above. 

4.1.3: p. 20, here we need to mention that the xlink namespace must be added 
    in the OMOBJ by the namespace declaration 
    xmlns:xlink="http://www.w3.org/1999/xlink" or 
    xmlns:xlink='http://www.w3.org/1999/xlink'



========================= OPEN QUESTIONS ===============================

Here I protocol some questions that remain in my mind, I think that the proposal
would profit from a discussion of (at least) these. 

1) do we need an 'id' attribute on OME?

2) should we add a fixed attributes like 
   xlink:role="http://www.openmath.org/standard/DAG.html"
   xlink:actuate="OnLoad"
   to specify the default behavior for xlink-aware applications?

3) should the OMR element have an 'id' attribute itself, so that we can have
   chains of links? These do occur in actual implementations, but could always be
   collapsed upon output in the phrasebooks.

4) we should discuss how this proposal really relates to the notion of parallel
   markup in MathML. The 'id' attributes allow parallel markup from presentation
   MathML subtrees into OpenMath (sub)-objects. However, the MathML 2.0
   specification states that for parallel markup, it expects to find
   crossreferences on OpenMath INTO presentation MathML, so that deleting
   <annotation-xml> elements does not have dangling links (which is perfectly
   reasonable, but at the moment not supported by OpenMath). A remedy might be to
   to allow xlink:href attributes on all OM objects (with appropriate
   xlink:role,... attributes). NOTE: all of this is NOT part of the current
   proposal, since this measure would clash at the xlink:href element for
   <OMR>. It might be good to discuss this though.

========================= REFERENCES ===============================

[1] http://www.w3.org/TR/xlink/
[2] http://www.mathweb.org/omdoc
[3] http://www.nag.co.uk/projects/OpenMath/omstd/omstd.ps.gz
--
om at openmath.org  -  general discussion on OpenMath
Post public announcements to om-announce at openmath.org
Automatic list maintenance software at majordomo at openmath.org
Mail om-owner at openmath.org for assistance with any problems



More information about the Om mailing list