[om] [CfD] Proposal for a Standard Change

Michael Kohlhase kohlhase+ at cs.cmu.edu
Thu Sep 12 23:34:04 CEST 2002


Dear all,

In Cannes I proposed to add structure-sharing to the XML encoding of the
OpenMath standard. On March 29, I sent around a concrete proposal with
proposed wording, which was discussed on this mailing list until June.

In this mail, I would like to re-submit an updated proposal (it is
essentially the same, but takes the discussion into account). The purpose
of this is to make the proposal public, so we can vote on it at the Pisa
meeting.

Summary: I propose to add structure sharing to the XML encoding via XLink
         simple links. In effect, OM objects are no longer represented as
         XML trees, but by DAGs (directed acyclic graphs). This can be done
         by adding 'id' attributes to various OM-Elements and a new OM
         element OMR with an xlink:href attribute. In particular data model
         of the the OpenMath Objects as specified in the OM Standard and
         the binary encoding remain unchanged.

The discussion on the mailing list largely clarified the issue of what the
scope of the change really is. This clarification can largely be framed in
two variants of the proposal.

P2). ==================== DAGs at the data structure level ==============
     This proposal would also change the data model of the OpenMath objects
     to DAGs, which generalize trees and so this proposal would allow a
     closer modeling of data structures in Computer Algebra Systems (which
     are often DAGs), but might lead to un-intuitive identity conditions
     (see e.g. the thread starting at [1]).

P3). ================= DGs at the Data structure level ===================
     This proposal would change the OM data model to allow for general
     directed graphs (DG; they may be cyclic). This proposal generalizes P2
     and would allow the inclusion of rational trees (a certain subclass of
     infinite trees) into the data mode (see e.g. the thread starting at [2]).
     
The proposal below does not endorse proposals P2 and P3, and does not
provide concrete wording for it (I personally view it as less disruptive
than P2 and P3, and understand the formal consequences much better). If
someone else wants to use it as a basis for a concrete proposal for P2, I
will be happy to help with the wording. 

Michael

[1] http://www.openmath.org/archive/om/msg00388.html
[2] http://www.openmath.org/archive/om/msg00484.html
-------------------------------------------------------------------------
   Dr. Michael Kohlhase,               Office: Newell Simon Hall 4623
   Adjunct Associate Professor            5000 Forbes Avenue, 
   School of Computer Science             Pittsburgh, Pa 15213-3891, USA.
   Carnegie Mellon University          tel/fax: +1 412 268 5749/6298           
   http://www.cs.cmu.edu/~kohlhase     e-mail: <kohlhase+ at cs.cmu.edu>          
--------------------------------------------------------------------------

========================= MOTIVATION ===============================

It is will known that in the worst case, tree representations of objects can be
exponentially larger than corresponding DAG representations and almost all
(loss-less) compression algorithms are based on some clever way of sub-structure
sharing (i.e. going to a DAG representation). Moreover, many mathematical software
systems already have DAGs as their internal data structures; these have to explode
the structures into OpenMath tree representations on output and re-compute DAG
representations on input.  All of this can be avoided, if the XML encoding
directly supports cross-referencing.

The core idea of structure sharing (DAG) representations is that the following two
representations of terms are equivalent


                  f                                f
                 / \                              / \
                /   \                             \ /
               /     \                             f
              f       f                           / \
             / \     / \                          \ /
            f   f   f   f                          f
           / \ / \ / \ / \                        / \
           | | | | | | | |                        \ /
           a a a a a a a a                         a

Obviously, the left hand tree (generalized to depth n) has 2^n + m occurrences of
the symbols, whereas the right one has n symbol occurrences. The left term can be
directly encoded as the XML-encoded OpenMath object
<OMOBJ>
 <OMA>
  <OMA>
   <OMV name="f"/>
   <OMA>
    <OMV name="f"/>
    <OMV name="a"/>
    <OMV name="a"/>
   </OMA>
   <OMA>
    <OMV name="f"/>
    <OMV name="a"/>
    <OMV name="a"/>
   </OMA>
  <OMA>
   <OMV name="f"/>
   <OMA>
    <OMV name="f"/>
    <OMV name="a"/>
    <OMV name="a"/>
   </OMA>
   <OMA>
    <OMV name="f"/>
    <OMV name="a"/>
    <OMV name="a"/>
   </OMA>
  </OMA>
  </OMA>
 </OMA>
</OMOBJ>

while the right hand one cannot.

========================= THE PROPOSAL ===============================

To remedy this, I propose to add an optional attribute 'id' to the elements OMA,
OMBIND, and OMATTR since they make up the term structure, and a new element 'OMR'
with the xlink [1] attributes for simple links. Thus the object above would have
the representation.

<OMOBJ>
 <OMA>
  <OMA id="t1">
   <OMV name="f"/>
   <OMA id="t11">
    <OMV name="f"/>
    <OMV name="a"/>
    <OMV name="a"/>
   </OMA>
   <OMR xlink:href="t11"/>
  </OMA>
 <OMR xlink:href="t1"/>
</OMOBJ>

Note that we do not need to introduce id attributes for OMV and OMS, since they
are empty and essentially identified by a name mechanism themselves; so replacing
them by an 'OMR' element would not help.

If we add 'id' attributes to enable structure sharing for non-empty elements, we
can enable structure sharing by the same mechanism for OMI, OMB, OMSTR, OMF,
(which carry content that can get arbitrarily big). 

I propose to leave the element OMATP and OME unchanged, since OMATP alone is not a
well-formed OM object, and I do not see the necessity of sharing OME.

An alternative to adding a dedicated 'OMR' element (this is actually the proposal
I presented in Cannes, and that is currently used in OMDoc [2]) is to add
'xlink:href' attributes to the elements OMA, OMBIND, OMATTR, OMI, OMB, OMSTR, OMF,
with the understanding that the presence of an xlink:href attribute in an empty
element would represent cross-reference, but this would make the DTD much weaker,
and moreover interfere with cross-linking (parallel markup) in MathML (I will
discuss this below).

I think that we should use the xlink [1] attribute xlink:href, since this is the
relevant W3C standard. Furthermore, I propose that we should use fixed (as
specified in the DTD and the XML Schema) xlink attributes 
- xlink:type="simple" (we are using simple links) and 
- xlink:show="embed" (the OMR element is to be replaced!). 

This way, xlink-aware processors like browsers would directly do the right thing,
if they do not have a special behavior for the OMR element. Since these elements
are fixed in the DTD and XML Schema, they do not need to be specified in the
objects themselves.

============= NECESSARY CHANGES TO THE STANDARD ===============================

Note that since I only propose to extend the XML encoding, we only need to
consider changes to the DTD and section 4.1 (pp. 14-20) of the OpenMath standard
[3]. All other parts of OpenMath (type systems,...) are untouched by the current
proposal, since the semantics does not change.

I will now sketch out the necessary changes to the standard.

4.1.1 needs no change
4.1.2 we need to change the Grammar and the DTD. 
  p. 20. we need to add  the declaration for the OMR element

    <!ELEMENT OMR EMPTY>
    <!ATTLIST OMR xlink:href CDATA #REQUIRED
                  xlink:type CDATA #FIXED 'simple'
		  xlink:show CDATA #FIXED 'embed'>

  furthermore, we need to add attribute list declarations for the elements
  OMA, OMBIND, OMATTR, OMI, OMB, OMSTR, OMF, like 
    
    <!ATTLIST OMA id ID #IMPLIED>

  finally, we need to extend the entity declaration for %omel;, so that it reads

   <!ENTITY % omel "OMS | OMV | OMI | OMB | OMSTR
                        | OMF | OMA | OMBIND | OME | OMATTR 
                        | OMR ">



  p. 21 we need to augment the grammar by the new element and the attributes,
  adding the following lines at the appropriate places

   hrefatt  -->  xlink:href ?S = ?S (" uriref " | ' uriref ')
   typeatt  -->  xlink:type ?S = ?S ("simple" | 'simple')
   showatt  -->  xlink:show ?S = ?S ("simple" | 'embed')
   idatt    -->  id ?S = ?S (" symbname " | ' symbname ')

   reference --> <OMR S hrefatt S ?typeatt S ?showatt/>
              |  <OMR S hrefatt S ?typeatt S ?showatt/>
              |  <OMR S ?typeatt S hrefatt S ?showatt/>
              |  <OMR S ?typeatt S ?showatt S hrefatt/>
              |  <OMR S ?showatt S hrefatt S ?typeatt/>
              |  <OMR S ?showatt S ?typeatt S hrefatt/>

   the rule for object must be extended by a 

 object -->   symbol
              variable  
              reference
	      <OMI S? ?idatt S?> S? integer S? </OMI S?>
	      <OMF S? ?idatt S fpdecatt S? /?
	      <OMF S? ?idatt S fphexatt S? /?
	      <OMSTR S? ?idatt S?> char </OMSTR S?>
	      <OMB S? ?idatt S?> base64 </OMB S?>
	      <OMA S? ?idatt S?> SC? object SC? objects SC? </OMA S?>
	      <OMBIND S? ?idatt S?> SC? object SC?
  	         <OMBVAR S?> SC? variables SC? </OMBVAR S?>
        	 SC? object SC? </OMBIND S?>
	      <OME S?> SC? symbol SC? objects SC? </OME S?>
	      <OMATTR S? ?idatt S?> SC? <OMATP S?> SC? attrs SC? </OMATP S?>
  	         SC? object SC? </OMATTR S?> 

 p.23 needs an  explanation of the 'OMR' element has to be given,
      concretely we should a paragraph at the end stating (in LaTeX for
      convenience):

      OpenMath integers, floating point numbers, character strings,
      byearrays, applications, binding, attribuitions can also be encoded
      as an empty {\tt{OMR}} element with an {\tt{xlink:href}} attribute
      whose value is the value of an id attribute of an OpenMath object of
      that type. The OpenMath element represented by this {\tt{OMR}}
      element is a copy of the OpenMath element pointed to in the
      {\tt{xlink:xref}} attribute. Note that the representation of the
      {\tt{OMR}} element is {\emph{structurally equal}}, but not identical
      to the element it points to. 

      For instance, the OpenMath object 

      application(application(application(f,a,a),
                              application(f,a,a)),
                  application(application(f,a,a),
                              application(f,a,a)))

      can be encoded in the XML encoding as either one of the XML encodings
      below (and some intermedidate versions as well).

<OMOBJ>                                <OMOBJ>
  <OMA>                                   <OMA>
    <OMA>                                    <OMA id="t1">
      <OMV name="f"/>                           <OMV name="f"/>
      <OMA>                                     <OMA id="t11">
        <OMV name="f"/>                            <OMV name="f"/>
        <OMV name="a"/>                            <OMV name="a"/>
        <OMV name="a"/>                            <OMV name="a"/>
      </OMA>                                    </OMA>
      <OMA>                                     <OMR xlink:href="t11"/>
        <OMV name="f"/>
        <OMV name="a"/> 
        <OMV name="a"/>
      </OMA>                                   </OMA>
      <OMA>                                     <OMR xlink:href="t1"/>
        <OMV name="f"/>
        <OMA>
          <OMV name="f"/>
          <OMV name="a"/>
          <OMV name="a"/>
        </OMA>
        <OMA>
          <OMV name="f"/>
          <OMV name="a"/>
          <OMV name="a"/>
        </OMA>
      </OMA>
     </OMA>
   </OMA>
</OMOBJ>                                 </OMOBJ>


4.1.3: p. 20, here we need to mention that the xlink namespace must be added 
    in the OMOBJ by the namespace declaration 
    xmlns:xlink='http://www.w3.org/1999/xlink'

concretely, the wording should change to:

In particular, if OpenMath is used with applications that use the XML
Namespace Recommendation [16] then they should ensure that OpenMath
elements are in the namespace http: www.openmath.org/OpenMath. This is most
conveniently achieved by adding the namespace declaration
xmlns="http:www.openmath.org/OpenMath" as an attribute to each OMOBJ
element in the document.  Furthermore, for any OM object that contains the
OMR element, we have to add the XLink namespace declaration 
xmlns:xlink="http://www.w3.org/1999/xlink".

========================= REFERENCES ===============================

[1] http://www.w3.org/TR/xlink/
[2] http://www.mathweb.org/omdoc
[3] http://monet.nag.co.uk/cocoon/openmath/standard/omstd.pdf
--
om at openmath.org  -  general discussion on OpenMath
Post public announcements to om-announce at openmath.org
Automatic list maintenance software at majordomo at openmath.org
Mail om-owner at openmath.org for assistance with any problems



More information about the Om mailing list