[om] A Proposal for extending OpenMath with structure sharing

Thu Apr 4 15:55:59 CEST 2002

On Thu, 4 Apr 2002 jhd at cs.bath.ac.uk wrote:
> On Wed, 3 Apr 2002, Andreas Strotmann wrote:
> > A)  interpret(ref(label)) = deref'(label) where deref' is dereferencing
> >          on the semantic layer
> >     (syntactic sharing => semantic sharing)
> >
> > or is it
> >
> > B)  interpret(ref(label)) = interpret(deref(label))
> >     (syntactic sharing does not imply semantic sharing necessarily)
> >
> > Note that B) is also the only choice that I can see is compatible with
> > avoiding changing the OpenMath Standard itself, since right now the
> > standard does not say anything at all about the semantics of common
> > sub-trees in an OM Object (the way I understand that document).
> I don't think we need to change the standard, anywhere except in OMR (or,
> following my suggestion, OMRSYN and OMRSEM), and OMR implies a change
> anyway.

Hmm, the way I understood Michael's proposal, he says there is actually no
need to change the OM standard in the sense of changing the definition of
OM Objects *even* for OMR (and I agree with him provided we choose
interpretation B instead, or OMR:=OMRSYN in your suggestion).

The reason is that the encoding process would use a standard OM Object as
input and simply output OMRs in cases where it detects syntactic sharing.
An OM application that understands Michael's encoding with OMRs then has
the choice when decoding an OMR whether to just retrieve a pointer to the
referenced OM Object (making it a DAG, which is licensed by the OM
standard) or whether to generate a copy of that OM Object (making it a
tree, which is just as legal).  When it chooses the latter way of handling
syntactic sharing, like Maple presumably would most of the time, it needs
to check if the syntactically shared sub-Objects can safely be assumed to
be semantically shared, too. (=> Michael's question of "equality")

Thus, I read Michael's proposal to say that OMR is just a piece of
syntactic compression sugar, just like the binary encoding's reference to
the internal tables of sharable information which have no equivalent on
the OM Object level at all.

By contrast, OMRSEM requires changes to the OM Object defs because they
are currently (by design, I think) entirely mute on the question of
semantic sharing except via variables and bindings.  Only because it does
keep quiet on this can the current XML OM encoding (without OMRSEM) be a
valid one in the sense we defined it in the OM Objectives (lossless
conversion going down and up again must be possible).

Not that I'm averse to adding OMRSEM (or rather, explicitly stating that
OM Objects are DAGs rather than trees, and then providing standard ways of
expressing semantic DAGs in the XML encoding).  We do need to remain
backwards compatible, though.

Therefore, I think the cleanest way to accommodate Michael's proposal is
to have a clear distinction somehow between OM Objects encoded in the
current XML syntax and those encoded in an updated/enhanced XML syntax.
The cleanest way to do that might be to require some kind of version
attribute for the OMOBJ tag if you support the enhanced XML syntax.

If we add semantic sharing, too, the same comment applies to an enhanced
binary encoding, and indeed to distinguishing OpenMath 1.0 Objects (trees)
and OpenMath 1.1 Objects (DAGs or even graphs) *as*such*.

And while I'm at it, I think we should perhaps specify the enhanced XML
encoding to be an XML application with a well-defined DTD and (as Michael
requests) with standard XLink ingredients for (syntactic) sharing.  The
"old" XML encoding should be kept around for those applications that don't
implement sufficient XML capabilities, as well as for legacy applications.

 -- Andreas

PS: The way I understand it, gzip cannot possibly do the potentially n ->
log n size reductions you can in principle get by syntactic sharing, I
suspect.  Compression works much better in a domain-specific mode. Also,
compression in this sense is an expensive process, both in time and space,
and one of the tenets for OM that we specified in the Objectives was that
information that is "expensive to compute" should be transmittable via OM.
Maple goes to extreme lengths to do syntactic sharing, but I know that
REDUCE, too, does a lot of that to control intermediate expression swell.
I would therefore very much like to have it in the specs at last.

--
om at openmath.org  -  general discussion on OpenMath
Post public announcements to om-announce at openmath.org
Automatic list maintenance software at majordomo at openmath.org
Mail om-owner at openmath.org for assistance with any problems