[Om3] Language Dictionaries (Was: Re: Initializing OM3 at 2013 Process)

Mon May 5 11:28:03 CEST 2014

Michael Kohlhase skrev 2014-05-02 19.28:
> Dear all,
> On 1.5.14 12:11, David Carlisle wrote:
[snip]
>>>
>>> A problem with this example is that it seems to be aimed at showing
>>> off the file format (which at this point must be considered very
>>> preliminary) rather than the underlying mechanism; the translations
>>> it performs are very trivial -- changing<OMNATS>  to<OMA><OMS
>>> cd="seqs" name="nats"/>  and so on could equally well be done by
>>> substring replacements! -- and the language extension achieved is
>>> highly uncompelling. Having special elements for straightforward
>>> combinations of OMA and OMS gains practically nothing, but the cost
>>> in increased language complexity is considerable.
>>
>> That was my initial thought as well, but perhaps more examples would be
>> more convincing.....
> I would be happy to do more examples once we agree that language
> extensions are an avenue worth exploring in principle.

It's certainly worth exploring, but it still remains to be seen how much it 
can manage. You should be prepared to at least work out the whole of your 
own extension proposal (sequences) to demonstrate that the approach is viable.

That said, I can here provide two examples of language extensions that could 
be handled through this kind of translations, provided the translation 
language is powerful enough. These would thus be positive examples.

1. More IEEE float formats.

OM currently only supports the 64-bit double format for floats, but the 
current version of the source IEEE standard defines floats of arbitrarily 
high precision (every multiple of 32 bits, if I recall correctly; there's a 
formula for how many bits go into the exponent and how many go into the 
mantissa). The OM standard could be extended to allow OMF with hex attribute 
values any multiple of 8 hexdigits long. 
(https://trac.mathweb.org/OM3/ticket/137)

A natural translation of some

   <OMF hex="0123456789ABCDEF0123456789ABCDEF"/>

(128 bits) to strict OM2 would be as some

   <OMA>
     <OMS cd="IEEEfloat" name="fromOMB"/>
     <OMB>ASNFZ4mrze8BI0VniavN7w==</OMB>
   </OMA>

The only tricky thing here is that <OMB> wants data base64-encoded, not 
hex-encoded. Is the translation engine up to this?

2. OMOBJs as mathematical objects

In mathematical logic, from Gödel and forth, it is a very important trick 
that mathematical formulae can be encoded as actual mathematical objects, 
and thus operated on or made claims about. Most of the time, it doesn't 
matter how you encode things as long as the encoding is effective and 
faithful, so most authors tend to express their proofs using an abstract 
"the encoding of the formula ..." notation. It traditionally looks like
$$
   \ulcorner f(x) \urcorner
$$
for "the constant term encoding the formula $f(x)$".

The "OMOBJs as mathematical objects" extension is to add an equivalent of 
the $\ulcorner ... \urcorner$ quoting mechanism to OM -- an element that, 
when placed around a valid omel yields a new omel that semantically is as 
atomic as an OMV or OMI. The element name I find most natural for this is 
OMOBJ, so my proposed schema extension would be

   omel |= element OMOBJ {omel}

For example the formula
$$
   6 < \ulcorner f(x) \urcorner
$$
(possibly "6 is less than the Gödel number of f(x)") would become
<OMOBJ>
   <OMA>
     <OMS cd="relation1" name="lt"/>
     <OMI>6</OMI>
     <OMOBJ>
       <OMA>
         <OMV name="f"/>
         <OMV name="x"/>
       </OMA>
     </OMOBJ>
   </OMA>
</OMOBJ>
in which neither f nor x occurs freely; an embedded OMOBJ is a constant.

An advantage of having this at the language level is that it can then 
interact sensibly with references. Concretely, one could have a formula
<OMOBJ>
   <OMA>
     <OMS cd="foo" name="evaluates_to"/>
     <OMOBJ>
       <OMA id="twoplustwo">
         <OMS cd="arith1" name="plus"/>
         <OMI>2</OMI>
         <OMI>2</OMI>
       </OMA>
     </OMOBJ>
     <OMR href="#twoplustwo"/>
   </OMA>
</OMOBJ>
that by referential transparency is equivalent to
<OMOBJ>
   <OMA>
     <OMS cd="foo" name="evaluates_to"/>
     <OMOBJ>
       <OMA id="twoplustwo">
         <OMS cd="arith1" name="plus"/>
         <OMI>2</OMI>
         <OMI>2</OMI>
       </OMA>
     </OMOBJ>
     <OMA>
       <OMS cd="arith1" name="plus"/>
       <OMI>2</OMI>
       <OMI>2</OMI>
     </OMA>
   </OMA>
</OMOBJ>
If quoted formulae are instead expressed using explicit constructor 
operations, then it would be syntactically different from the same formula 
not quoted, and thus references could not be used to share common parts.

That far the Standard Extension Proposal, now for the translation back to 
something strictly OM2, which boils down to having a bunch of constructor 
operations. Basically, all one needs is a content dictionary with 
application symbols for every OM element. The interpretation of one of these 
should be that it constructs the encoding of an omel from any pieces 
involved. Concrete rules could be

<rule>
   <OMOBJ>
     <OMA>
       <exprlist name="seq"> <expr name="whatever"/> </exprlist>
     </OMA>
   </OMOBJ>
   <OMA>
     <OMS cd="openmath" name="application"/>
     <iterate name="seq">
       <OMOBJ> <render name="whatever"/> </OMOBJ>
     </iterate>
   </OMA>
</rule>

<rule>
   <OMOBJ>
     <OMV  ??? >
     <!-- Don't know how to match against an attribute value here. -->
   </OMOBJ>
   <OMA>
     <OMS cd="openmath" name="variable"/>
     <OMSTR><render name="name"/></OMSTR>
   </OMA>
</rule>

What I suspect is the trickiest issue here is making sure that the 
translations apply recursively as intended. It is in these kinds of fine 
details that the small homegrown language tends to lose against the big 
established standard; the latter has had many pairs of eyes looking at the 
corner cases and ironing out the messy parts.

Lars Hellström