[Om3] Language Dictionaries (Was: Re: Initializing OM3 at 2013 Process)

Wed Apr 30 09:23:37 CEST 2014

Michael Kohlhase wrote 2014-04-23 15.24:
> Dear all,
>
> In thinking about OpenMath3, we have problem that there are some
> language extension proposals, but at the same time, we have the problem
> that we have to keep MathML3 compatibility.

No, we don't (as far as OM3 is concerned); that Unicode 1.1 became ISO/IEC 
10646-1:1993 didn't prevent the creation of Unicode 2.0 (which introduced 
notable features such as surrogate pairs). It may still be desirable to stay 
compatible with MathML3, but it cannot be an a priori limit on the evolution 
of OpenMath.

> This has been bothering me,
> but I think I have the beginnings of a solution. I would like to submit
> the blue note attached to the discussion of our committee.

I have to admit that at first sight, this made me suspect ulterior motives. 
No activity for five months, and then the president suddenly drops a 
polished-looking note "for discussion" -- is the committee being railroaded? 
But no, once past the first page it becomes clear that this suggestion is 
indeed barely hatched, and definitely at a stage that invites _genuine_ 
discussion. I find the polished look a bit curious (why bother so much about 
the appearance of something so preliminary?), as I would probably have gone 
for a plain (but longish) email instead, but different communities have 
different traditions. Maybe it needs to be this polished to get a reaction 
within the OM community?

> Please give me feedback,

Right, time for some quoting! (Sorry if the indentation is wrong somewhere; 
the newlines didn't make it through when I copied from the PDF.)

> <OMLangExt xmlns="http://www.openmath.org/OpenMathCD">

Surely you don't propose to bundle this with the CD namespace? The 
differences are considerable, so it should have its own.

>  <OLEName>seq</OLEName> <OLEDate>2014-04-22</OLEDate>
>  <OLEStatus>experimental</OLEStatus>
>  <OLEVersion>1</OLEVersion>
>  <OLERevision>1</OLERevision>

Not even the few elements that are common have the same names...

>  <schemaext>
>    OMNATS = element OMNATS {omel}
>    OMNTH = element OMNTH {omel,omel}
>    omel |= OMNTH | OMNATS
>  </schemaext>

Mixing XML and non-XML syntax for something that actually has an XML 
syntax?? It certainly enhances readability, but no, that's just wrong.

>  <equality>
>    <OMBIND>
>      <OMS cd="quant1" name="forall"/>
>      <OMBVAR><OMV name="n"/><OMV name="m"/></OMBVAR>
>      <OMA><OMS cd="logic1" name="implies"/>
>        <OMA><OMS cd="relation1" name="gt"/>
>          <OMV name="m"/>
>          <OMV name="n"/>
>        </OMA>
>        <OMA><OMS cd="relation1" name="eq"/>
>          <OMNTH>
>            <OMI>n</OMI>

I presume you meant <OMV name="n"/>?

>            <OMNATS><OMI>m</OMI></OMNATS>
>          </OMNTH>
>          <OMI>n</OMI>
>        </OMA>
>      </OMA>
>    </OMBIND>
>  </equality>

So the point of this part is effectively to state FMPs (what about CMPs?) 
for the new elements, since there is no separate CD in which to state those.

Many FMPs are indeed equalities, but not all, so it feels odd to name that 
section <equality>. The one class of properties of these symbols that 
definitely are equalities -- that of a new element and its rule-based 
translation -- isn't stated here, but in the subsequent <translation> 
section! (An additional quirk is that this latter type of equality probably 
isn't the colloquial relation1#eq, but an absolute "there should be 
absolutely no way within the formal system to tell the difference between 
these two" invariance under substitution equality.)

I understand EdNote2 as saying that this particular FMP is superfluous, as 
it would anyway be implied by a similar FMP in the argseq CD. I tend to 
agree, but note that there could well be properties of new notation that 
aren't as natural to state for ordinary symbols.

>  <translation cd="argseq">

Now this is the really heavy part of the proposal, but also the part that 
is most vaguely explained. Is this supposed to be XSLT, or something 
homegrown inspired by the same?

>    <rule>
>      <OMNTH>
>        <expr name="n"/>
>        <exprlist name="seq"> <expr name="elt"/> </exprlist>
>      </OMNTH>
>      <OMA>
>        <OMS cd="seqs" name="nth"/>

Does that cd="argseq" attribute on the translation element even mean 
anything? It appears that the symbols that the rules are generating rather 
reside in the seqs CD.

>        <render name="n"/>
>        <iterate name="seq">
>          <render name="elt"/>
>        </iterate>
>      </OMA>
>    </rule>
>    <rule>
>      <OMNATS>
>        <expr name="n"/>
>      </OMNATS>
>      <OMA>
>        <OMS cd="seqs" name="nats"/>
>        <render name="n"/>
>      </OMA>
>    </rule>
>  </translation>
> </OMLangExt>

A problem with this example is that it seems to be aimed at showing off the 
file format (which at this point must be considered very preliminary) rather 
than the underlying mechanism; the translations it performs are very trivial 
-- changing <OMNATS> to <OMA><OMS cd="seqs" name="nats"/> and so on could 
equally well be done by substring replacements! -- and the language 
extension achieved is highly uncompelling. Having special elements for 
straightforward combinations of OMA and OMS gains practically nothing, but 
the cost in increased language complexity is considerable.
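
To make the "very trivial" point concrete, here is a minimal sketch in 
Python (not part of the blue note) of what the two quoted rules amount to 
when read as a plain tree rewrite. The function name expand and the table 
HEAD_SYMBOL are hypothetical names chosen for illustration, namespaces are 
ignored for brevity, and the intended semantics of <expr>, <exprlist>, 
<render> and <iterate> are only guessed at: the sketch assumes they mean 
"copy the children across unchanged".

```python
import xml.etree.ElementTree as ET

# Hypothetical table: extension element -> name of the symbol in the
# seqs CD that the quoted <rule> elements put at the head of the OMA.
HEAD_SYMBOL = {"OMNTH": "nth", "OMNATS": "nats"}


def expand(node):
    """Return a copy of node with OMNTH/OMNATS rewritten as OMA + OMS."""
    # Rewrite the children first, so nested extension elements are handled.
    children = [expand(child) for child in node]
    if node.tag in HEAD_SYMBOL:
        # Extension element: becomes an application of the seqs symbol.
        out = ET.Element("OMA")
        out.append(
            ET.Element("OMS", {"cd": "seqs", "name": HEAD_SYMBOL[node.tag]})
        )
    else:
        # Ordinary element: copied as is (tag, attributes, text).
        out = ET.Element(node.tag, dict(node.attrib))
        out.text = node.text
    out.extend(children)
    return out


source = ET.fromstring(
    '<OMNTH><OMV name="n"/><OMNATS><OMV name="m"/></OMNATS></OMNTH>'
)
print(ET.tostring(expand(source), encoding="unicode"))
```

Everything specific to the extension sits in the two-entry table, which is 
the sense in which special elements of this kind add very little beyond OMA 
and OMS.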

What would be a more interesting example is a set of translations that cover 
the whole sequences proposal. Inventing translations for OMNTH and OMNATS is 
one thing, but OMSEQ and OMSV are much tougher customers (at least if one 
wants to achieve the intended semantics; it may be that the above already 
gets it wrong with OMNATS). It is not at all obvious that you would succeed. 
And is there a proposed standard extension other than the sequences one to 
which this OMLangExt mechanism would even apply? I don't recall any other 
proposal introducing new XML elements, so how does one deal with other 
language extensions?

This last point _could_ be taken as an argument in favour of the above with 
respect to another matter, namely the organisation of data within the 
OMLangExt. As it is, the schemaext, equality, and translation sections are 
fairly close to being three separate files bundled into one; they are 
syntactically somewhat disparate. Content dictionaries instead organise data 
per symbol, which often has the advantage of keeping related pieces of 
information close together. Rewriting the above example in this way (without 
fixing any of the details remarked on above) would produce something like

<OMLangExt xmlns="http://www.openmath.org/OpenMathCD">
  <OLEName>seq</OLEName> <OLEDate>2014-04-22</OLEDate>
  <OLEStatus>experimental</OLEStatus>
  <OLEVersion>1</OLEVersion>
  <OLERevision>1</OLERevision>

  <OLEDefinition>
    <Name>OMNTH</Name>
    <schemaext>
      omel |= element OMNTH {omel,omel}
    </schemaext>
    <rule>
      <OMNTH>
        <expr name="n"/>
        <exprlist name="seq"> <expr name="elt"/> </exprlist>
      </OMNTH>
      <OMA>
        <OMS cd="seqs" name="nth"/>
        <render name="n"/>
        <iterate name="seq">
          <render name="elt"/>
        </iterate>
      </OMA>
    </rule>
  </OLEDefinition>

  <OLEDefinition>
    <Name>OMNATS</Name>
    <schemaext>
      omel |= element OMNATS {omel}
    </schemaext>
    <rule>
      <OMNATS>
        <expr name="n"/>
      </OMNATS>
      <OMA>
        <OMS cd="seqs" name="nats"/>
        <render name="n"/>
      </OMA>
    </rule>
    <FMP>
      <OMBIND>
        <OMS cd="quant1" name="forall"/>
        <OMBVAR><OMV name="n"/><OMV name="m"/></OMBVAR>
        <OMA><OMS cd="logic1" name="implies"/>
          <OMA><OMS cd="relation1" name="gt"/>
            <OMV name="m"/>
            <OMV name="n"/>
          </OMA>
          <OMA><OMS cd="relation1" name="eq"/>
            <OMNTH>
              <OMI>n</OMI>
              <OMNATS><OMI>m</OMI></OMNATS>
            </OMNTH>
            <OMI>n</OMI>
          </OMA>
        </OMA>
      </OMBIND>
    </FMP>
  </OLEDefinition>
</OMLangExt>

It is not clear that one is better than the other, but both are possible.

That's all I have on this matter right now.

Lars Hellström