[Om] Mathematical Vernacular in formulae

Tue Jan 25 08:46:02 CET 2011

Dear all,

Here is an issue that has been bothering me for a while. I am asking you 
for your advice.

As you know, in OMDoc we use OpenMath for content markup of formulae 
that occur in natural language text. For instance, we write something like

<p>This is the way we write a sum:
<OMOBJ>
<OMA>
<OMS cd="arith1" name="plus"/>
<OMV name="x"/>
<OMI>1</OMI>
</OMA>
</OMOBJ>,
simple, isn't it?
</p>

This is then transformed to parallel Math Markup using a user-adaptable 
presentation process. We can even generate the markup above from a 
special version of LaTeX... But you probably know all this already.
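
For readers who have not seen it, here is a minimal sketch of what such 
parallel markup could look like for the sum above. This is only an 
illustration: the actual output depends on the presentation process, and 
the choice of Content MathML inside the annotation (rather than OpenMath) 
is an assumption made here for brevity.

```xml
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <semantics>
    <!-- presentation branch: what the reader sees -->
    <mrow>
      <mi>x</mi>
      <mo>+</mo>
      <mn>1</mn>
    </mrow>
    <!-- content branch: what the formula means (assumed encoding) -->
    <annotation-xml encoding="MathML-Content">
      <apply>
        <csymbol cd="arith1">plus</csymbol>
        <ci>x</ci>
        <cn type="integer">1</cn>
      </apply>
    </annotation-xml>
  </semantics>
</math>
```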

NOW comes the problem. In mathematical texts we often find constructions 
that have natural language _inside_ mathematical formulae. For instance, 
I found (using TeX for simplicity; what a pity we cannot use MathML in 
e-mails yet)

$\{\langle a,b\rangle\bigl|\text{$a\in T$ and $P$ terminates for $a$ with $b$}\}$

There are many other examples, and you have probably seen plenty of 
them. But how do we mark this up in OpenMath (and should we at all)?

For the moment, we came up with the following markup

<om:OMOBJ>
   <om:OMA>
     <om:OMS cd="sets-introduction" name="setst"/>
     <om:OMA>
       <om:OMS cd="sets-operations" name="tup"/>
       <om:OMV name="a"/>
       <om:OMV name="b"/>
     </om:OMA>
     <om:OMSTR>
       <om:OMOBJ>
         <om:OMA>
           <om:OMS cd="sets-introduction" name="inset"/>
           <om:OMV name="a"/>
           <om:OMS cd="terms" name="terms"/>
         </om:OMA>
       </om:OMOBJ>
       and
       <om:OMOBJ><om:OMV name="P"/></om:OMOBJ>
       terminates for
       <om:OMOBJ><om:OMV name="a"/></om:OMOBJ>
       with
       <om:OMOBJ><om:OMV name="b"/></om:OMOBJ>
     </om:OMSTR>
   </om:OMA>
</om:OMOBJ>

It (ab-)uses an OMSTR element for the text and embeds OpenMath objects 
in it. This is clearly not right, since OMSTR was not meant for such 
uses. In MathML we would be slightly better off: we could just escape 
to presentation MathML (and you may want to say that this is the 
correct thing to do). This would be

<math>
   <apply>
     <csymbol cd="sets-introduction">setst</csymbol>
      <apply>
       <csymbol cd="sets-operations">tup</csymbol>
       <ci>a</ci>
       <ci>b</ci>
     </apply>
     <mtext>
       <math>
         <apply>
           <csymbol cd="sets-introduction">inset</csymbol>
           <ci>a</ci>
           <csymbol cd="terms">terms</csymbol>
         </apply>
       </math>
       and
       <math><ci>P</ci></math>
       terminates for
       <math><ci>a</ci></math>
       with
       <math><ci>b</ci></math>
     </mtext>
   </apply>
</math>

which is slightly more palatable semantically, since it does not treat 
the natural language as a string.

The correct (if very tedious) thing to do is probably something like
<math>
<apply>
<csymbol cd="sets-introduction">setst</csymbol>
<apply>
<csymbol cd="sets-operations">tup</csymbol>
<ci>a</ci>
<ci>b</ci>
</apply>
<semantics>
<mtext>
<math><share src="f1"/></math>
and
<math><share src="f2"/></math>
terminates for
<math><share src="f3"/></math>
with
<math><share src="f4"/></math>
</mtext>
<annotation-xml encoding="Content-MathML">
<apply>
<csymbol cd="logic1">and</csymbol>
<apply id="f1">
<csymbol cd="sets-introduction">inset</csymbol>
<ci>a</ci>
<csymbol cd="terms">terms</csymbol>
</apply>
<apply>
<ci>terminates-with-on</ci>
<ci id="f2">P</ci>
<ci id="f3">a</ci>
<ci id="f4">b</ci>
</apply>
</apply>
</annotation-xml>
</semantics>
</apply>
</math>

For this we are using a variant of parallel markup with the new <share> 
element in MathML 3 (which corresponds to the <OMR> element in OpenMath), 
since we want to keep to content markup as far as possible.

Tell me what you think; how should we deal with such situations? Maybe 
OpenMath should have a (restricted) way to escape to mathematical 
vernacular inside formulae, via an OMNL element that encapsulates 
natural language (i.e. one that could be used where OMSTR was abused in 
the first example)?
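
To make the suggestion concrete, here is a rough sketch of the set 
example with such an element. OMNL does not exist in OpenMath; its name, 
its mixed content model (text interleaved with OpenMath objects) and the 
decision to embed the objects directly, without nested OMOBJ wrappers, 
are all assumptions made purely for illustration.

```xml
<om:OMOBJ xmlns:om="http://www.openmath.org/OpenMath">
  <om:OMA>
    <om:OMS cd="sets-introduction" name="setst"/>
    <om:OMA>
      <om:OMS cd="sets-operations" name="tup"/>
      <om:OMV name="a"/>
      <om:OMV name="b"/>
    </om:OMA>
    <!-- hypothetical OMNL: natural language with embedded objects -->
    <om:OMNL>
      <om:OMA>
        <om:OMS cd="sets-introduction" name="inset"/>
        <om:OMV name="a"/>
        <om:OMS cd="terms" name="terms"/>
      </om:OMA>
      and <om:OMV name="P"/> terminates for <om:OMV name="a"/>
      with <om:OMV name="b"/>
    </om:OMNL>
  </om:OMA>
</om:OMOBJ>
```

Compared with the OMSTR version, the text is no longer an opaque string, 
but the meaning of the phrase is still not formalized the way it is in 
the parallel-markup variant.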

best,

Michael