[Om] On notation

Sat Nov 4 12:31:24 CET 2023

Summarising (and elaborating on) some points from a discussion Michael and I had after my seminar yesterday.

Michael was of the opinion that everything would be much easier for me if I just used STeX.

I had certain reservations. If the aim were strictly for me to write math papers where the semantic content[*] is machine-readable, then STeX would indeed be better for this than the openmathcd package I've been working on, but that's not my main use case at the moment. Also, many of the problems I talked about concern mathematical structures that in formalised logic would be part of a well-formed formula, but that in traditional publications are written out in words, often spanning several sentences; it's not clear to me that STeX helps with that (though I could be convinced otherwise).

Rather, I see OM primarily as a tool for letting computers speak math. There is a population of people who write programs for carrying out some advanced calculation, because it's simply too massive to do by hand. If the result is just a list of numbers then exporting the result is straightforward, but if the result of the calculation is rather some formula then things get messier. I've written such programs encoding their results as Maple input, as Mathematica input, as LaTeX, as XHTML, and probably other things still — it's quirky, fragile, and not at all pretty. A great thing about OM is that it provides exactly the level of detail these people need for their primary output: only the semantics, none of that messy prettyprinting or presentation.
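To make the "only the semantics" point concrete, here is a minimal sketch of how a program might emit a computed formula as an OpenMath XML object. It uses only Python's standard library and the arith1 content dictionary; the helper names (oms, omv, omi, oma, omobj) are made up for this illustration and do not come from any existing OpenMath library.

```python
import xml.etree.ElementTree as ET

# Namespace of the OpenMath XML encoding.
OM_NS = "http://www.openmath.org/OpenMath"

# The helpers below are hypothetical conveniences for this sketch.
def oms(cd, name):
    """Symbol from a content dictionary."""
    return ET.Element("OMS", {"cd": cd, "name": name})

def omv(name):
    """Variable."""
    return ET.Element("OMV", {"name": name})

def omi(value):
    """Integer."""
    element = ET.Element("OMI")
    element.text = str(value)
    return element

def oma(head, *args):
    """Application of head to args."""
    element = ET.Element("OMA")
    element.append(head)
    element.extend(args)
    return element

def omobj(child):
    """Wrap an expression in the top-level OMOBJ element."""
    root = ET.Element("OMOBJ", {"xmlns": OM_NS})
    root.append(child)
    return root

# x^2 + 1, stated purely in terms of arith1 symbols: no layout decisions.
expr = oma(
    oms("arith1", "plus"),
    oma(oms("arith1", "power"), omv("x"), omi(2)),
    omi(1),
)
xml_text = ET.tostring(omobj(expr), encoding="unicode")
print(xml_text)
```

The program never has to decide on operator precedence, parentheses, or fonts; whoever consumes the object can pick a presentation later.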

A further selling point for us would be Open Research Data — increasingly it's not sufficient to just publish a paper, you must also make your research data available. If your research data consist of big symbolic expressions, then how can you make sure people will be able to parse them? You use OpenMath!

One stumbling block is finding all the symbols you need. Michael claimed that writing new content dictionaries is easy in STeX, and that he himself has written hundreds. But if they don't show up in the big list on www.openmath.org, then how is anyone outside the Kohlhase academic lineage to know?

Another stumbling block is notation. It's nice that your program outputs results that are future-proof and unambiguously machine-readable, but you probably would like to read them yourself as well. That means you need to generate a presentation, and you may want to control the notation used in that. How does one do that? Last time I looked into the matter, there wasn't much of a system (apart from the collection of XSLTs on www.openmath.org that generate MathML from OMOBJs in our CDs, and it's not clear to me if that is meant to be adaptable).

Michael claimed there is not one system but two: an STeX one and an MMT one. He expressed regret that there is no common standard (but he only has so many hours in a day).

From my perspective, it is news that there even IS some system — presuming that it is generic and comprehensible — and I consider standardisation to be a minor issue (that may well solve itself, by some system emerging as a de facto standard). I would LOVE to see a talk on how it works and how one uses it to this end: given an OMOBJ and some source of notation specifications, generate a presentation of that OMOBJ in some *ML or LaTeX.
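To illustrate the kind of pipeline I have in mind (and emphatically not how STeX or MMT actually do it, which I don't know), here is a toy sketch: a table maps (cd, name) pairs to LaTeX-producing rules, and a small recursive function walks the OMOBJ. The NOTATIONS table and the to_latex function are hypothetical, and the sketch ignores operator precedence and parenthesisation entirely.

```python
import xml.etree.ElementTree as ET

OM = "{http://www.openmath.org/OpenMath}"

# Hypothetical "source of notation specifications":
# (cd, name) -> function from rendered arguments to LaTeX.
NOTATIONS = {
    ("arith1", "plus"): lambda args: " + ".join(args),
    ("arith1", "times"): lambda args: " \\cdot ".join(args),
    ("arith1", "power"): lambda args: "{%s}^{%s}" % (args[0], args[1]),
}

def to_latex(node, notations=NOTATIONS):
    """Render an OpenMath XML element as LaTeX (no precedence handling)."""
    tag = node.tag.replace(OM, "")
    if tag == "OMOBJ":
        return to_latex(node[0], notations)
    if tag == "OMI":
        return node.text.strip()
    if tag == "OMV":
        return node.get("name")
    if tag == "OMS":
        return "\\mathrm{%s}" % node.get("name")
    if tag == "OMA":
        head, *args = list(node)
        rendered = [to_latex(arg, notations) for arg in args]
        key = (head.get("cd"), head.get("name"))
        if key in notations:
            return notations[key](rendered)
        # No notation known: fall back to generic function application.
        return "%s(%s)" % (to_latex(head, notations), ", ".join(rendered))
    raise ValueError("unsupported OpenMath element: " + tag)

SAMPLE = (
    '<OMOBJ xmlns="http://www.openmath.org/OpenMath"><OMA>'
    '<OMS cd="arith1" name="plus"/>'
    '<OMA><OMS cd="arith1" name="power"/><OMV name="x"/><OMI>2</OMI></OMA>'
    '<OMI>1</OMI>'
    '</OMA></OMOBJ>'
)

latex = to_latex(ET.fromstring(SAMPLE))
print(latex)

# Controlling notation means swapping entries in the table.
custom = dict(NOTATIONS)
custom[("arith1", "power")] = lambda args: "%s^{\\wedge %s}" % (args[0], args[1])
custom_latex = to_latex(ET.fromstring(SAMPLE), custom)
```

A real system would need much more (precedences, binders, attributions, a declarative format for the notation source), which is exactly why I would like to see how the existing ones handle it.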

This needn't even be a drain on valuable professor time, but could be handled as an exercise (in preparing and performing academic presentations) for some student who is already familiar with one system: either do a Zoom talk/tutorial, or produce a write-up for posting on www.openmath.org.

Lars Hellström

[*] As a side remark, I note that there are developments in public administration vis-a-vis accessibility that over time could grow into requirements that formulae in official documents must have machine-readable semantic content. I'm extrapolating, but last month when the Swedish national library requested comments on some future guidelines for open access, they were very explicit that replies in PDF had to be "accessible PDF" (which after some unravelling turned out to mean tagged PDF, to aid speech synthesis). I can certainly picture bureaucrats imposing similar requirements on grant proposals.