[Om] Content Dictionary for Linked Data with RDF

Fri Mar 2 21:26:07 CET 2012

Hi Andrew,

2012-03-02 17:33 Andrew Robbins:
> This is my first post to the OM list, so bear with me.
> I have a few comments about OpenMath and RDF integration.

Welcome – good to hear from you!

@All: While writing the text below I recalled that there have been some 
relevant discussions about RDF on the www-math at w3.org list (i.e. the 
list of the W3C Math WG).  These links may be interesting:

http://lists.w3.org/Archives/Public/www-math/2010Jul/thread.html#msg19 
(How to use CDBase URIs in Content MathML?)

http://lists.w3.org/Archives/Public/www-math/2011Oct/thread.html#msg4 
(Namespaced RDFa attributes make RDFa more compatible with math markup)

> (1)
> I've been thinking about this for a long time, but so far I haven't
> done anything about it. In 2009, I made a blog post about it:
> http://straymindcough.blogspot.com/2009/06/semantic-mathml.html in
> which I discuss the opposite direction (OM/SCMML to RDF) than we seem
> to be discussing here (RDF to OM/SCMML). That post was supposed to be
> a discussion, apologies to Christoph for not responding.

BTW, I have meanwhile given your work some coverage here (just search for 
"Robbins"):

@article{Lange:OntoLangMathSemWeb,
     title = {Ontologies and Languages for Representing Mathematical 
Knowledge on the Semantic Web},
     author = {Christoph Lange},
     year = {2012},
     journal = {Semantic Web Journal},
     publisher = {IOS Press},
     pubstate = {inpress},
     url = 
{http://www.semantic-web-journal.net/content/ontologies-and-languages-representing-mathematical-knowledge-semantic-web},
     }

I still like your approach, and what I particularly like about it is 
that you reuse the URIs of the OpenMath symbols.

> (2)
> Identifier mappings have been, and always will be, nontrivial. We know
> all OMSymbols have a standard map to URI, and that any QName has at
> least 2 semi-common maps to a URI: {ns}/{local} and {ns}#{local}.

In RDF you fortunately don't have to think about QNames.  RDF is really 
just about URIs, and when prefixes are around, they are just syntactic 
sugar and lead to concatenating a namespace URI and a local name to 
obtain a fully qualified name.

QNames are an XML idiosyncrasy.
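
To illustrate that prefixes are pure syntactic sugar, here is a minimal 
Python sketch.  The prefix table and the function name expand are 
hypothetical and chosen only for this example; the two namespace URIs 
are the usual ones for rdf: and foaf:.

```python
# Hypothetical prefix table for illustration; any serialization that
# offers prefixes carries an equivalent mapping.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(prefixed_name: str) -> str:
    """Expand 'prefix:local' into a full URI by plain concatenation."""
    prefix, local = prefixed_name.split(":", 1)
    return PREFIXES[prefix] + local


print(expand("rdf:type"))
print(expand("foaf:name"))
```

Whether the namespace URI ends in # or / makes no difference here; only 
the resulting URI is visible to RDF.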

> Mapping from a URI to a QName is much easier, just look for / or # and
> reverse the process. Mapping a URI to an OMSymbol might be as simple
> as Christoph suggests, but this ignores URNs which do not have any /
> characters. I see two possible solutions to this: (3) special-case
> RDF, or (4) extend OM semantics to match SCMML3.
>
> (3)
> If we are going to treat RDF as a special case, then we should include
> all URIs traditionally associated with the rdf: and rdfs: prefixes
> with a single CD called 'rdf' (so we could use <OMS cd="rdf"
> name="type"/> without a nonstandard cdbase). Also, if there is any
> interest in my next suggestion (4), then I would recommend keeping
> literal_lang, literal_type (and only those two) in the 'rdf' CD, and
> move things like resource sets and value queries to a CD called 'rdf2'
> or something similar. I like the idea of literal_lang for encoding
> (@lang) and literal_type for encoding (^^type), but I would encourage
> more adventurous and pathological examples.

I agree (as in my original mail to Ken) that having a symbol 
literal_lang is necessary, as RDF itself does not have a name for this 
feature.

But literal_type is not necessary, as there is rdf:datatype for it.  OK, 
there is not strictly a URI for it, as there is e.g. for rdf:type, but 
it exists as an XML attribute of the RDF/XML syntax.

> I can come up with a few
> examples, such as when the subject/object appear more than once, etc.
> How would each graph be converted into OM? Do all graphs correspond to
> a set of OMATTRs? In my opinion, blank nodes should be treated
> differently, converting to values rather than references, but this
> cannot be done in the most general case because of repeated predicates
> and repeated objects.

Abbreviations of repetitions, like namespace prefixes, are just 
syntactic sugar offered by some serializations of RDF.  They are not 
part of the RDF semantics.  They are of course nice to have, but for 
getting started with representing RDF in OM, it is sufficient to cover:

* URIs
* literals (plain literals, literals with language tag, datatyped literals)
* blank nodes
* single triples
* graphs = sets of triples
* possibly (going beyond the RDF standard, but it's a widely accepted 
feature soon to be standardized) named graphs, i.e. the possibility to 
assign a URI to a graph
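
The list above can be sketched as a small data model.  This is only a 
sketch under my own assumptions: all class and field names are 
hypothetical, it says nothing about how the OM encoding itself should 
look, and the rule that a literal has either a language tag or a 
datatype (but not both) follows the RDF 2004 abstract syntax.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str  # only meaningful within one graph


@dataclass(frozen=True)
class Literal:
    lexical: str
    lang: Optional[str] = None      # literal with language tag
    datatype: Optional[URI] = None  # datatyped literal

    def __post_init__(self) -> None:
        # Assumption (RDF 2004): a literal is plain, language-tagged,
        # or datatyped, but never both tagged and typed.
        if self.lang is not None and self.datatype is not None:
            raise ValueError("literal cannot have both lang and datatype")


Node = Union[URI, BlankNode, Literal]


@dataclass(frozen=True)
class Triple:
    subject: Union[URI, BlankNode]
    predicate: URI
    obj: Node


@dataclass(frozen=True)
class Graph:
    triples: FrozenSet[Triple]  # a graph is a set of triples
    name: Optional[URI] = None  # set only for a named graph
```

Because a graph is a set, writing the same triple twice adds nothing, 
which is why abbreviations for repeated subjects or predicates need no 
counterpart in such a model.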

> I don't have a general solution, but I look
> forward to hearing discussions on this.
>
> (4)
> In Strict Content MathML3, all OMSymbols can be written <math
> cdgroup="{cdgroup file associated with symbol's cdbase}"
> xmlns="..."><csymbol cd="{cd}">{name}</csymbol></math> so there is no
> mismatch there,

A side note on math/@cdgroup: In principle yes, but IMHO the "cdgroup" 
feature is fatally flawed when it comes to linked data style publishing, 
i.e. making it as easy as possible for stupid clients to look up 
information about, here, symbols, in a large set of, here, OM/MathML 
data.  (Note that I'm referring to linked data as a general paradigm, 
not to RDF specifically.)  I once commented on this as follows (see 
the www-math threads linked above for further details if interested):

--- %< --- %< --- %< --- %< --- %< --- %< --- %< --- %< --- %< --- %< ---
{Note that, in contrast to \identifier{OMS}, \identifier{csymbol} only 
supports the \attribute{cd} and \attribute{name} attributes but not 
\attribute{cdbase}.  Instead, there can be a \identifier{math/@cdgroup} 
attribute for the whole \mathml object, which points to an \openmath CD 
group file that maps CD names to the corresponding CD \uris 
\cite[chapter~4.2.3]{W3C:MathML3:biblatex}.  This mechanism has to be 
used for any object using symbols from CD bases other than the default 
\url{http://www.openmath.org/cd}, unless the document format embedding 
the \mathml object defines a different mechanism – which, so far, only 
\omdoc does.  In a linked data setting, where objects possibly use 
symbols from many different CDs from distributed sources, that creates 
the challenge of where and how to provide such a CD group file.}
--- %< --- %< --- %< --- %< --- %< --- %< --- %< --- %< --- %< --- %< ---

> but the csymbol element definition (even in the Strict
> Content profile) specifies an href attribute

But @href has a different purpose, hasn't it?  A non-semantic hyperlink 
to click on, IIRC.

> (and a definitionURL
> attribute in the full profile), which allows for encoding any URI as
> is.

This is why I'd consider the problem of encoding RDF in MathML almost 
solved.  definitionURL not being part of the strict profile is possibly 
just due to the MathML authors not having thought about RDF 
compatibility, or …

> Semantically, this should also denote an OMSymbol, but how to do
> so is not obvious.

… because they deliberately wanted to distinguish mathematical symbols 
from "arbitrary things that have URIs" (where the latter is roughly 
RDF's view on "resources").

My personal view is that the things OM or MathML likes to call symbols 
could also represent arbitrary things, which is why I advocate 
liberalizing the OMSymbol URI format (just like csymbol/@definitionURL 
does).

> Since OM has had 3-field semantics since the
> beginning, concatenating an OMSymbol's fields is not an option.
>
> Consider if special semantics are attached to the case when the cd
> attribute is specified, but is empty. Consider also modifying the
> section of the OM standard "Canonical URIs for Symbols" to the effect
> that if {cd} is empty, then the canonical URI for the OMSymbol is:
>
>    URI = cdbase-value + name-value

Interesting suggestion!  This sounds really appropriate, because a CD 
base is something like a namespace URI (but note that not everyone would 
agree with this view; see David Carlisle's post on www-math: 
http://lists.w3.org/Archives/Public/www-math/2010Jul/0027.html), which 
is the first of the two (instead of three) components that make up an 
RDF URI when thinking in terms of "namespace plus local name".

> otherwise, {cd} is non-empty, and the canonical URI is constructed
> according to OM-2.0 and previous.

But then wouldn't we run the risk of the translation function not having 
a well-defined inverse?  I could easily come up with a URI u for which

OMStoURI(URItoOMS(u)) != u

or similarly an OMS s=(cdbase,cd,name) for which

URItoOMS(OMStoURI(s)) != s

I think with this …

> The reverse direction could then
> make the assumption that it was constructed with a cdbase that ends in
> some non-NCName character, which can be used to split the URI into an
> OMSymbol.

… you are trying to rule out such problems, but it's not yet exactly 
clear to me how.
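
To make the round-trip concern concrete, here is a minimal Python 
sketch.  oms_to_uri follows the OM 2.0 rule (cdbase + "/" + cd + "#" + 
name) plus the proposed empty-cd rule.  uri_to_oms is only one 
hypothetical inverse that I made up for illustration: it prefers the 
OM 2.0 shape and otherwise splits after the last non-name character, 
using an ASCII-only approximation of NCName characters.

```python
import re
from typing import NamedTuple


class OMS(NamedTuple):
    cdbase: str
    cd: str
    name: str


def oms_to_uri(s: OMS) -> str:
    if s.cd == "":
        # Proposed rule: URI = cdbase-value + name-value
        return s.cdbase + s.name
    # OM 2.0 rule: cdbase-value + '/' + cd-value + '#' + name-value
    return f"{s.cdbase}/{s.cd}#{s.name}"


# Assumption: simplified ASCII-only stand-in for NCName characters.
_TRAILING_NAME = re.compile(r"[A-Za-z0-9_.\-]*$")


def uri_to_oms(u: str) -> OMS:
    """Hypothetical inverse; not taken from any standard."""
    head, sep, name = u.rpartition("#")
    if sep and "/" in head:
        cdbase, _, cd = head.rpartition("/")
        if cd:
            return OMS(cdbase, cd, name)  # looks like an OM 2.0 URI
    # Otherwise split after the last non-name character.
    m = _TRAILING_NAME.search(u)
    return OMS(u[:m.start()], "", u[m.start():])


s = OMS("http://example.org/ns#", "", "foo")
print(uri_to_oms(oms_to_uri(s)) == s)
```

With this particular inverse, the symbol s above does not survive the 
round trip: its URI http://example.org/ns#foo is indistinguishable from 
the OM 2.0 URI of (http://example.org, ns, foo).  A different inverse 
would move the ambiguity elsewhere rather than remove it, unless the 
admissible cdbase values are restricted in some way.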

Cheers,

Christoph

-- 
Christoph Lange, Jacobs University Bremen
http://kwarc.info/clange, Skype duke4701
