[Om] Adding DLMF links to CDs [Re: How to translate csymbol/@definitionURL]
Paul Libbrecht
paul at activemath.org
Sun Aug 29 00:10:34 CEST 2010
Le 28 août 2010 à 14:34, Lars Hellström a écrit :
> Paul Libbrecht skrev:
>> Well, let's continue the lesson... applying it to canonicalization for search purposes.
>
> Where of course canonisation of data may turn out to be a too expensive operation to be applied to the entire database being searched, and other methods must be used to sieve it first, but that's true regardless of how you decide equality.
No performance issue here... canonicalization would be applied at "analysis" time:
- on each query terms at query time
- on each database's terms at indexing time (once a document changes)
>> It's quite human and I have the impression we're waiting on the invasion of rewriting systems:
>
> They have been around for ages... A problem publicity-wise is that when it was new then computers were too small to fit much anything in memory.
But they still don't seem to enjoy a super visibility. Or?
>> - uniqueness is the holy grail to be reached, I don't think it can be in any way *ensured*. I had the impression rewriting systems were on a safer side
>
> Uniqueness is indeed not ensured by the definition, but something one strongly desires if the point of the exercise is to find the canonical form of some given expression. (Generative grammars may also be viewed as rewriting systems, in which case uniqueness would only hold for singleton languages.) Asking questions about arbitrary rewriting systems runs straight into the general word problem, but questions asked about specific rewriting systems may well be possible to address computationally.
Maybe that's the problem: uniqueness is the grail that involves a "perfect" transmission of the knowledge of the "index guru" to the search-engine-programme.
Such perfectness is not achievable at all in word-based search engines where the precision and recall have a mean of around 25% in classical competitions.
> Normally, you first want to prove termination (e.g. that you avoid silly rewrite cycles and rewrites that will forever make the result longer);
Seems necessary indeed, though, again... a strategy such as "ah well, this one is too complex to get a smart treatment" is quite safe.
> many tools and techniques for this are available (cf. http://www.floc-conference.org/WST-home.html). Then checking whether all critical pairs satisfy the diamond property will typically (it depends slightly on what kind of things you allow as rules) yield an *algorithm* that for any finite rewrite system either proves that it is complete (that normal forms with respect to it are unique) or gives an example of an expression that can be rewritten to two different normal forms.
That seems to be way too theoretic for the practical case.
I would wish statistical approaches...
paul
More information about the Om
mailing list