An OpenMath Float proposal
Bruce R Miller
miller at cam.nist.gov
Tue Jul 13 18:54:12 CEST 1999
[This may be a 2nd copy? the 1st didn't seem to go through]
The main body of this proposal is what I would ideally like to
see in OpenMath, but there are several options presented that
would preserve (more) compatibility with the current format.
Further, there are several points that bear discussion. Thus,
this is not presented as a `vote up or down' proposal, but for
further discussion.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
I) OpenMath Float (Section 3.1)
Floating-point numbers represent (possibly) inexact real
numbers. Each such number has an optional precision attribute
which specifies how many digits are considered valid, or whether
it is to be considered as an exact (rational) number. OpenMath
Floats may also be specified as double precision numbers in
IEEE 754-1985[2] format.
An OpenMath float does not cover all possible notions of inexact
number, nor does it attempt to cover arbitrary exact real values
--- symbolic expressions involving symbols from an appropriate
CD should be used in such cases.
On reading an OpenMath float, an application is encouraged to
preserve the specified precision of the float. This applies
especially to interface libraries and to programs which act
primarily as `filters', writing the processed OpenMath expressions
out for archiving or further processing. However, applications
which lack multi-precision float facilities (`bigfloats'),
particularly numerical programs, are permitted to truncate the
numbers to their internal representation. In such cases,
IEEE 754-1985[2] is the preferred arithmetic.
[*** NOTES ***
Comments invited, particularly on paragraph 3.
]
II) XML Encoding (Section 4.1.2)
Floating-point numbers are encoded using the OMF element in one
of two forms:
a) The XML-attribute hex [Note 1] provides 16 characters [0-9,A-F]
representing the internal format of an IEEE double precision
float, with the most significant byte first [Note 2]. For
example, <OMF hex="0123456789ABCDEF"/> represents the double
whose eight bytes, most significant first, are
01 23 45 67 89 AB CD EF (a positive value of roughly 3.5e-303).
b) The value may be expressed in base 10, using the common syntax
(-?)([0-9]+)?("."[0-9]+)?(e([+-]?)[0-9]+)?
in the content of the OMF element. The optional attribute
prec specifies the precision of the number, and may take one
of the following values:
n : ([0-9]+) n decimal digits of the mantissa are
assumed to be correct. n can be greater than the
length of the mantissa, in which case trailing `0's
are implied. Conversely, more than n digits can be
given in order to assist in reconstructing a more
faithful internal (e.g. binary) representation.
exact : an exact (rational) value is presented [Note 3]
asis : exactly the given digits are correct. [Note 4]
If the prec attribute is missing, asis is assumed
(or exact, or ? [Note 5])
Attribute dec is deprecated.
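
As an illustration of form (a), here is a minimal Python sketch of
converting between a double and the 16-character hex attribute,
assuming the most-significant-byte-first order proposed above (which
Note 2 questions against the current draft). The function names are
hypothetical and not part of any OpenMath library.

```python
import struct


def double_to_omf_hex(x: float) -> str:
    """Return 16 uppercase hex characters, most significant byte first."""
    # ">d" packs an IEEE 754 double in big-endian (MSB-first) order.
    return struct.pack(">d", x).hex().upper()


def omf_hex_to_double(hex_str: str) -> float:
    """Decode a 16-character, MSB-first hex string into a double."""
    if len(hex_str) != 16:
        raise ValueError("expected exactly 16 hex characters")
    return struct.unpack(">d", bytes.fromhex(hex_str))[0]
```

With this byte order, 1.0 would be written as
<OMF hex="3FF0000000000000"/>. If the least-significant-byte-first
reading turns out to be the intended one, "<d" would replace ">d".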
[*** NOTES ***
*1*: I would prefer the attribute IEEE over hex, but this would
break compatibility.
*2*: Am I misreading the current draft spec, or is the value
of the hex attribute to be given least significant byte first?
The binary encoding specifies most significant byte first!
*3*: This seems like a useful concept to represent, but there
certainly is NO expectation that an application has a
representation for it.
*4*: "asis" is obviously a poor name; is it even needed?
*5*: what should be the default?
]
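
To make form (b) concrete, the following Python sketch parses the
decimal syntax together with the prec attribute. It is only one
possible reading: the function name is hypothetical, the default of
"asis" is the provisional one from the text (Note 5 leaves it open),
"asis" is taken to mean "count the mantissa digits as written", and
a string with no mantissa digits at all is rejected even though the
syntax as written would allow it.

```python
import re
from decimal import Decimal
from fractions import Fraction

# The proposal's syntax, transcribed as a Python regular expression.
OMF_DECIMAL = re.compile(r"(-?)([0-9]+)?(\.[0-9]+)?(e([+-]?)[0-9]+)?")


def parse_omf_decimal(text, prec="asis"):
    """Return (value, valid_digits); valid_digits is None for exact values."""
    m = OMF_DECIMAL.fullmatch(text)
    # Assumption: require at least one mantissa digit.
    if m is None or (m.group(2) is None and m.group(3) is None):
        raise ValueError(f"not an OMF decimal float: {text!r}")
    value = Decimal(text)
    if prec == "exact":
        # An exact (rational) value: keep it as a fraction.
        return Fraction(value), None
    int_digits = m.group(2) or ""
    frac_digits = (m.group(3) or ".")[1:]  # drop the leading "."
    if prec == "asis":
        # Exactly the digits given are taken as correct.
        return value, len(int_digits) + len(frac_digits)
    if not re.fullmatch(r"[0-9]+", prec):
        raise ValueError(f"unknown prec value: {prec!r}")
    # n may be smaller or larger than the number of digits given.
    return value, int(prec)
```

An application without bigfloats could simply call float() on the
returned value and discard the digit count, as section I permits.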
III) Binary Encoding (Section 4.2.2)
[This adds to, rather than changes, the encoding of IEEE float]
float --> [3]{_}{_} ; ie. 8 bytes
| [9][prec][n] data:n ; 1 byte for prec & n
| [9+128]{prec}{n} data:n ; 4 bytes for prec & n
prec gives the number of decimal digits of precision using 1 or
4 bytes.[Note 1]
The n bytes of data give the number formatted in the common,
decimal, floating point format, then packed in what might be
called Augmented BCD [Note 2]. Each byte represents 2 nybbles,
with each 4 bit nybble corresponding to a character, as follows:
nybble <=> character
[0-9] <=> ['0'-'9']
A <=> +
B <=> -
C <=> .
[DE] <=> e [Note 3]
F <=> null (for unneeded lower 4 bits of an odd-length string)
[*** NOTES ***
*1*: The special precision values exact, asis,... could be
encoded by 0 or negative numbers -- except that [_] and {_}
are interpreted as unsigned! Sort this out after (and if!)
there is agreement on prec.
*2*: It is _trivial_ to convert between this format and an ASCII
string; the latter can be passed to library functions for
parsing. Further, this format uses only half the space of a
normal string.
*3*: D could represent a space if we wanted to preserve that.
]
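
The nybble table above can be sketched in Python as follows. This
is a minimal illustration, not a reference implementation: the
function names are hypothetical, the first character of each pair
is assumed to go in the upper 4 bits (consistent with F padding the
lower 4 bits), and the packer always emits D for `e' while the
unpacker accepts both D and E.

```python
# Character <-> nybble table from the proposal.
CHAR_TO_NYBBLE = {str(d): d for d in range(10)}
CHAR_TO_NYBBLE.update({"+": 0xA, "-": 0xB, ".": 0xC, "e": 0xD})
NYBBLE_TO_CHAR = {v: k for k, v in CHAR_TO_NYBBLE.items()}
NYBBLE_TO_CHAR[0xE] = "e"  # the table maps both D and E to 'e'
PAD = 0xF                  # fills the unused low nybble


def pack_abcd(text: str) -> bytes:
    """Pack a decimal float string, two characters per byte."""
    # A character outside the table raises KeyError.
    nybbles = [CHAR_TO_NYBBLE[c] for c in text]
    if len(nybbles) % 2:
        nybbles.append(PAD)
    return bytes((hi << 4) | lo
                 for hi, lo in zip(nybbles[0::2], nybbles[1::2]))


def unpack_abcd(data: bytes) -> str:
    """Recover the ASCII string from Augmented BCD bytes."""
    nybbles = []
    for byte in data:
        nybbles.extend((byte >> 4, byte & 0x0F))
    if nybbles and nybbles[-1] == PAD:
        nybbles.pop()
    if PAD in nybbles:
        raise ValueError("padding nybble F is only valid at the very end")
    return "".join(NYBBLE_TO_CHAR[n] for n in nybbles)
```

Under these assumptions the 7-character string "-1.5e+3" packs into
the 4 bytes B1 C5 DA 3F, and the unpacked string can be handed
directly to an ordinary library parsing routine.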
IV) Binary Encoding of OMI (Section 4.2.2)
I would suggest that Augmented BCD be used in the binary encoding
of OM Integers, as well.
PRO: 1) The same function could be used for packing/unpacking
in OMI & OMF.
2) It requires HALF the space of the current encoding.
CON: 1) It is incompatible with any currently stored objects.
2) Numbers could only be given in base 10; NOT base 16
(the proposed scheme is _much_ more compact, though).
If this sub-proposal is not accepted, it may be better to encode
OMF using the plain `unpacked' ascii string, so as to reduce
confusion with multiple strange packing schemes.
--
bruce.miller at nist.gov
http://math.nist.gov/~BMiller/