An OpenMath Float proposal

Tue Jul 13 18:54:12 CEST 1999

The main body of this proposal is what I would ideally like to
see in OpenMath, but there are several options presented that
would preserve (more) compatibility with the current format.
Further, there are several points that bear discussion.  Thus,
this is not presented as a `vote up or down' proposal, but for
further discussion.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
I) OpenMath Float (Section 3.1)

  Floating-point numbers represent (possibly) inexact real
  numbers.  Each such number has an optional precision attribute
  which specifies how many digits are considered valid, or whether
  it is to be considered as an exact (rational) number.  OpenMath
  Floats may also be specified as double precision numbers in
  IEEE 754-1985[2] format.

  An OpenMath float does not cover all possible notions of inexact
  number, nor does it attempt to cover arbitrary exact real values;
  symbolic expressions involving symbols from an appropriate
  CD should be used in such cases.

  On reading an OpenMath float, an application is encouraged to
  preserve the specified precision of the float.  This applies
  especially to interface libraries and to programs which primarily
  act as `filters', writing the processed OpenMath expressions out
  for archiving or further processing.  However, applications which
  lack multi-precision
  float facilities (`bigfloats'), particularly numerical programs,
  are permitted to truncate the numbers to their internal
  representation.  In such cases, IEEE 754-1985[2] is the preferred
  arithmetic.

  [*** NOTES ***
    Comments invited, particularly on paragraph 3.
  ]

II) XML Encoding (Section 4.1.2)

  Floating-point numbers are encoded using the OMF element in one
  of two forms:
   a) The XML-attribute hex [Note 1] provides 16 characters [0-9,A-F]
      representing the internal format of an IEEE double precision
      float, with the most significant byte first [Note 2].  For
      example, <OMF hex="0123456789ABCDEF"/> represents a positive
      double of roughly 3.5e-303.
   b) The value may be expressed in base 10, using the common syntax
         (-?)([0-9]+)?("."[0-9]+)?(e([+-]?)[0-9]+)?
      in the content of the OMF element.  The optional attribute
      prec specifies the precision of the number, and may take on
      one of the following values:
         n     : ([0-9]+) n decimal digits of the mantissa are
                 assumed to be correct. n can be greater than the
                 length of the mantissa, in which case trailing `0's
                 are implied.  Conversely, more than n digits can be
                 given in order to assist in reconstructing a more
                 faithful internal (e.g. binary) representation.
         exact : an exact (rational) value is presented [Note 3]
         asis  : exactly the given digits are correct. [Note 4]
      If the prec attribute is missing, asis is assumed
      (or exact, or ? [Note 5])
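
  As a rough illustration of the two forms (not part of the proposed
  spec text), a reader might handle them along the following lines.
  The function names decode_hex and interpret_decimal are hypothetical.
  The sketch assumes the most-significant-byte-first reading of hex,
  takes asis as the default for a missing prec, and counts mantissa
  digits exactly as written (leading zeros included), which a real
  implementation may well want to refine.

```python
import re
import struct
from fractions import Fraction

# The form (b) syntax, transcribed as a Python regex.  As written it
# also matches strings with no mantissa digits (even the empty string),
# so the sketch rejects those separately; that check is an assumption.
OMF_DECIMAL = re.compile(r"(-?)([0-9]+)?(\.[0-9]+)?(e([+-]?)[0-9]+)?")


def decode_hex(hex_attr):
    """Form (a): 16 hex characters, most significant byte first."""
    if len(hex_attr) != 16:
        raise ValueError("hex attribute must be exactly 16 characters")
    return struct.unpack(">d", bytes.fromhex(hex_attr))[0]


def interpret_decimal(text, prec="asis"):
    """Form (b): return (value, digits).

    value  is the exact rational value of the written digits;
    digits is the number of mantissa digits assumed correct,
           or None when prec is 'exact'.
    """
    m = OMF_DECIMAL.fullmatch(text)
    if m is None or not (m.group(2) or m.group(3)):
        raise ValueError("not a valid OMF decimal string: %r" % text)
    value = Fraction(text)
    if prec == "exact":
        return value, None
    if prec == "asis":
        int_digits = len(m.group(2) or "")
        frac_digits = len(m.group(3) or ".") - 1   # drop the "."
        return value, int_digits + frac_digits
    return value, int(prec)                        # prec="n"
```

  For instance, interpret_decimal("1.5", "10") keeps the value 3/2 but
  records 10 valid digits (trailing zeros implied), while
  decode_hex("3FF0000000000000") yields the double 1.0.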

  Attribute dec is deprecated.

  [*** NOTES ***
   *1*: I would prefer the attribute IEEE over hex, but this would
        break compatibility.
   *2*: Am I misreading the current draft spec, or is the value
        of the hex attribute to be given least significant byte first?
        The binary encoding specifies most significant byte first!
   *3*: This seems like a useful concept to represent, but there
        certainly is NO expectation that an application has a
        representation for it.
   *4*: "asis" is obviously a poor name; is it even needed?
   *5*: what should be the default?
   ]

III) Binary Encoding (Section 4.2.2)

  [This adds to, rather than changes, the encoding of IEEE float]

  float --> [3]{_}{_}               ;  i.e. 8 bytes
         |  [9][prec][n] data:n     ;  1 byte each for prec & n
         |  [9+128]{prec}{n} data:n ;  4 bytes each for prec & n

   prec gives the number of decimal digits of precision, using 1 or
   4 bytes.[Note 1]
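
   To make the two decimal-float productions concrete, here is a
   minimal sketch of framing and unframing an already-packed data
   field.  The names frame_decimal_float and unframe_decimal_float
   are hypothetical.  The sketch assumes that the 4-byte fields are
   sent most significant byte first, and that a writer picks the
   short form whenever both prec and n fit in one byte; neither
   point is fixed by the grammar above.

```python
import struct

TAG_DECIMAL_FLOAT = 9    # tag value from the grammar
LONG_FLAG = 128          # added to the tag when prec and n take 4 bytes each


def frame_decimal_float(prec, data):
    """Prefix packed digit data with its tag, prec and length n."""
    n = len(data)
    if 0 <= prec < 256 and n < 256:
        return bytes([TAG_DECIMAL_FLOAT, prec, n]) + data
    # Assumption: big-endian ("network order") for the 4-byte fields.
    return struct.pack(">BII", TAG_DECIMAL_FLOAT + LONG_FLAG, prec, n) + data


def unframe_decimal_float(buf):
    """Return (prec, data) from a framed decimal float."""
    tag = buf[0]
    if tag == TAG_DECIMAL_FLOAT:
        prec, n = buf[1], buf[2]
        start = 3
    elif tag == TAG_DECIMAL_FLOAT + LONG_FLAG:
        prec, n = struct.unpack_from(">II", buf, 1)
        start = 9
    else:
        raise ValueError("not a decimal float tag: %d" % tag)
    data = buf[start:start + n]
    if len(data) != n:
        raise ValueError("truncated data field")
    return prec, data
```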

   The n bytes of data give the number formatted in the common,
   decimal, floating point format, then packed in what might be
   called Augmented BCD [Note 2].  Each byte represents 2 nybbles,
   with each 4 bit nybble corresponding to a character, as follows:
      nybble  <=>  character
      [0-9]   <=>  ['0'-'9']
      A       <=>  +
      B       <=>  -
      C       <=>  .
      [DE]    <=>  e     [Note 3]
      F       <=> null (for unneeded lower 4 bits of an odd-length string)
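
   The packing itself can be sketched in a few lines.  The names
   pack_abcd and unpack_abcd are hypothetical.  The sketch puts the
   first character of each pair in the upper nybble (which is what
   the F padding rule implies), always writes `e' as E, and accepts
   either D or E on input.

```python
# Character <-> nybble table from the proposal.
ENCODE = {str(d): d for d in range(10)}
ENCODE.update({"+": 0xA, "-": 0xB, ".": 0xC, "e": 0xE})
DECODE = {v: k for k, v in ENCODE.items()}
DECODE[0xD] = "e"          # D and E both read back as 'e'
PAD = 0xF


def pack_abcd(text):
    """Pack a decimal float string into Augmented BCD, 2 chars per byte."""
    nybbles = [ENCODE[c] for c in text]
    if len(nybbles) % 2:
        nybbles.append(PAD)            # fill the unused lower nybble
    return bytes((hi << 4) | lo
                 for hi, lo in zip(nybbles[0::2], nybbles[1::2]))


def unpack_abcd(data):
    """Recover the ASCII string from Augmented BCD bytes."""
    nybbles = []
    for byte in data:
        nybbles.append(byte >> 4)
        nybbles.append(byte & 0x0F)
    if nybbles and nybbles[-1] == PAD:
        nybbles.pop()                  # drop the padding nybble
    if PAD in nybbles:
        raise ValueError("padding nybble in the middle of the data")
    return "".join(DECODE[n] for n in nybbles)
```

   With these, pack_abcd("-1.5e+10") gives the four bytes B1 C5 EA 10,
   against eight bytes for the plain ASCII string.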

  [*** NOTES ***
   *1*: The special precision values exact, asis,... could be
        encoded by 0 or negative numbers -- except that [_] and {_}
        are interpreted as unsigned!  Sort this out after (and if!)
        there is agreement on prec.
   *2*: It is _trivial_ to convert between this format and an ASCII
        string; the latter can be passed to library functions for
        parsing. Further, this format uses only half the space of a
        normal string.
   *3*: D could represent a space if we wanted to preserve that.
   ]

IV) Binary Encoding of OMI (Section 4.2.2)

    I would suggest that Augmented BCD be used in the binary encoding
    of OM Integers, as well.
    PRO: 1) The same function could be used for packing/unpacking
            in OMI & OMF.
         2) It requires HALF the space of the current encoding.
    CON: 1) It is incompatible with any currently stored objects.
         2) Numbers could only be given in base 10; NOT base 16
            (the proposed scheme is _much_ more compact, though).

    If this sub-proposal is not accepted, it may be better to encode
    OMF using the plain `unpacked' ASCII string, so as to reduce
    confusion with multiple strange packing schemes.

-- 
bruce.miller at nist.gov
http://math.nist.gov/~BMiller/