# OM Floats

Richard J. Fateman fateman at cs.berkeley.edu
Wed Jul 7 00:27:05 CEST 1999

```>> (RJF) Since the double-precision semantics appears to be defective
>> all by itself, I'm neutral on this.

>(David) Can you expand on this point? You may be misled because of the rather
>selective way I quoted from the current draft, or there may be a real
>problem that you are highlighting; I cannot tell.

It seems to me there are two parts of the specification for a number.

1. The value of the number.  In particular, 2, 2.0, 2.0000, 4/2, 16/8,
1*2^1, X2, etc. could be different strings representing the SAME number.

If OM is unable to specify unambiguously and efficiently the value
of a rational number, then you should all go home and work on this.
(Note that in general exact REAL numbers cannot be specified, though
nearby rational numbers cause no problems.)

2. The range/precision/radix encoding of the number.
For example, IEEE binary single is shorthand for a particular parameter set.
So is IEEE double.  IEEE extended sets only a MINIMUM precision, not
a maximum.

This could be specified by a short list of parameters.  This list
of parameters would be general enough to include any radix, any
exponent length, any precision, and maybe even rounding modes etc.
(though I think of those as being attributes of the arithmetic,
not the data)

If you want to represent IEEE infinities and NaNs, then you may wish
to revisit this. It may be a mistake to use floats in decimal or hex at all.

After all, unless the host machine chooses to use floats
for these values, why should OM insist?

THESE ARE NOT RATIONAL NUMBERS.

In the massively richer environment of OM, there should be ways of
dealing with these issues that are FAR FAR better.  You can have any
number of infinities (real, signed, complex with arguments) and any
number of NaNs.  In fact, all your favorite symbols like x, y,
z... they are NaNs.

In the paucity of representation of the scientific computing world
you have 64 bits in your value universe.  That's why NaNs help.
For OM, where numbers seem to be represented by about a zillion
characters

<yo, man, here comes a number> dec=2 <bye, man, that was a number>

you can hardly claim you need this for compactness.

Now if you want to convey exactly the same NaN from one machine
to another, be warned that a NaN can have encoded information that
is specific to a machine type, and in fact to a process loaded at
a specific memory location.  E.g. it may encode the process status
word and program counter of the process that produced an overflow.

If you need this info, I suggest you use "64 bit binary quantity"
encoding.

>(David)In particular I am a bit confused by your comment

>> (RJF)At some point the real numbers
>> 2.0
>> and
>> 1.99999999999999999999999999999999999999999999999999...
>> become indistinguishable.  When does that happen?

>as I thought that the main motivation for allowing arbitrary precision
>real numbers was to ensure that for systems where it mattered, those
>two, for any finite length of 9s would be distinguishable.

Exactly not the case.  As Bruce Miller points out,
to have the precision of a number depend on the number of
decimal digits you happen to write down is a terrrrrible rule.

It contributes to major lossage in that it makes it impossible
to explain Mathematica's arithmetic rationally.  It makes binary to
decimal conversion impossible, in general.

>This
>behaviour would be distinct from the behaviour of the current
>OMF which is a fixed double precision representation.

>> I don't follow everything here, but unless you have specified
>> algorithms for binary-to-decimal conversion, you haven't specified
>> what any decimal encoding means, exactly

>Which is a reasonable point. Since the libraries that exist do accept
>floats in decimal representation, there does exist a de facto standard
>algorithm that is being used; perhaps the standard should say what that
>is.

There is an accuracy requirement in IEEE 754, but this is probably
wrong to use if you are in an environment in which mathematical
accuracy is paramount, and longer numbers are expected to be
available.

I suggest that if you want to have a useful floating representation,
you specify data something like this:

(pardon my ignorance of OM. I will write in lisp again)

(float-type=:IEEE-double, values= signed hex-pairs)
(array [2,2]
(2 4) (-3 5)    ;; comments.  2*2^4 = 32.0; -3*2^5 = -96.0
(0 0) (1  0)  ;; etc.

)

This is something anyone can read and write, I think.

You could also use rational number specifications.

Note that this is a sequence of characters.

You could encode it as a  list of 32-bit or 64-bit numbers efficiently,
or as a list of arbitrary-precision integers, with somewhat lower
efficiency.

At the substantial risk of belaboring this
point, I disagree with Gaston on the feasibility
of leaving out floats: it is feasible for OM to convey numeric
rationals only.  If someone wishes to encode one of these as
a bigfloat of some particular length, then there can be operations
for making this transformation in the computer system.  We do
not require the result of cos(12) to be an exact OM data quantity; why
should we require doublefloat(12) to be an exact OM data quantity so
long as we can transmit  "cos(12)"  or "doublefloat(12)"?

>David

Richard

```
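
The message's two central claims, that many strings can denote the same rational value and that a finite binary float can be written exactly as a pair of integers such as (2 4) or (-3 5), can be checked with a minimal Python sketch. The function names `to_pair` and `pair_value` are hypothetical, chosen only for this illustration; they are not part of OpenMath or of any proposal in the thread, and the reduction of the mantissa to an odd integer is an assumption made here to get a canonical pair.

```python
import math
from fractions import Fraction

# Different strings, one rational value (point 1 of the message).
assert Fraction("2") == Fraction("2.0000") == Fraction("4/2") == Fraction(16, 8)

def to_pair(x: float) -> tuple[int, int]:
    """Return integers (m, e) with x == m * 2**e exactly (m odd, or 0)."""
    if math.isnan(x) or math.isinf(x):
        # NaNs and infinities are not rational numbers, so they have no pair.
        raise ValueError("not a finite number")
    if x == 0.0:
        return (0, 0)
    frac, exp = math.frexp(x)          # x == frac * 2**exp, 0.5 <= |frac| < 1
    m = int(math.ldexp(frac, 53))      # a double has at most 53 significant bits
    e = exp - 53
    while m % 2 == 0:                  # strip trailing zero bits (canonical form)
        m //= 2
        e += 1
    return (m, e)

def pair_value(m: int, e: int) -> Fraction:
    """Exact rational value of the pair (m, e), i.e. m * 2**e."""
    return Fraction(m) * Fraction(2) ** e

print(to_pair(32.0), to_pair(-96.0))   # the values denoted by (2 4) and (-3 5)
print(pair_value(*to_pair(0.1)))       # the exact rational stored for 0.1
```

The pair form round-trips without any decimal conversion, which is the property the message is after: no binary-to-decimal algorithm has to be specified for the value to be unambiguous.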