Converting floating point to unsigned int while preserving order

I have found a lot of answers on SO focusing on converting float to int.

I am manipulating only positive floating point values. One simple method I have been using is this:

unsigned int float2ui(float arg0) {
    float f = arg0;
    unsigned int r = *(unsigned int*)&f;
    return r;
}

The above code works well yet it fails to preserve the numeric order. By order I mean this:

  float f1 ...;
  float f2 ...;
  assert( ( (f1 >= f2) && (float2ui(f1) >= float2ui(f2)) ) ||
          ( (f1 <  f2) && (float2ui(f1) < float2ui(f2)) ));

I have tried to use unions with the same results. Any idea? I use Homebrew gcc 5.3.0.

The code you're using, as written, has undefined behavior. If you want to access the representation of floats semi-portably (implementation-defined, well-defined assuming IEEE 754 and that float and integer endianness match), you should do:

#include <stdint.h>
#include <string.h>

uint32_t float2ui(float f){
    uint32_t r;
    memcpy(&r, &f, sizeof r);
    return r;
}

For non-negative values, this mapping between floating point values and representation is order-preserving. If you think you're seeing it fail to preserve order, we'll need to see exactly what values you think are a counterexample.
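
If you want to check that claim on your own values, a minimal sketch along these lines may help. It assumes a 32-bit IEEE 754 float with the same byte order as uint32_t; the names float_bits and bits_follow_value_order are made up for this illustration and are not from the question.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is 32 bits wide (true for IEEE 754 binary32). */
static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes a 32-bit float");

/* Same idea as float2ui: copy the bytes instead of casting pointers. */
static uint32_t float_bits(float f)
{
    uint32_t r;
    memcpy(&r, &f, sizeof r);
    return r;
}

/* Returns 1 if, for every pair in vals[0..n-1], comparing the bit patterns
   with < gives the same answer as comparing the float values with <;
   returns 0 as soon as one pair disagrees. */
static int bits_follow_value_order(const float *vals, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            if ((vals[i] < vals[j]) != (float_bits(vals[i]) < float_bits(vals[j])))
                return 0;
    return 1;
}
```

Feeding it an array of non-negative values (zero, FLT_MIN, FLT_MAX, INFINITY and anything in between) should report agreement; adding negative values should make it report a mismatch, which is the usual source of "it does not preserve order" reports.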

If f1 and f2 are floating-point values, f1 <= f2, and (int)f1 and (int)f2 are valid conversions, then (int)f1 <= (int)f2.

In other words, truncation to an integral type never reverses the order of two values.

You could replace float2ui with simply (int)arg0, having checked the float is in the bounds of an int.

Note that the behaviour of float to int and float to unsigned is undefined if the truncated float value is out of the range for the type.
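
As a rough illustration of the range check, here is a sketch that truncates to uint32_t (rather than int, since the question asks for an unsigned result). The names float_to_u32_checked and TWO_POW_32 are hypothetical, and the bound assumes the target type is exactly 32 bits wide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 2^32 is exactly representable as a float, whereas UINT32_MAX is not. */
#define TWO_POW_32 4294967296.0f

/* Truncates f toward zero and stores the result in *out.
   Returns false and leaves *out untouched when the truncated value would
   not fit in a uint32_t, which is the case the cast leaves undefined. */
static bool float_to_u32_checked(float f, uint32_t *out)
{
    assert(out != NULL);

    /* Values in (-1, 0) truncate to 0 and are fine.
       NaN fails both comparisons, so it is rejected as well. */
    if (!(f > -1.0f && f < TWO_POW_32))
        return false;

    *out = (uint32_t)f;
    return true;
}
```

Because truncation is monotonic, two inputs that both pass the check come out in the same (non-strict) order they went in: 1.9f gives 1 and 2.1f gives 2, while 1.2f and 1.9f both give 1.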

Your current code - somehow interpreting the float memory as int memory - has undefined behaviour. Even type-punning through a union will give you implementation-defined results; note in particular that sizeof(int) isn't necessarily the same as sizeof(float).

If you are using an IEEE754 single-precision float, a 32 bit 2's complement int with no trap representation, a positive value for conversion, consistent endianness, and some allowances for the various patterns represented by NaN and +-Inf, then the transformation effected by a type pun is order preserving.
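
Some of those preconditions can be checked rather than assumed. The following is only a sketch: the static assertions cover the size and the float.h parameters of binary32, and the hypothetical helper float_layout_matches_binary32 probes two known bit patterns at run time, which also catches a float/integer byte-order mismatch. It does not prove full IEEE 754 conformance.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Compile-time checks: 32-bit float with binary32 radix, precision and exponent range. */
static_assert(sizeof(float) == sizeof(uint32_t), "float is not 32 bits wide");
static_assert(FLT_RADIX == 2 && FLT_MANT_DIG == 24 && FLT_MAX_EXP == 128,
              "float does not look like IEEE 754 binary32");

/* Run-time check: returns 1 if 1.0f and -2.0f read back through a uint32_t
   as their expected binary32 encodings, 0 otherwise. A mismatch here would
   suggest that float and integer byte order differ. */
static int float_layout_matches_binary32(void)
{
    const float probes[2] = { 1.0f, -2.0f };
    const uint32_t expected[2] = { 0x3f800000u, 0xc0000000u };

    for (int i = 0; i < 2; i++) {
        uint32_t bits;
        memcpy(&bits, &probes[i], sizeof bits);
        if (bits != expected[i])
            return 0;
    }
    return 1;
}
```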

Extracting the bits from a float using a union should work. There is some debate about whether the C standard actually supports this, but whatever the standard says, gcc seems to support it, and I would expect too much existing code to depend on it for compilers to remove that support.

There are some things you must be aware of when putting a float in an int and keeping order.

  1. Special values like NaN do not have any order to keep.
  2. Floats are stored as a magnitude and a sign bit, while ints are two's complement (assuming a sane architecture). So for negative values, you must flip all the bits except the sign bit.
  3. If float and int do not have the same endianness on your architecture, you must also convert the endianness.

Here is my implementation, tested with gcc (Gentoo 6.4.0-r1 p1.3) 6.4.0 on x64

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

union ff_t
{
  float f;
  unsigned char a[4];
  int i;
};

int same_endianess = 0;

void
swap_endianess(union ff_t *ff)
{
  if (!same_endianess)
    {
       unsigned char tmp;
       tmp = ff->a[0];
       ff->a[0] = ff->a[3];
       ff->a[3] = tmp;

       tmp = ff->a[1];
       ff->a[1] = ff->a[2];
       ff->a[2] = tmp;
    }
}

void
test_endianess()
{
  union ff_t ff = { .f = 1 };

  if (ff.i == 0x3f800000)
    same_endianess = 1;
  else if (ff.i == 0x803f)
    same_endianess = 0;
  else
    {
      fprintf(stderr, "Architecture has some weird endianness\n");
      exit(1);
    }
}

float
random_float()
{
   float f = random();
   f -= RAND_MAX/2;

   return f;
}

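int
f2i(float f)
{
  union ff_t ff = { .f = f };

  swap_endianess(&ff);

  /* Non-negative floats already order correctly as ints */
  if (ff.i >= 0)
    return ff.i;

  /* Negative: flip every bit except the sign bit */
  return ff.i ^ 0x7fffffff;
}

float
i2f(int i)
{
  union ff_t ff;
  if (i >= 0)
    ff.i = i;
  else
    ff.i = i ^ 0x7fffffff; /* undo the flip done in f2i */

  swap_endianess(&ff);

  return ff.f;
}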


int
main()
{
  /* Test if floats and ints uses the same endianess */
  test_endianess();

  for (int n = 0; n < 10000; n++)
    {
       float f1 = random_float();
       int i1 = f2i(f1);
       float f2 = random_float();
       int i2 = f2i(f2);

       printf("\n");
       printf("0x%08x,  %f\n", i1, f1);
       printf("0x%08x,  %f\n", i2, f2);

       assert ( f1 == i2f(i1));
       assert ( f2 == i2f(i2));

       assert ( (f1 <= f2) == (i1 <= i2));
    }
}
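
Since the question asks for an unsigned result, here is a sketch of a closely related variant, not the code above: it uses memcpy instead of a union and produces a uint32_t key. Negative values have every bit flipped, and non-negative values get the top bit set so that they sort above all negatives. The names float_to_key and key_to_float are invented for this example; it assumes a 32-bit IEEE 754 float with matching byte order and does not handle NaN. Note that -0.0f and +0.0f get distinct, adjacent keys even though they compare equal as floats.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is 32 bits wide (IEEE 754 binary32). */
static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes a 32-bit float");

/* Maps a float to a key such that a < b implies key(a) < key(b)
   for all non-NaN inputs. */
static uint32_t float_to_key(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);

    if (bits & 0x80000000u)
        return ~bits;               /* negative: larger magnitude sorts lower */
    return bits | 0x80000000u;      /* non-negative: sorts above every negative */
}

/* Inverse of float_to_key. */
static float key_to_float(uint32_t key)
{
    uint32_t bits = (key & 0x80000000u) ? (key & 0x7fffffffu) : ~key;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

A key built this way can be compared with plain unsigned comparison, which is convenient when the value is one field of a larger sort key.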

Comments
  • What are you trying to do? If you just want an int holding the truncated value of the float, this is not the way. The way would be just unsigned int r = f; Otherwise it's just undefined behavior.
  • Floating point and integer have very different representations. Treating one as the other results in undefined behavior; it doesn't matter if you do it with pointers or unions. The only thing you can do portably is r = (int)f;, although this fails if f is larger than UINT_MAX.
  • Are you interested in getting the numeric value of the variable, or its bitwise representation? You can't do both.
  • Aside: You might prefer to assert (f1 >= f2) == (float2ui(f1) >= float2ui(f2))
  • The code invokes undefined behaviour.
  • How does this differ from the use of a union (float, unsigned int)?
  • @PaulFloyd: It might in C++, but C permits reinterpreting bytes through a union.
  • Detail: "For non-negative values (not NaNs) ... is order-preserving". OTOH, one may consider NaNs as lacking value.
  • C vaguely supports union reinterpretation via somewhat underspecified and inconsistent wording, so I prefer the memcpy approach, which is just as efficient on any decent compiler and clearly well-defined.
  • @toohonestforthissite: Sure it's allowed. There is some subtlety for the resulting effective type of objects with allocated storage, but for objects with declared type, none of that comes into play. The type does not change, and the resulting value is simply the result of interpreting the bytes you stored to the representation in the actual type. In general this could be a trap representation, but that would be implementation-defined, not undefined/disallowed, and for uint32_t it specifically cannot be because there are no spare bits to be padding.
  • @R.. Forgive me, but "converting float to int" normally means converting a float to an int?!
  • Given the tags (including ieee-754) it's clear that OP wants to access the representation, and assuming that, yes it's implementation-defined but OP has specified the implementation constraint.
  • I wouldn't say OP is 'clearly' trying to do anything. My interpretation is the same as @Bathsheba
  • BTW there are very good reasons for order-preserving transformations like this, for example integrating the values as part of a sort key.
  • @chux: I wonder if the OP is disallowing a signed integer zero when they say "positive"? Quite often "positive" is used when "non-negative" would be much better. For non-negative cases you are indeed correct.