7 Dubious Solutions to Approximate Floating-Point Equality

March 20, 2019

The vast majority of technical computing applications use floating point numbers for their arithmetic and value storage. It frequently becomes necessary to answer the question of when two floating point numbers are (almost) equal to one another. This can be a very subtle question. We generally want the following properties of a comparison function for floating point numbers (sketched as code just after the list):

  1. Scale invariance: if x ≈ y then C*x ≈ C*y for any (reasonable, nonzero) constant C
  2. Symmetry: if x ≈ y then y ≈ x
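
Here is a minimal sketch of what those two properties ask for. The almostEqual signature and the check harness below are hypothetical, just to fix ideas; the rest of the post is about how to actually implement such a function:

#include <cassert>

//Hypothetical signature; the rest of the post is about how to implement it well.
bool almostEqual(double x,double y,double tolerance);

//Sketch of the two desired properties as runtime checks,
//for some finite, nonzero scale factor C.
void checkProperties(double x,double y,double C,double tolerance){
  //Scale invariance: scaling both inputs by the same factor should not change the verdict.
  assert(almostEqual(x,y,tolerance)==almostEqual(C*x,C*y,tolerance));
  //Symmetry: swapping the arguments should not change the verdict.
  assert(almostEqual(x,y,tolerance)==almostEqual(y,x,tolerance));
}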

I’ll present some common dubious solutions to this problem in C++ and explain why they don’t work, ending with one solution that does.

Dubious Solution 1: Actually Using Equality

I think most programmers know they should not do the following:

bool isEqual(double x,double y){
  return x==y;
}

The reason is that it pays no attention to tolerances. A technical computing problem is not fully specified until you have given not just the answer you want, but a range of equally acceptable answers. You have to account for inaccurate inputs, roundoff in the algorithm, and any approximations it makes. Simply saying “the answer is 1.23456” is therefore not informative in technical computing; rather, you should say “The answer is 1.23456 with a tolerance of 1e-6”.

/*Assume we tolerate errors in the 7th digit.*/
isEqual(1.234566,1.234567); // False!
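
Exact equality also fails for values that are mathematically equal but computed differently, because of rounding along the way; the classic example:

isEqual(0.1+0.2,0.3); //False! In double precision 0.1+0.2 evaluates to 0.30000000000000004, not 0.3.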

Dubious Solution 2: Absolute Difference

The next obvious solution to this problem is to take absolute difference as follows:

#include <cmath>
bool almostEqual(double x,double y,double tolerance){
  //Absolute difference
  return std::abs(x-y)<tolerance;
}

This fixes the kind of failure we saw in dubious solution #1:

double x=1.234566;
double y=1.2345665;
double tolerance=1e-6;
almostEqual(x,y,tolerance);//True! The difference is 5e-7, well within the tolerance.

But this creates another problem: it does not account for the scale of x and y. Consider the above example, slightly modified:

double x=1.234566;
double y=1.2345665;
double tolerance=1e-6;
almostEqual(x,y,tolerance);//True!
almostEqual(10.0*x,10.0*y,tolerance);//False! The absolute difference is now 5e-6.

This breaks scale invariance, and it flies in the face of what we expect from equality: if x ≈ y then surely 10x ≈ 10y? This leads us to the next dubious solution.

Dubious Solution 3: Naive Relative Difference

#include <cmath>
bool almostEqual(double x,double y,double tolerance){
  //Relative difference, measured against x
  return (std::abs(x-y)/std::abs(x))<tolerance;
}

This fixes the scaling problem of dubious solution #2:

double x=1.234566;
double y=1.2345665;
double tolerance=1e-6;
almostEqual(x,y,tolerance);//True!
almostEqual(10.0*x,10.0*y,tolerance);//Now true!

But it creates two very problematic behaviors. The most obvious is division by zero when x=0; a more subtle problem is the case when y=0 but x!=0. I illustrate both below:

double x=0.0;
double y=1e-7;
double tolerance=1e-6;
almostEqual(x,y,tolerance);//False! Division by zero gives infinity, which is not less than the tolerance.

double x=1e-7;
double y=0.0;
double tolerance=1e-6;
almostEqual(x,y,tolerance);//False! (1e-7/1e-7)=1

This solution is scale invariant, but it misbehaves whenever either input is zero, which is not what we expect of equality.

Dubious Solution 4: Naive Thresholded Relative Difference

I see the following solution so frequently that I include it, but it’s a very unfortunate solution because it does not actually solve any of the problems we have just encountered:

#include <algorithm>
#include <cmath>
bool almostEqual(double x,double y,double tolerance){
  //Threshold denominator so we don't divide by zero
  double threshold=1e-10;
  return (std::abs(x-y)/std::max(threshold,std::abs(x)))<tolerance;
}

The problems encountered in dubious solution #3 still apply:

double x=0.0;
double y=1e-7;
double tolerance=1e-6;
almostEqual(x,y,tolerance);//Still false. No division by zero this time, but 1e-7/1e-10=1e3, far above the tolerance.

double x=1e-7;
double y=0.0;
double tolerance=1e-6;
almostEqual(x,y,tolerance);//Still false!

The problem with this solution is that the threshold is totally arbitrary. It breaks the scale invariance we had just achieved with the naive relative difference, without giving us any of the expected behavior of equality.

I still frequently see hardcoded thresholds like this in floating point comparison code. In my opinion this is incorrect: the comparison should give results that do not depend on an arbitrary threshold value. Indeed, for inputs below 1e-10 this solution simply degrades to an absolute comparison (with an effective tolerance of tolerance*1e-10), so it actually creates more problems than it solves.
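
To make the broken scale invariance concrete, here is the same pair of values compared at two different scales. The values and the exaggerated 50% tolerance below are chosen purely for illustration:

double tolerance=0.5; //Exaggerated tolerance, purely for illustration.
almostEqual(1e-11,2e-11,tolerance);//True! The denominator is clamped to 1e-10, so the ratio is 1e-11/1e-10=0.1.
almostEqual(1e-5,2e-5,tolerance);//False! The same values scaled by 1e6 now give a ratio of 1e-5/1e-5=1.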

Dubious Solution 5: Better Thresholded Relative Difference

Dubious solution #4 is actually on the right track, I think, but we should take it much further. One big problem with #4 is that it simply falls back to absolute error when the inputs are below the threshold, which breaks scale invariance. We know that absolute error isn’t good, but we also want to avoid dividing by zero. The way to avoid this is not to pick a totally arbitrary threshold, but rather the smallest possible threshold that is not zero. We don’t want subnormal numbers in the denominator either, because dividing by them can blow up to infinity, which is just as problematic as dividing by zero. Fortunately C++ gives us what we need:

C++ offers std::numeric_limits in the <limits> header, which provides many members for querying floating point limits. The one of interest here is std::numeric_limits<double>::min(), which gives the smallest positive floating point number that is not subnormal (about 2.2e-308 for double).
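
As a quick sanity check of what these limits actually are, here is a small illustrative snippet (not part of the comparison itself); the values noted in the comments are the usual IEEE-754 double limits:

#include <iostream>
#include <limits>
int main(){
  //Smallest positive normal double: about 2.2250738585072014e-308.
  std::cout<<std::numeric_limits<double>::min()<<"\n";
  //Smallest positive subnormal double, shown for contrast: about 4.9406564584124654e-324.
  std::cout<<std::numeric_limits<double>::denorm_min()<<"\n";
  return 0;
}

With that constant in hand, the thresholded comparison becomes: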

#include <algorithm>
#include <cmath>
#include <limits>
bool almostEqual(double x,double y,double tolerance){
  //Threshold denominator so we don't divide by zero
  double threshold=std::numeric_limits<double>::min(); //A very small nonzero number!
  return (std::abs(x-y)/std::max(threshold,std::abs(x)))<tolerance;
}

But yet again we have not solved all of our problems. We have regained scale invariance, because the threshold above is so small that the inputs cannot be scaled below it without becoming subnormal themselves, so the relative difference no longer degrades to absolute error when it shouldn’t (as dubious solution #4 did). But it still breaks when either x or y is zero.
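
Concretely, with values chosen purely for illustration:

double tolerance=1e-6;
almostEqual(0.0,1e-300,tolerance);//False! 1e-300 divided by the ~2.2e-308 threshold is astronomically large.
almostEqual(1e-300,0.0,tolerance);//False! The relative difference is exactly 1.

Let’s fix this now.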

Semi-Dubious Solution 6: Account for zero inputs and subnormals.

#include <algorithm>
#include <cmath>
#include <limits>
bool almostEqual(double x,double y,double tolerance){
  //Threshold denominator so we don't divide by zero
  double threshold=std::numeric_limits<double>::min(); //A very small nonzero number!
  if(std::abs(x)==0.0){
    return std::abs(x-y)<tolerance;
  }
  return (std::abs(x-y)/std::max(threshold,std::abs(x)))<tolerance;
}

This implementation finally recovers scale invariance and also gives us much of what we expect out of equality. I did break one of my own rules by checking std::abs(x)==0.0, and another by resorting to absolute error when that check is true, but I encourage you to try out many different inputs: it captures floating point nearness very well and does not depend on the scale of the inputs to do so. The only thing missing is symmetry; we still have annoying cases like:

double x=1e-7;
double y=0.0;
double tolerance=1e-6;
almostEqual(x,y,tolerance);//Still false!
almostEqual(y,x,tolerance);//True

This is because the comparison gives preference to x. I fix this below.

Semi-Dubious Solution 7: Account for zero inputs, subnormals, symmetrize.

#include <algorithm>
#include <cmath>
#include <limits>
bool almostEqual(double x,double y,double tolerance){
  //Threshold denominator so we don't divide by zero
  double threshold=std::numeric_limits<double>::min(); //A very small nonzero number!
  double min=std::min(std::abs(x),std::abs(y));
  if(min==0.0){
    return std::abs(x-y)<tolerance;
  }
  return (std::abs(x-y)/std::max(threshold,min))<tolerance;
}

This implementation finally starts behaving more like an equality. It is symmetric, it is scale invariant, and it doesn’t misbehave just because one of its inputs is 0. I still label it “semi-dubious” because I’m sure there are cases I have missed that could still break it, but it handles a huge variety of floating point inputs correctly.
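
For instance, the asymmetric case from solution #6 now gives the same answer in both argument orders:

double x=1e-7;
double y=0.0;
double tolerance=1e-6;
almostEqual(x,y,tolerance);//True! min(|x|,|y|) is 0, so we fall back to absolute error.
almostEqual(y,x,tolerance);//Also true. The result no longer depends on argument order.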

Semi-Dubious Solution 7 Part 2: Honorable mention

One implementation I see frequently is the following:

#include <algorithm>
#include <cmath>
#include <limits>
bool almostEqual(double x,double y,double tolerance){
  //Threshold denominator so we don't divide by zero
  double threshold=std::numeric_limits<double>::min(); //A very small nonzero number!
  double min=std::min(std::abs(x),std::abs(y));
  if(min==0.0){
    return std::abs(x-y)<tolerance;
  }
  double average=0.5*(std::abs(x)+std::abs(y));
  return std::abs(x-y)/std::max(threshold,average)<tolerance;
}

The difference between this and implementation #7 is so small that I think one can legitimately choose either of them and it would make virtually no difference in outcome. I have a slight preference for #7, because using min rather than the average in the denominator makes the comparison slightly more conservative, that is, more likely to return false than true.
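
For a rough intuition about that difference, call the two variants almostEqualMin and almostEqualAverage for a moment; those names, the values, and the exaggerated tolerance below are purely illustrative:

double tolerance=0.8; //Exaggerated tolerance, purely for illustration.
almostEqualMin(1.0,2.0,tolerance);    //False! The relative difference is 1.0/1.0=1.0.
almostEqualAverage(1.0,2.0,tolerance);//True! The relative difference is 1.0/1.5, about 0.67.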

The main remaining point of failure for solution #7: suppose, for example, that x=0 and y=1e-30 with tolerance 1e-16. You might think there’s no problem: it will decide y is almost zero and return true. But what if the calculations that led to y=1e-30 involved numbers in the range [1e-40,1e-30]? Then y=1e-30 is actually at the top of that range, and we may not want it to be treated as zero. Again, the problem here is relative versus absolute tolerance.
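
Under solution #7, that scenario plays out like this:

double tolerance=1e-16;
almostEqual(0.0,1e-30,tolerance);//True! min(|x|,|y|) is 0, so 1e-30 is silently treated as close enough to zero.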

A Final Implementation: More Careful with Zero

I think to be complete we need to handle the case x=0 or y=0 a little better. Since that case uses absolute error, and absolute error scales differently from relative error, we actually need a separate tolerance to decide it. Let’s call this zeroTolerance.

#include <algorithm>
#include <cmath>
#include <limits>
bool almostEqual(double x,double y,double tolerance,double zeroTolerance){
  //Threshold denominator so we don't divide by zero
  double threshold=std::numeric_limits<double>::min(); //A very small nonzero number!
  double min=std::min(std::abs(x),std::abs(y));
  if(min==0.0){
    return std::abs(x-y)<zeroTolerance;
  }
  return (std::abs(x-y)/std::max(threshold,min))<tolerance;
}

Now the comparison handles the zero case with a different tolerance from the nonzero case:

double x=0.0;
double y=1e-40;
double tolerance=1e-6;
double zeroTolerance=1e-30;
almostEqual(x,y,tolerance,zeroTolerance); //True! 1e-40 is below zeroTolerance.
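
And a value that is small but still above zeroTolerance (1e-20, say) is now reported as not equal to zero:

almostEqual(0.0,1e-20,tolerance,zeroTolerance); //False! 1e-20 is well above zeroTolerance=1e-30.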

Conclusion

Comparing two floating point numbers is hard to do right. It’s important to account for subnormals, zero inputs, and different scales, and it’s useful if the result is symmetric.

By the way: the title of this post was inspired by Cleve Moler and Charles Van Loan’s article Nineteen Dubious Ways to Compute the Exponential of a Matrix.