Here I investigate how to calculate

\[ x = \sum _ {i=1} ^N x _ i \]

with floating point arithmetic.

Calculating this sum efficiently and accurately can be tricky, especially for large \( N \). The obvious algorithm

```
float xhat = 0.0f;
for (int i = 0; i < N; i++) {
    xhat += x[i];
}
```

has a worst-case relative error of

\[ \frac{\hat{x} - x}{x} = O(\epsilon N) \]

where \( \epsilon \) depends on the precision of the floating point arithmetic; here I take \( \epsilon \approx 10^{-6} \), roughly single precision. This is a very problematic error bound when working on large datasets: scientific computing codes routinely work with arrays well above the size at which this pessimistic bound would sound serious correctness alarms.

Some algorithms can reduce this bound to more useful levels. Examples include pairwise summation, which reduces the linear factor to a logarithmic one, and Kahan summation, which eliminates the \( N \)-dependent factor from the error estimate entirely. But these algorithms have problems of their own.

Pairwise summation and Kahan summation are more complex and hurt parallelism and vectorization opportunities unless they fall back to naive summation below some cutoff array size.

What I’m curious to know is whether this horrendous error estimate for naive summation is ever actually realized in practice. My experience suggests not: I have worked on very large-scale codes that use naive summation, and they had no issues with explosively large relative errors. So I decided to test this numerically.

One issue with naive summation on large problems is that it will likely be executed in parallel. Since floating point arithmetic is not associative, the errors produced by different runs of the same code (even given the same inputs) can differ. Ideally we would like some guarantee that even though these answers are different, they are within some prescribed error tolerance. Since the errors are not deterministic, this leaves us only with the worst-case bound, which is useless.

Thus I model the scaled relative error as a normally distributed random variable:

\[ \frac{\hat{x} - x}{x\epsilon} \sim N(\mu,\sigma) \]

and attempt to find the parameters \( \mu, \sigma \). Originally I planned to use the Bayesian approach of my previous post; however, at the large sample sizes I needed for convergence I ran into severe numerical problems evaluating quantities like the likelihood, which in turn ruins the numerical integration. I believe these problems are fixable, but I need to learn more about the relevant techniques, so for now I simply use the maximum likelihood estimate. To simulate the non-deterministic summation order that might occur in parallel execution, I lightly modify the naive summation as follows:

```
float xhat = 0.0f;
for (int i = 0; i < N; i++) {
    xhat += x[perm[i]];
}
```

where the index array `perm` is a permutation of the original indices. I generate different permutations with the NAG routine nag_rand_permute.

In what follows I use a fixed array `x` of \(N=10000\) floating point numbers. I compute the relative error by evaluating the permuted sum in 32-bit floating point arithmetic and the “correct” sum in 64-bit floating point arithmetic with Kahan summation. The relative error itself is computed in 64-bit floating point after casting up the 32-bit result. I compute the mean and standard deviation over random permutation samples of sizes `1, 10, 100, 1000, 10000, 100000, 1000000`, hoping to see a convergence trend, and follow with histogram plots:

Number of permutation samples | \( \mu \) | \( \sigma \)
---|---|---
1 | -1.115 | 0
10 | -1.072 | 4.97
100 | -1.650 | 8.59
1000 | -1.528 | 7.45
10000 | -0.912 | 7.37
100000 | -0.940 | 7.38
1000000 | -0.942 | 7.34

In order to have confidence in the mean and standard deviation calculations I used the robust NAG routine nag_summary_stats_onevar. Next I show histograms of the computed relative errors, to visually confirm an approximately normal distribution.

*(Histograms of the computed relative errors for each permutation sample size, 1 through 1000000; images not reproduced here.)*

The worst-case scaled error in this case is about \(N=10000\). With the fitted normal distribution I can see immediately that its probability is not even worth computing: it lies over a thousand standard deviations from the mean.

To compute other, more interesting probabilities I used the following Julia script:

```
using Distributions
using QuadGK  # provides quadgk; in older Julia versions it lived in Base

# Fitted distribution of the scaled relative error
N = Normal(-0.942, 7.34)

f(x) = pdf(N, x)

# Two-sided tail probability, truncating the infinite tails at +/-100
(P1, err1) = quadgk(f, 10, 100)
(P2, err2) = quadgk(f, -100, -10)
P = P1 + P2
println(P)
```

Using this I compute some other probabilities:

\[ P \left( \left| \frac{\hat{x} - x}{x\epsilon} \right| > 10 \right ) \approx 0.18 \]

which means that

\[ P \left( \left| \frac{\hat{x} - x}{x\epsilon} \right| \leq 10 \right ) \approx 0.82 \]

We could also determine a new “probabilistic worst case” estimate by finding a bound that is exceeded only with very low probability. For example:

\[ P \left( \left| \frac{\hat{x} - x}{x\epsilon} \right| > 20 \right ) < 0.01 \]

The purpose of this exercise was to see how likely we are to actually hit the worst-case error of the naive floating point summation algorithm. I also wanted an estimate that applies to the parallel summation case, where we don’t know in what order individual summations will happen; I achieved this by randomly sampling permutations of the floating point array. In this particular case I reduced the worst-case error bound of \( \frac{|error|}{\epsilon} < 10000 \) to a much more useful \( \frac{|error|}{\epsilon} < 20 \) (with probability greater than \( 0.99 \)).

This analysis was very simple. An actual parallel algorithm won’t take all permutations of the sum with equal probability, as I assumed in making these calculations, so a more sophisticated analysis would account for the fact that some evaluation orders are more likely than others. Another issue is the use of a fixed array `x`: the only thing allowed to change here was the evaluation order of the floating point summation, not the floating point numbers themselves. The analysis could be improved to address all of these weaknesses.

For future work I can systematically account for the following issues:

- Determine the probability of each evaluation order, given a certain parallel summation algorithm
  - instead of assuming all evaluation orders are equally probable
- Extend the analysis to a non-fixed floating point array `x`
  - perhaps by specifying its underlying distribution and analyzing with respect to its parameters
- Combine first-principles derivations from the floating point literature with data-oriented modeling as I have done here
  - e.g. analyses such as Higham’s *The Accuracy of Floating Point Summation*
  - this could form the basis for selecting a good prior for a Bayesian analysis
- Rigorously test alternative (non-normal) distributions (e.g. the t distribution)
  - thanks to Manjari Naryan for this and other suggestions

I put the code for these computations on Github.

The large sample sizes are very computationally intensive, as evidenced by the following screengrab of `top`: