Fixed point exponentiation

Started by andrewstanfordjason · 16 replies · 2883 views

I need to raise two to the power of a fixed point number, for example 2**(Q31(-0.4)). Does anyone know an efficient way I can implement this to a reasonable precision, say 12 bits, without large lookup tables (i.e. longer than 16 elements)? It seems like the base being 2 should help. The input range can be constrained to [-1, 0], so the output will always fit in a Q31 as well. Thanks for any advice.

Reply by DaniloDara · July 5, 2019

This is the documentation of a fixed-point package that I created some years ago.
It describes powers of 10, but the ideas are related.

It is based on the exp() function, which could be described in a similar way.

1.1 Power of ten function (Exp10)


There is no "algorithm" for the power of ten, since the function is based on a trivial identity:

10^x = e^(x ln 10)

And so, other than setting up a different Taylor series expansion (which is possible, anyhow), with tests and maintenance required again, by just paying one more MPY (x * ln 10) we get a new function.
ln 10 is, of course, a constant.

And since the function is fast, indeed, there is no problem in paying more time for a multiplication.

The problem might actually be in the range check: the admissible range for Exp10 must necessarily comply with the ranges of both the MPY function and the native Exp function.


Because of the dependency on Exp, "home" is where the input operands are close to 0.

The new required range for Exp10 is -24/20 … +24/20.
After being multiplied by ln(10), this becomes

-2.76 < x < 2.76

as the input to Exp().

This is HIGHER than the formal input range of the Exp function, but not so high as to justify a separate Taylor series for this function.
Since the originally declared input range for this function was not -24/20 … 24/20 but rather -12/20 … 12/20, I suggest to
- double check the necessary input range for Exp10,
- verify the number of iterations necessary to cover the -2.76 … 2.76 range,
and then
- decide whether to always increase the iteration count of the Exp function, or to have an optional parameter Exp(x, iter=8), defaulting to the normal iteration count and set to 10/11/… or whatever when called from inside Exp10.

1.1.3 Devices and Example

No need for examples, I believe, since the function is actually a mere composition of elementary ones.


The cost is that of an Exp function plus a MPY by a constant.
The constant lies in the interval (2, 3), so attention must be paid to scaling and normalization issues.


It is not likely, but a rewrite of the Exp10 function in terms of its own native Taylor series could become necessary, and the possibility must be kept in mind.

Maybe even differentiating the function kernel into two separate intervals could be an option.

Reply by jms_nh · July 5, 2019

You can always optimize a polynomial implementation over a given range. I would recommend Chebyshev approximation, especially since you have the obvious -1 to 0 range. Don't use Taylor series; they are optimal only around a single point.

You're probably looking at a 3rd / 4th / 5th order polynomial.

See my article: https://www.embeddedrelated.com/showarticle/152.php -- I didn't evaluate 2**x, but I did evaluate e**x over the [0,1] range for the Chebyshev coefficients up to degree 5, and they are

[coefficient table shown in the article]

so a 3rd order polynomial should get you a max error around c4 = 0.0005 = 11 bits, and a 4th order polynomial should get you a max error around c5 = 15 bits.
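A quick way to reproduce numbers like these for 2**x on [-1, 0] is to sample the function at Chebyshev nodes and take the cosine transform of the samples. This is a stdlib-only sketch (function names and the node count are my own, not from the article):

```python
import math

def cheb_fit(f, a, b, N=32):
    """Chebyshev coefficients of f on [a, b], sampled at N Chebyshev nodes."""
    c = []
    for k in range(N):
        s = sum(f(0.5*(b-a)*math.cos(math.pi*(j+0.5)/N) + 0.5*(b+a))
                * math.cos(math.pi*k*(j+0.5)/N) for j in range(N))
        c.append(2.0*s/N)
    return c  # note: c[0] must be halved when evaluating

def cheb_eval(c, a, b, x, degree):
    """Evaluate the series truncated at `degree` via the Clenshaw recurrence."""
    t = (2.0*x - (a+b)) / (b-a)       # map [a, b] onto [-1, 1]
    b1 = b2 = 0.0
    for k in range(degree, 0, -1):
        b1, b2 = 2.0*t*b1 - b2 + c[k], b1
    return t*b1 - b2 + 0.5*c[0]

c = cheb_fit(lambda x: 2.0**x, -1.0, 0.0)
# max error of the degree-4 truncation over [-1, 0]
err = max(abs(cheb_eval(c, -1.0, 0.0, -1.0 + i/1000.0, 4) - 2.0**(-1.0 + i/1000.0))
          for i in range(1001))
```

The truncation error of a degree-n fit is roughly the size of the first neglected coefficient c[n+1], which is why the table of coefficients directly predicts the achievable bits.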

Reply by rbj · July 5, 2019

In this stackexchange answer are the coefficients for some short finite-order polynomials for evaluating (at a pretty good accuracy) a bunch of transcendental functions including the base 2 exp.


To implement this in fixed point requires some care with ranges and scaling. Instead of Q31 (S1.31 in another notation), I might suggest 4 bits left of the binary point (S4.28). You need to establish the maximum value of the result of the 2^x function and scale the coefficients accordingly. Always scale down intermediate results with the right-shift operator (">>" in C), so your coefficients will be integers scaled up by some power of two.
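As a sketch of this scaling discipline (the Q28 format, names, and coefficient choice are illustrative, not anyone's actual implementation): scale the polynomial coefficients up to integers, evaluate by Horner's rule, and right-shift after each multiply. Plain Taylor coefficients of 2**x are used here for brevity; a Chebyshev or minimax fit of the same order would be more accurate, as discussed above.

```python
import math

# Q28 fixed point: value = integer / 2**28
FRAC = 28
ONE = 1 << FRAC
LN2 = 0.6931471805599453

# Taylor coefficients of 2**x = e**(x ln 2), scaled up to Q28 integers
COEF = [round(LN2**k / math.factorial(k) * ONE) for k in range(6)]

def exp2_q28(x_q28):
    """Approximate 2**x for x in [-1, 0]; Q28 in, Q28 out (Horner + shifts)."""
    acc = COEF[-1]
    for c in reversed(COEF[:-1]):
        # scale the product back down to Q28 before adding the next coefficient
        acc = ((acc * x_q28) >> FRAC) + c
    return acc
```

Each `>> FRAC` floors, so a few LSBs of bias accumulate; on a real target you would round (add half an LSB before shifting) and watch the intermediate ranges, exactly as rbj cautions.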

Reply by andrewstanfordjason · July 5, 2019

Thank you all for the ideas, I'll hopefully be able to get back with the solution I went with.

Reply by Cedron · July 5, 2019
This is how I would do it.  Split your value into its integer (n) and fractional (f) parts.

$$ 2^x = 2^{n+f} = 2^n 2^f = 2^n e^{\ln(2)f} = 2^n L e^{\ln(2)f - \ln(L)} $$

$$ y = \ln(2)f - \ln(L) \approx 0 $$

$$ 2^x = 2^n L e^{y} = 2^n L \left[ 1 + y + y^2/2 + y^3/6 + y^4/24 + \cdots \right] $$

So you need a lookup table of $L$ with $\ln(L)$ values.  Since f is between 0 and 1, the range of $\ln(2)f$ is about $ 0 \to 0.69$; if you build your table with fourteen entries you should always be able to find an entry within a thirty-second (five significant bits) of $\ln(2)f$.  Therefore you should only need the first four terms of the series to get your desired accuracy.  Only the fourth term and beyond need a division; the first three terms can be calculated with multiplications, additions, and shifts.

You may want to increase the size of your lookup table so you only need to calculate the first three terms if you want the fastest speed.  Of course, the final multiplication by $2^n$ can be done by shifting as well.

You can index your lookup table on the first few bits of $\ln(2)f$, so the corresponding lookup value is $\ln(L) = $ leading bits + 1/2 of the last bit's place value, and $L$ is the exp of that.

For sample code in Gambas to show how it works (but not how to implement fixed point), I've posted an example at the Gambas.One forum.

See: DSP: Power of Two Approximation
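A floating-point sketch of this method (like the Gambas example, it shows the math rather than the fixed-point details; the table spacing and names here are my own choices):

```python
import math

# Table of (L, ln L) pairs spaced 1/16 apart, covering [0, ln 2) ~ [0, 0.694);
# each entry sits at the midpoint of its slot, so |y| <= 1/32 after lookup.
STEP = 1.0 / 16.0
TABLE = [(math.exp((k + 0.5) * STEP), (k + 0.5) * STEP) for k in range(12)]

def pow2(x):
    """2**x via integer/fraction split, table lookup, and a short e**y series."""
    n = math.floor(x)
    f = x - n                      # fractional part, 0 <= f < 1
    u = f * math.log(2.0)          # ln(2)*f, in [0, 0.694)
    L, lnL = TABLE[int(u / STEP)]  # nearest table entry
    y = u - lnL                    # small residual, |y| <= 1/32
    series = 1.0 + y + y*y/2.0 + y*y*y/6.0   # first four terms of e**y
    return (2.0 ** n) * L * series            # the 2**n is a shift in fixed point
```

With |y| bounded by 1/32, the first neglected term y^4/24 is below 4e-8, comfortably past the 12-bit target.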

Reply by bmcgee22 · July 5, 2019

You could use the hyperbolic CORDIC algorithm to compute e^x, then use the identity 2^x = e^(x*ln(2)).


Reply by rbj · July 5, 2019

it just seems to me, Brian, that especially for a fixed-point implementation, 2^x is the simplest primitive, and then use A^x = 2^(x*log2(A)).

When it's base 2, you can separate the integer and fractional parts of x, compute 2^fract_part(x) and then shift that result to the left by int_part(x) bits (or right if it's negative).

Reply by bmcgee22 · July 5, 2019

Not knowing the target platform and design constraints, a polynomial and shift operation like you suggest might very well be the best solution; that is a method I have used myself, and I was merely offering an alternative. The CORDIC algorithm is relatively low complexity, using only shifts and adds, and e^x is one of the functions that can be computed directly, so it might make sense for their application.  Since they appear to be resource constrained (wanting to keep the LUT <= 16 elements), I offered one of the solutions I have used in the past (implemented in an FPGA) that had not been mentioned already, something they could add to the trade space if it made sense.


Reply by rbj · July 5, 2019

but can't you develop a Cordic operation based on 2^x instead of e^x just as efficiently?  there might be an internal scaling operation in there that would be different for 2^x.

the issue, to me, is that if you break x into integer and fractional parts: x = i + f, then 2^f can be implemented to a good precision for 0 <= f < 1 .

Reply by bmcgee22 · July 5, 2019

Yes, I think you can scale the CORDIC operations to have the ln(2) scaling built in, getting 2^x directly (I needed e^x, so I didn't work on that optimization).

Breaking the operation into x = i + f makes sense; this is just an alternate way to compute the 2^f part that uses no multiplies, just iterations using shifts, add/sub, and a small lookup table.  You can control the precision by increasing the iterations and LUT size, so it is pretty easy to get the 12 or more bits of precision desired.
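For reference, a floating-point model of the hyperbolic CORDIC iteration (in an FPGA or fixed-point version the multiplies by 2.0**-i become shifts; the iteration count, repeat schedule, and names here are my own):

```python
import math

# Hyperbolic CORDIC, rotation mode: converges for |z| up to about 1.118,
# provided iterations i = 4 and i = 13 are repeated (standard requirement).
ITERS = list(range(1, 17))
ITERS.insert(ITERS.index(4) + 1, 4)      # repeat i = 4
ITERS.insert(ITERS.index(13) + 1, 13)    # repeat i = 13
ATANH = [math.atanh(2.0**-i) for i in ITERS]   # the small angle LUT
GAIN = 1.0
for i in ITERS:
    GAIN *= math.sqrt(1.0 - 2.0**(-2 * i))     # accumulated hyperbolic gain

def exp_cordic(z):
    """e**z = cosh z + sinh z via shift-and-add style iterations."""
    x, y = 1.0 / GAIN, 0.0                     # pre-divide out the gain
    for i, a in zip(ITERS, ATANH):
        d = 1.0 if z >= 0 else -1.0            # rotate toward z = 0
        x, y = x + d * y * 2.0**-i, y + d * x * 2.0**-i
        z -= d * a
    return x + y                               # cosh + sinh = e**z

def exp2_cordic(x):
    """2**x via e**(x ln 2); x*ln2 must stay inside the CORDIC range."""
    return exp_cordic(x * math.log(2.0))
```

For the OP's x in [-1, 0], x*ln2 stays in [-0.694, 0], well inside the convergence range, and the LUT of atanh values is comparable in size to the 16-entry budget.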

Reply by Cedron · July 5, 2019

Hi rbj,

Did you see what I did above?


Reply by rbj · July 5, 2019

i do now.  but it looks a little Maclaurinish to me.  since, ultimately, the power series will have to be finite (which is a polynomial), you will want (slightly) different coefficients on each power of "y" in the finite-order polynomial than you would get from the Maclaurin series having an infinite number of terms.  If what you're trying to do is minimize the maximum error (perhaps the error is weighted in some meaningful manner), then you want to arrive at those coefficients from some numerical optimization such as the Remez exchange algorithm.

Reply by Cedron · July 5, 2019
Well, it should; it is a Taylor series centered at zero.  But that is the key: the expected argument values (y) are also centered on zero, so by symmetry, tweaking the coefficients will not improve the expected errors much, if at all, at the expense of an extra multiplication per term (except the first, of course).  Using those multiplications instead (you can do the divisions by multiplication because the divisors are constants known ahead of time) to evaluate an extra term or two will buy you a lot more accuracy for your buck.

Using the Lookup table is, in effect, converting the Maclaurin series at a value to a Maclaurin series at near zero, so it converges faster.

Edit:  It will only cost two extra multiplications as the others can be combined with the existing coefficients ahead of time.

Reply by chalil · July 5, 2019
there is yet another approach which can be helpful in several practical cases: the stated problem is to find y = 2**x.

look one level higher than the 2**x. ie, typically the x value is itself returned by another fixed-point operation, in most cases involving a log, for dimensional reasons.

let's say x = f(t). then the new expression is y = 2**f(t). try to find the range of t. once the range is within a limit, apply curve fitting. if the accuracy requirement is higher, split the range into different bands and apply a fit per band. in most practical cases, a 2nd order poly fit should suffice.

alternately, one can also go in the other direction, where y is used in another expression, say z = g(y). then the overall expression is z = g(2**x). try the curve fit over that range.

i remember a case where just keeping a different LUT against 't' solved the problem! the original floating point code was referring to x from a table against different indices.

in general, looking around the primitives will help. ie, rather than converting all primitives directly to fixed point, try to see how they are used together. this needs a higher level of analysis and sometimes more LUTs and functions, but one can get higher accuracy for the range she/he is looking for.
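The band-split quadratic idea can be sketched as follows (the band count, node choice, and names are mine; each band gets a quadratic through three Chebyshev nodes, applied here to the OP's 2**x on [-1, 0]):

```python
import math

def make_band_fit(f, lo, hi):
    """Quadratic through three Chebyshev nodes of [lo, hi], in Lagrange form."""
    nodes = [0.5*(lo+hi) + 0.5*(hi-lo)*math.cos(math.pi*(j+0.5)/3)
             for j in range(3)]
    vals = [f(t) for t in nodes]
    def p(x):
        total = 0.0
        for j in range(3):
            term = vals[j]
            for m in range(3):
                if m != j:
                    term *= (x - nodes[m]) / (nodes[j] - nodes[m])
            total += term
        return total
    return p

NBANDS = 4
fits = [make_band_fit(lambda x: 2.0**x, -1.0 + k/NBANDS, -1.0 + (k+1)/NBANDS)
        for k in range(NBANDS)]

def pow2_banded(x):
    """2**x for x in [-1, 0]: pick the band, evaluate its quadratic."""
    k = min(int((x + 1.0) * NBANDS), NBANDS - 1)
    return fits[k](x)
```

Four bands with a 2nd order fit already land well past the 12-bit target here; on a real target each band's quadratic would be converted to scaled-integer coefficients, as elsewhere in this thread.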

Reply by andrewstanfordjason · July 5, 2019

Results time: I went with a polynomial approximation. For those interested,



These are the Python script I used to generate the coefficients and the C code I tested them on.

The results (max_abs_diff = the max difference between the fixed-point integer implementation and a double precision reference; MSE = mean squared error across all input values of interest to me, [-1.0, 0.0]):

2nd order: max_abs_diff -53.65372 dB (-8.91 bits); MSE -62.52919 dB (-10.39 bits)

3rd order: max_abs_diff -79.87001 dB (-13.27 bits); MSE -89.74021 dB (-14.91 bits)

4th order: max_abs_diff -108.23040 dB (-17.98 bits); MSE -118.90955 dB (-19.75 bits)

5th order: max_abs_diff -138.25612 dB (-22.96 bits); MSE -149.67265 dB (-24.86 bits)

6th order: max_abs_diff -168.05046 dB (-27.91 bits); MSE -179.93220 dB (-29.89 bits)

7th order: max_abs_diff -174.21027 dB (-28.94 bits); MSE -188.13559 dB (-31.25 bits)

Beyond 7th order, the arithmetic precision starts to degrade the performance. I implemented this on a 32-bit machine with a 64-bit accumulator (hence the included C implementation).

For my purposes, 3rd order would suffice, but the assembly implementation was so fast I will probably go higher. Thanks again, all.

Reply by Cedron · July 5, 2019
Ah, a brute force approximation.  Where's the elegance in that?

Just kidding.  If it is fast enough, generally simpler is better.  Thanks for the followup letting us know the outcome.