DSPRelated.com
Forums

Fast Cube Root Using C33

Started by pnachtwey February 26, 2009
On Feb 27, 8:39 am, Martin Eisenberg <martin.eisenb...@udo.edu> wrote:
> c...@claysturner.com wrote:
> > Hey Peter, you know you can do range reduction to put the number
> > into something like [0.125, 1.000]. Just divide or mult the
> > original number by 8 until it is in the right range. And the
> > answer is scaled by 2^reductions. (reductions is a signed number),
> > then use a poly to find the cube root within this range.
> >
> > Or range reduce and then pick a crude seed and use a Newton's or
> > higher order method to iterate the solution.
>
> The thing with explicit range reduction is that you need to extract
> the exponent field and reassemble the result, whereas Peter wants a
> code that swallows the input whole. I've experimented with the
> analogous Quake inverse-sqrt trick on x86, and the advantage was
> worth roughly one Newton iteration or two reduced-interval polynomial
> orders.
>
> Martin
>
> --
> Quidquid latine scriptum est, altum videtur.
Thanks guys. I was looking for a particular type of solution, and I found one that is similar to Kahan's 'bit hack' by studying what was being done, then modifying and optimizing. I had to make a few adjustments, and I am sure my solution is not fully optimized, but it is close. I tried different constants, then compared my new solution to pow(f,1./3.) over a range of 0.000000001 to 100000000 in increments of f *= 1.001 (I may be off by a zero or two). I recorded the max error and modified the constants to reduce the max error. I used no divide or mod operators. The worst error was around 1; I think this is due to the oddball TI floating point.

Peter Nachtwey
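[For reference, the IEEE-754 single-precision analogue of this approach looks something like the sketch below. The magic constant is the one from the metamerist page quoted earlier in the thread, not Peter's tuned TI constants, and the Newton refinement here uses a divide, unlike Peter's divide-free C33 version. Assumes n > 0.]

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Cube root via the "bit hack" seed plus two Newton-Raphson steps.
 * Seed: integer-divide the IEEE-754 bit pattern by 3 and add a magic
 * offset (constant from http://metamerist.com/cbrt/cbrt.htm).  The
 * TI C33 float format would need a different adjustment. */
static float fast_cbrt(float n)
{
    float x = n;
    uint32_t p;

    memcpy(&p, &x, sizeof p);   /* well-defined type pun */
    p = p / 3u + 709921077u;    /* crude cbrt estimate, a few % off */
    memcpy(&x, &p, sizeof p);

    x = (2.0f * x + n / (x * x)) / 3.0f;  /* Newton step 1 */
    x = (2.0f * x + n / (x * x)) / 3.0f;  /* Newton step 2 */
    return x;
}
```

With a seed good to a few percent, the error roughly squares on each Newton step, so two steps reach float precision.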
On Fri, 27 Feb 2009 08:56:15 -0500, Jerry Avins wrote:

> Tim Wescott wrote:
>> On Thu, 26 Feb 2009 19:04:01 -0800, pnachtwey wrote:
>>
>>> I need a fast cube root routine. I started with the obvious
>>> http://metamerist.com/cbrt/cbrt.htm
>>> What is key is getting the initial estimate using the Kahan bit hack
>>> but that only works with IEEE floating point and not with TI floating
>>> point.
>>> I can't believe that I am the first to do this since the C33 is
>>> ancient. The obvious solution is to convert the TI float to the IEEE
>>> float, apply the hack, and then convert back to TI float but there
>>> must be a better way. Actually doing another iteration is probably
>>> faster. The problem is that sign of the mantissa is at bit 23.
>>>
>>> Does anybody know of a link? Thanks.
>>>
>>> Peter Nachtwey
>>
>> Does anyone here remember how to take the square root of a decimal
>> number via "long division"? My dad taught me how when I was
>> considerably shorter than I am now, possibly because he was appalled at
>> his brother teaching me how to do it on a calculator using Newton's
>> method.
>>
>> For those of you under 80, it's basically a method for churning out
>> decimals of the square root of a number in a manner very akin to long
>> division, except that at each step instead of subtracting (cumulative
>> result) * (guess) you subtract (2 * cumulative result + guess) *
>> (guess) -- you're basically trying to complete the equation (cumulative
>> result + guess)^2 = (cumulative result)^2 + 2 * (cumulative
>> result)(guess) + (guess)^2.
>>
>> It turns out that -- like long division -- it's really easy to do on a
>> computer in binary, because when you're only multiplying by 0 or 1 you
>> don't have to waste a lot of time guessing.
>>
>> It also turns out that if you're a _real_ math geek you can work out
>> how to do this with cube roots, and, like square roots and division,
>> the guesswork is held to a minimum because you're just dealing with
>> ones and zeros.
>>
>> If the group shows sufficient interest I'll write it up and post it. I
>> figured it out when I was in high school, then forgot it. Peter's post
>> got me interested enough to figure it out again (thanks Peter!), but
>> I'm not sure if it's faster than Newton's method unless you're doing
>> the work on an FPGA.
>>
>> Each step starts with you knowing the starting value - answer^3, and
>> needing to calculate 3*answer^2 and 3*answer. You then have to
>> subtract those two numbers from your residual to know if you emit a '1'
>> or a '0'.
>>
>> I _suspect_ that it's not as quick as Newton's method, but it's late
>> and I'm not (by gawd) going to check!
>
> There is in general a fundamental difference between paper-and-pencil
> algorithms and those run on a computer with a good divide, and usually,
> the computer version needs fewer steps.
>
> With paper and pencil, one wants the next digit that is just not too
> large. With machinery that can divide, one wants the best approximation
> without regard to the number of digits produced.
>
> Jerry
Divide is fiendishly expensive; while Pentium-class machines and other "scientific" processors may pay for it with transistors, most of the embedded processors in my experience take far more time to perform a divide than any other 'fundamental' arithmetic operation. Even the fixed-point DSP chips that I have used will do a single-cycle multiply-and-accumulate, but take one cycle per bit of answer to perform a divide.

--
http://www.wescottdesign.com
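[The digit-by-digit binary cube root discussed above works out to something like the following 32-bit integer sketch -- a standard restoring form, not Tim's own write-up. Note that it needs no divides at all, only shifts and small multiplies, one result bit per iteration.]

```c
#include <assert.h>
#include <stdint.h>

/* Binary digit-by-digit cube root: produces one bit of the result
 * per iteration by consuming x three bits at a time.  At each step,
 * after doubling y, accepting the next bit costs
 *     (y+1)^3 - y^3 = 3*y*(y+1) + 1,
 * which we compare against the shifted residual. */
static uint32_t icbrt(uint32_t x)
{
    uint32_t y = 0;

    for (int s = 30; s >= 0; s -= 3) {
        uint32_t b;

        y = 2 * y;
        b = 3 * y * (y + 1) + 1;      /* cost of setting this bit */
        if ((x >> s) >= b) {
            x -= b << s;              /* accept the bit: subtract */
            y += 1;                   /* the cube difference      */
        }
    }
    return y;    /* floor(cbrt(x)) */
}
```

On hardware where a divide costs one cycle per bit anyway (as Tim notes above), this per-bit loop is competitive, and it maps naturally onto an FPGA.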
On Fri, 27 Feb 2009 08:18:49 -0600, Vladimir Vassilevsky wrote:

> Tim Wescott wrote:
>
>> On Thu, 26 Feb 2009 19:04:01 -0800, pnachtwey wrote:
>>
>>> I need a fast cube root routine.
>
> [...]
>
>> Does anyone here remember how to take the square root of a decimal
>> number via "long division"?
>
> [...]
>
> That algorithm computes one bit per iteration. Simple, but very slow.
> Newton-Raphson efficiently doubles the number of known bits at each
> step.
>
> Vladimir Vassilevsky
> DSP and Mixed Signal Design Consultant
> http://www.abvolt.com
I just checked a simple N-R algorithm for both square root and cube root.

For square roots, N-R delivered a factor of 2.4 precision (so slightly more than one bit) per iteration -- not a doubling. For cubes, it's a factor of 1.8 or so -- so less than one bit per iteration.

That's not to say that my algorithm doesn't take longer, because there's considerably more math per iteration for a crummy third of a bit more per iteration or whatever I'm achieving.

--
http://www.wescottdesign.com
On Fri, 27 Feb 2009 06:54:02 -0800, pnachtwey wrote:

> On Feb 26, 10:35 pm, Tim Wescott <t...@seemywebsite.com> wrote:
>> On Thu, 26 Feb 2009 19:04:01 -0800, pnachtwey wrote:
>>> I need a fast cube root routine. I started with the obvious
>>> http://metamerist.com/cbrt/cbrt.htm
>>> What is key is getting the initial estimate using the Kahan bit hack
>>> but that only works with IEEE floating point and not with TI floating
>>> point.
>>> I can't believe that I am the first to do this since the C33 is
>>> ancient. The obvious solution is to convert the TI float to the IEEE
>>> float, apply the hack, and then convert back to TI float but there
>>> must be a better way. Actually doing another iteration is probably
>>> faster. The problem is that sign of the mantissa is at bit 23.
>>>
>>> Does anybody know of a link? Thanks.
>>>
>>> Peter Nachtwey
>>
>> Dunno the details (I started doing DSP long after the C33 went
>> obsolete, and I'm too cool* to use a floating point DSP anyway). So I
>> can't help you directly, but I _can_ jump in and kibbitz.
>>
>> Have you considered extracting the exponent, doing a quick table
>> look-up on the resulting mantissa bits, and using whatever the table
>> coughs up as a seed for N-R?
>>
>> * or perhaps my employers are too cheap. Hmm.
>>
>> --
>> http://www.wescottdesign.com
>
> The exponent look-up table is a good alternative because it is quick, but
> it is a memory hog. I was looking for a variation of the Kahan "bit hack".
> I might try that and use the mantissa to interpolate between the values.
> At least I will know. I think this can be done without a divide.
>
> Peter Nachtwey
I wasn't suggesting an exponent look-up. I was suggesting that you normalize to the range [0.125, 1), do however large a look-up you want on the top bits of the resulting _mantissa_, and use that as a seed for N-R.

--
http://www.wescottdesign.com
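[A portable sketch of that idea, using C's frexp/ldexp for the exponent handling rather than C33 bit fiddling. The 8-entry table and its seed values are illustrative choices, not anything from the thread; a real C33 version would size the table to taste and index the mantissa bits directly. Assumes x > 0.]

```c
#include <assert.h>
#include <math.h>

/* Seeds: approximate cbrt of the midpoint of each bin [i/8, (i+1)/8)
 * over the reduced mantissa range [0.125, 1).  Entry 0 is unused. */
static const double seed[8] = {
    0.0,    0.5724, 0.6786, 0.7591,
    0.8255, 0.8826, 0.9331, 0.9787
};

/* Table-seeded cube root: normalize so x = m * 2^e with e a multiple
 * of 3 and m in [0.125, 1), look up a seed from the top mantissa
 * bits, rescale by 2^(e/3), then polish with two Newton steps. */
static double cbrt_seeded(double x)
{
    int e;
    double m = frexp(x, &e);        /* x = m * 2^e, m in [0.5, 1) */
    double r;

    while (e % 3 != 0) {            /* push e to a multiple of 3;  */
        m *= 0.5;                   /* m lands in [0.125, 1)       */
        e++;
    }
    r = ldexp(seed[(int)(m * 8.0)], e / 3);   /* scaled table seed */

    r = (2.0 * r + x / (r * r)) / 3.0;        /* Newton step 1 */
    r = (2.0 * r + x / (r * r)) / 3.0;        /* Newton step 2 */
    return r;
}
```

The de-normalization by e/3 is exact because the loop forced e to a multiple of 3, which is the "divide the exponent by three, use the remainder to adjust" step discussed elsewhere in this thread.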
On Thu, 26 Feb 2009 19:04:01 -0800, pnachtwey wrote:

> I need a fast cube root routine. I started with the obvious
> http://metamerist.com/cbrt/cbrt.htm
> What is key is getting the initial estimate using the Kahan bit hack but
> that only works with IEEE floating point and not with TI floating point.
> I can't believe that I am the first to do this since the C33 is
> ancient. The obvious solution is to convert the TI float to the IEEE
> float, apply the hack, and then convert back to TI float but there must
> be a better way. Actually doing another iteration is probably faster.
> The problem is that sign of the mantissa is at bit 23.
>
> Does anybody know of a link? Thanks.
>
> Peter Nachtwey
If the C33 has a fast multiply you can use a truncated, corrected Taylor's series to get close fast. Getting _there_ would take an unreasonable number of terms, but getting to within 0.1% only takes three terms.

Again, the process would be to normalize down to the range [0.125, 1), then do your look-up, then de-normalize by 1/3 the amount that you adjusted the exponent to normalize.

--
http://www.wescottdesign.com
Tim Wescott wrote:
> On Fri, 27 Feb 2009 08:18:49 -0600, Vladimir Vassilevsky wrote:
>
>> Tim Wescott wrote:
>>
>>> On Thu, 26 Feb 2009 19:04:01 -0800, pnachtwey wrote:
>>>
>>>> I need a fast cube root routine.
>> [...]
>>
>>> Does anyone here remember how to take the square root of a decimal
>>> number via "long division"?
>> [...]
>>
>> That algorithm computes one bit per iteration. Simple, but very slow.
>> Newton-Raphson efficiently doubles the number of known bits at each
>> step.
>>
>> Vladimir Vassilevsky
>> DSP and Mixed Signal Design Consultant
>> http://www.abvolt.com
>
> I just checked a simple N-R algorithm for both square root and cube root.
>
> For square roots, N-R delivered a factor of 2.4 precision (so slightly
> more than one bit) per iteration -- not a doubling. For cubes, it's a
> factor of 1.8 or so -- so less than one bit per iteration.
>
> That's not to say that my algorithm doesn't take longer, because there's
> considerably more math per iteration for a crummy third of a bit more per
> iteration or whatever I'm achieving.
Tim,

Your results puzzle me. Vladimir's agree with my experience and what I have read.

Jerry
--
Engineering is the art of making what you want from things you can get.
Jerry Avins wrote:

> Tim Wescott wrote:
(snip)
>> I just checked a simple N-R algorithm for both square root
>> and cube root.
>> For square roots, N-R delivered a factor of 2.4 precision
>> (so slightly more than one bit) per iteration -- not a doubling.
>> For cubes, it's a factor of 1.8 or so -- so less than one
>> bit per iteration.
>> That's not to say that my algorithm doesn't take longer, because
>> there's considerably more math per iteration for a crummy third of a
>> bit more per iteration or whatever I'm achieving.
> Your results puzzle me. Vladimir's agree with my experience and what I
> have read.
I thought so, too. But I think if you are very far from the solution, the improvement per iteration may be much less.

All the ones I know, given a floating point input, adjust the exponent as appropriate first. For cube root you divide the exponent by three; next, use the remainder to adjust as appropriate.

In the olden days (S/360) they then used fixed point arithmetic to do an initial approximation before starting the NR. That worked when fixed point was faster than floating point, but may not be best today.

With a good initial guess for square root, two iterations for single precision and four for double precision is enough. With a bad guess it might take a lot more.

-- glen
On Feb 27, 8:52 pm, Tim Wescott <t...@seemywebsite.com> wrote:
> On Thu, 26 Feb 2009 19:04:01 -0800, pnachtwey wrote:
>> I need a fast cube root routine. I started with the obvious
>> http://metamerist.com/cbrt/cbrt.htm
>> What is key is getting the initial estimate using the Kahan bit hack but
>> that only works with IEEE floating point and not with TI floating point.
>> I can't believe that I am the first to do this since the C33 is
>> ancient. The obvious solution is to convert the TI float to the IEEE
>> float, apply the hack, and then convert back to TI float but there must
>> be a better way. Actually doing another iteration is probably faster.
>> The problem is that sign of the mantissa is at bit 23.
>>
>> Does anybody know of a link? Thanks.
>>
>> Peter Nachtwey
>
> If the C33 has a fast multiply you can use a truncated, corrected
> Taylor's series to get close fast. Getting _there_ would take an
> unreasonable number of terms, but getting to within 0.1% only takes three
> terms.
>
> Again, the process would be to normalize down to the range [0.125, 1),
> then do your look up, then de-normalize by 1/3 the amount that you
> adjusted the exponent to normalize.
>
> --
> http://www.wescottdesign.com
I tried the Turkowski algorithm; I have a directory of cube root algorithms. I have known about the Turkowski algorithm for a while, since I have a Mac Pro and a Mac Mini at home and have scoured the Apple info. We actually use a MPC5200 in some of our products, and the Turkowski algorithm may be more suitable for the MPC5200.

The TI TMS320C33 does not have a mod or divide instruction. Emulating a divide or mod takes about 35 clock/instruction cycles, while a multiply and an add or subtract each take 1. The Turkowski algorithm requires too many divide and mod instructions, and it is much faster to do multiple iterations.

Peter Nachtwey
Glen Herrmannsfeldt wrote:
> Jerry Avins wrote:
>
>> Tim Wescott wrote:
> (snip)
>
>>> I just checked a simple N-R algorithm for both square root
>>> and cube root.
>>>
>>> For square roots, N-R delivered a factor of 2.4 precision
>>> (so slightly more than one bit) per iteration -- not a doubling.
>>> For cubes, it's a factor of 1.8 or so -- so less than one
>>> bit per iteration.
>>>
>>> That's not to say that my algorithm doesn't take longer, because
>>> there's considerably more math per iteration for a crummy third of a
>>> bit more per iteration or whatever I'm achieving.
>>
>> Your results puzzle me. Vladimir's agree with my experience and what I
>> have read.
>
> I thought so, too. But I think if you are very far from the solution
> that the improvement per iteration may be much less.
>
> All the ones I know, given a floating point input, adjust the
> exponent as appropriate first. For cube root you divide by
> three. Next, use the remainder to adjust as appropriate.
>
> In the olden days (S/360) they then used fixed point arithmetic
> to do an initial approximation before starting the NR. That
> worked when fixed point was faster than floating point,
> but may not be best today.
>
> With a good initial guess for square root, two iterations
> for single precision and four for double precision is enough.
>
> With a bad guess it might take a lot more.
Let's use a deliberately bad guess as an example. We want sqrt(N^2 = 100), and use our universal initial guess, g[0] = 1. (I hope nobody objects to decimal arithmetic.)

   next_guess = (N^2/guess + guess)/2      N is the root we want.

   g[n] = (N^2/g[n-1] + g[n-1])/2

   g[0] =  1
   g[1] = 50.50000        Far from convergence, convergence is slow.
   g[2] = 26.24010        The error is approximately halved.
   g[3] = 15.02553
   g[4] = 10.84043        Good to 2 digits.  When close, convergence
   g[5] = 10.03258        Good to 3 digits.  is fast.
   g[6] = 10.00005        Good to 6 digits.
   g[7] = 10.0000000001   11 digits

Jerry
--
Engineering is the art of making what you want from things you can get.
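[Jerry's run is easy to reproduce in a few lines; a minimal sketch of the same iteration:]

```c
#include <assert.h>
#include <math.h>

/* g[k] after k Newton square-root steps for sqrt(n2), starting
 * from guess g0.  Far from the root the error only roughly halves
 * each step; near the root it squares (quadratic convergence). */
static double newton_sqrt(double n2, double g0, int steps)
{
    double g = g0;

    while (steps-- > 0)
        g = 0.5 * (n2 / g + g);
    return g;
}
```

Starting from g0 = 1 with n2 = 100 reproduces the table above: one step gives 50.5, two give 26.2401..., and seven steps land within about 1e-10 of 10.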
Martin Eisenberg <martin.eisenberg@udo.edu> writes:

> pnachtwey wrote:
>
>> On Feb 26, 8:19 pm, cincy...@gmail.com wrote:
>>> On Feb 26, 10:04 pm, pnachtwey <pnacht...@gmail.com> wrote:
>>>
>>>> I need a fast cube root routine. I started with the
>>>> obvious http://metamerist.com/cbrt/cbrt.htm
>>
>> I was hoping someone knows a variation of this that works on a
>> TI C33.
>>
>> unsigned int* p = (unsigned int *) &f;
>> *p = *p/3 + 709921077;
>
> My first try would be to clear the sign bit, do the divide and
> offset, and put the sign back on, though I don't know how easy the
> C33 makes the last step. Since the exponent is unbiased an offset of
> zero should work. However, you can do better -- plot the relative
> error over any interval spanning three exponent values (e.g. [1/2,4])
> and pick the offset to minimize the peak error. I have Maple code to
> do this for IEEE754 that you could adapt.
>
> Additionally, you might gain from computing the *inverse* cube root
> because that has a polynomial Newton function, reducing the number of
> float divisions to one inversion at the end.
No divisions are required at the end. To amplify, the function for which we are trying to find a root is:

  f(x) = x^-3 - n

The Newton iteration is:

  newx = x - f(x)/f'(x) = (4/3)*x - (1/3)*n*x^4

The (4/3) and (1/3) are precomputed constants, so there are no divisions. If x is close enough to the inverse cube root of n, the number of accurate bits doubles with each iteration. When we are done, we can calculate:

  n^(1/3) = n*x^2

Scott

--
Scott Hemphill  hemphill@alumni.caltech.edu
"This isn't flying. This is falling, with style." -- Buzz Lightyear
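[Scott's divide-free iteration, spelled out as code. The starting value here is just an assumed guess inside the basin of convergence; in practice it would come from a bit hack or table seed as discussed earlier in the thread.]

```c
#include <assert.h>
#include <math.h>

/* Newton iteration for f(x) = x^-3 - n, whose root is x = n^(-1/3):
 *
 *     newx = x - f(x)/f'(x) = (4/3)*x - (1/3)*n*x^4
 *
 * No divisions: 4/3 and 1/3 are precomputed constants.  The cube
 * root is recovered at the end as n^(1/3) = n*x^2 -- two multiplies,
 * still no divide. */
static double cbrt_via_inverse(double n, double x, int iters)
{
    while (iters-- > 0) {
        double x2 = x * x;
        x = (4.0 / 3.0) * x - (1.0 / 3.0) * n * x2 * x2;
    }
    return n * x * x;   /* n * (n^(-1/3))^2 = n^(1/3) */
}
```

This is exactly the kind of multiply-and-add-only loop the C33 is good at: every step is four multiplies and a subtract.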