DSPRelated.com
Forums

Fast Cube Root Using C33

Started by pnachtwey February 26, 2009
pnachtwey wrote:

> On Feb 26, 8:19&#4294967295;pm, cincy...@gmail.com wrote: >> On Feb 26, 10:04 pm, pnachtwey <pnacht...@gmail.com> wrote: >> >> > I need a fast cube root routine. &#4294967295;I started with the >> > obvious http://metamerist.com/cbrt/cbrt.htm
> I was hoping someone knows a variation of this that works on a > TI C33. > unsigned int* p = (unsigned int *) &f; > *p = *p/3 + 709921077;
My first try would be to clear the sign bit, do the divide and offset, and put the sign back on, though I don't know how easy the C33 makes the last step. Since the exponent is unbiased an offset of zero should work. However, you can do better -- plot the relative error over any interval spanning three exponent values (e.g. [1/2,4]) and pick the offset to minimize the peak error. I have Maple code to do this for IEEE754 that you could adapt. Additionally, you might gain from computing the *inverse* cube root because that has a polynomial Newton function, reducing the number of float divisions to one inversion at the end. Martin -- Quidquid latine scriptum est, altum videtur.
Jerry Avins wrote:
> pnachtwey wrote: >> On Feb 26, 8:19 pm, cincy...@gmail.com wrote: >>> On Feb 26, 10:04 pm, pnachtwey <pnacht...@gmail.com> wrote: >>> >>>> I need a fast cube root routine. I started with the >>>> obvioushttp://metamerist.com/cbrt/cbrt.htm >>>> What is key is getting the initial estimate using the Kahan bit hack >>>> but that only works with IEEE floating point and not with TI floating >>>> point. I can't believe that I am the first to do this since the C33 >>>> is ancient. The obvious solution is to convert the TI float to the >>>> IEEE float, apply the hack, and then convert back to TI float but >>>> there must be a better way. Actually doing another iteration is >>>> probably faster. The problem is that sign of the mantissa is at bit >>>> 23. >>>> Does anyboy know of a link? Thanks. >>>> Peter Nachtwey >>> Did you evaluate the Turkowski algorithm at the page you linked? It >>> uses a slightly more complicated seeding algorithm for the initial >>> guess for Newton's algorithm, but it's still relatively simple (a >>> quadratic). >> I didn't consider that one because of the mod operator and too many >> divides. The polynomial isn't a problem because all the instructions >> are 1 clock cycle and just a few multiply and adds but there is no >> divide or mod instruction on a TI C33 so they are expensive. The ldexp >> () and frexp() functions aren't that cheap either. >> >> I was hoping someone knows a variation of this that works on a TI C33. >> unsigned int* p = (unsigned int *) &f; >> *p = *p/3 + 709921077; >> This is mean and lean. >> I think that even the generic nth root modified to work on the TI C33 >> and an extra iteration is a better option. I just need to avoid >> divides. > > I can check what I don't remember, but not now (it's after midnight). > Isn't the 'C33 float a signed exponent rather than offset? If so, that's > even easier than IEEE. Just divide the exponent part by three, subtract > 1, and set the mantissa to (almost) 1.
The exponent is indeed two's complement. No offset. Jerry -- Engineering is the art of making what you want from things you can get. &#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;
On Feb 26, 10:35&#4294967295;pm, Tim Wescott <t...@seemywebsite.com> wrote:
> On Thu, 26 Feb 2009 19:04:01 -0800, pnachtwey wrote: > > I need a fast cube root routine. &#4294967295;I started with the obvious > >http://metamerist.com/cbrt/cbrt.htm > > What is key is getting the initial estimate using the Kahan bit hack but > > that only works with IEEE floating point and not with TI floating point. > > &#4294967295;I can't believe that I am the first to do this since the C33 is > > ancient. &#4294967295;The obvious solution is to convert the TI float to the IEEE > > float, apply the hack, and then convert back to TI float but there must > > be a better way. &#4294967295; Actually doing another iteration is probably faster. > > The problem is that sign of the mantissa is at bit 23. > > > Does anyboy know of a link? &#4294967295;Thanks. > > > Peter Nachtwey > > Dunno the details (I started doing DSP long after the C33 went obsolete, > and I'm too cool* to use a floating point DSP anyway). &#4294967295;So I can't help > you directly, but I _can_ jump in and kibbitz. > > Have you considered extracting the exponent, doing a quick table look-up > on the resulting mantissa bits, and using whatever the table coughs up as > a seed for N-R? > > * or perhaps my employers are too cheap. &#4294967295;Hmm. > > --http://www.wescottdesign.com
The exponent look up table is a good alternative because is quick but a memory hog. I was looking for a variation of the Kahan "bit hack". I might try that and use the mantissa to interpolate between the values. At least I will know. I think this can be done without a divide. Peter Nachtwey
On Feb 27, 6:20&#4294967295;am, Martin Eisenberg <martin.eisenb...@udo.edu> wrote:
> pnachtwey wrote: > > On Feb 26, 8:19&#4294967295;pm, cincy...@gmail.com wrote: > >> On Feb 26, 10:04 pm, pnachtwey <pnacht...@gmail.com> wrote: > > >> > I need a fast cube root routine. &#4294967295;I started with the > >> > obvioushttp://metamerist.com/cbrt/cbrt.htm > > I was hoping someone knows a variation of this that works on a > > TI C33. > > &#4294967295; &#4294967295; &#4294967295;unsigned int* p = (unsigned int *) &f; > > &#4294967295; &#4294967295; &#4294967295;*p = *p/3 + 709921077; > > My first try would be to clear the sign bit, do the divide and > offset, and put the sign back on, though I don't know how easy the > C33 makes the last step.
Bit 23 is the sign bit and it should be cleared. The problem is that after dividing the sign bit can be set depending on the exponent. I make sure the sign bit is cleared after the divide.
> Since the exponent is unbiased an offset of > zero should work. However, you can do better -- plot the relative > error over any interval spanning three exponent values (e.g. [1/2,4]) > and pick the offset to minimize the peak error. I have Maple code to > do this for IEEE754 that you could adapt. > > Additionally, you might gain from computing the *inverse* cube root > because that has a polynomial Newton function, reducing the number of > float divisions to one inversion at the end. > > Martin
The algorithm I use needs only one divide to compute the inverse of 3 times the parameter. After that I am just mulitplying. I believe this is the inverse cube root algorithm. I am thinking I need to OR a 0x00800000. The idea is that bit 23 is also an implied 1 as well as a sign bit, then I add a magic number derived from the Kahan "bit hack" and after the divide, or multiply by 0.333333333, I must make sure that bit 23 is cleared after this. The Kahan "bit hack" is taking into account that the IEEE floating point exponent has an offset. The TI floating point does not. Peter Nachtwey
On Feb 27, 9:54&#4294967295;am, pnachtwey <pnacht...@gmail.com> wrote:
> On Feb 26, 10:35&#4294967295;pm, Tim Wescott <t...@seemywebsite.com> wrote: > > > > > On Thu, 26 Feb 2009 19:04:01 -0800, pnachtwey wrote: > > > I need a fast cube root routine. &#4294967295;I started with the obvious > > >http://metamerist.com/cbrt/cbrt.htm > > > What is key is getting the initial estimate using the Kahan bit hack but > > > that only works with IEEE floating point and not with TI floating point. > > > &#4294967295;I can't believe that I am the first to do this since the C33 is > > > ancient. &#4294967295;The obvious solution is to convert the TI float to the IEEE > > > float, apply the hack, and then convert back to TI float but there must > > > be a better way. &#4294967295; Actually doing another iteration is probably faster. > > > The problem is that sign of the mantissa is at bit 23. > > > > Does anyboy know of a link? &#4294967295;Thanks. > > > > Peter Nachtwey > > > Dunno the details (I started doing DSP long after the C33 went obsolete, > > and I'm too cool* to use a floating point DSP anyway). &#4294967295;So I can't help > > you directly, but I _can_ jump in and kibbitz. > > > Have you considered extracting the exponent, doing a quick table look-up > > on the resulting mantissa bits, and using whatever the table coughs up as > > a seed for N-R? > > > * or perhaps my employers are too cheap. &#4294967295;Hmm. > > > --http://www.wescottdesign.com > > The exponent look up table is a good alternative because is quick but > a memory hog. &#4294967295;I was looking for a variation of the Kahan "bit > hack". &#4294967295; I might try that and use the mantissa to interpolate between > the values. &#4294967295;At least I will know. &#4294967295;I think this can be done without a > divide. > > Peter Nachtwey- Hide quoted text - > > - Show quoted text -
Hey Peter, you know you can do range reduction to put the number into something like [0.125, 1.000]. Just divide or mult the original number by 8 until it is in the right range. And the answer is scaled by 2^reductions. (reductions is signed number), then use a poly to find the cube root within this range. Or range reduce and then pick a crude seed and use a Newton's or higher order method to iterate the solution. IHTH, Clay
Vladimir Vassilevsky wrote:
 >
 > That algorithm computes one bit per iteration. Simple, but very slow.
 > Newton-Raphson efficiently doubles the number of known bits at each step.
 >

True - and while we're on that topic:

I've found out that higher order iteration can execute a good deal 
faster than simple newton-raphson steps if the target processor can 
issue multiple instructions per cycle but has high latency (e.g. C6000 
family).

Here's a nice summary of square-roots and invese square roots up to 
octic iterations (I stick with the cubic iterations though).

Btw - Peter, the link below also covers some cookbook receipts how to 
derive the high order iterations for any n-th root.

http://numbers.computation.free.fr/Constants/Algorithms/inverse.html

Cheers,
   Nils
clay@claysturner.com wrote:

> Hey Peter, you know you can do range reduction to put the number > into something like [0.125, 1.000]. Just divide or mult the > original number by 8 until it is in the right range. And the > answer is scaled by 2^reductions. (reductions is signed number), > then use a poly to find the cube root within this range. > > Or range reduce and then pick a crude seed and use a Newton's or > higher order method to iterate the solution.
The thing with explicit range reduction is that you need to extract the exponent field and reassemble the result, whereas Peter wants a code that swallows the input whole. I've experimented with the analogous Quake inverse-sqrt trick on x86, and the advantage was worth roughly one Newton iteration or two reduced-interval polynomial orders. Martin -- Quidquid latine scriptum est, altum videtur.
Vladimir Vassilevsky wrote:

> Tim Wescott wrote:
>> On Thu, 26 Feb 2009 19:04:01 -0800, pnachtwey wrote: >>> I need a fast cube root routine.
>> Does anyone here remember how to take the square >> root of a decimal number via "long division"?
> That algorithm computes one bit per iteration. Simple, but very slow. > Newton-Raphson efficiently doubles the number of known bits at each step.
Not quite fair. Newton-Raphson requires multiply and possibly divide. If your processor doesn't do that in hardware, then it requires a loop that may generate one bit per iteration. You could do Newton-Raphson with pencil and paper... -- glen
Jerry Avins wrote:
(snip)

> There is in general a fundamental difference between paper-and-pencil > algorithms and those run on a computer with a good divide, and usually, > the computer version needs fewer steps.
> With paper and pencil, one wants the next digit that is just not too > large. With machinery that can divide, one wants the best approximation > without regard to the number of digits produced.
Yes, with the assumption that you have hardware divide, or maybe even just multiply. There is a transform called the "Integer Cosine Transform", similar to DCT but meant for processors without hardware multiply. (Specifically, the RCA CDP1802.) It was designed such that the forward transform was easy to do in software without a multiply. The coefficients were selected to minimize the number of '1' bits to speed up the processing. The inverse transform is assumed to be done on earth with a much faster processor with high speed arithmetic hardware. http://tmo.jpl.nasa.gov/progress_report/42-119/119M.pdf Also, I note that the post that mentioned the pencil and paper square root method indicated that it was decimal based. I presume that algorithm could be adapted to other bases than 10. -- glen
On Feb 27, 1:32&#4294967295;am, Tim Wescott <t...@seemywebsite.com> wrote:
> On Thu, 26 Feb 2009 19:04:01 -0800, pnachtwey wrote: > > I need a fast cube root routine. &#4294967295;I started with the obvious > >http://metamerist.com/cbrt/cbrt.htm > > What is key is getting the initial estimate using the Kahan bit hack but > > that only works with IEEE floating point and not with TI floating point. > > &#4294967295;I can't believe that I am the first to do this since the C33 is > > ancient. &#4294967295;The obvious solution is to convert the TI float to the IEEE > > float, apply the hack, and then convert back to TI float but there must > > be a better way. &#4294967295; Actually doing another iteration is probably faster. > > The problem is that sign of the mantissa is at bit 23. > > > Does anyboy know of a link? &#4294967295;Thanks. > > > Peter Nachtwey > > Does anyone here remember how to take the square root of a decimal number > via "long division"? &#4294967295;My dad taught me how when I was considerably > shorter than I am now, possibly because he was appalled at his brother > teaching me how to do it on a calculator using Newton's method. > > For those of you under 80, it's basically a method for churning out > decimals of the square root of a number in a method very akin to long > division, except that at each step instead of subtracting (cumulative > result) * (guess) you subtract (2 * cumulative result + guess) * (guess) > -- you're basically trying to complete the equation (cumulative result + > guess) ^ 2 = (cumulative result)^2 + 2 * (cumulative result)(guess) + > (guess)^2. > > It turns out that -- like long division -- it's really easy to do on a > computer in binary because when you're only multiplying by 0 or 1 you > don't have to waste a lot of time guessing. > > It also turns out that if you're a _real_ math geek you can work out how > to do this with cube roots, and, like square roots and division the > guesswork is held to a minimum because you're just dealing with ones and > zeros. > > If the group shows sufficient interest I'll write it up and post it. &#4294967295;I > figured it out when I was in high school then for got it. &#4294967295;Peter's post > got me interested enough to figure it out again (thanks Peter!), but I'm > not sure if it's faster than Newton's method unless you're doing the work > on an FPGA. > > Each step starts with you knowing the starting value - answer^3, and > needing to calculate 3*answer^2 and 3*answer. &#4294967295;You then have to subtract > those two numbers from your residual to know if you emit a '1' or a '0'. > > I _suspect_ that it's not as quick as Newton's method, but it's late and > I'm not (by gawd) going to check! > > --http://www.wescottdesign.com
How about: "A Radix-2 Digit-by-Digit Architecture for Cube Root" Source IEEE Transactions on Computers Volume 57 , Issue 4 (April 2008) table of contents Pages 562-566 Year of Publication: 2008 ISSN:0018-9340 Authors Alex Pineiro, Javier D. Bruguera, Fabrizio Lamberti, Paolo Montuschi Its a good darticle. Clay