DSPRelated.com
Forums

Fast Cube Root Using C33

Started by pnachtwey February 26, 2009
I need a fast cube root routine.  I started with the obvious
http://metamerist.com/cbrt/cbrt.htm
What is key is getting the initial estimate using the Kahan bit hack
but that only works with IEEE floating point and not with TI floating
point.  I can't believe that I am the first to do this since the C33
is ancient.  The obvious solution is to convert the TI float to the
IEEE float, apply the hack, and then convert back to TI float but
there must be a better way.   Actually doing another iteration is
probably faster.  The problem is that sign of the mantissa is at bit
23.

Does anyboy know of a link?  Thanks.

Peter Nachtwey
On Feb 26, 10:04 pm, pnachtwey <pnacht...@gmail.com> wrote:
> I need a fast cube root routine. I started with the obvioushttp://metamerist.com/cbrt/cbrt.htm > What is key is getting the initial estimate using the Kahan bit hack > but that only works with IEEE floating point and not with TI floating > point. I can't believe that I am the first to do this since the C33 > is ancient. The obvious solution is to convert the TI float to the > IEEE float, apply the hack, and then convert back to TI float but > there must be a better way. Actually doing another iteration is > probably faster. The problem is that sign of the mantissa is at bit > 23. > > Does anyboy know of a link? Thanks. > > Peter Nachtwey
Did you evaluate the Turkowski algorithm at the page you linked? It uses a slightly more complicated seeding algorithm for the initial guess for Newton's algorithm, but it's still relatively simple (a quadratic). I don't know what TI's floating-point format is, but I'd guess it might be more expedient to evaluate one quadratic polynomial than to perform the bit manipulations needed to convert back and forth between formats. Jason
On Feb 26, 8:19&#4294967295;pm, cincy...@gmail.com wrote:
> On Feb 26, 10:04 pm, pnachtwey <pnacht...@gmail.com> wrote: > > > I need a fast cube root routine. &#4294967295;I started with the obvioushttp://metamerist.com/cbrt/cbrt.htm > > What is key is getting the initial estimate using the Kahan bit hack > > but that only works with IEEE floating point and not with TI floating > > point. &#4294967295;I can't believe that I am the first to do this since the C33 > > is ancient. &#4294967295;The obvious solution is to convert the TI float to the > > IEEE float, apply the hack, and then convert back to TI float but > > there must be a better way. &#4294967295; Actually doing another iteration is > > probably faster. &#4294967295;The problem is that sign of the mantissa is at bit > > 23. > > > Does anyboy know of a link? &#4294967295;Thanks. > > > Peter Nachtwey > > Did you evaluate the Turkowski algorithm at the page you linked? It > uses a slightly more complicated seeding algorithm for the initial > guess for Newton's algorithm, but it's still relatively simple (a > quadratic).
I didn't consider that one because of the mod operator and too many divides. The polynomial isn't a problem because all the instructions are 1 clock cycle and just a few multiply and adds but there is no divide or mod instruction on a TI C33 so they are expensive. The ldexp () and frexp() functions aren't that cheap either. I was hoping someone knows a variation of this that works on a TI C33. unsigned int* p = (unsigned int *) &f; *p = *p/3 + 709921077; This is mean and lean. I think that even the generic nth root modified to work on the TI C33 and an extra iteration is a better option. I just need to avoid divides. Peter Nachtwey
pnachtwey wrote:

>>On Feb 26, 10:04 pm, pnachtwey <pnacht...@gmail.com> wrote:
>>>I need a fast cube root routine.
(big snip)
> I didn't consider that one because of the mod operator and too many > divides. The polynomial isn't a problem because all the instructions > are 1 clock cycle and just a few multiply and adds but there is no > divide or mod instruction on a TI C33 so they are expensive. The ldexp > () and frexp() functions aren't that cheap either.
I didn't look at the specific routine, but the most important part should be dividing the unbiased exponent by three. (That can be done with a fixed point multiply by something close to 1/3.) If the C33 floating point is like some of the other TI DSPs it is a little unusual, but it shouldn't be so hard to get the exponent out. Maybe extract the exponent bits then use a lookup table to find a multiplier. Multiply by the appropriate table entry and then start Newton-Raphson. The better the initial guess the faster it converges.
> I was hoping someone knows a variation of this that works on a TI C33. > unsigned int* p = (unsigned int *) &f; > *p = *p/3 + 709921077; > This is mean and lean. > I think that even the generic nth root modified to work on the TI C33 > and an extra iteration is a better option. I just need to avoid > divides.
-- glen
pnachtwey wrote:
> On Feb 26, 8:19 pm, cincy...@gmail.com wrote: >> On Feb 26, 10:04 pm, pnachtwey <pnacht...@gmail.com> wrote: >> >>> I need a fast cube root routine. I started with the obvioushttp://metamerist.com/cbrt/cbrt.htm >>> What is key is getting the initial estimate using the Kahan bit hack >>> but that only works with IEEE floating point and not with TI floating >>> point. I can't believe that I am the first to do this since the C33 >>> is ancient. The obvious solution is to convert the TI float to the >>> IEEE float, apply the hack, and then convert back to TI float but >>> there must be a better way. Actually doing another iteration is >>> probably faster. The problem is that sign of the mantissa is at bit >>> 23. >>> Does anyboy know of a link? Thanks. >>> Peter Nachtwey >> Did you evaluate the Turkowski algorithm at the page you linked? It >> uses a slightly more complicated seeding algorithm for the initial >> guess for Newton's algorithm, but it's still relatively simple (a >> quadratic). > I didn't consider that one because of the mod operator and too many > divides. The polynomial isn't a problem because all the instructions > are 1 clock cycle and just a few multiply and adds but there is no > divide or mod instruction on a TI C33 so they are expensive. The ldexp > () and frexp() functions aren't that cheap either. > > I was hoping someone knows a variation of this that works on a TI C33. > unsigned int* p = (unsigned int *) &f; > *p = *p/3 + 709921077; > This is mean and lean. > I think that even the generic nth root modified to work on the TI C33 > and an extra iteration is a better option. I just need to avoid > divides.
I can check what I don't remember, but not now (it's after midnight). Isn't the 'C33 float a signed exponent rather than offset? If so, that's even easier than IEEE. Just divide the exponent part by three, subtract 1, and set the mantissa to (almost) 1. Jerry -- Engineering is the art of making what you want from things you can get. &#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;
On Thu, 26 Feb 2009 19:04:01 -0800, pnachtwey wrote:

> I need a fast cube root routine. I started with the obvious > http://metamerist.com/cbrt/cbrt.htm > What is key is getting the initial estimate using the Kahan bit hack but > that only works with IEEE floating point and not with TI floating point. > I can't believe that I am the first to do this since the C33 is > ancient. The obvious solution is to convert the TI float to the IEEE > float, apply the hack, and then convert back to TI float but there must > be a better way. Actually doing another iteration is probably faster. > The problem is that sign of the mantissa is at bit 23. > > Does anyboy know of a link? Thanks. > > Peter Nachtwey
Does anyone here remember how to take the square root of a decimal number via "long division"? My dad taught me how when I was considerably shorter than I am now, possibly because he was appalled at his brother teaching me how to do it on a calculator using Newton's method. For those of you under 80, it's basically a method for churning out decimals of the square root of a number in a method very akin to long division, except that at each step instead of subtracting (cumulative result) * (guess) you subtract (2 * cumulative result + guess) * (guess) -- you're basically trying to complete the equation (cumulative result + guess) ^ 2 = (cumulative result)^2 + 2 * (cumulative result)(guess) + (guess)^2. It turns out that -- like long division -- it's really easy to do on a computer in binary because when you're only multiplying by 0 or 1 you don't have to waste a lot of time guessing. It also turns out that if you're a _real_ math geek you can work out how to do this with cube roots, and, like square roots and division the guesswork is held to a minimum because you're just dealing with ones and zeros. If the group shows sufficient interest I'll write it up and post it. I figured it out when I was in high school then for got it. Peter's post got me interested enough to figure it out again (thanks Peter!), but I'm not sure if it's faster than Newton's method unless you're doing the work on an FPGA. Each step starts with you knowing the starting value - answer^3, and needing to calculate 3*answer^2 and 3*answer. You then have to subtract those two numbers from your residual to know if you emit a '1' or a '0'. I _suspect_ that it's not as quick as Newton's method, but it's late and I'm not (by gawd) going to check! -- http://www.wescottdesign.com
On Thu, 26 Feb 2009 19:04:01 -0800, pnachtwey wrote:

> I need a fast cube root routine. I started with the obvious > http://metamerist.com/cbrt/cbrt.htm > What is key is getting the initial estimate using the Kahan bit hack but > that only works with IEEE floating point and not with TI floating point. > I can't believe that I am the first to do this since the C33 is > ancient. The obvious solution is to convert the TI float to the IEEE > float, apply the hack, and then convert back to TI float but there must > be a better way. Actually doing another iteration is probably faster. > The problem is that sign of the mantissa is at bit 23. > > Does anyboy know of a link? Thanks. > > Peter Nachtwey
Dunno the details (I started doing DSP long after the C33 went obsolete, and I'm too cool* to use a floating point DSP anyway). So I can't help you directly, but I _can_ jump in and kibbitz. Have you considered extracting the exponent, doing a quick table look-up on the resulting mantissa bits, and using whatever the table coughs up as a seed for N-R? * or perhaps my employers are too cheap. Hmm. -- http://www.wescottdesign.com
Tim Wescott wrote:
(snip)

> Does anyone here remember how to take the square root of a decimal number > via "long division"? My dad taught me how when I was considerably > shorter than I am now, possibly because he was appalled at his brother > teaching me how to do it on a calculator using Newton's method.
I have a book that tells how to do it on an abacus. I believe it is the same or similar algorithm, but I never used the pencil and paper method. I once did sqrt(2) to six places after the decimal point on an abacus, though. For sqrt, you divide the abacus in half, one part starts with the number whose square root is desired, and decreases during the calculation. The other increases toward the desired square root. There is also a cube root algorithm that divides the abacus in thirds. Again the original number decreases and the 2/3 and 1/3 power increase, finally resulting in the cube root. For more places you need two abacii (abacuses) together, but I only have one. -- glen
Tim Wescott wrote:
> On Thu, 26 Feb 2009 19:04:01 -0800, pnachtwey wrote: > >> I need a fast cube root routine. I started with the obvious >> http://metamerist.com/cbrt/cbrt.htm >> What is key is getting the initial estimate using the Kahan bit hack but >> that only works with IEEE floating point and not with TI floating point. >> I can't believe that I am the first to do this since the C33 is >> ancient. The obvious solution is to convert the TI float to the IEEE >> float, apply the hack, and then convert back to TI float but there must >> be a better way. Actually doing another iteration is probably faster. >> The problem is that sign of the mantissa is at bit 23. >> >> Does anyboy know of a link? Thanks. >> >> Peter Nachtwey > > Does anyone here remember how to take the square root of a decimal number > via "long division"? My dad taught me how when I was considerably > shorter than I am now, possibly because he was appalled at his brother > teaching me how to do it on a calculator using Newton's method. > > For those of you under 80, it's basically a method for churning out > decimals of the square root of a number in a method very akin to long > division, except that at each step instead of subtracting (cumulative > result) * (guess) you subtract (2 * cumulative result + guess) * (guess) > -- you're basically trying to complete the equation (cumulative result + > guess) ^ 2 = (cumulative result)^2 + 2 * (cumulative result)(guess) + > (guess)^2. > > It turns out that -- like long division -- it's really easy to do on a > computer in binary because when you're only multiplying by 0 or 1 you > don't have to waste a lot of time guessing. > > It also turns out that if you're a _real_ math geek you can work out how > to do this with cube roots, and, like square roots and division the > guesswork is held to a minimum because you're just dealing with ones and > zeros. > > If the group shows sufficient interest I'll write it up and post it. I > figured it out when I was in high school then for got it. Peter's post > got me interested enough to figure it out again (thanks Peter!), but I'm > not sure if it's faster than Newton's method unless you're doing the work > on an FPGA. > > Each step starts with you knowing the starting value - answer^3, and > needing to calculate 3*answer^2 and 3*answer. You then have to subtract > those two numbers from your residual to know if you emit a '1' or a '0'. > > I _suspect_ that it's not as quick as Newton's method, but it's late and > I'm not (by gawd) going to check!
There is in general a fundamental difference between paper-and-pencil algorithms and those run on a computer with a good divide, and usually, the computer version needs fewer steps. With paper and pencil, one wants the next digit that is just not too large. With machinery that can divide, one wants the best approximation without regard to the number of digits produced. Jerry -- Engineering is the art of making what you want from things you can get. &macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;

Tim Wescott wrote:

> On Thu, 26 Feb 2009 19:04:01 -0800, pnachtwey wrote: > > >>I need a fast cube root routine.
[...]
> > Does anyone here remember how to take the square root of a decimal number > via "long division"?
[...] That algorithm computes one bit per iteration. Simple, but very slow. Newton-Raphson efficiently doubles the number of known bits at each step. Vladimir Vassilevsky DSP and Mixed Signal Design Consultant http://www.abvolt.com