## IQ mag: Integer approximation of square root of the sum of two squares

Started by 1 year ago●23 replies●latest reply 1 year ago●277 viewsAnyone knows the name of this algorithm and a fast and also integer alternative?

Int16 Mag(int16 i, int16 q)

(

If abs(i) > abs(q)

Return abs(i) + abs(q) / 4;

Else

Return abs(q) + abs(i) / 4;

)

This is the "Alpha Max + Beta Min" algorithm as a high-speed approximation for the magnitude of a complex number (or more generically the square root of the sum of two squares. Please see https://en.wikipedia.org/wiki/Alpha_max_plus_beta_...

Hi Dan. I learned Robmar's algorithm back in the 1980s and was told it was called "Max plus beta min." Back then I thought, with an additional multiply perhaps we could improve that algorithm's performance. So I started experimenting with multiplying the max value by an alpha value where 0 ≤ alpha ≤ 1. I decided to call the modified algorithm "Alpha max plus beta min."

Interesting, even more so if you would share your work :) Thanks!

Hello Robmar. My ol' DSP pal Grant Griffin posted material on the Alpha Max Plus Beta Min algorithm at:

Very nice Rick! I didn't know that, thanks for the additional history and nice work. Until I reread the post, I was going to go forward from this point thinking that it was originally "Robmar's Algorithm" up until the day you improved it. But now I get that you meant the algorithm that the poster Robmar here described.

Thanks for that, very useful!!

Dan,

I can't believe you didn't mention these:

Efficient Magnitude Comparison for Complex Numbers

Best Approach to Rank Complex Magnitude Comparision Problem

Related but those don’t really fit the bill as they are used to compare two complex magnitudes rather than estimate the complex magnitude. However what I didn’t mention but is also an efficient estimator is the CORDIC Rotator. (Use the CORDIC to rotate the angle to zero with the result on I as the magnitude times the CORDIC gain

Close enough to be very useful, IMO. The big challenge there was avoiding any multiplications, hence a lot of shifts in the solutions.

Here is how I think I would do this one:

===================== prep() for M = 0 to 2^5 A = atan( M / (2^5) ) XF[M] = 2^7 * cos( A ) YF[M] = 2^7 * sin( A ) next ===================== mag( I, Q ) AI = abs( I ) AQ = abs( Q ) if AI + AQ = 0 return 0 if AI < AQ then X = AQ : Y = AI else X = AI : Y = AQ M = ( Y << 5 ) / X D = ( X * XF[M] + Y * YF[M] ) >> 7 return D =====================

Here the '5' represents a 2^5 + 1 = 33 place lookup table.

The '7' represents a fixed point fraction offset of 7 in the table values.

Why it works.

--------------- X = R cos( A ) Y = R sin( A ) X * XF + Y * YF = R cos( A ) * cos( A ) + R sin( A ) * sin( A ) = R ( cos^2( A ) + sin^2( A ) ) = R ---------------

You can also think of it as a local plane approximation to sqrt( x^2 + y^2 ), or a rotation of the point to the real axis.

The OP didn't specify any tolerance, so I have no recommendation for the table size.

I have known about this for decades, but never saw it called an integer approximation. In the early days of my career, a square root operation was a big expense, but I noticed that most of the time, one ended up comparing the computed square root of the sum of squares to some threshold, and it was easier to compare the sum of the squares to the square of the threshold.

It is also worth reflecting on how this affects investigations of signals in noise and the apparent sharpness of the transition.

In a broad band signal you can add 50% noise for only a 10% increase in the total power, and like wise if the signal is a full 50% of the noise, you won't see it's 10% overall contribution.

It you are ever testing for susceptibility you should try 'synchronising' the noise injection to a visible alias frequency of the equipment operating frequency. Unfortunately its not mentioned in any of the standards. Essentially you look to make the injection noise look like 'Hum Bars' on the signal (Hum bars were a feature of US TV sets where the AC supply frequency (and it's harmonic) was a fraction different (apparently by design) from the TV line frequency. Works wherever you have a synchronised 'image' display - because everyone can see the obvious moving injceted noise!

I haven't got my head around the varied use of noise injection yet. Was reading on that in the technique to increases ADC resolution if there was enough noise on the sampled signal, or adding noise, so that a 1 or more but increase could be achieved. I mean if the signal is so stable that you need noise to flip the LS bit, what's the gain?! Thanks for your reply

Years ago (on a Sinclair QL IIRC) I did calcs regarding adding rms noise equal to 1/2 lsb and found that with sufficient averaging, you got at least 8 extra bits of linearity. You need 2^(2N) samples to get N bits of extra linearity.

I suspect it's also a part of the reason we get DAC / ADC non-linearity limits of a 2-lsb step, so 1 lsb rms (Gaussian) would hide it ;-).

The spike noise injection (e.g. close capacitive coupling) is useful with systems that have sampled ADCs such as cameras and imagers so the aperture time for a single noise spike is low, but if you make it synchronised to a scrolling pattern, the eye will pick it out really easy. Had that happen many years ago with a perfect 600Hz drive signal interfering with an NTSC timed imager system and generating beautiful scrolling bands, even though all the RFI tests passed.

There are variations on the integer IQ magnitude approximation (still with compute-friendly coefficient values) that give a lower error that that posted above and also with an error that is better centred about zero. If I recall correctly:

mag ~= 7/8 abs(largest) + 1/2 abs(smallest)

and better still mag ~= 15/16 abs(largest) + 1/2 abs(smallest)

(7/8 and 15/16 can be computed using a shift and a subtract)

On a side note, if you need **log** **magnitude** and can afford the time needed for a log10 call then using 10 * log10 (sum of squares) gives you the square root for free.

Tim.

I learned something today! Updated my spreadsheet to confirm :)

I | Q | SOS | Z = sqrt((I*I)+(Q*Q)) |
20*log10(Z) | 10*log10(SOS) | “Mag” function | % error |

20 | 0.01 | 400.000 | 20.000 | 26.021 | 26.021 | 20.0025 | 0.012 |

20 | 1 | 401.000 | 20.025 | 26.031 | 26.031 | 20.25 | 1.124 |

20 | 5 | 425.000 | 20.616 | 26.284 | 26.284 | 21.25 | 3.078 |

Thanks Tim, that's helpful, I'll give those a try!

Check out this post:

https://www.dsprelated.com/showthread/comp.dsp/652...

Same math, different reason.

Thanks for the link, its an interesting post, but too many CPU cycles!

I guess with z = sqrt( ( I * I ) + ( Q * Q ) ), as Q becomes <<< I, the contribution from Q becomes less and less, and the result approaches sqrt( I * I ), or abs(I), with some change, which is the abs(Q)/4. And vice versa.

I did some quick spreadsheetin' as empirical proof:

I | Q | z | Mag fn | % error |

20 | 0.01 | 20.000 | 20.0025 | 0.012 |

20 | 1 | 20.025 | 20.25 | 1.124 |

20 | 5 | 20.616 | 21.25 | 3.078 |

I think this is already a fast approximation to the full sqrt operation (sqrt being costly in CPU cycles), using integer values. The big hit in your routine is in the pipeline-clearing logical If-Else. If you knew for certain that Q was always <<< I, or vice versa, you could get rid of that by just using one of those two Return statements.

Hi, thanks for your reply, that's interesting! Maybe it is about the best there is.

The '7' is actually the number of effective significant bits. Headroom calculations are application dependent.

Otherwise, without too large a table, you should be able to get very reasonable results.

===================== prep() for M = 0 to 2^5 A = atan( M / (2^5) ) Cos_M[M] = 2^7 * cos( A ) Sin_M[M] = 2^7 * sin( A ) next ===================== mag( I, Q ) AI = abs( I ) AQ = abs( Q ) if AI < AQ then M = ( AI << 5 ) / AQ D = ( AQ * Cos_M[M] + AI * Sin_M[M] ) >> 7 else if AI = 0 return 0 M = ( AQ << 5 ) / AI D = ( AI * Cos_M[M] + AQ * Sin_M[M] ) >> 7 return D =====================

This answer is what you might be looking for.

https://dsp.stackexchange.com/questions/62893/#630...

It divides the arc up without the division, and multiplies the coefficients with hard coded shifts and adds.

Since you are "counting cycles", I would say, "test it, and time it" instead.

In your code, abs() is called six times when you need to do it just twice and also MUL usually is cheaper than DIV so /4 could be changed to *0.25. How much you could gain in execution speed depends on your compiler/-options... . If you do it using C++, current GCC and Clang compilers also does good job with those std library functions (maybe with sqrt() and abs() too (I have not bench-marked these two)).

EDIT: https://www.desmos.com/calculator/vtrllpldhf

Looks like (when compiled using C++/GCC -O3 -ffast-math) library sqrt() as

int16_t intsqrt(int16_t x, int16_t y){ return sqrt(x * x + y * y); }

is just little (~0.3 rdtsc/val) slower than function (*with little modifications mentioned in discussion*) given in opening post, so, why approximate (of course if you use other programming language or compiler which does not do good optimization...)?

The abs calls can only occur four times in any particular call. Again, an optimizer will catch this with temporary variables implemented as registers, but since this is pedantic mode, your point is true.

Getting hardcore, it used to be that "if x < 0 then x = -x" was faster than "abs()". You just can't tell anymore with pipelines and predictive branching and stuff. Like I said in my post, code the alternatives and test them. Compilers have gotten rather sophisticated and tend to beat hand assembly now a days.