## Floating Point to Fixed Point Conversion

Hi!

I am working on converting floating-point MATLAB code into fixed-point C code. The code is to be run on a DSP platform, the specifics of which are yet to be decided.

I'm new to floating-point to fixed-point conversion and am facing quite a few challenges.

My questions are:

1. What is the best way to get started, and where should I begin? The topic is quite confusing.

2. My MATLAB code is basic filtering code: an FFT, then windowing, then an IFFT. Windowing seems simple enough as it's just multiplication, but I'm totally lost when it comes to the FFT and IFFT.

Thanks!

Hi,

Going from floating point to fixed point is no more and no less than scaling numbers.

But in all cases, the VERY FIRST question when you come to fixed point is: what resolution and format can you, or do you want to, use?

Resolution is the number of bits in the word; format is the position of the 2^-1 bit, i.e. where the binary point sits. The bits above the binary point give you the range, and the bits below it give you the accuracy you can achieve for fractional numbers.

Once you have done that, you will know the range of the values and the SNR you can achieve. You can also work the other way round: start from the SNR and range requirements, then define the resolution and format that match your needs.
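
As a rough illustration of what "resolution and format" mean in C, here is a minimal sketch that converts a `double` to a 16-bit word with a chosen number of fractional bits and back. The function names and the choice of scaling by 2^frac_bits (rather than 2^n - 1) are assumptions made for this example, not something prescribed in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a double to a signed 16-bit fixed-point value with `frac_bits`
 * fractional bits (Q15 when frac_bits == 15). Rounds to nearest and
 * saturates at the ends of the int16_t range. Hypothetical helper. */
int16_t double_to_fixed16(double x, int frac_bits)
{
    assert(frac_bits >= 0 && frac_bits <= 15);
    double scaled = x * (double)(1L << frac_bits);   /* x * 2^frac_bits */
    scaled += (scaled >= 0.0) ? 0.5 : -0.5;          /* round half away from zero */
    if (scaled >= 32768.0)  return INT16_MAX;        /* too large: saturate */
    if (scaled <= -32769.0) return INT16_MIN;        /* too small: saturate */
    return (int16_t)scaled;                          /* cast truncates toward zero */
}

/* Convert back to double, e.g. to measure the quantization error. */
double fixed16_to_double(int16_t q, int frac_bits)
{
    assert(frac_bits >= 0 && frac_bits <= 15);
    return (double)q / (double)(1L << frac_bits);
}
```

With 15 fractional bits the representable range is [-1, 1) in steps of 2^-15, so 0.5 maps to 16384 and 1.0 saturates to 32767. Moving the binary point (fewer fractional bits) trades accuracy for range.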

Once you have done that, you will know the scaling you have to apply to all your numbers. If you use a general-purpose processor, take care to implement the shift after each multiplication, since the product of two scaled numbers carries the scale factor twice (in most cases, fixed-point DSPs do this shift automatically).
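
The shift after a multiplication could look roughly like this on a general-purpose processor. This is a sketch only: `q_mul16` and its `frac_bits` parameter are names invented for the example, and it assumes both operands share the same format.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two 16-bit fixed-point values that both have `frac_bits`
 * fractional bits. The 32-bit product has 2*frac_bits fractional bits,
 * so it is rounded and shifted right by frac_bits to get back to the
 * input format, then saturated to 16 bits.
 * Note: >> on a negative value is assumed to be an arithmetic shift,
 * which holds on mainstream compilers but is implementation-defined in C. */
int16_t q_mul16(int16_t a, int16_t b, int frac_bits)
{
    assert(frac_bits >= 1 && frac_bits <= 15);
    int32_t prod = (int32_t)a * (int32_t)b;     /* 2*frac_bits fractional bits */
    prod += (int32_t)1 << (frac_bits - 1);      /* add half an LSB for rounding */
    prod >>= frac_bits;                         /* back to frac_bits */
    if (prod > INT16_MAX) return INT16_MAX;
    if (prod < INT16_MIN) return INT16_MIN;
    return (int16_t)prod;
}
```

For Q15 (`frac_bits` of 15), 0.5 times 0.5 is `q_mul16(16384, 16384, 15)`, which gives 8192, i.e. 0.25. The only Q15 product that needs saturation is -1.0 times -1.0.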

You can also take a look at the FFT source code from Analog Devices for their ADSP-218x DSP family, which implements the FFT very efficiently in fixed-point format. It's very helpful most of the time.

Benoit

Hi,

With regard to format and resolution: the output of the ADC I'm using, which will provide the input, is 24 bits. My filter coefficients are fully fractional. I'm thinking of using 16 bits for the time being and taking a look at the results. This will have to be revisited later, as we are aiming for a 16-bit processor or, in the worst case, a 32-bit one.

I know how to convert the data into fixed point by quantizing, i.e. multiplying the range by 2^n - 1, and I'm able to perform simple operations. But I can't wrap my head around a fixed-point FFT. Surely such code must be easy to find, given that fixed-point processors are found in abundance in all sorts of devices?

Hi,

It is an FIR filter for use in a biomedical device. The filter has been designed to extract the required signal components. So the way I'm implementing it is through fast convolution. Are you saying that instead of the frequency domain multiplication, the time domain convolution will be better?

Oh -- well, an FIR filter isn't a simple low-pass filter.... how many FIR coefficients?

You're probably doing the right thing, except that fixed-point implementation is tricky and this might push you towards "regular" convolution if the FIR filter isn't too large.

My FIR filter has 512 coefficients and I'm trying to make it a real-time processing program. Ever since the FFT algorithm was discovered, haven't all time-domain convolutions been replaced by fast convolution?

CIC filters are usually used in multi-rate DSP applications, right? My filter is a Gaussian FIR filter which was developed by someone else.

You are right that FFT/iFFT need special care in order to avoid overflow without ruining SNR. This topic has been discussed here before - try to find the previous threads.

In any case you are in luck, since fixed-point DSP platforms generally come with optimized FFT libraries. Find the library routine that fits your needs and use it.

Y(J)S

Hi,

My main problem is that the platform hasn't been decided yet. My project manager wants me to convert the filtering code into fixed point C, then simulate it on various platforms to figure out which platform would be suited best. I understood the concept of fixed point but am not able to get around to implementing FFT in fixed point. I'm also not able to find any fixed point DSP libraries.

Understood.

You must decide whether you want to go for variable scaling (i.e., at each butterfly check if rescaling is required) or constant scaling (which can be worst case - i.e., reduction by a factor of 2 at each stage, or reasonable - i.e., reduction by a factor of 2 every other stage). The former is more computation, but saves more bits of accuracy.
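
To make the constant-scaling option concrete, here is a minimal sketch of one radix-2 butterfly on complex Q15 data with an optional divide by 2. The type `cq15`, the helper `sat16` and the function `butterfly_scaled` are hypothetical names for this example. It is not taken from any particular FFT library, and a complete FFT would also need the stage loops, twiddle table and bit reversal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { int16_t re, im; } cq15;    /* complex sample, Q15 parts */

/* Clamp a wide intermediate to the int16_t range (safety net). */
static int16_t sat16(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Radix-2 decimation-in-time butterfly with constant scaling:
 *   a' = (a + w*b) / 2^shift,   b' = (a - w*b) / 2^shift
 * shift = 1 on every stage is the worst-case scheme; alternating 1 and 0
 * gives the "every other stage" scheme.
 * Note: >> on negative values is assumed to be an arithmetic shift,
 * which holds on mainstream compilers but is implementation-defined in C. */
void butterfly_scaled(cq15 *a, cq15 *b, cq15 w, int shift)
{
    assert(a != NULL && b != NULL && (shift == 0 || shift == 1));
    /* w*b: products are Q30, rounded and shifted back to Q15 */
    int64_t tr = ((int64_t)w.re * b->re - (int64_t)w.im * b->im + (1 << 14)) >> 15;
    int64_t ti = ((int64_t)w.re * b->im + (int64_t)w.im * b->re + (1 << 14)) >> 15;
    int64_t ar = a->re, ai = a->im;
    a->re = sat16((ar + tr) >> shift);
    a->im = sat16((ai + ti) >> shift);
    b->re = sat16((ar - tr) >> shift);
    b->im = sat16((ai - ti) >> shift);
}
```

With scaling on every stage, an N-point FFT output ends up divided by N overall, which is what costs bits of accuracy. The variable-scaling option would instead inspect the data and only shift when an overflow is imminent.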

A quick google for "fixed point FFT" turned up C language routines for both options. I don't want to include links as I haven't reviewed the code.

Y(J)S

Hello

I suggest using an ARM Cortex-M4 processor. It is well suited to FFT/IFFT work, since the ARM CMSIS library is available for it.

I'm also trying to do the same thing: converting FFT/convolution code to run on a fixed-point DSP. I recommend the Teensy 3.2. It has an ARM Cortex-M4, but it doesn't have a hardware floating-point unit.

If you can, post your MATLAB code here or send me a PM.

You're on the right track, I guess. Before deciding on the platform, you should simulate the cases using reference code. So the sequence can be:

1. Write your algorithm in floating-point code and make sure that it works the same as your design/MATLAB code.

2. Convert #1 to fixed-point simulation code by replacing all floating-point data types and operations with fixed-point ones. In your case (FIR or FFT), you will only have Add and Mult.

2a. Data type: starting with Int32 should be fine unless you need very high accuracy. Decide the scaling scheme based on Q7.24.

2b. Operations: add - write a macro/function to add two numbers. For debug/analysis, use int64 for the intermediate result so you can check for overflow, i.e. when the abs sum goes above 2^24. Fix an accumulator size and check for overflow of that range in the intermediate sums as well.

2c. Operations: mult - to start with, have a uniform multiplication macro/function which multiplies Q.24 by Q.24 to produce Q.24.

2d. Additionally, you may want to convert your tables to fixed point, in this case the sin and cos values for FFT/IFFT twiddling. Start these with a uniform Q format too, say Q.24.
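
Steps 2b and 2c could be sketched along these lines. All names here (`q24_narrow`, `q24_add`, `q24_mul`, `q24_from_double`, `q24_overflow_count`) are hypothetical. As an assumption of this example, overflow is flagged when the 64-bit intermediate no longer fits in the 32-bit word; a tighter threshold can be checked the same way.

```c
#include <assert.h>
#include <stdint.h>

#define Q24_FRAC_BITS 24        /* Q7.24: 1 sign, 7 integer, 24 fraction bits */

long q24_overflow_count = 0;    /* debug counter, inspect after a test run */

/* Saturate a 64-bit intermediate to int32_t and record any overflow. */
int32_t q24_narrow(int64_t wide)
{
    if (wide > INT32_MAX) { q24_overflow_count++; return INT32_MAX; }
    if (wide < INT32_MIN) { q24_overflow_count++; return INT32_MIN; }
    return (int32_t)wide;
}

/* 2b: add in 64 bits so overflow can be detected before narrowing. */
int32_t q24_add(int32_t a, int32_t b)
{
    return q24_narrow((int64_t)a + (int64_t)b);
}

/* 2c: Q.24 * Q.24 has 48 fractional bits; round and shift back to Q.24.
 * Note: >> on a negative value is assumed to be an arithmetic shift. */
int32_t q24_mul(int32_t a, int32_t b)
{
    int64_t prod = (int64_t)a * (int64_t)b;
    prod += (int64_t)1 << (Q24_FRAC_BITS - 1);
    return q24_narrow(prod >> Q24_FRAC_BITS);
}

/* Build Q7.24 constants (e.g. table entries) from doubles. */
int32_t q24_from_double(double x)
{
    assert(x >= -128.0 && x < 128.0);   /* representable range of Q7.24 */
    double scaled = x * (double)((int64_t)1 << Q24_FRAC_BITS);
    return q24_narrow((int64_t)(scaled + (scaled >= 0.0 ? 0.5 : -0.5)));
}
```

Routing every add and multiply through functions like these means the format can later be changed in one place, and a non-zero `q24_overflow_count` after a run points directly at a scaling problem.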

Once you have the above code compiled, run it with a couple of inputs and compare the results with those from #1. If they are OK, run more cases: you'll find that some cases fail. If there are no other bugs in your code, you will notice an overflow message from #2b for all the failed cases.
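
One possible way to put a number on "compare the results" is the ratio of signal power to error power between the floating-point reference and the fixed-point output. The sketch below assumes the fixed-point output is in Q.24; the function name `q24_power_ratio` is made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ratio of reference signal power to error power between a floating-point
 * reference and a Q.24 fixed-point result of the same length.
 * 10*log10() of this ratio is the SNR in dB.
 * Returns -1.0 when the outputs match exactly (error power is zero). */
double q24_power_ratio(const double *ref, const int32_t *fix, size_t n)
{
    assert(ref != NULL && fix != NULL && n > 0);
    double sig = 0.0, err = 0.0;
    for (size_t i = 0; i < n; i++) {
        double got = (double)fix[i] / 16777216.0;   /* Q.24 -> double (2^24) */
        double d = ref[i] - got;
        sig += ref[i] * ref[i];
        err += d * d;
    }
    return (err > 0.0) ? sig / err : -1.0;
}
```

A case can then be called a pass when the ratio stays above whatever SNR target the application sets; a sudden collapse of the ratio on a particular input usually goes together with an overflow report.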

Keep changing the format and accumulator sizes until your cases pass. For any case which does not pass with any of your possible combinations, you may have to go for stagewise scaling, i.e. keep a different Q format for different stages of the algorithm. This is the most common approach.

If you fail to find a stagewise scaling scheme that works, you have to go for a variable (data-dependent) scaling scheme. Here you have more than one option; the simplest is block scaling, i.e. pre-scale your input to the range in which your algorithm is comfortable, then operate on the result and state based on that scale. With this scheme, you can re-use all of your fixed-scale code as is.
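
Block scaling could be sketched as below: find the peak of the block, shift the whole block so that the peak fits in a chosen number of bits, and remember the shift so the output can be rescaled afterwards. The function name `block_scale` and the `limit_bits` parameter are assumptions for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scale a block in place so that every |x[i]| < 2^limit_bits, using as
 * much of that range as possible. Returns the signed shift applied
 * (positive = scaled up by 2^shift, negative = scaled down by 2^-shift).
 * The caller undoes the shift on the output to restore the original scale. */
int block_scale(int32_t *x, size_t n, int limit_bits)
{
    assert(x != NULL && limit_bits >= 1 && limit_bits <= 30);

    uint32_t peak = 0;                  /* largest magnitude in the block */
    for (size_t i = 0; i < n; i++) {
        /* unsigned negation is safe even for INT32_MIN */
        uint32_t mag = (x[i] < 0) ? (uint32_t)0 - (uint32_t)x[i] : (uint32_t)x[i];
        if (mag > peak) peak = mag;
    }
    if (peak == 0) return 0;            /* all zeros: nothing to scale */

    int bits = 0;                       /* smallest count with peak < 2^bits */
    while (bits < 32 && (peak >> bits) != 0) bits++;

    int shift = limit_bits - bits;
    for (size_t i = 0; i < n; i++) {
        int64_t v = x[i];
        /* multiply/divide instead of << and >> to avoid shifting negatives */
        v = (shift >= 0) ? v * ((int64_t)1 << shift)
                         : v / ((int64_t)1 << -shift);
        x[i] = (int32_t)v;
    }
    return shift;
}
```

For example, the block {1000, -4000, 250} with `limit_bits` of 10 is divided by 4 (shift of -2), and the filter or FFT then runs unchanged on the scaled block. Scaling down discards low-order bits, so accuracy depends on the peak of each block.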