## Parallel CIC filter processing enquiry

Hi all,

I am new to the forums and have read several posts regarding CIC filter theory and implementation. However, I am trying to do something different from a single-input/single-output CIC filter and would appreciate some advice.

__The Proposed Design__

I am attempting to implement several digital lock-in amplifiers operating in parallel in a single FPGA. My single lock-in amplifier circuit already has the reference digital oscillator and signal which I want to demodulate and I have already done the easy part, which is just to multiply them together. I am attempting to implement a hardware efficient filter to filter out everything except the DC component and I have settled on a CIC Decimation filter, for resource efficiency. The aim is to eventually replicate this circuit many times within a single FPGA so resource use is a priority. The FPGA is a Xilinx FPGA and I am using Vivado 2015.2.

__The Difficult part__

The data stream is supersampled: the ADC runs at 500 MS/s while the FPGA clock is 100 MHz, so the data arrives as five parallel 16-bit samples per clock cycle. As a result, I don't think I can simply instantiate five CIC IP cores and be done with it. Please correct me if I am wrong here. It seems to me that the best way to achieve the filtering in this case is to have five integrators, whose outputs then feed into a single comb stage in a multiplexed fashion. The coding of the CIC filter seemed straightforward enough, but I am not getting any meaningful results. I am also choosing not to use a compensation filter, as I am only concerned with the DC component of the lock-in.

__Current progress__

I have been simulating the single-stage CIC decimator in MATLAB to try to understand the output properly, and it seems to me that when a signal with a non-zero DC component is fed into the CIC, the output just grows over time. So I have a couple of questions.

__Questions__

1. Can a CIC filter be simulated in MATLAB without the DSP toolbox, using code only?

2. I have read that two's complement arithmetic wraparound is an inherent property of the CIC filter. I can understand that data will wrap around in the integrator stage, but I have not been able to understand why this can be ignored. In the waveform simulations of my Verilog code, I can see that the integrator wraps around and the output of the comb stage just looks like noise.

3. Is the proposed CIC filter the best way to find the DC component of each lock-in signal? If not, I would love to hear about any other filters that might be more suitable.

I have not done a lot of DSP before so any advice would be greatly appreciated!

Cheers,

Phil

To answer your first question, yes, you can simulate CIC filters in Matlab or Octave or whatever tool you wish. One caveat with Matlab/Octave is that you have to keep track of any possible operations that might have a non-integer result or otherwise not behave like the limited precision in your implementation. Rollover is a good example, but it's simple to just test for the rollover condition and simulate its effect.

I've done this many, many times in Matlab/Octave. You do have to pay attention to the details, but it's not that bad and then you get all the benefit of the plotting tools, ability to debug, etc.

When your target is a hardware implementation it is then also very useful to use the Matlab/Octave data as test vectors for the hardware. This works very well and makes for a simpler and shorter debug cycle in the hardware.
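The rollover test and the test-vector idea above can be sketched in a few lines. This is only an illustration, written in Python rather than MATLAB/Octave; the helper names `wrap_signed`, `integrate` and `to_hex_vectors` and the 16-bit register width are assumptions made for the example, not anything taken from an actual design.

```python
def wrap_signed(value, bits):
    """Wrap an integer into the signed two's complement range of `bits` bits."""
    modulus = 1 << bits
    half = 1 << (bits - 1)
    return (value + half) % modulus - half


def integrate(samples, bits):
    """Running sum that rolls over the way a `bits`-wide register would."""
    acc = 0
    out = []
    for s in samples:
        acc = wrap_signed(acc + s, bits)
        out.append(acc)
    return out


def to_hex_vectors(values, bits):
    """Format values as fixed-width hex words (two's complement bit patterns)."""
    digits = (bits + 3) // 4
    mask = (1 << bits) - 1
    return [format(v & mask, "0{}x".format(digits)) for v in values]


# Three large samples force one rollover in a 16-bit accumulator
acc_values = integrate([30000, 30000, 30000], bits=16)
hex_words = to_hex_vectors(acc_values, bits=16)
print(acc_values)  # [30000, -5536, 24464]
print(hex_words)   # ['7530', 'ea60', '5f90']
```

Hex words like these could be written to a text file and read by a testbench (for example with Verilog's `$readmemh`), so the simulation and the hardware are checked against the same numbers.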

Thanks for the advice.

If I understand you correctly, the best way to simulate this in MATLAB would be to convert any voltage waveform to the integer value of the 16-bit number that digitization would produce and proceed from there?

For instance, a sine wave with a Vpeak of ±1.5 would translate to ±32768 in the 16-bit digitized equivalent (strictly, -32768 to +32767 in two's complement). Then all CIC operations should simply result in integer values, since we are only using shift and add operations. I have also applied the same logic to simulating the rollover condition: I just use ±32768 as the limit.

Good tip about the test vectors as well. That will be very useful.

Thanks!
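As a rough sketch of the quantization step described above, the following Python snippet maps a ±1.5 V sine onto signed 16-bit codes. The 1.5 V full scale comes from the example in the post; the function name `quantize`, the 1 kHz sample rate and the 50 Hz tone are assumptions chosen only for illustration. Note that the positive peak has to be clipped to +32767, because +32768 does not fit in a signed 16-bit word.

```python
import math

FULL_SCALE_VOLTS = 1.5   # assumed full-scale peak voltage, as in the example above
BITS = 16                # ADC word length


def quantize(volts, full_scale=FULL_SCALE_VOLTS, bits=BITS):
    """Map a voltage to a signed integer code, clipping at the two's complement limits."""
    max_code = (1 << (bits - 1)) - 1   # +32767 for 16 bits
    min_code = -(1 << (bits - 1))      # -32768 for 16 bits
    code = round(volts / full_scale * (1 << (bits - 1)))
    return max(min_code, min(max_code, code))


fs = 1000.0   # assumed sample rate in Hz
fo = 50.0     # assumed tone frequency in Hz
codes = [quantize(FULL_SCALE_VOLTS * math.sin(2 * math.pi * fo * n / fs))
         for n in range(40)]

print(min(codes), max(codes))
```

From this point on every add and subtract in the CIC model operates on integers, so the only remaining difference from hardware is the finite register width.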

For the most part, that's what I'd do, yes, but for stuff like that I'm usually simulating something that will ultimately be implemented. If the implementation is in an FPGA, then it will be integer arithmetic, especially for a high-rate CIC filter, so making sure everything is simulated that way is useful in the long run.

Quantizing the inputs as you suggest gets you most of the way there, since all of the simulated operations will then produce integer results until/unless you throw a divide or fractional multiply in there somewhere. After that you just watch out for the rollovers and you're pretty much there.

I suggest you apply one of your five input sequences (it doesn't matter which one) to a CIC filter with no decimation. The CIC filter output samples will be proportional to the instantaneous N-point moving average of your input sequence.

Answer 1: Here's some MATLAB code (not using the 'dsp toolbox') for you:

```matlab
Num_Samples = 80;   % Number of input time samples
N = 8;              % Comb delay (length of the moving average)
n = 0:Num_Samples-1;
Fs = 1000;          % Sample rate in Hz
Fo = 50;            % Input sinusoid's frequency in Hz

%%%%%%%%%%%%%%%%%%%%%%
% Generate an input sequence
%%%%%%%%%%%%%%%%%%%%%%
x = sin(2*pi*Fo*n/Fs);
%x(20:65) = x(20:65) + 2;              % Add a DC bias
%x = zeros(1,Num_Samples); x(1) = 1;   % Impulse input
%x = ones(1,Num_Samples);              % All-ones input

% Initialise variables
Delay_Line = zeros(1,N+1);
Integrator_Out_Old = 0;
CIC_Out = zeros(1,Num_Samples);        % Preallocate the output

% The iteration loop
for k = 1:length(x)
    % Calculate the output of the integrator
    Integrator_Out = x(k) + Integrator_Out_Old;

    % Update the FIFO Delay_Line
    Delay_Line = [Delay_Line(2:end), Integrator_Out];

    % Find the value of w(n-N) from Delay_Line
    Integrator_Out_n_minus_N = Delay_Line(1);

    % Calculate the output of the comb filter
    Comb_Out = Integrator_Out - Integrator_Out_n_minus_N;
    CIC_Out(k) = Comb_Out;

    % Save the integrator output for the next iteration
    Integrator_Out_Old = Integrator_Out;
end

% Plot the input
figure(1), clf
plot(x, ':bs')
grid on, zoom on

% Plot the CIC output
figure(2), clf
plot(CIC_Out, ':bs')
grid on, zoom on
```

Answer 2: I can send you some information on two's complement arithmetic wraparound if you wish. Just send me a private e-mail by clicking my name on this web page.
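On the wraparound question, a small experiment may help build intuition. Two's complement addition and subtraction are arithmetic modulo 2^B, so the comb's subtraction cancels any rollover in the integrator, provided the registers are wide enough to hold the final output. For one integrator and one comb with delay N and no decimation, that means about (input bits + log2(N)) bits. The Python sketch below compares a wrapped model against unbounded integers; the function names and the chosen widths (19 and 17 bits) are assumptions for illustration only.

```python
def wrap_signed(value, bits):
    """Wrap an integer into the signed two's complement range of `bits` bits."""
    modulus = 1 << bits
    half = modulus >> 1
    return (value + half) % modulus - half


def cic_single_stage(samples, comb_delay, bits=None):
    """One integrator followed by one comb, no decimation.

    If `bits` is given, every result wraps like a register of that width;
    if it is None, Python's unbounded integers are used as the reference.
    """
    wrap = (lambda v: wrap_signed(v, bits)) if bits else (lambda v: v)
    delay_line = [0] * comb_delay
    acc = 0
    out = []
    for x in samples:
        acc = wrap(acc + x)               # integrator, allowed to roll over
        delayed = delay_line.pop(0)       # integrator value from N samples ago
        delay_line.append(acc)
        out.append(wrap(acc - delayed))   # comb
    return out


N = 8
x = [30000] * 100                                # large DC input, 16-bit range
exact = cic_single_stage(x, N)                   # unbounded reference
wrapped = cic_single_stage(x, N, bits=19)        # 16 + log2(8) = 19 bits
too_narrow = cic_single_stage(x, N, bits=17)     # output no longer fits

print(wrapped == exact)      # True, although the integrator rolled over many times
print(too_narrow == exact)   # False
```

If the comb output looks like noise in hardware, two things worth checking against a model like this are that the registers are wide enough and that they really do wrap (no saturation logic in the integrator or comb).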

Thanks so much for the code Rick!

I simulated your code using the inputs that I have for my system and it works as expected! I have put it side by side with my code to see what I was doing wrong, and it looks as though I am not modelling the delay line correctly, so I will rejig my code and see if I can get it to simulate like yours.

I will send you an email as well regarding the two's complement wraparound. I just simply don't have an intuitive understanding of why its effects can be ignored and the output result is still accurate. I'll be in touch.

Thanks,

Phil

Hi Rick,

Regarding your view "I suggest you apply one of your five input sequences (it doesn't matter which one) to a CIC filter with no decimation..."

It seems you have misunderstood the five substreams. There is one ADC signal decimated directly into five, e.g. substream 1 has samples s1, s6, s11, etc. and substream 2 has samples s2, s7, s12, and so on.

In other words, each of the five substreams is just a branch of the main ADC signal. You cannot process one substream on its own, as it is directly decimated by 5 and will have aliasing (onto DC) right at the input to any filter.

Hi kaz.

I think you're correct. I figured any DC component of the original analog signal will exist as a DC component of any decimated sequence. But if each data sequence is a "decimated data sequence" then any spectral energy at ±100 or ±200 MHz will alias to DC in all of the decimated sequences. Good point kaz!!
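The aliasing point can be checked numerically. The Python sketch below samples a 100 MHz tone at 500 MS/s, splits it into the five interleaved substreams, and looks at what each one contains. The variable and function names are made up for this example, and the tone amplitude of 1 is arbitrary.

```python
import math

FS = 500e6      # ADC sample rate from the thread
BRANCHES = 5    # parallel samples per 100 MHz clock


def substreams(samples, branches=BRANCHES):
    """Split an interleaved sequence into its `branches` decimated substreams."""
    return [samples[k::branches] for k in range(branches)]


n_samples = 50
tone_100mhz = [math.cos(2 * math.pi * 100e6 * n / FS) for n in range(n_samples)]
branches = substreams(tone_100mhz)

# Peak-to-peak variation inside each substream: ~0 means it sees a constant
ripple = [max(b) - min(b) for b in branches]
# Mean of each substream: nonzero, although the tone itself has no DC
branch_dc = [sum(b) / len(b) for b in branches]
# Averaging across all five substreams recovers the true DC level of zero
combined_dc = sum(branch_dc) / BRANCHES

print(ripple)
print(branch_dc)
print(combined_dc)
```

Each substream on its own sees the 100 MHz tone as a constant (cos(2πk/5) for substream k), which is indistinguishable from real DC, while the average over all five substreams cancels it.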

Hi,

If you are concerned with the DC component only, then consider a simpler approach: block-average the five streams into one stream. You can then apply any further filtering to extract the DC from that combined stream.

Hi Kaz,

Thanks for this reply, I didn't think of this at all! I'll give it a shot and see how it goes. I suppose it will be fine, since I am looking to downsample the data by a factor of 40 anyway.

Thanks for the suggestion.
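One possible reading of the block-average suggestion, sketched in Python: add the five samples that arrive on each clock tick (a decimate-by-5 block sum), then follow with a single-stage CIC decimator by 8 to reach the overall factor of 40. The split into 5 × 8, the function name `combine_and_decimate`, and the test data are all assumptions for illustration; the sums are left unscaled, as they usually would be in an FPGA.

```python
def combine_and_decimate(clock_ticks, ratio=8):
    """Block-sum five parallel samples per tick, then CIC-decimate by `ratio`.

    clock_ticks: sequence of 5-sample tuples, one tuple per FPGA clock.
    Returns one value per `ratio` ticks: the sum of the last 5 * ratio
    ADC samples, i.e. the block average scaled by 5 * ratio.
    """
    acc = 0     # integrator, updated every clock tick
    prev = 0    # comb delay register, updated at the decimated rate
    out = []
    for i, words in enumerate(clock_ticks, start=1):
        acc += sum(words)            # block sum of the five parallel samples
        if i % ratio == 0:           # keep every `ratio`-th integrator value
            out.append(acc - prev)   # comb with a delay of one output sample
            prev = acc
    return out


# 16 clock ticks of samples hovering around a DC level of 100
ticks = [(100, 101, 99, 100, 100)] * 16
sums = combine_and_decimate(ticks)
averages = [s / 40 for s in sums]

print(sums)      # [4000, 4000]
print(averages)  # [100.0, 100.0]
```

In this arrangement only one integrator and one comb are needed per lock-in channel, plus a five-input adder in front of them.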

Hi Phil_SG. You wrote: "The data is split across 5 parallel, 16 bit samples." What exactly does that mean? Can you post a diagram showing data split across 5 parallel 16-bit samples?

Hi Rick,

Basically, the 500 MS/s ADC operates on a 100 MHz clock and generates 5 samples with each clock tick. Put simply, the full bandwidth of the sampler is only achieved by having the samples in parallel. To any processing module this will look like:

```verilog
input wire [15:0] sample1,
input wire [15:0] sample2,
input wire [15:0] sample3,
input wire [15:0] sample4,
input wire [15:0] sample5,
```

Each of those samples is clocked in simultaneously at a clock rate of 100 MHz.