DSPRelated.com

Can we parallelize the integrator stage of a CIC?

Started by flutekick 8 years ago, 13 replies, latest reply 7 years ago, 661 views

Hi,


Since the integrator stage of a #CIC filter operates at a higher frequency than the differentiator, it becomes timing-critical in a high-speed design. Is there a known way to parallelize the integrator?

Thanks in advance,

Krishna

Reply by ahmedshahein, April 18, 2017

Hi Krishna,

I hope I am not too late.

There is a 2006 publication by Serigne Mbaye Fallo Dia entitled "A Very High Speed and Efficient CIC Decimation Filter Core". They proposed a polyphase-decomposition CIC filter and managed to reach a sampling frequency of up to 1.6 GHz. The major constraint is that the decimation factor has to be a power of two. Otherwise, the polyphase analysis is straightforward.

I haven't tried it myself, but you might use these open-source projects to test this approach.

https://opencores.org/project,cic

https://opencores.org/project,gppd

Hope it helps.

Regards.

Ahmed.

Reply by Tim Wescott, April 18, 2017

Dunno if this is the right answer, but search on "pipelined adder", or maybe "pipelined accumulator".  I just tried it and got oodles of results that at least looked like they were on track (i.e., "pipelined accumulator" didn't give me a bunch of hits on oil well equipment or power hydraulic manufacturers).

Reply by kaz, April 18, 2017

Yes, pipeline the adder. This introduces one delay stage, but it applies to the whole stream, so I believe you can then match this delay at the next-stage adders.
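A minimal Python sketch of what a "pipelined accumulator" might look like, assuming a two-stage split of the adder into a low half and a high half with the carry registered between them. The function name, the 16-bit width, and the even split are illustrative assumptions, not something taken from the posts above. The model suggests that the pipelined version produces the same running sum as a plain accumulator, one clock later.

```python
def pipelined_accumulate(samples, width=16):
    """Bit-accurate model of a 2-stage carry-pipelined accumulator (hypothetical).

    Stage 1 adds the low half of the input and registers the carry-out.
    Stage 2 adds the (registered) high half plus that carry one clock later.
    """
    half = width // 2
    mask_lo = (1 << half) - 1
    mask_hi = (1 << (width - half)) - 1
    lo = hi = 0      # accumulator registers, low and high halves
    carry = 0        # registered carry from the low adder
    x_hi_d = 0       # registered high half of the previous input
    lo_d = 0         # low half delayed by one clock to line up with hi
    out = []
    for x in samples:
        x &= (1 << width) - 1
        # Stage 2 uses only values registered on the previous clock.
        hi_next = (hi + x_hi_d + carry) & mask_hi
        lo_d_next = lo
        # Stage 1 works on the current input.
        s = lo + (x & mask_lo)
        lo, carry, x_hi_d = s & mask_lo, s >> half, x >> half
        hi, lo_d = hi_next, lo_d_next
        out.append((hi << half) | lo_d)
    return out
```

Each adder in this model is only half as wide as the full accumulator, which is the point of the exercise; the price is one extra sample of latency that later stages have to account for.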

Reply by gretzteam, April 18, 2017

Make sure you're using a delaying integrator so that the critical path is only that of one adder. Many CIC filters are drawn using a non-delaying integrator, creating a very long critical path (which could even include all integrator adders AND all differentiator adders if you're not careful!).
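To make the distinction concrete, here is a small behavioural sketch in Python (function names are made up for illustration). The non-delaying form takes its output at the adder, the delaying form takes it at the register; in a cascade the two should differ only by one sample of latency per stage.

```python
def integrate_nondelaying(x):
    """y[n] = y[n-1] + x[n]; output is the adder output (combinational)."""
    acc, out = 0, []
    for v in x:
        acc += v
        out.append(acc)
    return out


def integrate_delaying(x):
    """y[n] = y[n-1] + x[n-1]; output is the register output."""
    acc, out = 0, []
    for v in x:
        out.append(acc)   # read the register first ...
        acc += v          # ... then update it for the next clock
    return out


def cascade(stage, x, order):
    """Apply the same integrator stage `order` times in series."""
    for _ in range(order):
        x = stage(x)
    return x
```

Because every stage output in the delaying cascade comes straight from a register, no adder feeds another adder within the same clock cycle, at the cost of `order` samples of extra delay.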

Reply by flutekick, April 18, 2017

Hi gretzteam,

Thanks a lot for the comments. I am using the integrator with the delay in the forward path, so no issues there. But I further want to reduce the frequency at which these operate by parallelizing. 

Thanks,

Krishna

Reply by gretzteam, April 18, 2017

Then typically what people do is either:

1) split the CIC filter into multiple CIC stages. The constraints on the first stage will be less stringent so you can get by with a lower order filter, meaning less bit-growth.

2) use the polyphase representation of the CIC filter; that way you can push all the combinatorial logic to the lower rate.

...or a combination of the above.

If you're not keeping all the bits at the output of the filter, you can also use bit-pruning, i.e. reducing the wordlength at each stage while keeping the added truncation noise below the LSB of the final output (it's all in the original Hogenauer paper). However, that probably won't reduce the wordlength of the first few integrators except for very odd cases.
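As a rough illustration of the bit-growth argument in option 1, the sketch below uses the commonly quoted full-precision register width for a CIC decimator, input bits plus ceil(N * log2(R * M)), where N is the order, R the decimation ratio and M the differential delay. The function name and the example numbers (16-bit input, overall ratio 64) are assumptions chosen for illustration.

```python
def cic_register_width(input_bits, ratio, order, diff_delay=1):
    """Full-precision register width: input_bits + ceil(order * log2(ratio * diff_delay)).

    The ceil(log2(...)) is computed with integers to avoid float rounding.
    """
    gain = (ratio * diff_delay) ** order      # DC gain of the filter
    growth = (gain - 1).bit_length()          # ceil(log2(gain)) for gain >= 1
    return input_bits + growth


# One 4th-order stage decimating by 64, versus a 2nd-order first stage by 8.
single_stage = cic_register_width(16, ratio=64, order=4)
first_of_two = cic_register_width(16, ratio=8, order=2)
```

Under these assumed numbers the single-stage integrators need 40 bits, while the first stage of the split design needs 22, so the adders running at the highest clock rate get considerably narrower.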

Dave

Reply by flutekick, April 18, 2017

Dave,

But polyphase for CIC would mean I can no longer implement an integrator/comb structure? I mean to say, each of the polyphase branches would now look more like a regular FIR than an integrator/comb?

Correct me if I am wrong


Krishna

Reply by gretzteam, April 18, 2017

You are correct, polyphase CIC is just a fancy name for an FIR filter. Since it's decimating, you can use all the tricks that exist for decimating FIR filters, one of which is to move computation to the lower rate. Seems like this is something you are after. Of course that doesn't mean this approach is more 'efficient', it's just different.

If you select the right FIR coefficients, the overall response of a polyphase CIC is the same as that of a CIC filter, so it doesn't really matter if it's integrator/comb, does it?

hint: when using polyphase CIC (an FIR filter), using the CIC coefficients is typically a bad idea; there is a better filter for a given order...
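The equivalence can be checked numerically. The sketch below (pure Python, all names hypothetical, differential delay assumed to be 1) builds the CIC impulse response as a boxcar of length R convolved with itself N times, runs it as a polyphase decimator in which every multiply-add happens at the output rate, and compares the result with a plain integrator/comb model.

```python
def cic_fir_taps(R, N):
    """Impulse response of an N-th order CIC: length-R boxcar convolved N times."""
    taps = [1]
    for _ in range(N):
        taps = [sum(taps[max(0, i - R + 1):min(i, len(taps) - 1) + 1])
                for i in range(len(taps) + R - 1)]
    return taps


def polyphase_decimate(x, taps, R):
    """Decimate by R with R polyphase branches, all running at the low rate."""
    branches = [taps[k::R] for k in range(R)]
    y = []
    for m in range(len(x) // R):
        acc = 0
        for k, branch in enumerate(branches):
            for j, c in enumerate(branch):
                if m - j >= 0:
                    # branch k sees input phase R-1-k of low-rate block m-j
                    acc += c * x[(m - j) * R + R - 1 - k]
        y.append(acc)
    return y


def cic_decimate(x, R, N):
    """Reference model: N integrators, keep every R-th sample, N combs."""
    for _ in range(N):
        acc, y = 0, []
        for v in x:
            acc += v
            y.append(acc)
        x = y
    x = x[R - 1::R]
    for _ in range(N):
        prev, y = 0, []
        for v in x:
            y.append(v - prev)
            prev = v
        x = y
    return x
```

Nothing in `polyphase_decimate` depends on the taps being CIC taps, which is the point of the hint: once the recursive structure is dropped, any FIR of the same length can be substituted.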

Reply by flutekick, April 18, 2017

Exactly Dave. CIC polyphase doesn't make much sense.

Reply by gretzteam, April 18, 2017

I wouldn't go that far...To me it's just an observation that the recursive structure of those filters is not always the preferred one depending on sampling rate, filter order, technology used etc. 

What I was saying is that if you're NOT going to use the recursive structure for any of those reasons, I don't see why you'd stick with the CIC impulse response, as it's pretty much guaranteed to not be optimal. That said, it is very possible that it is the optimal one if you're also constraining the FIR to simple integer coefficients; one would have to verify.

Dave

Reply by flutekick, April 18, 2017

In my case, I want the decimation ratios to be programmable. In a CIC, one need not design the filter again for a different decimation ratio. But that is not the case with an FIR, right?
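A small sketch of why the ratio can stay programmable in the integrator/comb form: the same code (and, by analogy, the same hardware) serves any ratio R, with only the DC gain R^N changing. The model below assumes wrap-around two's-complement registers of a fixed width and a differential delay of 1; the function names and the 16-bit width are illustrative assumptions.

```python
def to_signed(v, width):
    """Interpret the low `width` bits of v as a two's-complement number."""
    v &= (1 << width) - 1
    return v - (1 << width) if v >= 1 << (width - 1) else v


def cic_decimate_wrapping(x, R, N, width):
    """N-th order CIC decimator with wrap-around registers; R is a runtime value."""
    mask = (1 << width) - 1
    for _ in range(N):                 # integrators at the input rate
        acc, y = 0, []
        for v in x:
            acc = (acc + v) & mask     # overflow simply wraps
            y.append(acc)
        x = y
    x = x[R - 1::R]                    # rate change: keep every R-th sample
    for _ in range(N):                 # combs at the output rate
        prev, y = 0, []
        for v in x:
            y.append((v - prev) & mask)
            prev = v
        x = y
    return [to_signed(v, width) for v in x]


# Same structure, three different ratios; the settled output is R**N times the input.
gains = {R: cic_decimate_wrapping([1] * 1024, R, 3, 16)[-1] for R in (4, 8, 16)}
```

The integrators in this model overflow many times over 1024 samples, yet the final values come out right because the register width covers the largest true output; the width therefore has to be sized for the largest ratio you intend to program.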

I am trying to see if I could somehow parallelize the CIC operation (the integrator in particular) and let it run slower.

Krishna

Reply by napierm, April 18, 2017

I've never seen an implementation of a parallel CIC integrator but it should be possible.  Integration is a linear operation so it can be spread out and the results added later.  I don't think this should be too bad if you are willing to put in delays and adder trees.

If I was doing this I would string together the operation as variables such as A, A+B, A+B+C, ... etc.  Then put together the same thing with integrator stages after a 1-to-N commutator.  Then figure out how to delay the intermediate answers such that the sum gives you the same answer as the single full-rate integrator.  You should be able to reduce the clock rate by N at the integrator at the expense of at least N times more hardware.  Be aware that you will need the same bit width throughout the stage for proper operation.

Get this to work for a single integrator and you are there.  The output will be spread out: it will look like another 1 to N commutator after a single full rate integrator stage.

Cheers,

Mark Napier


Reply by jmarcelold, April 18, 2017

Accumulation is a linear operation, so it is possible. You can have N parallel accumulators, preceded by a commutator, and followed by a pipelined adder that sums the accumulators' outputs before the differentiation stage. This reduces the clock frequency at the accumulator registers by N. The output of the pipelined adder is still updated at the sample rate of the input signal.
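A behavioural sketch of this commutator-plus-accumulators idea for a single integrator, in Python. It assumes the input length is a multiple of the lane count and models the adder tree as plain sums with no pipelining; the function name and lane count are hypothetical. Full-rate output phase p within a block is formed from the updated lanes 0..p plus the previous (delayed) values of lanes p+1..N-1.

```python
from itertools import accumulate


def parallel_integrate(x, n_lanes):
    """Single integrator built from n_lanes accumulators, each clocked at fs / n_lanes."""
    assert len(x) % n_lanes == 0, "sketch assumes whole blocks of n_lanes samples"
    acc = [0] * n_lanes
    y = []
    for start in range(0, len(x), n_lanes):
        prev = acc[:]                      # lane registers before this slow clock
        for k in range(n_lanes):
            acc[k] += x[start + k]         # commutator: lane k gets every n_lanes-th sample
        for p in range(n_lanes):
            # lanes 0..p already include this block, lanes p+1.. are one block behind
            y.append(sum(acc[:p + 1]) + sum(prev[p + 1:]))
    return y


samples = [5, -3, 8, 2, -7, 1, 4, 4, -9, 6, 0, 3]
matches_full_rate = parallel_integrate(samples, 4) == list(accumulate(samples))
```

Each lane performs one addition per slow clock, so the recursive loop runs at fs / n_lanes; the summation that recombines the lanes is feed-forward and can be pipelined freely. Extending this to a cascade of integrators is left open here, since a following stage needs every output phase of the one before it.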