I have specifications for an upsampling filter chain on an ASIC and need recommendations for a more efficient design approach.
The filtering happens after upsampling, with the input sampling rate of f_s. The low-pass filter requirements are:
Passband ripple: 0.01 dB
Stopband attenuation: 86 dB
Assumptions (normalized frequencies based on the sampling frequency):
Cutoff frequency: wc = 0.37 * pi
Stopband edge: ws = 0.6 * pi
Note: wc + ws != pi
Given these constraints, using a half-band FIR filter is not optimal.
Question 1:
What filter structure would be more efficient for these specifications than a half-band filter?
Question 2:
Is using the least squares algorithm a good choice for calculating filter coefficients, or is there a better approach? Thanks in advance for your insights!
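For a rough sense of scale, the numbers in the question can be plugged into fred harris's rule-of-thumb tap estimate, N ~= (fs / delta_f) * (A_dB / 22). This is only a sketch: it assumes the normalized edges refer to the rate the filter actually runs at, it ignores the passband ripple spec, and the variable names are made up for illustration.

```python
import math

# Specs from the post. Assumption: edges are normalized to the rate the filter runs at.
wc = 0.37          # passband edge, in units of pi rad/sample
ws = 0.60          # stopband edge, in units of pi rad/sample
atten_db = 86.0    # stopband attenuation in dB

# A half-band design needs band edges symmetric about pi/2, i.e. wc + ws == 1.
is_halfband_symmetric = math.isclose(wc + ws, 1.0)

# Rule of thumb: N ~= (fs / delta_f) * (atten_dB / 22)
transition = (ws - wc) / 2.0                      # transition width as a fraction of fs
est_taps = math.ceil(atten_db / (22.0 * transition))

print(is_halfband_symmetric, est_taps)
```

Under those assumptions the estimate lands in the mid-thirties of taps for a single-rate prototype, and the symmetry check confirms that the edges (0.37 + 0.6 = 0.97) do not meet the half-band condition.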
Excuse me for pushing my own favorite topic.
Why use an FIR filter? If you don't have a latency constraint, there is a nice way to make a linear phase recursive filter.
Start by designing an elliptic filter, which is a recursive filter. But when you specify the pass band ripple and the stop band attenuation, make these the square root of what you really need. You are going to run your data through the filter in both directions, that is, first filtering samples as they arrive in increasing time, and then use an identical filter using the outputs of the forward time filter as the inputs, but taking them in the reverse direction.
OK, that gives you a linear phase filter and its attenuation is the square of what the recursive filter design achieved for you.
The immediate problem with this approach is that you seem to need to run the "forward direction" filter all the way to t=infinity before you can begin using the "backward direction" filter.
For decades, this was an insoluble difficulty. Not any more.
Partition your input into blocks of constant length. Call the block length B. For each block, you can run the filters, in both directions, taking rather a lot less than infinite time. Look at the impulse response of the designed recursive elliptic filter. Once the last input of the block arrives, continue running the filter with inputs equal to zero. This can continue until the output of the recursive filter gets to be zero. Of course it never gets to be zero precisely, but it will get close enough to zero that it is all noise after that. I'm going to call the length of the impulse response, up to the point where it dies out, L. Since the impulse response decays exponentially, your choice of L won't change much depending on how you decide that it is near enough to the quantization noise level.
At this point you have a filter output whose length is B+L.
Now use that output as the input to the reverse direction filter. The output of the reverse direction filter will be at the noise level after B+2L steps.
You have now run a combined filter with the specifications you wanted, but you haven't run it on all your data. So run the next block of your data through the same process. The output of your second block will have an overlap of 2L data points when you align it with the first block. I am assuming that your original block length B is much larger than L. (Here is where my initial comment about not needing low latency matters.)
Obviously you can process a third block, and a fourth, etc. in the same way.
Stepping back, you need a lot more memory than you would need for an FIR design, but memory is relatively cheap. I contend that, over a wide range of designs, you will need far fewer multiplications than an FIR filter would need to meet the same attenuation and transition-band width specification.
If you have the time to invest in creative design, you will find that you can reuse some of the hardware modules. The reuse of the hardware for each block is straightforward, but there is also reuse of the multipliers in both the forward direction filter and the backward direction filter.
Of course, if you were doing this process in an old fashioned computer rather than an ASIC, it would take far less creativity.
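The block-wise forward/backward scheme described above can be sketched in a few lines. This is a toy illustration, not the poster's implementation: a one-pole recursive filter stands in for the elliptic design, and the names `one_pole`, `forward_backward_block`, and `blockwise_zero_phase` as well as the values of `a`, `B`, and `L` are assumptions chosen only to keep the example short.

```python
def one_pole(x, a):
    """y[n] = (1 - a) * x[n] + a * y[n-1], zero initial state (stand-in for the elliptic IIR)."""
    y, state = [], 0.0
    for v in x:
        state = (1.0 - a) * v + a * state
        y.append(state)
    return y

def forward_backward_block(block, a, L):
    """Filter one block forward, then backward; output spans L samples before and after the block."""
    fwd = one_pole(block + [0.0] * L, a)        # B + L samples: let the forward tail die out
    bwd = one_pole(fwd[::-1] + [0.0] * L, a)    # B + 2L samples: backward pass plus its tail
    return bwd[::-1]                            # back in natural time order

def blockwise_zero_phase(x, a, B, L):
    """Process x in blocks of length B and overlap-add the 2L-sample overlaps."""
    out = [0.0] * (len(x) + 2 * L)
    for s in range(0, len(x), B):
        seg = forward_backward_block(x[s:s + B], a, L)
        for i, v in enumerate(seg):
            out[s + i] += v                     # seg[0] corresponds to time s - L
    return out[L:L + len(x)]

# Impulse in the middle of the record: the combined response should be symmetric.
x = [0.0] * 101
x[50] = 1.0
y = blockwise_zero_phase(x, a=0.5, B=16, L=40)
```

Because the two passes multiply the magnitude response by itself, the attenuation in dB doubles; for the 86 dB stopband in this thread, each pass would be designed for roughly 43 dB.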
How long is the latency? I plan to implement it for communication-related work, so the latency needs to be minimal. I considered an IIR filter as an alternative, but its latency makes it unsuitable, if I understood it correctly. Is there any way to reduce the latency?
My scheme's extra latency is a consequence of the need to run the samples through a filter in the backward direction. The bigger the block, the worse the latency. In the applications I used it for, the block size was about 200. The length of the IIR impulse response between start and negligible amplitude point was less than that, necessarily less than that, but not much since the block length was chosen to be more, to save computation. If computation was free, you could make the block length quite short, but not less than twice the length of the IIR impulse response, and then, of course, you'd use an FIR filter anyway.
I really didn't care much about the latency because there was no feedback in my application.
Hello,
I will need more information.
If I understand correctly, we have:
Fs_in = sample_rate; % input sample rate.
Fc = 0.6 * Fs_in/2; % Desired cut off. Do you mean start of the stop band?
Fs = 0.37 * Fs_in/2; % Stop band? Do you mean pass band edge?
Pass band ripple < 0.01 dB.
Stop band ripple < 86 dB.
Fs_out = interp_rate * Fs_in; % Output sample rate. What is interp_rate? Integer? Variable?
Is this application for communications? Do you need it to be linear phase? If so can it be approximately linear over just the pass band?
Also, for an ASIC using multipliers for a FIR is very wasteful. You'll find a blog post by Neil Robertson on CSD representations for multiplication. We used a script to search for a minimum CSD given a needed response. Once we had the coefficients there is another script to turn that into a Verilog implementation that is very small. Not good for FPGAs but great for an ASIC.
Generally you get a more efficient implementation to filter to band limit at a lower sample rate and then interpolate. Depending on what you are doing. Cascaded half bands, CIC filters, etc. are all possibilities.
Cheers,
Mark Napier
Here is the post that Mark referenced:
https://www.dsprelated.com/showarticle/1011.php
I also wrote two posts on Multiplierless halfband filters:
https://www.dsprelated.com/showarticle/1585.php
https://www.dsprelated.com/showarticle/1609.php
-- Neil
Yeah, I see, I wrote that part incorrectly. What you mentioned is correct.
Fs_in = sample_rate; % input sample rate.
Fc = 0.6 * Fs_in/2; % start of the stop band
Fs = 0.37 * Fs_in/2; % pass band edge
Pass band ripple < 0.01 dB.
Stop band attenuation: at least 86 dB relative to the input signal.
Fs_out = interp_rate * Fs_in; % interp_rate is an integer that could be 2, 3, or 4 at most, since the cost of higher interpolation in hardware is huge.
Yes, the filter should be linear over the passband, and it is indeed for communication at very high frequencies, above 10 GHz.
The problem with CIC filters at high frequencies is that the accumulation part of the filter takes a very long time, making it unsuitable for my case. This is because there are a lot of additions and accumulations at such high frequencies, leading to significant delays. Additionally, my filter requirements cannot be satisfied with a half-band FIR filter. Are there any other structures, such as a third-band filter, that consume less energy? Energy consumption is a critical concern in my application.
Could you clarify this part:
"Also, for an ASIC using multipliers for a FIR is very wasteful. You'll find a blog post by Neil Robertson on CSD representations for multiplication. We used a script to search for a minimum CSD given a needed response. Once we had the coefficients, there is another script to turn that into a Verilog implementation that is very small. Not good for FPGAs but great for an ASIC."
Do you mean your FIR filter uses the CSD representation from the beginning to find the coefficients, or are the coefficients converted to CSD afterward?
Also, regarding this statement:
"Once we had the coefficients, there is another script to turn that into a Verilog implementation that is very small."
Could you assist me with this part? I'd like to understand how the script works for generating the Verilog implementation.
Hi,
The filter uses the CSD coefficients from the beginning. Mark can answer the question about the script.
-- Neil
Based on the statement "We used a script to search for a minimum CSD given a needed response," is there still a need for an MILP optimization formulation? Since the design is already minimized, would combining it with MILP optimize it further? Or are you perhaps already using MILP?
Hello,
To start with, a CIC has a linear phase response. It is essentially a recursive box-car filter. Delay isn't a problem. If you are talking about the register-less micro-architecture of the CIC flow graph, that isn't a problem either. You can put registers at stages to meet timing at very high clock rates. The CIC gain droop often isn’t a problem for interpolation but if so can be compensated for by shaping the pass band response in the band-limiting filter that you need anyway.
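The "recursive box-car" description can be checked numerically. The sketch below is a bit-exact integer model of a Hogenauer-style CIC interpolator with differential delay 1; the function name `cic_interpolate` and the choice of R = 4, N = 2 are assumptions for illustration, and no pipelining registers are modeled.

```python
def cic_interpolate(x, R, N):
    """N-stage CIC interpolator by R (differential delay 1), integer arithmetic."""
    # N comb stages at the low input rate: y[n] = x[n] - x[n-1]
    for _ in range(N):
        prev, out = 0, []
        for v in x:
            out.append(v - prev)
            prev = v
        x = out
    # Zero-stuff to the output rate: R - 1 zeros after every sample
    up = []
    for v in x:
        up.append(v)
        up.extend([0] * (R - 1))
    # N integrator stages at the high output rate: y[n] = y[n-1] + x[n]
    for _ in range(N):
        acc, out = 0, []
        for v in up:
            acc += v
            out.append(acc)
        up = out
    return up

# Impulse response for R = 4, N = 2
h = cic_interpolate([1, 0, 0], R=4, N=2)
print(h[:7])
```

The impulse response is the convolution of two length-4 box-cars, which is finite and symmetric, so the phase is linear even though the structure contains integrators.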
If you truly must have the absolute minimum in delay for a given filter response you get that by zero stuffing your input samples to the output rate and then filtering. So your delay is that of the FIR. But at a high cost due to the filter requirements. Everything is a trade-off. An airplane is a long series of compromises flying together in close formation.
However, if I understand correctly, you may not have the time to do any math at the output rate at a very high clock frequency. If that's true then you have to go parallel at a lower sample rate and then mux in the samples. Depends what you are doing.
About CSD. First consider fixed point coefficients. You can use any filter approximation method you like. I always use remez. Sometimes with the 1/f roll off in the stop-band ala fred harris. So this gives you a response using floating point coefficients. Great. So then truncate to fixed point. Depending on the values of the smaller coefficients your response suffers. So for FPGAs I use a script where I vary the gain of the filter over a range looking for the best stop-band performance ratio. Also vary the stop band edge a bit. The goal is to get some variation in the output coefficients and looking for minimums where I get the best response. I have a “cost” function that scores the response. This may just be peak deviation in the stop band. (Note for Xilinx series 7 and beyond there are features that can make this much less of a problem)
Now CSD. Each coefficient is represented by a very few bits. 1, 2, 3. Just depends. The filter response takes a big hit. 10 dB is not uncommon. If you are willing to pay, (and if you are doing an ASIC then yes, you are willing) there are software packages available that will optimize the CSD to maybe only cost you 1 or 2 dB. Personally I dislike canned software. I can’t see what is under the hood and I can’t optimize it for whatever odd-ball thing I’m doing.
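For readers who have not met CSD before, here is a minimal sketch of converting an integer coefficient to canonic signed digit form (digits -1, 0, +1 with no two adjacent nonzero digits). The helper names `to_csd` and `from_csd` are hypothetical and are not taken from any of the scripts mentioned in this thread.

```python
def to_csd(n):
    """Return the signed digits (-1, 0, +1) of integer n, least significant digit first."""
    digits = []
    while n != 0:
        if n % 2:
            d = 2 - (n % 4)   # +1 if n = 1 (mod 4), -1 if n = 3 (mod 4)
            n -= d
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits

def from_csd(digits):
    """Rebuild the integer from its signed digits (used here as a round-trip check)."""
    return sum(d << i for i, d in enumerate(digits))

# 23 = 0b10111 has four 1 bits, but in CSD it is 32 - 8 - 1: three nonzero digits.
digits_23 = to_csd(23)
print(digits_23, from_csd(digits_23))
```

Each nonzero digit costs one adder or subtractor in a shift-and-add multiplier, which is why the number of nonzero digits is the quantity worth minimizing.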
My approach in the ASIC group was to write a search script like the fixed point one. I had my filter requirements. The outer loop might tweak a parameter, say the pass band edge. Small tweak that didn’t really matter. Inner loop might let the gain move over a range. Inside I would call remez to get my floating point filter coefficients. Then call the pair-wise CSD optimization script that Neil wrote and I modified for my use. Score the output and if it was any good (better than x-number of others) save the design parameters. Let the script grind on to find some good ones. I might experiment with a few likely candidates and see if anything better might fall out. You’re going to make millions of copies of this in an ASIC so it is time well spent.
I was going to share a reference but I can’t find that paper on pair-wise dithering of the CSD coefficients. Try IEEE search.
The automatic Verilog FIR generation tool is a C program written by Ray Sanders, another brilliant engineer I was very blessed to work with. And like so many other tools you hear about it’s just not mine to give out.
Mark Napier
Thank you for your response and clarification. It was very helpful, and I now have a better understanding of how I can move forward with my design.
I couldn't find any articles online discussing delay in this context, so it seems there might not be any significant delay after all. My initial assumption was that multiple additions could introduce some delay, but this may not be the case.
Initially, I considered implementing an equivalent CIC filter for interpolation to compare its area and power consumption with a polyphase FIR filter under the same requirements. However, after discussing it with someone, I thought it might not be worth the effort and could be a waste of time.
Is there any post, article, or book that explains CIC filters with high rates in ASIC design?
Regarding cost, is the number of adders a significant concern for CIC filters? For instance, if the number of adders becomes too large, could this pose an issue?
Lastly, pair-wise CSD optimization seems to be a form of subexpression sharing. Is that correct?
For the Verilog implementation, I considered using the MATLAB HDL Coder toolbox to generate HDL code. Do you think this would be a suitable approach, or would it be less optimized for this specific case?
I think Mark was referring to the CSD optimization technique presented in the following paper:
Samueli, Henry, "An Improved Search Algorithm for the Design of Multiplierless FIR Filters with Powers-of-Two Coefficients", IEEE Transactions on Circuits and Systems, July, 1989.
Hey Neil! Yes, that's the one. I had forgotten it. I couldn't see a way to make that work but you wrote a script that did it.
Mark,
That Samueli paper is now 35 years old!
Just for the record, my DSPrelated post on multiplierless FIR filters does NOT use the bivariate search discussed in the Samueli paper. The reason is that my bivariate search code was long with several subroutines. The code in my post is very rudimentary: it simply determines the main tap scaling that results in the fewest total significant digits for the filter. There is no cap on the number of SD's, except as determined by the number of coefficient bits (8, 9, 10, etc).
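A sketch of the "main tap scaling" idea, as I read the description above, might look like the following. It is not the code from the post: the function names, the one-octave gain sweep, and the placeholder coefficients are all assumptions, and a real search would also score the frequency response of each candidate rather than only counting digits.

```python
def csd_nonzeros(n):
    """Count nonzero digits in the canonic signed digit form of integer n."""
    count = 0
    while n != 0:
        if n % 2:
            n -= 2 - (n % 4)
            count += 1
        n //= 2
    return count

def best_scaling(coeffs, bits, steps=200):
    """Sweep the gain over one octave; return (total_digits, gain, quantized_coeffs)."""
    full_scale = 2 ** (bits - 1) - 1
    peak = max(abs(c) for c in coeffs)
    best = None
    for i in range(steps):
        gain = 0.5 + 0.5 * i / (steps - 1)    # main tap sweeps 50%..100% of full scale
        q = [round(c / peak * gain * full_scale) for c in coeffs]
        total = sum(csd_nonzeros(v) for v in q)
        if best is None or total < best[0]:
            best = (total, gain, q)
    return best

# Placeholder symmetric coefficients, not a designed filter.
h = [0.021, -0.043, -0.087, 0.196, 0.587, 0.196, -0.087, -0.043, 0.021]
total, gain, q = best_scaling(h, bits=10)
print(total, round(gain, 3), q)
```

Sweeping only one octave is a deliberate choice in this sketch: scaling by a further factor of two is just a shift and does not change the digit count.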
Link to my post:
I've never used the HDL coder myself. These are the CIC papers I've used:
01163535 E. B. Hogenauer, "An Economical Class of Digital Filters for Decimation and Interpolation," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-29, no. 2, April 1981.
05537066 G. J. Dolecek, L. Dolecek, "Novel Multiplierless Wide-Band CIC Compensator," Proceedings of the IEEE Symposium on Circuits and Systems, May 30-June 2, 2010, pp. 2119-2122.
01364115 K. S. Yeung and S. C. Chan, "The design and multiplier-less realization of software radio receivers with reduced system delay," IEEE Trans. on Circuits and Systems-I: Regular Papers, vol. 51, no. 12, pp. 2444-2459, Dec. 2004.
04912329 A. Fernandez Vazquez and G. Jovanovic Dolecek, "A general method to design GCF compensation filter," IEEE Trans. on Circuits and Systems II: Express Briefs, vol. 56, no. 5, pp. 409-413, May 2009.
04625201 G. Jovanovic-Dolecek and S. K. Mitra, "Simple method for compensation of CIC decimation filter," Electron. Lett., vol. 44, no. 19, pp. 1162-1163, Sep. 2008.
04537383 G. J. Dolecek and S. K. Mitra, "On Design of CIC Decimation Filter with Improved Response," 3rd International Symposium on Communications, Control and Signal Processing, ISCCSP 2008, Malta, 12-14 March 2008, pp. 1072-1076.
01487669 G. Jovanovic-Dolecek and S. K. Mitra, "A new two-stage sharpened comb decimator," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 7, pp. 1414-1420, Jul. 2005.
04195638 M. Laddomada, "Generalized comb decimator filter for ΣΔ A/D converters: Analysis and design," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 54, no. 5, pp. 994-1005, May 2007.
04468692 M. Laddomada, "On the polyphase decomposition for design of generalized comb decimation filters," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 8, pp. 2287-2299, Sep. 2008.
00885128 L. L. Presti, "Efficient modified-sinc filters for sigma-delta Σ/Δ converters," IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 47, no. 11, pp. 1204-1213, Nov. 2000.
04156403 M. Laddomada, "Comb-based decimation filters for ΣΔ A/D converters: Novel schemes and comparisons," IEEE Trans. Signal Process., vol. 55, no. 5, pp. 1769-1779, May 2007.
00749080 I. W. Selesnick, "Low-pass filter realizable as all-pass sums: Design via a new flat delay filter," IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 46, no. 1, pp. 40-50, Jan. 1999.
04700271 G. J. Dolecek and F. Harris, "Design of CIC Compensator Filter in a Digital IF Receiver," International Symposium on Communications and Information Technologies, pp. 638-643, Oct. 21-23, 2008. (Note: Wow!)
05213597 G. J. Dolecek and F. Harris, "On Design of Two-Stage CIC Compensation Filter," IEEE International Symposium on Industrial Electronics (ISIE 2009), pp. 903-908, Seoul Olympic Parktel, Seoul, Korea, July 5-8, 2009.
Ricardo A. Losada and Richard Lyons, "Reducing CIC Filter Complexity," IEEE DSP Tips and Tricks.
That last one has a nice trick to simplify a CIC interpolator.
A quick question regarding CIC filters: Suppose I want to process data using a polyphase CIC filter, as shown in the figure below. The suggestion here is to downsample, but I am performing upsampling.
How is the CIC filter considered multiplierless if we still perform multiplication in the first stage of the polyphase implementation? Is this design correct from the beginning, and will the delay time of the accumulator become excessively large?
Should I divide the input into polyphases as suggested and then perform the CIC filter, or is there a better approach? Are there alternative ways to design a CIC filter without dividing it into polyphases, given that the system clock cannot be infinitely fast? Additionally, how can I design and test such a filter, both in software and hardware?
Some kind of pseudocode or example implementation would be very helpful for understanding how to approach this.
Hello,
I can't see the image.
Also, which polyphase CIC decomposition is used? Is it from one of the papers?
Just so I have an idea, what is the clock/sample frequency of your output stage? Can you get an adder to close timing?
Mark Napier
Sorry, there was something wrong with uploading the picture. I hope you can see that now.
I need to implement a polyphase CIC filter to achieve these high frequencies. My target output frequencies are at least 10 GHz, with a system clock frequency limited to 1 GHz. To reach these high frequencies, I must use a polyphase CIC filter or include a compensation stage.
The paper suggesting the polyphase CIC filter is titled:
High speed polyphase CIC decimation filters | IEEE Conference Publication | IEEE Xplore
The same kind of design is mentioned in this one:
04912329A. Fernandez Vazquez and G. Jovanovic Dolecek, "A general method to design GCF compensation filter", IEEE Trans. on Circuits and Systems II: Express Briefs, vol.56, no.5, pp.409-413, May 2009.
My main concern is the polyphase decomposition process. Since interpolation happens in several stages, let’s assume the input frequency is 1 GHz. After interpolating by 2, the resulting 2 GHz signal cannot be processed in a single phase because the system clock is limited to 1 GHz. To address this, the data is divided into odd and even samples using delay elements and processed simultaneously. This creates multiple streams of data after each upsampling stage. For example, 2 GHz is not a single stream at 2 GHz but rather two streams of 1 GHz each, which together correspond to 2 GHz.
If I perform an interpolation by 4 on data sampled at a 1 GHz clock, I will get four streams of 1 GHz data as output. This process continues until the desired output frequency is achieved. For instance, if the desired output frequency is 16 GHz and the input is 1 GHz, I would need to perform an interpolation by 4 followed by another interpolation by 4 (or use another suitable combination). So basically 16 additions are required in the last stage. I have included a picture that explains this idea with 4 coefficients for simplicity.
Another concern is that many of the articles primarily focus on sigma-delta filters rather than traditional digital filters. Is it feasible to achieve these high speeds using polyphase decomposition for the compensation stage while integrating it with a traditional CIC filter? Can the same approach be applied to a conventional CIC filter instead of a sigma-delta filter? Is it possible for a digital up-converter to utilize a CIC implementation at these high frequencies?
My idea is to have a CIC after each stream, after adding up the contributions from the different streams as shown in the picture. So basically 16 CIC filters, one after each addition of the different streams. I forgot to include the interpolation factor in the picture, by the way, but it exists.
1GHz => 2GHz ...=> 16GHz on system clock of 1GHz? I wouldn't go that way.
Your post was about FIR but now changed to super-sampling filter design. When filtering in such cases there is dependency between all streams and it is terribly complicated.
One possible way is to have one upsampling FIR filter (prototype) with 16 polyphases (subfilters).
It can run on a 1 GHz clock, outputting 16 parallel streams, each at 1 GHz.
You cannot reuse resource as all 16 subfilters need to run in parallel.
So in such case use halfband prototype filter and exploit small value coeffs.
Using multiplier-less CIC followed by compensation will just complicate it.
If you then have 16GHz clock (and I am surprised at such clock) then you can serialise the samples 0-15 back onto one stream.
But it is still an FIR filter, just in multiple streams, which is very common in DSP. Instead of accumulating each data set, the streams are just sent out independently. But anyway, is using a CIC followed by compensation even possible? Why would it be complicated when we are just adding a CIC after each stream of upsampled data (or probably the other way around)? Could you please explain why? As I see it, we just give each branch its own CIC filter, and hopefully that reduces the overall power consumption and area of the design, probably by doing a larger interpolation at once, like 4.
Any filtering (FIR or IIR) is based on a pipe of signal samples. When clock rate is same as signal rate that is straight forward. But when signal rate is multiple of clock rate you split up samples across multiple parallel substreams. As such the pipe is broken and you have to account for that per each substream pipe. You can't just filter and have final adder.
For example:
2 parallel samples per clock require four sub-filters (two polyphases)
3 samples per clock require 9 sub-filters (3 polyphases)
4 samples per clock require 16 sub-filters (4 polyphases)
Other techniques are available to reduce sub-filters at expense of adders/subtractors.
The sub-filters are polyphases of the prototype filter.
In each case you need some specific delay arrangements at the inputs or outputs, or both, before you add/subtract to/from substreams.
The easiest one is 2 samples per clock single rate FIR as below:
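The original figure is not reproduced here. As a stand-in, the following is a minimal software model of one common arrangement for a single-rate FIR at 2 samples per clock: the prototype is split into even and odd polyphases, the input into even and odd substreams, and four sub-filters plus one delay produce the two output substreams. The function names are hypothetical, and this may differ in detail from the diagram that was posted.

```python
def fir(h, x):
    """Direct-form FIR: y[n] = sum_k h[k] * x[n-k], zero initial state."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def fir_two_per_clock(h, x):
    """Same filter, two samples per clock, using four sub-filters (x must have even length)."""
    h_e, h_o = h[0::2], h[1::2]            # two polyphases of the prototype
    x_e, x_o = x[0::2], x[1::2]            # even and odd input substreams
    x_o_delayed = [0] + x_o[:-1]           # one-clock delay on the odd substream
    # y[2m]   = (h_e * x_e)[m] + (h_o * x_o)[m-1]
    y_e = [p + q for p, q in zip(fir(h_e, x_e), fir(h_o, x_o_delayed))]
    # y[2m+1] = (h_o * x_e)[m] + (h_e * x_o)[m]
    y_o = [p + q for p, q in zip(fir(h_o, x_e), fir(h_e, x_o))]
    return y_e, y_o

h = [1, -2, 3, 4, -5]                      # arbitrary integer taps for an exact comparison
x = [7, 1, -3, 2, 0, 5, -4, 6]
y_ref = fir(h, x)
y_e, y_o = fir_two_per_clock(h, x)
```

The cross terms (the odd polyphase applied to the even substream and vice versa) are the dependency between substreams mentioned above, and the single delay on the odd input is the "specific delay arrangement" for this case.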
Yeah, you are right. In the picture I uploaded, there is something wrong. The input signal should be divided into 4 substreams (x(0), x(1), x(2), x(3)), and each of them should go through a separate filter, making a total of 16 subfilters.
1: Would using a CIC filter become challenging because it would require 16 CIC filters, one for each substream? Cause I thought it might be possible to combine the substreams after the polyphase FIR part into 4 streams and then perform CIC filtering on those 4 streams.
2: If I understand CIC filters correctly, I could divide the upsampling factor into stages. For example, in the first stage with the FIR, I could interpolate by a factor of 2, followed by interpolation using the CIC filter by another factor of 2, resulting in a total upsampling factor of 4.
This way, we could perform interpolation by a larger factor in multiple stages.
Try and consider my proposal for 16 polyphase FIR upsampler using your prototype filter of choice.
With CIC you have to adapt the pipe specifically for supersampling CIC requiring plenty prior modelling as there is not much info around.
Hello,
I'm more than a little confused. If you want a 10 GHz sample rate with a 1 GHz clock does that mean you are delivering 10 samples over the interface on every system clock? Difficult to imagine but if so that clarifies things a bit.
I'll assume that you can at least have an adder working at that 1 GHz rate.
Yeah, breaking this up in parallel is the only thing I can see working. I think it is harder if not a power of 2 interpolation but maybe not. Obviously I have not done this type of decomposition with the CIC but my gut says it is possible.
I'm thinking about the internal state of the 1st stage of the output side after the up-sampler (or zero-order hold). The input to this stage is static for X sample cycles. So why can't each successive output sample of that 1st stage be calculated by multiples [1,2,.. (X-1)] of that static input into copies of the 1st stage. Again think CSD with shared terms for each multiplier. It is a look-ahead scheme. I would model this and compare against a simple CIC running at full rate. There might also be some shuffling to get the right initial conditions into each stage copy at each input cycle. Seems like a similar kind of look-ahead for the following stages too. First blush I think it would work. 2nd stage the input is not static over X cycles so there could be more look-ahead type logic.
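One reading of this look-ahead idea for the first integrator can be modeled quickly. The sketch below assumes a zero-order-hold input that is static for X output samples and compares a serial integrator against a parallel version that emits X outputs per clock; the function names are made up, and only the first stage is covered, not the later stages whose inputs are no longer static.

```python
def serial_integrator(samples):
    """Reference: one accumulate per output sample at the full output rate."""
    acc, out = 0, []
    for v in samples:
        acc += v
        out.append(acc)
    return out

def lookahead_integrator(held_inputs, X):
    """One clock per input value; each clock emits X outputs computed in parallel."""
    acc, out = 0, []
    for v in held_inputs:
        out.append([acc + k * v for k in range(1, X + 1)])  # fixed multiples 1..X of the static input
        acc += X * v                                        # state carried to the next clock
    return out

X = 4
held = [3, -1, 4]
full_rate = [v for v in held for _ in range(X)]             # zero-order hold to the output rate
parallel = [s for group in lookahead_integrator(held, X) for s in group]
```

Only one state update per slow clock remains in the feedback path; the fixed multiples 1..X are constant multiplications that could share CSD terms, as suggested above.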
It is also quite doable to compute the output with a FIR in parallel. Maybe bigger, but there are an awful lot of inserted zeros that you don't have to compute for the output.
I'm also thinking again of half-band filters. Maybe with a 1/3 band filter at the start. A lot of parallel pipes because of your system clock WRT the output sample rate but still seems like a possibility to save ASIC gates.
Also wondering about the sigma-delta converter. If you are simply running a DAC at an insanely high rate (????) then this converter is often used.
Mark Napier
Some new designs in industry (FPGAs or ASIC) tend to use system clock at a suitable maximum rate but increase processing by parallelism into multiple streams and hence use much higher resource. This is due to clock technology limits on such platforms yet increased demand on data rates.
Modules like mixers are easy but memory based modules like filters or FFTs, DPD ...etc. get complicated due to dependency of all substreams.
At the DAC/ADC, I assume, the final signal is serialised/deserialised on the fast enough clock. This way the burden of timing closure is shifted to DAC/ADC.
There are many different FIR filter design approaches that could be applied here. I've used Parks-McClellan/Remez exchange for many years, but that's considered a bit old-school these days. Do you have access to any filter design software? Many give options on providing filter specifications and then selecting the design/optimization algorithms.
Remez/P-McC is old school and so is Least Squares (firls()). But what's wrong with it?
If your FIR filter is limited in length to be L and *if* the measure or norm is the L-infinity norm, that you just want to minimize the maximum error (the Tchebyshev norm) in either passband or stopband, then I don't think that theoretically you can get a better result than P-McC.
Now I don't necessarily want to do that. Sometimes firls() looks better to me (and sounds better) than what comes outa firpm(). Also sometimes Kaiser-windowed inverse FFT design looks better.
But if I strictly wanted to minimize the maximum error, I can't think of another method than firpm().
The Remez function sometimes overshoots the stopband specification, giving more stopband attenuation than required. While this is not bad, it may unnecessarily increase the filter's complexity. Or am I wrong?
Most filter design methods require a bit of an iterative development approach, since it's difficult to get exactly what you specify, especially if you're pushing the boundaries of what a particular filter order might be able to achieve. If you're resource limited in any dimension, that's probably going to be the case, so it's usually a process of design it, see if it does what you want, tweak the parameters, and try again.
So P-M/Remez exchange is no different that way, but what it does do is give you a very quick way of developing and tweaking filters to get the response you need with limited computational resources (i.e., fewer taps, lower dynamic range, whatever).
This is why I've rarely found a need to use other methods because, as rbj mentioned, what's wrong with it? Sometimes there are other metrics that need to be optimized, and many times it's just finding a methodology that works efficiently for you, so it can be just a personalization issue as well. Finding something that makes sense and works for you is a big part of the battle.
I've used a lot of other design methods as well, but usually when there was a compelling reason to do so based on the requirements or characteristics of that particular problem.
You're really the only one that knows the subtleties of what you need for this particular problem, so take the inputs here and I hope you find something that works for you.
As a filter design Toolbox I like pyfda. It's FOSS.
https://github.com/chipmuenk/pyfda
(I'm not affiliated with this project, just like this tool)
I hope I'm not stating the obvious, but since the filtering is happening after upsampling, you should probably use a polyphase filter architecture. That reduces your resource requirement by a factor roughly equal to the upsampling factor. It doesn't change anything mathematically (including your coefficient design), but can make a huge difference in silicon area.
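To make the polyphase point concrete, this small sketch checks that a polyphase interpolator gives exactly the same samples as zero-stuffing followed by the full filter. The function names and test values are assumptions for illustration; the coefficients are arbitrary integers so the comparison is exact.

```python
def fir(h, x):
    """Direct-form FIR: y[n] = sum_k h[k] * x[n-k], zero initial state."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def interpolate_direct(h, x, R):
    """Zero-stuff by R, then run the full filter at the output rate."""
    up = []
    for v in x:
        up.append(v)
        up.extend([0] * (R - 1))
    return fir(h, up)

def interpolate_polyphase(h, x, R):
    """Run R short sub-filters at the input rate and interleave their outputs."""
    branches = [fir(h[p::R], x) for p in range(R)]   # phase p holds taps h[p], h[p+R], ...
    return [branches[p][n] for n in range(len(x)) for p in range(R)]

h = [1, 3, -2, 5, 4, -1, 2]
x = [2, -1, 0, 3, 1]
R = 3
y_direct = interpolate_direct(h, x, R)
y_poly = interpolate_polyphase(h, x, R)
```

Every multiply in the polyphase version works on a real input sample, whereas in the direct version most products involve a stuffed zero; that is where the factor of roughly R comes from.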
I think the obvious potential problem with Least Squares is that it can produce large errors across narrow frequency bands (with tiny errors everywhere else). It is often preferable to instead minimize the worst error (that's the "minimax" a.k.a. Chebyshev criterion). This is what the Parks-McClellan/Remez exchange algorithm (mentioned by Slartibartfast) does. That's also what I would probably use in this case.
Yes, I do have a polyphase structure, since the system clock cannot run at arbitrary speeds and the input data is processed in polyphases. However, my question is whether there is an equivalent approach that could replace the half-band FIR filter, such as a third-band filter or another structure, that can be combined with the polyphase structure to reduce the overall number of multiplications and be more efficient than a half-band filter combined with a polyphase filter. As you may know, a half-band polyphase FIR filter is possible.
And I want to reduce the power at any cost. Since Remez tends to over-attenuate the stopband, wouldn't it be better to use a least-squares approach instead, or am I completely wrong?
For upsampling, the halfband filter is just convenient. You can use any filter that passes your signal band and removes the copies(images) of zero insertion. And any such filter can then be implemented as polyphases.
Yes, exactly! I agree with you, but could you name some of these types of structures? I’ve heard that I could use two all-pass filters and combine them to get the desired output, but I don't understand that one. Are there other kinds of structures that could be used?
Matlab/Octave offers fir1, fir2, firpm ... and others plus various windowing methods, weight methods ... etc.
You need to play with them until you hit the required attenuation.
Any single LPF can fit a polyphase structure if upsampling.
So why use two filters, two designs, two tests...etc. With FPGA/ASIC you target minimum modules. In contrast soft DSP here can just call short statements and it is done !!
The structure of a filter is not the same as the method used to design its coefficients. Methods like fir1, fir2, etc., are used to determine the filter coefficients, but there are additional structural designs, such as half-band filters, that further optimize the coefficients. For example, half-band filters set half of the coefficients to zero, which reduces the computational complexity.
"Any single LPF can fit a polyphase structure if upsampling." Exactly, but I’m wondering if there are other structures that similarly set some coefficients to zero while also reducing the filter’s cost—either by minimizing the number of taps or strategically setting some taps to zero. I can use half band filter, but I know that is not optimal and there should be other approaches.
When you split up a filter into two polyphases you do that because there are zeros in the input (assumed or physically inserted). As such the half band zeros become obsolete from resource perspective.
zeros in coeffs, also even/odd symmetry, small values ...etc can help per filter or polyphase if available.
I find that the most resource-saving strategy for an LPF is to implement those small side values in adders and only use dedicated multipliers for the large values, but only if speed allows.
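To see where the half-band zeros end up after a two-phase split, here is a toy example using a well-known 7-tap half-band, [-1, 0, 9, 16, 9, 0, -1] / 32. It is nowhere near the 86 dB requirement in this thread and is only meant to show how the coefficients distribute; the variable names are assumptions.

```python
# Toy 7-tap half-band prototype; all values are exact in binary floating point.
h = [-1 / 32, 0.0, 9 / 32, 1 / 2, 9 / 32, 0.0, -1 / 32]

even_phase = h[0::2]    # taps 0, 2, 4, 6
odd_phase = h[1::2]     # taps 1, 3, 5

# Nonzero taps per phase, before exploiting symmetry
even_nonzero = sum(1 for c in even_phase if c != 0)
odd_nonzero = sum(1 for c in odd_phase if c != 0)

# The even phase is symmetric, so pre-adding mirrored inputs halves its multiplies.
even_unique = len(set(abs(c) for c in even_phase))

print(even_phase, odd_phase)
print(even_nonzero, odd_nonzero, even_unique)
```

In this example all the zero taps land in one phase, which collapses to a single scaled delay (a shift for the 1/2 tap), while the other phase keeps the symmetric taps. With small dyadic values like 1/32 and 9/32, those remaining taps are also natural candidates for the adder-only treatment mentioned above.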
Hello,
I've used the two-path all-pass half-band filter for decimation in an FPGA. It is very small if you don't care about linear phase. There is a linear phase variation as well but I didn't need it.
However, it does require a multiplier and enough bits in the coefficient (18 in the Xilinx) to get an accurate response. At high speed in an ASIC these multipliers would be large and hungry.
Just my opinion but I don't think they are a good fit.
Mark Napier