I need to convert a (fixed-point) stereo 16-bit/32kHz stream into stereo 16-bit/16kHz (on a Cortex-M4). For this I have successfully used the CMSIS-DSP function arm_fir_decimate_q15(). However, with a 115-tap FIR filter, the call takes ~3.68ms to execute, which is far too long for my application to tolerate. So I tried the same with a 2-tap filter instead (which, surprisingly, does the trick as well) and now the call takes ~300us. A question though: what is taking so much time? A plain arm_copy_q15(), on the same block size, takes ~17us, and you would think that with an almost "non-existent" FIR filter the arm_fir_decimate_q15() function would only be tasked with throwing away every other sample. But then, I'm not very experienced in signal processing ...
Can someone explain what is going on?
Is there a faster method to accomplish the same goal?
Thanks in advance!
For 32kHz to 16kHz, if there is no significant power beyond 8kHz then direct decimation (simply keeping every other sample) will do.
To elaborate a bit: I have a MEMS microphone connected to an nRF52. To get the wanted 16kHz sample rate I need to set the PDM clock to 1MHz. Now, most PDM microphones have a "low power" mode when clocked below ~700kHz and a "normal power" mode when clocked with >1.1MHz. So 1MHz is "illegal" but it seems my microphone is operating in "low power" mode because the sensitivity is higher(!) than the listed one for "normal power" mode. I need the lower sensitivity.
So, due to the design of the PDM peripheral, clocking at 2MHz (to get the correct sensitivity) yields a 32kHz sample rate. Since the rest of my system expects a 16kHz stream (a proprietary radio protocol etc.) I need to bring the sample rate down to 16kHz.
I tried discarding every second sample, but the frequency content between 8kHz and 16kHz, previously blocked by the PDM peripheral's LP filter, now causes aliasing. Consequently I need to do some additional filtering before the decimation.
I am not familiar with your setup specifically, but if we assume the group delay of the filter is what is behind the delay you measured, then I get:
for 115 taps group delay = 1/16000*114/2*1000 = 3.56 ms
for 3 taps group delay ~= 1/16000*2/2*1000000 = 62.5us
The extra delay could be due to delay of register pipes.
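The arithmetic above can be reproduced with a small helper. This is only a sketch: the function name is made up for illustration, it assumes a linear-phase FIR (group delay of (N-1)/2 samples), and it uses the 16kHz rate from the figures above. If the taps run at the 32kHz input rate, pass 32000 instead. Keep in mind that group delay is signal latency through the filter, which is a different quantity from the CPU time a call takes.

```c
/* Group delay of a linear-phase FIR, in microseconds.
 * fir_group_delay_us is a hypothetical helper, not a CMSIS-DSP function.
 * num_taps:       number of filter coefficients (N)
 * sample_rate_hz: rate at which the taps are clocked (assumed, see text) */
double fir_group_delay_us(unsigned num_taps, double sample_rate_hz)
{
    if (num_taps == 0 || sample_rate_hz <= 0.0) {
        return 0.0; /* nothing sensible to report */
    }
    /* (N - 1) / 2 samples of delay, converted to microseconds */
    return ((double)(num_taps - 1) / 2.0) / sample_rate_hz * 1e6;
}
```

With 115 taps at 16000 Hz this gives 3562.5 us, and with 3 taps it gives 62.5 us, matching the numbers above.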
Thanks for your kind advice!
Yes, if your assumptions are valid the decimation part takes:
115 taps: 3.68(my measurement) - 3.56 = 120us
3 taps: 300 - 62.5 = 237.5us
The numbers do not completely add up, but at least they are in the same ballpark, i.e. 100-200us.
I have not yet tested if the 300us is tolerable to my application. Hope it is ...
I am not sure what you mean by replacing the 115-tap filter with a 2-tap filter. Are you filtering your signal before using arm_fir_decimate_q15()?
ARM arm_fir_decimate_q15() has an internal FIR filtering, so you do not need an extra filtering and the longer time could be due to this additional filtering.
There is also a fast version: arm_fir_decimate_fast_q15().
Also, if your signal has no significant content above 8 kHz, you can simply downsample the signal (take every other sample) rather than using a decimating filter.
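For reference, taking every other sample of a stereo stream could look like the sketch below. It assumes the samples are interleaved as L,R,L,R (the thread does not say how the buffer is laid out), and the function name is made up for illustration. It does no filtering, so anything between 8 kHz and 16 kHz in the input will alias.

```c
#include <stddef.h>
#include <stdint.h>

/* Keep every other stereo frame: 32 kHz in, 16 kHz out, no filtering.
 * Assumes interleaved L,R,L,R,... samples (an assumption, not from the thread).
 * in_frames is the number of input stereo frames; out must hold
 * in_frames / 2 frames. Returns the number of output frames written. */
size_t drop_every_other_frame(const int16_t *in, size_t in_frames, int16_t *out)
{
    size_t out_frames = in_frames / 2;
    for (size_t m = 0; m < out_frames; m++) {
        out[2 * m]     = in[4 * m];     /* left sample of input frame 2m  */
        out[2 * m + 1] = in[4 * m + 1]; /* right sample of input frame 2m */
    }
    return out_frames;
}
```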
Please see my reply to kaz.
I tried a standalone FIR filter before dropping every other sample but I did not get the desired result. So I tried arm_fir_decimate_q15 instead. That works but, as explained, takes too much time.
The arm_fir_decimate_q15 function expects some FIR filter coefficients and those are the ones that I relate to when talking about 115 or 2 taps.
Also tried with arm_fir_decimate_fast_q15. It brings the cycles down a bit but not significantly.
So my original question remains.
I was about to suggest the same thing, "standalone FIR filter and then every other sample", but if you tried that and it did not work then it could hint at some other problem. My suggestion is to simulate it first (in either MATLAB or Python): feed the input signal to the simulated FIR filter and compare its output with that of the implemented FIR code. Then test the "every other sample" part. Eventually, when they match, you can replace the code with arm_fir_decimate_q15.
If my application does not tolerate the 300us I will need to revisit the "standalone FIR" approach. I should be able to bring it down below 100us with a super short FIR filter. Wish me luck!
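A very short standalone filter of that kind could be sketched as below: a 2-tap average (coefficients 0.5, 0.5) that is evaluated only for the outputs that are kept, so each output frame costs one add and one divide per channel. The function name and the interleaved L,R,L,R layout are assumptions for illustration, and this is not how arm_fir_decimate_q15 is implemented. A 2-tap average has a null at 16 kHz but is only about 3 dB down at 8 kHz, so its anti-aliasing is weak. No timing is claimed for this code.

```c
#include <stddef.h>
#include <stdint.h>

/* 2-tap moving average followed by decimation by 2, for stereo q15 data.
 * Assumes interleaved L,R,L,R,... samples (an assumption, not from the thread).
 * Each output frame is the average of two consecutive input frames, so no
 * state has to be carried between blocks as long as in_frames is even.
 * Returns the number of output frames written (in_frames / 2). */
size_t avg2_decimate_stereo(const int16_t *in, size_t in_frames, int16_t *out)
{
    size_t out_frames = in_frames / 2;
    for (size_t m = 0; m < out_frames; m++) {
        const int16_t *f0 = &in[4 * m]; /* input frame 2m     */
        const int16_t *f1 = f0 + 2;     /* input frame 2m + 1 */
        /* The sum is formed in int, so it cannot overflow 16 bits;
         * the average of two int16_t values always fits in int16_t. */
        out[2 * m]     = (int16_t)(((int32_t)f0[0] + f1[0]) / 2);
        out[2 * m + 1] = (int16_t)(((int32_t)f0[1] + f1[1]) / 2);
    }
    return out_frames;
}
```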
Seems my application is fine with the 352us delay I get with arm_fir_decimate_fast_q15 and the 2-tap filter. So I'm happy with this for now.
Thank you so much SaeidSeyed/kaz for taking your time.